quotechar : string
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
The default value is the double-quote character ("). An example:
```python
import pandas as pd
from io import StringIO  # Python 3 (Python 2 used: from StringIO import StringIO)

s = """year, city, value
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0"""

df = pd.read_csv(StringIO(s), quotechar='"', skipinitialspace=True)
print(df)
#    year           city  value
# 0  2012  Louisville KY    3.5
# 1  2011  Lexington, KY    4.0
```
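The trick here is that you also have to pass `skipinitialspace=True` to deal with the spaces after the comma delimiter. Without it, each field begins with a space rather than the quote character, so the quote is not recognized and the comma inside "Lexington, KY" splits the field.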
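The effect of `skipinitialspace` can be seen without pandas, using Python's standard `csv` module, which accepts parameters with the same names. This is a small sketch on one row of the example data; the variable names are only for illustration.

```python
import csv
import io

line = '2011, "Lexington, KY", 4.0'

# Without skipinitialspace the field starts with a space, not a quote,
# so the quote is treated as an ordinary character and the comma splits.
naive = next(csv.reader(io.StringIO(line), quotechar='"'))
print(naive)  # ['2011', ' "Lexington', ' KY"', ' 4.0']

# With skipinitialspace the leading space is dropped, the quote is
# recognized, and the embedded comma stays inside the field.
fixed = next(csv.reader(io.StringIO(line), quotechar='"', skipinitialspace=True))
print(fixed)  # ['2011', 'Lexington, KY', '4.0']
```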
Apart from its powerful CSV reader, pandas is also well suited to heterogeneous data like yours, so I strongly advise using it (the NumPy output in your example is all strings, although you could use structured arrays).
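For the structured-array route, here is a minimal sketch that parses the example data with the `csv` module and stores the mixed-type rows in a NumPy structured array. The column types (`'i4'`, `'U20'`, `'f8'`) are assumptions chosen to fit this example data, not something the original specifies.

```python
import csv
import io
import numpy as np

s = '''year, city, value
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0'''

reader = csv.reader(io.StringIO(s), quotechar='"', skipinitialspace=True)
header = next(reader)  # ['year', 'city', 'value']

# Convert each row to properly typed Python values before building the array.
rows = [(int(year), city, float(value)) for year, city, value in reader]

# One named, typed field per column instead of an all-string array.
dtype = list(zip(header, ["i4", "U20", "f8"]))
records = np.array(rows, dtype=dtype)

print(records["city"])          # ['Louisville KY' 'Lexington, KY']
print(records["value"].mean())  # 3.75
```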
As for the problem with the additional comma: `np.genfromtxt` does not understand quoted fields, so it splits "Lexington, KY" at the comma.
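To see the failure concretely, this sketch feeds the example data (header skipped) straight to `np.genfromtxt`. The quoted comma gives the last row a fourth column, and with the default `invalid_raise=True` genfromtxt raises a `ValueError` about the inconsistent column count.

```python
import io
import numpy as np

s = '''year, city, value
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0'''

failed = False
try:
    np.genfromtxt(io.StringIO(s), delimiter=",", dtype=str, skip_header=1)
except ValueError as err:
    # genfromtxt splits on every comma, quotes included, so the
    # "Lexington, KY" row has 4 columns instead of 3.
    failed = True
    print("genfromtxt failed:", err)
```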
One simple solution is to read the file with `csv.reader()` from Python's `csv` module into a list, and then convert that list into a NumPy array if you like.
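A minimal sketch of that approach, using an in-memory `io.StringIO` as a stand-in for the real file. Without an explicit dtype the result is a 2-D array of strings, so numeric columns have to be converted afterwards.

```python
import csv
import io
import numpy as np

s = '''year, city, value
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0'''

# csv.reader understands quoting, so the comma inside "Lexington, KY" is kept.
rows = list(csv.reader(io.StringIO(s), skipinitialspace=True))
header, body = rows[0], rows[1:]

# A plain 2-D array of strings, one row per record.
arr = np.array(body)
print(arr.shape)   # (2, 3)
print(arr[1, 1])   # Lexington, KY

# Convert a numeric column explicitly when needed.
values = arr[:, 2].astype(float)
print(values)      # [3.5 4. ]
```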
If you really want to use `np.genfromtxt`, note that it can take iterators (such as generators) instead of files, e.g. `np.genfromtxt(my_iterator, ...)`. So you can wrap a `csv.reader` in a generator expression and pass that to `np.genfromtxt`. That would go something like this:
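```python
import csv
import numpy as np

# csv.reader keeps quoted commas; re-join each row with tabs for genfromtxt.
with open('myfile.csv', newline='') as f:
    data = np.genfromtxt(("\t".join(i) for i in csv.reader(f)), delimiter="\t")
```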
This effectively replaces, on the fly, only the delimiting commas with tabs: commas inside quoted fields are left alone, and `csv.reader` also strips the surrounding quotes.
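Here is a self-contained sketch of the same tab-join trick on in-memory data, so it can be run without a file. It additionally passes `names=True` (column names from the first line) and `dtype=None` (infer a type per column), which are choices made for this example; with genfromtxt's default float dtype the city names would become `nan`.

```python
import csv
import io
import numpy as np

# In-memory stand-in for a CSV file (no spaces after the commas).
s = '''year,city,value
2012,"Louisville KY",3.5
2011,"Lexington, KY",4.0'''

# csv.reader strips the quotes and keeps "Lexington, KY" together;
# re-joining with tabs gives genfromtxt an unambiguous delimiter.
tab_lines = ("\t".join(row) for row in csv.reader(io.StringIO(s)))

# dtype=None infers a type per column, producing a structured array.
data = np.genfromtxt(tab_lines, delimiter="\t", names=True,
                     dtype=None, encoding=None)
print(data["city"])
print(data["value"])
```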
If you are using NumPy, you probably want to work with a `numpy.ndarray`. This will give you one:
```python
import pandas

# .to_numpy() replaces .as_matrix(), which was removed in pandas 1.0
data = pandas.read_csv('file.csv').to_numpy()
```
Pandas will handle the "Lexington, KY" case correctly, because `read_csv` honors quoted fields.
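One thing worth knowing about this conversion: when the columns have different types, `to_numpy()` has to pick a common dtype for the whole array, which for mixed text and numbers is `object`. A small sketch on in-memory data (standing in for `file.csv`) shows this, and shows that selecting a single column keeps its native dtype.

```python
import io
import pandas as pd

s = '''year,city,value
2012,"Louisville KY",3.5
2011,"Lexington, KY",4.0'''

df = pd.read_csv(io.StringIO(s))

# Mixed column types force a common dtype, so the whole array is object.
everything = df.to_numpy()
print(everything.dtype)  # object

# Selecting a single column keeps its native dtype.
values = df["value"].to_numpy()
print(values.dtype)      # float64
```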