Apparently, this is something that many (even experienced) data scientists still google. Sometimes you’re dealing with a comma-separated values (CSV) file that has no header row. In this blog post I explain how to deal with this when you’re loading these files with pandas in Python.
The read_csv function in pandas is quite powerful. Compared to many other CSV-loading functions in Python and R, it offers many out-of-the-box parameters to clean the data while loading it.
When you’re dealing with a file that has no header, you can simply set the header parameter to None. pandas then treats the first line as data and labels the columns with integers, starting at 0.
import pandas as pd
pd.read_csv('file.csv', header = None)
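To make the difference visible, here is a minimal sketch that reads the same headerless data twice. The sample rows are made up for illustration and are read from an in-memory string instead of file.csv, so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for 'file.csv'
csv_text = "1,apple,0.5\n2,banana,0.25\n3,cherry,0.75\n"

# Default behaviour: the first row is consumed as the column names
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is kept as data, columns are labelled 0, 1, 2
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_default.shape)       # one row fewer, because it became the header
print(no_header.shape)
print(list(no_header.columns))
```

Without header=None the first record silently disappears into the column names, which is easy to miss in a large file.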
Even better: although the file has no column names, you can specify them yourself by passing a list to the names parameter.
pd.read_csv('file.csv', header = None, names = ['Column 1', 'Column 2', 'Column 3'])
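As a side note, the pandas documentation states that when names is passed explicitly and header is left unspecified, read_csv behaves as if header=None. The sketch below checks this on made-up sample data (the in-memory string is an assumption standing in for file.csv).

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for 'file.csv'
csv_text = "1,apple,0.5\n2,banana,0.25\n"
names = ['Column 1', 'Column 2', 'Column 3']

# Spelling out header=None together with names
explicit = pd.read_csv(io.StringIO(csv_text), header=None, names=names)

# Passing only names: no row is consumed as a header either
implicit = pd.read_csv(io.StringIO(csv_text), names=names)

print(list(explicit.columns))
print(explicit.equals(implicit))
```

Keeping header=None in the call is still a reasonable choice, because it makes the intent obvious to whoever reads the code next.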
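However, typing out every column name isn’t very efficient. Did you know that you can simply pass a prefix, and the columns will be numbered automatically? Be aware that this only works in pandas versions before 2.0: the prefix parameter was deprecated in pandas 1.4 and removed in 2.0.
pd.read_csv('file.csv', header = None, prefix = 'Column ')  # pandas < 2.0 only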
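On pandas 2.0 and later, read_csv no longer accepts prefix. One way to get the same labels is to read the file with header=None and rename the integer columns afterwards. This is a sketch, not the only option: the sample data is made up, and the choice between DataFrame.add_prefix and a list comprehension is a matter of taste.

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for 'file.csv'
csv_text = "1,apple,0.5\n2,banana,0.25\n"

# Option 1: prefix the integer labels 0, 1, 2 after loading
df = pd.read_csv(io.StringIO(csv_text), header=None).add_prefix('Column ')
print(list(df.columns))

# Option 2: build the names yourself, here with numbering that starts at 1
df_one_based = pd.read_csv(io.StringIO(csv_text), header=None)
df_one_based.columns = [f'Column {i + 1}' for i in df_one_based.columns]
print(list(df_one_based.columns))
```

Option 1 reproduces the zero-based numbering that prefix used to generate; option 2 is handy when you prefer names that start counting at 1.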
In huge CSV files, it’s often beneficial to only load specific columns into memory. In most situations, you’d pass a list of column names to the usecols parameter, yet it can also process a list of integers. To get the first and the third column, this is how you’d do it. Remember that Python uses zero-based indexing.
pd.read_csv('file.csv', header = None, usecols = [0, 2], names = ['Column 1', 'Column 3'])
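Here is the same call as a self-contained sketch, run against made-up sample data in an in-memory string instead of file.csv. It shows that only the first and third columns end up in the DataFrame, and that the two names are assigned to them in order.

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for 'file.csv'
csv_text = "1,apple,0.5\n2,banana,0.25\n3,cherry,0.75\n"

# Load only positions 0 and 2, and name the two columns that remain
subset = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    usecols=[0, 2],
    names=['Column 1', 'Column 3'],
)

print(list(subset.columns))
print(subset['Column 3'].tolist())
```

The middle column is never materialised in the DataFrame, which is where the memory saving on wide files comes from.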
By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) was updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Hands-On Data Analysis with Pandas” by Stefanie Molin.
Great success!