This article covers several ways to drop the final line of a CSV file in Python when loading it with Pandas’ read_csv function.
Remove final row(s) after loading the file
The final row of a Pandas DataFrame can be removed in several ways. The following lines of code show four of them: plain slicing, the iloc indexer, head() and drop(). Each one returns a new DataFrame and leaves df itself unchanged, so assign the result if you want to keep it.
import pandas as pd
df = pd.read_csv(...)
df[:-1]
df.iloc[:-1, :]
df.head(-1)
df.drop(df.index[len(df) - 1])
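To make the four variants concrete, here is a minimal, self-contained sketch. The CSV content is made up for illustration (a small table with a "total" footer line) and is read from an in-memory string instead of a real file.

```python
import io

import pandas as pd

# Hypothetical CSV whose last line is a footer we want to get rid of.
csv_text = "id,value\n1,10\n2,20\n3,30\ntotal,60\n"
df = pd.read_csv(io.StringIO(csv_text))

by_slice = df[:-1]               # plain row slice
by_iloc = df.iloc[:-1, :]        # position-based indexer
by_head = df.head(-1)            # everything except the last row
by_drop = df.drop(df.index[-1])  # drop by the label of the last row

# All four produce the same three-row DataFrame.
print(by_slice.equals(by_iloc) and by_slice.equals(by_head) and by_slice.equals(by_drop))
print(by_slice.dtypes)
```

Note that the footer was still parsed together with the data: in this sketch the id column is inferred as text because of the "total" entry, and dropping the row afterwards does not convert it back to a numeric type.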
Remove final rows while parsing the file
But what if you want the last line to be excluded while parsing, so that it never ends up in the DataFrame at all? That’s where the skipfooter parameter of read_csv() comes into play. Here’s how the documentation describes this parameter:
skipfooter : int, default 0
Number of lines at bottom of file to skip (Unsupported with engine='c').
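As you can see from the description, skipping the last row of a CSV is unsupported when you parse the file with the C engine, which is the default. The C engine is faster but has fewer features, and skipfooter is one of the features it lacks. If you pass skipfooter without choosing an engine, pandas falls back to the Python engine and emits a ParserWarning; passing engine='python' explicitly avoids the warning.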
df = pd.read_csv(..., skipfooter=1, engine='python')
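As a small illustration, the sketch below applies skipfooter to a made-up CSV held in an in-memory string; the data and column names are assumptions chosen for the example.

```python
import io

import pandas as pd

# Hypothetical CSV whose last line is a footer, not data.
csv_text = "id,value\n1,10\n2,20\n3,30\ntotal,60\n"

# engine="python" is passed explicitly because the C engine
# does not support skipfooter.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(df)
print(df.dtypes)
```

Because the footer is discarded before type inference, both columns should come out as integers here, which is not the case when the row is dropped after loading.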
Alternatives
The Python engine is considerably slower than the C engine. So what can you do when using it makes loading your files extremely slow?
- Improve loading speed by (1) specifying the dtypes in advance using dtype, (2) specifying the header row using header, and (3) reading only the columns you need using usecols.
- If you know the number of rows within the file, you can use the nrows parameter and simply subtract the number of rows you don’t want to read in. nrows counts data rows only (the header line is not included) and works with the C engine.
- Split the file into several smaller files, and apply the skipfooter parameter only when reading the last one.
- Load the full file and use one of the methods described in the first section of this article.
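The nrows idea from the list above can be sketched as follows. The helper name read_csv_without_footer is hypothetical, and the sketch assumes a single header line, no blank lines, and no quoted fields that span several lines, so that physical lines and CSV rows match one to one.

```python
import os
import tempfile

import pandas as pd


def read_csv_without_footer(path, footer_lines=1, **kwargs):
    """Hypothetical helper: skip trailing lines while staying on the C engine."""
    # First pass: count physical lines without parsing them.
    with open(path, "rb") as fh:
        total_lines = sum(1 for _ in fh)
    # Assumption: exactly one header line precedes the data.
    n_data_rows = max(total_lines - 1 - footer_lines, 0)
    # Second pass: let the C engine stop before the footer.
    return pd.read_csv(path, nrows=n_data_rows, engine="c", **kwargs)


# Demo on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", newline="") as fh:
        fh.write("id,value\n1,10\n2,20\n3,30\ntotal,60\n")
    # Declaring dtypes up front is the first tip from the list.
    df = read_csv_without_footer(path, dtype={"id": "int64", "value": "int64"})

print(df)
```

This reads the file twice, but the first pass only counts lines, so it may still be cheaper than switching the whole parse to the Python engine. Whether it pays off depends on the file, so it is worth timing on your own data.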