Skip final row when reading a CSV with Pandas

This article walks through several ways to drop the final line of a CSV file in Python when loading it with Pandas’ read_csv function.

Remove final row(s) after loading the file

Removing the final row from a Pandas DataFrame after loading takes a single expression. The following lines of code show four ways to do it: by slicing, using the iloc indexer, using head() and using drop(). Each one returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.

import pandas as pd

df = pd.read_csv(...)

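# Each expression returns a new DataFrame without the last row.
df[:-1]                # plain slice
df.iloc[:-1, :]        # positional indexer: all rows but the last, all columns
df.head(-1)            # everything except the last 1 row
df.drop(df.index[-1])  # drop by the label of the last row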
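To make the four options concrete, here is a minimal sketch that runs them against a small in-memory CSV. The sample data and the variable names are made up for illustration; the final TOTAL line stands in for whatever footer your file has.

```python
import io

import pandas as pd

# Hypothetical sample data: three data rows followed by a footer line.
csv_text = "id,value\n1,10\n2,20\n3,30\nTOTAL,60\n"
df = pd.read_csv(io.StringIO(csv_text))

by_slice = df[:-1]
by_iloc = df.iloc[:-1, :]
by_head = df.head(-1)
by_drop = df.drop(df.index[-1])

# None of these modify df in place: df keeps its footer row.
print(len(df), len(by_slice), len(by_iloc), len(by_head), len(by_drop))
```

One difference worth knowing: drop() works on index labels, so if the index contains duplicate labels it removes every row carrying the last row's label, while the three positional variants always remove exactly one row.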

Remove final rows while parsing the file

But what if you want to read a CSV without loading the last line into the DataFrame at all? That’s where the skipfooter parameter of read_csv() comes into play. Here’s how the documentation describes this parameter:

skipfooter : int, default 0
Number of lines at bottom of file to skip (Unsupported with engine='c').

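As the description shows, skipfooter is not supported by the C engine. The C engine is faster but has fewer features, and skipfooter is one of the features it lacks. If you pass skipfooter without choosing an engine, Pandas falls back to the Python engine and emits a ParserWarning; passing engine='python' explicitly avoids the warning.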

df = pd.read_csv(..., skipfooter=1, engine='python')
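A small runnable sketch of the skipfooter call, again using made-up in-memory data in place of a real file. The engine is set to 'python' explicitly so the parser does not have to fall back on its own.

```python
import io

import pandas as pd

# Hypothetical sample data: three data rows followed by a footer line.
csv_text = "id,value\n1,10\n2,20\n3,30\nTOTAL,60\n"

# skipfooter=1 drops the last line while parsing; it requires the Python engine.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
print(df)
```

Because the footer never reaches the parser's type inference, the id column comes out as integers here instead of strings, which is a side benefit over dropping the row after loading.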

Alternatives

The Python engine is considerably slower than the C engine. So what can you do when using it makes loading your files too slow?

Improve loading speed by (1) specifying the dtypes in advance using dtype, (2) specifying the header row using header, and (3) loading only the columns you need using usecols.
If you know the number of rows in the file, you can use the nrows parameter: set it to the number of data rows minus the number of footer rows you don’t want to read. nrows works with the C engine.
Split the file into chunks, and apply the skipfooter parameter only to the last chunk.
Load the full file and use one of the methods described in the first section of this article.
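The nrows option from the list above can be sketched as follows. The helper read_csv_without_footer and its parameters are hypothetical names for this example, not part of Pandas. It assumes every record sits on exactly one line (no quoted fields with embedded newlines) and that the file contains no blank lines, since the row count comes from a plain line count.

```python
import io

import pandas as pd

# Hypothetical sample data: one header line, three data rows, one footer line.
csv_text = "id,value\n1,10\n2,20\n3,30\nTOTAL,60\n"


def read_csv_without_footer(buffer, footer_rows=1, header_rows=1, **kwargs):
    """Count the lines first, then let the C engine stop before the footer."""
    total_lines = sum(1 for _ in buffer)
    buffer.seek(0)  # rewind so read_csv starts from the top again
    n_data_rows = total_lines - header_rows - footer_rows
    return pd.read_csv(buffer, nrows=n_data_rows, engine="c", **kwargs)


df = read_csv_without_footer(io.StringIO(csv_text))
print(df)
```

This reads the input twice, once to count and once to parse, so whether it beats the Python engine depends on your file; it is worth timing both on real data.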

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.
