In this blog post we’ll look at a small fix: removing leading or trailing whitespace from a DataFrame’s column names. This tends to happen when you work with files that haven’t been properly formatted. There are several ways to fix it.
First, let’s create some dummy data. You can see that the first column has a trailing whitespace, while the second one has a leading whitespace.
import pandas as pd

df = pd.DataFrame(
    {
        'POKEMON ': ['Bulbasaur', 'Charmander', 'Squirtle'],
        ' TYPE': ['Leaf', 'Fire', 'Water'],
    }
)
You can check this by printing the columns:
print(df.columns) # Prints Index(['POKEMON ', ' TYPE'], dtype='object')
If you have hundreds of columns, you can use the following code to list the column names that start or end with a space.
[x for x in df.columns if x.endswith(' ') or x.startswith(' ')]
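The check above only looks for the space character. Here is a minimal sketch of a slightly broader check, assuming your column names might also carry tabs or newlines and that some labels might not be strings at all. The helper name `columns_with_edge_whitespace` and the `demo` DataFrame are made up for this illustration; neither is part of pandas.

```python
import pandas as pd


def columns_with_edge_whitespace(frame):
    """Return the column labels that start or end with any whitespace."""
    # str.strip() without arguments removes spaces, tabs and newlines,
    # so a label that changes when stripped has whitespace on an edge.
    # Non-string labels (e.g. integers) are skipped instead of raising.
    return [c for c in frame.columns if isinstance(c, str) and c != c.strip()]


# Hypothetical data: a trailing space, a leading space, a trailing tab,
# and an integer label that should be ignored.
demo = pd.DataFrame({'POKEMON ': ['Bulbasaur'], ' TYPE': ['Leaf'], 'HP\t': [45], 0: [1]})

print(columns_with_edge_whitespace(demo))
# ['POKEMON ', ' TYPE', 'HP\t']
```

Comparing each label with its stripped version keeps the check short and catches every kind of edge whitespace that `strip()` would remove.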
To strip whitespace from column names, you can use str.strip, str.lstrip and str.rstrip. These are string methods exposed through the .str accessor, which is available on both Series and Index objects.
df.columns = df.columns.str.strip() # Leading and trailing
df.columns = df.columns.str.lstrip() # Leading only
df.columns = df.columns.str.rstrip() # Trailing only
df = df.rename(columns=lambda x: x.strip()) # Slower alternative
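To see the whole thing end to end, here is a small sketch that starts from a badly formatted file. The CSV text is invented and read from memory through `io.StringIO`, standing in for a real file on disk. The `strip_label` helper is also my own addition: it is one way to make the `rename` variant safe when some column labels are not strings.

```python
import io

import pandas as pd

# Hypothetical CSV content with sloppy headers, standing in for a real file.
raw = "POKEMON , TYPE\nBulbasaur,Leaf\nCharmander,Fire\n"

messy = pd.read_csv(io.StringIO(raw))
print(list(messy.columns))  # ['POKEMON ', ' TYPE']

# Strip leading and trailing whitespace from every column name.
clean = messy.copy()
clean.columns = clean.columns.str.strip()
print(list(clean.columns))  # ['POKEMON', 'TYPE']


def strip_label(label):
    """Strip string labels and leave any other label type untouched."""
    return label.strip() if isinstance(label, str) else label


# With mixed labels, a plain lambda calling x.strip() would fail on the integer.
mixed = pd.DataFrame({' A ': [1], 2020: [2]})
mixed = mixed.rename(columns=strip_label)
print(list(mixed.columns))  # ['A', 2020]
```

After cleaning, `clean['POKEMON']` works as expected, while the same lookup on `messy` would raise a `KeyError` because of the trailing space.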
Great success!
By the way, I didn’t come up with this solution entirely by myself. Although I’m grateful you’ve visited this blog post, you should know that I get a lot from websites like StackOverflow and from my collection of coding books. This one by Matt Harrison (on Pandas 1.x!) was updated in 2020 and is a solid primer on Pandas basics. If you want something broader, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.