I was curious how many ways there are to calculate the mode of a 1-D NumPy array in Python. Apparently, quite a lot. Although all roads lead to Rome, some will take you there faster. That goes for many things in computer science, including this seemingly trivial question.
Let’s load the packages I will be using throughout this blog post, and create a simple array with dummy data where the mode is clearly 1.
import numpy as np
from scipy import stats
from collections import Counter

a = np.array([1, 2, 3, 4, 1, 3, 1, 5, 1, 6])
What triggered this quest for the fastest mode function was that SciPy seemed to be extremely slow at it. The mean run time of SciPy's mode() was 112 µs for this array, but it took more than a couple of seconds on my real data set.
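For reference, here is a minimal sketch of the SciPy baseline. One assumption to flag: depending on the SciPy version, stats.mode() returns either length-1 arrays or plain scalars for a 1-D input, so I flatten the result with np.ravel() to cover both cases. The variable names are mine.

```python
import numpy as np
from scipy import stats

a = np.array([1, 2, 3, 4, 1, 3, 1, 5, 1, 6])

# stats.mode() returns a result object with .mode and .count fields
result = stats.mode(a)

# Older SciPy releases give length-1 arrays here, newer ones give scalars;
# np.ravel() turns either into a 1-D array we can index.
mode_value = int(np.ravel(result.mode)[0])
mode_count = int(np.ravel(result.count)[0])

print(mode_value, mode_count)
```

In a notebook you would wrap the call in %timeit stats.mode(a) to get a timing of your own.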
Okay, 112 µs. Can we go faster? Sure we can! SciPy also has a find_repeats function, which finds the values that occur more than once in the array and returns them in sorted order together with their counts. This is way faster: 22 µs. That's surprising, because find_repeats casts the values of the array to float, so I had to manually recast the result to int.
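A rough sketch of how the find_repeats route could look. The helper name mode_from_repeats is hypothetical. Newer SciPy releases deprecate find_repeats in favour of np.unique(), so the sketch falls back to an equivalent NumPy filter when the function is missing. The int() cast assumes integer data, as in the dummy array.

```python
import numpy as np
from scipy import stats


def mode_from_repeats(arr):
    """Hypothetical helper: mode of a 1-D array via its repeated values."""
    find_repeats = getattr(stats, "find_repeats", None)
    if find_repeats is not None:
        # Returns the repeated values (as floats) and how often each occurs
        values, counts = find_repeats(arr)
    else:
        # Fallback for SciPy versions that no longer ship find_repeats:
        # keep only the values that occur more than once
        values, counts = np.unique(arr, return_counts=True)
        keep = counts > 1
        values, counts = values[keep], counts[keep]

    if len(values) == 0:
        raise ValueError("no value occurs more than once")

    # Recast to int, because find_repeats hands back floats
    return int(values[np.argmax(counts)])


a = np.array([1, 2, 3, 4, 1, 3, 1, 5, 1, 6])
print(mode_from_repeats(a))
```

Note the edge case: if every value occurs exactly once, there are no repeats to pick from, so the sketch raises instead of guessing.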
NumPy’s unique() & argmax()
I created a lambda function that takes the tuple of unique values and their respective counts returned by np.unique(). It takes the argmax() of the counts and uses the returned position as the index into the values. The result: only 18 µs.
mymode = lambda x: x[0][x[1].argmax()]
%timeit mymode(np.unique(a, return_counts=True))
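The same idea as a named function, which I find easier to read than the lambda. The function name unique_mode is my own. It relies on np.unique() returning the values in sorted order, so when several values tie for the highest count, argmax() picks the smallest of them.

```python
import numpy as np


def unique_mode(arr):
    """Mode of a 1-D array via np.unique() and argmax()."""
    # values is sorted; counts[i] is how often values[i] occurs
    values, counts = np.unique(arr, return_counts=True)
    # argmax() returns the position of the first maximum count
    return values[counts.argmax()]


a = np.array([1, 2, 3, 4, 1, 3, 1, 5, 1, 6])
print(unique_mode(a))                        # 1
print(unique_mode(np.array([2, 2, 1, 1])))   # tie: smallest value wins, so 1
```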
We can also try the “boring” statistics package from the standard library, which has a mode() function. It’s more than twice as fast as the previous solution: 8 µs.
Watch out: on Python versions before 3.8 you’ll run into an error if there is no unique mode, e.g. when two values are equally common. From Python 3.8 onward, mode() returns the first of the tied values it encounters instead.
StatisticsError: no unique mode; found 2 equally common values
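Here is a small sketch of the tie behaviour, assuming Python 3.8 or later, where statistics.multimode() is available. On older versions mode() raises the StatisticsError shown above, which is why I wrapped that call in a try/except. The lists are made-up dummy data.

```python
import statistics

data = [1, 2, 3, 4, 1, 3, 1, 5, 1, 6]
print(statistics.mode(data))  # 1

# Two values tie for most common
tied = [1, 1, 2, 2, 3]

# multimode() (Python 3.8+) returns every value with the highest count
print(statistics.multimode(tied))  # [1, 2]

# mode() on tied data: raises before Python 3.8, returns the first mode after
try:
    print(statistics.mode(tied))
except statistics.StatisticsError as exc:
    print("no unique mode:", exc)
```

If ties matter for your data, multimode() is the safer call, because it makes them visible instead of silently picking one.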
Here is a cool solution that uses Counter() from collections. It’s a dictionary subclass: a collection where elements are stored as dictionary keys and their counts are stored as dictionary values. That’s handy, because those counts are only one step away from the mode.
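A minimal sketch of that last step, using Counter's most_common() method. The helper name counter_mode is my own, and I use a plain list here to keep the example free of dependencies. A 1-D NumPy array should work as well, since Counter accepts any iterable of hashable items.

```python
from collections import Counter


def counter_mode(values):
    """Mode of an iterable via collections.Counter."""
    # most_common(1) returns a one-element list of (value, count) pairs
    value, _count = Counter(values).most_common(1)[0]
    return value


data = [1, 2, 3, 4, 1, 3, 1, 5, 1, 6]

counts = Counter(data)
print(counts[1])           # 4, because the value 1 occurs four times
print(counter_mode(data))  # 1
```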
This solution is ridiculously faster than SciPy’s mode(): just north of 6 µs. That’s more than 17 times faster than what we started with.
There we go. From now on, we don’t use SciPy’s mode() for a 1-D array. Another problem solved!