
Calculating Mutual Information in Python

A while ago, I was creating features from text documents and wanted to check which n-grams correlated with the target classification. A common way to do this is by calculating the mutual information between the n-gram and the classification. In this blog post, I explain how you can calculate the mutual information between two variables in Python using SciPy and scikit-learn.

All quoted and copied definitions are taken from this great book on information theory. I highly recommend it and it’s freely available.


Entropy

Let’s start with entropy, which is “a measure of the uncertainty of a random variable”.

Manually calculating the entropy of a binary variable can be done as follows.

import numpy as np

def entropy(p):
    # binary entropy: p is the probability of one of the two outcomes
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

If we use the log with base 2, the entropy is expressed in bits. It is perfectly reasonable to use another base, such as e. The entropy calculated with a natural log is expressed in nats.
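To make the difference between the two units concrete, here is a small sketch (the function names are my own) that computes the binary entropy in both bits and nats:

```python
import numpy as np

def entropy_bits(p):
    # binary entropy using log base 2 -> result in bits
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def entropy_nats(p):
    # binary entropy using the natural log -> result in nats
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

print(entropy_bits(0.5))  # a fair coin has an entropy of exactly 1 bit
print(entropy_nats(0.5))  # the same uncertainty is ln(2), about 0.693 nats
```

Dividing a value in nats by ln(2) converts it back to bits.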

Of course, you don’t need to write your own function: SciPy ships with a handy entropy function.

from scipy import stats
stats.entropy([0.95, 0.05], base=2)
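As a quick sanity check (my own snippet, not from the original post), the manual calculation and SciPy agree on the same distribution:

```python
import numpy as np
from scipy import stats

def entropy(p):
    # binary entropy in bits, as defined above
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

manual = entropy(0.95)
scipy_result = stats.entropy([0.95, 0.05], base=2)
print(manual, scipy_result)  # both are about 0.286 bits
```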

Relative entropy

The relative entropy is a measure of the distance between two distributions: “The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p.”

First, let’s calculate it manually:

def relative_entropy(p, q):
    # Kullback-Leibler divergence D(p||q), in bits
    return sum(pi * np.log2(pi / qi) for pi, qi in zip(p, q))

relative_entropy([0.95, 0.05], [0.2, 0.8])

The same result can be obtained with SciPy’s entropy function: when you pass a second distribution, the function computes the relative entropy between the two.

stats.entropy(pk = [0.95,0.05], qk = [0.2,0.8], base=2)

Keep in mind that the relative entropy is not symmetric: flipping p and q will yield a different result.
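We can verify this asymmetry directly by swapping the arguments:

```python
from scipy import stats

p = [0.95, 0.05]
q = [0.2, 0.8]

# D(p||q) and D(q||p) are different, so relative entropy is not a true distance
d_pq = stats.entropy(pk=p, qk=q, base=2)
d_qp = stats.entropy(pk=q, qk=p, base=2)
print(d_pq)  # about 1.94 bits
print(d_qp)  # about 2.75 bits
```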

Mutual Information or Information Gain

The information gain, on the other hand, is “a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other.”

In formula form, the information gain is the relative entropy between the joint probability mass function and the product of the marginal probability mass functions: I(X; Y) = D(p(x, y) || p(x)p(y)). This is also known as mutual information.

We can rewrite this as I(X; Y) = H(X) − H(X|Y), which is exactly the definition of information gain we gave earlier: the reduction in the uncertainty of X due to the knowledge of Y.

In other words: mutual information and information gain are the same quantity. “Mutual information” emphasizes the dependency between two variables, while “information gain” emphasizes the reduction of entropy.

To demonstrate, I created two arrays. They are balanced (50/50) and they are exactly the same. First, I calculate the entropy, expressed in nats: it is 0.69. If you then calculate the information gain or the mutual information, you will see it is also 0.69: by knowing b you also know a, so the conditional entropy drops to 0 and the information gain equals the full entropy.

from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

a = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
b = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 1])

print(stats.entropy([0.5, 0.5]))  # entropy of 0.69, expressed in nats
print(mutual_info_classif(a.reshape(-1, 1), b, discrete_features=True))  # mutual information of 0.69, in nats
print(mutual_info_score(a, b))  # information gain of 0.69, in nats

That’s why mutual information is a great method for feature selection: it tells you how much you learn about your target variable by looking at another variable.
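To sketch how this could look in practice, here is a toy feature-selection example with made-up data (not the n-gram features from the intro): one feature that perfectly tracks the target and two pure-noise features.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = np.array([0, 1] * 100)            # balanced binary target, H(y) = ln(2)
X = np.column_stack([
    y,                                 # perfectly informative feature
    rng.integers(0, 2, size=200),      # pure noise
    rng.integers(0, 2, size=200),      # pure noise
])

scores = mutual_info_classif(X, y, discrete_features=True)
print(scores)  # the informative feature scores about 0.69 nats, the noise near 0
```

mutual_info_classif scores every column of X against y, so you can rank candidate features by how much they tell you about the target and keep the top ones.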

Great success! You now know how to calculate mutual information in Python.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

