## What is the Matthews Correlation Coefficient?

The Matthews Correlation Coefficient (MCC) is also known by several other names:

- Phi Coefficient
- Pearson’s Phi Coefficient
- Yule Phi Coefficient

Unlike many other performance metrics (such as the F1 score), the MCC is **regarded as one of the best measures for evaluating class predictions in a binary setting, even under severe class imbalance**, because it takes all four cells of the confusion matrix into account. No single measure fully describes model performance, but the MCC is often a strong default choice.

In essence, the MCC is a correlation coefficient between the *predicted* values and the *true* values, which is why it ranges from -1 to +1. When the predictions are perfect, the MCC is +1. When the model does no better than random prediction, the MCC is 0. Finally, when the predictions and observations completely disagree, the MCC is -1. In most situations, you want this value to be as close to +1 as possible.
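To make the correlation interpretation concrete, here is a minimal sketch that computes a plain Pearson correlation between 0/1 labels and 0/1 predictions. The helper `pearson` and the toy label lists are illustrative assumptions, not part of any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy labels: 4 positives and 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]

perfect = list(y_true)                    # every prediction is correct
inverted = [1 - y for y in y_true]        # every prediction is wrong
uninformative = [1, 1, 0, 1, 1, 0, 0, 0]  # TP = FN = FP = TN = 2

r_perfect = pearson(y_true, perfect)              # +1
r_inverted = pearson(y_true, inverted)            # -1
r_uninformative = pearson(y_true, uninformative)  # 0

print(r_perfect, r_inverted, r_uninformative)
```

The three cases correspond to the three reference values described above: perfect agreement, complete disagreement, and predictions that carry no information about the true labels.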

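The formula for the Matthews Correlation Coefficient, expressed in the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) of the confusion matrix: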

$$
\text{MCC} = \frac{ \mathit{TP} \times \mathit{TN} - \mathit{FP} \times \mathit{FN} } {\sqrt{ (\mathit{TP} + \mathit{FP}) ( \mathit{TP} + \mathit{FN} ) ( \mathit{TN} + \mathit{FP} ) ( \mathit{TN} + \mathit{FN} ) } }
$$
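The formula translates almost directly into code. The following is a minimal sketch, assuming all four sums in the denominator are non-zero; the function name `mcc_from_counts` and the example counts are hypothetical.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """MCC from confusion-matrix counts (assumes a non-zero denominator)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator

# Hypothetical confusion matrix: 6 TP, 3 TN, 1 FP, 2 FN.
score = mcc_from_counts(tp=6, tn=3, fp=1, fn=2)
print(round(score, 3))  # approximately 0.478
```

Here the numerator is 6 × 3 - 1 × 2 = 16 and the denominator is the square root of 7 × 8 × 4 × 5 = 1120, which gives a moderately positive score.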

Interestingly, if any of the four sums in the denominator is 0, the denominator itself is 0 and the MCC is undefined. By convention, the MCC is then set to 0. This happens when:

- One of the classes never occurs in the data (e.g. TP + FN = 0, meaning there are no actual positives)
- All predictions return the same value (e.g. TP + FP = 0, meaning nothing is predicted as positive)
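A minimal sketch of how these edge cases could be handled in code, assuming you adopt the convention of returning 0 when the denominator vanishes. The function name `safe_mcc` and the example counts are hypothetical.

```python
import math

def safe_mcc(tp, tn, fp, fn):
    """MCC from counts, returning 0.0 when any denominator sum is zero."""
    denominator_squared = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denominator_squared == 0:
        # At least one of the four sums is zero, so the ratio is undefined.
        # This sketch falls back to 0.0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denominator_squared)

# No actual positives in the data: TP + FN = 0.
no_positives_in_data = safe_mcc(tp=0, tn=8, fp=2, fn=0)

# The model always predicts the negative class: TP + FP = 0.
constant_prediction = safe_mcc(tp=0, tn=7, fp=0, fn=3)

print(no_positives_in_data, constant_prediction)  # 0.0 0.0
```

Checking the product before taking the square root avoids a division by zero without needing exception handling.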

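It is worth noting that the formula can also be written using ratios only: the positive and negative predictive values (PPV, NPV), the true positive and true negative rates (TPR, TNR), the false negative and false positive rates (FNR, FPR), the false discovery rate (FDR) and the false omission rate (FOR):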

$$
\text{MCC} = \sqrt{\mathit{PPV} \times \mathit{TPR} \times \mathit{TNR} \times \mathit{NPV}} - \sqrt{\mathit{FDR} \times \mathit{FNR} \times \mathit{FPR} \times \mathit{FOR}}
$$
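As a quick sanity check, the ratio-based form can be compared numerically against the count-based form. This is a minimal sketch with hypothetical function names and example counts, assuming no denominator sum is zero.

```python
import math

def mcc_counts(tp, tn, fp, fn):
    """Count-based MCC (assumes a non-zero denominator)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator

def mcc_ratios(tp, tn, fp, fn):
    """Ratio-based MCC built from the eight confusion-matrix rates."""
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    tpr = tp / (tp + fn)   # true positive rate (recall)
    tnr = tn / (tn + fp)   # true negative rate (specificity)
    npv = tn / (tn + fn)   # negative predictive value
    fdr = fp / (tp + fp)   # false discovery rate = 1 - PPV
    fnr = fn / (tp + fn)   # false negative rate = 1 - TPR
    fpr = fp / (tn + fp)   # false positive rate = 1 - TNR
    f_or = fn / (tn + fn)  # false omission rate = 1 - NPV
    return math.sqrt(ppv * tpr * tnr * npv) - math.sqrt(fdr * fnr * fpr * f_or)

# Hypothetical confusion matrix: 6 TP, 3 TN, 1 FP, 2 FN.
from_counts = mcc_counts(tp=6, tn=3, fp=1, fn=2)
from_ratios = mcc_ratios(tp=6, tn=3, fp=1, fn=2)

print(math.isclose(from_counts, from_ratios))  # True
```

Both forms agree up to floating-point rounding, so which one to use is mostly a matter of which quantities you already have at hand.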

Further reading: