It’s been a while since I took a machine learning project from the exploratory phase through to model development. I ran into a real newbie error and decided to write it down for my future self, and for you of course.
Here’s the code I wrote, hoping to get a single value back: the Area Under the Curve.
from sklearn.metrics import auc
auc(y_true, y_pred)
But that’s not what happened. This seemingly obvious code produced an error:
x is neither increasing nor decreasing […]
It seems that I hadn’t read the documentation for auc() properly: “Compute Area Under the Curve (AUC) using the trapezoidal rule […] This is a general function, given points on a curve.”
The function accepts x and y coordinates that are used to compute the Area Under the Curve. Furthermore, the x coordinates should be “either monotonic increasing or monotonic decreasing.” This explains the error!
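To make the difference concrete, here is a minimal sketch of what auc() does with real curve coordinates, and what happens when it gets labels and scores instead. The arrays below are made-up illustration values, not data from my project.

```python
import numpy as np
from sklearn.metrics import auc

# Points on the straight line y = x between 0 and 1 (illustrative values).
x = np.array([0.0, 0.5, 1.0])
y = np.array([0.0, 0.5, 1.0])

# x is monotonically increasing, so auc() can apply the trapezoidal rule.
area = auc(x, y)
print(area)  # 0.5, the area of the triangle under y = x

# Hypothetical labels and scores, passed the way I did by mistake.
labels = np.array([0, 1, 1, 0])
scores = np.array([0.1, 0.8, 0.4, 0.35])

# The labels go up and then down, so they are not monotonic
# and auc() refuses to treat them as x coordinates.
try:
    auc(labels, scores)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

The first call works because x really is a set of ordered coordinates; the second fails for the same reason my original code did.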
These x and y coordinates, for each threshold, can easily be calculated using the roc_curve function. After coding it as follows, I got what I expected.
from sklearn.metrics import auc, roc_curve
fpr, tpr, thresholds = roc_curve(y_true, y_pred, pos_label=1)
auc(fpr, tpr)
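If you want to see why this version works, the sketch below inspects what roc_curve returns. The labels and scores are small made-up values chosen for illustration; the point is only that the false positive rates come back in non-decreasing order, which is what auc() needs for its x coordinates.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Illustrative data: true labels and predicted scores for four samples.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label=1)

# One (fpr, tpr) point per threshold.
same_length = len(fpr) == len(tpr) == len(thresholds)

# The false positive rate never decreases as the threshold is lowered,
# so it is a valid x axis for auc().
fpr_is_monotonic = bool(np.all(np.diff(fpr) >= 0))

print(fpr, tpr, fpr_is_monotonic)
```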
Finally, there is a shortcut. You don’t need to calculate the ROC curve and pass the coordinates for each threshold to the auc function. There is a dedicated function that does it all in one line of code: roc_auc_score().
from sklearn.metrics import roc_auc_score
roc_auc_score(y_true, y_pred)
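As a quick sanity check, the sketch below compares the two routes on a small made-up binary example. The data is illustrative only; with it, both approaches should return the same number.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Illustrative data: true labels and predicted scores for four samples.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Route 1: build the ROC curve, then integrate it.
fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=1)
two_step = auc(fpr, tpr)

# Route 2: the one-line shortcut.
shortcut = roc_auc_score(y_true, y_score)

print(two_step, shortcut)  # both 0.75 for this data
```

Here 3 of the 4 positive/negative pairs are ranked correctly (only the positive scored 0.35 falls below the negative scored 0.4), which is where the 0.75 comes from.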
Great success!