The ROC curve is a widely used performance metric for classification problems. In this article, we build up the evaluation method from scratch: what a curve means in this context, the definition of the ROC curve, the Area Under the ROC curve (AUC), and finally its variants.
In the scope of classification performance measurement, a curve is a line graph showing how a model performs at varied thresholds.
Being a line graph, it consists of 2 axes and a line running between them. The 2 axes usually represent simple ratios taken from the confusion matrix (e.g. Precision, Recall, or the False Positive Rate), while the line is plotted by computing one point per threshold and stitching those points together.
For simplicity, we only examine curves for binary labels. In the case of multi-class classification, a curve should be drawn for each individual label.
There are many types of curves, of which the most popular are the ROC and the Precision-Recall curves. Usually, they differ only in which ratios are chosen for the 2 axes. A minimal sketch of the threshold-sweeping idea is shown below.
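To make this concrete, here is a small sketch (the y_true and y_score arrays are hypothetical, made up purely for illustration) that computes one (Recall, Precision) point per threshold; connecting those points gives a Precision-Recall curve, and swapping in other confusion-matrix ratios gives other curves.
import numpy as np

# Hypothetical toy data: true binary labels and model scores.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6])

points = []
for threshold in sorted(set(y_score), reverse=True):
    y_pred = (y_score >= threshold).astype(int)  # Positive when the score reaches the threshold
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    points.append((recall, precision))

print(points)  # each pair is one point; the curve stitches them together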
The ROC curve
The ROC (Receiver Operating Characteristic) curve is the curve with its 2-axes being the True Positive Rate (TPR) and False Positive Rate (FPR).
Be reminded that:
- True Positive Rate (TPR), also known as Recall or Sensitivity: TPR = TP / (TP + FN)
- False Positive Rate (FPR): FPR = FP / (FP + TN)
A ROC curve always has its 2 ends at (0, 0) and (1, 1): at the highest threshold every sample is predicted Negative (TPR = FPR = 0), and at the lowest threshold every sample is predicted Positive (TPR = FPR = 1), as the sketch below verifies.
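As a quick sanity check, here is a minimal sketch (toy labels and scores are again made up for illustration) that computes TPR and FPR at a threshold above every score and at one below every score, which yields exactly the two endpoints.
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6])

def roc_point(threshold):
    # Label as Positive when the score reaches the threshold.
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)  # (FPR, TPR)

print(roc_point(threshold=1.1))  # (0.0, 0.0): everything predicted Negative
print(roc_point(threshold=0.0))  # (1.0, 1.0): everything predicted Positive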
To draw a ROC curve in Python:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# y_true: ground-truth binary labels, y_score: predicted scores or probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr)
plt.show()
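For instance, on the hypothetical toy arrays from earlier, roc_curve returns the coordinate arrays that plt.plot then joins into the curve:
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr)         # x-coordinates of the curve
print(tpr)         # y-coordinates of the curve
print(thresholds)  # the score threshold that produced each point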
The Area Under the ROC curve
Comparing models by eyeballing their curves is vague and inefficient, so we need another means that makes comparisons simpler and clearer. The answer, intuitively, is the area under those curves.
The Area Under the ROC curve (AUC) is a quantitative measurement of model performance. It condenses the whole ROC curve into a single number and is often used for model comparison. An AUC of 1 implies a perfect model, while a value close to 0.5 means the model ranks samples no better than random guessing.
From another point of view, the AUC equals the probability that a randomly chosen Positive sample is ranked higher than a randomly chosen Negative one, as discussed in David M. Green and John A. Swets's 1966 work on signal detection theory.
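Under this interpretation, the AUC can be estimated directly by comparing every Positive-Negative pair of scores. The sketch below (again using made-up toy scores) counts the pairs where the Positive sample is ranked higher, giving ties half credit.
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6])

pos_scores = y_score[y_true == 1]
neg_scores = y_score[y_true == 0]

wins = 0.0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1.0   # Positive ranked above Negative
        elif p == n:
            wins += 0.5   # ties count as half

auc = wins / (len(pos_scores) * len(neg_scores))
print(auc)  # equals the area under the ROC curve for these scores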
In Python, we can get the AUC from the actual labels and predicted scores with the function below from sklearn. For more parameters, refer to its documentation.
from sklearn.metrics import roc_auc_score

roc_auc_score(y_true, y_score)
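For example, on the same hypothetical toy arrays used above, the call returns the same value as the pairwise-ranking computation:
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6])

print(roc_auc_score(y_true, y_score))  # 0.666..., matching the pairwise estimate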
Properties and Analysis
The ROC curve:
The AUC, as compared to the original ROC curve:
An important note about the ROC curve (and hence also the AUC), one worth pointing out separately, is that it is weak when the dataset is highly imbalanced in favor of the Negative samples. Such cases are quite common in practice, e.g. cancer detection and spam filtering.
If the proportion of Negative samples is huge, the number of True Negatives will also be much larger than the number of False Positives. Since FPR = FP / (FP + TN), the rate stays deceptively small even when the model produces many False Positives, which makes the whole ROC space unreliable (see the example below).
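A small worked example (with made-up confusion-matrix counts) illustrates the problem: with an overwhelming Negative majority, even a large absolute number of False Positives barely moves the FPR, while Precision collapses.
# Hypothetical counts for a heavily imbalanced dataset:
tp, fn = 10, 0          # 10 actual Positives, all detected
fp, tn = 1_000, 99_000  # 100,000 actual Negatives, 1,000 of them wrongly flagged Positive

fpr = fp / (fp + tn)        # 0.01 -> the ROC still looks excellent
precision = tp / (tp + fp)  # ~0.0099 -> only ~1% of flagged samples are real Positives
print(fpr, precision)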
Variants of the ROC
Note that both the CROC (Concentrated ROC) and CC (Cost Curves) are insensitive to class imbalance.
- Wikipedia’s page about the ROC curve: link
- Jorge M. Lobo et al.’s paper on the weaknesses of AUC: link
- Tom Fawcett’s An introduction to ROC analysis: link
- Signal detection theory and psychophysics by David M. Green and John A. Swets: link
- Takaya Saito and Marc Rehmsmeier’s research on different metrics for imbalanced datasets: link
- Chris Drummond and Robert C. Holte’s detailed paper on Cost curves: link