Precision-Recall curve: an overview

Test your knowledge

Precision-Recall curve: an overview - Quiz 1

1. What is Recall?

2. What is Precision?

3. Suppose you have a dataset A with 10 Positive and 1000 Negative data points. To balance it, you duplicate each Positive sample 100 times to get a dataset B that contains 1000 Positive and 1000 Negative data points.

You use the same model to make predictions on these 2 datasets. How do their Precision values compare?

4. What must a model provide in order to be assessed by plotting a curve (e.g. ROC curve, PR curve)?

5. Which of the following are suitable for imbalanced datasets? (Assume that the datasets have far fewer Positive samples than Negative ones, and that we want to emphasize the performance on the Positive samples.) Choose all that apply.


As we stated in the previous discussion, the ROC curve may not be a good evaluation metric when the data is highly imbalanced in favor of the Negative class. In this blog post, we introduce an alternative to the ROC curve: the Precision-Recall curve (PR-curve), which is a more reliable measure when Positive samples are rare.


Like the other curves, the Precision-Recall curve is simply a line graph showing the performance of a predictive model over a dataset as the decision threshold varies. Although curves can be used for multi-class classification problems, they are most useful and easiest to interpret with binary labels, so we assume binary labels throughout this discussion.

A Precision-Recall curve differs from the other curves in its choice of the 2 axes: Recall (usually on the x-axis) and Precision (usually on the y-axis), as its name implies.

Precision and Recall are two measures computed from the Confusion Matrix, where TP, FP and FN are the counts of True Positives, False Positives and False Negatives:

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}

An example of a PR-curve

Note that Recall is just another name for the True Positive Rate we used in the ROC curve.
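As a quick check of the two formulas, here is a minimal sketch in plain Python. The function name precision_recall and the counts are made up for illustration, and returning 0 when a denominator is 0 is just one possible convention.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: of all samples predicted Positive, how many are truly Positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall (True Positive Rate): of all truly Positive samples, how many were found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(precision, recall)
```

With these counts, 8 of the 10 predicted Positives are correct (Precision 0.8), and 8 of the 12 real Positives are found (Recall of about 0.67).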

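To draw a Precision-Recall curve in Python, we can use a pre-built function from sklearn:

import matplotlib.pyplot as plt
from sklearn import metrics

# y_true: ground-truth binary labels
# y_pred: predicted scores or probabilities of the Positive class (not hard 0/1 labels)
precision, recall, thresholds = metrics.precision_recall_curve(y_true, y_pred)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()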
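To see where the points of the curve come from, the following is a rough sketch of the threshold sweep in plain Python. It is not sklearn's implementation: the helper name pr_points and the toy data are invented here, the sketch assumes at least one Positive label, and sklearn orders its arrays differently and appends an extra end point at Recall 0, so the outputs are not meant to match element for element.

```python
def pr_points(y_true, y_score):
    """Return (recall, precision) pairs, one per distinct score threshold."""
    total_pos = sum(y_true)  # assumes at least one Positive sample
    points = []
    # Use each distinct score as a threshold, from highest to lowest.
    for threshold in sorted(set(y_score), reverse=True):
        # A sample is predicted Positive when its score reaches the threshold.
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Toy labels and scores, invented for this example.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
points = pr_points(y_true, y_score)
for recall, precision in points:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Lowering the threshold can only keep or increase Recall, while Precision may go up or down, which is why PR-curves often look jagged.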

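To get the Area Under the PR-curve, there are 2 ways: approximate it with the trapezoidal rule, or use the average precision score. The two usually give slightly different values, because the average precision is a step-wise sum that does not interpolate linearly between the points of the curve.

# method 1: trapezoidal rule over the points of the curve
precision, recall, thresholds = metrics.precision_recall_curve(y_true, y_pred)
metrics.auc(recall, precision)

# method 2: average precision (step-wise sum, no linear interpolation)
metrics.average_precision_score(y_true, y_pred)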
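The difference between the two methods is easier to see when both are written out by hand. The sketch below is an illustration in plain Python, not sklearn's code: the function names and the toy PR points are assumptions made for this example, and the points are taken to be sorted by increasing Recall.

```python
def trapezoid_area(recall, precision):
    """Area under the PR-curve with linear interpolation between points."""
    area = 0.0
    for i in range(1, len(recall)):
        area += (recall[i] - recall[i - 1]) * (precision[i] + precision[i - 1]) / 2
    return area


def average_precision(recall, precision):
    """Step-wise sum: each increase in Recall is weighted by the Precision reached."""
    ap = 0.0
    for i in range(1, len(recall)):
        ap += (recall[i] - recall[i - 1]) * precision[i]
    return ap


# Toy PR points sorted by increasing Recall, starting from the
# conventional (Recall = 0, Precision = 1) point. Values are illustrative.
recall = [0.0, 0.5, 0.5, 1.0, 1.0]
precision = [1.0, 1.0, 0.5, 2 / 3, 0.5]

area_trapezoid = trapezoid_area(recall, precision)
area_ap = average_precision(recall, precision)
print(round(area_trapezoid, 4), round(area_ap, 4))
```

On these toy points the trapezoidal area is about 0.7917 and the average precision is about 0.8333, so the two numbers should not be mixed when comparing models.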


At first glance, the PR-curve cares more about the Positive class than the Negative one: the numerators of both Precision and Recall are the count of True Positives, and the True Negatives appear in neither formula. This preference proves useful for datasets with a minority of Positive labels.

To illustrate, let us compare PR-curves on balanced and imbalanced datasets.

In the balanced dataset, the Positive samples make up 50% of the whole, while in the imbalanced dataset they make up only 10%. For each test case, we will draw 4 PR-curves representing the performance of 4 classifiers: Random, Moderate, Good and Perfect.

Demonstration of different classifiers. The blue dots represent the (real) Negative samples while the orange stars are the Positives. A better classifier separates the 2 labels more clearly.

And here are the resulting curves:

PR-curves of classifiers of different quality on the balanced and imbalanced datasets

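Firstly, we can see that the random classifier is represented by a horizontal line whose y-axis value equals the proportion of Positive samples. This is expected: the scores of a random classifier are unrelated to the labels, so at any threshold the samples it predicts as Positive contain roughly the same share of real Positives as the whole dataset. The Precision therefore stays constant however the Recall changes.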
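The constant Precision of a random classifier can be checked with a small simulation. This is a sketch under assumed settings (100,000 synthetic samples, 10% Positive, uniformly random scores, fixed seed), none of which come from the experiment in the figures.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n_samples = 100_000
positive_rate = 0.1  # share of Positive samples, as in the imbalanced dataset

# Labels are drawn with the chosen Positive rate; scores ignore the labels entirely.
y_true = [1 if random.random() < positive_rate else 0 for _ in range(n_samples)]
y_score = [random.random() for _ in range(n_samples)]


def precision_at(threshold):
    """Precision when every sample scoring at or above the threshold is predicted Positive."""
    predicted_positive = [t for t, s in zip(y_true, y_score) if s >= threshold]
    return sum(predicted_positive) / len(predicted_positive)


precisions = {th: precision_at(th) for th in (0.1, 0.5, 0.9)}
for th, p in precisions.items():
    print(f"threshold={th}: precision={p:.3f}")
```

Whatever the threshold, the Precision should stay close to the Positive rate of 0.1, up to sampling noise.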

Secondly, except for the perfect classifier, whose curve is always a straight line at the top of the plot, the curves differ between the two datasets. More specifically, with the same distribution of predictions, the PR-curve shows a worse performance when the proportion of Positive samples is lower.

Why is this the case? Let's look at the example below for easier intuition:

Predictions of a classifier for the balanced dataset (on the left) and imbalanced dataset (on the right)

These are the prediction tables of a classifier on 2 datasets, the first one with 50% Positive and the second one with only 10% Positive (similar to our example above). While the True Positive Rate (Recall) and the True Negative Rate remain the same, the Precision drops sharply.

Precision_{Balanced} = \frac{70}{70+40} \approx 0.64

Precision_{Imbalanced} = \frac{14}{14+72} \approx 0.16

This makes sense: for the imbalanced dataset, only 14 of the 86 samples predicted as Positive are actually Positive, which indicates a poor classifier. Most other types of curves (e.g. the ROC-curve), however, fail to capture this.
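The same arithmetic can be written as a short Python sketch. The class counts used here (100 Positive and 100 Negative, then 20 Positive and 180 Negative) are assumptions chosen because they are consistent with the 50% and 10% Positive shares and with the TP and FP counts quoted above; the rates of 0.7 and 0.4 follow from those counts. The function name is hypothetical.

```python
def precision_from_rates(tpr, fpr, n_pos, n_neg):
    """Precision implied by fixed TPR and FPR on a dataset with the given class counts."""
    tp = tpr * n_pos  # Positives correctly predicted as Positive
    fp = fpr * n_neg  # Negatives wrongly predicted as Positive
    return tp / (tp + fp)


tpr = 0.7  # True Positive Rate (Recall), the same on both datasets
fpr = 0.4  # False Positive Rate = 1 - True Negative Rate, the same on both datasets

# Assumed class counts, chosen to reproduce TP=70, FP=40 and TP=14, FP=72.
balanced = precision_from_rates(tpr, fpr, n_pos=100, n_neg=100)
imbalanced = precision_from_rates(tpr, fpr, n_pos=20, n_neg=180)
print(round(balanced, 2), round(imbalanced, 2))
```

The classifier's rates are identical in both calls; only the class counts change, and that alone moves the Precision from about 0.64 to about 0.16.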

Pros and cons

The PR-curve is sensitive to the class distribution of the data. Furthermore, it cares about the Positive cases more than the Negatives, which makes it suitable for datasets with a small fraction of Positive samples.

The baseline performance (i.e. the performance of a random classifier) in the PR-curve is a horizontal line at y = \frac{P}{P+N}, where P and N are the numbers of Positive and Negative samples (thus the Area Under this PR-curve is also \frac{P}{P+N}). This baseline differs from dataset to dataset, which makes comparisons between datasets harder.

More general advantages and disadvantages of using a curve for evaluating model performance are given in this post.

Test your understanding

Precision-Recall curve: an overview - Quiz 2

1. What are the 2 axes of the PR-curve?

2. What best describes an imbalanced dataset?

3. Why is the PR-curve (and its area) usually considered suitable for imbalanced datasets?

4. What is the area under the PR-curve for outputs given by a random classifier? (A random classifier is one that does not make use of the predictors when predicting the labels.)



