
Precision and Recall

Introduction

Reading time: ~10 min

Have you ever wondered how your phone knows if a photo contains a hot dog, or how your email app perfectly filters out spam? These are examples of classification—a machine learning superpower where a computer learns to sort things into distinct categories.

When a computer decides between just two things (like "hot dog" or "not a hot dog"), we call it binary classification. If it's sorting between many categories, like predicting whether your package will arrive "early," "late," or "on time," it is known as multi-class classification.

But how do we know if our AI is actually doing a good job? Your first thought might be to just look at its overall accuracy. However, accuracy can be surprisingly deceptive. In this chapter, we are going to explore why accuracy falls short, and introduce you to three incredibly powerful alternatives: precision, recall, and the F1-score.

To really understand these metrics, we first need to look at a tool that lays out everything our AI got right, and everything it got wrong: the confusion matrix.

The Confusion Matrix

Imagine we have built an AI doctor. Its job is to diagnose whether a patient has a specific rare illness or is completely healthy. The AI gives us a percentage—say, an 80% chance of illness. We then use a classification threshold to make the final call. If our decision threshold is 50%, anyone with a probability above that mark is diagnosed as "sick," and anyone below is "healthy."
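The thresholding step above can be sketched in a few lines of plain Python. The function name and labels here are illustrative, not from any library:

```python
def classify(probability, threshold=0.5):
    """Turn a predicted probability into a diagnosis using a decision threshold."""
    return "sick" if probability >= threshold else "healthy"

print(classify(0.80))  # 0.80 is above 0.50, so the AI diagnoses "sick"
print(classify(0.30))  # below the threshold, so "healthy"
```

Note that the threshold is a choice we make, not something the AI learns: lowering it makes the model flag more patients as sick, raising it makes it more cautious.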

Let's see what happens when the AI makes its predictions. There are exactly four possible outcomes:

Understanding the Four Outcomes

  • True Positives (TP): The AI correctly diagnosed a sick patient as sick.
  • True Negatives (TN): The AI correctly diagnosed a healthy patient as healthy.
  • False Positives (FP): The AI incorrectly diagnosed a healthy patient as sick. (A false alarm!)
  • False Negatives (FN): The AI incorrectly diagnosed a sick patient as healthy. (A dangerous miss!)

We can organize these four outcomes into a simple grid called the confusion matrix. It compares what actually happened with what the AI predicted would happen:

                    Predicted: Positive     Predicted: Negative
Actual: Positive    True Positive (TP)      False Negative (FN)
Actual: Negative    False Positive (FP)     True Negative (TN)

By breaking down the AI's performance into these specific categories, we can see exactly how it is getting confused, rather than just knowing it made a general mistake.
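As a minimal sketch of how these counts are tallied, here is plain Python (no libraries) comparing a made-up list of actual labels against predictions, where 1 means sick and 0 means healthy:

```python
actual    = [1, 1, 0, 0, 1, 0]  # made-up ground truth: 1 = sick, 0 = healthy
predicted = [1, 0, 0, 1, 1, 0]  # the AI's calls for the same six patients

# Count each of the four outcomes by comparing actual vs. predicted
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

# Arranged like the grid above: rows = actual, columns = predicted
print([[tp, fn],
       [fp, tn]])  # [[2, 1], [1, 2]]
```

If you later use a library such as scikit-learn, be aware that its `confusion_matrix` function lays the grid out differently (TN in the top-left by default), so always check the row and column order before reading off the counts.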

Let's test your understanding of these outcomes!

Question: Your spam filter flags a perfectly normal email from your boss as spam and hides it. What kind of error is this?

True Positive (TP)
False Positive (FP)
True Negative (TN)
False Negative (FN)

Let's make sure we have this locked in. Try sorting these examples into the correct buckets:

The AI doctor gives a healthy patient a clean bill of health.
The email filter identifies a scam email and sends it to junk.
Your smart home security system triggers an alarm because a cat walked by.
The self-driving car fails to notice a stop sign and keeps driving.
True Negatives (TN)
True Positives (TP)
False Positives (FP)
False Negatives (FN)
Sina