Precision and Recall
Introduction
Have you ever wondered how your phone knows if a photo contains a hot dog, or how your email app perfectly filters out spam? These are examples of classification—a machine learning superpower where a computer learns to sort things into distinct categories.
When a computer decides between just two things (like "hot dog" or "not a hot dog"), we call it binary classification.
But how do we know if our AI is actually doing a good job? Your first thought might be to just look at its overall accuracy. However, accuracy can be surprisingly deceptive. In this chapter, we are going to explore why accuracy falls short, and introduce you to three incredibly powerful alternatives: precision, recall, and the F1-score.
To really understand these metrics, we first need to look at a tool that lays out everything our AI got right, and everything it got wrong: the confusion matrix.
The Confusion Matrix
Imagine we have built an AI doctor. Its job is to diagnose whether a patient has a specific rare illness or is completely healthy. The AI gives us a percentage—say, an 80% chance of illness. We then use a classification threshold to make the final call. If our decision threshold is 50%, anyone with a probability above that mark is diagnosed as "sick," and anyone below is "healthy."
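As a minimal sketch of that thresholding step (the probabilities here are invented for illustration):

```python
# Turn predicted probabilities of illness into final diagnoses
# using a 50% decision threshold.
probabilities = [0.80, 0.35, 0.50, 0.92, 0.10]
threshold = 0.5

# Anyone strictly above the threshold is diagnosed "sick".
diagnoses = ["sick" if p > threshold else "healthy" for p in probabilities]
print(diagnoses)  # ['sick', 'healthy', 'healthy', 'sick', 'healthy']
```

Note that the 0.50 patient falls on the boundary: with a strict "above the mark" rule, they are diagnosed healthy. Different conventions handle ties differently, so it is worth being explicit.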
Let's see what happens when the AI makes its predictions. There are exactly four possible outcomes:
Understanding the Four Outcomes
- True Positives (TP): The AI correctly diagnosed a sick patient as sick.
- True Negatives (TN): The AI correctly diagnosed a healthy patient as healthy.
- False Positives (FP): The AI incorrectly diagnosed a healthy patient as sick. (A false alarm!)
- False Negatives (FN): The AI incorrectly diagnosed a sick patient as healthy. (A dangerous miss!)
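The four counts can be tallied directly from paired lists of actual and predicted labels. A short sketch, using made-up labels where 1 means "sick" and 0 means "healthy":

```python
# Hypothetical patients: 1 = sick, 0 = healthy.
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

# Count each of the four outcomes by comparing actual vs. predicted.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # sick, diagnosed sick
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # healthy, diagnosed healthy
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false alarm
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # dangerous miss

print(tp, tn, fp, fn)  # 2 2 1 1
```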
We can organize these four outcomes into a simple grid called the confusion matrix. It compares what actually happened with what the AI predicted would happen:
| | Predicted: Positive | Predicted: Negative |
|---|---|---|
| Actual: Positive | True Positive (TP) | False Negative (FN) |
| Actual: Negative | False Positive (FP) | True Negative (TN) |
By breaking down the AI's performance into these specific categories, we can see exactly how it is getting confused, rather than just knowing it made a general mistake.
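The table above can be assembled as a small 2×2 grid in code. This sketch reuses the hypothetical counts from the AI-doctor example (TP=2, FN=1, FP=1, TN=2), laid out in the same row order as the table:

```python
# Confusion matrix with rows = actual class, columns = predicted class,
# matching the table: [Actual Positive, Actual Negative] by
# [Predicted Positive, Predicted Negative]. Counts are hypothetical.
tp, fn = 2, 1
fp, tn = 1, 2

matrix = [
    [tp, fn],  # Actual: Positive
    [fp, tn],  # Actual: Negative
]

for label, row in zip(["Actual: Positive", "Actual: Negative"], matrix):
    print(f"{label}: {row}")
```

Be aware that libraries do not all agree on this layout; for example, some place true negatives in the top-left corner, so always check the row/column convention before reading numbers off a matrix.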
Let's test your understanding of these outcomes!
Question: Your spam filter flags a perfectly normal email from your boss as spam and hides it. What kind of error is this?
Let's make sure we have this locked in. Try sorting these examples into the correct buckets: