
Logistic Regression: Scoring Our Model

Reading time: ~5 min

In the last chapter, we used Mean Squared Error (MSE) to score our straight lines. But MSE doesn't work very well when we are predicting probabilities. Instead, for Classification problems, we use a special scoring system called Log-Loss (sometimes called Binary Cross-Entropy).

The goal of Log-Loss is simple: reward the model for being right, and heavily punish the model for being confidently wrong.

Figure: A small "Penalty" gauge that sits calmly in the green "0" zone when the model is correct, but breaks and explodes into the red "infinity" zone when the model is confidently wrong.

The Intuition

Imagine the true answer is "Yes" (1).

  • If our model predicts a 0.99 probability (it's 99% sure it's a Yes), the model did a great job! The penalty (loss) is almost 0.
  • But what if the model predicts 0.01 (it's 99% sure it's a No)? It was horribly wrong while being totally confident. The Log-Loss formula will give it a massive penalty, sending the error score skyrocketing toward infinity!
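The two cases above can be sketched in a few lines of Python. This is a minimal illustration (the function name `penalty` is ours, not from any library): when the true answer is 1, the penalty is −log(p); when it is 0, the penalty is −log(1 − p).

```python
import math

def penalty(y_true, p):
    """Log-Loss penalty for a single prediction.

    y_true is the true label (0 or 1); p is the model's
    predicted probability that the answer is 1."""
    if y_true == 1:
        return -math.log(p)        # small when p is near 1
    return -math.log(1 - p)        # small when p is near 0

# Confidently right: tiny penalty.
print(penalty(1, 0.99))   # ≈ 0.01
# Confidently wrong: huge penalty.
print(penalty(1, 0.01))   # ≈ 4.61
```

As p approaches 0 while the true answer is 1, −log(p) grows without bound, which is exactly the "skyrocketing toward infinity" behavior described above.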

Overall, our goal during training is to:

Minimize the Log-Loss
Maximize the Log-Loss
Keep Log-Loss at exactly 1

The Mathematics

For those who love formulas!

Here is the exact formula the computer uses to calculate the penalty for our entire dataset:

\textrm{Log-Loss} = -\sum_{i=1}^{n} \Big( y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \Big)

  • y_i: the true answer (either exactly 0 or exactly 1).
  • p_i: our model's predicted probability (between 0 and 1).

Because the true answer is always either 0 or 1, one half of that big formula will always multiply to zero and disappear, leaving only the penalty for the actual outcome!
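Here is a minimal sketch of that formula in Python, summing the penalty over a whole dataset (the function name `log_loss` and the example data are ours; note that some libraries, such as scikit-learn, report the average rather than the sum):

```python
import math

def log_loss(y_true, p_pred):
    """Total Log-Loss over a dataset: the sum of per-example penalties."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Since y is exactly 0 or 1, one of the two terms
        # always multiplies to zero and disappears.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

y = [1, 0, 1]          # true answers
p = [0.9, 0.2, 0.8]    # predicted probabilities
print(log_loss(y, p))  # ≈ 0.55: all three guesses were fairly good
```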

Look at the interactive chart below. You can see the "penalty curve" dynamically change depending on what the True Value is. Watch how the penalty shoots up to infinity when the model makes a bad guess!

Sina