

Logistic Regression: Introduction

Reading time: ~10 min

In the last chapter, we taught our computer how to predict continuous numbers (like the exact price of a house or someone's height) using Linear Regression. But what if we don't want a number? What if we want a concrete "Yes" or "No" answer?

Welcome to the world of Classification! In this chapter, we will learn about Logistic Regression, which is a method designed specifically to classify data into categories.

Most of the time, we use this for binary classification—meaning we are choosing between exactly:

Two groups (e.g. Spam or Not Spam)
Three groups (e.g. Red, Green, Blue)
Infinite numeric possibilities!

The Problem with Straight Lines

To choose between two groups (let's call them 0 and 1), we need to calculate the probability that a data point belongs to group 1.

If you remember from math class, what is the range of any valid probability (e.g. "There is a 75% chance of rain")?

It must be between 0 and 1!
It can be anywhere from -1 to 1
It goes from 0 to 100

Figure: a straight line shooting off to infinity, squashed by a boundary box labeled '0' and '1'.

If we used a straight line (our old friend y=mx+b), the line would eventually shoot off into infinity across the chart! We need a way to grab our straight line and "squash" it so its answers never go below 0 or above 1.
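To see the problem concretely, here is a tiny Python sketch. The slope m and intercept b are made-up numbers, chosen only for illustration:

```python
# Our old straight line y = m*x + b is unbounded:
# for extreme inputs it leaves the range [0, 1] entirely.
m, b = 0.5, 0.1  # made-up slope and intercept

predictions = [m * x + b for x in (-10, 0, 10)]
print(predictions)  # [-4.9, 0.1, 5.1] — two of these are impossible as probabilities!
```

Only the middle value could be a valid probability; the other two make no sense as a "percentage chance", which is exactly why we need to squash the line.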

The Sigmoid "S" Curve

To solve this, we use a neat mathematical trick called the Sigmoid function. It takes a normal straight line and bends it into a beautiful "S" shape.

The Concept

Instead of predicting an exact number, the "S" curve outputs a percentage. If the output is 0.9, the model is 90% sure the answer is "Yes" (Group 1). If it outputs 0.1, it's 90% sure the answer is "No" (Group 0). We then pick a threshold (usually 0.5) to make our final decision: anything above 0.5 becomes a "Yes", and anything below becomes a "No"!
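The decision rule above fits in a couple of lines of Python (the 0.5 cutoff is just the usual default):

```python
def classify(probability, threshold=0.5):
    """Turn the S-curve's output into a final Yes/No answer."""
    return "Yes" if probability > threshold else "No"

print(classify(0.9))  # "Yes" — the model is 90% sure it's Group 1
print(classify(0.1))  # "No"  — the model is 90% sure it's Group 0
```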

The Mathematics

For the mathematically curious!

First, we calculate our standard straight line (our "linear predictor"), exactly like we did in the last chapter: z = \beta_0 + \beta_1 x_1 + \dots

Then, instead of making z our final answer, we pass it through the Sigmoid function to bend it and squash it between 0 and 1: P(y=1) = \frac{1}{1 + e^{-z}}

Now, the output P(y=1) represents the exact probability of our event happening!
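Putting the two steps together, here is a minimal Python sketch. The coefficients and the input value are made up purely for illustration:

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: the linear predictor, exactly as in linear regression
beta0, beta1 = -1.0, 2.0   # made-up coefficients
x = 1.5
z = beta0 + beta1 * x      # z = 2.0

# Step 2: squash z into a probability
p = sigmoid(z)
print(round(p, 3))         # 0.881 — roughly an 88% chance that y = 1
```

Notice that sigmoid(0) is exactly 0.5: when the straight line is right on the boundary, the model is perfectly undecided.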

Sina