

Logistic Regression: Introduction

Reading time: ~10 min

In the last chapter, we taught our computer how to predict continuous numbers (like the exact price of a house or someone's height) using Linear Regression. But what if we don't want a number? What if we want a concrete "Yes" or "No" answer?

Welcome to the world of Classification! In this chapter, we will learn about Logistic Regression, which is a method designed specifically to classify data into categories.

Most of the time, we use this for binary classification—meaning we are choosing between exactly:

Two groups (e.g. Spam or Not Spam)
Three groups (e.g. Red, Green, Blue)
Infinite numeric possibilities!

The Problem with Straight Lines

To choose between two groups (let's call them 0 and 1), we need to calculate the probability that a data point belongs to group 1.

If you remember from math class, what is the range of any valid probability (e.g. "There is a 75% chance of rain")?

It must be between 0 and 1!
It can be anywhere from -1 to 1
It goes from 0 to 100

Figure: a straight line shooting off to infinity, squashed by a boundary box labeled '0' and '1'.

If we used a straight line (our old friend y=mx+b), the line would eventually shoot off into infinity across the chart! We need a way to grab our straight line and "squash" it so its answers never go below 0 or above 1.
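To see the problem concretely, here is a tiny Python sketch. The slope m and intercept b are made-up numbers, chosen only for illustration:

```python
# Our old straight line y = m*x + b is unbounded:
# for extreme inputs it leaves the range [0, 1] entirely.
m, b = 0.5, 0.1  # made-up slope and intercept

predictions = [m * x + b for x in (-10, 0, 10)]
print(predictions)  # [-4.9, 0.1, 5.1] — two of these are impossible as probabilities!
```

Only the middle value could be a valid probability; the other two make no sense as a "percentage chance", which is exactly why we need to squash the line.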

The Sigmoid "S" Curve

To solve this, we use a neat mathematical trick called the Sigmoid function. It takes a normal straight line and bends it into a beautiful "S" shape.

The Concept

Instead of predicting an exact number, the "S" curve outputs a percentage. If the output is 0.9, the model is 90% sure the answer is "Yes" (Group 1). If it outputs 0.1, it's 90% sure the answer is "No" (Group 0). We then pick a threshold (usually 0.5) to make our final decision: anything above 0.5 becomes a "Yes", and anything below becomes a "No"!
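The decision rule above fits in a couple of lines of Python (the 0.5 cutoff is just the usual default):

```python
def classify(probability, threshold=0.5):
    """Turn the S-curve's output into a final Yes/No answer."""
    return "Yes" if probability > threshold else "No"

print(classify(0.9))  # "Yes" — the model is 90% sure it's Group 1
print(classify(0.1))  # "No"  — the model is 90% sure it's Group 0
```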

The Mathematics

For the mathematically curious!

First, we calculate our standard straight line (our "linear predictor"), exactly like we did in the last chapter: z = \beta_0 + \beta_1 x_1 + \dots

Then, instead of making z our final answer, we pass it through the Sigmoid function to bend it and squash it between 0 and 1: P(y=1) = \frac{1}{1 + e^{-z}}

Now, the output P(y=1) represents the exact probability of our event happening!
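Putting the two steps together, here is a minimal Python sketch. The coefficients and the input value are made up purely for illustration:

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: the linear predictor, exactly as in linear regression
beta0, beta1 = -1.0, 2.0   # made-up coefficients
x = 1.5
z = beta0 + beta1 * x      # z = 2.0

# Step 2: squash z into a probability
p = sigmoid(z)
print(round(p, 3))         # 0.881 — roughly an 88% chance that y = 1
```

Notice that sigmoid(0) is exactly 0.5: when the straight line is right on the boundary, the model is perfectly undecided.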

Sina