
Linear Regression: Finding the Best Line

Reading time: ~15 min

We know our goal is to draw a line that fits our data as well as possible. This means we have to find the best possible coefficients (the numbers that control the tilt and position of our line) so that our prediction error is as small as possible. But how do we find them? Guessing them by hand would take forever!

Luckily, computers use special methods to find these numbers quickly. We'll look at the two most popular ones: stepping down a hill, or jumping straight to the answer.

Method 1: Stepping Down a Hill

What if you were dropped on the side of a mountain, blindfolded, and told to find the lowest valley? How would you do it? You'd probably feel the slope of the ground with your feet and take a step downhill. You'd repeat this until the ground felt flat, indicating you've reached the very bottom!

This is exactly how a famous method called Gradient Descent works! It starts with a random line, checks how big the error is, and gradually tweaks the line to make the error smaller and smaller.

Figure: a minimal line-art graphic of a wireframe mountain or "bowl" shape, with a tiny figure stepping down into the valley.

The Intuition

Imagine our errors form a bowl-shaped valley. We want to reach the very bottom, where the error is smallest! Gradient descent checks the "slope" of the valley where we’re standing, and takes a small step downhill. With each step, our error gets smaller, and our line fits the data better!

The Mathematics

For the mathematically curious! To find the bottom of the error valley, we use calculus to calculate the slope (the derivative) of our error.

Our line equation is simply: \hat{y} = \beta_0 + \beta_1 x_1

To improve our line, we update our numbers by moving in the opposite direction of the slope—always heading towards the bottom!

\beta_{new} = \beta_{current} - \alpha \times \text{slope}

Here, \alpha (alpha) is our "learning rate" – it controls how big of a step we take down the mountain!
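The update rule above can be sketched in a few lines of Python. This is a minimal, illustrative example for our one-variable line \hat{y} = \beta_0 + \beta_1 x_1; the data, learning rate, and step count are made up for demonstration:

```python
# Minimal sketch of gradient descent for simple linear regression.
# The data, alpha, and number of steps are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 50)   # true line y = 1 + 2x, plus noise

b0, b1 = 0.0, 0.0        # start with a (bad) flat line
alpha = 0.01             # learning rate: how big a step we take downhill

for _ in range(5000):
    y_hat = b0 + b1 * x                 # current predictions
    error = y_hat - y
    # Slopes (partial derivatives) of the mean squared error:
    grad_b0 = 2 * error.mean()
    grad_b1 = 2 * (error * x).mean()
    # Step in the opposite direction of the slope:
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(round(b0, 2), round(b1, 2))  # should land near the true 1.0 and 2.0
```

Each pass through the loop is one "step down the hill": compute the slope of the error at the current line, then nudge \beta_0 and \beta_1 a little in the downhill direction.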

Gradient descent works step by step. To see this in action, interact with the plot below: try dragging the weights to create a poorly fitting line, then click to run gradient descent and watch the error drop!

Although this step-by-step method is incredibly popular, it's not magic. Sometimes it can get stuck in "mini-valleys" (called local minima) instead of finding the true bottom. Still, it is the engine powering modern machine learning!

Method 2: The Math Shortcut

Is there a way to jump straight to the answer without walking down a mountain? Yes! We have an algebra shortcut called the Normal Equation.

Instead of taking blind steps, we use a formula to calculate the exact bottom of the valley instantly.

(Optional Math): The computer uses matrices to find the perfect numbers in one go: \hat{\beta} = (X^{T}X)^{-1}X^{T}Y
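Here is a minimal sketch of that shortcut with NumPy, on made-up data. (Rather than literally inverting X^{T}X, this sketch calls np.linalg.lstsq, which computes the same least-squares answer in a more numerically stable way.)

```python
# Minimal sketch of the Normal Equation solution, on illustrative data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 50)   # true line y = 1 + 2x, plus noise

# Design matrix X: a column of ones (for the intercept) next to x.
X = np.column_stack([np.ones_like(x), x])

# beta_hat = (X^T X)^{-1} X^T y, computed stably via least squares:
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta_hat
print(round(b0, 2), round(b1, 2))  # near the true intercept 1 and slope 2
```

Notice there is no loop: one formula, one answer, no steps down a mountain.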

Add circles to the chart below to see how the shortcut instantly calculates our line, no "steps" required!

If this finds the perfect answer instantly, why do we use the step-by-step method at all? Because as you add more and more variables, the math behind this shortcut (inverting the X^{T}X matrix) becomes incredibly heavy for computers to calculate. For huge datasets, the shortcut becomes much slower... and can even crash! This makes the step-by-step Gradient Descent the clear winner in the real world.

Can We Trust Our Line?

After our algorithm finds the best coefficients, we have to ask a very important question: Did we find a real pattern, or just random noise?

Imagine you try to predict ice cream sales based on the number of letters in the cashier's name. A computer will dutifully calculate a coefficient for this, but it's probably meaningless! We need to know if we can trust our model.

Instead of jumping into complicated statistics, think of it as checking the confidence of our model. If our data points are tightly clustered around our line, the model is very confident. If the data points are scattered everywhere like a messy room, our "best fit" line might still be just a wild guess.
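One standard way to put a number on this "confidence" (not named above, but widely used) is the coefficient of determination, R²: the fraction of the variance in y that the line explains. A minimal sketch with made-up, tightly clustered data:

```python
# Minimal sketch of R-squared as a "confidence" check for a fitted line.
# The data points are made up to be tightly clustered around y = 2x.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)              # slope and intercept of best fit
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)         # error left over after fitting
ss_tot = np.sum((y - y.mean()) ** 2)      # total variation in y
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # close to 1.0 means the data hugs the line
```

R² near 1 corresponds to the "tightly clustered" picture; R² near 0 is the messy-room scatter, where the best-fit line explains almost nothing.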

Figure: a minimal drawing of two contrasting scatter plots: one with tightly clustered dots hugging a line, and another with completely scattered, chaotic dots representing random noise.

Keeping it Simple

When building models, always check if your predictions actually make sense in the real world before trusting the math blindly. Just because your computer found a line doesn't mean it found the truth!

Sina