
Linear Regression

Introduction

Reading time: ~10 min

Linear Regression is one of the fundamental building blocks of machine learning, perfect for beginners starting their data science journey. At its core, it's a method that helps us predict numbers based on other related information - like predicting a house's price based on its size, or a student's exam score based on their study hours. While modern machine learning has grown to include complex methods like neural networks, Linear Regression remains widely used because it's powerful, easy to understand, and gives clear insights into how different factors affect our predictions. Think of it as the 'Hello World' of machine learning - mastering Linear Regression will give you a strong foundation for understanding more advanced concepts later on.

Let's Be More Specific

Linear regression is a supervised learning algorithm that learns to model a dependent variable (what we want to predict), y, as a function of some independent variables (aka "features"), x_i, by finding the line (or surface) that best "fits" the data.

For example, when predicting the price of a house using the number of rooms:

  • y (dependent variable): the price of the house
  • x_1 (independent variable): the number of rooms

This simple idea extends to an arbitrary number of features, like predicting weight (y) from height (x_1) and age (x_2).
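As a minimal sketch of this idea, here is how fitting a line to room counts and prices might look in Python with NumPy's least-squares routine. The data values are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: number of rooms vs. house price (illustrative values only)
rooms = np.array([2, 3, 3, 4, 5, 6], dtype=float)
price = np.array([150, 200, 210, 260, 310, 360], dtype=float)  # in $1000s

# Fit a line: price ≈ b0 + b1 * rooms, via ordinary least squares
X = np.column_stack([np.ones_like(rooms), rooms])  # column of 1s for the intercept
b0, b1 = np.linalg.lstsq(X, price, rcond=None)[0]

print(f"intercept: {b0:.1f}, slope: {b1:.1f}")
```

The slope b1 tells us roughly how much the predicted price changes per additional room, which is exactly the kind of clear insight the introduction promised.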

[Figure: an abstract geometric house on a scatter plot, with a trend line cutting through the points.]

The Regression Equation

In general, the true relationship underlying linear regression is expressed as:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon

Where:

  • y: the dependent variable; the thing we are trying to predict.
  • x_i: the independent variables; the features our model uses to model y.
  • \beta_i: the coefficients (aka "weights") of our regression model. These form the foundation of our model mapping inputs to outputs. They are what our model "learns" during optimization.
  • \epsilon: the irreducible error; a term that captures random noise and the unmodeled parts of our data.
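The equation above describes how data is generated, and we can simulate it directly. In this sketch the "true" coefficients and noise level are invented for illustration; the point is that estimating coefficients from noisy data recovers values close to, but never exactly equal to, the truth:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative "true" coefficients: beta_0, beta_1, beta_2 (made up for this sketch)
beta = np.array([5.0, 2.0, -1.0])

n = 1000
X = np.column_stack([np.ones(n),                # intercept column
                     rng.uniform(0, 10, n),     # feature x_1
                     rng.uniform(0, 10, n)])    # feature x_2
eps = rng.normal(0, 1.0, n)                     # irreducible error: noise we can never model away

y = X @ beta + eps                              # the true relationship generating the data

# Estimating the coefficients from data gives values close to the truth
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)  # close to [5, 2, -1], but not exact because of eps
```

Because of the noise term eps, beta_hat approaches beta as we collect more data but never matches it exactly, which is why we write the estimates with hats.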

Fitting a linear regression model is all about finding the set of coefficients that best approximate the outcome y based on our features. We may never know the true parameters for our model, but we can estimate them using our data. Once we've estimated these coefficients, \hat{\beta_i}, we predict future values, \hat{y}, as:

\hat{y} = \hat{\beta_0} + \hat{\beta_1} x_1 + \hat{\beta_2} x_2 + \dots + \hat{\beta_p} x_p

So predicting future values (often called inference) is as simple as plugging the values of our features x_i into our equation!
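Concretely, this "plug in the features" step is just a dot product between the estimated coefficients and the feature vector. The coefficient and feature values below are hypothetical, echoing the earlier weight-from-height-and-age example:

```python
import numpy as np

# Hypothetical estimated coefficients from a fitted model: [intercept, height, age]
beta_hat = np.array([10.0, 0.5, 0.2])

# A new observation: height = 170, age = 30 (illustrative values)
x_new = np.array([1.0, 170.0, 30.0])  # leading 1 multiplies the intercept

y_hat = beta_hat @ x_new  # ŷ = β̂0 + β̂1·x1 + β̂2·x2 = 10 + 0.5·170 + 0.2·30 ≈ 101.0
print(y_hat)
```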

Sina