Linear Regression

Introduction
Linear regression is one of the fundamental building blocks of machine learning, perfect for beginners starting their data science journey. At its core, it's a method that helps us predict numbers based on other related information - like predicting a house's price from the number of rooms it has.
Let's Be More Specific
Linear regression is a supervised learning algorithm that models a dependent variable (what we want to predict), $y$, as a function of some independent variables (aka "features"), $x$, by finding a line (or surface) that best "fits" the data.
For example, when predicting the price of a house using the number of rooms:
- $y$ (dependent variable): the house's price
- $x$ (independent variable): the number of rooms
This simple idea extends to an arbitrary number of features, like predicting weight ($y$) from height ($x_1$) and age ($x_2$).
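As a quick sketch of the one-feature case, we can fit a line to some made-up rooms-vs-price data (the numbers below are purely illustrative) using NumPy's ordinary least squares fit:

```python
import numpy as np

# Hypothetical data: number of rooms (x) and house price in $1000s (y).
rooms = np.array([2, 3, 3, 4, 4, 5, 6], dtype=float)
price = np.array([150, 200, 210, 250, 260, 310, 360], dtype=float)

# Fit a line price ~ b0 + b1 * rooms via ordinary least squares.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
b1, b0 = np.polyfit(rooms, price, deg=1)
print(f"intercept: {b0:.1f}, slope: {b1:.1f}")
```

Here the fitted slope tells us roughly how much the price increases for each additional room under this toy dataset.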
[Figure: a scatter plot of house data with a trend line fit through the points.]
The Regression Equation
In general, the true relationship underlying linear regression is expressed as:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon$$

Where:
- $y$: the dependent variable; the thing we are trying to predict.
- $x_1, x_2, \dots, x_p$: the independent variables; the features our model uses to model $y$.
- $\beta_0, \beta_1, \dots, \beta_p$: the coefficients (aka "weights") of our regression model. These form the foundation of our model, mapping inputs to outputs. They are what our model "learns" during optimization.
- $\epsilon$: the irreducible error; a term that captures random noise and the unmodeled parts of our data.
Fitting a linear regression model is all about finding the set of coefficients that best approximate the outcome based on our features. We may never know the true parameters for our model, but we can estimate them using our data. Once we've estimated these coefficients, $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p$, we predict future values, $\hat{y}$, as:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_p x_p$$
So predicting future values (often called inference) is as simple as plugging the values of our features into our equation!
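The full fit-then-predict loop can be sketched with NumPy's least-squares solver. All the data below is made up for illustration (height and age as features, weight as the target, echoing the earlier example):

```python
import numpy as np

# Hypothetical training data: height (cm) and age (years) as features,
# weight (kg) as the target.
X = np.array([
    [170, 30],
    [160, 25],
    [180, 40],
    [175, 35],
    [165, 28],
], dtype=float)
y = np.array([70, 58, 85, 78, 62], dtype=float)

# Prepend a column of ones so the first coefficient acts as the intercept (beta_0).
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Estimate the coefficients beta_hat via ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Inference: plug a new person's features into the fitted equation.
new_person = np.array([1.0, 172.0, 32.0])  # [1, height, age]
y_pred = new_person @ beta_hat
print(f"predicted weight: {y_pred:.1f} kg")
```

Note how inference is literally the equation above: a dot product of the feature values with the estimated coefficients.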