Underfitting and Overfitting
LOESS Regression
When building an AI model, data scientists often get to turn invisible dials and flip switches to manually dictate how "complex" or "simple" the AI is allowed to be. We call these invisible dials hyperparameters. Let's look at how turning a simple hyperparameter dial controls the tug-of-war for a model called LOESS Regression.
LOESS (Locally Estimated Scatterplot Smoothing) is just a fancy mathematical technique for drawing a smooth curving line through a bunch of scattered data points. Instead of drawing one stiff, straight line through everything, LOESS looks at small local "neighborhoods" of data and draws a tiny line for each neighborhood, connecting them all into one long curve.
We can actually control the size of these local "neighborhoods" by turning a master smoothness dial:
High Smoothness
If we crank the dial up so high that a single neighborhood includes almost 100% of the entire dataset, LOESS stops caring about tiny local details entirely. This results in a massive, sweeping curve that completely ignores the fine-grained data, causing it to suffer from extreme bias (Underfitting).
Low Smoothness
If we turn the dial all the way down so a neighborhood only contains 2 or 3 tiny data points, LOESS cares too much about local data. It violently zigs and zags, trying to flawlessly draw a line through every single dot. By trying to perfectly hit every point, it suffers from incredibly high variance (Overfitting).
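The neighborhood idea above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the `span` parameter plays the role of the smoothness dial (the fraction of the dataset inside each neighborhood), and the tricube weighting with a local linear fit follows the standard LOESS recipe.

```python
import numpy as np

def loess(x, y, span=0.5):
    """Fit a LOESS curve: at each point, do a weighted linear fit
    over its nearest-neighbor window (span = fraction of data used)."""
    n = len(x)
    k = max(2, int(span * n))              # neighborhood size from the smoothness dial
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]            # the k nearest neighbors of x[i]
        dmax = d[idx].max()
        w = (1 - (d[idx] / dmax) ** 3) ** 3  # tricube weights: closer points count more
        # weighted least-squares line through this tiny neighborhood
        coeffs = np.polyfit(x[idx], y[idx], deg=1, w=np.sqrt(w))
        fitted[i] = np.polyval(coeffs, x[i])
    return fitted
```

With `span` near 1, every neighborhood contains nearly the whole dataset and the curve flattens into one sweeping line (bias); with `span` tiny, each fit chases a handful of points and the curve zigzags (variance).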
Dialing it In
In the simulation below, a red LOESS curve is trying to fit the scattered dots. Try dragging the smoothness slider back and forth to literally see the tug-of-war between high bias (underfitting) and high variance (overfitting)!
K-Nearest Neighbors
If your AI is trying to figure out which neighborhood a completely un-mapped house belongs to, a solid strategy is to just ask the closest houses and accept a majority vote.
This simple technique is called K-Nearest Neighbors (KNN). If we set K = 1, the mystery house simply copies whatever its single closest neighbor is. If we set K = 69, the mystery house joins whatever neighborhood the majority of its 69 closest neighbors belong to!
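The majority-vote idea can be sketched in a few lines of Python. The variable names here (`train_X`, `train_y`, `query`) are illustrative, and distance is plain Euclidean for simplicity:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    d = np.linalg.norm(train_X - query, axis=1)   # distance to every known house
    nearest = np.argsort(d)[:k]                    # indices of the k closest neighbors
    votes = Counter(train_y[i] for i in nearest)   # tally the neighborhood's labels
    return votes.most_common(1)[0][0]
```

For example, a mystery house sitting in the middle of a cluster of "blue" houses gets outvoted into "blue", no matter what one stray "red" neighbor says.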
Once again, the value of K is an invisible hyperparameter dial that directly controls our tug-of-war:
High K (Bias)
If we crank the dial up so high that K approaches the size of the entire dataset, the AI takes a massive global vote and ignores local differences entirely. The result is a highly-smoothed map that erases local islands completely. This is extreme bias!
Low K (Variance)
If we drop the dial down to K = 1, only the absolute single closest neighbor is listened to. The resulting map is a jagged, chaotic mess of tiny overlapping islands where every single dot tries to aggressively command its own tiny neighborhood. This is extreme variance!
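We can watch both extremes of the dial with a self-contained sketch (the toy clusters and class counts below are made up for illustration). At K = 1, every training point is its own nearest neighbor, so the model "memorizes" the data perfectly; when K equals the whole dataset, the vote is global, so the bigger class wins everywhere:

```python
import numpy as np
from collections import Counter

# Hypothetical toy data: two fuzzy clusters of "houses" (40 blue, 20 red).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.5, (40, 2)), rng.normal(3, 1.5, (20, 2))])
y = np.array(["blue"] * 40 + ["red"] * 20)

def knn_predict(query, k):
    d = np.linalg.norm(X - query, axis=1)          # distance to every house
    nearest = np.argsort(d)[:k]                     # k closest neighbors
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def training_accuracy(k):
    # Re-classify every training point using the model itself.
    return np.mean([knn_predict(X[i], k) == y[i] for i in range(len(X))])

print(training_accuracy(1))    # 1.0 — each point copies itself (extreme variance)
print(training_accuracy(60))   # ~0.667 — the global vote always picks "blue" (extreme bias)
```

Note that a perfect training score at K = 1 is exactly the "memorization" trap: it tells us nothing about how the model handles a brand-new house.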
Test the K-Dial!
Explore the trade-off for yourself below! The plot on the left shows the training data. The plot on the right shows decision regions based on the current value of K. Deeper colors reflect more confidence. Try dragging the K-slider to see it snap from high variance to high bias!
What About Double Descent?
If you plan on diving deeper into modern machine learning, you will inevitably hear about a spooky phenomenon known as Double Descent.
In extremely complex, modern AI (like deep neural networks), the classic U-shaped bias-variance curve we just learned about will suddenly drop into a second incredible dip where adding more complexity actually makes the model vastly better instead of worse!
Does this completely break the rule of bias and variance? Not at all! It actually firmly supports the classical tradeoff... but the mathematical proof requires diving into incredible depths.
We Will Return!
Fret not! As we detail in our advanced chapters, Double Descent actually rigorously supports the classical bias-variance tradeoff curve you just learned. Stay tuned for that advanced content to learn more as you journey deeper!