Testing Your AI
Our Previous Approach
In a previous chapter, we introduced the standard way to rigorously evaluate an AI: the Validation Set Approach.
As a quick refresher, this strategy involves slicing our giant dataset into three distinct, non-overlapping chunks:
The Training Set: The raw textbooks the AI studies to learn its parameters.
The Validation Set: The practice exams used to tweak and tune its final settings.
The Test Set: The final, untouched exam used to grade how it actually does in the real world.
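The three-way slice above can be sketched with scikit-learn. This is an illustrative example, not code from the chapter: the dataset and the 70/15/15 split ratios are assumptions chosen just to make the mechanics concrete.

```python
# Illustrative sketch of the train / validation / test split.
# Dataset choice and split ratios are assumptions for this example.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# First, carve off 30% of the data as a temporary hold-out pool;
# the remaining 70% becomes the training set (the "textbooks").
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Then split that pool evenly into the validation set (the "practice
# exams") and the test set (the "final exam").
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```

Note that `random_state` pins the shuffle: change it, and a different random slice of the data lands in each set.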
While this standard approach is fast and very popular, it has a massive, glaring flaw:
The Luck of the Draw
When you randomly slice off 30% of your data to act as the test exam... how do you know you didn't accidentally slice off the absolute easiest questions in the entire dataset? Or the hardest?
Depending on exactly which data points randomly ended up inside the test set, your AI might get an artificially high or low final grade!
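You can see this luck-of-the-draw effect directly by retraining the same model on several different random splits and watching the final grade move. This is a small illustrative experiment (dataset and model are assumptions, not from the chapter):

```python
# Same data, same model -- only the random test slice changes.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

scores = []
for seed in range(10):
    # A different random 30% becomes the "final exam" each time.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))  # R^2 on the held-out slice

spread = max(scores) - min(scores)
print(f"min={min(scores):.3f}  max={max(scores):.3f}  spread={spread:.3f}")
```

The model never changes, yet its reported score does: the spread between the luckiest and unluckiest split is purely an artifact of which rows happened to land in the test set.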
Furthermore, by permanently locking away a huge chunk of data purely for testing, you are actively preventing your AI from studying it. In machine learning, starving your model of training data almost always hurts its final performance.
If only there were a way to evaluate our model across all the variation in our data, without permanently locking any of it away from the training phase...