
Another Look At Our Decision Tree

Reading time: ~5 min

Let's take a quick breather and recap exactly what we've learned so far!

We know that a Decision Tree asks a series of sequential questions to split data. We also learned that it measures the "messiness" of that data using entropy. Lastly, we discovered that it picks the best questions to ask by hunting for the largest possible Information Gain.

Together, these three concepts make up the complete recipe for training a Decision Tree algorithm!
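The recipe above can be sketched in a few lines of code. Here is a minimal, self-contained example of the two measurements the tree relies on: entropy for "messiness" and Information Gain for picking the best split. The function names and toy labels are illustrative, not from the lesson itself.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child splits."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A 50/50 mix is maximally messy: entropy is 1 bit.
print(entropy(["yes", "yes", "no", "no"]))  # 1.0

# A question that splits it into two pure groups earns the full 1 bit of gain.
print(information_gain(["yes", "yes", "no", "no"],
                       [["yes", "yes"], ["no", "no"]]))  # 1.0
```

Training a tree is then just a loop: at each node, try every candidate question, compute its Information Gain with a function like this, and keep the question with the largest gain.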

To really cement this into your brain, let's take a look at the exact same tree we built earlier, but from a totally new perspective. Instead of just looking at the questions being asked, let's look at how the actual data points—and their corresponding entropy scores—change at every single step of the journey.

As you trace any path from top to bottom, you'll notice the pool of data points shrinks as things get partitioned into different decision and leaf nodes.

Did you also notice that the entropy score drops as we travel further down? But wait—not every single leaf node at the bottom has an entropy of perfectly 0. Why would we let a Decision Tree finish if the leaf isn't perfectly pure?
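To see what an impure leaf looks like concretely, here is a small illustrative sketch (the leaf contents are made up): a leaf holding a 2-to-1 mix of labels has entropy well above 0, yet the tree can still make a sensible prediction by taking a majority vote.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

# A hypothetical leaf we stopped splitting early: two "yes" and one "no".
leaf = ["yes", "yes", "no"]
print(round(entropy(leaf), 3))  # 0.918 -- not perfectly pure

# The leaf still predicts: it outputs its majority class.
prediction = Counter(leaf).most_common(1)[0][0]
print(prediction)  # yes
```

So an entropy of exactly 0 is not required for the tree to be useful; a mostly-pure leaf already gives a confident majority-vote answer.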

Test your knowledge: Why might we prevent a Decision Tree from growing too deep?

To prevent the tree from overfitting and failing on brand new data.
Because calculating entropy takes too much time.
Because the data naturally runs out of features.

The Danger of Going Too Deep

We intentionally stop a Decision Tree before every leaf is completely pure to make sure it generalizes well. If a tree grows too deep, it essentially memorizes the training data, so it performs terribly on new data it hasn't encountered before. We call this problem overfitting.
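One common way to stop a tree from going too deep in practice is to cap its depth. As a sketch (assuming scikit-learn is available, and using a synthetic dataset rather than the lesson's data), compare an unlimited tree with one capped at depth 3: the unlimited tree memorizes the training set perfectly, while the shallow one trades a little training accuracy for a simpler model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}, "
          f"actual depth={tree.get_depth()}")
```

The `max_depth` parameter is one of several "pre-pruning" knobs (others include `min_samples_leaf` and `min_samples_split`) that all serve the same purpose: stopping the tree before it chases every last bit of entropy in the training data.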

Sina