
Logistic Regression: Finding the Best S-Curve

Reading time: ~10 min

How do we find the magical numbers (coefficients) that bend and shift our S-curve to trace the data as closely as possible, keeping the Log-Loss penalty low?

Just like with straight lines, computers can either slowly step toward the answer, or try an exact mathematical approach. Let's look at the two main methods.

Method 1: Gradient Descent

Remember walking down the mountain blindfolded from the last chapter? Gradient Descent works exactly the same way here! Instead of minimizing Mean Squared Error, the algorithm is now tiptoeing down the mountain trying to minimize our new penalty score: Log-Loss!

Calculus calculates the "gradient" (this just means the slope, the direction of steepest ascent!). Since the gradient always points up the mountain, the computer multiplies it by a "learning rate" and takes a step in the exact opposite direction. Repeat this over and over, and you eventually reach the bottom of the penalty valley!
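Here is a minimal sketch of that loop in Python, using a tiny made-up dataset (hours studied vs. pass/fail, purely for illustration). Each pass computes the gradient of the Log-Loss and steps in the opposite direction:

```python
import numpy as np

# Hypothetical toy data: hours studied -> passed (1) or failed (0)
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b):
    p = sigmoid(w * x + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w, b = 0.0, 0.0          # start somewhere on the mountain
learning_rate = 0.1
for step in range(1000):
    p = sigmoid(w * x + b)
    grad_w = np.mean((p - y) * x)  # slope of Log-Loss with respect to w
    grad_b = np.mean(p - y)        # slope of Log-Loss with respect to b
    w -= learning_rate * grad_w    # step *against* the gradient
    b -= learning_rate * grad_b

print(log_loss(w, b))  # far lower than where we started
```

The loss at the end is much smaller than at the starting point (0, 0), which is exactly the "error drops as the S-curve shifts into shape" behavior the demo below animates.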

[Figure: an abstract 3D wireframe valley, with a glowing marker at its lowest point representing the minimum error.]

Let's see this in action. Click the buttons below to run steps of Gradient Descent. Watch how the S-curve gradually shifts into the perfect shape as the error drops!

Method 2: Maximum Likelihood Estimation

What if instead of minimizing our penalty, we reversed our thinking completely? This gives us the second method: Maximum Likelihood Estimation (MLE).

The MLE Philosophy

Instead of asking "How do we make our errors as small as possible?", MLE asks: "What S-curve makes the data we observed the most likely to have happened?"

It's a mathematical flip! Because math is beautifully balanced, tuning the curve to give your observed data the highest possible likelihood is mathematically identical to giving it the lowest possible Log-Loss. Using calculus, a computer sets the derivatives of the likelihood to 0 to pin down exactly where the best curve must be. For logistic regression there is no neat formula that solves those equations in one shot, so in practice solvers like Newton's method use them to leap toward the answer in just a handful of well-aimed steps, with no blind stumbling required.
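You can check the flip numerically. The sketch below (again with hypothetical toy data) computes the likelihood of the data as the product of the probabilities the curve assigns to what actually happened, and confirms that the negative log of that likelihood, averaged over the points, is exactly the Log-Loss:

```python
import numpy as np

# Hypothetical toy data: hours studied -> passed (1) or failed (0)
x = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0,   0,   0,   1,   1,   1])
n = len(x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(w, b):
    # Probability the curve assigns to each observed outcome, multiplied together
    p = sigmoid(w * x + b)
    per_point = np.where(y == 1, p, 1 - p)
    return np.prod(per_point)

def log_loss(w, b):
    p = sigmoid(w * x + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# For any candidate curve: -log(likelihood) / n equals the Log-Loss,
# so the highest-likelihood curve is also the lowest-Log-Loss curve.
for w, b in [(0.5, -1.0), (1.5, -3.0), (3.0, -7.0)]:
    assert np.isclose(-np.log(likelihood(w, b)) / n, log_loss(w, b))
```

Since the two quantities always match (up to the log and a sign), any method that maximizes one automatically minimizes the other.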

Sina