
Equalized Odds

Introduction

Reading time: ~15 min

Imagine you're casting actors for a school play. You want to pick the absolute best performers, but you also want to make sure you're treating every grade level fairly. If your method favors seniors over freshmen purely by accident, your audition process is broken.

This same problem exists in Machine Learning. When algorithms learn from past data, they can accidentally pick up human prejudices. If we aren't careful, our completely logical math formulas can end up being highly discriminatory.

To fix this, we need mathematical rules to measure fairness. One incredibly popular fairness rule is called Equalized Odds (EO).

Equalized Odds: Making It Fair

What does it actually mean to be fair? It's not just about giving everyone the exact same number of spots. It's about ensuring your mistakes are distributed equally.

In a hiring scenario, an AI will make mistakes. It might wrongly reject a great candidate (a False Negative), or wrongly hire a bad candidate (a False Positive). EO argues that your AI is acting fairly only if its error rates are the exact same across all demographic groups.

The Intuition

If your AI system has a 5% chance of falsely rejecting a highly qualified candidate from Group A, it should also have exactly a 5% chance of falsely rejecting a strong candidate from Group B. The model's occasional misjudgments shouldn't disproportionately harm any one group.

The Mathematics

EO demands that the model's predictions \hat{Y} have the exact same True Positive Rate (TPR) and False Positive Rate (FPR) regardless of demographic group membership A.

\mathbb{P}(\hat{Y}=1 \mid Y=y, A=a_1) = \mathbb{P}(\hat{Y}=1 \mid Y=y, A=a_2), \quad y \in \{0,1\}

Because forcing both FNR and FPR to match perfectly is notoriously difficult, data scientists sometimes use "relaxed" equations that only try to equalize one of the error rates at a time (e.g., setting y=1).
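As a minimal sketch of how this condition can be checked in practice (the function names and the numpy-array inputs `y_true`, `y_pred`, and binary group attribute `a` are illustrative assumptions, not a standard API), we can estimate \mathbb{P}(\hat{Y}=1 \mid Y=y, A=a) for each group and compare:

```python
import numpy as np

def acceptance_rate(y_pred, y_true, group, y):
    """Estimate P(Yhat = 1 | Y = y) over the rows selected by `group`."""
    mask = (y_true == y)
    return y_pred[group & mask].mean()

def satisfies_eo(y_true, y_pred, a, tol=0.0):
    """True if the EO condition holds (up to `tol`) for both y = 0 and y = 1."""
    g1, g2 = (a == 0), (a == 1)
    return all(
        abs(acceptance_rate(y_pred, y_true, g1, y)
            - acceptance_rate(y_pred, y_true, g2, y)) <= tol
        for y in (0, 1)
    )
```

On real data the two rates will rarely match exactly, which is why a small tolerance `tol` is usually allowed.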

Sort these concepts to test your understanding!

Wrongly rejecting a great candidate
Wrongly hiring a bad candidate
Equalizing error rates between all groups
Equalized Odds
False Positive
False Negative

Measuring Fairness

To actually give a machine a "fairness score", data scientists calculate the difference between the error rates of two demographic groups.

FPR Balance

False Positive Rate (FPR) Balance: we take the FPR for Group A and subtract the FPR for Group B.

\textrm{FPR}_A - \textrm{FPR}_B

If the result is exactly 0, the model is perfectly fair in how it wrongly accepts people!

FNR Balance

False Negative Rate (FNR) Balance: we take the FNR for Group A and subtract the FNR for Group B.

\textrm{FNR}_A - \textrm{FNR}_B

If the result is exactly 0, the model is perfectly fair in how it wrongly rejects people.

According to strict Equalized Odds, a model is only truly fair if BOTH the FPR difference and FNR difference equal exactly zero simultaneously.
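A minimal sketch of both balance measures (assuming numpy arrays `y_true`, `y_pred`, and a binary group attribute `a`, where `a == 0` marks Group A; these names are illustrative):

```python
import numpy as np

def fpr(y_true, y_pred):
    return y_pred[y_true == 0].mean()        # P(Yhat=1 | Y=0)

def fnr(y_true, y_pred):
    return 1 - y_pred[y_true == 1].mean()    # P(Yhat=0 | Y=1)

def eo_balances(y_true, y_pred, a):
    """Signed FPR and FNR differences between Group A (a==0) and Group B (a==1)."""
    ga, gb = (a == 0), (a == 1)
    fpr_diff = fpr(y_true[ga], y_pred[ga]) - fpr(y_true[gb], y_pred[gb])
    fnr_diff = fnr(y_true[ga], y_pred[ga]) - fnr(y_true[gb], y_pred[gb])
    return fpr_diff, fnr_diff
```

Strict EO requires both returned differences to be exactly zero.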

But is that actually possible in the real world?

Interactive Simulation

Play with the threshold slider below. As you move it, you change how strictly the model separates applicants. Watch what happens to the FPR and FNR differences. Can you find a magical threshold where both hit exactly 0 while still keeping the model's accuracy high?

Did you find a slider spot where the differences miraculously hit 0?

Yes, the model can easily be perfectly accurate and perfectly fair at the exact same time.
Yes, but only in "lazy" states where the model accepts everyone or rejects everyone, completely ruining its usefulness.
No, the slider breaks before the FNR difference ever reaches zero.

Those lazy states prove a massive point: blindly demanding that the differences be exactly zero can accidentally destroy your model's actual ability to make intelligent predictions.

You can verify that no perfectly balanced, high-accuracy spot exists for these two groups by looking at this chart tracking their error rates:
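The same experiment can be sketched numerically. Here the score distributions for the two groups are made-up illustrative numbers, not the data behind the chart: the degenerate threshold 0 (accept everyone) zeroes both gaps, while a sensible mid-range threshold leaves them large.

```python
import numpy as np

# Hypothetical model scores and true labels for two groups.
scores_a = np.array([0.2, 0.4, 0.6, 0.8]); y_a = np.array([0, 0, 1, 1])
scores_b = np.array([0.3, 0.5, 0.7, 0.9]); y_b = np.array([0, 1, 0, 1])

def gaps(t):
    """|FPR_A - FPR_B| and |FNR_A - FNR_B| at a shared threshold t."""
    pa = (scores_a >= t).astype(int)
    pb = (scores_b >= t).astype(int)
    fpr = lambda y, p: p[y == 0].mean()        # P(Yhat=1 | Y=0)
    fnr = lambda y, p: 1 - p[y == 1].mean()    # P(Yhat=0 | Y=1)
    return abs(fpr(y_a, pa) - fpr(y_b, pb)), abs(fnr(y_a, pa) - fnr(y_b, pb))
```

At `t = 0.0` everyone is accepted, so both groups share FPR = 1 and FNR = 0 and the gaps are trivially zero; at `t = 0.55` both gaps are 0.5.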

How Can We Actually Fix the Bias?

If the AI naturally wants to create an unfair model to maximize its accuracy, we have to intervene directly. There are two primary techniques software engineers use to force the AI to behave fairly:

During Training

Constrained Optimization While the AI is actively training, we can place strict mathematical constraints on its loss function. We tell the algorithm: "You are allowed to try and increase your accuracy, but ONLY if the difference between the groups' error rates stays strictly below a tiny limit \epsilon."

\min_{\theta} \quad L(\theta)
\textrm{subject to} \quad |\mathbb{P}(\hat{Y} \neq Y \mid A = a_1) - \mathbb{P}(\hat{Y} \neq Y \mid A = a_2)| \leq \epsilon
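In practice the hard constraint is often handled by adding a penalty term to the loss. The sketch below is one such penalty-method flavor of the idea, not the standard formulation: the synthetic data, the squared-error proxy for each group's error rate, the weight `lam`, and the finite-difference gradient descent are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
a = rng.integers(0, 2, n)                      # synthetic group attribute
x = rng.normal(0.4 * a, 1.0, n)                # feature shifted per group
y = (x + rng.normal(0.0, 1.0, n) > 0.2).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(params, lam=5.0):
    """Logistic loss plus a penalty on the squared gap in group error rates."""
    w, b = params
    p = sigmoid(w * x + b)
    bce = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    errs = [np.mean((p[a == g] - y[a == g]) ** 2) for g in (0, 1)]
    return bce + lam * (errs[0] - errs[1]) ** 2

# Plain gradient descent; finite-difference gradients keep the sketch short.
params, h, lr = np.zeros(2), 1e-5, 0.3
for _ in range(200):
    grad = np.array([(objective(params + h * e) - objective(params - h * e)) / (2 * h)
                     for e in np.eye(2)])
    params = params - lr * grad
```

As `lam` grows, the optimizer trades accuracy for a smaller error-rate gap, which is exactly the tension the constraint \epsilon expresses.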

After Training

Post-Processing Thresholds Sometimes we can't retrain the model from scratch. Instead, we can apply different probability thresholds for different demographic groups to artificially force the outcomes to be fair. We search for the exact threshold combinations per group that balance the math out.

To visualize this post-processing search, engineers plot the error rates on an ROC curve for both Group A and Group B. To find a fair solution, you must find where the mathematical curves for both demographics intersect at a point that isn't totally "lazy".
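One way to sketch that search over per-group thresholds (the function name, the tolerance `tol`, and the grid are illustrative choices, not a standard API): among all threshold pairs whose FNR and FPR gaps stay within a tolerance, keep the pair with the highest overall accuracy, which automatically discards the "lazy" accept-everyone and reject-everyone solutions when a better fair point exists.

```python
import numpy as np

def rates(y, p):
    """(FNR, FPR) of hard predictions p against labels y."""
    return 1 - p[y == 1].mean(), p[y == 0].mean()

def fair_thresholds(scores_a, y_a, scores_b, y_b, tol=0.05,
                    grid=np.linspace(0, 1, 21)):
    """Grid-search a threshold per group; among (approximately) fair pairs,
    return the one with the highest overall accuracy."""
    best, best_acc = None, -1.0
    for ta in grid:
        pa = (scores_a >= ta).astype(int)
        fnr_a, fpr_a = rates(y_a, pa)
        for tb in grid:
            pb = (scores_b >= tb).astype(int)
            fnr_b, fpr_b = rates(y_b, pb)
            if abs(fnr_a - fnr_b) > tol or abs(fpr_a - fpr_b) > tol:
                continue                       # gap too large: not fair enough
            acc = np.concatenate([pa == y_a, pb == y_b]).mean()
            if acc > best_acc:
                best, best_acc = (ta, tb), acc
    return best, best_acc
```

On scores that separate the classes well in both groups, this search recovers a fair, high-accuracy threshold pair rather than a degenerate one.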

