In this project, we explore how increasing the complexity of machine learning models (e.g. decision trees and random forests) to improve accuracy affects the fairness of their predictions with respect to selected protected features (e.g. age and race).
In particular, we want to know whether the higher accuracy achieved by more complex models comes at the cost of fairness in the predictions.
We use the UCI Adult dataset (also known as the "Census Income" dataset): the task is to predict whether a person's income exceeds $50K/yr from census data.
Selected protected features: age, race, sex, and native country.
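The full modeling and fairness analysis is in the report; as a rough illustration of the kind of sweep described above, the sketch below trains decision trees of increasing depth on the Adult data and reports test accuracy alongside a demographic-parity gap on the `sex` attribute. The OpenML loader, the depth grid, and the use of demographic parity as the fairness measure are assumptions made for illustration, not necessarily the exact setup used in the project.

```python
# Minimal sketch of a complexity-vs-fairness sweep (illustrative assumptions:
# data pulled from OpenML, complexity varied via tree depth, fairness measured
# as the demographic-parity gap on the "sex" attribute).
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the UCI Adult ("Census Income") data.
adult = fetch_openml("adult", version=2, as_frame=True)
X = pd.get_dummies(adult.data, drop_first=True)              # one-hot encode categoricals
y = (adult.target.astype(str).str.strip() == ">50K").astype(int)  # 1 if income > $50K/yr
sex = adult.data["sex"]                                      # protected attribute, kept aside

X_tr, X_te, y_tr, y_te, sex_tr, sex_te = train_test_split(
    X, y, sex, test_size=0.3, random_state=0, stratify=y
)

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates across groups."""
    rates = pd.Series(y_pred).groupby(group.reset_index(drop=True)).mean()
    return float(rates.max() - rates.min())

# Sweep model complexity (tree depth) and record accuracy vs. fairness gap.
for depth in [2, 4, 8, 16, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    acc = (y_pred == y_te).mean()
    gap = demographic_parity_gap(y_pred, sex_te)
    print(f"max_depth={depth}: accuracy={acc:.3f}, parity gap={gap:.3f}")
```

The same loop can be repeated with a random forest or with any of the other protected attributes (age, race, native country) substituted for `sex`.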
Check out the report and presentation directories for a summary of the results.
This study was done as part of a course project for EECS 6980 - Probabilistic Methods in Data Science.