Gender shades: intersectional accuracy disparities in commercial gender classification, Buolamwini and Gebru, 2018
Paper, Tags: #machine-learning
We present an approach to evaluating bias in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. We build a facial analysis dataset balanced by gender and skin type, evaluate 3 commercial gender classification systems on it, and show that darker-skinned females are the most misclassified group.
Since race and ethnicity labels are unstable, we use skin type (the six-level Fitzpatrick scale) as a more visually precise label for measuring dataset diversity. For gender, we use binary female and male labels.
In the commercial systems, male subjects are classified more accurately than female subjects, and lighter-skinned subjects more accurately than darker-skinned subjects. Darker-skinned females have the highest error rate of any group.
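A minimal sketch of this kind of intersectional subgroup evaluation, assuming a hypothetical predictions table (column names and sample rows are illustrative, not the paper's data). Fitzpatrick types I–III are binned as lighter and IV–VI as darker, matching the paper's grouping:

```python
import pandas as pd

# Hypothetical predictions table: one row per image, with the
# subject's binary gender label, Fitzpatrick skin type (1-6),
# and the classifier's predicted gender.
df = pd.DataFrame({
    "gender":      ["female", "female", "male", "male", "female", "male"],
    "fitzpatrick": [2, 5, 3, 6, 4, 1],
    "predicted":   ["female", "male", "male", "female", "male", "male"],
})

# Bin the six Fitzpatrick types into two groups, as in the paper:
# I-III = lighter, IV-VI = darker.
df["skin_group"] = df["fitzpatrick"].apply(
    lambda t: "lighter" if t <= 3 else "darker"
)

# Error rate per intersectional subgroup (gender x skin group).
df["error"] = df["gender"] != df["predicted"]
error_rates = df.groupby(["gender", "skin_group"])["error"].mean()
print(error_rates)
```

The paper's finding corresponds to the (female, darker) cell of this table having the largest error rate across the three commercial systems.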
We need rigorous reporting of the performance metrics on which algorithmic fairness debates center. Algorithmic transparency and accountability reach beyond technical reports and should include mechanisms for consent and redress.