Paper, Tags: #nlp
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We propose three recommendations that should guide work analyzing "bias" in NLP systems.
We categorized the papers' motivations and techniques for measuring or mitigating "bias" into 6 categories.
- Papers state a wide range of motivations, multiple motivations, vague motivations, and sometimes no motivation at all.
  - It's sometimes left unstated what it might mean for an NLP system to "discriminate", or how NLP systems contribute to "social injustice".
- Papers' motivations sometimes include no normative reasoning.
  - 32% of papers aren't motivated by any apparent normative concerns, often focusing instead on concerns about system performance.
- Even when papers state clear motivations, they're often unclear about why the system behaviors that are described as "bias" are harmful, in what ways, and to whom.
  - It's left unstated what "problematic biases" or non-ideal user experiences might look like, how the system behavior might result in these things, and who the relevant stakeholders or users might be.
- Papers about NLP systems developed for the same task often conceptualize "bias" differently.
- Papers' motivations conflate allocational and representational harms.
- Papers' techniques are not well grounded in the relevant literature outside of NLP.
- Papers' techniques are poorly matched to their motivations.
- Papers focus on a narrow range of potential sources of "bias".
  - Most papers focus on system predictions as the potential source of "bias" and do not interrogate the implications of other decisions made during the development and deployment lifecycle.
This paper describes how researchers and practitioners might avoid the pitfalls presented above. We propose three recommendations:
- Ground work analyzing "bias" in NLP systems in the relevant literature outside of NLP that explores the relationship between language and social hierarchies. Treat representational harms as harmful in their own right.
- Provide explicit statements of why the system behaviors that are described as "bias" are harmful, in what ways, and to whom. Be forthright about the normative reasoning underlying these statements.
- Examine language use in practice by engaging with the lived experiences of members of communities affected by NLP systems. Interrogate and reimagine the power relations between technologists and such communities.
More in-depth descriptions of these recommendations can be found in Section 4 of the paper.