layout	title	subtitle	img	importance	styles
page	Interplay Between Implicit Bias and Sycophancy in LLMs	Implications for Fairness in Educational Decisions	assets/img/bias_syco_poster.png	1	.fivehundred { max-height: 500px; width: auto; }

Final research project for the MIT seminar course 6.S986, "Large Language Models and Beyond," taught by Professor Yoon Kim in Spring 2024. Completed with collaborators Isabella Pu and Shrestha Mohanty.

{% include figure.liquid path="assets/img/bias_syco_poster.png" class="img-fluid rounded z-depth-1 fivehundred" %}

Presenting our poster at the end of the semester!

Abstract

As large language models (LLMs) open new possibilities for educational decision-making, it is essential to understand their potential impact on equity and fairness. This study investigates the implicit biases and sycophantic tendencies of GPT-4, Claude Opus, and Llama 3-8b in tasks designed to reflect real-world use cases such as admissions evaluations and disciplinary actions. Our analysis reveals significant racial disparities in the decisions made by all three models, mirroring deep-rooted stereotypes and systemic inequities in the U.S. education system. We find that the models tend to adopt higher grade cutoffs and recommend harsher penalties for academic violations for Indian students, and suggest severe consequences for Black students involved in physical altercations. GPT-4 and Claude are less prone to sycophantic behavior, whereas Llama 3 shows a concerning tendency to conform to suggestions, particularly when demographic details are provided. These findings raise critical ethical questions about the continued use of LLMs in education, as such biases risk exacerbating existing disparities, and they emphasize the need for careful scrutiny and responsible integration of AI in admissions processes.
