BridgeEdU Challenge Submission: Random forest applied to Princeton NLSF data for student persistence

Yitaek/BridgeEdU-Challenge

BridgeEdU Challenge

Contributors: Yitaek Hwang, James Schaefer, Steven Wang, Hannah White

Executive Summary

Student persistence is a metric scrutinized by both government and academic communities interested in improving education. Major efforts to understand persistence, by the Obama Administration and by other scholars, have so far been incomplete: these reports do not take into account important factors such as first-generation college status, financial aid packages, and circumstances outside the immediate classroom setting (e.g. having children). Even the studies from Princeton’s National Longitudinal Study of Freshmen (NLSF) use outdated logistic regression methods that yield low predictive power. This study combined the available data and used socioeconomic frameworks from previous research to identify the high-risk populations most prone to dropping out. Using a machine learning algorithm called random forest, the model correctly predicted 73.87% of the students who are likely to drop out, a significant improvement over the existing logistic regression methods, which yielded a poor 18.01% prediction accuracy. The report summarizes the 21 most important factors, including GPA, perception of prejudice on campus, and percentage of classes dropped, and ranks them to create a potential SPS. A proposed plan of action is attached for BridgeEdU to use the random forest algorithm to identify students who qualify for the emergency gap fund.
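
To illustrate the approach described above, here is a minimal sketch of fitting a random forest and ranking factors by importance with scikit-learn. It is not the code from this repository: the data is synthetic, the feature names and the rule that generates dropout labels are hypothetical stand-ins for the NLSF variables, and the metric shown (the share of actual dropouts the model flags, i.e. recall) is an assumption about how the percentages in the summary were measured.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real NLSF variables are not reproduced here.
rng = np.random.default_rng(0)
feature_names = ["gpa", "perceived_prejudice", "pct_classes_dropped", "noise"]
n = 2000
X = np.column_stack([
    rng.normal(3.0, 0.5, n),   # gpa
    rng.uniform(0, 10, n),     # perceived_prejudice (hypothetical 0-10 scale)
    rng.uniform(0, 0.5, n),    # pct_classes_dropped (as a fraction)
    rng.normal(0, 1, n),       # noise, unrelated to the outcome
])

# Hypothetical labeling rule: risk rises with low GPA and many dropped classes.
risk = -2.0 * (X[:, 0] - 3.0) + 6.0 * X[:, 2] + 0.1 * X[:, 1]
y = (risk + rng.normal(0, 0.5, n) > 3.0).astype(int)  # 1 = dropped out

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" compensates for dropouts being the minority class.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Share of actual dropouts in the held-out set that the model identifies.
dropout_recall = recall_score(y_test, model.predict(X_test))
print(f"dropout recall: {dropout_recall:.2%}")

# Rank factors by impurity-based importance, highest first.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances sum to 1 across features, so the ranking reads as each factor's relative contribution to the forest's splits. They can overstate features with many distinct values, so a ranking like this is best treated as a starting point rather than a causal claim.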

Contents:

  • Bridge Edu Final Draft.pdf: the submitted report on student persistence
  • Code: Python code and Jupyter notebook files
  • Model: saved versions of the random forest model in pickled format
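
For readers unfamiliar with pickled models, the sketch below shows a save-and-load round trip using Python's `pickle` module. The tiny training set and the file name `random_forest.pkl` are hypothetical; the actual file names under Model and the feature layout the saved models expect are not documented here.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in model; one of the repository's saved models would take its place.
X = [[3.5, 0.0], [2.1, 0.4], [3.8, 0.1], [1.9, 0.5]]
y = [0, 1, 0, 1]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "random_forest.pkl"  # hypothetical file name

    # Save the fitted model to disk.
    with path.open("wb") as f:
        pickle.dump(model, f)

    # Load it back; only unpickle files from a source you trust.
    with path.open("rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original model's predictions.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

A pickled scikit-learn model is generally only safe to load with the same scikit-learn version that produced it, so the saved models here may require an older environment.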

The raw data we used can be downloaded from: princeton link
