Universities everywhere have financially and ethnically diverse student populations that share a common goal: succeeding in their studies. Both mutable and immutable factors can inhibit student achievement, and this study measures them. It collects features pertaining to the individual as well as to the institute in order to gather insights into this matter.
We will survey all university students and then store, clean, analyze, and visualize the results. Using machine learning algorithms, this study will generate a model that predicts a student’s GPA as well as the institute rating. General statistical analysis will also be performed to gather insights on the student population. All data collection will be anonymous.
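As a rough illustration of the GPA prediction idea, the sketch below fits a single-feature least-squares line in plain Python. The feature (weekly study hours), the sample rows, and the 0.0 to 4.0 GPA scale are all assumptions made up for this example. They are not survey results, and the study's actual model and features may differ.

```python
from statistics import mean

# Hypothetical, made-up rows: (weekly study hours, GPA).
# Illustration only; these are not results from the survey.
responses = [(2, 2.0), (4, 2.4), (6, 2.8), (8, 3.2), (10, 3.6)]

hours = [h for h, _ in responses]
gpas = [g for _, g in responses]

# Ordinary least-squares fit of GPA against one feature.
mean_h, mean_g = mean(hours), mean(gpas)
slope = (
    sum((h - mean_h) * (g - mean_g) for h, g in responses)
    / sum((h - mean_h) ** 2 for h in hours)
)
intercept = mean_g - slope * mean_h


def predict_gpa(study_hours: float) -> float:
    """Predict GPA from study hours, clamped to an assumed 0.0-4.0 scale."""
    return max(0.0, min(4.0, intercept + slope * study_hours))
```

A real model would use many survey features at once, but the workflow stays the same: fit on collected responses, then predict for unseen ones.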
This study is based on a similar dataset from Paulo Cortez at the University of Minho, Portugal (https://archive.ics.uci.edu/ml/datasets/Student+Performance). That data was collected at two Portuguese secondary schools and covers the students’ grades in Math and Portuguese classes. This study differs in scope and audience: the questions are targeted more at the individual and focus less on family status. It also asks about institutional ratings and the students' thoughts on the services provided.
- Create a new Google Form with the questions being asked.
- Copy the code from INetResponses.gs into the script editor of the form's response sheet.
- Create a new Microsoft SQL Server 2019 instance.
- Use the redditTableCreate.sql script to create the database needed for this.
- Use tableCreate.sql to create the tables needed for the DB.
- Update the code from INetResponses.gs to reflect your server/form settings.
- Run the form, then run INetResponses.gs to confirm that data is being sent to the database.
- Create a new Google Colab PySpark instance.
- Use the OpenWIT.ipynb file to create the PySpark instance.
- Update the pyodbc information and dataframe information with your data.
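The last step can be sketched as follows. This is a minimal sketch, not the contents of OpenWIT.ipynb: the function names, the `Responses` table name, and the ODBC driver name are assumptions, and pandas is used here as one possible way to hold the dataframe. Replace the server, database, and credentials with your own.

```python
def build_connection_string(server, database, user, password,
                            driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string for a SQL Server instance.

    The default driver name is an assumption; use whichever ODBC
    driver is installed in your environment.
    """
    parts = {
        "DRIVER": "{" + driver + "}",
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def load_responses(conn_str, table="Responses"):
    """Read one table into a pandas DataFrame.

    Requires pyodbc and pandas. The table name is hypothetical and is
    interpolated directly, so pass only trusted values.
    """
    import pandas as pd
    import pyodbc

    conn = pyodbc.connect(conn_str)
    try:
        return pd.read_sql(f"SELECT * FROM {table}", conn)
    finally:
        conn.close()
```

For example, `load_responses(build_connection_string("myserver", "mydb", "user", "secret"))` would return the stored form responses, which could then be converted to a Spark DataFrame inside the notebook.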
Database Scripts
- dbsetup.sql
- tableCreate.sql
- redditTableCreate.sql
These three files are needed to create the database and tables for this process.
Google JDBC Pipeline Connection Scripts
- sendData.gs
- INetResponses.gs
These two files are used in the Google Sheets script editor to send data from the Google Form responses to the MS SQL database.
https://www.youtube.com/watch?v=o8LtLGMh0FI
- Jim Garrison, Systems/DB Architect, Backend Operations
- Dylan Goldrick, Front End / ML Operations