Life Cycle of this Project:
- Understanding the Problem Statement: The goal is to determine how variables such as gender, race/ethnicity, parental level of education, lunch, and test preparation course affect student performance (test scores).
- Data Collection: Relevant data was gathered from Kaggle.
- Data Checks: A series of data checks was performed to ensure that the data was clean, complete, and in the correct format. This included checking for missing values, duplicate values, and outliers, and reviewing the data types and the number of unique values in each column.
- Exploratory Data Analysis (EDA): The data was analyzed to understand its structure, patterns, and relationships. This involved computing summary statistics, exploring correlations between variables, identifying potential outliers or missing values, and separating the numerical columns from the categorical ones, along with counting the unique values in each categorical column.
- Data Visualization: Visualizations were created to identify trends and patterns that may be difficult to see in tabular format, helping to gain insights quickly and communicate results effectively to others.
- Data Pre-Processing: The data was transformed to make it suitable for use with machine learning models. This involved techniques such as scaling, normalization, feature selection, or feature engineering.
- Model Training: Machine learning models were built using the pre-processed data. The data was split into training and test sets, and the training set was used to train the models.
- Model Evaluation: The performance of the models was evaluated on the test set using metrics such as root mean squared error (RMSE), mean absolute error (MAE), R2 score, and accuracy. These metrics showed which models performed best.
- Choosing the Best Model: Based on the evaluation results, the best-performing model was chosen for predicting student performance.
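
The training, evaluation, and model-selection steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the data is synthetic (a single made-up numeric feature and a made-up score), the two candidate models (a mean baseline and a one-feature least-squares line) are assumptions chosen for brevity, and all function names are hypothetical. Selecting the winner by lowest test RMSE is also just one reasonable choice.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean_baseline(train):
    """Baseline model: always predict the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_simple_linear(train):
    """Ordinary least squares with a single feature: y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": mae,
        "r2": 1 - ss_res / ss_tot,
    }


# Synthetic (feature, score) pairs, for illustration only.
rng = random.Random(42)
rows = [(x, 0.9 * x + 5 + rng.gauss(0, 4)) for x in range(40, 100, 2)]

train, test = train_test_split(rows)
models = {
    "baseline": fit_mean_baseline(train),
    "linear": fit_simple_linear(train),
}

# Evaluate every candidate on the held-out test set.
y_true = [y for _, y in test]
results = {
    name: regression_metrics(y_true, [model(x) for x, _ in test])
    for name, model in models.items()
}

# Pick the model with the lowest test RMSE.
best_name = min(results, key=lambda name: results[name]["rmse"])
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("best model:", best_name)
```

In a real run, the synthetic rows would be replaced by the pre-processed dataset, and the candidate models would typically come from a machine learning library rather than being written by hand.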