The updated Streamlit application for predicting a student's performance index features a sleek, dark-themed design with a central focus on usability and aesthetics. It uses a pre-trained Random Forest model, loaded via joblib, to predict performance based on user inputs such as hours studied, previous scores, extracurricular activities, and more. Inputs are collected through a form with sliders and select boxes, and predictions are displayed with rounded values and categorized grades. Custom CSS enhances the UI with contrasting colors, borders, and padding to create a visually appealing and user-friendly interface.
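The rounding and grade categorization described above could look roughly like the sketch below. The grade bands, the 0 to 100 clamp, and the names `GRADE_BANDS` and `to_grade` are assumptions made for illustration; the actual cut-offs used in app.py are not stated here.

```python
# Hypothetical grade bands: (minimum rounded score, grade label).
# These thresholds are illustrative assumptions, not taken from app.py.
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def to_grade(score: float) -> tuple[int, str]:
    """Round a predicted performance index and map it to a letter grade."""
    # Clamp to an assumed 0-100 scale, since a regressor can overshoot it.
    rounded = round(min(max(float(score), 0.0), 100.0))
    for floor, grade in GRADE_BANDS:
        if rounded >= floor:
            return rounded, grade
    return rounded, "F"  # unreachable with a 0 floor, kept as a safe default


print(to_grade(91.6))
print(to_grade(74.2))
```

Keeping the bands in one list means the thresholds can be changed without touching the mapping logic.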
Dataset: https://www.kaggle.com/datasets/nikhil7280/student-performance-multiple-linear-regression
The following preprocessing steps are performed on this dataset:
- Feature scaling and column reinitialization
- Removal of errors and outliers using box plots
- Removal of NA and missing values, regularization, etc.
- Dropping duplicates, normalization, and column dropping
(All of this work is shown in the student_performance_predictionn.ipynb file.)
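A minimal sketch of a few of these steps (duplicate and missing-value removal, box-plot style outlier filtering, and min-max scaling) is shown below. It is not the notebook's code: the tiny in-memory frame, the column names, the 1.5 × IQR rule, and the `preprocess` helper are all assumptions chosen for illustration.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Illustrative cleaning pipeline; the notebook may differ in detail."""
    # Drop exact duplicate rows, then rows with any missing value.
    df = df.drop_duplicates().dropna().copy()

    # Numeric feature columns (the target is left untouched).
    numeric = df.select_dtypes("number").columns.drop(target, errors="ignore")

    # Box-plot rule: keep rows within 1.5 * IQR of the quartiles.
    q1 = df[numeric].quantile(0.25)
    q3 = df[numeric].quantile(0.75)
    iqr = q3 - q1
    inside = (df[numeric] >= q1 - 1.5 * iqr) & (df[numeric] <= q3 + 1.5 * iqr)
    df = df[inside.all(axis=1)].copy()

    # Min-max scale each numeric feature to [0, 1].
    for col in numeric:
        low, high = df[col].min(), df[col].max()
        span = (high - low) or 1.0  # avoid dividing by zero on constant columns
        df[col] = (df[col] - low) / span

    return df.reset_index(drop=True)


# Toy data with assumed column names: one duplicate row, one missing
# value, and one obvious outlier in "Hours Studied".
raw = pd.DataFrame(
    {
        "Hours Studied": [2, 4, 4, 6, 8, 5, 3, 7, 40],
        "Previous Scores": [50, 60, 60, 70, 90, None, 55, 80, 65],
        "Extracurricular Activities": ["No", "Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
        "Performance Index": [30.0, 45.0, 45.0, 60.0, 85.0, 50.0, 35.0, 75.0, 55.0],
    }
)

clean = preprocess(raw, target="Performance Index")
print(clean)
```

In a real pipeline the scaling statistics would be computed on the training split only and reused at inference time, so that the app scales user inputs the same way.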
The project follows a structured methodology, from data preprocessing and feature engineering through model training, evaluation, and deployment:
- Data Preprocessing and Feature Engineering
- Exploratory Data Analysis (EDA): After preprocessing, the data is explored using libraries such as matplotlib, pandas, seaborn, and plotly. The following plots were produced in this step:
  - Pie charts
  - Violin plots
  - Box plots of numerical features
  - Count plots
  - Histograms
  - Model comparison graphs
  - Confusion matrix
  - Correlation matrices
- Model Training and Evaluation: Four machine learning models (Decision Tree, SVM with a linear kernel, Random Forest, and Multiple Linear Regression) are trained on the processed data. The most accurate model, Random Forest, is then saved and loaded into the Streamlit application using the joblib library.
- Inference: The model is deployed through a Streamlit web application that predicts a student's marks and grade.
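The training, selection, and loading steps above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: it treats the task as regression scored by R², uses synthetic data in place of the Kaggle dataset, and writes to a hypothetical file name `best_model.joblib`. Because the synthetic target is linear, the ranking printed here says nothing about which model wins on the real data.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the processed features and target (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 40 * X[:, 0] + 50 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 2, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The four model families named in the methodology.
models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "SVM (linear kernel)": SVR(kernel="linear"),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Multiple Linear Regression": LinearRegression(),
}

# Fit each model and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, r2 in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {r2:.3f}")

# Persist the best-scoring model and load it back, as an app would at startup.
best_name = max(scores, key=scores.get)
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "best_model.joblib"  # hypothetical file name
    joblib.dump(models[best_name], model_path)
    restored = joblib.load(model_path)

print("Restored model:", type(restored).__name__)
```

In the Streamlit app, only the `joblib.load` step is needed at runtime; the comparison loop belongs in the training notebook.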
- Joblib: for saving and loading the trained Random Forest model
- Scikit-learn: for machine learning processing and operations
- Matplotlib: for plotting and visualizing the results
- Pandas: for data manipulation
- NumPy: for efficient numerical operations
- Seaborn: for advanced data visualizations
- Plotly: for 3D data visualizations
- Streamlit: for creating the GUI of the web application
- Requests: for making HTTP requests
- Clone the Repository:
  git clone url_to_this_repository
- Install Dependencies:
  pip install -r requirements.txt
- Download the Model and Run the App:
  Model link: https://drive.google.com/file/d/1PFrcz_8IhXIudi5h-uxJo6VDZE7r058d/view?usp=sharing
  streamlit run app.py