Project Aim
Develop and deploy a predictive model that estimates whether a patient is likely to
suffer a stroke (i.e., identifies patients at high stroke risk).
Technical Objectives:
- Import data
- Perform exploratory data analysis
- Perform statistical inference
- Visualize the data
- Develop a variety of machine learning models
- Assess the quality of these models
- Gain insights about meaningful features that relate to stroke likelihood
- Deploy the machine learning model via a Flask application
Project Structure
- data/healthcare-dataset-stroke-data.csv: The dataset used for training and evaluating the models.
- utils/: Utility scripts used in the notebook.
  - plots.py: Functions for data visualization.
  - stats_ML.py: Functions for statistical analysis and machine learning tasks.
- stroke_risk.ipynb: Jupyter notebook for data analysis, model development, and evaluation.
- requirements.txt: List of required Python packages.
- deployment/: All files needed to deploy the model as an API endpoint for predictions on new data.
  - stroke_risk_deployment.py: Script that serves the trained model using Flask.
  - deployment_requirements.txt: List of Python packages installed when building the Docker image.
  - model.pkl: The final model chosen for this analysis (Polynomial Logistic Regression).
  - Dockerfile: Dockerfile used to build a Docker image of the model service.
  - test_request.py: Python script for testing the deployed model.
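As an illustration of how a client might query the deployed model, here is a minimal sketch that builds and sends a JSON POST request. It is not the contents of test_request.py. The /predict route, the port, and the patient field names are assumptions; the real ones are defined in stroke_risk_deployment.py. The sketch uses the standard-library urllib so that it runs without extra packages, whereas the project's requirements list includes requests.

```python
import json
import urllib.request

# Hypothetical endpoint: 5000 is Flask's default development port, but the
# route name is an assumption. Check stroke_risk_deployment.py for the real one.
URL = "http://localhost:5000/predict"

# Hypothetical patient record. The field names are assumed, not confirmed by
# the repository, and may differ from what the service expects.
example_patient = {
    "gender": "Female",
    "age": 67,
    "hypertension": 0,
    "heart_disease": 1,
    "ever_married": "Yes",
    "work_type": "Private",
    "Residence_type": "Urban",
    "avg_glucose_level": 228.69,
    "bmi": 36.6,
    "smoking_status": "formerly smoked",
}


def build_request(patient: dict, url: str = URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(patient: dict, url: str = URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires a running service)."""
    with urllib.request.urlopen(build_request(patient, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running locally, calling predict(example_patient) would return whatever JSON the endpoint produces; the response format is not specified here.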
Prerequisites
Make sure you have Python 3.10 installed. You can download it from python.org.
Installation
Clone the repository, then create and activate a virtual environment:
python3 -m venv stroke_risk/
source stroke_risk/bin/activate # On Windows, use `stroke_risk\Scripts\activate`
Install the required packages:
pip install -r requirements.txt
Running the Notebook
To explore the data and run the models, start Jupyter Notebook:
jupyter notebook
Open stroke_risk.ipynb in the browser and run the cells to perform data analysis,
model training, and evaluation.
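For readers unfamiliar with the model type named above, the following is a minimal sketch of a polynomial logistic regression in scikit-learn. It is not the notebook's code: it uses synthetic data rather than the stroke dataset, and the polynomial degree, class weighting, and train/test split are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the stroke dataset): two numeric features and
# a minority positive class whose probability depends on a quadratic term.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
logits = -3.0 + 1.5 * X[:, 0] ** 2 + 1.0 * X[:, 1]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Stratify so both splits keep the same share of positive cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale, expand to degree-2 polynomial terms, then fit a logistic regression.
# class_weight="balanced" is one common way to handle a rare positive class.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Predicted probability of the positive class for each held-out row.
risk = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, risk)
print(f"ROC AUC on held-out synthetic data: {auc:.3f}")
```

The polynomial expansion lets a linear classifier capture curved relationships and feature interactions, at the cost of more coefficients to fit.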
Requirements
This project uses the following packages:
flask~=3.0.3
IPython~=8.22.2
ipykernel~=6.29.3
jupyter_client~=8.6.0
jupyter_core~=5.7.1
jupyter_server~=2.13.0
matplotlib~=3.8.3
notebook~=7.1.1
numpy~=1.26.4
pandas~=2.2.1
python~=3.10.13
qtconsole~=5.5.1
requests~=2.31.0
scipy~=1.12.0
seaborn~=0.13.2
scikit-learn~=1.4.1.post1
xgboost~=2.0.3
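To check which of these packages are present in the active environment, a short standard-library sketch such as the following may help. The PINNED dictionary copies only a few entries from the list above, and the helper names are hypothetical. The script reports installed versions next to the pins; it does not evaluate the ~= compatibility rule itself.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the list above (distribution name -> pinned version).
PINNED = {
    "flask": "3.0.3",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "scikit-learn": "1.4.1.post1",
    "xgboost": "2.0.3",
}


def installed_versions(packages):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(pinned):
    """Build one human-readable line per pinned package."""
    lines = []
    for name, installed in installed_versions(pinned).items():
        status = installed if installed is not None else "not installed"
        lines.append(f"{name}: pinned ~={pinned[name]}, found {status}")
    return lines


for line in report(PINNED):
    print(line)
```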
For any questions or issues, please contact [email protected]