
ShakilMahmudShuvo/Early-Diabetes-Prediction

Early-Diabetes-Prediction

I worked on an early diabetes detection project: a machine learning project that aims to predict the likelihood of a person having diabetes from their symptoms and other factors. I used a dataset collected from direct questionnaires given to patients at the Sylhet Diabetes Hospital in Sylhet, Bangladesh, to train several machine learning algorithms to predict the presence of diabetes.

The goal of this project is to help detect diabetes early so that preventive care can be provided to those who need it. I prepared the data and engineered relevant features to enable effective modeling. I then applied several machine learning algorithms to the data: logistic regression, random forest, support vector machine (SVM), k-nearest neighbors (KNN), and Gaussian Naive Bayes. I evaluated the performance of these models using accuracy, precision, recall, and F1-score.
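
The four metrics named above can all be derived from the confusion matrix of a binary classifier. The sketch below computes them by hand in Python, assuming positive (diabetic) cases carry the label `1` and every other label counts as negative; the function names `confusion_counts` and `classification_metrics` are illustrative and not taken from the repository.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn); any label other than `positive` counts as negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted diabetic, actually diabetic
        elif p == positive:
            fp += 1          # predicted diabetic, actually not
        elif t == positive:
            fn += 1          # missed a diabetic case
        else:
            tn += 1          # correctly predicted non-diabetic
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for a binary problem."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when nothing is predicted/present as positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels and predictions, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

In practice, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` compute the same quantities. For early detection, recall is usually the metric to watch, since a false negative means a missed diabetic patient.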

The steps include:

  • Manual EDA (exploratory data analysis)
    • Dealing with missing data
    • Distribution of different attributes
  • Automated EDA using Sweetviz and AutoViz
  • Dataset preprocessing
    • Converting target values into numerical values
    • Label encoding
    • Calculating correlation between features
  • Feature selection
  • Splitting into training and test sets
  • Data normalization
  • k-fold cross-validation
  • Model building
    • Logistic Regression
    • Random Forest
    • SVM
    • KNN
    • Gaussian NB
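
The split, normalization, and k-fold steps interact: if the scaler is fitted on the whole dataset before cross-validation, statistics from the held-out fold leak into training. A minimal sketch of the leak-free ordering follows, assuming min-max scaling (the README does not say which normalization was used) and feature rows stored as plain Python lists. The helpers `k_fold_indices`, `fit_min_max`, and `transform_min_max` are hypothetical names: the scaler is fitted on each training fold and its statistics are reused for that fold's test rows.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then cut into k folds whose
    sizes differ by at most one; each fold is the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def fit_min_max(rows):
    """Learn per-column minimum and maximum from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def transform_min_max(rows, mins, maxs):
    """Rescale each column with training statistics (training rows map to [0, 1]).

    Constant training columns are mapped to 0.0 to avoid division by zero;
    test rows may fall slightly outside [0, 1], which is expected.
    """
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


if __name__ == "__main__":
    # Toy feature rows (age, polyuria code); values are made up for illustration.
    X = [[25, 1], [40, 2], [55, 1], [60, 2], [33, 1], [47, 2]]
    for train_idx, test_idx in k_fold_indices(len(X), k=3):
        mins, maxs = fit_min_max([X[i] for i in train_idx])
        X_train = transform_min_max([X[i] for i in train_idx], mins, maxs)
        X_test = transform_min_max([X[i] for i in test_idx], mins, maxs)
        # A model would be fitted on X_train and scored on X_test here.
        print(len(X_train), len(X_test))
```

With scikit-learn, wrapping `MinMaxScaler` and one of the models listed above in a `Pipeline` and passing it to `cross_val_score` with a `KFold` splitter achieves the same per-fold fitting.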

Description of the dataset

Dataset link: https://www.kaggle.com/datasets/ishandutta/early-stage-diabetes-risk-prediction-dataset?datasetId=886508&sortBy=dateRun&tab=profile

This dataset contains information collected from direct questionnaires given to patients at the Sylhet Diabetes Hospital in Sylhet, Bangladesh, and approved by a doctor. The attributes are:

  • Age (between 20 and 65)
  • Sex (1 = Male, 2 = Female)
  • Polyuria (1 = Yes, 2 = No)
  • Polydipsia (1 = Yes, 2 = No)
  • Sudden weight loss (1 = Yes, 2 = No)
  • Weakness (1 = Yes, 2 = No)
  • Polyphagia (1 = Yes, 2 = No)
  • Genital thrush (1 = Yes, 2 = No)
  • Visual blurring (1 = Yes, 2 = No)
  • Itching (1 = Yes, 2 = No)
  • Irritability (1 = Yes, 2 = No)
  • Delayed healing (1 = Yes, 2 = No)
  • Partial paresis (1 = Yes, 2 = No)
  • Muscle stiffness (1 = Yes, 2 = No)
  • Alopecia (1 = Yes, 2 = No)
  • Obesity (1 = Yes, 2 = No)
  • Class (1 = Positive, 2 = Negative)
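
Because the questionnaire answers are categorical, they have to be mapped to numbers before computing correlations or fitting models. The sketch below applies the coding listed above (1 = Yes, 2 = No, and so on) and then computes the Pearson correlation of each feature with the class. It assumes each record arrives as a dictionary of strings keyed by the attribute names used in this list (`Age`, `Sex`, `Polyuria`, `Class`); the actual column names in the Kaggle CSV may differ, and the repository may have used a library encoder with a different numeric mapping. `encode_record`, `pearson`, and `correlations_with_target` are hypothetical helpers.

```python
import math

# Coding taken from the attribute list above; the column names are assumptions.
YES_NO = {"Yes": 1, "No": 2}
SPECIAL_CODES = {
    "Sex": {"Male": 1, "Female": 2},
    "Class": {"Positive": 1, "Negative": 2},
}


def encode_record(record):
    """Map one raw questionnaire record (dict of strings) to numeric codes."""
    encoded = {}
    for column, value in record.items():
        if column == "Age":
            encoded[column] = int(value)  # age is already numeric
        else:
            # Every other attribute is treated as a Yes/No answer unless listed above.
            encoded[column] = SPECIAL_CODES.get(column, YES_NO)[value]
    return encoded


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / (norm_x * norm_y)


def correlations_with_target(records, target="Class"):
    """Correlation of every encoded feature with the encoded target."""
    encoded = [encode_record(r) for r in records]
    ys = [r[target] for r in encoded]
    return {
        column: pearson([r[column] for r in encoded], ys)
        for column in encoded[0]
        if column != target
    }


if __name__ == "__main__":
    # Made-up toy records, for illustration only (not taken from the dataset).
    records = [
        {"Age": "40", "Sex": "Male", "Polyuria": "Yes", "Class": "Positive"},
        {"Age": "58", "Sex": "Female", "Polyuria": "No", "Class": "Negative"},
        {"Age": "45", "Sex": "Male", "Polyuria": "Yes", "Class": "Positive"},
        {"Age": "33", "Sex": "Female", "Polyuria": "No", "Class": "Positive"},
    ]
    for column, r in correlations_with_target(records).items():
        print(f"{column}: {r:+.2f}")
```

For a binary attribute, swapping which answer gets the lower code only flips the sign of its correlation, so the magnitude is what matters when ranking features for selection.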

The dataset can be used to predict whether a patient has diabetes from their symptoms and other factors.
