Heart-Disease-Prediction

Concepts Used:

EDA, Feature Extraction, Seaborn, Pandas, Numpy, Inter-Quartile Range, Z-score, Pearson's Correlation coefficient, Spearman's Correlation coefficient, Logistic regression, Decision trees, Random forest, K nearest neighbours.

Data:

Age : Age of the patient
Sex : Sex of the patient
exang: exercise induced angina (1 = yes; 0 = no)
ca: number of major vessels (0-3)
cp : chest pain type
a)Value 1: typical angina
b)Value 2: atypical angina
c)Value 3: non-anginal pain
d)Value 4: asymptomatic
trtbps : resting blood pressure (in mm Hg)
chol : cholesterol in mg/dl fetched via BMI sensor
fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
rest_ecg : resting electrocardiographic results
a)Value 0: normal
b)Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
c)Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
thalach : maximum heart rate achieved
target : 0 = less chance of heart attack; 1 = more chance of heart attack

EDA:

First, we inspected the data type of each column using data.info(). We then counted the duplicate records in the dataset and removed them. Finally, we checked for NULL values using data.isnull() and removed the records that contained them.
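The cleaning steps described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the small DataFrame is a made-up stand-in for heart.csv, and the variable names n_duplicates and n_missing are hypothetical.

```python
import pandas as pd

# Made-up stand-in for heart.csv: one duplicate row and one missing value,
# so that each cleaning step has something to act on.
data = pd.DataFrame(
    {
        "age": [63, 37, 37, 41, 56],
        "chol": [233.0, 250.0, 250.0, None, 236.0],
        "target": [1, 1, 1, 0, 1],
    }
)

data.info()  # prints column dtypes and non-null counts

# Count rows that repeat an earlier row, then drop them.
n_duplicates = int(data.duplicated().sum())
data = data.drop_duplicates()

# Count missing values per column, then drop rows containing any.
n_missing = data.isnull().sum()
data = data.dropna().reset_index(drop=True)

print("duplicates removed:", n_duplicates)
print("rows remaining:", len(data))
```

With the real dataset, the DataFrame would instead be loaded with pd.read_csv("heart.csv").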

Detecting Outliers:

Detecting Outliers using Seaborn's Boxplots:
Here we found that outliers are present in trtbps, chol, thalachh, oldpeak, caa, thall.

Removing Outliers:

Removing the outliers using IQR (Inter-Quartile Range):
With the IQR method, data points that fall outside the range (lower limit, upper limit) are considered outliers, where IQR = Q3 - Q1.

upper limit = Q3 + 1.5 * IQR
lower limit = Q1 - 1.5 * IQR

After applying the IQR filter, we found that 228 records remain.
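As a rough sketch of the IQR rule, the helper below keeps only the rows that lie inside the limits for every selected column. The function name remove_outliers_iqr and the sample values are assumptions made for illustration; the notebook may apply the rule differently (for example, one column at a time).

```python
import pandas as pd

def remove_outliers_iqr(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for all columns."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Compare each column against its own limits (aligned on column names).
    within = df[columns].ge(lower, axis=1) & df[columns].le(upper, axis=1)
    return df[within.all(axis=1)]

# Made-up values: the chol reading of 600 lies far above the upper limit.
sample = pd.DataFrame(
    {
        "trtbps": [120, 125, 130, 135, 140, 130],
        "chol": [200, 210, 220, 230, 240, 600],
    }
)
cleaned = remove_outliers_iqr(sample, ["trtbps", "chol"])
print(len(sample), "->", len(cleaned))
```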

Removing outliers using Z-score:

Here a data point is considered an outlier if the absolute value of its Z-score is greater than 3. After applying the Z-score filter, we found that 287 records remain.
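The Z-score rule can be sketched in the same way. This version computes the Z-score directly with pandas, assuming the population standard deviation (ddof=0); the function name remove_outliers_zscore and the sample values are hypothetical, and the notebook may rely on a library routine instead.

```python
import pandas as pd

def remove_outliers_zscore(df, columns, threshold=3.0):
    """Keep rows whose absolute Z-score is at most `threshold` in all columns."""
    subset = df[columns]
    z = (subset - subset.mean()) / subset.std(ddof=0)
    return df[(z.abs() <= threshold).all(axis=1)]

# Made-up values: 19 ordinary chol readings plus one extreme reading of 600.
sample = pd.DataFrame(
    {
        "chol": list(range(190, 209)) + [600],
        "thalachh": list(range(140, 160)),
    }
)
cleaned = remove_outliers_zscore(sample, ["chol", "thalachh"])
print(len(sample), "->", len(cleaned))
```

Note that a single extreme value can only reach a Z-score above 3 when the column has enough rows, which is why the sample uses 20 records rather than a handful.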

Because the Z-score method retains more records than the IQR method (287 versus 228), we preferred the Z-score method.

Correlation:

1. Finding correlation using Seaborn's Heatmap:

2. Finding correlation using Pearson's Correlation:

3. Finding correlation using Spearman's Correlation:
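A minimal sketch of the two correlation measures listed above, using pandas only. The values are made up for illustration and are not taken from heart.csv. Either resulting matrix could then be passed to Seaborn's heatmap function for plotting.

```python
import pandas as pd

# Made-up values for illustration only.
data = pd.DataFrame(
    {
        "age": [29, 35, 41, 47, 53, 59, 65],
        "thalachh": [190, 182, 171, 165, 150, 148, 130],
        "target": [1, 1, 1, 0, 1, 0, 0],
    }
)

# Pearson measures linear association; Spearman measures monotonic
# association by correlating the ranks of the values.
pearson = data.corr(method="pearson")
spearman = data.corr(method="spearman")

# Correlation of every feature with the target, strongest positive first.
print(pearson["target"].sort_values(ascending=False))
print(spearman["target"].sort_values(ascending=False))
```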

Training Models:

The models we used for prediction are:

Logistic Regression
Decision Trees
Random Forest
K nearest neighbours
And their corresponding accuracy scores are:

Hence, after removing the outliers, we conclude that the Logistic regression algorithm is best suited to this problem.
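The comparison of the four models could be reproduced along the following lines with scikit-learn. This is a sketch under stated assumptions, not the notebook's code: the data is a synthetic stand-in generated with make_classification, and the train/test split, feature scaling and hyperparameters are illustrative choices. The accuracies it prints therefore say nothing about the results on heart.csv.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned dataset (assumption: 13 features, binary target).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling is added for the two models that are sensitive to feature scale.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Trees": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K nearest neighbours": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

To run this on the real data, X would hold the feature columns of the cleaned DataFrame and y its target column.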

prathammehta16/Heart-Disease-Prediction
