Insurance Prediction Model - ML Classification Problem

In this project, we built several supervised machine learning models to predict whether the owner of a given house should consider applying for insurance. Data preprocessing and visualization are also covered.

The machine learning models used are: Decision Tree Classifier, Logistic Regression, Random Forest Classifier, SVM Classifier, and MLP Classifier.

The Code

main.ipynb is the main entry point for the project.
The other files contain helper code for formatting, normalizing, and preprocessing the data.

Data Visualization

Before applying any classification algorithm, we should first understand and visualize the dataset. Here are some of the visualizations we made:

Correlation between the features

We can see that there are no highly correlated features in our dataset.
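A correlation check like this can also be done programmatically. The sketch below is only an illustration, assuming pandas, Pearson correlation, and a hypothetical cut-off of 0.8; the threshold and the toy column names are not taken from the project.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle (k=1 drops the diagonal) so each pair is listed once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    pairs = []
    for row in upper.index:
        for col in upper.columns:
            value = upper.loc[row, col]
            if pd.notna(value) and value > threshold:
                pairs.append((row, col, float(value)))
    return pairs


# Hypothetical toy data: "b" is an exact multiple of "a", "c" is only weakly related.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [2.0, 5.0, 1.0, 4.0, 3.0],
})
print(highly_correlated_pairs(demo))
```

An empty list on the real feature matrix would correspond to the "no highly correlated features" observation above.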

Histograms and Distribution

Building Dimension

Building Type

Number of Windows

Label (Class)

The label distribution shows that the data is imbalanced, so we should apply an oversampling technique to avoid models that are biased toward the majority class.
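One simple oversampling technique is random oversampling: duplicate randomly chosen minority-class rows until all classes have the same count. The sketch below assumes pandas and a hypothetical label column named "label"; it is not necessarily the technique used in the project (SMOTE and similar methods are common alternatives).

```python
import pandas as pd


def random_oversample(df: pd.DataFrame, label_col: str, seed: int = 0) -> pd.DataFrame:
    """Duplicate random minority-class rows until every class matches the largest one."""
    counts = df[label_col].value_counts()
    target = counts.max()
    parts = []
    for cls, n in counts.items():
        rows = df[df[label_col] == cls]
        if n < target:
            # Sample with replacement to make up the missing rows for this class.
            extra = rows.sample(n=target - n, replace=True, random_state=seed)
            rows = pd.concat([rows, extra])
        parts.append(rows)
    # Shuffle so the duplicated rows are not grouped together at the end.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Hypothetical imbalanced training split: 8 rows of class 0, 2 rows of class 1.
train = pd.DataFrame({"feature": range(10), "label": [0] * 8 + [1] * 2})
balanced = random_oversample(train, "label")
print(balanced["label"].value_counts())
```

Oversampling should be applied to the training split only; duplicating rows before the train/test split would leak copies of training rows into the test data.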

Data Preprocessing

After understanding the dataset and its features, we applied the following preprocessing steps to prepare the data for the classification models:

NaN values: after careful study of the data, we removed some entries containing NaN values and replaced others with the mean, the previous value, or the next value.
Outliers: we used the boxplot method to detect outliers, then studied whether to remove them or replace them with other values.
Encoding: we encoded non-numeric values so that the ML algorithms could process them correctly.
Normalization: to make learning easier for the ML algorithms, we scaled the data with techniques such as RobustScaler.
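The missing-value and outlier steps can be sketched with pandas as follows. The column names and the choice of which column gets which fill strategy are hypothetical, and the outlier rule assumes the conventional boxplot whiskers at 1.5 × IQR beyond the quartiles; the project may have used different rules per column.

```python
import numpy as np
import pandas as pd


def fill_missing(df, mean_cols, ffill_cols, bfill_cols, drop_cols):
    """Drop rows missing a critical column, then fill the remaining gaps per column."""
    out = df.dropna(subset=drop_cols).copy()
    for col in mean_cols:
        out[col] = out[col].fillna(out[col].mean())  # replace with the column mean
    for col in ffill_cols:
        out[col] = out[col].ffill()  # replace with the previous value
    for col in bfill_cols:
        out[col] = out[col].bfill()  # replace with the next value
    return out


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside the boxplot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical raw data with gaps in every column.
raw = pd.DataFrame({
    "dimension": [100.0, np.nan, 300.0, 200.0, 10000.0],
    "windows": [3.0, np.nan, 5.0, 4.0, 2.0],
    "floors": [np.nan, 1.0, 2.0, 1.0, 3.0],
    "type": ["A", "B", None, "A", "B"],
})
clean = fill_missing(raw, mean_cols=["dimension"], ffill_cols=["windows"],
                     bfill_cols=["floors"], drop_cols=["type"])
print(clean)

readings = pd.Series([10, 12, 11, 13, 12, 95])
outliers = iqr_outlier_mask(readings)
print(readings[outliers])
```

Whether a flagged value is then dropped or replaced is a per-feature decision, as described in the list above.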
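The encoding and scaling steps might look like the sketch below. It assumes one-hot encoding with `pandas.get_dummies` (the project does not say which encoding it used) and scikit-learn's `RobustScaler`, which subtracts the median and divides by the interquartile range. The column names and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Hypothetical train/test frames with one categorical and one numeric feature.
train = pd.DataFrame({"building_type": ["A", "B", "A", "C"],
                      "dimension": [100.0, 250.0, 400.0, 5000.0]})
test = pd.DataFrame({"building_type": ["B", "C"],
                     "dimension": [300.0, 120.0]})


def encode(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """One-hot encode the given non-numeric columns as 0/1 integer columns."""
    return pd.get_dummies(df, columns=columns, dtype=int)


train_enc = encode(train, ["building_type"])
# Align test columns with train: categories missing from the test set become all-zero columns.
test_enc = encode(test, ["building_type"]).reindex(columns=train_enc.columns, fill_value=0)

numeric = ["dimension"]
scaler = RobustScaler()  # centres on the median, scales by the IQR (25th to 75th percentile)
# Fit on the training data only, then reuse the same statistics for the test data.
train_enc[numeric] = scaler.fit_transform(train_enc[numeric])
test_enc[numeric] = scaler.transform(test_enc[numeric])
print(train_enc)
print(test_enc)
```

Because the median and IQR are insensitive to a few extreme values, RobustScaler suits data where some outliers are kept rather than removed.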

Classification Models

We trained several classification models, then evaluated and compared them to select the best one.
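A comparison loop over the five model types could look like this sketch. It uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for the real insurance data, and the hyperparameters are hypothetical defaults, so the printed numbers will not match the results reported below.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the project's dataset), roughly 80/20 class split.
X, y = make_classification(n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Store accuracy in percent plus the confusion matrix for each model.
    results[name] = (accuracy_score(y_test, pred) * 100, confusion_matrix(y_test, pred))
    print(f"{name}: accuracy (in %) = {results[name][0]:.2f}")
    print(results[name][1])

best = max(results, key=lambda n: results[n][0])
print("Best by accuracy:", best)
```

Stratifying the split keeps the class ratio the same in the training and test sets, which matters for imbalanced labels.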

Decision Tree Classifier

After training this classifier, we got the following results on the test data:

Accuracy: 70.38%
Confusion matrix:

Logistic Regression Classifier

Accuracy: 77.18%
Confusion matrix:

Random Forest Classifier

Accuracy: 71.17%
Confusion matrix:

SVM Classifier

Accuracy: 76.62%
Confusion matrix:
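To read the confusion matrices: in scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels, so for binary 0/1 labels `ravel()` yields TN, FP, FN, TP, and accuracy is (TN + TP) / total. The toy labels below are hypothetical; they show how the same matrix also gives the recall of the minority class, which accuracy alone can hide on imbalanced data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical true and predicted labels (1 = minority class).
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # row-major order: [[TN, FP], [FN, TP]]

accuracy_from_cm = (tn + tp) / cm.sum()
minority_recall = tp / (tp + fn)  # share of true class-1 rows that were caught

print(cm)
print(accuracy_from_cm, accuracy_score(y_true, y_pred))
print(minority_recall, recall_score(y_true, y_pred))
```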

waterflow80/inusrance-prediction-model

Folders and files

Latest commit

History

Repository files navigation

Inusrance Prediction Model - ML Classification Problem

The Code

Data Visualization

Correlation between the features

Histograms and Distribution

Buidling Dimension

Buidling Type

Number of Windows

Label (Class)

Data Preprocessing

Classification Models

Decision Tree Classifier

Logistic Regression Classifier

Random Forest Classifier

SVM Classifier

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages