In this project we try to create different supervised machine learning models for predicting whether the owner of a given house should consider applying for insurance. Data preprocessing and visualization are included in this project.

waterflow80/inusrance-prediction-model

Insurance Prediction Model - ML Classification Problem

In this project we created several supervised machine learning models to predict whether the owner of a given house should consider applying for insurance. Data preprocessing and visualization are also included.

The machine learning techniques/models used are: Decision Tree Classifier, Logistic Regression, Random Forest Classifier, SVM Classifier, and MLP Classifier.

The Code

  • main.ipynb is the main entry point for the project.
  • The other files contain helper code for formatting, normalizing, and preprocessing the data.

Data Visualization

Before applying any classification algorithm, we should first understand and visualize the dataset. Here are some of the visualizations we made:

Correlation between the features

corr-matrix

We can see that there are no highly correlated features in our dataset.

Histograms and Distribution

Building Dimension

building-dim-hist

Building Type

building-type-hist

Number of Windows

number-windows0hist

Label (Class)

claim-hist

The label histogram shows that the dataset is imbalanced, so we should apply an oversampling technique to avoid training biased models.
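The README does not say which oversampling technique was used. As one possibility, here is a minimal sketch of plain random oversampling, which duplicates randomly chosen minority-class rows until the classes are the same size. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 rows of class 0 and 2 rows of class 1.
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

Oversampling is normally applied to the training split only, after the train/test split, so that duplicated rows do not leak into the test set.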

Data Preprocessing

After understanding the dataset and its features, we can preprocess the data to prepare it for the classification models. In this project we applied the following preprocessing steps:

  • NaN values: after a careful study of the data, we removed some entries that had NaN values and replaced others with the mean, the previous value, or the next value.
  • Outliers: we used the boxplot method to detect outliers, then studied whether to remove them or replace them with other values.
  • Encoding: we encoded non-numeric values so that the ML algorithms can work with them.
  • Normalization: to make it easier for the ML algorithms to learn, we applied scaling techniques such as RobustScaler to normalize the data.
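The steps above can be sketched roughly as follows. This is a dependency-free illustration, not the project's code: the helper names and toy columns are hypothetical, clipping is only one of several ways to treat outliers, and `robust_scale` imitates the default behaviour of scikit-learn's `RobustScaler` (centre on the median, divide by the interquartile range).

```python
import math
from statistics import mean, median, quantiles


def fill_nan_with_mean(values):
    """Replace NaN entries with the mean of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    fill = mean(observed)
    return [fill if math.isnan(v) else v for v in values]


def forward_fill(values):
    """Replace NaN entries with the previous observed value (a leading NaN stays NaN)."""
    filled, last = [], math.nan
    for v in values:
        last = last if math.isnan(v) else v
        filled.append(last)
    return filled


def iqr_bounds(values, k=1.5):
    """Boxplot rule: points outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """One possible treatment: clip outliers to the nearest boxplot whisker."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def robust_scale(values):
    """Centre on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    scale = (q3 - q1) or 1.0  # avoid dividing by zero for constant columns
    med = median(values)
    return [(v - med) / scale for v in values]


# Hypothetical toy columns, not taken from the dataset.
dimension = [300.0, 450.0, math.nan, 800.0, 620.0, 9000.0]
building_type = ["type_b", "type_a", "type_b", "type_c"]

filled = fill_nan_with_mean(dimension)
clipped = clip_outliers(filled)
scaled = robust_scale(clipped)
encoded, mapping = label_encode(building_type)

print("filled :", filled)
print("clipped:", clipped)
print("scaled :", [round(v, 3) for v in scaled])
print("encoded:", encoded, mapping)
```

In a train/test setting, the imputation values, outlier bounds, and scaling statistics would usually be computed on the training split and then reused on the test split.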

Classification Models

We applied different classification models and made some evaluation and comparisons to select the best model.
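To make the comparison concrete, here is a minimal sketch of how the five model families named in this README could be trained and scored with scikit-learn. It runs on synthetic data from `make_classification`, not on the project's dataset, and every hyperparameter shown is an assumption. The accuracies it prints are unrelated to the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in for the preprocessed data (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters are illustrative; the README does not state the ones used.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = 100 * accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst test accuracy.
for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.2f}%")
```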

Decision Tree Classifier

After training this classifier, we got the following results on the test data:

  • accuracy (in %): 70.37727061015372

  • Confusion Matrix:

    confusion-matrix

Logistic Regression Classifier

  • accuracy (in %): 77.17745691662785
  • Confusion Matrix:

confusion-matrix-logis

Random Forest Classifier

  • accuracy (in %): 71.1690731252911

  • Confusion Matrix:

    confusion-matrix-rand-forest

SVM Classifier

  • accuracy (in %): 76.61853749417791
  • Confusion Matrix:

svm-confusion
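For reference, the two metrics reported for each classifier can be derived from test labels and predictions as in the sketch below. The labels and predictions are invented for the example, and the `recall` helper is an addition for illustration. On imbalanced data, per-class recall read off the confusion matrix can tell a different story from overall accuracy.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy_percent(matrix):
    """Share of correct predictions (the diagonal), in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100 * correct / total


def recall(matrix, class_index):
    """Fraction of the true members of one class that were predicted correctly."""
    row_total = sum(matrix[class_index])
    return matrix[class_index][class_index] / row_total if row_total else 0.0


# Hypothetical test labels and predictions: 6 negatives, 4 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print("confusion matrix:", cm)                    # [[5, 1], [3, 1]]
print("accuracy (in %):", accuracy_percent(cm))   # 60.0
print("recall for class 1:", recall(cm, 1))       # 0.25
```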
