Machine Learning-based Risk Assessment for Retail Lenders -- Proof of Concept using Home Credit Data

migueldiazacevedo/retail_lending_risk

Risk Evaluation Service for Retail Brands

Proof of Concept: Home Credit Default Risk Prediction

(Inspired by the 2018 Home Credit Kaggle Competition)

Project Overview

This project demonstrates a proof-of-concept for risk evaluation using machine learning, focusing on predicting loan default risk. As part of a start-up product team simulation, I built a complete pipeline for risk assessment, from data exploration and preprocessing to model development and deployment. The model is aimed at helping financial institutions assess the risk of lending, especially in cases where customers have limited financial history.

Business Problem

Financial institutions often struggle to assess the credit risk of individuals with little or no credit history, such as first-time homebuyers and small business owners. This project explores whether machine learning can improve the accuracy of these assessments by analyzing a range of customer data, including loan applications, credit bureau reports, and payment histories.

Data Sources

The project uses a comprehensive set of financial data from Home Credit, including:

  • Current loan applications
  • Previous loan applications
  • Historical loan balances
  • Credit Bureau data
  • Payment history records

    All data can be found here.

Approach

  1. Data Exploration: Analyzed and visualized the relationships within the data to understand key features related to loan default risk.
  2. Data Preprocessing: Handled missing data, performed feature engineering, and created scalable preprocessing steps.
  3. Predictive Modeling: Developed and tuned machine learning models to predict the likelihood of loan default, with an emphasis on performance metrics.
  4. App Deployment: Deployed the final model as a containerized application to demonstrate its real-world use case.
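Steps 2 and 3 can be illustrated without any of the project's dependencies. The sketch below is not taken from the notebooks. It assumes median imputation for missing values and ROC AUC as the performance metric (the metric used to score the 2018 Kaggle competition; this README does not say which metrics the notebooks emphasize). The function names `impute_median` and `roc_auc` and the toy data are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random defaulter (label 1) is
    scored higher than a random non-defaulter (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): income with one missing entry, default labels, model scores.
income = impute_median([30_000, None, 52_000, 41_000])
labels = [1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.3]

print(income)                   # [30000, 41000, 52000, 41000]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise AUC is quadratic in the number of rows, so it is only meant to show what the metric measures; on the full dataset a library implementation such as the one in scikit-learn would be the practical choice.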

Project Structure

.
├── README.md
├── data (contains some ready-made folders for data created in notebooks)
├── deployment
│   ├── Dockerfile
│   ├── app
│   │   └── main.py
│   ├── data (folder for X.parquet which is generated in notebook 5_ML_models)
│   ├── model
│   │   └── model.pkl
│   └── requirements.txt (requirements for containerization)
├── notebooks (RUN THESE IN ORDER TO SEE MY DEVELOPMENT PROCESS FOR THIS PROJECT)
│   ├── 1_feature_investigation_main_dataset.ipynb
│   ├── 2_feature_engineering_supplementary_data.ipynb
│   ├── 3_feature_preprocess_preliminary_models.ipynb
│   ├── 4_EDA.ipynb
│   └── 5_ML_models.ipynb
├── requirements.txt (an exact replica of my development environment)
└── utils
    ├── __init__.py
    ├── feature_tools.py
    ├── machine_learning.py
    ├── plot.py
    └── utils.py
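Because deployment/data/X.parquet is only created by notebook 5_ML_models, it can be useful to confirm that the deployment files exist before building the container. The following is a minimal sketch, not part of the repository: the file list is copied from the tree above, and the helper name `missing_deployment_files` is hypothetical.

```python
from pathlib import Path

# Paths copied from the project tree, relative to the project root.
DEPLOYMENT_FILES = [
    "deployment/Dockerfile",
    "deployment/app/main.py",
    "deployment/model/model.pkl",
    "deployment/requirements.txt",
    "deployment/data/X.parquet",  # written by notebook 5_ML_models
]


def missing_deployment_files(root="."):
    """Return the expected deployment files that do not exist under root."""
    root = Path(root)
    return [rel for rel in DEPLOYMENT_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_deployment_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All deployment files are present.")
```

Run it from the project root; an empty result means every file listed under deployment is in place.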

Usage

I recommend creating a virtual environment. Here I call it "home-credit".

In terminal:

python -m venv home-credit 

Activate the virtual environment:

source home-credit/bin/activate

Side note: you can deactivate the virtual environment with

deactivate

Install all requirements by first going to the directory that contains requirements.txt (e.g. the project root directory)

cd name/of/root/directory

and then typing in terminal:

pip install -r requirements.txt

Now you are ready to run the Jupyter notebooks found in the notebooks directory using your favorite IDE or

jupyter lab

Step through the notebooks sequentially to gain an understanding of my workflow and the predictive model that I built.

The deployment directory contains all the files needed to run a containerized version of the loan prediction app.

Requirements

See requirements.txt for the full list of requirements, with exact versions, to recreate my development environment.

Key Requirements:

  • Boruta
  • jupyterlab
  • lightgbm
  • matplotlib
  • numpy
  • optuna
  • pandas
  • phik
  • scikit-learn
  • scipy
  • seaborn
  • shap
  • tqdm
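After `pip install -r requirements.txt`, a quick way to confirm that the key packages landed in the active environment is to query their installed versions. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the package names are copied from the list above, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Names copied from the Key Requirements list.
KEY_REQUIREMENTS = [
    "Boruta", "jupyterlab", "lightgbm", "matplotlib", "numpy", "optuna",
    "pandas", "phik", "scikit-learn", "scipy", "seaborn", "shap", "tqdm",
]


def installed_versions(names=KEY_REQUIREMENTS):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

This only reports what is installed; it does not compare against the exact versions pinned in requirements.txt.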

License

MIT

Contact

Miguel A. Diaz-Acevedo at [email protected]
