Smart Agent Recruitment

📝 Description

This is a classification machine learning problem: identify the best agents / applicants for a Financial Distribution company, i.e. those who will be able to source business for the company within 3 months after their 7-day corporate training.
In this project the predictions are made using LightGBM. Other models, XGBoost and AdaBoost, were also used for experimentation.
A Power BI Dashboard was developed to capture past trends in agent recruitment and derive meaningful insights from the data.

📊 Power BI Dashboard

The Overview Dashboard and the Applicant Details Dashboard for FinMan Agent Recruitment.

📁 Code

Smart Agent Recruitment.ipynb
Smart Agent Recruitment.ipynb (nbviewer) (Click on this link if the notebook does not load on GitHub.)

⌛ Dataset

The dataset train.csv is used for training. The train dataset has 9,527 records with 23 features.
The dataset consists of the following attributes :

ID : Application ID for the Applicant.
Office_PIN
Application_Receipt_Date
Applicant_City_PIN
Applicant_Gender
Applicant_Birthdate
Applicant_Marital_Status
Applicant_Occupation
Applicant_Qualification
Manager_DOJ
Manager_Joining_Designation
Manager_Current_Designation
Manager_Grade
Manager_Status : Status of Employment of the Manager (Confirmed / Probation).
Manager_Gender
Manager_DoB
Manager_Num_Application : Number of applications sourced by the Manager.
Manager_Num_Coded
Manager_Business : Amount of Business Sourced by the Manager in the last 3 months.
Manager_Num_Products : Number of Products sold by the Manager in the last 3 months.
Manager_Business2 : Amount of Business Sourced by the Manager in the last 3 months excluding the amount sourced by Category A advisor.
Manager_Num_Products2 : Number of Products sold by the Manager in the last 3 months excluding the number sold by Category A advisor.
Business_Sourced : If the Applicant was able to source Business within 3 months (0 : Didn't Source Business , 1 : Sourced Business).

📃 Technical Overview

The project has been divided into the following steps :

1. Exploratory Data Analysis

In this step, features with missing values and outliers were identified, the distributions of the target variable and of the numerical and categorical features were examined, and Univariate and Bivariate Analysis were performed.
Some of the data insights are given below. (For the detailed EDA, please refer to the ipynb notebook.)

During Univariate Analysis, it was observed that all the numerical features are skewed.
The features Manager_Business and Manager_Business2 are highly correlated. Similarly, a high correlation is observed between Manager_Num_Products and Manager_Num_Products2. To remove multicollinearity, the columns Manager_Business2 and Manager_Num_Products2 are dropped.
As expected, there is a strong correlation between Manager_Num_Products and Manager_Business. As the number of products sold increases, the amount of business sourced by the Manager also increases.
The peak number of applications was received in May 2007. In the initial months the number of applications received was low, but it increased in the subsequent months. A large share of the applications was received in the months from July to December in both 2007 and 2008.
Initially, in the period Apr - Aug 2007, the number of products sold where business was sourced was far lower than where business was not sourced. The number of products sold where business was sourced started to increase in September 2007. The gap between the business-sourced and non-sourced cases gradually narrowed, and this trend continued until March 2008. There were instances where the number of products sold when business was sourced exceeded that when it was not sourced.
On investigating each application received throughout the time period, a trend emerges. On a given day, the agents whose applications were received first, or relatively early in the day, tended to be the ones who sourced business within 3 months after the 7-day training. This pattern is observed across all 16 months of the train dataset. This trend is captured as a feature in the Feature Engineering step.
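
The skewness and correlation checks described above can be sketched as follows. This is a minimal illustration on a small synthetic frame, not the notebook's code: the data values, the 0.9 threshold and the helper name `highly_correlated_pairs` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, corr) for numeric column pairs with |corr| > threshold.

    Hypothetical helper; the 0.9 default is an assumed cut-off.
    """
    corr = df.select_dtypes("number").corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return pairs


# Synthetic stand-in for a few manager columns (values are made up).
rng = np.random.default_rng(0)
business = rng.exponential(scale=100_000, size=500)
demo = pd.DataFrame({
    "Manager_Business": business,
    "Manager_Business2": business * 0.95 + rng.normal(0, 1_000, size=500),
    "Manager_Num_Application": rng.integers(0, 10, size=500),
})

# Positive skewness values indicate a long right tail.
print(demo.skew())

# Keep the first column of each correlated pair and drop the second.
pairs = highly_correlated_pairs(demo)
to_drop = sorted({second for _, second, _ in pairs})
reduced = demo.drop(columns=to_drop)
print("Dropped:", to_drop)
```

Dropping the second column of each pair mirrors the choice above of keeping Manager_Business and Manager_Num_Products over their "2" variants.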

2. Data Preprocessing / Cleaning

19 features out of 23 had missing values.
Arbitrary Value Imputation was used to handle missing values in the numerical, categorical and date columns / features.
The date columns were converted to a proper datetime data type.
Irrelevant features were dropped from the train and test datasets.
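
Arbitrary value imputation replaces each missing entry with a fixed sentinel chosen per column type. The sketch below shows the idea together with the datetime conversion. The README does not state which fill values were used, so `-1`, `"Missing"` and `1900-01-01` are assumptions, as is the helper name `impute_arbitrary`.

```python
import pandas as pd


def impute_arbitrary(df, date_cols=(), num_fill=-1, cat_fill="Missing",
                     date_fill="1900-01-01"):
    """Fill missing values with fixed sentinels (assumed values, see lead-in)."""
    out = df.copy()
    for col in out.columns:
        if col in date_cols:
            # Parse to datetime first; unparseable entries become NaT.
            parsed = pd.to_datetime(out[col], errors="coerce")
            out[col] = parsed.fillna(pd.Timestamp(date_fill))
        elif pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(num_fill)
        else:
            out[col] = out[col].fillna(cat_fill)
    return out


# Tiny made-up sample with one missing value per column type.
raw = pd.DataFrame({
    "Applicant_Birthdate": ["1971-12-19", None],
    "Applicant_Gender": ["M", None],
    "Manager_Num_Coded": [2.0, None],
})
clean = impute_arbitrary(raw, date_cols=["Applicant_Birthdate"])
print(clean)
```

A sentinel outside the normal value range lets tree-based models treat "missing" as its own split, which is one common reason to prefer this over mean or mode imputation.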

3. Feature Engineering

In this step 4 extra numerical features were created :
- Agent_Age : The age of the Applicant / agent as on Application Receipt Date.
- Manager_Age : The age of the Manager as on Application Receipt Date.
- Manager_Exp : The work experience of Manager in the company.
- App_Order_Percent : Percentile of the position of the Application Received calculated at a daily level.
The categorical features Applicant_Gender and Applicant_Occupation were One Hot Encoded, while Manager_Joining_Designation and Manager_Current_Designation were Label Encoded.
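
A rough sketch of how features of this kind could be derived with pandas is given below. It is not the notebook's code: the sample rows are made up, the row order is assumed to reflect the order in which applications arrived, and 365.25 days per year is an approximation. Manager_Age and Manager_Exp would follow the same date-difference pattern as Agent_Age.

```python
import pandas as pd

# Made-up sample: four applications on one day, one on the next.
apps = pd.DataFrame({
    "ID": ["A1", "A2", "A3", "A4", "A5"],
    "Application_Receipt_Date": pd.to_datetime(
        ["2007-04-16", "2007-04-16", "2007-04-16", "2007-04-16", "2007-04-17"]),
    "Applicant_Birthdate": pd.to_datetime(
        ["1971-12-19", "1983-02-11", "1966-06-27", "1988-02-28", "1975-01-01"]),
    "Applicant_Gender": ["M", "F", "M", "F", "M"],
    "Manager_Current_Designation": ["Level 2", "Level 1", "Level 2", "Level 3", "Level 1"],
})

# Age in years as on the application receipt date (approximate).
age_days = (apps["Application_Receipt_Date"] - apps["Applicant_Birthdate"]).dt.days
apps["Agent_Age"] = age_days / 365.25

# Position of the application within its day, divided by that day's total.
# Assumes rows are already sorted by arrival order.
by_day = apps.groupby("Application_Receipt_Date")
apps["App_Order_Percent"] = (by_day.cumcount() + 1) / by_day["ID"].transform("count")

# One hot encoding for a nominal column.
apps = pd.get_dummies(apps, columns=["Applicant_Gender"])

# Integer codes as a simple stand-in for label encoding (sorted category order).
apps["Manager_Current_Designation"] = (
    apps["Manager_Current_Designation"].astype("category").cat.codes
)
print(apps)
```

With this definition the first application of a day gets the smallest App_Order_Percent and the last one gets 1.0, so the early-application pattern from the EDA becomes a single numeric feature.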

📈 Modelling and Evaluation

In the modelling part, the following models are used :
- XGBoost (Mean CV Scores : 0.88256, Variance in CV Scores : 0.00564)
- Light Gradient Boosting (Mean CV Scores : 0.8826, Variance in CV Scores : 0.00338)
- AdaBoost (Mean CV Scores : 0.8755, Variance in CV Scores : 0.000684)
The scoring metric is ROC_AUC.
Randomized Search CV is used for hyperparameter tuning and finding the best parameters under roc_auc scoring.
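
The tuning and scoring loop described above might look roughly like the sketch below, which uses scikit-learn's `RandomizedSearchCV` with `scoring="roc_auc"` and then reports the mean and variance of the cross-validation scores. The synthetic data, the search space, the number of folds and `n_iter` are assumptions made for the example, and AdaBoost is used only so that the example depends on scikit-learn alone. LightGBM's `LGBMClassifier` or XGBoost's `XGBClassifier` could be passed as the estimator in the same way, since both follow the scikit-learn estimator interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    cross_val_score,
)

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=42
)

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Assumed search space, chosen only for illustration.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
}

search = RandomizedSearchCV(
    estimator=AdaBoostClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="roc_auc",
    cv=cv,
    random_state=42,
)
search.fit(X, y)

# Re-score the best configuration to get per-fold ROC AUC values.
scores = cross_val_score(search.best_estimator_, X, y, scoring="roc_auc", cv=cv)
print("Best params:", search.best_params_)
print(f"Mean CV score: {scores.mean():.4f}, variance: {scores.var():.6f}")
```

Reporting the variance alongside the mean, as in the list above, helps compare how stable each model is across folds when the mean scores are close.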

📋 Results

Feature Importance

XGBoost

In the XGBoost model, the top 5 features of importance are : Agent_Age, App_Order_Percent, Manager_Age, Applicant_City_PIN and Manager_Exp.

LightGBM

In the LightGBM model, the top 6 features of importance are : App_Order_Percent, Manager_Exp, Applicant_City_PIN, Office_PIN, Manager_Age and Agent_Age.
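
For reference, a ranking like the ones above can be read from the `feature_importances_` attribute that scikit-learn style tree ensembles expose after fitting. The sketch below is illustrative only: it fits AdaBoost on synthetic data and attaches this project's feature names purely as labels, so the resulting order says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Names borrowed from this project as labels only; the data is synthetic.
feature_names = [
    "Agent_Age", "App_Order_Percent", "Manager_Age",
    "Applicant_City_PIN", "Manager_Exp", "Office_PIN",
]
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)

model = AdaBoostClassifier(random_state=0).fit(X, y)

# Pair each importance with its feature name and sort from high to low.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
top5 = importances.head(5)
print(top5)
```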

⚙️ Tools and Technologies used

The tools used in this project include:

Python - Used for Data Quality Assessment, Data Cleaning, Exploratory Data Analysis of the datasets to gain useful insights, feature engineering and building the model.
Power BI - This Business Intelligence tool was used to explore the data and create charts, graphs and visualizations for a Dashboard that captures the past trends in Agent Recruitment.

✒️ Authors

Abhishek Chowdhury - Github Profile
