The aim of this project is to use Python's pandas, matplotlib, and seaborn libraries to extract insights from the data, and the xgboost and scikit-learn libraries for machine learning.
The second aim is to learn how to tune the hyperparameters of the xgboost model using grid search with cross-validation.
The final aim is to predict whether a loan applicant can repay the loan, using a voting ensemble that combines the predictions of multiple machine learning algorithms.
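As a rough illustration of the last two aims, the sketch below tunes a boosted-tree model with grid search and cross-validation, then feeds the tuned model into a hard-voting ensemble. It makes several assumptions that do not come from the project: the data is synthetic (`make_classification`), the parameter grid is arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for xgboost so that the example needs only scikit-learn. Because `xgboost.XGBClassifier` follows the scikit-learn estimator interface, it could be substituted in the same place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the loan data (hypothetical, not the project's dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: grid search with 3-fold cross-validation over a small, illustrative grid.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for an xgboost model
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
best_boosted = search.best_estimator_

# Step 2: hard voting, where each model casts one vote and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("boosted", best_boosted),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)  # VotingClassifier refits clones of each estimator
predictions = voter.predict(X_test)
accuracy = voter.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Ensemble accuracy:", round(accuracy, 3))
```

Setting `voting="soft"` would average the predicted class probabilities instead of counting votes. Which variant suits the loan data better is something to check empirically.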
Loan ID, Gender, Married, Dependents, Education, Self Employed, Applicant income, Coapplicant income, Loan Amount, Credit History, Property_Area, Loan_Status
- Married male applicants tend to have higher applicant income, whereas married female applicants have the lowest applicant income.
- Among male applicants, graduates have higher applicant income than non-graduates.
- Likewise, applicants who are married and graduates have higher applicant income.
- Applicants who are not self-employed have higher applicant income than applicants who are self-employed.
- Applicants with more dependents have the lowest applicant income, whereas applicants with no dependents have the highest.
- Applicants who own property in urban areas and have a credit history have the highest applicant income.
- Graduates who have a credit history have higher applicant income.
- Loan amount is linearly dependent on applicant income.
- The heatmaps show that applicant income and loan amount are highly positively correlated.
- There are more male applicants than female applicants.
- Married applicants outnumber unmarried applicants.
- Applicants with no dependents form the largest group.
- Graduates outnumber non-graduates.
- Most properties are in semi-urban areas, and the fewest are in rural areas.
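
Observations like those above can typically be produced with a few pandas operations: grouped means for the income comparisons, a correlation for the income and loan amount relationship, and value counts for the category sizes. The sketch below is illustrative only. The column names (`ApplicantIncome`, `LoanAmount`, and so on) are assumed, and the six rows are invented, so the printed numbers are not the project's results.

```python
import pandas as pd

# Tiny hypothetical sample: column names are assumed and values are made up.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Property_Area": ["Urban", "Semiurban", "Rural", "Semiurban", "Semiurban", "Urban"],
    "ApplicantIncome": [6000, 4000, 2500, 3500, 7000, 3000],
    "LoanAmount": [180, 120, 80, 100, 210, 95],
})

# Mean applicant income for each (Gender, Married) combination.
income_by_group = df.groupby(["Gender", "Married"])["ApplicantIncome"].mean()

# Pearson correlation between income and loan amount,
# the same quantity a correlation heatmap would display.
corr = df["ApplicantIncome"].corr(df["LoanAmount"])

# Number of rows per property area, largest first.
area_counts = df["Property_Area"].value_counts()

print(income_by_group)
print("Correlation:", corr)
print(area_counts)
```

The same `groupby` pattern extends to the other comparisons by swapping in columns such as education, self-employment, or dependents.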