Built a supervised multi-class predictive model to bucket customers based on the events and actions recorded during their interactions with VMware's customer engagement portals.
- Suggest a set of segment rules the business can use to identify the top individuals for a digital asset and target them with personalization on the website.
- Substantiate the marketing and sales implications.
- Fine-tune predictive models with appropriate parameters to predict customer segments.
R programming - SMOTE, LiblineaR, ggplot2, randomForest, RRF, gbm, xgboost
- SMOTE
- Random Forest
- LASSO
- Ridge Regression
- XGBoost
The dataset has 700+ variables and 50K+ records of customer interactions. The dataset is confidential and can be purchased from Harvard Business Publishing.
Removed rows with null values and applied SMOTE to balance the dataset between customers who converted and those who did not. Loaded the dataset for modeling. The code can be accessed here.
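To make the balancing step concrete, here is a minimal sketch of the idea behind SMOTE, written in plain Python rather than with the R tooling listed above. The function name `smote`, the toy rows, and the choice of `k` are illustrative assumptions, not the project's actual code or data.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy data (made-up numbers): 3 converts vs 6 non-converts.
converts = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
non_converts = [[5.0, 6.0], [5.5, 6.1], [4.8, 5.9], [5.2, 6.3], [5.1, 5.7], [4.9, 6.2]]

new_rows = smote(converts, n_synthetic=len(non_converts) - len(converts), k=2)
balanced_converts = converts + new_rows
print(len(balanced_converts), len(non_converts))
```

Each synthetic row lies on a line segment between two real minority rows, so the oversampled class stays inside the region the real converts already occupy.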
After exploratory analysis, we decided on the important variables to use as predictors for our model. We used the mean decrease in Gini index to pick the important variables and reduce the number of dimensions in the feature set.
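As a rough illustration of what the Gini-based ranking measures, the sketch below computes the decrease in Gini impurity produced by a single split on one variable. A Random Forest's mean decrease in Gini aggregates this quantity over every split that uses the variable, across all trees. The variable names, values, and threshold are made up for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Toy example: 1 = converted, 0 = did not convert.
labels = [0, 0, 0, 1, 1, 1]
page_views = [1, 2, 3, 7, 8, 9]  # separates the classes cleanly
noise = [1, 8, 3, 7, 2, 9]       # barely related to the label

informative = gini_decrease(page_views, labels, threshold=5)
uninformative = gini_decrease(noise, labels, threshold=5)
print(informative, uninformative)
```

A variable whose splits separate the classes well yields a large decrease, which is why it ranks higher in the importance list.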
After running this model, we used the mean decrease in Gini index to list the important variables. Below is the list of variables that are potential predictors for a model.
We built a LASSO regression model using the top 200 variables ranked as most important by the Random Forest model. We performed cross-validation to find the best cost parameter for the LASSO regression.
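The cost search can be sketched as follows. This is a pure-Python stand-in, not the LiblineaR call used in the project: it assumes the LASSO model is an L1-penalised logistic regression in which a larger cost means weaker regularisation, fits it with a simple proximal gradient loop, and scores each candidate cost with k-fold cross-validation. The data, the cost grid, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, cost, n_iter=300):
    """Minimise ||w||_1 + cost * sum(log-loss) by proximal gradient descent; y in {0, 1}."""
    d = len(X[0])
    w = [0.0] * d
    # Conservative step size from an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (cost * 0.25 * sum(v * v for row in X for v in row) + 1e-12)
    for _ in range(n_iter):
        grad = [0.0] * d
        for row, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row))) - target
            for j in range(d):
                grad[j] += cost * err * row[j]
        for j in range(d):
            z = w[j] - step * grad[j]
            # Soft-thresholding is what drives weak coefficients to exactly zero.
            w[j] = math.copysign(max(abs(z) - step, 0.0), z)
    return w

def accuracy(w, X, y):
    hits = sum((sum(wi * xi for wi, xi in zip(w, row)) > 0) == (t == 1)
               for row, t in zip(X, y))
    return hits / len(y)

def cross_validate(X, y, cost, k=5):
    """Mean held-out accuracy over k interleaved folds."""
    folds = [list(range(i, len(y), k)) for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        w = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], cost)
        scores.append(accuracy(w, [X[i] for i in held_out], [y[i] for i in held_out]))
    return sum(scores) / k

# Synthetic data: only the first of three features carries signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [1 if row[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]

cost_grid = [0.01, 0.1, 1.0, 10.0]
cv_scores = {c: cross_validate(X, y, c) for c in cost_grid}
best_cost = max(cv_scores, key=cv_scores.get)
print(best_cost, cv_scores)
```

With a very small cost the penalty dominates and every coefficient is shrunk to zero, so cross-validation is what identifies a cost large enough to keep the informative variables.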
We also built an XGBoost model using the top 200 variables from the Random Forest model. The XGBoost model outperformed the LASSO regression on accuracy and recall by 6% and 4%, respectively. However, the model lacks interpretability, which makes it hard to understand which variables influence the marketing and sales of products on the digital portals.
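For reference, the two metrics in this comparison can be computed as in the sketch below. The labels and predictions are invented toy values for two hypothetical models, not the project's results, and recall is shown for a single "converted" class for simplicity.

```python
def accuracy_and_recall(actual, predicted, positive=1):
    """Accuracy over all rows, and recall for the positive class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    actual_pos = sum(a == positive for a in actual)
    return correct / len(actual), true_pos / actual_pos

# Toy labels: 1 = converted, 0 = did not convert.
actual  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]  # hypothetical predictions
model_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # hypothetical predictions

acc_a, rec_a = accuracy_and_recall(actual, model_a)
acc_b, rec_b = accuracy_and_recall(actual, model_b)
print(acc_a, rec_a)
print(acc_b, rec_b)
```

Recall matters here because missing a likely convert is usually more costly than targeting a visitor who does not convert.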
- In terms of factors influencing the conversion of a visitor to a customer, "product page views, first date of download, top resources and PDF downloads" are the top variables with high importance.
- Visitors who view a product page more often than the page's average should be prioritized and shown personalized content.
- Visitors who download more PDFs for various products are interested in understanding the details further, so pursuing them should improve the conversion rate.
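The recommendations above could be turned into a simple segment rule along these lines. This is only a sketch: the field names, the segment labels, and the use of the average as the PDF-download threshold are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical visitor records.
visitors = [
    {"id": "v1", "product_page_views": 12, "pdf_downloads": 0},
    {"id": "v2", "product_page_views": 3, "pdf_downloads": 4},
    {"id": "v3", "product_page_views": 8, "pdf_downloads": 5},
    {"id": "v4", "product_page_views": 1, "pdf_downloads": 1},
]

avg_views = mean(v["product_page_views"] for v in visitors)
avg_pdfs = mean(v["pdf_downloads"] for v in visitors)

def segment(visitor):
    """Assign a targeting segment from above-average engagement signals."""
    above_views = visitor["product_page_views"] > avg_views
    above_pdfs = visitor["pdf_downloads"] > avg_pdfs
    if above_views and above_pdfs:
        return "high priority"
    if above_views or above_pdfs:
        return "personalize"
    return "standard"

segments = {v["id"]: segment(v) for v in visitors}
print(segments)
```

In practice the thresholds would be computed per product page and tuned against observed conversion rates rather than fixed at the overall average.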