My objective in this project was to work with a team to discover structure in e-commerce cosmetics data using unsupervised learning techniques. We had access to 250,000 user journeys and were asked to:
a. Find patterns among customer purchasing behaviors to identify categories of customers, i.e. how many categories of customers are there based on their purchasing behavior?
b. Visually inspect the customer categories to identify the distinctive categories and their most important features.
We ranked the data by feature importance and then applied dimensionality reduction and K-means clustering. We then analysed the resulting clusters to see which ones had the highest purchase ratios. We also calculated what percentage of the dataset's samples belonged to each cluster.
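The last two steps above (the purchase ratio per cluster and each cluster's share of the samples) can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `summarize_clusters`, the inputs `labels` and `purchased`, and the toy data are all assumptions made for the example. It assumes each user journey already has a cluster label (for example, from K-means) and a flag saying whether it ended in a purchase.

```python
from collections import Counter


def summarize_clusters(labels, purchased):
    """Return {cluster: (share_of_samples, purchase_ratio)}.

    labels    -- one cluster id per user journey (hypothetical K-means output)
    purchased -- one bool per journey, True if the journey ended in a purchase
    """
    if len(labels) != len(purchased):
        raise ValueError("labels and purchased must have the same length")

    total = len(labels)
    # Number of journeys assigned to each cluster.
    sizes = Counter(labels)
    # Number of purchasing journeys in each cluster.
    buyers = Counter(
        label for label, bought in zip(labels, purchased) if bought
    )

    return {
        cluster: (size / total, buyers[cluster] / size)
        for cluster, size in sizes.items()
    }


# Toy data invented for illustration only; these are not the project's numbers.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
purchased = [False, False, False, False, False, True, True, False, True, True]

summary = summarize_clusters(labels, purchased)

# List clusters from the highest purchase ratio to the lowest.
ranked = sorted(summary.items(), key=lambda item: item[1][1], reverse=True)
for cluster, (share, ratio) in ranked:
    print(f"cluster {cluster}: {share:.0%} of samples, purchase ratio {ratio:.0%}")
```

Reporting the two numbers side by side matters because a small cluster can have a high purchase ratio while contributing few samples overall, which is the kind of pattern the analysis was looking for.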
Project results
Our team found that one cluster accounted for 94% of the data samples, and 11.5% of its users made a purchase. Two other clusters were of interest. One accounted for only 1.5% of the data, but its users made a purchase 26% of the time. The other accounted for 4% of the data, and its users made a purchase 19% of the time.