Repo with the instructions for the Mini Project III.
This mini project is dedicated to following topics:
- Data Wrangling
- Data Visualization
- Data Preparation and Feature Engineering
- Dimensionality Reduction
- Unsupervised Learning
We will be using old data about different financial transactions. You can download the data from here. The data contains following tables:
- twm_customer - information about customers
- twm_accounts - information about accounts
- twm_checking_accounts - information about checking accounts (subset of twm_accounts)
- twm_credit_accounts - information about checking accounts (subset of twm_accounts)
- twm_savings_accounts - information about checking accounts (subset of twm_accounts)
- twm_transactions - information about financial transactions
- twm_savings_tran - information about savings transactions (subset of twm_transactions)
- twm_checking_tran - information about savings transactions (subset of twm_transactions)
- twm_credit_tran - information about credit checking (subset of twm_transactions)
In this miniproject, we will:
- create two separate customer segmentations (using clustering) to split them into 3-5 clusters:
- based on demographics (only on the information from twm_customer)
- based on their banking behavior. We can take following things into consideration as banking behavior:
- do they have savings account? How much do they save?
- do they have credit account? How much do they live in debt?
- are they making lot of small transactions or few huge ones?
- visualize the created clusters using radar charts and compare them against each other
- visualize segmentations using scatter plot. We will have to use PCA to be able to plot our observations in 2D.
- (stretch) visualize in 2D how our clusters are evolving in each iteration of KMeans (for at least 20 iterations).
- we will need to create own implementation of kmeans so we can see what is happening with the clusters during the iterations.