From cddf82e63962618ad1fcfb0972c78fc092474560 Mon Sep 17 00:00:00 2001 From: miqueasmd Date: Tue, 10 Dec 2024 13:09:41 +0100 Subject: [PATCH] Update lab-dw-aggregating.ipynb --- lab-dw-aggregating.ipynb | 1320 ++++++++++++++++++++++++++++++++++---- 1 file changed, 1180 insertions(+), 140 deletions(-) diff --git a/lab-dw-aggregating.ipynb b/lab-dw-aggregating.ipynb index fff3ae5..ac396c4 100644 --- a/lab-dw-aggregating.ipynb +++ b/lab-dw-aggregating.ipynb @@ -1,161 +1,1201 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "31969215-2a90-4d8b-ac36-646a7ae13744", - "metadata": { - "id": "31969215-2a90-4d8b-ac36-646a7ae13744" - }, - "source": [ - "# Lab | Data Aggregation and Filtering" - ] - }, - { - "cell_type": "markdown", - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", - "metadata": { - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" - }, - "source": [ - "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", - "\n", - "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", - "\n", - "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." - ] - }, - { - "cell_type": "markdown", - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", - "metadata": { - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" - }, - "source": [ - "1. Create a new DataFrame that only includes customers who have a total_claim_amount greater than $1,000 and have a response of \"Yes\" to the last marketing campaign." 
- ] - }, + "cells": [ + { + "cell_type": "markdown", + "id": "31969215-2a90-4d8b-ac36-646a7ae13744", + "metadata": { + "id": "31969215-2a90-4d8b-ac36-646a7ae13744" + }, + "source": [ + "# Lab | Data Aggregation and Filtering" + ] + }, + { + "cell_type": "markdown", + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", + "metadata": { + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" + }, + "source": [ + "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", + "\n", + "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", + "\n", + "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." + ] + }, + { + "cell_type": "markdown", + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", + "metadata": { + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" + }, + "source": [ + "1. Create a new DataFrame that only includes customers who have a total_claim_amount greater than $1,000 and have a response of \"Yes\" to the last marketing campaign." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "8c71d31e", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", - "metadata": { - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" - }, - "source": [ - "2. Using the original Dataframe, analyze the average total_claim_amount by each policy type and gender for customers who have responded \"Yes\" to the last marketing campaign. Write your conclusions." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "189 189 OK31456 California 11009.130490 Yes \n", + "236 236 YJ16163 Oregon 11009.130490 Yes \n", + "419 419 GW43195 Oregon 25807.063000 Yes \n", + "442 442 IP94270 Arizona 13736.132500 Yes \n", + "587 587 FJ28407 California 5619.689084 Yes \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus Gender \\\n", + "189 Premium Bachelor 1/24/11 Employed F \n", + "236 Premium Bachelor 1/24/11 Employed F \n", + "419 Extended College 2/13/11 Employed F \n", + "442 Premium Master 2/13/11 Disabled F \n", + "587 Premium High School or Below 1/26/11 Unemployed M \n", + "\n", + " ... Number of Open Complaints Number of Policies Policy Type \\\n", + "189 ... 0.0 1 Corporate Auto \n", + "236 ... 0.0 1 Special Auto \n", + "419 ... 1.0 2 Personal Auto \n", + "442 ... 0.0 8 Personal Auto \n", + "587 ... 0.0 1 Personal Auto \n", + "\n", + " Policy Renew Offer Type Sales Channel Total Claim Amount \\\n", + "189 Corporate L3 Offer2 Agent 1358.400000 \n", + "236 Special L3 Offer2 Agent 1358.400000 \n", + "419 Personal L2 Offer1 Branch 1027.200000 \n", + "442 Personal L2 Offer1 Web 1261.319869 \n", + "587 Personal L1 Offer2 Web 1027.000029 \n", + "\n", + " Vehicle Class Vehicle Size Vehicle Type \n", + "189 Luxury Car Medsize NaN \n", + "236 Luxury Car Medsize A \n", + "419 Luxury Car Small A \n", + "442 SUV Medsize A \n", + "587 SUV Medsize A \n", + "\n", + "[5 rows x 26 columns]" ] - }, + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# Load the dataset\n", + "url = \"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\"\n", + "df = pd.read_csv(url)\n", + "\n", + "# Filter the DataFrame\n", + "filtered_df = df[(df['Total Claim Amount'] > 1000) & (df['Response'] == 'Yes')]\n", + "\n", + "# Display the filtered DataFrame\n", + 
"filtered_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", + "metadata": { + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" + }, + "source": [ + "2. Using the original Dataframe, analyze the average total_claim_amount by each policy type and gender for customers who have responded \"Yes\" to the last marketing campaign. Write your conclusions." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "f9e17188", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", - "metadata": { - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" - }, - "source": [ - "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Policy Type Gender Total Claim Amount\n", + "0 Corporate Auto F 433.738499\n", + "1 Corporate Auto M 408.582459\n", + "2 Personal Auto F 452.965929\n", + "3 Personal Auto M 457.010178\n", + "4 Special Auto F 453.280164\n", + "5 Special Auto M 429.527942" ] - }, + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Filter the DataFrame for customers who responded \"Yes\"\n", + "responded_yes_df = df[df['Response'] == 'Yes']\n", + "\n", + "# Group by 'Policy Type' and 'Gender' and calculate the average 'Total Claim Amount'\n", + "average_claims = responded_yes_df.groupby(['Policy Type', 'Gender'])['Total Claim Amount'].mean().reset_index()\n", + "\n", + "# Display the result\n", + "average_claims" + ] + }, + { + "cell_type": "markdown", + "id": "7efbabc6", + "metadata": {}, + "source": [ + "Based on the provided data, here are some conclusions:\n", + "\n", + "1. **Corporate Auto**:\n", + " - Female customers have a slightly higher average total claim amount ($433.74) compared to male customers ($408.58).\n", + "\n", + "2. **Personal Auto**:\n", + " - Male customers have a slightly higher average total claim amount ($457.01) compared to female customers ($452.97).\n", + "\n", + "3. **Special Auto**:\n", + " - Female customers have a slightly higher average total claim amount ($453.28) compared to male customers ($429.53).\n", + "\n", + "### General Observations:\n", + "- For **Corporate Auto** policies, females tend to have higher average claim amounts than males.\n", + "- For **Personal Auto** policies, males tend to have higher average claim amounts than females.\n", + "- For **Special Auto** policies, females tend to have higher average claim amounts than males.\n", + "\n", + "### Marketing Implications:\n", + "- The insurance company might consider tailoring their marketing strategies based on these insights. 
For example, they could emphasize the benefits of Corporate Auto policies to female customers and Personal Auto policies to male customers.\n", + "- Understanding these differences can help in designing more effective marketing campaigns and customer service strategies to address the specific needs and behaviors of different customer segments." + ] + }, + { + "cell_type": "markdown", + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", + "metadata": { + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" + }, + "source": [ + "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "c98ca385", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", - "metadata": { - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" - }, - "source": [ - "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Customer Count\n", + "0 Arizona 1937\n", + "1 California 3552\n", + "2 Nevada 993\n", + "3 Oregon 2909\n", + "4 Washington 888" ] - }, + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Group by 'State' and count the number of customers\n", + "state_counts = df.groupby('State').size().reset_index(name='Customer Count')\n", + "\n", + "# Filter the results to include only states with more than 500 customers\n", + "filtered_states = state_counts[state_counts['Customer Count'] > 500]\n", + "\n", + "# Display the result\n", + "filtered_states" + ] + }, + { + "cell_type": "markdown", + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", + "metadata": { + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" + }, + "source": [ + "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "82a6611a", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b42999f9-311f-481e-ae63-40a5577072c5", - "metadata": { - "id": "b42999f9-311f-481e-ae63-40a5577072c5" - }, - "source": [ - "## Bonus" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Education Gender max min median\n", + "0 Bachelor F 73225.95652 1904.000852 5640.505303\n", + "1 Bachelor M 67907.27050 1898.007675 5548.031892\n", + "2 College F 61850.18803 1898.683686 5623.611187\n", + "3 College M 61134.68307 1918.119700 6005.847375\n", + "4 Doctor F 44856.11397 2395.570000 5332.462694\n", + "5 Doctor M 32677.34284 2267.604038 5577.669457\n", + "6 High School or Below F 55277.44589 2144.921535 6039.553187\n", + "7 High School or Below M 83325.38119 1940.981221 6286.731006\n", + "8 Master F 51016.06704 2417.777032 5729.855012\n", + "9 Master M 50568.25912 2272.307310 5579.099207" ] - }, + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Group by 'Education' and 'Gender' and calculate the max, min, and median 'Customer Lifetime Value'\n", + "clv_stats = df.groupby(['Education', 'Gender'])['Customer Lifetime Value'].agg(['max', 'min', 'median']).reset_index()\n", + "\n", + "# Display the result\n", + "clv_stats" + ] + }, + { + "cell_type": "markdown", + "id": "f38647e8", + "metadata": {}, + "source": [ + "Based on the provided data, here are some conclusions regarding the customer lifetime value (CLV) by education level and gender:\n", + "\n", + "### General Observations:\n", + "1. **Bachelor's Degree**:\n", + " - Female customers have a higher maximum CLV ($73,225.96) compared to male customers ($67,907.27).\n", + " - The median CLV is slightly higher for female customers ($5,640.51) compared to male customers ($5,548.03).\n", + "\n", + "2. **College**:\n", + " - Female customers have a higher maximum CLV ($61,850.19) compared to male customers ($61,134.68).\n", + " - The median CLV is higher for male customers ($6,005.85) compared to female customers ($5,623.61).\n", + "\n", + "3. 
**Doctorate**:\n", + " - Female customers have a higher maximum CLV ($44,856.11) compared to male customers ($32,677.34).\n", + " - The median CLV is slightly higher for male customers ($5,577.67) compared to female customers ($5,332.46).\n", + "\n", + "4. **High School or Below**:\n", + " - Male customers have a significantly higher maximum CLV ($83,325.38) compared to female customers ($55,277.45).\n", + " - The median CLV is higher for male customers ($6,286.73) compared to female customers ($6,039.55).\n", + "\n", + "5. **Master's Degree**:\n", + " - Female customers have a slightly higher maximum CLV ($51,016.07) compared to male customers ($50,568.26).\n", + " - The median CLV is slightly higher for female customers ($5,729.86) compared to male customers ($5,579.10).\n", + "\n", + "### Conclusions:\n", + "- **High School or Below**: Male customers in this education category have the highest maximum CLV ($83,325.38) among all groups, which shows that the segment includes at least one very high-value customer. They also have the highest median CLV ($6,286.73).\n", + "- **Bachelor's Degree**: Female customers with a Bachelor's degree have the highest maximum CLV ($73,225.96) among female customers, although a maximum reflects a single customer rather than the segment as a whole.\n", + "- **Doctorate**: Female customers with a doctoral degree have a higher maximum CLV than their male counterparts, but the median CLV is higher for males.\n", + "- **College and Master's Degree**: The gender gap in maximum CLV is small (under $1,000) at both levels. 
The median gap is also small for Master's (about $151), but College shows the largest median gap of any education level (about $382, in favor of males).\n", + "\n", + "### Marketing Implications:\n", + "- The insurance company might consider focusing marketing efforts on male customers with a high school education or below, as they have both the highest maximum and the highest median CLV.\n", + "- Female customers with a Bachelor's degree also represent a valuable segment and could be targeted with tailored marketing campaigns.\n", + "- Understanding these differences can 
help in designing more effective marketing strategies and customer retention programs to maximize the lifetime value of different customer segments." + ] + }, + { + "cell_type": "markdown", + "id": "b42999f9-311f-481e-ae63-40a5577072c5", + "metadata": { + "id": "b42999f9-311f-481e-ae63-40a5577072c5" + }, + "source": [ + "## Bonus" + ] + }, + { + "cell_type": "markdown", + "id": "81ff02c5-6584-4f21-a358-b918697c6432", + "metadata": { + "id": "81ff02c5-6584-4f21-a358-b918697c6432" + }, + "source": [ + "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "4f175f4e", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "81ff02c5-6584-4f21-a358-b918697c6432", - "metadata": { - "id": "81ff02c5-6584-4f21-a358-b918697c6432" - }, - "source": [ - "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Month 1 2\n", + "State \n", + "Arizona 1008 929\n", + "California 1918 1634\n", + "Nevada 551 442\n", + "Oregon 1565 1344\n", + "Washington 463 425" ] - }, + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Parse the date column; the CSV stores two-digit years (e.g. 1/24/11), so use %y\n", + "df['Effective To Date'] = pd.to_datetime(df['Effective To Date'], format='%m/%d/%y')\n", + "\n", + "# Extract the month from the date column\n", + "df['Month'] = df['Effective To Date'].dt.month\n", + "\n", + "# Group by 'State' and 'Month' and count the number of policies sold\n", + "policies_by_state_month = df.groupby(['State', 'Month']).size().reset_index(name='Policy Count')\n", + "\n", + "# Pivot the table to get months as columns and states as rows\n", + "pivot_table = policies_by_state_month.pivot(index='State', columns='Month', values='Policy Count').fillna(0)\n", + "\n", + "# Display the result\n", + "pivot_table" + ] + }, + { + "cell_type": "markdown", + "id": "b6aec097-c633-4017-a125-e77a97259cda", + "metadata": { + "id": "b6aec097-c633-4017-a125-e77a97259cda" + }, + "source": [ + "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", + "\n", + "*Hint:*\n", + "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. 
Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", + "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", + "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "327c08b5", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "b6aec097-c633-4017-a125-e77a97259cda", - "metadata": { - "id": "b6aec097-c633-4017-a125-e77a97259cda" - }, - "source": [ - "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", - "\n", - "*Hint:*\n", - "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", - "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", - "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Month 1 2\n", + "State \n", + "Arizona 1008 929\n", + "California 1918 1634\n", + "Oregon 1565 1344" ] - }, + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Extract the month from the date column\n", + "df['Month'] = df['Effective To Date'].dt.month\n", + "\n", + "# Group by 'State' and 'Month' and count the number of policies sold\n", + "policies_by_state_month = df.groupby(['State', 'Month']).size().reset_index(name='Policy Count')\n", + "\n", + "# Sum the total number of policies sold by state\n", + "total_policies_by_state = policies_by_state_month.groupby('State')['Policy Count'].sum().reset_index()\n", + "\n", + "# Sort the states by the total number of policies sold in descending order\n", + "top_states = total_policies_by_state.sort_values(by='Policy Count', ascending=False).head(3)\n", + "\n", + "# Filter the original grouped data to include only the top 3 states\n", + "top_states_policies = policies_by_state_month[policies_by_state_month['State'].isin(top_states['State'])]\n", + "\n", + "# Pivot the table to get months as columns and states as rows\n", + "pivot_table_top_states = top_states_policies.pivot(index='State', columns='Month', values='Policy Count').fillna(0)\n", + "\n", + "# Display the result\n", + "pivot_table_top_states" + ] + }, + { + "cell_type": "markdown", + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", + "metadata": { + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" + }, + "source": [ + "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", + "\n", + "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "933238fd", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", - "metadata": { - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" - }, - "source": [ - "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", - "\n", - "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." + "data": { + "text/plain": [ + "array(['Agent', 'Call Center', 'Branch', 'Web'], dtype=object)" ] - }, + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['Sales Channel'].unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "262da406", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", - "metadata": { - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" - }, - "source": [ - "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + "data": { + "text/plain": [ + "Unnamed: 0 int64\n", + "Customer object\n", + "State object\n", + "Customer Lifetime Value float64\n", + "Response object\n", + "Coverage object\n", + "Education object\n", + "Effective To Date datetime64[ns]\n", + "EmploymentStatus object\n", + "Gender object\n", + "Income int64\n", + "Location Code object\n", + "Marital Status object\n", + "Monthly Premium Auto int64\n", + "Months Since Last Claim float64\n", + "Months Since Policy Inception int64\n", + "Number of Open Complaints float64\n", + "Number of Policies int64\n", + "Policy Type object\n", + "Policy object\n", + "Renew Offer Type object\n", + "Sales Channel object\n", + "Total Claim Amount float64\n", + "Vehicle Class object\n", + "Vehicle Size object\n", + "Vehicle Type object\n", + "Month int32\n", + 
"dtype: object" ] - }, + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "88eea386", + "metadata": {}, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "id": "449513f4-0459-46a0-a18d-9398d974c9ad", - "metadata": { - "id": "449513f4-0459-46a0-a18d-9398d974c9ad" - }, - "outputs": [], - "source": [ - "# your code goes here" + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Sales Channel Response Count Total Count Response Rate\n", + "0 Agent 742 4121 0.180053\n", + "1 Branch 326 3022 0.107876\n", + "2 Call Center 221 2141 0.103223\n", + "3 Web 177 1626 0.108856" ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" - }, - "colab": { - "provenance": [] - } + ], + "source": [ + "# Ensure the 'Response' column is in the correct format\n", + "df['Response'] = df['Response'].astype(str)\n", + "\n", + "# Identify the marketing channel column\n", + "marketing_channel_column = 'Sales Channel'\n", + "\n", + "# Filter the data to include only customers who responded \"Yes\"\n", + "responded_yes_df = df[df['Response'] == 'Yes']\n", + "\n", + "# Calculate the response count for each marketing channel\n", + "response_count = responded_yes_df.groupby(marketing_channel_column).size().reset_index(name='Response Count')\n", + "\n", + "# Calculate the total number of customers for each marketing channel\n", + "total_customers = df.groupby(marketing_channel_column).size().reset_index(name='Total Count')\n", + "\n", + "# Merge the response count and total count dataframes\n", + "merged_df = pd.merge(response_count, total_customers, on=marketing_channel_column)\n", + "\n", + "# Calculate the response rate\n", + "merged_df['Response Rate'] = merged_df['Response Count'] / merged_df['Total Count']\n", + "\n", + "# Display the result\n", + "merged_df" + ] + }, + { + "cell_type": "markdown", + "id": "ee462336", + "metadata": {}, + "source": [ + "Here are some conclusions regarding the effectiveness of different marketing 
channels on the customer response rate:\n", + "\n", + "### Data Summary:\n", + "- **Agent**:\n", + " - Response Count: 742\n", + " - Total Count: 4121\n", + " - Response Rate: 18.01%\n", + "\n", + "- **Branch**:\n", + " - Response Count: 326\n", + " - Total Count: 3022\n", + " - Response Rate: 10.79%\n", + "\n", + "- **Call Center**:\n", + " - Response Count: 221\n", + " - Total Count: 2141\n", + " - Response Rate: 10.32%\n", + "\n", + "- **Web**:\n", + " - Response Count: 177\n", + " - Total Count: 1626\n", + " - Response Rate: 10.89%\n", + "\n", + "### Conclusions:\n", + "1. **Agent Channel**:\n", + " - The Agent channel has the highest response rate at 18.01%.\n", + " - This indicates that customers are more likely to respond positively when contacted through an agent compared to other channels.\n", + " - The high response rate suggests that personal interaction through agents is effective in engaging customers.\n", + "\n", + "2. **Branch Channel**:\n", + " - The Branch channel has a response rate of 10.79%.\n", + " - While lower than the Agent channel, it is still relatively effective.\n", + " - This suggests that in-person interactions at branches can also be a good way to engage customers, though not as effective as agents.\n", + "\n", + "3. **Call Center Channel**:\n", + " - The Call Center channel has a response rate of 10.32%.\n", + " - This is slightly lower than the Branch channel.\n", + " - It indicates that while call centers are useful, they might not be as effective as personal or in-person interactions.\n", + "\n", + "4. 
**Web Channel**:\n", + " - The Web channel has a response rate of 10.89%.\n", + " - This is comparable to the Branch and Call Center channels, and marginally higher than both.\n", + " - It suggests that online interactions are about as effective as in-person and call center interactions, but still less effective than agent interactions.\n", + "\n", + "### Marketing Implications:\n", + "- **Focus on Agent Channel**: Given that its response rate is the highest of the four channels, the marketing team should consider investing more in the Agent channel. Training and expanding the agent network could yield higher customer engagement and response rates.\n", + "- **Enhance Branch and Web Channels**: Since the Branch and Web channels have similar response rates, efforts to improve customer experience in these channels could help increase their effectiveness. This could include better online tools, more personalized in-branch services, and targeted marketing campaigns.\n", + "- **Optimize Call Center Operations**: While the Call Center channel has the lowest response rate, it still accounts for 2,141 customers. Improving call scripts, training call center staff, and using data analytics to target calls more effectively could help improve response rates.\n", + "\n", + "Overall, the data suggest that the Agent channel is the most effective way to engage customers (18.01%). The Web (10.89%), Branch (10.79%), and Call Center (10.32%) channels trail well behind and differ from one another by less than one percentage point." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", + "metadata": { + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" + }, + "source": [ + "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}