diff --git a/lab-dw-aggregating.ipynb b/lab-dw-aggregating.ipynb index fff3ae5..aeb1b10 100644 --- a/lab-dw-aggregating.ipynb +++ b/lab-dw-aggregating.ipynb @@ -1,161 +1,1587 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "31969215-2a90-4d8b-ac36-646a7ae13744", - "metadata": { - "id": "31969215-2a90-4d8b-ac36-646a7ae13744" - }, - "source": [ - "# Lab | Data Aggregation and Filtering" - ] - }, - { - "cell_type": "markdown", - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", - "metadata": { - "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" - }, - "source": [ - "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", - "\n", - "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", - "\n", - "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." - ] - }, - { - "cell_type": "markdown", - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", - "metadata": { - "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" - }, - "source": [ - "1. Create a new DataFrame that only includes customers who have a total_claim_amount greater than $1,000 and have a response of \"Yes\" to the last marketing campaign." - ] - }, - { - "cell_type": "markdown", - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", - "metadata": { - "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" - }, - "source": [ - "2. Using the original Dataframe, analyze the average total_claim_amount by each policy type and gender for customers who have responded \"Yes\" to the last marketing campaign. Write your conclusions." + "cells": [ + { + "cell_type": "markdown", + "id": "31969215-2a90-4d8b-ac36-646a7ae13744", + "metadata": { + "id": "31969215-2a90-4d8b-ac36-646a7ae13744" + }, + "source": [ + "# Lab | Data Aggregation and Filtering" + ] + }, + { + "cell_type": "markdown", + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d", + "metadata": { + "id": "a8f08a52-bec0-439b-99cc-11d3809d8b5d" + }, + "source": [ + "In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:\n", + "\n", + "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\n", + "\n", + "This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring." + ] + }, + { + "cell_type": "markdown", + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50", + "metadata": { + "id": "9c98ddc5-b041-4c94-ada1-4dfee5c98e50" + }, + "source": [ + "1. Create a new DataFrame that only includes customers who have a total_claim_amount greater than $1,000 and have a response of \"Yes\" to the last marketing campaign." + ] + }, + { + "cell_type": "markdown", + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3", + "metadata": { + "id": "b9be383e-5165-436e-80c8-57d4c757c8c3" + }, + "source": [ + "2. Using the original Dataframe, analyze the average total_claim_amount by each policy type and gender for customers who have responded \"Yes\" to the last marketing campaign. Write your conclusions." + ] + }, + { + "cell_type": "markdown", + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", + "metadata": { + "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" + }, + "source": [ + "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + ] + }, + { + "cell_type": "markdown", + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", + "metadata": { + "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" + }, + "source": [ + "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + ] + }, + { + "cell_type": "markdown", + "id": "b42999f9-311f-481e-ae63-40a5577072c5", + "metadata": { + "id": "b42999f9-311f-481e-ae63-40a5577072c5" + }, + "source": [ + "## Bonus" + ] + }, + { + "cell_type": "markdown", + "id": "81ff02c5-6584-4f21-a358-b918697c6432", + "metadata": { + "id": "81ff02c5-6584-4f21-a358-b918697c6432" + }, + "source": [ + "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + ] + }, + { + "cell_type": "markdown", + "id": "b6aec097-c633-4017-a125-e77a97259cda", + "metadata": { + "id": "b6aec097-c633-4017-a125-e77a97259cda" + }, + "source": [ + "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", + "\n", + "*Hint:*\n", + "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", + "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", + "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + ] + }, + { + "cell_type": "markdown", + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", + "metadata": { + "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" + }, + "source": [ + "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", + "\n", + "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." + ] + }, + { + "cell_type": "markdown", + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", + "metadata": { + "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" + }, + "source": [ + "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "449513f4-0459-46a0-a18d-9398d974c9ad", + "metadata": { + "id": "449513f4-0459-46a0-a18d-9398d974c9ad" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0CustomerStateCustomer Lifetime ValueResponseCoverageEducationEffective To DateEmploymentStatusGender...Number of Open ComplaintsNumber of PoliciesPolicy TypePolicyRenew Offer TypeSales ChannelTotal Claim AmountVehicle ClassVehicle SizeVehicle Type
00DK49336Arizona4809.216960NoBasicCollege2/18/11EmployedM...0.09Corporate AutoCorporate L3Offer3Agent292.800000Four-Door CarMedsizeNaN
11KX64629California2228.525238NoBasicCollege1/18/11UnemployedF...0.01Personal AutoPersonal L3Offer4Call Center744.924331Four-Door CarMedsizeNaN
22LZ68649Washington14947.917300NoBasicBachelor2/10/11EmployedM...0.02Personal AutoPersonal L3Offer3Call Center480.000000SUVMedsizeA
33XL78013Oregon22332.439460YesExtendedCollege1/11/11EmployedM...0.02Corporate AutoCorporate L3Offer2Branch484.013411Four-Door CarMedsizeA
44QA50777Oregon9025.067525NoPremiumBachelor1/17/11Medical LeaveF...NaN7Personal AutoPersonal L2Offer1Branch707.925645Four-Door CarMedsizeNaN
\n", + "

5 rows × 26 columns

\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "0 0 DK49336 Arizona 4809.216960 No \n", + "1 1 KX64629 California 2228.525238 No \n", + "2 2 LZ68649 Washington 14947.917300 No \n", + "3 3 XL78013 Oregon 22332.439460 Yes \n", + "4 4 QA50777 Oregon 9025.067525 No \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus Gender ... \\\n", + "0 Basic College 2/18/11 Employed M ... \n", + "1 Basic College 1/18/11 Unemployed F ... \n", + "2 Basic Bachelor 2/10/11 Employed M ... \n", + "3 Extended College 1/11/11 Employed M ... \n", + "4 Premium Bachelor 1/17/11 Medical Leave F ... \n", + "\n", + " Number of Open Complaints Number of Policies Policy Type Policy \\\n", + "0 0.0 9 Corporate Auto Corporate L3 \n", + "1 0.0 1 Personal Auto Personal L3 \n", + "2 0.0 2 Personal Auto Personal L3 \n", + "3 0.0 2 Corporate Auto Corporate L3 \n", + "4 NaN 7 Personal Auto Personal L2 \n", + "\n", + " Renew Offer Type Sales Channel Total Claim Amount Vehicle Class \\\n", + "0 Offer3 Agent 292.800000 Four-Door Car \n", + "1 Offer4 Call Center 744.924331 Four-Door Car \n", + "2 Offer3 Call Center 480.000000 SUV \n", + "3 Offer2 Branch 484.013411 Four-Door Car \n", + "4 Offer1 Branch 707.925645 Four-Door Car \n", + "\n", + " Vehicle Size Vehicle Type \n", + "0 Medsize NaN \n", + "1 Medsize NaN \n", + "2 Medsize A \n", + "3 Medsize A \n", + "4 Medsize NaN \n", + "\n", + "[5 rows x 26 columns]" ] - }, - { - "cell_type": "markdown", - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0", - "metadata": { - "id": "7050f4ac-53c5-4193-a3c0-8699b87196f0" - }, - "source": [ - "3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers." + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "url = \"https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv\"\n", + "\n", + "df_1 = pd.read_csv(url)\n", + "\n", + "df_1.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "2ebb0ee8-cf2d-4afa-abbb-81600375cc20", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "189 189 OK31456 California 11009.130490 Yes \n", + "236 236 YJ16163 Oregon 11009.130490 Yes \n", + "419 419 GW43195 Oregon 25807.063000 Yes \n", + "442 442 IP94270 Arizona 13736.132500 Yes \n", + "587 587 FJ28407 California 5619.689084 Yes \n", + "... ... ... ... ... ... \n", + "10351 10351 FN44127 Oregon 3508.569533 Yes \n", + "10373 10373 XZ64172 Oregon 10963.957230 Yes \n", + "10487 10487 IX60941 Oregon 3508.569533 Yes \n", + "10565 10565 QO62792 Oregon 7840.165778 Yes \n", + "10708 10708 CK39096 Oregon 5619.689084 Yes \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus \\\n", + "189 Premium Bachelor 1/24/11 Employed \n", + "236 Premium Bachelor 1/24/11 Employed \n", + "419 Extended College 2/13/11 Employed \n", + "442 Premium Master 2/13/11 Disabled \n", + "587 Premium High School or Below 1/26/11 Unemployed \n", + "... ... ... ... ... \n", + "10351 Extended College 1/5/11 Medical Leave \n", + "10373 Premium High School or Below 2/8/11 Employed \n", + "10487 Extended College 1/5/11 Medical Leave \n", + "10565 Extended College 1/14/11 Employed \n", + "10708 Premium High School or Below 1/26/11 Unemployed \n", + "\n", + " Gender ... Number of Open Complaints Number of Policies \\\n", + "189 F ... 0.0 1 \n", + "236 F ... 0.0 1 \n", + "419 F ... 1.0 2 \n", + "442 F ... 0.0 8 \n", + "587 M ... 0.0 1 \n", + "... ... ... ... ... \n", + "10351 M ... 1.0 1 \n", + "10373 M ... 0.0 1 \n", + "10487 M ... 1.0 1 \n", + "10565 M ... 2.0 1 \n", + "10708 M ... 0.0 1 \n", + "\n", + " Policy Type Policy Renew Offer Type Sales Channel \\\n", + "189 Corporate Auto Corporate L3 Offer2 Agent \n", + "236 Special Auto Special L3 Offer2 Agent \n", + "419 Personal Auto Personal L2 Offer1 Branch \n", + "442 Personal Auto Personal L2 Offer1 Web \n", + "587 Personal Auto Personal L1 Offer2 Web \n", + "... ... ... ... ... \n", + "10351 Personal Auto Personal L2 Offer2 Branch \n", + "10373 Corporate Auto Corporate L2 Offer1 Agent \n", + "10487 Personal Auto Personal L3 Offer2 Branch \n", + "10565 Personal Auto Personal L3 Offer2 Agent \n", + "10708 Personal Auto Personal L3 Offer2 Web \n", + "\n", + " Total Claim Amount Vehicle Class Vehicle Size Vehicle Type \n", + "189 1358.400000 Luxury Car Medsize NaN \n", + "236 1358.400000 Luxury Car Medsize A \n", + "419 1027.200000 Luxury Car Small A \n", + "442 1261.319869 SUV Medsize A \n", + "587 1027.000029 SUV Medsize A \n", + "... ... ... ... ... \n", + "10351 1176.278800 Four-Door Car Small NaN \n", + "10373 1324.800000 Luxury SUV Medsize NaN \n", + "10487 1176.278800 Four-Door Car Small NaN \n", + "10565 1008.000000 NaN NaN NaN \n", + "10708 1027.000029 SUV Medsize A \n", + "\n", + "[67 rows x 26 columns]\n" + ] + } + ], + "source": [ + "filtered_df = df_1[(df_1['Total Claim Amount'] > 1000) & (df_1['Response'] == 'Yes')]\n", + "print(filtered_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "9e843c75-7350-4b40-b52e-3b96c444c391", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Aggregated Data:\n", + " Response Total Claim Amount Unnamed: 0 \\\n", + "0 No 0.099007 3663 \n", + "1 No 0.382107 6869 \n", + "2 No 0.423310 13169 \n", + "3 No 0.517753 7567 \n", + "4 No 0.769185 10529 \n", + "... ... ... ... \n", + "5161 Yes 1261.319869 31890 \n", + "5162 Yes 1294.700423 44458 \n", + "5163 Yes 1300.800000 34595 \n", + "5164 Yes 1324.800000 46968 \n", + "5165 Yes 1358.400000 28239 \n", + "\n", + " Customer \\\n", + "0 EY74093 \n", + "1 PN21042PN21042 \n", + "2 IS30186IS30186 \n", + "3 HG65722 \n", + "4 WW74234 \n", + "... ... \n", + "5161 IP94270ZA45956TC99043MO21402MC30750FD73388 \n", + "5162 ZW93288CG15505UP29839YU86361IV54766GD55093 \n", + "5163 MA15172UY18770QW47320ZR79373DT77901YP53257 \n", + "5164 JC11405WP58340TQ84956NZ60700XT93203WP58340XZ64172 \n", + "5165 OK31456YJ16163TA66375FH77504TY31217AJ89108TY31217 \n", + "\n", + " State \\\n", + "0 Nevada \n", + "1 ArizonaArizona \n", + "2 CaliforniaCalifornia \n", + "3 Oregon \n", + "4 Arizona \n", + "... ... \n", + "5161 ArizonaCaliforniaCaliforniaOregonOregonArizona \n", + "5162 CaliforniaCaliforniaNevadaWashingtonOregonCali... \n", + "5163 CaliforniaCaliforniaCaliforniaArizonaOregonOregon \n", + "5164 OregonArizonaCaliforniaWashingtonCaliforniaAri... \n", + "5165 CaliforniaOregonOregonCaliforniaWashingtonCali... \n", + "\n", + " Customer Lifetime Value \\\n", + "0 5004.135361 \n", + "1 9077.695636 \n", + "2 12372.995662 \n", + "3 12819.102890 \n", + "4 3969.433177 \n", + "... ... \n", + "5161 82416.795000 \n", + "5162 114965.939640 \n", + "5163 61078.302240 \n", + "5164 76747.700610 \n", + "5165 77063.913430 \n", + "\n", + " Coverage \\\n", + "0 Basic \n", + "1 BasicBasic \n", + "2 ExtendedExtended \n", + "3 Premium \n", + "4 Basic \n", + "... ... \n", + "5161 PremiumPremiumPremiumPremiumPremiumPremium \n", + "5162 ExtendedExtendedExtendedExtendedExtendedExtended \n", + "5163 PremiumPremiumPremiumPremiumPremiumPremium \n", + "5164 PremiumPremiumPremiumPremiumPremiumPremiumPremium \n", + "5165 PremiumPremiumPremiumPremiumPremiumPremiumPremium \n", + "\n", + " Education \\\n", + "0 High School or Below \n", + "1 BachelorBachelor \n", + "2 CollegeCollege \n", + "3 Doctor \n", + "4 College \n", + "... ... \n", + "5161 MasterMasterMasterMasterMasterMaster \n", + "5162 MasterMasterMasterMasterMasterMaster \n", + "5163 BachelorBachelorBachelorBachelorBachelorBachelor \n", + "5164 High School or BelowHigh School or BelowHigh S... \n", + "5165 BachelorBachelorBachelorBachelorBachelorBachel... \n", + "\n", + " Effective To Date \\\n", + "0 2/21/11 \n", + "1 2/17/112/17/11 \n", + "2 1/31/111/31/11 \n", + "3 1/25/11 \n", + "4 2/5/11 \n", + "... ... \n", + "5161 2/13/112/13/112/13/112/13/112/13/112/13/11 \n", + "5162 2/20/112/20/112/20/112/20/112/20/112/20/11 \n", + "5163 1/5/111/5/111/5/111/5/111/5/111/5/11 \n", + "5164 2/8/112/8/112/8/112/8/112/8/112/8/112/8/11 \n", + "5165 1/24/111/24/111/24/111/24/111/24/111/24/111/24/11 \n", + "\n", + " EmploymentStatus ... \\\n", + "0 Employed ... \n", + "1 EmployedEmployed ... \n", + "2 EmployedEmployed ... \n", + "3 Disabled ... \n", + "4 Employed ... \n", + "... ... ... \n", + "5161 DisabledDisabledDisabledDisabledDisabledDisabled ... \n", + "5162 EmployedEmployedEmployedEmployedEmployedEmployed ... \n", + "5163 RetiredRetiredRetiredRetiredRetiredRetired ... \n", + "5164 EmployedEmployedEmployedEmployedEmployedEmploy... ... \n", + "5165 EmployedEmployedEmployedEmployedEmployedEmploy... ... \n", + "\n", + " Months Since Policy Inception Number of Open Complaints \\\n", + "0 93 0.0 \n", + "1 124 0.0 \n", + "2 22 0.0 \n", + "3 4 3.0 \n", + "4 29 0.0 \n", + "... ... ... \n", + "5161 474 0.0 \n", + "5162 354 0.0 \n", + "5163 240 0.0 \n", + "5164 637 0.0 \n", + "5165 301 0.0 \n", + "\n", + " Number of Policies Policy Type \\\n", + "0 8 Corporate Auto \n", + "1 2 Personal AutoPersonal Auto \n", + "2 6 Personal AutoPersonal Auto \n", + "3 3 Personal Auto \n", + "4 1 Personal Auto \n", + "... ... ... \n", + "5161 48 Personal AutoPersonal AutoPersonal AutoPersona... \n", + "5162 42 Personal AutoPersonal AutoPersonal AutoPersona... \n", + "5163 6 Personal AutoPersonal AutoPersonal AutoPersona... \n", + "5164 7 Personal AutoPersonal AutoCorporate AutoPerson... \n", + "5165 7 Corporate AutoSpecial AutoCorporate AutoPerson... \n", + "\n", + " Policy \\\n", + "0 Corporate L1 \n", + "1 Personal L3Personal L3 \n", + "2 Personal L2Personal L2 \n", + "3 Personal L2 \n", + "4 Personal L3 \n", + "... ... \n", + "5161 Personal L2Personal L2Personal L2Personal L3Pe... \n", + "5162 Personal L1Personal L3Personal L1Personal L1Pe... \n", + "5163 Personal L2Personal L3Personal L1Personal L3Pe... \n", + "5164 Personal L3Personal L2Corporate L3Personal L2P... \n", + "5165 Corporate L3Special L3Corporate L2Personal L3P... \n", + "\n", + " Renew Offer Type \\\n", + "0 Offer4 \n", + "1 Offer1Offer1 \n", + "2 Offer2Offer2 \n", + "3 Offer1 \n", + "4 Offer1 \n", + "... ... \n", + "5161 Offer1Offer1Offer1Offer1Offer1Offer1 \n", + "5162 Offer2Offer2Offer2Offer2Offer2Offer2 \n", + "5163 Offer1Offer1Offer1Offer1Offer1Offer1 \n", + "5164 Offer1Offer1Offer1Offer1Offer1Offer1Offer1 \n", + "5165 Offer2Offer2Offer2Offer2Offer2Offer2Offer2 \n", + "\n", + " Sales Channel \\\n", + "0 Agent \n", + "1 Call CenterCall Center \n", + "2 AgentAgent \n", + "3 Branch \n", + "4 Agent \n", + "... ... \n", + "5161 WebWebWebWebWebWeb \n", + "5162 AgentAgentAgentAgentAgentAgent \n", + "5163 AgentAgentAgentAgentAgentAgent \n", + "5164 AgentAgentAgentAgentAgentAgentAgent \n", + "5165 AgentAgentAgentAgentAgentAgentAgent \n", + "\n", + " Vehicle Class \\\n", + "0 Four-Door Car \n", + "1 Sports CarSports Car \n", + "2 Four-Door Car \n", + "3 SUV \n", + "4 SUV \n", + "... ... \n", + "5161 SUVSUVSUVSUVSUVSUV \n", + "5162 Luxury SUVLuxury SUVLuxury SUVLuxury SUVLuxury... \n", + "5163 Luxury SUVLuxury SUVLuxury SUVLuxury SUVLuxury... \n", + "5164 Luxury SUVLuxury SUVLuxury SUVLuxury SUVLuxury... \n", + "5165 Luxury CarLuxury CarLuxury CarLuxury CarLuxury... \n", + "\n", + " Vehicle Size Vehicle Type \n", + "0 Medsize A \n", + "1 MedsizeMedsize 0 \n", + "2 Medsize A \n", + "3 Medsize 0 \n", + "4 Medsize A \n", + "... ... ... \n", + "5161 MedsizeMedsizeMedsizeMedsizeMedsizeMedsize AA \n", + "5162 MedsizeMedsizeMedsizeMedsizeMedsizeMedsize AAA \n", + "5163 MedsizeMedsizeMedsizeMedsizeMedsizeMedsize AA \n", + "5164 MedsizeMedsizeMedsizeMedsizeMedsizeMedsizeMedsize AAAA \n", + "5165 MedsizeMedsizeMedsizeMedsizeMedsizeMedsize AAAAAA \n", + "\n", + "[5166 rows x 26 columns]\n" + ] + } + ], + "source": [ + "aggregated = df_1.groupby(['Response', 'Total Claim Amount']).sum().reset_index()\n", + "\n", + "print(\"Aggregated Data:\")\n", + "print(aggregated)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "c6932a21-89cf-4bcb-b4be-d8ea66722ef6", + "metadata": {}, + "outputs": [], + "source": [ + "conditional_1 = df_1[\"Total Claim Amount\"] > 1000\n", + "conditional_2 = df_1[\"Response\"] == \"Yes\"\n", + "\n", + "df_1 = [conditional_1 & conditional_2]" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "1cf24d63-4110-4449-b69d-866648ea75c2", + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "list indices must be integers or slices, not str", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[45], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m avg_claim \u001b[38;5;241m=\u001b[39m df_1[df_1[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mResponse\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mYes\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39mgroupby([\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mPolicy Type\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mGender\u001b[39m\u001b[38;5;124m'\u001b[39m])[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mTotal Claim Amount\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39mmean()\n\u001b[0;32m 2\u001b[0m \u001b[38;5;28mprint\u001b[39m(avg_claim)\n", + "\u001b[1;31mTypeError\u001b[0m: list indices must be integers or slices, not str" + ] + } + ], + "source": [ + "avg_claim = df_1[df_1['Response'] == 'Yes'].groupby(['Policy Type', 'Gender'])['Total Claim Amount'].mean()\n", + "print(avg_claim)" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "695ae159-95a6-4df1-866c-db3fe243a596", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0 False\n", + " 1 False\n", + " 2 False\n", + " 3 False\n", + " 4 False\n", + " ... \n", + " 10905 False\n", + " 10906 False\n", + " 10907 False\n", + " 10908 False\n", + " 10909 False\n", + " Length: 10910, dtype: bool]" ] - }, - { - "cell_type": "markdown", - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d", - "metadata": { - "id": "b60a4443-a1a7-4bbf-b78e-9ccdf9895e0d" - }, - "source": [ - "4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions." + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_1" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "fee80761-57dc-45ba-87a6-54b6756f997c", + "metadata": {}, + "outputs": [ + { + "ename": "AttributeError", + "evalue": "'list' object has no attribute 'groupby'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[65], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m df_1\u001b[38;5;241m.\u001b[39mgroupby([\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mResponse\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mPolicy Type\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mGender\u001b[39m\u001b[38;5;124m\"\u001b[39m])[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mTotal Claim Amount\u001b[39m\u001b[38;5;124m\"\u001b[39m]\u001b[38;5;241m.\u001b[39mmean()\n", + "\u001b[1;31mAttributeError\u001b[0m: 'list' object has no attribute 'groupby'" + ] + } + ], + "source": [ + "df_1.groupby([\"Response\", \"Policy Type\", \"Gender\"])[\"Total Claim Amount\"].mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "5d194915-3026-49e5-bf7b-9d4506079947", + "metadata": {}, + "outputs": [ + { + "ename": "AttributeError", + "evalue": "'list' object has no attribute 'groupby'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[41], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m customers_by_state \u001b[38;5;241m=\u001b[39m df_1\u001b[38;5;241m.\u001b[39mgroupby(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mState\u001b[39m\u001b[38;5;124m'\u001b[39m)\u001b[38;5;241m.\u001b[39msize()\n\u001b[0;32m 2\u001b[0m filtered_states \u001b[38;5;241m=\u001b[39m customers_by_state[customers_by_state \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m500\u001b[39m]\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28mprint\u001b[39m(filtered_states)\n", + "\u001b[1;31mAttributeError\u001b[0m: 'list' object has no attribute 'groupby'" + ] + } + ], + "source": [ + "customers_by_state = df_1.groupby('State').size()\n", + "filtered_states = customers_by_state[customers_by_state > 500]\n", + "print(filtered_states)" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "id": "684d1e53-8a87-4620-bdc6-a2995637387c", + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "list indices must be integers or slices, not str", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[75], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m df_1[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mState\u001b[39m\u001b[38;5;124m\"\u001b[39m]\u001b[38;5;241m.\u001b[39mfilter()\n", + "\u001b[1;31mTypeError\u001b[0m: list indices must be integers or slices, not str" + ] + } + ], + "source": [ + "df_1[\"State\"].filter()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2dd5abc4-926a-47bc-84da-b5bb56f7afaf", + "metadata": {}, + "outputs": [], + "source": [ + "df.groupby([\"Pclass\", \"Sex\"]).agg({\"Fare\": [\"min\", \"mean\", \"max\"]})" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "fee9149d-71cd-4627-b12b-41d4e4e561b9", + "metadata": {}, + "outputs": [ + { + "ename": "AttributeError", + "evalue": "'list' object has no attribute 'groupby'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[39], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m clv_stats \u001b[38;5;241m=\u001b[39m df_1\u001b[38;5;241m.\u001b[39mgroupby([\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mEducation\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mGender\u001b[39m\u001b[38;5;124m'\u001b[39m])[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mCustomer Lifetime Value\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39magg([\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmax\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmin\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmedian\u001b[39m\u001b[38;5;124m'\u001b[39m])\n\u001b[0;32m 2\u001b[0m \u001b[38;5;28mprint\u001b[39m(clv_stats)\n", + "\u001b[1;31mAttributeError\u001b[0m: 'list' object has no attribute 'groupby'" + ] + } + ], + "source": [ + "clv_stats = df_1.groupby(['Education', 'Gender'])['Customer Lifetime Value'].agg(['max', 'min', 'median'])\n", + "print(clv_stats)" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "id": "f601f7c4-ffd4-4636-ad85-748d2623bdf8", + "metadata": {}, + "outputs": [ + { + "ename": "AttributeError", + "evalue": "'list' object has no attribute 'groupby'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[79], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m df_1\u001b[38;5;241m.\u001b[39mgroupby([\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mGender\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mEducation\u001b[39m\u001b[38;5;124m\"\u001b[39m])\u001b[38;5;241m.\u001b[39magg({\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCustomer Lifetime Value\u001b[39m\u001b[38;5;124m\"\u001b[39m: [\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmin\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmean\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmax\u001b[39m\u001b[38;5;124m\"\u001b[39m]})\n", + "\u001b[1;31mAttributeError\u001b[0m: 'list' object has no attribute 'groupby'" + ] + } + ], + "source": [ + "df_1.groupby([\"Gender\", \"Education\"]).agg({\"Customer Lifetime Value\": [\"min\", \"mean\", \"max\"]})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "7c02315b-3c66-4f41-af19-ae7cc08e9a5a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0CustomerStateCustomer Lifetime ValueResponseCoverageEducationEffective To DateEmploymentStatusGender...Number of Open ComplaintsNumber of PoliciesPolicy TypePolicyRenew Offer TypeSales ChannelTotal Claim AmountVehicle ClassVehicle SizeVehicle Type
00DK49336Arizona4809.216960NoBasicCollege2/18/11EmployedM...0.09Corporate AutoCorporate L3Offer3Agent292.800000Four-Door CarMedsizeNaN
11KX64629California2228.525238NoBasicCollege1/18/11UnemployedF...0.01Personal AutoPersonal L3Offer4Call Center744.924331Four-Door CarMedsizeNaN
22LZ68649Washington14947.917300NoBasicBachelor2/10/11EmployedM...0.02Personal AutoPersonal L3Offer3Call Center480.000000SUVMedsizeA
33XL78013Oregon22332.439460YesExtendedCollege1/11/11EmployedM...0.02Corporate AutoCorporate L3Offer2Branch484.013411Four-Door CarMedsizeA
44QA50777Oregon9025.067525NoPremiumBachelor1/17/11Medical LeaveF...NaN7Personal AutoPersonal L2Offer1Branch707.925645Four-Door CarMedsizeNaN
\n", + "

5 rows × 26 columns

\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "0 0 DK49336 Arizona 4809.216960 No \n", + "1 1 KX64629 California 2228.525238 No \n", + "2 2 LZ68649 Washington 14947.917300 No \n", + "3 3 XL78013 Oregon 22332.439460 Yes \n", + "4 4 QA50777 Oregon 9025.067525 No \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus Gender ... \\\n", + "0 Basic College 2/18/11 Employed M ... \n", + "1 Basic College 1/18/11 Unemployed F ... \n", + "2 Basic Bachelor 2/10/11 Employed M ... \n", + "3 Extended College 1/11/11 Employed M ... \n", + "4 Premium Bachelor 1/17/11 Medical Leave F ... \n", + "\n", + " Number of Open Complaints Number of Policies Policy Type Policy \\\n", + "0 0.0 9 Corporate Auto Corporate L3 \n", + "1 0.0 1 Personal Auto Personal L3 \n", + "2 0.0 2 Personal Auto Personal L3 \n", + "3 0.0 2 Corporate Auto Corporate L3 \n", + "4 NaN 7 Personal Auto Personal L2 \n", + "\n", + " Renew Offer Type Sales Channel Total Claim Amount Vehicle Class \\\n", + "0 Offer3 Agent 292.800000 Four-Door Car \n", + "1 Offer4 Call Center 744.924331 Four-Door Car \n", + "2 Offer3 Call Center 480.000000 SUV \n", + "3 Offer2 Branch 484.013411 Four-Door Car \n", + "4 Offer1 Branch 707.925645 Four-Door Car \n", + "\n", + " Vehicle Size Vehicle Type \n", + "0 Medsize NaN \n", + "1 Medsize NaN \n", + "2 Medsize A \n", + "3 Medsize A \n", + "4 Medsize NaN \n", + "\n", + "[5 rows x 26 columns]" ] - }, - { - "cell_type": "markdown", - "id": "b42999f9-311f-481e-ae63-40a5577072c5", - "metadata": { - "id": "b42999f9-311f-481e-ae63-40a5577072c5" - }, - "source": [ - "## Bonus" + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_1.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b8100f10-a688-4bf7-8089-9935de26ea33", + "metadata": {}, + "outputs": [], + "source": [ + "#Bonus: Analyze the number of policies sold by state and month and unstack" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "id": "f3c3ffc3-0774-4655-b934-24a3433c6c5f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'2'" ] - }, - { - "cell_type": "markdown", - "id": "81ff02c5-6584-4f21-a358-b918697c6432", - "metadata": { - "id": "81ff02c5-6584-4f21-a358-b918697c6432" - }, - "source": [ - "5. The marketing team wants to analyze the number of policies sold by state and month. Present the data in a table where the months are arranged as columns and the states are arranged as rows." + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_1[\"Effective To Date\"][0][0:1]" + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "7475244d-474a-4ca9-8b50-4140a7f4a919", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2\n", + "1 1\n", + "2 2\n", + "3 1\n", + "4 1\n", + " ..\n", + "10905 1\n", + "10906 1\n", + "10907 2\n", + "10908 2\n", + "10909 1\n", + "Name: Month, Length: 10910, dtype: object" ] - }, - { - "cell_type": "markdown", - "id": "b6aec097-c633-4017-a125-e77a97259cda", - "metadata": { - "id": "b6aec097-c633-4017-a125-e77a97259cda" - }, - "source": [ - "6. Display a new DataFrame that contains the number of policies sold by month, by state, for the top 3 states with the highest number of policies sold.\n", - "\n", - "*Hint:*\n", - "- *To accomplish this, you will first need to group the data by state and month, then count the number of policies sold for each group. Afterwards, you will need to sort the data by the count of policies sold in descending order.*\n", - "- *Next, you will select the top 3 states with the highest number of policies sold.*\n", - "- *Finally, you will create a new DataFrame that contains the number of policies sold by month for each of the top 3 states.*" + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_1['Month'] = df_1['Effective To Date'].str[:1]\n", + "df_1[\"Month\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "id": "67a8cff1-e28e-4ffb-b5d1-a895eeefe243", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0CustomerStateCustomer Lifetime ValueResponseCoverageEducationEffective To DateEmploymentStatusGender...Number of PoliciesPolicy TypePolicyRenew Offer TypeSales ChannelTotal Claim AmountVehicle ClassVehicle SizeVehicle TypeMonth
00DK49336Arizona4809.216960NoBasicCollege2/18/11EmployedM...9Corporate AutoCorporate L3Offer3Agent292.800000Four-Door CarMedsizeNaN2
11KX64629California2228.525238NoBasicCollege1/18/11UnemployedF...1Personal AutoPersonal L3Offer4Call Center744.924331Four-Door CarMedsizeNaN1
22LZ68649Washington14947.917300NoBasicBachelor2/10/11EmployedM...2Personal AutoPersonal L3Offer3Call Center480.000000SUVMedsizeA2
33XL78013Oregon22332.439460YesExtendedCollege1/11/11EmployedM...2Corporate AutoCorporate L3Offer2Branch484.013411Four-Door CarMedsizeA1
44QA50777Oregon9025.067525NoPremiumBachelor1/17/11Medical LeaveF...7Personal AutoPersonal L2Offer1Branch707.925645Four-Door CarMedsizeNaN1
\n", + "

5 rows × 27 columns

\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 Customer State Customer Lifetime Value Response \\\n", + "0 0 DK49336 Arizona 4809.216960 No \n", + "1 1 KX64629 California 2228.525238 No \n", + "2 2 LZ68649 Washington 14947.917300 No \n", + "3 3 XL78013 Oregon 22332.439460 Yes \n", + "4 4 QA50777 Oregon 9025.067525 No \n", + "\n", + " Coverage Education Effective To Date EmploymentStatus Gender ... \\\n", + "0 Basic College 2/18/11 Employed M ... \n", + "1 Basic College 1/18/11 Unemployed F ... \n", + "2 Basic Bachelor 2/10/11 Employed M ... \n", + "3 Extended College 1/11/11 Employed M ... \n", + "4 Premium Bachelor 1/17/11 Medical Leave F ... \n", + "\n", + " Number of Policies Policy Type Policy Renew Offer Type \\\n", + "0 9 Corporate Auto Corporate L3 Offer3 \n", + "1 1 Personal Auto Personal L3 Offer4 \n", + "2 2 Personal Auto Personal L3 Offer3 \n", + "3 2 Corporate Auto Corporate L3 Offer2 \n", + "4 7 Personal Auto Personal L2 Offer1 \n", + "\n", + " Sales Channel Total Claim Amount Vehicle Class Vehicle Size \\\n", + "0 Agent 292.800000 Four-Door Car Medsize \n", + "1 Call Center 744.924331 Four-Door Car Medsize \n", + "2 Call Center 480.000000 SUV Medsize \n", + "3 Branch 484.013411 Four-Door Car Medsize \n", + "4 Branch 707.925645 Four-Door Car Medsize \n", + "\n", + " Vehicle Type Month \n", + "0 NaN 2 \n", + "1 NaN 1 \n", + "2 A 2 \n", + "3 A 1 \n", + "4 NaN 1 \n", + "\n", + "[5 rows x 27 columns]" ] - }, - { - "cell_type": "markdown", - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009", - "metadata": { - "id": "ba975b8a-a2cf-4fbf-9f59-ebc381767009" - }, - "source": [ - "7. The marketing team wants to analyze the effect of different marketing channels on the customer response rate.\n", - "\n", - "Hint: You can use melt to unpivot the data and create a table that shows the customer response rate (those who responded \"Yes\") by marketing channel." + }, + "execution_count": 93, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_1[\"Month\"] = df_1[\"Month\"]\n", + "df_1.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "id": "da704598-bff1-428f-91c3-ae77e3830f0b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "State Month\n", + "Arizona 1 3052\n", + " 2 2864\n", + "California 1 5673\n", + " 2 4929\n", + "Nevada 1 1493\n", + " 2 1278\n", + "Oregon 1 4697\n", + " 2 3969\n", + "Washington 1 1358\n", + " 2 1225\n", + "Name: Number of Policies, dtype: int64" ] - }, - { - "cell_type": "markdown", - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d", - "metadata": { - "id": "e4378d94-48fb-4850-a802-b1bc8f427b2d" - }, - "source": [ - "External Resources for Data Filtering: https://towardsdatascience.com/filtering-data-frames-in-pandas-b570b1f834b9" + }, + "execution_count": 107, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state = df_1.groupby([\"State\", \"Month\"])[\"Number of Policies\"].sum()\n", + "state" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b29e9b26-0a45-4c7c-9049-b45713c31743", + "metadata": {}, + "outputs": [], + "source": [ + "#Display a new DataFrame for top 3 states with highest number of policies sold" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "6db8dc5a-ee50-4a77-a25d-95190ed959f4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "State\n", + "Arizona 5916\n", + "California 10602\n", + "Nevada 2771\n", + "Oregon 8666\n", + "Washington 2583\n", + "Name: Number of Policies, dtype: int64" ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "449513f4-0459-46a0-a18d-9398d974c9ad", - "metadata": { - "id": "449513f4-0459-46a0-a18d-9398d974c9ad" - }, - "outputs": [], - "source": [ - "# your code goes here" + }, + "execution_count": 121, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_top = df_1.groupby([\"State\"])[\"Number of Policies\"].sum()\n", + "df_top" + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "id": "76475d06-93f7-4919-8926-4278bf525bc1", + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "Series.sort_values() got an unexpected keyword argument 'by'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[123], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m sorted_df \u001b[38;5;241m=\u001b[39m df_top\u001b[38;5;241m.\u001b[39msort_values(by\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mNumber of Policies\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", + "\u001b[1;31mTypeError\u001b[0m: Series.sort_values() got an unexpected keyword argument 'by'" + ] + } + ], + "source": [ + "sorted_df = df_top.sort_values(by='Number of Policies')" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "1d00633d-23a7-4998-85d9-175fcfcda590", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Number of Policies
State
Arizona5916
California10602
Nevada2771
Oregon8666
Washington2583
\n", + "
" + ], + "text/plain": [ + " Number of Policies\n", + "State \n", + "Arizona 5916\n", + "California 10602\n", + "Nevada 2771\n", + "Oregon 8666\n", + "Washington 2583" ] + }, + "execution_count": 125, + "metadata": {}, + "output_type": "execute_result" } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" - }, - "colab": { - "provenance": [] - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file + ], + "source": [ + "pd.DataFrame(df_top)" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "id": "b634d2aa-8356-4999-918e-701ad9126d07", + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "Series.sort_values() got an unexpected keyword argument 'by'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[135], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m sorted_df \u001b[38;5;241m=\u001b[39m df_top\u001b[38;5;241m.\u001b[39msort_values(by \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mNumber of Policies\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", + "\u001b[1;31mTypeError\u001b[0m: Series.sort_values() got an unexpected keyword argument 'by'" + ] + } + ], + "source": [ + "sorted_df = df_top.sort_values(by = 'Number of Policies')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81caf42a-f133-475d-82d5-13272ca08c17", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}