lab July dataframe #76

Open
wants to merge 2 commits into
base: main
File renamed without changes.
File renamed without changes.
915 changes: 915 additions & 0 deletions challenge-1.ipynb

Large diffs are not rendered by default.

704 changes: 704 additions & 0 deletions challenge-2.ipynb

Large diffs are not rendered by default.

299 changes: 299 additions & 0 deletions challenge-3.ipynb
@@ -0,0 +1,299 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Challenge 3\n",
"\n",
"In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n",
"\n",
"You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:\n",
"\n",
"**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n",
"\n",
"**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q1: How to identify VIP & Preferred Customers?\n",
"\n",
"We start by importing all the required libraries:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# import required libraries\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"\n",
"orders=pd.read_csv(\"Pokemon.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. You need to translate that goal into technical languages that data analysts use:\n",
"\n",
"## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We break down the main problem into several sub problems:\n",
"\n",
"#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n",
"\n",
"#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n",
"\n",
"#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n",
"\n",
"*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n",
"\n",
"Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"orders.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Tenemos casi 400.000 pedidos. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Obtenemos un nuevo dataframe agrupando los pedidos por clientes, haciendo un groupby por ID cliente\n",
"orders_by_customers = orders.groupby(\"CustomerID\")[\"amount_spent\"].sum().reset_index()\n",
"orders_by_customers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Una vez hemos agrupado los pedidos en función del ID del cliente, vemos que tenemos 4400 clientes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sacamos un nuevo dataframe unicamente con los clientes cuyo amount_spent esté por encima del quantile 95\n",
"vip_clients= orders_by_customers[orders_by_customers[\"amount_spent\"]>= orders_by_customers[\"amount_spent\"].quantile(0.95)]\n",
"vip_clients"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Podemos decir que este dataframe con 2000 clientes corresponde a los clientes VIP\n",
"# Ya que son todos aquellos cuyos \"amount_spent\" están por encima del quantile 95"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A continuación, sacamos los clientes con un percentil entre el 75 y el 95\n",
"# Calcular los percentiles\n",
"percentile_75 = orders_by_customers[\"amount_spent\"].quantile(0.75)\n",
"percentile_95 = orders_by_customers[\"amount_spent\"].quantile(0.95)\n",
"\n",
"# Filtrar los clientes que están entre el percentil 75 y el percentil 95\n",
"preferred_clients = orders_by_customers[(orders_by_customers[\"amount_spent\"] >= percentile_75) & \n",
" (orders_by_customers[\"amount_spent\"] < percentile_95)]\n",
"preferred_clients"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n",
"\n",
"## Q2: How to identify which country has the most VIP Customers?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"country_customers = orders.groupby([\"CustomerID\",\"Country\"])[\"amount_spent\"].sum().reset_index()\n",
"country_customers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Sacamos un nuevo dataframe unicamente con los clientes cuyo amount_spent esté por encima del quantile 95\n",
"vip_clients_country= country_customers[country_customers[\"amount_spent\"]>= country_customers[\"amount_spent\"].quantile(0.95)]\n",
"vip_clients_country"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Contamos el numero de clientes VIP por pais mediante un groupby\n",
"vip_counts = vip_clients_country.groupby(\"Country\")[\"CustomerID\"].nunique().reset_index()\n",
"vip_counts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Ordenamos los resultados para conocer qué país es el que tiene más clientes VIP\n",
"most_vip_country = vip_counts.sort_values(by=\"CustomerID\", ascending=False).head(1)\n",
"most_vip_country"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# United Kingdom es el país con más clientes VIP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q3: How to identify which country has the most VIP+Preferred Customers combined?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Para que sean VIP + Preferred se sacan todos aquellos clientes por encima del percentil 75\n",
"# Sacamos un nuevo dataframe unicamente con los clientes cuyo amount_spent esté por encima del quantile 75\n",
"vip_preferred_clients_country= country_customers[country_customers[\"amount_spent\"]>= country_customers[\"amount_spent\"].quantile(0.75)]\n",
"vip_preferred_clients_country"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Contamos el numero de clientes VIP y Preferred por pais mediante un groupby\n",
"vip_preferred_counts = vip_preferred_clients_country.groupby(\"Country\")[\"CustomerID\"].nunique().reset_index()\n",
"vip_preferred_counts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Ordenamos los resultados para conocer qué país es el que tiene más clientes VIP\n",
"most_vip_preferred_country = vip_preferred_counts.sort_values(by=\"CustomerID\", ascending=False).head(1)\n",
"most_vip_preferred_country"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# De nuevo, es United Kingdom el país con más clientes VIP y Preferred"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
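
Sub Problem 3 (labeling customers as "VIP" or "Preferred") is stated in the notebook but never written out, and Q2/Q3 recompute the quantiles on per-(CustomerID, Country) totals rather than on the per-customer totals used in Q1. A minimal sketch of one way to tie the three steps together is shown below. The toy data is hypothetical and only stands in for the Orders data set; the column names `CustomerID`, `Country` and `amount_spent` come from the notebook, while the `segment` column, the "Regular" label and the assumption that each customer belongs to a single country are illustrative choices, not part of the original lab.

```python
import pandas as pd

# Hypothetical toy data: 20 customers; customer 20 has two orders.
ids = list(range(1, 20)) + [20, 20]
amounts = [10.0 * i for i in range(1, 20)] + [100.0, 100.0]
countries = ["United Kingdom"] * 17 + ["Germany", "United Kingdom", "France", "France"]
orders = pd.DataFrame(
    {"CustomerID": ids, "Country": countries, "amount_spent": amounts}
)

# Sub Problem 1: total spend per unique customer.
spend = orders.groupby("CustomerID", as_index=False)["amount_spent"].sum()

# Sub Problem 2: thresholds computed once, on the per-customer totals.
p75 = spend["amount_spent"].quantile(0.75)
p95 = spend["amount_spent"].quantile(0.95)

# Sub Problem 3: label every customer; later assignments override earlier ones.
spend["segment"] = "Regular"  # illustrative label for everyone else
spend.loc[spend["amount_spent"] >= p75, "segment"] = "Preferred"
spend.loc[spend["amount_spent"] >= p95, "segment"] = "VIP"

# Assumption: one country per customer (the first one seen is kept).
customer_country = orders.drop_duplicates("CustomerID")[["CustomerID", "Country"]]
labeled = spend.merge(customer_country, on="CustomerID", how="left")

# Q2: customers per country among VIPs; Q3: among VIP + Preferred.
vip_by_country = labeled.loc[labeled["segment"] == "VIP", "Country"].value_counts()
combined_by_country = labeled.loc[
    labeled["segment"] != "Regular", "Country"
].value_counts()

top_vip_country = vip_by_country.idxmax()
top_combined_country = combined_by_country.idxmax()
print(top_vip_country, top_combined_country)
```

Computing `p75` and `p95` once on the per-customer totals keeps the Q2 and Q3 counts consistent with the groups identified in Q1. On the real data the labels and counts will of course differ from this toy example.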