ta-data-pt-rmt · JulyBeiner · Nov 19, 2024 · Nov 19, 2024
diff --git a/your-code/Orders.zip → Orders.zip b/your-code/Orders.zip → Orders.zip
diff --git a/your-code/Pokemon.csv → Pokemon.csv b/your-code/Pokemon.csv → Pokemon.csv
diff --git a/challenge-1.ipynb b/challenge-1.ipynb
diff --git a/challenge-2.ipynb b/challenge-2.ipynb
diff --git a/challenge-3.ipynb b/challenge-3.ipynb
@@ -0,0 +1,299 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Challenge 3\n",
+    "\n",
+    "In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n",
+    "\n",
+    "You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:\n",
+    "\n",
+    "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n",
+    "\n",
+    "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Q1: How to identify VIP & Preferred Customers?\n",
+    "\n",
+    "We start by importing all the required libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import required libraries\n",
+    "import numpy as np\n",
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "orders=pd.read_csv(\"Pokemon.csv\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. You need to translate that goal into technical languages that data analysts use:\n",
+    "\n",
+    "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We break down the main problem into several sub problems:\n",
+    "\n",
+    "#### Sub Problem 1: How to aggregate the  `amount_spent` for unique customers?\n",
+    "\n",
+    "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n",
+    "\n",
+    "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n",
+    "\n",
+    "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n",
+    "\n",
+    "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "orders.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Tenemos casi 400.000 pedidos. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Obtenemos un nuevo dataframe agrupando los pedidos por clientes, haciendo un groupby por ID cliente\n",
+    "orders_by_customers = orders.groupby(\"CustomerID\")[\"amount_spent\"].sum().reset_index()\n",
+    "orders_by_customers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Una vez hemos agrupado los pedidos en función del ID del cliente, vemos que tenemos 4400 clientes. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sacamos un nuevo dataframe unicamente con los clientes cuyo amount_spent esté por encima del quantile 95\n",
+    "vip_clients= orders_by_customers[orders_by_customers[\"amount_spent\"]>= orders_by_customers[\"amount_spent\"].quantile(0.95)]\n",
+    "vip_clients"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Podemos decir que este dataframe con 2000 clientes corresponde a los clientes VIP\n",
+    "# Ya que son todos aquellos cuyos \"amount_spent\" están por encima del quantile 95"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A continuación, sacamos los clientes con un percentil entre el 75 y el 95\n",
+    "# Calcular los percentiles\n",
+    "percentile_75 = orders_by_customers[\"amount_spent\"].quantile(0.75)\n",
+    "percentile_95 = orders_by_customers[\"amount_spent\"].quantile(0.95)\n",
+    "\n",
+    "# Filtrar los clientes que están entre el percentil 75 y el percentil 95\n",
+    "preferred_clients = orders_by_customers[(orders_by_customers[\"amount_spent\"] >= percentile_75) & \n",
+    "                                         (orders_by_customers[\"amount_spent\"] < percentile_95)]\n",
+    "preferred_clients"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n",
+    "\n",
+    "## Q2: How to identify which country has the most VIP Customers?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "country_customers = orders.groupby([\"CustomerID\",\"Country\"])[\"amount_spent\"].sum().reset_index()\n",
+    "country_customers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Sacamos un nuevo dataframe unicamente con los clientes cuyo amount_spent esté por encima del quantile 95\n",
+    "vip_clients_country= country_customers[country_customers[\"amount_spent\"]>= country_customers[\"amount_spent\"].quantile(0.95)]\n",
+    "vip_clients_country"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Contamos el numero de clientes VIP por pais mediante un groupby\n",
+    "vip_counts = vip_clients_country.groupby(\"Country\")[\"CustomerID\"].nunique().reset_index()\n",
+    "vip_counts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Ordenamos los resultados para conocer qué país es el que tiene más clientes VIP\n",
+    "most_vip_country = vip_counts.sort_values(by=\"CustomerID\", ascending=False).head(1)\n",
+    "most_vip_country"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# United Kingdom es el país con más clientes VIP"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Q3: How to identify which country has the most VIP+Preferred Customers combined?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Para que sean VIP + Preferred se sacan todos aquellos clientes por encima del percentil 75\n",
+    "# Sacamos un nuevo dataframe unicamente con los clientes cuyo amount_spent esté por encima del quantile 75\n",
+    "vip_preferred_clients_country= country_customers[country_customers[\"amount_spent\"]>= country_customers[\"amount_spent\"].quantile(0.75)]\n",
+    "vip_preferred_clients_country"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Contamos el numero de clientes VIP y Preferred por pais mediante un groupby\n",
+    "vip_preferred_counts = vip_preferred_clients_country.groupby(\"Country\")[\"CustomerID\"].nunique().reset_index()\n",
+    "vip_preferred_counts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Ordenamos los resultados para conocer qué país es el que tiene más clientes VIP\n",
+    "most_vip_preferred_country = vip_preferred_counts.sort_values(by=\"CustomerID\", ascending=False).head(1)\n",
+    "most_vip_preferred_country"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# De nuevo, es United Kingdom el país con más clientes VIP y Preferred"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}