From 1568949f8678aa91932acf01c20c43967fa41dbf Mon Sep 17 00:00:00 2001 From: Yuliya Lavrenyuk Date: Fri, 10 May 2024 23:06:00 +0200 Subject: [PATCH] Yuliya Lavrenyuk --- your-code/challenge-1.ipynb | 3797 ++++++++++++++++++++++++++++++++--- your-code/challenge-2.ipynb | 1285 ++++++++++-- your-code/challenge-3.ipynb | 1581 +++++++++++++-- 3 files changed, 6045 insertions(+), 618 deletions(-) diff --git a/your-code/challenge-1.ipynb b/your-code/challenge-1.ipynb index cd674cb..64c8609 100644 --- a/your-code/challenge-1.ipynb +++ b/your-code/challenge-1.ipynb @@ -1,276 +1,3521 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 1\n", - "\n", - "In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.\n", - "\n", - "![Pokemon](../images/pokemon.jpg)\n", - "\n", - "Follow the instructions below and enter your code.\n", - "\n", - "#### Import all required libraries." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import libraries" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Import data set.\n", - "\n", - "Read the dataset `pokemon.csv` into a dataframe called `pokemon`.\n", - "\n", - "*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Print first 10 rows of `pokemon`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When you look at a data set, you often wonder what each column means. 
Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.\n", - "\n", - "For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.\n", - "\n", - "| Column | Description |\n", - "| --- | --- |\n", - "| # | ID for each pokemon |\n", - "| Name | Name of each pokemon |\n", - "| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |\n", - "| Type 2 | Some pokemon are dual type and have 2 |\n", - "| Total | A general guide to how strong a pokemon is |\n", - "| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |\n", - "| Attack | The base modifier for normal attacks (eg. Scratch, Punch) |\n", - "| Defense | The base damage resistance against normal attacks |\n", - "| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |\n", - "| SP Def | The base damage resistance against special attacks |\n", - "| Speed | Determines which pokemon attacks first each round |\n", - "| Generation | Number of generation |\n", - "| Legendary | True if Legendary Pokemon False if not |" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Obtain the distinct values across `Type 1` and `Type 2`.\n", - "\n", - "Exctract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Cleanup `Name` that contain \"Mega\".\n", - "\n", - "If you have checked out the pokemon names carefully enough, you should have found there are junk texts in the pokemon names which contain \"Mega\". We want to clean up the pokemon names. For instance, \"VenusaurMega Venusaur\" should be \"Mega Venusaur\", and \"CharizardMega Charizard X\" should be \"Mega Charizard X\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here\n", - "\n", - "\n", - "# test transformed data\n", - "pokemon.head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create a new column called `A/D Ratio` whose value equals to `Attack` devided by `Defense`.\n", - "\n", - "For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon with the highest `A/D Ratio`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon with the lowest A/D Ratio." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.\n", - "\n", - "Rules:\n", - "\n", - "* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of ` `. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.\n", - "\n", - "* If `Type 1` has valid value but `Type 2` is not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon whose `A/D Ratio` are among the top 5." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.\n", - "\n", - "Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.\n", - "\n", - "Your output should look like below:\n", - "\n", - "![Aggregate](../images/aggregated-mean.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 1\n", + "\n", + "In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.\n", + "\n", + "![Pokemon](../images/pokemon.jpg)\n", + "\n", + "Follow the instructions below and enter your code.\n", + "\n", + "#### Import all required libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# import libraries\n", + "import pandas as pd\n", + "import numpy as np\n", + "import re" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Import data set.\n", + "\n", + "Read the dataset `pokemon.csv` into a dataframe called `pokemon`.\n", + "\n", + "*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# import dataset\n", + "\n", + "pokemon = pd.read_csv(\"Pokemon.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Print first 10 rows of `pokemon`." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the first 10 rows of `pokemon` (columns: #, Name, Type 1, Type 2, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation, Legendary); the same rows appear in the text/plain output that follows]
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "5 5 Charmeleon Fire NaN 405 58 64 58 \n", + "6 6 Charizard Fire Flying 534 78 84 78 \n", + "7 6 CharizardMega Charizard X Fire Dragon 634 78 130 111 \n", + "8 6 CharizardMega Charizard Y Fire Flying 634 78 104 78 \n", + "9 7 Squirtle Water NaN 314 44 48 65 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False \n", + "5 80 65 80 1 False \n", + "6 109 85 100 1 False \n", + "7 130 85 100 1 False \n", + "8 159 115 100 1 False \n", + "9 50 64 43 1 False " + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.\n", + "\n", + "For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. 
This knowledge is helpful in your work with the data.\n", + "\n", + "| Column | Description |\n", + "| --- | --- |\n", + "| # | ID for each pokemon |\n", + "| Name | Name of each pokemon |\n", + "| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |\n", + "| Type 2 | Some pokemon are dual type and have 2 |\n", + "| Total | A general guide to how strong a pokemon is |\n", + "| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |\n", + "| Attack | The base modifier for normal attacks (eg. Scratch, Punch) |\n", + "| Defense | The base damage resistance against normal attacks |\n", + "| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |\n", + "| SP Def | The base damage resistance against special attacks |\n", + "| Speed | Determines which pokemon attacks first each round |\n", + "| Generation | Number of generation |\n", + "| Legendary | True if Legendary Pokemon False if not |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Obtain the distinct values across `Type 1` and `Type 2`.\n", + "\n", + "Exctract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'\n", + " 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'\n", + " 'Flying'] ['Poison' nan 'Flying' 'Dragon' 'Ground' 'Fairy' 'Grass' 'Fighting'\n", + " 'Psychic' 'Steel' 'Ice' 'Rock' 'Dark' 'Water' 'Electric' 'Fire' 'Ghost'\n", + " 'Bug' 'Normal']\n" + ] + }, + { + "data": { + "text/plain": [ + "numpy.ndarray" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "type_1 = pokemon[\"Type 1\"].unique()\n", + "\n", + "type_2 = pokemon[\"Type 2\"].unique()\n", + "\n", + "print(type_1,type_2)\n", + "\n", + "type(type_1)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'\n", + " 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'\n", + " 'Flying' nan]\n" + ] + } + ], + "source": [ + "types = pd.concat([pokemon[\"Type 1\"],pokemon[\"Type 2\"]])\n", + "\n", + "# Get unique values across both fields\n", + "distinct_types = types.unique()\n", + "print(distinct_types)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleanup `Name` that contain \"Mega\".\n", + "\n", + "If you have checked out the pokemon names carefully enough, you should have found there are junk texts in the pokemon names which contain \"Mega\". We want to clean up the pokemon names. For instance, \"VenusaurMega Venusaur\" should be \"Mega Venusaur\", and \"CharizardMega Charizard X\" should be \"Mega Charizard X\"." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the rows of `pokemon` whose Name contains Mega (columns: #, Name, Type 1, Type 2, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation, Legendary); the same rows appear in the text/plain output that follows]
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack \\\n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 \n", + "7 6 CharizardMega Charizard X Fire Dragon 634 78 130 \n", + "8 6 CharizardMega Charizard Y Fire Flying 634 78 104 \n", + "12 9 BlastoiseMega Blastoise Water NaN 630 79 103 \n", + "19 15 BeedrillMega Beedrill Bug Poison 495 65 150 \n", + "23 18 PidgeotMega Pidgeot Normal Flying 579 83 80 \n", + "71 65 AlakazamMega Alakazam Psychic NaN 590 55 50 \n", + "87 80 SlowbroMega Slowbro Water Psychic 590 95 75 \n", + "102 94 GengarMega Gengar Ghost Poison 600 60 65 \n", + "124 115 KangaskhanMega Kangaskhan Normal NaN 590 105 125 \n", + "137 127 PinsirMega Pinsir Bug Flying 600 65 155 \n", + "141 130 GyaradosMega Gyarados Water Dark 640 95 155 \n", + "154 142 AerodactylMega Aerodactyl Rock Flying 615 80 135 \n", + "163 150 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 \n", + "164 150 MewtwoMega Mewtwo Y Psychic NaN 780 106 150 \n", + "168 154 Meganium Grass NaN 525 80 82 \n", + "196 181 AmpharosMega Ampharos Electric Dragon 610 90 95 \n", + "224 208 SteelixMega Steelix Steel Ground 610 75 125 \n", + "229 212 ScizorMega Scizor Bug Steel 600 70 150 \n", + "232 214 HeracrossMega Heracross Bug Fighting 600 80 185 \n", + "248 229 HoundoomMega Houndoom Dark Fire 600 75 90 \n", + "268 248 TyranitarMega Tyranitar Rock Dark 700 100 164 \n", + "275 254 SceptileMega Sceptile Grass Dragon 630 70 110 \n", + "279 257 BlazikenMega Blaziken Fire Fighting 630 80 160 \n", + "283 260 SwampertMega Swampert Water Ground 635 100 150 \n", + "306 282 GardevoirMega Gardevoir Psychic Fairy 618 68 85 \n", + "327 302 SableyeMega Sableye Dark Ghost 480 50 85 \n", + "329 303 MawileMega Mawile Steel Fairy 480 50 105 \n", + "333 306 AggronMega Aggron Steel NaN 630 70 140 \n", + "336 308 MedichamMega Medicham Fighting Psychic 510 60 100 \n", + "339 310 ManectricMega Manectric Electric NaN 575 70 75 \n", + "349 319 SharpedoMega Sharpedo Water Dark 560 70 140 \n", + "354 323 
CameruptMega Camerupt Fire Ground 560 70 120 \n", + "366 334 AltariaMega Altaria Dragon Fairy 590 75 110 \n", + "387 354 BanetteMega Banette Ghost NaN 555 64 165 \n", + "393 359 AbsolMega Absol Dark NaN 565 65 150 \n", + "397 362 GlalieMega Glalie Ice NaN 580 80 120 \n", + "409 373 SalamenceMega Salamence Dragon Flying 700 95 145 \n", + "413 376 MetagrossMega Metagross Steel Psychic 700 80 145 \n", + "418 380 LatiasMega Latias Dragon Psychic 700 80 100 \n", + "420 381 LatiosMega Latios Dragon Psychic 700 80 130 \n", + "426 384 RayquazaMega Rayquaza Dragon Flying 780 105 180 \n", + "476 428 LopunnyMega Lopunny Normal Fighting 580 65 136 \n", + "494 445 GarchompMega Garchomp Dragon Ground 700 108 170 \n", + "498 448 LucarioMega Lucario Fighting Steel 625 70 145 \n", + "511 460 AbomasnowMega Abomasnow Grass Ice 594 90 132 \n", + "527 475 GalladeMega Gallade Psychic Fighting 618 68 165 \n", + "591 531 AudinoMega Audino Normal Fairy 545 103 60 \n", + "796 719 DiancieMega Diancie Rock Fairy 700 50 160 \n", + "\n", + " Defense Sp. Atk Sp. 
Def Speed Generation Legendary \n", + "3 123 122 120 80 1 False \n", + "7 111 130 85 100 1 False \n", + "8 78 159 115 100 1 False \n", + "12 120 135 115 78 1 False \n", + "19 40 15 80 145 1 False \n", + "23 80 135 80 121 1 False \n", + "71 65 175 95 150 1 False \n", + "87 180 130 80 30 1 False \n", + "102 80 170 95 130 1 False \n", + "124 100 60 100 100 1 False \n", + "137 120 65 90 105 1 False \n", + "141 109 70 130 81 1 False \n", + "154 85 70 95 150 1 False \n", + "163 100 154 100 130 1 True \n", + "164 70 194 120 140 1 True \n", + "168 100 83 100 80 2 False \n", + "196 105 165 110 45 2 False \n", + "224 230 55 95 30 2 False \n", + "229 140 65 100 75 2 False \n", + "232 115 40 105 75 2 False \n", + "248 90 140 90 115 2 False \n", + "268 150 95 120 71 2 False \n", + "275 75 145 85 145 3 False \n", + "279 80 130 80 100 3 False \n", + "283 110 95 110 70 3 False \n", + "306 65 165 135 100 3 False \n", + "327 125 85 115 20 3 False \n", + "329 125 55 95 50 3 False \n", + "333 230 60 80 50 3 False \n", + "336 85 80 85 100 3 False \n", + "339 80 135 80 135 3 False \n", + "349 70 110 65 105 3 False \n", + "354 100 145 105 20 3 False \n", + "366 110 110 105 80 3 False \n", + "387 75 93 83 75 3 False \n", + "393 60 115 60 115 3 False \n", + "397 80 120 80 100 3 False \n", + "409 130 120 90 120 3 False \n", + "413 150 105 110 110 3 False \n", + "418 120 140 150 110 3 True \n", + "420 100 160 120 110 3 True \n", + "426 100 180 100 115 3 True \n", + "476 94 54 96 135 4 False \n", + "494 115 120 95 92 4 False \n", + "498 88 140 70 112 4 False \n", + "511 105 132 105 30 4 False \n", + "527 95 65 115 110 4 False \n", + "591 126 80 126 50 5 False \n", + "796 110 160 110 110 6 True " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "pokemon[pokemon[\"Name\"].str.contains(\"Mega\")]\n", + "\n", + "\n", + "# test transformed data\n", + "#pokemon.head(10)" + ] + }, + { + "cell_type": "code", + 
"execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Mega Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "5 5 Charmeleon Fire NaN 405 58 64 58 80 \n", + "6 6 Charizard Fire Flying 534 78 84 78 109 \n", + "7 6 Mega Charizard X Fire Dragon 634 78 130 111 130 \n", + "8 6 Mega Charizard Y Fire Flying 634 78 104 78 159 \n", + "9 7 Squirtle Water NaN 314 44 48 65 50 \n", + "10 8 Wartortle Water NaN 405 59 63 80 65 \n", + "11 9 Blastoise Water NaN 530 79 83 100 85 \n", + "12 9 Mega Blastoise Water NaN 630 79 103 120 135 \n", + "13 10 Caterpie Bug NaN 195 45 30 35 20 \n", + "14 11 Metapod Bug NaN 205 50 20 55 25 \n", + "15 12 Butterfree Bug Flying 395 60 45 50 90 \n", + "16 13 Weedle Bug Poison 195 40 35 30 20 \n", + "17 14 Kakuna Bug Poison 205 45 25 50 25 \n", + "18 15 Beedrill Bug Poison 395 65 90 40 45 \n", + "19 15 Mega Beedrill Bug Poison 495 65 150 40 15 \n", + "\n", + " Sp. 
Def Speed Generation Legendary \n", + "0 65 45 1 False \n", + "1 80 60 1 False \n", + "2 100 80 1 False \n", + "3 120 80 1 False \n", + "4 50 65 1 False \n", + "5 65 80 1 False \n", + "6 85 100 1 False \n", + "7 85 100 1 False \n", + "8 115 100 1 False \n", + "9 64 43 1 False \n", + "10 80 58 1 False \n", + "11 105 78 1 False \n", + "12 115 78 1 False \n", + "13 20 45 1 False \n", + "14 25 30 1 False \n", + "15 80 70 1 False \n", + "16 20 50 1 False \n", + "17 25 35 1 False \n", + "18 80 75 1 False \n", + "19 80 145 1 False " + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def clean_names(names):\n", + "    \"\"\"Keep everything from 'Mega' onward; return other names unchanged.\"\"\"\n", + "    match = re.search(r'Mega.*', names)\n", + "    if match:\n", + "        return match.group(0)\n", + "    else:\n", + "        return names\n", + "\n", + "pokemon[\"Name\"] = pokemon[\"Name\"].apply(clean_names)\n", + "\n", + "pokemon.head(20)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`.\n", + "\n", + "For instance, if a pokemon has an Attack score of 49 and a Defense score of 49, the corresponding `A/D Ratio` is 49/49=1." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Mega Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "\n", + " Sp. Def Speed Generation Legendary A/D Ratio \n", + "0 65 45 1 False 1.000000 \n", + "1 80 60 1 False 0.984127 \n", + "2 100 80 1 False 0.987952 \n", + "3 120 80 1 False 0.813008 \n", + "4 50 65 1 False 1.209302 " + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon[\"A/D Ratio\"] = pokemon[\"Attack\"]/pokemon[\"Defense\"]\n", + "\n", + "pokemon.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Identify the pokemon with the highest `A/D Ratio`." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ratio \n", + "429 180 20 150 3 True 9.0 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon[pokemon[\"A/D Ratio\"]== pokemon[\"A/D Ratio\"].max()]\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Identify the pokemon with the lowest A/D Ratio." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def \\\n", + "230 213 Shuckle Bug Rock 505 20 10 230 10 230 \n", + "\n", + " Speed Generation Legendary A/D Ratio \n", + "230 5 2 False 0.043478 " + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon[pokemon[\"A/D Ratio\"] == pokemon[\"A/D Ratio\"].min()]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.\n", + "\n", + "Rules:\n", + "\n", + "* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of `<Type 1>-<Type 2>`. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.\n", + "\n", + "* If `Type 1` has a valid value but `Type 2` does not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Mega Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "5 5 Charmeleon Fire NaN 405 58 64 58 80 \n", + "6 6 Charizard Fire Flying 534 78 84 78 109 \n", + "7 6 Mega Charizard X Fire Dragon 634 78 130 111 130 \n", + "8 6 Mega Charizard Y Fire Flying 634 78 104 78 159 \n", + "9 7 Squirtle Water NaN 314 44 48 65 50 \n", + "10 8 Wartortle Water NaN 405 59 63 80 65 \n", + "11 9 Blastoise Water NaN 530 79 83 100 85 \n", + "12 9 Mega Blastoise Water NaN 630 79 103 120 135 \n", + "13 10 Caterpie Bug NaN 195 45 30 35 20 \n", + "14 11 Metapod Bug NaN 205 50 20 55 25 \n", + "15 12 Butterfree Bug Flying 395 60 45 50 90 \n", + "16 13 Weedle Bug Poison 195 40 35 30 20 \n", + "17 14 Kakuna Bug Poison 205 45 25 50 25 \n", + "18 15 Beedrill Bug Poison 395 65 90 40 45 \n", + "19 15 Mega Beedrill Bug Poison 495 65 150 40 15 \n", + "\n", + " Sp. 
Def Speed Generation Legendary A/D Ratio Combo Type \n", + "0 65 45 1 False 1.000000 Grass-Poison \n", + "1 80 60 1 False 0.984127 Grass-Poison \n", + "2 100 80 1 False 0.987952 Grass-Poison \n", + "3 120 80 1 False 0.813008 Grass-Poison \n", + "4 50 65 1 False 1.209302 Fire \n", + "5 65 80 1 False 1.103448 Fire \n", + "6 85 100 1 False 1.076923 Fire-Flying \n", + "7 85 100 1 False 1.171171 Fire-Dragon \n", + "8 115 100 1 False 1.333333 Fire-Flying \n", + "9 64 43 1 False 0.738462 Water \n", + "10 80 58 1 False 0.787500 Water \n", + "11 105 78 1 False 0.830000 Water \n", + "12 115 78 1 False 0.858333 Water \n", + "13 20 45 1 False 0.857143 Bug \n", + "14 25 30 1 False 0.363636 Bug \n", + "15 80 70 1 False 0.900000 Bug-Flying \n", + "16 20 50 1 False 1.166667 Bug-Poison \n", + "17 25 35 1 False 0.500000 Bug-Poison \n", + "18 80 75 1 False 2.250000 Bug-Poison \n", + "19 80 145 1 False 3.750000 Bug-Poison " + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "def type_concat(row):\n", + "    \"\"\"Check for NaN values and return a single or concatenated type.\"\"\"\n", + "    if pd.isna(row[\"Type 1\"]):\n", + "        return row[\"Type 2\"]\n", + "    elif pd.isna(row[\"Type 2\"]):\n", + "        return row[\"Type 1\"]\n", + "    else:\n", + "        return row[\"Type 1\"] + \"-\" + row[\"Type 2\"]\n", + "\n", + "\n", + "pokemon[\"Combo Type\"] = pokemon.apply(type_concat, axis=1)\n", + "\n", + "pokemon.head(20)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Identify the pokemon whose `A/D Ratio` is among the top 5." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 \n", + "347 318 Carvanha Water Dark 305 45 90 20 \n", + "19 15 Mega Beedrill Bug Poison 495 65 150 40 \n", + "453 408 Cranidos Rock NaN 350 67 125 40 \n", + "348 319 Sharpedo Water Dark 460 70 120 40 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ratio Combo Type \n", + "429 180 20 150 3 True 9.000 Psychic \n", + "347 65 20 65 3 False 4.500 Water-Dark \n", + "19 15 80 145 1 False 3.750 Bug-Poison \n", + "453 30 30 58 4 False 3.125 Rock \n", + "348 95 40 95 3 False 3.000 Water-Dark " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Sort by A/D Ratio, highest first, and keep the top 5 rows\n", + "sorted_df = pokemon.sort_values(\"A/D Ratio\", ascending=False).head()\n", + "display(sorted_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.\n", + "\n", + "Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Water-Dark', 'Psychic', 'Bug-Poison', 'Rock']\n" + ] + }, + { + "data": { + "text/plain": [ + "list" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "# Combo Type values of the top 5 pokemon (may contain duplicates)\n", + "combo_type_values = sorted_df[\"Combo Type\"].to_list()\n", + "\n", + "# A set drops the duplicates; the resulting order is not guaranteed\n", + "combo_type = list(set(combo_type_values))\n", + "\n", + "print(combo_type)\n", + "\n", + "type(combo_type)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.\n", + "\n", + "Your output should look like the image below:\n", + "\n", + "![Aggregate](../images/aggregated-mean.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " Total HP Attack Defense Sp. Atk \\\n", + "Combo Type \n", + "Bug 289.705882 53.058824 50.352941 55.411765 39.294118 \n", + "Bug-Electric 395.500000 60.000000 62.000000 55.000000 77.000000 \n", + "Bug-Fighting 550.000000 80.000000 155.000000 95.000000 40.000000 \n", + "Bug-Fire 455.000000 70.000000 72.500000 60.000000 92.500000 \n", + "Bug-Flying 419.500000 63.000000 70.142857 61.571429 72.857143 \n", + "... ... ... ... ... ... \n", + "Water-Ice 511.666667 90.000000 83.333333 113.333333 80.000000 \n", + "Water-Poison 426.666667 61.666667 68.333333 58.333333 61.666667 \n", + "Water-Psychic 481.000000 87.000000 73.000000 104.000000 94.000000 \n", + "Water-Rock 428.750000 70.750000 82.750000 112.750000 61.500000 \n", + "Water-Steel 530.000000 84.000000 86.000000 88.000000 111.000000 \n", + "\n", + " Sp. Def Speed Generation A/D Ratio \n", + "Combo Type \n", + "Bug 43.647059 47.941176 3.470588 0.940179 \n", + "Bug-Electric 55.000000 86.500000 5.000000 1.111667 \n", + "Bug-Fighting 100.000000 80.000000 2.000000 1.637681 \n", + "Bug-Fire 80.000000 80.000000 5.000000 1.234266 \n", + "Bug-Flying 69.071429 82.857143 2.857143 1.146274 \n", + "... ... ... ... ... \n", + "Water-Ice 78.333333 66.666667 1.000000 0.821759 \n", + "Water-Poison 91.666667 85.000000 1.333333 1.162149 \n", + "Water-Psychic 79.000000 44.000000 1.200000 0.783668 \n", + "Water-Rock 65.000000 36.000000 3.750000 0.727170 \n", + "Water-Steel 101.000000 60.000000 4.000000 0.977273 \n", + "\n", + "[154 rows x 9 columns]" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon.groupby([\"Combo Type\"]).agg({'Total':'mean', 'HP':'mean', 'Attack':'mean', 'Defense':'mean',\n", + " 'Sp. Atk':'mean', 'Sp.
Def':'mean', 'Speed':'mean', 'Generation':'mean', 'A/D Ratio':'mean'})" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "16 13 Weedle Bug Poison 195 40 35 30 \n", + "17 14 Kakuna Bug Poison 205 45 25 50 \n", + "18 15 Beedrill Bug Poison 395 65 90 40 \n", + "19 15 Mega Beedrill Bug Poison 495 65 150 40 \n", + "53 48 Venonat Bug Poison 305 60 55 50 \n", + ".. ... ... ... ... ... .. ... ... \n", + "667 606 Beheeyem Psychic NaN 485 75 75 75 \n", + "726 658 Greninja Water Dark 530 72 95 67 \n", + "745 677 Espurr Psychic NaN 355 62 48 54 \n", + "746 678 MeowsticMale Psychic NaN 466 74 48 76 \n", + "747 678 MeowsticFemale Psychic NaN 466 74 48 76 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ratio Combo Type \n", + "16 20 20 50 1 False 1.166667 Bug-Poison \n", + "17 25 25 35 1 False 0.500000 Bug-Poison \n", + "18 45 80 75 1 False 2.250000 Bug-Poison \n", + "19 15 80 145 1 False 3.750000 Bug-Poison \n", + "53 40 55 45 1 False 1.100000 Bug-Poison \n", + ".. ... ... ... ... ... ... ... \n", + "667 125 95 40 5 False 1.000000 Psychic \n", + "726 103 71 122 6 False 1.417910 Water-Dark \n", + "745 63 60 68 6 False 0.888889 Psychic \n", + "746 83 81 104 6 False 0.631579 Psychic \n", + "747 83 81 104 6 False 0.631579 Psychic \n", + "\n", + "[65 rows x 15 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "filtered_df = pokemon[pokemon[\"Combo Type\"].isin(combo_type)]\n", + "\n", + "display(filtered_df)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Total HP Attack Defense Sp. Atk \\\n", + "Combo Type \n", + "Bug-Poison 347.916667 53.750000 68.333333 58.083333 42.500000 \n", + "Psychic 464.552632 72.552632 64.947368 67.236842 98.552632 \n", + "Rock 409.444444 67.111111 103.333333 107.222222 40.555556 \n", + "Water-Dark 493.833333 69.166667 120.000000 65.166667 88.833333 \n", + "\n", + " Sp. Def Speed Generation A/D Ratio \n", + "Combo Type \n", + "Bug-Poison 59.333333 65.916667 2.333333 1.315989 \n", + "Psychic 82.394737 78.868421 3.342105 1.164196 \n", + "Rock 58.333333 32.888889 3.888889 1.260091 \n", + "Water-Dark 63.500000 87.166667 3.166667 2.291949 " + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "filtered_df.groupby([\"Combo Type\"]).agg({'Total':'mean', 'HP':'mean', 'Attack':'mean', 'Defense':'mean','Sp. Atk':'mean', 'Sp. Def':'mean', 'Speed':'mean', 'Generation':'mean', 'A/D Ratio':'mean'})" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/your-code/challenge-2.ipynb b/your-code/challenge-2.ipynb index d347731..e0a4cf8 100644 --- a/your-code/challenge-2.ipynb +++ b/your-code/challenge-2.ipynb @@ -1,195 +1,1090 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 2\n", - "\n", - "In this challenge we will continue working with the `Pokemon` dataset. 
We will attempt solving a slightly more complex problem in which we will practice the iterative data analysis process you learned in [this video](https://www.youtube.com/watch?v=xOomNicqbkk).\n", - "\n", - "The problem statement is as follows:\n", - "\n", - "**You are at a Pokemon black market planning to buy a Pokemon for battle. All Pokemon are sold at the same price and you can only afford to buy one. You cannot choose which specific Pokemon to buy. However, you can specify the type of the Pokemon - one type that exists in either `Type 1` or `Type 2`. Which type should you choose in order to maximize your chance of receiving a good Pokemon?**\n", - "\n", - "As a reminder, the 3 steps of iterative data analysis are:\n", - "\n", - "1. Setting Expectations\n", - "1. Collecting Information\n", - "1. Reacting to Data / Revising Expectations\n", - "\n", - "Following the iterative process, we'll guide you in completing the challenge." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Problem Solving Iteration 1\n", - "\n", - "In this iteration we'll analyze the problem and identify the breakthrough. The original question statement is kind of vague because we don't know what a *good pokemon* really means as represented in the data. We'll start by understanding the dataset and see if we can find some insights." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Import libraries\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "# Importing the dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the data it seems whether a pokemon is good depends on its abilities as represented in the fields of `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total`. 
We are not sure about `Generation` and `Legendary` because they are not necessarily the decisive factors of the pokemon abilities.\n", - "\n", - "But `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total` are a lot of fields! If we look at them all at once it's very complicated. This isn't Mission Impossible but it's ideal that we tackle this kind of problem after we learn Machine Learning (which you will do in Module 3). For now, is there a way to consolidate the fields we need to look into?\n", - "\n", - "Fortunately there seems to be a way. It appears the `Total` field is computed based on the other 6 fields. But we need to prove our theory. If we can prove there is a formula to compute `Total` based on the other 6 abilities, we only need to look into `Total`.\n", - "\n", - "We have the following expectation now:\n", - "\n", - "#### The `Total` field is computed based on `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed`.\n", - "\n", - "We need to collect the following information:\n", - "\n", - "* **What is the formula to compute `Total`?**\n", - "* **Does the formula work for all pokemon?**\n", - "\n", - "In the cell below, make a hypothesis on how `Total` is computed and test your hypothesis." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Problem Solving Iteration 2\n", - "\n", - "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", - "\n", - "### Which pokemon type is most likely to have the highest `Total` value?\n", - "\n", - "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. 
In data analysis, string fields have to be transformed to numerical format in order to be analyzed. \n", - "\n", - "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", - "\n", - "Now our expectation is:\n", - "\n", - "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", - "\n", - "The information we need to collect is:\n", - "\n", - "#### How to convert two string variables to numerical?\n", - "\n", - "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). You will formally learn it in Module 3.\n", - "\n", - "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", - "\n", - "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. 
Use the pokemon type values as the names of the numerical fields you create.\n", - "\n", - "The new numerical variables you create should look like below:\n", - "\n", - "![One Hot Encoding](../images/one-hot-encoding.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Problem Solving Iteration 3\n", - "\n", - "Now we have encoded the pokemon types, we will identify the relationship between `Total` and the encoded fields. Our expectation is:\n", - "\n", - "#### There are relationships between `Total` and the encoded pokemon type variables and we need to identify the correlations.\n", - "\n", - "The information we need to collect is:\n", - "\n", - "#### How to identify the relationship between `Total` and the encoded pokemon type fields?\n", - "\n", - "There are multiple ways to answer this question. The easiest way is to use correlation. In the cell below, calculate the correlation of `Total` to each of the encoded fields. Rank the correlations and identify the #1 pokemon type that is most likely to have the highest `Total`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Bonus Question\n", - "\n", - "Say now you can choose both `Type 1` and `Type 2` of the pokemon. In order to receive the best pokemon, which types will you choose?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 2\n", + "\n", + "In this challenge we will continue working with the `Pokemon` dataset. We will attempt solving a slightly more complex problem in which we will practice the iterative data analysis process you learned in [this video](https://www.youtube.com/watch?v=xOomNicqbkk).\n", + "\n", + "The problem statement is as follows:\n", + "\n", + "**You are at a Pokemon black market planning to buy a Pokemon for battle. All Pokemon are sold at the same price and you can only afford to buy one. You cannot choose which specific Pokemon to buy. However, you can specify the type of the Pokemon - one type that exists in either `Type 1` or `Type 2`. Which type should you choose in order to maximize your chance of receiving a good Pokemon?**\n", + "\n", + "As a reminder, the 3 steps of iterative data analysis are:\n", + "\n", + "1. Setting Expectations\n", + "1. Collecting Information\n", + "1. Reacting to Data / Revising Expectations\n", + "\n", + "Following the iterative process, we'll guide you in completing the challenge." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Problem Solving Iteration 1\n", + "\n", + "In this iteration we'll analyze the problem and identify the breakthrough. 
The original question statement is kind of vague because we don't know what a *good pokemon* really means as represented in the data. We'll start by understanding the dataset and see if we can find some insights." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# Import libraries\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn.preprocessing import OneHotEncoder" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False " + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Importing the dataset\n", + "\n", + "pokemon = pd.read_csv(\"Pokemon.csv\")\n", + "\n", + "pokemon.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the data it seems whether a pokemon is good depends on its abilities as represented in the fields of `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total`. We are not sure about `Generation` and `Legendary` because they are not necessarily the decisive factors of the pokemon abilities.\n", + "\n", + "But `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total` are a lot of fields! If we look at them all at once it's very complicated. This isn't Mission Impossible but it's ideal that we tackle this kind of problem after we learn Machine Learning (which you will do in Module 3). For now, is there a way to consolidate the fields we need to look into?\n", + "\n", + "Fortunately there seems to be a way. It appears the `Total` field is computed based on the other 6 fields. But we need to prove our theory. If we can prove there is a formula to compute `Total` based on the other 6 abilities, we only need to look into `Total`.\n", + "\n", + "We have the following expectation now:\n", + "\n", + "#### The `Total` field is computed based on `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. 
Def`, and `Speed`.\n", + "\n", + "We need to collect the following information:\n", + "\n", + "* **What is the formula to compute `Total`?**\n", + "* **Does the formula work for all pokemon?**\n", + "\n", + "In the cell below, make a hypothesis on how `Total` is computed and test your hypothesis." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "check\n", + "True 800\n", + "Name: count, dtype: int64" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "cols_to_sum = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']\n", + "\n", + "pokemon[\"total_suggest\"] = pokemon[cols_to_sum].sum(axis=1)\n", + "\n", + "pokemon[\"check\"] = pokemon[\"Total\"] == pokemon[\"total_suggest\"]\n", + "\n", + "pokemon[\"check\"].value_counts()\n", + "\n", + "# All 800 rows are True, which means that \"Total\" is the sum of 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem Solving Iteration 2\n", + "\n", + "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", + "\n", + "### Which pokemon type is most likely to have the highest `Total` value?\n", + "\n", + "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. In data analysis, string fields have to be transformed to numerical format in order to be analyzed. \n", + "\n", + "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). 
Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", + "\n", + "Now our expectation is:\n", + "\n", + "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to convert two string variables to numerical?\n", + "\n", + "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). You will formally learn it in Module 3.\n", + "\n", + "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", + "\n", + "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. 
Use the pokemon type values as the names of the numerical fields you create.\n", + "\n", + "The new numerical variables you create should look like below:\n", + "\n", + "\n", + "![One Hot Encoding](../images/one-hot-encoding.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [], + "source": [ + "unique_types = pd.concat([pokemon[\"Type 1\"], pokemon[\"Type 2\"]]).dropna().unique()\n", + "\n", + "# One-hot encode each type column and align both results to the same sorted set of type names\n", + "dummies = pd.get_dummies(pokemon[\"Type 1\"], dtype=int).reindex(columns=sorted(unique_types), fill_value=0)\n", + "dummies2 = pd.get_dummies(pokemon[\"Type 2\"], dtype=int).reindex(columns=sorted(unique_types), fill_value=0)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Bug Dark Dragon Electric Fairy Fighting Fire Flying Ghost Grass \\\n", + "0 0 0 0 0 0 0 0 0 0 1 \n", + "1 0 0 0 0 0 0 0 0 0 1 \n", + "2 0 0 0 0 0 0 0 0 0 1 \n", + "3 0 0 0 0 0 0 0 0 0 1 \n", + "4 0 0 0 0 0 0 1 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... ... ... \n", + "795 0 0 0 0 1 0 0 0 0 0 \n", + "796 0 0 0 0 1 0 0 0 0 0 \n", + "797 0 0 0 0 0 0 0 0 1 0 \n", + "798 0 1 0 0 0 0 0 0 0 0 \n", + "799 0 0 0 0 0 0 1 0 0 0 \n", + "\n", + " Ground Ice Normal Poison Psychic Rock Steel Water \n", + "0 0 0 0 1 0 0 0 0 \n", + "1 0 0 0 1 0 0 0 0 \n", + "2 0 0 0 1 0 0 0 0 \n", + "3 0 0 0 1 0 0 0 0 \n", + "4 0 0 0 0 0 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... \n", + "795 0 0 0 0 0 1 0 0 \n", + "796 0 0 0 0 0 1 0 0 \n", + "797 0 0 0 0 1 0 0 0 \n", + "798 0 0 0 0 1 0 0 0 \n", + "799 0 0 0 0 0 0 0 1 \n", + "\n", + "[800 rows x 18 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "all_types = dummies + dummies2\n", + "\n", + "display(all_types)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem Solving Iteration 3\n", + "\n", + "Now we have encoded the pokemon types, we will identify the relationship between `Total` and the encoded fields. Our expectation is:\n", + "\n", + "#### There are relationships between `Total` and the encoded pokemon type variables and we need to identify the correlations.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to identify the relationship between `Total` and the encoded pokemon type fields?\n", + "\n", + "There are multiple ways to answer this question. The easiest way is to use correlation. In the cell below, calculate the correlation of `Total` to each of the encoded fields. Rank the correlations and identify the #1 pokemon type that is most likely to have the highest `Total`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + ".. ... ... ... ... ... .. ... ... \n", + "795 719 Diancie Rock Fairy 600 50 100 150 \n", + "796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 \n", + "797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n", + "798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n", + "799 721 Volcanion Fire Water 600 80 110 120 \n", + "\n", + " Sp. Atk Sp. Def ... Ghost Grass Ground Ice Normal Poison \\\n", + "0 65 65 ... 0 1 0 0 0 1 \n", + "1 80 80 ... 0 1 0 0 0 1 \n", + "2 100 100 ... 0 1 0 0 0 1 \n", + "3 122 120 ... 0 1 0 0 0 1 \n", + "4 60 50 ... 0 0 0 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... ... \n", + "795 100 150 ... 0 0 0 0 0 0 \n", + "796 160 110 ... 0 0 0 0 0 0 \n", + "797 150 130 ... 1 0 0 0 0 0 \n", + "798 170 130 ... 0 0 0 0 0 0 \n", + "799 130 90 ... 0 0 0 0 0 0 \n", + "\n", + " Psychic Rock Steel Water \n", + "0 0 0 0 0 \n", + "1 0 0 0 0 \n", + "2 0 0 0 0 \n", + "3 0 0 0 0 \n", + "4 0 0 0 0 \n", + ".. ... ... ... ... 
\n", + "795 0 1 0 0 \n", + "796 0 1 0 0 \n", + "797 1 0 0 0 \n", + "798 1 0 0 0 \n", + "799 0 0 0 1 \n", + "\n", + "[800 rows x 31 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon = pd.concat([pokemon,all_types],axis = 1)\n", + "\n", + "display(pokemon)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Total\n", + "600 37\n", + "405 26\n", + "580 23\n", + "500 23\n", + "300 19\n", + " ..\n", + "352 1\n", + "334 1\n", + "454 1\n", + "640 1\n", + "514 1\n", + "Name: count, Length: 200, dtype: int64" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pokemon[\"Total\"].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Bonus Question\n", + "\n", + "Say now you can choose both `Type 1` and `Type 2` of the pokemon. In order to receive the best pokemon, which types will you choose?" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/your-code/challenge-3.ipynb b/your-code/challenge-3.ipynb index a42a586..8f58950 100644 --- a/your-code/challenge-3.ipynb +++ b/your-code/challenge-3.ipynb @@ -1,147 +1,1434 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 3\n", - "\n", - "In this challenge we will work on the `Orders` data set. 
In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n", - "\n", - "You are serving as a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Your boss today asked you to do two things for her:\n", - "\n", - "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (a.k.a. the 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n", - "\n", - "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q1: How to identify VIP & Preferred Customers?\n", - "\n", - "We start by importing all the required libraries:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import required libraries\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. 
You need to translate that goal into the technical language that data analysts use:\n", - "\n", - "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We break down the main problem into several sub problems:\n", - "\n", - "#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n", - "\n", - "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n", - "\n", - "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", - "\n", - "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n", - "\n", - "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your code and explain your steps." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we'll leave it to you to solve Q2 & Q3, for which you can leverage your solution to Q1:\n", - "\n", - "## Q2: How to identify which country has the most VIP Customers?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 3\n", + "\n", + "In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n", + "\n", + "You are serving as a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Your boss today asked you to do two things for her:\n", + "\n", + "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (a.k.a. the 0.95 quantile). 
The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentiles**.\n", + "\n", + "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q1: How to identify VIP & Preferred Customers?\n", + "\n", + "We start by importing all the required libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "# import required libraries\n", + "import numpy as np\n", + "import pandas as pd\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, extract and import the `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to get an overview of the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0InvoiceNoStockCodeyearmonthdayhourDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountryamount_spent
0053636585123A20101238white hanging heart t-light holder62010-12-01 08:26:002.5517850United Kingdom15.30
115363657105320101238white metal lantern62010-12-01 08:26:003.3917850United Kingdom20.34
2253636584406B20101238cream cupid hearts coat hanger82010-12-01 08:26:002.7517850United Kingdom22.00
3353636584029G20101238knitted union flag hot water bottle62010-12-01 08:26:003.3917850United Kingdom20.34
4453636584029E20101238red woolly hottie white heart.62010-12-01 08:26:003.3917850United Kingdom20.34
.............................................
39791954190458158722613201112512pack of 20 spaceboy napkins122011-12-09 12:50:000.8512680France10.20
39792054190558158722899201112512children's apron dolly girl62011-12-09 12:50:002.1012680France12.60
39792154190658158723254201112512childrens cutlery dolly girl42011-12-09 12:50:004.1512680France16.60
39792254190758158723255201112512childrens cutlery circus parade42011-12-09 12:50:004.1512680France16.60
39792354190858158722138201112512baking set 9 piece retrospot32011-12-09 12:50:004.9512680France14.85
\n", + "

397924 rows × 14 columns

\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "0 0 536365 85123A 2010 12 3 8 \n", + "1 1 536365 71053 2010 12 3 8 \n", + "2 2 536365 84406B 2010 12 3 8 \n", + "3 3 536365 84029G 2010 12 3 8 \n", + "4 4 536365 84029E 2010 12 3 8 \n", + "... ... ... ... ... ... ... ... \n", + "397919 541904 581587 22613 2011 12 5 12 \n", + "397920 541905 581587 22899 2011 12 5 12 \n", + "397921 541906 581587 23254 2011 12 5 12 \n", + "397922 541907 581587 23255 2011 12 5 12 \n", + "397923 541908 581587 22138 2011 12 5 12 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "0 white hanging heart t-light holder 6 2010-12-01 08:26:00 \n", + "1 white metal lantern 6 2010-12-01 08:26:00 \n", + "2 cream cupid hearts coat hanger 8 2010-12-01 08:26:00 \n", + "3 knitted union flag hot water bottle 6 2010-12-01 08:26:00 \n", + "4 red woolly hottie white heart. 6 2010-12-01 08:26:00 \n", + "... ... ... ... \n", + "397919 pack of 20 spaceboy napkins 12 2011-12-09 12:50:00 \n", + "397920 children's apron dolly girl 6 2011-12-09 12:50:00 \n", + "397921 childrens cutlery dolly girl 4 2011-12-09 12:50:00 \n", + "397922 childrens cutlery circus parade 4 2011-12-09 12:50:00 \n", + "397923 baking set 9 piece retrospot 3 2011-12-09 12:50:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent \n", + "0 2.55 17850 United Kingdom 15.30 \n", + "1 3.39 17850 United Kingdom 20.34 \n", + "2 2.75 17850 United Kingdom 22.00 \n", + "3 3.39 17850 United Kingdom 20.34 \n", + "4 3.39 17850 United Kingdom 20.34 \n", + "... ... ... ... ... 
\n", + "397919 0.85 12680 France 10.20 \n", + "397920 2.10 12680 France 12.60 \n", + "397921 4.15 12680 France 16.60 \n", + "397922 4.15 12680 France 16.60 \n", + "397923 4.95 12680 France 14.85 \n", + "\n", + "[397924 rows x 14 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# your code here\n", + "\n", + "orders = pd.read_csv(\"Orders.zip\", compression = \"zip\",index_col=False)\n", + "\n", + "display(orders)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. You need to translate that goal into the technical language that data analysts use:\n", + "\n", + "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We break down the main problem into several sub problems:\n", + "\n", + "#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n", + "\n", + "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n", + "\n", + "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", + "\n", + "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n", + "\n", + "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your code and explain your steps." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here\n", + "\n", + "orders_grouped = orders.groupby([\"CustomerID\"]).agg({\"amount_spent\":\"sum\"})" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
amount_spent
CustomerID
1234677183.60
123474310.00
123481797.24
123491757.55
12350334.40
......
18280180.60
1828180.82
18282178.05
182832094.88
182871837.28
\n", + "

4339 rows × 1 columns

\n", + "
" + ], + "text/plain": [ + " amount_spent\n", + "CustomerID \n", + "12346 77183.60\n", + "12347 4310.00\n", + "12348 1797.24\n", + "12349 1757.55\n", + "12350 334.40\n", + "... ...\n", + "18280 180.60\n", + "18281 80.82\n", + "18282 178.05\n", + "18283 2094.88\n", + "18287 1837.28\n", + "\n", + "[4339 rows x 1 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(orders_grouped)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
amount_spentCustomer status
CustomerID
1234677183.60VIP Customers
123474310.00Preffered Customers
123481797.24Preffered Customers
123491757.55Preffered Customers
12350334.40Other Customers
.........
18280180.60Other Customers
1828180.82Other Customers
18282178.05Other Customers
182832094.88Preffered Customers
182871837.28Preffered Customers
\n", + "

4339 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " amount_spent Customer status\n", + "CustomerID \n", + "12346 77183.60 VIP Customers\n", + "12347 4310.00 Preffered Customers\n", + "12348 1797.24 Preffered Customers\n", + "12349 1757.55 Preffered Customers\n", + "12350 334.40 Other Customers\n", + "... ... ...\n", + "18280 180.60 Other Customers\n", + "18281 80.82 Other Customers\n", + "18282 178.05 Other Customers\n", + "18283 2094.88 Preffered Customers\n", + "18287 1837.28 Preffered Customers\n", + "\n", + "[4339 rows x 2 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "quantile_95 = orders_grouped[\"amount_spent\"].quantile(q=0.95)\n", + "quantile_75 = orders_grouped[\"amount_spent\"].quantile(q=0.75)\n", + "\n", + "\n", + "def quantile(row):\n", + " if row[\"amount_spent\"]> quantile_95:\n", + " return \"VIP Customers\"\n", + " elif row[\"amount_spent\"]<= quantile_95 and row[\"amount_spent\"]> quantile_75:\n", + " return \"Preffered Customers\"\n", + " else:\n", + " return \"Other Customers\"\n", + "\n", + "\n", + " \n", + "orders_grouped[\"Customer status\"]= orders_grouped.apply(quantile, axis = 1)\n", + "\n", + "display(orders_grouped)\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we'll leave it to you to solve Q2 & Q3, for which you can leverage your solution to Q1:\n", + "\n", + "## Q2: How to identify which country has the most VIP Customers?" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + "
Unnamed: 0InvoiceNoStockCodeyearmonthdayhourDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountryamount_spentCustomer status
0053636585123A20101238white hanging heart t-light holder62010-12-01 08:26:002.5517850United Kingdom15.30Preffered Customers
115363657105320101238white metal lantern62010-12-01 08:26:003.3917850United Kingdom20.34Preffered Customers
2253636584406B20101238cream cupid hearts coat hanger82010-12-01 08:26:002.7517850United Kingdom22.00Preffered Customers
3353636584029G20101238knitted union flag hot water bottle62010-12-01 08:26:003.3917850United Kingdom20.34Preffered Customers
4453636584029E20101238red woolly hottie white heart.62010-12-01 08:26:003.3917850United Kingdom20.34Preffered Customers
................................................
39791954190458158722613201112512pack of 20 spaceboy napkins122011-12-09 12:50:000.8512680France10.20Other Customers
39792054190558158722899201112512children's apron dolly girl62011-12-09 12:50:002.1012680France12.60Other Customers
39792154190658158723254201112512childrens cutlery dolly girl42011-12-09 12:50:004.1512680France16.60Other Customers
39792254190758158723255201112512childrens cutlery circus parade42011-12-09 12:50:004.1512680France16.60Other Customers
39792354190858158722138201112512baking set 9 piece retrospot32011-12-09 12:50:004.9512680France14.85Other Customers
\n", + "

397924 rows × 15 columns

\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "0 0 536365 85123A 2010 12 3 8 \n", + "1 1 536365 71053 2010 12 3 8 \n", + "2 2 536365 84406B 2010 12 3 8 \n", + "3 3 536365 84029G 2010 12 3 8 \n", + "4 4 536365 84029E 2010 12 3 8 \n", + "... ... ... ... ... ... ... ... \n", + "397919 541904 581587 22613 2011 12 5 12 \n", + "397920 541905 581587 22899 2011 12 5 12 \n", + "397921 541906 581587 23254 2011 12 5 12 \n", + "397922 541907 581587 23255 2011 12 5 12 \n", + "397923 541908 581587 22138 2011 12 5 12 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "0 white hanging heart t-light holder 6 2010-12-01 08:26:00 \n", + "1 white metal lantern 6 2010-12-01 08:26:00 \n", + "2 cream cupid hearts coat hanger 8 2010-12-01 08:26:00 \n", + "3 knitted union flag hot water bottle 6 2010-12-01 08:26:00 \n", + "4 red woolly hottie white heart. 6 2010-12-01 08:26:00 \n", + "... ... ... ... \n", + "397919 pack of 20 spaceboy napkins 12 2011-12-09 12:50:00 \n", + "397920 children's apron dolly girl 6 2011-12-09 12:50:00 \n", + "397921 childrens cutlery dolly girl 4 2011-12-09 12:50:00 \n", + "397922 childrens cutlery circus parade 4 2011-12-09 12:50:00 \n", + "397923 baking set 9 piece retrospot 3 2011-12-09 12:50:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent \\\n", + "0 2.55 17850 United Kingdom 15.30 \n", + "1 3.39 17850 United Kingdom 20.34 \n", + "2 2.75 17850 United Kingdom 22.00 \n", + "3 3.39 17850 United Kingdom 20.34 \n", + "4 3.39 17850 United Kingdom 20.34 \n", + "... ... ... ... ... \n", + "397919 0.85 12680 France 10.20 \n", + "397920 2.10 12680 France 12.60 \n", + "397921 4.15 12680 France 16.60 \n", + "397922 4.15 12680 France 16.60 \n", + "397923 4.95 12680 France 14.85 \n", + "\n", + " Customer status \n", + "0 Preffered Customers \n", + "1 Preffered Customers \n", + "2 Preffered Customers \n", + "3 Preffered Customers \n", + "4 Preffered Customers \n", + "... ... 
\n", + "397919 Other Customers \n", + "397920 Other Customers \n", + "397921 Other Customers \n", + "397922 Other Customers \n", + "397923 Other Customers \n", + "\n", + "[397924 rows x 15 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# your code here\n", + "\n", + "merged_df = pd.merge(orders, orders_grouped[\"Customer status\"], how = \"left\", on =[\"CustomerID\"])\n", + "\n", + "display(merged_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CustomerID
Country
United Kingdom84185
EIRE7077
France3290
Germany3127
Netherlands2080
Australia898
Portugal681
Switzerland594
Spain511
Norway420
Channel Islands364
Finland294
Cyprus248
Singapore222
Japan205
Sweden198
Belgium54
Denmark36
\n", + "
" + ], + "text/plain": [ + " CustomerID\n", + "Country \n", + "United Kingdom 84185\n", + "EIRE 7077\n", + "France 3290\n", + "Germany 3127\n", + "Netherlands 2080\n", + "Australia 898\n", + "Portugal 681\n", + "Switzerland 594\n", + "Spain 511\n", + "Norway 420\n", + "Channel Islands 364\n", + "Finland 294\n", + "Cyprus 248\n", + "Singapore 222\n", + "Japan 205\n", + "Sweden 198\n", + "Belgium 54\n", + "Denmark 36" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = merged_df[merged_df[\"Customer status\"] == \"VIP Customers\"]\n", + "\n", + "df.groupby([\"Country\"]).agg({\"CustomerID\":'count'}).sort_values(by=\"CustomerID\", ascending=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CustomerID
CountryCustomer status
United KingdomPreffered Customers137450
VIP Customers84185
EIREVIP Customers7077
GermanyPreffered Customers4222
FranceVIP Customers3290
GermanyVIP Customers3127
FrancePreffered Customers3011
NetherlandsVIP Customers2080
BelgiumPreffered Customers1503
SpainPreffered Customers1058
AustraliaVIP Customers898
SwitzerlandPreffered Customers776
PortugalVIP Customers681
NorwayPreffered Customers608
SwitzerlandVIP Customers594
SpainVIP Customers511
ItalyPreffered Customers507
NorwayVIP Customers420
PortugalPreffered Customers412
Channel IslandsVIP Customers364
FinlandVIP Customers294
CyprusVIP Customers248
Channel IslandsPreffered Customers225
SingaporeVIP Customers222
IsraelPreffered Customers214
FinlandPreffered Customers210
JapanVIP Customers205
CyprusPreffered Customers203
SwedenVIP Customers198
IcelandPreffered Customers182
DenmarkPreffered Customers181
EIREPreffered Customers161
AustriaPreffered Customers158
PolandPreffered Customers149
CanadaPreffered Customers135
AustraliaPreffered Customers130
SwedenPreffered Customers75
MaltaPreffered Customers67
JapanPreffered Customers67
BelgiumVIP Customers54
LebanonPreffered Customers45
DenmarkVIP Customers36
GreecePreffered Customers32
\n", + "
" + ], + "text/plain": [ + " CustomerID\n", + "Country Customer status \n", + "United Kingdom Preffered Customers 137450\n", + " VIP Customers 84185\n", + "EIRE VIP Customers 7077\n", + "Germany Preffered Customers 4222\n", + "France VIP Customers 3290\n", + "Germany VIP Customers 3127\n", + "France Preffered Customers 3011\n", + "Netherlands VIP Customers 2080\n", + "Belgium Preffered Customers 1503\n", + "Spain Preffered Customers 1058\n", + "Australia VIP Customers 898\n", + "Switzerland Preffered Customers 776\n", + "Portugal VIP Customers 681\n", + "Norway Preffered Customers 608\n", + "Switzerland VIP Customers 594\n", + "Spain VIP Customers 511\n", + "Italy Preffered Customers 507\n", + "Norway VIP Customers 420\n", + "Portugal Preffered Customers 412\n", + "Channel Islands VIP Customers 364\n", + "Finland VIP Customers 294\n", + "Cyprus VIP Customers 248\n", + "Channel Islands Preffered Customers 225\n", + "Singapore VIP Customers 222\n", + "Israel Preffered Customers 214\n", + "Finland Preffered Customers 210\n", + "Japan VIP Customers 205\n", + "Cyprus Preffered Customers 203\n", + "Sweden VIP Customers 198\n", + "Iceland Preffered Customers 182\n", + "Denmark Preffered Customers 181\n", + "EIRE Preffered Customers 161\n", + "Austria Preffered Customers 158\n", + "Poland Preffered Customers 149\n", + "Canada Preffered Customers 135\n", + "Australia Preffered Customers 130\n", + "Sweden Preffered Customers 75\n", + "Malta Preffered Customers 67\n", + "Japan Preffered Customers 67\n", + "Belgium VIP Customers 54\n", + "Lebanon Preffered Customers 45\n", + "Denmark VIP Customers 36\n", + "Greece Preffered Customers 32" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "df = merged_df[~(merged_df[\"Customer status\"] == \"Other Customers\")]\n", + "\n", + "df.groupby([\"Country\",\"Customer status\"]).agg({\"CustomerID\":'count'}).sort_values(by=\"CustomerID\", 
ascending=False)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
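A note on Q2 and Q3: `agg({"CustomerID": 'count'})` on the merged frame counts order lines, which is why United Kingdom shows 84185 even though `orders_grouped` holds only 4339 customers. The sketch below is one possible way to count distinct customers per country instead. It runs on a small made-up table rather than the real `Orders` data; the rows, the short labels `VIP`/`Preferred`/`Other`, and the names `spend`, `status`, `customers`, `vip_by_country` and `top_by_country` are all assumptions for illustration. It also swaps the row-wise `apply` for `pd.cut`, whose right-closed bins place a customer sitting exactly on the 95th percentile in the Preferred group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy order lines (NOT the real Orders data): one row per purchased item.
rows = [
    # (CustomerID, Country, amount_spent)
    (1, "UK", 10.0), (2, "UK", 20.0), (3, "UK", 30.0),
    (4, "France", 40.0), (5, "France", 50.0),
    (6, "Spain", 60.0), (7, "Spain", 70.0),
    (8, "UK", 80.0), (9, "UK", 90.0),
    # Customer 10 places five separate order lines.
    (10, "EIRE", 200.0), (10, "EIRE", 200.0), (10, "EIRE", 200.0),
    (10, "EIRE", 200.0), (10, "EIRE", 200.0),
]
orders = pd.DataFrame(rows, columns=["CustomerID", "Country", "amount_spent"])

# Sub problem 1: total spend per customer.
spend = orders.groupby("CustomerID")["amount_spent"].sum()

# Sub problem 2: quantile thresholds on the per-customer totals.
q75 = spend.quantile(0.75)
q95 = spend.quantile(0.95)

# Sub problem 3: label each customer. Bins are right-closed, so
# (q75, q95] is Preferred and (q95, inf) is VIP.
status = pd.cut(
    spend,
    bins=[-np.inf, q75, q95, np.inf],
    labels=["Other", "Preferred", "VIP"],
).rename("status")

# One row per (customer, country) pair, so each customer is counted once
# per country no matter how many order lines they have.
customers = (
    orders[["CustomerID", "Country"]]
    .drop_duplicates()
    .merge(status, left_on="CustomerID", right_index=True)
)

# Q2: distinct VIP customers per country.
vip_by_country = (
    customers[customers["status"] == "VIP"]
    .groupby("Country")["CustomerID"]
    .nunique()
    .sort_values(ascending=False)
)

# Q3: distinct VIP + Preferred customers per country.
top_by_country = (
    customers[customers["status"].isin(["VIP", "Preferred"])]
    .groupby("Country")["CustomerID"]
    .nunique()
    .sort_values(ascending=False)
)
```

In this toy table EIRE has more VIP+Preferred order lines (five, all from one customer), while the UK has more distinct VIP+Preferred customers (two). Counting rows and counting customers can therefore rank countries differently. Calling `vip_by_country.idxmax()` or `top_by_country.idxmax()` gives the leading country directly.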