From f752b7f200d07d7fc79e48d4f5752f862fc58469 Mon Sep 17 00:00:00 2001
From: jrasero
Date: Wed, 23 Oct 2024 16:10:56 -0400
Subject: [PATCH] Update documentation
---
06_numpy_intro.html | 9 +-
.../chapters/module-2/026-functions.ipynb | 1913 --------
.../module-2/In-class-activity-Control.ipynb | 82 -
.../chapters/module-2/In-class_100324.ipynb | 95 -
.../module-2/In-class_100324_sols.ipynb | 256 --
_sources/chapters/module-4/041-numpy.ipynb | 761 ---
.../module-4/043-PandasI-Introduction.ipynb | 1865 ++++++++
...andasII-Exploration_and_Manipulation.ipynb | 4095 +++++++++++++++++
_sources/chapters/module-4/Untitled.ipynb | 99 +
chapters/01-getting_started.html | 14 +
chapters/02-python-basics.html | 14 +
chapters/04-python-basics.html | 14 +
.../module-1/012-intro_python (copia).html | 14 +
chapters/module-1/012-intro_python.html | 14 +
chapters/module-1/013-intro_R.html | 14 +
chapters/module-1/Practice.html | 14 +
chapters/module-1/about_course.html | 14 +
chapters/module-1/jupyter_notebooks.html | 14 +
chapters/module-1/programming.html | 14 +
chapters/module-1/tech_stack.html | 14 +
chapters/module-1/your_first_program.html | 14 +
chapters/module-2/02-cover.html | 14 +
chapters/module-2/021-variables.html | 14 +
chapters/module-2/022-operators.html | 14 +
chapters/module-2/023-strings.html | 14 +
chapters/module-2/024-structures.html | 18 +-
.../module-2/0241-structures_exercises.html | 14 +
chapters/module-2/025-conditional.html | 28 +-
.../module-2/0251-conditional_exercises.html | 14 +
chapters/module-2/026-functions.html | 1664 -------
.../module-2/026-iterables_and_iterators.html | 16 +-
.../module-2/0261-functions_exercises.html | 14 +
chapters/module-2/027-functions.html | 26 +-
chapters/module-2/In-class_100324.html | 482 --
chapters/module-2/In-class_100324_sols.html | 602 ---
chapters/module-3/029-packages.html | 14 +
chapters/module-3/03-cover.html | 14 +
.../module-3/031-errors_and_exceptions.html | 11 +-
.../031-errors_and_exceptions_w_sols.html | 10 +-
chapters/module-3/032-classes.html | 21 +-
.../module-3/033-reading_writing_files.html | 20 +-
chapters/module-3/lab-recursion.html | 9 +-
chapters/module-4/041-numpy.html | 1080 -----
chapters/module-4/041-numpyI.html | 39 +-
chapters/module-4/042-numpyII.html | 77 +-
.../module-4/043-PandasI-Introduction.html | 1857 ++++++++
...PandasII-Exploration_and_Manipulation.html | 3338 ++++++++++++++
chapters/module-4/07-numpy-continued.html | 9 +-
.../Untitled.html} | 97 +-
genindex.html | 1 +
index.html | 1 +
objects.inv | Bin 1218 -> 1301 bytes
.../033-reading_writing_files.err.log | 31 -
...asII-Exploration_and_Manipulation.err.log} | 10 +-
.../Untitled.err.log} | 17 +-
search.html | 1 +
searchindex.js | 2 +-
57 files changed, 11841 insertions(+), 7095 deletions(-)
delete mode 100644 _sources/chapters/module-2/026-functions.ipynb
delete mode 100644 _sources/chapters/module-2/In-class-activity-Control.ipynb
delete mode 100644 _sources/chapters/module-2/In-class_100324.ipynb
delete mode 100644 _sources/chapters/module-2/In-class_100324_sols.ipynb
delete mode 100644 _sources/chapters/module-4/041-numpy.ipynb
create mode 100644 _sources/chapters/module-4/043-PandasI-Introduction.ipynb
create mode 100644 _sources/chapters/module-4/044-PandasII-Exploration_and_Manipulation.ipynb
create mode 100644 _sources/chapters/module-4/Untitled.ipynb
delete mode 100644 chapters/module-2/026-functions.html
delete mode 100644 chapters/module-2/In-class_100324.html
delete mode 100644 chapters/module-2/In-class_100324_sols.html
delete mode 100644 chapters/module-4/041-numpy.html
create mode 100644 chapters/module-4/043-PandasI-Introduction.html
create mode 100644 chapters/module-4/044-PandasII-Exploration_and_Manipulation.html
rename chapters/{module-2/In-class-activity-Control.html => module-4/Untitled.html} (69%)
delete mode 100644 reports/chapters/module-3/033-reading_writing_files.err.log
rename reports/chapters/module-4/{042-numpyII.err.log => 044-PandasII-Exploration_and_Manipulation.err.log} (82%)
rename reports/chapters/{module-2/026-functions.err.log => module-4/Untitled.err.log} (68%)
diff --git a/06_numpy_intro.html b/06_numpy_intro.html
index 2925aca..f437def 100644
--- a/06_numpy_intro.html
+++ b/06_numpy_intro.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -206,8 +206,9 @@
Module 4: Python for Data Science
diff --git a/_sources/chapters/module-2/026-functions.ipynb b/_sources/chapters/module-2/026-functions.ipynb
deleted file mode 100644
index 230a920..0000000
--- a/_sources/chapters/module-2/026-functions.ipynb
+++ /dev/null
@@ -1,1913 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "a72ce3cb",
- "metadata": {},
- "source": [
- "# Functions\n",
- "\n",
- "What you will learn in this lesson:\n",
- "\n",
- "- Defining functions\n",
- "- Calling functions\n",
- "- Parameters and arguments\n",
- "- Return values\n",
- "- Lambda functions\n",
- "- Scope and lifetime of variables"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e5a277cd",
- "metadata": {
- "id": "pwrqpQn0iYhf"
- },
- "source": [
- "## I. Introduction\n",
- "\n",
- "Functions take input, perform a specific task and, optionally, produce an output. They contain a block of code to do their work. \n",
- "\n",
- "Function inputs are called both `parameters` and `arguments`.\n",
- "\n",
- "Functions can return a single value, multiple values, or even no value at all.\n",
- "\n",
- "Why do we use functions? \n",
- "\n",
- "- **Code economy**\n",
- "\n",
- "With functions, you can keep your code **short** and **concise**. Once a function is defined, it can be used as many times as needed, so you do not have to write the same code over and over. In addition, functions help your code be more **readable**. For example, if you give a function a well-chosen name, anyone reading your code can already infer what it does.\n",
- "\n",
- "Another form of code economy comes from modules and packages, which give you a way of grouping your code (e.g. functions).\n",
- "\n",
- "- **Parametrization**\n",
- "\n",
- "Functions accept parameters. Therefore, one can study a function’s different behaviors by changing its parameters.\n",
- "\n",
- "- **Production**\n",
- "\n",
- "We can use functions and the fact that they return one or multiple values for production."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1ad4dde7",
- "metadata": {},
- "source": [
- "### Built-in functions"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e8b51ee6",
- "metadata": {},
- "source": [
- "Python provides many built-in functions. You can find the list here: [Python built-in functions](https://docs.python.org/3/library/functions.html) \n",
- "\n",
- "We have already seen some examples of built-in functions, such as `print()`, `id()`, `isinstance()`, `enumerate()` and `zip()`.\n",
- "\n",
- "Another example is `bool()`, which takes an argument (e.g. a variable) and returns `True` or `False`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "53d8dfb8",
- "metadata": {
- "id": "R0SHTW08iYhg"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "True"
- ]
- },
- "execution_count": 2,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# set a variable and pass a comparison to bool()\n",
- "\n",
- "x = 3\n",
- "bool(x < 4)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "bbf61df8",
- "metadata": {
- "id": "7sw_0hnHiYhg"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "False"
- ]
- },
- "execution_count": 3,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "bool(x >= 4)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "7096ed03-a6df-4669-9e62-e45c7f22d855",
- "metadata": {},
- "source": [
- "Another example is the function `help()`, which invokes the built-in help system."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "5f213aaf",
- "metadata": {
- "id": "Chfh1ORHiYhh"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Help on class bool in module builtins:\n",
- "\n",
- "class bool(int)\n",
- " | bool(x) -> bool\n",
- " | \n",
- " | Returns True when the argument x is true, False otherwise.\n",
- " | The builtins True and False are the only two instances of the class bool.\n",
- " | The class bool is a subclass of the class int, and cannot be subclassed.\n",
- " | \n",
- " | Method resolution order:\n",
- " | bool\n",
- " | int\n",
- " | object\n",
- " | \n",
- " | Methods defined here:\n",
- " | \n",
- " | __and__(self, value, /)\n",
- " | Return self&value.\n",
- " | \n",
- " | __or__(self, value, /)\n",
- " | Return self|value.\n",
- " | \n",
- " | __rand__(self, value, /)\n",
- " | Return value&self.\n",
- " | \n",
- " | __repr__(self, /)\n",
- " | Return repr(self).\n",
- " | \n",
- " | __ror__(self, value, /)\n",
- " | Return value|self.\n",
- " | \n",
- " | __rxor__(self, value, /)\n",
- " | Return value^self.\n",
- " | \n",
- " | __xor__(self, value, /)\n",
- " | Return self^value.\n",
- " | \n",
- " | ----------------------------------------------------------------------\n",
- " | Static methods defined here:\n",
- " | \n",
- " | __new__(*args, **kwargs)\n",
- " | Create and return a new object. See help(type) for accurate signature.\n",
- " | \n",
- " | ----------------------------------------------------------------------\n",
- " | Methods inherited from int:\n",
- " | \n",
- " | __abs__(self, /)\n",
- " | abs(self)\n",
- " | \n",
- " | __add__(self, value, /)\n",
- " | Return self+value.\n",
- " | \n",
- " | __bool__(self, /)\n",
- " | True if self else False\n",
- " | \n",
- " | __ceil__(...)\n",
- " | Ceiling of an Integral returns itself.\n",
- " | \n",
- " | __divmod__(self, value, /)\n",
- " | Return divmod(self, value).\n",
- " | \n",
- " | __eq__(self, value, /)\n",
- " | Return self==value.\n",
- " | \n",
- " | __float__(self, /)\n",
- " | float(self)\n",
- " | \n",
- " | __floor__(...)\n",
- " | Flooring an Integral returns itself.\n",
- " | \n",
- " | __floordiv__(self, value, /)\n",
- " | Return self//value.\n",
- " | \n",
- " | __format__(self, format_spec, /)\n",
- " | Default object formatter.\n",
- " | \n",
- " | __ge__(self, value, /)\n",
- " | Return self>=value.\n",
- " | \n",
- " | __getattribute__(self, name, /)\n",
- " | Return getattr(self, name).\n",
- " | \n",
- " | __getnewargs__(self, /)\n",
- " | \n",
- " | __gt__(self, value, /)\n",
- " | Return self>value.\n",
- " | \n",
- " | __hash__(self, /)\n",
- " | Return hash(self).\n",
- " | \n",
- " | __index__(self, /)\n",
- " | Return self converted to an integer, if self is suitable for use as an index into a list.\n",
- " | \n",
- " | __int__(self, /)\n",
- " | int(self)\n",
- " | \n",
- " | __invert__(self, /)\n",
- " | ~self\n",
- " | \n",
- " | __le__(self, value, /)\n",
- " | Return self<=value.\n",
- " | \n",
- " | __lshift__(self, value, /)\n",
- " | Return self<<value.\n",
- " | \n",
- " | __rshift__(self, value, /)\n",
- " | Return self>>value.\n",
- " | \n",
- " | __rsub__(self, value, /)\n",
- " | Return value-self.\n",
- " | \n",
- " | __rtruediv__(self, value, /)\n",
- " | Return value/self.\n",
- " | \n",
- " | __sizeof__(self, /)\n",
- " | Returns size in memory, in bytes.\n",
- " | \n",
- " | __sub__(self, value, /)\n",
- " | Return self-value.\n",
- " | \n",
- " | __truediv__(self, value, /)\n",
- " | Return self/value.\n",
- " | \n",
- " | __trunc__(...)\n",
- " | Truncating an Integral returns itself.\n",
- " | \n",
- " | as_integer_ratio(self, /)\n",
- " | Return integer ratio.\n",
- " | \n",
- " | Return a pair of integers, whose ratio is exactly equal to the original int\n",
- " | and with a positive denominator.\n",
- " | \n",
- " | >>> (10).as_integer_ratio()\n",
- " | (10, 1)\n",
- " | >>> (-10).as_integer_ratio()\n",
- " | (-10, 1)\n",
- " | >>> (0).as_integer_ratio()\n",
- " | (0, 1)\n",
- " | \n",
- " | bit_count(self, /)\n",
- " | Number of ones in the binary representation of the absolute value of self.\n",
- " | \n",
- " | Also known as the population count.\n",
- " | \n",
- " | >>> bin(13)\n",
- " | '0b1101'\n",
- " | >>> (13).bit_count()\n",
- " | 3\n",
- " | \n",
- " | bit_length(self, /)\n",
- " | Number of bits necessary to represent self in binary.\n",
- " | \n",
- " | >>> bin(37)\n",
- " | '0b100101'\n",
- " | >>> (37).bit_length()\n",
- " | 6\n",
- " | \n",
- " | conjugate(...)\n",
- " | Returns self, the complex conjugate of any int.\n",
- " | \n",
- " | to_bytes(self, /, length=1, byteorder='big', *, signed=False)\n",
- " | Return an array of bytes representing an integer.\n",
- " | \n",
- " | length\n",
- " | Length of bytes object to use. An OverflowError is raised if the\n",
- " | integer is not representable with the given number of bytes. Default\n",
- " | is length 1.\n",
- " | byteorder\n",
- " | The byte order used to represent the integer. If byteorder is 'big',\n",
- " | the most significant byte is at the beginning of the byte array. If\n",
- " | byteorder is 'little', the most significant byte is at the end of the\n",
- " | byte array. To request the native byte order of the host system, use\n",
- " | `sys.byteorder' as the byte order value. Default is to use 'big'.\n",
- " | signed\n",
- " | Determines whether two's complement is used to represent the integer.\n",
- " | If signed is False and a negative integer is given, an OverflowError\n",
- " | is raised.\n",
- " | \n",
- " | ----------------------------------------------------------------------\n",
- " | Class methods inherited from int:\n",
- " | \n",
- " | from_bytes(bytes, byteorder='big', *, signed=False)\n",
- " | Return the integer represented by the given array of bytes.\n",
- " | \n",
- " | bytes\n",
- " | Holds the array of bytes to convert. The argument must either\n",
- " | support the buffer protocol or be an iterable object producing bytes.\n",
- " | Bytes and bytearray are examples of built-in objects that support the\n",
- " | buffer protocol.\n",
- " | byteorder\n",
- " | The byte order used to represent the integer. If byteorder is 'big',\n",
- " | the most significant byte is at the beginning of the byte array. If\n",
- " | byteorder is 'little', the most significant byte is at the end of the\n",
- " | byte array. To request the native byte order of the host system, use\n",
- " | `sys.byteorder' as the byte order value. Default is to use 'big'.\n",
- " | signed\n",
- " | Indicates whether two's complement is used to represent the integer.\n",
- " | \n",
- " | ----------------------------------------------------------------------\n",
- " | Data descriptors inherited from int:\n",
- " | \n",
- " | denominator\n",
- " | the denominator of a rational number in lowest terms\n",
- " | \n",
- " | imag\n",
- " | the imaginary part of a complex number\n",
- " | \n",
- " | numerator\n",
- " | the numerator of a rational number in lowest terms\n",
- " | \n",
- " | real\n",
- " | the real part of a complex number\n",
- "\n"
- ]
- }
- ],
- "source": [
- "help(bool)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "535f1089",
- "metadata": {
- "id": "ZxNz1zlQiYhh"
- },
- "source": [
- "## II. Creating Functions\n",
- "\n",
- "Let's write a function to compare a list of values against a threshold."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "27e84754",
- "metadata": {
- "id": "A-pPSAWpiYhh"
- },
- "outputs": [],
- "source": [
- "def vals_greater_than_or_equal_to_threshold(vals, thresh):\n",
- " '''\n",
- " PURPOSE: Given a list of values, compare each value against a threshold\n",
- "\n",
- " INPUTS\n",
- " vals list of ints or floats\n",
- " thresh int or float\n",
- "\n",
- " OUTPUT\n",
- " bools list of booleans\n",
- " '''\n",
- "\n",
- " bools = [val >= thresh for val in vals]\n",
- " return bools"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "72c39f46",
- "metadata": {
- "id": "fKh4CY2GiYhh"
- },
- "source": [
- "**Let's break down the components:**\n",
- "- the function definition starts with `def`, followed by the function name, zero or more parameters in parentheses, and then a colon.\n",
- "- next comes a `docstring` to provide annotation\n",
- "- the function body follows\n",
- "- last comes a `return` statement\n",
- "\n",
- "The `function call` is how a function is used. It consists of the function name and the required arguments:\n",
- "\n",
- "`vals_greater_than_or_equal_to_threshold(arg1, arg2)` where `arg1`, `arg2` are arbitrary names."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f6a3758b",
- "metadata": {
- "id": "gVvykZQviYhh"
- },
- "source": [
- "#### docstring\n",
- "\n",
- "- A `docstring` is a string that occurs as the first statement in a module, function, class, or method definition\n",
- "- Saved in the `__doc__` attribute\n",
- "- Needs to be indented\n",
- "- ``` '''enclosed in triple quotes like this''' ```\n",
- "\n",
- "**We gave this function a descriptive docstring to:**\n",
- "\n",
- "- explain its purpose\n",
- "- name each input and output, and give their data types"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "2cba0c04",
- "metadata": {
- "id": "_yAjwpBK6O1u"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "bool(x) -> bool\n",
- "\n",
- "Returns True when the argument x is true, False otherwise.\n",
- "The builtins True and False are the only two instances of the class bool.\n",
- "The class bool is a subclass of the class int, and cannot be subclassed.\n"
- ]
- }
- ],
- "source": [
- "# Print the docstring to read about a function\n",
- "print(bool.__doc__)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "b22dbd98-c620-471a-ad40-69818fcf00b6",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Help on function vals_greater_than_or_equal_to_threshold in module __main__:\n",
- "\n",
- "vals_greater_than_or_equal_to_threshold(vals, thresh)\n",
- " PURPOSE: Given a list of values, compare each value against a threshold\n",
- " \n",
- " INPUTS\n",
- " vals list of ints or floats\n",
- " thresh int or float\n",
- " \n",
- " OUTPUT\n",
- " bools list of booleans\n",
- "\n"
- ]
- }
- ],
- "source": [
- "help(vals_greater_than_or_equal_to_threshold)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a06e91c2",
- "metadata": {
- "id": "7jv-faHxiYhh"
- },
- "source": [
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6106a68e",
- "metadata": {
- "id": "pktQhKifiYhh"
- },
- "source": [
- "The function body used a `list comprehension` for the comparison:\n",
- "\n",
- "`[val >= thresh for val in vals]`\n",
- "\n",
- "**Let's test our function**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "id": "c438fdb9",
- "metadata": {
- "id": "8AhHzhMgiYhh"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "[False, True]"
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# validate that it works for ints\n",
- "\n",
- "x = [3, 4]\n",
- "thr = 4\n",
- "\n",
- "vals_greater_than_or_equal_to_threshold(x, thr)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "id": "74a03d8d",
- "metadata": {
- "id": "COZGpj27iYhh"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "[False, True]"
- ]
- },
- "execution_count": 11,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# validate that it works for floats\n",
- "\n",
- "x = [3.0, 4.2]\n",
- "thr = 4.2\n",
- "\n",
- "vals_greater_than_or_equal_to_threshold(x, thr)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b0f5fcb8",
- "metadata": {
- "id": "GeWMQNEsiYhh"
- },
- "source": [
- "This gives correct results and does exactly what we want. "
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4a9adf81",
- "metadata": {
- "id": "in16N9rXiYhh"
- },
- "source": [
- "**print the docstring**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "id": "c6857a03",
- "metadata": {
- "id": "VKApgn6XiYhh"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- " PURPOSE: Given a list of values, compare each value against a threshold\n",
- "\n",
- " INPUTS\n",
- " vals list of ints or floats\n",
- " thresh int or float\n",
- "\n",
- " OUTPUT\n",
- " bools list of booleans\n",
- " \n"
- ]
- }
- ],
- "source": [
- "print(vals_greater_than_or_equal_to_threshold.__doc__)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ed9431c3",
- "metadata": {
- "id": "4f5zLGXbiYhh"
- },
- "source": [
- "**print the help**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "id": "96ae88e9",
- "metadata": {
- "id": "ro9qsgsaiYhh"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Help on function vals_greater_than_or_equal_to_threshold in module __main__:\n",
- "\n",
- "vals_greater_than_or_equal_to_threshold(vals, thresh)\n",
- " PURPOSE: Given a list of values, compare each value against a threshold\n",
- " \n",
- " INPUTS\n",
- " vals list of ints or floats\n",
- " thresh int or float\n",
- " \n",
- " OUTPUT\n",
- " bools list of booleans\n",
- "\n"
- ]
- }
- ],
- "source": [
- "help(vals_greater_than_or_equal_to_threshold)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1ff9fe71",
- "metadata": {
- "id": "5RCRce0_iYhi"
- },
- "source": [
- "## III. Arguments and parameters\n",
- "\n",
- "**Functions need to be called with the correct number of arguments.**\n",
- " \n",
- "This function requires two arguments, but the function call supplies only one.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "50f4e553",
- "metadata": {
- "id": "fZUJYTBliYhi"
- },
- "outputs": [
- {
- "ename": "TypeError",
- "evalue": "fcn_bad_args() missing 1 required positional argument: 'y'",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
- "Input \u001b[0;32mIn [1]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m x\u001b[38;5;241m+\u001b[39my\n\u001b[1;32m 5\u001b[0m \u001b[38;5;66;03m# function call with only 1 of the 2 arguments\u001b[39;00m\n\u001b[0;32m----> 6\u001b[0m fcn_bad_args(\u001b[38;5;241m10\u001b[39m)\n",
- "\u001b[0;31mTypeError\u001b[0m: fcn_bad_args() missing 1 required positional argument: 'y'"
- ]
- }
- ],
- "source": [
- "## function requiring 2 parameters\n",
- "def fcn_bad_args(x, y):\n",
- " return x+y\n",
- "\n",
- "# function call with only 1 of the 2 arguments\n",
- "fcn_bad_args(10)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "beffed14",
- "metadata": {
- "id": "iRRKVuiTiYhi"
- },
- "source": [
- "**When calling a function, argument order matters.**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "cf1dabe6",
- "metadata": {
- "id": "moaHmNmWiYhi"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "fcn_swapped_args(x,y) = 7\n",
- "fcn_swapped_args(y,x) = 11\n"
- ]
- }
- ],
- "source": [
- "x = 1\n",
- "y = 2\n",
- "\n",
- "# function with order of x then y\n",
- "def fcn_swapped_args(x, y):\n",
- " out = 5 * x + y\n",
- " return out\n",
- "\n",
- "# call function in correct order\n",
- "print('fcn_swapped_args(x,y) =', fcn_swapped_args(x,y))\n",
- "\n",
- "# call function in incorrect order\n",
- "print('fcn_swapped_args(y,x) =', fcn_swapped_args(y,x))"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2e87abcf",
- "metadata": {
- "id": "GXEHQpFxiYhi"
- },
- "source": [
- "Generally it's best to keep parameters in order. \n",
- "\n",
- "You can swap the order by putting the parameter names in the function call."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "4c139ef2",
- "metadata": {
- "id": "FATEk49QiYhi"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "7"
- ]
- },
- "execution_count": 3,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "x1 = 1\n",
- "y1 = 2\n",
- "\n",
- "# call parameter names in function call\n",
- "fcn_swapped_args(y=y1, x=x1)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a0bac45e",
- "metadata": {
- "id": "pguDfat9iYhi"
- },
- "source": [
- "**Weirdness Alert**\n",
- "\n",
- "Note that the same name can be used for the parameter names and the variables passed to them.\n",
- "\n",
- "The names themselves have nothing to do with each other!\n",
- "\n",
- "In other words, just because a function names a parameter `x`, \\\n",
- "the variables passed to it don't have to be named `x` or anything like it. \\\n",
- "They can even be named the same thing -- it does not matter."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 52,
- "id": "d5af7774",
- "metadata": {
- "id": "wXAk3914iYhi"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "7"
- ]
- },
- "execution_count": 52,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "foo = 1\n",
- "bar = 2\n",
- "\n",
- "fcn_swapped_args(foo, bar)\n",
- "\n",
- "# works even though function was written as fcn_swapped_args(x, y)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f776e8ff",
- "metadata": {},
- "source": [
- "**Arguments can be passed by position, where order matters, or by keyword.**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "9e86b44f",
- "metadata": {},
- "outputs": [],
- "source": [
- "# x and y can be passed by position; param, after the bare *, must be passed by keyword\n",
- "def fcn_swapped_args(x, y, *, param):\n",
- " out = 5 * x + y + param\n",
- " return out"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "81a153b6",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "8"
- ]
- },
- "execution_count": 5,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "fcn_swapped_args(1,2, param=1)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bf4402d1",
- "metadata": {
- "id": "gG0H4XFgiYhi"
- },
- "source": [
- "## IV. Unpacking list-likes using `*args`\n",
- "\n",
- "The `*` operator can be used to avoid specifying the arguments individually."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "af9f9181",
- "metadata": {
- "id": "GWJUkt0NiYhi"
- },
- "outputs": [],
- "source": [
- "def show_arg_expansion(*models):\n",
- "\n",
- " print(\"models :\", models)\n",
- " print(\"input arg type :\", type(models))\n",
- " print(\"input arg length:\", len(models))\n",
- " print(\"-----------------------------\")\n",
- "\n",
- " for mod in models:\n",
- " print(mod)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "112349f4",
- "metadata": {
- "id": "1-F_nKl9iYhi"
- },
- "source": [
- "We can pass several values to the function, and they are collected into a tuple..."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "eee8af4d",
- "metadata": {
- "id": "GFQ3TXbliYhi"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "models : ('logreg', 'naive_bayes', 'gbm')\n",
- "input arg type : <class 'tuple'>\n",
- "input arg length: 3\n",
- "-----------------------------\n",
- "logreg\n",
- "naive_bayes\n",
- "gbm\n"
- ]
- }
- ],
- "source": [
- "show_arg_expansion(\"logreg\", \"naive_bayes\", \"gbm\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2d5f6efc",
- "metadata": {
- "id": "P6r87diciYhi"
- },
- "source": [
- "You can pass a list to the function.\n",
- "\n",
- "If you want the elements unpacked, put * before the list."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "ab121d92",
- "metadata": {
- "id": "ruipdu0wiYho"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "models : ('logreg', 'naive_bayes', 'gbm')\n",
- "input arg type : <class 'tuple'>\n",
- "input arg length: 3\n",
- "-----------------------------\n",
- "logreg\n",
- "naive_bayes\n",
- "gbm\n"
- ]
- }
- ],
- "source": [
- "models = [\"logreg\", \"naive_bayes\", \"gbm\"]\n",
- "show_arg_expansion(*models)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ee876a90",
- "metadata": {
- "id": "Bf1mbaOkiYho"
- },
- "source": [
- "**This approach allows your function to accept an arbitrary number of arguments**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "ce334362",
- "metadata": {
- "id": "eUaGaNdQiYhp"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "models : (['a', 'b', 'c', 'd', 'e', 'f', 'g'],)\n",
- "input arg type : <class 'tuple'>\n",
- "input arg length: 1\n",
- "-----------------------------\n",
- "['a', 'b', 'c', 'd', 'e', 'f', 'g']\n"
- ]
- }
- ],
- "source": [
- "show_arg_expansion('a b c d e f g'.split())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "e4d33c11",
- "metadata": {
- "id": "6YEbcVy3iYhp"
- },
- "outputs": [],
- "source": [
- "def arg_expansion_example(x, y):\n",
- " return x**y"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bfc5b09c",
- "metadata": {
- "id": "xckEFWq3iYhp"
- },
- "source": [
- "**The reverse is true, too.**\n",
- "\n",
- "You can use the `*` operator to pass list-like objects to a function that specifies its arguments."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "id": "f53f3340",
- "metadata": {
- "id": "wU7NxYsziYhp"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "256"
- ]
- },
- "execution_count": 9,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "my_args = [2, 8]\n",
- "arg_expansion_example(*my_args)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e08e573f",
- "metadata": {
- "id": "0MEHR85ziYhp"
- },
- "source": [
- "But, the passed object must be the right length."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "id": "05a992b3",
- "metadata": {
- "id": "edIKHtO6iYhp"
- },
- "outputs": [
- {
- "ename": "TypeError",
- "evalue": "arg_expansion_example() takes 2 positional arguments but 3 were given",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
- "Input \u001b[0;32mIn [10]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m my_args2 \u001b[38;5;241m=\u001b[39m [\u001b[38;5;241m2\u001b[39m, \u001b[38;5;241m8\u001b[39m, \u001b[38;5;241m5\u001b[39m]\n\u001b[0;32m----> 2\u001b[0m arg_expansion_example(\u001b[38;5;241m*\u001b[39mmy_args2)\n",
- "\u001b[0;31mTypeError\u001b[0m: arg_expansion_example() takes 2 positional arguments but 3 were given"
- ]
- }
- ],
- "source": [
- "my_args2 = [2, 8, 5]\n",
- "arg_expansion_example(*my_args2)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b00557e5",
- "metadata": {
- "id": "DuItUjJWiYhp"
- },
- "source": [
- "## V. Default Arguments"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "92eb66ed",
- "metadata": {
- "id": "zOX-dScLiYhp"
- },
- "source": [
- "A `default argument` sets a parameter's value when the call leaves it unspecified."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "id": "e87c0659",
- "metadata": {
- "id": "PhA5ZjvZiYhp"
- },
- "outputs": [],
- "source": [
- "def show_results(precision, printing=True):\n",
- " precision = round(precision, 2)\n",
- "\n",
- " if printing:\n",
- " print('precision =', precision)\n",
- " return precision"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "id": "f3e8190d",
- "metadata": {
- "id": "eSoSgF9YiYhp"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "precision = 0.91\n"
- ]
- }
- ],
- "source": [
- "pr = 0.912\n",
- "res = show_results(pr)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "fd91d810",
- "metadata": {
- "id": "gUL5UY1uiYhp"
- },
- "source": [
- "The function call didn't specify `printing`, so it defaulted to True.\n",
- "\n",
- "Default arguments must follow non-default arguments. This causes trouble:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "id": "e13eaf43",
- "metadata": {
- "id": "06tJnUQviYhp"
- },
- "outputs": [
- {
- "ename": "SyntaxError",
- "evalue": "non-default argument follows default argument (1599950692.py, line 1)",
- "output_type": "error",
- "traceback": [
- "\u001b[0;36m Input \u001b[0;32mIn [13]\u001b[0;36m\u001b[0m\n\u001b[0;31m def show_results(precision, printing=True, uhoh):\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m non-default argument follows default argument\n"
- ]
- }
- ],
- "source": [
- "def show_results(precision, printing=True, uhoh):\n",
- "    precision = round(precision, 2)\n",
- "\n",
- "    if printing:\n",
- "        print('precision =', precision)\n",
- "    return precision"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "3934953d",
- "metadata": {
- "id": "ECgpfITPiYhp"
- },
- "source": [
- "## VI. Returning Values\n",
- "\n",
- "Functions are not required to have a `return` statement.\n",
- "If there is no `return` statement, the function returns the `None` object. \n",
- "\n",
- "Functions can return no value (the `None` object), one value, or several values packed into a tuple. \n",
- "\n",
- "Any Python object can be returned. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "id": "c9dcf261",
- "metadata": {
- "id": "fZtuJVPdiYhp"
- },
- "outputs": [],
- "source": [
- "# returns None, and prints.\n",
- "\n",
- "def fcn_nothing_to_return(x, y):\n",
- "    out = 'nothing to see here!'\n",
- "    print(out)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "id": "7b3da1ba",
- "metadata": {
- "id": "p2l6oKBxiYhp"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "nothing to see here!\n"
- ]
- }
- ],
- "source": [
- "fcn_nothing_to_return(x, y)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "id": "df02aa66",
- "metadata": {
- "id": "1sYG6tFMiYhp"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "nothing to see here!\n",
- "None\n"
- ]
- }
- ],
- "source": [
- "r = fcn_nothing_to_return(1, 1)\n",
- "print(r)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "id": "2fb50cf1",
- "metadata": {
- "id": "iSW-A_N1iYhp"
- },
- "outputs": [],
- "source": [
- "# returns three values\n",
- "def negate_coords(x, y, z):\n",
- "    return -x, -y, -z"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 18,
- "id": "b311a871",
- "metadata": {
- "id": "UQrSV9nKiYhp"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "a= -10\n",
- "b= -20\n",
- "c= -30\n"
- ]
- }
- ],
- "source": [
- "a,b,c = negate_coords(10,20,30)\n",
- "print('a=', a)\n",
- "print('b=', b)\n",
- "print('c=', c)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2e58d483",
- "metadata": {
- "id": "ifsJheijiYhp"
- },
- "source": [
- "**If you don't need one of the returned values, assign it to the placeholder variable `_`**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 19,
- "id": "4c08f4c0",
- "metadata": {
- "id": "AgBl1V6TiYhp"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "d= -10\n",
- "e= -20\n"
- ]
- }
- ],
- "source": [
- "d,e,_ = negate_coords(10,20,30)\n",
- "print('d=', d)\n",
- "print('e=', e)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d70c1880",
- "metadata": {
- "id": "AmIDwm-xiYhq"
- },
- "source": [
- "**Note:** For clarity, it's generally a good idea to include a `return` statement even when the function does not return a value. \n",
- "You can use `return` or `return None`."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e2419a1b",
- "metadata": {
- "id": "AlIg9oEEiYhq"
- },
- "source": [
- "**Functions can contain multiple return statements**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "id": "0b4abca7",
- "metadata": {
- "id": "rbJKLgjQiYhq"
- },
- "outputs": [],
- "source": [
- "# For non-negative values, the first `return` is reached. \n",
- "# For negative values, the second `return` is reached.\n",
- "def absolute_value(num):\n",
- "    if num >= 0:\n",
- "        return num\n",
- "    return -num"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 24,
- "id": "9747ffe6",
- "metadata": {
- "id": "n66QsiX0iYhq"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "4"
- ]
- },
- "execution_count": 24,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "absolute_value(-4)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 25,
- "id": "55691a2a",
- "metadata": {
- "id": "PBB4Rgl6iYhq"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "4"
- ]
- },
- "execution_count": 25,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "absolute_value(4)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "7a24a84a",
- "metadata": {
- "id": "M5ofH0wjiYhq"
- },
- "source": [
- "## VII. Variable Scope\n",
- "\n",
- "A variable's **scope** is the part of a program where it is **visible**.\n",
- "\n",
- "Visible means available or usable.\n",
- "\n",
- "If a variable is **in scope** to a function, it is visible to the function.\n",
- "\n",
- "If it is **out of scope** to a function, it is not visible to the function.\n",
- "\n",
- "When a variable is defined inside of a function, it is not visible outside of the function.\n",
- "\n",
- "We say such variables are **local** to the function.\n",
- "\n",
- "Local names are also discarded when the function returns; the objects they referred to are freed once nothing else references them."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 33,
- "id": "b5ad2fc5",
- "metadata": {
- "id": "xv-6GxpmiYhq"
- },
- "outputs": [],
- "source": [
- "def show_scope(x):\n",
- "    x = 10*x\n",
- "    z = 4\n",
- "    print('z inside function =', z)\n",
- "    print('memory address of z inside function =', hex(id(z)))\n",
- "    return x"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 34,
- "id": "8b34ad6f",
- "metadata": {
- "id": "b-rPp26DiYhq"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "z inside function = 4\n",
- "memory address of z inside function = 0x87f448\n"
- ]
- },
- {
- "data": {
- "text/plain": [
- "60"
- ]
- },
- "execution_count": 34,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# This code recognizes z from inside the function.\n",
- "show_scope(6)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 35,
- "id": "70e63789",
- "metadata": {
- "id": "WPxisGHoiYhq"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "z = 2\n"
- ]
- }
- ],
- "source": [
- "# Referencing z outside the function raises a NameError unless z is also defined at the top level.\n",
- "print('z =', z)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d1bf5b32",
- "metadata": {
- "id": "MEQu0vTEiYhq"
- },
- "source": [
- "If we define `z` outside the function and then call the function, the assignment to `z` inside it does not affect the outer `z`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 36,
- "id": "c89eafdd",
- "metadata": {
- "id": "hy-7O-SIiYhq"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "z outside: 0x87f408\n",
- "z inside function = 4\n",
- "memory address of z inside function = 0x87f448\n",
- "z = 2\n"
- ]
- }
- ],
- "source": [
- "z = 2\n",
- "print('z outside:', hex(id(z)))\n",
- "out = show_scope(6)\n",
- "print('z = ', z)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ce7821a9",
- "metadata": {
- "id": "Qw_Ux3H6iYhq"
- },
- "source": [
- "### Local versus Global Variables\n",
- "\n",
- "It is helpful to have a good understanding of local versus global variables. \n",
- "\n",
- "Not having this understanding can lead to surprises and confusion. \n",
- "\n",
- "**Example 1: Variable defined outside function, used inside function**\n",
- "\n",
- "In the code below: \n",
- "\n",
- "`x` is global and seen from inside the function. \n",
- "`r` is local to the function. Trying to print it outside the function raises an error."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 43,
- "id": "1979b09a",
- "metadata": {
- "id": "Shy4_1YJiYhq"
- },
- "outputs": [],
- "source": [
- "x = 10\n",
- "\n",
- "def fcn(r):\n",
- "    out = x + r\n",
- "    return(out)\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 44,
- "id": "46ed7069",
- "metadata": {
- "id": "Wa7Nou2oiYhr"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "16\n"
- ]
- }
- ],
- "source": [
- "print(fcn(6)) # works"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 48,
- "id": "94ea7040",
- "metadata": {
- "id": "LzWya7D3iYhr"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "None\n"
- ]
- }
- ],
- "source": [
- "print(r) # fails"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "8d446ec3",
- "metadata": {
- "id": "_qKdTaTUiYhr"
- },
- "source": [
- "**Example 2: Variable defined outside function, updated and used inside function**\n",
- "\n",
- "Assigning to `x` inside `fcn` creates a local `x`; the global `x` is left unchanged."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 49,
- "id": "20d5484d",
- "metadata": {
- "id": "M6ljfBCXiYhr"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "x from fcn: 20\n",
- "fcn(6): 26\n",
- "x: 10\n"
- ]
- }
- ],
- "source": [
- "x = 10\n",
- "\n",
- "def fcn(a):\n",
- "    x = 20\n",
- "    total = x + a  # avoid naming a variable `sum`, which would shadow the built-in\n",
- "    print('x from fcn:', x)\n",
- "    return(total)\n",
- "\n",
- "print('fcn(6):', fcn(6))\n",
- "print('x:', x)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9dc44cb9",
- "metadata": {
- "id": "AIoU3FS_iYhr"
- },
- "source": [
- "**Example 3: Variable defined outside function. Inside function, print variable, update, and use**\n",
- "\n",
- "This one may be confusing. It fails! \n",
- "\n",
- "Because `x` is assigned somewhere inside the function, Python treats every use of `x` in the function as local. \n",
- "The `print()` call runs before the local `x` is assigned, so it raises `UnboundLocalError`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 50,
- "id": "8631e0b7",
- "metadata": {
- "id": "CgCgPEjSiYhr"
- },
- "outputs": [
- {
- "ename": "UnboundLocalError",
- "evalue": "cannot access local variable 'x' where it is not associated with a value",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mUnboundLocalError\u001b[0m Traceback (most recent call last)",
- "Input \u001b[0;32mIn [50]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mx from fcn, after update:\u001b[39m\u001b[38;5;124m'\u001b[39m, x)\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m(out)\n\u001b[0;32m---> 10\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mfcn(6):\u001b[39m\u001b[38;5;124m'\u001b[39m, fcn(\u001b[38;5;241m6\u001b[39m))\n\u001b[1;32m 11\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mx:\u001b[39m\u001b[38;5;124m'\u001b[39m, x)\n",
- "Input \u001b[0;32mIn [50]\u001b[0m, in \u001b[0;36mfcn\u001b[0;34m(a)\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mfcn\u001b[39m(a):\n\u001b[0;32m----> 4\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mx from fcn, before update:\u001b[39m\u001b[38;5;124m'\u001b[39m, x)\n\u001b[1;32m 5\u001b[0m x \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m20\u001b[39m\n\u001b[1;32m 6\u001b[0m out \u001b[38;5;241m=\u001b[39m x \u001b[38;5;241m+\u001b[39m a\n",
- "\u001b[0;31mUnboundLocalError\u001b[0m: cannot access local variable 'x' where it is not associated with a value"
- ]
- }
- ],
- "source": [
- "x = 10\n",
- "\n",
- "def fcn(a):\n",
- "    print('x from fcn, before update:', x)\n",
- "    x = 20\n",
- "    out = x + a\n",
- "    print('x from fcn, after update:', x)\n",
- "    return(out)\n",
- "\n",
- "print('fcn(6):', fcn(6))\n",
- "print('x:', x)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ebe3ddd2",
- "metadata": {
- "id": "ezZJ3GPIiYhr"
- },
- "source": [
- "The error can be fixed by declaring `x` as `global` inside the function.\n",
- "\n",
- "The declaration is only necessary if we wish to reassign the variable; reading a global variable needs no declaration.\n",
- "\n",
- "It is also useful when we want several functions to operate on the same variable."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 51,
- "id": "519f8a98",
- "metadata": {
- "id": "WZpWaWqSiYhr"
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "x from fcn, before update: 10\n",
- "x from fcn, after update: 20\n",
- "fcn(6): 26\n",
- "x: 20\n"
- ]
- }
- ],
- "source": [
- "x = 10\n",
- "\n",
- "def fcn(a):\n",
- "    global x # add this to reference global x outside function\n",
- "    print('x from fcn, before update:', x)\n",
- "    x = 20\n",
- "    out = x + a\n",
- "    print('x from fcn, after update:', x)\n",
- "    return(out)\n",
- "\n",
- "print('fcn(6):', fcn(6))\n",
- "print('x:', x)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "fda91375",
- "metadata": {
- "id": "U5bXlKVFiYhr"
- },
- "source": [
- "## VIII. Function Design\n",
- "\n",
- "\n",
- "Some good practices for creating and using functions:\n",
- "\n",
- "- Design each function to do one thing.\n",
- "\n",
- "Make functions as simple as possible. That way, each function will be: \n",
- "- more comprehensible\n",
- "- easier to maintain\n",
- "- reusable\n",
- "\n",
- "This helps avoid situations where a team has 20 variations of similar functions.\n",
- "\n",
- "Give your function a good name. What makes a function name good? \n",
- "\n",
- "- It should reflect the action it performs.\n",
- "- It should follow consistent naming conventions.\n",
- "- A name like `compute_variances_sort_save_print` suggests the function is overworked!\n",
- "- If the function `compute_variances` also produces plots and updates variables, it will cause confusion. \n",
- "\n",
- "Always give your function a docstring:\n",
- "\n",
- "- This is particularly important because Python does not require data types to be declared. \n",
- "- As a side note, you can record this information with type annotations.\n",
- "\n",
- "Function docstrings are stored in the `__doc__` attribute; they can be shown like this:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "fc4e113d",
- "metadata": {
- "id": "0SE10DYaiYhr"
- },
- "outputs": [],
- "source": [
- "print(bool.__doc__)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1bc5b382",
- "metadata": {
- "id": "yZQX-GkIiYhr"
- },
- "source": [
- "---"
- ]
- }
- ],
- "metadata": {
- "anaconda-cloud": {},
- "colab": {
- "include_colab_link": true,
- "provenance": []
- },
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.9"
- },
- "vscode": {
- "interpreter": {
- "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/_sources/chapters/module-2/In-class-activity-Control.ipynb b/_sources/chapters/module-2/In-class-activity-Control.ipynb
deleted file mode 100644
index 11ab158..0000000
--- a/_sources/chapters/module-2/In-class-activity-Control.ipynb
+++ /dev/null
@@ -1,82 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "a113c941-6d6e-4b92-a4ad-f14bbab59828",
- "metadata": {},
- "source": [
- "4 - What prints in the above `if`, `elif` example if val = 5? "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "7e53291d-5337-4e8c-b16f-5668d934a678",
- "metadata": {},
- "outputs": [],
- "source": [
- "val = 5\n",
- "\n",
- "if -10 < val < -5:\n",
- "    print('bucket 1')\n",
- "if -5 <= val < -2:\n",
- "    print('bucket 2')\n",
- "elif val == -2:\n",
- "    print('bucket 3')"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9e6e97b3-f7c3-4018-aaa1-b1208d41063b",
- "metadata": {},
- "source": [
- "1 - Write a statement that evaluates whether a number is divisible by 2, divisible by 3, or divisible by both. If the number is not divisible by either, make sure to include a statement for that case. Test the numbers: 4, 6, 9, 13."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "216bdc5c-f5ad-4d84-bec7-d4cf2bfc5349",
- "metadata": {},
- "source": [
- "2- Write a piece of code that does the following:\n",
- "\n",
- "- First, it sets a variable `it` to 0\n",
- "- Then, it sets a variable `max_iter` to 100\n",
- "- After that, it does the following while `it` < `max_iter`:\n",
- " - If `it` is equal to 0 or `it` is divisible by 10, print `it`. Hint: use the modulo operator `%`.\n",
- " - It increases `it` by 1\n",
- "\n",
- "How many iterations did this loop run?"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "5c1e6c85-6f1e-4037-9286-e7a1bd091200",
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.9"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/_sources/chapters/module-2/In-class_100324.ipynb b/_sources/chapters/module-2/In-class_100324.ipynb
deleted file mode 100644
index be4804d..0000000
--- a/_sources/chapters/module-2/In-class_100324.ipynb
+++ /dev/null
@@ -1,95 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "6b64190e-5dce-4574-ad0d-ddb5a5a2cc00",
- "metadata": {},
- "source": [
- "## Instructions \n",
- "* Complete the tasks in this Jupyter Notebook. Clearly show relevant work.\n",
- "\n",
- "* Before you begin filling in the notebook, make sure you have written your name in the first cell, along with the names of any collaborators if you worked in a group.\n",
- "\n",
- "* Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel $\\rightarrow$ Restart) and then **run all cells** (in the menubar, select Cell $\\rightarrow$ Run All).\n",
- "\n",
- "* Make sure your changes have been saved and, when ready, submit the Jupyter notebook through Canvas. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "id": "91980179-e320-4362-8941-3ef911ce7671",
- "metadata": {},
- "outputs": [],
- "source": [
- "NAME = \"\"\n",
- "COLLABORATORS = \"\""
- ]
- },
- {
- "cell_type": "markdown",
- "id": "332218e0-c6bc-4c48-9fc6-7bf2294a273b",
- "metadata": {},
- "source": [
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f08804f8-a3cb-4f48-82b2-068bb2c347ea",
- "metadata": {},
- "source": [
- "**1. The following function is meant to return whether a given list of numbers contains at least one element divisible by 7, but it fails due to a bug. You can reproduce the failure by passing, for example, the list: [10, 7, 13].**\n",
- "\n",
- "**Redefine the function correcting the bug. Test that it behaves correctly with several examples. In addition, in a separate markdown cell, explain why the function was failing. (3 points)**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "ee7b6179-eca2-4fae-89b6-c2bc5db5863e",
- "metadata": {},
- "outputs": [],
- "source": [
- "def has_atleast_one_seven(nums):\n",
- "    \"\"\"Return whether the given list of numbers is lucky. A lucky list contains\n",
- "    at least one number divisible by 7.\n",
- "    \"\"\"\n",
- "    for num in nums:\n",
- "        if num % 7 == 0:\n",
- "            return True\n",
- "        else:\n",
- "            return False"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "70fed6c8-c03a-4b5d-ba84-20f33a4c9870",
- "metadata": {},
- "source": [
- "**2. Using a for loop and if statements, print all the numbers between 10 and 1000 (both endpoints included) that are divisible by 7 and whose digits sum to more than 10, but only if the number itself is also odd. (2 points)**"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.9"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/_sources/chapters/module-2/In-class_100324_sols.ipynb b/_sources/chapters/module-2/In-class_100324_sols.ipynb
deleted file mode 100644
index 456d179..0000000
--- a/_sources/chapters/module-2/In-class_100324_sols.ipynb
+++ /dev/null
@@ -1,256 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "6b64190e-5dce-4574-ad0d-ddb5a5a2cc00",
- "metadata": {},
- "source": [
- "## Instructions \n",
- "* Complete the tasks in this Jupyter Notebook. Clearly show relevant work.\n",
- "\n",
- "* Before you begin filling in the notebook, make sure you have written your name in the first cell, along with the names of any collaborators if you worked in a group.\n",
- "\n",
- "* Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel $\\rightarrow$ Restart) and then **run all cells** (in the menubar, select Cell $\\rightarrow$ Run All).\n",
- "\n",
- "* Make sure your changes have been saved and, when ready, submit the Jupyter notebook through Canvas. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "id": "91980179-e320-4362-8941-3ef911ce7671",
- "metadata": {},
- "outputs": [],
- "source": [
- "NAME = \"\"\n",
- "COLLABORATORS = \"\""
- ]
- },
- {
- "cell_type": "markdown",
- "id": "332218e0-c6bc-4c48-9fc6-7bf2294a273b",
- "metadata": {},
- "source": [
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f08804f8-a3cb-4f48-82b2-068bb2c347ea",
- "metadata": {},
- "source": [
- "**1. The following function is meant to return whether a given list of numbers contains at least one element divisible by 7, but it fails due to a bug. You can reproduce the failure by passing, for example, the list: [10, 7, 13].**\n",
- "\n",
- "**Redefine the function correcting the bug. Test that it behaves correctly with several examples. In addition, in a separate markdown cell, explain why the function was failing. (3 points)**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "ee7b6179-eca2-4fae-89b6-c2bc5db5863e",
- "metadata": {},
- "outputs": [],
- "source": [
- "def has_atleast_one_seven(nums):\n",
- "    \"\"\"Return whether the given list of numbers is lucky. A lucky list contains\n",
- "    at least one number divisible by 7.\n",
- "    \"\"\"\n",
- "    for num in nums:\n",
- "        if num % 7 == 0:\n",
- "            return True\n",
- "\n",
- "    return False"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "e9c98609-e356-4f40-b1e3-b81488101022",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Solution\n",
- "def has_atleast_one_seven(nums):\n",
- "    \"\"\"Return whether the given list of numbers is lucky. A lucky list contains\n",
- "    at least one number divisible by 7.\n",
- "    \"\"\"\n",
- "    for num in nums:\n",
- "        if num % 7 == 0:\n",
- "            return True\n",
- "\n",
- "    return False"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "df21433c-eaec-4448-93b4-bf3fd6ce4719",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "True"
- ]
- },
- "execution_count": 2,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "has_atleast_one_seven([10,1,13,1,1,1,1,6,7])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "2cf67aef-1518-49c4-97d2-dc888289945a",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "False"
- ]
- },
- "execution_count": 3,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "has_atleast_one_seven([10,8,13,1,4,5,6,81,6])"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "15b49768-ee34-4350-9342-73a27876f36c",
- "metadata": {},
- "source": [
- "The bug was that the original function examined only the first element: both branches of the `if`/`else` contained a `return`, so the function exited during the first iteration. We only need to exit early when we hit an element divisible by 7. Otherwise, we should keep looping until the end of the list. If the loop finishes without returning, no element is divisible by 7, and the function should return `False`."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "70fed6c8-c03a-4b5d-ba84-20f33a4c9870",
- "metadata": {},
- "source": [
- "**2. Using a for loop and if statements, print all the numbers between 10 and 1000 (both endpoints included) that are divisible by 7 and whose digits sum to more than 10, but only if the number itself is also odd. (2 points)**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "4480d428-ee6e-4629-b298-e0001548828e",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "49\n",
- "77\n",
- "119\n",
- "147\n",
- "175\n",
- "189\n",
- "245\n",
- "259\n",
- "273\n",
- "287\n",
- "329\n",
- "357\n",
- "371\n",
- "385\n",
- "399\n",
- "427\n",
- "455\n",
- "469\n",
- "483\n",
- "497\n",
- "525\n",
- "539\n",
- "553\n",
- "567\n",
- "581\n",
- "595\n",
- "609\n",
- "623\n",
- "637\n",
- "651\n",
- "665\n",
- "679\n",
- "693\n",
- "707\n",
- "735\n",
- "749\n",
- "763\n",
- "777\n",
- "791\n",
- "805\n",
- "819\n",
- "833\n",
- "847\n",
- "861\n",
- "875\n",
- "889\n",
- "903\n",
- "917\n",
- "931\n",
- "945\n",
- "959\n",
- "973\n",
- "987\n"
- ]
- }
- ],
- "source": [
- "numbers = range(10, 1001)  # range() excludes its stop value, so use 1001 to include 1000\n",
- "\n",
- "for num in numbers:\n",
- "    if num % 7 == 0:  # check whether the number is divisible by 7\n",
- "        num_as_str = str(num)  # convert the number to a string to access its digits\n",
- "\n",
- "        # Compute the sum of the digits of num\n",
- "        sum_digits = 0\n",
- "\n",
- "        for s in num_as_str:  # loop through each digit of num, adding it to the running sum\n",
- "            sum_digits = sum_digits + int(s)  # convert the character back to an integer\n",
- "\n",
- "        # Print num only if the sum of its digits is greater than 10 and num is odd\n",
- "        if (sum_digits > 10) and (num % 2 != 0):\n",
- "            print(num)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "cd604c69-21c9-4e4b-8e0d-d80c0cd89ec4",
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.9"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/_sources/chapters/module-4/041-numpy.ipynb b/_sources/chapters/module-4/041-numpy.ipynb
deleted file mode 100644
index 82da112..0000000
--- a/_sources/chapters/module-4/041-numpy.ipynb
+++ /dev/null
@@ -1,761 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
- },
- "source": [
- " "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "-y-T4jtwGviE"
- },
- "source": [
- "```\n",
- "Course: DS1002\n",
- "Module: Module 5\n",
- "Topic: NumPy Continued\n",
- "```\n",
- "\n",
- "### PREREQUISITES\n",
- "- import / import as\n",
- "- variables\n",
- "- creating basic arrays\n",
- "\n",
- "### SOURCES\n",
- "- https://numpy.org/\n",
- "- https://en.wikipedia.org/wiki/NumPy\n",
- "- https://www.scipy.org/\n",
- "- https://en.wikipedia.org/wiki/SciPy\n",
- "- https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html\n",
- "- https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html\n",
- "\n",
- "### OBJECTIVES\n",
- "- Introduction to Numpy\n",
- "\n",
- "### CONCEPTS\n",
- "- The numpy package contains useful functions for math operations\n",
- "- The ndarray is the workhorse of the package"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {
- "id": "AILfOgNJGviG"
- },
- "outputs": [],
- "source": [
- "import numpy as np"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "ncuzokBmGviH"
- },
- "source": [
- "# Data Types\n",
- "\n",
- "One way to control the data type of a NumPy array is to declare it when the array is created using the `dtype` keyword argument. Take a look at the data type NumPy uses by default when creating an array with `np.zeros()`. Could it be updated?\n",
- "\n",
- "* Using `np.zeros()`, create an array of zeros that has three rows and two columns; call it `zero_array`. \n",
- "* Print the data type of `zero_array`.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "xafx9TvIGviH"
- },
- "outputs": [],
- "source": [
- "# Create an array of zeros with three rows and two columns\n",
- "zero_array = np.zeros((3, 2))\n",
- "\n",
- "# Print the data type of zero_array\n",
- "print(zero_array.dtype)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "6hJhoXE_GviH"
- },
- "source": [
- "* Create a new array of zeros called `zero_int_array`, which will also have three rows and two columns, but the data type should be `np.int32`. \n",
- "\n",
- "* Print the data type of `zero_int_array`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "9YQeHJmeGviH"
- },
- "outputs": [],
- "source": [
- "# Create a new array of int32 zeros with three rows and two columns\n",
- "zero_int_array = np.zeros((3, 2), dtype=np.int32)\n",
- "\n",
- "# Print the data type of zero_int_array\n",
- "print(zero_int_array.dtype)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "a5e12dCWGviH"
- },
- "outputs": [],
- "source": [
- "data1 = [6, 7.5, 8, 0, 1] # create a list\n",
- "arr1 = np.array(data1) # turn list into a numpy array\n",
- "arr1"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "1qzrva3xGviI"
- },
- "outputs": [],
- "source": [
- "data2 = [[1, 2, 3, 4], [5, 6, 7, 8]] # create a 2-dimensional list\n",
- "arr2 = np.array(data2) # turn that list into a numpy array\n",
- "arr2"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "KwLy0JTSGviI"
- },
- "outputs": [],
- "source": [
- "arr2.ndim # get the dimension"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "CeU7pinvGviI"
- },
- "outputs": [],
- "source": [
- "arr2.shape # get the shape"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "zOybNYgdGviI"
- },
- "outputs": [],
- "source": [
- "arr1.dtype # get the data type for arr1"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "YDDGk_bJGviI"
- },
- "outputs": [],
- "source": [
- "arr2.dtype # get the data type for arr2"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "5aAHNKBdHFJc"
- },
- "outputs": [],
- "source": [
- "arr1 = np.array([1, 2, 3], dtype=np.float64)\n",
- "arr1.dtype"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "tYx2e2SYHcNU"
- },
- "outputs": [],
- "source": [
- "arr2 = np.array([1, 2, 3], dtype=np.int32)\n",
- "arr2.dtype"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "w_Y6EmmYHdBr"
- },
- "outputs": [],
- "source": [
- "arr = np.array([1, 2, 3, 4, 5])\n",
- "arr.dtype"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "d9y9gQZnHfgQ"
- },
- "outputs": [],
- "source": [
- "float_arr = arr.astype(np.float64)\n",
- "float_arr.dtype"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "Ioa95YuUHjke"
- },
- "outputs": [],
- "source": [
- "arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])\n",
- "arr"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "H2Ref496HmEn"
- },
- "outputs": [],
- "source": [
- "arr.astype(np.int32)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "YcVkJW44Hovc"
- },
- "outputs": [],
- "source": [
- "numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_) # np.string_ was removed in NumPy 2.0\n",
- "numeric_strings.astype(float)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "_mnr60XBhlQu"
- },
- "source": [
- "# Basic Array Manipulations + Calculations\n",
- "\n",
- "NumPy provides a large library of operations, most of which can be applied directly to array data. Here are some common examples:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "oj_yeFhVhr00"
- },
- "outputs": [],
- "source": [
- "# Start with a basic two-dimensional array and manipulate in basic ways:\n",
- "foo = np.array([[1,2,3,4,5],[6,7,8,9,10]])\n",
- "\n",
- "# flip - reverse the data of an array\n",
- "oof = np.flip(foo)\n",
- "print(oof)\n",
- "\n",
- "# copy - copy an array to an entirely separate array\n",
- "goo = np.copy(foo)\n",
- "print(goo)\n",
- "\n",
- "# concatenate - join the rows of a 2-D array end to end into a single 1-D array\n",
- "new_foo = np.concatenate(foo)\n",
- "print(new_foo)\n",
- "\n",
- "# min\n",
- "foomin = np.min(foo)\n",
- "print(foomin)\n",
- "\n",
- "# max\n",
- "foomax = np.max(foo)\n",
- "print(foomax)\n",
- "\n",
- "# mean\n",
- "foomean = np.mean(foo)\n",
- "print(foomean)\n",
- "\n",
- "# sin (np.cos and np.tan are applied element-wise in the same way)\n",
- "foosin = np.sin(foo)\n",
- "print(foosin)\n",
- "\n",
- "# standard deviation\n",
- "foostd = np.std(foo)\n",
- "print(foostd)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "CiViyuDPel1F"
- },
- "source": [
- "# Inserting + Dropping Array Values\n",
- "\n",
- "There are times it's useful to drop a specific index or the start/end of an array of values."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "JWSMOpVNeuz0"
- },
- "outputs": [],
- "source": [
- "# Drop the first item\n",
- "\n",
- "myarr = np.array([10,15,20,25,30,35,40,45,50])\n",
- "\n",
- "drop_start = myarr[1:]\n",
- "print(drop_start)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "55GV3eHqfCGi"
- },
- "outputs": [],
- "source": [
- "# Drop the last item\n",
- "\n",
- "drop_end = myarr[:-1]\n",
- "print(drop_end)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "t_0oJQGPfTo3"
- },
- "outputs": [],
- "source": [
- "# Drop the item at a specific index (index 2 is the third item)\n",
- "\n",
- "drop_second_index = np.delete(myarr, 2)\n",
- "print(drop_second_index)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "D054l_D1fdW4"
- },
- "outputs": [],
- "source": [
- "# Drop every other item in the array\n",
- "# Removes every other item starting with 0\n",
- "\n",
- "every_other = np.delete(myarr, np.arange(0, len(myarr), 2))\n",
- "print(every_other)\n",
- "\n",
- "# Another version that removes every other item starting with index 1\n",
- "every_other = np.delete(myarr, np.arange(1, len(myarr), 2))\n",
- "print(every_other)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "Pt51js_uGviI"
- },
- "source": [
- "# Slicing\n",
- "\n",
- "**Higher Dimensional Arrays**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "cXyHQD1uGviI",
- "jupyter": {
- "outputs_hidden": false
- },
- "outputId": "98fb2a52-e17c-4267-c0ef-6da2e1ac87f6"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([[1, 2, 3],\n",
- " [4, 5, 6],\n",
- " [7, 8, 9]])"
- ]
- },
- "execution_count": 81,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
- "arr2d"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "C-uyvPMbGviJ",
- "jupyter": {
- "outputs_hidden": false
- },
- "outputId": "df69c70d-d8ae-48d1-c79e-42140e209b31"
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([7, 8, 9])"
- ]
- },
- "execution_count": 82,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "arr2d[2]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "v6oZZo3_GviJ",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr2d[0][2]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "OQoiWqP8GviJ"
- },
- "source": [
- "\n",
- "\n",
- "**Slicing: Simplified notation**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "dlzPyEjyGviJ",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr2d[0, 2]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "H5_SeBXOGviJ"
- },
- "source": [
- "A nice visual of a 2D array\n",
- "\n",
- " "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "kchDQjMnGviJ"
- },
- "source": [
- "**Two-Dimensional Array Slicing**\n",
- "\n",
- " "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "8_yM1k7FGviJ"
- },
- "source": [
- "**3D arrays**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "Ag4yJHkeGviJ",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])\n",
- "\n",
- "arr3d"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "HlvrpZVPGviJ"
- },
- "outputs": [],
- "source": [
- "arr3d.shape"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "Fm-pWgYXGviJ",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr3d"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "6VAqLUBRGviJ",
- "tags": []
- },
- "source": [
- "NumPy's way of showing the data can be a bit difficult to parse visually.\n",
- "\n",
- "💡 **Here is a way to visualize three- and higher-dimensional data:**\n",
- "\n",
- "```python\n",
- "[ # AXIS 0 CONTAINS 2 ELEMENTS (arrays)\n",
- " [ # AXIS 1 CONTAINS 2 ELEMENTS (arrays)\n",
- " [1, 2, 3], # AXIS 2 CONTAINS 3 ELEMENTS (integers)\n",
- " [4, 5, 6] # AXIS 2\n",
- " ], \n",
- " [ # AXIS 1\n",
- " [7, 8, 9],\n",
- " [10, 11, 12]\n",
- " ]\n",
- "]\n",
- "```\n",
- "Each axis is a level in the nested hierarchy, i.e. a tree or DAG (directed-acyclic graph).\n",
- "\n",
- "* Each axis is a container.\n",
- "* There is only one top container.\n",
- "* Only the bottom containers have data."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "NKkVfrMYGviJ"
- },
- "source": [
- "**Omitting later indices**\n",
- "\n",
- "In multidimensional arrays, if you omit later indices, the returned object will be a **lower-dimensional ndarray** consisting of all the data along the remaining dimensions.\n",
- "\n",
- "So in the 2 × 2 × 3 array `arr3d`:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "ERwpcfHNGviJ",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr3d[0]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "j3Tj6WFaGviJ"
- },
- "source": [
- "Saving data before modifying an array."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "V4ofZLp0GviJ",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "old_values = arr3d[0].copy()\n",
- "arr3d[0] = 42\n",
- "arr3d"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "O8U6N49VGviJ"
- },
- "source": [
- "Putting the data back."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "TCxoaJhSGviK",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr3d[0] = old_values\n",
- "arr3d"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "3S5QVCrhGviK"
- },
- "source": [
- "Similarly, `arr3d[1, 0]` gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "9YZIypDsGviK",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "arr3d[1, 0]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "rZ8KkmPOGviK",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "x = arr3d[1]\n",
- "x"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "kW_r5Om9GviK",
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "x[0]"
- ]
- }
- ],
- "metadata": {
- "anaconda-cloud": {},
- "colab": {
- "include_colab_link": true,
- "provenance": []
- },
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.9"
- },
- "vscode": {
- "interpreter": {
- "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
diff --git a/_sources/chapters/module-4/043-PandasI-Introduction.ipynb b/_sources/chapters/module-4/043-PandasI-Introduction.ipynb
new file mode 100644
index 0000000..3357a46
--- /dev/null
+++ b/_sources/chapters/module-4/043-PandasI-Introduction.ipynb
@@ -0,0 +1,1865 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "96ee8d82-a996-4adc-898f-1e10d94b7040",
+ "metadata": {},
+ "source": [
+ "# Introduction to Pandas\n",
+ "\n",
+ "What you will learn in this lesson:\n",
+ "\n",
+ "- What is Pandas\n",
+ "- How to import Pandas\n",
+ "- How to create series and dataframes\n",
+ "- A first look at dataframe attributes and methods\n",
+ "\n",
+ "## What is Pandas?\n",
+ "\n",
+ "Pandas is a fundamental data manipulation library in Python, widely used in data science and analytics.\n",
+ "\n",
+ "It provides two key data structures:\n",
+ "\n",
+ "- **Series**: A one-dimensional labeled array capable of holding any data type.\n",
+ "\n",
+ "- **DataFrame**: A two-dimensional labeled data structure with columns that can contain different types of data.\n",
+ "\n",
+ "By far, the most important data structure in Pandas (and R) is the dataframe. In most data science applications, we work with tabular data where rows represent observations and columns represent features. Effective data manipulation is critical for preparing clean and useful datasets for analysis, and this is where Pandas (or R’s dplyr) and DataFrames play an essential role.\n",
+ "\n",
+ "While Pandas dataframes are inspired by R’s data frame structure, there are key differences beyond the programming languages. Notably, Pandas dataframes have **indexes**, whereas R data frames do not, which introduces different approaches to handling and manipulating data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f3208084-a199-498f-a6a0-63b72f5c0352",
+ "metadata": {},
+ "source": [
+ "## Importing pandas\n",
+ "\n",
+ "Pandas is a package, so to use it, we need to first `import` it. \n",
+ "\n",
+ "It is very common to give Pandas the name alias `pd`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "79bba53f-9a07-493e-b592-13a7b5a49280",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "523e83d7-b762-436b-b214-25a1fb0fbb03",
+ "metadata": {},
+ "source": [
+ "## Axis Labels \n",
+ "\n",
+ "Before diving into series and dataframes, it is important to understand that both data structures store data along axes (as in NumPy), but the data also have **labels** along each axis. These axis labels are collectively referred to as the **index**.\n",
+ "\n",
+ "Series and dataframes therefore have:\n",
+ "* An array that holds the data.\n",
+ "* The indexes that hold the labels for observations (rows) and features (columns).\n",
+ "\n",
+ "In contrast to NumPy, Pandas thus integrates identifiable data in a natural way, making it easier to work with structured data.\n",
+ "\n",
+ "Why do we use an index?\n",
+ "\n",
+ "* It allows you to access elements in an array by name.\n",
+ "* It enables series objects with shared index labels to be easily combined.\n",
+ "\n",
+ "In fact, **a dataframe is a collection of series** with a common index. \n",
+ "\n",
+ "To this collection of series the dataframe adds a set of labels along the horizontal axis.\n",
+ "* The index is **axis 0** or the rows.\n",
+ "* The columns are another kind of index, called **axis 1**.\n",
+ "\n",
+ "It is **crucial** to understand the difference between the index of a dataframe and its data in order to understand how dataframes work. Many errors stem from not understanding this difference.\n",
+ "\n",
+ "**Indexes are powerful and controversial.**\n",
+ "\n",
+ "* They enable complex operations when accessing or combining data.\n",
+ "* However, they can be costly in terms of performance and challenging to work with (especially multi-indexes).\n",
+ "* Users coming from R might find Pandas dataframes behave differently than expected, leading to some confusion.\n",
+ "\n",
+ "Below are some visuals to help:\n",
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "625845f5-7886-476d-9453-13565aad3cef",
+ "metadata": {},
+ "source": [
+ "## Series\n",
+ "\n",
+ "A series is essentially a one-dimensional array with **labels** along its axis. Its data must be of a single type, similar to NumPy arrays (which are used internally by Pandas).\n",
+ "\n",
+ "The simplest way to create a series is by using the `pd.Series()` function."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a276ee6a-4e95-4a79-ab32-27862b160d28",
+ "metadata": {},
+ "source": [
+ "### How to create a series"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4dc959c1-8e1e-4018-b1f7-2af2b13ab81a",
+ "metadata": {},
+ "source": [
+ "- **From a `list`**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 80,
+ "id": "50594c01-0837-4873-8a59-68ce2cc1b9a4",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0 10\n",
+ "1 20\n",
+ "2 30\n",
+ "3 40\n",
+ "4 50\n",
+ "dtype: int64\n"
+ ]
+ }
+ ],
+ "source": [
+ "data = [10, 20, 30, 40, 50]\n",
+ "series = pd.Series(data)\n",
+ "print(series)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f14b72c2-6224-46df-a58f-c939bf370111",
+ "metadata": {},
+ "source": [
+ "- **From a `dictionary`**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "id": "8d99f945-682a-480d-9df9-8455ed97a084",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a 1\n",
+ "b 2\n",
+ "c 3\n",
+ "dtype: int64\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Series from a dictionary\n",
+ "data_dict = {'a': 1, 'b': 2, 'c': 3}\n",
+ "series_dict = pd.Series(data_dict)\n",
+ "print(series_dict)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1a59ca5e-a2b4-4261-a161-2a2df90457bd",
+ "metadata": {},
+ "source": [
+ "### Properties overview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "62097929-702b-4bab-8db0-5ee4b586dbdf",
+ "metadata": {},
+ "source": [
+ "- Indexing and slicing work much as they do with lists:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 103,
+ "id": "33200065-1dcf-432c-a40d-2429d866ac89",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "10\n",
+ "1 20\n",
+ "2 30\n",
+ "dtype: int64\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Accessing elements in a Series\n",
+ "print(series[0]) # First element\n",
+ "print(series[1:3]) # Slicing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bfabf9db-6806-45c8-8887-759a893d0737",
+ "metadata": {},
+ "source": [
+ "- It has methods and attributes:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 106,
+ "id": "39f57a5b-501f-4f6c-87d3-8c1847954fbc",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([10, 20, 30, 40, 50])"
+ ]
+ },
+ "execution_count": 106,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# This attribute returns the series' data as a NumPy array\n",
+ "series.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 107,
+ "id": "6c71c9ab-fa4c-4848-a6dc-0d2de367c5da",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "150\n",
+ "30.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "# These two methods return the sum and the mean\n",
+ "print(series.sum())\n",
+ "print(series.mean())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6036ce1a-560d-4369-832f-d96a1bbaaccd",
+ "metadata": {},
+ "source": [
+ "## Data Frames"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0e1452c5-ec69-4e82-9f4d-bfbd9e558ff1",
+ "metadata": {},
+ "source": [
+ "As mentioned earlier, a dataframe is a two-dimensional labeled data structure with columns that can contain different data types. You can think of it as similar to an Excel table, where each column can store different types of data (e.g., numbers, text, or dates)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e37fa218-ae58-4ae1-b6d8-3e0965f2c8e6",
+ "metadata": {},
+ "source": [
+ "### How to create a dataframe\n",
+ "\n",
+ "The simplest way to create a dataframe is by using the `pd.DataFrame()` function."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5b55c483-ba46-444e-abdb-12a936f10538",
+ "metadata": {},
+ "source": [
+ "- **As a dict of arrays or lists. This is the easiest and probably the most common way, along with reading from a file:**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "6aba0b9c-40f6-4540-87c2-127145d55faa",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " Name \n",
+ " Age \n",
+ " City \n",
+ " Coffe Lover \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " Alice \n",
+ " 24 \n",
+ " New York \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " Bob \n",
+ " 27 \n",
+ " Los Angeles \n",
+ " False \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " Charlie \n",
+ " 22 \n",
+ " Chicago \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " David \n",
+ " 32 \n",
+ " Houston \n",
+ " True \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " Name Age City Coffe Lover\n",
+ "0 Alice 24 New York True\n",
+ "1 Bob 27 Los Angeles False\n",
+ "2 Charlie 22 Chicago True\n",
+ "3 David 32 Houston True"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Creating a DataFrame from a dictionary\n",
+ "data = {\n",
+ " 'Name': ['Alice', 'Bob', 'Charlie', 'David'],\n",
+ " 'Age': [24, 27, 22, 32],\n",
+ " 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],\n",
+ " 'Coffe Lover': [True, False, True, True]\n",
+ "}\n",
+ "df = pd.DataFrame(data)\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "90cd4fad-fc4d-4364-b808-3ce232f6992f",
+ "metadata": {},
+ "source": [
+ "- **As a list of lists, where each list corresponds to one observation:** "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "e572919e-6ecf-45bb-ba4c-570d65a133fd",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 1 \n",
+ " 2 \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " Alice \n",
+ " 24 \n",
+ " New York \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " Bob \n",
+ " 27 \n",
+ " Los Angeles \n",
+ " False \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " Charlie \n",
+ " 22 \n",
+ " Chicago \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " David \n",
+ " 32 \n",
+ " Houston \n",
+ " True \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " 0 1 2 3\n",
+ "0 Alice 24 New York True\n",
+ "1 Bob 27 Los Angeles False\n",
+ "2 Charlie 22 Chicago True\n",
+ "3 David 32 Houston True"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Creating a DataFrame from a list of lists (one inner list per observation)\n",
+ "data = [\n",
+ " ['Alice', 24, 'New York', True],\n",
+ " ['Bob', 27, 'Los Angeles', False],\n",
+ " ['Charlie', 22, 'Chicago', True],\n",
+ " ['David', 32, 'Houston', True]\n",
+ "]\n",
+ "df = pd.DataFrame(data)\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f6784bf9-baca-461a-8528-32a5a7a11ca1",
+ "metadata": {},
+ "source": [
+ "As you can see, if we only pass the data, Pandas will automatically assign sequential integers as labels for both axes (rows and columns).\n",
+ "\n",
+ "However, we can customize this behavior by specifying our own labels when creating the dataframe:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "4a38b3db-83ff-42d8-8154-0550cc21f073",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " Name \n",
+ " Age \n",
+ " City \n",
+ " Coffe Lover \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " obs1 \n",
+ " Alice \n",
+ " 24 \n",
+ " New York \n",
+ " True \n",
+ " \n",
+ " \n",
+ " obs2 \n",
+ " Bob \n",
+ " 27 \n",
+ " Los Angeles \n",
+ " False \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " Charlie \n",
+ " 22 \n",
+ " Chicago \n",
+ " True \n",
+ " \n",
+ " \n",
+ " obs4 \n",
+ " David \n",
+ " 32 \n",
+ " Houston \n",
+ " True \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " Name Age City Coffe Lover\n",
+ "obs1 Alice 24 New York True\n",
+ "obs2 Bob 27 Los Angeles False\n",
+ "obs3 Charlie 22 Chicago True\n",
+ "obs4 David 32 Houston True"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "index = [\"obs1\",\"obs2\",\"obs3\",\"obs4\"]\n",
+ "columns = ['Name', 'Age', 'City', 'Coffe Lover']\n",
+ "\n",
+ "df = pd.DataFrame(data, columns=columns, index=index)\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a27ba58e-0c7f-4a19-97e3-06aceb139998",
+ "metadata": {},
+ "source": [
+ "Alternatively, since dataframes are objects with attributes and methods, you can set the labels after creation. The `index` and `columns` attributes let you retrieve the labels for axis 0 (rows) and axis 1 (columns) and reassign them if needed:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "43ae2f42-0089-4c45-8524-1aa02043d7e8",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0 1 2 3\n",
+ "0 Alice 24 New York True\n",
+ "1 Bob 27 Los Angeles False\n",
+ "2 Charlie 22 Chicago True\n",
+ "3 David 32 Houston True\n",
+ " Name Age City Coffe Lover\n",
+ "obs1 Alice 24 New York True\n",
+ "obs2 Bob 27 Los Angeles False\n",
+ "obs3 Charlie 22 Chicago True\n",
+ "obs4 David 32 Houston True\n"
+ ]
+ }
+ ],
+ "source": [
+ "data = [\n",
+ " ['Alice', 24, 'New York', True],\n",
+ " ['Bob', 27, 'Los Angeles', False],\n",
+ " ['Charlie', 22, 'Chicago', True],\n",
+ " ['David', 32, 'Houston', True]\n",
+ "]\n",
+ "df = pd.DataFrame(data)\n",
+ "print(df)\n",
+ "\n",
+ "df.index = index\n",
+ "df.columns = columns\n",
+ "\n",
+ "print(df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "943b2a28-dd09-4a6a-b083-6cb4def771a8",
+ "metadata": {},
+ "source": [
+ "- From a file using, for example, `pd.read_csv`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "f94f6e51-fb9d-4e87-84a0-ed1c0b338535",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['',\n",
+ " 'Read a comma-separated values (csv) file into DataFrame.',\n",
+ " '',\n",
+ " 'Also supports optionally iterating or breaking of the file',\n",
+ " 'into chunks.',\n",
+ " '',\n",
+ " 'Additional help can be found in the online docs for',\n",
+ " '`IO Tools `_.',\n",
+ " '',\n",
+ " 'Parameters',\n",
+ " '----------',\n",
+ " 'filepath_or_buffer : str, path object or file-like object',\n",
+ " ' Any valid string path is acceptable. The string could be a URL. Valid',\n",
+ " ' URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is',\n",
+ " ' expected. A local file could be: file://localhost/path/to/table.csv.']"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.read_csv.__doc__.split(\"\\n\")[:15]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "00bac682-2533-4bad-9d1c-39ad23adf833",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 5.1 \n",
+ " 3.5 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 5.0 \n",
+ " 3.6 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 145 \n",
+ " 6.7 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 146 \n",
+ " 6.3 \n",
+ " 2.5 \n",
+ " 5.0 \n",
+ " 1.9 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 147 \n",
+ " 6.5 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 148 \n",
+ " 6.2 \n",
+ " 3.4 \n",
+ " 5.4 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 149 \n",
+ " 5.9 \n",
+ " 3.0 \n",
+ " 5.1 \n",
+ " 1.8 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " \n",
+ " 150 rows × 5 columns \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "0 5.1 3.5 1.4 0.2 setosa\n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "4 5.0 3.6 1.4 0.2 setosa\n",
+ ".. ... ... ... ... ...\n",
+ "145 6.7 3.0 5.2 2.3 virginica\n",
+ "146 6.3 2.5 5.0 1.9 virginica\n",
+ "147 6.5 3.0 5.2 2.0 virginica\n",
+ "148 6.2 3.4 5.4 2.3 virginica\n",
+ "149 5.9 3.0 5.1 1.8 virginica\n",
+ "\n",
+ "[150 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df = pd.read_csv(\"https://raw.githubusercontent.com/mwaskom/seaborn-data/refs/heads/master/iris.csv\")\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "72b5cb2d-f4eb-4d03-9f7c-30c2c42b5d47",
+ "metadata": {},
+ "source": [
+ "By default, this function expects a file with comma-separated values (CSV). However, you can read files that use a different delimiter by specifying the `sep` parameter:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "5046f06b-c947-4cc1-af7f-d27d8cea3dfa",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " AGE \n",
+ " SEX \n",
+ " BMI \n",
+ " BP \n",
+ " S1 \n",
+ " S2 \n",
+ " S3 \n",
+ " S4 \n",
+ " S5 \n",
+ " S6 \n",
+ " Y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 59 \n",
+ " 2 \n",
+ " 32.1 \n",
+ " 101.00 \n",
+ " 157 \n",
+ " 93.2 \n",
+ " 38.0 \n",
+ " 4.00 \n",
+ " 4.8598 \n",
+ " 87 \n",
+ " 151 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 48 \n",
+ " 1 \n",
+ " 21.6 \n",
+ " 87.00 \n",
+ " 183 \n",
+ " 103.2 \n",
+ " 70.0 \n",
+ " 3.00 \n",
+ " 3.8918 \n",
+ " 69 \n",
+ " 75 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 72 \n",
+ " 2 \n",
+ " 30.5 \n",
+ " 93.00 \n",
+ " 156 \n",
+ " 93.6 \n",
+ " 41.0 \n",
+ " 4.00 \n",
+ " 4.6728 \n",
+ " 85 \n",
+ " 141 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 24 \n",
+ " 1 \n",
+ " 25.3 \n",
+ " 84.00 \n",
+ " 198 \n",
+ " 131.4 \n",
+ " 40.0 \n",
+ " 5.00 \n",
+ " 4.8903 \n",
+ " 89 \n",
+ " 206 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 50 \n",
+ " 1 \n",
+ " 23.0 \n",
+ " 101.00 \n",
+ " 192 \n",
+ " 125.4 \n",
+ " 52.0 \n",
+ " 4.00 \n",
+ " 4.2905 \n",
+ " 80 \n",
+ " 135 \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 437 \n",
+ " 60 \n",
+ " 2 \n",
+ " 28.2 \n",
+ " 112.00 \n",
+ " 185 \n",
+ " 113.8 \n",
+ " 42.0 \n",
+ " 4.00 \n",
+ " 4.9836 \n",
+ " 93 \n",
+ " 178 \n",
+ " \n",
+ " \n",
+ " 438 \n",
+ " 47 \n",
+ " 2 \n",
+ " 24.9 \n",
+ " 75.00 \n",
+ " 225 \n",
+ " 166.0 \n",
+ " 42.0 \n",
+ " 5.00 \n",
+ " 4.4427 \n",
+ " 102 \n",
+ " 104 \n",
+ " \n",
+ " \n",
+ " 439 \n",
+ " 60 \n",
+ " 2 \n",
+ " 24.9 \n",
+ " 99.67 \n",
+ " 162 \n",
+ " 106.6 \n",
+ " 43.0 \n",
+ " 3.77 \n",
+ " 4.1271 \n",
+ " 95 \n",
+ " 132 \n",
+ " \n",
+ " \n",
+ " 440 \n",
+ " 36 \n",
+ " 1 \n",
+ " 30.0 \n",
+ " 95.00 \n",
+ " 201 \n",
+ " 125.2 \n",
+ " 42.0 \n",
+ " 4.79 \n",
+ " 5.1299 \n",
+ " 85 \n",
+ " 220 \n",
+ " \n",
+ " \n",
+ " 441 \n",
+ " 36 \n",
+ " 1 \n",
+ " 19.6 \n",
+ " 71.00 \n",
+ " 250 \n",
+ " 133.2 \n",
+ " 97.0 \n",
+ " 3.00 \n",
+ " 4.5951 \n",
+ " 92 \n",
+ " 57 \n",
+ " \n",
+ " \n",
+ " \n",
+ " 442 rows × 11 columns \n",
+ " "
+ ],
+ "text/plain": [
+ " AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y\n",
+ "0 59 2 32.1 101.00 157 93.2 38.0 4.00 4.8598 87 151\n",
+ "1 48 1 21.6 87.00 183 103.2 70.0 3.00 3.8918 69 75\n",
+ "2 72 2 30.5 93.00 156 93.6 41.0 4.00 4.6728 85 141\n",
+ "3 24 1 25.3 84.00 198 131.4 40.0 5.00 4.8903 89 206\n",
+ "4 50 1 23.0 101.00 192 125.4 52.0 4.00 4.2905 80 135\n",
+ ".. ... ... ... ... ... ... ... ... ... ... ...\n",
+ "437 60 2 28.2 112.00 185 113.8 42.0 4.00 4.9836 93 178\n",
+ "438 47 2 24.9 75.00 225 166.0 42.0 5.00 4.4427 102 104\n",
+ "439 60 2 24.9 99.67 162 106.6 43.0 3.77 4.1271 95 132\n",
+ "440 36 1 30.0 95.00 201 125.2 42.0 4.79 5.1299 85 220\n",
+ "441 36 1 19.6 71.00 250 133.2 97.0 3.00 4.5951 92 57\n",
+ "\n",
+ "[442 rows x 11 columns]"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Here the columns are separated by tabs\n",
+ "df = pd.read_csv(\"https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt\", sep=\"\\t\")\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b26d4d36-1692-44bb-8981-e86e21d3cd2f",
+ "metadata": {},
+ "source": [
+ "### An introduction to some attributes and methods"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0dfb3d7-2094-43fe-ac13-b5de44a2ee41",
+ "metadata": {},
+ "source": [
+ "- `index`, `columns`: Retrieve the row and column labels."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "053d2545-8b02-46d4-933e-0f484a27cf55",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(RangeIndex(start=0, stop=442, step=1),\n",
+ " Index(['AGE', 'SEX', 'BMI', 'BP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'Y'], dtype='object'))"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.index, df.columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8d0eec0d-598f-4231-886e-07cee19a661d",
+ "metadata": {},
+ "source": [
+ "We can also assign names to the axes, not just to the rows (observations) and columns (features):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "bb0db4f0-bdcb-40c9-85c8-fb401038f45d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " cols_id \n",
+ " AGE \n",
+ " SEX \n",
+ " BMI \n",
+ " BP \n",
+ " S1 \n",
+ " S2 \n",
+ " S3 \n",
+ " S4 \n",
+ " S5 \n",
+ " S6 \n",
+ " Y \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 59 \n",
+ " 2 \n",
+ " 32.1 \n",
+ " 101.00 \n",
+ " 157 \n",
+ " 93.2 \n",
+ " 38.0 \n",
+ " 4.00 \n",
+ " 4.8598 \n",
+ " 87 \n",
+ " 151 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 48 \n",
+ " 1 \n",
+ " 21.6 \n",
+ " 87.00 \n",
+ " 183 \n",
+ " 103.2 \n",
+ " 70.0 \n",
+ " 3.00 \n",
+ " 3.8918 \n",
+ " 69 \n",
+ " 75 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 72 \n",
+ " 2 \n",
+ " 30.5 \n",
+ " 93.00 \n",
+ " 156 \n",
+ " 93.6 \n",
+ " 41.0 \n",
+ " 4.00 \n",
+ " 4.6728 \n",
+ " 85 \n",
+ " 141 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 24 \n",
+ " 1 \n",
+ " 25.3 \n",
+ " 84.00 \n",
+ " 198 \n",
+ " 131.4 \n",
+ " 40.0 \n",
+ " 5.00 \n",
+ " 4.8903 \n",
+ " 89 \n",
+ " 206 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 50 \n",
+ " 1 \n",
+ " 23.0 \n",
+ " 101.00 \n",
+ " 192 \n",
+ " 125.4 \n",
+ " 52.0 \n",
+ " 4.00 \n",
+ " 4.2905 \n",
+ " 80 \n",
+ " 135 \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 437 \n",
+ " 60 \n",
+ " 2 \n",
+ " 28.2 \n",
+ " 112.00 \n",
+ " 185 \n",
+ " 113.8 \n",
+ " 42.0 \n",
+ " 4.00 \n",
+ " 4.9836 \n",
+ " 93 \n",
+ " 178 \n",
+ " \n",
+ " \n",
+ " 438 \n",
+ " 47 \n",
+ " 2 \n",
+ " 24.9 \n",
+ " 75.00 \n",
+ " 225 \n",
+ " 166.0 \n",
+ " 42.0 \n",
+ " 5.00 \n",
+ " 4.4427 \n",
+ " 102 \n",
+ " 104 \n",
+ " \n",
+ " \n",
+ " 439 \n",
+ " 60 \n",
+ " 2 \n",
+ " 24.9 \n",
+ " 99.67 \n",
+ " 162 \n",
+ " 106.6 \n",
+ " 43.0 \n",
+ " 3.77 \n",
+ " 4.1271 \n",
+ " 95 \n",
+ " 132 \n",
+ " \n",
+ " \n",
+ " 440 \n",
+ " 36 \n",
+ " 1 \n",
+ " 30.0 \n",
+ " 95.00 \n",
+ " 201 \n",
+ " 125.2 \n",
+ " 42.0 \n",
+ " 4.79 \n",
+ " 5.1299 \n",
+ " 85 \n",
+ " 220 \n",
+ " \n",
+ " \n",
+ " 441 \n",
+ " 36 \n",
+ " 1 \n",
+ " 19.6 \n",
+ " 71.00 \n",
+ " 250 \n",
+ " 133.2 \n",
+ " 97.0 \n",
+ " 3.00 \n",
+ " 4.5951 \n",
+ " 92 \n",
+ " 57 \n",
+ " \n",
+ " \n",
+ " \n",
+ " 442 rows × 11 columns \n",
+ " "
+ ],
+ "text/plain": [
+ "cols_id AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y\n",
+ "obs_id \n",
+ "0 59 2 32.1 101.00 157 93.2 38.0 4.00 4.8598 87 151\n",
+ "1 48 1 21.6 87.00 183 103.2 70.0 3.00 3.8918 69 75\n",
+ "2 72 2 30.5 93.00 156 93.6 41.0 4.00 4.6728 85 141\n",
+ "3 24 1 25.3 84.00 198 131.4 40.0 5.00 4.8903 89 206\n",
+ "4 50 1 23.0 101.00 192 125.4 52.0 4.00 4.2905 80 135\n",
+ "... ... ... ... ... ... ... ... ... ... ... ...\n",
+ "437 60 2 28.2 112.00 185 113.8 42.0 4.00 4.9836 93 178\n",
+ "438 47 2 24.9 75.00 225 166.0 42.0 5.00 4.4427 102 104\n",
+ "439 60 2 24.9 99.67 162 106.6 43.0 3.77 4.1271 95 132\n",
+ "440 36 1 30.0 95.00 201 125.2 42.0 4.79 5.1299 85 220\n",
+ "441 36 1 19.6 71.00 250 133.2 97.0 3.00 4.5951 92 57\n",
+ "\n",
+ "[442 rows x 11 columns]"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.index.name = 'obs_id'\n",
+ "df.columns.name = 'cols_id'\n",
+ "\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "451953f3-54ab-4b50-8a06-6125173b429f",
+ "metadata": {},
+ "source": [
+ "- `values`: Retrieves the dataframe's data as a NumPy array:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "7ceb4fb3-8d08-451f-82fe-a5de87afe756",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[ 59. , 2. , 32.1 , ..., 4.8598, 87. , 151. ],\n",
+ " [ 48. , 1. , 21.6 , ..., 3.8918, 69. , 75. ],\n",
+ " [ 72. , 2. , 30.5 , ..., 4.6728, 85. , 141. ],\n",
+ " ...,\n",
+ " [ 60. , 2. , 24.9 , ..., 4.1271, 95. , 132. ],\n",
+ " [ 36. , 1. , 30. , ..., 5.1299, 85. , 220. ],\n",
+ " [ 36. , 1. , 19.6 , ..., 4.5951, 92. , 57. ]])"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "id": "e677d22b-6d84-473b-9dd8-d1630b5dd1f1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "numpy.ndarray"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(df.values)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a1a1fb96-677e-454e-b722-63c7eddc2e39",
+ "metadata": {},
+ "source": [
+ "- `copy()`: returns a new dataframe that is independent of the original. Plain assignment, by contrast, does not copy anything: the new variable is just a second name for the same object."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "4a536b23-3dcc-4080-8bc2-d04b1e94bd42",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df = pd.DataFrame({'x':[0,2,1,5], 'y':[1,1,0,0], 'z':[True,False,False,False]}) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "id": "32e91488-8087-4237-818b-7f9f0012a093",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df_deep = df.copy() # deep copy; changes to df will not pass through\n",
+ "df_shallow = df # not a copy: a second name for the same object; changes to df will show up here"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "id": "a2d75830-a9ab-4178-abed-9fb1c932308f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0x7ff8160f35d0 0x7ff815f695d0 0x7ff8160f35d0\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(hex(id(df)), hex(id(df_deep)), hex(id(df_shallow)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "64da1637-4bb1-469f-89ff-202a009355d2",
+ "metadata": {},
+ "source": [
+ "- `dtypes`: provides the data type of each column:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "id": "f65466e1-f00b-42c0-bb79-39ede3107819",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "x int64\n",
+ "y int64\n",
+ "z bool\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.dtypes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38dde1cd-47ac-4bdf-8bf5-8aebc4ba3722",
+ "metadata": {},
+ "source": [
+ "- `info()`: prints a summary of the dataframe, including the index dtype, the column dtypes, non-null counts and memory usage."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "id": "e1413b00-6ae6-4e71-bae7-13f6e522dd57",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 4 entries, 0 to 3\n",
+ "Data columns (total 3 columns):\n",
+ " # Column Non-Null Count Dtype\n",
+ "--- ------ -------------- -----\n",
+ " 0 x 4 non-null int64\n",
+ " 1 y 4 non-null int64\n",
+ " 2 z 4 non-null bool \n",
+ "dtypes: bool(1), int64(2)\n",
+ "memory usage: 200.0 bytes\n"
+ ]
+ }
+ ],
+ "source": [
+ "df.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2cd2400a-2053-4634-858d-cc412e39890e",
+ "metadata": {},
+ "source": [
+ "- `rename()`: Renames columns or index labels. It can rename one or more fields at once using a dict, which acts as a mapper: "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "id": "fa476c02-7b09-4188-9db9-aa57f948a677",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " is_label \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 1 \n",
+ " 0 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 0 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y is_label\n",
+ "0 0 1 True\n",
+ "1 2 1 False\n",
+ "obs3 1 0 False\n",
+ "3 5 0 False"
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.rename(columns={'z': 'is_label'}, index={2: \"obs3\"})"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f6edf8f0-ffea-48b8-afbf-7a69f192420b",
+ "metadata": {},
+ "source": [
+ "Note that `rename()` returns a new dataframe; the original `df` is left unchanged, as the next cell shows. To keep the change, assign the result back to the variable."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "id": "7d04e795-0104-4de8-8443-6ed0045b6133",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " z \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1 \n",
+ " 0 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 0 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y z\n",
+ "0 0 1 True\n",
+ "1 2 1 False\n",
+ "2 1 0 False\n",
+ "3 5 0 False"
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "id": "dd3bc123-75df-4e69-a813-5986a53b545e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " is_label \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 1 \n",
+ " 0 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 0 \n",
+ " False \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y is_label\n",
+ "0 0 1 True\n",
+ "1 2 1 False\n",
+ "obs3 1 0 False\n",
+ "3 5 0 False"
+ ]
+ },
+ "execution_count": 48,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df = df.rename(columns={'z': 'is_label'}, index={2: \"obs3\"})\n",
+ "\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52fcaa5f-4837-4b76-86b2-2fc98979a334",
+ "metadata": {},
+ "source": [
+ "## Practice exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0e35cd2f-f285-414d-8d76-1cac827dc396",
+ "metadata": {},
+ "source": [
+ "```{exercise}\n",
+ ":label: pandas1\n",
+ "\n",
+ "Create a dataframe called `dat` by passing a dictionary of inputs. Here are the requirements:\n",
+ "\n",
+ "- has a column named `features` containing floats\n",
+ "- has a column named `labels` containing integers 0, 1, 2 \n",
+ "\n",
+ "Print the df.\n",
+ "\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "id": "5f9e215f-d862-4876-ad59-c14d1cb891c1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Write your answer here"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "12e4b20f-4c16-4de5-853d-b0ffcd18f8b1",
+ "metadata": {},
+ "source": [
+ "```{exercise}\n",
+ ":label: pandas2\n",
+ "\n",
+ "Rename the `labels` column in `dat` to `label`. \n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "id": "596595a4-1f53-4000-81a1-7bb6afc8d56a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Write your answer here"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/_sources/chapters/module-4/044-PandasII-Exploration_and_Manipulation.ipynb b/_sources/chapters/module-4/044-PandasII-Exploration_and_Manipulation.ipynb
new file mode 100644
index 0000000..924c5ea
--- /dev/null
+++ b/_sources/chapters/module-4/044-PandasII-Exploration_and_Manipulation.ipynb
@@ -0,0 +1,4095 @@
+{
+ "cells": [
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "13aa848b",
+ "metadata": {},
+ "source": [
+ "# PandasII: Exploration and Manipulation\n",
+ "\n",
+ "What you will learn:\n",
+ "- How to work with pandas dataframes and their essential operations"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "4ac80975",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# import dependencies\n",
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f69d5682",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "Let's load a bigger data set to explore more functionality."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "495b4018",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 5.1 \n",
+ " 3.5 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 5.0 \n",
+ " 3.6 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 145 \n",
+ " 6.7 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 146 \n",
+ " 6.3 \n",
+ " 2.5 \n",
+ " 5.0 \n",
+ " 1.9 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 147 \n",
+ " 6.5 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 148 \n",
+ " 6.2 \n",
+ " 3.4 \n",
+ " 5.4 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 149 \n",
+ " 5.9 \n",
+ " 3.0 \n",
+ " 5.1 \n",
+ " 1.8 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " \n",
+ " 150 rows × 5 columns \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "0 5.1 3.5 1.4 0.2 setosa\n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "4 5.0 3.6 1.4 0.2 setosa\n",
+ ".. ... ... ... ... ...\n",
+ "145 6.7 3.0 5.2 2.3 virginica\n",
+ "146 6.3 2.5 5.0 1.9 virginica\n",
+ "147 6.5 3.0 5.2 2.0 virginica\n",
+ "148 6.2 3.4 5.4 2.3 virginica\n",
+ "149 5.9 3.0 5.1 1.8 virginica\n",
+ "\n",
+ "[150 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df = pd.read_csv(\"https://raw.githubusercontent.com/mwaskom/seaborn-data/refs/heads/master/iris.csv\")\n",
+ "iris_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "897df551",
+ "metadata": {},
+ "source": [
+ "Check the data type of `iris_df`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "cc587038",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.frame.DataFrame"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(iris_df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a28f0730-121e-4eef-91ae-b1851d68d6db",
+ "metadata": {},
+ "source": [
+ "## Data Inspection"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "18a5abf0-efbb-4fec-a7eb-52ea24102d10",
+ "metadata": {},
+ "source": [
+ "### Exploring its structure"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "24062066",
+ "metadata": {},
+ "source": [
+ "- `head()`: returns the first records of the dataframe (five by default):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "6e79ac0b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 5.1 \n",
+ " 3.5 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 5.0 \n",
+ " 3.6 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "0 5.1 3.5 1.4 0.2 setosa\n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "4 5.0 3.6 1.4 0.2 setosa"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "1a3341d5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 5.1 \n",
+ " 3.5 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 5.0 \n",
+ " 3.6 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " 5.4 \n",
+ " 3.9 \n",
+ " 1.7 \n",
+ " 0.4 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 4.6 \n",
+ " 3.4 \n",
+ " 1.4 \n",
+ " 0.3 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 7 \n",
+ " 5.0 \n",
+ " 3.4 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 8 \n",
+ " 4.4 \n",
+ " 2.9 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " 4.9 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.1 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "0 5.1 3.5 1.4 0.2 setosa\n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "4 5.0 3.6 1.4 0.2 setosa\n",
+ "5 5.4 3.9 1.7 0.4 setosa\n",
+ "6 4.6 3.4 1.4 0.3 setosa\n",
+ "7 5.0 3.4 1.5 0.2 setosa\n",
+ "8 4.4 2.9 1.4 0.2 setosa\n",
+ "9 4.9 3.1 1.5 0.1 setosa"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# You can specify how many rows to show\n",
+ "iris_df.head(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "359395f7",
+ "metadata": {},
+ "source": [
+ "- `tail()`: returns the last records of the dataframe (five by default):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "096cbcf6",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 145 \n",
+ " 6.7 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 146 \n",
+ " 6.3 \n",
+ " 2.5 \n",
+ " 5.0 \n",
+ " 1.9 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 147 \n",
+ " 6.5 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 148 \n",
+ " 6.2 \n",
+ " 3.4 \n",
+ " 5.4 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 149 \n",
+ " 5.9 \n",
+ " 3.0 \n",
+ " 5.1 \n",
+ " 1.8 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "145 6.7 3.0 5.2 2.3 virginica\n",
+ "146 6.3 2.5 5.0 1.9 virginica\n",
+ "147 6.5 3.0 5.2 2.0 virginica\n",
+ "148 6.2 3.4 5.4 2.3 virginica\n",
+ "149 5.9 3.0 5.1 1.8 virginica"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.tail()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "cd014af8",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 140 \n",
+ " 6.7 \n",
+ " 3.1 \n",
+ " 5.6 \n",
+ " 2.4 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 141 \n",
+ " 6.9 \n",
+ " 3.1 \n",
+ " 5.1 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 142 \n",
+ " 5.8 \n",
+ " 2.7 \n",
+ " 5.1 \n",
+ " 1.9 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 143 \n",
+ " 6.8 \n",
+ " 3.2 \n",
+ " 5.9 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 144 \n",
+ " 6.7 \n",
+ " 3.3 \n",
+ " 5.7 \n",
+ " 2.5 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 145 \n",
+ " 6.7 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 146 \n",
+ " 6.3 \n",
+ " 2.5 \n",
+ " 5.0 \n",
+ " 1.9 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 147 \n",
+ " 6.5 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 148 \n",
+ " 6.2 \n",
+ " 3.4 \n",
+ " 5.4 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 149 \n",
+ " 5.9 \n",
+ " 3.0 \n",
+ " 5.1 \n",
+ " 1.8 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "140 6.7 3.1 5.6 2.4 virginica\n",
+ "141 6.9 3.1 5.1 2.3 virginica\n",
+ "142 5.8 2.7 5.1 1.9 virginica\n",
+ "143 6.8 3.2 5.9 2.3 virginica\n",
+ "144 6.7 3.3 5.7 2.5 virginica\n",
+ "145 6.7 3.0 5.2 2.3 virginica\n",
+ "146 6.3 2.5 5.0 1.9 virginica\n",
+ "147 6.5 3.0 5.2 2.0 virginica\n",
+ "148 6.2 3.4 5.4 2.3 virginica\n",
+ "149 5.9 3.0 5.1 1.8 virginica"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Again, we can specify how many rows to show\n",
+ "iris_df.tail(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45e495f5",
+ "metadata": {},
+ "source": [
+ "- `dtypes`: returns the data type of each column:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "c12d0f01",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "sepal_length float64\n",
+ "sepal_width float64\n",
+ "petal_length float64\n",
+ "petal_width float64\n",
+ "species object\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.dtypes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "36df3ba6",
+ "metadata": {},
+ "source": [
+ "- `shape`: As with NumPy arrays, returns the shape of the dataframe as a tuple (rows, columns)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "15b4e581",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(150, 5)"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8cb0b767",
+ "metadata": {},
+ "source": [
+ "You can also use the built-in function `len()` to obtain the row (record) count."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "be7bf6fa",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "150"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(iris_df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "65ecc35c",
+ "metadata": {},
+ "source": [
+ "- `columns`: contains the column names."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "89e82424",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',\n",
+ " 'species'],\n",
+ " dtype='object')"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "861715f7",
+ "metadata": {},
+ "source": [
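+ "- `info()`: prints a summary of the dataframe, including the index dtype, the column dtypes, non-null counts and memory usage."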
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "8d306a64",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 150 entries, 0 to 149\n",
+ "Data columns (total 5 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 sepal_length 150 non-null float64\n",
+ " 1 sepal_width 150 non-null float64\n",
+ " 2 petal_length 150 non-null float64\n",
+ " 3 petal_width 150 non-null float64\n",
+ " 4 species 150 non-null object \n",
+ "dtypes: float64(4), object(1)\n",
+ "memory usage: 6.0+ KB\n"
+ ]
+ }
+ ],
+ "source": [
+ "iris_df.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c9e0c71a-9f48-4377-b830-241bea8a5a4b",
+ "metadata": {},
+ "source": [
+ "### Summarizing data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cf99d600-6676-4513-91ff-bfcef43765a3",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "- `describe()`: summarizes the central tendency (e.g. mean), dispersion (e.g. standard deviation) and shape of a dataset's distribution."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "1ddb0055",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " count \n",
+ " 150.000000 \n",
+ " 150.000000 \n",
+ " 150.000000 \n",
+ " 150.000000 \n",
+ " \n",
+ " \n",
+ " mean \n",
+ " 5.843333 \n",
+ " 3.057333 \n",
+ " 3.758000 \n",
+ " 1.199333 \n",
+ " \n",
+ " \n",
+ " std \n",
+ " 0.828066 \n",
+ " 0.435866 \n",
+ " 1.765298 \n",
+ " 0.762238 \n",
+ " \n",
+ " \n",
+ " min \n",
+ " 4.300000 \n",
+ " 2.000000 \n",
+ " 1.000000 \n",
+ " 0.100000 \n",
+ " \n",
+ " \n",
+ " 25% \n",
+ " 5.100000 \n",
+ " 2.800000 \n",
+ " 1.600000 \n",
+ " 0.300000 \n",
+ " \n",
+ " \n",
+ " 50% \n",
+ " 5.800000 \n",
+ " 3.000000 \n",
+ " 4.350000 \n",
+ " 1.300000 \n",
+ " \n",
+ " \n",
+ " 75% \n",
+ " 6.400000 \n",
+ " 3.300000 \n",
+ " 5.100000 \n",
+ " 1.800000 \n",
+ " \n",
+ " \n",
+ " max \n",
+ " 7.900000 \n",
+ " 4.400000 \n",
+ " 6.900000 \n",
+ " 2.500000 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width\n",
+ "count 150.000000 150.000000 150.000000 150.000000\n",
+ "mean 5.843333 3.057333 3.758000 1.199333\n",
+ "std 0.828066 0.435866 1.765298 0.762238\n",
+ "min 4.300000 2.000000 1.000000 0.100000\n",
+ "25% 5.100000 2.800000 1.600000 0.300000\n",
+ "50% 5.800000 3.000000 4.350000 1.300000\n",
+ "75% 6.400000 3.300000 5.100000 1.800000\n",
+ "max 7.900000 4.400000 6.900000 2.500000"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.describe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "30896c3d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " count \n",
+ " mean \n",
+ " std \n",
+ " min \n",
+ " 25% \n",
+ " 50% \n",
+ " 75% \n",
+ " max \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " 150.0 \n",
+ " 5.843333 \n",
+ " 0.828066 \n",
+ " 4.3 \n",
+ " 5.1 \n",
+ " 5.80 \n",
+ " 6.4 \n",
+ " 7.9 \n",
+ " \n",
+ " \n",
+ " sepal_width \n",
+ " 150.0 \n",
+ " 3.057333 \n",
+ " 0.435866 \n",
+ " 2.0 \n",
+ " 2.8 \n",
+ " 3.00 \n",
+ " 3.3 \n",
+ " 4.4 \n",
+ " \n",
+ " \n",
+ " petal_length \n",
+ " 150.0 \n",
+ " 3.758000 \n",
+ " 1.765298 \n",
+ " 1.0 \n",
+ " 1.6 \n",
+ " 4.35 \n",
+ " 5.1 \n",
+ " 6.9 \n",
+ " \n",
+ " \n",
+ " petal_width \n",
+ " 150.0 \n",
+ " 1.199333 \n",
+ " 0.762238 \n",
+ " 0.1 \n",
+ " 0.3 \n",
+ " 1.30 \n",
+ " 1.8 \n",
+ " 2.5 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " count mean std min 25% 50% 75% max\n",
+ "sepal_length 150.0 5.843333 0.828066 4.3 5.1 5.80 6.4 7.9\n",
+ "sepal_width 150.0 3.057333 0.435866 2.0 2.8 3.00 3.3 4.4\n",
+ "petal_length 150.0 3.758000 1.765298 1.0 1.6 4.35 5.1 6.9\n",
+ "petal_width 150.0 1.199333 0.762238 0.1 0.3 1.30 1.8 2.5"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.describe().T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bdb7c40c-0551-4789-b3ca-633cffb9c987",
+ "metadata": {},
+ "source": [
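+ "By default, if the dataframe contains mixed-type data (numeric and categorical), `describe()` summarizes only the numeric columns. Selecting a categorical column explicitly gives its count, number of unique values, most frequent value and that value's frequency:"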
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "294d2ade",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " count \n",
+ " 150 \n",
+ " \n",
+ " \n",
+ " unique \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " top \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " freq \n",
+ " 50 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " species\n",
+ "count 150\n",
+ "unique 3\n",
+ "top setosa\n",
+ "freq 50"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df[[\"species\"]].describe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "03ed92bf",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 150.000000\n",
+ "mean 5.843333\n",
+ "std 0.828066\n",
+ "min 4.300000\n",
+ "25% 5.100000\n",
+ "50% 5.800000\n",
+ "75% 6.400000\n",
+ "max 7.900000\n",
+ "Name: sepal_length, dtype: float64"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.sepal_length.describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "afc78a19-4120-4119-bccb-270690aa00cf",
+ "metadata": {},
+ "source": [
+ "- `value_counts()`: returns the frequency for each distinct value. Arguments give the ability to sort by count or index, normalize, and more. Look at its [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html) for further details."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "8979b360",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "setosa 50\n",
+ "versicolor 50\n",
+ "virginica 50\n",
+ "Name: species, dtype: int64"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.species.value_counts()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1e252829-800f-4d04-b04d-498d1abd8616",
+ "metadata": {},
+ "source": [
+ "With `normalize=True`, it shows proportions (relative frequencies) instead of counts:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "1345fc52",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "setosa 0.333333\n",
+ "versicolor 0.333333\n",
+ "virginica 0.333333\n",
+ "Name: species, dtype: float64"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.species.value_counts(normalize=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dbb5a45a-f665-4226-9b03-a9c6063b0dfe",
+ "metadata": {},
+ "source": [
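+ "- `corr()`: returns the pairwise correlation between columns. Pass `numeric_only=True` to restrict it to numeric columns; in pandas 2.0 and later the default is `False`, so a non-numeric column such as `species` raises an error instead of the warning shown below."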
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "8fee461e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/tmp/ipykernel_106035/1934569051.py:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.\n",
+ " iris_df.corr()\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " 1.000000 \n",
+ " -0.117570 \n",
+ " 0.871754 \n",
+ " 0.817941 \n",
+ " \n",
+ " \n",
+ " sepal_width \n",
+ " -0.117570 \n",
+ " 1.000000 \n",
+ " -0.428440 \n",
+ " -0.366126 \n",
+ " \n",
+ " \n",
+ " petal_length \n",
+ " 0.871754 \n",
+ " -0.428440 \n",
+ " 1.000000 \n",
+ " 0.962865 \n",
+ " \n",
+ " \n",
+ " petal_width \n",
+ " 0.817941 \n",
+ " -0.366126 \n",
+ " 0.962865 \n",
+ " 1.000000 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width\n",
+ "sepal_length 1.000000 -0.117570 0.871754 0.817941\n",
+ "sepal_width -0.117570 1.000000 -0.428440 -0.366126\n",
+ "petal_length 0.871754 -0.428440 1.000000 0.962865\n",
+ "petal_width 0.817941 -0.366126 0.962865 1.000000"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.corr()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "af86a280-53c5-4231-91b3-100b1995a7be",
+ "metadata": {},
+ "source": [
+ "Correlation can be computed on a subset of columns by selecting them first:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "id": "02d1c9ee",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " petal_length \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " 1.000000 \n",
+ " 0.871754 \n",
+ " \n",
+ " \n",
+ " petal_length \n",
+ " 0.871754 \n",
+ " 1.000000 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length petal_length\n",
+ "sepal_length 1.000000 0.871754\n",
+ "petal_length 0.871754 1.000000"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df[['sepal_length','petal_length']].corr()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "id": "f433fe95",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " petal_length \n",
+ " sepal_width \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " 1.000000 \n",
+ " 0.871754 \n",
+ " -0.11757 \n",
+ " \n",
+ " \n",
+ " petal_length \n",
+ " 0.871754 \n",
+ " 1.000000 \n",
+ " -0.42844 \n",
+ " \n",
+ " \n",
+ " sepal_width \n",
+ " -0.117570 \n",
+ " -0.428440 \n",
+ " 1.00000 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length petal_length sepal_width\n",
+ "sepal_length 1.000000 0.871754 -0.11757\n",
+ "petal_length 0.871754 1.000000 -0.42844\n",
+ "sepal_width -0.117570 -0.428440 1.00000"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df[['sepal_length','petal_length','sepal_width']].corr()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "450bf331",
+ "metadata": {},
+ "source": [
+ "## Selection and Indexing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9ec3e70b-915e-4616-a680-e9e1c34cd736",
+ "metadata": {},
+ "source": [
+ "### By Index"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cc454247",
+ "metadata": {},
+ "source": [
+ "We use `.iloc[]` to select rows (and columns) by zero-based integer **position**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "id": "98a9ae6e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "sepal_length 4.7\n",
+ "sepal_width 3.2\n",
+ "petal_length 1.3\n",
+ "petal_width 0.2\n",
+ "species setosa\n",
+ "Name: 2, dtype: object"
+ ]
+ },
+ "execution_count": 58,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# This fetches the third row (position 2) and all columns:\n",
+ "\n",
+ "iris_df.iloc[2]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a82a9f45",
+ "metadata": {},
+ "source": [
+ "Fetch the rows at positions 1 and 2 (the right endpoint is exclusive) and all columns:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "id": "c5c45d06",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "obs_id \n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa"
+ ]
+ },
+ "execution_count": 59,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.iloc[1:3]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3bc78532",
+ "metadata": {},
+ "source": [
+ "Fetch the rows at positions 1 and 2 and the first three columns (positions 0, 1, 2):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "id": "408ba901",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length\n",
+ "obs_id \n",
+ "1 4.9 3.0 1.4\n",
+ "2 4.7 3.2 1.3"
+ ]
+ },
+ "execution_count": 60,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.iloc[1:3, 0:3]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "46975617",
+ "metadata": {},
+ "source": [
+ "You can slice the column names too. `iris_df.columns` is an `Index` that supports positional slicing directly, so you don't need `.iloc[]` here."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "id": "5056b057",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Index(['sepal_length', 'sepal_width', 'petal_length'], dtype='object')"
+ ]
+ },
+ "execution_count": 62,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.columns[0:3]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ccfdea8c-d7ff-48c6-89a4-a75bd16f88c8",
+ "metadata": {},
+ "source": [
+ "### By Label"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f9be9788",
+ "metadata": {},
+ "source": [
+ "We can select by row and column labels using `.loc[]`. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d84fe9ab",
+ "metadata": {},
+ "source": [
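+ "Here we ask for rows with labels 1 through 3, and `.loc[]` returns exactly those: unlike positional slices, label slices include both endpoints. \n",
+ "By contrast, `.iloc[1:3]` returned only the rows at positions 1 and 2.\n",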
+ "\n",
+ "**Author note: This is by far the more useful of the two in my experience.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "id": "cd1a1a13",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "obs_id \n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa"
+ ]
+ },
+ "execution_count": 63,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.loc[1:3]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5ccd9d19",
+ "metadata": {},
+ "source": [
+ "Subset columns by passing a column name (as a string) or a list of column names:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "id": "332696cc",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " petal_width \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length petal_width\n",
+ "obs_id \n",
+ "1 4.9 0.2\n",
+ "2 4.7 0.2\n",
+ "3 4.6 0.2"
+ ]
+ },
+ "execution_count": 65,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.loc[1:3, ['sepal_length','petal_width']]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "10439dcc",
+ "metadata": {},
+ "source": [
+ "Select all rows and specific columns:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "id": "e322dddf",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " petal_width \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 5.1 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 5.0 \n",
+ " 0.2 \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 145 \n",
+ " 6.7 \n",
+ " 2.3 \n",
+ " \n",
+ " \n",
+ " 146 \n",
+ " 6.3 \n",
+ " 1.9 \n",
+ " \n",
+ " \n",
+ " 147 \n",
+ " 6.5 \n",
+ " 2.0 \n",
+ " \n",
+ " \n",
+ " 148 \n",
+ " 6.2 \n",
+ " 2.3 \n",
+ " \n",
+ " \n",
+ " 149 \n",
+ " 5.9 \n",
+ " 1.8 \n",
+ " \n",
+ " \n",
+ " \n",
+ " 150 rows × 2 columns \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length petal_width\n",
+ "obs_id \n",
+ "0 5.1 0.2\n",
+ "1 4.9 0.2\n",
+ "2 4.7 0.2\n",
+ "3 4.6 0.2\n",
+ "4 5.0 0.2\n",
+ "... ... ...\n",
+ "145 6.7 2.3\n",
+ "146 6.3 1.9\n",
+ "147 6.5 2.0\n",
+ "148 6.2 2.3\n",
+ "149 5.9 1.8\n",
+ "\n",
+ "[150 rows x 2 columns]"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.loc[:, ['sepal_length','petal_width']]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "485059f0",
+ "metadata": {},
+ "source": [
+ "### Boolean Filtering\n",
+ "\n",
+ "It's very common to subset a dataframe based on some condition on the data.\n",
+ "\n",
+ "🔑 Note that a comparison on a column returns a boolean Series with one `True`/`False` value per row.\n",
+ "\n",
+ "Pandas knows what to do if you pass such a boolean structure to the dataframe selector or, as in the examples below, to `.loc[]`: it keeps only the rows where the value is `True`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 73,
+ "id": "ef3d5652",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "obs_id\n",
+ "0 False\n",
+ "1 False\n",
+ "2 False\n",
+ "3 False\n",
+ "4 False\n",
+ " ... \n",
+ "145 False\n",
+ "146 False\n",
+ "147 False\n",
+ "148 False\n",
+ "149 False\n",
+ "Name: sepal_length, Length: 150, dtype: bool"
+ ]
+ },
+ "execution_count": 73,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.sepal_length >= 7.5"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "id": "059d604e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 105 \n",
+ " 7.6 \n",
+ " 3.0 \n",
+ " 6.6 \n",
+ " 2.1 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 117 \n",
+ " 7.7 \n",
+ " 3.8 \n",
+ " 6.7 \n",
+ " 2.2 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 118 \n",
+ " 7.7 \n",
+ " 2.6 \n",
+ " 6.9 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 122 \n",
+ " 7.7 \n",
+ " 2.8 \n",
+ " 6.7 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 131 \n",
+ " 7.9 \n",
+ " 3.8 \n",
+ " 6.4 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 135 \n",
+ " 7.7 \n",
+ " 3.0 \n",
+ " 6.1 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "obs_id \n",
+ "105 7.6 3.0 6.6 2.1 virginica\n",
+ "117 7.7 3.8 6.7 2.2 virginica\n",
+ "118 7.7 2.6 6.9 2.3 virginica\n",
+ "122 7.7 2.8 6.7 2.0 virginica\n",
+ "131 7.9 3.8 6.4 2.0 virginica\n",
+ "135 7.7 3.0 6.1 2.3 virginica"
+ ]
+ },
+ "execution_count": 76,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.loc[iris_df.sepal_length >= 7.5,:]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "id": "b922f38e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " obs_id \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 4.6 \n",
+ " 3.4 \n",
+ " 1.4 \n",
+ " 0.3 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 22 \n",
+ " 4.6 \n",
+ " 3.6 \n",
+ " 1.0 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 29 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.6 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 41 \n",
+ " 4.5 \n",
+ " 2.3 \n",
+ " 1.3 \n",
+ " 0.3 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 47 \n",
+ " 4.6 \n",
+ " 3.2 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "obs_id \n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "6 4.6 3.4 1.4 0.3 setosa\n",
+ "22 4.6 3.6 1.0 0.2 setosa\n",
+ "29 4.7 3.2 1.6 0.2 setosa\n",
+ "41 4.5 2.3 1.3 0.3 setosa\n",
+ "47 4.6 3.2 1.4 0.2 setosa"
+ ]
+ },
+ "execution_count": 78,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.loc[(iris_df['sepal_length'] >= 4.5) & (iris_df['sepal_length'] <= 4.7),:]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3a57da01",
+ "metadata": {},
+ "source": [
+ "## Masking"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "94fe5317",
+ "metadata": {},
+ "source": [
+ "Here's an example of **masking** using boolean conditions passed to the dataframe selector:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7583a8af",
+ "metadata": {},
+ "source": [
+ "Here are the **values** for the feature `sepal_length`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "id": "db8c53c0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8,\n",
+ " 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. ,\n",
+ " 5. , 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4,\n",
+ " 5.1, 5. , 4.5, 4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. , 7. , 6.4,\n",
+ " 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. , 6.1, 5.6,\n",
+ " 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7,\n",
+ " 6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6, 5.5, 5.5,\n",
+ " 6.1, 5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3,\n",
+ " 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5,\n",
+ " 7.7, 7.7, 6. , 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2,\n",
+ " 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6. , 6.9, 6.7, 6.9, 5.8,\n",
+ " 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9])"
+ ]
+ },
+ "execution_count": 81,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.sepal_length.values"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0f17cc32",
+ "metadata": {},
+ "source": [
+ "And here are **the boolean values** generated by applying a comparison operator to those values:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 82,
+ "id": "70b43e0e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mask = iris_df.sepal_length >= 7.5"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 83,
+ "id": "3c50ab61",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False, True, False, False,\n",
+ " False, False, False, False, False, False, False, False, False,\n",
+ " True, True, False, False, False, True, False, False, False,\n",
+ " False, False, False, False, False, True, False, False, False,\n",
+ " True, False, False, False, False, False, False, False, False,\n",
+ " False, False, False, False, False, False])"
+ ]
+ },
+ "execution_count": 83,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mask.values"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f6bb3e6",
+ "metadata": {},
+ "source": [
+ "The two sets of values have the same shape.\n",
+ "\n",
+ "We can now overlay the logical values over the numeric ones and keep only what is `True`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 84,
+ "id": "123042dd",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([7.6, 7.7, 7.7, 7.7, 7.9, 7.7])"
+ ]
+ },
+ "execution_count": 84,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.sepal_length[mask].values"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3eaf05e6-c9f9-4472-ad1a-25f1ff859049",
+ "metadata": {},
+ "source": [
+ "## Sorting and Ranking"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4c0a086-13be-447a-9da4-13f1d4b72fdd",
+ "metadata": {},
+ "source": [
+ "**`.sort_values()`**\n",
+ "\n",
+ "Sort by values\n",
+ "- `by` takes a column name (string) or a list of column names\n",
+ "- `ascending` takes `True` or `False` (or a list of booleans, one per column in `by`)\n",
+ "- `inplace=True` sorts the dataframe itself instead of returning a sorted copy\n",
+ "\n",
+ "[Details](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "id": "db23981b-4838-46cc-af21-fbe1fb354e3d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 13 \n",
+ " 4.3 \n",
+ " 3.0 \n",
+ " 1.1 \n",
+ " 0.1 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 8 \n",
+ " 4.4 \n",
+ " 2.9 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 38 \n",
+ " 4.4 \n",
+ " 3.0 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 42 \n",
+ " 4.4 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 41 \n",
+ " 4.5 \n",
+ " 2.3 \n",
+ " 1.3 \n",
+ " 0.3 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 122 \n",
+ " 7.7 \n",
+ " 2.8 \n",
+ " 6.7 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 117 \n",
+ " 7.7 \n",
+ " 3.8 \n",
+ " 6.7 \n",
+ " 2.2 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 118 \n",
+ " 7.7 \n",
+ " 2.6 \n",
+ " 6.9 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 135 \n",
+ " 7.7 \n",
+ " 3.0 \n",
+ " 6.1 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 131 \n",
+ " 7.9 \n",
+ " 3.8 \n",
+ " 6.4 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " \n",
+ " 150 rows × 5 columns \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "13 4.3 3.0 1.1 0.1 setosa\n",
+ "8 4.4 2.9 1.4 0.2 setosa\n",
+ "38 4.4 3.0 1.3 0.2 setosa\n",
+ "42 4.4 3.2 1.3 0.2 setosa\n",
+ "41 4.5 2.3 1.3 0.3 setosa\n",
+ ".. ... ... ... ... ...\n",
+ "122 7.7 2.8 6.7 2.0 virginica\n",
+ "117 7.7 3.8 6.7 2.2 virginica\n",
+ "118 7.7 2.6 6.9 2.3 virginica\n",
+ "135 7.7 3.0 6.1 2.3 virginica\n",
+ "131 7.9 3.8 6.4 2.0 virginica\n",
+ "\n",
+ "[150 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.sort_values(by=['sepal_length','petal_width'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5590f9d3-bbf1-4c45-84b6-c2b5c288716f",
+ "metadata": {},
+ "source": [
+ "## `.sort_index()`\n",
+ "\n",
+ "Sort by index. This example sorts the rows by index in descending order:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "8aa29fc1-0ef8-40dd-8829-81395b15f6a0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " sepal_length \n",
+ " sepal_width \n",
+ " petal_length \n",
+ " petal_width \n",
+ " species \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 149 \n",
+ " 5.9 \n",
+ " 3.0 \n",
+ " 5.1 \n",
+ " 1.8 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 148 \n",
+ " 6.2 \n",
+ " 3.4 \n",
+ " 5.4 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 147 \n",
+ " 6.5 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.0 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 146 \n",
+ " 6.3 \n",
+ " 2.5 \n",
+ " 5.0 \n",
+ " 1.9 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " 145 \n",
+ " 6.7 \n",
+ " 3.0 \n",
+ " 5.2 \n",
+ " 2.3 \n",
+ " virginica \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 5.0 \n",
+ " 3.6 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4.6 \n",
+ " 3.1 \n",
+ " 1.5 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4.7 \n",
+ " 3.2 \n",
+ " 1.3 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 4.9 \n",
+ " 3.0 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 5.1 \n",
+ " 3.5 \n",
+ " 1.4 \n",
+ " 0.2 \n",
+ " setosa \n",
+ " \n",
+ " \n",
+ " \n",
+ " 150 rows × 5 columns \n",
+ " "
+ ],
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "149 5.9 3.0 5.1 1.8 virginica\n",
+ "148 6.2 3.4 5.4 2.3 virginica\n",
+ "147 6.5 3.0 5.2 2.0 virginica\n",
+ "146 6.3 2.5 5.0 1.9 virginica\n",
+ "145 6.7 3.0 5.2 2.3 virginica\n",
+ ".. ... ... ... ... ...\n",
+ "4 5.0 3.6 1.4 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "0 5.1 3.5 1.4 0.2 setosa\n",
+ "\n",
+ "[150 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "iris_df.sort_index(axis=0, ascending=False)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db062841",
+ "metadata": {},
+ "source": [
+ "## Dealing with Missing Data\n",
+ "\n",
+ "Pandas primarily uses the floating-point value `np.nan` from NumPy to represent missing data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "id": "2ae69551",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 54,
+ "id": "88d0b1c6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df_miss = pd.DataFrame({\n",
+ " 'x':[2, np.nan, 1], \n",
+ " 'y':[np.nan, np.nan, 6]}\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 55,
+ "id": "8404fdeb",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 2.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1.0 \n",
+ " 6.0 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y\n",
+ "0 2.0 NaN\n",
+ "1 NaN NaN\n",
+ "2 1.0 6.0"
+ ]
+ },
+ "execution_count": 55,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_miss"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "565b8fa8",
+ "metadata": {},
+ "source": [
+ "## `.dropna()` \n",
+ "\n",
+ "This will drop all rows with missing data in any column.\n",
+ "\n",
+ "[Details](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "id": "0f90aff6",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1.0 \n",
+ " 6.0 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y\n",
+ "2 1.0 6.0"
+ ]
+ },
+ "execution_count": 56,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_drop_all = df_miss.dropna()\n",
+ "df_drop_all"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "190e3c8d",
+ "metadata": {},
+ "source": [
+ "The `subset` parameter takes a list of column names; only those columns are checked for missing values when deciding which rows to drop."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "id": "ba5ad471",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 2.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1.0 \n",
+ " 6.0 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y\n",
+ "0 2.0 NaN\n",
+ "2 1.0 6.0"
+ ]
+ },
+ "execution_count": 57,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_drop_x = df_miss.dropna(subset=['x'])\n",
+ "df_drop_x"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c7efa14a",
+ "metadata": {},
+ "source": [
+ "## `.fillna()`\n",
+ "\n",
+ "This replaces missing values with the value you provide, e.g. $0$.\n",
+ "\n",
+ "[Details](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html)\n",
+ "\n",
+ "We can also pass the result of an operation. For example, to perform simple imputation, we can replace the missing values in each column with the median of that column:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "id": "c697c8f4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df_filled = df_miss.fillna(df_miss.median())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "id": "cc10a2b7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 2.0 \n",
+ " 6.0 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 1.5 \n",
+ " 6.0 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1.0 \n",
+ " 6.0 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y\n",
+ "0 2.0 6.0\n",
+ "1 1.5 6.0\n",
+ "2 1.0 6.0"
+ ]
+ },
+ "execution_count": 59,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_filled"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e04c613",
+ "metadata": {},
+ "source": [
+ "## Column selection, addition, deletion\n",
+ "\n",
+ "### Selection\n",
+ "\n",
+ "Use bracket notation or dot notation. \n",
+ "- Bracket notation: the column name must be passed as a string"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 142,
+ "id": "157649ef",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0 1\n",
+ " 1 1\n",
+ " obs3 0\n",
+ " 3 0\n",
+ " Name: y, dtype: int64,\n",
+ " pandas.core.series.Series)"
+ ]
+ },
+ "execution_count": 142,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df['y'], type(df['y'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "39b9c648",
+ "metadata": {},
+ "source": [
+ "- Dot notation: the column is accessed as an object attribute"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 143,
+ "id": "9790c7a9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0 1\n",
+ " 1 1\n",
+ " obs3 0\n",
+ " 3 0\n",
+ " Name: y, dtype: int64,\n",
+ " pandas.core.series.Series)"
+ ]
+ },
+ "execution_count": 143,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.y, type(df.y)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e8767d21",
+ "metadata": {},
+ "source": [
+ "Dot notation is very convenient, since columns exposed as object attributes can be tab-completed in many editing environments.\n",
+ "\n",
+ "But:\n",
+ "- It only works if the column name is a valid Python identifier that is not a reserved word and does not clash with an existing dataframe attribute or method\n",
+ "- It can't be used when creating a new column (see below)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "40a660ef",
+ "metadata": {},
+ "source": [
+ "As we can see, both notations return a Series, so Series attributes and methods apply to either of them:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 144,
+ "id": "bed23420-33c8-4114-bb63-390c878c1dee",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(1, 1)"
+ ]
+ },
+ "execution_count": 144,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# indexing\n",
+ "df.y.values[0], df['y'][0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 145,
+ "id": "20529c40",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([1, 1, 0, 0]), array([1, 1, 0, 0]))"
+ ]
+ },
+ "execution_count": 145,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Accessing values attribute\n",
+ "df.y.values, df['y'].values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 146,
+ "id": "49e07b5f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0.5, 0.5)"
+ ]
+ },
+ "execution_count": 146,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Taking the mean\n",
+ "df.y.mean(), df['y'].mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b2dedf62",
+ "metadata": {},
+ "source": [
+ "### Column Selection\n",
+ "\n",
+ "You select columns from a dataframe by passing a column name or a list of column names (or any expression that evaluates to such a list)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 147,
+ "id": "8c9aa654",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0 0\n",
+ " 1 2\n",
+ " obs3 1\n",
+ " 3 5\n",
+ " Name: x, dtype: int64,\n",
+ " pandas.core.series.Series)"
+ ]
+ },
+ "execution_count": 147,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# single bracket gives you a series\n",
+ "df['x'], type(df['x'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 148,
+ "id": "6cc6cf62",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "( x\n",
+ " 0 0\n",
+ " 1 2\n",
+ " obs3 1\n",
+ " 3 5,\n",
+ " pandas.core.frame.DataFrame)"
+ ]
+ },
+ "execution_count": 148,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# double brackets give you the selected column as a new dataframe\n",
+ "df[['x']], type(df[['x']])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 149,
+ "id": "8aa7d7d8",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " y \n",
+ " x \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 1 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 1 \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 0 \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 0 \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " y x\n",
+ "0 1 0\n",
+ "1 1 2\n",
+ "obs3 0 1\n",
+ "3 0 5"
+ ]
+ },
+ "execution_count": 149,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df[['y', 'x']]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "216b06d1",
+ "metadata": {},
+ "source": [
+ "### Addition\n",
+ "\n",
+ "It is typical to create a new column from existing columns. \n",
+ "\n",
+ "In this example, a new column (or field) is created by summing `x` and `y`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 150,
+ "id": "45a650cb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df['x_plus_y'] = df.x + df.y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 151,
+ "id": "a9742abe",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " is_label \n",
+ " x_plus_y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 1 \n",
+ " 0 \n",
+ " False \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 0 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y is_label x_plus_y\n",
+ "0 0 1 True 1\n",
+ "1 2 1 False 3\n",
+ "obs3 1 0 False 1\n",
+ "3 5 0 False 5"
+ ]
+ },
+ "execution_count": 151,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b70f49b7",
+ "metadata": {},
+ "source": [
+ "Note that:\n",
+ "\n",
+ "- The left side has the form: DataFrame name, bracket notation, new column name\n",
+ "- The assignment operator `=` is used\n",
+ "- The right side contains an expression; here, two df columns are summed "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb93606a",
+ "metadata": {},
+ "source": [
+ "Bracket notation also works on the fields, but it's more typing:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 152,
+ "id": "65346e2d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " is_label \n",
+ " x_plus_y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 1 \n",
+ " 0 \n",
+ " False \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 0 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y is_label x_plus_y\n",
+ "0 0 1 True 1\n",
+ "1 2 1 False 3\n",
+ "obs3 1 0 False 1\n",
+ "3 5 0 False 5"
+ ]
+ },
+ "execution_count": 152,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df['x_plus_y'] = df['x'] + df['y']\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "112dbcd6",
+ "metadata": {},
+ "source": [
+ "Bracket notation must be used when assigning to a new column; a quoted name after the dot is not valid Python. This will break:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 153,
+ "id": "86dd3bca",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "SyntaxError",
+ "evalue": "invalid syntax (1004225935.py, line 1)",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;36m Cell \u001b[0;32mIn[153], line 1\u001b[0;36m\u001b[0m\n\u001b[0;31m df.'x_plus_y' = df.x + df.y\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
+ ]
+ }
+ ],
+ "source": [
+ "df.'x_plus_y' = df.x + df.y"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b2bedd5b",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Removing Columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "30b4bfb1",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "- Using the reserved keyword `del` to drop a DataFrame or a single column from the DataFrame:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 154,
+ "id": "5bef1897",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df_drop = df.copy()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 155,
+ "id": "7a234a8c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " x \n",
+ " y \n",
+ " is_label \n",
+ " x_plus_y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " x y is_label x_plus_y\n",
+ "0 0 1 True 1\n",
+ "1 2 1 False 3"
+ ]
+ },
+ "execution_count": 155,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_drop.head(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 156,
+ "id": "f996dd87",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# delete the column 'x'\n",
+ "del df_drop['x']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 157,
+ "id": "2f8a2a9c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " y \n",
+ " is_label \n",
+ " x_plus_y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 1 \n",
+ " True \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 1 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 0 \n",
+ " False \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 0 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " y is_label x_plus_y\n",
+ "0 1 True 1\n",
+ "1 1 False 3\n",
+ "obs3 0 False 1\n",
+ "3 0 False 5"
+ ]
+ },
+ "execution_count": 157,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_drop"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b34fe8af",
+ "metadata": {},
+ "source": [
+ "- Using the method `drop()` to drop one or more columns or rows. It takes an `axis` parameter:\n",
+ " - axis=0 refers to rows \n",
+ " - axis=1 refers to columns "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 158,
+ "id": "13358267",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " y\n",
+ "0 1\n",
+ "1 1\n",
+ "obs3 0\n",
+ "3 0"
+ ]
+ },
+ "execution_count": 158,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Here we drop columns\n",
+ "df_drop = df_drop.drop(['x_plus_y', 'is_label'], axis=1)\n",
+ "df_drop"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 159,
+ "id": "cf8015ca-b92e-40b5-9058-109c0fcff6d9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " y \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " obs3 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ " y\n",
+ "1 1\n",
+ "obs3 0\n",
+ "3 0"
+ ]
+ },
+ "execution_count": 159,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Now a particular observation\n",
+ "df_drop = df_drop.drop([0], axis=0)\n",
+ "df_drop"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/_sources/chapters/module-4/Untitled.ipynb b/_sources/chapters/module-4/Untitled.ipynb
new file mode 100644
index 0000000..8364f17
--- /dev/null
+++ b/_sources/chapters/module-4/Untitled.ipynb
@@ -0,0 +1,99 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "65484824-12ff-4837-bcce-d7b7b408b38d",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'iris' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[1], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m SEPAL_LENGTH \u001b[38;5;241m=\u001b[39m iris\u001b[38;5;241m.\u001b[39msepal_length\u001b[38;5;241m.\u001b[39mvalue_counts()\u001b[38;5;241m.\u001b[39mto_frame(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mn\u001b[39m\u001b[38;5;124m'\u001b[39m)\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'iris' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "SEPAL_LENGTH = iris.sepal_length.value_counts().to_frame('n')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "9c686681-ea30-43a0-a1e9-e402ddf3d76a",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'SEPAL_LENGTH' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[2], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m SEPAL_LENGTH\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'SEPAL_LENGTH' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "SEPAL_LENGTH"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "84e857d3-9de3-42b9-94f6-abaebf77d756",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'SEPAL_LENGTH' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[3], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m SEPAL_LENGTH\u001b[38;5;241m.\u001b[39msort_index()\u001b[38;5;241m.\u001b[39mplot\u001b[38;5;241m.\u001b[39mbar(figsize\u001b[38;5;241m=\u001b[39m(\u001b[38;5;241m8\u001b[39m,\u001b[38;5;241m4\u001b[39m), rot\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m45\u001b[39m)\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'SEPAL_LENGTH' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "SEPAL_LENGTH.sort_index().plot.bar(figsize=(8,4), rot=45);"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c53229bc-ead0-4768-9943-89ac76af3962",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/chapters/01-getting_started.html b/chapters/01-getting_started.html
index 17d09ce..fff861a 100644
--- a/chapters/01-getting_started.html
+++ b/chapters/01-getting_started.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/02-python-basics.html b/chapters/02-python-basics.html
index ad07c5f..b69928b 100644
--- a/chapters/02-python-basics.html
+++ b/chapters/02-python-basics.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/04-python-basics.html b/chapters/04-python-basics.html
index 54bc761..63f9f0d 100644
--- a/chapters/04-python-basics.html
+++ b/chapters/04-python-basics.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/012-intro_python (copia).html b/chapters/module-1/012-intro_python (copia).html
index 4c5fe66..ab394c7 100644
--- a/chapters/module-1/012-intro_python (copia).html
+++ b/chapters/module-1/012-intro_python (copia).html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/012-intro_python.html b/chapters/module-1/012-intro_python.html
index 270526d..78aadff 100644
--- a/chapters/module-1/012-intro_python.html
+++ b/chapters/module-1/012-intro_python.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/013-intro_R.html b/chapters/module-1/013-intro_R.html
index 1d40f25..508d142 100644
--- a/chapters/module-1/013-intro_R.html
+++ b/chapters/module-1/013-intro_R.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/Practice.html b/chapters/module-1/Practice.html
index af42d7b..6c88945 100644
--- a/chapters/module-1/Practice.html
+++ b/chapters/module-1/Practice.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/about_course.html b/chapters/module-1/about_course.html
index 5070e18..43397f4 100644
--- a/chapters/module-1/about_course.html
+++ b/chapters/module-1/about_course.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/jupyter_notebooks.html b/chapters/module-1/jupyter_notebooks.html
index 03ddc50..e185f79 100644
--- a/chapters/module-1/jupyter_notebooks.html
+++ b/chapters/module-1/jupyter_notebooks.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/programming.html b/chapters/module-1/programming.html
index 1c8ebfc..529703f 100644
--- a/chapters/module-1/programming.html
+++ b/chapters/module-1/programming.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/tech_stack.html b/chapters/module-1/tech_stack.html
index d1374b4..760a6e8 100644
--- a/chapters/module-1/tech_stack.html
+++ b/chapters/module-1/tech_stack.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-1/your_first_program.html b/chapters/module-1/your_first_program.html
index 5f436a6..36776e7 100644
--- a/chapters/module-1/your_first_program.html
+++ b/chapters/module-1/your_first_program.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/02-cover.html b/chapters/module-2/02-cover.html
index 36f708a..3bf13d1 100644
--- a/chapters/module-2/02-cover.html
+++ b/chapters/module-2/02-cover.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/021-variables.html b/chapters/module-2/021-variables.html
index 0a8a50f..f7667f2 100644
--- a/chapters/module-2/021-variables.html
+++ b/chapters/module-2/021-variables.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/022-operators.html b/chapters/module-2/022-operators.html
index 8ee7915..5000f57 100644
--- a/chapters/module-2/022-operators.html
+++ b/chapters/module-2/022-operators.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/023-strings.html b/chapters/module-2/023-strings.html
index 0d7838e..0e67b77 100644
--- a/chapters/module-2/023-strings.html
+++ b/chapters/module-2/023-strings.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/024-structures.html b/chapters/module-2/024-structures.html
index 889b15e..abf0c3d 100644
--- a/chapters/module-2/024-structures.html
+++ b/chapters/module-2/024-structures.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/0241-structures_exercises.html b/chapters/module-2/0241-structures_exercises.html
index d0c76e7..db9e446 100644
--- a/chapters/module-2/0241-structures_exercises.html
+++ b/chapters/module-2/0241-structures_exercises.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/025-conditional.html b/chapters/module-2/025-conditional.html
index ba96f86..935a5a4 100644
--- a/chapters/module-2/025-conditional.html
+++ b/chapters/module-2/025-conditional.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -61,6 +61,7 @@
+
@@ -194,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
@@ -995,6 +1010,15 @@ Practice exercises | | |
+
+
+
next
+
Iterables and Iterators
+
+
+
diff --git a/chapters/module-2/0251-conditional_exercises.html b/chapters/module-2/0251-conditional_exercises.html
index 83293ec..755bd92 100644
--- a/chapters/module-2/0251-conditional_exercises.html
+++ b/chapters/module-2/0251-conditional_exercises.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/026-functions.html b/chapters/module-2/026-functions.html
deleted file mode 100644
index caa25a9..0000000
--- a/chapters/module-2/026-functions.html
+++ /dev/null
@@ -1,1664 +0,0 @@
- Functions — DS-1002 Programming for Data Science
-Functions
-What you will learn in this lesson:
-
-
-I. Introduction
-Functions take input, perform a specific task and, optionally, produce an output. They contain a block of code to do their work.
-Function inputs are called both parameters and arguments.
-Functions can return a single value, multiple values, or even no value at all.
-Why do we use functions?
-
-With functions, you can keep your code short and concise. Once a function is defined, it can be reused as many times as needed, so you do not have to write the same code over and over. In addition, functions make your code more readable. For example, if you give a function a well-chosen name, anyone reading your code can already infer what it does.
-Another form of code economy is modules and packages, which give you a way of grouping your code (e.g. functions).
-
-Functions accept parameters, so you can study how a function behaves by changing the values passed to it.
-
-We can use functions and the fact that they return one or multiple values for production.
-
-Built-in functions
-Python provides many built-in functions. You can find the list here: Python built-in functions
-We have already seen some examples of built-in functions, such as print(), id(), isinstance(), enumerate() and zip().
-Another example is bool(), which takes an argument (e.g. a variable) and returns True or False.
-
-
-Another is help(), which invokes the built-in help system.
-
-
-
-
Help on class bool in module builtins:
-
-class bool(int)
- | bool(x) -> bool
- |
- | Returns True when the argument x is true, False otherwise.
- | The builtins True and False are the only two instances of the class bool.
- | The class bool is a subclass of the class int, and cannot be subclassed.
- |
- | Method resolution order:
- | bool
- | int
- | object
- |
- | Methods defined here:
- |
- | __and__(self, value, /)
- | Return self&value.
- |
- | __or__(self, value, /)
- | Return self|value.
- |
- | __rand__(self, value, /)
- | Return value&self.
- |
- | __repr__(self, /)
- | Return repr(self).
- |
- | __ror__(self, value, /)
- | Return value|self.
- |
- | __rxor__(self, value, /)
- | Return value^self.
- |
- | __xor__(self, value, /)
- | Return self^value.
- |
- | ----------------------------------------------------------------------
- | Static methods defined here:
- |
- | __new__(*args, **kwargs)
- | Create and return a new object. See help(type) for accurate signature.
- |
- | ----------------------------------------------------------------------
- | Methods inherited from int:
- |
- | __abs__(self, /)
- | abs(self)
- |
- | __add__(self, value, /)
- | Return self+value.
- |
- | __bool__(self, /)
- | True if self else False
- |
- | __ceil__(...)
- | Ceiling of an Integral returns itself.
- |
- | __divmod__(self, value, /)
- | Return divmod(self, value).
- |
- | __eq__(self, value, /)
- | Return self==value.
- |
- | __float__(self, /)
- | float(self)
- |
- | __floor__(...)
- | Flooring an Integral returns itself.
- |
- | __floordiv__(self, value, /)
- | Return self//value.
- |
- | __format__(self, format_spec, /)
- | Default object formatter.
- |
- | __ge__(self, value, /)
- | Return self>=value.
- |
- | __getattribute__(self, name, /)
- | Return getattr(self, name).
- |
- | __getnewargs__(self, /)
- |
- | __gt__(self, value, /)
- | Return self>value.
- |
- | __hash__(self, /)
- | Return hash(self).
- |
- | __index__(self, /)
- | Return self converted to an integer, if self is suitable for use as an index into a list.
- |
- | __int__(self, /)
- | int(self)
- |
- | __invert__(self, /)
- | ~self
- |
- | __le__(self, value, /)
- | Return self<=value.
- |
- | __lshift__(self, value, /)
- | Return self<<value.
- |
- | __lt__(self, value, /)
- | Return self<value.
- |
- | __mod__(self, value, /)
- | Return self%value.
- |
- | __mul__(self, value, /)
- | Return self*value.
- |
- | __ne__(self, value, /)
- | Return self!=value.
- |
- | __neg__(self, /)
- | -self
- |
- | __pos__(self, /)
- | +self
- |
- | __pow__(self, value, mod=None, /)
- | Return pow(self, value, mod).
- |
- | __radd__(self, value, /)
- | Return value+self.
- |
- | __rdivmod__(self, value, /)
- | Return divmod(value, self).
- |
- | __rfloordiv__(self, value, /)
- | Return value//self.
- |
- | __rlshift__(self, value, /)
- | Return value<<self.
- |
- | __rmod__(self, value, /)
- | Return value%self.
- |
- | __rmul__(self, value, /)
- | Return value*self.
- |
- | __round__(...)
- | Rounding an Integral returns itself.
- |
- | Rounding with an ndigits argument also returns an integer.
- |
- | __rpow__(self, value, mod=None, /)
- | Return pow(value, self, mod).
- |
- | __rrshift__(self, value, /)
- | Return value>>self.
- |
- | __rshift__(self, value, /)
- | Return self>>value.
- |
- | __rsub__(self, value, /)
- | Return value-self.
- |
- | __rtruediv__(self, value, /)
- | Return value/self.
- |
- | __sizeof__(self, /)
- | Returns size in memory, in bytes.
- |
- | __sub__(self, value, /)
- | Return self-value.
- |
- | __truediv__(self, value, /)
- | Return self/value.
- |
- | __trunc__(...)
- | Truncating an Integral returns itself.
- |
- | as_integer_ratio(self, /)
- | Return integer ratio.
- |
- | Return a pair of integers, whose ratio is exactly equal to the original int
- | and with a positive denominator.
- |
- | >>> (10).as_integer_ratio()
- | (10, 1)
- | >>> (-10).as_integer_ratio()
- | (-10, 1)
- | >>> (0).as_integer_ratio()
- | (0, 1)
- |
- | bit_count(self, /)
- | Number of ones in the binary representation of the absolute value of self.
- |
- | Also known as the population count.
- |
- | >>> bin(13)
- | '0b1101'
- | >>> (13).bit_count()
- | 3
- |
- | bit_length(self, /)
- | Number of bits necessary to represent self in binary.
- |
- | >>> bin(37)
- | '0b100101'
- | >>> (37).bit_length()
- | 6
- |
- | conjugate(...)
- | Returns self, the complex conjugate of any int.
- |
- | to_bytes(self, /, length=1, byteorder='big', *, signed=False)
- | Return an array of bytes representing an integer.
- |
- | length
- | Length of bytes object to use. An OverflowError is raised if the
- | integer is not representable with the given number of bytes. Default
- | is length 1.
- | byteorder
- | The byte order used to represent the integer. If byteorder is 'big',
- | the most significant byte is at the beginning of the byte array. If
- | byteorder is 'little', the most significant byte is at the end of the
- | byte array. To request the native byte order of the host system, use
- | `sys.byteorder' as the byte order value. Default is to use 'big'.
- | signed
- | Determines whether two's complement is used to represent the integer.
- | If signed is False and a negative integer is given, an OverflowError
- | is raised.
- |
- | ----------------------------------------------------------------------
- | Class methods inherited from int:
- |
- | from_bytes(bytes, byteorder='big', *, signed=False)
- | Return the integer represented by the given array of bytes.
- |
- | bytes
- | Holds the array of bytes to convert. The argument must either
- | support the buffer protocol or be an iterable object producing bytes.
- | Bytes and bytearray are examples of built-in objects that support the
- | buffer protocol.
- | byteorder
- | The byte order used to represent the integer. If byteorder is 'big',
- | the most significant byte is at the beginning of the byte array. If
- | byteorder is 'little', the most significant byte is at the end of the
- | byte array. To request the native byte order of the host system, use
- | `sys.byteorder' as the byte order value. Default is to use 'big'.
- | signed
- | Indicates whether two's complement is used to represent the integer.
- |
- | ----------------------------------------------------------------------
- | Data descriptors inherited from int:
- |
- | denominator
- | the denominator of a rational number in lowest terms
- |
- | imag
- | the imaginary part of a complex number
- |
- | numerator
- | the numerator of a rational number in lowest terms
- |
- | real
- | the real part of a complex number
-
-
-
-
-
-
-
-II. Creating Functions
-Let’s write a function to compare a list of values against a threshold.
-
-Let’s break down the components:
-
-the function definition starts with def, followed by the function name, one or more arguments in parentheses, and then a colon.
-next comes a docstring to provide annotation
-the function body follows
-lastly is a return statement
-
-The function call allows for function use. It consists of the function name and the required arguments:
-vals_greater_than_or_equal_to_threshold(arg1, arg2) where arg1, arg2 are arbitrary names.
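A minimal sketch of the function described above, rebuilt from the docstring and the list comprehension quoted elsewhere in this lesson. The original cell may have differed in detail, and the argument values in the call are made up for illustration.

```python
def vals_greater_than_or_equal_to_threshold(vals, thresh):
    '''
    PURPOSE: Given a list of values, compare each value against a threshold

    INPUTS
    vals    list of ints or floats
    thresh  int or float

    OUTPUT
    bools   list of booleans
    '''
    # compare each value against the threshold with a list comprehension
    bools = [val >= thresh for val in vals]
    return bools


# function call: function name followed by the two required arguments
result = vals_greater_than_or_equal_to_threshold([3, 4, 6], 4)
print(result)
```

The four parts listed above appear in order: the def line, the docstring, the body, and the return statement.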
-
-docstring
-
-A docstring is a string that occurs as the first statement in a module, function, class, or method definition
-Saved in the __doc__ attribute
-Needs to be indented
-'''enclosed in triple quotes like this'''
-
-We gave this function a descriptive docstring to:
-
-
-
-
-
bool(x) -> bool
-
-Returns True when the argument x is true, False otherwise.
-The builtins True and False are the only two instances of the class bool.
-The class bool is a subclass of the class int, and cannot be subclassed.
-
-
-
-
-
-
-
-
Help on function vals_greater_than_or_equal_to_threshold in module __main__:
-
-vals_greater_than_or_equal_to_threshold(vals, thresh)
- PURPOSE: Given a list of values, compare each value against a threshold
-
- INPUTS
- vals list of ints or floats
- thresh int or float
-
- OUTPUT
- bools list of booleans
-
-
-
-
-
-The function body used a list comprehension for the comparison:
-[val >= thresh for val in vals]
-Let’s test our function
-
-
-This gives correct results and does exactly what we want.
-print the docstring
-
-
-
-
PURPOSE: Given a list of values, compare each value against a threshold
-
- INPUTS
- vals list of ints or floats
- thresh int or float
-
- OUTPUT
- bools list of booleans
-
-
-
-
-
-print the help
-
-
-
-
Help on function vals_greater_than_or_equal_to_threshold in module __main__:
-
-vals_greater_than_or_equal_to_threshold(vals, thresh)
- PURPOSE: Given a list of values, compare each value against a threshold
-
- INPUTS
- vals list of ints or floats
- thresh int or float
-
- OUTPUT
- bools list of booleans
-
-
-
-
-
-
-
-III. Arguments and parameters
-Functions need to be called with the correct number of arguments.
-This function requires two arguments, but the function call includes only one:
-
-
-
-
---------------------------------------------------------------------------
-TypeError Traceback (most recent call last)
-Cell In [ 11 ], line 6
- 3 return x + y
- 5 # function call with only 1 of the 2 arguments
-----> 6 fcn_bad_args ( 10 )
-
-TypeError : fcn_bad_args() missing 1 required positional argument: 'y'
-
-
-
-
-When calling a function, parameter order matters.
-
-
-
-
fcn_swapped_args(x,y) = 7
-fcn_swapped_args(y,x) = 11
-
-
-
-
-Generally it’s best to keep parameters in order.
-You can swap the order by putting the parameter names in the function call.
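To illustrate positional versus keyword arguments, here is a small sketch. The function scaled_sum is hypothetical and is not the function used in the lesson; it is chosen only because swapping its arguments changes the result.

```python
def scaled_sum(x, y):
    # hypothetical function in which the two parameters play different roles
    return x + 10 * y


in_order = scaled_sum(1, 2)        # positional: x=1, y=2
swapped = scaled_sum(2, 1)         # positional, swapped: x=2, y=1
by_keyword = scaled_sum(y=2, x=1)  # keywords: order in the call no longer matters

print(in_order, swapped, by_keyword)
```

Naming the parameters in the call gives the same result as the in-order positional call, even though the values are written in the opposite order.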
-
-Weirdness Alert
-Note that the same name can be used for the parameter names and the variables passed to them.
-The names themselves have nothing to do with each other!
-In other words, just because a function names an argument x, the variables passed to it don't have to be named x or anything like it.
-They can even be named the same thing; it does not matter.
-
-Parameters can be passed by position, where order matters, or by keyword.
-
-
-
-
-IV. Unpacking list-likes using *args
-The * operator can be used to avoid specifying the arguments individually.
-
-We can pass a tuple of values to the function…
-
-
-
-
models : ('logreg', 'naive_bayes', 'gbm')
-input arg type : <class 'tuple'>
-input arg length: 3
------------------------------
-logreg
-naive_bayes
-gbm
-
-
-
-
-You can pass a list to the function.
-If you want the elements unpacked, put * before the list.
-
-
-
-
models : ('logreg', 'naive_bayes', 'gbm')
-input arg type : <class 'tuple'>
-input arg length: 3
------------------------------
-logreg
-naive_bayes
-gbm
-
-
-
-
-This approach allows your function to accept an arbitrary number of arguments
-
-
-
-
models : (['a', 'b', 'c', 'd', 'e', 'f', 'g'],)
-input arg type : <class 'tuple'>
-input arg length: 1
------------------------------
-['a', 'b', 'c', 'd', 'e', 'f', 'g']
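The three outputs above are consistent with a function defined with `*args`. The sketch below is an assumption about its shape; the name `show_models` is hypothetical, since the original cell is not shown.

```python
def show_models(*models):
    """Collect any number of positional arguments into a tuple."""
    print("models          :", models)
    print("input arg type  :", type(models))
    print("input arg length:", len(models))
    print("-" * 30)
    for model in models:
        print(model)
    return models


# Separate arguments are gathered into one tuple.
separate = show_models("logreg", "naive_bayes", "gbm")

# A list prefixed with * is unpacked into separate arguments.
model_list = ["logreg", "naive_bayes", "gbm"]
unpacked = show_models(*model_list)

# Without *, the whole list arrives as a single argument.
packed = show_models(model_list)
```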
-
-
-
-
-
-The reverse is true, too.
-You can use the * operator to pass list-like objects to a function that specifies its arguments.
-
-But, the passed object must be the right length.
-
-
-
-
---------------------------------------------------------------------------
-TypeError Traceback (most recent call last)
-Input In [10], in <cell line: 0> ()
- 1 my_args2 = [ 2 , 8 , 5 ]
-----> 2 arg_expansion_example ( * my_args2 )
-
-TypeError : arg_expansion_example() takes 2 positional arguments but 3 were given
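A sketch of the length requirement. The traceback only tells us that `arg_expansion_example` takes two positional arguments; the body used here (adding them) is an assumption for illustration.

```python
def arg_expansion_example(x, y):
    # Assumed body: the original implementation is not shown.
    return x + y


# A list of length 2 matches the two parameters.
my_args = [2, 8]
result = arg_expansion_example(*my_args)
print(result)

# A list of length 3 does not, so Python raises TypeError.
my_args2 = [2, 8, 5]
try:
    arg_expansion_example(*my_args2)
except TypeError as err:
    error_message = str(err)
    print(error_message)
```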
-
-
-
-
-
-
-V. Default Arguments
-Default arguments set a parameter’s value when the call leaves it unspecified.
-
-
-The function call didn’t specify printing, so it defaulted to True.
-Default arguments must follow non-default arguments. This causes trouble:
-
-
-
-
Input In [ 13 ]
- def show_results ( precision , printing = True , uhoh ):
- ^
-SyntaxError : non-default argument follows default argument
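A sketch of both behaviours. The signature `show_results(precision, printing=True)` comes from the error message above; the function body is an assumption. The invalid definition is compiled from a string so that the example itself still runs, and the exact wording of the `SyntaxError` message varies between Python versions.

```python
def show_results(precision, printing=True):
    """Format a precision score; print it unless printing is False."""
    text = f"precision = {precision:.2f}"
    if printing:
        print(text)
    return text


default_call = show_results(0.912)                # printing defaults to True
quiet_call = show_results(0.912, printing=False)  # default overridden

# A parameter without a default placed after one with a default is rejected.
bad_definition = "def show_results(precision, printing=True, uhoh): pass"
try:
    compile(bad_definition, "<example>", "exec")
except SyntaxError as err:
    syntax_message = err.msg
    print("SyntaxError:", syntax_message)
```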
-
-
-
-
-
-
-VI. Returning Values
-Functions are not required to have a return statement.
-If there is no return statement, the function returns the None object.
-Functions can return no value (the None object), one value, or many.
-Any Python object can be returned.
-
-
-
-
-
-
nothing to see here!
-None
-
-
-
-
-
-
-If you don’t need an output, use the dummy variable _
-
-Note: For clarity, it’s generally a good idea to include a return statement even when not returning a value.
-You can use return or return None.
-Functions can contain multiple return statements.
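The points in this section can be sketched as follows. All three function names are hypothetical; they are not taken from the original notebook.

```python
def no_return():
    """No return statement, so the call evaluates to None."""
    print("nothing to see here!")


def classify(value):
    """Return a label from one of several return statements."""
    if value < 0:
        return "negative"
    if value == 0:
        return "zero"
    return "positive"


def min_and_max(values):
    """Return two values at once, packed into a tuple."""
    return min(values), max(values)


result = no_return()
print(result)

# The dummy variable _ marks an output we do not need.
_, largest = min_and_max([3, 9, 4])
print(largest)
```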
-
-
-
-
-
-VII. Variable Scope
-A variable’s scope is the part of a program where it is visible.
-Visible means available or usable.
-If a variable is in scope to a function, it is visible to the function.
-If it is out of scope to a function, it is not visible to the function.
-When a variable is defined inside a function, it is not visible outside the function.
-We say such variables are local to the function.
-They are also removed from memory when the function completes.
-
-
-
-
-
z inside function = 4
-memory address of z inside function = 0x87f448
-
-
-
-
-
-
-If we define z outside the function and then call the function, the assignment to z inside the function won’t pass outside it.
-
-
-
-
z outside: 0x87f408
-z inside function = 4
-memory address of z inside function = 0x87f448
-z = 2
-
-
-
-
-
-Local versus Global Variables
-It is helpful to have a good understanding of local versus global variables.
-Not having this understanding can lead to surprises and confusion.
-Example 1: Variable defined outside function, used inside function
-In the code below:
-x is global and can be seen from inside the function.
-r is local to the function. Trying to print it outside the function throws an error.
-
-
-
-Example 2: Variable defined outside function, updated and used inside function
-fcn uses the local version of x.
-
-
-
-
x from fcn: 20
-fcn(6): 26
-x: 10
-
-
-
-
-Example 3: Variable defined outside function. Inside function, print variable, update, and use
-This one may be confusing. It fails!
-Python treats x inside the function as the local x, because x is assigned somewhere in the function body.
-The print() occurs before x is assigned, so it can’t find a value for x.
-
-
-
-
---------------------------------------------------------------------------
-UnboundLocalError Traceback (most recent call last)
-Input In [50], in <cell line: 0> ()
- 7 print ( 'x from fcn, after update:' , x )
- 8 return ( out )
----> 10 print ( 'fcn(6):' , fcn ( 6 ))
- 11 print ( 'x:' , x )
-
-Input In [50], in fcn (a)
- 3 def fcn ( a ):
-----> 4 print ( 'x from fcn, before update:' , x )
- 5 x = 20
- 6 out = x + a
-
-UnboundLocalError : cannot access local variable 'x' where it is not associated with a value
-
-
-
-
-The error can be fixed by declaring x as global inside the function.
-This is only necessary if we wish to reassign the variable.
-It is also useful when we want several functions to operate on the same variable.
-
-
-
-
x from fcn, before update: 10
-x from fcn, after update: 20
-fcn(6): 26
-x: 20
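A sketch of Example 3 and its fix side by side, reconstructed from the traceback and the output above. The names `fcn_broken` and `fcn_global` are hypothetical; the original notebook calls both versions `fcn`.

```python
x = 10


def fcn_broken(a):
    # x is assigned below, so Python treats it as local throughout the
    # function; reading it before the assignment raises UnboundLocalError.
    print("x from fcn, before update:", x)
    x = 20
    return x + a


def fcn_global(a):
    global x  # refer to the module-level x instead of creating a local one
    print("x from fcn, before update:", x)
    x = 20
    print("x from fcn, after update:", x)
    return x + a


try:
    fcn_broken(6)
except UnboundLocalError as err:
    scope_error = err
    print("UnboundLocalError:", err)

result = fcn_global(6)
x_after = x  # the global x was reassigned inside fcn_global
print("fcn(6):", result)
print("x:", x_after)
```

Because `global` lets a function silently change module-level state, it is usually worth reserving for cases where several functions really must share one variable.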
-
-
-
-
-
-
-
-VIII. Function Design
-Some good practices for creating and using functions:
-
-Make them as simple as possible. In this way, a function will be:
-
-more comprehensible
-easier to maintain
-reusable
-
-This helps avoid situations where a team has 20 variations of similar functions.
-Give your function a good name. What makes a function name good?
-
-It should reflect the action it performs.
-Be consistent in naming conventions.
-A name like compute_variances_sort_save_print suggests the function is overworked!
-If the function compute_variances also produces plots and updates variables, it will cause confusion.
-
-Always give your function a docstring:
-
-This is particularly important since indicating data types is not required.
-As a side note, you can include this information by using type annotations.
-
-Function docstrings are stored in the __doc__ attribute; they can be shown like this:
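A sketch using the docstring from the help() output earlier in this section. The function body and the type annotations are assumptions, since only the docstring is shown in the original; the built-in generics in the annotations need Python 3.9 or later.

```python
def vals_greater_than_or_equal_to_threshold(
    vals: list[float], thresh: float
) -> list[bool]:
    """
    PURPOSE: Given a list of values, compare each value against a threshold

    INPUTS
    vals    list of ints or floats
    thresh  int or float

    OUTPUT
    bools   list of booleans
    """
    # Assumed implementation: one boolean per input value.
    return [val >= thresh for val in vals]


# The docstring is stored on the function object itself.
docstring = vals_greater_than_or_equal_to_threshold.__doc__
print(docstring)

bools = vals_greater_than_or_equal_to_threshold([1, 5, 10], 5)
print(bools)
```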
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/chapters/module-2/026-iterables_and_iterators.html b/chapters/module-2/026-iterables_and_iterators.html
index cff0686..8bf6ca6 100644
--- a/chapters/module-2/026-iterables_and_iterators.html
+++ b/chapters/module-2/026-iterables_and_iterators.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -197,6 +197,18 @@
Control Structures
Iterables and Iterators
Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/0261-functions_exercises.html b/chapters/module-2/0261-functions_exercises.html
index 55f88ab..5a85529 100644
--- a/chapters/module-2/0261-functions_exercises.html
+++ b/chapters/module-2/0261-functions_exercises.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-2/027-functions.html b/chapters/module-2/027-functions.html
index a91e8fd..30fe71c 100644
--- a/chapters/module-2/027-functions.html
+++ b/chapters/module-2/027-functions.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -61,6 +61,7 @@
+
@@ -196,6 +197,18 @@
Control Structures
Iterables and Iterators
Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
@@ -1813,6 +1826,15 @@ Practice exercises
+
+
+
next
+
Errors and Exceptions
+
+
+
diff --git a/chapters/module-2/In-class_100324.html b/chapters/module-2/In-class_100324.html
deleted file mode 100644
index b24bb26..0000000
--- a/chapters/module-2/In-class_100324.html
+++ /dev/null
@@ -1,482 +0,0 @@
-
-
-
-
-
-
-
-
-
-
- Instructions — DS-1002 Programming for Data Science
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Skip to main content
-
-
-
-
-
- Back to top
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Instructions
-
-Complete Jupyter Notebook for tasks. Clearly show relevant work.
-Before beginning to fill in the notebook, make sure you have written your name in the first cell, along with the names of any collaborators if you worked in a group.
-Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel \(\rightarrow\) Restart) and then run all cells (in the menubar, select Cell \(\rightarrow\) Run All).
-Make sure your changes have been saved and, when ready, submit the Jupyter notebook through Canvas.
-
-
-
-1. The following function aims at returning whether a given list of numbers contains at least one element divisible by 7, but it fails due to a bug. You can test this failing behaviour by passing, for example, the list: [10, 7, 13].
-Redefine the function correcting the bug. Test that it behaves correctly with several examples. In addition, in a separate markdown cell, explain why the function was failing. (3 points)
-
-2. Using a for loop and if statements, print all the numbers between 10 and 1000 (inclusive) that are divisible by 7 and whose digits sum to more than 10, but only if the number itself is also odd. (2 points)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/chapters/module-2/In-class_100324_sols.html b/chapters/module-2/In-class_100324_sols.html
deleted file mode 100644
index 633eeed..0000000
--- a/chapters/module-2/In-class_100324_sols.html
+++ /dev/null
@@ -1,602 +0,0 @@
-
-
-
-
-
-
-
-
-
-
- Instructions — DS-1002 Programming for Data Science
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Skip to main content
-
-
-
-
-
- Back to top
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Instructions
-
-Complete Jupyter Notebook for tasks. Clearly show relevant work.
-Before beginning to fill in the notebook, make sure you have written your name in the first cell, along with the names of any collaborators if you worked in a group.
-Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel \(\rightarrow\) Restart) and then run all cells (in the menubar, select Cell \(\rightarrow\) Run All).
-Make sure your changes have been saved and, when ready, submit the Jupyter notebook through Canvas.
-
-
-
-1. The following function aims at returning whether a given list of numbers contains at least one element divisible by 7, but it fails due to a bug. You can test this failing behaviour by passing, for example, the list: [10, 7, 13].
-Redefine the function correcting the bug. Test that it behaves correctly with several examples. In addition, in a separate markdown cell, explain why the function was failing. (3 points)
-
-
-
-
-The bug was that the original function performed only one iteration: both branches of the if/else contained a return, so the function exited after checking the first element. We only need to exit early if we hit an element divisible by 7. Otherwise, we should carry on looping through the list until the end. If the loop finishes, no element is divisible by 7, and the function should return False.
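The explanation above can be sketched as follows. The original function is not shown, so both definitions and their names are assumptions reconstructed from the description of the bug.

```python
def has_multiple_of_7_buggy(numbers):
    """Buggy version: returns after inspecting only the first element."""
    for number in numbers:
        if number % 7 == 0:
            return True
        else:
            return False  # exits the function on the first iteration


def has_multiple_of_7(numbers):
    """Fixed version: return False only after the whole list is checked."""
    for number in numbers:
        if number % 7 == 0:
            return True  # early exit is fine once a match is found
    return False


print(has_multiple_of_7_buggy([10, 7, 13]))  # wrong answer
print(has_multiple_of_7([10, 7, 13]))        # correct answer
```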
-2. Using a for loop and if statements, print all the numbers between 10 and 1000 (inclusive) that are divisible by 7 and whose digits sum to more than 10, but only if the number itself is also odd. (2 points)
-
-
-
-
49
-77
-119
-147
-175
-189
-245
-259
-273
-287
-329
-357
-371
-385
-399
-427
-455
-469
-483
-497
-525
-539
-553
-567
-581
-595
-609
-623
-637
-651
-665
-679
-693
-707
-735
-749
-763
-777
-791
-805
-819
-833
-847
-861
-875
-889
-903
-917
-931
-945
-959
-973
-987
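One way to produce a list like the one above is sketched here. The solution cell itself is not shown, so this is an assumption about its approach, and the variable names are hypothetical.

```python
matches = []

# range(10, 1001) includes both 10 and 1000.
for number in range(10, 1001):
    digit_sum = sum(int(digit) for digit in str(number))
    if number % 7 == 0 and digit_sum > 10 and number % 2 == 1:
        matches.append(number)
        print(number)
```

Collecting the matches in a list as well as printing them makes the result easy to check afterwards.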
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/chapters/module-3/029-packages.html b/chapters/module-3/029-packages.html
index 345c3a6..565163f 100644
--- a/chapters/module-3/029-packages.html
+++ b/chapters/module-3/029-packages.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-3/03-cover.html b/chapters/module-3/03-cover.html
index d765d82..1b13b5a 100644
--- a/chapters/module-3/03-cover.html
+++ b/chapters/module-3/03-cover.html
@@ -195,6 +195,20 @@
Strings
Data Structures
Control Structures
+Iterables and Iterators
+Functions
+
+Module 3: Python II (Advance)
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-3/031-errors_and_exceptions.html b/chapters/module-3/031-errors_and_exceptions.html
index fb49f89..f2c86a8 100644
--- a/chapters/module-3/031-errors_and_exceptions.html
+++ b/chapters/module-3/031-errors_and_exceptions.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -202,6 +202,13 @@
+Module 4: Python for Data Science
+
diff --git a/chapters/module-3/031-errors_and_exceptions_w_sols.html b/chapters/module-3/031-errors_and_exceptions_w_sols.html
index 6b69712..9482d03 100644
--- a/chapters/module-3/031-errors_and_exceptions_w_sols.html
+++ b/chapters/module-3/031-errors_and_exceptions_w_sols.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -203,6 +203,12 @@
Errors and Exceptions
Introduction to object-oriented programming (OOP)
Reading and Writing Files
+
+Module 4: Python for Data Science
+
diff --git a/chapters/module-3/032-classes.html b/chapters/module-3/032-classes.html
index 9939d7a..098e8a4 100644
--- a/chapters/module-3/032-classes.html
+++ b/chapters/module-3/032-classes.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -61,6 +61,7 @@
+
@@ -201,6 +202,13 @@
+Module 4: Python for Data Science
+
@@ -949,6 +957,15 @@ Practice excersises
+
+
+
next
+
Reading and Writing Files
+
+
+
diff --git a/chapters/module-3/033-reading_writing_files.html b/chapters/module-3/033-reading_writing_files.html
index e3ad510..0be8ded 100644
--- a/chapters/module-3/033-reading_writing_files.html
+++ b/chapters/module-3/033-reading_writing_files.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -61,6 +61,7 @@
+
@@ -202,6 +203,12 @@
Errors and Exceptions
Introduction to object-oriented programming (OOP)
Reading and Writing Files
+
+Module 4: Python for Data Science
+
@@ -788,6 +795,15 @@ Practice excersises
+
+
+
next
+
NumPy (Part I)
+
+
+
diff --git a/chapters/module-3/lab-recursion.html b/chapters/module-3/lab-recursion.html
index 33ac835..8eadd90 100644
--- a/chapters/module-3/lab-recursion.html
+++ b/chapters/module-3/lab-recursion.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -206,8 +206,9 @@
Module 4: Python for Data Science
diff --git a/chapters/module-4/041-numpy.html b/chapters/module-4/041-numpy.html
deleted file mode 100644
index bb57a8d..0000000
--- a/chapters/module-4/041-numpy.html
+++ /dev/null
@@ -1,1080 +0,0 @@
-
-
-
-
-
-
-
-
-
-
- PREREQUISITES — DS-1002 Programming for Data Science
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Skip to main content
-
-
-
-
-
- Back to top
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
PREREQUISITES
-
-
-
-
-
-
Contents
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Course : DS1002
-Module : Module 5
-Topic : NumPy Continued
-
-
-
-PREREQUISITES
-
-import / import as
-variables
-creating basic arrays
-
-
-
-
-
-
-Data Types
-One way to control the data type of a NumPy array is to declare it when the array is created, using the dtype keyword argument. Take a look at the data type NumPy uses by default when creating an array with np.zeros(). Could it be updated?
-
-Using np.zeros(), create an array of zeros that has three rows and two columns; call it zero_array.
-Print the data type of zero_array.
-
-
-
-Create a new array of zeros called zero_int_array, which will also have three rows and two columns, but the data type should be np.int32.
-Print the data type of zero_int_array.
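A minimal sketch of the two tasks above, assuming NumPy is installed and imported as `np`. The array names follow the task description.

```python
import numpy as np

# Default dtype: np.zeros() creates floating-point zeros.
zero_array = np.zeros((3, 2))
print(zero_array.dtype)

# The dtype keyword argument overrides the default at creation time.
zero_int_array = np.zeros((3, 2), dtype=np.int32)
print(zero_int_array.dtype)
```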
-
-
-
-
-
-
array([6. , 7.5, 8. , 0. , 1. ])
-
-
-
-
-
-
-
-
array([[1, 2, 3, 4],
- [5, 6, 7, 8]])
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
array([ 3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
-
-
-
-
-
-
-
-
array([ 3, -1, -2, 0, 12, 10], dtype=int32)
-
-
-
-
-
-
-
-
array([ 1.25, -9.6 , 42. ])
-
-
-
-
-
-
-Basic Array Manipulations + Calculations
-NumPy has over 500 basic operations, most of which can be performed upon array data. Here are some common/obvious examples:
-
-
-
-
[[10 9 8 7 6]
- [ 5 4 3 2 1]]
-[[ 1 2 3 4 5]
- [ 6 7 8 9 10]]
-[ 1 2 3 4 5 6 7 8 9 10]
-1
-10
-5.5
-[[ 0.84147098 0.90929743 0.14112001 -0.7568025 -0.95892427]
- [-0.2794155 0.6569866 0.98935825 0.41211849 -0.54402111]]
-2.8722813232690143
-
-
-
-
-
-
-Inserting + Dropping Array Values
-There are times it’s useful to drop a specific index or the start/end of an array of values.
-
-
-
-
[15 20 25 30 35 40 45 50]
-
-
-
-
-
-
-
-
[10 15 20 25 30 35 40 45]
-
-
-
-
-
-
-
-
[10 15 25 30 35 40 45 50]
-
-
-
-
-
-
-
-
[15 25 35 45]
-[15 25 35 45]
-
-
-
-
-
-
-Slicing
-Higher Dimensional Arrays
-
-
-
-
array([[1, 2, 3],
- [4, 5, 6],
- [7, 8, 9]])
-
-
-
-
-
-
-Slicing: Simplified notation
-
-A nice visual of a 2D array
-Two-Dimensional Array Slicing
-3D arrays
-
-
-
-
array([[[ 1, 2, 3],
- [ 4, 5, 6]],
-
- [[ 7, 8, 9],
- [10, 11, 12]]])
-
-
-
-
-
-
-
-
-
array([[[ 1, 2, 3],
- [ 4, 5, 6]],
-
- [[ 7, 8, 9],
- [10, 11, 12]]])
-
-
-
-
-You may find NumPy’s way of showing the data a bit difficult to parse visually.
-💡 Here is a way to visualize 3- and higher-dimensional data:
-[ # AXIS 0 CONTAINS 2 ELEMENTS (arrays)
- [ # AXIS 1 CONTAINS 2 ELEMENTS (arrays)
- [1, 2, 3], # AXIS 2 CONTAINS 3 ELEMENTS (integers)
- [4, 5, 6] # AXIS 2
- ],
- [ # AXIS 1
- [7, 8, 9],
- [10, 11, 12]
- ]
-]
-
-
-Each axis is a level in the nested hierarchy, i.e. a tree or DAG (directed-acyclic graph).
-
-Each axis is a container.
-There is only one top container.
-Only the bottom containers have data.
-
-Omit lower indices
-In multidimensional arrays, if you omit later indices, the returned object will be a lower-dimensional ndarray consisting of all the data contained by the higher indexed dimension.
-So in the 2 × 2 × 3 array arr3d:
-
-
-
-
array([[1, 2, 3],
- [4, 5, 6]])
-
-
-
-
-Saving data before modifying an array.
-
-
-
-
array([[[42, 42, 42],
- [42, 42, 42]],
-
- [[ 7, 8, 9],
- [10, 11, 12]]])
-
-
-
-
-Putting the data back.
-
-
-
-
array([[[ 1, 2, 3],
- [ 4, 5, 6]],
-
- [[ 7, 8, 9],
- [10, 11, 12]]])
-
-
-
-
-Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:
-
-
-
-
-
array([[ 7, 8, 9],
- [10, 11, 12]])
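The indexing steps in this section can be sketched as follows, assuming NumPy is imported as `np` and `arr3d` holds the 2 × 2 × 3 array printed earlier. The names `first_block`, `row`, and `old_values` are hypothetical.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])

# Omitting later indices returns a lower-dimensional array.
first_block = arr3d[0]  # 2-dimensional, shape (2, 3)
row = arr3d[1, 0]       # 1-dimensional, shape (3,)

# Save a copy before modifying, then put the data back.
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d[0] = old_values

print(first_block.shape, row.shape)
print(arr3d)
```

The explicit `.copy()` matters here: `arr3d[0]` on its own is a view, so without the copy the saved values would be overwritten together with the array.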
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/chapters/module-4/041-numpyI.html b/chapters/module-4/041-numpyI.html
index e74858e..f339712 100644
--- a/chapters/module-4/041-numpyI.html
+++ b/chapters/module-4/041-numpyI.html
@@ -32,9 +32,9 @@
-
+
-
+
@@ -208,6 +208,7 @@
@@ -481,7 +482,7 @@ The ndarray object