diff --git a/_freeze/posts/tidy-data-a-recipe-for-efficient-data-analysis/index/execute-results/html.json b/_freeze/posts/tidy-data-a-recipe-for-efficient-data-analysis/index/execute-results/html.json
index e522e0e..ee318a7 100644
--- a/_freeze/posts/tidy-data-a-recipe-for-efficient-data-analysis/index/execute-results/html.json
+++ b/_freeze/posts/tidy-data-a-recipe-for-efficient-data-analysis/index/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "4c62362bc0bfd7f3f9def2c3cf1de1e4",
+  "hash": "565760e0f3b63c95b458bdd01ef18118",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: \"Tidy Data: A Recipe for Efficient Data Analysis\"\ndescription: \"On the importance of tidy data for efficient analysis using the analogy of a well-organized kitchen\"\nauthor: \"Christoph Scheuch\"\ndate: \"2023-11-24\" \nimage: thumbnail.png\n---\n\n\nImagine trying to cook a meal in a disorganized kitchen where ingredients are mixed up and nothing is labeled. It would be chaotic and time-consuming to look for the right ingredients and there might be some trial error involved, possibly ruining your planned meal. \n\nTidy data are like a well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside. In the same way, tidy data organizes information into a clear and consistent format, where each **type of observational unit forms a table**, **each variable is in a column**, and **each observation is in a row**  [@Wickham2014].\n\nTidying data is about structuring datasets to facilitate analysis or report generation. 
By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time on searching for ingredients.\n\n## Example for tidy data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n\ningredients <- tibble(\n  type = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\", \"Salt\", \"Olive Oil\", \"Tomatoes\", \"Chicken\", \"Rice\"),\n  quantity = c(500, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),\n  unit = c(\"grams\", \"grams\", \"grams\", \"units\", \"liters\", \"grams\", \"liters\", \"grams\", \"grams\", \"grams\")\n)\n\nspices <- tibble(\n  type = c(\"Paprika\", \"Turmeric\", \"Cumin\", \"Coriander\", \"Cinnamon\", \"Chili Powder\", \"Oregano\", \"Thyme\", \"Saffron\", \"Nutmeg\"),\n  quantity = c(50, 40, 30, 25, 20, 15, 10, 8, 5, 12),\n  unit = c(\"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\")\n)\n\ndairies <- tibble(\n  type = c(\"Milk\", \"Butter\", \"Yogurt\", \"Cheese\", \"Cream\", \"Cottage Cheese\", \"Sour Cream\", \"Ghee\", \"Whipping Cream\", \"Ice Cream\"),\n  quantity = c(1, 200, 150, 100, 0.5, 250, 150, 100, 0.3, 500),\n  unit = c(\"liters\", \"grams\", \"grams\", \"grams\", \"liters\", \"grams\", \"grams\", \"grams\", \"liters\", \"grams\")\n)\n```\n:::\n\n\n## When colum headers are values, not variable names\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble(\n  type = c(\"Milk\", \"Butter\", \"Yogurt\", \"Cheese\", \"Cream\", \"Cottage Cheese\", \"Sour Cream\", \"Ghee\", \"Whipping Cream\", \"Ice Cream\"),\n  liters = c(1, NA, NA, NA, 0.5, NA, NA, NA, 0.3, NA),\n  grams = c(NA, 200, 150, 100, NA, 250, 150, 100, NA, 500)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 10 × 3\n   type           liters grams\n   <chr>           <dbl> <dbl>\n 1 Milk             
 1      NA\n 2 Butter           NA     200\n 3 Yogurt           NA     150\n 4 Cheese           NA     100\n 5 Cream             0.5    NA\n 6 Cottage Cheese   NA     250\n 7 Sour Cream       NA     150\n 8 Ghee             NA     100\n 9 Whipping Cream    0.3    NA\n10 Ice Cream        NA     500\n```\n\n\n:::\n:::\n\n\n## When multiple variables are stored in one column\n\nThe `quantity_and_unit` column combines both the quantity and the unit of measurement into one string for each ingredient. This format makes it harder to perform numerical operations on the quantities or to filter or aggregate the data based on the unit of measurement.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble(\n  type = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\", \"Salt\", \"Olive Oil\", \"Tomatoes\", \"Chicken\", \"Rice\"),\n  quantity_and_unit = c(\"500 grams\", \"200 grams\", \"100 grams\", \"4 units\", \"1 liter\", \"10 grams\", \"0.2 liters\", \"300 grams\", \"400 grams\", \"250 grams\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 10 × 2\n   type      quantity_and_unit\n   <chr>     <chr>            \n 1 Flour     500 grams        \n 2 Sugar     200 grams        \n 3 Butter    100 grams        \n 4 Eggs      4 units          \n 5 Milk      1 liter          \n 6 Salt      10 grams         \n 7 Olive Oil 0.2 liters       \n 8 Tomatoes  300 grams        \n 9 Chicken   400 grams        \n10 Rice      250 grams        \n```\n\n\n:::\n:::\n\n\n## When variables are stored in both rows and columns\n\nThe quantity for each ingredient for two different recipes is stored in separate columns. 
This structure makes it harder to perform operations like filtering or summarizing the data by recipe or ingredient.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble(\n  ingredient = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\"),\n  recipe1_quantity = c(\"500 grams\", \"200 grams\", \"100 grams\", \"4 units\", \"1 liter\"),\n  recipe2_quantity = c(\"300 grams\", \"150 grams\", \"50 grams\", \"3\", \"0.5 liters\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 3\n  ingredient recipe1_quantity recipe2_quantity\n  <chr>      <chr>            <chr>           \n1 Flour      500 grams        300 grams       \n2 Sugar      200 grams        150 grams       \n3 Butter     100 grams        50 grams        \n4 Eggs       4 units          3               \n5 Milk       1 liter          0.5 liters      \n```\n\n\n:::\n:::\n\n\nTo convert this data to a tidy format, you would typically want to gather the quantities into a single column, and include additional columns to specify the recipe and unit of measurement for each quantity.\n\n## When there are multiple types of data in the same column\n\nThe table is trying to describe a recipe but combines different types of data within the same columns. 
There are ingredients with their quantities, a utensil, and cooking time, all mixed together.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble(\n  type = c(\"Flour\", \"Butter\", \"Whisk\", \"Sugar\", \"Baking Time\"),\n  quantity = c(\"500 grams\", \"100 grams\", \"1\", \"200 grams\", \"30 minutes\"),\n  category = c(\"Ingredient\", \"Ingredient\", \"Utensil\", \"Ingredient\", \"Time\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 3\n  type        quantity   category  \n  <chr>       <chr>      <chr>     \n1 Flour       500 grams  Ingredient\n2 Butter      100 grams  Ingredient\n3 Whisk       1          Utensil   \n4 Sugar       200 grams  Ingredient\n5 Baking Time 30 minutes Time      \n```\n\n\n:::\n:::\n\n\nA tidy approach would typically separate these different types of data into separate tables or at least into distinct sets of columns, making it clear what each part of the data represents and facilitating further analysis and visualization.\n\n## When some data is missing\n\nKey points:\n\n- Huge difference between NA and 0 (or any other value)\n- Are you sure that you don't have the ingredient or do you just don't know?\n- Missing are dropped in filters \n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble(\n  type = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\", \"Salt\", \"Olive Oil\", \"Tomatoes\", \"Chicken\", NA),\n  quantity = c(NA, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),\n  unit = c(\"grams\", \"grams\", \"grams\", \"units\", NA, \"grams\", \"liters\", \"grams\", \"grams\", \"grams\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 10 × 3\n   type      quantity unit  \n   <chr>        <dbl> <chr> \n 1 Flour         NA   grams \n 2 Sugar        200   grams \n 3 Butter       100   grams \n 4 Eggs           4   units \n 5 Milk           1   <NA>  \n 6 Salt          10   grams \n 7 Olive Oil      0.2 liters\n 8 Tomatoes     300   grams \n 9 Chicken      400   grams \n10 <NA>         250   grams 
\n```\n\n\n:::\n:::",
+    "markdown": "---\ntitle: \"Tidy Data: A Recipe for Efficient Data Analysis\"\ndescription: \"On the importance of tidy data for efficient analysis using the analogy of a well-organized kitchen\"\nauthor: \"Christoph Scheuch\"\ndate: \"2023-11-24\" \nimage: thumbnail.png\n---\n\n\nImagine trying to cook a meal in a disorganized kitchen where ingredients are mixed up and nothing is labeled. It would be chaotic and time-consuming to look for the right ingredients, and there might be some trial and error involved, possibly ruining your planned meal. \n\nTidy data are like well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together, e.g., spices or dairy products. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside, e.g., pepper or milk. In the same way, tidy data organizes information into a clear and consistent format, where each **type of observational unit forms a table**, **each variable is in a column**, and **each observation is in a row** [@Wickham2014].\n\nTidying data is about structuring datasets to facilitate analysis, visualization, report generation, or modelling. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time searching for ingredients.\n\n## Example for tidy data\n\nTo illustrate the concept of tidy data in our tidy kitchen, suppose we have a table called `ingredient` that contains information about all the ingredients that we currently have in our kitchen.
It might look as follows:\n\n| name      | quantity | unit   | category  |\n|-----------|----------|--------|-----------|\n| flour     | 500      | grams  | baking    |\n| sugar     | 200      | grams  | baking    |\n| butter    | 100      | grams  | dairy     |\n| eggs      | 4        | units  | dairy     |\n| milk      | 1        | liters | dairy     |\n| salt      | 10       | grams  | seasoning |\n| olive oil | 0.2      | liters | oil       |\n| tomatoes  | 300      | grams  | vegetable |\n| chicken   | 400      | grams  | meat      |\n| rice      | 250      | grams  | grain     |\n\nEach row refers to a specific ingredient, and each column has a dedicated type and meaning. For instance, the `quantity` column records how much of the ingredient in the `name` column we currently have, and the `unit` column records how we measure it.\n\nSimilarly, we could have a table just for `dairy` that might look as follows:\n\n| name           | quantity | unit   |\n|----------------|----------|--------|\n| milk           | 1        | liters |\n| butter         | 200      | grams  |\n| yogurt         | 150      | grams  |\n| cheese         | 100      | grams  |\n| cream          | 0.5      | liters |\n| cottage cheese | 250      | grams  |\n| sour cream     | 150      | grams  |\n| ghee           | 100      | grams  |\n| whipping cream | 0.3      | liters |\n| ice cream      | 500      | grams  |\n\nNotice that there is no `category` column in this table. It would be redundant because all rows in the `dairy` table share the same category.\n\n## When column headers are values, not variable names\n\nNow let us move to data structures that are untidy.
Consider the following variant of our `dairy` table:\n\n| type           | liters | grams |\n|----------------|--------|-------|\n| milk           | 1      |       |\n| butter         |        | 200   |\n| yogurt         |        | 150   |\n| cheese         |        | 100   |\n| cream          | 0.5    |       |\n| cottage cheese |        | 250   |\n| sour cream     |        | 150   |\n| ghee           |        | 100   |\n| whipping cream | 0.3    |       |\n| ice cream      |        | 500   |\n\nWhat is the issue here? Each row still refers to a specific dairy product. However, instead of dedicated `quantity` and `unit` columns, we have separate `liters` and `grams` columns, so the unit, which is a value, has become a column header. Since the units differ across dairy products, the table even contains missing values in the form of empty cells. So if you want to find out how much ice cream you still have, you also need to check which column holds the number. In practice, we would create dedicated `quantity` and `unit` columns. We might even decide to use the same unit for all ingredients (e.g., measure everything in grams) and just keep a `quantity` column.\n\n## When multiple variables are stored in one column\n\nLet us consider the following untidy version of our `ingredient` table:\n\n| type      | quantity_and_unit |\n|-----------|-------------------|\n| flour     | 500 grams         |\n| sugar     | 200 grams         |\n| butter    | 100 grams         |\n| eggs      | 4 units           |\n| milk      | 1 liter           |\n| salt      | 10 grams          |\n| olive oil | 0.2 liters        |\n| tomatoes  | 300 grams         |\n| chicken   | 400 grams         |\n| rice      | 250 grams         |\n\nThis one is really annoying, since the `quantity_and_unit` column combines both the quantity and the unit of measurement into one string for each ingredient. Why is this an issue? This format makes it harder to perform numerical operations on the quantities, which are stored as text, or to filter or aggregate the data based on the unit of measurement.
So in practice, we would start our analysis by splitting the `quantity_and_unit` column into separate `quantity` and `unit` columns.\n\n## When variables are stored in both rows and columns\n\nIn the following table, the quantities of each ingredient for two different recipes are stored in separate columns: the ingredient varies across rows, while the recipe is encoded in the column headers. On top of that, each cell again mixes quantity and unit. This structure makes it harder to filter or summarize the data by recipe or ingredient.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──\n✔ dplyr     1.1.2     ✔ readr     2.1.4\n✔ forcats   1.0.0     ✔ stringr   1.5.0\n✔ ggplot2   3.4.2     ✔ tibble    3.2.1\n✔ lubridate 1.9.2     ✔ tidyr     1.3.0\n✔ purrr     1.0.1     \n── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n✖ dplyr::filter() masks stats::filter()\n✖ dplyr::lag()    masks stats::lag()\nℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors\n```\n\n\n:::\n\n```{.r .cell-code}\ntibble(\n  ingredient = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\"),\n  recipe1_quantity = c(\"500 grams\", \"200 grams\", \"100 grams\", \"4 units\", \"1 liter\"),\n  recipe2_quantity = c(\"300 grams\", \"150 grams\", \"50 grams\", \"3\", \"0.5 liters\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 3\n  ingredient recipe1_quantity recipe2_quantity\n  <chr>      <chr>            <chr>           \n1 Flour      500 grams        300 grams       \n2 Sugar      200 grams        150 grams       \n3 Butter     100 grams        50 grams        \n4 Eggs       4 units          3               \n5 Milk       1 liter          0.5 liters      \n```\n\n\n:::\n:::\n\n\nTo convert this data to a tidy format, you would typically gather the quantities into a single column and add columns that specify the recipe and the unit of measurement for each
quantity.\n\n## When there are multiple types of data in the same column\n\nConsider the following table:\n\n| type         | quantity    | category   |\n|--------------|-------------|------------|\n| flour        | 500 grams   | ingredient |\n| butter       | 100 grams   | ingredient |\n| whisk        | 1           | utensil    |\n| sugar        | 200 grams   | ingredient |\n| baking time  | 30 minutes  | time       |\n\nThe table tries to describe a recipe but combines different types of data within the same columns: ingredients with their quantities, a utensil, and a baking time, all mixed together. As a result, the `quantity` column holds masses, a count, and a duration, which cannot be meaningfully compared or summed.\n\nA tidy approach would typically split these different types of data into separate tables, or at least into distinct sets of columns, making it clear what each part of the data represents and facilitating further analysis and visualization.\n\n## When some data is missing\n\nKey points:\n\n- There is a huge difference between a missing value (`NA`) and 0 (or any other value).\n- Are you sure that you don't have the ingredient, or do you just not know how much of it you have?\n- Missing values are dropped by filters: a condition evaluated on `NA` returns `NA`, and `dplyr::filter()` keeps only the rows where the condition is `TRUE`.\n\nIn the following table, the quantity of flour, the unit of milk, and the name of the last ingredient are missing:\n\n| type      | quantity | unit   |\n|-----------|----------|--------|\n| flour     |          | grams  |\n| sugar     | 200      | grams  |\n| butter    | 100      | grams  |\n| eggs      | 4        | units  |\n| milk      | 1        |        |\n| salt      | 10       | grams  |\n| olive oil | 0.2      | liters |\n| tomatoes  | 300      | grams  |\n| chicken   | 400      | grams  |\n|           | 250      | grams  |\n",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
diff --git a/docs/index.html b/docs/index.html
index dc233b8..ecb0801 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -137,7 +137,7 @@
 
 <div class="quarto-listing quarto-listing-container-grid" id="listing-listing">
 <div class="list grid quarto-listing-cols-3">
-<div class="g-col-1" data-index="0" data-listing-date-sort="1700780400000" data-listing-file-modified-sort="1700843651647" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="746">
+<div class="g-col-1" data-index="0" data-listing-date-sort="1700780400000" data-listing-file-modified-sort="1700916735155" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2248">
 <a href="./posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html" class="quarto-grid-link">
 <div class="quarto-grid-item card h-100 card-left">
 <p class="card-img-top">
diff --git a/docs/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html b/docs/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html
index 8c49df3..72df2b8 100644
--- a/docs/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html
+++ b/docs/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html
@@ -23,40 +23,6 @@
   margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 
   vertical-align: middle;
 }
-/* CSS for syntax highlighting */
-pre > code.sourceCode { white-space: pre; position: relative; }
-pre > code.sourceCode > span { line-height: 1.25; }
-pre > code.sourceCode > span:empty { height: 1.2em; }
-.sourceCode { overflow: visible; }
-code.sourceCode > span { color: inherit; text-decoration: inherit; }
-div.sourceCode { margin: 1em 0; }
-pre.sourceCode { margin: 0; }
-@media screen {
-div.sourceCode { overflow: auto; }
-}
-@media print {
-pre > code.sourceCode { white-space: pre-wrap; }
-pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
-}
-pre.numberSource code
-  { counter-reset: source-line 0; }
-pre.numberSource code > span
-  { position: relative; left: -4em; counter-increment: source-line; }
-pre.numberSource code > span > a:first-child::before
-  { content: counter(source-line);
-    position: relative; left: -1em; text-align: right; vertical-align: baseline;
-    border: none; display: inline-block;
-    -webkit-touch-callout: none; -webkit-user-select: none;
-    -khtml-user-select: none; -moz-user-select: none;
-    -ms-user-select: none; user-select: none;
-    padding: 0 4px; width: 4em;
-  }
-pre.numberSource { margin-left: 3em;  padding-left: 4px; }
-div.sourceCode
-  {   }
-@media screen {
-pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
-}
 /* CSS for citations */
 div.csl-bib-body { }
 div.csl-entry {
@@ -188,156 +154,431 @@ <h1 class="title">Tidy Data: A Recipe for Efficient Data Analysis</h1>
 
 
 <p>Imagine trying to cook a meal in a disorganized kitchen where ingredients are mixed up and nothing is labeled. It would be chaotic and time-consuming to look for the right ingredients and there might be some trial error involved, possibly ruining your planned meal.</p>
-<p>Tidy data are like a well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside. In the same way, tidy data organizes information into a clear and consistent format, where each <strong>type of observational unit forms a table</strong>, <strong>each variable is in a column</strong>, and <strong>each observation is in a row</strong> <span class="citation" data-cites="Wickham2014">(<a href="#ref-Wickham2014" role="doc-biblioref">Wickham 2014</a>)</span>.</p>
-<p>Tidying data is about structuring datasets to facilitate analysis or report generation. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time on searching for ingredients.</p>
+<p>Tidy data are like well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together, e.g., spices or dairy products. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside, e.g., pepper or milk. In the same way, tidy data organizes information into a clear and consistent format, where each <strong>type of observational unit forms a table</strong>, <strong>each variable is in a column</strong>, and <strong>each observation is in a row</strong> <span class="citation" data-cites="Wickham2014">(<a href="#ref-Wickham2014" role="doc-biblioref">Wickham 2014</a>)</span>.</p>
+<p>Tidying data is about structuring datasets to facilitate analysis, visualization, report generation, or modelling. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time searching for ingredients.</p>
 <section id="example-for-tidy-data" class="level2">
 <h2 class="anchored" data-anchor-id="example-for-tidy-data">Example for tidy data</h2>
-<div class="cell">
-<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tidyverse)</span>
-<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
-<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>ingredients <span class="ot">&lt;-</span> <span class="fu">tibble</span>(</span>
-<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Flour"</span>, <span class="st">"Sugar"</span>, <span class="st">"Butter"</span>, <span class="st">"Eggs"</span>, <span class="st">"Milk"</span>, <span class="st">"Salt"</span>, <span class="st">"Olive Oil"</span>, <span class="st">"Tomatoes"</span>, <span class="st">"Chicken"</span>, <span class="st">"Rice"</span>),</span>
-<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>  <span class="at">quantity =</span> <span class="fu">c</span>(<span class="dv">500</span>, <span class="dv">200</span>, <span class="dv">100</span>, <span class="dv">4</span>, <span class="dv">1</span>, <span class="dv">10</span>, <span class="fl">0.2</span>, <span class="dv">300</span>, <span class="dv">400</span>, <span class="dv">250</span>),</span>
-<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>  <span class="at">unit =</span> <span class="fu">c</span>(<span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"units"</span>, <span class="st">"liters"</span>, <span class="st">"grams"</span>, <span class="st">"liters"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>)</span>
-<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>)</span>
-<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span>
-<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>spices <span class="ot">&lt;-</span> <span class="fu">tibble</span>(</span>
-<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Paprika"</span>, <span class="st">"Turmeric"</span>, <span class="st">"Cumin"</span>, <span class="st">"Coriander"</span>, <span class="st">"Cinnamon"</span>, <span class="st">"Chili Powder"</span>, <span class="st">"Oregano"</span>, <span class="st">"Thyme"</span>, <span class="st">"Saffron"</span>, <span class="st">"Nutmeg"</span>),</span>
-<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>  <span class="at">quantity =</span> <span class="fu">c</span>(<span class="dv">50</span>, <span class="dv">40</span>, <span class="dv">30</span>, <span class="dv">25</span>, <span class="dv">20</span>, <span class="dv">15</span>, <span class="dv">10</span>, <span class="dv">8</span>, <span class="dv">5</span>, <span class="dv">12</span>),</span>
-<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>  <span class="at">unit =</span> <span class="fu">c</span>(<span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>)</span>
-<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>)</span>
-<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
-<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>dairies <span class="ot">&lt;-</span> <span class="fu">tibble</span>(</span>
-<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Milk"</span>, <span class="st">"Butter"</span>, <span class="st">"Yogurt"</span>, <span class="st">"Cheese"</span>, <span class="st">"Cream"</span>, <span class="st">"Cottage Cheese"</span>, <span class="st">"Sour Cream"</span>, <span class="st">"Ghee"</span>, <span class="st">"Whipping Cream"</span>, <span class="st">"Ice Cream"</span>),</span>
-<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>  <span class="at">quantity =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">200</span>, <span class="dv">150</span>, <span class="dv">100</span>, <span class="fl">0.5</span>, <span class="dv">250</span>, <span class="dv">150</span>, <span class="dv">100</span>, <span class="fl">0.3</span>, <span class="dv">500</span>),</span>
-<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>  <span class="at">unit =</span> <span class="fu">c</span>(<span class="st">"liters"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"liters"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"liters"</span>, <span class="st">"grams"</span>)</span>
-<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-</div>
+<p>To illustrate the concept of tidy data in our tidy kitchen, suppose we have a table called <code>ingredient</code> that contains information about all the ingredients that we currently have in our kitchen. It might look as follows:</p>
+<table class="table">
+<thead>
+<tr class="header">
+<th>name</th>
+<th>quantity</th>
+<th>unit</th>
+<th>category</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>flour</td>
+<td>500</td>
+<td>grams</td>
+<td>baking</td>
+</tr>
+<tr class="even">
+<td>sugar</td>
+<td>200</td>
+<td>grams</td>
+<td>baking</td>
+</tr>
+<tr class="odd">
+<td>butter</td>
+<td>100</td>
+<td>grams</td>
+<td>dairy</td>
+</tr>
+<tr class="even">
+<td>eggs</td>
+<td>4</td>
+<td>units</td>
+<td>dairy</td>
+</tr>
+<tr class="odd">
+<td>milk</td>
+<td>1</td>
+<td>liters</td>
+<td>dairy</td>
+</tr>
+<tr class="even">
+<td>salt</td>
+<td>10</td>
+<td>grams</td>
+<td>seasoning</td>
+</tr>
+<tr class="odd">
+<td>olive oil</td>
+<td>0.2</td>
+<td>liters</td>
+<td>oil</td>
+</tr>
+<tr class="even">
+<td>tomatoes</td>
+<td>300</td>
+<td>grams</td>
+<td>vegetable</td>
+</tr>
+<tr class="odd">
+<td>chicken</td>
+<td>400</td>
+<td>grams</td>
+<td>meat</td>
+</tr>
+<tr class="even">
+<td>rice</td>
+<td>250</td>
+<td>grams</td>
+<td>grain</td>
+</tr>
+</tbody>
+</table>
+<p>Each row refers to a specific ingredient, and each column has a dedicated type and meaning. For instance, the <code>quantity</code> column records how much of the ingredient in the <code>name</code> column we currently have, and the <code>unit</code> column records how we measure it.</p>
+<p>Similarly, we could have a table just for <code>dairy</code> that might look as follows:</p>
+<table class="table">
+<thead>
+<tr class="header">
+<th>name</th>
+<th>quantity</th>
+<th>unit</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>milk</td>
+<td>1</td>
+<td>liters</td>
+</tr>
+<tr class="even">
+<td>butter</td>
+<td>200</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>yogurt</td>
+<td>150</td>
+<td>grams</td>
+</tr>
+<tr class="even">
+<td>cheese</td>
+<td>100</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>cream</td>
+<td>0.5</td>
+<td>liters</td>
+</tr>
+<tr class="even">
+<td>cottage cheese</td>
+<td>250</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>sour cream</td>
+<td>150</td>
+<td>grams</td>
+</tr>
+<tr class="even">
+<td>ghee</td>
+<td>100</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>whipping cream</td>
+<td>0.3</td>
+<td>liters</td>
+</tr>
+<tr class="even">
+<td>ice cream</td>
+<td>500</td>
+<td>grams</td>
+</tr>
+</tbody>
+</table>
+<p>Notice that there is no <code>category</code> column in this table. It would be redundant because all rows in the <code>dairy</code> table share the same category.</p>
 </section>
 <section id="when-colum-headers-are-values-not-variable-names" class="level2">
 <h2 class="anchored" data-anchor-id="when-colum-headers-are-values-not-variable-names">When colum headers are values, not variable names</h2>
-<div class="cell">
-<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">tibble</span>(</span>
-<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Milk"</span>, <span class="st">"Butter"</span>, <span class="st">"Yogurt"</span>, <span class="st">"Cheese"</span>, <span class="st">"Cream"</span>, <span class="st">"Cottage Cheese"</span>, <span class="st">"Sour Cream"</span>, <span class="st">"Ghee"</span>, <span class="st">"Whipping Cream"</span>, <span class="st">"Ice Cream"</span>),</span>
-<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>  <span class="at">liters =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="cn">NA</span>, <span class="cn">NA</span>, <span class="cn">NA</span>, <span class="fl">0.5</span>, <span class="cn">NA</span>, <span class="cn">NA</span>, <span class="cn">NA</span>, <span class="fl">0.3</span>, <span class="cn">NA</span>),</span>
-<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>  <span class="at">grams =</span> <span class="fu">c</span>(<span class="cn">NA</span>, <span class="dv">200</span>, <span class="dv">150</span>, <span class="dv">100</span>, <span class="cn">NA</span>, <span class="dv">250</span>, <span class="dv">150</span>, <span class="dv">100</span>, <span class="cn">NA</span>, <span class="dv">500</span>)</span>
-<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-<div class="cell-output cell-output-stdout">
-<pre><code># A tibble: 10 × 3
-   type           liters grams
-   &lt;chr&gt;           &lt;dbl&gt; &lt;dbl&gt;
- 1 Milk              1      NA
- 2 Butter           NA     200
- 3 Yogurt           NA     150
- 4 Cheese           NA     100
- 5 Cream             0.5    NA
- 6 Cottage Cheese   NA     250
- 7 Sour Cream       NA     150
- 8 Ghee             NA     100
- 9 Whipping Cream    0.3    NA
-10 Ice Cream        NA     500</code></pre>
-</div>
-</div>
+<p>Now let us move to data structures that are untidy. Consider the following variant of our <code>dairy</code> table:</p>
+<table class="table">
+<thead>
+<tr class="header">
+<th>type</th>
+<th>liters</th>
+<th>grams</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>milk</td>
+<td>1</td>
+<td></td>
+</tr>
+<tr class="even">
+<td>butter</td>
+<td></td>
+<td>200</td>
+</tr>
+<tr class="odd">
+<td>yogurt</td>
+<td></td>
+<td>150</td>
+</tr>
+<tr class="even">
+<td>cheese</td>
+<td></td>
+<td>100</td>
+</tr>
+<tr class="odd">
+<td>cream</td>
+<td>0.5</td>
+<td></td>
+</tr>
+<tr class="even">
+<td>cottage cheese</td>
+<td></td>
+<td>250</td>
+</tr>
+<tr class="odd">
+<td>sour cream</td>
+<td></td>
+<td>150</td>
+</tr>
+<tr class="even">
+<td>ghee</td>
+<td></td>
+<td>100</td>
+</tr>
+<tr class="odd">
+<td>whipping cream</td>
+<td>0.3</td>
+<td></td>
+</tr>
+<tr class="even">
+<td>ice cream</td>
+<td></td>
+<td>500</td>
+</tr>
+</tbody>
+</table>
+<p>What is the issue here? Each row still refers to a specific dairy product. However, instead of dedicated <code>quantity</code> and <code>unit</code> columns, we have separate <code>liters</code> and <code>grams</code> columns: the unit, which is a value, has become part of the column headers. Since the units differ across dairy products, the table also contains missing values in the form of empty cells. So if you want to find out how much ice cream you still have, you also need to check which column the number is in. In practice, we would create dedicated <code>quantity</code> and <code>unit</code> columns. We might even decide to use the same unit for all ingredients (e.g., measure everything in grams) and just keep a <code>quantity</code> column.</p>
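A minimal tidyr sketch of that fix, assuming the untidy table is available as a tibble called `dairy_untidy` (a hypothetical name; the values are copied from the table above). `pivot_longer()` moves the unit out of the column headers into a `unit` column, and `values_drop_na = TRUE` discards the empty cells.

```r
library(tibble)
library(tidyr)

# Hypothetical reconstruction of the untidy dairy table shown above
dairy_untidy <- tibble(
  type = c("milk", "butter", "yogurt", "cheese", "cream", "cottage cheese",
           "sour cream", "ghee", "whipping cream", "ice cream"),
  liters = c(1, NA, NA, NA, 0.5, NA, NA, NA, 0.3, NA),
  grams = c(NA, 200, 150, 100, NA, 250, 150, 100, NA, 500)
)

# The column names `liters` and `grams` become values of a new `unit` column;
# rows that would only hold an empty cell are dropped
dairy_tidy <- pivot_longer(
  dairy_untidy,
  cols = c(liters, grams),
  names_to = "unit",
  values_to = "quantity",
  values_drop_na = TRUE
)

dairy_tidy
```

Each product now appears exactly once, with the number in `quantity` and the unit stored as an explicit value in `unit`.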
 </section>
 <section id="when-multiple-variables-are-stored-in-one-column" class="level2">
 <h2 class="anchored" data-anchor-id="when-multiple-variables-are-stored-in-one-column">When multiple variables are stored in one column</h2>
-<p>The <code>quantity_and_unit</code> column combines both the quantity and the unit of measurement into one string for each ingredient. This format makes it harder to perform numerical operations on the quantities or to filter or aggregate the data based on the unit of measurement.</p>
-<div class="cell">
-<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">tibble</span>(</span>
-<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Flour"</span>, <span class="st">"Sugar"</span>, <span class="st">"Butter"</span>, <span class="st">"Eggs"</span>, <span class="st">"Milk"</span>, <span class="st">"Salt"</span>, <span class="st">"Olive Oil"</span>, <span class="st">"Tomatoes"</span>, <span class="st">"Chicken"</span>, <span class="st">"Rice"</span>),</span>
-<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>  <span class="at">quantity_and_unit =</span> <span class="fu">c</span>(<span class="st">"500 grams"</span>, <span class="st">"200 grams"</span>, <span class="st">"100 grams"</span>, <span class="st">"4 units"</span>, <span class="st">"1 liter"</span>, <span class="st">"10 grams"</span>, <span class="st">"0.2 liters"</span>, <span class="st">"300 grams"</span>, <span class="st">"400 grams"</span>, <span class="st">"250 grams"</span>)</span>
-<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-<div class="cell-output cell-output-stdout">
-<pre><code># A tibble: 10 × 2
-   type      quantity_and_unit
-   &lt;chr&gt;     &lt;chr&gt;            
- 1 Flour     500 grams        
- 2 Sugar     200 grams        
- 3 Butter    100 grams        
- 4 Eggs      4 units          
- 5 Milk      1 liter          
- 6 Salt      10 grams         
- 7 Olive Oil 0.2 liters       
- 8 Tomatoes  300 grams        
- 9 Chicken   400 grams        
-10 Rice      250 grams        </code></pre>
-</div>
-</div>
+<p>Let us consider the following untidy version of our <code>ingredient</code> table.</p>
+<table class="table">
+<thead>
+<tr class="header">
+<th>type</th>
+<th>quantity_and_unit</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>flour</td>
+<td>500 grams</td>
+</tr>
+<tr class="even">
+<td>sugar</td>
+<td>200 grams</td>
+</tr>
+<tr class="odd">
+<td>butter</td>
+<td>100 grams</td>
+</tr>
+<tr class="even">
+<td>eggs</td>
+<td>4 units</td>
+</tr>
+<tr class="odd">
+<td>milk</td>
+<td>1 liter</td>
+</tr>
+<tr class="even">
+<td>salt</td>
+<td>10 grams</td>
+</tr>
+<tr class="odd">
+<td>olive oil</td>
+<td>0.2 liters</td>
+</tr>
+<tr class="even">
+<td>tomatoes</td>
+<td>300 grams</td>
+</tr>
+<tr class="odd">
+<td>chicken</td>
+<td>400 grams</td>
+</tr>
+<tr class="even">
+<td>rice</td>
+<td>250 grams</td>
+</tr>
+</tbody>
+</table>
+<p>This one is really annoying: the <code>quantity_and_unit</code> column combines both the quantity and the unit of measurement into one string for each ingredient. Why is this an issue? Because the quantities are stored as text, you cannot directly sum or compare them, and you cannot filter or aggregate by unit without first parsing the strings. So in practice, we would start our data analysis by splitting the <code>quantity_and_unit</code> column into separate <code>quantity</code> and <code>unit</code> columns.</p>
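One way to do that split, sketched with tidyr's `separate_wider_delim()` (available from tidyr 1.3.0) and R's native pipe (R 4.1 or later). The tibble name `ingredient_untidy` is a hypothetical stand-in for the table above, and the sketch assumes quantity and unit are always separated by a single space.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical reconstruction of the untidy ingredient table shown above
ingredient_untidy <- tibble(
  type = c("flour", "sugar", "butter", "eggs", "milk", "salt",
           "olive oil", "tomatoes", "chicken", "rice"),
  quantity_and_unit = c("500 grams", "200 grams", "100 grams", "4 units",
                        "1 liter", "10 grams", "0.2 liters", "300 grams",
                        "400 grams", "250 grams")
)

# Split on the space between number and unit, then convert the
# quantity strings into numbers
ingredient_tidy <- ingredient_untidy |>
  separate_wider_delim(
    quantity_and_unit,
    delim = " ",
    names = c("quantity", "unit")
  ) |>
  mutate(quantity = as.numeric(quantity))

# Aggregating by unit now works directly
ingredient_tidy |>
  group_by(unit) |>
  summarise(total = sum(quantity))
```

Note that milk ends up with the unit `liter` while olive oil has `liters`, so the summary lists them as separate groups; harmonizing such spellings is usually the next cleaning step.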
 </section>
 <section id="when-variables-are-stored-in-both-rows-and-columns" class="level2">
 <h2 class="anchored" data-anchor-id="when-variables-are-stored-in-both-rows-and-columns">When variables are stored in both rows and columns</h2>
+<p>Let us extend our kitchen analogy by additionally considering recipes. For simplicity, a recipe just denotes how much of each ingredient is required. The following table contains two variants of a recipe for pancakes:</p>
+<table class="table">
+<thead>
+<tr class="header">
+<th>ingredient</th>
+<th>recipe1_quantity</th>
+<th>recipe2_quantity</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>flour</td>
+<td>500 grams</td>
+<td>300 grams</td>
+</tr>
+<tr class="even">
+<td>sugar</td>
+<td>200 grams</td>
+<td>150 grams</td>
+</tr>
+<tr class="odd">
+<td>butter</td>
+<td>100 grams</td>
+<td>50 grams</td>
+</tr>
+<tr class="even">
+<td>eggs</td>
+<td>4 units</td>
+<td>3 units</td>
+</tr>
+<tr class="odd">
+<td>milk</td>
+<td>1 liters</td>
+<td>0.5 liters</td>
+</tr>
+</tbody>
+</table>
 <p>The quantity for each ingredient for two different recipes is stored in separate columns. This structure makes it harder to perform operations like filtering or summarizing the data by recipe or ingredient.</p>
-<div class="cell">
-<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">tibble</span>(</span>
-<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>  <span class="at">ingredient =</span> <span class="fu">c</span>(<span class="st">"Flour"</span>, <span class="st">"Sugar"</span>, <span class="st">"Butter"</span>, <span class="st">"Eggs"</span>, <span class="st">"Milk"</span>),</span>
-<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>  <span class="at">recipe1_quantity =</span> <span class="fu">c</span>(<span class="st">"500 grams"</span>, <span class="st">"200 grams"</span>, <span class="st">"100 grams"</span>, <span class="st">"4 units"</span>, <span class="st">"1 liter"</span>),</span>
-<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>  <span class="at">recipe2_quantity =</span> <span class="fu">c</span>(<span class="st">"300 grams"</span>, <span class="st">"150 grams"</span>, <span class="st">"50 grams"</span>, <span class="st">"3"</span>, <span class="st">"0.5 liters"</span>)</span>
-<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-<div class="cell-output cell-output-stdout">
-<pre><code># A tibble: 5 × 3
-  ingredient recipe1_quantity recipe2_quantity
-  &lt;chr&gt;      &lt;chr&gt;            &lt;chr&gt;           
-1 Flour      500 grams        300 grams       
-2 Sugar      200 grams        150 grams       
-3 Butter     100 grams        50 grams        
-4 Eggs       4 units          3               
-5 Milk       1 liter          0.5 liters      </code></pre>
-</div>
-</div>
-<p>To convert this data to a tidy format, you would typically want to gather the quantities into a single column, and include additional columns to specify the recipe and unit of measurement for each quantity.</p>
+<p>To convert this data to a tidy format, you would typically gather the quantities into a single column and add columns that specify the recipe and the unit of measurement for each quantity. We can then filter or summarize the data by recipe or ingredient.</p>
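A minimal sketch of this reshaping, assuming the table above is available as `recipes_untidy` (hypothetical name): `pivot_longer()` turns the two recipe columns into a `recipe` column, using `names_pattern` to keep only the `recipe1`/`recipe2` part of each header, and `separate_wider_delim()` (tidyr 1.3.0 or later) then splits quantity and unit as in the previous section.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical reconstruction of the recipe table shown above
recipes_untidy <- tibble(
  ingredient = c("flour", "sugar", "butter", "eggs", "milk"),
  recipe1_quantity = c("500 grams", "200 grams", "100 grams", "4 units", "1 liters"),
  recipe2_quantity = c("300 grams", "150 grams", "50 grams", "3 units", "0.5 liters")
)

recipes_tidy <- recipes_untidy |>
  # One row per ingredient and recipe; the recipe name comes from the header
  pivot_longer(
    cols = c(recipe1_quantity, recipe2_quantity),
    names_to = "recipe",
    names_pattern = "(recipe\\d+)_quantity",
    values_to = "quantity_and_unit"
  ) |>
  # Separate the number from the unit and make the number numeric
  separate_wider_delim(quantity_and_unit, delim = " ", names = c("quantity", "unit")) |>
  mutate(quantity = as.numeric(quantity))

# Filtering by recipe is now a one-liner
recipes_tidy |> filter(recipe == "recipe2")
```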
 </section>
 <section id="when-there-are-multiple-types-of-data-in-the-same-column" class="level2">
 <h2 class="anchored" data-anchor-id="when-there-are-multiple-types-of-data-in-the-same-column">When there are multiple types of data in the same column</h2>
+<p>A recipe typically contains information on the required utensils and how much time a step requires. Consider the following table with different types of data:</p>
+<table class="table">
+<thead>
+<tr class="header">
+<th>type</th>
+<th>quantity</th>
+<th>category</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>flour</td>
+<td>500 grams</td>
+<td>ingredient</td>
+</tr>
+<tr class="even">
+<td>butter</td>
+<td>100 grams</td>
+<td>ingredient</td>
+</tr>
+<tr class="odd">
+<td>whisk</td>
+<td>1 unit</td>
+<td>utensil</td>
+</tr>
+<tr class="even">
+<td>sugar</td>
+<td>200 grams</td>
+<td>ingredient</td>
+</tr>
+<tr class="odd">
+<td>baking time</td>
+<td>30 minutes</td>
+<td>time</td>
+</tr>
+</tbody>
+</table>
 <p>The table is trying to describe a recipe but combines different types of data within the same columns. There are ingredients with their quantities, a utensil, and cooking time, all mixed together.</p>
-<div class="cell">
-<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">tibble</span>(</span>
-<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Flour"</span>, <span class="st">"Butter"</span>, <span class="st">"Whisk"</span>, <span class="st">"Sugar"</span>, <span class="st">"Baking Time"</span>),</span>
-<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>  <span class="at">quantity =</span> <span class="fu">c</span>(<span class="st">"500 grams"</span>, <span class="st">"100 grams"</span>, <span class="st">"1"</span>, <span class="st">"200 grams"</span>, <span class="st">"30 minutes"</span>),</span>
-<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>  <span class="at">category =</span> <span class="fu">c</span>(<span class="st">"Ingredient"</span>, <span class="st">"Ingredient"</span>, <span class="st">"Utensil"</span>, <span class="st">"Ingredient"</span>, <span class="st">"Time"</span>)</span>
-<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-<div class="cell-output cell-output-stdout">
-<pre><code># A tibble: 5 × 3
-  type        quantity   category  
-  &lt;chr&gt;       &lt;chr&gt;      &lt;chr&gt;     
-1 Flour       500 grams  Ingredient
-2 Butter      100 grams  Ingredient
-3 Whisk       1          Utensil   
-4 Sugar       200 grams  Ingredient
-5 Baking Time 30 minutes Time      </code></pre>
-</div>
-</div>
 <p>A tidy approach would typically separate these different types of data into separate tables or at least into distinct sets of columns, making it clear what each part of the data represents and facilitating further analysis and visualization.</p>
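A minimal sketch of the "separate tables" option, assuming the mixed table is available as `recipe_items` (hypothetical name). Base R's `split()` produces one tibble per category, and the `category` column is dropped afterwards because it is constant within each table, just like in the <code>dairy</code> example earlier.

```r
library(tibble)
library(dplyr)

# Hypothetical reconstruction of the mixed table shown above
recipe_items <- tibble(
  type = c("flour", "butter", "whisk", "sugar", "baking time"),
  quantity = c("500 grams", "100 grams", "1 unit", "200 grams", "30 minutes"),
  category = c("ingredient", "ingredient", "utensil", "ingredient", "time")
)

# One table per type of observational unit, named after the category;
# the category column is redundant inside each table, so it is removed
recipe_tables <- recipe_items |>
  split(recipe_items$category) |>
  lapply(function(tbl) select(tbl, -category))

recipe_tables$ingredient
```

The `quantity` strings in each resulting table still mix number and unit, so in practice you would also split them as described for the <code>quantity_and_unit</code> column.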
 </section>
 <section id="when-some-data-is-missing" class="level2">
 <h2 class="anchored" data-anchor-id="when-some-data-is-missing">When some data is missing</h2>
+<p>As a last example of untidy data, let us consider the original <code>ingredient</code> table again, but with a few empty cells.</p>
 <p>Key points:</p>
 <ul>
 <li>Huge difference between NA and 0 (or any other value)</li>
 <li>Are you sure that you don’t have the ingredient or do you just don’t know?</li>
 <li>Missing are dropped in filters</li>
 </ul>
-<div class="cell">
-<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">tibble</span>(</span>
-<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>  <span class="at">type =</span> <span class="fu">c</span>(<span class="st">"Flour"</span>, <span class="st">"Sugar"</span>, <span class="st">"Butter"</span>, <span class="st">"Eggs"</span>, <span class="st">"Milk"</span>, <span class="st">"Salt"</span>, <span class="st">"Olive Oil"</span>, <span class="st">"Tomatoes"</span>, <span class="st">"Chicken"</span>, <span class="cn">NA</span>),</span>
-<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>  <span class="at">quantity =</span> <span class="fu">c</span>(<span class="cn">NA</span>, <span class="dv">200</span>, <span class="dv">100</span>, <span class="dv">4</span>, <span class="dv">1</span>, <span class="dv">10</span>, <span class="fl">0.2</span>, <span class="dv">300</span>, <span class="dv">400</span>, <span class="dv">250</span>),</span>
-<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>  <span class="at">unit =</span> <span class="fu">c</span>(<span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"units"</span>, <span class="cn">NA</span>, <span class="st">"grams"</span>, <span class="st">"liters"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>, <span class="st">"grams"</span>)</span>
-<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-<div class="cell-output cell-output-stdout">
-<pre><code># A tibble: 10 × 3
-   type      quantity unit  
-   &lt;chr&gt;        &lt;dbl&gt; &lt;chr&gt; 
- 1 Flour         NA   grams 
- 2 Sugar        200   grams 
- 3 Butter       100   grams 
- 4 Eggs           4   units 
- 5 Milk           1   &lt;NA&gt;  
- 6 Salt          10   grams 
- 7 Olive Oil      0.2 liters
- 8 Tomatoes     300   grams 
- 9 Chicken      400   grams 
-10 &lt;NA&gt;         250   grams </code></pre>
-</div>
-</div>
+<table class="table">
+<thead>
+<tr class="header">
+<th>name</th>
+<th>quantity</th>
+<th>unit</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>flour</td>
+<td></td>
+<td>grams</td>
+</tr>
+<tr class="even">
+<td>sugar</td>
+<td>200</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>butter</td>
+<td>100</td>
+<td>grams</td>
+</tr>
+<tr class="even">
+<td>eggs</td>
+<td>4</td>
+<td>units</td>
+</tr>
+<tr class="odd">
+<td>milk</td>
+<td>10</td>
+<td></td>
+</tr>
+<tr class="even">
+<td>salt</td>
+<td>10</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>olive oil</td>
+<td>0.2</td>
+<td>liters</td>
+</tr>
+<tr class="even">
+<td>tomatoes</td>
+<td>300</td>
+<td>grams</td>
+</tr>
+<tr class="odd">
+<td>chicken</td>
+<td>400</td>
+<td>grams</td>
+</tr>
+<tr class="even">
+<td></td>
+<td>250</td>
+<td>grams</td>
+</tr>
+</tbody>
+</table>
+<p>What is the issue here? There are actually a couple of them:</p>
+<ul>
+<li>The <code>flour</code> row does not have any information about <code>quantity</code>, so we just don’t know how much we have.</li>
+<li>The <code>milk</code> row does not contain a <code>unit</code>, so we might have 10 liters, 10 milliliters, or 10 cups of milk.</li>
+<li>The last row does not have any <code>name</code>, so we have 250 grams of something that we just can’t identify.</li>
+</ul>
+<p>Why is this important? How we treat the missing information makes a huge difference. For instance, we might make an educated guess for milk: if we always record milk in liters, the missing unit is very likely liters. For flour, we could play it safe and assume that the available quantity is zero. For the ingredient without a name, we might have to throw it away or ask somebody else what it is.</p>
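A minimal dplyr sketch of why these decisions matter, assuming the table above is available as `ingredient_missing` (hypothetical name). The first step shows that `filter()` silently drops rows where the condition evaluates to `NA`; the second makes the treatments suggested above explicit. These rules are assumptions about this particular kitchen, not general defaults.

```r
library(tibble)
library(dplyr)

# Hypothetical reconstruction of the ingredient table with empty cells
ingredient_missing <- tibble(
  name = c("flour", "sugar", "butter", "eggs", "milk", "salt",
           "olive oil", "tomatoes", "chicken", NA),
  quantity = c(NA, 200, 100, 4, 10, 10, 0.2, 300, 400, 250),
  unit = c("grams", "grams", "grams", "units", NA, "grams",
           "liters", "grams", "grams", "grams")
)

# filter() keeps only rows where the condition is TRUE, so rows where it is
# NA (here: flour, whose quantity is unknown) silently disappear
low_stock <- ingredient_missing |> filter(quantity < 50)

# Make the decisions from the text explicit instead of leaving them implicit
ingredient_clean <- ingredient_missing |>
  filter(!is.na(name)) |>  # unidentifiable item: drop it
  mutate(
    # educated guess: milk is always recorded in liters
    unit = if_else(name == "milk" & is.na(unit), "liters", unit),
    # play it safe: unknown flour quantity is treated as zero
    quantity = if_else(name == "flour" & is.na(quantity), 0, quantity)
  )
```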
+<p>Overall, these examples highlight the most important issues that you might have to consider when preparing data for your analysis.</p>
 
 
 
diff --git a/docs/search.json b/docs/search.json
index 09aa8d7..4bf2bf3 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -11,48 +11,48 @@
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "",
-    "text": "Imagine trying to cook a meal in a disorganized kitchen where ingredients are mixed up and nothing is labeled. It would be chaotic and time-consuming to look for the right ingredients and there might be some trial error involved, possibly ruining your planned meal.\nTidy data are like a well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside. In the same way, tidy data organizes information into a clear and consistent format, where each type of observational unit forms a table, each variable is in a column, and each observation is in a row (Wickham 2014).\nTidying data is about structuring datasets to facilitate analysis or report generation. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time on searching for ingredients."
+    "text": "Imagine trying to cook a meal in a disorganized kitchen where ingredients are mixed up and nothing is labeled. It would be chaotic and time-consuming to look for the right ingredients and there might be some trial and error involved, possibly ruining your planned meal.\nTidy data are like well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together, e.g., spices or dairies. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside, e.g., pepper or milk. In the same way, tidy data organizes information into a clear and consistent format, where each type of observational unit forms a table, each variable is in a column, and each observation is in a row (Wickham 2014).\nTidying data is about structuring datasets to facilitate analysis, visualization, report generation, or modelling. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time searching for ingredients."
   },
   {
     "objectID": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#example-for-tidy-data",
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#example-for-tidy-data",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "Example for tidy data",
-    "text": "Example for tidy data\n\nlibrary(tidyverse)\n\ningredients &lt;- tibble(\n  type = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\", \"Salt\", \"Olive Oil\", \"Tomatoes\", \"Chicken\", \"Rice\"),\n  quantity = c(500, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),\n  unit = c(\"grams\", \"grams\", \"grams\", \"units\", \"liters\", \"grams\", \"liters\", \"grams\", \"grams\", \"grams\")\n)\n\nspices &lt;- tibble(\n  type = c(\"Paprika\", \"Turmeric\", \"Cumin\", \"Coriander\", \"Cinnamon\", \"Chili Powder\", \"Oregano\", \"Thyme\", \"Saffron\", \"Nutmeg\"),\n  quantity = c(50, 40, 30, 25, 20, 15, 10, 8, 5, 12),\n  unit = c(\"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\", \"grams\")\n)\n\ndairies &lt;- tibble(\n  type = c(\"Milk\", \"Butter\", \"Yogurt\", \"Cheese\", \"Cream\", \"Cottage Cheese\", \"Sour Cream\", \"Ghee\", \"Whipping Cream\", \"Ice Cream\"),\n  quantity = c(1, 200, 150, 100, 0.5, 250, 150, 100, 0.3, 500),\n  unit = c(\"liters\", \"grams\", \"grams\", \"grams\", \"liters\", \"grams\", \"grams\", \"grams\", \"liters\", \"grams\")\n)"
+    "text": "Example for tidy data\nTo illustrate the concept of tidy data in our tidy kitchen, suppose we have a table called ingredient that contains information about all the ingredients that we currently have in our kitchen. It might look as follows:\n\n\n\nname\nquantity\nunit\ncategory\n\n\n\n\nflour\n500\ngrams\nbaking\n\n\nsugar\n200\ngrams\nbaking\n\n\nbutter\n100\ngrams\ndairy\n\n\neggs\n4\nunits\ndairy\n\n\nmilk\n1\nliters\ndairy\n\n\nsalt\n10\ngrams\nseasoning\n\n\nolive oil\n0.2\nliters\noil\n\n\ntomatoes\n300\ngrams\nvegetable\n\n\nchicken\n400\ngrams\nmeat\n\n\nrice\n250\ngrams\ngrain\n\n\n\nEach row refers to a specific ingredient and each column has a dedicated type and meaning. For instance, the column quantity contains information about how much of the ingredient called name we currently have and which unit we use to measure it.\nSimilarly, we could have a table just for dairy that might look as follows:\n\n\n\nname\nquantity\nunit\n\n\n\n\nmilk\n1\nliters\n\n\nbutter\n200\ngrams\n\n\nyogurt\n150\ngrams\n\n\ncheese\n100\ngrams\n\n\ncream\n0.5\nliters\n\n\ncottage cheese\n250\ngrams\n\n\nsour cream\n150\ngrams\n\n\nghee\n100\ngrams\n\n\nwhipping cream\n0.3\nliters\n\n\nice cream\n500\ngrams\n\n\n\nNotice that there is no category column in this table. It would be redundant, because all rows in the dairy table belong to the same category."
   },
   {
     "objectID": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-colum-headers-are-values-not-variable-names",
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-colum-headers-are-values-not-variable-names",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "When colum headers are values, not variable names",
-    "text": "When colum headers are values, not variable names\n\ntibble(\n  type = c(\"Milk\", \"Butter\", \"Yogurt\", \"Cheese\", \"Cream\", \"Cottage Cheese\", \"Sour Cream\", \"Ghee\", \"Whipping Cream\", \"Ice Cream\"),\n  liters = c(1, NA, NA, NA, 0.5, NA, NA, NA, 0.3, NA),\n  grams = c(NA, 200, 150, 100, NA, 250, 150, 100, NA, 500)\n)\n\n# A tibble: 10 × 3\n   type           liters grams\n   &lt;chr&gt;           &lt;dbl&gt; &lt;dbl&gt;\n 1 Milk              1      NA\n 2 Butter           NA     200\n 3 Yogurt           NA     150\n 4 Cheese           NA     100\n 5 Cream             0.5    NA\n 6 Cottage Cheese   NA     250\n 7 Sour Cream       NA     150\n 8 Ghee             NA     100\n 9 Whipping Cream    0.3    NA\n10 Ice Cream        NA     500"
+    "text": "When colum headers are values, not variable names\nNow let us move to data structures that are untidy. Consider the following variant of our dairy table:\n\n\n\ntype\nliters\ngrams\n\n\n\n\nmilk\n1\n\n\n\nbutter\n\n200\n\n\nyogurt\n\n150\n\n\ncheese\n\n100\n\n\ncream\n0.5\n\n\n\ncottage cheese\n\n250\n\n\nsour cream\n\n150\n\n\nghee\n\n100\n\n\nwhipping cream\n0.3\n\n\n\nice cream\n\n500\n\n\n\nWhat is the issue here? Each row still refers to a specific dairy product. However, instead of dedicated quantity and unit columns, we have separate liters and grams columns: the unit, which is a value, has become part of the column headers. Since the units differ across dairy products, the table also contains missing values in the form of empty cells. So if you want to find out how much ice cream you still have, you also need to check which column the number is in. In practice, we would create dedicated quantity and unit columns. We might even decide to use the same unit for all ingredients (e.g., measure everything in grams) and just keep a quantity column."
   },
   {
     "objectID": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-multiple-variables-are-stored-in-one-column",
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-multiple-variables-are-stored-in-one-column",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "When multiple variables are stored in one column",
-    "text": "When multiple variables are stored in one column\nThe quantity_and_unit column combines both the quantity and the unit of measurement into one string for each ingredient. This format makes it harder to perform numerical operations on the quantities or to filter or aggregate the data based on the unit of measurement.\n\ntibble(\n  type = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\", \"Salt\", \"Olive Oil\", \"Tomatoes\", \"Chicken\", \"Rice\"),\n  quantity_and_unit = c(\"500 grams\", \"200 grams\", \"100 grams\", \"4 units\", \"1 liter\", \"10 grams\", \"0.2 liters\", \"300 grams\", \"400 grams\", \"250 grams\")\n)\n\n# A tibble: 10 × 2\n   type      quantity_and_unit\n   &lt;chr&gt;     &lt;chr&gt;            \n 1 Flour     500 grams        \n 2 Sugar     200 grams        \n 3 Butter    100 grams        \n 4 Eggs      4 units          \n 5 Milk      1 liter          \n 6 Salt      10 grams         \n 7 Olive Oil 0.2 liters       \n 8 Tomatoes  300 grams        \n 9 Chicken   400 grams        \n10 Rice      250 grams"
+    "text": "When multiple variables are stored in one column\nLet us consider the following untidy version of our ingredient table.\n\n\n\ntype\nquantity_and_unit\n\n\n\n\nflour\n500 grams\n\n\nsugar\n200 grams\n\n\nbutter\n100 grams\n\n\neggs\n4 units\n\n\nmilk\n1 liter\n\n\nsalt\n10 grams\n\n\nolive oil\n0.2 liters\n\n\ntomatoes\n300 grams\n\n\nchicken\n400 grams\n\n\nrice\n250 grams\n\n\n\nThis one is really annoying: the quantity_and_unit column combines both the quantity and the unit of measurement into one string for each ingredient. Why is this an issue? Because the quantities are stored as text, you cannot directly sum or compare them, and you cannot filter or aggregate by unit without first parsing the strings. So in practice, we would start our data analysis by splitting the quantity_and_unit column into separate quantity and unit columns."
   },
   {
     "objectID": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-variables-are-stored-in-both-rows-and-columns",
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-variables-are-stored-in-both-rows-and-columns",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "When variables are stored in both rows and columns",
-    "text": "When variables are stored in both rows and columns\nThe quantity for each ingredient for two different recipes is stored in separate columns. This structure makes it harder to perform operations like filtering or summarizing the data by recipe or ingredient.\n\ntibble(\n  ingredient = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\"),\n  recipe1_quantity = c(\"500 grams\", \"200 grams\", \"100 grams\", \"4 units\", \"1 liter\"),\n  recipe2_quantity = c(\"300 grams\", \"150 grams\", \"50 grams\", \"3\", \"0.5 liters\")\n)\n\n# A tibble: 5 × 3\n  ingredient recipe1_quantity recipe2_quantity\n  &lt;chr&gt;      &lt;chr&gt;            &lt;chr&gt;           \n1 Flour      500 grams        300 grams       \n2 Sugar      200 grams        150 grams       \n3 Butter     100 grams        50 grams        \n4 Eggs       4 units          3               \n5 Milk       1 liter          0.5 liters      \n\n\nTo convert this data to a tidy format, you would typically want to gather the quantities into a single column, and include additional columns to specify the recipe and unit of measurement for each quantity."
+    "text": "When variables are stored in both rows and columns\nLet us extend our kitchen analogy by additionally considering recipes. For simplicity, a recipe just denotes how much of each ingredient is required. The following table contains two variants of a recipe for pancakes:\n\n\n\ningredient\nrecipe1_quantity\nrecipe2_quantity\n\n\n\n\nflour\n500 grams\n300 grams\n\n\nsugar\n200 grams\n150 grams\n\n\nbutter\n100 grams\n50 grams\n\n\neggs\n4 units\n3 units\n\n\nmilk\n1 liters\n0.5 liters\n\n\n\nThe quantity for each ingredient for two different recipes is stored in separate columns. This structure makes it harder to perform operations like filtering or summarizing the data by recipe or ingredient.\nTo convert this data to a tidy format, you would typically gather the quantities into a single column and add columns that specify the recipe and the unit of measurement for each quantity. We can then filter or summarize the data by recipe or ingredient."
   },
   {
     "objectID": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-there-are-multiple-types-of-data-in-the-same-column",
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-there-are-multiple-types-of-data-in-the-same-column",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "When there are multiple types of data in the same column",
-    "text": "When there are multiple types of data in the same column\nThe table is trying to describe a recipe but combines different types of data within the same columns. There are ingredients with their quantities, a utensil, and cooking time, all mixed together.\n\ntibble(\n  type = c(\"Flour\", \"Butter\", \"Whisk\", \"Sugar\", \"Baking Time\"),\n  quantity = c(\"500 grams\", \"100 grams\", \"1\", \"200 grams\", \"30 minutes\"),\n  category = c(\"Ingredient\", \"Ingredient\", \"Utensil\", \"Ingredient\", \"Time\")\n)\n\n# A tibble: 5 × 3\n  type        quantity   category  \n  &lt;chr&gt;       &lt;chr&gt;      &lt;chr&gt;     \n1 Flour       500 grams  Ingredient\n2 Butter      100 grams  Ingredient\n3 Whisk       1          Utensil   \n4 Sugar       200 grams  Ingredient\n5 Baking Time 30 minutes Time      \n\n\nA tidy approach would typically separate these different types of data into separate tables or at least into distinct sets of columns, making it clear what each part of the data represents and facilitating further analysis and visualization."
+    "text": "When there are multiple types of data in the same column\nA recipe typically contains information on the required utensils and how much time a step requires. Consider the following table with different types of data:\n\n\n\ntype\nquantity\ncategory\n\n\n\n\nflour\n500 grams\ningredient\n\n\nbutter\n100 grams\ningredient\n\n\nwhisk\n1 unit\nutensil\n\n\nsugar\n200 grams\ningredient\n\n\nbaking time\n30 minutes\ntime\n\n\n\nThe table is trying to describe a recipe but combines different types of data within the same columns. There are ingredients with their quantities, a utensil, and cooking time, all mixed together.\nA tidy approach would typically separate these different types of data into separate tables or at least into distinct sets of columns, making it clear what each part of the data represents and facilitating further analysis and visualization."
   },
   {
     "objectID": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-some-data-is-missing",
     "href": "posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html#when-some-data-is-missing",
     "title": "Tidy Data: A Recipe for Efficient Data Analysis",
     "section": "When some data is missing",
-    "text": "When some data is missing\nKey points:\n\nHuge difference between NA and 0 (or any other value)\nAre you sure that you don’t have the ingredient or do you just don’t know?\nMissing are dropped in filters\n\n\ntibble(\n  type = c(\"Flour\", \"Sugar\", \"Butter\", \"Eggs\", \"Milk\", \"Salt\", \"Olive Oil\", \"Tomatoes\", \"Chicken\", NA),\n  quantity = c(NA, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),\n  unit = c(\"grams\", \"grams\", \"grams\", \"units\", NA, \"grams\", \"liters\", \"grams\", \"grams\", \"grams\")\n)\n\n# A tibble: 10 × 3\n   type      quantity unit  \n   &lt;chr&gt;        &lt;dbl&gt; &lt;chr&gt; \n 1 Flour         NA   grams \n 2 Sugar        200   grams \n 3 Butter       100   grams \n 4 Eggs           4   units \n 5 Milk           1   &lt;NA&gt;  \n 6 Salt          10   grams \n 7 Olive Oil      0.2 liters\n 8 Tomatoes     300   grams \n 9 Chicken      400   grams \n10 &lt;NA&gt;         250   grams"
+    "text": "When some data is missing\nAs a last example of untidy data, let us consider the original ingredient table again, but with a few empty cells.\nKey points:\n\nThere is a huge difference between NA and 0 (or any other value).\nAre you sure that you don’t have the ingredient, or do you just not know?\nMissing values are silently dropped by filters.\n\n\n\n\nname\nquantity\nunit\n\n\n\n\nflour\n\ngrams\n\n\nsugar\n200\ngrams\n\n\nbutter\n100\ngrams\n\n\neggs\n4\nunits\n\n\nmilk\n10\n\n\n\nsalt\n10\ngrams\n\n\nolive oil\n0.2\nliters\n\n\ntomatoes\n300\ngrams\n\n\nchicken\n400\ngrams\n\n\n\n250\ngrams\n\n\n\nWhat is the issue here? There are actually a couple of them:\n\nThe flour row does not have any information about quantity, so we just don’t know how much we have.\nThe milk row does not contain a unit, so we might have 10 liters, 10 milliliters, or 10 cups of milk.\nThe last row does not have any name, so we have 250 grams of something that we just can’t identify.\n\nWhy is this important? It makes a huge difference how we treat the missing information. For instance, if we always record milk in liters, we can make an educated guess that the missing unit is liters. For flour, we could play it safe and assume that the available quantity is zero. For the ingredient without a name, we might have to throw it away or ask somebody else what it is.\nOverall, these examples highlight the most important issues that you might have to consider when preparing data for your analysis."
   }
 ]
\ No newline at end of file
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 2349c86..e86386f 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -6,6 +6,6 @@
   </url>
   <url>
     <loc>https://www.tidy-intelligence.com/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.html</loc>
-    <lastmod>2023-11-24T16:34:11.647Z</lastmod>
+    <lastmod>2023-11-25T12:52:15.155Z</lastmod>
   </url>
 </urlset>
diff --git a/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.qmd b/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.qmd
index a79eb6f..081484a 100644
--- a/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.qmd
+++ b/posts/tidy-data-a-recipe-for-efficient-data-analysis/index.qmd
@@ -8,96 +8,145 @@ image: thumbnail.png
 
 Imagine trying to cook a meal in a disorganized kitchen where ingredients are mixed up and nothing is labeled. It would be chaotic and time-consuming to look for the right ingredients, and there might be some trial and error involved, possibly ruining your planned meal.
 
-Tidy data are like a well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside. In the same way, tidy data organizes information into a clear and consistent format, where each **type of observational unit forms a table**, **each variable is in a column**, and **each observation is in a row**  [@Wickham2014].
+Tidy data are like well-organized shelves in your kitchen. Each shelf provides a collection of containers that semantically belong together, e.g., spices or dairy products. Each container on the shelf holds one type of ingredient, and the labels on the containers clearly describe what is inside, e.g., pepper or milk. In the same way, tidy data organize information into a clear and consistent format, where each **type of observational unit forms a table**, **each variable is in a column**, and **each observation is in a row** [@Wickham2014].
 
-Tidying data is about structuring datasets to facilitate analysis or report generation. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time on searching for ingredients.
+Tidying data is about structuring datasets to facilitate analysis, visualization, report generation, or modelling. By following the principle that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, data analysis becomes more intuitive, akin to cooking in a well-organized kitchen where everything has its place and you spend less time searching for ingredients.
 
 ## Example for tidy data
 
-```{r}
-#| message: false
-library(tidyverse)
-
-ingredients <- tibble(
-  type = c("Flour", "Sugar", "Butter", "Eggs", "Milk", "Salt", "Olive Oil", "Tomatoes", "Chicken", "Rice"),
-  quantity = c(500, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),
-  unit = c("grams", "grams", "grams", "units", "liters", "grams", "liters", "grams", "grams", "grams")
-)
-
-spices <- tibble(
-  type = c("Paprika", "Turmeric", "Cumin", "Coriander", "Cinnamon", "Chili Powder", "Oregano", "Thyme", "Saffron", "Nutmeg"),
-  quantity = c(50, 40, 30, 25, 20, 15, 10, 8, 5, 12),
-  unit = c("grams", "grams", "grams", "grams", "grams", "grams", "grams", "grams", "grams", "grams")
-)
-
-dairies <- tibble(
-  type = c("Milk", "Butter", "Yogurt", "Cheese", "Cream", "Cottage Cheese", "Sour Cream", "Ghee", "Whipping Cream", "Ice Cream"),
-  quantity = c(1, 200, 150, 100, 0.5, 250, 150, 100, 0.3, 500),
-  unit = c("liters", "grams", "grams", "grams", "liters", "grams", "grams", "grams", "liters", "grams")
-)
-```
+To illustrate the concept of tidy data in our tidy kitchen, suppose we have a table called `ingredient` that contains information about all the ingredients that we currently have in our kitchen. It might look as follows:
+
+| name      | quantity | unit   | category  |
+|-----------|----------|--------|-----------|
+| flour     | 500      | grams  | baking    |
+| sugar     | 200      | grams  | baking    |
+| butter    | 100      | grams  | dairy     |
+| eggs      | 4        | units  | dairy     |
+| milk      | 1        | liters | dairy     |
+| salt      | 10       | grams  | seasoning |
+| olive oil | 0.2      | liters | oil       |
+| tomatoes  | 300      | grams  | vegetable |
+| chicken   | 400      | grams  | meat      |
+| rice      | 250      | grams  | grain     |
+
+Each row refers to a specific ingredient and each column has a dedicated type and meaning. For instance, the column `quantity` records how much of the ingredient in `name` we currently have, and the column `unit` records how we measure it.
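+
+For readers who want to follow along in R, here is a minimal sketch of the same `ingredient` table as a tibble, assuming the dplyr and tibble packages and R 4.1 or later for the native pipe `|>`. The two questions below are only illustrations. Because each variable has its own column, each question becomes a short pipeline:
+
+```r
+library(dplyr)
+library(tibble)
+
+# The tidy `ingredient` table: one row per ingredient, one column per variable
+ingredient <- tibble(
+  name = c("flour", "sugar", "butter", "eggs", "milk",
+           "salt", "olive oil", "tomatoes", "chicken", "rice"),
+  quantity = c(500, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),
+  unit = c("grams", "grams", "grams", "units", "liters",
+           "grams", "liters", "grams", "grams", "grams"),
+  category = c("baking", "baking", "dairy", "dairy", "dairy",
+               "seasoning", "oil", "vegetable", "meat", "grain")
+)
+
+# Which dairy ingredients do we have? (butter, eggs, and milk)
+ingredient |> filter(category == "dairy")
+
+# How much do we have of everything that is measured in grams? (1760)
+ingredient |> filter(unit == "grams") |> summarise(total_grams = sum(quantity))
+```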
+
+Similarly, we could have a table just for `dairy` that might look as follows:
+
+| name           | quantity | unit   |
+|----------------|----------|--------|
+| milk           | 1        | liters |
+| butter         | 200      | grams  |
+| yogurt         | 150      | grams  |
+| cheese         | 100      | grams  |
+| cream          | 0.5      | liters |
+| cottage cheese | 250      | grams  |
+| sour cream     | 150      | grams  |
+| ghee           | 100      | grams  |
+| whipping cream | 0.3      | liters |
+| ice cream      | 500      | grams  |
+
+Notice that there is no `category` column in this table. It would be redundant because all rows in the `dairy` table have the same category.
 
+## When column headers are values, not variable names
 
-```{r}
-tibble(
-  type = c("Milk", "Butter", "Yogurt", "Cheese", "Cream", "Cottage Cheese", "Sour Cream", "Ghee", "Whipping Cream", "Ice Cream"),
-  liters = c(1, NA, NA, NA, 0.5, NA, NA, NA, 0.3, NA),
-  grams = c(NA, 200, 150, 100, NA, 250, 150, 100, NA, 500)
-)
-```
+Now let us move to data structures that are untidy. Consider the following variant of our `dairy` table:
+
+| type           | liters | grams |
+|----------------|--------|-------|
+| milk           | 1      |       |
+| butter         |        | 200   |
+| yogurt         |        | 150   |
+| cheese         |        | 100   |
+| cream          | 0.5    |       |
+| cottage cheese |        | 250   |
+| sour cream     |        | 150   |
+| ghee           |        | 100   |
+| whipping cream | 0.3    |       |
+| ice cream      |        | 500   |
+
+What is the issue here? Each row still refers to a specific dairy product. However, instead of dedicated `quantity` and `unit` columns, we have `liters` and `grams` columns, so the unit is stored in the column headers rather than in the cells. Since the units differ across dairy products, the table even contains missing values in the form of empty cells. So if you want to find out how much ice cream you still have, you also need to check the column name to know the unit. In practice, we would create dedicated `quantity` and `unit` columns. We might even decide to use the same unit for all ingredients (e.g., measure everything in grams) and just keep a `quantity` column.
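+
+One way to create these columns in R is `tidyr::pivot_longer()`, which moves the header names into a `unit` column. The following is a sketch, assuming the dplyr, tibble, and tidyr packages. The object names `dairy_untidy` and `dairy_tidy` are illustrative:
+
+```r
+library(dplyr)
+library(tibble)
+library(tidyr)
+
+# The untidy variant of `dairy`: the units "liters" and "grams" are column headers
+dairy_untidy <- tibble(
+  type = c("milk", "butter", "yogurt", "cheese", "cream", "cottage cheese",
+           "sour cream", "ghee", "whipping cream", "ice cream"),
+  liters = c(1, NA, NA, NA, 0.5, NA, NA, NA, 0.3, NA),
+  grams = c(NA, 200, 150, 100, NA, 250, 150, 100, NA, 500)
+)
+
+# Move the header names into `unit` and the numbers into `quantity`;
+# the empty cells carry no information, so they are dropped
+dairy_tidy <- dairy_untidy |>
+  pivot_longer(
+    cols = c(liters, grams),
+    names_to = "unit",
+    values_to = "quantity",
+    values_drop_na = TRUE
+  ) |>
+  select(type, quantity, unit)
+
+dairy_tidy
+```
+
+The result has one row per dairy product again, and the unit is now a value you can filter on rather than something you have to read off a column name.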
 
 ## When multiple variables are stored in one column
 
-The `quantity_and_unit` column combines both the quantity and the unit of measurement into one string for each ingredient. This format makes it harder to perform numerical operations on the quantities or to filter or aggregate the data based on the unit of measurement.
+Let us consider the following untidy version of our `ingredient` table. 
 
-```{r}
-tibble(
-  type = c("Flour", "Sugar", "Butter", "Eggs", "Milk", "Salt", "Olive Oil", "Tomatoes", "Chicken", "Rice"),
-  quantity_and_unit = c("500 grams", "200 grams", "100 grams", "4 units", "1 liter", "10 grams", "0.2 liters", "300 grams", "400 grams", "250 grams")
-)
-```
+| type      | quantity_and_unit |
+|-----------|-------------------|
+| flour     | 500 grams         |
+| sugar     | 200 grams         |
+| butter    | 100 grams         |
+| eggs      | 4 units           |
+| milk      | 1 liter           |
+| salt      | 10 grams          |
+| olive oil | 0.2 liters        |
+| tomatoes  | 300 grams         |
+| chicken   | 400 grams         |
+| rice      | 250 grams         |
+
+This one is really annoying, since the `quantity_and_unit` column combines both the quantity and the unit of measurement into one string for each ingredient. Why is this an issue? This format makes it harder to perform numerical operations on the quantities or to filter or aggregate the data based on the unit of measurement. So in practice, we would start our data analysis by splitting the `quantity_and_unit` column into `quantity` and `unit`.
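+
+A minimal sketch of this split in R, assuming dplyr and tidyr 1.3.0 or later (which provides `separate_wider_delim()`). It also assumes that a single space always separates the number from the unit, which holds for this table. The names `ingredient_untidy` and `ingredient_tidy` are illustrative:
+
+```r
+library(dplyr)
+library(tibble)
+library(tidyr)
+
+ingredient_untidy <- tibble(
+  type = c("flour", "sugar", "butter", "eggs", "milk",
+           "salt", "olive oil", "tomatoes", "chicken", "rice"),
+  quantity_and_unit = c("500 grams", "200 grams", "100 grams", "4 units", "1 liter",
+                        "10 grams", "0.2 liters", "300 grams", "400 grams", "250 grams")
+)
+
+ingredient_tidy <- ingredient_untidy |>
+  # Split "500 grams" at the space into "500" and "grams"
+  separate_wider_delim(quantity_and_unit, delim = " ", names = c("quantity", "unit")) |>
+  mutate(
+    # The split yields text, so convert the quantity to a number
+    quantity = as.numeric(quantity),
+    # Harmonize the spelling of the unit ("1 liter" vs. "0.2 liters")
+    unit = if_else(unit == "liter", "liters", unit)
+  )
+
+# Numerical operations now work, e.g., the total volume in liters (1.2)
+ingredient_tidy |> filter(unit == "liters") |> summarise(total_liters = sum(quantity))
+```
+
+The harmonization step shows a second benefit of having a dedicated `unit` column: inconsistent spellings become visible and easy to fix.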
 
 ## When variables are stored in both rows and columns
 
-The quantity for each ingredient for two different recipes is stored in separate columns. This structure makes it harder to perform operations like filtering or summarizing the data by recipe or ingredient.
+Let us extend our kitchen analogy by additionally considering recipes. For simplicity, a recipe just denotes how much of each ingredient is required. The following table contains two variants of a recipe for pancakes:
 
-```{r}
-tibble(
-  ingredient = c("Flour", "Sugar", "Butter", "Eggs", "Milk"),
-  recipe1_quantity = c("500 grams", "200 grams", "100 grams", "4 units", "1 liter"),
-  recipe2_quantity = c("300 grams", "150 grams", "50 grams", "3", "0.5 liters")
-)
-```
+| ingredient | recipe1_quantity | recipe2_quantity |
+|------------|------------------|------------------|
+| flour      | 500 grams        | 300 grams        |
+| sugar      | 200 grams        | 150 grams        |
+| butter     | 100 grams        | 50 grams         |
+| eggs       | 4 units          | 3 units          |
+| milk       | 1 liter          | 0.5 liters       |
 
-To convert this data to a tidy format, you would typically want to gather the quantities into a single column, and include additional columns to specify the recipe and unit of measurement for each quantity.
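+Each recipe's quantities are stored in a separate column, so the recipe, which is a variable in its own right, is hidden in the column headers. In addition, each cell combines a quantity and a unit. This structure makes it harder to perform operations like filtering or summarizing the data by recipe or ingredient.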
+
+To convert this data to a tidy format, you would typically want to gather the quantities into a single column and include additional columns that specify the recipe and the unit of measurement for each quantity. We can then filter or summarize the data by recipe, ingredient, or unit.
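+
+In R, these steps could be sketched with `tidyr::pivot_longer()` followed by `tidyr::separate_wider_delim()`, assuming dplyr and tidyr 1.3.0 or later. The object names `pancakes`, `pancakes_tidy`, and `grams_per_recipe` are illustrative:
+
+```r
+library(dplyr)
+library(tibble)
+library(tidyr)
+
+pancakes <- tibble(
+  ingredient = c("flour", "sugar", "butter", "eggs", "milk"),
+  recipe1_quantity = c("500 grams", "200 grams", "100 grams", "4 units", "1 liter"),
+  recipe2_quantity = c("300 grams", "150 grams", "50 grams", "3 units", "0.5 liters")
+)
+
+pancakes_tidy <- pancakes |>
+  # One row per ingredient and recipe; the recipe name is taken from the header
+  pivot_longer(
+    cols = c(recipe1_quantity, recipe2_quantity),
+    names_to = "recipe",
+    names_pattern = "(recipe\\d+)_quantity",
+    values_to = "quantity_and_unit"
+  ) |>
+  # Split "500 grams" into a numeric quantity and a unit
+  separate_wider_delim(quantity_and_unit, delim = " ", names = c("quantity", "unit")) |>
+  mutate(quantity = as.numeric(quantity))
+
+# Summarizing by recipe is now a simple grouped operation:
+# recipe1 needs 800 grams of dry and solid ingredients, recipe2 needs 500
+grams_per_recipe <- pancakes_tidy |>
+  filter(unit == "grams") |>
+  group_by(recipe) |>
+  summarise(total_grams = sum(quantity))
+
+grams_per_recipe
+```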
 
 ## When there are multiple types of data in the same column
 
-The table is trying to describe a recipe but combines different types of data within the same columns. There are ingredients with their quantities, a utensil, and cooking time, all mixed together.
+A recipe typically contains information on the required utensils and how much time a step requires. Consider the following table with different types of data:
 
-```{r}
-tibble(
-  type = c("Flour", "Butter", "Whisk", "Sugar", "Baking Time"),
-  quantity = c("500 grams", "100 grams", "1", "200 grams", "30 minutes"),
-  category = c("Ingredient", "Ingredient", "Utensil", "Ingredient", "Time")
-)
-```
+| type         | quantity    | category   |
+|--------------|-------------|------------|
+| flour        | 500 grams   | ingredient |
+| butter       | 100 grams   | ingredient |
+| whisk        | 1 unit      | utensil    |
+| sugar        | 200 grams   | ingredient |
+| baking time  | 30 minutes  | time       |
+
+The table is trying to describe a recipe but combines different types of data within the same columns. There are ingredients with their quantities, a utensil, and cooking time, all mixed together.
 
 A tidy approach would typically separate these different types of data into separate tables or at least into distinct sets of columns, making it clear what each part of the data represents and facilitating further analysis and visualization.
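+
+As a sketch of the "separate tables" option in R, assuming dplyr and tidyr 1.3.0 or later: the helper `extract_category()` and all object names below are hypothetical and only serve to illustrate the idea. As with the `dairy` table earlier, the `category` column becomes redundant once each category has its own table:
+
+```r
+library(dplyr)
+library(tibble)
+library(tidyr)
+
+recipe_items <- tibble(
+  type = c("flour", "butter", "whisk", "sugar", "baking time"),
+  quantity = c("500 grams", "100 grams", "1 unit", "200 grams", "30 minutes"),
+  category = c("ingredient", "ingredient", "utensil", "ingredient", "time")
+)
+
+# Hypothetical helper: keep the rows of one category, split the quantity into
+# a number and a unit, and drop `category`, which is constant within each table
+extract_category <- function(items, category_name) {
+  items |>
+    filter(category == category_name) |>
+    separate_wider_delim(quantity, delim = " ", names = c("quantity", "unit")) |>
+    mutate(quantity = as.numeric(quantity)) |>
+    select(-category)
+}
+
+recipe_ingredients <- extract_category(recipe_items, "ingredient")
+recipe_utensils <- extract_category(recipe_items, "utensil")
+recipe_times <- extract_category(recipe_items, "time")
+```
+
+Each resulting table now describes one type of observational unit, so, for example, summing `recipe_ingredients$quantity` can no longer accidentally mix grams with minutes.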
 
 ## When some data is missing
 
+As a last example of untidy data, let us consider the original `ingredient` table again, but with a few empty cells.
+
 Key points:
 
 - There is a huge difference between `NA` and 0 (or any other value).
 - Are you sure that you don't have the ingredient, or do you just not know?
 - Missing values are silently dropped by filters.
 
-```{r}
-tibble(
-  type = c("Flour", "Sugar", "Butter", "Eggs", "Milk", "Salt", "Olive Oil", "Tomatoes", "Chicken", NA),
-  quantity = c(NA, 200, 100, 4, 1, 10, 0.2, 300, 400, 250),
-  unit = c("grams", "grams", "grams", "units", NA, "grams", "liters", "grams", "grams", "grams")
-)
-```
\ No newline at end of file
+| name      | quantity | unit   |
+|-----------|----------|--------|
+| flour     |          | grams  |
+| sugar     | 200      | grams  |
+| butter    | 100      | grams  |
+| eggs      | 4        | units  |
+| milk      | 10       |        |
+| salt      | 10       | grams  |
+| olive oil | 0.2      | liters |
+| tomatoes  | 300      | grams  |
+| chicken   | 400      | grams  |
+|           | 250      | grams  |
+
+What is the issue here? There are actually a couple of them:
+
+- The `flour` row does not have any information about `quantity`, so we just don't know how much we have.
+- The `milk` row does not contain a `unit`, so we might have 10 liters, 10 milliliters, or 10 cups of milk. 
+- The last row does not have any `name`, so we have 250 grams of something that we just can't identify.
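+
+To see why the key points above matter, here is a minimal R sketch that assumes dplyr and tibble. R represents the empty cells as `NA`, and the object names are illustrative:
+
+```r
+library(dplyr)
+library(tibble)
+
+# The `ingredient` table with empty cells, encoded as NA
+ingredient_missing <- tibble(
+  name = c("flour", "sugar", "butter", "eggs", "milk",
+           "salt", "olive oil", "tomatoes", "chicken", NA),
+  quantity = c(NA, 200, 100, 4, 10, 10, 0.2, 300, 400, 250),
+  unit = c("grams", "grams", "grams", "units", NA,
+           "grams", "liters", "grams", "grams", "grams")
+)
+
+# filter() keeps only rows where the condition is TRUE; for flour it is NA,
+# so flour is missing from both complementary results (5 + 4 = 9 rows, not 10)
+little <- ingredient_missing |> filter(quantity < 150)
+plenty <- ingredient_missing |> filter(quantity >= 150)
+
+# NA is not 0: a plain sum is unknown, while na.rm = TRUE quietly ignores flour
+grams_only <- ingredient_missing |> filter(unit == "grams")
+sum(grams_only$quantity)                # NA
+sum(grams_only$quantity, na.rm = TRUE)  # 1260
+```
+
+Note that `filter(unit == "grams")` also silently drops the milk row, because its unit is unknown.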
+
+Why is this important? It makes a huge difference how we treat the missing information. For instance, if we always record milk in liters, we can make an educated guess that the missing unit is liters. For flour, we could play it safe and assume that the available quantity is zero. For the ingredient without a name, we might have to throw it away or ask somebody else what it is.
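+
+These three decisions could be encoded in R roughly as follows. This is a sketch assuming dplyr and tibble. It relies on the assumptions stated in the paragraph above: milk is always recorded in liters, and an unknown amount of flour counts as none. Both are judgment calls rather than general rules:
+
+```r
+library(dplyr)
+library(tibble)
+
+ingredient_missing <- tibble(
+  name = c("flour", "sugar", "butter", "eggs", "milk",
+           "salt", "olive oil", "tomatoes", "chicken", NA),
+  quantity = c(NA, 200, 100, 4, 10, 10, 0.2, 300, 400, 250),
+  unit = c("grams", "grams", "grams", "units", NA,
+           "grams", "liters", "grams", "grams", "grams")
+)
+
+ingredient_cleaned <- ingredient_missing |>
+  mutate(
+    # Assumption: milk is always recorded in liters
+    unit = if_else(name == "milk" & is.na(unit), "liters", unit),
+    # Conservative choice: an unknown amount of flour counts as none
+    quantity = if_else(name == "flour" & is.na(quantity), 0, quantity)
+  ) |>
+  # An ingredient without a name cannot be identified, so it is dropped
+  filter(!is.na(name))
+
+ingredient_cleaned
+```
+
+Writing these choices down as code makes them explicit and easy to revisit, for example, if it later turns out that the milk was recorded in cups.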
+
+Overall, these examples highlight the most important issues that you might have to consider when preparing data for your analysis. 
\ No newline at end of file
