diff --git a/.DS_Store b/.DS_Store
index 9bb41f39..c395bc2b 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/docs/.DS_Store b/docs/.DS_Store
new file mode 100644
index 00000000..1cca5398
Binary files /dev/null and b/docs/.DS_Store differ
diff --git a/docs/_freeze/getting-started/01_installation/execute-results/html.json b/docs/_freeze/getting-started/01_installation/execute-results/html.json
index 25199ef4..1051e17a 100644
--- a/docs/_freeze/getting-started/01_installation/execute-results/html.json
+++ b/docs/_freeze/getting-started/01_installation/execute-results/html.json
@@ -1,9 +1,9 @@
 {
-  "hash": "f67ba1a2adfc794cdce793a18949cc40",
+  "hash": "2f2ffbe619aadc4e265ba920bfb7b9b6",
   "result": {
    "markdown": "---\ntitle: Install\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\n\n\n# Quick Install\n\nLet's get you up and running fast with the latest stable release of `pytimetk`. \n\n```bash\npip install pytimetk\n```\n\nYou can install the development version from GitHub with this code. \n\n```bash\npip install git+https://github.com/business-science/pytimetk.git\n```\n
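To confirm the install worked, a minimal check (an editor addition, not part of the original page; uses only the standard library, Python 3.8+):

```python
# Verify the install: import the package and print the installed version
from importlib.metadata import version

import pytimetk  # should import without error

print(version("pytimetk"))  # prints the installed release number
```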
\n# Next steps\n\nCheck out the [Quick Start Guide Next.](/getting-started/02_quick_start.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development, but the potential of `pytimetk` in Python is already obvious. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2 seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n",
    "supporting": [
-      "01_installation_files\\figure-html"
+      "01_installation_files/figure-html"
    ],
    "filters": [],
    "includes": {}
diff --git a/docs/_freeze/getting-started/02_quick_start/execute-results/html.json b/docs/_freeze/getting-started/02_quick_start/execute-results/html.json
index 4ea7210a..6db361a3 100644
--- a/docs/_freeze/getting-started/02_quick_start/execute-results/html.json
+++ b/docs/_freeze/getting-started/02_quick_start/execute-results/html.json
@@ -1,9 +1,9 @@
 {
-  "hash": "c4446a656a953aa8ece8e6b2419db3c5",
+  "hash": "be94e8503518decbc272da87d8234adf",
   "result": {
    "markdown": "---\ntitle: Quick Start\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\n\n\n# Quick Start: A Monthly Sales Analysis\n\nThis is a simple exercise to showcase the power of our 2 most popular functions:\n\n1. [`summarize_by_time()`](/reference/summarize_by_time.html)\n2. [`plot_timeseries()`](/reference/plot_timeseries.html)\n\n## Import Libraries & Data\n\nFirst, `import pytimetk as tk`. This gets you access to the most important functions. Use `tk.load_dataset()` to load the \"bike_sales_sample\" dataset.\n\n::: {.callout-note collapse=\"false\"}\n## About the Bike Sales Sample Dataset\n\nThis dataset contains \"orderlines\" for orders received. The `order_date` column contains timestamps. We can use this column to perform sales aggregations (e.g. total revenue).\n:::\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n\ndf = tk.load_dataset('bike_sales_sample')\ndf['order_date'] = pd.to_datetime(df['order_date'])\n\ndf \n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
      order_id  order_line  order_date  quantity  price  total_price  model                     category_1  category_2          frame_material  bikeshop_name              city         state
0            1           1  2011-01-07         1   6070         6070  Jekyll Carbon 2           Mountain    Over Mountain       Carbon          Ithaca Mountain Climbers   Ithaca       NY
1            1           2  2011-01-07         1   5970         5970  Trigger Carbon 2          Mountain    Over Mountain       Carbon          Ithaca Mountain Climbers   Ithaca       NY
2            2           1  2011-01-10         1   2770         2770  Beast of the East 1       Mountain    Trail               Aluminum        Kansas City 29ers          Kansas City  KS
3            2           2  2011-01-10         1   5970         5970  Trigger Carbon 2          Mountain    Over Mountain       Carbon          Kansas City 29ers          Kansas City  KS
4            3           1  2011-01-10         1  10660        10660  Supersix Evo Hi-Mod Team  Road        Elite Road          Carbon          Louisville Race Equipment  Louisville   KY
...        ...         ...         ...       ...    ...          ...  ...                       ...         ...                 ...             ...                        ...          ...
2461       321           3  2011-12-22         1   1410         1410  CAAD8 105                 Road        Elite Road          Aluminum        Miami Race Equipment       Miami        FL
2462       322           1  2011-12-28         1   1250         1250  Synapse Disc Tiagra       Road        Endurance Road      Aluminum        Phoenix Bi-peds            Phoenix      AZ
2463       322           2  2011-12-28         1   2660         2660  Bad Habit 2               Mountain    Trail               Aluminum        Phoenix Bi-peds            Phoenix      AZ
2464       322           3  2011-12-28         1   2340         2340  F-Si 1                    Mountain    Cross Country Race  Aluminum        Phoenix Bi-peds            Phoenix      AZ
2465       322           4  2011-12-28         1   5860         5860  Synapse Hi-Mod Dura Ace   Road        Endurance Road      Carbon          Phoenix Bi-peds            Phoenix      AZ

[2466 rows × 13 columns]
\n```\n:::\n:::\n\n\n## Using `summarize_by_time()` for a Sales Analysis\n\nYour company might be interested in sales patterns for various categories of bicycles. We can obtain a grouped monthly sales aggregation by `category_1` in two lines of code:\n\n1. First, use the pandas `groupby()` method to group the DataFrame on `category_1`.\n2. Next, use timetk's `summarize_by_time()` method to apply the sum function by month start (\"MS\") and use `wide_format = False` to return the DataFrame in a long format (note: long format is the default; a wide-format sketch follows the output below). \n\nThe result is the total revenue for Mountain and Road bikes by month. \n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\nsummary_category_1_df = df \\\n    .groupby(\"category_1\") \\\n    .summarize_by_time(\n        date_column  = 'order_date', \n        value_column = 'total_price',\n        freq         = \"MS\",\n        agg_func     = 'sum',\n        wide_format  = False\n    )\n\n# First 5 rows shown\nsummary_category_1_df.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
  category_1  order_date  total_price
0   Mountain  2011-01-01       221490
1   Mountain  2011-02-01       660555
2   Mountain  2011-03-01       358855
3   Mountain  2011-04-01      1075975
4   Mountain  2011-05-01       450440
\n```\n:::\n:::
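If you prefer one column per group instead of the long format, a hedged sketch (same call as above, only the `wide_format` flag changes; the exact wide column names depend on the function's output conventions):

```python
# Wide alternative: pivots each category_1 group into its own total_price column
summary_wide_df = df \
    .groupby("category_1") \
    .summarize_by_time(
        date_column  = 'order_date',
        value_column = 'total_price',
        freq         = "MS",
        agg_func     = 'sum',
        wide_format  = True  # one column per group instead of a long table
    )
```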
\n\n## Visualizing Sales Patterns\n\n::: {.callout-note collapse=\"false\"}\n## Now available: `plot_timeseries()`.\n\n`plot_timeseries()` is a quick and easy way to visualize time series and make professional time series plots. \n:::\n\nWith the data summarized by time, we can visualize with `plot_timeseries()`. `pytimetk` functions are `groupby()`-aware, meaning they understand when your data is grouped and operate by group. This is useful in time series, where we often deal with 100s of time series groups. \n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\nsummary_category_1_df \\\n    .groupby('category_1') \\\n    .plot_timeseries(\n        date_column  = 'order_date',\n        value_column = 'total_price',\n        smooth_frac  = 0.8\n    )\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n# Next steps\n\nCheck out the [Data Visualization Guide Next.](/guides/01_visualization.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development, but the potential of `pytimetk` in Python is already obvious. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2 seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n",
    "supporting": [
-      "02_quick_start_files\\figure-html"
+      "02_quick_start_files/figure-html"
    ],
    "filters": [],
    "includes": {
diff --git a/docs/_freeze/guides/01_visualization/execute-results/html.json b/docs/_freeze/guides/01_visualization/execute-results/html.json
index 59f36f3e..d9294ac1 100644
--- a/docs/_freeze/guides/01_visualization/execute-results/html.json
+++ b/docs/_freeze/guides/01_visualization/execute-results/html.json
@@ -1,9 +1,9 @@
 {
-  "hash": "f44b3cfc7c31b3a2b87ae6b336ea50e2",
+  "hash": "82ecbea829dd2e38fc90806e2f0e04a5",
   "result": {
    "markdown": "---\ntitle: Data Visualization\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\n::: {.callout-note collapse=\"false\"}\n## How this guide benefits you\n\nThis guide covers how to use `plot_timeseries()` for data visualization. Once you understand how it works, you can explore time series data more easily than ever. \n:::\n\nThis tutorial focuses on [plot_timeseries()](https://business-science.github.io/pytimetk/reference/plot_timeseries.html#timetk.plot_timeseries), a workhorse time-series plotting function that:\n\n* Generates interactive plotly plots (great for exploring & streamlit/shiny apps)\n* Consolidates 20+ lines of plotnine/matplotlib & plotly code\n* Scales well to many time series\n* Can be converted from interactive plotly to static plotnine/matplotlib plots\n\n# Libraries\n\nRun the following code to set up for this tutorial.\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\n# Import packages\nimport pytimetk as tk\nimport pandas as pd\n```\n:::\n\n\n# Plotting Time Series\n\nLet's start with a popular time series, `taylor_30_min`, which includes energy demand in megawatts at a sampling interval of 30 minutes. This is a single time series.\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Import a Time Series Data Set\ntaylor_30_min = tk.load_dataset(\"taylor_30_min\", parse_dates = ['date'])\ntaylor_30_min\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
                           date  value
0     2000-06-05 00:00:00+00:00  22262
1     2000-06-05 00:30:00+00:00  21756
2     2000-06-05 01:00:00+00:00  22247
3     2000-06-05 01:30:00+00:00  22759
4     2000-06-05 02:00:00+00:00  22549
...                         ...    ...
4027  2000-08-27 21:30:00+00:00  27946
4028  2000-08-27 22:00:00+00:00  27133
4029  2000-08-27 22:30:00+00:00  25996
4030  2000-08-27 23:00:00+00:00  24610
4031  2000-08-27 23:30:00+00:00  23132

[4032 rows × 2 columns]
\n```\n:::\n:::\n\n\nThe [plot_timeseries()](https://business-science.github.io/pytimetk/reference/plot_timeseries.html#timetk.plot_timeseries) function generates an interactive plotly chart by default.\n\n* Simply provide the date variable (time-based column, date_column) and the numeric variable (value_column) that changes over time as the first 2 arguments.\n* By default, the plotting engine is plotly, which is interactive and excellent for data exploration and apps. However, if you require static plots for reports, you can set the engine to engine = 'plotnine' or engine = 'matplotlib'.\n\nInteractive plot\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\ntaylor_30_min.plot_timeseries('date', 'value')\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\nStatic plot\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\ntaylor_30_min.plot_timeseries(\n 'date', 'value',\n engine = 'plotnine'\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](01_visualization_files/figure-html/cell-5-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n
\n```\n:::\n:::\n\n\n## Plotting Groups\n\nNext, let’s move on to a dataset with time series groups, m4_monthly, which is a sample of 4 time series from the M4 competition that are sampled at a monthly frequency.\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Import a Time Series Data Set\nm4_monthly = tk.load_dataset(\"m4_monthly\", parse_dates = ['date'])\nm4_monthly\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
         id        date  value
0        M1  1976-06-01   8000
1        M1  1976-07-01   8350
2        M1  1976-08-01   8570
3        M1  1976-09-01   7700
4        M1  1976-10-01   7080
...     ...         ...    ...
1569  M1000  2015-02-01    880
1570  M1000  2015-03-01    800
1571  M1000  2015-04-01   1140
1572  M1000  2015-05-01    970
1573  M1000  2015-06-01   1430

[1574 rows × 3 columns]
\n```\n:::\n:::\n\n\nVisualizing grouped data is as simple as grouping the data set with `groupby()` before piping it into the `plot_timeseries()` function. Here are the key points:\n\n* Groups can be added using the pandas `groupby()`.\n* These groups are then converted into facets.\n* Using `facet_ncol = 2` returns a 2-column faceted plot.\n* Setting `facet_scales = \"free\"` allows the x and y-axes of each plot to scale independently of the other plots.\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\nm4_monthly.groupby('id').plot_timeseries(\n    'date', 'value', \n    facet_ncol = 2, \n    facet_scales = \"free\"\n)\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\nThe groups can also be visualized in the same plot using the `color_column` parameter. Let's come back to the `taylor_30_min` DataFrame.\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# load data\ntaylor_30_min = tk.load_dataset(\"taylor_30_min\", parse_dates = ['date'])\n\n# extract the month using pandas\ntaylor_30_min['month'] = pd.to_datetime(taylor_30_min['date']).dt.month\n\n# plot groups\ntaylor_30_min.plot_timeseries(\n    'date', 'value', \n    color_column = 'month'\n)\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n# Next steps\n\nCheck out the [Pytimetk Basics Guide next.](/guides/02_timetk_concepts.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development, but the potential of `pytimetk` in Python is already obvious. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2 seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n",
    "supporting": [
-      "01_visualization_files\\figure-html"
+      "01_visualization_files/figure-html"
    ],
    "filters": [],
    "includes": {
diff --git a/docs/_freeze/guides/02_timetk_concepts/execute-results/html.json b/docs/_freeze/guides/02_timetk_concepts/execute-results/html.json
index c8e0e7c4..9c43ddf9 100644
--- a/docs/_freeze/guides/02_timetk_concepts/execute-results/html.json
+++ b/docs/_freeze/guides/02_timetk_concepts/execute-results/html.json
@@ -1,9 +1,9 @@
 {
-  "hash": "217762ef63d4e97e9449444f2ddfa8af",
+  "hash": "064052dc4f43bc472ae9282dda2467e6",
   "result": {
    "markdown": "---\ntitle: PyTimeTK Basics\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\n> *PyTimeTK has one mission:* To make time series analysis simpler, easier, and faster in Python. This goal requires some opinionated ways of treating time series in Python. We will conceptually lay out how `pytimetk` can help. \n\n::: {.callout-note collapse=\"false\"}\n## How this guide benefits you\n\nThis guide covers how to use `pytimetk` conceptually. Once you understand key concepts, you can go from basic to advanced time series analysis very fast. \n:::\n\n\n\nLet's first start with how to think about time series data conceptually. **Time series data has 3 core properties.** \n\n# The 3 Core Properties of Time Series Data\n\nEvery time series DataFrame should have the following properties:\n\n1. *Time Series Index:* A column containing 'datetime64' time stamps.\n2. *Value Columns:* One or more columns containing numeric data that can be aggregated and visualized by time.\n3. *Group Columns (Optional):* One or more `categorical` or `str` columns that can be grouped so the time series can be evaluated by group. \n\nIn practice here's what this looks like using the \"m4_daily\" dataset:\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\n# Import packages\nimport pytimetk as tk\nimport pandas as pd\nimport numpy as np\n\n# Import a Time Series Data Set\nm4_daily_df = tk.load_dataset(\"m4_daily\", parse_dates = ['date'])\nm4_daily_df\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
        id        date   value
0      D10  2014-07-03  2076.2
1      D10  2014-07-04  2073.4
2      D10  2014-07-05  2048.7
3      D10  2014-07-06  2048.9
4      D10  2014-07-07  2006.4
...    ...         ...     ...
9738  D500  2012-09-19  9418.8
9739  D500  2012-09-20  9365.7
9740  D500  2012-09-21  9445.9
9741  D500  2012-09-22  9497.9
9742  D500  2012-09-23  9545.3

[9743 rows × 3 columns]
\n```\n:::\n:::\n\n\n::: {.callout-note collapse=\"false\"}\n## (Example: m4_daily dataset) 3 Core Properties of Time Series Data\n\nWe can see that the `m4_daily` dataset has:\n\n1. *Time Series Index:* The `date` column\n2. *Value Column(s):* The `value` column\n3. *Group Column(s):* The `id` column\n:::\n\n::: {.callout-important collapse=\"false\"}\n## Missing any of the 3 Core Properties of Time Series Data\n\nIf your data is not formatted properly for `pytimetk`, meaning it's missing columns containing datetime, numeric values, or grouping columns, this can impact your ability to use `pytimetk` for time series analysis. \n:::\n\n::: {.callout-important collapse=\"false\"}\n## No Pandas Index, No Problem\n\nTimetk standardizes on using a date column. This is to reduce friction in converting to other package formats like `polars`, which don't use an index (each row is indexed by its integer position). \n:::\n
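If your own data keeps its timestamps in a pandas `DatetimeIndex` instead of a column, a minimal sketch of making it compliant (plain pandas; nothing `pytimetk`-specific is assumed):

```python
import pandas as pd

# A DataFrame indexed by date: a common shape for time series data
df_indexed = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.date_range("2023-01-01", periods=3, freq="D"),
)
df_indexed.index.name = "date"

# Move the DatetimeIndex into a regular 'date' column so the 3 core properties hold
df_compliant = df_indexed.reset_index()

print(df_compliant.dtypes)  # date: datetime64[ns], value: float64
```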
\n# The 2 Ways that Timetk Makes Time Series Analysis Easier\n\n::: {.callout-note collapse=\"false\"}\n## 2 Types of Time Series Functions\n\n1. Pandas `DataFrame` Operations\n2. Pandas `Series` Operations \n:::\n\nTimetk contains a number of functions designed to make time series analysis operations easier. In general, these operations come in 2 types of time series functions:\n\n1. *Pandas DataFrame Operations:* These functions work on `pd.DataFrame` objects and derivatives such as `groupby()` objects for Grouped Time Series Analysis. You will see `data` as the first parameter in these functions. \n    \n2. *Pandas Series Operations:* These functions work on `pd.Series` objects.\n    \n    - *Time Series Index Operations:* Are designed for the *Time Series index*. You will see `idx` as the first parameter of these functions. In these cases, these functions also work with `datetime64` values (e.g. those produced when you use `parse_dates` via `pd.read_csv()` or create time series with `pd.date_range()`)\n    \n    - *Numeric Operations:* Are designed for *Numeric Values*. You will see `x` as the first parameter for these functions. \n\nLet's take a look at how to use the different types of Time Series Analysis functions in `pytimetk`. We'll start with Type 1: Pandas `DataFrame` Operations. \n\n## Type 1: Pandas DataFrame Operations\n\nBefore we start using `pytimetk`, let's make sure our data is set up properly. \n\n### Timetk Data Format Compliance\n\n::: {.callout-important collapse=\"false\"}\n## 3 Core Properties Must Be Upheld\n\nA `pytimetk`-Compliant Pandas `DataFrame` must have:\n\n1. *Time Series Index:* A Time Stamp column containing `datetime64` values\n2. *Value Column(s):* The value column(s) containing `float` or `int` values\n3. *Group Column(s):* Optionally, for grouped time series analysis, one or more columns containing `str` or `categorical` values (shown as an object)\n\nIf these are NOT upheld, this will impact your ability to use `pytimetk` DataFrame operations. \n:::\n\n::: {.callout-tip collapse=\"false\"}\n## Inspect the DataFrame\n\nUse the Pandas `info()` method to check compliance. \n:::\n\nUsing the pandas `info()` method, we can see that we have a compliant data frame with a `date` column containing `datetime64` and a `value` column containing `float64`. For grouped analysis we have the `id` column containing `object` dtype. <br>\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Tip: Inspect for compliance with info()\nm4_daily_df.info()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 9743 entries, 0 to 9742\nData columns (total 3 columns):\n #   Column  Non-Null Count  Dtype         \n---  ------  --------------  -----         \n 0   id      9743 non-null   object        \n 1   date    9743 non-null   datetime64[ns]\n 2   value   9743 non-null   float64       \ndtypes: datetime64[ns](1), float64(1), object(1)\nmemory usage: 228.5+ KB\n```\n:::\n:::\n\n\n### Grouped Time Series Analysis with Summarize By Time\n\nFirst, inspect how the `summarize_by_time` function works by calling `help()`. \n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Review the summarize_by_time documentation (output not shown)\nhelp(tk.summarize_by_time)\n```\n:::\n\n\n::: {.callout-note collapse=\"false\"}\n## Help Doc Info: `summarize_by_time()`\n\n- The first parameter is `data`, indicating this is a `DataFrame` operation. \n- The Examples show different use cases for how to apply the function on a DataFrame\n:::\n\nLet's test the `summarize_by_time()` DataFrame operation using the grouped approach with method chaining. DataFrame operations can be used as Pandas methods with method-chaining, which allows us to more succinctly apply time series operations.\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Grouped Summarize By Time with Method Chaining\ndf_summarized = (\n    m4_daily_df\n        .groupby('id')\n        .summarize_by_time(\n            date_column  = 'date',\n            value_column = 'value',\n            freq         = 'QS', # QS = Quarter Start\n            agg_func     = [\n                'mean', \n                'median', \n                'min',\n                ('q25', lambda x: np.quantile(x, 0.25)),\n                ('q75', lambda x: np.quantile(x, 0.75)),\n                'max',\n                ('range', lambda x: x.max() - x.min()),\n            ],\n        )\n)\n\ndf_summarized\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
       id        date   value_mean  value_median  value_min  value_q25  value_q75  value_max  value_range
0     D10  2014-07-01  1960.078889       1979.90     1781.6   1915.225   2002.575     2076.2        294.6
1     D10  2014-10-01  2184.586957       2154.05     2022.8   2125.075   2274.150     2344.9        322.1
2     D10  2015-01-01  2309.830000       2312.30     2209.6   2284.575   2342.150     2392.4        182.8
3     D10  2015-04-01  2344.481319       2333.00     2185.1   2301.750   2391.000     2499.8        314.7
4     D10  2015-07-01  2156.754348       2186.70     1856.6   1997.250   2289.425     2368.1        511.5
..    ...         ...          ...           ...        ...        ...        ...        ...          ...
105  D500  2011-07-01  9727.321739       9745.55     8964.5   9534.125  10003.900    10463.9       1499.4
106  D500  2011-10-01  8175.565217       7897.00     6755.0   7669.875   8592.575     9860.0       3105.0
107  D500  2012-01-01  8291.317582       8412.60     7471.5   7814.800   8677.850     8980.7       1509.2
108  D500  2012-04-01  8654.020879       8471.10     8245.6   8389.850   9017.250     9349.2       1103.6
109  D500  2012-07-01  8770.502353       8690.50     8348.1   8604.400   8846.000     9545.3       1197.2

[110 rows × 9 columns]
\n```\n:::\n:::\n\n\n::: {.callout-note collapse=\"false\"}\n## Key Takeaways: `summarize_by_time()`\n\n- The `data` must comply with the 3 core properties (date column, value column(s), and group column(s)) \n- The aggregation functions were applied by combination of group (id) and resample (Quarter Start)\n- The result was a pandas DataFrame with group column, resampled date column, and summary values (mean, median, min, 25th-quantile, etc)\n:::\n\n### Another DataFrame Example: Creating 29 Engineered Features\n\nLet's examine another `DataFrame` function, `tk.augment_timeseries_signature()`. Feel free to inspect the documentation with `help(tk.augment_timeseries_signature)`.\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Creating 29 engineered features from the date column\n# Not run: help(tk.augment_timeseries_signature)\ndf_augmented = (\n m4_daily_df\n .augment_timeseries_signature(date_column = 'date')\n)\n\ndf_augmented.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
    id        date   value  date_index_num  date_year  date_year_iso  date_yearstart  date_yearend  date_leapyear  date_half  ...  date_mday  date_qday  date_yday  date_weekend  date_hour  date_minute  date_second  date_msecond  date_nsecond  date_am_pm
0  D10  2014-07-03  2076.2      1404345600       2014           2014               0             0              0          2  ...          3          3        184             0          0            0            0             0             0          am
1  D10  2014-07-04  2073.4      1404432000       2014           2014               0             0              0          2  ...          4          4        185             0          0            0            0             0             0          am
2  D10  2014-07-05  2048.7      1404518400       2014           2014               0             0              0          2  ...          5          5        186             0          0            0            0             0             0          am
3  D10  2014-07-06  2048.9      1404604800       2014           2014               0             0              0          2  ...          6          6        187             1          0            0            0             0             0          am
4  D10  2014-07-07  2006.4      1404691200       2014           2014               0             0              0          2  ...          7          7        188             0          0            0            0             0             0          am

[5 rows × 32 columns]
\n```\n:::\n:::\n\n\n::: {.callout-note collapse=\"false\"}\n## Key Takeaways: `augment_timeseries_signature()`\n\n- The `data` must comply with 1 of the 3 core properties (the date column) \n- The result was a pandas DataFrame with 29 time series features that can be used for Machine Learning and Forecasting\n:::\n\n\n### Making Future Dates with Future Frame\n\nA common time series task before forecasting with machine learning models is to make a future DataFrame some `length_out` into the future. You can do this with `tk.future_frame()`. Here's how. \n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# Preparing a time series data set for Machine Learning Forecasting\nfull_augmented_df = (\n    m4_daily_df \n        .groupby('id')\n        .future_frame('date', length_out = 365)\n        .augment_timeseries_signature('date')\n)\nfull_augmented_df\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
        id        date   value  date_index_num  date_year  date_year_iso  date_yearstart  date_yearend  date_leapyear  date_half  ...  date_mday  date_qday  date_yday  date_weekend  date_hour  date_minute  date_second  date_msecond  date_nsecond  date_am_pm
0      D10  2014-07-03  2076.2      1404345600       2014           2014               0             0              0          2  ...          3          3        184             0          0            0            0             0             0          am
1      D10  2014-07-04  2073.4      1404432000       2014           2014               0             0              0          2  ...          4          4        185             0          0            0            0             0             0          am
2      D10  2014-07-05  2048.7      1404518400       2014           2014               0             0              0          2  ...          5          5        186             0          0            0            0             0             0          am
3      D10  2014-07-06  2048.9      1404604800       2014           2014               0             0              0          2  ...          6          6        187             1          0            0            0             0             0          am
4      D10  2014-07-07  2006.4      1404691200       2014           2014               0             0              0          2  ...          7          7        188             0          0            0            0             0             0          am
...    ...         ...     ...             ...        ...            ...             ...           ...            ...        ...  ...        ...        ...        ...           ...        ...          ...          ...           ...           ...         ...
4556  D500  2013-09-19     NaN      1379548800       2013           2013               0             0              0          2  ...         19         81        262             0          0            0            0             0             0          am
4557  D500  2013-09-20     NaN      1379635200       2013           2013               0             0              0          2  ...         20         82        263             0          0            0            0             0             0          am
4558  D500  2013-09-21     NaN      1379721600       2013           2013               0             0              0          2  ...         21         83        264             0          0            0            0             0             0          am
4559  D500  2013-09-22     NaN      1379808000       2013           2013               0             0              0          2  ...         22         84        265             1          0            0            0             0             0          am
4560  D500  2013-09-23     NaN      1379894400       2013           2013               0             0              0          2  ...         23         85        266             0          0            0            0             0             0          am

[11203 rows × 32 columns]
\n```\n:::\n:::\n\n\nWe can then get the future data by filtering for the rows where the `value` column is missing (`np.nan`).\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# Get the future data (just the observations that haven't happened yet)\nfuture_df = (\n    full_augmented_df\n        .query('value.isna()')\n)\nfuture_df\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
        id        date  value  date_index_num  date_year  date_year_iso  date_yearstart  date_yearend  date_leapyear  date_half  ...  date_mday  date_qday  date_yday  date_weekend  date_hour  date_minute  date_second  date_msecond  date_nsecond  date_am_pm
674    D10  2016-05-07    NaN      1462579200       2016           2016               0             0              1          1  ...          7         37        128             0          0            0            0             0             0          am
675    D10  2016-05-08    NaN      1462665600       2016           2016               0             0              1          1  ...          8         38        129             1          0            0            0             0             0          am
676    D10  2016-05-09    NaN      1462752000       2016           2016               0             0              1          1  ...          9         39        130             0          0            0            0             0             0          am
677    D10  2016-05-10    NaN      1462838400       2016           2016               0             0              1          1  ...         10         40        131             0          0            0            0             0             0          am
678    D10  2016-05-11    NaN      1462924800       2016           2016               0             0              1          1  ...         11         41        132             0          0            0            0             0             0          am
...    ...         ...    ...             ...        ...            ...             ...           ...            ...        ...  ...        ...        ...        ...           ...        ...          ...          ...           ...           ...         ...
4556  D500  2013-09-19    NaN      1379548800       2013           2013               0             0              0          2  ...         19         81        262             0          0            0            0             0             0          am
4557  D500  2013-09-20    NaN      1379635200       2013           2013               0             0              0          2  ...         20         82        263             0          0            0            0             0             0          am
4558  D500  2013-09-21    NaN      1379721600       2013           2013               0             0              0          2  ...         21         83        264             0          0            0            0             0             0          am
4559  D500  2013-09-22    NaN      1379808000       2013           2013               0             0              0          2  ...         22         84        265             1          0            0            0             0             0          am
4560  D500  2013-09-23    NaN      1379894400       2013           2013               0             0              0          2  ...         23         85        266             0          0            0            0             0             0          am

[1460 rows × 32 columns]
\n```\n:::\n:::
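To sketch where this leaves us, here is one (hedged) way to finish the workflow: train on the rows that have a `value`, then predict the future rows. The model and feature subset below are illustrative choices, not part of `pytimetk`; scikit-learn is assumed installed.

```python
# Illustrative forecast with the engineered features (not part of pytimetk)
from sklearn.ensemble import RandomForestRegressor

# A subset of the 29 signature features; any of the numeric columns could be used
features = ['date_index_num', 'date_year', 'date_half', 'date_mday', 'date_yday', 'date_weekend']

train_df = full_augmented_df.query('value.notna()')

# One global model for brevity; a fuller workflow would also encode the 'id' group
model = RandomForestRegressor(random_state=0)
model.fit(train_df[features], train_df['value'])

predictions_df = future_df.copy()
predictions_df['value'] = model.predict(predictions_df[features])
```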
\n\n\n## Type 2: Pandas Series Operations\n\nThe main difference between a `DataFrame` operation and a Series operation is that we are operating on an array of values from typically one of the following `dtypes`:\n\n1. Timestamps (`datetime64`)\n2. Numeric (`float64` or `int64`) \n\nThe first argument of Series operations that operate on Timestamps will always be `idx`. \n\nLet's take a look at one, shall we? We'll start with a common action: Making future time series from an existing time series with a regular frequency. \n\n### The Make Future Time Series Function\n\nSay we have a monthly sequence of timestamps. What if we want to create a forecast where we predict 12 months into the future? Well, we will need to create 12 future timestamps. Here's how. \n\nFirst, create a `pd.date_range()` with dates starting at the beginning of each month.\n\n::: {.cell execution_count=8}\n``` {.python .cell-code}\n# Make a monthly date range\ndates_dt = pd.date_range(\"2023-01\", \"2024-01\", freq=\"MS\")\ndates_dt\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\nDatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01',\n               '2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01',\n               '2023-09-01', '2023-10-01', '2023-11-01', '2023-12-01',\n               '2024-01-01'],\n              dtype='datetime64[ns]', freq='MS')\n```\n:::\n:::\n\n\nNext, use `tk.make_future_timeseries()` to create the next 12 timestamps in the sequence. \n\n::: {.panel-tabset group=\"future-dates\"}\n\n## Pandas Series\n\n::: {.cell execution_count=9}\n``` {.python .cell-code}\n# Pandas Series: Future Dates\nfuture_series = pd.Series(dates_dt).make_future_timeseries(12)\nfuture_series\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n0    2024-02-01\n1    2024-03-01\n2    2024-04-01\n3    2024-05-01\n4    2024-06-01\n5    2024-07-01\n6    2024-08-01\n7    2024-09-01\n8    2024-10-01\n9    2024-11-01\n10   2024-12-01\n11   2025-01-01\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n## DateTimeIndex\n\n::: {.cell execution_count=10}\n``` {.python .cell-code}\n# DateTimeIndex: Future Dates\nfuture_dt = tk.make_future_timeseries(\n    idx        = dates_dt,\n    length_out = 12\n)\nfuture_dt\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```\n0    2024-02-01\n1    2024-03-01\n2    2024-04-01\n3    2024-05-01\n4    2024-06-01\n5    2024-07-01\n6    2024-08-01\n7    2024-09-01\n8    2024-10-01\n9    2024-11-01\n10   2024-12-01\n11   2025-01-01\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n:::\n\nWe can combine the actual and future timestamps into one combined time series. \n\n::: {.cell execution_count=11}\n``` {.python .cell-code}\n# Combining the 2 series and resetting the index\ncombined_timeseries = (\n    pd.concat(\n        [pd.Series(dates_dt), pd.Series(future_dt)],\n        axis=0\n    )\n        .reset_index(drop = True)\n)\n\ncombined_timeseries\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n0     2023-01-01\n1     2023-02-01\n2     2023-03-01\n3     2023-04-01\n4     2023-05-01\n5     2023-06-01\n6     2023-07-01\n7     2023-08-01\n8     2023-09-01\n9     2023-10-01\n10    2023-11-01\n11    2023-12-01\n12    2024-01-01\n13    2024-02-01\n14    2024-03-01\n15    2024-04-01\n16    2024-05-01\n17    2024-06-01\n18    2024-07-01\n19    2024-08-01\n20    2024-09-01\n21    2024-10-01\n22    2024-11-01\n23    2024-12-01\n24    2025-01-01\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\nNext, we'll take a look at how to go from an irregular time series to a regular time series. \n\n### Flooring Dates\n\nAn example is `tk.floor_date`, which is used to round down dates. <br>
See `help(tk.floor_date)`.\n\nFlooring dates is often used as part of a strategy to go from an irregular time series to a regular one by combining it with an aggregation. Often `summarize_by_time()` is used (I'll share why shortly). But conceptually, date flooring is the secret. \n\n\n::: {.panel-tabset group=\"flooring\"}\n\n## With Flooring\n\n::: {.cell execution_count=12}\n``` {.python .cell-code}\n# Monthly flooring rounds dates down to 1st of the month\nm4_daily_df['date'].floor_date(unit = \"M\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```\n0      2014-07-01\n1      2014-07-01\n2      2014-07-01\n3      2014-07-01\n4      2014-07-01\n          ...    \n9738   2012-09-01\n9739   2012-09-01\n9740   2012-09-01\n9741   2012-09-01\n9742   2012-09-01\nName: date, Length: 9743, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n## Without Flooring\n\n::: {.cell execution_count=13}\n``` {.python .cell-code}\n# Before Flooring\nm4_daily_df['date']\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```\n0      2014-07-03\n1      2014-07-04\n2      2014-07-05\n3      2014-07-06\n4      2014-07-07\n          ...    \n9738   2012-09-19\n9739   2012-09-20\n9740   2012-09-21\n9741   2012-09-22\n9742   2012-09-23\nName: date, Length: 9743, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n:::\n\nThis \"date flooring\" operation can be useful for creating date groupings.\n\n::: {.cell execution_count=14}\n``` {.python .cell-code}\n# Adding a date group with floor_date()\ndates_grouped_by_month = (\n    m4_daily_df\n        .assign(date_group = lambda x: x['date'].floor_date(\"M\"))\n)\n\ndates_grouped_by_month\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
        id        date   value  date_group
0      D10  2014-07-03  2076.2  2014-07-01
1      D10  2014-07-04  2073.4  2014-07-01
2      D10  2014-07-05  2048.7  2014-07-01
3      D10  2014-07-06  2048.9  2014-07-01
4      D10  2014-07-07  2006.4  2014-07-01
...    ...         ...     ...         ...
9738  D500  2012-09-19  9418.8  2012-09-01
9739  D500  2012-09-20  9365.7  2012-09-01
9740  D500  2012-09-21  9445.9  2012-09-01
9741  D500  2012-09-22  9497.9  2012-09-01
9742  D500  2012-09-23  9545.3  2012-09-01

[9743 rows × 4 columns]
\n```\n:::\n:::\n\n\nWe can then do grouped operations. \n\n::: {.cell execution_count=15}\n``` {.python .cell-code}\n# Example of a grouped operation with floored dates\nsummary_df = (\n    dates_grouped_by_month\n        .drop('date', axis=1)\n        .groupby(['id', 'date_group'])\n        .mean()\n        .reset_index()\n)\n\nsummary_df\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
       id  date_group        value
0     D10  2014-07-01  1967.493103
1     D10  2014-08-01  1985.548387
2     D10  2014-09-01  1926.593333
3     D10  2014-10-01  2100.077419
4     D10  2014-11-01  2155.326667
..    ...         ...          ...
318  D500  2012-05-01  8407.096774
319  D500  2012-06-01  9124.903333
320  D500  2012-07-01  8674.551613
321  D500  2012-08-01  8666.054839
322  D500  2012-09-01  9040.604348

[323 rows × 3 columns]
\n```\n:::\n:::\n\n\nOf course, we can do this operation faster with `summarize_by_time()` (and it's much more flexible). \n\n::: {.cell execution_count=16}\n``` {.python .cell-code}\n# Summarize by time is less code and more flexible\n(\n    m4_daily_df \n        .groupby('id')\n        .summarize_by_time(\n            'date', 'value', \n            freq = \"MS\",\n            agg_func = ['mean', 'median', 'min', 'max']\n        )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
       id        date   value_mean  value_median  value_min  value_max
0     D10  2014-07-01  1967.493103       1978.80     1876.0     2076.2
1     D10  2014-08-01  1985.548387       1995.60     1914.7     2027.5
2     D10  2014-09-01  1926.593333       1920.95     1781.6     2023.5
3     D10  2014-10-01  2100.077419       2107.60     2022.8     2154.9
4     D10  2014-11-01  2155.326667       2149.30     2083.5     2245.4
..    ...         ...          ...           ...        ...        ...
318  D500  2012-05-01  8407.096774       8430.80     8245.6     8578.1
319  D500  2012-06-01  9124.903333       9163.85     8686.1     9349.2
320  D500  2012-07-01  8674.551613       8673.60     8407.5     9091.1
321  D500  2012-08-01  8666.054839       8667.40     8348.1     8939.6
322  D500  2012-09-01  9040.604348       9091.40     8500.0     9545.3

[323 rows × 6 columns]
\n```\n:::\n:::\n\n\nAnd that's the core idea behind `pytimetk`: writing less code and getting more. \n\n\n\nNext, let's do one more function. The brother of `augment_timeseries_signature()`...\n\n### The Get Time Series Signature Function\n\nThis function takes a pandas `Series` or `DateTimeIndex` and returns a `DataFrame` containing the 29 engineered features. \n\nStart with either a DateTimeIndex...\n\n::: {.cell execution_count=17}\n``` {.python .cell-code}\ntimestamps_dt = pd.date_range(\"2023\", \"2024\", freq = \"D\")\ntimestamps_dt\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```\nDatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',\n               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',\n               '2023-01-09', '2023-01-10',\n               ...\n               '2023-12-23', '2023-12-24', '2023-12-25', '2023-12-26',\n               '2023-12-27', '2023-12-28', '2023-12-29', '2023-12-30',\n               '2023-12-31', '2024-01-01'],\n              dtype='datetime64[ns]', length=366, freq='D')\n```\n:::\n:::\n\n\n... Or a Pandas Series.\n\n::: {.cell execution_count=18}\n``` {.python .cell-code}\ntimestamps_series = pd.Series(timestamps_dt)\ntimestamps_series\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```\n0     2023-01-01\n1     2023-01-02\n2     2023-01-03\n3     2023-01-04\n4     2023-01-05\n         ...    \n361   2023-12-28\n362   2023-12-29\n363   2023-12-30\n364   2023-12-31\n365   2024-01-01\nLength: 366, dtype: datetime64[ns]\n```\n:::\n:::\n\n\nAnd you can use the pandas Series function `tk.get_timeseries_signature()` to create 29 features from the date sequence. \n\n::: {.panel-tabset group=\"get_timeseries_signature\"}\n\n## Pandas Series\n\n::: {.cell execution_count=19}\n``` {.python .cell-code}\n# Pandas series: get_timeseries_signature\ntimestamps_series.get_timeseries_signature()\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```{=html}\n
      index_num  year  year_iso  yearstart  yearend  leapyear  half  quarter  quarteryear  quarterstart  ...  mday  qday  yday  weekend  hour  minute  second  msecond  nsecond  am_pm
0    1672531200  2023      2022          1        0         0     1        1       2023Q1             1  ...     1     1     1        1     0       0       0        0        0     am
1    1672617600  2023      2023          0        0         0     1        1       2023Q1             0  ...     2     2     2        0     0       0       0        0        0     am
2    1672704000  2023      2023          0        0         0     1        1       2023Q1             0  ...     3     3     3        0     0       0       0        0        0     am
3    1672790400  2023      2023          0        0         0     1        1       2023Q1             0  ...     4     4     4        0     0       0       0        0        0     am
4    1672876800  2023      2023          0        0         0     1        1       2023Q1             0  ...     5     5     5        0     0       0       0        0        0     am
..          ...   ...       ...        ...      ...       ...   ...      ...          ...           ...  ...   ...   ...   ...      ...   ...     ...     ...      ...      ...    ...
361  1703721600  2023      2023          0        0         0     2        4       2023Q4             0  ...    28    89   362        0     0       0       0        0        0     am
362  1703808000  2023      2023          0        0         0     2        4       2023Q4             0  ...    29    90   363        0     0       0       0        0        0     am
363  1703894400  2023      2023          0        0         0     2        4       2023Q4             0  ...    30    91   364        0     0       0       0        0        0     am
364  1703980800  2023      2023          0        1         0     2        4       2023Q4             0  ...    31    92   365        1     0       0       0        0        0     am
365  1704067200  2024      2024          1        0         1     1        1       2024Q1             1  ...     1     1     1        0     0       0       0        0        0     am

[366 rows × 29 columns]
\n```\n:::\n:::\n\n\n## DateTimeIndex\n\n::: {.cell execution_count=20}\n``` {.python .cell-code}\n# DateTimeIndex: get_timeseries_signature\ntk.get_timeseries_signature(timestamps_dt)\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n
      index_num  year  year_iso  yearstart  yearend  leapyear  half  quarter  quarteryear  quarterstart  ...  mday  qday  yday  weekend  hour  minute  second  msecond  nsecond  am_pm
0    1672531200  2023      2022          1        0         0     1        1       2023Q1             1  ...     1     1     1        1     0       0       0        0        0     am
1    1672617600  2023      2023          0        0         0     1        1       2023Q1             0  ...     2     2     2        0     0       0       0        0        0     am
2    1672704000  2023      2023          0        0         0     1        1       2023Q1             0  ...     3     3     3        0     0       0       0        0        0     am
3    1672790400  2023      2023          0        0         0     1        1       2023Q1             0  ...     4     4     4        0     0       0       0        0        0     am
4    1672876800  2023      2023          0        0         0     1        1       2023Q1             0  ...     5     5     5        0     0       0       0        0        0     am
..          ...   ...       ...        ...      ...       ...   ...      ...          ...           ...  ...   ...   ...   ...      ...   ...     ...     ...      ...      ...    ...
361  1703721600  2023      2023          0        0         0     2        4       2023Q4             0  ...    28    89   362        0     0       0       0        0        0     am
362  1703808000  2023      2023          0        0         0     2        4       2023Q4             0  ...    29    90   363        0     0       0       0        0        0     am
363  1703894400  2023      2023          0        0         0     2        4       2023Q4             0  ...    30    91   364        0     0       0       0        0        0     am
364  1703980800  2023      2023          0        1         0     2        4       2023Q4             0  ...    31    92   365        1     0       0       0        0        0     am
365  1704067200  2024      2024          1        0         1     1        1       2024Q1             1  ...     1     1     1        0     0       0       0        0        0     am

[366 rows × 29 columns]
\n```\n:::\n:::\n\n\n:::\n\n# Next steps\n\nCheck out the [Pandas Frequency Guide next.](/guides/03_pandas_frequency.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development, but the potential of `pytimetk` in Python is already obvious. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2 seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n",
    "supporting": [
-      "02_timetk_concepts_files\\figure-html"
+      "02_timetk_concepts_files/figure-html"
    ],
    "filters": [],
    "includes": {
diff --git a/docs/_freeze/guides/03_pandas_frequency/execute-results/html.json b/docs/_freeze/guides/03_pandas_frequency/execute-results/html.json
index 0664d39a..6cc498e8 100644
--- a/docs/_freeze/guides/03_pandas_frequency/execute-results/html.json
+++ b/docs/_freeze/guides/03_pandas_frequency/execute-results/html.json
@@ -1,9 +1,9 @@
 {
-  "hash": "2ed693dfd96efb6f1921eb696a64d9ee",
+  "hash": "92fd719798ef778f3a372013aeba4e79",
   "result": {
    "markdown": "---\ntitle: Pandas Frequencies\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\n::: {.callout-note collapse=\"false\"}\n## How this guide benefits you\n\nThis guide covers how to use the `pandas` frequency strings within `pytimetk`. Once you understand key frequencies, you can apply them to manipulate time series data like a pro. \n:::\n\n# Pandas Frequencies\n\nPandas offers a variety of frequency strings, also known as offset aliases, to define the frequency of a time series. Here are some common frequency strings used in pandas:\n\n1. **'B'**: Business Day\n2. **'D'**: Calendar day\n3. **'W'**: Weekly\n4. **'M'**: Month end\n5. **'BM'**: Business month end\n6. **'MS'**: Month start\n7. **'BMS'**: Business month start\n8. **'Q'**: Quarter end\n9. **'BQ'**: Business quarter end\n10. **'QS'**: Quarter start\n11. **'BQS'**: Business quarter start\n12. **'A' or 'Y'**: Year end\n13. **'BA' or 'BY'**: Business year end\n14. **'AS' or 'YS'**: Year start\n15. **'BAS' or 'BYS'**: Business year start\n16. **'H'**: Hourly\n17. **'T' or 'min'**: Minutely\n18. **'S'**: Secondly\n19. **'L' or 'ms'**: Milliseconds\n20. **'U'**: Microseconds\n21. **'N'**: Nanoseconds
**'N'**: Nanoseconds\n\n### Custom Frequencies:\n- You can also create custom frequencies by combining base frequencies, like:\n - **'2D'**: Every 2 days\n - **'3W'**: Every 3 weeks\n - **'4H'**: Every 4 hours\n - **'1H30T'**: Every 1 hour and 30 minutes\n\n### Compound Frequencies:\n- You can combine multiple frequencies by adding them together.\n - **'1D1H'**: 1 day and 1 hour\n - **'1H30T'**: 1 hour and 30 minutes\n\n### Example:\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\n\n# Creating a date range with daily frequency\ndate_range_daily = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')\n\ndate_range_daily\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\nDatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',\n '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',\n '2023-01-09', '2023-01-10'],\n dtype='datetime64[ns]', freq='D')\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Creating a date range with 2 days frequency\ndate_range_two_days = pd.date_range(start='2023-01-01', end='2023-01-10', freq='2D')\n\ndate_range_two_days\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\nDatetimeIndex(['2023-01-01', '2023-01-03', '2023-01-05', '2023-01-07',\n '2023-01-09'],\n dtype='datetime64[ns]', freq='2D')\n```\n:::\n:::\n\n\nThese frequency strings help in resampling, creating date ranges, and handling time-series data efficiently in pandas.\n\n# Timetk Incorporates Pandas Frequencies\n\nNow that you've seen pandas frequencies, you'll see them pop up in many of the `pytimetk` functions. \n\n### Example: Padding Dates\n\nThis example shows how to use Pandas frequencies inside of `pytimetk` functions. \n\nWe'll use `pad_by_time` to show how to use freq to fill in missing dates. \n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# DataFrame with missing dates\nimport pandas as pd\n\ndata = {\n # '2023-09-05' is missing\n 'datetime': ['2023-09-01', '2023-09-02', '2023-09-03', '2023-09-04', '2023-09-06'], \n 'value': [10, 30, 40, 50, 60]\n}\n\ndf = pd.DataFrame(data)\ndf['datetime'] = pd.to_datetime(df['datetime'])\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datetimevalue
02023-09-0110
12023-09-0230
22023-09-0340
32023-09-0450
42023-09-0660
\n
\n\nWe can resample to fill in the missing day using `pad_by_time` with `freq = 'D'`.\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\nimport pytimetk as tk\n\ndf.pad_by_time('datetime', freq = 'D')\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datetimevalue
02023-09-0110.0
12023-09-0230.0
22023-09-0340.0
32023-09-0450.0
42023-09-05NaN
52023-09-0660.0
\n
\n\nWhat about resampling every 12 hours? Just set `freq = '12H'`.\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\nimport pytimetk as tk\n\ndf.pad_by_time('datetime', freq = '12H')\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datetimevalue
02023-09-01 00:00:0010.0
12023-09-01 12:00:00NaN
22023-09-02 00:00:0030.0
32023-09-02 12:00:00NaN
42023-09-03 00:00:0040.0
52023-09-03 12:00:00NaN
62023-09-04 00:00:0050.0
72023-09-04 12:00:00NaN
82023-09-05 00:00:00NaN
92023-09-05 12:00:00NaN
102023-09-06 00:00:0060.0
\n
\n\nYou'll see these pandas frequencies come up as the parameter `freq` in many `pytimetk` functions. \n\n# Next Steps\n\nCheck out the [Data Wrangling Guide next.](/guides/04_wrangling.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development. But it's obvious the potential for `pytimetk` now in Python. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2-seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n", "supporting": [ - "03_pandas_frequency_files\\figure-html" + "03_pandas_frequency_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/guides/04_wrangling/execute-results/html.json b/docs/_freeze/guides/04_wrangling/execute-results/html.json index 9ecaf1be..87e66644 100644 --- a/docs/_freeze/guides/04_wrangling/execute-results/html.json +++ b/docs/_freeze/guides/04_wrangling/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "e75c8a8cb0322ad0c9a4c7b343fedb4d", + "hash": "523aef7285bb4538095143ce684eb5aa", "result": { "markdown": "---\ntitle: Data Wrangling\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\nThis section will cover data wrangling for time series using pytimetk. We'll show examples for the following functions:\n\n* `summarize_by_time()`\n* `future_frame()`\n* `pad_by_time()`\n\n::: {.callout-note collapse=\"false\"}\n## Prerequisite\n\nBefore proceeding, be sure to review the Timetk Basics section if you haven't already.\n\n:::\n\n# Summarize by Time\n\n`summarize_by_time()` aggregates time series data from a finer granularity (higher frequency, e.g. daily) to a coarser granularity (lower frequency, e.g. weekly).\n\n**Load Libraries & Data**\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\n# import libraries\nimport pytimetk as tk\nimport pandas as pd\nimport numpy as np\n\n# import data\nm4_daily_df = tk.load_dataset('m4_daily', parse_dates = ['date'])\n\nprint(m4_daily_df.head())\nprint('\\nLength of the full dataset:', len(m4_daily_df))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n    id       date   value\n0  D10 2014-07-03  2076.2\n1  D10 2014-07-04  2073.4\n2  D10 2014-07-05  2048.7\n3  D10 2014-07-06  2048.9\n4  D10 2014-07-07  2006.4\n\nLength of the full dataset: 9743\n```\n:::\n:::\n\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `summarize_by_time`\n\nUse `help(tk.summarize_by_time)` to review additional helpful documentation.\n\n:::\n\n## Basic Example\n\nThe `m4_daily` dataset has a **daily** frequency. Say we are interested in forecasting at the **weekly** level. We can use `summarize_by_time()` to aggregate to a weekly level.\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# summarize by time: daily to weekly\nsummarized_df = m4_daily_df \\\n\t.summarize_by_time(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tfreq = 'W',\n\t\tagg_func = 'sum'\n\t)\n\nprint(summarized_df.head())\nprint('\\nLength of the full dataset:', len(summarized_df))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n        date     value\n0 1978-06-25  27328.12\n1 1978-07-02  63621.88\n2 1978-07-09  63334.38\n3 1978-07-16  63737.51\n4 1978-07-23  64718.76\n\nLength of the full dataset: 1977\n```\n:::\n:::\n\n\nThe data has now been aggregated at the weekly level. Notice we now have 1977 rows, compared to the full dataset, which had 9743 rows.
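\nFor intuition, the weekly aggregation above can be cross-checked with plain pandas resampling (an editorial sketch; assumes `m4_daily_df` is loaded as shown above):\n\n```python\n# Sketch: a rough pandas-only equivalent of the summarize_by_time() call.\nweekly_check = (\n    m4_daily_df\n        .set_index('date')\n        .resample('W')           # same 'W' offset alias\n        .agg({'value': 'sum'})\n        .reset_index()\n)\nprint(weekly_check.head())\n```\n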
\n\n\n## Additional Aggregate Functions\n`summarize_by_time()` can take additional aggregate functions in the `agg_func` argument.\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# summarize by time with additional aggregate functions\nsummarized_multiple_agg_df = m4_daily_df \\\n\t.summarize_by_time(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tfreq = 'W',\n\t\tagg_func = ['sum', 'min', 'max']\n\t)\n\nsummarized_multiple_agg_df.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue_sumvalue_minvalue_max
01978-06-2527328.129103.129115.62
11978-07-0263621.889046.889115.62
21978-07-0963334.389028.129096.88
31978-07-1663737.519075.009146.88
41978-07-2364718.769171.889315.62
\n
\n```\n:::\n:::\n\n\n## Summarize by Time with Grouped Time Series\n`summarize_by_time()` also works with groups.\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# summarize by time with groups and additional aggregate functions\ngrouped_summarized_df = (\n m4_daily_df\n .groupby('id')\n .summarize_by_time(\n date_column = 'date',\n value_column = 'value',\n freq = 'W',\n agg_func = [\n 'sum',\n 'min',\n ('q25', lambda x: np.quantile(x, 0.25)),\n\t\t\t\t'median',\n ('q75', lambda x: np.quantile(x, 0.75)),\n 'max'\n ],\n )\n)\n\ngrouped_summarized_df.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevalue_sumvalue_minvalue_q25value_medianvalue_q75value_max
0D102014-07-068247.22048.72048.852061.152074.102076.2
1D102014-07-1314040.81978.82003.952007.402013.802019.1
2D102014-07-2013867.61943.01955.301988.302005.602014.5
3D102014-07-2713266.31876.01887.151891.001895.851933.3
4D102014-08-0313471.21886.21914.601920.001939.551956.7
\n
\n\n# Future Frame\n\n`future_frame()` can be used to extend time series data beyond the existing index (date). This is necessary when trying to make future predictions.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `future_frame()`\n\nUse `help(tk.future_frame)` to review additional helpful documentation.\n\n:::\n\n\n## Basic Example\nWe'll continue with our use of the `m4_daily_df` dataset. Recall we've already aggregated at the **weekly** level (`summarized_df`). Let's check out the last week in the `summarized_df`:\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# last week in dataset\nsummarized_df \\\n    .sort_values(by = 'date', ascending = True) \\\n    .iloc[: -1] \\\n    .tail(1)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
19752016-05-0117959.8
\n
\n```\n:::\n:::\n\n\n::: {.callout-note collapse=\"false\"}\n## `iloc()`\n\n`iloc[: -1]` is used to filter out the last row and keep only dates that are the start of the week.\n\n:::\n\nWe can see that the last week is the week of 2016-05-01. Now say we wanted to forecast the next 8 weeks. We can extend the dataset beyond the week of 2016-05-01:\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# extend dataset by 8 weeks\nsummarized_extended_df = summarized_df \\\n\t.future_frame(\n\t\tdate_column = 'date',\n\t\tlength_out = 8\n\t)\n\nsummarized_extended_df\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
01978-06-2527328.12
11978-07-0263621.88
21978-07-0963334.38
31978-07-1663737.51
41978-07-2364718.76
.........
19802016-06-05NaN
19812016-06-12NaN
19822016-06-19NaN
19832016-06-26NaN
19842016-07-03NaN
\n

1985 rows × 2 columns

\n
\n\nTo get only the future data, we can filter the dataset to the rows where `value` is missing (`np.nan`).\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# get only future data\nsummarized_extended_df \\\n\t.query('value.isna()')\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
19772016-05-15NaN
19782016-05-22NaN
19792016-05-29NaN
19802016-06-05NaN
19812016-06-12NaN
19822016-06-19NaN
19832016-06-26NaN
19842016-07-03NaN
\n
\n\n## Future Frame with Grouped Time Series\n`future_frame()` also works for grouped time series. We can see an example using our grouped summarized dataset (`grouped_summarized_df`) from earlier:\n\n::: {.cell execution_count=8}\n``` {.python .cell-code}\n# future frame with grouped time series\ngrouped_summarized_df[['id', 'date', 'value_sum']] \\\n\t.groupby('id') \\\n\t.future_frame(\n\t\tdate_column = 'date',\n\t\tlength_out = 8\n\t) \\\n\t.query('value_sum.isna()') # filtering to return only the future data\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevalue_sum
97D102016-05-15NaN
98D102016-05-22NaN
99D102016-05-29NaN
100D102016-06-05NaN
101D102016-06-12NaN
102D102016-06-19NaN
103D102016-06-26NaN
104D102016-07-03NaN
600D1602011-07-10NaN
601D1602011-07-17NaN
602D1602011-07-24NaN
603D1602011-07-31NaN
604D1602011-08-07NaN
605D1602011-08-14NaN
606D1602011-08-21NaN
607D1602011-08-28NaN
98D4101980-05-11NaN
99D4101980-05-18NaN
100D4101980-05-25NaN
101D4101980-06-01NaN
102D4101980-06-08NaN
103D4101980-06-15NaN
104D4101980-06-22NaN
105D4101980-06-29NaN
600D5002012-09-30NaN
601D5002012-10-07NaN
602D5002012-10-14NaN
603D5002012-10-21NaN
604D5002012-10-28NaN
605D5002012-11-04NaN
606D5002012-11-11NaN
607D5002012-11-18NaN
\n
\n```\n:::\n:::\n\n\n# Pad by Time\n\n`pad_by_time()` can be used to add rows where timestamps are missing, for example when working with sales data that may have missing values on weekends or holidays.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `pad_by_time()`\n\nUse `help(tk.pad_by_time)` to review additional helpful documentation.\n\n:::\n\n## Basic Example\nLet's start with a basic example to see how `pad_by_time()` works. We'll create some sample data with missing timestamps:\n\n::: {.cell execution_count=9}\n``` {.python .cell-code}\n# libraries\nimport pytimetk as tk\nimport pandas as pd\nimport numpy as np\n\n# sample quarterly data with missing timestamp for Q3\ndates = pd.to_datetime([\"2021-01-01\", \"2021-04-01\", \"2021-10-01\"])\nvalue = range(len(dates))\n\ndf = pd.DataFrame({\n    'date': dates,\n    'value': value\n})\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
02021-01-010
12021-04-011
22021-10-012
\n
\n```\n:::\n:::\n\n\nNow we can use `pad_by_time()` to fill in the missing timestamp:\n\n::: {.cell execution_count=10}\n``` {.python .cell-code}\n# pad by time\ndf \\\n\t.pad_by_time(\n\t\tdate_column = 'date',\n\t\tfreq = 'QS' # specifying quarter start frequency\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
02021-01-010.0
12021-04-011.0
22021-07-01NaN
32021-10-012.0
\n
\n```\n:::\n:::\n\n\nWe can also specify a shorter time frequency:\n\n::: {.cell execution_count=11}\n``` {.python .cell-code}\n# pad by time with shorter frequency\ndf \\\n\t.pad_by_time(\n\t\tdate_column = 'date',\n\t\tfreq = 'MS' # specifying month start frequency\n\t) \\\n\t.assign(value = lambda x: x['value'].fillna(0)) # replace NaN with 0\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
02021-01-010.0
12021-02-010.0
22021-03-010.0
32021-04-011.0
42021-05-010.0
52021-06-010.0
62021-07-010.0
72021-08-010.0
82021-09-010.0
92021-10-012.0
\n
\n\n## Pad by Time with Grouped Time Series\n`pad_by_time()` can also be used with grouped time series. Let's use the `stocks_daily` dataset to showcase an example:\n\n::: {.cell execution_count=12}\n``` {.python .cell-code}\n# load dataset\nstocks_df = tk.load_dataset('stocks_daily', parse_dates = ['date'])\n\n# pad by time\nstocks_df \\\n\t.groupby('symbol') \\\n\t.pad_by_time(\n\t\tdate_column = 'date',\n\t\tfreq = 'D'\n\t) \\\n\t.assign(id = lambda x: x['symbol'].ffill())\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datesymbolopenhighlowclosevolumeadjustedid
02013-01-02AAPL19.77928519.82142819.34392919.608213560518000.016.791180AAPL
12013-01-03AAPL19.56714219.63107119.32142819.360714352965200.016.579241AAPL
22013-01-04AAPL19.17750019.23678618.77964218.821428594333600.016.117437AAPL
32013-01-05AAPLNaNNaNNaNNaNNaNNaNAAPL
42013-01-06AAPLNaNNaNNaNNaNNaNNaNAAPL
..............................
234852023-09-17NVDANaNNaNNaNNaNNaNNaNNVDA
234862023-09-18NVDA427.480011442.420013420.000000439.66000450027100.0439.660004NVDA
234872023-09-19NVDA438.329987439.660004430.019989435.20001237306400.0435.200012NVDA
234882023-09-20NVDA436.000000439.029999422.230011422.39001536710800.0422.390015NVDA
234892023-09-21NVDA415.829987421.000000409.799988410.17001344893000.0410.170013NVDA
\n

23490 rows × 9 columns

\n
\n```\n:::\n:::\n\n\nTo replace NaN with 0 in a dataframe with multiple columns:\n\n::: {.cell execution_count=13}\n``` {.python .cell-code}\nfrom functools import partial\n\n# columns to replace NaN with 0\ncols_to_fill = ['open', 'high', 'low', 'close', 'volume', 'adjusted']\n\n# define a function to fillna\ndef fill_na_col(df, col):\n return df[col].fillna(0)\n\n# pad by time and replace NaN with 0\nstocks_df \\\n\t.groupby('symbol') \\\n\t.pad_by_time(\n\t\tdate_column = 'date',\n\t\tfreq = 'D'\n\t) \\\n\t.assign(id = lambda x: x['symbol'].ffill()) \\\n\t.assign(**{col: partial(fill_na_col, col=col) for col in cols_to_fill})\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datesymbolopenhighlowclosevolumeadjustedid
02013-01-02AAPL19.77928519.82142819.34392919.608213560518000.016.791180AAPL
12013-01-03AAPL19.56714219.63107119.32142819.360714352965200.016.579241AAPL
22013-01-04AAPL19.17750019.23678618.77964218.821428594333600.016.117437AAPL
32013-01-05AAPL0.0000000.0000000.0000000.0000000.00.000000AAPL
42013-01-06AAPL0.0000000.0000000.0000000.0000000.00.000000AAPL
..............................
234852023-09-17NVDA0.0000000.0000000.0000000.0000000.00.000000NVDA
234862023-09-18NVDA427.480011442.420013420.000000439.66000450027100.0439.660004NVDA
234872023-09-19NVDA438.329987439.660004430.019989435.20001237306400.0435.200012NVDA
234882023-09-20NVDA436.000000439.029999422.230011422.39001536710800.0422.390015NVDA
234892023-09-21NVDA415.829987421.000000409.799988410.17001344893000.0410.170013NVDA
\n

23490 rows × 9 columns

\n
\n```\n:::\n:::\n\n\n# Next Steps\n\nCheck out the [Adding Features (Augmenting) Time Series Data Guide next.](/guides/05_augmenting.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development. But it's obvious the potential for `pytimetk` now in Python. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2-seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n", "supporting": [ - "04_wrangling_files\\figure-html" + "04_wrangling_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/guides/05_augmenting/execute-results/html.json b/docs/_freeze/guides/05_augmenting/execute-results/html.json index c17b7342..6a9c3d92 100644 --- a/docs/_freeze/guides/05_augmenting/execute-results/html.json +++ b/docs/_freeze/guides/05_augmenting/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "e97f3d638f4a394cb4414fb6abdf703a", + "hash": "5942d79f0504e7b3389e09644c7412bd", "result": { - "markdown": "---\ntitle: Adding Features (Augmenting)\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\nThis section will cover the `augment` set of functions, use to add many additional time series features to a dataset. We'll cover how to use the following set of functions\n\n* `augment_lags()`\n* `augment_leads()`\n* `augment_rolling()`\n* `augment_time_series_signature()`\n* `augment_holiday_signature()`\n* `augment_fourier()`\n\n# Augment Lags / Leads\n**Lags** are commonly used in time series forecasting to incorportate the past values of a feature as predictors. **Leads**, while not as common as Lags in time series might be useful in scenarios where you want to predict a future value based on other future values.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_lag()`, `augment_leads()`\n\nUse `help(tk.augment_lags)` and `help(tk.augment_leads)` to review additional helpful documentation.\n\n:::\n\n## Basic Examples\n\nAdd 1 or more lags / leads to a dataset:\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\n# import libraries\nimport pytimetk as tk\nimport pandas as pd\nimport numpy as np\nimport random\n\n# create sample data\ndates = pd.date_range(start = '2023-09-18', end = '2023-09-24')\nvalues = [random.randint(10, 50) for _ in range(7)]\n\ndf = pd.DataFrame({\n 'date': dates,\n 'value': values\n})\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
02023-09-1823
12023-09-1925
22023-09-2048
32023-09-2121
42023-09-2240
52023-09-2316
62023-09-2435
\n
\n```\n:::\n:::\n\n\nCreate lag / lead of 3 days:\n\n:::{.panel-tabset groups=\"augment-leads-lags\"}\n## Lag\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# augment lag\ndf \\\n .augment_lags(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tlags = 3\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lag_3
02023-09-1823NaN
12023-09-1925NaN
22023-09-2048NaN
32023-09-212123.0
42023-09-224025.0
52023-09-231648.0
62023-09-243521.0
\n
\n```\n:::\n:::\n\n\n## Lead\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# augment leads\ndf \\\n .augment_leads(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tleads = 3\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lead_3
02023-09-182321.0
12023-09-192540.0
22023-09-204816.0
32023-09-212135.0
42023-09-2240NaN
52023-09-2316NaN
62023-09-2435NaN
\n
\n```\n:::\n:::\n\n\n:::\n\nWe can create multiple lag / lead values for a single time series:\n\n:::{.panel-tabset groups=\"augment-leads-lags-multiple\"}\n## Lag\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# multiple lagged values for a single time series\ndf \\\n\t.augment_lags(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tlags = (1, 3)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lag_1value_lag_2value_lag_3
02023-09-1823NaNNaNNaN
12023-09-192523.0NaNNaN
22023-09-204825.023.0NaN
32023-09-212148.025.023.0
42023-09-224021.048.025.0
52023-09-231640.021.048.0
62023-09-243516.040.021.0
\n
\n```\n:::\n:::\n\n\n## Lead\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# multiple leads values for a single time series\ndf \\\n\t.augment_leads(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tleads = (1, 3)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lead_1value_lead_2value_lead_3
02023-09-182325.048.021.0
12023-09-192548.021.040.0
22023-09-204821.040.016.0
32023-09-212140.016.035.0
42023-09-224016.035.0NaN
52023-09-231635.0NaNNaN
62023-09-2435NaNNaNNaN
\n
\n```\n:::\n:::\n\n\n:::\n\n\n## Augment Lags / Leads For Grouped Time Series\n\n`augment_lags()` and `augment_leads()` also works for grouped time series data. Lets use the `m4_daily_df` dataset to showcase examples:\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# load m4_daily_df\nm4_daily_df = tk.load_dataset('m4_daily', parse_dates = ['date'])\n```\n:::\n\n\n:::{.panel-tabset groups=\"augment-leads-lags-group\"}\n## Lag\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# agument lags for grouped time series\nm4_daily_df \\\n\t.groupby(\"id\") \\\n .augment_lags(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tlags = (1, 7)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_lag_1value_lag_2value_lag_3value_lag_4value_lag_5value_lag_6value_lag_7
0D102014-07-032076.2NaNNaNNaNNaNNaNNaNNaN
1D102014-07-042073.42076.2NaNNaNNaNNaNNaNNaN
2D102014-07-052048.72073.42076.2NaNNaNNaNNaNNaN
3D102014-07-062048.92048.72073.42076.2NaNNaNNaNNaN
4D102014-07-072006.42048.92048.72073.42076.2NaNNaNNaN
.................................
9738D5002012-09-199418.89431.99437.79474.69359.29286.99265.49091.4
9739D5002012-09-209365.79418.89431.99437.79474.69359.29286.99265.4
9740D5002012-09-219445.99365.79418.89431.99437.79474.69359.29286.9
9741D5002012-09-229497.99445.99365.79418.89431.99437.79474.69359.2
9742D5002012-09-239545.39497.99445.99365.79418.89431.99437.79474.6
\n

9743 rows × 10 columns

\n
\n```\n:::\n:::\n\n\n## Lead\n\n::: {.cell execution_count=8}\n``` {.python .cell-code}\n# augment leads for grouped time series\nm4_daily_df \\\n\t.groupby(\"id\") \\\n .augment_leads(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tleads = (1, 7)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_lead_1value_lead_2value_lead_3value_lead_4value_lead_5value_lead_6value_lead_7
0D102014-07-032076.22073.42048.72048.92006.42017.62019.12007.4
1D102014-07-042073.42048.72048.92006.42017.62019.12007.42010.0
2D102014-07-052048.72048.92006.42017.62019.12007.42010.02001.5
3D102014-07-062048.92006.42017.62019.12007.42010.02001.51978.8
4D102014-07-072006.42017.62019.12007.42010.02001.51978.81988.3
.................................
9738D5002012-09-199418.89365.79445.99497.99545.3NaNNaNNaN
9739D5002012-09-209365.79445.99497.99545.3NaNNaNNaNNaN
9740D5002012-09-219445.99497.99545.3NaNNaNNaNNaNNaN
9741D5002012-09-229497.99545.3NaNNaNNaNNaNNaNNaN
9742D5002012-09-239545.3NaNNaNNaNNaNNaNNaNNaN
\n

9743 rows × 10 columns

\n
\n```\n:::\n:::\n\n\n:::\n\n\n# Augment Rolling\n\nA **Rolling Window** refers to a specific-sized subset of time series data that moves sequentially over the dataset.\n\nRolling windows play a crucial role in time series forecasting due to their ability to smooth out data, highlight seasonality, and detect anomalies.\n\n`augment_rolling()` applies multiple rolling window functions with varying window sizes to time series data.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_rolling()`\n\nUse `help(tk.augment_rolling)` to review additional helpful documentation.\n\n:::\n\n\n## Basic Examples\n\nWe'll continue with the use of our sample `df` created earlier:\n\n::: {.cell execution_count=9}\n``` {.python .cell-code}\n# window = 3 days, window function = mean\ndf \\\n\t.augment_rolling(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\twindow = 3,\n\t\twindow_func = 'mean'\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_rolling_mean_win_3
02023-09-1823NaN
12023-09-1925NaN
22023-09-204832.000000
32023-09-212131.333333
42023-09-224036.333333
52023-09-231625.666667
62023-09-243530.333333
\n
\n```\n:::\n:::\n\n\nIt is important to understand how the `center` parameter in `augment_rolling()` works.\n\n::: {.callout-important collapse=\"false\"}\n## `center`\n\nWhen set to `True` (default) the value of the rolling window will be **centered**, meaning that the value at the center of the window will be used as the result.\nWhen set to `False` (default) the rolling window will **not be centered**, meaning that the value at the end of the window will be used as the result.\n\n:::\n\nLets see an example:\n\n:::{.panel-tabset groups=\"augment-rolling\"}\n\n## Augment Rolling: Center = True\n\n::: {.cell execution_count=10}\n``` {.python .cell-code}\n# agument rolling: center = true\ndf \\\n\t.augment_rolling(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\twindow = 3,\n\t\twindow_func = 'mean',\n\t\tcenter = True\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_rolling_mean_win_3
02023-09-1823NaN
12023-09-192532.000000
22023-09-204831.333333
32023-09-212136.333333
42023-09-224025.666667
52023-09-231630.333333
62023-09-2435NaN
\n
\n```\n:::\n:::\n\n\nNote that we are using a 3 day rolling window and applying a `mean` to `value`. In simplier terms, `value_rolling_mean_win_3` is a 3 day rolling average of `value` with `center` set to `True`. Thus the function starts computing the `mean` from `2023-09-19`\n\n## Augment Rolling: Center = False\n\n::: {.cell execution_count=11}\n``` {.python .cell-code}\n# agument rolling: center = false\ndf \\\n\t.augment_rolling(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\twindow = 3,\n\t\twindow_func = 'mean',\n\t\tcenter = True\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_rolling_mean_win_3
02023-09-1823NaN
12023-09-192532.000000
22023-09-204831.333333
32023-09-212136.333333
42023-09-224025.666667
52023-09-231630.333333
62023-09-2435NaN
\n
\n```\n:::\n:::\n\n\nNote that we are using a 3 day rolling window and applying a `mean` to `value`. In simplier terms, `value_rolling_mean_win_3` is a 3 day rolling average of `value` with `center` set to `False`. Thus the function starts computing the `mean` from `2023-09-20`. The same `value` for `2023-19-18` and `2023-09-19` are returned as `value_rolling_mean_win_3` since it did not detected the third to apply the 3 day rolling average.\n\n:::\n\n\n## Augment Rolling with Multiple Windows and Window Functions\n\nMultiple window functions can be passed to the `window` and `window_func` parameters:\n\n::: {.cell execution_count=12}\n``` {.python .cell-code}\n# augment rolling: window of 2 & 7 days, window_func of mean and standard deviation\nm4_daily_df \\\n\t.query('id == \"D10\"') \\\n .augment_rolling(\n date_column = 'date',\n value_column = 'value',\n window = [2,7],\n window_func = ['mean', ('std', lambda x: x.std())]\n )\n\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_rolling_mean_win_2value_rolling_std_win_2value_rolling_mean_win_7value_rolling_std_win_7
0D102014-07-032076.2NaNNaNNaNNaN
1D102014-07-042073.42074.801.402074.8000001.400000
2D102014-07-052048.72061.0512.352066.10000012.356645
3D102014-07-062048.92048.800.102061.80000013.037830
4D102014-07-072006.42027.6521.252050.72000025.041038
........................
669D102016-05-022630.72615.8514.852579.47142928.868159
670D102016-05-032649.32640.009.302594.80000033.081631
671D102016-05-042631.82640.558.752601.37142935.145563
672D102016-05-052622.52627.154.652607.45714334.584508
673D102016-05-062620.12621.301.202618.32857122.923270
\n

674 rows × 7 columns

\n
\n```\n:::\n:::\n\n\n## Augment Rolling with Grouped Time Series\n\n`agument_rolling` can be used on grouped time series data:\n\n::: {.cell execution_count=13}\n``` {.python .cell-code}\n## augment rolling on grouped time series: window of 2 & 7 days, window_func of mean and standard deviation\nm4_daily_df \\\n\t.groupby('id') \\\n .augment_rolling(\n date_column = 'date',\n value_column = 'value',\n window = [2,7],\n window_func = ['mean', ('std', lambda x: x.std())]\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_rolling_mean_win_2value_rolling_std_win_2value_rolling_mean_win_7value_rolling_std_win_7
0D102014-07-032076.2NaNNaNNaNNaN
1D102014-07-042073.42074.801.402074.8000001.400000
2D102014-07-052048.72061.0512.352066.10000012.356645
3D102014-07-062048.92048.800.102061.80000013.037830
4D102014-07-072006.42027.6521.252050.72000025.041038
........................
9738D5002012-09-199418.89425.356.559382.07142974.335988
9739D5002012-09-209365.79392.2526.559396.40000058.431303
9740D5002012-09-219445.99405.8040.109419.11428639.184451
9741D5002012-09-229497.99471.9026.009438.92857138.945336
9742D5002012-09-239545.39521.6023.709449.02857153.379416
\n

9743 rows × 7 columns

\n
\n```\n:::\n:::\n\n\n# Augment Time Series Signature\n\n`augment_timeseries_signature()` is designed to assist in generating additional features\nfrom a given date column.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_timeseries_signature()`\n\nUse `help(tk.augment_timeseries_signature)` to review additional helpful documentation.\n\n:::\n\n## Basic Example\n\nWe'll showcase an example using the `m4_daily_df` dataset by generating 29 additional features from the `date` column:\n\n::: {.cell execution_count=14}\n``` {.python .cell-code}\n# augment time series signature\nm4_daily_df \\\n .query('id == \"D10\"') \\\n\t.augment_timeseries_signature(\n\t\tdate_column = 'date'\n\t) \\\n .head()\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluedate_index_numdate_yeardate_year_isodate_yearstartdate_yearenddate_leapyeardate_half...date_mdaydate_qdaydate_ydaydate_weekenddate_hourdate_minutedate_seconddate_mseconddate_nseconddate_am_pm
0D102014-07-032076.21404345600201420140002...33184000000am
1D102014-07-042073.41404432000201420140002...44185000000am
2D102014-07-052048.71404518400201420140002...55186000000am
3D102014-07-062048.91404604800201420140002...66187100000am
4D102014-07-072006.41404691200201420140002...77188000000am
\n

5 rows × 32 columns

\n
\n```\n:::\n:::\n\n\n# Augment Holiday Signature\n\n`augment_holiday_signature()` is used to flag holidays from a date column based on date and country.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_holiday_signature()`\n\nUse `help(tk.augment_holiday_signature)` to review additional helpful documentation.\n\n:::\n\n## Basic Example\n\nWe'll showcase an example using some sample data:\n\n::: {.cell execution_count=15}\n``` {.python .cell-code}\n# create sample data\ndates = pd.date_range(start = '2022-12-25', end = '2023-01-05')\n\ndf = pd.DataFrame({'date': dates})\n\n# augment time series signature: USA\ndf \\\n .augment_holiday_signature(\n\t\tdate_column = 'date',\n\t\tcountry_name = 'UnitedStates'\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
dateis_holidaybefore_holidayafter_holidayholiday_name
02022-12-25110Christmas Day
12022-12-26101Christmas Day (Observed)
22022-12-27001NaN
32022-12-28000NaN
42022-12-29000NaN
52022-12-30000NaN
62022-12-31010NaN
72023-01-01110New Year's Day
82023-01-02101New Year's Day (Observed)
92023-01-03001NaN
102023-01-04000NaN
112023-01-05000NaN
\n
\n```\n:::\n:::\n\n\n# Augment Fourier\n\nComing soon ....\n\n", + "markdown": "---\ntitle: Adding Features (Augmenting)\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\n---\n\nThis section will cover the `augment` set of functions, used to add many additional time series features to a dataset. We'll cover how to use the following set of functions:\n\n* `augment_lags()`\n* `augment_leads()`\n* `augment_rolling()`\n* `augment_time_series_signature()`\n* `augment_holiday_signature()`\n* `augment_fourier()`\n\n# Augment Lags / Leads\n**Lags** are commonly used in time series forecasting to incorporate the past values of a feature as predictors. **Leads**, while not as common as lags in time series, might be useful in scenarios where you want to predict a future value based on other future values.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_lag()`, `augment_leads()`\n\nUse `help(tk.augment_lags)` and `help(tk.augment_leads)` to review additional helpful documentation.\n\n:::\n\n## Basic Examples\n\nAdd 1 or more lags / leads to a dataset:\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\n# import libraries\nimport pytimetk as tk\nimport pandas as pd\nimport numpy as np\nimport random\n\n# create sample data\ndates = pd.date_range(start = '2023-09-18', end = '2023-09-24')\nvalues = [random.randint(10, 50) for _ in range(7)]\n\ndf = pd.DataFrame({\n    'date': dates,\n    'value': values\n})\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevalue
02023-09-1811
12023-09-1931
22023-09-2047
32023-09-2134
42023-09-2230
52023-09-2330
62023-09-2438
\n
\n\nCreate lag / lead of 3 days:\n\n:::{.panel-tabset groups=\"augment-leads-lags\"}\n## Lag\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# augment lag\ndf \\\n    .augment_lags(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tlags = 3\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lag_3
02023-09-1811NaN
12023-09-1931NaN
22023-09-2047NaN
32023-09-213411.0
42023-09-223031.0
52023-09-233047.0
62023-09-243834.0
\n
\n```\n:::\n:::\n\n\n## Lead\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# augment leads\ndf \\\n .augment_leads(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tleads = 3\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lead_3
02023-09-181134.0
12023-09-193130.0
22023-09-204730.0
32023-09-213438.0
42023-09-2230NaN
52023-09-2330NaN
62023-09-2438NaN
\n
\n```\n:::\n:::\n\n\n:::\n\nWe can create multiple lag / lead values for a single time series:\n\n:::{.panel-tabset groups=\"augment-leads-lags-multiple\"}\n## Lag\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# multiple lagged values for a single time series\ndf \\\n\t.augment_lags(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tlags = (1, 3)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lag_1value_lag_2value_lag_3
02023-09-1811NaNNaNNaN
12023-09-193111.0NaNNaN
22023-09-204731.011.0NaN
32023-09-213447.031.011.0
42023-09-223034.047.031.0
52023-09-233030.034.047.0
62023-09-243830.030.034.0
\n
\n```\n:::\n:::\n\n\n## Lead\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# multiple leads values for a single time series\ndf \\\n\t.augment_leads(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tleads = (1, 3)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_lead_1value_lead_2value_lead_3
02023-09-181131.047.034.0
12023-09-193147.034.030.0
22023-09-204734.030.030.0
32023-09-213430.030.038.0
42023-09-223030.038.0NaN
52023-09-233038.0NaNNaN
62023-09-2438NaNNaNNaN
\n
\n\n\n## Augment Lags / Leads For Grouped Time Series\n\n`augment_lags()` and `augment_leads()` also work for grouped time series data. Let's use the `m4_daily_df` dataset to showcase examples:\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# load m4_daily_df\nm4_daily_df = tk.load_dataset('m4_daily', parse_dates = ['date'])\n```\n:::\n\n\n:::{.panel-tabset groups=\"augment-leads-lags-group\"}\n## Lag\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# augment lags for grouped time series\nm4_daily_df \\\n\t.groupby(\"id\") \\\n    .augment_lags(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tlags = (1, 7)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_lag_1value_lag_2value_lag_3value_lag_4value_lag_5value_lag_6value_lag_7
0D102014-07-032076.2NaNNaNNaNNaNNaNNaNNaN
1D102014-07-042073.42076.2NaNNaNNaNNaNNaNNaN
2D102014-07-052048.72073.42076.2NaNNaNNaNNaNNaN
3D102014-07-062048.92048.72073.42076.2NaNNaNNaNNaN
4D102014-07-072006.42048.92048.72073.42076.2NaNNaNNaN
.................................
9738D5002012-09-199418.89431.99437.79474.69359.29286.99265.49091.4
9739D5002012-09-209365.79418.89431.99437.79474.69359.29286.99265.4
9740D5002012-09-219445.99365.79418.89431.99437.79474.69359.29286.9
9741D5002012-09-229497.99445.99365.79418.89431.99437.79474.69359.2
9742D5002012-09-239545.39497.99445.99365.79418.89431.99437.79474.6
\n

9743 rows × 10 columns

\n
\n```\n:::\n:::\n\n\n## Lead\n\n::: {.cell execution_count=8}\n``` {.python .cell-code}\n# augment leads for grouped time series\nm4_daily_df \\\n\t.groupby(\"id\") \\\n .augment_leads(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\tleads = (1, 7)\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_lead_1value_lead_2value_lead_3value_lead_4value_lead_5value_lead_6value_lead_7
0D102014-07-032076.22073.42048.72048.92006.42017.62019.12007.4
1D102014-07-042073.42048.72048.92006.42017.62019.12007.42010.0
2D102014-07-052048.72048.92006.42017.62019.12007.42010.02001.5
3D102014-07-062048.92006.42017.62019.12007.42010.02001.51978.8
4D102014-07-072006.42017.62019.12007.42010.02001.51978.81988.3
.................................
9738D5002012-09-199418.89365.79445.99497.99545.3NaNNaNNaN
9739D5002012-09-209365.79445.99497.99545.3NaNNaNNaNNaN
9740D5002012-09-219445.99497.99545.3NaNNaNNaNNaNNaN
9741D5002012-09-229497.99545.3NaNNaNNaNNaNNaNNaN
9742D5002012-09-239545.3NaNNaNNaNNaNNaNNaNNaN
\n

9743 rows × 10 columns

\n
\n```\n:::\n:::\n\n\n:::\n\n\n# Augment Rolling\n\nA **Rolling Window** is a fixed-size subset of time series data that moves sequentially over the dataset.\n\nRolling windows play a crucial role in time series forecasting due to their ability to smooth out data, highlight seasonality, and detect anomalies.\n\n`augment_rolling()` applies multiple rolling window functions with varying window sizes to time series data.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_rolling()`\n\nUse `help(tk.augment_rolling)` to review additional helpful documentation.\n\n:::\n\n\n## Basic Examples\n\nWe'll continue with the use of our sample `df` created earlier:\n\n::: {.cell execution_count=9}\n``` {.python .cell-code}\n# window = 3 days, window function = mean\ndf \\\n\t.augment_rolling(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\twindow = 3,\n\t\twindow_func = 'mean'\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_rolling_mean_win_3
02023-09-1811NaN
12023-09-1931NaN
22023-09-204729.666667
32023-09-213437.333333
42023-09-223037.000000
52023-09-233031.333333
62023-09-243832.666667
\n
\n\nIt is important to understand how the `center` parameter in `augment_rolling()` works.\n\n::: {.callout-important collapse=\"false\"}\n## `center`\n\nWhen set to `True`, the rolling window is **centered**, meaning that the value at the center of the window is used as the result.\nWhen set to `False` (the default), the rolling window is **not centered**, meaning that the value at the end of the window is used as the result.\n\n:::\n\nLet's see an example:\n\n:::{.panel-tabset groups=\"augment-rolling\"}\n\n## Augment Rolling: Center = True\n\n::: {.cell execution_count=10}\n``` {.python .cell-code}\n# augment rolling: center = true\ndf \\\n\t.augment_rolling(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\twindow = 3,\n\t\twindow_func = 'mean',\n\t\tcenter = True\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_rolling_mean_win_3
02023-09-1811NaN
12023-09-193129.666667
22023-09-204737.333333
32023-09-213437.000000
42023-09-223031.333333
52023-09-233032.666667
62023-09-2438NaN
\n
\n```\n:::\n:::\n\n\nNote that we are using a 3 day rolling window and applying a `mean` to `value`. In simpler terms, `value_rolling_mean_win_3` is a 3 day rolling average of `value` with `center` set to `True`. Thus the function starts computing the `mean` from `2023-09-19`.\n\n## Augment Rolling: Center = False\n\n::: {.cell execution_count=11}\n``` {.python .cell-code}\n# augment rolling: center = false\ndf \\\n\t.augment_rolling(\n\t\tdate_column = 'date',\n\t\tvalue_column = 'value',\n\t\twindow = 3,\n\t\twindow_func = 'mean',\n\t\tcenter = False\n\t)\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
datevaluevalue_rolling_mean_win_3
02023-09-1811NaN
12023-09-1931NaN
22023-09-204729.666667
32023-09-213437.333333
42023-09-223037.000000
52023-09-233031.333333
62023-09-243832.666667
\n
\n```\n:::\n:::\n\n\nNote that we are using a 3 day rolling window and applying a `mean` to `value`. In simpler terms, `value_rolling_mean_win_3` is a 3 day rolling average of `value` with `center` set to `False`. Thus the function starts computing the `mean` from `2023-09-20`. `NaN` is returned as `value_rolling_mean_win_3` for `2023-09-18` and `2023-09-19` since the window does not yet contain the three values needed for the 3 day rolling average.\n\n:::\n\n\n## Augment Rolling with Multiple Windows and Window Functions\n\nMultiple window functions can be passed to the `window` and `window_func` parameters:\n\n::: {.cell execution_count=12}\n``` {.python .cell-code}\n# augment rolling: window of 2 & 7 days, window_func of mean and standard deviation\nm4_daily_df \\\n\t.query('id == \"D10\"') \\\n    .augment_rolling(\n        date_column = 'date',\n        value_column = 'value',\n        window = [2,7],\n        window_func = ['mean', ('std', lambda x: x.std())]\n    )\n\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_rolling_mean_win_2value_rolling_std_win_2value_rolling_mean_win_7value_rolling_std_win_7
0D102014-07-032076.2NaNNaNNaNNaN
1D102014-07-042073.42074.801.402074.8000001.400000
2D102014-07-052048.72061.0512.352066.10000012.356645
3D102014-07-062048.92048.800.102061.80000013.037830
4D102014-07-072006.42027.6521.252050.72000025.041038
........................
669D102016-05-022630.72615.8514.852579.47142928.868159
670D102016-05-032649.32640.009.302594.80000033.081631
671D102016-05-042631.82640.558.752601.37142935.145563
672D102016-05-052622.52627.154.652607.45714334.584508
673D102016-05-062620.12621.301.202618.32857122.923270
\n

674 rows × 7 columns

\n
\n\n## Augment Rolling with Grouped Time Series\n\n`augment_rolling()` can be used on grouped time series data:\n\n::: {.cell execution_count=13}\n``` {.python .cell-code}\n## augment rolling on grouped time series: window of 2 & 7 days, window_func of mean and standard deviation\nm4_daily_df \\\n\t.groupby('id') \\\n    .augment_rolling(\n        date_column = 'date',\n        value_column = 'value',\n        window = [2,7],\n        window_func = ['mean', ('std', lambda x: x.std())]\n    )\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n<div>
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluevalue_rolling_mean_win_2value_rolling_std_win_2value_rolling_mean_win_7value_rolling_std_win_7
0D102014-07-032076.2NaNNaNNaNNaN
1D102014-07-042073.42074.801.402074.8000001.400000
2D102014-07-052048.72061.0512.352066.10000012.356645
3D102014-07-062048.92048.800.102061.80000013.037830
4D102014-07-072006.42027.6521.252050.72000025.041038
........................
9738D5002012-09-199418.89425.356.559382.07142974.335988
9739D5002012-09-209365.79392.2526.559396.40000058.431303
9740D5002012-09-219445.99405.8040.109419.11428639.184451
9741D5002012-09-229497.99471.9026.009438.92857138.945336
9742D5002012-09-239545.39521.6023.709449.02857153.379416
\n

9743 rows × 7 columns

\n
\n```\n:::\n:::\n\n\n# Augment Time Series Signature\n\n`augment_timeseries_signature()` is designed to assist in generating additional features\nfrom a given date column.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_timeseries_signature()`\n\nUse `help(tk.augment_timeseries_signature)` to review additional helpful documentation.\n\n:::\n\n## Basic Example\n\nWe'll showcase an example using the `m4_daily_df` dataset by generating 29 additional features from the `date` column:\n\n::: {.cell execution_count=14}\n``` {.python .cell-code}\n# augment time series signature\nm4_daily_df \\\n .query('id == \"D10\"') \\\n\t.augment_timeseries_signature(\n\t\tdate_column = 'date'\n\t) \\\n .head()\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
iddatevaluedate_index_numdate_yeardate_year_isodate_yearstartdate_yearenddate_leapyeardate_half...date_mdaydate_qdaydate_ydaydate_weekenddate_hourdate_minutedate_seconddate_mseconddate_nseconddate_am_pm
0D102014-07-032076.21404345600201420140002...33184000000am
1D102014-07-042073.41404432000201420140002...44185000000am
2D102014-07-052048.71404518400201420140002...55186000000am
3D102014-07-062048.91404604800201420140002...66187100000am
4D102014-07-072006.41404691200201420140002...77188000000am
\n

5 rows × 32 columns

\n
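Because the signature columns are mostly integer flags and ordinals, they can feed a feature matrix directly. The sketch below is my own illustration (not from the docs): it assumes the augmented result above is assigned to a hypothetical `signature_df`, and it adds a scikit-learn dependency.

```python
from sklearn.linear_model import LinearRegression

# Assumed: the augmented frame from the example above
signature_df = (
    m4_daily_df
    .query('id == "D10"')
    .augment_timeseries_signature(date_column='date')
)

# Use only the numeric signature columns as predictors
X = signature_df.select_dtypes(include='number').drop(columns=['value'])
y = signature_df['value']

model = LinearRegression().fit(X, y)
```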
\n```\n:::\n:::\n\n\n# Augment Holiday Signature\n\n`augment_holiday_signature()` flags holidays in a date column for a given country.\n\n::: {.callout-tip collapse=\"false\"}\n## Help Doc Info: `augment_holiday_signature()`\n\nUse `help(tk.augment_holiday_signature)` to review additional helpful documentation.\n\n:::\n\n## Basic Example\n\nWe'll showcase an example using some sample data:\n\n::: {.cell execution_count=15}\n``` {.python .cell-code}\n# create sample data\ndates = pd.date_range(start = '2022-12-25', end = '2023-01-05')\n\ndf = pd.DataFrame({'date': dates})\n\n# augment holiday signature: USA\ndf \\\n    .augment_holiday_signature(\n        date_column = 'date',\n        country_name = 'UnitedStates'\n    )\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
|    | date | is_holiday | before_holiday | after_holiday | holiday_name |
|----|------|------------|----------------|---------------|--------------|
| 0 | 2022-12-25 | 1 | 1 | 0 | Christmas Day |
| 1 | 2022-12-26 | 1 | 0 | 1 | Christmas Day (Observed) |
| 2 | 2022-12-27 | 0 | 0 | 1 | NaN |
| 3 | 2022-12-28 | 0 | 0 | 0 | NaN |
| 4 | 2022-12-29 | 0 | 0 | 0 | NaN |
| 5 | 2022-12-30 | 0 | 0 | 0 | NaN |
| 6 | 2022-12-31 | 0 | 1 | 0 | NaN |
| 7 | 2023-01-01 | 1 | 1 | 0 | New Year's Day |
| 8 | 2023-01-02 | 1 | 0 | 1 | New Year's Day (Observed) |
| 9 | 2023-01-03 | 0 | 0 | 1 | NaN |
| 10 | 2023-01-04 | 0 | 0 | 0 | NaN |
| 11 | 2023-01-05 | 0 | 0 | 0 | NaN |
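The engineered columns are ordinary pandas columns, so standard filtering applies. A small usage sketch (the name `df_holidays` is hypothetical; it assumes the augmented result above is assigned rather than just displayed):

```python
# Keep only days that are holidays or sit next to one
df_holidays = df.augment_holiday_signature(
    date_column  = 'date',
    country_name = 'UnitedStates'
)

near_holiday = df_holidays.query(
    'is_holiday == 1 or before_holiday == 1 or after_holiday == 1'
)
```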
\n```\n:::\n:::\n\n\n# Augment Fourier\n\nComing soon...\n\n", "supporting": [ - "05_augmenting_files\\figure-html" + "05_augmenting_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/index/execute-results/html.json index 7d71de90..dbcf0e0a 100644 --- a/docs/_freeze/index/execute-results/html.json +++ b/docs/_freeze/index/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "4e6a7ddd086db1bae62f067f59aa3a1a", + "hash": "1299954c29d34484bd91fc2c5ffa8a71", "result": { - "markdown": "---\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\ntitle: PyTimeTK \n---\n\n\n\n\n\n> The Time Series Toolkit for Python\n\n**PyTimetk's Mission:** To make time series analysis easier, faster, and more enjoyable in Python.\n\n\n\n# Installation\n\nInstall the Latest Stable Version:\n\n```bash\npip install pytimetk\n```\n\nAlternatively, install the Development GitHub Version:\n\n```bash\npip install git+https://github.com/business-science/pytimetk.git\n```\n\n# Quick Start: A Monthly Sales Analysis\n\nThis is a simple exercise to showcase the power of [`summarize_by_time()`](/reference/summarize_by_time.html):\n\n### Import Libraries & Data\n\nFirst, `import pytimetk as tk`. This gets you access to the most important functions. Use `tk.load_dataset()` to load the \"bike_sales_sample\" dataset.\n\n::: {.callout-note collapse=\"false\"}\n## About the Bike Sales Sample Dataset\n\nThis dataset contains \"orderlines\" for orders received. The `order_date` column contains timestamps. We can use this column to perform sales aggregations (e.g. total revenue).\n:::\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n\ndf = tk.load_dataset('bike_sales_sample')\ndf['order_date'] = pd.to_datetime(df['order_date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
order_idorder_lineorder_datequantitypricetotal_pricemodelcategory_1category_2frame_materialbikeshop_namecitystate
0112011-01-07160706070Jekyll Carbon 2MountainOver MountainCarbonIthaca Mountain ClimbersIthacaNY
1122011-01-07159705970Trigger Carbon 2MountainOver MountainCarbonIthaca Mountain ClimbersIthacaNY
2212011-01-10127702770Beast of the East 1MountainTrailAluminumKansas City 29ersKansas CityKS
3222011-01-10159705970Trigger Carbon 2MountainOver MountainCarbonKansas City 29ersKansas CityKS
4312011-01-1011066010660Supersix Evo Hi-Mod TeamRoadElite RoadCarbonLouisville Race EquipmentLouisvilleKY
..........................................
246132132011-12-22114101410CAAD8 105RoadElite RoadAluminumMiami Race EquipmentMiamiFL
246232212011-12-28112501250Synapse Disc TiagraRoadEndurance RoadAluminumPhoenix Bi-pedsPhoenixAZ
246332222011-12-28126602660Bad Habit 2MountainTrailAluminumPhoenix Bi-pedsPhoenixAZ
246432232011-12-28123402340F-Si 1MountainCross Country RaceAluminumPhoenix Bi-pedsPhoenixAZ
246532242011-12-28158605860Synapse Hi-Mod Dura AceRoadEndurance RoadCarbonPhoenix Bi-pedsPhoenixAZ
\n

2466 rows × 13 columns

\n
\n```\n:::\n:::\n\n\n### Using `summarize_by_time()` for a Sales Analysis\n\nYour company might be interested in sales patterns for various categories of bicycles. We can obtain a grouped monthly sales aggregation by `category_1` in two lines of code:\n\n1. First, use pandas' `groupby()` method to group the DataFrame on `category_1`.\n2. Next, use pytimetk's `summarize_by_time()` method to apply the sum function by month start (\"MS\") and use `wide_format = False` to return the DataFrame in a long format (note: long format is the default).\n\nThe result is the total revenue for Mountain and Road bikes by month. \n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\nsummary_category_1_df = df \\\n    .groupby(\"category_1\") \\\n    .summarize_by_time(\n        date_column  = 'order_date',\n        value_column = 'total_price',\n        freq         = \"MS\",\n        agg_func     = 'sum',\n        wide_format  = False\n    )\n\n# First 5 rows shown\nsummary_category_1_df.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | category_1 | order_date | total_price |
|---|------------|------------|-------------|
| 0 | Mountain | 2011-01-01 | 221490 |
| 1 | Mountain | 2011-02-01 | 660555 |
| 2 | Mountain | 2011-03-01 | 358855 |
| 3 | Mountain | 2011-04-01 | 1075975 |
| 4 | Mountain | 2011-05-01 | 450440 |
\n```\n:::\n:::\n\n\n### Visualizing Sales Patterns\n\n::: {.callout-note collapse=\"false\"}\n## Now available: `plot_timeseries()`.\n\n`plot_timeseries()` is a quick and easy way to visualize time series and make professional plots. \n:::\n\nWith the data summarized by time, we can visualize with `plot_timeseries()`. `pytimetk` functions are `groupby()` aware, meaning they understand whether your data is grouped and will operate by group. This is useful in time series, where we often deal with hundreds of time series groups. \n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\nsummary_category_1_df \\\n    .groupby('category_1') \\\n    .plot_timeseries(\n        date_column  = 'order_date',\n        value_column = 'total_price',\n        smooth_frac  = 0.8\n    )\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n# Contributing\n\nInterested in helping us make this the best Python package for time series analysis? We'd love your help. \n\n[Follow these instructions to Contribute.](/contributing.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development. But the potential for `pytimetk` in Python is already obvious. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2 seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n", + "markdown": "---\ntoc: true\ntoc-depth: 3\nnumber-sections: true\nnumber-depth: 2\ntitle: PyTimeTK \n---\n\n\n\n\n\n> The Time Series Toolkit for Python\n\n**PyTimetk's Mission:** To make time series analysis easier, faster, and more enjoyable in Python.\n\n\n\n# Installation\n\nInstall the Latest Stable Version:\n\n```bash\npip install pytimetk\n```\n\nAlternatively, install the Development GitHub Version:\n\n```bash\npip install git+https://github.com/business-science/pytimetk.git\n```\n\n# Quick Start: A Monthly Sales Analysis\n\nThis is a simple exercise to showcase the power of [`summarize_by_time()`](/reference/summarize_by_time.html):\n\n### Import Libraries & Data\n\nFirst, `import pytimetk as tk`. This gets you access to the most important functions. Use `tk.load_dataset()` to load the \"bike_sales_sample\" dataset.\n\n::: {.callout-note collapse=\"false\"}\n## About the Bike Sales Sample Dataset\n\nThis dataset contains \"orderlines\" for orders received. The `order_date` column contains timestamps. We can use this column to perform sales aggregations (e.g. total revenue).\n:::\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n\ndf = tk.load_dataset('bike_sales_sample')\ndf['order_date'] = pd.to_datetime(df['order_date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
order_idorder_lineorder_datequantitypricetotal_pricemodelcategory_1category_2frame_materialbikeshop_namecitystate
0112011-01-07160706070Jekyll Carbon 2MountainOver MountainCarbonIthaca Mountain ClimbersIthacaNY
1122011-01-07159705970Trigger Carbon 2MountainOver MountainCarbonIthaca Mountain ClimbersIthacaNY
2212011-01-10127702770Beast of the East 1MountainTrailAluminumKansas City 29ersKansas CityKS
3222011-01-10159705970Trigger Carbon 2MountainOver MountainCarbonKansas City 29ersKansas CityKS
4312011-01-1011066010660Supersix Evo Hi-Mod TeamRoadElite RoadCarbonLouisville Race EquipmentLouisvilleKY
..........................................
246132132011-12-22114101410CAAD8 105RoadElite RoadAluminumMiami Race EquipmentMiamiFL
246232212011-12-28112501250Synapse Disc TiagraRoadEndurance RoadAluminumPhoenix Bi-pedsPhoenixAZ
246332222011-12-28126602660Bad Habit 2MountainTrailAluminumPhoenix Bi-pedsPhoenixAZ
246432232011-12-28123402340F-Si 1MountainCross Country RaceAluminumPhoenix Bi-pedsPhoenixAZ
246532242011-12-28158605860Synapse Hi-Mod Dura AceRoadEndurance RoadCarbonPhoenix Bi-pedsPhoenixAZ
\n

2466 rows × 13 columns

\n
\n```\n:::\n:::\n\n\n### Using `summarize_by_time()` for a Sales Analysis\n\nYour company might be interested in sales patterns for various categories of bicycles. We can obtain a grouped monthly sales aggregation by `category_1` in two lines of code:\n\n1. First, use pandas' `groupby()` method to group the DataFrame on `category_1`.\n2. Next, use pytimetk's `summarize_by_time()` method to apply the sum function by month start (\"MS\") and use `wide_format = False` to return the DataFrame in a long format (note: long format is the default).\n\nThe result is the total revenue for Mountain and Road bikes by month. \n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\nsummary_category_1_df = df \\\n    .groupby(\"category_1\") \\\n    .summarize_by_time(\n        date_column  = 'order_date',\n        value_column = 'total_price',\n        freq         = \"MS\",\n        agg_func     = 'sum',\n        wide_format  = False\n    )\n\n# First 5 rows shown\nsummary_category_1_df.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | category_1 | order_date | total_price |
|---|------------|------------|-------------|
| 0 | Mountain | 2011-01-01 | 221490 |
| 1 | Mountain | 2011-02-01 | 660555 |
| 2 | Mountain | 2011-03-01 | 358855 |
| 3 | Mountain | 2011-04-01 | 1075975 |
| 4 | Mountain | 2011-05-01 | 450440 |
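For comparison, setting `wide_format = True` should pivot each group into its own column (one `total_price` column per `category_1` level). A hedged sketch, since the exact wide column names are pytimetk's choice and are not shown here:

```python
# Same aggregation, wide layout (one column per category) -- sketch only
summary_wide_df = df \
    .groupby("category_1") \
    .summarize_by_time(
        date_column  = 'order_date',
        value_column = 'total_price',
        freq         = "MS",
        agg_func     = 'sum',
        wide_format  = True
    )
```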
\n```\n:::\n:::\n\n\n### Visualizing Sales Patterns\n\n::: {.callout-note collapse=\"false\"}\n## Now available: `plot_timeseries()`.\n\n`plot_timeseries()` is a quick and easy way to visualize time series and make professional plots. \n:::\n\nWith the data summarized by time, we can visualize with `plot_timeseries()`. `pytimetk` functions are `groupby()` aware, meaning they understand whether your data is grouped and will operate by group. This is useful in time series, where we often deal with hundreds of time series groups. \n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\nsummary_category_1_df \\\n    .groupby('category_1') \\\n    .plot_timeseries(\n        date_column  = 'order_date',\n        value_column = 'total_price',\n        smooth_frac  = 0.8\n    )\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n# Contributing\n\nInterested in helping us make this the best Python package for time series analysis? We'd love your help. \n\n[Follow these instructions to Contribute.](/contributing.html)\n\n# More Coming Soon...\n\nWe are in the early stages of development. But it's obvious the potential for `pytimetk` now in Python. 🐍\n\n- Please [⭐ us on GitHub](https://github.com/business-science/pytimetk) (it takes 2-seconds and means a lot). \n- To make requests, please see our [Project Roadmap GH Issue #2](https://github.com/business-science/pytimetk/issues/2). You can make requests there. \n- Want to contribute? [See our contributing guide here.](/contributing.html) \n\n", "supporting": [ - "index_files\\figure-html" + "index_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/augment_holiday_signature/execute-results/html.json b/docs/_freeze/reference/augment_holiday_signature/execute-results/html.json index bbafecb6..0793cd1c 100644 --- a/docs/_freeze/reference/augment_holiday_signature/execute-results/html.json +++ b/docs/_freeze/reference/augment_holiday_signature/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "fdc0be67d1e4fb1237e81304ec5e4a11", + "hash": "01745938e168616f510e172446cc3e5d", "result": { "markdown": "---\ntitle: augment_holiday_signature\n---\n\n\n\n`augment_holiday_signature(data, date_column, country_name='UnitedStates')`\n\nEngineers 4 different holiday features from a single datetime for 80+ countries.\n\nNote: Requires the `holidays` package to be installed. See https://pypi.org/project/holidays/ for more information.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|\n| `data` | Union\\[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy\\] | The input DataFrame. | _required_ |\n| `date_column` | str | The name of the datetime-like column in the DataFrame. 
| _required_ |\n| `country_name` | str | The name of the country for which to generate holiday features. Defaults to United States holidays, but the following countries are currently available and accessible by the full name or ISO code: Any of the following are acceptable keys for `country_name`: Available Countries: Full Country, Abrv. #1, #2, #3 Angola: Angola, AO, AGO, Argentina: Argentina, AR, ARG, Aruba: Aruba, AW, ABW, Australia: Australia, AU, AUS, Austria: Austria, AT, AUT, Bangladesh: Bangladesh, BD, BGD, Belarus: Belarus, BY, BLR, Belgium: Belgium, BE, BEL, Botswana: Botswana, BW, BWA, Brazil: Brazil, BR, BRA, Bulgaria: Bulgaria, BG, BLG, Burundi: Burundi, BI, BDI, Canada: Canada, CA, CAN, Chile: Chile, CL, CHL, Colombia: Colombia, CO, COL, Croatia: Croatia, HR, HRV, Curacao: Curacao, CW, CUW, Czechia: Czechia, CZ, CZE, Denmark: Denmark, DK, DNK, Djibouti: Djibouti, DJ, DJI, Dominican Republic: DominicanRepublic, DO, DOM, Egypt: Egypt, EG, EGY, England: England, Estonia: Estonia, EE, EST, European Central Bank: EuropeanCentralBank, Finland: Finland, FI, FIN, France: France, FR, FRA, Georgia: Georgia, GE, GEO, Germany: Germany, DE, DEU, Greece: Greece, GR, GRC, Honduras: Honduras, HN, HND, Hong Kong: HongKong, HK, HKG, Hungary: Hungary, HU, HUN, Iceland: Iceland, IS, ISL, India: India, IN, IND, Ireland: Ireland, IE, IRL, Isle Of Man: IsleOfMan, Israel: Israel, IL, ISR, Italy: Italy, IT, ITA, Jamaica: Jamaica, JM, JAM, Japan: Japan, JP, JPN, Kenya: Kenya, KE, KEN, Korea: Korea, KR, KOR, Latvia: Latvia, LV, LVA, Lithuania: Lithuania, LT, LTU, Luxembourg: Luxembourg, LU, LUX, Malaysia: Malaysia, MY, MYS, Malawi: Malawi, MW, MWI, Mexico: Mexico, MX, MEX, Morocco: Morocco, MA, MOR, Mozambique: Mozambique, MZ, MOZ, Netherlands: Netherlands, NL, NLD, NewZealand: NewZealand, NZ, NZL, Nicaragua: Nicaragua, NI, NIC, Nigeria: Nigeria, NG, NGA, Northern Ireland: NorthernIreland, Norway: Norway, NO, NOR, Paraguay: Paraguay, PY, PRY, Peru: Peru, PE, PER, Poland: Poland, PL, POL, Portugal: Portugal, PT, PRT, Portugal Ext: PortugalExt, PTE, Romania: Romania, RO, ROU, Russia: Russia, RU, RUS, Saudi Arabia: SaudiArabia, SA, SAU, Scotland: Scotland, Serbia: Serbia, RS, SRB, Singapore: Singapore, SG, SGP, Slovokia: Slovokia, SK, SVK, Slovenia: Slovenia, SI, SVN, South Africa: SouthAfrica, ZA, ZAF, Spain: Spain, ES, ESP, Sweden: Sweden, SE, SWE, Switzerland: Switzerland, CH, CHE, Turkey: Turkey, TR, TUR, Ukraine: Ukraine, UA, UKR, United Arab Emirates: UnitedArabEmirates, AE, ARE, United Kingdom: UnitedKingdom, GB, GBR, UK, United States: UnitedStates, US, USA, Venezuela: Venezuela, YV, VEN, Vietnam: Vietnam, VN, VNM, Wales: Wales | `'UnitedStates'` |\n\n## Returns\n\n| Type | Description |\n|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| pd.DataFrame: | A pandas DataFrame with three holiday-specific features: - is_holiday: (0, 1) indicator for holiday - before_holiday: (0, 1) indicator for day before holiday - after_holiday: (0, 1) indicator for day after holiday - holiday_name: name of the holiday |\n\n## Example\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\n# Make a DataFrame with a date column\nstart_date = '2023-01-01'\nend_date = '2023-01-10'\ndf = pd.DataFrame(pd.date_range(start=start_date, end=end_date), 
columns=['date'])\n\n# Add holiday features for US\ntk.augment_holiday_signature(df, 'date', 'UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
|   | date | is_holiday | before_holiday | after_holiday | holiday_name |
|---|------|------------|----------------|---------------|--------------|
| 0 | 2023-01-01 | 1 | 1 | 0 | New Year's Day |
| 1 | 2023-01-02 | 1 | 0 | 1 | New Year's Day (Observed) |
| 2 | 2023-01-03 | 0 | 0 | 1 | NaN |
| 3 | 2023-01-04 | 0 | 0 | 0 | NaN |
| 4 | 2023-01-05 | 0 | 0 | 0 | NaN |
| 5 | 2023-01-06 | 0 | 0 | 0 | NaN |
| 6 | 2023-01-07 | 0 | 0 | 0 | NaN |
| 7 | 2023-01-08 | 0 | 0 | 0 | NaN |
| 8 | 2023-01-09 | 0 | 0 | 0 | NaN |
| 9 | 2023-01-10 | 0 | 0 | 0 | NaN |
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Add holiday features for France\ntk.augment_holiday_signature(df, 'date', 'France')\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | date | is_holiday | before_holiday | after_holiday | holiday_name |
|---|------|------------|----------------|---------------|--------------|
| 0 | 2023-01-01 | 1 | 0 | 0 | New Year's Day |
| 1 | 2023-01-02 | 0 | 0 | 1 | NaN |
| 2 | 2023-01-03 | 0 | 0 | 0 | NaN |
| 3 | 2023-01-04 | 0 | 0 | 0 | NaN |
| 4 | 2023-01-05 | 0 | 0 | 0 | NaN |
| 5 | 2023-01-06 | 0 | 0 | 0 | NaN |
| 6 | 2023-01-07 | 0 | 0 | 0 | NaN |
| 7 | 2023-01-08 | 0 | 0 | 0 | NaN |
| 8 | 2023-01-09 | 0 | 0 | 0 | NaN |
| 9 | 2023-01-10 | 0 | 0 | 0 | NaN |
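Per the `country_name` parameter table above, countries are accessible by full name or ISO code, so the France example can equivalently use the code:

```python
# 'FR' resolves to France per the parameter documentation above
tk.augment_holiday_signature(df, 'date', 'FR')
```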
\n```\n:::\n:::\n\n\n", "supporting": [ - "augment_holiday_signature_files\\figure-html" + "augment_holiday_signature_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/augment_lags/execute-results/html.json b/docs/_freeze/reference/augment_lags/execute-results/html.json index 497bd73a..459e9f76 100644 --- a/docs/_freeze/reference/augment_lags/execute-results/html.json +++ b/docs/_freeze/reference/augment_lags/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "5ebf27483059fea06dec32060ef4994d", + "hash": "89e9af90e2f509759d870831752d2634", "result": { "markdown": "---\ntitle: augment_lags\n---\n\n\n\n`augment_lags(data, date_column, value_column, lags=1)`\n\nAdds lags to a Pandas DataFrame or DataFrameGroupBy object.\n\nThe `augment_lags` function takes a Pandas DataFrame or GroupBy object, a date column, a value column or list of value columns, and a lag or list of lags, and adds lagged versions of the value columns to the DataFrame.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The `data` parameter is the input DataFrame or DataFrameGroupBy object that you want to add lagged columns to. | _required_ |\n| `date_column` | str | The `date_column` parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to sort the data before adding the lagged values. | _required_ |\n| `value_column` | str or list | The `value_column` parameter is the column(s) in the DataFrame that you want to add lagged values for. It can be either a single column name (string) or a list of column names. | _required_ |\n| `lags` | int or tuple or list | The `lags` parameter is an integer, tuple, or list that specifies the number of lagged values to add to the DataFrame. - If it is an integer, the function will add that number of lagged values for each column specified in the `value_column` parameter. - If it is a tuple, it will generate lags from the first to the second value (inclusive). - If it is a list, it will generate lags based on the values in the list. | `1` |\n\n## Returns\n\n| Type | Description |\n|--------------|-----------------------------------------------------|\n| pd.DataFrame | A Pandas DataFrame with lagged columns added to it. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\ndf = tk.load_dataset('m4_daily', parse_dates=['date'])\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
|   | id | date | value |
|---|----|------|-------|
| 0 | D10 | 2014-07-03 | 2076.2 |
| 1 | D10 | 2014-07-04 | 2073.4 |
| 2 | D10 | 2014-07-05 | 2048.7 |
| 3 | D10 | 2014-07-06 | 2048.9 |
| 4 | D10 | 2014-07-07 | 2006.4 |
| ... | ... | ... | ... |
| 9738 | D500 | 2012-09-19 | 9418.8 |
| 9739 | D500 | 2012-09-20 | 9365.7 |
| 9740 | D500 | 2012-09-21 | 9445.9 |
| 9741 | D500 | 2012-09-22 | 9497.9 |
| 9742 | D500 | 2012-09-23 | 9545.3 |

9743 rows × 3 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Add a lagged value of 2 for each grouped time series\nlagged_df = (\n df \n .groupby('id')\n .augment_lags(\n date_column='date',\n value_column='value',\n lags=2\n )\n)\nlagged_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | id | date | value | value_lag_2 |
|---|----|------|-------|-------------|
| 0 | D10 | 2014-07-03 | 2076.2 | NaN |
| 1 | D10 | 2014-07-04 | 2073.4 | NaN |
| 2 | D10 | 2014-07-05 | 2048.7 | 2076.2 |
| 3 | D10 | 2014-07-06 | 2048.9 | 2073.4 |
| 4 | D10 | 2014-07-07 | 2006.4 | 2048.7 |
| ... | ... | ... | ... | ... |
| 9738 | D500 | 2012-09-19 | 9418.8 | 9437.7 |
| 9739 | D500 | 2012-09-20 | 9365.7 | 9431.9 |
| 9740 | D500 | 2012-09-21 | 9445.9 | 9418.8 |
| 9741 | D500 | 2012-09-22 | 9497.9 | 9365.7 |
| 9742 | D500 | 2012-09-23 | 9545.3 | 9445.9 |

9743 rows × 4 columns
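Within each group the first rows have no history, so the lagged column starts with `NaN`. A common next step before modeling is to drop those warm-up rows; `lagged_df` is the result assigned above:

```python
# Drop rows where the lag could not be computed
model_ready_df = lagged_df.dropna(subset=['value_lag_2'])
```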
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Add 7 lagged values for a single time series\nlagged_df_single = (\n df \n .query('id == \"D10\"')\n .augment_lags(\n date_column='date',\n value_column='value',\n lags=(1, 7)\n )\n)\nlagged_df_single\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
iddatevaluevalue_lag_1value_lag_2value_lag_3value_lag_4value_lag_5value_lag_6value_lag_7
0D102014-07-032076.2NaNNaNNaNNaNNaNNaNNaN
1D102014-07-042073.42076.2NaNNaNNaNNaNNaNNaN
2D102014-07-052048.72073.42076.2NaNNaNNaNNaNNaN
3D102014-07-062048.92048.72073.42076.2NaNNaNNaNNaN
4D102014-07-072006.42048.92048.72073.42076.2NaNNaNNaN
.................................
669D102016-05-022630.72601.02572.92544.02579.92585.82542.02534.2
670D102016-05-032649.32630.72601.02572.92544.02579.92585.82542.0
671D102016-05-042631.82649.32630.72601.02572.92544.02579.92585.8
672D102016-05-052622.52631.82649.32630.72601.02572.92544.02579.9
673D102016-05-062620.12622.52631.82649.32630.72601.02572.92544.0
\n

674 rows × 10 columns

\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Add 2 lagged values, 2 and 4, for a single time series\nlagged_df_single_two = (\n df \n .query('id == \"D10\"')\n .augment_lags(\n date_column='date',\n value_column='value',\n lags=[2, 4]\n )\n)\nlagged_df_single_two\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
|   | id | date | value | value_lag_2 | value_lag_4 |
|---|----|------|-------|-------------|-------------|
| 0 | D10 | 2014-07-03 | 2076.2 | NaN | NaN |
| 1 | D10 | 2014-07-04 | 2073.4 | NaN | NaN |
| 2 | D10 | 2014-07-05 | 2048.7 | 2076.2 | NaN |
| 3 | D10 | 2014-07-06 | 2048.9 | 2073.4 | NaN |
| 4 | D10 | 2014-07-07 | 2006.4 | 2048.7 | 2076.2 |
| ... | ... | ... | ... | ... | ... |
| 669 | D10 | 2016-05-02 | 2630.7 | 2572.9 | 2579.9 |
| 670 | D10 | 2016-05-03 | 2649.3 | 2601.0 | 2544.0 |
| 671 | D10 | 2016-05-04 | 2631.8 | 2630.7 | 2572.9 |
| 672 | D10 | 2016-05-05 | 2622.5 | 2649.3 | 2601.0 |
| 673 | D10 | 2016-05-06 | 2620.1 | 2631.8 | 2630.7 |

674 rows × 5 columns
\n```\n:::\n:::\n\n\n", "supporting": [ - "augment_lags_files\\figure-html" + "augment_lags_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/augment_leads/execute-results/html.json b/docs/_freeze/reference/augment_leads/execute-results/html.json index 21599ea6..f91524f0 100644 --- a/docs/_freeze/reference/augment_leads/execute-results/html.json +++ b/docs/_freeze/reference/augment_leads/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "520d6a4551626914f29c848450f76bb0", + "hash": "32b6708df5ac17bfe5614aca3cd4e314", "result": { "markdown": "---\ntitle: augment_leads\n---\n\n\n\n`augment_leads(data, date_column, value_column, leads=1)`\n\nAdds leads to a Pandas DataFrame or DataFrameGroupBy object.\n\nThe `augment_leads` function takes a Pandas DataFrame or GroupBy object, a date column, a value column or list of value columns, and a lead or list of leads, and adds leaded versions of the value columns to the DataFrame.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The `data` parameter is the input DataFrame or DataFrameGroupBy object that you want to add leaded columns to. | _required_ |\n| `date_column` | str | The `date_column` parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to sort the data before adding the leaded values. | _required_ |\n| `value_column` | str or list | The `value_column` parameter is the column(s) in the DataFrame that you want to add leaded values for. It can be either a single column name (string) or a list of column names. | _required_ |\n| `leads` | int or tuple or list | The `leads` parameter is an integer, tuple, or list that specifies the number of leaded values to add to the DataFrame. If it is an integer, the function will add that number of leaded values for each column specified in the `value_column` parameter. If it is a tuple, it will generate leads from the first to the second value (inclusive). If it is a list, it will generate leads based on the values in the list. | `1` |\n\n## Returns\n\n| Type | Description |\n|--------------|-----------------------------------------------------|\n| pd.DataFrame | A Pandas DataFrame with leaded columns added to it. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\ndf = tk.load_dataset('m4_daily', parse_dates=['date'])\n```\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Add a leaded value of 2 for each grouped time series\nleaded_df = (\n df \n .groupby('id')\n .augment_leads(\n date_column='date',\n value_column='value',\n leads=2\n )\n)\nleaded_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | id | date | value | value_lead_2 |
|---|----|------|-------|--------------|
| 0 | D10 | 2014-07-03 | 2076.2 | 2048.7 |
| 1 | D10 | 2014-07-04 | 2073.4 | 2048.9 |
| 2 | D10 | 2014-07-05 | 2048.7 | 2006.4 |
| 3 | D10 | 2014-07-06 | 2048.9 | 2017.6 |
| 4 | D10 | 2014-07-07 | 2006.4 | 2019.1 |
| ... | ... | ... | ... | ... |
| 9738 | D500 | 2012-09-19 | 9418.8 | 9445.9 |
| 9739 | D500 | 2012-09-20 | 9365.7 | 9497.9 |
| 9740 | D500 | 2012-09-21 | 9445.9 | 9545.3 |
| 9741 | D500 | 2012-09-22 | 9497.9 | NaN |
| 9742 | D500 | 2012-09-23 | 9545.3 | NaN |

9743 rows × 4 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Add 7 leaded values for a single time series\nleaded_df_single = (\n df \n .query('id == \"D10\"')\n .augment_leads(\n date_column='date',\n value_column='value',\n leads=(1, 7)\n )\n)\nleaded_df_single \n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
iddatevaluevalue_lead_1value_lead_2value_lead_3value_lead_4value_lead_5value_lead_6value_lead_7
0D102014-07-032076.22073.42048.72048.92006.42017.62019.12007.4
1D102014-07-042073.42048.72048.92006.42017.62019.12007.42010.0
2D102014-07-052048.72048.92006.42017.62019.12007.42010.02001.5
3D102014-07-062048.92006.42017.62019.12007.42010.02001.51978.8
4D102014-07-072006.42017.62019.12007.42010.02001.51978.81988.3
.................................
669D102016-05-022630.72649.32631.82622.52620.1NaNNaNNaN
670D102016-05-032649.32631.82622.52620.1NaNNaNNaNNaN
671D102016-05-042631.82622.52620.1NaNNaNNaNNaNNaN
672D102016-05-052622.52620.1NaNNaNNaNNaNNaNNaN
673D102016-05-062620.1NaNNaNNaNNaNNaNNaNNaN
\n

674 rows × 10 columns

\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Add 2 leaded values, 2 and 4, for a single time series\nleaded_df_single_two = (\n df \n .query('id == \"D10\"')\n .augment_leads(\n date_column='date',\n value_column='value',\n leads=[2, 4]\n )\n)\nleaded_df_single_two\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
|   | id | date | value | value_lead_2 | value_lead_4 |
|---|----|------|-------|--------------|--------------|
| 0 | D10 | 2014-07-03 | 2076.2 | 2048.7 | 2006.4 |
| 1 | D10 | 2014-07-04 | 2073.4 | 2048.9 | 2017.6 |
| 2 | D10 | 2014-07-05 | 2048.7 | 2006.4 | 2019.1 |
| 3 | D10 | 2014-07-06 | 2048.9 | 2017.6 | 2007.4 |
| 4 | D10 | 2014-07-07 | 2006.4 | 2019.1 | 2010.0 |
| ... | ... | ... | ... | ... | ... |
| 669 | D10 | 2016-05-02 | 2630.7 | 2631.8 | 2620.1 |
| 670 | D10 | 2016-05-03 | 2649.3 | 2622.5 | NaN |
| 671 | D10 | 2016-05-04 | 2631.8 | 2620.1 | NaN |
| 672 | D10 | 2016-05-05 | 2622.5 | NaN | NaN |
| 673 | D10 | 2016-05-06 | 2620.1 | NaN | NaN |

674 rows × 5 columns
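Since each augmenter returns a plain `pd.DataFrame` (see the Returns tables), lags and leads compose by chaining, re-grouping between calls. A sketch of that composition (my own illustration, assuming both accessors are registered as shown on these pages):

```python
combined_df = (
    df
    .groupby('id')
    .augment_lags(date_column='date', value_column='value', lags=2)
    .groupby('id')
    .augment_leads(date_column='date', value_column='value', leads=2)
)
```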
\n```\n:::\n:::\n\n\n", "supporting": [ - "augment_leads_files\\figure-html" + "augment_leads_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/augment_rolling/execute-results/html.json b/docs/_freeze/reference/augment_rolling/execute-results/html.json index 461e8a5e..a51d51f5 100644 --- a/docs/_freeze/reference/augment_rolling/execute-results/html.json +++ b/docs/_freeze/reference/augment_rolling/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "b734795387f607dd0aa53f89c541c32a", + "hash": "ceef4b740f689f15717dec958f8ae24b", "result": { "markdown": "---\ntitle: augment_rolling\n---\n\n\n\n`augment_rolling(data, date_column, value_column, use_independent_variables=False, window=2, window_func='mean', min_periods=None, center=False, **kwargs)`\n\nApply one or more rolling functions and window sizes to one or more columns of a DataFrame.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|-----------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `data` | Union\\[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy\\] | The `data` parameter is the input DataFrame or GroupBy object that contains the data to be processed. It can be either a Pandas DataFrame or a GroupBy object. | _required_ |\n| `date_column` | str | The `date_column` parameter is the name of the datetime column in the DataFrame by which the data should be sorted within each group. | _required_ |\n| `value_column` | Union\\[str, list\\] | The `value_column` parameter is the name of the column(s) in the DataFrame to which the rolling window function(s) should be applied. It can be a single column name or a list of column names. | _required_ |\n| `use_independent_variables` | bool | The `use_independent_variables` parameter is an optional parameter that specifies whether the rolling function(s) require independent variables, such as rolling correlation or rolling regression. (See Examples below.) | `False` |\n| `window` | Union\\[int, tuple, list\\] | The `window` parameter in the `augment_rolling` function is used to specify the size of the rolling windows. It can be either an integer or a list of integers. - If it is an integer, the same window size will be applied to all columns specified in the `value_column`. - If it is a tuple, it will generate windows from the first to the second value (inclusive). - If it is a list of integers, each integer in the list will be used as the window size for the corresponding column in the `value_column` list. | `2` |\n| `window_func` | Union\\[str, list, Tuple\\[str, Callable\\]\\] | The `window_func` parameter in the `augment_rolling` function is used to specify the function(s) to be applied to the rolling windows. 1. 
It can be a string or a list of strings, where each string represents the name of the function to be applied. 2. Alternatively, it can be a list of tuples, where each tuple contains the name of the function to be applied and the function itself. The function is applied as a Pandas Series. (See Examples below.) 3. If the function requires independent variables, the `use_independent_variables` parameter must be specified. The independent variables will be passed to the function as a DataFrame containing the window of rows. (See Examples below.) | `'mean'` |\n| `center` | bool | The `center` parameter in the `augment_rolling` function determines whether the rolling window is centered or not. If `center` is set to `True`, the rolling window will be centered, meaning that the value at the center of the window will be used as the result. If `center` is set to `False`, the rolling window is right-aligned and the result is placed at the end of the window. | `False` |\n\n## Returns\n\n| Type | Description |\n|---|---|\n| pd.DataFrame | The `augment_rolling` function returns a DataFrame with new columns for each applied function, window size, and value column. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\nimport numpy as np\n\ndf = tk.load_dataset(\"m4_daily\", parse_dates = ['date'])\n```\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# String Function Name and Series Lambda Function (no independent variables)\n# window = [2,7] yields only 2 and 7\nrolled_df = (\n    df\n    .groupby('id')\n    .augment_rolling(\n        date_column = 'date', \n        value_column = 'value', \n        window = [2,7], \n        window_func = ['mean', ('std', lambda x: x.std())]\n    )\n)\nrolled_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | id | date | value | value_rolling_mean_win_2 | value_rolling_std_win_2 | value_rolling_mean_win_7 | value_rolling_std_win_7 |
|---|----|------|-------|--------------------------|-------------------------|--------------------------|-------------------------|
| 0 | D10 | 2014-07-03 | 2076.2 | NaN | NaN | NaN | NaN |
| 1 | D10 | 2014-07-04 | 2073.4 | 2074.80 | 1.40 | 2074.800000 | 1.400000 |
| 2 | D10 | 2014-07-05 | 2048.7 | 2061.05 | 12.35 | 2066.100000 | 12.356645 |
| 3 | D10 | 2014-07-06 | 2048.9 | 2048.80 | 0.10 | 2061.800000 | 13.037830 |
| 4 | D10 | 2014-07-07 | 2006.4 | 2027.65 | 21.25 | 2050.720000 | 25.041038 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 9738 | D500 | 2012-09-19 | 9418.8 | 9425.35 | 6.55 | 9382.071429 | 74.335988 |
| 9739 | D500 | 2012-09-20 | 9365.7 | 9392.25 | 26.55 | 9396.400000 | 58.431303 |
| 9740 | D500 | 2012-09-21 | 9445.9 | 9405.80 | 40.10 | 9419.114286 | 39.184451 |
| 9741 | D500 | 2012-09-22 | 9497.9 | 9471.90 | 26.00 | 9438.928571 | 38.945336 |
| 9742 | D500 | 2012-09-23 | 9545.3 | 9521.60 | 23.70 | 9449.028571 | 53.379416 |

9743 rows × 7 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# String Function Name and Series Lambda Function (no independent variables)\n# window = (1,3) yields 1, 2, and 3\nrolled_df = (\n df\n .groupby('id')\n .augment_rolling(\n date_column = 'date', \n value_column = 'value', \n window = (1,3), \n window_func = ['mean', ('std', lambda x: x.std())]\n )\n)\nrolled_df \n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
iddatevaluevalue_rolling_mean_win_1value_rolling_std_win_1value_rolling_mean_win_2value_rolling_std_win_2value_rolling_mean_win_3value_rolling_std_win_3
0D102014-07-032076.22076.20.02076.200.002076.2000000.000000
1D102014-07-042073.42073.40.02074.801.402074.8000001.400000
2D102014-07-052048.72048.70.02061.0512.352066.10000012.356645
3D102014-07-062048.92048.90.02048.800.102057.00000011.596839
4D102014-07-072006.42006.40.02027.6521.252034.66666719.987718
..............................
9738D5002012-09-199418.89418.80.09425.356.559429.4666677.905413
9739D5002012-09-209365.79365.70.09392.2526.559405.46666728.623339
9740D5002012-09-219445.99445.90.09405.8040.109410.13333333.310092
9741D5002012-09-229497.99497.90.09471.9026.009436.50000054.378182
9742D5002012-09-239545.39545.30.09521.6023.709496.36666740.594362
\n

9743 rows × 9 columns

\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Rolling Correlation: Uses independent variables (value2)\n\ndf = pd.DataFrame({\n 'id': [1, 1, 1, 2, 2, 2],\n 'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),\n 'value1': [10, 20, 29, 42, 53, 59],\n 'value2': [2, 16, 20, 40, 41, 50],\n})\n\nresult_df = (\n df.groupby('id')\n .augment_rolling(\n date_column='date',\n value_column='value1',\n use_independent_variables=True,\n window=3,\n window_func=[('corr', lambda df: df['value1'].corr(df['value2']))],\n center = False\n )\n)\nresult_df\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
|   | id | date | value1 | value2 | value1_rolling_corr_win_3 |
|---|----|------|--------|--------|---------------------------|
| 0 | 1 | 2023-01-01 | 10 | 2 | NaN |
| 1 | 1 | 2023-01-02 | 20 | 16 | NaN |
| 2 | 1 | 2023-01-03 | 29 | 20 | 0.961054 |
| 3 | 2 | 2023-01-04 | 42 | 40 | NaN |
| 4 | 2 | 2023-01-05 | 53 | 41 | NaN |
| 5 | 2 | 2023-01-06 | 59 | 50 | 0.824831 |
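With `window=3`, the correlation is only defined once three rows of a group are available, which is why the first two rows of each `id` are `NaN` above. If a per-row estimate is needed downstream, one option (an assumption about post-processing, not part of the API) is to forward-fill within groups:

```python
# Carry the last available rolling correlation forward within each id
result_df['corr_ffill'] = (
    result_df
    .groupby('id')['value1_rolling_corr_win_3']
    .ffill()
)
```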
\n```\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Rolling Regression: Using independent variables (value2 and value3)\n\n# Requires: scikit-learn\nfrom sklearn.linear_model import LinearRegression\n\ndf = pd.DataFrame({\n 'id': [1, 1, 1, 2, 2, 2],\n 'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),\n 'value1': [10, 20, 29, 42, 53, 59],\n 'value2': [5, 16, 24, 35, 45, 58],\n 'value3': [2, 3, 6, 9, 10, 13]\n})\n\n# Define Regression Function\ndef regression(df):\n\n model = LinearRegression()\n X = df[['value2', 'value3']] # Extract X values (independent variables)\n y = df['value1'] # Extract y values (dependent variable)\n model.fit(X, y)\n ret = pd.Series([model.intercept_, model.coef_[0]], index=['Intercept', 'Slope'])\n \n return ret # Return intercept and slope as a Series\n \n\n# Example to call the function\nresult_df = (\n df.groupby('id')\n .augment_rolling(\n date_column='date',\n value_column='value1',\n use_independent_variables=True,\n window=3,\n window_func=[('regression', regression)]\n )\n .dropna()\n)\n\n# Display Results in Wide Format since returning multiple values\nregression_wide_df = pd.concat(result_df['value1_rolling_regression_win_3'].to_list(), axis=1).T\n\nregression_wide_df = pd.concat([result_df.reset_index(drop = True), regression_wide_df], axis=1)\n\nregression_wide_df\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
|   | id | date | value1 | value2 | value3 | value1_rolling_regression_win_3 | Intercept | Slope |
|---|----|------|--------|--------|--------|---------------------------------|-----------|-------|
| 0 | 1 | 2023-01-03 | 29 | 24 | 6 | Intercept 4.28; Slope 0.84; dtype: flo... | 4.280000 | 0.840000 |
| 1 | 2 | 2023-01-06 | 59 | 58 | 13 | Intercept 30.352941; Slope 1.588235; ... | 30.352941 | 1.588235 |
\n```\n:::\n:::\n\n\n", "supporting": [ - "augment_rolling_files\\figure-html" + "augment_rolling_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/augment_timeseries_signature/execute-results/html.json index 2bf8bfcf..403adafa 100644 --- a/docs/_freeze/reference/augment_timeseries_signature/execute-results/html.json +++ b/docs/_freeze/reference/augment_timeseries_signature/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "c3e80eb54e00fcd186d78791c12234de", + "hash": "8aecc87c8a307524c3099d6572dca84e", "result": { "markdown": "---\ntitle: augment_timeseries_signature\n---\n\n\n\n`augment_timeseries_signature(data, date_column)`\n\nAdd 29 time series features to a DataFrame.\n\nThe function `augment_timeseries_signature` takes a DataFrame and a date column as input and returns the original DataFrame with the **29 different date and time based features** added as new columns: \n\n- index_num: An int64 feature that captures the entire datetime as a numeric value to the second\n- year: The year of the datetime\n- year_iso: The iso year of the datetime\n- yearstart: Logical (0,1) indicating if first day of year (defined by frequency)\n- yearend: Logical (0,1) indicating if last day of year (defined by frequency)\n- leapyear: Logical (0,1) indicating if the date belongs to a leap year\n- half: Half year of the date: Jan-Jun = 1, July-Dec = 2\n- quarter: Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, Jul-Sep = 3, Oct-Dec = 4\n- quarteryear: Quarter of the date + relative year\n- quarterstart: Logical (0,1) indicating if first day of quarter (defined by frequency)\n- quarterend: Logical (0,1) indicating if last day of quarter (defined by frequency)\n- month: The month of the datetime\n- month_lbl: The month label of the datetime\n- monthstart: Logical (0,1) indicating if first day of month (defined by frequency)\n- monthend: Logical (0,1) indicating if last day of month (defined by frequency)\n- yweek: The week ordinal of the year\n- mweek: The week ordinal of the month\n- wday: The number of the day of the week with Monday=1, Sunday=7\n- wday_lbl: The day of the week label\n- mday: The day of the datetime\n- qday: The days of the relative quarter\n- yday: The ordinal day of year\n- weekend: Logical (0,1) indicating if the day is a weekend \n- hour: The hour of the datetime\n- minute: The minutes of the datetime\n- second: The seconds of the datetime\n- msecond: The microseconds of the datetime\n- nsecond: The nanoseconds of the datetime\n- am_pm: Half of the day, AM = ante meridiem, PM = post meridiem\n\n## Parameters\n\n| Name | Type | Description | Default |\n|---|---|---|---|\n| `data` | pd.DataFrame | The `data` parameter is a pandas DataFrame that contains the time series data. | _required_ |\n| `date_column` | str | The `date_column` parameter is a string that represents the name of the date column in the `data` DataFrame. | _required_ |\n\n## Returns\n\n| Type | Description |\n|---|---|\n| pd.DataFrame | A pandas DataFrame that is the concatenation of the original data DataFrame and the ts_signature_df DataFrame. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\npd.set_option('display.max_columns', None)\n\n# Adds 29 new time series features as columns to the original DataFrame\n(\n    tk.load_dataset('bike_sales_sample', parse_dates = ['order_date'])\n    .augment_timeseries_signature(date_column = 'order_date')\n    .head()\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
order_idorder_lineorder_datequantitypricetotal_pricemodelcategory_1category_2frame_materialbikeshop_namecitystateorder_date_index_numorder_date_yearorder_date_year_isoorder_date_yearstartorder_date_yearendorder_date_leapyearorder_date_halforder_date_quarterorder_date_quarteryearorder_date_quarterstartorder_date_quarterendorder_date_monthorder_date_month_lblorder_date_monthstartorder_date_monthendorder_date_yweekorder_date_mweekorder_date_wdayorder_date_wday_lblorder_date_mdayorder_date_qdayorder_date_ydayorder_date_weekendorder_date_hourorder_date_minuteorder_date_secondorder_date_msecondorder_date_nsecondorder_date_am_pm
0112011-01-07160706070Jekyll Carbon 2MountainOver MountainCarbonIthaca Mountain ClimbersIthacaNY129435840020112011000112011Q1001January00115Friday777000000am
1122011-01-07159705970Trigger Carbon 2MountainOver MountainCarbonIthaca Mountain ClimbersIthacaNY129435840020112011000112011Q1001January00115Friday777000000am
2212011-01-10127702770Beast of the East 1MountainTrailAluminumKansas City 29ersKansas CityKS129461760020112011000112011Q1001January00221Monday101010000000am
3222011-01-10159705970Trigger Carbon 2MountainOver MountainCarbonKansas City 29ersKansas CityKS129461760020112011000112011Q1001January00221Monday101010000000am
4312011-01-1011066010660Supersix Evo Hi-Mod TeamRoadElite RoadCarbonLouisville Race EquipmentLouisvilleKY129461760020112011000112011Q1001January00221Monday101010000000am
\n
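Every generated column is prefixed with the name of the date column, so the engineered features are easy to isolate afterwards. A minimal sketch using only the API shown above (the prefix filtering is plain pandas, not a pytimetk function):

```python
import pytimetk as tk

df = tk.load_dataset('bike_sales_sample', parse_dates=['order_date'])

# Append the 29 signature features, then keep only the engineered columns
df_aug = df.augment_timeseries_signature(date_column='order_date')
signature_cols = [c for c in df_aug.columns if c.startswith('order_date_')]
X = df_aug[signature_cols]
```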
\n```\n:::\n:::\n\n\n", "supporting": [ - "augment_timeseries_signature_files\\figure-html" + "augment_timeseries_signature_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/floor_date/execute-results/html.json b/docs/_freeze/reference/floor_date/execute-results/html.json index 2094a639..d91bfdaf 100644 --- a/docs/_freeze/reference/floor_date/execute-results/html.json +++ b/docs/_freeze/reference/floor_date/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "d77135a655d3b5c0e974389777043cf0", + "hash": "b1784a0e4373ba1f0271f7c97929da2f", "result": { "markdown": "---\ntitle: floor_date\n---\n\n\n\n`floor_date(idx, unit='D')`\n\nRound a date down to the specified unit (e.g. Flooring).\n\nThe `floor_date` function takes a pandas Series of dates and returns a new Series with the dates rounded down to the specified unit.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|--------|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `idx` | pd.Series or pd.DatetimeIndex | The `idx` parameter is a pandas Series or pandas DatetimeIndex object that contains datetime values. It represents the dates that you want to round down. | _required_ |\n| `unit` | str | The `unit` parameter in the `floor_date` function is a string that specifies the time unit to which the dates in the `idx` series should be rounded down. It has a default value of \"D\", which stands for day. Other possible values for the `unit` parameter could be | `'D'` |\n\n## Returns\n\n| Type | Description |\n|-----------|--------------------------------------------------------------------------------------------|\n| pd.Series | The `floor_date` function returns a pandas Series object containing datetime64[ns] values. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n\ndates = pd.date_range(\"2020-01-01\", \"2020-01-10\", freq=\"1H\")\ndates\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\nDatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:00:00',\n '2020-01-01 02:00:00', '2020-01-01 03:00:00',\n '2020-01-01 04:00:00', '2020-01-01 05:00:00',\n '2020-01-01 06:00:00', '2020-01-01 07:00:00',\n '2020-01-01 08:00:00', '2020-01-01 09:00:00',\n ...\n '2020-01-09 15:00:00', '2020-01-09 16:00:00',\n '2020-01-09 17:00:00', '2020-01-09 18:00:00',\n '2020-01-09 19:00:00', '2020-01-09 20:00:00',\n '2020-01-09 21:00:00', '2020-01-09 22:00:00',\n '2020-01-09 23:00:00', '2020-01-10 00:00:00'],\n dtype='datetime64[ns]', length=217, freq='H')\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Works on DateTimeIndex\ntk.floor_date(dates, unit=\"D\")\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n0 2020-01-01\n1 2020-01-01\n2 2020-01-01\n3 2020-01-01\n4 2020-01-01\n ... 
\n212 2020-01-09\n213 2020-01-09\n214 2020-01-09\n215 2020-01-09\n216 2020-01-10\nName: idx, Length: 217, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Works on Pandas Series\ndates.to_series().floor_date(unit=\"D\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\n2020-01-01 00:00:00 2020-01-01\n2020-01-01 01:00:00 2020-01-01\n2020-01-01 02:00:00 2020-01-01\n2020-01-01 03:00:00 2020-01-01\n2020-01-01 04:00:00 2020-01-01\n ... \n2020-01-09 20:00:00 2020-01-09\n2020-01-09 21:00:00 2020-01-09\n2020-01-09 22:00:00 2020-01-09\n2020-01-09 23:00:00 2020-01-09\n2020-01-10 00:00:00 2020-01-10\nFreq: H, Length: 217, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n", "supporting": [ - "floor_date_files\\figure-html" + "floor_date_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_freeze/reference/future_frame/execute-results/html.json b/docs/_freeze/reference/future_frame/execute-results/html.json index 9dc625b5..abaa72b4 100644 --- a/docs/_freeze/reference/future_frame/execute-results/html.json +++ b/docs/_freeze/reference/future_frame/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "c04e54ee31e61dd65040d0d3d9ffcf55", + "hash": "7978a8bbb34199bb08461cc3135fa870", "result": { "markdown": "---\ntitle: future_frame\n---\n\n\n\n`future_frame(data, date_column, length_out, force_regular=False, bind_data=True)`\n\nExtend a DataFrame or GroupBy object with future dates.\n\nThe `future_frame` function extends a given DataFrame or GroupBy object with future dates based on a specified length, optionally binding the original data.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|-----------------|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The `data` parameter is the input DataFrame or DataFrameGroupBy object that you want to extend with future dates. | _required_ |\n| `date_column` | str | The `date_column` parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to generate future dates. | _required_ |\n| `length_out` | int | The `length_out` parameter specifies the number of future dates to be added to the DataFrame. | _required_ |\n| `force_regular` | bool | The `force_regular` parameter is a boolean flag that determines whether the frequency of the future dates should be forced to be regular. If `force_regular` is set to `True`, the frequency of the future dates will be forced to be regular. If `force_regular` is set to `False`, the frequency of the future dates will be inferred from the input data (e.g. business calendars might be used). The default value is `False`. | `False` |\n| `bind_data` | bool | The `bind_data` parameter is a boolean flag that determines whether the extended data should be concatenated with the original data or returned separately. If `bind_data` is set to `True`, the extended data will be concatenated with the original data using `pd.concat`. 
If `bind_data` is set to `False`, the extended data will be returned separately. The default value is `True`. | `True` |\n\n## Returns\n\n| Type | Description |\n|--------------|------------------------------------------|\n| pd.DataFrame | An extended DataFrame with future dates. |\n\n## See Also\n\nmake_future_timeseries: Generate future dates for a time series.\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\ndf = tk.load_dataset('m4_hourly', parse_dates = ['date'])\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
|      | id   | date                      | value |
|------|------|---------------------------|-------|
| 0    | H10  | 2015-07-01 12:00:00+00:00 | 513   |
| 1    | H10  | 2015-07-01 13:00:00+00:00 | 512   |
| 2    | H10  | 2015-07-01 14:00:00+00:00 | 506   |
| 3    | H10  | 2015-07-01 15:00:00+00:00 | 500   |
| 4    | H10  | 2015-07-01 16:00:00+00:00 | 490   |
| ...  | ...  | ...                       | ...   |
| 3055 | H410 | 2017-02-10 07:00:00+00:00 | 108   |
| 3056 | H410 | 2017-02-10 08:00:00+00:00 | 70    |
| 3057 | H410 | 2017-02-10 09:00:00+00:00 | 72    |
| 3058 | H410 | 2017-02-10 10:00:00+00:00 | 79    |
| 3059 | H410 | 2017-02-10 11:00:00+00:00 | 77    |

3060 rows × 3 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Extend the data for a single time series group by 12 hours\nextended_df = (\n df\n .query('id == \"H10\"')\n .future_frame(\n date_column = 'date', \n length_out = 12\n )\n .assign(id = lambda x: x['id'].ffill())\n)\nextended_df\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|     | id  | date                      | value |
|-----|-----|---------------------------|-------|
| 0   | H10 | 2015-07-01 12:00:00+00:00 | 513.0 |
| 1   | H10 | 2015-07-01 13:00:00+00:00 | 512.0 |
| 2   | H10 | 2015-07-01 14:00:00+00:00 | 506.0 |
| 3   | H10 | 2015-07-01 15:00:00+00:00 | 500.0 |
| 4   | H10 | 2015-07-01 16:00:00+00:00 | 490.0 |
| ... | ... | ...                       | ...   |
| 707 | H10 | 2015-07-30 23:00:00       | NaN   |
| 708 | H10 | 2015-07-31 00:00:00       | NaN   |
| 709 | H10 | 2015-07-31 01:00:00       | NaN   |
| 710 | H10 | 2015-07-31 02:00:00       | NaN   |
| 711 | H10 | 2015-07-31 03:00:00       | NaN   |

712 rows × 3 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Extend the data for each group by 12 hours\nextended_df = (\n df\n .groupby('id')\n .future_frame(\n date_column = 'date', \n length_out = 12\n )\n) \nextended_df\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
|     | id  | date                      | value |
|-----|-----|---------------------------|-------|
| 0   | H10 | 2015-07-01 12:00:00+00:00 | 513.0 |
| 1   | H10 | 2015-07-01 13:00:00+00:00 | 512.0 |
| 2   | H10 | 2015-07-01 14:00:00+00:00 | 506.0 |
| 3   | H10 | 2015-07-01 15:00:00+00:00 | 500.0 |
| 4   | H10 | 2015-07-01 16:00:00+00:00 | 490.0 |
| ... | ... | ...                       | ...   |
| 707 | H50 | 2015-07-30 23:00:00       | NaN   |
| 708 | H50 | 2015-07-31 00:00:00       | NaN   |
| 709 | H50 | 2015-07-31 01:00:00       | NaN   |
| 710 | H50 | 2015-07-31 02:00:00       | NaN   |
| 711 | H50 | 2015-07-31 03:00:00       | NaN   |

3108 rows × 3 columns
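Because `length_out = 12` is applied per group, each of the four series gains 12 future rows (3060 observed + 48 future = 3108). A quick check with plain pandas, assuming the grouped result above:

```python
# Future rows carry NaN in 'value'; expect 48 of them in total
extended_df['value'].isna().sum()

# Row counts per series; each should be 12 larger than before
extended_df.groupby('id').size()
```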
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Same as above, but just return the extended data with bind_data=False\nextended_df = (\n df\n .groupby('id')\n .future_frame(\n date_column = 'date', \n length_out = 12,\n bind_data = False # Returns just future data\n )\n) \nextended_df\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
|    | date                | id   |
|----|---------------------|------|
| 0  | 2015-07-30 16:00:00 | H10  |
| 1  | 2015-07-30 17:00:00 | H10  |
| 2  | 2015-07-30 18:00:00 | H10  |
| 3  | 2015-07-30 19:00:00 | H10  |
| 4  | 2015-07-30 20:00:00 | H10  |
| 5  | 2015-07-30 21:00:00 | H10  |
| 6  | 2015-07-30 22:00:00 | H10  |
| 7  | 2015-07-30 23:00:00 | H10  |
| 8  | 2015-07-31 00:00:00 | H10  |
| 9  | 2015-07-31 01:00:00 | H10  |
| 10 | 2015-07-31 02:00:00 | H10  |
| 11 | 2015-07-31 03:00:00 | H10  |
| 0  | 2013-09-30 16:00:00 | H150 |
| 1  | 2013-09-30 17:00:00 | H150 |
| 2  | 2013-09-30 18:00:00 | H150 |
| 3  | 2013-09-30 19:00:00 | H150 |
| 4  | 2013-09-30 20:00:00 | H150 |
| 5  | 2013-09-30 21:00:00 | H150 |
| 6  | 2013-09-30 22:00:00 | H150 |
| 7  | 2013-09-30 23:00:00 | H150 |
| 8  | 2013-10-01 00:00:00 | H150 |
| 9  | 2013-10-01 01:00:00 | H150 |
| 10 | 2013-10-01 02:00:00 | H150 |
| 11 | 2013-10-01 03:00:00 | H150 |
| 0  | 2017-02-10 12:00:00 | H410 |
| 1  | 2017-02-10 13:00:00 | H410 |
| 2  | 2017-02-10 14:00:00 | H410 |
| 3  | 2017-02-10 15:00:00 | H410 |
| 4  | 2017-02-10 16:00:00 | H410 |
| 5  | 2017-02-10 17:00:00 | H410 |
| 6  | 2017-02-10 18:00:00 | H410 |
| 7  | 2017-02-10 19:00:00 | H410 |
| 8  | 2017-02-10 20:00:00 | H410 |
| 9  | 2017-02-10 21:00:00 | H410 |
| 10 | 2017-02-10 22:00:00 | H410 |
| 11 | 2017-02-10 23:00:00 | H410 |
| 0  | 2015-07-30 16:00:00 | H50  |
| 1  | 2015-07-30 17:00:00 | H50  |
| 2  | 2015-07-30 18:00:00 | H50  |
| 3  | 2015-07-30 19:00:00 | H50  |
| 4  | 2015-07-30 20:00:00 | H50  |
| 5  | 2015-07-30 21:00:00 | H50  |
| 6  | 2015-07-30 22:00:00 | H50  |
| 7  | 2015-07-30 23:00:00 | H50  |
| 8  | 2015-07-31 00:00:00 | H50  |
| 9  | 2015-07-31 01:00:00 | H50  |
| 10 | 2015-07-31 02:00:00 | H50  |
| 11 | 2015-07-31 03:00:00 | H50  |
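With `bind_data = False` only the future rows come back, which is convenient when forecasts will be attached to them separately. If the observed rows are needed again, recombining them is ordinary pandas rather than a pytimetk API; a sketch:

```python
import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_hourly', parse_dates=['date'])

future = (
    df
    .groupby('id')
    .future_frame(date_column='date', length_out=12, bind_data=False)
)

# Stack observed and future rows; future rows have no 'value' column,
# so pandas fills it with NaN
full = pd.concat([df, future], ignore_index=True)
```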
\n```\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Working with irregular dates: Business Days (Stocks Data)\ndf = tk.load_dataset('stocks_daily', parse_dates = ['date'])\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
|       | symbol | date       | open       | high       | low        | close      | volume   | adjusted   |
|-------|--------|------------|------------|------------|------------|------------|----------|------------|
| 0     | META   | 2013-01-02 | 27.440001  | 28.180000  | 27.420000  | 28.000000  | 69846400 | 28.000000  |
| 1     | META   | 2013-01-03 | 27.879999  | 28.469999  | 27.590000  | 27.770000  | 63140600 | 27.770000  |
| 2     | META   | 2013-01-04 | 28.010000  | 28.930000  | 27.830000  | 28.760000  | 72715400 | 28.760000  |
| 3     | META   | 2013-01-07 | 28.690001  | 29.790001  | 28.650000  | 29.420000  | 83781800 | 29.420000  |
| 4     | META   | 2013-01-08 | 29.510000  | 29.600000  | 28.860001  | 29.059999  | 45871300 | 29.059999  |
| ...   | ...    | ...        | ...        | ...        | ...        | ...        | ...      | ...        |
| 16189 | GOOG   | 2023-09-15 | 138.800003 | 139.360001 | 137.179993 | 138.300003 | 48947600 | 138.300003 |
| 16190 | GOOG   | 2023-09-18 | 137.630005 | 139.929993 | 137.630005 | 138.960007 | 16233600 | 138.960007 |
| 16191 | GOOG   | 2023-09-19 | 138.250000 | 139.175003 | 137.500000 | 138.830002 | 15479100 | 138.830002 |
| 16192 | GOOG   | 2023-09-20 | 138.830002 | 138.839996 | 134.520004 | 134.589996 | 21473500 | 134.589996 |
| 16193 | GOOG   | 2023-09-21 | 132.389999 | 133.190002 | 131.089996 | 131.360001 | 22042700 | 131.360001 |

16194 rows × 8 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# Allow irregular future dates (i.e. business days)\nextended_df = (\n    df\n        .groupby('symbol')\n        .future_frame(\n            date_column = 'date', \n            length_out = 12,\n            force_regular = False, # Allow irregular future dates (i.e. business days)\n            bind_data = False\n        )\n) \nextended_df\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
|     | date       | symbol |
|-----|------------|--------|
| 0   | 2023-09-22 | AAPL   |
| 1   | 2023-09-25 | AAPL   |
| 2   | 2023-09-26 | AAPL   |
| 3   | 2023-09-27 | AAPL   |
| 4   | 2023-09-28 | AAPL   |
| ... | ...        | ...    |
| 7   | 2023-10-03 | NVDA   |
| 8   | 2023-10-04 | NVDA   |
| 9   | 2023-10-05 | NVDA   |
| 10  | 2023-10-06 | NVDA   |
| 11  | 2023-10-09 | NVDA   |

72 rows × 2 columns
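A quick sanity check on the business-day behavior, using plain pandas on the result above: with `force_regular = False` none of the generated dates should fall on a weekend.

```python
# dayofweek is Monday=0 through Sunday=6, so business days never exceed 4
assert extended_df['date'].dt.dayofweek.max() <= 4
```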
\n```\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# Force regular: Include Weekends\nextended_df = (\n    df\n        .groupby('symbol')\n        .future_frame(\n            date_column = 'date', \n            length_out = 12,\n            force_regular = True, # Force regular future dates (i.e. include weekends)\n            bind_data = False\n        )\n) \nextended_df\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
|     | date       | symbol |
|-----|------------|--------|
| 0   | 2023-09-22 | AAPL   |
| 1   | 2023-09-23 | AAPL   |
| 2   | 2023-09-24 | AAPL   |
| 3   | 2023-09-25 | AAPL   |
| 4   | 2023-09-26 | AAPL   |
| ... | ...        | ...    |
| 7   | 2023-09-29 | NVDA   |
| 8   | 2023-09-30 | NVDA   |
| 9   | 2023-10-01 | NVDA   |
| 10  | 2023-10-02 | NVDA   |
| 11  | 2023-10-03 | NVDA   |

72 rows × 2 columns
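A common end-to-end pattern is to extend each series and then fill the appended rows with predictions. The sketch below substitutes a naive forward fill for a real model, so the "forecast" is illustrative only:

```python
import pytimetk as tk

df = tk.load_dataset('m4_hourly', parse_dates=['date'])

extended = (
    df
    .groupby('id')
    .future_frame(date_column='date', length_out=12)  # future rows start as NaN
)

# Stand-in for a real forecast: carry each series' last observed value forward
extended['value'] = extended.groupby('id')['value'].ffill()
```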
\n```\n:::\n:::\n\n\n", "supporting": [ - "future_frame_files\\figure-html" + "future_frame_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/get_available_datasets/execute-results/html.json b/docs/_freeze/reference/get_available_datasets/execute-results/html.json index 1c769285..f5ae7180 100644 --- a/docs/_freeze/reference/get_available_datasets/execute-results/html.json +++ b/docs/_freeze/reference/get_available_datasets/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "719d3701f6079916c1984b1815c38785", + "hash": "43cb0a9a95c83023356bc157103578f7", "result": { - "markdown": "---\ntitle: get_available_datasets\n---\n\n\n\n`datasets.get_datasets.get_available_datasets()`\n\nGet a list of 12 datasets that can be loaded with `pytimetk.load_dataset`.\n\nThe `get_available_datasets` function returns a sorted list of available dataset names from the `pytimetk.datasets` module. The available datasets are:\n\n## Returns\n\n| Type | Description |\n|--------|-----------------------------------------------------------------------------------------------------------------------------|\n| list | The function `get_available_datasets` returns a sorted list of available dataset names from the `pytimetk.datasets` module. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\n\ntk.get_available_datasets()\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n['bike_sales_sample',\n 'bike_sharing_daily',\n 'm4_daily',\n 'm4_hourly',\n 'm4_monthly',\n 'm4_quarterly',\n 'm4_weekly',\n 'm4_yearly',\n 'stocks_daily',\n 'taylor_30_min',\n 'walmart_sales_weekly',\n 'wikipedia_traffic_daily']\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: get_available_datasets\n---\n\n\n\n`get_available_datasets()`\n\nGet a list of 12 datasets that can be loaded with `pytimetk.load_dataset`.\n\nThe `get_available_datasets` function returns a sorted list of available dataset names from the `pytimetk.datasets` module. The available datasets are:\n\n## Returns\n\n| Type | Description |\n|--------|-----------------------------------------------------------------------------------------------------------------------------|\n| list | The function `get_available_datasets` returns a sorted list of available dataset names from the `pytimetk.datasets` module. 
|\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\n\ntk.get_available_datasets()\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n['bike_sales_sample',\n 'bike_sharing_daily',\n 'expedia',\n 'm4_daily',\n 'm4_hourly',\n 'm4_monthly',\n 'm4_quarterly',\n 'm4_weekly',\n 'm4_yearly',\n 'stocks_daily',\n 'taylor_30_min',\n 'walmart_sales_weekly',\n 'wikipedia_traffic_daily']\n```\n:::\n:::\n\n\n", "supporting": [ - "get_available_datasets_files\\figure-html" + "get_available_datasets_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_freeze/reference/get_holiday_signature/execute-results/html.json b/docs/_freeze/reference/get_holiday_signature/execute-results/html.json index 1b61521c..a6a5c3e1 100644 --- a/docs/_freeze/reference/get_holiday_signature/execute-results/html.json +++ b/docs/_freeze/reference/get_holiday_signature/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "2238040376629f7f73aecb6ee79803fb", + "hash": "fd186ecbdb1bae50ecbbe3445e2c941a", "result": { "markdown": "---\ntitle: get_holiday_signature\n---\n\n\n\n`get_holiday_signature(idx, country_name='UnitedStates')`\n\nEngineers 4 different holiday features from a single datetime for 80+ countries.\n\nNote: Requires the `holidays` package to be installed. See https://pypi.org/project/holidays/ for more information.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|\n| `idx` | pd.DatetimeIndex or pd.Series | A pandas DatetimeIndex or Series containing the dates for which you want to get the holiday signature. | _required_ |\n| `country_name` | str | The name of the country for which to generate holiday features. Defaults to United States holidays, but the following countries are currently available and accessible by the full name or ISO code: Any of the following are acceptable keys for `country_name`: Available Countries: Full Country, Abrv. 
#1, #2, #3 Angola: Angola, AO, AGO, Argentina: Argentina, AR, ARG, Aruba: Aruba, AW, ABW, Australia: Australia, AU, AUS, Austria: Austria, AT, AUT, Bangladesh: Bangladesh, BD, BGD, Belarus: Belarus, BY, BLR, Belgium: Belgium, BE, BEL, Botswana: Botswana, BW, BWA, Brazil: Brazil, BR, BRA, Bulgaria: Bulgaria, BG, BLG, Burundi: Burundi, BI, BDI, Canada: Canada, CA, CAN, Chile: Chile, CL, CHL, Colombia: Colombia, CO, COL, Croatia: Croatia, HR, HRV, Curacao: Curacao, CW, CUW, Czechia: Czechia, CZ, CZE, Denmark: Denmark, DK, DNK, Djibouti: Djibouti, DJ, DJI, Dominican Republic: DominicanRepublic, DO, DOM, Egypt: Egypt, EG, EGY, England: England, Estonia: Estonia, EE, EST, European Central Bank: EuropeanCentralBank, Finland: Finland, FI, FIN, France: France, FR, FRA, Georgia: Georgia, GE, GEO, Germany: Germany, DE, DEU, Greece: Greece, GR, GRC, Honduras: Honduras, HN, HND, Hong Kong: HongKong, HK, HKG, Hungary: Hungary, HU, HUN, Iceland: Iceland, IS, ISL, India: India, IN, IND, Ireland: Ireland, IE, IRL, Isle Of Man: IsleOfMan, Israel: Israel, IL, ISR, Italy: Italy, IT, ITA, Jamaica: Jamaica, JM, JAM, Japan: Japan, JP, JPN, Kenya: Kenya, KE, KEN, Korea: Korea, KR, KOR, Latvia: Latvia, LV, LVA, Lithuania: Lithuania, LT, LTU, Luxembourg: Luxembourg, LU, LUX, Malaysia: Malaysia, MY, MYS, Malawi: Malawi, MW, MWI, Mexico: Mexico, MX, MEX, Morocco: Morocco, MA, MOR, Mozambique: Mozambique, MZ, MOZ, Netherlands: Netherlands, NL, NLD, NewZealand: NewZealand, NZ, NZL, Nicaragua: Nicaragua, NI, NIC, Nigeria: Nigeria, NG, NGA, Northern Ireland: NorthernIreland, Norway: Norway, NO, NOR, Paraguay: Paraguay, PY, PRY, Peru: Peru, PE, PER, Poland: Poland, PL, POL, Portugal: Portugal, PT, PRT, Portugal Ext: PortugalExt, PTE, Romania: Romania, RO, ROU, Russia: Russia, RU, RUS, Saudi Arabia: SaudiArabia, SA, SAU, Scotland: Scotland, Serbia: Serbia, RS, SRB, Singapore: Singapore, SG, SGP, Slovokia: Slovokia, SK, SVK, Slovenia: Slovenia, SI, SVN, South Africa: SouthAfrica, ZA, ZAF, Spain: Spain, ES, ESP, Sweden: Sweden, SE, SWE, Switzerland: Switzerland, CH, CHE, Turkey: Turkey, TR, TUR, Ukraine: Ukraine, UA, UKR, United Arab Emirates: UnitedArabEmirates, AE, ARE, United Kingdom: UnitedKingdom, GB, GBR, UK, United States: UnitedStates, US, USA, Venezuela: Venezuela, YV, VEN, Vietnam: Vietnam, VN, VNM, Wales: Wales | `'UnitedStates'` |\n\n## Returns\n\n| Type | Description |\n|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| pd.DataFrame: | A pandas DataFrame with three holiday-specific features: - is_holiday: (0, 1) indicator for holiday - before_holiday: (0, 1) indicator for day before holiday - after_holiday: (0, 1) indicator for day after holiday - holiday_name: name of the holiday |\n\n## Example\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\n# Make a DataFrame with a date column\nstart_date = '2023-01-01'\nend_date = '2023-01-10'\ndates = pd.date_range(start=start_date, end=end_date)\n\n# Get holiday features for US\ntk.get_holiday_signature(dates, 'UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
|   | idx        | is_holiday | before_holiday | after_holiday | holiday_name              |
|---|------------|------------|----------------|---------------|---------------------------|
| 0 | 2023-01-01 | 1          | 1              | 0             | New Year's Day            |
| 1 | 2023-01-02 | 1          | 0              | 1             | New Year's Day (Observed) |
| 2 | 2023-01-03 | 0          | 0              | 1             | NaN                       |
| 3 | 2023-01-04 | 0          | 0              | 0             | NaN                       |
| 4 | 2023-01-05 | 0          | 0              | 0             | NaN                       |
| 5 | 2023-01-06 | 0          | 0              | 0             | NaN                       |
| 6 | 2023-01-07 | 0          | 0              | 0             | NaN                       |
| 7 | 2023-01-08 | 0          | 0              | 0             | NaN                       |
| 8 | 2023-01-09 | 0          | 0              | 0             | NaN                       |
| 9 | 2023-01-10 | 0          | 0              | 0             | NaN                       |
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Get holiday features for France\ntk.get_holiday_signature(dates, 'France')\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|   | idx        | is_holiday | before_holiday | after_holiday | holiday_name   |
|---|------------|------------|----------------|---------------|----------------|
| 0 | 2023-01-01 | 1          | 0              | 0             | New Year's Day |
| 1 | 2023-01-02 | 0          | 0              | 1             | NaN            |
| 2 | 2023-01-03 | 0          | 0              | 0             | NaN            |
| 3 | 2023-01-04 | 0          | 0              | 0             | NaN            |
| 4 | 2023-01-05 | 0          | 0              | 0             | NaN            |
| 5 | 2023-01-06 | 0          | 0              | 0             | NaN            |
| 6 | 2023-01-07 | 0          | 0              | 0             | NaN            |
| 7 | 2023-01-08 | 0          | 0              | 0             | NaN            |
| 8 | 2023-01-09 | 0          | 0              | 0             | NaN            |
| 9 | 2023-01-10 | 0          | 0              | 0             | NaN            |
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Pandas Series\npd.Series(dates, name='dates').get_holiday_signature('UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
|   | dates      | is_holiday | before_holiday | after_holiday | holiday_name              |
|---|------------|------------|----------------|---------------|---------------------------|
| 0 | 2023-01-01 | 1          | 1              | 0             | New Year's Day            |
| 1 | 2023-01-02 | 1          | 0              | 1             | New Year's Day (Observed) |
| 2 | 2023-01-03 | 0          | 0              | 1             | NaN                       |
| 3 | 2023-01-04 | 0          | 0              | 0             | NaN                       |
| 4 | 2023-01-05 | 0          | 0              | 0             | NaN                       |
| 5 | 2023-01-06 | 0          | 0              | 0             | NaN                       |
| 6 | 2023-01-07 | 0          | 0              | 0             | NaN                       |
| 7 | 2023-01-08 | 0          | 0              | 0             | NaN                       |
| 8 | 2023-01-09 | 0          | 0              | 0             | NaN                       |
| 9 | 2023-01-10 | 0          | 0              | 0             | NaN                       |
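As the Series-method example shows, the input dates come back as the first column (named after the Series, or `idx` for a DatetimeIndex), so the holiday features can be joined onto an existing table with ordinary pandas. A small sketch in which the `sales` column and the `pd.concat` join are illustrative, not part of the pytimetk API:

```python
import pandas as pd
import pytimetk as tk

df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', '2023-01-10'),
    'sales': range(10),
})

# Engineer the four holiday features, then place them beside the data;
# drop the duplicated date column from the feature frame before joining
features = tk.get_holiday_signature(df['date'], 'UnitedStates')
df_aug = pd.concat([df, features.drop(columns='date')], axis=1)
```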
\n```\n:::\n:::\n\n\n", "supporting": [ - "get_holiday_signature_files\\figure-html" + "get_holiday_signature_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/get_timeseries_signature/execute-results/html.json b/docs/_freeze/reference/get_timeseries_signature/execute-results/html.json index 0cb2bed4..53b6a0a8 100644 --- a/docs/_freeze/reference/get_timeseries_signature/execute-results/html.json +++ b/docs/_freeze/reference/get_timeseries_signature/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "5a64cf9cc5c5961ce449fa7b8b9930b5", + "hash": "8130bc897e623f1ae1d107185220f81c", "result": { "markdown": "---\ntitle: get_timeseries_signature\n---\n\n\n\n`get_timeseries_signature(idx)`\n\nConvert a timestamp to a set of 29 time series features.\n\nThe function `tk_get_timeseries_signature` engineers **29 different date and time based features** from a single datetime index `idx`: \n\n- index_num: An int64 feature that captures the entire datetime as a numeric value to the second\n- year: The year of the datetime\n- year_iso: The iso year of the datetime\n- yearstart: Logical (0,1) indicating if first day of year (defined by frequency)\n- yearend: Logical (0,1) indicating if last day of year (defined by frequency)\n- leapyear: Logical (0,1) indicating if the date belongs to a leap year\n- half: Half year of the date: Jan-Jun = 1, July-Dec = 2\n- quarter: Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, Jul-Sep = 3, Oct-Dec = 4\n- quarteryear: Quarter of the date + relative year\n- quarterstart: Logical (0,1) indicating if first day of quarter (defined by frequency)\n- quarterend: Logical (0,1) indicating if last day of quarter (defined by frequency)\n- month: The month of the datetime\n- month_lbl: The month label of the datetime\n- monthstart: Logical (0,1) indicating if first day of month (defined by frequency)\n- monthend: Logical (0,1) indicating if last day of month (defined by frequency)\n- yweek: The week ordinal of the year\n- mweek: The week ordinal of the month\n- wday: The number of the day of the week with Monday=1, Sunday=6\n- wday_lbl: The day of the week label\n- mday: The day of the datetime\n- qday: The days of the relative quarter\n- yday: The ordinal day of year\n- weekend: Logical (0,1) indicating if the day is a weekend \n- hour: The hour of the datetime\n- minute: The minutes of the datetime\n- second: The seconds of the datetime\n- msecond: The microseconds of the datetime\n- nsecond: The nanoseconds of the datetime\n- am_pm: Half of the day, AM = ante meridiem, PM = post meridiem\n\n## Parameters\n\n| Name | Type | Description | Default |\n|--------|-------------------------------|-----------------------------------------------------------------------------------------------------------|------------|\n| `idx` | pd.Series or pd.DatetimeIndex | idx is a pandas Series object containing datetime values. Alternatively a pd.DatetimeIndex can be passed. | _required_ |\n\n## Returns\n\n| Type | Description |\n|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|\n| The function `tk_get_timeseries_signature` returns a pandas DataFrame that contains 29 different date and time based features derived from a single datetime column. 
| |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\npd.set_option('display.max_columns', None)\n\ndates = pd.date_range(start = '2019-01', end = '2019-03', freq = 'D')\n\n# Makes 29 new time series features from the dates\ntk.get_timeseries_signature(dates).head()\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
[HTML table output: first 5 rows of the 29-column signature for daily dates 2019-01-01 through 2019-01-05, e.g. for 2019-01-01: index_num = 1546300800, year = 2019, quarteryear = 2019Q1, month_lbl = January, wday_lbl = Tuesday, am_pm = am]
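The returned frame lines up row for row with the input dates, so it can serve directly as a model matrix. A minimal sketch in which the target `y` is fabricated for illustration:

```python
import pandas as pd
import pytimetk as tk

dates = pd.date_range(start='2019-01', end='2019-03', freq='D')
y = pd.Series(range(len(dates)), name='target')  # illustrative target

# One feature row per input date, e.g. for a regression on calendar effects
X = tk.get_timeseries_signature(dates)
assert len(X) == len(y)
```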
\n```\n:::\n:::\n\n\n", "supporting": [ - "get_timeseries_signature_files\\figure-html" + "get_timeseries_signature_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/is_holiday/execute-results/html.json b/docs/_freeze/reference/is_holiday/execute-results/html.json index c4e8d677..471ee469 100644 --- a/docs/_freeze/reference/is_holiday/execute-results/html.json +++ b/docs/_freeze/reference/is_holiday/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "e36a5dc74fcf52f3ab6ee765f11754b1", + "hash": "230bd4dba48ac447c027753a8f9b28bd", "result": { "markdown": "---\ntitle: is_holiday\n---\n\n\n\n`is_holiday(idx, country_name='UnitedStates', country=None)`\n\nCheck if a given list of dates are holidays for a specified country.\n\nNote: This function requires the `holidays` package to be installed.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------|-------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|------------------|\n| `idx` | Union\\[str, datetime, List\\[Union\\[str, datetime\\]\\], pd.DatetimeIndex, pd.Series\\] | The dates to check for holiday status. | _required_ |\n| `country_name` | str | The name of the country for which to check the holiday status. Defaults to 'UnitedStates' if not specified. | `'UnitedStates'` |\n| `country` | str | An alternative parameter to specify the country for holiday checking, overriding country_name. | `None` |\n\n## Returns:\n\npd.Series:\n Series containing True if the date is a holiday, False otherwise.\n\n\n\n## Raises:\n\nValueError:\n If the specified country is not found in the holidays package.\n\n\n\n## Examples:\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\ntk.is_holiday('2023-01-01', country_name='UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n0 True\nName: is_holiday, dtype: bool\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# List of dates\ntk.is_holiday(['2023-01-01', '2023-01-02', '2023-01-03'], country_name='UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n0 True\n1 True\n2 False\nName: is_holiday, dtype: bool\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# DatetimeIndex\ntk.is_holiday(pd.date_range(\"2023-01-01\", \"2023-01-03\"), country_name='UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\n0 True\n1 True\n2 False\nName: is_holiday, dtype: bool\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Pandas Series Method\n( \n pd.Series(pd.date_range(\"2023-01-01\", \"2023-01-03\"))\n .is_holiday(country_name='UnitedStates')\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n0 True\n1 True\n2 False\nName: is_holiday, dtype: bool\n```\n:::\n:::\n\n\n", "supporting": [ - "is_holiday_files\\figure-html" + "is_holiday_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_freeze/reference/load_dataset/execute-results/html.json b/docs/_freeze/reference/load_dataset/execute-results/html.json index fa4727d9..6bbc1d60 100644 --- a/docs/_freeze/reference/load_dataset/execute-results/html.json +++ b/docs/_freeze/reference/load_dataset/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "7e46c6fb667f2333a394aa5114908fdf", + "hash": 
"d3a04477738cc6629ab3933c2b52be53", "result": { - "markdown": "---\ntitle: load_dataset\n---\n\n\n\n`datasets.get_datasets.load_dataset(name='m4_daily', verbose=False, **kwargs)`\n\nLoad one of 12 Time Series Datasets.\n\nThe `load_dataset` function is used to load various time series datasets by name, with options to print the available datasets and pass additional arguments to `pandas.read_csv`. The available datasets are:\n\n- `m4_hourly`: The M4 hourly dataset\n- `m4_daily`: The M4 daily dataset\n- `m4_weekly`: The M4 weekly dataset\n- `m4_monthly`: The M4 monthly dataset\n- `m4_quarterly`: The M4 quarterly dataset\n- `m4_yearly`: The M4 yearly dataset\n- `bike_sharing_daily`: The bike sharing daily dataset\n- `bike_sales_sample`: The bike sales sample dataset\n- `taylor_30_min`: The Taylor 30 minute dataset\n- `walmart_sales_weekly`: The Walmart sales weekly dataset\n- `wikipedia_traffic_daily`: The Wikipedia traffic daily dataset\n- `stocks_daily`: The MAANNG stocks dataset\n\nThe datasets can be loaded with `pytimetk.load_dataset(name)`, where `name` is the name of the dataset that you want to load. The default value is set to \"m4_daily\", which is the M4 daily dataset. However, you can choose from a list of available datasets mentioned above.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|\n| `name` | str | The `name` parameter is used to specify the name of the dataset that you want to load. The default value is set to \"m4_daily\", which is the M4 daily dataset. However, you can choose from a list of available datasets mentioned in the function's docstring. | `'m4_daily'` |\n| `verbose` | bool | The `verbose` parameter is a boolean flag that determines whether or not to print the names of the available datasets. If `verbose` is set to `True`, the function will print the names of the available datasets. If `verbose` is set to `False`, the function will not print anything. | `False` |\n| `**kwargs` | | The `**kwargs` parameter is used to pass additional arguments to `pandas.read_csv`. | `{}` |\n\n## Returns\n\n| Type | Description |\n|--------------|----------------------------------------------------------------------------------|\n| pd.DataFrame | The `load_dataset` function returns the requested dataset as a pandas DataFrame. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n```\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Stocks Daily Dataset: META, APPL, AMZN, NFLX, NVDA, GOOG\ndf = tk.load_dataset('stocks_daily', parse_dates = ['date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|       | symbol | date       | open       | high       | low        | close      | volume   | adjusted   |
|-------|--------|------------|------------|------------|------------|------------|----------|------------|
| 0     | META   | 2013-01-02 | 27.440001  | 28.180000  | 27.420000  | 28.000000  | 69846400 | 28.000000  |
| 1     | META   | 2013-01-03 | 27.879999  | 28.469999  | 27.590000  | 27.770000  | 63140600 | 27.770000  |
| 2     | META   | 2013-01-04 | 28.010000  | 28.930000  | 27.830000  | 28.760000  | 72715400 | 28.760000  |
| 3     | META   | 2013-01-07 | 28.690001  | 29.790001  | 28.650000  | 29.420000  | 83781800 | 29.420000  |
| 4     | META   | 2013-01-08 | 29.510000  | 29.600000  | 28.860001  | 29.059999  | 45871300 | 29.059999  |
| ...   | ...    | ...        | ...        | ...        | ...        | ...        | ...      | ...        |
| 16189 | GOOG   | 2023-09-15 | 138.800003 | 139.360001 | 137.179993 | 138.300003 | 48947600 | 138.300003 |
| 16190 | GOOG   | 2023-09-18 | 137.630005 | 139.929993 | 137.630005 | 138.960007 | 16233600 | 138.960007 |
| 16191 | GOOG   | 2023-09-19 | 138.250000 | 139.175003 | 137.500000 | 138.830002 | 15479100 | 138.830002 |
| 16192 | GOOG   | 2023-09-20 | 138.830002 | 138.839996 | 134.520004 | 134.589996 | 21473500 | 134.589996 |
| 16193 | GOOG   | 2023-09-21 | 132.389999 | 133.190002 | 131.089996 | 131.360001 | 22042700 | 131.360001 |

16194 rows × 8 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Bike Sales CRM Sample Dataset\ndf = tk.load_dataset('bike_sales_sample', parse_dates = ['order_date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
|      | order_id | order_line | order_date | quantity | price | total_price | model                    | category_1 | category_2         | frame_material | bikeshop_name             | city        | state |
|------|----------|------------|------------|----------|-------|-------------|--------------------------|------------|--------------------|----------------|---------------------------|-------------|-------|
| 0    | 1        | 1          | 2011-01-07 | 1        | 6070  | 6070        | Jekyll Carbon 2          | Mountain   | Over Mountain      | Carbon         | Ithaca Mountain Climbers  | Ithaca      | NY    |
| 1    | 1        | 2          | 2011-01-07 | 1        | 5970  | 5970        | Trigger Carbon 2         | Mountain   | Over Mountain      | Carbon         | Ithaca Mountain Climbers  | Ithaca      | NY    |
| 2    | 2        | 1          | 2011-01-10 | 1        | 2770  | 2770        | Beast of the East 1      | Mountain   | Trail              | Aluminum       | Kansas City 29ers         | Kansas City | KS    |
| 3    | 2        | 2          | 2011-01-10 | 1        | 5970  | 5970        | Trigger Carbon 2         | Mountain   | Over Mountain      | Carbon         | Kansas City 29ers         | Kansas City | KS    |
| 4    | 3        | 1          | 2011-01-10 | 1        | 10660 | 10660       | Supersix Evo Hi-Mod Team | Road       | Elite Road         | Carbon         | Louisville Race Equipment | Louisville  | KY    |
| ...  | ...      | ...        | ...        | ...      | ...   | ...         | ...                      | ...        | ...                | ...            | ...                       | ...         | ...   |
| 2461 | 321      | 3          | 2011-12-22 | 1        | 1410  | 1410        | CAAD8 105                | Road       | Elite Road         | Aluminum       | Miami Race Equipment      | Miami       | FL    |
| 2462 | 322      | 1          | 2011-12-28 | 1        | 1250  | 1250        | Synapse Disc Tiagra      | Road       | Endurance Road     | Aluminum       | Phoenix Bi-peds           | Phoenix     | AZ    |
| 2463 | 322      | 2          | 2011-12-28 | 1        | 2660  | 2660        | Bad Habit 2              | Mountain   | Trail              | Aluminum       | Phoenix Bi-peds           | Phoenix     | AZ    |
| 2464 | 322      | 3          | 2011-12-28 | 1        | 2340  | 2340        | F-Si 1                   | Mountain   | Cross Country Race | Aluminum       | Phoenix Bi-peds           | Phoenix     | AZ    |
| 2465 | 322      | 4          | 2011-12-28 | 1        | 5860  | 5860        | Synapse Hi-Mod Dura Ace  | Road       | Endurance Road     | Carbon         | Phoenix Bi-peds           | Phoenix     | AZ    |

2466 rows × 13 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Taylor 30-Minute Power Demand Dataset\ndf = tk.load_dataset('taylor_30_min', parse_dates = ['date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
|      | date                      | value |
|------|---------------------------|-------|
| 0    | 2000-06-05 00:00:00+00:00 | 22262 |
| 1    | 2000-06-05 00:30:00+00:00 | 21756 |
| 2    | 2000-06-05 01:00:00+00:00 | 22247 |
| 3    | 2000-06-05 01:30:00+00:00 | 22759 |
| 4    | 2000-06-05 02:00:00+00:00 | 22549 |
| ...  | ...                       | ...   |
| 4027 | 2000-08-27 21:30:00+00:00 | 27946 |
| 4028 | 2000-08-27 22:00:00+00:00 | 27133 |
| 4029 | 2000-08-27 22:30:00+00:00 | 25996 |
| 4030 | 2000-08-27 23:00:00+00:00 | 24610 |
| 4031 | 2000-08-27 23:30:00+00:00 | 23132 |

4032 rows × 2 columns
\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: load_dataset\n---\n\n\n\n`load_dataset(name='m4_daily', verbose=False, **kwargs)`\n\nLoad one of 12 Time Series Datasets.\n\nThe `load_dataset` function is used to load various time series datasets by name, with options to print the available datasets and pass additional arguments to `pandas.read_csv`. The available datasets are:\n\n- `m4_hourly`: The M4 hourly dataset\n- `m4_daily`: The M4 daily dataset\n- `m4_weekly`: The M4 weekly dataset\n- `m4_monthly`: The M4 monthly dataset\n- `m4_quarterly`: The M4 quarterly dataset\n- `m4_yearly`: The M4 yearly dataset\n- `bike_sharing_daily`: The bike sharing daily dataset\n- `bike_sales_sample`: The bike sales sample dataset\n- `taylor_30_min`: The Taylor 30 minute dataset\n- `walmart_sales_weekly`: The Walmart sales weekly dataset\n- `wikipedia_traffic_daily`: The Wikipedia traffic daily dataset\n- `stocks_daily`: The MAANNG stocks dataset\n- `expedia`: Expedia Hotel Time Series Dataset\n\nThe datasets can be loaded with `pytimetk.load_dataset(name)`, where `name` is the name of the dataset that you want to load. The default value is set to \"m4_daily\", which is the M4 daily dataset. However, you can choose from a list of available datasets mentioned above.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|\n| `name` | str | The `name` parameter is used to specify the name of the dataset that you want to load. The default value is set to \"m4_daily\", which is the M4 daily dataset. However, you can choose from a list of available datasets mentioned in the function's docstring. | `'m4_daily'` |\n| `verbose` | bool | The `verbose` parameter is a boolean flag that determines whether or not to print the names of the available datasets. If `verbose` is set to `True`, the function will print the names of the available datasets. If `verbose` is set to `False`, the function will not print anything. | `False` |\n| `**kwargs` | | The `**kwargs` parameter is used to pass additional arguments to `pandas.read_csv`. | `{}` |\n\n## Returns\n\n| Type | Description |\n|--------------|----------------------------------------------------------------------------------|\n| pd.DataFrame | The `load_dataset` function returns the requested dataset as a pandas DataFrame. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n```\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Stocks Daily Dataset: META, APPL, AMZN, NFLX, NVDA, GOOG\ndf = tk.load_dataset('stocks_daily', parse_dates = ['date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
|       | symbol | date       | open       | high       | low        | close      | volume   | adjusted   |
|-------|--------|------------|------------|------------|------------|------------|----------|------------|
| 0     | META   | 2013-01-02 | 27.440001  | 28.180000  | 27.420000  | 28.000000  | 69846400 | 28.000000  |
| 1     | META   | 2013-01-03 | 27.879999  | 28.469999  | 27.590000  | 27.770000  | 63140600 | 27.770000  |
| 2     | META   | 2013-01-04 | 28.010000  | 28.930000  | 27.830000  | 28.760000  | 72715400 | 28.760000  |
| 3     | META   | 2013-01-07 | 28.690001  | 29.790001  | 28.650000  | 29.420000  | 83781800 | 29.420000  |
| 4     | META   | 2013-01-08 | 29.510000  | 29.600000  | 28.860001  | 29.059999  | 45871300 | 29.059999  |
| ...   | ...    | ...        | ...        | ...        | ...        | ...        | ...      | ...        |
| 16189 | GOOG   | 2023-09-15 | 138.800003 | 139.360001 | 137.179993 | 138.300003 | 48947600 | 138.300003 |
| 16190 | GOOG   | 2023-09-18 | 137.630005 | 139.929993 | 137.630005 | 138.960007 | 16233600 | 138.960007 |
| 16191 | GOOG   | 2023-09-19 | 138.250000 | 139.175003 | 137.500000 | 138.830002 | 15479100 | 138.830002 |
| 16192 | GOOG   | 2023-09-20 | 138.830002 | 138.839996 | 134.520004 | 134.589996 | 21473500 | 134.589996 |
| 16193 | GOOG   | 2023-09-21 | 132.389999 | 133.190002 | 131.089996 | 131.360001 | 22042700 | 131.360001 |

16194 rows × 8 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Bike Sales CRM Sample Dataset\ndf = tk.load_dataset('bike_sales_sample', parse_dates = ['order_date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
|      | order_id | order_line | order_date | quantity | price | total_price | model                    | category_1 | category_2         | frame_material | bikeshop_name             | city        | state |
|------|----------|------------|------------|----------|-------|-------------|--------------------------|------------|--------------------|----------------|---------------------------|-------------|-------|
| 0    | 1        | 1          | 2011-01-07 | 1        | 6070  | 6070        | Jekyll Carbon 2          | Mountain   | Over Mountain      | Carbon         | Ithaca Mountain Climbers  | Ithaca      | NY    |
| 1    | 1        | 2          | 2011-01-07 | 1        | 5970  | 5970        | Trigger Carbon 2         | Mountain   | Over Mountain      | Carbon         | Ithaca Mountain Climbers  | Ithaca      | NY    |
| 2    | 2        | 1          | 2011-01-10 | 1        | 2770  | 2770        | Beast of the East 1      | Mountain   | Trail              | Aluminum       | Kansas City 29ers         | Kansas City | KS    |
| 3    | 2        | 2          | 2011-01-10 | 1        | 5970  | 5970        | Trigger Carbon 2         | Mountain   | Over Mountain      | Carbon         | Kansas City 29ers         | Kansas City | KS    |
| 4    | 3        | 1          | 2011-01-10 | 1        | 10660 | 10660       | Supersix Evo Hi-Mod Team | Road       | Elite Road         | Carbon         | Louisville Race Equipment | Louisville  | KY    |
| ...  | ...      | ...        | ...        | ...      | ...   | ...         | ...                      | ...        | ...                | ...            | ...                       | ...         | ...   |
| 2461 | 321      | 3          | 2011-12-22 | 1        | 1410  | 1410        | CAAD8 105                | Road       | Elite Road         | Aluminum       | Miami Race Equipment      | Miami       | FL    |
| 2462 | 322      | 1          | 2011-12-28 | 1        | 1250  | 1250        | Synapse Disc Tiagra      | Road       | Endurance Road     | Aluminum       | Phoenix Bi-peds           | Phoenix     | AZ    |
| 2463 | 322      | 2          | 2011-12-28 | 1        | 2660  | 2660        | Bad Habit 2              | Mountain   | Trail              | Aluminum       | Phoenix Bi-peds           | Phoenix     | AZ    |
| 2464 | 322      | 3          | 2011-12-28 | 1        | 2340  | 2340        | F-Si 1                   | Mountain   | Cross Country Race | Aluminum       | Phoenix Bi-peds           | Phoenix     | AZ    |
| 2465 | 322      | 4          | 2011-12-28 | 1        | 5860  | 5860        | Synapse Hi-Mod Dura Ace  | Road       | Endurance Road     | Carbon         | Phoenix Bi-peds           | Phoenix     | AZ    |

2466 rows × 13 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Taylor 30-Minute Power Demand Dataset\ndf = tk.load_dataset('taylor_30_min', parse_dates = ['date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
|      | date                      | value |
|------|---------------------------|-------|
| 0    | 2000-06-05 00:00:00+00:00 | 22262 |
| 1    | 2000-06-05 00:30:00+00:00 | 21756 |
| 2    | 2000-06-05 01:00:00+00:00 | 22247 |
| 3    | 2000-06-05 01:30:00+00:00 | 22759 |
| 4    | 2000-06-05 02:00:00+00:00 | 22549 |
| ...  | ...                       | ...   |
| 4027 | 2000-08-27 21:30:00+00:00 | 27946 |
| 4028 | 2000-08-27 22:00:00+00:00 | 27133 |
| 4029 | 2000-08-27 22:30:00+00:00 | 25996 |
| 4030 | 2000-08-27 23:00:00+00:00 | 24610 |
| 4031 | 2000-08-27 23:30:00+00:00 | 23132 |

4032 rows × 2 columns
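Because `**kwargs` is forwarded to `pandas.read_csv`, any standard reader option can be passed through. For example, `nrows` (a plain `read_csv` option) limits how much of a dataset is read:

```python
import pytimetk as tk

# Quick look at a dataset without loading all of it
df_small = tk.load_dataset('m4_daily', parse_dates=['date'], nrows=100)
```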
\n```\n:::\n:::\n\n\n", "supporting": [ - "load_dataset_files\\figure-html" + "load_dataset_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/make_future_timeseries/execute-results/html.json b/docs/_freeze/reference/make_future_timeseries/execute-results/html.json index a18b0017..54965239 100644 --- a/docs/_freeze/reference/make_future_timeseries/execute-results/html.json +++ b/docs/_freeze/reference/make_future_timeseries/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "be8f3037be160a6d5f023d404f074e28", + "hash": "9b4f4d8a56c88a8dde274fdedbdd7ded", "result": { "markdown": "---\ntitle: make_future_timeseries\n---\n\n\n\n`make_future_timeseries(idx, length_out, force_regular=False)`\n\nMake future dates for a time series.\n\nThe function `make_future_timeseries` takes a pandas Series or DateTimeIndex and generates a future sequence of dates based on the frequency of the input series.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|-----------------|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `idx` | pd.Series or pd.DateTimeIndex | The `idx` parameter is the input time series data. It can be either a pandas Series or a pandas DateTimeIndex. It represents the existing dates in the time series. | _required_ |\n| `length_out` | int | The parameter `length_out` is an integer that represents the number of future dates to generate for the time series. | _required_ |\n| `force_regular` | bool | The `force_regular` parameter is a boolean flag that determines whether the frequency of the future dates should be forced to be regular. If `force_regular` is set to `True`, the frequency of the future dates will be forced to be regular. If `force_regular` is set to `False`, the frequency of the future dates will be inferred from the input data (e.g. business calendars might be used). The default value is `False`. | `False` |\n\n## Returns\n\n| Type | Description |\n|-----------|-------------------------------------------------|\n| pd.Series | A pandas Series object containing future dates. 
|\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\ndates = pd.Series(pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04']))\ndates\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n0 2022-01-01\n1 2022-01-02\n2 2022-01-03\n3 2022-01-04\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# DateTimeIndex: Generate 5 future dates\nfuture_dates_dt = tk.make_future_timeseries(dates, 5)\nfuture_dates_dt\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n0 2022-01-05\n1 2022-01-06\n2 2022-01-07\n3 2022-01-08\n4 2022-01-09\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Series: Generate 5 future dates\npd.Series(future_dates_dt).make_future_timeseries(5)\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\n0 2022-01-10\n1 2022-01-11\n2 2022-01-12\n3 2022-01-13\n4 2022-01-14\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\ntimestamps = [\"2023-01-01 01:00\", \"2023-01-01 02:00\", \"2023-01-01 03:00\", \"2023-01-01 04:00\", \"2023-01-01 05:00\"]\n\ndates = pd.to_datetime(timestamps)\n\ntk.make_future_timeseries(dates, 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n0 2023-01-01 06:00:00\n1 2023-01-01 07:00:00\n2 2023-01-01 08:00:00\n3 2023-01-01 09:00:00\n4 2023-01-01 10:00:00\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Monthly Frequency: Generate 4 future dates\ndates = pd.to_datetime([\"2021-01-01\", \"2021-02-01\", \"2021-03-01\", \"2021-04-01\"])\n\ntk.make_future_timeseries(dates, 4)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n0 2021-05-01\n1 2021-06-01\n2 2021-07-01\n3 2021-08-01\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# Quarterly Frequency: Generate 4 future dates\ndates = pd.to_datetime([\"2021-01-01\", \"2021-04-01\", \"2021-07-01\", \"2021-10-01\"])\n\ntk.make_future_timeseries(dates, 4)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\n0 2022-01-01\n1 2022-04-01\n2 2022-07-01\n3 2022-10-01\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# Irregular Dates: Business Days\ndates = pd.to_datetime([\"2021-01-01\", \"2021-01-04\", \"2021-01-05\", \"2021-01-06\"])\n\ntk.get_pandas_frequency(dates)\n\ntk.make_future_timeseries(dates, 4)\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```\n0 2021-01-07\n1 2021-01-08\n2 2021-01-11\n3 2021-01-12\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=8}\n``` {.python .cell-code}\n# Irregular Dates: Business Days (Force Regular) \ntk.make_future_timeseries(dates, 4, force_regular=True)\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\n0 2021-01-07\n1 2021-01-08\n2 2021-01-09\n3 2021-01-10\ndtype: datetime64[ns]\n```\n:::\n:::\n\n\n", "supporting": [ - "make_future_timeseries_files\\figure-html" + "make_future_timeseries_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_freeze/reference/make_weekday_sequence/execute-results/html.json b/docs/_freeze/reference/make_weekday_sequence/execute-results/html.json index c169804d..fb094382 100644 --- a/docs/_freeze/reference/make_weekday_sequence/execute-results/html.json +++ 
b/docs/_freeze/reference/make_weekday_sequence/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "94e6ab762bdd4dd9272ca76bff00cb5a", + "hash": "77df3fd005c340e01d234513ddedf9ad", "result": { "markdown": "---\ntitle: make_weekday_sequence\n---\n\n\n\n`make_weekday_sequence(start_date, end_date, sunday_to_thursday=False, remove_holidays=False, country=None)`\n\nGenerate a sequence of weekday dates within a specified date range, optionally excluding weekends and holidays.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------------|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `start_date` | str or datetime or pd.DatetimeIndex | The start date of the date range. | _required_ |\n| `end_date` | str or datetime or pd.DatetimeIndex | The end date of the date range. | _required_ |\n| `sunday_to_thursday` | bool | If True, generates a sequence with Sunday to Thursday weekdays (excluding Friday and Saturday). If False (default), generates a sequence with Monday to Friday weekdays. | `False` |\n| `remove_holidays` | (bool, optional) | If True, excludes holidays (based on the specified country) from the generated sequence. If False (default), includes holidays in the sequence. | `False` |\n| `country` | str | The name of the country for which to generate holiday-specific sequences. Defaults to None, which uses the United States as the default country. | `None` |\n\n## Returns\n\n| Type | Description |\n|-----------|--------------------------------------------------|\n| pd.Series | A Series containing the generated weekday dates. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\n# United States has Monday to Friday as weekdays (excluding Saturday and Sunday and holidays)\ntk.make_weekday_sequence(\"2023-01-01\", \"2023-01-15\", sunday_to_thursday=False, remove_holidays=True, country='UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n0 2023-01-03\n1 2023-01-04\n2 2023-01-05\n3 2023-01-06\n4 2023-01-09\n5 2023-01-10\n6 2023-01-11\n7 2023-01-12\n8 2023-01-13\nName: Weekday Dates, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Israel has Sunday to Thursday as weekdays (excluding Friday and Saturday and Israel holidays)\ntk.make_weekday_sequence(\"2023-01-01\", \"2023-01-15\", sunday_to_thursday=True, remove_holidays=True, country='Israel')\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n0 2023-01-01\n1 2023-01-02\n2 2023-01-03\n3 2023-01-04\n4 2023-01-05\n5 2023-01-08\n6 2023-01-09\n7 2023-01-10\n8 2023-01-11\n9 2023-01-12\n10 2023-01-15\nName: Weekday Dates, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n", "supporting": [ - "make_weekday_sequence_files\\figure-html" + "make_weekday_sequence_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_freeze/reference/make_weekend_sequence/execute-results/html.json b/docs/_freeze/reference/make_weekend_sequence/execute-results/html.json index 7b8598fb..0c88e1f5 100644 --- a/docs/_freeze/reference/make_weekend_sequence/execute-results/html.json +++ b/docs/_freeze/reference/make_weekend_sequence/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "868513ab5d84adef7d9b9797427f3dbd", + "hash": "891c8ad358c2d0dbd7f1ff31765a3223", "result": { "markdown": 
"---\ntitle: make_weekend_sequence\n---\n\n\n\n`make_weekend_sequence(start_date, end_date, friday_saturday=False, remove_holidays=False, country=None)`\n\nGenerate a sequence of weekend dates within a specified date range, optionally excluding holidays.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|-------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `start_date` | str or datetime or pd.DatetimeIndex | The start date of the date range. | _required_ |\n| `end_date` | str or datetime or pd.DatetimeIndex | The end date of the date range. | _required_ |\n| `friday_saturday` | bool | If True, generates a sequence with Friday and Saturday as weekends.If False (default), generates a sequence with Saturday and Sunday as weekends. | `False` |\n| `remove_holidays` | bool | If True, excludes holidays (based on the specified country) from the generated sequence. If False (default), includes holidays in the sequence. | `False` |\n| `country` | str | The name of the country for which to generate holiday-specific sequences. Defaults to None, which uses the United States as the default country. | `None` |\n\n## Returns\n\n| Type | Description |\n|-----------|--------------------------------------------------|\n| pd.Series | A Series containing the generated weekday dates. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\n# United States has Saturday and Sunday as weekends\ntk.make_weekend_sequence(\"2023-01-01\", \"2023-01-31\", friday_saturday=False, remove_holidays=True, country='UnitedStates')\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\n0 2023-01-07\n1 2023-01-08\n2 2023-01-14\n3 2023-01-15\n4 2023-01-21\n5 2023-01-22\n6 2023-01-28\n7 2023-01-29\nName: Weekend Dates, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Saudi Arabia has Friday and Saturday as weekends\ntk.make_weekend_sequence(\"2023-01-01\", \"2023-01-31\", friday_saturday=True, remove_holidays=True, country='SaudiArabia')\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n0 2023-01-06\n1 2023-01-07\n2 2023-01-13\n3 2023-01-14\n4 2023-01-20\n5 2023-01-21\n6 2023-01-27\n7 2023-01-28\nName: Weekend Dates, dtype: datetime64[ns]\n```\n:::\n:::\n\n\n", "supporting": [ - "make_weekend_sequence_files\\figure-html" + "make_weekend_sequence_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_freeze/reference/pad_by_time/execute-results/html.json b/docs/_freeze/reference/pad_by_time/execute-results/html.json index 1d3aa19f..9c18df3d 100644 --- a/docs/_freeze/reference/pad_by_time/execute-results/html.json +++ b/docs/_freeze/reference/pad_by_time/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "9284b564dde025e41b498f83aad472ac", + "hash": "93cba57a0e0cc6cd43e20aeee2881fca", "result": { "markdown": "---\ntitle: pad_by_time\n---\n\n\n\n`pad_by_time(data, date_column, freq='D', start_date=None, end_date=None)`\n\nMake irregular time series regular by padding with missing dates.\n\n\nThe `pad_by_time` function inserts missing dates into a Pandas DataFrame or DataFrameGroupBy object, through the process making an irregularly spaced time series regularly spaced.\n\n## Parameters\n\n| Name | Type | Description | Default 
|\n|---------------|----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The `data` parameter can be either a Pandas DataFrame or a Pandas DataFrameGroupBy object. It represents the data that you want to pad with missing dates. | _required_ |\n| `date_column` | str | The `date_column` parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to determine the minimum and maximum dates in the DataFrame, and to generate the regular date range for padding. | _required_ |\n| `freq` | str | The `freq` parameter specifies the frequency at which the missing timestamps should be generated. It accepts a string representing a pandas frequency alias. Some common frequency aliases include: - S: secondly frequency - min: minute frequency - H: hourly frequency - B: business day frequency - D: daily frequency - W: weekly frequency - M: month end frequency - MS: month start frequency - BMS: Business month start - Q: quarter end frequency - QS: quarter start frequency - Y: year end frequency - YS: year start frequency | `'D'` |\n| `start_date` | str | Specifies the start of the padded series. If None, it will use the lowest value of the input variable. In the case of groups, it will use the lowest value by group. | `None` |\n| `end_date` | str | Specifies the end of the padded series. If None, it will use the highest value of the input variable. In the case of groups, it will use the highest value by group. | `None` |\n\n## Returns\n\n| Type | Description |\n|--------------|--------------------------------------------------------------------------------------------------------------------|\n| pd.DataFrame | The function `pad_by_time` returns a Pandas DataFrame that has been padded with missing dates to make the time series regular. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport pytimetk as tk\n\ndf = tk.load_dataset('stocks_daily', parse_dates = ['date'])\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
| | symbol | date | open | high | low | close | volume | adjusted |
|---|---|---|---|---|---|---|---|---|
| 0 | META | 2013-01-02 | 27.440001 | 28.180000 | 27.420000 | 28.000000 | 69846400 | 28.000000 |
| 1 | META | 2013-01-03 | 27.879999 | 28.469999 | 27.590000 | 27.770000 | 63140600 | 27.770000 |
| 2 | META | 2013-01-04 | 28.010000 | 28.930000 | 27.830000 | 28.760000 | 72715400 | 28.760000 |
| 3 | META | 2013-01-07 | 28.690001 | 29.790001 | 28.650000 | 29.420000 | 83781800 | 29.420000 |
| 4 | META | 2013-01-08 | 29.510000 | 29.600000 | 28.860001 | 29.059999 | 45871300 | 29.059999 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16189 | GOOG | 2023-09-15 | 138.800003 | 139.360001 | 137.179993 | 138.300003 | 48947600 | 138.300003 |
| 16190 | GOOG | 2023-09-18 | 137.630005 | 139.929993 | 137.630005 | 138.960007 | 16233600 | 138.960007 |
| 16191 | GOOG | 2023-09-19 | 138.250000 | 139.175003 | 137.500000 | 138.830002 | 15479100 | 138.830002 |
| 16192 | GOOG | 2023-09-20 | 138.830002 | 138.839996 | 134.520004 | 134.589996 | 21473500 | 134.589996 |
| 16193 | GOOG | 2023-09-21 | 132.389999 | 133.190002 | 131.089996 | 131.360001 | 22042700 | 131.360001 |

16194 rows × 8 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Pad Single Time Series: Fill missing dates\npadded_df = (\n df\n .query('symbol == \"AAPL\"')\n .pad_by_time(\n date_column = 'date',\n freq = 'D'\n )\n .assign(id = lambda x: x['symbol'].ffill())\n)\npadded_df \n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
| | date | symbol | open | high | low | close | volume | adjusted | id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-02 | AAPL | 19.779285 | 19.821428 | 19.343929 | 19.608213 | 560518000.0 | 16.791180 | AAPL |
| 1 | 2013-01-03 | AAPL | 19.567142 | 19.631071 | 19.321428 | 19.360714 | 352965200.0 | 16.579241 | AAPL |
| 2 | 2013-01-04 | AAPL | 19.177500 | 19.236786 | 18.779642 | 18.821428 | 594333600.0 | 16.117437 | AAPL |
| 3 | 2013-01-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | AAPL |
| 4 | 2013-01-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | AAPL |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3910 | 2023-09-17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | AAPL |
| 3911 | 2023-09-18 | AAPL | 176.479996 | 179.380005 | 176.169998 | 177.970001 | 67257600.0 | 177.970001 | AAPL |
| 3912 | 2023-09-19 | AAPL | 177.520004 | 179.630005 | 177.130005 | 179.070007 | 51826900.0 | 179.070007 | AAPL |
| 3913 | 2023-09-20 | AAPL | 179.259995 | 179.699997 | 175.399994 | 175.490005 | 58436200.0 | 175.490005 | AAPL |
| 3914 | 2023-09-21 | AAPL | 174.550003 | 176.300003 | 173.860001 | 173.929993 | 63047900.0 | 173.929993 | AAPL |

3915 rows × 9 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Pad by Group: Pad each group with missing dates\npadded_df = (\n df\n .groupby('symbol')\n .pad_by_time(\n date_column = 'date',\n freq = 'D'\n )\n)\npadded_df\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
| | date | symbol | open | high | low | close | volume | adjusted |
|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-02 | AAPL | 19.779285 | 19.821428 | 19.343929 | 19.608213 | 560518000.0 | 16.791180 |
| 1 | 2013-01-03 | AAPL | 19.567142 | 19.631071 | 19.321428 | 19.360714 | 352965200.0 | 16.579241 |
| 2 | 2013-01-04 | AAPL | 19.177500 | 19.236786 | 18.779642 | 18.821428 | 594333600.0 | 16.117437 |
| 3 | 2013-01-05 | AAPL | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2013-01-06 | AAPL | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23485 | 2023-09-17 | NVDA | NaN | NaN | NaN | NaN | NaN | NaN |
| 23486 | 2023-09-18 | NVDA | 427.480011 | 442.420013 | 420.000000 | 439.660004 | 50027100.0 | 439.660004 |
| 23487 | 2023-09-19 | NVDA | 438.329987 | 439.660004 | 430.019989 | 435.200012 | 37306400.0 | 435.200012 |
| 23488 | 2023-09-20 | NVDA | 436.000000 | 439.029999 | 422.230011 | 422.390015 | 36710800.0 | 422.390015 |
| 23489 | 2023-09-21 | NVDA | 415.829987 | 421.000000 | 409.799988 | 410.170013 | 44893000.0 | 410.170013 |

23490 rows × 8 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Pad with start and end dates specified\npadded_df = (\n df\n .groupby('symbol')\n .pad_by_time(\n date_column = 'date',\n freq = 'D',\n start_date = '2013-01-01',\n end_date = '2023-09-21'\n )\n)\npadded_df\n```\n:::\n\n\n", "supporting": [ - "pad_by_time_files\\figure-html" + "pad_by_time_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/plot_timeseries/execute-results/html.json b/docs/_freeze/reference/plot_timeseries/execute-results/html.json index da5c2047..ff1db82e 100644 --- a/docs/_freeze/reference/plot_timeseries/execute-results/html.json +++ b/docs/_freeze/reference/plot_timeseries/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "bf8260c77c30a3817b65a7e17a42f694", + "hash": "f08f90b512a2c2a9b9cd9b049b2ee907", "result": { - "markdown": "---\ntitle: plot_timeseries\n---\n\n\n\n`plot_timeseries(data, date_column, value_column, color_column=None, color_palette=None, facet_ncol=1, facet_nrow=None, facet_scales='free_y', facet_dir='h', line_color='#2c3e50', line_size=0.65, line_type='solid', line_alpha=1.0, y_intercept=None, y_intercept_color='#2c3e50', x_intercept=None, x_intercept_color='#2c3e50', smooth=True, smooth_color='#3366FF', smooth_frac=0.2, smooth_size=1.0, smooth_alpha=1.0, legend_show=True, title='Time Series Plot', x_lab='', y_lab='', color_lab='Legend', x_axis_date_labels='%b %Y', base_size=11, width=None, height=None, engine='plotly')`\n\nCreates time series plots using different plotting engines such as Plotnine, Matplotlib, and Plotly.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------------|----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The input data for the plot. It can be either a Pandas DataFrame or a Pandas DataFrameGroupBy object. | _required_ |\n| `date_column` | str | The name of the column in the DataFrame that contains the dates for the time series data. | _required_ |\n| `value_column` | str | The `value_column` parameter is used to specify the name of the column in the DataFrame that contains the values for the time series data. This column will be plotted on the y-axis of the time series plot. | _required_ |\n| `color_column` | str | The `color_column` parameter is an optional parameter that specifies the column in the DataFrame that will be used to assign colors to the different time series. If this parameter is not provided, all time series will have the same color. | `None` |\n| `color_palette` | list | The `color_palette` parameter is used to specify the colors to be used for the different time series. It accepts a list of color codes or names. If the `color_column` parameter is not provided, the `tk.palette_timetk()` color palette will be used. | `None` |\n| `facet_ncol` | int | The `facet_ncol` parameter determines the number of columns in the facet grid. It specifies how many subplots will be arranged horizontally in the plot. 
| `1` |\n| `facet_nrow` | int | The `facet_nrow` parameter determines the number of rows in the facet grid. It specifies how many subplots will be arranged vertically in the grid. | `None` |\n| `facet_scales` | str | The `facet_scales` parameter determines the scaling of the y-axis in the facetted plots. It can take the following values: - \"free_y\": The y-axis scale will be free for each facet, but the x-axis scale will be fixed for all facets. This is the default value. - \"free_x\": The y-axis scale will be free for each facet, but the x-axis scale will be fixed for all facets. - \"free\": The y-axis scale will be free for each facet (subplot). This is the default value. | `'free_y'` |\n| `facet_dir` | str | The `facet_dir` parameter determines the direction in which the facets (subplots) are arranged. It can take two possible values: - \"h\": The facets will be arranged horizontally (in rows). This is the default value. - \"v\": The facets will be arranged vertically (in columns). | `'h'` |\n| `line_color` | str | The `line_color` parameter is used to specify the color of the lines in the time series plot. It accepts a string value representing a color code or name. The default value is \"#2c3e50\", which corresponds to a dark blue color. | `'#2c3e50'` |\n| `line_size` | float | The `line_size` parameter is used to specify the size of the lines in the time series plot. It determines the thickness of the lines. | `0.65` |\n| `line_type` | str | The `line_type` parameter is used to specify the type of line to be used in the time series plot. | `'solid'` |\n| `line_alpha` | float | The `line_alpha` parameter controls the transparency of the lines in the time series plot. It accepts a value between 0 and 1, where 0 means completely transparent (invisible) and 1 means completely opaque (solid). | `1.0` |\n| `y_intercept` | float | The `y_intercept` parameter is used to add a horizontal line to the plot at a specific y-value. It can be set to a numeric value to specify the y-value of the intercept. If set to `None` (default), no y-intercept line will be added to the plot | `None` |\n| `y_intercept_color` | str | The `y_intercept_color` parameter is used to specify the color of the y-intercept line in the plot. It accepts a string value representing a color code or name. The default value is \"#2c3e50\", which corresponds to a dark blue color. You can change this value. | `'#2c3e50'` |\n| `x_intercept` | str | The `x_intercept` parameter is used to add a vertical line at a specific x-axis value on the plot. It is used to highlight a specific point or event in the time series data. - By default, it is set to `None`, which means no vertical line will be added. - You can use a date string to specify the x-axis value of the intercept. For example, \"2020-01-01\" would add a vertical line at the beginning of the year 2020. | `None` |\n| `x_intercept_color` | str | The `x_intercept_color` parameter is used to specify the color of the vertical line that represents the x-intercept in the plot. By default, it is set to \"#2c3e50\", which is a dark blue color. You can change this value to any valid color code. | `'#2c3e50'` |\n| `smooth` | bool | The `smooth` parameter is a boolean indicating whether or not to apply smoothing to the time eries data. If set to True, the time series will be smoothed using the lowess algorithm. The default value is True. | `True` |\n| `smooth_color` | str | The `smooth_color` parameter is used to specify the color of the smoothed line in the time series plot. 
It accepts a string value representing a color code or name. The default value is `#3366FF`, which corresponds to a shade of blue. You can change this value to any valid color code. | `'#3366FF'` |\n| `smooth_frac` | float | The `smooth_frac` parameter is used to control the fraction of data points used for smoothing the time series. It determines the degree of smoothing applied to the data. A smaller value of `smooth_frac` will result in more smoothing, while a larger value will result in less smoothing. The default value is 0.2. | `0.2` |\n| `smooth_size` | float | The `smooth_size` parameter is used to specify the size of the line used to plot the smoothed values in the time series plot. It is a numeric value that controls the thickness of the line. A larger value will result in a thicker line, while a smaller value will result in a thinner line | `1.0` |\n| `smooth_alpha` | float | The `smooth_alpha` parameter controls the transparency of the smoothed line in the plot. It accepts a value between 0 and 1, where 0 means completely transparent and 1 means completely opaque. | `1.0` |\n| `legend_show` | bool | The `legend_show` parameter is a boolean indicating whether or not to show the legend in the plot. If set to True, the legend will be displayed. The default value is True. | `True` |\n| `title` | str | The title of the plot. | `'Time Series Plot'` |\n| `x_lab` | str | The `x_lab` parameter is used to specify the label for the x-axis in the plot. It is a string that represents the label text. | `''` |\n| `y_lab` | str | The `y_lab` parameter is used to specify the label for the y-axis in the plot. It is a string that represents the label for the y-axis. | `''` |\n| `color_lab` | str | The `color_lab` parameter is used to specify the label for the legend or color scale in the plot. It is used to provide a description of the colors used in the plot, typically when a color column is specified. | `'Legend'` |\n| `x_axis_date_labels` | str | The `x_axis_date_labels` parameter is used to specify the format of the date labels on the x-axis of the plot. It accepts a string representing the format of the date labels. For example, \"%b %Y\" would display the month abbreviation and year (e.g., Jan 2020). | `'%b %Y'` |\n| `base_size` | float | The `base_size` parameter is used to set the base font size for the plot. It determines the size of the text elements such as axis labels, titles, and legends. | `11` |\n| `width` | int | The `width` parameter is used to specify the width of the plot. It determines the horizontal size of the plot in pixels. | `None` |\n| `height` | int | The `height` parameter is used to specify the height of the plot in pixels. It determines the vertical size of the plot when it is rendered. | `None` |\n| `engine` | str | The `engine` parameter specifies the plotting library to use for creating the time series plot. It can take one of the following values: - \"plotly\" (interactive): Use the plotly library to create the plot. This is the default value. - \"plotnine\" (static): Use the plotnine library to create the plot. This is the default value. - \"matplotlib\" (static): Use the matplotlib library to create the plot. 
| `'plotly'` |\n\n## Returns\n\n| Type | Description |\n|------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| The function `plot_timeseries` returns a plot object, depending on the specified `engine` parameter. | - If `engine` is set to 'plotnine' or 'matplotlib', the function returns a plot object that can be further customized or displayed. - If `engine` is set to 'plotly', the function returns a plotly figure object. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\n\ndf = tk.load_dataset('m4_monthly', parse_dates = ['date'])\n\n# Plotly Object: Single Time Series\nfig = (\n df\n .query('id == \"M750\"')\n .plot_timeseries(\n 'date', 'value', \n facet_ncol = 1,\n x_axis_date_labels = \"%Y\",\n engine = 'plotly',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Plotly Object: Grouped Time Series\nfig = (\n df\n .groupby('id')\n .plot_timeseries(\n 'date', 'value', \n facet_ncol = 2,\n facet_scales = \"free_y\",\n smooth_frac = 0.2,\n smooth_size = 2.0,\n y_intercept = None,\n x_axis_date_labels = \"%Y\",\n engine = 'plotly',\n width = 600,\n height = 500,\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Plotly Object: Color Column\nfig = (\n df\n .plot_timeseries(\n 'date', 'value', \n color_column = 'id',\n smooth = False,\n y_intercept = 0,\n x_axis_date_labels = \"%Y\",\n engine = 'plotly',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Plotnine Object: Single Time Series\nfig = (\n df\n .query('id == \"M1\"')\n .plot_timeseries(\n 'date', 'value', \n x_axis_date_labels = \"%Y\",\n engine = 'plotnine'\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n![](plot_timeseries_files/figure-html/cell-5-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Plotnine Object: Grouped Time Series\nfig = (\n df\n .groupby('id')\n .plot_timeseries(\n 'date', 'value',\n facet_ncol = 2,\n facet_scales = \"free\",\n line_size = 0.35,\n x_axis_date_labels = \"%Y\",\n engine = 'plotnine'\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n![](plot_timeseries_files/figure-html/cell-6-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# Plotnine Object: Color Column\nfig = (\n df\n .plot_timeseries(\n 'date', 'value', \n color_column = 'id',\n smooth = False,\n y_intercept = 0,\n x_axis_date_labels = \"%Y\",\n engine = 'plotnine',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n![](plot_timeseries_files/figure-html/cell-7-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# Matplotlib object (same as plotnine, but converted to matplotlib object)\nfig = (\n df\n .groupby('id')\n .plot_timeseries(\n 'date', 'value', \n color_column = 'id',\n facet_ncol = 2,\n x_axis_date_labels = \"%Y\",\n engine = 'matplotlib',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](plot_timeseries_files/figure-html/cell-8-output-1.png){}\n:::\n:::\n\n\n", + "markdown": "---\ntitle: plot_timeseries\n---\n\n\n\n`plot_timeseries(data, date_column, value_column, color_column=None, color_palette=None, facet_ncol=1, facet_nrow=None, facet_scales='free_y', facet_dir='h', line_color='#2c3e50', line_size=0.65, line_type='solid', line_alpha=1.0, y_intercept=None, y_intercept_color='#2c3e50', x_intercept=None, x_intercept_color='#2c3e50', smooth=True, smooth_color='#3366FF', smooth_frac=0.2, smooth_size=1.0, smooth_alpha=1.0, legend_show=True, title='Time Series Plot', x_lab='', y_lab='', color_lab='Legend', x_axis_date_labels='%b %Y', base_size=11, width=None, height=None, engine='plotly')`\n\nCreates time series plots using different plotting engines such as Plotnine, Matplotlib, and Plotly.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|----------------------|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The input data for the plot. It can be either a Pandas DataFrame or a Pandas DataFrameGroupBy object. | _required_ |\n| `date_column` | str | The name of the column in the DataFrame that contains the dates for the time series data. | _required_ |\n| `value_column` | str | The `value_column` parameter is used to specify the name of the column in the DataFrame that contains the values for the time series data. This column will be plotted on the y-axis of the time series plot. | _required_ |\n| `color_column` | str | The `color_column` parameter is an optional parameter that specifies the column in the DataFrame that will be used to assign colors to the different time series. If this parameter is not provided, all time series will have the same color. | `None` |\n| `color_palette` | list | The `color_palette` parameter is used to specify the colors to be used for the different time series. It accepts a list of color codes or names. If the `color_column` parameter is not provided, the `tk.palette_timetk()` color palette will be used. | `None` |\n| `facet_ncol` | int | The `facet_ncol` parameter determines the number of columns in the facet grid. It specifies how many subplots will be arranged horizontally in the plot. | `1` |\n| `facet_nrow` | int | The `facet_nrow` parameter determines the number of rows in the facet grid. It specifies how many subplots will be arranged vertically in the grid. | `None` |\n| `facet_scales` | str | The `facet_scales` parameter determines the scaling of the y-axis in the facetted plots. 
It can take the following values: - \"free_y\": The y-axis scale will be free for each facet, but the x-axis scale will be fixed for all facets. This is the default value. - \"free_x\": The x-axis scale will be free for each facet, but the y-axis scale will be fixed for all facets. - \"free\": Both the x-axis and y-axis scales will be free for each facet (subplot). | `'free_y'` |\n| `facet_dir` | str | The `facet_dir` parameter determines the direction in which the facets (subplots) are arranged. It can take two possible values: - \"h\": The facets will be arranged horizontally (in rows). This is the default value. - \"v\": The facets will be arranged vertically (in columns). | `'h'` |\n| `line_color` | str | The `line_color` parameter is used to specify the color of the lines in the time series plot. It accepts a string value representing a color code or name. The default value is \"#2c3e50\", which corresponds to a dark blue color. | `'#2c3e50'` |\n| `line_size` | float | The `line_size` parameter is used to specify the size of the lines in the time series plot. It determines the thickness of the lines. | `0.65` |\n| `line_type` | str | The `line_type` parameter is used to specify the type of line to be used in the time series plot. | `'solid'` |\n| `line_alpha` | float | The `line_alpha` parameter controls the transparency of the lines in the time series plot. It accepts a value between 0 and 1, where 0 means completely transparent (invisible) and 1 means completely opaque (solid). | `1.0` |\n| `y_intercept` | float | The `y_intercept` parameter is used to add a horizontal line to the plot at a specific y-value. It can be set to a numeric value to specify the y-value of the intercept. If set to `None` (default), no y-intercept line will be added to the plot. | `None` |\n| `y_intercept_color` | str | The `y_intercept_color` parameter is used to specify the color of the y-intercept line in the plot. It accepts a string value representing a color code or name. The default value is \"#2c3e50\", which corresponds to a dark blue color. You can change this value. | `'#2c3e50'` |\n| `x_intercept` | str | The `x_intercept` parameter is used to add a vertical line at a specific x-axis value on the plot. It is used to highlight a specific point or event in the time series data. - By default, it is set to `None`, which means no vertical line will be added. - You can use a date string to specify the x-axis value of the intercept. For example, \"2020-01-01\" would add a vertical line at the beginning of the year 2020. | `None` |\n| `x_intercept_color` | str | The `x_intercept_color` parameter is used to specify the color of the vertical line that represents the x-intercept in the plot. By default, it is set to \"#2c3e50\", which is a dark blue color. You can change this value to any valid color code. | `'#2c3e50'` |\n| `smooth` | bool | The `smooth` parameter is a boolean indicating whether or not to apply smoothing to the time series data. If set to True, the time series will be smoothed using the lowess algorithm. The default value is True. | `True` |\n| `smooth_color` | str | The `smooth_color` parameter is used to specify the color of the smoothed line in the time series plot. It accepts a string value representing a color code or name. The default value is `#3366FF`, which corresponds to a shade of blue. You can change this value to any valid color code. | `'#3366FF'` |\n| `smooth_frac` | float | The `smooth_frac` parameter is used to control the fraction of data points used for smoothing the time series. It determines the degree of smoothing applied to the data. A smaller value of `smooth_frac` will result in more smoothing, while a larger value will result in less smoothing. The default value is 0.2. | `0.2` |\n| `smooth_size` | float | The `smooth_size` parameter is used to specify the size of the line used to plot the smoothed values in the time series plot. It is a numeric value that controls the thickness of the line. A larger value will result in a thicker line, while a smaller value will result in a thinner line. | `1.0` |\n| `smooth_alpha` | float | The `smooth_alpha` parameter controls the transparency of the smoothed line in the plot. It accepts a value between 0 and 1, where 0 means completely transparent and 1 means completely opaque. | `1.0` |\n| `legend_show` | bool | The `legend_show` parameter is a boolean indicating whether or not to show the legend in the plot. If set to True, the legend will be displayed. The default value is True. | `True` |\n| `title` | str | The title of the plot. | `'Time Series Plot'` |\n| `x_lab` | str | The `x_lab` parameter is used to specify the label for the x-axis in the plot. It is a string that represents the label text. | `''` |\n| `y_lab` | str | The `y_lab` parameter is used to specify the label for the y-axis in the plot. It is a string that represents the label for the y-axis. | `''` |\n| `color_lab` | str | The `color_lab` parameter is used to specify the label for the legend or color scale in the plot. It is used to provide a description of the colors used in the plot, typically when a color column is specified. | `'Legend'` |\n| `x_axis_date_labels` | str | The `x_axis_date_labels` parameter is used to specify the format of the date labels on the x-axis of the plot. It accepts a string representing the format of the date labels. For example, \"%b %Y\" would display the month abbreviation and year (e.g., Jan 2020). | `'%b %Y'` |\n| `base_size` | float | The `base_size` parameter is used to set the base font size for the plot. It determines the size of the text elements such as axis labels, titles, and legends. | `11` |\n| `width` | int | The `width` parameter is used to specify the width of the plot. It determines the horizontal size of the plot in pixels. | `None` |\n| `height` | int | The `height` parameter is used to specify the height of the plot in pixels. It determines the vertical size of the plot when it is rendered. | `None` |\n| `engine` | str | The `engine` parameter specifies the plotting library to use for creating the time series plot. It can take one of the following values: - \"plotly\" (interactive): Use the plotly library to create the plot. This is the default value. - \"plotnine\" (static): Use the plotnine library to create the plot. - \"matplotlib\" (static): Use the matplotlib library to create the plot. | `'plotly'` |\n\n## Returns\n\n| Type | Description |\n|------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| The function `plot_timeseries` returns a plot object, depending on the specified `engine` parameter. 
| - If `engine` is set to 'plotnine' or 'matplotlib', the function returns a plot object that can be further customized or displayed. - If `engine` is set to 'plotly', the function returns a plotly figure object. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\n\ndf = tk.load_dataset('m4_monthly', parse_dates = ['date'])\n\n# Plotly Object: Single Time Series\nfig = (\n df\n .query('id == \"M750\"')\n .plot_timeseries(\n 'date', 'value', \n facet_ncol = 1,\n x_axis_date_labels = \"%Y\",\n engine = 'plotly',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Plotly Object: Grouped Time Series\nfig = (\n df\n .groupby('id')\n .plot_timeseries(\n 'date', 'value', \n facet_ncol = 2,\n facet_scales = \"free_y\",\n smooth_frac = 0.2,\n smooth_size = 2.0,\n y_intercept = None,\n x_axis_date_labels = \"%Y\",\n engine = 'plotly',\n width = 600,\n height = 500,\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Plotly Object: Color Column\nfig = (\n df\n .plot_timeseries(\n 'date', 'value', \n color_column = 'id',\n smooth = False,\n y_intercept = 0,\n x_axis_date_labels = \"%Y\",\n engine = 'plotly',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Plotnine Object: Single Time Series\nfig = (\n df\n .query('id == \"M1\"')\n .plot_timeseries(\n 'date', 'value', \n x_axis_date_labels = \"%Y\",\n engine = 'plotnine'\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n![](plot_timeseries_files/figure-html/cell-5-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Plotnine Object: Grouped Time Series\nfig = (\n df\n .groupby('id')\n .plot_timeseries(\n 'date', 'value',\n facet_ncol = 2,\n facet_scales = \"free\",\n line_size = 0.35,\n x_axis_date_labels = \"%Y\",\n engine = 'plotnine'\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n![](plot_timeseries_files/figure-html/cell-6-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=6}\n``` {.python .cell-code}\n# Plotnine Object: Color Column\nfig = (\n df\n .plot_timeseries(\n 'date', 'value', \n color_column = 'id',\n smooth = False,\n y_intercept = 0,\n x_axis_date_labels = \"%Y\",\n engine = 'plotnine',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display}\n![](plot_timeseries_files/figure-html/cell-7-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\n
\n```\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.python .cell-code}\n# Matplotlib object (same as plotnine, but converted to matplotlib object)\nfig = (\n df\n .groupby('id')\n .plot_timeseries(\n 'date', 'value', \n color_column = 'id',\n facet_ncol = 2,\n x_axis_date_labels = \"%Y\",\n engine = 'matplotlib',\n )\n)\nfig\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](plot_timeseries_files/figure-html/cell-8-output-1.png){}\n:::\n:::\n\n\n", "supporting": [ - "plot_timeseries_files\\figure-html" + "plot_timeseries_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/plot_timeseries/figure-html/cell-5-output-1.png b/docs/_freeze/reference/plot_timeseries/figure-html/cell-5-output-1.png index ede86673..c356eab5 100644 Binary files a/docs/_freeze/reference/plot_timeseries/figure-html/cell-5-output-1.png and b/docs/_freeze/reference/plot_timeseries/figure-html/cell-5-output-1.png differ diff --git a/docs/_freeze/reference/plot_timeseries/figure-html/cell-6-output-1.png b/docs/_freeze/reference/plot_timeseries/figure-html/cell-6-output-1.png index ba0c4456..878fd69d 100644 Binary files a/docs/_freeze/reference/plot_timeseries/figure-html/cell-6-output-1.png and b/docs/_freeze/reference/plot_timeseries/figure-html/cell-6-output-1.png differ diff --git a/docs/_freeze/reference/plot_timeseries/figure-html/cell-7-output-1.png b/docs/_freeze/reference/plot_timeseries/figure-html/cell-7-output-1.png index 0fd55823..f60cb27f 100644 Binary files a/docs/_freeze/reference/plot_timeseries/figure-html/cell-7-output-1.png and b/docs/_freeze/reference/plot_timeseries/figure-html/cell-7-output-1.png differ diff --git a/docs/_freeze/reference/summarize_by_time/execute-results/html.json b/docs/_freeze/reference/summarize_by_time/execute-results/html.json index 4b55a884..00380a6b 100644 --- a/docs/_freeze/reference/summarize_by_time/execute-results/html.json +++ b/docs/_freeze/reference/summarize_by_time/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "e27d9dfc1ff561a0a133a2acd7ba2f28", + "hash": "c863eb2d07997047164e879f4ba6f70e", "result": { "markdown": "---\ntitle: summarize_by_time\n---\n\n\n\n`summarize_by_time(data, date_column, value_column, freq='D', agg_func='sum', kind='timestamp', wide_format=False, fillna=0, *args, **kwargs)`\n\nSummarize a DataFrame or GroupBy object by time.\n\nThe `summarize_by_time` function aggregates data by a specified time period and one or more numeric columns, allowing for grouping and customization of the time-based aggregation.\n\n## Parameters\n\n| Name | Type | Description | Default 
|\n|----------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `data` | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | A pandas DataFrame or a pandas GroupBy object. This is the data that you want to summarize by time. | _required_ |\n| `date_column` | str | The name of the column in the data frame that contains the dates or timestamps to be aggregated by. This column must be of type datetime64. | _required_ |\n| `value_column` | str or list | The `value_column` parameter is the name of one or more columns in the DataFrame that you want to aggregate by. It can be either a string representing a single column name, or a list of strings representing multiple column names. | _required_ |\n| `freq` | str | The `freq` parameter specifies the frequency at which the data should be aggregated. It accepts a string representing a pandas frequency offset, such as \"D\" for daily or \"MS\" for month start. The default value is \"D\", which means the data will be aggregated on a daily basis. Some common frequency aliases include: - S: secondly frequency - min: minute frequency - H: hourly frequency - D: daily frequency - W: weekly frequency - M: month end frequency - MS: month start frequency - Q: quarter end frequency - QS: quarter start frequency - Y: year end frequency - YS: year start frequency | `'D'` |\n| `agg_func` | list | The `agg_func` parameter is used to specify one or more aggregating functions to apply to the value column(s) during the summarization process. It can be a single function or a list of functions. The default value is `\"sum\"`, which represents the sum function. Some common aggregating functions include: - \"sum\": Sum of values - \"mean\": Mean of values - \"median\": Median of values - \"min\": Minimum of values - \"max\": Maximum of values - \"std\": Standard deviation of values - \"var\": Variance of values - \"first\": First value in group - \"last\": Last value in group - \"count\": Count of values - \"nunique\": Number of unique values - \"corr\": Correlation between values Custom `lambda` aggregating functions can be used too. 
Here are several common examples: - (\"q25\", lambda x: x.quantile(0.25)): 25th percentile of values - (\"q75\", lambda x: x.quantile(0.75)): 75th percentile of values - (\"iqr\", lambda x: x.quantile(0.75) - x.quantile(0.25)): Interquartile range of values - (\"range\", lambda x: x.max() - x.min()): Range of values | `'sum'` |\n| `wide_format` | bool | A boolean parameter that determines whether the output should be in \"wide\" or \"long\" format. If set to `True`, the output will be in wide format, where each group is represented by a separate column. If set to False, the output will be in long format, where each group is represented by a separate row. The default value is `False`. | `False` |\n| `fillna` | int | The `fillna` parameter is used to specify the value to fill missing data with. By default, it is set to 0. If you want to keep missing values as NaN, you can use `np.nan` as the value for `fillna`. | `0` |\n\n## Returns\n\n| Type | Description |\n|--------------|------------------------------------------------|\n| pd.DataFrame | A Pandas DataFrame that is summarized by time. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n\ndf = tk.load_dataset('bike_sales_sample', parse_dates = ['order_date'])\n\ndf\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
| | order_id | order_line | order_date | quantity | price | total_price | model | category_1 | category_2 | frame_material | bikeshop_name | city | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2011-01-07 | 1 | 6070 | 6070 | Jekyll Carbon 2 | Mountain | Over Mountain | Carbon | Ithaca Mountain Climbers | Ithaca | NY |
| 1 | 1 | 2 | 2011-01-07 | 1 | 5970 | 5970 | Trigger Carbon 2 | Mountain | Over Mountain | Carbon | Ithaca Mountain Climbers | Ithaca | NY |
| 2 | 2 | 1 | 2011-01-10 | 1 | 2770 | 2770 | Beast of the East 1 | Mountain | Trail | Aluminum | Kansas City 29ers | Kansas City | KS |
| 3 | 2 | 2 | 2011-01-10 | 1 | 5970 | 5970 | Trigger Carbon 2 | Mountain | Over Mountain | Carbon | Kansas City 29ers | Kansas City | KS |
| 4 | 3 | 1 | 2011-01-10 | 1 | 10660 | 10660 | Supersix Evo Hi-Mod Team | Road | Elite Road | Carbon | Louisville Race Equipment | Louisville | KY |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2461 | 321 | 3 | 2011-12-22 | 1 | 1410 | 1410 | CAAD8 105 | Road | Elite Road | Aluminum | Miami Race Equipment | Miami | FL |
| 2462 | 322 | 1 | 2011-12-28 | 1 | 1250 | 1250 | Synapse Disc Tiagra | Road | Endurance Road | Aluminum | Phoenix Bi-peds | Phoenix | AZ |
| 2463 | 322 | 2 | 2011-12-28 | 1 | 2660 | 2660 | Bad Habit 2 | Mountain | Trail | Aluminum | Phoenix Bi-peds | Phoenix | AZ |
| 2464 | 322 | 3 | 2011-12-28 | 1 | 2340 | 2340 | F-Si 1 | Mountain | Cross Country Race | Aluminum | Phoenix Bi-peds | Phoenix | AZ |
| 2465 | 322 | 4 | 2011-12-28 | 1 | 5860 | 5860 | Synapse Hi-Mod Dura Ace | Road | Endurance Road | Carbon | Phoenix Bi-peds | Phoenix | AZ |

2466 rows × 13 columns
\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Summarize by time with a DataFrame object\n( \n df \n .summarize_by_time(\n date_column = 'order_date', \n value_column = 'total_price',\n freq = \"MS\",\n agg_func = ['mean', 'sum']\n )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
| | order_date | total_price_mean | total_price_sum |
|---|---|---|---|
| 0 | 2011-01-01 | 4600.142857 | 483015 |
| 1 | 2011-02-01 | 4611.408730 | 1162075 |
| 2 | 2011-03-01 | 5196.653543 | 659975 |
| 3 | 2011-04-01 | 4533.846154 | 1827140 |
| 4 | 2011-05-01 | 4097.912621 | 844170 |
| 5 | 2011-06-01 | 4544.839228 | 1413445 |
| 6 | 2011-07-01 | 4976.791667 | 1194430 |
| 7 | 2011-08-01 | 4961.970803 | 679790 |
| 8 | 2011-09-01 | 4682.298851 | 814720 |
| 9 | 2011-10-01 | 3930.053476 | 734920 |
| 10 | 2011-11-01 | 4768.175355 | 1006085 |
| 11 | 2011-12-01 | 4186.902655 | 473120 |
\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Summarize by time with a GroupBy object (Long Format)\n(\n df \n .groupby('category_1') \n .summarize_by_time(\n date_column = 'order_date', \n value_column = 'total_price', \n freq = 'MS',\n agg_func = 'sum',\n wide_format = False, \n )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
| | category_1 | order_date | total_price |
|---|---|---|---|
| 0 | Mountain | 2011-01-01 | 221490 |
| 1 | Mountain | 2011-02-01 | 660555 |
| 2 | Mountain | 2011-03-01 | 358855 |
| 3 | Mountain | 2011-04-01 | 1075975 |
| 4 | Mountain | 2011-05-01 | 450440 |
| 5 | Mountain | 2011-06-01 | 723040 |
| 6 | Mountain | 2011-07-01 | 767740 |
| 7 | Mountain | 2011-08-01 | 361255 |
| 8 | Mountain | 2011-09-01 | 401125 |
| 9 | Mountain | 2011-10-01 | 377335 |
| 10 | Mountain | 2011-11-01 | 549345 |
| 11 | Mountain | 2011-12-01 | 276055 |
| 12 | Road | 2011-01-01 | 261525 |
| 13 | Road | 2011-02-01 | 501520 |
| 14 | Road | 2011-03-01 | 301120 |
| 15 | Road | 2011-04-01 | 751165 |
| 16 | Road | 2011-05-01 | 393730 |
| 17 | Road | 2011-06-01 | 690405 |
| 18 | Road | 2011-07-01 | 426690 |
| 19 | Road | 2011-08-01 | 318535 |
| 20 | Road | 2011-09-01 | 413595 |
| 21 | Road | 2011-10-01 | 357585 |
| 22 | Road | 2011-11-01 | 456740 |
| 23 | Road | 2011-12-01 | 197065 |
\n```\n:::\n:::\n\n\n::: {.cell execution_count=4}\n``` {.python .cell-code}\n# Summarize by time with a GroupBy object (Wide Format)\n(\n df \n .groupby('category_1') \n .summarize_by_time(\n date_column = 'order_date', \n value_column = 'total_price', \n freq = 'MS',\n agg_func = 'sum',\n wide_format = True, \n )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
| | order_date | total_price_Mountain | total_price_Road |
|---|---|---|---|
| 0 | 2011-01-01 | 221490 | 261525 |
| 1 | 2011-02-01 | 660555 | 501520 |
| 2 | 2011-03-01 | 358855 | 301120 |
| 3 | 2011-04-01 | 1075975 | 751165 |
| 4 | 2011-05-01 | 450440 | 393730 |
| 5 | 2011-06-01 | 723040 | 690405 |
| 6 | 2011-07-01 | 767740 | 426690 |
| 7 | 2011-08-01 | 361255 | 318535 |
| 8 | 2011-09-01 | 401125 | 413595 |
| 9 | 2011-10-01 | 377335 | 357585 |
| 10 | 2011-11-01 | 549345 | 456740 |
| 11 | 2011-12-01 | 276055 | 197065 |
\n```\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.python .cell-code}\n# Summarize by time with a GroupBy object and multiple summaries (Wide Format)\n(\n df \n .groupby('category_1') \n .summarize_by_time(\n date_column = 'order_date', \n value_column = 'total_price', \n freq = 'MS',\n agg_func = ['sum', 'mean', ('q25', lambda x: x.quantile(0.25)), ('q75', lambda x: x.quantile(0.75))],\n wide_format = True, \n )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
| | order_date | total_price_sum_Mountain | total_price_sum_Road | total_price_mean_Mountain | total_price_mean_Road | total_price_q25_Mountain | total_price_q25_Road | total_price_q75_Mountain | total_price_q75_Road |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 | 221490 | 261525 | 4922.000000 | 4358.750000 | 2060.0 | 1950.0 | 6070.0 | 5605.0 |
| 1 | 2011-02-01 | 660555 | 501520 | 4374.536424 | 4965.544554 | 2060.0 | 1950.0 | 5330.0 | 5860.0 |
| 2 | 2011-03-01 | 358855 | 301120 | 5882.868852 | 4562.424242 | 2130.0 | 2240.0 | 6390.0 | 5875.0 |
| 3 | 2011-04-01 | 1075975 | 751165 | 4890.795455 | 4104.726776 | 2060.0 | 1950.0 | 5970.0 | 4800.0 |
| 4 | 2011-05-01 | 450440 | 393730 | 4549.898990 | 3679.719626 | 2010.0 | 1570.0 | 6020.0 | 3500.0 |
| 5 | 2011-06-01 | 723040 | 690405 | 5021.111111 | 4134.161677 | 1950.0 | 1840.0 | 5647.5 | 4500.0 |
| 6 | 2011-07-01 | 767740 | 426690 | 5444.964539 | 4310.000000 | 2130.0 | 1895.0 | 6400.0 | 5330.0 |
| 7 | 2011-08-01 | 361255 | 318535 | 5734.206349 | 4304.527027 | 2235.0 | 1950.0 | 6400.0 | 4987.5 |
| 8 | 2011-09-01 | 401125 | 413595 | 5077.531646 | 4353.631579 | 1620.0 | 1950.0 | 6390.0 | 5330.0 |
| 9 | 2011-10-01 | 377335 | 357585 | 4439.235294 | 3505.735294 | 2160.0 | 1750.0 | 6070.0 | 4260.0 |
| 10 | 2011-11-01 | 549345 | 456740 | 5282.163462 | 4268.598131 | 2340.0 | 1950.0 | 7460.0 | 4370.0 |
| 11 | 2011-12-01 | 276055 | 197065 | 5208.584906 | 3284.416667 | 2060.0 | 1652.5 | 6400.0 | 3200.0 |
\n```\n:::\n:::\n\n\n", "supporting": [ - "summarize_by_time_files\\figure-html" + "summarize_by_time_files/figure-html" ], "filters": [], "includes": { diff --git a/docs/_freeze/reference/week_of_month/execute-results/html.json b/docs/_freeze/reference/week_of_month/execute-results/html.json index c7f7940e..729ef6c2 100644 --- a/docs/_freeze/reference/week_of_month/execute-results/html.json +++ b/docs/_freeze/reference/week_of_month/execute-results/html.json @@ -1,9 +1,9 @@ { - "hash": "42585efd0e6141aff5be6652136479ec", + "hash": "c252d4a8d649b8d8cfe0811a7020ccec", "result": { "markdown": "---\ntitle: week_of_month\n---\n\n\n\n`week_of_month(idx)`\n\nThe \"week_of_month\" function calculates the week number of a given date within its month.\n\n## Parameters\n\n| Name | Type | Description | Default |\n|--------|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `idx` | pd.Series or pd.DatetimeIndex | The parameter \"idx\" is a pandas Series object that represents a specific date for which you want to determine the week of the month. | _required_ |\n\n## Returns\n\n| Type | Description |\n|-----------|-----------------------------------------|\n| pd.Series | The week of the month for a given date. |\n\n## Examples\n\n\n::: {.cell execution_count=1}\n``` {.python .cell-code}\nimport pytimetk as tk\nimport pandas as pd\n\ndates = pd.date_range(\"2020-01-01\", \"2020-02-28\", freq=\"1D\")\ndates\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```\nDatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',\n '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',\n '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',\n '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',\n '2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20',\n '2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24',\n '2020-01-25', '2020-01-26', '2020-01-27', '2020-01-28',\n '2020-01-29', '2020-01-30', '2020-01-31', '2020-02-01',\n '2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05',\n '2020-02-06', '2020-02-07', '2020-02-08', '2020-02-09',\n '2020-02-10', '2020-02-11', '2020-02-12', '2020-02-13',\n '2020-02-14', '2020-02-15', '2020-02-16', '2020-02-17',\n '2020-02-18', '2020-02-19', '2020-02-20', '2020-02-21',\n '2020-02-22', '2020-02-23', '2020-02-24', '2020-02-25',\n '2020-02-26', '2020-02-27', '2020-02-28'],\n dtype='datetime64[ns]', freq='D')\n```\n:::\n:::\n\n\n::: {.cell execution_count=2}\n``` {.python .cell-code}\n# Works on DateTimeIndex\ntk.week_of_month(dates)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```\n0 1\n1 1\n2 1\n3 1\n4 1\n5 1\n6 1\n7 2\n8 2\n9 2\n10 2\n11 2\n12 2\n13 2\n14 3\n15 3\n16 3\n17 3\n18 3\n19 3\n20 3\n21 4\n22 4\n23 4\n24 4\n25 4\n26 4\n27 4\n28 5\n29 5\n30 5\n31 1\n32 1\n33 1\n34 1\n35 1\n36 1\n37 1\n38 2\n39 2\n40 2\n41 2\n42 2\n43 2\n44 2\n45 3\n46 3\n47 3\n48 3\n49 3\n50 3\n51 3\n52 4\n53 4\n54 4\n55 4\n56 4\n57 4\n58 4\nName: week_of_month, dtype: int32\n```\n:::\n:::\n\n\n::: {.cell execution_count=3}\n``` {.python .cell-code}\n# Works on Pandas Series\ndates.to_series().week_of_month()\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\n2020-01-01 1\n2020-01-02 1\n2020-01-03 1\n2020-01-04 1\n2020-01-05 1\n2020-01-06 1\n2020-01-07 1\n2020-01-08 2\n2020-01-09 2\n2020-01-10 2\n2020-01-11 2\n2020-01-12 2\n2020-01-13 2\n2020-01-14 2\n2020-01-15 3\n2020-01-16 3\n2020-01-17 
3\n2020-01-18 3\n2020-01-19 3\n2020-01-20 3\n2020-01-21 3\n2020-01-22 4\n2020-01-23 4\n2020-01-24 4\n2020-01-25 4\n2020-01-26 4\n2020-01-27 4\n2020-01-28 4\n2020-01-29 5\n2020-01-30 5\n2020-01-31 5\n2020-02-01 1\n2020-02-02 1\n2020-02-03 1\n2020-02-04 1\n2020-02-05 1\n2020-02-06 1\n2020-02-07 1\n2020-02-08 2\n2020-02-09 2\n2020-02-10 2\n2020-02-11 2\n2020-02-12 2\n2020-02-13 2\n2020-02-14 2\n2020-02-15 3\n2020-02-16 3\n2020-02-17 3\n2020-02-18 3\n2020-02-19 3\n2020-02-20 3\n2020-02-21 3\n2020-02-22 4\n2020-02-23 4\n2020-02-24 4\n2020-02-25 4\n2020-02-26 4\n2020-02-27 4\n2020-02-28 4\nFreq: D, Name: week_of_month, dtype: int32\n```\n:::\n:::\n\n\n", "supporting": [ - "week_of_month_files\\figure-html" + "week_of_month_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/_quarto.yml b/docs/_quarto.yml index 95f9e3d1..c7b69e67 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -163,7 +163,7 @@ quartodoc: - glimpse - flatten_multiindex_column_names - title: πŸ’Ύ 13 Datasets - package: pytimetk.datasets.get_datasets + package: pytimetk desc: Practice `pytimetk` with 13 complementary time series datasets. contents: - get_available_datasets diff --git a/docs/_sidebar.yml b/docs/_sidebar.yml index 9caf59b0..91828d8e 100644 --- a/docs/_sidebar.yml +++ b/docs/_sidebar.yml @@ -16,6 +16,7 @@ website: - reference/augment_lags.qmd - reference/augment_leads.qmd - reference/augment_rolling.qmd + - reference/augment_expanding.qmd section: "\U0001F3D7\uFE0F Adding Features to Time Series DataFrames (Augmenting)" - contents: - reference/ts_features.qmd @@ -28,7 +29,7 @@ website: - reference/get_date_summary.qmd - reference/get_frequency_summary.qmd - reference/get_diff_summary.qmd - - reference/get_pandas_frequency.qmd + - reference/get_frequency.qmd - reference/get_timeseries_signature.qmd - reference/get_holiday_signature.qmd section: "\U0001F43C Time Series for Pandas Series" @@ -49,6 +50,6 @@ website: - contents: - reference/get_available_datasets.qmd - reference/load_dataset.qmd - section: "\U0001F4BE 12 Datasets" + section: "\U0001F4BE 13 Datasets" id: reference - id: dummy-sidebar diff --git a/docs/_site/getting-started/02_quick_start.html b/docs/_site/getting-started/02_quick_start.html index fc9dc565..194aedf1 100644 --- a/docs/_site/getting-started/02_quick_start.html +++ b/docs/_site/getting-started/02_quick_start.html @@ -718,9 +718,9 @@

)

augment_expanding


augment_expanding(data, date_column, value_column, use_independent_variables=False, window_func='mean', min_periods=None, **kwargs)

+

Apply one or more expanding functions to one or more columns of a DataFrame.

+
Parameters
+----------
+data : Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy]
+    The `data` parameter is the input DataFrame or GroupBy object that contains the data to be processed. It can be either a Pandas DataFrame or a GroupBy object.
+date_column : str
+    The `date_column` parameter is the name of the datetime column in the DataFrame by which the data should be sorted within each group.
+value_column : Union[str, list]
+    The `value_column` parameter is the name of the column(s) in the DataFrame to which the expanding window function(s) should be applied. It can be a single column name or a list of column names.
+use_independent_variables : bool
+    The `use_independent_variables` parameter is an optional parameter that specifies whether the expanding function(s) require independent variables, such as expanding correlation or expanding regression. (See Examples below.)
+window_func : Union[str, list, Tuple[str, Callable]], optional
+    The `window_func` parameter in the `augment_expanding` function is used to specify the function(s) to be applied to the expanding windows. 
+    
+    1. It can be a string or a list of strings, where each string represents the name of the function to be applied. 
+    
+    2. Alternatively, it can be a list of tuples, where each tuple contains the name of the function to be applied and the function itself. The function is applied as a Pandas Series. (See Examples below.)
+    
+    3. If the function requires independent variables, the `use_independent_variables` parameter must be specified. The independent variables will be passed to the function as a DataFrame containing the window of rows. (See Examples below.)
+    
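+    A compact sketch of the three forms (the tuple and independent-variable variants are worked out in the Examples below):
+    
+    ``` {.python .cell-code}
+    # 1. String name(s) of built-in aggregations
+    window_func = 'mean'
+    window_func = ['mean', 'std']
+    
+    # 2. (name, function) tuples; the function receives each expanding
+    #    window as a Pandas Series
+    window_func = [('std', lambda x: x.std())]
+    
+    # 3. With use_independent_variables=True, the function receives the
+    #    window of rows as a DataFrame (e.g. an expanding correlation)
+    window_func = [('corr', lambda df: df['value1'].corr(df['value2']))]
+    ```
+    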
+Returns
+-------
+pd.DataFrame
+    The `augment_expanding` function returns a DataFrame with one new column for each applied function and value column, named `{value_column}_expanding_{function_name}` (e.g. `value_expanding_mean`).
+
+Examples
+--------
+
+
+
+::: {.cell execution_count=1}
+``` {.python .cell-code}
+import pytimetk as tk
+import pandas as pd
+import numpy as np
+
+df = tk.load_dataset("m4_daily", parse_dates = ['date'])
+```
+:::
+
+
+
+::: {.cell execution_count=2}
+``` {.python .cell-code}
+# String Function Name and Series Lambda Function (no independent variables)
+expanded_df = (
+    df
+        .groupby('id')
+        .augment_expanding(
+            date_column = 'date', 
+            value_column = 'value',  
+            window_func = ['mean', ('std', lambda x: x.std())]
+        )
+)
+expanded_df
+```
+
+::: {.cell-output .cell-output-display execution_count=2}
+
+```{=html}
+<div>
+<style scoped>
+    .dataframe tbody tr th:only-of-type {
+        vertical-align: middle;
+    }
+
+    .dataframe tbody tr th {
+        vertical-align: top;
+    }
+
+    .dataframe thead th {
+        text-align: right;
+    }
+</style>
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>id</th>
+      <th>date</th>
+      <th>value</th>
+      <th>value_expanding_mean</th>
+      <th>value_expanding_std</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>D10</td>
+      <td>2014-07-03</td>
+      <td>2076.2</td>
+      <td>2076.200000</td>
+      <td>0.000000</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>D10</td>
+      <td>2014-07-04</td>
+      <td>2073.4</td>
+      <td>2074.800000</td>
+      <td>1.400000</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>D10</td>
+      <td>2014-07-05</td>
+      <td>2048.7</td>
+      <td>2066.100000</td>
+      <td>12.356645</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>D10</td>
+      <td>2014-07-06</td>
+      <td>2048.9</td>
+      <td>2061.800000</td>
+      <td>13.037830</td>
+    </tr>
+    <tr>
+      <th>4</th>
+      <td>D10</td>
+      <td>2014-07-07</td>
+      <td>2006.4</td>
+      <td>2050.720000</td>
+      <td>25.041038</td>
+    </tr>
+    <tr>
+      <th>...</th>
+      <td>...</td>
+      <td>...</td>
+      <td>...</td>
+      <td>...</td>
+      <td>...</td>
+    </tr>
+    <tr>
+      <th>9738</th>
+      <td>D500</td>
+      <td>2012-09-19</td>
+      <td>9418.8</td>
+      <td>8286.606679</td>
+      <td>2456.667418</td>
+    </tr>
+    <tr>
+      <th>9739</th>
+      <td>D500</td>
+      <td>2012-09-20</td>
+      <td>9365.7</td>
+      <td>8286.864035</td>
+      <td>2456.430967</td>
+    </tr>
+    <tr>
+      <th>9740</th>
+      <td>D500</td>
+      <td>2012-09-21</td>
+      <td>9445.9</td>
+      <td>8287.140391</td>
+      <td>2456.203287</td>
+    </tr>
+    <tr>
+      <th>9741</th>
+      <td>D500</td>
+      <td>2012-09-22</td>
+      <td>9497.9</td>
+      <td>8287.429011</td>
+      <td>2455.981643</td>
+    </tr>
+    <tr>
+      <th>9742</th>
+      <td>D500</td>
+      <td>2012-09-23</td>
+      <td>9545.3</td>
+      <td>8287.728789</td>
+      <td>2455.765726</td>
+    </tr>
+  </tbody>
+</table>
+<p>9743 rows × 5 columns</p>
+</div>
+```
+
+:::
+:::
+
+
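+Since `data` may be a plain DataFrame or a GroupBy object, the same call also works without the `groupby`, and the functional form is equivalent to the method form. A minimal sketch (not executed here):
+
+``` {.python .cell-code}
+tk.augment_expanding(df, date_column='date', value_column='value', window_func='mean')
+```
+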
+
+::: {.cell execution_count=3}
+``` {.python .cell-code}
+# Expanding Correlation: Uses independent variables (value2)
+
+df = pd.DataFrame({
+    'id': [1, 1, 1, 2, 2, 2],
+    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
+    'value1': [10, 20, 29, 42, 53, 59],
+    'value2': [2, 16, 20, 40, 41, 50],
+})
+
+result_df = (
+    df.groupby('id')
+    .augment_expanding(
+        date_column='date',
+        value_column='value1',
+        use_independent_variables=True,
+        window_func=[('corr', lambda df: df['value1'].corr(df['value2']))],
+    )
+)
+result_df
+```
+
+::: {.cell-output .cell-output-display execution_count=3}
+
+```{=html}
+<div>
+<style scoped>
+    .dataframe tbody tr th:only-of-type {
+        vertical-align: middle;
+    }
+
+    .dataframe tbody tr th {
+        vertical-align: top;
+    }
+
+    .dataframe thead th {
+        text-align: right;
+    }
+</style>
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>id</th>
+      <th>date</th>
+      <th>value1</th>
+      <th>value2</th>
+      <th>value1_expanding_corr</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>1</td>
+      <td>2023-01-01</td>
+      <td>10</td>
+      <td>2</td>
+      <td>NaN</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>1</td>
+      <td>2023-01-02</td>
+      <td>20</td>
+      <td>16</td>
+      <td>1.000000</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>1</td>
+      <td>2023-01-03</td>
+      <td>29</td>
+      <td>20</td>
+      <td>0.961054</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>2</td>
+      <td>2023-01-04</td>
+      <td>42</td>
+      <td>40</td>
+      <td>NaN</td>
+    </tr>
+    <tr>
+      <th>4</th>
+      <td>2</td>
+      <td>2023-01-05</td>
+      <td>53</td>
+      <td>41</td>
+      <td>1.000000</td>
+    </tr>
+    <tr>
+      <th>5</th>
+      <td>2</td>
+      <td>2023-01-06</td>
+      <td>59</td>
+      <td>50</td>
+      <td>0.824831</td>
+    </tr>
+  </tbody>
+</table>
+</div>
+```
+
+:::
+:::
+
+
+``` {.python .cell-code}
+# Expanding Regression: Using independent variables (value2 and value3)
+
+# Requires: scikit-learn
+from sklearn.linear_model import LinearRegression
+
+df = pd.DataFrame({
+    'id': [1, 1, 1, 2, 2, 2],
+    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
+    'value1': [10, 20, 29, 42, 53, 59],
+    'value2': [5, 16, 24, 35, 45, 58],
+    'value3': [2, 3, 6, 9, 10, 13]
+})
+
+# Define the regression function: fit value1 on value2 and value3 over
+# each expanding window of rows
+def regression(df):
+    model = LinearRegression()
+    X = df[['value2', 'value3']]  # Independent variables
+    y = df['value1']              # Dependent variable
+    model.fit(X, y)
+    # Return the intercept and both coefficients as a labeled Series
+    return pd.Series(
+        [model.intercept_, model.coef_[0], model.coef_[1]],
+        index=['Intercept', 'Slope_value2', 'Slope_value3'],
+    )
+
+# Example call
+result_df = (
+    df.groupby('id')
+    .augment_expanding(
+        date_column='date',
+        value_column='value1',
+        use_independent_variables=True,
+        window_func=[('regression', regression)]
+    )
+    .dropna()
+)
+result_df
+
+# Display results in wide format, since the function returns multiple values
+regression_wide_df = pd.concat(result_df['value1_expanding_regression'].to_list(), axis=1).T
+
+regression_wide_df = pd.concat([result_df.reset_index(drop=True), regression_wide_df], axis=1)
+
+regression_wide_df
+```

+ + + + \ No newline at end of file diff --git a/docs/_site/reference/get_available_datasets.html b/docs/_site/reference/get_available_datasets.html index 7379059d..b0370464 100644 --- a/docs/_site/reference/get_available_datasets.html +++ b/docs/_site/reference/get_available_datasets.html @@ -337,7 +337,7 @@

get_available_datasets

-

datasets.get_datasets.get_available_datasets()

+

get_available_datasets()

Get a list of 13 datasets that can be loaded with pytimetk.load_dataset.

The get_available_datasets function returns a sorted list of available dataset names from the pytimetk.datasets module. The available datasets are:

@@ -367,9 +367,10 @@

Examples

import pytimetk as tk
 
 tk.get_available_datasets()
-
+
['bike_sales_sample',
  'bike_sharing_daily',
+ 'expedia',
  'm4_daily',
  'm4_hourly',
  'm4_monthly',
diff --git a/docs/_site/reference/get_frequency.html b/docs/_site/reference/get_frequency.html
new file mode 100644
index 00000000..51e37dd7
--- /dev/null
+++ b/docs/_site/reference/get_frequency.html
@@ -0,0 +1,706 @@
+pytimetk – get_frequency

get_frequency

+

get_frequency(idx, force_regular=False)

+

Get the frequency of a pandas Series or DatetimeIndex.

+

The function get_frequency first attempts to get a pandas inferred frequency. If the inferred frequency is None, it will attempt to calculate the frequency manually. If the frequency cannot be determined, the function will raise a ValueError.

+
+

Parameters

| Name | Type | Description | Default |
|--------|-------------------------------|-------------|------------|
| `idx` | pd.Series or pd.DatetimeIndex | The `idx` parameter can be either a `pd.Series` or a `pd.DatetimeIndex`. It represents the index or the time series data for which we want to determine the frequency. | _required_ |
| `force_regular` | bool | The `force_regular` parameter is a boolean flag that determines whether to force the frequency to be regular. If set to `True`, the function will convert irregular frequencies to their regular counterparts. For example, if the inferred frequency is 'B' (business days), it will be converted to 'D' (calendar days). The default value is `False`. | `False` |

Returns

| Type | Description |
|-----------|-----------------------------------------|
| str | The frequency of the given pandas series or datetime index. |
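
Examples

A minimal usage sketch; the 'D' and '1MS' results mirror the examples shown on the get_frequency_summary reference page:

import pytimetk as tk
import pandas as pd

# Regular daily index: the pandas inferred frequency is used directly
dates = pd.date_range(start='2020-01-01', end='2020-01-10', freq='D')
tk.get_frequency(dates)  # 'D'

# Only two month-start dates: pandas inference returns None, so the
# frequency is calculated manually from the median time difference
dates = pd.to_datetime(["2021-01-01", "2021-02-01"])
tk.get_frequency(dates)  # '1MS'

# force_regular=True maps irregular frequencies to regular counterparts
# (per the parameter description above, 'B' would be reported as 'D')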
+
+ +
+ +
+ + + + \ No newline at end of file diff --git a/docs/_site/reference/get_frequency_summary.html b/docs/_site/reference/get_frequency_summary.html index d99a99c6..7b53b08f 100644 --- a/docs/_site/reference/get_frequency_summary.html +++ b/docs/_site/reference/get_frequency_summary.html @@ -20,6 +20,40 @@ margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } +/* CSS for syntax highlighting */ +pre > code.sourceCode { white-space: pre; position: relative; } +pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } +pre > code.sourceCode > span:empty { height: 1.2em; } +.sourceCode { overflow: visible; } +code.sourceCode > span { color: inherit; text-decoration: inherit; } +div.sourceCode { margin: 1em 0; } +pre.sourceCode { margin: 0; } +@media screen { +div.sourceCode { overflow: auto; } +} +@media print { +pre > code.sourceCode { white-space: pre-wrap; } +pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +} +pre.numberSource code + { counter-reset: source-line 0; } +pre.numberSource code > span + { position: relative; left: -4em; counter-increment: source-line; } +pre.numberSource code > span > a:first-child::before + { content: counter(source-line); + position: relative; left: -1em; text-align: right; vertical-align: baseline; + border: none; display: inline-block; + -webkit-touch-callout: none; -webkit-user-select: none; + -khtml-user-select: none; -moz-user-select: none; + -ms-user-select: none; user-select: none; + padding: 0 4px; width: 4em; + } +pre.numberSource { margin-left: 3em; padding-left: 4px; } +div.sourceCode + { } +@media screen { +pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } +} @@ -100,7 +134,7 @@ - +
@@ -278,23 +312,34 @@

On this page

+
+
+

get_frequency_summary

+
-
-

get_frequency_summary


get_frequency_summary(idx)

-

Returns a summary including the inferred frequency, median time difference, scale, and unit.

+

More robust version of pandas inferred frequency.

Parameters

@@ -342,9 +387,35 @@

Returns

+
+
+

Examples

+
+
import pytimetk as tk
+import pandas as pd
+
+dates = pd.date_range(start = '2020-01-01', end = '2020-01-10', freq = 'D')
+
+tk.get_frequency(dates)
+
+
'D'
+
+
+
+
# pandas inferred frequency fails
+dates = pd.to_datetime(["2021-01-01", "2021-02-01"])
+
+# Returns None
+tk.get_pandas_frequency(dates)
+
+# Returns '1MS'
+tk.get_frequency(dates)
+
+
'1MS'
+
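+
+A sketch of calling get_frequency_summary itself (no output shown; it returns the one-row summary described above rather than a bare frequency string like tk.get_frequency):
+
+dates = pd.date_range(start = '2020-01-01', end = '2020-01-10', freq = 'D')
+
+tk.get_frequency_summary(dates)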
+
-
diff --git a/docs/_site/reference/index.html b/docs/_site/reference/index.html index 9c292cdc..6ab89c0c 100644 --- a/docs/_site/reference/index.html +++ b/docs/_site/reference/index.html @@ -288,7 +288,7 @@

On this page

  • πŸ› οΈ Date Utilities
  • πŸ› οΈ Visualization Utilities
  • Extra Pandas Helpers (That Help Beyond Just Time Series)
  • -
  • πŸ’Ύ 12 Datasets
  • +
  • πŸ’Ύ 13 Datasets
  • @@ -357,6 +357,10 @@

augment_rolling Apply one or more rolling functions and window sizes to one or more columns of a DataFrame. + +augment_expanding +Apply one or more expanding functions to one or more columns of a DataFrame. + 

🐼 Time Se get_frequency_summary -Returns a summary including the inferred frequency, median time difference, scale, and unit. +More robust version of pandas inferred frequency. get_diff_summary Calculates summary statistics of the time differences between consecutive values in a datetime index. -get_pandas_frequency +get_frequency Get the frequency of a pandas Series or DatetimeIndex. @@ -476,16 +480,16 @@

    -

💾 12 Datasets

    -

    Practice pytimetk with 12 complementary time series datasets.

    +

💾 13 Datasets

    +

    Practice pytimetk with 13 complementary time series datasets.

    - + - + diff --git a/docs/_site/reference/load_dataset.html b/docs/_site/reference/load_dataset.html index 4f1ba5b4..bd268cfd 100644 --- a/docs/_site/reference/load_dataset.html +++ b/docs/_site/reference/load_dataset.html @@ -341,7 +341,7 @@

    load_dataset

    -

    datasets.get_datasets.load_dataset(name='m4_daily', verbose=False, **kwargs)

    +

    load_dataset(name='m4_daily', verbose=False, **kwargs)

Load one of 13 Time Series Datasets.

    The load_dataset function is used to load various time series datasets by name, with options to print the available datasets and pass additional arguments to pandas.read_csv. The available datasets are:

      @@ -357,6 +357,7 @@

      load_dataset

    • walmart_sales_weekly: The Walmart sales weekly dataset
    • wikipedia_traffic_daily: The Wikipedia traffic daily dataset
    • stocks_daily: The MAANNG stocks dataset
    • +
    • expedia: Expedia Hotel Time Series Dataset

The datasets can be loaded with pytimetk.load_dataset(name), where name is the name of the dataset that you want to load. The default value is set to “m4_daily”, which is the M4 daily dataset. However, you can choose from a list of available datasets mentioned above.
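
A minimal sketch (the same call used in the augment_expanding examples; extra keyword arguments such as parse_dates are forwarded to pandas.read_csv):

import pytimetk as tk

df = tk.load_dataset('m4_daily', parse_dates=['date'])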

    diff --git a/docs/_site/reference/plot_timeseries.html b/docs/_site/reference/plot_timeseries.html index c3d2b479..776658e3 100644 --- a/docs/_site/reference/plot_timeseries.html +++ b/docs/_site/reference/plot_timeseries.html @@ -614,9 +614,9 @@

    Examples

    fig
    -
get_available_datasets Get a list of 13 datasets that can be loaded with pytimetk.load_dataset.
load_dataset Load one of 13 Time Series Datasets.