augment_expanding
+augment_expanding(data, date_column, value_column, use_independent_variables=False, window_func='mean', min_periods=None, **kwargs)
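Apply one or more expanding-window functions to one or more columns of a DataFrame. Each new value is computed from all rows between the start of the series (or group) and the current row, inclusive.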
+Parameters
+----------
+data : Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy]
+ The input data: a pandas DataFrame, or a `DataFrameGroupBy` object such as the result of `df.groupby('id')`. For a GroupBy object, the expanding functions are computed separately within each group.
+date_column : str
+ The `date_column` parameter is the name of the datetime column in the DataFrame by which the data should be sorted within each group.
+value_column : Union[str, list]
+ The `value_column` parameter is the name of the column(s) in the DataFrame to which the expanding window function(s) should be applied. It can be a single column name or a list of column names.
+use_independent_variables : bool, optional
+ Whether the expanding function(s) need columns other than `value_column` (independent variables), as in expanding correlation or expanding regression. Defaults to `False`. When `True`, each function receives a DataFrame with the rows of the current window instead of a single column. (See Examples below.)
+window_func : Union[str, list, Tuple[str, Callable]], optional
+ The function(s) to apply to each expanding window. Defaults to `'mean'`. Three forms are accepted:
+
+ 1. A string or a list of strings, each naming a function to apply (for example, `'mean'`).
+
+ 2. A list of `(name, function)` tuples. The name becomes part of the output column name, and the function receives each window of the value column as a pandas Series. Strings and tuples can be mixed in one list, as in `['mean', ('std', lambda x: x.std())]`. (See Examples below.)
+
+ 3. If the function requires independent variables, set `use_independent_variables=True`. The function then receives a DataFrame containing the rows of the current expanding window, so other columns (such as `value2` in the Examples) are available to it. (See Examples below.)
+
+Returns
+-------
+pd.DataFrame
+ A DataFrame containing the original columns plus one new column for each combination of value column and function, named `{value_column}_expanding_{function_name}`.
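For string function names, this behavior can be approximated in plain pandas. The sketch below is an illustrative analogue, not pytimetk's implementation: the helper `expanding_sketch` and the `sales` data are hypothetical. It sorts each group by date and appends one column per value column and function, following the naming seen in the Examples (`value_expanding_mean`, `value1_expanding_corr`).

```python
import pandas as pd


def expanding_sketch(df, group_column, date_column, value_columns, func_names):
    """Hypothetical plain-pandas analogue of augment_expanding for string
    function names such as 'mean' or 'max' (not pytimetk's implementation)."""
    if isinstance(value_columns, str):
        value_columns = [value_columns]
    if isinstance(func_names, str):
        func_names = [func_names]
    # Sort so that every group's rows are in date order before expanding.
    out = df.sort_values([group_column, date_column]).reset_index(drop=True)
    grouped = out.groupby(group_column)
    for col in value_columns:
        for name in func_names:
            # transform() keeps each result aligned with its original row;
            # name=name binds the current function name inside the lambda.
            out[f"{col}_expanding_{name}"] = grouped[col].transform(
                lambda s, name=name: getattr(s.expanding(), name)()
            )
    return out


# Hypothetical data with out-of-order dates inside group "A".
sales = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2023-01-03", "2023-01-01", "2023-01-02", "2023-01-02", "2023-01-01"]
    ),
    "sales": [30.0, 10.0, 20.0, 5.0, 1.0],
})

result = expanding_sketch(sales, "id", "date", "sales", ["mean", "max"])
print(result)
```

Because the rows are sorted by date first, the first row of each group is its earliest date, and each expanding value only uses that row and the ones before it.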
+
+Examples
+--------
+
+
+
+::: {.cell execution_count=1}
+``` {.python .cell-code}
+import pytimetk as tk
+import pandas as pd
+import numpy as np
+
+df = tk.load_dataset("m4_daily", parse_dates = ['date'])
+```
+:::
+
+
+
+::: {.cell execution_count=2}
+``` {.python .cell-code}
+# String Function Name and Series Lambda Function (no independent variables)
+expanded_df = (
+ df
+ .groupby('id')
+ .augment_expanding(
+ date_column = 'date',
+ value_column = 'value',
+ window_func = ['mean', ('std', lambda x: x.std())]
+ )
+)
+expanded_df
+```
+
+::: {.cell-output .cell-output-display execution_count=2}
+
+```{=html}
+<div>
+<style scoped>
+ .dataframe tbody tr th:only-of-type {
+ vertical-align: middle;
+ }
+
+ .dataframe tbody tr th {
+ vertical-align: top;
+ }
+
+ .dataframe thead th {
+ text-align: right;
+ }
+</style>
+<table border="1" class="dataframe">
+ <thead>
+ <tr style="text-align: right;">
+ <th></th>
+ <th>id</th>
+ <th>date</th>
+ <th>value</th>
+ <th>value_expanding_mean</th>
+ <th>value_expanding_std</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <th>0</th>
+ <td>D10</td>
+ <td>2014-07-03</td>
+ <td>2076.2</td>
+ <td>2076.200000</td>
+ <td>0.000000</td>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>D10</td>
+ <td>2014-07-04</td>
+ <td>2073.4</td>
+ <td>2074.800000</td>
+ <td>1.400000</td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>D10</td>
+ <td>2014-07-05</td>
+ <td>2048.7</td>
+ <td>2066.100000</td>
+ <td>12.356645</td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>D10</td>
+ <td>2014-07-06</td>
+ <td>2048.9</td>
+ <td>2061.800000</td>
+ <td>13.037830</td>
+ </tr>
+ <tr>
+ <th>4</th>
+ <td>D10</td>
+ <td>2014-07-07</td>
+ <td>2006.4</td>
+ <td>2050.720000</td>
+ <td>25.041038</td>
+ </tr>
+ <tr>
+ <th>...</th>
+ <td>...</td>
+ <td>...</td>
+ <td>...</td>
+ <td>...</td>
+ <td>...</td>
+ </tr>
+ <tr>
+ <th>9738</th>
+ <td>D500</td>
+ <td>2012-09-19</td>
+ <td>9418.8</td>
+ <td>8286.606679</td>
+ <td>2456.667418</td>
+ </tr>
+ <tr>
+ <th>9739</th>
+ <td>D500</td>
+ <td>2012-09-20</td>
+ <td>9365.7</td>
+ <td>8286.864035</td>
+ <td>2456.430967</td>
+ </tr>
+ <tr>
+ <th>9740</th>
+ <td>D500</td>
+ <td>2012-09-21</td>
+ <td>9445.9</td>
+ <td>8287.140391</td>
+ <td>2456.203287</td>
+ </tr>
+ <tr>
+ <th>9741</th>
+ <td>D500</td>
+ <td>2012-09-22</td>
+ <td>9497.9</td>
+ <td>8287.429011</td>
+ <td>2455.981643</td>
+ </tr>
+ <tr>
+ <th>9742</th>
+ <td>D500</td>
+ <td>2012-09-23</td>
+ <td>9545.3</td>
+ <td>8287.728789</td>
+ <td>2455.765726</td>
+ </tr>
+ </tbody>
+</table>
+<p>9743 rows × 5 columns</p>
+</div>
+```
+
+:::
+:::
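As a quick check, the first five `D10` rows in the table above can be reproduced with pandas' own expanding methods. Note that the `value_expanding_std` column matches a population standard deviation (`ddof=0`): the first row is `0.000000` rather than `NaN`, and the second is `1.4`, half the gap between 2076.2 and 2073.4. pandas `Series.std()` defaults to the sample version (`ddof=1`), while NumPy's `ndarray.std()` defaults to `ddof=0`, so a lambda such as `lambda x: x.std()` gives different results depending on which kind of object it receives.

```python
import pandas as pd

# First five D10 values, copied from the output table above.
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4])

expanding_mean = values.expanding().mean()
# Population standard deviation (ddof=0): matches value_expanding_std above.
expanding_std_pop = values.expanding().std(ddof=0)
# pandas' default sample standard deviation (ddof=1): NaN on the first row.
expanding_std_sample = values.expanding().std()

print(pd.DataFrame({
    "mean": expanding_mean,
    "std_ddof0": expanding_std_pop,
    "std_ddof1": expanding_std_sample,
}))
```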
+
+
+
+::: {.cell execution_count=3}
+``` {.python .cell-code}
+# Expanding Correlation: Uses independent variables (value2)
+
+df = pd.DataFrame({
+ 'id': [1, 1, 1, 2, 2, 2],
+ 'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
+ 'value1': [10, 20, 29, 42, 53, 59],
+ 'value2': [2, 16, 20, 40, 41, 50],
+})
+
+result_df = (
+ df.groupby('id')
+ .augment_expanding(
+ date_column='date',
+ value_column='value1',
+ use_independent_variables=True,
+ window_func=[('corr', lambda df: df['value1'].corr(df['value2']))],
+ )
+)
+result_df
+```
+
+::: {.cell-output .cell-output-display execution_count=3}
+
+```{=html}
+<div>
+<style scoped>
+ .dataframe tbody tr th:only-of-type {
+ vertical-align: middle;
+ }
+
+ .dataframe tbody tr th {
+ vertical-align: top;
+ }
+
+ .dataframe thead th {
+ text-align: right;
+ }
+</style>
+<table border="1" class="dataframe">
+ <thead>
+ <tr style="text-align: right;">
+ <th></th>
+ <th>id</th>
+ <th>date</th>
+ <th>value1</th>
+ <th>value2</th>
+ <th>value1_expanding_corr</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <th>0</th>
+ <td>1</td>
+ <td>2023-01-01</td>
+ <td>10</td>
+ <td>2</td>
+ <td>NaN</td>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>1</td>
+ <td>2023-01-02</td>
+ <td>20</td>
+ <td>16</td>
+ <td>1.000000</td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>1</td>
+ <td>2023-01-03</td>
+ <td>29</td>
+ <td>20</td>
+ <td>0.961054</td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>2</td>
+ <td>2023-01-04</td>
+ <td>42</td>
+ <td>40</td>
+ <td>NaN</td>
+ </tr>
+ <tr>
+ <th>4</th>
+ <td>2</td>
+ <td>2023-01-05</td>
+ <td>53</td>
+ <td>41</td>
+ <td>1.000000</td>
+ </tr>
+ <tr>
+ <th>5</th>
+ <td>2</td>
+ <td>2023-01-06</td>
+ <td>59</td>
+ <td>50</td>
+ <td>0.824831</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+```
+
+:::
+:::
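The correlation column above can be reproduced without pytimetk by calling the same lambda on every expanding window of rows. The helper `expanding_apply_frame` is hypothetical: it only emulates the mechanism described in item 3 of `window_func` (the function receives a DataFrame of the window) and is not pytimetk's implementation.

```python
import warnings

import pandas as pd


def expanding_apply_frame(df, group_column, date_column, func):
    """Hypothetical helper: call `func` on every expanding window of rows,
    passed as a DataFrame, within each group in date order."""
    out = df.sort_values([group_column, date_column]).reset_index(drop=True)
    results = []
    # sort=False keeps first-appearance order, which matches the sorted rows.
    for _, group in out.groupby(group_column, sort=False):
        for end in range(1, len(group) + 1):
            # The window holds the group's first `end` rows and every column.
            results.append(func(group.iloc[:end]))
    out["result"] = results
    return out


df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})

with warnings.catch_warnings():
    # A one-row window has no defined correlation: the result is NaN, and
    # NumPy may emit a RuntimeWarning while computing it.
    warnings.simplefilter("ignore", RuntimeWarning)
    check = expanding_apply_frame(
        df, 'id', 'date', lambda w: w['value1'].corr(w['value2'])
    )
print(check)
```

The first row of each `id` is `NaN` because a single pair of values has no correlation, and the second is `1.0` because two points with distinct values always have a correlation of ±1 (here +1, since both columns increase).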
+
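The regression example that follows requires scikit-learn. As a dependency-free cross-check of what one window's fit computes, the same ordinary least-squares regression can be sketched with `numpy.linalg.lstsq`; this swaps NumPy in for scikit-learn and is not part of pytimetk, and the helper `ols_window` is hypothetical. With three rows and three parameters (an intercept plus one slope each for `value2` and `value3`), the full window for `id` 1 is fitted exactly.

```python
import numpy as np
import pandas as pd


def ols_window(window):
    """Hypothetical helper: fit value1 ~ value2 + value3 by ordinary least
    squares on one window (a NumPy stand-in for scikit-learn)."""
    X = window[['value2', 'value3']].to_numpy(dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(X)), X])
    y = window['value1'].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=['Intercept', 'Slope_value2', 'Slope_value3'])


# Full three-row window for id 1, taken from the regression example's data.
window = pd.DataFrame({
    'value1': [10, 20, 29],
    'value2': [5, 16, 24],
    'value3': [2, 3, 6],
})
params = ols_window(window)
print(params)
```

For windows with fewer rows than parameters the system is underdetermined, and `lstsq` returns the minimum-norm solution, which need not match scikit-learn's output.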
+
+::: {.cell execution_count=4}
+``` {.python .cell-code}
+# Expanding Regression: Uses independent variables (value2 and value3)
+
+# Requires: scikit-learn
+from sklearn.linear_model import LinearRegression
+
+df = pd.DataFrame({
+    'id': [1, 1, 1, 2, 2, 2],
+    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
+    'value1': [10, 20, 29, 42, 53, 59],
+    'value2': [5, 16, 24, 35, 45, 58],
+    'value3': [2, 3, 6, 9, 10, 13]
+})
+
+# Define the regression function: it receives each expanding window as a DataFrame
+def regression(df):
+    model = LinearRegression()
+    X = df[['value2', 'value3']]  # Independent variables
+    y = df['value1']  # Dependent variable
+    model.fit(X, y)
+    # Return the intercept and one slope per independent variable as a Series
+    return pd.Series(
+        [model.intercept_, *model.coef_],
+        index=['Intercept', 'Slope_value2', 'Slope_value3'],
+    )
+
+# Apply the regression to each expanding window within each id
+result_df = (
+    df.groupby('id')
+    .augment_expanding(
+        date_column='date',
+        value_column='value1',
+        use_independent_variables=True,
+        window_func=[('regression', regression)]
+    )
+    .dropna()
+)
+
+# Display results in wide format, since the function returns multiple values
+regression_wide_df = pd.concat(result_df['value1_expanding_regression'].to_list(), axis=1).T
+regression_wide_df = pd.concat([result_df.reset_index(drop=True), regression_wide_df], axis=1)
+regression_wide_df
+```
+:::