Feat/tz aware dta #2054

Merged: 7 commits, Nov 6, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,8 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
- Adapted the example notebooks to properly apply data transformers and avoid look-ahead bias. [#2020](https://github.com/unit8co/darts/pull/2020) by [Samriddhi Singh](https://github.com/SimTheGreat).
- Improvements to Regression Models:
- `XGBModel` now leverages XGBoost's native Quantile Regression support that was released in version 2.0.0 for improved probabilistic forecasts. [#2051](https://github.com/unit8co/darts/pull/2051) by [Dennis Bader](https://github.com/dennisbader).
- Other improvements:
- Added support for time index time zone conversion with parameter `tz` before generating/computing holidays and datetime attributes. Support was added to all Time Axis Encoders (standalone encoders and forecasting models' `add_encoders`), time series generation utility functions `holidays_timeseries()` and `datetime_attribute_timeseries()`, and `TimeSeries` methods `add_datetime_attribute()` and `add_holidays()`. [#2054](https://github.com/unit8co/darts/pull/2054) by [Dennis Bader](https://github.com/dennisbader).
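
To make the effect of `tz` concrete, here is a minimal standalone sketch that uses only the Python standard library, not Darts itself. It assumes that the time zone-naive timestamps are interpreted as UTC before conversion (an assumption, not stated in this changelog entry), and it uses a fixed UTC+1 offset as a stand-in for CET winter time. `hour_attribute` and `CET_WINTER` are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Fixed-offset stand-in for CET in winter (UTC+1); a real time zone also carries DST rules.
CET_WINTER = timezone(timedelta(hours=1), name="CET")


def hour_attribute(index: List[datetime], tz: Optional[timezone] = None) -> List[int]:
    """Return the hour of each naive timestamp, optionally after a time zone conversion.

    Assumption: naive timestamps are treated as UTC before converting to `tz`.
    """
    if tz is None:
        return [ts.hour for ts in index]
    return [ts.replace(tzinfo=timezone.utc).astimezone(tz).hour for ts in index]


# three hourly, time zone-naive timestamps around midnight
index = [datetime(2023, 1, 1, 22) + timedelta(hours=i) for i in range(3)]

hours_naive = hour_attribute(index)            # [22, 23, 0]
hours_cet = hour_attribute(index, CET_WINTER)  # [23, 0, 1]
```

The same index yields different `hour` values once it is converted, which is why the conversion has to happen before holidays or datetime attributes are computed.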

**Fixed**
- Fixed a bug when calling optimized `historical_forecasts()` for a `RegressionModel` trained with unequal component-specific lags. [#2040](https://github.com/unit8co/darts/pull/2040) by [Antoine Madrona](https://github.com/madtoinou).
80 changes: 75 additions & 5 deletions darts/dataprocessing/encoders/encoders.py
@@ -28,6 +28,7 @@
input_chunk_length=24,
output_chunk_length=12,
attribute='month',
tz='CET'
)

past_covariates_train = encoder.encode_train(
@@ -75,6 +76,8 @@
attribute
An attribute of `pd.DatetimeIndex`: see all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
tz
Optionally, convert the time zone naive index to a time zone `tz` before applying the encoder.
* `CyclicTemporalEncoder`
Adds cyclic pd.DatetimeIndex attribute information derived from `series.time_index`.
Adds 2 columns, corresponding to sin and cos encodings, to uniquely describe the underlying attribute.
@@ -84,6 +87,8 @@
An attribute of `pd.DatetimeIndex` that follows a cyclic pattern. One of ('month', 'day', 'weekday',
'dayofweek', 'day_of_week', 'hour', 'minute', 'second', 'microsecond', 'nanosecond', 'quarter',
'dayofyear', 'day_of_year', 'week', 'weekofyear', 'week_of_year').
tz
Optionally, convert the time zone naive index to a time zone `tz` before applying the encoder.
* `IntegerIndexEncoder`
Adds the relative index positions as integer values (positions) derived from `series` time index.
`series` can either have a pd.DatetimeIndex or an integer index.
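
The sin/cos pair described for `CyclicTemporalEncoder` can be illustrated with a generic sketch of cyclic encoding. This is not the Darts implementation; `cyclic_encode` is a hypothetical helper, and the exact scaling Darts applies may differ. It shows why two columns are needed and how shifting the hour by one (as a UTC to UTC+1 conversion would) rotates the encoding.

```python
import math
from typing import Tuple


def cyclic_encode(value: int, period: int) -> Tuple[float, float]:
    """Map a cyclic attribute value to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# hour 23 and hour 0 are far apart as integers but neighbors on the circle
sin_23, cos_23 = cyclic_encode(23, period=24)
sin_0, cos_0 = cyclic_encode(0, period=24)
distance = math.hypot(sin_23 - sin_0, cos_23 - cos_0)

# shifting hour 23 by one hour wraps around to hour 0
sin_shifted, cos_shifted = cyclic_encode((23 + 1) % 24, period=24)
```

Because the encoded value depends on the hour itself, a time zone conversion changes which point on the circle each time step maps to.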
@@ -121,6 +126,7 @@
* 'position' for `IntegerIndexEncoder`
* 'custom' for `CallableIndexEncoder`
* 'transformer' for a transformer
* 'tz' for applying a time zone conversion
* inner keys: covariates type

* 'past' for past covariates
@@ -142,7 +148,8 @@
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [lambda idx: (idx.year - 1950) / 50]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET',
}

model = SomeTorchForecastingModel(..., add_encoders=add_encoders)
@@ -184,6 +191,7 @@
VALID_TIME_PARAMS = [FUTURE, PAST]
VALID_ENCODER_DTYPES = (str, Sequence)

TZ_KEYS = ["tz"]
TRANSFORMER_KEYS = ["transformer"]
VALID_TRANSFORMER_DTYPES = FittableDataTransformer
INTEGER_INDEX_ATTRIBUTES = ["relative"]
@@ -192,7 +200,12 @@
class CyclicTemporalEncoder(SingleEncoder):
"""`CyclicTemporalEncoder`: Cyclic encoding of time series datetime attributes."""

- def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
+ def __init__(
+ self,
+ index_generator: CovariatesIndexGenerator,
+ attribute: str,
+ tz: Optional[str] = None,
+ ):
"""
Cyclic index encoding for `TimeSeries` that have a time index of type `pd.DatetimeIndex`.

@@ -208,9 +221,12 @@ def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(index_generator)
self.attribute = attribute
self.tz = tz

def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
@@ -226,6 +242,7 @@ def _encode(
self.base_component_name + self.attribute + "_sin",
self.base_component_name + self.attribute + "_cos",
],
tz=self.tz,
)

@property
@@ -255,6 +272,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
@@ -280,6 +298,8 @@ def __init__(
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
@@ -288,6 +308,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


@@ -300,6 +321,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
@@ -325,6 +347,8 @@ def __init__(
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` from :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
@@ -333,6 +357,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


@@ -341,7 +366,12 @@ class DatetimeAttributeEncoder(SingleEncoder):
Requires the underlying TimeSeries to have a pd.DatetimeIndex
"""

- def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
+ def __init__(
+ self,
+ index_generator: CovariatesIndexGenerator,
+ attribute: str,
+ tz: Optional[str] = None,
+ ):
"""
Parameters
----------
@@ -355,9 +385,12 @@ def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(index_generator)
self.attribute = attribute
self.tz = tz

def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
@@ -369,6 +402,7 @@ def _encode(
attribute=self.attribute,
dtype=dtype,
with_columns=self.base_component_name + self.attribute,
tz=self.tz,
)

@property
@@ -398,6 +432,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
@@ -423,6 +458,8 @@ def __init__(
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
@@ -431,6 +468,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


@@ -443,6 +481,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
@@ -468,6 +507,8 @@ def __init__(
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` from :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
@@ -476,6 +517,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


@@ -567,6 +609,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
@@ -610,6 +653,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
@@ -713,6 +757,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
@@ -759,6 +804,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
@@ -837,6 +883,9 @@ def __init__(
<darts.dataprocessing.transformers.fittable_data_transformer.FittableDataTransformer>` such as Scaler() or
BoxCox(). The transformers will be fitted on the training dataset when calling `model.fit()`.
The training, validation and inference datasets are then transformed equally.
Supported time zone:
Optionally, apply a time zone conversion with keyword 'tz'. This converts the time zone-naive index to the
time zone given by `'tz'` before applying the `'cyclic'` or `'datetime_attribute'` temporal encoders.

An example of a valid `add_encoders` dict for hourly data:

@@ -849,7 +898,8 @@ def __init__(
'datetime_attribute': {'past': ['hour'], 'future': ['year', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [lambda idx: (idx.year - 1950) / 50]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET',
}

Tuples of `(encoder_id, attribute)` are extracted from `add_encoders` to instantiate the `SingleEncoder`
@@ -1289,6 +1339,7 @@ def _setup_encoders(self, params: Dict) -> None:
* params={'cyclic': {'past': ['month', 'dayofweek', ...], 'future': [same as for 'past']}}
"""
past_encoders, future_encoders = self._process_input_encoders(params)
tz = self._process_timezone(params)

if not past_encoders and not future_encoders:
return
@@ -1299,6 +1350,7 @@ def _setup_encoders(self, params: Dict) -> None:
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
lags_covariates=self.lags_past_covariates,
tz=tz,
)
for enc_id, attr in past_encoders
]
@@ -1308,6 +1360,7 @@ def _setup_encoders(self, params: Dict) -> None:
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
lags_covariates=self.lags_future_covariates,
tz=tz,
)
for enc_id, attr in future_encoders
]
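
The hunks above pass `tz=tz` to every encoder in `_setup_encoders`, while other hunks add `**kwargs` to constructors that do not declare `tz`. One plausible reading is that `**kwargs` lets encoders with no use for a time zone accept the argument without failing. The following is a sketch of that pattern only; the class and function names are hypothetical and do not come from Darts.

```python
from typing import Any, List, Optional, Tuple


class AttributeEncoderSketch:
    """Hypothetical encoder that makes use of `tz`."""

    def __init__(self, attribute: str, tz: Optional[str] = None):
        self.attribute = attribute
        self.tz = tz


class PositionEncoderSketch:
    """Hypothetical encoder that has no use for `tz` and absorbs it via **kwargs."""

    def __init__(self, attribute: str, **kwargs: Any):
        self.attribute = attribute


def build_encoders(specs: List[Tuple[type, str]], tz: Optional[str]) -> List[object]:
    """Instantiate every encoder with the same keyword arguments."""
    return [encoder_cls(attribute=attr, tz=tz) for encoder_cls, attr in specs]


encoders = build_encoders(
    [(AttributeEncoderSketch, "hour"), (PositionEncoderSketch, "relative")],
    tz="CET",
)
```

Without `**kwargs`, the second constructor would raise a `TypeError` on the unexpected `tz` keyword, so the calling code would need to special-case each encoder type.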
@@ -1369,7 +1422,9 @@ def _process_input_encoders(self, params: Dict) -> Tuple[List, List]:

# check input for invalid encoder types
invalid_encoders = [
- enc for enc in params if enc not in ENCODER_KEYS + TRANSFORMER_KEYS
+ enc
+ for enc in params
+ if enc not in ENCODER_KEYS + TZ_KEYS + TRANSFORMER_KEYS
raise_if(
len(invalid_encoders) > 0,
@@ -1480,6 +1535,21 @@ def _process_input_transformer(
]
return transformer, transform_past_mask, transform_future_mask

@staticmethod
def _process_timezone(params: Dict) -> Optional[str]:
"""Processes input params used at model creation for time zone specification, and returns the time zone.

Parameters
----------
params
Dict from parameter `add_encoders` (kwargs) used at model creation. Relevant parameters are:
* params={'tz': 'CET'}
"""
if not params:
return None

return params.get(TZ_KEYS[0], None)
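
The key validation and the time zone lookup from the hunks above can be exercised together in a standalone sketch. `TZ_KEYS` and `TRANSFORMER_KEYS` are copied from the diff; the value of `ENCODER_KEYS` is an assumption based on the outer keys listed in the `add_encoders` docstring, and `find_invalid_keys` is a hypothetical helper standing in for the list comprehension in `_process_input_encoders`.

```python
from typing import Dict, List, Optional

# Assumption: encoder keys as listed in the `add_encoders` docstring.
ENCODER_KEYS = ["cyclic", "datetime_attribute", "position", "custom"]
TRANSFORMER_KEYS = ["transformer"]
TZ_KEYS = ["tz"]


def find_invalid_keys(params: Dict) -> List[str]:
    """Return the outer keys that are neither encoder, time zone, nor transformer keys."""
    valid = ENCODER_KEYS + TZ_KEYS + TRANSFORMER_KEYS
    return [key for key in params if key not in valid]


def process_timezone(params: Optional[Dict]) -> Optional[str]:
    """Return the time zone from `params`, or None if it is absent or `params` is empty."""
    if not params:
        return None
    return params.get(TZ_KEYS[0], None)


add_encoders = {"cyclic": {"future": ["month"]}, "tz": "CET"}

invalid = find_invalid_keys(add_encoders)   # no invalid keys
tz = process_timezone(add_encoders)         # 'CET'
```

Adding `TZ_KEYS` to the list of valid keys is what keeps `'tz'` from being rejected as an unknown encoder type.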

@property
def requires_fit(self) -> bool:
return any(
3 changes: 2 additions & 1 deletion darts/models/forecasting/arima.py
@@ -79,7 +79,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'future': ['relative']},
'custom': {'future': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..

3 changes: 2 additions & 1 deletion darts/models/forecasting/auto_arima.py
@@ -62,7 +62,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'future': ['relative']},
'custom': {'future': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..

3 changes: 2 additions & 1 deletion darts/models/forecasting/block_rnn_model.py
@@ -249,7 +249,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
random_state
3 changes: 2 additions & 1 deletion darts/models/forecasting/catboost_model.py
@@ -74,7 +74,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
likelihood
3 changes: 2 additions & 1 deletion darts/models/forecasting/croston.py
@@ -66,7 +66,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'future': ['relative']},
'custom': {'future': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..

3 changes: 2 additions & 1 deletion darts/models/forecasting/dlinear.py
@@ -350,7 +350,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
random_state