Merge branch 'master' into feat/add_midas_transformer
madtoinou authored Nov 6, 2023
2 parents 1d5b9be + 772d705 commit 6c0f4a3
Showing 31 changed files with 374 additions and 50 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Expand Up @@ -18,6 +18,8 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
- Adapted the example notebooks to properly apply data transformers and avoid look-ahead bias. [#2020](https://github.com/unit8co/darts/pull/2020) by [Samriddhi Singh](https://github.com/SimTheGreat).
- Improvements to Regression Models:
- `XGBModel` now leverages XGBoost's native Quantile Regression support that was released in version 2.0.0 for improved probabilistic forecasts. [#2051](https://github.com/unit8co/darts/pull/2051) by [Dennis Bader](https://github.com/dennisbader).
- Other improvements:
- Added support for time index time zone conversion with parameter `tz` before generating/computing holidays and datetime attributes. Support was added to all Time Axis Encoders (standalone encoders and forecasting models' `add_encoders`), time series generation utils functions `holidays_timeseries()` and `datetime_attribute_timeseries()`, and `TimeSeries` methods `add_datetime_attribute()` and `add_holidays()`. [#2054](https://github.com/unit8co/darts/pull/2054) by [Dennis Bader](https://github.com/dennisbader).
- New `MIDAS` fittable-invertible data-transformer, which uses mixed-data sampling to convert `TimeSeries` from high frequency to low frequency. [#1820](https://github.com/unit8co/darts/pull/1820) by [Boyd Biersteker](https://github.com/Beerstabr) and [Antoine Madrona](https://github.com/madtoinou).
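
The effect of the `tz` entry above can be illustrated without darts. The sketch below is a minimal standard-library illustration, not the library's implementation. It assumes the time zone naive timestamps are interpreted as UTC before conversion (this diff does not state that), and it uses a fixed UTC+1 offset as a stand-in for `'CET'` in winter. The names `hour_attribute` and `UTC_PLUS_1` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset used as a stand-in for 'CET' in winter (assumption:
# a real time zone database would also handle daylight saving time).
UTC_PLUS_1 = timezone(timedelta(hours=1))


def hour_attribute(naive_index, tz=None):
    """Return the 'hour' attribute of each timestamp.

    If `tz` is given, the naive timestamps are first treated as UTC
    (an assumption of this sketch) and then converted to `tz`.
    """
    if tz is None:
        return [ts.hour for ts in naive_index]
    return [
        ts.replace(tzinfo=timezone.utc).astimezone(tz).hour for ts in naive_index
    ]


# Four hourly, time zone naive timestamps crossing midnight.
naive_index = [datetime(2023, 1, 1, 22) + timedelta(hours=i) for i in range(4)]

hours_naive = hour_attribute(naive_index)
hours_shifted = hour_attribute(naive_index, tz=UTC_PLUS_1)

print(hours_naive)    # [22, 23, 0, 1]
print(hours_shifted)  # [23, 0, 1, 2]
```

The timestamps themselves are unchanged; only the attribute derived from them shifts, which is why the conversion matters for hour-of-day or holiday features.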

**Fixed**
Expand Down
80 changes: 75 additions & 5 deletions darts/dataprocessing/encoders/encoders.py
Expand Up @@ -28,6 +28,7 @@
input_chunk_length=24,
output_chunk_length=12,
attribute='month',
tz='CET'
)
past_covariates_train = encoder.encode_train(
Expand Down Expand Up @@ -75,6 +76,8 @@
attribute
An attribute of `pd.DatetimeIndex`: see all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
tz
Optionally, convert the time zone naive index to a time zone `tz` before applying the encoder.
* `CyclicTemporalEncoder`
Adds cyclic pd.DatetimeIndex attribute information derived from `series.time_index`.
Adds 2 columns, corresponding to sin and cos encodings, to uniquely describe the underlying attribute.
Expand All @@ -84,6 +87,8 @@
An attribute of `pd.DatetimeIndex` that follows a cyclic pattern. One of ('month', 'day', 'weekday',
'dayofweek', 'day_of_week', 'hour', 'minute', 'second', 'microsecond', 'nanosecond', 'quarter',
'dayofyear', 'day_of_year', 'week', 'weekofyear', 'week_of_year').
tz
Optionally, convert the time zone naive index to a time zone `tz` before applying the encoder.
* `IntegerIndexEncoder`
Adds the relative index positions as integer values (positions) derived from `series` time index.
`series` can either have a pd.DatetimeIndex or an integer index.
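
The sin/cos encoding described for `CyclicTemporalEncoder` in this hunk can be sketched in plain Python. This is an illustrative sketch rather than the darts implementation: the helper `cyclic_encode` is hypothetical, and shifting the 1-based month to a 0-based value before encoding is an assumption of the example.

```python
import math


def cyclic_encode(values, period):
    """Map integer values in [0, period) to (sin, cos) pairs on the unit circle."""
    return [
        (math.sin(2 * math.pi * v / period), math.cos(2 * math.pi * v / period))
        for v in values
    ]


# Months are 1-based, so shift to 0..11 before encoding (assumption of this sketch).
months = [1, 4, 7, 10, 12]
encoded = cyclic_encode([m - 1 for m in months], period=12)

for month, (s, c) in zip(months, encoded):
    print(f"month={month:2d} sin={s:+.3f} cos={c:+.3f}")
```

Two columns are needed because a single sine value is ambiguous: with both components, December and January end up next to each other on the circle while every month still has a unique pair.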
Expand Down Expand Up @@ -121,6 +126,7 @@
* 'position' for `IntegerIndexEncoder`
* 'custom' for `CallableIndexEncoder`
* 'transformer' for a transformer
* 'tz' for applying a time zone conversion
* inner keys: covariates type
* 'past' for past covariates
Expand All @@ -142,7 +148,8 @@
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [lambda idx: (idx.year - 1950) / 50]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET',
}
model = SomeTorchForecastingModel(..., add_encoders=add_encoders)
Expand Down Expand Up @@ -184,6 +191,7 @@
VALID_TIME_PARAMS = [FUTURE, PAST]
VALID_ENCODER_DTYPES = (str, Sequence)

TZ_KEYS = ["tz"]
TRANSFORMER_KEYS = ["transformer"]
VALID_TRANSFORMER_DTYPES = FittableDataTransformer
INTEGER_INDEX_ATTRIBUTES = ["relative"]
Expand All @@ -192,7 +200,12 @@
class CyclicTemporalEncoder(SingleEncoder):
"""`CyclicTemporalEncoder`: Cyclic encoding of time series datetime attributes."""

- def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
+ def __init__(
+ self,
+ index_generator: CovariatesIndexGenerator,
+ attribute: str,
+ tz: Optional[str] = None,
+ ):
"""
Cyclic index encoding for `TimeSeries` that have a time index of type `pd.DatetimeIndex`.
Expand All @@ -208,9 +221,12 @@ def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(index_generator)
self.attribute = attribute
self.tz = tz

def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
Expand All @@ -226,6 +242,7 @@ def _encode(
self.base_component_name + self.attribute + "_sin",
self.base_component_name + self.attribute + "_cos",
],
tz=self.tz,
)

@property
Expand Down Expand Up @@ -255,6 +272,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
Expand All @@ -280,6 +298,8 @@ def __init__(
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
Expand All @@ -288,6 +308,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


Expand All @@ -300,6 +321,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
Expand All @@ -325,6 +347,8 @@ def __init__(
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` from :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
Expand All @@ -333,6 +357,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


Expand All @@ -341,7 +366,12 @@ class DatetimeAttributeEncoder(SingleEncoder):
Requires the underlying TimeSeries to have a pd.DatetimeIndex
"""

- def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
+ def __init__(
+ self,
+ index_generator: CovariatesIndexGenerator,
+ attribute: str,
+ tz: Optional[str] = None,
+ ):
"""
Parameters
----------
Expand All @@ -355,9 +385,12 @@ def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(index_generator)
self.attribute = attribute
self.tz = tz

def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
Expand All @@ -369,6 +402,7 @@ def _encode(
attribute=self.attribute,
dtype=dtype,
with_columns=self.base_component_name + self.attribute,
tz=self.tz,
)

@property
Expand Down Expand Up @@ -398,6 +432,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
Expand All @@ -423,6 +458,8 @@ def __init__(
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
Expand All @@ -431,6 +468,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


Expand All @@ -443,6 +481,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
Expand All @@ -468,6 +507,8 @@ def __init__(
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` from :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
Expand All @@ -476,6 +517,7 @@ def __init__(
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)


Expand Down Expand Up @@ -567,6 +609,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
Expand Down Expand Up @@ -610,6 +653,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
Expand Down Expand Up @@ -713,6 +757,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
Expand Down Expand Up @@ -759,6 +804,7 @@ def __init__(
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[List[int]] = None,
**kwargs,
):
"""
Parameters
Expand Down Expand Up @@ -837,6 +883,9 @@ def __init__(
<darts.dataprocessing.transformers.fittable_data_transformer.FittableDataTransformer>` such as Scaler() or
BoxCox(). The transformers will be fitted on the training dataset when calling `model.fit()`.
The training, validation and inference datasets are then transformed equally.
Supported time zone:
Optionally, apply a time zone conversion with keyword 'tz'. This converts the time zone-naive index to a
timezone `'tz'` before applying the `'cyclic'` or `'datetime_attribute'` temporal encoders.
An example of a valid `add_encoders` dict for hourly data:
Expand All @@ -849,7 +898,8 @@ def __init__(
'datetime_attribute': {'past': ['hour'], 'future': ['year', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [lambda idx: (idx.year - 1950) / 50]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET',
}
Tuples of `(encoder_id, attribute)` are extracted from `add_encoders` to instantiate the `SingleEncoder`
Expand Down Expand Up @@ -1289,6 +1339,7 @@ def _setup_encoders(self, params: Dict) -> None:
* params={'cyclic': {'past': ['month', 'dayofweek', ...], 'future': [same as for 'past']}}
"""
past_encoders, future_encoders = self._process_input_encoders(params)
tz = self._process_timezone(params)

if not past_encoders and not future_encoders:
return
Expand All @@ -1299,6 +1350,7 @@ def _setup_encoders(self, params: Dict) -> None:
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
lags_covariates=self.lags_past_covariates,
tz=tz,
)
for enc_id, attr in past_encoders
]
Expand All @@ -1308,6 +1360,7 @@ def _setup_encoders(self, params: Dict) -> None:
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
lags_covariates=self.lags_future_covariates,
tz=tz,
)
for enc_id, attr in future_encoders
]
Expand Down Expand Up @@ -1369,7 +1422,9 @@ def _process_input_encoders(self, params: Dict) -> Tuple[List, List]:

# check input for invalid encoder types
invalid_encoders = [
- enc for enc in params if enc not in ENCODER_KEYS + TRANSFORMER_KEYS
+ enc
+ for enc in params
+ if enc not in ENCODER_KEYS + TZ_KEYS + TRANSFORMER_KEYS
]
raise_if(
len(invalid_encoders) > 0,
Expand Down Expand Up @@ -1480,6 +1535,21 @@ def _process_input_transformer(
]
return transformer, transform_past_mask, transform_future_mask

@staticmethod
def _process_timezone(params: Dict) -> Optional[str]:
"""Processes input params used at model creation for time zone specification, and returns the time zone.
Parameters
----------
params
Dict from parameter `add_encoders` (kwargs) used at model creation. Relevant parameters are:
* params={'tz': 'CET'}
"""
if not params:
return None

return params.get(TZ_KEYS[0], None)

@property
def requires_fit(self) -> bool:
return any(
Expand Down
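
Taken together, the `encoders.py` changes above make `'tz'` an accepted top-level key of `add_encoders` and pass its value to every past and future encoder. The following standalone sketch mirrors that key validation and the `_process_timezone` lookup outside of darts. The contents of `ENCODER_KEYS` are inferred from the docstrings in this diff and may be incomplete, and `find_invalid_keys` and `process_timezone` are hypothetical free-function names.

```python
from typing import Dict, List, Optional

# Key names mirror the constants shown in the diff. ENCODER_KEYS is inferred
# from the docstrings and may not match the library exactly (assumption).
ENCODER_KEYS = ["cyclic", "datetime_attribute", "position", "custom"]
TZ_KEYS = ["tz"]
TRANSFORMER_KEYS = ["transformer"]


def find_invalid_keys(params: Dict) -> List[str]:
    """Return the top-level keys that are neither encoder, tz nor transformer keys."""
    valid = ENCODER_KEYS + TZ_KEYS + TRANSFORMER_KEYS
    return [key for key in params if key not in valid]


def process_timezone(params: Optional[Dict]) -> Optional[str]:
    """Return the time zone from an `add_encoders`-style dict, or None if absent."""
    if not params:
        return None
    return params.get(TZ_KEYS[0], None)


add_encoders = {
    "cyclic": {"future": ["month"]},
    "datetime_attribute": {"future": ["hour", "dayofweek"]},
    "tz": "CET",
}

print(find_invalid_keys(add_encoders))              # []
print(process_timezone(add_encoders))               # CET
print(find_invalid_keys({"timezone": "CET"}))       # ['timezone']
```

Because the lookup falls back to `None`, existing `add_encoders` dicts without a `'tz'` key keep their previous behavior.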
3 changes: 2 additions & 1 deletion darts/models/forecasting/arima.py
Expand Up @@ -79,7 +79,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'future': ['relative']},
'custom': {'future': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
Expand Down
3 changes: 2 additions & 1 deletion darts/models/forecasting/auto_arima.py
Expand Up @@ -62,7 +62,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'future': ['relative']},
'custom': {'future': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
Expand Down
3 changes: 2 additions & 1 deletion darts/models/forecasting/block_rnn_model.py
Expand Up @@ -249,7 +249,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
random_state
Expand Down
3 changes: 2 additions & 1 deletion darts/models/forecasting/catboost_model.py
Expand Up @@ -74,7 +74,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
likelihood
Expand Down
3 changes: 2 additions & 1 deletion darts/models/forecasting/croston.py
Expand Up @@ -66,7 +66,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'future': ['relative']},
'custom': {'future': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
Expand Down
3 changes: 2 additions & 1 deletion darts/models/forecasting/dlinear.py
Expand Up @@ -350,7 +350,8 @@ def encode_year(idx):
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [encode_year]},
- 'transformer': Scaler()
+ 'transformer': Scaler(),
+ 'tz': 'CET'
}
..
random_state
Expand Down