Feat/scalar with window (#2529)
* add basic scalar window support

* change scaler to a more robust generalisation

Co-authored-by: Dennis Bader <[email protected]>

* delete unused functions and add util function

* delete unused functions and add util function

* add transformers to optimized historical forecasts

* add covariate transformers, refactor + docstring update

* delete util to avoid circular import

* delete unused param and add transforms to torch models

* fix param name

* move all series and covariates fitting into one place, allow data transform without model retrain

* update readme and data types

* optimized forecasts only support invertible data transform

* feat: harmonize application of scaler in hf, support for Pipeline

* feat: adding basic test for regression models

* fix: using a util method to reduce code duplication

* fix: simplify the tests

* fix: make things faster if no data transformers are passed

* feat: add test for the optimized hf

* fix: using util method in gridsearch as well

* fix: reverting some changes

* fix: make sure the series have a range that requires scaling

* update changelog

* feat: adding small example about how to use scaler in historical forecasts

* fix: address review comments

* fix: adapting the tests

* fix: moved the historical forecasts test to dedicated folder

* feat: make sure that already fitted data transformers process series correctly

* feat: adding tests for historical forecasts with scaler

* fix: remove duplicated test, also test tfm historical forecast with scaler

* fix: typo

* fix: addressing review comments

* fix: adjust the example of historical forecasts with auto-scaling according to review comments

* fix: typo

* fix: address review comments

* feat: possibility to select transformer idx

* fix: adding tests, fixing logic

* feat: added tests for the new data transformer features, fixed logic

* feat: add tests, fix logic

* fix: adding virtual env to gitignore

* fix: renamed the idx_params argument to idx_series

* apply minor changes

* apply minor changes part 2

* add additional info to docs

* fix: replaced exception with warning when multiple series, retrain=True and data transformer defined with global_fit=True

---------

Co-authored-by: Jan Fidor <[email protected]>
Co-authored-by: Dennis Bader <[email protected]>
Co-authored-by: JanFidor <[email protected]>
4 people authored Nov 25, 2024
1 parent d103a05 commit 39cf38c
Showing 15 changed files with 1,363 additions and 530 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -21,6 +21,7 @@ darts_logs/
docs_env
.DS_Store
.gradle
.venv

# used by CI to build with latest versions of dependencies
requirements-latest.txt
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
**Improved**

- Improvements to `ForecastingModel`: Improved `start` handling for historical forecasts, backtest, residuals, and gridsearch. If `start` is not within the trainable / forecastable points, uses the closest valid start point that is a round multiple of `stride` ahead of start. Raises a `ValueError` if no valid start point exists. This guarantees that all historical forecasts are `n * stride` points away from start, and will simplify many downstream tasks. [#2560](https://github.com/unit8co/darts/issues/2560) by [Dennis Bader](https://github.com/dennisbader).
- Added `data_transformers` argument to `historical_forecasts`, `backtest`, `residuals`, and `gridsearch` that allows automatically applying `DataTransformer` and/or `Pipeline` to the input series without data leakage (fit on the historical window of the input series, transform the input series, and inverse-transform the forecasts). [#2529](https://github.com/unit8co/darts/pull/2529) by [Antoine Madrona](https://github.com/madtoinou) and [Jan Fidor](https://github.com/JanFidor).
- Added `series_idx` argument to `DataTransformer` that allows users to use only a subset of the transformers when `global_fit=False` and several series are used. [#2529](https://github.com/unit8co/darts/pull/2529) by [Antoine Madrona](https://github.com/madtoinou).

**Fixed**

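The `data_transformers` entry above describes fitting the transformer on the historic window only. The following is a minimal pure-Python sketch of that idea, not darts' implementation: `MinMaxScaler`, `naive_forecast`, and `historical_forecasts` are hypothetical stand-ins (plain lists instead of `TimeSeries`, a last-value model whose output is easy to check by hand), assuming an expanding window that is refit at every forecast start.

```python
from dataclasses import dataclass


@dataclass
class MinMaxScaler:
    """Hypothetical min-max scaler on plain lists (stand-in for darts' `Scaler`)."""

    lo: float = 0.0
    hi: float = 1.0

    def fit(self, values: list[float]) -> "MinMaxScaler":
        self.lo, self.hi = min(values), max(values)
        return self

    def _span(self) -> float:
        # Fall back to 1.0 on a constant window to avoid dividing by zero.
        return (self.hi - self.lo) or 1.0

    def transform(self, values: list[float]) -> list[float]:
        return [(v - self.lo) / self._span() for v in values]

    def inverse_transform(self, values: list[float]) -> list[float]:
        return [v * self._span() + self.lo for v in values]


def naive_forecast(history: list[float], horizon: int) -> list[float]:
    """Hypothetical model: repeat the last observed value `horizon` times."""
    return [history[-1]] * horizon


def historical_forecasts(
    series: list[float], start: int, horizon: int, stride: int = 1
) -> list[list[float]]:
    forecasts = []
    for t in range(start, len(series) - horizon + 1, stride):
        history = series[:t]  # only the points observed before time t
        scaler = MinMaxScaler().fit(history)  # refit on the window: no look-ahead
        scaled_pred = naive_forecast(scaler.transform(history), horizon)
        forecasts.append(scaler.inverse_transform(scaled_pred))  # original units
    return forecasts


print(historical_forecasts([1.0, 2.0, 3.0, 10.0, 4.0, 5.0], start=3, horizon=1))
# [[3.0], [10.0], [4.0]]
```

Because each scaler only sees `series[:t]`, the spike at index 3 does not influence the scaling used for the first forecast, which is the kind of leakage the changelog entry rules out.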
65 changes: 59 additions & 6 deletions darts/dataprocessing/pipeline.py
@@ -5,7 +5,7 @@

from collections.abc import Iterator, Sequence
from copy import deepcopy
from typing import Union
from typing import Optional, Union

from darts import TimeSeries
from darts.dataprocessing.transformers import (
@@ -90,6 +90,16 @@ def __init__(
isinstance(t, InvertibleDataTransformer) for t in self._transformers
)

self._fittable = any(
isinstance(t, FittableDataTransformer) for t in self._transformers
)

self._global_fit = all(
t._global_fit
for t in self._transformers
if isinstance(t, FittableDataTransformer)
)

if verbose is not None:
for transformer in self._transformers:
transformer.set_verbose(verbose)
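The two flags added to `Pipeline.__init__` aggregate per-transformer properties with `any` and `all`. The sketch below reproduces only that aggregation, using a hypothetical `ToyTransformer` record (not a darts class). It makes one edge case visible: `all` over an empty generator is `True`, so a pipeline without any fittable transformer reports `global_fit` as `True`.

```python
from dataclasses import dataclass


@dataclass
class ToyTransformer:
    """Hypothetical stand-in for a data transformer; only the flags matter here."""

    name: str
    fittable: bool = False
    global_fit: bool = False


def pipeline_flags(transformers: list[ToyTransformer]) -> tuple[bool, bool]:
    # A pipeline is fittable if at least one step is fittable.
    fittable = any(t.fittable for t in transformers)
    # Global fit requires every fittable step to be fitted globally;
    # non-fittable steps are filtered out before `all` is evaluated.
    global_fit = all(t.global_fit for t in transformers if t.fittable)
    return fittable, global_fit


mapper = ToyTransformer("mapper")
local_scaler = ToyTransformer("scaler", fittable=True, global_fit=False)
global_scaler = ToyTransformer("scaler", fittable=True, global_fit=True)

print(pipeline_flags([mapper, local_scaler]))   # (True, False)
print(pipeline_flags([mapper, global_scaler]))  # (True, True)
print(pipeline_flags([mapper]))                 # (False, True): vacuous truth
```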
@@ -149,7 +159,9 @@ def fit_transform(
return data

def transform(
self, data: Union[TimeSeries, Sequence[TimeSeries]]
self,
data: Union[TimeSeries, Sequence[TimeSeries]],
series_idx: Optional[Union[int, Sequence[int]]] = None,
) -> Union[TimeSeries, Sequence[TimeSeries]]:
"""
For each data transformer in pipeline transform data. Then transformed data is passed to next transformer.
@@ -158,18 +170,24 @@ def transform(
----------
data
(`Sequence` of) `TimeSeries` to be transformed.
series_idx
Optionally, the index(es) of each series corresponding to their positions within the series used to fit
the transformer (to retrieve the appropriate transformer parameters).
Returns
-------
Union[TimeSeries, Sequence[TimeSeries]]
Transformed data.
"""
for transformer in self._transformers:
data = transformer.transform(data)
data = transformer.transform(data, series_idx=series_idx)
return data

def inverse_transform(
self, data: Union[TimeSeries, Sequence[TimeSeries]], partial: bool = False
self,
data: Union[TimeSeries, Sequence[TimeSeries]],
partial: bool = False,
series_idx: Optional[Union[int, Sequence[int]]] = None,
) -> Union[TimeSeries, Sequence[TimeSeries]]:
"""
For each data transformer in the pipeline, inverse-transform data. Then inverse transformed data is passed to
@@ -184,6 +202,9 @@ def inverse_transform(
partial
If set to `True`, the inverse transformation is applied even if the pipeline is not fully invertible,
calling `inverse_transform()` only on the `InvertibleDataTransformer`s
series_idx
Optionally, the index(es) of each series corresponding to their positions within the series used to fit
the transformer (to retrieve the appropriate transformer parameters).
Returns
-------
@@ -198,14 +219,18 @@ def inverse_transform(
)

for transformer in reversed(self._transformers):
data = transformer.inverse_transform(data)
data = transformer.inverse_transform(data, series_idx=series_idx)
return data
else:
for transformer in reversed(self._transformers):
if isinstance(transformer, InvertibleDataTransformer):
data = transformer.inverse_transform(data)
data = transformer.inverse_transform(
data,
series_idx=series_idx,
)
return data
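The `transform` and `inverse_transform` loops above apply the steps in order, then undo them in reverse order, and with `partial=True` skip steps that cannot be inverted. The sketch below mirrors that control flow with hypothetical list-based steps (`Add`, `Mul`, and `Clip` are illustrative, not darts transformers). Raising `ValueError` for a non-invertible pipeline without `partial` is this sketch's own choice.

```python
class Add:
    invertible = True

    def __init__(self, k: float) -> None:
        self.k = k

    def transform(self, xs: list[float]) -> list[float]:
        return [x + self.k for x in xs]

    def inverse_transform(self, xs: list[float]) -> list[float]:
        return [x - self.k for x in xs]


class Mul:
    invertible = True

    def __init__(self, k: float) -> None:
        self.k = k

    def transform(self, xs: list[float]) -> list[float]:
        return [x * self.k for x in xs]

    def inverse_transform(self, xs: list[float]) -> list[float]:
        return [x / self.k for x in xs]


class Clip:
    """Not invertible: values above `cap` are lost."""

    invertible = False

    def __init__(self, cap: float) -> None:
        self.cap = cap

    def transform(self, xs: list[float]) -> list[float]:
        return [min(x, self.cap) for x in xs]


def pipe_transform(steps: list, xs: list[float]) -> list[float]:
    for step in steps:  # forward order: each output feeds the next step
        xs = step.transform(xs)
    return xs


def pipe_inverse(steps: list, xs: list[float], partial: bool = False) -> list[float]:
    if not partial and not all(s.invertible for s in steps):
        raise ValueError("pipeline is not fully invertible; pass partial=True")
    for step in reversed(steps):  # undo the last step first
        if step.invertible:  # with partial=True, non-invertible steps are skipped
            xs = step.inverse_transform(xs)
    return xs


steps = [Add(1.0), Mul(2.0)]
y = pipe_transform(steps, [1.0, 2.0])
print(y)                       # [4.0, 6.0]
print(pipe_inverse(steps, y))  # [1.0, 2.0]
```

Undoing the steps in forward order instead (subtract 1, then divide by 2) would return `[1.5, 2.5]`, which is why the inverse loop iterates over `reversed(...)`.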

@property
def invertible(self) -> bool:
"""
Returns whether the pipeline is invertible or not.
@@ -218,6 +243,34 @@ def invertible(self) -> bool:
"""
return self._invertible

@property
def fittable(self) -> bool:
"""
Returns whether the pipeline is fittable or not.
A pipeline is fittable if at least one of the transformers in the pipeline is fittable.
Returns
-------
bool
`True` if the pipeline is fittable, `False` otherwise
"""
return self._fittable

@property
def _fit_called(self) -> bool:
"""
Returns whether all the transformers in the pipeline were fitted (when applicable).
Returns
-------
bool
`True` if all the fittable transformers are fitted, `False` otherwise
"""
return all(
(not isinstance(t, FittableDataTransformer)) or t._fit_called
for t in self._transformers
)

def __getitem__(self, key: Union[int, slice]) -> "Pipeline":
"""
Gets subset of Pipeline based either on index or slice with indexes.
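The new `_fit_called` property treats non-fittable steps as already satisfied and requires every fittable step to have been fitted. The sketch below implements the same predicate and one way such a check could be used to avoid refitting. `FittableStep`, `StatelessStep`, and `ensure_fitted` are hypothetical. The call sites of `_fit_called` are not part of this diff, so the gating logic is an assumption.

```python
class StatelessStep:
    """Hypothetical step that needs no fitting (e.g. a fixed mapping)."""


class FittableStep:
    """Hypothetical step that records whether, and how often, it was fitted."""

    def __init__(self) -> None:
        self.fit_called = False
        self.fit_count = 0

    def fit(self, xs: list[float]) -> "FittableStep":
        self.fit_called = True
        self.fit_count += 1
        return self


def pipeline_fit_called(steps: list[object]) -> bool:
    # Non-fittable steps always pass; fittable ones must have been fitted.
    return all(not isinstance(s, FittableStep) or s.fit_called for s in steps)


def ensure_fitted(steps: list[object], xs: list[float]) -> None:
    # Assumed usage: fit the fittable steps only if at least one is still unfitted.
    if not pipeline_fit_called(steps):
        for s in steps:
            if isinstance(s, FittableStep):
                s.fit(xs)


steps = [StatelessStep(), FittableStep()]
ensure_fitted(steps, [1.0, 2.0])
ensure_fitted(steps, [1.0, 2.0])  # already fitted: no second fit
print(steps[1].fit_count)  # 1
```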
22 changes: 20 additions & 2 deletions darts/dataprocessing/transformers/base_data_transformer.py
@@ -304,6 +304,7 @@ def transform(
series: Union[TimeSeries, Sequence[TimeSeries]],
*args,
component_mask: Optional[np.array] = None,
series_idx: Optional[Union[int, Sequence[int]]] = None,
**kwargs,
) -> Union[TimeSeries, list[TimeSeries]]:
"""Transforms a (sequence of) of series by calling the user-implemeneted `ts_transform` method.
@@ -328,6 +329,9 @@ def transform(
attribute was set to `True` when instantiating `BaseDataTransformer`, then the component mask
will be automatically applied to each `TimeSeries` input. Otherwise, `component_mask` will be
provided as an additional keyword argument to `ts_transform`. See 'Notes' for further details.
series_idx
Optionally, the index(es) of each series corresponding to their positions within the series used to fit
the transformer (to retrieve the appropriate transformer parameters).
kwargs
Additional keyword arguments for each :func:`ts_transform()` method call
@@ -360,10 +364,16 @@ def transform(
# Take note of original input for unmasking purposes:
if isinstance(series, TimeSeries):
data = [series]
transformer_selector = [0]
if series_idx:
transformer_selector = self._process_series_idx(series_idx)
else:
transformer_selector = [0]
else:
data = series
transformer_selector = range(len(series))
if series_idx:
transformer_selector = self._process_series_idx(series_idx)
else:
transformer_selector = range(len(series))

input_iterator = _build_tqdm_iterator(
zip(data, self._get_params(transformer_selector=transformer_selector)),
@@ -439,6 +449,14 @@ def _check_fixed_params(self, transformer_selector: Iterable) -> None:
)
return None

@staticmethod
def _process_series_idx(series_idx: Union[int, Sequence[int]]) -> Sequence[int]:
"""Convert the `series_idx` to a Sequence[int].
Note: the validity of the entries in series_idx is checked in _get_params().
"""
return [series_idx] if isinstance(series_idx, int) else series_idx

@staticmethod
def apply_component_mask(
series: TimeSeries,
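The `series_idx` branches above replace the default positional pairing (`range(len(series))`) with explicit indices into the parameters fitted per series, after `_process_series_idx` turns a bare `int` into a one-element list. The sketch below applies the same selection to plain dictionaries. `process_series_idx` and `select_params` are hypothetical names, and the length check is an addition of this sketch (in the diff, index validity is left to `_get_params()`). It also tests `series_idx is not None` instead of truthiness, so an explicit `0` takes the same path as any other index.

```python
from collections.abc import Sequence
from typing import Optional, Union


def process_series_idx(series_idx: Union[int, Sequence[int]]) -> list[int]:
    # A single int becomes a one-element list; sequences are copied to a list.
    return [series_idx] if isinstance(series_idx, int) else list(series_idx)


def select_params(
    fitted_params: list[dict],
    n_series: int,
    series_idx: Optional[Union[int, Sequence[int]]] = None,
) -> list[dict]:
    if series_idx is not None:
        selector = process_series_idx(series_idx)
    else:
        # Default: the i-th input series uses the i-th fitted parameter set.
        selector = list(range(n_series))
    if len(selector) != n_series:
        raise ValueError("expected one index per input series")
    return [fitted_params[i] for i in selector]


# Parameters fitted on three series (e.g. a per-series minimum).
params = [{"min": 0}, {"min": 10}, {"min": 20}]

# Transform a single series that was the third one during fitting.
print(select_params(params, n_series=1, series_idx=2))  # [{'min': 20}]
```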
16 changes: 16 additions & 0 deletions darts/dataprocessing/transformers/fittable_data_transformer.py
@@ -293,6 +293,22 @@ def fit(
)
return self

def transform(
self,
series: Union[TimeSeries, Sequence[TimeSeries]],
*args,
component_mask: Optional[np.array] = None,
series_idx: Optional[Union[int, Sequence[int]]] = None,
**kwargs,
) -> Union[TimeSeries, list[TimeSeries]]:
return super().transform(
series=series,
*args,
component_mask=component_mask,
series_idx=series_idx if not self._global_fit else None,
**kwargs,
)

def fit_transform(
self,
series: Union[TimeSeries, Sequence[TimeSeries]],
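The `FittableDataTransformer.transform` override above forwards `series_idx` only when `global_fit` is `False`. A globally fitted transformer holds a single parameter set shared by all series, so there is nothing to select. A hypothetical `ToyScaler` (min-max scaling on lists, not darts' `Scaler`) illustrates the difference.

```python
from collections.abc import Sequence
from typing import Optional, Union


class ToyScaler:
    def __init__(self, global_fit: bool) -> None:
        self.global_fit = global_fit
        self.params: list[tuple[float, float]] = []

    def fit(self, series: list[list[float]]) -> "ToyScaler":
        if self.global_fit:
            # One (min, max) pair computed over all series together.
            values = [v for s in series for v in s]
            self.params = [(min(values), max(values))]
        else:
            # One (min, max) pair per series, in input order.
            self.params = [(min(s), max(s)) for s in series]
        return self

    def transform(
        self,
        series: list[list[float]],
        series_idx: Optional[Union[int, Sequence[int]]] = None,
    ) -> list[list[float]]:
        if self.global_fit:
            # series_idx is ignored: every series shares params[0].
            selector = [0] * len(series)
        elif series_idx is not None:
            selector = [series_idx] if isinstance(series_idx, int) else list(series_idx)
        else:
            selector = list(range(len(series)))
        out = []
        for s, i in zip(series, selector):
            lo, hi = self.params[i]
            span = (hi - lo) or 1.0  # guard against constant series
            out.append([(v - lo) / span for v in s])
        return out


train = [[0.0, 10.0], [100.0, 200.0]]
local = ToyScaler(global_fit=False).fit(train)
print(local.transform([[150.0]], series_idx=1))  # [[0.5]]
print(local.transform([[150.0]]))                # [[15.0]]
```

In this sketch, omitting `series_idx` for the locally fitted scaler pairs the single input with the first fitted series, giving 15.0 instead of 0.5; a globally fitted `ToyScaler` returns the same result with or without the index.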
20 changes: 17 additions & 3 deletions darts/dataprocessing/transformers/invertible_data_transformer.py
@@ -257,6 +257,7 @@ def inverse_transform(
series: Union[TimeSeries, Sequence[TimeSeries], Sequence[Sequence[TimeSeries]]],
*args,
component_mask: Optional[np.array] = None,
series_idx: Optional[Union[int, Sequence[int]]] = None,
**kwargs,
) -> Union[TimeSeries, list[TimeSeries], list[list[TimeSeries]]]:
"""Inverse transforms a (sequence of) series by calling the user-implemented `ts_inverse_transform` method.
@@ -285,6 +286,9 @@ def inverse_transform(
component_mask : Optional[np.ndarray] = None
Optionally, a 1-D boolean np.ndarray of length ``series.n_components`` that specifies
which components of the underlying `series` the inverse transform should consider.
series_idx
Optionally, the index(es) of each series corresponding to their positions within the series used to fit
the transformer (to retrieve the appropriate transformer parameters).
kwargs
Additional keyword arguments for the :func:`ts_inverse_transform()` method
@@ -324,16 +328,26 @@ def inverse_transform(
called_with_sequence_series = False
if isinstance(series, TimeSeries):
data = [series]
transformer_selector = [0]
if series_idx:
transformer_selector = self._process_series_idx(series_idx)
else:
transformer_selector = [0]
called_with_single_series = True
elif isinstance(series[0], TimeSeries): # Sequence[TimeSeries]
data = series
transformer_selector = range(len(series))
if series_idx:
transformer_selector = self._process_series_idx(series_idx)
else:
transformer_selector = range(len(series))
called_with_sequence_series = True
else: # Sequence[Sequence[TimeSeries]]
data = []
transformer_selector = []
for idx, series_list in enumerate(series):
if series_idx:
iterator_ = zip(self._process_series_idx(series_idx), series)
else:
iterator_ = enumerate(series)
for idx, series_list in iterator_:
data.extend(series_list)
transformer_selector += [idx] * len(series_list)

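For `Sequence[Sequence[TimeSeries]]` input (for example, several series that each have a list of historical forecasts), `inverse_transform` flattens the nested lists and repeats each outer index once per inner element, so every forecast is paired with the parameters of the series it came from; `series_idx` replaces the outer `enumerate` indices. The sketch below reproduces the flattening on plain floats, with hypothetical per-series `offsets` standing in for fitted parameters, and regroups the result into the original nesting. It assumes `series_idx` provides one index per inner list.

```python
from collections.abc import Sequence
from typing import Optional, Union


def flatten_with_selector(
    nested: Sequence[Sequence[float]],
    series_idx: Optional[Union[int, Sequence[int]]] = None,
) -> tuple[list[float], list[int]]:
    if series_idx is not None:
        idxs = [series_idx] if isinstance(series_idx, int) else list(series_idx)
        pairs = zip(idxs, nested)
    else:
        pairs = enumerate(nested)
    data: list[float] = []
    selector: list[int] = []
    for idx, group in pairs:
        data.extend(group)
        # Every element of an inner list uses the parameters of its outer series.
        selector += [idx] * len(group)
    return data, selector


def inverse_nested(
    nested: Sequence[Sequence[float]],
    offsets: Sequence[float],
    series_idx: Optional[Union[int, Sequence[int]]] = None,
) -> list[list[float]]:
    data, selector = flatten_with_selector(nested, series_idx)
    restored = [x + offsets[i] for x, i in zip(data, selector)]
    # Rebuild the original nesting from the inner-list lengths.
    out, pos = [], 0
    for group in nested:
        out.append(restored[pos : pos + len(group)])
        pos += len(group)
    return out


forecasts = [[1.0, 2.0], [3.0]]  # two forecasts for one series, one for another
offsets = [0.0, 10.0, 20.0]      # hypothetical parameters of three fitted series
print(inverse_nested(forecasts, offsets, series_idx=[2, 1]))  # [[21.0, 22.0], [13.0]]
```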