Feat/specify lags per component for RegressionModel (#1962)
* feat: updated lags sanity checks to accept dictionary

* fix: better management of corner cases during lags checks

* fix: improved modularity

* fix: simplified the logic a bit

* feat: when generating lagged data, the values can be extracted using component-specific lags

* feat: raise error if all the ts in target/past/future don't have the same number of components

* feat: added support for component-specific lags in fit() and predict()

* test: added tests and fix some bug accordingly

* feat: component-wise lags support encoders, improved sanity checks

* feat: possibility to declare default lags for all the not specified components, updated changelog

* test: adding a test for the lagged data creation

* fix: typo

* fix: addressing review comments

* Apply suggestions from code review

Co-authored-by: Dennis Bader <[email protected]>

* refactor: lags argument are converted to dict before running the type check and processing of the values

* refactor: lags argument are converted to dict before running the type check and processing of the values

* doc: improved documentation of the component-specific lags in tabularization

* test: adding a test for the multivariate scenario

* test: checking the appropriate lags are extracted by the shap explainer

* fix: shapexplainer extract the appropriate lags, updated the type hints

* fix: passing covariates when trained on multiple series

* fix: moved the series components consistency to create_lagged_data to limit iteration of the series

* fix: improved the error message for components inconsistency, improve tests parametrization

* fix: addressing reviewer comments

* Apply suggestions from code review

Co-authored-by: Dennis Bader <[email protected]>

* test: checking that the name of the features is correctly generated when using dict to define the lags

* fix: linting

* fix: updating the error msg

* fix: bug when the number of lags is different across components

* fix: future lags in test

---------

Co-authored-by: Dennis Bader <[email protected]>
madtoinou and dennisbader authored Sep 14, 2023
1 parent a6ceb5d commit b3498bf
Showing 13 changed files with 999 additions and 270 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -14,8 +14,9 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
- `TimeSeries` with a `RangeIndex` starting at a negative value are now supported by `historical_forecasts`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
- Added a new argument `start_format` to `historical_forecasts()`, `backtest()` and `gridsearch` that allows using an integer `start` either as the index position or index value/label for `series` indexed with a `pd.RangeIndex`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
- Added `RINorm` (Reversible Instance Norm) as an input normalization option for all `TorchForecastingModel` except `RNNModel`. Activate it with model creation parameter `use_reversible_instance_norm`. [#1969](https://github.com/unit8co/darts/pull/1969) by [Dennis Bader](https://github.com/dennisbader).
- Reduced the size of the Darts docker image `unit8/darts:latest`, and included all optional models as well as dev requirements. [#1878](https://github.com/unit8co/darts/pull/1878) by [Alex Colpitts](https://github.com/alexcolpitts96).
- Reduced the size of the Darts docker image `unit8/darts:latest`, and included all optional models as well as dev requirements. [#1878](https://github.com/unit8co/darts/pull/1878) by [Alex Colpitts](https://github.com/alexcolpitts96).
- Added short examples in the docstring of all the models, including covariates usage and some model-specific parameters. [#1956](https://github.com/unit8co/darts/pull/1956) by [Antoine Madrona](https://github.com/madtoinou).
- All `RegressionModel`s now support component/column-specific lags for target, past, and future covariates series. [#1962](https://github.com/unit8co/darts/pull/1962) by [Antoine Madrona](https://github.com/madtoinou).

**Fixed**
- Fixed a bug in `TimeSeries.from_dataframe()` when using a pandas.DataFrame with `df.columns.name != None`. [#1938](https://github.com/unit8co/darts/pull/1938) by [Antoine Madrona](https://github.com/madtoinou).
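
Reviewer note: the component-specific lags entry above can be illustrated with a minimal, library-free sketch. It assumes the semantics stated in the docstrings of this commit (integer > 0, list of negative integers, or a dictionary keyed by component name with an optional `'default_lags'` key). The helper name `expand_lags` is hypothetical and is not part of Darts; the actual implementation may differ.

```python
from typing import Dict, List, Union

LagsValue = Union[int, List[int]]
LagsSpec = Union[LagsValue, Dict[str, LagsValue]]


def expand_lags(lags: LagsSpec, components: List[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: resolve a lags spec into one lag list per component."""

    def to_list(value: LagsValue) -> List[int]:
        if isinstance(value, int):
            if value <= 0:
                raise ValueError("an integer lags value must be > 0")
            # n -> [-n, ..., -2, -1], oldest lag first
            return [-i for i in range(value, 0, -1)]
        if any(lag >= 0 for lag in value):
            raise ValueError("each lag in a list must be < 0")
        return sorted(value)

    # int or list: the same lags apply to every component
    if not isinstance(lags, dict):
        shared = to_list(lags)
        return {name: list(shared) for name in components}

    # dict: look up each component, falling back to 'default_lags'
    resolved = {}
    for name in components:
        if name in lags:
            resolved[name] = to_list(lags[name])
        elif "default_lags" in lags:
            resolved[name] = to_list(lags["default_lags"])
        else:
            raise ValueError(
                f"no lags for component {name!r} and no 'default_lags' key"
            )
    return resolved


print(expand_lags({"temperature": 3, "default_lags": [-7, -1]},
                  ["temperature", "humidity"]))
# -> {'temperature': [-3, -2, -1], 'humidity': [-7, -1]}
```

Resolving every form to a per-component dictionary up front mirrors the refactoring bullet in the commit message ("lags argument are converted to dict before running the type check"), though the real conversion code is not shown in this excerpt.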
12 changes: 6 additions & 6 deletions darts/explainability/shap_explainer.py
@@ -732,9 +732,9 @@ def _build_explainer_sklearn(

def _create_regression_model_shap_X(
self,
target_series,
past_covariates,
future_covariates,
target_series: Optional[Union[TimeSeries, Sequence[TimeSeries]]],
past_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]],
future_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]],
n_samples=None,
train=False,
) -> pd.DataFrame:
@@ -746,9 +746,9 @@ def _create_regression_model_shap_X(
"""

lags_list = self.model.lags.get("target")
lags_past_covariates_list = self.model.lags.get("past")
lags_future_covariates_list = self.model.lags.get("future")
lags_list = self.model._get_lags("target")
lags_past_covariates_list = self.model._get_lags("past")
lags_future_covariates_list = self.model._get_lags("future")

X, indexes = create_lagged_prediction_data(
target_series=target_series if lags_list else None,
48 changes: 33 additions & 15 deletions darts/models/forecasting/lgbm.py
@@ -10,13 +10,15 @@
https://github.com/unit8co/darts/blob/master/INSTALL.md
"""

from typing import List, Optional, Sequence, Tuple, Union
from typing import List, Optional, Sequence, Union

import lightgbm as lgb
import numpy as np

from darts.logging import get_logger
from darts.models.forecasting.regression_model import (
FUTURE_LAGS_TYPE,
LAGS_TYPE,
RegressionModelWithCategoricalCovariates,
_LikelihoodMixin,
)
@@ -28,13 +30,13 @@
class LightGBMModel(RegressionModelWithCategoricalCovariates, _LikelihoodMixin):
def __init__(
self,
lags: Union[int, list] = None,
lags_past_covariates: Union[int, List[int]] = None,
lags_future_covariates: Union[Tuple[int, int], List[int]] = None,
lags: Optional[LAGS_TYPE] = None,
lags_past_covariates: Optional[LAGS_TYPE] = None,
lags_future_covariates: Optional[FUTURE_LAGS_TYPE] = None,
output_chunk_length: int = 1,
add_encoders: Optional[dict] = None,
likelihood: str = None,
quantiles: List[float] = None,
likelihood: Optional[str] = None,
quantiles: Optional[List[float]] = None,
random_state: Optional[int] = None,
multi_models: Optional[bool] = True,
use_static_covariates: bool = True,
@@ -48,17 +50,33 @@ def __init__(
Parameters
----------
lags
Lagged target values used to predict the next time step. If an integer is given the last `lags` past lags
are used (from -1 backward). Otherwise a list of integers with lags is required (each lag must be < 0).
Lagged target `series` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags` past lags; e.g. `(-1, -2, ..., -lags)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `series` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_past_covariates
Number of lagged past_covariates values used to predict the next time step. If an integer is given the last
`lags_past_covariates` past lags are used (inclusive, starting from lag -1). Otherwise a list of integers
with lags < 0 is required.
Lagged `past_covariates` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags_past_covariates` past lags; e.g. `(-1, -2, ..., -lags)`,
where `0` corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `past_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_future_covariates
Number of lagged future_covariates values used to predict the next time step. If an tuple (past, future) is
given the last `past` lags in the past are used (inclusive, starting from lag -1) along with the first
`future` future lags (starting from 0 - the prediction time - up to `future - 1` included). Otherwise a list
of integers with lags is required.
Lagged `future_covariates` values used to predict the next time step/s.
If a tuple of `(past, future)`, both values must be > 0. Uses the last `n=past` past lags and `n=future`
future lags; e.g. `(-past, -(past - 1), ..., -1, 0, 1, ..., future - 1)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, uses only the specified values as lags.
If a dictionary, the keys correspond to the `future_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (tuple or list of integers). The key
'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
output_chunk_length
Number of time steps predicted at once by the internal regression model. Does not have to equal the forecast
horizon `n` used in `predict()`. However, setting `output_chunk_length` equal to the forecast horizon may
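
Reviewer note: the `(past, future)` form of `lags_future_covariates` documented in this hunk can be sketched as follows. This is a minimal illustration of the stated rule only, under the assumption that the tuple expands to every integer from `-past` up to `future - 1`; `expand_future_lags` is a hypothetical name, not a Darts function.

```python
from typing import List, Tuple, Union


def expand_future_lags(lags: Union[Tuple[int, int], List[int]]) -> List[int]:
    """Hypothetical helper: turn a future-covariates lags spec into explicit lags."""
    if isinstance(lags, tuple):
        past, future = lags
        if past <= 0 or future <= 0:
            raise ValueError("both values of (past, future) must be > 0")
        # -past, ..., -1 are past lags; 0, ..., future - 1 are future lags,
        # where 0 is the first predicted time step of each sample
        return list(range(-past, future))
    # a list is used as given (sorted here only for a stable order)
    return sorted(lags)


print(expand_future_lags((2, 3)))    # -> [-2, -1, 0, 1, 2]
print(expand_future_lags([1, -4]))   # -> [-4, 1]
```

Unlike `lags` and `lags_past_covariates`, a list given here may contain values >= 0, since future covariates are known at and after the prediction time.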
53 changes: 37 additions & 16 deletions darts/models/forecasting/linear_regression_model.py
@@ -5,14 +5,19 @@
A forecasting model using a linear regression of some of the target series' lags, as well as optionally some
covariate series lags in order to obtain a forecast.
"""
from typing import List, Optional, Sequence, Tuple, Union
from typing import List, Optional, Sequence, Union

import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import LinearRegression, PoissonRegressor, QuantileRegressor

from darts.logging import get_logger
from darts.models.forecasting.regression_model import RegressionModel, _LikelihoodMixin
from darts.models.forecasting.regression_model import (
FUTURE_LAGS_TYPE,
LAGS_TYPE,
RegressionModel,
_LikelihoodMixin,
)
from darts.timeseries import TimeSeries

logger = get_logger(__name__)
@@ -21,13 +26,13 @@
class LinearRegressionModel(RegressionModel, _LikelihoodMixin):
def __init__(
self,
lags: Union[int, list] = None,
lags_past_covariates: Union[int, List[int]] = None,
lags_future_covariates: Union[Tuple[int, int], List[int]] = None,
lags: Optional[LAGS_TYPE] = None,
lags_past_covariates: Optional[LAGS_TYPE] = None,
lags_future_covariates: Optional[FUTURE_LAGS_TYPE] = None,
output_chunk_length: int = 1,
add_encoders: Optional[dict] = None,
likelihood: str = None,
quantiles: List[float] = None,
likelihood: Optional[str] = None,
quantiles: Optional[List[float]] = None,
random_state: Optional[int] = None,
multi_models: Optional[bool] = True,
use_static_covariates: bool = True,
@@ -38,17 +43,33 @@ def __init__(
Parameters
----------
lags
Lagged target values used to predict the next time step. If an integer is given the last `lags` past lags
are used (from -1 backward). Otherwise a list of integers with lags is required (each lag must be < 0).
Lagged target `series` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags` past lags; e.g. `(-1, -2, ..., -lags)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `series` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_past_covariates
Number of lagged past_covariates values used to predict the next time step. If an integer is given the last
`lags_past_covariates` past lags are used (inclusive, starting from lag -1). Otherwise a list of integers
with lags < 0 is required.
Lagged `past_covariates` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags_past_covariates` past lags; e.g. `(-1, -2, ..., -lags)`,
where `0` corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `past_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_future_covariates
Number of lagged future_covariates values used to predict the next time step. If an tuple (past, future) is
given the last `past` lags in the past are used (inclusive, starting from lag -1) along with the first
`future` future lags (starting from 0 - the prediction time - up to `future - 1` included). Otherwise a list
of integers with lags is required.
Lagged `future_covariates` values used to predict the next time step/s.
If a tuple of `(past, future)`, both values must be > 0. Uses the last `n=past` past lags and `n=future`
future lags; e.g. `(-past, -(past - 1), ..., -1, 0, 1, ..., future - 1)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, uses only the specified values as lags.
If a dictionary, the keys correspond to the `future_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (tuple or list of integers). The key
'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
output_chunk_length
Number of time steps predicted at once by the internal regression model. Does not have to equal the forecast
horizon `n` used in `predict()`. However, setting `output_chunk_length` equal to the forecast horizon may
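
Reviewer note: to make the "`0` corresponds to the first predicted time step of each sample" convention concrete, here is a small sketch of how one feature row could be assembled from component-specific lags. It uses plain lists instead of `TimeSeries`; the function name `lagged_features` and the column ordering (component by component) are assumptions for illustration and may not match the ordering produced by Darts' tabularization.

```python
from typing import Dict, List


def lagged_features(
    series: Dict[str, List[float]],
    lags: Dict[str, List[int]],
    t: int,
) -> List[float]:
    """Hypothetical helper: feature row for a sample whose first predicted step is index t."""
    row = []
    for name, component_lags in lags.items():
        values = series[name]
        for lag in component_lags:
            idx = t + lag  # lag -1 is the step right before the first prediction
            if idx < 0 or idx >= len(values):
                raise IndexError(f"lag {lag} of {name!r} is out of range at t={t}")
            row.append(values[idx])
    return row


series = {"a": [10, 11, 12, 13, 14], "b": [20, 21, 22, 23, 24]}
lags = {"a": [-2, -1], "b": [-3]}
print(lagged_features(series, lags, t=3))  # -> [11, 12, 20]
```

With different lags per component, the earliest usable `t` is set by the most negative lag across all components (here `-3`, so `t >= 3`).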
48 changes: 34 additions & 14 deletions darts/models/forecasting/random_forest.py
@@ -14,22 +14,26 @@
----------
.. [1] https://en.wikipedia.org/wiki/Random_forest
"""
from typing import List, Optional, Tuple, Union
from typing import Optional

from sklearn.ensemble import RandomForestRegressor

from darts.logging import get_logger
from darts.models.forecasting.regression_model import RegressionModel
from darts.models.forecasting.regression_model import (
FUTURE_LAGS_TYPE,
LAGS_TYPE,
RegressionModel,
)

logger = get_logger(__name__)


class RandomForest(RegressionModel):
def __init__(
self,
lags: Union[int, list] = None,
lags_past_covariates: Union[int, List[int]] = None,
lags_future_covariates: Union[Tuple[int, int], List[int]] = None,
lags: Optional[LAGS_TYPE] = None,
lags_past_covariates: Optional[LAGS_TYPE] = None,
lags_future_covariates: Optional[FUTURE_LAGS_TYPE] = None,
output_chunk_length: int = 1,
add_encoders: Optional[dict] = None,
n_estimators: Optional[int] = 100,
@@ -43,17 +47,33 @@ def __init__(
Parameters
----------
lags
Lagged target values used to predict the next time step. If an integer is given the last `lags` past lags
are used (from -1 backward). Otherwise a list of integers with lags is required (each lag must be < 0).
Lagged target `series` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags` past lags; e.g. `(-1, -2, ..., -lags)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `series` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_past_covariates
Number of lagged past_covariates values used to predict the next time step. If an integer is given the last
`lags_past_covariates` past lags are used (inclusive, starting from lag -1). Otherwise a list of integers
with lags < 0 is required.
Lagged `past_covariates` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags_past_covariates` past lags; e.g. `(-1, -2, ..., -lags)`,
where `0` corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `past_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_future_covariates
Number of lagged future_covariates values used to predict the next time step. If an tuple (past, future) is
given the last `past` lags in the past are used (inclusive, starting from lag -1) along with the first
`future` future lags (starting from 0 - the prediction time - up to `future - 1` included). Otherwise a list
of integers with lags is required.
Lagged `future_covariates` values used to predict the next time step/s.
If a tuple of `(past, future)`, both values must be > 0. Uses the last `n=past` past lags and `n=future`
future lags; e.g. `(-past, -(past - 1), ..., -1, 0, 1, ..., future - 1)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, uses only the specified values as lags.
If a dictionary, the keys correspond to the `future_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (tuple or list of integers). The key
'default_lags' can be used to provide default lags for un-specified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
output_chunk_length
Number of time steps predicted at once by the internal regression model. Does not have to equal the forecast
horizon `n` used in `predict()`. However, setting `output_chunk_length` equal to the forecast horizon may