Feat/specify lags per component for RegressionModel #1962

Merged on Sep 14, 2023 (38 commits; the file changes below are from 26 of them).

Commits
cfd381b
feat: updated lags sanity checks to accept dictionnary
madtoinou Aug 18, 2023
b3ce1f1
fix: better management of corner cases during lags checks
madtoinou Aug 18, 2023
2dde70f
fix: improved modularity
madtoinou Aug 18, 2023
65c82a7
fix: simplified the logic a bit
madtoinou Aug 18, 2023
9c5b312
feat: when generating lagged data, the values can be extracted using …
madtoinou Aug 18, 2023
753db5b
feat: raise error if all the ts in target/past/future don't have the …
madtoinou Aug 18, 2023
0cdeee7
feat: added support for component-specific lags in fit() and predict()
madtoinou Aug 21, 2023
f24ea84
test: added tests and fix some bug accordingly
madtoinou Aug 21, 2023
01b8409
feat: component-wise lags support encoders, improved sanity checks
madtoinou Aug 21, 2023
a671af8
feat: possibility to declare default lags for all the not specified c…
madtoinou Aug 23, 2023
2aa96a4
test: adding a test for the lagged data creation
madtoinou Aug 23, 2023
c3133b2
fix: typo
madtoinou Aug 23, 2023
41f30ec
Merge branch 'master' into feat/lags_per_component
madtoinou Aug 25, 2023
646b671
fix: adressing review comments
madtoinou Aug 25, 2023
3221f86
Apply suggestions from code review
madtoinou Aug 25, 2023
3254db3
refactor: lags argument are converted to dict before running the type…
madtoinou Aug 28, 2023
269005e
refactor: lags argument are converted to dict before running the type…
madtoinou Aug 28, 2023
bcd4455
doc: improved documentation of the component-specific lags in tabular…
madtoinou Aug 28, 2023
b859d9a
test: adding a test for the multivariate scenario
madtoinou Aug 28, 2023
c0121a5
test: checking the appriopriate lags are extracted by the shap explainer
madtoinou Aug 29, 2023
d682f13
fix: shapexplainer extract the appropriate lags, updated the type hints
madtoinou Aug 29, 2023
9db7f73
Merge branch 'master' into feat/lags_per_component
madtoinou Aug 31, 2023
96f1a7f
fix: passing covariates when trained on multiple series
madtoinou Aug 31, 2023
07f0f83
Merge branch 'master' into feat/lags_per_component
madtoinou Aug 31, 2023
d987141
fix: moved the series components consistency to create_lagged_data to…
madtoinou Aug 31, 2023
70467cf
fix: improved the error message for components inconsistency, improve…
madtoinou Aug 31, 2023
da735a2
Merge branch 'master' into feat/lags_per_component
madtoinou Sep 1, 2023
f2a9e08
fix: addressing reviewer comments
madtoinou Sep 1, 2023
f0967f6
Apply suggestions from code review
madtoinou Sep 1, 2023
be53695
test: checking that the name of the features is correctly generated w…
madtoinou Sep 4, 2023
b23da55
Merge branch 'master' into feat/lags_per_component
madtoinou Sep 4, 2023
1b2bd4c
fix: linting
madtoinou Sep 4, 2023
1ea2c7f
fix: updating the error msg
madtoinou Sep 4, 2023
0624f86
Merge branch 'master' into feat/lags_per_component
madtoinou Sep 6, 2023
970d8a3
fix: bug when the number of lags is different across components
madtoinou Sep 14, 2023
37c6b26
Merge branch 'master' into feat/lags_per_component
madtoinou Sep 14, 2023
edf8554
fix: future lags in test
madtoinou Sep 14, 2023
1235e59
Merge branch 'master' into feat/lags_per_component
madtoinou Sep 14, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
**Improved**
- `TimeSeries` with a `RangeIndex` starting at a negative value are now supported by `historical_forecasts`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
- Added a new argument `start_format` to `historical_forecasts()`, `backtest()` and `gridsearch` that allows an integer `start` to be used either as the index position or as the index value/label for `series` indexed with a `pd.RangeIndex`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
- `RegressionModel` can now be created with different lags for each component of the target and past/future covariates series. [#1962](https://github.com/unit8co/darts/pull/1962) by [Antoine Madrona](https://github.com/madtoinou).

**Fixed**
- Fixed a bug in `TimeSeries.from_dataframe()` when using a pandas.DataFrame with `df.columns.name != None`. [#1938](https://github.com/unit8co/darts/pull/1938) by [Antoine Madrona](https://github.com/madtoinou).
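The `RegressionModel` entry above lets each component have its own lags, given as an integer, a list, or a dictionary keyed by component name with an optional `'default_lags'` fallback (as the docstrings further down describe). A minimal sketch of expanding those forms into explicit per-component lag lists, assuming a hypothetical `normalize_lags` helper (not darts' actual implementation) and the target/past-covariates rules (integer > 0, listed lags < 0):

```python
from typing import Dict, List, Sequence, Union

# Hypothetical alias for the input forms described in the docstrings (not darts' own type)
LagsInput = Union[int, List[int], Dict[str, Union[int, List[int]]]]


def _expand(lags: Union[int, List[int]]) -> List[int]:
    """Expand an int n into [-n, ..., -1], or validate an explicit list of lags."""
    if isinstance(lags, int):
        if lags <= 0:
            raise ValueError(f"integer lags must be > 0, got {lags}")
        return list(range(-lags, 0))
    if not lags or any(lag >= 0 for lag in lags):
        raise ValueError(f"listed lags must all be < 0, got {lags}")
    return sorted(lags)


def normalize_lags(lags: LagsInput, components: Sequence[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: return an explicit, sorted lag list for every component."""
    if not isinstance(lags, dict):
        # the same lags are shared by all components
        shared = _expand(lags)
        return {comp: list(shared) for comp in components}

    # rejecting unknown keys is an assumption of this sketch
    unknown = set(lags) - set(components) - {"default_lags"}
    if unknown:
        raise ValueError(f"lags given for unknown components: {sorted(unknown)}")

    missing = [comp for comp in components if comp not in lags]
    if missing and "default_lags" not in lags:
        raise ValueError(f"no lags for {missing} and no 'default_lags' key provided")

    # components without an explicit entry fall back to 'default_lags'
    return {
        comp: _expand(lags[comp] if comp in lags else lags["default_lags"])
        for comp in components
    }


print(normalize_lags({"temp": 3, "default_lags": [-1, -7]}, ["temp", "load"]))
# {'temp': [-3, -2, -1], 'load': [-7, -1]}
```

When training on multiple series, the docstrings state that the dictionary keys refer to the components of the first series, so `components` here would come from that series.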
32 changes: 18 additions & 14 deletions Dockerfile
@@ -1,20 +1,24 @@
FROM jupyter/base-notebook:python-3.9.5
FROM ubuntu:latest

RUN conda update --all -y --quiet \
&& conda install -c conda-forge ipywidgets -y --quiet \
&& conda clean --all -f -y
# setup packages
RUN apt-get update -y
RUN apt-get install -y python3 python-is-python3 python3-pip default-jre
RUN pip install --upgrade pip

USER root
# install python requirements before copying the rest of the files
# this way we can cache the requirements and not have to reinstall them
COPY requirements/ /app/requirements/
RUN pip install -r /app/requirements/dev-all.txt

# to build pystan
RUN apt-get update \
&& apt-get -y install build-essential \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# copy local files
COPY . /app

USER $NB_USER
# set work directory
WORKDIR /app

ADD . /home/jovyan/work
# install darts
RUN pip install -e .

WORKDIR /home/jovyan/work

RUN pip install .
# assuming you are working out of your darts directory:
# docker build . -t darts-test:latest
# docker run -it -v $(pwd)/:/app/ darts-test:latest bash
12 changes: 6 additions & 6 deletions darts/explainability/shap_explainer.py
@@ -732,9 +732,9 @@ def _build_explainer_sklearn(

def _create_regression_model_shap_X(
self,
target_series,
past_covariates,
future_covariates,
target_series: Optional[Union[TimeSeries, Sequence[TimeSeries]]],
past_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]],
future_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]],
n_samples=None,
train=False,
) -> pd.DataFrame:
@@ -746,9 +746,9 @@ def _create_regression_model_shap_X(

"""

lags_list = self.model.lags.get("target")
lags_past_covariates_list = self.model.lags.get("past")
lags_future_covariates_list = self.model.lags.get("future")
lags_list = self.model._get_lags("target")
lags_past_covariates_list = self.model._get_lags("past")
lags_future_covariates_list = self.model._get_lags("future")

X, indexes = create_lagged_prediction_data(
target_series=target_series if lags_list else None,
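Commit d682f13 above makes the SHAP explainer extract the appropriate lags for each component, so every explained feature corresponds to one (component, lag) pair. A minimal sketch of generating such feature names, assuming a hypothetical `<component>_<lags_type>_lag<lag>` format (for illustration only, not necessarily the names darts produces):

```python
from typing import Dict, List


def lagged_feature_names(comp_lags: Dict[str, List[int]], lags_type: str) -> List[str]:
    """One column name per (component, lag) pair, listed component by component."""
    # hypothetical naming format "<component>_<lags_type>_lag<lag>"
    return [
        f"{comp}_{lags_type}_lag{lag}"
        for comp, lags in comp_lags.items()
        for lag in lags
    ]


def all_feature_names(lags_by_type: Dict[str, Dict[str, List[int]]]) -> List[str]:
    """Concatenate names for each lag type in insertion order (e.g. target, past, future)."""
    names: List[str] = []
    for lags_type, comp_lags in lags_by_type.items():
        names.extend(lagged_feature_names(comp_lags, lags_type))
    return names


print(all_feature_names({"target": {"a": [-2, -1], "b": [-1]}, "future": {"h": [0]}}))
# ['a_target_lag-2', 'a_target_lag-1', 'b_target_lag-1', 'h_future_lag0']
```

Because components can have different numbers of lags, the column count is the sum of the per-component lag counts rather than components × lags.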
48 changes: 33 additions & 15 deletions darts/models/forecasting/lgbm.py
@@ -10,13 +10,15 @@
https://github.com/unit8co/darts/blob/master/INSTALL.md
"""

from typing import List, Optional, Sequence, Tuple, Union
from typing import List, Optional, Sequence, Union

import lightgbm as lgb
import numpy as np

from darts.logging import get_logger
from darts.models.forecasting.regression_model import (
FUTURE_LAGS_TYPE,
LAGS_TYPE,
RegressionModelWithCategoricalCovariates,
_LikelihoodMixin,
)
@@ -28,13 +30,13 @@
class LightGBMModel(RegressionModelWithCategoricalCovariates, _LikelihoodMixin):
def __init__(
self,
lags: Union[int, list] = None,
lags_past_covariates: Union[int, List[int]] = None,
lags_future_covariates: Union[Tuple[int, int], List[int]] = None,
lags: Optional[LAGS_TYPE] = None,
lags_past_covariates: Optional[LAGS_TYPE] = None,
lags_future_covariates: Optional[FUTURE_LAGS_TYPE] = None,
output_chunk_length: int = 1,
add_encoders: Optional[dict] = None,
likelihood: str = None,
quantiles: List[float] = None,
likelihood: Optional[str] = None,
quantiles: Optional[List[float]] = None,
random_state: Optional[int] = None,
multi_models: Optional[bool] = True,
use_static_covariates: bool = True,
@@ -48,17 +50,33 @@ def __init__(
Parameters
----------
lags
Lagged target values used to predict the next time step. If an integer is given the last `lags` past lags
are used (from -1 backward). Otherwise a list of integers with lags is required (each lag must be < 0).
Lagged target `series` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags` past lags; e.g. `(-1, -2, ..., -lags)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `series` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_past_covariates
Number of lagged past_covariates values used to predict the next time step. If an integer is given the last
`lags_past_covariates` past lags are used (inclusive, starting from lag -1). Otherwise a list of integers
with lags < 0 is required.
Lagged `past_covariates` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags_past_covariates` past lags; e.g. `(-1, -2, ..., -lags)`,
where `0` corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `past_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_future_covariates
Number of lagged future_covariates values used to predict the next time step. If an tuple (past, future) is
given the last `past` lags in the past are used (inclusive, starting from lag -1) along with the first
`future` future lags (starting from 0 - the prediction time - up to `future - 1` included). Otherwise a list
of integers with lags is required.
Lagged `future_covariates` values used to predict the next time step/s.
If a tuple of `(past, future)`, both values must be > 0. Uses the last `n=past` past lags and `n=future`
future lags; e.g. `(-past, -(past - 1), ..., -1, 0, 1, ..., future - 1)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, uses only the specified values as lags.
If a dictionary, the keys correspond to the `future_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (tuple or list of integers). The key
'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
output_chunk_length
Number of time steps predicted at once by the internal regression model. Does not have to equal the forecast
horizon `n` used in `predict()`. However, setting `output_chunk_length` equal to the forecast horizon may
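The `lags_future_covariates` docstring above differs from the other two lag arguments: a `(past, future)` tuple spans both past and future time steps, and explicit lists may contain lags >= 0. A minimal sketch of that expansion, assuming a hypothetical `normalize_future_lags` helper (not darts' implementation) that follows the docstring's rule that both tuple values are > 0:

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical alias for the future-covariates input forms described in the docstring
FutureLags = Union[Tuple[int, int], List[int]]


def _expand_future(lags: FutureLags) -> List[int]:
    """Expand (past, future) into [-past, ..., -1, 0, ..., future - 1]; keep lists as given."""
    if isinstance(lags, tuple):
        if len(lags) != 2:
            raise ValueError(f"expected a (past, future) tuple, got {lags}")
        past, future = lags
        if past <= 0 or future <= 0:
            raise ValueError(f"both tuple values must be > 0, got {lags}")
        return list(range(-past, future))
    if not lags:
        raise ValueError("a list of future lags must not be empty")
    # lists may mix past (< 0), current (0) and future (> 0) lags
    return sorted(lags)


def normalize_future_lags(
    lags: Union[FutureLags, Dict[str, FutureLags]], components: Sequence[str]
) -> Dict[str, List[int]]:
    """Hypothetical helper: explicit future-covariate lags for every component."""
    if not isinstance(lags, dict):
        shared = _expand_future(lags)
        return {comp: list(shared) for comp in components}
    missing = [comp for comp in components if comp not in lags]
    if missing and "default_lags" not in lags:
        raise ValueError(f"no lags for {missing} and no 'default_lags' key provided")
    return {
        comp: _expand_future(lags[comp] if comp in lags else lags["default_lags"])
        for comp in components
    }


print(normalize_future_lags({"holiday": [0], "default_lags": (2, 1)}, ["holiday", "temp"]))
# {'holiday': [0], 'temp': [-2, -1, 0]}
```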
53 changes: 37 additions & 16 deletions darts/models/forecasting/linear_regression_model.py
@@ -5,14 +5,19 @@
A forecasting model using a linear regression of some of the target series' lags, as well as optionally some
covariate series lags in order to obtain a forecast.
"""
from typing import List, Optional, Sequence, Tuple, Union
from typing import List, Optional, Sequence, Union

import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import LinearRegression, PoissonRegressor, QuantileRegressor

from darts.logging import get_logger
from darts.models.forecasting.regression_model import RegressionModel, _LikelihoodMixin
from darts.models.forecasting.regression_model import (
FUTURE_LAGS_TYPE,
LAGS_TYPE,
RegressionModel,
_LikelihoodMixin,
)
from darts.timeseries import TimeSeries

logger = get_logger(__name__)
@@ -21,13 +26,13 @@
class LinearRegressionModel(RegressionModel, _LikelihoodMixin):
def __init__(
self,
lags: Union[int, list] = None,
lags_past_covariates: Union[int, List[int]] = None,
lags_future_covariates: Union[Tuple[int, int], List[int]] = None,
lags: Optional[LAGS_TYPE] = None,
lags_past_covariates: Optional[LAGS_TYPE] = None,
lags_future_covariates: Optional[FUTURE_LAGS_TYPE] = None,
output_chunk_length: int = 1,
add_encoders: Optional[dict] = None,
likelihood: str = None,
quantiles: List[float] = None,
likelihood: Optional[str] = None,
quantiles: Optional[List[float]] = None,
random_state: Optional[int] = None,
multi_models: Optional[bool] = True,
use_static_covariates: bool = True,
@@ -38,17 +43,33 @@ def __init__(
Parameters
----------
lags
Lagged target values used to predict the next time step. If an integer is given the last `lags` past lags
are used (from -1 backward). Otherwise a list of integers with lags is required (each lag must be < 0).
Lagged target `series` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags` past lags; e.g. `(-1, -2, ..., -lags)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `series` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_past_covariates
Number of lagged past_covariates values used to predict the next time step. If an integer is given the last
`lags_past_covariates` past lags are used (inclusive, starting from lag -1). Otherwise a list of integers
with lags < 0 is required.
Lagged `past_covariates` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags_past_covariates` past lags; e.g. `(-1, -2, ..., -lags)`,
where `0` corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `past_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_future_covariates
Number of lagged future_covariates values used to predict the next time step. If an tuple (past, future) is
given the last `past` lags in the past are used (inclusive, starting from lag -1) along with the first
`future` future lags (starting from 0 - the prediction time - up to `future - 1` included). Otherwise a list
of integers with lags is required.
Lagged `future_covariates` values used to predict the next time step/s.
If a tuple of `(past, future)`, both values must be > 0. Uses the last `n=past` past lags and `n=future`
future lags; e.g. `(-past, -(past - 1), ..., -1, 0, 1, ..., future - 1)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, uses only the specified values as lags.
If a dictionary, the keys correspond to the `future_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (tuple or list of integers). The key
'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
output_chunk_length
Number of time steps predicted at once by the internal regression model. Does not have to equal the forecast
horizon `n` used in `predict()`. However, setting `output_chunk_length` equal to the forecast horizon may
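The docstrings above define lags relative to `0`, the first predicted time step of each sample. To make that concrete, a minimal NumPy sketch (hypothetical `lagged_row` / `lagged_training_data` helpers, not darts' `create_lagged_data`) of assembling a one-step-ahead tabular training set when each component of a 2-D target array (time × components) has its own negative lags:

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def lagged_row(
    values: np.ndarray, components: Sequence[str], comp_lags: Dict[str, List[int]], t: int
) -> np.ndarray:
    """Features for predicting time index t: values[t + lag, j] for each component's own lags."""
    feats = []
    for j, comp in enumerate(components):
        for lag in comp_lags[comp]:
            if not 0 <= t + lag < len(values):
                raise IndexError(f"lag {lag} of {comp!r} falls outside the series at t={t}")
            feats.append(values[t + lag, j])
    return np.array(feats)


def lagged_training_data(
    values: np.ndarray, components: Sequence[str], comp_lags: Dict[str, List[int]]
) -> Tuple[np.ndarray, np.ndarray]:
    """Build (X, y) for one-step-ahead regression; this sketch assumes every lag is < 0."""
    # the most negative lag across components sets how much history each sample needs
    first_t = -min(min(lags) for lags in comp_lags.values())
    X = np.stack([lagged_row(values, components, comp_lags, t) for t in range(first_t, len(values))])
    y = values[first_t:]  # targets: all components at the predicted time step
    return X, y


values = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, 2 components
X, y = lagged_training_data(values, ["a", "b"], {"a": [-2, -1], "b": [-1]})
print(X.shape, y.shape)  # (4, 3) (4, 2)
```

Component `b` has fewer lags than `a`, so each row has 3 feature columns instead of 2 × 2; mixing different lag counts per component is exactly the case a shared-lags layout cannot express.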
48 changes: 34 additions & 14 deletions darts/models/forecasting/random_forest.py
@@ -14,22 +14,26 @@
----------
.. [1] https://en.wikipedia.org/wiki/Random_forest
"""
from typing import List, Optional, Tuple, Union
from typing import Optional

from sklearn.ensemble import RandomForestRegressor

from darts.logging import get_logger
from darts.models.forecasting.regression_model import RegressionModel
from darts.models.forecasting.regression_model import (
FUTURE_LAGS_TYPE,
LAGS_TYPE,
RegressionModel,
)

logger = get_logger(__name__)


class RandomForest(RegressionModel):
def __init__(
self,
lags: Union[int, list] = None,
lags_past_covariates: Union[int, List[int]] = None,
lags_future_covariates: Union[Tuple[int, int], List[int]] = None,
lags: Optional[LAGS_TYPE] = None,
lags_past_covariates: Optional[LAGS_TYPE] = None,
lags_future_covariates: Optional[FUTURE_LAGS_TYPE] = None,
output_chunk_length: int = 1,
add_encoders: Optional[dict] = None,
n_estimators: Optional[int] = 100,
@@ -43,17 +47,33 @@ def __init__(
Parameters
----------
lags
Lagged target values used to predict the next time step. If an integer is given the last `lags` past lags
are used (from -1 backward). Otherwise a list of integers with lags is required (each lag must be < 0).
Lagged target `series` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags` past lags; e.g. `(-1, -2, ..., -lags)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `series` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_past_covariates
Number of lagged past_covariates values used to predict the next time step. If an integer is given the last
`lags_past_covariates` past lags are used (inclusive, starting from lag -1). Otherwise a list of integers
with lags < 0 is required.
Lagged `past_covariates` values used to predict the next time step/s.
If an integer, must be > 0. Uses the last `n=lags_past_covariates` past lags; e.g. `(-1, -2, ..., -lags)`,
where `0` corresponds to the first predicted time step of each sample.
If a list of integers, each value must be < 0. Uses only the specified values as lags.
If a dictionary, the keys correspond to the `past_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (integer or list of integers). The
key 'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
lags_future_covariates
Number of lagged future_covariates values used to predict the next time step. If an tuple (past, future) is
given the last `past` lags in the past are used (inclusive, starting from lag -1) along with the first
`future` future lags (starting from 0 - the prediction time - up to `future - 1` included). Otherwise a list
of integers with lags is required.
Lagged `future_covariates` values used to predict the next time step/s.
If a tuple of `(past, future)`, both values must be > 0. Uses the last `n=past` past lags and `n=future`
future lags; e.g. `(-past, -(past - 1), ..., -1, 0, 1, ..., future - 1)`, where `0`
corresponds to the first predicted time step of each sample.
If a list of integers, uses only the specified values as lags.
If a dictionary, the keys correspond to the `future_covariates` component names (of the first series when
using multiple series) and the values correspond to the component lags (tuple or list of integers). The key
'default_lags' can be used to provide default lags for unspecified components. Raises an error if some
components are missing and the 'default_lags' key is not provided.
output_chunk_length
Number of time steps predicted at once by the internal regression model. Does not have to equal the forecast
horizon `n` used in `predict()`. However, setting `output_chunk_length` equal to the forecast horizon may