Feat - New version with good improvements, check CHANGELOG
davidusb-geek committed Mar 6, 2023
1 parent fa0f420 commit 5977cb9
Showing 10 changed files with 297 additions and 41 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,12 @@
# Changelog

## [0.4.1] - 2023-03-06
### Improvement
- Improved the documentation and the in-code docstrings.
- Added the possibility to save the optimized model after a tuning routine.
- Added the possibility to publish predict results to a Home Assistant sensor.
- Added the possibility to provide a custom entity_id, unit_of_measurement and friendly_name for each published data point.

## [0.4.0] - 2023-03-06
### Improvement
- A brand new load forecast module and more... The new forecast module can actually be used to forecast any Home Assistant variable. The API provides fit, predict and tune methods. By default it provides a more efficient way to forecast the power load consumption. It is based on the skforecast module that uses scikit-learn regression models considering auto-regression lags as features. The hyperparameter optimization is performed using Bayesian optimization from the optuna module.
@@ -344,7 +351,13 @@
[0.3.23]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.23
[0.3.24]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.24
[0.3.25]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.25
[0.3.27]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.27
[0.3.29]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.29
[0.3.32]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.32
[0.3.34]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.34
[0.3.35]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.35
[0.3.36]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.36
[0.4.0]: https://github.com/davidusb-geek/emhass/releases/tag/v0.4.0
[0.4.1]: https://github.com/davidusb-geek/emhass/releases/tag/v0.4.1

# Notes
All notable changes to this project will be documented in this file.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -22,7 +22,7 @@
author = 'David HERNANDEZ'

# The full version, including alpha/beta/rc tags
release = '0.4.0'
release = '0.4.1'

# -- General configuration ---------------------------------------------------

31 changes: 31 additions & 0 deletions docs/mlforecaster.md
@@ -91,6 +91,28 @@ curl -i -H "Content-Type:application/json" -X POST -d '{"model_type": "load_fore
```
The resulting forecast DataFrame is shown in the webui.

It is possible to publish the predict method results to a Home Assistant sensor. By default this is deactivated, but it can be activated by using runtime parameters.

The list of parameters needed to set up the data publish task is:

- `model_predict_publish`: set to `True` to activate the publish action when calling the `forecast-model-predict` end point.

- `model_predict_entity_id`: the unique `entity_id` to be used.

- `model_predict_unit_of_measurement`: the `unit_of_measurement` to be used.

- `model_predict_friendly_name`: the `friendly_name` to be used.

The default values for these parameters are:
```
runtimeparams = {
"model_predict_publish": False,
"model_predict_entity_id": "sensor.p_load_forecast_custom_model",
"model_predict_unit_of_measurement": "W",
"model_predict_friendly_name": "Load Power Forecast custom ML model"
}
```
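
For example, a minimal sketch of this call using Python's `requests` module. The URL, port and action path below are assumptions; adapt them to your own EMHASS setup:
```
import requests

# Sketch only: the server URL and action path are assumed here,
# adapt them to your own EMHASS setup.
url = "http://localhost:5000/action/forecast-model-predict"
runtimeparams = {
    "model_type": "load_forecast",
    "model_predict_publish": True,
    "model_predict_entity_id": "sensor.p_load_forecast_custom_model",
    "model_predict_unit_of_measurement": "W",
    "model_predict_friendly_name": "Load Power Forecast custom ML model"
}
response = requests.post(url, json=runtimeparams)
print(response.status_code)
```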

## The tuning method with Bayesian hyperparameter optimization

With a previously fitted model you can use the `forecast-model-tune` end point to tune its hyperparameters. This uses Bayesian optimization with a wrapper of `optuna` in the `skforecast` module.
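
A minimal sketch of calling this end point, under the same local-server assumption as the predict example above (per the changelog, the optimized model is now saved back to disk after tuning):
```
import requests

# Sketch only: tune the hyperparameters of a previously fitted
# 'load_forecast' model; URL and action path are assumed.
requests.post("http://localhost:5000/action/forecast-model-tune",
              json={"model_type": "load_forecast"})
```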
@@ -134,3 +156,12 @@ https://joaquinamatrodrigo.github.io/skforecast/0.6.0/user_guides/autoregresive-
![](https://joaquinamatrodrigo.github.io/skforecast/0.6.0/img/diagram-recursive-mutistep-forecasting.png)

With this type of model, what we do in EMHASS is create new features based on the timestamps of the data retrieved from Home Assistant: features based on the day, the hour of the day, the day of the week, the month of the year, among others.

What is interesting is that, because these added features are based on the timestamps, they are always known in advance and useful for generating forecasts. These are the so-called future known covariates.
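
A minimal sketch of this idea with pandas, mirroring the `add_date_features` method in `machine_learning_forecaster.py` (the index values below are illustrative):
```
import pandas as pd

# Every feature is derived purely from the timestamp index, so it is
# also known for any future timestamp that we want to forecast.
df = pd.DataFrame(index=pd.date_range("2023-03-06", periods=48, freq="30min"))
df["year"] = df.index.year
df["month"] = df.index.month
df["day_of_week"] = df.index.dayofweek
df["day"] = df.index.day
df["hour"] = df.index.hour
```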

In the future we may test to expand using other possible known future covariates from Home Assistant, for example a known (forecasted) temperature, a scheduled presence sensor, etc.

## Going further?
This class can be generalized to forecast any given sensor variable present in Home Assistant. It has been tested, and the main initial motivation for this development was better load power consumption forecasting. But in reality it has been coded in a flexible way, so that you can control which variable is used, how many lags, the amount of data used to train the model, etc.

So you can really go further and try to forecast other types of variables, and possibly use the results for some interesting automations in Home Assistant. If doing this, what is important is to evaluate the pertinence of the obtained forecasts. The hope is that the tools proposed here can be used for that purpose. A hypothetical sketch of fitting a model on another sensor follows, assuming the same local web server and `forecast-model-fit` end point as above.
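
```
import requests

# Hypothetical example: fit a forecast model for an outdoor temperature sensor.
# The model name, sensor entity and parameter values are illustrative only.
runtimeparams = {
    "model_type": "temperature_forecast",       # a unique name for this model
    "var_model": "sensor.outdoor_temperature",  # hypothetical sensor to forecast
    "sklearn_model": "KNeighborsRegressor",     # one of the supported regressors
    "num_lags": 48                              # one day of lags at a 30 min time step
}
requests.post("http://localhost:5000/action/forecast-model-fit", json=runtimeparams)
```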
2 changes: 1 addition & 1 deletion setup.py
@@ -19,7 +19,7 @@

setup(
name='emhass', # Required
version='0.4.0', # Required
version='0.4.1', # Required
description='An Energy Management System for Home Assistant', # Optional
long_description=long_description, # Optional
long_description_content_type='text/markdown', # Optional (see note above)
70 changes: 60 additions & 10 deletions src/emhass/command_line.py
@@ -313,7 +313,7 @@ def forecast_model_fit(input_data_dict: dict, logger: logging.Logger,
# Save model
if not debug:
filename = model_type+'_mlf.pkl'
with open(pathlib.Path(root) / 'data' / filename, 'wb') as outp:
with open(pathlib.Path(root) / filename, 'wb') as outp:
pickle.dump(mlf, outp, pickle.HIGHEST_PROTOCOL)
return df_pred, df_pred_backtest, mlf

@@ -344,19 +344,42 @@ def forecast_model_predict(input_data_dict: dict, logger: logging.Logger,
model_type = input_data_dict['params']['passed_data']['model_type']
root = input_data_dict['root']
filename = model_type+'_mlf.pkl'
filename_path = pathlib.Path(root) / 'data' / filename
filename_path = pathlib.Path(root) / filename
if not debug:
if filename_path.is_file():
with open(filename_path, 'rb') as inp:
mlf = pickle.load(inp)
else:
logger.error("The ML forecaster file was not found, please run a model fit method before this predict method")
return
# Make predictions
if use_last_window:
data_last_window = copy.deepcopy(input_data_dict['df_input_data'])
else:
data_last_window = None
predictions = mlf.predict(data_last_window)
# Publish data to a Home Assistant sensor
model_predict_publish = input_data_dict['params']['passed_data']['model_predict_publish']
model_predict_entity_id = input_data_dict['params']['passed_data']['model_predict_entity_id']
model_predict_unit_of_measurement = input_data_dict['params']['passed_data']['model_predict_unit_of_measurement']
model_predict_friendly_name = input_data_dict['params']['passed_data']['model_predict_friendly_name']
if model_predict_publish:
# Estimate the current index
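# The configured 'method_ts_round' maps to a pandas get_indexer lookup:
# 'nearest' takes the closest timestamp, 'first' pads forward (ffill),
# 'last' backfills (bfill); -1 means no match, so fall back to 'nearest'.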
now_precise = datetime.now(input_data_dict['retrieve_hass_conf']['time_zone']).replace(second=0, microsecond=0)
if input_data_dict['retrieve_hass_conf']['method_ts_round'] == 'nearest':
idx_closest = predictions.index.get_indexer([now_precise], method='nearest')[0]
elif input_data_dict['retrieve_hass_conf']['method_ts_round'] == 'first':
idx_closest = predictions.index.get_indexer([now_precise], method='ffill')[0]
elif input_data_dict['retrieve_hass_conf']['method_ts_round'] == 'last':
idx_closest = predictions.index.get_indexer([now_precise], method='bfill')[0]
if idx_closest == -1:
idx_closest = predictions.index.get_indexer([now_precise], method='nearest')[0]
# Publish Load forecast
input_data_dict['rh'].post_data(predictions, idx_closest,
model_predict_entity_id,
model_predict_unit_of_measurement,
model_predict_friendly_name,
from_mlforecaster=True)
return predictions

def forecast_model_tune(input_data_dict: dict, logger: logging.Logger,
@@ -379,15 +402,21 @@ def forecast_model_tune(input_data_dict: dict, logger: logging.Logger,
model_type = input_data_dict['params']['passed_data']['model_type']
root = input_data_dict['root']
filename = model_type+'_mlf.pkl'
filename_path = pathlib.Path(root) / 'data' / filename
filename_path = pathlib.Path(root) / filename
if not debug:
if filename_path.is_file():
with open(filename_path, 'rb') as inp:
mlf = pickle.load(inp)
else:
logger.error("The ML forecaster file was not found, please run a model fit method before this tune method")
return
# Tune the model
df_pred_optim = mlf.tune(debug=debug)
# Save model
if not debug:
filename = model_type+'_mlf.pkl'
with open(pathlib.Path(root) / filename, 'wb') as outp:
pickle.dump(mlf, outp, pickle.HIGHEST_PROTOCOL)
return df_pred_optim

def publish_data(input_data_dict: dict, logger: logging.Logger,
@@ -432,39 +461,60 @@ def publish_data(input_data_dict: dict, logger: logging.Logger,
if idx_closest == -1:
idx_closest = opt_res_latest.index.get_indexer([now_precise], method='nearest')[0]
# Publish PV forecast
custom_pv_forecast_id = input_data_dict['params']['passed_data']['custom_pv_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_PV'], idx_closest,
'sensor.p_pv_forecast', "W", "PV Power Forecast")
custom_pv_forecast_id["entity_id"],
custom_pv_forecast_id["unit_of_measurement"],
custom_pv_forecast_id["friendly_name"])
# Publish Load forecast
custom_load_forecast_id = input_data_dict['params']['passed_data']['custom_load_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_Load'], idx_closest,
'sensor.p_load_forecast', "W", "Load Power Forecast")
custom_load_forecast_id["entity_id"],
custom_load_forecast_id["unit_of_measurement"],
custom_load_forecast_id["friendly_name"])
cols_published = ['P_PV', 'P_Load']
# Publish deferrable loads
custom_deferrable_forecast_id = input_data_dict['params']['passed_data']['custom_deferrable_forecast_id']
for k in range(input_data_dict['opt'].optim_conf['num_def_loads']):
if "P_deferrable{}".format(k) not in opt_res_latest.columns:
logger.error("P_deferrable{}".format(k)+" was not found in results DataFrame. Optimization task may need to be relaunched or it did not converge to a solution.")
else:
input_data_dict['rh'].post_data(opt_res_latest["P_deferrable{}".format(k)], idx_closest,
'sensor.p_deferrable{}'.format(k), "W", "Deferrable Load {}".format(k))
custom_deferrable_forecast_id[k]["entity_id"],
custom_deferrable_forecast_id[k]["unit_of_measurement"],
custom_deferrable_forecast_id[k]["friendly_name"])
cols_published = cols_published+["P_deferrable{}".format(k)]
# Publish battery power
if input_data_dict['opt'].optim_conf['set_use_battery']:
if 'P_batt' not in opt_res_latest.columns:
logger.error("P_batt was not found in results DataFrame. Optimization task may need to be relaunched or it did not converge to a solution.")
else:
custom_batt_forecast_id = input_data_dict['params']['passed_data']['custom_batt_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_batt'], idx_closest,
'sensor.p_batt_forecast', "W", "Battery Power Forecast")
custom_batt_forecast_id["entity_id"],
custom_batt_forecast_id["unit_of_measurement"],
custom_batt_forecast_id["friendly_name"])
cols_published = cols_published+["P_batt"]
custom_batt_soc_forecast_id = input_data_dict['params']['passed_data']['custom_batt_soc_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['SOC_opt']*100, idx_closest,
'sensor.soc_batt_forecast', "%", "Battery SOC Forecast")
custom_batt_soc_forecast_id["entity_id"],
custom_batt_soc_forecast_id["unit_of_measurement"],
custom_batt_soc_forecast_id["friendly_name"])
cols_published = cols_published+["SOC_opt"]
# Publish grid power
custom_grid_forecast_id = input_data_dict['params']['passed_data']['custom_grid_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_grid'], idx_closest,
'sensor.p_grid_forecast', "W", "Grid Power Forecast")
custom_grid_forecast_id["entity_id"],
custom_grid_forecast_id["unit_of_measurement"],
custom_grid_forecast_id["friendly_name"])
cols_published = cols_published+["P_grid"]
# Publish total value of cost function
custom_cost_fun_id = input_data_dict['params']['passed_data']['custom_cost_fun_id']
col_cost_fun = [i for i in opt_res_latest.columns if 'cost_fun_' in i]
input_data_dict['rh'].post_data(opt_res_latest[col_cost_fun], idx_closest,
'sensor.total_cost_fun_value', "", "Total cost function value")
custom_cost_fun_id["entity_id"],
custom_cost_fun_id["unit_of_measurement"],
custom_cost_fun_id["friendly_name"])
# Create a DF summarizing what has been published
opt_res = opt_res_latest[cols_published].loc[[opt_res_latest.index[idx_closest]]]
return opt_res
65 changes: 61 additions & 4 deletions src/emhass/machine_learning_forecaster.py
@@ -39,6 +39,28 @@ class mlforecaster:

def __init__(self, data: pd.DataFrame, model_type: str, var_model: str, sklearn_model: str,
num_lags: int, root: str, logger: logging.Logger) -> None:
r"""Define constructor for the forecast class.
:param data: The data that will be used for train/test
:type data: pd.DataFrame
:param model_type: A unique name defining this model, useful to identify \
what it will be used for.
:type model_type: str
:param var_model: The name of the sensor to retrieve data from Home Assistant. \
Example: `sensor.power_load_no_var_loads`.
:type var_model: str
:param sklearn_model: The `scikit-learn` model that will be used. For now only \
these options are possible: `LinearRegression`, `ElasticNet` and `KNeighborsRegressor`.
:type sklearn_model: str
:param num_lags: The number of auto-regression lags to consider. A good starting point \
is to fix this to one day of lags. For example if your time step is 30 minutes, then fix this \
to 48; if the time step is 1 hour, then fix this to 24, and so on.
:type num_lags: int
:param root: The parent folder of the path where the config.yaml file is located
:type root: str
:param logger: The passed logger object
:type logger: logging.Logger
"""
self.data = data
self.model_type = model_type
self.var_model = var_model
Expand All @@ -53,7 +75,14 @@ def __init__(self, data: pd.DataFrame, model_type: str, var_model: str, sklearn_
self.data = self.data[~self.data.index.duplicated(keep='first')]

@staticmethod
def add_date_features(data):
def add_date_features(data: pd.DataFrame) -> pd.DataFrame:
"""Add date features from the input DataFrame timestamp
:param data: The input DataFrame
:type data: pd.DataFrame
:return: The DataFrame with the added features
:rtype: pd.DataFrame
"""
df = copy.deepcopy(data)
df['year'] = [i.year for i in df.index]
df['month'] = [i.month for i in df.index]
Expand All @@ -65,10 +94,22 @@ def add_date_features(data):

@staticmethod
def neg_r2_score(y_true, y_pred):
"""The negative of the r2 score."""
return -r2_score(y_true, y_pred)

def fit(self, split_date_delta: Optional[str] = '48h', perform_backtest: Optional[bool] = False
) -> Tuple[pd.DataFrame, pd.DataFrame]:
r"""The fit method to train the ML model.
:param split_date_delta: The delta from now to `split_date_delta` that will be used \
as the test period to evaluate the model, defaults to '48h'
:type split_date_delta: Optional[str], optional
:param perform_backtest: If `True` then a back testing routine is performed to evaluate \
the performance of the model on the complete train set, defaults to False
:type perform_backtest: Optional[bool], optional
:return: The DataFrame containing the forecast data results without and with backtest
:rtype: Tuple[pd.DataFrame, pd.DataFrame]
"""
self.logger.info("Performing a forecast model fit for "+self.model_type)
# Preparing the data: adding exogenous features
self.data_exo = pd.DataFrame(index=self.data.index)
@@ -135,6 +176,16 @@

def predict(self, data_last_window: Optional[pd.DataFrame] = None
) -> pd.Series:
"""The predict method to generate forecasts from a previously fitted ML model.
:param data_last_window: The data that will be used to generate the new forecast; this \
will be freshly retrieved from Home Assistant. This data is needed because the forecast \
model is an auto-regressive model with lags. If not passed, then the data used during \
model training is used, defaults to None
:type data_last_window: Optional[pd.DataFrame], optional
:return: A pandas series containing the generated forecasts.
:rtype: pd.Series
"""
if data_last_window is None:
predictions = self.forecaster.predict(steps=self.num_lags, exog=self.data_train.drop(self.var_model, axis=1))
else:
@@ -151,7 +202,14 @@ def predict(self, data_last_window: Optional[pd.DataFrame] = None
return predictions

def tune(self, debug: Optional[bool] = False) -> pd.DataFrame:
# Bayesian search hyperparameter and lags with Skopt
"""Tune a previously fitted model using Bayesian optimization.
:param debug: Set to True for testing and faster optimizations, defaults to False
:type debug: Optional[bool], optional
:return: The DataFrame with the forecasts using the optimized model.
:rtype: pd.DataFrame
"""
# Bayesian search of hyperparameters and lags with skforecast/optuna
# Lags used as predictors
if debug:
lags_grid = [3]
@@ -214,8 +272,7 @@ def search_space(trial):
random_state = 123,
return_best = True,
verbose = False,
engine = 'optuna',
kwargs_gp_minimize = {}
engine = 'optuna'
)
self.logger.info(f"Elapsed time: {time.time() - start_time}")
self.is_tuned = True
