Feat - New version with good improvements, check CHANGELOG
davidusb-geek committed Mar 6, 2023
1 parent fa0f420 commit 5977cb9
Showing 10 changed files with 297 additions and 41 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,12 @@
# Changelog

## [0.4.1] - 2023-03-06
### Improvement
- Improved the documentation and the in-code docstrings.
- Added the possibility to save the optimized model after a tuning routine.
- Added the possibility to publish predict results to a Home Assistant sensor.
- Added the possibility to provide a custom entity_id, unit_of_measurement and friendly_name for each published data point.

## [0.4.0] - 2023-03-06
### Improvement
- A brand new load forecast module and more... The new forecast module can actually be used to forecast any Home Assistant variable. The API provides fit, predict and tune methods. By default it provides a more efficient way to forecast the power load consumption. It is based on the skforecast module that uses scikit-learn regression models considering auto-regression lags as features. The hyperparameter optimization is performed using Bayesian optimization from the optuna module.
@@ -344,7 +351,13 @@
[0.3.23]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.23
[0.3.24]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.24
[0.3.25]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.25
[0.3.27]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.27
[0.3.29]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.29
[0.3.32]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.32
[0.3.34]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.34
[0.3.35]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.35
[0.3.36]: https://github.com/davidusb-geek/emhass/releases/tag/v0.3.36
[0.4.0]: https://github.com/davidusb-geek/emhass/releases/tag/v0.4.0
[0.4.1]: https://github.com/davidusb-geek/emhass/releases/tag/v0.4.1

# Notes
All notable changes to this project will be documented in this file.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -22,7 +22,7 @@
author = 'David HERNANDEZ'

# The full version, including alpha/beta/rc tags
release = '0.4.0'
release = '0.4.1'

# -- General configuration ---------------------------------------------------

31 changes: 31 additions & 0 deletions docs/mlforecaster.md
@@ -91,6 +91,28 @@ curl -i -H "Content-Type:application/json" -X POST -d '{"model_type": "load_fore
```
The resulting forecast DataFrame is shown in the webui.

It is possible to publish the predict method results to a Home Assistant sensor. By default this is deactivated, but it can be activated by using runtime parameters.

The list of parameters needed to set up the data publish task is:

- `model_predict_publish`: set to `True` to activate the publish action when calling the `forecast-model-predict` end point.

- `model_predict_entity_id`: the unique `entity_id` to be used.

- `model_predict_unit_of_measurement`: the `unit_of_measurement` to be used.

- `model_predict_friendly_name`: the `friendly_name` to be used.

The default values for these parameters are:
```
runtimeparams = {
"model_predict_publish": False,
"model_predict_entity_id": "sensor.p_load_forecast_custom_model",
"model_predict_unit_of_measurement": "W",
"model_predict_friendly_name": "Load Power Forecast custom ML model"
}
```
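
For example, a minimal sketch of this call using Python's `requests` module. The URL, port and action path below are assumptions; adapt them to your own EMHASS setup:
```
import requests

# Sketch only: the server URL and action path are assumed here,
# adapt them to your own EMHASS setup.
url = "http://localhost:5000/action/forecast-model-predict"
runtimeparams = {
    "model_type": "load_forecast",
    "model_predict_publish": True,
    "model_predict_entity_id": "sensor.p_load_forecast_custom_model",
    "model_predict_unit_of_measurement": "W",
    "model_predict_friendly_name": "Load Power Forecast custom ML model"
}
response = requests.post(url, json=runtimeparams)
print(response.status_code)
```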

## The tuning method with Bayesian hyperparameter optimization

With a previously fitted model you can use the `forecast-model-tune` end point to tune its hyperparameters. This uses Bayesian optimization with a wrapper of `optuna` in the `skforecast` module.
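
A minimal sketch of calling this end point, under the same local-server assumption as the predict example above (per the changelog, the optimized model is now saved back to disk after tuning):
```
import requests

# Sketch only: tune the hyperparameters of a previously fitted
# 'load_forecast' model; URL and action path are assumed.
requests.post("http://localhost:5000/action/forecast-model-tune",
              json={"model_type": "load_forecast"})
```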
@@ -134,3 +156,12 @@ https://joaquinamatrodrigo.github.io/skforecast/0.6.0/user_guides/autoregresive-
![](https://joaquinamatrodrigo.github.io/skforecast/0.6.0/img/diagram-recursive-mutistep-forecasting.png)

With this type of model, what we do in EMHASS is create new features based on the timestamps of the data retrieved from Home Assistant: features based on the day, the hour of the day, the day of the week, the month of the year, among others.

What is interesting is that, because these added features are based on the timestamps, they are always known in advance and useful for generating forecasts. These are the so-called future known covariates.
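
A minimal sketch of this idea with pandas, mirroring the `add_date_features` method in `machine_learning_forecaster.py` (the index values below are illustrative):
```
import pandas as pd

# Every feature is derived purely from the timestamp index, so it is
# also known for any future timestamp that we want to forecast.
df = pd.DataFrame(index=pd.date_range("2023-03-06", periods=48, freq="30min"))
df["year"] = df.index.year
df["month"] = df.index.month
df["day_of_week"] = df.index.dayofweek
df["day"] = df.index.day
df["hour"] = df.index.hour
```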

In the future we may test to expand using other possible known future covariates from Home Assistant, for example a known (forecasted) temperature, a scheduled presence sensor, etc.

## Going further?
This class can be generalized to forecast any given sensor variable present in Home Assistant. It has been tested, and the main initial motivation for this development was better load power consumption forecasting. But in reality it has been coded in a flexible way, so that you can control which variable is used, how many lags, the amount of data used to train the model, etc.

So you can really go further and try to forecast other types of variables, and possibly use the results for some interesting automations in Home Assistant. If doing this, what is important is to evaluate the pertinence of the obtained forecasts. The hope is that the tools proposed here can be used for that purpose. A hypothetical sketch of fitting a model on another sensor follows, assuming the same local web server and `forecast-model-fit` end point as above.
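
```
import requests

# Hypothetical example: fit a forecast model for an outdoor temperature sensor.
# The model name, sensor entity and parameter values are illustrative only.
runtimeparams = {
    "model_type": "temperature_forecast",       # a unique name for this model
    "var_model": "sensor.outdoor_temperature",  # hypothetical sensor to forecast
    "sklearn_model": "KNeighborsRegressor",     # one of the supported regressors
    "num_lags": 48                              # one day of lags at a 30 min time step
}
requests.post("http://localhost:5000/action/forecast-model-fit", json=runtimeparams)
```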
2 changes: 1 addition & 1 deletion setup.py
@@ -19,7 +19,7 @@

setup(
name='emhass', # Required
version='0.4.0', # Required
version='0.4.1', # Required
description='An Energy Management System for Home Assistant', # Optional
long_description=long_description, # Optional
long_description_content_type='text/markdown', # Optional (see note above)
70 changes: 60 additions & 10 deletions src/emhass/command_line.py
@@ -313,7 +313,7 @@ def forecast_model_fit(input_data_dict: dict, logger: logging.Logger,
# Save model
if not debug:
filename = model_type+'_mlf.pkl'
with open(pathlib.Path(root) / 'data' / filename, 'wb') as outp:
with open(pathlib.Path(root) / filename, 'wb') as outp:
pickle.dump(mlf, outp, pickle.HIGHEST_PROTOCOL)
return df_pred, df_pred_backtest, mlf

@@ -344,19 +344,42 @@ def forecast_model_predict(input_data_dict: dict, logger: logging.Logger,
model_type = input_data_dict['params']['passed_data']['model_type']
root = input_data_dict['root']
filename = model_type+'_mlf.pkl'
filename_path = pathlib.Path(root) / 'data' / filename
filename_path = pathlib.Path(root) / filename
if not debug:
if filename_path.is_file():
with open(filename_path, 'rb') as inp:
mlf = pickle.load(inp)
else:
logger.error("The ML forecaster file was not found, please run a model fit method before this predict method")
return
# Make predictions
if use_last_window:
data_last_window = copy.deepcopy(input_data_dict['df_input_data'])
else:
data_last_window = None
predictions = mlf.predict(data_last_window)
# Publish data to a Home Assistant sensor
model_predict_publish = input_data_dict['params']['passed_data']['model_predict_publish']
model_predict_entity_id = input_data_dict['params']['passed_data']['model_predict_entity_id']
model_predict_unit_of_measurement = input_data_dict['params']['passed_data']['model_predict_unit_of_measurement']
model_predict_friendly_name = input_data_dict['params']['passed_data']['model_predict_friendly_name']
if model_predict_publish:
# Estimate the current index
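# The configured 'method_ts_round' maps to a pandas get_indexer lookup:
# 'nearest' takes the closest timestamp, 'first' pads forward (ffill),
# 'last' backfills (bfill); -1 means no match, so fall back to 'nearest'.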
now_precise = datetime.now(input_data_dict['retrieve_hass_conf']['time_zone']).replace(second=0, microsecond=0)
if input_data_dict['retrieve_hass_conf']['method_ts_round'] == 'nearest':
idx_closest = predictions.index.get_indexer([now_precise], method='nearest')[0]
elif input_data_dict['retrieve_hass_conf']['method_ts_round'] == 'first':
idx_closest = predictions.index.get_indexer([now_precise], method='ffill')[0]
elif input_data_dict['retrieve_hass_conf']['method_ts_round'] == 'last':
idx_closest = predictions.index.get_indexer([now_precise], method='bfill')[0]
if idx_closest == -1:
idx_closest = predictions.index.get_indexer([now_precise], method='nearest')[0]
# Publish Load forecast
input_data_dict['rh'].post_data(predictions, idx_closest,
model_predict_entity_id,
model_predict_unit_of_measurement,
model_predict_friendly_name,
from_mlforecaster=True)
return predictions

def forecast_model_tune(input_data_dict: dict, logger: logging.Logger,
@@ -379,15 +402,21 @@ def forecast_model_tune(input_data_dict: dict, logger: logging.Logger,
model_type = input_data_dict['params']['passed_data']['model_type']
root = input_data_dict['root']
filename = model_type+'_mlf.pkl'
filename_path = pathlib.Path(root) / 'data' / filename
filename_path = pathlib.Path(root) / filename
if not debug:
if filename_path.is_file():
with open(filename_path, 'rb') as inp:
mlf = pickle.load(inp)
else:
logger.error("The ML forecaster file was not found, please run a model fit method before this tune method")
return
# Tune the model
df_pred_optim = mlf.tune(debug=debug)
# Save model
if not debug:
filename = model_type+'_mlf.pkl'
with open(pathlib.Path(root) / filename, 'wb') as outp:
pickle.dump(mlf, outp, pickle.HIGHEST_PROTOCOL)
return df_pred_optim

def publish_data(input_data_dict: dict, logger: logging.Logger,
@@ -432,39 +461,60 @@ def publish_data(input_data_dict: dict, logger: logging.Logger,
if idx_closest == -1:
idx_closest = opt_res_latest.index.get_indexer([now_precise], method='nearest')[0]
# Publish PV forecast
custom_pv_forecast_id = input_data_dict['params']['passed_data']['custom_pv_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_PV'], idx_closest,
'sensor.p_pv_forecast', "W", "PV Power Forecast")
custom_pv_forecast_id["entity_id"],
custom_pv_forecast_id["unit_of_measurement"],
custom_pv_forecast_id["friendly_name"])
# Publish Load forecast
custom_load_forecast_id = input_data_dict['params']['passed_data']['custom_load_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_Load'], idx_closest,
'sensor.p_load_forecast', "W", "Load Power Forecast")
custom_load_forecast_id["entity_id"],
custom_load_forecast_id["unit_of_measurement"],
custom_load_forecast_id["friendly_name"])
cols_published = ['P_PV', 'P_Load']
# Publish deferrable loads
custom_deferrable_forecast_id = input_data_dict['params']['passed_data']['custom_deferrable_forecast_id']
for k in range(input_data_dict['opt'].optim_conf['num_def_loads']):
if "P_deferrable{}".format(k) not in opt_res_latest.columns:
logger.error("P_deferrable{}".format(k)+" was not found in results DataFrame. Optimization task may need to be relaunched or it did not converge to a solution.")
else:
input_data_dict['rh'].post_data(opt_res_latest["P_deferrable{}".format(k)], idx_closest,
'sensor.p_deferrable{}'.format(k), "W", "Deferrable Load {}".format(k))
custom_deferrable_forecast_id[k]["entity_id"],
custom_deferrable_forecast_id[k]["unit_of_measurement"],
custom_deferrable_forecast_id[k]["friendly_name"])
cols_published = cols_published+["P_deferrable{}".format(k)]
# Publish battery power
if input_data_dict['opt'].optim_conf['set_use_battery']:
if 'P_batt' not in opt_res_latest.columns:
logger.error("P_batt was not found in results DataFrame. Optimization task may need to be relaunched or it did not converge to a solution.")
else:
custom_batt_forecast_id = input_data_dict['params']['passed_data']['custom_batt_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_batt'], idx_closest,
'sensor.p_batt_forecast', "W", "Battery Power Forecast")
custom_batt_forecast_id["entity_id"],
custom_batt_forecast_id["unit_of_measurement"],
custom_batt_forecast_id["friendly_name"])
cols_published = cols_published+["P_batt"]
custom_batt_soc_forecast_id = input_data_dict['params']['passed_data']['custom_batt_soc_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['SOC_opt']*100, idx_closest,
'sensor.soc_batt_forecast', "%", "Battery SOC Forecast")
custom_batt_soc_forecast_id["entity_id"],
custom_batt_soc_forecast_id["unit_of_measurement"],
custom_batt_soc_forecast_id["friendly_name"])
cols_published = cols_published+["SOC_opt"]
# Publish grid power
custom_grid_forecast_id = input_data_dict['params']['passed_data']['custom_grid_forecast_id']
input_data_dict['rh'].post_data(opt_res_latest['P_grid'], idx_closest,
'sensor.p_grid_forecast', "W", "Grid Power Forecast")
custom_grid_forecast_id["entity_id"],
custom_grid_forecast_id["unit_of_measurement"],
custom_grid_forecast_id["friendly_name"])
cols_published = cols_published+["P_grid"]
# Publish total value of cost function
custom_cost_fun_id = input_data_dict['params']['passed_data']['custom_cost_fun_id']
col_cost_fun = [i for i in opt_res_latest.columns if 'cost_fun_' in i]
input_data_dict['rh'].post_data(opt_res_latest[col_cost_fun], idx_closest,
'sensor.total_cost_fun_value', "", "Total cost function value")
custom_cost_fun_id["entity_id"],
custom_cost_fun_id["unit_of_measurement"],
custom_cost_fun_id["friendly_name"])
# Create a DF summarizing what has been published
opt_res = opt_res_latest[cols_published].loc[[opt_res_latest.index[idx_closest]]]
return opt_res
65 changes: 61 additions & 4 deletions src/emhass/machine_learning_forecaster.py
@@ -39,6 +39,28 @@ class mlforecaster:

def __init__(self, data: pd.DataFrame, model_type: str, var_model: str, sklearn_model: str,
num_lags: int, root: str, logger: logging.Logger) -> None:
r"""Define constructor for the forecast class.
:param data: The data that will be used for train/test
:type data: pd.DataFrame
:param model_type: A unique name defining this model, useful to identify \
what it will be used for.
:type model_type: str
:param var_model: The name of the sensor to retrieve data from Home Assistant. \
Example: `sensor.power_load_no_var_loads`.
:type var_model: str
:param sklearn_model: The `scikit-learn` model that will be used. For now only \
these options are possible: `LinearRegression`, `ElasticNet` and `KNeighborsRegressor`.
:type sklearn_model: str
:param num_lags: The number of auto-regression lags to consider. A good starting point \
is to fix this to one day of lags. For example if your time step is 30 minutes, then fix this \
to 48; if the time step is 1 hour, then fix this to 24, and so on.
:type num_lags: int
:param root: The parent folder of the path where the config.yaml file is located
:type root: str
:param logger: The passed logger object
:type logger: logging.Logger
"""
self.data = data
self.model_type = model_type
self.var_model = var_model
Expand All @@ -53,7 +75,14 @@ def __init__(self, data: pd.DataFrame, model_type: str, var_model: str, sklearn_
self.data = self.data[~self.data.index.duplicated(keep='first')]

@staticmethod
def add_date_features(data):
def add_date_features(data: pd.DataFrame) -> pd.DataFrame:
"""Add date features from the input DataFrame timestamp
:param data: The input DataFrame
:type data: pd.DataFrame
:return: The DataFrame with the added features
:rtype: pd.DataFrame
"""
df = copy.deepcopy(data)
df['year'] = [i.year for i in df.index]
df['month'] = [i.month for i in df.index]
Expand All @@ -65,10 +94,22 @@ def add_date_features(data):

@staticmethod
def neg_r2_score(y_true, y_pred):
"""The negative of the r2 score."""
return -r2_score(y_true, y_pred)

def fit(self, split_date_delta: Optional[str] = '48h', perform_backtest: Optional[bool] = False
) -> Tuple[pd.DataFrame, pd.DataFrame]:
r"""The fit method to train the ML model.
:param split_date_delta: The delta from now to `split_date_delta` that will be used \
as the test period to evaluate the model, defaults to '48h'
:type split_date_delta: Optional[str], optional
:param perform_backtest: If `True` then a back testing routine is performed to evaluate \
the performance of the model on the complete train set, defaults to False
:type perform_backtest: Optional[bool], optional
:return: The DataFrame containing the forecast data results without and with backtest
:rtype: Tuple[pd.DataFrame, pd.DataFrame]
"""
self.logger.info("Performing a forecast model fit for "+self.model_type)
# Preparing the data: adding exogenous features
self.data_exo = pd.DataFrame(index=self.data.index)
@@ -135,6 +176,16 @@

def predict(self, data_last_window: Optional[pd.DataFrame] = None
) -> pd.Series:
"""The predict method to generate forecasts from a previously fitted ML model.
:param data_last_window: The data that will be used to generate the new forecast; this \
will be freshly retrieved from Home Assistant. This data is needed because the forecast \
model is an auto-regressive model with lags. If not passed, then the data used during \
model training is used, defaults to None
:type data_last_window: Optional[pd.DataFrame], optional
:return: A pandas series containing the generated forecasts.
:rtype: pd.Series
"""
if data_last_window is None:
predictions = self.forecaster.predict(steps=self.num_lags, exog=self.data_train.drop(self.var_model, axis=1))
else:
@@ -151,7 +202,14 @@ def predict(self, data_last_window: Optional[pd.DataFrame] = None
return predictions

def tune(self, debug: Optional[bool] = False) -> pd.DataFrame:
# Bayesian search hyperparameter and lags with Skopt
"""Tune a previously fitted model using Bayesian optimization.
:param debug: Set to True for testing and faster optimizations, defaults to False
:type debug: Optional[bool], optional
:return: The DataFrame with the forecasts using the optimized model.
:rtype: pd.DataFrame
"""
# Bayesian search of hyperparameters and lags with skforecast/optuna
# Lags used as predictors
if debug:
lags_grid = [3]
@@ -214,8 +272,7 @@ def search_space(trial):
random_state = 123,
return_best = True,
verbose = False,
engine = 'optuna',
kwargs_gp_minimize = {}
engine = 'optuna'
)
self.logger.info(f"Elapsed time: {time.time() - start_time}")
self.is_tuned = True
