Choose which dimensions are reduced in measures_improvement (#416)

### Pull Request Checklist:

- [ ] This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug fixes / features).
- [x] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [x] CHANGELOG.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added.

### What kind of change does this PR introduce?

* `measures_improvement` has an extra kwarg, `dim`, which controls the dimensions over which the percentage of improvement is computed.

### Does this PR introduce a breaking change?

The default, `dim=None`, behaves as before and reduces all dimensions. I thought it made sense to put `dim` as the first kwarg. This is not breaking per se, but it is semi-breaking for scripts that pass `to_level` positionally, e.g. `measures_improvement([ds1, ds2], to_level)`, so I can move `dim` down if that is preferable.

There is one small breaking change: `ds1-ds2` is used instead of `ds2` to find non-null values and compute the percentage of improvement. This may result in fewer non-null values. I think it's the right thing to do, though.

### Other information:

My use case was a dataset with dimensions (rlat, rlon, period), for which I wanted improvement percentages for each period. I realize now that this is an unconventional dataset in the xscen philosophy: I should have had separate datasets, used `period` when calling `properties_and_measures`, and had multiple catalog entries. But for properties using `group=time.season` or `group=time.month`, we would have an extra dimension (`season` or `month`) that we don't necessarily want to reduce, so I think it still makes sense.

Also, I think this is unrelated to templates (I assumed this is related to catalogs?), so I checked the box.
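
To make the two changes above concrete, here is a minimal sketch of the reduction logic on toy data. It is not the xscen implementation: `fraction_improved` is a hypothetical helper, the arrays are made up, and the ratio-measure branch (`abs(x - 1)`) is left out.

```python
import numpy as np
import xarray as xr


def fraction_improved(bias_before, bias_after, dim=None):
    """Hypothetical helper sketching the reduction described in this PR."""
    # Zero or positive where the absolute bias did not get worse.
    diff_bias = abs(bias_before) - abs(bias_after)
    # dim=None reduces every dimension, which was the only behaviour before.
    dims = diff_bias.dims if dim is None else dim
    improved = (diff_bias >= 0).sum(dim=dims)
    # A NaN in either input makes diff_bias NaN, so that point is not counted.
    valid = diff_bias.notnull().sum(dim=dims)
    return improved / valid


# Toy measures with a `period` dimension that we may want to keep.
before = xr.DataArray(
    [[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, np.nan]],
    dims=("period", "rlat"),
    coords={"period": ["p1", "p2"]},
)
after = xr.DataArray(
    [[1.0, 1.0, 3.0, 3.0], [1.0, 1.0, 1.0, 1.0]],
    dims=("period", "rlat"),
    coords={"period": ["p1", "p2"]},
)

per_period = fraction_improved(before, after, dim="rlat")  # one value per period
overall = fraction_improved(before, after)  # a single scalar
```

In this toy case `per_period` is 0.5 for `p1` and 1.0 for `p2`, and `overall` is 5/7. The `p2` value also illustrates the small breaking change: counting non-null values in the second dataset only, as the previous code did, would give 3/4, because the point that is NaN in the first dataset would still be in the denominator.
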
Ouranosinc · Nov 4, 2024 · fbb3c8a
2 parents 0fecccd + 690f698
commit fbb3c8a

Showing 8 changed files with 67 additions and 20 deletions.
diff --git a/.zenodo.json b/.zenodo.json
@@ -49,6 +49,11 @@
       "name": "Gauvin St-Denis, Blaise",
       "affiliation": "Ouranos",
       "orcid": "0009-0004-9049-2092"
+    },
+    {
+      "name": "Dupuis, Éric",
+      "affiliation": "Ouranos",
+      "orcid": "0000-0001-7976-4596"
     }
   ],
   "keywords": [

diff --git a/AUTHORS.rst b/AUTHORS.rst
@@ -24,3 +24,4 @@ Contributors
 * Marco Braun <[email protected]> `@vindelico <https://github.com/vindelico>`_
 * Sarah-Claude Bourdeau-Goulet <[email protected]> `@sarahclaude <https://github.com/sarahclaude>`_
 * Blaise Gauvin St-Denis <[email protected]> `@bstdenis <https://github.com/bstdenis>`_
+* Éric Dupuis <[email protected]> `@coxipi <https://github.com/coxipi>`_
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -4,13 +4,15 @@ Changelog
 
 v0.11.0 (unreleased)
 --------------------
-Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`).
+Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`).
 
 New features and enhancements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 * ``xs.io.make_toc`` now includes the global attributes of the dataset after the information about the variables. (:pull:`473`).
 * New function ``xs.get_warming_level_from_period`` to get the warming level associated with a given time horizon. (:pull:`474`).
 * Added ability to skip whole folders to ``xs.parse_directory`` with argument ``skip_dirs``. (:pull:`478`, :pull:`479`).
+* `diagnostics.measures_improvement` now accepts `dim`, which specifies `dimension(s)` on which the proportion of improved pixels are computed. (:pull:`416`)
+
 
 Breaking changes
 ^^^^^^^^^^^^^^^^

diff --git a/src/xscen/aggregate.py b/src/xscen/aggregate.py
@@ -952,7 +952,7 @@ def produce_horizon(  # noqa: C901
     ----------
     ds : xr.Dataset
         Input dataset with a time dimension.
-    indicators :  Union[str, os.PathLike, Sequence[Indicator], Sequence[Tuple[str, Indicator]], ModuleType]
+    indicators :  str | os.PathLike | Sequence[Indicator] | Sequence[Tuple[str, Indicator]] | ModuleType
         Indicators to compute. It will be passed to the `indicators` argument of `xs.compute_indicators`.
     periods : list of str or list of lists of str, optional
         Either [start, end] or list of [start_year, end_year] for the period(s) to be evaluated.

diff --git a/src/xscen/catalog.py b/src/xscen/catalog.py
@@ -778,7 +778,7 @@ def update(
 
         Parameters
         ----------
-        df : Union[DataCatalog, intake_esm.esm_datastore, pd.DataFrame, pd.Series, Sequence[pd.Series]], optional
+        df : DataCatalog | intake_esm.esm_datastore | pd.DataFrame | pd.Series  | Sequence[pd.Series], optional
             Data to be added to the catalog. If None, nothing is added, but the catalog is still updated.
         """
         # Append the new DataFrame or Series
@@ -999,7 +999,7 @@ def unstack_id(df: pd.DataFrame | ProjectCatalog | DataCatalog) -> dict:
 
     Parameters
     ----------
-    df : Union[pd.DataFrame, ProjectCatalog, DataCatalog]
+    df : pd.DataFrame | ProjectCatalog | DataCatalog
         Either a Project/DataCatalog or a pandas DataFrame.
 
     Returns

diff --git a/src/xscen/diagnostics.py b/src/xscen/diagnostics.py
@@ -330,7 +330,7 @@ def properties_and_measures(  # noqa: C901
     ----------
     ds : xr.Dataset
         Input dataset.
-    properties : Union[str, os.PathLike, Sequence[Indicator], Sequence[tuple[str, Indicator]], ModuleType]
+    properties : str | os.PathLike | Sequence[Indicator] | Sequence[tuple[str, Indicator]] | ModuleType
         Path to a YAML file that instructs on how to calculate properties.
         Can be the indicator module directly, or a sequence of indicators or a sequence of
         tuples (indicator name, indicator) as returned by `iter_indicators()`.
@@ -533,24 +533,32 @@ def measures_heatmap(
 
 
 def measures_improvement(
-    meas_datasets: list[xr.Dataset] | dict, to_level: str = "diag-improved"
+    meas_datasets: list[xr.Dataset] | dict,
+    dim: str | Sequence[str] | None = None,
+    to_level: str = "diag-improved",
 ) -> xr.Dataset:
     """
     Calculate the fraction of improved grid points for each property between two datasets of measures.
 
     Parameters
     ----------
-    meas_datasets: list of xr.Dataset or dict
+    meas_datasets: list[xr.Dataset] | dict
         List of 2 datasets: Initial dataset of measures and final (improved) dataset of measures.
         Both datasets must have the same variables.
         It is also possible to pass a dictionary where the values are the datasets and the key are not used.
+    dim : str or sequence of str, optional
+        Dimension(s) on which to compute the percentage of improved grid points. Default is `None`, which reduces all dimensions.
     to_level: str
         processing_level to assign to the output dataset
 
     Returns
     -------
     xr.Dataset
         Dataset containing information on the fraction of improved grid points for each property.
+
+    Notes
+    -----
+    If `dim` is specified, it should be present in every variable of  `meas_datasets`.
     """
     if isinstance(meas_datasets, dict):
         meas_datasets = list(meas_datasets.values())
@@ -562,24 +570,29 @@ def measures_improvement(
         )
     ds1 = meas_datasets[0]
     ds2 = meas_datasets[1]
+    if dim is not None:
+        dims = [dim] if isinstance(dim, str) else dim
+        for v in ds1.data_vars:
+            if set(dims).issubset(set(ds1[v].dims)) is False:
+                raise ValueError(
+                    f"Dimension provided `dim` ({dim}) must be in every variable of `meas_datasets`"
+                )
+
     percent_better = []
     for var in ds2.data_vars:
+        if dim is None:
+            # reduce all dimensions (which may be variable dependent)
+            dims = ds2[var].dims
         if "xclim.sdba.measures.RATIO" in ds1[var].attrs["history"]:
             diff_bias = abs(ds1[var] - 1) - abs(ds2[var] - 1)
         else:
             diff_bias = abs(ds1[var]) - abs(ds2[var])
-        diff_bias = diff_bias.values.ravel()
-        diff_bias = diff_bias[~np.isnan(diff_bias)]
+        total_improved = (diff_bias >= 0).sum(dim=dims)
+        total_notnull = diff_bias.notnull().sum(dim=dims)
+        percent_better_var = total_improved / total_notnull
+        percent_better.append(percent_better_var.expand_dims({"properties": [var]}))
 
-        total = ds2[var].values.ravel()
-        total = total[~np.isnan(total)]
-
-        improved = diff_bias >= 0
-        percent_better.append(np.sum(improved) / len(total))
-
-    ds_better = xr.DataArray(
-        percent_better, coords={"properties": list(ds2.data_vars)}, dims="properties"
-    )
+    ds_better = xr.concat(percent_better, dim="properties")
 
     ds_better = ds_better.to_dataset(name="improved_grid_points")
 

diff --git a/src/xscen/indicators.py b/src/xscen/indicators.py
@@ -123,7 +123,7 @@ def compute_indicators(  # noqa: C901
     ----------
     ds : xr.Dataset
         Dataset to use for the indicators.
-    indicators : Union[str, os.PathLike, Sequence[Indicator], Sequence[tuple[str, Indicator]], ModuleType]
+    indicators : str | os.PathLike | Sequence[Indicator] | Sequence[tuple[str, Indicator]] | ModuleType
         Path to a YAML file that instructs on how to calculate missing variables.
         Can also be only the "stem", if translations and custom indices are implemented.
         Can be the indicator module directly, or a sequence of indicators or a sequence of
@@ -349,7 +349,7 @@ def select_inds_for_avail_vars(
     ----------
     ds : xr.Dataset
         Dataset to use for the indicators.
-    indicators : Union[str, os.PathLike, Sequence[Indicator], Sequence[Tuple[str, Indicator]]]
+    indicators : str | os.PathLike | Sequence[Indicator] | Sequence[Tuple[str, Indicator]]
         Path to a YAML file that instructs on how to calculate indicators.
         Can also be only the "stem", if translations and custom indices are implemented.
         Can be the indicator module directly, or a sequence of indicators or a sequence of

diff --git a/tests/test_diagnostics.py b/tests/test_diagnostics.py
@@ -448,6 +448,32 @@ def test_measures_improvement(self):
         assert "mean-tas" in out.properties.values
         np.testing.assert_allclose(out["improved_grid_points"].values, 1)
 
+    def test_measures_improvement_dim(self):
+        p1, m1 = xs.properties_and_measures(
+            self.ds,
+            properties=self.yaml_file,
+            period=["2001", "2001"],
+        )
+
+        p2, m2 = xs.properties_and_measures(
+            self.ds,
+            properties=self.yaml_file,
+            dref_for_measure=p1,
+            period=["2001", "2001"],
+        )
+        m3 = xr.concat(
+            [
+                m2.quantile_98_tas.to_dataset().expand_dims({"dummy_dim": [i]})
+                for i in range(3)
+            ],
+            dim="dummy_dim",
+        )
+        out = xs.diagnostics.measures_improvement(
+            [m3, m3], dim="dummy_dim", to_level="test"
+        )
+        assert set(out.dims) == {"properties", "season"}
+        np.testing.assert_allclose(out["improved_grid_points"].values, 1)
+
     def test_measures_improvement_2d(self):
 
         p1, m1 = xs.properties_and_measures(