Choose which dimensions are reduced in measures_improvement (#416)
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features).
- [x] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [x] CHANGELOG.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added.

### What kind of change does this PR introduce?

* `measures_improvement` has an extra kwarg, `dim`, which controls the
dimensions over which the percentage of improvement is computed.

### Does this PR introduce a breaking change?

The default, `dim=None`, behaves as before and reduces all dimensions. I
thought it made sense to put `dim` as the first kwarg, so this is not
breaking per se, but it is still semi-breaking for scripts that pass
`to_level` positionally, e.g. `measures_improvement([ds1, ds2],
to_level)`, so I can move `dim` down if that's preferable.
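
A minimal sketch of the positional-argument concern. The two functions below are hypothetical stand-ins that only mirror the old and new signatures from the diff and compute nothing; the placeholder strings stand in for real measure datasets.

```python
# Hypothetical stand-ins that only mirror the two signatures from the diff.
def measures_improvement_old(meas_datasets, to_level="diag-improved"):
    return {"to_level": to_level}


def measures_improvement_new(meas_datasets, dim=None, to_level="diag-improved"):
    return {"dim": dim, "to_level": to_level}


datasets = ["ds1", "ds2"]  # placeholders for the two measure datasets

# Positional call written against the old signature.
old_call = measures_improvement_old(datasets, "my-level")

# The same call against the new signature: the string now binds to `dim`.
new_call = measures_improvement_new(datasets, "my-level")

# Passing `to_level` by keyword behaves the same under both signatures.
safe_call = measures_improvement_new(datasets, to_level="my-level")
```

In the real function, the check added in this commit would raise a `ValueError` when the string bound to `dim` is not a dimension of every variable, so such a call should fail loudly rather than silently change the output level.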

There is a small breaking change: `ds1-ds2` is used instead of `ds2` to
find non-null values and compute the percentage of improvement. This may
result in fewer non-null values. I think it's the right thing to do, though.
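
To illustrate the change in the denominator, here is a small NumPy sketch with made-up values for a single property. It is not the xscen implementation (which works on xarray objects); the names `m1`, `m2`, `frac_before` and `frac_after` are only for illustration.

```python
import numpy as np

# Made-up measure values for one property; NaN marks a missing grid point.
m1 = np.array([1.0, np.nan, 3.0, 4.0])  # initial measures (ds1)
m2 = np.array([0.5, 2.0, 2.0, 5.0])  # final measures (ds2)

# Non-negative where the final measure is at least as close to zero.
diff_bias = np.abs(m1) - np.abs(m2)
improved = np.sum(diff_bias >= 0)  # NaN >= 0 is False, so NaNs never count

# Before: the denominator counts the non-null points of ds2 only.
frac_before = improved / np.sum(~np.isnan(m2))

# After: the denominator counts points where the difference is non-null.
frac_after = improved / np.sum(~np.isnan(diff_bias))
```

With these values, a point that is missing in `ds1` but present in `ds2` no longer inflates the denominator, so the fraction goes from 2/4 to 2/3.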

### Other information:

My use case: I had a dataset with dimensions (rlat, rlon, period) and
wanted improvement percentages for each period. I realize now that this is
an unconventional dataset in the xscen philosophy: I should have had
separate datasets, used `period` when calling `properties_and_measures`,
etc., and had multiple catalog entries. But for properties using
`group=time.season` or `group=time.month`, we would have an extra dimension
(`season` or `month`) that we don't necessarily want to reduce, so I think
it still makes sense.
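
The per-period use case can be sketched with plain NumPy, where integer axes stand in for the named dimensions that the commit passes to xarray's `.sum(dim=dims)`. The array shape and values are made up for illustration.

```python
import numpy as np

# Made-up measures with shape (rlat, rlon, period) = (2, 2, 3).
m1 = np.full((2, 2, 3), 2.0)  # initial measures
m2 = np.empty((2, 2, 3))  # final measures
m2[..., 0] = 1.0  # period 0: improved everywhere
m2[..., 1] = [[1.0, 3.0], [1.0, 3.0]]  # period 1: improved on half the grid
m2[..., 2] = 3.0  # period 2: improved nowhere

diff_bias = np.abs(m1) - np.abs(m2)
improved = diff_bias >= 0
notnull = ~np.isnan(diff_bias)

# Analogue of dim=["rlat", "rlon"]: reduce the spatial axes, keep `period`.
per_period = improved.sum(axis=(0, 1)) / notnull.sum(axis=(0, 1))

# Analogue of dim=None: reduce every axis to a single fraction.
overall = improved.sum() / notnull.sum()
```

Here `per_period` holds one fraction per period, while `overall` collapses them into a single number, which hides the fact that the improvement differs strongly between periods.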

Also: I think this change is unrelated to the templates (I assumed they
relate to catalogs?), so I checked the box.
RondeauG authored Nov 4, 2024
2 parents 0fecccd + 690f698 commit fbb3c8a
Showing 8 changed files with 67 additions and 20 deletions.
5 changes: 5 additions & 0 deletions .zenodo.json
@@ -49,6 +49,11 @@
"name": "Gauvin St-Denis, Blaise",
"affiliation": "Ouranos",
"orcid": "0009-0004-9049-2092"
},
{
"name": "Dupuis, Éric",
"affiliation": "Ouranos",
"orcid": "0000-0001-7976-4596"
}
],
"keywords": [
1 change: 1 addition & 0 deletions AUTHORS.rst
@@ -24,3 +24,4 @@ Contributors
* Marco Braun <[email protected]> `@vindelico <https://github.com/vindelico>`_
* Sarah-Claude Bourdeau-Goulet <[email protected]> `@sarahclaude <https://github.com/sarahclaude>`_
* Blaise Gauvin St-Denis <[email protected]> `@bstdenis <https://github.com/bstdenis>`_
* Éric Dupuis <[email protected]> `@coxipi <https://github.com/coxipi>`_
4 changes: 3 additions & 1 deletion CHANGELOG.rst
@@ -4,13 +4,15 @@ Changelog

v0.11.0 (unreleased)
--------------------
Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`).
Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`).

New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* ``xs.io.make_toc`` now includes the global attributes of the dataset after the information about the variables. (:pull:`473`).
* New function ``xs.get_warming_level_from_period`` to get the warming level associated with a given time horizon. (:pull:`474`).
* Added ability to skip whole folders to ``xs.parse_directory`` with argument ``skip_dirs``. (:pull:`478`, :pull:`479`).
* `diagnostics.measures_improvement` now accepts `dim`, which specifies `dimension(s)` on which the proportion of improved pixels are computed. (:pull:`416`)


Breaking changes
^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion src/xscen/aggregate.py
@@ -952,7 +952,7 @@ def produce_horizon( # noqa: C901
----------
ds : xr.Dataset
Input dataset with a time dimension.
indicators : Union[str, os.PathLike, Sequence[Indicator], Sequence[Tuple[str, Indicator]], ModuleType]
indicators : str | os.PathLike | Sequence[Indicator] | Sequence[Tuple[str, Indicator]] | ModuleType
Indicators to compute. It will be passed to the `indicators` argument of `xs.compute_indicators`.
periods : list of str or list of lists of str, optional
Either [start, end] or list of [start_year, end_year] for the period(s) to be evaluated.
4 changes: 2 additions & 2 deletions src/xscen/catalog.py
@@ -778,7 +778,7 @@ def update(
Parameters
----------
df : Union[DataCatalog, intake_esm.esm_datastore, pd.DataFrame, pd.Series, Sequence[pd.Series]], optional
df : DataCatalog | intake_esm.esm_datastore | pd.DataFrame | pd.Series | Sequence[pd.Series], optional
Data to be added to the catalog. If None, nothing is added, but the catalog is still updated.
"""
# Append the new DataFrame or Series
@@ -999,7 +999,7 @@ def unstack_id(df: pd.DataFrame | ProjectCatalog | DataCatalog) -> dict:
Parameters
----------
df : Union[pd.DataFrame, ProjectCatalog, DataCatalog]
df : pd.DataFrame | ProjectCatalog | DataCatalog
Either a Project/DataCatalog or a pandas DataFrame.
Returns
41 changes: 27 additions & 14 deletions src/xscen/diagnostics.py
@@ -330,7 +330,7 @@ def properties_and_measures( # noqa: C901
----------
ds : xr.Dataset
Input dataset.
properties : Union[str, os.PathLike, Sequence[Indicator], Sequence[tuple[str, Indicator]], ModuleType]
properties : str | os.PathLike | Sequence[Indicator] | Sequence[tuple[str, Indicator]] | ModuleType
Path to a YAML file that instructs on how to calculate properties.
Can be the indicator module directly, or a sequence of indicators or a sequence of
tuples (indicator name, indicator) as returned by `iter_indicators()`.
@@ -533,24 +533,32 @@ def measures_heatmap(


def measures_improvement(
meas_datasets: list[xr.Dataset] | dict, to_level: str = "diag-improved"
meas_datasets: list[xr.Dataset] | dict,
dim: str | Sequence[str] | None = None,
to_level: str = "diag-improved",
) -> xr.Dataset:
"""
Calculate the fraction of improved grid points for each property between two datasets of measures.
Parameters
----------
meas_datasets: list of xr.Dataset or dict
meas_datasets: list[xr.Dataset] | dict
List of 2 datasets: Initial dataset of measures and final (improved) dataset of measures.
Both datasets must have the same variables.
It is also possible to pass a dictionary where the values are the datasets and the key are not used.
dim : str or sequence of str, optional
Dimension(s) on which to compute the percentage of improved grid points. Default is `None`, which reduces all dimensions.
to_level: str
processing_level to assign to the output dataset
Returns
-------
xr.Dataset
Dataset containing information on the fraction of improved grid points for each property.
Notes
-----
If `dim` is specified, it should be present in every variable of `meas_datasets`.
"""
if isinstance(meas_datasets, dict):
meas_datasets = list(meas_datasets.values())
@@ -562,24 +570,29 @@ def measures_improvement(
)
ds1 = meas_datasets[0]
ds2 = meas_datasets[1]
if dim is not None:
dims = [dim] if isinstance(dim, str) else dim
for v in ds1.data_vars:
if set(dims).issubset(set(ds1[v].dims)) is False:
raise ValueError(
f"Dimension provided `dim` ({dim}) must be in every variable of `meas_datasets`"
)

percent_better = []
for var in ds2.data_vars:
if dim is None:
# reduce all dimensions (which may be variable dependent)
dims = ds2[var].dims
if "xclim.sdba.measures.RATIO" in ds1[var].attrs["history"]:
diff_bias = abs(ds1[var] - 1) - abs(ds2[var] - 1)
else:
diff_bias = abs(ds1[var]) - abs(ds2[var])
diff_bias = diff_bias.values.ravel()
diff_bias = diff_bias[~np.isnan(diff_bias)]
total_improved = (diff_bias >= 0).sum(dim=dims)
total_notnull = diff_bias.notnull().sum(dim=dims)
percent_better_var = total_improved / total_notnull
percent_better.append(percent_better_var.expand_dims({"properties": [var]}))

total = ds2[var].values.ravel()
total = total[~np.isnan(total)]

improved = diff_bias >= 0
percent_better.append(np.sum(improved) / len(total))

ds_better = xr.DataArray(
percent_better, coords={"properties": list(ds2.data_vars)}, dims="properties"
)
ds_better = xr.concat(percent_better, dim="properties")

ds_better = ds_better.to_dataset(name="improved_grid_points")

4 changes: 2 additions & 2 deletions src/xscen/indicators.py
@@ -123,7 +123,7 @@ def compute_indicators( # noqa: C901
----------
ds : xr.Dataset
Dataset to use for the indicators.
indicators : Union[str, os.PathLike, Sequence[Indicator], Sequence[tuple[str, Indicator]], ModuleType]
indicators : str | os.PathLike | Sequence[Indicator] | Sequence[tuple[str, Indicator]] | ModuleType
Path to a YAML file that instructs on how to calculate missing variables.
Can also be only the "stem", if translations and custom indices are implemented.
Can be the indicator module directly, or a sequence of indicators or a sequence of
@@ -349,7 +349,7 @@ def select_inds_for_avail_vars(
----------
ds : xr.Dataset
Dataset to use for the indicators.
indicators : Union[str, os.PathLike, Sequence[Indicator], Sequence[Tuple[str, Indicator]]]
indicators : str | os.PathLike | Sequence[Indicator] | Sequence[Tuple[str, Indicator]]
Path to a YAML file that instructs on how to calculate indicators.
Can also be only the "stem", if translations and custom indices are implemented.
Can be the indicator module directly, or a sequence of indicators or a sequence of
26 changes: 26 additions & 0 deletions tests/test_diagnostics.py
@@ -448,6 +448,32 @@ def test_measures_improvement(self):
assert "mean-tas" in out.properties.values
np.testing.assert_allclose(out["improved_grid_points"].values, 1)

def test_measures_improvement_dim(self):
p1, m1 = xs.properties_and_measures(
self.ds,
properties=self.yaml_file,
period=["2001", "2001"],
)

p2, m2 = xs.properties_and_measures(
self.ds,
properties=self.yaml_file,
dref_for_measure=p1,
period=["2001", "2001"],
)
m3 = xr.concat(
[
m2.quantile_98_tas.to_dataset().expand_dims({"dummy_dim": [i]})
for i in range(3)
],
dim="dummy_dim",
)
out = xs.diagnostics.measures_improvement(
[m3, m3], dim="dummy_dim", to_level="test"
)
assert set(out.dims) == {"properties", "season"}
np.testing.assert_allclose(out["improved_grid_points"].values, 1)

def test_measures_improvement_2d(self):

p1, m1 = xs.properties_and_measures(