Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop build_reduction_data and refactor the reduce module #386

Merged
merged 11 commits into from
Apr 26, 2024
7 changes: 7 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,18 @@ v0.9.0 (unreleased)
-------------------
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Gabriel Rondeau-Genesse (:user:`RondeauG`), Juliette Lavoie (:user:`juliettelavoie`), Marco Braun (:user:`vindelico`).

New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* ``xs.reduce_ensemble`` will now call ``xclim.ensembles.create_ensemble`` and ``xclim.ensembles.make_critera`` if required. (:pull:`386`).

Breaking changes
^^^^^^^^^^^^^^^^
* Removed support for the old instances of the `region` argument in ``spatial_mean``, ``extract_dataset``, and ``subset``. (:pull:`367`).
* Removed ``xscen.extract.clisops_subset``. (:pull:`367`).
* ``dtr`` (the function) was renamed to ``dtr_from_minmax`` to avoid confusion with the `dtr` variable. (:pull:`372`).
* The ``xscen.reduce`` module has been abandoned. (:pull:`386`).
* ``build_reduction_data`` has been made redundant by ``xclim.ensembles.make_critera`` and will be removed in a future release.
* ``xscen.reduce.reduce_ensemble`` has been moved to ``xscen.ensembles.reduce_ensemble``, as a module was no longer necessary.

Internal changes
^^^^^^^^^^^^^^^^
Expand Down
61 changes: 16 additions & 45 deletions docs/notebooks/4_ensembles.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -55,11 +55,13 @@
"for d in datasets:\n",
" ds = open_dataset(datasets[d]).isel(lon=slice(0, 4), lat=slice(0, 4))\n",
" ds = xs.climatological_op(\n",
" ds, op=\"mean\", window=30, periods=[[1981, 2010], [2021, 2050]]\n",
" )\n",
" datasets[d] = xs.compute_deltas(ds, reference_horizon=\"1981-2010\")\n",
" datasets[d].attrs[\"cat:id\"] = d # Required by build_reduction_data\n",
" datasets[d].attrs[\"cat:xrfreq\"] = \"AS-JAN\""
" ds,\n",
" op=\"mean\",\n",
" window=30,\n",
" periods=[[1981, 2010], [2021, 2050]],\n",
" horizons_as_dim=True,\n",
" ).drop_vars(\"time\")\n",
" datasets[d] = xs.compute_deltas(ds, reference_horizon=\"1981-2010\")"
]
},
{
Expand All @@ -68,47 +70,14 @@
"source": [
"### Preparing the data\n",
"\n",
"Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010.\n",
"Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010, but monthly or seasonal indicators could also be required. The `horizons_as_dim` argument in `climatological_op` can help combine indicators of multiple frequencies into a single dataset. Alternatively, `xscen.utils.unstack_dates` can also accomplish the same thing if the climatological operations have already been computed.\n",
"\n",
"However, the functions implemented in `xclim.ensembles._reduce` require a very specific 2-D DataArray of dimensions \"realization\" and \"criteria\". That means that all the variables need to be combined and renamed, and that all dimensions need to be stacked together.\n",
"The functions implemented in `xclim.ensembles._reduce` require a very specific 2-D DataArray of dimensions \"realization\" and \"criteria\". The first solution is to first create an ensemble using `xclim.ensembles.create_ensemble`, then pass the result to `xclim.ensembles.make_criteria`. Alternatively, the datasets can be passed directly to `xscen.ensembles.reduce_ensemble` and the necessary preliminary steps will be accomplished automatically.\n",
"\n",
"`xs.build_reduction_data` can be used to prepare the data for ensemble reduction. Its arguments are:\n",
"In this example, the number of criteria will corresponds to: `indicators x horizons x longitude x latitude`, but criteria that are purely NaN across all realizations will be removed.\n",
"\n",
"- `datasets` (dict, list)\n",
"- `xrfreqs` are the unique frequencies of the indicators.\n",
"- `horizons` is used to instruct on which horizon(s) to build the data from.\n",
"Note that `xs.spatial_mean` could have been used prior to calling that function to remove the spatial dimensions.\n",
"\n",
"Because a simulation could have multiple datasets (in the case of multiple frequencies), an attempt will be made to decipher the ID and frequency from the metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = xs.build_reduction_data(\n",
" datasets=datasets,\n",
" xrfreqs=[\"AS-JAN\"],\n",
" horizons=[\"2021-2050\"],\n",
")\n",
"\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of criteria corresponds to: `indicators x horizons x longitude x latitude`, but criteria that are purely NaN across all realizations are removed.\n",
"\n",
"Note that `xs.spatial_mean` could have been used prior to calling that function to remove the spatial dimensions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting a reduced ensemble\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
Expand All @@ -117,9 +86,11 @@
"</div>\n",
"\n",
"Ensemble reduction through `xscen.reduce_ensemble` consists in a simple call to `xclim`. The arguments are:\n",
"- `data`, which is the 2D DataArray that is created by using `xs.build_reduction_data`.\n",
"- `data`, which is the aforementioned 2D DataArray, or the list/dict of datasets required to build it.\n",
"- `method` is either `kkz` or `kmeans`. See the link above for further details on each technique.\n",
"- `kwargs` is a dictionary of arguments to send to the method chosen."
"- `horizons` is used to instruct on which horizon(s) to build the data from, if data needs to be constructed.\n",
"- `create_kwargs`, the arguments to pass to `xclim.ensembles.create_ensemble` if data needs to be constructed.\n",
"- `kwargs` is a dictionary of arguments to send to the clustering method chosen."
]
},
{
Expand All @@ -129,7 +100,7 @@
"outputs": [],
"source": [
"selected, clusters, fig_data = xs.reduce_ensemble(\n",
" data=data, method=\"kmeans\", kwargs={\"method\": {\"n_clusters\": 3}}\n",
" data=datasets, method=\"kmeans\", horizons=[\"2021-2050\"], max_clusters=3\n",
")"
]
},
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,7 @@ values = [
[tool.coverage.run]
relative_files = true
include = ["xscen/*"]
omit = ["docs/notebooks/*.ipynb", "tests/*.py"]
omit = ["docs/notebooks/*.ipynb", "tests/*.py", "xscen/reduce.py"] # FIXME: Remove xscen/reduce.py when it's fully deleted.

[tool.isort]
append_only = true
Expand Down
2 changes: 1 addition & 1 deletion templates/1-basic_workflow_with_config/workflow1.py
Original file line number Diff line number Diff line change
Expand Up @@ -417,7 +417,7 @@
xs.measure_time(name=f"{cur}", logger=logger),
):
# Compute climatological mean
ds_mean = xs.climatological_mean(ds=ds_input)
ds_mean = xs.climatological_op(ds=ds_input)

# Save to zarr
path = f"{CONFIG['paths']['task']}".format(**cur)
Expand Down
55 changes: 55 additions & 0 deletions tests/test_ensembles.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import xesmf as xe
except ImportError:
xe = None
from xclim.testing import open_dataset
from xclim.testing.helpers import test_timeseries as timeseries

import xscen as xs
Expand Down Expand Up @@ -1119,3 +1120,57 @@ def test_build_partition_data(self, samplecat, tmp_path):
subset_kw=dict(name="mtl", method="gridpoint", lat=[45.0], lon=[-74]),
indicators_kw=dict(indicators=[xc.atmos.tg_mean, xc.indicators.cf.tg]),
)


class TestReduceEnsemble:
def test_with_criteria(self):
ds = open_dataset("EnsembleReduce/TestEnsReduceCriteria.nc")
selected, clusters, fig_data = xs.reduce_ensemble(
ds["data"], method="kmeans", max_clusters=3
)
assert selected.shape == (3,)
np.testing.assert_array_equal(selected, [4, 7, 23])
assert len(clusters) == 3
assert fig_data is not None

@pytest.mark.parametrize("horizon", ["1981-2010", "2021-2050"])
def test_without_criteria(self, horizon):
datasets = {
"ACCESS": "EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc",
"BNU-ESM": "EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc",
"CCSM4-r1": "EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc",
"CCSM4-r2": "EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc",
"CNRM-CM5": "EnsembleStats/BCCAQv2+ANUSPLIN300_CNRM-CM5_historical+rcp45_r1i1p1_1970-2050_tg_mean_YS.nc",
}
for d in datasets:
ds = open_dataset(datasets[d]).isel(lon=slice(0, 4), lat=slice(0, 4))
ds = xs.climatological_op(
ds,
op="mean",
window=30,
periods=[["1981", "2010"], ["2021", "2050"]],
horizons_as_dim=True,
).drop_vars("time")
datasets[d] = xs.compute_deltas(ds, reference_horizon="1981-2010")

selected, clusters, fig_data = xs.reduce_ensemble(
datasets, method="kkz", horizons=[horizon], num_select=4
)
assert selected.shape == (4,)
answer = (
["ACCESS", "BNU-ESM", "CNRM-CM5", "CCSM4-r1"]
if horizon == "2021-2050"
else ["ACCESS", "BNU-ESM", "CCSM4-r1", "CCSM4-r2"]
)
np.testing.assert_array_equal(selected, answer)
assert clusters == {}
assert fig_data == {}

def test_errors(self):
ds = open_dataset("EnsembleReduce/TestEnsReduceCriteria.nc")
with pytest.raises(
ValueError, match="Data must have a 'horizon' dimension to be subsetted."
):
xs.reduce_ensemble(
ds["data"], method="kmeans", horizons=["1981-2010"], max_clusters=3
)
2 changes: 0 additions & 2 deletions xscen/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@
extract,
indicators,
io,
reduce,
regrid,
scripting,
spatial,
Expand All @@ -38,7 +37,6 @@
)
from .indicators import compute_indicators
from .io import save_to_netcdf, save_to_table, save_to_zarr
from .reduce import build_reduction_data, reduce_ensemble
from .regrid import *
from .scripting import (
TimeoutException,
Expand Down
74 changes: 74 additions & 0 deletions xscen/ensembles.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
"build_partition_data",
"ensemble_stats",
"generate_weights",
"reduce_ensemble",
]


Expand Down Expand Up @@ -764,3 +765,76 @@ def build_partition_data(
ens = ens.rename(rename_dict)

return ens


@parse_config
def reduce_ensemble(
data: Union[xr.DataArray, dict, list, xr.Dataset],
method: str,
*,
horizons: Optional[list[str]] = None,
create_kwargs: Optional[dict] = None,
**kwargs,
):
r"""Reduce an ensemble of simulations using clustering algorithms from xclim.ensembles.

Parameters
----------
data : xr.DataArray
Selection criteria data : 2-D xr.DataArray with dimensions 'realization' and 'criteria'.
These are the values used for clustering. Realizations represent the individual original
ensemble members and criteria the variables/indicators used in the grouping algorithm.
This data can be generated using py:func:`xclim.ensembles.make_criteria`.
Alternatively, either a xr.Dataset, a list of xr.Dataset or a dictionary of xr.Dataset can be passed,
in which case the data will be built using py:func:`xclim.ensembles.create_ensemble` and py:func:`xclim.ensembles.make_criteria`.
method : str
['kkz', 'kmeans']. Clustering method.
horizons : list of str, optional
Subset of horizons on which to create the data. Only used if `data` needs to be built.
create_kwargs : dict, optional
Arguments to pass to py:func:`xclim.ensembles.create_ensemble` if `data` is not an xr.DataArray.
\*\*kwargs : dict
Arguments to send to either py:func:`xclim.ensembles.kkz_reduce_ensemble` or py:func:`xclim.ensembles.kmeans_reduce_ensemble`.

Returns
-------
selected : xr.DataArray
DataArray of dimension 'realization' with the selected simulations.
clusters : dict
If using kmeans clustering, realizations grouped by cluster.
fig_data : dict
If using kmeans clustering, data necessary to call py:func:`xclim.ensembles.plot_rsqprofile`.

Notes
-----
If building `data` to be constructed by this function, the datasets should already have a climatology computed on them, such that the data
has no temporal dimension aside from the "horizon" coordinate (which is optional and might be used to subset the data).
If the indicators are a mix of yearly, seasonal, and monthly, they should be stacked on the same time/horizon axis and put in the same dataset.
You can use py:func:`xscen.utils.unstack_dates` on seasonal or monthly indicators to this end.
"""
if isinstance(data, (list, dict)):
data = ensembles.create_ensemble(datasets=data, **(create_kwargs or {}))
if horizons:
if "horizon" not in data.dims:
raise ValueError("Data must have a 'horizon' dimension to be subsetted.")
data = data.sel(horizon=horizons)
if "criteria" not in data.dims:
data = ensembles.make_criteria(data)

selected = getattr(ensembles, f"{method}_reduce_ensemble")(data=data, **kwargs)

clusters = {}
fig_data = {}
if method == "kmeans":
fig_data = selected[2]
clusters_tmp = selected[1]
selected = selected[0]
realization = np.arange(len(clusters_tmp))

clusters = {
g: data.realization.isel(realization=realization[clusters_tmp == g])
for g in np.unique(clusters_tmp)
}
selected = data.realization.isel(realization=selected)

return selected, clusters, fig_data
Loading
Loading