add get_partition_input (#289)

### Pull Request Checklist: - [ ] This PR addresses an already opened issue (for bug fixes / features) - This PR fixes #xyz - [x] (If applicable) Documentation has been added / updated (for bug fixes / features). - [x] (If applicable) Tests have been added. - [x] This PR does not seem to break the templates. - [x] HISTORY.rst has been updated (with summary of main changes). - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added. ### What kind of change does this PR introduce? * Add `xs.ensembles.get_partition_input` to prepare data from catalog to be passed to xclim partionning functions (https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning) * Now, I am wondering if it would be more xscen to also wrap the xclim fonction instead only prepping the data? This would allow to add a attrs `processing_level= partition` to be able to put it in a a catalog. workflow 1: `xs.get_partition_input` -> `xc.ensembles.lafferty_sriver` -> `fg.partition` workflow 2: `xs.partition` (which gets the input and wraps xclim) -> maybe save to disk and catalog -> `fg.partition` or other plots (maybe maps of one component) EDIT: I went with workflow 1 ### Does this PR introduce a breaking change? no ### Other information: * I wasn't sure if this should be in `extract.py` (because we extract from a cat) or in `ensembles.py` ( to match the xclim module). * companion de Ouranosinc/xclim#1529 et Ouranosinc/figanos#134
Ouranosinc · Jan 10, 2024 · fca2816 · fca2816
2 parents ce54df8 + df30d75
commit fca2816
Show file tree

Hide file tree

Showing 11 changed files with 273 additions and 12 deletions.
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -24,6 +24,9 @@ New features and enhancements
 * ``xs.io.round_bits`` to round floating point variable up to a number of bits, allowing for a better compression. This can be combined with the saving step through argument ``"bitround"`` of ``save_to_netcdf`` and ``save_to_zarr``. (:pull:`266`).
 * Added annual global tas timeseries for CMIP6's models CMCC-ESM2 (ssp245, ssp370, ssp585), EC-Earth3-CC (ssp245, ssp585), KACE-1-0-G (ssp245, ssp370, ssp585) and TaiESM1 (ssp245, ssp370). Moved global tas database to a netCDF file. (:issue:`268`, :pull:`270`).
 * Implemented support for multiple levels and models in ``xs.subset_warming_level``. Better support for `DataArray` and `DataFrame` in ``xs.get_warming_level``. (:pull:`270`).
+* Added the ability to directly provide an ensemble dataset to ``xs.ensemble_stats``. (:pull:`299`).
+* Added support in ``xs.ensemble_stats`` for the new robustness-related functions available in `xclim`. (:pull:`299`).
+* New function ``xs.ensembles.get_partition_input`` (:pull:`289`).
 
 Breaking changes
 ^^^^^^^^^^^^^^^^
@@ -41,6 +44,7 @@ Bug fixes
 * `search_data_catalogs` now eliminates anything that matches any entry in `exclusions`. (:issue:`275`, :pull:`280`).
 * Fixed a bug in ``xs.scripting.save_and_update`` where ``build_path_kwargs`` was ignored when trying to guess the file format. (:pull:`282`).
 * Add a warning to ``xs.extract._dispatch_historical_to_future``. (:issue:`286`, :pull:`287`).
+* Modify use_cftime for the calendar conversion in ``to_dataset``. (:issue:`303`, :pull:`289`).
 
 Internal changes
 ^^^^^^^^^^^^^^^^

diff --git a/conda/xscen/meta.yaml b/conda/xscen/meta.yaml
@@ -40,7 +40,7 @@ requirements:
     - pyyaml
     - rechunker
     - shapely >= 2
-    - sparse
+    - sparse <=0.14
     - toolz
     - xarray
     - xclim >=0.43.0

diff --git a/conda/xscen/recipe.yaml b/conda/xscen/recipe.yaml
@@ -41,7 +41,7 @@ requirements:
     - pyyaml
     - rechunker
     - shapely >= 2
-    - sparse
+    - sparse <=0.14
     - toolz
     - xarray
     - xclim >=0.43.0

diff --git a/docs/notebooks/4_ensemble_reduction.ipynb → docs/notebooks/4_ensembles.ipynb b/docs/notebooks/4_ensemble_reduction.ipynb → docs/notebooks/4_ensembles.ipynb
@@ -4,7 +4,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Ensemble reduction\n",
+    "# Ensembles"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Ensemble reduction\n",
     "\n",
     "This tutorial will explore ensemble reduction (also known as ensemble selection) using `xscen`. This will use pre-computed annual mean temperatures from `xclim.testing`."
    ]
@@ -39,7 +46,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Preparing the data\n",
+    "### Preparing the data\n",
     "\n",
     "Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010.\n",
     "\n",
@@ -82,7 +89,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Selecting a reduced ensemble\n",
+    "### Selecting a reduced ensemble\n",
     "\n",
     "<div class=\"alert alert-info\"> <b>NOTE</b>\n",
     "    \n",
@@ -145,13 +152,144 @@
     "\n",
     "plot_rsqprofile(fig_data)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Ensemble partition\n",
+    "This tutorial will show how to use xscen to create the input for [xclim partition functions](https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a catalog for the tutorial\n",
+    "from pathlib import Path\n",
+    "\n",
+    "output_folder = Path().absolute() / \"_data\"\n",
+    "\n",
+    "project = {\n",
+    "    \"title\": \"partition-catalog\",\n",
+    "    \"description\": \"Catalog for the tutorial NetCDFs.\",\n",
+    "}\n",
+    "\n",
+    "cat = xs.ProjectCatalog(\n",
+    "    str(output_folder / \"partition-catalog.json\"),\n",
+    "    create=True,\n",
+    "    project=project,\n",
+    "    overwrite=True,\n",
+    ")\n",
+    "\n",
+    "for bap in [\"A\", \"B\"]:\n",
+    "    for s in [\"model1\", \"model2\"]:\n",
+    "        df = xs.parse_directory(\n",
+    "            directories=[f\"{Path().absolute()}/samples/tutorial/\"],\n",
+    "            patterns=[\n",
+    "                \"{activity}/{domain}/{institution}/{source}/{experiment}/{member}/{frequency}/{?:_}.nc\"\n",
+    "            ],\n",
+    "            homogenous_info={\n",
+    "                \"mip_era\": \"CMIP6\",\n",
+    "                \"type\": \"simulation\",\n",
+    "                \"processing_level\": \"raw\",\n",
+    "                \"bias_adjust_project\": bap,\n",
+    "                \"source\": s,\n",
+    "            },\n",
+    "            read_from_file=[\"variable\", \"date_start\", \"date_end\"],\n",
+    "        )\n",
+    "\n",
+    "        cat.update(df)\n",
+    "cat.df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "From a dictionary of datasets, the function creates a dataset with new dimensions in `partition_dim`(`[\"source\", \"experiment\", \"bias_adjust_project\"]`). \n",
+    "- By default, it translates the xscen vocabulary (eg. `experiment`) to the xclim partition vocabulary (eg. `scenario`). It is possible to pass `rename_dict` to rename the dimensions with other names.\n",
+    "- If the inputs are not on the same grid, they can be regridded through `regrid_kw` or subset to a point through `subset_kw`. The functions assumes that if there are different `bias_adjust_project`, they will be on different grids (with all `source` on the same grid). If there is one or less `bias_adjust_project`, the assumption is that`source` have different grids."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a dictionnary of datasets wanted for the partition\n",
+    "input_dict = cat.search(variable=\"tas\", member=\"r1i1p1f1\").to_dataset_dict()\n",
+    "\n",
+    "# build a single dataset\n",
+    "\n",
+    "ds = xs.ensembles.build_partition_data(\n",
+    "    input_dict, subset_kw=dict(name=\"mtl\", method=\"gridpoint\", lat=[45.5], lon=[-73.6])\n",
+    ")\n",
+    "ds"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Pass the input to an xclim partition function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "nbsphinx": "hidden"
+   },
+   "outputs": [],
+   "source": [
+    "# This is a hidden cell.\n",
+    "# extend with fake data to have at least 3 years\n",
+    "import xarray as xr\n",
+    "\n",
+    "ds2 = ds.copy()\n",
+    "ds[\"time\"] = xr.cftime_range(start=\"2001-01-01\", periods=len(ds[\"time\"]), freq=\"D\")\n",
+    "ds2[\"time\"] = xr.cftime_range(start=\"2003-01-01\", periods=len(ds[\"time\"]), freq=\"D\")\n",
+    "ds = xr.concat([ds, ds2], dim=\"time\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import xclim as xc\n",
+    "\n",
+    "# get a yearly dataset\n",
+    "da = xc.atmos.tg_mean(ds=ds)\n",
+    "\n",
+    "# compute uncertainty partitionning\n",
+    "mean, uncertainties = xc.ensembles.hawkins_sutton(da)\n",
+    "uncertainties"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div class=\"alert alert-info\"> <b>NOTE</b>\n",
+    "    \n",
+    "Note that the [figanos library](https://figanos.readthedocs.io/en/latest/) provides a function `fg.partition` to plot the uncertainties.\n",
+    "    \n",
+    "</div>"
+   ]
   }
  ],
  "metadata": {
   "@webio": {
    "lastCommId": null,
    "lastKernelId": null
   },
+  "celltoolbar": "Aucun(e)",
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
@@ -162,7 +300,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.9.13"
   }
  },
  "nbformat": 4,

diff --git a/docs/notebooks/index.rst b/docs/notebooks/index.rst
@@ -8,6 +8,6 @@ Examples
     1_catalog
     2_getting_started
     3_diagnostics
-    4_ensemble_reduction
+    4_ensembles
     5_warminglevels
     6_config
diff --git a/environment-dev.yml b/environment-dev.yml
@@ -25,7 +25,7 @@ dependencies:
   - pyyaml
   - rechunker
   - shapely >=2.0
-  - sparse
+  - sparse <=0.14 # 0.15 breaks xe.SpatialAverager (https://github.com/Ouranosinc/xscen/actions/runs/7463533566/job/20308477398)
   - toolz
   - xarray <2023.11.0
   - xclim >=0.46.0

diff --git a/environment.yml b/environment.yml
@@ -25,7 +25,7 @@ dependencies:
   - pyyaml
   - rechunker
   - shapely >=2.0
-  - sparse
+  - sparse <=0.14
   - toolz
   - xarray <2023.11.0
   - xclim >=0.46.0

diff --git a/pyproject.toml b/pyproject.toml
@@ -56,7 +56,7 @@ dependencies = [
   "pyyaml",
   "rechunker",
   "shapely>=2.0",
-  "sparse",
+  "sparse<=0.14",
   "toolz",
   # FIXME: Unpin xarray before releasing!
   "xarray<2023.11.0",

diff --git a/tests/test_ensembles.py b/tests/test_ensembles.py
@@ -4,6 +4,7 @@
 import numpy as np
 import pytest
 import xarray as xr
+import xesmf
 from xclim.testing.helpers import test_timeseries as timeseries
 
 import xscen as xs
@@ -1059,3 +1060,37 @@ def test_attribute_weight_error(self):
                     self.ens_rcm,
                     attribute_weights={"experiment": {"rcp45": 2, "rcp85": 1}},
                 )
+
+
+class TestEnsemblePartition:
+    def test_build_partition_data(self, samplecat, tmp_path):
+        # test subset
+        datasets = samplecat.search(variable="tas").to_dataset_dict()
+        ds = xs.ensembles.build_partition_data(
+            datasets=datasets,
+            partition_dim=["source", "experiment"],
+            subset_kw=dict(name="mtl", method="gridpoint", lat=[45.0], lon=[-74]),
+            rename_dict={"source": "new-name"},
+        )
+
+        assert ds.dims == {"time": 730, "scenario": 4, "new-name": 2}
+        assert ds.lat.values == 45.0
+        assert ds.lon.values == -74
+        assert [i for i in ds.data_vars] == ["tas"]
+
+        # test regrid
+        ds_grid = xesmf.util.cf_grid_2d(-75, -74, 0.25, 45, 48, 0.55)
+        datasets = samplecat.search(variable="tas", member="r1i1p1f1").to_dataset_dict()
+        ds = xs.ensembles.build_partition_data(
+            datasets=datasets,
+            regrid_kw=dict(ds_grid=ds_grid, weights_location=tmp_path),
+        )
+
+        assert ds.dims == {
+            "scenario": 4,
+            "model": 1,
+            "time": 730,
+            "lat": 5,
+            "lon": 4,
+        }
+        assert [i for i in ds.data_vars] == ["tas"]
diff --git a/xscen/catalog.py b/xscen/catalog.py
@@ -501,7 +501,7 @@ def preprocess(ds):
                 ds = ensure_correct_time(ds, xrfreq)
                 if calendar is not None:
                     ds = ds.convert_calendar(
-                        calendar, use_cftime=(calendar == "default"), align_on="date"
+                        calendar, use_cftime=(calendar != "default"), align_on="date"
                     )
                 return ds