Skip to content

Commit

Permalink
add get_partition_input (#289)
Browse files Browse the repository at this point in the history
<!-- Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features).
- [x] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [x] HISTORY.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added.

### What kind of change does this PR introduce?

* Add `xs.ensembles.get_partition_input` to prepare data from catalog to
be passed to xclim partionning functions
(https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning)
* Now, I am wondering if it would be more xscen to also wrap the xclim
fonction instead only prepping the data? This would allow to add a attrs
`processing_level= partition` to be able to put it in a a catalog.
workflow 1: `xs.get_partition_input` -> `xc.ensembles.lafferty_sriver`
-> `fg.partition`
workflow 2: `xs.partition` (which gets the input and wraps xclim) ->
maybe save to disk and catalog -> `fg.partition` or other plots (maybe
maps of one component)
EDIT: I went with workflow 1

### Does this PR introduce a breaking change?
no

### Other information:
* I wasn't sure if this should be in `extract.py` (because we extract
from a cat) or in `ensembles.py` ( to match the xclim module).
* companion de Ouranosinc/xclim#1529 et
Ouranosinc/figanos#134
  • Loading branch information
juliettelavoie authored Jan 10, 2024
2 parents ce54df8 + df30d75 commit fca2816
Show file tree
Hide file tree
Showing 11 changed files with 273 additions and 12 deletions.
4 changes: 4 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ New features and enhancements
* ``xs.io.round_bits`` to round floating point variable up to a number of bits, allowing for a better compression. This can be combined with the saving step through argument ``"bitround"`` of ``save_to_netcdf`` and ``save_to_zarr``. (:pull:`266`).
* Added annual global tas timeseries for CMIP6's models CMCC-ESM2 (ssp245, ssp370, ssp585), EC-Earth3-CC (ssp245, ssp585), KACE-1-0-G (ssp245, ssp370, ssp585) and TaiESM1 (ssp245, ssp370). Moved global tas database to a netCDF file. (:issue:`268`, :pull:`270`).
* Implemented support for multiple levels and models in ``xs.subset_warming_level``. Better support for `DataArray` and `DataFrame` in ``xs.get_warming_level``. (:pull:`270`).
* Added the ability to directly provide an ensemble dataset to ``xs.ensemble_stats``. (:pull:`299`).
* Added support in ``xs.ensemble_stats`` for the new robustness-related functions available in `xclim`. (:pull:`299`).
* New function ``xs.ensembles.get_partition_input`` (:pull:`289`).

Breaking changes
^^^^^^^^^^^^^^^^
Expand All @@ -41,6 +44,7 @@ Bug fixes
* `search_data_catalogs` now eliminates anything that matches any entry in `exclusions`. (:issue:`275`, :pull:`280`).
* Fixed a bug in ``xs.scripting.save_and_update`` where ``build_path_kwargs`` was ignored when trying to guess the file format. (:pull:`282`).
* Add a warning to ``xs.extract._dispatch_historical_to_future``. (:issue:`286`, :pull:`287`).
* Modify use_cftime for the calendar conversion in ``to_dataset``. (:issue:`303`, :pull:`289`).

Internal changes
^^^^^^^^^^^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion conda/xscen/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ requirements:
- pyyaml
- rechunker
- shapely >= 2
- sparse
- sparse <=0.14
- toolz
- xarray
- xclim >=0.43.0
Expand Down
2 changes: 1 addition & 1 deletion conda/xscen/recipe.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ requirements:
- pyyaml
- rechunker
- shapely >= 2
- sparse
- sparse <=0.14
- toolz
- xarray
- xclim >=0.43.0
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ensemble reduction\n",
"# Ensembles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ensemble reduction\n",
"\n",
"This tutorial will explore ensemble reduction (also known as ensemble selection) using `xscen`. This will use pre-computed annual mean temperatures from `xclim.testing`."
]
Expand Down Expand Up @@ -39,7 +46,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the data\n",
"### Preparing the data\n",
"\n",
"Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010.\n",
"\n",
Expand Down Expand Up @@ -82,7 +89,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting a reduced ensemble\n",
"### Selecting a reduced ensemble\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
" \n",
Expand Down Expand Up @@ -145,13 +152,144 @@
"\n",
"plot_rsqprofile(fig_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ensemble partition\n",
"This tutorial will show how to use xscen to create the input for [xclim partition functions](https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a catalog for the tutorial\n",
"from pathlib import Path\n",
"\n",
"output_folder = Path().absolute() / \"_data\"\n",
"\n",
"project = {\n",
" \"title\": \"partition-catalog\",\n",
" \"description\": \"Catalog for the tutorial NetCDFs.\",\n",
"}\n",
"\n",
"cat = xs.ProjectCatalog(\n",
" str(output_folder / \"partition-catalog.json\"),\n",
" create=True,\n",
" project=project,\n",
" overwrite=True,\n",
")\n",
"\n",
"for bap in [\"A\", \"B\"]:\n",
" for s in [\"model1\", \"model2\"]:\n",
" df = xs.parse_directory(\n",
" directories=[f\"{Path().absolute()}/samples/tutorial/\"],\n",
" patterns=[\n",
" \"{activity}/{domain}/{institution}/{source}/{experiment}/{member}/{frequency}/{?:_}.nc\"\n",
" ],\n",
" homogenous_info={\n",
" \"mip_era\": \"CMIP6\",\n",
" \"type\": \"simulation\",\n",
" \"processing_level\": \"raw\",\n",
" \"bias_adjust_project\": bap,\n",
" \"source\": s,\n",
" },\n",
" read_from_file=[\"variable\", \"date_start\", \"date_end\"],\n",
" )\n",
"\n",
" cat.update(df)\n",
"cat.df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From a dictionary of datasets, the function creates a dataset with new dimensions in `partition_dim`(`[\"source\", \"experiment\", \"bias_adjust_project\"]`). \n",
"- By default, it translates the xscen vocabulary (eg. `experiment`) to the xclim partition vocabulary (eg. `scenario`). It is possible to pass `rename_dict` to rename the dimensions with other names.\n",
"- If the inputs are not on the same grid, they can be regridded through `regrid_kw` or subset to a point through `subset_kw`. The functions assumes that if there are different `bias_adjust_project`, they will be on different grids (with all `source` on the same grid). If there is one or less `bias_adjust_project`, the assumption is that`source` have different grids."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a dictionnary of datasets wanted for the partition\n",
"input_dict = cat.search(variable=\"tas\", member=\"r1i1p1f1\").to_dataset_dict()\n",
"\n",
"# build a single dataset\n",
"\n",
"ds = xs.ensembles.build_partition_data(\n",
" input_dict, subset_kw=dict(name=\"mtl\", method=\"gridpoint\", lat=[45.5], lon=[-73.6])\n",
")\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pass the input to an xclim partition function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
"# This is a hidden cell.\n",
"# extend with fake data to have at least 3 years\n",
"import xarray as xr\n",
"\n",
"ds2 = ds.copy()\n",
"ds[\"time\"] = xr.cftime_range(start=\"2001-01-01\", periods=len(ds[\"time\"]), freq=\"D\")\n",
"ds2[\"time\"] = xr.cftime_range(start=\"2003-01-01\", periods=len(ds[\"time\"]), freq=\"D\")\n",
"ds = xr.concat([ds, ds2], dim=\"time\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xclim as xc\n",
"\n",
"# get a yearly dataset\n",
"da = xc.atmos.tg_mean(ds=ds)\n",
"\n",
"# compute uncertainty partitionning\n",
"mean, uncertainties = xc.ensembles.hawkins_sutton(da)\n",
"uncertainties"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
" \n",
"Note that the [figanos library](https://figanos.readthedocs.io/en/latest/) provides a function `fg.partition` to plot the uncertainties.\n",
" \n",
"</div>"
]
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"celltoolbar": "Aucun(e)",
"language_info": {
"codemirror_mode": {
"name": "ipython",
Expand All @@ -162,7 +300,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.9.13"
}
},
"nbformat": 4,
Expand Down
2 changes: 1 addition & 1 deletion docs/notebooks/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,6 @@ Examples
1_catalog
2_getting_started
3_diagnostics
4_ensemble_reduction
4_ensembles
5_warminglevels
6_config
2 changes: 1 addition & 1 deletion environment-dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ dependencies:
- pyyaml
- rechunker
- shapely >=2.0
- sparse
- sparse <=0.14 # 0.15 breaks xe.SpatialAverager (https://github.com/Ouranosinc/xscen/actions/runs/7463533566/job/20308477398)
- toolz
- xarray <2023.11.0
- xclim >=0.46.0
Expand Down
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ dependencies:
- pyyaml
- rechunker
- shapely >=2.0
- sparse
- sparse <=0.14
- toolz
- xarray <2023.11.0
- xclim >=0.46.0
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ dependencies = [
"pyyaml",
"rechunker",
"shapely>=2.0",
"sparse",
"sparse<=0.14",
"toolz",
# FIXME: Unpin xarray before releasing!
"xarray<2023.11.0",
Expand Down
35 changes: 35 additions & 0 deletions tests/test_ensembles.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import numpy as np
import pytest
import xarray as xr
import xesmf
from xclim.testing.helpers import test_timeseries as timeseries

import xscen as xs
Expand Down Expand Up @@ -1059,3 +1060,37 @@ def test_attribute_weight_error(self):
self.ens_rcm,
attribute_weights={"experiment": {"rcp45": 2, "rcp85": 1}},
)


class TestEnsemblePartition:
def test_build_partition_data(self, samplecat, tmp_path):
# test subset
datasets = samplecat.search(variable="tas").to_dataset_dict()
ds = xs.ensembles.build_partition_data(
datasets=datasets,
partition_dim=["source", "experiment"],
subset_kw=dict(name="mtl", method="gridpoint", lat=[45.0], lon=[-74]),
rename_dict={"source": "new-name"},
)

assert ds.dims == {"time": 730, "scenario": 4, "new-name": 2}
assert ds.lat.values == 45.0
assert ds.lon.values == -74
assert [i for i in ds.data_vars] == ["tas"]

# test regrid
ds_grid = xesmf.util.cf_grid_2d(-75, -74, 0.25, 45, 48, 0.55)
datasets = samplecat.search(variable="tas", member="r1i1p1f1").to_dataset_dict()
ds = xs.ensembles.build_partition_data(
datasets=datasets,
regrid_kw=dict(ds_grid=ds_grid, weights_location=tmp_path),
)

assert ds.dims == {
"scenario": 4,
"model": 1,
"time": 730,
"lat": 5,
"lon": 4,
}
assert [i for i in ds.data_vars] == ["tas"]
2 changes: 1 addition & 1 deletion xscen/catalog.py
Original file line number Diff line number Diff line change
Expand Up @@ -501,7 +501,7 @@ def preprocess(ds):
ds = ensure_correct_time(ds, xrfreq)
if calendar is not None:
ds = ds.convert_calendar(
calendar, use_cftime=(calendar == "default"), align_on="date"
calendar, use_cftime=(calendar != "default"), align_on="date"
)
return ds

Expand Down
Loading

0 comments on commit fca2816

Please sign in to comment.