
Merge branch 'main' into climatological_op
vindelico authored Dec 19, 2023
2 parents 76959d1 + acb0f73 commit 199a796
Showing 25 changed files with 943 additions and 506 deletions.
2 changes: 1 addition & 1 deletion .cruft.json
@@ -11,7 +11,7 @@
"project_slug": "xscen",
"project_short_description": "A climate change scenario-building analysis framework, built with xclim/xarray.",
"pypi_username": "RondeauG",
"version": "0.7.22-beta",
"version": "0.7.24-beta",
"use_pytest": "y",
"use_black": "y",
"use_conda": "y",
14 changes: 7 additions & 7 deletions .gitignore
@@ -4,6 +4,13 @@ docs/apidoc/modules.rst
docs/apidoc/xscen*.rst
docs/notebooks/_data

# Files generated by the notebooks
docs/**.nc
docs/**.zarr
docs/notebooks/samples/example*
docs/notebooks/samples/gs-weights/
!docs/notebooks/samples/tutorial/*/*/*/*/*/*/*/*.nc

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -116,10 +123,3 @@ venv.bak/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Files generated by the notebooks
*.nc
*.zarr
docs/notebooks/samples/example*
docs/notebooks/samples/gs-weights/
!docs/notebooks/samples/tutorial/*/*/*/*/*/*/*/*.nc
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -79,6 +79,7 @@ repos:
hooks:
- id: nbstripout
files: ".ipynb"
args: [ '--extra-keys', 'metadata.kernelspec' ]
- repo: https://github.com/Yelp/detect-secrets
rev: v1.4.0
hooks:
2 changes: 1 addition & 1 deletion .readthedocs.yml
@@ -17,7 +17,7 @@ build:
- pip install . --no-deps
pre_build:
- sphinx-apidoc -o docs/apidoc --private --module-first xscen
- env SKIP_NOTEBOOKS=1 sphinx-build -b linkcheck docs/ _build/linkcheck
- env SKIP_NOTEBOOKS=1 sphinx-build -b linkcheck docs/ _build/linkcheck || true
# post_build:
# - rm -rf docs/notebooks/_data

8 changes: 6 additions & 2 deletions CHANGES.rst
@@ -22,13 +22,16 @@ New features and enhancements
* Improved ``xs.extract.resample``: support for weighted resampling operations when starting from frequencies coarser than daily, and handling of missing timesteps/values. (:issue:`80`, :issue:`93`, :pull:`265`).
* New argument ``attribute_weights`` to ``generate_weights`` to allow for custom weights. (:pull:`252`).
* ``xs.io.round_bits`` to round floating point variables to a given number of bits, allowing for better compression. This can be combined with the saving step through the ``"bitround"`` argument of ``save_to_netcdf`` and ``save_to_zarr``. (:pull:`266`).
* Added the ability to directly provide an ensemble dataset to ``xs.ensemble_stats``. (:pull:`299`).
* Added support in ``xs.ensemble_stats`` for the new robustness-related functions available in `xclim`. (:pull:`299`).
* Added annual global tas timeseries for CMIP6's models CMCC-ESM2 (ssp245, ssp370, ssp585), EC-Earth3-CC (ssp245, ssp585), KACE-1-0-G (ssp245, ssp370, ssp585) and TaiESM1 (ssp245, ssp370). Moved global tas database to a netCDF file. (:issue:`268`, :pull:`270`).
* Implemented support for multiple levels and models in ``xs.subset_warming_level``. Better support for `DataArray` and `DataFrame` in ``xs.get_warming_level``. (:pull:`270`).

Breaking changes
^^^^^^^^^^^^^^^^
* ``climatological_mean()`` has been replaced by ``climatological_op()``; the former will be abandoned in a future version. (:pull:`290`)
* ``experiment_weights`` argument in ``generate_weights`` was renamed to ``balance_experiments``. (:pull:`252`).
* New argument ``attribute_weights`` to ``generate_weights`` to allow for custom weights. (:pull:`252`).
* For a sequence of models, the output of ``xs.get_warming_level`` is now a list. Revert to a dictionary with ``output='selected'`` (:pull:`270`).
* The global average temperature database is now a netCDF file; custom databases must follow the same format (:pull:`270`).

Bug fixes
^^^^^^^^^
@@ -72,6 +75,7 @@ Internal changes
* Linting checks now examine the testing folder, function complexity, and alphabetical order of `__all__` lists. (:pull:`292`).
* ``publish_release_notes`` now uses better logic for finding and reformatting the `CHANGES.rst` file. (:pull:`292`).
* ``bump2version`` version-bumping utility was replaced by ``bump-my-version``. (:pull:`292`).
* Documentation build checks no longer fail due to broken external links; notebooks are now nested and numbered. (:pull:`304`).

v0.7.1 (2023-08-23)
-------------------
4 changes: 3 additions & 1 deletion MANIFEST.in
@@ -9,6 +9,7 @@ include .zenodo.json
recursive-include xscen *.py *.yml
recursive-include xscen/CVs *.json
recursive-include xscen/data/fr *.yml *.csv
recursive-include xscen/data *.nc
recursive-include xscen/data/fr/LC_MESSAGES *.mo *.po
recursive-include tests *.py
recursive-include docs conf.py Makefile make.bat *.png *.rst *.yml
@@ -18,7 +19,8 @@ recursive-include docs/notebooks/samples *.csv *.json *.yml
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-exclude .github *.md *.yml
recursive-exclude conda *.yaml
recursive-exclude conda *.yaml *.yml
recursive-exclude docs notebooks samples tutorial *.nc
recursive-exclude templates *.csv *.json *.py *.rst *.yml

exclude .coveralls.yml
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -132,6 +132,8 @@

linkcheck_ignore = [
r"https://github.com/Ouranosinc/xscen/(pull|issue).*", # too labourious to fully check
r"https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.3803", # Error 403: Forbidden
r"https://library.wmo.int/idurl/4/56300", # HTTPconnectionPool error
]

# Add any paths that contain templates here, relative to this directory.
@@ -174,6 +176,7 @@
"_build",
"Thumbs.db",
".DS_Store",
"notebooks/global_tas_average_obs.ipynb"
]

# The name of the Pygments (syntax highlighting) style to use.
18 changes: 18 additions & 0 deletions docs/goodtoknow.rst
@@ -82,3 +82,21 @@ As seen above, it can be useful to use the "special" sections of the config file
warning:
# warning_category : filter_action
all: ignore
Global warming dataset
----------------------
The :py:func:`xscen.extract.get_warming_level` and :py:func:`xscen.extract.subset_warming_level` functions use a custom-made database of global temperature averages to find the global warming levels of known climate simulations. The database is stored as a netCDF file `inside the package itself <https://github.com/Ouranosinc/xscen/blob/main/xscen/data/IPCC_annual_global_tas.nc>`_. It stores the global temperature average (land and ocean) from 1850 to 2100 for multiple simulations (not all simulations cover the entire temporal range). Simulations are identified by four fields:

- ``mip_era`` : "CMIP6", "CMIP5" or "obs" (see below)
- ``source`` : The model name for GCMs (same as the `source` column) and the driving model name for RCMs (the `driving_model` column)
- ``experiment`` : The CMIP experiment name of the run. The "historical" and "pre-industrial" experiments have been merged into each future experiment (similar to what ``match_hist_and_fut`` does in :py:func:`search_data_catalogs`)
- ``member`` : The realization variant label of the run (same as the `member` column)

An extra ``data_source`` field is also available and describes how the data has been obtained:

- "IPCC Atlas" : The timeseries was copied directly from the public data of the `IPCC Atlas <https://github.com/IPCC-WG1/Atlas/tree/main/datasets-aggregated-regionally/data>`_'
- "From Amon" : The monthly temperature average was resampled annually and averaged over the globe using a cos-lat weighting
- "From Amon with xscen" : Same, xscen was used to perform the computation.

In addition to the climate simulations, a few "observational" datasets are made available in the database. The choice of datasets and the methodology was adapted from the WMO's `State of the Global Climate 2021 <https://library.wmo.int/idurl/4/56300>`_. However, to have some consistency between these and the simulated series, an estimated 1850-1900 mean temperature was added to the WMO-compliant anomalies to get absolute values. Keep in mind that this is only an estimate; the timeseries should only be used to compute anomalies. The observational series have a short dataset name in the ``source`` field, "obs" in ``mip_era`` and ``experiment``, and an empty ``member`` (`""`). The ``data_source`` is noted: "Computed following WMO guidelines".
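
For illustration, here is a minimal sketch of how this database could be inspected with ``xarray``. This is a hedged example: the path assumes a local checkout of the repository, and the coordinate names mentioned in the comments are assumptions for illustration, not the file's documented layout::

    import xarray as xr

    # Hypothetical local path to the file shipped with the package; adjust as needed.
    ds = xr.open_dataset("xscen/data/IPCC_annual_global_tas.nc")

    # Inspect the dimensions, coordinates and variables; each simulation should be
    # identified by the four fields described above (mip_era, source, experiment, member).
    print(ds)

    # Hypothetical selection of a single simulation, assuming the simulations are
    # stacked along a dimension whose labels combine those fields:
    # ds.sel(simulation="CMIP6_CanESM5_ssp585_r1i1p1f1")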
7 changes: 1 addition & 6 deletions docs/index.rst
@@ -27,12 +27,7 @@ Features
readme
installation
goodtoknow
notebooks/1_catalog
notebooks/2_getting_started
notebooks/3_diagnostics
notebooks/4_ensemble_reduction
notebooks/5_warminglevels
notebooks/6_config
notebooks/index
columns
templates
api
7 changes: 1 addition & 6 deletions docs/notebooks/1_catalog.ipynb
@@ -1136,11 +1136,6 @@
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
@@ -1151,7 +1146,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.12"
}
},
"nbformat": 4,
7 changes: 1 addition & 6 deletions docs/notebooks/2_getting_started.ipynb
@@ -1466,11 +1466,6 @@
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
@@ -1481,7 +1476,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.12"
}
},
"nbformat": 4,
5 changes: 0 additions & 5 deletions docs/notebooks/3_diagnostics.ipynb
@@ -474,11 +474,6 @@
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
5 changes: 0 additions & 5 deletions docs/notebooks/4_ensemble_reduction.ipynb
@@ -152,11 +152,6 @@
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
84 changes: 62 additions & 22 deletions docs/notebooks/5_warminglevels.ipynb
@@ -108,16 +108,21 @@
"\n",
"If all that you want to know is the year or the period during which a climate model reaches a given warming level, then ``xs.get_warming_level`` is the function to use since you can simply give it a string or a list of strings and receive that information.\n",
"\n",
"The arguments of ``xs.get_warming_level`` are:\n",
"The usual arguments of ``xs.get_warming_level`` are:\n",
"\n",
"- `realization`: Dataset, string, or list of strings. Strings should follow the format 'mip-era_source_experiment_member'\n",
"- `realization`: Dataset, dict or string.\n",
" * Strings should follow the format 'mip-era_source_experiment_member'. Those fields should be found in the dict or in the attributes of the dataset (allowing for a possible 'cat:' prefix).\n",
" * In all cases, regex is allowed to relax the name matching.\n",
" * The \"source\" part can also be a `driving_model` name. If a `Dataset` is passed and it's `driving_model` attribute is non-null, it is used.\n",
"- `wl`: warming level.\n",
"- `window`: Number of years in the centered window during which the warming level is reached. Note that in the case of an even number, the IPCC standard is used (-n/2+1, +n/2).\n",
"- `tas_baseline_period`: The period over which the warming level is calculated, equivalent to \"+0°C\". Defaults to 1850-1900.\n",
"- `ignore_member`: The default `warming_level_csv` only contains data for 1 member. If you want a result regardless of the realization number, set this to True. This is only used when `models` is a Dataset.\n",
"- `ignore_member`: The default `tas_src` only contains data for 1 member. If you want a result regardless of the realization number, set this to True.\n",
"- `return_horizon`: Whether to return the start/end of the horizon or to return the middle year.\n",
" \n",
"If `realization` is a list, the function returns a dictionary. Otherwise, it will return either a string or ['start_yr', 'end_yr'], depending on `return_horizon`. For entries that it fails to find in the csv, or for instances where a given warming level is not reached, the function returns None."
"\n",
"It returns either a string or `['start_yr', 'end_yr']`, depending on `return_horizon`. For entries that it fails to find in the database, or for instances where a given warming level is not reached, the function returns None (or `[None, None]`).\n",
"\n",
"If `realization` is a list of the accepted types, or a DataArray or a DataFrame, the function returns a sequence of the same size (and with the same index, if relevant). It can happen that a requested model's name was not found exactly in the database, but that arguments allowed for a relaxed search (`ignore_member = True` or regex in `realization`). In that case, the _selected_ model doesn't have the same name as the requested one and this information is only shown in the log, unless one passes `output='selected'` to receive a dictionary instead where the keys are the _selected_ models in the database."
]
},
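As referenced above, a minimal, hedged sketch of the `output='selected'` form (the realization strings below are illustrative placeholders, not entries guaranteed to exist in the database):

    # Hedged sketch: placeholder realization strings following 'mip-era_source_experiment_member'.
    selected = xs.get_warming_level(
        ["CMIP6_CanESM5_ssp585_r1i1p1f1", "CMIP6_CanESM5_ssp245_r1i1p1f1"],
        wl=2,
        window=20,
        output="selected",
    )
    # Keys are the models actually matched in the database; values are the horizons.
    print(selected)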
{
@@ -129,7 +134,7 @@
},
"outputs": [],
"source": [
"# Multiple entries, returns a dictionary\n",
"# Multiple entries, returns a list of the same length\n",
"print(\n",
" xs.get_warming_level(\n",
" [\n",
@@ -140,7 +145,6 @@
" ],\n",
" wl=2,\n",
" window=20,\n",
" return_horizon=False,\n",
" )\n",
")\n",
"# Returns a list\n",
Expand Down Expand Up @@ -174,19 +178,19 @@
"\n",
"### Method #1: Subsetting datasets by warming level\n",
"\n",
"``xs.subset_warming_level`` can be used to subset a dataset for a window over which a given global warming level is reached. Since the datasets will still have a 'time' axis after the subsetting, only a single warming level can be processed at a time, as to prevent overlapping `time` coordinates. A new dimension named `warminglevel` is created by the function, and thus it would in theory be possible to concatenate multiple results alongside that new dimension, but NaNs would need to be managed for subsequent steps.\n",
"``xs.subset_warming_level`` can be used to subset a dataset for a window over which a given global warming level is reached. A new dimension named `warminglevel` is created by the function.\n",
"\n",
"The function calls `get_warming_level`, so the arguments are essentially the same.:\n",
"\n",
"- `ds`: input dataset.\n",
"- `wl`: warming level.\n",
"- `window`: Number of years in the centered window during which the warming level is reached. Note that in the case of an even number, the IPCC standard is used (-n/2+1, +n/2).\n",
"- `tas_baseline_period`: The period over which the warming level is calculated, equivalent to \"+0°C\". Defaults to 1850-1900.\n",
"- `ignore_member`: The default `warming_level_csv` only contains data for 1 member. If you want a result regardless of the realization number, set this to True.\n",
"- `ignore_member`: The default database only contains data for 1 member. If you want a result regardless of the realization number, set this to True.\n",
"- `to_level`: Contrary to other methods, you can use \"{wl}\", \"{period0}\" and \"{period1}\" in the string to dynamically include `wl`, 'tas_baseline_period[0]' and 'tas_baseline_period[1]' in the `processing_level`.\n",
"- `wl_dim`: The string used to fill the new `warminglevel` dimension. You can use \"{wl}\", \"{period0}\" and \"{period1}\" in the string to dynamically include `wl`, `tas_baseline_period[0]` and `tas_baseline_period[1]`. If None, no new dimension will be added.\n",
"- `wl_dim`: The string used to fill the new `warminglevel` dimension. You can use \"{wl}\", \"{period0}\" and \"{period1}\" in the string to dynamically include `wl`, `tas_baseline_period[0]` and `tas_baseline_period[1]`. Or you can use `True` to have a float coordinate with units of °C. If None, no new dimension will be added.\n",
" \n",
"If the source, experiment, (member), and warming level are not found in the csv. The function returns None."
"If the source, experiment, (member), and warming level are not found in the database. The function returns None."
]
},
{
@@ -206,15 +210,56 @@
" frequency=\"day\",\n",
").to_dataset()\n",
"\n",
"display(\n",
" xs.subset_warming_level(\n",
" ds,\n",
" wl=2,\n",
" window=20,\n",
" )\n",
"xs.subset_warming_level(\n",
" ds,\n",
" wl=2,\n",
" window=20,\n",
")"
]
},
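Building on the `wl_dim` and `to_level` formatting described earlier, a hedged sketch (the label formats are illustrative only; `ds` is the dataset extracted above):

    # Hedged sketch: formatted labels for the new dimension and the processing level.
    out = xs.subset_warming_level(
        ds,
        wl=2,
        window=20,
        wl_dim="+{wl}°C vs {period0}-{period1}",
        to_level="warminglevel-{wl}",
    )
    print(out)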
{
"cell_type": "markdown",
"id": "60f63d4d-38b3-48a1-af88-7170af99c9ed",
"metadata": {},
"source": [
"#### Vectorized subsetting\n",
"\n",
"The function can also vectorize the subsetting over multiple warming levels or over a properly constructed \"realization\" dimension. In that case, the original time axis can't be preserved. It is replaced by a fake one starting in 1000. However, as this process is a bit complex, the current xscen version only supports this if the data is annual. As the time axis doesn't carry any information, a `warminglevel_bounds` coordinate is added with the time bounds of the subsetting. If a warming level was not reached, a NaN slice is inserted in the output dataset.\n",
"\n",
"This option is to be used when \"scalar\" subsetting is not enough, but you want to do things differently than `produce_horizons`.\n",
"\n",
"Here, we'll open all experiments into a single ensemble dataset where the `realization` dimension is constructed exactly as `get_warming_level` expects it to be. We'll also average the daily data to an annual scale."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e394f0d6-3cb2-4e77-a221-fc1c1bd07475",
"metadata": {},
"outputs": [],
"source": [
"ds = pcat.search(\n",
" processing_level=\"extracted\",\n",
" member=\"r1.*\",\n",
" frequency=\"day\",\n",
").to_dataset(\n",
" # Value of the \"realization\" dimension will be constructed by concatenaing those fields with a '_'\n",
" create_ensemble_on=[\"mip_era\", \"source\", \"experiment\", \"member\"]\n",
")\n",
"ds = ds.resample(time=\"YS\").mean()\n",
"ds"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74320396-09e8-40ea-80ee-56ec7ad567c1",
"metadata": {},
"outputs": [],
"source": [
"xs.subset_warming_level(ds, wl=[1.5, 2, 3], wl_dim=True, to_level=\"warming-level\")"
]
},
{
"cell_type": "markdown",
"id": "c02b599e",
@@ -428,11 +473,6 @@
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
5 changes: 0 additions & 5 deletions docs/notebooks/6_config.ipynb
@@ -368,11 +368,6 @@
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",