dOTC #1787

Merged 106 commits on Aug 7, 2024

Commits
08c8ae8
OTC
LamAdr Jun 17, 2024
51d8d67
comments
LamAdr Jun 17, 2024
56f60c5
more comments
LamAdr Jun 17, 2024
366dd82
keep_attrs and remove _adjust_src
LamAdr Jun 17, 2024
771a1b3
dOTC
LamAdr Jun 18, 2024
b9725ea
Merge remote-tracking branch 'origin/main' into dOTC
LamAdr Jun 18, 2024
984c663
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 18, 2024
98a8085
pre-compile
LamAdr Jun 18, 2024
709d1ba
Merge remote-tracking branch 'origin/dOTC' into dOTC
LamAdr Jun 18, 2024
ef0dc58
keep the right attrs
LamAdr Jun 18, 2024
456976c
restructure
LamAdr Jun 18, 2024
e7c7123
map_groups
LamAdr Jun 19, 2024
0d64085
Inputs are Datasets
LamAdr Jun 19, 2024
f6122ec
MultivariateAdjust from PR#1580
LamAdr Jun 25, 2024
7a19690
test OTC vs SBCK
LamAdr Jun 25, 2024
092151b
series_Dataset -> series_dataset
LamAdr Jun 25, 2024
8a10e3d
dropna, rename core_dims and reindex_like
LamAdr Jun 25, 2024
2f34427
Merge branch 'main' into dOTC
Zeitsperre Jun 25, 2024
6307d45
test dOTC vs SBCK
LamAdr Jun 26, 2024
36e3125
dOTC rename time, reindex
LamAdr Jun 26, 2024
1f24e89
reindex correction
LamAdr Jun 26, 2024
6b31c4f
Merge branch 'main' of https://github.com/Ouranosinc/xclim into dOTC
LamAdr Jun 26, 2024
9cca4e1
Merge branch 'dOTC' of https://github.com/LamAdr/xclim into dOTC
LamAdr Jun 26, 2024
51cb784
sqeuclidean
LamAdr Jul 2, 2024
2e3ca2a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2024
326f1a8
numItermax doc
LamAdr Jul 2, 2024
8246d8d
revert Multivariate classes
LamAdr Jul 2, 2024
d615fea
Merge branch 'main' into dOTC
Zeitsperre Jul 2, 2024
0347d59
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2024
b41812d
Update sdba_utils.py
Zeitsperre Jul 2, 2024
229dbe8
inherits from Adjust
LamAdr Jul 2, 2024
d7fb1ac
update tests
LamAdr Jul 2, 2024
60861cb
use map_groups dim
LamAdr Jul 2, 2024
8e3a8a9
Adjust class pass as kwarg and signature changes
LamAdr Jul 3, 2024
88d57dd
docstrings
LamAdr Jul 3, 2024
b073548
pre-merge
LamAdr Jul 3, 2024
0790f3a
numItermax -> num_iter_max
LamAdr Jul 3, 2024
07e049e
cholesky ref
LamAdr Jul 3, 2024
ae1e6ec
sim _is_hist
LamAdr Jul 3, 2024
615eb06
test_shape
LamAdr Jul 3, 2024
8b67168
kind
LamAdr Jul 3, 2024
37de293
spray_bins
LamAdr Jul 3, 2024
c79ad13
spry_bins formulation
LamAdr Jul 4, 2024
5740984
speedup
LamAdr Jul 4, 2024
3d72ea3
halt cholesky
LamAdr Jul 4, 2024
ea3a6bc
Sturges is evil
LamAdr Jul 4, 2024
685b507
Scott
LamAdr Jul 4, 2024
602ed3a
window
LamAdr Jul 4, 2024
05e7b87
remove
LamAdr Jul 8, 2024
69fe3b8
POT optional
LamAdr Jul 8, 2024
b892331
pot opt 2
LamAdr Jul 8, 2024
531ec22
pot opt 3
LamAdr Jul 8, 2024
045a87a
sort codespell
LamAdr Jul 9, 2024
e58ba0d
test NAs
LamAdr Jul 9, 2024
da31098
authors
LamAdr Jul 9, 2024
3ef5cf1
cholesky iterations
LamAdr Jul 10, 2024
7239002
docstrings
LamAdr Jul 10, 2024
386d2e3
kind bug
LamAdr Jul 11, 2024
7644100
dask enforces map_blocks shape
LamAdr Jul 11, 2024
ad8e7a1
pts_dim
LamAdr Jul 11, 2024
fdf432b
pot opt 4
LamAdr Jul 11, 2024
6c02dda
opt dep tests and workflows
LamAdr Jul 11, 2024
041c4c5
doc
LamAdr Jul 15, 2024
791d7a5
Merge branch 'main' into dOTC
Zeitsperre Jul 15, 2024
3e30b1f
distance unit is now bin_width
LamAdr Jul 15, 2024
0e550a9
POT optional dependence doc
LamAdr Jul 15, 2024
5a87120
adapt_freq_thresh
LamAdr Jul 15, 2024
23c8fc9
Merge branch 'dOTC' of https://github.com/LamAdr/xclim into dOTC
LamAdr Jul 15, 2024
4e1796e
do not adapt_freq sim
LamAdr Jul 16, 2024
0a8b1c2
comment
LamAdr Jul 16, 2024
d9f94c2
dropna how=all
LamAdr Jul 16, 2024
7320b1b
notebook
LamAdr Jul 17, 2024
e0aeed5
docstrings
LamAdr Jul 17, 2024
d0a7f8b
changelog
LamAdr Jul 17, 2024
25168c4
Merge branch 'main' into dOTC
Zeitsperre Jul 23, 2024
7267993
spray_bins -> jitter_inside_bins
LamAdr Jul 23, 2024
8a64a87
test dask
LamAdr Jul 23, 2024
d67b8b1
Merge branch 'dOTC' of https://github.com/LamAdr/xclim into dOTC
LamAdr Jul 23, 2024
eb31e88
@ looks cool
LamAdr Jul 23, 2024
915587e
cov_factor
LamAdr Jul 23, 2024
5a4997f
Merge branch 'main' into dOTC
Zeitsperre Jul 24, 2024
0781a0a
Merge branch 'dOTC' of https://github.com/LamAdr/xclim into dOTC
LamAdr Jul 24, 2024
3384cef
convert thresh units
LamAdr Jul 24, 2024
ad6240b
missing adapt_freq_thresh
LamAdr Jul 24, 2024
8ce3cb0
validate cov_factor
LamAdr Jul 24, 2024
c61de5c
'identity' -> None + notebook explanation
LamAdr Jul 24, 2024
dc860cb
cholesky rescale
LamAdr Jul 25, 2024
2d46bfa
transform options
LamAdr Aug 5, 2024
d2f4940
transform doc
LamAdr Aug 5, 2024
908289b
adapt_freq_thresh modification
LamAdr Aug 5, 2024
02b2e66
adapt_freq_thresh bug 2
LamAdr Aug 5, 2024
da5c51d
notebook transform
LamAdr Aug 5, 2024
7e0a30b
bin_* are dicts
LamAdr Aug 5, 2024
b15efc0
differences from SBCK
LamAdr Aug 5, 2024
5f7b747
bin_* can be float
LamAdr Aug 6, 2024
afa0432
kind and adapt_freq_thresh can be str
LamAdr Aug 6, 2024
7b6c4ad
notebook correlations : center title
LamAdr Aug 6, 2024
dd375a7
Merge branch 'main' into dOTC
Zeitsperre Aug 7, 2024
c6ec66f
list acceptable transforms
LamAdr Aug 7, 2024
b3e3a31
Merge branch 'dOTC' of https://github.com/LamAdr/xclim into dOTC
LamAdr Aug 7, 2024
df05bdd
docstring ORs
LamAdr Aug 7, 2024
9575db2
orcid
LamAdr Aug 7, 2024
167c406
Merge branch 'main' into dOTC
Zeitsperre Aug 7, 2024
b66e05a
importorskip SBCK
LamAdr Aug 7, 2024
1ddaa96
Merge branch 'dOTC' of https://github.com/LamAdr/xclim into dOTC
LamAdr Aug 7, 2024
73aa04c
Merge branch 'main' into dOTC
Zeitsperre Aug 7, 2024
3 changes: 3 additions & 0 deletions .zenodo.json
@@ -140,6 +140,9 @@
"affiliation": "Santander Meteorology Group, Instituto de Física de Cantabria (CSIC-UC), Santander, Spain",
"orcid": "0000-0001-9053-2542"
},
{
"name": "Lamarche, Adrien"
},
{
"name": "Wang, Hui-Min",
"affiliation": "National University of Singapore, Singapore, Singapore",
3 changes: 2 additions & 1 deletion AUTHORS.rst
@@ -44,4 +44,5 @@ Contributors
* Dante Castro <[email protected]> `@profesorpaiche <https://github.com/profesorpaiche>`_
* Sascha Hofmann <[email protected]> `@saschahofmann <https://github.com/saschahofmann>`_
* Javier Diez-Sierra <[email protected]> `@JavierDiezSierra <https://github.com/JavierDiezSierra>`_
* Hui-Min Wang `@Hem-W <https://github.com/Hem-W>`
* Hui-Min Wang `@Hem-W <https://github.com/Hem-W>`_
* Adrien Lamarche `@LamAdr <https://github.com/LamAdr>`_
5 changes: 3 additions & 2 deletions CHANGELOG.rst
@@ -4,12 +4,13 @@ Changelog

v0.52.0 (unreleased)
--------------------
Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`), Hui-Min Wang (:user:`Hem-W`), Éric Dupuis (:user:`coxipi`), Sarah Gammon (:user:`SarahG-579462`).
Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`), Hui-Min Wang (:user:`Hem-W`), Éric Dupuis (:user:`coxipi`), Sarah Gammon (:user:`SarahG-579462`), Adrien Lamarche (:user:`LamAdr`).

New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* ``xclim.sdba.nbutils.quantile`` and its child functions are now faster. If the module `fastnanquantile` is installed, it is used as the backend for the computation of quantiles and yields even faster results. (:issue:`1255`, :pull:`1513`).
* New multivariate bias adjustment class `MBCn`, giving a faster and more accurate implementation of the 'MBCn' algorithm (:issue:`1551`, :pull:`1580`).
* New multivariate bias adjustment class ``MBCn``, giving a faster and more accurate implementation of the 'MBCn' algorithm. (:issue:`1551`, :pull:`1580`).
* New multivariate bias adjustment classes ``OTC`` and ``dOTC``. (:pull:`1787`).

Bug fixes
^^^^^^^^^
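The principle behind the new ``OTC``/``dOTC`` classes is easiest to see in one dimension, where the optimal transport map between two equal-size empirical samples simply pairs their sorted values. The sketch below is a self-contained toy illustration of that principle (an additive 1-D reduction), not xclim's multivariate implementation; the function name `ot_correct_1d` is invented for this example.

```python
import numpy as np


def ot_correct_1d(ref, hist, sim):
    """Toy 1-D optimal transport correction (reduces to quantile matching).

    In 1-D, the optimal map between two equal-size empirical distributions
    pairs their sorted samples. Here the hist -> ref displacement is applied
    to sim, which is the additive ("+") flavour of the dOTC idea. This is a
    sketch of the principle only, not the multivariate algorithm in the PR.
    """
    ref_sorted = np.sort(ref)
    # Rank of each hist value: its position under the optimal 1-D pairing.
    ranks = np.argsort(np.argsort(hist))
    # Map each hist sample to the ref sample of the same rank.
    mapped = ref_sorted[ranks]
    # Transfer the model's temporal evolution onto the reference.
    return sim + (mapped - hist)


rng = np.random.default_rng(0)
ref = rng.normal(10.0, 1.0, 1000)   # observations, calibration period
hist = rng.normal(12.0, 2.0, 1000)  # biased model, same period
sim = hist + 1.5                    # "future" run: same bias plus a trend

scen = ot_correct_1d(ref, hist, sim)
print(round(scen.mean() - ref.mean(), 1))  # bias removed, +1.5 trend kept
```

Because `mapped` is a permutation of `ref`, the corrected historical distribution matches the reference exactly, while the simulated change signal is preserved.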
226 changes: 218 additions & 8 deletions docs/notebooks/sdba.ipynb
@@ -344,6 +344,7 @@
"metadata": {},
"source": [
"### Third example : Multi-method protocol - Hnilica et al. 2017\n",
"\n",
"In [their paper of 2017](https://doi.org/10.1002/joc.4890), Hnilica, Hanel and Puš present a bias-adjustment method based on the principles of Principal Components Analysis.\n",
"\n",
"The idea is simple: use principal components to define coordinates on the reference and on the simulation, and then transform the simulation data from the latter to the former. Spatial correlation can thus be conserved by taking different points as the dimensions of the transform space. The method was demonstrated in the article by bias-adjusting precipitation over different drainage basins.\n",
@@ -439,9 +440,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fourth example : Multivariate bias-adjustment (Cannon, 2018)\n",
"### Fourth example : Dynamical Optimal Transport Correction - Robin et al. 2019\n",
"Robin, Vrac, Naveau and Yiou presented the dOTC multivariate bias correction method in a [2019 paper](https://hess.copernicus.org/articles/23/773/2019/).\n",
"\n",
"This section replicates the \"MBCn\" algorithm described by [Cannon (2018)](https://doi.org/10.1007/s00382-017-3580-6). The method relies on some univariate algorithm, an adaption of the N-pdf transform of [Pitié et al. (2005)](https://ieeexplore.ieee.org/document/1544887/) and a final reordering step.\n",
"Here, we use optimal transport to find mappings between reference, simulated historical and simulated future data. Following these mappings, future simulation is corrected by applying the temporal evolution of model data to the reference.\n",
"\n",
"In the following, we use the Adjusted and Homogenized Canadian Climate Dataset ([AHCCD](https://open.canada.ca/data/en/dataset/9c4ebc00-3ea4-4fe0-8bf2-66cfe1cddd1d)) and CanESM2 data as reference and simulation, respectively, and correct both `pr` and `tasmax` together."
]
@@ -452,12 +454,14 @@
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"from xclim.core.units import convert_units_to\n",
"from xclim.testing import open_dataset\n",
"\n",
"dref = open_dataset(\n",
" \"sdba/ahccd_1950-2013.nc\", chunks={\"location\": 1}, drop_variables=[\"lat\", \"lon\"]\n",
").sel(time=slice(\"1981\", \"2010\"))\n",
"dref = open_dataset(\"sdba/ahccd_1950-2013.nc\", drop_variables=[\"lat\", \"lon\"]).sel(\n",
" time=slice(\"1981\", \"2010\")\n",
")\n",
"\n",
"# Fix the standard name of the `pr` variable.\n",
"# This allows the convert_units_to below to infer the correct CF transformation (precip rate to flux)\n",
@@ -468,12 +472,218 @@
" tasmax=convert_units_to(dref.tasmax, \"K\"),\n",
" pr=convert_units_to(dref.pr, \"kg m-2 s-1\"),\n",
")\n",
"dsim = open_dataset(\n",
" \"sdba/CanESM2_1950-2100.nc\", chunks={\"location\": 1}, drop_variables=[\"lat\", \"lon\"]\n",
")\n",
"dsim = open_dataset(\"sdba/CanESM2_1950-2100.nc\", drop_variables=[\"lat\", \"lon\"])\n",
"\n",
"dhist = dsim.sel(time=slice(\"1981\", \"2010\"))\n",
"dsim = dsim.sel(time=slice(\"2041\", \"2070\"))\n",
"dref"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we are going to correct precipitation multiplicatively to make sure it does not become negative. In this context, small precipitation values can lead to huge aberrations. This problem can be mitigated with `adapt_freq_thresh`. We also need to stack our variables into a `DataArray` before feeding them to `dOTC`.\n",
"\n",
"Since precipitation is treated multiplicatively, we have no choice but to use \"std\" for the `cov_factor` argument (the default), which means the rescaling of model data to the observed data scale is done independently for every variable. When all variables are additive, it is recommended to use the \"cholesky\" `cov_factor` instead, in which case the rescaling is done in a multivariate fashion."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ref = dref.where(dref.location == \"Amos\", drop=True).squeeze()\n",
"hist = dhist.where(dhist.location == \"Amos\", drop=True).squeeze()\n",
"sim = dsim.where(dsim.location == \"Amos\", drop=True).squeeze()\n",
"\n",
"ref = sdba.processing.stack_variables(ref)\n",
"hist = sdba.processing.stack_variables(hist)\n",
"sim = sdba.processing.stack_variables(sim)\n",
"\n",
"# This function has random components\n",
"np.random.seed(0)\n",
"\n",
"# Contrary to most algorithms in sdba, dOTC has no `train` method\n",
"scen = sdba.adjustment.dOTC.adjust(\n",
" ref,\n",
" hist,\n",
" sim,\n",
" kind={\n",
" \"pr\": \"*\"\n",
" }, # Since this bias correction method is multivariate, `kind` must be specified per variable\n",
" adapt_freq_thresh={\"pr\": \"3.5e-4 kg m-2 s-1\"}, # Idem\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Some analysis figures\n",
"\n",
"# Unstack variables and select a location\n",
"ref = sdba.processing.unstack_variables(ref)\n",
"hist = sdba.processing.unstack_variables(hist)\n",
"sim = sdba.processing.unstack_variables(sim)\n",
"scen = sdba.processing.unstack_variables(scen)\n",
"\n",
"fig = plt.figure(figsize=(10, 10))\n",
"gs = plt.matplotlib.gridspec.GridSpec(2, 2, fig)\n",
"ax_pr = plt.subplot(gs[0, 0])\n",
"ax_tasmax = plt.subplot(gs[0, 1])\n",
"ax_scatter = plt.subplot(gs[1, :])\n",
"\n",
"# Precipitation\n",
"hist.pr.plot(ax=ax_pr, color=\"c\", label=\"Simulation (past)\")\n",
"ref.pr.plot(ax=ax_pr, color=\"b\", label=\"Reference\", alpha=0.5)\n",
"sim.pr.plot(ax=ax_pr, color=\"y\", label=\"Simulation (future)\")\n",
"scen.pr.plot(ax=ax_pr, color=\"r\", label=\"Corrected\", alpha=0.5)\n",
"ax_pr.set_title(\"Precipitation\")\n",
"\n",
"# Maximum temperature\n",
"hist.tasmax.plot(ax=ax_tasmax, color=\"c\")\n",
"ref.tasmax.plot(ax=ax_tasmax, color=\"b\", alpha=0.5)\n",
"sim.tasmax.plot(ax=ax_tasmax, color=\"y\")\n",
"scen.tasmax.plot(ax=ax_tasmax, color=\"r\", alpha=0.5)\n",
"ax_tasmax.set_title(\"Maximum temperature\")\n",
"\n",
"# Scatter\n",
"ref.plot.scatter(x=\"tasmax\", y=\"pr\", ax=ax_scatter, color=\"b\", edgecolors=\"k\", s=20)\n",
"scen.plot.scatter(x=\"tasmax\", y=\"pr\", ax=ax_scatter, color=\"r\", edgecolors=\"k\", s=20)\n",
"sim.plot.scatter(x=\"tasmax\", y=\"pr\", ax=ax_scatter, color=\"y\", edgecolors=\"k\", s=20)\n",
"hist.plot.scatter(x=\"tasmax\", y=\"pr\", ax=ax_scatter, color=\"c\", edgecolors=\"k\", s=20)\n",
"ax_scatter.set_title(\"Variables distribution\")\n",
"\n",
"# Example mapping\n",
"max_time = scen.pr.idxmax().data\n",
"max_idx = np.where(scen.time.data == max_time)[0][0]\n",
"\n",
"scen_x = scen.tasmax.isel(time=max_idx)\n",
"scen_y = scen.pr.isel(time=max_idx)\n",
"sim_x = sim.tasmax.isel(time=max_idx)\n",
"sim_y = sim.pr.isel(time=max_idx)\n",
"\n",
"ax_scatter.scatter(scen_x, scen_y, color=\"r\", edgecolors=\"k\", s=30, linewidth=1)\n",
"ax_scatter.scatter(sim_x, sim_y, color=\"y\", edgecolors=\"k\", s=30, linewidth=1)\n",
"\n",
"prop = dict(arrowstyle=\"-|>,head_width=0.3,head_length=0.8\", facecolor=\"black\", lw=1)\n",
"ax_scatter.annotate(\"\", xy=(scen_x, scen_y), xytext=(sim_x, sim_y), arrowprops=prop)\n",
"\n",
"ax_pr.legend()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import gaussian_kde\n",
"\n",
"fig = plt.figure(figsize=(10, 5))\n",
"gs = plt.matplotlib.gridspec.GridSpec(1, 2, fig)\n",
"\n",
"tasmax = plt.subplot(gs[0, 0])\n",
"pr = plt.subplot(gs[0, 1])\n",
"\n",
"sim_t = sim.tasmax.to_numpy()\n",
"scen_t = scen.tasmax.to_numpy()\n",
"stack = np.vstack([sim_t, scen_t])\n",
"z = gaussian_kde(stack)(stack)\n",
"idx = z.argsort()\n",
"sim_t, scen_t, z = sim_t[idx], scen_t[idx], z[idx]\n",
"tasmax.scatter(sim_t, scen_t, c=z, s=1, cmap=\"viridis\")\n",
"tasmax.set_title(\"Tasmax\")\n",
"tasmax.set_ylabel(\"scen tasmax\")\n",
"tasmax.set_xlabel(\"sim tasmax\")\n",
"\n",
"sim_p = sim.pr.to_numpy()\n",
"scen_p = scen.pr.to_numpy()\n",
"stack = np.vstack([sim_p, scen_p])\n",
"z = gaussian_kde(stack)(stack)\n",
"idx = z.argsort()\n",
"sim_p, scen_p, z = sim_p[idx], scen_p[idx], z[idx]\n",
"pr.scatter(sim_p, scen_p, c=z, s=1, cmap=\"viridis\")\n",
"pr.set_title(\"Pr\")\n",
"pr.set_ylabel(\"scen pr\")\n",
"pr.set_xlabel(\"sim pr\")\n",
"\n",
"fig.suptitle(\"Correlations between input and output per variable\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This last plot shows the correlation between input and output per variable. Here we see a relatively strong correlation for all variables, meaning they are all taken into account when finding the optimal transport mappings. This is because we're using the (by default) `transform = 'max_distance'` argument. Were the data not transformed, the distances along the precipitation dimension would be very small relative to the temperature distances. Precipitation values would then be spread around at very low cost and have virtually no effect on the result. See this in action with `transform = None`.\n",
"\n",
"The chunks we see in the tasmax data are artefacts of the `bin_width`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fifth example : Multivariate bias-adjustment with multiple steps (Cannon, 2018)\n",
"\n",
"This section replicates the \"MBCn\" algorithm described by [Cannon (2018)](https://doi.org/10.1007/s00382-017-3580-6). The method relies on a univariate algorithm, an adaptation of the N-pdf transform of [Pitié et al. (2005)](https://ieeexplore.ieee.org/document/1544887/) and a final reordering step.\n",
"\n",
"As in the dOTC example, we use the Adjusted and Homogenized Canadian Climate Dataset ([AHCCD](https://open.canada.ca/data/en/dataset/9c4ebc00-3ea4-4fe0-8bf2-66cfe1cddd1d)) and CanESM2 data as reference and simulation, respectively, and correct both `pr` and `tasmax` together. This time, we chunk our data with Dask."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dref[\"pr\"] = dref.pr.chunk({\"location\": 1})\n",
"dref[\"tasmax\"] = dref.tasmax.chunk({\"location\": 1})\n",
"\n",
"dhist[\"pr\"] = dhist.pr.chunk({\"location\": 1})\n",
"dhist[\"tasmax\"] = dhist.tasmax.chunk({\"location\": 1})\n",
"\n",
"dsim[\"pr\"] = dsim.pr.chunk({\"location\": 1})\n",
"dsim[\"tasmax\"] = dsim.tasmax.chunk({\"location\": 1})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Perform an initial univariate adjustment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# additive for tasmax\n",
"QDMtx = sdba.QuantileDeltaMapping.train(\n",
" dref.tasmax, dhist.tasmax, nquantiles=20, kind=\"+\", group=\"time\"\n",
")\n",
"# Adjust both hist and sim, we'll feed both to the Npdf transform.\n",
"scenh_tx = QDMtx.adjust(dhist.tasmax)\n",
"scens_tx = QDMtx.adjust(dsim.tasmax)\n",
"\n",
"# remove == 0 values in pr:\n",
"dref[\"pr\"] = sdba.processing.jitter_under_thresh(dref.pr, \"0.01 mm d-1\")\n",
"dhist[\"pr\"] = sdba.processing.jitter_under_thresh(dhist.pr, \"0.01 mm d-1\")\n",
"dsim[\"pr\"] = sdba.processing.jitter_under_thresh(dsim.pr, \"0.01 mm d-1\")\n",
"\n",
"# multiplicative for pr\n",
"QDMpr = sdba.QuantileDeltaMapping.train(\n",
" dref.pr, dhist.pr, nquantiles=20, kind=\"*\", group=\"time\"\n",
")\n",
"# Adjust both hist and sim, we'll feed both to the Npdf transform.\n",
"scenh_pr = QDMpr.adjust(dhist.pr)\n",
"scens_pr = QDMpr.adjust(dsim.pr)\n",
"\n",
"# Stack variables : Dataset -> DataArray with `multivar` dimension\n",
"dref, dhist, dsim = (sdba.stack_variables(da) for da in (dref, dhist, dsim))"
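The notebook's remark about `transform='max_distance'` — that untransformed precipitation distances are negligible next to temperature distances — can be demonstrated with a tiny NumPy sketch. The min/max rescaling below is only an assumption in the spirit of that transform, not necessarily xclim's exact formula.

```python
import numpy as np

# Two toy samples with (tasmax [K], pr [kg m-2 s-1]) columns. The pr scale
# (~1e-5) is tiny next to the temperature scale (~1e2), so raw Euclidean
# distances are dominated entirely by temperature.
a = np.array([[290.0, 1e-5], [300.0, 5e-5]])
b = np.array([[299.9, 1e-5], [290.1, 5e-5]])


def pairwise_sq(x, y):
    # Squared Euclidean distance between every row of x and every row of y.
    d = x[:, None, :] - y[None, :, :]
    return (d**2).sum(-1)


# In raw units, each a-point matches the b-point with the closest
# temperature, and pr has no say in the pairing.
raw_match = pairwise_sq(a, b).argmin(axis=1)

# Rescale each variable to a comparable [0, 1] span (the spirit of the
# 'max_distance' idea; an illustrative assumption, not xclim's code).
lo = np.minimum(a.min(0), b.min(0))
span = np.maximum(a.max(0), b.max(0)) - lo
scaled_match = pairwise_sq((a - lo) / span, (b - lo) / span).argmin(axis=1)

print(raw_match, scaled_match)  # → [1 0] [0 1]
```

After rescaling, the pairing flips: precipitation differences now carry the same weight as temperature differences, so both variables shape the transport plan.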
51 changes: 51 additions & 0 deletions docs/references.bib
@@ -2152,3 +2152,54 @@ @article{droogers2002
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-0036464359&doi=10.1023%2fA%3a1015508322413&partnerID=40&md5=7322aaa4c6874878f5b1dab3c73c1718},
type = {Article}
}

@Article{robin_2019,
author = {Robin, Y. and Vrac, M. and Naveau, P. and Yiou, P.},
title = {Multivariate stochastic bias corrections with optimal transport},
journal = {Hydrology and Earth System Sciences},
volume = {23},
year = {2019},
number = {2},
pages = {773--786},
url = {https://hess.copernicus.org/articles/23/773/2019/},
doi = {10.5194/hess-23-773-2019}
}

@misc{robin_2021,
title = {{SBCK}: {Statistical} {Bias} {Correction} {Kit}},
copyright = {GPL-3},
shorttitle = {{SBCK}},
url = {https://github.com/yrobink/SBCK-python},
urldate = {2024-07-03},
author = {Robin, Yoann},
year = {2021},
}

@article{higham_1988,
title = {Computing a nearest symmetric positive semidefinite matrix},
journal = {Linear Algebra and its Applications},
volume = {103},
pages = {103-118},
year = {1988},
issn = {0024-3795},
doi = {https://doi.org/10.1016/0024-3795(88)90223-6},
url = {https://www.sciencedirect.com/science/article/pii/0024379588902236},
author = {Nicholas J. Higham},
abstract = {The nearest symmetric positive semidefinite matrix in the Frobenius norm to an arbitrary real matrix A is shown to be (B + H)/2, where H is the symmetric polar factor of B=(A + AT)/2. In the 2-norm a nearest symmetric positive semidefinite matrix, and its distance δ2(A) from A, are given by a computationally challenging formula due to Halmos. We show how the bisection method can be applied to this formula to compute upper and lower bounds for δ2(A) differing by no more than a given amount. A key ingredient is a stable and efficient test for positive definiteness, based on an attempted Choleski decomposition. For accurate computation of δ2(A) we formulate the problem as one of zero finding and apply a hybrid Newton-bisection algorithm. Some numerical difficulties are discussed and illustrated by example.}
}

@article{knol_1989,
title = "Least-squares approximation of an improper correlation matrix by a proper one",
abstract = "An algorithm is presented for the best least-squares fitting correlation matrix approximating a given missing value or improper correlation matrix. The proposed algorithm is based upon a solution for Mosier's oblique Procrustes rotation problem offered by ten Berge and Nevels. A necessary and sufficient condition is given for a solution to yield the unique global minimum of the least-squares function. Empirical verification of the condition indicates that the occurrence of non-optimal solutions with the proposed algorithm is very unlikely. A possible drawback of the optimal solution is that it is a singular matrix of necessity. In cases where singularity is undesirable, one may impose the additional nonsingularity constraint that the smallest eigenvalue of the solution be δ, where δ is an arbitrary small positive constant. Finally, it may be desirable to weight the squared errors of estimation differentially. A generalized solution is derived which satisfies the additional nonsingularity constraint and also allows for weighting. The generalized solution can readily be obtained from the standard “unweighted singular” solution by transforming the observed improper correlation matrix in a suitable way.",
keywords = "Missing value correlation, indefinite correlation matrix, IR-85889, tetrachoric correlation, constrained least-squares approximation",
author = "Knol, {Dirk L.} and {ten Berge}, {Jos M.F.}",
year = "1989",
doi = "10.1007/BF02294448",
language = "Undefined",
volume = "54",
pages = "53--61",
journal = "Psychometrika",
issn = "0033-3123",
publisher = "Springer",
number = "1",
}
5 changes: 3 additions & 2 deletions pyproject.toml
@@ -110,7 +110,7 @@ docs = [
"sphinxcontrib-bibtex",
"sphinxcontrib-svg2pdfconverter[Cairosvg]"
]
extras = ["fastnanquantile"]
extras = ["fastnanquantile", "POT"]
all = ["xclim[dev]", "xclim[docs]", "xclim[extras]"]

[project.scripts]
@@ -162,7 +162,7 @@ values = [

[tool.codespell]
skip = 'xclim/data/*.json,docs/_build,docs/notebooks/xclim_training/*.ipynb,docs/references.bib,__pycache__,*.gz,*.nc,*.png,*.svg,*.whl'
ignore-words-list = "absolue,astroid,bloc,bui,callendar,degreee,environnement,hanel,inferrable,lond,nam,nd,ressources,socio-economic,sie,vas"
ignore-words-list = "absolue,astroid,bloc,bui,callendar,degreee,environnement,hanel,inferrable,lond,nam,nd,ot,ressources,socio-economic,sie,vas"

[tool.coverage.run]
relative_files = true
Expand All @@ -176,6 +176,7 @@ pep621_dev_dependency_groups = ["all", "dev", "docs"]
[tool.deptry.package_module_name_map]
"scikit-learn" = "sklearn"
"pyyaml" = "yaml"
"POT" = "ot"

[tool.deptry.per_rule_ignores]
DEP001 = ["SBCK"]