QuantileDeltaMapping adjustment in Dask fails if training dataset has been loaded #1678

saschahofmann opened this issue Mar 13, 2024 · 0 comments · Fixed by #1679
Labels: bug
Setup Information

  • Xclim version: 0.47.0
  • Python version: 3.11.6
  • Operating System: Ubuntu 18.04

Description

When I modify the example from the docs to run QuantileDeltaMapping with Dask, it fails during adjustment if I load the dataset after training.

As you can see in the example below, all I did was add chunking to ref and sim. I then load qdm.ds to make sure that training is actually performed. In my real-world example I want to do this because I will adjust multiple simulations with the same training dataset; in fact, the same xclim example recommends doing this.
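
Roughly, this is the pattern I'm after (a sketch; `simulations` stands in for my own list of chunked simulation DataArrays):

qdm = QuantileDeltaMapping.train(ref, hist, group="time", kind="+")
qdm.ds.load()  # compute the training dataset once, so it is not redone per simulation
adjusted = [qdm.adjust(sim) for sim in simulations]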

The error message (TypeError: object of type 'NoneType' has no len()) points to xclim/sdba/base.py:606.
These are the relevant lines:

                badchunks.update(
                    {
                        dim: chunks.get(dim)
                        for dim in reduced_dims
                        if len(chunks.get(dim)) > 1
                    }
                )

reduced_dims contains the group and quantiles dimensions, which aren't in chunks after I call load on qdm.ds. That's why chunks.get(dim) returns None, which explains the error message.

In fact, just two lines above, xclim already seems to anticipate this case by falling back to an empty list (L599: if len(chunks.get(dim, [])) > 1).

I suggest using the same approach to fix this issue. Happy to make a PR.
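
Concretely, the change I have in mind would reuse that same fallback (a sketch, not a tested patch):

badchunks.update(
    {
        dim: chunks.get(dim)
        for dim in reduced_dims
        if len(chunks.get(dim, [])) > 1  # fall back to [] when the dim has no chunks
    }
)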

Steps To Reproduce

import xarray as xr
import numpy as np
from xclim.sdba import QuantileDeltaMapping

t = xr.cftime_range("2000-01-01", "2030-12-31", freq="D", calendar="noleap")
ref = xr.DataArray(
    (
        -20 * np.cos(2 * np.pi * t.dayofyear / 365)
        + 2 * np.random.random_sample((t.size,))
        + 273.15
        + 0.1 * (t - t[0]).days / 365
    ),  # "warming" of 1K per decade,
    dims=("time",),
    coords={"time": t},
    attrs={"units": "K"},
).chunk({"time": -1})
sim = xr.DataArray(
    (
        -18 * np.cos(2 * np.pi * t.dayofyear / 365)
        + 2 * np.random.random_sample((t.size,))
        + 273.15
        + 0.11 * (t - t[0]).days / 365
    ),  # "warming" of 1.1K per decade
    dims=("time",),
    coords={"time": t},
    attrs={"units": "K"},
).chunk({"time": -1})

ref = ref.sel(time=slice(None, "2015-01-01"))
hist = sim.sel(time=slice(None, "2015-01-01"))

qdm = QuantileDeltaMapping.train(ref, hist, group="time", kind="+")
qdm.ds.load()
adjusted = qdm.adjust(sim)

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to address this bug.

Zeitsperre added a commit that referenced this issue Mar 21, 2024
See issue #1678 for an explanation of the problem

### What kind of change does this PR introduce?

Returns an empty list if a reduced dimension is not in the dataset's chunks when checking for bad chunks in map_blocks in xclim/sdba/base.py.

### Does this PR introduce a breaking change?
No.