Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix map_blocks for DataArray with cftime and sdba_encode_cf #1674

Merged
merged 3 commits into from
Mar 12, 2024

Conversation

aulemahal
Copy link
Collaborator

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduced

tl;dr : me fix issue.

I think this issue stems from a change in recent xarray, but the case was untested by xclim. The solution was to iterate over a list that is "disconnected" from the dictionary.

The issue occurs when a DataArray that has a coordinate made of cftime objects is passed to a map_blocks-wrapped function and xclim's global option sdba_encode_cf is True.

The previous encoding code was iterating over the coordinates with coords.items(). When it would modify a the cftime coord, this would fail as you can't modify a dictionary while it is being iterated upon.

Only happens if the option to encode is activated, only happens if there is a cftime coord and only happens on DataArray objects. I'm not 100% sure why, but I think Dataset.coords.items does not map directly to a dict. We usually use Dataset objects in function wrapped by map_blocks (or map_groups). The culprit here was the polynomial's trend computation which uses a DataArray.

Reason for the apparent dask-dependency

The interesting part is that this bug does not depend on the fact that the data is using dask or not. But it led me to discover another "issue" or, rather, "edgecase" .

In xscen.adjust we wrap the xclim call with options:

        with xc.set_options(sdba_encode_cf=True, sdba_extra_output=False):
            out = ADJ.adjust(sim_sel, **xclim_adjust_args)

Here, for DQM, the adjust method is wrapped by map_blocks. When sim_sel uses dask, the actual adjustment is, of course, not executed here. Only when the computation is triggered will the adjustment function be run, and only then will it call the polynomial detrending. But at that moment, the sdba_encode_cf option is not set anymore! So the issue was not raised for the "dask" case.

When sim_sel was not using dask, all the code was executed under the option statement and thus the issue was raised. Which is a bit dumb because the encoding is a micro-optimization made for dask, it's no use otherwise.

Does this PR introduce a breaking change?

No

@aulemahal aulemahal requested a review from huard March 7, 2024 21:53
@github-actions github-actions bot added the sdba Issues concerning the sdba submodule. label Mar 7, 2024
Copy link
Collaborator

@huard huard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Edgy edge case.

xclim/sdba/detrending.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added the approved Approved for additional tests label Mar 8, 2024
@coveralls
Copy link

Coverage Status

coverage: 90.196%. remained the same
when pulling b733b04 on fix-sdbaiter
into c16a8dc on main.

@Zeitsperre Zeitsperre merged commit bea276a into main Mar 12, 2024
19 checks passed
@Zeitsperre Zeitsperre deleted the fix-sdbaiter branch March 12, 2024 15:01
aulemahal added a commit to Ouranosinc/xscen that referenced this pull request Mar 15, 2024
<!-- Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [ ] (If applicable) Documentation has been added / updated (for bug
fixes / features).
- [ ] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [ ] CHANGES.rst has been updated (with summary of main changes).
- [ ] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added.

### What kind of change does this PR introduce?
Instead of setting the options only when calling xclim's `adjust`, we
set the options globally at import time.

This fixes the subtle issue noted in Ouranosinc/xclim#1674 : the context
options are not seen by all code when dask-backed data is used. as the
actual computation occurs outside of the context.

It also allows to change these options at run time.
`xc.set_options(sdba_encode_cf=False)` will fix #362, for example. This
was not possible earlier as the option context was within the function
and couldn't be modified by the user.

### Does this PR introduce a breaking change?
Yes, issue #362 will now occur on dask arrays too, until xclim's PR is
merged. But it can be circumvented as shown above.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests sdba Issues concerning the sdba submodule.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sdba map_blocks fails with non-dask inputs when sdba_encode_cf option is activated
4 participants