Fast MBCn (a la groupies) #1580

Merged
merged 113 commits into from
Jul 24, 2024

coxipi
Contributor

@coxipi coxipi commented Jan 9, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

New MBCn TrainAdjust class. The train part finds adjustment factors for the npdf transform. The adjust part does the rest.

  • A single numpy function to perform all rotations of the npdf_transform makes the process faster
  • Grouping is handled using the same logic as in numpy_groupies. I initially tried to stop using map_blocks by using what I call the Big Dataset (BD) solution: a dataset that included the group-windowed blocks. This worked well but sometimes caused dask workers to die. Better chunking might have solved this problem. Instead of constructing a BD, we now simply loop over blocks and specify the time indices of each block (à la groupies) in the original datasets. The resulting code is a bit messier, but it seems to perform well.
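The index-based grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: the function name `windowed_group_indices`, its arguments, and the fixed 365-day calendar are assumptions made for the example. The idea is to compute, once, the integer time indices that belong to each windowed day-of-year group, and then index the original arrays with them inside a plain loop instead of building a larger regrouped dataset.

```python
import numpy as np


def windowed_group_indices(doy, window=3, n_days=365):
    """Map each day-of-year group to the time indices inside its centered window.

    Hypothetical helper: `doy` holds the day of year (1..n_days) of every time step.
    """
    doy = np.asarray(doy)
    half = window // 2
    indices = {}
    for g in range(1, n_days + 1):
        # Circular distance between each time step's day of year and the group center
        dist = np.abs(doy - g)
        dist = np.minimum(dist, n_days - dist)
        indices[g] = np.flatnonzero(dist <= half)
    return indices


# Two years of daily data on a 365-day calendar
doy = np.tile(np.arange(1, 366), 2)
values = np.arange(doy.size, dtype=float)

idx = windowed_group_indices(doy, window=3)
# The block for group 1 is a plain fancy-indexing view of the original data
block = values[idx[1]]
```

Since the indices refer to the original time axis, results computed on a block can be written back to the same positions without an explicit ungrouping step.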

The function also changes how windowed group blocks are handled throughout the computation. Now, a block keeps its form from the beginning to the end of the MBCn computation.

  • This contrasts with the current approach, which groups and ungroups blocks between each iteration of the NpdfTransform.
  • The standardization is performed on a block
  • The univariate bias-corrected data is kept as blocks, reordered, and only then are the blocks ungrouped
  • In the sdba notebook, it was suggested that we should give the univariate bias-corrected datasets to the npdf transform. But following (Cannon, 2018), we should input the raw datasets to the npdf transform. This change should not matter much, but it is necessary to perform the MBCn exactly as presented by Cannon.

All these changes will result in a different output for window>1 and our implementation should now match that of Cannon.
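For readers unfamiliar with the steps listed above, here is a rough NumPy sketch of what happens to one windowed block. It is a simplified illustration under stated assumptions, not the implementation in this PR: all function names are hypothetical, the univariate step inside the transform is a plain empirical quantile mapping rather than the mapping used by Cannon, and each block is assumed to be an array of shape (n_vars, n_time).

```python
import numpy as np


def standardize(block):
    """Center and scale each variable of a block along the time axis."""
    return (block - block.mean(axis=1, keepdims=True)) / block.std(axis=1, keepdims=True)


def random_rotation(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


def quantile_map(ref, hist, sim, n_quantiles=51):
    """Empirical quantile mapping of 1-D hist and sim onto ref (simplified)."""
    q = np.linspace(0, 1, n_quantiles)
    ref_q, hist_q = np.quantile(ref, q), np.quantile(hist, q)
    return np.interp(hist, hist_q, ref_q), np.interp(sim, hist_q, ref_q)


def npdf_transform(ref, hist, sim, n_iter=20, seed=0):
    """Run all rotations in one function; inputs have shape (n_vars, n_time)."""
    rng = np.random.default_rng(seed)
    n_vars = ref.shape[0]
    for _ in range(n_iter):
        rot = random_rotation(n_vars, rng)
        ref_r, hist_r, sim_r = rot @ ref, rot @ hist, rot @ sim
        for v in range(n_vars):
            hist_r[v], sim_r[v] = quantile_map(ref_r[v], hist_r[v], sim_r[v])
        # Rotate back; the inverse of an orthogonal matrix is its transpose
        hist, sim = rot.T @ hist_r, rot.T @ sim_r
    return hist, sim


def reorder(univariate, npdf_out):
    """Reorder each row of `univariate` so its ranks follow those of `npdf_out`."""
    ranks = npdf_out.argsort(axis=1).argsort(axis=1)
    return np.take_along_axis(np.sort(univariate, axis=1), ranks, axis=1)
```

In this sketch the raw (standardized) blocks go into `npdf_transform`, and its output is used only for its rank structure: `reorder` shuffles the separately computed univariate bias-corrected block so that it inherits the multivariate dependence, after which the block can be written back to its time indices.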

Does this PR introduce a breaking change?

No

Other information

  • It might be worthwhile to retest map_blocks to see whether, with the rest of the changes, it can offer good performance. It would make for cleaner code
  • Using BD would also simplify many things; it is worth re-exploring if it can maintain the performance

@aulemahal
Collaborator

Oops. I didn't mean to approve, only to comment.

Collaborator

@aulemahal aulemahal left a comment

This is an actual approval.

Big, beautiful job @coxipi! Two robot pieces: 🤲 🤖 🤖.

@github-actions github-actions bot added the approved Approved for additional tests label Jul 19, 2024
xclim/sdba/_adjustment.py (review thread: outdated, resolved)
@coxipi
Contributor Author

coxipi commented Jul 19, 2024

Nice!

Thanks for the review, which must have taken some juice too! I'm trying to make more PRs that are smaller...

@coxipi coxipi merged commit 1d91900 into main Jul 24, 2024
19 checks passed
@coxipi coxipi deleted the npdf_gpies branch July 24, 2024 01:51
Labels
  • approved: Approved for additional tests
  • docs: Improvements to documentation
  • indicators: Climate indices and indicators
  • sdba: Issues concerning the sdba submodule.
4 participants