Converting notebooks from COSIMA Cookbook to ACCESS-NRI intake catalog #313
Comments
#310 seems relevant |
Largely this is up to the COSIMA community, but it is likely a choice you will be forced to make for a few reasons:
So I'd suggest taking this to a COSIMA meeting and making the community aware that a decision needs to be made, and probably the sooner the better. One downside of dropping the Cookbook is that there is currently no GUI for searching the Intake Database. However, there are some powerful methods for searching (filtering) the intake catalogue: https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/quickstart.html#data-discovery. Also, there are plans for some tools to assist with data discovery from ACCESS-NRI, and we're working on prototypes we hope to share soon. |
Thanks @aidanheerdegen! |
The advice we had from ACCESS-NRI on this, back around the workshop last year, was that they didn't advise switching everything to the intake catalogue at that time, because it was so new and untested. But @aidanheerdegen @dougiesquire, if you think the intake catalogue is ready for us to switch everything over completely, I think we should do that. We discussed this previously at a COSIMA meeting and agreed we should switch when the time was right (which is maybe now?). |
Not much to add from me:
|
@anton-seaice makes a good point about retaining existing capability for those who rely on it, identifying deficiencies and setting up a new testing infrastructure. @dougiesquire is also correct that datasets that are missing from intake should be identified. @adele-morrison I'm not in a position to say if the intake catalogue is ready or not, but we're rapidly approaching a point where there isn't really an option, so we need to make sure it is. So, synthesising from above, some steps to consider:
Might be good to break this out into a GitHub project, make some issues etc. I would do this, but I'm not in a position to assist much I'm afraid. |
Could the culture be shifted going forward so that, when a person or group runs an experiment that is meant to be open to the community, creating an Intake-ESM datastore is part of the process of making it available and documented? My view is that the "authors" of specific experiments are best placed to create and update datastores for the catalog. @dougiesquire appears to have done a great job documenting how that can be done for those who've never built a catalog before ("I want to catalog my own data to make it easy to find and load"). This new catalog is a big step forward, IMO, but will require a community effort to maintain. |
Hi All, So from the MED team perspective, I asked @max-anu to look at converting the recipes to use the intake catalog. #310 is supposed to be a draft and everyone is welcome to participate and/or comment on it. Happy to discuss the plan going ahead. |
I have started a repo on the ACCESS-NRI organisation to automate testing of the recipes: This is based on what I have set up for the ESMValTool recipes: https://github.com/ACCESS-NRI/ESMValTool-workflow?tab=readme-ov-file#recipes-current-status Will keep you posted. |
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there: |
@max-anu may have some updates. |
I've had a quick look at this and it seems that many of the changes @max-anu made are in the My only suggestion is for others to check the existing branch and then to merge? |
It's possible to split @max-anu's work into multiple PRs if that helps for the workshop. We can do one per notebook. |
Yeah, I think that would be perfect! |
Agree!! |
Maybe this could be good practice for the group doing PR review training to hone their skills on during the hackathon? |
Sure. But a bunch of smaller PRs would be even better. And more manageable in general. Even the “PR review master” might have trouble dealing with a PR for which they need to run all the notebooks |
Oh yes, that's what I meant. A bunch of small PRs and then each person in the training could work on reviewing one each. |
I'll have a look today or tomorrow. |
That's one way to do it @edoddridge. I want to keep @max-anu's contributions. I'll have a look and report back to you. |
I think that level of usage is very much in the noise. To put some numbers on this, the estimated usage from ARE for an 8 hour session with XX-Large is
It's not a big deal. |
I think you/we/everybody should go full on using the ARE resources and save your time and effort for analyses, thinking, fun activities, etc. |
Haha, great! I might be scarred from my beginner days when I was told to get better at dask instead of opening bigger |
I'm trying to understand how to proceed here. Personally I'm hitting big slowdown issues, much worse than the 2x reported above. @julia-neme seems to be hitting them as well, and we have an open channel of communication providing mental support to each other. But if @julia-neme and I are hitting these, then I expect most COSIMA users to hit them as well. I see a bunch of kwargs proposed here by @anton-seaice. Do we need those? Should we always have them? If so, we need to have them in the PRs for the intake conversion... Otherwise they are hidden in a comment in an issue.
I'm a bit in despair. I'm very intrigued and interested in understanding the nitty gritty details of dask and xarray, but in terms of what we suggest all COSIMA users do, I would like to give them something that works. (Also for myself btw, most times I'd like just to copy a line and paste it and load the data -- I don't wanna be thinking of xarray internals etc.) If we should have these kwargs in, then let's have them there all the time? At the moment I'm trying to push through several of the open PRs for intake conversion and I'm hitting these issues. Perhaps we should pause work on those PRs until I understand what's happening? |
Btw I'm hitting these issues even when I'm trying to load 1 variable... so no concatenation (or at least no concatenation as far as I understand!) |
@navidcy could you please point me to an example of your issue(s)? |
I'll try to do that later... Or @adele-morrison / @julia-neme feel free to do so? #344 (comment) might be one? |
Apologies for the confusion about this. These kwargs help with sea-ice (cice) data but not with ocean (mom) data. It's similar to the MOM only adds 1d coordinates (i.e. xt_ocean, yt_ocean and time) to its output: Whilst cice adds 2d coordinates to its output (geographic lons/lats at T and U points): When |
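To make the 1d versus 2d coordinate distinction above concrete, here is a minimal sketch using small synthetic datasets rather than real model output. The helper names `make_mom_like` and `make_cice_like`, the variable and dimension names (`temp`, `aice`, `TLON`, `TLAT`, `nj`, `ni`), and the specific keyword arguments are all assumptions for illustration; the kwargs originally proposed in this thread are not reproduced here. The three options shown are standard `xarray.concat` arguments that avoid comparing non-time-dependent variables across files.

```python
import numpy as np
import xarray as xr


def make_mom_like(t):
    """One synthetic 'file' of MOM-style output: only 1d coordinates."""
    return xr.Dataset(
        {"temp": (("time", "yt_ocean", "xt_ocean"), np.full((1, 2, 3), float(t)))},
        coords={"time": [t], "yt_ocean": [-60.0, -50.0], "xt_ocean": [0.0, 1.0, 2.0]},
    )


def make_cice_like(t):
    """One synthetic 'file' of CICE-style output: 2d lon/lat on (nj, ni)."""
    lon, lat = np.meshgrid([0.0, 1.0, 2.0], [-60.0, -50.0])  # each has shape (2, 3)
    return xr.Dataset(
        {"aice": (("time", "nj", "ni"), np.full((1, 2, 3), float(t)))},
        coords={"time": [t], "TLON": (("nj", "ni"), lon), "TLAT": (("nj", "ni"), lat)},
    )


# Assumed kwargs (not taken from the thread): only concatenate variables that
# already have a time dimension, and take everything else from the first file
# without comparing it across files.
fast_kwargs = dict(data_vars="minimal", coords="minimal", compat="override")

mom = xr.concat([make_mom_like(t) for t in range(2)], dim="time", **fast_kwargs)
cice = xr.concat([make_cice_like(t) for t in range(2)], dim="time", **fast_kwargs)

print(mom["temp"].dims)   # time plus the two 1d coordinate dimensions
print(cice["TLON"].dims)  # the 2d coordinate keeps its (nj, ni) dimensions
```

With real cice files, the 2d `TLON`/`TLAT` arrays are the variables that would otherwise be compared file by file, which is the likely reason options of this kind matter for sea-ice output and not for MOM output.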
@navidcy, I've looked into this and replied, but I didn't encounter the big slowdown issues you refer to here. I think this one might mostly have been a case of comparing apples to oranges. |
I can see a large number of |
Would collating the branches collate the pull requests too? I think I've found working through these that most of the notebooks require more work than just an intake conversion, so merging into one pull request would potentially generate discussion on different topics all mixed up in one PR. |
@julia-neme oh no, absolutely not. I just meant doing the administrative task of working out where all the various branches/PRs are up to, and dumping that information in the issue description, given a lot of them aren't cross-linked (although I've just realized I can't edit the initial issue text myself). |
Ohh I guess there are a lot of branches with conversions already complete that could be cleaned up? |
@julia-neme could you edit the first comment on this issue and mark with ticks the recipes you know have been converted? |
Yep! I'll do it on Monday. Note that not all notebooks have a PR requesting a conversion. I'm not sure how all those PRs/branches happened. |
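As a rough aid for the stocktake discussed above, the sketch below scans notebook files and reports which ones still mention `cosima_cookbook` in their code cells. It is only a heuristic under stated assumptions: the function names `classify_source`, `notebook_status` and `checklist` are hypothetical, the classification relies on plain substring matching, and it knows nothing about branches or open PRs.

```python
import json
from pathlib import Path


def classify_source(source):
    """Classify code text by the data-loading library it mentions."""
    uses_cookbook = "cosima_cookbook" in source
    uses_intake = "intake" in source
    if uses_cookbook and uses_intake:
        return "mixed"
    if uses_cookbook:
        return "cookbook"
    if uses_intake:
        return "intake"
    return "neither"


def notebook_status(path):
    """Read a .ipynb file (plain JSON) and classify its code cells."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    # A cell's "source" may be a string or a list of strings; "".join handles both.
    source = "\n".join(
        "".join(cell.get("source", []))
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    )
    return classify_source(source)


def checklist(root):
    """Return markdown task-list lines, ticked when only intake is mentioned."""
    lines = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        status = notebook_status(path)
        tick = "x" if status == "intake" else " "
        lines.append(f"- [{tick}] {path.name} ({status})")
    return lines


if __name__ == "__main__":
    # Run from the root of a local checkout of the recipes repository.
    print("\n".join(checklist(".")))
```

The output is a markdown task list that could be pasted into an issue description and corrected by hand where the heuristic gets it wrong.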
Not that I've ever tried it before, so take this suggestion with a grain of salt, but we could even stand up sub-issues for each recipe that hasn't been completed yet: https://dev.to/keracudmore/create-sub-issues-in-github-issues-409m |
Hey @marc-white, I think we've updated the list at the beginning with @navidcy and should have all the links. Hope it is useful. |
Could someone with the appropriate superpowers please add me to the project? I was just stymied in my attempts to push back my changes to the #356 branch. |
added you; let me know if you still have issues |
Apart from my Outlook client's stubborn refusal to open the invitation email at all, that seems to have done it! Thanks @navidcy ! |
Why do we need to have two ways for people to load output? At the moment, after #298, we have tutorials both for the cookbook and for the ACCESS-NRI Intake catalog:
https://cosima-recipes.readthedocs.io/en/latest/tutorials.html
Let's just keep the better of the two. Is the ACCESS-NRI Intake catalog the future? Let's make that the default in all examples and drop the cookbook then?
cc @angus-g, @aidanheerdegen, @AndyHoggANU, @aekiss, @adele-morrison, @anton-seaice, @PaulSpence, @rmholmes, @edoddridge, @micaeljtoliveira
Recipes
Bathymetry.ipynb (Intake conversion Bathymetry #352)
Relative_Vorticity
; take #2 #426)
Tutorials
ACCESS-OM2-GMD-Paper-Figs