[REQUEST]: HighResMIP HadGEM vars for ocean sound speed calculation #72
Comments
ping @gzt5142 |
Ok I merged #73, let's see how that goes... thumbs pressed |
Hmm, the full runs failed with
Here are the respective Dataflow jobs. @cisaacstern, what could cause this issue? I will try to dig into this a bit more later. Sorry for the delay. |
When we processed these with Kerchunk, we did them as two datasets. Should we have split this issue into two different issues, with
in one and
in the other? |
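For context, a Kerchunk-referenced dataset like these is typically opened through xarray's zarr engine over a reference filesystem; a minimal sketch, assuming hypothetical reference filenames and HTTPS-served source files (neither is given in this thread):

```python
import xarray as xr

# Hypothetical reference files for the two datasets (hist-1950 and
# highres-future); the real filenames/locations are assumptions.
refs = [
    "hadgem3-gc31-hh_hist-1950.json",
    "hadgem3-gc31-hh_highres-future.json",
]
for ref in refs:
    ds = xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": {"fo": ref, "remote_protocol": "https"},
        },
    )
    print(ds)
```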
@rsignell no, that should not be an issue. We process every dataset separately anyway. There is something else going on. Will look into this after the talk 😆 |
The "regular hypercube" error appears to be the same issue discussed in pangeo-forge/pangeo-forge-recipes#520. If so, this would seem to be a corner-case bug related to certain chunking scenarios. As documented on the linked issue, the next step there is for @tom-the-hill to open a PR with a minimal failing test that reproduces it. |
@cisaacstern or @jbusecke Any updates here? |
Unfortunately we are still blocked by the above bug in PGF-recipes. The good news is that once that is fixed we might be able to get the data straight into the public bucket! Is there a concrete deadline? |
Nope. No deadline. I was just curious about the status. Thanks! |
Ok, I think we have temporarily removed the main roadblock here (gnarly, gnarly stuff really). You can check which of the requested iids have already been uploaded with:

```python
import intake

def zstore_to_iid(zstore: str):
    # this is a bit whacky, to account for the different ways old/new stores are named
    return '.'.join(
        zstore.replace('gs://', '').replace('.zarr', '').replace('.', '/').split('/')[-11:-1]
    )

iids_requested = [
    'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514',
    'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.so.gn.v20200514',
    'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.thetao.gn.v20200514',
    'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.thetao.gn.v20200514',
]

# uncomment/comment lines to swap catalogs
url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json"
col = intake.open_esm_datastore(url)

iids_all = [zstore_to_iid(z) for z in col.df['zstore'].tolist()]
iids_uploaded = [iid for iid in iids_all if iid in iids_requested]
iids_uploaded
```

Since we got 'some' of the urls during #73, I am currently assuming this will just resolve itself with time, but there might be a bug either in pangeo-forge-esgf or the ESGF API itself that keeps some of the supposedly available urls from being returned (I suspect #119 might be similar). Let's keep an eye on this for now and I'll try to investigate more deeply later. |
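For illustration, here is what zstore_to_iid yields on a hypothetical store path that follows the CMIP6 GCS bucket layout (the path itself is an assumption, not taken from the catalog):

```python
# Hypothetical store path, for illustration only. Note the trailing slash:
# it adds an empty final element, so the [-11:-1] slice keeps exactly the
# ten facet components of the instance id.
zstore = ('gs://cmip6/CMIP6/HighResMIP/NERC/HadGEM3-GC31-HH/hist-1950/'
          'r1i1p1f1/Omon/thetao/gn/v20200514/')

print(zstore_to_iid(zstore))
# CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.thetao.gn.v20200514
```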
I just checked, and they are all ingested! Please reopen if there are issues on your end. |
Meeep. That is not looking great... I hope I can fix these soon (#76 is relevant). |
Darn. Very much still interested in this @jbusecke. Thanks for continuing to push! |
I'll get there... |
Check this out @andreall!! |
Just as a heads-up, these jobs are absolutely massive, and for now I have to babysit them manually. For whatever reason, the other experiment_id seemed unavailable (from the ESGF side) at the time of running these, so please feel free to ping me in a few days. Eventually I think we will be able to handle such large datasets better with more efficient downloading upstream. |
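For anyone who wants to cross-check availability from the ESGF side themselves, here is a minimal sketch against the public esg-search REST API (the index node URL is an assumption; any live ESGF index node should work):

```python
import requests

# CMIP6 search facets for one of the requested datasets; swap
# experiment_id (hist-1950 / highres-future) and variable_id
# (thetao / so) to probe the others.
params = {
    "type": "File",
    "mip_era": "CMIP6",
    "activity_id": "HighResMIP",
    "source_id": "HadGEM3-GC31-HH",
    "experiment_id": "hist-1950",
    "member_id": "r1i1p1f1",
    "table_id": "Omon",
    "variable_id": "thetao",
    "grid_label": "gn",
    "format": "application/solr+json",
    "limit": 0,  # only the total count is needed here
}
resp = requests.get("https://esgf-node.llnl.gov/esg-search/search", params=params)
print(resp.json()["response"]["numFound"], "files reported by the index")
```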
Yes, the HighResMIP data is massive. This is only 2 variables! |
List of requested iids
Description
We are working on climate change impacts on ocean sound speed, and this dataset is believed to be the best available for this purpose. It amounts to about 3 TB of data files.
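Since the stated goal is an ocean sound speed calculation from so (salinity) and thetao (potential temperature), here is a minimal sketch of the TEOS-10 pathway using the GSW-Python package; the single-point values are placeholders, and with the real data each step applies elementwise to the xarray fields:

```python
import gsw  # TEOS-10 Gibbs SeaWater toolbox (GSW-Python)

# Placeholder values for a single grid point; in practice these come from
# the so/thetao fields plus the lev (depth, m) and lon/lat coordinates.
SP, pt = 35.0, 10.0            # so (practical salinity), thetao (degC)
lon, lat, depth = -30.0, 45.0, 500.0

p = gsw.p_from_z(-depth, lat)          # sea pressure (dbar) from depth
SA = gsw.SA_from_SP(SP, p, lon, lat)   # Absolute Salinity (g/kg)
CT = gsw.CT_from_pt(SA, pt)            # Conservative Temperature (degC)
c = gsw.sound_speed(SA, CT, p)         # sound speed (m/s)
print(f"sound speed: {c:.1f} m/s")
```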