HyCOM Public Zarr #163
Comments
Ohhhh that looks really cool! @dhruvbalwada might be interested in this? I guess this will work, but might not be as fast as on gcs. Wondering if we should have a badge for the 'cloud'? But either way, this would be dope to link in. |
Would be great to link to this! |
Great! @dhruvbalwada if you have some background on this dataset, do you have any interest in doing a bit of exploring on which of these Zarr stores would be useful? Seems like there are Zarr stores per variable as well as lagrangian vs eulerian versions. |
@norlandrhagen I think all of these will potentially be useful. (This dataset is very complementary to the LLC4320 data that was made available through Pangeo and has been used by many.) Is the discussion here just about providing a link to these datasets, or is this something that will cost LEAP, so that we have some resource constraints? |
@dhruvbalwada the former. It will be very beneficial to get an idea of how to present these stores in the catalog in a meaningful way. |
Happy to help with that, let me know what you would like me to actually do. |
Awesome! Thanks for the expertise @dhruvbalwada. I think a good start would be to see if you can access / catalog these Zarr stores. I think the data is here, but I haven't explored it yet. Also might be some clues here. The data producer / speaker, Shane Elipot, seems super nice and was eager to have people using his data. I bet you/we could reach out to him with questions. I think ideally we have a table of Zarr stores we want to add to the catalog + some metadata. ex:
|
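As a rough starting point for such a table, here is a minimal sketch that turns store paths into catalog rows. It only uses the two store paths quoted elsewhere in this thread; the regex patterns, the `describe` helper, and the column names are assumptions inferred from those two names, not something the data provider documents.

```python
import re

# The two store paths quoted in this thread; the real bucket holds more.
paths = [
    "s3://hycom-global-drifters/lagrangian/global_hycom_0m_step_1.zarr",
    "s3://hycom-global-drifters/eulerian/hycom12-1-rechunked-corr.zarr",
]

# Hypothetical naming patterns, guessed from the two names above only.
PATTERNS = {
    "lagrangian": re.compile(r"global_hycom_(?P<depth>\w+?)_step_(?P<step>\d+)\.zarr$"),
    "eulerian": re.compile(r"hycom12-(?P<step>\d+)-rechunked-corr\.zarr$"),
}

def describe(path):
    """Return one catalog row (a dict) for a store path."""
    kind = path.split("/")[-2]  # parent folder: "lagrangian" or "eulerian"
    match = PATTERNS[kind].search(path)
    if match is None:
        raise ValueError(f"unrecognised store name: {path}")
    return {
        "kind": kind,
        "step": int(match["step"]),
        "depth": match.groupdict().get("depth"),  # None when the name has no depth
        "url": path,
    }

rows = [describe(p) for p in paths]
for row in rows:
    print(row)
```

Any store whose name does not fit these guessed patterns raises an error rather than being silently miscatalogued, which seems safer while the naming scheme is still being worked out.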
Just played around with the data a bit, and wanted to note some points:
import s3fs
import xarray as xr
fs = s3fs.S3FileSystem(anon=True)
mapper = fs.get_mapper("s3://hycom-global-drifters/lagrangian/global_hycom_0m_step_1.zarr")
xr.open_dataset(mapper, engine='zarr')
but this doesn't:
|
This seems like a cool use case! Maybe we open up an issue in virtualizarr. It seems possible to merge the virtual zarrs. |
Bit of an update here. I'm working towards a Virtualizarr Zarr reader (zarr-developers/VirtualiZarr#262). This should allow us to combine a bunch of existing Zarr stores here into a single |
Do these steps have alignable dimensions though? If not then you're in DataTree territory... |
Totally, which would be very cool to have in virtualizarr as well! On some initial digging through the Eulerian stores, it seems like the step section of the path corresponds to time. In the README:
import s3fs
import xarray as xr
fs = s3fs.S3FileSystem(anon=True)
ds1 = xr.open_zarr(fs.get_mapper("s3://hycom-global-drifters/eulerian/hycom12-1-rechunked-corr.zarr"), chunks={})
ds2 = xr.open_zarr(fs.get_mapper("s3://hycom-global-drifters/eulerian/hycom12-2-rechunked-corr.zarr"), chunks={})
ds_concat = xr.concat([ds1,ds2], dim="time")
|
I'm sorry what? There are 24 separate zarr stores, just to hold different timesteps and different variables??? 🤦‍♂️ |
I mean at least you can use your new zarr virtual reader to combine them all into one sane icechunk store @norlandrhagen 😆 |
Let's not be too judgy, I bet this was one hell of a lift to produce and get into zarr (yeah I realize me advocating for not being judgy is rich 🤣 - overworked data manager hat off). But I agree that it is very nice that we can now combine this and make it more usable! Is each step a different release date for a bunch of floats (in which case we probably have to concatenate along a new dimension), or are these literally just split along a common time dimension? |
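To make the two options concrete, here is a small sketch using tiny synthetic datasets, not the real HYCOM stores. The variable name `u`, the `step` dimension name, and the idea of re-expressing time relative to each release are assumptions made for illustration only.

```python
import numpy as np
import xarray as xr

# Two tiny synthetic stand-ins for per-step stores (not the real data).
step1 = xr.Dataset(
    {"u": (("time", "x"), np.zeros((2, 3)))},
    coords={"time": [0, 1], "x": [10, 20, 30]},
)
step2 = xr.Dataset(
    {"u": (("time", "x"), np.ones((2, 3)))},
    coords={"time": [2, 3], "x": [10, 20, 30]},
)

# Option 1: the steps are consecutive slices of one shared time axis,
# so they concatenate along the existing "time" dimension.
along_time = xr.concat([step1, step2], dim="time")

# Option 2: each step is a separate release. Express time relative to the
# release (an assumption here) so the coordinates match, then stack along
# a new "step" dimension.
step2_relative = step2.assign_coords(time=[0, 1])
as_steps = xr.concat([step1, step2_relative], dim="step")

print(dict(along_time.sizes))
print(dict(as_steps.sizes))
```

If the time coordinates cannot be made to match like this, concatenating along a new dimension pads with NaNs under xarray's default outer join, which is one hint that a tree-like structure may be a better fit.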
Just requested support for |
Yes you're right - it's easy for me to say after being steeped in Zarr for the last year! 😅 Also yes even getting something into a slightly wonky Zarr store can be a huge amount of work. I'm just a bit concerned by the anti-pattern/misunderstanding this implies, of treating each Zarr store like a single chunked netCDF4 file, instead of treating one Zarr store as representing thousands of related netCDF files.
Does this dataset contain individual drifter timeseries? Looks like it's post-processed into a regular grid? |
Maybe once/if my ZarrV2 virtualizarr reader is in, we can share a reference that has all the combined zarr stores back to the data provider. |
100% aligned here! I think this might need some deeper understanding of the data. I think the original data is actual float timeseries (and the Eulerian stuff is an aggregation of stats!) |
FWIW, the speaker / data producer Shane Elipot seemed very eager for people to use this dataset. I think he would be happy to chat about any design choices etc. |
I'm at the pangeo showcase talk. Shane Elipot has a massive public ocean model Zarr output on the AWS public data program. I think it's split into 12 separate Zarr stores.
https://github.com/selipot/hycom-oceantrack?tab=readme-ov-file
Wondering if LEAP folks would find this useful? @jbusecke