Great write-up, really interesting. I just want to give my 2 cents:
Nothing about the problems described is specific to GRIB.
Once you know where the bytes in each chunk are and how to decode them, every problem you have described applies to any chunked multi-dimensional array dataset.
Therefore, any solution to this should not be specific to GRIB at all! The only part that needs to be specific to GRIB is the part that determines the locations of the bytes and how to decode them. Every step after that (and I agree with a lot of your ideas) should not have anything to do with GRIB specifically. All your cool caching layer ideas can be implemented entirely in the language of zarr / icechunk / seamless arrays.
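To make that separation concrete, here's a minimal sketch (all names, paths, offsets and the `read_chunk` helper are illustrative, not from any real codebase): the manifest is the only place GRIB would appear, and everything downstream is just generic byte-range fetching plus a pluggable decoder.

```python
# A minimal sketch of the separation argued for above (all names are
# illustrative): `manifest` is the ONLY place GRIB appears; everything
# downstream just fetches byte ranges and calls a pluggable decoder.
from typing import Callable

import numpy as np

# chunk index -> (path, byte offset, byte length),
# e.g. produced by a one-off GRIB scanning step
manifest: dict[tuple[int, ...], tuple[str, int, int]] = {
    (0, 0): ("s3://bucket/gfs.t00z.grib2", 0, 41_232),
    (0, 1): ("s3://bucket/gfs.t00z.grib2", 41_232, 39_870),
}

def read_chunk(
    key: tuple[int, ...],
    fetch: Callable[[str, int, int], bytes],  # generic byte-range fetcher
    decode: Callable[[bytes], np.ndarray],    # the only format-specific part
) -> np.ndarray:
    """Fetch and decode one chunk; nothing here knows about GRIB."""
    path, offset, length = manifest[key]
    return decode(fetch(path, offset, length))
```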
One way to do this would be to write a GRIB reader for VirtualiZarr. That's the only GRIB-specific step. This just brings us back to zarr-developers/VirtualiZarr#238, and thus finishes my attempt to nerd-snipe you :)
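For what it's worth, here's a hedged sketch of what that GRIB-specific step might hand to VirtualiZarr: a `ChunkManifest` whose entries point at the raw bytes of each GRIB message. The entry format follows VirtualiZarr's docs, but the API has evolved, so treat the constructor details as an assumption (paths, offsets and lengths here are made up).

```python
# Hypothetical output of a GRIB reader for VirtualiZarr: a manifest whose
# entries point at the raw bytes of each GRIB message in the original files.
# (Entry format follows VirtualiZarr's docs, but check the current API;
# paths, offsets and lengths here are made up.)
from virtualizarr.manifests import ChunkManifest

manifest = ChunkManifest(
    entries={
        "0.0": {"path": "s3://bucket/gfs.t00z.grib2", "offset": 0, "length": 41232},
        "0.1": {"path": "s3://bucket/gfs.t00z.grib2", "offset": 41232, "length": 39870},
    }
)
```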
> Nothing about the problems described is specific to GRIB.
I agree 100%! (Sorry, I should've written more about this in the text!)
Some of this text was originally in the hypergrib repo. But, as you say, it's clear that a lot of this isn't specific to GRIB. So I (clumsily) created this new repo to draw a distinction between GRIB-specific things (like hypergrib) and ideas that are more general.
I definitely agree that the caching should be a separate project! (And you could imagine an MVP caching system being pretty simple: maybe just ~100 lines of Python, connecting together existing tools).
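As an illustration of how small that MVP could be, here's a hedged sketch (assuming fsspec/s3fs as the storage layer; the `fetch_range` name and the cache size are made up): an in-memory LRU cache over byte-range reads, which is most of what a first-pass chunk cache needs.

```python
# A sketch of an MVP chunk cache (assuming fsspec/s3fs; `fetch_range` and the
# cache size are illustrative): memoize byte-range reads so repeated requests
# for the same chunk hit memory instead of object storage.
from functools import lru_cache

import fsspec

fs = fsspec.filesystem("s3", anon=True)  # any fsspec filesystem works here

@lru_cache(maxsize=1024)  # keep up to 1024 distinct byte ranges in memory
def fetch_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` from `path`, caching the result."""
    with fs.open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

A production version would add an on-disk tier and size-based eviction rather than entry-count eviction, but the overall shape stays about this small.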
I'm deliberately keeping hypergrib as a GRIB-specific thing (for now, at least) if only because my brain isn't capable of designing a general-purpose thing until I've built several special-purpose things! And because I want to see how fast we can go if we "cheat" and create a special-purpose multi-file GRIB reader. Although I'm optimistic that, over time, we can make it more and more general (whilst maintaining performance!).