Nothing about this is specific to GRIB #2

Open
TomNicholas opened this issue Jan 10, 2025 · 1 comment

TomNicholas commented Jan 10, 2025

Great write up, really interesting. I just want to give my 2 cents:

Nothing about the problems described is specific to GRIB.

Once you know where the bytes in each chunk are and how to decode them, every problem you have described applies to any chunked multi-dimensional array dataset.

Therefore, any solution to this should not be specific to GRIB at all! The only part that needs to be specific to GRIB is the part that determines the locations of the bytes and how to decode them. Every step after that (and I agree with a lot of your ideas) should not have anything to do with GRIB specifically. All your cool caching layer ideas can be implemented entirely in the language of zarr / icechunk / seamless arrays.

One way to do this would be to write a GRIB reader for VirtualiZarr. That's the only GRIB-specific step. This just brings us back to zarr-developers/VirtualiZarr#238, and thus finishes my attempt to nerd-snipe you :)
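
The split described above (one format-specific step that finds the bytes and knows how to decode them, followed by format-agnostic steps) can be sketched roughly as follows. This is only an illustration using the Python standard library: `ChunkRef`, `Manifest`, `read_chunk`, and `decode_float32_le` are hypothetical names invented for this sketch, not part of VirtualiZarr, zarr, or icechunk, and the float32 decoder merely stands in for a real GRIB decoder.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChunkRef:
    """Where one chunk's bytes live (hypothetical type for this sketch)."""
    path: str
    offset: int
    length: int


ChunkIndex = Tuple[int, ...]
Manifest = Dict[ChunkIndex, ChunkRef]
Decoder = Callable[[bytes], List[float]]


def read_chunk(manifest: Manifest, decode: Decoder, index: ChunkIndex) -> List[float]:
    """Format-agnostic step: fetch the byte range for `index`, then decode it."""
    ref = manifest[index]
    with open(ref.path, "rb") as f:
        f.seek(ref.offset)
        raw = f.read(ref.length)
    if len(raw) != ref.length:
        raise IOError(f"short read for chunk {index}")
    return decode(raw)


def decode_float32_le(raw: bytes) -> List[float]:
    """Stand-in decoder (assumption): a real GRIB decoder would replace this."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))
```

In this sketch, only the code that builds the `Manifest` and supplies the `Decoder` would need to know anything about GRIB; `read_chunk`, and anything layered on top of it, would not.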

@JackKelly
Owner

> Great write up, really interesting

Thanks! I'm really glad you like it!

> Nothing about the problems described is specific to GRIB.

I agree 100%! (Sorry, I should've written more about this in the text!)

Some of this text was originally in the hypergrib repo. But, as you say, it's clear that a lot of this isn't specific to GRIB. So I (clumsily) created this new repo to draw a distinction between GRIB-specific things (like hypergrib) and ideas that are more general.

I definitely agree that the caching should be a separate project! (And you could imagine an MVP caching system being pretty simple: maybe just ~100 lines of Python, connecting together existing tools.)
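
To give a feel for how small such an MVP could be, here is a rough sketch of a read-through cache for byte ranges, stored on local disk. It is an assumption-laden illustration rather than a design: `ByteRangeCache` and the `fetch` callback are hypothetical names, it uses only the standard library instead of existing tools, and it ignores eviction, concurrency, and overlapping ranges.

```python
import hashlib
import os
from typing import Callable

# Hypothetical signature: fetch(url, offset, length) -> bytes from remote storage.
FetchFn = Callable[[str, int, int], bytes]


class ByteRangeCache:
    """Read-through cache: serve a byte range from disk, else fetch and store it."""

    def __init__(self, cache_dir: str, fetch: FetchFn) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch
        os.makedirs(cache_dir, exist_ok=True)

    def _key_path(self, url: str, offset: int, length: int) -> str:
        # One cache file per exact (url, offset, length) triple.
        key = hashlib.sha256(f"{url}:{offset}:{length}".encode()).hexdigest()
        return os.path.join(self.cache_dir, key)

    def get(self, url: str, offset: int, length: int) -> bytes:
        path = self._key_path(url, offset, length)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch(url, offset, length)
        # Write to a temporary file, then rename, so readers never see partial data.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data
```

Nothing here is GRIB-specific: the cache only sees URLs and byte ranges, so it could sit underneath any chunked array reader.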

I'm deliberately keeping hypergrib as a GRIB-specific thing (for now, at least) if only because my brain isn't capable of designing a general-purpose thing until I've built several special-purpose things! And because I want to see how fast we can go if we "cheat" and create a special-purpose multi-file GRIB reader. Although I'm optimistic that, over time, we can make it more and more general (whilst maintaining performance!).
