Coordination with the gridded project? #422
Replies: 3 comments
-
Hi @ChrisBarker-NOAA, thanks very much for reaching out and providing such a transparent overview of gridded! Your reasons for going without xarray at the time make a lot of sense. As you mentioned, though, Xarray, and indeed the broader Pangeo stack, has come a LONG way. We therefore treat Xarray and Dask as the cornerstones of UXarray, for the good of the community. That shouldn't prevent us from collaborating, since there is a lot of overlap between the problems we target and the solutions we propose. We are already inspired by a number of community discussions and tools, such as @Huite's Xarray Extension proposal: extension for UGRID and unstructured mesh utils, gridded, etc. That said, more collaboration is always welcome!
-
Exactly -- what I'm not sure about is whether a package like gridded (i.e. a standard API for working with data on arbitrary types of grids) can reasonably be built using the Xarray API directly -- or if it should be a wrapper around an xarray Dataset. The challenge is that the xarray API is very much about the arrays themselves -- working with indexes, etc. -- and when you get to unstructured grids, that whole way of thinking is not applicable. For example, a gridded.Variable is NOT an array at all -- it has an array underneath that the user can access if they want, but the idea is that that's not the usual use case. Rather, it is an abstraction for a field, and you can get the value of that field at any lat-lon (or x,y) location, without knowing (or caring) how the information is stored. But I'll poke a bit more into what you're doing with uxarray, and see where you are going with it. -CHB
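To make the "field, not array" idea concrete, here is a minimal sketch in plain Python. It is not gridded's implementation: the class name `NearestNodeField`, its `at` method, and the nearest-node lookup with planar distance are all assumptions chosen for illustration. A real package would interpolate within grid cells and handle time and depth.

```python
import math


class NearestNodeField:
    """Hypothetical field abstraction (not gridded's actual class).

    Values are stored on scattered nodes, but callers only ask for
    the value at a (lon, lat) location.
    """

    def __init__(self, node_lon, node_lat, node_values):
        if not (len(node_lon) == len(node_lat) == len(node_values)):
            raise ValueError("node arrays must have equal length")
        self._lon = list(node_lon)
        self._lat = list(node_lat)
        self._values = list(node_values)

    @property
    def data(self):
        # The underlying storage stays reachable for users who want it.
        return self._values

    def at(self, lon, lat):
        # Assumption: nearest node by planar distance stands in for
        # proper cell location and interpolation.
        nearest = min(
            range(len(self._values)),
            key=lambda i: math.hypot(self._lon[i] - lon, self._lat[i] - lat),
        )
        return self._values[nearest]


field = NearestNodeField(
    node_lon=[0.0, 1.0, 0.0],
    node_lat=[0.0, 0.0, 1.0],
    node_values=[10.0, 20.0, 30.0],
)
value = field.at(0.9, 0.1)  # the caller never touches an index
```

The caller works only in world coordinates; how the nodes are stored or searched can change without affecting the `at` call.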
-
Stumbled on this discussion (also some similar things discussed in NOAA-ORR-ERD/gridded#55).
My take on this (disclaimer: having worked on Xarray indexes for a while, I certainly have a biased point of view! Also, a lot of progress has been made on the Xarray side since @ChrisBarker-NOAA's last comment in this discussion):
While the Xarray API is indeed very much about the arrays themselves, the recent developments in Xarray have turned Dataset / DataArray into very flexible and extensible containers. With a combination of coordinates (data + metadata), custom Xarray indexes and accessors, it is possible to extend an xarray DataArray or Dataset with capabilities and API that go well beyond the array-centric Xarray API. You could store and implement almost anything in Xarray indexes and/or accessors, even things that are not strictly array-based.
I agree that, in theory, Xarray Dataset and DataArray are probably not the right level of abstraction for representing grid fields, and that a higher level of abstraction certainly makes more sense. That said, it is convenient to deal with objects that we are already familiar with. Building on the various extensibility mechanisms provided by Xarray may be a good practical choice. You still get a very array-based representation (repr), but you could have all the high-level "physical world" API at your fingertips, organized in a tidy way (I think).
@ChrisBarker-NOAA, since you mention "world coordinates", you might be interested in the discussion at sunpy/ndcube#222, where there's an example of an Xarray "WCSIndex" that may be relevant here too. Many other examples of indexes are gathered in pydata/xarray#7041 (still very much a work in progress).
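As a rough illustration of the accessor route, the sketch below attaches a "world coordinate" lookup to a Dataset with `xr.register_dataset_accessor`. The accessor name `field`, the `at` method, and the coordinate names `node_lon` / `node_lat` are hypothetical, and the nearest-node search is a stand-in for real unstructured-grid interpolation. It is not how UXarray or gridded do it.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("field")  # "field" is a hypothetical accessor name
class FieldAccessor:
    def __init__(self, ds):
        self._ds = ds

    def at(self, name, lon, lat):
        """Value of variable `name` at the node nearest to (lon, lat)."""
        # Assumption: the Dataset carries 1-D node_lon / node_lat coordinates.
        dist = np.hypot(
            self._ds["node_lon"].values - lon,
            self._ds["node_lat"].values - lat,
        )
        return float(self._ds[name].values[int(np.argmin(dist))])


ds = xr.Dataset(
    data_vars={"temperature": ("node", [10.0, 20.0, 30.0])},
    coords={
        "node_lon": ("node", [0.0, 1.0, 0.0]),
        "node_lat": ("node", [0.0, 0.0, 1.0]),
    },
)
value = ds.field.at("temperature", 0.9, 0.1)
```

The Dataset keeps its usual array-based repr and API, while the location-based calls live under the `ds.field` namespace. A custom Xarray index could go further and make `ds.sel(...)` itself work in world coordinates, but that is beyond this sketch.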
-
I just noticed a post on the UGRID GitHub project referring to uxarray, and it reminded me to reach out.
I'm the primary developer behind gridded:
https://github.com/NOAA-ORR-ERD/gridded
In a way, its goals are pretty similar to UXarray, except:
It has a completely different API -- more purpose-specific, and not like the xarray API at all. This is for two reasons:
xarray was immature and not ready for real use when we started gridded
xarray is (was?) very array / index oriented, and despite numerous conversations with the xarray team, we didn't see a way to apply the same API to the problems we needed to solve.
What problems are those? Fundamentally, being able to work natively in "world" coordinates -- not having to know or care about the underlying arrays or indexes, etc.
However, I gave a talk about gridded at an AMS conference a few years ago, and a common question was "can I do [this thing] like I do with xarray?" So folks do indeed want the xarray API.
So -- Xarray has come a LONG way in the last ten years, and it may be time to revisit the whole thing.
Also -- we've had trouble maintaining gridded to support features that are really useful, but not what we actively needed for our work -- the whole enterprise needs a broader community. And we've been thinking a major refactor is in order anyway.
All that being said, there's some useful code and ideas in gridded that may be helpful -- I hope you've at least looked at it. Please let us know if you have any questions / ideas, etc., and/or are open to more collaboration.