-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADD: objects can override is_dict_like() #8613
base: main
Are you sure you want to change the base?
Conversation
By setting obj._xarray_dict_like_, any object obj can now tell xarray to treat it like it is (or is not) a dict. This is useful if using custom objects for coordinates, when those objects store information in a dict-like format, but that information is not related to xarray and there's no reason for xarray to treat the objects as dicts.
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
Thanks for the PR. I don't have a strong view but one quick question — could you instead implement the methods that get checked on the custom object? |
Thanks for your response. I had considered that, but unfortunately the def is_dict_like(value: Any) -> TypeGuard[Mapping]:
return hasattr(value, "keys") and hasattr(value, "__getitem__") Ideally I would like a solution that allows me to keep the |
Ah I see, thanks. I wonder if we could alternatively check that doesn't have The downside is that something like https://pypi.org/project/frozendict/ — which is Hashable — won't work to lookup on MultiIndexes. Ideally I think we should avoid these sorts of custom attributes — they make xarray's implementation more complicated, increase the cardinality of its interface. And often this sort of issue is emblematic of a deeper issue — should we be doing something quite different depending on the type of the argument / is there a more lower-level API even if On the lower-level API point — does the equivalent of FYI @Sevans711 one thing that would be useful here is a reproducible example, with a minimal class that has those methods. |
I don't see a reason why we could not simply remove this check (I didn't look at the code, possibly requires some refactoring). This would solve the issue at hand and for other use cases raise a more low level error (I assume). Just a note: @Sevans711 are you sure you want to build your code like this? You are giving away lots of the power of xarray by using this approach. It would be much more intuitive (and faster!) if you would use the Plasma fluid properties directly as coordinates. |
another option would be to wrap the value you're looking for in a 0D array with In general I agree, though, this is a pretty un-idiomatic use of xarray. |
Thank you all for your comments. Replying to the various points you made:
|
Just to throw out some ideas, the data you're describing seems like a particularly well fit for |
Thanks for pointing this out, I didn't know about this! I will continue to consider it, though I don't know if it is the best fit for me. To clarify my use-case: all my data is homogeneous, of the same type and with no holes or variable-length lists. I believe the only thing I'm doing that doesn't quite follow the usual patterns is using objects for coordinates, instead of a simple type like int/float/str. There's no reason my data couldn't work with numpy, but I'm using xarray to make it easier to track all the coordinates and dimensions :) |
If all of the objects are fixed-size data structures (i.e. all instances in the array have the same set of named fields), then Awkward Array would be overkill. Looking at Such a thing could be implemented as records within records, but it could also be represented as nested columns, like using Pandas's >>> pd.DataFrame(
... data=[[1, 1.1, "one"], [2, 2.2, "two"], [3, 3.3, "three"]],
... columns=pd.MultiIndex.from_tuples([("a1", "x"), ("a1", "y"), ("a2", "z")])
... )
a1 a2
x y z
0 1 1.1 one
1 2 2.2 two
2 3 3.3 three In xarray, would xarray-datatree work? On the other hand, maybe I've misunderstood and only the class PlasmaFluid:
...
def __index__(self): # or maybe __int__
return self.i |
This is correct, only the It is a good idea to implement What are your thoughts on having the is_dict_like() check return False for objects with an I want to briefly mention another benefit of including the PlasmaFluid objects as coordinates, maybe it will be more convincing than my other examples:
|
Bumping, since it has been a little while. What are the steps required to get this PR accepted? I would be happy to implement a different check for Also happy to wait longer if you need more time to consider. Mostly I just want to avoid this PR getting buried / forgotten. I am motivated to work on getting this finished and pulled into main! Ty for any assistance. |
Seems like the general request is to allow indexing of object-type arrays with objects. Can we just skip the check if the dimension being indexed is backed by an |
This seems worth trying. I'm not sure exactly how multiindexes will work, but if tests pass it's probably fine... |
Yes I think we can safely remove this special case now. Selection on both single and multi-indexes used to be handled internally through the same function that was supporting only dimension coordinate indexers. This is not the case anymore. |
Awesome! If this will be getting fixed imminently and sometime in the not-too-distant future (maybe within a few months?) I am happy to wait. I suspect it would be much less work for someone who is more familiar with the codebase than me. However if you need me to attempt to remove this special case please let me know, and I'll see what I can do. |
By setting
obj._xarray_dict_like_
, any objectobj
can now tell xarray to treat it like it is (or is not) a dict. Note: objects without the_xarray_dict_like_
attribute will be unaffected; this change is fully backwards-compatible.There are no relevant issue reports, tests added, user visible changes, or new functions / methods.
This is my first contribution to xarray, I tried my best to follow the contribution guidelines; please let me know if I am missing anything essential!
Motivation
I have been using xarray in my personal project pretty successfully (xarray is a great package, and I am very grateful for it)! However, I sometimes use dict-like objects as coordinates. xarray seems to be okay with this most of the time, but fails when it comes to things like the
DataArray.sel
method. I would really like to be able to usesel
, since my dict-like objects are only dict-like for reasons completely unrelated to xarray.For example, I built a PlasmaFluid class which stores information about different fluids. And, I use instances of that class as coordinates for my DataArray. E.g.:
This code works just fine, and produces a nicely-labeled DataArray, where the
fluid
dimension contains the actualPlasmaFluid
objects, e.g.u_x.isel(fluid=0) is f0
.However, it fails when I try to use the
sel
method, e.g.:After implementing the change in this PR, I am able to avoid this issue, and can use the
sel
method by updating the class I wrote to includePlasmaFluid._xarray_dict_like_ = False
.Minimal Reproducible Example
As per @max-sixty's suggestion (thank you!) here is an example minimal reproducible example you can copy paste to see the issue I'm referring to: