-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataTree: fetch inherited coords on demand #9187
DataTree: fetch inherited coords on demand #9187
Conversation
…not rely on attribute-like access
_children: dict[str, DataTree] | ||
_attrs: dict[Hashable, Any] | None | ||
_cache: dict[str, Any] | ||
_coord_names: set[Hashable] | ||
_dims: dict[Hashable, int] | ||
_local_coord_names: set[Hashable] | ||
_inherited_coord_names: set[Hashable] | ||
_local_dims: dict[Hashable, int] | ||
_encoding: dict[Hashable, Any] | None | ||
_close: Callable[[], None] | None | ||
_indexes: dict[Hashable, Index] | ||
_variables: dict[Hashable, Variable] | ||
_local_indexes: dict[Hashable, Index] | ||
_local_variables: dict[Hashable, Variable] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only new internal attribute used here is _inherited_coord_names
. Some of the others are renamed just to clarify that they only hold the information local to this node.
# set tree attributes | ||
self._children = {} | ||
self._parent = None | ||
self._set_node_data(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _post_attach_recursively(self: DataTree, parent: DataTree) -> None: | ||
for k in parent._coord_names: | ||
if k not in self._variables: | ||
self._inherited_coord_names.add(k) | ||
|
||
for child in self._children.values(): | ||
child._post_attach_recursively(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After alignment has been checked, we build the list of inheritable coordinate names.
def _get_inherited_coord_var(self: DataTree, key: str) -> Variable: | ||
for node in self.parents: | ||
if key in node._local_coord_names: | ||
return node._local_variables[key] | ||
|
||
raise Exception( | ||
"should never get here - means we didn't do our alignment / construction properly" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method, along with the various ._*
properties below, is what substitutes for all the ChainMap
stuff in #9063.
Here we take the approach that we iterate up the tree to go fetch the inherited coordinate variable only when we need it. (As opposed to pre-fetching all of them and storing all of them on each child node under a ChainMap
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is basically a very literal way to do the "search by proximity" in the CF conventions.
@property | ||
def _inherited_variables(self: DataTree) -> Mapping[Hashable, Variable]: | ||
return { | ||
k: self._get_inherited_coord_var(k) for k in self._inherited_coord_names | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We build these lists of inherited variables/coords/dims/indexes dynamically, instead of trying to save them as static properties.
self._variables if inherited else self._local_variables, | ||
self._coord_names if inherited else self._local_coord_names, | ||
self._dims if inherited else self._local_dims, | ||
self._attrs, | ||
self._indexes, | ||
self._indexes if inherited else self._local_indexes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .to_dataset
method becomes really simple, because the internal data model of the DataTree
node is a close as possible to that of a Dataset
.
# nodes should include inherited dimensions | ||
assert dt["b"].sizes == {"x": 2, "y": 1} | ||
assert dt["c"].sizes == {"x": 2, "y": 3} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer why would I expect these dimensions to be present if the parent node only has them on data variables, not coordinates?
with pytest.raises(ValueError, match=expected_msg): | ||
DataTree(data=xr.Dataset(coords={"x": [1]}), children={"b": b}) | ||
|
||
def test_inconsistent_grandchild_indexes(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two grandchild
tests don't work because of some as-yet unsolved bug in my implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this #9063 (comment)
@property | ||
def _item_sources(self) -> Iterable[Mapping[Any, Any]]: | ||
"""Places to look-up items for key-completion""" | ||
yield self.data_vars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got horrible recursion errors from the attribute-like access method, and I just took the shortcut of deleting it for now.
Closing as replaced by #9063 |
This takes the new tests from #9063 and makes (most of) them pass using a different implementation. I didn't get to the IO part but this should be enough to show you what I was imagining.
cc @shoyer
@keewis @owenlittlejohns @flamingbear @eni-awowale I'm interested if any of you see advantages in either mine or @shoyer's approach here (or don't care)
whats-new.rst
api.rst