Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataTree: fetch inherited coords on demand #9187

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Jun 28, 2024

This takes the new tests from #9063 and makes (most of) them pass using a different implementation. I didn't get to the IO part but this should be enough to show you what I was imagining.

cc @shoyer

@keewis @owenlittlejohns @flamingbear @eni-awowale I'm interested if any of you see advantages in either mine or @shoyer's approach here (or don't care)

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Jun 28, 2024
Comment on lines 373 to +382
_children: dict[str, DataTree]
_attrs: dict[Hashable, Any] | None
_cache: dict[str, Any]
_coord_names: set[Hashable]
_dims: dict[Hashable, int]
_local_coord_names: set[Hashable]
_inherited_coord_names: set[Hashable]
_local_dims: dict[Hashable, int]
_encoding: dict[Hashable, Any] | None
_close: Callable[[], None] | None
_indexes: dict[Hashable, Index]
_variables: dict[Hashable, Variable]
_local_indexes: dict[Hashable, Index]
_local_variables: dict[Hashable, Variable]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only new internal attribute used here is _inherited_coord_names. Some of the others are renamed just to clarify that they only hold the information local to this node.

Comment on lines +437 to +440
# set tree attributes
self._children = {}
self._parent = None
self._set_node_data(data)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all just the same refactor of constructors that @shoyer did in #9063.

Comment on lines +472 to +478
def _post_attach_recursively(self: DataTree, parent: DataTree) -> None:
for k in parent._coord_names:
if k not in self._variables:
self._inherited_coord_names.add(k)

for child in self._children.values():
child._post_attach_recursively(self)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After alignment has been checked, we build the list of inheritable coordinate names.

Comment on lines +559 to +566
def _get_inherited_coord_var(self: DataTree, key: str) -> Variable:
for node in self.parents:
if key in node._local_coord_names:
return node._local_variables[key]

raise Exception(
"should never get here - means we didn't do our alignment / construction properly"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method, along with the various ._* properties below, is what substitutes for all the ChainMap stuff in #9063.

Here we take the approach that we iterate up the tree to go fetch the inherited coordinate variable only when we need it. (As opposed to pre-fetching all of them and storing all of them on each child node under a ChainMap.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is basically a very literal way to do the "search by proximity" in the CF conventions.

Comment on lines +568 to +572
@property
def _inherited_variables(self: DataTree) -> Mapping[Hashable, Variable]:
return {
k: self._get_inherited_coord_var(k) for k in self._inherited_coord_names
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We build these lists of inherited variables/coords/dims/indexes dynamically, instead of trying to save them as static properties.

Comment on lines +530 to +534
self._variables if inherited else self._local_variables,
self._coord_names if inherited else self._local_coord_names,
self._dims if inherited else self._local_dims,
self._attrs,
self._indexes,
self._indexes if inherited else self._local_indexes,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .to_dataset method becomes really simple, because the internal data model of the DataTree node is a close as possible to that of a Dataset.

Comment on lines +721 to +723
# nodes should include inherited dimensions
assert dt["b"].sizes == {"x": 2, "y": 1}
assert dt["c"].sizes == {"x": 2, "y": 3}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer why would I expect these dimensions to be present if the parent node only has them on data variables, not coordinates?

with pytest.raises(ValueError, match=expected_msg):
DataTree(data=xr.Dataset(coords={"x": [1]}), children={"b": b})

def test_inconsistent_grandchild_indexes(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two grandchild tests don't work because of some as-yet unsolved bug in my implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this #9063 (comment)

@property
def _item_sources(self) -> Iterable[Mapping[Any, Any]]:
"""Places to look-up items for key-completion"""
yield self.data_vars
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got horrible recursion errors from the attribute-like access method, and I just took the shortcut of deleting it for now.

@TomNicholas
Copy link
Member Author

Closing as replaced by #9063

@TomNicholas TomNicholas closed this Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

1 participant