-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensemble Methods to retrieve a data subset #361
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #361 +/- ##
==========================================
+ Coverage 95.16% 95.26% +0.09%
==========================================
Files 24 25 +1
Lines 1634 1667 +33
==========================================
+ Hits 1555 1588 +33
Misses 79 79 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good to me.
I am worried about the client management issues, but I feel we can file a separate issue for that if preferred
# turn off cleanups -- in the case where multiple ensembles are | ||
# using a client, an individual ensemble should not close the | ||
# client during an __exit__ or __del__ event. This means that | ||
# the client will not be closed without an explicit client.close() | ||
# call, which is unfortunate... not sure of an alternative way | ||
# forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there is a path for some ugly answers to this (implement our own client manager with reference counting, each ensemble keeps track of its parents/children in a tree-like structure that gets updated on-exit, etc) but not sure we need to solve this now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, agree that this is something to keep an eye on, but feels out of scope for this PR, I'll make an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Doug!
Change Description
Resolves #356. Adds new methods for retrieving a subset of Ensemble data.
Solution Description
I wrote some solution details in #356, but the summary is that this PR adds one new method for returning a subset of objects and their sources, via
Ensemble.sample_objects
. And additionally overwrites the behavior of dasks partition slicing to also set the dirty flag (meaning partition slices will propagate from object to source and vice-versa), making slices on object or source partitions function as another valid method to sample the data.Code Quality
Project-Specific Pull Request Checklists
Bug Fix Checklist
New Feature Checklist
Documentation Change Checklist
Build/CI Change Checklist
Other Change Checklist