-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data-store graph window efficiency/refactor #5319
data-store graph window efficiency/refactor #5319
Conversation
66bb079
to
c20d9b5
Compare
e0b2763
to
500c6bc
Compare
Results are in, it's a winner 🏆! This approach scales much better. The old approach only started to bite after ~20 tasks, the new approach keeps the same scaling as before suggesting that
Memory usage is not effected (good news): |
Here's some analysis of the
|
Could you rebase onto 8.1.x please, we can put this straight into the 8.1.1 release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have tested out and it's a hell of a speedup for increment_graph_window()
!
However I think this is a good opportunity to improve some of the dev documentation to help better understand the code
flow_nums, | ||
is_parent=False, | ||
itask=None | ||
): | ||
"""Create task-point element populated with static data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Create task-point element populated with static data. | |
"""Create TaskProxy populated with static data. |
And for what purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not TaskProxy
, it is the data-store task proxy node (PBTaskProxy
)..
TaskProxy
object is used for alot of the attributes to fill out this protobuf class i.e.
cylc-flow/cylc/flow/data_store_mgr.py
Line 1162 in 259956b
def _process_internal_task_proxy(self, itask, tproxy): |
and others.. It's also convenient that the pool is made up of TaskProxy, so that can be passed in to gather the info from.
500c6bc
to
d99db21
Compare
Have rebased onto
|
source_tokens (cylc.flow.id.Tokens) | ||
point (PointBase) | ||
flow_nums (set) | ||
edge_distance (int): | ||
Graph distance from active/origin node. | ||
active_id (str): | ||
Active/origin node id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still a little confused as to the difference between source_tokens
and active_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
active_id
is the ID of the active task proxy (the origin about which the graph is expanded), it is passed through the recursive calls to increment_graph_window
by it's children/parents (stays constant with this graph walk).
The purpose is to keep an association of that task's window nodes/branch (children/parents), so we know what to prune (or check for pruning) when that active task leaves the pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source_tokens
will change throughout the recursion; from the active pool task's (itask.tokens
) to the child/parent tokens generated in the graph walk.
I changed to using source_tokens
argument (from name
) to avoid more Token
duplication with creating an id
@MetRonnie - What is this Nevermind! it's both of ours:
Time for sleep 🛏️ |
FYI: I've found another |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is scaling much more nicely, thanks @dwsutherland!
Tested it out with the graph view to make sure the generated edges still match expectations. Good news is that the graph view is able to handle this horrible graph:
Well, kinda.
if e_id in self.n_window_edges[active_id]: | ||
return | ||
if ( | ||
e_id not in self.data[self.workflow_id][EDGES] | ||
and e_id not in self.added[EDGES] | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor]
Can these be combined?
if e_id in self.n_window_edges[active_id]: | |
return | |
if ( | |
e_id not in self.data[self.workflow_id][EDGES] | |
and e_id not in self.added[EDGES] | |
): | |
if ( | |
e_id in self.n_window_edges[active_id] | |
or ( | |
e_id not in self.data[self.workflow_id][EDGES] | |
and e_id not in self.added[EDGES] | |
) | |
): | |
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that be:
if (
e_id in self.n_window_edges[active_id]
or e_id in self.data[self.workflow_id][EDGES]
or e_id in self.added[EDGES]
):
return
?
previously there was more further on outside this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to "Time for sleep 🛏️"? 😅
(will open a collective changelog for the efficiency changes in 8.1.1) |
Partially addresses #5315
Supersedes #5316
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
andconda-environment.yml
.CHANGES.md
entry included if this is a change that can affect users?.?.x
branch.