
data-store graph window efficiency/refactor #5319

Merged
merged 1 commit into from
Jan 24, 2023

Conversation

dwsutherland
Member

@dwsutherland dwsutherland commented Jan 21, 2023

Partially addresses #5315
Supersedes #5316

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Covered by existing tests.
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added the efficiency For notable efficiency improvements label Jan 21, 2023
@dwsutherland dwsutherland self-assigned this Jan 21, 2023
@dwsutherland dwsutherland mentioned this pull request Jan 21, 2023
8 tasks
@dwsutherland dwsutherland force-pushed the store-tproxy-efficiency branch from 66bb079 to c20d9b5 Compare January 22, 2023 00:52
@dwsutherland dwsutherland force-pushed the store-tproxy-efficiency branch 3 times, most recently from e0b2763 to 500c6bc Compare January 22, 2023 07:33
@oliver-sanders
Member

oliver-sanders commented Jan 23, 2023

Results are in, it's a winner 🏆!

[Figure: runtime comparison plot, this branch vs master]

This approach scales much better. The old approach only started to bite after ~20 tasks; the new approach keeps the same scaling as before, suggesting that increment_graph_window is no longer the dominant performance factor after the change, nice!

Note:

  1. upstream/master was at 259956b
  2. store-tproxy-efficiency-w-platform-call is this branch with the following diff:
diff --git a/cylc/flow/task_proxy.py b/cylc/flow/task_proxy.py
index 7c99f1272..696794f44 100644
--- a/cylc/flow/task_proxy.py
+++ b/cylc/flow/task_proxy.py
@@ -237,10 +237,10 @@ class TaskProxy:
 
         self.local_job_file_path: Optional[str] = None
 
-        # if data_mode:
-        #     self.platform = {}
-        # else:
-        self.platform = get_platform()
+        if data_mode:
+            self.platform = {}
+        else:
+            self.platform = get_platform()

         self.job_vacated = False
         self.poll_timer: Optional['TaskActionTimer'] = None

This was in order to profile the algorithm change independently of the get_platform change.

Memory usage is not affected (good news):

[Figure: memory usage comparison plot]

@MetRonnie MetRonnie self-requested a review January 23, 2023 14:20
@oliver-sanders
Member

oliver-sanders commented Jan 23, 2023

Here's some analysis of the --profile results for the workflow in #5315 with -s TASKS=25 -s CYCLES=1.

Here's the graph (for -s TASKS=5 for clarity):

[Figure: the test workflow's graph]

num_edges = (25 * 2) + (25 * 25) = 675
num_tasks = (25 * 2) + 2 = 52

The increment_graph_window function is executed 34451/51 times (total/primitive calls). That means:

  • 51 top level calls (one for each task, except the last task which doesn't require graph increment changes).
  • And 34451 recursive calls resulting from these top level calls.

Interestingly, 34451 / 675 ~= 51, which suggests that _graph_window is getting called num_edges x num_tasks times. So there's still a potential (num_tasks - 1) = 51x speedup to be had from remembering the edges we've already investigated.
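
For illustration only, here's a minimal sketch of what "remembering the edges" could look like (walk, neighbours and the shared visited_edges set are hypothetical, not the actual increment_graph_window code):

    from typing import Dict, List, Set, Tuple

    def walk(node: str,
             neighbours: Dict[str, List[str]],
             visited_edges: Set[Tuple[str, str]]) -> None:
        """Expand each edge at most once by remembering visited edges."""
        for child in neighbours.get(node, []):
            edge = (node, child)
            if edge in visited_edges:
                # Edge already investigated by an earlier call: skip
                # re-expanding the subtree behind it.
                continue
            visited_edges.add(edge)
            walk(child, neighbours, visited_edges)

Sharing one visited_edges set across the top level calls is what would bring the call count down from num_edges x num_tasks towards num_edges.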

The generate_edge function is called 1325 times, which is much lower than its previous counterpart, resulting in a dramatic improvement. However, this number is about two times the number of edges present in the workflow, so there's a potential 2x saving here too.

So still scope for further speedup but it might take a bit of head scratching to work out how to make it happen.

Here's the revised call-stack after these efficiency improvements:

[Screenshot: revised profiler call stack]

  • Most of the time is taken up with Tokens/id calls where there is strong potential for optimisation.
  • 1.34 seconds of that time is set comprehensions in Tokens which Optimize ID/tokens code #5321 will reduce substantially.
  • ~0.5 seconds of that time is for 1325 calls of edge_id (about two calls for each edge?) where there is caching potential.

So there's scope for ~2 seconds of passive improvement from other changes, which is about 50% of the remaining runtime. Good news.
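
On the edge_id point, a hedged sketch of the caching idea (cached_edge_id and its id format are made up, assuming the id is a pure function of the two endpoint ids):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def cached_edge_id(source_id: str, target_id: str) -> str:
        # Pure helper: the result depends only on the endpoint ids, so
        # the second call for the same edge is a cache hit.
        return f'{source_id}/{target_id}'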

@oliver-sanders oliver-sanders added this to the 8.1.1 milestone Jan 23, 2023
@oliver-sanders
Member

Could you rebase onto 8.1.x please, we can put this straight into the 8.1.1 release.

Member

@MetRonnie MetRonnie left a comment

Have tested out and it's a hell of a speedup for increment_graph_window()!

However, I think this is a good opportunity to improve some of the dev documentation to help readers better understand the code.

(Resolved review comments on cylc/flow/data_store_mgr.py.)
flow_nums,
is_parent=False,
itask=None
):
"""Create task-point element populated with static data.
Member

Suggested change
"""Create task-point element populated with static data.
"""Create TaskProxy populated with static data.

And for what purpose?

Member Author

@dwsutherland dwsutherland Jan 24, 2023

No, it's not TaskProxy, it is the data-store task proxy node (PBTaskProxy).

The TaskProxy object is used for a lot of the attributes to fill out this protobuf class, i.e.

def _process_internal_task_proxy(self, itask, tproxy):

and others. It's also convenient that the pool is made up of TaskProxy objects, so they can be passed in to gather the info from.
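
To illustrate the relationship, here's a hedged sketch with a stand-in dataclass for the protobuf message (the real logic and field names live in _process_internal_task_proxy and the generated protobuf code):

    from dataclasses import dataclass

    @dataclass
    class PbTaskProxyStandIn:
        """Stand-in for the generated PBTaskProxy protobuf message."""
        id: str = ''
        state: str = ''
        cycle_point: str = ''

    def process_internal_task_proxy(itask, tproxy: PbTaskProxyStandIn) -> None:
        """Copy attributes from the scheduler's TaskProxy into the node."""
        tproxy.id = itask.identity           # illustrative attribute names
        tproxy.state = itask.state.status
        tproxy.cycle_point = str(itask.point)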

(Further resolved review comments on cylc/flow/task_pool.py and cylc/flow/data_store_mgr.py.)
@dwsutherland dwsutherland changed the base branch from master to 8.1.x January 24, 2023 05:47
@dwsutherland dwsutherland force-pushed the store-tproxy-efficiency branch from 500c6bc to d99db21 Compare January 24, 2023 05:50
@dwsutherland
Member Author

dwsutherland commented Jan 24, 2023

Have rebased onto 8.1.x, and I've also:

  • Reduced the Tokens duplication a little by reusing tokens generated earlier.
  • Addressed some feedback.

@oliver-sanders
Member

oliver-sanders commented Jan 24, 2023

Overnight testing, where ronnie-local/SPEED-POWER is the effect of this branch merged with Ronnie's Tokens PR (since merged to master).

[Figure: overnight profiling comparison plot]

I'd say that's a successful result!

Comment on lines +674 to 680
source_tokens (cylc.flow.id.Tokens)
point (PointBase)
flow_nums (set)
edge_distance (int):
Graph distance from active/origin node.
active_id (str):
Active/origin node id.
Member

Sorry, I'm still a little confused as to the difference between source_tokens and active_id.

Member Author

@dwsutherland dwsutherland Jan 24, 2023

active_id is the ID of the active task proxy (the origin about which the graph is expanded); it is passed down through the recursive increment_graph_window calls for its children/parents (it stays constant throughout the graph walk).

The purpose is to keep an association of that task's window nodes/branch (children/parents), so we know what to prune (or check for pruning) when that active task leaves the pool.
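
As a hedged sketch of that shape (the names and signature are illustrative, not the actual increment_graph_window):

    from typing import Dict, List, Set

    def graph_window(node_id: str,
                     active_id: str,
                     children: Dict[str, List[str]],
                     window_nodes: Dict[str, Set[str]],
                     distance: int,
                     n_window: int = 1) -> None:
        # Every node reached in this walk is tagged with the origin
        # task's id, so it can be checked for pruning when that task
        # leaves the pool.
        window_nodes.setdefault(active_id, set()).add(node_id)
        if distance >= n_window:
            return
        for child_id in children.get(node_id, []):
            # active_id is passed through unchanged; node_id varies.
            graph_window(child_id, active_id, children,
                         window_nodes, distance + 1)

When the active task leaves the pool, window_nodes[active_id] gives the set of nodes to check for pruning.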

Member Author

@dwsutherland dwsutherland Jan 24, 2023

source_tokens will change throughout the recursion: from the active pool task's (itask.tokens) to the child/parent tokens generated in the graph walk.

I changed to using a source_tokens argument (instead of name) to avoid more Tokens duplication when creating an id.
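
A small sketch of the motivation (the id string here is made up; Tokens and Tokens.duplicate are the real names used elsewhere in this thread):

    from cylc.flow.id import Tokens

    # Parse the id once at the origin of the walk...
    source_tokens = Tokens('~user/workflow//1/foo')

    # ...then derive child/parent tokens from it, rather than building
    # a fresh id string from a bare task name and re-parsing it.
    child_tokens = source_tokens.duplicate(task='bar')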

@dwsutherland
Member Author

dwsutherland commented Jan 24, 2023

I'd say that's a successful result!

@MetRonnie - What is this SPEED-POWER branch? 😆 It's beating mine!

Never mind! It's both of ours:

Overnight testing, where ronnie-local/SPEED-POWER is the effect of this branch merged with Ronnie's Tokens PR (since merged to master).

Time for sleep 🛏️

@oliver-sanders
Member

FYI: I've found another Tokens optimisation which reduces the Tokens.duplicate cost making increment_graph_window quite a lot quicker again - #5325.

Member

@oliver-sanders oliver-sanders left a comment

This is scaling much more nicely, thanks @dwsutherland!

Tested it out with the graph view to make sure the generated edges still match expectations. Good news is that the graph view is able to handle this horrible graph:

[Screenshot: graph view rendering the test workflow]

Well, kinda.

Comment on lines +833 to +838
if e_id in self.n_window_edges[active_id]:
return
if (
e_id not in self.data[self.workflow_id][EDGES]
and e_id not in self.added[EDGES]
):
Member

[minor]

Can these be combined?

Suggested change
if e_id in self.n_window_edges[active_id]:
return
if (
e_id not in self.data[self.workflow_id][EDGES]
and e_id not in self.added[EDGES]
):
if (
e_id in self.n_window_edges[active_id]
or (
e_id not in self.data[self.workflow_id][EDGES]
and e_id not in self.added[EDGES]
)
):
return

Member Author

Wouldn't that be:

        if (
            e_id in self.n_window_edges[active_id]
            or e_id in self.data[self.workflow_id][EDGES]
            or e_id in self.added[EDGES]
        ):
            return

?
Previously there was more code further on, outside this test.

Member

What happened to "Time for sleep 🛏️"? 😅

@oliver-sanders
Member

(will open a collective changelog for the efficiency changes in 8.1.1)
