
data-store graph window efficiency/refactor #5319

Merged
merged 1 commit into from
Jan 24, 2023

Conversation

dwsutherland
Member

@dwsutherland dwsutherland commented Jan 21, 2023

Partially addresses #5315
Supersedes #5316

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Covered by existing tests.
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added the efficiency For notable efficiency improvements label Jan 21, 2023
@dwsutherland dwsutherland self-assigned this Jan 21, 2023
@dwsutherland dwsutherland mentioned this pull request Jan 21, 2023
8 tasks
@dwsutherland dwsutherland force-pushed the store-tproxy-efficiency branch from 66bb079 to c20d9b5 Compare January 22, 2023 00:52
@dwsutherland dwsutherland force-pushed the store-tproxy-efficiency branch 3 times, most recently from e0b2763 to 500c6bc Compare January 22, 2023 07:33
@oliver-sanders
Member

oliver-sanders commented Jan 23, 2023

Results are in, it's a winner 🏆!

[Figure: runtime comparison plot, this branch vs master]

This approach scales much better. The old approach only started to bite after ~20 tasks; the new approach keeps the same scaling as before, suggesting that increment_graph_window is no longer the dominant performance factor after the change, nice!

Note:

  1. upstream/master was at 259956b
  2. store-tproxy-efficiency-w-platform-call is this branch with the following diff:
diff --git a/cylc/flow/task_proxy.py b/cylc/flow/task_proxy.py
index 7c99f1272..696794f44 100644
--- a/cylc/flow/task_proxy.py
+++ b/cylc/flow/task_proxy.py
@@ -237,10 +237,10 @@ class TaskProxy:
 
         self.local_job_file_path: Optional[str] = None
 
-        # if data_mode:
-        #     self.platform = {}
-        # else:
-        self.platform = get_platform()
+        if data_mode:
+            self.platform = {}
+        else:
+            self.platform = get_platform()

         self.job_vacated = False
         self.poll_timer: Optional['TaskActionTimer'] = None

This was in order to profile the algorithm change independently of the get_platform change.

Memory usage is not affected (good news):

[Figure: memory usage comparison plot]

@MetRonnie MetRonnie self-requested a review January 23, 2023 14:20
@oliver-sanders
Member

oliver-sanders commented Jan 23, 2023

Here's some analysis of the --profile results for the workflow in #5315 with -s TASKS=25 -s CYCLES=1.

Here's the graph (for -s TASKS=5 for clarity):

[Figure: the test workflow's graph]

num_edges = (25 * 2) + (25 * 25) = 675
num_tasks = (25 * 2) + 2 = 52

The increment_graph_window function is executed 34451/51 times (total/primitive calls). That means:

  • 51 top level calls (one for each task, except the last task which doesn't require graph increment changes).
  • And 34451 recursive calls resulting from these top level calls.

Interestingly, 34451 / 675 ~= 51, which suggests that _graph_window is getting called num_edges x num_tasks times. So there's still a potential (num_tasks - 1) = 51x speedup to be had from remembering the edges we've already investigated.
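
For illustration only, here's a minimal sketch of what "remembering the edges" could look like (walk, neighbours and the shared visited_edges set are hypothetical, not the actual increment_graph_window code):

    from typing import Dict, List, Set, Tuple

    def walk(node: str,
             neighbours: Dict[str, List[str]],
             visited_edges: Set[Tuple[str, str]]) -> None:
        """Expand each edge at most once by remembering visited edges."""
        for child in neighbours.get(node, []):
            edge = (node, child)
            if edge in visited_edges:
                # Edge already investigated by an earlier call: skip
                # re-expanding the subtree behind it.
                continue
            visited_edges.add(edge)
            walk(child, neighbours, visited_edges)

Sharing one visited_edges set across the top level calls is what would bring the call count down from num_edges x num_tasks towards num_edges.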

The generate_edge function is called 1325 times, which is much lower than its previous counterpart, resulting in a dramatic improvement. However, this number is about two times the number of edges present in the workflow, so there's a potential 2x saving here too.

So still scope for further speedup but it might take a bit of head scratching to work out how to make it happen.

Here's the revised call-stack after these efficiency improvements:

[Screenshot: revised profiler call stack]

  • Most of the time is taken up with Tokens/id calls where there is strong potential for optimisation.
  • 1.34 seconds of that time is set comprehensions in Tokens which Optimize ID/tokens code #5321 will reduce substantially.
  • ~0.5 seconds of that time is for 1325 calls of edge_id (about two calls for each edge?) where there is caching potential.

So there's scope for ~2 seconds of passive improvement from other changes, which is about 50% of the remaining runtime. Good news.
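
On the edge_id point, a hedged sketch of the caching idea (cached_edge_id and its id format are made up, assuming the id is a pure function of the two endpoint ids):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def cached_edge_id(source_id: str, target_id: str) -> str:
        # Pure helper: the result depends only on the endpoint ids, so
        # the second call for the same edge is a cache hit.
        return f'{source_id}/{target_id}'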

@oliver-sanders oliver-sanders added this to the 8.1.1 milestone Jan 23, 2023
@oliver-sanders
Member

Could you rebase onto 8.1.x please, we can put this straight into the 8.1.1 release.

Member

@MetRonnie MetRonnie left a comment

Have tested out and it's a hell of a speedup for increment_graph_window()!

However, I think this is a good opportunity to improve some of the dev documentation to help readers better understand the code.

(Resolved review comments on cylc/flow/data_store_mgr.py.)
flow_nums,
is_parent=False,
itask=None
):
"""Create task-point element populated with static data.
Member

Suggested change
"""Create task-point element populated with static data.
"""Create TaskProxy populated with static data.

And for what purpose?

Member Author

@dwsutherland dwsutherland Jan 24, 2023

No, it's not TaskProxy, it is the data-store task proxy node (PBTaskProxy).

The TaskProxy object is used for a lot of the attributes to fill out this protobuf class, i.e.

def _process_internal_task_proxy(self, itask, tproxy):

and others. It's also convenient that the pool is made up of TaskProxy objects, so they can be passed in to gather the info from.
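
To illustrate the relationship, here's a hedged sketch with a stand-in dataclass for the protobuf message (the real logic and field names live in _process_internal_task_proxy and the generated protobuf code):

    from dataclasses import dataclass

    @dataclass
    class PbTaskProxyStandIn:
        """Stand-in for the generated PBTaskProxy protobuf message."""
        id: str = ''
        state: str = ''
        cycle_point: str = ''

    def process_internal_task_proxy(itask, tproxy: PbTaskProxyStandIn) -> None:
        """Copy attributes from the scheduler's TaskProxy into the node."""
        tproxy.id = itask.identity           # illustrative attribute names
        tproxy.state = itask.state.status
        tproxy.cycle_point = str(itask.point)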

(Further resolved review comments on cylc/flow/task_pool.py and cylc/flow/data_store_mgr.py.)
@dwsutherland dwsutherland changed the base branch from master to 8.1.x January 24, 2023 05:47
@dwsutherland dwsutherland force-pushed the store-tproxy-efficiency branch from 500c6bc to d99db21 Compare January 24, 2023 05:50
@dwsutherland
Member Author

dwsutherland commented Jan 24, 2023

Have rebased onto 8.1.x, and I've also:

  • Reduced the Tokens duplication a little by reusing tokens generated earlier.
  • Addressed some feedback.

@oliver-sanders
Member

oliver-sanders commented Jan 24, 2023

Overnight testing, where ronnie-local/SPEED-POWER is the effect of this branch merged with Ronnie's Tokens PR (since merged to master).

[Figure: overnight profiling comparison plot]

I'd say that's a successful result!

Comment on lines +674 to 680
source_tokens (cylc.flow.id.Tokens)
point (PointBase)
flow_nums (set)
edge_distance (int):
Graph distance from active/origin node.
active_id (str):
Active/origin node id.
Member

Sorry, I'm still a little confused as to the difference between source_tokens and active_id.

Member Author

@dwsutherland dwsutherland Jan 24, 2023

active_id is the ID of the active task proxy (the origin about which the graph is expanded); it is passed down through the recursive increment_graph_window calls for its children/parents (it stays constant throughout the graph walk).

The purpose is to keep an association of that task's window nodes/branch (children/parents), so we know what to prune (or check for pruning) when that active task leaves the pool.
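
As a hedged sketch of that shape (the names and signature are illustrative, not the actual increment_graph_window):

    from typing import Dict, List, Set

    def graph_window(node_id: str,
                     active_id: str,
                     children: Dict[str, List[str]],
                     window_nodes: Dict[str, Set[str]],
                     distance: int,
                     n_window: int = 1) -> None:
        # Every node reached in this walk is tagged with the origin
        # task's id, so it can be checked for pruning when that task
        # leaves the pool.
        window_nodes.setdefault(active_id, set()).add(node_id)
        if distance >= n_window:
            return
        for child_id in children.get(node_id, []):
            # active_id is passed through unchanged; node_id varies.
            graph_window(child_id, active_id, children,
                         window_nodes, distance + 1)

When the active task leaves the pool, window_nodes[active_id] gives the set of nodes to check for pruning.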

Member Author

@dwsutherland dwsutherland Jan 24, 2023

source_tokens will change throughout the recursion: from the active pool task's (itask.tokens) to the child/parent tokens generated in the graph walk.

I changed to using a source_tokens argument (instead of name) to avoid more Tokens duplication when creating an id.
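
A small sketch of the motivation (the id string here is made up; Tokens and Tokens.duplicate are the real names used elsewhere in this thread):

    from cylc.flow.id import Tokens

    # Parse the id once at the origin of the walk...
    source_tokens = Tokens('~user/workflow//1/foo')

    # ...then derive child/parent tokens from it, rather than building
    # a fresh id string from a bare task name and re-parsing it.
    child_tokens = source_tokens.duplicate(task='bar')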

@dwsutherland
Member Author

dwsutherland commented Jan 24, 2023

I'd say that's a successful result!

@MetRonnie - What is this SPEED-POWER branch? 😆 It's beating mine!

Never mind! It's both of ours:

Overnight testing, where ronnie-local/SPEED-POWER is the effect of this branch merged with Ronnie's Tokens PR (since merged to master).

Time for sleep 🛏️

@oliver-sanders
Member

FYI: I've found another Tokens optimisation which reduces the Tokens.duplicate cost making increment_graph_window quite a lot quicker again - #5325.

Member

@oliver-sanders oliver-sanders left a comment

This is scaling much more nicely, thanks @dwsutherland!

Tested it out with the graph view to make sure the generated edges still match expectations. Good news is that the graph view is able to handle this horrible graph:

[Screenshot: graph view rendering the test workflow]

Well, kinda.

Comment on lines +833 to +838
if e_id in self.n_window_edges[active_id]:
return
if (
e_id not in self.data[self.workflow_id][EDGES]
and e_id not in self.added[EDGES]
):
Member

[minor]

Can these be combined?

Suggested change
if e_id in self.n_window_edges[active_id]:
return
if (
e_id not in self.data[self.workflow_id][EDGES]
and e_id not in self.added[EDGES]
):
if (
e_id in self.n_window_edges[active_id]
or (
e_id not in self.data[self.workflow_id][EDGES]
and e_id not in self.added[EDGES]
)
):
return

Member Author

Wouldn't that be:

        if (
            e_id in self.n_window_edges[active_id]
            or e_id in self.data[self.workflow_id][EDGES]
            or e_id in self.added[EDGES]
        ):
            return

?
Previously there was more code further on, outside this test.

Member

What happened to "Time for sleep 🛏️"? 😅

@oliver-sanders
Member

(will open a collective changelog for the efficiency changes in 8.1.1)
