feat: compute node unregisters from meta for graceful shutdown #17662
Conversation
}
}
for (node_id, actors) in deleted_actors {
let node = self.node_map.get(&node_id);
warn!(
This is to make the logs more concise.
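For illustration, a hedged sketch of the kind of concise, structured warning this refers to. It assumes the tracing and tracing-subscriber crates; the helper and field names are hypothetical, not the actual code in this PR.

// Hypothetical helper: one structured warning per node instead of a verbose dump.
use tracing::warn;

fn report_deleted_actors(node_id: u32, node_host: Option<&str>, actors: &[u32]) {
    warn!(
        node_id,
        host = node_host.unwrap_or("<unknown>"),
        ?actors,
        "cleaning up actors on worker missing from the node map"
    );
}

fn main() {
    // A subscriber is required to actually see the output.
    tracing_subscriber::fmt::init();
    report_deleted_actors(42, None, &[1001, 1002]);
}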
if !worker_is_streaming_compute(&worker) {
continue;
}
Otherwise we get false warnings.
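For context, a minimal, self-contained sketch of such a check. The Worker struct and its fields are hypothetical stand-ins, not RisingWave's actual worker metadata; the point is simply that non-streaming workers are skipped so they don't produce false warnings.

// Hypothetical types for illustration only.
#[derive(PartialEq)]
enum WorkerType {
    ComputeNode,
    Frontend,
}

struct Worker {
    r#type: WorkerType,
    // A serving-only compute node would have this set to false.
    is_streaming: bool,
}

fn worker_is_streaming_compute(worker: &Worker) -> bool {
    worker.r#type == WorkerType::ComputeNode && worker.is_streaming
}

fn main() {
    let serving_only = Worker { r#type: WorkerType::ComputeNode, is_streaming: false };
    let frontend = Worker { r#type: WorkerType::Frontend, is_streaming: false };
    let streaming = Worker { r#type: WorkerType::ComputeNode, is_streaming: true };
    assert!(!worker_is_streaming_compute(&serving_only));
    assert!(!worker_is_streaming_compute(&frontend));
    assert!(worker_is_streaming_compute(&streaming));
}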
// This is because the meta service will reset the control stream manager and
// drop the connection to us upon recovery. As a result, the receiver part of
// this sender will also be dropped, causing the stream to close.
sender.closed().await;
Note the newly introduced procedure of self.control_stream_manager.clear(); during recovery.
One issue is that scaling-in is also triggered unnecessarily when killing single-node or playground instances, leading to verbose warning logs. 🤡 I'm considering whether to introduce a flag to bypass the procedure.
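A minimal, runnable sketch of the pattern in the snippet above, assuming the tokio crate and using a plain mpsc channel as a stand-in for the control stream: Sender::closed() resolves once the other side drops its receiver, which is what happens when meta resets the control stream manager during recovery.

use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (sender, receiver) = mpsc::channel::<String>(16);

    // Simulate the meta service dropping its end of the connection upon recovery.
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(100)).await;
        drop(receiver);
    });

    // Resolves once all receivers are gone, i.e. the stream is closed.
    sender.closed().await;
    println!("control stream closed by the remote side; shutting down");
}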
// Unregister from the meta service, then...
// - batch queries will not be scheduled to this compute node,
// - streaming actors will not be scheduled to this compute node after next recovery.
meta_client.try_unregister().await;
Will a worker without any actors trigger a meta recovery if it exits?
This is a good point. Updated the code to not trigger recovery if an idle compute node exits.
Reverted per discussion in https://github.com/risingwavelabs/risingwave/pull/17662/files#r1692609064.
The impl totally LGTM. Btw, normal upgrades and nodes going online and offline will also lead to frequent scaling. In cases where the cluster load is very high, I am not sure if the current auto-scaling stability is sufficient. That's my only concern.
Not related to this PR: I have discussed with @shanicky that we can introduce a pre-unregister interface for the compute node, so that meta can schedule an online scale-in without a recovery. We can do it once this PR gets merged.
+1. One similar thing is that we can avoid leader election as well for single-node or playground instances.
This is a good point. Theoretically everything can be done online without a recovery. However, due to the fact that …, I'm not sure if this can improve much. 🤔
Rest LGTM
src/meta/src/barrier/rpc.rs (outdated)
.nodes
.remove(&worker_id)
.expect("should exist when get shutdown resp");
let has_inflight_barrier = !node.inflight_barriers.is_empty();
Note that, in #17758, inflight_barriers will be removed.
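For illustration, a hedged, self-contained simplification of what the snippet above does conceptually; the types are hypothetical stand-ins, not the actual ControlStreamManager (and, as noted, inflight_barriers is slated for removal in #17758).

use std::collections::HashMap;

// Hypothetical simplification, not RisingWave's actual code.
struct ControlStreamNode {
    // Epochs of barriers injected into this node but not yet collected.
    inflight_barriers: Vec<u64>,
}

struct ControlStreamManager {
    nodes: HashMap<u32, ControlStreamNode>,
}

impl ControlStreamManager {
    // Handle a Shutdown response from a worker: drop its control-stream state
    // and report whether it still had inflight barriers (i.e. pending work).
    fn handle_shutdown_resp(&mut self, worker_id: u32) -> bool {
        let node = self
            .nodes
            .remove(&worker_id)
            .expect("should exist when get shutdown resp");
        !node.inflight_barriers.is_empty()
    }
}

fn main() {
    let mut manager = ControlStreamManager {
        nodes: HashMap::from([(1, ControlStreamNode { inflight_barriers: vec![100, 101] })]),
    };
    assert!(manager.handle_shutdown_resp(1));
}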
LGTM!
This reverts commit 3142219.
Signed-off-by: Bugen Zhao <[email protected]>
LGTM!
Oh indeed. I didn't realize it until you mentioned it. If so, there's not much difference from recovery.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
We left some TODOs when introducing graceful shutdown for compute nodes in #17533. After this PR, the compute node will unregister itself from the meta service and proactively shut down its barrier control stream on graceful shutdown.
Specifically,
- the compute node unregisters itself from the meta service, so that batch queries will no longer be scheduled to it and streaming actors will not be scheduled to it after the next recovery;
- it then reports a Shutdown message on the barrier control stream, triggering a recovery on the new set of compute nodes.

Additionally, improved error reporting and code styles.
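To make the ordering concrete, a hedged sketch of the sequence on the compute-node side (assuming the tokio crate); MetaClient, BarrierControlStream, and report_shutdown are hypothetical placeholders here, while try_unregister is the call that appears in this PR's diff.

// Sketch only; the types below are placeholders, not RisingWave's actual APIs.
struct MetaClient;
struct BarrierControlStream;

impl MetaClient {
    // "try_": a failed unregister is logged and ignored, so it never blocks shutdown.
    async fn try_unregister(&self) {
        // ... RPC to the meta service ...
    }
}

impl BarrierControlStream {
    async fn report_shutdown(&self) {
        // ... send a Shutdown message so meta triggers a recovery
        //     on the remaining compute nodes ...
    }
}

async fn graceful_shutdown(meta_client: &MetaClient, control_stream: &BarrierControlStream) {
    // 1. Unregister first: no new batch queries are scheduled here, and no
    //    streaming actors will be scheduled here after the next recovery.
    meta_client.try_unregister().await;
    // 2. Then proactively shut down the barrier control stream.
    control_stream.report_shutdown().await;
}

#[tokio::main]
async fn main() {
    graceful_shutdown(&MetaClient, &BarrierControlStream).await;
}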
After this PR, I suppose we can get rid of the extra manual step of risingwave ctl meta unregister-worker when scaling-in, as described in the doc.

Checklist
./risedev check (or alias, ./risedev c)

Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.