
feat: compute node unregisters from meta for graceful shutdown #17662

Merged: 11 commits merged into main from bz/more-graceful-shutdown on Jul 26, 2024

Conversation

@BugenZhao (Member) commented Jul 11, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

We left some TODOs when introducing graceful shutdown for the compute node in #17533. After this PR, the compute node will unregister itself from the meta service and proactively shut down its barrier control stream on graceful shutdown.

Specifically,

  • The compute node first unregisters from the meta service, so that subsequent batch queries and streaming jobs won't be scheduled on it.
  • Then, it sends a Shutdown message on the barrier control stream, triggering a recovery on the new set of compute nodes.
  • After that, the compute node waits for the connection to be reset.
  • Finally, it exits the entrypoint function and then the process gracefully. (A rough sketch of this sequence is shown right after this list.)
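For illustration, here is a minimal, hypothetical sketch of this sequence on the compute-node side, assuming a tokio-based runtime. The names (MetaClient, try_unregister, the Shutdown response, sender.closed()) mirror the diff excerpts quoted later in this thread, but the types and overall structure are simplified stand-ins, not the actual RisingWave implementation:

```rust
// Simplified, hypothetical sketch of the graceful-shutdown sequence.
use tokio::sync::mpsc;

enum StreamingControlStreamResponse {
    Shutdown,
    // other variants elided
}

struct MetaClient;

impl MetaClient {
    /// Unregister this worker from the meta service (RPC elided);
    /// errors are ignored on a best-effort basis.
    async fn try_unregister(&self) {}
}

async fn graceful_shutdown(
    meta_client: &MetaClient,
    sender: mpsc::UnboundedSender<StreamingControlStreamResponse>,
) {
    // 1. Unregister so upcoming batch queries and streaming jobs are not scheduled here.
    meta_client.try_unregister().await;

    // 2. Ask meta to trigger a recovery on the remaining compute nodes.
    let _ = sender.send(StreamingControlStreamResponse::Shutdown);

    // 3. Wait for meta to drop its side of the control stream, which closes ours.
    sender.closed().await;

    // 4. Return, letting the entrypoint and then the process exit.
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::unbounded_channel();
    // Simulate the meta service dropping its end of the control stream on recovery.
    drop(rx);
    graceful_shutdown(&MetaClient, tx).await;
}
```

In the actual PR these steps are spread across the compute node entrypoint and the barrier manager; the point here is only the ordering of the four steps.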

Additionally, this PR improves error reporting and code style.


After this PR, I suppose we can get rid of the extra manual step of risingwave ctl meta unregister-worker when scaling in, as described in the doc.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@BugenZhao force-pushed the bz/07-09-standalone_graceful_shutdown branch from 3142219 to d9a8caf on July 12, 2024 06:41
@BugenZhao force-pushed the bz/more-graceful-shutdown branch from 5543066 to f369a26 on July 12, 2024 06:41
Base automatically changed from bz/07-09-standalone_graceful_shutdown to main on July 16, 2024 03:24
@BugenZhao force-pushed the bz/more-graceful-shutdown branch from f369a26 to 698dbe0 on July 23, 2024 07:06
@BugenZhao changed the title from "refactor: more graceful shutdown" to "refactor: more graceful shutdown on compute and meta node" on Jul 24, 2024
@BugenZhao changed the title from "refactor: more graceful shutdown on compute and meta node" to "feat: more graceful shutdown on compute and meta node" on Jul 24, 2024
@BugenZhao changed the title from "feat: more graceful shutdown on compute and meta node" to "feat: more graceful shutdown on compute node" on Jul 24, 2024
@BugenZhao changed the title from "feat: more graceful shutdown on compute node" to "feat: compute node unregisters from meta for graceful shutdown" on Jul 24, 2024
@BugenZhao marked this pull request as ready for review on July 24, 2024 08:04
@graphite-app bot requested a review from a team on July 24, 2024 08:04
}
}
for (node_id, actors) in deleted_actors {
let node = self.node_map.get(&node_id);
warn!(
@BugenZhao (Member, Author) commented:

This is to make the logs more concise.

Comment on lines +2662 to +2655
// Skip workers that aren't streaming compute nodes; otherwise we'd emit false warnings for them.
if !worker_is_streaming_compute(&worker) {
    continue;
}
@BugenZhao (Member, Author) commented:

Otherwise we get false warnings.

// This is because the meta service will reset the control stream manager and
// drop the connection to us upon recovery. As a result, the receiver part of
// this sender will also be dropped, causing the stream to close.
sender.closed().await;
@BugenZhao (Member, Author) commented:

Note the newly introduced procedure of self.control_stream_manager.clear(); during recovery.
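To illustrate why waiting on sender.closed() works, here is a small hypothetical sketch, assuming the meta-side manager owns one connection handle per compute node and that dropping the handle is what closes the control stream. All names and types below are stand-ins, not the actual RisingWave code:

```rust
use std::collections::HashMap;
use tokio::sync::mpsc;

/// Stand-in for the per-node connection state held by the meta node.
struct ControlStreamHandle {
    /// Receiving end of the compute node's response stream. Dropping it is
    /// what the compute node observes via `sender.closed()`.
    _response_rx: mpsc::UnboundedReceiver<()>,
}

#[derive(Default)]
struct ControlStreamManager {
    nodes: HashMap<u32, ControlStreamHandle>,
}

impl ControlStreamManager {
    /// Called during recovery: dropping every handle closes all control streams.
    fn clear(&mut self) {
        self.nodes.clear();
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::unbounded_channel::<()>();
    let mut manager = ControlStreamManager::default();
    manager.nodes.insert(1, ControlStreamHandle { _response_rx: rx });

    // Recovery on the meta side...
    manager.clear();

    // ...is observed as a closed stream on the compute-node side.
    tx.closed().await;
}
```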

Review thread on src/stream/src/task/barrier_manager.rs (outdated, resolved)
@graphite-app bot requested a review from a team on July 24, 2024 08:24
@BugenZhao (Member, Author) commented:

One issue is that scaling-in is also triggered unnecessarily when killing single-node or playground instances, leading to verbose warning logs. 🤡 I'm considering whether to introduce a flag to bypass the procedure.

// Unregister from the meta service, then...
// - batch queries will not be scheduled to this compute node,
// - streaming actors will not be scheduled to this compute node after next recovery.
meta_client.try_unregister().await;
A contributor commented:

Will a worker without any actors trigger a meta recovery if it exits?

@BugenZhao (Member, Author) replied:

This is a good point. Updated the code to not trigger a recovery if an idle compute node exits.
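A rough sketch of what that meta-side check could look like, assuming the decision hinges on whether the departing worker still hosts actors or has in-flight barriers. The field names loosely follow the diff excerpt quoted further down in this thread, while the surrounding types are simplified stand-ins:

```rust
// Hypothetical meta-side handling of a Shutdown response from a compute node.
use std::collections::HashMap;

struct ControlStreamNode {
    /// Barriers injected but not yet fully collected from this worker.
    inflight_barriers: Vec<u64>,
    /// Number of streaming actors currently hosted on this worker.
    actor_count: usize,
}

struct ControlStreamManager {
    nodes: HashMap<u32, ControlStreamNode>,
}

impl ControlStreamManager {
    /// Handle a `Shutdown` response from a compute node. Returns `true` if a
    /// recovery should be triggered for the remaining nodes.
    fn handle_shutdown_resp(&mut self, worker_id: u32) -> bool {
        let node = self
            .nodes
            .remove(&worker_id)
            .expect("should exist when get shutdown resp");
        // An idle compute node (no actors, no in-flight barriers) can leave
        // without forcing a cluster-wide recovery.
        node.actor_count > 0 || !node.inflight_barriers.is_empty()
    }
}

fn main() {
    let mut manager = ControlStreamManager {
        nodes: HashMap::from([(
            1,
            ControlStreamNode { inflight_barriers: vec![], actor_count: 0 },
        )]),
    };
    // An idle worker exiting should not trigger a recovery.
    assert!(!manager.handle_shutdown_resp(1));
}
```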

@yezizp2012 (Member) left a comment:

The impl totally LGTM. Btw, normal upgrades and going online and offline will also lead to frequent scaling. In cases where the cluster load is very high, I am not sure whether the current auto-scaling stability is sufficient. That's my only concern.
Not related to this PR: I have discussed with @shanicky that we can introduce a pre-unregister interface for the compute node, so that meta can schedule an online scale-in without recovery. We can do that once this PR gets merged.

@yezizp2012 (Member) commented:

> One issue is that scaling-in is also triggered unnecessarily when killing single-node or playground instances, leading to verbose warning logs. 🤡 I'm considering whether to introduce a flag to bypass the procedure.

+1. One similar thing is that we can avoid leader election as well for single-node or playground instances.

@BugenZhao (Member, Author) commented Jul 25, 2024:

> we can introduce a pre-unregister interface for the compute node, so that meta can schedule an online scale-in without recovery

This is a good point. Theoretically everything can be done online without a recovery. However, given that

  • we always clear the executor cache when scaling in, so there might be no big difference in streaming performance,
  • recovery does not affect batch availability,
  • scaling online can be less responsive (depending on the number of in-flight barriers), which may not fit within the default killing timeout of 30s in Kubernetes,

I'm not sure if this can improve much. 🤔

@wenym1 (Contributor) left a comment:

Rest LGTM

Review thread on src/stream/src/task/barrier_manager.rs (resolved)
Review thread on src/stream/src/task/barrier_manager.rs (outdated, resolved)
Review thread on proto/stream_service.proto (resolved)
.nodes
.remove(&worker_id)
.expect("should exist when get shutdown resp");
let has_inflight_barrier = !node.inflight_barriers.is_empty();
A contributor commented:

Note that, in #17758, inflight_barriers will be removed

@shanicky (Contributor) left a comment:

LGTM!

@BugenZhao requested a review from wenym1 on July 26, 2024 08:24
@BugenZhao force-pushed the bz/more-graceful-shutdown branch from 8c20bad to 4dcfb9d on July 26, 2024 08:26
@wenym1 (Contributor) left a comment:

LGTM!

@BugenZhao added this pull request to the merge queue on Jul 26, 2024
Merged via the queue into main with commit f389a77 Jul 26, 2024
33 of 34 checks passed
@BugenZhao deleted the bz/more-graceful-shutdown branch on July 26, 2024 10:16
@yezizp2012 (Member) commented:

> we always clear the executor cache when scaling in, so there might be no big difference in streaming performance

Oh indeed, I didn't realize it until you mentioned it. If so, there's not much difference from recovery.
