feat(sink): reimplement sink coordinator worker and support scale #18467
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Our sink coordinator worker determines whether all sink metadata of an epoch has been collected by checking whether all vnodes have been collected. The coordinator worker handles epochs one by one. When it receives a commit request from one parallelism, it blocks and awaits the commit requests from all other parallelisms, commits them all together, and then waits for the next epoch. In the current implementation there are no concurrent epochs.
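As a rough illustration of the single-epoch behavior described above, here is a minimal sketch. The names `EpochCollector` and `collect`, the use of plain vnode indices instead of a bitmap, and the string metadata are all assumptions for illustration, not the actual RisingWave types.

```rust
use std::collections::HashSet;

/// Hypothetical collector for exactly one epoch at a time.
struct EpochCollector {
    epoch: u64,
    vnode_count: usize,
    collected: HashSet<usize>,
    metadata: Vec<String>,
}

impl EpochCollector {
    fn new(epoch: u64, vnode_count: usize) -> Self {
        Self {
            epoch,
            vnode_count,
            collected: HashSet::new(),
            metadata: Vec::new(),
        }
    }

    /// Record a commit request from one parallelism that owns `vnodes`.
    /// Returns `Ok(true)` once every vnode has reported for this epoch,
    /// and an error if the request belongs to a different epoch.
    fn collect(&mut self, epoch: u64, vnodes: &[usize], metadata: &str) -> Result<bool, String> {
        if epoch != self.epoch {
            // With only one epoch in flight, a request for another epoch
            // cannot be handled and is reported as an error.
            return Err(format!("expected epoch {}, got {}", self.epoch, epoch));
        }
        for &vnode in vnodes {
            if vnode >= self.vnode_count {
                return Err(format!("vnode {} out of range", vnode));
            }
            if self.collected.contains(&vnode) {
                return Err(format!("vnode {} already collected", vnode));
            }
        }
        self.collected.extend(vnodes.iter().copied());
        self.metadata.push(metadata.to_string());
        Ok(self.collected.len() == self.vnode_count)
    }
}
```

In this sketch, a request for any epoch other than the one being collected is rejected, which mirrors the one-epoch-at-a-time assumption.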
The current implementation works in the normal case, but not during scale-out. When scale-out happens, the newly created parallelism may send a commit request for a later epoch before the existing parallelisms finish a previous epoch, so more than one epoch may be in flight at the same time. The current implementation reports an error when this happens, and scale-out is actually achieved by triggering a log store rewind with that error.
In this PR, we reimplement the sink coordinator worker and support concurrent epochs by introducing an epoch queue. We add the variants `update_vnode_bitmap` and `stop` to the sink coordination request so that each sink parallelism can send the relevant information to the coordinator worker. We also force a commit on vnode bitmap update for sinks that decouple commit from the checkpoint barrier.
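The epoch-queue idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `EpochQueue` and `on_commit_request` are hypothetical names, vnodes are plain indices, and the sketch does not model the `update_vnode_bitmap` or `stop` requests. The actual worker in this PR may be structured differently.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical queue of in-flight epochs, ordered by epoch number.
struct EpochQueue {
    vnode_count: usize,
    /// For each pending epoch, the set of vnodes that have reported so far.
    pending: BTreeMap<u64, HashSet<usize>>,
}

impl EpochQueue {
    fn new(vnode_count: usize) -> Self {
        Self {
            vnode_count,
            pending: BTreeMap::new(),
        }
    }

    /// Buffer a commit request for `epoch` and return the epochs that are
    /// now ready to commit, in ascending order. A later epoch is never
    /// committed before an earlier one that is still incomplete.
    fn on_commit_request(&mut self, epoch: u64, vnodes: &[usize]) -> Vec<u64> {
        self.pending
            .entry(epoch)
            .or_default()
            .extend(vnodes.iter().copied());

        let mut committed = Vec::new();
        // Drain fully collected epochs from the front of the queue.
        while let Some(entry) = self.pending.first_entry() {
            if entry.get().len() < self.vnode_count {
                break;
            }
            committed.push(*entry.key());
            entry.remove();
        }
        committed
    }
}
```

With a queue like this, a request for a later epoch that arrives early is buffered instead of being reported as an error, and epochs are still committed strictly in order.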
Checklist
`./risedev check` (or alias, `./risedev c`)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.