feat(meta): decouple barrier collect and sync in global barrier manager #19475
base: main
impl<C: GlobalBarrierWorkerContext> GlobalBarrierWorker<C> {
    /// We need to make sure there are no changes when doing recovery
    pub async fn clear_on_err(&mut self, err: &MetaError) {
        // TODO: move this method to `complete_task.rs` and mark some structs and fields as private before merge
TODO
f311539 to bb4e0f0
Is this ready for review? Please add some description.
Not yet. Still running some tests. Will ping reviewers when it's ready for review.
596b11c to 76a77b4
Temporarily mark this PR as draft. Database isolation will be implemented in favor of #19556 and its subsequent PRs.
119a250 to 112f834
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Previously, the global barrier manager handled barriers in two phases. In phase 1, a barrier is injected into the CNs, and the manager waits for the `BarrierCompleteResponse` from the injected CNs. When the response is received, the barrier has been collected from all actors in the CN, the epoch of the barrier has been synced, and the SST files are included in the response. In phase 2, the SST files are committed to the hummock manager, and the barrier command commits its changes, if any, to the fragment and catalog managers.

In partial checkpoint, barriers are injected into multiple partial graphs, and multiple partial graphs can then be synced and committed together. Therefore, this PR decouples barrier collection and sync in the global barrier manager. In general, a barrier will have three phases: `collect`, `complete`, and `commit`. A barrier will be handled in the following steps:

1. […] `collect` phase to wait for the `BarrierCollectResponse`. Barrier injection and collection are independent across partial graphs.
2. […] `BarrierCollectResponse` from all CNs, the global barrier worker will generate a `CompleteBarrierTask` that includes them together. These partial graphs and epochs will enter the `complete` phase.
3. A `BarrierCompleteRequest` will be sent to the CNs, and the worker waits for the responses. When a CN receives the request, multiple partial graphs will be synced together.
4. When the `BarrierCompleteResponse` is received from all CNs, the partial graphs enter the `commit` phase and will be committed together to hummock, the fragment manager, and the catalog manager.

Note:

- […] `collect` phase, because it has nothing to do in the `complete` and `commit` phases.
- […] `collect` and `complete` phases. The previous single `CompleteBarrierResponse` proto message will also be divided. The previous `create_mview_progress` is moved to the `CollectBarrierResponse`.
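
The steps above can be sketched as a small state machine. This is only a toy model for illustration, not the code in this PR: `ToyBarrierWorker`, `BarrierState`, the `Phase` enum, the ID type aliases, and all method names are hypothetical, and the `CompleteBarrierTask` here is a simplified stand-in that just lists the (partial graph, epoch) pairs handled together. RPCs, SST files, and the actual commit are left out.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical identifier types; the real ones in the code base may differ.
type PartialGraphId = u32;
type WorkerId = u32;
type Epoch = u64;

/// The three phases named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Collect,
    Complete,
    Commit,
}

/// Simplified stand-in: the (partial graph, epoch) pairs that are
/// synced and committed together.
#[derive(Debug, Default, PartialEq, Eq)]
struct CompleteBarrierTask {
    graphs: Vec<(PartialGraphId, Epoch)>,
}

/// Per-graph barrier bookkeeping (hypothetical).
struct BarrierState {
    epoch: Epoch,
    phase: Phase,
    /// Workers whose collect response has not arrived yet.
    pending: HashSet<WorkerId>,
}

struct ToyBarrierWorker {
    barriers: HashMap<PartialGraphId, BarrierState>,
}

impl ToyBarrierWorker {
    fn new() -> Self {
        Self { barriers: HashMap::new() }
    }

    /// Step 1: inject a barrier into one partial graph; it starts in `Collect`,
    /// independently of every other partial graph.
    fn inject(&mut self, graph: PartialGraphId, epoch: Epoch, workers: &[WorkerId]) {
        let pending = workers.iter().copied().collect();
        self.barriers
            .insert(graph, BarrierState { epoch, phase: Phase::Collect, pending });
    }

    /// Record one collect response from a worker for a partial graph.
    fn on_collect_response(&mut self, graph: PartialGraphId, worker: WorkerId) {
        if let Some(state) = self.barriers.get_mut(&graph) {
            state.pending.remove(&worker);
        }
    }

    /// Step 2: bundle every fully collected graph into one task and move
    /// those graphs to `Complete`. Returns `None` if nothing is ready.
    fn next_complete_task(&mut self) -> Option<CompleteBarrierTask> {
        let mut graphs = Vec::new();
        for (id, state) in self.barriers.iter_mut() {
            if state.phase == Phase::Collect && state.pending.is_empty() {
                state.phase = Phase::Complete;
                graphs.push((*id, state.epoch));
            }
        }
        if graphs.is_empty() {
            return None;
        }
        graphs.sort_unstable(); // deterministic order for the example
        Some(CompleteBarrierTask { graphs })
    }

    /// Steps 3 and 4: once every complete response for the task has arrived,
    /// all graphs in the task move to `Commit` together.
    fn on_task_completed(&mut self, task: &CompleteBarrierTask) {
        for (id, _) in &task.graphs {
            if let Some(state) = self.barriers.get_mut(id) {
                state.phase = Phase::Commit;
            }
        }
    }

    fn phase(&self, graph: PartialGraphId) -> Option<Phase> {
        self.barriers.get(&graph).map(|s| s.phase)
    }
}
```

In this sketch, two partial graphs injected at different epochs are collected separately, but a single call to `next_complete_task` groups whichever graphs are fully collected at that moment. This mirrors the idea that sync and commit can be batched across partial graphs.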
Checklist
- […] `./risedev check` (or alias, `./risedev c`)
Documentation
Release note