feat(stream): wait committed epoch in state table init_epoch #19223
Conversation
Force-pushed from 7fd5a68 to 18a7f90
Hints for reviewers: on the storage side, the major changes are as follows.
On the streaming side, the major changes are as follows.
LGTM on the storage change.
let mut latest_version = if let Some(CacheRefillerEvent {
    pinned_version,
    new_pinned_version,
}) = self.refiller.clear()
Just a note: after this PR, the refiller buffer won't get cleared on handle_clear. This won't affect correctness, but it may slightly delay future version updates, because a new version update needs to wait for the previous refiller events to complete before it can take effect.
How come we no longer need to clear refiller?
The refiller was previously cleared for the logic of waiting to receive the latest hummock version. The event handler had to wait to receive the latest hummock version from the meta node, so that the newly spawned actors could see the latest hummock version after recovery. However, after this PR, the spawned actors will explicitly wait for the committed_epoch of the state table to bump up to the prev_epoch in init_epoch, so we no longer need to wait for the latest hummock version globally during recovery.
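Conceptually, the per-table wait described above can be sketched as follows. This is a toy model using a std channel as the stream of version-update notifications; wait_committed_epoch and the channel are hypothetical stand-ins for the real version-subscription mechanism, not RisingWave code.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch: block until the table's committed_epoch, delivered via
// version-update notifications, reaches prev_epoch. Returns the epoch seen.
fn wait_committed_epoch(rx: &mpsc::Receiver<u64>, prev_epoch: u64) -> u64 {
    let mut committed = 0;
    while committed < prev_epoch {
        committed = rx.recv().expect("version update stream closed");
    }
    committed
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Simulate the hummock version advancing in the background.
    thread::spawn(move || {
        for epoch in [3u64, 4, 5] {
            tx.send(epoch).unwrap();
        }
    });
    // The waiter returns once the committed epoch catches up to prev_epoch.
    assert_eq!(wait_committed_epoch(&rx, 5), 5);
}
```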
The streaming part LGTM
Although in this PR we seem to have correctly changed all occurrences, it still looks dangerous to me that some other developer could easily break the order.
     let first_barrier = expect_first_barrier(&mut input_stream).await?;
-    let mut state = self.state;
-    state.init(first_barrier.epoch).await?;
+    let first_epoch = first_barrier.epoch;
     yield Message::Barrier(first_barrier);
+    let mut state = self.state;
+    state.init(first_epoch).await?;
Is it possible to have a declarative macro to enforce the order of yielding and using the first barrier? Something like

handle_first_barrier!(&mut input_stream, |barrier| {
    self.state.init(barrier.epoch).await?;
});

which internally calls expect_first_barrier + yield Barrier + the provided init block.
It's difficult, because yield barrier in the custom macro will fail to compile in the try_stream proc macro.
> Although in this PR we seem to have correctly changed all occurrences, it still looks dangerous to me that some other developer could easily break the order.
I think it's fine: if the order is not followed, a deadlock must happen. As long as the newly developed features are covered by CI tests, which we are supposed to ensure, we can be confident that the order is handled correctly, and otherwise something must be wrong.
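The "wrong order means guaranteed deadlock" argument can be illustrated with a toy model. BarrierManager and its methods below are hypothetical simplifications, not RisingWave types; the only point captured is that committed_epoch can advance only after the yielded barrier is collected.

```rust
// Toy model (not RisingWave code): committed_epoch advances only when a
// yielded barrier is collected by the barrier manager.
struct BarrierManager {
    committed_epoch: u64,
}

impl BarrierManager {
    // Collecting the yielded first barrier is what commits its epoch.
    fn collect_barrier(&mut self, epoch: u64) {
        self.committed_epoch = epoch;
    }

    // init_epoch's wait condition: true once the wait could complete.
    fn wait_done(&self, prev_epoch: u64) -> bool {
        self.committed_epoch >= prev_epoch
    }
}

fn main() {
    let prev_epoch = 5;
    let mut mgr = BarrierManager { committed_epoch: 0 };

    // Wrong order: init_epoch waits before the first barrier is yielded and
    // collected. Nothing else can advance committed_epoch, so the wait
    // condition stays false forever -- the deadlock the PR warns about.
    assert!(!mgr.wait_done(prev_epoch));

    // Correct order: yield the barrier so it gets collected, then wait.
    mgr.collect_barrier(prev_epoch);
    assert!(mgr.wait_done(prev_epoch));
}
```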
OK, then maybe we can add some notes on this in the doc of init_epoch.
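Such a doc note might look something like the sketch below. The StateTable stub and the synchronous signature here are placeholders (the real method is async); only the wording of the warning is the point.

```rust
// Sketch of a possible doc note; `StateTable` is a stub standing in for the
// real RisingWave type, and the body is a placeholder.
struct StateTable;

impl StateTable {
    /// Initialize the table with the epoch carried by the first barrier.
    ///
    /// # Deadlock warning
    ///
    /// This method waits for the table's `committed_epoch` to reach
    /// `prev_epoch`. That commit can only happen after the first barrier has
    /// been collected, so callers MUST yield the first barrier downstream
    /// BEFORE calling this method; calling it first blocks barrier handling
    /// and deadlocks the actor.
    fn init_epoch(&mut self, prev_epoch: u64) -> u64 {
        // Placeholder: the real implementation is async and actually waits.
        prev_epoch
    }
}

fn main() {
    let mut table = StateTable;
    assert_eq!(table.init_epoch(3), 3);
}
```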
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
The init_epoch part of #18312.

It is required that after init_epoch returns, we can always read consistent data on prev_epoch; otherwise, we will read inconsistent data and pollute the downstream. init_epoch is called during recovery and configuration change. Previously, in recovery, this was ensured by the CN waiting for the latest hummock version, and in configuration change, by pausing the barrier injection and doing a CN-wide try_wait_epoch call in the WaitEpochCommit streaming service RPC to wait for the committed epoch to bump up globally.

In this PR, in the init_epoch of StateTable, we change to wait for the committed_epoch of the state table to bump up to the prev_epoch. With this change, during recovery there is no need for the CN to wait for the latest hummock version, and for configuration changes that do not update the vnode bitmap (replace table, sink into table; not including scale), there is no need to do the pause-wait-resume barrier injection.

Deadlock can easily happen here. Bumping up committed_epoch depends on collecting barriers from all actors, but if init_epoch blocks the handling of the initial barrier while waiting for committed_epoch to bump up, barrier handling is blocked, committed_epoch never bumps up, and we deadlock. To avoid this, we must strictly follow the order when handling the initial barrier: receive the first barrier, yield the first barrier, and only then call StateTable::init_epoch.

In this PR, we ensure that all usages of StateTable::init_epoch are included in the changed code so that we can review them carefully. The StateTable::init_epoch method becomes async, so all direct usages of it appear in the changed code. Some non-async utility methods that call StateTable::init_epoch must become async as well, and therefore appear in the changed code too. There are two utility methods, both named init_epoch, that were already async and call StateTable::init_epoch; to include their usages in the changed code, both are renamed to init_epoch_after_yield_barrier. Moreover, if the handling order is incorrect, a deadlock must happen when the problematic code is reached. Hence, if we can pass all the CI tests, we can be confident that correctness is fulfilled.

Checklist
./risedev check (or alias, ./risedev c)
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.