-
Notifications
You must be signed in to change notification settings - Fork 595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(stream): own and recreate shared context and local barrier manager in recovery #15157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
e2e_test/batch/transaction/now.slt
Outdated
query T | ||
select * from v | ||
except | ||
select * from mv; | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can still keep this one. 😄
manager | ||
.send_barrier(barrier.clone(), actor_ids_to_send, actor_ids_to_collect) | ||
actor_op_tx | ||
.send_and_await(|tx| LocalActorOperation::InjectBarrier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we still extracting it as a method? Same for AwaitEpochCommited
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Currently we have
SharedContext
andLocalBarrierManager
that are created at CN initialization and then exists in long run.SharedContext
is for actors to communicate and exchange information, and it gets cleared in every recovery.LocalBarrierManager
is used for actors to send barrier related event to barrier worker loop via a channel. However, the same channel is still used after recovery, and it's possible that there is some stale actor events sent after the reset event are stored in the channel and get handled after recovery, which may caused bug. Therefore, theLocalBarrierManager
should also be cleared in recovery.Since both
SharedContext
andLocalBarrierManager
should be cleared in recovery, it's better to just recreate them and let the actors created after recovery to use the new ones. Therefore, in this PR, we change to own the shared context and local barrier manager in the barrier worker loop, and we will recreate them in recovery.The current
LocalBarrierEvent
mixes the events reported by actor and the requests sent from stream manager. In this PR we split the request sent from stream manager to a separate enum namedLocalActorOperation
. The stream manager no longer send request viaLocalBarrierManager
, but instead holds its own channel to send request to local barrier worker loop. TheLocalBarrierManager
is owned in the barrier worker loop, and pass to actor during actor creation.Previously the stream manager also holds a reference to
SharedContext
to handle some rpc request. In this PR we change to handle the rpc request via sending a request to barrier worker loop and wait for result.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.