-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: watermark filter use commited epoch to read global watermark #17724
fix: watermark filter use commited epoch to read global watermark #17724
Conversation
barrier_num_during_idle += 1; | ||
|
||
if barrier_num_during_idle == UPDATE_GLIOBAL_WATERMARK_FREQUENCY_WHEN_IDLE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it, the issue will still occur when "relace the table", as it cannot be guaranteed that all actors have been dropped by the time execution reaches here. After adding this logic, it will wait for 10 barriers before proceeding to this point, thereby avoiding the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
current_watermark = Self::get_global_max_watermark( | ||
&table, | ||
&global_watermark_table, | ||
HummockReadEpoch::Committed(prev_epoch), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess NoWait
is still okay here, cuz we're sure prev_epoch
of the Pause
checkpoint is committed here.
risingwave/src/meta/src/barrier/command.rs
Lines 892 to 900 in 4605773
Command::Pause(reason) => { | |
if let PausedReason::ConfigChange = reason { | |
// After the `Pause` barrier is collected and committed, we must ensure that the | |
// storage version with this epoch is synced to all compute nodes before the | |
// execution of the next command of `Update`, as some newly created operators | |
// may immediately initialize their states on that barrier. | |
self.wait_epoch_commit(self.prev_epoch.value().0).await?; | |
} | |
} |
By NoWait
on update.prev_epoch
(i.e. pause.curr_epoch
), we're essentially read on the exactly same snapshot as pause.prev_epoch
because we ensure that during the Pause
barrier no data is written.
…7724) (#17762) Co-authored-by: stonepage <[email protected]>
…7724) (#17761) Co-authored-by: stonepage <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
fix #16836
But when making modifications, it was found that, in addition to initialization and scaling, querying the global watermark is also necessary when the source is idle. In this situation, using epoch to query could lead to a degradation in concurrent checkpointing, as it would require waiting for the epoch Here, reading with an specific epoc is too restrictive. In fact, what the watermark filter needs here is "to query the most recently committed version available, the newer the better" That is, `get_row(pk, HummockReadEpoch::Committed(current_max_commited_epoch))`.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.