-
Notifications
You must be signed in to change notification settings - Fork 591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(backfill): Persist backfill operator state #9752
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
0fe7198
to
521d683
Compare
521d683
to
6302d34
Compare
Codecov Report
@@ Coverage Diff @@
## main #9752 +/- ##
==========================================
- Coverage 71.17% 71.11% -0.07%
==========================================
Files 1250 1249 -1
Lines 209551 209727 +176
==========================================
- Hits 149155 149145 -10
- Misses 60396 60582 +186
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 13 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we think there are no internal states for creating indexes previously, but after this PR, every mv on mv will have states. We need to modify the meta to persist those internal states for creating indexes, otherwise, we can't see the backfill internal states for indexes.
424daa3
to
c33d8d5
Compare
f1d2b5a
to
76b5206
Compare
ecb6ba3
to
2bd05bb
Compare
This comment was marked as outdated.
This comment was marked as outdated.
451cbe4
to
330d5af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we decided to store the state per-vnode, are we going to support scaling a materialized view that is being created? If so, I guess there's some further stuff to resolve:
- We should revisit the logic of the scaling on the meta service.
risingwave/src/meta/src/stream/scale.rs
Lines 417 to 423 in 318714b
state @ table_fragments::State::Initial | state @ table_fragments::State::Creating => { bail!( "the materialized view of fragment {fragment_id} is in state {}", state.as_str_name() ) } - We need to refine the progress report to vnode granularity, instead of actor in the current design.
BTW, do we have some tests for checking whether the recovery works? IMO, we should check the fail-over occurred
A convenient way for testing this could be writing some unit tests, where we can drop the executor instance and rebuild it with the same state store backend to simulate the fail-over progress. For example: risingwave/src/stream/src/executor/watermark_filter.rs Lines 455 to 478 in 318714b
|
Recovery is not fully supported in this PR. Agreed on adding the tests in future PRs, when we support recovery for partial backfill state. This PR just adds state persistence, there still needs to be changes to meta to scale created mview. For now I think adding some simple recovery tests to check internal state pre and post recovery should suffice + materialized view contents. What do you think? |
Then I think it's okay to leave it to next PRs. 🫡 |
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Many thanks to @chenzl25 who has provided lots of advice, discussion and review around it.
Changes
Backfill state is persisted. There are still further changes before #8051 can be fully supported, specifically allowing meta to persist partially complete state, implementing the logic to recover from partially complete state, and defining valid partially complete states.
This also means pretty much all queries become stateful. This is due to StreamTableScan becoming stateful (if backfill is used).
Implementation notes
Persist backfill operator state. This is encoded in the following form:
Progress is partitioned by
vnode
and updated for allpos
.If backfill is finished,
backfill_finished
column should betrue
.This PR involves changes to:
Closes #6275
Checklist:
./risedev test
.UpstreamHashShard
.Checklist For Contributors
./risedev check
(or alias,./risedev c
)Checklist For Reviewers
Documentation
Click here for Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note