feat(config): make arrangement backfill default #14846
Need to figure out:
E2E test runtime comparison (debug mode)
Runtime is still too long even without debug: 8 minutes vs. 6 minutes for a normal PR. https://buildkite.com/risingwavelabs/pull-request/builds/40961#018d590f-88cb-4868-9ad0-d57ad894bd8d
Arrangement backfill passes the backfill performance tests.
* "tomb" refers to tombstones, which are generated when values are deleted. An old issue, #12680, shows that backfill had problems when there was a large number of tombstones.
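To see why a large number of tombstones can hurt backfill, consider a toy ordered key-value store in which a delete writes a tombstone instead of removing the entry. This is a simplified Rust sketch, not RisingWave's storage engine, and the scan_live helper is hypothetical. A full scan still has to step over every tombstone, so its cost tracks the number of entries written rather than the number of live rows returned.

```rust
use std::collections::BTreeMap;

/// Toy ordered store: `Some(value)` is a live row, `None` is a tombstone
/// left behind by a delete (the entry itself is not removed).
type Store = BTreeMap<u32, Option<&'static str>>;

/// Scan the whole store, returning the live rows and the number of
/// entries visited (tombstones included).
fn scan_live(store: &Store) -> (Vec<(u32, &'static str)>, usize) {
    let mut visited = 0;
    let mut live = Vec::new();
    for (&key, value) in store.iter() {
        visited += 1; // every entry costs a step, even a tombstone
        if let Some(row) = value {
            live.push((key, *row));
        }
    }
    (live, visited)
}

fn main() {
    let mut store: Store = BTreeMap::new();
    for key in 0..1000u32 {
        store.insert(key, Some("row"));
    }
    // Delete all but the last 10 rows by overwriting them with tombstones.
    for key in 0..990u32 {
        store.insert(key, None);
    }
    let (live, visited) = scan_live(&store);
    println!("returned {} rows after visiting {} entries", live.len(), visited);
}
```

Deleting 990 of 1,000 rows leaves a scan that returns 10 rows but still visits 1,000 entries.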
The fix for the parallel in-memory tests is here: #15930
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR makes arrangement backfill the default backfill by changing streaming_use_arrangement_backfill to true by default. We also fix some CI issues that blocked this from becoming the default.
Changes
A large number of the changes come from running ./risedev dapt to update the planner tests, since ArrangementBackfill is now the scan type instead of Backfill.
In this PR we also hide some compactor table_ids, since they take up a lot of logging space and are not very useful for debugging CI.
Here are the regressions in PR runtimes:
This is fine IMO, since it only affects debug builds. More importantly, main-cron does not show regressions for the e2e release tests or the backfill tests.
Here are the changes to main-cron runtimes (see #14846 (comment) for details):
ReplicatedStateTable. Metadata is larger, since we need the full TableCatalog instead of just the TableDesc.
Performance comparisons between this and no-shuffle backfill:
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
Arrangement Backfill decouples upstream streaming jobs from their downstream counterparts.
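As a rough illustration of this decoupling, the toy Rust model below compares two ways of wiring a downstream job to its upstream. It is a sketch under simplifying assumptions, not RisingWave's scheduler: it assumes the upstream arrangement is split into a fixed set of virtual nodes (vnodes), with VNODE_COUNT = 16 chosen arbitrarily, and the helpers no_shuffle_plan, arrangement_plan and assign_vnodes are hypothetical. A NoShuffle-style plan pairs downstream actor i with upstream actor i, so it only works when both sides have the same parallelism. An arrangement-style plan gives each downstream actor a range of vnodes to scan, whatever the upstream parallelism is.

```rust
/// Hypothetical number of virtual nodes, kept small for illustration.
const VNODE_COUNT: usize = 16;

/// Split vnodes 0..VNODE_COUNT into `parallelism` contiguous, nearly even ranges.
fn assign_vnodes(parallelism: usize) -> Vec<Vec<usize>> {
    assert!(parallelism > 0 && parallelism <= VNODE_COUNT);
    let mut actors = vec![Vec::new(); parallelism];
    for vnode in 0..VNODE_COUNT {
        actors[vnode * parallelism / VNODE_COUNT].push(vnode);
    }
    actors
}

/// NoShuffle-style plan: downstream actor i reads upstream actor i directly,
/// so both jobs must run with the same parallelism.
fn no_shuffle_plan(upstream: usize, downstream: usize) -> Result<Vec<(usize, usize)>, String> {
    if upstream != downstream {
        return Err(format!("parallelism mismatch: upstream {upstream}, downstream {downstream}"));
    }
    Ok((0..downstream).map(|i| (i, i)).collect())
}

/// Arrangement-style plan: for each downstream actor, return the vnodes it
/// scans and the upstream actors that currently own those vnodes.
fn arrangement_plan(upstream: usize, downstream: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    // Upstream actor owning a vnode, using the same contiguous split.
    let owner = |vnode: usize| vnode * upstream / VNODE_COUNT;
    assign_vnodes(downstream)
        .into_iter()
        .map(|vnodes| {
            let mut sources: Vec<usize> = vnodes.iter().map(|&v| owner(v)).collect();
            sources.dedup(); // vnodes are ascending, so owners are already sorted
            (vnodes, sources)
        })
        .collect()
}

fn main() {
    // m1 keeps 4 actors while m2 is scaled down to 3.
    assert!(no_shuffle_plan(4, 3).is_err());
    for (actor, (vnodes, sources)) in arrangement_plan(4, 3).iter().enumerate() {
        println!("m2 actor {actor}: vnodes {vnodes:?} <- m1 actors {sources:?}");
    }
}
```

Here m2 shrinks to 3 actors while m1 keeps 4: the NoShuffle-style plan is rejected, while in the arrangement-style plan each m2 actor reads the vnodes it owns, which may be spread across more than one m1 actor.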
Consider the following:
Before this PR, we used NoShuffleBackfill to merge the historical data and the update stream scanned from m1 into m2. The implementation of NoShuffleBackfill couples the parallelism of m1 and m2, so if m2 scales, m1 also needs to be scaled.
With ArrangementBackfill, we decouple this behaviour, so m2 can scale independently of m1.
Note that Arrangement Backfill is enabled by default. You can disable it with: