
refactor(snapshot-backfill): extract common logic of consuming snapshot and log store #19936

Merged
merged 5 commits into from
Dec 30, 2024

Conversation


@wenym1 wenym1 commented Dec 25, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Part of #19720.

For snapshot backfill, in the main branch and the upcoming PR #19720, the stages of consuming the snapshot and consuming the log store share similar logic: both consume a stream of rows with ops and convert that stream into stream chunks. In this PR, we extract the common logic so that the two stages share the same code.
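The shape of the shared logic can be sketched as a simplified, synchronous analogue (all names here, including `Op`, `Row`, and `build_chunks`, are illustrative stand-ins, not the actual RisingWave types such as `DataChunkBuilder`):

```rust
// Illustrative sketch only: consume a stream of (op, row) pairs and pack
// them into chunks, which is the logic shared by the snapshot-consuming and
// log-store-consuming stages.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Insert,
    Delete,
}

type Row = Vec<i64>;

/// Pack (op, row) pairs into chunks of at most `capacity` rows, emitting a
/// final partial chunk for any remainder so nothing is lost at end of input.
fn build_chunks(
    rows: impl IntoIterator<Item = (Op, Row)>,
    capacity: usize,
) -> Vec<Vec<(Op, Row)>> {
    let mut chunks = Vec::new();
    let mut current = Vec::with_capacity(capacity);
    for pair in rows {
        current.push(pair);
        if current.len() == capacity {
            chunks.push(std::mem::take(&mut current));
        }
    }
    // Drain any remaining buffered rows when the input ends.
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```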

Previously, the row streams of multiple vnodes were combined inside the stream returned by the StorageTable method. In this PR, we instead combine the streams of the vnodes outside StorageTable with FuturesUnordered, so that we can access each vnode stream and track the progress of each vnode, which will be useful in #19720.
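The benefit of combining the vnode streams outside StorageTable can be illustrated with a minimal synchronous sketch (`VnodeMerger` and its round-robin polling are hypothetical stand-ins; FuturesUnordered would instead poll whichever vnode future completes first, but the key point is the same: each vnode stream stays individually accessible, so per-vnode progress can be observed):

```rust
use std::collections::HashMap;

// Hypothetical stand-in: merge per-vnode row iterators while keeping each
// iterator, and therefore its progress, individually accessible.
struct VnodeMerger<I: Iterator<Item = i64>> {
    streams: Vec<(u16, I)>,        // (vnode id, its row iterator)
    progress: HashMap<u16, usize>, // rows consumed per vnode so far
    next_idx: usize,
}

impl<I: Iterator<Item = i64>> VnodeMerger<I> {
    fn new(streams: Vec<(u16, I)>) -> Self {
        let progress = streams.iter().map(|(v, _)| (*v, 0)).collect();
        Self { streams, progress, next_idx: 0 }
    }

    /// Pull the next row from whichever vnode stream has one (round-robin
    /// here), recording per-vnode progress as a side effect.
    fn next_row(&mut self) -> Option<(u16, i64)> {
        if self.streams.is_empty() {
            return None;
        }
        for _ in 0..self.streams.len() {
            let idx = self.next_idx % self.streams.len();
            self.next_idx += 1;
            let (vnode, iter) = &mut self.streams[idx];
            if let Some(row) = iter.next() {
                *self.progress.get_mut(vnode).unwrap() += 1;
                return Some((*vnode, row));
            }
        }
        None
    }
}
```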

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

Base automatically changed from yiming/lazily-remove-uploader-instance-data to main December 25, 2024 09:15
@wenym1 wenym1 force-pushed the yiming/snapshot-backfill-vnode-stream branch from 2d2947f to 579b264 Compare December 25, 2024 09:49
@wenym1 wenym1 requested review from hzxa21 and kwannoel December 26, 2024 07:24
self.stream.poll_next_unpin(cx)
}
}

Contributor
Can we adapt the docs from the PR and add them here? To my understanding the reason we need a custom stream, rather than rely on a try_stream macro, is because we need to encapsulate some state inside the stream as well.

It is possible to do it with try_stream, but the internal state of the sink will be exposed.

Contributor Author

It's hard to use the try_stream macro here, because the vnode streams would be owned by the stream generated by try_stream, and then we couldn't access the state of the vnode streams.
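The ownership issue can be shown with a small synchronous analogue (a hand-written `Iterator` standing in for a hand-written `Stream`; `MergedRows` and `remaining_per_vnode` are illustrative names, not the PR's actual types): a hand-written struct keeps the inner vnode streams as named fields, so callers can still inspect their state between polls, whereas a generator macro like try_stream moves them into an opaque anonymous type where that state is unreachable.

```rust
// Hand-written iterator struct: the inner vnode streams stay as named
// fields, so their state remains inspectable from outside.
struct MergedRows {
    vnode_streams: Vec<std::vec::IntoIter<i64>>,
    current: usize,
}

impl MergedRows {
    /// State stays accessible: report how many rows remain per vnode stream.
    fn remaining_per_vnode(&self) -> Vec<usize> {
        self.vnode_streams.iter().map(|s| s.len()).collect()
    }
}

impl Iterator for MergedRows {
    type Item = i64;
    fn next(&mut self) -> Option<i64> {
        while self.current < self.vnode_streams.len() {
            if let Some(row) = self.vnode_streams[self.current].next() {
                return Some(row);
            }
            self.current += 1;
        }
        None
    }
}
```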

Contributor

Okay let's add some docs to provide this justification.

@wenym1 wenym1 requested a review from kwannoel December 27, 2024 07:26
match ready!(this.poll_next_row(cx)) {
Ok(Some(((op, row), second))) => {
let may_chunk = if let Some((second_op, second_row)) = second {
if this.data_chunk_builder.can_append(2) {
Contributor

Suggested change
if this.data_chunk_builder.can_append(2) {
if this.data_chunk_builder.can_append_update

Two rows always correspond to an update, so this would make the code more readable.
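The rename being suggested can be sketched with a hypothetical mini-builder (not the real `DataChunkBuilder`): a pair of rows is only ever appended for an update, i.e. a delete of the old version plus an insert of the new one, so a dedicated `can_append_update` name carries that intent.

```rust
// Hypothetical mini-builder illustrating the suggested rename.
struct ChunkBuilder {
    len: usize,
    capacity: usize,
}

impl ChunkBuilder {
    fn can_append(&self, n: usize) -> bool {
        self.len + n <= self.capacity
    }

    /// Same check as `can_append(2)`, but named for the only case where two
    /// rows are appended at once: an update (delete + insert pair).
    fn can_append_update(&self) -> bool {
        self.can_append(2)
    }
}
```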

}
} else {
this.ops.push(op);
this.data_chunk_builder.append_one_row(row)
Contributor
@kwannoel kwannoel Dec 27, 2024

How do we ensure that the stream always drains any remaining rows in data_chunk_builder?

Contributor

Oh, because when we hit Ok(None) below, we always drain it.

Will be helpful to add this as a comment.
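The drain-on-end behavior being discussed amounts to the following (a hypothetical helper, not the PR's actual code: when the underlying row stream ends, any rows still buffered in the chunk builder are flushed as a final partial chunk before the merged stream terminates):

```rust
/// Called when the underlying row stream returns Ok(None): flush any rows
/// still buffered in the (simplified) chunk builder, or end immediately.
fn on_rows_finished(builder: &mut Vec<i64>) -> Option<Vec<i64>> {
    if builder.is_empty() {
        None // nothing buffered; the stream ends right away
    } else {
        Some(std::mem::take(builder)) // emit the final partial chunk
    }
}
```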

Contributor
@kwannoel kwannoel left a comment

LGTM.

@wenym1 wenym1 force-pushed the yiming/snapshot-backfill-vnode-stream branch from 6e0ccbc to de3fef3 Compare December 30, 2024 06:32
@graphite-app graphite-app bot requested a review from a team December 30, 2024 06:50
@wenym1 wenym1 added this pull request to the merge queue Dec 30, 2024
@wenym1 wenym1 removed this pull request from the merge queue due to a manual request Dec 30, 2024
@wenym1 wenym1 force-pushed the yiming/snapshot-backfill-vnode-stream branch from de3fef3 to 627c55a Compare December 30, 2024 09:30
@wenym1 wenym1 added this pull request to the merge queue Dec 30, 2024
Merged via the queue into main with commit 705cdf1 Dec 30, 2024
30 of 31 checks passed
@wenym1 wenym1 deleted the yiming/snapshot-backfill-vnode-stream branch December 30, 2024 10:53