
refactor(snapshot-backfill): extract common logic of consuming snapshot and log store #19936

Merged
merged 5 commits into from
Dec 30, 2024

Conversation


@wenym1 wenym1 commented Dec 25, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Part of #19720.

For snapshot backfill, in the main branch and the upcoming PR #19720, the stages of consuming the snapshot and consuming the log store share similar logic: both consume a stream of rows with ops and convert that stream into stream chunks. In this PR, we extract the common logic so that the two stages share the same code.
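The shape of the shared logic can be sketched as a simplified, synchronous analogue (all names here, including `Op`, `Row`, and `build_chunks`, are illustrative stand-ins, not the actual RisingWave types such as `DataChunkBuilder`):

```rust
// Illustrative sketch only: consume a stream of (op, row) pairs and pack
// them into chunks, which is the logic shared by the snapshot-consuming and
// log-store-consuming stages.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Insert,
    Delete,
}

type Row = Vec<i64>;

/// Pack (op, row) pairs into chunks of at most `capacity` rows, emitting a
/// final partial chunk for any remainder so nothing is lost at end of input.
fn build_chunks(
    rows: impl IntoIterator<Item = (Op, Row)>,
    capacity: usize,
) -> Vec<Vec<(Op, Row)>> {
    let mut chunks = Vec::new();
    let mut current = Vec::with_capacity(capacity);
    for pair in rows {
        current.push(pair);
        if current.len() == capacity {
            chunks.push(std::mem::take(&mut current));
        }
    }
    // Drain any remaining buffered rows when the input ends.
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```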

Previously, the row streams of multiple vnodes were combined inside the stream returned by the StorageTable method. In this PR, we instead combine the streams of the vnodes outside StorageTable with FuturesUnordered, so that we can access each vnode stream and track the progress of each vnode, which will be useful in #19720.
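The benefit of combining the vnode streams outside StorageTable can be illustrated with a minimal synchronous sketch (`VnodeMerger` and its round-robin polling are hypothetical stand-ins; FuturesUnordered would instead poll whichever vnode future completes first, but the key point is the same: each vnode stream stays individually accessible, so per-vnode progress can be observed):

```rust
use std::collections::HashMap;

// Hypothetical stand-in: merge per-vnode row iterators while keeping each
// iterator, and therefore its progress, individually accessible.
struct VnodeMerger<I: Iterator<Item = i64>> {
    streams: Vec<(u16, I)>,        // (vnode id, its row iterator)
    progress: HashMap<u16, usize>, // rows consumed per vnode so far
    next_idx: usize,
}

impl<I: Iterator<Item = i64>> VnodeMerger<I> {
    fn new(streams: Vec<(u16, I)>) -> Self {
        let progress = streams.iter().map(|(v, _)| (*v, 0)).collect();
        Self { streams, progress, next_idx: 0 }
    }

    /// Pull the next row from whichever vnode stream has one (round-robin
    /// here), recording per-vnode progress as a side effect.
    fn next_row(&mut self) -> Option<(u16, i64)> {
        if self.streams.is_empty() {
            return None;
        }
        for _ in 0..self.streams.len() {
            let idx = self.next_idx % self.streams.len();
            self.next_idx += 1;
            let (vnode, iter) = &mut self.streams[idx];
            if let Some(row) = iter.next() {
                *self.progress.get_mut(vnode).unwrap() += 1;
                return Some((*vnode, row));
            }
        }
        None
    }
}
```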

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

Base automatically changed from yiming/lazily-remove-uploader-instance-data to main December 25, 2024 09:15
@wenym1 wenym1 force-pushed the yiming/snapshot-backfill-vnode-stream branch from 2d2947f to 579b264 Compare December 25, 2024 09:49
@wenym1 wenym1 requested review from hzxa21 and kwannoel December 26, 2024 07:24
self.stream.poll_next_unpin(cx)
}
}

Contributor
Can we adapt the docs from the PR and add them here? To my understanding the reason we need a custom stream, rather than rely on a try_stream macro, is because we need to encapsulate some state inside the stream as well.

It is possible to do it with try_stream, but the internal state of the sink will be exposed.

Contributor Author

It's hard to use the try_stream macro here, because the vnode streams would be owned by the stream generated by try_stream, and then we couldn't access the state of the vnode streams.
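The ownership issue can be shown with a small synchronous analogue (a hand-written `Iterator` standing in for a hand-written `Stream`; `MergedRows` and `remaining_per_vnode` are illustrative names, not the PR's actual types): a hand-written struct keeps the inner vnode streams as named fields, so callers can still inspect their state between polls, whereas a generator macro like try_stream moves them into an opaque anonymous type where that state is unreachable.

```rust
// Hand-written iterator struct: the inner vnode streams stay as named
// fields, so their state remains inspectable from outside.
struct MergedRows {
    vnode_streams: Vec<std::vec::IntoIter<i64>>,
    current: usize,
}

impl MergedRows {
    /// State stays accessible: report how many rows remain per vnode stream.
    fn remaining_per_vnode(&self) -> Vec<usize> {
        self.vnode_streams.iter().map(|s| s.len()).collect()
    }
}

impl Iterator for MergedRows {
    type Item = i64;
    fn next(&mut self) -> Option<i64> {
        while self.current < self.vnode_streams.len() {
            if let Some(row) = self.vnode_streams[self.current].next() {
                return Some(row);
            }
            self.current += 1;
        }
        None
    }
}
```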

Contributor

Okay let's add some docs to provide this justification.

@wenym1 wenym1 requested a review from kwannoel December 27, 2024 07:26
match ready!(this.poll_next_row(cx)) {
Ok(Some(((op, row), second))) => {
let may_chunk = if let Some((second_op, second_row)) = second {
if this.data_chunk_builder.can_append(2) {
Contributor

Suggested change
if this.data_chunk_builder.can_append(2) {
if this.data_chunk_builder.can_append_update

Two rows always correspond to an update, so this would make the code more readable.
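The rename being suggested can be sketched with a hypothetical mini-builder (not the real `DataChunkBuilder`): a pair of rows is only ever appended for an update, i.e. a delete of the old version plus an insert of the new one, so a dedicated `can_append_update` name carries that intent.

```rust
// Hypothetical mini-builder illustrating the suggested rename.
struct ChunkBuilder {
    len: usize,
    capacity: usize,
}

impl ChunkBuilder {
    fn can_append(&self, n: usize) -> bool {
        self.len + n <= self.capacity
    }

    /// Same check as `can_append(2)`, but named for the only case where two
    /// rows are appended at once: an update (delete + insert pair).
    fn can_append_update(&self) -> bool {
        self.can_append(2)
    }
}
```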

}
} else {
this.ops.push(op);
this.data_chunk_builder.append_one_row(row)
Contributor
@kwannoel kwannoel Dec 27, 2024

How do we ensure that the stream always drains any remaining rows in data_chunk_builder?

Contributor

Oh, because when we hit Ok(None) below, we always drain it.

Will be helpful to add this as a comment.
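The drain-on-end behavior being discussed amounts to the following (a hypothetical helper, not the PR's actual code: when the underlying row stream ends, any rows still buffered in the chunk builder are flushed as a final partial chunk before the merged stream terminates):

```rust
/// Called when the underlying row stream returns Ok(None): flush any rows
/// still buffered in the (simplified) chunk builder, or end immediately.
fn on_rows_finished(builder: &mut Vec<i64>) -> Option<Vec<i64>> {
    if builder.is_empty() {
        None // nothing buffered; the stream ends right away
    } else {
        Some(std::mem::take(builder)) // emit the final partial chunk
    }
}
```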

Contributor
@kwannoel kwannoel left a comment

LGTM.

@wenym1 wenym1 force-pushed the yiming/snapshot-backfill-vnode-stream branch from 6e0ccbc to de3fef3 Compare December 30, 2024 06:32
@graphite-app graphite-app bot requested a review from a team December 30, 2024 06:50
@wenym1 wenym1 added this pull request to the merge queue Dec 30, 2024
@wenym1 wenym1 removed this pull request from the merge queue due to a manual request Dec 30, 2024
@wenym1 wenym1 force-pushed the yiming/snapshot-backfill-vnode-stream branch from de3fef3 to 627c55a Compare December 30, 2024 09:30
@wenym1 wenym1 added this pull request to the merge queue Dec 30, 2024
Merged via the queue into main with commit 705cdf1 Dec 30, 2024
30 of 31 checks passed
@wenym1 wenym1 deleted the yiming/snapshot-backfill-vnode-stream branch December 30, 2024 10:53