Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(common): Use safe interfaces for DataChunkBuilder #12014

Merged
merged 7 commits into from
Sep 4, 2023

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Sep 1, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Sqlsmith found a bug in arrangement backfill where the same row is read twice in snapshot. That's because initial buffer is not flushed.

To be defensive, the same is done for no_shuffle_backfill, although it's not strictly necessary.

Additionally, we also refactor some interfaces for DataChunkBuilder, making sure they are safe.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@github-actions github-actions bot added the type/fix Bug fix label Sep 1, 2023
@kwannoel kwannoel requested a review from chenzl25 September 1, 2023 00:31
@kwannoel kwannoel requested a review from BugenZhao September 1, 2023 00:39
@kwannoel kwannoel enabled auto-merge September 1, 2023 00:39
@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Merging #12014 (735c557) into main (fdbba6e) will decrease coverage by 0.01%.
Report is 10 commits behind head on main.
The diff coverage is 43.47%.

@@            Coverage Diff             @@
##             main   #12014      +/-   ##
==========================================
- Coverage   69.85%   69.85%   -0.01%     
==========================================
  Files        1391     1391              
  Lines      233343   233345       +2     
==========================================
- Hits       162996   162994       -2     
- Misses      70347    70351       +4     
Flag Coverage Δ
rust 69.85% <43.47%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...ream/src/executor/backfill/arrangement_backfill.rs 0.00% <0.00%> (ø)
...tream/src/executor/backfill/no_shuffle_backfill.rs 0.00% <ø> (ø)
...m/src/executor/backfill/upstream_table/snapshot.rs 87.50% <ø> (ø)
src/common/src/util/chunk_coalesce.rs 94.61% <18.18%> (-2.62%) ⬇️
src/storage/src/table/mod.rs 82.60% <100.00%> (-0.61%) ⬇️
src/stream/src/executor/backfill/utils.rs 9.52% <100.00%> (-0.72%) ⬇️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@BugenZhao
Copy link
Member

When reviewing this PR, I found several issues with the new DataChunkBuilder design. Would you please first take a look? Sorry for being late. 🥺

#11117 (review)

@kwannoel
Copy link
Contributor Author

kwannoel commented Sep 1, 2023

When reviewing this PR, I found several issues with the new DataChunkBuilder design. Would you please first take a look? Sorry for being late. 🥺

#11117 (review)

Addressed the review comments in this PR, thanks for the feedback.

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask how to construct such a case that can hit this bug?

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kwannoel kwannoel added this pull request to the merge queue Sep 1, 2023
Comment on lines +147 to +148
let chunk = builder.consume_all();
Ok(chunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little bit weird to me. I assume this method will be called in a loop. If we always consume the chunk builder here, then what's the difference compared to initializing a ChunkBuilder inside this function?

Copy link
Contributor Author

@kwannoel kwannoel Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it was also unintuitive to me. The reference scenario is here: #11007 (comment).

But idea is this. Suppose snapshot read proceeds midway.

It can get stuck here while waiting for next row, perhaps object store latency:

            Some(row) => {
    while let Some(row) = stream.next().await.transpose()? 
...

Now there are some rows in the buffer before this, but they are not yet flushed.

If some upstream updates come, we will process those, and if within the upstream updates, there is a barrier, we will drop the snapshot. This means we won't ever reach the consume_all part of this function.

But critically even though the snapshot is dropped, there are still rows inside the buffer.

We don't want to just drop these rows, losing the progress we made.

So we should still flush these rows out, and that is done outside of the snapshot_read function, on barrier.

And so we need access to ChunkBuilder outside of this function, so we can flush it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this makes much sense. In other words, this stream will likely be canceled in its usage cases.

@kwannoel kwannoel removed this pull request from the merge queue due to a manual request Sep 1, 2023
@kwannoel kwannoel added this pull request to the merge queue Sep 1, 2023
@kwannoel kwannoel requested a review from BugenZhao September 1, 2023 09:15
@kwannoel kwannoel removed this pull request from the merge queue due to a manual request Sep 1, 2023
@kwannoel kwannoel changed the title fix(stream): flush initial snapshot_read buffer refactor(common): Use safe interfaces for DataChunkBuilder Sep 4, 2023
@github-actions github-actions bot added type/refactor and removed type/fix Bug fix labels Sep 4, 2023
@kwannoel
Copy link
Contributor Author

kwannoel commented Sep 4, 2023

No need to cherry-pick as it's not actually "fixing" anything. More of improving the safety of code as per @BugenZhao 's comments.

See discussion: #12014 (comment)

@kwannoel kwannoel added this pull request to the merge queue Sep 4, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 4, 2023
@kwannoel kwannoel added this pull request to the merge queue Sep 4, 2023
Merged via the queue into main with commit 033924d Sep 4, 2023
@kwannoel kwannoel deleted the kwannoel/flush-initial-backfill branch September 4, 2023 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants