-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(common): Use safe interfaces for DataChunkBuilder
#12014
Conversation
Codecov Report
@@ Coverage Diff @@
## main #12014 +/- ##
==========================================
- Coverage 69.85% 69.85% -0.01%
==========================================
Files 1391 1391
Lines 233343 233345 +2
==========================================
- Hits 162996 162994 -2
- Misses 70347 70351 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 3 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
When reviewing this PR, I found several issues with the new |
Addressed the review comments in this PR, thanks for the feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask how to construct such a case that can hit this bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
let chunk = builder.consume_all(); | ||
Ok(chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a little bit weird to me. I assume this method will be called in a loop. If we always consume the chunk builder here, then what's the difference compared to initializing a ChunkBuilder
inside this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it was also unintuitive to me. The reference scenario is here: #11007 (comment).
But idea is this. Suppose snapshot read proceeds midway.
It can get stuck here while waiting for next row, perhaps object store latency:
Some(row) => {
while let Some(row) = stream.next().await.transpose()?
...
Now there are some rows in the buffer before this, but they are not yet flushed.
If some upstream updates come, we will process those, and if within the upstream updates, there is a barrier, we will drop the snapshot. This means we won't ever reach the consume_all
part of this function.
But critically even though the snapshot is dropped, there are still rows inside the buffer.
We don't want to just drop these rows, losing the progress we made.
So we should still flush these rows out, and that is done outside of the snapshot_read function, on barrier.
And so we need access to ChunkBuilder
outside of this function, so we can flush it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes much sense. In other words, this stream will likely be canceled in its usage cases.
DataChunkBuilder
No need to cherry-pick as it's not actually "fixing" anything. More of improving the safety of code as per @BugenZhao 's comments. See discussion: #12014 (comment) |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Sqlsmith found a bug in arrangement backfill where the same row is read twice in snapshot. That's because initial buffer is not flushed.
To be defensive, the same is done for no_shuffle_backfill, although it's not strictly necessary.
Additionally, we also refactor some interfaces for
DataChunkBuilder
, making sure they are safe.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.