Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stream): compact stream chunk by dropping useless rows in sink #11070
feat(stream): compact stream chunk by dropping useless rows in sink #11070
Changes from 44 commits
ad24c38
132fb6b
8b3ec22
35e12a0
54a5ad5
2bdfe48
573524b
51bb6f3
bd68db6
d34154b
7e7cfc9
2346f49
4b978f0
935eb68
5f7c7e2
51f2c8d
e46c84b
7c3a330
890f56f
fa27338
d87c0af
4a1ff05
ce12bef
00dbcd8
37242eb
9df8fb7
d6c4847
2311ca0
3e84a5e
3409dbd
942fc9f
f31eb81
9877599
873b134
54744f0
e77d89b
3783d4b
c15c3ea
b571b69
b432b29
0ae211a
06f09a4
b457332
b915d00
64c0fa1
4e38d52
94f43b5
4bd6956
57f3317
05ebe68
3fcf00c
7cc30ac
27d3bc5
77d2cad
7ef5945
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL @BugenZhao thanks 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent about 5 minutes to understand the logic here 🤣 It takes a very clever way to in-place update the visibility array instead of rebuilding new chunks.
However, I think the motivation behind this design is because the compaction happens on
Project
and it cost dramatic performance loss, while due to my last comment, I oppose it. Instead, let's focus on the original user requirement and do it on Sink only.If so, considering the disordering problem of Sink, we can solve both problems with a single approach (as I have proposed in Slack)
-
,+
,U- U+
For example:
In this way, problem can be solved by building one hash table. Because this only happen on
Sink
, rebuilding Chunk is acceptable to me.Besides, @st1page is considering that this clever implementation can be reused. But I didn't see any requirements for reusing it 🥹
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least currently, we have two different use cases here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes sense, but
U-/U+
rows. There are still lots of operators can benefit more or less from this. Reordering all deletes before inserts will completely breakU-/U+
.Alternatively, we can partially achieve this while preserving event orders - the solution is trivial: just hide the update with same before & after values.
For example, set
vis = false
for these:but not these:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this PR does not reorder any events 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because it's now only used in
SinkExecutor
, notProject
. I said so because you mentioned it can be reused (but not yet now).