-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(topn): compact the (Group)TopN output to avoid amplification #19451
Conversation
TopNStaging
to compact the topn results
Can we trigger a nexmark benchmark on this? |
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
A bug is found by e2e test. Fixing... |
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
/// Used to manage staging changes for one `TopNCache` (for one group). | ||
/// It should be maintained when an entry is inserted or deleted from the `middle` cache. | ||
#[derive(Debug, Default)] | ||
pub struct TopNStaging { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reuse StreamChunkCompactor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can achieve it with ease, I want to avoid deserialize those rows and construct StreamChunk
with wastes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid deserialize those rows and construct StreamChunk with wastes
This sounds reasonable. Does RowOpMap
work?
with ease
Haven't read the code, but you also made a mistake. So I'm thinking it's not trivial 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty straightforward actually. I made a mistake because I'm stupid😄
benchmarking... |
Done. Results are added to PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NP
debug_assert!(!self.to_update.contains_key(&cache_key)); | ||
debug_assert!(!self.to_insert.contains_key(&cache_key)); | ||
self.to_insert.insert(cache_key, row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we enable assertion in production as well, just like mem-table conflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I realized that we shouldn't do any assertion in this structure. We should leave the responsibility to TopNCache
, and the latter, instead of simply assert
, will do consistency_error
/consistency_panic
.
if self.to_insert.remove(&cache_key).is_some() { | ||
// do nothing more | ||
} else if let Some((old_row, _)) = self.to_update.remove(&cache_key) { | ||
self.to_delete.insert(cache_key, old_row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we assert_eq
about the new row?
Signed-off-by: Richard Chien <[email protected]>
Really so niubi? |
I think this is because of reduced redundant deserialization. |
…9451) Signed-off-by: Richard Chien <[email protected]>
…9451) Signed-off-by: Richard Chien <[email protected]>
…9451) (#19514) Signed-off-by: Richard Chien <[email protected]> Co-authored-by: Richard Chien <[email protected]>
…9451) Signed-off-by: Richard Chien <[email protected]>
…9451) (#19512) Signed-off-by: Richard Chien <[email protected]> Co-authored-by: Richard Chien <[email protected]>
…9451) Signed-off-by: Richard Chien <[email protected]>
…9451) Signed-off-by: Richard Chien <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Previously we produced intermediate result containing redundant rows in GroupTopN, this PR uses a newly introduced
TopNStaging
structure to compact the result.For instance, with
offset = 0, limit = 1, group by col 0, 1, order by col 2
, this input chunkwill produce this output
but it actually should produce only one row:
Benchmark Results
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.