feat(bloom-filter): add memory control for creator #5185
base: main
Signed-off-by: Zhenchi <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5185 +/- ##
==========================================
- Coverage 84.05% 83.84% -0.22%
==========================================
Files 1175 1182 +7
Lines 218970 219558 +588
==========================================
+ Hits 184065 184084 +19
- Misses 34905 35474 +569
Copilot reviewed 6 out of 13 changed files in this pull request and generated no comments.
Files not reviewed (7)
- src/index/src/inverted_index/create/sort.rs: Evaluated as low risk
- src/mito2/src/sst/index/inverted_index/creator/temp_provider.rs: Evaluated as low risk
- src/index/src/lib.rs: Evaluated as low risk
- src/index/src/bloom_filter/creator.rs: Evaluated as low risk
- src/index/src/inverted_index/create/sort/external_sort.rs: Evaluated as low risk
- src/index/src/inverted_index/error.rs: Evaluated as low risk
- src/index/src/bloom_filter/error.rs: Evaluated as low risk
a few questions on bloom filter perf:
creator
    .push_row_elems(vec![b"c".to_vec(), b"d".to_vec()])
    .await
    .unwrap();
// Finalize the first segment
assert!(creator.cur_seg_distinct_elems_mem_usage == 0);
Suggested change:
- assert!(creator.cur_seg_distinct_elems_mem_usage == 0);
+ assert_eq!(creator.cur_seg_distinct_elems_mem_usage, 0);
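The suggestion mostly affects failure output: on a mismatch, `assert_eq!` reports both operands, while `assert!` on a comparison only reports the failed expression. A minimal sketch, where `usage` and the `panic_message` helper are hypothetical names used only for illustration:

```rust
use std::panic;

/// Runs `f` and returns its panic message, or `None` if it did not panic.
fn panic_message<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<String> {
    let err = panic::catch_unwind(f).err()?;
    if let Some(s) = err.downcast_ref::<String>() {
        Some(s.clone())
    } else {
        err.downcast_ref::<&str>().map(|s| s.to_string())
    }
}

fn main() {
    // Hypothetical non-zero value, chosen to force both assertions to fail.
    let usage: usize = 16;
    let plain = panic_message(move || assert!(usage == 0)).unwrap();
    let eq = panic_message(move || assert_eq!(usage, 0)).unwrap();
    // `plain` only names the expression; `eq` also includes the actual value.
    println!("{plain}\n---\n{eq}");
}
```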
}

self.bloom_filter_buf.clear();
write_u64_slice(&mut self.bloom_filter_buf, bf.as_slice());
Can the `write_u64_slice` method be moved from `creator.rs` to here?
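For readers without the diff at hand, a helper like this is typically only a few lines, which is why moving it is cheap. The body below is an assumption based on the `u64 LE` layout documented in this PR, not the PR's actual implementation:

```rust
/// Appends each `u64` in `slice` to `buf` in little-endian byte order.
/// Assumed behaviour; the real helper in the PR may differ.
fn write_u64_slice(buf: &mut Vec<u8>, slice: &[u64]) {
    buf.reserve(slice.len() * std::mem::size_of::<u64>());
    for &word in slice {
        buf.extend_from_slice(&word.to_le_bytes());
    }
}

fn main() {
    let mut buf = Vec::new();
    write_u64_slice(&mut buf, &[1, 0x0102]);
    println!("{buf:?}");
}
```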
self.bloom_filter_buf.clear();
write_u64_slice(&mut self.bloom_filter_buf, bf.as_slice());
let fbf = FinalizedBloomFilterSegment {
    bloom_filter_bytes: self.bloom_filter_buf.clone(),
Question: Why do we need the buf (`self.bloom_filter_buf`)? Can we move the bytes to avoid cloning?
Maybe even use `bytes::Bytes` for `bloom_filter_buf` and `Vec<bytes::Bytes>` for `bloom_filter_bytes` to ensure zero-copy?
> Question: Why do we need the buf (`self.bloom_filter_buf`)? Can we move the bytes to avoid cloning?

Good advice, this buf is really not needed.
Thanks for the suggestion. The calculation of the bloom filter is not a performance bottleneck at the moment, so I'd prefer to keep it simple in this regard.
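The move-instead-of-clone idea from this thread can be sketched as follows. `Segment` and `finalize` are simplified, hypothetical stand-ins for the PR's `FinalizedBloomFilterSegment` and its finalization step; only the `bloom_filter_bytes` field name comes from the diff:

```rust
/// Simplified stand-in for the PR's `FinalizedBloomFilterSegment`.
struct Segment {
    bloom_filter_bytes: Vec<u8>,
}

/// Serializes the filter words into a fresh local buffer and moves that
/// buffer into the segment, so no reusable field and no `clone()` is needed.
fn finalize(words: &[u64]) -> Segment {
    let mut bytes = Vec::with_capacity(words.len() * std::mem::size_of::<u64>());
    for word in words {
        bytes.extend_from_slice(&word.to_le_bytes());
    }
    // `bytes` is moved here, not copied.
    Segment { bloom_filter_bytes: bytes }
}

fn main() {
    let segment = finalize(&[1, 2]);
    println!("{} bytes", segment.bloom_filter_bytes.len());
}
```

The trade-off is one allocation per segment instead of a reused buffer plus a copy, which is in line with the author's point that this path is not currently a bottleneck.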
/// # Format
///
/// [ elem count ][ size ][ bloom filter ][ elem count ][ size ][ bloom filter ]...
/// |<- u64 LE ->||<- u64 LE ->||<- bf bytes ->||<- u64 LE ->||<- u64 LE ->||<- bf bytes ->|...
Should a `version` number be added to the file header too, in case a future update changes the file format?
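To make the question concrete, here is a rough sketch of the documented segment layout with a one-byte version prefix added in front. The version byte, `FORMAT_VERSION`, and the `encode`/`decode`/`read_u64` functions are all hypothetical; the PR's format as documented has no header:

```rust
/// Hypothetical format version; not part of the PR's documented layout.
const FORMAT_VERSION: u8 = 1;

/// Encodes segments as: [version][elem count: u64 LE][size: u64 LE][bf bytes]...
fn encode(segments: &[(u64, Vec<u8>)]) -> Vec<u8> {
    let mut out = vec![FORMAT_VERSION];
    for (elem_count, bf) in segments {
        out.extend_from_slice(&elem_count.to_le_bytes());
        out.extend_from_slice(&(bf.len() as u64).to_le_bytes());
        out.extend_from_slice(bf);
    }
    out
}

/// Reads one little-endian `u64` and advances the cursor.
fn read_u64(data: &mut &[u8]) -> Option<u64> {
    if data.len() < 8 {
        return None;
    }
    let (head, rest) = data.split_at(8);
    let mut arr = [0u8; 8];
    arr.copy_from_slice(head);
    *data = rest;
    Some(u64::from_le_bytes(arr))
}

/// Decodes the layout above; returns `None` on an unknown version or truncation.
fn decode(mut data: &[u8]) -> Option<Vec<(u64, Vec<u8>)>> {
    let (&version, rest) = data.split_first()?;
    if version != FORMAT_VERSION {
        return None;
    }
    data = rest;
    let mut segments = Vec::new();
    while !data.is_empty() {
        let elem_count = read_u64(&mut data)?;
        let size = read_u64(&mut data)? as usize;
        if data.len() < size {
            return None;
        }
        let (bf, rest) = data.split_at(size);
        segments.push((elem_count, bf.to_vec()));
        data = rest;
    }
    Some(segments)
}

fn main() {
    let bytes = encode(&[(2, vec![0xAA, 0xBB])]);
    println!("{:?}", decode(&bytes));
}
```

With a version prefix, a reader can reject or branch on files written by a newer layout instead of misparsing them.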
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#5176
What's changed and what's your intention?
- `ExternalTempFileProvider` to adapt to new index intermediates
- `FinalizedBloomFilterStorage` to a new file
- `FinalizedBloomFilterStorage`:
  - `ExternalTempFileProvider`
  - `drain` to return `FinalizedBloomFilterSegment` from both intermediate files and memory as a stream to the upper layer

Checklist