-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ensure each storage read only involves one vnode #15289
Conversation
e832693
to
2684da6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM!
// Only 1 staging sst is provided | ||
assert_eq!(1, hummock_read_snapshot.1.len()); | ||
} | ||
// #[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to comment out the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a test only for read_filter_for_batch
let read_version_vec = { | ||
let read_guard = self.read_version_mapping.read(); | ||
read_guard | ||
.get(&table_id) | ||
.map(|v| { | ||
v.values() | ||
.filter(|v| !v.read_arc().is_replicated()) | ||
.filter(|v| { | ||
let read_version = v.read(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make it fail to read the replicated data by batch read. May add a warning or panic here to avoid misuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Added a warning log when there are no non-replicated read version but replicated read version found.
Note that it is also not possible to read replicated data by batch read prior to this PR.
@@ -416,6 +429,14 @@ impl StateStoreRead for HummockStorage { | |||
epoch: u64, | |||
read_options: ReadOptions, | |||
) -> impl Future<Output = StorageResult<Self::IterStream>> + '_ { | |||
let (l_vnode_inclusive, r_vnode_exclusive) = vnode_range(&key_range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making it unified, moving assert logic to functions vnode
and vnode_part
.
I'm not sure whether we need to strictly enforce checks on every call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making it unified, moving assert logic to functions vnode and vnode_part.
fn vnode(range)
also have the same assertion. Given that the cost of the assertion is small, I would rather make the assertion more explicit by putting in at the beginning of all the iter
calls.
I'm not sure whether we need to strictly enforce checks on every call?
This is the main motivation of this PR to avoid misuage of storage iter.
@@ -260,6 +259,14 @@ impl LocalStateStore for LocalHummockStorage { | |||
key_range: TableKeyRange, | |||
read_options: ReadOptions, | |||
) -> StorageResult<Self::IterStream<'_>> { | |||
let (l_vnode_inclusive, r_vnode_exclusive) = vnode_range(&key_range); | |||
assert_eq!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
// Only 1 staging sst is provided | ||
assert_eq!(1, hummock_read_snapshot.1.len()); | ||
} | ||
// #[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a test only for read_filter_for_batch
@@ -102,79 +102,54 @@ impl TableWatermarksIndex { | |||
self.read_watermark(vnode, HummockEpoch::MAX) | |||
} | |||
|
|||
pub fn range_watermarks( | |||
pub fn rewrite_range_with_table_watermark( | |||
&self, | |||
epoch: HummockEpoch, | |||
key_range: &mut TableKeyRange, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just question: why not return a new key_range after rewrite instead of "mut ref "?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we also do in-place update prior to this PR. I am okay both ways so I can also change that if you think it is important.
After this PR, we remove the SkipWaterIterator from the read path. How about move it to compactor/iterator mod |
compactor/iterator.rs is already big enough with ~800 LoC. I would rather keep SkipWatermarkIterator in a separate file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the efforts
94679aa
to
6d6beb2
Compare
c91da17
to
72e6c9e
Compare
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Given that we already ensure storage iter won't read across vnodes in executors, we add strict assertion check in
HummockStorage::iter
andLocalHummockStorage::iter
to ensure the provided key range is within one vnode to make this assumption explicit. There are several benefits:version.rs
.table_watermark.rs
.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.