-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(storage): cache recent versions and unify batch query and time travel #18477
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -335,17 +329,14 @@ impl HummockStorage { | |||
.inspect_err(|e| tracing::error!("{}", e.to_report_string())) | |||
.map_err(|e| HummockError::meta_error(e.to_report_string()))?; | |||
let version = HummockVersion::from_rpc_protobuf(&pb_version); | |||
validate_safe_epoch(&version, table_id, epoch)?; | |||
let (tx, _rx) = unbounded_channel(); | |||
Ok(PinnedVersion::new(version, tx)) | |||
}; | |||
let version = self | |||
.simple_time_travel_version_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May improve the cache implementation in the future.
Notet that the |
Tried, but in CI we still use the etcd meta store backend, and the non-zero default will fail to start the meta node. Changing the default value may make it incompatible for system upgrading from older version. I think we can still disable it by default, and change to enable it by default after we deprecate the etcd. |
self.get_safe_version_from_recent(table_id, epoch) | ||
} | ||
} else { | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that for dropped table we always go through time travel even if it exists in one of the recent version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is unexpected to query a dropped table, and therefore this is rarely hit.
The reason for the current logic is to ensure that when querying the list of recent versions, the committed epoch of the table id is sorted so that we can easily use binary search. When the table id does not exist in a version, the table is either not created yet or dropped. When comparing a version with a specified query epoch, if the table is not created, the version is less than the query epoch, but if the table has been dropped, the version is greater than the query epoch, which means we cannot know the order between a query epoch and a version without the table_id. Therefore, if we can ensure that when query the list of recent versions the table must exist in the latest version, the list of recent versions is sorted.
Is the correct? Based on the PR, I think we will read from recent version or time travel version if |
The PR description was stale. I just updated it to not using the safe epoch. If epoch < mce, it will check the recent version. |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Motivation as described in #18214
Cache the most recent 60 hummock versions which have bumped the committed version of any table.
Unify the logic of batch query and time travel query. The logic of getting a hummock version for query will be
epoch > committed_epoch
, merge staging data with latest versionChecklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.