Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): support hummock time travel #17621

Merged
merged 17 commits into from
Jul 11, 2024
Merged

feat(storage): support hummock time travel #17621

merged 17 commits into from
Jul 11, 2024

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Jul 9, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR's LOC is substantial but the concept is straightforward. Tips for reviewers: the major changes comes in the 2 new files:

  • src/storage/hummock_sdk/src/time_travel.rs. It converts between HummockVersion with and without the sizable SstableInfo, as well as HummockVersionDelta.
  • src/meta/src/hummock/manager/time_travel.rs. It reads and writes metadata for time travel.

This PR support point-in-time time travel query for table/MVs.

The basic idea is:

  • Create time travel metadata/data. Along with finishing a checkpoint barrier, metadata for time travel is persisted to sql meta store
    1. The epoch->version_id mapping entry.
    2. The Hummock version delta generated by this checkpoint barrier, where SstableInfo has been replaced with thinSstableId.
    3. Optionally a snapshot of the current Hummock version, where SstableInfo has been replaced with thin SstableId, i.e. ref to SstableInfo.
    4. The SstableInfo used by delta and version from step 2 and 3. It's used to recover delta and version from step 2 and 3 afterward.
    5. When deleteing SST files from object store, SSTs referenced in step 4 are filtered out, so that these SST files remain available for time travel query.
    6. As an optimization, only compaction group 3 (for table/MV) or its successors are retained in step 2 and 3. This helps to reduce the metadata size and data size added by SSTs.
  • Make time travel query.
    1. Specify a system time for query. The SQL syntax is SELECT ... FROM ... FOR SYSTEM_TIME AS OF unix_timestamp.
    2. HummockStorage in compute node converts this system time to an epoch and fetch an eligible time travel Hummock version from meta node. This RPC to the meta node can potentially be optimized out in the future.
    3. HummockStorage get/iter over this time travel Hummock version.

Not in this PR:

  • e2e test. As it's not straightforward to write a e2e test with the AS OF unix_timestamp syntax. Will add test after support AS OF now - interval in other PR.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@zwang28 zwang28 added the user-facing-changes Contains changes that are visible to users label Jul 9, 2024
@graphite-app graphite-app bot requested a review from a team July 9, 2024 05:07
@zwang28 zwang28 removed the request for review from a team July 9, 2024 05:07
@zwang28 zwang28 requested a review from a team July 9, 2024 05:07
@zwang28
Copy link
Contributor Author

zwang28 commented Jul 9, 2024

@chenzl25 Would you please help to review the lookupjoin part, as well as the as of syntax?

@zwang28 zwang28 requested a review from chenzl25 July 9, 2024 05:08
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 5195 files.

Valid Invalid Ignored Fixed
2219 1 2975 0
Click to see the invalid file list
  • src/meta/src/hummock/manager/time_travel.rs

src/meta/src/hummock/manager/time_travel.rs Show resolved Hide resolved
@zwang28 zwang28 requested a review from hzxa21 July 9, 2024 05:10
@zwang28 zwang28 requested review from Li0k and wenym1 July 9, 2024 05:23
@chenzl25
Copy link
Contributor

chenzl25 commented Jul 9, 2024

@chenzl25 Would you please help to review the lookupjoin part, as well as the as of syntax?

Sure!

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider index selection during time travel. There are 2 places we need to pass the time travel information to the index.

  1. pub fn to_index_scan_if_index_covered(&self, index: &Rc<IndexCatalog>) -> Option<LogicalScan> {
  2. https://github.com/risingwavelabs/risingwave/blob/bd9dcfc064df4642e9c0d674b3bd64f7aeb4d3f7/src/frontend/src/optimizer/rule/index_selection_rule.rs

@graphite-app graphite-app bot requested a review from a team July 9, 2024 07:09
@zwang28 zwang28 requested a review from chenzl25 July 9, 2024 07:42
@zwang28
Copy link
Contributor Author

zwang28 commented Jul 9, 2024

We need to consider index selection during time travel. There are 2 places we need to pass the time travel information to the index.

to_index_scan_if_index_covered already handle as_of via TableScan::to_index_scan

@zwang28
Copy link
Contributor Author

zwang28 commented Jul 10, 2024

Other changes includes in another coming PR:

  • e2e tests
  • SQL syntax: FOR SYSTEM_TIME AS OF CURRENT - [INTERVAL]
  • a simple time travel version cache in compute node in order to speed up time travel query

Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some early comments first. Will continue afterwards.

src/config/docs.md Outdated Show resolved Hide resolved
Comment on lines +359 to +365
.on_conflict(
OnConflict::column(hummock_sstable_info::Column::SstId)
.do_nothing()
.to_owned(),
)
.do_nothing()
.exec(txn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this introduce excessive load on meta store given that there can be many SSTs in a version even though many of them already exist in the hummock_sstable_info table? I am thinking whether we should maintain an in-memory set for sst ids in hummock_sstable_info to avoid unneccesary traffic to meta store. Same trick applies to other time travel related meta store tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an in-memory set for sst ids in hummock_sstable_info

+1 because it's in the critical path commit_epoch

other time travel related meta store tables

It seems unnecessary to me because there won't be many conflict insert for them. Besides, for the read path the complete metadata is required rather than the index (*_id), which would be very large in memory.

Copy link
Contributor Author

@zwang28 zwang28 Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last_time_travel_snapshot_sst_ids. It is much simpler than maintaining a full set of sst ids, at the cost of less accuracy, i.e. sometimes there'll still be attempts to unnecessarily write the same SSTs into meta store. But it's good enough and avoid introducing locking in truncate_time_travel_metadata.

src/meta/src/hummock/manager/time_travel.rs Show resolved Hide resolved
src/meta/src/hummock/manager/checkpoint.rs Show resolved Hide resolved
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM because most logics are non-intrusive to the existing logics,

src/meta/src/hummock/manager/time_travel.rs Outdated Show resolved Hide resolved
src/meta/src/hummock/manager/time_travel.rs Show resolved Hide resolved
src/meta/src/hummock/manager/time_travel.rs Outdated Show resolved Hide resolved
/// Clone from an `SstableInfo`, but only set the `sst_id` for the target, leaving other fields as default.
/// The goal is to reduce the size of pb object generated afterward.
fn stripped_sstable_info(origin: &SstableInfo) -> SstableInfo {
SstableInfo {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a note instead of a comment. At the first glance, I think we can just have a new struct with sstable_info to be an Option instead of still writing 12 empty fields, which may occupy the space unneccessarily. I then realized that this is because to do so we need a new struct for each nested pb messages in HummockVersion including Levels, Level, OverlappngLevel and ... Eventually I think we should get rid of using PBs directly in meta and switch to our own struct (probably with Arc on sstable info as well to save memory) but yeah we don't need to do that in this PR.

@zwang28 zwang28 requested review from hzxa21 and Li0k July 11, 2024 11:59
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zwang28 zwang28 enabled auto-merge July 11, 2024 13:44
@zwang28 zwang28 added this pull request to the merge queue Jul 11, 2024
Merged via the queue into main with commit 5230e97 Jul 11, 2024
32 of 33 checks passed
@zwang28 zwang28 deleted the wangzheng/time_travel branch July 11, 2024 14:07
zwang28 added a commit that referenced this pull request Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants