feat(batch): query historical epoch data (#6840)
This PR supports querying historical epoch data, which is enabled by meta backup underneath.

Typical use case:

1. Create a meta backup via `risectl meta backup-meta` from time to time.
2. To query historical data, first find a valid epoch. A valid epoch is one that lies between the `safe_epoch` and `max_committed_epoch` of at least one meta snapshot.
   ```
   dev=> select * from rw_catalog.rw_meta_snapshot;
    meta_snapshot_id | hummock_version_id |    safe_epoch    |      safe_epoch_ts      | max_committed_epoch | max_committed_epoch_ts
   ------------------+--------------------+------------------+-------------------------+---------------------+-------------------------
                   1 |                  5 |                0 |                         |    3551844974919680 | 2022-12-19 06:40:53.255
                   2 |                  9 | 3551845695750144 | 2022-12-19 06:41:04.254 |    3551847139508224 | 2022-12-19 06:41:26.284
   (2 rows)
   ```
3. Suppose we choose epoch 3551845695750200, which is covered by meta backup 2 above.
4. `SET QUERY_EPOCH=3551845695750200`. All `SELECT` statements in this session then return data as of this epoch.
5. `SET QUERY_EPOCH=0` to disable historical queries.

Implementation:

- frontend: support querying available meta backup info, in order to find a best-fit epoch to use.
- frontend: support specifying the `query_epoch` session variable.
- storage: support querying data using an arbitrary historical hummock version taken from a meta backup.

**Querying tables whose schema has changed, or tables that have been dropped, is not supported in this PR**:

- Querying historical data from such tables is doable, because a meta backup contains the hummock version along with the table catalog. However, since the schema change feature is not available yet, supporting such queries is postponed as well.

An example using `QUERY_EPOCH` can be found in `test_query_backup.sh`.

Approved-By: hzxa21
Approved-By: Li0k
Co-Authored-By: zwang28 <[email protected]>
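The validity rule from step 2 of the use case can be sketched in a few lines of Python. It is applied here to the two snapshots from the `rw_meta_snapshot` output. This is a minimal illustration under stated assumptions, not the frontend's actual best-fit logic:

- It assumes both bounds are inclusive.
- It assumes RisingWave's epoch encoding, where the bits above the low 16 bits hold milliseconds since 2021-04-01 00:00:00 UTC. This assumption reproduces every `*_ts` value in the table above.
- `MetaSnapshot`, `covering_snapshots`, `epoch_to_datetime` and `datetime_to_epoch` are hypothetical helpers, not RisingWave APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Assumption: epochs store physical time (milliseconds since
# 2021-04-01 00:00:00 UTC) above a 16-bit sequence part.
EPOCH_BASE = datetime(2021, 4, 1, tzinfo=timezone.utc)
PHYSICAL_SHIFT_BITS = 16


@dataclass(frozen=True)
class MetaSnapshot:
    """Hypothetical mirror of one row of rw_catalog.rw_meta_snapshot."""
    meta_snapshot_id: int
    safe_epoch: int
    max_committed_epoch: int

    def covers(self, epoch: int) -> bool:
        # Assumption: both bounds are inclusive.
        return self.safe_epoch <= epoch <= self.max_committed_epoch


def covering_snapshots(snapshots: Iterable[MetaSnapshot], epoch: int) -> List[int]:
    """Return the ids of all snapshots whose epoch range contains `epoch`."""
    return [s.meta_snapshot_id for s in snapshots if s.covers(epoch)]


def epoch_to_datetime(epoch: int) -> datetime:
    """Decode the physical (wall-clock) part of an epoch; the sequence bits are dropped."""
    millis = epoch >> PHYSICAL_SHIFT_BITS
    return EPOCH_BASE + timedelta(milliseconds=millis)


def datetime_to_epoch(ts: datetime) -> int:
    """Encode a timezone-aware datetime as an epoch whose sequence part is zero."""
    millis = (ts - EPOCH_BASE) // timedelta(milliseconds=1)
    return millis << PHYSICAL_SHIFT_BITS


# The two snapshots listed by rw_catalog.rw_meta_snapshot above.
SNAPSHOTS = [
    MetaSnapshot(1, 0, 3551844974919680),
    MetaSnapshot(2, 3551845695750144, 3551847139508224),
]

if __name__ == "__main__":
    candidate = 3551845695750200
    print(covering_snapshots(SNAPSHOTS, candidate))  # [2]
    print(epoch_to_datetime(candidate).isoformat())  # 2022-12-19T06:41:04.254000+00:00
```

Under these assumptions, `datetime_to_epoch` turns a wall-clock time of interest into a candidate value for `SET QUERY_EPOCH`. `covering_snapshots` then confirms that at least one meta backup covers it before the query is issued. Any epoch for which `covering_snapshots` returns an empty list is not a valid choice.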