feat: support vacuum inverted index #17291
@b41sh mentioned that Index and Table may not have a one-to-one mapping, as an aggregate index could be built from multiple tables. When I suggested adding table_id to the index key, I wasn’t aware of this. Should we reconsider the design of the marked-deleted key in light of this information?
Reviewed 13 of 21 files at r1, all commit messages.
Reviewable status: 13 of 21 files reviewed, 2 unresolved discussions (waiting on @SkyFan2002)
src/meta/app/src/schema/table.rs
line 889 at r1 (raw file):
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct GetMarkedDeletedTableIndexesReply {
    pub table_indexes: HashMap<u64, Vec<(String, String, MarkedDeletedIndexMeta)>>,
What about introducing two type aliases, IndexName = String and IndexVersion = String, to improve readability?
Or make it stricter by defining an explicit type for the index version: struct IndexVersion(String);
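To make the difference between the two options concrete, here is a minimal sketch. It assumes nothing about the real Databend types: `MarkedDeletedIndexMeta` is a placeholder unit struct, and the `new`/`as_str` helpers are hypothetical.

```rust
use std::collections::HashMap;

/// Plain alias: documents intent, but any `String` is still accepted.
pub type IndexName = String;

/// Newtype: a distinct type, so an index name cannot be passed
/// where an index version is expected.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct IndexVersion(String);

impl IndexVersion {
    /// Hypothetical constructor, added for illustration only.
    pub fn new(v: impl Into<String>) -> Self {
        IndexVersion(v.into())
    }

    /// Borrows the underlying version string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Placeholder for the real `MarkedDeletedIndexMeta`; its fields are not modeled here.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MarkedDeletedIndexMeta;

/// Same shape as the reply quoted above, with the tuple fields named by type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct GetMarkedDeletedTableIndexesReply {
    pub table_indexes: HashMap<u64, Vec<(IndexName, IndexVersion, MarkedDeletedIndexMeta)>>,
}
```

With the alias alone, swapping the name and version in a tuple still compiles; with the newtype, the compiler rejects it.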
src/meta/api/src/schema_api_impl.rs
line 2655 at r1 (raw file):
#[logcall::logcall]
#[fastrace::trace]
async fn get_marked_deleted_table_indexes(
The implementation of this method is actually a list operation, so it should be named list_marked_deleted_table_indexes().
@drmingdrmer Currently, we don't have indexes associated with multiple tables. If we add such indexes in the future, they would be fundamentally different from our current indexes, so I propose that we use a separate key to store them when they are dropped.
Reviewed 5 of 21 files at r1, 10 of 10 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @SkyFan2002)
src/meta/app/src/schema/table.rs
line 893 at r2 (raw file):
pub type IndexName = String;
pub type IndexVersion = String;
Since these aliases already exist, use them throughout the codebase to make the code more self-explanatory and improve readability.
src/meta/api/src/schema_api_impl.rs
line 2679 at r2 (raw file):
        DirName::new_with_level(ident, 3)
    }
};
Suggestion:
let dir = {
let table_id = table_id.unwrap_or_default();
let ident = MarkedDeletedTableIndexIdIdent::new_generic(
tenant,
MarkedDeletedTableIndexId::new(
table_id,
"dummy".to_string(),
"dummy".to_string(),
),
);
DirName::new_with_level(ident, 2)
};
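As a rough illustration of what listing under a directory prefix amounts to, the sketch below scans an ordered in-memory map. It is not the meta-service API: the key layout `__fd_marked_deleted_table_index/<tenant>/<table_id>/<index_name>/<index_version>` is an assumption, `list_prefix` and `table_prefix` are hypothetical helpers, and `DirName` itself is not modeled.

```rust
use std::collections::BTreeMap;

/// Returns every entry whose key starts with `prefix`.
/// Relies on the map's ordering: all matching keys are contiguous.
fn list_prefix<'a>(kv: &'a BTreeMap<String, u64>, prefix: &str) -> Vec<(&'a str, u64)> {
    kv.range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.as_str(), *v))
        .collect()
}

/// Builds the per-table prefix under the ASSUMED key layout.
/// The trailing slash keeps table 1 from matching table 10.
fn table_prefix(tenant: &str, table_id: u64) -> String {
    format!("__fd_marked_deleted_table_index/{tenant}/{table_id}/")
}
```

A shorter prefix (tenant only) lists entries for every table, while the per-table prefix narrows the listing to a single table.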
src/meta/api/src/schema_api_impl.rs
line 3249 at r2 (raw file):
tenant: &Tenant,
table_id: u64,
indexes: &[(String, String)],
Is there a need to remove a collection of indexes in a single transaction? AFAIK, they are independent and can be removed one by one.
Code quote:
async fn remove_marked_deleted_table_indexes(
&self,
tenant: &Tenant,
table_id: u64,
indexes: &[(String, String)],
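The one-by-one alternative raised in this comment could look roughly like the sketch below. It uses an in-memory map as a stand-in for the meta-service; `MetaStore`, `remove_one`, and `remove_each` are hypothetical names, and the stored `u64` value is a placeholder for the real meta.

```rust
use std::collections::HashMap;

type IndexName = String;
type IndexVersion = String;

/// Hypothetical in-memory stand-in for the marked-deleted key space.
#[derive(Default)]
struct MetaStore {
    marked_deleted: HashMap<(u64, IndexName, IndexVersion), u64>,
}

impl MetaStore {
    /// Removes a single marked-deleted entry and reports whether it existed.
    fn remove_one(&mut self, table_id: u64, name: &str, version: &str) -> bool {
        self.marked_deleted
            .remove(&(table_id, name.to_string(), version.to_string()))
            .is_some()
    }

    /// Removes entries independently; a missing key does not affect the others.
    /// Returns how many entries were actually removed.
    fn remove_each(&mut self, table_id: u64, indexes: &[(IndexName, IndexVersion)]) -> usize {
        let mut removed = 0;
        for (name, version) in indexes {
            if self.remove_one(table_id, name, version) {
                removed += 1;
            }
        }
        removed
    }
}
```

Because each removal stands alone, a retry after a partial failure only needs to repeat the entries that are still present.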
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR implements a new table function, fuse_vacuum_drop_inverted_index(), to clean up the data of dropped inverted indexes that are outside the retention period.
Implementation
A new key-value pair is added to the meta-service:
When an inverted index is dropped or replaced, a __fd_marked_deleted_table_index key-value pair is added.
When a vacuum is triggered, the meta-service checks the __fd_marked_deleted_table_index key and filters out the indexes that are still within the retention period, based on MarkedDeletedIndexMeta.dropped_on.
The vacuum deletes the index data that is no longer within the retention period, identifying the index files by index_name and index_version. After that, the meta-service removes the index meta from the __fd_marked_deleted_table_index/index_name/index_version key.
Tests
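The retention check described under Implementation can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `dropped_on` is modeled as seconds since the epoch, `expired_indexes` is a hypothetical helper, and treating an entry as expired when its age is greater than or equal to the retention period is a guess at the boundary rule.

```rust
type IndexName = String;
type IndexVersion = String;

/// Stand-in for the real meta; only `dropped_on` is modeled,
/// and its representation (seconds since epoch) is an assumption.
#[derive(Clone, Debug, PartialEq, Eq)]
struct MarkedDeletedIndexMeta {
    dropped_on: u64,
}

/// Returns the (name, version) pairs whose retention period has elapsed at `now`.
/// Entries still within retention are skipped and left for a later vacuum.
fn expired_indexes(
    entries: &[(IndexName, IndexVersion, MarkedDeletedIndexMeta)],
    now: u64,
    retention_secs: u64,
) -> Vec<(IndexName, IndexVersion)> {
    entries
        .iter()
        .filter(|(_, _, meta)| now.saturating_sub(meta.dropped_on) >= retention_secs)
        .map(|(name, version, _)| (name.clone(), version.clone()))
        .collect()
}
```

The returned pairs are what a vacuum would use to locate index files, and afterwards to remove the matching marked-deleted keys.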
Type of change