Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(compaction): fix table_id mask #18325

Closed
wants to merge 8 commits into from
Closed

Conversation

Li0k
Copy link
Contributor

@Li0k Li0k commented Aug 30, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

After github.com//pull/18053,SstableInfo contains "correct" table_ids. Iterator can use SstableInfo's table_ids to mask data access.

This PR fix 2 path:

  • replace existing_table_ids (which generate with catalog) with SstableInfo.table_ids
  • fix fast_compactor iterator to mask data access with SstableInfo.table_ids

related to #18323

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@Li0k Li0k requested a review from hzxa21 August 30, 2024 03:29
@github-actions github-actions bot added the type/fix Bug fix label Aug 30, 2024
@@ -396,13 +395,14 @@ async fn test_failpoints_compactor_iterator_recreate() {
.await
.unwrap();

info.table_ids.push(table_id as u32);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make the put_sst implementation populae the table_ids field correctly?

stats: &StoreLocalStatistic,
task_progress: Arc<TaskProgress>,
sstable_store: SstableStoreRef,
max_io_retry_times: usize,
) -> Self {
let existing_table_ids = HashSet::from_iter(sstable_info.table_ids.iter().cloned());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: let's name it sst_table_ids to make things more explicit. People familiar with the previous codes can easily be confused and think existing_table_ids is the table ids of the compaction group.

sstable: TableHolder,
iter: Option<BlockIterator>,
task_progress: Arc<TaskProgress>,

existing_table_ids: HashSet<StateTableId>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -234,6 +271,7 @@ impl ConcatSstableIterator {
self.cur_idx = idx;
if self.cur_idx < self.sstables.len() {
let sstable_info = &self.sstables[self.cur_idx];
let existing_table_ids = HashSet::from_iter(sstable_info.table_ids.iter().cloned());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -158,6 +159,18 @@ impl BlockDataStream {
}
}
}

pub fn block_index(&self) -> usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this method? This is exactly the same as next_block_index

Some((data, _uncompressed_size)) => {
block_meta = self
.block_stream
.block_meta(self.block_stream.block_index())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this correct? Based on the BlockDataStream implementation, block_index is pointing to the next block not the returning block of next_block_impl

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic can be easily messed up by using tbe wrong block idx. Please add UTs to make sure things are working as expected.

let filter_block = self
.sstable
.filter_reader
.get_block_raw_filter(self.next_block_index);
self.next_block_index += 1;
.get_block_raw_filter(self.block_stream.block_index());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@Li0k Li0k closed this Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants