feat(parquet): introduce inverted index applier to reader #3130

Merged

Conversation

Contributor

@zhongzc zhongzc commented Jan 10, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Add the index applier in the Parquet reader to filter row groups:

  • Add the inverted_index_available property to SstInfo and FileMeta
  • Introduce the row_groups_to_read method on ParquetReaderBuilder, which returns the row groups that still need to be read after filtering by the inverted index and the min-max index
  • Add metrics to observe the selectivity of the index
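The two-stage filtering described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `MinMax` struct, the free function `row_groups_to_read`, its parameters, and the `selectivity` helper are all assumptions made for the example. It assumes the inverted index applier yields the set of row groups that may contain matches, and that selectivity means the fraction of row groups left to read.

```rust
use std::collections::BTreeSet;

/// Hypothetical min-max statistics of one column in one row group.
#[derive(Debug, Clone, Copy)]
pub struct MinMax {
    pub min: i64,
    pub max: i64,
}

/// Returns the row groups that still need to be read for the predicate
/// `lo <= value <= hi`.
///
/// `index_hits` stands in for the (assumed) output of an inverted index
/// applier: the row groups that may contain matching rows. `None` means no
/// index is available, so every row group stays a candidate.
pub fn row_groups_to_read(
    stats: &[MinMax],
    index_hits: Option<&BTreeSet<usize>>,
    lo: i64,
    hi: i64,
) -> Vec<usize> {
    (0..stats.len())
        // Stage 1: keep only row groups the inverted index did not rule out.
        .filter(|rg| index_hits.map_or(true, |hits| hits.contains(rg)))
        // Stage 2: keep only row groups whose [min, max] overlaps [lo, hi].
        .filter(|&rg| stats[rg].min <= hi && stats[rg].max >= lo)
        .collect()
}

/// Fraction of row groups left after filtering. An empty file is reported
/// as 1.0 (nothing was pruned); that convention is an assumption.
pub fn selectivity(selected: usize, total: usize) -> f64 {
    if total == 0 {
        1.0
    } else {
        selected as f64 / total as f64
    }
}
```

A lower selectivity value means the indexes pruned more row groups. Recording the selected and total counts per read is one way such a metric could be fed.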

Moreover, once inverted_index_available is a property of FileMeta, a FileMeta no longer represents only a single SST file; it also covers the associated index files. Deletion must therefore remove the SST file and its index files together.
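A rough sketch of what deleting both together could look like. Everything here is hypothetical and simplified for illustration: the two-field `FileMeta`, the `owned_paths` and `delete_file` helpers, the `<id>.parquet` / `<id>.idx` naming, and the use of the local filesystem are assumptions, not the layout or storage API used in this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for the file metadata discussed above.
pub struct FileMeta {
    pub file_id: String,
    pub inverted_index_available: bool,
}

/// Paths that belong to one file: the SST itself plus, if present, its index.
/// The `<id>.parquet` / `<id>.idx` layout is an assumption for illustration.
pub fn owned_paths(dir: &Path, meta: &FileMeta) -> Vec<PathBuf> {
    let mut paths = vec![dir.join(format!("{}.parquet", meta.file_id))];
    if meta.inverted_index_available {
        paths.push(dir.join(format!("{}.idx", meta.file_id)));
    }
    paths
}

/// Removes the SST and its index file together. Files that are already gone
/// are ignored, so the call can be retried safely.
pub fn delete_file(dir: &Path, meta: &FileMeta) -> io::Result<()> {
    for path in owned_paths(dir, meta) {
        match fs::remove_file(&path) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

Deriving every path from the metadata in one place means a caller cannot delete the SST and forget the index.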

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

#2705

@github-actions github-actions bot added docs-not-required This change does not impact docs. Size: M labels Jan 10, 2024
Signed-off-by: Zhenchi <[email protected]>

codecov bot commented Jan 10, 2024

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (29a7f30) 85.48% compared to head (20f97c1) 85.04%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3130      +/-   ##
==========================================
- Coverage   85.48%   85.04%   -0.45%     
==========================================
  Files         822      822              
  Lines      134403   134560     +157     
==========================================
- Hits       114899   114431     -468     
- Misses      19504    20129     +625     

Contributor

@killme2008 killme2008 left a comment

LGTM

src/mito2/src/access_layer.rs (outdated, resolved)
src/mito2/src/sst/parquet/reader.rs (resolved)
src/mito2/src/sst/parquet/reader.rs (resolved)
Contributor Author

zhongzc commented Jan 10, 2024

@waynexia @evenyag PTAL

Member

@waynexia waynexia left a comment

LGTM 🚀

src/mito2/src/sst/file.rs (outdated, resolved)
Contributor

@evenyag evenyag left a comment

LGTM

@zhongzc zhongzc requested a review from waynexia January 11, 2024 05:21
@zhongzc zhongzc added this pull request to the merge queue Jan 11, 2024
src/mito2/src/sst/file.rs (outdated, resolved)
@zhongzc zhongzc removed this pull request from the merge queue due to a manual request Jan 11, 2024
@zhongzc zhongzc requested a review from evenyag January 11, 2024 07:19
Contributor Author

zhongzc commented Jan 11, 2024

@evenyag @waynexia PTAL

@waynexia waynexia added this pull request to the merge queue Jan 11, 2024
Merged via the queue into GreptimeTeam:main with commit fd8fb64 Jan 11, 2024
15 checks passed
@zhongzc zhongzc self-assigned this Jan 12, 2024
Labels
docs-not-required This change does not impact docs.
Projects
Status: Done
4 participants