-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support reading bloom filters from Parquet files and filter row groups using them #17289
base: branch-25.02
Are you sure you want to change the base?
Support reading bloom filters from Parquet files and filter row groups using them #17289
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review, flushing the small comments I've got so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ty operators in the expression
ast_operator::NOT, _bloom_filter_expr.push(ast::operation{ast_operator::NOT, value})}); | ||
} | ||
// For all other expressions, push an always true expression | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karthikeyann @vuule added this logic handle any non col == lit
type expressions in the filter. Essentially just transforming them all to always true.
* @brief Collects lists of equality predicate literals in the AST expression, one list per input | ||
* table column. This is used in row group filtering based on bloom filters. | ||
*/ | ||
class equality_literals_collector : public ast::detail::expression_transformer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an ast::tree
in this expression converter as we only visit and collect literals for col == lit
expressions.
*/ | ||
std::reference_wrapper<ast::expression const> visit(ast::literal const& expr) override | ||
{ | ||
return expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to push any of these to the ast::tree
from the child class bloom_filter_expression_converter
either as these columns or literals don't participate in the transformed expression tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few small comments, looks good overall :)
Co-authored-by: Vukasin Milovanovic <[email protected]>
Description
This PR adds support to read bloom filters from Parquet files and use them to filter row groups based on
col == literal
like predicate(s), if provided.Related to #17164
Checklist