Support reading bloom filters from Parquet files and filter row groups using them #17289

Open
wants to merge 96 commits into
base: branch-25.02
95fe8e8
Initial stuff for reading bloom filter from PQ files
mhaseeb123 Nov 9, 2024
4f0e7ab
Minor bug fix
mhaseeb123 Nov 9, 2024
48a50c4
Apply style fix
mhaseeb123 Nov 9, 2024
9a85d08
Merge branch 'branch-24.12' into fea/extract-pq-bloom-filter-data
mhaseeb123 Nov 14, 2024
b71cf9b
Merge branch 'branch-24.12' into fea/extract-pq-bloom-filter-data
mhaseeb123 Nov 15, 2024
68be24f
Some updates
mhaseeb123 Nov 16, 2024
f848251
Move contents to a separate file
mhaseeb123 Nov 16, 2024
0b65233
Revert erroneous changes
mhaseeb123 Nov 16, 2024
cf7d762
Style and doc fix
mhaseeb123 Nov 16, 2024
81efad2
Get equality predicate col indices
mhaseeb123 Nov 19, 2024
088377b
Enable `arrow_filter_policy` and `span` types in bloom filter.
mhaseeb123 Nov 20, 2024
0435bff
Merge branch 'branch-24.12' into fea/extract-pq-bloom-filter-data
mhaseeb123 Nov 20, 2024
3dff590
Successfully search bloom filter
mhaseeb123 Nov 21, 2024
71e1d33
style fix
mhaseeb123 Nov 21, 2024
aa65a2b
Code cleanup
mhaseeb123 Nov 22, 2024
c52821b
add tests
mhaseeb123 Nov 25, 2024
3a20a98
Initial stuff for reading bloom filter from PQ files
mhaseeb123 Nov 9, 2024
d67e4b5
Minor bug fix
mhaseeb123 Nov 9, 2024
10471d4
Apply style fix
mhaseeb123 Nov 9, 2024
1e12662
Some updates
mhaseeb123 Nov 16, 2024
ee7217c
Move contents to a separate file
mhaseeb123 Nov 16, 2024
f8e6159
Revert erroneous changes
mhaseeb123 Nov 16, 2024
1886cab
Style and doc fix
mhaseeb123 Nov 16, 2024
be228b3
Get equality predicate col indices
mhaseeb123 Nov 19, 2024
aaf355e
Enable `arrow_filter_policy` and `span` types in bloom filter.
mhaseeb123 Nov 20, 2024
e92324e
Successfully search bloom filter
mhaseeb123 Nov 21, 2024
0b1719d
style fix
mhaseeb123 Nov 21, 2024
ef3a262
Code cleanup
mhaseeb123 Nov 22, 2024
051be2d
add tests
mhaseeb123 Nov 25, 2024
a12c90e
Merge branch 'fea/extract-pq-bloom-filter-data' of https://github.com…
mhaseeb123 Nov 25, 2024
fb55c3f
Major cleanups
mhaseeb123 Nov 26, 2024
b477d2d
Significant code refactoring
mhaseeb123 Nov 26, 2024
f9f1746
minor style fix
mhaseeb123 Nov 26, 2024
bad484f
refactoring
mhaseeb123 Nov 26, 2024
ce09d43
Minor refactoring
mhaseeb123 Nov 26, 2024
dddee6c
Minor improvements
mhaseeb123 Nov 26, 2024
0cfeb80
Add gtest
mhaseeb123 Nov 26, 2024
9137585
Improvements
mhaseeb123 Nov 26, 2024
77152b4
Support int96 in bloom filter
mhaseeb123 Nov 27, 2024
3984291
Cleanup
mhaseeb123 Nov 27, 2024
9a39aa4
Minor improvements
mhaseeb123 Nov 27, 2024
1def801
Fix minor bug
mhaseeb123 Nov 27, 2024
6edc248
Minor bug fixing
mhaseeb123 Nov 28, 2024
2925f1e
Add python tests
mhaseeb123 Nov 28, 2024
efc6ec0
Correct parquet files
mhaseeb123 Nov 28, 2024
df84aca
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Nov 28, 2024
a2fa784
minor spelling fix
mhaseeb123 Dec 2, 2024
1f5da37
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 2, 2024
fa0cec8
Apply suggestions from code review
mhaseeb123 Dec 2, 2024
7a309c6
Minor bug fix
mhaseeb123 Dec 2, 2024
bcc68c0
Convert to enum class
mhaseeb123 Dec 2, 2024
2dce9b1
Apply suggestion from code review
mhaseeb123 Dec 3, 2024
e03bea0
Suggestions from code reviews
mhaseeb123 Dec 3, 2024
059a9d8
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 3, 2024
4b0b5ed
Apply suggestions from code reviews
mhaseeb123 Dec 4, 2024
c1256b1
Refactor into single table for cudf::compute_column
mhaseeb123 Dec 4, 2024
88bf491
Minor, add const
mhaseeb123 Dec 4, 2024
9ca42c6
Move bloom filter test to parquet test
mhaseeb123 Dec 4, 2024
84c24c1
Minor updates
mhaseeb123 Dec 4, 2024
0c05031
Minor
mhaseeb123 Dec 4, 2024
09560c5
Logical and between bloom filter and stats
mhaseeb123 Dec 4, 2024
21f4412
Revert merging converted AST tables.
mhaseeb123 Dec 4, 2024
442de80
Revert an extra eol
mhaseeb123 Dec 4, 2024
f7952d4
Revert extra eol
mhaseeb123 Dec 4, 2024
4d0c570
Read bloom filter data sync
mhaseeb123 Dec 4, 2024
67c6247
Update cpp/src/io/parquet/bloom_filter_reader.cu
mhaseeb123 Dec 4, 2024
40c80b7
strong type for int96 timestamp
mhaseeb123 Dec 4, 2024
690c165
Merge branch 'fea/extract-pq-bloom-filter-data' of https://github.com…
mhaseeb123 Dec 4, 2024
c5f8150
Remove unused header
mhaseeb123 Dec 4, 2024
7a21a6e
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 6, 2024
4465277
Apply suggestions from code review
mhaseeb123 Dec 9, 2024
3888732
Apply suggestions
mhaseeb123 Dec 9, 2024
8bc8927
Update cpp/src/io/parquet/reader_impl_helpers.hpp
mhaseeb123 Dec 9, 2024
d719e65
Update cpp/src/io/parquet/reader_impl_helpers.hpp
mhaseeb123 Dec 9, 2024
03cf07f
Move equality_literals instead of copying
mhaseeb123 Dec 9, 2024
de94168
Merge branch 'fea/extract-pq-bloom-filter-data' of https://github.com…
mhaseeb123 Dec 9, 2024
c92d326
Minor
mhaseeb123 Dec 9, 2024
82083f9
Use spans instead of passing around vectors
mhaseeb123 Dec 10, 2024
6918a40
Minor
mhaseeb123 Dec 10, 2024
85cdc00
Make `get_equality_literals()` safe again
mhaseeb123 Dec 10, 2024
aa1a909
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 10, 2024
fdf8fc8
Update counting_iterator
mhaseeb123 Dec 10, 2024
10a8f5a
Minor changes
mhaseeb123 Dec 10, 2024
d46504f
Minor
mhaseeb123 Dec 10, 2024
c94ce86
Sync arrow filter policy with cuco
mhaseeb123 Dec 10, 2024
69aa685
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 11, 2024
d95a178
Address partial reviewer comments and fix new logger header
mhaseeb123 Dec 12, 2024
840c6e7
Revert to direct dtype check until I find a way to get scalar from li…
mhaseeb123 Dec 12, 2024
9d8c071
Create a dummy scalar of type T and compare with dtype
mhaseeb123 Dec 12, 2024
3b8aea0
Use a temporary scalar
mhaseeb123 Dec 12, 2024
0c859db
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 12, 2024
c385537
Recalculate `total_row_groups` in apply_bloom_filter
mhaseeb123 Dec 13, 2024
3693ad1
Simplify bloom filter expression with ast::tree and handle non-equali…
mhaseeb123 Dec 13, 2024
c2de9fb
Apply suggestions from code review
mhaseeb123 Dec 13, 2024
344851c
Minor optimization: Set `have_bloom_filters` while populating `bloom_…
mhaseeb123 Dec 14, 2024
96fb7c2
Merge branch 'branch-25.02' into fea/extract-pq-bloom-filter-data
mhaseeb123 Dec 14, 2024
Move equality_literals instead of copying
mhaseeb123 committed Dec 9, 2024
commit 03cf07f08b96d414d6555723163f96b5f0f437e0
2 changes: 1 addition & 1 deletion cpp/src/io/parquet/bloom_filter_reader.cu
@@ -246,7 +246,7 @@ class equality_literals_collector : public ast::detail::expression_transformer {
    *
    * @return Vectors of equality literals, one per input table column
    */
-  [[nodiscard]] std::vector<std::vector<ast::literal*>> const get_equality_literals() const
+  [[nodiscard]] std::vector<std::vector<ast::literal*>> get_equality_literals() const
   {
     return _equality_literals;
   }
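
The change above drops `const` from the by-value return type. A minimal sketch of why that matters, assuming the caller assigns or moves the result into another object: a `const`-qualified temporary cannot bind to an rvalue reference, so the caller falls back to a copy. The `tracked` and `collector` types, the counters, and the helper functions below are hypothetical stand-ins for illustration and are not taken from cudf. In both variants the getter still copies the member once, because it is a `const` member function returning a member by value.

```cpp
#include <cassert>

// Hypothetical counters and tracker type for illustration; not part of cudf.
static int g_copies = 0;
static int g_moves  = 0;

struct tracked {
  tracked() = default;
  tracked(tracked const&) { ++g_copies; }
  tracked(tracked&&) noexcept { ++g_moves; }
  tracked& operator=(tracked const&) { ++g_copies; return *this; }
  tracked& operator=(tracked&&) noexcept { ++g_moves; return *this; }
};

// Hypothetical stand-in for a class exposing a member through a by-value getter.
class collector {
 public:
  // Old shape: the returned temporary is const, so it cannot bind to tracked&&.
  tracked const get_const() const { return _value; }
  // New shape: the returned temporary is non-const, so callers can move from it.
  tracked get_plain() const { return _value; }

 private:
  tracked _value;
};

struct counts {
  int copies;
  int moves;
};

// Assigns the const-qualified return value into an existing object.
counts assign_from_const_return()
{
  g_copies = g_moves = 0;
  collector c;
  tracked sink;
  sink = c.get_const();  // copy out of the member, then copy-assign
  return counts{g_copies, g_moves};
}

// Assigns the plain return value into an existing object.
counts assign_from_plain_return()
{
  g_copies = g_moves = 0;
  collector c;
  tracked sink;
  sink = c.get_plain();  // copy out of the member, then move-assign
  return counts{g_copies, g_moves};
}

// Checks the counts for both getter shapes.
void run_demo()
{
  counts const old_shape = assign_from_const_return();
  counts const new_shape = assign_from_plain_return();
  assert(old_shape.copies == 2 && old_shape.moves == 0);
  assert(new_shape.copies == 1 && new_shape.moves == 1);
}
```

For a nested `std::vector`, replacing the second copy with a move avoids reallocating and copying every inner vector at the call site.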