Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit instantiations for IVF-PQ search kernels used in tests #2212

Merged
merged 14 commits into from
Mar 18, 2024

Conversation

tfeher
Copy link
Contributor

@tfeher tfeher commented Mar 5, 2024

Compilation of IVF-PQ search kernels can be time consuming. In libraft.so the compilation is done in parallel for kernels without filtering and with int64_t index type.

We have test with uint32_t index type as well as tests for bitset_filter with both 32 and 64 bit index types. This PR adds explicit template instantiations for the test. This way we avoid repeated compilation of the kernels with filter and this also enables parallel compilation of the compute_similarity kernel for different template types. The kernels with these additional type parameters are not added to libraft.so, only linked together with the test executable.

Note that this PR does not increase the number of compiled kernels, but it enables to compile them in parallel.

@tfeher tfeher requested review from a team as code owners March 5, 2024 18:14
@tfeher tfeher self-assigned this Mar 5, 2024
@tfeher tfeher added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Vector Search and removed cpp CMake labels Mar 5, 2024
@tfeher
Copy link
Contributor Author

tfeher commented Mar 5, 2024

What is the best way to avoid this extra compilation @lowener and @achirkin?

One idea would be to cast away the index type if none_ivf_sample_filter is selected while we wrap it into the adapter. Could we have a way where the filter adapter is not templated on IdxT?

@cjnolet
Copy link
Member

cjnolet commented Mar 5, 2024

One idea would be to cast away the index type if none_ivf_sample_filter is selected while we wrap it into the adapter. Could we have a way where the filter adapter is not templated on IdxT?

We had a discussion a little while back of just having the search_with_filter() accept a cuda::std::function as an argument (thus removing the template for the filter type and turning it into a runtime argument). I wonder if it's finally time to make that change in the 24.04 version.

@tfeher
Copy link
Contributor Author

tfeher commented Mar 10, 2024

I wonder if it's finally time to make that change in the 24.04 version.

As discussed offline, I recommend to push further changes to the filter function to 24.06, and keep this PR in its current, non-breaking form. I have added a reference to the tracking issue for filters #1738

tfeher added 2 commits March 11, 2024 13:58
- move common -ext.cuh headers to raft_internal
- move .cu files back to src
@tfeher
Copy link
Contributor Author

tfeher commented Mar 11, 2024

Overview of the changes in compile time. Test and bench with filtering plus test with uint32_t (highlighted in bold) is now compiled in parallel

file build time before after PR (mm:ss)
ann_ivf_pq/test_float_uint32_t.cu 50:15 3:13
ann_ivf_pq/test_filter_float_int64_t.cu 26:52 1:30
ann_ivf_pq/test_filter_int8_t_int64_t.cu 26:48 1:50
bench/ivf_pq_filter_float_int64_t.cu 21:05 1:48
ann_ivf_pq/test_int8_t_int64_t.cu 3:17 3:14
ann_ivf_pq/test_float_int64_t.cu 3:14 4:14
ann_ivf_pq/test_uint8_t_int64_t.cu 3:11 4:11
bench/ivf_pq_int8_t_int64_t.cu 1:55 1:45
bench/ivf_pq_uint8_t_int64_t.cu 1:53 1:39
bench/ivf_pq_float_int64_t.cu 1:53 1:56
ann_ivf_pq/ivf_pq_compute_similarity_half_fp8_true_bitset32 0 (new) 6:06
ann_ivf_pq/ivf_pq_compute_similarity_half_fp8_true_filt32 0 (new) 6:03
ann_ivf_pq/ivf_pq_compute_similarity_half_fp8_true_bitset64 0 (new) 5:37
ann_ivf_pq/ivf_pq_compute_similarity_half_fp8_false_filt32 0 (new) 5:15
ann_ivf_pq/ivf_pq_compute_similarity_half_fp8_false_bitset32 0 (new) 5:15
ann_ivf_pq/ivf_pq_compute_similarity_half_half_bitset32 0 (new) 5:06
ann_ivf_pq/ivf_pq_compute_similarity_float_fp8_true_bitset32 0 (new) 5:04
ann_ivf_pq/ivf_pq_compute_similarity_half_fp8_false_bitset64 0 (new) 5:03
ann_ivf_pq/ivf_pq_compute_similarity_half_half_filt32 0 (new) 5:03
ann_ivf_pq/ivf_pq_compute_similarity_float_fp8_true_filt32 0 (new) 5:02
ann_ivf_pq/ivf_pq_compute_similarity_float_fp8_true_bitset64 0 (new) 4:51
ann_ivf_pq/ivf_pq_compute_similarity_float_fp8_false_bitset32 0 (new) 4:46
ann_ivf_pq/ivf_pq_compute_similarity_half_half_bitset64 0 (new) 4:44
ann_ivf_pq/ivf_pq_compute_similarity_float_fp8_false_filt32 0 (new) 4:44
ann_ivf_pq/ivf_pq_compute_similarity_float_fp8_false_bitset64 0 (new) 4:42
ann_ivf_pq/ivf_pq_compute_similarity_float_half_bitset32 0 (new) 4:37
ann_ivf_pq/ivf_pq_compute_similarity_float_half_filt32 0 (new) 4:37
ann_ivf_pq/ivf_pq_compute_similarity_float_half_bitset64 0 (new) 4:35
ann_ivf_pq/ivf_pq_compute_similarity_float_float_bitset32 0 (new) 4:34
ann_ivf_pq/ivf_pq_compute_similarity_float_float_filt32 0 (new) 4:31
ann_ivf_pq/ivf_pq_compute_similarity_float_float_bitset64 0 (new) 4:28

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tfeher, for the PR! Looks good to me. A small nitpick about the internal indexing type below.

cpp/include/raft/neighbors/detail/ivf_pq_build.cuh Outdated Show resolved Hide resolved
@tfeher tfeher force-pushed the ivfpq_test_instantiate_uint32 branch from 2bf77db to 53a6f1a Compare March 18, 2024 09:10
Copy link
Member

@benfred benfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@tfeher
Copy link
Contributor Author

tfeher commented Mar 18, 2024

/merge

@rapids-bot rapids-bot bot merged commit d14cac2 into rapidsai:branch-24.04 Mar 18, 2024
71 checks passed
achirkin pushed a commit to enp1s0/raft that referenced this pull request Mar 18, 2024
…apidsai#2212)

Compilation of IVF-PQ search kernels can be time consuming. In `libraft.so` the compilation is done in parallel for kernels without filtering and with `int64_t` index type.

We have test with `uint32_t` index type as well as tests for `bitset_filter` with both 32 and 64 bit index types. This PR adds explicit template instantiations for the test. This way we avoid repeated compilation of the kernels with filter and this also enables parallel compilation of the `compute_similarity` kernel for different template types. The kernels with these additional type parameters are not added to `libraft.so`, only linked together with the test executable. 

Note that this PR does not increase the number of compiled kernels, but it enables to compile them in parallel.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Ben Frederickson (https://github.com/benfred)

URL: rapidsai#2212
achirkin pushed a commit to enp1s0/raft that referenced this pull request Mar 18, 2024
…apidsai#2212)

Compilation of IVF-PQ search kernels can be time consuming. In `libraft.so` the compilation is done in parallel for kernels without filtering and with `int64_t` index type.

We have test with `uint32_t` index type as well as tests for `bitset_filter` with both 32 and 64 bit index types. This PR adds explicit template instantiations for the test. This way we avoid repeated compilation of the kernels with filter and this also enables parallel compilation of the `compute_similarity` kernel for different template types. The kernels with these additional type parameters are not added to `libraft.so`, only linked together with the test executable.

Note that this PR does not increase the number of compiled kernels, but it enables to compile them in parallel.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Ben Frederickson (https://github.com/benfred)

URL: rapidsai#2212
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Vector Search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants