Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency #1786

achirkin · 2023-08-30T06:58:21Z

This PR aims to reduce the latency of IVF-PQ and related functions, especially for small work sizes and in the "throughput" benchmark mode.

Add kernel config caching to the ivf_pq::search::compute_similarity kernel
Add kernel config caching to select::warpsort
Fix the memory_resource usage in matrix::select_k: make sure all temporary allocations use raft's workspace memory resource.
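To make the kernel config caching listed above concrete, here is a minimal sketch of a small LRU cache for launch parameters. It assumes the cache is stored as a vector of key/value pairs (the approach mentioned later in the review thread). The key field names come from the hash snippet in the review, but their types are assumed; `launch_config`, `lru_cache`, and `get_or_compute` are hypothetical names, not raft's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical cache key: field names follow the hash snippet in this PR's review,
// field types are assumed.
struct search_kernel_key {
  bool manage_local_topk;
  std::uint32_t topk;
  std::uint32_t n_probes;
  std::uint32_t n_queries;

  bool operator==(const search_kernel_key& o) const noexcept
  {
    return manage_local_topk == o.manage_local_topk && topk == o.topk &&
           n_probes == o.n_probes && n_queries == o.n_queries;
  }
};

// Hypothetical launch parameters that are expensive to recompute on every call.
struct launch_config {
  std::uint32_t grid_dim;
  std::uint32_t block_dim;
  std::size_t smem_size;
};

// Small LRU cache kept as a vector of (key, value) pairs: the most recently used
// entry is at the back, the least recently used one at the front.
// Lookups are linear scans comparing keys, so no hash function is needed.
template <typename K, typename V>
class lru_cache {
 public:
  explicit lru_cache(std::size_t capacity) : capacity_{capacity} {}

  // Returns the cached value (and marks it most recently used), or nullopt on a miss.
  std::optional<V> get(const K& key)
  {
    for (auto it = entries_.begin(); it != entries_.end(); ++it) {
      if (it->first == key) {
        auto hit = std::move(*it);
        entries_.erase(it);
        entries_.push_back(std::move(hit));
        return entries_.back().second;
      }
    }
    return std::nullopt;
  }

  // Inserts or refreshes an entry, evicting the least recently used one when full.
  void set(const K& key, const V& value)
  {
    if (capacity_ == 0) { return; }
    for (auto it = entries_.begin(); it != entries_.end(); ++it) {
      if (it->first == key) {
        entries_.erase(it);
        break;
      }
    }
    if (entries_.size() >= capacity_) { entries_.erase(entries_.begin()); }
    entries_.emplace_back(key, value);
  }

  // Calls `compute` only on a cache miss and stores its result.
  template <typename F>
  V get_or_compute(const K& key, F&& compute)
  {
    if (auto hit = get(key)) { return *hit; }
    V value = compute(key);
    set(key, value);
    return value;
  }

 private:
  std::size_t capacity_;
  std::vector<std::pair<K, V>> entries_;
};
```

A linear scan is cheap here because such a cache only needs to hold a handful of configurations, which is also why the key needs an equality comparison but no hash.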


cjnolet · 2024-01-25T13:36:34Z

@achirkin can you retarget this PR to 24.04? It's too late to get it into 24.02.

achirkin · 2024-01-30T14:17:42Z

I've benchmarked prims::SelectK, prims::KNN/ivf_pq (with n_queries set to 1), and ann-bench ivf-pq.

Performance changes in the prims benchmarks are minimal: most cases are unchanged within ±2-3% run time, and a few became 5-8% faster.

The largest benefit shows up in ann-bench runs in the "throughput" mode: they are 20% faster at 4 threads and 60% faster at 32 threads.

cjnolet

LGTM.

tfeher

Thanks for this PR, Artem; it looks good to me. I have one question, but it need not block merging.

tfeher · 2024-02-02T23:24:51Z

cpp/include/raft/neighbors/detail/ivf_pq_search.cuh

+  inline auto operator()(const search_kernel_key& x) const noexcept -> std::size_t
+  {
+    return (size_t{x.manage_local_topk} << 63) +
+           size_t{x.topk} * size_t{x.n_probes} * size_t{x.n_queries} +


Why does this hash multiply the parameters together ("*")? I guess occasional collisions do not matter much for us, but a proper hash_combine function is not much more complex than this.

Good point! But does raft depend on boost already? There seems to be no standard library equivalent for this.
Also, the current LRU cache implementation doesn't use hash values at all (we changed it from a hashmap to a vector of key-value tuples).
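For reference, a boost-free `hash_combine` needs only a few lines of standard C++. The sketch below uses the classic `boost::hash_combine` mixing formula; the field types of `search_kernel_key` are assumed, and `search_kernel_key_hash` is a hypothetical name. Unlike the product term `topk * n_probes * n_queries` in the reviewed hash, which maps e.g. (topk=10, n_probes=20) and (topk=20, n_probes=10) to the same value, combining the fields one at a time is sensitive to which field holds which value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Mixes `value` into `seed` using the classic boost::hash_combine formula
// (0x9e3779b9 is the 32-bit golden-ratio constant).
inline void hash_combine(std::size_t& seed, std::size_t value) noexcept
{
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical key: field names follow the reviewed snippet, field types are assumed.
struct search_kernel_key {
  bool manage_local_topk;
  std::uint32_t topk;
  std::uint32_t n_probes;
  std::uint32_t n_queries;
};

// Hashes each field with std::hash and folds the results together in a fixed order.
struct search_kernel_key_hash {
  std::size_t operator()(const search_kernel_key& x) const noexcept
  {
    std::size_t seed = 0;
    hash_combine(seed, std::hash<bool>{}(x.manage_local_topk));
    hash_combine(seed, std::hash<std::uint32_t>{}(x.topk));
    hash_combine(seed, std::hash<std::uint32_t>{}(x.n_probes));
    hash_combine(seed, std::hash<std::uint32_t>{}(x.n_queries));
    return seed;
  }
};
```

As noted in the reply, this only matters if the cache actually looks entries up by hash.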

achirkin · 2024-02-06T13:19:04Z

/merge

achirkin and others added 17 commits August 14, 2023 16:16

Replace GEMM backend: cublas.gemm -> cublaslt.matmul

2cc477b

Replace broken (due to missing direct includes) direct uses of cublasgemm

dc7a9a4

Merge branch 'branch-23.10' into fea-cublaslt-matmul

34a9479

Fix docs

71c03c0

Replace cublasgemm where it makes sense

a2fb088

Fix a typo

699de0c

Merge branch 'branch-23.10' into fea-cublaslt-matmul

f994f19

Put the cache into the resource handle as a user-defined resource

f4d634a

Merge branch 'branch-23.10' into fea-cublaslt-matmul

2d1bf5c

Move matmul into a separate file

e57eebf

Complete the docs

d44bf20

Merge branch 'branch-23.10' into fea-cublaslt-matmul

facf81d

Merge branch 'branch-23.10' into fea-cublaslt-matmul

157d8ae

Merge branch 'branch-23.10' into fea-cublaslt-matmul

be68b61

Merge branch 'branch-23.10' into fea-cublaslt-matmul

f5ac41a

Merge branch 'branch-23.10' into fea-cublaslt-matmul

2d4dcb2

Merge branch 'branch-23.10' into fea-cublaslt-matmul

6f58669

achirkin added the improvement and non-breaking labels Aug 30, 2023

achirkin self-assigned this Aug 30, 2023

github-actions bot added the cpp label Aug 30, 2023

achirkin force-pushed the fea-cache-ivf-pq-params branch from a653b8b to 7aefb3b August 30, 2023 07:43

achirkin added the 2 - In Progress label Aug 30, 2023

achirkin and others added 7 commits August 30, 2023 15:18

Merge branch 'branch-23.10' into fea-cublaslt-matmul

a0e93fd

Merge branch 'branch-23.10' into fea-cublaslt-matmul

4c0d742

Merge branch 'branch-23.10' into fea-cublaslt-matmul

01c3634

Merge branch 'branch-23.10' into fea-cublaslt-matmul

abb3f00

Merge branch 'branch-23.10' into fea-cublaslt-matmul

e24b1c0

move matmul.hpp to cublaslt_wrappers.hpp

de29580

Merge branch 'branch-23.10' into fea-cublaslt-matmul

3835ed0

achirkin added 2 commits January 25, 2024 14:31

Merge branch 'branch-24.02' into fea-cache-ivf-pq-params

800ee80

Style check

82b34b9

achirkin added the 2 - In Progress label and removed the 0 - Blocked label Jan 25, 2024

achirkin changed the base branch from branch-24.02 to branch-24.04 January 25, 2024 13:37

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

d25778c

achirkin marked this pull request as draft January 25, 2024 13:38

achirkin and others added 4 commits January 25, 2024 16:21

Remove unused code

ce9d044

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

976d597

Make sure select_k always uses the workspace memory resource

6d29811

Revert an accidental copyright-only change

542aa64

achirkin removed request for a team January 29, 2024 16:08

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

8f7e37e

achirkin marked this pull request as ready for review January 30, 2024 14:20

achirkin added the 3 - Ready for Review label and removed the 2 - In Progress label Jan 30, 2024

achirkin requested a review from tfeher January 30, 2024 14:21

achirkin mentioned this pull request Jan 31, 2024

[FEA] Add support for select_k on CSR matrix #2140

Merged

achirkin added 3 commits January 31, 2024 20:26

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

d8e1380

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

035e709

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

336f599

cjnolet approved these changes Feb 2, 2024


achirkin added 2 commits February 5, 2024 05:42

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

272f87e

Merge branch 'branch-24.04' into fea-cache-ivf-pq-params

48e19e2

tfeher approved these changes Feb 6, 2024


rapids-bot bot merged commit d7cbcf9 into rapidsai:branch-24.04 Feb 6, 2024
62 checks passed
