
Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency #1786

Merged: 50 commits into rapidsai:branch-24.04 from fea-cache-ivf-pq-params on Feb 6, 2024

Conversation

achirkin
Contributor

@achirkin achirkin commented Aug 30, 2023

This PR aims to reduce the latency of IVF-PQ and related functions, especially with small work sizes and in the "throughput" benchmark mode.

  • Add kernel config caching to the ivf_pq::search::compute_similarity kernel (a rough sketch of the caching approach is shown after this list)
  • Add kernel config caching to select::warpsort
  • Fix the memory_resource usage in matrix::select_k: make sure all temporary allocations use raft's workspace memory resource.
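
For readers unfamiliar with the idea, here is a rough sketch of such a kernel-config cache. All names below (launch_key, launch_config, get_cached_config) are hypothetical illustrations, not raft's actual types; raft uses its own LRU cache and a richer key, but the principle is the same: the expensive launch-parameter selection (kernel instance choice, occupancy queries, shared-memory sizing) runs once per distinct parameter set and is reused afterwards.

```cpp
// Hypothetical sketch of caching kernel launch parameters; not raft's actual code.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <tuple>

struct launch_key {  // hypothetical key: the parameters that determine the config
  std::uint32_t n_probes;
  std::uint32_t topk;
  bool manage_local_topk;

  auto operator<(const launch_key& o) const -> bool
  {
    return std::tie(n_probes, topk, manage_local_topk) <
           std::tie(o.n_probes, o.topk, o.manage_local_topk);
  }
};

struct launch_config {  // hypothetical result of the selection heuristic
  unsigned grid_dim;
  unsigned block_dim;
  std::size_t smem_bytes;
};

// Returns the cached config for `key`, invoking the expensive `select`
// heuristic only on a cache miss.
inline auto get_cached_config(const launch_key& key,
                              const std::function<launch_config(const launch_key&)>& select)
  -> launch_config
{
  static std::mutex mtx;
  static std::map<launch_key, launch_config> cache;  // small; a vector-based LRU works too
  std::lock_guard<std::mutex> lock(mtx);
  auto it = cache.find(key);
  if (it == cache.end()) { it = cache.emplace(key, select(key)).first; }
  return it->second;
}
```

In this sketch, `select` would wrap whatever occupancy queries and heuristics pick the grid, block, and shared-memory sizes; repeated searches with the same parameters skip that work entirely.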

@achirkin achirkin added the "improvement" (Improvement / enhancement to an existing function) and "non-breaking" (Non-breaking change) labels on Aug 30, 2023
@achirkin achirkin self-assigned this Aug 30, 2023
@github-actions github-actions bot added the cpp label Aug 30, 2023
@achirkin achirkin force-pushed the fea-cache-ivf-pq-params branch from a653b8b to 7aefb3b on August 30, 2023 07:43
@achirkin achirkin added the "2 - In Progress" (Currently a work in progress) label on Aug 30, 2023
@achirkin achirkin added the "2 - In Progress" (Currently a work in progress) label and removed the "0 - Blocked" (Unable to proceed until blocker is cleared) label on Jan 25, 2024
@cjnolet
Member

cjnolet commented Jan 25, 2024

@achirkin can you retarget this PR to 24.04? It's too late to get this into 24.02.

@achirkin achirkin changed the base branch from branch-24.02 to branch-24.04 January 25, 2024 13:37
@achirkin achirkin marked this pull request as draft January 25, 2024 13:38
@achirkin achirkin removed request for a team January 29, 2024 16:08
@achirkin
Contributor Author

achirkin commented Jan 30, 2024

I've benchmarked prims::SelectK, prims::KNN/ivf_pq (with n_queries set to 1), and ann-bench ivf-pq.

Performance changes in the prims benchmarks are minimal; most cases stay within ±2-3% of the original time, and a few cases became 5-8% faster.

The biggest benefit shows up when ann-bench is run in the "throughput" mode: at 4 threads it is about 20% faster, and at 32 threads about 60% faster.
[Figure: IVF-PQ speedup in the throughput mode]

@achirkin achirkin marked this pull request as ready for review January 30, 2024 14:20
@achirkin achirkin added the "3 - Ready for Review" label and removed the "2 - In Progress" (Currently a work in progress) label on Jan 30, 2024
@achirkin achirkin requested a review from tfeher January 30, 2024 14:21
Member

@cjnolet cjnolet left a comment


LGTM.

Contributor

@tfeher tfeher left a comment


Thanks for this PR, Artem; it looks good to me. I have one question, but it need not block merging.

```cpp
inline auto operator()(const search_kernel_key& x) const noexcept -> std::size_t
{
  return (size_t{x.manage_local_topk} << 63) +
         size_t{x.topk} * size_t{x.n_probes} * size_t{x.n_queries} +
```
tfeher
Contributor

Why is this hash computed by multiplying the parameters together ("*")? I guess occasional collisions do not matter much for us, but a proper hash_combine function is not much more complex than this.
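
For reference, a boost-style hash_combine looks roughly like the sketch below. This is illustrative only, not code from this PR; the constant shown is the one commonly used with a 64-bit size_t.

```cpp
// Illustrative boost-style hash_combine; mixes the hash of v into seed.
#include <cstddef>
#include <functional>

template <typename T>
void hash_combine(std::size_t& seed, const T& v)
{
  // 0x9e3779b97f4a7c15 is the 64-bit "golden ratio" constant typically used here.
  seed ^= std::hash<T>{}(v) + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}
```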

achirkin
Contributor Author


Good point! But does raft depend on boost already? There seems to be no standard-library equivalent for this.
Also, in the current implementation of the LRU cache, we don't use hash values at all (we changed the implementation from a hashmap to a vector of key-value tuples).
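
For context, an LRU cache backed by a plain vector of key-value pairs can be sketched as below. This is an assumed shape for illustration, not raft's exact implementation: a hit is rotated to the front, a miss is inserted at the front and the oldest entry is evicted, and with a small capacity a linear key comparison replaces hashing entirely.

```cpp
// Illustrative vector-backed LRU cache; not raft's actual implementation.
#include <algorithm>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

template <typename K, typename V>
class lru_cache {
 public:
  explicit lru_cache(std::size_t capacity) : capacity_(capacity) {}

  // Returns the cached value and marks it as most recently used, if present.
  auto get(const K& key) -> std::optional<V>
  {
    auto it = std::find_if(
      items_.begin(), items_.end(), [&](const auto& kv) { return kv.first == key; });
    if (it == items_.end()) { return std::nullopt; }
    std::rotate(items_.begin(), it, it + 1);  // move the hit to the front
    return items_.front().second;
  }

  // Inserts a new entry at the front, evicting the least recently used one if full.
  void put(const K& key, V value)
  {
    items_.emplace(items_.begin(), key, std::move(value));
    if (items_.size() > capacity_) { items_.pop_back(); }
  }

 private:
  std::size_t capacity_;
  std::vector<std::pair<K, V>> items_;
};
```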

@achirkin
Contributor Author

achirkin commented Feb 6, 2024

/merge

@rapids-bot rapids-bot bot merged commit d7cbcf9 into rapidsai:branch-24.04 Feb 6, 2024
62 checks passed