CAGRA - runtime dispatch of distance functions #324
This is an alternative to #296 for reducing the binary size with minimal changes to the code. Here's how they compare:
- enh-cagra-runtime-distance-dispatch (#324): performance: up to 12% QPS slowdown on deep-100M and 5% QPS slowdown on wiki-all
- enh-cagra-separable-compilation (#296): performance: up to 13% QPS slowdown on deep-100M and up to 8% QPS slowdown on wiki-all (low-itopk CAGRA-Q cases being the worst, with others seemingly slightly better than #324)
To sum it up:
Performance update: Latest benchmarks (includes k = 10 and k = 100 for both datasets):
Move the TEAM_SIZE and DATASET_BLOCK_DIM template parameters of CAGRA search kernels (single_cta and multi_cta versions) to runtime by introducing a switch-case statement inside the compute distance components (hence the name: runtime dispatch of the distance implementation). As a result:
- Fewer kernel instances are compiled than on branch-24.10, but each instance is slightly larger
- Only one DATASET_BLOCK_DIM per TEAM_SIZE is possible, i.e. the team size defines both (though this is how it currently is anyway).
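To illustrate the idea for reviewers, here is a minimal host-side C++ sketch of the dispatch pattern, not the actual cuVS code. The names l2_distance_impl and l2_distance are hypothetical, and the (team size, block dim) pairs 8/128, 16/256 and 32/512 are assumed for illustration only. The distance body stays templated on both parameters, while the caller passes the team size as a runtime value and a switch picks the single matching instantiation.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>

// Hypothetical distance body: still specialized at compile time on
// TeamSize and DatasetBlockDim, as the original templated code would be.
template <uint32_t TeamSize, uint32_t DatasetBlockDim>
float l2_distance_impl(const float* a, const float* b, uint32_t dim)
{
  // One partial sum per "team member" (stand-in for the threads of a team).
  float team_partial[TeamSize] = {};
  // Walk the vector in chunks of DatasetBlockDim elements.
  for (uint32_t block = 0; block < dim; block += DatasetBlockDim) {
    const uint32_t end = std::min(block + DatasetBlockDim, dim);
    for (uint32_t i = block; i < end; ++i) {
      const float d = a[i] - b[i];
      team_partial[i % TeamSize] += d * d;
    }
  }
  // Reduce the per-member partial sums into the final squared L2 distance.
  float sum = 0.0f;
  for (uint32_t t = 0; t < TeamSize; ++t) { sum += team_partial[t]; }
  return sum;
}

// Runtime dispatch: the caller is no longer templated on the team size.
// Each team size maps to exactly one block dim (values are assumptions).
float l2_distance(uint32_t team_size, const float* a, const float* b, uint32_t dim)
{
  switch (team_size) {
    case 8: return l2_distance_impl<8, 128>(a, b, dim);
    case 16: return l2_distance_impl<16, 256>(a, b, dim);
    case 32: return l2_distance_impl<32, 512>(a, b, dim);
    default: throw std::invalid_argument("unsupported team size");
  }
}
```

With this shape, any code calling l2_distance is compiled once instead of once per (TEAM_SIZE, DATASET_BLOCK_DIM) combination, at the cost of a branch per call and a larger body in each remaining instance.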