Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic dispatch for block codecs #579

Merged
merged 19 commits into from
Jul 20, 2024
Merged

Dynamic dispatch for block codecs #579

merged 19 commits into from
Jul 20, 2024

Conversation

elshize
Copy link
Member

@elshize elshize commented Mar 23, 2024

Use dynamic dispatch for block codecs. The old class template
block_freq_index is replaced by the BlockInvertedIndex class. This class
contains a shared pointer to a BlockCodec object, which is a superclass of
all block codec implementations.

Changelog-changed: BlockInvertedIndex replaces block_freq_index
Changelog-performance: Expect a slight performance drop of all block codecs

@elshize elshize self-assigned this Mar 23, 2024
@elshize elshize force-pushed the dynamic-dispatch-codec branch from f1d5888 to 7a94fb4 Compare March 23, 2024 19:31
Copy link

codecov bot commented Mar 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.86%. Comparing base (277cbe9) to head (6178ca5).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
+ Coverage   93.27%   93.86%   +0.58%     
==========================================
  Files          89       80       -9     
  Lines        4448     3881     -567     
==========================================
- Hits         4149     3643     -506     
+ Misses        299      238      -61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JMMackenzie
Copy link
Member

Benchmarking on MS-Marco v2 passages with BM25:

main

k=10
{"type": "block_simdbp", "query": "wand", "avg": 12999.1, "q50": 5843, "q90": 32519, "q95": 47480, "q99": 95925}
{"type": "block_simdbp", "query": "maxscore", "avg": 9178.09, "q50": 3963, "q90": 23918, "q95": 33600, "q99": 66648}
{"type": "block_simdbp", "query": "ranked_or", "avg": 458066, "q50": 387151, "q90": 968641, "q95": 1.2303e+06, "q99": 1.8538e+06}

k=1000
{"type": "block_simdbp", "query": "wand", "avg": 48258.5, "q50": 24904, "q90": 110636, "q95": 175410, "q99": 379647}
{"type": "block_simdbp", "query": "maxscore", "avg": 31727.1, "q50": 17407, "q90": 69608, "q95": 112551, "q99": 243058}
{"type": "block_simdbp", "query": "ranked_or", "avg": 458130, "q50": 390417, "q90": 964131, "q95": 1.22383e+06, "q99": 1.84154e+06}

this

k=10
{"type": "block_simdbp", "query": "wand", "avg": 12784.1, "q50": 5622, "q90": 32250, "q95": 47708, "q99": 95753}
{"type": "block_simdbp", "query": "maxscore", "avg": 9132.16, "q50": 3899, "q90": 23733, "q95": 35401, "q99": 66252}
{"type": "block_simdbp", "query": "ranked_or", "avg": 445143, "q50": 372223, "q90": 941900, "q95": 1.20444e+06, "q99": 1.8235e+06}

k=1000
{"type": "block_simdbp", "query": "wand", "avg": 48205.2, "q50": 24919, "q90": 111024, "q95": 175868, "q99": 374888}
{"type": "block_simdbp", "query": "maxscore", "avg": 31675.2, "q50": 17268, "q90": 69635, "q95": 112292, "q99": 247485}
{"type": "block_simdbp", "query": "ranked_or", "avg": 450663, "q50": 374939, "q90": 955607, "q95": 1.21881e+06, "q99": 1.87744e+06}

@JMMackenzie
Copy link
Member

In other words, it looks to have no performance penalty? Seems kind of weird but I'll take it. We should double check that we're doing what we think we're doing I guess...

@JMMackenzie
Copy link
Member

JMMackenzie commented Apr 1, 2024

I tested stream_vbyte as well with MaxScore k=10, found no regressions.

main
{"type": "block_streamvbyte", "query": "maxscore", "avg": 9534.42, "q50": 4137, "q90": 24831, "q95": 34820, "q99": 68842}

this
{"type": "block_streamvbyte", "query": "maxscore", "avg": 9465.44, "q50": 4100, "q90": 24682, "q95": 34673, "q99": 68516}

Edit: "main" is not really main, it's just queries vs queries_dynamic in this branch. I think this can be merged once clang-format is happy?

@elshize
Copy link
Member Author

elshize commented Apr 2, 2024

@amallia FYI we did some testing here regarding dynamic dispatch for block codecs. Seems no performance regression.

I think we'll have to go for it. This will simplify the code, make the compilation faster, and so on. I'm planning to work on porting all the code to the new type-erased block codec as part of this PR. Let me know if you have any comment regarding the general idea; I'll let you both know when it's ready for review.

@elshize elshize force-pushed the dynamic-dispatch-codec branch from f20581b to 173c74f Compare April 5, 2024 22:55
@elshize elshize force-pushed the dynamic-dispatch-codec branch from a19c652 to 9c5438e Compare April 14, 2024 00:06
@elshize elshize force-pushed the dynamic-dispatch-codec branch from 6de9e8b to 06a74cc Compare May 12, 2024 13:34
@elshize elshize changed the title Test of dynamic dispatch for block codec Dynamic dispatch for block codecs May 18, 2024
@elshize elshize force-pushed the dynamic-dispatch-codec branch from ed97d5e to 6178ca5 Compare May 18, 2024 19:23
@elshize elshize requested review from JMMackenzie and amallia May 18, 2024 20:00
Copy link
Member

@JMMackenzie JMMackenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this took so long, Michal. It all looks great, and I think it is worth the very minor performance hit.

include/pisa/codec/block_codec.hpp Show resolved Hide resolved
test/test_block_codecs.cpp Show resolved Hide resolved
@elshize
Copy link
Member Author

elshize commented Jul 20, 2024

@JMMackenzie Thanks! Before merging, I'll clean up the git history here. I might just squash all of it, but at least I want it clean and have some important info in it, including some performance disclaimers.

@elshize elshize merged commit bb2b3df into master Jul 20, 2024
9 of 10 checks passed
@elshize elshize deleted the dynamic-dispatch-codec branch July 20, 2024 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants