Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move codec implementations to cpp and make codec dependencies private #465

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

elshize
Copy link
Member

@elshize elshize commented Jun 14, 2021

Because the implementation of block-wise decode (and encode but this one is not as crucial) is moved out of the header, there is a potential that this will affect performance.

Personally, I don't think these are being inlined anyway but should be tested. I additionally added PISA_ENABLE_IPO cmake option in case we want to test link time optimization.

@elshize elshize self-assigned this Jun 14, 2021
@codecov
Copy link

codecov bot commented Jun 14, 2021

Codecov Report

Merging #465 (f5945bb) into master (a28100d) will decrease coverage by 0.07%.
The diff coverage is 50.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #465      +/-   ##
==========================================
- Coverage   93.26%   93.18%   -0.08%     
==========================================
  Files          92       86       -6     
  Lines        4926     4785     -141     
==========================================
- Hits         4594     4459     -135     
+ Misses        332      326       -6     
Impacted Files Coverage Δ
include/pisa/codec/block_codecs.hpp 86.61% <ø> (-1.78%) ⬇️
include/pisa/codec/varintgb.hpp 62.79% <ø> (ø)
include/pisa/forward_index.hpp 95.12% <ø> (ø)
include/pisa/codec/VarIntG8IU.h 50.84% <50.42%> (+0.84%) ⬆️
include/pisa/bit_vector.hpp 97.54% <0.00%> (-0.41%) ⬇️
include/pisa/block_posting_list.hpp 99.21% <0.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a28100d...f5945bb. Read the comment docs.

@elshize elshize requested a review from JMMackenzie June 14, 2021 19:12
@elshize
Copy link
Member Author

elshize commented Jun 14, 2021

@JMMackenzie do you think you'd be able to run a benchmark comparing this to master? I think one block codec should be enough to verify if there's any regression, unless by looking at the code you can see more potential regressions...

@JMMackenzie
Copy link
Member

CW09B with block_simdbp encoding. About 5000 queries of varying lengths.Based on the numbers below, it looks like we are paying a little for this.

AND k = 1000

Master: {"type": "block_simdbp", "query": "and", "avg": 10647.6, "q50": 3791, "q90": 28323, "q95": 43043, "q99": 93122}
This: {"type": "block_simdbp", "query": "and", "avg": 10747.1, "q50": 3743, "q90": 28466, "q95": 43279, "q99": 94556}

MaxScore k = 10

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 19966.6, "q50": 10714, "q90": 51073, "q95": 69106, "q99": 115939}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 24766.2, "q50": 14654, "q90": 62199, "q95": 81999, "q99": 126860}

MaxScore k = 1000

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 35093.4, "q50": 24444, "q90": 79897, "q95": 104968, "q99": 170087}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 39695.4, "q50": 29151, "q90": 88562, "q95": 114082, "q99": 176220}

@elshize
Copy link
Member Author

elshize commented Jun 15, 2021

Thanks @JMMackenzie can you try passing -DPISA_ENABLE_IPO to cmake, recompile and run again? I'm curious if this helps.

@elshize
Copy link
Member Author

elshize commented Jun 15, 2021

Actually, I see a bug there, I name the option with PISA_ but the check is without it (right below). You'd need to fix that. Whatever you do, please check if the lto flag is actually passed to the linker to make sure we compare the right thing.

@JMMackenzie
Copy link
Member

Actually, I see a bug there, I name the option with PISA_ but the check is without it (right below). You'd need to fix that. Whatever you do, please check if the lto flag is actually passed to the linker to make sure we compare the right thing.

Already saw and fixed that. Having some trouble with CMake now, working through it. Will report back.

@elshize
Copy link
Member Author

elshize commented Jun 15, 2021

You'll also need:

cmake_policy(SET CMP0069 NEW) # Use IPO (LTO) when requested

@JMMackenzie
Copy link
Member

Doesn't seem to help too much:

AND k = 1000

Master: {"type": "block_simdbp", "query": "and", "avg": 10647.6, "q50": 3791, "q90": 28323, "q95": 43043, "q99": 93122}
This: {"type": "block_simdbp", "query": "and", "avg": 10747.1, "q50": 3743, "q90": 28466, "q95": 43279, "q99": 94556}
IPO: {"type": "block_simdbp", "query": "and", "avg": 10673.2, "q50": 3786, "q90": 28325, "q95": 42724, "q99": 93325}

MaxScore k = 10

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 19966.6, "q50": 10714, "q90": 51073, "q95": 69106, "q99": 115939}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 24766.2, "q50": 14654, "q90": 62199, "q95": 81999, "q99": 126860}
IPO: {"type": "block_simdbp", "query": "maxscore", "avg": 24060.2, "q50": 14279, "q90": 60437, "q95": 79765, "q99": 123746}


MaxScore k = 1000

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 35093.4, "q50": 24444, "q90": 79897, "q95": 104968, "q99": 170087}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 39695.4, "q50": 29151, "q90": 88562, "q95": 114082, "q99": 176220}
IPO: {"type": "block_simdbp", "query": "maxscore", "avg": 40741.5, "q50": 30760, "q90": 89645, "q95": 114450, "q99": 173881}

@JMMackenzie
Copy link
Member

Are we willing to take this performance hit? I'm happy to revisit these tests soon if we want to make sure...

@elshize
Copy link
Member Author

elshize commented Mar 14, 2022

Are we willing to take this performance hit? I'm happy to revisit these tests soon if we want to make sure...

Possibly... but not entirely sure. Kind of want to take some time and think about it, mostly whether there's a different way of improving compilation but without the performance hit. Hope to have some time for it sometime this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants