Short description
Major change to the SIMD resolution macros.
Analysis of generated assembly for v0.8.0 shows multiple places where the transition to portable binaries caused critical functions to not be inlined/resolved as intrinsics. The biggest culprit is `popcnt`, which seems to be never inlined in the depth classifier.

Following analysis, we introduce an additional step into the resolution mechanism. After obtaining a `ResolvedSimd` with all generic parameters bound to concrete SIMD implementations, we allow for another macro-based dispatch to construct a local function with relevant `target_feature` annotations and pass the environment inside via arguments. Since `ResolvedSimd` is already resolved statically, this doesn't introduce code bloat into the final binary, but allows the compiler to see the `target_feature` annotations when generating code for that function. By introducing that dispatch to the three topmost "entry points" of the engine (`run_on_subtree`, `run_head_skipping`, `skip` (tail-skipping)) we ensure all downstream intrinsics are properly resolved and inlined.

The result is a massive throughput increase of 5, 10, 20, or in case of `google_map::travel_modes/rsonpath_direct_count` 59 (fifty-nine) percent.

Issue
This was done as part of work on #276
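
For readers unfamiliar with the technique named in the short description, the following is a minimal sketch of wrapping an entry point in a local function that carries a `target_feature` annotation. It is not the code from this PR: `count_bits` and `run` are hypothetical names, the dispatch is written by hand rather than generated by a macro, and runtime detection via `is_x86_feature_detected!` stands in for the statically resolved `ResolvedSimd`. The point it tries to show is only that generic, inlinable code called from inside an annotated function can be compiled with that feature enabled.

```rust
/// Hypothetical stand-in for a hot classifier loop: counts set bits.
/// It is marked `#[inline(always)]` so that it is inlined into its caller
/// and compiled with whatever target features that caller enables.
#[inline(always)]
fn count_bits(words: &[u64]) -> u32 {
    words.iter().map(|w| w.count_ones()).sum()
}

/// Hypothetical entry point that performs the dispatch once, at the top.
fn run(words: &[u64]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("popcnt") {
            // Local function carrying the annotation. The environment
            // (here just `words`) is passed in through arguments.
            #[target_feature(enable = "popcnt")]
            unsafe fn run_popcnt(words: &[u64]) -> u32 {
                count_bits(words)
            }
            // SAFETY: the `popcnt` feature was detected just above.
            return unsafe { run_popcnt(words) };
        }
    }
    // Portable fallback for other architectures or older CPUs.
    count_bits(words)
}
```

Under these assumptions, `run(&[0b1011, u64::MAX])` returns 67 on any platform; only the generated code for the annotated path differs.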
Checklist
All of these should be ticked off before you submit the PR.
I ran `just verify` locally and it succeeded.