Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of rounding dates in date_histogram aggregation #9727

Merged

Conversation

ketanv3
Copy link
Contributor

@ketanv3 ketanv3 commented Sep 4, 2023

Description

In the date_histogram aggregation, the timestamp value from each document must be rounded down to the nearest interval (year, quarter, month, week, day, etc.) as defined in the search request. This rounded down timestamp serves as the bucket key to aggregate results. Instead of rounding down the timestamp for each hit, the ArrayRounding class pre-computes the possible values and performs a binary search to find the round down point.

This is ideal for sufficiently large arrays. But a simple linear search is far superior for small arrays as it avoids the penalty of branch misprediction and pipeline stalls, and accesses memory sequentially.

Since this takes up majority of the CPU time, making these changes can lead to significant improvements. Hot methods:

42.55% java/util/Arrays.binarySearch0 ([JIIJ)I Inlined
11.74% org/opensearch/common/util/ReorganizingLongHash.find (J)J Inlined
9.32% .itable stub ()L; Native
7.15% org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator$1.collect (IJ)V JIT compiled
6.06% org/apache/lucene/util/packed/DirectReader$DirectPackedReader12.get (J)J JIT compiled

Macro benchmarks

I reused the noaa OSB workload and executed the following search request with varying intervals.

{
  "size": 0,
  "aggs": {
    "observations_per_month": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      }
    }
  }
}
Interval Num Values Binary (ms) Linear (ms)
year 3 870 824
quarter 12 1293 999
month 36 1625 1132
week 695 652
day 1004 942

Improvement in quarter/month intervals is substantial. Improvement in year/week/day interval is small as the number of pre-computed values is small, so no big jumps are made to access memory. This varies from workload to workload.

Micro benchmarks

Results are from c6i.2xlarge EC2 instance (Intel Xeon 8375C CPU @ 2.90GHz). Full results.

Uniform distribution

Each value equally likely to be picked.

Size Binary (M ops/s) Linear (M ops/s)
1 542.007 3371.438
2 143.544 201.142
4 97.628 134.224
8 64.972 102.861
16 49.087 89.784
32 38.744 70.796
64 31.847 58.108
128 25.786 44.351
256 23.336 29.917

Skewed distribution (towards the edge)

Follows a normal distribution centered at p90 ± 5% stddev.

Size Binary (M ops/s) Linear (M ops/s)
1 569.787 3140.960
2 386.256 3324.844
4 253.394 1366.625
8 114.747 328.050
16 85.917 167.981
32 62.349 101.340
64 48.765 85.985
128 39.663 64.171
256 32.544 50.117

Skewed distribution (centered)

Follows a normal distribution centered at p50 ± 5% stddev. This is the worst-case for "meet in the middle" linear search.

Size Binary (M ops/s) Linear (M ops/s)
1 571.713 3339.736
2 131.849 205.204
4 128.351 175.668
8 116.738 138.015
16 89.446 96.195
32 62.628 64.243
64 46.722 46.605
128 37.563 31.748
256 30.916 19.308

Alternatives

  1. I experimented with Eytzinger layout to improve the memory access pattern. The performance was marginally better than regular binary search but slower than linear search on small arrays, and similar to regular binary search on large arrays.
  2. On top of (1), I tried to write a branchless implementation of binary search but the unpredictability of JIT makes this task almost impossible.
  3. Linear search is very SIMD friendly! For very small arrays (<4 elements) it doesn't make sense as the overhead of bootstrapping is more than the actual work, but for large arrays, the SIMD implementation is almost twice as fast (platform specific). I'll raise a follow up PR for this.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2023

Compatibility status:

Checks if related components are compatible with change 1e50ab2

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2023

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Sep 4, 2023

Codecov Report

Merging #9727 (84017d1) into main (4ccecd6) will decrease coverage by 0.08%.
Report is 2 commits behind head on main.
The diff coverage is 42.72%.

@@             Coverage Diff              @@
##               main    #9727      +/-   ##
============================================
- Coverage     71.16%   71.09%   -0.08%     
  Complexity    58115    58115              
============================================
  Files          4831     4831              
  Lines        273999   274082      +83     
  Branches      39920    39930      +10     
============================================
- Hits         195005   194847     -158     
- Misses        62604    62920     +316     
+ Partials      16390    16315      -75     
Files Changed Coverage Δ
...arch/index/recovery/RemoteStoreRestoreService.java 12.03% <0.00%> (ø)
server/src/main/java/org/opensearch/node/Node.java 85.54% <0.00%> (ø)
...arch/gateway/remote/RemoteClusterStateService.java 65.37% <23.37%> (-13.05%) ⬇️
.../src/main/java/org/opensearch/common/Rounding.java 86.54% <92.85%> (-0.42%) ⬇️
...search/gateway/remote/ClusterMetadataManifest.java 97.46% <100.00%> (+0.53%) ⬆️

... and 460 files with indirect coverage changes

📢 Have feedback on the report? Share it here.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Compatibility status:

Checks if related components are compatible with change abfb6cd

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@ketanv3
Copy link
Contributor Author

ketanv3 commented Sep 7, 2023

Bad Gateway errors from Jenkins again. :(

GitHub action failed but the check is still running in the background.
https://build.ci.opensearch.org/job/gradle-check/25029/console

Update: At least the check completed successfully now.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@ketanv3
Copy link
Contributor Author

ketanv3 commented Sep 7, 2023

Jenkins down again. This is disappointing.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2023

Gradle Check (Jenkins) Run Completed with:

@ketanv3
Copy link
Contributor Author

ketanv3 commented Sep 8, 2023

@reta @msfroh Gradle check succeeded. Can we merge this now?

@reta reta merged commit 7abfc17 into opensearch-project:main Sep 8, 2023
@reta reta added the backport 2.x Backport to 2.x branch label Sep 8, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9727-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7abfc17a6d1334a885e16d07aa3f6d1c875279c8
# Push it to GitHub
git push --set-upstream origin backport/backport-9727-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9727-to-2.x.

@reta
Copy link
Collaborator

reta commented Sep 8, 2023

@reta @msfroh Gradle check succeeded. Can we merge this now?

@ketanv3 @msfroh do we think it should go to 2.10? @ketanv3 please do manual backport to 2.x

ketanv3 added a commit to ketanv3/OpenSearch that referenced this pull request Sep 8, 2023
…pensearch-project#9727)

* Improve performance of rounding dates in date_histogram aggregation

Signed-off-by: Ketan Verma <[email protected]>

* Minor refactoring changes

Signed-off-by: Ketan Verma <[email protected]>

---------

Signed-off-by: Ketan Verma <[email protected]>
ketanv3 added a commit to ketanv3/OpenSearch that referenced this pull request Sep 8, 2023
…pensearch-project#9727)

* Improve performance of rounding dates in date_histogram aggregation

Signed-off-by: Ketan Verma <[email protected]>

* Minor refactoring changes

Signed-off-by: Ketan Verma <[email protected]>

---------

Signed-off-by: Ketan Verma <[email protected]>
reta pushed a commit that referenced this pull request Sep 8, 2023
…9727) (#9928)

* Improve performance of rounding dates in date_histogram aggregation



* Minor refactoring changes



---------

Signed-off-by: Ketan Verma <[email protected]>
reta pushed a commit that referenced this pull request Sep 8, 2023
…9727) (#9930)

* Improve performance of rounding dates in date_histogram aggregation



* Minor refactoring changes



---------

Signed-off-by: Ketan Verma <[email protected]>
reta added a commit that referenced this pull request Sep 8, 2023
reta added a commit that referenced this pull request Sep 8, 2023
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…pensearch-project#9727)

* Improve performance of rounding dates in date_histogram aggregation

Signed-off-by: Ketan Verma <[email protected]>

* Minor refactoring changes

Signed-off-by: Ketan Verma <[email protected]>

---------

Signed-off-by: Ketan Verma <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Sep 20, 2023
…pensearch-project#9727)

* Improve performance of rounding dates in date_histogram aggregation

Signed-off-by: Ketan Verma <[email protected]>

* Minor refactoring changes

Signed-off-by: Ketan Verma <[email protected]>

---------

Signed-off-by: Ketan Verma <[email protected]>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…pensearch-project#9727)

* Improve performance of rounding dates in date_histogram aggregation

Signed-off-by: Ketan Verma <[email protected]>

* Minor refactoring changes

Signed-off-by: Ketan Verma <[email protected]>

---------

Signed-off-by: Ketan Verma <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…pensearch-project#9727)

* Improve performance of rounding dates in date_histogram aggregation

Signed-off-by: Ketan Verma <[email protected]>

* Minor refactoring changes

Signed-off-by: Ketan Verma <[email protected]>

---------

Signed-off-by: Ketan Verma <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants