-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Star Tree] [Search] Resolving Date histogram with metric aggregation using star-tree #16674
Conversation
❌ Gradle check result for d7fbc39: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/StarTreeLeafBucketCollectorBase.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/StarTreeLeafBucketCollectorBase.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/startree/StarTreeFilter.java
Outdated
Show resolved
Hide resolved
...in/java/org/opensearch/index/compositeindex/datacube/startree/utils/StarTreeQueryHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/StarTreeBucketCollector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/StarTreeLeafBucketCollectorBase.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I don't like the deepening association with the Collector
interface, given that we're not collecting documents from a Lucene query.
Instead, could we try to work at the Aggregator
level more?
I think the place where you should hook more of the star tree stuff in would be the getLeafCollector(LeafReaderContext)
method in AggregatorBase
:
OpenSearch/server/src/main/java/org/opensearch/search/aggregations/AggregatorBase.java
Line 202 in c0852f8
public final LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException { |
I think that's where you could (per-segment) make your choice about whether you are going to delegate to the aggregation logic to return a real LeafBucketCollector
, or you're going to throw a CollectionTerminatedException
. You can even pass the subAggregators
to that logic so the parent has easy access to its children.
Essentially, in the star tree case, you should never need to return a collector. You can just read the values directly from the segment.
server/src/main/java/org/opensearch/search/aggregations/bucket/BucketsAggregator.java
Outdated
Show resolved
Hide resolved
Hi @msfroh @sandeshkr419 , Currently, lets say we have nested aggregation as follows :
Totally makes sense to throw It works well if there are no nested collectors. But am just wondering how to handle nested aggs , for the above example we need Do we need to return a new interface/class specific for star tree in that case for aggregation ?
Because we can't throw And if we don't have new interface/class, then still subCollector is of type Please correct me if my understanding is wrong. |
@msfroh @bharath-techie Refactored the changes to not extend LeafBucketCollectors. However, I have introduced a new interface Now, introducing this new method can compliment with Froh's idea on unifying pre-computations, I'll leave comments on that PR itself to make it more friendly while pre-computing sub-aggregations. |
❌ Gradle check result for e1898f1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@sandeshkr419 Can we please validate |
server/src/main/java/org/opensearch/search/aggregations/metrics/StarTreeCollector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/BucketsAggregator.java
Outdated
Show resolved
Hide resolved
Comparing the 2 solutions proposed in 2 commits:
Response (redacted):
Now to update the sum bucket, we'd have to iterate over Basically, a collector like interface which has hold of sub-collectors (and we recursively call up collectStarTreeEntry - which will be an analogous method to collect/collectBucjket what we have for Lucene documents) might simplify the solution what @bharath-techie also proposed earlier. Let me know your thoughts on this. |
I don't understand what benefit you get from extending |
Hi @sandeshkr419 ,
Will this work ? Basically the parent must check if all subAggregators are of type So all aggregators will have [ We can have better interfaces as well but the idea is to enable this within our interfaces and make it work similar to existing leafCollector interface ] Edit : [ 25/12 ]
|
❌ Gradle check result for 75ad9a8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 7fab6ad: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@msfroh @bharath-techie Updated the PR description with high-level code design. Basically, I introduced these 2 interfaces (bad terminology I know - I'll revisit the terminology once again):
As next steps, 1/ I'll remove up the major hard-coding in the code regarding identifying supported query shapes and fix other POC code 2/ tune-up the above interfaces to get to a minimal state. Other than this, @bharath-techie , @msfroh I think this design achieves what it intends to do in the 1st place - 1/ easily extensible for other bucket aggregators and introducing more nested aggregations and 2/ remove the unwanted dependency from LeafBucketCollector as it is not required. Let me know if there are major design comments, I'll work on getting this PR out of draft state in parallel. |
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Outdated
Show resolved
Hide resolved
❌ Gradle check result for 7208b67: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@msfroh Thanks for checking in. Added more test cases and added changelog. Let me know if you have further comments. @bharath-techie Please let me know if you also have any other comments. |
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java
Outdated
Show resolved
Hide resolved
Thanks for the changes @sandeshkr419. So the changes look good and will approve it post the addressal of minor comments. |
...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
Show resolved
Hide resolved
Signed-off-by: Sandesh Kumar <[email protected]>
❌ Gradle check result for 92415c4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sandesh Kumar <[email protected]>
Signed-off-by: Sandesh Kumar <[email protected]>
❌ Gradle check result for 61644ca: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 61644ca: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❕ Gradle check result for 61644ca: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
…sing star-tree (#16674) --------- Signed-off-by: Sandesh Kumar <[email protected]> Co-authored-by: Sandesh Kumar <[email protected]> (cherry picked from commit b5234a5) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
Sharing the updated code design to resolve date histograms with metric sub-aggregators:
StarTreeQueryContext
in search context.getLeafCollector()
in invoked, if query can be resolved using star-tree.FixedBitSet matchingDocsBitSet
StarTreeBucketCollector
for all the aggregators including sub-aggregators.StarTreeBucketCollector
interface exposes a methodcollectStarTreeEntry()
which we run through all theStarTreeBucketCollector
corresponding to all the aggregators. Also, it is used to initialize the relevant metric iterators which we require for each aggregators to pre-compute that particular aggregation.collectStarTreeEntry()
for allStarTreeBucketCollector
to collect up all buckets.getLeafCollector()
is invoked. For parent aggregator,getLeafCollector()
pre-computes the aggregation using above mentioned steps and then returns an early termination leaf collector.This is the draft set of changes which I am utilizing to discuss solution approach here.The primary challenge to resolve a date histogram with metric aggregation was to figure out how sub-aggregators will get resolve. When resolving a query by star-tree, we lose lthe need of ucene documents and don't utilizecollect()
calls which are used internally to delegate the collection of sub-aggregations to sub-collectors.To mitigate this challenge, I have introduced a wrapper class -StarTreeBucketCollector
to basically introduce acollectStarEntry(int starTreeEntry, long bucket)
method. This method is then overridden in metric aggregator methods and invoked from parent aggregators (here DateHistogramAggregator).The benefit of this strategy is that this is easily extensible by other bucket aggregators where metric aggregations will be nested. Also, other bucket related utilities are re-useable as it is, it saves the effort of having a separate set of utilities for star tree buckets as the old buckets are utilized here.Want to take early feedback on this approach.Note: Things are hard-coded for one example query shape right now
Related Issues
Resolves (#16552)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.