Variable Width Histogram doesn't work properly with Top Hits Histogram Aggregation #16708

Open
vijay267 opened this issue Nov 22, 2024 · 1 comment
Labels
bug Something isn't working Search:Aggregations

Comments

@vijay267

vijay267 commented Nov 22, 2024

Describe the bug

When I run a variable width histogram aggregation (on score) with a nested top hits sub-aggregation, top hits returns the wrong documents in each bucket. I'm using OpenSearch 1.3.19.

For example, if the variable width histogram produces buckets covering scores 0-10, 15-100 and 130-150, the top hits sub-aggregation will often show documents with a score of, say, 140 in the 0-10 bucket. I don't see this problem with either the range aggregation or the regular histogram aggregation, so this looks like a bug.

Related component

Search:Aggregations

To Reproduce

  1. Set up an OpenSearch 1.3.19 cluster (for example with Docker)

  2. Add an index with these settings & mappings
    EntityIndexMappings.txt
    EntityIndexSettings.txt

  3. Add these documents to the newly created index

doc-111790-en.txt
doc-5829842-en.txt
doc-5829843-en.txt
doc-5878933-en.txt
doc-5884592-en.txt
doc-8221094-en.txt

  4. Make a search request with a variable width histogram and a top hits sub-aggregation. You'll get a response like the one attached below (the scores might differ a bit, since my actual index has thousands of documents).
    As you can see, top hits puts documents with the wrong score into certain buckets. For example, the bucket for scores 1108.385 - 1108.385 contains a document (id = 5884592-en) with score 4.5.

VariableWidthHistogramRequest.txt
VariableWidthHistogramResponse.txt
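
For readers without access to the attachments, here is a rough sketch of the shape of such a request. It is not the attached request. The aggregation names `score_buckets` and `top_docs`, the `match` query on a `title` field, and bucketing on `_score` through a `script` are all assumptions made for illustration, and the attached file may well differ.

```python
import json


def build_request(buckets=3, top_hits_size=3):
    """Build a search body: variable width histogram with a nested top_hits."""
    return {
        "size": 0,
        # Hypothetical query; any query that produces varying scores would do.
        "query": {"match": {"title": "example"}},
        "aggs": {
            # "score_buckets" is a made-up aggregation name.
            "score_buckets": {
                "variable_width_histogram": {
                    # Assumption: the relevance score is bucketed via a script.
                    "script": {"source": "_score"},
                    "buckets": buckets,
                },
                "aggs": {
                    # "top_docs" is a made-up name for the sub-aggregation.
                    "top_docs": {"top_hits": {"size": top_hits_size}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request(), indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index from step 2.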

Let me know if you need any more information.

Expected behavior

When using a variable width histogram with a top hits sub-aggregation, the documents returned by top hits in each bucket should be the ones that actually belong to that bucket.
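
The expected behavior can be written as a small check over a search response: every top hit's `_score` should lie within the `min` and `max` of the bucket that contains it. This is only a sketch. The aggregation names `score_buckets` and `top_docs` are hypothetical, and the sample response is hand-written from the numbers quoted in this issue (plus one made-up in-range hit, `example-doc`), not copied from the attached response.

```python
def find_misplaced_hits(response, agg_name="score_buckets",
                        hits_name="top_docs", tolerance=1e-6):
    """Return (bucket_min, bucket_max, doc_id, score) for hits outside their bucket."""
    misplaced = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        lo, hi = bucket["min"], bucket["max"]
        for hit in bucket[hits_name]["hits"]["hits"]:
            score = hit["_score"]
            # A hit belongs to the bucket only if its score is within [min, max].
            if not (lo - tolerance <= score <= hi + tolerance):
                misplaced.append((lo, hi, hit["_id"], score))
    return misplaced


# Hand-written sample mirroring the symptom described in this issue.
sample = {
    "aggregations": {
        "score_buckets": {
            "buckets": [
                {
                    "min": 1108.385, "key": 1108.385, "max": 1108.385,
                    "doc_count": 1,
                    "top_docs": {"hits": {"hits": [
                        {"_id": "example-doc", "_score": 1108.385},  # made up
                        {"_id": "5884592-en", "_score": 4.5},
                    ]}},
                }
            ]
        }
    }
}

if __name__ == "__main__":
    for lo, hi, doc_id, score in find_misplaced_hits(sample):
        print(f"doc {doc_id} (score {score}) is outside bucket {lo} - {hi}")
```

A response without the bug would make `find_misplaced_hits` return an empty list.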

Additional Details

Plugins
analysis-kuromoji
analysis-nori
analysis-smartcn
analysis-icu
analysis-stempel
analysis-stconvert (https://get.infini.cloud/opensearch/analysis-stconvert/1.3.19)

Host/Environment (please complete the following information):

  • OS: Not sure. I'm running it in Docker to reproduce the issue.
  • Version 1.3.19

Additional context
This problem is also present on Open Distro for Elasticsearch.

@sandeshkr419
Contributor

[Search Triage] @getsaurabh02 - Can we assign this to someone to look into further?
