Variable Width Histogram doesn't work properly with Top Hits Histogram Aggregation #16708

vijay267 · 2024-11-22T20:15:00Z

Describe the bug

When I execute a variable width histogram (on score) with a nested top hits subaggregation, top hits shows the WRONG documents inside each of the buckets. I'm using OpenSearch 1.3.19.

For example, if I have a variable width histogram and the bucket score ranges end up as 0-10, 15-100, and 130-150, the top hits subaggregation will often show documents with a score of, say, 140 in the 0-10 bucket. Since I don't have this problem with either the range aggregation or the normal histogram, this seems like a bug.

Related component

Search:Aggregations

To Reproduce

Set up an OpenSearch 1.3.19 cluster (for example with Docker)
Add an index with these settings & mappings
EntityIndexMappings.txt
EntityIndexSettings.txt
Add these documents to the newly created index

doc-111790-en.txt
doc-5829842-en.txt
doc-5829843-en.txt
doc-5878933-en.txt
doc-5884592-en.txt
doc-8221094-en.txt

Make a search request with a variable width histogram and a top hits subaggregation and you'll get a response like the one attached below (the scores might differ a bit, since my actual index has thousands of documents).
As you can see, top hits puts documents with the wrong score into certain buckets. For example, the bucket for scores 1108.385 - 1108.385 contains a document (id = 5884592-en) with score 4.5.

VariableWidthHistogramRequest.txt
VariableWidthHistogramResponse.txt

Let me know if you need any more information.

Expected behavior

When a variable width histogram has a top hits subaggregation, the documents returned by top hits in each bucket should be ones that actually belong to that bucket of the variable width histogram.
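The expected behavior can be checked mechanically against a response. The following is a minimal sketch, not the reporter's actual tooling: it assumes each histogram bucket carries `min`, `key`, and `max`, that the top hits subaggregation is named `top_docs` (a hypothetical name), and that each hit's `_score` is the value the histogram was built on. The sample data is synthetic, apart from the second bucket, which mirrors the mismatch described above (bucket 1108.385 containing document 5884592-en with score 4.5).

```python
from typing import Any


def misplaced_hits(
    buckets: list[dict[str, Any]],
    top_hits_name: str = "top_docs",  # hypothetical subaggregation name
) -> list[tuple[float, str, float]]:
    """Return (bucket_key, doc_id, score) for each hit whose score is outside its bucket."""
    problems = []
    for bucket in buckets:
        low, high = bucket["min"], bucket["max"]
        for hit in bucket[top_hits_name]["hits"]["hits"]:
            score = hit["_score"]
            # A hit belongs to the bucket only if its score is within [min, max].
            if not (low <= score <= high):
                problems.append((bucket["key"], hit["_id"], score))
    return problems


# Synthetic buckets shaped like a variable width histogram response.
sample = [
    {"min": 4.5, "key": 4.5, "max": 4.5, "doc_count": 1,
     "top_docs": {"hits": {"hits": [{"_id": "a", "_score": 4.5}]}}},
    {"min": 1108.385, "key": 1108.385, "max": 1108.385, "doc_count": 1,
     "top_docs": {"hits": {"hits": [{"_id": "5884592-en", "_score": 4.5}]}}},
]

print(misplaced_hits(sample))
```

With a correct aggregation the returned list would be empty. For the sample above it flags the second bucket.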

Additional Details

Plugins
analysis-kuromoji
analysis-nori
analysis-smartcn
analysis-icu
analysis-stempel
analysis-stconvert (https://get.infini.cloud/opensearch/analysis-stconvert/1.3.19)

Host/Environment (please complete the following information):

OS: Not sure? I'm running it in docker to reproduce the issue.
Version 1.3.19

Additional context
This problem is also present in Open Distro for Elasticsearch.

sandeshkr419 · 2024-11-27T17:33:14Z

[Search Triage] @getsaurabh02 - Can we assign this to someone to look into further?

vijay267 added the bug and untriaged labels Nov 22, 2024

github-actions bot added the Search:Aggregations label Nov 22, 2024

github-project-automation bot added this to Search Project Board Nov 22, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Nov 22, 2024

sandeshkr419 removed the untriaged label Nov 27, 2024
