[BUG] search.max_buckets is not evaluated correctly for terms agg #13314

Open
jed326 opened this issue Apr 19, 2024 · 3 comments

@jed326
Collaborator

jed326 commented Apr 19, 2024

Describe the bug

The search.max_buckets setting (ref) is used to control the maximum number of aggregation buckets allowed in a single search response.

For terms aggregations, the bucket count is calculated by counting sub-aggregation buckets first; if their parent bucket is later pruned from the candidate list, the sub-aggregation bucket count is subtracted again. As a result, the count that is checked against the limit does not accurately reflect the number of buckets in the response. See the reproduction section below for an example.

More broadly speaking, I'm not sure if this search.max_buckets setting is actually useful. I think the setting can have 2 uses:

  1. Limit the response size of a given search request -- This isn't quite working correctly as shown by this issue
  2. Stop bad aggregations from taking up too many resources -- Most aggregation types do not enforce this max_buckets setting at the shard level. It is only evaluated during reduce at the coordinator level, which happens after many of the resource-intensive portions of the search request have already completed.

Somewhat related:

Related component

Search:Resiliency

To Reproduce

The following was done with the noaa opensearch-benchmarks workload but it's not specific to that data.

Set cluster setting:

{
    "persistent": {
        "search.max_buckets": 2
    }
}
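
For reference, the body above is what gets sent to the cluster settings endpoint (`PUT /_cluster/settings`). A minimal sketch of building that request follows; the host `http://localhost:9200`, the lack of authentication, and the helper name `build_max_buckets_request` are assumptions for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; adjust host and auth for a real one.
HOST = "http://localhost:9200"


def build_max_buckets_request(limit, host=HOST):
    """Build (but do not send) a PUT /_cluster/settings request."""
    body = {"persistent": {"search.max_buckets": limit}}
    return urllib.request.Request(
        url=f"{host}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_max_buckets_request(2)
# To actually apply the setting: urllib.request.urlopen(req)
```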

This search request does not hit the max buckets limit:

{
    "size": 0,
    "aggs": {
        "station": {
            "terms": {
                "field": "station.id",
                "size": 1,
                "shard_size": 1
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "shard_size": 1
                    }
                }
            }
        }
    }
}

Neither does this one:

{
    "size": 0,
    "aggs": {
        "station": {
            "terms": {
                "field": "station.id",
                "size": 1,
                "shard_size": 1
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "shard_size": 2
                    }
                }
            }
        }
    }
}

However, this one does:

{
    "size": 0,
    "aggs": {
        "station": {
            "terms": {
                "field": "station.id",
                "size": 1,
                "shard_size": 2
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "shard_size": 1
                    }
                }
            }
        }
    }
}

In all 3 of these cases the response size on the coordinator is only 2 buckets.
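
To make the accounting concrete, here is a toy Python model of the behavior described above. It is a sketch under stated assumptions, not the actual OpenSearch reduce code: it assumes a single shard that returns exactly `shard_size` candidate buckets at each level, that the running count is checked against the limit on every increment, and that the pruned bucket is always the later candidate. The names `BucketCounter`, `reduce_terms`, and `run` are hypothetical.

```python
class TooManyBuckets(Exception):
    """Raised when the running bucket count exceeds the limit."""


class BucketCounter:
    """Running bucket count, checked against the limit on every change."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def consume(self, n):
        self.count += n
        if self.count > self.limit:
            raise TooManyBuckets(f"{self.count} > {self.limit}")


def reduce_terms(counter, size, candidates, child=None):
    """Reduce one terms level and return the buckets kept in this subtree.

    size:       the "size" of this terms agg
    candidates: candidate buckets reaching the coordinator
                (assumed equal to "shard_size")
    child:      optional (size, candidates) tuple for a nested terms agg
    """
    kept_inner = []  # sub-bucket count of each kept bucket
    for _ in range(candidates):
        inner = 0
        if child is not None:
            # Sub-aggregation buckets are counted first ...
            inner = reduce_terms(counter, child[0], child[1])
        if len(kept_inner) < size:
            counter.consume(1)  # the bucket itself is kept
            kept_inner.append(inner)
        else:
            # ... and subtracted only once the parent is pruned.
            counter.consume(-inner)
    return sum(1 + inner for inner in kept_inner)


def run(limit, parent, child):
    """Return ("ok", final bucket count) or ("rejected", count at failure)."""
    counter = BucketCounter(limit)
    try:
        return ("ok", reduce_terms(counter, parent[0], parent[1], child))
    except TooManyBuckets:
        return ("rejected", counter.count)


# (size, shard_size) pairs for the station and date aggs of the three requests
first = run(2, parent=(1, 1), child=(1, 1))
second = run(2, parent=(1, 1), child=(1, 2))
third = run(2, parent=(1, 2), child=(1, 1))
```

Under these assumptions, the first two requests finish with a count of 2, while the third reaches a transient count of 3 when the sub-aggregation bucket of the second station candidate is counted, before that candidate can be pruned. That matches the observed behavior even though the final response would also contain only 2 buckets.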

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7]
@jed326 Thanks for creating this issue; we look forward to a pull request that addresses it.

Note: it might be worthwhile to create an RFC to remove the setting entirely in v3.0.

@getsaurabh02
Member

search.max_buckets could be treated more as a circuit-breaker construct that stops a bad aggregation query from taking up too many resources, especially memory. I have seen it help JVM heap utilization on clusters by preventing a rogue query from taking the whole node down.

Coverage and accuracy are definitely an issue, as pointed out by @jed326, especially in the case of pruning, and should be addressed first.

@jed326 jed326 changed the title [BUG] search.max_buckets is not evaluated correctly (and probably is not that useful) [BUG] search.max_buckets is not evaluated correctly for terms agg May 7, 2024
@jed326
Collaborator Author

jed326 commented May 7, 2024

Renaming this issue to focus on the bug specific to terms aggregations. I think there is still some work beyond that we can do to make the search.max_buckets setting more consistent across aggregation types, but that can be a follow-up.
