The search.max_buckets setting (ref) is used to control the maximum number of aggregation buckets allowed in a single search response.
For terms aggregations, the bucket count is calculated by counting sub-aggregation buckets first and then subtracting them again if their parent bucket is pruned from the candidate list. As a result, the running count can exceed the number of buckets that actually end up in the response; see the reproduction section below for an example.
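To make the counting order concrete, here is a toy Python model of the behavior described above. It is only a sketch under stated assumptions, not OpenSearch code: `BucketCounter`, `reduce_terms`, and the candidate data are hypothetical names and numbers made up for illustration, and the real implementation may differ in detail.

```python
import heapq


class TooManyBuckets(Exception):
    """Raised when the running bucket count exceeds the limit."""


class BucketCounter:
    """Hypothetical stand-in for a max_buckets check with a running count."""

    def __init__(self, max_buckets):
        self.max_buckets = max_buckets
        self.count = 0  # running count, checked against the limit
        self.peak = 0   # highest value the running count ever reached

    def consume(self, delta):
        self.count += delta
        self.peak = max(self.peak, self.count)
        if self.count > self.max_buckets:
            raise TooManyBuckets(
                f"running count {self.count} exceeds limit {self.max_buckets}"
            )


def reduce_terms(candidates, size, counter):
    """Keep the top `size` parent buckets by doc count.

    Each candidate is a (doc_count, term, num_sub_buckets) tuple.
    Sub-aggregation buckets are counted before the parent is known to
    survive, and subtracted again only if the parent is pruned.
    """
    heap = []  # min-heap, so the smallest doc_count is pruned first
    for candidate in candidates:
        counter.consume(candidate[2])  # sub-buckets are counted first
        if len(heap) < size:
            heapq.heappush(heap, candidate)
            counter.consume(1)  # the parent bucket itself
        else:
            removed = heapq.heappushpop(heap, candidate)
            counter.consume(-removed[2])  # pruned parent: undo its sub-buckets
    return sorted(heap, reverse=True)


# Three candidate terms with one sub-bucket each; only the top term is kept.
candidates = [(10, "a", 1), (7, "b", 1), (3, "c", 1)]

counter = BucketCounter(max_buckets=100)
top = reduce_terms(candidates, size=1, counter=counter)
print(top, counter.count, counter.peak)  # final count is 2, peak is 3

# With the limit set to the final response size, the request is still rejected.
try:
    reduce_terms(candidates, size=1, counter=BucketCounter(max_buckets=2))
except TooManyBuckets as exc:
    print("rejected:", exc)
```

In this model the final response holds 2 buckets (one parent, one sub-bucket), but the running count briefly reaches 3, so a limit of 2 rejects a request whose response would have fit.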
More broadly speaking, I'm not sure if this search.max_buckets setting is actually useful. I think the setting can have 2 uses:
1. Limit the response size of a given search request -- This isn't quite working correctly, as shown by this issue.
2. Stop bad aggregations from taking up too many resources -- Most aggregation types do not enforce the max_buckets setting at the shard level. It is only evaluated during reduce at the coordinator level, which happens after many of the resource-intensive portions of the search request have already completed.
search.max_buckets could be treated more as a circuit-breaker construct that keeps any bad aggregation query from taking up too many resources, especially memory. I have seen this work in favor of JVM heap utilization on clusters, preventing a rogue query from taking the whole node down.
Coverage and accuracy are definitely an issue, as pointed out by @jed326, especially in the case of pruning, and should be addressed first.
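As a rough illustration of why it matters where the limit is checked, the toy Python sketch below compares a check that runs only at reduce time with one that runs while buckets are being created. Everything here is an assumption for illustration: `collect_on_shard`, `reduce_on_coordinator`, `run`, and the "allocated" counter are hypothetical and do not correspond to OpenSearch classes or settings.

```python
class TooManyBuckets(Exception):
    """Raised when a bucket limit is exceeded."""


def collect_on_shard(terms, stats, max_buckets=None):
    """Build one bucket per distinct term on a single simulated shard.

    If max_buckets is given, the limit is enforced as buckets are created,
    so the shard stops early. stats["allocated"] is a crude proxy for the
    memory and work spent building buckets.
    """
    buckets = {}
    for term in terms:
        if term not in buckets:
            if max_buckets is not None and len(buckets) >= max_buckets:
                raise TooManyBuckets(f"shard would exceed {max_buckets} buckets")
            buckets[term] = 0
            stats["allocated"] += 1
        buckets[term] += 1
    return buckets


def reduce_on_coordinator(shard_results, max_buckets):
    """Merge per-shard buckets and check the limit only after merging."""
    merged = {}
    for buckets in shard_results:
        for term, doc_count in buckets.items():
            merged[term] = merged.get(term, 0) + doc_count
    if len(merged) > max_buckets:
        raise TooManyBuckets(f"reduce produced {len(merged)} buckets")
    return merged


def run(shards, max_buckets, check_on_shard):
    """Return (rejected, buckets_allocated_before_the_outcome)."""
    stats = {"allocated": 0}
    shard_limit = max_buckets if check_on_shard else None
    try:
        results = [collect_on_shard(terms, stats, shard_limit) for terms in shards]
        reduce_on_coordinator(results, max_buckets)
        rejected = False
    except TooManyBuckets:
        rejected = True
    return rejected, stats["allocated"]


# Three simulated shards, each seeing 1000 distinct terms.
shards = [[f"term-{i}" for i in range(1000)] for _ in range(3)]

print(run(shards, 10, check_on_shard=False))  # (True, 3000)
print(run(shards, 10, check_on_shard=True))   # (True, 10)
```

Both runs reject the request, but in this model the reduce-only check has already built 3000 buckets by the time it fails, while the collection-time check stops after 10. That is the sense in which a limit evaluated only during reduce does little to protect heap.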
jed326 changed the title from "[BUG] search.max_buckets is not evaluated correctly (and probably is not that useful)" to "[BUG] search.max_buckets is not evaluated correctly for terms agg" on May 7, 2024.
Renaming this issue to focus on the bug specific to terms aggregations. I think there is still more we can do beyond that to make the search.max_buckets setting consistent across aggregation types, but that can be a follow-up.
Related component
Search:Resiliency
To Reproduce
Expected behavior
The following was done with the noaa opensearch-benchmarks workload, but it's not specific to that data.
Set cluster setting:
This search request does not hit the max buckets limit
Neither does this one
However, this one does:
In all 3 of these cases the response size on the coordinator is only 2 buckets.