Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Make rewrite-level max_clause_count check configurable at the index and/or query level #16723

Open
msfroh opened this issue Nov 26, 2024 · 0 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Resiliency

Comments

@msfroh
Copy link
Collaborator

msfroh commented Nov 26, 2024

Is your feature request related to a problem? Please describe

With the change in Lucene 9 to more aggressively count all query clauses, there are queries that worked on OpenSearch 1.x that start failing with TooManyNestedClauses on OpenSearch 2.x. Specifically, this check occurs in IndexSearcher#rewrite(Query).

In #13568, we made max_clause_count a dynamic cluster setting, which was at least better than the previous case where it was a static cluster setting (so you needed to specify it in your opensearch.yml file and restart your cluster to change it).

Changing max_clause_count has quite a wide-ranging impact. Some of these are redundant short-circuit checks (e.g. a single BooleanQuery can't have more than max_clause_count clauses, and will fail when you add the clause that exceeds the limit, though that same query should also fail the rewrite check). Sometimes it changes query behavior, though. For example, the percolate field type changes what kind of query it constructs depending on the max clause count.

It would be nice to loosen the constraint at query rewrite time without impacting a widely-used (global) parameter.

Describe the solution you'd like

I noticed that our ContextIndexSearcher subclass of Lucene's IndexSearcher already overrides the rewrite method in order to track rewrite time when profiling is enabled.

I'm thinking that we could copy/paste "inline" the Lucene IndexSearcher method and override the behavior of getNumClausesCheckVisitor() to use a non-global clause count parameter. Specifically, ContextIndexSearcher has access to the SearchSourceBuilder and IndexSettings for the current index. We could apply the rewrite-level check based on a clause count specified at the query or index level, falling back to the global value by default.

Other code that reads the global value can continue to do so.

Related component

Search:Resiliency

Describe alternatives you've considered

We could try proposing a change in Lucene to add an overload of the IndexSearcher#rewrite(Query) method that adds a max clause count parameter, where the existing method could delegate using the global value.

Additional context

No response

@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged labels Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Resiliency
Projects
Status: 🆕 New
Development

No branches or pull requests

2 participants