[RFC] Query Insights - Recommendation framework #12292

Open
ansjcy opened this issue Feb 12, 2024 · 1 comment
Labels: enhancement, Roadmap:Search, Search:Query Insights


ansjcy (Member) commented Feb 12, 2024

Is your feature request related to a problem? Please describe

We developed the Query Insights plugin as part of #11429 to provide query-level insights and surface potential issues to users. However, the complete workflow (elaborated below in [1]) should also give users actionable recommendations so that they can improve performance. Recommendations could be at the per-query level (optimizing how the query is constructed) or at the index/cache level (optimizing how the underlying data is stored).

[1] Based on community feedback, the complete workflow for an OpenSearch admin using the Query Insights framework to improve query performance should look like this:
  1. The user gets an overview of the queries that consume the most resources.
  2. The user drills down into specific queries to understand the details, including the query shape, the users who sent the queries, etc.
  3. The drill-down view also shows the potential impact of these queries and the actions the user can take to optimize them and improve overall search performance.

Describe the solution you'd like

There are two different types of recommendations:

  1. Query-specific recommendations
    These recommendations focus on optimizing individual queries, such as removing redundant filters or eliminating routing parameters to prevent shard overload. We can focus only on the top N heavy queries ([RFC] Real-time Insights into Top N Queries by Latency and Resource Usage #11186) and implement an async post processor that analyzes the top N queries and, where possible, attaches per-query recommendations to them.

  2. Underlying Index-Level Recommendations
    These recommendations aim to improve overall search performance by optimizing the underlying data. Ideally, they should take a form like "add a field / restructure the documents in the index to reduce CPU utilization by X percent". Examples include remodeling documents to avoid joins, or using copy_to to combine fields so that multi_match queries are not needed.
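To make the "async post processor" in item 1 more concrete, here is a minimal Java sketch. It assumes a simplified, hypothetical `TopQuery` record (not the plugin's actual record type) and models each analyzer as a plain `Function` that returns zero or more recommendation strings. The analysis runs on a separate executor via `CompletableFuture.supplyAsync`, so it stays off the search path, and it returns copies of the top N records with recommendations attached. The routing check is a toy rule for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical, simplified stand-in for a top N query record (not the plugin's real type).
// routing is null when the search request did not set a routing value.
record TopQuery(String id, String source, String routing, long latencyMillis, List<String> recommendations) {
    TopQuery withRecommendations(List<String> recs) {
        return new TopQuery(id, source, routing, latencyMillis, List.copyOf(recs));
    }
}

// Hypothetical async post processor: analyzes the current top N queries on a
// separate executor and returns copies of the records with recommendations attached.
final class TopQueriesPostProcessor {
    private final ExecutorService executor;
    private final List<Function<TopQuery, List<String>>> analyzers;

    TopQueriesPostProcessor(ExecutorService executor, List<Function<TopQuery, List<String>>> analyzers) {
        this.executor = executor;
        this.analyzers = List.copyOf(analyzers);
    }

    CompletableFuture<List<TopQuery>> process(List<TopQuery> topN) {
        return CompletableFuture.supplyAsync(() -> {
            List<TopQuery> annotated = new ArrayList<>();
            for (TopQuery query : topN) {
                List<String> recs = new ArrayList<>();
                for (Function<TopQuery, List<String>> analyzer : analyzers) {
                    recs.addAll(analyzer.apply(query)); // each analyzer adds zero or more recommendations
                }
                annotated.add(query.withRecommendations(recs));
            }
            return annotated;
        }, executor);
    }
}

public class PostProcessorDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Toy analyzer: flag queries that pin a routing value, which can concentrate load on one shard.
        Function<TopQuery, List<String>> routingCheck = q -> q.routing() != null
            ? List.of("Consider removing the routing parameter to avoid overloading a single shard")
            : List.of();
        TopQueriesPostProcessor processor = new TopQueriesPostProcessor(pool, List.of(routingCheck));
        List<TopQuery> topN = List.of(
            new TopQuery("q1", "{\"query\":{\"match_all\":{}}}", "user-1", 950, List.of()),
            new TopQuery("q2", "{\"query\":{\"term\":{\"status\":\"active\"}}}", null, 620, List.of()));
        List<TopQuery> annotated = processor.process(topN).join();
        annotated.forEach(q -> System.out.println(q.id() + " -> " + q.recommendations()));
        pool.shutdown();
    }
}
```

Returning annotated copies rather than mutating the records keeps the post processor free of shared mutable state, which matters because it runs concurrently with whatever produces the top N list.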

We should establish a standard recommendation infrastructure within the Query Insights plugin to support these use cases. It should support the complete workflow: adding customized rules, matching queries against those rules, and making recommendations.
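The add-rule / match / recommend workflow could look roughly like the following Java sketch. `QueryContext`, `CustomRule`, and `RecommendationService` are hypothetical names, and the substring test for `multi_match` is a deliberately naive matcher; a real implementation would inspect the parsed query DSL rather than raw JSON text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical input: the query source plus optional context attributes (index name, etc.).
record QueryContext(String querySource, Map<String, String> attributes) {}

// Hypothetical customized rule: a name, a match condition, and the recommendation to emit.
record CustomRule(String name, Predicate<QueryContext> condition, String recommendation) {}

final class RecommendationService {
    private final List<CustomRule> rules = new ArrayList<>();

    // Step 1: add a customized rule.
    void addRule(CustomRule rule) {
        rules.add(rule);
    }

    // Steps 2 and 3: match the query against every registered rule and
    // collect the recommendations of the rules that match.
    List<String> recommend(QueryContext ctx) {
        List<String> out = new ArrayList<>();
        for (CustomRule rule : rules) {
            if (rule.condition().test(ctx)) {
                out.add(rule.name() + ": " + rule.recommendation());
            }
        }
        return out;
    }
}

public class RuleWorkflowDemo {
    public static void main(String[] args) {
        RecommendationService service = new RecommendationService();
        // Naive substring matcher, for illustration only.
        service.addRule(new CustomRule(
            "multi-match",
            c -> c.querySource().contains("\"multi_match\""),
            "consider combining the searched fields with copy_to and querying the combined field"));
        QueryContext ctx = new QueryContext(
            "{\"query\":{\"multi_match\":{\"query\":\"opensearch\",\"fields\":[\"title\",\"body\"]}}}",
            Map.of("index", "docs"));
        System.out.println(service.recommend(ctx));
    }
}
```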

Related component

Search:Query Insights

Describe alternatives you've considered

Alternatively, we could use Performance Analyzer (PA) RCA to offer simple rule-based recommendations. The RCA agent would be responsible for reading and interpreting Query Insights data, integrating it with PA metrics, and generating recommendations. These recommendations could be written back to the cluster through the Query Insights plugin, making them accessible through the dashboard, or simply exposed through an API of the RCA agent. However, this approach would require us to add a dependency on PA/RCA.

Additional context

Query Insights framework: #11429

Any feedback would be appreciated!

ansjcy added the enhancement and untriaged labels and removed the untriaged label Feb 12, 2024
ansjcy removed the untriaged label Feb 12, 2024
ansjcy changed the title from [Query Insights] Recommendation framework to [RFC] Recommendation framework Feb 18, 2024
ansjcy changed the title from [RFC] Recommendation framework to [RFC] Query Insights - Recommendation framework Feb 18, 2024
andrross added the Roadmap:Search label May 29, 2024
github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
ansjcy (Member, Author) commented Jan 8, 2025

At a high level, the architecture should involve the following key components within the Query Insights Framework.

Recommendation Rules: Defines "Under what condition, Take what action, and What impact there will be." These rules include:

  • Matching rules (Under what condition): Define the conditions under which a recommendation applies. A condition can combine query DSL structures, cluster state, and index state.
  • Actions (Take what action): Specify the action to take once a rule is matched, such as rewriting the query, enabling certain features, or adjusting certain configurations.
  • Impact Vectors (What impact there will be): Qualify or quantify the potential impact of each action on key metrics such as latency and resource usage. Impact vectors can be static (rule-based) or dynamic (learned from historical data using trained models).
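As a rough illustration of the three parts of a rule, the Java sketch below models a matching condition over query shape, cluster state, and index state; an action; and an impact vector that is either static or updated from observations. All type names, setting keys, and numbers are hypothetical. The exponential moving average is only a simple stand-in for the trained models mentioned above, not a claim about how dynamic impact vectors should be learned.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

enum Metric { LATENCY, CPU, MEMORY }

enum ActionType { REWRITE_QUERY, ENABLE_FEATURE, ADJUST_SETTING }

// Hypothetical snapshot of what a matching rule may inspect.
record MatchInput(String queryShape, Map<String, String> clusterState, Map<String, String> indexState) {}

// Take what action.
record Action(ActionType type, String description) {}

// What impact there will be: expected relative change per metric (negative = improvement).
// Static vectors keep their initial estimates; dynamic ones move toward observed changes
// using an exponential moving average (a simple stand-in for a trained model).
final class ImpactVector {
    private final Map<Metric, Double> expected = new EnumMap<>(Metric.class);
    private final boolean dynamic;
    private final double alpha; // weight given to each new observation

    ImpactVector(Map<Metric, Double> initial, boolean dynamic, double alpha) {
        this.expected.putAll(initial);
        this.dynamic = dynamic;
        this.alpha = alpha;
    }

    double expected(Metric m) {
        return expected.getOrDefault(m, 0.0);
    }

    void observe(Metric m, double observedChange) {
        if (!dynamic) {
            return; // static (rule-based) vectors never change
        }
        expected.put(m, alpha * observedChange + (1 - alpha) * expected(m));
    }
}

// Under what condition + take what action + what impact there will be.
record RecommendationRule(String id, Predicate<MatchInput> condition, Action action, ImpactVector impact) {
    boolean matches(MatchInput input) {
        return condition.test(input);
    }
}

public class RuleAnatomyDemo {
    public static void main(String[] args) {
        // Setting keys and impact numbers are hypothetical placeholders, not measured values.
        RecommendationRule rule = new RecommendationRule(
            "enable-cache-for-repeated-aggs",
            in -> in.queryShape().contains("aggs")
                && "disabled".equals(in.clusterState().get("tiered_caching")),
            new Action(ActionType.ENABLE_FEATURE, "Enable tiered caching"),
            new ImpactVector(Map.of(Metric.LATENCY, -0.10), true, 0.5));
        MatchInput input = new MatchInput("aggs:terms", Map.of("tiered_caching", "disabled"), Map.of());
        System.out.println(rule.matches(input) + " -> " + rule.action().description());
        // Feed back an observed change; the dynamic estimate moves toward it.
        rule.impact().observe(Metric.LATENCY, -0.30);
        System.out.println(rule.impact().expected(Metric.LATENCY));
    }
}
```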

Recommendation Engine: the core of the proposed system, which comprises:

  • Rule Registerer: Registers and manages recommendation rules in the recommendation engine.
  • Rule Matcher: Matches queries against predefined rules based on Query Sources and Cluster States.
    • Query Sources are the inputs we analyze to make recommendations. They include top queries from Query Insights and specific queries provided by users (from the API or the profiling page), and they serve as the input to the recommendation process.
    • Cluster States cover all relevant cluster and index metadata, such as index state, instance types, and feature settings (e.g. workload management and tiered caching settings). They are also inputs to the recommendation process, ensuring that recommendations are context-aware.
  • Action Generator: Generates actionable recommendations based on matched rules.
  • Decider: Quantifies the expected impact of each recommendation using Impact Vectors and decides the best action (or combination of actions) to present to users.
  • Feedback Tracker: Tracks the real-world impact of applied recommendations and refines rules and Impact Vectors based on collected feedback.
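A minimal sketch of how these engine components could fit together in Java follows. It assumes hypothetical `Rule`, `QuerySource`, and `ClusterSnapshot` types, collapses the impact vector into a single "expected benefit" score, and has the Decider pick only the single best action (combinations of actions are not modeled). The Feedback Tracker moves each rule's estimate toward the observed benefit by a fixed learning rate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiPredicate;

// Hypothetical inputs: a query from some source (top N or user-provided) and cluster state.
record QuerySource(String origin, String querySource) {}
record ClusterSnapshot(Map<String, String> settings) {}

// Hypothetical rule: condition over (query, cluster state), an action, and a scalar
// expected benefit (higher is better) standing in for a full impact vector.
record Rule(String id, BiPredicate<QuerySource, ClusterSnapshot> condition, String action, double expectedBenefit) {}

record Recommendation(String ruleId, String action, double expectedBenefit) {}

final class RecommendationEngine {
    private final Map<String, Rule> rules = new HashMap<>();
    private final Map<String, Double> benefit = new HashMap<>(); // current per-rule estimate
    private final double learningRate;

    RecommendationEngine(double learningRate) {
        this.learningRate = learningRate;
    }

    // Rule Registerer: register (or replace) a rule by id.
    void register(Rule rule) {
        rules.put(rule.id(), rule);
        benefit.put(rule.id(), rule.expectedBenefit());
    }

    // Rule Matcher + Action Generator: every matching rule becomes a candidate.
    List<Recommendation> candidates(QuerySource query, ClusterSnapshot cluster) {
        List<Recommendation> out = new ArrayList<>();
        for (Rule r : rules.values()) {
            if (r.condition().test(query, cluster)) {
                out.add(new Recommendation(r.id(), r.action(), benefit.get(r.id())));
            }
        }
        return out;
    }

    // Decider: choose the candidate with the highest current expected benefit.
    Optional<Recommendation> decide(QuerySource query, ClusterSnapshot cluster) {
        return candidates(query, cluster).stream()
            .max(Comparator.comparingDouble(Recommendation::expectedBenefit));
    }

    // Feedback Tracker: move the estimate toward the benefit observed after applying it.
    void recordFeedback(String ruleId, double observedBenefit) {
        benefit.computeIfPresent(ruleId, (id, old) -> old + learningRate * (observedBenefit - old));
    }

    double currentBenefit(String ruleId) {
        return benefit.getOrDefault(ruleId, 0.0);
    }
}

public class EngineDemo {
    public static void main(String[] args) {
        RecommendationEngine engine = new RecommendationEngine(0.5);
        // Rule ids, setting keys, and benefit numbers are illustrative placeholders.
        engine.register(new Rule("use-copy-to",
            (q, c) -> q.querySource().contains("multi_match"), "Combine fields with copy_to", 0.3));
        engine.register(new Rule("enable-tiered-cache",
            (q, c) -> "false".equals(c.settings().get("tiered_caching")), "Enable tiered caching", 0.2));
        QuerySource query = new QuerySource("top_n", "{\"query\":{\"multi_match\":{}}}");
        ClusterSnapshot cluster = new ClusterSnapshot(Map.of("tiered_caching", "false"));
        System.out.println(engine.decide(query, cluster).map(Recommendation::ruleId).orElse("none"));
        engine.recordFeedback("use-copy-to", 0.0); // applied, but no observed benefit
        System.out.println(engine.decide(query, cluster).map(Recommendation::ruleId).orElse("none"));
    }
}
```

In the demo, zero observed benefit for `use-copy-to` lowers its estimate from 0.3 to 0.15, below the caching rule's 0.2, so the Decider switches its choice on the second call.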

Customer Experiences: all customer touch points for presenting or executing recommendations.

  • Display Layer: Exposes recommendations and actions. It includes:
    • APIs: For programmatic access to recommendations.
    • Top Queries Dashboards: For visual presentation of recommendations for top n queries on the dashboards.
    • Profiling Page: For customers to get recommendations that improve the performance of specific queries.
    • Notifications: To notify users about frequent, high-confidence recommendations.
    • OpenSearch Assistants: To integrate the recommendation data into the LLM chatbot to provide interactive guidance in understanding and applying recommendations.
  • Action Layer: Executes recommendations automatically or with one click. Actions can include query rewrites, index updates, or enabling specific OpenSearch features, and the layer reports the results back to the Feedback Tracker.
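The Action Layer's execute-then-report loop might be sketched as below, assuming hypothetical `ExecutableAction` and `FeedbackTracker` interfaces and a `DoubleSupplier` that reads a latency metric. Measuring latency immediately before and after `apply()` is a simplification; in practice, the impact of a query rewrite or index change would need to be observed over a window of real traffic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

// Hypothetical feedback sink; in the proposal this role belongs to the Feedback Tracker.
interface FeedbackTracker {
    void report(String recommendationId, double latencyBeforeMs, double latencyAfterMs);
}

// Hypothetical executable action, e.g. a query rewrite or a settings update.
interface ExecutableAction {
    String recommendationId();
    void apply();
}

enum ExecutionMode { AUTOMATIC, ONE_CLICK }

final class ActionLayer {
    private final FeedbackTracker tracker;
    private final DoubleSupplier latencyProbe; // hypothetical: current latency (ms) of the affected queries

    ActionLayer(FeedbackTracker tracker, DoubleSupplier latencyProbe) {
        this.tracker = tracker;
        this.latencyProbe = latencyProbe;
    }

    // Applies the action in AUTOMATIC mode, or in ONE_CLICK mode once the user has confirmed,
    // then reports before/after latency to the feedback tracker. Returns whether it ran.
    boolean execute(ExecutableAction action, ExecutionMode mode, boolean userConfirmed) {
        if (mode == ExecutionMode.ONE_CLICK && !userConfirmed) {
            return false; // waiting for the user's click; nothing is changed
        }
        double before = latencyProbe.getAsDouble();
        action.apply();
        double after = latencyProbe.getAsDouble();
        tracker.report(action.recommendationId(), before, after);
        return true;
    }
}

public class ActionLayerDemo {
    public static void main(String[] args) {
        double[] latencyMs = {120.0}; // simulated metric
        List<String> reports = new ArrayList<>();
        FeedbackTracker tracker = (id, before, after) ->
            reports.add(id + ": " + before + "ms -> " + after + "ms");
        ActionLayer layer = new ActionLayer(tracker, () -> latencyMs[0]);
        ExecutableAction rewrite = new ExecutableAction() {
            public String recommendationId() { return "rewrite-q1"; }
            public void apply() { latencyMs[0] = 80.0; } // simulate the rewrite taking effect
        };
        System.out.println(layer.execute(rewrite, ExecutionMode.ONE_CLICK, false)); // user has not clicked yet
        System.out.println(layer.execute(rewrite, ExecutionMode.ONE_CLICK, true));
        System.out.println(reports);
    }
}
```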

The interactions among these components and the overall workflow are shown in the architecture diagram below.

[Image: architecture diagram of the recommendation framework]
