Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supports additional query timing types for profiling plugin query components #17146

Open
wants to merge 3 commits into
base: 2.x
Choose a base branch
from

Conversation

shatejas
Copy link

@shatejas shatejas commented Jan 27, 2025

Description

Adds enums related to knn to be able to profile ann query. Currently its difficult to debug latencies for knn, this will help increase visibility on knn query

KNN PR: opensearch-project/k-NN#2450

Related Issues

Resolves opensearch-project/k-NN#2286

Sample response

"profile": {
		"shards": [
			{
				"id": "[pJFQnSmhTb--c8eubY5bxg][target_index_faiss][0]",
				"inbound_network_time_in_millis": 0,
				"outbound_network_time_in_millis": 0,
				"searches": [
					{
						"query": [
							{
								"type": "NativeEngineKnnVectorQuery",
								"description": "NativeEngineKnnVectorQuery[]...KNNQuery[]",
								"time_in_nanos": 17356165,
								"breakdown": {
									"advance": 0,
									"advance_count": 0,
									"ann_search": 0,
									"ann_search_count": 0,
									"build_scorer": 519125,
									"build_scorer_count": 2,
									"compute_max_score": 0,
									"compute_max_score_count": 0,
									"create_weight": 16831833,
									"create_weight_count": 1,
									"exact_knn_search": 0,
									"exact_knn_search_count": 0,
									"match": 0,
									"match_count": 0,
									"next_doc": 3249,
									"next_doc_count": 4,
									"score": 1958,
									"score_count": 3,
									"set_min_competitive_score": 0,
									"set_min_competitive_score_count": 0,
									"shallow_advance": 0,
									"shallow_advance_count": 0
								},
								"children": [
									{
										"type": "KNNQuery",
										"description": "",
										"time_in_nanos": 1937584,
										"breakdown": {
											"advance": 0,
											"advance_count": 0,
											"ann_search": 1937584,
											"ann_search_count": 1,
											"build_scorer": 0,
											"build_scorer_count": 0,
											"compute_max_score": 0,
											"compute_max_score_count": 0,
											"create_weight": 0,
											"create_weight_count": 0,
											"exact_knn_search": 0,
											"exact_knn_search_count": 0,
											"match": 0,
											"match_count": 0,
											"next_doc": 0,
											"next_doc_count": 0,
											"score": 0,
											"score_count": 0,
											"set_min_competitive_score": 0,
											"set_min_competitive_score_count": 0,
											"shallow_advance": 0,
											"shallow_advance_count": 0
										}
									}
								]
							}
						],
						"rewrite_time": 12709,
						"collector": [
							{
								"name": "SimpleTopScoreDocCollector",
								"reason": "search_top_hits",
								"time_in_nanos": 256085
							}
						]
					}
				],
				"aggregations": []
			}
		]
	}

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ContextIndexSearcher

Signed-off-by: Tejas Shah <[email protected]>
Signed-off-by: Tejas Shah <[email protected]>
@shatejas shatejas marked this pull request as ready for review January 27, 2025 21:01
@shatejas shatejas changed the title Adds KNN specific enums for profiling, exposes profiler in Adds KNN specific enums for profiling, exposes profiler in ContextIndexSearcher Jan 27, 2025
Copy link
Contributor

❌ Gradle check result for 51f1446: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -48,7 +48,9 @@ public enum QueryTimingType {
SCORE,
SHALLOW_ADVANCE,
COMPUTE_MAX_SCORE,
SET_MIN_COMPETITIVE_SCORE;
SET_MIN_COMPETITIVE_SCORE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shatejas the core does not know anything about k-nn plugin (or any other plugin per se), this has to be part of the plugin related instrumentation. We may need to think how the profile phases could be extended / customized though, if required.

Copy link
Author

@shatejas shatejas Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta I understand, whats the recommendation in that case? this is one way I found to be able to have additional components in profile query

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shatejas we may never run into a need to have such an extensibility feature, so we may have to design in a plugin neural way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may never run into a need to have such an extensibility feature

Currently there is a need to have time_in_nanos for knn. KNN query is relatively complex, both ann and exact search as well as the filter query inside the knn are major components and there is no visibility on these making it extremely difficult to debug performance issues.

Currently I wasn't able to find a hook to have knn components in query breakdown without these changes.

so we may have to design in a plugin neural way.

can you elaborate whats involved here? if its major change in knn plugin it might have to be iterative and this change might work till then

Copy link
Contributor

@navneet1v navneet1v Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta I think agree on designing this in a plugin neutral way. @shatejas lets have extension points in core that can be used by Plugins to provide their QueryTimingTyes.

One idea I can think of here is QueryTimingType would be getting used to put in some string in the profile output. We can create another enum/class which collects all the TimingType from all the plugins and then put them at right place during serialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @shatejas

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shatejas I am not sure (just my opinion) this is the way solve the problem. The query timing types are context dependent: if we run k-nn query, they have to be there, otherwise - nothing should be in profile (otherwise we are polluting profiling summary with irrelevant details). The ability to add context dependent profiler (supported out of the box) solves it nicely I believe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta One thing I tried was to have types in query builder, for normal case it works fine. But the problem comes during complex queries example

{
  "profile": "true",
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "rating": {
              "gte": 8,
              "lte": 10
            }
          }
        },
        {
          "knn": {
            "location": {
              "vector": [
                4.2,
                4.6
              ],
              "k": 3
            }
          }
        }
      ]
    }
  }
}

In this scenario, it doesn't recognize knn specific strings and throw npe for profiling since the top level boolean query will have empty set.

Another option is we simply filter 0 values in the breakdown which will remove the verbosity

Overall, context might be helpful but it might not be a bad thing to always tell the user that a particular component 0 nanos

The ability to add context dependent profiler (supported out of the box) solves it nicely I believe.

I will need recommendation here. QueryBuilder and Plugin interfaces are exposed to plugins so, I tried those. Anything apart from these two that is exposed to plugin that I can leverage?

Copy link
Collaborator

@reta reta Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need recommendation here. QueryBuilder and Plugin interfaces are exposed to plugins so, I tried those. Anything apart from these two that is exposed to plugin that I can leverage?

I listed them here: #17146 (comment) . We do support injection of multiple profilers into SearchContext, all these APIs are open to the plugins.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryProfiler<T extends Enum> (or alternatively, introduce the intermediate abstraction like QueryProfilerBase<T extends Enum> and let QueryProfiler extend it).

The reason I did not go this route is because we want to add a few types in addition to QueryTimingType. Making it generic forces to have additional enums and have duplicate values of QueryTimingType since plugins like knn just want to piggy back on existing profiler

I am also not sure having a plugin specific profiler solves the context problem - providing custom profiler through querybuilder or plugin will require this profiler to set it in index searcher when the query is being executed.

SET_MIN_COMPETITIVE_SCORE;
SET_MIN_COMPETITIVE_SCORE,
EXACT_KNN_SEARCH,
ANN_SEARCH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried this so idk about the viability of the idea, but is it possible to change QueryTimingType to an interface

interface QueryTimingType {
    String getValue();
}

and then have core define a default

enum DefaultQueryTimingType implements QueryTimingType {
    CREATE_WEIGHT,
    BUILD_SCORER,
    NEXT_DOC,
    ADVANCE,
    MATCH,
    SCORE,
    SHALLOW_ADVANCE,
    COMPUTE_MAX_SCORE,
    SET_MIN_COMPETITIVE_SCORE

    @Override
    public String getValue() {
        return name();
    }

    @Override
    public String toString() {
        return name().toLowerCase(Locale.ROOT);
    }
}

There could be an extension point for plugins to also define QueryTimingTypes and then a mechanism to collect them across the plugins and create a registry.

@shatejas shatejas changed the title Adds KNN specific enums for profiling, exposes profiler in ContextIndexSearcher Supports additional query timing types for profiling plugin query components Jan 28, 2025
Copy link
Contributor

❌ Gradle check result for 568cfe2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator

reta commented Jan 28, 2025

On the unrelated note, @shatejas please target main branch

@shatejas
Copy link
Author

On the unrelated note, @shatejas please target main branch

@reta there are some issues with knn main branch which make it harder to test. Can open up a PR against main branch once the approach is finalized

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants