Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable concurrent search for terminate_after path #10200

Merged
merged 1 commit into from
Oct 3, 2023

Conversation

jed326
Copy link
Collaborator

@jed326 jed326 commented Sep 23, 2023

Disable concurrent search when terminate_after parameter is used.

Originally I planned on having this PR include a terminate_after threshold settings for values below which concurrent search would be disabled. However, I ran into a problem in TotalHitCountCollector::reduce that makes it difficult to get the correct TotalHits.relation.

@Override
public ReduceableSearchResult reduce(Collection<TotalHitCountCollector> collectors) throws IOException {
return (QuerySearchResult result) -> {
final TotalHits.Relation relation = (terminatedAfter != null)
? TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
: TotalHits.Relation.EQUAL_TO;
int totalHits = collectors.stream().mapToInt(TotalHitCountCollector::getTotalHits).sum();
if (terminatedAfter != null && totalHits > terminatedAfter) {
totalHits = terminatedAfter;
}
final TotalHits totalHitCount = new TotalHits(totalHits, relation);
final TopDocs topDocs = (sort != null)
? new TopFieldDocs(totalHitCount, Lucene.EMPTY_SCORE_DOCS, sort.getSort())
: new TopDocs(totalHitCount, Lucene.EMPTY_SCORE_DOCS);
result.topDocs(new TopDocsAndMaxScore(topDocs, Float.NaN), null);
};
}

The problem is the TotalHitCountCollectorManager only has terminatedAfter value and the totalHits values but that is not enough to determine what the relation should be. Based on the limited amount of time I spent looking into this so far, this looks like a pretty heavy duty refactor needs to be done here so I think the whole "forceTermination + terminate_after threshold portion" can be taken as a future perf improvement instead.

Related Issues

Resolves #8371
Resolves #9946

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 23, 2023

Compatibility status:

Checks if related components are compatible with change e69448e

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@github-actions github-actions bot added backlog distributed framework enhancement Enhancement or improvement to existing feature or request labels Sep 23, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@jed326 jed326 force-pushed the force-terminate_after branch 2 times, most recently from 672e8aa to d2bf2f5 Compare September 23, 2023 00:35
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions github-actions bot added the flaky-test Random test failure that succeeds on second run label Sep 26, 2023
@jed326 jed326 force-pushed the force-terminate_after branch from d2bf2f5 to 1dd3eed Compare September 27, 2023 00:06
@jed326 jed326 changed the base branch from main to 1.x September 27, 2023 00:15
@jed326 jed326 changed the base branch from 1.x to main September 27, 2023 00:15
@jed326 jed326 marked this pull request as ready for review September 27, 2023 00:15
@jed326 jed326 force-pushed the force-terminate_after branch from ad70a60 to a868a53 Compare September 27, 2023 22:22
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@jed326 jed326 force-pushed the force-terminate_after branch from a868a53 to e147b67 Compare October 2, 2023 00:58
@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

Gradle Check (Jenkins) Run Completed with:

@jed326 jed326 marked this pull request as ready for review October 2, 2023 01:40
@jed326
Copy link
Collaborator Author

jed326 commented Oct 2, 2023

@reta @sohami could you help review this? I ended up just disabling concurrent search for terminate_after completely in this PR since it looks like a bigger refactor than I originally thought to get force termination to work properly, which is a prerequisite to having a threshold setting. See PR description for more details.

@jed326 jed326 force-pushed the force-terminate_after branch from e147b67 to e69448e Compare October 2, 2023 16:37
@reta reta removed the flaky-test Random test failure that succeeds on second run label Oct 2, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      3 org.opensearch.index.shard.RemoteIndexShardTests.classMethod
      1 org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitRefresh

@sohami sohami added the backport 2.x Backport to 2.x branch label Oct 3, 2023
Copy link
Collaborator

@sohami sohami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, but lets update the original issue with all the open items related information so that it is available when we pick up next time.

@sohami sohami merged commit 3a790c1 into opensearch-project:main Oct 3, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 3, 2023
Signed-off-by: Jay Deng <[email protected]>
(cherry picked from commit 3a790c1)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@jed326 jed326 changed the title Support terminate_after force termination for concurrent segment search Disable concurrent search for terminate_after path Oct 3, 2023
@github-actions github-actions bot added the flaky-test Random test failure that succeeds on second run label Oct 3, 2023
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Oct 3, 2023
reta pushed a commit that referenced this pull request Oct 3, 2023
(cherry picked from commit 3a790c1)

Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 9, 2023
vikasvb90 pushed a commit to vikasvb90/OpenSearch that referenced this pull request Oct 10, 2023
@@ -896,6 +896,8 @@ public void evaluateRequestShouldUseConcurrentSearch() {
&& aggregations().factories() != null
&& !aggregations().factories().allFactoriesSupportConcurrentSearch()) {
requestShouldUseConcurrentSearch.set(false);
} else if (terminateAfter != DEFAULT_TERMINATE_AFTER) {
requestShouldUseConcurrentSearch.set(false);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we disable concurrent search here, while there is a timeout in effect, this can have the effect of causing a significant reduction in total documents scanned across all nodes. That means you would get different results.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we disable concurrent search here, while there is a timeout in effect,

I am not sure I understand the issue, but this is how aggregations work right now (they are sequential, not concurrent). The concurrent path have some limitation related to timeouts though, that is correct

Copy link
Collaborator Author

@jed326 jed326 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure I understand the issue, specifically what "That means you would get different results" is referring to. What is happening here is that we are reverting back to the non-concurrent search workflow whenever the terminate_after search parameter is used.

Here is a (very long) issue with more details on the problems surrounding terminate_after, force termination, and concurrent search: #8371

shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
@jed326 jed326 deleted the force-terminate_after branch August 7, 2024 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog backport 2.x Backport to 2.x branch distributed framework enhancement Enhancement or improvement to existing feature or request flaky-test Random test failure that succeeds on second run
Projects
None yet
4 participants