Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a flaky test that relies on terminate_after for an exact count match #12178

Closed
wants to merge 36 commits into from

Conversation

austintlee
Copy link
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #10435

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

austintlee and others added 30 commits September 11, 2023 09:51
Signed-off-by: Austin Lee <[email protected]>

Update changelog.

Signed-off-by: Austin Lee <[email protected]>
Signed-off-by: Austin Lee <[email protected]>
…arch-project#10402)

* Fix stats reporting for multistream downloads.

Signed-off-by: Rishikesh1159 <[email protected]>

* rename tracker to fileTransferTracker.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
…project#10437)

* DiversifiedSamplerIT.testNestedSamples
* QueryProfilePhaseTests.testMaxScore
* QueryProfilePhaseTests.testCollapseQuerySearchResults
* HighlighterSearchIT.testHighlightQueryRewriteDatesWithNow
* FieldCapabilitiesIT.testWithIndexFilter
* QueryProfilePhaseTests.testDisableTopScoreCollection

Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: Jay Deng <[email protected]>
…oject#10519)

There are a few open issues with the multi-stream download approach:
 - Recovery stats are not being reported correctly
 - It is incompatible (short of reopening and re-reading the entire
   file) with the existing Lucene checksum validation logic
 - There are some issues with integrating it with the pending client
   side encryption work

Given this, I attempted an experiment where I replaced with
multi-stream-within-a-single-file approach with simply parallelizing
downloads across files (this is how snapshot restore works). I actually
got better results with this approach: recovering a ~52GiB shard took
about 4.7 minutes with the multi-stream code versus 3.9 minutes with the
parallel file approach (r7g.4xlarge EC2 instance, 500MiB/s EBS volume,
S3 as remote repository).

I think this is the right approach as it leverages the more
battle-tested code path and addresses the three issues listed above. The
multi-stream approach still has promise as it will allow us to download
very large files faster (whereas this approach they can be the long poll
on the transfer operation). However, given that 5GB segments (made up of
multiple files in practice) are the norm, we generally aren't dealing with
huge files.

Signed-off-by: Andrew Ross <[email protected]>
…ect#10503)

* revive remote cluster state auto restore integ tests

Signed-off-by: bansvaru <[email protected]>
… loss recovery (opensearch-project#10447)

* Remote Restore Cluster Metadata if local disk state lost after quorum loss

Signed-off-by: bansvaru <[email protected]>
…rvice for RateLimiting (opensearch-project#9286)

* Changes for AdmissionControl Interceptor and AdmissionControlService for RateLimiting (opensearch-project#9286)

Signed-off-by: Ajay Kumar Movva <[email protected]>
Signed-off-by: Austin Lee <[email protected]>
…arch-project#10402)

* Fix stats reporting for multistream downloads.

Signed-off-by: Rishikesh1159 <[email protected]>

* rename tracker to fileTransferTracker.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
…project#10437)

* DiversifiedSamplerIT.testNestedSamples
* QueryProfilePhaseTests.testMaxScore
* QueryProfilePhaseTests.testCollapseQuerySearchResults
* HighlighterSearchIT.testHighlightQueryRewriteDatesWithNow
* FieldCapabilitiesIT.testWithIndexFilter
* QueryProfilePhaseTests.testDisableTopScoreCollection

Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: Jay Deng <[email protected]>
…oject#10519)

There are a few open issues with the multi-stream download approach:
 - Recovery stats are not being reported correctly
 - It is incompatible (short of reopening and re-reading the entire
   file) with the existing Lucene checksum validation logic
 - There are some issues with integrating it with the pending client
   side encryption work

Given this, I attempted an experiment where I replaced with
multi-stream-within-a-single-file approach with simply parallelizing
downloads across files (this is how snapshot restore works). I actually
got better results with this approach: recovering a ~52GiB shard took
about 4.7 minutes with the multi-stream code versus 3.9 minutes with the
parallel file approach (r7g.4xlarge EC2 instance, 500MiB/s EBS volume,
S3 as remote repository).

I think this is the right approach as it leverages the more
battle-tested code path and addresses the three issues listed above. The
multi-stream approach still has promise as it will allow us to download
very large files faster (whereas this approach they can be the long poll
on the transfer operation). However, given that 5GB segments (made up of
multiple files in practice) are the norm, we generally aren't dealing with
huge files.

Signed-off-by: Andrew Ross <[email protected]>
…ect#10503)

* revive remote cluster state auto restore integ tests

Signed-off-by: bansvaru <[email protected]>
… loss recovery (opensearch-project#10447)

* Remote Restore Cluster Metadata if local disk state lost after quorum loss

Signed-off-by: bansvaru <[email protected]>
…rvice for RateLimiting (opensearch-project#9286)

* Changes for AdmissionControl Interceptor and AdmissionControlService for RateLimiting (opensearch-project#9286)

Signed-off-by: Ajay Kumar Movva <[email protected]>
Copy link
Contributor

github-actions bot commented Feb 6, 2024

Compatibility status:

Checks if related components are compatible with change 730d737

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/security.git]

Copy link
Contributor

github-actions bot commented Feb 6, 2024

✅ Gradle check result for 730d737: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search Search query, autocomplete ...etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Search requests with size=0 & terminate_after sometimes return incorrectly hit count
10 participants