Enable parallelExecution for integration test suites #934

Merged
merged 6 commits into from
Nov 22, 2024

qianheng-aws
Contributor

@qianheng-aws qianheng-aws commented Nov 20, 2024

Description

Enable parallel execution of the integration test suites.

Based on the metrics collected:

total time cost: 1h09m, test suites: 125, test cases: 1674
Cost Range: 0 min - 1 min: 109 test suites, total cost 476 sec
Cost Range: 1 min - 2 min: 5 test suites, total cost 410 sec
Cost Range: 2 min - 3 min: 3 test suites, total cost 469 sec
Cost Range: 3 min - 4 min: 7 test suites, total cost 1482 sec
Cost Range: 6 min - 7 min: 1 test suite, total cost 407 sec
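As a quick sanity check on the figures above, the buckets can be tallied with a short script. This is only an illustrative sketch: the bucket labels and the `summarize` helper are names made up here, and the numbers are copied from the list above.

```python
# Bucket figures copied from the metrics above:
# (range label, number of suites, total cost in seconds).
buckets = [
    ("0-1 min", 109, 476),
    ("1-2 min", 5, 410),
    ("2-3 min", 3, 469),
    ("3-4 min", 7, 1482),
    ("6-7 min", 1, 407),
]


def summarize(rows):
    """Return (total suites, total seconds, mean seconds per suite by bucket)."""
    total_suites = sum(count for _, count, _ in rows)
    total_seconds = sum(seconds for _, _, seconds in rows)
    means = {label: seconds / count for label, count, seconds in rows}
    return total_suites, total_seconds, means


total_suites, total_seconds, means = summarize(buckets)
print(f"{total_suites} suites, {total_seconds} sec in total")
for label, mean in means.items():
    print(f"{label}: {mean:.1f} sec per suite on average")
```

The bucket counts add up to the 125 suites reported, and the per-bucket costs add up to 3244 sec (about 54 minutes). The metrics do not break down the rest of the 1h09m total.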

The time cost is fairly evenly distributed across suites. Most test suites cost less than 1 minute, and the maximum cost is no more than 7 minutes.

To reduce test execution time, we should increase parallelism, especially since we don't have any long-running test suites and all tests currently run sequentially.

TODO: Another way to reduce the average testing time per suite is to reuse the Docker container across suites. It costs around 10 secs to bootstrap a container for OpenSearch, so reuse would save about 10 minutes when the integration suites (65 currently) run in sequence.
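The estimate in the TODO can be reproduced with a few lines. This is a rough sketch, assuming one container still has to be started and every other suite would otherwise pay the full bootstrap cost. The function name and the `containers_kept` parameter are hypothetical.

```python
def bootstrap_savings_seconds(num_suites, bootstrap_seconds, containers_kept=1):
    """Seconds saved when only `containers_kept` containers are bootstrapped
    instead of one container per suite."""
    skipped_bootstraps = max(num_suites - containers_kept, 0)
    return skipped_bootstraps * bootstrap_seconds


# Figures from the TODO: 65 suites, about 10 seconds per container bootstrap.
saved = bootstrap_savings_seconds(num_suites=65, bootstrap_seconds=10)
print(f"{saved} sec saved, about {saved / 60:.1f} min")
```

With several groups running in parallel, each group would presumably need its own container, so `containers_kept` would grow with the number of groups and the wall-clock saving would shrink accordingly.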

There are 2 ways to increase parallelism:

Option1: Enable SBT's parallel execution on one node.
Pros: Easy to implement.
Cons: Increases pressure on the build node and may make the integ-test unstable if parallelism is too high. It will launch at most 4 (the number of CPU cores on the build node) Docker containers and JVMs. The gain has an upper bound set by the performance of the build node.

Option2: Add more nodes in CI and distribute tests equally across these nodes.
Pros: Can scale out to as many build nodes as we want.
Cons: Increases the complexity of the CI workflow, since tests must be distributed to different build nodes and their reports merged once all nodes have finished their tasks. It also increases our spending on CI resources, since we will use more build nodes.

These 2 options are compatible, and we can apply both of them if we want. Take Option1 as the first step, as it saves resources and won't increase the workflow's complexity.
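To illustrate what splitting suites into groups does to wall time, here is a minimal simulation. It is not the build code from this PR: the round-robin split, the helper names, and the suite names and durations are all assumptions made for the example.

```python
import random


def split_into_groups(suites, num_groups, shuffle_seed=None):
    """Deal suites round-robin into `num_groups` lists, optionally shuffling first."""
    ordered = list(suites)
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(ordered)
    return [ordered[i::num_groups] for i in range(num_groups)]


def makespan(groups, durations):
    """Wall time when groups run concurrently and suites inside a group run in sequence."""
    return max(sum(durations[name] for name in group) for group in groups)


# Hypothetical suite names and durations in seconds, for illustration only.
durations = {
    "SuiteA": 400, "SuiteB": 210, "SuiteC": 200, "SuiteD": 80,
    "SuiteE": 5, "SuiteF": 5, "SuiteG": 5, "SuiteH": 5,
}

groups = split_into_groups(sorted(durations), 4)
sequential = sum(durations.values())
parallel = makespan(groups, durations)
print(f"sequential: {sequential} sec, 4 groups: {parallel} sec")

shuffled = split_into_groups(sorted(durations), 4, shuffle_seed=0)
print(f"4 groups with shuffle: {makespan(shuffled, durations)} sec")
```

Whatever the split, the wall time cannot drop below the longest single suite, so how evenly the slow suites land in different groups matters more than the number of groups alone.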

Option1 test, recorded time cost of integ-test:
baseline -> 1h 3m 35s
4 groups -> 32m 17s
3 groups -> 37m 58s

With tests shuffled before splitting into groups:
4 groups with shuffle -> 32m 42s
3 groups with shuffle -> 38m 37s
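The recorded times above can be turned into speedup factors with a small script. This is a sketch for reading the numbers only. The `to_seconds` helper and the dictionary layout are made up for the example, and the durations are copied from the results above.

```python
import re


def to_seconds(text):
    """Parse a duration such as '1h 3m 35s' or '32m 17s' into seconds."""
    parts = {unit: int(value) for value, unit in re.findall(r"(\d+)\s*([hms])", text)}
    return parts.get("h", 0) * 3600 + parts.get("m", 0) * 60 + parts.get("s", 0)


baseline = to_seconds("1h 3m 35s")

# Durations copied from the recorded results above.
runs = {
    "4 groups": "32m 17s",
    "3 groups": "37m 58s",
    "4 groups with shuffle": "32m 42s",
    "3 groups with shuffle": "38m 37s",
}

speedups = {name: baseline / to_seconds(duration) for name, duration in runs.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster than baseline")
```

On these numbers, 4 groups roughly halve the wall time rather than quartering it, and the shuffled runs land within about half a minute of the unshuffled ones.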

Related Issues

Resolves #853

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@YANG-DB
Member

YANG-DB commented Nov 20, 2024

@qianheng-aws this is a great idea!
Is there any downside to this parallelism (option 1 vs option 2)?

@qianheng-aws
Contributor Author

qianheng-aws commented Nov 21, 2024

Here are the pros and cons of these 2 options. I have also added them to the description:
Option1: Enable SBT's parallel execution on one node.
Pros: Easy to implement.
Cons: Increases pressure on the build node and may make the integ-test unstable if parallelism is too high. It will launch at most 4 (the number of CPU cores) Docker containers and JVMs. The gain has an upper bound set by the performance of the build node.

Option2: Add more nodes in CI and distribute tests equally across these nodes.
Pros: Can scale out to as many build nodes as we want.
Cons: Increases the complexity of the CI workflow, since tests must be distributed to different build nodes and their reports merged once all nodes have finished their tasks. It also increases our spending on CI resources, since we will use more build nodes.

@qianheng-aws qianheng-aws marked this pull request as ready for review November 21, 2024 15:56
@qianheng-aws qianheng-aws changed the title IT test efficiency enhancement Split testingGroup and enable parallelExecution for integration test suites Nov 21, 2024
@qianheng-aws qianheng-aws changed the title Split testingGroup and enable parallelExecution for integration test suites Enable parallelExecution for integration test suites Nov 21, 2024
@LantaoJin LantaoJin merged commit 3ff2ef2 into opensearch-project:main Nov 22, 2024
4 checks passed
@LantaoJin LantaoJin added testing test related feature infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. labels Nov 22, 2024
kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
…ect#934)

* Split integration test to multiple groups and enable parallelExecution

Signed-off-by: Heng Qian <[email protected]>

* Fix spark-warehouse conflict

Signed-off-by: Heng Qian <[email protected]>

* Test with 3 groups

Signed-off-by: Heng Qian <[email protected]>

* Random shuffle tests before splitting groups

Signed-off-by: Heng Qian <[email protected]>

* reset group number to 4

Signed-off-by: Heng Qian <[email protected]>

* revert shuffle

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
Development

Successfully merging this pull request may close these issues.

[FEATURE]Reduce IT Test time
3 participants