Enable parallelExecution for integration test suites #934
@qianheng-aws this is a great idea!
Here are the pros and cons of the 2 options, which I have also added to the description: Option2: Add more nodes in CI and distribute tests equally across these nodes.
Signed-off-by: Heng Qian <[email protected]>
5c7896f to 0f0d883
…ect#934)
* Split integration test to multiple groups and enable parallelExecution
* Fix spark-warehouse conflict
* Test with 3 groups
* Random shuffle tests before splitting groups
* reset group number to 4
* revert shuffle
Signed-off-by: Heng Qian <[email protected]>
Description
Enable parallel execution of the integration test suites.
Based on the metrics collected:
The time cost is fairly evenly distributed across suites: most test suites take less than 1 min, and the longest takes no more than 7 mins.
To reduce test execution time, we should increase parallelism, especially since we don't have any long-running test suites and all tests currently run sequentially.
TODO: Another way to reduce the average testing time per suite is to reuse the Docker container across suites. Bootstrapping an OpenSearch container costs around 10 secs, so reuse would save about 10 minutes when the integration tests (65 suites currently) run in sequence.
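The estimate in the TODO can be checked with a quick calculation. This is only a sketch: the object and method names are made up for illustration, and the 10-second bootstrap cost is the approximate figure quoted above, not a measurement.

```scala
object ContainerReuseEstimate {
  // Figures quoted in the description: 65 suites, roughly 10 s per container bootstrap.
  val suiteCount = 65
  val bootstrapSeconds = 10

  // Total bootstrap time when every suite starts its own container.
  def totalBootstrapSeconds(suites: Int, perSuiteSeconds: Int): Int =
    suites * perSuiteSeconds

  // Time saved if a single container is reused: every bootstrap except the first.
  def savedSeconds(suites: Int, perSuiteSeconds: Int): Int =
    math.max(0, suites - 1) * perSuiteSeconds

  def main(args: Array[String]): Unit = {
    val total = totalBootstrapSeconds(suiteCount, bootstrapSeconds)
    val saved = savedSeconds(suiteCount, bootstrapSeconds)
    println(s"total bootstrap: ${total / 60}m ${total % 60}s")
    println(s"saved by reuse:  ${saved / 60}m ${saved % 60}s")
  }
}
```

With the quoted figures, the total bootstrap time is 650 s and reuse would save 640 s, which matches the "about 10 minutes" estimate for a sequential run.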
There are 2 ways to increase parallelism:
Option1: Enable SBT's parallel execution on one node.
Pros: Easy to implement
Cons: Increases pressure on the build node and may make integ-test unstable if parallelism is too high. It will launch at most 4 (the number of CPU cores on the build node) Docker containers and JVMs. The gain from this optimization is bounded by the performance of the build node.
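For illustration, Option1 could look roughly like the following `build.sbt` fragment. This is a minimal sketch assuming sbt 1.x, not the build definition from this PR: the `Test` scope, the `groupCount` name, and the group names are assumptions, and the real build may scope these settings to a different configuration. The last line reflects sbt's documented `Tags.ForkedTestGroup` limit on concurrently running forked groups; whether this PR sets it is not stated here.

```scala
// build.sbt (sketch, sbt 1.x). `groupCount` is an illustrative name.
val groupCount = 4

// Run tests in forked JVMs and let sbt schedule test tasks concurrently.
Test / fork := true
Test / parallelExecution := true

// Split the discovered suites into at most `groupCount` groups,
// each of which runs in its own forked JVM.
Test / testGrouping := {
  val suites = (Test / definedTests).value
  // Ceiling division, guarded so that grouped() never receives 0.
  val groupSize = math.max(1, (suites.size + groupCount - 1) / groupCount)
  suites.grouped(groupSize).zipWithIndex.map { case (group, index) =>
    new Tests.Group(s"group-${index + 1}", group, Tests.SubProcess(ForkOptions()))
  }.toList
}

// Allow up to `groupCount` forked test groups to run at the same time.
Global / concurrentRestrictions := Seq(Tags.limit(Tags.ForkedTestGroup, groupCount))
```

Keeping the group count at or below the number of CPU cores on the build node is what bounds the number of JVMs and Docker containers launched at once.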
Option2: Add more nodes in CI and distribute tests equally to these nodes.
Pros: Can scale out to as many build nodes as we want.
Cons: Increases the complexity of the CI workflow, since tests must be distributed to different build nodes and their reports merged once all nodes have finished. It will also increase our spending on CI resources, since we will use more build nodes.
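As a rough sketch of the distribution step in Option2, each node could compute its own share of the suites from its index. The `TestSharding` object and its `shard` method are hypothetical and not part of this PR or of any CI tooling; merging the reports afterwards is not covered.

```scala
object TestSharding {
  // Returns the suites that the node with 0-based index `nodeIndex` should run
  // when `nodeCount` nodes share the work. Sorting first makes the assignment
  // independent of discovery order, so every node computes the same partition.
  def shard(suites: Seq[String], nodeIndex: Int, nodeCount: Int): Seq[String] = {
    require(nodeCount > 0, "nodeCount must be positive")
    require(nodeIndex >= 0 && nodeIndex < nodeCount, "nodeIndex out of range")
    suites.sorted.zipWithIndex.collect {
      case (suite, i) if i % nodeCount == nodeIndex => suite
    }
  }
}
```

Round-robin assignment keeps the shard sizes within one suite of each other, which is a reasonable fit here because no suite runs much longer than the rest.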
These 2 options are compatible, and we can apply both if we want. Take Option1 as the first step, as it saves resources and won't increase the workflow's complexity.
Option1 test, recorded time cost of integ-test:
baseline -> 1h 3m 35s
4 groups -> 32m 17s
3 groups -> 37m 58s
Tried shuffling tests before splitting into groups:
4 groups with shuffle -> 32m 42s
3 groups with shuffle -> 38m 37s
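To put the recorded timings in relative terms, the speedup over the baseline can be computed as below. This is only a helper sketch with made-up names; the durations are the ones listed above.

```scala
object SpeedupReport {
  // Converts an hours/minutes/seconds duration into seconds.
  def seconds(h: Int, m: Int, s: Int): Int = h * 3600 + m * 60 + s

  // Speedup of a parallel run relative to the sequential baseline.
  def speedup(baselineSeconds: Int, parallelSeconds: Int): Double =
    baselineSeconds.toDouble / parallelSeconds

  // Timings recorded in this description.
  val baseline: Int = seconds(1, 3, 35)
  val runs: Seq[(String, Int)] = Seq(
    "4 groups" -> seconds(0, 32, 17),
    "3 groups" -> seconds(0, 37, 58),
    "4 groups with shuffle" -> seconds(0, 32, 42),
    "3 groups with shuffle" -> seconds(0, 38, 37)
  )

  def main(args: Array[String]): Unit =
    runs.foreach { case (label, t) =>
      println(f"$label%-22s ${speedup(baseline, t)}%.2fx")
    }
}
```

The unshuffled runs come out at roughly 1.97x for 4 groups and 1.67x for 3 groups, and shuffling does not improve on either.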
Related Issues
Resolves #853
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.