Change tinkerbell e2e run algorithm to schedule longer/smaller tests concurrently #8627
/hold
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##           main    #8627      +/-   ##
=========================================
+ Coverage      0   73.53%   +73.53%
=========================================
  Files         0      578      +578
  Lines         0    36557    +36557
=========================================
+ Hits          0    26882    +26882
- Misses        0     7955     +7955
- Partials      0     1720     +1720
☔ View full report in Codecov by Sentry.
Signed-off-by: Rahul Ganesh <[email protected]>
bbb047e to 43bfd8d
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: sp1999
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-0.20
@sp1999: new pull request created: #8902 In response to this:
Description of changes:
The Tinkerbell E2E test runner currently sorts tests by hardware count and runs the tests that require the most hardware first; these tests also typically take the longest to run. Our test runners run concurrently and reserve hardware from the available pool. The concurrency is capped at 20 because we run into issues with the vCenter environment when we raise it. In the current setup, the longer tests reserve their hardware first and run for a very long time, so the first few runners do not release their hardware until a couple of hours into the run. We do not know whether this is the best use of the hardware, since smaller tests such as single-node tests take only 20-30% of the runtime of the longer tests. We previously tried running all the smaller tests first, which took about as long as running the longer tests first, or even slightly longer.
This change makes the algorithm schedule the tests with the lowest and the highest hardware counts at the same time by popping from both ends of the test queue. The goal is to check experimentally whether this uses the hardware more efficiently and reduces the total test run time.
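For reviewers who want a concrete picture of popping from both ends of the queue, here is a minimal sketch in Go. It is not the code in this PR: the `testCase` type, its fields, and the `interleaveEnds` function are hypothetical names made up for illustration, and the sketch only produces a dispatch order. It does not model hardware reservation or the concurrency cap of 20.

```go
package main

import (
	"fmt"
	"sort"
)

// testCase is a hypothetical stand-in for an E2E test and the number of
// hardware machines it needs to reserve.
type testCase struct {
	Name          string
	HardwareCount int
}

// interleaveEnds sorts the tests by hardware count in descending order and
// then alternates between the front (most hardware) and the back (least
// hardware) of the sorted queue. The input slice is not modified.
func interleaveEnds(tests []testCase) []testCase {
	sorted := make([]testCase, len(tests))
	copy(sorted, tests)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].HardwareCount > sorted[j].HardwareCount
	})

	order := make([]testCase, 0, len(sorted))
	front, back := 0, len(sorted)-1
	for front <= back {
		// Take the largest remaining test.
		order = append(order, sorted[front])
		front++
		// Take the smallest remaining test, if any are left.
		if front <= back {
			order = append(order, sorted[back])
			back--
		}
	}
	return order
}

func main() {
	// Sample data invented for this example.
	tests := []testCase{
		{Name: "single-node", HardwareCount: 1},
		{Name: "multi-cluster", HardwareCount: 4},
		{Name: "two-node", HardwareCount: 2},
		{Name: "ha-control-plane", HardwareCount: 3},
	}
	for _, t := range interleaveEnds(tests) {
		fmt.Printf("%s (%d)\n", t.Name, t.HardwareCount)
	}
}
```

With the sample input, the resulting order alternates between the largest and smallest remaining tests: the 4-machine test, then the 1-machine test, then the 3-machine test, then the 2-machine test. Whether this ordering actually shortens the overall run is exactly what the PR proposes to measure.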
Testing (if applicable):
Documentation added/planned (if applicable):
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.