Change priority for scheduling reroute during timeout #16445
Signed-off-by: Rishab Nahata <[email protected]>
❌ Gradle check result for 5e83a92: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
"reroute after existing shards allocator timed out",
Priority.HIGH,
"reroute after existing shards allocator [R] timed out",
Priority.NORMAL,
Should we have a separate priority for primary vs replica?
NORMAL also seems right for PSA. But during genuine issues in the cluster, which can be identified with appropriate monitoring, we might need to raise it to HIGH. I will update the PR to add a setting for ESA, similar to the one for BSA, that can raise the reroute priority. Wdyt?
Let's update the PR description
Updated
Signed-off-by: Rishab Nahata <[email protected]>
❌ Gradle check result for 6a448d0: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <[email protected]>
❌ Gradle check result for 825a983: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <[email protected]>
❌ Gradle check result for 5368e7f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <[email protected]>
❌ Gradle check result for 2ba604d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16445      +/-   ##
============================================
- Coverage     72.11%   72.09%    -0.03%
- Complexity    65071    65091      +20
============================================
  Files          5313     5313
  Lines        303413   303437      +24
  Branches      43906    43908       +2
============================================
- Hits         218816   218769      -47
- Misses        66639    66785     +146
+ Partials      17958    17883      -75
Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
Setting.Property.NodeScope,
Setting.Property.Dynamic
);
This logic seems redundant
Do you mean the logic to parse the reroute priority?
yes
*/
public static final Setting<Priority> FOLLOW_UP_REROUTE_PRIORITY_SETTING = new Setting<>(
    "cluster.routing.allocation.balanced_shards_allocator.schedule_reroute.priority",
    Priority.NORMAL.toString(),
We should add a changelog entry, since we are changing the default priority from HIGH to NORMAL.
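A rough sketch of how a priority value for a setting such as `cluster.routing.allocation.balanced_shards_allocator.schedule_reroute.priority` could be parsed with NORMAL as the default. This is illustrative only: the `ReroutePrioritySketch` class, its local `Priority` enum, and the set of accepted values are assumptions, not the code in this PR and not the OpenSearch `Setting` API.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, self-contained sketch; it does not use OpenSearch classes.
final class ReroutePrioritySketch {

    // Local stand-in enum. Only HIGH and NORMAL appear in the PR; the rest are illustrative.
    enum Priority { URGENT, HIGH, NORMAL, LOW }

    // Assumption: only these values make sense for a follow-up reroute.
    static final Set<Priority> ALLOWED = EnumSet.of(Priority.NORMAL, Priority.HIGH, Priority.URGENT);

    // The new default discussed in this PR.
    static final Priority DEFAULT = Priority.NORMAL;

    /** Parses a case-insensitive priority name, falling back to the default when unset. */
    static Priority parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT;
        }
        // valueOf throws IllegalArgumentException for names that are not in the enum.
        Priority parsed = Priority.valueOf(value.trim().toUpperCase(Locale.ROOT));
        if (!ALLOWED.contains(parsed)) {
            throw new IllegalArgumentException(
                "priority [" + parsed + "] is not allowed, expected one of " + ALLOWED);
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse(null));   // unset value falls back to the default
        System.out.println(parse("high")); // operator raises the priority
    }
}
```

With this shape, an operator-supplied value such as `high` raises the reroute priority, an unset value keeps NORMAL, and anything outside the accepted set is rejected with an `IllegalArgumentException`.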
This PR is stalled because it has been open for 30 days with no activity.
Description
This PR changes the priority of the reroute that is scheduled after an allocator timeout from HIGH to NORMAL, because repeated HIGH-priority reroutes might starve NORMAL-priority tasks. NORMAL is the right priority for healthy clusters. For clusters in a degraded state, where NORMAL-priority tasks are themselves being starved, this PR adds a new dynamic cluster setting that raises the priority of the reroute task so that shards can still be allocated.
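To illustrate the starvation concern, here is a minimal, self-contained sketch. It is not OpenSearch code: `RerouteStarvationSketch`, its `Priority` enum, and the `simulate` helper are hypothetical stand-ins for a prioritized task queue, and the sketch assumes that the reroute times out and reschedules itself every time it runs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a prioritized task queue; not OpenSearch code.
final class RerouteStarvationSketch {

    // Declared most urgent first so that the natural enum order drives the queue.
    enum Priority { URGENT, HIGH, NORMAL }

    static final class Task {
        final String name;
        final Priority priority;
        final long seq; // submission order, used as a FIFO tie-breaker

        Task(String name, Priority priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    /**
     * Seeds the queue with one reroute and one NORMAL task, then executes up to
     * `steps` tasks. Every executed reroute is assumed to time out and re-submit
     * itself at `reroutePriority`. Returns the executed task names in order.
     */
    static List<String> simulate(Priority reroutePriority, int steps) {
        PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.<Task, Priority>comparing(t -> t.priority).thenComparingLong(t -> t.seq));
        long seq = 0;
        queue.add(new Task("reroute", reroutePriority, seq++));
        queue.add(new Task("normal-task", Priority.NORMAL, seq++));

        List<String> executed = new ArrayList<>();
        for (int i = 0; i < steps && !queue.isEmpty(); i++) {
            Task task = queue.poll();
            executed.add(task.name);
            if (task.name.equals("reroute")) {
                // The follow-up reroute goes back into the queue at the same priority.
                queue.add(new Task("reroute", reroutePriority, seq++));
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println("HIGH:   " + simulate(Priority.HIGH, 4));
        System.out.println("NORMAL: " + simulate(Priority.NORMAL, 4));
    }
}
```

Under these assumptions, `simulate(Priority.HIGH, 4)` never reaches `normal-task`, while `simulate(Priority.NORMAL, 4)` runs it as the second task because the rescheduled reroute queues up behind it.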
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
[ ] Functionality includes testing.
[ ] API changes companion pull request created, if applicable.
[ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.