Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed cluster affinity scheduling evaluation order when scheduling is triggered via WorkloadRebalancer #5425

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bharathguvvala
Copy link

@bharathguvvala bharathguvvala commented Aug 26, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:
As discussed in the community meet on 2024-07-09 , added the fix to evaluate the cluster affinities freshly when scheduling is triggered via WorkloadRebalancer.

Which issue(s) this PR fixes:
Fixes #5070

Progress

  • share opinion in Community Meeting
  • proposal
  • implementation
  • UT test
  • e2e test

Special notes for your reviewer:
Raising this PR to get early feedback on the implementation as it is still work in progress and tested manually by building the deploying the scheduler. There may be unit test failures which I will fix so please don't merge the PR until that's done.

Does this PR introduce a user-facing change?:
Yes. [Rest of the section TBA]

@karmada-bot karmada-bot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. labels Aug 26, 2024
@karmada-bot
Copy link
Collaborator

Welcome @bharathguvvala! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 26, 2024
@chaosi-zju
Copy link
Member

chaosi-zju commented Aug 26, 2024

hi @bharathguvvala, really glad to see your PR submission!

By the way, here may be some minor problems with the ci now, and can you please fix the ci and squash the commit records into one? thanks~

@bharathguvvala
Copy link
Author

@chaosi-zju Sure, will do that.

@@ -301,7 +301,7 @@ func (c *EndpointsliceDispatchController) cleanOrphanDispatchedEndpointSlice(ctx
return nil
}

func (c *EndpointsliceDispatchController) dispatchEndpointSlice(ctx context.Context, work *workv1alpha1.Work, mcs *networkingv1alpha1.MultiClusterService) error {
func (c *EndpointsliceDispatchController) dispatchEndpointSlice(_ context.Context, work *workv1alpha1.Work, mcs *networkingv1alpha1.MultiClusterService) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @bharathguvvala, your this line may be submitted by mistake, ctx is required~

Suggested change
func (c *EndpointsliceDispatchController) dispatchEndpointSlice(_ context.Context, work *workv1alpha1.Work, mcs *networkingv1alpha1.MultiClusterService) error {
func (c *EndpointsliceDispatchController) dispatchEndpointSlice(ctx context.Context, work *workv1alpha1.Work, mcs *networkingv1alpha1.MultiClusterService) error {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea this seems like a mistake. will correct it.

@@ -657,7 +661,7 @@ func (s *Scheduler) scheduleClusterResourceBindingWithClusterAffinity(crb *workv
return err
}

func (s *Scheduler) scheduleClusterResourceBindingWithClusterAffinities(crb *workv1alpha2.ClusterResourceBinding) error {
func (s *Scheduler) scheduleClusterResourceBindingWithClusterAffinities(crb *workv1alpha2.ClusterResourceBinding, performFreshScheduling bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, adding a parameter performFreshScheduling is indeed works, but may be not the best practice~

It just occurred to me that we have similar logic in:

// the assignment mode is defaults to Steady to minimizes disruptions and preserves the balance across clusters.
expectAssignmentMode := Steady
// when spec.rescheduleTriggeredAt is updated, it represents a rescheduling is manually triggered by user, and the
// expected behavior of this action is to do a complete recalculation without referring to last scheduling results.
if util.RescheduleRequired(spec.RescheduleTriggeredAt, status.LastScheduledTime) {
expectAssignmentMode = Fresh
}

may be we can also use util.RescheduleRequired(spec.RescheduleTriggeredAt, status.LastScheduledTime) to judge whether need to reset the affinityIndex, since now a fresh reschedule is defaults to reset the index of ClusterAffinities, so we can do just like:

if util.RescheduleRequired(crb.Spec.RescheduleTriggeredAt, crb.Status.LastScheduledTime) {
  affinityIndex = 0
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to keep the status quo functionality. I will change it based on your suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks~
By the way, I'd appreciate it if you could rebase the latest master code and squash the commit log into a single record~

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done incorporating these changes. Currently there's only one scheduler worker which I think will continue to stay that way. If there are more then that could cause a race condition where the condition util.RescheduleRequired could evaluate to false while deciding scheduled cluster(s) after first evaluating to true .

@karmada-bot karmada-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 21, 2024
@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 48.17%. Comparing base (cf7ac41) to head (762f992).

Files with missing lines Patch % Lines
pkg/scheduler/scheduler.go 0.00% 4 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5425      +/-   ##
==========================================
- Coverage   48.18%   48.17%   -0.02%     
==========================================
  Files         664      664              
  Lines       54799    54805       +6     
==========================================
- Hits        26405    26402       -3     
- Misses      26680    26686       +6     
- Partials     1714     1717       +3     
Flag Coverage Δ
unittests 48.17% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chaosi-zju
Copy link
Member

/retest

@chaosi-zju
Copy link
Member

Hi, there are still two nits:

  1. The CI of this PR failed due to it wasn't signed off, usually please use git commit -s -m 'your message ' or git commit -m ' Signed-off-by: AuthorName <[email protected]> \n <other message> ' to pass DCO (detail guideline can refer to: https://probot.github.io/apps/dco/).

  2. I see this PR has 26 commit record, can you please squash these 26 commits into one commit, you can refer to: https://stackoverflow.com/questions/5189560/how-do-i-squash-my-last-n-commits-together

thanks~

@karmada-bot karmada-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 13, 2024
@karmada-bot karmada-bot added the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Dec 13, 2024
@karmada-bot
Copy link
Collaborator

Adding label do-not-merge/contains-merge-commits because PR contains merge commits, which are not allowed in this repository.
Use git rebase to reapply your commits on top of the target branch. Detailed instructions for doing so can be found here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@karmada-bot karmada-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 13, 2024
@bharathguvvala
Copy link
Author

@chaosi-zju I've tried to squash the commits and sort of messed it up. Those squashed commits aren't showing up in the branch. Can we do "squash and merge" while merging this PR?

@chaosi-zju
Copy link
Member

I've tried to squash the commits and sort of messed it up. Those squashed commits aren't showing up in the branch. Can we do "squash and merge" while merging this PR?

can you try one more time by following command:

git remote add upstream https://github.com/karmada-io/karmada.git
git fetch upstream
git rebase upstream/master

then if you successfully rebased, you can directly push tou commit,

but you may also encounter some rebase conflict, in that case you should manually resolve the conflict and execute git rebase --continue

Signed-off-by: Bharath Raghavendra Reddy Guvvala <[email protected]>
@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 14, 2025
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign garrybest for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 14, 2025
@bharathguvvala
Copy link
Author

@chaosi-zju can you check now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fresh rescheduling not happening via workloadrebalancer
4 participants