CA: refactor PredicateChecker into ClusterSnapshot #7497

Open: wants to merge 8 commits into base: master



@towca towca commented Nov 14, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler.

To handle DRA properly, scheduling predicates/filters always need to be run whenever scheduling a pod to a node inside the snapshot (so that the DRA scheduler plugin can compute the necessary allocation). The way that the code is structured currently doesn't make this requirement obvious, and we risk future changes breaking DRA behavior (e.g. new logic that schedules pods inside the snapshot gets added, but doesn't check the predicates). This PR refactors the code so that running predicates is the default behavior when scheduling pods inside the snapshot.

Summary of changes:

  • All PredicateChecker methods need ClusterSnapshot, and ClusterSnapshot needs PredicateChecker for the pod-scheduling methods. IMO it makes the most sense to make PredicateChecker an implementation detail of ClusterSnapshot, so that the users don't have to coordinate the two.
  • The predicate-checking logic in ClusterSnapshot would be the same for Basic and Delta implementations, so another layer of abstraction (PredicateSnapshot) is introduced on top of them, and the logic is implemented there.
  • The changes above require separating the clustersnapshot files into multiple subpackages. This also makes for a more readable code structure.
  • A bunch of test code throughout the CA code base needs to be adapted.
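
The layering described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Pod`, `Node`, `SnapshotStore`, `PredicateChecker`, `AddPodUnchecked`, and `Check` are simplified, hypothetical names, and only `PredicateSnapshot` and `SchedulePod` echo identifiers mentioned in this conversation. The point it shows is that the checker is a private detail of the snapshot, so scheduling a pod always runs the predicates first.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod and Node are simplified stand-ins for the real API objects (assumption).
type Pod struct {
	Name string
	CPU  int // requested CPU, arbitrary units
}

type Node struct {
	Name    string
	FreeCPU int
}

// SnapshotStore is a hypothetical name for the storage layer that the Basic
// and Delta implementations would provide. It never runs predicates.
type SnapshotStore interface {
	GetNode(name string) (*Node, error)
	AddPodUnchecked(pod *Pod, nodeName string) error
}

// PredicateChecker is a hypothetical stand-in for the scheduler filters.
type PredicateChecker interface {
	Check(pod *Pod, node *Node) error
}

// PredicateSnapshot layers predicate checking on top of any SnapshotStore.
// The checker is a private field, so callers cannot schedule without it.
type PredicateSnapshot struct {
	store   SnapshotStore
	checker PredicateChecker
}

// SchedulePod runs the predicates first and only then mutates the store.
func (s *PredicateSnapshot) SchedulePod(pod *Pod, nodeName string) error {
	node, err := s.store.GetNode(nodeName)
	if err != nil {
		return err
	}
	if err := s.checker.Check(pod, node); err != nil {
		return fmt.Errorf("pod %q does not fit node %q: %w", pod.Name, nodeName, err)
	}
	return s.store.AddPodUnchecked(pod, nodeName)
}

// mapStore is a toy in-memory store used only for this example.
type mapStore struct{ nodes map[string]*Node }

func (m *mapStore) GetNode(name string) (*Node, error) {
	n, ok := m.nodes[name]
	if !ok {
		return nil, errors.New("node not found: " + name)
	}
	return n, nil
}

func (m *mapStore) AddPodUnchecked(pod *Pod, nodeName string) error {
	n, err := m.GetNode(nodeName)
	if err != nil {
		return err
	}
	n.FreeCPU -= pod.CPU
	return nil
}

// cpuChecker is a toy predicate: the pod fits if the node has enough free CPU.
type cpuChecker struct{}

func (cpuChecker) Check(pod *Pod, node *Node) error {
	if pod.CPU > node.FreeCPU {
		return errors.New("insufficient cpu")
	}
	return nil
}

func main() {
	store := &mapStore{nodes: map[string]*Node{"n1": {Name: "n1", FreeCPU: 2}}}
	snap := &PredicateSnapshot{store: store, checker: cpuChecker{}}

	fmt.Println(snap.SchedulePod(&Pod{Name: "small", CPU: 1}, "n1")) // fits
	fmt.Println(snap.SchedulePod(&Pod{Name: "big", CPU: 4}, "n1"))   // rejected by the predicate
}
```

Because the store is swappable behind an interface, the same wrapper would serve both storage implementations without duplicating the predicate logic.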

Which issue(s) this PR fixes:

The CA/DRA integration is tracked in kubernetes/kubernetes#118612; this PR is only part of the implementation.

Special notes for your reviewer:

The first commit in the PR is just a squash of #7466 and #7479, and it shouldn't be a part of this review. The PR will be rebased on top of master after the others are merged.

This is intended to be a no-op refactor. It was extracted from #7350 after #7447, #7466, and #7479. This should be the last refactor PR; the next ones will introduce actual DRA logic.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/blob/9de7f62e16fc5c1ea3bd40689487c9edc7fa5057/keps/sig-node/4381-dra-structured-parameters/README.md

@k8s-ci-robot k8s-ci-robot added the kind/cleanup, cncf-cla: yes, and size/XXL labels Nov 14, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Nov 14, 2024
@towca towca force-pushed the jtuznik/dra-predicate-snapshot branch 2 times, most recently from ed9232e to 27420ef Compare November 14, 2024 15:34

towca commented Nov 14, 2024

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Nov 14, 2024
@towca towca force-pushed the jtuznik/dra-predicate-snapshot branch 2 times, most recently from e377759 to d84511f Compare November 19, 2024 14:13
@k8s-ci-robot k8s-ci-robot added the needs-rebase label Nov 19, 2024

towca commented Nov 19, 2024

/assign @BigDarkClown

@towca towca force-pushed the jtuznik/dra-predicate-snapshot branch from d84511f to d78b5d8 Compare November 19, 2024 14:35
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Nov 19, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 20, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 20, 2024
@towca towca force-pushed the jtuznik/dra-predicate-snapshot branch from d78b5d8 to e4d5002 Compare November 21, 2024 18:48
towca added a commit to towca/autoscaler that referenced this pull request Nov 25, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 25, 2024
@@ -56,6 +56,7 @@ func (p *filterOutExpendable) Process(context *context.AutoscalingContext, pods
// CA logic from before migration to scheduler framework. So let's keep it for now
func (p *filterOutExpendable) addPreemptingPodsToSnapshot(pods []*apiv1.Pod, ctx *context.AutoscalingContext) error {
for _, p := range pods {
// TODO(DRA): Figure out if/how to use the predicate-checking SchedulePod() here instead - otherwise this doesn't work with DRA pods.

is this post-v1.32 TODO work?

@@ -223,7 +221,7 @@ func (r *RemovalSimulator) findPlaceFor(removedNode string, pods []*apiv1.Pod, n

// remove pods from clusterSnapshot first

Should we change this comment to "// unschedule pods from clusterSnapshot first"?

@@ -0,0 +1,126 @@
/*
Copyright 2016 The Kubernetes Authors.

nit: new files should read 2024 (let's hope we land this in 2024 :))

continue
}

if !preFilterResult.AllNodes() && !preFilterResult.NodeNames.Has(nodeInfo.Node().Name) {

Is this an opportunity to rename the AllNodes method to something more descriptive like AllNodesAreEligible?

It also might be helpful for future maintainers to add a comment above these initial two if statements:

// Ensure that this node in the iteration fulfills the passed in nodeMatches filter func
// If only certain nodes are capable of running this pod,
// and if this node in the iteration isn't one of them, try the next node
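
For illustration, the two checks could read roughly like this with the suggested comments in place. This is a simplified sketch, not the PR's code: `preFilterResult` is a hypothetical struct that only mimics the shape of the scheduler framework's PreFilterResult (a nil node set meaning that every node is eligible), and `firstEligibleNode` and `nodeMatches` are illustrative names.

```go
package main

import "fmt"

// preFilterResult mimics the shape of the scheduler framework's
// PreFilterResult for illustration only: a nil NodeNames set means that
// every node is eligible.
type preFilterResult struct {
	NodeNames map[string]struct{}
}

// AllNodes reports whether PreFilter placed no restriction on the node set.
func (r *preFilterResult) AllNodes() bool {
	return r == nil || r.NodeNames == nil
}

// firstEligibleNode is a hypothetical helper that returns the first node
// passing both checks discussed in the review comment.
func firstEligibleNode(nodes []string, res *preFilterResult, nodeMatches func(string) bool) (string, bool) {
	for _, name := range nodes {
		// Ensure that this node fulfills the passed-in nodeMatches filter func.
		if !nodeMatches(name) {
			continue
		}
		// If only certain nodes are capable of running this pod, and this
		// node isn't one of them, try the next node.
		if !res.AllNodes() {
			if _, ok := res.NodeNames[name]; !ok {
				continue
			}
		}
		return name, true
	}
	return "", false
}

func main() {
	nodes := []string{"a", "b", "c"}
	notA := func(name string) bool { return name != "a" }
	onlyC := &preFilterResult{NodeNames: map[string]struct{}{"c": {}}}

	fmt.Println(firstEligibleNode(nodes, nil, notA))   // no PreFilter restriction
	fmt.Println(firstEligibleNode(nodes, onlyC, notA)) // restricted to node c
}
```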

return estimationState.newNodeNames[nodeInfo.Node().Name]
})
- if err != nil {
+ if err != nil && err.Type() == clustersnapshot.NoNodesPassingPredicatesFoundError {

My read on this function is that it is meant to short-circuit as soon as it encounters any pod in a list of Pending pods that cannot be scheduled on an existing node in the cluster, and that it will return that pod, as well as any additional, not-yet-checked pods in the list, to the caller. Why are we changing our approach here so that only the "nominal" error (NoNodesPassingPredicatesFoundError) triggers that behavior? Currently any error response causes us to return that pod list, whereas now all other errors trigger the return of a nil list.

The existing FitsAnyNodeMatching method seems to have error paths comparable to those of the new SchedulePodOnAnyNodeMatching method (though now strongly typed).
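
To make the question concrete, the sketch below contrasts the two behaviors. It is illustrative only: `SchedulingError`, `Type()`, and `NoNodesPassingPredicatesFoundError` are modelled loosely on the identifiers quoted above, while `InternalError`, `scheduleFn`, `failOn`, `remainingAnyError`, and `remainingTyped` are hypothetical names. Propagating the unexpected error alongside the nil list is also an assumption about the new code, not something confirmed in this thread.

```go
package main

import "fmt"

type ErrorType int

const (
	NoNodesPassingPredicatesFoundError ErrorType = iota
	InternalError // hypothetical: stands for any unexpected failure
)

// SchedulingError is a simplified, hypothetical typed error.
type SchedulingError struct {
	typ ErrorType
	msg string
}

func (e *SchedulingError) Type() ErrorType { return e.typ }
func (e *SchedulingError) Error() string { return e.msg }

// scheduleFn stands in for a call like SchedulePodOnAnyNodeMatching;
// a nil result means the pod was scheduled.
type scheduleFn func(pod string) *SchedulingError

// remainingAnyError models the old behavior described in the comment:
// any error stops the loop and returns the failing pod plus the unchecked tail.
func remainingAnyError(pods []string, schedule scheduleFn) []string {
	for i, p := range pods {
		if err := schedule(p); err != nil {
			return pods[i:]
		}
	}
	return nil
}

// remainingTyped models the new behavior as read by the reviewer: only the
// "no node fits" error returns the tail; any other error yields a nil list.
func remainingTyped(pods []string, schedule scheduleFn) ([]string, error) {
	for i, p := range pods {
		err := schedule(p)
		if err == nil {
			continue
		}
		if err.Type() == NoNodesPassingPredicatesFoundError {
			return pods[i:], nil
		}
		return nil, err
	}
	return nil, nil
}

// failOn builds a fake scheduler that fails with the given type on one pod.
func failOn(name string, typ ErrorType) scheduleFn {
	return func(pod string) *SchedulingError {
		if pod == name {
			return &SchedulingError{typ: typ, msg: "cannot schedule " + pod}
		}
		return nil
	}
}

func main() {
	pods := []string{"p1", "p2", "p3"}
	fmt.Println(remainingAnyError(pods, failOn("p2", InternalError)))
	fmt.Println(remainingTyped(pods, failOn("p2", InternalError)))
	fmt.Println(remainingTyped(pods, failOn("p2", NoNodesPassingPredicatesFoundError)))
}
```

With an unexpected error on `p2`, the first function still reports `p2` and `p3` as unscheduled, while the second returns no pods at all, which is the difference the comment asks about.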

@towca towca force-pushed the jtuznik/dra-predicate-snapshot branch from e4d5002 to aa4b078 Compare November 27, 2024 16:11
…hecker

This decouples PredicateChecker from the Framework initialization logic,
and allows creating multiple PredicateChecker instances while only
initializing the framework once.

This commit also fixes how CA integrates with Framework metrics. Instead
of Registering them they're only Initialized so that CA doesn't expose
scheduler metrics. And the initialization is moved from multiple
different places to the Handle constructor.

To handle DRA properly, scheduling predicates will need to be run
whenever Pods are scheduled in the snapshot.

PredicateChecker always needs a ClusterSnapshot to work, and ClusterSnapshot
scheduling methods need to run the predicates first. So it makes most
sense to have PredicateChecker be a dependency for ClusterSnapshot
implementations, and move the PredicateChecker methods to
ClusterSnapshot.

This commit mirrors PredicateChecker methods in ClusterSnapshot (with
the exception of FitsAnyNode which isn't used anywhere and is trivial to
do via FitsAnyNodeMatching). Further commits will remove the
PredicateChecker interface and move the implementation under
clustersnapshot.

Dummy methods are added to current ClusterSnapshot implementations to
get the tests to pass. Further commits will actually implement them.

PredicateError is refactored into a broader SchedulingError so that the
ClusterSnapshot methods can return a single error that the callers can
use to distinguish between a failing predicate and other, unexpected
errors.

PredicateSnapshot implements the ClusterSnapshot methods that need
to run predicates on top of a SnapshotBase.

testsnapshot pkg is introduced, providing functions abstracting away
the snapshot creation for tests.

ClusterSnapshot tests are moved near PredicateSnapshot, as it'll be
the only "full" implementation.

For DRA, this component will have to call the Reserve phase in addition
to just checking predicates/filters.

The new version also makes more sense in the context of
PredicateSnapshot, which is the only context now.
@towca towca force-pushed the jtuznik/dra-predicate-snapshot branch from aa4b078 to d78bc5b Compare November 27, 2024 18:22
Labels
approved, area/cluster-autoscaler, cncf-cla: yes, do-not-merge/hold, kind/cleanup, size/XXL
4 participants