Karpenter keeps scheduling pods with PVCs on a node that has reached the max EBS volume limit #1748
Comments
This issue is currently awaiting triage. If Karpenter contributors determine this is a relevant issue, they will accept it by applying the appropriate triage label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Can you share the Karpenter logs from this time period?
Hey @engedaam, in relation to the NodePool that Pierre was testing with, we only see these logs in the Karpenter logs:

```
❯ stern karpenter | grep test-pierre
+ karpenter-77fcb896cf-f2lcn › controller
+ karpenter-77fcb896cf-qgqrg › controller
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:23:26.160Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"8714d086-69c0-4d3c-a828-c1fc94636770"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:23:55.213Z","logger":"controller","caller":"disruption/controller.go:91","message":"marking consolidatable","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"dda72ae0-cfbc-401e-802a-5f67e1e63c05"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:24:35.354Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"459d8557-0ce1-4e51-8356-e709a878fc3d"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:25:04.188Z","logger":"controller","caller":"disruption/controller.go:91","message":"marking consolidatable","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"07af6d93-06c6-462f-a327-ed3346991047"}
^C
```

This happens at the same time as we see the following events for pod number 24, showing that the volume attach fails:

```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 77s default-scheduler Successfully assigned test-pierre-123/test-pierre-24 to ip-10-0-143-68.eu-west-3.compute.internal
Warning  FailedAttachVolume  15s   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-757a114a-0e15-4711-81ae-160d2dc54f02" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
```

On the CSINode object, we do see this allocatable volume count (sorry, this is probably an older node):

```
❯ k get csinode ip-10-0-143-68.eu-west-3.compute.internal -oyaml | yq -r .spec.drivers
- allocatable:
count: 26
name: ebs.csi.aws.com
nodeID: i-0473200018d6278da
topologyKeys:
- kubernetes.io/os
- topology.ebs.csi.aws.com/zone
- topology.kubernetes.io/zone
```

Considering that we have 2 system pods (DaemonSets), that the CSINode is correctly applied by the EBS CSI controller, and that we apply …
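Not from the original thread, but a quick way to see how close the node actually is to that limit is to count the VolumeAttachment objects bound to it and compare the result against the CSINode allocatable count shown above. A minimal sketch, assuming kubectl and jq are available:

```sh
# Count VolumeAttachment objects bound to the node from the events above and
# compare against .spec.drivers[].allocatable.count on the matching CSINode.
NODE=ip-10-0-143-68.eu-west-3.compute.internal
kubectl get volumeattachments -o json \
  | jq --arg node "$NODE" '[.items[] | select(.spec.nodeName == $node)] | length'
```

If that number already meets the allocatable count, further PVC-backed pods should not be placed on this node.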
Experiencing the same issue with the latest version of Karpenter. AWS attach limits are a little trickier than just counting EBS volumes, since ENI attachments, local instance store volumes, and GPUs/accelerators also count toward the limit. Are all of these taken into consideration by Karpenter?
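One workaround that is sometimes used (not mentioned in this thread; the exact Helm value name should be verified against the chart version in use) is to have the EBS CSI node plugin advertise a conservative, explicit attach limit on the CSINode object, so the scheduler and Karpenter both work from the same number regardless of ENI or instance-store usage:

```yaml
# Hedged sketch of aws-ebs-csi-driver Helm values: hard-cap the volume attach
# limit the node plugin reports on CSINode. The key (node.volumeAttachLimit) is
# an assumption to verify against your chart version; 24 is an example value
# that leaves headroom for ENI and instance-store attachments.
node:
  volumeAttachLimit: 24
```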
Description
Observed Behavior:
When a node reaches its maximum number of allocatable volumes, Karpenter keeps trying to schedule pods with PVCs on that node.
Expected Behavior:
Karpenter should scale up a new node to schedule new pods with volumes.
Reproduction Steps (Please include YAML):
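The original report does not include reproduction YAML. A minimal sketch that should surface the same symptom, assuming an EBS-backed default StorageClass and a NodePool restricted to small instance types so the per-node attach limit is reached quickly, is a StatefulSet with one PVC per replica:

```yaml
# Hypothetical reproduction manifest (not from the original report): a StatefulSet
# whose replica count exceeds a single node's volume attach limit, so scheduling
# must spill onto additional nodes once the limit is hit.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pvc-limit-repro
spec:
  serviceName: pvc-limit-repro
  replicas: 30  # more PVC-backed pods than one node can attach volumes for
  selector:
    matchLabels:
      app: pvc-limit-repro
  template:
    metadata:
      labels:
        app: pvc-limit-repro
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```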
Versions:
Kubernetes Version (kubectl version): v1.30.3