Karpenter keeps scheduling pods with PVCs on a node that has reached its max EBS volume attachments #1748

Open
Ph4rell opened this issue Oct 11, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@Ph4rell

Ph4rell commented Oct 11, 2024

Description

Observed Behavior:
When a node reaches its maximum number of allocatable volumes, Karpenter keeps trying to schedule pods with PVCs on that node.

Expected Behavior:
Karpenter should scale up a new node to schedule new pods with volumes.

Reproduction Steps (Please include YAML):

  • Deploy a new NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: test
spec:
  weight: 10
  template:
    metadata:
      labels:
        role: test
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: firstclass
      startupTaints:
        - key: ebs.csi.aws.com/agent-not-ready
          effect: NoExecute
      taints:
        - key: test
          effect: NoSchedule
      expireAfter: Never
      terminationGracePeriod: 48h
      requirements:
        - key: "karpenter.k8s.aws/instance-family"
          operator: In
          values: ["r4", "r5"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: Never
  • Deploy a StatefulSet that targets the node pool:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
  labels:
    app: my-app
  namespace: kube-system
spec:
  replicas: 25
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: busybox
        command: ["/bin/sh", "-c", "while true; do sleep 1; done"]
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          allowPrivilegeEscalation: false 
          capabilities:
            drop:
              - ALL
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /mnt/data
          name: my-pvc
      tolerations:
      - key: "test"
        operator: "Exists"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role
                operator: In
                values:
                - test
  volumeClaimTemplates:
    - metadata:
        name: my-pvc
      spec:
        storageClassName: gp2-1a
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
  • The StorageClass looks like this (a quick verification sketch follows the manifest):
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-1a
parameters:
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
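
To observe the behavior, apply the three manifests and watch where the StatefulSet pods land. A minimal sketch using plain kubectl; the file names below are placeholders for the manifests above:

# Apply the NodePool, StatefulSet, and StorageClass shown above
# (nodepool.yaml, statefulset.yaml, storageclass.yaml are placeholder names).
kubectl apply -f nodepool.yaml -f statefulset.yaml -f storageclass.yaml

# Watch where the 25 replicas land; once a node runs out of volume attachment
# slots, the expectation is that Karpenter provisions a second node.
kubectl get pods -n kube-system -l app=my-app -o wide -w

# Check for volumes that fail to attach on the saturated node.
kubectl get events -n kube-system --field-selector reason=FailedAttachVolume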

Versions:

  • Karpenter Controller Version: 1.0.6
  • Kubernetes Version (kubectl version): v1.30.3
  • Server Version: v1.30.4-eks-a737599
  • aws-ebs-csi-driver:v1.35.0
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@Ph4rell Ph4rell added the kind/bug Categorizes issue or PR as related to a bug. label Oct 11, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 11, 2024
@engedaam
Contributor

Can you share the Karpenter logs from this time period?

@amemni

amemni commented Oct 24, 2024

Hey @engedaam, in relation to the NodePool that Pierre was testing with, we only see these entries in the Karpenter logs:

❯ stern karpenter |grep test-pierre
+ karpenter-77fcb896cf-f2lcn › controller
+ karpenter-77fcb896cf-qgqrg › controller
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:23:26.160Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"8714d086-69c0-4d3c-a828-c1fc94636770"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:23:55.213Z","logger":"controller","caller":"disruption/controller.go:91","message":"marking consolidatable","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"dda72ae0-cfbc-401e-802a-5f67e1e63c05"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:24:35.354Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"459d8557-0ce1-4e51-8356-e709a878fc3d"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:25:04.188Z","logger":"controller","caller":"disruption/controller.go:91","message":"marking consolidatable","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"07af6d93-06c6-462f-a327-ed3346991047"}
^C

This happens at the same time as we see these events for pod number 24, showing that the attachdetach-controller fails to attach the volume.

Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           77s   default-scheduler        Successfully assigned test-pierre-123/test-pierre-24 to ip-10-0-143-68.eu-west-3.compute.internal
  Warning  FailedAttachVolume  15s   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-757a114a-0e15-4711-81ae-160d2dc54f02" : rpc error: code = DeadlineExceeded desc = context deadline exceeded

On the CSINode object, we do see this allocatable volume count (sorry, this is probably from an older node):

❯ k get csinode ip-10-0-143-68.eu-west-3.compute.internal -oyaml | yq -r .spec.drivers 
- allocatable:
    count: 26
  name: ebs.csi.aws.com
  nodeID: i-0473200018d6278da
  topologyKeys:
    - kubernetes.io/os
    - topology.ebs.csi.aws.com/zone
    - topology.kubernetes.io/zone
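
For comparison, the number of EBS volumes actually attached (or being attached) to that node can be read from VolumeAttachment objects. A minimal sketch, assuming jq is available and reusing the node name from above:

NODE=ip-10-0-143-68.eu-west-3.compute.internal

# Allocatable attachment slots reported by the EBS CSI driver for this node.
kubectl get csinode "$NODE" \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

# EBS volumes the attachdetach-controller has attached (or is trying to
# attach) to the same node.
kubectl get volumeattachments -o json | jq --arg n "$NODE" \
  '[.items[] | select(.spec.nodeName == $n and .spec.attacher == "ebs.csi.aws.com")] | length'

If the second number reaches the first, any further pod with a new PVC scheduled onto this node will sit in FailedAttachVolume.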

Considering that we have 2 system pods (DaemonSets), that the CSINode object is correctly populated by the EBS CSI controller, and that we apply startupTaints to the NodePool as recommended in the Karpenter troubleshooting guide, we think that Karpenter fails to account for the allocatable volume limit (26) and does not scale up the NodePool as expected.
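
One way to check that from the scheduling side is to count how many of the StatefulSet pods were placed on the saturated node. A rough sketch, reusing the node name and labels from the reproduction above:

NODE=ip-10-0-143-68.eu-west-3.compute.internal

# Pods from the StatefulSet that were scheduled onto this node. Each one needs
# its own EBS volume, so if this count exceeds the 26 allocatable slots the
# scheduler has packed more PVC-bearing pods than the node can ever attach.
kubectl get pods -n kube-system -l app=my-app \
  --field-selector spec.nodeName="$NODE" -o name | wc -l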

@maximethebault

maximethebault commented Nov 17, 2024

Experiencing the same issue with the latest version of Karpenter. AWS attachment limits are a bit trickier than just counting EBS volumes, since they also include ENI attachments, local instance store volumes, and GPUs/accelerators. Is all of this taken into account by Karpenter?
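
For context, on most Nitro instance types the attachment limit is a shared pool (28 slots on many types) consumed by ENIs, NVMe instance store volumes, and EBS volumes, including the root volume. A back-of-the-envelope sketch, illustrative numbers only and not the CSI driver's exact formula:

SHARED_LIMIT=28     # shared attachment slots on many Nitro types (varies by type)
ENIS=1              # attached network interfaces
INSTANCE_STORE=0    # NVMe instance store volumes (r5 has none, r5d would)
ROOT_VOLUME=1       # the root EBS volume also consumes a slot
echo $(( SHARED_LIMIT - ENIS - INSTANCE_STORE - ROOT_VOLUME ))   # 26, matching the CSINode above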
