Karpenter keeps scheduling pods with PVCs on a node that has reached its max EBS volume attachments #1748

Open
Ph4rell opened this issue Oct 11, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@Ph4rell

Ph4rell commented Oct 11, 2024

Description

Observed Behavior:
When a node reaches its maximum number of allocatable volumes, Karpenter keeps trying to schedule pods with PVCs on that node.

Expected Behavior:
Karpenter should scale up a new node to schedule new pods with volumes.

Reproduction Steps (Please include YAML):

  • Deploy a new NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: test
spec:
  weight: 10
  template:
    metadata:
      labels:
        role: test
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: firstclass
      startupTaints:
        - key: ebs.csi.aws.com/agent-not-ready
          effect: NoExecute
      taints:
        - key: test
          effect: NoSchedule
      expireAfter: Never
      terminationGracePeriod: 48h
      requirements:
        - key: "karpenter.k8s.aws/instance-family"
          operator: In
          values: ["r4", "r5"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: Never
  • Deploy a StatefulSet that targets the node pool:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
  labels:
    app: my-app
  namespace: kube-system
spec:
  replicas: 25
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: busybox
        command: ["/bin/sh", "-c", "while true; do sleep 1; done"]
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          allowPrivilegeEscalation: false 
          capabilities:
            drop:
              - ALL
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /mnt/data
          name: my-pvc
      tolerations:
      - key: "test"
        operator: "Exists"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role
                operator: In
                values:
                - test
  volumeClaimTemplates:
    - metadata:
        name: my-pvc
      spec:
        storageClassName: gp2-1a
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
  • The StorageClass looks like this (a quick verification sketch follows the manifest):
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-1a
parameters:
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
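
To observe the behavior, apply the three manifests and watch where the StatefulSet pods land. A minimal sketch using plain kubectl; the file names below are placeholders for the manifests above:

# Apply the NodePool, StatefulSet, and StorageClass shown above
# (nodepool.yaml, statefulset.yaml, storageclass.yaml are placeholder names).
kubectl apply -f nodepool.yaml -f statefulset.yaml -f storageclass.yaml

# Watch where the 25 replicas land; once a node runs out of volume attachment
# slots, the expectation is that Karpenter provisions a second node.
kubectl get pods -n kube-system -l app=my-app -o wide -w

# Check for volumes that fail to attach on the saturated node.
kubectl get events -n kube-system --field-selector reason=FailedAttachVolume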

Versions:

  • Karpenter Controller Version: 1.0.6
  • Kubernetes Version (kubectl version): v1.30.3
  • Server Version: v1.30.4-eks-a737599
  • aws-ebs-csi-driver:v1.35.0
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@Ph4rell Ph4rell added the kind/bug Categorizes issue or PR as related to a bug. label Oct 11, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 11, 2024
@engedaam
Contributor

Can you share the Karpenter logs from this time period?

@amemni

amemni commented Oct 24, 2024

Hey @engedaam, in relation to the NodePool that Pierre was testing with, we only see these entries in the Karpenter logs:

❯ stern karpenter |grep test-pierre
+ karpenter-77fcb896cf-f2lcn › controller
+ karpenter-77fcb896cf-qgqrg › controller
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:23:26.160Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"8714d086-69c0-4d3c-a828-c1fc94636770"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:23:55.213Z","logger":"controller","caller":"disruption/controller.go:91","message":"marking consolidatable","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"dda72ae0-cfbc-401e-802a-5f67e1e63c05"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:24:35.354Z","logger":"controller","caller":"disruption/controller.go:91","message":"removing consolidatable status condition","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"459d8557-0ce1-4e51-8356-e709a878fc3d"}
karpenter-77fcb896cf-qgqrg controller {"level":"DEBUG","time":"2024-10-24T08:25:04.188Z","logger":"controller","caller":"disruption/controller.go:91","message":"marking consolidatable","commit":"6174c75","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-pierre-4kqd6"},"namespace":"","name":"test-pierre-4kqd6","reconcileID":"07af6d93-06c6-462f-a327-ed3346991047"}
^C

This happens at the same time as we see these events for pod number 24, showing that the attachdetach-controller fails to attach the volume.

Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           77s   default-scheduler        Successfully assigned test-pierre-123/test-pierre-24 to ip-10-0-143-68.eu-west-3.compute.internal
  Warning  FailedAttachVolume  15s   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-757a114a-0e15-4711-81ae-160d2dc54f02" : rpc error: code = DeadlineExceeded desc = context deadline exceeded

On the CSINode object, we do see this allocatable volume count (sorry, this is probably from an older node):

❯ k get csinode ip-10-0-143-68.eu-west-3.compute.internal -oyaml | yq -r .spec.drivers 
- allocatable:
    count: 26
  name: ebs.csi.aws.com
  nodeID: i-0473200018d6278da
  topologyKeys:
    - kubernetes.io/os
    - topology.ebs.csi.aws.com/zone
    - topology.kubernetes.io/zone
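
For comparison, the number of EBS volumes actually attached (or being attached) to that node can be read from VolumeAttachment objects. A minimal sketch, assuming jq is available and reusing the node name from above:

NODE=ip-10-0-143-68.eu-west-3.compute.internal

# Allocatable attachment slots reported by the EBS CSI driver for this node.
kubectl get csinode "$NODE" \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

# EBS volumes the attachdetach-controller has attached (or is trying to
# attach) to the same node.
kubectl get volumeattachments -o json | jq --arg n "$NODE" \
  '[.items[] | select(.spec.nodeName == $n and .spec.attacher == "ebs.csi.aws.com")] | length'

If the second number reaches the first, any further pod with a new PVC scheduled onto this node will sit in FailedAttachVolume.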

Considering that we have 2 system pods (DaemonSets), that the CSINode object is correctly populated by the EBS CSI controller, and that we apply startupTaints to the NodePool as recommended in the Karpenter troubleshooting guide, we think that Karpenter fails to account for the allocatable volume limit (26) and does not scale up the NodePool as expected.
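
One way to check that from the scheduling side is to count how many of the StatefulSet pods were placed on the saturated node. A rough sketch, reusing the node name and labels from the reproduction above:

NODE=ip-10-0-143-68.eu-west-3.compute.internal

# Pods from the StatefulSet that were scheduled onto this node. Each one needs
# its own EBS volume, so if this count exceeds the 26 allocatable slots the
# scheduler has packed more PVC-bearing pods than the node can ever attach.
kubectl get pods -n kube-system -l app=my-app \
  --field-selector spec.nodeName="$NODE" -o name | wc -l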

@maximethebault

maximethebault commented Nov 17, 2024

Experiencing the same issue with the latest version of Karpenter. AWS attachment limits are a bit trickier than just counting EBS volumes, since they also include ENI attachments, local instance store volumes, and GPUs/accelerators. Is all of this taken into account by Karpenter?
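
For context, on most Nitro instance types the attachment limit is a shared pool (28 slots on many types) consumed by ENIs, NVMe instance store volumes, and EBS volumes, including the root volume. A back-of-the-envelope sketch, illustrative numbers only and not the CSI driver's exact formula:

SHARED_LIMIT=28     # shared attachment slots on many Nitro types (varies by type)
ENIS=1              # attached network interfaces
INSTANCE_STORE=0    # NVMe instance store volumes (r5 has none, r5d would)
ROOT_VOLUME=1       # the root EBS volume also consumes a slot
echo $(( SHARED_LIMIT - ENIS - INSTANCE_STORE - ROOT_VOLUME ))   # 26, matching the CSINode above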
