PDBs still block "forceful" node termination #1776

Open · dpiddock opened this issue Oct 25, 2024 · 4 comments
Labels: kind/bug, needs-triage

@dpiddock
Description

Observed Behavior:
After upgrading to Karpenter 1.0, we enacted a policy to terminate nodes after 7d with a 4h terminationGracePeriod. However, Karpenter still refuses to evict a pod at the deadline if its PDB does not allow disruption. This leaves us with large instances running just a single workload pod, because Karpenter has already evicted the other workloads and tainted the node karpenter.sh/disrupted:NoSchedule 💸.

Repeated events are generated against the node:

  Normal   DisruptionBlocked  14m (x1329 over 47h)    karpenter  Cannot disrupt Node: state node is marked for deletion
  Warning  FailedDraining     3m48s (x1407 over 47h)  karpenter  Failed to drain node, 12 pods are waiting to be evicted

Of those 12 pods, 11 are DaemonSet pods and 1 belongs to a Deployment whose PDB is configured to not allow normal eviction of the pod.

Karpenter itself is logging:

{
  "level": "ERROR",
  "time": "2024-10-25T12:46:56.452Z",
  "logger": "controller",
  "message": "consistency error",
  "commit": "6174c75",
  "controller": "nodeclaim.consistency",
  "controllerGroup": "karpenter.sh",
  "controllerKind": "NodeClaim",
  "NodeClaim": {
    "name": "test-vxtgb"
  },
  "namespace": "",
  "name": "test-vxtgb",
  "reconcileID": "2a7b8ffd-80cf-4fbf-b612-870a33adec27",
  "error": "can't drain node, PDB \"default/test\" is blocking evictions"
}
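
For anyone trying to reproduce this, the leftover pods, node events, and NodeClaim can be inspected like so (a sketch; substitute the real node name):

  kubectl get pods -A --field-selector spec.nodeName=<node-name>  # the 12 pods still waiting to be evicted
  kubectl describe node <node-name>                               # shows the DisruptionBlocked / FailedDraining events
  kubectl get nodeclaim test-vxtgb -o yaml                        # the NodeClaim named in the log above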

Expected Behavior:
A node owned by Karpenter reaches expireAfter + terminationGracePeriod and all pods are removed. Node is terminated.

I'm not sure whether this is actually a documentation bug, but to my reading the terminationGracePeriod documentation certainly implies that PDBs are overridden once the grace period expires:

Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the terminationGracePeriod is reached.
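
For concreteness, these are the two fields involved, annotated with the behavior we expect based on that sentence (our reading, not confirmed by maintainers):

    spec:
      template:
        spec:
          expireAfter: 168h          # node becomes eligible for disruption after 7 days
          terminationGracePeriod: 4h # once draining starts, pods blocked by PDBs should
                                     # still be forcibly evicted after at most 4 hours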

Reproduction Steps (Please include YAML):

  • Have a NodePool with forceful termination enabled, e.g.:
    spec:
      template:
        spec:
          expireAfter: 1h
          terminationGracePeriod: 1h
  • Create a Deployment:
    kubectl create deployment test --image=nginx --replicas=1
    
  • Add a PDB that won't allow termination:
    kubectl create poddisruptionbudget test --selector=app=test --min-available=1
    
  • Wait. The node never gets terminated by Karpenter. (Equivalent YAML for the Deployment and PDB is sketched below.)
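
A YAML sketch of the Deployment and PDB from the steps above, matching the names and labels the kubectl commands generate:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: test
      labels:
        app: test
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: test
      template:
        metadata:
          labels:
            app: test
        spec:
          containers:
          - name: nginx
            image: nginx
    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: test
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          app: test

With a single replica and minAvailable: 1, the PDB never permits a voluntary eviction, which is exactly what keeps the drain blocked.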

Versions:

  • Chart Version: 1.0.6
  • Kubernetes Version (kubectl version):
    Client Version: v1.31.1
    Kustomize Version: v5.4.2
    Server Version: v1.30.4-eks-a737599
    
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
dpiddock added the kind/bug label on Oct 25, 2024
k8s-ci-robot added the needs-triage label on Oct 25, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@engedaam (Contributor)

engedaam commented Nov 4, 2024

Can you share your Karpenter application configuration?

@dpiddock (Author)

dpiddock commented Nov 6, 2024

We install Karpenter with Helm:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/karpenter-workload
          operator: Exists
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
controller:
  resources:
    limits:
      memory: 1Gi
    requests:
      cpu: 0.25
      memory: 1Gi
dnsPolicy: Default
logLevel: info
podAnnotations:
  prometheus.io/port: "8080"
  prometheus.io/scrape: "true"
podDisruptionBudget:
  maxUnavailable: 1
  name: karpenter
priorityClassName: system-cluster-critical
serviceAccount:
  create: false
  name: karpenter-controller
settings:
  clusterEndpoint: https://[...].eks.amazonaws.com
  clusterName: application-cluster
  interruptionQueue: application-cluster-karpenter-interruption-handler
strategy:
  rollingUpdate:
    maxUnavailable: 1
tolerations:
- effect: NoSchedule
  key: node.kubernetes.io/workload
  operator: Equal
  value: karpenter
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule

A sample EC2NodeClass:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: mixed
spec:
  amiFamily: AL2
  amiSelectorTerms:
  - id: ami-1 # amazon-eks-node-1.30-*
  - id: ami-2 # amazon-eks-arm64-node-1.30-*
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      encrypted: true
      volumeSize: 128Gi
      volumeType: gp3
  detailedMonitoring: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: application-cluster-node
  securityGroupSelectorTerms:
  - id: sg-1
  - id: sg-2
  subnetSelectorTerms:
  - id: subnet-a
  - id: subnet-b
  - id: subnet-c
  tags:
    Edition: mixed
    karpenter.sh/discovery: application-cluster
  userData: |
    #!/bin/bash
    KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
    grep -v search /etc/resolv.conf > /etc/kubernetes/kubelet/resolv.conf
    echo "$(jq '.resolvConf="/etc/kubernetes/kubelet/resolv.conf"' $KUBELET_CONFIG)" > $KUBELET_CONFIG
    echo "$(jq '.registryPullQPS=10' $KUBELET_CONFIG)" >  $KUBELET_CONFIG
    echo "$(jq '.registryBurst=25' $KUBELET_CONFIG)" >  $KUBELET_CONFIG

And the matching NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed
spec:
  disruption:
    budgets:
    - nodes: 10%
    - nodes: "0"
      reasons:
      - Drifted
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "1500"
  template:
    spec:
      expireAfter: 168h # 1 week
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: mixed
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c 
        - m 
        - r 
      - key: karpenter.k8s.aws/instance-cpu
        operator: In
        values:
        - "8" 
        - "16"
        - "32"
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "4" 
      - key: karpenter.k8s.aws/instance-hypervisor
        operator: In
        values:
        - nitro
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - us-east-1a
        - us-east-1b
        - us-east-1c
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
        - spot
      startupTaints:
      - effect: NoExecute
        key: ebs.csi.aws.com/agent-not-ready
      - effect: NoExecute
        key: efs.csi.aws.com/agent-not-ready
      terminationGracePeriod: 4h
  weight: 50

@PavelGloba
PavelGloba commented Nov 19, 2024

If this feature works properly, we are going to migrate to Karpenter.
