[question]: Noticed large amd64 node running in the cluster #167

Closed
2 tasks done
wesbragagt opened this issue Oct 21, 2024 · 1 comment
Labels
question Further information is requested

Comments

@wesbragagt
Contributor

wesbragagt commented Oct 21, 2024

Prior Search

  • I have already searched this project's issues to determine if a similar question has already been asked.

What is your question?

Our users reported today that our AWS cost spiked to $200/day, and I noticed a very large amd64 node running in the cluster. From what I understand, Karpenter shouldn't be provisioning nodes this large.

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.0.135.46
    compatibility.karpenter.k8s.aws/kubelet-drift-hash: "9225586735335466555"
    karpenter.k8s.aws/ec2nodeclass-hash: "4467586684236461009"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
    karpenter.sh/nodepool-hash: "12626618633731391170"
    karpenter.sh/nodepool-hash-version: v3
    karpenter.sh/stored-version-migrated: "true"
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-10-21T09:54:05Z"
  finalizers:
  - karpenter.sh/termination
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: c7a.32xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: us-west-2
    failure-domain.beta.kubernetes.io/zone: us-west-2b
    k8s.io/cloud-provider-aws: 755524269923376f33a5502ad15fb0ce
    karpenter.k8s.aws/instance-category: c
    karpenter.k8s.aws/instance-cpu: "128"
    karpenter.k8s.aws/instance-cpu-manufacturer: amd
    karpenter.k8s.aws/instance-ebs-bandwidth: "40000"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "true"
    karpenter.k8s.aws/instance-family: c7a
    karpenter.k8s.aws/instance-generation: "7"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "262144"
    karpenter.k8s.aws/instance-network-bandwidth: "50000"
    karpenter.k8s.aws/instance-size: 32xlarge
    karpenter.sh/capacity-type: spot
    karpenter.sh/nodepool: spot-65be7294
    karpenter.sh/registered: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-10-0-135-46.us-west-2.compute.internal
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: c7a.32xlarge
    panfactum.com/class: spot
    topology.k8s.aws/zone-id: usw2-az2
    topology.kubernetes.io/region: us-west-2
    topology.kubernetes.io/zone: us-west-2b
  name: ip-10-0-135-46.us-west-2.compute.internal
  ownerReferences:
  - apiVersion: karpenter.sh/v1
    blockOwnerDeletion: true
    kind: NodeClaim
    name: spot-65be7294-w4dfb
    uid: 8576b2c8-7748-4bb2-8df0-451b7e7c8d2a
  resourceVersion: "148736475"
  uid: 89076915-f883-49fd-8316-d08c5d646f24
spec:
  providerID: aws:///us-west-2b/i-0ce228a5039bf14b5
  taints:
  - effect: NoSchedule
    key: node.cilium.io/agent-not-ready
    value: "true"
  - effect: NoSchedule
    key: spot
    value: "true"
status:
  addresses:
  - address: 10.0.135.46
    type: InternalIP
  - address: ip-10-0-135-46.us-west-2.compute.internal
    type: InternalDNS
  - address: ip-10-0-135-46.us-west-2.compute.internal
    type: Hostname
  allocatable:
    cpu: 127610m
    ephemeral-storage: "36778017935"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 257776024Ki
    pods: "110"
  capacity:
    cpu: "128"
    ephemeral-storage: 41071788Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 258492824Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-10-21T15:21:06Z"
    lastTransitionTime: "2024-10-21T09:54:03Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-10-21T15:21:06Z"
    lastTransitionTime: "2024-10-21T09:54:03Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-10-21T15:21:06Z"
    lastTransitionTime: "2024-10-21T09:54:03Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-10-21T15:21:06Z"
    lastTransitionTime: "2024-10-21T09:54:27Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - 730335560480.dkr.ecr.us-west-2.amazonaws.com/quay/cilium/cilium@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746
    sizeBytes: 208385057
  - names:
    - localhost/kubernetes/pause:0.1.0
    sizeBytes: 174872
  nodeInfo:
    architecture: amd64
    bootID: 456eb3db-e303-4a90-a012-fd67c478e309
    containerRuntimeVersion: containerd://1.7.22+bottlerocket
    kernelVersion: 6.1.109
    kubeProxyVersion: v1.29.5-eks-1109419
    kubeletVersion: v1.29.5-eks-1109419
    machineID: ec2ee53d7b1d7232db63ed24f6cedb47
    operatingSystem: linux
    osImage: Bottlerocket OS 1.25.0 (aws-k8s-1.29)
    systemUUID: ec2ee53d-7b1d-7232-db63-ed24f6cedb47

Events:

Name:               ip-10-0-135-46.us-west-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c7a.32xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-west-2
                    failure-domain.beta.kubernetes.io/zone=us-west-2b
                    k8s.io/cloud-provider-aws=755524269923376f33a5502ad15fb0ce
                    karpenter.k8s.aws/instance-category=c
                    karpenter.k8s.aws/instance-cpu=128
                    karpenter.k8s.aws/instance-cpu-manufacturer=amd
                    karpenter.k8s.aws/instance-ebs-bandwidth=40000
                    karpenter.k8s.aws/instance-encryption-in-transit-supported=true
                    karpenter.k8s.aws/instance-family=c7a
                    karpenter.k8s.aws/instance-generation=7
                    karpenter.k8s.aws/instance-hypervisor=nitro
                    karpenter.k8s.aws/instance-memory=262144
                    karpenter.k8s.aws/instance-network-bandwidth=50000
                    karpenter.k8s.aws/instance-size=32xlarge
                    karpenter.sh/capacity-type=spot
                    karpenter.sh/nodepool=spot-65be7294
                    karpenter.sh/registered=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-135-46.us-west-2.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=c7a.32xlarge
                    panfactum.com/class=spot
                    topology.k8s.aws/zone-id=usw2-az2
                    topology.kubernetes.io/region=us-west-2
                    topology.kubernetes.io/zone=us-west-2b
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.135.46
                    compatibility.karpenter.k8s.aws/kubelet-drift-hash: 9225586735335466555
                    karpenter.k8s.aws/ec2nodeclass-hash: 4467586684236461009
                    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
                    karpenter.sh/nodepool-hash: 12626618633731391170
                    karpenter.sh/nodepool-hash-version: v3
                    karpenter.sh/stored-version-migrated: true
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 21 Oct 2024 04:54:05 -0500
Taints:             node.cilium.io/agent-not-ready=true:NoSchedule
                    spot=true:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-135-46.us-west-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Mon, 21 Oct 2024 10:28:55 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 21 Oct 2024 10:26:12 -0500   Mon, 21 Oct 2024 04:54:03 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 21 Oct 2024 10:26:12 -0500   Mon, 21 Oct 2024 04:54:03 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 21 Oct 2024 10:26:12 -0500   Mon, 21 Oct 2024 04:54:03 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 21 Oct 2024 10:26:12 -0500   Mon, 21 Oct 2024 04:54:27 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.135.46
  InternalDNS:  ip-10-0-135-46.us-west-2.compute.internal
  Hostname:     ip-10-0-135-46.us-west-2.compute.internal
Capacity:
  cpu:                128
  ephemeral-storage:  41071788Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             258492824Ki
  pods:               110
Allocatable:
  cpu:                127610m
  ephemeral-storage:  36778017935
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             257776024Ki
  pods:               110
System Info:
  Machine ID:                 ec2ee53d7b1d7232db63ed24f6cedb47
  System UUID:                ec2ee53d-7b1d-7232-db63-ed24f6cedb47
  Boot ID:                    456eb3db-e303-4a90-a012-fd67c478e309
  Kernel Version:             6.1.109
  OS Image:                   Bottlerocket OS 1.25.0 (aws-k8s-1.29)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.22+bottlerocket
  Kubelet Version:            v1.29.5-eks-1109419
  Kube-Proxy Version:         v1.29.5-eks-1109419
ProviderID:                   aws:///us-west-2b/i-0ce228a5039bf14b5
Non-terminated Pods:          (1 in total)
  Namespace                   Name            CPU Requests  CPU Limits  Memory Requests  Memory Limits   Age
  ---------                   ----            ------------  ----------  ---------------  -------------   ---
  cilium                      cilium-67t72    100m (0%)     0 (0%)      272061154 (0%)   353679500 (0%)  5h34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests        Limits
  --------           --------        ------
  cpu                100m (0%)       0 (0%)
  memory             272061154 (0%)  353679500 (0%)
  ephemeral-storage  0 (0%)          0 (0%)
  hugepages-1Gi      0 (0%)          0 (0%)
  hugepages-2Mi      0 (0%)          0 (0%)
Events:
  Type    Reason             Age                   From       Message
  ----    ------             ----                  ----       -------
  Normal  DisruptionBlocked  3m56s (x43 over 91m)  karpenter  Cannot disrupt Node: state node isn't initialized
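
For scale, the describe output above shows a single Cilium pod requesting 100m CPU on a node with 127610m allocatable. A minimal Python sketch of that arithmetic follows; `parse_cpu` and `requested_fraction` are hypothetical helpers, and `parse_cpu` handles only the plain and milli-CPU forms seen in this output, not the full Kubernetes quantity grammar.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "128" or "127610m" to cores.

    Hypothetical helper: it only handles the two forms that appear in
    the node output above (whole cores and the "m" milli-CPU suffix).
    """
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def requested_fraction(requested: str, allocatable: str) -> float:
    """Return the share of allocatable CPU that is requested (0.0 to 1.0)."""
    return parse_cpu(requested) / parse_cpu(allocatable)


if __name__ == "__main__":
    # Values copied from the describe output in this issue.
    fraction = requested_fraction("100m", "127610m")
    print(f"{fraction:.4%} of allocatable CPU is requested")
```

With the values from this node, the requested share comes out well under 0.1% of allocatable CPU, which matches the `(0%)` shown in the "Allocated resources" table.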

What primary components of the stack does this relate to?

terraform

Code of Conduct

  • I agree to follow this project's Code of Conduct
wesbragagt added the question (Further information is requested) and triage (Needs to be triaged) labels on Oct 21, 2024
@fullykubed
Member

@wesbragagt This is a problem in v1 Karpenter that we discovered: kubernetes-sigs/karpenter#1762.

The edge.24-10-21 release provides a workaround for this issue.
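
To check whether other nodes in a cluster are similarly oversized, one option is to scan the node list for Karpenter-managed nodes above a chosen vCPU count. The sketch below is only an illustration under stated assumptions and is not the workaround shipped in the release: it relies on the `karpenter.k8s.aws/instance-cpu` and `karpenter.sh/nodepool` labels seen on the node in this issue, and the function name `oversized_nodes` and the 32 vCPU threshold are hypothetical choices.

```python
import json
import sys


def oversized_nodes(node_list: dict, max_cpu: int = 32) -> list:
    """Return (name, nodepool, cpu) tuples for nodes with more than max_cpu vCPUs.

    Expects the structure produced by `kubectl get nodes -o json`.
    The 32 vCPU default is an arbitrary, hypothetical threshold.
    """
    found = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        labels = metadata.get("labels", {})
        cpu = labels.get("karpenter.k8s.aws/instance-cpu")
        if cpu is None:
            continue  # node does not carry the Karpenter instance-cpu label
        if int(cpu) > max_cpu:
            nodepool = labels.get("karpenter.sh/nodepool", "")
            found.append((metadata.get("name", ""), nodepool, int(cpu)))
    return found


if __name__ == "__main__":
    # Reads the JSON node list from standard input.
    for name, nodepool, cpu in oversized_nodes(json.load(sys.stdin)):
        print(f"{name}\t{nodepool}\t{cpu} vCPU")
```

Assuming the script is saved as `find_big_nodes.py` (a hypothetical file name), it could be run as `kubectl get nodes -o json | python find_big_nodes.py`.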

fullykubed removed the triage (Needs to be triaged) label on Oct 21, 2024

2 participants