Karpenter - Reconciliation not possible for custom named instances #300

Closed
1 task done
paba19 opened this issue Nov 4, 2023 · 3 comments · Fixed by #315

paba19 commented Nov 4, 2023

Description

The Karpenter controller cannot reconcile (terminate) instances whose name does not include "karpenter".

Looking at the code, it seems that the IAM policy for the IRSA role expects the instance name to always include "karpenter" (see link).
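
For illustration, the kind of statement that would produce this behaviour looks roughly like the sketch below. It is not copied from the module: the data source name, the wildcard resource ARN, and the `*karpenter*` pattern are assumptions inferred from the error message and the description above.

```hcl
# Hypothetical sketch, not the module's actual source.
data "aws_iam_policy_document" "karpenter_terminate_sketch" {
  statement {
    actions   = ["ec2:TerminateInstances"]
    resources = ["arn:aws:ec2:*:*:instance/*"]

    # Termination is allowed only when the instance's Name tag matches the
    # pattern, so an instance tagged "Name = eks-jit-nodes" is not covered.
    condition {
      test     = "StringLike"
      variable = "ec2:ResourceTag/Name"
      values   = ["*karpenter*"]
    }
  }
}
```

With a condition like this, no statement allows `ec2:TerminateInstances` for instances with a custom name, which would match the "no identity-based policy allows" error reported under Actual behaviour.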

  • ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: 1.10.1 (as pinned in the reproduction code)
  • Terraform version:
    Terraform v1.6.1
    on linux_amd64
  • Provider version(s):
    • provider registry.terraform.io/gavinbunney/kubectl v1.14.0
    • provider registry.terraform.io/hashicorp/aws v5.0.1
    • provider registry.terraform.io/hashicorp/cloudinit v2.3.2
    • provider registry.terraform.io/hashicorp/helm v2.11.0
    • provider registry.terraform.io/hashicorp/http v3.4.0
    • provider registry.terraform.io/hashicorp/kubernetes v2.23.0
    • provider registry.terraform.io/hashicorp/time v0.9.1
    • provider registry.terraform.io/hashicorp/tls v4.0.4

Reproduction Code [Required]

module "eks-blueprints-addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "1.10.1"

  cluster_name      = local.cluster_name
  cluster_endpoint  = module.eks_cluster.cluster_endpoint
  cluster_version   = var.eks_cluster_version
  oidc_provider_arn = module.eks_cluster.oidc_provider_arn

  # NOTE: eks_addons was left unclosed in the original report; its contents are not shown.
  eks_addons = {}

  karpenter_enable_spot_termination = false
  enable_karpenter                  = true
  karpenter_node = {
    iam_role_arn    = module.eks_cluster.eks_managed_node_groups.eks-static-nodes.iam_role_arn
    create_iam_role = false
  }
  karpenter = {
    chart_version = "v${var.eks_karpenter_version}"
    values = [
      <<-EOT
        controller:
          image:
            tag: v${var.eks_karpenter_version}
        tolerations:
          - key: node-taints.example.com/scope
            value: infrastructure
            operator: Equal
        nodeSelector:
          node-labels.example.com/scope: "infrastrucutre"
      EOT
    ]
    wait = true
  }
  depends_on = [
    module.eks_cluster.eks_managed_node_groups
  ]
  tags = var.common_tags
}

resource "kubectl_manifest" "karpenter_provisioner" {
  yaml_body = <<-EOF
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        metadata:
          labels:
            node-labels.example.com/scope: "application"
        spec:
          nodeClassRef:
            name: default
            apiVersion: karpenter.k8s.aws/v1beta1
            kind: EC2NodeClass
          taints:
          - key: node-taints.example.com/scope
            value: application
            effect: NoSchedule
          requirements:
          - key: "karpenter.k8s.aws/instance-category"
            operator: In
            values: ["t", "c"]
          - key: "karpenter.k8s.aws/instance-cpu"
            operator: In
            values: ["2", "4"]
          - key: "karpenter.k8s.aws/instance-memory"
            operator: Lt
            values: ["2048"]
          - key: "karpenter.k8s.aws/instance-generation"
            operator: Gt
            values: ["2"]
          - key: "topology.kubernetes.io/zone"
            operator: In
            values: ["eu-central-1a", "eu-central-1b"]
          - key: "kubernetes.io/arch"
            operator: In
            values: ["amd64"]
          - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
            operator: In
            values: ["on-demand"]
      disruption:
        expireAfter: Never
        consolidationPolicy: WhenEmpty
        consolidateAfter: 30s
      limits:
        cpu: "100"
        memory: 100Gi
    EOF

  depends_on = [
    module.eks-blueprints-addons
  ]
}

resource "kubectl_manifest" "karpenter_node_template" {
  yaml_body = <<-EOF
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      subnetSelectorTerms:          
        - tags:
            karpenter.sh/discovery/node: ${var.k8s_private_sub_tags["karpenter.sh/discovery/node"]}
            karpenter.sh/discovery: ${var.k8s_private_sub_tags["karpenter.sh/discovery"]}
      securityGroupSelectorTerms:   
        - tags:
            kubernetes.io/cluster/${local.cluster_name}: "owned"
      role: "${module.eks_cluster.eks_managed_node_groups.eks-static-nodes.iam_role_name}"
      tags:                  
        Name: eks-jit-nodes
      detailedMonitoring: false
    EOF

  depends_on = [
    module.eks-blueprints-addons
  ]
}

Steps to reproduce the behavior:

  • Apply the above configuration
  • Trigger node creation
  • Scale the deployment back in to trigger instance termination

Expected behaviour

Nodes should be terminated

Actual behaviour

Instances are not terminated, and the controller logs:

{"level":"ERROR","time":"2023-11-04T19:59:21.653Z","logger":"controller","message":"Reconciler error","commit":"1072d3b","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-REDACTED.eu-central-1.compute.internal"},"namespace":"","name":"ip-REDACTED.eu-central-1.compute.internal","reconcileID":"5da575eb-5271-400c-8b89-ccde1c977275","error":"terminating cloudprovider instance, terminating instance, UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:sts::REDACTED:assumed-role/karpenter-20231104182048115600000018/1699126836611442743 is not authorized to perform: ec2:TerminateInstances on resource: arn:aws:ec2:eu-central-1:REDACTED:instance/i-REDACTED because no identity-based policy allows the ec2:TerminateInstances action.

Additional context

var.eks_karpenter_version is 0.32.1

@bryantbiggs
Contributor

You'll need to update to v1.11.0, which contains the necessary changes (#298).
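
As a rough sketch of that upgrade, assuming the module block from the reproduction code, only the `version` pin changes and the remaining arguments stay as they were:

```hcl
module "eks-blueprints-addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "1.11.0" # previously "1.10.1"

  # ... remaining arguments unchanged from the reproduction code ...
}
```

After changing the pin, `terraform init -upgrade` should pull the new module version before the next plan or apply.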

@paba19
Author

paba19 commented Nov 4, 2023

Thanks @bryantbiggs, I'll test it again with the new version, but having checked the code at tag v1.11.0, I don't quite see how it would help: https://github.com/aws-ia/terraform-aws-eks-blueprints-addons/blob/v1.11.0/main.tf#L2841

It looks like it is still not customisable.

Note that in the example above, the EC2NodeClass gives the instances the name eks-jit-nodes.

Ideally, I would like to be able to set a Terraform variable that is passed consistently to both the IAM policy and the EC2NodeClass.
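
One way to picture this is a single variable feeding both places. The sketch below is hypothetical: `karpenter_node_name_tag`, the standalone policy document, and the `karpenter_node_class_sketch` resource are illustrative names, and nothing in this thread indicates that the module exposes such an input.

```hcl
# Hypothetical variable; not an input of the eks-blueprints-addons module.
variable "karpenter_node_name_tag" {
  type    = string
  default = "eks-jit-nodes"
}

# Illustrative policy statement scoped to the same value.
data "aws_iam_policy_document" "karpenter_terminate_custom_name" {
  statement {
    actions   = ["ec2:TerminateInstances"]
    resources = ["arn:aws:ec2:*:*:instance/*"]

    condition {
      test     = "StringEquals"
      variable = "ec2:ResourceTag/Name"
      values   = [var.karpenter_node_name_tag]
    }
  }
}

# The EC2NodeClass would then reference the same variable in its tags.
resource "kubectl_manifest" "karpenter_node_class_sketch" {
  yaml_body = <<-EOF
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      # ... amiFamily, selectors, and role as in the reproduction code ...
      tags:
        Name: ${var.karpenter_node_name_tag}
  EOF
}
```

Keeping the tag value in one variable means the policy condition and the instance tag cannot drift apart.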

Thanks again

@bryantbiggs
Contributor

For now, I would not set the Name tag on the nodes; the permissions will then work as intended. We will be updating the permissions for Karpenter to re-align with the upstream project in #286.
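
Applied to the reproduction code above, that workaround amounts to dropping the `tags.Name` entry from the EC2NodeClass, roughly as sketched here. Everything else is carried over from the original manifest. That the name Karpenter assigns by default satisfies the policy is taken from the comment above, not verified here.

```hcl
resource "kubectl_manifest" "karpenter_node_template" {
  yaml_body = <<-EOF
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery/node: ${var.k8s_private_sub_tags["karpenter.sh/discovery/node"]}
            karpenter.sh/discovery: ${var.k8s_private_sub_tags["karpenter.sh/discovery"]}
      securityGroupSelectorTerms:
        - tags:
            kubernetes.io/cluster/${local.cluster_name}: "owned"
      role: "${module.eks_cluster.eks_managed_node_groups.eks-static-nodes.iam_role_name}"
      # No custom Name tag: the instance keeps whatever name Karpenter assigns.
      detailedMonitoring: false
  EOF

  depends_on = [
    module.eks-blueprints-addons
  ]
}
```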
