Error during EKS Creation: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused #1280

Closed
lbornov2 opened this issue Mar 21, 2021 · 73 comments · Fixed by #1680

Comments

@lbornov2

lbornov2 commented Mar 21, 2021

Description

When creating an EKS cluster using Terraform, we get the following error:

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused

  on .terraform/modules/deployment.eks/aws_auth.tf line 65, in resource "kubernetes_config_map" "aws_auth":
  65: resource "kubernetes_config_map" "aws_auth" {

To fix this, we have to manually run:

aws eks update-kubeconfig --name ${var.context.app_name} --region ${var.context.region}

and then:

terraform apply -auto-approve

Versions

  • Terraform: 0.12.21
  • Provider(s):
  • aws - 3.22.0
  • terraform-aws-modules/eks/aws - 14.0.0
  • terraform-aws-modules/vpc/aws - 2.61.0
  • AWS CLI: 2.0.30
  • Helm: 3.3.4
  • Kubectl: 1.19.0

Reproduction

  1. Run the terraform code in the Code Snippet to Reproduce section
  2. Run terraform init && terraform apply -auto-approve
  3. You will get the error in the description.

To fix, manually run:

  1. aws eks update-kubeconfig --name ${var.context.app_name} --region ${var.context.region}
  2. terraform apply -auto-approve

Code Snippet to Reproduce

terraform {
  required_version = ">= 0.12.21"
}

provider "aws" {
  version = "~> 3.22.0"
  region  = "${var.context.region}"
}


data "aws_availability_zones" "available" {}

resource "random_string" "suffix" {
  length  = 8
  special = false
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.61.0"

  name                 = "${var.context.app_name}"
  cidr                 = "10.0.0.0/16"
  azs                  = data.aws_availability_zones.available.names
  private_subnets      = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets       = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  tags = {
    "kubernetes.io/cluster/${var.context.app_name}" = "shared"
  }

  public_subnet_tags = {
    "kubernetes.io/cluster/${var.context.app_name}" = "shared"
    "kubernetes.io/role/elb"                      = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.context.app_name}" = "shared"
    "kubernetes.io/role/internal-elb"             = "1"
  }
}

resource "aws_security_group" "all_worker_mgmt" {
  name_prefix = "${var.context.app_name}-all_worker_management"
  vpc_id      = "${module.vpc.vpc_id}"

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
      "172.16.0.0/12",
      "192.168.0.0/16",
    ]
  }
}

module "eks" {
  source                               = "terraform-aws-modules/eks/aws"
  version                              = "14.0.0"
  cluster_name                         = "${var.context.app_name}"
  cluster_version                      = "1.19"
  subnets                              = "${module.vpc.private_subnets}"
  vpc_id                               = "${module.vpc.vpc_id}"
  cluster_create_timeout               = "30m"
  worker_groups = [
    {
      instance_type = "${var.context.kubernetes.aws.machine_type}"
      asg_desired_capacity = "${var.context.replica_count}"
      asg_min_size = "${var.context.replica_count}"
      asg_max_size  = "${var.context.replica_count}"
      root_volume_type = "gp2"
    }
  ]
  worker_additional_security_group_ids = ["${aws_security_group.all_worker_mgmt.id}"]
  map_users = var.context.iam.aws.map_users
  map_roles = var.context.iam.aws.map_roles
}

Expected behavior

The cluster gets created successfully

Actual behavior

We get this output:

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused

  on .terraform/modules/deployment.eks/aws_auth.tf line 65, in resource "kubernetes_config_map" "aws_auth":
  65: resource "kubernetes_config_map" "aws_auth" {
@dak1n1

dak1n1 commented Mar 23, 2021

It looks like the Kubernetes provider isn't receiving a configuration. Here's how I configure mine, which is similar to the EKS module README except it's for the newer version of the Kubernetes provider. (My team recently released version 2.0 of the Kubernetes provider and it requires a slightly different config than shown in this module's README).

data "aws_eks_cluster" "default" {
  name = module.cluster.cluster_id
}

data "aws_eks_cluster_auth" "default" {
  name = module.cluster.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

@lbornov2
Author

@dak1n1 - The code example I provided does NOT instantiate the kubernetes provider (at least not directly). The error happens during creation of the EKS module - not before or after. So - the problem happens in the EKS module itself - not outside of the module.

@dak1n1

dak1n1 commented Mar 23, 2021

Right, but the EKS module uses the Kubernetes provider under the hood, so it needs a provider configuration. Otherwise, it will assume a default/empty config. You can skip using the Kubernetes provider within the EKS module by specifying manage_aws_auth = false in your EKS module config. That will skip the section that relies on the Kubernetes provider.

module "cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "14.0.0" 
...
  manage_aws_auth  = false
...
}

@ArchiFleKs
Contributor

I have the same issue: on a newly created cluster everything is fine, but then I change the module tag and get this error. The weird thing is that if I put everything back exactly as it is in the state, everything works as expected.

@RobertFischer

RobertFischer commented Mar 30, 2021

@dak1n1 -- How am I supposed to instantiate the Kubernetes provider using output from this module, but before I instantiate this module?

(Also, is everyone who specifies manage_aws_auth = true experiencing this issue? If not, what's different that allows them to avoid it? In the alternative, if we are all experiencing this issue, then isn't manage_aws_auth = true straight-up broken?)

@dak1n1

dak1n1 commented Mar 30, 2021

> @dak1n1 -- How am I supposed to instantiate the Kubernetes provider using output from this module, but before I instantiate this module?
>
> (Also, is everyone who specifies manage_aws_auth = true experiencing this issue? If not, what's different that allows them to avoid it? In the alternative, if we are all experiencing this issue, then isn't manage_aws_auth = true straight-up broken?)

The issues vary depending on the user's configuration, but they're all related to a subject I'm currently researching, which is why I'm volunteering my time here. The EKS module, which is a community-driven effort (not managed by HashiCorp), is using a pattern that is discouraged by the creators of Terraform. (Specifically, where it says a provider config should only reference values that are known before the configuration is applied; that's the issue we're hitting here.)

The issues described in this bug report all have the same root cause: variables are being passed into a provider configuration that are not known at plan time. Terraform simply doesn't support that, so they encourage instead separating out the AWS provider resources and the Kubernetes provider resources, so you can use two applies when needed. However, there are some work-arounds that can still help to achieve this workflow. Since so many users want to use this in a single apply, this is my area of interest: trying to enable that pattern to succeed despite the current limitations in Terraform.

TLDR: you can copy/paste the config I gave above, which will only read the EKS cluster after the variables are known. You can also use the config in this repo's README, which is another way to establish this dependency.

However, since this pattern is not actually supported in Terraform, there will be times when it will fail. Specifically, when the EKS cluster's credentials become unknown, such as when replacing the cluster, or during destroy on Terraform 0.14.x. To avoid these errors, there are work-arounds such as terraform refresh prior to destroy, and removing the kubernetes config map from state prior to replacing/modifying the EKS cluster (I believe that would look something like terraform state rm module.cluster.kubernetes_config_map.aws-auth, but I don't know what impact that would have on the EKS worker nodes).

This page can be helpful for learning about configuring providers that are used with modules. https://www.terraform.io/docs/language/modules/develop/providers.html

I also have a couple working configs here, if anyone wants to reference them.

On the Kubernetes provider side, I have some plans to smooth this out a bit and provide more meaningful error messages. I'm also planning to implement a version of this old request that I found when researching the topic. That will allow the Kubernetes provider to keep trying to contact the Kubernetes API, rather than failing immediately with the obscure localhost error we all see. So there are some changes hopefully coming in the next few months.

@RobertFischer

RobertFischer commented Mar 30, 2021 via email

@jgournet

It probably won't apply to many people, but in case it helps: I had this config:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  config_context         = data.aws_eks_cluster.this.arn
  token                  = data.aws_eks_cluster_auth.this.token
}

Somehow, removing the "config_context" line made this error disappear ...
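
For reference, the block without config_context would then look like this (a sketch based on the config above):

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}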

@charneykaye

charneykaye commented Jun 15, 2021

I have never set manage_aws_auth, yet I experienced this issue after I changed the name of my eks module.

The resolution was to:

  • use terraform state list in order to discover the resources that were stored under the legacy module name
  • use terraform state rm to remove the resources stored under the legacy module name
  • use terraform import to re-import the module's resources under the new name

@stevehipwell
Contributor

@dak1n1 this issue is being caused by hashicorp/terraform#24886 (which is also a pretty large pain point in implementing a HashiCorp-native credential flow, e.g. OIDC -> Vault STS -> AWS provider). Without this being solved, what is the logic for an aws_eks_cluster_auth token not being created until something else has been configured? The current "best practice" pattern errors when you plan and don't apply for over 15 minutes; it also errors when you've layered other code on top of this module and a control plane or managed worker change takes over 15 minutes to complete.

TL;DR - Is there a hack to make an aws_eks_cluster_auth data source wait for something else before being calculated, so we can target these tokens to be ready when they're not going to expire?
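
For illustration, a rough sketch of the kind of hack I mean (placeholder names; module-level depends_on needs Terraform 0.13+):

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id

  # A depends_on in a data block defers the read to apply time, so the token is
  # only generated once the cluster (and anything else listed here) exists.
  depends_on = [module.eks]
}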

@ArchiFleKs
Contributor

@stevehipwell Do you think this might be related to the issue you mentioned above?

@stevehipwell
Contributor

@ArchiFleKs I suspect that it's related and I meant to add a comment to that effect. We're having to delete the aws_auth config map from state when we destroy a cluster, see #1280 (comment).

@ArchiFleKs
Contributor

ArchiFleKs commented Jun 21, 2021

@stevehipwell Yes my workaround is:

  • remove aws_auth from state
  • apply with manage_aws_auth=false
  • import configmap aws_auth to state
  • apply with manage_aws_auth=true

@stevehipwell
Contributor

@ArchiFleKs have you tried reverting the Kubernetes provider version?

@ArchiFleKs
Contributor

> @ArchiFleKs have you tried reverting the Kubernetes provider version?

To which version? Before v2.0?

@stevehipwell
Contributor

That'd be my first suggestion, and if it works, see if any of the v2 versions also work.
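
A sketch of what pinning back might look like in the root module (version is just an example; the source syntax needs Terraform 0.13+):

terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
      # Example pre-2.0 pin, purely to test whether the provider version matters here
      version = "~> 1.13"
    }
  }
}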

@jaimehrubiks
Contributor

I am also looking into best practices, although currently I use a single terraform run to deploy some dependencies, this eks module, and a bunch of helm charts and kubernetes yaml files.

Currently, using kubernetes provider version = "~> 1.11.1" and Terraform 0.14.11 (and also manage_aws_auth=true), I am not experiencing big issues. (I do need to refresh before destroy, but I feel that is expected.) My provider blocks look like this:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

provider "kubectl" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
}

I thought I'd share my versions and blocks in case someone finds them useful.

Still, I'll keep following the discussion in case I hit an issue in the future and we can come up with workarounds.

@PascalBourdier
Contributor

We encountered this problem too, and @grandria found another workaround: set KUBE_CONFIG_PATH to a valid value (export KUBE_CONFIG_PATH=$KUBECONFIG), per the official doc: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/v2-upgrade-guide
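
The same thing can also be expressed directly in the provider block instead of via an environment variable; a sketch (the path is just an example):

provider "kubernetes" {
  # Equivalent to exporting KUBE_CONFIG_PATH: point the provider at an existing kubeconfig
  config_path = "~/.kube/config"
}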

@KarstenSiemer
Contributor

I experience this issue too, and like @jaimehrubiks I am also using eks as a submodule and install a good amount of helm charts and other stuff into the cluster. I have had huge problems with token lifetimes in the past and always had to spin up clusters in two steps: first create the cluster, then hit an error because there is no token, then replan and apply now that a token exists.

I configured my providers like this in the submodule to overcome these problems:

data "aws_eks_cluster" "this" {
  name       = module.eks_control_plane.cluster.name
  depends_on = [module.eks_control_plane.cluster]
}

data "aws_eks_cluster_auth" "this" {
  count      = module.eks_control_plane.cluster != null ? 1 : 0
  name       = module.eks_control_plane.cluster.name
  depends_on = [module.eks_control_plane.cluster]
}

provider "kubernetes" {
  alias                  = "initial"
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this[0].token
}

data "aws_eks_cluster" "iam" {
  name       = module.eks_control_plane.cluster.name
  depends_on = [module.eks_control_plane.cluster, module.aws_auth]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.iam.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.iam.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.iam.id, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
    command     = "aws"
  }
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.iam.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.iam.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1alpha1"
      args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.iam.id, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
      command     = "aws"
    }
  }
}

With this I can spin up a cluster and install helm charts and other stuff in a single plan/apply.
Using the aws eks get-token command it is easy to get a long-lived token for helm, but I need to add the configuration for Atlantis to the cluster first so that I can actually get such a token. This is done by the "initial" provider: it installs the aws-auth configmap, which is managed in a separate module apart from the eks cluster itself.
Yet I experience this error:

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

The data blocks used to configure the provider are somehow empty, and I cannot just reference a kubeconfig, since I still want to be able to use Atlantis and its easily assumable IAM roles for authentication.

If I get this error, I go into the local Terraform file cache under .terraform/modules/$module-name and edit the provider files for the clusters to look like this:

data "aws_eks_cluster" "this" {
  name       = module.eks_control_plane.cluster.name
  depends_on = [module.eks_control_plane.cluster]
}

data "aws_eks_cluster_auth" "this" {
  count      = module.eks_control_plane.cluster != null ? 1 : 0
  name       = module.eks_control_plane.cluster.name
  depends_on = [module.eks_control_plane.cluster]
}

provider "kubernetes" {
  alias                  = "initial"
  host                   = module.eks_control_plane.cluster.endpoint
  cluster_ca_certificate = base64decode(module.eks_control_plane.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", module.eks_control_plane.cluster.name, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
    command     = "aws"
  }
}

data "aws_eks_cluster" "iam" {
  name       = module.eks_control_plane.cluster.name
  depends_on = [module.eks_control_plane.cluster, module.aws_auth]
}

provider "kubernetes" {
  host                   = module.eks_control_plane.cluster.endpoint
  cluster_ca_certificate = base64decode(module.eks_control_plane.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", module.eks_control_plane.cluster.name, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
    command     = "aws"
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks_control_plane.cluster.endpoint
    cluster_ca_certificate = base64decode(module.eks_control_plane.cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1alpha1"
      args        = ["eks", "get-token", "--cluster-name", module.eks_control_plane.cluster.name, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
      command     = "aws"
    }
  }
}

This works every time once the cluster is already spun up. But I can't just use it like that, since I can no longer spin up a cluster in a single plan/apply. I don't understand why the data blocks become stale and empty, though; even refreshing them doesn't help at all.

@ArchiFleKs
Contributor

> We encountered this problem too, and @grandria found another workaround: set KUBE_CONFIG_PATH to a valid value (export KUBE_CONFIG_PATH=$KUBECONFIG), per the official doc: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/v2-upgrade-guide

This technique actually works, but it is a bit awkward to set up when running in CI, for example, where a kubeconfig is not present.

@KarstenSiemer
Contributor

KarstenSiemer commented Aug 23, 2021

Okay, I think I have solved the problem, at least for me.
I just moved the data blocks into the eks module and set outputs with a depends_on, like this (inside module.eks_control_plane):

resource "time_sleep" "wait" {
  depends_on = [aws_eks_cluster.this[0]]

  create_duration = "30s"
}

data "aws_eks_cluster" "this" {
  count      = var.enabled ? 1 : 0
  name       = aws_eks_cluster.this[0].name
  depends_on = [aws_eks_cluster.this[0], time_sleep.wait]
}

data "aws_eks_cluster_auth" "this" {
  count      = var.enabled ? 1 : 0
  name       = aws_eks_cluster.this[0].name
  depends_on = [aws_eks_cluster.this[0], time_sleep.wait]
}

output "aws_eks_cluster" {
  value      = var.enabled ? data.aws_eks_cluster.this[0] : null
  depends_on = [aws_eks_cluster.this[0], time_sleep.wait]
}

output "aws_eks_cluster_auth" {
  value      = var.enabled ? data.aws_eks_cluster_auth.this[0] : null
  depends_on = [aws_eks_cluster.this[0], time_sleep.wait]
}

(I know putting the depends_on twice is actually redundant, but I just like to make really sure)
Anyway, I then refer to that in my parent module inside the providers:

provider "kubernetes" {
  alias                  = "initial"
  host                   = module.eks_control_plane.aws_eks_cluster.endpoint
  cluster_ca_certificate = base64decode(module.eks_control_plane.aws_eks_cluster.certificate_authority[0].data)
  token                  = module.eks_control_plane.aws_eks_cluster_auth.token
}

provider "kubernetes" {
  host                   = module.eks_control_plane.aws_eks_cluster.endpoint
  cluster_ca_certificate = base64decode(module.eks_control_plane.aws_eks_cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", module.eks_control_plane.aws_eks_cluster.id, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
    command     = "aws"
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks_control_plane.aws_eks_cluster.endpoint
    cluster_ca_certificate = base64decode(module.eks_control_plane.aws_eks_cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1alpha1"
      args        = ["eks", "get-token", "--cluster-name", module.eks_control_plane.aws_eks_cluster.id, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/Atlantis"]
      command     = "aws"
    }
  }
}

That module gets sourced multiple times in a single AWS account to spin up a load of clusters, and this approach has given me the least problems. The sleep makes the API more resilient to timing problems (or so it feels, at least).

@davidgiga1993

I'm facing the same issue after trying to change the tags of the cluster

@ikarlashov

Okay, I found out what the problem is. I spent the entire day fixing it :)

TL;DR

  1. Move the k8s provider configuration into aws_auth.tf.
  2. Set the correct configuration for the k8s provider as shown below.
  3. Configure the AWS CLI before running Terraform.

It's a bad idea to set manage_aws_auth to false as someone suggested earlier. There's a block in the EKS module that depends on it, so just keep the default for manage_aws_auth, which is "true".

First of all, I put the k8s provider configuration inside aws_auth.tf:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
}

This is the only correct configuration for EKS, and it is the one pointed out in the official docs.

You can also use a pre-generated kubeconfig file from aws eks update-kubeconfig, but that uses the same aws eks get-token under the hood, so you don't have a hardcoded token either way; it is always generated dynamically by the aws command.

Back to the main point: since an aws command is used to get the token, you need to configure the AWS CLI before running Terraform:

aws configure set aws_access_key_id ${AWS_ACCESS_KEY_ID} --profile ${AWS_PROFILE}
aws configure set aws_secret_access_key $AWS_SECRET_ACCESS_KEY --profile ${AWS_PROFILE}
aws configure set region $AWS_DEFAULT_REGION --profile ${AWS_PROFILE}

And Voila, it works!

@stevehipwell
Contributor

We've been told by Hashicorp to always use the exec plugin for Kubernetes providers to stop this issue or ones like it. FYI Azure AKS has an even bigger problem with this than AWS EKS.
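
A minimal sketch of that pattern for the kubernetes provider on EKS (output names assume a recent version of this module; api_version may need to be v1beta1 with newer AWS CLI releases):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}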

@fangj99

fangj99 commented Oct 6, 2021

We hit the same error when we tried to add extra subnets to the VPC and EKS; after removing the extra subnets, the error disappeared.

huguesalary added a commit to huguesalary/terraform-aws-eks that referenced this issue Oct 20, 2021
A user of this module can subsequently use this ConfigMap output as they wish, in their own module.

This should help with issue terraform-aws-modules#1280

```
resource "kubernetes_config_map" "aws_auth" {

  metadata {
    name      = module.eks.config_map_aws_auth_yaml.metadata.name
    namespace = module.eks.config_map_aws_auth_yaml.metadata.namespace
    labels    = module.eks.config_map_aws_auth_yaml.metadata.labels
  }

  data = module.eks.config_map_aws_auth_yaml.data
}
```
@github-actions

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Dec 13, 2021
@gkzz

gkzz commented Dec 14, 2021

same here.

$ terraform plan
╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│ 
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/aws_auth.tf line 63, in resource "kubernetes_config_map" "aws_auth":
│   63: resource "kubernetes_config_map" "aws_auth" {
$ terraform version
Terraform v1.1.0
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.63.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.6.1
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1
$ cat kubernetes.tf
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  token                  = data.aws_eks_cluster_auth.cluster.token
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}

@github-actions github-actions bot removed the stale label Dec 15, 2021
@fabidick22

> We hit the same error when we tried to add extra subnets to the VPC and EKS; after removing the extra subnets, the error disappeared.

I was able to solve my problem in the same way. I have one module to manage network resources and another to manage the EKS cluster.
I first created the cluster with two subnets, but then we had a requirement to create another subnet. After applying that change I had problems with my EKS module, and I had to specify only my two old subnets to avoid the problem (private_subnet_id[0], private_subnet_id[1]).

@philomory

> We hit the same error when we tried to add extra subnets to the VPC and EKS; after removing the extra subnets, the error disappeared.

> I was able to solve my problem in the same way. I have one module to manage network resources and another to manage the EKS cluster. I first created the cluster with two subnets, but then we had a requirement to create another subnet. After applying that change I had problems with my EKS module, and I had to specify only my two old subnets to avoid the problem (private_subnet_id[0], private_subnet_id[1]).

This happened to us as well; we wanted to expand our cluster into additional subnets (specifically for the worker node ASGs, not so much the control plane). Of course, by default the ASGs use the same set of subnets as the cluster control plane, and, apparently, you can't change the subnets of the cluster control plane. That said, it's a bit wild to me that this causes the planning phase to fall back to querying localhost. It's not easy to tell from Terraform's default logging what step is going wrong, but maybe setting TF_LOG to TRACE would reveal more.

@antonbabenko
Member

This issue has been resolved in version 18.0.0 🎉

@timblaktu

@antonbabenko we're still seeing this issue using v18.0.5 of this module. I see in the upgrade doc that:

> Support for managing aws-auth configmap has been removed. This change also removes the dependency on the Kubernetes Terraform provider, the local dependency on aws-iam-authenticator for users, as well as the reliance on the forked http provider to wait and poll on cluster creation. To aid users in this change, an output variable aws_auth_configmap_yaml has been provided which renders the aws-auth configmap necessary to support at least the IAM roles used by the module (additional mapRoles/mapUsers definitions to be provided by users)

...and we have shifted to using the new aws_auth_configmap_yaml module output as suggested, and are now using what I believe is the recommended kubernetes provider config:

################################################################################
# Kubernetes provider configuration: How terraform auth{n,z} to a cluster
################################################################################
data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

...but don't understand what about the release fixed this issue, or what other config this fix is dependent on. Any advice would be appreciated. We could raise another issue, but I feel this is likely a misunderstanding on our part.

@bryantbiggs
Member

bryantbiggs commented Feb 9, 2022

The issue was marked as resolved by release v18 because we removed the management of the aws-auth config map from the module (what the issue was created for)

@timblaktu

Thanks @bryantbiggs. So, does this mean this issue would still be expected to occur in a Terraform project that uses v18+ of this eks module and still uses the kubernetes provider? That's us. We're still managing our aws-auth config map in the same project, which of course requires the kubernetes provider. I understand this was moved out of the eks module.
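
For what it's worth, one way to consume that output (a sketch, not necessarily our final setup, and assuming the gavinbunney/kubectl provider is configured the same way as the kubernetes provider) is:

# Apply the aws-auth ConfigMap rendered by the module's aws_auth_configmap_yaml output.
resource "kubectl_manifest" "aws_auth" {
  yaml_body = module.eks.aws_auth_configmap_yaml
}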

@bryantbiggs
Member

It's possible; it's highly dependent on how your resources are configured, network connectivity, and what actions you are taking

@timblaktu

OK, thanks for the info. We're first going to try using the exec plugin to handle kubernetes provider authentication, as some others have recommended above.

@stevehipwell
Contributor

@timblaktu the tl;dr here is use the exec plugin for all Kubernetes based providers.

The long answer is that if you ask HashiCorp how to do this you will get many different, usually incorrect, answers. Of these, two actually work: the "official" one is that you can't use Kubernetes in the workspace where the cluster was created; the "engineering" one is to use the exec plugin and make sure you plan out your dependencies correctly.

@timblaktu

timblaktu commented Feb 9, 2022

@stevehipwell thanks so much for those insights. I'm trying your "engineering" answer and want to dive a bit into the "make sure you plan out your dependencies correctly" part. I've changed my kubernetes provider config (which used to specify a token) to use the exec plugin for auth, like this:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

and am still getting the "localhost connection refused" error, presumably the first time the kubernetes provider tries to reach out to the cluster. In my case this is at the declaration of a "kubernetes_role" "configmap_update" resource, which causes it to try to Get "http://localhost/apis/rbac.authorization.k8s.io/v1/namespaces/kube-system/roles/configmap-update".

So,

  1. How do I know my exec plugin is working?
  2. What can I do to ensure the dependencies in my eks/k8s project are "planned out correctly" to avoid this issue?

EDIT: Could my problem here be that I am declaring the kubernetes provider to be implicitly dependent on the eks module through my reference to module.eks.cluster_id to fetch the cluster name? I could just as well fetch the cluster name from a local variable I have sitting around. Looking in the output of terraform graph the only relevant dependency I see between the kubernetes provider and what the eks module manages is this:

                "[root] provider[\"registry.terraform.io/hashicorp/kubernetes\"]" -> "[root] data.aws_eks_cluster.cluster (expand)"

...but this dependency is probably caused by my references to data.aws_eks_cluster.cluster.* in my provider config.

@stevehipwell
Contributor

@timblaktu I've used the following pattern when layering on top of an EKS cluster created by this module for a number of major releases. I think the important point here is to use the module outputs for host and cluster_ca_certificate rather than a data object, due to the way Terraform manages its data collection. I also use the local.cluster_name value that I pass into the module rather than module.eks.cluster_id, but I suspect that might not make any difference.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", local.cluster_name]
  }
}

@timblaktu

Thanks again, @stevehipwell! This appears to be the final (and quite obscured!) wisdom that completely solves this issue. Using v18 of the module doesn't fix it by itself. Using an exec plugin in your kubernetes provider config doesn't fix it by itself. Your last point is also essential to the solution: you can't work around this bug completely if you are referring to EKS cluster data sources in your kubernetes provider config.

It's too bad that more issues aren't followed up on after being closed with the resulting essential tribal wisdom, like we (mostly you) did here. But I guess that's the way of open source, no? If we (the tribe) don't do it, who will? Thanks again!

@lrstanley

Is it worth calling out @stevehipwell's recommendations/details in the examples, and potentially the module docs, so others are aware of this caveat/don't run into the same issue? We've got many teams who have run into the same thing, and just temporarily split up the terraform to prevent running them so close together, which isn't ideal. I understand it's not the fault of these modules, but it might be worth at least adding a little snippet to point people in the right direction.

@timblaktu

@lrstanley It would probably make the most sense for this info to go prominently into the eks module documentation, since ultimately this is a bug in the module implementation, as noted here, and is the reason why all of these "all planets in alignment" workarounds are necessary.

@stevehipwell
Contributor

@timblaktu this isn't an issue with this module; it's a general Terraform defect disguised as a design choice. I agree that the docs here could be updated with some "suggestions" on use, but HashiCorp should be the ones documenting how their providers work (as noted above, don't hold your breath). Remember that the maintainers have removed the nested provider from the module, so provider logic belongs to the workspace consuming the module; HashiCorp owns supporting this now.

@BlueShells

@dak1n1 Hi, I ran into the same issue and fixed it with your method. Glad to see you here!

@mareq

mareq commented Jul 30, 2022

I am not sure I understand the solution correctly: is it to use exec instead of a data source for the token, and to initialise host and cluster_ca_certificate from the module outputs instead of the data objects, as suggested above?

That is what I have done (hopefully no silly mistakes there):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}

But I am still getting the original error:

╷
│ Error: configmaps "aws-auth" already exists
│
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 453, in resource "kubernetes_config_map" "aws_auth":
│  453: resource "kubernetes_config_map" "aws_auth" {
│
╵

All I am doing is spinning up an empty cluster with eks_managed_node_groups.

This workaround does help, but is kind of ugly:

$ terraform apply
   ..error..
$ terraform import module.eks.kubernetes_config_map.aws_auth kube-system/aws-auth
$ terraform apply

It is probably possible to avoid the error by passing -target=module.eks in the first apply, but what I am really after is avoiding this whole dance altogether. Is that somehow possible, or have I misunderstood the thread above and this IS the solution, at least for now?

@glyhood

glyhood commented Nov 3, 2022

> It looks like the Kubernetes provider isn't receiving a configuration. Here's how I configure mine, which is similar to the EKS module README except it's for the newer version of the Kubernetes provider. (My team recently released version 2.0 of the Kubernetes provider and it requires a slightly different config than shown in this module's README).
>
> data "aws_eks_cluster" "default" {
>   name = module.cluster.cluster_id
> }
>
> data "aws_eks_cluster_auth" "default" {
>   name = module.cluster.cluster_id
> }
>
> provider "kubernetes" {
>   host                   = data.aws_eks_cluster.default.endpoint
>   cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
>   token                  = data.aws_eks_cluster_auth.default.token
> }

This worked for me.

@marcos-gomes-ishop

You can declare an env variable in your GitHub Actions YAML like this and it will work:

env:
  KUBE_CONFIG_PATH: /home/runner/.kube/config

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022