
AKS - dial tcp [::1]:80: connect: connection refused on all plans modifying the azurerm_kubernetes_cluster resource #1307

Closed
jamurtag opened this issue Jun 11, 2021 · 17 comments


@jamurtag

jamurtag commented Jun 11, 2021

Please see this comment for the explanation of the root cause.

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 0.15, 1.0
Kubernetes provider version: 2.3.0
Kubernetes version: 1.19

Affected Resource(s)

  • all resources

Terraform Configuration Files

Our configuration is almost identical to your aks example code, so I tried using that and replicated the behaviour.

Note: I simulated this by modifying a workers_count variable in the aks-cluster directory; however, this variable isn't actually implemented in your code. Modify line 23 to be node_count = var.workers_count, then pass a new value in via the aks-cluster module in main.tf, as sketched below.
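A rough sketch of that change (the variable name, default, and module wiring are illustrative, not part of the published example):

# aks-cluster/variables.tf
variable "workers_count" {
  type    = number
  default = 1
}

# aks-cluster/main.tf, inside the azurerm_kubernetes_cluster resource:
#   default_node_pool {
#     name       = "default"
#     node_count = var.workers_count
#     ...
#   }

# main.tf, passing a value in through the module:
# module "aks-cluster" {
#   ...
#   workers_count = 3
# }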

Steps to Reproduce

  1. Update the aks-cluster module as directed above to support workers_count
  2. terraform apply
  3. Change the workers_count variable to any other value
  4. terraform plan

Expected Behavior

Terraform should display a plan showing the updated node pool count.

Actual Behavior

The following error is reported:

$ terraform plan
random_id.cluster_name: Refreshing state... [id=aE_9C3A]
module.aks-cluster.azurerm_resource_group.default: Refreshing state... [id=/subscriptions/ff83a9d2-8d6e-4c4a-8b34-641163f8c99f/resourceGroups/tf-k8s-684ffd0b70]
module.aks-cluster.azurerm_kubernetes_cluster.default: Refreshing state... [id=/subscriptions/ff83a9d2-8d6e-4c4a-8b34-641163f8c99f/resourcegroups/tf-k8s-684ffd0b70/providers/Microsoft.ContainerService/managedClusters/tf-k8s-684ffd0b70]
module.kubernetes-config.local_file.kubeconfig: Refreshing state... [id=1ca8ad3c1c7f4aff65e5eda0038b619788b0956a]
module.kubernetes-config.helm_release.nginx_ingress: Refreshing state... [id=nginx-ingress-controller]
module.kubernetes-config.kubernetes_namespace.test: Refreshing state... [id=test]
╷
│ Error: Get "http://localhost/api/v1/namespaces/test": dial tcp [::1]:80: connect: connection refused
│
│   with module.kubernetes-config.kubernetes_namespace.test,
│   on kubernetes-config/main.tf line 14, in resource "kubernetes_namespace" "test":
│   14: resource "kubernetes_namespace" "test" {
│
╵
╷
│ Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
│
│   with module.kubernetes-config.helm_release.nginx_ingress,
│   on kubernetes-config/main.tf line 59, in resource "helm_release" "nginx_ingress":
│   59: resource helm_release nginx_ingress {

The data source is clearly not passing back valid data, even though there is a dependency on the aks-cluster module.

@jamurtag jamurtag added the bug label Jun 11, 2021
@chakri-nelluri

I am hitting the same issue.

2021-06-11T16:51:39.576-0400 [DEBUG] plugin: using plugin: version=5
2021-06-11T16:51:39.640-0400 [INFO]  plugin.terraform-provider-kubernetes_v2.3.2_x5: 2021/06/11 16:51:39 [WARN] Invalid provider configuration was supplied. Provider operations likely to fail: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable: timestamp=2021-06-11T16:51:39.640-0400
2021-06-11T16:51:39.640-0400 [INFO]  plugin.terraform-provider-kubernetes_v2.3.2_x5: 2021/06/11 16:51:39 [DEBUG] Enabling HTTP requests/responses tracing: timestamp=2021-06-11T16:51:39.640-0400

@thy143

thy143 commented Jun 28, 2021

We are also hitting this, and it's causing a bit of a headache.

@dak1n1
Contributor

dak1n1 commented Jul 2, 2021

Hi, I'm sorry to hear you all are struggling with this dependency issue. I've done extensive research in this area and come across similar scenarios. The cause has to do with passing an unknown value to a provider configuration block, which is not supported in Terraform core. To quote their docs:

You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied.

When you make a change to the underlying infrastructure, such as node count, you're passing an unknown value into the Kubernetes provider configuration block, since the full scope of the cluster infrastructure is not known until after the change has been applied to the AKS cluster. That's why Terraform is behaving as if it's not reading the cluster's data source properly.
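To make that concrete, here is a minimal sketch of the pattern this thread is hitting (attribute paths follow the azurerm provider; the exact wiring in the example repository may differ slightly):

# The cluster is created and modified in this same configuration...
data "azurerm_kubernetes_cluster" "default" {
  depends_on          = [module.aks-cluster]
  name                = local.cluster_name
  resource_group_name = local.cluster_name
}

# ...so while the cluster is being changed, everything read from the data
# source is unknown at plan time, and the provider block silently falls back
# to its defaults, which is why the errors point at localhost.
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.default.kube_config.0.host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}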

Although I did write the initial guide to show that it can be possible to work around some of these issues, as you've found from experience, there are many edge cases that make getting the Kubernetes provider to work alongside the underlying infrastructure an unreliable and unintuitive process. This is due to a long-standing limitation in Terraform that can't be fixed in any provider, but we do have plans to smooth out the bumps a little by adding better error messages up front, so that users don't run into this on subsequent applies.

I thought at first that I could list out every work-around to help users keep their preferred workflow of having the cluster in the same Terraform state as the Kubernetes resources. Most cases can be worked around using terraform state rm module.kubernetes-config or terraform apply -target=module.aks-cluster, but I think encouraging this kind of work-around will cause more headaches in the long run, as it puts the user in charge of figuring out when to use special one-off apply commands, rather than setting up Terraform to behave reliably and predictably from the start. Plus it can have unintended side-effects, like orphaning cloud resources.

That's why I have a new guide in progress here, which shows the most reliable method that we have so far: the cluster infrastructure needs to be kept in a state separate from the Kubernetes and Helm provider resources.

https://github.com/hashicorp/terraform-provider-kubernetes/tree/e058e225e621f06e393bcb6407e7737fd43817bd/_examples/aks
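Roughly, the layout in that approach looks like the sketch below (a paraphrase of the idea, not the exact contents of the linked guide; directory and variable names may differ): the cluster lives in one state, the Kubernetes and Helm resources in another that is applied afterwards, so the provider block is only ever fed values that are already known.

# aks-cluster/        state #1: the azurerm_kubernetes_cluster resource only
# kubernetes-config/  state #2: kubernetes_* and helm_* resources, applied after #1

# kubernetes-config/main.tf
data "azurerm_kubernetes_cluster" "default" {
  name                = var.cluster_name
  resource_group_name = var.cluster_name
}

# The cluster already exists when this configuration is planned, so these
# values are always known and the provider never falls back to localhost.
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.default.kube_config.0.host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}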

I know this is inconvenient, which is why we continue to try and accommodate users in single-apply scenarios, and scenarios which contain the Kubernetes and cluster resources in the same Terraform state. However, until upstream Terraform can add support for this, the single-apply workflow will remain buggy and less reliable than separating cluster infrastructure from Kubernetes resources.

@pmnathan

pmnathan commented Sep 21, 2021

I have been testing using an aliased azurerm provider for the data source query, and this seems to be a viable workaround for the issue. More testing is obviously required. Starting from Steph's code example (https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/_examples/aks/main.tf), the change will look like:

provider "azurerm" {
  alias = "azurerm_k8s"
  features {}
}

data "azurerm_kubernetes_cluster" "default" {
  provider            = azurerm.azurerm_k8s
  depends_on          = [module.aks-cluster] # refresh cluster state before reading
  name                = local.cluster_name
  resource_group_name = local.cluster_name
}

Looking forward to the community's feedback. Thank you.

@stevehipwell
Contributor

@dak1n1 this error message is unintuitive, as it doesn't explain why the error is occurring, and it leads to significant time lost tracking down the cause. Furthermore, if the error here is accurate, it means that Terraform has attempted to connect to a cluster on localhost, which could have unintended consequences if such a cluster actually exists.

@AlexGoris-KasparSolutions

I tried the aliased azurerm provider workaround from @pmnathan's comment above today, and sadly in my situation it did not help.

@BoHuang2018

I encountered the same problem on GKE. @dak1n1, running terraform state rm module.my-gke-module, then terraform plan -target=module.my-gke-module and terraform apply -target=module.my-gke-module, helped to fix my problem. Thanks. I will try your new guide with GKE.

@sgutwein

I also adapted the node count; the first apply went through without problems. Now I'm facing the same error message on every plan / apply:

╷
│ Error: Get "http://localhost/api/v1/namespaces/external-secrets": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with kubernetes_namespace.namespaces["external-secrets"],
│   on main.tf line 109, in resource "kubernetes_namespace" "namespaces":
│   109: resource "kubernetes_namespace" "namespaces" {

@BoHuang2018

Hi @sgutwein, I used @dak1n1's methods to solve my error on GKE successfully. I guess they would solve your error too.

@sgutwein

@BoHuang2018 It was a lot of pain, but that fixed it. Thanks.

@favoretti
Contributor

Running terraform apply -refresh=false will just silently do the right thing, BTW, as it won't try to refresh the current state. (And since this generally only happens when the cluster gets recreated, skipping the refresh isn't as bad as it sounds.)

@akepjio

akepjio commented Dec 21, 2021

Thanks @favoretti for your answer. It works for me.

@josephcaxton

@dak1n1 thanks for that explanation. It really helped.

@xelhark

xelhark commented Jan 20, 2022

A better fix for GKE that worked for me was to let gcloud handle the connection config in ~/.kube/config and feed the kubeconfig file to the provider manually:

gcloud container clusters get-credentials <my_cluster> --region=<my_region>

provider "kubernetes" {
  config_path = "~/.kube/config"
}

@jrhouston
Collaborator

As @dak1n1 explained above, this is due to progressive apply. Please open a new issue with your complete configuration if you feel you are encountering a different issue than the one originally reported here.

@stevehipwell
Contributor

@jrhouston progressive apply isn't a solution here; it isn't truly IaC if you need to use manual steps. This issue shouldn't be closed without a documented solution that isn't "buy more workspaces and/or go back to manual ops".

AFAIK, from experience and from discussions with the engineers working on Terraform, this problem should be resolvable by using the exec plugin to authenticate with Kubernetes. I can also add that there are some other considerations when configuring the provider, which basically boil down to not using a cluster data source for lookups in a workspace where the cluster resource is being created. But fundamentally, Kubernetes only works in Terraform with the exec plugin or with a very convoluted set of workspaces and/or manual steps.
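For reference, a sketch of what exec-based authentication can look like for an Azure AD-enabled AKS cluster, assuming the kubelogin CLI is installed (this is an illustration of the approach, not a configuration taken from this repository); the connection details come straight from the cluster resource rather than a data source, and credentials are only fetched at apply time:

provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.default.kube_config.0.host
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)

  # Credentials come from an external command at apply time instead of being
  # baked into the provider configuration.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login", "azurecli",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # well-known AKS AAD server application ID
    ]
  }
}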

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 31, 2022