AKS - dial tcp [::1]:80: connect: connection refused on all plans modifying the azurerm_kubernetes_cluster resource #1307
Comments
I am hitting the same issue.
We are also hitting this and it's causing a bit of a headache.
Hi, I'm sorry to hear you all are struggling with this dependency issue. I've done extensive research in this area and come across similar scenarios. The cause has to do with passing an unknown value to a provider configuration block, which is not supported in Terraform core; their docs note that provider configuration arguments can only reference values that are known before the configuration is applied.
When you make a change to the underlying infrastructure, such as node count, you're passing an unknown value into the Kubernetes provider configuration block, since the full scope of the cluster infrastructure is not known until after the change has been applied to the AKS cluster. That's why Terraform is behaving as if it's not reading the cluster's data source properly.

Although I did write the initial guide to show that it can be possible to work around some of these issues, as you've found from experience, there are many edge cases that make it an unreliable and unintuitive process to get the Kubernetes provider working alongside the underlying infrastructure. This is due to a long-standing limitation in Terraform that can't be fixed in any provider, but we do have plans to smooth out the bumps a little by adding better error messages upfront, so that users don't run into this on subsequent applies.

I thought at first that I could list out every work-around to help users keep their preferred workflow of having the cluster in the same Terraform state as the Kubernetes resources; most cases can be worked around somehow, but not reliably. That's why I have a new guide in progress here, which shows the most reliable method that we have so far: the cluster infrastructure needs to be kept in a state separate from the Kubernetes and Helm provider resources.

I know this is inconvenient, which is why we continue to try to accommodate users in single-apply scenarios, and scenarios which contain the Kubernetes and cluster resources in the same Terraform state. However, until upstream Terraform can add support for this, the single-apply workflow will remain buggy and less reliable than separating cluster infrastructure from Kubernetes resources.
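For reference, a minimal sketch of the pattern that runs into this (the module output names and data source arguments here are illustrative, loosely based on the aks example, not the exact code):

```hcl
# The Kubernetes provider is configured from a data source that reads the
# AKS cluster. When a change such as a new node count makes these values
# unknown at plan time, the provider falls back to its defaults and tries
# to reach localhost, producing "dial tcp [::1]:80: connect: connection refused".
data "azurerm_kubernetes_cluster" "main" {
  name                = module.aks-cluster.cluster_name        # illustrative output names
  resource_group_name = module.aks-cluster.resource_group_name
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.main.kube_config[0].host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.main.kube_config[0].client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.main.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.main.kube_config[0].cluster_ca_certificate)
}
```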
I have been testing using an aliased azurerm provider for the data query, and this seems to be a viable workaround for the issue. More testing is obviously required... Starting from Steph's code example (https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/_examples/aks/main.tf), the change will look roughly like the sketch below.
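A rough sketch of that aliased-provider idea (the alias name and data source arguments below are assumptions adapted loosely from that example, not the exact diff):

```hcl
# Second azurerm provider instance, used only for the cluster data lookup.
provider "azurerm" {
  alias    = "lookup"
  features {}
}

# Point the data query at the aliased provider instead of the default one;
# the kubernetes provider block keeps reading its credentials from this data source.
data "azurerm_kubernetes_cluster" "main" {
  provider            = azurerm.lookup
  name                = module.aks-cluster.cluster_name
  resource_group_name = module.aks-cluster.resource_group_name
}
```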
Looking forward to the community's feedback. Thank you.
@dak1n1 this error message is unintuitive: it doesn't explain why the error is occurring, and it leads to significant lost time tracking down the cause. Furthermore, if the error here is accurate, it means Terraform has attempted to connect to a cluster on localhost, which could have unintended consequences if such a cluster actually exists.
I tried this today and sadly, in my situation, it did not help.
I encountered the same problem on GKE, @dak1n1.
I also adapted the node count; the first apply went through without problems. Now I'm facing the same error message on every plan/apply...
@BoHuang2018 It was a lot of pain, but that fixed it. Thanks. |
Running
Thanks @favoretti for your answer. It works for me.
@dak1n1 thanks for that explanation. It really helped.
A better fix for GKE that worked for me was to let Google handle the connection config:

```shell
gcloud container clusters get-credentials <my_cluster> --region=<my_region>
```

```hcl
provider "kubernetes" {
  config_path = "~/.kube/config"
}
```
@jrhouston progressive apply isn't a solution here; it isn't truly IaC if you need to use manual steps. This issue shouldn't be closed without a documented solution that isn't "buy more workspaces and/or go back to manual ops".

AFAIK, from experience and discussions with the engineers working on Terraform, this problem should be resolvable by using the exec plugin to authenticate with Kubernetes (sketched below). I can also add that there are some other considerations when configuring the provider, which basically boil down to not using a cluster data source for lookups in the same workspace where the cluster resource is being created. But fundamentally, Kubernetes only works in Terraform with the exec plugin or a very convoluted set of workspaces and/or manual steps.
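For anyone looking for a concrete starting point, here is a minimal sketch of the exec-based approach on AKS, assuming an AAD-enabled cluster resource named `azurerm_kubernetes_cluster.main` and `kubelogin` available on the PATH (both names and the Azure CLI login mode are assumptions, not the only option):

```hcl
provider "kubernetes" {
  # Read connection details from the cluster resource itself rather than a data source.
  host                   = azurerm_kubernetes_cluster.main.kube_config[0].host
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.main.kube_config[0].cluster_ca_certificate)

  # Credentials are fetched at apply time via an exec plugin instead of being
  # baked into the provider configuration.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login", "azurecli",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # commonly documented AKS AAD server app ID
    ]
  }
}
```

The key point is that nothing in the provider block depends on a data source that has to be re-read during the plan.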
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Please see this comment for the explanation of the root cause.
Terraform Version, Provider Version and Kubernetes Version
Affected Resource(s)
Terraform Configuration Files
Our configuration is almost identical to your aks example code, so I tried using that and replicated the behaviour.
Note - I simulated this by modifying the `workers_count` variable in the `aks-cluster` directory; however, this isn't actually implemented in your code. Modify line 23 to be `node_count = var.workers_count`, then pass a new value in via the `aks-cluster` module in `main.tf` (sketched after the steps below).

Steps to Reproduce

1. Modify the `aks-cluster` module as directed above to support `workers_count`
2. `terraform apply`
3. Change the `workers_count` value passed to the module
4. `terraform plan`
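A sketch of the modification described in the note above, assuming the resource lives in `aks-cluster/main.tf` and the module is called from the root `main.tf` (resource name, file layout, and surrounding arguments are abbreviated assumptions):

```hcl
# aks-cluster/variables.tf (assumed)
variable "workers_count" {
  type    = number
  default = 3
}

# aks-cluster/main.tf - line 23 changed to read the variable
resource "azurerm_kubernetes_cluster" "default" {
  # ... name, location, resource_group_name, dns_prefix, identity unchanged ...

  default_node_pool {
    name       = "default"
    node_count = var.workers_count # was a hard-coded count
    vm_size    = "Standard_D2_v2"  # illustrative
  }
}

# main.tf - pass a new value in to trigger the failing plan
module "aks-cluster" {
  source        = "./aks-cluster"
  workers_count = 4
  # ... other module arguments unchanged ...
}
```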
Expected Behavior
Terraform should display a plan showing the updated node pool count.
Actual Behavior
The following error is reported: `dial tcp [::1]:80: connect: connection refused`
The data source is clearly not passing back valid data, even though there is a dependency on the aks-cluster module.