
Replace EKS test-infra with example #1192

Merged: 4 commits merged into hashicorp:master on Mar 12, 2021

Conversation

@dak1n1 (Contributor) commented Mar 12, 2021

This copies the EKS cluster from our examples into our test-infra, so we can use it for tests.

Description

Acceptance tests

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccXXX'

...

Release Note

Release note for CHANGELOG:

...

References

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

@ghost added the size/XXL label on Mar 12, 2021
@alexsomesan (Member) left a comment:


Thanks Stef!

I had a look and dropped some comments here and there. I hope I haven't missed anything.

I'll give this a try myself as soon as I find some spare cycles.

kubernetes/test-infra/eks/README.md: two review comments (outdated, resolved)

Diff context:

    write_kubeconfig = false

    workers_group_defaults = {
      root_volume_type = "gp2"
@alexsomesan (Member) commented:

GP2 volumes are generally more expensive.
Is there a technical need for nodes to run on GP2 volumes? We're likely not putting much workload on the nodes during our tests.

@dak1n1 (Contributor, author) replied:

gp2 was the default in the EKS module, but they recently updated it to gp3. I put it back to the old default because gp3 isn't available in all regions.

@dak1n1 (Contributor, author) added:

This is the commit where it was added. terraform-aws-modules/terraform-aws-eks@76537d1

@alexsomesan (Member) replied:

Oh, if that's the case then disregard my comment. I must have based it on old information and likely GPx volumes are now the only choice. Sorry for the confusion :)


## Replacing the EKS cluster and re-creating the Kubernetes / Helm resources

When the cluster is initially created, the Kubernetes and Helm providers will not be initialized until authentication details are created for the cluster. However, for future operations that may involve replacing the underlying cluster (for example, changing the network where the EKS cluster resides), the EKS cluster will have to be targeted without the Kubernetes/Helm providers, as shown below. This is done by removing the `module.kubernetes-config` from Terraform State prior to replacing cluster credentials, to avoid passing outdated credentials into the providers.
@alexsomesan (Member) commented on this paragraph:

I would prefer that we leave out the first phrase. I feel it's presenting some facts inaccurately.

My understanding is that all providers are configured at the same time, BEFORE any of the resources have been created. Because the configured attribute values are only consumed by the Kubernetes / Helm provider CRUD logic at a later stage in the apply, we only see intermittent issues that we generally label as "progressive apply". This sequencing is not predictable and is influenced by the resource dependencies in the graph in very unintuitive ways. I think the first phrase here gives users the false impression that this aspect is somehow predictable. I worry that it is not.

@dak1n1 (Contributor, author) replied:

Maybe there's a way I can represent this more accurately. Let's discuss the nuances of it, and maybe I can learn from your perspective. There is probably some truth to both of our understandings of this scenario, since my understanding is based on first-hand experience and observation. The scenario I'm describing in this document is thoroughly tested. It is repeatable and predictable. I'm less certain about the "why" behind it though.

I think this is because, when you initialize the Kubernetes provider using a data source, like google_container_cluster, the reading of the data source can be deferred. In deferring the data source read, you get the predictable deferred provider initialization. This is what allows our initial builds of GKE/EKS/AKS clusters to succeed in our test-infra and examples. It's the predictable single-apply scenario.
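
For illustration, here is a minimal sketch of the data-source pattern I'm describing (not the exact config from our examples), using the standard `google_client_config` / `google_container_cluster` data sources; the cluster name and location are placeholders:

    # The provider takes its credentials from data sources, so those reads have
    # to resolve before the Kubernetes provider can talk to the API.
    data "google_client_config" "default" {}

    data "google_container_cluster" "default" {
      name     = "example-cluster"   # placeholder
      location = "us-west1"          # placeholder
    }

    provider "kubernetes" {
      host  = "https://${data.google_container_cluster.default.endpoint}"
      token = data.google_client_config.default.access_token
      cluster_ca_certificate = base64decode(
        data.google_container_cluster.default.master_auth[0].cluster_ca_certificate
      )
    }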

However, you can run into problems later, such as during cluster replacement, when the Kubernetes provider initialization is no longer delayed by the initial read of the data source. The Kubernetes provider is initialized before the data source has read the new GKE credentials, since the credentials were replaced during the apply phase. In this scenario, the data source was never re-read to give the Kubernetes provider valid credentials, and so the apply fails. In those cases, as far as I'm aware, the only way forward is to remove the Kubernetes provider resources from state, as described in this document. (Related: I opened a bug upstream to see if they can help us with this scenario).

My goal with this README is to convey the information accurately to users, to help them succeed in establishing the dependency between their GKE cluster and their Kubernetes resources.

How about this instead... I edited it to say the providers will not attempt to read from the Kubernetes API instead of saying they won't be initialized. Perhaps that information is more accurate and more relevant to the users.

Suggested change
When the cluster is initially created, the Kubernetes and Helm providers will not be initialized until authentication details are created for the cluster. However, for future operations that may involve replacing the underlying cluster (for example, changing the network where the EKS cluster resides), the EKS cluster will have to be targeted without the Kubernetes/Helm providers, as shown below. This is done by removing the `module.kubernetes-config` from Terraform State prior to replacing cluster credentials, to avoid passing outdated credentials into the providers.
When the cluster is initially created, the Kubernetes and Helm providers will not attempt to read from the Kubernetes API until authentication details are created for the cluster. However, for future operations that may involve replacing the underlying cluster (for example, changing the network where the EKS cluster resides), the EKS cluster will have to be targeted without the Kubernetes/Helm resources in state, as shown below. This is done by removing the `module.kubernetes-config` from Terraform State prior to replacing cluster credentials. Alternatively, keeping the EKS infrastructure and Kubernetes infrastructure in different state files will avoid this scenario.
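
For concreteness, the workflow that suggestion refers to would look roughly like this. `module.kubernetes-config` is the module name used in the example; the cluster module address in the targeted apply is a placeholder that will differ per configuration:

    $ terraform state rm module.kubernetes-config   # forget the Kubernetes/Helm resources in state (does not destroy them)
    $ terraform apply -target=module.cluster        # replace the EKS cluster; "module.cluster" is a placeholder address
    $ terraform apply                                # re-create the Kubernetes/Helm resources with fresh credentials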

@alexsomesan (Member) replied:

Stef, thanks for breaking things down in such detail.

I appreciate the overall solution that you are proposing here and I think removing the state of the Kubernetes resources deliberately via the state rm command before replacing the cluster is a clever and very valid solution. I think we should promote that and encourage users to adopt it. It follows good practice patterns, by making actions explicit rather than implicit and limiting the blast radius of each action. I like it.

What I'm not comfortable with is the messaging in the first phrase only. It can be interpreted by users as stating for a fact that Terraform offers any kind of sequencing guarantees that "the Kubernetes and Helm providers will not attempt to read from the Kubernetes API until authentication details are created for the cluster".

This causality, I'm afraid, is not sustained by the way Terraform is currently designed. After multiple conversations with Terraform engineers, the conclusion has always been that this is not by-design behaviour. The fact that it works (as clearly you have experienced first-hand) is a side-effect and side-effects are dangerous to promote as reliable fact.

In fact, Terraform documentation clearly advises against relying on this assumption here: https://www.terraform.io/docs/language/providers/configuration.html#provider-configuration-1. About midway in that section, it reads:
"You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration)."

I think it's in our users' best interest (but also ours) for us to send a clear and cohesive message about how to properly use Terraform.

@dak1n1 (Contributor, author) replied:

Ok, I'll give this more thought. Maybe for now, we can just remove this README from the test-infra dir. It was just a copy/paste of our example README anyway. We can continue discussing and tuning this file over in the PR where I'm making changes to this README in a more user-facing way. This discussion is also something we can think about for our upcoming meeting regarding the Kubernetes provider authentication issues. (Phil scheduled it for us yesterday -- we'll meet in about 2 weeks.)

@dak1n1 (Contributor, author) added:

I removed the paragraph in question. It's ok to just wait until the meeting to discuss it further, since we might not do much work on the other PR until we have that talk.

@alexsomesan (Member) replied:

I think that's a great way forward! I'm happy to do a deep dive in this topic online and exchange ideas.

@dak1n1 merged commit 078e9db into hashicorp:master on Mar 12, 2021
@ghost commented Apr 12, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost locked as resolved and limited conversation to collaborators on Apr 12, 2021