feat(nodeadm): init with only cluster name #2029
Comments
How much of a difference does this make in your deployment time? This behavior was intentionally removed because we don't believe it should be used in a production environment (and because many IaC setups can easily wire the cluster details to the nodes' user data -- you can do this with the return values of …).

I'll chat with the team about this 👍
Hey Carter! I understand that it won't make a huge difference in deployment times. It would just fit better into our current deployment architecture, and a couple of other developers on our teams have asked for it. I think a feature gate would be a great idea, and I agree that it should not be default functionality. Thanks for helping us with our request, and let me know how that conversation goes.
Hey @cartermckinnon, just checking back in on this to see if there's any more interest in having this added back after those conversations. Thank you again for the support.
What would you like to be added: On AL2, a script was provided at /etc/eks/bootstrap.sh that could be called on node startup. Given only the name of the cluster, it would fetch the cluster details needed to connect to the API server via the describe-cluster API. In AL2023, this script was removed due to observed API throttling when many nodes tried to join the cluster at the same time, each calling the describe-cluster operation. Now, for self-managed node groups, this info must be provided in a NodeConfig manifest in the user data.
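For reference, the user data for a self-managed AL2023 node carries a NodeConfig manifest along these lines (the field values here are placeholders; the endpoint, CA data, and CIDR are the details that previously came from describe-cluster):

```yaml
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://EXAMPLE1234567890.gr7.us-west-2.eks.amazonaws.com
    certificateAuthority: LS0tLS1CRUdJTi...   # base64-encoded CA bundle
    cidr: 10.100.0.0/16
```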
This is perfectly reasonable and a more efficient use of resources, especially for large clusters; however, for our cluster running only a few nodes, it adds another step to the bootstrapping process. In the current state with AL2023, we end up making a describe-cluster call ourselves before running our node group CloudFormation template, in order to gather this information and embed it in the user data using sed commands. With AL2, we could deploy the node group and cluster fresh together, since all we needed to know about the cluster to deploy the node groups was its name, which can be predicted in advance. In the current state with AL2023, one of the required parameters is the API endpoint, which includes a random ID and cannot be predicted, so the node group and cluster deployments are now coupled.
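The pre-deployment step described above can be sketched roughly as follows. The template tokens and file names are hypothetical; in a real pipeline the two stubbed values would come from `aws eks describe-cluster --query 'cluster.endpoint'` and `--query 'cluster.certificateAuthority.data'`, but they are hard-coded here so the templating itself is runnable:

```shell
#!/usr/bin/env sh
# In practice these two values are fetched first, e.g.:
#   aws eks describe-cluster --name my-cluster \
#     --query 'cluster.endpoint' --output text
#   aws eks describe-cluster --name my-cluster \
#     --query 'cluster.certificateAuthority.data' --output text
API_ENDPOINT="https://EXAMPLE1234567890.gr7.us-west-2.eks.amazonaws.com"
CA_DATA="LS0tLS1CRUdJTi..."

# user-data template with placeholder tokens (hypothetical layout)
cat > nodeconfig.tpl <<'EOF'
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: __ENDPOINT__
    certificateAuthority: __CA_DATA__
    cidr: 10.100.0.0/16
EOF

# embed the fetched values into the manifest with sed, as described above
sed -e "s|__ENDPOINT__|$API_ENDPOINT|" \
    -e "s|__CA_DATA__|$CA_DATA|" nodeconfig.tpl > nodeconfig.yaml
cat nodeconfig.yaml
```

The point of the feature request is that this fetch-and-substitute step must run outside the node, after the cluster exists, which is what couples the two deployments.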
We'd like to have the option to leave this work to the Node, so that a bootstrap script can gather the details from the describe-cluster API and embed them into the NodeConfig yaml. This could be an opt-in feature for those who want it, and would once again decouple the cluster and self-managed nodegroup deployments.
Why is this needed: With AL2, we needed to provide only the cluster name to bootstrap the nodes to the EKS cluster. This was a really nice feature because we could always infer what our cluster would be called even if it hadn't been created yet. There is a random ID in the API server endpoint, so we have no way to predict it if the cluster needs to be stood up fresh again. This means that with AL2023, the cluster and node group deployments have become coupled: the cluster must finish deploying before we can discern the properties of it required in the node group user data. For smaller clusters, this is a larger detriment than the benefit gained by removing the bootstrap script.
I no longer see a method for decoupling the deployment of the cluster and self-managed node groups, since the nodes no longer fetch the cluster information themselves. Having the option to opt into using the bootstrap script would solve this problem, but we are open to alternative solutions. Having this option speeds up the initial deployment process if we need to stand up a new environment or cycle resources. We have our dev cluster running on the AL2023 images already - other than this one hitch, the feature improvements on the AL2023 AMI are great.