[EKS] [request]: Allow for optional Auto Bootstrapping on AL2023 similar to what is provided on AL2 - decouple cluster and nodegroups #2455

Open
rns350 opened this issue Oct 29, 2024 · 2 comments
Labels: EKS (Amazon Elastic Kubernetes Service), Proposed (Community submitted issue)


rns350 commented Oct 29, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
On AL2, a script was provided at /etc/eks/bootstrap.sh that could be called on node startup. Given the cluster name, it fetched the details needed to connect to the API server by calling the describe-cluster API. In AL2023, this script was removed because of API throttling observed when many nodes tried to join a cluster at the same time, all calling the describe-cluster operation. Now, for self-managed node groups, this information must be provided in a NodeConfig manifest in the user data.
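For context, here is a minimal Python sketch of what that NodeConfig looks like when built from the fields a describe-cluster response provides. It assumes the input is the "cluster" object returned by DescribeCluster (for example, boto3's eks.describe_cluster(name=...)["cluster"]). render_nodeconfig is a hypothetical helper, not part of the AMI or any AWS tooling, and the manifest layout follows the AL2023 nodeadm documentation as we understand it.

```python
"""Sketch: render an AL2023 NodeConfig manifest from describe-cluster output.

Assumption: `cluster` is the "cluster" object of an EKS DescribeCluster
response. render_nodeconfig is a hypothetical helper for illustration.
"""


def render_nodeconfig(cluster: dict) -> str:
    """Return a NodeConfig YAML document built from cluster details."""
    name = cluster["name"]
    # The endpoint contains a random ID, which is why it cannot be predicted.
    endpoint = cluster["endpoint"]
    # Base64-encoded cluster CA bundle, passed through unchanged.
    ca_data = cluster["certificateAuthority"]["data"]
    # Kubernetes service CIDR as reported for IPv4 clusters.
    cidr = cluster["kubernetesNetworkConfig"]["serviceIpv4Cidr"]
    return (
        "---\n"
        "apiVersion: node.eks.aws/v1alpha1\n"
        "kind: NodeConfig\n"
        "spec:\n"
        "  cluster:\n"
        f"    name: {name}\n"
        f"    apiServerEndpoint: {endpoint}\n"
        f"    certificateAuthority: {ca_data}\n"
        f"    cidr: {cidr}\n"
    )


if __name__ == "__main__":
    # Placeholder values, not a real cluster.
    sample = {
        "name": "my-cluster",
        "endpoint": "https://EXAMPLE.gr7.us-east-1.eks.amazonaws.com",
        "certificateAuthority": {"data": "Q0FEQVRB"},
        "kubernetesNetworkConfig": {"serviceIpv4Cidr": "10.100.0.0/16"},
    }
    print(render_nodeconfig(sample))
```

Every value except the name is only known once the cluster exists, which is the crux of this request.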

This is perfectly reasonable and a more efficient use of resources, especially for large clusters; however, for our cluster, which runs only a few nodes, it adds another step to the bootstrapping process. With AL2023, we now make a describe-cluster call ourselves before running our node group CloudFormation template, and then embed the returned information in the user data with sed commands. With AL2, we could deploy a fresh cluster and its node groups at the same time, since the only thing the node group deployment needed to know about the cluster was its name, which can be predicted in advance. With AL2023, one of the required parameters is the API server endpoint, which includes a random ID and cannot be predicted, so the cluster and node group deployments are now coupled.

We'd like to have the option to leave this work to the node, so that a bootstrap script can fetch the details from the describe-cluster API and embed them in the NodeConfig YAML. This could be an opt-in feature for those who want it, and it would once again decouple the cluster and self-managed node group deployments.
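To illustrate the opt-in behavior we have in mind, here is a hypothetical Python sketch (not an existing AWS feature): the node is given only the cluster name, calls describe-cluster at boot, and backs off with randomized exponential delays ("full jitter") when throttled, so that many nodes starting together spread their calls out. fetch_cluster_details is an illustrative name, and the describe and is_throttle callables are assumptions supplied by the caller.

```python
"""Hypothetical node-side lookup: cluster name in, NodeConfig fields out."""

import random
import time
from typing import Callable


def fetch_cluster_details(
    cluster_name: str,
    describe: Callable[[str], dict],
    is_throttle: Callable[[Exception], bool],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Return the spec.cluster fields for NodeConfig, retrying throttled calls."""
    for attempt in range(max_attempts):
        try:
            cluster = describe(cluster_name)
            return {
                "name": cluster["name"],
                "apiServerEndpoint": cluster["endpoint"],
                "certificateAuthority": cluster["certificateAuthority"]["data"],
                "cidr": cluster["kubernetesNetworkConfig"]["serviceIpv4Cidr"],
            }
        except Exception as err:
            # Re-raise non-throttling errors, or give up after the last attempt.
            if not is_throttle(err) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    # Only reached when max_attempts < 1, so no call was ever made.
    raise ValueError("max_attempts must be at least 1")
```

On an actual node, describe could wrap boto3's eks.describe_cluster(name=...)["cluster"], and the returned dictionary maps directly onto spec.cluster in the NodeConfig. Which error codes count as throttling is left to is_throttle as an assumption; botocore's built-in retry modes (for example Config(retries={"mode": "adaptive"})) are an alternative to a hand-rolled loop.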

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
With AL2, we needed only to provide the cluster name to bootstrap the nodes into the EKS cluster. This was a really nice feature because we could always infer what our cluster would be called, even if it hadn't been created yet. The API server endpoint contains a random ID, so we have no way to predict it if the cluster needs to be stood up fresh again. This means that with AL2023, the cluster and node group deployments have become coupled: the cluster must finish deploying before we can discover the properties required in the node group user data. For smaller clusters, this cost outweighs the benefit of removing the describe-cluster call from bootstrapping.

I no longer see a way to decouple the deployment of the cluster and self-managed node groups, since the nodes no longer fetch the cluster information themselves. An option to opt into node-side bootstrapping would solve this problem, but we are open to alternative solutions. Such an option would also speed up the initial deployment when we need to stand up a new environment or cycle resources.

Are you currently working around this issue?
Yes. We just moved to AL2023 and now gather the data with a describe-cluster call, then embed the details into the CloudFormation template before deploying. While this works, the node group deployment now depends on the cluster deployment, whereas it didn't before: we can predict the cluster name and provide it before the cluster exists, but we can't do this with the API endpoint. Unless nodes can fetch this information themselves, we will need to keep these two deployment pipelines coupled.

Additional context
We have our dev cluster running on the AL2023 images already - other than this one hitch, the feature improvements on the new AMI are great.

rns350 added the Proposed (Community submitted issue) label on Oct 29, 2024
mikestef9 added the EKS (Amazon Elastic Kubernetes Service) label on Oct 29, 2024
dims (Member) commented Oct 29, 2024

@rns350 would be good to surface this in https://github.com/awslabs/amazon-eks-ami/issues as well.

rns350 (Author) commented Oct 29, 2024

Hey @dims, thank you for the advice. I surfaced the feature request here: awslabs/amazon-eks-ami#2029
