-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Write sensitive cloud-init user-data into /etc/cloud/cloud.cfg.d #4746
🐛 Write sensitive cloud-init user-data into /etc/cloud/cloud.cfg.d #4746
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This change must be validated e2e. I've already tested it using my own AWS account, so I'm confident it will pass e2e. /test pull-cluster-api-provider-aws-e2e |
/cc @randomvariable You know this area. Tagging you, in case you have questions/concerns about this change. |
I'd like to backport this to supported release branches, too. |
/milestone v2.4.0 |
/lgtm |
/retest |
This makes sense, but I think further down the line we might want to rethink restarting the cloud-init process. |
/lgtm |
/retest |
/test pull-cluster-api-provider-aws-e2e |
@dlipovetsky looks like E2E tests needs to be fixed |
/test pull-cluster-api-provider-aws-e2e |
/test pull-cluster-api-provider-aws-apidiff-main |
Last e2e failure was due to reaching EventBridge resource quota. From the manager log:
|
/test pull-cluster-api-provider-aws-e2e |
1 similar comment
/test pull-cluster-api-provider-aws-e2e |
I still have no idea why the same 7 tests consistently fail. |
/test pull-cluster-api-provider-aws-build-docker |
@dlipovetsky maybe you could try rebasing the PR and then run the E2E tests? |
This allows cloud-init to read the user-data without using an #include, which always fails when cloud-init first runs.
f96b742
to
b21896e
Compare
/retest |
/test pull-cluster-api-provider-aws-e2e |
@dlipovetsky: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
I found the cause of the failing tests. These tests use the images created for Kubernetes v1.25.3. They were created in October 2022. They have cloud-init v22.3.4. This version does not support Jinja templating for cloud-config sources in The cloud-config created by CABPK happens to have a single instance of Jinja templating, for the kubeadm configuration:
And because Jinja templating isn't supported, the kubeadm configuration is written out with the Jinja string, instead of the templated value, so the configuration is invalid:
I'm considering what to do. |
@dlipovetsky Thanks for your persistence! Could we create a new image with Kube 1.26 and cloud-init v22.4? Or maybe a 1.25 image with v22.4? |
As soon as we're able to publish images into a CNCF-owned account (depends on kubernetes/k8s.io#6517). -- I also want to take this opportunity to question why we have Jinja template strings in our cluster templates at all. If I remove that string, the AWS Cloud Provider fails, apparently
And therefore the node never gets its |
Turns out that defining a part handler will not help here. Cloud-init will fetch any "includes" before running part handlers. If the include is missing, cloud-init will fail. |
/milestone v2.6.0 |
We cannot merge this until #4746 (comment) is resolved. /hold |
Based on the comment lets push this to the next milestone /milestone v2.7.0 |
Considering this still needs some work with respect to addressing #4746 (comment) I think we can move it to milestone v2.8.0. |
Sounds like a goo dplan @damdo . We need to consolidate the cloud-init work, there are multiple strands ongoing. I'll start a discussion. |
Thanks @richardcase meanwhile we start the discussion, are we still happy to move this to the 2.8 milestone? Thanks! |
/milestone v2.8.0 |
I'll close this in favor of building a custom cloud-init data source. See kubernetes-sigs/image-builder#1583. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The boothook fetches sensitive user-data from an AWS service (Secrets Manager, or SSM Parameter Store). This PR changes the mechanism by the way this user-data is passed to cloud-init once it's fetched.
Previously, the boothook wrote the sensitive user-data to
/etc/secret-userdata.txt
, and cloud-init read it via an#include
directive. Now, the boothook writes it to/etc/cloud/cloud.cfg.d/99_kubeadm_bootstrap.cfg
. The directory is a well-documented configuration source used by cloud-init, and exists wherever cloud-init is installed. The file is given the prefix99_
to give it high priority over other configuration in that directory.Previously, cloud-init read sensitive user-data from
/etc/secret-userdata.txt
via an#include
directive. Now, it reads the sensitive user-data simply because it is located in the/etc/cloud/cloud.cfg.d
directory. Therefore, the#include
directive is no longer used, and is removed.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #4745
Special notes for your reviewer:
If we merge this PR, we can revert the workaround introduced in kubernetes-sigs/image-builder#406.
Checklist:
Release note: