-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Ensure that insecureSkipTLSVerify is false before setting caBundle on APIService #160
base: master
Are you sure you want to change the base?
Conversation
…dle on APIService Signed-off-by: Jorge Turrado <[email protected]>
pkg/rotator/rotator.go
Outdated
if err := unstructured.SetNestedField(apiService.Object, false, "spec", "insecureSkipTLSVerify"); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious to get some of the maintainers thoughts on whether this actually has to happen here, in cert-controller. If cert-controller
is going to be used with APIService, presumably we are going to inject a caBundle
so why not set this field outside of cert-controller
to begin with ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem we have faced with is that when you set insecureSkipTLSVerify: true
, you have to set explicitly insecureSkipTLSVerify: false
to remove the value.
As insecureSkipTLSVerify: false
is the default value, k8s api removes it automatically and CD tools like ArgoCD/Flux enter on permanent out-of-sync status.
kedacore/charts#550
In this scenario, we have some options:
- Adding a step to remove the field manually (breaking the helm upgrade process)
- Adding explicitly the value in helm chart, producing out-of-sync on CD tools
- Updating
insecureSkipTLSVerify
before settingcaBundle
Maybe I've missed something and there is any other option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. You're saying that having insecureSkipTLSVerify: false
in the Helm chart causes CI/CD pipelines to detect drift because the API server strips default values, correct?
I'm curious what makes cert-controller different in this regard, since it's merely another thing essentially running kubectl apply. Would cert-controller also be subject to the same default-stripping issue, leaving you right back where you began?
Or is the objective to remove the state from the Helm chart such that there is no longer any check for drift for this value?
Modifying controller code to meet the needs of CI/CD code does seem a bit backwards. I am also wary of potentially manually overriding user intent on what could be a very salient field (is there some reason why insecureSkipTLSVerify is set? Will un-setting it cause an outage? Can we be sure the answer will be identical for all projects that consume cert-controller?)
As for other options, maybe a post-install hook on the Helm chart to modify the value?
Here is an example of a post-install hook G8r uses:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. You're saying that having insecureSkipTLSVerify: false in the Helm chart causes CI/CD pipelines to detect drift because the API server strips default values, correct?
Yes, as it's the default value, it is removed automatically. If you deploy an apiservice with explicitly set insecureSkipTLSVerify: false
and then you get it from the cluster, you will see that your set value insecureSkipTLSVerify: false
has been chopped from the manifest. This triggers out-of-sync on CD systems like ArgoCD.
You can just try with this:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
app.kubernetes.io/name: v1beta1.external.metrics.k8s.io
app.kubernetes.io/version: latest
app.kubernetes.io/part-of: keda-operator
name: v1beta1.custom.metrics.k8s.io
# nosemgrep: yaml.kubernetes.security.skip-tls-verify-service.skip-tls-verify-service
spec:
service:
name: keda-metrics-apiserver
namespace: keda
group: external.metrics.k8s.io
version: v1beta1
groupPriorityMinimum: 100
versionPriority: 100
insecureSkipTLSVerify: false
I'm curious what makes cert-controller different in this regard, since it's merely another thing essentially running kubectl apply. Would cert-controller also be subject to the same default-stripping issue, leaving you right back where you began?
Yes, it's removed exactly in the same way, but if it's cert-controller, we don't need to set it in the helm chart, so external tools aren't affected.
Or is the objective to remove the state from the Helm chart such that there is no longer any check for drift for this value?
The objective is to not have to explicitly set the value to default value in general. (not only from helm but from raw manifests)
Modifying controller code to meet the needs of CI/CD code does seem a bit backwards. I am also wary of potentially manually overriding user intent on what could be a very salient field (is there some reason why insecureSkipTLSVerify is set? Will un-setting it cause an outage? Can we be sure the answer will be identical for all projects that consume cert-controller?)
Actually, if cert-controller tries to patch the apiservice
to add the caBundle
but the apiservice
has set insecureSkipTLSVerify: true
, cert-controller will fail because insecureSkipTLSVerify: true
and caBundle
are mutually exclusive. If the value has been set to true
and then cert-controller has been enabled, cert-controller won't work at all, printing errors all the time due to this mutual exclusivity.
As for other options, maybe a post-install hook on the Helm chart to modify the value?
We don't need this feature anymore (nor for helm nor for raw yamls) as we just needed it during some releases after the migration from random certs generated by the pods to real certs, stored in Kubernetes and managed by cert-controller/cert-manager. I mean, we don't need this feature anymore for KEDA xD
I've just pushed it instead of just closing the PR because this project is awesome and has helped us a lot and I assume that we aren't the only one who faces with this and the code was already done.
I can extend any other doubt that you have, said that, if you think that this doesn't make sense, we can close the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if cert-controller tries to patch the apiservice to add the caBundle but the apiservice has set insecureSkipTLSVerify: true, cert-controller will fail because insecureSkipTLSVerify: true and caBundle are mutually exclusive. If the value has been set to true and then cert-controller has been enabled, cert-controller won't work at all, printing errors all the time due to this mutual exclusivity.
This makes sense, but if a user has intentionally skipped TLS verification, will cert-controller overriding it break their use case? If so, consumers would be unable to use that version of cert-controller due to the mandatory override. It comes down to whether its better for cert-controller to return errors or for it to override user intent without context as to why it was set.
I've just pushed it instead of just closing the PR because this project is awesome and has helped us a lot and I assume that we aren't the only one who faces with this and the code was already done.
I very much appreciate the consideration and am glad the project has helped you!
I can extend any other doubt that you have, said that, if you think that this doesn't make sense, we can close the PR
I think if the override was optional, such that projects could opt-in, I would be less concerned. This would probably boil down to adding a new option to the rotator:
cert-controller/pkg/rotator/rotator.go
Lines 219 to 259 in b1a35eb
// CertRotator contains cert artifacts and a channel to close when the certs are ready. | |
type CertRotator struct { | |
reader SyncingReader | |
writer client.Writer | |
SecretKey types.NamespacedName | |
CertDir string | |
CAName string | |
CAOrganization string | |
DNSName string | |
ExtraDNSNames []string | |
IsReady chan struct{} | |
Webhooks []WebhookInfo | |
// FieldOwner is the optional fieldmanager of the webhook updated fields. | |
FieldOwner string | |
RestartOnSecretRefresh bool | |
ExtKeyUsages *[]x509.ExtKeyUsage | |
// RequireLeaderElection should be set to true if the CertRotator needs to | |
// be run in the leader election mode. | |
RequireLeaderElection bool | |
// CaCertDuration sets how long a CA cert will be valid for. | |
CaCertDuration time.Duration | |
// ServerCertDuration sets how long a server cert will be valid for. | |
ServerCertDuration time.Duration | |
// RotationCheckFrequency sets how often the rotation is executed | |
RotationCheckFrequency time.Duration | |
// LookaheadInterval sets how long before the certificate is renewed | |
LookaheadInterval time.Duration | |
// CertName and Keyname override certificate path | |
CertName string | |
KeyName string | |
certsMounted chan struct{} | |
certsNotMounted chan struct{} | |
wasCAInjected *atomic.Bool | |
caNotInjected chan struct{} | |
// testNoBackgroundRotation doesn't actually start the rotator in the background. | |
// This should only be used for testing. | |
testNoBackgroundRotation bool | |
} |
Which defaults to off so that behavior would be backwards compatible for users on upgrade.
Definitely up to you if the extra work is worth your time. I apologize for the slow review cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly reminder @maxsmythe 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want to test both cases, but I hear you about not wanting to propagate the expected assertion everywhere.
Maybe we can use functional options to make this a backwards-compatible change to testWebhook()
.
Functional options blog: https://commandcenter.blogspot.com/2014/01/self-referential-functions-and-design.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f**k! I missed your comment 🤦
I'll review the PR this week 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added another split test for this specific feature. PTAL when you have some time @maxsmythe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly reminder @maxsmythe 😄
Signed-off-by: Jorge Turrado <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #160 +/- ##
==========================================
+ Coverage 57.16% 60.13% +2.97%
==========================================
Files 1 1
Lines 572 577 +5
==========================================
+ Hits 327 347 +20
+ Misses 181 170 -11
+ Partials 64 60 -4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hello! |
Hello @ritazh @maxsmythe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hello ✋ |
@JorTurFer We are waiting on lgtm from @maxsmythe as well. |
Signed-off-by: Jorge Turrado <[email protected]>
aea7e07
to
944182b
Compare
Signed-off-by: Jorge Turrado <[email protected]>
@@ -778,7 +788,7 @@ func (r *ReconcileWH) Reconcile(ctx context.Context, request reconcile.Request) | |||
artifacts, err := buildArtifactsFromSecret(secret) | |||
if err != nil { | |||
crLog.Error(err, "secret is not well-formed, cannot update webhook configurations") | |||
return reconcile.Result{}, nil | |||
return reconcile.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found this error too. If the reconciliation has failed, we should reschedule it manually or just return the error for auto rescheduling it
If the APIService has already registered the value
insecureSkipTLSVerify: true
, cert-controller fails patching the APIService becausecaBundle
andinsecureSkipTLSVerify: true
aren't compatible.This PR updates the behaviour to ensure that
insecureSkipTLSVerify
is false before setting thecaBundle