
Change the way the certificate for webhook is generated #1345

Open

muffl0n opened this issue Mar 31, 2022 · 21 comments

Comments

muffl0n (Contributor) commented Mar 31, 2022

Currently, the certificate for the ValidatingWebhook is generated on installation.
This has the drawback that helm diff reports a change on every run: the reported certificate changes make it hard to distinguish what is due to one's own configuration changes.

I took a look at several solutions and found the way "ingress-nginx" does it the smartest.
This is a rough breakdown of how they do it:

  • On helm install, the ValidatingWebhook is installed without cert data
  • A Job is also installed that runs immediately, creates a certificate and Secret, and patches the ValidatingWebhook to use that Secret

They use the image jet/kube-webhook-certgen; see eclipse-che/che#17855 (comment) for some more details.
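As an illustration only (not Kanister's actual chart code), a condensed sketch of that pattern could look like the two Helm-hook Jobs below. All names, the namespace, and the image tag are assumptions, and the RBAC objects the Jobs need are omitted:

```yaml
# Sketch of the ingress-nginx-style cert Jobs; names and tags are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: kanister-webhook-cert-create
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: OnFailure
      serviceAccountName: kanister-webhook-certgen  # needs RBAC to manage the Secret
      containers:
        - name: create
          image: registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.4.0
          args:
            - create
            - --host=kanister-operator,kanister-operator.kanister.svc
            - --namespace=kanister
            - --secret-name=kanister-webhook-certs
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kanister-webhook-cert-patch
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: OnFailure
      serviceAccountName: kanister-webhook-certgen  # needs RBAC to patch the VWC
      containers:
        - name: patch
          image: registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.4.0
          args:
            - patch
            - --webhook-name=kanister-validating-webhook
            - --namespace=kanister
            - --secret-name=kanister-webhook-certs
            - --patch-validating=true
            - --patch-mutating=false
            - --patch-failure-policy=Fail
```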

I would love to contribute to this, once it gets accepted as "we should do this". I'm relatively new to webhooks, so this could cost me a bit more time. But I'm eager to learn. :)

See https://kanisterio.slack.com/archives/C85C8V22J/p1647870529989079?thread_ts=1647866137.638459&cid=C85C8V22J for reference.

ihcsim (Contributor) commented Mar 31, 2022

Would it be acceptable to fall back to using the Kubernetes apiserver trust root? That seems to be the behaviour if the caBundle field isn't specified in the validating webhook configuration.

muffl0n (Contributor, Author) commented Mar 31, 2022

But you still have to put a certificate into the pod that acts as the ValidatingWebhook, right?
Currently the certificate is generated in validating-webhook.yaml and stored in the Secret "kanister-webhook-certs", which is mounted as "webhook-certs" into the operator in deployment.yaml.
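(For context, the current mechanism boils down to Helm's Sprig crypto functions minting a fresh certificate at template time; below is a minimal sketch of that pattern with illustrative names, not the chart's exact template.)

```yaml
{{- /* Sketch: a Helm-generated webhook cert. It is regenerated on every
       render, which is why helm diff always sees a change. */ -}}
{{- $cn := "kanister-operator" -}}
{{- $altNames := list (printf "%s.%s.svc" $cn .Release.Namespace) -}}
{{- $ca := genCA "kanister-webhook-ca" 365 -}}
{{- $cert := genSignedCert $cn nil $altNames 365 $ca -}}
apiVersion: v1
kind: Secret
metadata:
  name: kanister-webhook-certs
type: kubernetes.io/tls
data:
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
```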

ihcsim (Contributor) commented Mar 31, 2022

I wonder if the webhook can issue a CSR to the apiserver and get a signed certificate from it.
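(For reference, a rough sketch of what that CSR-based flow might look like. Note that certificates.k8s.io/v1 no longer ships a built-in signer for arbitrary serving certificates, so the signerName below is hypothetical and would need a custom signer in the cluster.)

```yaml
# Sketch: asking the cluster's CSR API to sign the webhook's serving cert.
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: kanister-webhook
spec:
  # base64-encoded PKCS#10 CSR for the webhook service's DNS names
  request: <base64-encoded CSR>
  signerName: example.com/webhook-signer  # hypothetical; no built-in signer fits
  usages:
    - digital signature
    - key encipherment
    - server auth
```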

muffl0n (Contributor, Author) commented Mar 31, 2022

An approach that seems to work that way is described here: TLS Certificates for Kubernetes Admission Webhooks made easy with Certificator and Helm Hook?

ihcsim (Contributor) commented Mar 31, 2022

Looks like it uses the same K8s CSR API. Just trying to see if we can avoid introducing external components.

ihcsim (Contributor) commented Apr 11, 2022

Thanks for putting #1369 together. It helps to see what the code looks like 👍. At this point, we want to keep Kanister unopinionated with regard to cert management solutions. Depending on the user's requirements, there will always be legitimate reasons to pick one solution over another.

Our current thinking is to keep the current Helm-generated cert approach as the default mechanism, and to offer users who have more advanced, production-readiness requirements a "bring your own cert" approach, configurable through the Helm chart.

Looks like you have already done some work with cert-manager and Kanister. How do you feel about putting together a reference architecture to show how to deploy Kanister into a cluster that already has cert-manager deployed? The Kanister Helm chart will likely need to be updated to make the VWC caBundle and webhook Secret optional. This will keep Kanister's deployment model free from specific solutions, coupled only to K8s APIs like Secret and ValidatingWebhookConfiguration. It will also be interesting to see if we can generalize this "bring your own cert" approach to work with Vault.
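(Something like the following values.yaml knobs could express those optional parts; the key names are hypothetical, not the chart's actual API.)

```yaml
# Hypothetical "bring your own cert" configuration in values.yaml.
validatingWebhook:
  # Name of a pre-existing kubernetes.io/tls Secret to mount instead of
  # a Helm-generated one. Empty means "generate with Helm" (the default).
  existingSecret: ""
  # Base64-encoded CA bundle for the VWC's caBundle field. Leave empty if
  # an external tool (e.g. cert-manager's CA injector) fills it in.
  caBundle: ""
```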

Then we can publish the reference architecture as documentation and blog post. WDYT?

muffl0n (Contributor, Author) commented Apr 11, 2022

Thanks for your feedback! Personally, I do not see why Kanister should not "stand on the shoulders of giants". As "ingress-nginx" and "kube-prometheus-stack" (to name only two) both use the kube-webhook-certgen approach, we should adopt it as a sane default. Imho, this is better than the current solution, and not only because helm diff currently reports a change on every run.

"kube-prometheus-stack" even has the possibility to choose between the approach with kube-webhook-certgen and a full blown solution with "cert-manager":
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L1481-L1516

Maybe we could take that path in two steps: first, adopt kube-webhook-certgen; second, add the solution with "cert-manager" and/or "vault".

ihcsim (Contributor) commented Apr 11, 2022

Sounds like we agree that cert-manager is a more complete solution. Even the kube-webhook-certgen docs recommend it. Let me know if you are interested in putting together a reference architecture.

As for what the default mode should be, at this point I think the maintenance overhead and footprint of bringing in kube-webhook-certgen outweigh its benefits. I pointed out some of my concerns at #1369 (comment). Fundamentally, the tool does two things: create the cert and patch the VWC. It's not hard to do the same with plain Helm, following the same pattern (an additional Job, hooks, etc.).

Drawing from my experience with service meshes and distributed tracing, most serious clusters already come with their own cert management solution. Hence, having a strong "bring your own" story is more important than what the defaults are.

I'd argue that the concern with helm diff is a tooling issue, not a PKI issue. helm diff --suppress might work to some extent. If not, one could raise an issue with that project to make --suppress support more granular filtering.

muffl0n (Contributor, Author) commented Apr 11, 2022

I agree that we should recommend the cert-manager solution. But I still vote for a default based on kube-webhook-certgen (or similar), as it will be sufficient for the majority of users.

I still refuse to see the Helm-generated certificate as a valid solution. What happens if someone installs Kanister and does not touch it for 365 days? Suddenly the webhook will just stop working "out of the blue". That's not a good default, imho.
Either we should provide a sane default (as with kube-webhook-certgen) or we should force the user to configure cert-manager (or similar) for the webhook.

ihcsim (Contributor) commented Apr 11, 2022

Won't kube-webhook-certgen have the same issue, since it relies on the next helm upgrade to run the job?

muffl0n (Contributor, Author) commented Apr 11, 2022

It runs the job on install, which creates the certificate and patches the webhook.
In fact, the job runs on every install. But as the certificate has not expired (100-year validity), it does nothing on consecutive runs.

shuguet (Member) commented Apr 11, 2022

Could we support an alternate way of selecting the cert, for example by referencing a secret by name?
That way the cert could either be generated through helm (keeping the current behaviour), or picked up from a pre-existing secret, generated any way the user chooses, with no opinion from us on what the underlying Kubernetes distribution or end user decides.

muffl0n (Contributor, Author) commented Apr 14, 2022

I'm not sure the discussion is going in the right direction. We should keep this as simple as possible for the user, preferably with an out-of-the-box solution that "just works" (and does not produce confusing output on every update/install). For advanced users we should allow the use of cert-manager, made as easy as in the linked kube-prometheus-stack project.

As I'm on maternity leave, I can't invest more time at the moment. But I think the work that has already been done should be sufficient to implement a sophisticated solution.

ihcsim (Contributor) commented Apr 14, 2022

Essentially, what @shuguet describes is how cert-manager and many other cert management solutions work, i.e. they generate a K8s secret (of type kubernetes.io/tls), and the server (the webhook, in our case) fetches its TLS cert from that secret. How the secret comes into existence is up to the cert management solution.

@muffl0n No rush on this. All the best on your maternity leave.

muffl0n (Contributor, Author) commented May 12, 2022

We are missing one important point here: the CA also needs to be injected into the webhook. In my example, this is done with this annotation:
https://github.com/muffl0n/kanister/blob/1c50d48fc69ede7d68e314bf0c41a6359b3f80d4/helm/kanister-operator/templates/webhook/validating-webhook.yaml#L6-L7

  annotations:
    cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "kanister-operator.fullname" . }}-admission

So it's not only "pick up the secret that somebody created with some tool", it's also "inject the CA into our webhook". Or at least we should allow the user to specify which annotation to add. But this is only how cert-manager does it; there is no other similar tool I know of, so I don't have deeper knowledge of what we should allow the user to do.
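(For context, the cert-manager objects backing that annotation would look roughly like the sketch below. The names follow the annotation above but are assumptions here, as is the namespace in the DNS name.)

```yaml
# Sketch: the Issuer/Certificate pair that produces the webhook Secret and
# lets cert-manager's CA injector fill in the VWC's caBundle.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: kanister-operator-selfsigned   # illustrative; any Issuer/ClusterIssuer works
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kanister-operator-admission    # must match the inject-ca-from annotation
spec:
  secretName: kanister-webhook-certs   # the Secret mounted by the operator Deployment
  dnsNames:
    - kanister-operator.kanister.svc   # webhook service DNS name; namespace illustrative
  issuerRef:
    name: kanister-operator-selfsigned
```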

So, as I fear that we are over-engineering this case, I'm coming back to my statement:

We should keep this as simple as possible for the user, preferably with a out of the box solution that "just works" [...] For the advanced user we should allow the usage of cert-manager [...]"

ihcsim (Contributor) commented May 12, 2022

If the user wants to bring their own cert, the Helm chart won't generate the caBundle in the webhook configuration. Instead, it becomes a value configurable in the chart's values.yaml. Normally, this will be a long-lived, organization-owned CA root cert. It's easier with cert-manager because of the supported annotation.
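(In template terms, that could look roughly like this; the values key is hypothetical and the caBundle value is assumed to be base64-encoded PEM.)

```yaml
# Sketch: making caBundle optional in the VWC template.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: {{ include "kanister-operator.fullname" . }}
webhooks:
  - name: validate.kanister.io              # illustrative webhook name
    clientConfig:
      service:
        name: {{ include "kanister-operator.fullname" . }}
        namespace: {{ .Release.Namespace }}
        path: /validate                     # illustrative path
      {{- with .Values.validatingWebhook.caBundle }}
      caBundle: {{ . }}                     # only set when the user provides one
      {{- end }}
    admissionReviewVersions: ["v1"]
    sideEffects: None
    rules: []                               # omitted for brevity
```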

muffl0n (Contributor, Author) commented May 19, 2022

That sounds reasonable; I didn't think of it that way. I still have the feeling that it's overkill, though. :)

github-actions (bot)

This issue is marked as stale due to inactivity. Add a new comment to reactivate it.

github-actions bot added the stale label Jul 19, 2022
github-actions (bot)

This issue is closed due to inactivity. Feel free to reopen it, if it's still relevant.

ihcsim (Contributor) commented Aug 19, 2022

Still relevant.

felixlut commented Jan 8, 2024

Just to add to the conversation above:

Generating the cert directly in Helm exacerbates the issue when working in a GitOps manner with, for example, ArgoCD (see the image below). Since ArgoCD constantly runs helm under the hood to generate the desired manifests and compare them to the cluster, the diff in the secret will be there indefinitely. While working on my Kanister deployment, this led to headaches dealing with intermittent TLS errors.

Going one step further, my team has started looking into the Rendered Manifests Pattern, which is also incompatible with Helm-generated secrets.

Both of these issues would be solved if the cert were generated server-side as opposed to client-side (in Helm).

[image: ArgoCD diff]
