Skip to content
This repository has been archived by the owner on Jun 28, 2023. It is now read-only.

Question: Possible to have faster feedback for problematic package configuration? #636

Closed
karuppiah7890 opened this issue May 28, 2021 · 15 comments

Comments

@karuppiah7890
Copy link
Contributor

I installed the external-dns package in my local CAPD standalone cluster to try it out, only to realize that external-dns works with multiple providers but there's no default local provider to see some sort of local demo, without using cloud or any external DNS services. I saw the below error as I hadn't configured the package while installing

$ kubectl get all -A | rg external
external-dns        pod/external-dns-76dc9d8c5b-4jtsw                                    0/1     CrashLoopBackOff   1          9s
external-dns      deployment.apps/external-dns             0/1     1            0           10s
external-dns      replicaset.apps/external-dns-76dc9d8c5b            1         1         0       10s

$ kubectl logs -f deploy/external-dns
time="2021-05-28T11:14:26Z" level=fatal msg="flag parsing error: required flag --provider not provided"

And then I saw the package configuration mentioning "required" arguments / package configuration

$ tanzu package configure external-dns.tce.vmware.com
Looking up config for package: external-dns.tce.vmware.com:
Values files saved to external-dns.tce.vmware.com-values.yaml. Configure this file before installing the package.

$ cat external-dns.tce.vmware.com-values.yaml 
#@data/values
#@overlay/match-child-defaults missing_ok=True
---

#! The namespace in which to deploy ExternalDNS.
namespace: external-dns

#! Deployment related configuration
#@overlay/replace
deployment:
  #! (REQUIRED) Args passed via command-line to external-dns
  #! Provided is an example list of arguments for an RFC2136 provider (BIND).
  #! For more guidance on configuration options for your desired DNS provider, consult the
  #! ExternalDNS docs at https://github.com/kubernetes-sigs/external-dns#running-externaldns
  args:
    - --source=service
    - --source=contour-httpproxy
    - --txt-owner-id=k8s
    - --domain-filter=k8s.example.org
    - --namespace=tanzu-system-service-discovery
    #! - --provider=rfc2136
    #! - --rfc2136-host=100.69.97.77
    #! - --rfc2136-port=53
    #! - --rfc2136-zone=k8s.example.org
    #! - --rfc2136-tsig-secret=MTlQs3NNU=
    #! - --rfc2136-tsig-secret-alg=hmac-sha256
    #! - --rfc2136-tsig-keyname=externaldns-key
    #! - --rfc2136-tsig-axfr
  #! Environment variables to pass to external-dns
  env: []
  #! Security context of the external-dns container
  securityContext: {}
  #! Volume mounts of the external-dns container
  volumeMounts: []
  #! Volume of the external-dns pod
  volumes: []

I was wondering if it's possible to have a faster feedback loop in such cases where I get the error while I try to install and not after I install. Here it's clear that some part of the configuration is required. I understand it's part of a list field where some elements are required. I just wanted to ask if we have any plans to have a faster feedback loop in such cases by having some sort of validation on package configuration.

Also, do we have any schema validation for package configuration? Like, unknown fields, optional / required fields, field type etc. It didn't seem like we have it when I tried to pass some config with unknown fields for a package installation. I tried with v0.4.0. If it isn't there, are there any plans for it? As part of package configuration validation

@karuppiah7890
Copy link
Contributor Author

Related #494

@jpmcb
Copy link
Contributor

jpmcb commented Jun 1, 2021

Hmmm great question.

I think there is a way to create overlays where if there is no value provided, the ytt interpolation fails, therefore preventing a package from being installed with improper or missing configurations.

I think most of the packages currently try to provide some kind of sensible default in the vaules.yaml file, but external DNS is likely one of those that can't provide some sensible default. So, I'm not sure what the right path here would be. We definitely want to provide quick time to value, and having packages with a default install experience of "failed - missing configurations" may not be great for user experience.

Regarding schema validation, ytt is pretty good at that and is already part of the packaging process:

$ ytt -f addons/packages/prometheus/bundle/config
...
# the output will be the entire config yaml manifest

$ echo $!
0

if any part of the overlays were missing or unable to match, the ytt step would fail. However, this doesn't do any kind of higher order "kubernetes" validation. For example, if a namespace field was missing from some object in the manifest, ytt would still interpolate it. It's up to the package authors to build overlays that match and confirm their packages correctly.

@karuppiah7890
Copy link
Contributor Author

karuppiah7890 commented Jun 2, 2021

We definitely want to provide quick time to value, and having packages with a default install experience of "failed - missing configurations" may not be great for user experience.

@jpmcb I didn't get that part. Do you mean that having a failure message isn't a great user experience in this case and we also want to provide quick feedback in case of errors such as above for quick time to value?

@jpmcb
Copy link
Contributor

jpmcb commented Jun 3, 2021

In my opinion, the "default" experience for installing a package should not be a failure. In other words, if I go to install a package without first configuring it, I would expect that package to just work right away. My gut feeling is that this is how most users will go about "discovering" packages. And I would also argue this has become the expectation in the industry; primarily, any package I install via Helm generally just works.

Most of the packages today do provide a good, sensible default install experience. But there may be packages that need additional configuration by the user before a default install experience works. And it's probably a mistake to assume that people will read the docs before trying a default install.

So, yes, it would be great to have some kind of validation mechanism, maybe something that ytt or carvel provides, and but I don't believe the best way to do that is to have a failing default package that expects some user configuration before it will ever work.

Would also be good to get @rosenhouse take on this since there may be ways within the external-dns package to make the default install experience better.

@rosenhouse
Copy link
Contributor

rosenhouse commented Jun 3, 2021

@cppforlife How are you thinking about workflows for required values that have no sensible default?

@rosenhouse
Copy link
Contributor

rosenhouse commented Jun 3, 2021

We just talked about this in the TKG Package Management office hours. Consensus seems to be: we should add validation to the ytt templating for the external-dns package, to make an earlier failure.


More context on external-dns itself: the whole point is to drive an external DNS provider, and we're not going to be able to infer that from an arbitrary cluster.

Deploying an in-cluster DNS server as part of the default package, just so we can give an in-cluster demo, would be possible. But I'm skeptical. That would require maintaining an entirely separate binary (e.g. CoreDNS or BIND or something) in the package, which doubles (at least) the maintenance burden for this package -- DNS servers are notorious CVE factories.

One possible upside to that is that it would then be natural to include a smoke-test with the package that actually exercises the whole end-to-end user workflow.

@rosenhouse
Copy link
Contributor

rosenhouse commented Jun 3, 2021

Follow-ups:

  1. we think the ideal UX is that tanzu package install exits non-zero with a useful error message
  2. we think that would require the CLI to do some status-checking on the package, and for the package to have status depend on the kapp app
  3. we're unclear if that happens today
  4. either way, we can introduce templating validation, to at least provide us some control over the error message. If the tanzu cli eventually gets a feature to wait/block on successful templating, then the desired UX will happen

@jorgemoralespou
Copy link
Contributor

jorgemoralespou commented Jun 4, 2021

Here's a POC that I did for external-dns to make configuration more sensible.

I have created values in a more targetted way so that a user would need to provide just the configuration that is relative to his infrastructure and dns provider: https://github.com/jorgemoralespou/tce/blob/externaldns-configurable/addons/packages/external-dns/bundle/config/values.yaml#L15

There's some validations (currently happening on the server side) via assertions: https://github.com/jorgemoralespou/tce/blob/externaldns-configurable/addons/packages/external-dns/bundle/config/overlays/assertions.yaml. If any validation fails, Reconciliation will fail and a meaningful error will be presented to the user.

With the new Carvel APIs and Schemas validation I would expect something similar, and validation being able to happen on the client side. This is documented on the TCE experience for Carvel packages items 10, 11 and 13

@flawedmatrix
Copy link
Contributor

Hey @jorgemoralespou, thank you for the nice POC with the more sensible external-dns configuration. I think after dealing with the difficulties in getting the args array approach to work with ytt, it would be a good idea to start moving towards more curated configuration.

The only concern I have is that we would be effectively removing support for providers other than the infrastructures that we list. However, if a user wanted to use a provider that we don't support, would it be fair to say that they should simply not use the TCE package and just install external-dns manually? I also imagine we could add support for different providers over time.

The line items you have referred to in vmware-tanzu/tanzu-framework#1570 sounds like great ways to address the problem of slow feedback whenever there is problematic package configuration.

@jorgemoralespou
Copy link
Contributor

if a user wanted to use a provider that we don't support, would it be fair to say that they should simply not use the TCE package and just install external-dns manually?

I would think so. Also, in the future, users will be able to provide their whole overlays, so they could still use this package if they overlay what they need.

@karuppiah7890
Copy link
Contributor Author

Interesting perspectives and ideas! 😮

And wow! I didn't realize my trivial question would lead to so much discussion! Seems like I might have caused some trouble too! Apologies! 😅

And seems like this issue / feature is being tracked as part of vmware-tanzu/tanzu-framework#1570 . Apologies for not checking it out earlier before creating this issue

@mcwumbly
Copy link
Contributor

mcwumbly commented Jun 8, 2021

I didn't realize my trivial question would lead to so much discussion!

This is a good thing! We should encourage more questions like these that lead to important discussions, particularly when they are grounded in details of an example of a real experience.

And seems like this issue / feature is being tracked as part of vmware-tanzu/tanzu-framework#1570. Apologies for not checking it out earlier before creating this issue

That issue looks like it aims to cover the entire experience for users of packages so that various related issues can be seen in one place. It seems like a natural consequence that it will ultimately link to more specific issues like the one you opened here. They ended up getting cross-linked in the process anyways.

(just my 2 cents, on this meta topic anyway)

@jorgemoralespou
Copy link
Contributor

And seems like this issue / feature is being tracked as part of vmware-tanzu/tanzu-framework#1570. Apologies for not checking it out earlier before creating this issue

That issue looks like it aims to cover the entire experience for users of packages so that various related issues can be seen in one place. It seems like a natural consequence that it will ultimately link to more specific issues like the one you opened here. They ended up getting cross-linked in the process anyways.

Yes, that issue os part of a different user story, and this happened to be linked. The fact that you opened this issue just validates the existence of the user story in vmware-tanzu/tanzu-framework#1570.

@flawedmatrix if you think the approach I took on my fork is valid I could just complete it and make a PR that you guys can later refine.

@flawedmatrix
Copy link
Contributor

@jorgemoralespou That sounds great to me. Thanks for taking the time to drive this!

@karuppiah7890
Copy link
Contributor Author

Since package plugin has moved to Tanzu Framework with a completely new implementation, this issue seems obsolete now. Once the package plugin from Tanzu Framework is tried out, we can check if this issue still holds and if yes, then create an issue in Tanzu Framework and take actions there. Closing this issue as there's no action in TCE for this issue and is also obsolete as I mentioned previously

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants