diff --git a/teps/0138-decouple-api-and-feature-versioning.md b/teps/0138-decouple-api-and-feature-versioning.md index 9e94e2223..c856d709e 100644 --- a/teps/0138-decouple-api-and-feature-versioning.md +++ b/teps/0138-decouple-api-and-feature-versioning.md @@ -45,11 +45,11 @@ This document proposes updating Tekton Pipelines' feature flags design, as origi ## Motivation -Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In v1beta1 CRDs, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to "`stable`", but in v1 CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. This couples API versioning to feature stability, and has led to the following pain points: +Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In `v1beta1` CRDs, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to "`stable`", but in `v1` CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. This couples API versioning to feature stability, and has led to the following pain points: -- [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from v1beta1 to v1 were confused to find `beta` features that worked by default in v1beta1 did not work by default in v1 when `enable-api-fields` was set to "`stable`" (its default value). This is especially confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, especially if they are not aware they are using `beta` features. +- [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from `v1beta1` to `v1` were confused to find `beta` features that worked by default in `v1beta1` did not work by default in `v1` when `enable-api-fields` was set to "`stable`" (its default value). This is especially confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, especially if they are not aware they are using `beta` features. -- For maintainers, the maintenance operation of swapping the storage version from v1beta1 to v1 should not have affected our users. However, we had to [change the user-facing default value of enable-api-fields from `stable` to `beta` ](https://github.com/tektoncd/pipeline/pull/6732) before changing the storage version of the API to [avoid breaking PipelineRuns using `beta` features](https://github.com/tektoncd/pipeline/pull/6444#issuecomment-1580926707). +- For maintainers, the maintenance operation of swapping the storage version from `v1beta1` to `v1` should not have affected our users. However, we had to [change the user-facing default value of enable-api-fields from `stable` to `beta` ](https://github.com/tektoncd/pipeline/pull/6732) before changing the storage version of the API to [avoid breaking PipelineRuns using `beta` features](https://github.com/tektoncd/pipeline/pull/6444#issuecomment-1580926707). - When promoting features, it could cause confusions for contributors to be dependent on the fact whether an apiVersion is available. For example, during [the promotion to beta for projected workspaces](https://github.com/tektoncd/pipeline/pull/5530), the `v1` api's existence led to confusions of what to do with `beta` features in `v1beta1` and its difference with in `v1`. @@ -65,7 +65,7 @@ Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton- - This is a nice-to-have but not necessarily a blocker, since the feature graduating process should not affect the implementation of how features are enabled. - Ensure pending resources don't break with changing feature flags on downgrades or upgrades - As [handling backwards incompatible changes for pending resources](https://github.com/tektoncd/pipeline/issues/6479) pointed out, we have run into the cases where [feature flag info are changed or lost](https://github.com/tektoncd/pipeline/issues/5999) when handling deprecated fields which led the pending resources to break. However, this issue was introduced by the implementation of feature flags rather than its design, and can be addressed separately. - - Users can downgrade their pipeline versions without invalidating stored resources, even if stored resources cannot be run with the downgraded server. Keeping the stored resources valid relates with the storage migration instead of our feature flags implementations, which has been covered in [Storage version migrator v1beta1 -> v1](https://github.com/tektoncd/pipeline/issues/6667) and is out of scope. + - Users can downgrade their pipeline versions without invalidating stored resources, even if stored resources cannot be run with the downgraded server. Keeping the stored resources valid relates with the storage migration instead of our feature flags implementations, which has been covered in [Storage version migrator `v1beta1` -> `v1`](https://github.com/tektoncd/pipeline/issues/6667) and is out of scope. ### Use Cases @@ -107,19 +107,20 @@ This TEP provides a plan for ensuring that feature stability doesn't depend on C ## Design Details ### Change existing validation to decouple feature and API versioning -We will change the current validation for `enable-api-fields=stable` to only allow using stable features regardless of API version. This will resolve the current issue of the coupling of API and feature versioning in v1beta1. More specifically, beta features resolvers, object/ array params and results will require `enable-api-fields` set to `beta` to be used. This means that, for those current beta features (resolvers, object/ array params and results) will no longer be enabled with `enable-api-fields=stable`. -To continue using these features, users will need to explicitly set the `enable-api-fields` flag to `beta`. This change will not affect users who are already using the `enable-api-fields=beta` flag, which is the default and will continue to be. It would change the behaviour for those who only want to enable `stable` features. +We will change the current validation for `enable-api-fields=stable` to only allow using `stable` features regardless of API version for both supported versions `v1beta1` and `v1`. This will resolve the current issue of the coupling of API and feature versioning in `v1beta1`. More specifically, [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) such as resolvers, object/ array params and results will require `enable-api-fields` set to `beta` to be used with `v1beta1` API. This means that, the current [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) (resolvers, object/ array params and results) will no longer be enabled with `enable-api-fields=stable`. +To continue using these beta features, users will need to explicitly set the `enable-api-fields` flag to `beta`. This change will not affect the pipeline deployments with `enable-api-fields` flag set to `beta`. This is the default configuration and will continue to be. This will impact the cluster operators who would like to enable only `stable` features. Those cluster operators will have to either opt off these [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) like resolvers or change the configuration of the pipeline deployment such that `enable-api-fields` is set to `stable` for their deployments. -Note that although this is a behavior change, it is more of a bug fix for the coupling of feature and API versioning. Currently, with `enable-api-fields` set to `stable`, PipelineRuns like [this one](https://github.com/tektoncd/pipeline/blob/main/examples/v1/pipelineruns/beta/git-resolver.yaml) fail because the controller cannot create child TaskRuns. This change will result in a validation failure instead. +Note that although this looks like a behavior change, it is actually a bug fix. Currently, with `enable-api-fields` set to `stable`, PipelineRuns like [this one](https://github.com/tektoncd/pipeline/blob/main/examples/`v1`/pipelineruns/beta/git-resolver.yaml) fail because the controller cannot create child TaskRuns. This change will result in a validation failure instead and this `pipelineSpec` will be prohibited with ``enable-api-fields=stable`. -However, the default value of `enable-api-fields` will **not** be changed and it will remain to be `beta`, so the **beta** features will continue to be enabled by default. +However, the [default](https://github.com/tektoncd/pipeline/blob/main/config/config-feature-flags.yaml#L89) value of `enable-api-fields` continues to be `beta`. The default deployment of Tekton Pipelines comes with the current [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) enabled. - **Impacts on users:** - - Cluster operators: - - Current cluster operators with `enable-api-fields` set to `alpha` or `beta` should not experience any changes. - - This makes it possible for cluster operators to have full control over only stable feature usages, rather than user overrides. For example, currently if cluster operators want their users to only opt-in “stable” features, they cannot do so for v1beta1 apiVersion for the exception of resolver, object params and results. - - Pipeline and Task authors: - This affects pipeline users in the situation where cluster operators have chosen to set `enable-api-fields` to `stable` (i.e. those who have changed the current default `beta` value) and who are accidentally using beta features, like resolvers, in v1beta1. + - Cluster Operators: + - No action needed from the cluster operator. The existing deployments with `enable-api-fields` set to `alpha` or `beta` should not experience any changes. + - This makes it possible for the cluster operators to have a full control over their deployments. The cluster operator can guarantee enabling only `stable` features and avoid user bypassing this enforcement . For example, currently it is not possible for the cluster operators to enable only `stable` features for `v1beta1` with an exception of a list of beta features such as resolver, object params and results. + - Pipeline and Task Authors: + - The same `pipeline` and `task` definitions might not be applied to a cluster with `enable-api-fields` set to `stable` after this change is implemented. + - The `pipeline` and `task` definitions in `v1beta1` implementing [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) will not be applied to a cluster with `enable-api-fields` set to `stable` Such pipeline with beta features used in `v1beta1` apiVersion would not result in a successful pipelineRun because of [the validation after the conversion to `v1` as the storage version](https://github.com/tektoncd/pipeline/blob/d9d2d1760fa534e2dfb16ca656cdf2d293a5900e/pkg/apis/pipeline/`v1`/taskref_validation.go#L39). Since in this example, resolver is still a beta feature, this pipeline should not have been applied to the cluster that has not set `enable-api-fields=beta` in the first place. ### Per feature flag for _new_ api-driven features Introduce per-feature flags for each **new** API driven feature. Each feature will have its own flag, instead of using the group API driven flag `enable-api-fields` to enable or disable all features of a stability level. Note that our proposal only takes effect on api-driven features while behavioural flags will remain the existing beahviour. @@ -129,7 +130,7 @@ The flag will also include the maintainer information and stability level for th For example: ```yaml -apiVersion: v1 +apiVersion: `v1` kind: ConfigMap metadata: name: feature-flags @@ -142,7 +143,7 @@ data: # alpha: v0.53 # # Trusted artifacts has ... functionalities. - # It is disabled by default. + # It is disabled by default. Set it to true to enable this feature. enable-trusted-artifacts: "true" ``` @@ -161,12 +162,12 @@ Note that for per feature flag that has stabilized, they cannot be disabled and The behaviour of existing `enable-api-fields` flag with per feature flag: - Any current beta features can be enabled with `enable-api-fields` set to “beta” or “alpha”. - Any current alpha features can be enabled with `enable-api-fields` set to “alpha”. When current alpha features are promoted to beta, they can be enabled with `enable-api-fields` set to `beta` or `alpha`. -- It will not be possible to enable existing features using per-feature flags. +- The existing features are not adopting this new mechanism of having one flag per feature. It will not be possible to enable existing features using per-feature flags. - We cannot enable existing features using per-feature flags because this would not be backwards compatible. If we allowed this, the individual flag would have precedence over the grouped flag, which means that to preserve backwards compatibility, the individual flag would need to be on by default for beta features and off by default for alpha features. However, this would not be backwards compatible for cluster operators who set `enable-api-fields` to `stable`, since they would also need to override the new `beta` level per feature flags. -- **Cluster operators perspective:** For new features, cluster operators will explicitly turn on or off each features in the ConfigMap. They will be able to choose to turn on a single feature. See future work for how cluster operators would communicate with their users on the list of features enabled. +- **Cluster Operators:** For new features, cluster operators will explicitly turn on or off each features in the ConfigMap. They will be able to choose to turn on a single feature. See future work for how cluster operators would communicate with their users on the list of features enabled. -- **Task and Pipeline authors:** Tekton Pipeline and Task authors will get to know the list of features that are turned on from their service providers. See [future work](#future-work) for the better communication from cluster operators to users for more details. +- **Task and Pipeline Authors:** Tekton Pipeline and Task authors will get to know the list of features that are turned on from their service providers. See [future work](#future-work) for the better communication from cluster operators to users for more details. ### Sunset `enable-api-fields` after existing features stabilize When all existing alpha and beta features have either been stabilized or removed, we will be able to remove the `enable-api-fields` flag. @@ -243,10 +244,10 @@ For the default value of `enable-api-fields` in the long run, in 9 months, it wi - The migration of `enable-api-fields` back to `stable` is backwards incompatible. ### New `enable-api-fields-new` flag; wait for all existing beta/alpha features to stabilize -This alternative proposes introducing a new flag `enable-api-fields-new` that validates new alpha and beta features and leave the existing flag `enable-api-fields` as is for existing alpha/beta features. Once all existing beta/alpha features become stable or v1beta1 apiVersion is removed, we could remove the existing `enable-api-fields`. +This alternative proposes introducing a new flag `enable-api-fields-new` that validates new alpha and beta features and leave the existing flag `enable-api-fields` as is for existing alpha/beta features. Once all existing beta/alpha features become stable or `v1beta1` apiVersion is removed, we could remove the existing `enable-api-fields`. #### Pros -- This has few impacts on users who are currently using `enable-api-fields=stable` for v1beta1. +- This has few impacts on users who are currently using `enable-api-fields=stable` for `v1beta1`. #### Cons - This will potentially adds to confusions to users for newly promoted alpha or beta features. @@ -256,7 +257,7 @@ This alternative proposes introducing a new flag `legacy-enable-beta-features-by This chart would apply to the existing beta features (array results, array indexing, object params and results, and remote resolution): -| enable-api-fields | legacy-enable-beta-features-by-default | enabled in v1beta1? | enabled in v1? | +| enable-api-fields | legacy-enable-beta-features-by-default | enabled in `v1beta1`? | enabled in `v1`? | | ------ | ------ | --- | --- | | beta | true | yes | yes | | beta | false | yes | yes | @@ -265,7 +266,7 @@ This chart would apply to the existing beta features (array results, array index For new beta features(e.g. matrix in the future): -| enable-api-fields | legacy-enable-beta-features-by-default | enabled in v1beta1? | enabled in v1? | +| enable-api-fields | legacy-enable-beta-features-by-default | enabled in `v1beta1`? | enabled in `v1`? | | ------ | ------ | --- | --- | | beta | true | yes | yes | | beta | false | yes | yes | @@ -290,13 +291,13 @@ This alternative proposes keeping beta features on by default. It will keep the - This is backwards compatible. - Changes are minimal for the current codebase with probably only documentation required. #### Cons -- We would still need to use `beta` as the default stability of features that are turned on. Some beta features are still coupled in feature and API versioning. This will not allow cluster operators to opt-in only stable features in v1beta1. +- We would still need to use `beta` as the default stability of features that are turned on. Some beta features are still coupled in feature and API versioning. This will not allow cluster operators to opt-in only stable features in `v1beta1`. ### Give 9-month warning before breaking changes and default to stable -This alternative proposes the validation change for stable features in v1beta1 to be done with giving a notice for the breaking change for 9 months. +This alternative proposes the validation change for stable features in `v1beta1` to be done with giving a notice for the breaking change for 9 months. #### Cons: -- Since v1beta1 only has 12 months of support period left, making a breaking change to it might have more impacts on v1beta1 users than its worth after 9 months. +- Since `v1beta1` only has 12 months of support period left, making a breaking change to it might have more impacts on `v1beta1` users than its worth after 9 months. ## Implementation Plan @@ -390,7 +391,7 @@ For testing out individual per feature flags, we will use unit tests for each si - Testing of behavioural features remains the way they run as of today. ## Future Work -- For seeking better way of communicating enabled features from cluster operators and downstream users including service providers and pipeline authors. CLI feature request to query all enabled features at each stability level via tkn. +- For seeking better way of communicating enabled features from cluster operators to downstream users including service providers and pipeline authors, we could consider having the CLI feature request to query all enabled features at each stability level via tkn. - Example output: ``` $pipeline: tkn features list enabled @@ -398,9 +399,10 @@ For testing out individual per feature flags, we will use unit tests for each si beta: , alpha: ``` +- For debugging purposes for Pipeline end users, we could have included all enabled feature flags in the output yaml of PipelineRuns and TaskRuns, for example one way is to keep it in annotations. This would be beneficial to users who do not have access to the configMap or the controller logs. - For preserving the ability of easily enabling all features of a stability level, we could provide some script that turns on all alpha/beta features by modifying the group of per feature flags. -- In order to avoid feature To have the process of documenting the stability of features asn an auto genenrated process according to the Per Feature Flags struct being updated as a source of truth. +- To avoid the possible errors from the manual process of documenting feature stability, we can automate the process by using the stability level field of the Per Feature Flags struct as the source of truth. For example, a script could be added to ./hack for updating the ./config/config-feature-flags.yaml. - Per feature flag with new value for enable-api-fields `none` This is similar to the proposed solution to introduce per feature flag and migrate Tekton to opt-in onl stable features in a backwards compatible way, except that we are introducing a new value `none` for `enable-api-fields` and it must be used for per feature flags for existing features. All new features enabled via per feature flags will be off by default, regardless of the value of “enable-api-fields”.