diff --git a/teps/0033-tekton-feature-gates.md b/teps/0033-tekton-feature-gates.md index 7ebd4656d..64a58f544 100644 --- a/teps/0033-tekton-feature-gates.md +++ b/teps/0033-tekton-feature-gates.md @@ -1,5 +1,7 @@ --- status: implemented +superseded-by: +- TEP-0138 title: Tekton Feature Gates creation-date: '2020-11-20' last-updated: '2021-12-16' diff --git a/teps/0138-decouple-api-and-feature-versioning.md b/teps/0138-decouple-api-and-feature-versioning.md index 055f7bca6..9e94e2223 100644 --- a/teps/0138-decouple-api-and-feature-versioning.md +++ b/teps/0138-decouple-api-and-feature-versioning.md @@ -1,8 +1,8 @@ --- -status: proposed -title: Decouple api and feature versioning +status: implementable +title: Decouple API and feature versioning creation-date: '2023-07-07' -last-updated: '2023-07-27' +last-updated: '2023-08-23' authors: - '@JeromeJu' - '@chitrangpatel' @@ -17,6 +17,25 @@ authors: - [Goals](#goals) - [Non-Goals](#non-goals) - [Use Cases](#use-cases) +- [Requirements](#requirements) +- [Proposal](#proposal) +- [Design Details](#design-details) + - [Change existing validation to decouple feature and API versioning](#change-existing-validation-to-decouple-feature-and-api-versioning) + - [Per feature flag for new api-driven features](#per-feature-flag-for-new-api-driven-features) + - [Sunset `enable-api-fields` after existing features stabilize](#sunset-enable-api-fields-after-existing-features-stabilize) +- [Design Evaluation](#design-evaluation) + - [Pros and cons](#pros-and-cons) +- [Alternatives](#alternatives) + - [Per feature flag with new value for enable-api-fields `none`](#per-feature-flag-with-new-value-for-enable-api-fields-none) + - [New `legacy-enable-beta-features-by-default` flag](#new-legacy-enable-beta-features-by-default-flag) + - [Make `beta` feature validation changes now](#make-beta-feature-validation-changes-now-migrate-enable-api-fields-to-stable-in-9-months) + - [New`legacy-stable` value for 
`enable-api-fields`](#new-legacy-stable-value-for-enable-api-fields-migrate-enable-api-fields-to-stable-in-9-months) + - [Make validation changes only for new beta features](#make-validation-changes-only-for-new-beta-features) + - [Give 9-month warning before making v1beta1 validation changes and default to stable](#give-9-month-warning-before-making-v1beta1-validation-changes-and-default-to-stable) + - [Wait until v1beta1 is removed to swap `enable-api-fields` back to `stable`](#wait-until-v1beta1-is-removed-to-swap-enable-api-fields-back-to-stable) +- [Implementation Plan](#implementation-plan) + - [Test Plan](#test-plan) +- [Future Work](#future-work) - [References](#references) @@ -26,21 +45,21 @@ This document proposes updating Tekton Pipelines' feature flags design, as origi ## Motivation -Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In `v1beta1 CRDs`, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to "`stable`", but in `v1` CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. This couples API versioning to feature stability, and has led to the following pain points: +Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In v1beta1 CRDs, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to "`stable`", but in v1 CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. 
This couples API versioning to feature stability, and has led to the following pain points: -- [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from `v1beta1` to `v1` were confused to find `beta` features that worked by default in `v1beta1` did not work by default in `v1` when `enable-api-fields` was set to "`stable`" (its default value). This is especially confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, especially if they are not aware they are using `beta` features. +- [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from v1beta1 to v1 were confused to find `beta` features that worked by default in v1beta1 did not work by default in v1 when `enable-api-fields` was set to "`stable`" (its default value). This is especially confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, especially if they are not aware they are using `beta` features. -- For maintainers, the maintenance operation of swapping the storage version from `v1beta1` to `v1` should not have affected our users. However, we had to [change the user-facing default value of enable-api-fields from `stable` to `beta` ](https://github.com/tektoncd/pipeline/pull/6732) before changing the storage version of the API to [avoid breaking PipelineRuns using `beta` features](https://github.com/tektoncd/pipeline/pull/6444#issuecomment-1580926707). +- For maintainers, the maintenance operation of swapping the storage version from v1beta1 to v1 should not have affected our users. 
However, we had to [change the user-facing default value of enable-api-fields from `stable` to `beta` ](https://github.com/tektoncd/pipeline/pull/6732) before changing the storage version of the API to [avoid breaking PipelineRuns using `beta` features](https://github.com/tektoncd/pipeline/pull/6444#issuecomment-1580926707). - When promoting features, it could cause confusions for contributors to be dependent on the fact whether an apiVersion is available. For example, during [the promotion to beta for projected workspaces](https://github.com/tektoncd/pipeline/pull/5530), the `v1` api's existence led to confusions of what to do with `beta` features in `v1beta1` and its difference with in `v1`. -### Goals +## Goals -- Feature validations and implementation should be independant from any API version. -- Come up with a backward-compatible migration plan for setting the `enable-api-fields` feature flag to `stable` in the long term. -- Changes and updates made to the existent feature validaitons regarding decoupling api and feature versioning should keep much backwards compatiblity as possible. +- Feature validations and implementation should be independent from any API version. +- Come up with a plan that makes the migration easier for setting feature flags to enable stable features only by default in the long term. +- Changes and updates made to the existing feature validations regarding decoupling API and feature versioning should keep as much backwards compatibility as possible. -### Non goals +## Non goals - Better guidance on feature promotion and when features can be promoted - This is a nice-to-have but not necessarily a blocker, since the feature graduating process should not affect the implementation of how features are enabled. @@ -72,8 +91,330 @@ Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton- **Tekton Maintainers** - I would like to be able to migrate the apiVersion without having to make backwards incompatible changes. 
+## Requirements +- Avoid making changes to existing features' stability levels just for the purpose of addressing the coupling of feature and API versioning. +- Avoid blocking the promotion of `beta` features from the existing `alpha` features. +- It should have a testing strategy that will give us confidence in our implementations of per feature flags and changes to existing feature flags. + +## Proposal + +This TEP provides a plan for ensuring that feature stability doesn't depend on CRD API version, and using per-feature flags to move to a future where only stable features are enabled by default through the following steps: +- [Change existing validation to decouple feature and API versioning](#change-existing-validation-to-decouple-feature-and-api-versioning) +- [Per feature flag for new api-driven features](#per-feature-flag-for-new-api-driven-features) +- [Sunset `enable-api-fields` after existing features stabilize](#sunset-enable-api-fields-after-existing-features-stabilize) + + +## Design Details + +### Change existing validation to decouple feature and API versioning +We will change the current validation for `enable-api-fields=stable` to only allow using stable features regardless of API version. This will resolve the current issue of the coupling of API and feature versioning in v1beta1. More specifically, the current beta features (resolvers, object/array params and results) will require `enable-api-fields` to be set to `beta`. This means that those beta features will no longer be enabled with `enable-api-fields=stable`. +To continue using these features, users will need to explicitly set the `enable-api-fields` flag to `beta`. This change will not affect users who are already using the `enable-api-fields=beta` flag, which is the default and will continue to be. It would change the behaviour for those who only want to enable `stable` features. 
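The decoupled check described above can be sketched in Go as follows. This is only an illustration under stated assumptions: `ValidateFeature`, the `rank` map, and the feature names passed to it are hypothetical and are not taken from the Tekton Pipelines codebase. The point it shows is that the function takes no API version argument, so the outcome is the same for v1beta1 and v1.

```go
package main

import "fmt"

// Stability levels accepted by the enable-api-fields flag.
const (
	Stable = "stable"
	Beta   = "beta"
	Alpha  = "alpha"
)

// rank orders stability levels from most to least stable.
// Hypothetical helper, not part of Tekton.
var rank = map[string]int{Stable: 0, Beta: 1, Alpha: 2}

// ValidateFeature reports an error when a feature of the given stability
// level is used while enable-api-fields is set to a stricter level.
// It deliberately has no apiVersion parameter: the result is identical
// for v1beta1 and v1 resources.
func ValidateFeature(feature, featureStability, enableAPIFields string) error {
	need, ok := rank[featureStability]
	if !ok {
		return fmt.Errorf("unknown stability level %q for feature %s", featureStability, feature)
	}
	have, ok := rank[enableAPIFields]
	if !ok {
		return fmt.Errorf("unknown enable-api-fields value %q", enableAPIFields)
	}
	if have < need {
		return fmt.Errorf("%s requires enable-api-fields to be at least %q, but it is set to %q",
			feature, featureStability, enableAPIFields)
	}
	return nil
}

func main() {
	// A beta feature such as resolvers is rejected at validation time under "stable".
	fmt.Println(ValidateFeature("resolvers", Beta, Stable))
	// The same feature passes under the default value "beta".
	fmt.Println(ValidateFeature("resolvers", Beta, Beta))
}
```

With a check of this shape, a resource using resolvers while `enable-api-fields` is `stable` fails validation up front rather than failing later in the controller.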
+ +Note that although this is a behavior change, it is more of a bug fix for the coupling of feature and API versioning. Currently, with `enable-api-fields` set to `stable`, PipelineRuns like [this one](https://github.com/tektoncd/pipeline/blob/main/examples/v1/pipelineruns/beta/git-resolver.yaml) fail because the controller cannot create child TaskRuns. This change will result in a validation failure instead. + +However, the default value of `enable-api-fields` will **not** be changed and will remain `beta`, so the **beta** features will continue to be enabled by default. + +- **Impacts on users:** + - Cluster operators: + - Current cluster operators with `enable-api-fields` set to `alpha` or `beta` should not experience any changes. + - This makes it possible for cluster operators to have full control over only stable feature usages, rather than user overrides. For example, currently if cluster operators want their users to only opt in to “stable” features, they cannot do so for the v1beta1 apiVersion because of the exceptions for resolvers, object params and results. + - Pipeline and Task authors: + This affects pipeline users in the situation where cluster operators have chosen to set `enable-api-fields` to `stable` (i.e. those who have changed the current default `beta` value) and who are accidentally using beta features, like resolvers, in v1beta1. + +### Per feature flag for _new_ api-driven features +Introduce per-feature flags for each **new** API driven feature. Each feature will have its own flag, instead of using the group API driven flag `enable-api-fields` to enable or disable all features of a stability level. Note that our proposal only takes effect on api-driven features while behavioural flags will retain their existing behaviour. + +The flag will also include the maintainer information and stability level for the feature as the source of truth. New behavioural features will also have a new per-feature flag to either enable or disable the feature. 
This will allow behavioural features that have values leading to behaviors in different stability levels to be turned on or off instead of depending on the stability level of features that are enabled. + +For example: + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: feature-flags +data: + # alpha: v0.53 + # + # Pipeline in pipeline has ... functionalities. + # It is disabled by default. + enable-pipeline-in-pipeline: "false" + # alpha: v0.53 + # + # Trusted artifacts has ... functionalities. + # It is disabled by default. + enable-trusted-artifacts: "true" +``` + +See [implementation plan](#implementation-plan) for more details on the `FeatureFlag` struct. + +All **new** features can only be enabled via per-feature flags. When they first get introduced as alpha, they will be disabled by default. When new features get promoted to stable, they will be enabled by default according to the following table: + +| Feature stability level | Default | +| ----------------------- | -------- | +| Stable | Enabled; Cannot be disabled | +| Beta | Disabled | +| Alpha | Disabled | + +Note that once a per-feature flag has stabilized, the feature cannot be disabled, and we will remove the feature flag 3 releases after it has become stable. Notification will be given to users via release notes. + +The behaviour of the existing `enable-api-fields` flag alongside per-feature flags: +- Any current beta features can be enabled with `enable-api-fields` set to “beta” or “alpha”. +- Any current alpha features can be enabled with `enable-api-fields` set to “alpha”. When current alpha features are promoted to beta, they can be enabled with `enable-api-fields` set to `beta` or `alpha`. +- It will not be possible to enable existing features using per-feature flags. + - We cannot enable existing features using per-feature flags because this would not be backwards compatible. 
If we allowed this, the individual flag would have precedence over the grouped flag, which means that to preserve backwards compatibility, the individual flag would need to be on by default for beta features and off by default for alpha features. However, this would not be backwards compatible for cluster operators who set `enable-api-fields` to `stable`, since they would also need to override the new `beta` level per feature flags. + +- **Cluster operators perspective:** For new features, cluster operators will explicitly turn each feature on or off in the ConfigMap. They will be able to choose to turn on a single feature. See future work for how cluster operators would communicate with their users on the list of features enabled. + +- **Task and Pipeline authors:** Tekton Pipeline and Task authors will get to know the list of features that are turned on from their service providers. See [future work](#future-work) for more details on better communication from cluster operators to users. + +### Sunset `enable-api-fields` after existing features stabilize +When all existing alpha and beta features have either been stabilized or removed, we will be able to remove the `enable-api-fields` flag. 
+ +Snapshot of existent beta and alpha features as of today: +| Feature | Stability level | Individual flag | +| ----------------------------------------------------------------------------------------------------- | --------------- | ----------------------------------------------- | +| [Array Results and Array Indexing](pipelineruns.md#specifying-parameters) | beta | | +| [Object Parameters and Results](pipelineruns.md#specifying-parameters) | beta | | +| [Remote Tasks](./taskruns.md#remote-tasks) and [Remote Pipelines](./pipelineruns.md#remote-pipelines) | beta | | +| [`Provenance` field in Status](pipeline-api.md#provenance) | beta | `enable-provenance-in-status` | +| [Isolated `Step` & `Sidecar` `Workspaces`](./workspaces.md#isolated-workspaces) | beta | | +| [Bundles ](./pipelineruns.md#tekton-bundles) | alpha | `enable-tekton-oci-bundles` | +| [Hermetic Execution Mode](./hermetic.md) | alpha | | +| [Windows Scripts](./tasks.md#windows-scripts) | alpha | | +| [Debug](./debug.md) | alpha | | +| [Step and Sidecar Overrides](./taskruns.md#overriding-task-steps-and-sidecars) | alpha | | +| [Matrix](./matrix.md) | alpha | | +| [Task-level Resource Requirements](compute-resources.md#task-level-compute-resources-configuration) | alpha | | +| [Trusted Resources](./trusted-resources.md) | alpha | `trusted-resources-verification-no-match-policy`| +| [Larger Results via Sidecar Logs](#enabling-larger-results-using-sidecar-logs) | alpha | `results-from` | +| [Configure Default Resolver](./resolution.md#configuring-built-in-resolvers) | alpha | | +| [Coschedule](./affinityassistants.md) | alpha | `coschedule` | + +#### Example of introducing new features: +**i.** A single feature "pipeline-in-pipeline" being introduced in `v1`: + We will add a new feature flag "enable-pipeline-in-pipeline" to the configMap, it will have the `alpha` stability level as a new feature and will be disabled by default. 
+ - Cluster operators will now be able to enable or disable feature "pipeline-in-pipeline". + - Tekton Pipeline authors will be informed by the cluster operators on whether the new feature is turned on or off. + +**ii.** Two more features "trusted-artifacts" and "cloud-event-controller" are introduced while the feature introduced in step i remains alpha: + Regardless of the alpha "pipeline-in-pipeline" feature, we are going to add two more feature flags, one for "trusted-artifacts" and one for "cloud-event-controller". + - Cluster operators will have to make the choice of turning on or off two more feature flags in addition to the one introduced in step i. + - Tekton Pipeline authors will be informed by the cluster operators on whether the new features are turned on or off. + +## Design Evaluation + +#### Pros +- Migrates Tekton to a state where only stable features are enabled by default in a backwards compatible way. +- Cluster operators can have more granular control over features to be turned on. Previously they could only have features of a certain stability level all on or off, but now they can enable individual alpha or beta features controlled by API fields. +- Unblocks the [internal version work](https://docs.google.com/document/d/1wXQaiay18hlcuxOvl5T3BZiyOFSLN9hsr4rkhOwCz6I/edit#bookmark=id.v5nbt7ga5jny) where the validations in the internal version do not depend on apiVersions, which requires the decoupling of feature and API versioning. +- Improves consistency among existing features enabled with `enable-api-fields` set to `beta`, since the existing beta feature isolated step and sidecar workspaces (since v0.50) is validated differently. +- The validations for per-feature flags will have a clear source of truth for feature levels and traceability, and there will not be coupling in the future. + +#### Cons +- This is adding complexity to both the implementations and the testing matrix for the newly introduced flag. 
+ +## Alternatives +### Fix validation and migrate `enable-api-fields` back to `stable` +The following alternatives all propose to fix validation and migrate `enable-api-fields` back to `stable`. They differ in the details of how the new value for `enable-api-fields` will be named, or whether a new flag will be introduced. + +#### Variant i. Introduce `new-stable` value for `enable-api-fields`; migrate `enable-api-fields` to `stable` in 9 months + +- A new option, `enable-api-fields` = `new-stable`, will be added to the API. This option will use the preferred validation where `stable` and `beta` features are validated the same across apiVersions. In 9 months, `new-stable` will be renamed as `stable`. +- The current behavior for `stable` `enable-api-fields` remains the same for now where it allows users to use `beta` features in `v1beta1` that have coupled feature and API versioning in `v1beta1` i.e. remote resolution. In 9 months `stable` will be renamed as `legacy-stable` and be marked as deprecated. +- The existing `beta` option for `enable-api-fields` will remain the same for now and in 9 months. +- Taking the remote resolution feature, which currently couples feature and API versioning, as an example, after the change: + - With `new-stable` in `v1beta1`, we disable resolver. + - With `legacy-stable` in `v1beta1`, we do not validate resolvers as a field that requires `enable-api-fields` to be `beta`, so they are still turned on by default. + - When `enable-api-fields` is set to `beta` in both `v1` and `v1beta1`, we are turning on resolver. + +#### Variant ii: Make `beta` feature validation changes now; migrate `enable-api-fields` to `stable` in 9 months + +Require `enable-api-fields` to be set to `beta` when using beta features, regardless of CRD API version. Immediately make this change for existing beta features. 
This will immediately resolve the unintended behavior where TaskRuns cannot be created for version-coupled features with `enable-api-fields` set to `stable` during the `v1` storage swap. + + +#### Variant iii. New `legacy-stable` value for `enable-api-fields`; migrate `enable-api-fields` to `stable` in 9 months +Change the validation for beta features when `enable-api-fields` is set to `stable` to validate only stable features across apiVersions. Add a `legacy-stable` value to `enable-api-fields` to keep the behavior of the current `stable` feature validations in `v1beta1`, which includes the coupled features. + +In the long run, the default value of `enable-api-fields` will be switched back to `stable` in 9 months; the `beta` option will remain the same and `legacy-stable` will be removed. + +#### Cons +- The main reason that those variants are rejected is that updating existing beta feature validations in `v1beta1` is a backwards incompatible change. This will break users who are accidentally using the coupled beta features while `enable-api-fields` is set to stable. +- The migration of `enable-api-fields` back to `stable` is backwards incompatible. + +### New `enable-api-fields-new` flag; wait for all existing beta/alpha features to stabilize +This alternative proposes introducing a new flag `enable-api-fields-new` that validates new alpha and beta features and leaves the existing flag `enable-api-fields` as is for existing alpha/beta features. Once all existing beta/alpha features become stable or v1beta1 apiVersion is removed, we could remove the existing `enable-api-fields`. + +#### Pros +- This has few impacts on users who are currently using `enable-api-fields=stable` for v1beta1. + +#### Cons +- This could add confusion for users about newly promoted alpha or beta features. 
+ +### New `legacy-enable-beta-features-by-default` flag +This alternative proposes introducing a new flag `legacy-enable-beta-features-by-default` that takes a boolean value and would be phased out in 9 months. The existing `enable-api-fields` will continue applying to existing beta features when the flag `legacy-enable-beta-features-by-default` is `true`. + +This chart would apply to the existing beta features (array results, array indexing, object params and results, and remote resolution): + +| enable-api-fields | legacy-enable-beta-features-by-default | enabled in v1beta1? | enabled in v1? | +| ------ | ------ | --- | --- | +| beta | true | yes | yes | +| beta | false | yes | yes | +| stable | true | yes | no | +| stable | false | no | no | + +For new beta features (e.g. matrix in the future): + +| enable-api-fields | legacy-enable-beta-features-by-default | enabled in v1beta1? | enabled in v1? | +| ------ | ------ | --- | --- | +| beta | true | yes | yes | +| beta | false | yes | yes | +| stable | true | no | no | +| stable | false | no | no | + +Once all existing beta features become stable, we will deprecate and then remove `legacy-enable-beta-features-by-default` and use `enable-api-fields` set to `stable`. We would default the new flag to `true` and, after 9 months, default `enable-api-fields` to `stable` to preserve the existing behavior. + +#### Pros +- This will provide a smoother transition for switching default `enable-api-fields` value back to `stable`. + +#### Cons +- This is a backwards incompatible change. +- It is not clear what to do with alpha features for the feature promotion process when there are two `enable-api-fields` related flags, which might lead to confusion. +- This is adding complexity to both the implementations and the testing matrix for the newly introduced flag. 
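The two charts for this alternative can be condensed into a single predicate. The Go sketch below is illustrative only: `betaFeatureEnabled` and its parameters are hypothetical names, and the `alpha` case is an assumption carried over from the proposal's statement that beta features can be enabled with `enable-api-fields` set to `beta` or `alpha` (the charts themselves only list `beta` and `stable`).

```go
package main

import "fmt"

// betaFeatureEnabled encodes the two charts of this alternative.
// existing is true for the currently coupled beta features
// (array results, array indexing, object params and results, remote resolution)
// and false for beta features promoted later.
// All names here are hypothetical; this alternative was not adopted.
func betaFeatureEnabled(enableAPIFields string, legacyDefault bool, apiVersion string, existing bool) bool {
	switch enableAPIFields {
	case "alpha", "beta":
		// Assumption: "alpha" also enables beta features, as in the proposal.
		return true
	case "stable":
		// Only the legacy carve-out keeps existing beta features on in v1beta1.
		return existing && legacyDefault && apiVersion == "v1beta1"
	default:
		return false
	}
}

func main() {
	for _, fields := range []string{"beta", "stable"} {
		for _, legacy := range []bool{true, false} {
			fmt.Printf("%-6s legacy=%-5t existing: v1beta1=%t v1=%t | new: v1beta1=%t v1=%t\n",
				fields, legacy,
				betaFeatureEnabled(fields, legacy, "v1beta1", true),
				betaFeatureEnabled(fields, legacy, "v1", true),
				betaFeatureEnabled(fields, legacy, "v1beta1", false),
				betaFeatureEnabled(fields, legacy, "v1", false))
		}
	}
}
```

Written this way, the only row where API version matters is `stable` with the legacy flag set to `true`, which is exactly the coupling the alternative would have preserved temporarily.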
+ + +### Make validation changes only for new beta features +This alternative proposes keeping beta features on by default. It will keep the current implementations and validations for beta features in `v1` and `v1beta1` apiVersion and only make new features have the synchronized validations across different apiVersions. This also proposes to keep the default value of `enable-api-fields` as `beta`. + +#### Pros +- This is backwards compatible. +- Changes are minimal for the current codebase with probably only documentation required. +#### Cons +- We would still need to use `beta` as the default stability of features that are turned on. Some beta features are still coupled in feature and API versioning. This will not allow cluster operators to opt in to only stable features in v1beta1. + +### Give 9-month warning before breaking changes and default to stable +This alternative proposes making the validation change for stable features in v1beta1 after giving 9 months' notice of the breaking change. + +#### Cons: +- Since v1beta1 only has 12 months of support period left, making a breaking change to it after 9 months might have more impact on v1beta1 users than it's worth. + +## Implementation Plan + +For each new feature driven by api fields, a new feature flag using the FeatureFlag struct will be added to the existing set of feature flags. This struct will include the stability of the feature and whether it is turned on or locked to default. Currently it will only work for new features without `enable-api-fields`; see future work for more details of the option to introduce `enable-api-fields=none` to use per feature flags for new features. 
+ + +```go +type FeatureFlag struct { + // Name of the feature flag + Name string + + // Stability level of the feature, one of "stable", "beta" or "alpha" + Stability string + + // Enabled is whether the feature is turned on + Enabled bool + + // LockToDefault indicates whether the feature is locked to default value and cannot be changed + LockToDefault bool + + // Deprecated is whether the feature is deprecated + // +optional + Deprecated bool +} +``` + +## Test Plan +### Retain existing CI end-to-end testing matrix +We plan to retain the existing testing matrix for fields at `alpha`, `beta` and `stable`. With the addition of per-feature flags for new features, it would look like: + +| |enable-api-fields|Per feature flags|Integration tests| +| -- | -- | -- | -- | +|Opt-in stable| stable|Turn OFF all per feature flags|Run all stable e2e tests.| +|Opt-in beta| beta|Turn ON all beta per-feature flags|Run all beta e2e tests.| +|Opt-in alpha| alpha|Turn ON all per-feature flags (alpha and beta)|Run all alpha e2e tests.| + + +## Additional CI tests +### Additional Test Combinations +We cannot test 2\*\*N combinations of per feature flags since that would be too time consuming. +Therefore, we try to mimic the following scenarios. + +- Say, a cluster operator has set the cluster to run Stable features. Occasionally, they may want to turn on a feature that is not yet stable. +- Similarly if the cluster is running Beta features by default (i.e. all feature flags at a beta stability level are ON and `enable-api-fields` is set to “beta”), cluster operators may want to turn on an individual feature at an alpha level. +- Conversely, if the cluster is running all the features (all feature flags are ON and `enable-api-fields` is set to “alpha”), they may want to turn off individual features. + + + +||enable-api-fields|Per feature flags|Integration tests| +| :- | :- | :- | :- | +|Opt-in stable|Set to stable|
All feature flags are OFF by default.
Turn ON one feature flag at a time.
|Run a small number of e2e tests against N combinations. It is not feasible to run the entire e2e test suite against N combinations.| +|Opt-in beta|Set to beta|Turn ON all beta per-feature flags by default.