From 4b41175ea63be6147b1022cf37dcbb8352c4095e Mon Sep 17 00:00:00 2001
From: Jerome Ju
Date: Mon, 25 Sep 2023 15:35:13 -0400
Subject: [PATCH] TEP-0138: Decouple API and Feature Versioning

This commit adds the proposals for decoupling api and feature versioning.
This TEP is marked as 'implementable'.
---
 teps/0033-tekton-feature-gates.md             |   2 +
 ...138-decouple-api-and-feature-versioning.md | 363 +++++++++++++++++-
 teps/README.md                                |   2 +-
 3 files changed, 356 insertions(+), 11 deletions(-)

diff --git a/teps/0033-tekton-feature-gates.md b/teps/0033-tekton-feature-gates.md
index 7ebd4656d..64a58f544 100644
--- a/teps/0033-tekton-feature-gates.md
+++ b/teps/0033-tekton-feature-gates.md
@@ -1,5 +1,7 @@
 ---
 status: implemented
+superseded-by:
+- TEP-0138
 title: Tekton Feature Gates
 creation-date: '2020-11-20'
 last-updated: '2021-12-16'
diff --git a/teps/0138-decouple-api-and-feature-versioning.md b/teps/0138-decouple-api-and-feature-versioning.md
index 055f7bca6..8a302a54d 100644
--- a/teps/0138-decouple-api-and-feature-versioning.md
+++ b/teps/0138-decouple-api-and-feature-versioning.md
@@ -1,8 +1,8 @@
 ---
-status: proposed
-title: Decouple api and feature versioning
+status: implementable
+title: Decouple API and feature versioning
 creation-date: '2023-07-07'
-last-updated: '2023-07-27'
+last-updated: '2023-08-23'
 authors:
 - '@JeromeJu'
 - '@chitrangpatel'
@@ -17,6 +17,25 @@ authors:
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Use Cases](#use-cases)
+- [Requirements](#requirements)
+- [Proposal](#proposal)
+- [Design Details](#design-details)
+  - [Change existing validation to decouple feature and API versioning](#change-existing-validation-to-decouple-feature-and-api-versioning)
+  - [Per feature flag for new api-driven features](#per-feature-flag-for-new-api-driven-features)
+  - [Sunset `enable-api-fields` after existing features stabilize](#sunset-enable-api-fields-after-existing-features-stabilize)
+- [Design Evaluation](#design-evaluation)
+  - [Pros and cons](#pros-and-cons)
+- [Alternatives](#alternatives)
+  - [Per feature flag with new value for enable-api-fields `none`](#per-feature-flag-with-new-value-for-enable-api-fields-none)
+  - [New `legacy-enable-beta-features-by-default` flag](#new-legacy-enable-beta-features-by-default-flag)
+  - [Make `beta` feature validation changes now](#make-beta-feature-validation-changes-now-migrate-enable-api-fields-to-stable-in-9-months)
+  - [New `legacy-stable` value for `enable-api-fields`](#new-legacy-stable-value-for-enable-api-fields-migrate-enable-api-fields-to-stable-in-9-months)
+  - [Make validation changes only for new beta features](#make-validation-changes-only-for-new-beta-features)
+  - [Give 9-month warning before making v1beta1 validation changes and default to stable](#give-9-month-warning-before-making-v1beta1-validation-changes-and-default-to-stable)
+  - [Wait until v1beta1 is removed to swap `enable-api-fields` back to `stable`](#wait-until-v1beta1-is-removed-to-swap-enable-api-fields-back-to-stable)
+- [Implementation Plan](#implementation-plan)
+  - [Test Plan](#test-plan)
+- [Future Work](#future-work)
 - [References](#references)

@@ -26,7 +45,7 @@ This document proposes updating Tekton Pipelines' feature flags design, as origi

 ## Motivation

-Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In `v1beta1 CRDs`, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to "`stable`", but in `v1` CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. This couples API versioning to feature stability, and has led to the following pain points:
+Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In `v1beta1` CRDs, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to "`stable`", but in `v1` CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. This couples API versioning to feature stability, and has led to the following pain points:

 - [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from `v1beta1` to `v1` were confused to find `beta` features that worked by default in `v1beta1` did not work by default in `v1` when `enable-api-fields` was set to "`stable`" (its default value). This is especially confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, especially if they are not aware they are using `beta` features.

@@ -34,19 +53,19 @@ Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-

 - When promoting features, it could cause confusions for contributors to be dependent on the fact whether an apiVersion is available. For example, during [the promotion to beta for projected workspaces](https://github.com/tektoncd/pipeline/pull/5530), the `v1` api's existence led to confusions of what to do with `beta` features in `v1beta1` and its difference with in `v1`.

-### Goals
+## Goals

-- Feature validations and implementation should be independant from any API version.
-- Come up with a backward-compatible migration plan for setting the `enable-api-fields` feature flag to `stable` in the long term.
-- Changes and updates made to the existent feature validaitons regarding decoupling api and feature versioning should keep much backwards compatiblity as possible.
+- Feature validations and implementation should be independent from any API version.
+- Come up with a plan that makes the migration easier for setting feature flags to enable stable features only by default in the long term.
+- Changes and updates made to the existing feature validations regarding decoupling API and feature versioning should keep as much backwards compatibility as possible.

-### Non goals
+## Non goals

 - Better guidance on feature promotion and when features can be promoted
   - This is a nice-to-have but not necessarily a blocker, since the feature graduating process should not affect the implementation of how features are enabled.
 - Ensure pending resources don't break with changing feature flags on downgrades or upgrades
   - As [handling backwards incompatible changes for pending resources](https://github.com/tektoncd/pipeline/issues/6479) pointed out, we have run into the cases where [feature flag info are changed or lost](https://github.com/tektoncd/pipeline/issues/5999) when handling deprecated fields which led the pending resources to break. However, this issue was introduced by the implementation of feature flags rather than its design, and can be addressed separately.
-  - Users can downgrade their pipeline versions without invalidating stored resources, even if stored resources cannot be run with the downgraded server. Keeping the stored resources valid relates with the storage migration instead of our feature flags implementations, which has been covered in [Storage version migrator v1beta1 -> v1](https://github.com/tektoncd/pipeline/issues/6667) and is out of scope.
+  - Users can downgrade their pipeline versions without invalidating stored resources, even if stored resources cannot be run with the downgraded server. Keeping the stored resources valid relates with the storage migration instead of our feature flags implementations, which has been covered in [Storage version migrator `v1beta1` -> `v1`](https://github.com/tektoncd/pipeline/issues/6667) and is out of scope.
 ### Use Cases

@@ -72,8 +91,332 @@ Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-
 **Tekton Maintainers**
 - I would like to be able to migrate the apiVersion without having to make backwards incompatible changes.

+## Requirements
+- Avoid changing existing features' stability levels solely to address the coupling of feature and API versioning.
+- Avoid blocking the promotion of existing `alpha` features to `beta`.
+- Have a testing strategy that gives us confidence in our implementation of per-feature flags and in the changes to existing feature flags.
+
+## Proposal
+
+This TEP provides a plan for ensuring that feature stability doesn't depend on CRD API version, and for using per-feature flags to move to a future where only stable features are enabled by default, through the following steps:
+- [Change existing validation to decouple feature and API versioning](#change-existing-validation-to-decouple-feature-and-api-versioning)
+- [Per feature flag for new api-driven features](#per-feature-flag-for-new-api-driven-features)
+- [Sunset `enable-api-fields` after existing features stabilize](#sunset-enable-api-fields-after-existing-features-stabilize)
+
+
+## Design Details
+
+### Change existing validation to decouple feature and API versioning
+We will change the current validation for `enable-api-fields=stable` to only allow using `stable` features regardless of API version, for both supported versions `v1beta1` and `v1`. This will resolve the current coupling of API and feature versioning in `v1beta1`. More specifically, [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) such as resolvers, object/array params and results will require `enable-api-fields` set to `beta` to be used with the `v1beta1` API. This means that the current [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) (resolvers, object/array params and results) will no longer be enabled with `enable-api-fields=stable`.
+To continue using these beta features, users will need to explicitly set the `enable-api-fields` flag to `beta`. This change does not affect pipeline deployments with `enable-api-fields` set to `beta`, which is the default configuration and will continue to be. It does affect cluster operators who would like to enable only `stable` features. Those cluster operators will have to either opt out of these [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features), such as resolvers, or change the configuration of the pipeline deployment such that `enable-api-fields` is set to `stable` for their deployments.
+
+Note that although this looks like a behavior change, it is actually a bug fix. Currently, with `enable-api-fields` set to `stable`, PipelineRuns like [this one](https://github.com/tektoncd/pipeline/blob/main/examples/v1/pipelineruns/beta/git-resolver.yaml) fail because the controller cannot create child TaskRuns. This change will result in a validation failure instead, and this `pipelineSpec` will be prohibited with `enable-api-fields=stable`.
+
+However, the [default](https://github.com/tektoncd/pipeline/blob/main/config/config-feature-flags.yaml#L89) value of `enable-api-fields` continues to be `beta`. The default deployment of Tekton Pipelines comes with the current [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) enabled.
+
+- **Impacts on users:**
+  - Cluster Operators:
+    - No action needed from the cluster operator. The existing deployments with `enable-api-fields` set to `alpha` or `beta` should not experience any changes.
+    - This makes it possible for cluster operators to have full control over their deployments. The cluster operator can guarantee that only `stable` features are enabled and prevent users from bypassing this enforcement. For example, it is currently not possible for cluster operators to enable only `stable` features for `v1beta1`, because a list of beta features such as resolvers, object params and results is an exception.
+  - Pipeline and Task Authors:
+    - The same `pipeline` and `task` definitions might not be applied to a cluster with `enable-api-fields` set to `stable` after this change is implemented.
+    - The `pipeline` and `task` definitions in `v1beta1` implementing [beta features](https://github.com/tektoncd/pipeline/blob/main/docs/additional-configs.md#beta-features) will not be applied to a cluster with `enable-api-fields` set to `stable`. Such a pipeline, with beta features used in the `v1beta1` apiVersion, would not result in a successful pipelineRun anyway because of [the validation after the conversion to `v1` as the storage version](https://github.com/tektoncd/pipeline/blob/d9d2d1760fa534e2dfb16ca656cdf2d293a5900e/pkg/apis/pipeline/v1/taskref_validation.go#L39). Since, in this example, resolver is still a beta feature, this pipeline should not have been applied in the first place to a cluster that has not set `enable-api-fields=beta`.
+
+### Per feature flag for _new_ api-driven features
+Introduce per-feature flags for each **new** API-driven feature. Each feature will have its own flag, instead of using the group API-driven flag `enable-api-fields` to enable or disable all features of a stability level. Note that our proposal only takes effect on API-driven features, while behavioural flags keep their existing behaviour.
+
+The flag will also include the maintainer information and stability level for the feature as the source of truth. New behavioural features will also have a new per-feature flag to either enable or disable the feature. This will allow behavioural features whose values lead to behaviors at different stability levels to be turned on or off individually, instead of depending on the stability level of the features that are enabled.
+
+For example:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: feature-flags
+data:
+  # alpha: v0.53
+  #
+  # Pipeline in pipeline has ... functionalities.
+  # It is disabled by default.
+  enable-pipeline-in-pipeline: "false"
+  # alpha: v0.53
+  #
+  # Trusted artifacts has ... functionalities.
+  # It is disabled by default. Set it to true to enable this feature.
+  enable-trusted-artifacts: "true"
+```
+
+See [implementation plan](#implementation-plan) for more details on the PerFeatureFlag struct.
+
+All **new** features can only be enabled via per-feature flags. When they first get introduced as alpha, they will be disabled by default. When new features get promoted to stable, they will be enabled by default according to the following table:
+
+| Feature stability level | Default |
+| ----------------------- | -------- |
+| Stable | Enabled; Cannot be disabled |
+| Beta | Disabled |
+| Alpha | Disabled |
+
+Note that once a feature has stabilized, its per-feature flag cannot be disabled, and we will remove the flag 3 releases after the feature became stable. Users will be notified via release notes.
+
+The behaviour of the existing `enable-api-fields` flag alongside per-feature flags:
+- Any current beta features can be enabled with `enable-api-fields` set to `beta` or `alpha`.
+- Any current alpha features can be enabled with `enable-api-fields` set to `alpha`. When current alpha features are promoted to beta, they can be enabled with `enable-api-fields` set to `beta` or `alpha`.
+- The existing features are not adopting this new mechanism of having one flag per feature. It will not be possible to enable existing features using per-feature flags.
+  - We cannot enable existing features using per-feature flags because this would not be backwards compatible. If we allowed this, the individual flag would have precedence over the grouped flag, which means that to preserve backwards compatibility, the individual flag would need to be on by default for beta features and off by default for alpha features. However, this would not be backwards compatible for cluster operators who set `enable-api-fields` to `stable`, since they would also need to override the new `beta` level per-feature flags.
+
+- **Cluster Operators:** For new features, cluster operators will explicitly turn each feature on or off in the ConfigMap. They will be able to turn on a single feature. See [future work](#future-work) for how cluster operators would communicate the list of enabled features to their users.
+
+- **Task and Pipeline Authors:** Tekton Pipeline and Task authors will learn which features are turned on from their service providers. See [future work](#future-work) for more details on better communication from cluster operators to users.
+
+### Sunset `enable-api-fields` after existing features stabilize
+When all existing alpha and beta features have either been stabilized or removed, we will be able to remove the `enable-api-fields` flag.
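The gating rules described above can be sketched in a few lines of Go. This is only an illustration, not the Tekton implementation: the function names `existingFeatureEnabled` and `newFeatureEnabled`, the `level` map, and the use of a plain `map[string]string` in place of the `feature-flags` ConfigMap data are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// level orders stability levels from most to least stable.
// Hypothetical helper, used only by this sketch.
var level = map[string]int{"stable": 0, "beta": 1, "alpha": 2}

// existingFeatureEnabled models an existing feature gated by enable-api-fields.
// The CRD API version is deliberately not a parameter: after decoupling,
// the answer is the same for v1beta1 and v1.
func existingFeatureEnabled(stability, enableAPIFields string) bool {
	feature, okFeature := level[stability]
	allowed, okAllowed := level[enableAPIFields]
	return okFeature && okAllowed && feature <= allowed
}

// newFeatureEnabled models a new feature gated by its own per-feature flag.
// data stands in for the data section of the feature-flags ConfigMap.
func newFeatureEnabled(flagName, stability string, data map[string]string) bool {
	if stability == "stable" {
		return true // stable features are enabled and cannot be disabled
	}
	enabled, err := strconv.ParseBool(data[flagName])
	return err == nil && enabled // alpha and beta stay off unless set to "true"
}

func main() {
	data := map[string]string{
		"enable-pipeline-in-pipeline": "false",
		"enable-trusted-artifacts":    "true",
	}
	fmt.Println(existingFeatureEnabled("beta", "stable"))
	fmt.Println(existingFeatureEnabled("beta", "alpha"))
	fmt.Println(newFeatureEnabled("enable-pipeline-in-pipeline", "alpha", data))
	fmt.Println(newFeatureEnabled("enable-trusted-artifacts", "alpha", data))
}
```

Keeping the two functions separate mirrors the proposal: existing features never consult a per-feature flag, and new features never consult `enable-api-fields`.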
+
+Snapshot of existing beta and alpha features as of today:
+
+| Feature | Stability level | Individual flag |
+| ------- | --------------- | --------------- |
+| [Array Results and Array Indexing](pipelineruns.md#specifying-parameters) | beta | |
+| [Object Parameters and Results](pipelineruns.md#specifying-parameters) | beta | |
+| [Remote Tasks](./taskruns.md#remote-tasks) and [Remote Pipelines](./pipelineruns.md#remote-pipelines) | beta | |
+| [`Provenance` field in Status](pipeline-api.md#provenance) | beta | `enable-provenance-in-status` |
+| [Isolated `Step` & `Sidecar` `Workspaces`](./workspaces.md#isolated-workspaces) | beta | |
+| [Bundles](./pipelineruns.md#tekton-bundles) | alpha | `enable-tekton-oci-bundles` |
+| [Hermetic Execution Mode](./hermetic.md) | alpha | |
+| [Windows Scripts](./tasks.md#windows-scripts) | alpha | |
+| [Debug](./debug.md) | alpha | |
+| [Step and Sidecar Overrides](./taskruns.md#overriding-task-steps-and-sidecars) | alpha | |
+| [Matrix](./matrix.md) | alpha | |
+| [Task-level Resource Requirements](compute-resources.md#task-level-compute-resources-configuration) | alpha | |
+| [Trusted Resources](./trusted-resources.md) | alpha | `trusted-resources-verification-no-match-policy` |
+| [Larger Results via Sidecar Logs](#enabling-larger-results-using-sidecar-logs) | alpha | `results-from` |
+| [Configure Default Resolver](./resolution.md#configuring-built-in-resolvers) | alpha | |
+| [Coschedule](./affinityassistants.md) | alpha | `coschedule` |
+
+#### Example of introducing new features:
+**i.** A single feature "pipeline-in-pipeline" being introduced in `v1`:
+  We will add a new feature flag "enable-pipeline-in-pipeline" to the configMap. As a new feature, it will have the `alpha` stability level and will be disabled by default.
+  - Cluster operators will now be able to enable or disable the feature "pipeline-in-pipeline".
+  - Tekton Pipeline authors will be informed by the cluster operators on whether the new feature is turned on or off.
+
+**ii.** Two more features "trusted-artifacts" and "cloud-event-controller" are introduced while the feature introduced in step i remains alpha:
+  Regardless of the alpha "pipeline-in-pipeline" feature, we are going to add two more feature flags, one for "trusted-artifacts" and one for "cloud-event-controller".
+  - Cluster operators will have to choose whether to turn on or off two more feature flags, in addition to the one introduced in step i.
+  - Tekton Pipeline authors will be informed by the cluster operators on whether the new features are turned on or off.
+
+## Design Evaluation
+
+#### Pros
+- Migrates Tekton to a state where only stable features are enabled by default in a backwards compatible way.
+- Cluster operators have more granular control over which features are turned on. Previously they could only turn all features of a certain stability level on or off, but now they can enable individual alpha or beta features controlled by API fields.
+- Unblocks the [internal version work](https://docs.google.com/document/d/1wXQaiay18hlcuxOvl5T3BZiyOFSLN9hsr4rkhOwCz6I/edit#bookmark=id.v5nbt7ga5jny), where the validations in the internal version do not depend on apiVersions, which requires the decoupling of feature and API versioning.
+- Improves consistency among existing features enabled with `enable-api-fields` set to `beta`, since the existing beta feature isolated step and sidecar workspaces (since v0.50) is validated differently.
+- The validations for per-feature flags will have a clear source of truth for feature levels and traceability, and there will be no coupling in the future.
+
+#### Cons
+- This is adding complexity to both the implementations and the testing matrix for the newly introduced flag.
+
+## Alternatives
+### Fix validation and migrate `enable-api-fields` back to `stable`
+The following alternatives all propose to fix validation and migrate `enable-api-fields` back to `stable`. They differ in how the new value for `enable-api-fields` will be named, or whether a new flag will be introduced.
+
+#### Variant i. Introduce `new-stable` value for `enable-api-fields`; migrate `enable-api-fields` to `stable` in 9 months
+
+- A new option, `enable-api-fields` = `new-stable`, will be added to the API. This option will use the preferred validation where `stable` and `beta` features are validated the same across apiVersions. In 9 months, `new-stable` will be renamed as `stable`.
+- The current behavior for `enable-api-fields` = `stable` remains the same for now: it allows users to use `beta` features whose feature and API versioning are coupled in `v1beta1`, e.g. remote resolution. In 9 months `stable` will be renamed as `legacy-stable` and be marked as deprecated.
+- The existing `beta` option for `enable-api-fields` will remain the same for now and in 9 months.
+- Taking the remote resolution feature, which currently couples feature and API versioning, as an example, after the change:
+  - With `new-stable` in `v1beta1`, we disable resolver.
+  - With `legacy-stable` in `v1beta1`, we do not validate that the resolver field needs `enable-api-fields` set to `beta`, so resolvers are still turned on by default.
+  - When `enable-api-fields` is set to `beta` in both `v1` and `v1beta1`, we are turning on resolver.
+
+#### Variant ii: Make `beta` feature validation changes now; migrate `enable-api-fields` to `stable` in 9 months
+
+Require `enable-api-fields` to be set to `beta` when using beta features, regardless of CRD API version. Immediately make this change for existing beta features. This will immediately solve the unintended behavior that taskruns cannot be created for version-coupled features with `enable-api-fields` set to `stable` during the `v1` storage swap.
+
+
+#### Variant iii. New `legacy-stable` value for `enable-api-fields`; migrate `enable-api-fields` to `stable` in 9 months
+Change the validation for beta features when `enable-api-fields` is set to `stable` to validate only stable features across apiVersions. Add a `legacy-stable` value to `enable-api-fields` to keep the behavior of the current `stable` feature validations in `v1beta1`, which includes the coupled features.
+
+In the long run, the default value of `enable-api-fields` will be switched back to `stable` in 9 months; the `beta` option will remain the same, and `legacy-stable` will be removed.
+
+#### Cons
+- The main reason these variants are rejected is that updating the validations of existing beta features in `v1beta1` is a backwards incompatible change. This will break users who are accidentally using the coupled beta features while `enable-api-fields` is set to `stable`.
+- The migration of `enable-api-fields` back to `stable` is backwards incompatible.
+
+### New `enable-api-fields-new` flag; wait for all existing beta/alpha features to stabilize
+This alternative proposes introducing a new flag `enable-api-fields-new` that validates new alpha and beta features, and leaving the existing flag `enable-api-fields` as is for existing alpha/beta features. Once all existing beta/alpha features become stable or the `v1beta1` apiVersion is removed, we could remove the existing `enable-api-fields`.
+
+#### Pros
+- This has little impact on users who are currently using `enable-api-fields=stable` for `v1beta1`.
+
+#### Cons
+- This could add to users' confusion about newly promoted alpha or beta features.
+
+### New `legacy-enable-beta-features-by-default` flag
+This alternative proposes introducing a new flag `legacy-enable-beta-features-by-default` that takes a boolean value and would be phased out in 9 months. The existing `enable-api-fields` will continue applying to existing beta features when the flag `legacy-enable-beta-features-by-default` is `true`.
+
+This chart would apply to the existing beta features (array results, array indexing, object params and results, and remote resolution):
+
+| enable-api-fields | legacy-enable-beta-features-by-default | enabled in `v1beta1`? | enabled in `v1`? |
+| ------ | ------ | --- | --- |
+| beta | true | yes | yes |
+| beta | false | yes | yes |
+| stable | true | yes | no |
+| stable | false | no | no |
+
+For new beta features (e.g. matrix in the future):
+
+| enable-api-fields | legacy-enable-beta-features-by-default | enabled in `v1beta1`? | enabled in `v1`? |
+| ------ | ------ | --- | --- |
+| beta | true | yes | yes |
+| beta | false | yes | yes |
+| stable | true | no | no |
+| stable | false | no | no |
+
+Once all existing beta features become stable, we will deprecate and then remove `legacy-enable-beta-features-by-default` and use `enable-api-fields` set to `stable`. The new flag would default to `true` and, after 9 months, we would default `enable-api-fields` to `stable` to preserve the existing behavior.
+
+#### Pros
+- This will provide a smoother transition for switching the default `enable-api-fields` value back to `stable`.
+
+#### Cons
+- This is a backwards incompatible change.
+- It is not clear what to do with alpha features in the feature promotion process when there are two `enable-api-fields` related flags, which might lead to confusion.
+- This is adding complexity to both the implementations and the testing matrix for the newly introduced flag.
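The two charts of this alternative collapse into a single predicate, which may make the rule easier to see. The sketch below is illustrative only: the function name `betaFeatureEnabled` and its parameters are hypothetical, and the handling of `alpha` is an assumption, since the charts list only `beta` and `stable`.

```go
package main

import "fmt"

// betaFeatureEnabled reproduces the two charts of this alternative.
// existing marks a beta feature that already exists today;
// legacyDefault is the value of legacy-enable-beta-features-by-default.
func betaFeatureEnabled(enableAPIFields string, legacyDefault bool, apiVersion string, existing bool) bool {
	switch enableAPIFields {
	case "beta":
		return true
	case "alpha":
		// Assumption: not listed in the charts; treated like beta here.
		return true
	case "stable":
		// Only existing beta features in v1beta1 stay on,
		// and only while the legacy flag is true.
		return existing && legacyDefault && apiVersion == "v1beta1"
	default:
		return false
	}
}

func main() {
	for _, existing := range []bool{true, false} {
		for _, fields := range []string{"beta", "stable"} {
			for _, legacy := range []bool{true, false} {
				fmt.Printf("existing=%t enable-api-fields=%s legacy=%t v1beta1=%t v1=%t\n",
					existing, fields, legacy,
					betaFeatureEnabled(fields, legacy, "v1beta1", existing),
					betaFeatureEnabled(fields, legacy, "v1", existing))
			}
		}
	}
}
```

The only row where the API version matters is `stable` with the legacy flag set to `true` for an existing feature, which is exactly the coupling this TEP wants to remove.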
+
+
+### Make validation changes only for new beta features
+This alternative proposes keeping beta features on by default. It will keep the current implementations and validations for beta features in the `v1` and `v1beta1` apiVersions and only give new features synchronized validations across different apiVersions. It also proposes to keep the default value of `enable-api-fields` as `beta`.
+
+#### Pros
+- This is backwards compatible.
+- Changes are minimal for the current codebase, with probably only documentation required.
+#### Cons
+- We would still need to use `beta` as the default stability of features that are turned on. Some beta features are still coupled in feature and API versioning. This will not allow cluster operators to opt in to only stable features in `v1beta1`.
+
+### Give 9-month warning before breaking changes and default to stable
+This alternative proposes making the validation change for stable features in `v1beta1` after giving 9 months' notice of the breaking change.
+
+#### Cons
+- Since `v1beta1` only has 12 months of support period left, making a breaking change to it after 9 months might have more impact on `v1beta1` users than it's worth.
+
+## Implementation Plan
+
+For each new feature driven by API fields, a new feature flag using the FeatureFlag struct will be added to the existing set of feature flags. This struct will include the stability of the feature and whether it is turned on or locked to its default. Currently it will only work for new features without `enable-api-fields`; see future work for more details on the option to introduce `enable-api-fields=none` to use per-feature flags for new features.
+
+
+```go
+type FeatureFlag struct {
+	// Name of the feature flag
+	Name string
+
+	// Stability level of the feature, one of "stable", "beta" or "alpha"
+	Stability string
+
+	// Enabled is whether the feature is turned on
+	Enabled bool
+
+	// LockToDefault indicates whether the feature is locked to default value and cannot be changed
+	LockToDefault bool
+
+	// Deprecated is whether the feature is deprecated
+	// +optional
+	Deprecated bool
+}
+```
+
+## Test Plan
+### Retain existing CI end-to-end testing matrix
+We plan to retain the existing testing matrix for fields at `alpha`, `beta` and `stable`. With the addition of per-feature flags for new features, it would look like:
+
+| |enable-api-fields|Per feature flags|Integration tests|
+| -- | -- | -- | -- |
+|Opt-in stable|stable|Turn OFF all per-feature flags|Run all stable e2e tests.|
+|Opt-in beta|beta|Turn ON all beta per-feature flags|Run all beta e2e tests.|
+|Opt-in alpha|alpha|Turn ON all per-feature flags (alpha and beta)|Run all alpha e2e tests.|
+
+
+## Additional CI tests
+### Additional Test Combinations
+We cannot test 2\*\*N combinations of per-feature flags since that would be too time consuming.
+Therefore, we try to mimic the following scenarios.
+
+- Say a cluster operator has set the cluster to run Stable features. Occasionally, they may want to turn on a feature that is not yet stable.
+- Similarly, if the cluster is running Beta features by default (i.e. all feature flags at a beta stability level are ON and enable-api-fields is set to `beta`), cluster operators may want to turn on an individual feature at an alpha level.
+- Conversely, if the cluster is running all the features (all feature flags are ON and enable-api-fields is set to `alpha`), they may want to turn off individual features.
+
+
+
+||enable-api-fields|Per feature flags|Integration tests|
+| :- | :- | :- | :- |
+|Opt-in stable|Set to stable|All feature flags are OFF by default. Turn ON one feature flag at a time.|Run a small number of e2e tests against N combinations. It is not feasible to run the entire e2e test suite against N combinations.|
+|Opt-in beta|Set to beta|Turn ON all beta per-feature flags by default. Turn ON one per-feature flag (at an alpha stability level) at a time.|Run a small number of e2e tests against M combinations (where M <= N).|
+|Opt-in alpha|Set to alpha|Turn ON all per-feature flags by default. Turn OFF one feature flag at a time.|Run a small number of e2e tests against N combinations.|
+
+### How many tests can we run in a reasonable amount of time?
+Based on a [recent PR](https://github.com/tektoncd/pipeline/pull/7032), our integration tests take between 26 mins (stable) → 33 mins (alpha). We don’t want to go beyond that. Based on [Feature flags testing matrix](https://docs.google.com/document/d/1r_MX9-mzRtdbfNQq5VC4guHb-tphA0WxWhlQIsusEEA/edit?resourcekey=0-RALry7-GaKn9i19UEaRnYg) benchmarking, the approximate time is:
+
+T = Npipelines\*Ntasks\*Nfeatures\*6 s
+
+
+
+Assume:
+
+- Nfeatures = 20 (currently we have 17 api features; let's assume that at any point in time, we will likely have ~20 individual features)
+- Ntasks = 2 (two tasks per pipeline)
+
+
+|Scenario|Npipelines|Nfeatures|Ntasks/pipeline|T (mins)|
+| :- | :- | :- | :- | :- |
+|How many pipelines can we afford to run in 30 mins?|**7**|20|2|30|
+|How long would it take to run a single (same) pipeline for all the features?|1|20|2|**4**|
+
+### How many tests should we run against the additional tests?
+- It is not feasible to run the entire e2e test suite against N features.
+- Coming up with a single test that covers all common core features might be challenging.
+  - We probably want our tests to be able to cover
+    - params
+    - workspaces
+    - results
+    - DAG (small fan out)?
+    - Finally
+    - remote resolution
+    - ...
+- We can afford to run a maximum of 7 pipelines per feature combination in a reasonable amount of time.
+- During implementation, we can figure out the number of pipelines that allow us to cover as many areas of the API spec as possible.
+
+For testing individual per-feature flags, we will use unit tests for each single feature flag in end-to-end tests and for the combinations of features that could overlap (needs to elaborate). The integration tests for each flag will be in place for the CI, while the combinations will be in the nightly tests.
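As a quick check of the time estimate, the benchmarking formula T = Npipelines\*Ntasks\*Nfeatures\*6 s can be evaluated directly. The sketch below is only a worked calculation; the function names `estimateSeconds` and `maxPipelines` are made up for this example, and the inputs are the assumed values stated in the text (20 features, 2 tasks per pipeline, a 30-minute budget).

```go
package main

import "fmt"

// secondsPerTaskRun is the 6 s factor from the benchmarking formula.
const secondsPerTaskRun = 6

// estimateSeconds evaluates T = Npipelines * Ntasks * Nfeatures * 6 s.
func estimateSeconds(pipelines, tasksPerPipeline, features int) int {
	return pipelines * tasksPerPipeline * features * secondsPerTaskRun
}

// maxPipelines returns the largest Npipelines whose estimate fits in budgetSeconds.
func maxPipelines(budgetSeconds, tasksPerPipeline, features int) int {
	perPipeline := tasksPerPipeline * features * secondsPerTaskRun
	if perPipeline <= 0 {
		return 0
	}
	return budgetSeconds / perPipeline
}

func main() {
	const features, tasks = 20, 2

	// Pipelines that fit into a 30-minute budget.
	fmt.Println(maxPipelines(30*60, tasks, features)) // 7

	// Minutes needed for 7 pipelines and for a single pipeline.
	fmt.Println(estimateSeconds(7, tasks, features) / 60) // 28
	fmt.Println(estimateSeconds(1, tasks, features) / 60) // 4
}
```

With these inputs, 7 pipelines take about 28 minutes, which stays within the 30-minute budget, while an eighth pipeline would push the estimate to 32 minutes.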
+ + + +- Testing of behavioural features remains the same as today. + +## Future Work +- To better communicate enabled features from cluster operators to downstream users, including service providers and pipeline authors, we could consider a CLI feature request to query all enabled features at each stability level via tkn. + - Example output: + ``` + $pipeline: tkn features list enabled + stable: + beta: , + alpha: + ``` +- For debugging purposes for Pipeline end users, we could include all enabled feature flags in the output yaml of PipelineRuns and TaskRuns, for example in annotations. This would benefit users who do not have access to the configMap or the controller logs. +- To preserve the ability to easily enable all features of a stability level, we could provide a script that turns on all alpha/beta features by modifying the group of per-feature flags. + +- To avoid possible errors from the manual process of documenting feature stability, we can automate the process by using the stability level field of the Per Feature Flags struct as the source of truth. For example, a script could be added to ./hack for updating the ./config/config-feature-flags.yaml. + +- Per feature flag with new value for enable-api-fields `none` +This is similar to the proposed solution to introduce per-feature flags and migrate Tekton to opt-in only stable features in a backwards compatible way, except that we are introducing a new value `none` for `enable-api-fields`, which must be used when enabling existing features via per-feature flags. All new features enabled via per-feature flags will be off by default, regardless of the value of “enable-api-fields”. + - Any current beta features can be enabled with “enable-api-fields” set to “beta” or “alpha”. + - If “enable-api-fields” is set to “none”, you can also enable them with per-feature flags. These are off by default.
+ - Any current alpha features can be enabled with “enable-api-fields” set to “alpha”. + - If “enable-api-fields” is set to “none”, you can also enable them with per-feature flags. These are off by default. + - When promoting an alpha feature to beta, it can be enabled with enable-api-fields set to “alpha”, “beta”, or “none”. + Disallowing it when enable-api-fields is set to “beta” wouldn’t actually help us phase out the flag more quickly, as we’d still need to wait until the feature is stabilized or removed. + ## References - [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md) - [Decoupling API versioning and Feature versioning for features turned on by default](https://github.com/tektoncd/pipeline/issues/6592) - [Versioned validation of referenced Pipelines/Tasks](https://github.com/tektoncd/pipeline/issues/6616) +- [Default enable-api-fields value for opt-in features once feature and API versioning are decoupled](https://github.com/tektoncd/pipeline/issues/6948) diff --git a/teps/README.md b/teps/README.md index 2674766c5..57c9e7284 100644 --- a/teps/README.md +++ b/teps/README.md @@ -127,6 +127,6 @@ This is the complete list of Tekton TEPs: |[TEP-0135](0135-coscheduling-pipelinerun-pods.md) | Coscheduling PipelineRun pods | implemented | 2023-07-27 | |[TEP-0136](0136-capture-traces-for-task-pod-events.md) | Capture traces for task pod events | implementable | 2023-07-23 | |[TEP-0137](0137-cloudevents-controller.md) | CloudEvents controller | implementable | 2023-07-31 | -|[TEP-0138](0138-decouple-api-and-feature-versioning.md) | Decouple api and feature versioning | proposed | 2023-07-27 | +|[TEP-0138](0138-decouple-api-and-feature-versioning.md) | Decouple API and feature versioning | implementable | 2023-08-23 | |[TEP-0140](0140-producing-results-in-matrix.md) | Producing Results in Matrix | implementable | 2023-08-21 | |[TEP-0141](0141-platform-context-variables.md) | Platform Context Variables | proposed |
2023-08-21 |