From d044f21737896a43f8301acf054790f829a3bb97 Mon Sep 17 00:00:00 2001 From: Marius Poke Date: Wed, 15 Nov 2023 17:06:41 +0100 Subject: [PATCH] docs: adding some info on ICS processes (#1429) * moving docs/old/testing.md to TESTING.md * adding release process * adding info on ICS releases * updating contributing guidelines * apply review suggestions * Update RELEASES.md Co-authored-by: insumity * Update RELEASES.md Co-authored-by: insumity * adding note about editor for unclog * add some context on breaking changes * adding context on state-compatibility * Update STATE-COMPATIBILITY.md Co-authored-by: insumity --------- Co-authored-by: insumity --- CONTRIBUTING.md | 72 ++++++++-- README.md | 2 +- RELEASES.md | 97 ++++++++++++++ RELEASE_PROCESS.md | 119 +++++++++++++++++ STATE-COMPATIBILITY.md | 212 ++++++++++++++++++++++++++++++ docs/old/testing.md => TESTING.md | 0 6 files changed, 487 insertions(+), 15 deletions(-) create mode 100644 RELEASES.md create mode 100644 RELEASE_PROCESS.md create mode 100644 STATE-COMPATIBILITY.md rename docs/old/testing.md => TESTING.md (100%) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8975d37269..90456c842f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -14,11 +14,10 @@ - [Pull Request Templates](#pull-request-templates) - [Requesting Reviews](#requesting-reviews) - [Updating Documentation](#updating-documentation) + - [Changelog](#changelog) - [Dependencies](#dependencies) - [Protobuf](#protobuf) - [Branching Model and Release](#branching-model-and-release) - - [Semantic Versioning](#semantic-versioning) - - [Backwards Compatibility](#backwards-compatibility) - [PR Targeting](#pr-targeting) Thank you for considering making contributions to the Interchain Security (ICS) repository! 🎉👍 @@ -216,6 +215,63 @@ items. In addition, use the following review explanations: If you open a PR in ICS, it is mandatory to update the relevant documentation in `/docs`. +### Changelog + +To manage and generate our changelog, we currently use [unclog](https://github.com/informalsystems/unclog). + +Every PR with types `fix`, `feat`, `deps`, and `refactor` should include a file +`.changelog/unreleased/${section}/[${component}/]${pr-number}-${short-description}.md`, +where: + +- `section` is one of + `dependencies`, `improvements`, `features`, `bug-fixes`, `state-breaking`, `api-breaking`, + and _**if multiple apply, create multiple files**_, + not necessarily with the same `short-description` or content; +- `pr-number` is the PR number; +- `short-description` is a short (4 to 6 word), hyphen separated description of the change; +- `component` is used for changes that affect one of the components defined in the [config](.changelog/config.toml), e.g., `provider`, `consumer`. + +For examples, see the [.changelog](.changelog) folder. + +Use `unclog` to add a changelog entry in `.changelog` (check the [requirements](https://github.com/informalsystems/unclog#requirements) first): +```bash +# add a general entry +unclog add \ + -i "${pr-number}-${short-description}" \ + -p "${pr-number}" \ + -s "${section}" \ + -m "${description}" \ + +# add a entry to a component +unclog add + -i "${pr-number}-${short-description}" \ + -p "${pr-number}" \ + -c "${component}" \ + -s "${section}" \ + -m "${description}" \ +``` +where `${description}` is a detailed description of the changelog entry. + +For example, +```bash +# add an entry for bumping IBC to v7.2.0 +unclog add -i "1196-bump-ibc" -p 1196 -s dependencies -m "Bump [ibc-go](https://github.com/cosmos/ibc-go) to [v7.2.0](https://github.com/cosmos/ibc-go/releases/tag/v7.2.0)" + +# add an entry for changing the consumer module; +# note that the entry is added to both state-breaking and features sections +unclog add -i "1024-jail-throttling-v2" -p 1024 -c consumer -s state-breaking -m "Add the consumer-side changes for jail throttling with retries (cf. ADR 008)." +unclog add -i "1024-jail-throttling-v2" -p 1024 -c consumer -s features -m "Add the consumer-side changes for jail throttling with retries (cf. ADR 008)." +``` + +**Note:** `unclog add` requires an editor. This can be set either by configuring +an `$EDITOR` environment variable or by manually specify an editor binary path +via the `--editor` flag. + +**Note:** Changelog entries should answer the question: "what is important about this +change for users to know?" or "what problem does this solve for users?". It +should not simply be a reiteration of the title of the associated PR, unless the +title of the PR _very_ clearly explains the benefit of a change to a user. + ## Dependencies We use [Go Modules](https://github.com/golang/go/wiki/Modules) to manage @@ -243,18 +299,6 @@ To generate the protobuf stubs, you can run `make proto-gen`. ICS adheres to the [trunk based development branching model](https://trunkbaseddevelopment.com/). User branches should start with a user name, example: `{moniker}/{issue#}-branch-name`. -### Semantic Versioning - -ICS follows [semantic versioning](https://semver.org), but with the following deviations (similar to [IBC-Go](https://github.com/cosmos/ibc-go/blob/main/RELEASES.md)): - -- A library API breaking change will result in an increase of the MAJOR version number (X.y.z | x > 0). -- A state breaking change (change requiring coordinated upgrade and/or state migration for the consumer, the provider, or both) will result in an increase of the MINOR version number (x.Y.z | x > 0). -- Any other changes (including node API breaking changes) will result in an increase of the PATCH version number (x.y.Z | x > 0). - -### Backwards Compatibility - -A MAJOR version of ICS will always be backwards compatible with the previous MAJOR version of ICS. Versions before that are not supported. For example, a provider chain could run ICS at version 3.4.5, and would be compatible with consumers running ICS at 2.0.0, 2.1.2, 3.2.1, but not 1.2.7. - ### PR Targeting Ensure that you base and target your PRs on either `main` or a feature branch. diff --git a/README.md b/README.md index bb6da8be76..0b9ea0e5c9 100644 --- a/README.md +++ b/README.md @@ -43,7 +43,7 @@ Inspect the [Makefile](./Makefile) if curious. ## Testing -See [testing docs](./docs/old/testing.md). +See [testing docs](./TESTING.md). ## Learn more diff --git a/RELEASES.md b/RELEASES.md new file mode 100644 index 0000000000..92662b3318 --- /dev/null +++ b/RELEASES.md @@ -0,0 +1,97 @@ +# Releases + +- [Releases](#releases) + - [Semantic Versioning](#semantic-versioning) + - [Breaking Changes](#breaking-changes) + - [Release Cycle](#release-cycle) + - [Stable Release Policy](#stable-release-policy) + - [Version Matrix](#version-matrix) + - [Backwards Compatibility](#backwards-compatibility) + +## Semantic Versioning + +Interchain Security (ICS) follows [semantic versioning](https://semver.org), but with the following deviations (similar to [IBC-Go](https://github.com/cosmos/ibc-go/blob/main/RELEASES.md)): + +- A library API breaking change will result in an increase of the MAJOR version number (X.y.z | X > 0). +- A state-machine breaking change will result in an increase of the MINOR version number (x.Y.z | x > 0). +- Any other changes (including node API breaking changes) will result in an increase of the PATCH version number (x.y.Z | x > 0). + +### Breaking Changes + +A change is considered to be ***library API breaking*** if it modifies the integration of ICS on either consumer or provider chains (i.e., it changes the way ICS is used as a library). +Note that bumping the major version of [Cosmos SDK](https://github.com/cosmos/cosmos-sdk) or [IBC](https://github.com/cosmos/ibc-go) will be considered as a library API breaking change. + +A change is considered to be ***state-machine breaking*** if it requires a coordinated upgrade and/or state migration for either consumer or provider chains in order to preserve [state compatibility](./STATE-COMPATIBILITY.md). +Note that when bumping the dependencies of [Cosmos SDK](https://github.com/cosmos/cosmos-sdk) and [IBC](https://github.com/cosmos/ibc-go) we will only treat patch releases as non state-machine breaking. + +A change is considered to be ***node API breaking*** if it modifies the API provided by a node of either consumer or provider chains. +This includes events, queries, CLI interfaces. + +## Release Cycle + +ICS follows a traditional release cycle involving release candidates (RCs) releases before finalizing a new version. +The stable release guarantees do not go into affect until a final release is performed. + +❗***It is never advisable to use a non-final release in production.*** + +Final releases should contain little to no changes in comparison to the latest RC. + +A release should not be finalized until the development team and the external community have done sufficient integration tests on the targeted release. + +## Stable Release Policy + +The beginning of a new major release series is marked by the release of a new major version. +A major release series is comprised of all minor and patch releases made under the same major version number. +The series continues to receive bug fixes (released as minor or patch releases) until it reaches end of life. +The date when a major release series reaches end of life is determined by one of the following methods: + +- If there is no known chain using a major release series, then it reached end of life. + This is a temporary policy that is possible due to the relatively low number of consumer chains. +- If the next major release is made within the first 6 months, then the end of + life date of the major release series is 1 year after its initial release. +- If the next major release is made 6 months after the initial release, then the + end of life date of the major release series is 6 months after the release date + of the next major release. + +Only the following major release series have a stable release status. +All missing minor release versions have been discontinued. + +| Release | End of Life Date | +|---------|------------------| +| `v1.2.x` | February 21, 2024 | +| `v2.0.x` | June 09, 2024 | +| `v2.1.x-provider-lsm` | June 09, 2024 | +| `v2.3.x-provider-lsm` | June 09, 2024 | +| `v3.1.x` | July 10, 2024 | +| `v3.2.x` | July 10, 2024 | + +**Note**: As of [Gaia v12.0.0](https://github.com/cosmos/gaia/releases/tag/v12.0.0), +the Cosmos Hub uses a fork of Cosmos SDK ([v0.45.16-ics-lsm](https://github.com/cosmos/cosmos-sdk/releases/tag/v0.45.16-ics-lsm)) +that contains the Liquid Staking Module (LSM). +This means the Cosmos Hub requires a fork of ICS. +This fork is maintained by the development team and released using the `-lsm` prefix. +As soon as the Cosmos Hub uses mainline Cosmos SDK, the `-lsm` releases will reach end of life. + +## Version Matrix + +Versions of Golang, IBC, Cosmos SDK and CometBFT used by ICS in the currently active releases: + +| ICS | Golang | IBC | Cosmos SDK | CometBFT | Note | +|-----|--------|-----|------------|----------|------| +| [v1.2.0-multiden](https://github.com/cosmos/interchain-security/releases/tag/v1.2.0-multiden) | 1.18 | v4.2.0 | v0.45.15-ics | v0.34.27 | Consumer only | +| [v2.0.0](https://github.com/cosmos/interchain-security/releases/tag/v2.0.0) | 1.19 | v4.4.2 | v0.45.15-ics | v0.34.28 | +| [v2.1.0-provider-lsm](https://github.com/cosmos/interchain-security/releases/tag/v2.1.0-provider-lsm) | 1.19 | v4.4.2 | v0.45.16-ics-lsm | v0.34.28 | Provider only (Cosmos Hub specific) | +| [v2.3.0-provider-lsm](https://github.com/cosmos/interchain-security/releases/tag/v2.3.0-provider-lsm) | 1.19 | v4.4.2 | v0.45.16-ics-lsm | v0.34.28 | Provider only (Cosmos Hub specific) | +| [v3.1.0](https://github.com/cosmos/interchain-security/releases/tag/v3.1.0) | 1.20 | v7.1.0 | v0.47.3 | v0.37.2 | + +### Backwards Compatibility + +A MAJOR version of ICS will always be backwards compatible with the previous MAJOR version of ICS. + +The following table indicates the compatibility of currently active releases: + +| consumer | provider | `v2.0.0` | `v2.1.0-provider-lsm` | `v2.3.0-provider-lsm` | `v3.1.0` | +|----------|----------|--------:|----------------------:|----------------------:|---------:| +| `v1.2.0-multiden` || ✅ | ✅ | ✅ | ✅ | +| `v2.0.0` || ✅ | ✅ | ✅ | ✅ | +| `v3.1.0` || ✅ | ✅ | ✅ | ✅ | diff --git a/RELEASE_PROCESS.md b/RELEASE_PROCESS.md new file mode 100644 index 0000000000..ef0863ed4f --- /dev/null +++ b/RELEASE_PROCESS.md @@ -0,0 +1,119 @@ +# Release Process + +- [Release Process](#release-process) + - [Changelog](#changelog) + - [Creating a new release branch](#creating-a-new-release-branch) + - [Cutting a new release](#cutting-a-new-release) + - [Update the changelog on main](#update-the-changelog-on-main) + - [Tagging Procedure](#tagging-procedure) + + +This document outlines the release process for Interchain Security (ICS). + +For details on ICS releases, see [RELEASES.md](./RELEASES.md). + +## Changelog + +For PRs that are changing production code, please add a changelog entry in `.changelog` (for details, see [contributing guidelines](./CONTRIBUTING.md#changelog)). + +To manage and generate the changelog on ICS, we currently use [unclog](https://github.com/informalsystems/unclog). +Read the [README.md](https://github.com/informalsystems/unclog#readme) in the unclog repo for instruction on how to install and use unclog. + +### Creating a new release branch + +Unreleased changes are collected on `main` in `.changelog/unreleased/`. +However, `.changelog/` on `main` contains also existing releases (e.g., `v3.2.0`). +Thus, when creating a new release branch (e.g., `release/v3.3.x`), the following steps are necessary: + +- create a new release branch, e.g., `release/v3.3.x` + ```bash + git checkout main + git pull + git checkout -b release/v3.3.x + ``` +- delete all the sub-folders in `.changelog/` except `unreleased/` + ```bash + find ./.changelog -mindepth 1 -maxdepth 1 -type d -not -name unreleased | xargs rm -r + ``` +- replace the content of `.changelog/epilogue.md` with the following text + ```md + ## Previous Versions + + [CHANGELOG of previous versions](https://github.com/cosmos/interchain-security/blob/main/CHANGELOG.md) + ``` +- push the release branch upstream + ```bash + git push + ``` + +### Cutting a new release + +Before cutting a _**release candidate**_ (e.g., `v3.3.0-rc0`), the following steps are necessary: + +- move to the release branch, e.g., `release/v3.3.x` + ```bash + git checkout release/v3.3.x + ``` +- move all entries in ".changelog/unreleased" to the release version, e.g., `v3.3.0`, i.e., + ```bash + unclog release v3.3.0 + ``` +- update `CHANGELOG.md`, i.e., + ```bash + unclog build > CHANGELOG.md + ``` +- open a PR (from this new created branch) against the release branch, e.g., `release/v3.3.x` + +Now you can cut the release candidate, e.g., v3.3.0-rc0 (follow the [Tagging Procedure](#tagging-procedure)). + +### Update the changelog on main + +Once the **final release** is cut, the new changelog section must be added to main: + +- checkout a new branch from the `main` branch, i.e., + ```bash + git checkout main + git pull + git checkout -b /backport_changelog + ``` +- bring the new changelog section from the release branch into this branch, e.g., + ```bash + git checkout release/v3.3.x .changelog/v3.3.0 + ``` +- remove duplicate entries that are both in `.changelog/unreleased/` and the new changelog section, e.g., `.changelog/v3.3.0` +- update `CHANGELOG.md`, i.e., + ```bash + unclog build > CHANGELOG.md + ``` +- open a PR (from this new created branch) against `main` + +## Tagging Procedure + +**Important**: _**Always create tags from your local machine**_ since all release +tags should be [signed and annotated](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits). + +The following steps are the default for tagging a specific branch commit using git +on your local machine. Usually, release branches are labeled `release/v*`: + +Ensure you have checked out the commit you wish to tag and then do: +```bash +git pull --tags +git tag -s v3.2.0 -m v3.2.0 +git push origin v3.2.0 +``` + +To re-create a tag: +```bash +# delete a tag locally +git tag -d v3.2.0 + +# push the deletion to the remote +git push --delete origin v3.2.0 + +# redo the tagging +git tag -s v3.2.0 -m v3.2.0 +git push origin v3.2.0 +``` + +For final releases, once the tag is created, use the GitHub interface to create a release. +Note that this is not necessary for release candidates. \ No newline at end of file diff --git a/STATE-COMPATIBILITY.md b/STATE-COMPATIBILITY.md new file mode 100644 index 0000000000..1819aaf981 --- /dev/null +++ b/STATE-COMPATIBILITY.md @@ -0,0 +1,212 @@ +# State-Compatibility + +- [State-Compatibility](#state-compatibility) + - [Scope](#scope) + - [Validating State-Compatibility](#validating-state-compatibility) + - [AppHash](#apphash) + - [LastResultsHash](#lastresultshash) + - [Major Sources of State-incompatibility](#major-sources-of-state-incompatibility) + - [Creating Additional State](#creating-additional-state) + - [Changing Proto Field Definitions](#changing-proto-field-definitions) + - [Returning Different Errors Given Same Input](#returning-different-errors-given-same-input) + - [Variability in Gas Usage](#variability-in-gas-usage) + - [Secondary Limitations To Keep In Mind](#secondary-limitations-to-keep-in-mind) + - [Network Requests to External Services](#network-requests-to-external-services) + - [Randomness](#randomness) + - [Parallelism and Shared State](#parallelism-and-shared-state) + - [Hardware Errors](#hardware-errors) + + +It is critical for the patch releases to be state-machine compatible with prior releases in the same minor version. +For example, v3.2.1 must be state-machine compatible with v3.2.0. + +This is to ensure **determinism**, i.e., given the same input, the nodes will always produce the same output. + +State-incompatibility is allowed for major upgrades because all nodes in either the consumer or provider networks +perform it at the same time. Therefore, after the upgrade, the nodes continue functioning in a deterministic way. + +## Scope + +The state-machine scope includes the following areas: + +- All ICS messages including: + - Every msg's ValidateBasic method + - Every msg's MsgServer method + - Net gas usage, in all execution paths + - Error result returned + - State changes (namely every store write) +- AnteHandlers in "DeliverTx" mode +- All `BeginBlock`/`EndBlock` logic + +The following are **NOT** in the state-machine scope: + +- Events +- Queries that are not whitelisted +- CLI interfaces + +## Validating State-Compatibility + +CometBFT ensures state compatibility by validating a number of hashes that can be found [here](https://github.com/cometbft/cometbft/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77). + +`AppHash` and `LastResultsHash` are the common sources of problems stemming from our work. +To avoid these problems, let's now examine how these hashes work. + +### AppHash + +**Note:** The following explanation is simplified for clarity. + +An app hash is a hash of hashes of every store's Merkle root that is returned by ABCI's `Commit()` from Cosmos-SDK to CometBFT. +Cosmos-SDK [takes an app hash of the application state](https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/rootmulti/store.go#L430), and propagates it to CometBFT which, in turn, compares it to the app hash of the rest of the network. +Then, CometBFT ensures that the app hash of the local node matches the app hash of the network. + +### LastResultsHash + +`LastResultsHash` is the root hash of all results from the transactions in the block returned by the ABCI's `DeliverTx`. + +The [`LastResultsHash`](https://github.com/cometbft/cometbft/blob/v0.34.29/types/results.go#L47-L54) +in CometBFT [v0.34.29](https://github.com/cometbft/cometbft/releases/tag/v0.34.29) contains: + +1. Tx `GasWanted` + +2. Tx `GasUsed` + > `GasUsed` being Merkelized means that we cannot freely reorder methods that consume gas. + > We should also be careful of modifying any validation logic since changing the + > locations where we error or pass might affect transaction gas usage. + > + > There are plans to remove this field from being Merkelized in a subsequent CometBFT release, + > at which point we will have more flexibility in reordering operations / erroring. + +3. Tx response `Data` + + > The `Data` field includes the proto marshalled Tx response. Therefore, we cannot + > change these in patch releases. + +4. Tx response `Code` + + > This is an error code that is returned by the transaction flow. In the case of + > success, it is `0`. On a general error, it is `1`. Additionally, each module + > defines its custom error codes. + > For example, `x/provider` currently has [these error codes](./x/ccv/provider/types/errors.go) defined. + > + > As a result, it is important to avoid changing custom error codes or change + > the semantics of what is valid logic in transaction flows. + +Note that all of the above stem from `DeliverTx` execution path, which handles: + +- `AnteHandler`'s marked as deliver tx +- `msg.ValidateBasic` +- execution of a message from the message server + +The `DeliverTx` return back to the CometBFT is defined [here](https://github.com/cosmos/cosmos-sdk/blob/d11196aad04e57812dbc5ac6248d35375e6603af/baseapp/abci.go#L293-L303). + +## Major Sources of State-incompatibility + +### Creating Additional State + +By erroneously creating database entries that exist in Version A but not in +Version B, we can cause the app hash to differ across nodes running +these versions in the network. Therefore, this must be avoided. + +### Changing Proto Field Definitions + +For example, if we change a field that gets persisted to the database, +the app hash will differ across nodes running these versions in the network. + +Additionally, this affects `LastResultsHash` because it contains a `Data` field that is a marshaled proto message. + +### Returning Different Errors Given Same Input + +```go +// Version A +func (sk Keeper) validateAmount(ctx context.Context, amount math.Int) error { + if amount.IsNegative() { + return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive or zero") + } + return nil +} +``` + +```go +// Version B +func (sk Keeper) validateAmount(ctx context.Context, amount math.Int) error { + if amount.IsNegative() || amount.IsZero() { + return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive") + } + return nil +} +``` + +Note that now an amount of 0 can be valid in "Version A", but invalid in "Version B". +Therefore, if some nodes are running "Version A" and others are running "Version B", +the final app hash might not be deterministic. + +Additionally, a different error message does not matter because it +is not included in any hash. However, an error code `sdkerrors.ErrInvalidRequest` does. +It translates to the `Code` field in the `LastResultsHash` and participates in +its validation. + +### Variability in Gas Usage + +For transaction flows (or any other flow that consumes gas), it is important +that the gas usage is deterministic. + +Currently, gas usage is being Merklized in the state. As a result, reordering functions +becomes risky. + +Suppose my gas limit is 2000 and 1600 is used up before entering +`someInternalMethod`. Consider the following: + +```go +func someInternalMethod(ctx sdk.Context) { + object1 := readOnlyFunction1(ctx) # consumes 1000 gas + object2 := readOnlyFunction2(ctx) # consumes 500 gas + doStuff(ctx, object1, object2) +} +``` + +- It will run out of gas with `gasUsed = 2600` where 2600 getting merkelized +into the tx results. + +```go +func someInternalMethod(ctx sdk.Context) { + object2 := readOnlyFunction2(ctx) # consumes 500 gas + object1 := readOnlyFunction1(ctx) # consumes 1000 gas + doStuff(ctx, object1, object2) +} +``` + +- It will run out of gas with `gasUsed = 2100` where 2100 is getting merkelized +into the tx results. + +Therefore, we introduced a state-incompatibility by merklezing diverging gas +usage. + +## Secondary Limitations To Keep In Mind + +### Network Requests to External Services + +It is critical to avoid performing network requests to external services +since it is common for services to be unavailable or rate-limit. + +Imagine a service that returns exchange rates when clients query its HTTP endpoint. +This service might experience downtime or be restricted in some geographical areas. + +As a result, nodes may get diverging responses where some +get successful responses while others errors, leading to state breakage. + +### Randomness + +Randomness cannot be used in the state machine, as the state machine must be deterministic. + +**Note:** Iteration order over `map`s is non-deterministic, so to be deterministic +you must gather the keys, and sort them all prior to iterating over all values. + +### Parallelism and Shared State + +Threads and Goroutines might preempt differently in different hardware. Therefore, +they should be avoided for the sake of determinism. Additionally, it is hard +to predict when the multi-threaded state can be updated. + +### Hardware Errors + +This is out of the developer's control but is mentioned for completeness. \ No newline at end of file diff --git a/docs/old/testing.md b/TESTING.md similarity index 100% rename from docs/old/testing.md rename to TESTING.md