Integration testing between major cosmos repositories #12228

Closed
faddat opened this issue Jun 11, 2022 · 7 comments

@faddat
Contributor

faddat commented Jun 11, 2022

Hi, I think it's quite possible that I've stumbled into my first ADR.

I don't know the process, so I want to begin by describing a set of issues that has been very challenging for us as we develop Craft Economy. To my knowledge we were the first to use SDK v0.46 (others do now, iirc), so we had the earliest exposure to this set of issues. In fact, if you adopted it later, it's possible you wouldn't notice them nearly as much.

Problem definition

IMO the following problems are closely related:

1) lack of integration testing between
* sdk
* cw
* IBC-go

2) lack of a formal process to start and torture testnets at every point release of every library in the ecosystem

3) lack of an Ignite template (or any other template) in the SDK

These problems drive one another.

We can absolutely automate Celestia-scale testnets.

And if we look around, that's been the clear finding of every party out there.

Above is an excerpt from a conversation I had today; the words are entirely mine. The core idea is that since IBC left this repository, integration testing between the latest Cosmos SDK (as expressed by the main branch), the latest CosmWasm (CW), and the latest ibc-go is not happening. More recently, we can see that Tendermint may also be involved in this complex situation. The team working on Tendermint consensus has told me that the Cosmos SDK is using an old Tendermint and that performance suffers as a result. The SDK team, for its part, is concerned about (what seem to me to be) very real issues in Tendermint v0.35's peer-to-peer networking.

Personally I favor monorepos, and we could potentially consider one here (tm + sdk + ibc-go + cw as "the base for Cosmos blockchain applications"). But I'd encourage everyone to rule that out unless all of the teams read this and think it's a great idea, in which case I can help make it happen. I'm well aware that we already have a pretty busy ecosystem and pretty busy repositories across all four of those projects. While I usually use a monorepo to contain a single product, as I did with Craft (https://github.com/notional-labs/craft), I completely understand if people don't want to do something like that: these are distinct products, even though they are combined to create a commonly used base.

So, here's a discussion about removing the Ignite CLI from the SDK, but in it I mentioned that there were other alternatives.

See, Craft led to forking the Ignite CLI: https://github.com/notional-labs/tinyport (note: this is an incomplete fork that doesn't yet fully express the goals of the fork).

The reason is that there was tremendous demand from every direction for a base recipe for a chain that included:

  • sdk 46
  • tendermint 35
  • ibc-go v3 (soon v4)
  • cosmwasm

And we did the work, then reworked it toward that goal when the middleware changes landed:

CW:

ibc-go:

And Gaia's maintainer praised all this, calling it "drive-by pull requests", though I might be misinterpreting a put-down as praise. And yes, it's relevant here.

So OK, it's a lot. Actually, that's not even all of it; I just didn't feel like spending forever pasting links in here. So we know the integration work is challenging. How can we automate as much of it as possible, or at least show people where breakages are likely to occur?

Well, that's why we need a known-good template that lives in this repository, and doesn't use something like cosmoscmd to layer in harmful abstraction.

I think that we (craft) knew about the tm bug much sooner than anyone else because we took simapp and reworked it into a fully functional chain and got it rolling.

Solutions

A few things need to happen to get ahead of integration issues.

Templating

Better than removing the Ignite CLI from this repository would definitely be allowing it to support multiple templates, so that developers can lay out and test a completely up-to-date chain. But it really does need to eschew the cosmoscmd route. This template could be part of the SDK's CI system, and PRs could be blocked if they stop the template from producing a working chain, so that we always have a template that, to the best of everyone's knowledge, is "known good at the tip of main".
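A minimal sketch of what such a gate could look like, assuming the template lives inside the SDK repository as its own Go module at a hypothetical `templates/chain` path. The job points the template at the SDK tree from the PR under test via a `replace` directive and then builds and tests it. Actually blocking merges would additionally require marking this job as a required status check in branch protection, which is configured outside the workflow file.

```yaml
# Hypothetical workflow: build the in-repo chain template against the SDK
# code in the PR being tested. Made a required check, it would block PRs
# that break the template.
name: Template
on:
  pull_request:
  push:
    branches: [main]

jobs:
  template:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: "1.18"
      - name: build and test the template against this commit
        # templates/chain is a hypothetical location for the template module
        working-directory: templates/chain
        run: |
          # Resolve cosmos-sdk from this checkout (repo root) instead of a tagged release
          go mod edit -replace github.com/cosmos/cosmos-sdk=../..
          go mod tidy
          go build ./...
          go test ./...
```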

CI

Here I prototyped a neat test against mainnet gaia:

Here's the test itself:

# This workflow state syncs a live Gaia automatically with each commit.  It is intended to be used for:
# * Monitoring functionality
# * Measuring "time to live gaia" across different databases

name: State Sync Gaia
on:
  pull_request:
  push:


jobs:
  build:
    runs-on: ubuntu-latest
    container: ghcr.io/faddat/gaia
    continue-on-error: true
    timeout-minutes: 10
    env:
      INTERVAL: 1000
      GAIAD_STATESYNC_ENABLE: true
      GAIAD_P2P_MAX_NUM_OUTBOUND_PEERS: 200
      GAIAD_P2P_SEEDS: "bf8328b66dceb4987e5cd94430af66045e59899f@public-seed.cosmos.vtwit.com:26656,[email protected]:26656,[email protected]:26656,ba3bacc714817218562f743178228f23678b2873@public-seed-node.cosmoshub.certus.one:26656,[email protected]:26656,[email protected]:26656"
      GAIAD_STATESYNC_RPC_SERVERS: "https://cosmoshub.validator.network:443,https://cosmoshub.validator.network:443"
      GOPATH: /go

    strategy:
      matrix:
        database: [rocksdb, boltdb, badgerdb, goleveldb, cleveldb]

    steps:
      - run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
      - uses: actions/checkout@v3
      - name: state sync gaia with ${{ matrix.database }}
        run: |
          cd ..
          git clone https://github.com/cosmos/gaia --branch release/v7.0.x
          cd gaia
          go mod edit -replace github.com/tendermint/tm-db=../tm-db
          go mod tidy -go=1.18
          go install -ldflags '-w -s -X github.com/cosmos/cosmos-sdk/types.DBBackend=${{ matrix.database }}' -tags ${{ matrix.database }} ./...
          export LATEST_HEIGHT=$(curl -s https://cosmoshub.validator.network/block | jq -r .result.block.header.height);
          export BLOCK_HEIGHT=$(($LATEST_HEIGHT-$INTERVAL)) 
          export TRUST_HASH=$(curl -s "https://cosmoshub.validator.network/block?height=$BLOCK_HEIGHT" | jq -r .result.block_id.hash)
          export GAIAD_STATESYNC_TRUST_HEIGHT=$BLOCK_HEIGHT
          export GAIAD_STATESYNC_TRUST_HASH=$TRUST_HASH
          export PATH=$PATH:/go/bin
          echo "TRUST HEIGHT: $BLOCK_HEIGHT"
          echo "TRUST HASH: $TRUST_HASH"
          gaiad init gaia-matrix
          cp /genesis.json ~/.gaia/config/genesis.json
          gaiad start --x-crisis-skip-assert-invariants --db_backend ${{ matrix.database }}

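It clones Gaia's release/v7.0.x branch, replaces Gaia's tm-db dependency with the checked-out tm-db (my customized branch), builds gaiad once for each database backend supported by tm-db, and then state syncs each build against Cosmos Hub mainnet.

But this approach can be applied more widely. In CI, we could: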

  • test sdk's main against ibc-go's main
  • test tendermint's main against the sdk's main
  • test cw's main against ibc-go's main and the sdk's main

... and so on.
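One way to sketch this, reusing the same `go mod edit -replace` trick as the Gaia workflow above: check out a consumer repository and a dependency repository at their default branches, point the consumer at the local dependency, and build and test. The schedule, job names, and matrix entries are illustrative assumptions. `continue-on-error: true` keeps the job informational rather than blocking, since mains are expected to be incompatible at times.

```yaml
# Hypothetical nightly job: build each consumer's default branch against
# each dependency's default branch. Failures are reported, not enforced.
name: Cross-repo main compatibility
on:
  schedule:
    - cron: "0 3 * * *"
  workflow_dispatch:

jobs:
  compat:
    runs-on: ubuntu-latest
    continue-on-error: true # informational only
    strategy:
      fail-fast: false
      matrix:
        include:
          - consumer: cosmos/ibc-go
            dependency: cosmos/cosmos-sdk
            module: github.com/cosmos/cosmos-sdk
          - consumer: cosmos/cosmos-sdk
            dependency: tendermint/tendermint
            module: github.com/tendermint/tendermint
          - consumer: CosmWasm/wasmd
            dependency: cosmos/cosmos-sdk
            module: github.com/cosmos/cosmos-sdk
    steps:
      # No ref given, so each checkout uses the repository's default branch
      - uses: actions/checkout@v3
        with:
          repository: ${{ matrix.consumer }}
          path: consumer
      - uses: actions/checkout@v3
        with:
          repository: ${{ matrix.dependency }}
          path: dependency
      - uses: actions/setup-go@v3
        with:
          go-version: "1.18"
      - name: ${{ matrix.consumer }} against ${{ matrix.dependency }}
        working-directory: consumer
        run: |
          go mod edit -replace ${{ matrix.module }}=../dependency
          go mod tidy
          go build ./...
          go test ./...
```

Adding ibc-go as a *dependency* (e.g. wasmd against ibc-go's main) is harder, because its module path carries a major-version suffix (`github.com/cosmos/ibc-go/vN`) that changes between releases, so the `module` value would have to track it.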

We should have a system that tells us when the mains are not compatible. We are all aware that sometimes they won't be, and that's not a problem. But sometimes we don't know, and new developers especially don't know. That is a problem.

Also, certain ranges of versions are supposed to be mutually compatible, so for point releases it should be possible to specify the range of versions that is expected to be compatible and test against it.
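A sketch of a declared compatibility range, assuming a consumer repository (e.g. a chain) lists the SDK point releases it claims to support as a matrix. The versions shown are placeholders, not a statement about which releases are actually compatible. Unlike the main-vs-main case, this job is meant to fail loudly, since these combinations are supposed to work. Note that a `replace` directive for cosmos-sdk in the consumer's go.mod would override the `go get` here.

```yaml
# Hypothetical job: build and test against every SDK point release in the
# range this repository claims to support. Versions are placeholders.
name: Supported version range
on:
  pull_request:
  push:
    branches: [main]

jobs:
  range:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        sdk: [v0.46.0, v0.46.1]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: "1.18"
      - name: build against cosmos-sdk ${{ matrix.sdk }}
        run: |
          # Pin the requirement to this point release, then rebuild everything
          go get github.com/cosmos/cosmos-sdk@${{ matrix.sdk }}
          go mod tidy
          go build ./...
          go test ./...
```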

big goal

I have a feeling that we wouldn't have ended up here if we were more aware of one another's work. Of course, the communications deluge is very hard to manage, and we are working on very interesting and special software that has attracted a lot of people, companies, projects, and teams. We now have more organizations working on the software together than can reasonably be expected to coordinate smoothly with one another. So my proposal is basically to automate as much of that coordination as possible, and to make it easier for a chain to get to mainnet (or even testnet) with much newer code, so that we can find these bugs more rapidly.

@tac0turtle
Member

Thanks for the well-written issue. I don't think it is possible to test against main/master branches, because there are many breaking changes that aren't accounted for. I agree there needs to be better cross-repo testing, but on released versions: once an RC is released, we can work towards aligning all repos to the same version.

@faddat
Contributor Author

faddat commented Jul 7, 2022

This is more to inform devs of the breakages against other major modules, not so much to enforce a lack of breakages. Right now we don't have a warning system for second order effects.

@robert-zaremba
Collaborator

I agree: more ongoing integration tests against "ongoing" releases will help push compatibility with the latest releases.

@faddat faddat mentioned this issue Jul 16, 2022
@charleenfei
Contributor

i think in many cases there is possibly sufficient testing against other major versions, but this is not publicly documented anywhere, as it's done in the context of QA by the teams before a release is put out. we've had this issue open for a while on the ibc-go repo to try to remedy this visibility problem, and @crodriguezvega has kindly agreed to take it on. the ibc-go repo has also got its first e2e integration test with pinned docker versions live as of this week. these are both ways we may help with this versioning/visibility problem in lieu of pending work on the sdk side for these issues.

@jackzampolin
Member

I will also mention that ibctest kinda does this type of compatibility testing that you want.

https://github.com/strangelove-ventures/ibctest

@faddat faddat mentioned this issue Dec 8, 2022
@faddat faddat changed the title Testing every major module against every other major module at point releases (or commits to main) Integration testing between major cosmos repositories May 24, 2023
@tac0turtle
Copy link
Member

we started doing this in the knightly repo. we hope to have something here soon.

@faddat
Contributor Author

faddat commented Jul 10, 2024

Where is that repo?


5 participants