Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add testing improvements ADR (adr-011) #1197

Merged
merged 5 commits into from
Aug 24, 2023
Merged

Conversation

MSalopek
Copy link
Contributor

@MSalopek MSalopek commented Aug 11, 2023

Description

Closes: #1165

References: cosmos/gaia#2427

Add long-term QA/testing goals ADR:

  • matrix tests
  • cometmock tests
  • e2e test refactoring
  • MBT considerations

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct docs: prefix in the PR title
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • Confirmed the correct docs: prefix in the PR title
  • Confirmed all author checklist items have been addressed
  • Confirmed that this PR only changes documentation
  • Reviewed content for consistency
  • Reviewed content for spelling and grammar
  • Tested instructions (if applicable)
  • Checked that the documentation website can be built and deployed successfully (run make build-docs)

@MSalopek MSalopek self-assigned this Aug 11, 2023
@MSalopek MSalopek requested a review from a team as a code owner August 11, 2023 13:18
Copy link
Contributor

@shaspitz shaspitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great writeup and points to consider

docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved

Out of all the approaches used, unit testing has the most tools available and the coverage can simply be displayed as % of code lines tested. Although this is a very nice and very easy to understand metric, it does not speak about the quality of the test coverage.

Since distributed systems testing is a lot more involved, our reliance on simple unit testing tools should be minimized due to lack of useful feedback. Of course, sometimes unit tests are still necessary and helpful, but in some cases where unit tests are not helpful, we should use e2e or integration tests as appropriate rather than taking pains to add unit tests only to artificially increase coverage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on not artificially increasing coverage. However I tend to disagree that usage of UTs should be minimized. Test driven development with UTs can always reduce how much we need to rely on more complex higher level tests. Also TDD generally encourages better software.

I bring this up as there are still ways we can improve UTs with fuzzing for example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly referring to code coverage results. You can have 100% code coverage and still not cover much of the actual code interactions going on.

Copy link
Contributor Author

@MSalopek MSalopek Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to

Since distributed systems testing is a lot more involved, unit tests are oftentimes not sufficient to cover complex interactions. Unit tests are still necessary and helpful, but in cases where unit tests are not helpful e2e or integration tests should be favored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea seems reasonable, my comment was just trying to communicate that imo unit tests are always helpful, even in complex interactions, by testing individual components of the system. Higher level tests have their place as well for integrations

Specifications written in a dedicated and executable specification language are easier to maintain than the ones written entirely entirely in text.
Additionally, we can create models based on the specification OR make the model equivalent to a specification.

Models do not care about the intricacies of implementation and neither do specifications. Since both models and specifications care about concisely and accurately describing a system (such as a finite state machine), we see a benefit of adding model based tools (such as [quint](https://github.com/informalsystems/quint)) to our testing and development workflows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quint does seem cool and all, but it seems from @p-offtermatt's experience that it doesn't suit all our needs in its current form.

We should consider a tool like https://github.com/flyingmutant/rapid to define our model, and generate high level test cases. Sure, we wouldn't be writing things in a TLA-esq syntax. But rapid might provide the usefulness we need in the short to medium term.

You could conceivably write an executable spec in the language of the rapid framework, arguably in a more readable form to layman like me who don't know TLA+.

I favor the simplicity of writing everything (both the model and driver) in golang, avoiding the need for specialized knowledge in an experimental specification language, and avoiding a test trace interpreter.

My question here is, what does Quint offer that rapid does not? I'm failing to see the value props of using Quint in its current form besides the fact that it's cool to write your spec in a TLA-esq syntax and we want to support Informal projects

Copy link
Contributor

@p-offtermatt p-offtermatt Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Shawn, I also considered using Rapid.

My feeling is that it's probably very hard to get a specification of reasonable abstraction level with Rapid,
potentially harder than with Quint. This is just an educated guess, but part of the complexity is just that the protocol is inherently complex.

My current intuition is that if we are just interested in good test coverage, doing property-based testing with Rapid is probably better in terms of cost/benefit, but this wouldn't yield executable specs.

If we are interested in executable specs, Quint still seems reasonable to me, but I don't have a good feeling on how to get better data on this - maybe just trying the rapid approach next would be a good idea to see how it goes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, here's some followup questions to consider

My feeling is that it's probably very hard to get a specification of reasonable abstraction level with Rapid

Why is this the case? What makes a protocol easier to describe/abstract with Quint compared to Rapid, or any language/framework for that matter? Does Rapid have any functionality/advantages that Quint does not?

Maybe we can try modeling just a subprotocol of replicated security with both Quint and Rapid, and compare side by side how challenging it is to model with each.

Regarding the challenge of creating a succinct model: The particular language or framework we use matters. But generally well written and encapsulated software matters just as much imo

Copy link
Contributor

@p-offtermatt p-offtermatt Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good points here Shawn. The working hypothesis is that for high-level specifications where implementation-level details are abstracted away, a language that purposely abstracts a lot of these details away is easier to use than a language that needs more fine-grained management of details like memory, concurrency, ... . We will see how it turns out, but I agree that giving both a try could be a good idea.


#### Matrix tests
Instead of only running tests against current `main` branch we should adopt an approach where we also:
- **run regression tests against released software versions** (`ICS v1/v2/v3`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 including mismatched consumer/provider versions



### Negative
It might be easier to forgo the MBT tooling and instead focus on pure property based testing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@p-offtermatt p-offtermatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice writeup. Left a few comments.


Out of all the approaches used, unit testing has the most tools available and the coverage can simply be displayed as % of code lines tested. Although this is a very nice and very easy to understand metric, it does not speak about the quality of the test coverage.

Since distributed systems testing is a lot more involved, our reliance on simple unit testing tools should be minimized due to lack of useful feedback. Of course, sometimes unit tests are still necessary and helpful, but in some cases where unit tests are not helpful, we should use e2e or integration tests as appropriate rather than taking pains to add unit tests only to artificially increase coverage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with some things here, similar to Shawns comment.
My feeling is we should maximize our (reasonable) use of unit tests. Of course there are situations when they are not the right tool and it's fine to move to other testing layers, but unit tests have many advantages over integration or e2e tests.
Some reasons for unit tests are: better localizability of errors, less flakeyness, less unstated assumptions on implementation details e.g. output formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the text.

I'm not saying UTs are bad, I'm saying that the coverage metric is misleading and gives a false sense of security.

Have in mind that we're talking about UTs in ICS which relies on cosmos-sdk and ibc-go.

Let's go over a scenario of writing UTs for code that uses ibc-go and cosmos-sdk:

Write a test for a function that performs the following operation:
1. fetch a clientID from ibc-go ChannelKeeper
2. store the fetched clientID to blockchain state using the ProviderKeeper

Even though the operation is trivial, when writing UTs you rely on mocks for the ChannelKeeper which have to be generated correctly. The UTs get more convoluted the more store keepers you add.

Writing this as an integration test is comparatively easier, since you het access to the "real" ChannelKeeper and the ProviderKeeper without adding extra complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to

Since distributed systems testing is a lot more involved, unit tests are oftentimes not sufficient to cover complex interactions. Unit tests are still necessary and helpful, but in cases where unit tests are not helpful e2e or integration tests should be favored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go over a scenario of writing UTs for code that uses ibc-go and cosmos-sdk

I'd argue that in this scenario, UTs are less complex, and more useful. The process of generating mocks is a single automated CLI command. Accessing the "real" external keepers requires far more manual setup as we've seen in our integration tests. Not to mention, setting up external keepers goes against the intention of TDD, you're no longer only testing code from the local module.

Integration tests should be reserved for situations where your intention is truly to test the high level integration of software components

Copy link
Contributor

@p-offtermatt p-offtermatt Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated text is good for me, thanks! :)

Copy link
Contributor

@mpoke mpoke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MSalopek

docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved
Harness is the infrastructure layer of the pipeline that accepts inputs from the driver.

There can be multiple harnesses as long as they can perform 3 things:
* bootstrap a test execution environment (local, docker, k8s…)
Copy link
Contributor

@insumity insumity Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that is not clear to me is on how can we guarantee that the driver can execute the exact same trace as it was given by the model. For example, consider we use docker containers, and the model-generated trace that we have is a sequence of events like: "chain A sends message m, chain B sends message m', chain A receives m', ..." then to run this exact trace we would need to enforce how/when the relayer relays packets or coordinate when chains A and B take steps. Is this something we can (easily) do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relayers can be controlled in a relatively fine-grained manner - you can in general make them clear packets on a given channel, then stop clearing more packets. What you described is easily possible in principle - just clear the channel that m' was sent on first.

Coordinating when exactly certain steps are taken is more difficult, it's e.g. possible with CometMock.

But generally, we often do not assume these very strong orderings. For example, assume we want to delegate on A, this triggers a message m to be sent from A to B, we want to first submit a tx to B, then B should receive m, then we want to submit another tx' to B. This is easily possible - we submit the delegation to A, wait until it is included, submit tx to B, wait until it is included, have the relayer relay m to B, wait until m was included, submit tx' to B, wait until it is included. Only the relative order of the events matters, we don't need precise coordination between block numbers on either chain.

Does that cover the kinds of examples you were thinking of?

Copy link
Contributor

@insumity insumity Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that cover the kinds of examples you were thinking of?

Yes. Thank you very much @p-offtermatt !

@github-actions github-actions bot added C:Docs Assigned automatically by the PR labeler C:ADR Assigned automatically by the PR labeler labels Aug 23, 2023
Copy link
Contributor

@shaspitz shaspitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've nitpicked your PR enough ;) great job and thanks for bringing up these topics

@MSalopek MSalopek changed the title docs: add testing improvements ADR docs: add testing improvements ADR (adr-011) Aug 24, 2023
@MSalopek MSalopek merged commit 2a1e3a9 into main Aug 24, 2023
@MSalopek MSalopek deleted the masa/1165-testing-adr branch August 24, 2023 12:30
mpoke pushed a commit that referenced this pull request Aug 25, 2023
* docs: add testing improvements ADR (adr-011) (#1197)

* docs: add testing improvements ADR

* docs: fix links and typos

* Update docs/docs/adrs/adr-011-improving-test-confidence.md

Co-authored-by: Shawn <[email protected]>

* Update docs/docs/adrs/adr-011-improving-test-confidence.md

Co-authored-by: Shawn <[email protected]>

* docs: update from comments

---------

Co-authored-by: Shawn <[email protected]>

* refactor: clean protos for clarity (#1071)

* consumer proto folder

* provider comments, move one msg

* ccv comments

* moved shiz

* proto changes to build

* make proto gen again

* no circ dep

* Update wire.proto

* comments

* dun builds

* progress save

* FIN

* refactors

* rm improper proto ref

* rm todos, issue now

* lint till I die

* update comments, rebuild protos

* change ccv.go to wire.go, and test file

* consistent comment

* Update wire.proto

* example alias diff

---------

Co-authored-by: MSalopek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:ADR Assigned automatically by the PR labeler C:Docs Assigned automatically by the PR labeler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

document possible approaches regarding ICS QA and maintenance
6 participants