Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(testing): update validators before UpdateClient call #4315

Closed
wants to merge 1 commit into from

Conversation

tbruyelle
Copy link
Contributor

Description

The current version of ICS uses a copy of ibc-go/testing, located in the legacy_ibc_testing folder, because it requires a couple of shims to work properly in its context.

There's a pending PR where this folder is removed and where ICS really uses this repo for testing. I was able to workaround most of the shims, except one that in my opinion requires a change in ibco-go/testing. This change (contained in this PR) switches the order of 2 instructions in the NextBlock() function.

Currently, the validators are updated after the chain.LastHeader update, which, in the context of the ICS tests, makes the UpdateClient() call fail, because the trusted validators hash doesn't match the consensus state next validators hash. The condition that fails is exactly here:

if !bytes.Equal(consState.NextValidatorsHash, tvalHash) {

The patch simply ensures that the validators are updated before the lastHeader is updated, which fix the problem for ICS. For this repo, this change doesn't fail the tests, but I lack knowledge on IBC to undertand all the consequences, so please let me know if this is irrelevant. In that case we'll probably have to find an other workaround or to keep the legacy_ibc_testing folder in ICS.

This change targets the release/v7.2.x branch because ICS depends on 7.1.0. Ideally, we expect to have this patch in a 7.2.1 or 7.3.0, please let us know what's possible.


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md).
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards and Go style guide.
  • Wrote unit and integration tests.
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/).
  • Added relevant godoc comments.
  • Provide a commit message to be used for the changelog entry in the PR description for review.
  • Re-reviewed Files changed in the Github PR explorer.
  • Review Codecov Report in the comment section below once CI passes.

@MSalopek
Copy link

MSalopek commented Aug 9, 2023

It also seems that the change does not affect any of the tests in the ibc-go repo.

Should we target main instead of a release/v* branch?

@colin-axner
Copy link
Contributor

Hi, thanks for raising the issue. Can you provide more context on why exactly UpdateClient is failing? From my understanding, the signed header for an executing block does not change its validator set or next validator set after executing (delayed execution). The validator set changes would be applied to the next header to be signed.

For context, the testing pkg on main doesn't fully mimic ABCI execution. I've raised a series of issues to fix this, with #3997 probably being most relevant. All the issues are listed on this milestone. If you could comment with any context I am missing, that would be very helpful!

@colin-axner
Copy link
Contributor

Should we target main instead of a release/v* branch?

Yes main, thank you. We cherry pick back to release branches when we backport fixes

@colin-axner
Copy link
Contributor

The current terminology in the testing pkg is a bit confusing, see #3971 and #3973 for how we can refer to the headers in question

@tbruyelle
Copy link
Contributor Author

tbruyelle commented Aug 9, 2023

Hey @colin-axner, thanks for your response and all the details.

It seems like the code in ICS is able to replicate the bug mentioned in #1668 (funny enough the diff is exactly the same as this PR). I'll work to extract the code from ICS and provide a basic way to reproduce. This should help me to provide the missing context.

EDIT: didn't find the time to finish that task, and I'm OOO the next 2 weeks, so this will wait a little 🙏

@colin-axner
Copy link
Contributor

Looked into this some more by testing with your branch in the linked pr. Still trying to grapple with how to fix this issue, but I think I've narrowed in on a problem.

From my understanding of looking at the SDK code and cometbft. The validator set changes returned by the SDK (which have already been applied) are not applied to the validator set which signs h + 1, rather, it is applied as we have it written in the ibc testing package, that is:

res := endBlock()
nextHeader.Vals = chain.NextVals
nextHeader.NextVals = applyChanges(chain.NextVals, changes)

But there's a subtle issue. Historical info in the SDK sets on begin block the current validator set it tracks. I believe the validators of the header being executed is not necessarily equal to the validator set stored in the SDK. This is because the applied changes in block H won't take affect until block H + 2. I think this is why the proposed fix works, it reverses the logic. It creates a signed header where the next vals is updated earlier (to match SDK logic).

Instead, the testing pkg should likely be fixed to query the correct validators. The current issue I think is that we query for the validators stored by historical info at H + 1, but as I noted, the validators stored at H + 1, are the validators of H + 2. I guess this means to get, validators of H + 1, we need to query historical info at H. I didn't have any luck with fixing the tests using this logic, will continue looking later

@tbruyelle
Copy link
Contributor Author

@colin-axner thanks for the investigation. Let me try to understand the issue you have raised.

On my side I naively tried to reproduce the case by adding some packets exchange in testing/TestChangeValSet, but that didn't worked. There's something more in the ICS repo testing setup that triggers the error. Will investigate further.

@colin-axner
Copy link
Contributor

Hi @tbruyelle, I figured it out!

So, I was correct in my above logic, the trusted height validator query to historical info should occur at the trusted height (based on the logic described above).

The tests were not passing though because there was double trouble. I needed to tweak ccv tests as well. While debugging I noticed that despite my logic, the historical info was returning the next validators for a block ahead. This made no sense to me, because when I queried at trusted height - 1, I received the validators for a block behind.

Then it dawned on me that it was as if the validator set stored in historical info was the same produced at the end of the block. But how could that happen? I was already aware of the finickiness of BeginBlock being called multiple times. Historical info is written on begin block based on the latest validator set. After a little investigation, my hunch turned out to be correct. BeginBlock was being called, state changes to the validator set occurred and then begin block was being called again! This occurs when we try to update the header time.

Here are my changes to ibc-go (which I will open a pr to fix):

diff --git a/testing/chain.go b/testing/chain.go
index 4a1a2f5db..cddb026bd 100644
--- a/testing/chain.go
+++ b/testing/chain.go
@@ -413,18 +413,19 @@ func (chain *TestChain) ConstructUpdateTMClientHeaderWithTrustedHeight(counterpa
        // Once we get TrustedHeight from client, we must query the validators from the counterparty chain
        // If the LatestHeight == LastHeader.Height, then TrustedValidators are current validators
        // If LatestHeight < LastHeader.Height, we can query the historical validator set from HistoricalInfo
-       if trustedHeight == counterparty.LastHeader.GetHeight() {
+       if trustedHeight.GTE(counterparty.LastHeader.GetHeight()) {
                tmTrustedVals = counterparty.Vals
        } else {
                // NOTE: We need to get validators from counterparty at height: trustedHeight+1
                // since the last trusted validators for a header at height h
                // is the NextValidators at h+1 committed to in header h by
                // NextValidatorsHash
-               tmTrustedVals, ok = counterparty.GetValsAtHeight(int64(trustedHeight.RevisionHeight + 1))
+               tmTrustedVals, ok = counterparty.GetValsAtHeight(int64(trustedHeight.RevisionHeight))
                if !ok {
                        return nil, sdkerrors.Wrapf(ibctm.ErrInvalidHeaderHeight, "could not retrieve trusted validators at trustedHeight: %d", trustedHeight)
                }
        }
+

and here are my changes in ccv:

diff --git a/tests/integration/unbonding.go b/tests/integration/unbonding.go
index 7693c782..fd323937 100644
--- a/tests/integration/unbonding.go
+++ b/tests/integration/unbonding.go
@@ -370,6 +370,8 @@ func (s *CCVTestSuite) TestRedelegationNoConsumer() {
                s.providerCtx().BlockTime().Add(stakingKeeper.UnbondingTime(s.providerCtx())),
        )
 
+       s.providerChain.NextBlock()
+
        // Increment time so that the unbonding period passes on the provider
        incrementTimeByUnbondingPeriod(s, Provider)
 
diff --git a/testutil/simibc/chain_util.go b/testutil/simibc/chain_util.go
index 64b0e913..6cc697b0 100644
--- a/testutil/simibc/chain_util.go
+++ b/testutil/simibc/chain_util.go
@@ -48,12 +48,12 @@ func EndBlock(c *ibctesting.TestChain, preCommitCallback func()) (*ibctmtypes.He
 
        c.App.Commit()
 
+       c.LastHeader = c.CurrentTMClientHeader()
+
        c.Vals = c.NextVals
 
        c.NextVals = ibctesting.ApplyValSetChanges(c.T, c.Vals, ebRes.ValidatorUpdates)
 
-       c.LastHeader = c.CurrentTMClientHeader()
-
        sdkEvts := ABCIToSDKEvents(ebRes.Events)
        packets := ParsePacketsFromEvents(sdkEvts)
 
diff --git a/testutil/simibc/relay_util.go b/testutil/simibc/relay_util.go
index 6ede5aa0..18f63b34 100644
--- a/testutil/simibc/relay_util.go
+++ b/testutil/simibc/relay_util.go
@@ -151,14 +151,14 @@ func augmentHeader(sender *ibctesting.TestChain, receiver *ibctesting.TestChain,
        // Once we get TrustedHeight from client, we must query the validators from the counterparty chain
        // If the LatestHeight == LastHeader.Height, then TrustedValidators are current validators
        // If LatestHeight < LastHeader.Height, we can query the historical validator set from HistoricalInfo
-       if trustedHeight == sender.LastHeader.GetHeight() {
-               tmTrustedVals = sender.Vals
+       if trustedHeight.GTE(sender.LastHeader.GetHeight()) {
+               tmTrustedVals = sender.NextVals
        } else {
                // NOTE: We need to get validators from counterparty at height: trustedHeight+1
                // since the last trusted validators for a header at height h
                // is the NextValidators at h+1 committed to in header h by
                // NextValidatorsHash
-               tmTrustedVals, ok = sender.GetValsAtHeight(int64(trustedHeight.RevisionHeight + 1))
+               tmTrustedVals, ok = sender.GetValsAtHeight(int64(trustedHeight.RevisionHeight))
                if !ok {
                        return errorsmod.Wrapf(ibctmtypes.ErrInvalidHeaderHeight, "could not retrieve trusted validators at trustedHeight: %d", trustedHeight)
                }

I think the testutil/simibc changes are unnecessary as this is the directory you want to delete?

@tbruyelle
Copy link
Contributor Author

Wow excellent @colin-axner!

I was actually debugging TestBasicSlashPacketThrottling and I had noticed something weird in the trusted vals of the header sent to UpdateClient, but I think I was still 2 light years away from finding the solution ^^'.

The fix in ibc-go is enough to fix TestBasicSlashPacketThrottling, but not for TestRedelegationNoConsumer which requires an additional call to NextBlock() thanks for that!

About testutil/simibc, this is not the directory I want to delete, the directory I want to delete is legacy_ibc_testing, which used to be a kind of ibc-go/testing with some shims. That said, I was able to run successfully all the tests in tests/integration and tests/differences without the need of your changes in testutil/simibc. Can you point me what are the tests that require them ?

@colin-axner
Copy link
Contributor

Can you point me what are the tests that require them ?

I'm not sure which tests require them. I only made the modification after finding that directory when I used grep to look for begin/end block calls. Maybe that package should be updated then to match the ibc-go testing pkg usage.

You would like the fix to be backported to the v7.2.x line?

@tbruyelle
Copy link
Contributor Author

You would like the fix to be backported to the v7.2.x line?

Yes please 🙏

@tbruyelle
Copy link
Contributor Author

Maybe that package should be updated then to match the ibc-go testing pkg usage.

Yes that's also what I think, I'll keep that in mind!

@tbruyelle
Copy link
Contributor Author

I can confirm the patch 668fa1d backported to the 7.2 branch in PR #4505 fixes the issue that initially motivated that PR.

So I can close this PR which doesn't contain the correct fix for the issue #4484.

Thanks again @colin-axner !

cc @MSalopek

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants