Tighten next bottom up checkpoint height #824

cryptoAtwill · 2024-03-19T05:37:40Z

The key assumptions for bottom up checkpoints are:

Bottom up checkpoints are submitted at a fixed interval, i.e. at multiples of the bottom up checkpoint period.
Bottom up checkpoint heights are sequential: if the checkpoint period is 10, the heights are 10, 20, 30, ...

With the message batch mechanism, a checkpoint may be submitted at an earlier height, e.g. if the last checkpoint height is 10 and the message batch is full at height 12, then a bottom up checkpoint submission is allowed at height 12. However, when this happens, the next expected checkpoint height is not clearly specified. At the same time, past checkpoints are allowed to be submitted but no operation is performed (for historical reasons). For example, past checkpoints are allowed here, but actually not all past heights are allowed here.

This results in:

Some unit tests pass even though they are not fully correct; some were corrected in the previous PR.
Relayers submit past checkpoints for which nothing is done, wasting gas.

The previous PR fixes the issue where a checkpoint with a full message batch cannot be submitted, but there are other hidden checks that need to be updated. This PR is created to:

Put all the checkpoint height checks in one place.
Enforce that checkpoints are submitted at multiples of the checkpoint period, and treat a full message batch as just an early submission. That means that even when a checkpoint with a full message batch is submitted, the next expected checkpoint is still at the next multiple of the checkpoint period.
Prevent an already submitted checkpoint height from being submitted again.
Add more unit tests to simulate potential real world situations.
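
For readers following along, here is a rough sketch of how the height rules listed above might fit together in one place. It is written in Rust rather than Solidity purely for illustration; `check_height`, `next_checkpoint_height`, and the `HeightError` variants are hypothetical names, and the order of the checks is an assumption based on the description in this PR, not a copy of the contract code.

```rust
/// Why a submitted bottom up checkpoint height was rejected.
/// Variant names are hypothetical; `AlreadySubmitted` loosely mirrors the
/// `BottomUpCheckpointAlreadySubmitted` revert quoted in this PR.
#[derive(Debug, PartialEq, Eq)]
enum HeightError {
    AlreadySubmitted,
    InFuture,
    NotAtPeriodBoundary,
}

/// Smallest multiple of `period` strictly greater than `last_height`.
/// Assumes `period > 0`.
fn next_checkpoint_height(last_height: u64, period: u64) -> u64 {
    (last_height / period + 1) * period
}

/// Applies all height checks in one place (assumed order of checks).
fn check_height(
    height: u64,
    msg_count: usize,
    last_height: u64,
    period: u64,
    max_msgs: usize,
) -> Result<(), HeightError> {
    // A height at or below the last accepted checkpoint is a resubmission.
    if height <= last_height {
        return Err(HeightError::AlreadySubmitted);
    }
    let next = next_checkpoint_height(last_height, period);
    // Nothing beyond the next period boundary can be accepted yet.
    if height > next {
        return Err(HeightError::InFuture);
    }
    // The regular case: exactly on the next multiple of the period.
    if height == next {
        return Ok(());
    }
    // Early submission is allowed only when the message batch is full.
    if msg_count == max_msgs {
        return Ok(());
    }
    Err(HeightError::NotAtPeriodBoundary)
}
```

With a period of 10 and a batch limit of 10, a full checkpoint at height 12 after height 10 is accepted under this sketch, and the next regular checkpoint is still expected at height 20.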

contracts/src/gateway/router/CheckpointingFacet.sol

raulk

The height expectation is incorrect. See inline review comments for details.

I propose the following:

Keep the Fendermint behaviour as-is. It's easier to reason about regular checkpoints being at predictable heights from genesis, that don't depend on the previous checkpoint submission.

Move this block up so that we accept any checkpoint with a full message batch.

        // if the bottom up messages' length is max, we consider that epoch valid, allow early submission
        if (checkpoint.msgs.length == s.maxMsgsPerBottomUpBatch) {
            return;
        }

Then accept non-full checkpoints at checkpoint.blockHeight % s.bottomUpCheckPeriod == 0, as long as they're not in the future (already checked). Rename nextCheckpointHeight to maxPossibleCheckpointHeight.
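
To make the proposed ordering concrete, here is a minimal sketch in Rust (for illustration only, not the Solidity code). `accepts_proposed` is a hypothetical name, and `max_possible_height` is passed in as a plain parameter because how it would be computed is not spelled out in this comment. The sketch covers only the three conditions mentioned above and deliberately omits any already-submitted check.

```rust
/// Sketch of the proposed acceptance rule (hypothetical function name).
/// `max_possible_height` stands in for the renamed `nextCheckpointHeight`;
/// its computation is an assumption left to the caller.
fn accepts_proposed(
    height: u64,
    msg_count: usize,
    period: u64,
    max_msgs: usize,
    max_possible_height: u64,
) -> bool {
    // The "not in the future" check is assumed to have run first.
    if height > max_possible_height {
        return false;
    }
    // Any checkpoint with a full message batch is accepted.
    if msg_count == max_msgs {
        return true;
    }
    // Non-full checkpoints must sit on a multiple of the period.
    height % period == 0
}
```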

raulk · 2024-03-19T20:07:56Z

contracts/src/subnet/SubnetActorCheckpointingFacet.sol

+        // validate signatures and quorum threshold, revert if validation fails
+        validateActiveQuorumSignatures({signatories: signatories, hash: checkpointHash, signatures: signatures});


For the future, #791 and this PR are closely interrelated. The former is adding checks that this PR is now deleting. It would've been simpler to push these changes to the previous PR, to prevent reviewers from reviewing stale logic.

raulk · 2024-03-19T20:22:26Z

contracts/src/subnet/SubnetActorCheckpointingFacet.sol

+        }
+
+        // the expected bottom up checkpoint height, valid height
+        if (checkpoint.blockHeight == nextCheckpointHeight) {


This does not line up with my understanding of Fendermint, which is also not being updated in this PR. So I think this PR will break checkpointing.

ipc/fendermint/vm/interpreter/src/fvm/checkpoint.rs

Line 328 in 9509c75

} else if height.value() % gateway.bottom_up_check_period(state)? == 0 {

I believe Fendermint produces checkpoints every bottomUpCheckPeriod from genesis. If an early checkpoint was produced due to reaching the msg batch limit, Fendermint will still produce the next checkpoint at the original schedule, but this contract change will expect it at exactly bottomUpCheckPeriod epochs after the last checkpoint.

The test screenshot you shared with me also confirms my understanding:

Did you verify that the checkpoint at 1500 was accepted by the subnet actor at the parent?

I think the confusion comes from this line:

uint256 nextCheckpointHeight = LibGateway.getNextEpoch(lastBottomUpCheckpointHeight, bottomUpCheckPeriod);

This function basically deduces the next checkpoint height; see the test cases added.

Logic seems correct, apologies for the false alarm! I'd suggest improving the naming of the method at least (LibGateway.getNextEpoch() is missing information).

fridrik01 · 2024-03-19T20:54:26Z

contracts/src/subnet/SubnetActorCheckpointingFacet.sol

+            revert BottomUpCheckpointAlreadySubmitted();
+        }
+
+        uint256 nextCheckpointHeight = LibGateway.getNextEpoch(lastBottomUpCheckpointHeight, bottomUpCheckPeriod);


ok so if I understand this right, then in case we had an early submission due to reaching msg size limit then we still continue to do checkpoints at multiples of bottomUpCheckPeriod after the early submission?

So for example if both bottomUpCheckPeriod and maxMsgsPerBottomUpBatch are 10, and we had the following checkpoints and msg counts:

chk 10 and 2 msg
chk 20 and 3 msg
chk 25 and 10 msg -> here we had 10 msg already at height 25 causing early submission
chk 30 and 2 msg
...

Is this correctly understood? I may have this wrong though, as the screenshot that raul shared does not show that this is the case :S

I thought the above screenshot was aligned with what you described? It just has a checkpoint period of 60. Am I missing something?
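
The example sequence above (checkpoints at 10, 20, 25, 30) can be reproduced with a small producer-side simulation. This is a simplified model in Rust, not Fendermint's actual code: `produce_checkpoints` is a hypothetical name, and the rule "cut a checkpoint when the batch is full, otherwise at every multiple of the period" is an assumption drawn from the discussion in this thread.

```rust
/// Simplified model of checkpoint production (hypothetical, for illustration).
/// `msgs_per_block[i]` is the number of bottom up messages arriving at
/// height `i + 1`. Returns `(height, message count)` for each checkpoint.
fn produce_checkpoints(
    msgs_per_block: &[usize],
    period: u64,
    max_msgs: usize,
) -> Vec<(u64, usize)> {
    let mut pending = 0usize;
    let mut out = Vec::new();
    for (i, &n) in msgs_per_block.iter().enumerate() {
        let height = i as u64 + 1;
        pending += n;
        if pending >= max_msgs {
            // Batch is full: emit an early checkpoint carrying a full batch.
            out.push((height, max_msgs));
            pending -= max_msgs;
        } else if height % period == 0 {
            // Regular schedule: multiples of the period, counted from genesis.
            out.push((height, pending));
            pending = 0;
        }
    }
    out
}
```

Feeding it 2 messages before height 10, 3 before height 20, 10 spread over heights 21 to 25, and 2 more before height 30 yields checkpoints at 10, 20, 25 and 30 in this model, i.e. the early submission at 25 does not shift the regular schedule.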

raulk · 2024-03-20T00:17:01Z

@cryptoAtwill kindly pointed me to the getNextEpoch calculation which does seem to be doing the right thing. But it'd be simpler to follow if we used modulo calculus.
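
As an illustration of what a modulo-based formulation could look like, here is a small Rust sketch. The body of `LibGateway.getNextEpoch` is not shown in this thread, so `next_epoch_div` is only an assumption about its behaviour (the next multiple of the period strictly after the last checkpoint height); both function names are hypothetical.

```rust
/// Assumed division-based form: next multiple of `period` after `last_height`.
/// Assumes `period > 0`.
fn next_epoch_div(last_height: u64, period: u64) -> u64 {
    (last_height / period + 1) * period
}

/// Modulo-based form: round `last_height` down to the period boundary,
/// then step forward one period. Assumes `period > 0`.
fn next_epoch_mod(last_height: u64, period: u64) -> u64 {
    last_height - last_height % period + period
}
```

Both forms agree for every input with a non-zero period. Note that neither is the same as `last_height + period`: after an early checkpoint at height 25 with a period of 10, both give 30 rather than 35.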

raulk

Looks like my assessment was off this time!

more fixes and tests

f5b1ae0

cryptoAtwill requested review from raulk and aakoshh March 19, 2024 05:50

cryptoAtwill force-pushed the more-fix-bu-msg-batch branch from eadac3f to f5b1ae0 Compare March 19, 2024 10:14

remove check when creating bu cp

a4cf6a2

raulk added the Fluence label Mar 19, 2024

raulk requested changes Mar 19, 2024

fridrik01 reviewed Mar 19, 2024

raulk approved these changes Mar 20, 2024

raulk merged commit 53b07ea into fix-bu-msg-batch-full Mar 20, 2024
22 checks passed

raulk deleted the more-fix-bu-msg-batch branch March 20, 2024 01:45
