Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tighten next bottom up checkpoint height #824

Merged
merged 2 commits into from
Mar 20, 2024

Conversation

cryptoAtwill
Copy link
Contributor

@cryptoAtwill cryptoAtwill commented Mar 19, 2024

One of the key assumptions for bottom up checkpoint are:

  1. Bottom up checkpoints are submitted at a fixed interval, i.e. multiples of bottom up checkpoint period
  2. Bottom up checkpoint heights are sequential, say the checkpoint period is 10, then heights are 10, 20, 30, ...

With the message batch mechanism, one is allowed to submit at an earlier checkpoint height, e.g. if the last checkpoint height is 10 and the message batch is full at height 12, then bottom up checkpoint submission is allowed at height 12. However, when the above happens, the next expected checkpoint height is not specified clearly. At the same time, past checkpoints are allowed to be submitted but no opt is done (for historically reason). For example, past checkpoints are allowed here, but actually not all past heights are allowed here.

This results in:

  1. Some unit tests are passing, but not it's actually not fully correct, some are corrected in the previous PR.
  2. Relayer submitting past checkpoint which nothing is done and wasting gas.

The previous PR fixes the issue where checkpoint with a full message batch cannot be submitted, but there are just other hidden checks needs to be updated. This PR is created to:

  1. Put all the checkpoint height checks in one place.
  2. Enforce checkpoint periods are submitted at multiples of checkpoint period and a full message batch is just early submission. That means even with a full message batch checkpoint is submitted, the next expected checkpoint is still the multiple of checkpoint period.
  3. Prevent already submitted checkpoint height to be submitted again.
  4. More unit tests to simulate potential real world situations.

@cryptoAtwill cryptoAtwill requested review from raulk and aakoshh March 19, 2024 05:50
@cryptoAtwill cryptoAtwill force-pushed the more-fix-bu-msg-batch branch from eadac3f to f5b1ae0 Compare March 19, 2024 10:14
@raulk raulk added the Fluence label Mar 19, 2024
Copy link
Contributor

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The height expectation is incorrect. See inline review comments for details.

I propose the following:

  1. Keep the Fendermint behaviour as-is. It's easier to reason about regular checkpoints being at predictable heights from genesis, that don't depend on the previous checkpoint submission.

  2. Move this block up so that we accept any checkpoint with a full message batch.

            // if the bottom up messages' length is max, we consider that epoch valid, allow early submission
            if (checkpoint.msgs.length == s.maxMsgsPerBottomUpBatch) {
                return;
            }
  3. Then accept non-full checkpoints at checkpoint.blockHeight % s.bottomUpCheckPeriod == 0, as long as they're not in the future (already checked). Rename nextCheckpointHeight to maxPossibleCheckpointHeight.

Comment on lines +36 to +37
// validate signatures and quorum threshold, revert if validation fails
validateActiveQuorumSignatures({signatories: signatories, hash: checkpointHash, signatures: signatures});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future, #791 and this PR are closely interrelated. The former is adding checks that this PR is now deleting. It would've been simpler to push these changes to the previous PR, to prevent reviewers from reviewing stale logic.

}

// the expected bottom up checkpoint height, valid height
if (checkpoint.blockHeight == nextCheckpointHeight) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not line up with my understanding Fendermint, which is also not being updated in this PR. So I think this will PR will break checkpointing.

} else if height.value() % gateway.bottom_up_check_period(state)? == 0 {

I believe Fendermint produces checkpoints every bottomUpCheckPeriod from genesis. If an early checkpoint was produced due to reaching the msg batch limit, Fendermint will still produce the next checkpoint at the original schedule, but this contract change will expect it at exactly bottomUpCheckPeriod epochs after the last checkpoint.


Test screenshot you shared with me also confirms my understanding is correct:

image

Did you verify that the checkpoint at 1500 was accepted by the subnet actor at the parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the confusion comes from this line:

uint256 nextCheckpointHeight = LibGateway.getNextEpoch(lastBottomUpCheckpointHeight, bottomUpCheckPeriod);

This function basically deduces the next checkpoint height, see test cases added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic seems correct, apologies for the false alarm! I'd suggest improving the naming of the method at least (LibGateway.getNextEpoch() is missing information).

revert BottomUpCheckpointAlreadySubmitted();
}

uint256 nextCheckpointHeight = LibGateway.getNextEpoch(lastBottomUpCheckpointHeight, bottomUpCheckPeriod);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so if I understand this right, then in case we had an early submission due to reaching msg size limit then we still continue to do checkpoints at multiples of bottomUpCheckPeriod after the early submission?

So for example if both bottomUpCheckPeriod and maxMsgsPerBottomUpBatch is 10, and we had the following checkpoints and msg sizes:

chk 10 and 2 msg
chk 20 and 3 msg
chk 25 and 10msg -> here we had 10 msg already at height 25 causing early submission
chk 30 and 2 msg
...

I s this correctly understood? I may have this wrong though as the screenshot that raul shares does not show that this is the case :S

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the above screenshot is aligned with what you described? Just the above screenshot has checkpoint period 60. Am I missing sth?

@raulk
Copy link
Contributor

raulk commented Mar 20, 2024

@cryptoAtwill kindly pointed me to the getNextEpoch calculation which does seem to be doing the right thing. But it'd be simpler to follow if we used modulo calculus.

Copy link
Contributor

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like my assessment was off this time!

@raulk raulk merged commit 53b07ea into fix-bu-msg-batch-full Mar 20, 2024
22 checks passed
@raulk raulk deleted the more-fix-bu-msg-batch branch March 20, 2024 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants