-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tighten next bottom up checkpoint height #824
Conversation
eadac3f
to
f5b1ae0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The height expectation is incorrect. See inline review comments for details.
I propose the following:
-
Keep the Fendermint behaviour as-is. It's easier to reason about regular checkpoints being at predictable heights from genesis, that don't depend on the previous checkpoint submission.
-
Move this block up so that we accept any checkpoint with a full message batch.
// if the bottom up messages' length is max, we consider that epoch valid, allow early submission if (checkpoint.msgs.length == s.maxMsgsPerBottomUpBatch) { return; }
-
Then accept non-full checkpoints at
checkpoint.blockHeight % s.bottomUpCheckPeriod == 0
, as long as they're not in the future (already checked). RenamenextCheckpointHeight
tomaxPossibleCheckpointHeight
.
// validate signatures and quorum threshold, revert if validation fails | ||
validateActiveQuorumSignatures({signatories: signatories, hash: checkpointHash, signatures: signatures}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future, #791 and this PR are closely interrelated. The former is adding checks that this PR is now deleting. It would've been simpler to push these changes to the previous PR, to prevent reviewers from reviewing stale logic.
} | ||
|
||
// the expected bottom up checkpoint height, valid height | ||
if (checkpoint.blockHeight == nextCheckpointHeight) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not line up with my understanding Fendermint, which is also not being updated in this PR. So I think this will PR will break checkpointing.
} else if height.value() % gateway.bottom_up_check_period(state)? == 0 { |
I believe Fendermint produces checkpoints every bottomUpCheckPeriod
from genesis. If an early checkpoint was produced due to reaching the msg batch limit, Fendermint will still produce the next checkpoint at the original schedule, but this contract change will expect it at exactly bottomUpCheckPeriod
epochs after the last checkpoint.
Test screenshot you shared with me also confirms my understanding is correct:
Did you verify that the checkpoint at 1500 was accepted by the subnet actor at the parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the confusion comes from this line:
uint256 nextCheckpointHeight = LibGateway.getNextEpoch(lastBottomUpCheckpointHeight, bottomUpCheckPeriod);
This function basically deduces the next checkpoint height, see test cases added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic seems correct, apologies for the false alarm! I'd suggest improving the naming of the method at least (LibGateway.getNextEpoch()
is missing information).
revert BottomUpCheckpointAlreadySubmitted(); | ||
} | ||
|
||
uint256 nextCheckpointHeight = LibGateway.getNextEpoch(lastBottomUpCheckpointHeight, bottomUpCheckPeriod); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so if I understand this right, then in case we had an early submission due to reaching msg size limit then we still continue to do checkpoints at multiples of bottomUpCheckPeriod
after the early submission?
So for example if both bottomUpCheckPeriod
and maxMsgsPerBottomUpBatch is 10, and we had the following checkpoints and msg sizes:
chk 10 and 2 msg
chk 20 and 3 msg
chk 25 and 10msg -> here we had 10 msg already at height 25 causing early submission
chk 30 and 2 msg
...
I s this correctly understood? I may have this wrong though as the screenshot that raul shares does not show that this is the case :S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the above screenshot is aligned with what you described? Just the above screenshot has checkpoint period 60. Am I missing sth?
@cryptoAtwill kindly pointed me to the getNextEpoch calculation which does seem to be doing the right thing. But it'd be simpler to follow if we used modulo calculus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like my assessment was off this time!
One of the key assumptions for bottom up checkpoint are:
With the message batch mechanism, one is allowed to submit at an earlier checkpoint height, e.g. if the last checkpoint height is 10 and the message batch is full at height 12, then bottom up checkpoint submission is allowed at height 12. However, when the above happens, the next expected checkpoint height is not specified clearly. At the same time, past checkpoints are allowed to be submitted but no opt is done (for historically reason). For example, past checkpoints are allowed here, but actually not all past heights are allowed here.
This results in:
The previous PR fixes the issue where checkpoint with a full message batch cannot be submitted, but there are just other hidden checks needs to be updated. This PR is created to: