Subnet relayer attempts spurious checkpoints when behind #1160

Open
3benbox opened this issue Oct 1, 2024 · 2 comments
Labels
bug Something isn't working papercut

Comments

@3benbox

3benbox commented Oct 1, 2024

Issue type

Bug

Have you reproduced the bug with the latest dev version?

Yes

Version

main

Custom code

Yes

OS platform and distribution

No response

Describe the issue

When the subnet-relayer falls behind, it attempts to relay not only the next batch but also any other batches it finds, filling the logs with error messages.

2024-10-01T17:32:21.435352Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 171000: cannot submit bottom up checkpoint at height 171000 due to: Contract call reverted with data: 0x
2024-10-01T17:32:22.123241Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 171600: cannot submit bottom up checkpoint at height 171600 due to: Contract call reverted with data: 0x
2024-10-01T17:32:22.813057Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 172200: cannot submit bottom up checkpoint at height 172200 due to: Contract call reverted with data: 0x
2024-10-01T17:32:23.527525Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 172800: cannot submit bottom up checkpoint at height 172800 due to: Contract call reverted with data: 0x
2024-10-01T17:32:24.206572Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 173400: cannot submit bottom up checkpoint at height 173400 due to: Contract call reverted with data: 0x
2024-10-01T17:32:24.875503Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 174000: cannot submit bottom up checkpoint at height 174000 due to: Contract call reverted with data: 0x
2024-10-01T17:32:25.550911Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 174600: cannot submit bottom up checkpoint at height 174600 due to: Contract call reverted with data: 0x
2024-10-01T17:32:26.211507Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 175200: cannot submit bottom up checkpoint at height 175200 due to: Contract call reverted with data: 0x
2024-10-01T17:32:30.070998Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(73800) in parent at height 2015918



2024-10-01T17:32:45.187891Z  INFO ipc_provider::checkpoint: last submission height: 73800

2024-10-01T17:34:11.485937Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 74400: cannot submit bottom up checkpoint at height 74400 due to: Contract call reverted with data: 0x
2024-10-01T17:34:11.502185Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 75600: cannot submit bottom up checkpoint at height 75600 due to: Contract call reverted with data: 0x
2024-10-01T17:34:11.510439Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 76200: cannot submit bottom up checkpoint at height 76200 due to: Contract call reverted with data: 0x
2024-10-01T17:34:12.182941Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 76800: cannot submit bottom up checkpoint at height 76800 due to: Contract call reverted with data: 0x
2024-10-01T17:34:12.189926Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 77400: cannot submit bottom up checkpoint at height 77400 due to: Contract call reverted with data: 0x
2024-10-01T17:34:12.226879Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 78000: cannot submit bottom up checkpoint at height 78000 due to: Contract call reverted with data: 0x
2024-10-01T17:34:12.872170Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 79200: cannot submit bottom up checkpoint at height 79200 due to: Contract call reverted with data: 0x
2024-10-01T17:34:12.882912Z ERROR ipc_provider::checkpoint: Fail to submit checkpoint at height 78600: cannot submit bottom up checkpoint at height 78600 due to: Contract call reverted with data: 0x

The relayer should only attempt to submit the next sequential checkpoint.

When the relayer is already behind, the time spent on these failed submissions increases the time it takes to catch up.
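A minimal sketch of the expected behaviour, not the actual ipc_provider code. The helper names `next_sequential_height` and `select_next` are hypothetical, and the checkpoint period of 600 used in the usage note is only inferred from the spacing of the heights in the logs above.

```rust
/// Hypothetical helper: the only height the relayer should attempt next,
/// assuming checkpoints are spaced exactly `period` blocks apart.
/// Returns None if the addition would overflow.
fn next_sequential_height(last_submitted: u64, period: u64) -> Option<u64> {
    last_submitted.checked_add(period)
}

/// Hypothetical helper: from all candidate heights the relayer has found,
/// keep only the one that directly follows the last submitted checkpoint.
/// Returns None if that height is not among the candidates.
fn select_next(candidates: &[u64], last_submitted: u64, period: u64) -> Option<u64> {
    let next = next_sequential_height(last_submitted, period)?;
    candidates.iter().copied().find(|&h| h == next)
}
```

With the last submission height of 73800 from the logs and an assumed period of 600, `select_next(&[74400, 75600, 76200], 73800, 600)` would pick only 74400 and leave the later heights for subsequent rounds.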

Repro steps

Let a validator run out of funds, or stop it for a while.

Relevant log output

No response

@3benbox 3benbox added the bug Something isn't working label Oct 1, 2024
@raulk
Contributor

raulk commented Oct 2, 2024

The IPC relayer is designed to submit checkpoints in parallel when it detects a backlog of unsubmitted checkpoints. This accelerates the catch-up. The feature was introduced in #840. The parallelism level can be adjusted with the --max-parallelism flag.

This was known to work in the past. It works because ethers-rs should use eth_estimateGas with the pending block, which instructs the API to execute all pending messages from the same sender prior to this one. I think this behaviour might have been disabled in recent versions of Lotus, or the endpoint you're using might have turned it off, since it places additional computation load on the node.
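As a rough illustration of the bounded parallelism described above, the sketch below picks which backlog heights would be attempted in one round. The function `next_batch` is hypothetical and is not the relayer's real implementation. It only assumes that the backlog is an ordered list of checkpoint heights and that the limit corresponds to the value passed via `--max-parallelism`.

```rust
/// Hypothetical helper: choose the heights to submit concurrently in one
/// round. `pending` is assumed to be sorted in ascending order, and
/// `max_parallelism` is the configured limit (values below 1 are treated as 1).
fn next_batch(pending: &[u64], max_parallelism: usize) -> Vec<u64> {
    let limit = max_parallelism.max(1);
    pending.iter().copied().take(limit).collect()
}
```

With a limit of 1, each round contains only the oldest pending height, so submissions become strictly sequential.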

@raulk
Contributor

raulk commented Oct 2, 2024

Workaround today: set --max-parallelism 1.

@raulk raulk added the papercut label Nov 27, 2024
Projects
Status: Backlog