Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subnet relayer stopped making checkpoints #1227

Open
sam701 opened this issue Dec 4, 2024 · 5 comments
Open

Subnet relayer stopped making checkpoints #1227

sam701 opened this issue Dec 4, 2024 · 5 comments

Comments

@sam701
Copy link
Contributor

sam701 commented Dec 4, 2024

I observed today at 8am UTC that our relayer stopped making checkpoints without any errors in logs. Here is the latest log snippet (--checkpoint-interval-sec 600):

2024-12-04T05:58:31.340556Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2052600, hash: 1d284a4321db7a84d2e0e7c287f289f38e6ec4ee262e6761d6d273069c1acd32 }
2024-12-04T05:59:30.558226Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2053200) in parent at height 2198852
2024-12-04T05:59:30.558248Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2053200, hash: 8eb2a0b1b74d1ccd044af6efce8d527888be692b50d81039c917cff18c2402d1 }
2024-12-04T06:09:30.960764Z  INFO ipc_provider::checkpoint: last submission height: 2053200
2024-12-04T06:10:00.518882Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2053800) in parent at height 2198873
2024-12-04T06:10:00.518902Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2053800, hash: ef653773d23d0d174ba6054dd516e58d90cdd416277a19c6414eef0cd850f082 }
2024-12-04T06:20:00.892893Z  INFO ipc_provider::checkpoint: last submission height: 2053800
2024-12-04T06:21:00.921719Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2054400) in parent at height 2198895
2024-12-04T06:21:00.921742Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2054400, hash: 5ecbeb0caf9ebc7f39f2041888e3739ec7e71b9c57bfd29d3a34ec6fcb14db39 }
2024-12-04T06:22:01.181595Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2055000) in parent at height 2198897
2024-12-04T06:22:01.181618Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2055000, hash: 7db07bd701c162c3543c72c44066cbb7344b8c417152e4e437b585638f854c8b }
2024-12-04T06:32:01.600061Z  INFO ipc_provider::checkpoint: last submission height: 2055000
2024-12-04T06:33:00.324699Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2055600) in parent at height 2198919
2024-12-04T06:33:00.324721Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2055600, hash: 6b4612908c7b54d12baf26b62b643978b001a2e0b843fbaf6d3a9dfef5374d97 }
2024-12-04T06:34:00.452005Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2056200) in parent at height 2198921
2024-12-04T06:34:00.452029Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2056200, hash: df1ccdef8665da197fcc41f3c66fe1efe9f971d196b20c7043a986763526f9b0 }
2024-12-04T06:44:00.878009Z  INFO ipc_provider::checkpoint: last submission height: 2056200
2024-12-04T06:45:00.964538Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2056800) in parent at height 2198943
2024-12-04T06:45:00.964562Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2056800, hash: 08504b5db70e007ee754c821a546d6390e878fb7cda0aea99c76dbc45029618b }
2024-12-04T06:45:30.963911Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2057400) in parent at height 2198944
2024-12-04T06:45:30.963936Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2057400, hash: a7f91a038a6cb840eb5933519722027599db6185e88c0232f0d6b624c0ddac6a }
2024-12-04T06:55:31.336437Z  INFO ipc_provider::checkpoint: last submission height: 2057400
2024-12-04T06:56:31.366458Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2058000) in parent at height 2198966
2024-12-04T06:56:31.366481Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2058000, hash: cd7faa1e9dd13365ee3abc02f64b696b57aab0d134deb9baaf39f7e6e65fbbe9 }
2024-12-04T07:06:31.737210Z  INFO ipc_provider::checkpoint: last submission height: 2058000
2024-12-04T07:27:00.636927Z  INFO ipc_provider::checkpoint: submitted bottom up checkpoint(2058600) in parent at height 2199027
2024-12-04T07:27:00.636950Z  INFO tracing_event: domain="Bottomup" event=CheckpointSubmitted { height: 2058600, hash: 7fe714c5c6e541351fe834354d197e31f49b9abfe818dc8514dd5a5281730984 }

I restarted the relay with RUST_LOG=debug. Here is the part where it starts hanging:

2024-12-04T08:49:14.016532Z DEBUG hyper::client::pool: reuse idle connection for ("http", xxxxxxx:8545)
2024-12-04T08:49:14.016587Z DEBUG hyper::proto::h1::io: flushed 350 bytes
2024-12-04T08:49:14.017576Z DEBUG hyper::proto::h1::io: parsed 7 headers
2024-12-04T08:49:14.017584Z DEBUG hyper::proto::h1::conn: incoming body is content-length (36 bytes)
2024-12-04T08:49:14.017597Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-12-04T08:49:14.017608Z DEBUG hyper::client::pool: pooling idle connection for ("http", xxxxxxx:8545)
2024-12-04T08:49:14.017636Z DEBUG ipc_provider::checkpoint: no reached events at height : 2061603
2024-12-04T08:49:14.017693Z DEBUG hyper::client::pool: reuse idle connection for ("http", xxxxxxx:8545)
2024-12-04T08:49:14.017757Z DEBUG hyper::proto::h1::io: flushed 350 bytes
2024-12-04T08:49:14.019656Z DEBUG hyper::proto::h1::io: parsed 7 headers
2024-12-04T08:49:14.019666Z DEBUG hyper::proto::h1::conn: incoming body is content-length (719 bytes)
2024-12-04T08:49:14.019679Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-12-04T08:49:14.019691Z DEBUG hyper::client::pool: pooling idle connection for ("http", xxxxxxx:8545)
2024-12-04T08:49:14.019756Z DEBUG ipc_provider::checkpoint: found reached events at height : 2061604
2024-12-04T08:49:14.020018Z DEBUG hyper::client::pool: reuse idle connection for ("http", xxxxxxx:8545)
2024-12-04T08:49:14.020078Z DEBUG hyper::proto::h1::io: flushed 341 bytes
2024-12-04T08:49:14.027803Z DEBUG hyper::proto::h1::io: parsed 7 headers
2024-12-04T08:49:14.027813Z DEBUG hyper::proto::h1::conn: incoming body is content-length (1702 bytes)
2024-12-04T08:49:14.027827Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-12-04T08:49:14.027841Z DEBUG hyper::client::pool: pooling idle connection for ("http", xxxxxxx:8545)
2024-12-04T08:49:14.027897Z DEBUG ipc_api: raw evm subnet addr: 0xdb89650aa010018195bcaf868a2f289135e13cae
2024-12-04T08:49:14.027912Z DEBUG ipc_api: raw evm subnet addr: 0xd7ab017914762fc028ee922e549e13374f7b5a29
2024-12-04T08:49:14.027941Z DEBUG ipc_provider::checkpoint: bottom up bundle: BottomUpCheckpointBundle { checkpoint: BottomUpCheckpoint { subnet_id: SubnetID { root: 314159, children: [Address("t410f3oewkcvacaaydfn4v6diulzise26cpfolfj7heq")] }, block_height: 2061600, block_hash: [114
, 7, 163, 123, 155, 63, 146, 78, 235, 31, 134, 83, 206, 109, 206, 9, 203, 220, 79, 188, 227, 221, 21, 14, 248, 84, 83, 73, 187, 174, 201, 119], next_configuration_number: 0, msgs: [] }, signatures: [[160, 171, 26, 111, 177, 131, 64, 215, 58, 253, 46, 213, 192, 24, 235, 126, 250, 33,
246, 176, 42, 72, 153, 140, 112, 178, 205, 105, 159, 22, 246, 130, 29, 209, 96, 248, 161, 133, 170, 191, 149, 195, 188, 223, 132, 28, 200, 43, 250, 163, 67, 114, 24, 11, 113, 190, 157, 81, 127, 228, 41, 38, 185, 103, 28]], signatories: [Address("t410f26vqc6iuoyx4akhosixfjhqtg5hxwwrjm
zjiwxq")] }
2024-12-04T08:49:14.126659Z DEBUG hyper::proto::h1::io: parsed 8 headers
2024-12-04T08:49:14.126676Z DEBUG hyper::proto::h1::conn: incoming body is content-length (103 bytes)
2024-12-04T08:49:14.126689Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-12-04T08:49:14.126724Z DEBUG hyper::client::pool: pooling idle connection for ("https", calibration.node.glif.io)
2024-12-04T08:49:15.126948Z DEBUG ethers_providers::toolbox::pending_transaction: Starting to poll pending tx 0x2300f31baf5f43258797bf738076d3be76bf34b0b2324d53d7bde60e0e609829
2024-12-04T08:49:15.127020Z DEBUG hyper::client::pool: reuse idle connection for ("https", calibration.node.glif.io)
2024-12-04T08:49:15.127153Z DEBUG hyper::proto::h1::io: flushed 345 bytes
2024-12-04T08:49:15.246314Z DEBUG hyper::proto::h1::io: parsed 8 headers
2024-12-04T08:49:15.246337Z DEBUG hyper::proto::h1::conn: incoming body is content-length (1952 bytes)
2024-12-04T08:49:15.246353Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-12-04T08:49:15.246388Z DEBUG hyper::client::pool: pooling idle connection for ("https", calibration.node.glif.io)
2024-12-04T08:49:16.246651Z DEBUG hyper::client::pool: reuse idle connection for ("https", calibration.node.glif.io)
2024-12-04T08:49:16.246773Z DEBUG hyper::proto::h1::io: flushed 345 bytes
2024-12-04T08:49:16.365895Z DEBUG hyper::proto::h1::io: parsed 8 headers
2024-12-04T08:49:16.365914Z DEBUG hyper::proto::h1::conn: incoming body is content-length (1952 bytes)
2024-12-04T08:49:16.365928Z DEBUG hyper::proto::h1::conn: incoming body completed

I started another relayer on another host with a different address. It started catching up but at some point I saw the same pooling idle connection for ("https", calibration.node.glif.io) as above without making progress.

@sam701
Copy link
Contributor Author

sam701 commented Dec 4, 2024

Two hours ago the relay started picking up and currently running smoothly. The issue disappeared without an explicit fix. Closing for now.

@sam701 sam701 closed this as completed Dec 4, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in IPC Core Development Dec 4, 2024
@sam701
Copy link
Contributor Author

sam701 commented Dec 16, 2024

We experienced the same situation again.

Image

I couldn't run relayer from any machine and any wallet. But then it progressed and caught up over the weekend without an explicit fix.

cc @karlem @raulk

@stbrody
Copy link

stbrody commented Dec 16, 2024

Would like to re-open this issue. Even though it eventually resolves, having hours/days during which the relayer is hung and we can't bring tokens down into the subnet from the parent chain is still a problem.

@raulk raulk reopened this Dec 18, 2024
@github-project-automation github-project-automation bot moved this from Done to Ready in IPC Core Development Dec 18, 2024
@raulk
Copy link
Contributor

raulk commented Dec 18, 2024

@stbrody can you supply the subnet ID and the relayer's account address so we can investigate further? Thanks!

@sam701
Copy link
Contributor Author

sam701 commented Dec 18, 2024

@raulk Here are the data:

  • subnet ID: /r314159/t410f2mjuvgks4vlwbmtfmwohbaqo5v2w5fbcr5zf7li
  • relayer's address: 0x5B9E0328f97f6C4cfCBbA3eD03114170b3645e45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready
Development

No branches or pull requests

3 participants