Downloader sync mode changed mid-run #29817

rjl493456442 · 2024-05-22T04:02:56Z

Recently i have observed some error logs in our benchmark infra, saying Downloader sync mode changed mid-run.
And even, in the latest snap sync, bad block error occurs

WARN [05-21|16:40:03.161] Unexpected trienode heal packet          peer=c847ea3a reqid=1,847,216,001,477,791,397
ERROR[05-21|16:40:03.171] Beacon backfilling failed                err="syncing canceled (requested)"
WARN [05-21|16:40:03.177] Retrieved pivot header from local        number=19,919,467 hash=694934..f919d8 latest=19,919,531 oldest=19,919,529
INFO [05-21|16:40:03.177] Forkchoice requested sync to new head    number=19,919,531 hash=235392..2ef6b4 finalized=unknown
WARN [05-21|16:40:03.190] Failed to load old bad blocks            error="pebble: not found"
ERROR[05-21|16:40:03.196] 
########## BAD BLOCK #########
Error: could not apply tx 0 [0x9521c089bbcb8808db31da1d39338cc4177ef92e325f0dee7408cb7419c24c67]: nonce too high: address 0x22B0351D445840dB59B97df3808dd642DCB17e96, tx: 1105 state: 0
Platform: geth (devel) go1.22.3 amd64 linux
VCS: 639900af-20240518
Chain config: &params.ChainConfig{ChainID:1, HomesteadBlock:1150000, DAOForkBlock:1920000, DAOForkSupport:true, EIP150Block:2463000, EIP155Block:2675000, EIP158Block:2675000, ByzantiumBlock:4370000, ConstantinopleBlock:7280000, PetersburgBlock:7280000, IstanbulBlock:9069000, MuirGlacierBlock:9200000, BerlinBlock:12244000, LondonBlock:12965000, ArrowGlacierBlock:13773000, GrayGlacierBlock:15050000, MergeNetsplitBlock:<nil>, ShanghaiTime:(*uint64)(0xc0003ed5c8), CancunTime:(*uint64)(0xc0003ed5d0), PragueTime:(*uint64)(nil), VerkleTime:(*uint64)(nil), TerminalTotalDifficulty:58750000000000000000000, TerminalTotalDifficultyPassed:true, Ethash:(*params.EthashConfig)(0x2eb3da0), Clique:(*params.CliqueConfig)(nil)}
Receipts: 
##############################

After some preliminary investigations, here are some important events ordered by timeline:

Initial state sync completes at block 19,919,465

INFO [05-21|16:39:48.215] Committed new head block number=19,919,465 hash=481b69..7f2734

Remaining blocks after 19,919,465 are inserted one by one
A new block announcement received and it's gapped with local beacon tip

A new sync cycle is scheduled with snap mode

INFO [05-21|16:39:52.214] New head block                           number=19,919,530 hash=e4e51e..b852e8 beaconRoot=52355f..a5e8b4
WARN [05-21|16:39:52.224] Ignoring payload while snap syncing      number=19,919,530 hash=e4e51e..b852e8 reason="chain gapped, head: 19919528, newHead: 19919530"
INFO [05-21|16:39:52.225] Forkchoice requested sync to new head    number=19,919,530 hash=e4e51e..b852e8 finalized=unknown
INFO [05-21|16:39:52.225] Restarting sync cycle                    reason="chain gapped, head: 19919528, newHead: 19919530"

Remaining blocks of last sync cycle are fully inserted, with 19,919,529 as chain tip

May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.918] Imported new chain segment               number=19,919,529 hash=fa4f2c..c4a659 blocks=23         txs=4505 mgas=338.444 elapsed=4.648s      mgasps=72.805 snapdiffs=359.45KiB triediffs=100.91MiB triedirty=0.00B
May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.918] Syncing: chain download in progress      synced=100.00% chain=753.90GiB headers=19,919,[email protected] bodies=19,919,[email protected] receipts=19,919,[email protected] eta=-11.071s
May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.918] Snap sync complete, auto disabling

A new sync cycle is restarted with "snap mode" (the mode was assigned before the completion of last sync cycle)

State healer removes the inconsistent root trie node and seeks for the state corresponding to block 19,919,530

May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.919] Disabled trie database due to state sync
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:00.941] Aborting state snapshot generation       root=3dc301..5a1fab in=0042a5..807326 at=dab444..649ebe accounts=249,416              slots=1,914,524               storage=149.99MiB dangling=0 elapsed=12.725s     eta=3h28m21.072s
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:00.948] Retrieved pivot header from local        number=19,919,466 hash=7f30be..0df48d latest=19,919,530 oldest=19,919,529
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:00.949] Disabling direct-ancient mode            origin=19,919,465 ancient=19,918,621
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:00.950] Retrieved beacon headers from local      from=19,919,466 count=63
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:00.951] Syncing: state healing in progress       [email protected]        slots=119,[email protected]         [email protected]       nodes=929,[email protected] pending=1
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:01.027] Ignoring already known beacon payload    number=19,919,529 hash=fa4f2c..c4a659 age=26s
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:01.140] Syncing: state healing in progress       [email protected]        slots=119,[email protected]         [email protected]       nodes=929,[email protected] pending=475
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:01.152] Unexpected trienode heal packet          peer=5b1055ce reqid=6,030,026,841,057,844,391
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:01.922] Indexing transactions                    blocks=2742       txs=477,046 tail=19,916,788 total=2,350,000  elapsed=1.003s
May 22 00:40:02 bench06.ethdevops.io factor INFO [05-21|16:40:02.424] New head block                           number=19,919,531 hash=235392..2ef6b4 beaconRoot=e794fb..6e2916
May 22 00:40:02 bench06.ethdevops.io geth ERROR[05-21|16:40:02.431] Downloader sync mode changed mid-run     old=full new=full
May 22 00:40:03 bench06.ethdevops.io geth INFO [05-21|16:40:03.152] Syncing: state healing in progress       [email protected]        slots=119,[email protected]         [email protected]       nodes=931,[email protected] pending=1152

A new payload event for block 19,919,530 is executed in the mean time, result in the bad block due to state loss

The text was updated successfully, but these errors were encountered:

rjl493456442 · 2024-05-22T04:06:23Z

Conclusion I got until now:

Sync mode is changed in the middle of sync process because of data race

The mode of the last sync cycle was assigned before the completion of the prior cycle, and the desired sync mode was set to full when the prior cycle completes. Namely snap sync is running while the desired sync mode has been changed to full. Any consensus event received during this time window could trigger this error log.

Block insertion may run with snap sync simultaneously

It could lead to bad block or potentially state corruption.

smin-k · 2024-08-05T13:07:04Z

I am interested in this issue.

Was the change from "snap" to "full" sync automatic, or was it initiated by a client command?

Could restricting mode changes during synchronization, as managed by the setMode function, be a potential solution to this problem?

rjl493456442 added the type:bug label May 22, 2024

fjl added status:triage and removed status:triage labels Jun 6, 2024

smin-k mentioned this issue Jul 25, 2024

eth/downloader: correct sync mode logging to show old mode #30219

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Downloader sync mode changed mid-run #29817

Downloader sync mode changed mid-run #29817

rjl493456442 commented May 22, 2024

rjl493456442 commented May 22, 2024 •

edited

Loading

smin-k commented Aug 5, 2024

Downloader sync mode changed mid-run #29817

Downloader sync mode changed mid-run #29817

Comments

rjl493456442 commented May 22, 2024

rjl493456442 commented May 22, 2024 • edited Loading

smin-k commented Aug 5, 2024

rjl493456442 commented May 22, 2024 •

edited

Loading