Ethereum's Prysm client suffered a mainnet incident that caused resource exhaustion and large numbers of missed blocks and attestations.

According to Mars Finance, the Prysm team has published a post-mortem of the mainnet incident. On December 4, soon after the Fusaka upgrade on Ethereum mainnet, almost all Prysm beacon nodes exhausted their resources while processing certain attestations. As a result, they could not respond to validator requests in time, and large numbers of blocks and attestations were missed. The incident spanned epochs 411439 to 411480 (42 epochs, or 1,344 slots), during which 248 blocks were missed, a miss rate of about 18.5%. Network participation fell as low as 75%, and validators lost about 382 ETH in attestation rewards.
The root cause was that Prysm received attestations from nodes that were likely out of sync with the rest of the network. These attestations referenced a block root from the previous epoch. To validate them, Prysm repeatedly replayed the old epoch's state and ran costly epoch transitions, and under high concurrency this exhausted node resources. The defect was introduced in Prysm PR 15965, which had been running on testnet for about a month without triggering this scenario.
As a temporary workaround, the team advised running version 7.0 with the `--disable-last-epoch-target` flag enabled. The subsequent releases 7.1 and 7.1.0 contain the long-term fix, which verifies these attestations against the head state instead of repeatedly replaying historical states. According to Prysm, the problem gradually subsided after 4:45 UTC on December 4, and network participation recovered to over 95% by epoch 411480.
The Prysm team said the incident underscores the importance of client diversity: if a single client accounts for more than one-third of the network, a bug in it can temporarily prevent the chain from finalizing, and if it exceeds two-thirds, it risks finalizing an invalid chain.
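The incident figures and the client-diversity thresholds above can be cross-checked with a short sketch. It assumes mainnet's 32 slots per epoch and models finality with a simplified version of the consensus spec's supermajority test (justification needs at least 2/3 of active stake, checked as `3 * attesting >= 2 * total`). The helper names are hypothetical, and the model is a static snapshot that ignores the inactivity leak, so treat it as an illustration rather than a description of Prysm's code.

```python
from fractions import Fraction

SLOTS_PER_EPOCH = 32  # Ethereum mainnet consensus-layer constant
SUPERMAJORITY = Fraction(2, 3)  # simplified: justification needs >= 2/3 of active stake


def incident_window(first_epoch: int, last_epoch: int, missed_blocks: int):
    """Return (epochs, slots, miss_rate) for an inclusive range of epochs."""
    epochs = last_epoch - first_epoch + 1
    slots = epochs * SLOTS_PER_EPOCH
    return epochs, slots, missed_blocks / slots


def can_justify(participation: Fraction) -> bool:
    """Hypothetical helper: does this share of stake reach the 2/3 supermajority?"""
    return participation >= SUPERMAJORITY


def client_share_risk(share: Fraction) -> str:
    """Hypothetical helper: worst case if every validator on one client fails together.

    Static snapshot only; ignores the inactivity leak and other dynamics.
    """
    if can_justify(share):
        # The faulty client alone reaches 2/3 (the spec's check is >=, so exactly
        # 2/3 already counts) and could finalize its own, possibly invalid, chain.
        return "may finalize an invalid chain"
    if not can_justify(1 - share):
        # The remaining clients fall below 2/3, so finality stalls.
        return "may temporarily lose finality"
    return "other clients keep finalizing"


if __name__ == "__main__":
    epochs, slots, rate = incident_window(411439, 411480, 248)
    print(epochs, slots, f"{rate:.1%}")  # 42 1344 18.5%
    print(can_justify(Fraction(3, 4)))   # True: 75% participation is still above 2/3
    for share in (Fraction(1, 4), Fraction(2, 5), Fraction(3, 4)):
        print(share, client_share_risk(share))
```

Using `Fraction` keeps the 1/3 and 2/3 boundaries exact, so a share of exactly one-third is classified the same way the article's "more than one-third" wording implies.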
They also acknowledged that communication about feature flags had been unclear and that their test environments had failed to simulate large numbers of out-of-sync nodes, and said they will improve their testing strategy and configuration management going forward.

Disclaimer: The content above is only the author's opinion, which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.