As mentioned in our community call today, we landed a performance improvement that raises block execution throughput by 20-25%. On one of our servers, @ethrex_client went from 514 to 637 mgas/s, with latency dropping from 64 ms to 57 ms.
The idea came from reth's latest version, which added a cache so proof workers can share already-fetched db values instead of each one hitting the db independently. We don't use overlays or proof workers; we use trie layers for merkleization. But we had the same problem elsewhere: when pre-warming state, individual workers were fetching the same state multiple times, hitting trie layers at best and the db at worst. Adding a shared cache for fetched values gave us the gains above.
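To make the idea concrete, here's a minimal sketch of a shared fetch cache in Rust. This is not ethrex's actual code; the names (`fetch_cached`, `fetch_from_db`) and the plain `Mutex<HashMap>` are assumptions for illustration. The point it shows: N workers touching the same keys do far fewer backing-store reads once they consult a shared cache first.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a slow lookup (trie layers at best, db at worst).
fn fetch_from_db(key: u64, db_hits: &AtomicUsize) -> u64 {
    db_hits.fetch_add(1, Ordering::SeqCst);
    key * 2 // pretend value
}

// Workers consult the shared cache before hitting the db.
fn fetch_cached(key: u64, cache: &Mutex<HashMap<u64, u64>>, db_hits: &AtomicUsize) -> u64 {
    if let Some(&v) = cache.lock().unwrap().get(&key) {
        return v;
    }
    let v = fetch_from_db(key, db_hits);
    cache.lock().unwrap().insert(key, v);
    v
}

fn main() {
    let cache = Arc::new(Mutex::new(HashMap::new()));
    let db_hits = Arc::new(AtomicUsize::new(0));

    // Four pre-warming workers all touch the same 100 keys.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let db_hits = Arc::clone(&db_hits);
            thread::spawn(move || {
                for key in 0..100u64 {
                    assert_eq!(fetch_cached(key, &cache, &db_hits), key * 2);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // Uncached this would be 400 db reads; with the shared cache it is
    // ~100 (a few extra are possible when two workers miss concurrently).
    let hits = db_hits.load(Ordering::SeqCst);
    assert!(hits >= 100 && hits <= 400);
    println!("db reads: {hits} (uncached would be 400)");
}
```

A production version would likely use a sharded or lock-free map and bound the cache's size, but the dedup effect is the same.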
We got a 50% improvement over the last few weeks, all without using Claude Code; with it, expect more gains soon. For reference, on our servers @Nethermind averages 772 mgas/s on mainnet and ethrex is at 692 mgas/s.
Congrats @class_lambda.