30% reduction in stored contract code size beyond deduplication and compression

My initial research shows that it is possible to gain a further 30% storage reduction for contract code, on top of contract deduplication and compressing code at storage time.

Of necessity, the solidity compiler and every other compiler will create bytecode with patterns of opcodes. When compressing an individual contract, an ideal compression algorithm learns these patterns after they have occurred once in the individual contract, and can then refer to them in shortened ways throughout the rest of the contact.

However, this means that the compression algorithm is always surprised by the first occurrence of a pattern in an individual contract, because the algorithm has no idea that this is common to many smart contracts, and so can only learn from the individual contract that it is working on.

The solution to this is built into most compression algorithm libraries - you can provide a small pre-trained “dictionary” that encodes common patterns already seen in historical data. This allows immediate compression of common patterns the first time they appear in new data.

As a second benefit, when a pretrained dictionary is used, it also remembers large patterns from the over represented spam/spam contracts. For those contracts that have been spammed tens of thousands of times with minor variations, a dictionary can reduce these to single digit percentages of their original size.

Here’s the results:

image986×428 11 KB

Using the zelliac dataset of all contract bytecode deployed up to early 2025, there are 1,539,858 deduplicated deployed bytecode sets.

Using the Zstandard compression library at its default fast compression level of 3, and compressing each individual bytecode set, the total size goes from 100% to 41.8% of original size. Adding in a 100KB dictionary, trained at the default settings for compression level results in bytecode taking up 29.3% of the original size, a 30% reduction from the compressed size.

Increasing the compression dictionary size or increasing the compression level further reduces the final size. I kept this optimized for speed. It’s also quite possible that further tuning of dictionary training parameters could result in even smaller sizes.

Total storage space required across different sized contract byte codes:

image1149×523 17.7 KB

Worst to best performance across all contracts.

image1136×529 26.4 KB

Final notes:

If using this in a client, I would presume storing a single byte that would tell if a contract was compressed at all, and if so which dictionary / algo was used. This would allow for smooth upgrades to better dictionaries in the future, as well as not compressing files that compression makes worse.
I used zstandard as a compression library simply because I’ve had good experiences with it in the past. I’ve not compared different compression libraries or algorithms at this point.

Source

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.

Add to Favorites

Comments

Relevant content

Financial Times

Breaking News: Iran "fully opens" the Strait of Hormuz

Tin Tức Bitcoin

RAVE surpasses $28, setting a new all-time high; $20.54 million in contracts liquidated.

RAVE

28.31%

BlockTempo

Iran: We will not accept the removal of enriched uranium! A slap in the face to Trump, but we admit to having drafted a 60-day negotiation memorandum with the US.