a16z: Why is it difficult for encrypted mempools to become a panacea for MEV?

By Pranav Garimidi, Joseph Bonneau, Lioba Heimbach, a16z

Compiled by Saoirse, Foresight News

In blockchains, the maximal value that can be earned by choosing which transactions to include in or exclude from a block, or by changing their order, is called maximal extractable value (MEV). MEV is pervasive on most blockchains and has been a topic of widespread interest and discussion within the industry.

Many researchers observing MEV have asked an obvious question: can cryptography solve the problem? One approach is an encrypted mempool: users broadcast encrypted transactions that are decrypted and revealed only after they have been ordered. The consensus protocol must therefore commit to an ordering "blind," which appears to prevent anyone from exploiting MEV opportunities during the ordering phase.

Unfortunately, for both practical and theoretical reasons, encrypted mempools cannot provide a universal solution to MEV. This article explains the difficulties and explores which encrypted mempool designs remain feasible.

How encrypted mempools work

There are many proposals for encrypted mempools, but the general framework is as follows:

1. The user broadcasts the encrypted transaction.

2. The encrypted transaction is submitted to the chain (in some proposals, the transaction must first undergo a verifiable random shuffle).

3. When the block containing these transactions is finalized, the transactions are decrypted.

4. Finally, the transactions are executed.

Step 3 (decryption) raises the key questions: who is responsible for decrypting, and what happens if decryption fails? A simple approach is to have users decrypt their own transactions (in which case encryption is not even necessary; a hiding commitment suffices). However, this approach is vulnerable to speculative MEV.

In speculative MEV, an attacker guesses that a particular encrypted transaction contains an MEV opportunity. They encrypt a transaction of their own and try to get it placed in a favorable position (for example, immediately before or after the target transaction). If the transactions land in the expected order, the attacker decrypts their own transaction and extracts the MEV. If not, they refuse to decrypt, and their transaction is never included in the final chain.
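The user-reveal variant, and the withholding option it gives an attacker, can be sketched as a hash-based commit-reveal. This is a minimal illustration and not a design from any specific proposal: the SHA-256 commitment, the function names, and the sample transactions are all assumptions made for the example.

```python
import hashlib
import secrets


def commit(tx: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce); the random nonce keeps the commitment hiding."""
    nonce = secrets.token_bytes(32)
    return hashlib.sha256(nonce + tx).digest(), nonce


def opens(commitment: bytes, nonce: bytes, tx: bytes) -> bool:
    """Check that (nonce, tx) is a valid opening of the commitment."""
    return hashlib.sha256(nonce + tx).digest() == commitment


def execute_block(ordered, reveals):
    """Execute, in the already-fixed order, only commitments that were opened."""
    executed = []
    for c in ordered:
        opening = reveals.get(c)
        if opening is not None and opens(c, *opening):
            executed.append(opening[1])
    return executed


# Hypothetical transactions, for illustration only.
victim_tx = b"swap 100 ETH for USDC"
attacker_tx = b"buy USDC ahead of victim"
c_victim, n_victim = commit(victim_tx)
c_attacker, n_attacker = commit(attacker_tx)

# Unfavorable order for the attacker: the victim's commitment comes first.
ordered = [c_victim, c_attacker]
# The attacker sees the order and simply withholds the opening.
reveals = {c_victim: (n_victim, victim_tx)}
print(execute_block(ordered, reveals))  # only the victim's transaction runs
```

Because an unopened commitment is silently dropped, the attacker loses nothing when the guess or the ordering is wrong, which is what makes the speculation cheap.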

It might be possible to penalize users who fail to decrypt, but such a mechanism would be extremely difficult to implement. The penalty would have to be the same for every encrypted transaction (encryption makes them indistinguishable, after all), and it would have to be severe enough to deter speculative MEV even against high-value targets. That would lock up a large amount of capital, which would also need to stay anonymous so as not to reveal the link between transactions and users. Worse, if a bug or network failure prevented a transaction from being decrypted, honest users would lose funds as well.
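To see why the penalty has to be so large, consider a deliberately simple model (an assumption of this sketch, not something the article specifies): the attacker wins an MEV amount with some probability, forfeits a bond otherwise, and fees are ignored. The function names and numbers below are hypothetical.

```python
from fractions import Fraction


def expected_profit(p_success: Fraction, mev: int, bond: int) -> Fraction:
    """Attacker's expected gain: win `mev` with probability p, else forfeit the bond."""
    return p_success * mev - (1 - p_success) * bond


def min_deterrent_bond(p_success: Fraction, mev: int) -> Fraction:
    """Smallest bond at which the expected profit is no longer positive."""
    return p_success * mev / (1 - p_success)


# Hypothetical numbers: a 1-in-5 guess at an opportunity worth 1000 units.
p = Fraction(1, 5)
print(min_deterrent_bond(p, 1000))            # 250
print(expected_profit(p, 1000, 250))          # 0
print(min_deterrent_bond(Fraction(1, 2), 1000))  # 1000
```

Under this model the bond scales with the largest opportunity worth attacking, and since encrypted transactions are indistinguishable, every user, including one sending a small transfer, would have to post it.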

Most proposals therefore require that an encrypted transaction be guaranteed to become decryptable at some point in the future, even if the user who created it goes offline or refuses to cooperate. This can be achieved in several ways:

Trusted Execution Environments (TEEs) : Users encrypt transactions to a key held inside the secure enclave of a Trusted Execution Environment (TEE). In basic versions, the TEE is used only to decrypt transactions after a specific point in time (which requires the TEE to have some notion of time). More sophisticated versions make the TEE responsible for decrypting transactions and building the block, ordering transactions by criteria such as arrival time and fees. Compared with other encrypted mempool designs, a TEE has the advantage of working directly on plaintext transactions, so it can reduce on-chain waste by filtering out transactions that would revert. The drawback is that the approach depends on trusting the hardware.

Secret sharing and threshold encryption : In this scheme, users encrypt transactions to a key whose secret is shared among a committee (usually a subset of validators). Decryption requires a threshold of the committee to cooperate (for example, two-thirds of its members).

With threshold decryption, trust shifts from hardware to a committee. Proponents argue that because most protocols already assume an honest majority of validators for consensus, we can make a similar assumption here: that a majority of validators will stay honest and not decrypt transactions early.

However, there is a key distinction: the two trust assumptions are not the same. A consensus failure such as a fork is publicly visible (a "weak trust assumption"), whereas a malicious committee that decrypts transactions early leaves no public evidence, so the attack can be neither detected nor punished (a "strong trust assumption"). So although the security assumptions behind consensus and behind a decryption committee look identical on the surface, in practice the assumption that the committee will not collude is much less reliable.
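The threshold idea can be illustrated with Shamir secret sharing. This is only a sketch under stated assumptions: the field modulus, the 4-of-6 committee, and the function names are choices made for the example, and a real threshold encryption scheme would combine per-transaction decryption shares rather than ever reconstructing the key in one place.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used here as the field modulus


def split(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any t of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical committee of 6 validators with a threshold of 4.
key = secrets.randbelow(PRIME)
shares = split(key, n=6, t=4)
assert reconstruct(shares[:4]) == key   # any 4 shares suffice
assert reconstruct(shares[2:]) == key
```

The sketch also makes the trust problem concrete: any four colluding members can run `reconstruct` privately, and nothing on-chain records that they did.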

Time-lock and delay encryption : As an alternative to threshold encryption, delay encryption encrypts transactions to a public key whose private key is hidden inside a time-lock puzzle. A time-lock puzzle is a cryptographic puzzle that encapsulates a secret which cannot be revealed until a predetermined amount of time has passed. More precisely, opening it requires repeating a computation that cannot be parallelized. Anyone can solve the puzzle to obtain the key and decrypt the transactions, but only after completing a slow, inherently sequential computation designed to take long enough that the transactions cannot be decrypted before they are finalized. The strongest form of this primitive, delay encryption, generates such a puzzle publicly; it can also be approximated by a trusted committee using time-lock encryption, although at that point its advantage over threshold encryption is debatable.

Whether they use delay encryption or rely on a trusted committee to do the computation, these schemes face several practical challenges. First, because the delay is determined by computation, it is hard to control exactly when decryption happens. Second, they rely on some entity running high-performance hardware to solve the puzzles efficiently; anyone can take on this role, but it is unclear how to incentivize them to do so. Finally, in these designs every broadcast transaction is eventually decrypted, including those that never made it into a block. Threshold-based (or witness-based) schemes, by contrast, can potentially decrypt only the transactions that were actually included.
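A classic way to build a time-lock puzzle is repeated squaring modulo an RSA-style modulus: the creator, who knows the factorization, takes a shortcut, while everyone else must perform the squarings one after another. The sketch below illustrates that general idea only; it is not the delay encryption construction the article refers to, and the primes, the hash-based pad, the 32-byte message limit, and the function names are all assumptions for the example (the modulus is far too small to be secure).

```python
import hashlib

P = 2**61 - 1  # Mersenne prime; toy size, not secure
Q = 2**89 - 1  # Mersenne prime; toy size, not secure


def _pad(y: int) -> bytes:
    """Derive a 32-byte one-time pad from the puzzle solution."""
    return hashlib.sha256(str(y).encode()).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_puzzle(message: bytes, t: int) -> tuple[int, int, bytes]:
    """Lock `message` behind t sequential squarings, using the trapdoor phi(n)."""
    assert len(message) <= 32
    n = P * Q
    phi = (P - 1) * (Q - 1)
    # Shortcut available only to the creator: reduce the exponent 2^t mod phi(n).
    y = pow(2, pow(2, t, phi), n)
    return n, t, _xor(message.ljust(32, b"\0"), _pad(y))


def solve_puzzle(n: int, t: int, ciphertext: bytes) -> bytes:
    """Without the factorization, compute 2^(2^t) mod n by t squarings in sequence."""
    y = 2
    for _ in range(t):
        y = y * y % n
    # Strips the zero padding (so messages must not end in a zero byte).
    return _xor(ciphertext, _pad(y)).rstrip(b"\0")


n, t, ct = make_puzzle(b"swap 100 ETH for USDC", t=10_000)
assert solve_puzzle(n, t, ct) == b"swap 100 ETH for USDC"
```

The wall-clock delay depends entirely on how fast the solver can square, which is the first practical difficulty noted above: the delay is a property of hardware, not of the protocol.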

Witness encryption : The last and most advanced cryptographic approach uses witness encryption. In theory, witness encryption encrypts a message so that it can be decrypted only by someone who knows a witness for a specific NP relation. For example, a message can be encrypted so that only someone who can solve a particular Sudoku puzzle, or provide a preimage of a particular hash value, can decrypt it.

(Note: an NP relation is a correspondence between problems and answers that can be verified quickly.)

A SNARK likewise exists for any NP relation. Witness encryption can be thought of as encrypting data so that it can be decrypted only by someone who could produce a SNARK proving that a given condition holds. For encrypted mempools, a typical condition is that a transaction can be decrypted only once the block containing it has been finalized.

This is a theoretical primitive with great potential. In fact, it is a generalization of which the committee-based and delay-based approaches are special cases. Unfortunately, no practical witness encryption scheme exists yet. And even if one did, it is hard to argue that it would improve on committee-based approaches in a proof-of-stake blockchain. Even if witness encryption is configured to decrypt transactions only after they have been ordered in a finalized block, a malicious committee could privately simulate the consensus protocol to forge finality, then use that private chain as the witness to decrypt the transactions. In that case, threshold decryption by the same committee offers comparable security and is far simpler.

In a proof-of-work protocol, by contrast, the advantage of witness encryption is clearer, because even a fully malicious committee could not privately mine several new blocks on top of the current chain head to fake finality.

Technical Challenges Facing Encrypted Mempools

Several practical challenges limit how well encrypted mempools can protect against MEV. In general, keeping information confidential is hard. Notably, encryption is not widely used in Web3, but decades of experience deploying it on the web (TLS/HTTPS) and for private communication (from PGP to modern encrypted messengers such as Signal and WhatsApp) have exposed the difficulties: encryption is a tool for protecting confidentiality, but it does not guarantee it.

First, some parties may have direct access to the plaintext of a user's transaction. Users typically do not encrypt transactions themselves but delegate this to their wallet provider. The wallet provider can therefore see the plaintext and could use or sell that information to extract MEV. Encryption is only as secure as the set of parties with access to the key: whoever controls the key defines the security boundary.

Beyond this, the biggest problem lies in metadata—the unencrypted data surrounding the encrypted payload (transaction). This metadata can be used by attackers to infer transaction intent and conduct speculative MEV. It's important to note that attackers don't need to fully understand the transaction content or always guess correctly. For example, simply being able to determine with reasonable probability that a transaction is a buy order from a specific decentralized exchange (DEX) is sufficient to launch an attack.

Metadata falls into several categories: some are classic challenges for encryption in general, and others are specific to encrypted mempools.

Transaction size : Encryption itself cannot hide the size of plaintext (notably, the formal definition of semantic security explicitly excludes hiding plaintext size). This is a common attack vector in encrypted communications. A typical example is that even with encryption, an eavesdropper can still determine what's playing on Netflix in real time by looking at the size of each packet in the video stream. In an encrypted mempool, certain types of transactions may have unique sizes, thus leaking information.

Broadcast time : Encryption also cannot hide timing (another classic attack vector). In Web3, some senders (for example, those executing a structured sell-off) may submit transactions at regular intervals. Transaction timing may also correlate with other information, such as activity on external exchanges or news events. A subtler use of timing is arbitrage between centralized exchanges (CEXs) and decentralized exchanges (DEXs). A sequencer can insert a transaction of its own created as late as possible, so that it reflects the latest CEX price. At the same time, it can exclude every other transaction broadcast after a certain point in time (even encrypted ones), ensuring that only its own transaction benefits from the latest price.

Source IP Address : By monitoring peer-to-peer networks and tracking the source IP address, a searcher can infer the identity of a transaction's sender. This problem has been known since the early days of Bitcoin (over a decade ago). This can be extremely valuable to a searcher if a particular sender exhibits consistent behavior. For example, knowing the sender's identity can link encrypted transactions with decrypted historical transactions.

Transaction sender and fee/gas information : Transaction fees are a type of metadata specific to encrypted mempools. In Ethereum, a legacy transaction includes the on-chain sender address (used to pay fees), a maximum gas budget, and the price per unit of gas the sender is willing to pay. As with the source network address, the sender address can be used to link multiple transactions to one another and to real-world entities; the gas budget can hint at what the transaction is meant to do. For example, interacting with a specific DEX might require a recognizable, fixed amount of gas.

Sophisticated searchers may combine multiple metadata types mentioned above to predict transaction content.

In theory, all of this information can be hidden, but at the cost of performance and complexity. For example, padding transactions to a standard length can hide their size, but this wastes bandwidth and on-chain space. Adding a delay before sending can hide the time, but it increases latency. Submitting transactions through an anonymous network like Tor can hide IP addresses, but this introduces new challenges.
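The padding trade-off can be made concrete with a small sketch. The 256-byte bucket, the 4-byte length prefix, and the function names are assumptions chosen for illustration; padding would be applied to the plaintext before encryption.

```python
BUCKET = 256  # hypothetical standard size, in bytes


def pad(tx: bytes, bucket: int = BUCKET) -> bytes:
    """Prefix the true length, then zero-fill up to the next multiple of `bucket`."""
    body = len(tx).to_bytes(4, "big") + tx
    padded_len = -(-len(body) // bucket) * bucket  # ceiling division
    return body.ljust(padded_len, b"\0")


def unpad(padded: bytes) -> bytes:
    """Recover the original transaction using the length prefix."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]


# Two hypothetical transaction types that would otherwise differ in size.
transfer = b"t" * 110
swap = b"s" * 230
assert len(pad(transfer)) == len(pad(swap)) == 256  # now indistinguishable by size
assert unpad(pad(swap)) == swap

overhead = len(pad(transfer)) - len(transfer)
print(overhead)  # 146 bytes of extra bandwidth for the small transaction
```

The smaller the typical transaction relative to the bucket, the more bandwidth and block space padding wastes; a larger transaction that spills into a second bucket is still distinguishable from the rest.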

The hardest metadata to hide is transaction fee information. Encrypting fee data creates a number of problems for block builders. The first is spam. If fee data is encrypted, anyone can broadcast malformed encrypted transactions. Such transactions would be ordered but could not pay their fees; once decrypted they would fail to execute, and no one could be held responsible. This could be addressed with SNARKs proving that a transaction is well-formed and sufficiently funded, but that would add significant overhead.

Secondly, there's the issue of efficiency in block construction and fee auctions. Builders rely on fee information to create profit-maximizing blocks and determine the current market price of on-chain resources. Encrypting fee data disrupts this process. One solution is to set a fixed fee for each block, but this is economically inefficient and could foster a secondary market for transaction bundling, defeating the purpose of the encrypted mempool. Another option is to conduct fee auctions through secure multi-party computation or trusted hardware, but both approaches are extremely costly.

Finally, a secure encrypted mempool adds system overhead in many ways. Encryption increases the chain's latency, computational load, and bandwidth consumption. How it would combine with important future goals such as sharding or parallel execution is unclear. It may introduce new points of failure for liveness (such as the decryption committee in threshold schemes or the solver in delay-based schemes). And design and implementation complexity would increase significantly.

Many of the problems of encrypted mempools mirror those faced by blockchains designed for transaction privacy, such as Zcash and Monero. If there is a silver lining, it is that solving the challenges of using encryption to mitigate MEV would also clear the way for transaction privacy.

Economic Challenges Facing Encrypted Mempools

Finally, encrypted mempools face economic challenges. Unlike the technical challenges, which can be mitigated over time with enough engineering effort, these are fundamental limitations and are extremely difficult to resolve.

The core problem of MEV stems from the information asymmetry between those who create transactions (users) and those who exploit MEV opportunities (searchers and block builders). Users usually have no idea how much extractable value their transactions contain. So even with a perfect encrypted mempool, they can be induced to give up their decryption keys in exchange for a reward smaller than the actual MEV. This can be called "incentivized decryption."

This scenario is not hard to imagine, because similar mechanisms, such as MEV-Share, already exist. MEV-Share is an order flow auction that lets users selectively submit information about their transactions to a pool, where searchers compete for the right to exploit the MEV opportunities those transactions present. After the winning bidder extracts the MEV, part of the proceeds (the bid amount or a percentage of it) is returned to the user.

This model carries over directly to encrypted mempools: to participate, users would have to disclose their decryption keys (or partial information). Most users, however, do not understand the opportunity cost of participating. They see only the immediate reward and are happy to disclose the information. Traditional finance has similar examples, such as the zero-commission trading platform Robinhood, whose business model relies on selling user order flow to third parties through "payment for order flow."

Another possible scenario is that large builders force users to disclose their transaction contents (or related information) on censorship grounds. Censorship resistance is an important and contested topic in Web3. But if large validators or builders are legally required to enforce a sanctions list (such as the one maintained by the U.S. Office of Foreign Assets Control, OFAC), they might refuse to process any encrypted transaction. Technically, users could use zero-knowledge proofs to show that their encrypted transactions comply with the sanctions requirements, but this would add cost and complexity. Even if a blockchain is strongly censorship-resistant (guaranteeing that encrypted transactions are included), builders might still place known plaintext transactions at the front of the block and encrypted transactions at the end. Transactions that need guaranteed execution priority might therefore be forced to reveal their contents to builders anyway.

Other efficiency challenges

Encrypted mempools increase system overhead in several obvious ways. Users must encrypt transactions and the system must somehow decrypt them, which adds computational cost and may increase transaction size. As mentioned earlier, hiding metadata adds further to these overheads. But there are also less obvious efficiency costs. In finance, a market is considered efficient when prices reflect all available information; latency and information asymmetry make markets inefficient. Both are inevitable consequences of encrypted mempools.

One direct consequence is greater price uncertainty, a product of the additional latency that encrypted mempools introduce. As a result, more transactions may fail because they exceed their price slippage tolerance, wasting on-chain space.

Similarly, this price uncertainty could encourage speculative MEV transactions that try to profit from on-chain arbitrage. Encrypted mempools may in fact create more of these opportunities: execution delays further obscure the current state of decentralized exchanges (DEXs), which can lead to market inefficiency and price discrepancies across trading venues. Such speculative MEV transactions also waste block space, because they often abort once they find that no arbitrage opportunity exists.
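The slippage failure mode can be written out as a short check. This is an illustrative sketch: the basis-point convention, the function names, and the amounts are assumptions, not parameters of any particular DEX.

```python
def min_amount_out(quoted_out: int, tolerance_bps: int) -> int:
    """Lowest output the user accepts, given a tolerance in basis points."""
    return quoted_out * (10_000 - tolerance_bps) // 10_000


def executes(actual_out: int, quoted_out: int, tolerance_bps: int) -> bool:
    """A swap reverts if the price moved past the tolerance before execution."""
    return actual_out >= min_amount_out(quoted_out, tolerance_bps)


# Hypothetical swap quoted at 1,000,000 units out with 0.5% (50 bps) tolerance.
quoted = 1_000_000
print(min_amount_out(quoted, 50))      # 995000
print(executes(996_000, quoted, 50))   # True: price drifted 0.4% while waiting
print(executes(994_000, quoted, 50))   # False: drifted 0.6%, transaction fails
```

The longer a transaction waits encrypted before it executes, the more the price can drift, so a tolerance that was adequate at signing time is more likely to be exceeded; the failed transaction still occupies block space.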

Summary

This article set out to lay out the challenges facing encrypted mempools so that attention can shift toward researching and developing other solutions. Even so, encrypted mempools may still become part of the answer to MEV.

One viable approach is a hybrid design: some transactions are ordered blind through an encrypted mempool, while others use a different ordering scheme. A hybrid design may suit certain kinds of transactions, such as buy and sell orders from large market participants who can afford to encrypt or pad their transactions carefully and are willing to pay more to avoid MEV. It also makes sense for highly sensitive transactions, such as security fixes for vulnerable contracts.

However, because of their technical limitations, high engineering complexity, and performance overhead, encrypted mempools are unlikely to become the hoped-for "MEV panacea." The community needs to develop other solutions, including MEV auctions, application-layer defenses, and shorter finality times. MEV will remain a challenge for some time, and in-depth research is needed to find the right balance among the various solutions and mitigate its negative effects.
