Covenant-Dependent Layer 2 Review


By Peter Todd

Source: https://petertodd.org/2024/covenant-dependent-layer-2-review

On-chain wallets achieve a (roughly) one-to-one mapping between economic transactions and blockchain transactions: for a user to perform one economic transaction, roughly one blockchain transaction is needed. Aggregation, coinjoin, cut-through, and similar techniques bend this statement a little, but it is roughly correct in general.

Lightning channels achieve a many-to-one mapping: the magic of Lightning is that an effectively unlimited number of economic transactions can happen within a single channel, which is itself tied to a single unspent transaction output (UTXO). In essence, we take the "time" dimension, the transactions, and collapse it, achieving significant scalability.

However, requiring even one UTXO per user is arguably not good enough. So many proposals have emerged to achieve greater scaling by allowing multiple users to share a single UTXO in a self-custodial way. This collapses the "space" dimension, the users, into a single UTXO.

My goal here is to review all of these proposals, identify the technical patterns they share, work out what kinds of new opcodes and other soft-fork upgrades they need, and build a comparison table putting it all together. Along the way we will also define what a "Layer 2 protocol" actually is, what kind of scaling Lightning is already capable of, and get an understanding of the improvements we need to make to mempools to realize these proposals.

We would like to thank Fulgur Ventures for funding this research. They had no editorial authority over the content of this article and did not review it prior to publication.

We would like to thank Daniela Brozzoni, Sarah Cox, and others for pre-publication review.

1 Definition

1.1 What is “Layer 2”?

Often people define “Layer 2” so broadly that even bank-like entities (such as Liquid) can be defined as Layer 2. For the purposes of this article, we will use a narrower definition: Layer 2 is a Bitcoin-denominated system that exists to allow BTC to be transferred between related parties at a higher frequency than on-chain transactions, and:

  1. After accounting for the penalties and costs within the system, no one can profit by stealing funds from the system. Costs and penalties outside the system, such as loss of reputation or legal consequences, are not considered in our definition.
  2. (Preferably) the true owners of the funds can unilaterally withdraw them, minus transaction fees, without the cooperation of any third party.

The first is necessary because we want our L2 systems to be able to represent amounts and transactions too small to represent on-chain. For example, in a Lightning channel, an HTLC can have a value too small to be represented on-chain; in that case, the HTLC's value is simply added to the commitment transaction's fee. While a Lightning node can "steal" a dust HTLC by closing the channel, doing so costs more in fees [1] than the HTLC is worth, making the theft unprofitable.
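
To make this concrete, here is a minimal Python sketch of the economics; the HTLC value, transaction size, and feerate are illustrative assumptions, not figures from the original:

    # Why stealing a dust HTLC by force-closing a channel is unprofitable.
    # All numbers are illustrative assumptions.
    htlc_value_sats = 300          # dust HTLC, folded into the commitment tx fee
    force_close_vbytes = 700       # assumed vsize of a unilateral close plus sweep
    feerate_sat_per_vb = 10        # assumed feerate at close time

    cost_to_steal = force_close_vbytes * feerate_sat_per_vb   # 7,000 sats
    print(cost_to_steal > htlc_value_sats)                    # True: theft is a net loss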

The second is that unilateral withdrawals are always our primary design goal. [2]

Under this definition, Lightning channels are an L2 system. But systems like Liquid, Cashu, and Fedimint are not L2, because another party or parties control your funds. "Client-side validation" schemes such as RGB are also not L2 under this definition, because they cannot trustlessly transfer BTC itself. Finally, "Statechains" fail the definition because the Statechain entity (the service provider) can steal funds if it does not follow the protocol.

1.2 What are "covenants"?

So why do L2 systems need covenants to achieve greater scalability?

In Bitcoin scripting, a "covenant" is a mechanism that restricts in advance how a transaction output (txout) can be spent, so that the form of the transaction spending that txout is predefined or otherwise restricted in a way that is not purely limited to signatures. L2 systems that share a single UTXO among multiple parties need covenants because they need to restrict how the UTXO can be spent in order to enforce the rules and incentives of the L2 protocol.

1.2.1 Recursive Covenants

A recursive covenant is a covenant with the property that the rules restricting how a UTXO can be spent can be applied recursively, extending indefinitely to the child UTXOs of the spending transaction. Recursive covenants have long been considered undesirable by some, because they can encumber funds forever, or at least forever absent the permission of a third party such as a government.

2 Objectives

Lightning channels are the state of the art among deployed Layer 2 systems. But they have limitations, namely:

  1. Scalability: Lightning channels currently require each end user to have at least one UTXO [3].
  2. Liquidity: Lightning channels require funds to be tied up in channels.
  3. Interactivity: Lightning channels require the recipient of a payment to be online to receive it trustlessly.

When evaluating Layer 2 systems, our goal is to improve upon these key limitations, ideally without introducing new ones.

2.1 Scalability Limitations of Lightning Channels

What does “each end user needs a UTXO” mean in practice? Since Lightning channels can run indefinitely, one way to analyze this is to ask how many new channels can be created per year [4]. The marginal cost of creating a taproot output is 43 vB; if channel creation is amortized, with many channels opened in a single transaction, the other transaction overhead becomes negligible, and quite large numbers of channels can be opened each year to onboard new users. For example, suppose 90% of block space went to opening new taproot Lightning channels:
$$
52{\small,}560\,\frac{\mathrm{blocks}}{\mathrm{year}} \times 1{\small,}000{\small,}000\,\frac{\mathrm{vB}}{\mathrm{block}} \times 90\% \times \frac{1\ \mathrm{channel}}{43\ \mathrm{vB}} \approx 1.1\ \mathrm{billion}\ \frac{\mathrm{channels}}{\mathrm{year}}
$$

It’s estimated that half the world’s population has a smartphone, about 4.3 billion people. So each year we could, in fact, onboard a large fraction of all the people who could plausibly use Lightning at all.

However, channels don’t stay open forever. Sometimes users want to switch wallets, increase or decrease channel capacity, and so on. The most efficient way to change a channel’s capacity is “channel splicing”, notably implemented by Phoenix Wallet.

Like channel opens, splices can also be amortized for efficiency: multiple splice operations can share a single transaction, reducing the number of inputs and outputs needed to add and remove funds [5]. The marginal block space needed per user splice, assuming the use of musig, is the 43 vB taproot output plus 57.5 vB of witness data for a taproot keypath spend, 100.5 vB in total. If we again assume 90% of block space is used for this:
$$
52{\small,}560\,\frac{\mathrm{blocks}}{\mathrm{year}} \times 1{\small,}000{\small,}000\,\frac{\mathrm{vB}}{\mathrm{block}} \times 90\% \times \frac{1\ \mathrm{splice}}{100.5\ \mathrm{vB}} \approx 470\ \mathrm{million}\ \frac{\mathrm{splices}}{\mathrm{year}}
$$
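
Both results are easy to reproduce; a minimal Python sketch of the arithmetic above:

    # Reproduces the two block-space estimates above.
    BLOCKS_PER_YEAR = 52_560
    VBYTES_PER_BLOCK = 1_000_000
    USABLE_FRACTION = 0.90         # assume 90% of block space is available

    def ops_per_year(vbytes_per_op: float) -> float:
        return BLOCKS_PER_YEAR * VBYTES_PER_BLOCK * USABLE_FRACTION / vbytes_per_op

    print(ops_per_year(43))        # ~1.1 billion channel opens (43 vB per output)
    print(ops_per_year(100.5))     # ~470 million splices (43 vB + 57.5 vB witness)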

Finally, note that moving a Lightning channel from one wallet to another can be done in a single transaction, either by trusting the new wallet to sign a commitment transaction after funds have been sent to the channel funding address, or with cooperative close-and-reopen support in both the old and new wallet implementations.

Of course, other Bitcoin use cases beyond Lightning channels will compete for block space, and it’s hard to know how that will translate into fee rates. But these numbers give us a rough sense that, at least technically, current technology can support hundreds of millions of self-custodial Lightning users.

3 L2 Overview

Under our L2 definition, there are two design patterns that the Bitcoin developer community has been discussing:

  1. Channels
  2. Virtual UTXOs

In the channel pattern, of which Lightning channels are the prime example, state is advanced by participants exchanging pre-signed transactions that could be mined, but do not represent the "happy path". These pre-signed transactions split a UTXO's value between the participants; economic transactions happen by repeatedly creating new pre-signed transactions that change the split. Since many different valid transactions spending the same UTXO will exist, some incentive mechanism is needed to make sure only the correct transaction is actually confirmed.

In the "virtual UTXO (V-UTXO)" design pattern, Ark is the most prominent example. V-UTXO is created through restrictive clauses or agreements between multiple parties; transactions representing value can be mined, thereby turning the V-UTXO of each party into a real UTXO on the chain, but such transactions do not represent a "happy ending". From this perspective, V-UTXO is also similar to a channel. However, unlike a channel, the V-UTXO scheme implements transactions by spending the V-UTXO itself, which is (conceptually) a single [6] pre-signed transaction.

The "everyone is happy" design pattern is to use a script path that "all participants agree on", such as an N-of-N multi-signature device; taproot is designed specifically for this concept, allowing the key path (via musign) to become an N-of-N multi-signature device. Assuming all participants agree, this path allows funds to be spent efficiently (and privately).

Interestingly, since virtual UTXOs are "real" in many senses, it is fairly easy to build channels on top of them: simply have the V-UTXO, when mined, create the UTXO a channel needs. In this sense, virtual UTXOs are a layer slightly below channels.

3.1 Lightning Network

The Lightning Network as deployed in production today is mainly based on the BOLT standards. Lightning is a combination of several technologies, including Lightning channels and HTLCs, the P2P routing network, onion routing, invoice standards, and more. Importantly, Lightning is not a consensus system, so different parts of the "Lightning system" need not be adopted in exactly the same way by all users. For the purposes of this article, we use "Lightning Network" in a broad sense that includes easily foreseeable upgrades to the current, widely used Lightning protocols.

As mentioned above, Lightning's key end-user scalability limitation comes from the requirement that every user have at least one UTXO. For the "core" routing layer of Lightning, the public Lightning nodes that forward the bulk of payments, these scalability limits are much less of a problem: Lightning works fine with far more end users than routing nodes, because each public channel used for payment routing can easily support a large volume of transactions per unit time. This is also why many newly proposed L2 systems expect to participate in the Lightning Network. We can see the same thing in how existing systems that don't meet our L2 criteria, such as Cashu, rely heavily on Lightning to be useful at all: Cashu's main usage is probably sending and receiving Lightning payments.

3.1.1 Non-interactive channels

This construction reduces interactivity requirements and optimizes Lightning channels by using OP_CTV. But since it does not improve on the "one UTXO per user" scaling limit, we won't discuss it further.

3.2 Channel Factory

In this construction, multiple parties coordinate to fund a single n-of-n multisignature address, along with pre-signed transactions that spend that address to create n distinct UTXOs splitting up the funds. Each of those n UTXOs is then used as a payment channel. The channels are exactly as secure as channels opened directly on-chain, because whenever channel state needs to be published, the transaction splitting the funds can be mined. This can save on-chain space when channels are closed, since, in principle, the $n$ parties can cooperatively close all $n$ channels at once.
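
A back-of-the-envelope sketch of the happy-path savings, with assumed sizes (43 vB per taproot output, 57.5 vB per keypath spend, per-transaction overhead ignored):

    # Rough happy-path blockspace comparison: n separate channels vs. one factory.
    TAPROOT_OUTPUT_VB = 43
    KEYPATH_SPEND_VB = 57.5

    def open_and_close_vbytes(n_channels: int, factory: bool) -> float:
        # A factory puts all n channels inside one shared UTXO; otherwise one each.
        n_utxos = 1 if factory else n_channels
        return n_utxos * (TAPROOT_OUTPUT_VB + KEYPATH_SPEND_VB)

    print(open_and_close_vbytes(10, factory=False))   # 1005.0 vB
    print(open_and_close_vbytes(10, factory=True))    # 100.5 vB, 10x less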

Because the channel factory coordinates around a UTXO that could be mined, but is not expected to be mined on the happy path, it is a very primitive example of a V-UTXO.

Channel factories require no soft fork. However, the simple channel factory described above is probably impractical beyond a small number of participants, since everyone must cooperate for the scaling benefit to actually materialize. Covenant proposals such as OP_Evict or CTV (via txout trees) aim to fix this by allowing more fine-grained outcomes to be published: a single party can be evicted on-chain without forcing everyone else on-chain at once.

3.3 Eltoo/LN-Symmetry

Because Eltoo is a bad and confusing name, we will use the newer name "LN-Symmetry" below.

While Poon-Dryja channels encourage publication of the correct, most recent state by punishing the publication of old states, LN-Symmetry instead allows old states to be updated by an additional transaction. The benefit is that it simplifies Lightning channels by removing the complexity of penalties. But this can be a disadvantage in adversarial situations, since penalties are arguably needed to discourage fraud.

LN-Symmetry requires a soft fork to enable SIGHASH_ANYPREVOUT to allow new state transactions to double-spend old state transactions.

On its own, LN-Symmetry doesn’t bring any scaling benefits to traditional lightning channels. But its backers argue that it will make channel factories easier to implement.

3.4 Ark

Ark takes a new approach to transaction scaling: fully transferable virtual UTXOs (V-UTXOs) that can be merged and split in atomic [7] off-chain transactions. In Ark, a central coordinator, the Ark Service Provider (ASP), provides users with V-UTXOs that have a defined lifetime, e.g. 4 weeks. These periods are known as "rounds". V-UTXOs are created out of pool transaction outputs, one per round, via some mechanism such as CTV that allows a single on-chain transaction output to commit to a tree of V-UTXOs. Round expiry is key to Ark's scalability: at the end of a round, the pool transaction output unlocks, allowing the ASP to unilaterally spend it with a single signature in a small transaction. Because rounds expire, the V-UTXOs created from a pool transaction output expire too: a user holding a V-UTXO must either spend it before the corresponding pool transaction output expires, or publish it on-chain (unilateral withdrawal).

To transact between V-UTXOs, the Ark coordinator co-signs transactions spending one or more V-UTXOs that are only valid if one or more other V-UTXOs are created in another round. Combined with some carefully designed timeouts (see the Ark documentation for the full details), this dependency is what makes spending V-UTXOs trustless: the old V-UTXOs cannot be claimed on-chain unless the new V-UTXOs have been created in a new pool transaction. There are a few ways this dependency could be implemented, but the exact details are not relevant to this article.

Note that this means a given ASP operates many active rounds at once. New rounds are created frequently to allow funds in existing rounds to be transferred, and the existing rounds overlap with the new ones, since they generally expire after the new rounds (and new pool transactions) are created.

3.4.1 Ark’s Economic Model

When a V-UTXO is spent, the ASP must provide matching BTC in a pool transaction output representing a new round, but it cannot reuse the value of the spent V-UTXO until the old round ends. So spending a V-UTXO has a cost: the time value of money, because the ASP has to front the funds.

Specifically, this cost is incurred when the V-UTXO is spent . When the V-UTXO is not spent, it represents a very real potential UTXO that can be posted to the chain to unilaterally withdraw funds; users control their own funds. However, in order to spend this V-UTXO, the ASP must create a new pool transaction output using funds obtained by the ASP from elsewhere, and the funds in the spent V-UTXO cannot be used by the ASP until its round expires.

Therefore, spending a V-UTXO requires a short-term loan covering the time from now until the round expires. This means that the liquidity cost of spending a V-UTXO declines as the V-UTXO ages and the round's expiry approaches, in theory eventually reaching zero when the round finally expires.

Finally, remember that the cost of spending a V-UTXO is proportional to the total value of the V-UTXO spent, not to the amount paid to the recipient. A wallet that transacts with V-UTXOs directly (as opposed to managing a single V-UTXO backing, say, a V-UTXO-based Lightning channel) therefore faces a trade-off in how many V-UTXOs to split a given sum of funds into. A single V-UTXO minimizes the cost of unilateral withdrawal but maximizes liquidity-based transaction fees; splitting into many V-UTXOs does the opposite. This is entirely unlike the economics of on-chain Bitcoin or Lightning transactions.

What does liquidity cost? As of this writing, the Lightning wallet Phoenix charges a 1% fee for channel liquidity lasting one year; in the worst case Phoenix would have to tie up funds for the full year. However, this assumes the liquidity goes unused. It is quite possible Phoenix's cost of capital is actually higher than 1% per year, with Phoenix assuming the average customer uses up their inbound liquidity in less than a year. Phoenix also earns transaction fees, which can subsidize channel liquidity. And finally, Phoenix might simply not be profitable!

The yield on US Treasury bills gives us another estimate. At the time of writing, the 3-month T-bill yield is about 5% annualized. Since there's an argument that this figure is inflated by expected USD inflation, we'll assume the liquidity cost of BTC-denominated funds is 3% per year for our analysis.

If a round lasts four weeks, a transaction's liquidity cost starts at $3\% \div \frac{52}{4} \approx 0.23\%$ and gradually falls to zero. Assuming users move their funds two weeks before the round expires, the liquidity cost of self-custody works out to about 1.5% per year. On the other hand, a user who waits until the last moment [8] pays a liquidity cost approaching zero, at the risk of missing the expiry deadline.
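
A minimal sketch of this cost curve, assuming the simple linear cost-of-capital model above:

    # Liquidity cost of spending an Ark V-UTXO, per the reasoning above.
    ANNUAL_RATE = 0.03             # assumed BTC-denominated cost of funds

    def spend_cost_fraction(weeks_until_round_expiry: float) -> float:
        """Fraction of the spent V-UTXO's value that the ASP must finance."""
        return ANNUAL_RATE * weeks_until_round_expiry / 52

    print(spend_cost_fraction(4))  # freshly created V-UTXO: 3% / (52/4) ~= 0.23%
    print(spend_cost_fraction(2))  # halfway through the round: ~0.12%
    print(spend_cost_fraction(0))  # at expiry: 0.0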

Users may well not consider this cheap. And this assumes that the fixed costs of each round, transaction fees among them, are amortized over a large number of participants and are therefore insignificant.

But what if the fixed costs aren't insignificant? Suppose an ASP has 1,000 users and pool transactions are created once an hour on average. Over four weeks, that's 672 on-chain transactions. Which means that simply to hold their funds, this ASP's users collectively have to pay for almost as many transactions as there are users! It would probably be cheaper for them to open their own Lightning channels, even though the ASP is asking them to wait up to an hour for confirmation.

3.4.2 Cold Start Ark

A new ASP with few users faces a dilemma: either rounds happen infrequently, and users must wait a long time for the round that gathers enough V-UTXOs to achieve useful scaling and fee savings; or pool transactions happen frequently, and each user pays high transaction fees. As shown in the previous section, it can take a lot of users to amortize frequent rounds and their underlying pool transactions.

Round expiry makes this problem worse, and worse than the equivalent for Lightning channels: at least a Lightning channel can remain useful indefinitely, so the cost of opening one can be amortized over many months. Second, because rounds expire, there is less flexibility about when to create the new transaction outputs backing them: if fees stay high for a week or two, users whose pool transaction outputs are expiring have no choice but to (collectively) pay those high fees to maintain custody of their funds. Lightning channels allow much more flexibility about when to actually open a channel.

While the Ark authors started out very optimistic, imagining new rounds every few seconds, unless transaction fees can be subsidized, Ark's initial launch will probably be limited to use cases that can tolerate waiting several hours for a transaction to confirm.

3.4.3 Interactivity

Non-custodial Ark is a highly interactive protocol: because your V-UTXOs expire, you have hard deadlines to interact with your ASP, or else the ASP can take your funds. Nor can this interactivity be outsourced: whereas Lightning has "watchtowers" that can deter counterparties from trying to defraud you even while your node is offline, Ark V-UTXO owners must use their own private keys to refresh their funds trustlessly. The closest thing Ark has to a watchtower would be signing transactions that allow a watchtower to unilaterally withdraw your funds on-chain toward expiry, which carries a significant transaction-fee cost.

Consider what happens to a V-UTXO whose owner goes offline: after the round expires, the ASP needs to recover the funds to meet its liquidity needs in future rounds. If a V-UTXO owner is offline, putting their V-UTXO on-chain carries significant cost, as the ASP has to redeem multiple levels of the V-UTXO tree. The ASP can recreate the unspent V-UTXOs in a new round, but from the V-UTXO owners' perspective this isn't trustless, because they cannot spend those new V-UTXOs without obtaining data from the ASP [9]. The ASP could instead simply record unspent V-UTXOs as custodial balances. Or it might even have a policy of confiscating the funds!

My personal opinion is that, given the not-insignificant cost of self-custody in Ark, many users will instead choose ASPs that automatically roll funds over into new rounds, and accept the risk of fraud at the end of each round. That is cheaper than proactively moving funds early enough to be safe against, for example, failing to turn on the phone holding the controlling wallet in time for it to move the funds.

3.4.4 More Advanced Ark

Ark's liquidity requirements could plausibly be reduced with more advanced covenants, if the typical case is that liquidity is used partway through a round. For example, suppose 50% of the total V-UTXO value in a pool transaction output has been spent. If the ASP could redeem just that portion of the round's pool transaction output, it could recover funds sooner, reducing its overall liquidity costs. While no concrete proposal for this has been published, it seems plausible with sufficiently advanced covenants. Most likely this would happen via some kind of "script revival" soft fork that adds many useful opcodes at once.

Similarly, sufficiently advanced covenants could let the full txout-tree structure be replaced by some kind of rolling withdrawal scheme, saving space. We will discuss this topic in a later section, as the technique is potentially useful for other schemes too.

The end-of-round custody problem is another one that sufficiently advanced covenants could solve: a covenant, in particular one whose conditions can be verified with zero-knowledge proofs (ZK proofs), could force the ASP to recreate all unspent V-UTXOs in the next round, removing the problem of custody reverting to the ASP when a round ends. This probably isn't enough to make the scheme trustless, since the user may still need data from the ASP to spend their V-UTXO in the new round, but it would prevent the ASP from profiting by defrauding offline users.

3.4.5 On-chain fee payment for unilateral withdrawal

Similar to Lightning channels, the economics of on-chain fee payment, and the actual value of a V-UTXO after fees, determine whether Ark usage meets our L2 definition of unilateral withdrawal with no profitable theft by the ASP. We will discuss the specifics further when we discuss the txout-tree design pattern.

3.5 Validity Rollups

Validity rollups are a broad class of sidechain-like constructions, usually proposed to use some form of zero-knowledge (ZK) proof technology to enforce the chain's rules. The ZK-proof technology is the critical difference between validity rollups and other forms of sidechain: if the proof scheme works, the validity of transactions is guaranteed by mathematics rather than by trusting a third party. The "zero knowledge" property of ZK proofs is not actually required here: it's perfectly fine if the proof "leaks" information about what it proves. It just happens that most of the suitable mathematical schemes are zero-knowledge proof schemes.

From a Bitcoin perspective, validity rollup schemes require a covenant, because we want to create UTXOs for the scheme that can only be spent if the scheme's rules are followed. That is not necessarily a decentralized process: many validity rollup schemes are in fact entirely centralized, with the rollup proof proving only that a centralized transaction sequencer applied the rules correctly to a given sequence of transactions.

As for what covenant to use: zero-knowledge proof technology is still a very young field, with advances happening regularly. So it's very unlikely we'll see opcodes added to Bitcoin that directly verify any particular ZK-proof scheme. Instead, it is generally accepted that specific schemes would use more general opcodes, OP_CAT in particular, to verify ZK proofs in script. For example, StarkWare is campaigning to get OP_CAT adopted.

Validity rollups are a huge topic, with many low-substance, high-hype projects; we won't discuss them further beyond pointing out which opcodes could make this class of design viable.

3.6 BitVM

Very roughly speaking, BitVM is a way to construct a Lightning channel between two parties such that the channel's rules are enforced by zero-knowledge proofs. Because it can be implemented on Bitcoin today without covenants, and because it cannot directly be used to create an L2 system scaling beyond one UTXO per user, we won't discuss it further.

3.7 Hierarchical Channels

Hierarchical channels [10] aim to make channel resizing fast and cheap: "Hierarchical channels do for channel capacity what Lightning channels do for bitcoin." Fundamentally, however, they still do not exceed the one-UTXO-per-user limit, nor do they require any changes to the Bitcoin protocol. So we won't discuss them further. Their proponents should just implement them! They don't need our permission.

3.8 CoinPool

CoinPool allows multiple users to share a single UTXO, transfer funds among themselves, and unilaterally withdraw. The CoinPool proposal paper requires three new soft-fork features: SIGHASH_ANYPREVOUT; SIGHASH_GROUP, allowing a signature to apply to only some of the UTXOs; and OP_MerkleSub, to verify removal of specific branches from a Merkle tree. The latter could also be achieved with OP_CAT.

At present, CoinPool development appears to have stagnated; the specification webpage was last updated two years ago.

3.9 Enigma Network

While I was asked to cover the Enigma Network, there appears to be a lack of documentation on what the proposal actually is. Bitfinex's blog post offers a collection of slogans; the MIT page is blank. Since the blog post doesn't really make clear what is actually being built, we won't discuss it further.

4 Mempool Considerations

Bitcoin Core’s current mempool policy is not ideal for L2 systems. Here we describe some of the major challenges these systems face, along with potential improvements.

4.1 Transaction Pinning

Ultimately an economic exploit, "transaction pinning" refers to a variety of situations in which someone can intentionally (or unintentionally) make a target transaction difficult to mine because a conflicting transaction was broadcast first and is not itself being mined. This is an economic exploit because, in a true pinning situation, the target transaction is the one miners would profit from mining, while the conflicting pinning transaction goes unmined for a long time, possibly forever.

The simplest example of pinning comes from the fact that, without full-RBF (nodes treating all transactions as replaceable by default), transaction replacement can be turned off. So a low-fee transaction with replacement disabled can sit unmined indefinitely, and yet cannot be replaced. Essentially all block producers have now enabled full-RBF, which fixes this; and at the time of writing, full-RBF should be enabled by default in the next release of Bitcoin Core (after 11 years of effort!).

That leaves BIP-125 rule #3 pinning as the only remaining pinning attack relevant to multiparty L2 protocols that is not fixed in Bitcoin Core. Quoting BIP-125 rule #3:

A replacement transaction must pay a higher absolute fee (not just a higher fee rate) than the sum of the fees paid by all the transactions it replaces.

This rule can be exploited by broadcasting a large pinning transaction (or group of transactions) at a low fee rate, spending outputs relevant to the multiparty protocol. Because its fee rate is low, it won't be mined any time soon, maybe ever. Yet because its total fee is high, replacing it with a different transaction is uneconomical.
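
A quick sketch of the economics, with illustrative sizes and feerates:

    # Why BIP-125 rule #3 makes pinning cheap to mount and costly to escape.
    pin_vbytes = 100_000           # large pinning transaction
    pin_feerate = 1.0              # sat/vB: too low to be mined any time soon
    pin_total_fee = pin_vbytes * pin_feerate      # 100,000 sats of absolute fee

    honest_vbytes = 300            # the small transaction the victim needs mined
    required_fee = pin_total_fee + 1              # rule #3: beat the absolute fee
    print(required_fee / honest_vbytes)           # ~333 sat/vB just to replace the pin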

BIP-125 rule #3 pinning is readily fixed by "replace-by-fee-rate" (RBFR), which fixes it in all situations. Unfortunately, it's unclear whether RBFR will be adopted by Bitcoin Core any time soon, as they have spent considerable effort on an inferior, partial solution, "TRUC/V3 transactions".

4.2 Fee Payment

RBF, CPFP, SIGHASH_ANYONECANPAY, anchor outputs, and fee sponsorship

Because fee rates are unpredictable, paying fees reliably and economically is hard when transactions are pre-signed. The gold standard of fee payment is RBF (replace-by-fee): start with a "lowball" estimate, then replace the transaction with progressively higher-fee versions as needed until it is mined. For example, the OpenTimestamps calendar software has paid fees this way for years, and LND added support for deadline-aware RBF in v0.18.

RBF is the gold standard because it is the most blockspace-efficient option in almost all [11] situations: a replacement transaction needs no additional inputs or outputs relative to a transaction that guessed the right fee to begin with.

Efficiency matters because inefficiencies in fee payment make out-of-band fee payment a profitable cost saving for large miners, while smaller, decentralized miners cannot benefit: paying a small miner out of band in the hope they confirm a transaction is impractical and pointless. Out-of-band fee payment can also raise AML/KYC issues: most out-of-band fee payment systems available today require some kind of AML/KYC process, with the notable exception of the mempool.space accelerator, which at the time of writing (August 2024) accepts Lightning payments without an account.

To use RBF directly with pre-signed transactions, you need to pre-sign fee variants covering the full range of plausible fees. While this is quite feasible in many cases, since the number of variants needed is usually small [12], the production Lightning protocol, along with other proposed protocols, has so far opted instead for child-pays-for-parent (CPFP), usually via anchor outputs.
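
One plausible reason the number of variants can stay small [12] is that fee rates can be stepped geometrically; a minimal sketch of the idea:

    # Pre-signing a handful of fee variants covers a wide feerate range
    # when each variant doubles the feerate of the previous one.
    def fee_variants(min_rate: float, max_rate: float, step: float = 2.0):
        rate = min_rate
        while rate <= max_rate:
            yield rate
            rate *= step

    print(list(fee_variants(1, 1024)))   # 11 variants span 1 to 1024 sat/vB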

The idea behind anchor outputs is to add one or more small (or zero-value) outputs to a transaction, which can be spent by child transactions that add fees, i.e. CPFP. Applied to protocols with small on-chain transactions, such as Lightning channels, this is naturally quite inefficient: with ephemeral anchor outputs it can nearly double the total size of a commitment transaction. It matters less for protocols with larger transactions, such as covenants implemented via OP_CAT.

A less obvious problem with anchor outputs is the need to keep extra UTXOs around to spend them with fee-paying child transactions. In a typical "end-user" application, this can be a significant overall burden, since without anchor outputs there is often no need to hold more than one UTXO at all. Indeed, some existing consumer-facing Lightning wallets are likely vulnerable to theft by channel counterparties in high-fee environments, because they are unable to pay the fees needed.

SIGHASH_ANYONECANPAY can be used for fee payment in some cases, since it allows extra inputs to be added to a signed transaction; SIGHASH_SINGLE additionally allows outputs to be added. Lightning uses these for HTLC transactions. At present, if not handled carefully [13], this practice is vulnerable to pinning, as an attacker can add lots of inputs and/or outputs to produce a high-fee, low-fee-rate pin. RBFR fixes this; the approach used in TRUC/V3 transactions does not. This style of fee payment is not as efficient as RBF, but it can be more efficient than anchor outputs.

Finally, there are various soft-fork proposals to add a fee sponsorship system to the Bitcoin protocol. These allow transactions to declare dependencies on other transactions, such that the sponsor transaction can only be mined if the sponsored transaction is also mined (most likely in the same block). This could be far more efficient than traditional CPFP, since the sponsor transaction can declare the dependency in many fewer bytes than a transaction input takes.

4.3 Replacement Cycling Attacks

The "Substitution Transaction Cycle Attack" [14] attempts to block a target L2 transaction with substitution transactions long enough to allow a less valuable transaction to be mined. Essentially, the Substitution Transaction Cycle Attack is an alternative to the Transaction Pinning Attack for the attacker, as the attacker's intent is to block a good, honest transaction long enough to allow a less valuable, dishonest transaction to be mined. However, the Substitution Transaction Cycle Attack cannot be triggered accidentally.

The canonical example is an HTLC in a Lightning channel. While it is tempting to think of an HTLC as a contract that is either spent by revealing a preimage, or spent via a timeout, in actual Bitcoin script it means the output can always be spent by revealing the preimage; the timeout merely enables an additional spending path.

Replacement cycling exploits this by repeatedly using the preimage spend, after the timeout, to replace the transaction trying to redeem the HTLC via the timeout path, while keeping the preimage hidden from the victim. A successful replacement cycling attack keeps this going long enough for the HTLC in the other channel to time out.

A major challenge in profiting from replacement cycling is that each round of the attack costs money. A deadline-aware Lightning implementation will spend ever-higher fees attempting to spend (redeem) the expired HTLC output before the HTLC output of the next channel in turn expires. Second, anyone can defeat the attack by simply rebroadcasting the replaced transaction [15] once a replacement cycle ends.

Like pinning, replacement cycling is also an economic exploit against miners: at the end of each cycle, a transaction has been removed from the mempool that was perfectly valid and minable, and which miners would have profited from had they kept it.

5 Feature Patterns and Soft Forks

Now that we've outlined the challenges that covenant-dependent L2 systems face, including the mempool challenges, let's distill the discussion into a set of notable soft-fork features (mainly new opcodes) and design patterns these L2 systems share. For the soft-fork proposals, we'll also discuss their proposal-specific technical risks and deployment challenges.

5.1 OP_Expire

Let's get this one out of the way first. OP_Expire was proposed [16] as a way of eliminating replacement cycling attacks at their root: the fact that an HTLC can be spent in two different ways at once. In the context of L2 systems, this is relevant to everything using HTLCs or similar mechanisms, and possibly to other use cases as well. OP_Expire would make a transaction output unspendable after a point in time, allowing the HTLC's spending conditions to be a true exclusive-OR rather than a "programmers' OR".

An actual OP_Expire soft fork would most likely consist of two features, similar to how OP_CheckLockTimeVerify and OP_CheckSequenceVerify each pair a transaction field with an opcode:

  1. The expiration height field of the transaction, most likely implemented via a taproot annex.
  2. An OP_Expire opcode that checks that the transaction's expiration height is not less than the target height.

While OP_Expire itself hardly qualifies as a covenant, it does appear useful for many covenant-dependent L2 systems. However, given that replacement cycling can also be mitigated by rebroadcasting replaced transactions [15], it may not be useful enough.

A notable challenge in deploying and using OP_Expire is reorgs: ever since the infamous reorg bug [17], the Bitcoin technical community has tried to ensure that the Bitcoin consensus protocol works in such a way that previously mined transactions can always be included in new blocks, even after a deep reorg. This design principle tries to avoid the nightmare scenario in which a consensus failure causes a large reorg, leaving large numbers of confirmed UTXOs suddenly and permanently invalid, with everyone who relied on those coins losing their funds.

In the event of a large reorg, transactions using the expiration mechanism could become unminable after reaching their expiration height. The OP_Expire proposal mitigates this by treating the outputs of expiration-using transactions similarly to coinbase outputs, making them unspendable for 100 blocks.

A significant barrier to deploying a transaction expiration mechanism is reaching consensus on whether this trade-off is acceptable, or even needed at all. The cases where OP_Expire would be used already involve long timeouts during which user funds are frozen; adding yet more wait time is undesirable. Also, after a reorg, double spends offer another way for coins to be invalidated: with the increasing adoption of RBF, and the possible adoption of keyless anchor outputs, how much would transaction expiration really change things?

5.2 SIGHASH_ANYPREVOUT

BIP-118 proposes two new signature hash modes, neither of which commits to the specific UTXO being spent: SIGHASH_ANYPREVOUT, which (essentially) commits to the scriptPubKey instead, and SIGHASH_ANYPREVOUTANYSCRIPT, which allows any script. As discussed above, this was originally proposed for LN-Symmetry, to avoid needing to sign a separate response to every possible channel state.

SIGHASH_ANYPREVOUT could also be useful when we want to use pre-signed RBF fee variants, because the signature no longer depends on a specific transaction id, avoiding the combinatorial explosion of fee variants. However, the current BIP-118 proposal doesn't address this use case, and may be incompatible with it, because SIGHASH_ANYPREVOUT is proposed to also commit to the value of the UTXO.

An initial objection to SIGHASH_ANYPREVOUT was that wallets could get themselves into trouble by using it inappropriately: once a SIGHASH_ANYPREVOUT signature has been published, it can be used to spend any transaction output with the same script. So a simple replay attack could steal funds if a second output with the same script is ever accidentally created. However, since wallets and L2 implementations have plenty of other ways to shoot themselves in the foot, this concern seems to have died down.

At this point, the broader technical community seems reasonably optimistic about implementing BIP-118. However, as we discussed when discussing LN-Symmetry, people are also debating whether its main use case - LN-Symmetry - is a good idea.

5.3 OP_CheckTemplateVerify

The first proposal designed specifically for covenants that we'll discuss is OP_CheckTemplateVerify, commonly known as "CTV". It aims to create a very specific, restricted covenant opcode that does exactly one thing: it hashes the spending transaction in a specified way, excluding the input UTXOs, and checks that the resulting digest matches the top element of the script stack. This allows the spending transaction to be constrained in advance, without enabling true recursive covenants.

Why can't CTV do recursive covenants? Because of the hash function: CTV checks the spending transaction against a template hash, and there is no way [18] to create a template that contains a CTV invocation committing to its own hash.

That said, this isn't necessarily a real limitation: on a modern computer you can easily hash a chain of CTV templates tens of millions of transactions deep in a few seconds. Combined with relative nSequence time-locks and limited block space, such a chain could easily be made to lock funds up for thousands of years.

The current CTV proposal in BIP-119 has only one hash mode, DefaultCheckTemplateVerifyHash, which essentially commits to every aspect of the spending transaction in the template hash. In practice this means that in many circumstances CPFP will be the only available fee-payment mechanism. As mentioned before, this is a potential problem, since it makes out-of-band fee payment a meaningful cost saving when CTV-using transactions are small.
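
For intuition, here is a simplified Python sketch of what a CTV-style template hash commits to; the exact BIP-119 encoding differs (for instance, it also commits to a hash of the scriptSigs when any are non-empty), so treat this as illustrative rather than the real algorithm:

    import hashlib
    import struct

    def sha256(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def ctv_style_template_hash(version: int, locktime: int, sequences: list[int],
                                serialized_outputs: list[bytes],
                                input_index: int) -> bytes:
        """Commits to everything about the spending transaction except the
        input outpoints. Since the outputs are committed, a template cannot
        contain its own hash, and any fee tweak changes the hash."""
        h = hashlib.sha256()
        h.update(struct.pack('<i', version))                   # nVersion
        h.update(struct.pack('<I', locktime))                  # nLockTime
        h.update(struct.pack('<I', len(sequences)))            # number of inputs
        h.update(sha256(b''.join(struct.pack('<I', s) for s in sequences)))
        h.update(struct.pack('<I', len(serialized_outputs)))   # number of outputs
        h.update(sha256(b''.join(serialized_outputs)))
        h.update(struct.pack('<I', input_index))               # input being checked
        return h.digest()

Because the outputs, and thus all the output amounts, are committed, any fee adjustment changes the hash, which is why CPFP ends up as the fee-payment mechanism.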

To be fair, CTV enjoys relatively broad support in the technical community compared to other proposed covenant opcodes, thanks to its relative simplicity and wide applicability.

5.3.1 LNHANCE

One proposal to get CTV deployed is to combine it with two further opcodes, OP_CheckSigFromStack(Verify) and OP_InternalKey. The problem is that, at the time of writing, the documentation in the pull request and the associated BIPs is simply insufficient to argue for or against the proposal. The BIPs contain essentially no rationale for what the opcodes are expected to accomplish in real-world use, let alone worked example scripts.

While the authors probably have good reasons for their proposal, it is on them to explain those reasons and justify them properly. So we won't discuss it further.

5.4 OP_TXHASH

Similar to CTV, this proposal achieves non-recursive covenant functionality by hashing data from the spending transaction. Unlike CTV, the TXHASH proposal provides a "field selector" mechanism, allowing flexibility in exactly how the spending transaction is constrained. This flexibility serves two main goals:

  1. Allowing fees to be added to a transaction without breaking protocols built on chains of multiple transactions.
  2. Allowing multi-user protocols in which users constrain only their own inputs and outputs.

The main problem with OP_TXHASH is that the field-selector mechanism introduces considerable complexity, making review and testing harder than for the much simpler CTV proposal. At the moment there has been little design analysis of what benefits the field selector would actually deliver, or how exactly it would be used. So we won't discuss it further.

5.5 OP_CAT

This opcode concatenates the top two elements of the stack and pushes the result back onto the stack. OP_CAT was enabled when Bitcoin first shipped, but Satoshi quietly removed it in 2010, because the initial implementation put no size limit on the resulting element, leaving it vulnerable to DoS attacks. Consider the following script:

 DUP CAT DUP CAT...

Without a limit on stack element size, each DUP CAT iteration doubles the size of the top stack element, eventually using up all available memory.
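
The growth rate is easy to see in a quick simulation:

    # Each DUP CAT doubles the top stack element; 30 iterations reach ~1 GiB.
    size = 1                       # bytes in the initial stack element
    for _ in range(30):
        size *= 2                  # DUP CAT: concatenate the element with itself
    print(size)                    # 1073741824 bytes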

Concatenation is sufficient to implement many kinds of covenants, including recursive covenants, as follows:

  1. Assemble a partial transaction, minus the witness data, on the stack using one or more invocations of OP_CAT (plus whatever covenant-specific logic is needed).
  2. Verify that the transaction on the stack matches the spending transaction.

It turns out that by abusing the math of Schnorr signatures, the second step can be performed with OP_CheckSig via carefully constructed signatures. However, it is more likely that an OP_CAT soft fork would be combined with OP_CheckSigFromStack, which allows the second step to be performed by checking that a signature on the stack is a valid signature for the transaction [19], and then reusing that same signature with OP_CheckSig to verify that the spending transaction matches [20].

The fact that we only need to assemble transactions , and not witness data, is a key point: the constraints only need to verify what the transaction does - its inputs and outputs - and not whether the witness data (if any) makes that operation valid.

Script size limits aside, OP_CAT combined with OP_CheckSigFromStack is sufficient to build many kinds of covenants, including recursive ones. It is more expensive than more optimized solutions such as CTV. But the difference in cost is smaller than you'd expect!

Essentially, what using OP_CAT for this requires is putting a copy of all the non-witness data of the spending transaction onto the stack via the witness. For typical CTV use cases, such as txout trees, the spending transaction would otherwise have no witness data at all. Since witness space gets a 75% discount, this raises the effective transaction fee of the child transaction by only 25%. Not bad!
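
To see where the 25% figure comes from (ignoring the size of the covenant script itself): if the spending transaction consists of $x$ bytes of non-witness data, pushing a copy of those bytes into the witness adds $x$ witness bytes, which count at one quarter weight:

$$
x\,\mathrm{vB} + \frac{x}{4}\,\mathrm{vB} = 1.25x\,\mathrm{vB}
$$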

5.5.1 Is OP_CAT too powerful?

This is probably the biggest political and technical obstacle to deploying OP_CAT: it is very hard to predict what uses it would make possible. And once the "cat" is out of the bag, it is very hard to put it back in.

A good example is the argument that OP_CAT alone is enough to implement reasonably efficient and secure STARK (Scalable Transparent ARgument of Knowledge) verification in Bitcoin script. Since STARKs can prove very general statements, making STARKs possible to implement efficiently has significant ramifications beyond L2 systems, as it would allow many different systems to be built on top of Bitcoin. A strong objection is that these uses may not be good for Bitcoin users as a whole.

The creation of harmful, centralizing "miner extractable value (MEV)", which Matt Corallo has dubbed "MEVil", is a key potential problem. In short, MEVil is any circumstance in which large miners and pools can extract additional value through sophisticated transaction-selection strategies, beyond simply maximizing fees collected, strategies that are impractical for smaller miners. The sheer complexity of the financial instruments OP_CAT could create makes MEVil very difficult to rule out. Significant MEVil has already appeared on Bitcoin with token auction protocols; fortunately, that specific case was eliminated by the adoption of full-RBF.

Beyond potential MEVil, there are many other concrete uses of OP_CAT that could be harmful. For example, the Drivechain proposal, which we have reviewed before, is widely considered harmful to Bitcoin, and some believe it could be implemented with OP_CAT. Another example is token protocols such as Taproot Assets. While there is fundamentally nothing stopping them from using "client-side validation", there are proposals to implement them with OP_CAT in ways that may be more attractive to end users, but that could consume far more block space, potentially crowding out "legitimate" Bitcoin transactions. Those uses may also raise legal issues, given how often token protocols are used for financial fraud.

5.6 Incremental Hashing

For covenants, OP_CAT would primarily be used to concatenate data and then hash it. Another way to achieve the same goal is some kind of incremental hashing opcode: one that takes an intermediate state of a SHA256 computation and hashes additional data into it; SHA256 itself operates on 64-byte blocks. Many designs for incremental hashing opcodes are possible.

An important design decision is whether to expose the actual hash midstate bytes on the stack, or to keep them in some opaque form that cannot be directly manipulated.
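
Python's hashlib illustrates the incremental-hashing semantics, though it exposes the midstate only as an opaque copyable object rather than raw bytes:

    import hashlib

    # Incremental hashing: capture an intermediate SHA256 state, extend it later.
    h = hashlib.sha256()
    h.update(b'fixed transaction prefix')   # data known in advance
    midstate = h.copy()                     # snapshot of the intermediate state

    midstate.update(b'variable suffix A')   # extend the snapshot one way...
    print(midstate.hexdigest())

    h.update(b'variable suffix B')          # ...and the original another way
    print(h.hexdigest())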
