Author: roasbeef
Source: https://delvingbitcoin.org/t/ln-summit-2024-notes-summary-commentary/1198
About three weeks ago, more than 30 Lightning Network developers and researchers gathered in Tokyo, Japan, and over three days discussed many issues related to the current state and future development of the Lightning Network protocol (as well as the related Bitcoin P2P and consensus protocols).
The last such gathering took place in Oakland, California, USA in June 2022: LN Summit 2022 Notes & Summary/Commentary . Rough meeting notes/summary can be found here: LN Summit Oakland 2022 - Google Docs .
My notes from the Lightning Developers Meeting can be found here: Lightning Summit Tokyo - 2024 - Google Docs . And our agreed-upon daily schedule can be found here: lightning summit.md · GitHub .
It is worth mentioning that many of the breakout discussions are not captured in my notes (although I tried my best) or reflected in the schedule above.
Having said all that, here is my attempt to summarize the main discussion topics and conclusions (with some comments). If attendees find anything inaccurate or incomplete in my summary, please reply and fill in the blanks regarding the discussions and decisions that took place during the three days.
What's the status of package relay and V3 commitment transactions?
This was the first major topic we discussed, prompted by the latest release candidate of Bitcoin Core 28.0 (the final version has since been released at the time of writing).
Fee Estimates and Basic Commitments
Before jumping into the latest and greatest new proposed commitment transaction design to date, I want to briefly describe how today’s commitment transaction design works and where it falls short.
(If you already know how commitment transactions work in current lightning channels, you can skip this section)
A key aspect of the Lightning Network's design is the concept of a "unilateral exit": at any time, either party must be able to force-close the channel and get their funds back after a delay. This delay is needed because a party may try to defraud their counterparty by publishing an old, revoked state to the chain. The time window allows the honest party to contest the counterparty's claim to the funds by proving knowledge of a secret value that is only revealed when a state is revoked.
This ability to unilaterally exit a channel is also a key building block for enforcing HTLC contracts; HTLCs implement a "claim or refund" mechanism, which is what makes multi-hop Lightning payments possible. These contracts come with another timing component: an absolute timelock. On a path forwarding a Lightning payment, each hop has a window of T blocks (the CLTV delta) in which to confirm its commitment transaction and time out HTLCs that have failed to settle. If the commitment transaction cannot be confirmed in time, the hop risks losing funds, because the incoming HTLC it is owed will time out, resulting in a timeout/claim race.
(Translator's note: when forwarding a payment, a node receives an "incoming HTLC" from its previous node and offers an "outgoing HTLC" to its next node. If the downstream node neither fails the payment nor sends back the preimage, but simply stalls, the node's only recourse is to get its commitment transaction confirmed on-chain and reclaim the value of the outgoing HTLC via the timeout path. However, the incoming HTLC also has an expiry. Once the incoming HTLC expires, the downstream node may still claim the outgoing HTLC with the preimage (entering a fee race), leaving the forwarding node with a net loss of funds. The node can only avoid this race/loss by getting the outgoing HTLC timed out on-chain before the incoming HTLC expires.)
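The per-hop safety condition described above can be sketched as a one-line check (the function and parameter names here are illustrative, not taken from any implementation):

```python
# Sketch of the per-hop timelock check: a forwarding node only accepts an
# HTLC pair if the incoming expiry leaves it at least `cltv_delta` blocks
# to confirm its commitment transaction and time out the outgoing HTLC
# on-chain before the incoming HTLC can be reclaimed upstream.

def can_forward(incoming_expiry: int, outgoing_expiry: int, cltv_delta: int) -> bool:
    """The incoming HTLC must expire at least cltv_delta blocks after the
    outgoing one, or the node risks the timeout/claim race described above."""
    return incoming_expiry - outgoing_expiry >= cltv_delta

# A hop advertising a 40-block delta is safe here...
assert can_forward(incoming_expiry=840_100, outgoing_expiry=840_060, cltv_delta=40)
# ...but not if the remaining window shrinks below its delta.
assert not can_forward(incoming_expiry=840_100, outgoing_expiry=840_080, cltv_delta=40)
```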
A node's ability to get its commitment transaction confirmed in time (in the event of a unilateral exit) depends on its ability to make accurate fee estimates. If the commitment transaction (which cannot be modified unilaterally) carries an insufficient fee, it will not be confirmed in time (or even accepted into nodes' mempools!). Currently, the channel initiator can send an `update_fee` message to the counterparty to increase (or decrease) the fee rate of the latest commitment transaction. This is a critical tool, but it forces the initiator to either pay a significantly higher fee on the commitment transaction (to ensure it always confirms in the next block after broadcast) or go to great lengths to predict future fee rates. Because the initiator pays the entire fee of the commitment transaction, the responder cannot directly influence it; today, their only recourse if they find the fee level unacceptable is to force-close the channel.
Anchor outputs are here!
To address some of the shortcomings of the current commitment transaction format, "anchor outputs" were proposed [ 1 ]. The general idea is that each party gets a small output in the commitment transaction that only they can spend immediately (not the other party); this dust output allows fees to be attached to the commitment transaction after the fact. This design relaxes the fee-estimation requirement, but does not eliminate it: the goal is no longer to get the commitment transaction confirmed in the next block, but to attach a fee high enough to enter nodes' mempools. Once the transaction is in the mempool, either party can add fees to eventually get it confirmed. Moreover, this lets us bundle the second-stage HTLC transactions together, achieving greater batching when sweeping unsettled HTLC contracts.

However, the fee required to enter the mempool, like the fee required to enter the next block, is constantly changing. Because nodes have a default mempool size limit of about 300 MB, as more transactions arrive, nodes begin to evict the lowest-feerate transactions, raising the feerate floor for mempool entry. Eventually, the maximum fee an existing anchor spend can contribute may not be enough to clear that floor, and the commitment transaction will be dropped by nodes. At that point, the party that wants the transaction confirmed cannot even propagate it (over the existing P2P network), which means it may not confirm in time (or at all), so the race cannot be avoided.
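The floating mempool floor described above can be illustrated with a small sketch (a simplification of Bitcoin Core's actual eviction policy, which also adds an increment on top of the last evicted feerate):

```python
# Sketch: why the mempool's entry feerate floats. With a bounded mempool
# (default ~300 MB), nodes evict the lowest-feerate transactions as new
# ones arrive, and the entry floor rises to the feerate of the last
# evicted transaction. Here capacity is counted in transactions rather
# than bytes, purely to illustrate the mechanism.
import heapq

def min_entry_feerate(mempool_feerates, capacity):
    """Keep only the `capacity` highest-feerate txs; return the floor a
    new transaction must beat (0.0 while there is still room)."""
    kept = heapq.nlargest(capacity, mempool_feerates)
    return kept[-1] if len(kept) == capacity else 0.0

rates = [1.0, 2.5, 7.0, 12.0, 3.0, 9.0]  # sat/vB
assert min_entry_feerate(rates, capacity=4) == 3.0   # 1.0 and 2.5 evicted
assert min_entry_feerate(rates, capacity=10) == 0.0  # room left: any feerate ok
```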
Along the way, developers and researchers discovered a number of subtleties related to transaction propagation and mempool relay policy that effectively allow an adversary to "pin" [ 2 ] a transaction in the mempool (intentionally prevent it from being confirmed). The various known pinning attack surfaces exploit degenerate cases of BIP 125 and of the mempool policies in widespread use today (e.g. ancestor/descendant count limits).
V3, TRUC and 1P1C are here (this time)!
Next up are "V3 transactions" and "Topologically Restricted Until Confirmation (TRUC)". The ultimate dream of Lightning Network developers is to remove `update_fee` entirely and make commitment transactions zero-fee. This would take all the guesswork out of figuring out what the fee should be. However, done on its own, a zero-fee commitment transaction would not be accepted into mempools and therefore would not propagate across the Bitcoin P2P network.
The combination of TRUC (i.e. BIP 431), a new type of anchor output, and the opportunistic "1 parent 1 child" package relay strategy has emerged as the best known solution that can practically address the Lightning Network's current transaction relay and confirmation problems.
TRUC introduces a new set of transaction replacement rules intended to address the degenerate cases of BIP 125 in a handful of scenarios. In addition, it adds a new set of transaction topology and size constraints to further bound the problem. TRUC transactions use a version field value of 3 (rather than the sequence field, as BIP 125 does) to signal that the transaction opts into this new rule set.
Also in Bitcoin Core 28.0, a new standard output script type, "Pay-to-Anchor (P2A)", was made available [ 2 ]. P2A is a new, special SegWit v1 output (`OP_1 <0x4e73>`) intended for CPFP fee bumping. Inputs spending this type of output must carry an empty witness and can be spent without a signature. A future iteration of this new output type may eventually allow such outputs to be zero-value dust, as long as they are spent in the same block in which they are created (via CPFP).
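For reference, the P2A scriptPubKey is tiny enough to write out byte-for-byte; a minimal recognizer might look like this (the helper name is ours, not Bitcoin Core's):

```python
# The P2A scriptPubKey: witness v1 with the 2-byte program 0x4e73,
# i.e. OP_1 OP_PUSHBYTES_2 0x4e73, serialized as the 4 bytes 51024e73.

P2A_SCRIPT = bytes.fromhex("51024e73")

def is_pay_to_anchor(script_pubkey: bytes) -> bool:
    """Exact-match check: P2A is a single fixed script, not a template."""
    return script_pubkey == P2A_SCRIPT

assert is_pay_to_anchor(bytes.fromhex("51024e73"))
assert not is_pay_to_anchor(bytes.fromhex("0014" + "00" * 20))  # P2WPKH, not P2A
```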
The final building block in this new commitment transaction design is 1-parent-1-child (1P1C) package relay [ 4 ]. 1P1C is essentially opportunistic package relay: rather than relying on a new P2P message (which could take a long time to deploy across the whole network), it changes how nodes behave when they receive an orphan transaction (a transaction whose inputs are not yet known). Instead of merely storing the child in the orphan pool, the node opportunistically requests the parent from the peer that announced the child, even if the parent's feerate is below the local mempool's feerate floor.
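To make the interplay concrete, here is a minimal sketch (hypothetical field names, heavily simplified policy) of why 1P1C lets a zero-fee commitment transaction in: the parent and child are judged by their combined package feerate, not individually.

```python
# Sketch of the 1P1C acceptance decision: a zero-fee parent that would be
# rejected on its own is accepted when the child's fee lifts the combined
# package feerate above the mempool's entry floor.

def try_accept_1p1c(child: dict, parent: dict, mempool_min_feerate: float) -> bool:
    """Accept the pair if the combined package feerate clears the floor,
    even when the parent alone would be rejected."""
    pkg_rate = (parent["fee"] + child["fee"]) / (parent["vsize"] + child["vsize"])
    return pkg_rate >= mempool_min_feerate

parent = {"fee": 0, "vsize": 300}      # zero-fee commitment transaction
child = {"fee": 5_000, "vsize": 120}   # P2A anchor spend carrying the fee

assert parent["fee"] / parent["vsize"] < 10          # parent alone fails a 10 sat/vB floor
assert try_accept_1p1c(child, parent, mempool_min_feerate=10)  # but the package clears it
```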
Together, these three new transaction forwarding primitives can be used to redesign the commitment transaction format (i.e. anchors) for lightning channels to address many of the long-standing issues mentioned above. t-bast has already begun prototyping a new commitment transaction format: x.com .
That said, there are still some issues to be resolved, including:
- How to handle dust outputs from commitment transactions?
- Using the P2A approach, we can put all dust into anchor outputs, rather than having them become miner fees as is currently the case. This will solve some known issues related to excessive dust outputs, but will also bring new concerns, as P2A anchor outputs do not require signatures to spend.
- Should P2A outputs become “keyless”?
- If the output is keyless, then any third party can immediately help clear the anchor output (compared to currently, only the two parties participating in the channel can immediately clear the anchor output until 16 blocks after the anchor output is confirmed).
- Related to the above point, if we put all dust outputs into a P2A anchor, then whoever clears the P2A anchor can take the value of all dust outputs. Naturally, miners are the most reliable people to claim the funds, assuming there is no signature requirement.
- Some argue that we should keep the signature requirement, since it prevents third parties from interfering with participants trying to get the commitment transaction confirmed. This serves as a hedge against undiscovered flaws in the TRUC+P2A combination (which could open up new pinning attacks).
- Will the viral nature of V3 transactions impact certain advanced use cases related to channel splicing?
- All unconfirmed descendants of a V3 transaction must themselves be V3 transactions. The concern is that this will affect splicing use cases in which a node tries to serve multiple transaction flows with a single batched transaction. The viral nature of the V3 type forces anyone spending an unconfirmed change output to keep using V3 transactions, which they may not be able to do.
The new TRUC rules also enable a form of package RBF: if a new conflicting package arrives, the node will attempt to replace the existing package. In the context of 1P1C, this is also called "sibling eviction" [ 5 ].
All of the above information can be found in more detail in this wallet developer's handy guide [ 6 ].
This is probably one of the more concrete new initiatives that will come out of this meeting. From here, we will figure out what the new V3 commitment transactions look like, while waiting for enough nodes in the P2P network to upgrade to the new version so that we can rely on the new forwarding behavior.
It is also worth pointing out that this shift will further impact how wallets behind lightning nodes handle UTXO inventory. With this approach, given that commitment transactions do not carry fees, in order to confirm transactions, nodes must use an existing UTXO to anchor the commitment transaction, otherwise the commitment transaction will not even be broadcast. In practice, this means that wallets need to reserve on-chain funds in case of forced channel closures. Tools such as channel splicing and submarine swaps can also be used to allow wallets to transfer funds, or batch multiple on-chain interactions.
PTLC and Simplified Commitment Transaction Format
The next session focused on the combination of “Point Time Lock Contracts (PTLCs)” and “Simplified Channel State Machines”. At first glance, these two topics seem to have nothing to do with each other, but we will soon see that some of the tricky situations caused by the design considerations of PTLCs can be alleviated by modifying the current channel state machine protocol (let’s call it “Lightning Channel Commitment Protocol (LCP)”) to a simplified variant.
First, a brief introduction to PTLCs. In the current Lightning protocol, we use payment hashes to implement the multi-hop claim-or-refund construct that enables trust-minimized multi-hop payments. Although simple, this protocol has a big privacy drawback: the payment hash is identical in every channel along the forwarding path, so an adversary occupying two positions on the path can trivially link a payment (and correlate the shards of a "multi-path payment (MPP)").
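The linkability problem, and the decorrelation PTLCs buy, can be shown in a few lines. The second half models curve points with bare integers modulo a stand-in group order, purely for illustration; it is not real elliptic-curve math, and the tweak scheme shown is a simplification of actual PTLC proposals.

```python
import hashlib

# With HTLCs, every hop locks to the SAME payment hash, so two colluding
# hops can trivially match their observations.
preimage = b"\x01" * 32
hash_at_hop_1 = hashlib.sha256(preimage).digest()
hash_at_hop_4 = hashlib.sha256(preimage).digest()
assert hash_at_hop_1 == hash_at_hop_4  # trivially linkable

# With PTLCs, each hop i locks to a point tweaked by per-hop blinding
# values r_i chosen by the sender, so no two hops see the same lock.
n = 2**256 - 189           # stand-in group order for the sketch
t, r1, r2, r3 = 7_000, 11, 23, 42
lock_at_hop_1 = (t + r1) % n
lock_at_hop_3 = (t + r1 + r2 + r3) % n
assert lock_at_hop_1 != lock_at_hop_3  # colluding hops see unrelated values
```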
To fix this privacy leak, developers proposed many years ago to switch to elliptic curve points and private keys (instead of payment hashes and preimages). In 2018, a formal proposal emerged [ 7 ] that allows this new construction to be instantiated using "adaptor signatures" across both the ECDSA and Schnorr signature schemes. This was interesting because it meant we did not have to wait for the taproot upgrade to activate (which enables Schnorr signatures): multi-hop locks could be transferred over a path mixing ECDSA hops and Schnorr hops. Ultimately, for various reasons, this hybrid approach was never deployed. The upside is that we can now deploy a simpler, unified, Schnorr-only multi-hop lock.
Fast forward to today: in addition to working on the concrete design of "LN-Symmetry", instagibbs has explored various parts of the design space, from message flows to state machine changes [ 8 ].
After discussing some of his findings, we moved on to some key design questions:
- Should we use single-signature or musig2-based adaptor signatures? In both approaches, the adaptor point T embedded in the signature lets the completed signature reveal the secret needed to claim the PTLC.
  - Adaptor signatures based on musig2 are smaller (ultimately one signature instead of two), but add extra coordination requirements, since both parties must contribute a nonce to properly create each new commitment transaction.
  - Single-signature-based adaptors are larger (two signatures, like today's second-stage HTLC transactions), but the protocol is simpler, because the HTLC signatures can be sent along with `commit_sig` as usual.
- If we settle on the musig2 adaptor signature design, should we try to keep the current full-duplex asynchronous LCP flow, or simplify further and move to a synchronous, round-based commitment state machine?
  - Introducing musig2 nonces for the second-stage HTLC transactions makes the existing LCP protocol more complicated: we can no longer send the second-stage HTLC signatures along with the `commit_sig` message, because the party proposing a state change needs a partial signature from the responder before it can proceed safely.
  - However, if we change the channel state machine to be round-based, then, although we sacrifice some throughput, we no longer need to reason about interleaved execution (both parties sending `update_add_htlc`+`commit_sig` at the same time). This leads to the topic of a simplified flow.
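The core adaptor-signature trick, common to both designs above, can be illustrated with plain modular arithmetic. This is NOT real cryptography (there is no group, no nonce, no challenge hash): it only shows the algebra by which completing a signature reveals the adaptor secret, which is how PTLCs move a secret across a hop.

```python
# Toy illustration of an adaptor signature: s_adapt = s - t hides the full
# signature s behind the adaptor secret t (whose public counterpart would be
# the adaptor point T = t*G). Anyone who later sees the completed signature
# s on-chain can recover t = s - s_adapt.

P = 2**255 - 19  # stand-in modulus for the sketch

def make_adaptor(s: int, t: int) -> int:
    """'Subtract out' the secret, producing the adaptor (pre-)signature."""
    return (s - t) % P

def complete(s_adapt: int, t: int) -> int:
    """Whoever knows t can complete the signature and broadcast it."""
    return (s_adapt + t) % P

def extract_secret(s: int, s_adapt: int) -> int:
    """Whoever holds the adaptor sig learns t from the broadcast signature."""
    return (s - s_adapt) % P

s, t = 0xDEADBEEF, 0x1234
s_adapt = make_adaptor(s, t)
assert complete(s_adapt, t) == s
assert extract_secret(s, s_adapt) == t
```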
Round-based channel state machine protocol
In today's Lightning protocol, we use a full-duplex asynchronous state machine: either party can propose state changes at any time without consulting the other first. At any moment, up to four commitment transactions may be in play: for each party, the latest fully locked-in commitment, plus a pending commitment (one for which a signature has been received, but the previous state's revocation secret has not yet been sent). Assuming both parties keep exchanging signatures and revoking their old commitment transactions, the tips of both parties' commitment chains will eventually converge on the same set of in-flight HTLCs.
This design is very throughput-friendly: because revocation points are exchanged up front and consumed from a sliding window, both parties can keep sending new states without waiting for each other, and each revocation secret received from the counterparty invalidates one old state, much like TCP's sliding window.
One drawback of this protocol is that it is effectively impossible to recover from a disagreement or error during execution: both parties send updates at arbitrary times (and continue to do so across retransmissions), and there is no way to pause or rewind the protocol to restore a known-good state.
One example is the channel reserve. In current commitment transactions, the channel initiator pays the transaction fee out of their own output, no matter which party adds an HTLC. To actually be able to pay for new HTLCs that may appear at any time, the initiator may be forced to dip into their channel reserve. Because either party can propose HTLCs at any time, avoiding this tricky situation requires leaving a large enough fee buffer for future HTLCs, but it is difficult to accurately predict the other party's future HTLCs.
If we have a turn-based protocol, we can catch all these tricky situations in advance and ensure that we can always resume the execution of the channel and avoid expensive forced closes. Such a turn-based protocol would be similar to the flow control protocol based on RTS (Request to Send) and CTS (Clear to Send) [ 8 ].
During normal execution, both parties take turns proposing changes to the commitment state (adding or removing HTLCs). If one party has no changes, the other can go first. Importantly, in this simplified protocol, either party can explicitly reject (NACK) or accept (ACK) a set of changes. The ability to reject changes allows recovery from faulty flows and makes the protocol more robust against spurious force closes.
If we use musig2-based PTLCs, then in a round-based execution both parties can send their nonces ahead of time, eliminating the analytical headaches of asynchronously interleaved nonce exchange. Another bonus is that such a protocol should be much easier to analyze, as the current state machine protocol is notoriously under-specified.
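A turn-based flow with explicit ACK/NACK is simple enough to sketch end to end. Everything here (class and method names, the turn-passing rule) is illustrative, not a proposed wire protocol; it only shows how a rejected batch rewinds cleanly instead of forcing a close.

```python
# Minimal sketch of a turn-based channel update flow: only the party whose
# turn it is may propose; the peer must explicitly ACK or NACK the batch
# before the turn passes, so a disagreement simply rewinds the pending
# state instead of wedging the channel.

class TurnBasedChannel:
    def __init__(self):
        self.turn = "A"
        self.state = []      # committed updates
        self.pending = None  # proposed but not yet ACKed batch

    def propose(self, party: str, updates: list):
        if party != self.turn or self.pending is not None:
            raise RuntimeError("not your turn")
        self.pending = updates

    def respond(self, ack: bool):
        if self.pending is None:
            raise RuntimeError("nothing proposed")
        if ack:
            self.state.extend(self.pending)
        self.pending = None  # a NACK simply discards the batch
        self.turn = "B" if self.turn == "A" else "A"

ch = TurnBasedChannel()
ch.propose("A", ["add_htlc_1"])
ch.respond(ack=True)
assert ch.state == ["add_htlc_1"] and ch.turn == "B"
ch.propose("B", ["add_htlc_2"])
ch.respond(ack=False)  # rejected batch leaves committed state untouched
assert ch.state == ["add_htlc_1"] and ch.turn == "A"
```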
Off-chain fancy tricks: SuperScalar, channel factories, and more
At the end of the first day of the conference, we focused on the off-chain channel constructions enabled by "shared UTXO ownership": off-chain channel creation, cheaper self-custodial mobile wallets (to onboard new users), and batched transaction execution for multiple parties. Related proposals include channel factories, timeout trees, Ark, clArk, and more.
A recently published proposal, SuperScalar [ 9 ], attempts to combine many of these primitives to solve the "last mile" problem for self-custodial mobile Lightning wallets. SuperScalar attempts to change the status quo while: ensuring that the LSP (Lightning Service Provider) cannot steal funds, not relying on any of the proposed Bitcoin consensus changes, and allowing the construct to keep evolving while some or all users are offline.
The best explanation of SuperScalar is that it is the sum of three technologies: duplex micropayment channels [ 10 ], “Timeout Trees” proposed by John Law [ 11 ], and a ladder technique that allows coordinators of SuperScalar instances to spread funds and minimize opportunity costs.
I won’t go into the full proposal here; instead I refer those interested to the Delving Bitcoin post referenced above. Since the meeting, ZmnSCPxj has proposed a few new iterations of his proposal that address some of the shortcomings and branch off in different directions.
In summary, combining the three yields a large transaction tree in which each leaf is a regular two-party channel between the LSP and a user. Starting from the leaves, each layer up is another multisig output controlled by the participants of its subtree. Each leaf also carries an extra output with additional funds L, which can be used to allocate extra liquidity to a channel whenever the LSP and that user are online. If more participants are online, branches higher up the tree can also be re-signed, allowing channel capacity to be redistributed on a larger scale.
This laddering technique allows LSPs to distribute their funds across multiple instances of this off-chain tree structure. Timeout tree technology is used to provide a time-delayed exit path for all users. Instead of always revealing the entire transaction tree to force an exit, all funds in an instance are given to the LSP after a period of time. This means that users need to jump to the next instance/ladder of the construct, similar to how shared VTXOs work in the Ark construct (which also uses a form of timeout trees). As a result, all channels in this construct no longer have infinite lifespans: users either transfer all funds out or work with the LSP to obtain a new channel in the next instance. Otherwise, users will lose their funds.
The life of a SuperScalar instance can be divided into two phases: active and dying. In the active phase, users can use their channels normally; they may choose to exit the instance early, but can remain offline most of the time. In the dying phase, users must come online and transfer funds or migrate to another instance; a built-in safety window is provided for this. Once the dying phase begins, the LSP no longer collaborates with users to update the transaction tree, and may only sign outgoing payments (the LSP is a party to all channels, though sub-channels are also possible at the cost of additional trust).
Returning to the extra output L mentioned above: output L is spendable by the LSP at will. If a user needs additional channel capacity, the LSP can spend L to create a new sub-channel with the target user A. However, the LSP could also sign a spend of L with another user B, which amounts to double-spending output L off-chain. Only one version of such a spend can appear on-chain, so this essentially means the LSP is overleveraged and may steal funds, or cause users to lose funds they believed were theirs. One mitigation is to use a signature scheme that exposes the private key if the same key signs twice. There are a few ways to construct such a scheme: using OP_CAT (breaking the signature into 7 or more chunks), or using the double adaptor signature scheme described in this paper [11].
Using duplex micropayment channels at higher levels means that as the number of internal updates increases, so does the number of transactions that users have to issue to force an exit from this construction. As always, we end up with an unavoidable trade-off related to the economics of the blockchain: if the cost of initiating a payment exceeds the value of the payment itself, the payment will not happen, or it will happen in a system that sacrifices security to save costs. In other words, it may not make sense for users to have small channels in such a construction because of the additional transaction fees required to force an exit. In order for small channels to be economical, either the coordinator needs to subsidize them, or users never need to display them on-chain, meaning they can always jump to the next SuperScalar step.
Another interesting topic was the conjecture that it is impossible to safely join a channel fully off-chain, without any on-chain transaction at all. To see why this is hard, consider a scenario where Alice and Bob already have a channel and want Carol to join it. Alice and Bob create a new state that adds a third output to the channel's commitment transaction, paying to Carol's public key. Carol asks Alice and Bob for information to convince her that this is the latest state, but A and B can always forge a fake state update history. Because the multisig at the root has only two signers, A and B can always double-spend the commitment transaction out from under Carol, removing her from the channel and stealing the balance promised to her. If you think about it for a moment, this is similar to the "nothing at stake" problem in proof-of-stake chains: it costs A and B nothing to forge a history to deceive Carol.
The main consequence of this impossibility conjecture is that fully off-chain constructions with dynamic membership (anyone can enter and leave at any time) require either (1) trust in the root signers, (2) some kind of attribution + penalty mechanism, or (3) on-chain transactions. Solutions in the first category include: Liquid, statechains, and Ark with out-of-round payments. In the second category, over the past year we have seen solutions like BitVM emerge, which relies on a 1-of-n honest-actor assumption and uses interactive on-chain fraud proofs to attribute and punish fraud. In the last category I would include constructions like Ark, SuperScalar, and, more broadly, John Law's timeout trees. In this last category, users use new outputs created by on-chain transactions to verify a valid, immutable chain of transactions from leaf to root, allowing them to unilaterally bring their own new channels on-chain.
To sum up, I think the meaningful conclusion of this part is:
- Developers and service providers are looking for new ways to onboard users with a smaller on-chain footprint and high capital efficiency.
- A promising solution seems to be some combination of the following technologies: channel factories, timeout trees, multi-party channels, temporary off-chain funds exchange protocols (Ark protocol family).
- To avoid introducing too much unnecessary complexity, any new protocol is likely to follow an incremental deployment plan, delivering modules in sequence, with each new module building on the previous ones.
Bonus session: lightning talks
In between sessions, there was a lightning talk event where people could talk about whatever interesting stuff they were working on.
One cool idea that came out of these talks is the ability for users to recover gracefully from spurious force-close requests. These happen all the time for a variety of reasons, such as inconsistencies between implementations, but most commonly due to disagreements over fees. The basic idea is to hand over an extra key that lets the counterparty sweep their funds as quickly as possible after they force-close a channel. This is a purely altruistic action by the party that did not initiate the on-chain transaction, and a friendly gesture that actively helps the counterparty.
Mechanically, one way to achieve this is to make a best effort to help the counterparty sweep their outputs via the revocation path (the opposite of its normal use). Some also discussed slightly modifying the output derivation and encapsulating the new information in the channel re-establishment message. The non-broadcasting party would only release this information once they are certain that the latest state, as published, has confirmed.
Make Gossip Less Annoying
On the second morning, the first meeting was about identifying specific improvements that could be made to the gossip protocol.
Gossip synchronization
The Lightning Network's gossip protocol has a well-defined structure, but leaves much of the behavior up to implementations. Examples include: how many sync peers to maintain? Should incoming gossip messages be rate-limited? How (if at all) should newly announced channels be validated? Should the graph be periodically spot-checked for missing channels? Should everything be re-downloaded from scratch each time?
The discussion mostly consisted of each implementation learning from the others about behaviors they hadn't implemented as well. Every few weeks/months, subtle bugs are found that prevent new channel updates or channel announcements from propagating. LND found that the biggest recent improvement for visibility and propagation was to start using the channel-update timestamp information in gossip queries. Without this information, a node cannot tell that, even if it has the same set of channels as a peer (by scid), the peer may have a newer channel update. If an implementation prunes long-inactive "zombie channels" but does not actively sync gossip, and does not spot-check via the channel-update timestamps in gossip queries, it will be unable to resurrect old zombie channels and will lose a large portion of the network graph.
Gossip 2.0
Next, we turn to the newly proposed changes to the gossip protocol, codenamed “Gossip 2.5” (or “2.0”, depending on who you talk to). Since the last specification meeting, LND has been advancing the specification [14] and implementation [ 15 ]. Currently, the specification is awaiting additional review/feedback, and this year LND has made the protocol work in an e2e environment (new channels only).
One new addition we discussed is adding SPV proofs to channel announcement messages. Some implementations either conditionally or unconditionally skip verifying the on-chain funding of announced channels entirely (e.g. LND with the `--routing.assumechanvalid` flag). For light clients that rely purely on the P2P network (e.g. Neutrino), fetching tens of thousands of transactions from blocks can be a significant power/bandwidth/CPU burden. If channel announcement messages could optionally carry (but always commit to) an SPV proof, the existence of a channel could be verified against nothing more than the latest block header chain. If only the hash digest/root of the final payload is signed, nodes that do not need the extra proof can ask the sender to omit it. In the past, LND developed a proof format that supports batch-level aggregation, which might be reused [ 16 ].
As for interoperability testing, other implementations either have other priorities right now, or may have to wait for upstream library support for musig2 (the musig2 PR in libsecp256k1 was merged shortly after this meeting!). No major implementation supports testnet4 today, so it may not have any lightning channels yet. The attendees agreed to make testnet4 the first testing ground for gossip 2.0!
Gossip 2.0 removes the timestamp field of the old channel update message and replaces it with a block height. This simplifies rate limiting, because you can stipulate that a peer may only update once per block. And because block height is globally consistent (with no local properties such as time zones), it is better suited to various set-reconciliation and coordination protocols. Several participants have researched reusing existing minisketch implementations, although we face different constraints and may end up using several different mechanisms at once.
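The one-update-per-block rule is easy to state as code; here is a minimal sketch of such a rate limiter (the closure-based shape and names are ours, not from any implementation):

```python
# Sketch of the rate-limiting rule the block-height field enables:
# accept at most one channel_update per channel (scid) per block height,
# and reject anything at or below the last accepted height.

def make_limiter():
    last_height = {}
    def accept(scid: str, height: int) -> bool:
        if last_height.get(scid, -1) >= height:
            return False  # same block (or stale): rate-limited
        last_height[scid] = height
        return True
    return accept

accept = make_limiter()
assert accept("800000x1x0", 860_000)
assert not accept("800000x1x0", 860_000)  # second update in the same block
assert accept("800000x1x0", 860_001)      # next block: allowed again
```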
(Note: During this time, I spilled coffee on my laptop. I missed a good portion of the discussion while troubleshooting.)
Fundamental limitations on payment distribution
We then had a session discussing some recent research on Lightning Network pathfinding/routing. The main topic was a presentation/discussion of new research that attempts to understand fundamental limitations on distributing payments in payment channel networks [13](https://github.com/renepickhardt/Lightning-Network-Limitations/blob/305db330c96dc751f0615d9abb096b12b8a6191f/Limits of two party channels/paper/a mathematical theory of payment channel networks.pdf).
In summary, this study models the network topology as a set of edges and vertices, where each edge has three properties: the local balance, the remote balance, and the total capacity. Given a sample topology, we can determine whether a payment can be delivered: whether there exists a series of pairwise balance modifications that gives the receiver the desired end-state balance. Instead of running a conventional greedy pathfinding algorithm, the study looks at the feasibility of a payment globally. Note that this approach naturally captures the ability to force rebalancing during payments, allowing payment flows that would otherwise be unsatisfiable.
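This global feasibility view can be approximated with a standard max-flow computation, if we model each channel as a pair of directed edges whose capacities are the two local balances. The sketch below is my own toy illustration of that framing (not the paper's actual formalism): a payment of amount `a` is feasible, with forced rebalancing allowed, iff `a` is at most the max flow from sender to receiver.

```python
from collections import defaultdict, deque

def max_deliverable(channels, src, dst):
    """Toy feasibility check. `channels` is a list of tuples
    (node_a, node_b, balance_a, balance_b); each channel becomes two
    directed edges capacitated by the local balances. Returns the maximum
    amount deliverable from src to dst via Edmonds-Karp max flow."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for a, b, bal_a, bal_b in channels:
        cap[(a, b)] += bal_a
        cap[(b, a)] += bal_b
        adj[a].add(b)
        adj[b].add(a)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {src: None}
        q = deque([src])
        while q and dst not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if dst not in parent:
            return flow
        # Find the bottleneck along the path and push that much flow.
        path, v = [], dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push
```

For example, with channels A–B (A holds 5) and B–C (B holds 3), at most 3 sats can reach C from A: the B→C balance is the bottleneck, regardless of which paths a greedy router would try.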
Inevitably, some payment flows are simply impossible. Reasons include: insufficient channel capacity, the sender not having enough balance, the receiver not having enough inbound capacity, etc. When this happens, within this model, an on-chain transaction must occur to add funds to or remove funds from the network's current balance set. Examples of such on-chain transactions include: opening a channel, closing a channel, channel splicing, or a submarine swap.
Based on the above, given some starting assumptions (topology, balance distribution, sample distribution of payment possibilities between any two nodes), we can derive an upper limit on the effective throughput of the payment channel network. To get this value (T), we divide the blockchain bandwidth TPS (Q) by the expected occurrence rate of infeasible payments (R), that is, T = Q/R. If we want T to be (for example) 47k TPS, then substitute the current main chain TPS (about 14) and we get R of 0.029%, that is, 47k TPS can only be achieved when 0.029% of payments are infeasible.
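The arithmetic behind these figures is just the rearrangement R = Q/T. A minimal sketch reproducing the numbers from the discussion:

```python
def required_infeasibility_rate(target_tps: float, chain_tps: float) -> float:
    """From T = Q / R: to sustain an off-chain throughput of T payments/sec
    given an on-chain bandwidth of Q tx/sec (each infeasible payment
    consuming one on-chain tx), the infeasible fraction can be at most
    R = Q / T."""
    return chain_tps / target_tps

# ~14 on-chain TPS, 47k TPS off-chain target => R must stay below ~0.03%.
r = required_infeasibility_rate(target_tps=47_000, chain_tps=14)
```

The takeaway is how tight the constraint is: even a small fraction of payments requiring on-chain fallback saturates base-layer bandwidth.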
Ultimately, these numbers boil down to simple math based on simplifying assumptions. One aspect that this model doesn’t take into account is that on-chain interactions can be batched, i.e., multiple channels/users can configure their off-chain capacity/bandwidth with a single on-chain transaction. The above derivation also doesn’t take into account balancing payments (e.g., I pay back and forth between my two nodes, no fees), which never need to trigger an on-chain transaction, but are not counted in the TPS derivation. Still, such a model is useful for getting an abstract sense of the limitations of this system.
Multi-party channel & credit channel
The study also identifies two primitives that can help reduce the number of infeasible payments: multi-party channels, and credit within the network.
Multi-party channels aggregate multiple users in the channel graph, effectively forming a new fully connected subgraph. The intuition here is: if you treat the amount of money added to the channel in each direction as a constant, then by increasing the number of participants, you also increase the maximum amount of money each user can have. And when you increase this maximum amount, you reduce the number of payments that are infeasible due to balance/capacity constraints.
Credit is the second primitive, and the idea is equally simple: if a payment is infeasible, then at some hop, credit can be introduced to permanently or temporarily expand the capacity of a channel while increasing one party's balance. To minimize systemic risk, such credit seems like something that should not be introduced in the core of the network, but should exist only at the edges. In theory, a protocol like Taproot Assets could also be used to increase payment feasibility while reducing the cost of onboarding users, because it allows users to express addressable/verifiable credit natively in channels.
The last mile problem of attracting mobile users
At the end, we had two separate but related sessions focusing on customer acquisition and user experience for mobile self-custody wallets. The first was the “last mile” issue as it relates to mobile user experience and onboarding [ 17 ].
In the current Lightning Network, the vast majority of user-experience challenges arise when one user tries to pay another and the recipient uses a self-custodial mobile wallet. This is similar to the last-mile problem in Internet infrastructure and bandwidth: the core of the network consists of high-bandwidth "fat pipes" that can quickly move information within the core. However, it is more expensive, less reliable, and slower to deliver information from the core to the final destination.
Customer acquisition cost & channel liquidity
In the context of the Lightning Network, it is not aging infrastructure or high construction costs that must be dealt with, but the characteristics of mobile devices themselves. Unlike always-online routing nodes, mobile nodes need to be woken up to sign new state updates. In addition, if a mobile node wants to be a net receiver (entering the Lightning Network without any main-chain funds of its own), then a routing node must lock up liquidity toward it. Establishing this first channel is a capital cost from the routing node's perspective, because the mobile node may simply never return, leaving the funds in the channel idle. To recover funds from a long-offline user, the routing node must force-close the channel, paying on-chain fees and waiting for the relative timelock to expire (which can be up to 2 weeks).
As we go deeper into last-mile liquidity costs, we begin to run into some fundamental economic limits. If a user who will only ever receive 10 satoshis over a channel requires 1,000 satoshis of on-chain fees to open that channel, then onboarding such a user is a net loss for the routing node (not to mention the minimum channel-size restrictions that exist on the network today). Any capital (inbound capacity) that a routing node allocates to an inactive user could instead be allocated to high-traffic channels elsewhere in the network, earning routing fees that cover the cost of opening them. Assuming costs can be amortized or subsidies sustained, the infrastructure tools here include: Phoenix Wallet's JIT channel system, Liquidity Ads [18], sidecar channels via Lightning Pool, Amboss's Magma, and so on.
User experience issues caused by the protocol
Beyond the online-interaction requirements and on-chain fees, the current protocol design leaks some missing abstractions into the end user's mobile wallet, where they become a customer-acquisition cost. An example is the channel reserve: to ensure that both parties have something at stake throughout the channel's life cycle (deterring fraud attempts), each must always keep a small balance in the channel (usually about 1%). This can be frustrating for users, who often want to move their entire wallet balance when migrating to another wallet, only to find that a small amount of funds must always stay behind. In addition, as on-chain fees rise, the minimum economically viable channel size rises with them.
Liquidity Fee Rebate
A more recent solution to the dust/small output problem is a concept used by phoenixd [ 21 ] called “fee rebates”. A fee rebate is a non-refundable payment used to purchase future payment volume. Whenever a user receives funds from a special routing node (a node that supports the protocol plugin) and the user does not already have an existing channel, the funds go into a fee rebate jar. Once the user has accumulated enough funds in the rebate jar, the routing node will open a channel with the user and use the rebates in the jar to pay for the service fee and on-chain fees. The minimum amount required to open a channel will vary depending on the service and on-chain fees.
From a pragmatic perspective, fee rebates work well. Assuming a user eventually receives enough money, they can receive the funds immediately without waiting for a channel to open. Once they have enough funds to form an L1 UTXO, they can pay with the fee rebate jar to create this UTXO. This technique can be combined with a system like ecash (using amounts in a mint to represent pending funds), or even credit channels using Taproot Assets (using asset UTXOs represented in a Pocket Universe to offset fees on L1).
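The mechanics of the rebate jar can be modeled in a few lines. The sketch below is a toy model of the idea as described above, not phoenixd's actual API (class and field names are mine, and the thresholds are illustrative):

```python
class FeeRebateJar:
    """Toy model of a fee-rebate jar: incoming payments for a user with no
    channel accrue in a jar; once the jar covers the service fee plus the
    current on-chain fee, a channel is opened and the jar pays for it."""

    def __init__(self, service_fee_sats: int):
        self.service_fee_sats = service_fee_sats
        self.jar_sats = 0
        self.channel_open = False

    def receive(self, amount_sats: int, onchain_fee_sats: int) -> bool:
        """Returns True once the user has a channel (funds received
        normally), False while funds are still accruing in the jar."""
        if self.channel_open:
            return True  # normal receive over the existing channel
        self.jar_sats += amount_sats
        needed = self.service_fee_sats + onchain_fee_sats
        if self.jar_sats >= needed:
            self.jar_sats -= needed  # the jar pays for the channel open
            self.channel_open = True
        return self.channel_open
```

Note how the open threshold floats with the on-chain fee passed to each `receive` call, matching the observation that the minimum amount required varies with service and on-chain fees.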
At this point, the discussion returned to various off-chain constructions in the style of channel factories, and their limitations under particular on-chain fee levels, user counts, and balance distributions among those users. Basically, imagine such a construction, built on a timeout tree, with 100 million users each holding just 1 satoshi: it is not cost-effective for them to expand the whole tree on-chain, since the fees alone exceed 1 satoshi per user. If we imagine a built-in mechanism for users to migrate their funds elsewhere so that the coordinator can sweep everything at once (similar to the Ark model), then any user who fails to exit in time forfeits their funds (collectively, the whole 1 BTC) to the coordinator. Since every user trying to redeem on-chain is economically equivalent to burning all the funds anyway, some participants imagined a "big red button" that could be used to provably burn all outstanding balances. Ideally, such a burn would require some kind of script (or client) that can verify evidence, so the coordinator cannot cheat.
While the above scenario is more or less a thought experiment, I think it highlights some of the fundamental limits we face around on-chain fees and small UTXOs. Nothing here is new; this dynamic is exactly why most full nodes have a default dust-output threshold: it is not cost-effective to spend 1/3 of a UTXO's value on fees just to move the value it holds. The same is true for off-chain systems; only some kind of subsidy or exogenous value system can serve as a lifeline. Any transfer that is uneconomical on the main chain (or at a higher layer) will inevitably migrate to some other system that still uses BTC as the unit of account, but sacrifices security in exchange for cheaper fees.
BOLT 12: What’s next?
Amid thick fog, good food, cold drinks, and sukiyaki, the PR merging BOLT 12 into the Lightning Network specification repository landed! Earlier that day, as the final session, we discussed the next steps for BOLT 12: which extensions were cut from the original version, and which are wanted next.
Potential BOLT 12 extensions
The first extension discussed was invoice replacement. Consider a scenario where a user obtains an invoice via an Offer but pays it much later, so the blinded paths and/or the invoice itself have gone stale. In this scenario, it would be useful for the user to be able to request a replacement invoice. How this differs from simply requesting a fresh invoice from the Offer is probably a matter of context.
One feature some implementers are most eager to bring back is recurring payments. Parts of recurring payments made it into the original spec, but were ultimately cut. Parameters associated with recurring payments include the time interval, the payment window, a limit, and start/end times. One trick receivers can use is to leverage hash chains to minimize the number of preimages that need to be stored. If they can deliver a special salt/seed to the sender during initial coordination, then only the sender and receiver will know that the preimages form a hash chain.
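The storage saving comes from deriving every preimage from a single seed. A minimal sketch of the hash-chain idea (the exact keying with the shared salt, and the function name, are my assumptions):

```python
import hashlib

def build_preimage_chain(seed: bytes, n: int) -> list[bytes]:
    """Derive n preimages as a hash chain, where
    preimage[i] = SHA256(preimage[i-1]). The receiver stores only the seed
    and the count, recomputing any link on demand rather than persisting n
    independent preimages."""
    links = [hashlib.sha256(seed).digest()]
    for _ in range(1, n):
        links.append(hashlib.sha256(links[-1]).digest())
    return links
```

Because knowing one link only lets you compute *later* links, the preimages would need to be consumed from the end of the chain backwards, so that revealing the preimage for period k never discloses the preimage for period k+1.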
As for authentication, a reverse version of BIP 353 has been proposed. The basic idea is to allow users to bind a node's public key to a domain name. This can be used to identify node Y as being associated with some service/domain/company.
Onion message rate limiting and backpressure transport
At the end of the session, attention turned to onion messaging and the current implementation status/behavior across the major implementations. One topic was how wallets handle fallbacks, and the associated user-experience impact, when a wallet fails to fetch an invoice for an Offer. Onion messaging is an unreliable, best-effort transport without any built-in feedback mechanism, so a message may simply never be delivered. Wallets therefore need to be prepared to try another path, resend the message, or fall back to some other mechanism if the request fails.
Generally speaking, the status quo is to use single-hop onion message routes, or direct connections. "Direct connections" means connecting directly over the P2P network to the recipient, to the introduction node of the blinded path, or to a peer of the introduction node, in order to attempt delivery over a shorter path. If these attempts also fail (no node is listening, the recipient is offline, etc.), the node either needs another fallback or can try to send some form of spontaneous payment.
Returning to messaging, it is clear that some kind of rate limiting is needed. Nodes may start out with some free budget, but messaging must be limited, otherwise a node might unknowingly forward 10 GB of onion message traffic for free (it seems to me that beyond a free tier, most nodes will eventually move to a payment system that meters bandwidth [22]). Nodes therefore need to employ some combination of bandwidth and rate limiting. If the network is significantly over-provisioned relative to typical messaging usage, service quality will remain relatively high, because usage never reaches the configured limits. However, if the network is saturated with messaging activity (people trying to livestream their gaming sessions, or the like), service will degrade, because most attempts to send messages will fail in a tragedy of the commons. Worse, a dropped message, an undelivered message, and an offline recipient are indistinguishable from the sender's point of view, creating further user-experience challenges.
Eventually the discussion turned to the backpressure rate-limiting algorithm [23] previously proposed on the mailing list. With this algorithm, a node keeps a relatively compact record of which peer last sent it a message. Once a peer exceeds its limit, the node sends an `onion_message_drop` message back to the source of the traffic. The sender then attempts to track down who sent it the message and propagates the `onion_message_drop` further upstream, halving the rate limit as it goes. If the sender does not overflow the rate limit within a 30-second interval, the receiver should double its rate limit again, until the normal rate limit is restored.
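The halve-on-overflow / double-on-quiet-window behavior can be sketched for a single peer as follows. This is an illustrative model of the dynamics described above, not any implementation's actual interface (class and method names are mine; the 30-second window and halving/doubling come from the proposal):

```python
class BackpressureLimiter:
    """Per-peer sketch of the proposed onion-message backpressure: an
    overflow halves the peer's rate limit (and would trigger an
    onion_message_drop upstream); a full quiet window doubles it again,
    capped at the normal limit."""

    def __init__(self, normal_limit: int, window_secs: float = 30.0):
        self.normal_limit = normal_limit
        self.limit = normal_limit
        self.window_secs = window_secs
        self.count = 0
        self.window_start = None
        self.overflowed = False

    def _roll_window(self, now: float) -> None:
        if self.window_start is None:
            self.window_start = now
            return
        if now - self.window_start >= self.window_secs:
            if not self.overflowed:
                # A quiet window: recover by doubling back toward normal.
                self.limit = min(self.limit * 2, self.normal_limit)
            self.count = 0
            self.overflowed = False
            self.window_start = now

    def on_message(self, now: float) -> bool:
        """True: forward the message. False: drop it (and, in the real
        protocol, send onion_message_drop back toward the source)."""
        self._roll_window(now)
        self.count += 1
        if self.count > self.limit:
            self.overflowed = True
            self.limit = max(self.limit // 2, 1)
            return False
        return True
```

A misbehaving peer thus sees its budget decay geometrically, while a well-behaved one converges back to the normal limit after a few quiet windows.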
There are still some open questions here, such as: How can a node correctly attribute a flood of messages to a peer? Can a node frame other nodes and shut down their messaging activity? Is any other metadata required to correctly determine the source of the flood of messages? Is this scheme resilient to attackers who are aware of these rate limits but still try to abuse as much bandwidth as possible within the limits? When this scheme was first proposed, some basic simulations were run to measure the effectiveness and resilience of the scheme [ 24 ]. The initial results are encouraging and have raised some additional research questions [ 25 ].
Eventually, some attendees agreed to resume work/research on the backpressure algorithm and, in the short term, to apply conservative rate-limiting parameters.