By Alex Myers
Source: https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip
This article is a transcript of the author's speech at the Bitcoin++ 2022 conference, transcribed by Michael Folkson. The slides of the speech are here: https://endothermic.dev/presentations/magical-minisketch
Minisketch Repository: https://github.com/sipa/minisketch
Rusty Russell on using Minisketch for Lightning Network gossip: https://gnusha.org/url/https://lists.linuxfoundation.org/pipermail/lightning-dev/2018-December/001741.html
Bitcoin Core PR Review Club has three discussions about Minisketch:
introduction
I’m a complete newbie to Lightning development and just joined the Core Lightning team. I’m going to talk about Minisketch and what I’ve learned from gossip networks.
topic
I’ll start by talking about the role of the gossip protocol in the Lightning Network. You’re all familiar with the Lightning Network, or at least a little researched, right? I won’t spend too much time on the background of how the Lightning Network works. But I will explain how gossip is interwoven into the operation of the Lightning Network. Then, I’ll also give an overview of how Minisketch works, walk through a case study, and then show how we can use it to optimize gossip for the Lightning Network.
The role of gossip
I thought it would be interesting to explain the role of gossip in the Lightning Network through an example, so let’s start with a simple Lightning Network transaction.
Send a Lightning Payment
From the perspective of ordinary users, the so-called "lightning payment" may be to open the wallet app on your phone first, and then point the camera at a QR code or a scannable text (invoice). Then, the app will give you a transaction note and remarks, and after you verify that the amount is roughly correct, click "Send". As long as you are not very unlucky, in just a moment, a green check will appear on the phone screen, indicating that the payment has been successfully sent.
But the people present were all enthusiasts, so I thought we could go a little deeper - What is really happening behind the phone screen?
Lightning Payments - Advanced
In the Core Lightning Client, there are many command line commands that can be used to get more information.
lightning-cli listpays <BOLT 11>After entering this command, we can see what actually happens. When the payment is completed, the preimage will appear, which is a bit like the receipt we get when we pay on the Lightning Network. Interestingly, we can see that this payment is broken up into 12 smaller parts. These 12 payment paths are all calculated to complete this payment.
lightning-cli listpaystatus <BOLT 11> Let's look at this a little more closely. We can see that the client didn't just try 12 paths, it actually tried 53. This is called "multi-path payment". Some of the paths failed because they timed out. Before the payment was successful, we got some failure information. This CLI command prints this data. But actually, we can tell how many hops (forwards) were performed in each failed attempt; and we get text-encoded strings ( 0x1007... ). If we pass these strings into another tool ( devtools/decodemsg ), we can decode them. This is part of BOLT4, and the specification states that 1007 means we have temporary_channel_failure .
There is also a "missed channel update failure" ( 0x1000 new channel update enclosed), which means that we missed the change in the payment forwarding requirement announced by this node. This intermediate node has no idea who we are or where the payment is going to end up, because all messages are onion-routed. So it says, "Whoever you are, I think you're talking to the wrong person. Some of the information you have is out of date, and I think you need to look at this." In this case, we can see that we get information about the fee that this channel is going to charge, its cltv_expiry_delta value, and how many blocks we want to add to the timeout. The important thing is this: channel_flags=1 . It doesn't look impressive, but if you look at the protocol spec, you'll see that it indicates that the channel has been disabled. Obviously, when we calculated the forwarding path, we didn't have this information; it's why this path doesn't work.
Lightning Payments - Summary
What did we learn from this base case? Even in this simple case, we used 70 different channels to make this payment. Our information about some of these channels was out of date, but we were able to retrieve some new gossip messages and update our routing graph — which is like a map of the network’s topography. Using the new information, we were able to recalculate our path and make the payment go through.
The role of gossip
Back to the question we asked at the beginning: what is the role of gossip in the Lightning Network in general? Just like gossip in life. Gossip is the second-hand information you share about peers that you don’t communicate with directly. In the Lightning Network, various nodes are connected to each other. We may be directly connected to several nodes, but, for example, we are in the bottom left corner, we need to collect information about other nodes that are not in this area. This is done through the gossip message called node_announcement .

This is done via a gossip message called node_announcement . The node_announcement message gives the public key of a node, as well as the network address to reach it (in case you want to establish a direct connection with it). For example, an IP address, an onion address, a web socket, you name it.
The next thing we want to know is where the channels are. So we have the channel_announcement message. What it does is indicate that a channel exists between node A and node B, and that there is a transaction on the blockchain that funded the channel. It lists the "short channel ID", which is the information you need to search for the channel and check which UTXO on the blockchain funds it. This information is not good for privacy, but it is useful for preventing denial of service attacks (DoS) - the person sending the message needs to prove that they have a UTXO that opened the channel.
Now that we know the basic topology of the network, we can figure out a path through it. But it would be nice to have more information. So we have channel_update messages. This message contains information like the handling fee, the maximum amount of HTLCs that the channel is willing to forward, whether it has been disabled, etc., as we saw in the example above. This is all information that helps us eliminate unsuitable paths when calculating the path.
Now we have all the information we need to construct a path. But why is my talk titled “Lightning Gossip Network”? Isn’t that what we just talked about the Lightning Network? The short answer is that the two concepts are not exactly the same. The difference is that the connections between these nodes we see are (payment) channels in the Lightning Network. But when participating in gossip, we only communicate with a subset of nodes. The exact number depends on the implementation, I remember that Core Lightning uses 5 peer nodes, and you gossip with 3 to 5 of the nodes you connect to. Your gossip object does not need to be your peer node. You can gossip with any node. It doesn’t even have to be a node, as long as it understands the protocol and can provide information that you don’t know yet. As long as new information about the state of the network can be gathered, the back and forth communication is welcome. The most important detail is that the gossip protocol operates in flood propagation mode. For example, if you are connected to 5 gossip peers, as soon as you receive a message from one of them, you will broadcast it to the other 4. This is very efficient in the initial stage of information dissemination, because you can quickly pass the information to all your partners. But after the information has been transmitted for a few hops, its efficiency starts to decline - you start receiving the same data at multiple peers, which is a manifestation of declining efficiency.
Gossip Statistics
Here are some basic statistics on the state of the Lightning Network. Currently, there are 80,000 channels and 17,000 public nodes in the entire Lightning Network. Going back to the flood propagation model above, if we are in a minimum group size (three interconnected gossip peers), that means that in order for a message to propagate throughout the network, we need at least 14 hops. And gossip propagation has inherent limitations. In practice, you batch all the gossip messages you receive and then wait 60 to 90 seconds. Core Lightning uses a value of 60 seconds, but I think LND uses a 90 second loop. You periodically broadcast all the new gossip messages you receive to all your gossip peers. 14 hops, with 60 to 90 seconds between each hop, is a long time. In practice, we found that 95% of the nodes in the network received new messages within 13 minutes. You may need to wait 20 minutes before you can expect everyone to see the new information - at least it is a useful rule of thumb.
What is "Minisketch"?
Switching gears, we are now going to introduce "Minisketch". I will do my best.
Minisketch is a set reconciliation protocol. What is "set reconciliation"? We have two sets of data, and you can see from this Venn diagram that they largely overlap. In this case, both of our data is valid, and we want to make sure we both get a portion of the new data so that both sets are updated. At this time, we care about the "symmetric difference" between the two sets. This is where Minisketch helps us. If you are like me a few months ago, then you might think that this is a very difficult problem to solve, and that trying to reconcile these two sets will be a non-trivial overhead. You might send more information than is strictly necessary because you don't know what the peer is missing. But like me, you might be wrong. Minisketch has some very cool features.
background
Minisketch is actually from a family of error-correcting codes called "BCH error-correcting codes". It uses a map, just like the Berlekamp-Massey algorithm. I'm going to give you a very simple, very abstract example.
BCH Example
Suppose we have two sets of data, containing elements (1, 2, 3) and (1, 2, 3, 4) respectively.
If we want to reconcile these two sets, a cool trick is to make sure we both have all the data. We compute the sum of all the elements in each set, so one is 6 and the other is 10. If we're in the left set, and we want to sync up with the right set, we say, "Here's the sum of the elements in my set, which is 6. Let's take the difference between the two sets, which is 10 - 6 = 4 , and that's the element I'm missing." This is great, and it's basically just subtracting. But this only works if the difference is one element. If it's more than one element, it doesn't work. But here's another trick.
Let's say we want to encode two differences, and we can do this by putting the simple sum of all the elements in an array - that's our array, and the first element of the array is the simple sum of all the elements in the set, and the second element is the sum of the squares of all the elements in the set. This is all basic math. This array is what we're going to pass on to other people. Now we want to reconcile the two differences, which are twice as many as before. We can put these two differences together: "The first difference is the difference of the simple sum of the elements, and the second is the difference of the sum of the squares of the elements." Think back to our algebra: there are two equations here, and there are two unknowns. That means we can solve them!
As the number of differences increases (and the order increases), it becomes a progressively more difficult problem. This is where the Berlekamp-Massey algorithm comes in, and it is a very efficient solution.
Constructing a large sketch
We can do this for any size set. We can go all the way up to order n. Obviously, the larger the order, the longer it takes to solve. You have to encode every entry until you get to order n. But it works, and we can find the difference between two sets, no matter how many elements are involved.
Minisketch
Minisketch Repository: https://github.com/sipa/minisketch
This is a C++ library developed by Pieter Wuille. It implements the PinSketch algorithm. It works on a variety of architectures and hardware. It uses tables to optimize the root-finding process and save time. Pieter also came up with a pure Python implementation, which is amazing in itself. It only takes 500 lines of code to run all kinds of calculations that would blow my mind. The code is also very readable, I recommend you to check out the GitHub page for yourself.
Using Minisketch
But let's say you're just an engineer, like me. How do you put this fun math to use in practice? Like this:
First, we initialize a sketch, and to do this, we need to know the width of the data we want to encode. We're talking about 64 bits here. We fill in the width of the data first, and then fill in the capacity. The capacity is the number of elements of difference we want to be able to reconcile between the two sets. If we think that the two sets may differ by 5 elements, then we can choose a value of 8, just to make sure we exceed it a little bit. But we don't want to exceed it by too much. Then, we input our set data and calculate the composite value. This process is exactly like what we said in the "Constructing a Big Sketch" section. Then, we serialize the result and transmit it from Alice to Bob.
Bob goes through the exact same process. He builds his sketch, and then he merges the two sketches. This is really interesting, he has to choose the same width, but the capacity can be different. The math is such that you can always chop off the array so that the two are equal, and you merge the two sketches. The capacity is a little looser (not equal to the number of different elements), but it still works out pretty well. Then you use the Berlekamp-Massey algorithm, and the result is the difference between the two sets.
Black box characteristics
What properties does this math have when used? It supports data from 2 bits wide to 64 bits wide. The really cool thing is that the size of the sketch after serialization, that is, the size of the data that has to be transferred between nodes, is the capacity of the sketch (that is, the number of difference elements you want to solve) multiplied by the width of the data. If you choose the sketch capacity just right, then you can get 100% efficiency out of the data (the volume of the number of transferred elements is exactly the volume of the missing elements you extract).
草图序列化体积= 草图容量* 数据宽度It blew my mind. But there are some other things to be aware of. As you increase the size of the sketch, the time to reconcile goes up (linearly). Depending on how many elements you actually want to differ, the time to reconcile goes up quadratically. So we all want to constrain the number of elements to differ. You don't want to get overwhelmed by how big the differences of the sets are.
limitation
There are a few basic things to remember. When you initialize a sketch, you can't encode zero elements. Any other number will do, just not zero. Another thing is, it's also useful to make sure that the number of elements in each sketch, the difference between the two collections, is not very large (it doesn't exceed the capacity of the sketch). Generally this won't be a problem, but let's say you have 50 elements in one sketch and 100 elements in another sketch, but your sketch capacity is only 10. It won't be able to solve it because the capacity is not enough. But the good thing is that if it can't solve it, it will give you a warning, and it will say "I can't find any polynomials that satisfy this equation." Most of the time, this works.
Erlay
Here's some background on how we might use this technology in the future. The Bitcoin network faces very similar problems to the Lightning gossip network - flooding transactions doesn't scale well. If you've heard of the Erlay protocol, you probably know that's what Minisketch was developed for. Erlay uses Minisketch to encode every transaction in a transaction pool. You can then share the sketch with your peers in the Bitcoin network. Transaction IDs are 32 bytes wide, but in Minisketch we only have 8 bytes of data. So we need to compress the transaction ID into a smaller fingerprint so that we know which transaction the difference we found represents. There's a hash function that can do this.
Nodes also reconcile inventory sets. This is another factor that makes things a little more complicated. Let's say we have a whole pool of transactions, one way to do this is to encode every transaction in a sketch, all peers have to do this, and then they all have to solve to see if there are any differences in each other's transaction pools. But this is inconsistent with how Erlay works. What Erlay does is that the node keeps track of every peer it communicates with, and the node says, "It's been 15 seconds since we last communicated, and I made an inventory of things I wanted to tell you that I haven't told you yet. Here are the five things I want to tell you." At the same time, your peer Bob does the exact same thing, but maybe he collected seven things. Erlay doesn't ask for the encoding of the transactions in the transaction pool (which may be thousands of transactions), it only asks for the encoding of these five transactions and seven transactions. We reconcile the two and maybe find that there are only three differences between them. Once the reconciliation is completed, these inventory sets are cleared, and the two peers move on to the next round: record the new things that happened in the next 15 seconds.
Lightning Network Gossip vs. Bitcoin Transaction Forwarding
The problems that the Bitcoin network faces in transaction forwarding are very similar to those of the Lightning Network Gossip, but there are also some differences. First, any time you want to use a short hash function to generate a 64-bit fingerprint, you have to worry about collisions. Someone might take advantage of this and try again and again to find a transaction ID (which is a hash value) that will produce the same fingerprint. This becomes a denial of service attack interface. This is indeed worrying. In addition, there is a timing analysis of private transactions. In gossip, there is no need to hide anything because all information is naturally public. In addition, we have a little trick that allows us to not use hash functions. We don't just have one data type, there are 3 types of messages we want to forward.
Application in Gossip
There are three types of these messages: channel_update , node_announcement , and channel_announcement . channel_announcement contains information about the transaction fee, channel usage status (activated or disabled), the maximum HTLC amount supported, etc. This probably makes up 97% of the gossip network traffic, which is a lot of snot, so we want to be more efficient. The first two messages ( channel_update , node_announcement ) are only valid for two weeks, so they will be broadcast repeatedly.
challenge
Our challenge is that we have to encode all 3 message types. We can only use 8 bytes, but we have a tool called the "Short Channel ID (SCID)". This is the information that associates a channel with the funding transaction on the blockchain. Every transaction on the blockchain is unique. This is a great shortcut that we can use. In node_announcement , there is no channel associated with this node, so it is difficult to use the short channel ID. Ideally, we would reference the node ID, which is the node's public key. But the public key is also 32 bytes wide, so we have to hash it, or do something else.
Coding Scheme
Actually, we can use this data when encoding this information: block height, transaction index, output index, which is the short channel ID. We can reference a node announcement message by saying "oldest channel for this node", and which side of the channel this node is on. So we have a way to consistently locate which node we are talking about. This is usually 8 bytes of data, and we are trying to fit it into 8 bytes. All we are doing is shaving off a few bits that we don't need. If a block only contains 32,000 transactions, then this might be enough. If you are funding a lightning channel, it is unlikely that the funding transaction will have 1,000 outputs. We have saved a few bytes, so there is room for other information, such as the type of message and which side of the channel this node is on. This is how we can locate which gossip message we are broadcasting with exactly 64 bits. Finally, we have the timestamp. For messages that are sent regularly, we need to know which one is newer and which one is older. It helps to have a few bits to distinguish between them.
Benefits of collective mediation
What's so appealing about this? The short answer is: we can save at least 60% of gossip bandwidth. Once we've done this, we can talk to more peers. Talking to more peers is a very good thing, because it improves the reliability of gossip. In particular, peer announcement messages have suffered in the past, because for the other two messages, there are some simple heuristics to tell if you missed them. For channel update messages, we've seen in the opening example that in the worst case, our payment path will fail, and the error packet will bring the update and we can get feedback. But what about the peer announcement message, it tells you the IP address of the node. If the IP address of the node changes and you don't see it, you may no longer be able to connect to it. This is very bad, and there is hope that you can get real benefits from the increased reliability.
What's next?
I have a few open questions. One of them is that we can either maintain a global sketch or have an inventory set for each peer. There are arguments for both approaches.
One other thing is that we also have rate limits on the gossip messages we receive. If we rate limit gossip messages very strictly that can reduce our sketch capacity significantly. For implementations that opt in to this collective mediation feature, it would be nice if people could agree on what timescale to rate limit on. I'm leaning towards using block height. I'm trying to convince people. If I could associate every gossip message with the current block height; say, one gossip message per block, or one every 100 blocks, etc. This is very easy to verify since you're already running a full node.
Finally, in the long term, we really want to get rid of short channel IDs in gossip, because it does leak privacy. You don't want to publicize all your unspent transaction outputs. We are not completely ready for that yet, but once we start doing that, we will move towards inventory collection schemes corresponding to different peer nodes. This may have indicated what we should choose now.
in conclusion
Gossip allows us to build a view of the entire Lightning Network and figure out the paths. Minisketch is really cool and everyone should check it out. Hopefully we can use it to improve the reliability of Lightning Network gossip.
(over)




