SwiftSync: Nearly stateless and parallelizable Bitcoin blockchain verification


By Ruben Somsen

Source: https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd

Nearly stateless, fully parallelizable verification of the Bitcoin blockchain is achieved using hints about which outputs will remain unspent. All other inputs/outputs are effectively crossed out in a single aggregate hash value that eventually equals zero if verification succeeds and the hints are correct.

Introduction

Validation is at the heart of Bitcoin. Any improvement in validation speed has a direct impact on the scalability of the system (and everything built on top of it). For this reason, improving validation performance is probably one of the most important things we can do.

The general observation behind SwiftSync is that if we have two sets (in this case, the inputs and the outputs) and we want to know the difference between them (i.e. the UTXO set), it is much easier to verify a given answer to that question (with the help of a hint) than to compute the answer ourselves. What is novel (as far as I can tell) is the specific way this observation is applied here. The observation seems useful in other blockchain-related contexts as well, and perhaps beyond.

Currently, regular validation in Bitcoin Core requires walking the blockchain in order (while running some minor context-independent checks), adding outputs to the UTXO set and then, as they are spent, retrieving and removing them. The UTXO set typically does not fit in memory, so validation is further slowed down by disk reads and writes.

SwiftSync completely changes this paradigm without changing the safety assumptions. There are no more database lookups, memory usage is reduced to near zero, and all verification can be done in parallel. This removes memory, disk I/O, and single-threaded performance from the list of possible bottlenecks. The remaining bottlenecks are CPU and bandwidth, and the better you are at both, the faster you can verify.

The statelessness of the protocol may also make it well suited for memory-constrained environments, such as using specialized hardware (FPGAs and ASICs) to verify the blockchain, construct zero-knowledge proofs, or simply free up more memory for other operations (such as batch verification of Schnorr signatures). You can also easily spread the verification burden across multiple devices.

We’ll need further benchmarking to determine exactly how fast SwiftSync is, but a very preliminary proof-of-concept (without any parallelization yet) shows an astonishing 5.28x speedup.

Protocol Overview

For each output on the chain, we need a hint indicating whether that output is still unspent at the chain tip, but each hint only takes one bit (less than 100 MB compressed in total).

If the hints are incorrect, verification will simply fail (a potential DoS vector), so the hints should come from a trusted source (e.g. your full node software binary) [1].

We will rely on "assumevalid" to skip certain checks. This is not required, but it results in a more elegant and bandwidth-efficient protocol. The specific consequences of assumevalid, the non-assumevalid version, and the relationship between SwiftSync and "assumeutxo" will be explained later.

We process all inputs and outputs in a sequence-independent (fully parallelizable) manner. The only state we require is a single 32-byte hash aggregate (negligible memory usage, no random disk reads or writes). The protocol's speed scales linearly with the available CPU and bandwidth.

For each output:

  • If the hint indicates that the output will still be unspent at the chain tip:
    • Write it to the UTXO set (append-only - we don't need this data until SwiftSync completes and we return to normal validation)
  • Otherwise:
    • Add a hash of its outpoint to the aggregate

For each input:

  • Subtract a hash of its outpoint from the aggregate

When SwiftSync completes, the hash aggregate should be 0, proving that the outputs we wrote to the UTXO set are indeed all unspent, and that every other output was spent exactly once (no double spends).
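To make this concrete, here is a minimal sketch of the aggregate bookkeeping in Python. The byte-string outpoints, the hint list, and the append-only "utxo_file" are made-up placeholders, not Bitcoin Core's actual data structures; the point is only the add/subtract-and-check-for-zero structure. Partial aggregates computed for different blocks can simply be summed modulo 2^256 at the end, which is what makes the processing parallelizable.

    import hashlib, os

    SALT = os.urandom(32)   # secret per-session salt (see "About the hash aggregate")
    MOD = 2 ** 256          # the aggregate is a 256-bit modular sum

    def salted_hash(outpoint: bytes) -> int:
        return int.from_bytes(hashlib.sha256(outpoint + SALT).digest(), "big")

    def process_outputs(created_outpoints, hints, utxo_file, agg=0):
        # hints[i] is the 1-bit hint: True means "still unspent at the chain tip"
        for outpoint, unspent in zip(created_outpoints, hints):
            if unspent:
                utxo_file.append(outpoint)                  # append-only write
            else:
                agg = (agg + salted_hash(outpoint)) % MOD   # to be crossed out by an input
        return agg

    def process_inputs(spent_outpoints, agg=0):
        for outpoint in spent_outpoints:
            agg = (agg - salted_hash(outpoint)) % MOD       # crosses out the matching output
        return agg

    # Toy run: outpoint "A:0" is hinted unspent, "B:0" is later spent by an input.
    utxo_file = []
    agg = process_outputs([b"A:0", b"B:0"], [True, False], utxo_file)
    agg = (agg + process_inputs([b"B:0"])) % MOD
    assert agg == 0 and utxo_file == [b"A:0"]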

There are more details below, but this is the core of the protocol.

Details

About the hash aggregate

All we need to know is whether set A (the spent outputs) matches set B (the inputs), regardless of order. This order independence ensures that removing an element before adding it causes no problems, which is what makes the protocol fully parallelizable.

Mathematically, simply hashing each element and adding/subtracting the result (with modular arithmetic) is almost enough - we also need to include a secret salt to ensure the data cannot be manipulated from the outside (a generalized version of the birthday problem) [2].

Example

Spent outputs: outpoint_A, outpoint_B

Inputs: outpoint_C, outpoint_D

If hash(outpoint_A||salt) + hash(outpoint_B||salt) - hash(outpoint_C||salt) - hash(outpoint_D||salt) == 0, then our list of spent outputs is identical to our list of inputs (i.e. (A==C && B==D) || (A==D && B==C)), and the remaining outputs make up the UTXO set.
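The same example written out as a runnable sketch (the outpoint values are made up purely for illustration):

    import hashlib, os

    salt = os.urandom(32)
    MOD = 2 ** 256
    h = lambda outpoint: int.from_bytes(hashlib.sha256(outpoint + salt).digest(), "big")

    # The inputs reference the same two outpoints, just in a different order.
    outpoint_A, outpoint_B = b"txid1:0", b"txid2:1"
    outpoint_C, outpoint_D = b"txid2:1", b"txid1:0"

    assert (h(outpoint_A) + h(outpoint_B) - h(outpoint_C) - h(outpoint_D)) % MOD == 0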

About assumevalid

As currently implemented, assumevalid is only used to skip script validation. The security argument behind it is that users already trust their software to be correct, so they can also trust its statement about the validity of a chain. There are other checks besides scripts that could be skipped, but most of the remaining ones are inexpensive and/or difficult to remove, which is probably why they were left in. If we expand the use of assumevalid slightly, we can significantly reduce the amount of data required to validate an input, as we'll see below.

A complete entry in the UTXO set contains the following data:

  1. Outpoint
  2. Output script
  3. Coinbase flag
  4. Block height
  5. Amount
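As a rough illustration only (the field names and types are hypothetical, not Bitcoin Core's actual serialization), such an entry could be represented as:

    from dataclasses import dataclass

    @dataclass
    class UtxoEntry:
        outpoint: bytes        # (1) txid and output index
        script_pubkey: bytes   # (2) output script
        is_coinbase: bool      # (3) coinbase flag
        height: int            # (4) block height the output was created at
        amount: int            # (5) value in satoshis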

Simple Case

Without assumevalid, all of this data would have to be available to check an input. In the assumevalid version of SwiftSync, we only use the outpoint (i.e. (1) in the list above). This is convenient because that data is already inside every input. It's easy to see why (2) can be omitted - the script isn't checked anyway under assumevalid - but what about the rest of the data?

Skipping the (3) coinbase flag means that we can no longer check whether an output created by a coinbase transaction is spent earlier than allowed - these outputs carry an implicit relative timelock of 100 blocks (coinbase maturity). In order to check such a relative timelock, we also need to know the (4) block height. Similarly, the nSequence field of a transaction can be used to enable relative timelocks (which we then don't check).

The block height also indirectly helps with something that only happens in the parallel validation case: it ensures that outputs are created before they are spent (not after). Note that the block height alone may not be enough, because an output may be spent in the same block in which it was created. We will discuss this in more detail later , because we need to handle this explicitly in the non-assumevalid version .

Another issue we will deal with later is BIP30, which currently requires full access to the UTXO set. Technically the full UTXO set is still available to us (we write it away append-only), but for the assumevalid version we should not need to read from it.

Skipping the above checks under assumevalid should be fairly uncontroversial, because none of them are as important as the script checks (which are what assumevalid already skips) [3].

A more difficult situation

It is more debatable whether we can skip the (5) amount check, because without it you no longer actively check for inflation bugs. Fortunately, there is a workaround - we can sum up the amounts in the resulting UTXO set and check that the total does not exceed the maximum possible supply. Note that this sum should also include the amounts of OP_RETURN outputs.
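A minimal sketch of such a check, assuming the amounts written to the UTXO set (plus OP_RETURN amounts) have already been summed elsewhere; the subsidy schedule is the familiar 50 BTC halving every 210,000 blocks:

    SUBSIDY = 50 * 100_000_000     # initial subsidy in satoshis
    HALVING_INTERVAL = 210_000

    def max_supply_at(tip_height: int) -> int:
        """Upper bound on the satoshis that could have been issued up to tip_height."""
        total, remaining, subsidy = 0, tip_height + 1, SUBSIDY
        while remaining > 0 and subsidy > 0:
            blocks = min(remaining, HALVING_INTERVAL)
            total += blocks * subsidy
            remaining -= blocks
            subsidy //= 2
        return total

    def check_no_inflation(sum_of_amounts: int, tip_height: int) -> None:
        if sum_of_amounts > max_supply_at(tip_height):
            raise ValueError("amounts exceed the maximum possible supply: inflation bug")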

One limitation of this approach is that there is no easy way to account for fees that miners did not claim, since we implicitly skip the per-transaction fee calculations. In theory, if someone later took these unclaimed funds, we would not notice. From a scarcity perspective, this is no worse than the worst case of assumevalid - funds being improperly taken via an invalid script. Furthermore, the amount of funds that could be taken in this way is very small [4].

Finally, there is one more consequence that is undoubtedly a disadvantage, though not a big one. Because we don't have the full previous-output data when checking a block's inputs, we cannot generate the so-called "undo" data - the data needed to roll back the blockchain in the event of a reorganization. This means we cannot roll back deeper than the assumevalid SwiftSync point. Running Bitcoin Core in pruned mode has exactly the same limitation. There is a practical way to deal with this: don't use assumevalid SwiftSync for the latest X blocks, where X only needs to be a depth at which you consider a reorganization unlikely.

In short, by relying on a minimally extended definition of assumevalid, and accepting the requirement (as with pruned nodes) to resynchronize in the event of a deep reorg, we get a very clean and simple protocol. Of course, the ability to verify in non-assumevalid mode is also important, so we'll discuss that next.

Without assumevalid

SwiftSync without assumevalid can fully verify all consensus rules, but it requires some extra steps. In addition to adding and subtracting the hashed outpoints, we also need to include everything needed to check an input (see the list of 5 fields above) in the hash.

Adding these to the hash aggregate is easy, because we already have the data when we need to add it (while processing the output). But when we want to remove it (while processing the input), we only have the outpoint - so we have to re-download the missing data. Compared to regular full validation, this may require roughly 15% more data, though there is plenty of room for more efficient encoding [5].
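Under these assumptions, the only change to the earlier aggregate sketch is what goes into the hash: the full entry data rather than just the outpoint. The serialization below is hypothetical and chosen only for illustration:

    import hashlib, os, struct

    SALT = os.urandom(32)
    MOD = 2 ** 256

    def entry_hash(outpoint: bytes, script: bytes, is_coinbase: bool,
                   height: int, amount: int) -> int:
        # Hypothetical serialization of all five fields, plus the secret salt.
        data = outpoint + script + struct.pack("<?IQ", is_coinbase, height, amount)
        return int.from_bytes(hashlib.sha256(data + SALT).digest(), "big")

    # While processing an output:  agg = (agg + entry_hash(...)) % MOD
    # While processing an input (using the re-downloaded / undo data for the
    # same entry):                 agg = (agg - entry_hash(...)) % MOD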

We could let users download this data out of band, but we can also consider having the peer-to-peer network provide it. Fortunately, most nodes may already have this data - the undo data mentioned above, which allows nodes to reorganize the blockchain, is exactly what we need [6]. Note that the same source that provides us with the hints also needs to provide us with a hash of the undo data, to ensure that random peers cannot feed us invalid data and cause verification to fail [7].

Because we are now checking amounts, a separate inflation check on the UTXO set is not needed, but we do run into some new problems (already mentioned in the previous section).

Order of transactions within a block

Because SwiftSync is parallel by default, we cannot always tell whether an output is spent before it is created; specifically, an output may be created and then spent within the same block (in all other cases, checking the block height is sufficient).

While this is undeniably a problem, its consequences are arguably not a big deal [8]. But we can fix it, and once we do, the security argument becomes easier to make.

The most straightforward and practical approach is to use a per-block cache. You can think of it as a per-block micro UTXO set (technically it only needs the data required to prove ordering), constructed in order. We only need to look up an input when the block height at which its output was created equals the current block height, although there may be reasons to do more [9]. This makes the protocol less elegant, because we add back some state and some sequential lookups, but the state is confined to a single block - we are still completely free to process all blocks in parallel.
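A rough sketch of that per-block cache, using made-up tuple shapes for transactions (each transaction is a pair of created outpoints and spent inputs, with each spent input carrying the height its output was created at):

    def check_intra_block_order(block_txs, block_height):
        """block_txs: iterable of (created_outpoints, spent_inputs) in block order,
        where each spent input is (outpoint, creation_height)."""
        created_so_far = set()                      # per-block cache, discarded afterwards
        for created_outpoints, spent_inputs in block_txs:
            for outpoint, creation_height in spent_inputs:
                # Only same-block spends need the lookup; all other cases are
                # covered by the block-height check described above.
                if creation_height == block_height and outpoint not in created_so_far:
                    raise ValueError("output spent before being created within the block")
            created_so_far.update(created_outpoints)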

If we want to stick with a no-cache strategy and avoid searching and sequential checking altogether, there are more options [10] , but for full nodes running on regular hardware, the block cache should be good enough.

BIP30 Verification

In order for SwiftSync to be equivalent to regular full validation, it also needs to satisfy BIP30, even without access to the UTXO set. BIP30 was only activated in March 2012, but it is enforced from the genesis block up to the activation of BIP34 (block height 227931, March 2013). BIP30 prevents duplicate outputs by invalidating blocks that create an unspent output that already exists. In theory, BIP30 does not completely prevent duplicate transactions (BIP34 does, though it brings some unresolved problems of its own), because it is still possible to create, spend, and then recreate an output, although in practice this has never happened.

Prior to BIP30 activation, there were two instances where the same coinbase transaction output was created twice. The second output overwrote the first, effectively making the first unspendable. These instances were treated as exceptions in BIP30. More information can be found here.

SwiftSync has no concept of a UTXO set, so we can't track whether an output already exists. Fortunately, another approach gives us the same result. We can simply keep a set containing the coinbase transaction id of every block and check them for uniqueness. This only takes 227931 * 32 bytes ~= 7MB of memory, and we can optimize it further and even make it stateless if we want [11]. It is not only equivalent to BIP30, but also a faster drop-in replacement for what we do today.
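A minimal sketch of that check, assuming the coinbase txids for heights 0 through 227930 are collected while validating, and tolerating the two historic duplicates mentioned above as exceptions:

    BIP34_HEIGHT = 227931      # BIP30 only needs this check below BIP34 activation
    KNOWN_DUPLICATES = 2       # the two historic duplicate coinbases (BIP30 exceptions)

    def check_coinbase_uniqueness(coinbase_txids) -> None:
        """coinbase_txids: list of 32-byte txids, one per block height < BIP34_HEIGHT."""
        duplicates = len(coinbase_txids) - len(set(coinbase_txids))
        if duplicates > KNOWN_DUPLICATES:
            raise ValueError("duplicate coinbase transaction found (BIP30 violation)")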

There is also a theoretical scenario in which a massive reorg rolls the chain back to 2013, meaning BIP34 never activates and BIP30 has to be enforced indefinitely. Rather than handling this unlikely scenario, a more practical solution is to simply fall back to non-SwiftSync validation if it ever happens.

SwiftSync and assumeutxo

Assumeutxo and SwiftSync are completely orthogonal. Assumeutxo starts with an assumption that the UTXO set is correct, and then performs verification in the background to verify the truth of this assumption. This latter part is where we can use SwiftSync.

One complexity caused by assumeutxo is that it requires two sets of state: one from the assumeutxo point to the current chain tip, and one from the Genesis Block to the assumeutxo point. Once the background verification is completed, we will return to one set of state again. Because SwiftSync is nearly stateless, using it in background verification may make things simpler.

An important but easily fixable difference is that while we no longer need to construct a UTXO set, we still need to verify that the assumeutxo UTXO set used as a starting point is exactly the same as the UTXO set that can be derived from the SwiftSync hints. We can do this with the same hash aggregation trick.

Each time we process an output that is hinted to remain in the UTXO set, we add it to the hash aggregate (this time hashing the full UTXO entry data, not just the outpoint). Similarly, we hash every entry in the assumeutxo UTXO set and remove it from the hash aggregate. If both are correct, we again end up with 0.
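A sketch of this cross-check, where `entry_bytes` stands for a hypothetical serialization of a full UTXO entry:

    import hashlib, os

    SALT = os.urandom(32)
    MOD = 2 ** 256
    H = lambda entry_bytes: int.from_bytes(hashlib.sha256(entry_bytes + SALT).digest(), "big")

    def crosscheck(hinted_unspent_entries, snapshot_entries) -> bool:
        agg = 0
        for e in hinted_unspent_entries:   # added while processing outputs during SwiftSync
            agg = (agg + H(e)) % MOD
        for e in snapshot_entries:         # subtracted while hashing the assumeutxo snapshot
            agg = (agg - H(e)) % MOD
        return agg == 0                    # zero means the snapshot matches the hints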

Satisfyingly, if you use the non-assumevalid version of SwiftSync, you don't even need the one-bit hint per output, because whether an output ends up in the UTXO set or not, the data you add to the hash aggregate is exactly the same (i.e. outputs - inputs - UTXO == 0). I find this rather elegant.

Acknowledgements

Verification optimization is a topic that has been discussed for years, and many people have explored related ideas before me. Cory Fields' UHS is probably the most similar to this protocol, and is itself an improvement on Bram Cohen's bitfields idea. Tadge Dryja's utreexo also overlaps with this protocol in terms of state reduction, parallelization, and cache-related initial block download (IBD) hints.

Thanks to theStack and 0xb10c for the initial benchmarks and statistics. I also want to thank the Bitcoin Core developers who have been actively working on IBD optimizations - davidgumberg, l0rinc, and andrewtoth - as well as those who provided helpful discussions, and many others who provided feedback.

Footnotes

1. What other options do we have for receiving hints? One downside of relying on hints shipped with your software is that the software may have been released months ago; beyond the point the hints cover, validation slows down significantly. Another possibility is to receive [non-assumevalid SwiftSync] hints from a trusted third party. In the worst case, this third party lies and your verification eventually fails. The important thing is that you never assume an invalid chain is valid. It is possible to mix and match the two approaches (e.g. to catch up after your node has been offline), but in that case we would have to add some additional logic to remove UTXOs from the set between SwiftSync sessions. Finally, hints could also be committed to the chain, but that requires a soft fork.

2. Why not MuHash or XOR instead of hashing with a secret salt? MuHash is a viable alternative if we wanted to remove the secret-salt requirement (e.g. to allow state comparisons between clients), but it seems like overkill for our use case. XOR is insufficient because elements cancel each other out if they appear twice (once among the inputs, once among the outputs). This could be compensated for with additional checks, but at that point you are pretty close to what we are proposing here.

3. What happens in the worst case if we ignore timelocks and transaction ordering? Unenforced timelocks or out-of-order transactions simply mean that some funds are spent prematurely, or that something weird happens with the order of transactions (funds are spent from a non-existent output first, creating inflation, and then the non-existent output is created later, canceling out the inflation) - if the inflation is not corrected, SwiftSync validation will fail. Crucially, this is not worse than the worst case of assumevalid - an invalid script spending an output.

4. What if we really want a more precise check on the fees that miners leave unclaimed? In theory, we could provide amount hints for all inputs of the blocks in which miners did not claim all the fees they were due, plus hints about which outputs were spent in those blocks, and include the amounts in the hash for those specific cases. Miners could grief this by always leaving 1 satoshi behind. A soft fork requiring a commitment to total fees, or disallowing fees from being left unclaimed, would be a more drastic solution. That said, given that the worst case is no worse than what assumevalid already assumes, it seems perfectly reasonable to accept this limitation. Those who care can simply choose to disable assumevalid.

5. How can we compress the data needed for the non-assumevalid version? There are plenty of opportunities for efficient compression and deduplication: many repeating patterns (e.g. standard transactions), hashes whose preimages we know (e.g. P2SH), and frequently occurring data (e.g. most inputs spend outputs created at recent block heights). Some ideas from BIP337 may be reusable as well.

6. What else could providing undo data over the P2P network be useful for? It allows pruned nodes to roll back deeper than their own prune point, as long as they save a hash of the pruned undo data (the assumevalid version of SwiftSync unfortunately can't produce this) and peers are willing to provide the undo data for the blocks being forked away from. Because undo data essentially allows you to fully validate blocks without context, protocols like "PoW fraud proofs" (low-bandwidth blockchain validation by selectively validating forked blocks) can also benefit from it.

7. Is there another way to provide the undo data without creating DoS problems? The most direct way is to hash this data and commit to it in every block as part of consensus, but soft forks are hard. There is also a theoretical peer-to-peer method of comparing the undo-data state between peers and, if there is a mismatch, figuring out which peer is lying (this works as long as at least one peer is honest), but this is also relatively involved. An example of doing this (albeit in a different context) can be found here.

8. Assuming we don't check transaction ordering within blocks, how could an attacker exploit this? Accepting a chain with incorrect transaction ordering has no effect on fund ownership or inflation. The only problem is that regular fully validating nodes will reject this alternative ordering, so it would effectively be a hard fork. However, this fork cannot happen unless the alternative ordering actually appears on the chain with the most proof-of-work. It is very hard to imagine a situation an attacker could actually exploit. Here is an attempt: (1) reorganize the current most-work chain and mine a block that uses the alternative ordering; (2) have the victim SwiftSync-verify all the way to the tip of that chain so they accept your alternative ordering (perhaps by compromising their software first?); (3) have the victim buy bitcoin from you on this "wrong" chain. Eventually, the "good" chain overtakes your "bad" chain again. All in all, this attack does not seem more effective than simply selling your bitcoin and then reorging the chain. Even if the attacker's goal is merely to cause chaos, it is hard to see how this differs in effect from any other large-scale reorg.

9. Could we save bandwidth by caching instead of re-downloading every UTXO? Yes, and there are good ways to handle this, but it does add more complexity and makes the verification process more serial (harder to parallelize). The cache could cover multiple blocks, and you could even go a step further and provide hints about which outputs will be spent. This could bring back some bottlenecks, such as bandwidth.

10. If we don't have a cache, how do we check ordering within a block? We could add an extra field to each UTXO set entry: a canonical transaction number (output numbers would also work, and are attractive for other reasons, but are more granular than we need). Now, when an output is about to be spent and its creation block height equals the current height, we check the transaction number to verify that it was created before the transaction we are evaluating. The downside of this approach is that it increases the size of the full UTXO set (by 2 bytes per entry). A more interesting approach might be to add an extra hint to every output that we know will be spent within the same block. Specifically, we hint the number of the transaction that will spend it (which must be greater than the number of the transaction that created it - we only need to encode the difference between the two) and include this hint in the hash. Now, when we encounter an input whose creation height matches the current height, we include its transaction number in the hash and remove it from the aggregate. Unfortunately, outputs being spent in the same block they were created in is quite common, so the necessary hint data could grow significantly. While caching is the more obvious approach, it would be interesting to benchmark it against a cache-free alternative.

11. BIP30 is another place where caching comes in handy - how can we optimize this? We can shrink the dataset by truncating the txids. Normally we would worry about collisions, but we can guarantee a collision-free result on this specific chain by pre-computing which bits need to be kept (rehashing with a salt is also possible, but slower to compute). Information-theoretically this could get us down to about 18 bits per txid, although that is not computationally feasible; in practice we should be able to get down to 4 bytes (or even 3), bringing the total below 1 MB. We can also go a step further and drop the state and the lookups entirely with the help of the hash aggregate. To do this, we take an ordered list of the truncated hashes as a hint. We iterate over this list, checking that each element is greater than the previous one (thus proving uniqueness) and adding each to the hash aggregate. As we encounter the coinbase txids in the blocks, we remove them from the hash aggregate (if we end up at 0, we have proven set equivalence).
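A sketch of that stateless variant, with the hint list and truncation treated as given (handling of the two historic duplicate coinbases is omitted for brevity):

    import hashlib, os

    SALT = os.urandom(32)
    MOD = 2 ** 256
    H = lambda b: int.from_bytes(hashlib.sha256(b + SALT).digest(), "big")

    def add_hint_list(truncated_txids_sorted, agg=0):
        prev = None
        for t in truncated_txids_sorted:            # provided as a hint, in ascending order
            if prev is not None and t <= prev:
                raise ValueError("hint list not strictly increasing (possible duplicate)")
            agg = (agg + H(t)) % MOD
            prev = t
        return agg

    def remove_seen_coinbases(truncated_txids_from_blocks, agg):
        for t in truncated_txids_from_blocks:       # encountered while validating blocks
            agg = (agg - H(t)) % MOD
        return agg                                  # must end at 0 to prove set equivalence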
