By Suhas Daftuar
The original article was published in February 2019.
summary
The Bitcoin block header contains a commitment to the set of transactions confirmed by the block, which is achieved by constructing a Merkle tree with transaction ids (two consecutive SHA256 hashes of the transaction) and then including the root of the tree in the block header. In turn, it is possible to prove to a Bitcoin light client that a transaction is confirmed by a block by providing a Merkle path from the root to the transaction. However, this particular construction of the Merkle tree has several security weaknesses, including at least two block melting vulnerabilities that affect Bitcoin Core consensus logic, and an attack on light clients (an invalid transaction can be "proven" to appear in a block) with much less work than finding a collision of two consecutive SHA256 hashes.
Acknowledgements
Many people have reported some of the issues described in this summary on the bitcoin-dev mailing list, including Sergio Lerner and Peter Todd. This summary is based on private IRC discussions between the author and Johnson Law and Greg Maxwell. Some of the observations discussed in Chapter 3 originally came, I believe, from Luke Dashjr.
1 Background
1.1 Construction of Merkle Root
To recap, the Merkle root construction used by Bitcoin starts by performing two consecutive SHA256 operations (double-SHA256) on each transaction [1] , and obtaining the hash value $H$ as the leaf of the tree. For each pair of consecutive leaves, we concatenate the leaves and perform two consecutive SHA256 hash operations on the leaves, using the resulting hash value as the parent node of the pair of leaves. Importantly, if there is an odd number of nodes on a certain level of the tree, the last element will be copied first, so its parent node will be the hash value of the element copied itself and concatenated. For example, if there are three transactions in a block, the Merkle tree will look like this:
Here, Node1 is calculated using H(H(Tx1)||H(Tx2))
, and Node2 is calculated using H(H(Tx3)||H(Tx3))
. Finally, the Merkle root is calculated using H(Node1||Node2)
, and this Merkle root appears in the block header.
If a block has only one transaction, then the Merkle root will be the hash of this transaction. Note that every valid block must contain at least one transaction.
1.2 Block Validity in Bitcoin Core
Bitcoin Core's consensus logic breaks block validation into several parts and keeps track of a block's validity across all of these stages. For example, when a new block header is processed (generally before a block is received), the header is checked for validity under the network's consensus rules. Later, when a block arrives, it is also processed in stages, running context-free checks first (not dependent on any other blocks and headers), then checks that depend on the block's header chain, and finally checks on transactions and their signatures, and whether the block belongs to the header chain with the most work.
Each stored block header has an associated cache value that tracks how many validations have been completed for the associated block. If a block is found to be invalid, Bitcoin Core will permanently store the invalid state of the block and avoid reprocessing the block and its successors.
In the next chapter, we discuss the concerns surrounding permanently marking invalid blocks to prevent attackers from using this optimization to split the network.
2 Copying transactions, CVE-2012-2459
This is written in the Bitcoin Core source code:
警告!如果你是为学习密码学以及/或者设计一种将要使用默克尔树的新系统,请记住,下面的默克尔树算法有一个跟交易id 重合相关的严重错误,最终会产生漏洞(CVE-2012-2456)。原因在于,如果某个时刻,在列表中的哈希值的数量是奇数,则最后一个哈希值会被复制,以计算下一层(这在默克尔树中是不常见的设计)。这导致交易的特定排序会产生相同的默克尔根。举个例子,这两棵树: AA / \ / \ BCBC / \ | / \ / \ DEFDEEF/ \ / \ / \ / \ / \ / \ / \1 2 3 4 5 6 1 2 3 4 5 6 5 6交易列表[1, 2, 3, 4, 5, 6] 跟[1, 2, 3, 4, 5, 6, 5, 6](5 和6 被重复了)会得出相同的哈希值A(因为(F) 和(F, F) 的哈希值都是C)。漏洞在于,你可以发送一个带有重复交易列表的区块,它跟原版区块有相同的默克尔根值、相同的区块哈希值,但却无法通过验证。然而,如果接收节点将这个区块标记为永久无效的,就不再能接收同一区块没有篡改过(因此可能是有效)的版本。我们通过检测这种情形来防止这种误伤:检测会在列表末尾哈希两个完全相同的哈希值的行为,并跟具有无效默克尔根的区块采取相同的处理措施。假设没有double-SHA256 的碰撞,这将检查出所有已知的改变交易而不影响默克尔根的方法。
In the next section we discuss a new type of block melt that was not discovered at the time the above comment was written.
3 Weaknesses caused by Merkle tree ambiguity
Greg Maxwell has pointed out (in a private IRC message) that a weakness stems from the "lack of domain separation between leaves and nodes" in the Merkle tree. A non-leaf node in the Merkle tree is the hash of a 64-byte input (the concatenation of its children). Meanwhile, a leaf node is the hash of a transaction (in serialized form without witnesses). If the serialized form of a transaction (without witness data) can be 64 bytes, then there is no way to distinguish between leaf nodes and non-leaf nodes, which could expose security vulnerabilities.
3.1 Block meltability
Consider a block $B$ with 2 transactions. The Merkle root of block B is:
Imagine a block with only 1 transaction. Then the Merkle root would be H(Tx1)
.
Suppose a peer forwards the block header of $B$ and claims that $B$ contains only one transaction, not two, and that transaction $T$ in the block is serialized as H(Tx1)||H(Tx2)
. If T can be successfully deserialized into a single transaction [2] (and can indeed be reserialized as H(Tx1)||H(Tx2)
), then the hash of $T$ will match the Merkle root in the block header. If T is invalid, then a node that receives this version of block $B$ will reject the block as invalid. However, if the receiving node permanently marks the block hash as invalid, it will no longer be able to process and accept a truly valid block $B$, even if an honest node transmits two valid transactions. This can be used to drive a node out of consensus.
This attack can be generalized. In a block with $N$ transactions, the attacker claims that the block actually has N/ 2k transactions, all of which are 64-byte "transactions" whose hash values are the nodes in the original k-th row of the tree. If all these 64-byte values can be successfully deserialized into transactions (and therefore can be communicated), then the nodes that receive this block should not permanently mark this hash value as invalid, even if these 64-byte transactions are invalid.
**3.1.1 How much work is required to make such a 2-transaction block so that the Merkle root preimage can be deserialized into a transaction? **
Transactions are serialized like this:
[version][vin][vout][locktime]
The version number and lock time are both 4-byte fields, and there is no requirement for deserialization. $vin$ (transaction input) and $vout$ (transaction output) are serialized as follows:
[|vin|][vin0]...[vinn][|vout|][vout0]...[voutn]
The size of $vin$ is encoded using Bitcoin's "compact volume". Because each $vin$ contains a 32-byte hash of the previous transaction, and we want to create a 64-byte value that can be successfully deserialized into a transaction, we can constrain |vin|
to 1, and only one byte ( 0x01
) is needed to encode it. Each $vin_i$ consists of the following fields:
[hash][index][scriptSig][sequence]
Where hash
is the 32-byte hash of the previous transaction, and index
is a 4-byte sequence number that indexes the $vout$ vector of the transaction specified by hash
. A valid coinbase transaction must have an empty previous transaction hash and the index number must be set to Oxffffffff
, but invalid coinbase transactions are not subject to these values.
scriptsig
is encoded as the length of the script plus the contents of the script, so there is a constraint that the length must match the number of bytes read. For now, we can assume an empty $scriptSig$, which is only 1 byte long, encoded as 0x00
.
sequence
is a 4-byte field with no constraints.
The $vout$ array is also encoded with the length, and in order to make the serialization effective, this rule must also be followed. So we also constrain the volume of $vout$ to 1, which will introduce an additional fixed byte 0x01
. The remaining vout vector will be serialized like this:
[amount][scriptPubKey]
amount
is an 8-byte field representing the quantity, and scriptPubKey
, like scriptSig
, has a length encoding, so it is at least 1 byte.
To summarize, a 64-byte transaction that can be successfully deserialized can be constructed like this:
A = version number (4 bytes, unconstrained)
B = vin amount (1 byte, constrained)
C = previous transaction id (32 bytes, no constraint)
D = Preamble output index number (4 bytes, unconstrained)
E = Script signature (1 byte, constrained)
F = sequence (4 bytes, unconstrained)
G = number of vouts (1 byte, constrained)
H = amount (8 bytes, unconstrained)
I = length of the script's public key (1 byte, constrained)
J = remainder of the script's public key (4 bytes, unconstrained)
K = locktime (4 bytes, unconstrained)
Here, we set $scriptPubKey$ in $vout_0$ to a total of 5 bytes, so that the transaction is 64 bytes long, and successful deserialization is constrained to only those bytes (length encoding). But looking at it in its entirety, we can see that we only need the sum of the lengths of $scriptPubKey$ and $scriptSig$ to be 4 bytes.
Also note that only 4 of the 64 bytes are constrained, and they all appear in different halves. So, to make a preimage of the Merkle root that can be efficiently deserialized into a block header with 64-byte transactions, only 8 bits of searching are required to find a valid coinbase transaction (which hashes to the first 32 bytes), and about 22 bits of searching ((1/5) * 2 24 , so slightly less than 22 bits) to find the second transaction, which hashes to the second 32 bytes - very little computation.
(Translator's note: What this means is that you need to find a 64-byte value that can be deserialized into a transaction; and its first 32 bytes are exactly the hash value of a coinbase transaction, and its last 32 bytes are exactly the hash value of another transaction. In other words, you need to find a combination of a coinbase transaction + a normal transaction, and the concatenation of their hash values can be deserialized into a transaction.)
3.1.2 How much work is required to create a block whose Merkle tree nodes can all be deserialized into valid transactions?
Note that the first transaction in a block must be a coinbase transaction, and, as mentioned before, this heavily constrains the first 32 bytes representing the first transaction: only the 4 bytes of the version number are unconstrained. So, you need to search at least 28*8 = 224 bits to find a case where the first half of the first node of a given row in the tree matches a coinbase transaction, and then continue searching to find a case where the second half matches a somewhat meaningful transaction (this is much easier, only about 16 bytes are constrained, so only 128 bits of search are needed to find a collision). Of course, you can use any row in the Merkle tree, but obviously this should be computationally infeasible.
3.2 Attacks on SPV (Light) Clients
SPV clients are expected to receive proof that a transaction appeared in a block. The nature of this proof is to provide the client with a path through the Merkle tree, from the root to the transaction.
However, suppose a (valid) 64-byte transaction $T$ is included in a block, and its next 32 bytes (less constrained than the previous 32 bytes) are carefully constructed so that it collides with some other fake, invalid transaction F. Let $R$ be the Merkle root value, then we have:
R = H(H(H(...H(T)||H(...)...)T = [32bytes]||[H(F)]
If such a transaction is possible to construct, then a node can fool an SPV client into believing that $F$ is in the block — because it can provide a Merkle path from the Merkle root to $T$ and then interpret $T$ as an internal node (rather than a leaf node). Because the number of transactions in a block is not visible in the block header, the SPV client cannot know this number in advance, and therefore cannot know the correct depth of the tree.
3.2.1 How much work is required to make a valid 64-byte transaction so that the next 32 bytes collide with the hash value of another transaction?
Note that Sergio Lerner has published a dedicated analysis showing that this can be achieved with a 72-bit search; Peter Todd has also published an analysis claiming that this can be achieved with a 60-bit search.
From the diagram above, we can see that the first 32 bytes of the transaction are subject to more constraints, because it contains more of the previous output that is spent, and it must be valid (otherwise the transaction is invalid). Then, we focus on the last 32 bytes:
C = Previous transaction id (the remaining 5 bytes of the previous transaction id, no constraints)
D = Preamble output index (4 bytes, 21 bits constrained)
E = scriptSig (1 byte, constrained)
F = sequence (4 bytes, no constraint if the transaction version number is 1)
G = number of transaction outputs (1 byte, constrained)
H = amount (8 bytes, about 33 bytes are limited, see below)
I = length of scriptPubKey (1 byte, constrained)
J = remainder of scriptPubKey (4 bytes, unconstrained)
K = locktime (4 bytes, about 29 bits unconstrained, see below)
We can assume that the remaining bits of the predecessor transaction id are unconstrained, because once we find a candidate transaction, we can do a dedicated 40-bit search to find an input whose last 5 bits of the predecessor transaction id match our requirements. Similarly, we can partially constrain the predecessor output index number and only require that it is not greater than (say) 2048, because we can assume that we can create a transaction with 2048 outputs and the appropriate amount in the appropriate output position.
We can assume that $sequence$ is unconstrained, since this field has no consensus meaning for version 1 transactions. Since we assume that the first 32 bytes of a transaction are fixed, we assume that the version number is 1.
$amount$ is an 8-byte field. If the attacker has access to a lot of money, then more bits are free, since the consensus constraint is that the output value must be less than or equal to the input value. However, since the input funds of this transaction are spendable by anyone (or not spendable by anyone), the funds used in this attack may not be recovered (if the attacker is also a miner, then they can try to recover the funds by re-spending the same block, although other miners may orphan the block, thereby stealing the attacker's transaction fees and the spendable output by anyone). Moreover, as long as we can lose funds, we must consider other attacks that can be run for the same cost, such as mining a small number of invalid blocks and using these blocks to attack light clients (note that these attacks are different in nature, though, since this attack can fool a light client into believing that a fake transaction has been confirmed by an arbitrary number of blocks - which may be more valuable than mining a small number of fake blocks, since fake blocks do not have the ability to confirm the target transaction by an arbitrary number of blocks).
For the sake of discussion, we set the cost the attacker is willing to bear to roughly 25 BTC, which is currently worth 2 block rewards. At this time, the number of constrained bits is 33. (If we assume that the attacker is willing to risk 687 BTC, the number of searches required will drop by 5 bits).
$locktime$ is a 4-byte integer that is interpreted as Unix time if it is greater than 5000 0000, and as block height if it is less than this value. The current time is about 1.4 billion seconds since the epoch, so we estimate that there are 900 million valid locktime values, which is about 29 bits of freedom.
Putting this together, we have 81 bits constrained, meaning that an attacker who can run an 81-bit search (plus an extra 40 bits to construct a funding transaction that provides funds that can be spent by anyone) can fool an SPV light node in this way. (Note: this is much smaller than the 128-bit search we would expect to need to find two transactions with the same hash.)
4 Vulnerabilities and Mitigations
4.1 Block meltability
The consensus split risk brought by block melting is rooted in the processing logic around invalid blocks. As mentioned in the Bitcoin Core code comments above, we have realized that as long as the processed blocks may be melted, the consensus logic must never permanently mark any block header as invalid: as long as a block corresponding to the same block header may be valid, permanently disabling the block header may lead to consensus split.
The problem with copied transactions has been fixed since Bitcoin-Qt version 0.6.1. Back then (and now) there was a set of checks on incoming blocks (performed in a function called CheckBlock()
), and if these checks failed, the block was not stored - and check errors were not cached permanently. The new check for copied transactions was added to this set of checks, solving the problem.
Another check going up the Merkle tree, also performed in CheckBlock()
is related to coinbase transactions: if the first transaction in a block does not have the structure required for a coinbase transaction—an input whose previous transactions hash to all 0s and whose index is all 1s—then the block will fail validation. A side effect of this check is that it essentially prevents the problem discussed in Section 3.1 about block meltability before we even know about it—as mentioned above, it requires an attacker to perform at least 224 bits of search work to produce a melted block that passes the coinbase transaction check.
As of Bitcoin Core 0.13.0, the implementation around handling block melt has changed so that the possibility of a merkle tree being melted is checked in CheckBlock()
, and a flag is used to track whether the block has been melted. This flag is used in the logic that decides whether the block should be permanently disabled. But CheckBlock()
is still called twice: once with the feature of returning early on failure, as described above, and once before the block is stored (required to pass the check). So even if a block fails in CheckBlock()
for reasons that make us think it is definitely invalid (it can't be melted), the logic of the function will not cause the block to be permanently disabled.
Therefore, calling it twice seems redundant. In Bitcoin Core commit dbb89dc
, the seemingly redundant call was removed, so that when CheckBlock()
fails due to an unknown melt possibility, the error will permanently disable the corresponding block. Therefore, Bitcoin Core 0.13.0, 0.13.1, and 0.13.2 are all vulnerable to the attack described in Section 3.1.
After the issue was discovered privately, the change was reversed in Bitcoin Core commit ba803ef
, just in time for the 0.14.0 release. 0.13 was the only version known to be vulnerable to this attack.
Further mitigations may be possible in the future, such as never marking a block permanently invalid if it consists entirely of 64-byte transactions [3] or disabling 64-byte transactions in consensus. Proposals for these more immediate fixes have been deferred pending discovery and mitigations of the issues faced by light clients (described in Section 3.2).
4.2 Light Client
There are a number of mitigations that can be implemented to protect light clients from the attacks described in Section 3.2. Starting with Bitcoin Core 0.16.1, 64-byte transactions are no longer allowed into the mempool. If miners adopt this rule, it would be reasonable to propose a consensus change that would invalidate 64-byte transactions, which, if adopted, would permanently eliminate the ambiguity between leaves and internal nodes in the Merkle tree, and would also protect existing light clients (which would not require any changes) [4] .
Note that there have been a small number of 64-byte transactions on the Bitcoin blockchain.
Another mitigation that light clients could employ is to require that proofs also include proofs for the coinbase transaction along with the coinbase transaction itself (which is used to determine the depth of the Merkle tree, which must be the same for all transactions in a block). This would require changes to light client software — specifically, to verify that incoming coinbase transactions are of a predefined form — but would not require changes to consensus.
Sergio Lerner also discussed additional mitigation measures in his report.
References
Sergio Lerner, “ Leaf-Node Vulnerability in Bitcoin Merkle Tree Design ,” August 4, 2017. https://bitslog.com/2018/06/09/leaf-node-weakness-in-bitcoin-merkle-tree-design/
Peter Todd, “ Trusted merkle tree depth for safe tx inclusion proofs without a soft fork ,” June 7, 2018. https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-June/016091.html
footnote
1. In this article, we only discuss the Merkle root value in the block header, so transactions are serialized and hashed without witnesses. ↩
2. Note that any block, whether melted or not, is considered a block by a node only if it can be communicated - that is, the message can be deserialized according to the network's message processing rules. Unintelligible messages are simply discarded. ↩
3. Note that the "64 bytes" in this article refers to the serialized length of a transaction without a witness. ↩
4. We omitted attacks against SPV nodes that claim an internal node is a transaction of interest by walking up the Merkle tree. Under reasonable assumptions, it is possible to fool a light client into thinking an output has been spent with 116 bytes of work. If consensus is changed to invalidate 64-byte transactions, and light clients are modified to reject 64-byte transactions, this problem will no longer be a concern. ↩