The Fusaka fork will introduce 1D, column-based data availability sampling, where each sample is a column made up of cells and proofs from all blobs. After Fusaka, improving the propagation of columns is one of the main avenues to further increase blob throughput. The tools proposed to do so have so far been moving from push-based gossip to an announce-and-pull/push hybrid, and introducing some form of coding, like RS or RLNC. In this post, we explore various ways to do the latter, while also making individual cells the basic unit of propagation. Thanks to @asn, @dankrad, @potuz, @MarcoPolo, @vbuterin for discussion and feedback.
Goals
- Efficient propagation. Apply some form of erasure coding to column propagation, to unlock a better curve in the latency-bandwidth tradeoff space.
- Cell-centric gossip. Make cells be the unit of propagation:
  - lets nodes use mempool blobs to contribute to propagation even when they only have a (non-contiguous) subset of them (e.g., even when pulling a single blob)
  - possibly lets us eventually move to 1 message = 1 network packet. For example, with 256 columns, 1 cell = 1 KB and fits into one packet even with its KZG proof (see the quick arithmetic below).
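As a quick check on the packet-size claim (with 4096-field-element blobs and the 2x horizontal extension, i.e., 8192 field elements per extended blob):

$$
\frac{8192 \ \text{field elements}}{256 \ \text{columns}} = 32 \ \text{field elements per cell} = 32 \times 32\,\text{B} = 1024\,\text{B},
$$

so a cell plus its 48 B KZG proof is roughly 1.1 KB, comfortably below a typical 1500 B MTU (ignoring gossip framing overhead).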
Note that the second goal is a fairly strong constraint. If we just wanted to improve the CL propagation path, without concerning ourselves with the interaction with the EL mempool and distributed blob publishing, things would be significantly simpler for some of the designs, because we could work with fewer, larger chunks. However, the following exploration takes this constraint as a given.
In the rest of the doc, we’ll by default assume 256 columns and 128 blobs total, unless otherwise specified.
Overview of Approaches
The designs we explore primarily differ along two axes:
- Encoding Scheme: How redundancy is introduced. We always take a cell to be the fundamental unit of the encoding scheme, and consider two possibilities, both of which consist of linear combinations of the cells in a column:
  - Reed-Solomon (RS) coding: applying standard erasure coding vertically to columns (similar to how it’s already used horizontally for rows in EIP-7594).
  - Random Linear Network Coding (RLNC): random linear combinations of the original cells in a column.
- Gossip Verification Mechanism: How nodes ensure the integrity and authenticity of the chunks they receive and forward during gossip, before reconstruction. For example, we could break a `DataColumnSidecar` up into chunks and encode it, but it would only be verifiable once the whole message is reconstructed, whereas we need each chunk to be independently verifiable. We ensure verifiability in one of the following ways:
  - KZG proof verification: Each cell, including the coded ones, carries its KZG proof, which is verified during gossip. KZG proofs for coded cells are also linear combinations of the original proofs, verified against linear combinations of the original commitments, taking advantage of the linearity of KZG.
  - Cell root verification: The block commits to all cell roots, both the original and the coded ones. Lists of cell roots for each column are distributed in the respective subnets, and verified against the commitment in the block. Gossip verification involves a simple check against the relevant list. This approach does not allow recoding during propagation, because we can only verify the cells whose roots were committed to and distributed.
  - Pedersen: The proposer uses Pedersen vector commitments to commit to the base cells, and linear combinations are verified homomorphically against these. It’s the approach from this post, adapted to column propagation.
RS
At a high level, we vertically extend the columns through RS encoding (the same used for the horizontal extension), and gossip individual cells. We consider two ways of doing so, which differ in the way they ensure integrity of the transmitted chunks:
- RS + KZG: cells come with kzg proofs, which are verified before forwarding
- RS + cell roots: the publisher commits to the cells in a simple way, e.g., sending lists of cell roots in the relevant subnets, and committing to these in the block. Forwarding just requires checking against these roots, not proof verification.
The second option has much lower computational cost for the publisher, which does not need to do the expensive computation of extension proofs.
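To make the vertical extension concrete, here is a minimal sketch of extending one column cell by cell, as a plain matrix-vector product over the BLS scalar field. The names (`extend_column`, `m_rs`) and the explicit matrix are illustrative assumptions; a real implementation would reuse the systematic RS code of the horizontal extension and FFT-based evaluation rather than an $O(N^2)$ matrix product.

```python
# Sketch: vertical RS extension of one column, cell by cell.
# Assumptions (not spec): a cell is a list of 32 field elements (256-column case),
# and m_rs is a 128x128 matrix mapping the original cells to the extension cells.

BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513
CELLS_PER_COLUMN = 128          # original cells, one per blob
FIELD_ELEMENTS_PER_CELL = 32    # with 256 columns

def extend_column(cells: list[list[int]], m_rs: list[list[int]]) -> list[list[int]]:
    """Compute the 128 extension cells of a column as linear combinations of the
    original cells: extension[r][k] = sum_i m_rs[r][i] * cells[i][k]."""
    assert len(cells) == CELLS_PER_COLUMN
    extension = []
    for row in m_rs:  # one extension cell per matrix row
        acc = [0] * FIELD_ELEMENTS_PER_CELL
        for coeff, cell in zip(row, cells):
            for k in range(FIELD_ELEMENTS_PER_CELL):
                acc[k] = (acc[k] + coeff * cell[k]) % BLS_MODULUS
        extension.append(acc)
    return extension
```

The same matrix is applied to every column, which is also why the extension commitments and proofs discussed below are just $M_{RS}$ applied to the original commitments and proofs.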
RS + KZG
Each `DataColumnSidecar` subnet is replaced by a `CellSidecar` subnet, where all cells corresponding to the vertically extended column are gossiped. In addition, we have a single, global `HeaderSidecar` topic, to gossip the signed header with all KZG commitments (and an inclusion proof). The `HeaderSidecar` topic would have a high-degree mesh, e.g. 30, to ensure that it propagates as fast as possible. We can accept a high duplication factor here, because the message is only a few KB (most of the size is due to the `kzg_commitments`, ~6 KB for 128 blobs).
```python
class CellSidecar(Container):
    cell: Cell
    row_index: RowIndex
    column_index: ColumnIndex
    kzg_proof: KZGProof

class HeaderSidecar(Container):
    signed_block_header: SignedBeaconBlockHeader
    kzg_commitments: List[KZGCommitment, MAX_BLOB_COMMITMENTS_PER_BLOCK]
    kzg_commitments_inclusion_proof: Vector[Bytes32, KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH]
```

CellSidecar topic (subnet)
cell_sidecar_{subnet_id}
- Reject if `subnet_id` does not match `compute_subnet_for_cell_sidecar(column_index)`
- `CellSidecar` verification is put on hold until a `BeaconBlock` or a `HeaderSidecar` has been received.
- Once a `kzg_commitment` for `row_index` has been obtained, reject if KZG proof verification fails
While waiting for a matching `BeaconBlock` or `HeaderSidecar`, neither message delivery nor forwarding can happen. Therefore, whenever a peer sends us a cell that we cannot yet verify, we request a `HeaderSidecar` from them (these are quite small, and we can cap the number of parallel requests we send out). We could also downscore a peer that doesn’t respond to this request after a timeout, to punish sending unverifiable cells.
Once a HeaderSidecar has been received, all kzg proof verifications can be scheduled flexibly, to maximize the benefits of batched verification.
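A rough sketch of this buffering and batching flow is below. All names (`pending_cells`, `request_header_sidecar`, `batch_verify_cells`, `forward`) are illustrative placeholders rather than spec or libp2p APIs; the batch verifier is assumed to behave like the existing EIP-7594 batched cell-proof verification.

```python
# Sketch: defer KZG verification for cells that arrive before the commitments,
# then verify the whole buffer in one batch once the HeaderSidecar is known.
from collections import defaultdict

pending_cells = defaultdict(list)   # block_root -> [CellSidecar]
known_headers = {}                  # block_root -> HeaderSidecar

def request_header_sidecar(peer, block_root): ...   # placeholder: req/resp to the sending peer
def batch_verify_cells(commitments, cell_sidecars): return True  # placeholder: batched KZG check
def forward(cell_sidecar): ...                       # placeholder: gossip publish

def on_cell_sidecar(cell_sidecar, block_root, peer):
    header = known_headers.get(block_root)
    if header is None:
        # Cannot verify yet: buffer the cell and pull the header from the sender
        # (capped number of parallel requests, downscore on timeout).
        pending_cells[block_root].append(cell_sidecar)
        request_header_sidecar(peer, block_root)
        return
    verify_and_forward([cell_sidecar], header)

def on_header_sidecar(header, block_root):
    known_headers[block_root] = header
    # Flush the buffer in one batch to amortize verification cost.
    verify_and_forward(pending_cells.pop(block_root, []), header)

def verify_and_forward(cell_sidecars, header):
    if not cell_sidecars:
        return
    # Simplified: for extension rows, the interpolated commitments discussed
    # below would be used instead of the original ones.
    commitments = [header.kzg_commitments[c.row_index] for c in cell_sidecars]
    if batch_verify_cells(commitments, cell_sidecars):
        for c in cell_sidecars:
            forward(c)
```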
HeaderSidecar topic (global)
header_sidecar
- Ignore if it is from a future slot or not from a slot greater than the latest finalized slot
- Reject if the inclusion proof is not valid
- Reject if the header’s signature is not valid
- Reject if the current finalized checkpoint is not an ancestor of `block_header.parent_root`
- Ignore if it is not the first sidecar for the tuple `(block_header.slot, block_header.proposer_index, sidecar.index)` with valid header signature and sidecar inclusion proof
- Reject if the proposer is not the expected one for `block_header.slot` in the context of the current shuffling, defined by `block_header.parent_root`.
KZG commitments for extension cells
The HeaderSidecar only contains the KZG commitments for the original cells. Verifying the extension cells requires us to interpolate the original commitments, i.e., compute $C' = M_{RS} \cdot C$, where $C = (C_1, \dots, C_{128})$ are the original commitments and $M_{RS}$ is the RS matrix. This requires 128 (one per commitment) 128-point MSMs in G1, around ~130ms multi-threaded (~1ms per 128-point MSM on zka.lc). Note that the computation only has to be done once per node, when first receiving the HeaderSidecar.
We could in principle avoid repeating the computation at every node: `kzg_commitments` in the `HeaderSidecar` could contain all commitments, including the extension ones. These would need to be verified for consistency with the original ones, but that can be done faster than recomputation, by choosing a random scalar vector $r$ and checking $r^T C' = (r^T M_{RS}) \cdot C$. This only requires two 128-point MSMs in G1, ~2ms. The tradeoff is of course with bandwidth, as this would mean another ~6 KB in the HeaderSidecar.
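A minimal sketch of this probabilistic consistency check, assuming an MSM helper in the style of the consensus specs' `g1_lincomb` (passed in explicitly here, since the exact API is an assumption):

```python
# Sketch: check the claimed extension commitments against the originals with two
# MSMs instead of 128, by testing r^T C_ext == (r^T M_RS) C for a random r.
import secrets

BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def check_extension_commitments(original, extension, m_rs, g1_lincomb) -> bool:
    """original, extension: lists of 128 G1 commitments; m_rs: 128x128 RS matrix;
    g1_lincomb(points, scalars) -> G1 point: assumed MSM helper."""
    r = [secrets.randbelow(BLS_MODULUS) for _ in range(len(extension))]
    # Left side: r^T * C_ext (one 128-term MSM)
    lhs = g1_lincomb(extension, r)
    # Right side: (r^T * M_RS) * C (scalar-only work plus one 128-term MSM)
    r_t_m = [sum(r[k] * m_rs[k][i] for k in range(len(r))) % BLS_MODULUS
             for i in range(len(original))]
    rhs = g1_lincomb(original, r_t_m)
    return lhs == rhs
```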
Computational cost
The most expensive operation is the interpolation of the original proofs to compute the extension proofs. For each column, this is the exact same operation we discussed for the extension of KZG commitments, i.e., 128 (one per proof) 128-point MSMs in G1, to compute $\pi_j' = M_{RS} \cdot \pi_j$, where $\pi_j = (\pi_{1j}, \dots, \pi_{128j})$ is the vector of original proofs for column $j$. The proposer has to do this for every column, in the critical path, or in other words it has to do one 128-point MSM per proof. This seems prohibitively expensive, since it has to compute $128 \times 256 = 32768$ proofs. Using the numbers from zka.lc, it would be ~34s already multi-threaded. A further ~3x speedup could come from replacing multiplication by $M_{RS}$ with 2 FFTs, but this still leaves us far from an acceptable cost. Note moreover that, even if we abandoned the idea of fitting a CellSidecar in a single packet, we can’t go any lower than 128 columns, which would only halve the cost.
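Putting the figures above together (using the ~1ms per multi-threaded 128-point MSM estimate from zka.lc):

$$
128 \ \text{proofs per column} \times 256 \ \text{columns} = 32768 \ \text{MSMs}, \qquad 32768 \times {\sim}1\,\text{ms} \approx 33\,\text{s},
$$

and even a ~3x FFT speedup still leaves roughly 11s of proposer-side work in the critical path.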
Removing such a high computational cost at the publishing step is precisely why proof computation for the original blobs has been moved out of the critical path by outsourcing it to tx senders. On the other hand, proof computation for the vertical extension can neither be outsourced nor precomputed as we get blobs from the mempool, because we need all blobs that will go in the block to do it.
RS + cell roots
Let’s consider a different approach, which avoids the high cost of computing the extension proofs by using a much simpler way to commit to the cells. The erasure coding is limited to the cells themselves, and accordingly gossip validation does not require KZG proof verification (not even for the original cells). Instead, it relies on checking against a list of `cell_roots` that is sent separately in the same topic, as part of a `ColumnSidecar`. In turn, this is verified against a list of column sidecar roots, included in the `HeaderSidecar` and committed to in the block.
```python
# Column topic

# Without a kzg proof
class CellSidecar(Container):
    cell: Cell
    row_index: RowIndex
    column_index: ColumnIndex

# New sidecar with commitments to all cells and carrying the proofs
class ColumnSidecar(Container):
    column_index: ColumnIndex
    cell_roots: List[Root, 2 * MAX_BLOB_COMMITMENTS_PER_BLOCK]
    kzg_proofs: List[KZGProof, MAX_BLOB_COMMITMENTS_PER_BLOCK]

# Global topics

# Adding a commitment to the column sidecars
class HeaderSidecar(Container):
    signed_block_header: SignedBeaconBlockHeader
    column_sidecar_roots: List[Root, NUMBER_OF_COLUMNS]
    column_sidecar_roots_inclusion_proof: Vector[Bytes32, COLUMN_SIDECAR_ROOTS_INCLUSION_PROOF_DEPTH]
    kzg_commitments: List[KZGCommitment, MAX_BLOB_COMMITMENTS_PER_BLOCK]
    kzg_commitments_inclusion_proof: Vector[Bytes32, KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH]

class BeaconBlockBody(Container):
    ...
    # need a better name
    column_sidecar_roots_root: Root
```

As before, we propagate a `HeaderSidecar` in a global topic. In addition to its previous content, it now contains a list of roots, `column_sidecar_roots`, plus an inclusion proof `column_sidecar_roots_inclusion_proof` against `header.body_root`. The `BeaconBlockBody` itself only contains a commitment to the `column_sidecar_roots`, i.e., `column_sidecar_roots_root = hash_tree_root(column_sidecar_roots)`.
In the subnet for a given column, we propagate:
- `column_sidecar`, containing `cell_roots`, committing to all cells for the given column, including those for the extension cells, and moreover containing all `kzg_proofs`. For gossip validation, we check `hash_tree_root(column_sidecar) == header_sidecar.column_sidecar_roots[column_sidecar.column_index]`. Note that the `kzg_proofs` are not verified as part of gossip validation.
- `cell_sidecar`s corresponding to this column. For gossip validation, we check that `column_sidecar.cell_roots[cell_sidecar.row_index] == hash_tree_root(cell_sidecar.cell)`. (Both gossip checks are sketched in the snippet below.)
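For concreteness, a spec-style sketch of the two checks, using the usual SSZ `hash_tree_root` helper and the container definitions above:

```python
# Sketch of the two gossip checks in a column subnet. Neither check touches
# KZG proofs; those are verified later, off the forwarding path.

def validate_column_sidecar(column_sidecar: ColumnSidecar, header_sidecar: HeaderSidecar) -> bool:
    # The whole ColumnSidecar must match the root committed to (via the
    # HeaderSidecar, and ultimately the block's column_sidecar_roots_root).
    expected = header_sidecar.column_sidecar_roots[column_sidecar.column_index]
    return hash_tree_root(column_sidecar) == expected

def validate_cell_sidecar(cell_sidecar: CellSidecar, column_sidecar: ColumnSidecar) -> bool:
    # Each cell, original or extension, must match its committed cell root.
    expected = column_sidecar.cell_roots[cell_sidecar.row_index]
    return hash_tree_root(cell_sidecar.cell) == expected
```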
Some notes:
- The publisher does not need to do any heavy computation, just simple erasure coding of the cells (not in the group) and hashing.
- Since we are verifying the cell sidecars against
cell_roots(itself in turn verified againstcolumn_sidecar_roots), we don’t need to verify akzg_proofas part of forwarding, so kzg verification does not slow down propagation. - We do need to verify all kzg proofs for a column before considering it available (and attesting). The kzg proofs are contained in the
ColumnSidecar, and can be verified as soon as the respective cells are also received. Since kzg proof verification is decoupled from forwarding, this can be done while maximally taking advantage of batching. - Just like in the RS + KZG scheme, the interaction with
getBlobsis ideal: when we retrieve a blob from the mempool, we get a fullCellSidecarfor each column, which we can immediately propagate - All benefits from the RS + KZG scheme apply here as well.
Proposer without the blob data
In the RS + KZG case, as well as in the Fusaka protocol, a proposer that somehow knows that the blobs associated with a list of blob txs are available, without having downloaded them, could just include those txs and the respective KZG commitments, and entirely rely on distributed blob publishing to propagate the actual blob data on the CL (whether cells or data column sidecars). This could for example be the case if a committee attests to the availability of mempool txs.
On the other hand, this protocol does not allow the proposer to do so, because having the cells (or, to be precise, the `cell_roots`) and the `kzg_proofs` is required in order to create the `column_sidecars`, which are themselves needed for `cell_sidecar` propagation to happen. Note that the `column_sidecars` also have to be committed to in the block, so the proposer really needs them at block creation time.
RLNC
Here we consider two ways to adapt this proposal to our goals for column propagation. In both cases, we use cell-level chunks, and the main difference is in how we verify chunks during propagation: one option is to reuse KZG proofs, the other to stick with Pedersen commitments as in the post.
RLNC + KZG
We can leverage the linearity property of KZG commitments and proofs to enable RLNC for column propagation, while maintaining the verifiability of the recombined chunks tied back to the original KZG row commitments:
- Chunks correspond to cells, due to our initial goal of cell-centric gossip. They travel with and are verified via kzg proofs, so that doing a random linear combination of chunks necessitates also doing the same linear combination of the associated proofs, to produce the proof for the recombined chunk.
- Verifying a chunk necessitates computing the corresponding KZG commitment, which is the linear combination of the original commitments, with the given scalars.
Protocol
- Base Chunks: The fundamental unit (chunk) of RLNC is a cell, and the original cells are the base chunks that we want to transmit. Let $v_{ij}$ denote the data payload (field elements) of the cell at row $i$ and column $j$. Associated with $v_{ij}$ is its KZG proof $\pi_{ij}$ (proving the evaluation against the row commitment $C_i$).
- Coded Chunks & Representation: Any chunk $v$ propagated via RLNC represents a linear combination of the base chunks for that column $j$, $v = \sum_i s_i v_{ij}$. Each chunk $v$ is transmitted along with its corresponding aggregated KZG proof $\pi = \sum_i s_i \pi_{ij}$ and the vector of scalars $s = (s_i)_i$ defining the combination. For the base chunk $v_{ij}$, the scalar vector $s$ is simply given by $s_k = 1$ if $k = i$ and $0$ otherwise.
- RLNC Encoding (Re-coding): A node possessing a set of input chunks $(v_k, \pi_k, s_k)_k$ for column $j$ (where each $s_k = (s_{ki})_i$, so $v_k = \sum_i s_{ki} v_{ij}$ and $\pi_k = \sum_i s_{ki} \pi_{ij}$) can create a new coded chunk. It selects a set of random scalars $(r_k)_k$ and computes:
  - Coded Data: $v = \sum_k r_k v_k$.
  - New Scalars: The new vector of scalars $s' = (s'_i)_i$ such that $v = \sum_i s'_i v_{ij}$. For this, $s'_i = \sum_k r_k s_{ki}$.
  - Coded Proof: $\pi = \sum_k r_k \pi_k$. (Note that this is equivalent to $\pi = \sum_i s'_i \pi_{ij}$.)
- RLNC Packet: The node gossips a packet containing $(v, \pi, s')$. To fit the existing sidecar structure, this can be represented as a `CellSidecar` where `cell` holds $v$, `kzg_proof` holds $\pi$, and the new `rlnc_scalars` field holds $s'$. The `column_index` is retained, but `row_index` is replaced by `rlnc_scalars`, since the packet represents a combination of rows.

  ```python
  class CellSidecar(Container):
      cell: Cell
      column_index: ColumnIndex
      kzg_proof: KZGProof
      rlnc_scalars: List[Bytes32, MAX_BLOB_COMMITMENTS_PER_BLOCK]
  ```

- Verification: A node receives $(v, \pi, s)$ for column $j$ (see the sketch after this list):
  - It has the original row commitments $C_i$ from the `HeaderSidecar` (same as in the RS + KZG approach).
  - It computes the combined commitment $C = \sum_i s_i C_i$.
  - It verifies the KZG proof $\pi$ against the coded data $v$ and the combined commitment $C$, using the evaluation points corresponding to $j$ (`column_index`).
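Here is a compact sketch of re-coding and verification. `g1_lincomb` (an MSM helper) and `verify_cell_proof` (single-cell KZG verification against an arbitrary commitment) are assumed to be provided by the KZG library; scalar sizes and encodings are ignored.

```python
# Sketch: RLNC re-coding and verification of a coded cell, per the steps above.

BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def recode(chunks, scalar_vectors, proofs, r, g1_lincomb):
    """Combine m received packets (v_k, s_k, pi_k) with fresh scalars r_k."""
    m = len(chunks)
    # Coded data: v = sum_k r_k v_k (element-wise over the cell's field elements)
    v = [sum(r[k] * chunks[k][j] for k in range(m)) % BLS_MODULUS
         for j in range(len(chunks[0]))]
    # New scalars: s'_i = sum_k r_k s_{ki}
    s_new = [sum(r[k] * scalar_vectors[k][i] for k in range(m)) % BLS_MODULUS
             for i in range(len(scalar_vectors[0]))]
    # Coded proof: pi = sum_k r_k pi_k (one m-term G1 MSM)
    pi = g1_lincomb(proofs, r)
    return v, s_new, pi

def verify(v, s, pi, row_commitments, column_index, g1_lincomb, verify_cell_proof):
    # Combined commitment C = sum_i s_i C_i (one MSM over the non-zero entries of s)
    combined_commitment = g1_lincomb(row_commitments, s)
    # Standard cell proof check of (v, pi) against C at this column's evaluation points
    return verify_cell_proof(combined_commitment, column_index, v, pi)
```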
Scalar Size and Overhead
A concern with RLNC is the overhead of transmitting the scalar vector $s = (s_i)_i$ (`rlnc_scalars`). Let $N$ be the number of base chunks (original rows/cells per column, e.g., 128).
- Full-Size Scalars: If the $s_i$ are full 32-byte scalars, the overhead is $32N$ bytes, i.e., $128 \times 32 = 4096$ bytes, prohibitively high.
- Small Scalars: When recombining chunks, we could always choose the new scalars $r_k$ from a smaller range (e.g., 1 byte, 0-255). This still gives us a high probability of sending a useful chunk (if at all possible), $\ge 1 - 1/256 \approx 99.6\%$. Initial overhead is low (e.g., $N \times 1 = 128$ bytes with 1-byte scalars).
- Scalar Blow-up: Re-coding computes $s'_i = \sum_k r_k s_{ki}$. If $s_{ki}$ requires $B_s$ bits and $r_k$ requires $B_r$ bits, $s'_i$ requires roughly $B_s + B_r + \log_2(m)$ bits, where $m$ is the number of terms combined. Bit length therefore grows roughly linearly with the number of re-coding hops $h$. If starting with 1-byte scalars ($B = 8$) and assuming propagation needs $h = 4$ hops, scalars might need roughly $(h+1)B \approx 5$ bytes (40 bits) at the last hop. For $N = 128$, total overhead at the last hop would be $128 \times 5 = 640$ bytes.
Fighting the scalar blow-up
We could also experiment with different strategies to reduce the blow-up, for example:
- Only the publisher recodes, in order to have initial redundancy but always keep the same 1-byte scalars. In other words, we would be using the random linear coding purely as a replacement for erasure coding, with the only benefit being that we trade off the FFT optimizations of RS encoding and its stronger guarantees (any 50% of the chunks is guaranteed to suffice for reconstruction) for smaller scalars. In particular, we get:
  - 32x lower computational load for the publisher. As we have seen for RS + KZG and as we discuss in the next section, the load on the publisher is a major problem for these schemes.
  - The small scalars used by the proposer can be agreed upon in advance. Again, we’re just replacing the RS code matrix with another matrix of 1-byte scalars. Then, the combined commitments can be computed just once, as soon as the `HeaderSidecar` with the original commitments is received. Moreover, since the code is known in advance, we don’t need to include the scalars in the messages, which avoids the 128 B of overhead per cell.
- Only recode when holding at least $M$ chunks, for some $M$ to be determined experimentally. The idea here would be to directly forward chunks in the earlier phases of propagation, when duplicates are less likely, and to only do recoding when it is most useful (somewhat related to push-pull phase transition). That way, the first few hops don’t blow up the scalars at all. Note also that part of the reason to do recoding is that it lets you avoid choosing which chunks to prioritize when you’re bandwidth constrained: if you are not bandwidth constrained, you could always just send all the chunks you have to every peer, but if you are bandwidth constrained it is best to send as many recoded chunks as you are able, to maximize the chances of the chunks you send being useful to the receiver. When you don’t yet have many chunks and you can forward all of them (and presumably the receiver downlink is similarly not saturated), recoding isn’t as useful.
Computational Cost
- Sender: To combine $m$ received packets $(v_k, \pi_k, s_k)$ using coefficients $r_k$, the bulk of the cost is computing $\pi = \sum_k r_k \pi_k$, which requires one $m$-term G1 MSM. At worst $m = 128$. As we’ve already seen in the RS + KZG case, this takes ~1ms on zka.lc, already multi-threaded. This is for full scalars. With 1-byte scalars, the cost becomes 32x smaller, and close to negligible.
- Receiver: To verify a received packet $(v, \pi, s)$ where $s$ has $m$ non-zero entries, the bulk of the cost is:
  - computing $C = \sum_i s_i C_i$, which again requires one $m$-term G1 MSM (~1ms)
  - verifying the KZG proof for $(v, \pi, C)$, requiring 2 pairings (~1.5ms)

Note that, even if we use 1-byte scalars, the $s_i$ here are not freshly generated scalars, and are thus subject to blow-up, whereas the linear combination of proofs done by the sender directly uses the newly generated small scalars, and thus always maintains the same speedup. However, even with some blow-up, the pairings would dominate the cost here.
- Proposer: To generate $R$ coded chunks for a column, the proposer incurs the worst-case cost of the sender of a packet $R$ times, i.e., $R$ 128-term G1 MSMs. It needs to do this for each column, so 256 times. To tolerate a loss of ~50% of the chunks, we need $R = 256$, so the cost is $256 \times (128\text{-term MSM})$ per column. Alternatively, the proposer could publish the original chunks (base cells $v_{ij}$ with their proofs $\pi_{ij}$) as well, and only generate $R = 128$ coded chunks (or fewer, depending on the desired redundancy).
Using the latter strategy, the cost for the proposer is the same as in RS + KZG, which makes sense since we’re essentially also erasure coding the block with a 1/2 rate, but with a random code. Unlike RS + KZG, we cannot even benefit from FFTs since the code is not structured that way. However, there’s a big advantage if we use 1-byte scalars instead of full (32-byte) scalars, as this should result in a roughly ~32x speedup in the MSMs, bringing the cost down to ~1s.
RLNC + Pedersen
Let’s now discuss the approach from the RLNC ethresearch post, which uses Pedersen commitments to guarantee integrity of the recoded chunks. A lot of what follows is just re-stating what’s in the post, for the specific case of propagating columns, and reasoning through the concrete overheads.
Protocol
- Base Chunks & Commitments: The base chunk is still a cell $v_{ij}$, seen as a vector of $M = 32$ field elements $(v_{ijl})_l$ (for 256 columns). The proposer computes a Pedersen vector commitment for each base cell using $M = 32$ fixed generator points $G_l$: $C_{ij} = \sum_{l=1}^{32} v_{ijl} G_l$. These original commitments $C_{0j}, \dots, C_{N-1,j}$ (where $N = 128$ is the number of original cells per column) can, much like the `cell_roots` of the RS + cell roots approach, be distributed in a `ColumnSidecar` which is itself committed to in the block.

  ```python
  class ColumnSidecar(Container):
      column_index: ColumnIndex
      pedersen_commitments: List[PedersenCommitment, MAX_BLOB_COMMITMENTS_PER_BLOCK]
      kzg_proofs: List[KZGProof, MAX_BLOB_COMMITMENTS_PER_BLOCK]
  ```

- Coded Chunks & Representation: Any chunk $v$ propagated represents a linear combination of base chunks for column $j$, $v = \sum_i s_i v_{ij}$.
- RLNC Encoding (Re-coding): A node takes input chunks $(v_k, s_k)_k$ (where $s_k = (s_{ki})_i$ defines $v_k = \sum_i s_{ki} v_{ij}$). It selects random scalars $(r_k)_k$ and computes:
  - Coded Data: $v = \sum_k r_k v_k$.
  - New Scalars: $s' = (s'_i)_i$ where $s'_i = \sum_k r_k s_{ki}$.
- RLNC Packet: The node gossips $(v, s')$, in this form:

  ```python
  class CellSidecar(Container):
      cell: Cell
      column_index: ColumnIndex
      rlnc_scalars: List[Bytes32, MAX_BLOB_COMMITMENTS_PER_BLOCK]
  ```

- Verification: A node receives $(v, s')$ (see the sketch after this list):
  - It obtains the original base Pedersen commitments $C_{ij}$ for column $j$ from the `ColumnSidecar`.
  - It computes the expected combined commitment based on the scalars: $C' = \sum_i s'_i C_{ij}$.
  - It directly computes the commitment to $v$: $C = \sum_{l=1}^{32} v_l G_l$.
  - It verifies that $C' = C$.
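A sketch of this check, again with `g1_lincomb` standing in for an assumed MSM helper and `generators` for the 32 fixed, nothing-up-my-sleeve generator points $G_l$:

```python
# Sketch: verify a coded chunk against the base Pedersen commitments.

def verify_rlnc_pedersen(cell, rlnc_scalars, base_commitments, generators, g1_lincomb) -> bool:
    """cell: the 32 field elements of the coded data v;
    rlnc_scalars: the scalar vector s' (length 128);
    base_commitments: the Pedersen commitments C_ij from the ColumnSidecar;
    generators: the 32 fixed generator points G_l;
    g1_lincomb(points, scalars) -> group element: assumed MSM helper."""
    # Expected commitment implied by the claimed combination: C' = sum_i s'_i C_ij
    expected = g1_lincomb(base_commitments, rlnc_scalars)
    # Direct commitment to the received data: C = sum_l v_l G_l
    actual = g1_lincomb(generators, cell)
    return expected == actual
```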
Computational Cost
- Proposer:
  - Has to compute one Pedersen vector commitment $C_{ij} = \sum_{l=1}^{32} v_{ijl} G_l$ for each original cell ($i = 0, \dots, N-1$). This requires one $M$-term (32-term) MSM in the group per original cell, and 128 such MSMs per column. For all 256 columns, that’s still 32768 MSMs, like in the RS + KZG and RLNC + KZG cases, but with each MSM having 32 terms instead of 128. Still prohibitively high.
- Sender (Intermediate Node re-coding): Only cheap data and scalar arithmetic ($v = \sum_k r_k v_k$, $s' = \sum_k r_k s_k$). No group arithmetic, negligible cost.
- Receiver (Verifying Packet): one 128-term MSM in the group ($\sum_i s'_i C_{ij}$), and one $M$-term (32-term) MSM in the group ($\sum_{l=1}^{32} v_l G_l$).
A couple of comments:
- Unlike in the RLNC + KZG case, using 1-byte scalars does not reduce the computational overhead for the publisher, since that overhead comes from computing the Pedersen commitments to the original cells, not from anything related to recombined chunks.
- Just like we have outsourced KZG proof computation to the tx senders, we could in principle outsource the computation of the Pedersen commitments to the original cells to the tx senders as well, since each such commitment only depends on a single blob.



