Gas/Security-Bit for PQ Signatures on EVM: Dataset + Methodology

Gas per Secure Bit: a normalized benchmark for PQ signatures on EVM

Happy holidays everyone.

Following up on the AA / ERC-4337 / PQ signatures discussion in this thread:

I ended up isolating one missing piece that keeps coming up implicitly:

We don’t have a normalized unit to compare different signature schemes at different security levels on EVM.

Most comparisons use “gas per verify”, but that silently mixes:

  • different security targets (e.g., ~128-bit ECDSA vs Cat3/Cat5 PQ schemes),

  • different verification surfaces (EOA vs ERC-1271 / AA),

  • and sometimes different benchmark scopes (pure verify vs full handleOps pipelines).

That makes it hard to answer basic engineering questions like:
“Is ML-DSA-65 viable on EVM relative to Falcon, under explicit assumptions?”


What I built

A small benchmark lab + dataset with explicit provenance and explicit security denominators:

Repo: https://github.com/pipavlo82/gas-per-secure-bit (gas per secure bit benchmarking for PQ signatures and VRF).

Core idea:

gas_per_secure_bit = gas_verify / security_bits

I intentionally report two denominators, because both viewpoints are useful:

Metric A — Baseline normalization (128-bit baseline)

This answers: “What is the cost per 128-bit baseline unit?”

gas_per_128b = gas_verify / 128

This is not claiming every scheme is 128-bit secure; it’s just a budgeting/normalization tool.

Metric B — Security-equivalent bits (declared convention)

This answers: “How costly is each ‘security bit’ under a declared normalization convention?”

gas_per_sec_equiv_bit = gas_verify / security_equiv_bits

For signatures I currently use the following explicit convention:

| Scheme | NIST category (where applicable) | security_equiv_bits |
|---|---|---|
| ECDSA (secp256k1) | n/a | 128 |
| ML-DSA-65 (FIPS-204, Cat 3) | 3 | 192 |
| Falcon-1024 (Cat 5) | 5 | 256 |

I use a simple mapping Cat{1,3,5} → {128,192,256} as a declared normalization convention (open to better community conventions).

Note: security_equiv_bits is a declared normalization convention for comparability. It is not a security proof and not a NIST-provided “bits” value.
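
To make the two denominators concrete, here is a minimal sketch of the arithmetic (written for this post, not taken from the repo's scripts; the scheme keys and the Cat→bits mapping just mirror the declared convention above):

```python
# Minimal sketch of Metric A and Metric B (not the repo's code).
# security_equiv_bits uses the declared Cat{1,3,5} -> {128,192,256} convention.
BASELINE_BITS = 128

SEC_EQUIV_BITS = {
    "ecdsa-secp256k1": 128,  # classical baseline (no NIST PQ category)
    "ml-dsa-65": 192,        # FIPS-204, Cat 3 -> 192 (declared convention)
    "falcon-1024": 256,      # Cat 5 -> 256 (declared convention)
}

def gas_per_128b(gas_verify: int) -> float:
    """Metric A: baseline normalization against a fixed 128-bit unit."""
    return gas_verify / BASELINE_BITS

def gas_per_sec_equiv_bit(gas_verify: int, scheme: str) -> float:
    """Metric B: gas per security-equivalent bit under the declared convention."""
    return gas_verify / SEC_EQUIV_BITS[scheme]

# Example with the ML-DSA-65 PreA snapshot from the tables below:
print(round(gas_per_128b(1_499_354)))                        # ~11714
print(round(gas_per_sec_equiv_bit(1_499_354, "ml-dsa-65")))  # ~7809
```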

Category sources:


Provenance & reproducibility

All numbers are currently single-run gas snapshots (no averaging) with full provenance:
repo, commit, bench_name, chain_profile, and a notes field.

No hidden averaging, no “best-of-N” selection — just reproducible snapshots others can rerun.
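
For illustration only, a snapshot row carrying those provenance fields could look roughly like this (field names mirror the list above; the actual column layout of data/results.csv may differ):

```python
# Illustrative snapshot row; the real data/results.csv columns may differ.
from dataclasses import dataclass, asdict
import csv

@dataclass
class GasSnapshot:
    scheme: str
    bench_name: str
    gas: int
    security_equiv_bits: int
    repo: str
    commit: str
    chain_profile: str
    notes: str

row = GasSnapshot(
    scheme="ml-dsa-65",
    bench_name="prea_hot_path",             # hypothetical bench name
    gas=1_499_354,
    security_equiv_bits=192,
    repo="https://github.com/pipavlo82/gas-per-secure-bit",
    commit="<commit-hash>",                 # placeholder
    chain_profile="<chain-profile>",        # placeholder
    notes="isolated A*z - c*t1 core, A_ntt bound via CommitA",
)

# Append to the results file (the repo's run scripts own this file in practice).
with open("data/results.csv", "a", newline="") as f:
    csv.DictWriter(f, fieldnames=list(asdict(row))).writerow(asdict(row))
```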


Results (current snapshots)

Chart (security-equivalent bits)

Raw SVG (recommended):

https://raw.githubusercontent.com/pipavlo82/gas-per-secure-bit/main/docs/gas-per-sec-equiv-bit-chart.svg

GitHub page:

https://github.com/pipavlo82/gas-per-secure-bit/blob/0b126bc2d2ee82f6f25c91b565106b243d4b077c/docs/gas-per-sec-equiv-bit-chart.svg

(These benches are not all the same surface; treat this as a normalized dataset view, not a single ranking.)

A few key rows (baseline normalization — divide by 128)

| Scheme / bench | Gas | gas_per_128b | Notes |
|---|---|---|---|
| ECDSA ecrecover | 21,126 | 165 | classical baseline; not PQ-secure (Shor) |
| Falcon getUserOpHash | 218,333 | 1,705 | small AA primitive |
| ML-DSA-65 PreA (isolated hot-path) | 1,499,354 | 11,714 | optimized compute core |
| Falcon full verify | 10,336,055 | 80,751 | PQ full verify |
| ML-DSA-65 verify POC | 68,901,612 | 538,294 | end-to-end POC |

Security-equivalent normalization (divide by security_equiv_bits)

| Scheme / bench | Gas | security_equiv_bits | gas_per_sec_equiv_bit |
|---|---|---|---|
| Falcon getUserOpHash | 218,333 | 256 | 853 |
| ML-DSA-65 PreA | 1,499,354 | 192 | 7,809 |
| Falcon full verify | 10,336,055 | 256 | 40,375 |
| ML-DSA-65 verify POC | 68,901,612 | 192 | 358,863 |

What stood out to me:

  • ML-DSA-65 PreA lands at ~7,809 gas / security-equivalent bit (Cat3-equivalent)

  • Falcon-1024 full verify lands at ~40,375 gas / security-equivalent bit (Cat5-equivalent)

That’s roughly a 5.2× difference for those specific benches (40,375 / 7,809 ≈ 5.2).

This is not “ML-DSA beats Falcon overall”; it’s a narrower claim:
some ML-DSA verification surfaces can be made much more EVM-friendly if you avoid recomputing heavy public structure on-chain.


What “PreA” means (why it changes the picture)

In standard ML-DSA verification, a large portion of the cost is effectively:
ExpandA + converting the public matrix into the NTT domain.

The “PreA” path isolates the hot arithmetic core (A·z − c·t₁ in the NTT domain) by accepting A_ntt precomputed, and binding it with CommitA to prevent matrix substitution.

In my harness, A_ntt is derived from the public key seed (rho) and then bound via CommitA, so the commitment is tied to the key itself rather than to an arbitrary caller-supplied matrix.

This is an explicit engineering design point (especially in AA contexts): move large public structure off-chain, but keep it cryptographically bound.
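
A minimal sketch of that binding pattern (names like commit_a / check_prea_binding are mine, not the repo's API; on-chain the comparison would be a keccak256 check inside the verifier contract):

```python
# Sketch of the "precompute off-chain, bind on-chain" pattern (illustrative names).
from Crypto.Hash import keccak  # pycryptodome


def keccak256(data: bytes) -> bytes:
    h = keccak.new(digest_bits=256)
    h.update(data)
    return h.digest()


def commit_a(a_ntt_encoded: bytes) -> bytes:
    """CommitA: commitment to the canonical encoding of the NTT-domain matrix A.
    Off-chain, A_ntt is expanded from the public key seed (rho) and encoded once;
    the resulting commitment is what gets associated with the key on-chain."""
    return keccak256(a_ntt_encoded)


def check_prea_binding(expected_commit: bytes, a_ntt_encoded: bytes) -> bool:
    """Verifier-side gate: reject any caller-supplied A_ntt that does not match
    the commitment bound to the key (prevents matrix substitution). Only after
    this passes does the ~1.5M-gas hot core (A*z - c*t1 in the NTT domain) run."""
    return commit_a(a_ntt_encoded) == expected_commit
```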

Rough breakdown (current harness):

  • Full compute_w with on-chain ExpandA+NTT(A): ~64.8M gas

  • Isolated matrix multiply core (PreA): ~1.5M gas

Implementation:


Why this matters for AA / ERC-7913

In AA, the unit you care about is rarely “verify one signature in isolation”.
You care about stable ABI surfaces and comparability across candidates.

ERC-7913 provides a generic verification interface.

My working assumption: if we want PQ adoption to be engineered (not guessed), we need:

  • a shared benchmark schema,

  • explicit security denominators,

  • and comparable surfaces (pure verify vs AA pipeline).


Open questions / feedback welcome

1) Hash/XOF wiring on EVM
For EVM implementations: do we want (a) strict FIPS SHAKE wiring, (b) Keccak-based non-conformant variants, or (c) dual-mode implementations with explicit labeling in the dataset?

2) Is the dual-metric approach reasonable?
Baseline normalization is useful for budgeting; security-equivalent bits are useful for honest efficiency per security unit. Any objections to reporting both?

3) PreA standardization options
What’s the least-bad approach in an AA context? (A rough calldata back-of-envelope follows the option list below.)

  • calldata (large, but stateless),

  • storage per key,

  • precompile,

  • hybrid with CommitA binding?
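
For the calldata option, a rough back-of-envelope on the calldata cost alone, under my own assumptions (ML-DSA-65 matrix dimensions from FIPS-204, tightly packed ~23-bit coefficients, EIP-2028-style pricing of 16 gas per non-zero calldata byte; none of this is measured from the repo):

```python
# Back-of-envelope only: size and calldata gas for shipping A_ntt with each call.
# Assumptions: ML-DSA-65 has a k x l = 6 x 5 matrix of degree-256 polynomials,
# coefficients mod q = 8380417 (23 bits); 16 gas per non-zero calldata byte.
K, L, N, COEFF_BITS = 6, 5, 256, 23
NONZERO_BYTE_GAS = 16

packed_bytes = K * L * N * COEFF_BITS // 8       # ~22,080 bytes tightly packed
calldata_gas = packed_bytes * NONZERO_BYTE_GAS   # worst case: all bytes non-zero

print(packed_bytes, calldata_gas)                # 22080 bytes, ~353,280 gas
```

Under those assumptions, calldata alone adds a few hundred thousand gas on top of the ~1.5M PreA core (still far below the ~64.8M full path, but not free, and the exact cost depends on the fork's calldata pricing and on how many bytes are zero); that overhead is what the storage, precompile, and hybrid options are trying to move around.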


Reproducibility quick start

```bash
git clone https://github.com/pipavlo82/gas-per-secure-bit
cd gas-per-secure-bit

RESET_DATA=0 MLDSA_REF="feature/mldsa-ntt-opt-phase12-erc7913-packedA" \
  ./scripts/run_vendor_mldsa.sh

RESET_DATA=0 ./scripts/run_ecdsa.sh
QA_REF=main RESET_DATA=0 ./scripts/run_vendor_quantumaccount.sh

tail -n 20 data/results.csv
```

Thanks for reading — I’m very open to corrections on conventions, better threat-model framing, and suggestions on which schemes/surfaces to add next.

