Gas/Security-Bit for PQ Signatures on EVM: Dataset + Methodology

Gas per Secure Bit: a normalized benchmark for PQ signatures on EVM

Happy holidays everyone.

Following up on the AA / ERC-4337 / PQ signatures discussion in this thread:

I ended up isolating one missing piece that keeps coming up implicitly:

We don’t have a normalized unit to compare different signature schemes at different security levels on EVM.

Most comparisons use “gas per verify”, but that silently mixes:

  • different security targets (e.g., ~128-bit ECDSA vs Cat3/Cat5 PQ schemes),

  • different verification surfaces (EOA vs ERC-1271 / AA),

  • and sometimes different benchmark scopes (pure verify vs full handleOps pipelines).

That makes it hard to answer basic engineering questions like:
“Is ML-DSA-65 viable on EVM relative to Falcon, under explicit assumptions?”


What I built

A small benchmark lab + dataset with explicit provenance and explicit security denominators:

Repo: https://github.com/pipavlo82/gas-per-secure-bit (gas per secure bit benchmarking for PQ signatures and VRF).

Core idea:

gas_per_secure_bit = gas_verify / security_bits

I intentionally report two denominators, because both viewpoints are useful:

Metric A — Baseline normalization (128-bit baseline)

This answers: “What is the cost per 128-bit baseline unit?”

gas_per_128b = gas_verify / 128

This is not claiming every scheme is 128-bit secure; it’s just a budgeting/normalization tool.

Metric B — Security-equivalent bits (declared convention)

This answers: “How costly is each ‘security bit’ under a declared normalization convention?”

gas_per_sec_equiv_bit = gas_verify / security_equiv_bits

For signatures I currently use the following explicit convention:

| Scheme | NIST category (where applicable) | security_equiv_bits |
|---|---|---|
| ECDSA (secp256k1) | n/a | 128 |
| ML-DSA-65 (FIPS-204, Cat 3) | 3 | 192 |
| Falcon-1024 (Cat 5) | 5 | 256 |

I use a simple mapping Cat{1,3,5} → {128,192,256} as a declared normalization convention (open to better community conventions).

Note: security_equiv_bits is a declared normalization convention for comparability. It is not a security proof and not a NIST-provided “bits” value.
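
To make the two denominators concrete, here is a minimal sketch of the arithmetic (written for this post, not taken from the repo's scripts; the scheme keys and the Cat→bits mapping just mirror the declared convention above):

```python
# Minimal sketch of Metric A and Metric B (not the repo's code).
# security_equiv_bits uses the declared Cat{1,3,5} -> {128,192,256} convention.
BASELINE_BITS = 128

SEC_EQUIV_BITS = {
    "ecdsa-secp256k1": 128,  # classical baseline (no NIST PQ category)
    "ml-dsa-65": 192,        # FIPS-204, Cat 3 -> 192 (declared convention)
    "falcon-1024": 256,      # Cat 5 -> 256 (declared convention)
}

def gas_per_128b(gas_verify: int) -> float:
    """Metric A: baseline normalization against a fixed 128-bit unit."""
    return gas_verify / BASELINE_BITS

def gas_per_sec_equiv_bit(gas_verify: int, scheme: str) -> float:
    """Metric B: gas per security-equivalent bit under the declared convention."""
    return gas_verify / SEC_EQUIV_BITS[scheme]

# Example with the ML-DSA-65 PreA snapshot from the tables below:
print(round(gas_per_128b(1_499_354)))                        # ~11714
print(round(gas_per_sec_equiv_bit(1_499_354, "ml-dsa-65")))  # ~7809
```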

Category sources:


Provenance & reproducibility

All numbers are currently single-run gas snapshots (no averaging) with full provenance:
repo, commit, bench_name, chain_profile, and a notes field.

No hidden averaging, no “best-of-N” selection — just reproducible snapshots others can rerun.
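
For illustration only, a snapshot row carrying those provenance fields could look roughly like this (field names mirror the list above; the actual column layout of data/results.csv may differ):

```python
# Illustrative snapshot row; the real data/results.csv columns may differ.
from dataclasses import dataclass, asdict
import csv

@dataclass
class GasSnapshot:
    scheme: str
    bench_name: str
    gas: int
    security_equiv_bits: int
    repo: str
    commit: str
    chain_profile: str
    notes: str

row = GasSnapshot(
    scheme="ml-dsa-65",
    bench_name="prea_hot_path",             # hypothetical bench name
    gas=1_499_354,
    security_equiv_bits=192,
    repo="https://github.com/pipavlo82/gas-per-secure-bit",
    commit="<commit-hash>",                 # placeholder
    chain_profile="<chain-profile>",        # placeholder
    notes="isolated A*z - c*t1 core, A_ntt bound via CommitA",
)

# Append to the results file (the repo's run scripts own this file in practice).
with open("data/results.csv", "a", newline="") as f:
    csv.DictWriter(f, fieldnames=list(asdict(row))).writerow(asdict(row))
```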


Results (current snapshots)

Chart (security-equivalent bits)

Raw SVG (recommended):

https://raw.githubusercontent.com/pipavlo82/gas-per-secure-bit/main/docs/gas-per-sec-equiv-bit-chart.svg

GitHub page:

https://github.com/pipavlo82/gas-per-secure-bit/blob/0b126bc2d2ee82f6f25c91b565106b243d4b077c/docs/gas-per-sec-equiv-bit-chart.svg

(These benches are not all the same surface; treat this as a normalized dataset view, not a single ranking.)

A few key rows (baseline normalization — divide by 128)

| Scheme / bench | Gas | gas_per_128b | Notes |
|---|---|---|---|
| ECDSA ecrecover | 21,126 | 165 | classical baseline; not PQ-secure (Shor) |
| Falcon getUserOpHash | 218,333 | 1,705 | small AA primitive |
| ML-DSA-65 PreA (isolated hot-path) | 1,499,354 | 11,714 | optimized compute core |
| Falcon full verify | 10,336,055 | 80,751 | PQ full verify |
| ML-DSA-65 verify POC | 68,901,612 | 538,294 | end-to-end POC |

Security-equivalent normalization (divide by security_equiv_bits)

| Scheme / bench | Gas | security_equiv_bits | gas_per_sec_equiv_bit |
|---|---|---|---|
| Falcon getUserOpHash | 218,333 | 256 | 853 |
| ML-DSA-65 PreA | 1,499,354 | 192 | 7,809 |
| Falcon full verify | 10,336,055 | 256 | 40,375 |
| ML-DSA-65 verify POC | 68,901,612 | 192 | 358,863 |

What stood out to me:

  • ML-DSA-65 PreA lands at ~7,809 gas / security-equivalent bit (Cat3-equivalent)

  • Falcon-1024 full verify lands at ~40,375 gas / security-equivalent bit (Cat5-equivalent)

That’s roughly a 5.2× difference for those specific benches (40,375 / 7,809 ≈ 5.2).

This is not “ML-DSA beats Falcon overall”; it’s a narrower claim:
some ML-DSA verification surfaces can be made much more EVM-friendly if you avoid recomputing heavy public structure on-chain.


What “PreA” means (why it changes the picture)

In standard ML-DSA verification, a large portion of the cost is effectively:
ExpandA + converting the public matrix into the NTT domain.

The “PreA” path isolates the hot arithmetic core (A·z − c·t₁ in the NTT domain) by accepting A_ntt precomputed, and binding it with CommitA to prevent matrix substitution.

In my harness, A_ntt is derived from the public key seed (rho) and then bound via CommitA, so the commitment is tied to the key itself rather than to an arbitrary caller-supplied matrix.

This is an explicit engineering design point (especially in AA contexts): move large public structure off-chain, but keep it cryptographically bound.
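
A minimal sketch of that binding pattern (names like commit_a / check_prea_binding are mine, not the repo's API; on-chain the comparison would be a keccak256 check inside the verifier contract):

```python
# Sketch of the "precompute off-chain, bind on-chain" pattern (illustrative names).
from Crypto.Hash import keccak  # pycryptodome


def keccak256(data: bytes) -> bytes:
    h = keccak.new(digest_bits=256)
    h.update(data)
    return h.digest()


def commit_a(a_ntt_encoded: bytes) -> bytes:
    """CommitA: commitment to the canonical encoding of the NTT-domain matrix A.
    Off-chain, A_ntt is expanded from the public key seed (rho) and encoded once;
    the resulting commitment is what gets associated with the key on-chain."""
    return keccak256(a_ntt_encoded)


def check_prea_binding(expected_commit: bytes, a_ntt_encoded: bytes) -> bool:
    """Verifier-side gate: reject any caller-supplied A_ntt that does not match
    the commitment bound to the key (prevents matrix substitution). Only after
    this passes does the ~1.5M-gas hot core (A*z - c*t1 in the NTT domain) run."""
    return commit_a(a_ntt_encoded) == expected_commit
```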

Rough breakdown (current harness):

  • Full compute_w with on-chain ExpandA+NTT(A): ~64.8M gas

  • Isolated matrix multiply core (PreA): ~1.5M gas

Implementation:


Why this matters for AA / ERC-7913

In AA, the unit you care about is rarely “verify one signature in isolation”.
You care about stable ABI surfaces and comparability across candidates.

ERC-7913 provides a generic verification interface.

My working assumption: if we want PQ adoption to be engineered (not guessed), we need:

  • a shared benchmark schema,

  • explicit security denominators,

  • and comparable surfaces (pure verify vs AA pipeline).


Open questions / feedback welcome

1) Hash/XOF wiring on EVM
For EVM implementations: do we want (a) strict FIPS SHAKE wiring, (b) Keccak-based non-conformant variants, or (c) dual-mode implementations with explicit labeling in the dataset?

2) Is the dual-metric approach reasonable?
Baseline normalization is useful for budgeting; security-equivalent bits are useful for honest efficiency per security unit. Any objections to reporting both?

3) PreA standardization options
What’s the least-bad approach in an AA context? (A rough calldata back-of-envelope follows the option list below.)

  • calldata (large, but stateless),

  • storage per key,

  • precompile,

  • hybrid with CommitA binding?
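
For the calldata option, a rough back-of-envelope on the calldata cost alone, under my own assumptions (ML-DSA-65 matrix dimensions from FIPS-204, tightly packed ~23-bit coefficients, EIP-2028-style pricing of 16 gas per non-zero calldata byte; none of this is measured from the repo):

```python
# Back-of-envelope only: size and calldata gas for shipping A_ntt with each call.
# Assumptions: ML-DSA-65 has a k x l = 6 x 5 matrix of degree-256 polynomials,
# coefficients mod q = 8380417 (23 bits); 16 gas per non-zero calldata byte.
K, L, N, COEFF_BITS = 6, 5, 256, 23
NONZERO_BYTE_GAS = 16

packed_bytes = K * L * N * COEFF_BITS // 8       # ~22,080 bytes tightly packed
calldata_gas = packed_bytes * NONZERO_BYTE_GAS   # worst case: all bytes non-zero

print(packed_bytes, calldata_gas)                # 22080 bytes, ~353,280 gas
```

Under those assumptions, calldata alone adds a few hundred thousand gas on top of the ~1.5M PreA core (still far below the ~64.8M full path, but not free, and the exact cost depends on the fork's calldata pricing and on how many bytes are zero); that overhead is what the storage, precompile, and hybrid options are trying to move around.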


Reproducibility quick start

```bash
git clone https://github.com/pipavlo82/gas-per-secure-bit
cd gas-per-secure-bit

RESET_DATA=0 MLDSA_REF="feature/mldsa-ntt-opt-phase12-erc7913-packedA" \
  ./scripts/run_vendor_mldsa.sh

RESET_DATA=0 ./scripts/run_ecdsa.sh
QA_REF=main RESET_DATA=0 ./scripts/run_vendor_quantumaccount.sh

tail -n 20 data/results.csv
```

Thanks for reading — I’m very open to corrections on conventions, better threat-model framing, and suggestions on which schemes/surfaces to add next.

