Skip to content

fix(soundness): bound untrusted num_instances in native + recursion verifiers#1360

Draft
spherel wants to merge 3 commits into
masterfrom
fix/issue-1178-1240-num-instance-bounds
Draft

fix(soundness): bound untrusted num_instances in native + recursion verifiers#1360
spherel wants to merge 3 commits into
masterfrom
fix/issue-1178-1240-num-instance-bounds

Conversation

@spherel

@spherel spherel commented Jun 16, 2026

Copy link
Copy Markdown
Member

Summary

num_instances (per-shard, per-circuit instance counts) is read straight from
the untrusted proof and then flows into size/shift/field arithmetic in both
verifiers. Neither side bounded it, so a crafted proof could push these
quantities out of range — a verifier-side DoS and a field-order decoupling, i.e.
a soundness concern. This PR adds the matching upper bound on both the native
Rust verifier and the recursion verifier so they stay in lockstep.

Problem

In the native verifier, proof.num_instances feeds:

  • proof.num_instances.iter().sum() — can wrap (or panic in debug);
  • next_pow2_instance_padding and the 1 << log2_num_instances usize shifts —
    can overflow / panic on attacker input (DoS);
  • the from_canonical_usize transcript casts — can exceed the base-field modulus
    and wrap, decoupling the transcript-bound value from the real count.

The recursion verifier (verify_chip_proof_pre_main) is worse: it does not even
derive log2_num_instances/sum_num_instances — it reads them as untrusted
hints
. An out-of-range log2_num_instances drives pow_2(log2_num_instances)
past the BabyBear field size (~2³¹), wrapping the next_pow2/next_pow2 - sum
offset used by the eq selectors, which would let the recursive proof verify a
different statement than the native verifier.

Fix

A valid proof's instance count is always bounded by the PCS setup size
(2^MAX_NUM_VARIABLES, with the witness committed at ≤ 2^MAX_NUM_VARIABLES), so
rejecting anything larger only drops malformed proofs earlier and does not change
the statement verified for any valid proof.

  • Native (ceno_zkvm/src/scheme/verifier.rs): reject any per-shard
    num_instances entry (and their sum) > 2^MAX_NUM_VARIABLES, both where the
    count is first summed and (authoritatively) in verify_chip_proof_pre_main,
    right before the shift/cast it protects.
  • Recursion (ceno_recursion/src/zkvm_verifier/verifier.rs): in
    verify_chip_proof_pre_main, before the hints feed pow_2(log2_num_instances)
    and the offset, assert log2_num_instances ≤ MAX_NUM_VARIABLES and
    sum_num_instances / each num_instances[i] ≤ 2^MAX_NUM_VARIABLES. The bound
    lives in the verifier function (not the hint read(), which is not part of the
    verified circuit).

2^MAX_NUM_VARIABLES = 2^24 is far below the BabyBear modulus, so the field-order
wrap is closed on both sides.

Test plan

  • cargo check / cargo clippy clean for both ceno_zkvm and ceno_recursion
  • Native: integration e2e subset (fibonacci + ceno_rt_alloc + keccak_syscall,
    debug MOCK + release prove/verify, single + multi-shard) — 9/9 pass; every
    release run exercises the new bound in verify_chip_proof_pre_main
  • Recursion: e2e_aggregate over multi-shard keccak_syscall verifies —
    both leaf proofs run verify_chip_proof_pre_main (and thus the new asserts)
    for every chip across 2 shards, then internal aggregation completes (final
    height 1), confirming valid proofs are not rejected

Follow-up (out of scope)

This closes the bound (DoS / field-wrap). Fully binding the recursion's
log2_num_instances / bit-decomposition hints to be exactly consistent with
num_instances (rather than only bounding them) — which the native side gets for
free by deriving log2 — remains a separate hardening item.

Closes #1178, #1240.

🤖 Generated with Claude Code

spherel and others added 2 commits June 13, 2026 00:23
…, #1240)

`proof.num_instances` is read from the untrusted proof and flows into
`next_pow2_instance_padding`, the `1 << log2_num_instances` usize shifts, and the
`from_canonical_usize` transcript casts. With no upper bound, a crafted proof
could:
  * make `proof.num_instances.iter().sum()` wrap (or panic in debug);
  * drive `log2_num_instances` past the usize bit-width so `1 << log2_num_instances`
    panics / wraps (a verifier DoS on attacker input, per the repo's panic-on-
    untrusted-input concern); and
  * exceed the base-field modulus so `from_canonical_usize` wraps, decoupling the
    transcript-bound value from the real count (the #1240 field-order concern).

Fix: reject any per-shard `num_instances` entry (and their sum) larger than the
PCS setup bound `2^MAX_NUM_VARIABLES` up front. A valid proof's instance count is
always within this bound (the witness is committed at `<= 2^MAX_NUM_VARIABLES`),
so this only rejects malformed proofs earlier and does not change the statement
verified for any valid proof — keeping native/recursion lockstep intact. The
guard is placed both where the count is first summed and (authoritatively) in
`verify_chip_proof_pre_main`, right before the shift/cast it protects.
`2^MAX_NUM_VARIABLES = 2^24` is far below the BabyBear modulus (~2^31), so the
field-order wrap (#1240) is also closed.

Recursion mirror: the recursion verifier reads `sum_num_instances`,
`sum_num_instances_minus_one_bit_decomposition` and `log2_num_instances` as
unconstrained hints (ceno_recursion/src/zkvm_verifier/binding.rs:585-587 — they
are never bound to each other or to a field-order limit). Mirroring this guard
and verifying the bit-decomposition/log2 derivation there is the matching
follow-up but requires recursion-DSL changes that cannot be validated without an
end-to-end recursion run, so it is intentionally left out of this commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Companion to the native-verifier bound. The native verifier rejects any
num_instances entry (and their sum) exceeding 2^MAX_NUM_VARIABLES and derives
log2_num_instances from those bounded counts. The recursion verifier reads
sum_num_instances and log2_num_instances as untrusted hints without bounding
them, so a crafted proof could drive pow_2(log2_num_instances) past the field
size and wrap the next_pow2 - sum offset, verifying a different statement than
the native verifier.

verify_chip_proof_pre_main now bounds log2_num_instances <= MAX_NUM_VARIABLES and
sum_num_instances / each num_instances[i] <= 2^MAX_NUM_VARIABLES before they are
used. Valid proofs are always within this bound, so only malformed proofs are
rejected; native and recursion stay in lockstep.

Validated: cargo check / clippy clean; aggregation e2e over multi-shard
keccak_syscall verifies (leaf proofs run the asserts for every chip, valid
proofs are not rejected).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@spherel spherel force-pushed the fix/issue-1178-1240-num-instance-bounds branch from a3bf785 to 70d3710 Compare June 16, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

structural witness max length check under field arithmetics

2 participants