fix(soundness): bound untrusted num_instances in native + recursion verifiers#1360
Draft
spherel wants to merge 3 commits into
Draft
fix(soundness): bound untrusted num_instances in native + recursion verifiers#1360spherel wants to merge 3 commits into
spherel wants to merge 3 commits into
Conversation
…, #1240) `proof.num_instances` is read from the untrusted proof and flows into `next_pow2_instance_padding`, the `1 << log2_num_instances` usize shifts, and the `from_canonical_usize` transcript casts. With no upper bound, a crafted proof could: * make `proof.num_instances.iter().sum()` wrap (or panic in debug); * drive `log2_num_instances` past the usize bit-width so `1 << log2_num_instances` panics / wraps (a verifier DoS on attacker input, per the repo's panic-on- untrusted-input concern); and * exceed the base-field modulus so `from_canonical_usize` wraps, decoupling the transcript-bound value from the real count (the #1240 field-order concern). Fix: reject any per-shard `num_instances` entry (and their sum) larger than the PCS setup bound `2^MAX_NUM_VARIABLES` up front. A valid proof's instance count is always within this bound (the witness is committed at `<= 2^MAX_NUM_VARIABLES`), so this only rejects malformed proofs earlier and does not change the statement verified for any valid proof — keeping native/recursion lockstep intact. The guard is placed both where the count is first summed and (authoritatively) in `verify_chip_proof_pre_main`, right before the shift/cast it protects. `2^MAX_NUM_VARIABLES = 2^24` is far below the BabyBear modulus (~2^31), so the field-order wrap (#1240) is also closed. Recursion mirror: the recursion verifier reads `sum_num_instances`, `sum_num_instances_minus_one_bit_decomposition` and `log2_num_instances` as unconstrained hints (ceno_recursion/src/zkvm_verifier/binding.rs:585-587 — they are never bound to each other or to a field-order limit). Mirroring this guard and verifying the bit-decomposition/log2 derivation there is the matching follow-up but requires recursion-DSL changes that cannot be validated without an end-to-end recursion run, so it is intentionally left out of this commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Companion to the native-verifier bound. The native verifier rejects any num_instances entry (and their sum) exceeding 2^MAX_NUM_VARIABLES and derives log2_num_instances from those bounded counts. The recursion verifier reads sum_num_instances and log2_num_instances as untrusted hints without bounding them, so a crafted proof could drive pow_2(log2_num_instances) past the field size and wrap the next_pow2 - sum offset, verifying a different statement than the native verifier. verify_chip_proof_pre_main now bounds log2_num_instances <= MAX_NUM_VARIABLES and sum_num_instances / each num_instances[i] <= 2^MAX_NUM_VARIABLES before they are used. Valid proofs are always within this bound, so only malformed proofs are rejected; native and recursion stay in lockstep. Validated: cargo check / clippy clean; aggregation e2e over multi-shard keccak_syscall verifies (leaf proofs run the asserts for every chip, valid proofs are not rejected). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
a3bf785 to
70d3710
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
num_instances(per-shard, per-circuit instance counts) is read straight fromthe untrusted proof and then flows into size/shift/field arithmetic in both
verifiers. Neither side bounded it, so a crafted proof could push these
quantities out of range — a verifier-side DoS and a field-order decoupling, i.e.
a soundness concern. This PR adds the matching upper bound on both the native
Rust verifier and the recursion verifier so they stay in lockstep.
Problem
In the native verifier,
proof.num_instancesfeeds:proof.num_instances.iter().sum()— can wrap (or panic in debug);next_pow2_instance_paddingand the1 << log2_num_instancesusizeshifts —can overflow / panic on attacker input (DoS);
from_canonical_usizetranscript casts — can exceed the base-field modulusand wrap, decoupling the transcript-bound value from the real count.
The recursion verifier (
verify_chip_proof_pre_main) is worse: it does not evenderive
log2_num_instances/sum_num_instances— it reads them as untrustedhints. An out-of-range
log2_num_instancesdrivespow_2(log2_num_instances)past the BabyBear field size (~2³¹), wrapping the
next_pow2/next_pow2 - sumoffset used by the eq selectors, which would let the recursive proof verify a
different statement than the native verifier.
Fix
A valid proof's instance count is always bounded by the PCS setup size
(
2^MAX_NUM_VARIABLES, with the witness committed at≤ 2^MAX_NUM_VARIABLES), sorejecting anything larger only drops malformed proofs earlier and does not change
the statement verified for any valid proof.
ceno_zkvm/src/scheme/verifier.rs): reject any per-shardnum_instancesentry (and their sum)> 2^MAX_NUM_VARIABLES, both where thecount is first summed and (authoritatively) in
verify_chip_proof_pre_main,right before the shift/cast it protects.
ceno_recursion/src/zkvm_verifier/verifier.rs): inverify_chip_proof_pre_main, before the hints feedpow_2(log2_num_instances)and the offset, assert
log2_num_instances ≤ MAX_NUM_VARIABLESandsum_num_instances/ eachnum_instances[i] ≤ 2^MAX_NUM_VARIABLES. The boundlives in the verifier function (not the hint
read(), which is not part of theverified circuit).
2^MAX_NUM_VARIABLES = 2^24is far below the BabyBear modulus, so the field-orderwrap is closed on both sides.
Test plan
cargo check/cargo clippyclean for bothceno_zkvmandceno_recursiondebug MOCK + release prove/verify, single + multi-shard) — 9/9 pass; every
release run exercises the new bound in
verify_chip_proof_pre_maine2e_aggregateover multi-shardkeccak_syscallverifies —both leaf proofs run
verify_chip_proof_pre_main(and thus the new asserts)for every chip across 2 shards, then internal aggregation completes (final
height 1), confirming valid proofs are not rejected
Follow-up (out of scope)
This closes the bound (DoS / field-wrap). Fully binding the recursion's
log2_num_instances/ bit-decomposition hints to be exactly consistent withnum_instances(rather than only bounding them) — which the native side gets forfree by deriving
log2— remains a separate hardening item.Closes #1178, #1240.
🤖 Generated with Claude Code