feat(cuda): R4 DEEP composition + FRI commit phase on GPU #648

Open
ColoCarletti wants to merge 57 commits into main from feat/cuda-pr4

@ColoCarletti
Collaborator

Summary

Extends the GPU-resident proving pipeline through Round 4. R4 DEEP composition and the full FRI commit phase (fold + per-layer Keccak leaves + pair-hash Merkle
tree) now run device-side; only the per-layer roots are copied device-to-host (D2H) for the transcript. The R2 composition-parts LDE moves to a _keep variant so
its de-interleaved device buffer is retained on Round2 and reused by R4 DEEP without a second host-to-device (H2D) upload. Also lands a Blelloch chunk-scan
parallel batch-inverse kernel as infrastructure for future GPU-side denominator inversion (not yet wired).
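
For reviewers unfamiliar with batch inversion, here is a minimal sequential CPU sketch of the trick the new kernel parallelizes. It is not the kernel in this PR: it assumes base-field Goldilocks elements stored as reduced u64 values, and the names (P, mul, pow, inv, batch_inverse) are hypothetical. The prefix-product loop is the part a Blelloch-style scan would compute in parallel across chunks.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Modular multiplication via a 128-bit intermediate (simple, not optimized).
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Single inversion by Fermat's little theorem: a^(p-2) = a^(-1) mod p.
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Inverts every element in place using one field inversion in total.
/// Assumption: all inputs are nonzero and already reduced mod P.
fn batch_inverse(xs: &mut [u64]) {
    if xs.is_empty() {
        return;
    }
    // prefix[i] = xs[0] * xs[1] * ... * xs[i]  (an inclusive scan under mul)
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs.iter() {
        acc = mul(acc, x);
        prefix.push(acc);
    }
    // inv_acc starts as the inverse of the full product, then peels one
    // factor per step while walking backwards.
    let mut inv_acc = inv(acc);
    for i in (1..xs.len()).rev() {
        let x = xs[i];
        xs[i] = mul(inv_acc, prefix[i - 1]);
        inv_acc = mul(inv_acc, x);
    }
    xs[0] = inv_acc;
}
```

A parity test of the kind listed under Tests would compare the output of this reference against the GPU result element by element.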

Changes

  • crypto/math-cuda/kernels/{inverse,deep,fri}.cu — new kernels.
  • crypto/math-cuda/src/{inverse,deep,fri}.rs — host orchestrators including FriCommitState (ping-pong eval buffers, in-place inv_twiddles squaring, per-layer
    fused fold + leaves + tree).
  • crypto/stark/src/gpu_lde.rs — new dispatches: try_evaluate_parts_on_lde_gpu_keep, try_deep_composition_gpu, try_fri_commit_gpu. New counters:
    gpu_deep_calls, gpu_fri_calls.
  • crypto/stark/src/prover.rs: Round2.gpu_composition_parts holds the R2 keep handle; R4 DEEP fast path inside compute_deep_composition_poly_evaluations
    consumes R1 main/aux + R2 parts handles when available.
  • crypto/stark/src/fri/mod.rs: commit_phase_from_evaluations routes through try_fri_commit_gpu when cuda is enabled.
  • Tests: parity for batch invert (n in {2..2^20}), DEEP, FRI per-layer tree (log_num_leaves in {1..18}); cuda_path_integration asserts the two new counters fire
    end-to-end.
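
To illustrate what one FRI fold step and the in-place inv_twiddles squaring mentioned above compute, here is a small CPU sketch. It rests on assumptions that may not match the PR: evaluations are in natural order over the powers of a root of unity, elements are base-field Goldilocks values (the PR also handles ext3), and every name here (fold_into, square_twiddles_in_place, and the helpers) is hypothetical. Writing into a separate dst slice mirrors the ping-pong buffer idea; the Keccak leaves and Merkle tree are omitted.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Assumes both inputs are reduced mod P.
fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// One fold step. `src` holds f(w^i) for i in 0..n (natural order), so
/// src[i + n/2] = f(-w^i). `inv_x[i]` must equal w^(-i) for i in 0..n/2.
/// Writes f_even(y) + beta * f_odd(y) at y = w^(2i) into dst[0..n/2].
fn fold_into(src: &[u64], dst: &mut [u64], inv_x: &[u64], beta: u64) {
    let half = src.len() / 2;
    let inv_two = inv(2);
    for i in 0..half {
        let (a, b) = (src[i], src[i + half]);
        let even = mul(add(a, b), inv_two); // (f(x) + f(-x)) / 2
        let odd = mul(mul(sub(a, b), inv_two), inv_x[i]); // (f(x) - f(-x)) / (2x)
        dst[i] = add(even, mul(beta, odd));
    }
}

/// Prepares the inverse twiddles for the next (half-size) layer:
/// (w^2)^(-i) = (w^(-i))^2, so each kept entry is squared in place.
fn square_twiddles_in_place(inv_x: &mut Vec<u64>) {
    let next = inv_x.len() / 2;
    for i in 0..next {
        inv_x[i] = mul(inv_x[i], inv_x[i]);
    }
    inv_x.truncate(next);
}
```

In a multi-layer loop under these assumptions, the caller would swap the roles of src and dst after each layer and call square_twiddles_in_place before the next fold.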

Fallback

Every dispatch is gated by TypeId checks (Goldilocks + ext3) and the LDE-size threshold. Below the threshold, or on any cudarc error, the dispatch returns None
and the existing CPU implementation runs unchanged. Exception: a cudarc failure in the middle of the FRI loop panics, because the transcript has already
advanced and a CPU restart would re-sample zeta_0 against mutated state.
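
The gating described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in gpu_lde.rs: the marker types, the threshold value, the counter, and all function names with a _sketch suffix are hypothetical, and the "GPU" function is a stand-in that does no device work.

```rust
use std::any::TypeId;
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical marker types standing in for the real field types.
struct Goldilocks;
struct OtherField;

// Hypothetical threshold; the real value lives in the PR's code.
const GPU_MIN_LDE_SIZE_SKETCH: usize = 1 << 16;

// Hypothetical counter in the spirit of gpu_deep_calls.
static GPU_DEEP_CALLS_SKETCH: AtomicU64 = AtomicU64::new(0);

/// Stand-in for the device path; a real one could fail with a driver error.
fn gpu_deep_sketch(evals: &[u64]) -> Result<Vec<u64>, String> {
    Ok(evals.iter().map(|x| x.wrapping_add(1)).collect())
}

/// Stand-in for the existing CPU implementation (same result as the GPU path).
fn cpu_deep_sketch(evals: &[u64]) -> Vec<u64> {
    evals.iter().map(|x| x.wrapping_add(1)).collect()
}

/// Returns None when the GPU path does not apply or fails, so the caller
/// can fall back to the CPU path without any state having changed.
fn try_deep_gpu_sketch<F: 'static>(evals: &[u64]) -> Option<Vec<u64>> {
    // Gate 1: only the supported field type takes the GPU path.
    if TypeId::of::<F>() != TypeId::of::<Goldilocks>() {
        return None;
    }
    // Gate 2: small inputs are not worth the transfer overhead.
    if evals.len() < GPU_MIN_LDE_SIZE_SKETCH {
        return None;
    }
    // Gate 3: any device error degrades to None instead of propagating.
    let out = gpu_deep_sketch(evals).ok()?;
    GPU_DEEP_CALLS_SKETCH.fetch_add(1, Ordering::Relaxed);
    Some(out)
}

/// Call site: GPU when available, otherwise the unchanged CPU path.
fn deep_sketch<F: 'static>(evals: &[u64]) -> Vec<u64> {
    try_deep_gpu_sketch::<F>(evals).unwrap_or_else(|| cpu_deep_sketch(evals))
}
```

The counter is only bumped after the device call succeeds, which is what lets an integration test assert that the GPU path actually fired rather than silently falling back.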

Test plan

  • cargo test -p math-cuda --release --tests (GPU host) — 67 tests
  • cargo test -p stark --release --features cuda — 128 tests
  • cargo test -p stark --release (no cuda) — 128 tests
  • cargo test -p lambda-vm-prover --release (no cuda) — 384 tests
  • cargo test -p lambda-vm-prover --release --features cuda --test cuda_path_integration -- --ignored — all 6 counters fire end-to-end
  • cargo clippy --workspace --all-targets --features cuda -- -D warnings — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --all --check — clean

ColoCarletti and others added 30 commits May 6, 2026 15:12
Co-authored-by: Gabriel Bosio <38794644+gabrielbosio@users.noreply.github.com>
ColoCarletti and others added 27 commits May 29, 2026 11:18
Co-authored-by: Gabriel Bosio <38794644+gabrielbosio@users.noreply.github.com>
# Conflicts:
#	crypto/math-cuda/build.rs
#	crypto/math-cuda/src/device.rs
#	crypto/math-cuda/src/lib.rs
#	crypto/stark/src/gpu_lde.rs
#	crypto/stark/src/prover.rs
#	prover/tests/cuda_path_integration.rs
3 participants