ADR 0012 — Proposer/verifier value proposition (bounded-memory + recall; platform-forked throughput) by FluffyAIcode · Pull Request #113 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-13T03:09:33Z

What

New ADR docs/adr/0012-proposer-verifier-value-proposition.md (plus a README index row) that settles the recurring questions "is the proposer worth it, given Step-1 hits 1.0× AR without it?" and "is spec-decode dead on Mac?", so the decision tree isn't re-derived every time the Mac throughput number looks bad in isolation.

Decision recorded (Nygard format: Context → Decision → Consequences → Alternatives)

Core value = bounded memory + recall, not "fast." Since ADR 0008 §11 the proposer is a history reconstructor. This is realised on the memory axis: Step-1 = 1.0× AR + recall 5/5 + KV 132.9 MB vs naive 1308.9 MB (89.8 % smaller; ~48 MB after affine4 / ADR 0010). Honesty note: Mac S5-native is a Gemma-4 free coupon (it comes from Gemma-4's native hybrid attention); on pure sliding-window models (Qwen3 K1/K2) and on CUDA full-restoration, proposer reconstruction is still the only recall source.
Spec-decode value forks by platform. CUDA: realised in #107 (K3 GPU beta: f_θ+S5 K/V-Restoration verifier, incremental decode (=AR) + DFlash fused spec-decode (>AR) on Gemma 4 26B-A4B) at 1.27× AR, recall 1.0. Mac: acceptance-bound (30–40 % acceptance vs 44.7 % reference), so the status is "waiting for the alignment asset" (ADR 0001/0004), not "architecture dead."
Verification-primitive option value. Because v3 checks byte-level consistency, any draft source is plug-and-play (e.g. NGramProposer × v3: draft ≈ 0, no training); a single bridge run is enough to verify it.
Multi-host foundation. The capability plane of ADR 0009 / PR #105 (v0.5-M1 milestone: agent capability exchange + distributed spec decode on multi-host fleets) uses proposer/verifier as primitives; the Mac bridge is itself an instance of its tool plane.
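To see why acceptance rate, rather than the verifier, bounds Mac throughput, the textbook speculative-decoding cost model (Leviathan et al., 2023) is a useful sketch. It is a swapped-in model, not this project's measurement: it assumes each drafted token is accepted independently with probability `alpha`, a fixed draft length `gamma`, and a draft-to-verifier cost ratio `c`. Treating the quoted 30–40 % and 44.7 % figures as per-token acceptance probabilities is itself an assumption, and the `gamma`/`c` values below are hypothetical and not fitted to the 1.27× CUDA result.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verifier pass: accepted drafts plus one
    verifier-produced token (correction or bonus). Assumes i.i.d. acceptance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return gamma + 1.0  # every draft accepted, plus the bonus token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def speedup_vs_ar(alpha: float, gamma: int, c: float) -> float:
    """Walltime speedup over plain AR decoding.

    One pass costs `gamma` draft steps (each `c` verifier-steps) plus one
    verifier step; AR emits exactly one token per verifier step.
    """
    return expected_tokens_per_pass(alpha, gamma) / (gamma * c + 1.0)


if __name__ == "__main__":
    GAMMA = 4  # hypothetical draft length
    C = 0.1    # hypothetical draft/verifier cost ratio
    for alpha in (0.30, 0.40, 0.447):
        print(f"alpha={alpha:.3f}  tokens/pass={expected_tokens_per_pass(alpha, GAMMA):.2f}  "
              f"speedup={speedup_vs_ar(alpha, GAMMA, C):.2f}x")
```

Under these assumed parameters the speedup is steep in `alpha`, which illustrates why a low acceptance rate reads as "waiting for a better-aligned drafter" rather than as evidence against the verifier architecture.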

Cross-links ADR 0001/0004/0006/0008/0009/0010/0011 + evidence pointers (k3_mlx_fused_fair_ctx280_n5_gen32_*.json, docs/k3-gpu-beta.md, results/research/verify_l_sweep.json).
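The plug-and-play claim for draft sources (NGramProposer × v3) can be illustrated with a minimal prompt-lookup drafter and a greedy prefix check. This is a hypothetical sketch: the class name mirrors `NGramProposer` from the text, but its constructor, `propose` method, and the `verify_prefix` helper are illustrative assumptions rather than the repo's API, and v3's byte-level consistency is approximated by exact comparison of each token's bytes.

```python
class NGramProposer:
    """Hypothetical prompt-lookup drafter: no weights, no training.

    Finds the most recent earlier occurrence of the trailing n-gram in the
    history and proposes the tokens that followed it.
    """

    def __init__(self, n: int = 2, max_draft: int = 4) -> None:
        self.n = n
        self.max_draft = max_draft

    def propose(self, history: list[bytes]) -> list[bytes]:
        if len(history) <= self.n:
            return []
        key = history[-self.n:]
        # Scan backwards, excluding the trailing n-gram itself.
        for start in range(len(history) - self.n - 1, -1, -1):
            if history[start:start + self.n] == key:
                follow = start + self.n
                return history[follow:follow + self.max_draft]
        return []


def verify_prefix(draft: list[bytes], target: list[bytes]) -> list[bytes]:
    """Greedy verification: keep drafted tokens while their bytes match the
    verifier's greedy tokens, then append the verifier's own next token.

    `target` holds the verifier's greedy tokens for positions 0..len(draft),
    so len(target) == len(draft) + 1.
    """
    accepted: list[bytes] = []
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted.append(d)
    accepted.append(target[len(accepted)])  # correction or bonus token
    return accepted
```

For example, with history `[b"a", b"b", b"c", b"d", b"a", b"b"]` the drafter proposes `[b"c", b"d", b"a", b"b"]`; if the verifier's greedy tokens are `[b"c", b"d", b"x", b"y", b"z"]`, verification emits `[b"c", b"d", b"x"]`. In this sketch the drafter only affects how many tokens are accepted, never what is emitted, which is the sense in which any draft source can be swapped in.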

Notes

Status: Accepted; filename 0012-proposer-verifier-value-proposition.md (per the agreed defaults).
Rendered in English to match the existing ADR series + README; the substance and all numbers are preserved from the source analysis. Say the word if you'd prefer the original Chinese verbatim.
Doc-only; opened off main, kept separate from the MLX-port branches.

Testing

Documentation-only change — no code paths affected.

… platform-forked throughput)

Settle the recurring 'is the proposer worth it / is spec-decode dead on Mac' question. Records the value map across axes and platforms:

- core value = bounded memory + recall (all platforms); Step-1 = 1.0x AR + recall 5/5 + KV 132.9MB vs 1308.9MB (89.8%; ~48MB after affine4). S5 is a Gemma-4 free coupon; proposer reconstruction remains the only recall source on pure sliding-window (Qwen3 K1/K2) + CUDA full-restoration.
- spec-decode value forks by platform: CUDA realised (1.27x AR, #107); Mac is acceptance-bound (30-40% vs 44.7% ref) -> 'waiting for alignment asset', not 'architecture dead'.
- verification primitive option value: v3 byte-level consistency -> any draft source plug-and-play (NGramProposer x v3, draft~0).
- multi-host capability plane foundation (ADR 0009 / PR #105; mac bridge is its tool plane).

Cross-links ADR 0001/0004/0006/0008/0009/0010/0011 + evidence pointers; README index updated.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
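The memory percentages quoted above follow from simple arithmetic on the reported sizes. The sketch only reuses the reported numbers (132.9 MB Step-1 KV, 1308.9 MB naive KV, ~48 MB after affine4); the `kv_reduction` helper name is illustrative.

```python
def kv_reduction(bounded_mb: float, naive_mb: float) -> float:
    """Fraction of KV-cache memory saved relative to the naive full cache."""
    if naive_mb <= 0:
        raise ValueError("naive_mb must be positive")
    return 1.0 - bounded_mb / naive_mb


NAIVE_MB = 1308.9   # naive KV cache, as reported
STEP1_MB = 132.9    # Step-1 bounded KV cache, as reported
AFFINE4_MB = 48.0   # approximate size after affine4 (ADR 0010), as reported

if __name__ == "__main__":
    print(f"Step-1:           {kv_reduction(STEP1_MB, NAIVE_MB):.1%} smaller")
    print(f"Step-1 + affine4: {kv_reduction(AFFINE4_MB, NAIVE_MB):.1%} smaller (approx.)")
```

The first line reproduces the 89.8 % figure; the second shows that ~48 MB is roughly 96 % below the naive cache.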

…cture evidence

Per the 2026-06-13 directive: the Step-1 incremental-decode and native-cache paths get recall from Gemma-4's native full-attention layers plus native sliding eviction, and never exercise f_theta/proposer KV restoration. 'recall 5/5 / 1.0x AR' is therefore Gemma-4 native behaviour, NOT proof that the restoration architecture works, and the path is structurally unable to fail in a way that would test the architecture.

The Step-1/native bypass is forbidden for any architecture-validation attempt. The bounded-memory + recall claim is unvalidated on a falsifiable model and must be re-validated on a pure sliding-window model (Qwen3), where recall is impossible without restoration. The memory-saving numbers remain real.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
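Why a pure sliding-window model makes the recall test falsifiable can be shown with a toy cache. This is a hypothetical sketch, not the engine's cache: `SlidingWindowKV` keeps only the last `window` positions, and the optional `restore` callback is an assumed stand-in for f_θ/proposer reconstruction of evicted history.

```python
from collections import deque
from typing import Callable, Optional

Restorer = Callable[[int], Optional[str]]


class SlidingWindowKV:
    """Toy KV cache holding only the last `window` (position, value) entries."""

    def __init__(self, window: int, restore: Optional[Restorer] = None) -> None:
        self.entries = deque(maxlen=window)  # deque(maxlen) evicts the oldest entry
        self.restore = restore               # stand-in for proposer/f_theta restoration

    def append(self, pos: int, value: str) -> None:
        self.entries.append((pos, value))

    def lookup(self, pos: int) -> Optional[str]:
        for p, v in self.entries:
            if p == pos:
                return v
        # Evicted: only a restoration path can bring the entry back.
        return self.restore(pos) if self.restore else None


def recall_probe(cache: SlidingWindowKV, tokens: list[str], needle_pos: int) -> bool:
    """Write every token, then query the needle; True means recall succeeded."""
    for pos, tok in enumerate(tokens):
        cache.append(pos, tok)
    return cache.lookup(needle_pos) == tokens[needle_pos]
```

With `tokens = ["needle"] + ["filler"] * 10`, a window of 4 fails the probe without `restore` and passes only when a restorer is supplied, so a pass can be attributed to restoration. A window at least as long as the sequence (the analogue of a native full-attention layer) passes regardless, which is why the Gemma-4 native path cannot test the architecture.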

FluffyAIcode mentioned this pull request on Jun 13, 2026 in #114, "ADR 0013 — Distributed inference topology: what AR sequentiality allows" (Draft).

FluffyAIcode wants to merge 2 commits into main from AgentMemory/adr-0012-value-proposition-2815

2 participants