ADR 0012 — Proposer/verifier value proposition (bounded-memory + recall; platform-forked throughput)#113
Draft
FluffyAIcode wants to merge 2 commits into
… platform-forked throughput)

Settle the recurring 'is the proposer worth it / is spec-decode dead on Mac' question. Records the value map across axes and platforms:
- core value = bounded memory + recall (all platforms); Step-1 = 1.0x AR + recall 5/5 + KV 132.9MB vs 1308.9MB (89.8%; ~48MB after affine4). S5 is a Gemma-4 free coupon; proposer reconstruction remains the only recall source on pure sliding-window (Qwen3 K1/K2) + CUDA full-restoration.
- spec-decode value forks by platform: CUDA realised (1.27x AR, #107); Mac is acceptance-bound (30-40% vs 44.7% ref) -> 'waiting for alignment asset', not 'architecture dead'.
- verification primitive option value: v3 byte-level consistency -> any draft source plug-and-play (NGramProposer x v3, draft~0).
- multi-host capability plane foundation (ADR 0009 / PR #105; mac bridge is its tool plane).

Cross-links ADR 0001/0004/0006/0008/0009/0010/0011 + evidence pointers; README index updated.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
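The KV figures quoted in the commit message above can be sanity-checked with simple arithmetic. The sketch below assumes 1308.9MB is the unbounded baseline and 132.9MB is the bounded cache. The helper name `kv_reduction_pct` is hypothetical, and the percentage for the approximate affine4 figure is derived here rather than quoted from the source.

```python
# Figures quoted in the commit message; the helper name is hypothetical.
FULL_KV_MB = 1308.9     # assumed to be the unbounded KV baseline
BOUNDED_KV_MB = 132.9   # bounded KV footprint reported for Step-1
AFFINE4_KV_MB = 48.0    # approximate footprint after affine4 ("~48MB")


def kv_reduction_pct(reduced_mb: float, full_mb: float) -> float:
    """Percentage of KV memory saved relative to the full cache."""
    if full_mb <= 0:
        raise ValueError("full_mb must be positive")
    return 100.0 * (1.0 - reduced_mb / full_mb)


if __name__ == "__main__":
    # Matches the 89.8% quoted in the commit message.
    print(f"bounded: {kv_reduction_pct(BOUNDED_KV_MB, FULL_KV_MB):.1f}%")  # 89.8%
    # Derived from the approximate ~48MB figure, so treat it as approximate too.
    print(f"affine4: {kv_reduction_pct(AFFINE4_KV_MB, FULL_KV_MB):.1f}%")  # 96.3%
```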
…cture evidence

Per 2026-06-13 directive: Step-1 incremental decode + native-cache path get recall from Gemma-4's native full-attn layers + native sliding eviction, never exercising f_theta/proposer KV restoration. 'recall 5/5 / 1.0x AR' is Gemma-4 native behaviour, NOT proof the restoration architecture works -- and the path is structurally unable to fail in a way that tests the architecture.

Step-1/native bypass forbidden for any architecture-validation attempt; bounded-memory+recall claim is unvalidated on a falsifiable model and must be re-validated on a pure sliding-window model (Qwen3) where recall is impossible without restoration. Memory-saving numbers remain real.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
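A toy illustration of the falsifiability argument in the commit message above: with a pure sliding window, an early token is no longer in the cache once enough later tokens arrive, so a recall test can actually fail unless something restores it. This models only cache membership, not attention, f_theta, or the proposer. All names and sizes are hypothetical.

```python
from collections import deque


def sliding_window_cache(tokens, window):
    """Return the tokens still cached after a fixed-size, oldest-first eviction."""
    cache = deque(maxlen=window)  # deque drops the oldest entry when full
    for tok in tokens:
        cache.append(tok)
    return list(cache)


def can_recall(tokens, needle, window):
    """True if `needle` is still present in the window after all tokens are fed."""
    return needle in sliding_window_cache(tokens, window)


if __name__ == "__main__":
    # Illustrative context: one needle followed by 279 filler tokens.
    context = ["NEEDLE"] + [f"filler{i}" for i in range(279)]
    print(can_recall(context, "NEEDLE", window=64))   # False: needle was evicted
    print(can_recall(context, "NEEDLE", window=512))  # True: window covers everything
```

A model that also keeps full-attention layers behaves like the second call, which is why a passing recall test there says nothing about restoration.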
What
New ADR docs/adr/0012-proposer-verifier-value-proposition.md (+ README index row) that settles the recurring question ("is the proposer worth it given Step-1 hits 1.0× AR without it / is spec-decode dead on Mac?") so the decision tree isn't re-derived every time the Mac throughput number looks bad in isolation.
Decision recorded (Nygard format: Context → Decision → Consequences → Alternatives)
Cross-links ADR 0001/0004/0006/0008/0009/0010/0011 + evidence pointers (k3_mlx_fused_fair_ctx280_n5_gen32_*.json, docs/k3-gpu-beta.md, results/research/verify_l_sweep.json).
Notes
0012-proposer-verifier-value-proposition.md (per the agreed defaults).
main, kept separate from the MLX-port branches.
Testing
Documentation-only change — no code paths affected.