
ADR 0012 — Proposer/verifier value proposition (bounded-memory + recall; platform-forked throughput)#113

Draft: FluffyAIcode wants to merge 2 commits into main from AgentMemory/adr-0012-value-proposition-2815


@FluffyAIcode

Owner

What

New ADR docs/adr/0012-proposer-verifier-value-proposition.md (+ README index row) that settles the recurring question — "is the proposer worth it given Step-1 hits 1.0× AR without it / is spec-decode dead on Mac?" — so the decision tree isn't re-derived every time the Mac throughput number looks bad in isolation.

Decision recorded (Nygard format: Context → Decision → Consequences → Alternatives)

  1. Core value = bounded memory + recall, not "fast." Since ADR 0008 §11 the proposer is a history reconstructor. Realised on the memory axis: Step-1 = 1.0× AR + recall 5/5 + KV 132.9 MB vs naive 1308.9 MB (89.8 % smaller; ~48 MB after affine4 / ADR 0010). Honesty note: the Mac S5-native result is a freebie specific to Gemma 4 (its native hybrid attention supplies the recall); on pure sliding-window models (Qwen3 K1/K2) and on CUDA full-restoration, proposer reconstruction is still the only recall source.
  2. Spec-decode value forks by platform. CUDA is realised: 1.27× AR with recall 1.0 (#107, "K3 GPU beta: f_θ+S5 K/V-Restoration verifier — incremental decode (=AR) + DFlash fused spec-decode (>AR) on Gemma 4 26B-A4B"). Mac is acceptance-bound (30–40 % vs the 44.7 % reference), so its status is "waiting for the alignment asset" (ADR 0001/0004), not "architecture dead."
  3. Verification-primitive option value. v3 byte-level consistency ⇒ any draft source is plug-and-play (NGramProposer × v3, draft ≈ 0, no training) — one bridge run verifies it.
  4. Multi-host foundation. The capability plane of ADR 0009 / PR #105 ("v0.5-M1 milestone: agent capability exchange + distributed spec decode on multi-host fleets (ADR 0009)") uses proposer/verifier as primitives; the Mac bridge is itself an instance of its tool plane.
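
The memory figures in item 1 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the quoted percentage is the reduction relative to the naive cache; the function and constant names are illustrative and do not come from the repo, and the affine4 figure is the approximate ~48 MB quoted above.

```python
def kv_reduction(bounded_mb: float, naive_mb: float) -> float:
    """Fraction of the naive KV-cache footprint saved by the bounded cache."""
    if naive_mb <= 0:
        raise ValueError("naive_mb must be positive")
    return 1.0 - bounded_mb / naive_mb


# Figures quoted in the PR description (MB).
NAIVE_MB = 1308.9
STEP1_MB = 132.9
AFFINE4_MB = 48.0  # approximate, per "~48 MB after affine4"

print(f"Step-1 : {kv_reduction(STEP1_MB, NAIVE_MB):.1%}")    # 89.8%
print(f"affine4: {kv_reduction(AFFINE4_MB, NAIVE_MB):.1%}")  # 96.3%
```

Under that reading, the 89.8 % figure follows directly from 132.9 MB vs 1308.9 MB, and the affine4 footprint would correspond to roughly a 96 % reduction.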

Cross-links ADR 0001/0004/0006/0008/0009/0010/0011 + evidence pointers (k3_mlx_fused_fair_ctx280_n5_gen32_*.json, docs/k3-gpu-beta.md, results/research/verify_l_sweep.json).
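
To make the "any draft source is plug-and-play" point in item 3 concrete, the sketch below pairs a training-free n-gram drafter with an exact-match verifier. It is a toy illustration under stated assumptions, not the repo's `NGramProposer` or the v3 verifier: the names `ngram_propose` and `verify_exact`, the token-level exact-match rule (a stand-in for v3's byte-level consistency, which is not described here), and the `target_next` callback are all hypothetical.

```python
from typing import Callable, List, Sequence


def ngram_propose(history: Sequence[int], n: int, k: int) -> List[int]:
    """Draft up to k tokens by copying what followed the most recent
    earlier occurrence of the last n tokens (no model, no training)."""
    if n <= 0 or len(history) < n:
        return []
    suffix = tuple(history[-n:])
    # Scan backwards over earlier start positions, excluding the suffix itself.
    for start in range(len(history) - n - 1, -1, -1):
        if tuple(history[start:start + n]) == suffix:
            return list(history[start + n:start + n + k])
    return []


def verify_exact(
    history: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Accept the longest draft prefix that matches the target's greedy
    choice, then emit one target token (a correction or a bonus token).
    The result always equals plain greedy decoding of the same length."""
    ctx = list(history)
    out: List[int] = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            out.append(expected)  # first mismatch: take the target's token
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(target_next(ctx))  # whole draft accepted: one bonus token
    return out


if __name__ == "__main__":
    # Toy "target model": next token is (last + 1) mod 5.
    def target(ctx: Sequence[int]) -> int:
        return (ctx[-1] + 1) % 5

    hist = [0, 1, 2, 3, 4, 0, 1]
    draft = ngram_propose(hist, n=2, k=3)
    print(draft, verify_exact(hist, draft, target))
```

Because the verifier only compares drafted tokens against the target's own choices, the drafter can be swapped for any other source without changing the output, only the number of tokens accepted per step.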

Notes

  • Status Accepted, title 0012-proposer-verifier-value-proposition.md (per the agreed defaults).
  • Rendered in English to match the existing ADR series + README; the substance and all numbers are preserved from the source analysis. Say the word if you'd prefer the original Chinese verbatim.
  • Doc-only; opened off main, kept separate from the MLX-port branches.

Testing

Documentation-only change — no code paths affected.


… platform-forked throughput)

Settle the recurring 'is the proposer worth it / is spec-decode dead on Mac'
question. Records the value map across axes and platforms:
- core value = bounded memory + recall (all platforms); Step-1 = 1.0x AR +
  recall 5/5 + KV 132.9MB vs 1308.9MB (89.8%; ~48MB after affine4). S5 is a
  Gemma-4 free coupon; proposer reconstruction remains the only recall source
  on pure sliding-window (Qwen3 K1/K2) + CUDA full-restoration.
- spec-decode value forks by platform: CUDA realised (1.27x AR, #107); Mac is
  acceptance-bound (30-40% vs 44.7% ref) -> 'waiting for alignment asset', not
  'architecture dead'.
- verification primitive option value: v3 byte-level consistency -> any draft
  source plug-and-play (NGramProposer x v3, draft~0).
- multi-host capability plane foundation (ADR 0009 / PR #105; mac bridge is its
  tool plane).
Cross-links ADR 0001/0004/0006/0008/0009/0010/0011 + evidence pointers; README
index updated.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…cture evidence

Per 2026-06-13 directive: Step-1 incremental decode + native-cache path get recall
from Gemma-4's native full-attn layers + native sliding eviction, never exercising
f_theta/proposer KV restoration. 'recall 5/5 / 1.0x AR' is Gemma-4 native behaviour,
NOT proof the restoration architecture works -- and the path is structurally unable
to fail in a way that tests the architecture. Step-1/native bypass forbidden for any
architecture-validation attempt; bounded-memory+recall claim is unvalidated on a
falsifiable model and must be re-validated on a pure sliding-window model (Qwen3)
where recall is impossible without restoration. Memory-saving numbers remain real.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>