README: design philosophy + MLX & CUDA beta scorecards (Kakeya vs standalone model) #120
Merged
…n → memory-bounded Gemma-4 26B) + MLX & CUDA beta scorecards
- Add 'Design philosophy' section: memory-bounded Gemma-4 26B without trading intelligence (recall 1.0), throughput, or context length; KV restoration as the mechanism (bounded sink+window resident, full effective context restored).
- Add 'Beta scorecards' with Kakeya-vs-standalone tables on both platforms: Mac MLX (89.8% KV saved, recall 1.0, 0.93x ~parity) and CUDA H200 (43.9x/87.0x KV saving, 47.9x/94.9x ctx compression, 1.79x AR fused).
- Reconcile honest-ceiling reference to fresh main 1.79x (H200, block-16).
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ible)
Adds the full raw scorecard reports as <details> code blocks under each platform's summary table, so the exact reproducible evidence sits alongside the condensed tables. Reconciled the Mac report's trailing H200 reference 1.27x -> 1.79x to match the fresh main GPU scorecard (avoids contradicting the CUDA report in the same section).
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ta-scorecards-2815 Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
What

Updates `main`'s `README.md` with (1) the design philosophy of the AR-verifier + dLLM-proposer architecture, (2) the MLX (Mac) and CUDA (H200) beta scorecards as summary tables, and (3) the full verbatim raw scorecard reports for both platforms (collapsible `<details>` blocks under each table).

Design philosophy
Frames the whole engine around one inequality:
The proposer's first role is history reconstructor: a dLLM has no KV cache, so it restores evicted verifier K/V on demand via `f_θ/S5`. KV restoration keeps only a bounded sink+window resident while the full effective context is reconstructed. Because restored/spec-decoded K/V is byte-checked against the AR cache, output is identical to the standalone model (recall 1.0), so the memory win costs zero intelligence.

Scorecards added (`main@9d5e6b4`)

- Mac (MLX) vs `mlx_lm` AR oracle: 89.8 % KV saved (132.92 vs 1308.88 MB), recall 1.0, throughput 0.93× (≈parity).
- CUDA (H200) vs standalone Gemma-4 26B AR: 43.9×/87.0× KV saving (constant 16.71 MB), 47.9×/94.9× context compression, recall 1.0, fused spec-decode 1.79× AR.
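The Mac headline percentage can be re-derived from the two megabyte figures quoted above. The snippet below is a minimal sketch of that arithmetic, not code from the repository: the function names are hypothetical, and the factor-to-percent conversion assumes the CUDA "×" saving is defined as baseline KV size divided by resident KV size, which the scorecard summary does not state explicitly.

```python
def kv_saved_percent(resident_mb: float, baseline_mb: float) -> float:
    """Percent of KV memory saved relative to the baseline cache size."""
    return (1.0 - resident_mb / baseline_mb) * 100.0


def saving_factor_to_percent(factor: float) -> float:
    """Convert an N-times saving factor to a percent saved.

    Assumption (not confirmed by the scorecard): factor = baseline / resident.
    """
    return (1.0 - 1.0 / factor) * 100.0


# Mac (MLX) figures quoted in the scorecard: 132.92 MB resident vs 1308.88 MB.
mac_saved = kv_saved_percent(132.92, 1308.88)
print(f"Mac KV saved: {mac_saved:.1f} %")  # Mac KV saved: 89.8 %

# CUDA (H200) factors quoted in the scorecard, expressed on the same scale.
for factor in (43.9, 87.0):
    print(f"{factor}x -> {saving_factor_to_percent(factor):.1f} % saved")
```

Putting both platforms on one percent scale makes it easier to compare them, but the CUDA percentages are only as reliable as the assumed definition of the saving factor.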
Both platforms produce byte-identical output; throughput is the only axis on which they diverge (CUDA's cheap verify-batch puts it above AR; Mac's 26B `verify(L)` floor holds it at parity). Each platform's table is followed by a collapsible raw scorecard report with the exact reproducible evidence. Reconciled one stale `H200 1.27×` reference in the Mac raw report to the fresh `1.79×` so the two reports agree.

Files
- `README.md`: new `## Design philosophy …` section + `### Beta scorecards …` (tables + collapsible raw reports for Mac and CUDA).

Testing

- `<details>`/`</details>` balanced (2/2), all headline figures present.
- k3-beta-scorecard + k3-fused-allmlx-code-trim (PR #118, "bridge: k3-beta-scorecard preset — Kakeya vs MLX-only on main (#117)") and H200 k3_e2e_gpu_bench + k3_specdecode_gpu_bench (PR #119, "evidence: GPU beta scorecard — Kakeya vs standalone AR on H200 (main #117)").
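The two checks listed under Testing (balanced `<details>` tags and presence of the headline figures) could be automated along the following lines. This is a sketch under stated assumptions, not the check actually run for this PR: the helper names and the `sample` text are hypothetical stand-ins for the real `README.md`, and the simple regex count does not distinguish tags that appear inside fenced code blocks.

```python
import re


def details_balance(markdown: str) -> tuple:
    """Return (opening, closing) counts of <details> tags in the text."""
    opens = len(re.findall(r"<details\b[^>]*>", markdown, flags=re.IGNORECASE))
    closes = len(re.findall(r"</details\s*>", markdown, flags=re.IGNORECASE))
    return opens, closes


def missing_figures(markdown: str, figures: list) -> list:
    """Return the headline figures that do not appear verbatim in the text."""
    return [figure for figure in figures if figure not in markdown]


# Hypothetical stand-in for the README section; the real file is not shown here.
sample = """
### Mac (MLX)
89.8 % KV saved, recall 1.0, throughput 0.93x
<details><summary>Raw report</summary>
...
</details>

### CUDA (H200)
fused spec-decode 1.79x AR
<details><summary>Raw report</summary>
...
</details>
"""

print(details_balance(sample))                              # (2, 2)
print(missing_figures(sample, ["89.8", "0.93x", "1.79x"]))  # []
```

To run it against the real file, replace `sample` with the contents of `README.md` and list the figures exactly as they are spelled there (for example with `×` rather than `x`).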