README: design philosophy + MLX & CUDA beta scorecards (Kakeya vs standalone model) by FluffyAIcode · Pull Request #120 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-13T11:54:36Z

What

Updates main's README.md with (1) the design philosophy of the AR-verifier + dLLM-proposer architecture, (2) the MLX (Mac) and CUDA (H200) beta scorecards as summary tables, and (3) the full verbatim raw scorecard reports for both platforms (collapsible <details> blocks under each table).

Design philosophy

Frames the whole engine around one inequality:

Make Gemma-4 26B-A4B-it memory-bounded without trading away model intelligence (recall), token throughput, or context length.

The proposer's first role is history reconstructor (a dLLM has no KV cache → restores evicted verifier K/V on demand via f_θ/S5); KV restoration keeps only a bounded sink+window resident while the full effective context is reconstructed. Because restored/spec-decoded K/V is byte-checked against the AR cache, output is identical to the standalone model (recall 1.0) — the memory win costs zero intelligence.

Scorecards added (`main` @ `9d5e6b4`)

Mac (MLX) vs mlx_lm AR oracle — 89.8 % KV saved (132.92 vs 1308.88 MB), recall 1.0, throughput 0.93× (≈parity).

CUDA (H200) vs standalone Gemma-4 26B AR — 43.9×/87.0× KV saving (constant 16.71 MB), 47.9×/94.9× context compression, recall 1.0, fused spec-decode 1.79× AR.

Both at byte-identical output; the throughput axis is the only platform fork (CUDA's cheap verify-batch → >AR; Mac's 26B verify(L) floor → parity). Each platform's table is followed by a collapsible raw scorecard report with the exact reproducible evidence. Reconciled one stale H200 1.27× reference in the Mac raw report to the fresh 1.79× so the two reports agree.

Files

README.md: new ## Design philosophy … section + ### Beta scorecards … (tables + collapsible raw reports for Mac and CUDA).

Testing

✅ Markdown sanity check: fenced code blocks balanced (even count), <details>/</details> balanced (2/2), all headline figures present.
Numbers sourced from the just-run evidence: Mac-bridge k3-beta-scorecard + k3-fused-allmlx-code-trim (PR bridge: k3-beta-scorecard preset — Kakeya vs MLX-only on main (#117) #118) and H200 k3_e2e_gpu_bench + k3_specdecode_gpu_bench (PR evidence: GPU beta scorecard — Kakeya vs standalone AR on H200 (main #117) #119).

…n → memory-bounded Gemma-4 26B) + MLX & CUDA beta scorecards - Add 'Design philosophy' section: memory-bounded Gemma-4 26B without trading intelligence (recall 1.0), throughput, or context length; KV restoration as the mechanism (bounded sink+window resident, full effective context restored). - Add 'Beta scorecards' with Kakeya-vs-standalone tables on both platforms: Mac MLX (89.8% KV saved, recall 1.0, 0.93x ~parity) and CUDA H200 (43.9x/87.0x KV saving, 47.9x/94.9x ctx compression, 1.79x AR fused). - Reconcile honest-ceiling reference to fresh main 1.79x (H200, block-16). Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…ible) Adds the full raw scorecard reports as <details> code blocks under each platform's summary table, so the exact reproducible evidence sits alongside the condensed tables. Reconciled the Mac report's trailing H200 reference 1.27x -> 1.79x to match the fresh main GPU scorecard (avoids contradicting the CUDA report in the same section). Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…ta-scorecards-2815 Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

cursoragent and others added 4 commits June 13, 2026 11:54

README: drop <details> collapsibles; show raw scorecard reports inline

84cbb19

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into AgentMemory/readme-be…

5226f52

…ta-scorecards-2815 Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

cursor Bot marked this pull request as ready for review June 13, 2026 13:35

cursor Bot merged commit a2f5086 into main Jun 13, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README: design philosophy + MLX & CUDA beta scorecards (Kakeya vs standalone model)#120

README: design philosophy + MLX & CUDA beta scorecards (Kakeya vs standalone model)#120
cursor[bot] merged 4 commits into
mainfrom
AgentMemory/readme-beta-scorecards-2815

FluffyAIcode commented Jun 13, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

FluffyAIcode commented Jun 13, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Design philosophy

Scorecards added (main @ 9d5e6b4)

Files

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

FluffyAIcode commented Jun 13, 2026 •

edited by cursor Bot

Loading

Scorecards added (`main` @ `9d5e6b4`)