feat(mac-launcher): long-answer-safe defaults + full-mode validation preset by FluffyAIcode · Pull Request #148 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-17T15:15:53Z

Summary

Updates the one-command Mac launcher scripts/run_kakeya_mac.sh now that the long-decode wrap fix (PR #146) is merged, and adds an on-device validation preset so the launcher's full pipeline is guarded end-to-end (not just the --fast path).

Why

Before PR #146, the full engine degenerated into a runaway repeat past the ~1024-token native-cache ring wrap, so the launcher's default budget (1024) sat right at the coherence cliff. PR #146 fixed that (single-token commits once the sliding RotatingKVCache wraps), so long answers are now coherent — the default can be generous.
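The commit rule described above can be illustrated with a toy model. This is a sketch under assumptions, not the engine's code: `tokens_to_commit`, `simulate_steps`, and the exact wrap condition are hypothetical, and the ring size of 1024 is the approximate figure quoted in this PR.

```python
def tokens_to_commit(accepted: int, position: int, ring_size: int = 1024) -> int:
    """Toy policy: commit all accepted draft tokens until the ring would wrap,
    then fall back to one token per step (hypothetical condition)."""
    if accepted < 1:
        raise ValueError("a decode step must accept at least one token")
    if position + accepted > ring_size:
        return 1  # at or past the wrap: single-token commits only
    return accepted


def simulate_steps(total_tokens: int, accept_len: int, ring_size: int = 1024) -> int:
    """Count decode steps needed to emit total_tokens under the toy policy."""
    position = 0
    steps = 0
    while position < total_tokens:
        n = tokens_to_commit(accept_len, position, ring_size)
        position += min(n, total_tokens - position)
        steps += 1
    return steps


if __name__ == "__main__":
    # 512 two-token steps reach the wrap, then 276 single-token steps follow.
    print(simulate_steps(1300, accept_len=2))
```

In this model the speculative speedup applies only before the wrap, which matches the trade-off stated in this PR: output stays coherent, but FULL mode forgoes the spec-decode speedup past the wrap.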

Changes

scripts/run_kakeya_mac.sh:
- Default --max-new-tokens 1024 → 2048 (the wrap is no longer a coherence cliff; FULL mode just forgoes the spec-decode speedup past it).
- Documented the long-answer-safe behavior (header + FULL-mode banner + help).
- Verified: bash -n clean; --dry-run and --help correct; all flags valid against the current harness CLI.
inference_engine/bridge/manifest.py:
- New mlx-kakeya-launcher-full preset: invokes the launcher in FULL mode (verifier + proposer + f_θ) on a long scripted answer (prompt 请详细解释POW的工作原理, "Please explain in detail how PoW works"; 1300 tokens) that crosses the wrap, with validate_reports=True (§4 liveness + §2.4 quality gate).
- mlx-kakeya-launcher-smoke stays for fast wiring checks.
Tests:
- Allowlist + validate-reports sets updated; added test_mlx_kakeya_launcher_full_preset_runs_full_mode_past_wrap.

The FAST path (--fast → --cuda-trim, all-KVCache) is immune to the ring-wrap bug by construction; only the FULL path uses the hybrid cache that received the fix.
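For readers unfamiliar with the manifest, a preset of this kind might look roughly like the sketch below. The `LauncherPreset` dataclass, its field names, and the helper functions are hypothetical; the real schema in inference_engine/bridge/manifest.py is not shown in this PR, and passing the 1300-token budget through `--max-new-tokens` is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LauncherPreset:
    """Hypothetical shape of a bridge preset (not the real manifest schema)."""
    name: str
    argv: Tuple[str, ...]
    prompt: str
    validate_reports: bool = False


FULL_PRESET = LauncherPreset(
    name="mlx-kakeya-launcher-full",
    # Assumption: the token budget is passed via --max-new-tokens.
    argv=("bash", "scripts/run_kakeya_mac.sh", "--max-new-tokens", "1300"),
    prompt="请详细解释POW的工作原理",
    validate_reports=True,
)


def max_new_tokens(preset: LauncherPreset) -> int:
    """Read the value that follows --max-new-tokens in the preset argv."""
    i = preset.argv.index("--max-new-tokens")
    return int(preset.argv[i + 1])


def runs_full_mode_past_wrap(preset: LauncherPreset, ring_size: int = 1024) -> bool:
    """FULL mode means no --fast flag; past the wrap means budget > ring size."""
    return "--fast" not in preset.argv and max_new_tokens(preset) > ring_size
```

A check like `runs_full_mode_past_wrap(FULL_PRESET)` mirrors the intent of the new test name: the preset must avoid `--fast` and request more tokens than the ring holds.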

Validation (Mac M4, via bridge — `mlx-kakeya-launcher-full`, 1300 tokens)

The launcher's FULL pipeline ran end-to-end and passed the on-device gate:

- exit_code = 0, evidence_gate_exit_code = 0 → §4 liveness + §2.4 quality gate passed.
- f_theta_ran = True (25 sliding layers) → full verifier+proposer+f_θ pipeline executed.
- tokens = 1241 (crossed the ~1024 wrap), mean_accept_len = 1.526 (single-token past the wrap, as designed), 3.81 tok/s, resident KV 235.7 MB.
- Output coherent with a clean structured conclusion; no 由于由于 ("because because") runaway.
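The pass criteria above can be expressed as a small check. This is a sketch only: the field names are the ones quoted in this PR, but the flat dict layout and the `full_run_passed` helper are assumptions, not the bridge's actual report format or gate.

```python
from typing import Any, Mapping

NATIVE_RING_SIZE = 1024  # approximate wrap point quoted in this PR


def full_run_passed(report: Mapping[str, Any], ring_size: int = NATIVE_RING_SIZE) -> bool:
    """True if the run exited cleanly, passed the evidence gate, executed f_θ,
    and generated enough tokens to cross the ring wrap."""
    return (
        report.get("exit_code") == 0
        and report.get("evidence_gate_exit_code") == 0
        and report.get("f_theta_ran") is True
        and report.get("tokens", 0) > ring_size
    )


# Values reported in this PR for the Mac M4 run.
reported = {
    "exit_code": 0,
    "evidence_gate_exit_code": 0,
    "f_theta_ran": True,
    "tokens": 1241,
    "mean_accept_len": 1.526,
}

if __name__ == "__main__":
    print(full_run_passed(reported))
```

The token check matters because a run that stops before the wrap would pass the other conditions without exercising the PR #146 fix.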

run_kakeya_mac_full_validation.txt

Testing

✅ pytest tests/inference_engine/bridge/test_manifest.py (32 passed)
✅ bash -n scripts/run_kakeya_mac.sh; --dry-run shows FULL-mode argv with --max-new-tokens 2048
✅ On-device mlx-kakeya-launcher-full (Mac M4, 1300 tokens): gate passed, f_θ ran, coherent past the wrap (evidence above)


…preset

run_kakeya_mac.sh:
- Document that long answers are now coherent past the ~1024 native-cache ring wrap (PR #146: single-token commits once the sliding RotatingKVCache wraps).
- Raise default --max-new-tokens 1024 -> 2048 (the wrap is no longer a coherence cliff; FULL mode just drops the spec-decode speedup past it).
- Refresh help text and FULL-mode banner.

bridge: add mlx-kakeya-launcher-full preset (FULL f_θ path, long scripted answer crossing the wrap, validate_reports) so CI/on-device guards the launcher's full pipeline + the wrap fix end-to-end; launcher-smoke stays for fast wiring checks.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

github-actions Bot added the needs-mac-m4 label Jun 17, 2026

cursor Bot merged commit bc74bf9 into main Jun 18, 2026
8 checks passed

cursor[bot] merged 1 commit into main from AgentMemory/update-mac-full-engine-launcher-2815

FluffyAIcode commented Jun 17, 2026 • edited by cursor Bot
