
feat(mac-launcher): long-answer-safe defaults + full-mode validation preset #148

Merged
cursor[bot] merged 1 commit into main from AgentMemory/update-mac-full-engine-launcher-2815
Jun 18, 2026

FluffyAIcode (Owner) commented Jun 17, 2026

Summary

Updates the one-command Mac launcher scripts/run_kakeya_mac.sh now that the long-decode wrap fix (PR #146) is merged, and adds an on-device validation preset so the launcher's full pipeline is guarded end-to-end (not just the --fast path).

Why

Before PR #146, the full engine degenerated into a runaway repeat past the ~1024-token native-cache ring wrap, so the launcher's default budget (1024) sat right at the coherence cliff. PR #146 fixed that (single-token commits once the sliding RotatingKVCache wraps), so long answers are now coherent — the default can be generous.
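The commit rule described above can be sketched roughly as follows. This is an illustration only, not the code from PR #146: the function name `commit_length`, its parameters, and the exact window of 1024 (the PR only says "~1024") are all assumptions.

```python
# Illustrative sketch only; not the implementation from PR #146.
RING_WINDOW = 1024  # assumed exact value; the PR text only says "~1024"


def commit_length(tokens_in_cache: int, accepted: int,
                  window: int = RING_WINDOW) -> int:
    """Return how many accepted draft tokens to commit in this step.

    Before the sliding ring cache wraps, every accepted token is committed
    (the speculative-decoding speedup). Once the cache has wrapped, only a
    single token is committed per step.
    """
    if accepted < 1:
        raise ValueError("at least one token must be accepted per step")
    if tokens_in_cache >= window:
        return 1  # past the wrap: single-token commits
    return accepted  # before the wrap: commit the whole accepted run


print(commit_length(100, 4))   # before the wrap
print(commit_length(1100, 4))  # after the wrap
```

Under this rule, decoding past the wrap stays coherent but proceeds one token per step, which is why the larger default costs speed rather than quality.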

Changes

  • scripts/run_kakeya_mac.sh:
    • Default --max-new-tokens 1024 → 2048 (the wrap is no longer a coherence cliff; FULL mode just forgoes the spec-decode speedup past it).
    • Documented the long-answer-safe behavior (header + FULL-mode banner + help).
    • Verified: bash -n clean; --dry-run and --help correct; all flags valid against the current harness CLI.
  • inference_engine/bridge/manifest.py: new mlx-kakeya-launcher-full preset.
    • Invokes the launcher in FULL mode (f_θ verifier+proposer+f_θ) on a long scripted answer (请详细解释POW的工作原理, i.e. "Please explain in detail how PoW works", 1300 tokens) that crosses the wrap.
    • Runs with validate_reports=True (§4 liveness + §2.4 quality gate).
    • mlx-kakeya-launcher-smoke stays for fast wiring checks.
  • Tests: allowlist + validate-reports sets updated; added test_mlx_kakeya_launcher_full_preset_runs_full_mode_past_wrap.
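As a rough picture of what the new preset and its test guard, here is a minimal sketch. The real structure of manifest.py is not shown in this PR, so the `LauncherPreset` class, its fields, and the helper functions are hypothetical. It also assumes that the 1300-token budget is passed via `--max-new-tokens`, that FULL mode is what runs when `--fast` is absent, and that the wrap sits at exactly 1024.

```python
# Hypothetical sketch; the actual preset schema in manifest.py may differ.
from dataclasses import dataclass

ASSUMED_WRAP = 1024  # the PR only says "~1024"


@dataclass(frozen=True)
class LauncherPreset:
    name: str
    argv: tuple[str, ...]
    prompt: str
    validate_reports: bool = False


FULL_PRESET = LauncherPreset(
    name="mlx-kakeya-launcher-full",
    # Assumption: the token budget is passed as --max-new-tokens.
    argv=("bash", "scripts/run_kakeya_mac.sh", "--max-new-tokens", "1300"),
    prompt="请详细解释POW的工作原理",
    validate_reports=True,
)


def max_new_tokens(preset: LauncherPreset) -> int:
    """Read the value that follows --max-new-tokens in the preset argv."""
    i = preset.argv.index("--max-new-tokens")
    return int(preset.argv[i + 1])


def runs_full_mode_past_wrap(preset: LauncherPreset,
                             wrap: int = ASSUMED_WRAP) -> bool:
    """True if the preset avoids --fast, decodes past the wrap, and validates."""
    return ("--fast" not in preset.argv
            and max_new_tokens(preset) > wrap
            and preset.validate_reports)


print(runs_full_mode_past_wrap(FULL_PRESET))
```

A check of this shape fails if someone later lowers the budget below the wrap or switches the preset to the fast path, either of which would stop the preset from exercising the fix.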

The FAST path (--fast --cuda-trim, all-KVCache) is immune to the ring-wrap bug by construction; only the FULL path uses the hybrid cache that received the fix.

Validation (Mac M4, via bridge — mlx-kakeya-launcher-full, 1300 tokens)

The launcher's FULL pipeline ran end-to-end and passed the on-device gate:

  • exit_code = 0, evidence_gate_exit_code = 0 → §4 liveness + §2.4 quality gate passed.
  • f_theta_ran = True (25 sliding layers) → full verifier+proposer+f_θ pipeline executed.
  • tokens = 1241 (crossed the ~1024 wrap), mean_accept_len = 1.526 (single-token past the wrap, as designed), 3.81 tok/s, resident KV 235.7 MB.
  • Output coherent with a clean structured conclusion, with no 由于由于 ("because because") runaway repetition.
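The reported numbers can be sanity-checked with a few lines of arithmetic. The dictionary below simply restates the values listed above; its key names and the helper functions are illustrative assumptions, not the bridge's actual report schema, and the wrap is assumed to sit at exactly 1024.

```python
# Values copied from the validation bullets; key names are illustrative only.
ASSUMED_WRAP = 1024  # the PR only says "~1024"

report = {
    "exit_code": 0,
    "evidence_gate_exit_code": 0,
    "f_theta_ran": True,
    "tokens": 1241,
    "mean_accept_len": 1.526,
    "tok_per_s": 3.81,
}


def gate_passed(r: dict) -> bool:
    """Both the run and the evidence gate must exit with code 0."""
    return r["exit_code"] == 0 and r["evidence_gate_exit_code"] == 0


def tokens_past_wrap(r: dict, wrap: int = ASSUMED_WRAP) -> int:
    """Number of generated tokens beyond the assumed wrap point."""
    return max(0, r["tokens"] - wrap)


def approx_decode_seconds(r: dict) -> float:
    """Rough decode time, assuming tok/s is averaged over all tokens."""
    return r["tokens"] / r["tok_per_s"]


print(gate_passed(report), tokens_past_wrap(report))
print(round(approx_decode_seconds(report)))
```

With these figures the run generated 217 tokens past the assumed wrap point, and the throughput implies a decode time of a little over five minutes, provided the tok/s figure covers the whole run.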

run_kakeya_mac_full_validation.txt

Testing

  • pytest tests/inference_engine/bridge/test_manifest.py (32 passed)
  • bash -n scripts/run_kakeya_mac.sh; --dry-run shows FULL-mode argv with --max-new-tokens 2048
  • ✅ On-device mlx-kakeya-launcher-full (Mac M4, 1300 tokens): gate passed, f_θ ran, coherent past the wrap (evidence above)


…preset

run_kakeya_mac.sh:
- Document that long answers are now coherent past the ~1024 native-cache ring
  wrap (PR #146: single-token commits once the sliding RotatingKVCache wraps).
- Raise default --max-new-tokens 1024 -> 2048 (the wrap is no longer a coherence
  cliff; FULL mode just drops the spec-decode speedup past it).
- Refresh help text and FULL-mode banner.

bridge: add mlx-kakeya-launcher-full preset (FULL f_θ path, long scripted answer
crossing the wrap, validate_reports) so CI/on-device guards the launcher's full
pipeline + the wrap fix end-to-end; launcher-smoke stays for fast wiring checks.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>