merge-train: #150 + #154 + #148 + #149 onto current main (conflicts resolved, tests green) by FluffyAIcode · Pull Request #155 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-18T13:16:08Z

What this is

A verified integration branch that merges the four open Mac PRs — #150 (runner Python pin), #154 (launcher bash-3.2 fix), #148 (launcher 2048 + full preset), #149 (runaway guard + codegen presets) — onto current main (which already has #152/#153 streaming), with all conflicts resolved and tests green. Merge this one PR to land all four at once, or use it as the reference resolution for merging them individually.

Verified merge order

#150 → #154 → #148 → #149

#150 (feat(mac-bridge): pin workload Python interpreter + import gate (Layers B/C) + reusable skill): clean. It only adds runner_python.py/run_preset.py/docs + launcher PYBIN; main's launcher was untouched by #152 (fix(mac-chat): stream tokens live so the CLI doesn't look frozen on long answers) and #153 (fix(mac-chat): actually stop [stream] lines interleaving (the part #152 squash-dropped)).
#154 (fix(mac-launcher): bash 3.2-safe empty-array expansion (fixes "EXTRA[@]: unbound variable")): 1 conflict, on the run_kakeya_mac.sh cmd=(…) line (#150's $PYBIN vs #154's ${EXTRA[@]+…}). Resolution = union: cmd=( "$PYBIN" … "${args[@]}" ${EXTRA[@]+"${EXTRA[@]}"} ).
#148 (feat(mac-launcher): long-answer-safe defaults + full-mode validation preset): 2 conflicts, in manifest.py + test_manifest.py (preset additions at the same anchor). Resolution = keep both presets (launcher-dryrun-bash32 + launcher-full); run_kakeya_mac.sh auto-merged (2048 default).
#149 (fix(mlx-fused): runaway-loop guard for greedy markdown-marker collapse on code prompts): the heavy one, 4 conflicts because main's #152/#153 streaming touches the same files:
- fused_specdecode.py (×3 signatures): union both params → on_commit=…, stop_on_runaway=True. Loop bodies auto-merged so each loop has both _emit(on_commit, …) (streaming) and the _trailing_runaway_drop guard.
- k3_integrated_niah_eval_mac.py (×4): union — keep --chat-stream-stdout (streaming) + --chat-scripted-file/--fused-no-loop-guard (#149) args; each generate call gets both on_commit=on_commit, stop_on_runaway=_guard.
- manifest.py / test_manifest.py: keep all presets (chat-stream-probe + codegen-degen-probe + codegen-guard-validate + the launcher ones).
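
The signature union for fused_specdecode.py can be pictured with a minimal sketch. Only the parameter names on_commit and stop_on_runaway, the helper name _emit, and the result key stopped_on_runaway come from this PR; generate_sketch, the is_runaway callable, the "tokens" key, and the block-of-token-ids input are hypothetical stand-ins, and the real loops are likely structured differently.

```python
from typing import Callable, Iterable, List, Optional


def _emit(on_commit: Optional[Callable[[List[int]], None]], tokens: List[int]) -> None:
    """Forward newly committed tokens to the streaming callback, if one was given.

    Assumed signature: the PR only shows `_emit(on_commit, …)`.
    """
    if on_commit is not None and tokens:
        on_commit(tokens)


def generate_sketch(
    blocks: Iterable[List[int]],
    *,
    on_commit: Optional[Callable[[List[int]], None]] = None,  # streaming side (#152/#153)
    stop_on_runaway: bool = True,                             # guard side (#149)
    is_runaway: Callable[[List[int]], bool] = lambda out: False,  # hypothetical detector
) -> dict:
    """Toy decode loop carrying both the streaming hook and the runaway guard."""
    out: List[int] = []
    stopped = False
    for block in blocks:
        out.extend(block)          # commit the accepted block
        _emit(on_commit, block)    # stream it live
        if stop_on_runaway and is_runaway(out):
            stopped = True         # stop instead of producing an unbounded wall
            break
    return {"tokens": out, "stopped_on_runaway": stopped}


# Usage: stream into a list and stop once the tail is three identical tokens.
streamed: List[int] = []
result = generate_sketch(
    [[1, 2], [3, 3, 3], [4]],
    on_commit=streamed.extend,
    is_runaway=lambda out: out[-3:] == [3, 3, 3],
)
```

Keeping both keyword arguments optional with defaults means callers written against either pre-merge signature would keep working, which is presumably why a plain union was enough to resolve the conflict.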

Testing (on the merged result)

✅ pytest tests/inference_engine/bridge/ tests/backends/mlx/test_fused_specdecode.py tests/inference_engine/bench/test_k3_report_gate.py → 112 passed
✅ bash -n scripts/run_kakeya_mac.sh; combined launcher dry-run (empty EXTRA) → --max-new-tokens 2048, $PYBIN, ${EXTRA[@]+…} all present
✅ Merged fused_specdecode.py loops carry both streaming and the runaway guard; k3_… has streaming + native-ref + guard + _guard defined

If you merge the four PRs individually instead, follow the order/resolutions above. After landing, close this integration PR (and the four originals will show as merged/redundant).

…preset

run_kakeya_mac.sh:
- Document that long answers are now coherent past the ~1024 native-cache ring wrap (PR #146: single-token commits once the sliding RotatingKVCache wraps).
- Raise default --max-new-tokens 1024 -> 2048 (the wrap is no longer a coherence cliff; FULL mode just drops the spec-decode speedup past it).
- Refresh help text and FULL-mode mode banner.

bridge: add mlx-kakeya-launcher-full preset (FULL f_θ path, long scripted answer crossing the wrap, validate_reports) so CI/on-device guards the launcher's full pipeline + the wrap fix end-to-end; launcher-smoke stays for fast wiring checks.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…ive-control probe KAKEYA_KDBG-gated per-block logging (sampled/committed ids, cyc_frac/cyc_p, cache offsets) in fused_specdecode_generate, and a turn_compare_fused_vs_native record (first_divergence_idx + both tails) in _run_fused_chat. New bridge preset mlx-kakeya-codegen-degen-probe runs the C-code prompt with --chat-native-ref to decide greedy-pathology vs engine bug. Instrumentation only; reverted after fix. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
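
A fused-vs-native comparison record of this kind can be sketched as follows. The field name first_divergence_idx comes from the commit message; the function names, the fused_tail/native_tail keys, the tail length, and the handling of unequal lengths are assumptions made for illustration, not the removed instrumentation itself.

```python
from typing import List, Optional, Sequence


def first_divergence_idx(fused: Sequence[int], native: Sequence[int]) -> Optional[int]:
    """Index of the first differing token id, or None if the sequences match.

    Assumption: when one sequence is a strict prefix of the other, the
    divergence is reported at the shorter length.
    """
    for i, (a, b) in enumerate(zip(fused, native)):
        if a != b:
            return i
    if len(fused) != len(native):
        return min(len(fused), len(native))
    return None


def turn_compare_sketch(fused: Sequence[int], native: Sequence[int], tail: int = 8) -> dict:
    """Build a small comparison record: divergence index plus both tails."""
    return {
        "first_divergence_idx": first_divergence_idx(fused, native),
        "fused_tail": list(fused[-tail:]),    # hypothetical key name
        "native_tail": list(native[-tail:]),  # hypothetical key name
    }


# Usage: identical sequences yield None, i.e. no divergence was found.
record = turn_compare_sketch([10, 11, 12], [10, 11, 12])
```

A None index on identical outputs is the signal that points toward a greedy-decoding pathology rather than an engine bug.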

…efill) + multi-turn degen preset KAKEYA_KDBG-gated prefill_state_fused / prefill_state_native records in _run_fused_chat: per-turn prompt_len, evicted_count, rot/full cache offsets, any_wrapped, would_wrap_block0, plus a turn index on turn_compare. Repoints mlx-kakeya-codegen-degen-probe to the multi-turn repro (turn-1 PoW explanation pushes the turn-2 code prompt's prefill past the sliding window) at 1200 tok. Instrumentation only; reverted after fix. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…t prefill)

Multi-turn+native at 1200x2 OOM'd the Mac runner. Per debug analysis, the cheapest test of H-C' (long-prompt prefill corrupts logits) vs H-A' (bounded-greedy pathology) is a single-turn LONG prompt that wraps the ring AT prefill (would_wrap_block0) with a tiny 192-tok budget. Add --chat-scripted-file so the ~2k-char context is a committed fixture (pow_codegen_longprompt.txt) instead of a giant manifest argv; repoint mlx-kakeya-codegen-degen-probe to it.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Repro evidence: single-turn fused decode is TOKEN-IDENTICAL to native greedy (first_divergence_idx=None) and coherent through 1200 tokens, so the engine is faithful: the user's '由于...'/'**/.2/*' collapse is greedy-decoding pathology on code/markdown-heavy prompts that the fused path (pure argmax, unlike chat_mlx_kakeya.py) had no mitigation for. Once a loop starts the drafter trivially predicts the repeats and the greedy verifier accepts them (high accept_len), so it walls indefinitely.

Fix: _trailing_runaway_drop detects a 1..8-token unit repeated >=12x at the tail (conservative; never trims legit lists/enumerations/code) and the three fused loops stop generation, keeping a short clean tail instead of an unbounded wall. Default ON (stop_on_runaway=True); --fused-no-loop-guard disables it for degeneration probes. Adds stopped_on_runaway to the result.

Also: --chat-scripted-file (long prompt as committed fixture) + repoint the codegen-degen probe to a single-turn long prompt that wraps the ring at prefill (cheap; the multi-turn+native variant OOM'd the Mac runner). KAKEYA_KDBG probe instrumentation retained (inert unless the env var is set) for the pending on-device H-C'-vs-H-A' confirmation.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
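
The tail check described in this commit (a 1..8-token unit repeated at least 12 times) could look roughly like the sketch below. This is not the code of _trailing_runaway_drop: the function name, the return convention (number of trailing tokens to drop), and the keep_repeats parameter that decides how much of the loop survives as the "short clean tail" are all assumptions.

```python
from typing import Sequence


def trailing_runaway_drop_sketch(
    tokens: Sequence[int],
    *,
    max_unit: int = 8,      # longest repeating unit considered (from the commit message)
    min_repeats: int = 12,  # repeats required before calling it a runaway (from the commit message)
    keep_repeats: int = 2,  # assumed: copies of the unit left in place after trimming
) -> int:
    """Return how many trailing tokens to drop, or 0 if the tail is not a runaway."""
    n = len(tokens)
    for unit_len in range(1, max_unit + 1):
        if n < unit_len * min_repeats:
            break  # not enough tokens for this unit length, nor for any longer one
        unit = tokens[n - unit_len:]
        repeats = 1
        pos = n - 2 * unit_len
        # Walk backwards one unit at a time while the tail keeps repeating.
        while pos >= 0 and tokens[pos:pos + unit_len] == unit:
            repeats += 1
            pos -= unit_len
        if repeats >= min_repeats:
            return (repeats - keep_repeats) * unit_len
    return 0


# Usage: two normal tokens followed by a twelve-token wall of the same id.
wall = [5, 6] + [7] * 12
drop = trailing_runaway_drop_sketch(wall)
clean = wall[: len(wall) - drop]
```

Requiring many exact repeats of a short unit is what keeps a check like this conservative: ordinary lists and enumerations change at least one token per item, so they never reach the threshold.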

…_lm); add guard-ON validation preset The 'env KAKEYA_KDBG=1 python3' prefix resolved a python3 without mlx_lm on the runner (ModuleNotFoundError). Drop it (KDBG instrumentation is now inert, which is also what we want for the final PR). The native_ref/text/stopped_on_runaway signals in the JSON are sufficient to characterize + validate. Add mlx-kakeya-codegen-guard-validate (guard ON) to prove the clean stop. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…codegen presets

- Remove the KAKEYA_KDBG-gated debug instrumentation (helpers + per-block emission + prefill_state/turn_compare) from fused_specdecode.py and k3_integrated_niah_eval_mac.py. Investigation complete.
- Keep the production fix (runaway-loop guard) + the --chat-scripted-file / --fused-no-loop-guard / --chat-native-ref flags.
- Repoint the two codegen presets to the multi-turn 'explain||code' chat (guard-off probe + guard-on validate), accurate descriptions; drop the now-unused pow_codegen_longprompt.txt fixture.

On-device (Mac M4): across short/long/multi-turn regimes the engine is coherent (fused==native); guard-on and guard-off outputs are byte-identical on the multi-turn code scenario -> the guard is inert on healthy output (no regression) and the systematic degeneration was already resolved by the wrap fix (#146).

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

… prompt Captures the diagnosis+fix for the post-reboot ModuleNotFoundError (mlx_lm) on the kakeya-mac-m4 runner: lightweight env-probe diagnosis, 3-layer fix (pin venv on the runner agent PATH via .path/.env|launchd|systemd; resolve a pinned interpreter in the workflow/executor instead of bare python3; fail-fast import gate), reboot-inclusive verification, and the Cloud-VM-vs-runner distinction (Mac-only deps belong on the runner, not the Linux Cloud Agent env). Includes a ready-to-paste setup-agent prompt; generalized for any Claude/Codex agent. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…eck gate (Layer C)

Reboots can repoint the runner's default python3 to one without mlx_lm, which broke every full-engine preset with a deep ModuleNotFoundError. Make the workload interpreter explicit and verified:
- inference_engine/bridge/runner_python.py (NEW, pure + 100% unit-tested): workload_python_candidates (pin KAKEYA_MAC_PYTHON -> venvs -> PATH), resolve_workload_python (first interpreter that can import mlx_lm; else fallback), preset_requires_gate (mlx-/k3- engine presets, minus env-probe/upgrade), substitute_python, gate_error_message.
- scripts/mac_bridge/run_preset.py: resolve the pinned interpreter, rewrite bare python3 argv0 to it, export KAKEYA_MAC_PYTHON to the subprocess, and FAIL FAST (exit 90 + ::error::) when a gated preset has no mlx_lm-capable interpreter.
- scripts/run_kakeya_mac.sh: honor KAKEYA_MAC_PYTHON; preflight asserts mlx+mlx_lm.

CI enforcement: the resolution/gate logic lives in the unit-tested, 100%-coverage library (runner_python.py), so every PR exercises it on the Linux gate. See docs/skills/pin-selfhosted-runner-python-env-skill.md.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
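
The resolution order and fail-fast gate described here can be sketched as below. The environment variable KAKEYA_MAC_PYTHON, the module mlx_lm, and exit code 90 come from the commit message; every function name, signature, and the substring-based preset matching are hypothetical and may well differ from what runner_python.py actually does.

```python
import shutil
import subprocess
from typing import Callable, Iterable, List, Mapping, Tuple


def candidates_sketch(env: Mapping[str, str], venv_pythons: Iterable[str] = ()) -> List[str]:
    """Candidate order: explicit pin, then known venv interpreters, then PATH."""
    out: List[str] = []
    pin = env.get("KAKEYA_MAC_PYTHON")
    if pin:
        out.append(pin)
    out.extend(venv_pythons)
    on_path = shutil.which("python3", path=env.get("PATH"))
    if on_path:
        out.append(on_path)
    return list(dict.fromkeys(out))  # de-duplicate, preserving order


def can_import(python: str, module: str = "mlx_lm") -> bool:
    """True if `python -c 'import <module>'` exits 0 for this interpreter."""
    try:
        proc = subprocess.run([python, "-c", f"import {module}"],
                              capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


def resolve_sketch(
    candidates: Iterable[str],
    probe: Callable[[str], bool] = can_import,
    fallback: str = "python3",
) -> Tuple[str, bool]:
    """First candidate that passes the import probe; else (fallback, False)."""
    for candidate in candidates:
        if probe(candidate):
            return candidate, True
    return fallback, False


def requires_gate_sketch(preset: str) -> bool:
    """Assumed rule: mlx-/k3- presets are gated, except env-probe/upgrade ones."""
    if "env-probe" in preset or "upgrade" in preset:
        return False
    return preset.startswith(("mlx-", "k3-"))


def gate_exit_code(preset: str, interpreter_ok: bool) -> int:
    """90 when a gated preset has no capable interpreter, else 0."""
    return 90 if requires_gate_sketch(preset) and not interpreter_ok else 0
```

Injecting the probe as a parameter is one way to keep the resolution logic pure, so it can be unit-tested on a Linux CI machine that has no mlx_lm installed.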

…ound variable) scripts/run_kakeya_mac.sh used 'set -u' + a bare "${EXTRA[@]}". macOS's default /bin/bash is 3.2, where expanding an EMPTY array under nounset errors with 'EXTRA[@]: unbound variable' — hit when the launcher is run with no pass-through args (the common interactive case). Use the canonical ${EXTRA[@]+"${EXTRA[@]}"} form (elements if set, nothing if empty, no nounset error). Add mlx-kakeya-launcher-dryrun-bash32 preset to guard it on the real /bin/bash 3.2. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

…ration-fix-2815' into _train2 # Conflicts: # inference_engine/backends/mlx/fused_specdecode.py # inference_engine/bridge/manifest.py # scripts/research/k3_integrated_niah_eval_mac.py # tests/inference_engine/bridge/test_manifest.py Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

cursoragent and others added 16 commits June 17, 2026 15:15

debug(probe): long single-decode A/B (drop native-ref for memory, budget 1100) to reach the ~978-tok collapse onset

f8a7a9a

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

debug(probe): multi-turn (explanation->code) guard-off/on A/B, no native-ref, budget 900 (matches the user's high-accept regime)

85abe81

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Merge remote-tracking branch 'origin/AgentMemory/runner-python-env-pinning-skill-2815' into _train2

6b8320e

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Merge remote-tracking branch 'origin/AgentMemory/mac-launcher-empty-array-fix-2815' into _train2

5bece7b

# Conflicts:
# scripts/run_kakeya_mac.sh

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Merge remote-tracking branch 'origin/AgentMemory/update-mac-full-engine-launcher-2815' into _train2

bc74bf9

# Conflicts:
# inference_engine/bridge/manifest.py
# tests/inference_engine/bridge/test_manifest.py

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

github-actions Bot added the needs-mac-m4 label Jun 18, 2026

cursor Bot merged commit 5c1bc29 into main Jun 18, 2026
8 checks passed

cursor[bot] merged 16 commits into main from AgentMemory/merge-train-148-149-150-154-2815
