merge-train: #150 + #154 + #148 + #149 onto current main (conflicts resolved, tests green)#155
Merged
Merged
Conversation
…preset run_kakeya_mac.sh: - Document that long answers are now coherent past the ~1024 native-cache ring wrap (PR #146: single-token commits once the sliding RotatingKVCache wraps). - Raise default --max-new-tokens 1024 -> 2048 (the wrap is no longer a coherence cliff; FULL mode just drops the spec-decode speedup past it). - Refresh help text and FULL-mode mode banner. bridge: add mlx-kakeya-launcher-full preset (FULL f_θ path, long scripted answer crossing the wrap, validate_reports) so CI/on-device guards the launcher's full pipeline + the wrap fix end-to-end; launcher-smoke stays for fast wiring checks. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ive-control probe KAKEYA_KDBG-gated per-block logging (sampled/committed ids, cyc_frac/cyc_p, cache offsets) in fused_specdecode_generate, and a turn_compare_fused_vs_native record (first_divergence_idx + both tails) in _run_fused_chat. New bridge preset mlx-kakeya-codegen-degen-probe runs the C-code prompt with --chat-native-ref to decide greedy-pathology vs engine bug. Instrumentation only; reverted after fix. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…efill) + multi-turn degen preset KAKEYA_KDBG-gated prefill_state_fused / prefill_state_native records in _run_fused_chat: per-turn prompt_len, evicted_count, rot/full cache offsets, any_wrapped, would_wrap_block0, plus a turn index on turn_compare. Repoints mlx-kakeya-codegen-degen-probe to the multi-turn repro (turn-1 PoW explanation pushes the turn-2 code prompt's prefill past the sliding window) at 1200 tok. Instrumentation only; reverted after fix. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…t prefill) Multi-turn+native at 1200x2 OOM'd the Mac runner. Per debug analysis, the cheapest test of H-C' (long-prompt prefill corrupts logits) vs H-A' (bounded- greedy pathology) is a single-turn LONG prompt that wraps the ring AT prefill (would_wrap_block0) with a tiny 192-tok budget. Add --chat-scripted-file so the ~2k-char context is a committed fixture (pow_codegen_longprompt.txt) instead of a giant manifest argv; repoint mlx-kakeya-codegen-degen-probe to it. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
Repro evidence: single-turn fused decode is TOKEN-IDENTICAL to native greedy (first_divergence_idx=None) and coherent through 1200 tokens, so the engine is faithful — the user's '由于...'/'**/.2/*' collapse is greedy-decoding pathology on code/markdown-heavy prompts that the fused path (pure argmax, unlike chat_mlx_kakeya.py) had no mitigation for. Once a loop starts the drafter trivially predicts the repeats and the greedy verifier accepts them (high accept_len), so it walls indefinitely. Fix: _trailing_runaway_drop detects a 1..8-token unit repeated >=12x at the tail (conservative; never trims legit lists/enumerations/code) and the three fused loops stop generation, keeping a short clean tail instead of an unbounded wall. Default ON (stop_on_runaway=True); --fused-no-loop-guard disables it for degeneration probes. Adds stopped_on_runaway to the result. Also: --chat-scripted-file (long prompt as committed fixture) + repoint the codegen-degen probe to a single-turn long prompt that wraps the ring at prefill (cheap; the multi-turn+native variant OOM'd the Mac runner). KAKEYA_KDBG probe instrumentation retained (inert unless the env var is set) for the pending on-device H-C'-vs-H-A' confirmation. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…_lm); add guard-ON validation preset The 'env KAKEYA_KDBG=1 python3' prefix resolved a python3 without mlx_lm on the runner (ModuleNotFoundError). Drop it (KDBG instrumentation is now inert, which is also what we want for the final PR). The native_ref/text/stopped_on_runaway signals in the JSON are sufficient to characterize + validate. Add mlx-kakeya-codegen-guard-validate (guard ON) to prove the clean stop. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…get 1100) to reach the ~978-tok collapse onset Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ive-ref, budget 900 (matches the user's high-accept regime) Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…codegen presets - Remove the KAKEYA_KDBG-gated debug instrumentation (helpers + per-block emission + prefill_state/turn_compare) from fused_specdecode.py and k3_integrated_niah_eval_mac.py. Investigation complete. - Keep the production fix (runaway-loop guard) + the --chat-scripted-file / --fused-no-loop-guard / --chat-native-ref flags. - Repoint the two codegen presets to the multi-turn 'explain||code' chat (guard-off probe + guard-on validate), accurate descriptions; drop the now- unused pow_codegen_longprompt.txt fixture. On-device (Mac M4): across short/long/multi-turn regimes the engine is coherent (fused==native); guard-on and guard-off outputs are byte-identical on the multi-turn code scenario -> the guard is inert on healthy output (no regression) and the systematic degeneration was already resolved by the wrap fix (#146). Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
… prompt Captures the diagnosis+fix for the post-reboot ModuleNotFoundError (mlx_lm) on the kakeya-mac-m4 runner: lightweight env-probe diagnosis, 3-layer fix (pin venv on the runner agent PATH via .path/.env|launchd|systemd; resolve a pinned interpreter in the workflow/executor instead of bare python3; fail-fast import gate), reboot-inclusive verification, and the Cloud-VM-vs-runner distinction (Mac-only deps belong on the runner, not the Linux Cloud Agent env). Includes a ready-to-paste setup-agent prompt; generalized for any Claude/Codex agent. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…eck gate (Layer C) Reboots can repoint the runner's default python3 to one without mlx_lm, which broke every full-engine preset with a deep ModuleNotFoundError. Make the workload interpreter explicit and verified: - inference_engine/bridge/runner_python.py (NEW, pure + 100% unit-tested): workload_python_candidates (pin KAKEYA_MAC_PYTHON -> venvs -> PATH), resolve_workload_python (first interpreter that can import mlx_lm; else fallback), preset_requires_gate (mlx-/k3- engine presets, minus env-probe/ upgrade), substitute_python, gate_error_message. - scripts/mac_bridge/run_preset.py: resolve the pinned interpreter, rewrite bare python3 argv0 to it, export KAKEYA_MAC_PYTHON to the subprocess, and FAIL FAST (exit 90 + ::error::) when a gated preset has no mlx_lm-capable interpreter. - scripts/run_kakeya_mac.sh: honor KAKEYA_MAC_PYTHON; preflight asserts mlx+mlx_lm. CI enforcement: the resolution/gate logic lives in the unit-tested, 100%-coverage library (runner_python.py), so every PR exercises it on the Linux gate. See docs/skills/pin-selfhosted-runner-python-env-skill.md. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ound variable)
scripts/run_kakeya_mac.sh used 'set -u' + a bare "${EXTRA[@]}". macOS's default
/bin/bash is 3.2, where expanding an EMPTY array under nounset errors with
'EXTRA[@]: unbound variable' — hit when the launcher is run with no pass-through
args (the common interactive case). Use the canonical ${EXTRA[@]+"${EXTRA[@]}"}
form (elements if set, nothing if empty, no nounset error). Add
mlx-kakeya-launcher-dryrun-bash32 preset to guard it on the real /bin/bash 3.2.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…nning-skill-2815' into _train2 Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…rray-fix-2815' into _train2 # Conflicts: # scripts/run_kakeya_mac.sh Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ne-launcher-2815' into _train2 # Conflicts: # inference_engine/bridge/manifest.py # tests/inference_engine/bridge/test_manifest.py Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ration-fix-2815' into _train2 # Conflicts: # inference_engine/backends/mlx/fused_specdecode.py # inference_engine/bridge/manifest.py # scripts/research/k3_integrated_niah_eval_mac.py # tests/inference_engine/bridge/test_manifest.py Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this is
A verified integration branch that merges the four open Mac PRs — #150 (runner Python pin), #154 (launcher bash-3.2 fix), #148 (launcher 2048 + full preset), #149 (runaway guard + codegen presets) — onto current
main(which already has #152/#153 streaming), with all conflicts resolved and tests green. Merge this one PR to land all four at once, or use it as the reference resolution for merging them individually.Verified merge order
#150 → #154 → #148 → #149runner_python.py/run_preset.py/docs + launcherPYBIN; main's launcher was untouched by fix(mac-chat): stream tokens live so the CLI doesn't look frozen on long answers #152/fix(mac-chat): actually stop [stream] lines interleaving (the part #152 squash-dropped) #153).run_kakeya_mac.shcmd=(…)line (feat(mac-bridge): pin workload Python interpreter + import gate (Layers B/C) + reusable skill #150's$PYBINvs fix(mac-launcher): bash 3.2-safe empty-array expansion (fixes "EXTRA[@]: unbound variable") #154's${EXTRA[@]+…}). Resolution = union:cmd=( "$PYBIN" … "${args[@]}" ${EXTRA[@]+"${EXTRA[@]}"} ).manifest.py+test_manifest.py(preset additions at the same anchor). Resolution = keep both presets (launcher-dryrun-bash32+launcher-full);run_kakeya_mac.shauto-merged (2048 default).fused_specdecode.py(×3 signatures): union both params →on_commit=…, stop_on_runaway=True. Loop bodies auto-merged so each loop has both_emit(on_commit, …)(streaming) and the_trailing_runaway_dropguard.k3_integrated_niah_eval_mac.py(×4): union — keep--chat-stream-stdout(streaming) +--chat-scripted-file/--fused-no-loop-guard(fix(mlx-fused): runaway-loop guard for greedy markdown-marker collapse on code prompts #149) args; each generate call gets bothon_commit=on_commit, stop_on_runaway=_guard.manifest.py/test_manifest.py: keep all presets (chat-stream-probe+codegen-degen-probe+codegen-guard-validate+ the launcher ones).Testing (on the merged result)
pytest tests/inference_engine/bridge/ tests/backends/mlx/test_fused_specdecode.py tests/inference_engine/bench/test_k3_report_gate.py→ 112 passedbash -n scripts/run_kakeya_mac.sh; combined launcher dry-run (emptyEXTRA) →--max-new-tokens 2048,$PYBIN,${EXTRA[@]+…}all presentfused_specdecode.pyloops carry both streaming and the runaway guard;k3_…has streaming + native-ref + guard +_guarddefined