
merge-train: #150 + #154 + #148 + #149 onto current main (conflicts resolved, tests green) #155

Merged
cursor[bot] merged 16 commits into main from AgentMemory/merge-train-148-149-150-154-2815 on Jun 18, 2026

Conversation

@FluffyAIcode

Owner

What this is

A verified integration branch that merges the four open Mac PRs — #150 (runner Python pin), #154 (launcher bash-3.2 fix), #148 (launcher 2048 + full preset), #149 (runaway guard + codegen presets) — onto current main (which already has #152/#153 streaming), with all conflicts resolved and tests green. Merge this one PR to land all four at once, or use it as the reference resolution for merging them individually.

Verified merge order

#150 → #154 → #148 → #149

Testing (on the merged result)

  • pytest tests/inference_engine/bridge/ tests/backends/mlx/test_fused_specdecode.py tests/inference_engine/bench/test_k3_report_gate.py → 112 passed
  • bash -n scripts/run_kakeya_mac.sh; combined launcher dry-run (empty EXTRA) → --max-new-tokens 2048, $PYBIN, ${EXTRA[@]+…} all present
  • ✅ Merged fused_specdecode.py loops carry both streaming and the runaway guard; k3_… has streaming + native-ref + guard + _guard defined

If you merge the four PRs individually instead, follow the order/resolutions above. After landing, close this integration PR (and the four originals will show as merged/redundant).


cursoragent and others added 16 commits June 17, 2026 15:15
…preset

run_kakeya_mac.sh:
- Document that long answers are now coherent past the ~1024 native-cache ring
  wrap (PR #146: single-token commits once the sliding RotatingKVCache wraps).
- Raise default --max-new-tokens 1024 -> 2048 (the wrap is no longer a coherence
  cliff; FULL mode just drops the spec-decode speedup past it).
- Refresh help text and FULL-mode mode banner.

bridge: add mlx-kakeya-launcher-full preset (FULL f_θ path, long scripted answer
crossing the wrap, validate_reports) so CI/on-device guards the launcher's full
pipeline + the wrap fix end-to-end; launcher-smoke stays for fast wiring checks.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ive-control probe

KAKEYA_KDBG-gated per-block logging (sampled/committed ids, cyc_frac/cyc_p,
cache offsets) in fused_specdecode_generate, and a turn_compare_fused_vs_native
record (first_divergence_idx + both tails) in _run_fused_chat. New bridge preset
mlx-kakeya-codegen-degen-probe runs the C-code prompt with --chat-native-ref to
decide greedy-pathology vs engine bug. Instrumentation only; reverted after fix.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…efill) + multi-turn degen preset

KAKEYA_KDBG-gated prefill_state_fused / prefill_state_native records in
_run_fused_chat: per-turn prompt_len, evicted_count, rot/full cache offsets,
any_wrapped, would_wrap_block0, plus a turn index on turn_compare. Repoints
mlx-kakeya-codegen-degen-probe to the multi-turn repro (turn-1 PoW explanation
pushes the turn-2 code prompt's prefill past the sliding window) at 1200 tok.
Instrumentation only; reverted after fix.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…t prefill)

Multi-turn+native at 1200x2 OOM'd the Mac runner. Per debug analysis, the
cheapest test of H-C' (long-prompt prefill corrupts logits) vs H-A' (bounded-
greedy pathology) is a single-turn LONG prompt that wraps the ring AT prefill
(would_wrap_block0) with a tiny 192-tok budget. Add --chat-scripted-file so the
~2k-char context is a committed fixture (pow_codegen_longprompt.txt) instead of
a giant manifest argv; repoint mlx-kakeya-codegen-degen-probe to it.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
Repro evidence: single-turn fused decode is TOKEN-IDENTICAL to native greedy
(first_divergence_idx=None) and coherent through 1200 tokens, so the engine is
faithful — the user's '由于...'/'**/.2/*' collapse is greedy-decoding pathology
on code/markdown-heavy prompts that the fused path (pure argmax, unlike
chat_mlx_kakeya.py) had no mitigation for. Once a loop starts, the drafter
trivially predicts the repeats and the greedy verifier accepts them (high
accept_len), so the output repeats indefinitely as a wall of text.
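
For reference, the token-identity check behind first_divergence_idx can be sketched as below. This is a minimal illustration, not the repo's code: the function signature and the handling of a length mismatch (reported as a divergence at the shorter length) are assumptions.

```python
from typing import Optional, Sequence


def first_divergence_idx(fused: Sequence[int], native: Sequence[int]) -> Optional[int]:
    """Return the first index where two token-id sequences differ, or None if identical.

    Hypothetical helper: the real record is produced inside _run_fused_chat and
    may treat unequal lengths differently.
    """
    for i, (a, b) in enumerate(zip(fused, native)):
        if a != b:
            return i
    # One sequence is a strict prefix of the other: count that as a divergence
    # at the end of the shorter one (an assumption of this sketch).
    if len(fused) != len(native):
        return min(len(fused), len(native))
    return None
```

A result of None corresponds to the "TOKEN-IDENTICAL" outcome reported above.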

Fix: _trailing_runaway_drop detects a 1..8-token unit repeated >=12x at the tail
(conservative; never trims legit lists/enumerations/code) and the three fused
loops stop generation, keeping a short clean tail instead of an unbounded wall.
Default ON (stop_on_runaway=True); --fused-no-loop-guard disables it for
degeneration probes. Adds stopped_on_runaway to the result.
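
The tail check described above might look roughly like the following sketch. Only the thresholds (a 1..8-token unit, at least 12 repeats) come from this commit message; the function name without the leading underscore, the return convention (number of trailing tokens to drop) and the keep_repeats parameter are assumptions made for illustration, not the actual _trailing_runaway_drop.

```python
from typing import Sequence

MAX_UNIT = 8      # longest repeating unit considered, in tokens (from the commit message)
MIN_REPEATS = 12  # back-to-back repeats required at the tail before the guard fires


def trailing_runaway_drop(tokens: Sequence[int], keep_repeats: int = 2) -> int:
    """Return how many trailing tokens to drop, or 0 if the tail is not a runaway loop.

    keep_repeats is a hypothetical knob: how many copies of the repeated unit
    stay in the output as the "short clean tail".
    """
    n = len(tokens)
    for unit in range(1, MAX_UNIT + 1):
        if n < unit * MIN_REPEATS:
            break  # longer units would need even more tokens
        tail = list(tokens[n - unit:])
        repeats = 1
        pos = n - 2 * unit
        # Walk backwards one unit at a time while the same unit keeps repeating.
        while pos >= 0 and list(tokens[pos:pos + unit]) == tail:
            repeats += 1
            pos -= unit
        if repeats >= MIN_REPEATS:
            return (repeats - keep_repeats) * unit
    return 0
```

A generation loop could call this after each committed block and stop (setting stopped_on_runaway) when the result is non-zero. The high repeat threshold is what keeps ordinary lists and repetitive code from being trimmed.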

Also: --chat-scripted-file (long prompt as committed fixture) + repoint the
codegen-degen probe to a single-turn long prompt that wraps the ring at prefill
(cheap; the multi-turn+native variant OOM'd the Mac runner). KAKEYA_KDBG probe
instrumentation retained (inert unless the env var is set) for the pending
on-device H-C'-vs-H-A' confirmation.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…_lm); add guard-ON validation preset

The 'env KAKEYA_KDBG=1 python3' prefix resolved a python3 without mlx_lm on the
runner (ModuleNotFoundError). Drop it (KDBG instrumentation is now inert, which
is also what we want for the final PR). The native_ref/text/stopped_on_runaway
signals in the JSON are sufficient to characterize + validate. Add
mlx-kakeya-codegen-guard-validate (guard ON) to prove the clean stop.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…get 1100) to reach the ~978-tok collapse onset

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ive-ref, budget 900 (matches the user's high-accept regime)

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…codegen presets

- Remove the KAKEYA_KDBG-gated debug instrumentation (helpers + per-block
  emission + prefill_state/turn_compare) from fused_specdecode.py and
  k3_integrated_niah_eval_mac.py. Investigation complete.
- Keep the production fix (runaway-loop guard) + the --chat-scripted-file /
  --fused-no-loop-guard / --chat-native-ref flags.
- Repoint the two codegen presets to the multi-turn 'explain||code' chat
  (guard-off probe + guard-on validate), accurate descriptions; drop the now-
  unused pow_codegen_longprompt.txt fixture.

On-device (Mac M4): across short/long/multi-turn regimes the engine is coherent
(fused==native); guard-on and guard-off outputs are byte-identical on the
multi-turn code scenario -> the guard is inert on healthy output (no regression)
and the systematic degeneration was already resolved by the wrap fix (#146).

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
… prompt

Captures the diagnosis+fix for the post-reboot ModuleNotFoundError (mlx_lm) on
the kakeya-mac-m4 runner: lightweight env-probe diagnosis, 3-layer fix (pin
venv on the runner agent PATH via .path/.env|launchd|systemd; resolve a pinned
interpreter in the workflow/executor instead of bare python3; fail-fast import
gate), reboot-inclusive verification, and the Cloud-VM-vs-runner distinction
(Mac-only deps belong on the runner, not the Linux Cloud Agent env). Includes a
ready-to-paste setup-agent prompt; generalized for any Claude/Codex agent.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…eck gate (Layer C)

Reboots can repoint the runner's default python3 to one without mlx_lm, which
broke every full-engine preset with a deep ModuleNotFoundError. Make the
workload interpreter explicit and verified:

- inference_engine/bridge/runner_python.py (NEW, pure + 100% unit-tested):
  workload_python_candidates (pin KAKEYA_MAC_PYTHON -> venvs -> PATH),
  resolve_workload_python (first interpreter that can import mlx_lm; else
  fallback), preset_requires_gate (mlx-/k3- engine presets, minus env-probe/
  upgrade), substitute_python, gate_error_message.
- scripts/mac_bridge/run_preset.py: resolve the pinned interpreter, rewrite bare
  python3 argv0 to it, export KAKEYA_MAC_PYTHON to the subprocess, and FAIL FAST
  (exit 90 + ::error::) when a gated preset has no mlx_lm-capable interpreter.
- scripts/run_kakeya_mac.sh: honor KAKEYA_MAC_PYTHON; preflight asserts mlx+mlx_lm.
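
A rough sketch of the resolution order described above (pin, then venvs, then PATH; first interpreter that can import mlx_lm wins). The function names match the commit message, but the signatures, the venv_pythons argument, the injectable probe and the fallback parameter are assumptions for illustration; the real runner_python.py may differ.

```python
import shutil
import subprocess
from typing import Callable, List, Mapping, Optional, Sequence


def workload_python_candidates(
    env: Mapping[str, str],
    venv_pythons: Sequence[str] = (),
) -> List[str]:
    """Order candidates: explicit KAKEYA_MAC_PYTHON pin, then venv interpreters, then PATH."""
    candidates: List[str] = []
    pinned = env.get("KAKEYA_MAC_PYTHON")
    if pinned:
        candidates.append(pinned)
    candidates.extend(venv_pythons)  # hypothetical: caller supplies known venv paths
    on_path = shutil.which("python3", path=env.get("PATH"))
    if on_path:
        candidates.append(on_path)
    # Drop duplicates while keeping priority order.
    return list(dict.fromkeys(candidates))


def can_import_mlx_lm(python: str) -> bool:
    """Probe an interpreter by asking it to import mlx_lm in a subprocess."""
    try:
        result = subprocess.run(
            [python, "-c", "import mlx_lm"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def resolve_workload_python(
    candidates: Sequence[str],
    probe: Callable[[str], bool] = can_import_mlx_lm,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return the first candidate the probe accepts, else the fallback."""
    for python in candidates:
        if probe(python):
            return python
    return fallback
```

Passing the probe as a parameter keeps the selection logic pure, which is one way such a module can be unit-tested on a Linux gate that has no mlx_lm installed.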

CI enforcement: the resolution/gate logic lives in the unit-tested, 100%-coverage
library (runner_python.py), so every PR exercises it on the Linux gate. See
docs/skills/pin-selfhosted-runner-python-env-skill.md.
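
To illustrate why this logic is easy to cover on a Linux gate, here is a hedged sketch of the pure gate and argv-rewrite helpers. The names preset_requires_gate and substitute_python and the exit code 90 come from the commit message; the matching rule for exempt presets (substring match on "env-probe" and "upgrade"), the plan_run helper and the error text are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

GATE_EXIT_CODE = 90  # exit status the commit message gives for a failed gate


def preset_requires_gate(preset: str) -> bool:
    """Engine presets (mlx-/k3- prefix) need mlx_lm; env-probe/upgrade presets do not."""
    if not preset.startswith(("mlx-", "k3-")):
        return False
    # Assumption: exempt presets are recognisable by these substrings in their name.
    return "env-probe" not in preset and "upgrade" not in preset


def substitute_python(argv: Sequence[str], python: str) -> List[str]:
    """Rewrite a bare `python3` argv[0] to the resolved interpreter; leave other commands alone."""
    if argv and argv[0] == "python3":
        return [python, *argv[1:]]
    return list(argv)


def plan_run(
    preset: str, argv: Sequence[str], python: Optional[str]
) -> Tuple[int, List[str], str]:
    """Hypothetical helper: decide whether to run, and with which argv.

    Returns (exit_code, argv, message). A gated preset with no usable
    interpreter fails fast with a GitHub Actions ::error:: annotation.
    """
    if python is None:
        if preset_requires_gate(preset):
            msg = f"::error::preset {preset} needs an interpreter that can import mlx_lm"
            return GATE_EXIT_CODE, [], msg
        return 0, list(argv), ""
    return 0, substitute_python(argv, python), ""
```

Because none of these functions touch the filesystem or spawn processes, they can be exercised without a Mac runner.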

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ound variable)

scripts/run_kakeya_mac.sh used 'set -u' + a bare "${EXTRA[@]}". macOS's default
/bin/bash is 3.2, where expanding an EMPTY array under nounset errors with
'EXTRA[@]: unbound variable' — hit when the launcher is run with no pass-through
args (the common interactive case). Use the canonical ${EXTRA[@]+"${EXTRA[@]}"}
form (elements if set, nothing if empty, no nounset error). Add
mlx-kakeya-launcher-dryrun-bash32 preset to guard it on the real /bin/bash 3.2.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…nning-skill-2815' into _train2

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…rray-fix-2815' into _train2

# Conflicts:
#	scripts/run_kakeya_mac.sh

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ne-launcher-2815' into _train2

# Conflicts:
#	inference_engine/bridge/manifest.py
#	tests/inference_engine/bridge/test_manifest.py

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…ration-fix-2815' into _train2

# Conflicts:
#	inference_engine/backends/mlx/fused_specdecode.py
#	inference_engine/bridge/manifest.py
#	scripts/research/k3_integrated_niah_eval_mac.py
#	tests/inference_engine/bridge/test_manifest.py

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
@cursor cursor Bot merged commit 5c1bc29 into main Jun 18, 2026
8 checks passed