Experiment with proposer KV full-attn restoration on Mac #108

Draft
FluffyAIcode wants to merge 1 commit into AgentMemory/v04-pr-k3-block-c-f-theta-v2-trainer-fix-recall-8e7f from AgentMemory/k3-proposer-kv-restoration-fastpath-8e7f

@FluffyAIcode

Owner

Summary

  • Adds an off-by-default Mac S5 experiment flag, --s5-f-theta-restored-full-attn, that builds full-attention restored K/V from proposer K/V via f_theta instead of running an extra verifier prompt forward.
  • Captures ctx70 Mac evidence with KL ON and KL OFF, showing that the fixed restoration-build cost (build_restoration_s) drops to about 2 s.
  • Marked as draft because the current f_theta_v5_s5_sliding checkpoint was trained with the S5 full-attn layers excluded; with that checkpoint, the experiment regresses cross-model recall to 0/1 and slows prefill attach and decode.
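The PR does not show how the flag is declared in scripts/research/k3_integrated_niah_eval_mac.py. Assuming the script uses Python's argparse, an off-by-default switch could be sketched as follows; the parser description and help text are hypothetical, and only the flag name comes from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag name is taken from the PR description.
    parser = argparse.ArgumentParser(description="K3 integrated NIAH eval (Mac), sketch")
    parser.add_argument(
        "--s5-f-theta-restored-full-attn",
        action="store_true",  # defaults to False, so the experiment stays off unless passed
        help="Build full-attn restored K/V from proposer K/V via f_theta "
             "instead of an extra verifier prompt forward.",
    )
    return parser


# argparse turns dashes into underscores for the attribute name.
print(build_parser().parse_args([]).s5_f_theta_restored_full_attn)
print(build_parser().parse_args(["--s5-f-theta-restored-full-attn"]).s5_f_theta_restored_full_attn)
```

Because store_true defaults to False, existing runs keep their current restoration path unless the flag is passed explicitly.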

Evaluation

  • Compile check: python3 -m py_compile scripts/research/k3_integrated_niah_eval_mac.py
  • Unit tests: pytest -q tests/inference_engine/v04/test_f_theta.py tests/backends/mlx/test_cache.py
  • KL ON ctx70: build_restoration_s=1.962, prefill_attach_s=24.096, decode_s=42.455, recall_cross_model=0.0, recall_oracle=1.0
  • KL OFF ctx70: build_restoration_s=2.154, prefill_attach_s=21.971, decode_s=27.862, recall_cross_model=0.0, recall_oracle=1.0
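To compare the two runs side by side, the metric lines above can be parsed and diffed. This is a small illustrative helper, not part of the PR. Summing the three timing fields into an end-to-end figure assumes the phases run back to back without overlap, which the PR does not state.

```python
from typing import Dict


def parse_metrics(line: str) -> Dict[str, float]:
    """Parse 'name=value, name=value' pairs into a dict of floats."""
    metrics = {}
    for part in line.split(","):
        key, _, value = part.strip().partition("=")
        metrics[key] = float(value)
    return metrics


# Metric lines copied from the Evaluation section (ctx70, Mac).
KL_ON = ("build_restoration_s=1.962, prefill_attach_s=24.096, decode_s=42.455, "
         "recall_cross_model=0.0, recall_oracle=1.0")
KL_OFF = ("build_restoration_s=2.154, prefill_attach_s=21.971, decode_s=27.862, "
          "recall_cross_model=0.0, recall_oracle=1.0")

TIMING_KEYS = ("build_restoration_s", "prefill_attach_s", "decode_s")


def total_seconds(metrics: Dict[str, float]) -> float:
    # Assumption: the three phases are sequential and do not overlap.
    return sum(metrics[k] for k in TIMING_KEYS)


on, off = parse_metrics(KL_ON), parse_metrics(KL_OFF)
for key in TIMING_KEYS:
    print(f"{key}: ON {on[key]:.3f}s  OFF {off[key]:.3f}s  ON-OFF {on[key] - off[key]:+.3f}s")
print(f"total: ON {total_seconds(on):.3f}s  OFF {total_seconds(off):.3f}s")
# total: ON 68.513s  OFF 51.987s
```

Most of the KL ON vs KL OFF gap comes from decode (about 14.6 s), while the restoration build itself differs by under 0.2 s.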

Interpretation

This confirms the architectural diagnosis: skipping the extra verifier capture forward removes most of the fixed restoration-build latency. It does not yet yield a usable optimization with the current checkpoint, because its f_theta is not trained well enough on the full-attn layers to preserve NIAH recall, and the poor restored cache also degrades decode.
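The diagnosis can be illustrated with a toy model of the fast path. The sketch assumes, purely for illustration, that f_theta is a per-layer pair of linear maps (W_k, W_v) applied token by token to proposer K/V, and that a hypothetical layer_map pairs each verifier full-attn layer with a proposer layer; the real f_theta_v5_s5_sliding architecture may differ. What it shows is that restoration needs no verifier prompt forward, only per-token mappings of K/V the proposer already produced.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]                         # row-major: out_dim rows, in_dim columns
KV = Tuple[List[List[float]], List[List[float]]]   # (keys, values), one row per token


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    # Plain matrix-vector product; a real backend would batch this on the accelerator.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def restore_full_attn_kv(
    proposer_kv: Dict[int, KV],
    f_theta: Dict[int, Tuple[Matrix, Matrix]],
    layer_map: Dict[int, int],
) -> Dict[int, KV]:
    """Hypothetical sketch: build restored K/V for each verifier full-attn layer.

    layer_map maps a verifier layer index to the proposer layer that feeds it;
    f_theta holds hypothetical per-verifier-layer (W_k, W_v) linear maps.
    No verifier forward over the prompt is run: each token is mapped independently.
    """
    restored = {}
    for v_layer, p_layer in layer_map.items():
        keys, values = proposer_kv[p_layer]
        w_k, w_v = f_theta[v_layer]
        restored[v_layer] = ([matvec(w_k, k) for k in keys], [matvec(w_v, v) for v in values])
    return restored


# Toy usage: one proposer layer (0), two tokens with head dim 2,
# mapped into a hypothetical verifier full-attn layer 5 with head dim 3.
proposer_kv = {0: ([[1.0, 0.0], [0.0, 1.0]], [[2.0, 3.0], [4.0, 5.0]])}
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
restored = restore_full_attn_kv(proposer_kv, {5: (w, w)}, layer_map={5: 0})
print(restored[5])
# ([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], [[2.0, 3.0, 5.0], [4.0, 5.0, 9.0]])
```

The same toy also shows the failure mode: the fast path is only as good as the maps, so if a layer's weights were not trained for that layer (as with the full-attn layers in the current checkpoint), the restored cache is cheap to build but wrong.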

Next Step

Train or fine-tune an f_theta checkpoint that includes the S5 full-attn layers, then rerun with this same flag as the latency/recall gate.
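This PR does not define the latency/recall gate numerically. One hypothetical way to encode it is to pass a run only if cross-model recall matches the oracle, the restoration build fits a budget, and decode is no slower than a baseline run. The function name, max_build_s, and baseline_decode_s are illustrative placeholders to be filled from real baseline measurements, not values from this PR.

```python
from typing import Dict, List, Tuple


def passes_gate(
    metrics: Dict[str, float],
    *,
    max_build_s: float,
    baseline_decode_s: float,
) -> Tuple[bool, List[str]]:
    """Hypothetical latency/recall gate for one eval run; returns (passed, failure reasons)."""
    reasons = []
    # Recall: restored-cache (cross-model) recall must not fall below the oracle.
    if metrics["recall_cross_model"] < metrics["recall_oracle"]:
        reasons.append("recall below oracle")
    # Latency: the fixed restoration-build cost must fit the budget.
    if metrics["build_restoration_s"] > max_build_s:
        reasons.append("restoration build over budget")
    # Latency: a bad restored cache must not slow decode relative to the baseline.
    if metrics["decode_s"] > baseline_decode_s:
        reasons.append("decode slower than baseline")
    return (not reasons, reasons)


# KL ON ctx70 numbers from this PR; both thresholds are placeholders, not PR values.
kl_on = {"build_restoration_s": 1.962, "prefill_attach_s": 24.096, "decode_s": 42.455,
         "recall_cross_model": 0.0, "recall_oracle": 1.0}
passed, reasons = passes_gate(kl_on, max_build_s=3.0, baseline_decode_s=30.0)
print(passed, reasons)
# False ['recall below oracle', 'decode slower than baseline']
```

Returning the failure reasons alongside the boolean makes it clear whether a rerun failed on recall, on build latency, or on decode.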

Made with Cursor

Adds a disabled S5 experiment that builds full-attention restored K/V from proposer K/V via f_theta instead of an extra verifier capture forward. The Mac ctx70 evidence shows the fixed build cost drops to about 2s, but recall regresses to 0/1 and decode slows, so this is evaluation evidence for retraining/next-step design rather than a merge-ready optimization.

Co-authored-by: Cursor <cursoragent@cursor.com>