Experiment with proposer KV full-attn restoration on Mac #108
Draft
FluffyAIcode wants to merge 1 commit into
Adds a disabled S5 experiment that builds full-attention restored K/V from proposer K/V via f_theta instead of an extra verifier capture forward. The Mac ctx70 evidence shows the fixed build cost drops to about 2s, but recall regresses to 0/1 and decode slows, so this is evaluation evidence for retraining/next-step design rather than a merge-ready optimization.

Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
- Adds a disabled experiment flag, `--s5-f-theta-restored-full-attn`, that builds full-attention restored K/V from proposer K/V via `f_theta` instead of running an extra verifier prompt forward.
- On Mac ctx70, the `build_restoration_s` cost drops to about 2s.
- `f_theta_v5_s5_sliding` was trained with S5 full-attn layers excluded, and the experiment regresses recall to 0/1 while slowing attach/decode.

Evaluation
- `python3 -m py_compile scripts/research/k3_integrated_niah_eval_mac.py`
- `pytest -q tests/inference_engine/v04/test_f_theta.py tests/backends/mlx/test_cache.py`
- `build_restoration_s=1.962`, `prefill_attach_s=24.096`, `decode_s=42.455`, `recall_cross_model=0.0`, `recall_oracle=1.0`
- `build_restoration_s=2.154`, `prefill_attach_s=21.971`, `decode_s=27.862`, `recall_cross_model=0.0`, `recall_oracle=1.0`

Interpretation
This confirms the architectural diagnosis: avoiding the extra verifier capture forward can remove most fixed restoration-build latency. It does not produce a usable optimization with the current checkpoint because full-attn f_theta restoration is not trained well enough for NIAH recall and the bad restored cache degrades decode.
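To put the reported timings in proportion, here is a minimal sketch that parses the two metric lines from the Evaluation section and computes each phase's share of the summed phase timings. The helper names `parse_metrics` and `phase_shares` are hypothetical and are not part of the repository; only the metric names and values are taken from this PR. The sum covers the three reported phases only and may not equal end-to-end wall-clock time.

```python
from __future__ import annotations

# Metric lines copied from the Evaluation section of this PR.
RUNS = [
    "build_restoration_s=1.962,prefill_attach_s=24.096,decode_s=42.455,"
    "recall_cross_model=0.0,recall_oracle=1.0",
    "build_restoration_s=2.154,prefill_attach_s=21.971,decode_s=27.862,"
    "recall_cross_model=0.0,recall_oracle=1.0",
]


def parse_metrics(line: str) -> dict[str, float]:
    """Parse a 'key=value,key=value' string into a dict of floats."""
    pairs = (item.split("=", 1) for item in line.split(","))
    return {key.strip(): float(value) for key, value in pairs}


def phase_shares(metrics: dict[str, float]) -> dict[str, float]:
    """Return each timing metric (keys ending in '_s') as a fraction of their sum."""
    timed = {k: v for k, v in metrics.items() if k.endswith("_s")}
    total = sum(timed.values())
    return {k: v / total for k, v in timed.items()}


if __name__ == "__main__":
    for line in RUNS:
        metrics = parse_metrics(line)
        shares = phase_shares(metrics)
        summary = ", ".join(f"{k}={v:.1%}" for k, v in shares.items())
        print(f"{summary} | recall_cross_model={metrics['recall_cross_model']}")
```

For the two runs listed, `build_restoration_s` comes to roughly 3% and 4% of the summed phase timings, so nearly all of the measured time is in attach and decode.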
Next Step
Train or fine-tune an f_theta checkpoint that includes full-attn S5 layers, then rerun this same flag as the latency/recall gate.
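The latency/recall gate could be expressed as a small check over the same metric names. This is a sketch under stated assumptions: `passes_gate`, `GateResult`, the `max_build_s` budget, and the optional `baseline_decode_s` comparison are all illustrative and do not come from the PR. The rule that cross-model recall must match oracle recall is likewise an assumption about what a passing rerun would look like.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reasons: tuple  # human-readable failure reasons; empty when passed


def passes_gate(
    metrics: dict,
    max_build_s: float = 3.0,                   # hypothetical build-latency budget
    baseline_decode_s: Optional[float] = None,  # hypothetical baseline, if one is measured
) -> GateResult:
    """Check one eval run against an assumed latency/recall gate."""
    reasons = []
    if metrics["build_restoration_s"] > max_build_s:
        reasons.append("build_restoration_s exceeds the build budget")
    # Assumed rule: restored-cache recall should match the oracle recall.
    if metrics["recall_cross_model"] < metrics["recall_oracle"]:
        reasons.append("recall_cross_model is below recall_oracle")
    if baseline_decode_s is not None and metrics["decode_s"] > baseline_decode_s:
        reasons.append("decode_s is slower than the supplied baseline")
    return GateResult(passed=not reasons, reasons=tuple(reasons))


if __name__ == "__main__":
    # Values from the first run reported in this PR.
    run = {
        "build_restoration_s": 1.962,
        "prefill_attach_s": 24.096,
        "decode_s": 42.455,
        "recall_cross_model": 0.0,
        "recall_oracle": 1.0,
    }
    print(passes_gate(run))
```

With the values reported here, the build-latency check passes under the assumed 3s budget while the recall check fails, which matches the conclusion that the current checkpoint is not merge-ready.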
Made with Cursor