
Native tool-call SFT + eval, transcript-distillation trajectories, local-serve + Codex skill sync #115

Open
ProfSynapse wants to merge 1 commit into main from feature/native-toolcall-sft-eval


@ProfSynapse

Owner

What's included

  • SFT trainer: config-driven loss_mask_mode and tool_call_mode options (assistant_only / native), plus a single-shard load fix that skips split-count verification when only one shard is loaded.
  • Transcript distillation: native tool-call trajectory emission, plus claude.ai export and Codex CLI adapters that mine local agent transcripts into SFT/KTO rows.
  • Evaluator: prompt-set support, per-case verifier dispatch, and Qwen XML tool-call response parsing.
  • tuner: local-serve vLLM handler (tuner/handlers/local_serve_handler.py) and a Docker utility module (tuner/utils/docker.py).
  • Skills: .codex/skills sync mirror added alongside .skills / .agents/skills / .claude/skills, plus .gitignore hygiene and a configs/transcript_import/default.yaml import config.
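
Assistant-only loss masking is commonly implemented by setting the label of every non-assistant token to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those positions contribute nothing to the loss. The following is a minimal sketch of that idea. It assumes the conversation is already tokenized into hypothetical `(role, token_ids)` segments. `build_labels` and the `"all"` mode are illustrative names, not the trainer's actual API.

```python
from typing import List, Tuple

# PyTorch's CrossEntropyLoss skips targets equal to ignore_index (default -100).
IGNORE_INDEX = -100


def build_labels(
    segments: List[Tuple[str, List[int]]],
    loss_mask_mode: str = "assistant_only",
) -> Tuple[List[int], List[int]]:
    """Concatenate (role, token_ids) segments into input_ids and labels.

    "assistant_only": only assistant tokens keep their ids as labels.
    "all" (hypothetical): every token is a training target.
    """
    if loss_mask_mode not in ("assistant_only", "all"):
        raise ValueError(f"unknown loss_mask_mode: {loss_mask_mode!r}")
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if loss_mask_mode == "assistant_only" and role != "assistant":
            # Mask system/user/tool tokens so they add no loss.
            labels.extend([IGNORE_INDEX] * len(token_ids))
        else:
            labels.extend(token_ids)
    return input_ids, labels


# Example: user and tool turns are masked, assistant turns are kept.
ids, labels = build_labels(
    [("user", [1, 2, 3]), ("assistant", [4, 5]), ("tool", [6]), ("assistant", [7])]
)
print(labels)  # [-100, -100, -100, 4, 5, -100, 7]
```

In this sketch, tool results (role `"tool"`) are masked like user turns. Assistant turns stay trainable, including any tool-call tokens the assistant emits.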

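Qwen2.5-style chat templates wrap each tool call in `<tool_call>...</tool_call>` tags around a JSON object with `"name"` and `"arguments"` keys. The following is a minimal parsing sketch that assumes this Hermes-style format. Some Qwen variants emit nested XML parameters instead, which this sketch does not handle. `parse_qwen_tool_calls` is a hypothetical name, not the evaluator's actual function.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Matches each <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_qwen_tool_calls(text: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Split a completion into (remaining_text, tool_calls).

    Blocks whose body is not a JSON object with a "name" key are skipped.
    """
    calls: List[Dict[str, Any]] = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            continue  # malformed call; a real evaluator might score this as a failure
        if isinstance(payload, dict) and "name" in payload:
            calls.append({"name": payload["name"], "arguments": payload.get("arguments", {})})
    # Whatever is left outside the tool-call blocks is the plain-text reply.
    remaining = TOOL_CALL_RE.sub("", text).strip()
    return remaining, calls


completion = (
    "Checking.\n<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
text, calls = parse_qwen_tool_calls(completion)
print(text)   # Checking.
print(calls)  # [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```
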
What's intentionally NOT included

  • Tools/ holdout-materialization and aggregator scripts, plus their tests (tests/tools/, tests/evaluator/): this is personal-model glue that lived on the feature branch and is out of scope for this generic PR.
  • The runner-dispatch test is left out because it imports the personal aggregator.

Testing

New tests added under:

  • tests/trainers/sft/ — loss-mask modes, tool-call mode data loader, native tool calls, preprocessing contract, chat-template kwargs passthrough, multi-split data loader.
  • tests/scripts/test_trajectory_emit.py — transcript trajectory emit (36 passing).
  • tests/shared/ — verifier extraction and verifier dispatch.
  • tests/cloud/test_training_capacity.py — training capacity.

🤖 Generated with Claude Code

…cal-serve + Codex skill sync

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>