A systematic methodology for optimizing agent skills using a deep-learning-style approach.
Based on arXiv:2605.23904 by Microsoft, Shanghai Jiao Tong University, Tongji University, and Fudan University.
SkillOpt treats skill documents as trainable parameters and applies bounded, validated, iterative updates, much like gradient descent in deep learning but operating in text space.
| Deep Learning | SkillOpt Analogue |
|---|---|
| Parameter | Skill document (SKILL.md) |
| Gradient | Edit direction from trajectories |
| Learning rate | Edit budget (max edits/step) |
| Validation | Held-out test set |
| Training stability | Rejected-edit buffer |
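The analogy in the table can be sketched as a single bounded, validated update step. This is a minimal illustration, not the paper's implementation: it assumes edits are plain find-and-replace pairs, and the names `Edit`, `OptimizerState`, `step`, and the `score` callback (a stand-in for evaluation on held-out tasks) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Edit:
    """A hypothetical edit: replace the first occurrence of `old` with `new`."""
    old: str
    new: str


@dataclass
class OptimizerState:
    skill: str                                          # the "parameter" (SKILL.md text)
    rejected: List[Edit] = field(default_factory=list)  # rejected-edit buffer


def step(state: OptimizerState, proposals: List[Edit], budget: int,
         score: Callable[[str], float]) -> bool:
    """Apply at most `budget` edits; keep them only if the held-out score improves."""
    baseline = score(state.skill)
    candidate = state.skill
    applied: List[Edit] = []
    for edit in proposals:
        if len(applied) >= budget:          # edit budget plays the role of a learning rate
            break
        if edit in state.rejected or edit.old not in candidate:
            continue                        # skip edits that already failed or cannot apply
        candidate = candidate.replace(edit.old, edit.new, 1)
        applied.append(edit)
    if applied and score(candidate) > baseline:
        state.skill = candidate             # validated: accept the update
        return True
    state.rejected.extend(applied)          # remember what did not help
    return False
```

Rejecting the whole batch when validation fails is a simplification chosen for brevity; a finer-grained variant could test edits one at a time.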
✅ Use when:
- Skill has inconsistent performance
- You have execution feedback (success/failure data)
- Manual edits aren't improving results
- You need to prevent regressions during updates
❌ Skip when:
- No feedback data available
- Skill is already performing well
- One-off tasks
- Collect feedback: Run 20-40 tasks with current skill, record outcomes
- Separate: Split into successes and failures
- Analyze: Feed failures → analyst_error.md, successes → analyst_success.md
- Merge: Use merge prompts to combine proposals
- Apply: Limit to 3-5 edits per iteration
- Validate: Test on held-out cases before accepting
- Iterate: Repeat until performance stabilizes
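The first two steps and the held-out validation set can be prepared together. The sketch below is one possible way to do it, under stated assumptions: each feedback record is a dict with `task` and `success` keys, and 25% of records are reserved for validation. The record format, the `split_feedback` name, and the `holdout_fraction` value are illustrative, not part of this skill.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]  # assumed shape: {"task": str, "success": bool}


def split_feedback(records: List[Record], holdout_fraction: float = 0.25,
                   seed: int = 0) -> Tuple[List[Record], List[Record], List[Record]]:
    """Reserve a held-out set, then split the rest into failures and successes."""
    shuffled = records[:]                       # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)       # seeded for reproducible splits
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    holdout, train = shuffled[:n_holdout], shuffled[n_holdout:]
    failures = [r for r in train if not r["success"]]   # input for analyst_error.md
    successes = [r for r in train if r["success"]]      # input for analyst_success.md
    return failures, successes, holdout


# Example: 40 recorded runs, every fourth one a failure.
runs = [{"task": f"task-{i}", "success": i % 4 != 0} for i in range(40)]
failures, successes, holdout = split_feedback(runs)
```

Reserving the held-out cases before any analysis keeps the validation step from being biased by the trajectories that produced the edits.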
skillopt-methodology-skill/
├── SKILL.md                      # Main skill definition
├── README.md                     # This file
└── references/
    ├── methodology.md            # Complete methodology guide
    ├── design-principles.md      # Core design principles
    └── prompts/
        ├── analyst_error.md      # Failure analysis prompt
        ├── analyst_success.md    # Success analysis prompt
        ├── merge_failure.md      # Merge failure proposals
        ├── merge_success.md      # Merge success proposals
        ├── merge_final.md        # Final merge (failure-priority)
        ├── ranking.md            # Rank and select top edits
        ├── slow_update.md        # Epoch-wise slow update
        └── meta_skill.md         # Optimizer memory
| Parameter | Default | Description |
|---|---|---|
| Epochs (E) | 4 | Number of optimization epochs |
| Rollout batch (B) | 40 | Tasks per rollout batch |
| Minibatch size (B_m) | 8 | Trajectories per reflection minibatch |
| Edit budget (L_t) | 4 | Max edits per step (cosine decay) |
| Min learning rate | 2 | Minimum edit budget |
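The defaults imply an edit budget that starts at 4 and decays toward a floor of 2. The exact schedule is not spelled out here, so the sketch below assumes a standard half-cosine decay and one optimizer step per minibatch (40 / 8 = 5 steps per epoch, 20 steps over 4 epochs). The function name `edit_budget` and the rounding rule are assumptions made for illustration.

```python
import math


def edit_budget(step: int, total_steps: int,
                max_budget: int = 4, min_budget: int = 2) -> int:
    """Cosine-decayed edit budget: max_budget at step 0, min_budget at the last step."""
    if total_steps <= 1:
        return max_budget
    # Clamp the step index, then map it to a progress value in [0, 1].
    progress = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # goes from 1 down to 0
    return round(min_budget + (max_budget - min_budget) * cosine)


# Assumed layout: E = 4 epochs, B = 40, B_m = 8, so 5 steps per epoch.
EPOCHS, ROLLOUT_BATCH, MINIBATCH = 4, 40, 8
TOTAL_STEPS = EPOCHS * (ROLLOUT_BATCH // MINIBATCH)        # 20
schedule = [edit_budget(t, TOTAL_STEPS) for t in range(TOTAL_STEPS)]
```

Early steps allow larger rewrites while the skill is far from good; later steps are restricted to small, low-risk edits.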
- 52 evaluation cells across 6 benchmarks, 7 models, and 3 harnesses
- Best or tied-best on ALL 52 cells
- GPT-5.5 improvements: +23.5 (chat), +24.8 (Codex), +19.1 (Claude Code)
- vs. strongest baseline: +5.4 points average
MIT
@article{skillopt2026,
title={SkillOpt: Controllable Text-Space Optimization for Agent Skills},
author={Microsoft Research and Shanghai Jiao Tong University and Tongji University and Fudan University},
journal={arXiv preprint arXiv:2605.23904},
year={2026}
}