GitHub - op12no2/patchwork: An informal cumulative and competitive frontier model eval using a JavaScript chess engine

Patchwork

An informal cumulative and competitive frontier model eval using a JavaScript chess engine.

Procedure

Assume A is the current leading engine (initially 0000_original). A model/CLI pair is selected to improve it by creating a new engine B via prompt.md. If B passes an SPRT against A, B becomes the new leader; otherwise A stays the leader and the next attempt again starts from A. For example, 0002_sonnet_4_6 was derived from 0000_original, not 0001_haiku_4_5, because 0001_haiku_4_5 failed its SPRT.

    /---> 0001          /---> 0004
0000 ---> 0002 ---> 0003 ---> 0005 ---> 0006 etc.

See bin/sprt.
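
For readers unfamiliar with SPRT, the pass/fail decision can be sketched roughly as follows. This is a minimal illustration of the common trinomial log-likelihood-ratio approximation, not the code in bin/sprt or in fastchess. The function names and the default values of `elo0`, `elo1`, `alpha` and `beta` are assumptions chosen for the example; the settings actually used in this repo are not stated here.

```javascript
// Expected score for an Elo difference under the logistic model.
function expectedScore(elo) {
  return 1 / (1 + Math.pow(10, -elo / 400));
}

// Hypothetical helper: decide a B v A match from B's point of view.
// H0: B is elo0 stronger than A; H1: B is elo1 stronger than A.
// The defaults below are illustrative assumptions only.
function sprt({ wins, draws, losses },
              { elo0 = 0, elo1 = 5, alpha = 0.05, beta = 0.05 } = {}) {
  const lower = Math.log(beta / (1 - alpha));  // at or below: accept H0
  const upper = Math.log((1 - beta) / alpha);  // at or above: accept H1
  const n = wins + draws + losses;
  if (n === 0) return { llr: 0, lower, upper, verdict: 'continue' };

  // Per-game mean and variance of the score (win = 1, draw = 0.5, loss = 0).
  const mean = (wins + 0.5 * draws) / n;
  const variance = (wins + 0.25 * draws) / n - mean * mean;
  if (variance <= 0) return { llr: 0, lower, upper, verdict: 'continue' };

  const s0 = expectedScore(elo0);
  const s1 = expectedScore(elo1);
  // Normal approximation to the log-likelihood ratio of H1 against H0.
  const llr = (n * (s1 - s0) * (2 * mean - s0 - s1)) / (2 * variance);

  let verdict = 'continue';
  if (llr >= upper) verdict = 'pass';
  else if (llr <= lower) verdict = 'fail';
  return { llr, lower, upper, verdict };
}

console.log(sprt({ wins: 800, draws: 600, losses: 600 }).verdict);
```

In practice the test is evaluated as games arrive and stops at the first bound that is crossed, so the number of games is not fixed in advance.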

Progress

Engine	Diff	Model	CLI	SPRT
0008_opus_4_8	Δ	Anthropic Claude Opus 4.8	Claude Code	✓
0007_opus_4_7	Δ	Anthropic Claude Opus 4.7	Claude Code	✓
0006_gpt_5_5	Δ	OpenAI GPT 5.5	Codex	✓
0005_opus_4_7	Δ	Anthropic Claude Opus 4.7	Claude Code	✓
0004_gpt_5_5	Δ	OpenAI GPT 5.5	Codex	✗
0003_opus_4_7	Δ	Anthropic Claude Opus 4.7	Claude Code	✓
0002_sonnet_4_6	Δ	Anthropic Claude Sonnet 4.6	Claude Code	✓
0001_haiku_4_5	Δ	Anthropic Claude Haiku 4.5	Claude Code	✗
0000_original
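
The parent of each engine follows from the SPRT column above and the rule in Procedure. The sketch below replays the table, assuming the engine numbers reflect the order in which the attempts were made. `deriveLineage` and the shape of the `results` array are hypothetical names used only for illustration.

```javascript
// SPRT outcomes copied from the Progress table, oldest first.
const results = [
  { engine: '0001_haiku_4_5', passed: false },
  { engine: '0002_sonnet_4_6', passed: true },
  { engine: '0003_opus_4_7', passed: true },
  { engine: '0004_gpt_5_5', passed: false },
  { engine: '0005_opus_4_7', passed: true },
  { engine: '0006_gpt_5_5', passed: true },
  { engine: '0007_opus_4_7', passed: true },
  { engine: '0008_opus_4_8', passed: true },
];

// Hypothetical helper: each new engine is derived from the current leader,
// and replaces it only if its SPRT against that leader passed.
function deriveLineage(initial, attempts) {
  let leader = initial;
  const parents = {};
  for (const { engine, passed } of attempts) {
    parents[engine] = leader;
    if (passed) leader = engine;
  }
  return { parents, leader };
}

const { parents, leader } = deriveLineage('0000_original', results);
console.log(parents);
console.log('current leader:', leader);
```

Under that ordering assumption, 0005_opus_4_7 is derived from 0003_opus_4_7 rather than 0004_gpt_5_5, which matches the branch shown in the Procedure diagram.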

Tournament

Rank	Engine	Elo	Games	Score	Draws
1	0008_opus_4_8	2164 ±17.21	1600	73.0%	27.4%
2	0007_opus_4_7	2148 ±17.39	1600	71.1%	26.9%
3	0006_gpt_5_5	2060 ±15.50	1600	59.6%	30.0%
4	0005_opus_4_7	2026 ±15.53	1600	55.0%	31.0%
5	0003_opus_4_7	2019 ±15.96	1600	53.9%	30.8%
6	0004_gpt_5_5	2013 ±15.92	1600	53.0%	27.6%
7	0002_sonnet_4_6	1900 ±16.55	1600	37.0%	26.6%
8	0000_original	1800 ±18.66	1600	24.8%	21.9%
9	0001_haiku_4_5	1778 ±18.42	1600	22.6%	21.6%

See bin/tourny.
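
As a rough aid to reading the Elo column, the standard logistic Elo model relates a rating gap to an expected head-to-head score. The sketch below uses that textbook formula only. It is not taken from bin/tourny, and how the ratings in the table were actually fitted is not described here. The function names are hypothetical.

```javascript
// Expected score of engine A against engine B under the logistic Elo model.
function expectedScoreBetween(eloA, eloB) {
  return 1 / (1 + Math.pow(10, (eloB - eloA) / 400));
}

// Inverse: the Elo gap implied by a score strictly between 0 and 1.
function eloGapFromScore(score) {
  return -400 * Math.log10(1 / score - 1);
}

// Ratings copied from the Tournament table.
const top = 2164;       // 0008_opus_4_8
const runnerUp = 2148;  // 0007_opus_4_7
const original = 1800;  // 0000_original

console.log(expectedScoreBetween(top, runnerUp).toFixed(3));
console.log(expectedScoreBetween(top, original).toFixed(3));
```

Note that the 16 Elo gap between the top two engines is smaller than either engine's listed error margin of about ±17, so their relative order in this table is not strongly established by the tournament alone.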

Notes

A Windows executable for each engine is available in ./engines.

Acknowledgements

https://github.com/Disservin/fastchess - SPRT and tournament manager

