Claude Code spends your money on work you never asked for: re-reading files it just edited, pasting back diffs it already applied, writing three-paragraph reports for two-line fixes. I measured every token of it, deleted it, and published all 82 raw runs — including the ones where I lose.
Measured hardest on Claude Fable 5 — the hungriest configuration that ever existed — and tuned for every effort-based Claude. /eco: −31% to −73% output tokens depending on task and effort level (−63% mean on the flagship n=5 study), with critical findings intact and all produced fixes executed and verified. /eco-max: up to −75% by dialing reasoning effort down — opt-in, labeled.
Results · Install · Usage · Benchmarks · FAQ
The waste isn't occasional — it repeats on every turn of every session, which is why usage limits evaporate and API bills surprise you. claude-eco deletes it with two skills:
/eco— frugality rules with a non-negotiable quality floor. Full reasoning depth preserved./eco-max— the same rules plus a low reasoning-effort override for routine chores. Effort is the single biggest token lever on modern Claude models (Fable 5, Sonnet 5, Opus 4.8), and of every terse-mode tool we surveyed, this is the only one we found that touches it.
What that means in practice: verbose answers shrink 2–4×, replies come back up to 3× faster (30s → 10–15s, measured), and the same weekly limit simply lasts longer — same model, same fixes, minus the padding.
Same question. Same correct answer. Half the tokens — and the critical warning kept. One real pair, shown for illustration and excluded from the study statistics below (what does applyDiscount(100, 'SAVE10') return?; verbatim, raw JSONs):
Without eco (231-token answer):
applyDiscount(100, 'SAVE10')returns 90.In
test/orders.js:11-16, the code looks up'SAVE10'in the discounts map and gets10(percent), then computes100 - (100 * 10) / 100 = 90.
With /eco (122-token answer):
applyDiscount(100, 'SAVE10')returns 90 —SAVE10maps to 10%, so100 - (100*10)/100(test/orders.js:15).Unrelated bug noticed:
calcTotalloopsi <= items.length(test/orders.js:5), so it reads past the array and throws onitems[i].price.
Half the tokens. That pair is real but hand-picked — eco's answer was chosen precisely because it carried the warning — so instead of implying it's typical, we measured how typical it is: full numbers in the warning & reporting studies. One spoiler worth the click: with the buggy file in view (n=5 per arm), eco v1.1 flagged the crash bug 5/5 times; stock Claude, 1/5. On that surface eco isn't just quieter — it's more reliable.
Baseline = stock Claude Code, no CLAUDE.md; the eco arm differs only by the skill invocation. Rows are labeled (effort · skill version).
| Task (effort · skill) | Baseline | With /eco | Output tokens | Cost | Quality |
|---|---|---|---|---|---|
| Code review (max · v1.0) | 1,096 tok | 531 tok | −52% | −21% | Single run: both arms found all 3 issues (2 planted + 1 unplanted). At default effort the picture differs — see the n=5 row |
| Real editing (max · v1.0) | 3,776 tok | 1,026 tok | −73% | −46% | Fixes verified functionally identical with Node |
| Real editing, re-run (default effort · v1.1 · n=1) | 1,610 tok | 1,107 tok | −31% | ≈0% | Both fixes executed and verified identical with Node — v1.1 consistent on in-scope tasks; smaller margin because the default-effort baseline is already leaner |
| Multi-file project, 3-turn session (max · v1.0) | 11,912 tok | 3,285 tok | −72% | −46% | Same root cause, same fix, tests pass |
| Code review with /eco-max (max · v1.0) | 1,096 tok | 279 tok | −75% | −30% | 2/2 planted bugs; missed the 1 unplanted edge case — that's the effort tradeoff, and it's why eco-max is opt-in |
| Code review, n=5 per arm (default effort · v1.1) | 891 mean (824–937) | 328 mean (310–380) | −63% (ratio of means) | −12% mean | 10/10 runs found both planted bugs. The unplanted non-critical nitpick: baseline 5/5, eco 0/5 — by design; correctness-critical findings are exempt (measured) |
The short version: savings hold across tasks, models and effort levels, and quality is measured rather than assumed — noticed critical warnings get reported 5/5 even at low effort (the old v1.0 rules suppressed them 0/5; stock Claude volunteered 1/5), and detection on explicit reviews was 10/10. Every number above maps to a raw JSON: methodology, grading criteria, the warning/reporting studies, run inventory and reproducibility notes are all in benchmarks/results.md — or skip the reading and run the same A/B on your task: ./benchmarks/run.sh "your task here".
Plugin (cleanest):
/plugin marketplace add sup3x/claude-code-eco
/plugin install claude-eco@claude-eco
Personal skill (all your projects):
git clone https://github.com/sup3x/claude-code-eco && cd claude-code-eco && ./install.sh # macOS / Linuxgit clone https://github.com/sup3x/claude-code-eco; cd claude-code-eco; .\install.ps1 # WindowsProject-only: copy skills/ into your repo's .claude/skills/. This is also what makes /eco available in Claude Code web/mobile cloud sessions — those only load skills from the repo, not from ~/.claude/skills/.
Uninstall: delete the eco/ and eco-max/ folders from your skills directory. /eco setup only ever writes plain settings.json keys (effortLevel and two env entries) — remove them to fully revert.
| Command | Effect |
|---|---|
/eco |
Frugal mode for the rest of the session |
/eco <task> |
Do the task frugally (mode stays on) |
/eco-max <task> |
Maximum savings: frugality rules plus low reasoning effort — for routine chores |
/eco setup |
Propose permanent savings in settings.json — applies only after you confirm |
Works in any language — the rules are English, the replies follow yours. No slash commands available (mobile app, web)? Just say it in plain words — "activate eco mode" triggers the skill (verified in English and Turkish). Invoke once per session, not per question: activation costs one turn plus ~1.2k input tokens once per session (skill body + descriptions, then cached), so for a single trivial question the overhead exceeds the savings (we measured that too). Activated early, every subsequent answer is ~2–4× smaller. One more honest caveat: if a very long session gets context-compacted, the rules may be summarized away — re-invoke /eco after compaction.
Both skills share the v1.1 quality floor, including the keep-noticed-critical-warnings clause — measured on both (5/5 each, eco-max at low effort). Use /eco as the everyday default — full reasoning depth on what you ask for. What it stops doing is volunteering non-critical extras: in the max-effort single run it matched the baseline's bonus finding, but the default-effort n=5 study shows the honest steady state (baseline volunteers an unplanted nitpick 5/5, eco 0/5) — while correctness-critical warnings stay protected by the quality floor. Use /eco-max for renames, small fixes, boilerplate and lookups; it's instructed to tell you and recommend /eco if the task turns out hard. Note that its effort override applies per-invocation and effort is part of the prompt-cache key — so calling /eco-max mid-way through a long, heavily-cached session can cost a cache rebuild. Best at session start or in short sessions.
Same review task; n=1 per cell and v1.0 skill except where labeled. Fable 5 ran at max effort (its session default at the time), the others at their own defaults — so percentages are not directly comparable across rows, and absolute token counts never are (tokenizers differ between models; Sonnet 5 ships an updated one). Only the within-row % matters:
| Model | Baseline | /eco | Output tokens |
|---|---|---|---|
| Fable 5 (max effort) | 1,096 | 531 | −52% |
| Opus 4.8 | 648 | 340 | −48% |
| Sonnet 5 (baseline n=5, eco n=10 · default · v1.1) | 592 mean (464–770) | 276 mean (204–380) | −53% mean |
| Haiku 4.5 | 631 | 733 | +16% — skip it |
The Haiku row is a negative result, published on purpose: Haiku is already terse and cheap, so the skill's body overhead isn't worth it there. Biggest wins are at high effort, but default effort works too (−51% to −63% measured). Sonnet 5 got the deep treatment after it became the Free/Pro default model — and it produced our most interesting quality finding. The critical crash bug: found in every run by both arms (baseline 5/5, eco 10/10). The secondary NaN edge case: baseline 5/5, eco 6/10 — a tendency that persisted when we extended eco from n=5 to n=10, though with baseline still at n=5 it is not yet statistically conclusive. The consistent pattern suggests brevity pressure on Sonnet sometimes trims the minor finding. If your review workflow depends on exhaustive minor-issue sweeps on Sonnet, that's a real cost; a completeness-over-brevity clause for review tasks is the candidate fix for v1.2 and will be benchmarked before it ships. The single-run rows found every planted bug in both arms.
- Replies — lead with the answer; no preamble, recap, or unprompted progress summaries; soft ≤8-line default; never paste back files just written (cite
path:line); one solution, not a menu. - Reasoning — deliberate minimally on routine steps, think deeply only at genuine decision points.
- Tools — Edit over Write (Write re-emits whole files); grep first, then read only the matched region; no re-reads after own edits; batch independent calls; quiet shell flags; broad sweeps via one cheap Haiku subagent.
- Quality floor — read before changing, verify when the task calls for it, never truncate deliverables, and report noticed correctness-critical findings in one line even when unasked (v1.1). If brevity ever conflicts with correctness, correctness wins.
Anthropic shipped a hard "≤100 words" cap in the Claude Code system prompt on 2026-04-16 and reverted it four days later after measuring a 3% coding-quality drop (postmortem). claude-eco uses behavioral rules with a soft, escapable length default — the savings come from deleting waste, not from squeezing substance.
The biggest lever on effort-based Claude models (Fable 5, Sonnet 5, Opus 4.8) is the reasoning effort level: on Opus 4.5, Anthropic measured medium effort matching a peer model's quality with 76% fewer output tokens. /eco setup proposes "effortLevel": "medium" persistently; /eco-max applies a per-task override. Pick effort at session start — mid-session switches invalidate the prompt cache. The full ranked list (cache hygiene, MCP schema debloat, subagent economics, CLAUDE.md diet) is in docs/token-optimization-guide.md.
These solve different layers and compose with claude-eco:
| Project | Layer | Notes |
|---|---|---|
| caveman | Terse output style (skill) | The category giant; explicitly output-tokens-only — no reasoning-effort control |
| claude-token-efficient | Terse CLAUDE.md ruleset | Ships honest raw benchmarks (~2–11% on one-shot Q&A) |
| rtk / headroom | Tool-output compression proxies | Shrink command/log output before it hits context |
| token-savior | Code-navigation MCP | Structural navigation instead of file dumps |
| ccusage | Monitoring | Measures spend; reduces nothing |
What claude-eco adds that none of the above have: the reasoning-effort lever, agentic benchmarks with executed and quality-graded fixes, and a one-command harness to A/B your own workload.
Why are your percentages bigger than other rulesets report? Regime. On one-shot Q&A prompts there's little fat to cut — drona23's own raw data honestly shows ~2–11%. On agentic coding tasks — where an unconstrained model pastes diffs, writes long reports, and creates unrequested files — there is far more waste; that's what we measured, with quality graded and fixes executed. Run the harness on your workload and trust your own number.
What does this do to my actual bill? Output tokens are only part of a Claude Code bill — input/context often dominates. That's why we report cost, not just tokens: ≈0% to −46% measured total cost per task. The low end is the single-pair fix re-run at default effort — the baseline there is already lean, and one-shot costs are dominated by fixed session overhead. The high end comes from long agentic tasks at high effort, where eco also cuts the input side (~40% fewer read-tokens in the multi-turn test via grep-first reading) and fewer turns mean fewer full-context passes. Your split depends on cache configuration; the raw JSONs itemize both sides.
Does it make Claude dumber? /eco — mostly no; it makes Claude quieter, and we publish the exceptions. In the v1.0 max-effort benchmarks it found every planted bug and produced functionally identical, test-passing fixes; the v1.1 n=5 study confirms 10/10 planted-bug consistency at default effort on Fable. On Sonnet 5 it matched the critical crash bug 10/10 but missed a secondary edge-case bug in 4/10 runs — reported in the models section. One real tradeoff we noticed and addressed: early versions also suppressed useful unsolicited observations, so the v1.1 quality floor explicitly requires correctness-critical findings to be flagged in one line even when unasked — suppress noise, never warnings. Honest scope note: that's a guarantee about reporting what gets noticed, not about noticing — the warning-rate study in benchmarks/results.md shows out-of-scope issues rarely get noticed by either arm on tasks that don't ask for review. /eco-max does lower reasoning effort — opt-in, per task, labeled.
Why was it built around Fable 5? Because that was the hungriest configuration in existence: the headline single-run benchmarks (v1.0) ran on claude-fable-5 at max effort — the worst case, not a cherry-picked easy one. The v1.1 studies add the default-effort picture and the Sonnet 5 deep study, where savings are smaller but consistent.
Other models? Measured with /eco: −48% on Opus 4.8, −53% mean on Sonnet 5 (eco n=10 — now the Free/Pro default model), −31% to −73% on Fable 5 depending on task and effort (the −75% figure is /eco-max, which lowers effort). The exception is Haiku (+16%, skip it) — see Across models.
How much do the skills themselves cost? ~1.2k input tokens once per session when invoked (skill body plus the two ~60-token descriptions loaded at startup), cached afterwards. Same number as in Usage above — it's why activating for one trivial question isn't worth it.
Can a skill reduce thinking tokens? Softly at best via instructions — on adaptive-reasoning models like Fable 5, thinking follows the effort setting and MAX_THINKING_TOKENS is ignored (model config docs). That's exactly why /eco-max overrides effort via skill frontmatter and setup proposes a persistent effortLevel.
What can't it fix? The fixed session overhead (system prompt, tool schemas), MCP schema bloat, and your CLAUDE.md size — the guide covers those levers.
Issues and PRs welcome — especially benchmark results from other workloads and platforms; the harness makes it a one-liner, please include the raw JSON. If claude-eco stretched your usage window or trimmed your bill, a ⭐ helps other people find it — that's this project's only price.
MIT © 2026 Kerim
