Skip to content

sup3x/claude-code-eco

Repository files navigation

claude-eco — eco mode for Claude Code

Claude Code spends your money on work you never asked for: re-reading files it just edited, pasting back diffs it already applied, writing three-paragraph reports for two-line fixes. I measured every token of it, deleted it, and published all 82 raw runs — including the ones where I lose.

Measured hardest on Claude Fable 5 — the hungriest configuration that ever existed — and tuned for every effort-based Claude. /eco: −31% to −73% output tokens depending on task and effort level (−63% mean on the flagship n=5 study), with critical findings intact and all produced fixes executed and verified. /eco-max: up to −75% by dialing reasoning effort down — opt-in, labeled.

License: MIT Claude Code Benchmarks

Results · Install · Usage · Benchmarks · FAQ

Same answer at half the tokens — /eco −31% to −73% depending on task and effort, critical findings intact; /eco-max up to −75%

The waste isn't occasional — it repeats on every turn of every session, which is why usage limits evaporate and API bills surprise you. claude-eco deletes it with two skills:

  • /eco — frugality rules with a non-negotiable quality floor. Full reasoning depth preserved.
  • /eco-max — the same rules plus a low reasoning-effort override for routine chores. Effort is the single biggest token lever on modern Claude models (Fable 5, Sonnet 5, Opus 4.8), and of every terse-mode tool we surveyed, this is the only one we found that touches it.

What that means in practice: verbose answers shrink 2–4×, replies come back up to 3× faster (30s → 10–15s, measured), and the same weekly limit simply lasts longer — same model, same fixes, minus the padding.

See the difference

Same question. Same correct answer. Half the tokens — and the critical warning kept. One real pair, shown for illustration and excluded from the study statistics below (what does applyDiscount(100, 'SAVE10') return?; verbatim, raw JSONs):

Without eco (231-token answer):

applyDiscount(100, 'SAVE10') returns 90.

In test/orders.js:11-16, the code looks up 'SAVE10' in the discounts map and gets 10 (percent), then computes 100 - (100 * 10) / 100 = 90.

With /eco (122-token answer):

applyDiscount(100, 'SAVE10') returns 90SAVE10 maps to 10%, so 100 - (100*10)/100 (test/orders.js:15).

Unrelated bug noticed: calcTotal loops i <= items.length (test/orders.js:5), so it reads past the array and throws on items[i].price.

Half the tokens. That pair is real but hand-picked — eco's answer was chosen precisely because it carried the warning — so instead of implying it's typical, we measured how typical it is: full numbers in the warning & reporting studies. One spoiler worth the click: with the buggy file in view (n=5 per arm), eco v1.1 flagged the crash bug 5/5 times; stock Claude, 1/5. On that surface eco isn't just quieter — it's more reliable.

Measured results

Baseline = stock Claude Code, no CLAUDE.md; the eco arm differs only by the skill invocation. Rows are labeled (effort · skill version).

Task (effort · skill) Baseline With /eco Output tokens Cost Quality
Code review (max · v1.0) 1,096 tok 531 tok −52% −21% Single run: both arms found all 3 issues (2 planted + 1 unplanted). At default effort the picture differs — see the n=5 row
Real editing (max · v1.0) 3,776 tok 1,026 tok −73% −46% Fixes verified functionally identical with Node
Real editing, re-run (default effort · v1.1 · n=1) 1,610 tok 1,107 tok −31% ≈0% Both fixes executed and verified identical with Node — v1.1 consistent on in-scope tasks; smaller margin because the default-effort baseline is already leaner
Multi-file project, 3-turn session (max · v1.0) 11,912 tok 3,285 tok −72% −46% Same root cause, same fix, tests pass
Code review with /eco-max (max · v1.0) 1,096 tok 279 tok −75% −30% 2/2 planted bugs; missed the 1 unplanted edge case — that's the effort tradeoff, and it's why eco-max is opt-in
Code review, n=5 per arm (default effort · v1.1) 891 mean (824–937) 328 mean (310–380) −63% (ratio of means) −12% mean 10/10 runs found both planted bugs. The unplanted non-critical nitpick: baseline 5/5, eco 0/5 — by design; correctness-critical findings are exempt (measured)

The short version: savings hold across tasks, models and effort levels, and quality is measured rather than assumed — noticed critical warnings get reported 5/5 even at low effort (the old v1.0 rules suppressed them 0/5; stock Claude volunteered 1/5), and detection on explicit reviews was 10/10. Every number above maps to a raw JSON: methodology, grading criteria, the warning/reporting studies, run inventory and reproducibility notes are all in benchmarks/results.md — or skip the reading and run the same A/B on your task: ./benchmarks/run.sh "your task here".

Install

Plugin (cleanest):

/plugin marketplace add sup3x/claude-code-eco
/plugin install claude-eco@claude-eco

Personal skill (all your projects):

git clone https://github.com/sup3x/claude-code-eco && cd claude-code-eco && ./install.sh          # macOS / Linux
git clone https://github.com/sup3x/claude-code-eco; cd claude-code-eco; .\install.ps1             # Windows

Project-only: copy skills/ into your repo's .claude/skills/. This is also what makes /eco available in Claude Code web/mobile cloud sessions — those only load skills from the repo, not from ~/.claude/skills/.

Uninstall: delete the eco/ and eco-max/ folders from your skills directory. /eco setup only ever writes plain settings.json keys (effortLevel and two env entries) — remove them to fully revert.

Usage

Command Effect
/eco Frugal mode for the rest of the session
/eco <task> Do the task frugally (mode stays on)
/eco-max <task> Maximum savings: frugality rules plus low reasoning effort — for routine chores
/eco setup Propose permanent savings in settings.json — applies only after you confirm

Works in any language — the rules are English, the replies follow yours. No slash commands available (mobile app, web)? Just say it in plain words — "activate eco mode" triggers the skill (verified in English and Turkish). Invoke once per session, not per question: activation costs one turn plus ~1.2k input tokens once per session (skill body + descriptions, then cached), so for a single trivial question the overhead exceeds the savings (we measured that too). Activated early, every subsequent answer is ~2–4× smaller. One more honest caveat: if a very long session gets context-compacted, the rules may be summarized away — re-invoke /eco after compaction.

Both skills share the v1.1 quality floor, including the keep-noticed-critical-warnings clause — measured on both (5/5 each, eco-max at low effort). Use /eco as the everyday default — full reasoning depth on what you ask for. What it stops doing is volunteering non-critical extras: in the max-effort single run it matched the baseline's bonus finding, but the default-effort n=5 study shows the honest steady state (baseline volunteers an unplanted nitpick 5/5, eco 0/5) — while correctness-critical warnings stay protected by the quality floor. Use /eco-max for renames, small fixes, boilerplate and lookups; it's instructed to tell you and recommend /eco if the task turns out hard. Note that its effort override applies per-invocation and effort is part of the prompt-cache key — so calling /eco-max mid-way through a long, heavily-cached session can cost a cache rebuild. Best at session start or in short sessions.

Across models

Same review task; n=1 per cell and v1.0 skill except where labeled. Fable 5 ran at max effort (its session default at the time), the others at their own defaults — so percentages are not directly comparable across rows, and absolute token counts never are (tokenizers differ between models; Sonnet 5 ships an updated one). Only the within-row % matters:

Model Baseline /eco Output tokens
Fable 5 (max effort) 1,096 531 −52%
Opus 4.8 648 340 −48%
Sonnet 5 (baseline n=5, eco n=10 · default · v1.1) 592 mean (464–770) 276 mean (204–380) −53% mean
Haiku 4.5 631 733 +16% — skip it

The Haiku row is a negative result, published on purpose: Haiku is already terse and cheap, so the skill's body overhead isn't worth it there. Biggest wins are at high effort, but default effort works too (−51% to −63% measured). Sonnet 5 got the deep treatment after it became the Free/Pro default model — and it produced our most interesting quality finding. The critical crash bug: found in every run by both arms (baseline 5/5, eco 10/10). The secondary NaN edge case: baseline 5/5, eco 6/10 — a tendency that persisted when we extended eco from n=5 to n=10, though with baseline still at n=5 it is not yet statistically conclusive. The consistent pattern suggests brevity pressure on Sonnet sometimes trims the minor finding. If your review workflow depends on exhaustive minor-issue sweeps on Sonnet, that's a real cost; a completeness-over-brevity clause for review tasks is the candidate fix for v1.2 and will be benchmarked before it ships. The single-run rows found every planted bug in both arms.

What it actually does

  1. Replies — lead with the answer; no preamble, recap, or unprompted progress summaries; soft ≤8-line default; never paste back files just written (cite path:line); one solution, not a menu.
  2. Reasoning — deliberate minimally on routine steps, think deeply only at genuine decision points.
  3. Tools — Edit over Write (Write re-emits whole files); grep first, then read only the matched region; no re-reads after own edits; batch independent calls; quiet shell flags; broad sweeps via one cheap Haiku subagent.
  4. Quality floor — read before changing, verify when the task calls for it, never truncate deliverables, and report noticed correctness-critical findings in one line even when unasked (v1.1). If brevity ever conflicts with correctness, correctness wins.

Why no hard word cap?

Anthropic shipped a hard "≤100 words" cap in the Claude Code system prompt on 2026-04-16 and reverted it four days later after measuring a 3% coding-quality drop (postmortem). claude-eco uses behavioral rules with a soft, escapable length default — the savings come from deleting waste, not from squeezing substance.

Squeeze further

The biggest lever on effort-based Claude models (Fable 5, Sonnet 5, Opus 4.8) is the reasoning effort level: on Opus 4.5, Anthropic measured medium effort matching a peer model's quality with 76% fewer output tokens. /eco setup proposes "effortLevel": "medium" persistently; /eco-max applies a per-task override. Pick effort at session start — mid-session switches invalidate the prompt cache. The full ranked list (cache hygiene, MCP schema debloat, subagent economics, CLAUDE.md diet) is in docs/token-optimization-guide.md.

Related projects

These solve different layers and compose with claude-eco:

Project Layer Notes
caveman Terse output style (skill) The category giant; explicitly output-tokens-only — no reasoning-effort control
claude-token-efficient Terse CLAUDE.md ruleset Ships honest raw benchmarks (~2–11% on one-shot Q&A)
rtk / headroom Tool-output compression proxies Shrink command/log output before it hits context
token-savior Code-navigation MCP Structural navigation instead of file dumps
ccusage Monitoring Measures spend; reduces nothing

What claude-eco adds that none of the above have: the reasoning-effort lever, agentic benchmarks with executed and quality-graded fixes, and a one-command harness to A/B your own workload.

FAQ

Why are your percentages bigger than other rulesets report? Regime. On one-shot Q&A prompts there's little fat to cut — drona23's own raw data honestly shows ~2–11%. On agentic coding tasks — where an unconstrained model pastes diffs, writes long reports, and creates unrequested files — there is far more waste; that's what we measured, with quality graded and fixes executed. Run the harness on your workload and trust your own number.

What does this do to my actual bill? Output tokens are only part of a Claude Code bill — input/context often dominates. That's why we report cost, not just tokens: ≈0% to −46% measured total cost per task. The low end is the single-pair fix re-run at default effort — the baseline there is already lean, and one-shot costs are dominated by fixed session overhead. The high end comes from long agentic tasks at high effort, where eco also cuts the input side (~40% fewer read-tokens in the multi-turn test via grep-first reading) and fewer turns mean fewer full-context passes. Your split depends on cache configuration; the raw JSONs itemize both sides.

Does it make Claude dumber? /eco — mostly no; it makes Claude quieter, and we publish the exceptions. In the v1.0 max-effort benchmarks it found every planted bug and produced functionally identical, test-passing fixes; the v1.1 n=5 study confirms 10/10 planted-bug consistency at default effort on Fable. On Sonnet 5 it matched the critical crash bug 10/10 but missed a secondary edge-case bug in 4/10 runs — reported in the models section. One real tradeoff we noticed and addressed: early versions also suppressed useful unsolicited observations, so the v1.1 quality floor explicitly requires correctness-critical findings to be flagged in one line even when unasked — suppress noise, never warnings. Honest scope note: that's a guarantee about reporting what gets noticed, not about noticing — the warning-rate study in benchmarks/results.md shows out-of-scope issues rarely get noticed by either arm on tasks that don't ask for review. /eco-max does lower reasoning effort — opt-in, per task, labeled.

Why was it built around Fable 5? Because that was the hungriest configuration in existence: the headline single-run benchmarks (v1.0) ran on claude-fable-5 at max effort — the worst case, not a cherry-picked easy one. The v1.1 studies add the default-effort picture and the Sonnet 5 deep study, where savings are smaller but consistent.

Other models? Measured with /eco: −48% on Opus 4.8, −53% mean on Sonnet 5 (eco n=10 — now the Free/Pro default model), −31% to −73% on Fable 5 depending on task and effort (the −75% figure is /eco-max, which lowers effort). The exception is Haiku (+16%, skip it) — see Across models.

How much do the skills themselves cost? ~1.2k input tokens once per session when invoked (skill body plus the two ~60-token descriptions loaded at startup), cached afterwards. Same number as in Usage above — it's why activating for one trivial question isn't worth it.

Can a skill reduce thinking tokens? Softly at best via instructions — on adaptive-reasoning models like Fable 5, thinking follows the effort setting and MAX_THINKING_TOKENS is ignored (model config docs). That's exactly why /eco-max overrides effort via skill frontmatter and setup proposes a persistent effortLevel.

What can't it fix? The fixed session overhead (system prompt, tool schemas), MCP schema bloat, and your CLAUDE.md size — the guide covers those levers.

Contributing

Issues and PRs welcome — especially benchmark results from other workloads and platforms; the harness makes it a one-liner, please include the raw JSON. If claude-eco stretched your usage window or trimmed your bill, a ⭐ helps other people find it — that's this project's only price.

License

MIT © 2026 Kerim

About

Eco mode for Claude Code. /eco: -31% to -73% output tokens with critical findings intact; /eco-max: up to -75% with lowered effort. Measured hardest on Claude Fable 5 (fable5), deep-studied on Sonnet 5, works on Opus 4.8 too. We publish our negative results. 82 raw benchmark runs.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors