How to reduce Claude Code token usage

If you're hitting the 5-hour usage window or watching your agent spend climb, most of the waste comes from three places: bloated context, repeated file discovery, and verbose output. Here are six techniques that measurably help, and how to find which ones your repo needs.

1. Add a CLAUDE.md that shapes responses

A short repo guide (architecture, commands, conventions, "be terse") is the single biggest lever on output tokens. The agent stops re-deriving context and stops over-explaining. Keep it tight: CLAUDE.md is loaded into context every session, so a bloated one is its own leak.
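One way to keep the guide honest is to put a size budget on it. The sketch below is a minimal check, assuming roughly 4 characters per token (a common rule of thumb, not the model's real tokenizer) and a hypothetical budget of 1500 tokens; both numbers are illustrative and should be tuned for your repo.

```python
from pathlib import Path

# Assumption: ~4 characters per token is a rough heuristic for English text,
# not the tokenizer the model actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate (character count / 4, rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def check_guide(path: str, budget: int = 1500) -> tuple[int, bool]:
    """Return (estimated_tokens, within_budget) for a guide file.

    The default budget is a hypothetical starting point, not a recommendation
    from any vendor.
    """
    text = Path(path).read_text(encoding="utf-8")
    tokens = estimate_tokens(text)
    return tokens, tokens <= budget
```

Calling check_guide("CLAUDE.md") in CI gives an early warning when the guide starts to grow past what you intended.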

2. Add a .claudeignore for generated noise

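Exclude node_modules, dist, build, coverage, lockfiles, snapshots, and large fixtures. These pull the agent into low-signal reads and inflate every turn's context. Confirm that your setup actually honors the ignore file; Claude Code's settings also accept Read deny rules under permissions.deny, which can block the same paths.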
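To see whether this technique matters for a given repo, it helps to measure how much of the tree is generated noise. The following is a rough sketch, not part of any tool mentioned here: noise_share and the NOISE_DIRS set are hypothetical names, and bytes on disk are only a proxy for tokens.

```python
import os

# Directory names treated as generated noise; adjust for your stack.
NOISE_DIRS = {"node_modules", "dist", "build", "coverage"}


def noise_share(root: str, noise_dirs: set[str] = NOISE_DIRS) -> float:
    """Return the fraction of bytes under `root` that sit in noise directories."""
    total = 0
    noisy = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control internals entirely.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        rel_parts = os.path.relpath(dirpath, root).split(os.sep)
        in_noise = any(part in noise_dirs for part in rel_parts)
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # broken symlink or unreadable file
            total += size
            if in_noise:
                noisy += size
    return noisy / total if total else 0.0
```

A high share suggests that excluding those directories is worth doing first; a low share suggests the waste is elsewhere.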

3. Give the agent a repo map / file:line hints

Static symbol + import hints let the first turn open the right file instead of running Glob → Grep → Read → Read. On large repos this removes several exploration calls per task.
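As an illustration of what such a map can look like, here is a minimal sketch that uses Python's ast module to list top-level symbols and imports as file:line entries. It only covers Python sources, and build_repo_map is a hypothetical helper, not the format any particular tool emits.

```python
import ast
from pathlib import Path


def build_repo_map(root: str) -> list[str]:
    """Return 'path:line kind name' entries for top-level symbols and imports."""
    entries = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse
        rel = path.relative_to(root).as_posix()
        for node in tree.body:
            if isinstance(node, ast.ClassDef):
                entries.append(f"{rel}:{node.lineno} class {node.name}")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                entries.append(f"{rel}:{node.lineno} def {node.name}")
            elif isinstance(node, ast.Import):
                for alias in node.names:
                    entries.append(f"{rel}:{node.lineno} import {alias.name}")
            elif isinstance(node, ast.ImportFrom):
                # Relative imports such as `from . import x` have no module name.
                module = "." * node.level + (node.module or "")
                entries.append(f"{rel}:{node.lineno} import {module}")
    return entries
```

Writing the joined entries to a small text file that the agent reads first gives it a direct pointer to the right file and line, instead of a search.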

4. Cap the thinking budget on simple tasks

For routine edits, a high thinking budget is pure spend. Lowering the MAX_THINKING_TOKENS environment variable for simple work keeps reasoning where it pays off.

5. Run exploration on a cheaper model

Use a Haiku subagent for "where is X / how does Y work" exploration in an isolated context, so your main session doesn't pay Opus/Sonnet rates to read files.
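One way to set this up is a project-level subagent definition. The sketch below generates such a file, assuming the layout Claude Code documents for subagents (a Markdown file under .claude/agents/ with name, description, tools, and model in its frontmatter); verify the field names against your version. The agent name "explorer" and its prompt are hypothetical.

```python
from pathlib import Path


def render_agent(name: str, description: str, tools: list[str],
                 model: str, prompt: str) -> str:
    """Render a subagent definition: frontmatter followed by the system prompt."""
    front = [
        "---",
        f"name: {name}",
        f"description: {description}",  # keep this free of YAML-special characters
        f"tools: {', '.join(tools)}",
        f"model: {model}",
        "---",
    ]
    return "\n".join(front) + "\n\n" + prompt.strip() + "\n"


def write_agent(project_root: str, name: str, description: str,
                tools: list[str], model: str, prompt: str) -> Path:
    """Write the definition to <project_root>/.claude/agents/<name>.md."""
    path = Path(project_root) / ".claude" / "agents" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_agent(name, description, tools, model, prompt),
                    encoding="utf-8")
    return path
```

Restricting the tools to read-only ones (for example Read, Grep, Glob) keeps the exploration agent from editing files, and its separate context means only its short answer comes back to the main session.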

6. Compact earlier

Triggering compaction before the context window is jammed keeps each turn smaller and preserves the decisions that matter.
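The effect on per-turn size can be seen with a toy cost model. This sketch assumes every turn re-sends the whole context, the context grows by a fixed amount per turn, and compaction shrinks it to a fixed summary size; it ignores prompt caching and the cost of the compaction call itself, and all numbers are hypothetical.

```python
def total_input_tokens(turns: int, growth: int, compact_at: int,
                       summary_size: int) -> int:
    """Sum the context size billed on each turn under a simple compaction rule."""
    context = 0
    total = 0
    for _ in range(turns):
        context += growth          # new messages and tool results this turn
        total += context           # the whole context is sent as input
        if context >= compact_at:  # compact after the turn that hits the threshold
            context = summary_size
    return total


# Hypothetical comparison: same session, late vs early compaction threshold.
late = total_input_tokens(turns=40, growth=5_000, compact_at=180_000, summary_size=10_000)
early = total_input_tokens(turns=40, growth=5_000, compact_at=80_000, summary_size=10_000)
```

In this model the early threshold always bills fewer input tokens; in practice the trade-off is that each compaction can drop detail, so it is worth compacting at natural task boundaries.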

How much does this actually save?

Bundled together, these techniques ship as the open-source context-os (MIT). On a 36-call A/B against identical fixtures, it measured −40.9% total tokens and −35.3% wall-clock time (Sonnet; 6/6 prompt-level wins; paired t-test p=5.1e-7). Raw data is available. Your repo will differ, so the only way to know your number is to measure your repo.
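To measure your own repo the same general way, run each prompt with and without the changes and compare the pairs. The sketch below computes the percent change and a paired t statistic using only the standard library; how the published benchmark paired its runs is not stated here, so treat this as one reasonable setup rather than a reproduction of it.

```python
import math
import statistics


def percent_change(baseline: list[float], treated: list[float]) -> float:
    """Percent change in total tokens from baseline to treated runs."""
    return 100.0 * (sum(treated) - sum(baseline)) / sum(baseline)


def paired_t_statistic(baseline: list[float], treated: list[float]) -> float:
    """t statistic for paired samples: mean difference over its standard error."""
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [t - b for b, t in zip(baseline, treated)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

The statistic has len(diffs) - 1 degrees of freedom; if SciPy is installed, scipy.stats.ttest_rel returns the same statistic together with a p-value.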

Find your repo's leaks in 8 seconds

Paste a public repo into the free scanner and get a leak score plus ranked findings that map to the techniques above. If you want it fixed for you, the 48-hour private audit ships a report, a tuned CI gate, and a concrete fix plan.

Limits: these reduce waste; they don't raise provider rate limits or guarantee a specific dollar saving for your repo. Savings depend on repo size, language, and how you currently work.

Independent service. Not affiliated with, endorsed by, or sponsored by Anthropic. "Claude" is a trademark of Anthropic. · Agent Cost Leak Checker