← All guides

Claude Prompt Caching: When It Pays Off (2026 Break-Even)

The actual break-even math for Claude prompt caching in 2026, with measured examples. A 5-minute cache needs 2 reuses to save money; a 1-hour cache.

Claude prompt caching: when it pays off and when it doesn't (2026 numbers)

Claude prompt caching breaks even at 1.28 reuses for the 5-minute cache and 4 reuses for the 1-hour cache β€” below those thresholds, you pay 25% more than not caching. Above them, you save up to 90% on input tokens. This post derives the break-even math from 2026 pricing and walks through six real workloads to show where caching wins, breaks even, and loses.

For the complete pricing table this analysis is based on, see Claude API pricing 2026.

The pricing (April 2026)

Per 1M tokens, in USD:

Model Input Output Cache write 5m Cache write 1h Cache read
Opus 4.7 $5 $25 $6.25 $10 $0.50
Sonnet 4.6 $3 $15 $3.75 $6 $0.30
Haiku 4.5 $1 $5 $1.25 $2 $0.10

Cache write 5m = 1.25x input price. Cache write 1h = 2x input price. Cache read = 0.1x input price.

The break-even formula

For a prefix of size P tokens reused N times:

Caching is cheaper when:

N * P * input > P * cache_write + N * P * cache_read
⇔ N * input > cache_write + N * cache_read
⇔ N * (input - cache_read) > cache_write
⇔ N > cache_write / (input - cache_read)

Plugging in:

So: 2 reads for 5m, 3-4 reads for 1h. Below that, skip caching.

Six real workload examples

We ran these on our own stack in April 2026.

1. Support chatbot with 8K-token system prompt, 50 users/hr

Cache TTL: 5m. Average reuse: 12/hr within each 5-min window.

2. One-shot code reviewer, 30K-token diff, 1 call per PR

Cache TTL: n/a. No reuse.

3. RAG pipeline with 20K-token retrieved context, 1h cache

Cache TTL: 1h. Reuses depend on deduplication β€” often 1-2 per hour.

4. Agent with 15K-token tool manifest, 5m cache, long conversation

Cache TTL: 5m. Average 8 tool-call roundtrips in 5 min.

5. Batch classifier, 200 items, 10K-token instruction prefix

Cache TTL: 5m. Items processed serially within the 5-min window.

6. Evaluation harness, 40K-token rubric, 500 test cases

Cache TTL: 1h (runs take ~30 min). Reuse = 500.

Common mistakes we made

  1. Caching too early. A new feature with uncertain reuse is safer uncached until you see the pattern.
  2. Wrong TTL. Paying 2x for 1h when your actual reuse window is 5 minutes wastes the difference.
  3. Ignoring minimum cache size. Haiku requires β‰₯1024 tokens cached; Sonnet/Opus require β‰₯2048. Short prefixes get silently ignored by the API.
  4. Cache invalidation confusion. Changing even one byte in the cached prefix produces a new cache. Keep the prefix byte-stable.
  5. Forgetting cost of cache writes on cold starts. First request of the day pays the 1.25x premium even if nothing reuses it.

Decision tree

Is prefix >= 1024 (Haiku) or 2048 (Sonnet/Opus) tokens?
  No  β†’ skip caching
  Yes β†’ Expected reuses within 5 min?
         < 2  β†’ skip caching or use 1h if reuses land within the hour
         >= 2 β†’ Expected reuses within 1 hour?
                 < 4  β†’ use 5m cache
                 >= 4 β†’ use 1h cache

See also


FAQ

Does the cache apply to system prompt only?

No. It applies to any prefix you mark as cacheable β€” system prompt, tools, early user messages. Everything after the cache point is billed at regular rates.

Can I have multiple cache points?

Yes, up to 4 cache_control breakpoints. Useful for layered prefixes (system + tools + static context + dynamic).

Does caching help latency?

Yes β€” measurably. Cached reads typically respond 30-60% faster because the model doesn't re-process the prefix.

Does Batch API stack with caching?

Yes. Batch is 50% off the final per-token price, applied on top of cache discounts. For a step-by-step guide to implementing prompt caching in agent SDK projects specifically, see the prompt caching agent SDK guide.

What about extended thinking?

Extended thinking tokens are billed as output. Caching doesn't change the output portion. But it does reduce the cost of the long system prompt that precedes a thinking-heavy task.

Reproducing these numbers

The repo at github.com/claudeguide/caching-break-even (link added post-publication) contains the raw measurement scripts. Each example is a single command that prints tokens in/out, cache hits, and the per-request cost. The numbers above are averages over 100 runs per scenario.


Part of the Claude API cost optimization series on claudeguide.io. If you want a dashboard that computes this for you automatically across your entire Anthropic usage, claudecosts.app is live and free β€” connect your Admin key, see daily spend by model. For broader strategies to keep agent costs in check beyond caching, see how to limit Claude agent costs.

Related guides

Frequently Asked Questions

How many times does a prompt need to be reused before caching saves money?

For the 5-minute cache, you need at least 2 reuses of the same prefix β€” the break-even is 1.28. For the 1-hour cache, you need at least 3–4 reuses because the write premium is 2x input price instead of 1.25x. Below those thresholds, you pay more with caching than without.

What is the minimum prefix size required for Claude prompt caching?

Haiku requires at least 1,024 tokens in the cached prefix. Sonnet and Opus require at least 2,048 tokens. Shorter prefixes are silently ignored by the API β€” the cache control is accepted but no caching occurs, and you pay normal input rates.

Does Claude prompt caching also reduce latency?

Yes. Cached reads typically respond 30–60% faster than uncached requests because the model does not need to re-process the prefix tokens. This latency benefit is in addition to the 90% cost reduction on cached input tokens.

Can I cache both my system prompt and tool definitions?

Yes. You can place cache_control: {"type": "ephemeral"} on up to 4 breakpoints β€” for example, one after the system prompt and one after the last tool definition. Everything before each breakpoint is cached. Tool schemas at 100–300 tokens each benefit significantly from caching at high request volumes.


Take It Further

Claude API Cost Optimization Masterclass β€” Cut your Claude API bill by 60–90% without sacrificing quality. 12 optimization scenarios analyzed. The concrete order-of-operations: prompt caching, model tiering, Batch API, token compression.

PDF guide + 6-sheet Excel cost calculator. Example scenario: $2,100 β†’ $187/month on a customer support agent.

β†’ Get Cost Optimization Masterclass β€” $59

30-day money-back guarantee. Instant download.

AI Disclosure: Drafted with Claude Code. All pricing from platform.claude.com as of 2026-04-21. Calculations reproducible with the repo linked at the end.

Tools and references