Question 1

What's the break-even point for Claude prompt caching?

Accepted Answer

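About 1.28 requests per window with the 5-minute (ephemeral) TTL, or about 2.11 with the 1-hour (extended) TTL. Cache writes cost 1.25× the base input price (5-minute) or 2.0× (1-hour); cache reads cost 0.1×. Because requests come in whole numbers, caching pays off from the second request in a 5-minute window and from the third in a 1-hour window. A prefix that is written once and never read again costs more than not caching at all.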
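To make the comparison concrete, here is a minimal sketch that prices a window of requests with and without caching. Costs are expressed in units of "one uncached read of the prefix", and `window_cost` is a hypothetical helper name, not part of any SDK. It assumes every request in the window reuses the same cached prefix.

```python
def window_cost(n_requests: int,
                write_mult: float = 1.25,
                read_mult: float = 0.1) -> tuple[float, float]:
    """Return (uncached, cached) cost of n_requests within one cache window.

    Costs are relative: 1.0 is the price of sending the prefix once uncached.
    Defaults are the 5-minute multipliers quoted in the answer above.
    """
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    uncached = float(n_requests)
    # The first request writes the cache; every later request reads it.
    cached = write_mult + read_mult * (n_requests - 1)
    return uncached, cached


for n in (1, 2, 3):
    uncached, cached = window_cost(n)
    print(f"{n} request(s): uncached={uncached:.2f}, cached={cached:.2f}")
```

With the 5-minute multipliers, a single request costs 1.25 cached against 1.00 uncached, while two requests cost 1.35 against 2.00. Passing `write_mult=2.0` gives the 1-hour figures.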

Question 2

Should I use 5-minute or 1-hour cache TTL?

Accepted Answer

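Use the 5-minute ephemeral TTL for most cases: the write premium is lower (1.25× vs 2.0×), and it suits traffic that arrives in bursts with gaps under 5 minutes. The 1-hour extended TTL makes sense for large prompts that are reused infrequently (e.g., long codebase context), where requests arrive more than 5 minutes apart and a 5-minute cache would expire and be re-written between them.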
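The choice depends on how far apart requests arrive, which a small simulation can illustrate. The sketch below is a simplified model, not a billing tool. `cached_cost` is a hypothetical helper, costs are in units of one uncached prefix read, and it assumes each cache hit restarts the TTL clock.

```python
def cached_cost(arrival_minutes: list[float],
                ttl_minutes: float,
                write_mult: float,
                read_mult: float = 0.1) -> float:
    """Relative cost of serving requests at the given times with caching on.

    Assumption: a hit refreshes the entry, so the TTL restarts on every use.
    """
    total = 0.0
    expires_at = None  # no cache entry exists yet
    for t in sorted(arrival_minutes):
        if expires_at is not None and t < expires_at:
            total += read_mult   # hit: entry is still alive
        else:
            total += write_mult  # miss: write (or re-write) the entry
        expires_at = t + ttl_minutes  # written or refreshed either way
    return total


# One request every 20 minutes for two hours (7 requests).
sparse = [float(m) for m in range(0, 121, 20)]
print("uncached:", float(len(sparse)))
print("5-minute:", cached_cost(sparse, ttl_minutes=5, write_mult=1.25))
print("1-hour:  ", cached_cost(sparse, ttl_minutes=60, write_mult=2.0))
```

For this sparse pattern the 5-minute cache misses on every request and ends up more expensive than not caching, while the 1-hour cache pays for one write and then reads. For a tight burst such as `[0, 1, 2, 3, 4]`, the 5-minute TTL is the cheaper of the two.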

Question 3

Does caching work with all Claude models?

Accepted Answer

Yes. Sonnet, Haiku, and Opus all support prompt caching. Cache write and read prices are multiples of each model's base input token price, so the same multipliers apply regardless of model.

Question 4

What's the minimum prompt size to cache?

Accepted Answer

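1,024 tokens minimum for Sonnet and Opus, 2,048 tokens for Haiku. Some newer model versions set a higher minimum, so check the current model documentation. Below the minimum, the cache_control breakpoint is silently ignored: the request succeeds, but nothing is cached.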
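Because a too-short prefix fails silently, it can help to make the check explicit when building the request. The sketch below only constructs a system content block and makes no network call. `system_block`, `prompt_tokens`, and `min_tokens` are hypothetical names: the token count is assumed to be obtained separately, and the minimum is passed in by the caller because it depends on the model. The `cache_control` shape reflects my understanding of the Messages API and should be checked against the current documentation.

```python
def system_block(text: str,
                 prompt_tokens: int,
                 min_tokens: int,
                 ttl: str = "5m") -> dict:
    """Build a system content block, adding cache_control only when useful.

    prompt_tokens: token count of the prefix up to this block (caller-supplied).
    min_tokens:    the model's minimum cacheable length (caller-supplied).
    ttl:           "5m" or "1h" (assumed field values; verify before use).
    """
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    block = {"type": "text", "text": text}
    if prompt_tokens >= min_tokens:
        # Long enough to be cached: mark this block as a cache breakpoint.
        block["cache_control"] = {"type": "ephemeral", "ttl": ttl}
    return block


short = system_block("brief instructions", prompt_tokens=500, min_tokens=1024)
long_ = system_block("long reference text", prompt_tokens=5000, min_tokens=1024, ttl="1h")
print("cache_control" in short, "cache_control" in long_)
```

Skipping the marker below the minimum does not change what is billed. It simply keeps the request honest about whether caching can apply.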

Question 5

How is the break-even formula derived?

Accepted Answer

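Break-even = (write_cost − read_cost) / (no_cache_cost − read_cost). This is the request count at which one cache write plus the subsequent cache reads costs the same as sending every request uncached. With the 5-minute cache: write = input × 1.25, read = input × 0.1, no-cache = input × 1, giving 1.15 / 0.9 ≈ 1.28 requests per window. The 1-hour cache (write = input × 2.0) moves break-even to 1.9 / 0.9 ≈ 2.11 requests per window.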
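The algebra behind that formula can be written out step by step. The symbols here are introduced for illustration only: c is the uncached input cost of the prefix, w and r are the write and read multipliers, and N is the number of requests in one window, with the first request writing the cache and the remaining N − 1 reading it.

```latex
\begin{align*}
C_{\text{uncached}}(N) &= N c \\
C_{\text{cached}}(N)   &= w c + (N - 1)\, r c \\
\text{Set equal:}\quad N c &= w c + (N - 1)\, r c \\
N (1 - r) &= w - r \\
N^{*} &= \frac{w - r}{1 - r} \\[4pt]
N^{*}_{5\,\text{min}} &= \frac{1.25 - 0.1}{1 - 0.1} = \frac{1.15}{0.9} \approx 1.28 \\
N^{*}_{1\,\text{h}}   &= \frac{2.0 - 0.1}{1 - 0.1} = \frac{1.9}{0.9} \approx 2.11
\end{align*}
```

The prefix cost c cancels, so the break-even count depends only on the multipliers and not on the prompt size or the model's price.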

Prompt Caching Break-Even Calculator

Should you enable prompt caching?

The math, in one line

What this calculator gets right

What this calculator does NOT model

Going deeper