1.28 reuses · Prompt caching break-even — 5-minute cache
At 2026-04 pricing, the 5-minute prompt cache costs more than no caching until the cached prefix is used at least 1.28 times on average. Below that, caching is a net loss; at 2+ reuses, savings climb toward 90% on input tokens.
Source: /claude-api-cost-prompt-caching-break-even · 2026-04
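The break-even arithmetic can be sketched directly. The multipliers below (1.25x base input price for a 5-minute cache write, 0.1x for a cache read) reflect Anthropic's published relative rates at the time of writing, but treat them as assumptions to re-check against current pricing:

```python
# Break-even analysis for the 5-minute prompt cache.
# Multipliers are relative to the base input-token price
# (assumed: cache write = 1.25x, cache read = 0.1x).
WRITE_MULT = 1.25
READ_MULT = 0.10

def cached_cost(uses: float) -> float:
    """Relative cost of `uses` requests: one cache write, then reads."""
    return WRITE_MULT + READ_MULT * (uses - 1)

def uncached_cost(uses: float) -> float:
    """Relative cost without caching: every request pays full price."""
    return float(uses)

# Solve WRITE_MULT + READ_MULT*(n - 1) = n for the break-even use count.
break_even = (WRITE_MULT - READ_MULT) / (1 - READ_MULT)
print(f"break-even at {break_even:.2f} uses")  # ~1.28

# Savings grow with reuse, approaching the 90% read discount:
savings = 1 - cached_cost(10) / uncached_cost(10)
print(f"savings at 10 uses: {savings:.0%}")
```

The break-even lands at ~1.28 total uses because the 0.25x write premium must be amortized against the 0.9x per-read discount.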
4 reuses · Prompt caching break-even — 1-hour cache
The 1-hour cache (beta) costs more upfront than the 5-minute version, so it needs ~4 reuses to break even versus no caching, or ~3.1 reuses to beat the 5-minute cache.
Source: /claude-api-cost-prompt-caching-break-even · 2026-04
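One way to see why the pricier 1-hour cache can still win is to model traffic that arrives slower than the 5-minute TTL, so the short cache must be re-written on every request. The 2x write multiplier for the 1-hour cache and the steady-gap traffic model are assumptions; the article's ~4-reuse figure may use a different accounting:

```python
# Cost comparison when requests arrive too far apart for the 5-minute
# cache to stay warm (e.g., one request every 10 minutes).
# Assumed multipliers vs. base input price: 5-min write 1.25x,
# 1-hour write 2x, cache read 0.1x.
FIVE_MIN_WRITE, ONE_HOUR_WRITE, CACHE_READ = 1.25, 2.0, 0.10

def cost_5min(requests: int, gap_minutes: float) -> float:
    """Each request re-writes the 5-min cache if the gap exceeds its TTL."""
    per_request = FIVE_MIN_WRITE if gap_minutes > 5 else CACHE_READ
    return FIVE_MIN_WRITE + per_request * (requests - 1)

def cost_1hour(requests: int, gap_minutes: float) -> float:
    """One write, then reads, as long as the gap stays under an hour."""
    assert gap_minutes < 60, "model assumes the 1-hour cache never expires"
    return ONE_HOUR_WRITE + CACHE_READ * (requests - 1)

for n in (2, 3, 4, 6):
    print(n, cost_5min(n, gap_minutes=10), cost_1hour(n, gap_minutes=10))
```

Under this model the 1-hour cache pulls ahead quickly once the 5-minute cache keeps expiring, since every 5-minute "hit" becomes a fresh 1.25x write while the 1-hour cache pays 0.1x reads.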
$2,100 → $187/month (91% ↓) · Customer support agent cost reduction
A production customer support agent reduced its Claude API bill by 91% by combining prompt caching, model tiering (80/15/5 rule), and Batch API for async workloads.
Source: /claude-api-cost-optimization-guide · 2026-04
60-75% · Model routing — typical savings
Routing 80% of traffic to Haiku, 15% to Sonnet, 5% to Opus typically reduces overall Claude API spend by 60-75% versus running everything on Sonnet, with no measurable quality regression on classification, extraction, summarization, and translation tasks.
Source: /claude-haiku-sonnet-opus-which-model · 2026-04
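The blended-cost arithmetic behind a routing mix can be sketched as follows. The per-million-token prices here are illustrative assumptions, not quotes, and realized savings depend on current rates and on how token volumes differ per task:

```python
# Blended cost of an 80/15/5 routing mix versus running all traffic
# on the mid-tier model. Prices are illustrative ($ per million tokens).
PRICES = {          # (input $/MTok, output $/MTok) -- assumed figures
    "haiku":  (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus":   (15.00, 75.00),
}
MIX = {"haiku": 0.80, "sonnet": 0.15, "opus": 0.05}

def blended(mtok_in: float, mtok_out: float, mix: dict) -> float:
    """Dollar cost of routing the given token volumes across the mix."""
    return sum(
        share * (PRICES[m][0] * mtok_in + PRICES[m][1] * mtok_out)
        for m, share in mix.items()
    )

all_sonnet = blended(100, 20, {"sonnet": 1.0})
routed = blended(100, 20, MIX)
print(f"all-Sonnet: ${all_sonnet:,.0f}, routed: ${routed:,.0f}, "
      f"savings: {1 - routed / all_sonnet:.0%}")
```

Note how sensitive the result is to the Opus share: at these assumed prices, even 5% of traffic on the top tier accounts for a large slice of the blended bill, which is why tightening the routing rule moves overall savings so much.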
50% · Batch API — discount on async workloads
Anthropic's Batch API delivers a 50% discount on all model rates for async workloads with an up-to-24-hour SLA. It is compatible with prompt caching, for combined savings of 75-90%.
Source: /claude-batch-api-guide · 2026-04
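Assuming the 50% batch discount stacks multiplicatively with caching rates (0.1x base input price for cache reads — an assumption consistent with the combined 75-90% figure above), the blended savings for a partially cached batch workload can be estimated like this:

```python
# Effective input-token cost when the Batch API discount and prompt
# caching stack. Assumption: the 50% batch discount applies on top of
# the cache-read rate (0.1x base input price).
BATCH = 0.50
CACHE_READ = 0.10

def effective_multiplier(cached_fraction: float) -> float:
    """Blended input-cost multiplier for a batch workload in which
    `cached_fraction` of input tokens hit the cache (rest pay full rate)."""
    uncached = 1.0 - cached_fraction
    return BATCH * (cached_fraction * CACHE_READ + uncached * 1.0)

for f in (0.0, 0.5, 0.9):
    m = effective_multiplier(f)
    print(f"{f:.0%} cached -> pay {m:.3f}x base, save {1 - m:.0%}")
```

With no cache hits the savings are the flat 50% batch discount; as the cached fraction rises toward 90%, combined input savings climb past 90%, bracketing the 75-90% range quoted above.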