How accurate is this calculator?
Pricing reflects public Anthropic API rates as of April 2026. The Optimized scenario assumes 80% Haiku / 15% Sonnet / 5% Opus routing, 5-minute prompt caching at 80% hit rate, and Batch API for 50% of traffic. Real workload savings vary — these are best-case approximations.
What is the 80/15/5 model routing rule?
Route 80% of work to Haiku, 15% to Sonnet, 5% to Opus. Typical bill reduction: 60-75% versus Sonnet-everywhere. See Haiku vs Sonnet vs Opus.
When does prompt caching break even?
5-minute cache: 1.28 reuses. 1-hour cache: 4 reuses. See Prompt caching break-even analysis.
Does Batch API affect quality?
No. Same models, same quality. Trade-off is up-to-24h latency for 50% discount. See Batch API guide.
Why is my actual bill higher than this estimate?
Common reasons: (1) you're not actually caching cacheable tokens, (2) tool use adds tokens not modeled here, (3) retries on rate limits, (4) max_tokens not set so responses run long. The Cost Optimization Masterclass walks through diagnosing each.