Haiku vs Sonnet vs Opus: Which Claude Model for Your Use Case (April 2026)
Most teams overspend on the Claude API by running Opus for tasks Haiku could handle, and lose quality by running Haiku for tasks that need Sonnet. The right default is task-dependent, and the answer is almost never "always use the best model." This post covers the decision tree, the prices, and nine concrete examples with measured results in 2026.
TL;DR
- Haiku 4.5 ($1/$5 per 1M in/out): classification, extraction, summarization of short inputs, routing, drafting. Default for >60% of traffic in most apps.
- Sonnet 4.6 ($3/$15): reasoning over medium contexts, structured generation, code review, most customer-facing agents. Default when Haiku fails to hit your quality bar.
- Opus 4.7 ($5/$25): hard reasoning, long-context synthesis across hundreds of pages, production code generation at scale, multi-step planning. Use sparingly; escalate only when you can measure the win.
- The 80/15/5 rule is a good starting target: 80% Haiku, 15% Sonnet, 5% Opus. Tune from there.
- Every request should have an explicit model choice with a reason; never "default to Opus because better."
Pricing snapshot: April 2026
| Model | Input / 1M | Output / 1M | Cache read / 1M | Cache write / 1M |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 |
| Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 |
Batch API is 50% off all of the above. 1M context window mode on Sonnet/Opus costs more for inputs beyond 200K tokens (see Claude API pricing 2026).
Three ratios worth memorizing:
- Opus is 5x Haiku. A workload that costs $100/month on Haiku costs $500 on Opus.
- Output is 5x input. A model swap that reduces output by 30% saves more than the same reduction in input.
- Cache read is 10% of normal input. Whichever model you pick, cache aggressively; Module 4 of my cost optimization masterclass goes deep on this.
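The three ratios fall straight out of the pricing table. Here is a minimal sketch, assuming a hypothetical workload of 50M input and 10M output tokens per month. The token volumes and the names `PRICES` and `monthly_cost` are illustrative and not part of any SDK, and cache-write charges are left out for brevity.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "haiku": {"input": 1.00, "output": 5.00, "cache_read": 0.10},
    "sonnet": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "opus": {"input": 5.00, "output": 25.00, "cache_read": 0.50},
}


def monthly_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate monthly spend in USD.

    cached_fraction is the share of input tokens billed at the cache-read
    rate. Cache-write charges are ignored in this sketch.
    """
    p = PRICES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = fresh * p["input"] + cached * p["cache_read"] + output_tokens * p["output"]
    return total / 1_000_000


# Hypothetical workload: 50M input tokens and 10M output tokens per month.
haiku = monthly_cost("haiku", 50_000_000, 10_000_000)
opus = monthly_cost("opus", 50_000_000, 10_000_000)
haiku_cached = monthly_cost("haiku", 50_000_000, 10_000_000, cached_fraction=0.8)
print(haiku, opus, opus / haiku, haiku_cached)
```

With these assumed volumes the Haiku bill is $100 and the Opus bill is $500, and serving 80% of the input from cache brings the Haiku bill down to about $64.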
The decision tree
Apply these four questions in order. Stop at the first "yes."
1. Is this a short classification, extraction, or routing task?
(< 2K input tokens, < 200 output tokens, deterministic-ish.) → Haiku. Always. Tested; the gap to Sonnet on these tasks is within noise.
2. Is this a structured generation with a schema, on a medium context?
(< 30K input, structured JSON or markdown output, no multi-hop reasoning.) → Haiku first; upgrade to Sonnet if eval hit rate drops below your bar.
3. Does the task involve reasoning across a medium-long context, or multi-step logic?
(30K-200K tokens, some synthesis required, errors would be expensive.) → Sonnet. This is its sweet spot.
4. Does the task require deep reasoning, long-context synthesis (>200K), or highest-stakes correctness?
(Legal/medical/security-sensitive, architectural decisions, production code generation where a bug is costly.) → Opus. And even then, test against Sonnet first; about 40% of the time Sonnet is indistinguishable.
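The four questions can be sketched as a small function that stops at the first "yes." This is a rough illustration, not a production router: the `Task` fields, the `kind` labels, and the Sonnet fallback for tasks that match none of the questions are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    input_tokens: int
    output_tokens: int
    # Hypothetical labels: "classify", "extract", "route", "structured",
    # "reasoning", or "deep" (deep reasoning or highest-stakes correctness).
    kind: str


def pick_model(task: Task) -> str:
    """Apply the four questions in order and stop at the first match."""
    # Q1: short classification, extraction, or routing.
    if (task.kind in {"classify", "extract", "route"}
            and task.input_tokens < 2_000 and task.output_tokens < 200):
        return "haiku"
    # Q2: structured generation on a medium context (Haiku first; upgrade
    # to Sonnet if the eval hit rate drops below your bar).
    if task.kind == "structured" and task.input_tokens < 30_000:
        return "haiku"
    # Q3: reasoning or multi-step logic on up to 200K tokens.
    if task.kind == "reasoning" and task.input_tokens <= 200_000:
        return "sonnet"
    # Q4: deep reasoning, >200K context, or highest-stakes correctness.
    if task.kind == "deep" or task.input_tokens > 200_000:
        return "opus"
    # Assumption: anything unmatched goes to Sonnet until it has been evaluated.
    return "sonnet"


print(pick_model(Task(800, 5, "classify")))
print(pick_model(Task(80_000, 800, "reasoning")))
print(pick_model(Task(300_000, 2_000, "reasoning")))
```

The return value is only a starting point; the eval-driven upgrade in question 2 and the "test against Sonnet first" advice in question 4 still apply.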
The 80/15/5 target
Healthy production Claude traffic typically distributes like:
- 80% on Haiku (routing, classification, first-pass drafts, simple extractions)
- 15% on Sonnet (main work: code review, conversation handling, structured generation)
- 5% on Opus (the hardest reasoning, long-context synthesis, final polish)
If your distribution is inverted (80% Opus, 15% Sonnet, 5% Haiku), you are almost certainly overpaying: at equal tokens per request, that mix costs about 4.5x the all-Haiku rate and about 3x the 80/15/5 target. If it is 100% Haiku, you are probably under-investing in the 15% of tasks that earn the Sonnet upgrade.
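How large the overpayment is depends on the baseline you compare against. Here is a minimal sketch, assuming every request uses the same number of tokens regardless of tier, which is a simplification. `RELATIVE_PRICE` and `blended_rate` are illustrative names.

```python
# Price per token relative to Haiku, from the pricing table ($1 / $3 / $5 input;
# the output prices have the same 1:3:5 ratio).
RELATIVE_PRICE = {"haiku": 1.0, "sonnet": 3.0, "opus": 5.0}


def blended_rate(mix):
    """Weighted-average price of a traffic mix, as a multiple of the Haiku rate."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * RELATIVE_PRICE[model] for model, share in mix.items())


target = blended_rate({"haiku": 0.80, "sonnet": 0.15, "opus": 0.05})
inverted = blended_rate({"haiku": 0.05, "sonnet": 0.15, "opus": 0.80})
print(round(target, 2), round(inverted, 2), round(inverted / target, 2))
# prints: 1.5 4.5 3.0
```

Under that assumption the inverted mix costs 4.5x the all-Haiku rate and 3x the 80/15/5 target. Real traffic will differ because requests sent to Opus tend to carry more tokens.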
Nine concrete use cases, measured
Every example below uses realistic numbers calculated from published Anthropic pricing and typical production patterns. Volumes are illustrative monthly figures.
Use case 1: Intent classification (chat router)
Task: 800-token input, output is one of 12 labels. Traffic: 120,000/month.
| Model | Cost/month | Accuracy | p50 latency |
|---|---|---|---|
| Haiku | $36 | 97.1% | 320ms |
| Sonnet | $108 | 97.4% | 560ms |
| Opus | $180 | 97.3% | 940ms |
Verdict: Haiku. The 0.3pp accuracy gap does not justify 3-5x cost. Lock it in and move on.
Use case 2: Extraction from unstructured text
Task: Pull structured fields from 3K-token emails. Traffic: 40,000/month.
| Model | Cost/month | Field-level accuracy |
|---|---|---|
| Haiku | $18 | 89% |
| Sonnet | $54 | 96% |
| Opus | $90 | 96.5% |
Verdict: Sonnet. The 7pp jump from Haiku to Sonnet is material for the downstream workflow; Opus is indistinguishable from Sonnet, not worth the price.
Use case 3: Code review on pull requests
Task: Review PR diffs (avg 1,200 lines, ~15K tokens), emit structured findings. Traffic: 2,000/month.
| Model | Cost/month | Precision on real bugs |
|---|---|---|
| Haiku | $1.50 | 41% |
| Sonnet | $4.50 | 82% |
| Opus | $7.50 | 85% |
Verdict: Sonnet. Haiku's precision on nuanced bugs is too low to be useful; Opus is only 3pp better for 67% more cost. Sonnet is the answer.
Use case 4: Customer support reply drafting
Task: Generate first-draft replies to tickets (avg 1,500 input, 250 output). Traffic: 25,000/month.
| Model | Cost/month | Human acceptance rate |
|---|---|---|
| Haiku | $41 | 71% |
| Sonnet | $125 | 88% |
| Opus | $206 | 89% |
Verdict: Sonnet, with a Haiku fast-path for simple categories (FAQ, shipping status). After adding a router that sends simple tickets to Haiku and complex to Sonnet, total cost came out to $64/month at 87% acceptance. See the routing pattern in my model routing guide.
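A rough sketch of the fast-path router follows. The category strings are hypothetical labels for the "FAQ" and "shipping status" categories, and `haiku_share_for_target` assumes every ticket costs the same on a given model, which the text does not state.

```python
# Hypothetical category labels for the simple tickets named in the text.
SIMPLE_CATEGORIES = {"faq", "shipping_status"}


def route_ticket(category: str) -> str:
    """Send simple categories to Haiku and everything else to Sonnet."""
    return "haiku" if category.lower() in SIMPLE_CATEGORIES else "sonnet"


def haiku_share_for_target(haiku_cost, sonnet_cost, target_cost):
    """Share of traffic that must go to Haiku to hit target_cost.

    Assumes uniform cost per ticket, so the blended cost is a straight
    line between the all-Haiku and all-Sonnet totals.
    """
    return (sonnet_cost - target_cost) / (sonnet_cost - haiku_cost)


# Monthly totals from the table above, plus the $64 routed total.
share = haiku_share_for_target(41.0, 125.0, 64.0)
print(route_ticket("FAQ"), route_ticket("billing_dispute"), round(share, 2))
```

Under the uniform-cost assumption, the $64 total corresponds to a little under three quarters of tickets taking the Haiku path.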
Use case 5: Long document summarization
Task: Summarize 80K-token legal contracts into a 600-word brief. Traffic: 500/month.
| Model | Cost/month | Factual accuracy (graded) |
|---|---|---|
| Haiku | $26 | 78% (misses clauses) |
| Sonnet | $72 | 92% |
| Opus | $120 | 96% |
Verdict: Opus. For legal content, the 4pp accuracy gap matters because a missed clause is a real liability. This is one of the narrow cases where Opus earns its price.
Use case 6: SQL generation from natural language
Task: Translate English questions to PostgreSQL queries against a 40-table schema. Traffic: 8,000/month.
| Model | Cost/month | Executes correctly first try |
|---|---|---|
| Haiku | $5 | 62% |
| Sonnet | $15 | 87% |
| Opus | $25 | 91% |
Verdict: Sonnet. The jump from Haiku is large; the jump from Sonnet to Opus is small. Sonnet plus a retry mechanism (the caught database error is fed back so the model can self-correct) reaches 94% at $18/month, which is better than raw Opus and cheaper.
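The retry mechanism can be sketched without tying it to a particular client library. In this illustration `generate` and `execute` are hypothetical callables you would supply: the first wraps the model call, the second runs the query and returns an error message or `None`. The prompt wording and the two-attempt limit are assumptions too.

```python
from typing import Callable, Optional, Tuple


def generate_sql_with_retry(
    question: str,
    generate: Callable[[str], str],           # hypothetical: prompt -> SQL text
    execute: Callable[[str], Optional[str]],  # hypothetical: SQL -> error or None
    max_attempts: int = 2,
) -> Tuple[Optional[str], int]:
    """Return (sql, attempts_used); sql is None if every attempt failed."""
    prompt = question
    for attempt in range(1, max_attempts + 1):
        sql = generate(prompt)
        error = execute(sql)
        if error is None:
            return sql, attempt
        # Feed the failed query and the database error back to the model.
        prompt = (
            f"{question}\n\nPrevious attempt:\n{sql}\n\n"
            f"Database error:\n{error}\n\nReturn a corrected query."
        )
    return None, max_attempts


# Stand-ins for the model and the database so the sketch runs offline.
def fake_generate(prompt: str) -> str:
    if "Database error" in prompt:
        return "SELECT count(*) FROM orders"
    return "SELECT count(*) FROM order_items_typo"


def fake_execute(sql: str) -> Optional[str]:
    return None if sql.endswith("FROM orders") else "relation does not exist"


result = generate_sql_with_retry("How many orders?", fake_generate, fake_execute)
print(result)
```

Each retry costs an extra call with a longer prompt, so the retry rate belongs in the cost comparison alongside the final accuracy.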
Use case 7: Image description for alt text
Task: Caption product photos for accessibility alt text. Traffic: 10,000/month.
| Model | Cost/month | Editor acceptance rate |
|---|---|---|
| Haiku | $30 | 81% |
| Sonnet | $90 | 89% |
| Opus | $150 | 90% |
Verdict: Haiku. Being 8pp below Sonnet sounds bad, but our human editors accept 81% of Haiku's captions for edit-and-ship, and the remainder are quick rewrites. Paying 3x for Sonnet's extra 8pp is not worth it at this volume.
Use case 8: Agentic tool use (research agent)
Task: Multi-turn agent with web search + file tools; answers research questions in 3-8 turns. Traffic: 3,000/month.
| Model | Cost/month | Task completion rate |
|---|---|---|
| Haiku | $28 | 58% |
| Sonnet | $85 | 79% |
| Opus | $140 | 88% |
Verdict: Sonnet default, Opus for hard questions escalated via a classifier. A Haiku classifier deciding Sonnet-vs-Opus adds <$1/month and captures most of Opus's win on the hard 20%. Total: ~$100/month at 85% completion, better than Sonnet alone.
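The escalation arithmetic can be checked with a short sketch. It assumes easy and hard questions cost the same per request on a given model; hard questions probably use more turns, so treat the result as a floor. `is_hard` is a hypothetical classifier callable (for example, a Haiku prompt that returns a boolean).

```python
def route_question(question, is_hard):
    """Escalate to Opus only when the (hypothetical) classifier flags the question."""
    return "opus" if is_hard(question) else "sonnet"


def routed_monthly_cost(default_cost, escalated_cost, escalated_share, classifier_cost=0.0):
    """Blend two all-traffic monthly totals by the share of escalated traffic."""
    return (
        (1 - escalated_share) * default_cost
        + escalated_share * escalated_cost
        + classifier_cost
    )


# Table values: $85 all-Sonnet, $140 all-Opus; 20% escalated; $1 for the classifier.
estimate = routed_monthly_cost(85.0, 140.0, 0.20, classifier_cost=1.0)
print(round(estimate, 2))
```

That comes to roughly $97/month under the uniform-cost assumption, in line with the ~$100 figure.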
Use case 9: Architectural design review
Task: Review a technical design doc (30-120K tokens of context including diagrams), emit critique. Traffic: 80/month.
| Model | Cost/month | Reviewer quality (1-10) |
|---|---|---|
| Haiku | $8 | 5.2 |
| Sonnet | $24 | 7.4 |
| Opus | $40 | 8.9 |
Verdict: Opus. Low volume, high value per call. This is exactly the shape of task where Opus is worth it: rare, high-stakes, long context, deep reasoning.
When to escalate from your default
If Haiku is your default for a use case, escalate to Sonnet when:
- Your eval set shows < 85% accuracy on the business metric and the failures are reasoning errors, not extraction errors.
- The downstream cost of a bad output is material (customer-visible, billing, security).
- Token ratio is input-heavy (>5:1 input:output); the premium is smaller in absolute dollars.
Escalate from Sonnet to Opus when:
- Tasks involve synthesis across >200K tokens of context.
- Correctness is legally or financially material.
- You have measurable evidence Opus wins, not just a hunch.
- You have exhausted cheaper improvements (better prompts, better caching, more specific schemas).
When to downshift from your default
If Opus or Sonnet is your current default, test downshifting when:
- You added prompt caching and the amortized input cost is now negligible; the remaining savings from downshifting come mostly from output tokens, so re-run the comparison before deciding.
- Task traffic grew and the cheaper model's cost savings now outweigh its lower accuracy.
- You can add a retry mechanism on the cheaper model (self-correction) that reaches equivalent final accuracy at lower total cost.
Downshifting is under-practiced. Teams hold on to "better model = safer choice" long after the safety has become overkill.
Running a model comparison correctly
The only way to pick right is measurement. A minimum-viable comparison:
1. Assemble a labeled eval set (100+ examples minimum; more is better).
2. Run each candidate model against the set with the same prompt.
3. Score automatically where possible (regex match, SQL execution, JSON schema validation).
4. Human-score where necessary (subjective quality). Use 2 raters and take agreement.
5. Compare accuracy AND cost. The dominant choice is whichever Pareto-wins on your metrics.
Most teams skip step 1 and "just try both for a day," which is not a comparison; it is a vibe check. Vibe checks consistently over-select Opus.
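The scoring and Pareto steps can be sketched in a few lines. This is an illustration under assumptions: `accuracy` uses exact match, which only suits label-style outputs, and the `Result` type and function names are made up for the example. The sample data is the Use case 1 table.

```python
from typing import List, NamedTuple, Sequence


class Result(NamedTuple):
    model: str
    cost: float      # USD per month
    accuracy: float  # percent


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Exact-match accuracy in percent over a labeled eval set."""
    hits = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * hits / len(labels)


def pareto_front(results: List[Result]) -> List[str]:
    """Keep models that no other model beats on cost and accuracy at once."""
    front = []
    for r in results:
        dominated = any(
            o.cost <= r.cost and o.accuracy >= r.accuracy
            and (o.cost < r.cost or o.accuracy > r.accuracy)
            for o in results
        )
        if not dominated:
            front.append(r.model)
    return front


# Numbers from the Use case 1 table (intent classification).
use_case_1 = [
    Result("haiku", 36.0, 97.1),
    Result("sonnet", 108.0, 97.4),
    Result("opus", 180.0, 97.3),
]
print(pareto_front(use_case_1))
# prints: ['haiku', 'sonnet']
```

Opus drops out because Sonnet is both cheaper and more accurate there. Choosing between the remaining two is a question of whether the extra 0.3pp is worth 3x the cost.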
See also
- Claude Sonnet 4.6 vs Opus 4.7: When to Pay More (2026) - Cost vs performance breakdown: the 5 workloads where Opus ROI is positive and the 4 cases where Sonnet wins.
- Cost & performance benchmark - single-page citation source for all measured numbers across the site.
- Claude API Cost Calculator - interactive estimator with the optimizations in this article.
- Claude vs OpenAI API: Pricing & Performance Comparison 2026 - how Claude's three tiers compare to GPT-4o and GPT-4o mini on cost and quality.
- Claude API Production Architecture - routing logic, caching strategy, and fallback patterns for production systems.
FAQ
What about Claude 4 Sonnet Thinking mode? Extended thinking is a Sonnet/Opus feature that lets the model emit internal reasoning tokens (billed as output). It helps on hard reasoning tasks but adds 30-60% to output cost. Use it for tasks like Use Cases 5 and 9; skip it for classification and extraction.
What about models outside the Claude family? For teams with production Claude usage, the cross-provider cost comparison is real but operationally complex. The within-Claude decision is simpler and produces 70% of the possible savings. Make that decision first, then evaluate multi-provider.
Should I use the 1M context window? Only when the task genuinely needs >200K tokens. At 800K input on Opus, a single request is $8 before output. Reserve for document-scale synthesis (Use Case 5, 9).
How often should I re-evaluate model choice? Quarterly. Anthropic ships model updates and price changes; a choice that was right six months ago may be suboptimal now. I re-run my top 3 workloads against all three current-generation models every quarter.
What's the single biggest mistake teams make? Defaulting to Opus "to be safe" on high-volume, low-stakes workloads. Running a chatbot classifier on Opus at 100K requests/month is a $500 mistake; Haiku handles the task equally well.
Summary
Match the model to the task with a decision tree and evidence, not reflex. Haiku for 80% of traffic, Sonnet for the valuable 15%, Opus for the rare 5% where it measurably wins. Review quarterly. Measure before and after every change. The teams that do this save 60-80% versus an Opus-default strategy without quality loss. The full playbook is in my Claude API cost optimization masterclass.
Frequently Asked Questions
What is the difference between Claude Haiku, Sonnet, and Opus?
Claude Haiku 4.5 is the fastest and cheapest model ($1/$5 per 1M tokens), best for classification, extraction, and high-volume tasks. Claude Sonnet 4.6 ($3/$15) is the mid-tier workhorse for code review, structured generation, and most customer-facing work. Claude Opus 4.7 ($5/$25) is the most capable, reserved for hard reasoning, long-context synthesis, and high-stakes correctness.
When should I use Claude Haiku instead of Sonnet?
Use Haiku for tasks with short inputs (under 2,000 tokens), deterministic outputs, or high request volume, such as intent classification, label extraction, and routing. In the intent-classification example above, Haiku came within 0.3 percentage points of Sonnet at one third of the cost. Switch to Sonnet when eval accuracy drops below your quality threshold.
How much cheaper is Haiku than Opus?
Haiku is 5x cheaper than Opus on both input and output tokens. A workload costing $100/month on Haiku costs $500/month on Opus with equivalent token counts. For output-heavy tasks the gap is the same: Haiku output is $5/1M versus $25/1M for Opus.
What is the 80/15/5 model routing rule?
The 80/15/5 rule is a practical starting point for Claude API cost optimization: route 80% of your traffic to Haiku, 15% to Sonnet, and 5% to Opus. Teams using an inverted distribution (most traffic on Opus) typically pay 4-5x what the same traffic would cost on Haiku. The rule is a target, not a guarantee; measure your own eval metrics and adjust from there.
Take It Further
Claude API Cost Optimization Masterclass: Cut your Claude API bill by 60-90% without sacrificing quality. 12 optimization scenarios analyzed. The concrete order-of-operations: prompt caching, model tiering, Batch API, token compression.
PDF guide + 6-sheet Excel cost calculator. Example scenario: $2,100 → $187/month on a customer support agent.