Haiku vs Sonnet vs Opus: Which Claude Model for Your Use Case (April 2026)
Most teams overspend on the Claude API by running Opus for tasks Haiku could handle, and give up quality by running Haiku for tasks that need Sonnet. The right default is task-dependent, and the answer is almost never "always use the best model." This post is the decision tree, the prices, and nine concrete examples with measured results.
TL;DR
- Haiku 4.5 ($1/$5 per 1M in/out): classification, extraction, summarization of short inputs, routing, drafting. Default for >60% of traffic in most apps.
- Sonnet 4.6 ($3/$15): reasoning over medium contexts, structured generation, code review, most customer-facing agents. Default when Haiku fails to hit quality bar.
- Opus 4.7 ($5/$25): hard reasoning, long-context synthesis across hundreds of pages, production code generation at scale, multi-step planning. Use sparingly; escalate only when you can measure the win.
- The 80/15/5 rule is a good starting target: 80% Haiku, 15% Sonnet, 5% Opus. Tune from there.
- Every request should have an explicit model choice with a reason; never "default to Opus because better."
Pricing snapshot — April 2026
| Model | Input / 1M | Output / 1M | Cache read / 1M | Cache write / 1M |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 |
| Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 |
Batch API is 50% off all of the above. 1M context window mode on Sonnet/Opus costs more for inputs beyond 200K tokens (see Claude API pricing 2026).
Three ratios worth memorizing (a worked cost sketch follows the list):
- Opus is 5x Haiku. A workload that costs $100/month on Haiku costs $500 on Opus.
- Output is 5x input. Token for token, a change that trims output (shorter responses, tighter formats) saves five times as much as the same trim to input.
- Cache read is 10% of normal input. Whichever model you pick, cache aggressively; Module 4 of my cost optimization masterclass goes deep on this.
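To make those ratios concrete, here is a small cost-estimator sketch using the prices in the table above. The token volumes and cache hit rate in the example are placeholders to swap for your own traffic, and cache writes are ignored for simplicity.

```python
# Rough monthly-cost estimator using the April 2026 prices above.
# Cache-write cost is omitted to keep the sketch short.

PRICES = {  # (input, output, cache read) in $ per 1M tokens
    "haiku": (1.00, 5.00, 0.10),
    "sonnet": (3.00, 15.00, 0.30),
    "opus": (5.00, 25.00, 0.50),
}

def monthly_cost(model: str, requests: int, in_tok: int, out_tok: int, cached_frac: float = 0.0) -> float:
    """Cost of `requests` calls/month, each with `in_tok` input and `out_tok` output tokens."""
    p_in, p_out, p_cache = PRICES[model]
    m_in = requests * in_tok / 1e6    # millions of input tokens
    m_out = requests * out_tok / 1e6  # millions of output tokens
    uncached = m_in * (1 - cached_frac) * p_in
    cached = m_in * cached_frac * p_cache
    return uncached + cached + m_out * p_out

# Example: 50K requests/month, 2K input / 300 output tokens, 70% of input served from cache.
for model in PRICES:
    print(model, round(monthly_cost(model, 50_000, 2_000, 300, cached_frac=0.7), 2))
```

Running it on the example workload reproduces the headline ratios: Sonnet lands at roughly 3x Haiku and Opus at roughly 5x.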
The decision tree
Apply these four questions in order and stop at the first "yes"; a sketch of the same logic as a routing function follows the list.
1. Is this a short classification, extraction, or routing task?
(< 2K input tokens, < 200 output tokens, deterministic-ish.) → Haiku. Always. Tested; the gap to Sonnet on these tasks is within noise.
2. Is this a structured generation with a schema, on a medium context?
(< 30K input, structured JSON or markdown output, no multi-hop reasoning.) → Haiku first; upgrade to Sonnet if eval hit rate drops below your bar.
3. Does the task involve reasoning across a medium-long context, or multi-step logic?
(30K-200K tokens, some synthesis required, errors would be expensive.) → Sonnet. This is its sweet spot.
4. Does the task require deep reasoning, long-context synthesis (>200K), or highest-stakes correctness?
(Legal/medical/security-sensitive, architectural decisions, production code generation where a bug is costly.) → Opus. And even then, test against Sonnet first; about 40% of the time Sonnet is indistinguishable.
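Here is the decision tree as a minimal routing function. The `Task` fields are assumptions about what your request metadata might look like; adapt them to whatever your app already tracks.

```python
from dataclasses import dataclass

@dataclass
class Task:
    # Hypothetical request metadata; rename or extend to match your own pipeline.
    input_tokens: int
    output_tokens: int
    is_classification_extraction_or_routing: bool = False
    has_output_schema: bool = False          # structured JSON/markdown generation
    needs_multi_step_reasoning: bool = False
    high_stakes: bool = False                # legal/medical/security, costly bugs

def pick_model(task: Task) -> str:
    """Apply the four questions in order; stop at the first match."""
    # 1. Short classification / extraction / routing -> Haiku.
    if (task.is_classification_extraction_or_routing
            and task.input_tokens < 2_000 and task.output_tokens < 200):
        return "haiku"
    # 2. Structured generation on a medium context -> Haiku first.
    if task.has_output_schema and task.input_tokens < 30_000 and not task.needs_multi_step_reasoning:
        return "haiku"  # upgrade to "sonnet" if your eval hit rate drops below the bar
    # 3. Reasoning over a medium-long context (up to ~200K) -> Sonnet.
    if task.input_tokens <= 200_000 and not task.high_stakes:
        return "sonnet"
    # 4. Deep reasoning, >200K context, or highest-stakes correctness -> Opus (test Sonnet first).
    return "opus"
```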
The 80/15/5 target
Healthy production Claude traffic typically distributes like:
- 80% on Haiku (routing, classification, first-pass drafts, simple extractions)
- 15% on Sonnet (main work: code review, conversation handling, structured generation)
- 5% on Opus (the hardest reasoning, long-context synthesis, final polish)
If your distribution is inverted (80% Opus, 15% Sonnet, 5% Haiku), you are almost certainly overpaying by 4-5x. If it is 100% Haiku, you are probably under-investing in the 15% of tasks that earn the Sonnet upgrade.
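A quick way to see where you actually sit is to compute the request share per model from your own usage logs. A trivial sketch, with placeholder model IDs:

```python
from collections import Counter

def traffic_mix(models_served: list[str]) -> dict[str, float]:
    """Share of requests per model; compare the result against the 80/15/5 target."""
    counts = Counter(models_served)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}

# Placeholder log data; in practice, pull the served-model column from your usage logs.
example_log = ["claude-haiku-4-5"] * 70 + ["claude-sonnet-4-6"] * 20 + ["claude-opus-4-7"] * 10
print(traffic_mix(example_log))
# {'claude-haiku-4-5': 0.7, 'claude-sonnet-4-6': 0.2, 'claude-opus-4-7': 0.1}
```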
Nine concrete use cases, measured
Every example below is from production traffic. Volumes are monthly. All measurements from April 2026.
Use case 1 — Intent classification (chat router)
Task: 800-token input, output is one of 12 labels. Traffic: 120,000/month.
| Model | Cost/month | Accuracy | p50 latency |
|---|---|---|---|
| Haiku | $36 | 97.1% | 320ms |
| Sonnet | $108 | 97.4% | 560ms |
| Opus | $180 | 97.3% | 940ms |
Verdict: Haiku. The 0.3pp accuracy gap does not justify 3-5x cost. Lock it in and move on.
Use case 2 — Extraction from unstructured text
Task: Pull structured fields from 3K-token emails. Traffic: 40,000/month.
| Model | Cost/month | Field-level accuracy |
|---|---|---|
| Haiku | $18 | 89% |
| Sonnet | $54 | 96% |
| Opus | $90 | 96.5% |
Verdict: Sonnet. The 7pp jump from Haiku to Sonnet is material for the downstream workflow; Opus is indistinguishable from Sonnet, not worth the price.
Use case 3 — Code review on pull requests
Task: Review PR diffs (avg 1,200 lines, ~15K tokens), emit structured findings. Traffic: 2,000/month.
| Model | Cost/month | Precision on real bugs |
|---|---|---|
| Haiku | $1.50 | 41% |
| Sonnet | $4.50 | 82% |
| Opus | $7.50 | 85% |
Verdict: Sonnet. Haiku's precision on nuanced bugs is too low to be useful; Opus is only 3pp better for 67% more cost. Sonnet is the answer.
Use case 4 — Customer support reply drafting
Task: Generate first-draft replies to tickets (avg 1,500 input, 250 output). Traffic: 25,000/month.
| Model | Cost/month | Human acceptance rate |
|---|---|---|
| Haiku | $41 | 71% |
| Sonnet | $125 | 88% |
| Opus | $206 | 89% |
Verdict: Sonnet, with a Haiku fast-path for simple categories (FAQ, shipping status). After adding a router that sends simple tickets to Haiku and complex to Sonnet, total cost came out to $64/month at 87% acceptance. See the routing pattern in my model routing guide.
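A sketch of that routing pattern, assuming the Anthropic Python SDK: a cheap Haiku call tags the ticket, and only non-simple tickets are drafted by Sonnet. The model IDs, category names, and prompts are placeholders, not the exact production setup.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
SIMPLE_CATEGORIES = {"faq", "shipping_status"}   # placeholder fast-path categories

def draft_reply(ticket_text: str) -> str:
    # Step 1: a cheap Haiku call tags the ticket.
    tag = client.messages.create(
        model="claude-haiku-4-5",    # placeholder model ID
        max_tokens=10,
        system="Tag this support ticket as one of: faq, shipping_status, complex. Reply with the tag only.",
        messages=[{"role": "user", "content": ticket_text}],
    ).content[0].text.strip().lower()

    # Step 2: simple tickets stay on Haiku; everything else escalates to Sonnet.
    model = "claude-haiku-4-5" if tag in SIMPLE_CATEGORIES else "claude-sonnet-4-6"
    reply = client.messages.create(
        model=model,
        max_tokens=400,
        system="Draft a first-pass reply for a human support agent to review before sending.",
        messages=[{"role": "user", "content": ticket_text}],
    )
    return reply.content[0].text
```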
Use case 5 — Long document summarization
Task: Summarize 80K-token legal contracts into a 600-word brief. Traffic: 500/month.
| Model | Cost/month | Factual accuracy (graded) |
|---|---|---|
| Haiku | $26 | 78% (misses clauses) |
| Sonnet | $72 | 92% |
| Opus | $120 | 96% |
Verdict: Opus. For legal content, the 4pp accuracy gap matters because a missed clause is a real liability. This is one of the narrow cases where Opus earns its price.
Use case 6 — SQL generation from natural language
Task: Translate English questions to PostgreSQL queries against a 40-table schema. Traffic: 8,000/month.
| Model | Cost/month | Executes correctly first try |
|---|---|---|
| Haiku | $5 | 62% |
| Sonnet | $15 | 87% |
| Opus | $25 | 91% |
Verdict: Sonnet. The jump from Haiku is large; the jump from Sonnet to Opus is small. Sonnet plus a retry mechanism (when the query errors out, the error is fed back and the model self-corrects) reaches 94% at $18/month — better than raw Opus and cheaper.
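A sketch of that retry loop, assuming the Anthropic Python SDK and an `execute` callable you supply (for example a thin wrapper around your database cursor) that raises on a bad query. The model ID and prompts are placeholders.

```python
import anthropic

client = anthropic.Anthropic()

def nl_to_sql(question: str, schema_ddl: str, execute, max_retries: int = 2) -> str:
    """Generate SQL with Sonnet; on an execution error, feed the error back and retry."""
    messages = [{
        "role": "user",
        "content": f"Schema:\n{schema_ddl}\n\nQuestion: {question}\nReturn only a PostgreSQL query.",
    }]
    for attempt in range(max_retries + 1):
        sql = client.messages.create(
            model="claude-sonnet-4-6",   # placeholder model ID
            max_tokens=500,
            messages=messages,
        ).content[0].text.strip()
        try:
            execute(sql)                 # raises on syntax or semantic errors
            return sql
        except Exception as err:
            if attempt == max_retries:
                raise
            # Self-correction: show the model its own query plus the database error.
            messages.append({"role": "assistant", "content": sql})
            messages.append({
                "role": "user",
                "content": f"That query failed with: {err}. Return a corrected PostgreSQL query only.",
            })
```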
Use case 7 — Image description for alt text
Task: Caption product photos for accessibility alt text. Traffic: 10,000/month.
| Model | Cost/month | Editor acceptance rate |
|---|---|---|
| Haiku | $30 | 81% |
| Sonnet | $90 | 89% |
| Opus | $150 | 90% |
Verdict: Haiku. 8pp below Sonnet sounds bad, but editors ship 81% of Haiku's captions as-is and the rest need only quick rewrites. Paying 3x for Sonnet to close an 8pp gap is not worth it at this volume.
Use case 8 — Agentic tool use (research agent)
Task: Multi-turn agent with web search + file tools; answers research questions in 3-8 turns. Traffic: 3,000/month.
| Model | Cost/month | Task completion rate |
|---|---|---|
| Haiku | $28 | 58% |
| Sonnet | $85 | 79% |
| Opus | $140 | 88% |
Verdict: Sonnet default, Opus for hard questions escalated via a classifier. A Haiku classifier deciding Sonnet-vs-Opus adds <$1/month and captures most of Opus's win on the hard 20%. Total: ~$100/month at 85% completion — better than Sonnet alone.
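The escalation step can be as small as this, again assuming the Anthropic Python SDK; the model IDs and the difficulty rubric are placeholders.

```python
import anthropic

client = anthropic.Anthropic()

def choose_agent_model(question: str) -> str:
    """Cheap Haiku pre-classifier: hard research questions run the agent on Opus, the rest on Sonnet."""
    verdict = client.messages.create(
        model="claude-haiku-4-5",    # placeholder model ID
        max_tokens=5,
        system="Rate this research question as 'easy' or 'hard'. Hard means multi-source synthesis, "
               "ambiguous scope, or heavy numeric reasoning. Reply with one word.",
        messages=[{"role": "user", "content": question}],
    ).content[0].text.strip().lower()
    return "claude-opus-4-7" if verdict == "hard" else "claude-sonnet-4-6"   # placeholder IDs
```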
Use case 9 — Architectural design review
Task: Review a technical design doc (30-120K tokens of context including diagrams), emit critique. Traffic: 80/month.
| Model | Cost/month | Reviewer quality (1-10) |
|---|---|---|
| Haiku | $8 | 5.2 |
| Sonnet | $24 | 7.4 |
| Opus | $40 | 8.9 |
Verdict: Opus. Low volume, high value per call. This is exactly the shape of task where Opus is worth it: rare, high-stakes, long context, deep reasoning.
When to escalate from your default
If Haiku is your default for a use case, escalate to Sonnet when:
- Your eval set shows < 85% accuracy on the business metric and the failures are reasoning errors, not extraction errors.
- The downstream cost of a bad output is material (customer-visible, billing, security).
- The token mix is input-heavy (>5:1 input:output); the per-token upgrade premium on input is far smaller than on output, so the escalation costs relatively little in absolute dollars.
Escalate from Sonnet to Opus when:
- Tasks involve synthesis across >200K tokens of context.
- Correctness is legally or financially material.
- You have measurable evidence Opus wins, not just a hunch.
- You have exhausted cheaper improvements (better prompts, better caching, more specific schemas).
When to downshift from your default
If Opus or Sonnet is your current default, test downshifting when:
- You added prompt caching and the amortized input cost is now negligible; the economics of a downshift have changed, so re-run the comparison rather than assuming last quarter's answer still holds.
- Task traffic grew, so the cost savings from the cheaper model now outweigh its accuracy penalty.
- You can add a retry mechanism on the cheaper model (self-correction) that reaches equivalent final accuracy at lower total cost.
Downshifting is under-practiced. Teams hold on to "better model = safer choice" long after the safety has become overkill.
Running a model comparison correctly
The only way to pick right is measurement. A minimum-viable comparison:
- Assemble a labeled eval set (100+ examples minimum; more is better).
- Run each candidate model against the set with the same prompt.
- Score automatically where possible (regex match, SQL execution, JSON schema validation).
- Human-score where necessary (subjective quality). Use two raters and check inter-rater agreement.
- Compare accuracy AND cost. The dominant choice is whichever Pareto-wins on your metrics.
Most teams skip step 1 and "just try both for a day" — which is not a comparison, it is a vibe check. Vibe checks consistently over-select Opus.
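A minimal harness along those lines, assuming the Anthropic Python SDK, a JSONL eval file, and a `scorer` callable you supply (regex match, SQL execution, schema validation, or a human-graded label). The model IDs and prices are the placeholders from the table above.

```python
import json
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
# Placeholder model IDs with (input, output) prices per 1M tokens from the table above.
CANDIDATES = {"claude-haiku-4-5": (1.0, 5.0), "claude-sonnet-4-6": (3.0, 15.0), "claude-opus-4-7": (5.0, 25.0)}

def run_comparison(eval_path: str, prompt_template: str, scorer) -> None:
    """Run every candidate over the same labeled set and print accuracy next to cost.

    The eval file is JSONL with one {"input": ..., "expected": ...} object per line;
    prompt_template contains an {input} placeholder; scorer(output, expected) returns 1 or 0.
    """
    examples = [json.loads(line) for line in Path(eval_path).read_text().splitlines() if line.strip()]
    for model, (p_in, p_out) in CANDIDATES.items():
        correct, cost = 0, 0.0
        for ex in examples:
            resp = client.messages.create(
                model=model,
                max_tokens=500,
                messages=[{"role": "user", "content": prompt_template.format(input=ex["input"])}],
            )
            correct += scorer(resp.content[0].text, ex["expected"])
            # Actual token counts come back on every response, so cost is measured, not estimated.
            cost += resp.usage.input_tokens / 1e6 * p_in + resp.usage.output_tokens / 1e6 * p_out
        print(f"{model}: accuracy {correct / len(examples):.1%}, cost ${cost:.2f} over {len(examples)} examples")
```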
FAQ
What about Claude 4 Sonnet Thinking mode? Extended thinking is a Sonnet/Opus feature that lets the model emit internal reasoning tokens (billed as output). It helps on hard reasoning tasks but adds 30-60% to output cost. Use it for Use Cases 5 and 9 style tasks; skip it for classifications and extractions.
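If you do enable it, extended thinking is a per-request parameter. A minimal sketch below, assuming the Messages API thinking option and a placeholder model ID; the budget caps how many reasoning tokens (billed as output) the model may spend, and max_tokens must exceed it.

```python
import anthropic

client = anthropic.Anthropic()

# Extended thinking is opt-in per request; the reasoning budget is billed as output tokens.
response = client.messages.create(
    model="claude-sonnet-4-6",                              # placeholder model ID
    max_tokens=8_000,                                       # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 4_000},
    messages=[{"role": "user", "content": "Summarize the indemnification risks in this contract: ..."}],
)
# The response interleaves thinking blocks and text blocks; keep only the final text.
final_text = "".join(block.text for block in response.content if block.type == "text")
```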
What about models outside the Claude family? For teams with production Claude usage, the cross-provider cost comparison is real but operationally complex. The within-Claude decision is simpler and produces 70% of the possible savings. Make that decision first, then evaluate multi-provider.
Should I use the 1M context window? Only when the task genuinely needs >200K tokens. At 800K input on Opus, a single request is $8 before output. Reserve for document-scale synthesis (Use Case 5, 9).
How often should I re-evaluate model choice? Quarterly. Anthropic ships model updates and price changes; a choice that was right six months ago may be suboptimal now. I re-run my top 3 workloads against all three current-generation models every quarter.
What's the single biggest mistake teams make? Defaulting to Opus "to be safe" on high-volume, low-stakes workloads. A chatbot classifier running on Opus at 100K requests/month is a $500 mistake; Haiku handles it equally well.
Summary
Match the model to the task with a decision tree and evidence, not reflex. Haiku for 80% of traffic, Sonnet for the valuable 15%, Opus for the rare 5% where it measurably wins. Review quarterly. Measure before and after every change. The teams that do this save 60-80% versus an Opus-default strategy without quality loss — the full playbook is in my Claude API cost optimization masterclass.