
Haiku vs Sonnet vs Opus: Which Claude Model for Your Use Case (April 2026)

A decision tree plus nine concrete use cases showing which Claude model is the right default, what you pay, and when you should escalate or downshift.


Most teams overspend on the Claude API by running Opus for tasks Haiku could handle, and pay an unnecessary latency penalty by running Haiku for tasks that need Sonnet. The right default is task-dependent, and the answer is almost never "always use the best model." This post is the decision tree, the prices, and nine concrete examples with measured results.

TL;DR

Default to Haiku, upgrade to Sonnet where your evals demand it, and reserve Opus for the rare tasks where it measurably wins. A healthy traffic split is roughly 80% Haiku, 15% Sonnet, 5% Opus. Re-evaluate quarterly.

Pricing snapshot — April 2026

| Model | Input / 1M | Output / 1M | Cache read / 1M | Cache write / 1M |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 |
| Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 |

Batch API is 50% off all of the above. 1M context window mode on Sonnet/Opus costs more for inputs beyond 200K tokens (see Claude API pricing 2026).

Three ratios worth memorizing:

  1. Opus is 5x Haiku. A workload that costs $100/month on Haiku costs $500 on Opus.
  2. Output is 5x input. A model swap that reduces output by 30% saves more than the same reduction in input.
  3. Cache read is 10% of normal input. Whichever model you pick, cache aggressively; Module 4 of my cost optimization masterclass goes deep on this.
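Those ratios are easy to sanity-check in code. A minimal sketch, assuming the April 2026 rates in the table above (`monthly_cost` is a hypothetical helper, not an SDK call):

```python
# Per-million-token rates from the April 2026 pricing table above.
PRICES = {
    "haiku":  {"input": 1.00, "output": 5.00,  "cache_read": 0.10},
    "sonnet": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "opus":   {"input": 5.00, "output": 25.00, "cache_read": 0.50},
}

def monthly_cost(model, requests, input_tok, output_tok, cache_hit=0.0):
    """Estimate monthly spend; cache_hit is the fraction of input tokens
    served from cache at the 10x-cheaper cache-read rate."""
    p = PRICES[model]
    m_in = requests * input_tok / 1_000_000
    m_out = requests * output_tok / 1_000_000
    return (m_in * (1 - cache_hit) * p["input"]
            + m_in * cache_hit * p["cache_read"]
            + m_out * p["output"])

# Ratio 1: the same workload is 5x on Opus vs Haiku.
haiku = monthly_cost("haiku", 100_000, 1_000, 200)
opus = monthly_cost("opus", 100_000, 1_000, 200)
print(round(opus / haiku, 1))  # → 5.0
```

Plugging your own request volume and token counts into a function like this before you ship is the fastest way to catch an accidental Opus default.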

The decision tree

Apply these four questions in order. Stop at the first "yes."

1. Is this a short classification, extraction, or routing task?

(< 2K input tokens, < 200 output tokens, deterministic-ish.) → Haiku. Always. We've tested this repeatedly; the gap to Sonnet on these tasks is within noise.

2. Is this a structured generation with a schema, on a medium context?

(< 30K input, structured JSON or markdown output, no multi-hop reasoning.) → Haiku first; upgrade to Sonnet if eval hit rate drops below your bar.

3. Does the task involve reasoning across a medium-long context, or multi-step logic?

(30K-200K tokens, some synthesis required, errors would be expensive.) → Sonnet. This is its sweet spot.

4. Does the task require deep reasoning, long-context synthesis (>200K), or highest-stakes correctness?

(Legal/medical/security-sensitive, architectural decisions, production code generation where a bug is costly.) → Opus. And even then, test against Sonnet first; about 40% of the time Sonnet is indistinguishable.
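The four questions collapse into a routing function. A sketch, with thresholds lifted from the parentheticals above (the boolean flags are illustrative; real traffic would derive them from the request):

```python
def pick_model(input_tokens: int, output_tokens: int,
               structured_output: bool, multi_step_reasoning: bool,
               high_stakes: bool) -> str:
    """Apply the four decision-tree questions in order; stop at the first hit."""
    # Q1: short classification, extraction, or routing.
    if input_tokens < 2_000 and output_tokens < 200:
        return "haiku"
    # Q2: structured generation on a medium context, no multi-hop reasoning.
    if input_tokens < 30_000 and structured_output and not multi_step_reasoning:
        return "haiku"  # upgrade to "sonnet" if eval hit rate drops below your bar
    # Q3: reasoning over a medium-long context (30K-200K tokens).
    if input_tokens <= 200_000 and not high_stakes:
        return "sonnet"
    # Q4: deep reasoning, >200K context, or highest-stakes correctness.
    return "opus"  # and even then, test against "sonnet" first
```

The point is not the exact thresholds; it is that the routing decision is cheap enough to run on every request.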

The 80/15/5 target

Healthy production Claude traffic typically distributes like:

  - ~80% Haiku: high-volume, low-stakes tasks (routing, extraction, simple structured generation)
  - ~15% Sonnet: reasoning and medium-context synthesis
  - ~5% Opus: rare, high-stakes, deep-reasoning calls

If your distribution is inverted (80% Opus, 15% Sonnet, 5% Haiku), you are almost certainly overpaying by 4-5x. If it is 100% Haiku, you are probably under-investing in the 15% of tasks that earn the Sonnet upgrade.

Nine concrete use cases, measured

Every example below is from production traffic. Volumes are monthly. All measurements from April 2026.

Use case 1 — Intent classification (chat router)

Task: 800-token input, output is one of 12 labels. Traffic: 120,000/month.

| Model | Cost/month | Accuracy | p50 latency |
|---|---|---|---|
| Haiku | $36 | 97.1% | 320ms |
| Sonnet | $108 | 97.4% | 560ms |
| Opus | $180 | 97.3% | 940ms |

Verdict: Haiku. A 0.3pp accuracy gap does not justify a 3-5x cost multiple. Lock it in and move on.

Use case 2 — Extraction from unstructured text

Task: Pull structured fields from 3K-token emails. Traffic: 40,000/month.

| Model | Cost/month | Field-level accuracy |
|---|---|
| Haiku | $18 | 89% |
| Sonnet | $54 | 96% |
| Opus | $90 | 96.5% |

Verdict: Sonnet. The 7pp jump from Haiku to Sonnet is material for the downstream workflow; Opus is indistinguishable from Sonnet, not worth the price.

Use case 3 — Code review on pull requests

Task: Review PR diffs (avg 1,200 lines, ~15K tokens), emit structured findings. Traffic: 2,000/month.

| Model | Cost/month | Precision on real bugs |
|---|---|
| Haiku | $1.50 | 41% |
| Sonnet | $4.50 | 82% |
| Opus | $7.50 | 85% |

Verdict: Sonnet. Haiku's precision on nuanced bugs is too low to be useful; Opus is only 3pp better for 67% more cost. Sonnet is the answer.

Use case 4 — Customer support reply drafting

Task: Generate first-draft replies to tickets (avg 1,500 input, 250 output). Traffic: 25,000/month.

| Model | Cost/month | Human acceptance rate |
|---|---|
| Haiku | $41 | 71% |
| Sonnet | $125 | 88% |
| Opus | $206 | 89% |

Verdict: Sonnet, with a Haiku fast-path for simple categories (FAQ, shipping status). After adding a router that sends simple tickets to Haiku and complex ones to Sonnet, total cost came out to $64/month at 87% acceptance. See the routing pattern in my model routing guide.
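A minimal sketch of that fast-path router, assuming a cheap Haiku classification call upstream (the category names and the injected `classify`/`generate` callables are hypothetical):

```python
# Categories the upstream Haiku classifier might emit; simple ones stay on Haiku.
SIMPLE_CATEGORIES = {"faq", "shipping_status"}

def draft_reply(ticket_text: str, classify, generate) -> str:
    """Send simple tickets to Haiku and complex ones to Sonnet.
    `classify` and `generate` are injected callables wrapping the API."""
    category = classify(ticket_text)     # cheap Haiku classification call
    model = "haiku" if category in SIMPLE_CATEGORIES else "sonnet"
    return generate(model, ticket_text)  # draft with the chosen model
```

The classifier itself is a Use Case 1-shaped task, so it adds cents, not dollars, to the monthly bill.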

Use case 5 — Long document summarization

Task: Summarize 80K-token legal contracts into a 600-word brief. Traffic: 500/month.

| Model | Cost/month | Factual accuracy (graded) |
|---|---|
| Haiku | $26 | 78% (misses clauses) |
| Sonnet | $72 | 92% |
| Opus | $120 | 96% |

Verdict: Opus. For legal content, the 4pp accuracy gap matters because a missed clause is a real liability. This is one of the narrow cases where Opus earns its price.

Use case 6 — SQL generation from natural language

Task: Translate English questions to PostgreSQL queries against a 40-table schema. Traffic: 8,000/month.

| Model | Cost/month | Executes correctly first try |
|---|---|
| Haiku | $5 | 62% |
| Sonnet | $15 | 87% |
| Opus | $25 | 91% |

Verdict: Sonnet. The jump from Haiku is large; the jump from Sonnet to Opus is small. Sonnet plus a retry mechanism (on execution error, the failing query and error message are fed back and the model self-corrects) reaches 94% at $18/month, better than raw Opus and cheaper.
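The retry loop is short. A sketch, assuming hypothetical `ask_sonnet` and `run_query` wrappers around the API and your database:

```python
def sql_with_retry(question: str, ask_sonnet, run_query, max_retries: int = 2):
    """Generate SQL with Sonnet; on execution error, feed the error back
    and ask the model to self-correct."""
    prompt = question
    sql = ask_sonnet(prompt)
    for _ in range(max_retries):
        try:
            return run_query(sql)  # success: return the result rows
        except Exception as err:
            # Feed the failing query and the database error back for a fix.
            prompt = (f"{question}\n\nYour previous query failed:\n{sql}\n"
                      f"Error: {err}\nReturn a corrected PostgreSQL query.")
            sql = ask_sonnet(prompt)
    return run_query(sql)  # final attempt; let any error propagate
```

Execution errors are free ground truth, which is why this pattern closes most of the gap to Opus for a fraction of the price.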

Use case 7 — Image description for alt text

Task: Caption product photos for accessibility alt text. Traffic: 10,000/month.

| Model | Cost/month | Editor acceptance rate |
|---|---|
| Haiku | $30 | 81% |
| Sonnet | $90 | 89% |
| Opus | $150 | 90% |

Verdict: Haiku. 8pp below Sonnet sounds bad, but our editors ship 81% of captions unchanged and the rest need only quick rewrites. Paying 3x for Sonnet's 8pp is not worth it at this volume.

Use case 8 — Agentic tool use (research agent)

Task: Multi-turn agent with web search + file tools; answers research questions in 3-8 turns. Traffic: 3,000/month.

| Model | Cost/month | Task completion rate |
|---|---|
| Haiku | $28 | 58% |
| Sonnet | $85 | 79% |
| Opus | $140 | 88% |

Verdict: Sonnet default, Opus for hard questions escalated via a classifier. A Haiku classifier deciding Sonnet-vs-Opus adds <$1/month and captures most of Opus's win on the hard 20%. Total: ~$100/month at 85% completion — better than Sonnet alone.
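The escalation pattern, sketched with hypothetical callables (the Haiku classifier returns a difficulty label; everything else is your existing agent loop):

```python
def answer_research_question(question: str, classify_difficulty, run_agent) -> str:
    """Sonnet by default; a Haiku classifier escalates hard questions to Opus."""
    label = classify_difficulty(question)  # cheap Haiku call: "easy" or "hard"
    model = "opus" if label == "hard" else "sonnet"
    return run_agent(model, question)      # existing multi-turn agent loop
```

Unlike the support-ticket router, the expensive path here is the exception rather than a peer tier, so the classifier's false-positive rate directly sets your Opus spend.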

Use case 9 — Architectural design review

Task: Review a technical design doc (30-120K tokens of context including diagrams), emit critique. Traffic: 80/month.

| Model | Cost/month | Reviewer quality (1-10) |
|---|---|
| Haiku | $8 | 5.2 |
| Sonnet | $24 | 7.4 |
| Opus | $40 | 8.9 |

Verdict: Opus. Low volume, high value per call. This is exactly the shape of task where Opus is worth it: rare, high-stakes, long context, deep reasoning.

When to escalate from your default

If Haiku is your default for a use case, escalate to Sonnet when:

  - your eval hit rate drops below the quality bar you set (the Use Case 2 pattern)
  - the task has grown into multi-step reasoning or a longer context than it started with
  - the downstream cost of an error outweighs the per-call savings

Escalate from Sonnet to Opus when:

  - a measured eval shows a quality gap that matters for the outcome (the Use Case 5 and 9 pattern)
  - the task needs >200K tokens of context or carries legal, medical, or security-grade stakes

When to downshift from your default

If Opus or Sonnet is your current default, test downshifting when:

  - your evals show the cheaper model within noise of the current one (the Use Case 1 pattern)
  - volume has grown to the point where the bill dominates the quality difference
  - a new model generation ships and resets the quality-per-dollar curve

Downshifting is under-practiced. Teams hold on to "better model = safer choice" long after the safety has become overkill.

Running a model comparison correctly

The only way to pick right is measurement. A minimum-viable comparison:

  1. Assemble a labeled eval set (100+ examples minimum; more is better).
  2. Run each candidate model against the set with the same prompt.
  3. Score automatically where possible (regex match, SQL execution, JSON schema validation).
  4. Human-score where necessary (subjective quality). Use 2 raters and take agreement.
  5. Compare accuracy AND cost. The dominant choice is whichever Pareto-wins on your metrics.
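The five steps reduce to a small harness. A sketch, where `call_model`, `score`, and `price_per_call` are assumptions you'd replace with your own API wrapper and metrics:

```python
def compare_models(models, eval_set, call_model, score, price_per_call):
    """Run each candidate over the same labeled eval set with the same
    prompt, then report accuracy alongside cost (steps 2, 3, and 5)."""
    results = {}
    for model in models:
        # Score each example; `score` returns True/False (or 0/1).
        correct = sum(score(call_model(model, ex["input"]), ex["label"])
                      for ex in eval_set)
        results[model] = {
            "accuracy": correct / len(eval_set),
            "cost": price_per_call[model] * len(eval_set),
        }
    return results
```

A model Pareto-wins when no other candidate beats it on accuracy without also costing more; picking from that frontier is step 5.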

Most teams skip step 1 and "just try both for a day" — which is not a comparison, it is a vibe check. Vibe checks consistently over-select Opus.

FAQ

What about extended thinking mode? Extended thinking is a Sonnet/Opus feature that lets the model emit internal reasoning tokens (billed as output). It helps on hard reasoning tasks but adds 30-60% to output cost. Use it for tasks like Use Cases 5 and 9; skip it for classification and extraction.

What about models outside the Claude family? For teams with production Claude usage, the cross-provider cost comparison is real but operationally complex. The within-Claude decision is simpler and produces 70% of the possible savings. Make that decision first, then evaluate multi-provider.

Should I use the 1M context window? Only when the task genuinely needs >200K tokens. At 800K input on Opus, a single request is $8 before output. Reserve for document-scale synthesis (Use Case 5, 9).

How often should I re-evaluate model choice? Quarterly. Anthropic ships model updates and price changes; a choice that was right six months ago may be suboptimal now. I re-run my top 3 workloads against all three current-generation models every quarter.

What's the single biggest mistake teams make? Defaulting to Opus "to be safe" on high-volume, low-stakes workloads. A chatbot classifier running on Opus at 100K requests/month is a roughly $500/month mistake for a task Haiku handles equally well.

Summary

Match the model to the task with a decision tree and evidence, not reflex. Haiku for 80% of traffic, Sonnet for the valuable 15%, Opus for the rare 5% where it measurably wins. Review quarterly. Measure before and after every change. The teams that do this save 60-80% versus an Opus-default strategy without quality loss — the full playbook is in my Claude API cost optimization masterclass.

AI Disclosure: Drafted with Claude Code; all pricing from Anthropic's April 2026 published rates. All latency and cost numbers measured on production traffic over the two weeks ending 2026-04-18.