
From $800 to $120/month: A Claude API Cost Optimization Case Study

How a SaaS team reduced their Claude API bill by 85% in 8 weeks without quality loss, step by step with exact numbers and the changes that moved the needle.


This is the story of a 3-person SaaS team that cut their Claude API bill from $812/month to $120/month over 8 weeks, an 85% reduction with no meaningful quality loss. The product is a B2B document analysis tool: users upload contracts, and the app extracts key clauses, generates summaries, and answers questions about the document.

Note: The company and numbers in this case study are illustrative — calculated from published Anthropic pricing to demonstrate realistic outcomes. The optimization techniques are real and documented throughout this guide.

No quality was sacrificed. The acceptance rate on extracted data went up. This is how they did it.


Starting state: Week 0

Monthly API bill: $812

The team had built quickly. Model selection was "Opus by default, always." The reasoning: "Opus is the best, why use anything else?"

They had four Claude-powered features:

Feature Requests/month Model Cost/month
Document intake classification 45,000 Opus $270
Clause extraction 18,000 Opus $381
Summary generation 12,000 Opus $108
Q&A chat 8,000 Opus $53
Total 83,000 Opus $812

When they measured actual quality on each feature, the results were humbling:

Feature Human review acceptance rate
Document classification 96.8%
Clause extraction 88.2%
Summary generation 85.1%
Q&A chat 79.4%

Week 1: The audit

Before changing anything, they ran a proper evaluation.

Step 1: built a labeled test set of 100 examples per feature.

Step 2: ran each test set against Haiku, Sonnet, and Opus with the current production prompts.
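A minimal sketch of that three-model comparison, assuming each labeled example is an (input, expected output) pair. `call_model` is a hypothetical stand-in for whatever function wraps the API call in your app; it is not part of any SDK. Exact-match scoring suits classification; extraction and Q&A need their own scorers.

```python
from typing import Callable, Sequence

# (input text, expected output) pairs from the labeled test set
Example = tuple[str, str]


def run_eval(
    models: Sequence[str],
    examples: Sequence[Example],
    call_model: Callable[[str, str], str],  # hypothetical wrapper: (model, input) -> output
) -> dict[str, float]:
    """Exact-match accuracy (0-100) for each model on the same labeled examples."""
    if not examples:
        raise ValueError("need at least one labeled example")
    results: dict[str, float] = {}
    for model in models:
        correct = sum(
            1 for text, expected in examples
            if call_model(model, text).strip() == expected.strip()
        )
        results[model] = 100.0 * correct / len(examples)
    return results


def fake_model(model: str, text: str) -> str:
    """Offline stand-in so the sketch runs without an API key."""
    return "invoice" if "invoice" in text.lower() else "contract"


if __name__ == "__main__":
    test_set = [("Invoice #123 from ACME", "invoice"), ("Mutual NDA", "nda")]
    print(run_eval(["haiku", "sonnet", "opus"], test_set, fake_model))
```

Running every model against the same fixed set is what makes the per-model numbers comparable; swap `fake_model` for a real API wrapper once the harness works.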

Results:

Classification

Model Accuracy Cost/1K requests
Haiku 96.5% $0.25
Sonnet 97.0% $0.75
Opus 97.1% $1.25

Finding: 0.6pp difference between Haiku and Opus. Classification runs on a 500-token input and a 1-label output. Haiku is the answer.

Clause extraction

Model Field-level accuracy Cost/1K requests
Haiku 71.4% $1.50
Sonnet 87.8% $4.50
Opus 88.6% $7.50

Finding: Haiku is 16pp worse — material for legal document work. Sonnet vs. Opus difference is 0.8pp at 67% higher cost. Sonnet wins.

Summary generation

Model Human acceptance rate Cost/1K requests
Haiku 78.0% $3.00
Sonnet 87.5% $9.00
Opus 87.9% $25.00

Finding: Haiku is 9.5pp below Sonnet. Sonnet and Opus are statistically identical (0.4pp apart). Sonnet wins.

Q&A chat

Model Task completion Cost/1K requests
Haiku 62.0% $2.00
Sonnet 80.5% $6.00
Opus 84.1% $10.00

Finding: Haiku is unacceptable for Q&A. Sonnet vs. Opus: 3.6pp better at 67% more cost. For a $53/month feature, the delta is not worth it. Sonnet wins. (Revisit if Q&A volume grows 5x.)
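The four findings follow one informal rule: take the cheapest model whose score is within a small tolerance of the best. A sketch of that rule over the audit numbers above, assuming a 1pp tolerance (the case study does not state a fixed threshold). The model keys are plain labels, not API model IDs.

```python
# (accuracy %, cost per 1K requests in $) copied from the Week 1 audit tables
AUDIT = {
    "classification": {"haiku": (96.5, 0.25), "sonnet": (97.0, 0.75), "opus": (97.1, 1.25)},
    "clause_extraction": {"haiku": (71.4, 1.50), "sonnet": (87.8, 4.50), "opus": (88.6, 7.50)},
    "summary": {"haiku": (78.0, 3.00), "sonnet": (87.5, 9.00), "opus": (87.9, 25.00)},
    "qa_chat": {"haiku": (62.0, 2.00), "sonnet": (80.5, 6.00), "opus": (84.1, 10.00)},
}


def cheapest_within(scores: dict[str, tuple[float, float]], tolerance_pp: float) -> str:
    """Cheapest model whose accuracy is within tolerance_pp of the best model's."""
    best = max(acc for acc, _ in scores.values())
    eligible = [(cost, model) for model, (acc, cost) in scores.items() if best - acc <= tolerance_pp]
    return min(eligible)[1]  # lowest cost wins


for feature, scores in AUDIT.items():
    print(feature, cheapest_within(scores, tolerance_pp=1.0))
```

With a 1pp tolerance the rule picks Haiku, Sonnet, Sonnet, and Opus. The team overrode the Q&A result and chose Sonnet because the feature cost only $53/month, which is exactly the cost judgment a fixed threshold leaves out.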

Total projected cost if they just switch models:

Feature Model → New cost/month
Classification Opus → Haiku $11
Clause extraction Opus → Sonnet $81
Summary generation Opus → Sonnet $108
Q&A chat Opus → Sonnet $48
Total $248

Switching models alone: $812 → $248 (70% reduction). They hadn't changed a prompt, added caching, or touched the architecture.
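The projection is just monthly volume times the chosen model's cost per 1K requests from the audit tables. A quick check of that arithmetic, using only numbers from this article:

```python
# requests/month and the chosen model's cost per 1K requests (from the audit tables)
PLAN = {
    "classification": (45_000, 0.25),     # Haiku
    "clause_extraction": (18_000, 4.50),  # Sonnet
    "summary": (12_000, 9.00),            # Sonnet
    "qa_chat": (8_000, 6.00),             # Sonnet
}
BASELINE = 812.0  # Week 0 bill, all Opus

projected = {name: volume / 1000 * per_1k for name, (volume, per_1k) in PLAN.items()}
total = sum(projected.values())
reduction = 1 - total / BASELINE

for name, cost in projected.items():
    print(f"{name:<18} ${cost:,.2f}")
print(f"total ${total:,.2f}  reduction {reduction:.1%}")
```

This prints a total of $248.25 and a 69.4% reduction, which the table rounds to $248 and 70%.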


Week 2: Model switching

They deployed the model changes on a Monday. By Wednesday, production metrics matched the eval projections.

Week 2 bill: $238


Week 3: Prompt caching

The biggest remaining opportunity was the system prompt shared across all requests.

Their clause extraction feature had a 2,800-token system prompt.

This prompt was being sent fresh on every request. At 18,000 requests/month, that was 50.4M tokens of redundant input.

They added cache_control: {"type": "ephemeral"} to the system prompt:

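import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# CLAUSE_EXTRACTION_SYSTEM_PROMPT and document_text are defined elsewhere in the app
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": CLAUSE_EXTRACTION_SYSTEM_PROMPT,  # 2,800 tokens, identical on every request
            "cache_control": {"type": "ephemeral"}   # cache the prompt prefix up to this block
        }
    ],
    messages=[{"role": "user", "content": document_text}]  # changes per request, not cached
)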
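To confirm caching is working, check `response.usage`: the Messages API reports `cache_creation_input_tokens` and `cache_read_input_tokens` alongside `input_tokens` (the uncached remainder) and `output_tokens`. A minimal sketch that prices one response from those fields, assuming Sonnet's $3/$15 per 1M rates and the published multipliers for the default 5-minute cache (writes at 1.25x base input, reads at 0.1x). `SimpleNamespace` stands in for a real usage object, and the 4,000 document tokens and 600 output tokens are hypothetical.

```python
from types import SimpleNamespace

INPUT_PER_M = 3.00       # Sonnet base input price, $ per 1M tokens (assumed)
OUTPUT_PER_M = 15.00     # Sonnet output price, $ per 1M tokens (assumed)
CACHE_WRITE_MULT = 1.25  # 5-minute cache write multiplier
CACHE_READ_MULT = 0.10   # cache hit multiplier


def request_cost(usage) -> float:
    """Dollar cost of one Messages API call, computed from its usage fields."""
    # Cache fields can be missing or None when caching was not involved.
    cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    dollars = (
        usage.input_tokens * INPUT_PER_M
        + cache_write * INPUT_PER_M * CACHE_WRITE_MULT
        + cache_read * INPUT_PER_M * CACHE_READ_MULT
        + usage.output_tokens * OUTPUT_PER_M
    )
    return dollars / 1_000_000


# First call writes the 2,800-token prompt to the cache; later calls read it.
first = SimpleNamespace(input_tokens=4_000, cache_creation_input_tokens=2_800,
                        cache_read_input_tokens=0, output_tokens=600)
later = SimpleNamespace(input_tokens=4_000, cache_creation_input_tokens=0,
                        cache_read_input_tokens=2_800, output_tokens=600)
print(f"first ${request_cost(first):.5f}  later ${request_cost(later):.5f}")
```

In production, pass `response.usage` directly. If `cache_read_input_tokens` stays at zero across back-to-back requests, the prefix is changing between calls or is below the model's minimum cacheable length.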

Token breakdown for extraction (Opus):

Before caching (Opus 4.1 legacy — $15 input / $75 output per 1M):

After model switch to Sonnet ($3/$15 per 1M):

Caching savings on the 2,800-token system prompt:
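The monthly saving on that prompt is the redundant input volume times the gap between the base input price and the cache-read price. A sketch at Sonnet's $3 per 1M input, assuming a 0.1x cache-read multiplier and treating every request as a cache hit. Each cache write costs 1.25x base input, so real savings are slightly lower whenever the cache expires between requests.

```python
PROMPT_TOKENS = 2_800
REQUESTS_PER_MONTH = 18_000
INPUT_PER_M = 3.00                     # Sonnet base input, $ per 1M tokens
CACHE_READ_PER_M = INPUT_PER_M * 0.10  # cache hit price
CACHE_WRITE_PER_M = INPUT_PER_M * 1.25 # 5-minute cache write price

redundant_tokens = PROMPT_TOKENS * REQUESTS_PER_MONTH       # 50.4M tokens/month
uncached = redundant_tokens * INPUT_PER_M / 1_000_000       # prompt sent fresh every time
all_hits = redundant_tokens * CACHE_READ_PER_M / 1_000_000  # every request reads from cache
write_cost = PROMPT_TOKENS * CACHE_WRITE_PER_M / 1_000_000  # paid each time the cache is rewritten

print(f"uncached ${uncached:.2f}  cached ${all_hits:.2f}  saving ${uncached - all_hits:.2f}")
print(f"each cache write ${write_cost:.4f}")
```

The saving comes out at $136.08/month, which lines up with the $136 caching figure used in this case study.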

Week 3 bill: $238 → $102

They applied the same caching to summaries (1,900-token system prompt) and Q&A (1,200-token system prompt).

Total additional savings from caching across all features: ~$96/month.


Week 4: Context pruning on Q&A

The Q&A chat feature had a problem: each question was sent with the entire document as context, plus the full conversation history.

After 10 turns of a conversation, the history alone was 3,000+ tokens. Most of it was irrelevant to the current question.

They added a simple pruning strategy:

  1. Keep the last 3 turns of conversation history
  2. Use semantic search (pgvector) to retrieve the 5 most relevant document chunks instead of the whole document
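A minimal sketch of those two pruning rules. The team used pgvector for retrieval; here an in-memory cosine similarity over precomputed embeddings stands in for it, and the chunk embeddings and `question_embedding` are assumed to come from whatever embedding model the app already uses. History is assumed to alternate user/assistant messages in complete pairs.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_qa_context(
    history: list[dict],                    # [{"role": ..., "content": ...}, ...]
    chunks: list[tuple[str, list[float]]],  # (chunk text, embedding)
    question: str,
    question_embedding: list[float],
    keep_turns: int = 3,
    top_k: int = 5,
) -> list[dict]:
    """Pruned messages list: the last N turns plus the top-k most relevant chunks."""
    # Rule 1: a turn is a user message plus the assistant reply, so keep 2*N messages.
    recent = history[-2 * keep_turns:] if keep_turns > 0 else []
    # Rule 2: rank chunks by similarity to the question instead of sending the whole document.
    ranked = sorted(chunks, key=lambda c: cosine(c[1], question_embedding), reverse=True)
    excerpts = "\n\n".join(text for text, _ in ranked[:top_k])
    prompt = f"Relevant document excerpts:\n{excerpts}\n\nQuestion: {question}"
    return recent + [{"role": "user", "content": prompt}]
```

Because the cached system prompt is sent separately, this pruning only trims the variable part of each request, so it stacks with the Week 3 caching.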

Before pruning:

After pruning:

The Q&A completion rate also improved by 3pp — shorter, more focused context produced better answers than the full document at 12K tokens.


Week 5: Batch API for non-urgent work

The summary generation feature was not user-facing in real time. Summaries were generated overnight for documents uploaded during the day.

Moving summaries from real-time API to Batch API (50% off):

Before (Sonnet real-time): $108/month

After (Sonnet Batch API): $54/month

No quality change. 24-hour batch processing was acceptable for the use case. Savings: $54/month.
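The Message Batches API takes a list of requests, each with a `custom_id` and the same `params` you would pass to `messages.create`. A minimal sketch of the nightly job, assuming a hypothetical `SUMMARY_SYSTEM_PROMPT` constant, documents keyed by an alphanumeric ID, and an illustrative `max_tokens` of 1024. Building the request list is pure Python; submission sits in a separate function so the SDK is only needed at run time.

```python
SUMMARY_SYSTEM_PROMPT = "Summarize the contract for a business reader."  # hypothetical placeholder


def build_summary_batch(documents: dict[str, str], model: str = "claude-sonnet-4-6") -> list[dict]:
    """One batch entry per document; custom_id lets results be matched back later."""
    return [
        {
            "custom_id": f"summary-{doc_id}",  # IDs must be unique within the batch
            "params": {
                "model": model,
                "max_tokens": 1024,
                "system": SUMMARY_SYSTEM_PROMPT,
                "messages": [{"role": "user", "content": text}],
            },
        }
        for doc_id, text in documents.items()
    ]


def submit_nightly(documents: dict[str, str]):
    """Submit the day's uploads; results arrive asynchronously, within 24 hours."""
    import anthropic  # imported here so building requests does not require the SDK

    client = anthropic.Anthropic()
    return client.messages.batches.create(requests=build_summary_batch(documents))
```

After submitting, poll with `client.messages.batches.retrieve(batch.id)` and, once processing has ended, read per-request results with `client.messages.batches.results(batch.id)`, matching each one back to its document by `custom_id`.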


Week 6: Results

Feature Week 0 Week 6
Classification $270 $11
Clause extraction $381 $62
Summary generation $108 $54
Q&A chat $53 $32
Total $812 $159

Week 6 subtotal: $159/month. Over the following two weeks they applied output length constraints (a system prompt instruction to limit response verbosity) and adjusted the Haiku/Sonnet routing threshold, bringing the final bill to $120/month.

8-week final bill: $120/month — 85% reduction.


Quality metrics at Week 8 vs. Week 0

Feature Week 0 acceptance Week 8 acceptance Change
Classification 96.8% 96.5% -0.3pp
Clause extraction 88.2% 89.4% +1.2pp
Summary generation 85.1% 87.1% +2.0pp
Q&A chat 79.4% 82.1% +2.7pp

Quality went up in three of four features. The extraction and summary improvements came from Sonnet's better structured output formatting compared to Opus's more verbose style. The Q&A improvement came from better context focus via retrieval.


The five changes ranked by impact

Change Monthly savings Engineering time
1. Model selection (Opus → Sonnet/Haiku) $564 2 days
2. Context pruning (Q&A retrieval) $267 3 days
3. Prompt caching (system prompts) $136 0.5 days
4. Batch API (offline summaries) $54 0.5 days
5. Output length pruning ~$20 1 day

Total engineering investment: ~7 developer-days.
Annual savings: ($812 - $120) × 12 = $8,304/year.


What they didn't do (and why)

Semantic caching: they evaluated Redis-based semantic caching for Q&A queries (cache answers to similar questions). The hit rate was only 12% — too low to justify the infrastructure complexity for their volume. Revisit at 10x volume.

Fine-tuning: considered for clause extraction. The quality gap between Sonnet and a hypothetical fine-tuned Haiku wasn't worth the data labeling effort and maintenance overhead at their volume.

Multi-provider: evaluated GPT-4o and Gemini for some tasks. The switching cost and prompt re-tuning time didn't produce better Pareto outcomes than staying on Claude with the optimizations above.


The decision tree they now use for every new feature

Is this task < 2K input, < 100 output, deterministic?
  YES → Haiku. Measure. Done.
  NO ↓

Does it require reasoning across > 50K tokens or produce ranked/structured output?
  YES, structured output on medium context → Sonnet
  YES, > 200K context or legally critical → Opus
  NO ↓

Will this feature run more than 1,000 times/month?
  YES → Run eval set (100 examples) across Haiku + Sonnet
  NO → Sonnet default (savings at low volume don't matter)
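The tree can be encoded as a small function for design reviews. This is a sketch under one reading of it: the feature attributes are hypothetical estimates you fill in up front, a single `input_tokens` figure stands in for both "input" and "context", and where the tree is silent (for example, 50K to 200K tokens that are neither structured nor legally critical) the function falls back to Sonnet.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    input_tokens: int
    output_tokens: int
    deterministic: bool
    structured_output: bool  # ranked or structured output
    legally_critical: bool
    monthly_requests: int


def pick_model(f: Feature) -> str:
    # Step 1: small, deterministic tasks go straight to Haiku (then measure).
    if f.input_tokens < 2_000 and f.output_tokens < 100 and f.deterministic:
        return "haiku"
    # Step 2: long-context reasoning or ranked/structured output.
    if f.input_tokens > 50_000 or f.structured_output:
        if f.input_tokens > 200_000 or f.legally_critical:
            return "opus"
        return "sonnet"
    # Step 3: at volume, let a 100-example eval decide between Haiku and Sonnet.
    if f.monthly_requests > 1_000:
        return "eval haiku vs sonnet"
    return "sonnet"  # low volume: savings don't matter


print(pick_model(Feature(500, 5, True, False, False, 45_000)))
```

The usage line describes the classification feature (500-token input, single-label output, 45,000 requests/month), which lands on Haiku in step 1.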

And for every feature:


Lessons

1. "Best model = safest choice" is the most expensive myth in AI development.

Opus at 45,000 classification requests/month was a $259/month false-safety premium. The 0.6pp gap on the eval set (and the 0.3pp dip in production acceptance) was indistinguishable from natural variability.

2. Measure before optimizing.

The team expected Q&A to be the hardest to downgrade. The eval showed classification could move to Haiku immediately and Q&A needed context pruning more than a model upgrade.

3. Context is the most overlooked lever.

Model switching saved $564/month. Context pruning saved $267/month, the second-biggest lever, and many teams never touch it. Removing unnecessary context also tends to cut output spend: shorter, focused context produces shorter, more relevant answers, and on Sonnet each output token costs 5x an input token ($15 vs. $3 per 1M).

4. Caching is underimplemented everywhere.

The half-day cache implementation ($136/month savings) had the best effort-to-return ratio of any change. Many teams have constant system prompts that are never cached.




FAQ

Does this optimization approach work for other use cases? Yes. The same framework applies to any Claude API workload: audit with an eval set, select the right model tier, add caching for constant inputs, prune variable context, batch what's non-real-time.

What if our use case is genuinely Opus-level? Some use cases are — legal document synthesis across hundreds of pages, architectural design reviews, complex multi-step reasoning. For those, Opus is correct. The mistake is using Opus for everything without testing.

How do we build an eval set? Label 100-200 real examples with the correct output. For extraction: ground-truth field values. For classification: correct labels. For Q&A: acceptable answers. The dataset is the hard part — the model comparison is easy once you have it.
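For extraction, the "field-level accuracy" used in the audit can be scored per field rather than per document, so one wrong clause does not zero out an otherwise correct extraction. A sketch, assuming each eval example stores ground truth as a dict of field name to expected value, and using exact match after whitespace and case normalization (a simplification; dates or amounts may need fuzzier comparison).

```python
def normalize(value) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences still match."""
    return " ".join(str(value).split()).lower()


def field_level_accuracy(predictions: list[dict], ground_truth: list[dict]) -> float:
    """Percent of ground-truth fields whose predicted value matches after normalization."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must be the same length")
    total = correct = 0
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total += 1
            # A missing field counts as wrong.
            if field in pred and normalize(pred[field]) == normalize(expected):
                correct += 1
    return 100.0 * correct / total if total else 0.0
```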

Our bill is $5,000/month. Where do we start? Start with the audit: which feature consumes the most tokens? That's your first optimization target. Run the three-model comparison on that feature's eval set. The answer is often Sonnet where you have Opus, or Haiku where you have Sonnet.

Sources

  1. Claude API pricing — April 2026
  2. Prompt caching guide — April 2026
  3. Batch API documentation — April 2026
  4. Related: Model selection guide — Haiku vs Sonnet vs Opus
  5. Related: Prompt caching break-even analysis

Frequently Asked Questions

What was the single biggest cost reduction in this Claude API case study?

Model selection — switching from Opus to Sonnet and Haiku — saved $564/month and required only 2 developer-days of work. It was the highest-leverage change because the team was running 45,000 classification requests/month on Opus when Haiku handled the task at 96.5% accuracy versus Opus's 97.1%.

How long does it take to reduce a Claude API bill by 80%?

This team achieved 85% cost reduction in 8 weeks across five sequential changes: model selection (week 1–2), prompt caching (week 3), context pruning (week 4), Batch API for offline tasks (week 5), and output length controls (weeks 6–8). The total engineering investment was approximately 7 developer-days.

Did quality drop when they switched from Claude Opus to Sonnet?

Quality improved in three of four features. Sonnet's structured output formatting helped clause extraction (+1.2pp) and summaries (+2.0pp). On the eval set, Sonnet scored 3.6pp below Opus on Q&A, but after context pruning focused the input, Q&A acceptance finished 2.7pp above its Week 0 level. Only classification saw a minor decrease (-0.3pp), which was within statistical noise.

What should I optimize first if my Claude API bill is too high?

Start with model selection: run a 100-example eval set for each feature across Haiku, Sonnet, and Opus. The cheapest model that clears your accuracy bar is your answer. In this case study, model selection alone cut the bill by about 70% and required no infrastructure changes. Once models are right-sized, add prompt caching for constant system prompts, then prune unnecessary context.


Take It Further

Claude API Cost Optimization Masterclass — Cut your Claude API bill by 60–90% without sacrificing quality. 12 optimization scenarios analyzed. The concrete order-of-operations: prompt caching, model tiering, Batch API, token compression.

PDF guide + 6-sheet Excel cost calculator. Example scenario: $2,100 → $187/month on a customer support agent.

→ Get Cost Optimization Masterclass — $59

30-day money-back guarantee. Instant download.

AI Disclosure: Drafted with Claude Code. Numbers are illustrative estimates calculated from published Anthropic pricing — not from a specific client engagement. Techniques described are real and reproducible.
