Claude API Pricing 2026: Complete Breakdown with Calculators
Anthropic's Claude API uses a per-token pricing model. You pay for tokens consumed — input (what you send) and output (what the model generates). This guide covers every pricing tier, feature, and real-world cost example as of April 2026.
Current pricing table (April 2026)
Standard API
| Model | Input per 1M tokens | Output per 1M tokens |
|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.7 | $5.00 | $25.00 |
Prompt caching
| Model | Cache write per 1M | Cache read per 1M |
|---|---|---|
| Claude Haiku 4.5 | $1.25 | $0.10 |
| Claude Sonnet 4.6 | $3.75 | $0.30 |
| Claude Opus 4.7 | $6.25 | $0.50 |
Cache read prices are 10% of standard input prices. Cache writes are 125% of standard input prices.
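A minimal sketch of how a reusable system prompt might be marked as cacheable, assuming the Messages API's `cache_control` block syntax. The helper name `build_cached_request`, the `max_tokens` value, and the sample prompt are hypothetical, and the request is only built here, not sent.

```python
def build_cached_request(model: str, system_prompt: str, user_message: str) -> dict:
    """Build Messages API parameters with the system prompt marked cacheable."""
    return {
        "model": model,
        "max_tokens": 512,  # illustrative value
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks the prompt up to and including this block as a cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


params = build_cached_request(
    model="claude-sonnet-4-6",
    system_prompt="You are a support assistant. (long, stable instructions go here)",
    user_message="Where is my order?",
)
# To send: anthropic.Anthropic().messages.create(**params)
print(params["system"][0]["cache_control"])
```

After a real call, the response's `usage` object reports `cache_creation_input_tokens` and `cache_read_input_tokens`, which is the simplest way to confirm that the cache is actually being hit. Very short prefixes may not be cached at all, so check the minimum cacheable length for your model.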
Batch API (50% off all standard rates)
| Model | Input per 1M tokens | Output per 1M tokens |
|---|---|---|
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Opus 4.7 | $2.50 | $12.50 |
Batch API processes requests asynchronously within 24 hours. No streaming. Ideal for non-time-sensitive bulk workloads.
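As a rough sketch of what a batch submission might look like with the Python SDK: each entry pairs a `custom_id` with ordinary Messages parameters. The helper names `build_batch_requests` and `submit_batch`, the SKU ids, and the prompts are hypothetical. Only the builder runs here; `submit_batch` needs the `anthropic` package and an API key.

```python
def build_batch_requests(model: str, prompts: dict[str, str], max_tokens: int = 256) -> list[dict]:
    """Turn {custom_id: prompt} into the request entries a batch expects."""
    return [
        {
            "custom_id": custom_id,  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


def submit_batch(batch_requests: list[dict]):
    """Submit the batch; requires the anthropic package and ANTHROPIC_API_KEY."""
    import anthropic  # imported lazily so the builder above runs without the SDK

    client = anthropic.Anthropic()
    return client.messages.batches.create(requests=batch_requests)


batch_requests = build_batch_requests(
    "claude-sonnet-4-6",
    {
        "sku-001": "Write a one-sentence description of a steel water bottle.",
        "sku-002": "Write a one-sentence description of a canvas tote bag.",
    },
)
print(len(batch_requests), batch_requests[0]["custom_id"])
```

Keeping the builder separate from the network call makes it easy to inspect or unit-test the payload before paying for a run.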
1M context window (extended context)
For Sonnet 4.6 and Opus 4.7, input tokens beyond 200K are billed at higher rates. Haiku 4.5 does not support 1M context.
| Context range | Sonnet 4.6 input | Opus 4.7 input |
|---|---|---|
| 0 – 200K tokens | $3.00/1M | $5.00/1M |
| 200K – 1M tokens | $6.00/1M | $10.00/1M |
Output pricing is unchanged regardless of context length.
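The two-tier schedule can be sketched as a small function. It assumes the marginal reading used by the examples in this guide (only tokens past 200K pay the higher rate), and the names `INPUT_RATES` and `tiered_input_cost` are illustrative.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens per request billed at the base rate

# (base, extended) input prices in USD per 1M tokens, from the table above
INPUT_RATES = {
    "claude-sonnet-4-6": (3.00, 6.00),
    "claude-opus-4-7": (5.00, 10.00),
}


def tiered_input_cost(model: str, input_tokens: int) -> float:
    """Input cost in USD for one request under the two-tier schedule."""
    base_rate, extended_rate = INPUT_RATES[model]
    base_tokens = min(input_tokens, LONG_CONTEXT_THRESHOLD)
    extended_tokens = max(input_tokens - LONG_CONTEXT_THRESHOLD, 0)
    return (base_tokens * base_rate + extended_tokens * extended_rate) / 1_000_000


print(tiered_input_cost("claude-opus-4-7", 400_000))    # one 400K-token request on Opus
print(tiered_input_cost("claude-sonnet-4-6", 400_000))  # the same request on Sonnet
```

Under these assumptions a single 400K-token request costs $3.00 of input on Opus 4.7 and $1.80 on Sonnet 4.6.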
Three ratios to memorize
1. Output is 5x more expensive than input (for all models). A 1K-token output costs the same as a 5K-token input. Every prompt engineering choice that reduces output length saves 5x more than the same reduction in input.
2. Opus is 5x more expensive than Haiku. A Haiku workload costing $100/month costs $500/month on Opus. Use the cheapest model that clears your quality bar. For a practical guide to matching tasks to models, see Haiku vs Sonnet vs Opus: which model to use.
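3. Cache reads are 10% of input price. If the same system prompt is reused across calls, every cache hit saves 90% on that input slice. A cache write costs a 25% premium and each hit saves 90%, so break-even arrives at about 0.28 cache hits per write (roughly 1.28 total requests, counting the write itself). In practice, the first cache hit already pays for the write. See the prompt caching break-even guide for the full calculation with worked examples.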
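The caching break-even follows directly from the two multipliers in the pricing table. A short sketch, with `cache_break_even` as an illustrative name, that reports both the number of hits needed after one write and the total request count including the write:

```python
def cache_break_even(write_multiplier: float = 1.25,
                     read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (cache hits needed after one write, total requests) to break even.

    Multipliers are relative to the standard input price.
    """
    write_premium = write_multiplier - 1.0  # extra paid once for the write
    saving_per_hit = 1.0 - read_multiplier  # saved on every later hit
    hits = write_premium / saving_per_hit
    return hits, hits + 1.0                 # +1 counts the write request itself


hits, total_requests = cache_break_even()
print(f"hits after write: {hits:.2f}, total requests: {total_requests:.2f}")
# prints "hits after write: 0.28, total requests: 1.28"
```

Because the number of hits is below 1, a prefix that is reused even once within the cache lifetime comes out ahead.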
Worked cost examples
Example 1: High-volume classification
- Task: classify user messages into 12 categories
- Input: 500 tokens (message + system prompt)
- Output: 10 tokens (one label + confidence)
- Volume: 200,000 requests/month
- Model: Haiku 4.5
Calculation:
- Input: 200,000 × 500 tokens = 100M tokens → $100
- Output: 200,000 × 10 tokens = 2M tokens → $10
- Total: $110/month
If you used Opus: $500 input + $50 output = $550/month. That is $440/month wasted.
Example 2: Customer support drafts
- Task: generate reply drafts for support tickets
- Input: 2,000 tokens (ticket + system prompt + few-shot examples)
- Output: 300 tokens (draft reply)
- Volume: 30,000 requests/month
- Model: Sonnet 4.6
- Caching: system prompt (1,200 tokens) cached across all requests
Without caching:
- Input: 30,000 × 2,000 = 60M tokens → $180
- Output: 30,000 × 300 = 9M tokens → $135
- Total: $315/month
With prompt caching:
- Cache write: 1,200 tokens × 1 write = 1,200 tokens → $0.0045 (negligible; assumes the cache stays warm all month, since each expiry triggers another write)
- Cache reads: 1,200 tokens × 30,000 = 36M tokens → $10.80
- Non-cached input: 800 tokens × 30,000 = 24M tokens → $72
- Output: unchanged → $135
- Total with caching: $217.80/month (31% savings)
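The single-write figure is a best case, because cache entries expire after a short idle period and each expiry forces a fresh write. The sketch below reruns the same workload with different numbers of cache writes; the write counts and the name `caching_cost` are hypothetical and chosen only to show the sensitivity.

```python
def caching_cost(requests: int, cached_tokens: int, fresh_tokens: int,
                 output_tokens: int, cache_writes: int, rates: dict) -> float:
    """Monthly USD cost when `cache_writes` requests pay the write rate
    and every remaining request reads the cached prefix."""
    per_million = 1_000_000
    reads = requests - cache_writes
    write_cost = cache_writes * cached_tokens * rates["cache_write"] / per_million
    read_cost = reads * cached_tokens * rates["cache_read"] / per_million
    input_cost = requests * fresh_tokens * rates["input"] / per_million
    output_cost = requests * output_tokens * rates["output"] / per_million
    return write_cost + read_cost + input_cost + output_cost


# Sonnet 4.6 rates in USD per 1M tokens, from the pricing tables
SONNET = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

for writes in (1, 1_000, 10_000):
    total = caching_cost(30_000, 1_200, 800, 300, writes, SONNET)
    print(f"{writes:>6} cache writes -> ${total:,.2f}/month")
```

Even with 10,000 writes the total is about $259, still well below the $315 uncached cost, so the saving holds up when the cache expires often.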
Example 3: Document summarization (1M context)
- Task: summarize 400K-token legal contracts
- Input: 400,000 tokens per request
- Output: 800 tokens per summary
- Volume: 200 requests/month
- Model: Opus 4.7
Calculation:
- First 200K tokens: 200,000 × 200 = 40M tokens → $200
- Extended (200K-400K): 200,000 × 200 = 40M tokens at $10/1M → $400
- Output: 200 × 800 = 160,000 tokens → $4
- Total: $604/month
Note: the same workload on Sonnet 4.6 would cost $120 + $240 = $360 input + $2.40 output = $362.40/month, saving $241.60/month with minimal quality loss in most summarization tasks. Test before assuming Opus is required.
Example 4: Batch API for nightly data enrichment
- Task: enrich 50,000 product records with descriptions
- Input: 300 tokens per record
- Output: 200 tokens per record
- Model: Sonnet 4.6, Batch API
Without batch (standard):
- Input: 50,000 × 300 = 15M tokens → $45
- Output: 50,000 × 200 = 10M tokens → $150
- Total: $195/run
With Batch API:
- Input: 15M tokens at $1.50/1M → $22.50
- Output: 10M tokens at $7.50/1M → $75
- Total: $97.50/run (50% savings)
At twice-weekly runs: $390/week at standard rates versus $195/week with the Batch API, saving $195/week, or about $845/month.
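The per-run and monthly figures can be reproduced with a few lines. This sketch assumes two runs per week and an average month of 52/12 weeks; `run_cost` and the constant names are illustrative.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average month length


def run_cost(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """USD cost of one run; rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


records = 50_000
standard = run_cost(records * 300, records * 200, 3.00, 15.00)  # Sonnet 4.6 standard
batch = run_cost(records * 300, records * 200, 1.50, 7.50)      # Sonnet 4.6 Batch API
saved_per_run = standard - batch
runs_per_week = 2
saved_per_month = saved_per_run * runs_per_week * WEEKS_PER_MONTH

print(f"standard ${standard:.2f}, batch ${batch:.2f}, saved ${saved_per_run:.2f} per run")
print(f"saved per month: ${saved_per_month:.2f}")
```

Changing `runs_per_week` to 7 gives the figure for a job that runs every night.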
How to calculate your own costs
Step 1: Estimate token volumes
Use the token counting endpoint (count_tokens in the Python SDK, countTokens in the TypeScript SDK) to measure actual token counts for your prompts rather than estimating:
import anthropic

client = anthropic.Anthropic()

response = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system="Your system prompt here",
    messages=[{"role": "user", "content": "Sample user message"}],
)
print(f"Input tokens: {response.input_tokens}")
Step 2: Calculate cost
def estimate_monthly_cost(
    model: str,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    requests_per_month: int,
    cached_tokens_per_request: int = 0,
) -> dict:
    """Estimate monthly cost in USD, broken down by line item.

    input_tokens_per_request includes the cached prefix. The 1M-context
    surcharge and Batch API discount are not modeled.
    """
    # USD per 1M tokens
    pricing = {
        "claude-haiku-4-5": {"input": 1.0, "output": 5.0, "cache_read": 0.10, "cache_write": 1.25},
        "claude-sonnet-4-6": {"input": 3.0, "output": 15.0, "cache_read": 0.30, "cache_write": 3.75},
        "claude-opus-4-7": {"input": 5.0, "output": 25.0, "cache_read": 0.50, "cache_write": 6.25},
    }
    p = pricing[model]
    non_cached_input = input_tokens_per_request - cached_tokens_per_request
    total_input = non_cached_input * requests_per_month
    total_output = output_tokens_per_request * requests_per_month
    total_cache_reads = cached_tokens_per_request * requests_per_month
    # One-time write: assumes the cache never expires during the month
    cache_write_cost = (cached_tokens_per_request / 1_000_000) * p["cache_write"]
    monthly = {
        "input_cost": (total_input / 1_000_000) * p["input"],
        "output_cost": (total_output / 1_000_000) * p["output"],
        "cache_read_cost": (total_cache_reads / 1_000_000) * p["cache_read"],
        "cache_write_cost": cache_write_cost,
    }
    monthly["total"] = sum(monthly.values())
    return monthly


# Example: Sonnet, 2000 input, 300 output, 1200 cached, 30K req/month
result = estimate_monthly_cost(
    model="claude-sonnet-4-6",
    input_tokens_per_request=2000,
    output_tokens_per_request=300,
    requests_per_month=30_000,
    cached_tokens_per_request=1200,
)
print(result)
What is a token?
A token is approximately 4 characters of English text, or 0.75 words. Common reference points:
| Content | Approximate tokens |
|---|---|
| One tweet | 30-60 |
| One email | 200-800 |
| One blog post (1,500 words) | 2,000 |
| One short story (10K words) | 13,000 |
| One full novel (100K words) | 130,000 |
| Python file (500 lines) | 3,000-8,000 |
| Full codebase (10K files) | Millions |
Code generally uses more tokens per character than prose because symbols, brackets, and short variable names each tend to become separate tokens.
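For quick budgeting before any API call is made, the two rules of thumb can be combined into a rough estimator. This is only a heuristic sketch for English prose, with `estimate_tokens` as an illustrative name. It is not a tokenizer, and the token counting endpoint remains the source of truth.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose: the average of the
    4-characters-per-token and 0.75-words-per-token rules of thumb."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)


sample = "Claude bills by the token, so measure your prompts before you budget."
print(estimate_tokens(sample))
```

Expect the estimate to run low for code, JSON, and non-English text, where characters map to tokens less efficiently.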
Rate limits
Rate limits are separate from pricing — they control how many tokens and requests you can send per minute.
| Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|
| Tier 1 (new) | 50 | 50,000 | 10,000 |
| Tier 2 | 1,000 | 100,000 | 32,000 |
| Tier 3 | 2,000 | 200,000 | 64,000 |
| Tier 4 | 4,000 | 400,000 | 128,000 |
Tier upgrades are automatic based on spend history. You can request manual upgrades via the Anthropic Console.
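To see which limit binds for a given workload, the table can be turned into a small lookup. A sketch using the tier figures above, assuming every request has the same positive token counts; `max_requests_per_minute` is an illustrative name.

```python
# (requests/min, input tokens/min, output tokens/min) from the table above
TIER_LIMITS = {
    1: (50, 50_000, 10_000),
    2: (1_000, 100_000, 32_000),
    3: (2_000, 200_000, 64_000),
    4: (4_000, 400_000, 128_000),
}


def max_requests_per_minute(tier: int, input_tokens: int, output_tokens: int) -> int:
    """Sustained requests/min allowed by the tightest of the three limits."""
    rpm, itpm, otpm = TIER_LIMITS[tier]
    return min(rpm, itpm // input_tokens, otpm // output_tokens)


# Example 2 workload: 2,000 input and 300 output tokens per request
for tier in TIER_LIMITS:
    print(tier, max_requests_per_minute(tier, 2_000, 300))
```

For this workload the input-token limit is the bottleneck at every tier (25, 50, 100, and 200 requests per minute), well below the headline requests-per-minute figures.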
See also
- Cost & performance benchmark — single-page citation source for all measured numbers across the site.
- Claude API Cost Calculator — interactive estimator with the optimizations in this article.
FAQ
Does Claude Code Pro use API credits? Claude Code Pro ($20/month) includes a pool of model usage with Fair Use limits. Very heavy usage consumes the included pool, after which additional usage is billed at standard API rates. The Pro subscription is separate from a raw API key.
Is there a free tier? No free tier on the API. There is a free tier on Claude.ai (the web interface), but it does not provide API access. API access requires a paid account with a credit card.
What currency is billing in? USD. International teams pay in USD and may incur FX conversion fees depending on their payment method.
Are there volume discounts? Not publicly listed as of April 2026. Enterprise contracts negotiated directly with Anthropic may include volume pricing.
How does the Batch API work? You submit a list of requests, each with its own custom_id, to /v1/messages/batches. Anthropic processes them asynchronously and makes the results available within 24 hours as a JSONL file, matched back to your inputs by custom_id. Streaming and real-time responses are not supported in Batch mode. For a complete walkthrough with code examples, see the Batch API guide.
What is the maximum context window? 200K tokens standard; 1M tokens with 1M context mode enabled (Sonnet 4.6 and Opus 4.7 only). 1M context mode is not enabled by default and requires requesting access via the Anthropic Console.
How do I monitor my spend? The Anthropic Console (console.anthropic.com) shows real-time token usage and dollar spend by model. You can set budget alerts via the Console's billing settings.
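Alongside the Console, spend can also be tracked per request from the usage counters that each API response carries. A minimal sketch, assuming the usage object has been converted to a plain dict and using Sonnet 4.6 rates; the name `request_cost` and the sample usage values are hypothetical.

```python
RATES = {  # USD per 1M tokens, Sonnet 4.6 rates from the pricing tables
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


def request_cost(usage: dict, rates: dict = RATES) -> float:
    """USD cost of one response, given its usage counters as a dict."""
    def count(key: str) -> int:
        return usage.get(key) or 0  # missing or None counters count as zero

    return (
        count("input_tokens") * rates["input"]
        + count("output_tokens") * rates["output"]
        + count("cache_read_input_tokens") * rates["cache_read"]
        + count("cache_creation_input_tokens") * rates["cache_write"]
    ) / 1_000_000


# Hypothetical usage for one cached support-draft request
usage = {"input_tokens": 800, "output_tokens": 300, "cache_read_input_tokens": 1_200}
print(f"${request_cost(usage):.6f}")
```

Summing this value per feature or per customer in your own logs gives a finer breakdown than per-model totals.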
Sources
- Anthropic API pricing — April 2026
- Anthropic API reference — token counting — April 2026
- Batch API documentation — April 2026
Related guides
- Prompt Caching: The 90% Discount Most Devs Miss — the single biggest cost lever, implemented in one line
- Claude Batch API Guide — 50% off all models for async workloads
- Haiku vs Sonnet vs Opus: Which Model? — right-sizing your model choice
Frequently Asked Questions
How much does Claude API cost per 1,000 requests?
Cost depends on tokens per request, not raw request count. At 1,000 input tokens and 300 output tokens per request, Claude Haiku costs roughly $0.0025 per request ($0.001 input at $1/1M plus $0.0015 output at $5/1M), or $2.50 per 1,000 requests. Claude Sonnet costs about $0.0075 per request ($7.50 per 1,000), and Opus about $0.0125 ($12.50 per 1,000). Use the token counting endpoint to measure your actual token counts before estimating.
What is the cheapest Claude API model?
Claude Haiku 4.5 is the cheapest model at $1.00/1M input tokens and $5.00/1M output tokens as of April 2026. The Batch API reduces those rates by 50% to $0.50/$2.50. Combining Haiku with the Batch API gives the lowest possible per-token cost in the Claude lineup.
How does Claude API prompt caching reduce costs?
Prompt caching stores a prefix of your prompt on Anthropic's servers. Subsequent requests that reuse that prefix pay the cache read rate, which is 10% of the normal input price. For a 1,200-token system prompt reused 30,000 times per month on Sonnet, sending it fresh every time costs $108 and reading it from cache costs $10.80, so caching saves approximately $97.
Does Claude API have a free tier?
No. The Claude API has no free tier — it requires a paid account with a credit card on file. Claude.ai (the consumer web interface) offers a free plan, but that plan does not grant access to the API. API usage is billed per token at the rates published on Anthropic's pricing page.