
Claude API Pricing 2026: Complete Breakdown with Calculators

Full Claude API pricing table for April 2026 — Haiku, Sonnet, Opus, caching, batching, 1M context — with worked cost examples for common workloads.


Anthropic's Claude API uses a per-token pricing model: you pay for input tokens (what you send) and output tokens (what the model generates). This guide covers every pricing tier and feature as of April 2026, with worked cost examples for common workloads.

Current pricing table (April 2026)

Standard API

| Model | Input per 1M tokens | Output per 1M tokens |
|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.7 | $5.00 | $25.00 |

Prompt caching

| Model | Cache write per 1M | Cache read per 1M |
|---|---|---|
| Claude Haiku 4.5 | $1.25 | $0.10 |
| Claude Sonnet 4.6 | $3.75 | $0.30 |
| Claude Opus 4.7 | $6.25 | $0.50 |

Cache read prices are 10% of standard input prices. Cache writes are 125% of standard input prices.
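Those two ratios can be expressed directly; a minimal sketch (model keys are illustrative, rates from the tables above):

```python
# Derive cache prices from standard input prices ($ per 1M tokens):
# reads at 10%, writes at 125% of the standard input rate.
STANDARD_INPUT = {
    "claude-haiku-4-5": 1.00,
    "claude-sonnet-4-6": 3.00,
    "claude-opus-4-7": 5.00,
}

def cache_prices(input_price: float) -> dict:
    """Derive cache read/write prices from a standard input price."""
    return {
        "read": round(input_price * 0.10, 2),
        "write": round(input_price * 1.25, 2),
    }

for model, price in STANDARD_INPUT.items():
    print(model, cache_prices(price))
```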

Batch API (50% off all standard rates)

| Model | Input per 1M tokens | Output per 1M tokens |
|---|---|---|
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Opus 4.7 | $2.50 | $12.50 |

Batch API processes requests asynchronously within 24 hours. No streaming. Ideal for non-time-sensitive bulk workloads.
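The 50% discount makes batch costing a one-liner; a sketch using the standard rates from the tables above:

```python
# Batch API cost at 50% off standard rates.
STANDARD_RATES = {  # (input, output) in $ per 1M tokens
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (5.00, 25.00),
}

def batch_cost(model: str, input_millions: float, output_millions: float) -> float:
    """Dollar cost for a batch workload, with token volumes in millions."""
    inp, out = STANDARD_RATES[model]
    return 0.5 * (input_millions * inp + output_millions * out)

# 100M input + 10M output on Haiku: $150 standard, $75 batched
print(batch_cost("claude-haiku-4-5", 100, 10))
```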

1M context window (extended context)

For Sonnet 4.6 and Opus 4.7, input tokens beyond 200K are billed at higher rates. Haiku 4.5 does not support 1M context.

| Context range | Sonnet 4.6 input | Opus 4.7 input |
|---|---|---|
| 0 – 200K tokens | $3.00/1M | $5.00/1M |
| 200K – 1M tokens | $6.00/1M | $10.00/1M |

Output pricing is unchanged regardless of context length.
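The tiered billing above reduces to a small per-request function; a sketch (rates from the table, ignoring caching):

```python
def long_context_input_cost(tokens: int, base_rate: float, extended_rate: float) -> float:
    """Input cost in dollars for one request: tokens up to 200K billed at
    the base rate, tokens beyond 200K at the extended rate ($ per 1M)."""
    threshold = 200_000
    base_tokens = min(tokens, threshold)
    extended_tokens = max(tokens - threshold, 0)
    return (base_tokens * base_rate + extended_tokens * extended_rate) / 1_000_000

# A single 400K-token request on Sonnet 4.6: $0.60 + $1.20 = $1.80
print(long_context_input_cost(400_000, 3.00, 6.00))
```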


Three ratios to memorize

1. Output is 5x more expensive than input (for all models). A 1K-token output costs the same as a 5K-token input. Every prompt engineering choice that reduces output length saves 5x more than the same reduction in input.

2. Opus is 5x more expensive than Haiku. A Haiku workload costing $100/month costs $500/month on Opus. Use the cheapest model that clears your quality bar.

3. Cache reads are 10% of input price. If the same system prompt is reused across calls, every cache hit saves 90% on that input slice. Break-even arrives at about 1.28 total requests per cached prefix — one write (at 125% of input price) plus roughly 0.28 hits — so a single cache hit already more than pays for the write. See the prompt caching break-even guide for the full calculation.
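The third ratio can be checked with two lines of arithmetic, in units of the standard input price (a sketch assuming one write followed by repeated hits on the same prefix):

```python
WRITE = 1.25  # cache write, as a multiple of the standard input price
READ = 0.10   # cache read, as a multiple of the standard input price

# Each hit saves (1 - READ) versus re-sending the prefix at full price;
# the write's 25% surcharge is recouped after:
hits_to_break_even = (WRITE - 1.0) / (1.0 - READ)
print(round(hits_to_break_even, 2))      # 0.28 hits
print(round(1 + hits_to_break_even, 2))  # 1.28 total requests (the write plus its hits)
```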


Worked cost examples

Example 1: High-volume classification

Calculation:

If you used Opus: $550 input + $50 output = $600/month. That is $490/month wasted.

Example 2: Customer support drafts

Without caching:

With prompt caching:

Example 3: Document summarization (1M context)

Calculation:

Note: a 400K-token document on Sonnet 4.6 would cost $200 + $200 = $400 input + $2 output = $402/month — saving $200/month with minimal quality loss in most summarization tasks. Test before assuming Opus is required.

Example 4: Batch API for nightly data enrichment

Without batch (standard):

With Batch API:

Weekly cost drops from $195 to $97.50 — a saving of $97.50/week, or roughly $420/month.


How to calculate your own costs

Step 1: Estimate token volumes

Use the token counting endpoint (count_tokens in the Python SDK) to measure actual token counts for your prompts rather than estimating:

import anthropic

client = anthropic.Anthropic()

response = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system="Your system prompt here",
    messages=[{"role": "user", "content": "Sample user message"}],
)

print(f"Input tokens: {response.input_tokens}")

Step 2: Calculate cost

def estimate_monthly_cost(
    model: str,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    requests_per_month: int,
    cached_tokens_per_request: int = 0,
) -> dict:
    pricing = {
        "claude-haiku-4-5": {"input": 1.0, "output": 5.0, "cache_read": 0.10, "cache_write": 1.25},
        "claude-sonnet-4-6": {"input": 3.0, "output": 15.0, "cache_read": 0.30, "cache_write": 3.75},
        "claude-opus-4-7": {"input": 5.0, "output": 25.0, "cache_read": 0.50, "cache_write": 6.25},
    }

    p = pricing[model]
    non_cached_input = input_tokens_per_request - cached_tokens_per_request
    total_input = non_cached_input * requests_per_month
    total_output = output_tokens_per_request * requests_per_month
    total_cache_reads = cached_tokens_per_request * requests_per_month
    cache_write_cost = (cached_tokens_per_request / 1_000_000) * p["cache_write"]  # simplification: one write per cached prefix; real caches expire after a TTL and may be re-written

    monthly = {
        "input_cost": (total_input / 1_000_000) * p["input"],
        "output_cost": (total_output / 1_000_000) * p["output"],
        "cache_read_cost": (total_cache_reads / 1_000_000) * p["cache_read"],
        "cache_write_cost": cache_write_cost,
    }
    monthly["total"] = sum(monthly.values())
    return monthly

# Example: Sonnet, 2000 input, 300 output, 1200 cached, 30K req/month
result = estimate_monthly_cost(
    model="claude-sonnet-4-6",
    input_tokens_per_request=2000,
    output_tokens_per_request=300,
    requests_per_month=30_000,
    cached_tokens_per_request=1200,
)
print(result)

What is a token?

A token is approximately 4 characters of English text, or 0.75 words. Common reference points:

| Content | Approximate tokens |
|---|---|
| One tweet | 30-60 |
| One email | 200-800 |
| One blog post (1,500 words) | 2,000 |
| One short story (10K words) | 13,000 |
| One full novel (100K words) | 130,000 |
| Python file (500 lines) | 3,000-8,000 |
| Full codebase (10K files) | Millions |

Code is generally more token-dense than prose because of symbols, brackets, and short variable names.
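The 0.75-words-per-token rule of thumb gives a quick estimator — a rough heuristic for English prose only, which the table's figures round:

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English prose: ~0.75 words per token."""
    return round(word_count / 0.75)

print(estimate_tokens(1_500))    # blog post: ~2,000 tokens
print(estimate_tokens(100_000))  # novel: ~133,000 tokens
```

Always measure with the token counting endpoint before committing to a budget; tokenization varies by content type.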


Rate limits

Rate limits are separate from pricing — they control how many tokens and requests you can send per minute.

| Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|
| Tier 1 (new) | 50 | 50,000 | 10,000 |
| Tier 2 | 1,000 | 100,000 | 32,000 |
| Tier 3 | 2,000 | 200,000 | 64,000 |
| Tier 4 | 4,000 | 400,000 | 128,000 |

Tier upgrades are automatic based on spend history. You can request manual upgrades via the Anthropic Console.
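To check whether a planned workload fits a tier, compare its steady-state rates against all three limits. A sketch — tier numbers from the table above, workload figures hypothetical:

```python
TIER_LIMITS = {  # (requests/min, input tokens/min, output tokens/min)
    1: (50, 50_000, 10_000),
    2: (1_000, 100_000, 32_000),
    3: (2_000, 200_000, 64_000),
    4: (4_000, 400_000, 128_000),
}

def fits_tier(tier: int, rpm: int, input_tpm: int, output_tpm: int) -> bool:
    """True if the workload stays within every limit of the given tier."""
    max_rpm, max_in, max_out = TIER_LIMITS[tier]
    return rpm <= max_rpm and input_tpm <= max_in and output_tpm <= max_out

# 200 requests/min at 1,500 input + 200 output tokens each:
print(fits_tier(2, 200, 200 * 1_500, 200 * 200))  # False: 300K input tokens/min exceeds Tier 2
print(fits_tier(4, 200, 200 * 1_500, 200 * 200))  # True
```

A workload can be blocked by any one of the three limits, so check all of them, not just requests per minute.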


FAQ

Does Claude Code Pro use API credits? Claude Code Pro ($20/month) includes a pool of model usage with Fair Use limits. Very heavy usage consumes the included pool, after which additional usage is billed at standard API rates. The Pro subscription is separate from a raw API key.

Is there a free tier? No free tier on the API. There is a free tier on Claude.ai (the web interface), but it does not provide API access. API access requires a paid account with a credit card.

What currency is billing in? USD. International teams pay in USD and may incur FX conversion fees depending on their payment method.

Are there volume discounts? Not publicly listed as of April 2026. Enterprise contracts negotiated directly with Anthropic may include volume pricing.

How does the Batch API work? You submit a JSONL file of requests to /v1/messages/batches. Anthropic processes them asynchronously and returns results within 24 hours. Streaming and real-time responses are not supported in Batch mode.
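The JSONL shape described above can be sketched as follows — a caller-chosen custom_id for matching results, plus ordinary Messages API parameters per line. Treat exact field names as an assumption to verify against the current API reference; the identifier and prompt here are hypothetical:

```python
import json

# One line of a Batch API JSONL payload: a custom_id for matching results
# back to inputs, plus standard Messages API params.
request_line = {
    "custom_id": "enrich-row-00042",
    "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Classify this record: ..."}],
    },
}

print(json.dumps(request_line))
```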

What is the maximum context window? 200K tokens standard; 1M tokens with 1M context mode enabled (Sonnet 4.6 and Opus 4.7 only). 1M context mode is not enabled by default and requires requesting access via the Anthropic Console.

How do I monitor my spend? The Anthropic Console (console.anthropic.com) shows real-time token usage and dollar spend by model. You can set budget alerts via the Console's billing settings.

Sources

  1. Anthropic API pricing — April 2026
  2. Anthropic API reference — token counting — April 2026
  3. Batch API documentation — April 2026

AI Disclosure: Drafted with Claude Code; pricing verified against Anthropic's published rates as of April 22, 2026.