Claude 1M Context Window: What It Can Do and What It Costs

Everything you need to know about Claude's 1M token context window — what fits in it, what it costs, when to use it, and when not to.

Claude Opus 4.7 and Claude Sonnet 4.6 support a 1 million token context window — roughly 750,000 words, or the equivalent of 10 average novels. This guide explains what that actually means for your use case, what it costs, and when the extended context is worth it.

What 1M tokens looks like in practice

| Content type | Fits in 1M tokens |
| --- | --- |
| Words (English prose) | ~750,000 words |
| Pages (standard 250 words/page) | ~3,000 pages |
| Code (Python, ~100 tokens/KB) | ~10 MB of source code |
| GitHub repos (median size) | ~3-5 repos in full |
| Legal documents | ~500 standard contracts |
| Emails | ~5,000 average emails |
| Slack messages | ~20,000 messages |
| PDF pages (no images) | ~2,500 pages |

Practical upper bound: 1M tokens is the technical limit. In practice, Anthropic recommends staying under 800K for reliable output quality, since the model's attention degrades toward the middle of a very long context.

Pricing for extended context

Standard context (0-200K tokens) is billed at the normal rate. Beyond 200K, the per-token rate doubles.

| Model | 0-200K input | 200K-1M input | Output |
| --- | --- | --- | --- |
| Sonnet 4.6 | $3.00/1M | $6.00/1M | $15.00/1M |
| Opus 4.7 | $5.00/1M | $10.00/1M | $25.00/1M |

Real cost example — 800K token request on Opus: the first 200K tokens bill at $5.00/1M ($1.00) and the remaining 600K at $10.00/1M ($6.00), for $7.00 of input per request.

At 100 requests/month: $700/month on input alone. This is where being selective about context matters enormously.
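The tiered arithmetic can be sketched as a small calculator. Rates come from the pricing table above; the function and the model keys are illustrative, not an SDK API:

```python
# Tiered input pricing: the first 200K tokens bill at the base rate,
# everything beyond at double the base rate (per the pricing table above).
# Rates are in dollars per 1M tokens.
RATES = {
    "sonnet-4.6": {"base": 3.00, "extended": 6.00},
    "opus-4.7":   {"base": 5.00, "extended": 10.00},
}
TIER_BOUNDARY = 200_000

def input_cost(tokens: int, model: str) -> float:
    """Input cost in dollars for a single request with `tokens` input tokens."""
    r = RATES[model]
    base_tokens = min(tokens, TIER_BOUNDARY)
    extended_tokens = max(tokens - TIER_BOUNDARY, 0)
    return (base_tokens * r["base"] + extended_tokens * r["extended"]) / 1_000_000

# 800K-token request on Opus: 200K at $5/M + 600K at $10/M = $7.00
print(input_cost(800_000, "opus-4.7"))        # 7.0
# At 100 requests/month, input alone:
print(100 * input_cost(800_000, "opus-4.7"))  # 700.0
```

Output tokens bill separately at the flat output rate, so add those on top for a full per-request estimate.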

When 1M context is worth it

1. Whole-codebase analysis

When you need Claude to reason across an entire codebase — not just find a file, but understand how components interact — you need the whole thing in context at once.

Alternative to consider first: Claude Code's built-in file navigation (Read, Glob, Grep) lets it explore code without putting everything in context. For 80% of coding tasks, targeted file reading is faster and cheaper.

2. Multi-document synthesis

Legal due diligence, medical record review, financial document analysis, research literature synthesis — tasks where the answer depends on relationships across hundreds of documents.

3. Long conversation history

Agents that run for many turns can use the full history as context for decision-making. A research agent that has made 50 tool calls, read 30 documents, and produced intermediate results can load the entire history for a final synthesis step.

4. Large structured data

When you need Claude to reason over a large dataset — a 100K-row export in CSV form is ~500K tokens — and the reasoning requires seeing all the data rather than a sample. (Note: for data analysis at scale, a database + targeted query is almost always better than loading raw data into context.)
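As a quick pre-flight check, you can estimate whether a raw export plausibly fits under the recommended context budget before sending it. A sketch using the rough 4-characters-per-token heuristic; the function names and threshold constant are illustrative:

```python
CHARS_PER_TOKEN = 4          # rough heuristic; verify with the token-counting API
RECOMMENDED_MAX = 800_000    # stay under ~800K tokens for reliable output

def estimate_csv_tokens(csv_text: str) -> int:
    """Rough token estimate for a raw text payload (~4 chars/token)."""
    return len(csv_text) // CHARS_PER_TOKEN

def should_load_into_context(csv_text: str) -> bool:
    """True if the whole payload plausibly fits under the recommended budget."""
    return estimate_csv_tokens(csv_text) <= RECOMMENDED_MAX

# A small synthetic export; a real 100K-row CSV would land near 500K tokens.
sample = "\n".join(f"{i},value_{i}" for i in range(1000))
print(estimate_csv_tokens(sample))
```

If the estimate comes back over budget — or if the task is really a query rather than open-ended reasoning — route the data to a database instead.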

When NOT to use 1M context

1. You don't actually need it

The most common misuse is sending the full codebase when the task only requires 2-3 files. Use targeted file reads first. Save the full-context approach for tasks where the answer genuinely requires reading everything.

Test: can you find the relevant files with Grep/Glob and read just those? If yes, do that.

2. Speed matters

1M token requests have measurably higher latency. Time to first token is longer. If you need a fast response for a user-facing workflow, consider whether you can reduce the context or use a retrieval step.

3. The cost doesn't justify the use case

At $7+ per request, 1M context requests are expensive. For a use case running 1,000 times/month, that is $7,000+ in input alone. The quality premium must be real and measurable.

4. The task is repetitive over sub-documents

If you are summarizing 1,000 individual documents and do not need cross-document reasoning, process them one at a time (or in batches via Batch API). You do not need 1M context to summarize a single 5-page contract.

How to use the 1M context window

Via the API

1M context requires requesting access via the Anthropic Console for some accounts. Once enabled, you use it by simply sending a larger messages array — no special flag required.

import anthropic

client = anthropic.Anthropic()

# Read the document to analyze
with open("large_document.txt") as f:
    document = f.read()

response = client.messages.create(
    model="claude-opus-4-7",  # or claude-sonnet-4-6
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": f"Analyze this document and find all clauses that could represent liability:\n\n{document}"
        }
    ]
)
print(response.content[0].text)

Checking your context usage

The response object includes usage.input_tokens. Check this to know exactly what you sent:

response = client.messages.create(...)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")

Combining with prompt caching

For repeated analysis over the same large document (e.g., answering multiple questions about the same contract), use prompt caching to avoid re-billing the input tokens on each call:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": large_document_text,
            "cache_control": {"type": "ephemeral"}  # Cache the document
        }
    ],
    messages=[{"role": "user", "content": "What are the termination clauses?"}]
)

# Second call reuses cached document — 90% cheaper on the input
response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": large_document_text,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "What are the payment terms?"}]
)

With a 700K-token document on Sonnet 4.6, the first call pays a one-time cache-write premium on the document; every follow-up question then reads it from cache at roughly a tenth of the normal input rate, cutting input cost per question by about 90%.
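Back-of-envelope numbers for that scenario, assuming Anthropic's usual cache multipliers of 1.25x the input rate for cache writes and 0.1x for cache reads — verify both against current published pricing:

```python
# Back-of-envelope cache savings for a 700K-token document on Sonnet 4.6.
# Assumed multipliers (check current pricing): cache writes at 1.25x the
# input rate, cache reads at 0.1x.
def tiered_input_cost(tokens: int, base=3.00, extended=6.00, boundary=200_000) -> float:
    """Dollar cost of `tokens` input tokens under tiered Sonnet pricing."""
    lo = min(tokens, boundary)
    hi = max(tokens - boundary, 0)
    return (lo * base + hi * extended) / 1_000_000

doc_tokens = 700_000
uncached   = tiered_input_cost(doc_tokens)  # ~$3.60 input per uncached call
first_call = uncached * 1.25                # ~$4.50, pays the cache write
followup   = uncached * 0.10                # ~$0.36, ~90% cheaper per question
print(uncached, first_call, followup)
```

The break-even point comes quickly: by the second question, caching is already far cheaper than resending the document uncached.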
What Claude actually does with a million tokens

This is the question that matters most for deciding whether to use it.

What works well:

- Retrieving specific facts from anywhere in the context ("needle in a haystack" lookups)
- Summarization and synthesis that draw on the full input
- Tracing references across documents loaded together

What degrades at very long context:

- Attention to details buried in the middle of the context
- Fine-grained reasoning that must weigh many widely separated details equally

Mitigation: structure your context so the most important information appears at the beginning and end of the messages array. If you have critical instructions or key documents, place them first.
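That layout advice can be sketched as a small prompt builder — critical instructions and key documents first, bulk material in the middle, and the question restated at the very end. The function and its parameters are illustrative, not an SDK API:

```python
# Sketch of the layout advice above: put critical guidance and key documents
# at the start, lower-priority bulk in the middle, and restate the task last,
# since attention is strongest at the beginning and end of a long context.
def build_prompt(instructions: str, key_docs: list, bulk_docs: list, question: str) -> str:
    parts = [instructions]                         # critical guidance up front
    parts += key_docs                              # most important documents early
    parts += bulk_docs                             # bulk material in the middle
    parts.append(f"Task (restated): {question}")   # question at the very end
    return "\n\n".join(parts)

prompt = build_prompt(
    instructions="You are reviewing contracts for liability clauses.",
    key_docs=["<master agreement text>"],
    bulk_docs=["<exhibit A>", "<exhibit B>"],
    question="List every clause that could create liability.",
)
```

The same ordering applies whether the content goes into a single user message or is split across the system prompt and messages array.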


FAQ

Is 1M context available on Haiku? No. Haiku 4.5 supports up to 200K tokens. Only Sonnet 4.6 and Opus 4.7 support 1M context.

Does context length affect output quality? For tasks within the first 200K tokens of context, quality is equivalent to shorter contexts. For very long contexts, attention degrades slightly in the middle. Plan your context layout accordingly.

Can I use 1M context with the Batch API? Yes. The Batch API supports up to 1M context at 50% off standard rates, which puts Sonnet's extended-tier input at $3.00/1M (vs. $6.00 at standard rates).

How do I estimate whether I need 1M context? Count your actual tokens with the token-counting endpoint (client.messages.count_tokens in the Python SDK) before building. Many tasks that seem to require full context can be handled with targeted retrieval. Build the retrieval version first; upgrade to full context only if quality is insufficient.

What is the maximum output token length? Independent of input context length: 8,192 tokens for most models, 16,000 for Opus 4.7. Input context affects what the model knows, not how much it can generate.

Sources

  1. Anthropic models documentation — April 2026
  2. Claude API pricing — April 2026
  3. Long context best practices — April 2026
AI Disclosure: Drafted with Claude Code; pricing verified against Anthropic's published rates as of April 2026.