# Claude 1M Context Window: What It Can Do and What It Costs
Claude Opus 4.7 and Claude Sonnet 4.6 support a 1 million token context window — roughly 750,000 words, or the equivalent of 10 average novels. This guide explains what that actually means for your use case, what it costs, and when the extended context is worth it.
## What 1M tokens looks like in practice
| Content type | Fits in 1M tokens |
|---|---|
| Words (English prose) | ~750,000 words |
| Pages (standard 250 words/page) | ~3,000 pages |
| Code (Python, ~100 tokens/KB) | ~10 MB of source code |
| GitHub repo (median size) | ~3-5 repos in full |
| Legal documents | ~500 standard contracts |
| Emails | ~5,000 average emails |
| Slack messages | ~20,000 messages |
| PDF pages (no images) | ~2,500 pages |
**Practical upper bound:** 1M tokens is the technical limit. In practice, Anthropic recommends staying under 800K for reliable output quality. The model's attention degrades for material buried in the middle of a very long context.
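The ratios in the table can be approximated locally before you ever call the API. The heuristic below (roughly 4 characters per token for English text) is an assumption, not Claude's real tokenizer; use the API's token-counting endpoint when you need exact numbers:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.

    This is a heuristic approximation, not Claude's actual tokenizer.
    """
    return max(1, len(text) // 4)

# ~750,000 five-letter words plus spaces lands near 1M tokens,
# matching the first row of the table
sample = "word " * 1000          # 5,000 characters
print(estimate_tokens(sample))   # 1250
```

This is good enough for a go/no-go decision ("does my corpus even fit?"); bill-accurate counts require the real tokenizer.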
## Pricing for extended context
Standard context (0-200K tokens) is billed at the normal rate. Beyond 200K, the per-token rate doubles.
| Model | 0-200K input | 200K-1M input | Output |
|---|---|---|---|
| Sonnet 4.6 | $3.00/1M | $6.00/1M | $15.00/1M |
| Opus 4.7 | $5.00/1M | $10.00/1M | $25.00/1M |
**Real cost example — 800K token request on Opus:**
- First 200K: 200,000 tokens × $5/1M = $1.00
- Remaining 600K: 600,000 tokens × $10/1M = $6.00
- Total input: $7.00 per request
- Plus output: if the response is 2,000 tokens → $0.05
- Single request total: ~$7.05
At 100 requests/month: $700/month on input alone ($705 including output). This is the scale at which selective context loading matters enormously.
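The two-tier billing above reduces to a few lines of arithmetic. A minimal helper, with rates hard-coded from the pricing table (adjust them if rates change):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 base_in: float, extended_in: float, out: float) -> float:
    """Dollar cost of one request under two-tier input pricing.

    Rates are dollars per million tokens; the extended rate applies
    only to input tokens beyond the 200K threshold.
    """
    standard = min(input_tokens, 200_000)
    extended = max(input_tokens - 200_000, 0)
    return (standard * base_in + extended * extended_in
            + output_tokens * out) / 1_000_000

# The 800K-token Opus example above: $7.00 input + $0.05 output
print(round(request_cost(800_000, 2_000, 5.00, 10.00, 25.00), 2))  # 7.05
```

Multiplying by your expected monthly request volume gives a quick budget sanity check before committing to a full-context design.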
## When 1M context is worth it
### 1. Whole-codebase analysis
When you need Claude to reason across an entire codebase — not just find a file, but understand how components interact — you need the whole thing in context at once.
Use cases:
- Security audit: finding vulnerability chains across modules
- Architecture review: identifying circular dependencies, anti-patterns
- Refactoring plan: understanding all callers before changing a shared function
- Onboarding doc generation: summarizing the entire codebase for new hires
**Alternative to consider first:** Claude Code's built-in file navigation (Read, Glob, Grep) lets it explore code without putting everything in context. For 80% of coding tasks, targeted file reading is faster and cheaper.
### 2. Multi-document synthesis
Legal due diligence, medical record review, financial document analysis, research literature synthesis — tasks where the answer depends on relationships across hundreds of documents.
Use cases:
- Summarizing 200 earnings calls to find recurring themes
- Finding discrepancies across 50 supplier contracts
- Synthesizing 100 research papers into a literature review
- Analyzing a complete audit trail (logs, tickets, emails) for an incident investigation
### 3. Long conversation history
Agents that run for many turns can use the full history as context for decision-making. A research agent that has made 50 tool calls, read 30 documents, and produced intermediate results can load the entire history for a final synthesis step.
### 4. Large structured data
When you need Claude to reason over a large dataset — a 100K-row export in CSV form is ~500K tokens — and the reasoning requires seeing all the data rather than a sample. (Note: for data analysis at scale, a database + targeted query is almost always better than loading raw data into context.)
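To illustrate the parenthetical: for aggregate questions over tabular data, a local database query produces an answer that costs a handful of tokens instead of the ~500K tokens of raw rows. A minimal sketch with Python's built-in sqlite3, using a small hypothetical `orders` table standing in for a large CSV export:

```python
import sqlite3

# Build a throwaway in-memory table standing in for a large CSV export
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 120.0), ("EU", 80.0), ("US", 200.0)],
)

# A targeted query: only this small result needs to go into
# Claude's context, not the raw rows
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 200.0), ('US', 200.0)]
```

Reserve full-context loading for questions where the model genuinely has to read every row, such as free-text fields that resist SQL.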
## When NOT to use 1M context
### 1. You don't actually need it
The most common misuse is sending the full codebase when the task only requires 2-3 files. Use targeted file reads first. Save the full-context approach for tasks where the answer genuinely requires reading everything.
**Test:** can you find the relevant files with Grep/Glob and read just those? If yes, do that.
### 2. Speed matters
1M token requests have measurably higher latency. Time to first token is longer. If you need a fast response for a user-facing workflow, consider whether you can reduce the context or use a retrieval step.
### 3. The cost doesn't justify the use case
At $7+ per request, 1M context requests are expensive. For a use case running 1,000 times/month, that is $7,000+ in input alone. The quality premium must be real and measurable.
### 4. The task is repetitive over sub-documents
If you are summarizing 1,000 individual documents and do not need cross-document reasoning, process them one at a time (or in batches via Batch API). You do not need 1M context to summarize a single 5-page contract.
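For that per-document case, the work can be packaged for the Batch API. The sketch below only builds the request list; each entry follows the Message Batches request shape (`custom_id` plus `params`), and the prompt wording, model choice, and document list are illustrative assumptions:

```python
def build_batch_requests(documents: list[tuple[str, str]]) -> list[dict]:
    """One independent summarization request per document.

    documents: (doc_id, text) pairs. Each document becomes its own
    Batch API entry; no cross-document reasoning is needed, so no
    1M context window is needed either.
    """
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{
                    "role": "user",
                    "content": f"Summarize this contract in 5 bullets:\n\n{text}",
                }],
            },
        }
        for doc_id, text in documents
    ]

requests = build_batch_requests([("doc-1", "..."), ("doc-2", "...")])
print(len(requests))  # 2
# Then submit: client.messages.batches.create(requests=requests)
```

Each request stays small and cheap, and the 50% batch discount applies on top.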
## How to use the 1M context window
### Via the API
For some accounts, 1M context requires requesting access via the Anthropic Console. Once enabled, you use it by simply sending a larger messages array — no special flag is required.
```python
import anthropic

client = anthropic.Anthropic()

# Read the document you want analyzed
with open("large_document.txt") as f:
    document = f.read()

response = client.messages.create(
    model="claude-opus-4-7",  # or claude-sonnet-4-6
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": f"Analyze this document and find all clauses that could represent liability:\n\n{document}",
        }
    ],
)
print(response.content[0].text)
```
### Checking your context usage
The response object includes usage.input_tokens. Check this to know exactly what you sent:
```python
response = client.messages.create(...)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
```
### Combining with prompt caching
For repeated analysis over the same large document (e.g., answering multiple questions about the same contract), use prompt caching to avoid re-billing the input tokens on each call:
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": large_document_text,
            "cache_control": {"type": "ephemeral"},  # cache the document
        }
    ],
    messages=[{"role": "user", "content": "What are the termination clauses?"}],
)

# Second call reuses the cached document — 90% cheaper on the input
response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": large_document_text,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What are the payment terms?"}],
)
```
With a 700K-token document on Sonnet 4.6:
- Without caching: 200K × $3/1M + 500K × $6/1M = $0.60 + $3.00 = $3.60 per question
- With caching (after the first cache write): cache reads at $0.30/1M × 700K tokens = $0.21 per question
- Savings: ~94% on repeated queries over the same document
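The per-question arithmetic generalizes to a small helper. Rates are hard-coded from the pricing table, with cache reads assumed at $0.30/1M (10% of Sonnet's base input rate); output tokens and the one-time cache-write premium are ignored for simplicity:

```python
def per_question_cost(doc_tokens: int, cached: bool) -> float:
    """Input cost in dollars per question for a large document on Sonnet 4.6.

    Uncached: two-tier pricing ($3/1M up to 200K, $6/1M beyond).
    Cached: flat cache-read rate of $0.30 per 1M tokens.
    """
    if cached:
        return doc_tokens * 30 / 100_000_000  # $0.30 per 1M tokens
    standard = min(doc_tokens, 200_000) * 3.00
    extended = max(doc_tokens - 200_000, 0) * 6.00
    return (standard + extended) / 1_000_000

print(round(per_question_cost(700_000, cached=False), 2))  # 3.6
print(round(per_question_cost(700_000, cached=True), 2))   # 0.21
```

The break-even point comes quickly: even counting the cache-write premium on the first call, caching pays for itself by the second question.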
## What Claude actually does with a million tokens
This is the question that matters most for deciding whether to use it.
**What works well:**
- Finding specific information anywhere in the context ("does this contract mention force majeure?")
- Cross-referencing across documents ("does the pricing in the email match the contract?")
- Summarizing the whole into a structured output
- Finding patterns that only emerge from seeing many instances
**What degrades at very long context:**
- Precise recall of specific facts from the middle of a 1M token context (the "lost in the middle" problem — performance is best at the beginning and end)
- Maintaining a single coherent thread over very long outputs
- Complex multi-step reasoning when the relevant context is scattered across the full 1M
Mitigation: structure your context so the most important information appears at the beginning and end of the messages array. If you have critical instructions or key documents, place them first.
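One way to apply this mitigation: assemble the prompt so the instructions and the most important documents occupy the start, and restate the task at the end. A minimal sketch, where which documents count as "key" is an assumption you would supply (e.g. from a retrieval score):

```python
def layout_context(instructions: str, documents: list[str],
                   key_indices: set[int]) -> str:
    """Order context to fight the 'lost in the middle' problem.

    Key documents go first, right after the instructions; the
    instructions are restated at the end; everything else fills
    the middle, where recall is weakest.
    """
    key = [documents[i] for i in sorted(key_indices)]
    rest = [d for i, d in enumerate(documents) if i not in key_indices]
    parts = [instructions, *key, *rest,
             f"Reminder of the task:\n{instructions}"]
    return "\n\n---\n\n".join(parts)

prompt = layout_context(
    "Find every termination clause.",
    ["contract A", "appendix B", "master contract C"],
    key_indices={2},  # the master contract matters most
)
print(prompt.startswith("Find every termination clause."))  # True
```

The separator and the "Reminder" phrasing are arbitrary; what matters is that the high-value material avoids the middle of the window.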
## FAQ
**Is 1M context available on Haiku?** No. Haiku 4.5 supports up to 200K tokens. Only Sonnet 4.6 and Opus 4.7 support 1M context.
**Does context length affect output quality?** For tasks within the first 200K tokens of context, quality is equivalent to shorter contexts. For very long contexts, attention degrades slightly in the middle. Plan your context layout accordingly.
**Can I use 1M context with the Batch API?** Yes. The Batch API supports the full 1M context, and batch pricing is 50% off standard rates: extended-context input on Sonnet costs $3.00/1M via batch (vs. $6.00/1M standard).
**How do I estimate whether I need 1M context?** Count your actual tokens with the token-counting endpoint (`client.messages.count_tokens`) before building. Many tasks that seem to require full context can be handled with targeted retrieval. Build the retrieval version first; upgrade to full context only if quality is insufficient.
**What is the maximum output token length?** It is independent of input context length: 8,192 tokens for most models, 16,000 for Opus 4.7. Input context affects what the model knows, not how much it can generate.
## Sources
- Anthropic models documentation — April 2026
- Claude API pricing — April 2026
- Long context best practices — April 2026