Claude API Error Handling: Rate Limits, Retries, and Production Patterns
The Anthropic API returns structured errors with specific HTTP status codes. Knowing which errors to retry, which to log and surface to users, and which indicate bugs in your code is the difference between a production-ready integration and one that silently fails. For general Claude API concepts, see the Claude Agent SDK Guide in 2026.
Error code reference
Each row links to a dedicated troubleshooting page with Python + TypeScript code examples (Korean):
| HTTP Status | Error type | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request_error | Malformed request: bad JSON, unsupported parameters, exceeded context window | Fix the request; do not retry |
| 401 | authentication_error | Invalid API key | Check key validity; do not retry |
| 403 | permission_error | Valid key but insufficient permissions (e.g. model not enabled) | Check account permissions; do not retry |
| 404 | not_found_error | Endpoint or model doesn't exist | Fix model name or endpoint; do not retry |
| 413 | request_too_large | Request body exceeds 32MB limit | Use Files API for large attachments |
| 422 | unprocessable_entity | Request valid but semantically wrong (e.g. invalid tool schema) | Fix the schema; do not retry |
| 429 | rate_limit_error | Too many requests or tokens per minute | Retry with exponential backoff |
| 500 | api_error | Internal server error | Retry with backoff, max 3 attempts |
| 529 | overloaded_error | API overloaded | Retry with longer backoff |
Additional HTTP status codes
| Status | Type | Quick fix |
|---|---|---|
| 502 | bad_gateway | Retry after 3, 10, 30, 60, then 120 s |
| 503 | service_unavailable | Check status.anthropic.com + backoff |
| 504 | gateway_timeout | Switch to streaming for long outputs |
Error subtype deep-dives (Korean, code samples)
- context_length_exceeded: trimming when the context window is exceeded
- invalid_api_key: key format validation + environment variable trim
- max_tokens: per-model 8192 limit cap
- model_not_found: latest model identifiers
- prompt_too_long: automatic trim of accumulated conversation
- streaming_error: resume pattern when the SSE connection drops
- tool_use_error: tool_use / tool_result pairing validation
- vision_error: automatic normalization of image format/size
- file_upload_error: Files API + beta header
- batch_error: Batch 10K/250MB limit validation
- cache_error: Prompt Caching cache_control placement
- billing_error: alert on payment/credit shortfall
The critical distinction: 4xx errors (except 429) indicate a problem with your request and should not be retried. 429 and 5xx errors are transient and should be retried. To reduce 400-class errors from oversized contexts, see Claude 1M Context Window for truncation and caching strategies.
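The rule above can be captured in a small classifier. This is a minimal sketch: the name `should_retry` is hypothetical, and it encodes only the 4xx/5xx split described in this section, not the per-code advice in the tables.

```python
def should_retry(status_code: int) -> bool:
    """Return True for transient errors (429 and 5xx), False otherwise."""
    if status_code == 429:
        return True  # rate limited: retry after backing off
    return 500 <= status_code <= 599  # server-side errors, including 529


for code in (400, 401, 404, 429, 500, 529):
    print(code, "retry" if should_retry(code) else "do not retry")
```

Keeping the decision in one function makes it easy to tighten later, for example to stop retrying 504 and switch to streaming instead.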
Rate limit errors (429)
The most common production error. Rate limits are enforced on:
- Requests per minute (RPM): number of API calls
- Input tokens per minute (ITPM): total input tokens
- Output tokens per minute (OTPM): total output tokens
The Retry-After header in the 429 response tells you exactly how many seconds to wait.
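HTTP in general allows Retry-After to carry either a number of seconds or an HTTP date, so a defensive parser is cheap insurance if a proxy or gateway sits in front of the API. The sketch below is an assumption-laden helper, not part of the SDK: `parse_retry_after` is a hypothetical name, and it returns 0 for anything it cannot parse so the caller can fall back to its own backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value to a non-negative delay in seconds."""
    if not value:
        return 0.0
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "12"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return 0.0  # unparseable: caller falls back to its own backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The `now` parameter exists only to make the date branch testable with a fixed clock.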
Python:
import anthropic
import time

# Note: the SDK also retries on its own (2 retries by default).
# Pass max_retries=0 here to make the loop below the only retry layer.
client = anthropic.Anthropic()

def call_with_retry(
    messages: list,
    model: str = "claude-sonnet-4-6",
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> anthropic.types.Message:
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=2048,
                messages=messages,
            )
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect Retry-After header if present; treat a missing or
            # non-numeric value as 0 and rely on exponential backoff
            try:
                retry_after = float(e.response.headers.get("retry-after") or 0)
            except ValueError:
                retry_after = 0.0
            wait = max(retry_after, base_delay * (2 ** attempt))
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500 and attempt < max_retries - 1:
                # 5xx: transient server error, retry
                wait = base_delay * (2 ** attempt)
                print(f"Server error {e.status_code}. Waiting {wait:.1f}s")
                time.sleep(wait)
            else:
                raise  # 4xx or final attempt: re-raise
    raise RuntimeError("Should not reach here")
TypeScript:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(
  messages: Anthropic.Messages.MessageParam[],
  model = "claude-sonnet-4-6",
  maxRetries = 5,
  baseDelay = 1000
): Promise<Anthropic.Messages.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create({
        model,
        max_tokens: 2048,
        messages,
      });
    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        if (attempt === maxRetries - 1) throw err;
        const retryAfter = parseInt(err.headers?.["retry-after"] ?? "0") * 1000;
        const wait = Math.max(retryAfter, baseDelay * Math.pow(2, attempt));
        console.log(`Rate limited. Waiting ${wait}ms (attempt ${attempt + 1}/${maxRetries})`);
        await new Promise((r) => setTimeout(r, wait));
        continue;
      }
      if (err instanceof Anthropic.APIError && err.status >= 500) {
        if (attempt === maxRetries - 1) throw err;
        const wait = baseDelay * Math.pow(2, attempt);
        console.log(`Server error ${err.status}. Waiting ${wait}ms`);
        await new Promise((r) => setTimeout(r, wait));
        continue;
      }
      throw err; // 4xx: do not retry
    }
  }
  throw new Error("Max retries exceeded");
}
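The retry loops above wait a deterministic `base_delay * 2^attempt`. When many workers are rate limited at the same moment, identical delays can make them all retry in lockstep. A common variation is to randomize the delay ("full jitter"). The sketch below is one way to do that; `backoff_delay`, its `cap` default, and the `rng` parameter are illustrative assumptions, not SDK features.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    retry_after: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds,
    but never less than the server-provided retry_after."""
    source = rng or random
    ceiling = min(cap, base * (2 ** attempt))
    jittered = source.uniform(0, ceiling)
    return max(retry_after, jittered)


for attempt in range(5):
    print(f"attempt {attempt}: sleep {backoff_delay(attempt):.2f}s")
```

The `retry_after` floor keeps the jitter from undercutting a wait the server explicitly asked for.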
Context window exceeded (400)
When your input exceeds the model's context window, you get a 400 error:
Error: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"prompt is too long: 205432 tokens > 200000 maximum"}}
Resolution strategies:
- Truncate early messages: for conversations, remove the oldest turns first
- Summarize then truncate: use Haiku to summarize the oldest portion, replace with summary
- Retrieval instead of full context: use pgvector to retrieve relevant chunks instead of full document
- Upgrade to 1M context window: for Sonnet 4.6 or Opus 4.7, request 1M context access
Python (truncate to fit):
def truncate_to_fit(
    messages: list[dict],
    system_prompt: str,
    model: str,
    max_tokens: int = 180_000,  # Leave headroom below 200K
) -> list[dict]:
    """Remove oldest messages until content fits in context window."""
    # Stop at two messages or fewer so the history is never emptied
    while len(messages) > 2:
        # Count tokens for the current candidate history
        response = client.messages.count_tokens(
            model=model,
            system=system_prompt,
            messages=messages,
        )
        if response.input_tokens <= max_tokens:
            break
        # Remove oldest exchange (user + assistant pair) so the
        # history still starts with a user turn
        messages = messages[2:]
    return messages
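The "summarize then truncate" strategy listed above can be sketched without committing to a particular summarizer. In this hedged example, `compact_history` and its `summarize` callback are hypothetical: in practice the callback might call a small model such as Haiku, but here it is just a function you supply. The sketch assumes each message's `content` is a plain string.

```python
from typing import Callable


def compact_history(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    keep_last: int = 4,
) -> list[dict]:
    """Replace all but roughly the last `keep_last` messages with a summary."""
    if len(messages) <= keep_last:
        return messages
    # Start the kept tail on a user turn so roles still alternate
    split = len(messages) - keep_last
    while split < len(messages) and messages[split]["role"] != "user":
        split += 1
    if split >= len(messages):
        return messages  # no user turn to attach the summary to
    old, tail = messages[:split], messages[split:]
    summary = summarize(old)
    first = dict(tail[0])  # copy so the caller's message is not mutated
    first["content"] = (
        f"Summary of earlier conversation:\n{summary}\n\n{first['content']}"
    )
    return [first] + tail[1:]


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]
compacted = compact_history(history, lambda old: f"{len(old)} earlier messages", keep_last=3)
print(compacted[0]["content"])
```

Folding the summary into the first kept user message, rather than adding a separate turn, keeps the user/assistant alternation intact.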
Streaming errors
Streaming responses can fail mid-stream. Handle both initial connection errors and mid-stream errors:
import httpx

def stream_with_recovery(prompt: str) -> str:
    collected = []
    try:
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                collected.append(text)
                print(text, end="", flush=True)
        return "".join(collected)
    # A drop while reading the stream may surface as a raw httpx transport
    # error rather than an SDK exception, so catch both
    except (anthropic.APIConnectionError, httpx.TransportError):
        # Network error mid-stream
        partial = "".join(collected)
        if partial:
            # Re-prompt asking Claude to continue from where it stopped
            print(f"\n[Reconnecting after {len(partial)} chars...]")
            continuation = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                messages=[
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": partial},
                    {"role": "user", "content": "Continue from exactly where you left off."},
                ],
            )
            return partial + continuation.content[0].text
        raise  # No partial content: re-raise
Tool use errors
When a tool raises an error, return the error in the tool result rather than raising in your code. This lets the model reason about the error and retry differently:
def safe_tool_call(tool_name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Always return a tool_result, even on error."""
    try:
        # dispatch_tool is your own function that routes to the tool implementation
        result = dispatch_tool(tool_name, tool_input)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # id from the matching tool_use block
            "content": result,  # must be a string or a list of content blocks
        }
    except Exception as e:
        # Return error as content: model can retry with different params
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Error: {type(e).__name__}: {e}",
            "is_error": True,
        }
Why this matters: if you raise an exception instead of returning an error tool result, the conversation is broken. The tool_use block exists in the assistant message without a matching tool_result, which is a malformed conversation.
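A cheap guard against this failure is to check the pairing before sending the next request. The helper below is a sketch with a hypothetical name, `missing_tool_results`; it assumes content blocks are plain dicts in the shape the Messages API uses (`id` on tool_use blocks, `tool_use_id` on tool_result blocks).

```python
def missing_tool_results(
    assistant_content: list[dict],
    user_content: list[dict],
) -> set[str]:
    """Return ids of tool_use blocks that have no matching tool_result."""
    requested = {
        block["id"] for block in assistant_content if block.get("type") == "tool_use"
    }
    answered = {
        block["tool_use_id"]
        for block in user_content
        if block.get("type") == "tool_result"
    }
    return requested - answered


assistant_turn = [
    {"type": "text", "text": "Let me check both."},
    {"type": "tool_use", "id": "toolu_a", "name": "get_weather", "input": {"city": "Oslo"}},
    {"type": "tool_use", "id": "toolu_b", "name": "get_time", "input": {"tz": "UTC"}},
]
user_turn = [
    {"type": "tool_result", "tool_use_id": "toolu_a", "content": "4 C, cloudy"},
]
print(missing_tool_results(assistant_turn, user_turn))
```

A non-empty result means the next request would be rejected, so it is worth asserting on in development or logging in production.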
The circuit breaker pattern
For high-volume production systems, wrap your Claude calls with a circuit breaker. After N consecutive failures, stop hitting the API for a cooldown period:
import time
from dataclasses import dataclass
from enum import Enum

import anthropic

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing: reject calls
    HALF_OPEN = "half_open"  # Testing recovery

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 60.0  # seconds
    state: CircuitState = CircuitState.CLOSED
    failure_count: int = 0
    last_failure_time: float = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                # Cooldown elapsed: let one trial call through
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit OPEN: Claude API calls suspended")
        try:
            result = fn(*args, **kwargs)
            self._on_success()
            return result
        except anthropic.APIStatusError as e:
            # Count rate limits (429) and server errors (5xx) as failures;
            # other 4xx errors are request bugs and should not trip the breaker
            if e.status_code == 429 or e.status_code >= 500:
                self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            print(f"Circuit OPEN after {self.failure_count} failures")
Logging and observability
Log every API call with enough context to debug failures later:
import logging
import time

logger = logging.getLogger("claude_api")

def logged_call(messages: list, model: str = "claude-sonnet-4-6") -> anthropic.types.Message:
    start = time.time()
    try:
        response = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=messages,
        )
        duration_ms = (time.time() - start) * 1000
        logger.info(
            "claude_api.success",
            extra={
                "model": model,
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "duration_ms": round(duration_ms),
                "stop_reason": response.stop_reason,
            },
        )
        return response
    except anthropic.APIStatusError as e:
        duration_ms = (time.time() - start) * 1000
        logger.error(
            "claude_api.error",
            extra={
                "model": model,
                "status_code": e.status_code,
                "error_type": type(e).__name__,
                "duration_ms": round(duration_ms),
            },
        )
        raise
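One caveat with the `extra` approach: the standard library's default formatter attaches those fields to the log record but does not print them unless the format string names each one. A small JSON formatter makes them visible. The class below is a minimal sketch (the name `JsonFormatter` and the output keys are assumptions); dedicated structured-logging libraries do the same job more thoroughly.

```python
import json
import logging

# Attributes every LogRecord carries; anything else was supplied via `extra`
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", (), None).__dict__
) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render the log message plus any `extra` fields as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
demo_logger = logging.getLogger("claude_api_demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.info("claude_api.success", extra={"model": "claude-sonnet-4-6", "input_tokens": 12})
```

Emitting one JSON object per line keeps the fields queryable in most log aggregation tools.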
FAQ
Should I retry on 400 errors? No. A 400 means your request is malformed. Retrying will get the same 400. Fix the request before retrying.
What is the default retry behavior in the SDK?
The Anthropic Python and TypeScript SDKs automatically retry connection errors and 408, 409, 429, and 5xx responses with exponential backoff, 2 retries by default. Configure via max_retries=N in the Python client constructor (maxRetries in TypeScript).
How do I disable automatic retries?
client = anthropic.Anthropic(max_retries=0)
What happens to in-flight streaming requests when I'm rate limited?
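A 429 is an HTTP status, so it arrives when the request is rejected, before any content streams; handle anthropic.RateLimitError around the call that opens the stream. Failures after streaming has started arrive differently, as an error event or a dropped connection following the 200 response. For those, implement the partial-continuation pattern shown above.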
How do I test error handling in development?
Use httpretty or respx (Python; the SDK is built on httpx, which respx mocks directly) or nock (Node.js) to mock specific HTTP responses from the Anthropic endpoint.
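HTTP mocking is not the only option. Retry logic can also be unit tested with no network layer at all by injecting the call and the sleep function. The sketch below assumes nothing about the SDK: `FakeStatusError`, `retry_call`, and `scripted` are hypothetical stand-ins written for the test, and the retry rule is the 429/5xx split used throughout this article.

```python
import time
from typing import Callable


class FakeStatusError(Exception):
    """Stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry_call(
    fn: Callable[[], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn, retrying 429 and 5xx failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except FakeStatusError as e:
            retryable = e.status_code == 429 or e.status_code >= 500
            if not retryable or attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("max_retries must be at least 1")


def scripted(outcomes: list) -> Callable[[], str]:
    """Return a callable that raises or returns each scripted outcome in turn."""
    remaining = iter(outcomes)

    def call() -> str:
        outcome = next(remaining)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    return call


delays: list = []
result = retry_call(
    scripted([FakeStatusError(429), FakeStatusError(529), "ok"]),
    sleep=delays.append,  # record delays instead of actually sleeping
)
print(result, delays)
```

Passing `delays.append` as the sleep function makes the test instant and lets it assert on the exact backoff schedule.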
Frequently Asked Questions
What HTTP status codes should I retry when calling the Claude API?
Retry on 429 (rate limit), 500 (internal server error), 502 (bad gateway), 503 (service unavailable), and 529 (overloaded). Always use exponential backoff and respect the Retry-After header on 429 responses. Never retry 4xx errors other than 429; they indicate a problem with your request that will not resolve on its own.
How do I implement exponential backoff for Claude API rate limit errors?
Catch anthropic.RateLimitError, read the Retry-After header from the response, and wait max(retry_after, base_delay * 2^attempt) seconds before retrying. The Anthropic Python and TypeScript SDKs automatically retry 429 and 5xx errors with 2 retries by default; configure with max_retries=N in the client constructor.
What causes a Claude API 400 error and how do I fix it?
A 400 (invalid_request_error) means your request is malformed. The most common causes are exceeding the model's context window, invalid JSON in the request body, or an unsupported parameter. Check error.message for the specific reason. Context window overflows are fixed by truncating earlier messages or upgrading to a model with a larger window.
What happens when a Claude API tool call fails mid-conversation?
Return the error as a tool_result with "is_error": true rather than raising an exception. If you raise instead, the conversation becomes malformed: the tool_use block in the assistant message has no matching tool_result. Returning the error lets Claude reason about it and attempt a different approach.