Claude API Error Handling: Production Patterns with Retry Logic
Claude API errors fall into two categories: retriable (429 rate limit, 500 server error, 529 overload) and non-retriable (400 bad request, 401 authentication, 404 not found). The production pattern is: retry 429/500/529 errors with exponential backoff plus jitter, fail fast on other 4xx errors, and wrap all API calls in a circuit breaker for sustained outages. This guide covers every error type and provides ready-to-use Python and TypeScript implementations.
Error codes reference
| HTTP Status | Error Type | Retriable? | Action |
|---|---|---|---|
| 400 | invalid_request_error | No | Fix your request |
| 401 | authentication_error | No | Check API key |
| 403 | permission_error | No | Check model access |
| 404 | not_found_error | No | Check model name |
| 422 | invalid_request_error | No | Fix request body |
| 429 | rate_limit_error | Yes | Exponential backoff |
| 500 | api_error | Yes | Retry with backoff |
| 529 | overload_error | Yes | Retry with longer backoff |
The baseline retry pattern (Python)
```python
import anthropic
import time
import random

client = anthropic.Anthropic()

def call_with_retry(
    model: str,
    messages: list,
    max_tokens: int = 1024,
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> anthropic.types.Message:
    """
    Call the Claude API with exponential backoff for retriable errors.

    Raises immediately on non-retriable 4xx errors.
    """
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=max_tokens,
                messages=messages,
            )
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect Retry-After header if present
            retry_after = float(e.response.headers.get("retry-after", base_delay))
            delay = retry_after + random.uniform(0, 1)  # add jitter
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
        except anthropic.APIStatusError as e:
            if e.status_code in (500, 529):
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter
                delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
                max_delay = 60.0  # cap at 60 seconds
                delay = min(delay, max_delay)
                print(f"API error {e.status_code}. Retrying in {delay:.1f}s")
                time.sleep(delay)
            else:
                # 4xx errors: fail fast, don't retry
                raise
    raise RuntimeError("Max retries exceeded")
```
TypeScript retry implementation
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(
  model: string,
  messages: Anthropic.MessageParam[],
  maxTokens = 1024,
  maxRetries = 5,
  baseDelay = 1000, // ms
): Promise<Anthropic.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create({
        model,
        max_tokens: maxTokens,
        messages,
      });
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError) {
        if (attempt === maxRetries - 1) throw error;
        const retryAfter =
          Number(error.headers?.["retry-after"] ?? baseDelay / 1000) * 1000;
        const delay = retryAfter + Math.random() * 1000; // add jitter
        console.log(`Rate limited. Retrying in ${delay}ms`);
        await sleep(delay);
      } else if (error instanceof Anthropic.APIError) {
        if ([500, 529].includes(error.status ?? 0)) {
          if (attempt === maxRetries - 1) throw error;
          const delay = Math.min(
            baseDelay * Math.pow(2, attempt) + Math.random() * 1000,
            60_000,
          );
          console.log(`API error ${error.status}. Retrying in ${delay}ms`);
          await sleep(delay);
        } else {
          throw error; // 4xx: fail fast
        }
      } else {
        throw error; // unknown error: fail fast
      }
    }
  }
  throw new Error("Max retries exceeded");
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
```

Note that the TypeScript SDK exposes its error classes on the `Anthropic` namespace (`Anthropic.RateLimitError`, `Anthropic.APIError`); `APIStatusError` is the Python SDK's name for the same concept.
Using the Anthropic SDK's built-in retry
The official Anthropic SDK includes built-in retry logic. You can configure it on the client:
```python
# Python: SDK-level retry
client = anthropic.Anthropic(
    max_retries=3,  # default is 2
)
```

```typescript
// TypeScript: SDK-level retry
const client = new Anthropic({
  maxRetries: 3, // default is 2
});
```
The SDK's built-in retry handles 429, 500, and 529 errors (along with connection errors and request timeouts) with exponential backoff automatically. Use this for simple use cases. Use the manual patterns above when you need:
- Custom logging and metrics per retry
- Per-request retry configuration (different max_retries for different criticality)
- Circuit breaker integration
Circuit breaker pattern
For production agents that make many API calls, a circuit breaker prevents cascading failures when the API has a sustained outage:
```python
import time
from enum import Enum
from typing import Optional

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, reject calls
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        success_threshold: int = 2,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time: Optional[float] = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
            else:
                raise RuntimeError("Circuit breaker OPEN — API calls rejected")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
        elif self.state == CircuitState.CLOSED:
            self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60)

def safe_api_call(messages):
    return breaker.call(
        call_with_retry,
        "claude-sonnet-4-6",
        messages,
    )
```
Timeout handling
The Anthropic SDK uses a 10-minute default timeout. For production, set explicit timeouts per request type:
```python
# Short timeout for simple queries
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=256,
    messages=messages,
    timeout=30.0,  # 30 seconds
)

# Longer timeout for complex generation
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=messages,
    timeout=120.0,  # 2 minutes
)
```
For streaming responses, the timeout applies to connection establishment, not the full stream:
```python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=messages,
    timeout=30.0,  # timeout for first token
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Error logging and observability
In production, log every error with context for debugging:
```python
import logging
import time
import uuid
from typing import Optional

import anthropic

logger = logging.getLogger(__name__)

def call_with_observability(messages: list, trace_id: Optional[str] = None) -> dict:
    trace_id = trace_id or str(uuid.uuid4())[:8]
    start_time = time.time()
    try:
        response = call_with_retry("claude-sonnet-4-6", messages)
        duration = time.time() - start_time
        logger.info(
            "claude_api_success",
            extra={
                "trace_id": trace_id,
                "duration_ms": int(duration * 1000),
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "cache_read_tokens": getattr(response.usage, "cache_read_input_tokens", 0),
            },
        )
        return {"success": True, "response": response}
    except anthropic.RateLimitError:
        logger.warning("claude_rate_limit", extra={"trace_id": trace_id})
        return {"success": False, "error": "rate_limit", "retriable": True}
    except anthropic.APIStatusError as e:
        logger.error(
            "claude_api_error",
            extra={"trace_id": trace_id, "status": e.status_code, "message": str(e)},
        )
        return {"success": False, "error": str(e), "retriable": e.status_code >= 500}
```
Frequently asked questions
What is the difference between a 429 and a 529 error?
A 429 (rate_limit_error) means you've exceeded your requests-per-minute or tokens-per-minute limit. A 529 (overload_error) means Anthropic's servers are temporarily overloaded — it's a server-side capacity issue, not your usage rate. Both are retriable, but 529 errors typically resolve faster (seconds to a few minutes).
How do I check my rate limits?
Rate limit headers are included in every API response: anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining. Monitor these to proactively throttle before hitting limits.
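One way to act on these headers is a small throttle check that backs off before the limit is actually hit. A sketch, assuming you have the response headers as a lower-cased mapping; `should_throttle` and the threshold are our choices, not an SDK API:

```python
def should_throttle(headers: dict, min_remaining: int = 5) -> bool:
    """Return True when the remaining request budget is nearly exhausted.

    `headers` is the response-header mapping from a recent API call.
    """
    remaining = headers.get("anthropic-ratelimit-requests-remaining")
    if remaining is None:
        return False  # header absent: don't throttle
    return int(remaining) < min_remaining
```

When this returns True, sleep or shed low-priority work until the budget recovers instead of waiting for a 429.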
Should I use streaming to avoid timeouts on long responses?
Yes. For responses over ~500 tokens, streaming is recommended. The first token arrives quickly, so the connection stays active. Non-streaming requests wait for the full response, increasing timeout risk on long generations.
How do I handle errors in multi-step agent loops?
For each step in the agent loop, decide whether an error should halt the entire agent (non-retriable 4xx) or retry the step (429/5xx). Most agent frameworks retry the current step up to 3 times before failing the overall task. See the Agent SDK patterns guide for structured error handling in agent loops.
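That per-step policy can be sketched as a generic wrapper. The `retriable` attribute stands in for however your framework marks retriable failures; the function name is ours:

```python
def run_step_with_retry(step, max_step_retries: int = 3):
    """Run one agent step, retrying only errors marked retriable.

    `step` is a zero-argument callable. Exceptions with a truthy
    `.retriable` attribute (429/5xx-style) are retried; anything
    else (4xx-style) propagates immediately and halts the agent.
    """
    last_error = None
    for _attempt in range(max_step_retries):
        try:
            return step()
        except Exception as e:
            if not getattr(e, "retriable", False):
                raise  # non-retriable: halt the whole agent
            last_error = e
    raise last_error  # retriable error persisted past the budget
```

In a real loop you would add backoff between attempts, as in the retry patterns above.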
How do I make sure my API key and authentication headers are set up correctly?
A 401 authentication_error almost always means a misconfigured key or header. See the API authentication setup guide for the correct header format and environment variable configuration.
Is there a way to test error handling without causing real errors?
The SDK has no dedicated test mode, but the Python client accepts a custom `http_client`, so you can inject an httpx mock transport (e.g. `httpx.MockTransport`) that returns specific status codes. Alternatively, wrap your API client in an adapter interface and inject a mock that raises specific error types in your test suite.
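The adapter approach can be as small as this sketch; `LLMClient` and `FailingClient` are our names, and the stub reply is a placeholder:

```python
class LLMClient:
    """Minimal adapter interface your application code depends on."""

    def complete(self, messages: list) -> dict:
        raise NotImplementedError

class FailingClient(LLMClient):
    """Test double that raises a chosen error a fixed number of times."""

    def __init__(self, error: Exception, failures: int = 1):
        self.error = error
        self.failures = failures
        self.calls = 0

    def complete(self, messages: list) -> dict:
        self.calls += 1
        if self.calls <= self.failures:
            raise self.error  # simulate a transient failure
        return {"role": "assistant", "content": "stub reply"}
```

In tests, pass something like `FailingClient(some_retriable_error, failures=2)` to your retry wrapper and assert that it recovers on the third attempt.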
Take It Further
Claude Agent SDK Cookbook: 40 Production Patterns — Pattern 12: Retry Logic + Circuit Breaker is covered in depth with production-tested code. Includes patterns for per-step error handling in multi-agent pipelines.
→ Get the Agent SDK Cookbook — $49
30-day money-back guarantee. Instant download.