Claude API errors fall into four categories: rate limits (429), authentication (401), invalid requests (400), and server errors (500+). Each category requires a different response: rate limits warrant exponential backoff with jitter, auth errors should page your on-call immediately, bad requests need request inspection, and server errors call for retry with circuit-breaker logic. Getting these four right means your production app survives the edge cases without manual intervention.
What error types does the Claude API return?
The Anthropic SDK maps every HTTP status code to a typed exception. Here is the full reference:
| Error | HTTP Status | SDK Exception | When it happens |
|---|---|---|---|
| Rate limit | 429 | `RateLimitError` | Too many requests or tokens per minute |
| Auth error | 401 | `AuthenticationError` | Invalid or missing API key |
| Permission | 403 | `PermissionDeniedError` | Key lacks required capability |
| Not found | 404 | `NotFoundError` | Invalid model name or endpoint |
| Invalid request | 400 | `BadRequestError` | Malformed message structure |
| Server error | 500 | `InternalServerError` | Anthropic-side issue |
| Overloaded | 529 | `APIStatusError` | Service temporarily overloaded |
The SDK raises these as Python exceptions (or TypeScript Error subclasses) — you do not need to inspect raw HTTP status codes in your application logic. Catch by exception type and your logic stays clean.
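The catch-by-type advice reduces to a small dispatch. A minimal sketch of the four-way strategy from the table — the `classify_error` name and the action strings are illustrative, not part of the SDK:

```python
def classify_error(status: int) -> str:
    """Map an HTTP status code to the handling strategy from the table."""
    if status == 429:
        return "backoff-and-retry"          # RateLimitError
    if status in (401, 403):
        return "page-on-call"               # Authentication / PermissionDenied
    if status in (400, 404):
        return "inspect-request"            # BadRequest / NotFound
    if status >= 500:                       # 500 and 529 alike
        return "retry-with-circuit-breaker"
    return "unhandled"
```

In practice you catch the typed exceptions directly; this mapping is just the decision table your handlers implement.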
How do you implement exponential backoff for Claude API rate limits in Python?
The pattern below covers RateLimitError and InternalServerError, adds random jitter to avoid thundering-herd problems, and gives up after a configurable number of attempts:
```python
import anthropic
import time
import random
from typing import Callable, TypeVar

T = TypeVar("T")

client = anthropic.Anthropic()


def with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Retry with exponential backoff + jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except anthropic.RateLimitError:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
        except anthropic.InternalServerError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Usage
response = with_retry(
    lambda: client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
)
```
Key decisions in this implementation:

- Jitter is added only to rate-limit retries, where synchronized clients are competing for the same token bucket; server-error retries gain less from it, though adding jitter there does no harm.
- `base_delay * (2 ** attempt)` gives delays of 1s, 2s, 4s on attempts 1–3. Combined with jitter, real delays are 1.0–2.0s, 2.0–3.0s, 4.0–5.0s.
- Do not retry `AuthenticationError`, `BadRequestError`, or `PermissionDeniedError`. These will not succeed on retry — fix the request or the key first.
How do you handle Claude API errors in TypeScript?
The TypeScript SDK ships with a built-in retry mechanism, enabled by default with two attempts. A single constructor option adjusts the count:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 3 }); // built-in retry

// Or manual, for custom logic:
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError && attempt < maxRetries) {
        const delay = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
        await new Promise((resolve) => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
  throw new Error("Max retries exceeded");
}
```
The `maxRetries` constructor option handles `RateLimitError` and `InternalServerError` automatically using the same exponential backoff strategy. Use the manual `withRetry` function only when you need custom logic — for example, switching models mid-retry or logging attempt counts to your observability stack.
What are the best graceful degradation patterns for Claude API failures?
Graceful degradation means your application returns something useful even when the API is unavailable or rate-limited beyond recovery. Four patterns cover most production scenarios:
1. Fallback response
Cache the last successful response for idempotent requests. If all retries fail, return the cached version with a staleness flag. Works well for dashboards, summaries, and classification tasks where slightly stale data is acceptable.
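A minimal sketch of the cached-fallback pattern, assuming a `call_model` callable that raises on failure; the in-memory cache, the `(text, is_stale)` return shape, and the one-hour `max_age` are illustrative choices:

```python
import time

# key -> (response text, timestamp of last success)
_cache: dict[str, tuple[str, float]] = {}


def with_fallback(key: str, call_model, max_age: float = 3600.0):
    """Return (text, is_stale); serve the cached copy if the call fails."""
    try:
        text = call_model()
        _cache[key] = (text, time.time())
        return text, False
    except Exception:
        cached = _cache.get(key)
        if cached and time.time() - cached[1] < max_age:
            return cached[0], True   # stale but within the freshness window
        raise
```

The staleness flag lets the UI render a "last updated N minutes ago" banner instead of silently serving old data.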
2. Model downgrade
If claude-sonnet-4-5 returns a RateLimitError on final retry, fall back to claude-haiku-4-5. Haiku has a separate token-per-minute quota and a lower per-token cost. Implement this as a model chain: try Sonnet, catch limit, try Haiku, catch limit, raise.
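The chain can be sketched generically. Here `create_fn` stands in for a call to `client.messages.create(model=...)` and `limit_error` for `anthropic.RateLimitError`, so the control flow is testable without the SDK:

```python
MODEL_CHAIN = ["claude-sonnet-4-5", "claude-haiku-4-5"]


def create_with_downgrade(create_fn, limit_error=Exception):
    """Try each model in order, falling through on rate limits only."""
    last = len(MODEL_CHAIN) - 1
    for i, model in enumerate(MODEL_CHAIN):
        try:
            return create_fn(model)
        except limit_error:
            if i == last:
                raise   # every model in the chain is rate limited
```

Any non-rate-limit exception propagates immediately, which keeps the do-not-retry rule for 400s and 401s intact.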
3. Partial response on streaming cutoff
When streaming, buffer tokens as they arrive. If the stream terminates with a network error or InternalServerError, return the buffered partial text to the user rather than an empty response. Prefix it with a notice that the response was truncated.
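A sketch of the buffering logic, with `token_iter` standing in for the SDK's `stream.text_stream` iterator; the `[truncated]` prefix is an illustrative choice for the user-facing notice:

```python
def stream_with_salvage(token_iter):
    """Return (text, truncated); on a mid-stream error, keep what arrived."""
    chunks = []
    try:
        for token in token_iter:
            chunks.append(token)
        return "".join(chunks), False
    except Exception:
        if chunks:
            return "[truncated] " + "".join(chunks), True
        raise   # nothing buffered; let normal error handling take over
```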
4. Queue for later processing
For non-interactive workloads — document processing, batch classification, report generation — push failed requests to a durable queue (Redis, SQS, or a local SQLite table). A background worker retries with its own backoff schedule. This decouples user-facing latency from API availability.
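A minimal sketch of the SQLite variant; the table name, columns, and batch size are illustrative, and a real worker would also track attempt counts:

```python
import json
import sqlite3
import time


def enqueue_failed(db_path: str, payload: dict) -> None:
    """Persist a failed request for a background worker to retry later."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS retry_queue "
        "(id INTEGER PRIMARY KEY, payload TEXT, enqueued_at REAL)"
    )
    con.execute(
        "INSERT INTO retry_queue (payload, enqueued_at) VALUES (?, ?)",
        (json.dumps(payload), time.time()),
    )
    con.commit()
    con.close()


def dequeue_batch(db_path: str, limit: int = 10) -> list[dict]:
    """Pop the oldest entries for the worker's own backoff loop."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT id, payload FROM retry_queue ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    con.executemany(
        "DELETE FROM retry_queue WHERE id = ?", [(r[0],) for r in rows]
    )
    con.commit()
    con.close()
    return [json.loads(r[1]) for r in rows]
```

Swapping SQLite for Redis or SQS changes only these two functions; the worker loop stays the same.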
How do you monitor Claude API rate limit headers to avoid hitting limits?
The API returns rate limit state in every response header. Reading these proactively lets you throttle before you get a 429:
- `x-ratelimit-limit-requests` — your RPM ceiling
- `x-ratelimit-remaining-requests` — requests left in the current window
- `x-ratelimit-reset-requests` — ISO-8601 timestamp when the window resets
In Python, use `client.messages.with_raw_response.create(...)`: the returned object exposes `.headers`, and `.parse()` gives you the message body. In TypeScript, call `.withResponse()` on the request promise to get the raw response alongside the parsed data.
A practical approach: if remaining-requests drops below 10% of limit-requests, introduce a short sleep before the next call. This costs a few hundred milliseconds on hot paths but eliminates the retry tax entirely.
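The throttle check can be sketched as a pure function over the headers above; the 10% threshold and 300 ms pause are the article's suggestion, not API guidance:

```python
import time


def throttle_if_needed(headers: dict, pause: float = 0.3) -> bool:
    """Sleep briefly when the request budget is nearly exhausted.

    Returns True when a throttle pause was taken.
    """
    limit = int(headers.get("x-ratelimit-limit-requests", 0))
    remaining = int(headers.get("x-ratelimit-remaining-requests", 0))
    if limit and remaining < 0.1 * limit:
        time.sleep(pause)
        return True
    return False
```

Call it with the headers from the previous response before issuing the next request on a hot path.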
What should production error monitoring look like for Claude API integrations?
Log every API call with at minimum: model name, tokens requested, error type, and attempt number. Structure logs as JSON so your observability platform (Datadog, Grafana, CloudWatch) can build metrics from them.
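A sketch of the minimal structured record; the field names are a choice, and a real integration would hand the line to a logging framework rather than print it:

```python
import json
import time


def log_api_call(model: str, tokens: int, error_type, attempt: int) -> str:
    """Emit one JSON log line per API call with the minimum useful fields."""
    record = {
        "ts": time.time(),
        "model": model,
        "max_tokens": tokens,
        "error_type": error_type,   # None on success
        "attempt": attempt,
    }
    line = json.dumps(record)
    print(line)
    return line
```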
Alert on:
- Error rate above 5% over a 5-minute window — indicates a systemic problem
- Two or more consecutive 500 errors from the same request path — may indicate a model-specific regression
- Any `AuthenticationError` — fire an immediate page; your key may have been rotated or revoked
Do not alert on:
- Occasional 429s — these are normal under bursty load and your retry logic handles them
- Single 500 errors — transient, retry handles them
Keep a separate counter for final-attempt failures (retries exhausted). Alert when that counter exceeds your SLA budget. This is the number your on-call engineer actually needs to wake up for.
Frequently asked questions
Does the Anthropic Python SDK have built-in retries?
Yes, in current versions. The Python SDK automatically retries connection errors, 408, 409, 429, and 5xx responses — two retries by default, configurable via `anthropic.Anthropic(max_retries=...)`. Reach for a manual wrapper like the `with_retry` example above when you need custom behavior such as jitter tuning, attempt logging, or model fallback. The TypeScript SDK exposes the same control through its `maxRetries` constructor option.
Should I retry on a 400 BadRequestError?
No. A 400 means your request was malformed — wrong message structure, unsupported parameter, or content that failed moderation. Retrying the same request will produce the same error. Log the request body, fix the schema, and redeploy.
What is error 529 and how is it different from 500?
HTTP 529 is an Anthropic-specific status meaning the service is temporarily overloaded. The SDK may surface it as `APIStatusError` rather than `InternalServerError`. Treat it identically to a 500: retry with exponential backoff, but do not retry more than three times in under 30 seconds.
How do I handle timeout errors in the Claude API?
Set an explicit timeout on the client constructor (`anthropic.Anthropic(timeout=30.0)` in Python). When the request exceeds that threshold, the SDK raises `anthropic.APITimeoutError`. Treat it like a 500 in your retry logic — it is likely transient and worth one or two retries.
Can I use the Retry-After header from a 429 response?
The Anthropic API does not reliably return a Retry-After header. Use the x-ratelimit-reset-requests header instead: parse the ISO-8601 timestamp and sleep until that time before retrying. If neither header is present, fall back to your exponential backoff default.
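The sleep-until-reset calculation can be sketched as follows, assuming the header carries an ISO-8601 UTC timestamp as described above (a trailing `Z` is normalized for `fromisoformat`):

```python
from datetime import datetime, timezone


def seconds_until_reset(reset_header: str) -> float:
    """Seconds to sleep before retrying, given x-ratelimit-reset-requests."""
    reset = datetime.fromisoformat(reset_header.replace("Z", "+00:00"))
    now = datetime.now(timezone.utc)
    return max(0.0, (reset - now).total_seconds())
```

Feed the result into `time.sleep()` before the retry, falling back to exponential backoff when the header is absent.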
Take It Further
Claude Agent SDK Cookbook: 40 Production Patterns — Pattern 8 covers the complete Production Error Handling system: circuit breakers, model fallback chains, dead letter queues for failed requests, cost-aware retry logic, and the monitoring dashboard that distinguishes transient failures from systemic issues.
-> Get the Agent SDK Cookbook — $49
30-day money-back guarantee. Instant download.