
Claude API Error Handling: Production Patterns with Retry Logic

Complete guide to handling Claude API errors in production: rate limits, overload errors, timeout patterns, exponential backoff, and circuit breakers.

Claude API errors fall into two categories: retriable (429 rate limit, 500 server error, 529 overload) and non-retriable (400 bad request, 401 authentication, 404 not found). The production pattern is: retry 429, 500, and 529 errors with exponential backoff plus jitter, fail fast on 4xx errors, and wrap all API calls in a circuit breaker for sustained outages. This guide covers every error type and provides ready-to-use Python and TypeScript implementations.


Error codes reference

HTTP Status Error Type Retriable? Action
400 invalid_request_error No Fix your request
401 authentication_error No Check API key
403 permission_error No Check model access
404 not_found_error No Check model name
422 invalid_request_error No Fix request body
429 rate_limit_error Yes Exponential backoff
500 api_error Yes Retry with backoff
529 overload_error Yes Retry with longer backoff
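The table above boils down to a small lookup. A minimal sketch (the `is_retriable` helper is ours for illustration, not part of the SDK):

```python
# Status codes the table above marks as retriable.
RETRIABLE_STATUSES = {429, 500, 529}

def is_retriable(status_code: int) -> bool:
    """Return True for errors worth retrying with backoff; 4xx fails fast."""
    return status_code in RETRIABLE_STATUSES
```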

The baseline retry pattern (Python)

import anthropic
import time
import random
from typing import Optional

client = anthropic.Anthropic()

def call_with_retry(
    model: str,
    messages: list,
    max_tokens: int = 1024,
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> anthropic.types.Message:
    """
    Call the Claude API with exponential backoff for retriable errors.
    Raises immediately on non-retriable 4xx errors.
    """
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=max_tokens,
                messages=messages,
            )

        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect Retry-After header if present
            retry_after = float(e.response.headers.get("retry-after", base_delay))
            delay = retry_after + random.uniform(0, 1)  # add jitter
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)

        except anthropic.APIStatusError as e:
            if e.status_code in (500, 529):
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter
                delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
                max_delay = 60.0  # cap at 60 seconds
                delay = min(delay, max_delay)
                print(f"API error {e.status_code}. Retrying in {delay:.1f}s")
                time.sleep(delay)
            else:
                # 4xx errors: fail fast, don't retry
                raise

    raise RuntimeError("Max retries exceeded")
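The delay formula used above, base_delay * 2**attempt plus jitter, capped at 60 seconds, can be pulled out and inspected on its own. A quick sketch of the schedule it produces (the standalone `backoff_delay` helper is ours):

```python
import random

def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Same formula as call_with_retry: exponential growth plus full jitter, capped."""
    return min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)

# Attempts 0..4 wait roughly 1, 2, 4, 8, 16 seconds (plus up to 1s of jitter);
# by attempt 6 the cap kicks in and every delay is exactly 60s.
for attempt in range(7):
    print(f"attempt {attempt}: {backoff_delay(attempt):.1f}s")
```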

TypeScript retry implementation

import Anthropic, { APIStatusError, RateLimitError } from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(
  model: string,
  messages: Anthropic.MessageParam[],
  maxTokens = 1024,
  maxRetries = 5,
  baseDelay = 1000, // ms
): Promise<Anthropic.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create({
        model,
        max_tokens: maxTokens,
        messages,
      });
    } catch (error) {
      if (error instanceof RateLimitError) {
        if (attempt === maxRetries - 1) throw error;
        const retryAfter =
          Number(error.headers?.["retry-after"] ?? baseDelay / 1000) * 1000;
        const delay = retryAfter + Math.random() * 1000; // add jitter
        console.log(`Rate limited. Retrying in ${delay}ms`);
        await sleep(delay);
      } else if (error instanceof APIStatusError) {
        if ([500, 529].includes(error.status)) {
          if (attempt === maxRetries - 1) throw error;
          const delay = Math.min(
            baseDelay * Math.pow(2, attempt) + Math.random() * 1000,
            60_000,
          );
          console.log(`API error ${error.status}. Retrying in ${delay}ms`);
          await sleep(delay);
        } else {
          throw error; // 4xx: fail fast
        }
      } else {
        throw error; // unknown error: fail fast
      }
    }
  }
  throw new Error("Max retries exceeded");
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

Using the Anthropic SDK's built-in retry

The official Anthropic SDK includes built-in retry logic. You can configure it on the client:

# Python: SDK-level retry
client = anthropic.Anthropic(
    max_retries=3,  # default is 2
)
// TypeScript: SDK-level retry
const client = new Anthropic({
  maxRetries: 3, // default is 2
});

The SDK's built-in retry handles 429, 500, and 529 errors with exponential backoff automatically. Use this for simple use cases. Use the manual patterns above when you need Retry-After-aware delays, custom backoff caps, per-attempt logging, or circuit breaker integration.


Circuit breaker pattern

For production agents that make many API calls, a circuit breaker prevents cascading failures when the API has a sustained outage:

import time
from enum import Enum
from typing import Optional

class CircuitState(Enum):
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Failing, reject calls
    HALF_OPEN = "half_open" # Testing recovery

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        success_threshold: int = 2,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time: Optional[float] = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
            else:
                raise Exception("Circuit breaker OPEN — API calls rejected")

        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure()
            raise

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
        elif self.state == CircuitState.CLOSED:
            self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60)

def safe_api_call(messages):
    return breaker.call(
        call_with_retry,
        "claude-sonnet-4-6",
        messages,
    )
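To see the breaker trip, here is a condensed, illustrative version of the same state machine (the `MiniBreaker` name is ours) driven by a function that always fails:

```python
import time

class MiniBreaker:
    """Condensed version of the CircuitBreaker above, for illustration only."""
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.open = False

    def call(self, func):
        # While open and inside the recovery window, reject without calling
        if self.open and time.time() - self.last_failure_time < self.recovery_timeout:
            raise RuntimeError("circuit open")
        try:
            result = func()
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.open = True
            raise
        self.failure_count = 0
        self.open = False
        return result

breaker = MiniBreaker(failure_threshold=3)

def flaky():
    raise ConnectionError("simulated 529")

# Three real failures trip the breaker...
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

# ...and the fourth call is rejected by the breaker itself, without
# touching the API.
try:
    breaker.call(flaky)
except RuntimeError as e:
    print(e)  # circuit open
```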

Timeout handling

The Anthropic SDK uses a 10-minute default timeout. For production, set explicit timeouts per request type:

# Short timeout for simple queries
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=256,
    messages=messages,
    timeout=30.0,  # 30 seconds
)

# Longer timeout for complex generation
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=messages,
    timeout=120.0,  # 2 minutes
)

For streaming responses, the timeout applies to connection establishment, not the full stream:

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=messages,
    timeout=30.0,  # timeout for first token
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Error logging and observability

In production, log every error with context for debugging:

import logging
import time
import uuid
from typing import Optional

import anthropic

logger = logging.getLogger(__name__)

def call_with_observability(messages: list, trace_id: Optional[str] = None) -> dict:
    trace_id = trace_id or str(uuid.uuid4())[:8]
    start_time = time.time()

    try:
        response = call_with_retry("claude-sonnet-4-6", messages)
        duration = time.time() - start_time
        logger.info(
            "claude_api_success",
            extra={
                "trace_id": trace_id,
                "duration_ms": int(duration * 1000),
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "cache_read_tokens": getattr(response.usage, "cache_read_input_tokens", 0),
            }
        )
        return {"success": True, "response": response}

    except anthropic.RateLimitError as e:
        logger.warning("claude_rate_limit", extra={"trace_id": trace_id})
        return {"success": False, "error": "rate_limit", "retriable": True}

    except anthropic.APIStatusError as e:
        logger.error(
            "claude_api_error",
            extra={"trace_id": trace_id, "status": e.status_code, "message": str(e)}
        )
        return {"success": False, "error": str(e), "retriable": e.status_code >= 500}

Frequently asked questions

What is the difference between a 429 and a 529 error? A 429 (rate_limit_error) means you've exceeded your requests-per-minute or tokens-per-minute limit. A 529 (overload_error) means Anthropic's servers are temporarily overloaded — it's a server-side capacity issue, not your usage rate. Both are retriable, but 529 errors typically resolve faster (seconds to a few minutes).

How do I check my rate limits? Rate limit headers are included in every API response: anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining. Monitor these to proactively throttle before hitting limits.
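One way to act on those headers is a throttle check that compares the remaining request budget to the limit (header names as listed above; the `should_throttle` helper itself is a sketch, not part of the SDK):

```python
def should_throttle(headers: dict, threshold: float = 0.1) -> bool:
    """Return True when fewer than `threshold` (fraction) of requests remain."""
    limit = int(headers.get("anthropic-ratelimit-requests-limit", 0))
    remaining = int(headers.get("anthropic-ratelimit-requests-remaining", 0))
    if limit == 0:
        return False  # headers missing; don't throttle blindly
    return remaining / limit < threshold

# Only 5% of the request budget left -> back off proactively
print(should_throttle({
    "anthropic-ratelimit-requests-limit": "1000",
    "anthropic-ratelimit-requests-remaining": "50",
}))
```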

Should I use streaming to avoid timeouts on long responses? Yes. For responses over ~500 tokens, streaming is recommended. The first token arrives quickly, so the connection stays active. Non-streaming requests wait for the full response, increasing timeout risk on long generations.

How do I handle errors in multi-step agent loops? For each step in the agent loop, decide whether an error should halt the entire agent (non-retriable 4xx) or retry the step (5xx/529). Most agent frameworks retry the current step up to 3 times before failing the overall task. See the Agent SDK patterns guide for structured error handling in agent loops.
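That per-step policy can be sketched as a small wrapper, with stand-in exception classes for retriable and fatal errors (the names are ours, not from any framework):

```python
class RetriableError(Exception):
    """Stand-in for 429/500/529 failures."""

class FatalError(Exception):
    """Stand-in for non-retriable 4xx failures."""

def run_step_with_retry(step_fn, max_attempts: int = 3):
    """Retry a single agent step on retriable errors; let fatal errors
    propagate immediately and halt the agent."""
    for attempt in range(max_attempts):
        try:
            return step_fn()
        except RetriableError:
            if attempt == max_attempts - 1:
                raise
        # FatalError is deliberately not caught here

# Simulate a step that fails twice, then succeeds on the third attempt
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetriableError("simulated 529")
    return "step result"

result = run_step_with_retry(flaky_step)
print(result)  # step result
```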

How do I make sure my API key and authentication headers are set up correctly? A 401 authentication_error almost always means a misconfigured key or header. See the API authentication setup guide for the correct header format and environment variable configuration.

Is there a way to test error handling without causing real errors? The Anthropic SDK supports a test mode using the httpx mock client. Alternatively, wrap your API client in an adapter interface and inject a mock that raises specific error types in your test suite.
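A sketch of the adapter approach, using unittest.mock to inject a failure on the first call (the adapter class and exception name are hypothetical):

```python
from unittest import mock

class ClaudeClientAdapter:
    """Thin wrapper around the SDK client so tests can inject failures."""
    def __init__(self, client):
        self._client = client

    def create_message(self, **kwargs):
        return self._client.messages.create(**kwargs)

class FakeRateLimit(Exception):
    """Stand-in for anthropic.RateLimitError in tests."""

# Mock client: first call raises, second call returns a canned response
fake = mock.Mock()
fake.messages.create.side_effect = [FakeRateLimit("429"), {"role": "assistant"}]
adapter = ClaudeClientAdapter(fake)

try:
    adapter.create_message(model="claude-sonnet-4-6", messages=[])
except FakeRateLimit:
    pass  # first call: simulated rate limit

result = adapter.create_message(model="claude-sonnet-4-6", messages=[])
print(result)  # second call succeeds with the canned response
```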


Take It Further

Claude Agent SDK Cookbook: 40 Production Patterns — Pattern 12: Retry Logic + Circuit Breaker is covered in depth with production-tested code. Includes patterns for per-step error handling in multi-agent pipelines.

→ Get the Agent SDK Cookbook — $49

30-day money-back guarantee. Instant download.

AI Disclosure: Drafted with Claude Code; error codes from official Anthropic API documentation as of April 2026.
