← All guides

Claude API Error Handling: Rate Limits, Retries, Patterns

Every Claude API error code explained with production retry strategies, exponential backoff implementation, and circuit breaker patterns.


The Anthropic API returns structured errors with specific HTTP status codes. Knowing which errors to retry, which to log and surface to users, and which indicate bugs in your code is the difference between a production-ready integration and one that silently fails. For general Claude API concepts, see the Claude Agent SDK Guide in 2026.

Error code reference

Each row links to a dedicated troubleshooting page with Python + TypeScript code examples (Korean):

| HTTP Status | Error type | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request_error | Malformed request: bad JSON, unsupported parameters, exceeded context window | Fix the request; do not retry |
| 401 | authentication_error | Invalid API key | Check key validity; do not retry |
| 403 | permission_error | Valid key but insufficient permissions (e.g. model not enabled) | Check account permissions; do not retry |
| 404 | not_found_error | Endpoint or model doesn't exist | Fix model name or endpoint; do not retry |
| 413 | request_too_large | Request body exceeds 32MB limit | Use Files API for large attachments |
| 422 | unprocessable_entity | Request valid but semantically wrong (e.g. invalid tool schema) | Fix the schema; do not retry |
| 429 | rate_limit_error | Too many requests or tokens per minute | Retry with exponential backoff |
| 500 | api_error | Internal server error | Retry with backoff, max 3 attempts |
| 529 | overloaded_error | API overloaded | Retry with longer backoff |

Additional HTTP status codes

| Status | Type | Quick fix |
|---|---|---|
| 502 | bad_gateway | Retry [3, 10, 30, 60, 120s] |
| 503 | service_unavailable | Check status.anthropic.com + backoff |
| 504 | gateway_timeout | Switch to streaming for long outputs |


The critical distinction: 4xx errors (except 429) indicate a problem with your request and should not be retried. 429 and 5xx errors are transient and should be retried. To reduce 400-class errors from oversized contexts, see Claude 1M Context Window for truncation and caching strategies.
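The retry rule above can be condensed into a single predicate. This is a minimal sketch that assumes only the status codes discussed in this guide; `is_retryable` is an illustrative name, not part of the Anthropic SDK.

```python
def is_retryable(status_code: int) -> bool:
    """Return True for 429 and 5xx responses, False for every other status."""
    return status_code == 429 or 500 <= status_code <= 599


# 429 and 5xx (including 529) are treated as transient
print(is_retryable(429), is_retryable(529))
# Other 4xx errors point at the request itself and are not retried
print(is_retryable(400), is_retryable(404))
```

Keeping the rule in one function means the retry loop, the circuit breaker, and the logging code can all share the same definition of "transient".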


Rate limit errors (429)

The most common production error. Rate limits are enforced on requests per minute and on tokens per minute, with input and output tokens counted separately.

The Retry-After header in the 429 response tells you exactly how many seconds to wait.
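Reading the header defensively avoids a crash when the value is missing or not a plain number. The sketch below assumes the value normally arrives as a number of seconds; the HTTP-date branch is a precaution based on the general HTTP definition of Retry-After, not on observed Anthropic behavior. `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a non-negative wait in seconds.

    Returns 0.0 when the header is missing or cannot be parsed.
    """
    if not value:
        return 0.0
    try:
        return max(0.0, float(value))  # delay-seconds form, e.g. "12"
    except ValueError:
        pass
    try:
        target = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return 0.0
    if target.tzinfo is None:
        # Treat a date without zone information as UTC
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (target - now).total_seconds())
```

The result can be passed straight into `max(retry_after, base_delay * 2 ** attempt)` so that a missing header falls back to plain exponential backoff.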

Python:

import anthropic
import time

client = anthropic.Anthropic()

def call_with_retry(
    messages: list,
    model: str = "claude-sonnet-4-6",
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> anthropic.types.Message:
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=2048,
                messages=messages,
            )
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect the Retry-After header (in seconds) if present
            retry_after = float(e.response.headers.get("retry-after") or 0)
            wait = max(retry_after, base_delay * (2 ** attempt))
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500 and attempt < max_retries - 1:
                # 5xx: transient server error, retry
                wait = base_delay * (2 ** attempt)
                print(f"Server error {e.status_code}. Waiting {wait:.1f}s")
                time.sleep(wait)
            else:
                raise  # 4xx or final attempt: re-raise
    raise RuntimeError("Should not reach here")

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(
  messages: Anthropic.Messages.MessageParam[],
  model = "claude-sonnet-4-6",
  maxRetries = 5,
  baseDelay = 1000
): Promise<Anthropic.Messages.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create({
        model,
        max_tokens: 2048,
        messages,
      });
    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        if (attempt === maxRetries - 1) throw err;
        const retryAfter = parseInt(err.headers?.["retry-after"] ?? "0") * 1000;
        const wait = Math.max(retryAfter, baseDelay * Math.pow(2, attempt));
        console.log(`Rate limited. Waiting ${wait}ms (attempt ${attempt + 1}/${maxRetries})`);
        await new Promise((r) => setTimeout(r, wait));
        continue;
      }
      if (err instanceof Anthropic.APIError && err.status >= 500) {
        if (attempt === maxRetries - 1) throw err;
        const wait = baseDelay * Math.pow(2, attempt);
        console.log(`Server error ${err.status}. Waiting ${wait}ms`);
        await new Promise((r) => setTimeout(r, wait));
        continue;
      }
      throw err; // 4xx: do not retry
    }
  }
  throw new Error("Max retries exceeded");
}
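Both functions above wait a deterministic `base_delay * 2^attempt`, so many clients that were rate limited at the same moment will also retry at the same moment. A common refinement, not part of the original examples, is "full jitter": pick a random wait between zero and the exponential ceiling. The sketch below is one way to do it; `backoff_with_jitter` and its `max_delay` cap are assumptions for illustration.

```python
import random


def backoff_with_jitter(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_after: float = 0.0,
    rng=random,
) -> float:
    """Full-jitter exponential backoff that never waits less than retry_after."""
    # Exponential ceiling, capped so late attempts do not wait unboundedly long
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    # Random point in [0, ceiling], raised to the server-provided minimum if any
    return max(retry_after, rng.uniform(0.0, ceiling))


for attempt in range(4):
    print(f"attempt {attempt}: wait {backoff_with_jitter(attempt):.2f}s")
```

The `rng` parameter accepts a seeded `random.Random` instance, which keeps tests reproducible.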

Context window exceeded (400)

When your input exceeds the model's context window, you get a 400 error:

Error: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"prompt is too long: 205432 tokens > 200000 maximum"}}

Resolution strategies:

  1. Truncate early messages: for conversations, remove the oldest turns first
  2. Summarize then truncate: use Haiku to summarize the oldest portion, replace with summary
  3. Retrieval instead of full context: use pgvector to retrieve relevant chunks instead of full document
  4. Upgrade to 1M context window: for Sonnet 4.6 or Opus 4.7, request 1M context access

Python (truncate to fit):

def truncate_to_fit(
    messages: list[dict],
    system_prompt: str,
    model: str,
    max_tokens: int = 180_000,  # Leave headroom below 200K
) -> list[dict]:
    """Remove oldest messages until content fits in context window."""
    # Stop at two messages so the list is never emptied and still starts on a user turn
    while len(messages) > 2:
        # Count tokens for the current conversation
        response = client.messages.count_tokens(
            model=model,
            system=system_prompt,
            messages=messages,
        )
        if response.input_tokens <= max_tokens:
            break
        # Remove the oldest exchange (user + assistant pair)
        messages = messages[2:]
    return messages
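Strategy 2 from the list above (summarize, then truncate) can be sketched without committing to a particular summarizer. In the example below, `summarize` is a hypothetical callable that would wrap a call to a small model such as Haiku; `compact_history` and `keep_last` are also illustrative names. The sketch assumes messages are plain dicts with `role` and `content` keys.

```python
from typing import Callable


def compact_history(
    messages: list[dict],
    keep_last: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Replace everything except the last keep_last messages with a summary."""
    if keep_last <= 0 or len(messages) <= keep_last:
        return list(messages)
    old, recent = messages[:-keep_last], messages[-keep_last:]
    note = f"Summary of the earlier conversation:\n{summarize(old)}"
    first = recent[0]
    if first["role"] == "user" and isinstance(first["content"], str):
        # Merge into the first kept user turn so roles keep alternating
        merged = {"role": "user", "content": f"{note}\n\n{first['content']}"}
        return [merged] + recent[1:]
    # Otherwise the summary becomes its own user turn ahead of the kept messages
    return [{"role": "user", "content": note}] + recent


history = [
    {"role": "user", "content": "a"},
    {"role": "assistant", "content": "b"},
    {"role": "user", "content": "c"},
]
# A stand-in summarizer; a real one would call the API
print(compact_history(history, 1, lambda old: f"{len(old)} earlier messages"))
```

Compared with plain truncation, this keeps a trace of the dropped turns at the cost of one extra API call per compaction.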

Streaming errors

Streaming responses can fail mid-stream. Handle both initial connection errors and mid-stream errors:

import httpx

def stream_with_recovery(prompt: str) -> str:
    collected = []
    try:
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                collected.append(text)
                print(text, end="", flush=True)
        return "".join(collected)
    except anthropic.APIConnectionError as e:
        # Network error mid-stream
        partial = "".join(collected)
        if partial:
            # Re-prompt asking Claude to continue from where it stopped
            print(f"\n[Reconnecting after {len(partial)} chars...]")
            continuation = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                messages=[
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": partial},
                    {"role": "user", "content": "Continue from exactly where you left off."},
                ],
            )
            return partial + continuation.content[0].text
        raise  # No partial content: re-raise

Tool use errors

When a tool raises an error, return the error in the tool result rather than raising in your code. This lets the model reason about the error and retry differently:

def safe_tool_call(tool_use_id: str, tool_name: str, tool_input: dict) -> dict:
    """Always return a tool_result, even on error."""
    try:
        # dispatch_tool is your application's own tool router
        result = dispatch_tool(tool_name, tool_input)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }
    except Exception as e:
        # Return the error as content so the model can retry with different params
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Error: {type(e).__name__}: {e}",
            "is_error": True,
        }

Why this matters: if you raise an exception instead of returning an error tool result, the conversation is broken. The tool_use block exists in the assistant message without a matching tool_result, which is a malformed conversation.
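To guarantee that every tool_use block gets a matching tool_result, the follow-up user message can be built in one pass over the assistant turn. The sketch below assumes the assistant content has already been converted to plain dicts (the SDK returns typed objects), and `handlers` is a hypothetical mapping from tool name to a Python function; neither name comes from the SDK.

```python
from typing import Any, Callable


def build_tool_results(
    assistant_content: list[dict],
    handlers: dict[str, Callable[..., Any]],
) -> dict:
    """Build the user message that answers every tool_use block in one assistant turn."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue  # Text blocks need no reply
        result = {"type": "tool_result", "tool_use_id": block["id"]}
        try:
            output = handlers[block["name"]](**block["input"])
            result["content"] = str(output)
        except Exception as e:  # Includes KeyError for an unknown tool name
            result["content"] = f"Error: {type(e).__name__}: {e}"
            result["is_error"] = True
        results.append(result)
    return {"role": "user", "content": results}


turn = [
    {"type": "text", "text": "Let me add those."},
    {"type": "tool_use", "id": "toolu_1", "name": "add", "input": {"a": 1, "b": 2}},
]
print(build_tool_results(turn, {"add": lambda a, b: a + b}))
```

Because failures are captured per block, one failing tool does not prevent the results of the other tools in the same turn from being returned.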


The circuit breaker pattern

For high-volume production systems, wrap your Claude calls with a circuit breaker. After N consecutive failures, stop hitting the API for a cooldown period:

import time
from dataclasses import dataclass
from enum import Enum

import anthropic

class CircuitState(Enum):
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Failing: reject calls
    HALF_OPEN = "half_open"  # Testing recovery

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 60.0  # seconds
    state: CircuitState = CircuitState.CLOSED
    failure_count: int = 0
    last_failure_time: float = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit OPEN: Claude API calls suspended")

        try:
            result = fn(*args, **kwargs)
            self._on_success()
            return result
        except anthropic.APIStatusError as e:
            # Only 5xx responses (including 529) count toward opening the circuit;
            # 429 and other 4xx errors propagate without being counted
            if e.status_code >= 500:
                self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            print(f"Circuit OPEN after {self.failure_count} failures")

Logging and observability

Log every API call with enough context to debug failures later:

import logging
import time

logger = logging.getLogger("claude_api")

def logged_call(messages: list, model: str = "claude-sonnet-4-6") -> anthropic.types.Message:
    start = time.time()
    try:
        response = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=messages,
        )
        duration_ms = (time.time() - start) * 1000
        logger.info(
            "claude_api.success",
            extra={
                "model": model,
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "duration_ms": round(duration_ms),
                "stop_reason": response.stop_reason,
            },
        )
        return response
    except anthropic.APIStatusError as e:
        duration_ms = (time.time() - start) * 1000
        logger.error(
            "claude_api.error",
            extra={
                "model": model,
                "status_code": e.status_code,
                "error_type": type(e).__name__,
                "duration_ms": round(duration_ms),
            },
        )
        raise

FAQ

Should I retry on 400 errors? No. A 400 means your request is malformed. Retrying will get the same 400. Fix the request before retrying.

What is the default retry behavior in the SDK? The Anthropic Python and TypeScript SDKs retry 429 and 5xx errors automatically with exponential backoff, with 2 retries by default. Configure via max_retries=N in the client constructor.

How do I disable automatic retries?

client = anthropic.Anthropic(max_retries=0)

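What happens to in-flight streaming requests when I'm rate limited? A 429 is returned when a request is submitted, before any content is streamed, so handle anthropic.RateLimitError around the call that opens the stream. A stream that has already started can still fail partway through (for example, on a dropped connection); for that case, use the partial-continuation pattern shown above.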

How do I test error handling in development? Use an HTTP mocking library such as respx (Python; the SDK is built on httpx) or nock (Node.js) to mock specific HTTP responses from the Anthropic endpoint.
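Retry logic can also be exercised with no HTTP layer at all by injecting both the call and the sleep function. The sketch below uses only the standard library; `FakeStatusError`, `retry`, and `make_flaky` are hypothetical stand-ins written for this example, not SDK classes or functions.

```python
class FakeStatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry(call, max_retries=3, base_delay=1.0, sleep=lambda seconds: None):
    """Retry call() on 429/5xx with exponential backoff; re-raise anything else."""
    for attempt in range(max_retries):
        try:
            return call()
        except FakeStatusError as e:
            transient = e.status_code == 429 or e.status_code >= 500
            if not transient or attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_retries must be at least 1")


def make_flaky(statuses):
    """Return a callable that raises each status in turn, then succeeds."""
    remaining = list(statuses)

    def call():
        if remaining:
            raise FakeStatusError(remaining.pop(0))
        return "ok"

    return call


waits = []
assert retry(make_flaky([429, 529]), sleep=waits.append) == "ok"
assert waits == [1.0, 2.0]  # Two backoff sleeps before the third attempt succeeds
```

Recording the sleeps instead of performing them keeps the test instant while still checking the backoff schedule.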


Frequently Asked Questions

What HTTP status codes should I retry when calling the Claude API?

Retry on 429 (rate limit), 500 (internal server error), 502 (bad gateway), 503 (service unavailable), and 529 (overloaded). Always use exponential backoff and respect the Retry-After header on 429 responses. Never retry 4xx errors other than 429; they indicate a problem with your request that will not resolve on its own.

How do I implement exponential backoff for Claude API rate limit errors?

Catch anthropic.RateLimitError, read the Retry-After header from the response, and wait max(retry_after, base_delay * 2^attempt) seconds before retrying. The Anthropic Python and TypeScript SDKs automatically retry 429 and 5xx errors with 2 retries by default; configure with max_retries=N in the client constructor.

What causes a Claude API 400 error and how do I fix it?

A 400 (invalid_request_error) means your request is malformed. The most common causes are exceeding the model's context window, invalid JSON in the request body, or an unsupported parameter. Check error.message for the specific reason. Context window overflows are fixed by truncating earlier messages or upgrading to a model with a larger window.

What happens when a Claude API tool call fails mid-conversation?

Return the error as a tool_result with "is_error": true rather than raising an exception. If you raise instead, the conversation becomes malformed: the tool_use block in the assistant message has no matching tool_result. Returning the error lets Claude reason about it and attempt a different approach.



AI Disclosure: Drafted with Claude Code; verified against Anthropic API documentation April 2026.
