# Claude API Error Handling: Rate Limits, Retries, and Production Patterns
The Anthropic API returns structured errors with specific HTTP status codes. Knowing which errors to retry, which to log and surface to users, and which indicate bugs in your code is the difference between a production-ready integration and one that silently fails.
## Error code reference
| HTTP status | Error type | Meaning | Action |
|---|---|---|---|
| 400 | `invalid_request_error` | Malformed request (bad JSON, unsupported parameters, exceeded context window) | Fix the request; do not retry |
| 401 | `authentication_error` | Invalid API key | Check key validity; do not retry |
| 403 | `permission_error` | Valid key but insufficient permissions (e.g. model not enabled) | Check account permissions; do not retry |
| 404 | `not_found_error` | Endpoint or model doesn't exist | Fix model name or endpoint; do not retry |
| 413 | `request_too_large` | Request body exceeds size limit | Reduce context or split the request |
| 422 | `unprocessable_entity` | Request valid but semantically wrong (e.g. invalid tool schema) | Fix the schema; do not retry |
| 429 | `rate_limit_error` | Too many requests or tokens per minute | Retry with exponential backoff |
| 500 | `api_error` | Internal server error | Retry with backoff, max 3 attempts |
| 529 | `overloaded_error` | API overloaded | Retry with longer backoff |
The critical distinction: 4xx errors (except 429) indicate a problem with your request and should not be retried. 429 and 5xx errors are transient and should be retried.
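That dispatch rule is mechanical enough to codify. A minimal sketch (the `RetryAction` enum and `classify_status` helper are illustrative names, not part of the SDK):

```python
from enum import Enum

class RetryAction(Enum):
    NO_RETRY = "no_retry"              # Problem with the request; fix it
    RETRY_BACKOFF = "retry"            # Transient; exponential backoff
    RETRY_LONG_BACKOFF = "retry_long"  # Overloaded; back off longer

def classify_status(status: int) -> RetryAction:
    """Map an HTTP status from the API to a retry decision."""
    if status == 429:
        return RetryAction.RETRY_BACKOFF
    if status == 529:
        return RetryAction.RETRY_LONG_BACKOFF
    if status >= 500:
        return RetryAction.RETRY_BACKOFF
    return RetryAction.NO_RETRY  # All other 4xx: fix the request
```

Centralizing the decision in one function keeps retry policy consistent across every call site.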
## Rate limit errors (429)
The most common production error. Rate limits are enforced on:
- Requests per minute (RPM): number of API calls
- Input tokens per minute (ITPM): total input tokens
- Output tokens per minute (OTPM): total output tokens
When present, the Retry-After header in the 429 response tells you how many seconds to wait before retrying.
Python:

```python
import time

import anthropic

client = anthropic.Anthropic()

def call_with_retry(
    messages: list,
    model: str = "claude-sonnet-4-6",
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> anthropic.types.Message:
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=2048,
                messages=messages,
            )
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect the Retry-After header if present
            retry_after = 0.0
            response = getattr(e, "response", None)
            if response is not None:
                retry_after = float(response.headers.get("Retry-After", 0))
            wait = max(retry_after, base_delay * (2 ** attempt))
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500 and attempt < max_retries - 1:
                # 5xx: transient server error, retry
                wait = base_delay * (2 ** attempt)
                print(f"Server error {e.status_code}. Waiting {wait:.1f}s")
                time.sleep(wait)
            else:
                raise  # 4xx or final attempt: re-raise
    raise RuntimeError("unreachable: the loop always returns or raises")
```
TypeScript:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(
  messages: Anthropic.Messages.MessageParam[],
  model = "claude-sonnet-4-6",
  maxRetries = 5,
  baseDelay = 1000
): Promise<Anthropic.Messages.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create({
        model,
        max_tokens: 2048,
        messages,
      });
    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        if (attempt === maxRetries - 1) throw err;
        const retryAfter = parseInt(err.headers?.["retry-after"] ?? "0", 10) * 1000;
        const wait = Math.max(retryAfter, baseDelay * Math.pow(2, attempt));
        console.log(`Rate limited. Waiting ${wait}ms (attempt ${attempt + 1}/${maxRetries})`);
        await new Promise((r) => setTimeout(r, wait));
        continue;
      }
      if (err instanceof Anthropic.APIError && (err.status ?? 0) >= 500) {
        if (attempt === maxRetries - 1) throw err;
        const wait = baseDelay * Math.pow(2, attempt);
        console.log(`Server error ${err.status}. Waiting ${wait}ms`);
        await new Promise((r) => setTimeout(r, wait));
        continue;
      }
      throw err; // 4xx: do not retry
    }
  }
  throw new Error("Max retries exceeded");
}
```
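Both loops use a deterministic exponential schedule, so a fleet of workers rate-limited at the same instant will all retry at the same instant and collide again. "Full jitter" randomizes each wait. A minimal sketch (`backoff_with_jitter` is a hypothetical helper; it would replace the `base_delay * 2 ** attempt` term above, still taking the max with Retry-After):

```python
import random

def backoff_with_jitter(attempt: int, base_delay: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: uniform sample from [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base_delay * (2 ** attempt)))
```

The cap prevents later attempts from sleeping for unboundedly long windows.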
## Context window exceeded (400)
When your input exceeds the model's context window, you get a 400 error:
```
Error: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"prompt is too long: 205432 tokens > 200000 maximum"}}
```
Resolution strategies:
- Truncate early messages: for conversations, remove the oldest turns first
- Summarize then truncate: use Haiku to summarize the oldest portion, replace with summary
- Retrieval instead of full context: use pgvector to retrieve relevant chunks instead of full document
- Upgrade to 1M context window: for Sonnet 4.6 or Opus 4.7, request 1M context access
Python — truncate to fit:

```python
def truncate_to_fit(
    messages: list[dict],
    system_prompt: str,
    model: str,
    max_tokens: int = 180_000,  # Leave headroom below the 200K window
) -> list[dict]:
    """Remove the oldest messages until the conversation fits in the context window."""
    while len(messages) > 1:
        # Count tokens server-side before deciding whether to trim
        response = client.messages.count_tokens(
            model=model,
            system=system_prompt,
            messages=messages,
        )
        if response.input_tokens <= max_tokens:
            break
        # Drop the oldest exchange (user + assistant pair)
        messages = messages[2:]
    return messages
```
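Each loop iteration above costs an API round trip. When that matters, a character-based heuristic can pre-trim before the authoritative `count_tokens` check. This is a sketch under an assumption: roughly 4 characters per token for English text, so keep generous headroom.

```python
def truncate_by_chars(messages: list[dict], max_chars: int = 600_000) -> list[dict]:
    """Drop the oldest user/assistant pairs until total content size fits.

    Heuristic only: ~4 chars/token for English, so 600K chars is roughly
    150K tokens. Follow up with a real count_tokens call before sending.
    """
    def total_chars(msgs: list[dict]) -> int:
        return sum(len(str(m.get("content", ""))) for m in msgs)

    while len(messages) > 2 and total_chars(messages) > max_chars:
        messages = messages[2:]  # oldest user + assistant pair
    return messages
```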
## Streaming errors
Streaming responses can fail mid-stream. Handle both initial connection errors and mid-stream errors:
```python
def stream_with_recovery(prompt: str) -> str:
    collected = []
    try:
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                collected.append(text)
                print(text, end="", flush=True)
        return "".join(collected)
    except anthropic.APIConnectionError:
        # Network error mid-stream
        partial = "".join(collected)
        if partial:
            # Re-prompt, asking Claude to continue from where it stopped
            print(f"\n[Reconnecting after {len(partial)} chars...]")
            continuation = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                messages=[
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": partial},
                    {"role": "user", "content": "Continue from exactly where you left off."},
                ],
            )
            return partial + continuation.content[0].text
        raise  # No partial content; re-raise
```
## Tool use errors
When a tool raises an error, return the error in the tool result rather than raising in your code. This lets the model reason about the error and retry differently:
```python
def safe_tool_call(tool_name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Always return a tool_result block, even on error."""
    try:
        result = dispatch_tool(tool_name, tool_input)  # your tool dispatcher
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }
    except Exception as e:
        # Return the error as content; the model can retry with different params
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Error: {type(e).__name__}: {e}",
            "is_error": True,
        }
```
Why this matters: if you raise an exception instead of returning an error tool result, the conversation is broken — the tool_use block exists in the assistant message without a matching tool_result, which is a malformed conversation.
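Concretely, the invariant is that every `tool_use` block in an assistant message is answered by a `tool_result` with the same id in the very next user message. The shape, with a made-up id and a hypothetical `get_weather` tool:

```python
# Assistant turn containing a tool call
assistant_msg = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01ABC", "name": "get_weather",
         "input": {"city": "Paris"}},
    ],
}

# The next user turn MUST answer that id, even if the tool failed
user_msg = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01ABC",
         "content": "Error: TimeoutError: upstream timed out", "is_error": True},
    ],
}

# Invariant check: every tool_use id has a matching tool_result
tool_use_ids = {b["id"] for b in assistant_msg["content"] if b["type"] == "tool_use"}
result_ids = {b["tool_use_id"] for b in user_msg["content"] if b["type"] == "tool_result"}
assert tool_use_ids == result_ids
```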
## The circuit breaker pattern
For high-volume production systems, wrap your Claude calls with a circuit breaker. After N consecutive failures, stop hitting the API for a cooldown period:
```python
import time
from dataclasses import dataclass
from enum import Enum

import anthropic

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing; reject calls
    HALF_OPEN = "half_open"  # Testing recovery

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 60.0  # seconds
    state: CircuitState = CircuitState.CLOSED
    failure_count: int = 0
    last_failure_time: float = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise RuntimeError("Circuit OPEN; Claude API calls suspended")
        try:
            result = fn(*args, **kwargs)
            self._on_success()
            return result
        except (anthropic.RateLimitError, anthropic.APIStatusError) as e:
            # Only 5xx server errors count toward tripping the breaker;
            # a 429 is handled by per-call backoff instead
            if getattr(e, "status_code", 0) >= 500:
                self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            print(f"Circuit OPEN after {self.failure_count} failures")
```
## Logging and observability
Log every API call with enough context to debug failures later:
```python
import logging
import time

logger = logging.getLogger("claude_api")

def logged_call(messages: list, model: str = "claude-sonnet-4-6") -> anthropic.types.Message:
    start = time.time()
    try:
        response = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=messages,
        )
        duration_ms = (time.time() - start) * 1000
        logger.info(
            "claude_api.success",
            extra={
                "model": model,
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "duration_ms": round(duration_ms),
                "stop_reason": response.stop_reason,
            },
        )
        return response
    except anthropic.APIStatusError as e:
        duration_ms = (time.time() - start) * 1000
        logger.error(
            "claude_api.error",
            extra={
                "model": model,
                "status_code": e.status_code,
                "error_type": type(e).__name__,
                "duration_ms": round(duration_ms),
            },
        )
        raise
```
## FAQ
**Should I retry on 400 errors?** No. A 400 means your request is malformed; retrying will return the same 400. Fix the request first.
**What is the default retry behavior in the SDK?** The Anthropic Python and TypeScript SDKs retry 429 and 5xx errors automatically with exponential backoff (2 retries by default). Configure via `max_retries=N` in the client constructor.
**How do I disable automatic retries?**

```python
client = anthropic.Anthropic(max_retries=0)
```
**What happens to in-flight streaming requests when I'm rate limited?** A 429 during a stream interrupts it. Handle `anthropic.RateLimitError` in your streaming code and implement the partial-continuation pattern shown above.
**How do I test error handling in development?** Use httpretty (Python) or nock (Node.js) to mock specific HTTP responses from the Anthropic endpoint.
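If you'd rather avoid a mocking library, another approach is a throwaway local HTTP server that always answers 429, with the SDK pointed at it via its `base_url` option. The sketch below uses plain `urllib` in place of the SDK client to keep it self-contained; the handler shape and `Retry-After` value are illustrative.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Always429(BaseHTTPRequestHandler):
    """Answer every POST with a simulated rate-limit error."""
    def do_POST(self):
        body = json.dumps({
            "type": "error",
            "error": {"type": "rate_limit_error", "message": "simulated"},
        }).encode()
        self.send_response(429)
        self.send_header("Retry-After", "2")
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Always429)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/v1/messages"

status, retry_after = None, None
try:
    urllib.request.urlopen(urllib.request.Request(url, data=b"{}", method="POST"))
except urllib.error.HTTPError as e:
    status, retry_after = e.code, e.headers["Retry-After"]
finally:
    server.shutdown()

print(status, retry_after)  # 429 2
```

Binding to port 0 lets the OS pick a free port, so tests can run in parallel.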
## Sources
- Anthropic API error codes — April 2026
- Anthropic Python SDK — error handling — April 2026
- Anthropic rate limits — April 2026