
Claude Batch API and Webhooks: Async Processing for High-Volume Workloads

Claude's Batch API offers 50% cost savings for latency-insensitive workloads. Learn batch submission, polling, webhook-style callbacks, and a real-world use case: processing 1,000 support tickets overnight.


Claude's Batch API processes up to 10,000 requests asynchronously at a 50% discount on both input and output tokens. Submit a batch, poll for completion, and retrieve results — ideal for classification, extraction, summarization, or any workload where waiting a few hours beats paying full price. For cost calculations that include prompt caching on top of batch discounts, see Claude API Cost and Prompt Caching Break-Even.


When to Use Async vs Sync

Use the synchronous API (standard /messages) when:

  - A user is waiting on the response (chat, interactive tools)
  - You need streaming output
  - Each request depends on the previous response

Use the Batch API when:

  - Results can wait up to 24 hours
  - You are processing many independent requests (classification, extraction, summarization)
  - Cost matters more than latency

The 50% cost reduction is not a small optimization — it halves your Claude API budget for eligible workloads. Any job that can wait up to 24 hours should be evaluated for batch processing.
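
As a rule of thumb, the decision can be encoded directly. The 24-hour cutoff below mirrors the batch processing window; the thresholds themselves are illustrative, not official guidance:

```python
def choose_api(latency_budget_hours: float, request_count: int) -> str:
    """Pick an API mode. Thresholds are illustrative, not official guidance."""
    # Batch may take up to 24 h; only commit to it when the job can wait.
    if latency_budget_hours >= 24 and request_count > 1:
        return "batch"  # 50% discount on input and output tokens
    return "sync"       # interactive, streaming, or tight-deadline work

assert choose_api(latency_budget_hours=48, request_count=10_000) == "batch"
assert choose_api(latency_budget_hours=0.01, request_count=1) == "sync"
```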


How the Claude Batch API Works

The Batch API flow has three stages:

  1. Submit: Send a batch of up to 10,000 requests in one API call. Receive a batch_id.
  2. Poll: Check the batch status periodically until processing_status is ended.
  3. Retrieve: Download results — one result per original request, keyed by your custom_id.

Batches are processed within 24 hours. Most complete much faster, typically 1–4 hours depending on batch size and current load.


Submitting a Batch

import anthropic

client = anthropic.Anthropic()

tickets = [...]  # your list of ticket text strings

# Prepare batch requests
requests = [
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": f"Classify this support ticket as: billing, technical, general, or spam.\nTicket: {ticket_text}"
                }
            ]
        }
    }
    for i, ticket_text in enumerate(tickets)
]

# Submit the batch
batch = client.messages.batches.create(requests=requests)

print(f"Batch ID: {batch.id}")
print(f"Status:   {batch.processing_status}")
# processing_status will be "in_progress" immediately after submission

Batch limits:

  - Maximum 10,000 requests per batch
  - Each batch must finish within the 24-hour processing window
  - Results remain available for retrieval for a limited period after completion (check the current docs for the exact retention window)

For workloads larger than 10,000 requests, split into multiple batches and submit them sequentially or in parallel.
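
A minimal sketch of that splitting step (the submission loop at the end is commented out because it calls the live API):

```python
def chunk_requests(requests, max_batch_size=10_000):
    """Split a large request list into Batch-API-sized chunks."""
    return [
        requests[i:i + max_batch_size]
        for i in range(0, len(requests), max_batch_size)
    ]

# 25,000 requests split into three batches: 10,000 + 10,000 + 5,000
chunks = chunk_requests(list(range(25_000)))
assert [len(c) for c in chunks] == [10_000, 10_000, 5_000]

# Submit each chunk as its own batch (client as in the example above):
# batch_ids = [client.messages.batches.create(requests=c).id for c in chunks]
```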


Polling for Completion

Claude does not push a webhook natively — you poll the batch status endpoint until processing completes.

import time

def wait_for_batch(client, batch_id, poll_interval_seconds=60):
    """Poll until batch completes. Returns the completed batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        
        status = batch.processing_status
        counts = batch.request_counts
        
        print(
            f"Status: {status} | "
            f"Processing: {counts.processing} | "
            f"Succeeded: {counts.succeeded} | "
            f"Errored: {counts.errored}"
        )
        
        if status == "ended":
            return batch
        
        time.sleep(poll_interval_seconds)

completed_batch = wait_for_batch(client, batch.id, poll_interval_seconds=120)

Polling best practices:

  - Poll every 1–5 minutes; status rarely changes second to second, and aggressive polling wastes API calls
  - Lengthen the interval for large batches that will run for hours
  - Persist the batch_id so polling can resume if your process restarts
  - Log request_counts on each poll so you can track progress
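
For long-running batches, a fixed interval can be replaced with a capped, gradually increasing schedule. A sketch, with interval choices that are assumptions rather than official guidance (plug it into wait_for_batch via time.sleep(next(schedule))):

```python
import itertools

def poll_schedule(base_seconds=30, factor=1.5, cap_seconds=600):
    """Yield successive poll intervals: 30s, 45s, 67.5s, ... capped at 10 min."""
    interval = base_seconds
    while True:
        yield min(interval, cap_seconds)
        interval *= factor

# Inspect the first few intervals of the schedule:
intervals = list(itertools.islice(poll_schedule(), 5))
assert intervals[0] == 30
assert intervals == sorted(intervals)      # non-decreasing
assert all(i <= 600 for i in intervals)    # never exceeds the cap
```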


Retrieving Completed Results

When processing_status == "ended", retrieve all results:

results = {}
errors = {}

for result in client.messages.batches.results(completed_batch.id):
    custom_id = result.custom_id
    
    if result.result.type == "succeeded":
        # Extract the text response
        message = result.result.message
        text = message.content[0].text
        results[custom_id] = text
        
    elif result.result.type == "errored":
        error = result.result.error
        errors[custom_id] = {
            "type": error.type,
            "message": error.message
        }
    
    elif result.result.type == "expired":
        # Request expired before processing (rare, batch exceeded 24h)
        errors[custom_id] = {"type": "expired"}

print(f"Succeeded: {len(results)}")
print(f"Failed:    {len(errors)}")

Results are returned in streaming JSONL format. The SDK handles this transparently — iterate over batches.results() and each iteration gives you one result.

Result format per item:

  - custom_id: the identifier you supplied at submission
  - result.type: one of succeeded, errored, canceled, or expired
  - result.message: the full Message object (succeeded results only)
  - result.error: error type and message (errored results only)


Error Handling for Partial Failures

Batches do not fail atomically. A batch with 1,000 requests may have 995 successes and 5 errors. Always handle partial failures:

# Collect failed request IDs
failed_ids = list(errors)

if failed_ids:
    print(f"Retrying {len(failed_ids)} failed requests...")
    
    # Rebuild retry requests from your original data
    # (build_request and original_data are your own bookkeeping from submission)
    retry_requests = [
        build_request(ticket_id=fid, ticket_text=original_data[fid])
        for fid in failed_ids
    ]
    
    retry_batch = client.messages.batches.create(requests=retry_requests)
    retry_completed = wait_for_batch(client, retry_batch.id)
    retry_results = retrieve_results(client, retry_completed.id)
    
    # Merge retry results into main results
    results.update(retry_results)

Common error types:

  - invalid_request_error: malformed params (bad model ID, missing fields); retrying unchanged will fail again
  - api_error: transient server-side failure; safe to retry
  - overloaded_error: capacity pressure; retry, ideally in a later batch
  - expired: the request was not processed within the 24-hour window; resubmit it
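
Distinguishing transient from permanent failures lets you retry only what can succeed. A sketch, assuming the error type names above (confirm the exact strings against the current docs):

```python
# Assumption: invalid requests will fail again unchanged; transient
# server-side errors and expirations are worth resubmitting.
RETRYABLE = {"api_error", "overloaded_error", "expired"}

def partition_failures(errors: dict) -> tuple[list, list]:
    """Split failed custom_ids into (retryable, permanent)."""
    retry, permanent = [], []
    for custom_id, err in errors.items():
        (retry if err["type"] in RETRYABLE else permanent).append(custom_id)
    return retry, permanent

retry, permanent = partition_failures({
    "ticket-1": {"type": "overloaded_error"},
    "ticket-2": {"type": "invalid_request_error"},
})
assert retry == ["ticket-1"] and permanent == ["ticket-2"]
```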


Building a Webhook-Style Notification System

Claude Batch API does not support native webhooks (push notifications to your endpoint when a batch completes). You can build this yourself with a lightweight polling service.

Pattern: Polling loop with callback

import threading
import requests as http_requests

def batch_watcher(client, batch_id, callback_url, poll_interval=120):
    """
    Runs in a background thread.
    Polls until batch completes, then POSTs results to callback_url.
    """
    batch = wait_for_batch(client, batch_id, poll_interval)
    results = retrieve_all_results(client, batch.id)
    
    # Notify your webhook endpoint
    http_requests.post(callback_url, json={
        "batch_id": batch_id,
        "status":   "completed",
        "counts":   {
            "succeeded": batch.request_counts.succeeded,
            "errored":   batch.request_counts.errored,
        },
        "results_available": True
    })

# Start the watcher in a background thread
watcher = threading.Thread(
    target=batch_watcher,
    args=(client, batch.id, "https://yourapp.com/webhooks/claude-batch"),
    daemon=True
)
watcher.start()

# Your main process can continue or exit — the watcher runs independently

For production systems, move the polling loop to a persistent background worker (Celery, a Lambda on a schedule, or a simple cron job) rather than a thread in your application process. Store batch IDs in a database so the poller can resume after restarts.
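
A minimal sketch of the persistence piece, using SQLite so a cron-driven poller can resume after restarts. The table name and schema are assumptions for illustration:

```python
import sqlite3

def init_store(path="batches.db"):
    """Open (or create) the batch-tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batches "
        "(batch_id TEXT PRIMARY KEY, status TEXT, callback_url TEXT)"
    )
    return conn

def record_batch(conn, batch_id, callback_url):
    """Register a freshly submitted batch for the poller to watch."""
    conn.execute(
        "INSERT OR REPLACE INTO batches VALUES (?, 'in_progress', ?)",
        (batch_id, callback_url),
    )
    conn.commit()

def pending_batches(conn):
    """Batch IDs the cron poller should still check."""
    rows = conn.execute(
        "SELECT batch_id FROM batches WHERE status = 'in_progress'"
    )
    return [r[0] for r in rows]

conn = init_store(":memory:")
record_batch(conn, "msgbatch_abc", "https://yourapp.com/webhooks/claude-batch")
assert pending_batches(conn) == ["msgbatch_abc"]
```

On each cron run, the poller iterates pending_batches(), checks each one via client.messages.batches.retrieve(), and updates the row to 'ended' once it fires the callback.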


Practical Use Case: Processing 1,000 Customer Support Tickets Overnight

A SaaS company receives 1,000 support tickets per day. The goal: classify each ticket by category and urgency, then pre-draft a response — processed overnight, ready for agents in the morning.

Task per ticket:

  - Classify the category (billing, technical, general, or spam)
  - Rate the urgency (low, medium, high)
  - Pre-draft a response for the agent to review

Setup:

  - One batch of 1,000 requests, one per ticket
  - Model: Claude Haiku (classification and short drafts don't need a larger model)
  - Submitted by a nightly job after the support queue closes

Cost calculation:

Assume average ticket length is 200 words (~280 tokens) and the system prompt is 150 tokens:

Synchronous API cost (Haiku pricing):

  - Input: 1,000 requests × 430 tokens ≈ 430K input tokens
  - Output: ~200 tokens per response ≈ 200K output tokens
  - Total: roughly $1.50 per day at current Haiku rates

Batch API cost (50% discount):

  - The same tokens at half price: roughly $0.75 per day

Daily savings: $0.75 → $273/year on this single workload. At higher volumes or with Sonnet-class models, savings compound significantly.
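
The arithmetic above can be reproduced with a small calculator. Prices are per million tokens and should come from the current pricing page; the Haiku figures used here ($1 input / $5 output per MTok) are assumptions for illustration:

```python
def batch_cost(n_requests, input_tokens, output_tokens,
               price_in_per_mtok, price_out_per_mtok, batch_discount=0.5):
    """Return (sync_cost, batch_cost) in dollars at the given per-MTok prices."""
    total_in = n_requests * input_tokens
    total_out = n_requests * output_tokens
    sync = (total_in * price_in_per_mtok + total_out * price_out_per_mtok) / 1e6
    return sync, sync * (1 - batch_discount)

# 1,000 tickets, ~430 input tokens each (ticket + system prompt),
# ~200 output tokens each, at assumed Haiku prices of $1 / $5 per MTok:
sync, batch = batch_cost(1_000, 430, 200, 1.0, 5.0)
assert batch == sync / 2
assert round(sync, 2) == 1.43
```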

Timeline:

  - Evening: support queue closes; the nightly job builds and submits the batch
  - Overnight: the batch processes (typically 1–4 hours)
  - Morning: results are retrieved, and classifications plus draft responses are loaded into the ticketing system before agents arrive


FAQ

Q: Can I cancel a batch after submission? Yes. Call client.messages.batches.cancel(batch_id). Requests that have already been processed will have results available; unprocessed requests will have type: "canceled" in the results.

Q: Does the Batch API support all Claude models? The Batch API is available for Claude Haiku, Sonnet, and Opus models (the current generation). Check the Anthropic documentation for the current list of supported model IDs — model availability can change with new releases. For a comparison of model capabilities and cost trade-offs, see Haiku vs Sonnet vs Opus: Which Model?.

Q: Are prompt caching benefits available in Batch API? Yes. If you use cache-control breakpoints in your batch requests, prompt caching applies and reduces costs further. This is especially valuable when all 10,000 requests share the same long system prompt.
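
A sketch of a batch request that marks a shared system prompt as cacheable. The cache_control block is the documented mechanism; the model ID and prompt content are placeholders:

```python
SYSTEM_PROMPT = "You are a support-ticket classifier. " * 50  # long shared prompt

def cached_request(custom_id: str, ticket_text: str) -> dict:
    """Build one batch request whose system prompt carries a cache breakpoint."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 256,
            "system": [
                {
                    "type": "text",
                    "text": SYSTEM_PROMPT,
                    # Identical prefixes across the batch can be read
                    # from cache at a reduced rate.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            "messages": [{"role": "user", "content": f"Ticket: {ticket_text}"}],
        },
    }

req = cached_request("ticket-0", "Payment failed twice")
assert req["params"]["system"][0]["cache_control"] == {"type": "ephemeral"}
```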

Q: What happens if I exceed the 24-hour processing window? Requests that exceed 24 hours will have result.type == "expired" in the results. Resubmit those requests in a new batch.

Q: Can I use streaming with Batch API? No. Batch API is asynchronous and does not support streaming. Use the synchronous API for streaming output.




→ Get Cost Optimization Masterclass — $59

Covers Batch API pipelines, prompt caching strategies, model routing, and a full cost calculator — everything needed to cut Claude API spend by 50–80% on production workloads.

AI Disclosure: Drafted with Claude Code; all pricing and feature details from official documentation as of April 2026.
