Claude Batch API and Webhooks: Async Processing for High-Volume Workloads
Claude's Batch API processes up to 10,000 requests asynchronously at a 50% discount on both input and output tokens. Submit a batch, poll for completion, and retrieve results — ideal for classification, extraction, summarization, or any workload where waiting a few hours beats paying full price. For cost calculations that include prompt caching on top of batch discounts, see Claude API Cost and Prompt Caching Break-Even.
When to Use Async vs Sync
Use synchronous API (standard /messages) when:
- Users are waiting for a response in real time
- Latency is under 30 seconds and directly visible to end users
- You need streaming output
- The task is interactive or conversational
Use Batch API when:
- Processing is latency-insensitive (nightly jobs, bulk analysis, data pipelines)
- You have 50+ requests to process in a run
- You need to minimize API spend
- Results can be consumed hours after submission
The 50% cost reduction is not a small optimization — it halves your Claude API budget for eligible workloads. Any job that can wait up to 24 hours should be evaluated for batch processing.
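If you route requests programmatically, this decision can be encoded as a small helper. A minimal sketch; the 50-request threshold and the flag names are assumptions drawn from the checklist above, not API parameters:

def choose_api(num_requests: int, latency_sensitive: bool, needs_streaming: bool) -> str:
    """Rough routing heuristic based on the checklist above."""
    if latency_sensitive or needs_streaming:
        return "sync"   # standard /messages endpoint
    if num_requests >= 50:
        return "batch"  # 50% discount; results within 24 hours
    return "sync"       # small jobs: batch latency isn't worth the discount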
How the Claude Batch API Works
The Batch API flow has three stages:
- Submit: Send a batch of up to 10,000 requests in one API call. Receive a `batch_id`.
- Poll: Check the batch status periodically until `processing_status` is `ended`.
- Retrieve: Download results — one result per original request, keyed by your `custom_id`.
Batches are processed within 24 hours. Most complete much faster, typically 1–4 hours depending on batch size and current load.
Submitting a Batch
import anthropic

client = anthropic.Anthropic()

# Prepare batch requests; tickets is a list of support-ticket strings
# loaded from your data source
requests = [
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": f"Classify this support ticket as: billing, technical, general, or spam.\nTicket: {ticket_text}"
                }
            ]
        }
    }
    for i, ticket_text in enumerate(tickets)
]

# Submit the batch
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
# processing_status will be "in_progress" immediately after submission
Batch limits:
- Maximum 10,000 requests per batch
- Maximum total batch size: 32 MB
- Maximum 100 concurrent in-progress batches per workspace
For workloads larger than 10,000 requests, split into multiple batches and submit them sequentially or in parallel.
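A minimal chunking sketch under those limits; `all_requests` is assumed to be your full list of request dicts:

def submit_in_chunks(client, all_requests, chunk_size=10_000):
    """Split an oversized workload into multiple batches and submit each."""
    batch_ids = []
    for start in range(0, len(all_requests), chunk_size):
        chunk = all_requests[start:start + chunk_size]
        batch = client.messages.batches.create(requests=chunk)
        batch_ids.append(batch.id)
    return batch_ids  # persist these IDs so each batch can be polled later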
Polling for Completion
The Batch API does not natively push webhooks — you poll the batch status endpoint until processing completes.
import time

def wait_for_batch(client, batch_id, poll_interval_seconds=60):
    """Poll until the batch completes. Returns the completed batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        counts = batch.request_counts
        print(
            f"Status: {status} | "
            f"Processing: {counts.processing} | "
            f"Succeeded: {counts.succeeded} | "
            f"Errored: {counts.errored}"
        )
        if status == "ended":
            return batch
        time.sleep(poll_interval_seconds)

completed_batch = wait_for_batch(client, batch.id, poll_interval_seconds=120)
Polling best practices:
- Poll every 60–300 seconds, not every second. Excessive polling does not speed up processing and wastes API calls.
- Persist the `batch_id` to durable storage (a database or file) before starting the poll loop. If your process crashes, you can resume polling from the saved ID (see the sketch after this list).
- For overnight jobs, poll every 5–10 minutes. For smaller batches, every 2 minutes is reasonable.
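A minimal sketch of the persist-then-resume pattern, assuming a local JSON file as the durable store (use a database in production):

import json
import os

STATE_FILE = "batch_state.json"  # assumed path for this sketch

def save_batch_id(batch_id):
    with open(STATE_FILE, "w") as f:
        json.dump({"batch_id": batch_id}, f)

def load_batch_id():
    if not os.path.exists(STATE_FILE):
        return None
    with open(STATE_FILE) as f:
        return json.load(f)["batch_id"]

# On startup: resume polling an in-flight batch instead of resubmitting
batch_id = load_batch_id()
if batch_id is None:
    batch = client.messages.batches.create(requests=requests)
    save_batch_id(batch.id)
    batch_id = batch.id
completed_batch = wait_for_batch(client, batch_id)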
Retrieving Completed Results
When processing_status == "ended", retrieve all results:
results = {}
errors = {}

for result in client.messages.batches.results(completed_batch.id):
    custom_id = result.custom_id
    if result.result.type == "succeeded":
        # Extract the text response
        message = result.result.message
        text = message.content[0].text
        results[custom_id] = text
    elif result.result.type == "errored":
        error = result.result.error
        errors[custom_id] = {
            "type": error.type,
            "message": error.message,
        }
    elif result.result.type == "expired":
        # Request expired before processing (rare; batch exceeded 24h)
        errors[custom_id] = {"type": "expired"}

print(f"Succeeded: {len(results)}")
print(f"Failed: {len(errors)}")
Results are returned in streaming JSONL format. The SDK handles this transparently — iterate over batches.results() and each iteration gives you one result.
Result format per item:
- `custom_id`: Your identifier from the original request
- `result.type`: "succeeded" | "errored" | "expired"
- `result.message`: Full message object (if succeeded) — same shape as a synchronous `/messages` response
Error Handling for Partial Failures
Batches do not fail atomically. A batch with 1,000 requests may have 995 successes and 5 errors. Always handle partial failures:
# Collect failed request IDs
failed_ids = list(errors)

if failed_ids:
    print(f"Retrying {len(failed_ids)} failed requests...")
    # Rebuild retry requests from your original data. build_request and
    # retrieve_results are your own helpers: build_request constructs one
    # batch request dict (a sketch appears in the use case below), and
    # retrieve_results wraps the results loop shown above.
    retry_requests = [
        build_request(ticket_id=fid, ticket_text=original_data[fid])
        for fid in failed_ids
    ]
    retry_batch = client.messages.batches.create(requests=retry_requests)
    retry_completed = wait_for_batch(client, retry_batch.id)
    retry_results = retrieve_results(client, retry_completed.id)

    # Merge retry results into the main results
    results.update(retry_results)
Common error types:
- `invalid_request`: Malformed request (fix and retry)
- `overloaded`: Anthropic capacity issue (retry is safe)
- `expired`: Batch exceeded the 24-hour processing window (resubmit)
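One way to encode those rules is a small dispatch table; the action names here are illustrative, not SDK values:

# Map Batch API error types to a handling action (action names are illustrative)
ERROR_ACTIONS = {
    "invalid_request": "fix_then_retry",  # correct the request body first
    "overloaded": "retry",                # safe to resubmit as-is
    "expired": "resubmit",                # include in a fresh batch
}

def action_for(error_type: str) -> str:
    return ERROR_ACTIONS.get(error_type, "investigate")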
Building a Webhook-Style Notification System
Claude Batch API does not support native webhooks (push notifications to your endpoint when a batch completes). You can build this yourself with a lightweight polling service.
Pattern: Polling loop with callback
import threading
import requests as http_requests

def batch_watcher(client, batch_id, callback_url, poll_interval=120):
    """
    Runs in a background thread.
    Polls until the batch completes, then POSTs a summary to callback_url.
    """
    batch = wait_for_batch(client, batch_id, poll_interval)
    results = retrieve_results(client, batch.id)  # your results helper from earlier
    # Notify your webhook endpoint
    http_requests.post(callback_url, json={
        "batch_id": batch_id,
        "status": "completed",
        "counts": {
            "succeeded": batch.request_counts.succeeded,
            "errored": batch.request_counts.errored,
        },
        "results_available": True,
    })

# Start the watcher in a background thread
watcher = threading.Thread(
    target=batch_watcher,
    args=(client, batch.id, "https://yourapp.com/webhooks/claude-batch"),
    daemon=True,
)
watcher.start()
# Your main process can continue with other work. Note that a daemon thread
# dies when the main process exits, so long-lived polling belongs in a
# persistent worker (see below).
For production systems, move the polling loop to a persistent background worker (Celery, a Lambda on a schedule, or a simple cron job) rather than a thread in your application process. Store batch IDs in a database so the poller can resume after restarts.
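A sketch of that worker, assuming a SQLite table `batches(batch_id TEXT, status TEXT)` and a scheduler (cron, Celery beat) that calls `check_pending_batches` every few minutes:

import sqlite3

def check_pending_batches(client, db_path="batches.db"):
    """Run on a schedule: poll each in-flight batch once and record completions."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT batch_id FROM batches WHERE status = 'in_progress'"
    ).fetchall()
    for (batch_id,) in rows:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            conn.execute(
                "UPDATE batches SET status = 'ended' WHERE batch_id = ?",
                (batch_id,),
            )
            conn.commit()
            # Hand off to result processing / webhook notification here
    conn.close()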
Practical Use Case: Processing 1,000 Customer Support Tickets Overnight
A SaaS company receives 1,000 support tickets per day. The goal: classify each ticket by category and urgency, then pre-draft a response — processed overnight, ready for agents in the morning.
Task per ticket:
- Classify: billing / technical / account / feature-request
- Urgency: high / medium / low
- Draft a 2-sentence response opening
Setup:
- Model: `claude-haiku-4-5` (fastest, cheapest for classification)
- Max tokens: 300 per request (classification + short draft)
- Batch size: 1,000 requests
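A sketch of the per-ticket request builder; the prompt wording and JSON output keys are illustrative, not a prescribed schema:

def build_request(ticket_id, ticket_text):
    """Build one batch request for the classify + urgency + draft task."""
    prompt = (
        "Classify this ticket (billing/technical/account/feature-request), "
        "rate urgency (high/medium/low), and draft a 2-sentence response opening. "
        "Reply as JSON with keys: category, urgency, draft.\n"
        f"Ticket: {ticket_text}"
    )
    return {
        "custom_id": f"ticket-{ticket_id}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 300,
            "messages": [{"role": "user", "content": prompt}],
        },
    }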
Cost calculation:
Assume an average ticket of 200 words (~280 tokens), a 150-token system prompt, and a ~50-token task instruction:
- Input per request: 280 (ticket) + 150 (system) + 50 (instruction) = ~480 tokens
- Output per request: ~280 tokens (classification JSON + draft)
- Total input tokens: 1,000 × 480 = 480,000 tokens
- Total output tokens: 1,000 × 280 = 280,000 tokens
Synchronous API cost (assuming $0.80/MTok input and $4.00/MTok output; verify against current Haiku pricing):
- Input: 480,000 × $0.80/MTok = $0.384
- Output: 280,000 × $4.00/MTok = $1.12
- Total: $1.50
Batch API cost (50% discount):
- Input: 480,000 × $0.40/MTok = $0.192
- Output: 280,000 × $2.00/MTok = $0.56
- Total: $0.75
Daily savings: $0.75, or roughly $274/year on this single workload. At higher volumes or with Sonnet-class models, the savings scale up significantly.
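The same arithmetic as a reusable sketch; the per-MTok rates are hardcoded to the figures above, so check current pricing before reusing them:

def batch_cost(n_requests, in_tokens, out_tokens,
               in_rate=0.80, out_rate=4.00, batch_discount=0.5):
    """Dollar cost; rates are $/MTok and mirror the figures above."""
    sync = (n_requests * in_tokens * in_rate
            + n_requests * out_tokens * out_rate) / 1_000_000
    return {"sync": round(sync, 2), "batch": round(sync * batch_discount, 2)}

print(batch_cost(1_000, 480, 280))  # {'sync': 1.5, 'batch': 0.75}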
Timeline:
- Submit batch at 11 PM → completed by 2–3 AM → agents see pre-classified, pre-drafted tickets at 9 AM
FAQ
Q: Can I cancel a batch after submission?
Yes. Call client.messages.batches.cancel(batch_id). Requests that have already been processed will have results available; unprocessed requests will have type: "canceled" in the results.
Q: Does the Batch API support all Claude models? The Batch API is available for Claude Haiku, Sonnet, and Opus models (the current generation). Check the Anthropic documentation for the current list of supported model IDs — model availability can change with new releases. For a comparison of model capabilities and cost trade-offs, see Haiku vs Sonnet vs Opus: Which Model?.
Q: Are prompt caching benefits available in Batch API? Yes. If you use cache-control breakpoints in your batch requests, prompt caching applies and reduces costs further. This is especially valuable when all 10,000 requests share the same long system prompt.
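A sketch of one batch request with a cached shared system prompt; `LONG_SYSTEM_PROMPT` is a placeholder, and the `cache_control` placement follows the standard prompt-caching request format:

LONG_SYSTEM_PROMPT = "..."  # the long instructions shared by every request

request = {
    "custom_id": "ticket-0",
    "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": "Classify this ticket: ..."}],
    },
}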
Q: What happens if I exceed the 24-hour processing window?
Requests that exceed 24 hours will have result.type == "expired" in the results. Resubmit those requests in a new batch.
Q: Can I use streaming with Batch API? No. Batch API is asynchronous and does not support streaming. Use the synchronous API for streaming output.
Sources
- Anthropic Batch API documentation
- Batch API pricing — Anthropic
- Claude API rate limits
- Prompt caching with Batch API
→ Get Cost Optimization Masterclass — $59
Covers Batch API pipelines, prompt caching strategies, model routing, and a full cost calculator — everything needed to cut Claude API spend by 50–80% on production workloads.