Claude Batch API and Webhooks: Async Processing for High-Volume Workloads
Claude's Batch API processes up to 10,000 requests asynchronously at a 50% discount on both input and output tokens. Submit a batch, poll for completion, and retrieve results — ideal for classification, extraction, summarization, or any workload where waiting a few hours beats paying full price. For cost calculations that include prompt caching on top of batch discounts, see Claude API Cost and Prompt Caching Break-Even.
When to Use Async vs Sync
Use synchronous API (standard /messages) when:
- Users are waiting for a response in real time
- Latency is under 30 seconds and directly visible to end users
- You need streaming output
- The task is interactive or conversational
Use Batch API when:
- Processing is latency-insensitive (nightly jobs, bulk analysis, data pipelines)
- You have 50+ requests to process in a run
- You need to minimize API spend
- Results can be consumed hours after submission
The 50% cost reduction is not a small optimization — it halves your Claude API budget for eligible workloads. Any job that can wait up to 24 hours should be evaluated for batch processing.
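If you route requests programmatically, this decision can be encoded as a small helper. A minimal sketch; the 50-request threshold and the flag names are assumptions drawn from the checklist above, not API parameters:

def choose_api(num_requests: int, latency_sensitive: bool, needs_streaming: bool) -> str:
    """Rough routing heuristic based on the checklist above."""
    if latency_sensitive or needs_streaming:
        return "sync"   # standard /messages endpoint
    if num_requests >= 50:
        return "batch"  # 50% discount; results within 24 hours
    return "sync"       # small jobs: batch latency isn't worth the discount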
How the Claude Batch API Works
The Batch API flow has three stages:
- Submit: Send a batch of up to 10,000 requests in one API call. Receive a `batch_id`.
- Poll: Check the batch status periodically until `processing_status` is `ended`.
- Retrieve: Download results — one result per original request, keyed by your `custom_id`.
Batches are processed within 24 hours. Most complete much faster, typically 1–4 hours depending on batch size and current load.
Submitting a Batch
import anthropic

client = anthropic.Anthropic()

# Prepare batch requests; tickets is a list of support-ticket strings
# loaded from your data source
requests = [
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": f"Classify this support ticket as: billing, technical, general, or spam.\nTicket: {ticket_text}"
                }
            ]
        }
    }
    for i, ticket_text in enumerate(tickets)
]

# Submit the batch
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
# processing_status will be "in_progress" immediately after submission
Batch limits:
- Maximum 10,000 requests per batch
- Maximum total batch size: 32 MB
- Maximum 100 concurrent in-progress batches per workspace
For workloads larger than 10,000 requests, split into multiple batches and submit them sequentially or in parallel.
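A minimal chunking sketch under those limits; `all_requests` is assumed to be your full list of request dicts:

def submit_in_chunks(client, all_requests, chunk_size=10_000):
    """Split an oversized workload into multiple batches and submit each."""
    batch_ids = []
    for start in range(0, len(all_requests), chunk_size):
        chunk = all_requests[start:start + chunk_size]
        batch = client.messages.batches.create(requests=chunk)
        batch_ids.append(batch.id)
    return batch_ids  # persist these IDs so each batch can be polled later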
Polling for Completion
The Batch API does not natively push webhooks — you poll the batch status endpoint until processing completes.
import time

def wait_for_batch(client, batch_id, poll_interval_seconds=60):
    """Poll until the batch completes. Returns the completed batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        counts = batch.request_counts
        print(
            f"Status: {status} | "
            f"Processing: {counts.processing} | "
            f"Succeeded: {counts.succeeded} | "
            f"Errored: {counts.errored}"
        )
        if status == "ended":
            return batch
        time.sleep(poll_interval_seconds)

completed_batch = wait_for_batch(client, batch.id, poll_interval_seconds=120)
Polling best practices:
- Poll every 60–300 seconds, not every second. Excessive polling does not speed up processing and wastes API calls.
- Persist the `batch_id` to durable storage (a database or file) before starting the poll loop. If your process crashes, you can resume polling from the saved ID (see the sketch after this list).
- For overnight jobs, poll every 5–10 minutes. For smaller batches, every 2 minutes is reasonable.
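A minimal sketch of the persist-then-resume pattern, assuming a local JSON file as the durable store (use a database in production):

import json
import os

STATE_FILE = "batch_state.json"  # assumed path for this sketch

def save_batch_id(batch_id):
    with open(STATE_FILE, "w") as f:
        json.dump({"batch_id": batch_id}, f)

def load_batch_id():
    if not os.path.exists(STATE_FILE):
        return None
    with open(STATE_FILE) as f:
        return json.load(f)["batch_id"]

# On startup: resume polling an in-flight batch instead of resubmitting
batch_id = load_batch_id()
if batch_id is None:
    batch = client.messages.batches.create(requests=requests)
    save_batch_id(batch.id)
    batch_id = batch.id
completed_batch = wait_for_batch(client, batch_id)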
Retrieving Completed Results
When processing_status == "ended", retrieve all results:
results = {}
errors = {}

for result in client.messages.batches.results(completed_batch.id):
    custom_id = result.custom_id
    if result.result.type == "succeeded":
        # Extract the text response
        message = result.result.message
        text = message.content[0].text
        results[custom_id] = text
    elif result.result.type == "errored":
        error = result.result.error
        errors[custom_id] = {
            "type": error.type,
            "message": error.message,
        }
    elif result.result.type == "expired":
        # Request expired before processing (rare; batch exceeded 24h)
        errors[custom_id] = {"type": "expired"}

print(f"Succeeded: {len(results)}")
print(f"Failed: {len(errors)}")
Results are returned in streaming JSONL format. The SDK handles this transparently — iterate over batches.results() and each iteration gives you one result.
Result format per item:
- `custom_id`: Your identifier from the original request
- `result.type`: "succeeded" | "errored" | "expired"
- `result.message`: Full message object (if succeeded) — same shape as a synchronous `/messages` response
Error Handling for Partial Failures
Batches do not fail atomically. A batch with 1,000 requests may have 995 successes and 5 errors. Always handle partial failures:
# Collect failed request IDs
failed_ids = list(errors)

if failed_ids:
    print(f"Retrying {len(failed_ids)} failed requests...")
    # Rebuild retry requests from your original data. build_request and
    # retrieve_results are your own helpers: build_request constructs one
    # batch request dict (a sketch appears in the use case below), and
    # retrieve_results wraps the results loop shown above.
    retry_requests = [
        build_request(ticket_id=fid, ticket_text=original_data[fid])
        for fid in failed_ids
    ]
    retry_batch = client.messages.batches.create(requests=retry_requests)
    retry_completed = wait_for_batch(client, retry_batch.id)
    retry_results = retrieve_results(client, retry_completed.id)

    # Merge retry results into the main results
    results.update(retry_results)
Common error types:
- `invalid_request`: Malformed request (fix and retry)
- `overloaded`: Anthropic capacity issue (retry is safe)
- `expired`: Batch exceeded the 24-hour processing window (resubmit)
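One way to encode those rules is a small dispatch table; the action names here are illustrative, not SDK values:

# Map Batch API error types to a handling action (action names are illustrative)
ERROR_ACTIONS = {
    "invalid_request": "fix_then_retry",  # correct the request body first
    "overloaded": "retry",                # safe to resubmit as-is
    "expired": "resubmit",                # include in a fresh batch
}

def action_for(error_type: str) -> str:
    return ERROR_ACTIONS.get(error_type, "investigate")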
Building a Webhook-Style Notification System
Claude Batch API does not support native webhooks (push notifications to your endpoint when a batch completes). You can build this yourself with a lightweight polling service.
Pattern: Polling loop with callback
import threading
import requests as http_requests

def batch_watcher(client, batch_id, callback_url, poll_interval=120):
    """
    Runs in a background thread.
    Polls until the batch completes, then POSTs a summary to callback_url.
    """
    batch = wait_for_batch(client, batch_id, poll_interval)
    results = retrieve_results(client, batch.id)  # your results helper from earlier
    # Notify your webhook endpoint
    http_requests.post(callback_url, json={
        "batch_id": batch_id,
        "status": "completed",
        "counts": {
            "succeeded": batch.request_counts.succeeded,
            "errored": batch.request_counts.errored,
        },
        "results_available": True,
    })

# Start the watcher in a background thread
watcher = threading.Thread(
    target=batch_watcher,
    args=(client, batch.id, "https://yourapp.com/webhooks/claude-batch"),
    daemon=True,
)
watcher.start()
# Your main process can continue with other work. Note that a daemon thread
# dies when the main process exits, so long-lived polling belongs in a
# persistent worker (see below).
For production systems, move the polling loop to a persistent background worker (Celery, a Lambda on a schedule, or a simple cron job) rather than a thread in your application process. Store batch IDs in a database so the poller can resume after restarts.
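A sketch of that worker, assuming a SQLite table `batches(batch_id TEXT, status TEXT)` and a scheduler (cron, Celery beat) that calls `check_pending_batches` every few minutes:

import sqlite3

def check_pending_batches(client, db_path="batches.db"):
    """Run on a schedule: poll each in-flight batch once and record completions."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT batch_id FROM batches WHERE status = 'in_progress'"
    ).fetchall()
    for (batch_id,) in rows:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            conn.execute(
                "UPDATE batches SET status = 'ended' WHERE batch_id = ?",
                (batch_id,),
            )
            conn.commit()
            # Hand off to result processing / webhook notification here
    conn.close()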
Practical Use Case: Processing 1,000 Customer Support Tickets Overnight
A SaaS company receives 1,000 support tickets per day. The goal: classify each ticket by category and urgency, then pre-draft a response — processed overnight, ready for agents in the morning.
Task per ticket:
- Classify: billing / technical / account / feature-request
- Urgency: high / medium / low
- Draft a 2-sentence response opening
Setup:
- Model: `claude-haiku-4-5` (fastest, cheapest for classification)
- Max tokens: 300 per request (classification + short draft)
- Batch size: 1,000 requests
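A sketch of the per-ticket request builder; the prompt wording and JSON output keys are illustrative, not a prescribed schema:

def build_request(ticket_id, ticket_text):
    """Build one batch request for the classify + urgency + draft task."""
    prompt = (
        "Classify this ticket (billing/technical/account/feature-request), "
        "rate urgency (high/medium/low), and draft a 2-sentence response opening. "
        "Reply as JSON with keys: category, urgency, draft.\n"
        f"Ticket: {ticket_text}"
    )
    return {
        "custom_id": f"ticket-{ticket_id}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 300,
            "messages": [{"role": "user", "content": prompt}],
        },
    }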
Cost calculation:
Assume an average ticket of 200 words (~280 tokens), a 150-token system prompt, and a ~50-token task instruction:
- Input per request: 280 (ticket) + 150 (system) + 50 (instruction) = ~480 tokens
- Output per request: ~280 tokens (classification JSON + draft)
- Total input tokens: 1,000 × 480 = 480,000 tokens
- Total output tokens: 1,000 × 280 = 280,000 tokens
Synchronous API cost (assuming $0.80/MTok input and $4.00/MTok output; verify against current Haiku pricing):
- Input: 480,000 × $0.80/MTok = $0.384
- Output: 280,000 × $4.00/MTok = $1.12
- Total: $1.50
Batch API cost (50% discount):
- Input: 480,000 × $0.40/MTok = $0.192
- Output: 280,000 × $2.00/MTok = $0.56
- Total: $0.75
Daily savings: $0.75, or roughly $274/year on this single workload. At higher volumes or with Sonnet-class models, the savings scale up significantly.
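The same arithmetic as a reusable sketch; the per-MTok rates are hardcoded to the figures above, so check current pricing before reusing them:

def batch_cost(n_requests, in_tokens, out_tokens,
               in_rate=0.80, out_rate=4.00, batch_discount=0.5):
    """Dollar cost; rates are $/MTok and mirror the figures above."""
    sync = (n_requests * in_tokens * in_rate
            + n_requests * out_tokens * out_rate) / 1_000_000
    return {"sync": round(sync, 2), "batch": round(sync * batch_discount, 2)}

print(batch_cost(1_000, 480, 280))  # {'sync': 1.5, 'batch': 0.75}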
Timeline:
- Submit batch at 11 PM → completed by 2–3 AM → agents see pre-classified, pre-drafted tickets at 9 AM
FAQ
Q: Can I cancel a batch after submission?
Yes. Call client.messages.batches.cancel(batch_id). Requests that have already been processed will have results available; unprocessed requests will have type: "canceled" in the results.
Q: Does the Batch API support all Claude models? The Batch API is available for Claude Haiku, Sonnet, and Opus models (the current generation). Check the Anthropic documentation for the current list of supported model IDs — model availability can change with new releases. For a comparison of model capabilities and cost trade-offs, see Haiku vs Sonnet vs Opus: Which Model?.
Q: Are prompt caching benefits available in Batch API? Yes. If you use cache-control breakpoints in your batch requests, prompt caching applies and reduces costs further. This is especially valuable when all 10,000 requests share the same long system prompt.
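A sketch of one batch request with a cached shared system prompt; `LONG_SYSTEM_PROMPT` is a placeholder, and the `cache_control` placement follows the standard prompt-caching request format:

LONG_SYSTEM_PROMPT = "..."  # the long instructions shared by every request

request = {
    "custom_id": "ticket-0",
    "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": "Classify this ticket: ..."}],
    },
}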
Q: What happens if I exceed the 24-hour processing window?
Requests that exceed 24 hours will have result.type == "expired" in the results. Resubmit those requests in a new batch.
Q: Can I use streaming with Batch API? No. Batch API is asynchronous and does not support streaming. Use the synchronous API for streaming output.
Sources
- Anthropic Batch API documentation
- Batch API pricing — Anthropic
- Claude API rate limits
- Prompt caching with Batch API
→ Get Cost Optimization Masterclass — $59
Covers Batch API pipelines, prompt caching strategies, model routing, and a full cost calculator — everything needed to cut Claude API spend by 50–80% on production workloads.