
Claude API Webhook Integration: Async Patterns and Event-Driven AI

How to integrate Claude API with webhooks and async patterns — inbound webhook processing, job queues, FastAPI and Node.js examples, rate limits, and payload security.


To use the Claude API with webhooks, your server receives an inbound HTTP POST from an external service, queues or immediately calls the Claude API with the extracted payload, then posts the result to a callback URL or stores it for the requester to poll. The pattern has two variants: synchronous (respond within the webhook's timeout window, typically 5–30 seconds) and asynchronous (acknowledge the webhook instantly with HTTP 200, run the Claude call in the background, push results later). For anything beyond simple one-line completions, the async pattern is more robust and avoids webhook timeout failures.

This guide covers both patterns with production-ready code in Python (FastAPI) and Node.js (Express), plus error handling, retries, rate-limit awareness, and payload security.


Why Webhooks for Claude API?

Most Claude API tutorials show synchronous request-response: send a prompt, wait for the reply. That works for user-facing chat. It breaks for event-driven architectures.

Consider these real scenarios:

  - A GitHub webhook fires when a pull request opens, and you want Claude to summarize the diff.
  - A Stripe event arrives after a payment, and you want Claude to classify or flag the transaction.
  - A Twilio webhook delivers an inbound message that Claude should classify and route.

In each case, an external service calls your endpoint when something happens. You do not control the timing. The upstream service typically expects an HTTP 200 within 5–10 seconds or it retries. Claude API calls can take 1–15 seconds depending on model and output length. The mismatch creates timeout failures at scale.

The solution is an async webhook architecture: acknowledge fast, process separately.

For how webhooks fit into broader production designs, see Claude API Production Architecture.


Pattern 1: Inbound Webhook → Claude Processing → Outbound Response

The simplest viable pattern for latency-tolerant webhooks:

External Service  →  Your Webhook Endpoint
                           ↓ (async, background task)
                      Claude API Call
                           ↓
                      Outbound POST to callback_url
                      (or write to DB / queue)

When to use: The webhook sender either (a) supports a callback_url for async results, (b) does not require a meaningful response body, or (c) your Claude call reliably completes within the sender's timeout window.

Key properties:

  - The webhook handler returns 2xx in milliseconds; the Claude call runs in a background task.
  - No extra infrastructure: a single app process handles both receipt and processing.
  - Jobs are not durable. If the process restarts mid-task, the work is lost, which is acceptable only for low-stakes workloads.


Pattern 2: Async Job Queue with Claude

For high-throughput or mission-critical pipelines, add a proper queue between the webhook receiver and the Claude API call:

Webhook Endpoint  →  Enqueue job (Redis / SQS / database)  →  HTTP 202
                           ↓ (worker process, separate)
                      Dequeue + Call Claude API
                           ↓
                      Store result + notify (callback / webhook / polling)

When to use:

  - Bursty or high-volume webhook traffic that would blow through your Claude rate limits if every event were processed immediately.
  - Jobs that must survive process restarts and deploys.
  - Workloads that need retry visibility, dead-letter handling, or horizontal worker scaling.

For a full breakdown of rate limit strategies in production, see Claude API Production Architecture.
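The enqueue/dequeue split in the diagram above can be sketched with a database-backed queue, one of the three options listed. This is a minimal illustration using SQLite from the standard library; the table name, column names, and sample payload are hypothetical, and a production system would use Redis, SQS, or a proper job framework as noted above.

```python
import sqlite3
import json

# Minimal database-backed job queue (illustrative; schema and names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'pending')"
)

def enqueue(event: dict) -> int:
    """Webhook handler side: store the event and return immediately (HTTP 202)."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (json.dumps(event),))
    conn.commit()
    return cur.lastrowid

def dequeue():
    """Worker side: claim the oldest pending job, or None if the queue is empty."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'processing' WHERE id = ?", (row[0],))
    conn.commit()
    return row[0], json.loads(row[1])

job_id = enqueue({"action": "opened", "pr_number": 42})
claimed = dequeue()  # the worker would now call Claude with this payload
```

The status column is what gives you retry visibility: a crashed worker leaves the job in 'processing', where a sweeper can reset or dead-letter it.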


FastAPI Example: Receive Webhook, Call Claude, Respond

This example handles a GitHub PR event webhook, asks Claude to summarize the diff, and posts a comment back.

import os
import hmac
import hashlib
import httpx
from fastapi import FastAPI, BackgroundTasks, HTTPException, Request
from anthropic import AsyncAnthropic

app = FastAPI()
client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

GITHUB_WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"].encode()


def verify_github_signature(payload: bytes, signature_header: str) -> bool:
    """Verify GitHub's HMAC-SHA256 webhook signature."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(GITHUB_WEBHOOK_SECRET, payload, hashlib.sha256).hexdigest()
    received = signature_header[len("sha256="):]
    return hmac.compare_digest(expected, received)


async def process_pr_event(event: dict):
    """Background task: call Claude, post result as GitHub comment."""
    pr = event.get("pull_request", {})
    diff_url = pr.get("diff_url", "")
    comments_url = pr.get("comments_url", "")
    title = pr.get("title", "(no title)")
    body = pr.get("body", "(no description)")

    # Fetch the diff (first 8,000 characters; Claude handles larger but keep it focused)
    async with httpx.AsyncClient() as http:
        diff_resp = await http.get(diff_url, timeout=10)
        diff_text = diff_resp.text[:8000]

    # Call Claude with the async client so the API wait does not block the event loop.
    # Haiku is the right tier for speed and cost on classification/summary tasks.
    response = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        system=(
            "You are a senior code reviewer. Given a pull request title, description, "
            "and diff, write a concise 3-5 bullet summary of the changes and flag any "
            "obvious risks. Be direct. No filler."
        ),
        messages=[
            {
                "role": "user",
                "content": (
                    f"PR: {title}\n\nDescription: {body}\n\nDiff (truncated):\n{diff_text}"
                ),
            }
        ],
    )

    summary = response.content[0].text

    # Post summary as a GitHub PR comment
    github_token = os.environ["GITHUB_TOKEN"]
    async with httpx.AsyncClient() as http:
        await http.post(
            comments_url,
            json={"body": f"**Claude PR Summary**\n\n{summary}"},
            headers={
                "Authorization": f"Bearer {github_token}",
                "Accept": "application/vnd.github+json",
            },
            timeout=10,
        )


@app.post("/webhooks/github/pr")
async def github_pr_webhook(
    request: Request,
    background_tasks: BackgroundTasks,
):
    payload = await request.body()
    signature = request.headers.get("X-Hub-Signature-256", "")

    # Always verify the signature before processing
    if not verify_github_signature(payload, signature):
        raise HTTPException(status_code=401, detail="Invalid signature")

    event_type = request.headers.get("X-GitHub-Event", "")
    if event_type != "pull_request":
        # Acknowledge but ignore non-PR events
        return {"status": "ignored", "event": event_type}

    data = await request.json()
    action = data.get("action", "")

    if action not in ("opened", "synchronize"):
        return {"status": "ignored", "action": action}

    # Enqueue background processing — return 202 immediately
    background_tasks.add_task(process_pr_event, data)
    return {"status": "accepted"}

Key choices in this example:

  - Signature verification runs on the raw request bytes before any JSON parsing.
  - Non-PR events and irrelevant actions are acknowledged with 200 and ignored, so GitHub never retries them.
  - The Claude call runs in a BackgroundTasks job and the endpoint returns immediately.
  - Haiku keeps latency and cost low for a summary task, and the diff is truncated to 8,000 characters to keep the prompt focused.

For Sonnet or Opus on more complex analysis, swap the model string. For guidance on choosing the right model tier, see Haiku vs Sonnet vs Opus: Which Model?.


Node.js Express Example

The same PR summarization pattern in Express with an in-memory job queue:

import express from "express";
import crypto from "crypto";
import Anthropic from "@anthropic-ai/sdk";
import fetch from "node-fetch";

const app = express();
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Middleware: parse raw body for signature verification
app.use(
  "/webhooks",
  express.raw({ type: "application/json" }),
  (req, _res, next) => {
    req.rawBody = req.body;
    req.body = JSON.parse(req.body.toString());
    next();
  }
);

function verifyGitHubSignature(rawBody, signatureHeader) {
  if (!signatureHeader?.startsWith("sha256=")) return false;
  const expected = crypto
    .createHmac("sha256", process.env.GITHUB_WEBHOOK_SECRET)
    .update(rawBody)
    .digest("hex");
  const received = signatureHeader.slice("sha256=".length);
  const expectedBuf = Buffer.from(expected, "hex");
  const receivedBuf = Buffer.from(received, "hex");
  // timingSafeEqual throws if buffer lengths differ, so guard first
  if (expectedBuf.length !== receivedBuf.length) return false;
  return crypto.timingSafeEqual(expectedBuf, receivedBuf);
}

async function processPREvent(event) {
  const pr = event.pull_request;
  if (!pr) return;

  // Fetch diff
  const diffRes = await fetch(pr.diff_url);
  const diffText = (await diffRes.text()).slice(0, 8000);

  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 512,
    system:
      "You are a senior code reviewer. Given a PR title, description, and diff, " +
      "write a concise 3-5 bullet summary and flag obvious risks. Be direct.",
    messages: [
      {
        role: "user",
        content: `PR: ${pr.title}\n\nDescription: ${pr.body || "(none)"}\n\nDiff:\n${diffText}`,
      },
    ],
  });

  const summary = response.content[0].text;

  // Post comment
  await fetch(pr.comments_url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
      Accept: "application/vnd.github+json",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ body: `**Claude PR Summary**\n\n${summary}` }),
  });
}

app.post("/webhooks/github/pr", async (req, res) => {
  const signature = req.headers["x-hub-signature-256"] ?? "";

  if (!verifyGitHubSignature(req.rawBody, signature)) {
    return res.status(401).json({ error: "Invalid signature" });
  }

  const eventType = req.headers["x-github-event"];
  if (eventType !== "pull_request") {
    return res.json({ status: "ignored", event: eventType });
  }

  const { action } = req.body;
  if (!["opened", "synchronize"].includes(action)) {
    return res.json({ status: "ignored", action });
  }

  // Respond immediately, process asynchronously
  res.status(202).json({ status: "accepted" });

  // Fire-and-forget with error logging
  processPREvent(req.body).catch((err) => {
    console.error("[webhook] PR processing failed:", err.message);
  });
});

app.listen(3000, () => console.log("Webhook server listening on :3000"));

Notes on the Node.js version:

  - The middleware preserves the raw body because signature verification must run on the exact bytes GitHub sent, not a re-serialized object.
  - The endpoint responds 202 before processing begins, so slow Claude calls never trigger GitHub's retry timeout.
  - processPREvent is fire-and-forget with a .catch for logging, which means failed jobs are lost. For durability, swap in a persistent queue.


Error Handling and Retries

Claude API calls inside webhook handlers fail for three reasons: network timeouts, API errors (5xx), and rate limits (429). Each needs a different response.

import time
import anthropic
from anthropic import APIStatusError, APIConnectionError, RateLimitError

def call_claude_with_retry(prompt: str, max_retries: int = 3) -> str:
    """
    Call Claude with exponential backoff on retryable errors.
    Raises on non-retryable errors (e.g., invalid_request).
    """
    client = anthropic.Anthropic()
    base_delay = 1.0

    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-haiku-4-5",
                max_tokens=512,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text

        except RateLimitError as e:
            # 429: rate limit — always retry with backoff
            delay = base_delay * (2 ** attempt)
            retry_after = e.response.headers.get("retry-after")
            wait = float(retry_after) if retry_after else delay
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)

        except APIStatusError as e:
            if e.status_code >= 500:
                # 5xx: transient server error — retry
                delay = base_delay * (2 ** attempt)
                print(f"Server error {e.status_code}. Waiting {delay:.1f}s")
                time.sleep(delay)
            else:
                # 4xx (except 429): invalid request — do not retry
                raise

        except APIConnectionError:
            # Network issue — retry
            delay = base_delay * (2 ** attempt)
            time.sleep(delay)

    raise RuntimeError(f"Claude API call failed after {max_retries} retries")

Dead-letter handling: In a queue-based system, after max_retries exhaustion, write the job to a dead-letter queue or database table rather than discarding it silently. This enables manual review and reprocessing.
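The dead-letter step can be sketched as a thin wrapper around the job call. The names here are hypothetical and the Claude call is stubbed with a deliberately failing function so the control flow is visible; backoff between attempts is handled by the retry function shown earlier and is omitted here.

```python
dead_letter = []  # stand-in for a DLQ table or queue

def process_with_dlq(job: dict, call_fn, max_retries: int = 3):
    """Run a job; after max_retries failures, park it for manual review."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return call_fn(job)
        except Exception as exc:
            last_error = str(exc)
    # Retries exhausted: record the job and the last error instead of dropping it
    dead_letter.append({"job": job, "error": last_error})
    return None

def always_fails(job):
    raise RuntimeError("simulated API failure")

result = process_with_dlq({"event_id": "evt_1"}, always_fails)
# The failed job now sits in dead_letter with its error, ready for reprocessing
```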


Rate Limit Considerations with Webhooks

Webhooks arrive at unpredictable intervals. A quiet night followed by a spike of 500 GitHub events at 9 AM will hit your Claude tokens-per-minute (TPM) limit if every job calls Claude immediately.

Strategies:

1. Token bucket / leaky queue

Dequeue jobs at a controlled rate. For example, if your limit is 100,000 TPM and each job uses ~500 tokens, you can safely process 200 jobs per minute. Set your queue consumer to pull at that rate maximum.
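The arithmetic above can be wrapped in a small pacing helper. This is a sketch assuming the TPM limit and per-job token estimate are known up front; function names are mine, not from any SDK.

```python
def max_jobs_per_minute(tpm_limit: int, tokens_per_job: int, safety: float = 1.0) -> int:
    """How many jobs a queue consumer may pull per minute without exceeding TPM."""
    return int(tpm_limit * safety // tokens_per_job)

def consumer_delay_seconds(tpm_limit: int, tokens_per_job: int) -> float:
    """Minimum delay between dequeues to stay under the limit."""
    return 60.0 / max_jobs_per_minute(tpm_limit, tokens_per_job)

print(max_jobs_per_minute(100_000, 500))     # 200 jobs/min, matching the example above
print(consumer_delay_seconds(100_000, 500))  # 0.3s between dequeues
```

In practice you would set safety below 1.0 (say 0.8) to leave headroom for token-estimate error and other traffic on the same key.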

2. Model tier routing

Route simple tasks (classification, short summaries) to Haiku (higher rate limits, lower token cost per job) and complex tasks to Sonnet only when needed. This maximizes throughput within the same TPM cap. See Claude Agent SDK Guide for routing patterns in multi-step workflows.

3. Graceful 429 handling

When a worker hits a 429, pause the entire consumer for the retry-after window, then resume. Do not spin all workers simultaneously into retry loops — that compounds the rate limit pressure.

4. Prompt caching for repeated system prompts

If all jobs in a queue share the same system prompt (e.g., "You are a PR reviewer..."), add cache_control: {"type": "ephemeral"} to the system prompt block. On supported models, cache read tokens do not count toward input TPM, which effectively multiplies your throughput for cached content.

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": "You are a senior code reviewer. Write concise 3-5 bullet summaries...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": user_prompt}],
)

The first call populates the cache. Subsequent calls within the 5-minute TTL hit the cache and bypass input token counting for the system prompt portion.


Security: Validating Webhook Payloads

Never process a webhook payload without verifying its origin. Any attacker who knows your endpoint URL can POST fake events.

HMAC signature verification (the standard approach):

Most webhook providers (GitHub, Stripe, Shopify, Twilio) sign payloads with HMAC-SHA256 using a shared secret you configure in their dashboard.

import hmac
import hashlib

def verify_signature(
    raw_body: bytes,
    signature_header: str,
    secret: str,
    prefix: str = "sha256=",
) -> bool:
    """
    Generic HMAC-SHA256 webhook signature verifier.
    Works for GitHub, Stripe (prefix='v1='), and most others.
    """
    if not signature_header:
        return False

    received = signature_header
    if received.startswith(prefix):
        received = received[len(prefix):]

    expected = hmac.new(
        secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(expected, received)
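A quick self-check of this verifier, signing a test payload the way a provider would with a throwaway secret. The function is repeated here so the snippet runs standalone.

```python
import hmac
import hashlib

def verify_signature(raw_body, signature_header, secret, prefix="sha256="):
    # Same logic as the verifier above, repeated so this snippet is self-contained
    if not signature_header:
        return False
    received = signature_header
    if received.startswith(prefix):
        received = received[len(prefix):]
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)

secret = "test-secret"            # throwaway demo secret
body = b'{"event": "ping"}'

# Sign the raw bytes exactly as a provider would, then verify
good_sig = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_signature(body, good_sig, secret))              # True
print(verify_signature(body, "sha256=" + "0" * 64, secret))  # False
```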

Critical implementation rules:

  1. Always use hmac.compare_digest (or crypto.timingSafeEqual in Node.js) — never ==. Timing attacks can leak the signature one bit at a time with plain equality checks.
  2. Verify against the raw request body bytes, not a re-serialized JSON dict. JSON serialization is not deterministic across libraries.
  3. Store the webhook secret in an environment variable, never in source code.
  4. Add an idempotency check on the provider's event/delivery ID to reject replayed webhooks.

Stripe-specific note: Stripe includes a timestamp in the signature header (t=1620000000,v1=abc123) and expects you to verify that the timestamp is within 5 minutes of now, preventing replay attacks.

import time
import hmac
import hashlib

def verify_stripe_signature(raw_body: bytes, sig_header: str, secret: str) -> bool:
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    timestamp = int(parts.get("t", 0))
    received_sig = parts.get("v1", "")

    # Reject if timestamp is more than 5 minutes old
    if abs(time.time() - timestamp) > 300:
        return False

    payload_to_sign = f"{timestamp}.{raw_body.decode('utf-8')}"
    expected = hmac.new(
        secret.encode(), payload_to_sign.encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, received_sig)


Webhook integration is one chapter of a larger cost-and-architecture story. The P5 Cost Optimization Masterclass covers prompt caching strategies, model routing decisions, batch API pipelines, and a full Excel/Sheets cost calculator — everything needed to cut Claude API spend by 50–80% on production workloads. Patterns from real deployments, not toy examples.


FAQ

How do I prevent duplicate processing when a webhook is retried?

Webhook providers retry on non-2xx responses or timeouts. Use the provider's event ID as an idempotency key: before processing, check a database or Redis set for the event ID. If it exists, return 200 immediately without re-processing. If not, insert the ID and proceed. Most providers include a stable event ID in the headers or body (e.g., GitHub's X-GitHub-Delivery, Stripe's id field).
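The check described above, as a minimal in-memory sketch. A production system would use Redis SETNX or a unique database constraint instead of a Python set, and the delivery ID here is made up.

```python
seen_events = set()  # stand-in for Redis or a DB table with a unique index

def handle_once(delivery_id: str, process) -> str:
    """Process an event exactly once; replays do no work but still succeed."""
    if delivery_id in seen_events:
        return "duplicate"   # still respond 2xx so the provider stops retrying
    seen_events.add(delivery_id)
    process()
    return "processed"

calls = []
first = handle_once("delivery-abc", lambda: calls.append(1))
second = handle_once("delivery-abc", lambda: calls.append(1))  # provider retry
```

Note the order: record the ID before processing, so a retry arriving mid-flight is also rejected. The trade-off is that a crash after recording but before completion skips the job, which the queue's status column can catch.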

What happens if Claude API is slow and the webhook times out?

Return HTTP 202 Accepted immediately and process asynchronously. Never block the HTTP response on the Claude API call. If the webhook sender marks your endpoint as failed after a timeout and retries, idempotency key checks will prevent duplicate Claude calls on the retry. For webhook senders that require synchronous results (rare), use claude-haiku-4-5 with a strict max_tokens cap to minimize latency, and set an explicit timeout on your Claude API call so a slow response does not hang the request thread indefinitely.

Should I use FastAPI BackgroundTasks or Celery for Claude webhook processing?

Use BackgroundTasks for simple, low-volume use cases where jobs can be lost on server restart (e.g., internal tooling, low-stakes notifications). Use Celery, Redis Queue (RQ), or a database-backed queue for production workloads where job durability, visibility, retry logic, and horizontal scaling matter. The rule: if losing a job would be a business problem, use a persistent queue. If it is acceptable to skip a job on deploy, BackgroundTasks is fine.

Can I use Claude's streaming API with webhooks?

Yes, but only for the outbound Claude-to-your-server direction. Webhooks are standard HTTP POST — the webhook sender does not receive a streaming response from you. You can stream Claude's output internally (writing chunks to a database or SSE channel) while the webhook endpoint has already responded with 202. This is useful for real-time UI updates while a background job processes a webhook event.

What model should I use for webhook-triggered Claude calls?

Default to claude-haiku-4-5 for classification, routing, summarization, and short-form generation tasks triggered by webhooks. Haiku has the highest throughput (lowest latency, lowest cost), which matters when processing queues of hundreds of jobs. Use claude-sonnet-4-5 for tasks requiring multi-step reasoning, code generation, or nuanced analysis. Reserve claude-opus-4-5 for edge cases that genuinely need it — keeping Opus usage below 5% of total calls is a practical cost discipline. Full guidance at Haiku vs Sonnet vs Opus: Which Model?.
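The tiering advice above can be expressed as a small routing table. The task labels are hypothetical; the model IDs are the ones used throughout this guide.

```python
# Hypothetical task labels mapped to the model tiers discussed above
ROUTES = {
    "classify": "claude-haiku-4-5",
    "summarize": "claude-haiku-4-5",
    "code_review": "claude-sonnet-4-5",
}

def pick_model(task: str) -> str:
    """Default to Haiku; escalate only for tasks explicitly listed at a higher tier."""
    return ROUTES.get(task, "claude-haiku-4-5")

print(pick_model("summarize"))    # claude-haiku-4-5
print(pick_model("code_review"))  # claude-sonnet-4-5
```

Defaulting the unknown case to Haiku, not Sonnet, is what keeps the expensive tiers at a small fraction of total calls.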


Summary

The core webhook + Claude API integration has three rules:

  1. Acknowledge fast — return 2xx before calling Claude
  2. Verify signatures — reject unverified payloads before any processing
  3. Handle retries — idempotency keys prevent double-processing on webhook retries

The async background task pattern works for most use cases. Add a persistent queue (Redis, SQS, or database) when job durability, rate-limit management, or horizontal scaling become requirements. Prompt caching on shared system prompts dramatically increases effective throughput within your rate limits.

For the full production architecture picture, including request queuing, cost monitoring, and fallback chains, see Claude API Production Architecture.


→ P5 Cost Optimization Masterclass — $59

Prompt caching, model routing, batch API, and a full cost calculator. Everything needed to run Claude API at production scale without burning your budget.

AI Disclosure: Drafted with Claude Code; patterns from production deployments.
