Vercel AI SDK + Claude: Streaming, Tool Use, and Edge Deployment (2026)
How do I use Claude with the Vercel AI SDK? Install @ai-sdk/anthropic, instantiate the provider with anthropic("claude-sonnet-4-6"), and stream responses with streamText(). The SDK works on both the Edge runtime (SSE-friendly, ~150ms cold start, 25s max execution) and Node (long-running tools, native deps). Tool use, vision, and prompt caching all flow through the same streamText API once you know where to attach providerOptions. Three gotchas catch every team: an outdated anthropic-version header, the system-message role mismatch, and a stream-finalize race that swallows partial output. Below is the working setup, with the exact provider config and a tool-call example you can paste into a Next.js route handler.
Install and minimal example
The SDK is provider-agnostic; you pick the model package separately.
bun add ai @ai-sdk/anthropic
# or: npm i ai @ai-sdk/anthropic
Set ANTHROPIC_API_KEY in .env.local. The provider reads it automatically — no client init needed.
A streaming Next.js route handler:
// app/api/chat/route.ts
import { anthropic } from "@ai-sdk/anthropic";
import { streamText, convertToCoreMessages } from "ai";
export const runtime = "edge"; // or "nodejs"
export const maxDuration = 30;
export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = await streamText({
    model: anthropic("claude-sonnet-4-6"),
    system: "You are a concise technical assistant.",
    messages: convertToCoreMessages(messages),
    maxTokens: 1024,
  });
  return result.toDataStreamResponse();
}
Frontend with useChat():
"use client";
import { useChat } from "ai/react";
export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}
That is the entire happy path. Streaming works, role normalization works, abort signals propagate. For deeper streaming internals see the Claude API streaming guide.
Tool use with the SDK
Tools are first-class. Define them with Zod, return JSON, and the SDK handles the multi-turn dance.
import { anthropic } from "@ai-sdk/anthropic";
import { streamText, tool } from "ai";
import { z } from "zod";
export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = await streamText({
    model: anthropic("claude-sonnet-4-6"),
    messages,
    tools: {
      get_weather: tool({
        description: "Get current weather for a city.",
        parameters: z.object({
          city: z.string().describe("City name, e.g. Seoul"),
          unit: z.enum(["c", "f"]).default("c"),
        }),
        execute: async ({ city, unit }) => {
          const r = await fetch(
            `https://api.example.com/weather?city=${encodeURIComponent(city)}&unit=${unit}`,
          );
          return await r.json();
        },
      }),
    },
    maxSteps: 4, // allow multi-turn tool loops
  });
  return result.toDataStreamResponse();
}
maxSteps is critical — without it, the SDK stops after the first tool result and never lets Claude summarize. Set it to the longest tool chain you reasonably expect.
Streaming UI: useChat vs manual SSE
useChat is the right default. It handles message state, optimistic updates, abort, error retry, and tool-call rendering. Reach for manual SSE only when:
- You need to render a non-chat UI (e.g. a streaming code diff or a generated chart).
- You are streaming to a non-React client (mobile, CLI, server-to-server).
- You want full control of the wire format (e.g. piping into a custom replay log).
For manual mode, use result.textStream (an AsyncIterable<string>) on the server and consume Response body chunks on the client. toDataStreamResponse() is the wire format useChat expects; do not mix the two.
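A minimal sketch of that manual path (the route path and plain-text wire format here are illustrative, not something useChat will consume):

// app/api/stream/route.ts -- manual streaming, bypassing the useChat wire format
import { anthropic } from "@ai-sdk/anthropic";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const result = await streamText({
    model: anthropic("claude-sonnet-4-6"),
    prompt,
  });
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // result.textStream is an AsyncIterable<string> of raw text deltas
      for await (const delta of result.textStream) {
        controller.enqueue(encoder.encode(delta));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}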
Prompt caching with the SDK
This is where 80% of teams hit a wall. The SDK does not auto-add cache_control — you must mark blocks via providerOptions:
const result = await streamText({
  model: anthropic("claude-sonnet-4-6"),
  messages: [
    {
      role: "system",
      content: LARGE_SYSTEM_PROMPT,
      providerOptions: {
        anthropic: { cacheControl: { type: "ephemeral" } },
      },
    },
    ...userMessages,
  ],
});
For multi-turn conversations, mark the last user message of each cached prefix with cacheControl. Anthropic charges 25% extra on cache writes and 10% of input on hits — break-even is roughly 2 reuses within the 5-minute TTL. The full math, plus a TypeScript helper to auto-mark prefixes, lives in the Claude prompt caching guide.
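As a rough sketch of that prefix-marking idea (markCachePrefix is a hypothetical helper, not the one from that guide), attach cacheControl to the last message of the span you want cached:

type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: unknown;
  providerOptions?: Record<string, unknown>;
};

// Hypothetical helper: attach cacheControl to the last message of the prefix,
// making everything up to and including it a cacheable prefix.
function markCachePrefix(messages: ChatMessage[], prefixLength: number): ChatMessage[] {
  return messages.map((m, i) =>
    i === prefixLength - 1
      ? {
          ...m,
          providerOptions: {
            ...m.providerOptions,
            anthropic: { cacheControl: { type: "ephemeral" } },
          },
        }
      : m,
  );
}

// e.g. cache the system prompt plus the first user turn:
// const messages = markCachePrefix(allMessages, 2);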
Cutting Claude costs in production? The Cost Optimization Masterclass walks through caching, model routing (Haiku/Sonnet/Opus), and a real account that dropped from $1,840/month to $310, with the exact providerOptions patterns shown above.
If your hit rate stays under 80%, you are almost certainly paying for cache writes you never read. Log usage.cacheReadInputTokens vs usage.cacheCreationInputTokens on every response and alert when the ratio inverts.
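A hedged sketch of that logging, attached to streamText's onFinish (field locations vary across SDK versions, so the sketch checks both usage and providerMetadata.anthropic):

const result = await streamText({
  model: anthropic("claude-sonnet-4-6"),
  messages,
  onFinish: ({ usage, providerMetadata }) => {
    // Field names follow this post's convention; depending on your SDK version
    // they may live under providerMetadata.anthropic instead of usage.
    const u = usage as any;
    const meta = (providerMetadata?.anthropic ?? {}) as any;
    const reads = u.cacheReadInputTokens ?? meta.cacheReadInputTokens ?? 0;
    const writes = u.cacheCreationInputTokens ?? meta.cacheCreationInputTokens ?? 0;
    console.log({ cacheReads: reads, cacheWrites: writes });
    if (writes > reads) {
      console.warn("cache write/read ratio inverted -- revisit your prefix boundaries");
    }
  },
});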
Edge runtime caveats
The Edge runtime is the right default for streaming Claude responses. Cold start is ~150ms in Vercel's measurements, SSE flushing works, and Response streaming is native. But the 25-second hard execution limit will kill long tool chains. Switch to runtime = "nodejs" when:
- A tool calls a slow external API (DB migration, scraper, render farm).
- You need filesystem access for large attachments.
- You depend on a native module (sharp, canvas, native crypto).
- You need durable connections (websockets, long polling): Vercel functions are not the right primitive; use a worker.
maxDuration = 300 extends Node functions on Pro plans. Hobby plans cap at 60s. If you need 5+ minute jobs, push to a queue and stream the result back via Server-Sent Events from a separate poller.
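The opt-out itself is just the two route exports, for example:

// app/api/chat/route.ts -- opt this one route out of Edge for slow tool chains
export const runtime = "nodejs";
export const maxDuration = 300; // seconds; Pro plan (Hobby caps lower)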
Vision: passing image messages
Image input goes in the same messages array as multipart content:
const result = await streamText({
  model: anthropic("claude-sonnet-4-6"),
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What is in this screenshot?" },
        {
          type: "image",
          image: new URL("https://example.com/screenshot.png"),
          // or: image: Buffer.from(...) for base64
        },
      ],
    },
  ],
});
The SDK normalizes URL fetching, base64 encoding, and MIME detection. Anthropic accepts JPEG, PNG, GIF, WebP up to ~5MB per image; the SDK does not enforce the size cap, so do it yourself before paying for a rejected request.
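A minimal pre-flight check you could run before building the message (the 5 MB constant mirrors the cap quoted above; loadImageOrThrow is an illustrative name):

const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // ~5MB cap noted above

async function loadImageOrThrow(url: string): Promise<Uint8Array> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`image fetch failed: ${res.status}`);
  const bytes = new Uint8Array(await res.arrayBuffer());
  if (bytes.byteLength > MAX_IMAGE_BYTES) {
    throw new Error(`image is ${bytes.byteLength} bytes, over the ~5MB limit`);
  }
  return bytes; // pass as `image: bytes` in the multipart content
}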
Three specific gotchas
1. The anthropic-version header default may be outdated. The pinned version inside @ai-sdk/anthropic lags behind the API. If you need a recent feature (extended thinking, files API, the latest tool-use mode), override it:
import { createAnthropic } from "@ai-sdk/anthropic";

const anthropic = createAnthropic({
  headers: { "anthropic-version": "2026-04-01" },
});
const model = anthropic("claude-sonnet-4-6");
Or per call via the headers option on streamText. Symptom: a feature works in the raw SDK but silently no-ops through the Vercel AI SDK.
2. The SDK normalizes assistant and user roles but NOT system. Anthropic's API takes system as a top-level parameter, not a message. If you put a { role: "system" } entry in messages, behavior depends on SDK version: some swallow it, some convert it, some throw. Always pass system text via the top-level system: field on streamText.
3. Stream finalize race: check finishReason on every completed response. A stream that ends cleanly has finishReason: "stop" or "tool-calls". A stream that ends with "length" was truncated by maxTokens; "content-filter" means a safety block. If you only render text and never inspect finishReason, your UI silently shows half-answers when limits hit. Wire it through useChat's onFinish callback or read result.finishReason on the server.
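On the server, a sketch of that check via onFinish (the logging targets are placeholders):

const result = await streamText({
  model: anthropic("claude-sonnet-4-6"),
  messages,
  maxTokens: 1024,
  onFinish: ({ finishReason, usage }) => {
    if (finishReason === "length") {
      // Truncated by maxTokens -- surface this instead of rendering a half-answer
      console.warn("response truncated at maxTokens", usage);
    } else if (finishReason === "content-filter") {
      console.warn("response blocked by safety filtering");
    }
  },
});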
Production deploy
There is almost no config:
- ANTHROPIC_API_KEY in Vercel project env vars (Production + Preview).
- runtime and maxDuration exported from the route handler.
- Vercel Firewall to rate-limit unauthenticated /api/chat calls (Claude requests cost real money; never expose an unprotected endpoint).
- Cost guardrails: the Claude API cost monitoring guide shows how to attach per-user spend caps via Anthropic's usage API.
That is it. No region pinning, no warmup, no provider-specific build flags.
Vercel AI SDK vs raw @anthropic-ai/sdk
Skip the Vercel SDK when you need:
- Batch API (50% discount on async jobs, processed within 24h) — only in raw SDK.
- Files API for persistent uploads — only in raw SDK.
- Computer use beta — surface lags in the AI SDK.
- Custom retry/backoff logic — the AI SDK's retry is opinionated.
Use the Vercel AI SDK when you want streaming UI primitives, multi-provider portability (swap to GPT-5 with one line), and the useChat ergonomics. For pure backend pipelines or cron jobs, raw SDK is leaner.
Frequently Asked Questions
Does it support prompt caching?
Yes, via providerOptions.anthropic.cacheControl: { type: "ephemeral" } on the message you want cached. The SDK does not auto-detect cacheable prefixes — you mark them. Verify hits by logging usage.cacheReadInputTokens from the onFinish callback.
Edge or Node runtime?
Default to Edge for chat-style apps: faster cold start, native SSE, lower cost on Vercel. Switch to Node when a tool runs longer than 25 seconds, you need a native module, or you call the filesystem. The runtime is per-route — mix freely in the same app.
Can I switch between Claude and OpenAI dynamically?
Yes. Both providers implement the same LanguageModelV1 interface, so streamText({ model: useClaude ? anthropic(...) : openai(...) }) works. Tool definitions are portable; vision message format is identical. The one wart: provider-specific options (cache control, reasoning effort) are siloed under providerOptions.anthropic vs providerOptions.openai.
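A sketch of the switch (the OpenAI model ID is a placeholder; both branches share the same messages and tools):

import { anthropic } from "@ai-sdk/anthropic";
import { openai } from "@ai-sdk/openai";
import { streamText, type CoreMessage } from "ai";

// The OpenAI model ID is illustrative -- route to whatever you actually use.
function chat(useClaude: boolean, messages: CoreMessage[]) {
  return streamText({
    model: useClaude ? anthropic("claude-sonnet-4-6") : openai("gpt-5"),
    messages,
  });
}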
Does useChat work with tool calls?
Yes, with maxSteps set on the server and tool-call rendering wired in your message component. The hook exposes message.toolInvocations — iterate it to render tool inputs/outputs. Without maxSteps > 1 the loop stops after the first tool result and Claude never gets to summarize.
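As a sketch of that rendering (ChatWithTools is an illustrative component, not part of the SDK):

"use client";
import { useChat } from "ai/react";

export default function ChatWithTools() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>
          <p>{m.role}: {m.content}</p>
          {m.toolInvocations?.map((t) => (
            <pre key={t.toolCallId}>
              {t.toolName}({JSON.stringify(t.args)})
              {t.state === "result" ? ` => ${JSON.stringify(t.result)}` : " …"}
            </pre>
          ))}
        </div>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}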
Why is my stream cut off?
Three causes, in order of likelihood: (1) maxTokens hit — check finishReason === "length". (2) Edge runtime 25-second timeout — switch to Node or shorten the work. (3) Client disconnected and the SDK aborted — useChat handles this, but custom fetchers may need an AbortController wired through. Always log finishReason server-side; "silent truncation" is almost always one of these three.
Last verified May 2026 against @ai-sdk/anthropic v1.x and ai v5.x. Model IDs and runtime caps refresh quarterly — check the Vercel AI SDK changelog before pinning to a version.