Claude API + GraphQL Integration Guide (2026)
To integrate the Claude API with a GraphQL server, add an Apollo Server resolver that calls the Anthropic SDK and returns the completion as a query or mutation result. Define a completeText mutation in your schema, install @anthropic-ai/sdk, and call anthropic.messages.create() inside the resolver. For real-time output, expose a textStream subscription backed by an async iterator that yields delta tokens. The full setup takes under 30 minutes and works with any Apollo Server 4 project.
GraphQL Schema for an AI Completion Endpoint
Start by defining the types your API will expose:
```graphql
# schema.graphql
type CompletionResult {
  id: String!
  content: String!
  model: String!
  inputTokens: Int!
  outputTokens: Int!
}

type StreamDelta {
  text: String!
  done: Boolean!
}

type Query {
  ping: String!
}

type Mutation {
  completeText(
    prompt: String!
    model: String
    maxTokens: Int
    systemPrompt: String
  ): CompletionResult!
}

type Subscription {
  textStream(
    prompt: String!
    model: String
    systemPrompt: String
  ): StreamDelta!
}
```
This schema keeps AI concerns isolated. The CompletionResult type mirrors the Anthropic response so clients can log token usage for cost tracking. See Claude API Cost and Prompt Caching Break-Even for how to translate inputTokens and outputTokens into dollar amounts.
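Because the type carries token counts, translating a result into an estimated dollar cost is a pure function. A minimal sketch, assuming illustrative per-million-token prices (the numbers below are placeholders, not Anthropic's current list prices):

```javascript
// Illustrative per-million-token prices — check Anthropic's pricing
// page for current rates; these values are assumptions for the sketch.
const PRICES_PER_MTOK = {
  "claude-sonnet-4-5": { input: 3.0, output: 15.0 },
  "claude-haiku-4-5": { input: 1.0, output: 5.0 },
};

// Translate a CompletionResult's token counts into an estimated USD cost.
function estimateCostUSD({ model, inputTokens, outputTokens }) {
  const p = PRICES_PER_MTOK[model];
  if (!p) throw new Error(`No price table entry for ${model}`);
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}
```

Aggregating this per request in your metrics layer gives per-query cost visibility without any extra API calls.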
Apollo Server Resolver Calling Claude
Install dependencies:
```bash
npm install @apollo/server @anthropic-ai/sdk graphql
```
Wire up the server and resolvers:
```javascript
// server.js
import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "fs";

const typeDefs = readFileSync("./schema.graphql", "utf-8");
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const resolvers = {
  Query: {
    ping: () => "pong",
  },
  Mutation: {
    completeText: async (_, { prompt, model, maxTokens, systemPrompt }) => {
      const response = await anthropic.messages.create({
        model: model ?? "claude-sonnet-4-5",
        max_tokens: maxTokens ?? 1024,
        system: systemPrompt ?? "You are a helpful assistant.",
        messages: [{ role: "user", content: prompt }],
      });
      const block = response.content[0];
      if (block.type !== "text") throw new Error("Unexpected content type");
      return {
        id: response.id,
        content: block.text,
        model: response.model,
        inputTokens: response.usage.input_tokens,
        outputTokens: response.usage.output_tokens,
      };
    },
  },
};

const server = new ApolloServer({ typeDefs, resolvers });
const { url } = await startStandaloneServer(server, { listen: { port: 4000 } });
console.log(`Server ready at ${url}`);
```
Test with a GraphQL mutation:
```graphql
mutation {
  completeText(
    prompt: "Summarize the benefits of GraphQL in 3 bullet points."
    model: "claude-haiku-4-5"
  ) {
    content
    inputTokens
    outputTokens
  }
}
```
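On the wire, this mutation is just an HTTP POST of a JSON body to the server started above. A small sketch of building that body with variables (the URL assumes the standalone server on port 4000; `buildMutationBody` is a hypothetical helper):

```javascript
// Build the POST body for the completeText mutation. Variables keep
// the prompt out of the query string and avoid escaping issues.
function buildMutationBody(prompt, model) {
  return JSON.stringify({
    query: `mutation Complete($prompt: String!, $model: String) {
      completeText(prompt: $prompt, model: $model) {
        content
        inputTokens
        outputTokens
      }
    }`,
    variables: { prompt, model },
  });
}

// Usage (requires the server from server.js to be running):
// const res = await fetch("http://localhost:4000/", {
//   method: "POST",
//   headers: { "content-type": "application/json" },
//   body: buildMutationBody("Summarize GraphQL benefits.", "claude-haiku-4-5"),
// });
```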
Build production-ready Claude integrations
Agent SDK Cookbook ($49) includes 30+ production recipes: streaming pipelines, multi-agent coordination, tool-use chains, error handling, and cost optimization patterns — applicable to any backend including GraphQL.
Streaming Responses Over GraphQL Subscriptions
GraphQL subscriptions let you push token deltas to the client as Claude generates them. This eliminates the long wait for a full response and enables real-time chat UIs.
```javascript
// subscriptions.js — add to your resolvers
import { PubSub } from "graphql-subscriptions";

const pubsub = new PubSub();

const subscriptionResolvers = {
  Subscription: {
    textStream: {
      subscribe: async function* (_, { prompt, model, systemPrompt }) {
        const stream = await anthropic.messages.stream({
          model: model ?? "claude-sonnet-4-5",
          max_tokens: 2048,
          system: systemPrompt ?? "You are a helpful assistant.",
          messages: [{ role: "user", content: prompt }],
        });
        for await (const event of stream) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            yield { textStream: { text: event.delta.text, done: false } };
          }
        }
        yield { textStream: { text: "", done: true } };
      },
    },
  },
};
```
For subscriptions to work you need a WebSocket-capable transport. Use graphql-ws with Apollo Server:
```bash
npm install graphql-ws ws @graphql-tools/schema
```
```javascript
// Update server setup
import { createServer } from "http";
import { WebSocketServer } from "ws";
import { useServer } from "graphql-ws/lib/use/ws";
import { makeExecutableSchema } from "@graphql-tools/schema";

const schema = makeExecutableSchema({
  typeDefs,
  resolvers: { ...resolvers, ...subscriptionResolvers },
});

// Serve HTTP and WebSocket traffic from one server. Replace
// startStandaloneServer with Apollo's expressMiddleware (or similar)
// so queries and mutations share this httpServer with the WS upgrade.
const httpServer = createServer();
const wsServer = new WebSocketServer({ server: httpServer, path: "/graphql" });
useServer({ schema }, wsServer);
httpServer.listen(4000);
```
Client subscription example:
```graphql
subscription {
  textStream(prompt: "Write a haiku about GraphQL.") {
    text
    done
  }
}
```
Error Handling Patterns
Claude API errors fall into three categories: rate limits (429), invalid requests (400), and transient server errors (5xx). Handle each explicitly:
```javascript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function callClaudeWithRetry(params, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await anthropic.messages.create(params);
    } catch (err) {
      lastError = err;
      // Rate limit — exponential backoff
      if (err instanceof Anthropic.RateLimitError) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Rate limited. Retrying in ${delay}ms…`);
        await new Promise((r) => setTimeout(r, delay));
        continue;
      }
      // Invalid request — do not retry, surface to caller
      if (err instanceof Anthropic.BadRequestError) {
        throw new Error(`Invalid request: ${err.message}`);
      }
      // 5xx server errors — retry with linear backoff
      if (err instanceof Anthropic.InternalServerError) {
        await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
        continue;
      }
      throw err; // Unknown error — rethrow immediately
    }
  }
  throw lastError;
}
```
```javascript
import { GraphQLError } from "graphql";

// In your resolver:
completeText: async (_, args) => {
  try {
    const response = await callClaudeWithRetry({
      model: args.model ?? "claude-sonnet-4-5",
      max_tokens: args.maxTokens ?? 1024,
      messages: [{ role: "user", content: args.prompt }],
    });
    // ... map response
  } catch (err) {
    // Apollo converts thrown GraphQLErrors into the errors array
    // of the GraphQL response automatically
    throw new GraphQLError(err.message, {
      extensions: { code: "CLAUDE_API_ERROR" },
    });
  }
},
```
Validation tip: Check `prompt.length` before calling the API. A GraphQL custom scalar such as `NonEmptyString` can enforce this at the schema layer, preventing wasted API calls entirely.
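Before reaching for a custom scalar, a plain guard at the top of the resolver catches the same cases. A minimal sketch, where the character limit and the helper name `validatePrompt` are illustrative assumptions:

```javascript
// Reject empty or oversized prompts before spending any tokens.
// The limit is an illustrative assumption — tune it to your use case.
const MAX_PROMPT_CHARS = 20_000;

function validatePrompt(prompt) {
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    throw new Error("Prompt must be a non-empty string");
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    throw new Error(`Prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  return prompt;
}
```

Call it as the first line of `completeText` so malformed input fails fast with a clear message instead of a 400 from the API.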
REST vs GraphQL for AI APIs
| Dimension | REST | GraphQL |
|---|---|---|
| Response shape | Fixed per endpoint | Client-specified fields |
| Streaming | SSE or chunked HTTP | Subscriptions (WebSocket) |
| Type safety | Manual OpenAPI spec | Schema-first, code-gen ready |
| Token usage transparency | Manual logging | Include inputTokens / outputTokens in response type |
| Multiple models in one call | Multiple endpoints | Single mutation, model argument |
| Client complexity | Simple fetch | Requires GraphQL client (Apollo, urql) |
| Caching (CDN) | Easy with GET | Mutations/subscriptions bypass CDN |
| Over-fetching | Common | Eliminated by design |
| Best fit | Simple integrations, public APIs | Complex apps, multiple consumers |
Practical guidance: If you already have an Apollo Gateway or federated graph, adding a Claude resolver is two files and zero new infrastructure. If you are starting from scratch and only need AI completions, a REST endpoint is simpler to deploy and debug. For vector search + AI generation workflows, see Claude API Semantic Search for patterns that work with either transport.
Cost and Performance
Latency targets:
| Approach | Time to first token | Full response (1 000 tokens) |
|---|---|---|
| REST (non-streaming) | N/A | 3–8 s |
| GraphQL mutation (non-streaming) | N/A | 3–8 s |
| GraphQL subscription (streaming) | 200–400 ms | continuous deltas |
Streaming subscriptions do not reduce total token cost — they improve perceived performance. For cost reduction:
- Route by complexity — use `claude-haiku-4-5` for simple completions (~10x cheaper than Sonnet). See Claude Haiku vs Sonnet vs Opus: Which Model for a decision matrix.
- Cache system prompts — add `cache_control: { type: "ephemeral" }` to the system message block. Repeated calls with the same system prompt hit the cache and cut input token costs by ~90%.
- Limit `max_tokens` — set the lowest value that still satisfies the use case. A summary endpoint rarely needs more than 512 tokens.
- Log token usage — the `CompletionResult` type already exposes `inputTokens` and `outputTokens`. Aggregate these in your metrics layer to detect runaway prompts.
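Routing by complexity can be as simple as a heuristic in front of the resolver. A sketch, where the length threshold and keyword list are illustrative assumptions rather than tuned values:

```javascript
// Naive complexity router: explicit client choice wins, otherwise
// long or analysis-heavy prompts go to Sonnet, everything else to Haiku.
// Threshold and keywords are assumptions — measure before relying on them.
function pickModel(prompt, requestedModel) {
  if (requestedModel) return requestedModel;
  const looksComplex =
    prompt.length > 2000 || /analyze|refactor|multi-step|reason/i.test(prompt);
  return looksComplex ? "claude-sonnet-4-5" : "claude-haiku-4-5";
}
```

In the resolver, replace `model ?? "claude-sonnet-4-5"` with `pickModel(prompt, model)` to get the cheap-by-default behavior described above.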
Benchmark data point: A team routing 80% of their GraphQL AI queries to Haiku and 20% to Sonnet reduced their monthly Claude bill by 68% with no user-visible quality change for their summarization use case.
Frequently Asked Questions
Can I use Claude API with GraphQL without Apollo Server?
Yes. The Anthropic SDK is transport-agnostic — any GraphQL server that supports custom resolvers works. Alternatives include Mercurius (Fastify), GraphQL Yoga (framework-agnostic), and schema builders like Pothos on top of any HTTP framework. The schema definition and resolver logic shown above are identical regardless of which GraphQL runtime you use.
How do I handle long-running Claude requests in GraphQL mutations?
Mutations are synchronous by default — the client waits for the response. For prompts that generate more than ~2 000 tokens, switch to a subscription-based streaming pattern so the client receives deltas incrementally. Alternatively, return a job ID from the mutation and poll a separate completionJob(id: ID!) query until the result is ready.
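The job-ID pattern from that answer can be sketched with an in-memory store. The helper names (`startCompletionJob`, `completionJob`) are hypothetical, and a production version would use a durable store such as Redis so jobs survive restarts:

```javascript
// In-memory job store — illustrative only; swap the Map for Redis
// or a database in production.
const jobs = new Map();
let nextJobId = 1;

// Mutation resolver body: kick off the work, return an ID immediately.
function startCompletionJob(runCompletion) {
  const id = String(nextJobId++);
  jobs.set(id, { status: "PENDING", result: null });
  runCompletion()
    .then((result) => jobs.set(id, { status: "DONE", result }))
    .catch((err) => jobs.set(id, { status: "FAILED", result: err.message }));
  return id;
}

// Query resolver body: the completionJob(id: ID!) query polls this.
function completionJob(id) {
  return jobs.get(id) ?? { status: "NOT_FOUND", result: null };
}
```

`runCompletion` would wrap the `anthropic.messages.create()` call; the client polls `completionJob` until `status` is `DONE` or `FAILED`.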
Is prompt caching compatible with GraphQL resolvers?
Yes. Prompt caching is a server-side API feature — it does not affect the GraphQL schema or resolver signature at all. Add cache_control: { type: "ephemeral" } to your system message block in the Anthropic SDK call. The cache lives on Anthropic's infrastructure for five minutes. Repeated resolver calls with the same system prompt will hit the cache and reduce input token costs significantly.
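Concretely, this means passing `system` as an array of text blocks rather than a plain string. A sketch of the request parameters, using the same model default as the earlier examples (`buildCachedParams` is a hypothetical helper):

```javascript
// Build messages.create() params with an ephemeral cache breakpoint
// on the system prompt. Caching only pays off when the system prompt
// exceeds the model's minimum cacheable length.
function buildCachedParams(systemPrompt, userPrompt) {
  return {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: systemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userPrompt }],
  };
}
```

The GraphQL layer is unchanged: the resolver simply passes `buildCachedParams(...)` to `anthropic.messages.create()` instead of the inline object.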
How do I secure a public-facing GraphQL AI endpoint?
Three layers are recommended: (1) Apollo's built-in depth/complexity limits prevent runaway nested queries, (2) rate-limit the resolver by user or IP using a middleware like graphql-rate-limit, and (3) validate the prompt argument length with a custom scalar before it reaches the Anthropic SDK. Never expose your ANTHROPIC_API_KEY to the client — all Anthropic calls must happen server-side.
Can I use DataLoader to batch Claude API calls in GraphQL?
DataLoader batches multiple resolver calls within the same tick into a single network request. The Anthropic Messages API does not accept batched prompts in one call (unlike the Batch API), so DataLoader does not provide the usual N+1 benefit here. For bulk processing, use the Anthropic Batch API directly and return results asynchronously. For the Claude Code complete workflow, see Claude Code Complete Guide.