Claude API + GraphQL Integration Guide (2026)
To integrate the Claude API with a GraphQL server, add an Apollo Server resolver that calls the Anthropic SDK and returns the completion as a query or mutation result. Define a completeText mutation in your schema, install @anthropic-ai/sdk, and call anthropic.messages.create() inside the resolver. For real-time output, expose a textStream subscription backed by an async iterator that yields delta tokens. The full setup takes under 30 minutes and works with any Apollo Server 4 project.
GraphQL Schema for an AI Completion Endpoint
Start by defining the types your API will expose:
```graphql
# schema.graphql
type CompletionResult {
  id: String!
  content: String!
  model: String!
  inputTokens: Int!
  outputTokens: Int!
}

type StreamDelta {
  text: String!
  done: Boolean!
}

type Query {
  ping: String!
}

type Mutation {
  completeText(
    prompt: String!
    model: String
    maxTokens: Int
    systemPrompt: String
  ): CompletionResult!
}

type Subscription {
  textStream(
    prompt: String!
    model: String
    systemPrompt: String
  ): StreamDelta!
}
```
This schema keeps AI concerns isolated. The CompletionResult type mirrors the Anthropic response so clients can log token usage for cost tracking. See Claude API Cost and Prompt Caching Break-Even for how to translate inputTokens and outputTokens into dollar amounts.
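Because the type carries token counts, translating a result into an estimated dollar cost is a pure function. A minimal sketch, assuming illustrative per-million-token prices (the numbers below are placeholders, not Anthropic's current list prices):

```javascript
// Illustrative per-million-token prices — check Anthropic's pricing
// page for current rates; these values are assumptions for the sketch.
const PRICES_PER_MTOK = {
  "claude-sonnet-4-5": { input: 3.0, output: 15.0 },
  "claude-haiku-4-5": { input: 1.0, output: 5.0 },
};

// Translate a CompletionResult's token counts into an estimated USD cost.
function estimateCostUSD({ model, inputTokens, outputTokens }) {
  const p = PRICES_PER_MTOK[model];
  if (!p) throw new Error(`No price table entry for ${model}`);
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}
```

Aggregating this per request in your metrics layer gives per-query cost visibility without any extra API calls.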
Apollo Server Resolver Calling Claude
Install dependencies:
```bash
npm install @apollo/server @anthropic-ai/sdk graphql
```
Wire up the server and resolvers:
```javascript
// server.js
import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "fs";

const typeDefs = readFileSync("./schema.graphql", "utf-8");
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const resolvers = {
  Query: {
    ping: () => "pong",
  },
  Mutation: {
    completeText: async (_, { prompt, model, maxTokens, systemPrompt }) => {
      const response = await anthropic.messages.create({
        model: model ?? "claude-sonnet-4-5",
        max_tokens: maxTokens ?? 1024,
        system: systemPrompt ?? "You are a helpful assistant.",
        messages: [{ role: "user", content: prompt }],
      });
      const block = response.content[0];
      if (block.type !== "text") throw new Error("Unexpected content type");
      return {
        id: response.id,
        content: block.text,
        model: response.model,
        inputTokens: response.usage.input_tokens,
        outputTokens: response.usage.output_tokens,
      };
    },
  },
};

const server = new ApolloServer({ typeDefs, resolvers });
const { url } = await startStandaloneServer(server, { listen: { port: 4000 } });
console.log(`Server ready at ${url}`);
```
Test with a GraphQL mutation:
```graphql
mutation {
  completeText(
    prompt: "Summarize the benefits of GraphQL in 3 bullet points."
    model: "claude-haiku-4-5"
  ) {
    content
    inputTokens
    outputTokens
  }
}
```
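On the wire, this mutation is just an HTTP POST of a JSON body to the server started above. A small sketch of building that body with variables (the URL assumes the standalone server on port 4000; `buildMutationBody` is a hypothetical helper):

```javascript
// Build the POST body for the completeText mutation. Variables keep
// the prompt out of the query string and avoid escaping issues.
function buildMutationBody(prompt, model) {
  return JSON.stringify({
    query: `mutation Complete($prompt: String!, $model: String) {
      completeText(prompt: $prompt, model: $model) {
        content
        inputTokens
        outputTokens
      }
    }`,
    variables: { prompt, model },
  });
}

// Usage (requires the server from server.js to be running):
// const res = await fetch("http://localhost:4000/", {
//   method: "POST",
//   headers: { "content-type": "application/json" },
//   body: buildMutationBody("Summarize GraphQL benefits.", "claude-haiku-4-5"),
// });
```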
Build production-ready Claude integrations
Agent SDK Cookbook ($49) includes 30+ production recipes: streaming pipelines, multi-agent coordination, tool-use chains, error handling, and cost optimization patterns — applicable to any backend including GraphQL.
Streaming Responses Over GraphQL Subscriptions
GraphQL subscriptions let you push token deltas to the client as Claude generates them. This eliminates the long wait for a full response and enables real-time chat UIs.
```javascript
// subscriptions.js — add to your resolvers
import { PubSub } from "graphql-subscriptions";

const pubsub = new PubSub();

const subscriptionResolvers = {
  Subscription: {
    textStream: {
      subscribe: async function* (_, { prompt, model, systemPrompt }) {
        const stream = await anthropic.messages.stream({
          model: model ?? "claude-sonnet-4-5",
          max_tokens: 2048,
          system: systemPrompt ?? "You are a helpful assistant.",
          messages: [{ role: "user", content: prompt }],
        });
        for await (const event of stream) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            yield { textStream: { text: event.delta.text, done: false } };
          }
        }
        yield { textStream: { text: "", done: true } };
      },
    },
  },
};
```
For subscriptions to work you need a WebSocket-capable transport. Use graphql-ws with Apollo Server:
```bash
npm install graphql-ws ws @graphql-tools/schema
```
```javascript
// Update server setup
import { createServer } from "http";
import { WebSocketServer } from "ws";
import { useServer } from "graphql-ws/lib/use/ws";
import { makeExecutableSchema } from "@graphql-tools/schema";

const schema = makeExecutableSchema({
  typeDefs,
  resolvers: { ...resolvers, ...subscriptionResolvers },
});

// Serve HTTP and WebSocket traffic from one server. Replace
// startStandaloneServer with Apollo's expressMiddleware (or similar)
// so queries and mutations share this httpServer with the WS upgrade.
const httpServer = createServer();
const wsServer = new WebSocketServer({ server: httpServer, path: "/graphql" });
useServer({ schema }, wsServer);
httpServer.listen(4000);
```
Client subscription example:
```graphql
subscription {
  textStream(prompt: "Write a haiku about GraphQL.") {
    text
    done
  }
}
```
Error Handling Patterns
Claude API errors fall into three categories: rate limits (429), invalid requests (400), and transient server errors (5xx). Handle each explicitly:
```javascript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function callClaudeWithRetry(params, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await anthropic.messages.create(params);
    } catch (err) {
      lastError = err;
      // Rate limit — exponential backoff
      if (err instanceof Anthropic.RateLimitError) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Rate limited. Retrying in ${delay}ms…`);
        await new Promise((r) => setTimeout(r, delay));
        continue;
      }
      // Invalid request — do not retry, surface to caller
      if (err instanceof Anthropic.BadRequestError) {
        throw new Error(`Invalid request: ${err.message}`);
      }
      // 5xx server errors — retry with linear backoff
      if (err instanceof Anthropic.InternalServerError) {
        await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
        continue;
      }
      throw err; // Unknown error — rethrow immediately
    }
  }
  throw lastError;
}
```
```javascript
import { GraphQLError } from "graphql";

// In your resolver:
completeText: async (_, args) => {
  try {
    const response = await callClaudeWithRetry({
      model: args.model ?? "claude-sonnet-4-5",
      max_tokens: args.maxTokens ?? 1024,
      messages: [{ role: "user", content: args.prompt }],
    });
    // ... map response
  } catch (err) {
    // Apollo converts thrown GraphQLErrors into the errors array
    // of the GraphQL response automatically
    throw new GraphQLError(err.message, {
      extensions: { code: "CLAUDE_API_ERROR" },
    });
  }
},
```
Validation tip: Check `prompt.length` before calling the API. A GraphQL custom scalar such as `NonEmptyString` can enforce this at the schema layer, preventing wasted API calls entirely.
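Before reaching for a custom scalar, a plain guard at the top of the resolver catches the same cases. A minimal sketch, where the character limit and the helper name `validatePrompt` are illustrative assumptions:

```javascript
// Reject empty or oversized prompts before spending any tokens.
// The limit is an illustrative assumption — tune it to your use case.
const MAX_PROMPT_CHARS = 20_000;

function validatePrompt(prompt) {
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    throw new Error("Prompt must be a non-empty string");
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    throw new Error(`Prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  return prompt;
}
```

Call it as the first line of `completeText` so malformed input fails fast with a clear message instead of a 400 from the API.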
REST vs GraphQL for AI APIs
| Dimension | REST | GraphQL |
|---|---|---|
| Response shape | Fixed per endpoint | Client-specified fields |
| Streaming | SSE or chunked HTTP | Subscriptions (WebSocket) |
| Type safety | Manual OpenAPI spec | Schema-first, code-gen ready |
| Token usage transparency | Manual logging | Include inputTokens / outputTokens in response type |
| Multiple models in one call | Multiple endpoints | Single mutation, model argument |
| Client complexity | Simple fetch | Requires GraphQL client (Apollo, urql) |
| Caching (CDN) | Easy with GET | Mutations/subscriptions bypass CDN |
| Over-fetching | Common | Eliminated by design |
| Best fit | Simple integrations, public APIs | Complex apps, multiple consumers |
Practical guidance: If you already have an Apollo Gateway or federated graph, adding a Claude resolver is two files and zero new infrastructure. If you are starting from scratch and only need AI completions, a REST endpoint is simpler to deploy and debug. For vector search + AI generation workflows, see Claude API Semantic Search for patterns that work with either transport.
Cost and Performance
Latency targets:
| Approach | Time to first token | Full response (1 000 tokens) |
|---|---|---|
| REST (non-streaming) | N/A | 3–8 s |
| GraphQL mutation (non-streaming) | N/A | 3–8 s |
| GraphQL subscription (streaming) | 200–400 ms | continuous deltas |
Streaming subscriptions do not reduce total token cost — they improve perceived performance. For cost reduction:
- Route by complexity — use `claude-haiku-4-5` for simple completions (~10x cheaper than Sonnet). See Claude Haiku vs Sonnet vs Opus: Which Model for a decision matrix.
- Cache system prompts — add `cache_control: { type: "ephemeral" }` to the system message block. Repeated calls with the same system prompt hit the cache and cut input token costs by ~90%.
- Limit `max_tokens` — set the lowest value that still satisfies the use case. A summary endpoint rarely needs more than 512 tokens.
- Log token usage — the `CompletionResult` type already exposes `inputTokens` and `outputTokens`. Aggregate these in your metrics layer to detect runaway prompts.
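Routing by complexity can be as simple as a heuristic in front of the resolver. A sketch, where the length threshold and keyword list are illustrative assumptions rather than tuned values:

```javascript
// Naive complexity router: explicit client choice wins, otherwise
// long or analysis-heavy prompts go to Sonnet, everything else to Haiku.
// Threshold and keywords are assumptions — measure before relying on them.
function pickModel(prompt, requestedModel) {
  if (requestedModel) return requestedModel;
  const looksComplex =
    prompt.length > 2000 || /analyze|refactor|multi-step|reason/i.test(prompt);
  return looksComplex ? "claude-sonnet-4-5" : "claude-haiku-4-5";
}
```

In the resolver, replace `model ?? "claude-sonnet-4-5"` with `pickModel(prompt, model)` to get the cheap-by-default behavior described above.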
Benchmark data point: A team routing 80% of their GraphQL AI queries to Haiku and 20% to Sonnet reduced their monthly Claude bill by 68% with no user-visible quality change for their summarization use case.
Frequently Asked Questions
Can I use Claude API with GraphQL without Apollo Server?
Yes. The Anthropic SDK is transport-agnostic — any GraphQL server that supports custom resolvers works. Alternatives include Mercurius (Fastify), GraphQL Yoga (framework-agnostic), and schema builders like Pothos on top of any HTTP framework. The schema definition and resolver logic shown above are identical regardless of which GraphQL runtime you use.
How do I handle long-running Claude requests in GraphQL mutations?
Mutations are synchronous by default — the client waits for the response. For prompts that generate more than ~2 000 tokens, switch to a subscription-based streaming pattern so the client receives deltas incrementally. Alternatively, return a job ID from the mutation and poll a separate completionJob(id: ID!) query until the result is ready.
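The job-ID pattern from that answer can be sketched with an in-memory store. The helper names (`startCompletionJob`, `completionJob`) are hypothetical, and a production version would use a durable store such as Redis so jobs survive restarts:

```javascript
// In-memory job store — illustrative only; swap the Map for Redis
// or a database in production.
const jobs = new Map();
let nextJobId = 1;

// Mutation resolver body: kick off the work, return an ID immediately.
function startCompletionJob(runCompletion) {
  const id = String(nextJobId++);
  jobs.set(id, { status: "PENDING", result: null });
  runCompletion()
    .then((result) => jobs.set(id, { status: "DONE", result }))
    .catch((err) => jobs.set(id, { status: "FAILED", result: err.message }));
  return id;
}

// Query resolver body: the completionJob(id: ID!) query polls this.
function completionJob(id) {
  return jobs.get(id) ?? { status: "NOT_FOUND", result: null };
}
```

`runCompletion` would wrap the `anthropic.messages.create()` call; the client polls `completionJob` until `status` is `DONE` or `FAILED`.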
Is prompt caching compatible with GraphQL resolvers?
Yes. Prompt caching is a server-side API feature — it does not affect the GraphQL schema or resolver signature at all. Add cache_control: { type: "ephemeral" } to your system message block in the Anthropic SDK call. The cache lives on Anthropic's infrastructure for five minutes. Repeated resolver calls with the same system prompt will hit the cache and reduce input token costs significantly.
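Concretely, this means passing `system` as an array of text blocks rather than a plain string. A sketch of the request parameters, using the same model default as the earlier examples (`buildCachedParams` is a hypothetical helper):

```javascript
// Build messages.create() params with an ephemeral cache breakpoint
// on the system prompt. Caching only pays off when the system prompt
// exceeds the model's minimum cacheable length.
function buildCachedParams(systemPrompt, userPrompt) {
  return {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: systemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userPrompt }],
  };
}
```

The GraphQL layer is unchanged: the resolver simply passes `buildCachedParams(...)` to `anthropic.messages.create()` instead of the inline object.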
How do I secure a public-facing GraphQL AI endpoint?
Three layers are recommended: (1) Apollo's built-in depth/complexity limits prevent runaway nested queries, (2) rate-limit the resolver by user or IP using a middleware like graphql-rate-limit, and (3) validate the prompt argument length with a custom scalar before it reaches the Anthropic SDK. Never expose your ANTHROPIC_API_KEY to the client — all Anthropic calls must happen server-side.
Can I use DataLoader to batch Claude API calls in GraphQL?
DataLoader batches multiple resolver calls within the same tick into a single network request. The Anthropic Messages API does not accept batched prompts in one call (unlike the Batch API), so DataLoader does not provide the usual N+1 benefit here. For bulk processing, use the Anthropic Batch API directly and return results asynchronously. For the Claude Code complete workflow, see Claude Code Complete Guide.