Claude API Ruby on Rails: Complete Guide (2026)

How to use the Claude API with Ruby on Rails — install the anthropic gem, send messages, stream with SSE, run background jobs, and handle errors. Working code examples throughout.

You can call the Claude API from Ruby on Rails using the official anthropic gem — add it to your Gemfile, set ANTHROPIC_API_KEY, and get a response in under 10 lines of code. The gem supports message creation, streaming via Server-Sent Events, tool use, and prompt caching. This guide covers gem installation, service objects, ActionController::Live streaming, ActiveJob background processing, Rails API controller patterns, error handling with retries, and rate limit strategies.


Gem Installation

Add to your Gemfile:

# Gemfile
gem "anthropic"

# Optional: for HTTP-level retry and timeout control
gem "faraday-retry"

Then install:

bundle install

Set your API key as an environment variable. Never hardcode it:

# .env (use dotenv-rails in development)
ANTHROPIC_API_KEY=sk-ant-...

For production on Heroku, Fly.io, or Render, set the env var in the platform dashboard. Alternatively, store the key in Rails encrypted credentials:

rails credentials:edit
# Add: anthropic_api_key: sk-ant-...

Reference it in your initializer:

# config/initializers/anthropic.rb
Anthropic.configure do |config|
  config.access_token = ENV.fetch("ANTHROPIC_API_KEY") do
    Rails.application.credentials.anthropic_api_key
  end
end
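The `ENV.fetch` with a block pattern used in the initializer only evaluates the fallback when the variable is missing, which is why the credentials lookup costs nothing when the env var is set. A quick standalone illustration (the `DEMO_*` keys are made up for the example):

```ruby
# ENV.fetch with a block: the block runs only when the key is absent.
ENV["DEMO_PRESENT"] = "from-env"
ENV.delete("DEMO_MISSING")

a = ENV.fetch("DEMO_PRESENT") { "from-credentials" }  # => "from-env"
b = ENV.fetch("DEMO_MISSING") { "from-credentials" }  # => "from-credentials"
```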

Basic Message Creation

require "anthropic"

client = Anthropic::Client.new

response = client.messages(
  parameters: {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Explain Rails ActiveRecord callbacks in one paragraph." }
    ]
  }
)

puts response.dig("content", 0, "text")
puts "Tokens used: #{response.dig("usage", "input_tokens")} in / #{response.dig("usage", "output_tokens")} out"

With the initializer above in place, Anthropic::Client.new picks up the API key automatically — no per-call configuration needed.
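The response `content` array can hold more than one block (tool-use responses do, for example), so digging out only index 0 can drop text. A small helper, written against the plain-Hash response shape used in this guide, joins every text block defensively:

```ruby
# Collect the text from every text block in a Messages API response hash.
# Assumes the Hash shape shown above; returns "" for empty or missing content.
def extract_text(response)
  Array(response["content"])
    .select { |block| block["type"] == "text" }
    .map { |block| block["text"] }
    .join
end

# Works against a fixture hash, no network call needed:
fixture = {
  "content" => [
    { "type" => "text", "text" => "Hello " },
    { "type" => "text", "text" => "world" }
  ]
}
extract_text(fixture)  # => "Hello world"
```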


Service Object Pattern

Encapsulate API calls in a service object to keep controllers thin:

# app/services/claude_service.rb
class ClaudeService
  MODEL = "claude-sonnet-4-5"

  def initialize
    @client = Anthropic::Client.new
  end

  def chat(user_message, system_prompt: nil)
    params = {
      model: MODEL,
      max_tokens: 1024,
      messages: [{ role: "user", content: user_message }]
    }
    params[:system] = system_prompt if system_prompt

    response = @client.messages(parameters: params)
    response.dig("content", 0, "text")
  rescue Anthropic::Error => e
    Rails.logger.error("[ClaudeService] API error: #{e.message}")
    raise
  end
end

Use it in your controller:

# app/controllers/api/v1/chat_controller.rb
class Api::V1::ChatController < ApplicationController
  def create
    service = ClaudeService.new
    reply = service.chat(
      params.require(:message),
      system_prompt: "You are a helpful Rails assistant."
    )
    render json: { reply: reply }
  rescue Anthropic::Error => e
    render json: { error: e.message }, status: :unprocessable_entity
  end
end

Build production Rails integrations with Claude

Agent SDK Cookbook ($49) includes 30+ production recipes: streaming pipelines, multi-agent coordination, background job patterns, error handling, and cost optimization — with Ruby-compatible patterns throughout.

Get Agent SDK Cookbook — $49


Streaming with ActionController::Live

Use Rails ActionController::Live and Server-Sent Events to stream tokens to the browser in real time:

# app/controllers/api/v1/stream_controller.rb
class Api::V1::StreamController < ApplicationController
  include ActionController::Live

  def create
    response.headers["Content-Type"] = "text/event-stream"
    response.headers["Cache-Control"] = "no-cache"
    response.headers["X-Accel-Buffering"] = "no"

    client = Anthropic::Client.new

    client.messages(
      parameters: {
        model: "claude-sonnet-4-5",
        max_tokens: 2048,
        stream: true,
        messages: [{ role: "user", content: params[:message] }]
      }
    ) do |chunk, _bytesize|
      if chunk["type"] == "content_block_delta"
        text = chunk.dig("delta", "text").to_s
        response.stream.write("data: #{text.to_json}\n\n") unless text.empty?
      end

      if chunk["type"] == "message_stop"
        response.stream.write("data: [DONE]\n\n")
      end
    end
  rescue ActionController::Live::ClientDisconnected
    Rails.logger.info("[StreamController] Client disconnected")
  ensure
    response.stream.close
  end
end

Add the route:

# config/routes.rb
namespace :api do
  namespace :v1 do
    post "stream", to: "stream#create"
    post "chat", to: "chat#create"
  end
end

On the frontend, consume the stream with the EventSource API or fetch with a ReadableStream.

Performance note: First token arrives in 300–500ms with streaming. Without streaming, a 2,000-token response waits 5–12 seconds for the full payload. For any user-facing feature, always stream.
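The `data:` lines written in the controller above follow the SSE wire format: an optional `event:` line, a `data:` line, and a blank-line terminator. A tiny helper (the name `sse_frame` is ours, not part of Rails or the gem) keeps the framing consistent and JSON-encodes the payload so newlines in model output stay on one line:

```ruby
require "json"

# Format one Server-Sent Events frame. `event:` is optional; the payload
# is JSON-encoded so embedded newlines don't break the frame boundary.
def sse_frame(data, event: nil)
  frame = +""
  frame << "event: #{event}\n" if event
  frame << "data: #{data.to_json}\n\n"
  frame
end

sse_frame("Hello\nworld")
# => "data: \"Hello\\nworld\"\n\n"
```

In the controller, `response.stream.write(sse_frame(text))` replaces the hand-built string.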


ActiveJob Background Processing

For long-running or batch tasks, offload Claude calls to a background job:

# app/jobs/claude_analysis_job.rb
class ClaudeAnalysisJob < ApplicationJob
  queue_as :default

  retry_on Anthropic::RateLimitError, wait: :polynomially_longer, attempts: 5
  retry_on Anthropic::ServerError, wait: 10.seconds, attempts: 3
  discard_on Anthropic::AuthenticationError

  def perform(document_id)
    document = Document.find(document_id)

    client = Anthropic::Client.new
    response = client.messages(
      parameters: {
        model: "claude-haiku-4-5",   # Use Haiku for batch tasks (10x cheaper)
        max_tokens: 512,
        messages: [
          {
            role: "user",
            content: "Summarize this document in 3 bullet points:\n\n#{document.body}"
          }
        ]
      }
    )

    summary = response.dig("content", 0, "text")
    document.update!(summary: summary, summarized_at: Time.current)
  end
end

Enqueue it from a controller or callback:

ClaudeAnalysisJob.perform_later(document.id)

Use Sidekiq or Solid Queue as the backend. For cost optimization, route simple summarization tasks to claude-haiku-4-5 and reserve Sonnet for complex reasoning. See Claude Haiku vs Sonnet vs Opus: Which Model for detailed benchmarks.
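The routing rule in the paragraph above can be made explicit as a small lookup. The task symbols here are illustrative, not part of any API; the point is to centralize the model choice so jobs don't hardcode model strings:

```ruby
# Route each task type to the cheapest capable model.
# Unknown tasks fall back to the stronger model to be safe.
MODEL_FOR_TASK = {
  summarize:     "claude-haiku-4-5",
  classify:      "claude-haiku-4-5",
  extract:       "claude-haiku-4-5",
  reason:        "claude-sonnet-4-5",
  generate_code: "claude-sonnet-4-5"
}.freeze

def model_for(task)
  MODEL_FOR_TASK.fetch(task, "claude-sonnet-4-5")
end

model_for(:summarize)  # => "claude-haiku-4-5"
```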


Error Handling with Retries

The anthropic gem raises typed errors you can rescue explicitly:

# app/services/claude_service.rb (robust version)
class ClaudeService
  MAX_RETRIES = 3
  BASE_DELAY  = 1.0  # seconds

  def chat_with_retry(message)
    attempts = 0
    begin
      attempts += 1
      call_api(message)
    rescue Anthropic::RateLimitError => e
      retry_after = e.response&.headers&.[]("retry-after")&.to_i || (BASE_DELAY * (2 ** attempts))
      raise if attempts >= MAX_RETRIES
      Rails.logger.warn("[ClaudeService] Rate limited. Retrying in #{retry_after}s (attempt #{attempts})")
      sleep(retry_after)
      retry
    rescue Anthropic::ServerError
      raise if attempts >= MAX_RETRIES
      sleep(BASE_DELAY * (2 ** attempts))
      retry
    rescue Anthropic::AuthenticationError => e
      Rails.logger.error("[ClaudeService] Invalid API key: #{e.message}")
      raise
    rescue Anthropic::BadRequestError => e
      Rails.logger.error("[ClaudeService] Bad request: #{e.message}")
      raise
    end
  end

  private

  def call_api(message)
    @client ||= Anthropic::Client.new
    response = @client.messages(
      parameters: {
        model: "claude-sonnet-4-5",
        max_tokens: 1024,
        messages: [{ role: "user", content: message }]
      }
    )
    response.dig("content", 0, "text")
  end
end
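One refinement worth adding to `chat_with_retry`: cap the exponential delay and add jitter, so that many workers rate-limited at the same moment don't all retry in lockstep. A minimal sketch using "full jitter" (sleep a uniform random amount up to the capped exponential delay):

```ruby
# Exponential backoff with a cap and full jitter.
# `attempt` is 1-based, matching the retry loop above.
def backoff_delay(attempt, base: 1.0, cap: 30.0)
  exp = [base * (2**attempt), cap].min
  rand * exp  # uniform in [0, exp)
end

backoff_delay(3)   # somewhere in [0, 8.0)
backoff_delay(10)  # somewhere in [0, 30.0) -- capped
```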

For the full list of error codes and what each one means, see Claude API Error Codes Reference.


Rate Limit Handling

Anthropic's default rate limits vary by tier. Handle them at the infrastructure level:

# config/initializers/anthropic_rate_limiter.rb

# Simple in-process sliding-window throttle. For multi-process
# deployments (several Puma workers, Sidekiq), coordinate through
# Redis (e.g. a token bucket) instead.
module AnthropicRateLimiter
  REQUESTS_PER_MINUTE = 50  # Adjust to your tier
  WINDOW = 60.0             # seconds

  @timestamps = []
  @mutex = Mutex.new

  def self.with_limit
    @mutex.synchronize do
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      @timestamps.reject! { |t| t <= now - WINDOW }
      if @timestamps.size >= REQUESTS_PER_MINUTE
        # Sleep until the oldest request falls out of the window.
        sleep(@timestamps.first + WINDOW - now)
        @timestamps.shift
      end
      @timestamps << Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end
    yield
  end
end

For header-based backoff, inspect the response headers on 429:

rescue Anthropic::RateLimitError => e
  # Headers available in e.response.headers:
  # x-ratelimit-limit-requests, x-ratelimit-remaining-requests
  # x-ratelimit-reset-requests, retry-after
  wait = e.response&.headers&.[]("retry-after")&.to_f || 60
  sleep(wait)
  retry
end
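The headers listed in the comment above can be pulled into a small value object before deciding how long to wait. The struct and method names here are ours, not part of the gem; the header names are the ones Anthropic returns on 429 responses:

```ruby
RateLimitInfo = Struct.new(:limit, :remaining, :retry_after, keyword_init: true)

# Parse rate-limit headers (a plain Hash of header name => value)
# into typed values; missing headers come back as nil.
def parse_rate_limit(headers)
  RateLimitInfo.new(
    limit:       headers["x-ratelimit-limit-requests"]&.to_i,
    remaining:   headers["x-ratelimit-remaining-requests"]&.to_i,
    retry_after: headers["retry-after"]&.to_f
  )
end

info = parse_rate_limit("retry-after" => "12", "x-ratelimit-remaining-requests" => "0")
info.retry_after  # => 12.0
```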

For cost details and when prompt caching pays off at scale, see Claude API Cost and Prompt Caching Break-Even.


30+ production recipes for Claude API integrations

Agent SDK Cookbook ($49) covers streaming pipelines, multi-agent coordination, tool use chains, error recovery, and cost optimization — patterns that map directly to Ruby on Rails projects.

Get Agent SDK Cookbook — $49


Frequently Asked Questions

Is there an official Anthropic Ruby gem?

Yes. Anthropic publishes the anthropic gem officially on RubyGems. Add gem "anthropic" to your Gemfile and run bundle install. It supports messages, streaming, tool use, and prompt caching. You can also call the REST API directly with Faraday or Net::HTTP if you prefer zero dependencies.

How do I stream Claude responses in Rails without blocking the main thread?

Use ActionController::Live with include ActionController::Live in your controller and write chunks to response.stream. Puma handles the connection in a separate thread. Ensure your web server is configured for streaming (disable response buffering with X-Accel-Buffering: no for Nginx). Close the stream in an ensure block to prevent connection leaks.

Which model should I use for Rails background jobs?

For high-volume batch tasks (summarization, classification, extraction), use claude-haiku-4-5 — it is roughly 10x cheaper than Sonnet with comparable quality for straightforward tasks. Reserve claude-sonnet-4-5 for complex reasoning or code generation. See Claude Haiku vs Sonnet vs Opus: Which Model for a decision framework.

How do I handle token limits in Rails?

Track response.dig("usage", "input_tokens") and output_tokens after each call and log them. For large inputs, truncate or chunk the content before sending. Claude's context window is 200k tokens for Sonnet and Haiku — well above most Rails use cases, but worth monitoring in document-processing pipelines.
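For gating large inputs before they hit the API, a rough character-based heuristic (about 4 characters per token for English prose; an approximation, not Claude's real tokenizer) is often enough. A sketch, with the helper names invented for this example:

```ruby
CHARS_PER_TOKEN = 4.0  # rough heuristic for English text, not exact

def approx_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

# Split oversized input into chunks that each stay under a token budget.
def chunk_for_budget(text, max_tokens:)
  max_chars = (max_tokens * CHARS_PER_TOKEN).to_i
  text.scan(/.{1,#{max_chars}}/m)
end

approx_tokens("a" * 8000)                             # => 2000
chunk_for_budget("a" * 8000, max_tokens: 1000).size   # => 2
```

For real pipelines, chunk on paragraph or sentence boundaries rather than raw character counts so context isn't split mid-thought.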

Can I use Claude with Rails Action Cable?

Yes. Call the Anthropic API in a Channel action and broadcast chunks as they arrive using ActionCable::Server::Broadcasting. For streaming, run the API call in a background thread or job to avoid blocking the Action Cable connection. Each streamed chunk calls ActionCable.server.broadcast(channel, { text: chunk }).
