Claude API Ruby on Rails: Complete Guide (2026)
You can call the Claude API from Ruby on Rails using the official anthropic gem — add it to your Gemfile, set ANTHROPIC_API_KEY, and get a response in under 10 lines of code. The gem supports message creation, streaming via Server-Sent Events, tool use, and prompt caching. This guide covers gem installation, service objects, ActionController::Live streaming, ActiveJob background processing, Rails API controller patterns, error handling with retries, and rate limit strategies.
Gem Installation
Add to your Gemfile:
# Gemfile
gem "anthropic"
# Optional: for HTTP-level retry and timeout control
gem "faraday-retry"
Then install:
bundle install
Set your API key as an environment variable. Never hardcode it:
# .env (use dotenv-rails in development)
ANTHROPIC_API_KEY=sk-ant-...
For production on Heroku, Fly.io, or Render, set the env var in the platform dashboard. Alternatively, store the key in encrypted Rails credentials:
rails credentials:edit
# Add: anthropic_api_key: sk-ant-...
Reference it in your initializer:
# config/initializers/anthropic.rb
Anthropic.configure do |config|
  config.access_token = ENV.fetch("ANTHROPIC_API_KEY") do
    Rails.application.credentials.anthropic_api_key
  end
end
Basic Message Creation
require "anthropic"

client = Anthropic::Client.new

response = client.messages(
  parameters: {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Explain Rails ActiveRecord callbacks in one paragraph." }
    ]
  }
)

puts response.dig("content", 0, "text")
puts "Tokens used: #{response.dig("usage", "input_tokens")} in / #{response.dig("usage", "output_tokens")} out"
The client reads ANTHROPIC_API_KEY from your environment automatically once configured.
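Note that `content` in the response is an array of blocks, and with tool use in play a reply can contain more than one, so indexing block 0 can silently drop text. A defensive sketch (`extract_text` is a hypothetical helper name; the hash mirrors the API's JSON shape):

```ruby
# Join every text block in a Messages API response hash, skipping
# non-text blocks such as tool_use.
def extract_text(response)
  Array(response["content"])
    .select { |block| block["type"] == "text" }
    .map { |block| block["text"] }
    .join
end

sample = {
  "content" => [
    { "type" => "text", "text" => "Callbacks run hooks " },
    { "type" => "text", "text" => "around the record life cycle." }
  ]
}

extract_text(sample) # => "Callbacks run hooks around the record life cycle."
```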
Service Object Pattern
Encapsulate API calls in a service object to keep controllers thin:
# app/services/claude_service.rb
class ClaudeService
  MODEL = "claude-sonnet-4-5"

  def initialize
    @client = Anthropic::Client.new
  end

  def chat(user_message, system_prompt: nil)
    params = {
      model: MODEL,
      max_tokens: 1024,
      messages: [{ role: "user", content: user_message }]
    }
    params[:system] = system_prompt if system_prompt

    response = @client.messages(parameters: params)
    response.dig("content", 0, "text")
  rescue Anthropic::Error => e
    Rails.logger.error("[ClaudeService] API error: #{e.message}")
    raise
  end
end
Use it in your controller:
# app/controllers/api/v1/chat_controller.rb
class Api::V1::ChatController < ApplicationController
  def create
    service = ClaudeService.new
    reply = service.chat(
      params.require(:message),
      system_prompt: "You are a helpful Rails assistant."
    )
    render json: { reply: reply }
  rescue Anthropic::Error => e
    render json: { error: e.message }, status: :unprocessable_entity
  end
end
Streaming with ActionController::Live
Use Rails ActionController::Live and Server-Sent Events to stream tokens to the browser in real time:
# app/controllers/api/v1/stream_controller.rb
class Api::V1::StreamController < ApplicationController
  include ActionController::Live

  def create
    response.headers["Content-Type"] = "text/event-stream"
    response.headers["Cache-Control"] = "no-cache"
    response.headers["X-Accel-Buffering"] = "no"

    client = Anthropic::Client.new
    client.messages(
      parameters: {
        model: "claude-sonnet-4-5",
        max_tokens: 2048,
        stream: true,
        messages: [{ role: "user", content: params[:message] }]
      }
    ) do |chunk, _bytesize|
      if chunk["type"] == "content_block_delta"
        text = chunk.dig("delta", "text").to_s
        response.stream.write("data: #{text.to_json}\n\n") unless text.empty?
      end
      if chunk["type"] == "message_stop"
        response.stream.write("data: [DONE]\n\n")
      end
    end
  rescue ActionController::Live::ClientDisconnected
    Rails.logger.info("[StreamController] Client disconnected")
  ensure
    response.stream.close
  end
end
Add the route:
# config/routes.rb
namespace :api do
  namespace :v1 do
    post "stream", to: "stream#create"
    post "chat", to: "chat#create"
  end
end
On the frontend, consume the stream with the EventSource API or fetch with a ReadableStream.
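The streaming controller JSON-encodes each chunk before writing it. That matters because an SSE frame ends at the first blank line, so a raw chunk containing newlines would be truncated by the browser's parser. A minimal framing sketch (`sse_frame` is a hypothetical helper name):

```ruby
require "json"

# Build one Server-Sent Events frame. JSON-encoding escapes embedded
# newlines, so the frame's "\n\n" terminator stays unambiguous.
def sse_frame(text)
  "data: #{text.to_json}\n\n"
end

sse_frame("Hello\nworld") # => "data: \"Hello\\nworld\"\n\n"
```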
Performance note: First token arrives in 300–500ms with streaming. Without streaming, a 2,000-token response waits 5–12 seconds for the full payload. For any user-facing feature, always stream.
ActiveJob Background Processing
For long-running or batch tasks, offload Claude calls to a background job:
# app/jobs/claude_analysis_job.rb
class ClaudeAnalysisJob < ApplicationJob
  queue_as :default

  retry_on Anthropic::RateLimitError, wait: :polynomially_longer, attempts: 5
  retry_on Anthropic::ServerError, wait: 10.seconds, attempts: 3
  discard_on Anthropic::AuthenticationError

  def perform(document_id)
    document = Document.find(document_id)
    client = Anthropic::Client.new

    response = client.messages(
      parameters: {
        model: "claude-haiku-4-5", # Use Haiku for batch tasks (10x cheaper)
        max_tokens: 512,
        messages: [
          {
            role: "user",
            content: "Summarize this document in 3 bullet points:\n\n#{document.body}"
          }
        ]
      }
    )

    summary = response.dig("content", 0, "text")
    document.update!(summary: summary, summarized_at: Time.current)
  end
end
Enqueue it from a controller or callback:
ClaudeAnalysisJob.perform_later(document.id)
Use Sidekiq or Solid Queue as the backend. For cost optimization, route simple summarization tasks to claude-haiku-4-5 and reserve Sonnet for complex reasoning. See Claude Haiku vs Sonnet vs Opus: Which Model for detailed benchmarks.
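Model routing can live in one small helper so jobs don't hardcode model strings. A sketch under the guide's model names (`model_for` and the task symbols are hypothetical):

```ruby
# Route simple batch tasks to Haiku; everything else gets Sonnet.
def model_for(task)
  case task
  when :summarize, :classify, :extract then "claude-haiku-4-5"
  else "claude-sonnet-4-5"
  end
end

model_for(:summarize)   # => "claude-haiku-4-5"
model_for(:code_review) # => "claude-sonnet-4-5"
```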
Error Handling with Retries
The anthropic gem raises typed errors you can rescue explicitly:
# app/services/claude_service.rb (robust version)
class ClaudeService
  MAX_RETRIES = 3
  BASE_DELAY = 1.0 # seconds

  def chat_with_retry(message)
    attempts = 0
    begin
      attempts += 1
      call_api(message)
    rescue Anthropic::RateLimitError => e
      raise if attempts >= MAX_RETRIES
      retry_after = e.response&.headers&.[]("retry-after")&.to_i || (BASE_DELAY * (2**attempts))
      Rails.logger.warn("[ClaudeService] Rate limited. Retrying in #{retry_after}s (attempt #{attempts})")
      sleep(retry_after)
      retry
    rescue Anthropic::ServerError
      raise if attempts >= MAX_RETRIES
      sleep(BASE_DELAY * (2**attempts))
      retry
    rescue Anthropic::AuthenticationError => e
      Rails.logger.error("[ClaudeService] Invalid API key: #{e.message}")
      raise
    rescue Anthropic::BadRequestError => e
      Rails.logger.error("[ClaudeService] Bad request: #{e.message}")
      raise
    end
  end

  private

  def call_api(message)
    @client ||= Anthropic::Client.new
    response = @client.messages(
      parameters: {
        model: "claude-sonnet-4-5",
        max_tokens: 1024,
        messages: [{ role: "user", content: message }]
      }
    )
    response.dig("content", 0, "text")
  end
end
For the full list of error codes and what each one means, see Claude API Error Codes Reference.
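The retry code above sleeps for a deterministic BASE_DELAY * 2**attempts, which makes simultaneous clients retry in lockstep. A common refinement is exponential backoff with full jitter; a plain-Ruby sketch (`backoff_delay` is a hypothetical helper, and the cap value is an assumption):

```ruby
BASE_DELAY = 1.0  # seconds
MAX_DELAY  = 30.0 # cap so late attempts don't wait minutes

# Draw the delay uniformly from [0, min(BASE_DELAY * 2**attempt, MAX_DELAY)]
# so concurrent clients spread their retries out instead of colliding.
def backoff_delay(attempt, rng: Random.new)
  cap = [BASE_DELAY * (2**attempt), MAX_DELAY].min
  rng.rand(0.0..cap)
end

# Inside a rescue block: sleep(backoff_delay(attempts)); retry
```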
Rate Limit Handling
Anthropic's default rate limits vary by tier. Handle them at the infrastructure level:
# config/initializers/anthropic_rate_limiter.rb
# Simple in-process sliding-window throttle. It limits a single process only;
# for multi-process or multi-server deployments, use Redis with a token bucket.
module AnthropicRateLimiter
  REQUESTS_PER_MINUTE = 50 # Adjust to your tier
  @lock = Mutex.new
  @sent_at = [] # monotonic timestamps of recent requests

  def self.with_limit
    @lock.synchronize do
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      @sent_at.reject! { |t| now - t > 60 }
      sleep(60 - (now - @sent_at.first)) if @sent_at.size >= REQUESTS_PER_MINUTE
      @sent_at << Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end
    yield
  end
end

# Usage: AnthropicRateLimiter.with_limit { client.messages(parameters: params) }
For header-based backoff, inspect the response headers on 429:
rescue Anthropic::RateLimitError => e
  # Headers available in e.response.headers:
  #   x-ratelimit-limit-requests, x-ratelimit-remaining-requests,
  #   x-ratelimit-reset-requests, retry-after
  wait = e.response&.headers&.[]("retry-after")&.to_f || 60
  sleep(wait)
  retry
end
For cost details and when prompt caching pays off at scale, see Claude API Cost and Prompt Caching Break-Even.
Frequently Asked Questions
Is there an official Anthropic Ruby gem?
Yes. The official anthropic gem is available on RubyGems: add gem "anthropic" to your Gemfile and run bundle install. It supports messages, streaming, tool use, and prompt caching. You can also call the REST API directly with Faraday or Net::HTTP if you prefer zero dependencies.
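If you go the zero-dependency route, a minimal sketch with Net::HTTP follows. The endpoint and the x-api-key / anthropic-version headers are the documented REST interface; `build_claude_request` is a hypothetical helper name:

```ruby
require "net/http"
require "json"
require "uri"

CLAUDE_URI = URI("https://api.anthropic.com/v1/messages")

# Build a Messages API request object; sending it is left to the caller.
def build_claude_request(api_key, message)
  req = Net::HTTP::Post.new(CLAUDE_URI)
  req["x-api-key"] = api_key
  req["anthropic-version"] = "2023-06-01"
  req["content-type"] = "application/json"
  req.body = {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [{ role: "user", content: message }]
  }.to_json
  req
end

# To send (requires a real key):
# res = Net::HTTP.start(CLAUDE_URI.host, CLAUDE_URI.port, use_ssl: true) do |http|
#   http.request(build_claude_request(ENV.fetch("ANTHROPIC_API_KEY"), "Hello"))
# end
# JSON.parse(res.body).dig("content", 0, "text")
```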
How do I stream Claude responses in Rails without blocking the main thread?
Use ActionController::Live with include ActionController::Live in your controller and write chunks to response.stream. Puma handles the connection in a separate thread. Ensure your web server is configured for streaming (disable response buffering with X-Accel-Buffering: no for Nginx). Close the stream in an ensure block to prevent connection leaks.
Which model should I use for Rails background jobs?
For high-volume batch tasks (summarization, classification, extraction), use claude-haiku-4-5 — it is roughly 10x cheaper than Sonnet with comparable quality for straightforward tasks. Reserve claude-sonnet-4-5 for complex reasoning or code generation. See Claude Haiku vs Sonnet vs Opus: Which Model for a decision framework.
How do I handle token limits in Rails?
Track response.dig("usage", "input_tokens") and output_tokens after each call and log them. For large inputs, truncate or chunk the content before sending. Claude's context window is 200k tokens for Sonnet and Haiku — well above most Rails use cases, but worth monitoring in document-processing pipelines.
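Chunking can be as simple as packing paragraphs into size-bounded pieces before enqueuing one job per chunk. A naive sketch (`chunk_paragraphs` is a hypothetical helper; the character budget is an assumption, since the API counts tokens, not characters):

```ruby
# Split text on blank lines and pack paragraphs into chunks of at most
# max_chars. A single paragraph larger than max_chars becomes its own
# oversized chunk (this sketch does not split inside paragraphs).
def chunk_paragraphs(text, max_chars: 4000)
  chunks = [""]
  text.split(/\n{2,}/).each do |para|
    candidate = chunks.last.empty? ? para : "#{chunks.last}\n\n#{para}"
    if candidate.size <= max_chars || chunks.last.empty?
      chunks[-1] = candidate
    else
      chunks << para
    end
  end
  chunks
end

chunk_paragraphs("aaa\n\nbbb\n\nccc", max_chars: 8) # => ["aaa\n\nbbb", "ccc"]
```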
Can I use Claude with Rails Action Cable?
Yes. Call the Anthropic API from a Channel action and broadcast chunks as they arrive using ActionCable::Server::Broadcasting. For streaming, run the API call in a background thread or job so it doesn't block the Action Cable connection, and broadcast each streamed chunk with ActionCable.server.broadcast(channel, { text: chunk }).