Claude + Voyage AI Embeddings: The Anthropic-Recommended Stack (2026)
Voyage AI is the embedding provider Anthropic officially recommends for Claude RAG pipelines. voyage-3 scores 65.4 on MTEB (vs. 64.6 for OpenAI text-embedding-3-large) and costs $0.06/M tokens (vs. $0.13/M for OpenAI), making it about 54% cheaper at higher quality. Voyage also ships rerank-2 ($0.05/M query+doc tokens) and voyage-code-2 (specialized for code search). This guide covers when to choose Voyage over alternatives, all four model variants, multilingual handling, the reranking workflow, and migration from OpenAI embeddings.
For Claude + Pinecone RAG end to end, see Claude + Pinecone Vector DB. For embeddings-vs-search alternatives, see Claude API Semantic Search.
Voyage AI vs Alternatives (Benchmark)
| Provider | Model | Cost/M tokens | MTEB | Dims |
|---|---|---|---|---|
| Voyage AI | voyage-3 | $0.06 | 65.4 | 1024 |
| Voyage AI | voyage-3-large | $0.18 | 67.2 | 2048 |
| OpenAI | text-embedding-3-large | $0.13 | 64.6 | 3072 |
| OpenAI | text-embedding-3-small | $0.02 | 62.3 | 1536 |
| Cohere | embed-v3 | $0.10 | 64.5 | 1024 |
| Open source | BGE-large-en-v1.5 | $0 (self-host) | 64.2 | 1024 |
voyage-3 is the new default for Claude RAG: best quality-per-dollar, Anthropic-blessed.
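To make the comparison concrete, the sketch below recomputes the price gap from the table's own figures. The `MODELS` dictionary and the helper names `savings_pct` and `better_on_both` are illustrative, not part of any SDK; self-hosted BGE is left out because its hosting cost is not listed.

```python
# Figures copied from the benchmark table: (USD per 1M tokens, MTEB score).
MODELS = {
    "voyage-3": (0.06, 65.4),
    "voyage-3-large": (0.18, 67.2),
    "text-embedding-3-large": (0.13, 64.6),
    "text-embedding-3-small": (0.02, 62.3),
    "embed-v3": (0.10, 64.5),
}

def savings_pct(cheaper: str, pricier: str) -> float:
    """Percentage saved per token by choosing `cheaper` over `pricier`."""
    low, high = MODELS[cheaper][0], MODELS[pricier][0]
    return (high - low) / high * 100

def better_on_both(a: str, b: str) -> bool:
    """True when model `a` is both cheaper and higher-scoring than `b`."""
    (cost_a, mteb_a), (cost_b, mteb_b) = MODELS[a], MODELS[b]
    return cost_a < cost_b and mteb_a > mteb_b

print(f"{savings_pct('voyage-3', 'text-embedding-3-large'):.1f}% cheaper")
print(better_on_both("voyage-3", "text-embedding-3-large"))
print(better_on_both("voyage-3", "text-embedding-3-small"))
```

The last check comes back false because text-embedding-3-small is still cheaper per token, so the case for voyage-3 over that model rests on the score gap rather than on price.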
Setup (60 seconds)
pip install voyageai
# or
bun add voyageai
Get an API key at https://voyageai.com (Anthropic Console users get a Voyage credit).
import voyageai
vo = voyageai.Client() # uses VOYAGE_API_KEY env var
# Embed documents
docs = ["Claude is a powerful LLM", "Voyage makes embeddings"]
result = vo.embed(docs, model="voyage-3", input_type="document")
embeddings = result.embeddings # list of 1024-dim vectors
That's it. Now feed into Pinecone, pgvector, Qdrant, or any vector DB.
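Before wiring up a real vector DB, it can help to see what the database does with these vectors. The sketch below is a minimal in-memory nearest-neighbor search; the 3-dimensional vectors are made-up stand-ins for real 1024-dim voyage-3 output, and `cosine` and `top_k` are hypothetical helper names.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, doc_embs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine(query_emb, emb)) for i, emb in enumerate(doc_embs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for result.embeddings (real ones are 1024-dim).
doc_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_emb = [0.9, 0.1, 0.0]

hits = top_k(query_emb, doc_embs, k=2)
print([i for i, _ in hits])  # indices of the two closest documents
```

A production vector DB replaces the linear scan with an approximate index, but the ranking idea is the same.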
Choose the Right Model
voyage-3 (recommended default)
- Cost: $0.06/M tokens
- Quality: 65.4 MTEB
- Dimensions: 1024
- Use for: most RAG, semantic search, dedup, recommendation
- Multilingual: yes (Korean, Japanese, Chinese, European languages)
voyage-3-large (when quality > cost)
- Cost: $0.18/M tokens (3x voyage-3)
- Quality: 67.2 MTEB
- Dimensions: 2048
- Use for: legal, medical, high-precision retrieval
- Not for: cost-sensitive, high-volume
voyage-code-2 (code search)
- Cost: $0.12/M tokens
- Specialized: code embeddings
- Use for: code search, repo similarity, function matching
- Use case: building "find similar functions" in Claude Code workflows
rerank-2 (the reranking complement)
- Cost: $0.05/M query+doc tokens
- Use case: refine top-20 vector search to top-5 with high precision
The Two-Stage Pattern (Industry Standard)
Vector search alone typically lands around 70% top-5 accuracy. Adding a rerank stage can push that to 90% or more.
import voyageai
vo = voyageai.Client()
def search_with_rerank(query: str, vector_db, top_k=5):
    # Stage 1: cheap vector search retrieves 20 candidates
    query_emb = vo.embed([query], model="voyage-3",
                         input_type="query").embeddings[0]
    candidates = vector_db.query(query_emb, top_k=20)
    # Stage 2: expensive rerank picks the best top_k (5 by default)
    docs = [c["text"] for c in candidates]
    reranked = vo.rerank(query=query, documents=docs,
                         model="rerank-2", top_k=top_k)
    # Map each rerank result back to its original candidate by index
    return [
        {**candidates[r.index], "rerank_score": r.relevance_score}
        for r in reranked.results
    ]
Cost per query (50K-doc index):
- Embed query: 50 tokens × $0.06/M = $0.000003
- Vector search: ~$0.00001
- Rerank 20 docs × ~500 tokens = 10K tokens × $0.05/M = $0.0005
- Total: <$0.001/query
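The arithmetic above can be wrapped in a small calculator for trying other candidate counts or document lengths. This is a sketch: `cost_per_query` is a hypothetical helper, the prices are the ones quoted in this guide, and it mirrors the guide's simplification of counting only document tokens for the rerank step.

```python
EMBED_PRICE_PER_M = 0.06   # voyage-3, USD per 1M tokens (from this guide)
RERANK_PRICE_PER_M = 0.05  # rerank-2, USD per 1M tokens (from this guide)

def cost_per_query(query_tokens=50, candidates=20, tokens_per_doc=500,
                   vector_search_cost=0.00001):
    """Estimated USD cost of one embed -> search -> rerank round trip."""
    embed = query_tokens * EMBED_PRICE_PER_M / 1_000_000
    rerank = candidates * tokens_per_doc * RERANK_PRICE_PER_M / 1_000_000
    return embed + vector_search_cost + rerank

print(f"${cost_per_query():.6f} per query")
print(f"${cost_per_query(candidates=50):.6f} per query with 50 candidates")
```

Reranking dominates the total, so the number of candidates passed to stage 2 is the main cost lever.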
Multilingual: Korean + English Together
voyage-3 handles mixed-language corpora natively:
docs = [
"Claude is Anthropic's AI assistant",
"Claude๋ Anthropic์ AI ์ด์์คํดํธ์
๋๋ค",
"Cloudeใฏ Anthropicใฎ AIใขใทในใฟใณใใงใ"
]
embs = vo.embed(docs, model="voyage-3", input_type="document").embeddings
# Query in any language retrieves all three
No language detection or routing needed: embed once, search across languages. See Korean Prompt Engineering for Korean-specific Claude patterns.
Input Type: document vs query
Voyage lets you specify, via input_type, whether text is being embedded as a document (stored) or a query (search). The parameter is optional, but you should set it for retrieval:
# When embedding for storage
docs_emb = vo.embed(documents, model="voyage-3", input_type="document").embeddings
# When embedding a search query
query_emb = vo.embed([user_query], model="voyage-3", input_type="query").embeddings[0]
The two input types are optimized differently. Skipping the distinction costs roughly 5-10% in retrieval accuracy.
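One way to avoid mixing the two up is to hide `input_type` behind two thin wrappers, so call sites cannot pass the wrong value. This is a minimal sketch: `embed_documents` and `embed_query` are hypothetical names, and `client` is assumed to be a `voyageai.Client()` as in the setup section.

```python
DEFAULT_MODEL = "voyage-3"

def embed_documents(client, texts, model=DEFAULT_MODEL):
    """Embed texts destined for storage in the index."""
    return client.embed(texts, model=model, input_type="document").embeddings

def embed_query(client, text, model=DEFAULT_MODEL):
    """Embed a single search query and return one vector."""
    return client.embed([text], model=model, input_type="query").embeddings[0]
```

With a real client this would be called as `embed_query(vo, "how do I rerank?")`, and the rest of the pipeline never touches `input_type` directly.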
Migration from OpenAI Embeddings
If you're moving from OpenAI text-embedding-3-large:
# OLD
import openai
client = openai.OpenAI()
emb = client.embeddings.create(
    input=text,
    model="text-embedding-3-large"
).data[0].embedding  # 3072-dim
# NEW (Voyage)
import voyageai
vo = voyageai.Client()
emb = vo.embed([text], model="voyage-3",
               input_type="document").embeddings[0]  # 1024-dim
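The dimension difference matters: vectors from different models are not comparable, so you must re-embed every document in your vector DB. The migration:
- Create a new index with dimension=1024
- Re-embed all docs with voyage-3 (one-time cost: ~$30 per 1M docs at ~500 tokens per doc)
- Cut over queries to the new index
- Delete the old index
For a 1M-doc dataset at ~500 tokens per doc: ~$30 + ~1 hour. Quality improves, and per-token embedding cost drops by about half.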
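The re-embedding step is mostly a batching loop. The sketch below keeps the embedding call injectable so it can be tried without network access; `reembed`, `batched`, and `embed_batch` are hypothetical names, and the default batch size of 128 is an assumption, so check Voyage's current per-request limits before relying on it.

```python
from typing import Callable, Iterator

def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def reembed(texts: list[str],
            embed_batch: Callable[[list[str]], list[list[float]]],
            batch_size: int = 128) -> list[list[float]]:
    """Re-embed every text in batches, preserving input order."""
    vectors: list[list[float]] = []
    for chunk in batched(texts, batch_size):
        vectors.extend(embed_batch(chunk))
    return vectors
```

In a real migration, `embed_batch` could be something like `lambda chunk: vo.embed(chunk, model="voyage-3", input_type="document").embeddings`, with the returned vectors upserted into the new 1024-dim index alongside the original document IDs.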
Reranking Without Voyage Vectors
You can use Voyage rerank-2 on top of OpenAI or Cohere vector search:
# Use OpenAI for vector search (existing infrastructure);
# query_emb comes from your existing OpenAI embedding call
candidates = openai_vector_db.query(query_emb, top_k=20)
# Use Voyage for reranking (additive, no migration)
reranked = vo.rerank(query=query, documents=[c["text"] for c in candidates],
                     model="rerank-2", top_k=5)
The cheapest way to get a roughly 20% accuracy boost on an existing RAG pipeline: add Voyage rerank-2 on top.
Cost at Scale
| Scale | One-time embed | Monthly query (50K queries) |
|---|---|---|
| 10K docs | $0.30 | $0.50 |
| 100K docs | $3 | $0.50 |
| 1M docs | $30 | $2 |
| 10M docs | $300 | $20 |
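Embedding cost is a one-time write expense. Voyage's per-query charges (query embedding and optional reranking) scale with traffic, not corpus size; the growth in the monthly column at larger corpora presumably reflects vector database costs.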
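The one-time column of the table can be reproduced with a single multiplication. In this sketch, `embed_cost` is a hypothetical helper and the 500 tokens per document is an assumption inferred from the table's figures, so substitute your own corpus average.

```python
def embed_cost(num_docs: int, tokens_per_doc: int = 500,
               price_per_m: float = 0.06) -> float:
    """One-time USD cost to embed a corpus at voyage-3's quoted price."""
    return num_docs * tokens_per_doc * price_per_m / 1_000_000

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} docs: ${embed_cost(n):,.2f}")
```

Doubling the average document length doubles the bill, so chunk size matters as much as document count.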
Frequently Asked Questions
Can I use Claude as an embedding model?
No. Claude is a generation model; it doesn't expose hidden states or embeddings via API. Use a dedicated embedding model (Voyage, OpenAI, Cohere, or open-source).
Should I use voyage-3 or voyage-3-large?
Start with voyage-3 ($0.06/M). Switch to voyage-3-large ($0.18/M) only if you measure a meaningful quality gap on your eval set. For most RAG, voyage-3 is sufficient.
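Measuring that gap can be as simple as comparing recall@k on a small labeled set. The sketch below assumes you already have ranked result IDs from each model; the eval data is invented purely for illustration, and `recall_at_k` and `mean_recall` are hypothetical helper names.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mean_recall(runs: dict[str, list[str]], gold: dict[str, set[str]],
                k: int = 5) -> float:
    """Average recall@k over all queries in the eval set."""
    return sum(recall_at_k(runs[q], gold[q], k) for q in gold) / len(gold)

# Invented eval data: query -> relevant doc IDs, plus each model's ranked results.
gold = {"q1": {"d1", "d2"}, "q2": {"d7"}}
run_small = {"q1": ["d1", "d9", "d2"], "q2": ["d3", "d4", "d5"]}
run_large = {"q1": ["d1", "d2", "d9"], "q2": ["d7", "d3", "d4"]}

print(mean_recall(run_small, gold, k=3), mean_recall(run_large, gold, k=3))
```

If the difference on your own queries is within noise, the 3x price of voyage-3-large is hard to justify.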
Does Voyage support Korean as well as English?
Yes. voyage-3 is multilingual with strong performance across Korean, Japanese, Chinese, and major European languages. Embed mixed-language corpora directly without translation.
How does reranking compare to using a larger embedding model?
Pairing a small embedding model with rerank-2 typically beats a large embedding model alone on retrieval quality. The cost is also lower: vector search stays cheap, and reranking touches only the top-K candidates. See the production-ready pipeline in Claude + Pinecone Vector RAG.
Is there a free tier?
Voyage AI offers $50 in free credits on signup. At the prices above, that covers roughly 830M voyage-3 tokens, or about 100K reranked queries at ~10K tokens each. Anthropic Console users get additional credits.
Master Production Claude API RAG
Claude Agent SDK Cookbook ($79): 40 production recipes including RAG with Voyage embeddings + Pinecone, pgvector, Qdrant. Eval, cost optimization, and security patterns included.