Claude + Voyage AI Embeddings: The Anthropic-Recommended Stack (2026)

Voyage AI is the embedding provider Anthropic officially recommends for Claude RAG pipelines. voyage-3 scores 65.4 on MTEB (vs 64.6 for OpenAI text-embedding-3-large) and costs $0.06/M tokens (vs $0.13/M), making it roughly 50% cheaper at higher quality. Voyage also ships rerank-2 ($0.05/M query+doc tokens) and voyage-code-2 (specialized for code search). This guide covers when to choose Voyage over alternatives, the four models (voyage-3, voyage-3-large, voyage-code-2, rerank-2), multilingual handling, the reranking workflow, and migration from OpenAI embeddings.

For Claude + Pinecone RAG end-to-end see Claude + Pinecone Vector DB. For embeddings-vs-search alternatives see Claude API Semantic Search.

Voyage AI vs Alternatives (Benchmark)

Provider	Model	Cost/M tokens	MTEB	Dims
Voyage AI	voyage-3	$0.06	65.4	1024
Voyage AI	voyage-3-large	$0.18	67.2	2048
OpenAI	text-embedding-3-large	$0.13	64.6	3072
OpenAI	text-embedding-3-small	$0.02	62.3	1536
Cohere	embed-v3	$0.10	64.5	1024
Open source	BGE-large-en-v1.5	$0 (self-host)	64.2	1024

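voyage-3 is a sensible default for Claude RAG: it outscores every model in the table except voyage-3-large, costs less than half as much as text-embedding-3-large, and comes from the provider Anthropic recommends. Only text-embedding-3-small is cheaper among the hosted options, at a 3-point lower MTEB score.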

Setup (60 seconds)

pip install voyageai
# or
bun add voyageai

Get an API key at https://voyageai.com (Anthropic Console users get a Voyage credit).

import voyageai
vo = voyageai.Client()  # uses VOYAGE_API_KEY env var

# Embed documents
docs = ["Claude is a powerful LLM", "Voyage makes embeddings"]
result = vo.embed(docs, model="voyage-3", input_type="document")
embeddings = result.embeddings  # list of 1024-dim vectors

That's it. Now feed the vectors into Pinecone, pgvector, Qdrant, or any other vector DB.
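As a minimal sketch of what a vector DB does with these vectors, the following ranks stored embeddings by cosine similarity in plain Python. The helper names (`cosine_similarity`, `top_k_similar`) and the toy 3-dim vectors are illustrative assumptions, not part of the Voyage SDK; a real deployment would delegate this step to the vector DB.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k_similar(query_emb, doc_embs, k=3):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_emb, emb))
              for i, emb in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dim vectors stand in for the 1024-dim vectors voyage-3 returns.
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [1.0, 0.1, 0.0]
print(top_k_similar(toy_query, toy_docs, k=2))
```

A brute-force scan like this is only practical for small corpora; vector DBs use approximate nearest-neighbor indexes to do the same ranking at scale.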

Choose the Right Model

voyage-3 (recommended default)

Cost: $0.06/M tokens
Quality: 65.4 MTEB
Dimensions: 1024
Use for: most RAG, semantic search, dedup, recommendation
Multilingual: yes (Korean, Japanese, Chinese, European languages)

voyage-3-large (when quality > cost)

Cost: $0.18/M tokens (3x voyage-3)
Quality: 67.2 MTEB
Dimensions: 2048 (maximum; the model defaults to 1024 and also offers 256 and 512)
Use for: legal, medical, high-precision retrieval
Not for: cost-sensitive or high-volume workloads

voyage-code-2 (code search)

Cost: $0.12/M tokens
Specialized: code embeddings
Use for: code search, repo similarity, function matching
Use case: building "find similar functions" in Claude Code workflows

rerank-2 (the reranking complement)

Cost: $0.05/M query+doc tokens
Use case: refine top-20 vector search to top-5 with high precision

The Two-Stage Pattern (Industry Standard)

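Vector search alone typically lands around 70% top-5 accuracy; adding a reranker pushes it to roughly 90% or more. Treat these as ballpark figures and measure on your own eval set.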

import voyageai

vo = voyageai.Client()

def search_with_rerank(query: str, vector_db, top_k=5):
    # Stage 1: cheap vector search retrieves 20 candidates
    query_emb = vo.embed([query], model="voyage-3",
                          input_type="query").embeddings[0]
    candidates = vector_db.query(query_emb, top_k=20)

    # Stage 2: expensive rerank picks the best 5
    docs = [c["text"] for c in candidates]
    reranked = vo.rerank(query=query, documents=docs,
                         model="rerank-2", top_k=top_k)

    return [
        {**candidates[r.index], "rerank_score": r.relevance_score}
        for r in reranked.results
    ]

Cost per query (50K-doc index):

Embed query: 50 tokens × $0.06/M = $0.000003
Vector search: ~$0.00001
Rerank 20 docs × ~500 tokens = 10K tokens × $0.05/M = $0.0005
Total: <$0.001/query
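The arithmetic above can be reproduced with a small calculator. This is a sketch under the guide's own assumptions (50-token query, 20 candidates of ~500 tokens each, ~$0.00001 per vector search); the function name and parameters are hypothetical, and like the figures above it ignores the query tokens that rerank-2 also bills.

```python
# Prices in USD per million tokens, copied from this guide.
EMBED_PRICE_PER_M = 0.06   # voyage-3
RERANK_PRICE_PER_M = 0.05  # rerank-2


def cost_per_query(query_tokens=50, candidates=20, tokens_per_doc=500,
                   vector_search_cost=0.00001):
    """Break one two-stage query into its cost components (USD)."""
    embed = query_tokens * EMBED_PRICE_PER_M / 1_000_000
    rerank = candidates * tokens_per_doc * RERANK_PRICE_PER_M / 1_000_000
    return {
        "embed": embed,
        "vector_search": vector_search_cost,
        "rerank": rerank,
        "total": embed + vector_search_cost + rerank,
    }


for name, usd in cost_per_query().items():
    print(f"{name:>14}: ${usd:.6f}")
```

Reranking dominates the total, so the candidate count and average document length are the two knobs that matter most for per-query cost.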

Multilingual: Korean + English Together

voyage-3 handles mixed-language corpora natively:

docs = [
    "Claude is Anthropic's AI assistant",
    "Claude는 Anthropic의 AI 어시스턴트입니다",
    "ClaudeはAnthropicのAIアシスタントです"
]
embs = vo.embed(docs, model="voyage-3", input_type="document").embeddings
# Query in any language retrieves all three

No language detection or routing needed — embed once, search across languages. See Korean Prompt Engineering for Korean-specific Claude patterns.

Input Type: document vs query

Voyage lets you specify whether text is being embedded as a document (stored) or a query (search). The input_type parameter is optional, but you should set it for retrieval workloads:

# When embedding for storage
docs_emb = vo.embed(documents, model="voyage-3", input_type="document").embeddings

# When embedding a search query
query_emb = vo.embed([user_query], model="voyage-3", input_type="query").embeddings[0]

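The two input types are embedded differently: Voyage prepends a type-specific prompt to the text before encoding it. Skipping the distinction costs roughly 5-10% retrieval accuracy.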
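One way to avoid mixing up the two input types is to wrap each in its own helper, so call sites never pass `input_type` by hand. The wrapper names below are hypothetical; `client` is assumed to be a `voyageai.Client()` instance like `vo` above.

```python
def embed_documents(client, texts, model="voyage-3"):
    """Embed texts that will be stored in the index."""
    return client.embed(texts, model=model, input_type="document").embeddings


def embed_query(client, text, model="voyage-3"):
    """Embed a single search query and return its vector."""
    return client.embed([text], model=model, input_type="query").embeddings[0]


# Usage (assumes VOYAGE_API_KEY is set):
#   import voyageai
#   vo = voyageai.Client()
#   doc_vectors = embed_documents(vo, ["Claude is a powerful LLM"])
#   query_vector = embed_query(vo, "What is Claude?")
```

Keeping the model name as a shared default also makes it harder to embed documents and queries with different models by accident.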

Migration from OpenAI Embeddings

If you're moving from OpenAI text-embedding-3-large:

# OLD
import openai
client = openai.OpenAI()
emb = client.embeddings.create(
    input=text,
    model="text-embedding-3-large"
).data[0].embedding  # 3072-dim

# NEW (Voyage)
import voyageai
vo = voyageai.Client()
emb = vo.embed([text], model="voyage-3",
               input_type="document").embeddings[0]  # 1024-dim

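The dimension difference matters: 3072-dim and 1024-dim vectors cannot share an index, so you must re-embed every document in your vector DB. The migration:

Create a new index with dimension=1024
Re-embed all docs with voyage-3 (one-time cost: ~$30 per 1M docs at ~500 tokens each, matching the Cost at Scale table below)
Cut over queries to the new index
Delete the old index

For a 1M-doc dataset: ~$30 + ~1 hour. Quality improves and the per-token embedding cost drops by about 50%.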
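The re-embedding step is usually done in batches. The sketch below shows one way to structure it; `batched` and `reembed_all` are hypothetical helpers, and the batch size of 128 is an assumption that should be checked against Voyage's current per-request limits. The embedding call is injected as a function so the loop can be exercised without an API key.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reembed_all(texts, embed_batch, batch_size=128):
    """Re-embed every text, one batch at a time, preserving input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Usage with Voyage (assumes VOYAGE_API_KEY is set):
#   import voyageai
#   vo = voyageai.Client()
#   embed = lambda batch: vo.embed(list(batch), model="voyage-3",
#                                  input_type="document").embeddings
#   new_vectors = reembed_all(all_doc_texts, embed)
```

In a real migration you would also write each batch to the new index as it completes and add retry handling for rate limits, which this sketch leaves out.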

Reranking Without Voyage Vectors

You can use Voyage rerank-2 on top of OpenAI or Cohere vector search:

# Use OpenAI for vector search (existing infrastructure)
candidates = openai_vector_db.query(query_emb, top_k=20)

# Use Voyage for reranking (additive — no migration)
reranked = vo.rerank(query=query, documents=[c["text"] for c in candidates],
                     model="rerank-2", top_k=5)

The cheapest way to lift accuracy on an existing RAG system by roughly 20 points (the ~70% to ~90% top-5 gain cited above) is to add Voyage rerank-2 on top.

Cost at Scale

Scale	One-time embed	Monthly query (50K queries)
10K docs	$0.30	$0.50
100K docs	$3	$0.50
1M docs	$30	$2
10M docs	$300	$20

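Embedding is a one-time write expense; the figures above correspond to roughly 500 tokens per document at voyage-3's $0.06/M rate. Voyage's per-query charges scale with traffic, not corpus size. The monthly column does rise with corpus size, which this guide does not break down (vector-database cost is the likely cause), and it excludes reranking: at ~$0.0005 per query, rerank-2 would add about $25 for 50K queries.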
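The one-time embed column can be estimated for any corpus with a one-line formula. This sketch assumes ~500 tokens per document, which is the average implied by the table at voyage-3's $0.06/M price; the function name is hypothetical, and you should substitute your own measured average.

```python
def one_time_embed_cost(num_docs, tokens_per_doc=500, price_per_m=0.06):
    """Estimated USD cost to embed a corpus once."""
    return num_docs * tokens_per_doc * price_per_m / 1_000_000


for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} docs: ${one_time_embed_cost(n):,.2f}")
```

Passing `price_per_m=0.18` gives the corresponding voyage-3-large estimate.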

Frequently Asked Questions

Can I use Claude as an embedding model?

No. Claude is a generation model — it doesn't expose hidden states or embeddings via API. Use a dedicated embedding model (Voyage, OpenAI, Cohere, or open-source).

Should I use voyage-3 or voyage-3-large?

Start with voyage-3 ($0.06/M). Switch to voyage-3-large ($0.18/M) only if you measure a meaningful quality gap on your eval set. For most RAG, voyage-3 is sufficient.

Does Voyage support Korean as well as English?