Text Embedding Models Compared (April 2026): Pricing, Quality, and Real-World Performance

Choosing an embedding model is one of the most consequential decisions in any RAG pipeline, search system, or recommendation engine. The wrong choice costs you either money (expensive models at scale) or quality (cheap models that miss relevant results). This comparison covers every major embedding model available in April 2026 with actual pricing, benchmark scores, and practical guidance on which model fits which use case.

The Full Comparison Table

All prices verified against official documentation as of April 2026. MTEB scores are from the public leaderboard (English retrieval subset).

| Model | Provider | Dimensions | Price / 1M Tokens | Max Tokens | MTEB Avg |
| --- | --- | --- | --- | --- | --- |
| text-embedding-3-small | OpenAI | 1,536 | $0.02 | 8,191 | 62.3 |
| text-embedding-3-large | OpenAI | 3,072 | $0.13 | 8,191 | 64.6 |
| embed-v4 | Cohere | 1,024 | $0.10 | 512 | 66.3 |
| voyage-3-large | Voyage AI | 1,024 | $0.18 | 32,000 | 67.1 |
| voyage-3-lite | Voyage AI | 512 | $0.02 | 32,000 | 61.4 |
| jina-embeddings-v3 | Jina AI | 1,024 | $0.02 | 8,192 | 65.5 |
| text-embedding-004 | Google | 768 | Free / $0.025 | 2,048 | 63.0 |
| BGE-large-en-v1.5 | BAAI (open source) | 1,024 | Free (self-hosted) | 512 | 63.6 |
| E5-large-v2 | Microsoft (open source) | 1,024 | Free (self-hosted) | 512 | 62.0 |
| GTE-large-en-v1.5 | Alibaba (open source) | 1,024 | Free (self-hosted) | 8,192 | 65.4 |
| nomic-embed-text-v1.5 | Nomic AI | 768 | Free (open source) | 8,192 | 62.3 |

How to Read This Table

Dimensions determine how much storage each vector requires. A 1,024-dimension float32 vector takes 4KB. At 10 million documents, that is 40GB of vector storage. Doubling dimensions doubles that cost. Some models (OpenAI, Cohere) support Matryoshka embeddings, which let you use fewer dimensions with graceful quality degradation.
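The storage arithmetic above is easy to reproduce. A minimal helper (hypothetical function name, raw vector bytes only, ignoring index overhead) makes the dimension trade-off concrete:

```python
def vector_storage_bytes(n_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw storage for n float32 vectors (ignores vector-index overhead)."""
    return n_vectors * dims * bytes_per_float

# A single 1,024-dim float32 vector: 4 KB.
per_vector = vector_storage_bytes(1, 1024)
# 10 million documents at 1,024 dims: roughly 40 GB.
corpus_gb = vector_storage_bytes(10_000_000, 1024) / 1e9
print(per_vector, corpus_gb)
```

Doubling `dims` doubles the result, which is why Matryoshka truncation to 512 dimensions halves storage at the cost of a small quality hit.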

Price per 1M tokens is the API cost. Embeddings are dramatically cheaper than generative model calls. Even at scale (100M tokens per month), the most expensive option here (Voyage AI at $0.18/1M) costs $18 per month. For most teams, embedding costs are negligible compared to vector database storage and generative model inference.

Max tokens is the context window. This determines the longest text chunk you can embed in a single API call. In practice, shorter chunks (256-512 tokens) often produce better retrieval results than embedding entire documents, so this limit matters less than you might expect.
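The 256-512 token chunking mentioned above can be sketched with a simple sliding-window splitter. This uses whitespace words as a rough proxy for model tokens (a real pipeline would count tokens with the provider's tokenizer); the overlap keeps context from being cut mid-thought at chunk boundaries:

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens words.
    Whitespace words approximate model tokens; swap in a real tokenizer
    for production use."""
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks
```

Each chunk is then embedded independently, so even a 512-token context limit is rarely a practical obstacle.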

MTEB average is the model's score across the Massive Text Embedding Benchmark suite. Higher is better. The differences between 62 and 67 are meaningful in production retrieval systems. A 5-point gap on MTEB typically translates to 3-8% better recall@10 in real-world search applications.
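Recall@10, the retrieval metric referenced above, is conventionally the fraction of relevant documents that appear in the top 10 results. A small sketch of that computation:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant doc IDs that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy example: 3 relevant docs, 2 of them retrieved in the top 10.
score = recall_at_k(["d1", "d9", "d2", "d7"], {"d1", "d2", "d5"}, k=10)
```

Measuring this on your own queries and documents is a better model-selection signal than any leaderboard average.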

The Price-Quality Sweet Spot

Jina-embeddings-v3 at $0.02/1M tokens with an MTEB score of 65.5 is the current best value. It scores within 2 points of the most expensive models at 1/9th the price of Voyage AI. For teams where quality matters more than cost, voyage-3-large at $0.18/1M leads on benchmarks. OpenAI text-embedding-3-small at $0.02/1M is the safe default: widely integrated, well-documented, and good enough for 90% of applications.

Provider-by-Provider Breakdown

OpenAI: text-embedding-3-small and text-embedding-3-large

OpenAI offers two embedding models in the v3 generation. The small model (1,536 dimensions, $0.02/1M) is the most widely used embedding model in production. It replaced ada-002 in early 2024 and is supported by every major vector database and framework out of the box.

The large model (3,072 dimensions, $0.13/1M) scores about 2 points higher on MTEB. Both models support Matryoshka embeddings, meaning you can request a lower-dimensional output (256, 512, 1024) and the model will return a truncated vector that still performs well. This is useful for reducing storage costs at the expense of a small quality hit.

OpenAI's embedding API is the simplest to integrate. One endpoint, one API key, predictable behavior. The models are not the highest performing on benchmarks, but the ecosystem support, documentation, and reliability make them the safe default for most projects.

Cohere: embed-v4

Cohere's latest embedding model scores 66.3 on MTEB, putting it among the top commercial options. The key differentiator is Cohere's search-specific features: embed-v4 supports separate "search_document" and "search_query" input types, which optimize the embedding differently for indexing versus retrieval.
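The input-type distinction above means you embed your corpus and your queries with different settings. A sketch of how the two request bodies differ (field names follow Cohere's embed endpoint; the surrounding HTTP call and API key are omitted):

```python
def cohere_embed_body(texts: list[str], input_type: str,
                      model: str = "embed-v4") -> dict:
    """Build a request body for Cohere's embed endpoint.
    Use 'search_document' when indexing a corpus and 'search_query'
    when embedding a user's search query."""
    allowed = {"search_document", "search_query", "classification", "clustering"}
    if input_type not in allowed:
        raise ValueError(f"input_type must be one of {allowed}")
    return {"model": model, "texts": texts, "input_type": input_type}

# Indexing time vs. query time use different input types.
index_body = cohere_embed_body(["doc one", "doc two"], "search_document")
query_body = cohere_embed_body(["user question"], "search_query")
```

Mixing these up (embedding queries as documents, or vice versa) silently degrades retrieval quality, so it is worth wrapping the call as above rather than passing raw strings around.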

The main limitation is the 512-token context window. Any input longer than 512 tokens gets truncated. For RAG applications where you are already chunking documents into 256-512 token pieces, this is fine. For applications that need to embed longer passages, you will need to chunk first.

Pricing at $0.10/1M tokens is mid-range. Cohere also offers a trial API key with 1,000 calls per month at no cost, making it easy to evaluate before committing.

Voyage AI: voyage-3-large and voyage-3-lite

Voyage AI consistently leads MTEB benchmarks and scores highest on retrieval-specific tasks. The large model at $0.18/1M tokens and 67.1 MTEB is the premium choice for applications where retrieval quality directly impacts user experience or revenue (search engines, recommendation systems, customer support).

The standout feature is the 32,000-token context window. Voyage's are the only major embedding models that can process an entire research paper or long document in a single call. The lite model at $0.02/1M tokens offers the same long context at a fraction of the cost, though with lower benchmark scores.

Jina AI: jina-embeddings-v3

Jina-embeddings-v3 is the surprise performer at $0.02/1M tokens. It scores 65.5 on MTEB, outperforming OpenAI's large model while matching the price of OpenAI's small model. The model supports 8,192 tokens of context and outputs 1,024 dimensions.

Jina also offers this model as a downloadable open-weight model, so you can self-host it if you prefer. The API has been less battle-tested than OpenAI or Cohere at scale, but for developers comfortable with a newer provider, the price-performance ratio is exceptional.

Google: text-embedding-004

Google's embedding model is available through the Gemini API. On the free tier, embedding calls cost nothing (subject to rate limits). On the paid tier, it costs $0.025/1M tokens. The model produces 768-dimension vectors with a 2,048-token context window.

Quality is mid-pack at 63.0 MTEB. The main appeal is the free tier integration. If you are already using Google AI Studio for Gemini text generation, adding embeddings at no cost is a natural choice.

Open-Source Options: BGE, E5, GTE, Nomic

All four of these models are free to download and run locally. BGE-large-en-v1.5 (from BAAI) and GTE-large-en-v1.5 (from Alibaba) are the strongest performers, scoring 63-65 on MTEB. Nomic-embed-text-v1.5 offers the longest context window (8,192 tokens) among open-source models with competitive quality.

The cost calculation for self-hosting: a single A10G GPU (~$0.75/hour on AWS) can process roughly 500-1,000 embeddings per second. At that rate, self-hosting becomes cheaper than API calls only when you exceed about 10-15 million embeddings per month. Below that threshold, a paid API is both cheaper and simpler to maintain.
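The break-even point above can be computed directly. The sketch below assumes an always-on GPU (730 hours/month) and ~400-token chunks; spot instances, serverless GPUs, or batching would shift the threshold considerably:

```python
def breakeven_embeddings_per_month(gpu_hourly: float,
                                   api_price_per_million: float,
                                   tokens_per_embedding: int = 400) -> float:
    """Monthly embedding volume at which an always-on GPU matches API spend.
    Assumes the GPU runs 24/7 (730 hours/month)."""
    gpu_monthly = gpu_hourly * 730
    api_cost_per_embedding = tokens_per_embedding / 1e6 * api_price_per_million
    return gpu_monthly / api_cost_per_embedding

# A10G at ~$0.75/hour vs. a $0.10/1M-token API, 400-token chunks:
volume = breakeven_embeddings_per_month(0.75, 0.10)  # roughly 13.7M/month
```

Against the cheapest APIs ($0.02/1M) the break-even volume is about five times higher, which is why self-hosting rarely pays off for small and mid-sized corpora.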

How to Choose: Decision Framework

  1. If you are building a prototype or MVP: Use OpenAI text-embedding-3-small. It is the most documented, most integrated, and cheapest commercial option. You can always switch later.
  2. If retrieval quality is your top priority: Use Voyage AI voyage-3-large. It leads benchmarks and supports 32K context. The price premium ($0.18 vs. $0.02-$0.13) is negligible at most scales.
  3. If you want the best price-to-quality ratio: Use Jina-embeddings-v3 at $0.02/1M tokens. It outperforms models 5-9x its price on benchmarks.
  4. If you need zero cost and are already on Google Cloud: Use text-embedding-004 on the Gemini free tier. The quality is adequate for most search and classification tasks.
  5. If you must self-host (data sovereignty, air-gapped environments): Use GTE-large-en-v1.5 or nomic-embed-text-v1.5. Both offer strong quality with reasonable hardware requirements.

Practical Cost Examples

To make the pricing concrete, here is what it costs to embed common workloads.

| Workload | Approx. Tokens | OpenAI Small | Cohere v4 | Voyage Large |
| --- | --- | --- | --- | --- |
| 10,000 product descriptions | 5M | $0.10 | $0.50 | $0.90 |
| 100,000 support articles | 150M | $3.00 | $15.00 | $27.00 |
| 1M document chunks (RAG) | 500M | $10.00 | $50.00 | $90.00 |
| 10M search queries/month | 200M | $4.00 | $20.00 | $36.00 |

Notice the scale. Even embedding 1 million documents with the most expensive option (Voyage AI) costs $90. The real expense in a RAG system is the generative model call that answers the user's question, not the embedding that retrieved the context. Optimizing embedding cost matters only at massive scale (billions of vectors).
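The workload costs above follow from one multiplication, which you can adapt to your own corpus size (prices taken from the comparison table; the helper name is illustrative):

```python
PRICES_PER_MILLION = {
    "openai-small": 0.02,
    "cohere-v4": 0.10,
    "voyage-large": 0.18,
}

def embed_cost(tokens: int, price_per_million: float) -> float:
    """API cost in dollars for embedding `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million

# 1M RAG chunks at ~500 tokens each = 500M tokens.
for name, price in PRICES_PER_MILLION.items():
    print(f"{name}: ${embed_cost(500_000_000, price):.2f}")
```

Running this reproduces the RAG row of the table: $10, $50, and $90 respectively.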

Frequently Asked Questions

What is the best text embedding model in 2026?

It depends on your priorities. For the best MTEB scores, Cohere embed-v4 and Voyage AI voyage-3-large lead. For price-performance, Jina-embeddings-v3 at $0.02/1M is hard to beat. For the most integrations, OpenAI text-embedding-3-small is the safe default.

How much do embedding APIs cost?

From $0.02 to $0.18 per million tokens. OpenAI small and Jina v3 are cheapest at $0.02. Voyage AI large is most expensive at $0.18. Google's text-embedding-004 is free on the Gemini API free tier. Even at scale, embedding costs are typically under $100/month for most applications.

What dimensions should I use for embeddings?

768-1024 dimensions work well for most applications. OpenAI and Cohere support Matryoshka embeddings, letting you truncate to fewer dimensions (256, 512) with minor quality loss. Higher dimensions (1536, 3072) give marginally better retrieval but increase storage costs proportionally.
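When a provider does not truncate server-side, Matryoshka-trained vectors are usually shortened client-side by keeping the leading components and re-normalizing, roughly as follows (a sketch of the common convention, not any provider's official code):

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length,
    the usual client-side way to shorten a Matryoshka embedding."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.8, 0.0, 0.0]          # toy 4-dim unit vector
short = truncate_embedding(full, 2)  # first two components, re-normalized
```

Re-normalizing matters: cosine similarity assumes unit-length vectors, and a truncated prefix is generally shorter than 1.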

What is the MTEB benchmark?

MTEB (Massive Text Embedding Benchmark) is the standard evaluation suite for embedding models. It covers 56+ datasets across retrieval, classification, clustering, and semantic similarity tasks. A model's MTEB score is the best single predictor of real-world search performance.

Should I use open-source or paid embedding models?

For fewer than 10-15 million embeddings per month, paid APIs are typically cheaper and simpler than self-hosting. Above that threshold, self-hosting with models like GTE-large or BGE-large becomes cost-effective. Data sovereignty requirements may also require self-hosting regardless of volume.