Cohere API Pricing: What Each Model Costs

Cohere takes a different approach from OpenAI and Anthropic. Instead of one flagship model, it offers specialized models for generation (Command), embeddings (Embed), and ranking (Rerank), and pricing varies widely by model type. Here's what each costs and where Cohere makes sense in your stack.

Trial (Free)

$0 (free, with rate limits)

✓ Access to all models
✓ 100 API calls per minute
✓ 1,000 API calls per month
✓ No production use allowed
✓ Good for evaluation only
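The two trial caps interact: at the 100-calls-per-minute ceiling, the 1,000-call monthly budget lasts only ten minutes of sustained traffic. A client-side guard can stop a test script before it burns the allowance. This is a minimal sketch: `TrialQuota` is a hypothetical helper (not part of any Cohere SDK) that assumes a rolling 60-second window and a simple monthly counter, and Cohere's server-side accounting may work differently.

```python
from collections import deque


class TrialQuota:
    """Hypothetical client-side guard for trial-key limits.

    Assumes a rolling 60-second window for the per-minute cap and a plain
    counter for the monthly cap; Cohere's own accounting may differ.
    """

    def __init__(self, per_minute: int = 100, per_month: int = 1000) -> None:
        self.per_minute = per_minute
        self.per_month = per_month
        self.month_count = 0
        self.recent: deque = deque()  # timestamps (seconds) of calls in the current window

    def try_acquire(self, now: float) -> bool:
        """Record a call at time `now` if both limits allow it; return False otherwise."""
        # Forget calls made 60 seconds or more before `now`.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if self.month_count >= self.per_month or len(self.recent) >= self.per_minute:
            return False
        self.recent.append(now)
        self.month_count += 1
        return True


if __name__ == "__main__":
    quota = TrialQuota()
    # 150 attempts within the first 0.15 s: only 100 fit in the one-minute window.
    accepted = sum(quota.try_acquire(now=i * 0.001) for i in range(150))
    print(accepted)  # 100
```

In a real script you would pass `time.monotonic()` as `now` and skip or delay the request when `try_acquire` returns False; a fuller version would also reset `month_count` at the start of each billing month.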

Command R+

$2.50 / $10 per 1M input / output tokens

✓ Largest and most capable generation model
✓ 128K context window
✓ Strong multilingual support (10+ languages)
✓ RAG-optimized with citation generation
✓ Tool use and function calling

Command R

$0.15 / $0.60 per 1M input / output tokens

✓ Smaller, faster generation model
✓ 128K context window
✓ Good for RAG, summarization, chat
✓ Lower latency than Command R+
✓ Best value for high-volume generation
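To see how far apart the two generation tiers are for a given workload, multiply token counts by the per-million prices above. A minimal sketch, assuming the list prices on this page (check Cohere's pricing page before relying on them); the dictionary keys are illustrative labels, not official model IDs, and the workload is hypothetical.

```python
# USD per 1M tokens, as listed on this page. Keys are illustrative labels.
PRICES = {
    "command-r-plus": {"input": 2.50, "output": 10.00},
    "command-r": {"input": 0.15, "output": 0.60},
}


def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload for one generation model."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 2,000 input + 500 output tokens each.
    requests, tokens_in, tokens_out = 10_000, 2_000, 500
    for model in PRICES:
        cost = generation_cost(model, requests * tokens_in, requests * tokens_out)
        print(f"{model}: ${cost:,.2f}")
```

For this workload the script prints $100.00 for Command R+ and $6.00 for Command R, a roughly 17x gap.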

Embed v3

$0.10 per 1M tokens

✓ State-of-the-art embedding model
✓ 1024 dimensions (384 for the light variants), with int8 and binary output options
✓ Multilingual support
✓ Search, classification, clustering use cases
✓ Cheaper than OpenAI's text-embedding-3-large ($0.13/1M), though pricier than text-embedding-3-small ($0.02/1M)

Rerank v3

$2 per 1,000 searches

✓ Re-ranks search results by relevance
✓ Works with any retrieval system
✓ Can substantially improve RAG accuracy
✓ Simple API (just pass query + documents)
✓ Multilingual support
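Because billing is per search rather than per token, the cost of a rerank call depends on how many documents you send. This sketch assumes the billing rule Cohere has documented: one search unit covers a query plus up to 100 documents, and documents longer than 500 tokens (query included) are split into chunks that each count as a document. Both constants are assumptions to verify against current docs, and the chunk count below is an approximation.

```python
import math

PRICE_PER_1K_SEARCHES = 2.00  # Rerank v3 list price from this page (USD)
DOCS_PER_SEARCH_UNIT = 100    # assumption: one search unit = query + up to 100 documents
MAX_TOKENS_PER_DOC = 500      # assumption: longer documents (query included) are chunked


def billable_search_units(query_tokens: int, doc_tokens: list[int]) -> int:
    """Estimate billable search units for one rerank request under the assumed rules."""
    # Approximate each document's chunk count as ceil((query + doc) / 500).
    chunks = sum(math.ceil((query_tokens + t) / MAX_TOKENS_PER_DOC) for t in doc_tokens)
    return math.ceil(chunks / DOCS_PER_SEARCH_UNIT)


def rerank_cost(query_tokens: int, doc_tokens: list[int]) -> float:
    """USD cost of one rerank request."""
    return billable_search_units(query_tokens, doc_tokens) * PRICE_PER_1K_SEARCHES / 1000
```

Under these assumptions, 100 short documents cost one search unit ($0.002), while 250 short documents or 100 documents of about 900 tokens each cost three and two units respectively.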

Hidden Costs & Gotchas

⚠ The trial tier's 1,000 calls/month limit means you'll exhaust it in a single afternoon of development. Plan to move to paid quickly if you're serious about evaluation.
⚠ Command R+ output tokens cost 4x more than input. Long-form generation tasks can cost significantly more than expected. Use Command R for drafts and R+ only for final outputs.
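⚠ Rerank pricing is per search, not per token. One search unit covers a query with up to 100 documents; documents longer than 500 tokens (including the query) are split into chunks, and each chunk counts as a document. Re-ranking 100 short documents per query costs $0.002 per query, but longer candidate lists or long documents multiply the count.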
⚠ There's no batch pricing discount like OpenAI's Batch API. High-volume users should negotiate custom contracts.
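The 4x output premium makes the "draft on Command R, finalize on R+" pattern easy to quantify. A minimal sketch, assuming the list prices above and a hypothetical long-form job where every pass uses the same token counts (real drafts and final passes will differ).

```python
# USD per 1M tokens, as listed on this page: (input, output).
COMMAND_R = (0.15, 0.60)
COMMAND_R_PLUS = (2.50, 10.00)


def call_cost(prices: tuple[float, float], tokens_in: int, tokens_out: int) -> float:
    """Cost of one request given (input, output) prices per 1M tokens."""
    price_in, price_out = prices
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def draft_then_finalize(drafts: int, tokens_in: int, tokens_out: int) -> tuple[float, float]:
    """Compare every pass on R+ against drafts on Command R plus one final R+ pass."""
    final_pass = call_cost(COMMAND_R_PLUS, tokens_in, tokens_out)
    all_plus = (drafts + 1) * final_pass
    mixed = drafts * call_cost(COMMAND_R, tokens_in, tokens_out) + final_pass
    return all_plus, mixed


if __name__ == "__main__":
    # Hypothetical job: 3 drafts + 1 final, 1,500 tokens in and 2,000 out per pass.
    all_plus, mixed = draft_then_finalize(drafts=3, tokens_in=1_500, tokens_out=2_000)
    print(f"all R+: ${all_plus:.4f}, drafts on R: ${mixed:.4f}")
```

This prints about $0.095 for the all-R+ route versus $0.028 for the mixed one. Output tokens account for $0.020 of each $0.02375 R+ pass, which is why long-form work is where the split pays off.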

Which Plan Do You Need?

RAG pipeline builder

Cohere's Embed + Rerank combo is arguably the best value in the market for retrieval. Embed at $0.10/1M tokens for indexing, Rerank at $2/1K searches for quality. Pair with any LLM for generation.
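A rough monthly retrieval bill for this combo has three parts: a one-time indexing pass over the corpus, embedding each incoming query, and one rerank search per query. A minimal sketch, assuming the list prices on this page, that each query reranks at most 100 short documents (one search unit), and that generation is billed separately by whichever LLM you pair it with; the workload numbers are hypothetical.

```python
EMBED_PER_1M_TOKENS = 0.10     # Embed v3, USD (from this page)
RERANK_PER_1K_SEARCHES = 2.00  # Rerank v3, USD (from this page)


def retrieval_costs(corpus_tokens: int, queries: int, tokens_per_query: int) -> dict[str, float]:
    """Split a month's retrieval spend into indexing, query embedding, and reranking.

    Assumes one rerank search per query (at most 100 short documents each).
    """
    return {
        "index_once": corpus_tokens * EMBED_PER_1M_TOKENS / 1_000_000,
        "query_embeddings": queries * tokens_per_query * EMBED_PER_1M_TOKENS / 1_000_000,
        "rerank": queries * RERANK_PER_1K_SEARCHES / 1_000,
    }


if __name__ == "__main__":
    # Hypothetical workload: 50M-token corpus, 100,000 queries/month of ~20 tokens each.
    for part, usd in retrieval_costs(50_000_000, 100_000, 20).items():
        print(f"{part}: ${usd:,.2f}")
```

With these numbers, reranking ($200.00) dwarfs embedding ($5.00 to index, $0.20 for queries), so at high query volume the per-search Rerank price is the line item to watch.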

Multilingual application

Multilingual support across Cohere's generation, embedding, and reranking models is a standout feature. If your app serves multiple languages, Command R+ handles 10+ languages without separate models. Many competing models perform noticeably worse on non-English text.

Cost-sensitive production app

Command R at $0.15/$0.60 per 1M tokens is one of the cheapest capable generation models available. It's priced the same as GPT-4o-mini, with a matching 128K context window and solid RAG performance.

Developer comparing LLM providers

Use the free trial to test Cohere against OpenAI and Anthropic on your specific tasks. Cohere's sweet spot is retrieval and multilingual work. For general chat and coding, OpenAI and Anthropic typically win.

The Bottom Line

Cohere isn't trying to be the best general-purpose LLM. Its strength is the Embed + Rerank stack for retrieval, multilingual support, and competitive pricing on generation models. If you're building RAG applications, Cohere's embedding and reranking models should be on your shortlist regardless of which LLM you use for generation.

Disclosure: Pricing information is sourced from official websites and may change. We update this page regularly but always verify current pricing on the vendor's site before purchasing.

Frequently Asked Questions

How much does Cohere cost?

It depends on the model. Command R+ (generation): $2.50/$10 per 1M tokens. Command R (cheaper generation): $0.15/$0.60 per 1M tokens. Embed v3: $0.10 per 1M tokens. Rerank v3: $2 per 1,000 searches. There's also a free trial tier with 1,000 calls/month.

Is Cohere cheaper than OpenAI?

Not across the board. For embeddings, Cohere Embed v3 at $0.10/1M tokens costs more than OpenAI's text-embedding-3-small ($0.02/1M) and slightly less than text-embedding-3-large ($0.13/1M), though Cohere's multilingual quality is generally higher. For generation, Command R ($0.15/$0.60 per 1M input/output tokens) matches GPT-4o-mini ($0.15/$0.60), and Command R+ ($2.50/$10) matches GPT-4o ($2.50/$10).
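To compare embedding bills at a given volume, multiply by each model's per-million price. A minimal sketch, assuming Cohere's $0.10/1M and OpenAI's list prices of $0.02/1M (text-embedding-3-small) and $0.13/1M (text-embedding-3-large); prices change, so treat them as placeholders.

```python
# USD per 1M tokens; list prices at time of writing, verify before use.
EMBEDDING_PRICES = {
    "Cohere Embed v3": 0.10,
    "OpenAI text-embedding-3-small": 0.02,
    "OpenAI text-embedding-3-large": 0.13,
}


def embedding_bills(tokens: int) -> list[tuple[str, float]]:
    """Return (model, USD) pairs for embedding `tokens` tokens, cheapest first."""
    bills = [(name, tokens * price / 1_000_000) for name, price in EMBEDDING_PRICES.items()]
    return sorted(bills, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, usd in embedding_bills(100_000_000):  # hypothetical 100M-token corpus
        print(f"{name}: ${usd:,.2f}")
```

For a 100M-token corpus this lists $2.00, $10.00, and $13.00, so the deciding factor is usually retrieval quality (especially multilingual) rather than the embedding bill itself.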

What is Cohere Rerank?

Rerank is a model that takes a query and a list of documents and re-orders them by relevance. It can substantially improve RAG accuracy by filtering out irrelevant retrieved documents before they reach your LLM. At $2 per 1,000 searches, it's one of the cheapest ways to improve retrieval quality.
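The filtering step happens after the Rerank call returns: each result points back to a document by its position in the input list and carries a relevance score. The sketch below starts from those results. `RerankResult` is a hypothetical stand-in for the SDK's response objects, and the `top_n` and `min_score` defaults are arbitrary values you would tune on your own data.

```python
from dataclasses import dataclass


@dataclass
class RerankResult:
    """Hypothetical stand-in for one rerank result (not the SDK's own type)."""
    index: int              # position of the document in the list sent to Rerank
    relevance_score: float  # higher means more relevant to the query


def select_context(documents: list[str], results: list[RerankResult],
                   top_n: int = 3, min_score: float = 0.5) -> list[str]:
    """Keep at most `top_n` documents, best first, dropping any below `min_score`."""
    ranked = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    return [documents[r.index] for r in ranked[:top_n] if r.relevance_score >= min_score]


if __name__ == "__main__":
    docs = ["refund policy", "office hours", "shipping times", "careers page"]
    scores = [RerankResult(0, 0.92), RerankResult(1, 0.08),
              RerankResult(2, 0.61), RerankResult(3, 0.30)]
    print(select_context(docs, scores))  # ['refund policy', 'shipping times']
```

Documents that fail the threshold never reach the LLM, which also trims input tokens on the generation side.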

Should I use Cohere or OpenAI for embeddings?

Cohere Embed v3 generally produces higher quality embeddings for multilingual and retrieval use cases. OpenAI's text-embedding-3 is simpler to integrate if you're already using the OpenAI API. For English-only, the quality difference is small. For multilingual, Cohere wins.

Does Cohere have a free tier?

Cohere's trial tier is free, with limits of 1,000 API calls per month and 100 per minute, and it's restricted to non-production use. There's no permanent free tier for production applications; once you're past evaluation, you move to pay-as-you-go pricing.