LLM API pricing changes constantly. New models launch, old ones get cheaper, and providers quietly adjust rates between announcements. This page tracks every major model's cost per million tokens, updated for April 2026.
Whether you're picking a model for a new project, budgeting compute costs, or comparing providers for a procurement decision, the tables below give you the numbers you need without digging through five different pricing pages.
Master Pricing Table: All Major LLM APIs (April 2026)
Prices are per 1 million tokens. Input is what you send to the model; output is what the model generates back. Every provider charges more for output than for input, because generated tokens are produced one at a time and cost more compute per token than prompt tokens, which are processed in parallel.
| Provider | Model | Input / 1M Tokens | Output / 1M Tokens | Context Window |
|---|---|---|---|---|
| OpenAI | GPT-5 | $1.25 | $10.00 | 128K |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M |
| OpenAI | GPT-4.1 Mini | $0.40 | $1.60 | 1M |
| OpenAI | GPT-4.1 Nano | $0.10 | $0.40 | 1M |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | GPT-4o Mini | $0.15 | $0.60 | 128K |
| OpenAI | o3 | $10.00 | $40.00 | 200K |
| OpenAI | o4-mini | $1.10 | $4.40 | 200K |
| Anthropic | Claude Opus 4.6 | $15.00 | $75.00 | 1M |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Anthropic | Claude Haiku 4.5 | $0.80 | $4.00 | 200K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Mistral | Mistral Large | $2.00 | $6.00 | 128K |
| Mistral | Mistral Small | $0.10 | $0.30 | 128K |
| Cohere | Command R+ | $2.50 | $10.00 | 128K |
| Cohere | Command R | $0.15 | $0.60 | 128K |
Note on Gemini 2.5 Pro: Google charges $1.25/$10 for prompts over 200K tokens. Under 200K, input drops to $0.625 and output to $5.00. The table shows the higher tier since most production use cases hit the 200K+ range with system prompts and context.
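The tiered pricing above is easy to get wrong in a budget spreadsheet, so here is a minimal sketch of how the tier break works, assuming (as Google's published tiers suggest) that the rate applies to the whole prompt rather than marginally:

```python
def gemini_25_pro_input_cost(prompt_tokens: int) -> float:
    """Input cost in USD for one Gemini 2.5 Pro request.

    Rates from the note above: $0.625/1M input under the 200K-token
    threshold, $1.25/1M above it. Assumption: the tier applies to the
    entire prompt, not just the tokens past the threshold.
    """
    rate = 0.625 if prompt_tokens <= 200_000 else 1.25
    return prompt_tokens * rate / 1_000_000

# A 150K-token prompt bills at the lower tier; 300K at the higher one.
print(gemini_25_pro_input_cost(150_000))  # 0.09375
print(gemini_25_pro_input_cost(300_000))  # 0.375
```

Note the jump at the threshold: a prompt just over 200K tokens costs twice as much per token as one just under it.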
Models by Budget Tier
Not every project needs a frontier model. Here is how models break down by cost, so you can match your budget to the right capability level.
Under $1 per 1M Input Tokens (Budget Tier)
These models handle classification, extraction, summarization, and simple chat at rock-bottom prices.
| Model | Input / 1M | Output / 1M | Best For |
|---|---|---|---|
| GPT-4.1 Nano | $0.10 | $0.40 | High-volume classification, simple extraction |
| Gemini 2.0 Flash | $0.10 | $0.40 | Fast inference, multimodal on a budget |
| Mistral Small | $0.10 | $0.30 | Lightweight European-hosted tasks |
| GPT-4o Mini | $0.15 | $0.60 | General-purpose cheap model |
| Command R | $0.15 | $0.60 | RAG-optimized retrieval tasks |
| GPT-4.1 Mini | $0.40 | $1.60 | Coding and instruction following on a budget |
| Claude Haiku 4.5 | $0.80 | $4.00 | Fast responses, customer-facing chat |
$1 to $5 per 1M Input Tokens (Mid-Range)
The sweet spot for most production applications. These models handle complex reasoning, coding, and multi-step tasks reliably.
| Model | Input / 1M | Output / 1M | Best For |
|---|---|---|---|
| o4-mini | $1.10 | $4.40 | Reasoning tasks at mid-range cost |
| GPT-5 | $1.25 | $10.00 | Frontier general intelligence |
| Gemini 2.5 Pro | $1.25 | $10.00 | Long-context analysis, multimodal |
| GPT-4.1 | $2.00 | $8.00 | Coding, long-context, instruction following |
| Mistral Large | $2.00 | $6.00 | European data residency, multilingual |
| GPT-4o | $2.50 | $10.00 | Multimodal (vision + text) |
| Command R+ | $2.50 | $10.00 | Enterprise RAG, grounded generation |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Coding, analysis, agentic workflows |
$5+ per 1M Input Tokens (Premium)
| Model | Input / 1M | Output / 1M | Best For |
|---|---|---|---|
| o3 | $10.00 | $40.00 | Hard reasoning, math, science problems |
| Claude Opus 4.6 | $15.00 | $75.00 | Complex agentic tasks, deep analysis |
Premium models are rarely needed for production workloads. Use them for difficult reasoning tasks, complex code generation, or when accuracy on edge cases justifies paying 4-17x more than mid-range options (Opus 4.6 runs about 5x Sonnet 4.6 on both directions, and up to 17x o4-mini on output).
Batch API Discounts
If your workload can tolerate latency (minutes to hours instead of seconds), batch APIs cut costs significantly.
| Provider | Batch Discount | Typical Latency | How It Works |
|---|---|---|---|
| OpenAI | 50% off all models | Up to 24 hours | Submit JSONL file, results returned asynchronously. Available for all GPT and o-series models. |
| Anthropic | 50% off all models | Up to 24 hours | Message Batches API. Submit up to 100,000 requests per batch. Results within 24 hours. |
| Google | 50% off Gemini models | Up to 24 hours | BatchGenerateContent API. 50% discount on all Gemini models through Vertex AI. |
| Mistral | Variable | Varies | Batch inference available through La Plateforme. Discount varies by volume commitment. |
With batch pricing, GPT-4.1 drops to $1.00 input / $4.00 output per million tokens. Claude Sonnet 4.6 drops to $1.50 / $7.50. These are significant savings for data processing pipelines, evaluation runs, and content generation at scale.
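The batch math above is just a flat multiplier, which a tiny helper makes explicit (a sketch assuming the 50% discount applies uniformly to both input and output, as the table states for OpenAI and Anthropic):

```python
def batch_price(input_per_m: float, output_per_m: float,
                discount: float = 0.5) -> tuple[float, float]:
    """Apply a flat batch discount to standard per-1M-token rates."""
    return input_per_m * (1 - discount), output_per_m * (1 - discount)

print(batch_price(2.00, 8.00))   # GPT-4.1 batch rates: (1.0, 4.0)
print(batch_price(3.00, 15.00))  # Claude Sonnet 4.6 batch rates: (1.5, 7.5)
```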
Prompt Caching
Prompt caching reduces costs when you send the same system prompt or context prefix repeatedly. Instead of reprocessing identical tokens every call, the provider caches them and charges a reduced rate.
| Provider | Cache Write Cost | Cache Read Discount | TTL | Min Tokens |
|---|---|---|---|---|
| OpenAI | Free (automatic) | 50% off input | 5-10 min | 1,024 |
| Anthropic | 25% surcharge on first write | 90% off input | 5 min (refreshes on hit) | 1,024 (Haiku), 2,048 (Sonnet/Opus) |
| Google | Same as input | 75% off input | Configurable | 32,768 |
Anthropic's caching is the most aggressive: 90% off cached input tokens means a long system prompt that costs $3.00/1M on Sonnet 4.6 drops to $0.30/1M on cache hits. The 25% write surcharge pays for itself after just a few requests. OpenAI's caching is automatic (no code changes needed) but gives a smaller discount. Google requires the most tokens before caching kicks in but offers configurable TTL.
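To see how quickly the write surcharge amortizes, here is a rough model of effective input price with caching. It is a simplification under stated assumptions: the cached prefix is written once (with the surcharge) and read on every subsequent request, defaults match Anthropic's numbers from the table:

```python
def cached_input_cost_per_m(base: float, cached_fraction: float,
                            n_requests: int,
                            write_surcharge: float = 0.25,
                            read_discount: float = 0.90) -> float:
    """Average effective input price per 1M tokens over n_requests.

    base            -- standard input price per 1M tokens
    cached_fraction -- share of each request's input that is the cached prefix
    Assumes one cache write on the first request, cache hits thereafter.
    """
    first = base * (1 + write_surcharge)           # cache write on request 1
    later = base * (1 - read_discount)             # cache reads afterward
    cached_avg = (first + later * (n_requests - 1)) / n_requests
    return cached_fraction * cached_avg + (1 - cached_fraction) * base

# Sonnet 4.6 ($3/1M), 90% of input cached, 100 requests:
print(cached_input_cost_per_m(3.00, 0.9, 100))  # ~0.60/1M effective vs $3.00 uncached
```

Even with the 25% write surcharge, the effective rate drops roughly 80% once the prefix is reused across a hundred requests.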
Cost Per 1K Tokens (Conversion Table)
Some documentation and older pricing pages still reference cost per 1,000 tokens. To convert: divide the per-1M price by 1,000.
| Model | Input / 1K Tokens | Output / 1K Tokens |
|---|---|---|
| GPT-4.1 Nano | $0.0001 | $0.0004 |
| Gemini 2.0 Flash | $0.0001 | $0.0004 |
| GPT-4o Mini | $0.00015 | $0.0006 |
| GPT-4.1 Mini | $0.0004 | $0.0016 |
| Claude Haiku 4.5 | $0.0008 | $0.004 |
| GPT-5 | $0.00125 | $0.01 |
| GPT-4.1 | $0.002 | $0.008 |
| GPT-4o | $0.0025 | $0.01 |
| Claude Sonnet 4.6 | $0.003 | $0.015 |
| o3 | $0.01 | $0.04 |
| Claude Opus 4.6 | $0.015 | $0.075 |
Per-1K pricing looks deceptively cheap. Always multiply by 1,000 to understand real costs at scale. A chatbot handling 1 million tokens per day at $0.002/1K input costs $2/day or $60/month just for input tokens.
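The conversion is a single multiply, sketched here for clarity:

```python
def per_1k_to_per_1m(price_per_1k: float) -> float:
    """Scale a per-1K quote up to per-1M terms for apples-to-apples comparison."""
    return price_per_1k * 1_000

# $0.002/1K input (GPT-4.1) is $2.00/1M, so 1M input tokens/day costs $2/day.
print(per_1k_to_per_1m(0.002))  # 2.0
```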
How to Estimate Your Monthly API Costs
Use this formula to budget your LLM spend before committing to a provider.
Monthly Cost = [(Daily Requests x Avg Input Tokens x Input Price/1M) + (Daily Requests x Avg Output Tokens x Output Price/1M)] x 30
Example 1: Customer Support Chatbot
- 500 conversations/day, 800 input tokens avg (system prompt + user message), 400 output tokens avg
- Using Claude Sonnet 4.6 ($3/$15 per 1M)
- Input: 500 x 800 = 400,000 tokens/day = $1.20/day
- Output: 500 x 400 = 200,000 tokens/day = $3.00/day
- Monthly: ($1.20 + $3.00) x 30 = $126/month
Example 2: Document Processing Pipeline
- 200 documents/day, 5,000 input tokens avg (document + extraction prompt), 500 output tokens avg
- Using GPT-4.1 Mini ($0.40/$1.60 per 1M)
- Input: 200 x 5,000 = 1,000,000 tokens/day = $0.40/day
- Output: 200 x 500 = 100,000 tokens/day = $0.16/day
- Monthly: ($0.40 + $0.16) x 30 = $16.80/month
Example 3: High-Volume Classification
- 50,000 items/day, 200 input tokens avg, 50 output tokens avg
- Using GPT-4.1 Nano ($0.10/$0.40 per 1M)
- Input: 50,000 x 200 = 10,000,000 tokens/day = $1.00/day
- Output: 50,000 x 50 = 2,500,000 tokens/day = $1.00/day
- Monthly: ($1.00 + $1.00) x 30 = $60/month
These estimates assume no caching or batching. With prompt caching on a chatbot (where the system prompt repeats), expect 30-60% lower input costs. With batch API, cut both input and output costs in half.
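The formula and the three worked examples above translate directly into a small estimator you can adapt (prices are per 1M tokens, matching the tables on this page):

```python
def monthly_cost(daily_requests: int, avg_in: int, avg_out: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Monthly spend in USD: daily input cost plus daily output cost, times days."""
    daily = (daily_requests * avg_in * in_price
             + daily_requests * avg_out * out_price) / 1_000_000
    return daily * days

# Example 1: support chatbot on Claude Sonnet 4.6 ($3/$15)
print(round(monthly_cost(500, 800, 400, 3.00, 15.00), 2))    # 126.0
# Example 2: document pipeline on GPT-4.1 Mini ($0.40/$1.60)
print(round(monthly_cost(200, 5_000, 500, 0.40, 1.60), 2))   # 16.8
# Example 3: classification on GPT-4.1 Nano ($0.10/$0.40)
print(round(monthly_cost(50_000, 200, 50, 0.10, 0.40), 2))   # 60.0
```

Swapping in batch rates (halve both prices) or a cache-adjusted input price lets you compare deployment options with the same function.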
Provider Comparison by Use Case
Cheapest for High-Volume Chatbots
Winner: GPT-4.1 Nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40). Both cost the same and handle conversational tasks well. Gemini Flash has the edge for multimodal inputs (images in chat). GPT-4.1 Nano has stronger instruction following for structured system prompts. Mistral Small ($0.10/$0.30) is cheapest on output if you need European data residency.
Best for Coding Assistants
Winner: Claude Sonnet 4.6 ($3/$15). Consistently top-ranked on coding benchmarks. GPT-4.1 ($2/$8) is a strong alternative at lower cost, especially for its 1M context window that fits entire codebases. For budget coding, GPT-4.1 Mini ($0.40/$1.60) punches well above its price.
Best for Complex Reasoning
Winner: o3 ($10/$40) for math-heavy and scientific reasoning. Claude Opus 4.6 ($15/$75) for nuanced analysis and agentic multi-step tasks. These are premium models for premium problems. For most reasoning tasks, Claude Sonnet 4.6 or GPT-5 at a fraction of the cost will be sufficient.
Best for RAG and Retrieval
Winner: Command R+ ($2.50/$10). Cohere built Command R+ specifically for retrieval-augmented generation with built-in citation support. Google Gemini 2.5 Pro is the alternative when you need a massive context window (1M tokens) to stuff retrieved documents into a single prompt.
Best for Enterprise with Data Residency Requirements
Winner: Mistral Large ($2/$6). Hosted in Europe, strong multilingual performance, and competitive pricing. Mistral is the default choice when GDPR compliance and data residency are non-negotiable.
Frequently Asked Questions
What is the cheapest LLM API?
As of April 2026, the cheapest LLM APIs are GPT-4.1 Nano, Gemini 2.0 Flash, and Mistral Small, all at $0.10 per million input tokens. Mistral Small edges ahead on output cost at $0.30/1M vs $0.40/1M for the other two. For batch workloads, GPT-4.1 Nano with the 50% batch discount drops to $0.05/$0.20 per million tokens, making it the absolute cheapest option for asynchronous processing.
How much does GPT-4.1 cost per token?
GPT-4.1 costs $2.00 per million input tokens and $8.00 per million output tokens. That works out to $0.000002 per input token and $0.000008 per output token. With the OpenAI Batch API (50% discount), those drop to $1.00/$4.00 per million. With prompt caching (automatic, 50% off cached tokens), repeated system prompts cost $1.00 per million cached input tokens.
How do LLM API prices compare to self-hosting?
Self-hosting open-source models (Llama 3, Mistral, etc.) on your own GPUs costs roughly $1-3 per GPU-hour on cloud providers. At high volume (millions of tokens per day), self-hosting can be 50-80% cheaper than API pricing. At low to moderate volume, APIs are almost always cheaper because you avoid idle GPU costs, infrastructure management, and the engineering overhead of running inference servers. The break-even point is typically around 10-50 million tokens per day, depending on the model size and hardware choice.
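A back-of-envelope break-even check makes the trade-off concrete. This is a sketch under loud assumptions: a flat illustrative GPU rate, 24/7 utilization, and a single blended API price per 1M tokens (it ignores the input/output split, throughput limits, and engineering overhead):

```python
def breakeven_tokens_per_day(api_price_per_m: float,
                             gpu_cost_per_hour: float = 2.0,
                             gpus: int = 1) -> float:
    """Daily token volume at which 24/7 self-hosted GPU cost equals API spend.

    gpu_cost_per_hour and gpus are illustrative assumptions, not
    provider figures; real break-even also depends on achievable
    throughput on your hardware.
    """
    daily_gpu_cost = gpu_cost_per_hour * 24 * gpus
    return daily_gpu_cost / api_price_per_m * 1_000_000

# One $2/hour GPU vs. a $2/1M-token API rate breaks even at 24M tokens/day.
print(breakeven_tokens_per_day(2.00))  # 24000000.0
```

That lands inside the 10-50M tokens/day range cited above; cheaper API tiers push the break-even point much higher.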
What's the difference between input and output token pricing?
Input tokens are what you send to the model: your system prompt, user message, uploaded documents, and any context. Output tokens are what the model generates in response. Output tokens typically cost 3-5x more than input tokens (up to 8x for models like GPT-5 and Gemini 2.5 Pro) because generating each output token requires a full forward pass through the model, while input tokens can be processed in parallel. This is why long system prompts with short responses are relatively cheap, while asking a model to write a 5,000-word essay gets expensive fast.
Which LLM API has the best free tier?
Google offers the most generous free tier through Google AI Studio: Gemini 2.0 Flash is free up to 15 requests per minute with generous daily limits. OpenAI offers limited free credits for new accounts. Anthropic provides free access through claude.ai but no free API tier. Mistral offers a free tier on La Plateforme with rate limits. For serious development and testing, Google's free Gemini access is the clear winner.
How often do LLM API prices change?
Prices have been trending down 30-50% per year since 2023. Major price drops usually happen when providers release new model generations (the old model gets cheaper or the new model matches performance at lower cost). OpenAI and Google have been the most aggressive on price cuts. Anthropic tends to hold pricing longer but offers batch and caching discounts. Expect at least 2-3 significant pricing changes per provider per year.