LLM API Pricing Comparison 2026 - Every Model Cost Per Token

By Rome Thorndike · April 2, 2026 · 12 min read

LLM API pricing changes constantly. New models launch, old ones get cheaper, and providers quietly adjust rates between announcements. This page tracks every major model's cost per million tokens, updated for April 2026.

Whether you're picking a model for a new project, budgeting compute costs, or comparing providers for a procurement decision, the tables below give you the numbers you need without digging through five different pricing pages.

Master Pricing Table: All Major LLM APIs (April 2026)

Prices are per 1 million tokens. Input is what you send to the model; output is what the model generates back. Every provider charges more for output than for input because output tokens cost more compute to generate.

| Provider | Model | Input / 1M Tokens | Output / 1M Tokens | Context Window |
|-----------|-------------------|--------|--------|------|
| OpenAI | GPT-5 | $1.25 | $10.00 | 128K |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M |
| OpenAI | GPT-4.1 Mini | $0.40 | $1.60 | 1M |
| OpenAI | GPT-4.1 Nano | $0.10 | $0.40 | 1M |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | GPT-4o Mini | $0.15 | $0.60 | 128K |
| OpenAI | o3 | $10.00 | $40.00 | 200K |
| OpenAI | o4-mini | $1.10 | $4.40 | 200K |
| Anthropic | Claude Opus 4.6 | $15.00 | $75.00 | 1M |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Anthropic | Claude Haiku 4.5 | $0.80 | $4.00 | 200K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Mistral | Mistral Large | $2.00 | $6.00 | 128K |
| Mistral | Mistral Small | $0.10 | $0.30 | 128K |
| Cohere | Command R+ | $2.50 | $10.00 | 128K |
| Cohere | Command R | $0.15 | $0.60 | 128K |

Note on Gemini 2.5 Pro: Google charges $1.25/$10 for prompts over 200K tokens. Under 200K, input drops to $0.625 and output to $5.00. The table shows the higher tier since most production use cases hit the 200K+ range with system prompts and context.

Models by Budget Tier

Not every project needs a frontier model. Here is how models break down by cost, so you can match your budget to the right capability level.

Under $1 per 1M Input Tokens (Budget Tier)

These models handle classification, extraction, summarization, and simple chat at rock-bottom prices.

| Model | Input / 1M | Output / 1M | Best For |
|-------------------|-------|-------|------------------------------------------------|
| GPT-4.1 Nano | $0.10 | $0.40 | High-volume classification, simple extraction |
| Gemini 2.0 Flash | $0.10 | $0.40 | Fast inference, multimodal on a budget |
| Mistral Small | $0.10 | $0.30 | Lightweight European-hosted tasks |
| GPT-4o Mini | $0.15 | $0.60 | General-purpose cheap model |
| Command R | $0.15 | $0.60 | RAG-optimized retrieval tasks |
| GPT-4.1 Mini | $0.40 | $1.60 | Coding and instruction following on a budget |
| Claude Haiku 4.5 | $0.80 | $4.00 | Fast responses, customer-facing chat |

$1 to $5 per 1M Input Tokens (Mid-Range)

The sweet spot for most production applications. These models handle complex reasoning, coding, and multi-step tasks reliably.

| Model | Input / 1M | Output / 1M | Best For |
|-------------------|-------|--------|---------------------------------------------|
| o4-mini | $1.10 | $4.40 | Reasoning tasks at mid-range cost |
| GPT-5 | $1.25 | $10.00 | Frontier general intelligence |
| Gemini 2.5 Pro | $1.25 | $10.00 | Long-context analysis, multimodal |
| GPT-4.1 | $2.00 | $8.00 | Coding, long-context, instruction following |
| Mistral Large | $2.00 | $6.00 | European data residency, multilingual |
| GPT-4o | $2.50 | $10.00 | Multimodal (vision + text) |
| Command R+ | $2.50 | $10.00 | Enterprise RAG, grounded generation |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Coding, analysis, agentic workflows |

$5+ per 1M Input Tokens (Premium)

| Model | Input / 1M | Output / 1M | Best For |
|-----------------|--------|--------|----------------------------------------|
| o3 | $10.00 | $40.00 | Hard reasoning, math, science problems |
| Claude Opus 4.6 | $15.00 | $75.00 | Complex agentic tasks, deep analysis |

Premium models are rarely needed for production workloads. Use them for difficult reasoning tasks, complex code generation, or when accuracy on edge cases justifies the 10-50x cost increase over mid-range options.

Batch API Discounts

If your workload can tolerate latency (minutes to hours instead of seconds), batch APIs cut costs significantly.

| Provider | Batch Discount | Typical Latency | How It Works |
|-----------|------------------------|-----------------|--------------|
| OpenAI | 50% off all models | Up to 24 hours | Submit a JSONL file; results are returned asynchronously. Available for all GPT and o-series models. |
| Anthropic | 50% off all models | Up to 24 hours | Message Batches API. Submit up to 100,000 requests per batch; results within 24 hours. |
| Google | 50% off Gemini models | Up to 24 hours | BatchGenerateContent API. 50% discount on all Gemini models through Vertex AI. |
| Mistral | Variable | Varies | Batch inference available through La Plateforme; discount varies by volume commitment. |

With batch pricing, GPT-4.1 drops to $1.00 input / $4.00 output per million tokens. Claude Sonnet 4.6 drops to $1.50 / $7.50. These are significant savings for data processing pipelines, evaluation runs, and content generation at scale.
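For OpenAI, submitting a batch means uploading a JSONL file with one request per line. Here is a minimal sketch of building that file; the field names follow OpenAI's published batch format, but verify against the current docs before submitting, and the prompts are placeholders:

```python
import json

def build_batch_file(prompts, model="gpt-4.1-mini", path="batch_input.jsonl"):
    """Write one JSONL request line per prompt for the OpenAI Batch API.

    Field names (custom_id, method, url, body) follow OpenAI's
    published batch format; double-check against the current docs.
    """
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"request-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(line) + "\n")
    return path

# Placeholder classification prompts
build_batch_file(["Classify: 'great product'", "Classify: 'broke in a week'"])
```

The file is then uploaded and referenced when creating the batch job; results come back as another JSONL file keyed by `custom_id`.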

Prompt Caching

Prompt caching reduces costs when you send the same system prompt or context prefix repeatedly. Instead of reprocessing identical tokens every call, the provider caches them and charges a reduced rate.

| Provider | Cache Write Cost | Cache Read Discount | TTL | Min Tokens |
|-----------|------------------------------|---------------------|--------------------------|------------|
| OpenAI | Free (automatic) | 50% off input | 5-10 min | 1,024 |
| Anthropic | 25% surcharge on first write | 90% off input | 5 min (refreshes on hit) | 1,024 (Haiku), 2,048 (Sonnet/Opus) |
| Google | Same as input | 75% off input | Configurable | 32,768 |

Anthropic's caching is the most aggressive: 90% off cached input tokens means a long system prompt that costs $3.00/1M on Sonnet 4.6 drops to $0.30/1M on cache hits. The 25% write surcharge pays for itself after just a few requests. OpenAI's caching is automatic (no code changes needed) but gives a smaller discount. Google requires the most tokens before caching kicks in but offers configurable TTL.
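The break-even claim is easy to sanity-check. The sketch below is a back-of-envelope comparison using the surcharge and discount figures from the table above, not an official calculator, and it assumes every request after the first is a cache hit:

```python
def cached_vs_uncached(n_requests, prompt_tokens, input_price_per_m,
                       write_surcharge=0.25, read_discount=0.90):
    """Cost of a repeated prompt prefix with vs. without caching.

    Defaults use Anthropic's figures from the table: 25% write
    surcharge on the first request, 90% off on every cache hit.
    """
    base = prompt_tokens * input_price_per_m / 1_000_000
    uncached = n_requests * base
    cached = base * (1 + write_surcharge) + (n_requests - 1) * base * (1 - read_discount)
    return uncached, cached

# 10,000-token system prompt on Claude Sonnet 4.6 ($3.00/1M input)
for n in (1, 2, 10):
    uncached, cached = cached_vs_uncached(n, 10_000, 3.00)
    print(n, round(uncached, 4), round(cached, 4))
```

Under these assumptions caching already costs less by the second request, which matches the "pays for itself after just a few requests" claim.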

Cost Per 1K Tokens (Conversion Table)

Some documentation and older pricing pages still reference cost per 1,000 tokens. To convert: divide the per-1M price by 1,000.

| Model | Input / 1K Tokens | Output / 1K Tokens |
|-------------------|----------|----------|
| GPT-4.1 Nano | $0.0001 | $0.0004 |
| Gemini 2.0 Flash | $0.0001 | $0.0004 |
| GPT-4o Mini | $0.00015 | $0.0006 |
| GPT-4.1 Mini | $0.0004 | $0.0016 |
| Claude Haiku 4.5 | $0.0008 | $0.004 |
| GPT-5 | $0.00125 | $0.01 |
| GPT-4.1 | $0.002 | $0.008 |
| GPT-4o | $0.0025 | $0.01 |
| Claude Sonnet 4.6 | $0.003 | $0.015 |
| o3 | $0.01 | $0.04 |
| Claude Opus 4.6 | $0.015 | $0.075 |

Per-1K pricing looks deceptively cheap. Always multiply by 1,000 to understand real costs at scale. A chatbot handling 1 million tokens per day at $0.002/1K input costs $2/day or $60/month just for input tokens.

How to Estimate Your Monthly API Costs

Use this formula to budget your LLM spend before committing to a provider.

Monthly Cost = [(Daily Requests x Avg Input Tokens x Input Price/1M) + (Daily Requests x Avg Output Tokens x Output Price/1M)] x 30
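The formula translates directly into a few lines of Python. This is a minimal budgeting helper, not provider code; the numbers plugged in below are from Example 1:

```python
def monthly_cost(daily_requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m, days=30):
    """Estimate monthly API spend from per-1M-token prices."""
    daily_input_cost = daily_requests * avg_input_tokens * input_price_per_m / 1_000_000
    daily_output_cost = daily_requests * avg_output_tokens * output_price_per_m / 1_000_000
    return (daily_input_cost + daily_output_cost) * days

# Claude Sonnet 4.6 chatbot: 500 conversations/day, 800 in / 400 out tokens
print(round(monthly_cost(500, 800, 400, 3.00, 15.00), 2))  # → 126.0
```

Swap in your own traffic numbers and any model's prices from the master table to compare providers side by side.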

Example 1: Customer Support Chatbot

  • 500 conversations/day, 800 input tokens avg (system prompt + user message), 400 output tokens avg
  • Using Claude Sonnet 4.6 ($3/$15 per 1M)
  • Input: 500 x 800 = 400,000 tokens/day = $1.20/day
  • Output: 500 x 400 = 200,000 tokens/day = $3.00/day
  • Monthly: ($1.20 + $3.00) x 30 = $126/month

Example 2: Document Processing Pipeline

  • 200 documents/day, 5,000 input tokens avg (document + extraction prompt), 500 output tokens avg
  • Using GPT-4.1 Mini ($0.40/$1.60 per 1M)
  • Input: 200 x 5,000 = 1,000,000 tokens/day = $0.40/day
  • Output: 200 x 500 = 100,000 tokens/day = $0.16/day
  • Monthly: ($0.40 + $0.16) x 30 = $16.80/month

Example 3: High-Volume Classification

  • 50,000 items/day, 200 input tokens avg, 50 output tokens avg
  • Using GPT-4.1 Nano ($0.10/$0.40 per 1M)
  • Input: 50,000 x 200 = 10,000,000 tokens/day = $1.00/day
  • Output: 50,000 x 50 = 2,500,000 tokens/day = $1.00/day
  • Monthly: ($1.00 + $1.00) x 30 = $60/month

These estimates assume no caching or batching. With prompt caching on a chatbot (where the system prompt repeats), expect 30-60% lower input costs. With batch API, cut both input and output costs in half.
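To see where the caching savings come from, a blended input price can be estimated from the fraction of each prompt that hits the cache. This sketch uses Anthropic's 90% read discount from the caching table and ignores the one-time write surcharge, which amortizes to near zero over a day of traffic:

```python
def effective_input_price(input_price_per_m, cached_fraction, read_discount=0.90):
    """Blended per-1M input price when part of each prompt is cached.

    Ignores the cache-write surcharge (negligible once amortized);
    read_discount defaults to Anthropic's 90% figure.
    """
    return input_price_per_m * ((1 - cached_fraction) + cached_fraction * (1 - read_discount))

# Example 1's chatbot: 500 of 800 input tokens are a repeated system prompt
price = effective_input_price(3.00, 500 / 800)
print(round(price, 4))  # → 1.3125 ($/1M, down from $3.00)
```

That is a 56% cut to input costs, squarely inside the 30-60% range quoted above.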

Provider Comparison by Use Case

Cheapest for High-Volume Chatbots

Winner: GPT-4.1 Nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40). Both cost the same and handle conversational tasks well. Gemini Flash has the edge for multimodal inputs (images in chat). GPT-4.1 Nano has stronger instruction following for structured system prompts. Mistral Small ($0.10/$0.30) is cheapest on output if you need European data residency.

Best for Coding Assistants

Winner: Claude Sonnet 4.6 ($3/$15). Consistently top-ranked on coding benchmarks. GPT-4.1 ($2/$8) is a strong alternative at lower cost, especially for its 1M context window that fits entire codebases. For budget coding, GPT-4.1 Mini ($0.40/$1.60) punches well above its price.

Best for Complex Reasoning

Winner: o3 ($10/$40) for math-heavy and scientific reasoning. Claude Opus 4.6 ($15/$75) for nuanced analysis and agentic multi-step tasks. These are premium models for premium problems. For most reasoning tasks, Claude Sonnet 4.6 or GPT-5 at a fraction of the cost will be sufficient.

Best for RAG and Retrieval

Winner: Command R+ ($2.50/$10). Cohere built Command R+ specifically for retrieval-augmented generation with built-in citation support. Google Gemini 2.5 Pro is the alternative when you need a massive context window (1M tokens) to stuff retrieved documents into a single prompt.

Best for Enterprise with Data Residency Requirements

Winner: Mistral Large ($2/$6). Hosted in Europe, strong multilingual performance, and competitive pricing. Mistral is the default choice when GDPR compliance and data residency are non-negotiable.

Frequently Asked Questions

What is the cheapest LLM API?

As of April 2026, the cheapest LLM APIs are GPT-4.1 Nano, Gemini 2.0 Flash, and Mistral Small, all at $0.10 per million input tokens. Mistral Small edges ahead on output cost at $0.30/1M vs $0.40/1M for the other two. For batch workloads, GPT-4.1 Nano with the 50% batch discount drops to $0.05/$0.20 per million tokens, making it the absolute cheapest option for asynchronous processing.

How much does GPT-4.1 cost per token?

GPT-4.1 costs $2.00 per million input tokens and $8.00 per million output tokens. That works out to $0.000002 per input token and $0.000008 per output token. With the OpenAI Batch API (50% discount), those drop to $1.00/$4.00 per million. With prompt caching (automatic, 50% off cached tokens), repeated system prompts cost $1.00 per million cached input tokens.

How do LLM API prices compare to self-hosting?

Self-hosting open-source models (Llama 3, Mistral, etc.) on your own GPUs costs roughly $1-3 per GPU-hour on cloud providers. At high volume (millions of tokens per day), self-hosting can be 50-80% cheaper than API pricing. At low to moderate volume, APIs are almost always cheaper because you avoid idle GPU costs, infrastructure management, and the engineering overhead of running inference servers. The break-even point is typically around 10-50 million tokens per day, depending on the model size and hardware choice.
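The break-even figure can be sanity-checked with rough arithmetic. Everything in this sketch is an illustrative assumption (the GPU rate, the blended API price), and it deliberately ignores utilization gaps and the input/output price split:

```python
def breakeven_tokens_per_day(gpu_cost_per_hour, api_price_per_m_tokens):
    """Daily token volume at which a 24/7 GPU matches API spend.

    Back-of-envelope only: assumes full utilization and a single
    blended API price per 1M tokens.
    """
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_price_per_m_tokens * 1_000_000

# $2/GPU-hour vs. an assumed $1.00/1M blended API price
print(breakeven_tokens_per_day(2.00, 1.00))  # → 48000000.0 tokens/day
```

A 48M-token/day break-even under these assumptions lands inside the 10-50M range quoted above; cheaper GPUs or pricier API models pull it lower.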

What's the difference between input and output token pricing?

Input tokens are what you send to the model: your system prompt, user message, uploaded documents, and any context. Output tokens are what the model generates in response. Output tokens cost 2-5x more than input tokens because generating each output token requires a full forward pass through the model, while input tokens can be processed in parallel. This is why long system prompts with short responses are relatively cheap, while asking a model to write a 5,000-word essay gets expensive fast.

Which LLM API has the best free tier?

Google offers the most generous free tier through Google AI Studio: Gemini 2.0 Flash is free up to 15 requests per minute with generous daily limits. OpenAI offers limited free credits for new accounts. Anthropic provides free access through claude.ai but no free API tier. Mistral offers a free tier on La Plateforme with rate limits. For serious development and testing, Google's free Gemini access is the clear winner.

How often do LLM API prices change?

Prices have been trending down 30-50% per year since 2023. Major price drops usually happen when providers release new model generations (the old model gets cheaper or the new model matches performance at lower cost). OpenAI and Google have been the most aggressive on price cuts. Anthropic tends to hold pricing longer but offers batch and caching discounts. Expect at least 2-3 significant pricing changes per provider per year.

About the Author

Rome Thorndike is the founder of the Prompt Engineer Collective, a community of over 1,300 prompt engineering professionals, and author of The AI News Digest, a weekly newsletter with 2,700+ subscribers. Rome brings hands-on AI/ML experience from Microsoft, where he worked with Dynamics and Azure AI/ML solutions, and later led sales at Datajoy (acquired by Databricks).
