LLM API Pricing Comparison 2026 - Every Model Cost Per Token

By Rome Thorndike · April 2, 2026 · 12 min read

LLM API pricing changes constantly. New models launch, old ones get cheaper, and providers quietly adjust rates between announcements. This page tracks every major model's cost per million tokens, updated for April 2026.

Whether you're picking a model for a new project, budgeting compute costs, or comparing providers for a procurement decision, the tables below give you the numbers you need without digging through five different pricing pages.

Master Pricing Table: All Major LLM APIs (April 2026)

Prices are per 1 million tokens. Input is what you send to the model; output is what the model generates back. Every provider charges more for output than for input because output tokens cost more compute to generate.

| Provider | Model | Input / 1M Tokens | Output / 1M Tokens | Context Window |
|-----------|-------------------|--------|--------|------|
| OpenAI | GPT-5 | $1.25 | $10.00 | 128K |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M |
| OpenAI | GPT-4.1 Mini | $0.40 | $1.60 | 1M |
| OpenAI | GPT-4.1 Nano | $0.10 | $0.40 | 1M |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | GPT-4o Mini | $0.15 | $0.60 | 128K |
| OpenAI | o3 | $10.00 | $40.00 | 200K |
| OpenAI | o4-mini | $1.10 | $4.40 | 200K |
| Anthropic | Claude Opus 4.6 | $15.00 | $75.00 | 1M |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Anthropic | Claude Haiku 4.5 | $0.80 | $4.00 | 200K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Mistral | Mistral Large | $2.00 | $6.00 | 128K |
| Mistral | Mistral Small | $0.10 | $0.30 | 128K |
| Cohere | Command R+ | $2.50 | $10.00 | 128K |
| Cohere | Command R | $0.15 | $0.60 | 128K |

Note on Gemini 2.5 Pro: Google charges $1.25/$10 for prompts over 200K tokens. Under 200K, input drops to $0.625 and output to $5.00. The table shows the higher tier since most production use cases hit the 200K+ range with system prompts and context.

Models by Budget Tier

Not every project needs a frontier model. Here is how models break down by cost, so you can match your budget to the right capability level.

Under $1 per 1M Input Tokens (Budget Tier)

These models handle classification, extraction, summarization, and simple chat at rock-bottom prices.

| Model | Input / 1M | Output / 1M | Best For |
|-------------------|-------|-------|------------------------------------------------|
| GPT-4.1 Nano | $0.10 | $0.40 | High-volume classification, simple extraction |
| Gemini 2.0 Flash | $0.10 | $0.40 | Fast inference, multimodal on a budget |
| Mistral Small | $0.10 | $0.30 | Lightweight European-hosted tasks |
| GPT-4o Mini | $0.15 | $0.60 | General-purpose cheap model |
| Command R | $0.15 | $0.60 | RAG-optimized retrieval tasks |
| GPT-4.1 Mini | $0.40 | $1.60 | Coding and instruction following on a budget |
| Claude Haiku 4.5 | $0.80 | $4.00 | Fast responses, customer-facing chat |

$1 to $5 per 1M Input Tokens (Mid-Range)

The sweet spot for most production applications. These models handle complex reasoning, coding, and multi-step tasks reliably.

| Model | Input / 1M | Output / 1M | Best For |
|-------------------|-------|--------|---------------------------------------------|
| o4-mini | $1.10 | $4.40 | Reasoning tasks at mid-range cost |
| GPT-5 | $1.25 | $10.00 | Frontier general intelligence |
| Gemini 2.5 Pro | $1.25 | $10.00 | Long-context analysis, multimodal |
| GPT-4.1 | $2.00 | $8.00 | Coding, long-context, instruction following |
| Mistral Large | $2.00 | $6.00 | European data residency, multilingual |
| GPT-4o | $2.50 | $10.00 | Multimodal (vision + text) |
| Command R+ | $2.50 | $10.00 | Enterprise RAG, grounded generation |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Coding, analysis, agentic workflows |

$5+ per 1M Input Tokens (Premium)

| Model | Input / 1M | Output / 1M | Best For |
|-----------------|--------|--------|----------------------------------------|
| o3 | $10.00 | $40.00 | Hard reasoning, math, science problems |
| Claude Opus 4.6 | $15.00 | $75.00 | Complex agentic tasks, deep analysis |

Premium models are rarely needed for production workloads. Use them for difficult reasoning tasks, complex code generation, or when accuracy on edge cases justifies the 10-50x cost increase over mid-range options.

Batch API Discounts

If your workload can tolerate latency (minutes to hours instead of seconds), batch APIs cut costs significantly.

| Provider | Batch Discount | Typical Latency | How It Works |
|-----------|------------------------|-----------------|--------------|
| OpenAI | 50% off all models | Up to 24 hours | Submit a JSONL file; results are returned asynchronously. Available for all GPT and o-series models. |
| Anthropic | 50% off all models | Up to 24 hours | Message Batches API. Submit up to 100,000 requests per batch; results within 24 hours. |
| Google | 50% off Gemini models | Up to 24 hours | BatchGenerateContent API. 50% discount on all Gemini models through Vertex AI. |
| Mistral | Variable | Varies | Batch inference available through La Plateforme; discount varies by volume commitment. |

With batch pricing, GPT-4.1 drops to $1.00 input / $4.00 output per million tokens. Claude Sonnet 4.6 drops to $1.50 / $7.50. These are significant savings for data processing pipelines, evaluation runs, and content generation at scale.
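For OpenAI, submitting a batch means uploading a JSONL file with one request per line. Here is a minimal sketch of building that file; the field names follow OpenAI's published batch format, but verify against the current docs before submitting, and the prompts are placeholders:

```python
import json

def build_batch_file(prompts, model="gpt-4.1-mini", path="batch_input.jsonl"):
    """Write one JSONL request line per prompt for the OpenAI Batch API.

    Field names (custom_id, method, url, body) follow OpenAI's
    published batch format; double-check against the current docs.
    """
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"request-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(line) + "\n")
    return path

# Placeholder classification prompts
build_batch_file(["Classify: 'great product'", "Classify: 'broke in a week'"])
```

The file is then uploaded and referenced when creating the batch job; results come back as another JSONL file keyed by `custom_id`.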

Prompt Caching

Prompt caching reduces costs when you send the same system prompt or context prefix repeatedly. Instead of reprocessing identical tokens every call, the provider caches them and charges a reduced rate.

| Provider | Cache Write Cost | Cache Read Discount | TTL | Min Tokens |
|-----------|------------------------------|---------------------|--------------------------|------------|
| OpenAI | Free (automatic) | 50% off input | 5-10 min | 1,024 |
| Anthropic | 25% surcharge on first write | 90% off input | 5 min (refreshes on hit) | 1,024 (Haiku), 2,048 (Sonnet/Opus) |
| Google | Same as input | 75% off input | Configurable | 32,768 |

Anthropic's caching is the most aggressive: 90% off cached input tokens means a long system prompt that costs $3.00/1M on Sonnet 4.6 drops to $0.30/1M on cache hits. The 25% write surcharge pays for itself after just a few requests. OpenAI's caching is automatic (no code changes needed) but gives a smaller discount. Google requires the most tokens before caching kicks in but offers configurable TTL.
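The break-even claim is easy to sanity-check. The sketch below is a back-of-envelope comparison using the surcharge and discount figures from the table above, not an official calculator, and it assumes every request after the first is a cache hit:

```python
def cached_vs_uncached(n_requests, prompt_tokens, input_price_per_m,
                       write_surcharge=0.25, read_discount=0.90):
    """Cost of a repeated prompt prefix with vs. without caching.

    Defaults use Anthropic's figures from the table: 25% write
    surcharge on the first request, 90% off on every cache hit.
    """
    base = prompt_tokens * input_price_per_m / 1_000_000
    uncached = n_requests * base
    cached = base * (1 + write_surcharge) + (n_requests - 1) * base * (1 - read_discount)
    return uncached, cached

# 10,000-token system prompt on Claude Sonnet 4.6 ($3.00/1M input)
for n in (1, 2, 10):
    uncached, cached = cached_vs_uncached(n, 10_000, 3.00)
    print(n, round(uncached, 4), round(cached, 4))
```

Under these assumptions caching already costs less by the second request, which matches the "pays for itself after just a few requests" claim.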

Cost Per 1K Tokens (Conversion Table)

Some documentation and older pricing pages still reference cost per 1,000 tokens. To convert: divide the per-1M price by 1,000.

| Model | Input / 1K Tokens | Output / 1K Tokens |
|-------------------|----------|----------|
| GPT-4.1 Nano | $0.0001 | $0.0004 |
| Gemini 2.0 Flash | $0.0001 | $0.0004 |
| GPT-4o Mini | $0.00015 | $0.0006 |
| GPT-4.1 Mini | $0.0004 | $0.0016 |
| Claude Haiku 4.5 | $0.0008 | $0.004 |
| GPT-5 | $0.00125 | $0.01 |
| GPT-4.1 | $0.002 | $0.008 |
| GPT-4o | $0.0025 | $0.01 |
| Claude Sonnet 4.6 | $0.003 | $0.015 |
| o3 | $0.01 | $0.04 |
| Claude Opus 4.6 | $0.015 | $0.075 |

Per-1K pricing looks deceptively cheap. Always multiply by 1,000 to understand real costs at scale. A chatbot handling 1 million tokens per day at $0.002/1K input costs $2/day or $60/month just for input tokens.

How to Estimate Your Monthly API Costs

Use this formula to budget your LLM spend before committing to a provider.

Monthly Cost = [(Daily Requests x Avg Input Tokens x Input Price/1M) + (Daily Requests x Avg Output Tokens x Output Price/1M)] x 30
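The formula translates directly into a few lines of Python. This is a minimal budgeting helper, not provider code; the numbers plugged in below are from Example 1:

```python
def monthly_cost(daily_requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m, days=30):
    """Estimate monthly API spend from per-1M-token prices."""
    daily_input_cost = daily_requests * avg_input_tokens * input_price_per_m / 1_000_000
    daily_output_cost = daily_requests * avg_output_tokens * output_price_per_m / 1_000_000
    return (daily_input_cost + daily_output_cost) * days

# Claude Sonnet 4.6 chatbot: 500 conversations/day, 800 in / 400 out tokens
print(round(monthly_cost(500, 800, 400, 3.00, 15.00), 2))  # → 126.0
```

Swap in your own traffic numbers and any model's prices from the master table to compare providers side by side.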

Example 1: Customer Support Chatbot

  • 500 conversations/day, 800 input tokens avg (system prompt + user message), 400 output tokens avg
  • Using Claude Sonnet 4.6 ($3/$15 per 1M)
  • Input: 500 x 800 = 400,000 tokens/day = $1.20/day
  • Output: 500 x 400 = 200,000 tokens/day = $3.00/day
  • Monthly: ($1.20 + $3.00) x 30 = $126/month

Example 2: Document Processing Pipeline

  • 200 documents/day, 5,000 input tokens avg (document + extraction prompt), 500 output tokens avg
  • Using GPT-4.1 Mini ($0.40/$1.60 per 1M)
  • Input: 200 x 5,000 = 1,000,000 tokens/day = $0.40/day
  • Output: 200 x 500 = 100,000 tokens/day = $0.16/day
  • Monthly: ($0.40 + $0.16) x 30 = $16.80/month

Example 3: High-Volume Classification

  • 50,000 items/day, 200 input tokens avg, 50 output tokens avg
  • Using GPT-4.1 Nano ($0.10/$0.40 per 1M)
  • Input: 50,000 x 200 = 10,000,000 tokens/day = $1.00/day
  • Output: 50,000 x 50 = 2,500,000 tokens/day = $1.00/day
  • Monthly: ($1.00 + $1.00) x 30 = $60/month

These estimates assume no caching or batching. With prompt caching on a chatbot (where the system prompt repeats), expect 30-60% lower input costs. With batch API, cut both input and output costs in half.
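To see where the caching savings come from, a blended input price can be estimated from the fraction of each prompt that hits the cache. This sketch uses Anthropic's 90% read discount from the caching table and ignores the one-time write surcharge, which amortizes to near zero over a day of traffic:

```python
def effective_input_price(input_price_per_m, cached_fraction, read_discount=0.90):
    """Blended per-1M input price when part of each prompt is cached.

    Ignores the cache-write surcharge (negligible once amortized);
    read_discount defaults to Anthropic's 90% figure.
    """
    return input_price_per_m * ((1 - cached_fraction) + cached_fraction * (1 - read_discount))

# Example 1's chatbot: 500 of 800 input tokens are a repeated system prompt
price = effective_input_price(3.00, 500 / 800)
print(round(price, 4))  # → 1.3125 ($/1M, down from $3.00)
```

That is a 56% cut to input costs, squarely inside the 30-60% range quoted above.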

Provider Comparison by Use Case

Cheapest for High-Volume Chatbots

Winner: GPT-4.1 Nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40). Both cost the same and handle conversational tasks well. Gemini Flash has the edge for multimodal inputs (images in chat). GPT-4.1 Nano has stronger instruction following for structured system prompts. Mistral Small ($0.10/$0.30) is cheapest on output if you need European data residency.

Best for Coding Assistants

Winner: Claude Sonnet 4.6 ($3/$15). Consistently top-ranked on coding benchmarks. GPT-4.1 ($2/$8) is a strong alternative at lower cost, especially for its 1M context window that fits entire codebases. For budget coding, GPT-4.1 Mini ($0.40/$1.60) punches well above its price.

Best for Complex Reasoning

Winner: o3 ($10/$40) for math-heavy and scientific reasoning. Claude Opus 4.6 ($15/$75) for nuanced analysis and agentic multi-step tasks. These are premium models for premium problems. For most reasoning tasks, Claude Sonnet 4.6 or GPT-5 at a fraction of the cost will be sufficient.

Best for RAG and Retrieval

Winner: Command R+ ($2.50/$10). Cohere built Command R+ specifically for retrieval-augmented generation with built-in citation support. Google Gemini 2.5 Pro is the alternative when you need a massive context window (1M tokens) to stuff retrieved documents into a single prompt.

Best for Enterprise with Data Residency Requirements

Winner: Mistral Large ($2/$6). Hosted in Europe, strong multilingual performance, and competitive pricing. Mistral is the default choice when GDPR compliance and data residency are non-negotiable.

Frequently Asked Questions

What is the cheapest LLM API?

As of April 2026, the cheapest LLM APIs are GPT-4.1 Nano, Gemini 2.0 Flash, and Mistral Small, all at $0.10 per million input tokens. Mistral Small edges ahead on output cost at $0.30/1M vs $0.40/1M for the other two. For batch workloads, GPT-4.1 Nano with the 50% batch discount drops to $0.05/$0.20 per million tokens, making it the absolute cheapest option for asynchronous processing.

How much does GPT-4.1 cost per token?

GPT-4.1 costs $2.00 per million input tokens and $8.00 per million output tokens. That works out to $0.000002 per input token and $0.000008 per output token. With the OpenAI Batch API (50% discount), those drop to $1.00/$4.00 per million. With prompt caching (automatic, 50% off cached tokens), repeated system prompts cost $1.00 per million cached input tokens.

How do LLM API prices compare to self-hosting?

Self-hosting open-source models (Llama 3, Mistral, etc.) on your own GPUs costs roughly $1-3 per GPU-hour on cloud providers. At high volume (millions of tokens per day), self-hosting can be 50-80% cheaper than API pricing. At low to moderate volume, APIs are almost always cheaper because you avoid idle GPU costs, infrastructure management, and the engineering overhead of running inference servers. The break-even point is typically around 10-50 million tokens per day, depending on the model size and hardware choice.
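The break-even figure can be sanity-checked with rough arithmetic. Everything in this sketch is an illustrative assumption (the GPU rate, the blended API price), and it deliberately ignores utilization gaps and the input/output price split:

```python
def breakeven_tokens_per_day(gpu_cost_per_hour, api_price_per_m_tokens):
    """Daily token volume at which a 24/7 GPU matches API spend.

    Back-of-envelope only: assumes full utilization and a single
    blended API price per 1M tokens.
    """
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_price_per_m_tokens * 1_000_000

# $2/GPU-hour vs. an assumed $1.00/1M blended API price
print(breakeven_tokens_per_day(2.00, 1.00))  # → 48000000.0 tokens/day
```

A 48M-token/day break-even under these assumptions lands inside the 10-50M range quoted above; cheaper GPUs or pricier API models pull it lower.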

What's the difference between input and output token pricing?

Input tokens are what you send to the model: your system prompt, user message, uploaded documents, and any context. Output tokens are what the model generates in response. Output tokens cost 2-5x more than input tokens because generating each output token requires a full forward pass through the model, while input tokens can be processed in parallel. This is why long system prompts with short responses are relatively cheap, while asking a model to write a 5,000-word essay gets expensive fast.

Which LLM API has the best free tier?

Google offers the most generous free tier through Google AI Studio: Gemini 2.0 Flash is free up to 15 requests per minute with generous daily limits. OpenAI offers limited free credits for new accounts. Anthropic provides free access through claude.ai but no free API tier. Mistral offers a free tier on La Plateforme with rate limits. For serious development and testing, Google's free Gemini access is the clear winner.

How often do LLM API prices change?

Prices have been trending down 30-50% per year since 2023. Major price drops usually happen when providers release new model generations (the old model gets cheaper or the new model matches performance at lower cost). OpenAI and Google have been the most aggressive on price cuts. Anthropic tends to hold pricing longer but offers batch and caching discounts. Expect at least 2-3 significant pricing changes per provider per year.

About the Author

Rome Thorndike is the founder of the Prompt Engineer Collective, a community of over 1,300 prompt engineering professionals, and author of The AI News Digest, a weekly newsletter with 2,700+ subscribers. Rome brings hands-on AI/ML experience from Microsoft, where he worked with Dynamics and Azure AI/ML solutions, and later led sales at Datajoy (acquired by Databricks).
