LLM Pricing Per Million Tokens: Every API Provider Compared (April 2026)
Token-based pricing is confusing. OpenAI quotes "per 1M tokens." Anthropic does the same but with different tiers. Google changes prices based on context length. And output tokens always cost more than input. Every provider uses a slightly different structure, making direct comparison painful. This page normalizes everything to a single unit: cost per 1 million tokens. Input and output, side by side, for every major model worth considering in April 2026.
Complete Pricing Table: All Major LLM APIs (April 2026)
This table covers six providers and 20+ models. All prices are per 1 million tokens. "Input" is what you send to the model (prompts, context, documents). "Output" is what the model generates back. Where a provider offers tiered pricing based on context length, both tiers are listed.
OpenAI
Anthropic
Google
Meta (Llama 4, via inference providers)
Mistral
Cohere
Cheapest Models by Price Tier
Under $0.20 per 1M Input Tokens: Classification and Extraction
GPT-4.1 Nano ($0.10/$0.40), Mistral Small ($0.10/$0.30), Gemini 2.5 Flash ($0.15/$0.60), Llama 4 Scout ($0.15/$0.60), and Cohere Command R ($0.15/$0.60). These models handle classification, entity extraction, data labeling, and simple Q&A. If your task doesn't require nuanced reasoning, start here. Mistral Small has the cheapest output tokens at $0.30 per million. GPT-4.1 Nano has the best overall quality at this price point. For a chatbot that handles 10,000 conversations a day, Nano costs $39/month. The same traffic on GPT-4.1 costs $780/month.
$0.40 to $2.00 per 1M Input: Production Workloads
GPT-4.1 Mini ($0.40/$1.60), Mistral Medium ($0.40/$2.00), Haiku 4.5 ($1.00/$5.00), o4-mini ($1.10/$4.40), GPT-5 ($1.25/$10.00), Gemini 2.5 Pro ($1.25/$10.00), GPT-4.1 ($2.00/$8.00), and o3 ($2.00/$8.00). This is the production tier. Most real applications live here. GPT-4.1 is the default for coding, content, and general tasks. o4-mini is the best value if you need step-by-step reasoning. GPT-5 and Gemini 2.5 Pro offer flagship capabilities with cheap input but expensive output ($10/1M).
$2.00+ per 1M Input: Hard Problems and Premium Quality
Mistral Large ($2.00/$6.00), Cohere Command R+ ($2.50/$10.00), Sonnet 4.6 ($3.00/$15.00), and Opus 4.6 ($5.00/$25.00). Reserve these for tasks where cheaper models fall short. Opus 4.6 is the most expensive mainstream model at $5/$25 but excels at complex reasoning, nuanced writing, and difficult coding. Sonnet 4.6 offers 80-90% of Opus quality at 60% of the cost. Mistral Large is worth considering if EU data residency is a requirement.
Why Output Tokens Cost 4-8x More Than Input
Every LLM provider charges more for output tokens than input tokens. GPT-4.1 charges $2 for input but $8 for output. Opus 4.6 charges $5 for input but $25 for output. This isn't arbitrary. It reflects how the models work.
When you send input tokens (your prompt), the model processes them all at once in parallel. The GPU crunches through your entire prompt in a single forward pass. It's fast and computationally efficient.
Output tokens are different. The model generates them one at a time, sequentially. Each new token depends on every token that came before it. The model can't skip ahead or parallelize this step. Token 50 requires tokens 1 through 49 to exist first. This sequential dependency means each output token requires a separate forward pass through the model, using far more compute per token than input processing.
This has a direct impact on your bill. A typical API call with 2,000 input tokens and 500 output tokens on GPT-4.1 costs $0.004 for input and $0.004 for output. Equal cost despite 4x fewer output tokens. If your responses average 1,000 output tokens, output is 67% of your total cost.
The practical takeaway: controlling output length is the single most effective way to reduce costs. Tell the model to be concise. Use structured output (JSON) instead of prose. For classification, request a single label instead of an explanation. Set max_tokens to prevent runaway responses. A 10-class classifier using structured output typically uses 5-20 output tokens instead of 50-200 with free-form text. At scale, that's a 10x cost reduction on your most expensive token type.
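The savings from structured output are easy to quantify. A minimal sketch, assuming an illustrative 150-token free-form answer versus a 12-token JSON label (both counts are assumptions for the example), priced at GPT-4.1 Nano's $0.40 per 1M output rate:

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million

# GPT-4.1 Nano output price from the table above: $0.40 per 1M tokens.
prose = output_cost(150, 0.40)   # free-form explanation, ~150 tokens (assumed)
label = output_cost(12, 0.40)    # structured JSON label, ~12 tokens (assumed)

print(f"prose: ${prose:.6f}, label: ${label:.6f}, ratio: {prose / label:.1f}x")
```

With these assumed token counts the price drops by about 12x per call; the exact multiple depends only on the token ratio, since the per-token price cancels out.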
How to Calculate Your Monthly Cost: 3 Real Scenarios
Token pricing means nothing in isolation. Here's what three common workloads cost per month across different models, with the math shown.
Scenario 1: Customer Support Chatbot (10K conversations/day)
Each conversation averages 500 input tokens (system prompt + user message) and 200 output tokens. Daily tokens: 5M input, 2M output.
- GPT-4.1 Nano: (5 x $0.10) + (2 x $0.40) = $1.30/day, $39/month
- Gemini 2.5 Flash: (5 x $0.15) + (2 x $0.60) = $1.95/day, $59/month
- GPT-4.1: (5 x $2.00) + (2 x $8.00) = $26/day, $780/month
- Sonnet 4.6: (5 x $3.00) + (2 x $15.00) = $45/day, $1,350/month
For most support chatbots, GPT-4.1 Nano or Gemini Flash provides sufficient quality. Upgrading to GPT-4.1 or Sonnet only makes sense if the chatbot handles complex, multi-turn conversations where reasoning quality directly impacts customer satisfaction.
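The arithmetic above generalizes to any model. A small helper using the scenario-1 volumes (5M input and 2M output tokens per day, 30-day month), with the per-1M prices quoted in this article:

```python
def monthly_cost(daily_in_m: float, daily_out_m: float,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Monthly cost from daily token volume (in millions) and per-1M prices."""
    return (daily_in_m * in_price + daily_out_m * out_price) * days

# Scenario 1: 10K conversations/day -> 5M input + 2M output tokens daily.
models = {
    "GPT-4.1 Nano":     (0.10, 0.40),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4.1":          (2.00, 8.00),
    "Sonnet 4.6":       (3.00, 15.00),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${monthly_cost(5, 2, p_in, p_out):,.2f}/month")
```

Swap in your own daily volumes and any model's prices from the table to reproduce the other scenarios.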
Scenario 2: RAG Pipeline (50K queries/day)
Each query includes 2,000 input tokens (retrieved documents + question) and 500 output tokens. Daily tokens: 100M input, 25M output.
- GPT-4.1 Nano: (100 x $0.10) + (25 x $0.40) = $20/day, $600/month
- Mistral Small: (100 x $0.10) + (25 x $0.30) = $17.50/day, $525/month
- GPT-4.1: (100 x $2.00) + (25 x $8.00) = $400/day, $12,000/month
- Sonnet 4.6: (100 x $3.00) + (25 x $15.00) = $675/day, $20,250/month
RAG pipelines are input-heavy. Prompt caching helps here because the system prompt and few-shot examples repeat on every query. With 75% cache hits on GPT-4.1 (75% off cached input reads), the $6,000/month input bill drops to about $2,600, bringing the $12,000/month total down to roughly $8,600/month.
Scenario 3: Batch Document Classification (1M documents, one-time)
Each document averages 1,000 input tokens. Output is a single classification label (10 tokens). Total: 1B input tokens, 10M output tokens.
- GPT-4.1 Nano: (1000 x $0.10) + (10 x $0.40) = $104 total
- GPT-4.1 Nano (batch, 50% off): $52 total
- Gemini 2.5 Flash: (1000 x $0.15) + (10 x $0.60) = $156 total
- GPT-4.1: (1000 x $2.00) + (10 x $8.00) = $2,080 total
Classification is the ideal use case for cheap models. The difference between $52 (Nano batch) and $2,080 (GPT-4.1) is 40x, and for labeling tasks, Nano's accuracy is typically within 1-2% of the larger model.
Batch and Caching Discounts That Change the Math
The per-million-token prices above are sticker prices. Three discount mechanisms can cut your actual costs by 50-90%. Ignoring them means overpaying.
OpenAI Batch API: 50% off everything. Submit requests as a JSONL file and get results within 24 hours. Every token costs half. GPT-4.1 drops from $2/$8 to $1/$4. GPT-4.1 Nano drops to $0.05/$0.20. Use this for any workload that doesn't need real-time responses: data extraction, content generation, nightly processing, bulk classification. It's the single biggest cost reduction available from any provider.
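Batch submissions are JSONL files with one request per line. A sketch of building one for the bulk-classification case, assuming the request shape OpenAI's Batch API documents (one object per line with `custom_id`, `method`, `url`, and a `body` matching a normal chat completion); the `gpt-4.1-nano` model string and the system prompt are illustrative assumptions, so verify field names and model IDs against the current docs:

```python
import json

def build_batch_file(docs, path="batch_input.jsonl",
                     model="gpt-4.1-nano", system="Classify the document."):
    """Write one JSONL request line per document for batch submission.
    Field names follow OpenAI's documented batch format (verify before use)."""
    with open(path, "w") as f:
        for i, doc in enumerate(docs):
            request = {
                "custom_id": f"doc-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [
                        {"role": "system", "content": system},
                        {"role": "user", "content": doc},
                    ],
                    # Cap output: a single label needs only a few tokens.
                    "max_tokens": 20,
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

You then upload the file, create a batch job referencing it, and collect results when the job completes, all at half the per-token price.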
Prompt caching: 50-90% off input tokens. Both OpenAI and Anthropic cache repeated prompt prefixes automatically. If your system prompt, few-shot examples, or document context stays the same across requests, subsequent calls pay a reduced rate on those cached tokens. OpenAI's discounts vary by model: GPT-5 gets 90% off cached reads, GPT-4.1 gets 75% off, o-series gets 50% off. Anthropic offers 90% off cached input tokens across all Claude models. Structure your prompts with stable content first and variable content last to maximize cache hits.
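The blended input price under caching follows directly from the hit rate and the discount. A minimal sketch, assuming a 75% cache hit rate (the hit rate is workload-dependent, not a provider guarantee):

```python
def effective_input_price(base: float, hit_rate: float,
                          cache_discount: float) -> float:
    """Blended per-1M input price: cached tokens pay the discounted rate,
    cache misses pay the full base rate."""
    cached_rate = base * (1 - cache_discount)
    return hit_rate * cached_rate + (1 - hit_rate) * base

# GPT-4.1: $2.00 base, 75% off cached reads, assumed 75% hit rate.
print(effective_input_price(2.00, 0.75, 0.75))   # 0.875
# GPT-5: $1.25 base, 90% off cached reads, same assumed hit rate.
print(effective_input_price(1.25, 0.75, 0.90))
```

At a 75% hit rate, GPT-4.1's effective input price falls from $2.00 to about $0.88 per million tokens.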
Google free tier: $0 under rate limits. Gemini 2.5 Flash is free under certain rate limits (roughly 15 RPM and 1M TPM). For prototyping, low-volume internal tools, or development environments, this is zero-cost access to a model that competes with GPT-4.1 Nano in quality. No credit card required.
These discounts stack in some cases. A cached batch request through OpenAI's GPT-4.1 pays $0.25 per million cached input tokens and $4 per million output tokens. That's 87.5% off standard input pricing. The sticker prices at the top of this page are the ceiling, not the floor.
Which Provider Is Cheapest for Your Use Case?
Stop comparing models in isolation. Match the model to your task. Here are clear winners in each category.
Simple classification, extraction, or routing
Pick GPT-4.1 Nano ($0.10/$0.40) or Gemini 2.5 Flash ($0.15/$0.60). For labeling data, extracting structured fields from text, or routing queries to other models, you don't need a $3+ model. Nano handles these tasks with 95%+ accuracy at a fraction of the cost. Gemini Flash is a strong alternative, especially if you want Google's free tier for development. If output cost matters most, Mistral Small ($0.10/$0.30) has the cheapest output tokens available.
General production (chatbots, content, coding)
Pick GPT-4.1 ($2/$8) or Sonnet 4.6 ($3/$15). This is where most teams spend their money. GPT-4.1 is 33% cheaper on input and 47% cheaper on output. Sonnet 4.6 tends to produce higher-quality writing and handles ambiguous instructions better. Test both on your specific prompts. If cost is the priority, GPT-4.1 wins. If output quality on nuanced tasks matters more, Sonnet 4.6 is worth the premium.
Complex reasoning (math, logic, multi-step analysis)
Pick o4-mini ($1.10/$4.40) or Opus 4.6 ($5/$25). o4-mini is the value play for reasoning tasks. It uses chain-of-thought reasoning at a low price and handles most math, science, and logic problems well. Opus 4.6 is the quality ceiling but costs about 4.5x more on input and nearly 6x more on output. For most reasoning workloads, start with o4-mini and only escalate to Opus when accuracy drops below your threshold. o3 ($2/$8) sits between them as a mid-range option.
Budget-sensitive or experimental
Pick Gemini 2.5 Flash (free tier) or Llama 4 Scout via Groq ($0.15/$0.60). Google's free tier gets you a competitive model at zero cost for low-volume use. Groq's inference of Llama 4 is extremely fast and cheap. Both work for prototyping, internal tools, and development. When you need to move to production, switch to GPT-4.1 Nano or stay on Gemini Flash's paid tier.
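The task-to-model matching above can be written down as a routing table. This is a hypothetical sketch, not an official API: the task labels, the `ROUTES` dict, and the model name strings are all assumptions you would adapt to your provider's actual model IDs.

```python
# Hypothetical task-based router; model names are illustrative assumptions.
ROUTES = {
    "classification": "gpt-4.1-nano",   # under-$0.20 tier
    "extraction":     "gpt-4.1-nano",
    "chat":           "gpt-4.1",        # production tier
    "coding":         "gpt-4.1",
    "reasoning":      "o4-mini",        # cheap chain-of-thought
    "hard_reasoning": "opus-4.6",       # escalate only when accuracy demands it
}

def pick_model(task: str, default: str = "gpt-4.1") -> str:
    """Route a task label to the cheapest model that handles it;
    fall back to the production default for unknown tasks."""
    return ROUTES.get(task, default)
```

Even a static table like this captures the biggest cost lever in the article: simple tasks stop paying flagship prices.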
The Bottom Line
The cheapest capable model is GPT-4.1 Nano at $0.10 per million input tokens, with Mistral Small and Gemini 2.5 Flash close behind. For production workloads, GPT-4.1 ($2/$8) and Sonnet 4.6 ($3/$15) are the two defaults. For reasoning, o4-mini ($1.10/$4.40) is the value pick. The biggest cost lever isn't which model you pick. It's routing simple tasks to cheap models instead of sending everything through the same expensive one. Combine that with batch processing and prompt caching, and your actual costs will be 30-60% below the sticker prices on this page.
Frequently Asked Questions
What does "per million tokens" mean?
LLM providers charge based on tokens, which are chunks of text roughly 3-4 characters long. "Per million tokens" (per 1M tokens) is the standard pricing unit. One million tokens is approximately 750,000 words. If a model costs $2 per 1M input tokens and you send 500,000 tokens in a month, your input cost is $1. Some older documentation quotes prices "per 1K tokens." To convert, multiply by 1,000. A model priced at $0.002 per 1K tokens costs $2 per 1M tokens.
How many tokens is 1,000 words?
In English, 1,000 words is roughly 1,300 to 1,400 tokens. The exact count depends on word length and punctuation. Code typically tokenizes at a higher ratio because of special characters and syntax. A 10-page report (around 5,000 words) is about 6,500-7,000 tokens. You can use OpenAI's tiktoken library or Anthropic's token counter to get exact counts before sending requests.
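For quick estimates before reaching for an exact tokenizer, the 1.3-1.4 tokens-per-word ratio above can be wrapped in a one-liner. The 1.35 default is a heuristic assumption for English prose, not an exact count; use OpenAI's tiktoken library for real measurements.

```python
def estimate_tokens(word_count: int, ratio: float = 1.35) -> int:
    """Rough English-text token estimate. The 1.35 tokens-per-word ratio
    is a heuristic; code and non-English text tokenize at higher ratios."""
    return round(word_count * ratio)

print(estimate_tokens(1000))   # ~1,350 tokens for 1,000 words
print(estimate_tokens(5000))   # ~6,750 tokens for a 10-page report
```

Good enough for budgeting; switch to tiktoken (or your provider's token counter) when the count feeds a billing or context-limit decision.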
Why do output tokens cost more than input tokens?
Input tokens can be processed in parallel because the model reads the entire prompt at once. Output tokens must be generated one at a time, sequentially, because each new token depends on the ones before it. This sequential generation uses more GPU compute per token. That's why output costs 4-8x more than input across every major provider. It also means verbose responses are disproportionately expensive. Controlling output length is the single most effective cost optimization.
Which LLM has the cheapest API?
GPT-4.1 Nano at $0.10/$0.40 per million tokens is the cheapest capable model from a major provider. Mistral Small matches it on input at $0.10 and beats it on output at $0.30. Gemini 2.5 Flash is $0.15/$0.60 but has a free tier for low-volume use. For open-source models, Llama 4 Scout via Groq costs about $0.15/$0.60. With OpenAI's batch API (50% off), GPT-4.1 Nano drops to $0.05/$0.20, making it the cheapest option at scale.
How do I switch providers to save money?
Most LLM APIs follow the same request/response format: a messages array with system, user, and assistant roles. Switching providers usually means changing the API endpoint, API key, and model name. Libraries like LiteLLM and OpenRouter provide a unified interface across providers, letting you switch with a single config change. The hard part isn't the code. It's testing. Different models behave differently on the same prompt. Always run your evaluation suite against the new model before switching production traffic.
Are there free LLM APIs?
Google offers a free tier for Gemini 2.5 Flash with rate limits (around 15 RPM and 1M TPM for free users). OpenAI has a free tier with very limited rate limits (3 RPM for reasoning models). Groq offers free access to open-source models like Llama 4 with rate limits. These free tiers are enough for prototyping and development but won't support production traffic. For serious volume, expect to pay. But the cheapest models ($0.10/1M tokens) make the cost minimal for most use cases.