LLM Pricing Per Million Tokens: Every API Provider Compared (April 2026)
Token-based pricing is confusing. OpenAI quotes "per 1M tokens." Anthropic does the same but with different tiers. Google changes prices based on context length. And output tokens always cost more than input. Every provider uses a slightly different structure, making direct comparison painful. This page normalizes everything to a single unit: cost per 1 million tokens. Input and output, side by side, for every major model worth considering in April 2026.
Complete Pricing Table: All Major LLM APIs (April 2026)
This table covers six providers and 20+ models. All prices are per 1 million tokens. "Input" is what you send to the model (prompts, context, documents). "Output" is what the model generates back. Where a provider offers tiered pricing based on context length, both tiers are listed.
OpenAI
Anthropic
Google
Meta (Llama 4, via inference providers)
Mistral
Cohere
Cheapest Models by Price Tier
Under $0.20 per 1M Input Tokens: Classification and Extraction
GPT-4.1 Nano ($0.10/$0.40), Mistral Small ($0.10/$0.30), Gemini 2.5 Flash ($0.15/$0.60), Llama 4 Scout ($0.15/$0.60), and Cohere Command R ($0.15/$0.60). These models handle classification, entity extraction, data labeling, and simple Q&A. If your task doesn't require nuanced reasoning, start here. Mistral Small has the cheapest output tokens at $0.30 per million. GPT-4.1 Nano has the best overall quality at this price point. For a chatbot that handles 10,000 conversations a day, Nano costs $39/month. The same traffic on GPT-4.1 costs $780/month.
$0.40 to $2.00 per 1M Input: Production Workloads
GPT-4.1 Mini ($0.40/$1.60), Mistral Medium ($0.40/$2.00), Haiku 4.5 ($1.00/$5.00), o4-mini ($1.10/$4.40), GPT-5 ($1.25/$10.00), Gemini 2.5 Pro ($1.25/$10.00), GPT-4.1 ($2.00/$8.00), and o3 ($2.00/$8.00). This is the production tier. Most real applications live here. GPT-4.1 is the default for coding, content, and general tasks. o4-mini is the best value if you need step-by-step reasoning. GPT-5 and Gemini 2.5 Pro offer flagship capabilities with cheap input but expensive output ($10/1M).
$2.00+ per 1M Input: Hard Problems and Premium Quality
Mistral Large ($2.00/$6.00), Cohere Command R+ ($2.50/$10.00), Sonnet 4.6 ($3.00/$15.00), and Opus 4.6 ($5.00/$25.00). Reserve these for tasks where cheaper models fall short. Opus 4.6 is the most expensive mainstream model at $5/$25 but excels at complex reasoning, nuanced writing, and difficult coding. Sonnet 4.6 offers 80-90% of Opus quality at 60% of the cost. Mistral Large is worth considering if EU data residency is a requirement.
Why Output Tokens Cost 4-8x More Than Input
Every LLM provider charges more for output tokens than input tokens. GPT-4.1 charges $2 for input but $8 for output. Opus 4.6 charges $5 for input but $25 for output. This isn't arbitrary. It reflects how the models work.
When you send input tokens (your prompt), the model processes them all at once in parallel. The GPU crunches through your entire prompt in a single forward pass. It's fast and computationally efficient.
Output tokens are different. The model generates them one at a time, sequentially. Each new token depends on every token that came before it. The model can't skip ahead or parallelize this step. Token 50 requires tokens 1 through 49 to exist first. This sequential dependency means each output token requires a separate forward pass through the model, using far more compute per token than input processing.
This has a direct impact on your bill. A typical API call with 2,000 input tokens and 500 output tokens on GPT-4.1 costs $0.004 for input and $0.004 for output. Equal cost despite 4x fewer output tokens. If your responses average 1,000 output tokens, output is 67% of your total cost.
The practical takeaway: controlling output length is the single most effective way to reduce costs. Tell the model to be concise. Use structured output (JSON) instead of prose. For classification, request a single label instead of an explanation. Set max_tokens to prevent runaway responses. A 10-class classifier using structured output typically uses 5-20 output tokens instead of 50-200 with free-form text. At scale, that's a 10x cost reduction on your most expensive token type.
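The savings from structured output are easy to quantify. A minimal sketch, assuming an illustrative 150-token free-form answer versus a 12-token JSON label (both counts are assumptions for the example), priced at GPT-4.1 Nano's $0.40 per 1M output rate:

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million

# GPT-4.1 Nano output price from the table above: $0.40 per 1M tokens.
prose = output_cost(150, 0.40)   # free-form explanation, ~150 tokens (assumed)
label = output_cost(12, 0.40)    # structured JSON label, ~12 tokens (assumed)

print(f"prose: ${prose:.6f}, label: ${label:.6f}, ratio: {prose / label:.1f}x")
```

With these assumed token counts the price drops by about 12x per call; the exact multiple depends only on the token ratio, since the per-token price cancels out.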
How to Calculate Your Monthly Cost: 3 Real Scenarios
Token pricing means nothing in isolation. Here's what three common workloads cost per month across different models, with the math shown.
Scenario 1: Customer Support Chatbot (10K conversations/day)
Each conversation averages 500 input tokens (system prompt + user message) and 200 output tokens. Daily tokens: 5M input, 2M output.
- GPT-4.1 Nano: (5 x $0.10) + (2 x $0.40) = $1.30/day, $39/month
- Gemini 2.5 Flash: (5 x $0.15) + (2 x $0.60) = $1.95/day, $59/month
- GPT-4.1: (5 x $2.00) + (2 x $8.00) = $26/day, $780/month
- Sonnet 4.6: (5 x $3.00) + (2 x $15.00) = $45/day, $1,350/month
For most support chatbots, GPT-4.1 Nano or Gemini Flash provides sufficient quality. Upgrading to GPT-4.1 or Sonnet only makes sense if the chatbot handles complex, multi-turn conversations where reasoning quality directly impacts customer satisfaction.
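The arithmetic above generalizes to any model. A small helper using the scenario-1 volumes (5M input and 2M output tokens per day, 30-day month), with the per-1M prices quoted in this article:

```python
def monthly_cost(daily_in_m: float, daily_out_m: float,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Monthly cost from daily token volume (in millions) and per-1M prices."""
    return (daily_in_m * in_price + daily_out_m * out_price) * days

# Scenario 1: 10K conversations/day -> 5M input + 2M output tokens daily.
models = {
    "GPT-4.1 Nano":     (0.10, 0.40),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4.1":          (2.00, 8.00),
    "Sonnet 4.6":       (3.00, 15.00),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${monthly_cost(5, 2, p_in, p_out):,.2f}/month")
```

Swap in your own daily volumes and any model's prices from the table to reproduce the other scenarios.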
Scenario 2: RAG Pipeline (50K queries/day)
Each query includes 2,000 input tokens (retrieved documents + question) and 500 output tokens. Daily tokens: 100M input, 25M output.
- GPT-4.1 Nano: (100 x $0.10) + (25 x $0.40) = $20/day, $600/month
- Mistral Small: (100 x $0.10) + (25 x $0.30) = $17.50/day, $525/month
- GPT-4.1: (100 x $2.00) + (25 x $8.00) = $400/day, $12,000/month
- Sonnet 4.6: (100 x $3.00) + (25 x $15.00) = $675/day, $20,250/month
RAG pipelines are input-heavy. Prompt caching helps here because the system prompt and few-shot examples repeat on every query. With 75% cache hits on GPT-4.1 (75% off cached input reads), the $6,000/month input bill drops to about $2,600, bringing the $12,000/month total down to roughly $8,600/month.
Scenario 3: Batch Document Classification (1M documents, one-time)
Each document averages 1,000 input tokens. Output is a single classification label (10 tokens). Total: 1B input tokens, 10M output tokens.
- GPT-4.1 Nano: (1000 x $0.10) + (10 x $0.40) = $104 total
- GPT-4.1 Nano (batch, 50% off): $52 total
- Gemini 2.5 Flash: (1000 x $0.15) + (10 x $0.60) = $156 total
- GPT-4.1: (1000 x $2.00) + (10 x $8.00) = $2,080 total
Classification is the ideal use case for cheap models. The difference between $52 (Nano batch) and $2,080 (GPT-4.1) is 40x, and for labeling tasks, Nano's accuracy is typically within 1-2% of the larger model.
Batch and Caching Discounts That Change the Math
The per-million-token prices above are sticker prices. Three discount mechanisms can cut your actual costs by 50-90%. Ignoring them means overpaying.
OpenAI Batch API: 50% off everything. Submit requests as a JSONL file and get results within 24 hours. Every token costs half. GPT-4.1 drops from $2/$8 to $1/$4. GPT-4.1 Nano drops to $0.05/$0.20. Use this for any workload that doesn't need real-time responses: data extraction, content generation, nightly processing, bulk classification. It's the single biggest cost reduction available from any provider.
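Batch submissions are JSONL files with one request per line. A sketch of building one for the bulk-classification case, assuming the request shape OpenAI's Batch API documents (one object per line with `custom_id`, `method`, `url`, and a `body` matching a normal chat completion); the `gpt-4.1-nano` model string and the system prompt are illustrative assumptions, so verify field names and model IDs against the current docs:

```python
import json

def build_batch_file(docs, path="batch_input.jsonl",
                     model="gpt-4.1-nano", system="Classify the document."):
    """Write one JSONL request line per document for batch submission.
    Field names follow OpenAI's documented batch format (verify before use)."""
    with open(path, "w") as f:
        for i, doc in enumerate(docs):
            request = {
                "custom_id": f"doc-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [
                        {"role": "system", "content": system},
                        {"role": "user", "content": doc},
                    ],
                    # Cap output: a single label needs only a few tokens.
                    "max_tokens": 20,
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

You then upload the file, create a batch job referencing it, and collect results when the job completes, all at half the per-token price.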
Prompt caching: 50-90% off input tokens. Both OpenAI and Anthropic cache repeated prompt prefixes automatically. If your system prompt, few-shot examples, or document context stays the same across requests, subsequent calls pay a reduced rate on those cached tokens. OpenAI's discounts vary by model: GPT-5 gets 90% off cached reads, GPT-4.1 gets 75% off, o-series gets 50% off. Anthropic offers 90% off cached input tokens across all Claude models. Structure your prompts with stable content first and variable content last to maximize cache hits.
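The blended input price under caching follows directly from the hit rate and the discount. A minimal sketch, assuming a 75% cache hit rate (the hit rate is workload-dependent, not a provider guarantee):

```python
def effective_input_price(base: float, hit_rate: float,
                          cache_discount: float) -> float:
    """Blended per-1M input price: cached tokens pay the discounted rate,
    cache misses pay the full base rate."""
    cached_rate = base * (1 - cache_discount)
    return hit_rate * cached_rate + (1 - hit_rate) * base

# GPT-4.1: $2.00 base, 75% off cached reads, assumed 75% hit rate.
print(effective_input_price(2.00, 0.75, 0.75))   # 0.875
# GPT-5: $1.25 base, 90% off cached reads, same assumed hit rate.
print(effective_input_price(1.25, 0.75, 0.90))
```

At a 75% hit rate, GPT-4.1's effective input price falls from $2.00 to about $0.88 per million tokens.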
Google free tier: $0 under rate limits. Gemini 2.5 Flash is free under certain rate limits (roughly 15 RPM and 1M TPM). For prototyping, low-volume internal tools, or development environments, this is zero-cost access to a model that competes with GPT-4.1 Nano in quality. No credit card required.
These discounts stack in some cases. A cached batch request through OpenAI's GPT-4.1 pays $0.25 per million cached input tokens and $4 per million output tokens. That's 87.5% off standard input pricing. The sticker prices at the top of this page are the ceiling, not the floor.
Which Provider Is Cheapest for Your Use Case?
Stop comparing models in isolation. Match the model to your task. Here are clear winners in each category.
Simple classification, extraction, or routing
Pick GPT-4.1 Nano ($0.10/$0.40) or Gemini 2.5 Flash ($0.15/$0.60). For labeling data, extracting structured fields from text, or routing queries to other models, you don't need a $3+ model. Nano handles these tasks with 95%+ accuracy at a fraction of the cost. Gemini Flash is a strong alternative, especially if you want Google's free tier for development. If output cost matters most, Mistral Small ($0.10/$0.30) has the cheapest output tokens available.
General production (chatbots, content, coding)
Pick GPT-4.1 ($2/$8) or Sonnet 4.6 ($3/$15). This is where most teams spend their money. GPT-4.1 is 33% cheaper on input and 47% cheaper on output. Sonnet 4.6 tends to produce higher-quality writing and handles ambiguous instructions better. Test both on your specific prompts. If cost is the priority, GPT-4.1 wins. If output quality on nuanced tasks matters more, Sonnet 4.6 is worth the premium.
Complex reasoning (math, logic, multi-step analysis)
Pick o4-mini ($1.10/$4.40) or Opus 4.6 ($5/$25). o4-mini is the value play for reasoning tasks. It uses chain-of-thought reasoning at a low price and handles most math, science, and logic problems well. Opus 4.6 is the quality ceiling but costs about 4.5x more on input and nearly 6x more on output. For most reasoning workloads, start with o4-mini and only escalate to Opus when accuracy drops below your threshold. o3 ($2/$8) sits between them as a mid-range option.
Budget-sensitive or experimental
Pick Gemini 2.5 Flash (free tier) or Llama 4 Scout via Groq ($0.15/$0.60). Google's free tier gets you a competitive model at zero cost for low-volume use. Groq's inference of Llama 4 is extremely fast and cheap. Both work for prototyping, internal tools, and development. When you need to move to production, switch to GPT-4.1 Nano or stay on Gemini Flash's paid tier.
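The task-to-model matching above can be written down as a routing table. This is a hypothetical sketch, not an official API: the task labels, the `ROUTES` dict, and the model name strings are all assumptions you would adapt to your provider's actual model IDs.

```python
# Hypothetical task-based router; model names are illustrative assumptions.
ROUTES = {
    "classification": "gpt-4.1-nano",   # under-$0.20 tier
    "extraction":     "gpt-4.1-nano",
    "chat":           "gpt-4.1",        # production tier
    "coding":         "gpt-4.1",
    "reasoning":      "o4-mini",        # cheap chain-of-thought
    "hard_reasoning": "opus-4.6",       # escalate only when accuracy demands it
}

def pick_model(task: str, default: str = "gpt-4.1") -> str:
    """Route a task label to the cheapest model that handles it;
    fall back to the production default for unknown tasks."""
    return ROUTES.get(task, default)
```

Even a static table like this captures the biggest cost lever in the article: simple tasks stop paying flagship prices.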
The Bottom Line
The cheapest capable model is GPT-4.1 Nano at $0.10 per million input tokens, with Mistral Small and Gemini 2.5 Flash close behind. For production workloads, GPT-4.1 ($2/$8) and Sonnet 4.6 ($3/$15) are the two defaults. For reasoning, o4-mini ($1.10/$4.40) is the value pick. The biggest cost lever isn't which model you pick. It's routing simple tasks to cheap models instead of sending everything through the same expensive one. Combine that with batch processing and prompt caching, and your actual costs will be 30-60% below the sticker prices on this page.
Frequently Asked Questions
What does "per million tokens" mean?
LLM providers charge based on tokens, which are chunks of text roughly 3-4 characters long. "Per million tokens" (per 1M tokens) is the standard pricing unit. One million tokens is approximately 750,000 words. If a model costs $2 per 1M input tokens and you send 500,000 tokens in a month, your input cost is $1. Some older documentation quotes prices "per 1K tokens." To convert, multiply by 1,000. A model priced at $0.002 per 1K tokens costs $2 per 1M tokens.
How many tokens is 1,000 words?
In English, 1,000 words is roughly 1,300 to 1,400 tokens. The exact count depends on word length and punctuation. Code typically tokenizes at a higher ratio because of special characters and syntax. A 10-page report (around 5,000 words) is about 6,500-7,000 tokens. You can use OpenAI's tiktoken library or Anthropic's token counter to get exact counts before sending requests.
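For quick estimates before reaching for an exact tokenizer, the 1.3-1.4 tokens-per-word ratio above can be wrapped in a one-liner. The 1.35 default is a heuristic assumption for English prose, not an exact count; use OpenAI's tiktoken library for real measurements.

```python
def estimate_tokens(word_count: int, ratio: float = 1.35) -> int:
    """Rough English-text token estimate. The 1.35 tokens-per-word ratio
    is a heuristic; code and non-English text tokenize at higher ratios."""
    return round(word_count * ratio)

print(estimate_tokens(1000))   # ~1,350 tokens for 1,000 words
print(estimate_tokens(5000))   # ~6,750 tokens for a 10-page report
```

Good enough for budgeting; switch to tiktoken (or your provider's token counter) when the count feeds a billing or context-limit decision.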
Why do output tokens cost more than input tokens?
Input tokens can be processed in parallel because the model reads the entire prompt at once. Output tokens must be generated one at a time, sequentially, because each new token depends on the ones before it. This sequential generation uses more GPU compute per token. That's why output costs 4-8x more than input across every major provider. It also means verbose responses are disproportionately expensive. Controlling output length is the single most effective cost optimization.
Which LLM has the cheapest API?
GPT-4.1 Nano at $0.10/$0.40 per million tokens is the cheapest capable model from a major provider. Mistral Small matches it on input at $0.10 and beats it on output at $0.30. Gemini 2.5 Flash is $0.15/$0.60 but has a free tier for low-volume use. For open-source models, Llama 4 Scout via Groq costs about $0.15/$0.60. With OpenAI's batch API (50% off), GPT-4.1 Nano drops to $0.05/$0.20, making it the cheapest option at scale.
How do I switch providers to save money?
Most LLM APIs follow the same request/response format: a messages array with system, user, and assistant roles. Switching providers usually means changing the API endpoint, API key, and model name. Libraries like LiteLLM and OpenRouter provide a unified interface across providers, letting you switch with a single config change. The hard part isn't the code. It's testing. Different models behave differently on the same prompt. Always run your evaluation suite against the new model before switching production traffic.
Are there free LLM APIs?
Google offers a free tier for Gemini 2.5 Flash with rate limits (around 15 RPM and 1M TPM for free users). OpenAI has a free tier with very limited rate limits (3 RPM for reasoning models). Groq offers free access to open-source models like Llama 4 with rate limits. These free tiers are enough for prototyping and development but won't support production traffic. For serious volume, expect to pay. But the cheapest models ($0.10/1M tokens) make the cost minimal for most use cases.