LLM API Pricing Comparison (April 2026) - Every Model, Side by Side

By Rome Thorndike · April 6, 2026 · 18 min read

LLM pricing changes fast. New models launch, old ones get price cuts, and the per-token math can make or break your AI project's economics. This page tracks current API pricing for every major model as of April 2026, with real cost comparisons for common workloads.

Bookmark this page. We update it monthly.

Quick Pricing Reference: All Models at a Glance

This table covers the primary API-accessible models from each provider. Prices are per 1 million tokens unless noted otherwise.

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| o3 | OpenAI | $2.00 | $8.00 | 200K |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| Llama 4 Maverick | Meta (via providers) | $0.20 | $0.60 | 1M |
| Llama 4 Scout | Meta (via providers) | $0.10 | $0.25 | 10M |
| Mistral Large 2 | Mistral | $2.00 | $6.00 | 128K |
| Mistral Small | Mistral | $0.10 | $0.30 | 32K |
| Cohere Command R+ | Cohere | $2.50 | $10.00 | 128K |
| Cohere Command R | Cohere | $0.15 | $0.60 | 128K |

Prices reflect standard API rates. Volume discounts, committed-use agreements, and batch processing can reduce costs by 25-50% depending on the provider.

OpenAI Pricing Breakdown

OpenAI runs the largest model portfolio in the market. The April 2026 lineup spans from nano-class models at $0.10/1M input tokens up to the full o3 reasoning model. For most production workloads, the GPT-4.1 family replaced GPT-4o as the default recommendation.

GPT-4.1 Family

GPT-4.1 is OpenAI's workhorse. It handles coding, analysis, and long-context tasks with a 1M token context window. The mini variant cuts cost by 80% with surprisingly small quality tradeoffs on structured tasks. The nano variant is built for high-volume, latency-sensitive workloads where you need sub-100ms responses.

Cost example: processing 10,000 customer support tickets (average 500 tokens input, 200 tokens output each) means 5M input tokens and 2M output tokens in total, which costs roughly $26 with GPT-4.1, $5.20 with GPT-4.1 mini, and $1.30 with GPT-4.1 nano.
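These figures are straightforward to reproduce. A minimal helper, using the per-million rates from the table above:

```python
def workload_cost(n_requests, in_tokens, out_tokens, in_price, out_price):
    """Total cost in dollars; in_price/out_price are per 1M tokens."""
    return n_requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# 10,000 tickets at 500 input / 200 output tokens each
print(workload_cost(10_000, 500, 200, 2.00, 8.00))   # GPT-4.1:      ~$26.00
print(workload_cost(10_000, 500, 200, 0.40, 1.60))   # GPT-4.1 mini: ~$5.20
print(workload_cost(10_000, 500, 200, 0.10, 0.40))   # GPT-4.1 nano: ~$1.30
```

Swapping in any row from the pricing table gives the equivalent figure for that model.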

Reasoning Models (o3, o4-mini)

OpenAI's reasoning models think before answering. They consume more tokens internally (chain-of-thought tokens are billed as output), which means actual costs run 2-5x higher than the per-token price suggests. Use these for complex analysis, math, and multi-step reasoning. Not cost-effective for simple classification or extraction tasks.
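Because hidden chain-of-thought tokens are billed as output, estimating reasoning-model cost means scaling the visible output by an assumed multiplier. The 4x default below is an illustrative assumption, not a published figure; calibrate it against your own usage dashboards.

```python
def reasoning_cost(in_tokens, visible_out_tokens, in_price, out_price,
                   reasoning_multiplier=4.0):
    """Estimated cost per request for a reasoning model.

    reasoning_multiplier is an ASSUMED ratio of total billed output
    (visible answer + hidden chain-of-thought) to visible output.
    """
    billed_out = visible_out_tokens * reasoning_multiplier
    return (in_tokens * in_price + billed_out * out_price) / 1_000_000

# o3 at $2.00/$8.00: a 1,000-token-in, 500-visible-token-out request
print(reasoning_cost(1_000, 500, 2.00, 8.00))  # ~$0.018, vs ~$0.006 at face value
```

This is why a reasoning model with the same sticker price as GPT-4.1 can cost several times more in practice.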

For detailed OpenAI pricing tiers and rate limits, see our OpenAI API Pricing page.

Anthropic Pricing Breakdown

Anthropic uses a three-tier pricing system: Haiku (fast and cheap), Sonnet (balanced), and Opus (maximum capability). The gap between tiers is significant: Opus costs 5x more than Sonnet on both input and output.

When Opus Is Worth the Premium

Claude Opus 4 is the most expensive mainstream LLM at $15/$75 per million tokens. That price point only makes sense for tasks where quality differences directly impact revenue: legal document analysis, complex code generation, research synthesis, and agentic workflows where errors cascade. For most applications, Sonnet 4 handles the work at 80% of the quality for 20% of the cost.

Haiku: The Budget Workhorse

Claude Haiku 3.5 at $0.80/$4.00 fills the high-quality budget slot. It outperforms GPT-4.1 mini on many benchmarks while costing roughly double. The tradeoff is worth it when you need Anthropic's safety characteristics or superior instruction following on nuanced tasks.

Full Anthropic tier comparison at our Anthropic API Pricing page.

Google Gemini Pricing Breakdown

Google's pricing strategy is aggressive. Gemini 2.5 Flash at $0.15/$0.60 per million tokens undercuts nearly everything except open-source models, and it includes a 1M token context window. The free tier (see our AI Free Tiers 2026 guide) is generous enough for prototyping and low-volume production use.

Gemini 2.5 Pro

At $1.25/$10.00, Gemini 2.5 Pro offers strong reasoning and coding performance. The input pricing undercuts Claude Sonnet and GPT-4.1, but output tokens are priced at $10 per million, making generation-heavy workloads expensive. Use Gemini Pro when your prompts have high input-to-output ratios (document analysis, summarization of long texts).

Flash Models

Gemini 2.5 Flash and 2.0 Flash are the price-performance leaders. At $0.10-$0.15 per million input tokens, they compete directly with open-source model hosting costs while requiring zero infrastructure management.

Open-Source Model Costs

Llama 4, Mistral, and other open-weight models don't have a single price. Your cost depends on how you host them.

Hosted API Pricing

Providers like Together AI, Fireworks, Groq, and AWS Bedrock host open-source models and charge per token. Typical rates for Llama 4 Maverick range from $0.15-$0.30 per million input tokens depending on the provider and commitment level. Self-hosting on your own GPUs can be cheaper at scale but requires significant DevOps investment.

Self-Hosting Economics

Running Llama 4 Maverick (400B+ parameters, mixture-of-experts) requires multiple high-end GPUs. A typical setup costs $3-8/hour on cloud GPU instances. At sustained high throughput (100+ requests/minute), self-hosting breaks even with API pricing around the 50,000 requests/day mark. Below that, hosted APIs are cheaper.
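The break-even point follows from dividing the daily GPU bill by the API cost of a single request. The $5/hour instance price and the 8,000-in / 1,000-out request shape below are illustrative assumptions; plug in your own numbers.

```python
def breakeven_requests_per_day(gpu_cost_per_hour, api_cost_per_request):
    """Daily request volume at which a 24/7 GPU deployment matches API spend."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_cost_per_request

# Assumed: $5/hr GPU instance; hosted Llama 4 Maverick at $0.20/$0.60 per 1M
# tokens; a document-processing-shaped request of 8,000 in / 1,000 out tokens
api_cost = (8_000 * 0.20 + 1_000 * 0.60) / 1_000_000  # ~$0.0022 per request
print(breakeven_requests_per_day(5.0, api_cost))      # roughly 54,500/day
```

Smaller requests push the break-even volume up sharply: at chatbot-sized requests the crossover can be several times higher, which is why hosted APIs win for most teams.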

Cost Comparison by Workload

Raw per-token pricing tells only part of the story. Actual costs depend on your workload pattern.

Chatbot / Conversational AI

Average conversation: 2,000 tokens input per turn (system prompt + history), 500 tokens output per turn, 5 turns per session, for a total of 10,000 input and 2,500 output tokens per session.

| Model | Cost per Session | Cost per 10K Sessions/Month |
|---|---|---|
| GPT-4.1 | $0.040 | $400 |
| GPT-4.1 mini | $0.008 | $80 |
| Claude Sonnet 4 | $0.068 | $675 |
| Gemini 2.5 Flash | $0.003 | $30 |
| Llama 4 Maverick (hosted) | $0.0035 | $35 |
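The Claude Sonnet 4 row, for instance, follows directly from the stated token counts and the per-million rates:

```python
# Per-session token totals from the workload above
in_tokens = 5 * 2_000   # 5 turns x 2,000 input tokens
out_tokens = 5 * 500    # 5 turns x 500 output tokens

# Claude Sonnet 4: $3.00 input / $15.00 output per 1M tokens
session_cost = (in_tokens * 3.00 + out_tokens * 15.00) / 1_000_000
print(round(session_cost, 4))        # 0.0675, i.e. ~$0.068 per session
print(round(session_cost * 10_000))  # 675, i.e. $675 for 10K sessions/month
```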

Document Processing Pipeline

Average document: 8,000 tokens input, 1,000 tokens output (summary + extraction).

| Model | Cost per Document | Cost per 50K Docs/Month |
|---|---|---|
| GPT-4.1 | $0.024 | $1,200 |
| GPT-4.1 nano | $0.0012 | $60 |
| Claude Haiku 3.5 | $0.010 | $520 |
| Gemini 2.5 Flash | $0.002 | $90 |
| Gemini 2.0 Flash | $0.0012 | $60 |

Code Generation / Analysis

Average request: 3,000 tokens input (code + instructions), 2,000 tokens output.

| Model | Cost per Request | Cost per 100K Requests/Month |
|---|---|---|
| GPT-4.1 | $0.022 | $2,200 |
| Claude Sonnet 4 | $0.039 | $3,900 |
| Claude Opus 4 | $0.195 | $19,500 |
| Gemini 2.5 Pro | $0.024 | $2,375 |
| Mistral Large 2 | $0.018 | $1,800 |

Cost Optimization Strategies

The cheapest model isn't always the best value. Here's how to optimize spend without sacrificing quality.

1. Tiered Model Routing

Route requests to different models based on complexity. Use a cheap classifier (GPT-4.1 nano or Gemini 2.0 Flash) to assess request difficulty, then route simple requests to budget models and complex ones to premium models. This typically cuts costs 40-60% compared to using a single model for everything.
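A routing layer can be very small. In the sketch below, `classify_difficulty` is a stand-in heuristic so the example runs on its own; in production you would replace it with a call to a nano-class model that labels each request "simple" or "complex".

```python
CHEAP_MODEL = "gpt-4.1-nano"
PREMIUM_MODEL = "gpt-4.1"

def classify_difficulty(prompt: str) -> str:
    """Stand-in heuristic; in production, ask a cheap classifier model
    (e.g. GPT-4.1 nano or Gemini 2.0 Flash) to label the request."""
    hard_markers = ("analyze", "prove", "refactor", "multi-step")
    return "complex" if any(m in prompt.lower() for m in hard_markers) else "simple"

def route(prompt: str) -> str:
    """Return the model this request should be sent to."""
    return PREMIUM_MODEL if classify_difficulty(prompt) == "complex" else CHEAP_MODEL

print(route("What is our refund policy?"))       # -> gpt-4.1-nano
print(route("Analyze this contract for risk."))  # -> gpt-4.1
```

The classifier call itself costs a fraction of a cent per request, which is why the net savings hold even after adding the routing step.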

2. Prompt Caching

Both OpenAI and Anthropic offer prompt caching for system prompts and repeated context. Cached input tokens cost 50-90% less than fresh tokens. If your system prompt is 2,000+ tokens, caching pays for itself immediately. Anthropic's prompt caching reduces cached input to $0.30/1M on Sonnet (90% discount).
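The savings are easy to quantify. The sketch below uses the Sonnet figures above ($3.00/1M fresh input, $0.30/1M cached reads) and, for simplicity, ignores Anthropic's one-time cache-write surcharge, which is small at high request volumes.

```python
def cached_input_cost(system_tokens, user_tokens, in_price,
                      cache_read_price, n_requests):
    """Input-side cost when the system prompt is served from cache.

    Ignores the one-time cache-write surcharge; prices are per 1M tokens.
    """
    fresh = n_requests * user_tokens * in_price / 1_000_000
    cached = n_requests * system_tokens * cache_read_price / 1_000_000
    return fresh + cached

# 2,000-token system prompt, 500-token user messages, 100K requests on Sonnet
with_cache = cached_input_cost(2_000, 500, 3.00, 0.30, 100_000)
without_cache = 100_000 * (2_000 + 500) * 3.00 / 1_000_000
print(with_cache, without_cache)  # $210 vs $750: a 72% input-cost reduction
```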

3. Batch Processing

OpenAI's Batch API charges 50% less for non-real-time workloads. If your use case can tolerate 24-hour turnaround (nightly report generation, weekly analysis runs), batch processing is the simplest cost reduction available.

4. Context Window Management

Stuffing the full context window costs money. A 100K token input to Claude Sonnet costs $0.30 per request. Trim your context to what's actually needed. Use RAG to retrieve only relevant chunks instead of passing entire documents.
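The difference is linear in tokens, so trimming pays off directly. The 8,000-token retrieved context below is an assumed RAG chunk budget, not a recommendation:

```python
def context_cost(tokens, in_price=3.00):
    """Input-side cost at Claude Sonnet 4 rates ($3.00 per 1M input tokens)."""
    return tokens * in_price / 1_000_000

print(context_cost(100_000))  # full context:             $0.30 per request
print(context_cost(8_000))    # assumed RAG-trimmed size: $0.024 per request
```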

5. Output Token Optimization

Output tokens cost 2-5x more than input tokens across all providers. Request concise outputs. Use structured output formats (JSON) to avoid verbose prose. Set max_tokens limits to prevent runaway generation.

Provider Comparison: Beyond Price

Price alone doesn't determine value. Consider these factors when choosing a provider.

Rate Limits and Availability

OpenAI offers the highest default rate limits. Anthropic's rate limits are more restrictive but increase with usage tier. Google provides generous free-tier limits but can throttle aggressively during peak hours. For production workloads, check each provider's rate limit documentation and request increases proactively.

Quality vs. Cost Tradeoff

The best model for your use case isn't always the most expensive one. Run evaluations on your specific tasks. We've seen cases where GPT-4.1 mini outperforms GPT-4.1 on well-structured extraction tasks, and where Claude Haiku beats Gemini Pro on nuanced classification. Price per token matters less than price per correct output.

Ecosystem and Tooling

OpenAI has the largest ecosystem: fine-tuning, assistants API, function calling, and extensive third-party integrations. Anthropic offers the best developer experience for complex system prompts and tool use. Google integrates tightly with GCP services (Vertex AI, BigQuery). Your existing infrastructure should influence your choice.

Pricing Trends: Where Costs Are Heading

LLM API prices have dropped roughly 10x over the past two years. GPT-4-class performance that cost $30/1M input tokens in early 2024 now costs $2-3/1M. Several forces continue pushing prices down.

  • Hardware improvements: New GPU architectures (NVIDIA Blackwell, AMD MI350) increase inference throughput, reducing per-token serving costs.
  • Model efficiency: Mixture-of-experts architectures (used by Llama 4, Gemini, Mistral) activate only a fraction of total parameters per request, cutting compute costs.
  • Competition: Google, Meta, and Mistral are subsidizing pricing to gain market share. This benefits buyers.
  • Open-source pressure: Free open-weight models set a floor on how much providers can charge for closed models of similar quality.

Expect another 2-3x price reduction over the next 12 months for equivalent quality levels.

Which Model Should You Choose?

Here's the decision framework we recommend.

  • Lowest possible cost, acceptable quality: Gemini 2.0 Flash or GPT-4.1 nano
  • Best price-performance balance: GPT-4.1 mini or Gemini 2.5 Flash
  • Production quality, reasonable cost: GPT-4.1 or Claude Sonnet 4
  • Maximum quality, cost secondary: Claude Opus 4 or o3
  • High volume, cost-sensitive: Llama 4 Maverick (self-hosted) or Cohere Command R
  • Privacy/compliance requirements: Self-hosted Llama 4 or Mistral via VPC deployment

For provider-specific deep dives, see our OpenAI pricing guide, Anthropic pricing guide, and Cohere pricing guide.

About the Author

Rome Thorndike is the founder of the Prompt Engineer Collective, a community of over 1,300 prompt engineering professionals, and author of The AI News Digest, a weekly newsletter with 2,700+ subscribers. Rome brings hands-on AI/ML experience from Microsoft, where he worked with Dynamics and Azure AI/ML solutions, and later led sales at Datajoy (acquired by Databricks).
