OpenAI API Pricing: What Every Model Actually Costs (April 2026)

OpenAI's model lineup has expanded significantly. GPT-4.1 replaced GPT-4o as the recommended production model. GPT-5 is the new flagship. The o-series reasoning models now include o3 and o4-mini. And the GPT-4.1 Nano at $0.10 per million input tokens is one of the cheapest capable models available anywhere. This page breaks down every current model's pricing, the batch and caching discounts that cut costs by 50-90%, and real cost math for production workloads. All prices verified April 2026.

[Chart: OpenAI API pricing tiers as of April 2026. Data verified by PE Collective.]

GPT-4.1 Nano

$0.10 / $0.40 per 1M input / output tokens
  • Cheapest capable model in OpenAI's lineup
  • Good for classification, extraction, routing
  • 1M token context window
  • Fast response times
  • Best cost-per-token ratio for simple tasks

GPT-4.1 (Most Popular)

$2.00 / $8.00 per 1M input / output tokens
  • Recommended production model (replaced GPT-4o)
  • Strong coding, instruction following, long context
  • 1M token context window
  • Function calling and structured outputs
  • Best balance of cost, speed, and quality

o4-mini

$1.10 / $4.40 per 1M input / output tokens
  • Best-value reasoning model
  • Chain-of-thought reasoning built in
  • 200K context window
  • Outperforms o3-mini on most benchmarks
  • Good for math, science, and complex logic

GPT-5

$1.25 / $10.00 per 1M input / output tokens
  • Most capable model overall
  • Agentic workflows and complex reasoning
  • 128K context window
  • Best for research and hard problems
  • 90% cache read discount

Complete Model Pricing Table (April 2026)

OpenAI now offers more than a dozen models across three families: GPT (general-purpose), o-series (reasoning), and specialized models. Here's the full breakdown of every current model worth knowing about.

Model | Input / 1M tokens | Output / 1M tokens | Cached Input | Context Window | Best For
GPT-4.1 Nano | $0.10 | $0.40 | $0.025 | 1M | Classification, routing, extraction
GPT-4o-mini | $0.15 | $0.60 | $0.075 | 128K | Legacy simple tasks
GPT-5 Mini | $0.25 | $2.00 | $0.025 | 128K | Budget general-purpose
GPT-4.1 Mini | $0.40 | $1.60 | $0.10 | 1M | Mid-tier production tasks
o4-mini | $1.10 | $4.40 | $0.275 | 200K | Budget reasoning
o3-mini | $1.10 | $4.40 | $0.55 | 200K | Legacy reasoning
GPT-5 | $1.25 | $10.00 | $0.125 | 128K | Flagship, agentic workflows
GPT-4.1 | $2.00 | $8.00 | $0.50 | 1M | Production workhorse
o3 | $2.00 | $8.00 | $0.50 | 200K | Advanced reasoning
GPT-4o | $2.50 | $10.00 | $1.25 | 128K | Legacy production
o1 | $15.00 | $60.00 | $7.50 | 200K | Legacy deep reasoning

GPT-4.1 Family: The New Production Standard

GPT-4.1 launched as OpenAI's recommended replacement for GPT-4o. It's cheaper ($2/$8 vs $2.50/$10), has a 1 million token context window (vs 128K), and scores better on instruction-following and coding benchmarks. If your code still references gpt-4o, switching to gpt-4.1 saves money while improving quality.

The GPT-4.1 family includes three variants. GPT-4.1 ($2/$8) is the main production model, strong at coding, long-context processing, and structured outputs. GPT-4.1 Mini ($0.40/$1.60) handles mid-tier tasks at 5x lower cost. GPT-4.1 Nano ($0.10/$0.40) is the budget option, ideal for classification and extraction where you don't need sophisticated reasoning.

All three variants share the 1 million token context window. This is significant for document processing and RAG applications that previously needed to chunk documents into 128K segments. With GPT-4.1, you can process entire codebases, legal documents, or research papers in a single request.

Cache read pricing for the GPT-4.1 family is 75% off standard input pricing. GPT-4.1's cached input rate is $0.50 per million tokens (vs $2.00 standard). For applications that send the same system prompt, few-shot examples, or document context across requests, this stacks with the already-low base price to make GPT-4.1 very cost-effective.

Is GPT-5 Worth 5x the Price of GPT-4.1?

GPT-5 is OpenAI's most capable general-purpose model. At $1.25 per million input tokens, it's actually cheaper than GPT-4.1 on input, but output tokens cost $10 per million (vs $8 for GPT-4.1). The higher output cost is where GPT-5 gets expensive, especially for tasks that generate long responses.

GPT-5 shines in agentic workflows, complex multi-step reasoning, and tasks that require broad world knowledge. It also gets the best cache discount in OpenAI's lineup: 90% off cached input reads, bringing the effective cached input rate to $0.125 per million tokens.

GPT-5 Mini ($0.25/$2.00) offers a budget path into GPT-5's capabilities. It's positioned between GPT-4.1 Nano and GPT-4.1 Mini in price but uses GPT-5's architecture. For applications that need GPT-5-class reasoning on a budget, this is worth testing.

The practical question: when should you use GPT-5 vs GPT-4.1? GPT-4.1 wins on coding tasks, instruction following, and long-context processing. GPT-5 wins on complex reasoning, agentic tool use, and tasks requiring nuanced judgment. Most production apps should default to GPT-4.1 and only route to GPT-5 for their hardest queries.

o-Series Reasoning Models: o3 and o4-mini

OpenAI's o-series models use chain-of-thought reasoning: they "think" before answering, generating internal reasoning tokens that you pay for as output tokens. This means the true cost per request is higher than the visible output would suggest. A short o3 answer might use 2,000 visible output tokens but 10,000 reasoning tokens behind the scenes.

o4-mini ($1.10/$4.40) is the best value reasoning model. It replaced o3-mini and outperforms it on most benchmarks while costing the same. Use it for math, science, coding problems, and any task that benefits from step-by-step reasoning. The cached input rate is $0.275 per million tokens.

o3 ($2.00/$8.00) is the flagship reasoning model. It costs roughly 2x o4-mini and offers stronger performance on the hardest problems: PhD-level math, complex logic, and multi-step scientific reasoning. For most applications, o4-mini is sufficient. Save o3 for tasks where o4-mini's accuracy drops below your threshold.

One important detail: reasoning models don't support all features. As of April 2026, o-series models support function calling and structured outputs, but some parameters like temperature and top_p behave differently. Check OpenAI's docs for the latest compatibility matrix before building reasoning model pipelines.

The cost trap with reasoning models is unpredictable token usage. A query that uses 500 reasoning tokens on one request might use 5,000 on a slightly different phrasing. Set max_completion_tokens to cap your worst-case costs, and monitor average reasoning token usage per query type.
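The billing math is easy to get wrong because reasoning tokens never appear in the visible response. A minimal sketch of the calculation, using the April 2026 o4-mini rates from the table above (the token counts are illustrative):

```python
# Sketch: per-request cost for an o-series model. Reasoning tokens are
# invisible in the response but billed at the output rate, so they are
# added to the visible completion tokens here.

O4_MINI = {"input": 1.10, "output": 4.40}  # $ per 1M tokens, April 2026

def o_series_cost(prompt_tokens, visible_tokens, reasoning_tokens, prices=O4_MINI):
    """Total request cost in dollars, counting hidden reasoning tokens as output."""
    output_tokens = visible_tokens + reasoning_tokens
    return (prompt_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

# 3,000 input tokens, 1,000 visible output tokens, 5,000 reasoning tokens:
cost = o_series_cost(3_000, 1_000, 5_000)
print(f"${cost:.4f} per request")  # $0.0297 per request
```

Note that five of the six output-side cents here come from reasoning tokens the user never sees, which is why capping max_completion_tokens matters.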

Batch API: 50% Off Everything

OpenAI's Batch API processes requests asynchronously and returns results within 24 hours. The tradeoff for that latency is a flat 50% discount on all token costs, input and output, across every model.

At batch pricing, GPT-4.1 drops to $1/$4 per million tokens. GPT-4.1 Nano drops to $0.05/$0.20. o4-mini drops to $0.55/$2.20. These are some of the lowest prices available for capable language models.

Batch processing works by uploading a JSONL file of requests. Each line is a standard API request. You submit the batch, wait for completion (usually 1-12 hours in practice, 24-hour SLA), then download results. No code changes to your prompts, just a different submission endpoint.
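The request format can be sketched as follows. The custom_id field and the /v1/chat/completions URL follow OpenAI's documented batch format; the model name and prompts are placeholders:

```python
import json

# Sketch: building a Batch API input, one JSON request per line of a
# .jsonl file. Each line is a standard chat completions request wrapped
# with a custom_id for matching results later.

def batch_line(custom_id, prompt, model="gpt-4.1-nano"):
    return json.dumps({
        "custom_id": custom_id,            # your key for pairing results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 100,
        },
    })

docs = ["Invoice #1042 ...", "Invoice #1043 ..."]
jsonl = "\n".join(
    batch_line(f"doc-{i}", f"Extract the invoice number: {doc}")
    for i, doc in enumerate(docs)
)
# Write jsonl to a file, upload it with purpose="batch", then create the
# batch job with completion_window="24h". See OpenAI's Batch API docs
# for the upload and polling calls.
```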

Best use cases for batch: data extraction from documents, content generation, bulk classification, evaluation/grading, and dataset creation. Anything where you don't need real-time responses. If you're currently running these workloads through the standard API, switching to batch is the single biggest cost reduction available.

You can combine batch pricing with prompt caching for even deeper discounts. A cached batch request through GPT-4.1 pays $0.25 per million cached input tokens and $4 per million output tokens. That's 87.5% off standard input pricing.

Prompt Caching: How to Get 50-90% Off Repeated Input

OpenAI automatically caches repeated prompt prefixes and charges a reduced rate for cache reads on subsequent requests. The discount varies significantly by model family, making this a key factor in model selection for cost-sensitive applications.

GPT-5 family models get the deepest cache discount: 90% off input tokens for cache reads. This means GPT-5's effective cached input rate is just $0.125 per million tokens, close to GPT-4.1 Nano's standard input rate of $0.10. If your application sends the same long context repeatedly and you're using GPT-5, caching makes it remarkably affordable.

GPT-4.1 family models get 75% off cached reads. GPT-4.1's cached rate drops to $0.50 per million, GPT-4.1 Mini to $0.10, and GPT-4.1 Nano to $0.025. For applications with stable system prompts and shared context (RAG, chatbots, document processing), these cached rates make the GPT-4.1 family very competitive.

GPT-4o family models get 50% off cached reads, the weakest discount tier, and o-series discounts vary by model: o4-mini's cached rate is $0.275 (75% off), o3's is $0.50 (75% off), o3-mini's is $0.55 (50% off), and GPT-4o's is $1.25. If you're choosing between GPT-4.1 ($0.50 cached) and GPT-4o ($1.25 cached), the caching difference alone is 2.5x.

Cache writes happen automatically when your prompt prefix exceeds a minimum length (typically 1,024 tokens). The cache has a TTL that varies by model; check OpenAI's docs for current values. Cache hits require an exact prefix match, so structure your prompts with stable content first (system prompt, few-shot examples, shared context) and variable content last (user query).
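The blended input rate under caching can be sketched with a small helper. The discount percentages come from the tiers above; the 40% hit rate is an assumption for illustration:

```python
# Sketch: effective input cost per 1M tokens, blending the standard and
# cached rates by the share of input tokens that hit the cache.
# Discounts per the tiers above: GPT-5 family 90%, GPT-4.1 family 75%,
# GPT-4o family 50%.

def effective_input_rate(base_rate, cache_discount, cached_fraction):
    cached_rate = base_rate * (1 - cache_discount)
    return cached_fraction * cached_rate + (1 - cached_fraction) * base_rate

# GPT-4.1 at $2.00/1M with 75% cache discount and 40% of input cached
# (a plausible share for a long system prompt):
print(round(effective_input_rate(2.00, 0.75, 0.40), 2))  # 1.4
```

The same helper shows why the discount tier matters when picking a model: at the same 40% hit rate, GPT-4o's 50% discount only brings $2.50 down to $2.00.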

OpenAI vs Anthropic: Which API Costs Less for Your Workload?

The two leading API providers have different pricing philosophies. OpenAI offers more model tiers with a wider price range. Anthropic offers fewer models but competitive pricing on its core lineup. Here's how they compare head-to-head on comparable models.

Use Case | OpenAI Model | OpenAI Price | Anthropic Model | Anthropic Price
Budget tasks | GPT-4.1 Nano | $0.10/$0.40 | Haiku 4.5 | $1/$5
Production workhorse | GPT-4.1 | $2/$8 | Sonnet 4.6 | $3/$15
Flagship | GPT-5 | $1.25/$10 | Opus 4.6 | $5/$25
Budget reasoning | o4-mini | $1.10/$4.40 | — | —
Advanced reasoning | o3 | $2/$8 | Opus Extended Thinking | $5/$25

What This Actually Costs Per Month (Real Numbers)

Abstract per-token pricing is hard to reason about. Here's what real applications actually cost based on typical usage patterns.

Customer support chatbot (10K conversations/day): Average conversation uses 2,000 input tokens (system prompt + history + user message) and 500 output tokens. With GPT-4.1: (10,000 × 2,000 × $2/1M) + (10,000 × 500 × $8/1M) = $40 + $40 = $80/day ($2,400/month). With GPT-4.1 Nano: $2 + $2 = $4/day ($120/month). With prompt caching on GPT-4.1 (assuming 75% cache hits on the system prompt): roughly $1,800/month.

Document processing pipeline (1,000 docs/day, 10K tokens each): Input-heavy workload, 10M input tokens, 500K output tokens per day. With GPT-4.1: (10M × $2/1M) + (0.5M × $8/1M) = $20 + $4 = $24/day ($720/month). With batch processing (50% off): $12/day ($360/month). With GPT-4.1 Nano in batch mode: $0.50 + $0.10 = $0.60/day ($18/month).

AI coding assistant (500 requests/day, complex): Average request: 5,000 input tokens (code context + instructions), 2,000 output tokens. With GPT-4.1: (500 × 5,000 × $2/1M) + (500 × 2,000 × $8/1M) = $5 + $8 = $13/day ($390/month). With GPT-5 for hard problems: $3.13 + $10 = $13.13/day ($394/month). Surprisingly close. GPT-5's cheaper input but pricier output roughly balances out.

Reasoning-heavy analytics (200 queries/day): Average: 3,000 input tokens, 1,000 visible output tokens, 5,000 reasoning tokens. With o4-mini: (200 × 3,000 × $1.10/1M) + (200 × 6,000 × $4.40/1M) = $0.66 + $5.28 = $5.94/day ($178/month). With o3: $1.20 + $9.60 = $10.80/day ($324/month). The reasoning tokens are where the cost lives.

High-volume SaaS with model routing (100K requests/day): Route 70% to GPT-4.1 Nano (70K × 1,500 input × $0.10/1M + 70K × 400 output × $0.40/1M = $10.50 + $11.20 = $21.70/day), 25% to GPT-4.1 (25K × 3,000 × $2/1M + 25K × 1,000 × $8/1M = $150 + $200 = $350/day), and 5% to o4-mini (5K × 3,000 × $1.10/1M + 5K × 6,000 × $4.40/1M = $16.50 + $132 = $148.50/day). Total: ~$520/day (~$15,600/month). Without routing (everything on GPT-4.1): ~$900/day ($27,000/month). Routing saves over $11,000/month.

These examples assume standard API pricing. Apply batch discounts (50% off) to any non-real-time portion and caching discounts to repeated context for further savings. A well-optimized app typically spends 30-50% less than these baseline estimates.
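The per-example arithmetic above reduces to one helper. The prices are the April 2026 rates from the table, and the scenarios mirror the chatbot and document-pipeline examples:

```python
# Sketch: the daily-cost arithmetic behind the examples above.
# Prices in $ per 1M tokens (April 2026).

PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def daily_cost(requests, in_tokens, out_tokens, model, batch=False):
    p = PRICES[model]
    cost = (requests * in_tokens * p["input"]
            + requests * out_tokens * p["output"]) / 1_000_000
    return cost * 0.5 if batch else cost  # Batch API: flat 50% off

# Support chatbot: 10K conversations/day, 2,000 in / 500 out tokens each.
print(round(daily_cost(10_000, 2_000, 500, "gpt-4.1"), 2))       # 80.0
print(round(daily_cost(10_000, 2_000, 500, "gpt-4.1-nano"), 2))  # 4.0
# Document pipeline: 1,000 docs/day, 10K in / 500 out, via batch.
print(round(daily_cost(1_000, 10_000, 500, "gpt-4.1", batch=True), 2))  # 12.0
```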

Rate Limits: How Much You Need to Spend to Unlock Them

OpenAI uses a tiered rate limit system based on your cumulative API spend. Higher tiers unlock more requests per minute (RPM) and tokens per minute (TPM). This matters for production apps that need consistent throughput.

Tier | Qualification | GPT-4.1 RPM | GPT-4.1 TPM | o4-mini RPM
Free | $0 spend | 3 | 40,000 | 3
Tier 1 | $5 cumulative | 500 | 200,000 | 500
Tier 2 | $50 cumulative | 5,000 | 2,000,000 | 5,000
Tier 3 | $100 cumulative | 5,000 | 4,000,000 | 5,000
Tier 4 | $250 cumulative | 10,000 | 10,000,000 | 10,000
Tier 5 | $1,000 cumulative | 10,000 | 30,000,000 | 10,000

How to Cut Your OpenAI Bill by 60%

The biggest cost lever is model routing. Most production queries don't need GPT-4.1; they can be handled by GPT-4.1 Nano at 20x lower cost. Build a simple classifier (which itself runs on Nano) that routes requests to the cheapest model that can handle each query type. This alone can cut costs by 60-80%.
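A minimal routing sketch, assuming each request already carries (or is classified into) a task label. The labels and model assignments here are illustrative, not a prescribed taxonomy:

```python
# Sketch: route each request to the cheapest model that can handle it.
# In practice the task label would come from a Nano-powered classifier;
# here it's passed in directly.

ROUTES = {
    "classification": "gpt-4.1-nano",
    "extraction":     "gpt-4.1-nano",
    "generation":     "gpt-4.1",
    "reasoning":      "o4-mini",
}

def pick_model(task_label: str) -> str:
    # Fall back to the production workhorse for unrecognized labels.
    return ROUTES.get(task_label, "gpt-4.1")

print(pick_model("extraction"))  # gpt-4.1-nano
print(pick_model("reasoning"))   # o4-mini
```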

Prompt caching is the second lever. If you send the same system prompt, few-shot examples, or document context on every request, you're overpaying. Structure prompts so stable content comes first (cached) and variable content comes last. GPT-4.1's 75% cache discount means your 2,000-token system prompt costs $0.001 per request instead of $0.004.

Batch processing is the third lever. Any workload that doesn't need real-time responses (data processing, content generation, evaluations, nightly reports) should use the Batch API for a flat 50% discount. Combined with caching, batch GPT-4.1 processes cached input at $0.25 per million tokens.

Output token optimization matters disproportionately because output costs 4-8x more than input. Tell the model to be concise. Set max_tokens to prevent runaway responses. Use structured output mode to get clean JSON instead of prose. For classification tasks, request a single word or short label instead of an explanation.

Monitor and iterate. OpenAI's API returns token usage in every response. Build a dashboard that tracks cost per request by model, endpoint, and use case. Review weekly. Common findings: a debug system prompt that inflates every request by 500 tokens, a retry loop that doubles costs on timeouts, or a model assignment that hasn't been updated since you launched.

One underused optimization: response format constraints. For classification tasks, use response_format with a JSON schema that only allows your label values. This prevents the model from generating explanations you don't need, saving output tokens. For a 10-class classifier, structured output typically uses 5-20 output tokens instead of 50-200 with free-form responses. At scale, this 10x reduction in output tokens translates directly to a 10x reduction in output costs.
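A sketch of such a constraint, using the json_schema response format shape from OpenAI's structured-outputs feature. The label set is a made-up example:

```python
# Sketch: a response_format payload that constrains a classifier to an
# enum of labels, so the model can't pad the answer with explanations.
# The schema wrapper follows OpenAI's structured-outputs format; the
# label values are illustrative.

LABELS = ["billing", "technical", "account", "other"]

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket_label",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"label": {"type": "string", "enum": LABELS}},
            "required": ["label"],
            "additionalProperties": False,
        },
    },
}
# Pass response_format in the chat.completions.create call; the reply
# becomes a tiny JSON object like {"label": "billing"} instead of prose.
```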

Will You Spend $50 or $5,000? How to Estimate Your Bill

Step 1: Count your requests per day by category. Separate simple tasks (classification, extraction, short answers) from complex tasks (reasoning, code generation, long-form content). This determines your model mix.

Step 2: Estimate average tokens per request. A typical chatbot message is 1,000-3,000 input tokens and 300-1,000 output tokens. A document processing request is 5,000-50,000 input tokens and 500-2,000 output tokens. A reasoning query is 2,000-5,000 input tokens, 500-2,000 visible output tokens, plus 3,000-10,000 reasoning tokens (billed as output).

Step 3: Calculate daily cost. For each category: (requests × avg_input_tokens × model_input_price/1M) + (requests × avg_output_tokens × model_output_price/1M). Add categories together. Multiply by 30 for monthly.

Step 4: Apply discounts. Subtract 50% if using batch processing. Subtract the cache discount percentage for the portion of input tokens that repeat across requests (system prompt, shared context). A typical chatbot with a 1,500-token system prompt gets 30-50% of its input tokens cached.

Step 5: Add a 20-30% buffer for token estimation variance, retry costs, and usage growth. Real production costs almost always exceed initial estimates because prompts grow, edge cases trigger longer responses, and usage increases after launch.
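The five steps can be collapsed into one estimator. The two categories below (a simple-task tier on Nano pricing and a complex-task tier on GPT-4.1 pricing) are illustrative numbers, not a benchmark:

```python
# Sketch: steps 1-5 as a single function. Prices are $ per 1M tokens;
# cache_discount and cached_fraction apply to input tokens only.

def monthly_estimate(categories, buffer=0.25):
    """Sum daily cost per category, project to a month, add a buffer."""
    daily = 0.0
    for c in categories:
        in_cost = c["requests"] * c["in_tokens"] * c["in_price"] / 1e6
        out_cost = c["requests"] * c["out_tokens"] * c["out_price"] / 1e6
        # Step 4a: cache discount on the repeated share of input tokens.
        in_cost *= 1 - c.get("cached_fraction", 0) * c.get("cache_discount", 0)
        cost = in_cost + out_cost
        if c.get("batch"):   # Step 4b: flat 50% off for batch workloads.
            cost *= 0.5
        daily += cost
    return daily * 30 * (1 + buffer)  # Step 5: 20-30% buffer

simple = {"requests": 5_000, "in_tokens": 1_500, "out_tokens": 400,
          "in_price": 0.10, "out_price": 0.40}
complex_ = {"requests": 1_000, "in_tokens": 3_000, "out_tokens": 1_000,
            "in_price": 2.00, "out_price": 8.00}
print(round(monthly_estimate([simple, complex_])))  # 583
```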

Quick reference: A startup running 5,000 requests/day through GPT-4.1 Nano for simple tasks and 1,000 requests/day through GPT-4.1 for complex tasks typically spends $200-500/month. A mid-size company running 50,000+ requests/day across multiple models typically spends $2,000-10,000/month depending on model mix and optimization.

Token Counting and How OpenAI Bills You

OpenAI uses byte-pair encoding (BPE) tokenization. One token is roughly 3-4 characters of English text, or about 0.75 words. A 1,000-word document typically tokenizes to 1,300-1,400 tokens. You can use OpenAI's tiktoken library to count tokens before sending requests.
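For quick budgeting without a tokenizer dependency, the rule of thumb above can be turned into a rough estimator (use tiktoken when you need exact counts; this heuristic is only for back-of-envelope math):

```python
# Sketch: approximate token count for English text using the two rules
# of thumb above (~4 characters per token, ~0.75 words per token).
# This is a budgeting heuristic, not a tokenizer.

def approx_tokens(text: str) -> int:
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)

# A 1,000-word document should land near the 1,300-1,400 token range
# the article cites; real counts depend on vocabulary and formatting.
```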

System prompts count as input tokens on every request. A 500-token system prompt across 10,000 requests costs 5 million input tokens, $10 on GPT-4.1 before any user messages. This is why prompt caching matters: that same system prompt cached costs $2.50 instead of $10.

Function/tool definitions also count as input tokens. Defining 10 tools with detailed JSON schemas can add 2,000-5,000 tokens per request. Only include the tools each request actually needs, and keep descriptions concise.

The API response includes a usage object with exact prompt_tokens, completion_tokens, and (for reasoning models) reasoning_tokens counts. Log these for every request to build accurate cost tracking. Don't estimate; measure.

Image tokens depend on the image's dimensions and the detail parameter. A 1024x1024 image at high detail costs roughly 765 tokens. A low-detail image costs a fixed 85 tokens. If you're processing images that don't need pixel-level analysis, set detail: low to save 90% on image token costs.
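The image-token arithmetic can be sketched as follows. This follows the widely documented GPT-4o formula (85 base tokens plus 170 per 512px tile after scaling); verify against OpenAI's current docs before relying on it for newer models:

```python
import math

# Sketch: vision token count. Low detail is a flat 85 tokens. High
# detail scales the image down (longest side capped at 2048px, then
# shortest side to 768px, never upscaling) and charges per 512px tile.

def image_tokens(width, height, detail="high"):
    if detail == "low":
        return 85
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(image_tokens(1024, 1024))          # 765, the figure quoted above
print(image_tokens(1024, 1024, "low"))   # 85
```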

Still on GPT-4? What Migration Saves You

If your application still uses GPT-4 ($30/$60), GPT-4 Turbo ($5/$15), or GPT-3.5 Turbo ($0.50/$1.50), you're overpaying significantly. Here's the migration path for each.

GPT-4 → GPT-4.1: Drop-in replacement for most use cases. GPT-4.1 is 15x cheaper ($2/$8 vs $30/$60) with better performance. The main change is the model string: swap gpt-4 for gpt-4.1 in your code and test. Context window jumps from 8K to 1M tokens.

GPT-4 Turbo → GPT-4.1: Also a near-drop-in replacement. GPT-4.1 is 2.5x cheaper ($2/$8 vs $5/$15) with a larger context window (1M vs 128K). Function calling and structured outputs work the same way. JSON mode is supported.

GPT-4o → GPT-4.1: The smallest price gap, but still saves 20% ($2/$8 vs $2.50/$10). GPT-4.1 adds the 1M context window and better instruction following. Vision capabilities are maintained. The only potential issue: GPT-4o behaves slightly differently on some creative writing tasks, so test your specific prompts.

GPT-3.5 Turbo → GPT-4.1 Nano: GPT-4.1 Nano is 5x cheaper ($0.10/$0.40 vs $0.50/$1.50) and significantly more capable. The quality jump is substantial. GPT-4.1 Nano handles tasks that GPT-3.5 Turbo struggled with, at a fraction of the cost.

o1 → o3 or o4-mini: o1 at $15/$60 is expensive. o3 ($2/$8) is 7.5x cheaper on input and o4-mini ($1.10/$4.40) is 13.6x cheaper. Both outperform o1 on most benchmarks. The migration requires testing because reasoning behavior differs between models, but the cost savings are dramatic.

Before migrating any model, run your evaluation suite against the new model to catch behavior differences. Pricing savings mean nothing if output quality drops. Most migrations are smooth, but edge cases in creative writing, specific formatting requirements, or fine-tuned model behavior can surface during testing.

8 Ways OpenAI Charges More Than You Expect

  • Output tokens cost 4-8x more than input tokens depending on the model. GPT-5 has the steepest ratio at 8x. Verbose responses can blow up costs fast.
  • Reasoning models (o3, o4-mini) use internal reasoning tokens billed as output tokens. A simple-looking o3 response might consume 5-10x more tokens than the visible output. You pay for the thinking, not just the answer.
  • GPT-4.1 replaced GPT-4o as the recommended model, but GPT-4o is still available. If your codebase still references GPT-4o, you're paying $2.50/$10 instead of $2/$8 for similar quality. Update your model strings.
  • Cache read pricing varies wildly by model family. GPT-5 family gets 90% off cached reads. GPT-4.1 family gets 75% off. GPT-4o family only gets 50% off. Pick models with better cache discounts if you send repeated context.
  • Rate limits on the free tier are tight: 3 RPM for reasoning models, 500 RPM for GPT-4o-mini. You need $5+ cumulative spend to unlock Tier 1 limits.
  • The Batch API gives 50% off but results arrive within 24 hours, not seconds. Fine for offline processing, useless for real-time apps.
  • Fine-tuned model pricing is separate and higher. Training costs $25/1M tokens for GPT-4o, and inference on fine-tuned models costs more than base models.
  • Image inputs (vision) are tokenized based on dimensions. A 1024x1024 image costs roughly 765 tokens. Sending high-res images without downscaling wastes tokens.

Which Model Should You Use? Quick Decision Guide

Simple chatbot or classification pipeline

GPT-4.1 Nano at $0.10/$0.40 per 1M tokens. It's 20x cheaper than GPT-4.1 and handles extraction, routing, and simple conversations. At this price point, even high-volume apps cost under $50/month.

Production AI application

GPT-4.1 at $2/$8 per 1M tokens. This is OpenAI's recommended production model, with better instruction following and coding than GPT-4o at a lower price. Route simple tasks to GPT-4.1 Nano or GPT-4.1 Mini to save more.

Complex reasoning or research

o4-mini at $1.10/$4.40 for most reasoning tasks. Only upgrade to o3 ($2/$8) when o4-mini's accuracy isn't sufficient. Reserve GPT-5 ($1.25/$10) for your hardest agentic workflows.

Budget-conscious startup or hobby project

GPT-4.1 Nano ($0.10/$0.40) for everything, upgrading to GPT-4.1 Mini ($0.40/$1.60) when Nano's quality falls short. Combined with batch processing (50% off), you can run a serious app for under $20/month.

The Bottom Line

GPT-4.1 Nano at $0.10 per million input tokens is the cheapest capable model in OpenAI's lineup and handles most simple tasks. GPT-4.1 is the new production workhorse, better and cheaper than GPT-4o. For reasoning, o4-mini at $1.10/$4.40 undercuts o3 while matching it on most tasks. GPT-5 is the ceiling for hard problems. The biggest cost-saving move: route requests to the cheapest model that can handle each task, and use batch processing for anything that doesn't need real-time responses.

Disclosure: Pricing information is sourced from official websites and may change. We update this page regularly but always verify current pricing on the vendor's site before purchasing.

Related Resources

  • OpenAI API Review
  • Anthropic API Pricing
  • OpenAI vs Anthropic API
  • AWS Bedrock Pricing
  • Cohere API Pricing
  • Compare All LLM Token Prices
  • AI API Free Tiers Compared

Frequently Asked Questions

How much does the OpenAI API cost?

It depends on the model. GPT-4.1 Nano is the cheapest at $0.10 per 1M input tokens. GPT-4.1 (the recommended production model) costs $2 per 1M input tokens. GPT-5 costs $1.25/$10. o4-mini costs $1.10/$4.40. Output tokens cost 4-8x more than input across all models.

What's the cheapest OpenAI model?

GPT-4.1 Nano at $0.10 per million input tokens and $0.40 per million output tokens. It's designed for classification, extraction, and simple tasks. With batch processing (50% off), it drops to $0.05/$0.20, effectively free for most use cases.

Should I use GPT-4o or GPT-4.1?

GPT-4.1. It replaced GPT-4o as OpenAI's recommended production model. It's cheaper ($2/$8 vs $2.50/$10), has a larger context window (1M vs 128K tokens), and performs better on instruction following and coding tasks. GPT-4o is still available but considered legacy.

What are reasoning tokens and why do they increase costs?

The o-series models (o3, o4-mini) use internal chain-of-thought reasoning before producing a visible answer. These reasoning tokens are billed as output tokens but aren't shown in the response. A simple o3 answer might use 500 visible output tokens but 5,000 reasoning tokens, meaning you pay for 5,500 output tokens total.

How does OpenAI pricing compare to Anthropic's Claude?

OpenAI is generally cheaper at the low end. GPT-4.1 Nano ($0.10/$0.40) is 10x cheaper than Claude Haiku 4.5 ($1/$5). At the production tier, GPT-4.1 ($2/$8) is cheaper than Sonnet 4.6 ($3/$15). At the flagship tier, GPT-5 ($1.25/$10) is cheaper than Opus 4.6 ($5/$25) on input but comparable on output. Quality differences vary by task.

What's the Batch API and when should I use it?

The Batch API processes requests asynchronously (results within 24 hours) at a 50% discount on all token costs. Use it for any workload that doesn't need real-time responses: data extraction, content generation, evaluations, nightly processing. It's the single biggest cost reduction available.

How does prompt caching work?

OpenAI automatically caches repeated prompt prefixes and charges a reduced rate on subsequent requests. The discount varies: GPT-5 family gets 90% off cached reads, GPT-4.1 family gets 75% off, GPT-4o family gets 50% off, and o-series discounts vary by model (see the pricing table). Structure prompts with stable content first and variable content last to maximize cache hits.

What happened to GPT-4 and GPT-4 Turbo?

They're legacy models. GPT-4 ($30/$60 per 1M tokens) and GPT-4 Turbo ($5/$15) are still available but deprecated. GPT-4.1 ($2/$8) and GPT-4o ($2.50/$10) replaced them at dramatically lower prices with better performance. There's no reason to use GPT-4 or GPT-4 Turbo in new projects.

Is the OpenAI API free?

OpenAI offers a free tier with limited rate limits (3 RPM for reasoning models, 500 RPM for GPT-4o-mini). It's enough for testing and development but not production use. Paid tiers start at $5 cumulative spend and unlock significantly higher rate limits.

How do I reduce my OpenAI API costs?

Three high-impact moves: (1) Route simple requests to GPT-4.1 Nano instead of GPT-4.1, 20x cheaper. (2) Use the Batch API for non-real-time workloads, 50% off everything. (3) Enable prompt caching by structuring prompts with stable content first, up to 90% off repeated input tokens. Together these can cut costs by 70-85%.
