GPT-4o Pricing: What It Costs Per Token (April 2026)
GPT-4o launched in May 2024 at $5/$15 per million tokens and got a 50% price cut in October 2024. It now costs $2.50 input and $10 output per million tokens. OpenAI has since released GPT-4.1 as the recommended production replacement, but GPT-4o remains available and widely used. If you are running GPT-4o in production or evaluating whether to switch, this page covers everything: current per-token pricing, batch and caching discounts, comparison to GPT-4.1, and real-world cost estimates.
GPT-4o
- ✓ Multimodal: text, image, and audio input
- ✓ 128K context window
- ✓ Function calling and structured outputs
- ✓ Vision capabilities included at no extra charge
- ✓ 50% cached input discount ($1.25/1M)
GPT-4o Mini
- ✓ Cheapest GPT-4o variant
- ✓ 128K context window
- ✓ Good for classification, extraction, simple tasks
- ✓ Function calling supported
- ✓ 50% cached input discount ($0.075/1M)
GPT-4o Audio
- ✓ Native audio input and output
- ✓ Text tokens still at standard GPT-4o rates
- ✓ Audio tokens are significantly more expensive
- ✓ Best for voice-first applications
GPT-4o (Batch)
- ✓ 50% discount on standard pricing
- ✓ 24-hour turnaround SLA
- ✓ Same quality as synchronous API
- ✓ Best for bulk processing and evaluation
GPT-4o Pricing Table (April 2026)
Here is every GPT-4o variant with current per-token pricing. All prices are per 1 million tokens unless noted otherwise.

| Model | Input | Cached Input | Output |
|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-4o Mini | $0.15 | $0.075 | $0.60 |
| GPT-4o (Batch) | $1.25 | n/a | $5.00 |
| GPT-4o Audio (audio tokens) | $40.00 | n/a | $80.00 |

GPT-4o Audio bills text tokens at the standard GPT-4o rates; the audio rates above apply to audio tokens only.
GPT-4o Price History
GPT-4o launched in May 2024 at $5.00 input and $15.00 output per million tokens. In October 2024, OpenAI cut prices by 50% to the current $2.50/$10.00. GPT-4o Mini launched in July 2024 at $0.15/$0.60 and has not changed since. The price cuts made GPT-4o competitive with Claude 3.5 Sonnet at the time, though both providers have since released newer models.
GPT-4o vs GPT-4.1: Which Should You Use?
GPT-4.1 replaced GPT-4o as OpenAI's recommended production model in January 2025. It is cheaper ($2.00/$8.00 vs $2.50/$10.00), scores higher on coding benchmarks (SWE-bench), handles instruction following better, and supports a 1 million token context window versus GPT-4o's 128K. The main reason to stay on GPT-4o is vision support. GPT-4.1 does not accept image inputs. For text-only applications, GPT-4.1 is strictly better on price and performance. The API interface is identical, so migration requires only changing the model parameter from gpt-4o to gpt-4.1.
GPT-4o vs GPT-4o Mini: When to Use Each
GPT-4o Mini costs 94% less than GPT-4o on both input and output tokens. For tasks that do not require GPT-4o's full capabilities (classification, entity extraction, simple Q&A, content routing), Mini delivers comparable results at a fraction of the cost. The quality gap shows up on complex reasoning, nuanced writing, and multi-step tool use. A common pattern is to use Mini for initial filtering or classification and route only complex requests to the full GPT-4o or GPT-4.1 model. This hybrid approach can cut API costs by 80% or more depending on your traffic mix.
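The routing pattern above can be sketched in a few lines. The 90/10 traffic split, the per-request token counts, and the complexity threshold are all illustrative assumptions, not measurements:

```python
# Sketch: route requests by a complexity score and compare the blended
# cost to sending everything to GPT-4o. Traffic mix and token counts
# are illustrative assumptions.

# Per-1M-token prices from the pricing table.
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def pick_model(complexity: float, threshold: float = 0.7) -> str:
    """Send only complex requests to the full model (threshold is a tunable assumption)."""
    return "gpt-4o" if complexity >= threshold else "gpt-4o-mini"

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Assume 1,000 input / 300 output tokens per request and 90% simple traffic.
full_only = request_cost("gpt-4o", 1000, 300)
blended = 0.9 * request_cost("gpt-4o-mini", 1000, 300) + 0.1 * full_only
savings = 1 - blended / full_only
print(f"savings: {savings:.0%}")  # roughly 85% under these assumptions
```

How you score "complexity" is the hard part in practice; a cheap heuristic or a Mini-based classifier are both common choices.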
Batch API: Half-Price GPT-4o
OpenAI's Batch API processes requests asynchronously at 50% off. GPT-4o Batch pricing is $1.25 input and $5.00 output per million tokens. Jobs complete within 24 hours. The Batch API accepts the same request format as the synchronous Chat Completions API, making integration straightforward. Best for: bulk content generation, large-scale data labeling, evaluation pipelines, and any workload where latency is not critical. You submit a JSONL file of requests and poll for results.
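Building the input file looks like this. The request-line shape follows the Batch API's JSONL format; the `custom_id` values and prompts are placeholders:

```python
import json

# Sketch: each JSONL line wraps one Chat Completions request for the
# Batch API. The custom_id values and prompts are placeholders.
def batch_line(custom_id: str, prompt: str) -> str:
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}],
        },
    })

lines = [batch_line(f"req-{i}", f"Label this record: {i}") for i in range(3)]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

You then upload the file and create a batch job against it, polling until results are ready.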
Prompt Caching and How It Reduces GPT-4o Costs
OpenAI automatically caches the prefix of your prompt. When a subsequent request starts with the same tokens, cached input tokens are billed at 50% off ($1.25/1M instead of $2.50/1M for GPT-4o). Caching works best with long, stable system prompts and few-shot examples. The cache has a short TTL, typically a few minutes, so it helps most with high-throughput applications making frequent similar requests. Structure your prompts with static content first and dynamic content last to maximize cache hits.
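The effect on your blended input price is simple arithmetic. A sketch, where the cacheable fraction of the prompt is an assumption you would measure for your own traffic:

```python
# Sketch: effective GPT-4o input price per 1M tokens when a fraction of
# input tokens hit the prompt cache (cached tokens bill at 50% off).
BASE_INPUT = 2.50    # $ per 1M input tokens
CACHED_INPUT = 1.25  # $ per 1M cached input tokens

def effective_input_price(cached_fraction: float) -> float:
    return cached_fraction * CACHED_INPUT + (1 - cached_fraction) * BASE_INPUT

# Assume an 800-token static system prompt in a 1,000-token prompt (80% cacheable).
print(effective_input_price(0.8))  # 1.50
```

This is why putting static content first matters: only the matching prefix is cacheable, so a dynamic field early in the prompt forfeits the discount for everything after it.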
Real-World Cost Examples
Here is what GPT-4o actually costs for common use cases, assuming average token counts per request.
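The per-request arithmetic can be sketched as follows; the token counts per use case are illustrative assumptions, not measurements:

```python
# Per-request cost sketch for a few hypothetical workloads.
# Token counts per use case are illustrative assumptions.
PRICES = {  # ($ per 1M input tokens, $ per 1M output tokens)
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

examples = {
    "chat turn (gpt-4o, 500 in / 200 out)":        cost("gpt-4o", 500, 200),
    "summarize a doc (gpt-4o, 4000 in / 500 out)": cost("gpt-4o", 4000, 500),
    "classify (gpt-4o-mini, 300 in / 10 out)":     cost("gpt-4o-mini", 300, 10),
}
for name, c in examples.items():
    print(f"{name}: ${c:.5f}")
```

Note how the output-heavy summarization case costs nearly five times the chat turn, and the Mini classification case is effectively free at scale.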
Rate Limits by Spending Tier
OpenAI sets rate limits based on your total API spending. New accounts start at the free tier with modest limits, and limits increase automatically as you spend more. For GPT-4o specifically, Tier 1 accounts get 500 RPM and 30K TPM, scaling up to 10K RPM and 30M TPM at Tier 5.
How to Estimate Your Monthly GPT-4o Bill
Multiply your average input tokens per request by your request volume, then divide by 1 million and multiply by $2.50. Do the same for output tokens at $10.00 per million. Example: 10,000 requests/day averaging 1,000 input and 300 output tokens each = 10M input tokens + 3M output tokens per day. Monthly: 300M input ($750) + 90M output ($900) = $1,650/month at standard rates, or $825/month with Batch API. Add caching if your prompts have stable prefixes.
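The worked example above as a small calculator (30-day month, standard versus Batch rates):

```python
# Monthly bill estimator reproducing the worked example above:
# 10,000 requests/day at 1,000 input and 300 output tokens each.
def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int,
                 in_price: float = 2.50, out_price: float = 10.00,
                 days: int = 30) -> float:
    monthly_in = requests_per_day * in_tokens * days / 1_000_000   # millions of tokens
    monthly_out = requests_per_day * out_tokens * days / 1_000_000
    return monthly_in * in_price + monthly_out * out_price

standard = monthly_cost(10_000, 1000, 300)
batch = monthly_cost(10_000, 1000, 300, in_price=1.25, out_price=5.00)
print(standard, batch)  # 1650.0 825.0
```

Swap in your own request volume and token averages; the price arguments default to standard GPT-4o rates.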
GPT-4o vs Claude Sonnet 4.6: Cross-Provider Comparison
Anthropic's Claude Sonnet 4.6 is the closest competitor to GPT-4o. Claude Sonnet 4.6 costs $3.00 input and $15.00 output per million tokens, more expensive than GPT-4o on both dimensions. Claude offers a 200K context window versus GPT-4o's 128K. Performance is comparable on most benchmarks, with Claude generally stronger on long-form writing and analysis, and GPT-4o stronger on structured output and tool use. For cost-sensitive applications, GPT-4o (or better yet, GPT-4.1) has the edge.
Migration Guide: GPT-4o to GPT-4.1
If you are running GPT-4o for text-only applications, switching to GPT-4.1 saves 20% on every API call with no quality loss. The migration is simple: change the model parameter from "gpt-4o" to "gpt-4.1" in your API calls. Both models use the same Chat Completions API format, support the same tools (function calling, structured outputs, JSON mode), and accept the same parameters. Test with a small sample first to verify output quality meets your requirements, then switch over. The only blocker is if you rely on vision input. GPT-4.1 does not support images.
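The switch really is just the model string. A minimal sketch (the prompt is a placeholder; running the request requires the `openai` SDK and an API key):

```python
# Sketch: for text-only workloads the only change is the model name;
# the Chat Completions payload shape is identical between the two models.
def build_request(model: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": "Summarize this ticket."}],
    }

old = build_request("gpt-4o")
new = {**old, "model": "gpt-4.1"}  # the entire migration

assert old["messages"] == new["messages"]  # everything else is unchanged
print(new["model"])  # gpt-4.1
```

Run a sample of real traffic through both payloads and diff the outputs before cutting over fully.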
Hidden Costs & Gotchas
- ⚠ GPT-4.1 is cheaper for most use cases: GPT-4.1 costs $2.00/$8.00 per 1M tokens versus GPT-4o's $2.50/$10.00, 20% cheaper on both input and output. Unless you need vision or audio specifically, GPT-4.1 is the better deal with stronger benchmark performance.
- ⚠ Output tokens cost 4x input tokens: GPT-4o charges $2.50 per 1M input tokens but $10.00 per 1M output tokens. Applications that generate long responses (summaries, articles, code) will see output dominate the bill. Budget based on your output-to-input ratio.
- ⚠ Vision queries use image tokens: Sending images to GPT-4o converts them to tokens based on resolution. A 1024x1024 image uses roughly 765 tokens, and high-res mode can use 2-3x more. Vision is included in the base price, but the token count adds up fast.
- ⚠ Audio tokens are up to 16x more expensive: GPT-4o audio input costs $40 per 1M tokens and output costs $80 per 1M tokens, 16x and 8x the text token prices respectively. For voice applications, calculate audio token costs separately.
- ⚠ Prompt caching only helps on repeated prefixes: The 50% cached input discount applies only when the beginning of your prompt matches a recent request exactly. System prompts and few-shot examples benefit most. Dynamic user inputs typically do not get cached.
- ⚠ Rate limits depend on your spending tier: Tier 1 accounts ($5 spent) get 500 RPM and 30K TPM on GPT-4o. Tier 5 ($1,000+ spent) gets 10K RPM and 30M TPM. Your effective throughput depends on your account's spending history.
Which Plan Do You Need?
Prototyping or light usage
GPT-4o Mini at $0.15/$0.60 per 1M tokens. 94% cheaper than standard GPT-4o for simple tasks. Use this for classification, extraction, routing, and early-stage development.
Production text applications
Migrate to GPT-4.1 ($2.00/$8.00). It's 20% cheaper than GPT-4o, benchmarks higher on coding and instruction following, and has a 1M token context window. GPT-4o is no longer the recommended production model.
Vision and multimodal workflows
GPT-4o is still the go-to for image understanding. GPT-4.1 does not support vision input. If your application sends images, stay on GPT-4o or evaluate o4-mini for reasoning-heavy vision tasks.
Bulk processing and evaluation
Batch API at $1.25/$5.00 per 1M tokens. Half the cost of synchronous GPT-4o, with 24-hour turnaround. Ideal for data labeling, content generation, and test suite evaluation.
The Bottom Line
GPT-4o at $2.50/$10.00 per million tokens is no longer OpenAI's best value for text-only applications. GPT-4.1 is 20% cheaper and performs better on most benchmarks. But GPT-4o remains the right choice for vision and multimodal workflows where GPT-4.1 has no support. If you're starting fresh, default to GPT-4.1 for text and GPT-4o for images. If you're already running GPT-4o, the migration is straightforward since they share the same API format.
Frequently Asked Questions
How much does GPT-4o cost per token?
GPT-4o costs $2.50 per 1 million input tokens and $10.00 per 1 million output tokens. With prompt caching, input tokens drop to $1.25 per million. Batch API pricing is $1.25 input and $5.00 output per million tokens.
Is GPT-4o cheaper than GPT-4?
Yes. GPT-4o is significantly cheaper than the original GPT-4 ($30/$60 per 1M tokens) and GPT-4 Turbo ($10/$30 per 1M tokens). GPT-4o costs $2.50/$10.00, roughly 75% cheaper than GPT-4 Turbo.
Should I use GPT-4o or GPT-4.1?
Use GPT-4.1 for text-only applications: it is 20% cheaper and benchmarks higher. Use GPT-4o only if you need vision (image input) or audio capabilities, which GPT-4.1 does not support.
What is GPT-4o Mini and when should I use it?
GPT-4o Mini costs $0.15/$0.60 per million tokens, 94% cheaper than GPT-4o. Use it for simple tasks like classification, extraction, routing, and early-stage development. Switch to GPT-4o or GPT-4.1 for complex reasoning and nuanced generation.
Does GPT-4o have a free tier?
The OpenAI API does not have a free tier for GPT-4o. New accounts get $5 in credits. ChatGPT Free includes limited GPT-4o access, but that is the consumer product, not the API.
How does GPT-4o batch pricing work?
Submit a JSONL file of requests to the Batch API endpoint. Processing completes within 24 hours. Pricing is 50% off standard rates: $1.25 input and $5.00 output per million tokens. Same quality as synchronous calls.
Is GPT-4o being deprecated?
OpenAI has not announced a deprecation date for GPT-4o. It is no longer the recommended production model (GPT-4.1 replaced it), but it remains available. OpenAI typically gives at least 6 months notice before deprecating models.
How much does it cost to analyze an image with GPT-4o?
Image analysis uses GPT-4o's standard text token pricing plus additional tokens for the image. A 1024x1024 image uses roughly 765 tokens ($0.002). High-res images use more tokens. There is no separate per-image charge.
What is the context window for GPT-4o?
GPT-4o supports 128K tokens of context (roughly 96K words). Both GPT-4o and GPT-4o Mini share this 128K limit. If you need more context, GPT-4.1 supports 1 million tokens.
How does GPT-4o pricing compare to Claude?
GPT-4o ($2.50/$10.00) is cheaper than Claude Sonnet 4.6 ($3.00/$15.00) on both input and output. Claude Haiku 4.5 ($0.80/$4.00) is cheaper than GPT-4o but more expensive than GPT-4o Mini ($0.15/$0.60).