Tokens
Why It Matters
Token count directly impacts cost and performance. Efficient prompt engineering means getting the same quality output with fewer input tokens. At enterprise scale, reducing prompt length by 20% can save thousands of dollars per month.
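To make the savings concrete, here is a back-of-envelope sketch. The price, call volume, and prompt length below are made-up placeholders, not any provider's actual pricing:

```python
# Back-of-envelope savings from trimming prompts. All numbers are
# illustrative assumptions, not real provider pricing.
INPUT_PRICE_PER_1K = 0.01      # assumed $ per 1,000 input tokens
CALLS_PER_MONTH = 10_000_000   # assumed monthly call volume
PROMPT_TOKENS = 1_500          # assumed average prompt length

def monthly_input_cost(prompt_tokens: int) -> float:
    """Monthly spend on input tokens at the assumed rate and volume."""
    return prompt_tokens / 1_000 * INPUT_PRICE_PER_1K * CALLS_PER_MONTH

baseline = monthly_input_cost(PROMPT_TOKENS)
trimmed = monthly_input_cost(int(PROMPT_TOKENS * 0.8))  # 20% shorter prompt

print(f"baseline ${baseline:,.0f}/mo, trimmed ${trimmed:,.0f}/mo, "
      f"saved ${baseline - trimmed:,.0f}/mo")
```

At these assumed numbers, a 20% trim saves tens of thousands of dollars per month; scale the constants to your own traffic.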
How It Works
Tokens are the fundamental units that language models process. A token might be a whole word ('hello'), a word fragment ('un' + 'believ' + 'able'), a punctuation mark, or a special character. Different models use different tokenizers, so the same text produces different token counts across models.
Tokenization affects both cost and behavior. API pricing is per-token, so understanding token counts is essential for budget management. Tokenization quirks also cause model behavior oddities: models struggle with character-counting tasks because they don't see individual characters, only tokens.
Common ratios: English text averages about 1.3 tokens per word (roughly 0.75 words per token, or about 4 characters per token). Code tends to use more tokens per line than prose. Non-English languages, especially those with non-Latin scripts, typically require more tokens per word, making API calls more expensive.
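The rules of thumb above can be turned into quick estimators. These are rough approximations for English prose only; exact counts require the model's actual tokenizer:

```python
# Rough token estimates from common rules of thumb (illustrative only;
# real counts come from the model provider's tokenizer).
def estimate_tokens_from_chars(text: str) -> int:
    """~4 characters per token for typical English prose."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(text: str) -> int:
    """~1.3 tokens per word for typical English prose."""
    return max(1, round(len(text.split()) * 1.3))

sample = "Token count directly impacts cost and performance."
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))
```

Expect the two estimates to disagree slightly; both are only for budgeting ballparks, and both undercount for code and non-English text.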
Common Mistakes
Common mistake: Estimating costs based on word count instead of actual token count
Use the model provider's tokenizer tool to get exact counts. Libraries like tiktoken (OpenAI) give precise token counts for budgeting.
Common mistake: Ignoring that both input AND output tokens are billed
Output tokens are typically 3-4x more expensive than input tokens. Limiting output length (e.g., 'respond in under 100 words') can significantly reduce costs.
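Per-call cost combines both directions. The rates below are hypothetical placeholders (with output priced at 4x input, in line with the typical ratio above), not any provider's actual pricing:

```python
# A call is billed on input AND output tokens. Rates are assumptions.
INPUT_PER_1K = 0.003    # assumed $ per 1K input tokens
OUTPUT_PER_1K = 0.012   # assumed $ per 1K output tokens (4x input here)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one API call at the assumed rates."""
    return (input_tokens / 1_000 * INPUT_PER_1K
            + output_tokens / 1_000 * OUTPUT_PER_1K)

# Capping output near 100 words (~130 tokens) vs an uncapped 500-token answer:
print(call_cost(800, 500), call_cost(800, 130))
```

Because output is the pricier side, trimming the response length often saves more than trimming the prompt by the same number of tokens.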
Career Relevance
Token economics directly affect AI product viability. Prompt engineers and AI product managers need to understand token costs to build sustainable products. A prompt that uses 2,000 tokens instead of 500 for the same task means 4x the input-token cost at scale.