Cross-Entropy
Why It Matters
Cross-entropy is the objective function that LLMs are trained to minimize. Understanding it explains why models sometimes generate high-probability but incorrect text (hallucinations) and why temperature adjustments change output quality.
How It Works
Cross-entropy measures the difference between two probability distributions: what the model predicted and what actually happened. For language models, it measures how surprised the model is by each token in the training data. The goal of training is to minimize this surprise across trillions of tokens.
The formula computes the negative log probability assigned to the correct token at each position. If the model assigned high probability to the correct token, the cross-entropy for that position is low. If it assigned low probability, the cross-entropy is high. Averaging across all positions gives the model's overall loss.
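The per-position computation above can be sketched in a few lines of plain Python. This is an illustrative toy, not a training implementation: `predicted_probs` and the three-token vocabulary are hypothetical stand-ins for a real model's output distribution.

```python
import math

def cross_entropy(predicted_probs, target_ids):
    """Average negative log probability of the correct token at each position.

    predicted_probs: one dict per position mapping token id -> probability
    target_ids: the correct token id at each position
    """
    total = 0.0
    for probs, target in zip(predicted_probs, target_ids):
        # Low when the model assigned high probability to the correct token,
        # large when it was surprised.
        total += -math.log(probs[target])
    return total / len(target_ids)

# Toy example: two positions over a hypothetical 3-token vocabulary.
preds = [
    {0: 0.7, 1: 0.2, 2: 0.1},  # model fairly confident the next token is 0
    {0: 0.1, 1: 0.1, 2: 0.8},  # model fairly confident the next token is 2
]
loss = cross_entropy(preds, [0, 2])  # correct both times, so loss is low (~0.29)
```

If the model had assigned only 0.1 to each correct token instead, the loss would jump to about 2.3, which is the "surprise" training works to minimize.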
Cross-entropy connects to perplexity through a simple relationship: perplexity = 2^(cross-entropy) when the loss is measured in bits (base-2 logarithm). A model with a cross-entropy of 3.32 bits therefore has a perplexity of 10. Note that most frameworks report loss in nats (natural logarithm), in which case perplexity = e^(cross-entropy). Understanding this relationship helps interpret training curves and model comparisons.
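The conversion is a one-liner, but the base of the exponent must match the base of the logarithm used in the loss. A minimal sketch (the function names here are illustrative, not from any library):

```python
import math

def perplexity_from_bits(loss_bits):
    # Cross-entropy measured with log base 2 (units: bits)
    return 2 ** loss_bits

def perplexity_from_nats(loss_nats):
    # Cross-entropy measured with the natural log (what most frameworks report)
    return math.exp(loss_nats)

# A base-2 cross-entropy of ~3.32 bits corresponds to a perplexity of 10:
# 2 ** 3.32 is roughly 10. The same perplexity expressed in nats is ~2.30.
bits_loss = math.log2(10)   # ~3.3219
nats_loss = math.log(10)    # ~2.3026
```

Mixing the two bases is a common source of confusion when comparing perplexity numbers across papers and libraries.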
Common Mistakes
Common mistake: Confusing training loss (cross-entropy) with model quality for downstream tasks
Lower training loss means better next-token prediction, not necessarily better performance on the tasks you care about. Evaluate models on downstream benchmarks, not training loss alone.
Common mistake: Expecting cross-entropy to decrease monotonically during training
Loss curves have noise, and validation loss may increase while training loss decreases (overfitting). Monitor validation loss and use early stopping when it starts rising.
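The early-stopping advice above can be sketched as a simple patience check. This is a minimal illustration, not any framework's API; `patience` is a hypothetical parameter meaning "how many evaluations without improvement to tolerate":

```python
def should_stop(val_losses, patience=3):
    """Return True once validation loss has not improved for `patience` evals."""
    best_idx = val_losses.index(min(val_losses))
    # Count evaluations since the best validation loss was seen.
    return len(val_losses) - 1 - best_idx >= patience

# Validation loss bottoms out at 2.4, then rises for three evals: stop.
history = [3.0, 2.5, 2.4, 2.6, 2.7, 2.8]
```

Real training loops typically also checkpoint the model at the best validation loss, so stopping late costs nothing beyond wasted compute.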
Career Relevance
Cross-entropy understanding is fundamental for ML engineers and researchers working on model training. It's the objective function that drives all language model development, making it important background knowledge for anyone in the AI field.