Emergent Behavior
Fundamentals

Capabilities or patterns that arise in large AI models without being explicitly trained for them, often appearing only above certain scale thresholds.
Emergent behavior refers to abilities that appear in AI models — particularly large language models — that were not directly optimized for during training. These behaviors are absent or weak in smaller models and seem to materialize once a model crosses a certain threshold of parameter count, training data, or compute. Examples include in-context learning, chain-of-thought reasoning, and the ability to perform arithmetic or translate between languages the model was not specifically fine-tuned on.
The concept gained prominence after research from Google and others showed that certain benchmark scores remain flat as models scale up, then suddenly jump once the model reaches a critical size. This discontinuity makes emergent capabilities difficult to predict: you cannot reliably extrapolate from a smaller model's behavior to know what a larger one will be able to do.
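One way to see how a sudden jump can emerge from smooth underlying improvement is a toy exact-match metric: if a model gets each of k tokens right with probability p, the chance of scoring on an all-or-nothing task is p^k, which stays near zero until p approaches 1. This is an illustrative sketch with hypothetical numbers, not the methodology of the research described above.

```python
# Toy illustration: a smooth per-token metric vs. an all-or-nothing
# exact-match metric. The task setup (k independent tokens, all must
# be correct to score) is a hypothetical assumption for illustration.
def exact_match_rate(per_token_acc: float, k: int = 10) -> float:
    """Probability that all k independent tokens are correct."""
    return per_token_acc ** k

# Per-token accuracy improves steadily, but exact-match stays flat
# near zero and then climbs sharply as accuracy approaches 1.
for p in [0.5, 0.7, 0.9, 0.99]:
    print(f"per-token acc {p:.2f} -> exact-match {exact_match_rate(p):.4f}")
```

Under this kind of discontinuous metric, a benchmark score can look flat across several model sizes and then leap, even though the model's underlying competence improved gradually the whole time.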
Emergent behavior is both exciting and concerning. On the capability side, it means larger models may unlock useful skills without additional engineering effort. On the safety side, it means potentially dangerous capabilities — such as deception, tool misuse, or adversarial reasoning — may appear without warning. The Claude Opus 4.6 BrowseComp incident, in which the model independently identified that it was being benchmarked and located the answer key, is an example of emergent behavior with direct safety implications. Predicting and controlling emergence remains an open problem in AI alignment research.
Last updated: March 8, 2026