>_TheQuery

Epoch

Fundamentals

One complete pass through the entire training dataset during model training.

An epoch represents one full cycle through every example in the training dataset. If you have 10,000 training examples and use a batch size of 100, one epoch consists of 100 gradient update steps. Training typically runs for multiple epochs -- anywhere from a few to hundreds -- allowing the model to see and learn from each example multiple times.
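The arithmetic above is worth making concrete. This is a minimal sketch (the function name `steps_per_epoch` is ours, not a standard API) showing how updates per epoch and total updates follow from dataset size, batch size, and epoch count:

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Gradient updates in one epoch; a final partial batch still counts as one step."""
    return math.ceil(num_examples / batch_size)

# The figures from the text: 10,000 examples, batch size 100.
updates = steps_per_epoch(10_000, 100)
print(updates)       # 100 gradient updates per epoch
print(updates * 50)  # 5000 total updates over 50 epochs
```

Note that a dataset whose size is not a multiple of the batch size yields one extra, smaller batch per epoch, which is why the step count is rounded up.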

The number of epochs is a critical training decision tied to the bias-variance tradeoff. Too few epochs means the model hasn't learned enough from the data (underfitting). Too many epochs means the model starts memorizing training data rather than learning generalizable patterns (overfitting). The optimal number is typically determined by monitoring validation loss and applying early stopping when it begins to increase.
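Early stopping as described above can be sketched in a few lines. This is an illustrative implementation, not a particular library's API; the `patience` parameter (how many non-improving epochs to tolerate before stopping) is a common convention we assume here:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which to stop training: the first epoch
    after validation loss has failed to improve for `patience` epochs."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # validation loss has been rising; likely overfitting
    return len(val_losses) - 1  # never triggered; trained to the end

# Validation loss improves through epoch 3, then rises: stop at epoch 6.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(early_stopping_epoch(losses, patience=3))  # → 6
```

In practice one would also keep a checkpoint of the model at the best-loss epoch (epoch 3 here) and restore it after stopping.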

In practice, the term is also used to describe training progress and schedule milestones. Learning rate schedules are often defined in terms of epochs (e.g., reduce learning rate by 10x at epoch 30 and 60). It is important to distinguish epochs from iterations (individual gradient update steps) and from the total number of gradient updates, which depends on both epochs and batch size.

Last updated: February 22, 2026