>_TheQuery

Cold Start

MLOps

The initial delay when a system or service must initialize from scratch before it can handle requests, common in serverless and containerized deployments.

A cold start occurs when a system that has been idle or is being launched for the first time must complete initialization steps before it can begin processing. This includes loading code into memory, establishing network connections, initializing runtimes, and in the case of ML systems, loading model weights into memory or onto a GPU.

In serverless computing platforms like AWS Lambda or Google Cloud Functions, cold starts happen when a new container instance must be provisioned to handle an incoming request. The latency introduced can range from milliseconds to several seconds depending on the runtime, package size, and initialization logic. For ML inference endpoints, cold starts can be especially pronounced because large model files must be downloaded and loaded into GPU memory.
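The container-reuse behavior described above is why serverless handlers typically initialize expensive resources in global scope: the cost is paid once on a cold start, and warm invocations reuse the cached object. Here is a minimal sketch of that pattern; `load_model` and `handler` are hypothetical stand-ins, and the `time.sleep` simulates slow work such as downloading weights:

```python
import time

_MODEL = None  # populated on first use; persists across warm invocations

def load_model():
    """Hypothetical stand-in for downloading weights / moving them to a GPU."""
    time.sleep(0.05)  # simulate slow initialization work
    return {"weights": [0.1, 0.2, 0.3]}

def handler(event):
    """Lambda-style handler: pays the load cost only on a cold start."""
    global _MODEL
    if _MODEL is None:          # cold-start path: initialize once
        _MODEL = load_model()
    return sum(_MODEL["weights"]) * event.get("x", 1)

# First invocation is "cold" (includes load_model); later ones are "warm".
t0 = time.perf_counter()
handler({"x": 1})
cold_latency = time.perf_counter() - t0

t0 = time.perf_counter()
handler({"x": 1})
warm_latency = time.perf_counter() - t0
```

Timing both invocations makes the gap visible: the warm call skips initialization entirely, which is the difference cold-start mitigations try to preserve.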

Strategies to mitigate cold starts include keeping instances warm with periodic pings, using provisioned concurrency, optimizing container image sizes, lazy-loading non-critical dependencies, and using model caching. In the context of recommendation systems, "cold start" also refers to the challenge of making predictions for new users or items with no historical data.
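One of the mitigations listed above, lazy-loading non-critical dependencies, can be sketched as a small proxy that defers an import until first use, keeping it off the cold-start critical path. The `LazyModule` class is an illustrative assumption, not a library API; `json` stands in for a heavy optional dependency:

```python
import importlib

class LazyModule:
    """Proxy that defers an expensive import until first attribute access."""
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # so the real import happens here, on first use.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# 'json' stands in for a slow-to-import dependency (e.g. a plotting library).
json = LazyModule("json")

def handler(event):
    # Import cost is paid on the first call, not at container start.
    return json.dumps(event)
```

This shortens the initialization work a cold start must complete, at the price of a one-time delay on the first request that actually touches the deferred dependency.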

Last updated: February 25, 2026