
Ollama

Platforms & Tools

An open-source tool for running large language models locally on personal computers with a simple command-line interface.

Ollama is an open-source application that makes it easy to download, run, and manage large language models on local hardware. It provides a simple command-line interface and a REST API, abstracting away the complexity of model quantization, memory management, and GPU acceleration so that users can run LLMs with minimal setup.
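As a sketch of the REST interface, here is a minimal Python client using only the standard library. It assumes a local Ollama server on its default port (11434) and a model already pulled under the name `llama3`; both the port and the model name are assumptions, and the helper function names are illustrative, not part of Ollama itself.

```python
import json
import urllib.request

# Ollama's default listen address (assumption: server started with defaults)
OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build a request for Ollama's /api/generate endpoint.

    Returns the endpoint URL and a JSON-encoded body. Setting
    "stream" to False asks the server for one complete JSON response
    instead of a stream of partial tokens.
    """
    url = f"{OLLAMA_URL}/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return url, body.encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a running Ollama server and return the completion."""
    url, body = build_generate_request(model, prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (requires `ollama serve` running and the model pulled,
# e.g. via `ollama pull llama3`):
#   print(generate("llama3", "Why is the sky blue?"))
```

The equivalent command-line interaction would be `ollama run llama3 "Why is the sky blue?"`, with the CLI handling the server and streaming for you.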

Ollama packages models using a format inspired by Docker, where each model is defined by a "Modelfile" that specifies the base model, parameters, system prompts, and other configuration. Users can pull pre-built models from the Ollama library (including Llama, Mistral, Gemma, Phi, and many others) or create custom models by modifying existing ones. The tool detects available hardware automatically and chooses settings to optimize performance.
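A minimal Modelfile illustrating that layered format might look like the following; the base model name, temperature value, and system prompt are placeholder assumptions, not recommendations:

```
# Base model to build on (must be available locally or in the Ollama library)
FROM llama3

# Sampling temperature; higher values produce more varied output
PARAMETER temperature 0.7

# System prompt applied to every conversation with this model
SYSTEM "You are a concise technical assistant."
```

Building and running such a model would look like `ollama create concise-assistant -f Modelfile` followed by `ollama run concise-assistant`, where `concise-assistant` is an arbitrary name chosen here for illustration.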

Ollama has become popular in the developer community for local AI development, prototyping, and privacy-sensitive applications where sending data to cloud APIs is not desirable. It runs on macOS, Linux, and Windows, supports both CPU and GPU inference, and integrates with tools like Open WebUI, LangChain, and various IDE extensions for code completion.

Last updated: February 25, 2026