
Large Language Model

NLP

A neural network trained on vast amounts of text data that can understand and generate human language with remarkable fluency and versatility.

A large language model (LLM) is a transformer-based neural network with billions (or trillions) of parameters trained on massive corpora of text data. LLMs learn statistical patterns in language through self-supervised pretraining, typically using a next-token prediction objective, enabling them to generate coherent and contextually appropriate text across a wide range of tasks.
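The next-token prediction objective can be illustrated with a deliberately tiny sketch: count which token follows which in a corpus, then predict the most frequent successor. A real LLM replaces this counting with a transformer that learns a probability distribution over the whole vocabulary, conditioned on the full preceding context, but the training signal — predict the next token — is the same. The corpus and function names here are illustrative, not from any particular implementation.

```python
from collections import Counter, defaultdict

# Toy corpus; real LLMs pretrain on trillions of tokens, not one sentence.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigram transitions: for each token, which tokens follow it and how often?
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next token under the bigram counts."""
    counts = transitions[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" more often than "mat"
```

A transformer generalizes this in two ways: the conditioning context is many tokens long rather than one, and the distribution is computed by a learned neural network rather than a lookup table, which lets the model assign sensible probabilities to contexts it has never seen verbatim.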

The capabilities of LLMs scale with model size, data quantity, and compute investment, a relationship described by scaling laws. Modern LLMs like GPT-4, Claude, Gemini, and Llama demonstrate emergent abilities at sufficient scale, including in-context learning, chain-of-thought reasoning, and the ability to follow complex instructions. Post-training techniques such as reinforcement learning from human feedback (RLHF) further align these models with human preferences and values.

LLMs have transformed AI applications across industries, powering chatbots, code assistants, content generation tools, and research aids. Active areas of research include improving reasoning capabilities, reducing hallucinations, extending context windows, enabling tool use, and developing more efficient architectures that deliver strong performance at smaller scales.

Last updated: February 20, 2026