>_TheQuery

Embedding

NLP

A learned dense vector representation that maps discrete entities like words or items into a continuous vector space where similar items are closer together.

An embedding is a dense, low-dimensional vector representation of a discrete object such as a word, sentence, image, or user profile. Embeddings are learned during training and capture semantic relationships: objects with similar meanings or properties end up close together in the vector space, which makes operations like distance comparison and vector arithmetic meaningful on concepts themselves.
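"Close together" is usually measured with cosine similarity. A minimal sketch, using hand-made 3-dimensional vectors purely for illustration (real embeddings typically have hundreds of dimensions, and the specific values here are invented):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: semantically related words get nearby vectors.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # high: related concepts
print(cosine_similarity(cat, car))     # low: unrelated concepts
```

Because cosine similarity ignores vector length, it compares direction only, which is why many embedding pipelines normalize vectors to unit length up front.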

Word embeddings (such as Word2Vec, GloVe, and contextual embeddings from BERT or GPT) revolutionized NLP by providing meaningful numerical representations of words. The classic example is that the vector arithmetic "king - man + woman" yields a vector close to "queen." Modern transformer-based models produce contextual embeddings where the same word receives different vectors depending on its surrounding context.
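The analogy above can be reproduced mechanically. The sketch below uses hand-crafted 2-dimensional vectors (one axis loosely "royalty", one loosely "gender") rather than real trained embeddings, so the geometry works out by construction; with actual Word2Vec vectors the same arithmetic only lands *near* "queen":

```python
import math

def nearest(query, vocab):
    """Return the vocabulary word whose vector is most cosine-similar to query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    return max(vocab, key=lambda w: cos(query, vocab[w]))

# Invented 2-d "embeddings": dimension 0 ~ royalty, dimension 1 ~ gender.
vocab = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

# Vector arithmetic: king - man + woman
query = [k - m + w for k, m, w in
         zip(vocab["king"], vocab["man"], vocab["woman"])]
print(nearest(query, vocab))  # "queen"
```

In practice, analogy evaluation also excludes the input words ("king", "man", "woman") from the candidate set; it is omitted here for brevity.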

Embeddings are fundamental to modern AI systems far beyond NLP. They are used in recommendation systems (user and item embeddings), information retrieval (document embeddings for semantic search), computer vision (image embeddings from CNNs or ViTs), and multimodal systems (joint embeddings of text and images, as in CLIP). Embedding-based similarity search powers many production applications, from search engines to retrieval-augmented generation (RAG) systems.
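At its core, embedding-based retrieval ranks stored vectors by similarity to a query vector. A brute-force sketch with invented document embeddings (a production system would compute these with an encoder model and serve them from an approximate-nearest-neighbor index such as FAISS or HNSW rather than scanning linearly):

```python
import math

def top_k(query, docs, k=2):
    """Rank document IDs by cosine similarity to the query embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    return sorted(docs, key=lambda d: cos(query, docs[d]), reverse=True)[:k]

# Toy precomputed document embeddings (IDs and values are illustrative).
docs = {
    "intro_to_nlp":    [0.9, 0.1, 0.2],
    "transformer_101": [0.8, 0.3, 0.1],
    "cooking_basics":  [0.1, 0.1, 0.9],
}

query = [0.85, 0.2, 0.15]  # pretend this came from embedding a user question
print(top_k(query, docs))  # the two NLP documents, not "cooking_basics"
```

This is exactly the retrieval step in a RAG pipeline: the top-ranked documents are fetched and passed to the language model as context.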

Last updated: February 20, 2026