Word Embeddings
NLP

Dense vector representations of words in a continuous space where semantic similarity corresponds to geometric proximity.
Word embeddings are learned vector representations that map words from a high-dimensional one-hot space into a dense, low-dimensional continuous space — typically 50 to 300 dimensions. Words with similar meanings or usage patterns end up close together in this space. The classic demonstration is that the vector arithmetic 'king − man + woman ≈ queen' captures relational meaning geometrically.
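The analogy above can be sketched with cosine similarity. The toy 4-dimensional vectors below are invented for illustration (real embeddings are learned and much higher-dimensional), but they show the mechanic: subtract, add, then find the nearest word vector.

```python
import math

# Hypothetical toy embeddings, chosen only to illustrate the arithmetic.
emb = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.1, 0.8, 0.2],
    "man":   [0.1, 0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9, 0.1],
}

def cosine(a, b):
    """Cosine similarity: geometric proximity in embedding space."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman, computed component-wise.
target = [k - m + w for k, m, w in
          zip(emb["king"], emb["man"], emb["woman"])]

# The nearest vocabulary vector to the result is "queen".
nearest = max(emb, key=lambda word: cosine(emb[word], target))
print(nearest)  # queen
```

In practice, analogy queries also exclude the three input words from the nearest-neighbour search; the toy vectors here happen not to need that.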
The most influential early word embedding algorithms were Word2Vec (2013) and GloVe (2014). Word2Vec trains a shallow neural network to predict a word from its context (CBOW) or a context from a word (Skip-gram). GloVe factorizes a word co-occurrence matrix. Both produce static embeddings — one fixed vector per word — which cannot capture polysemy or context dependence.
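The two Word2Vec objectives differ in how they frame the prediction task over a context window. A minimal sketch (window size and corpus are illustrative, and this only builds the training pairs, not the network that learns from them):

```python
# Skip-gram: predict each context word from the center word,
# so each position yields several (center, context) pairs.
def skipgram_pairs(tokens, window):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# CBOW: predict the center word from the set of surrounding words,
# so each position yields one (context-list, center) example.
def cbow_pairs(tokens, window):
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

corpus = "the cat sat on the mat".split()
print(skipgram_pairs(corpus, 2)[:3])
print(cbow_pairs(corpus, 2)[0])
```

GloVe skips pair prediction entirely: it counts global co-occurrences over such windows and factorizes the resulting matrix, but both routes end with one fixed vector per word type.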
Contextual word embeddings, produced by models like ELMo, BERT, and GPT, replaced static embeddings for most tasks. These models generate different representations for the same word depending on context, resolving ambiguity that static embeddings cannot handle. Word embeddings remain foundational to understanding how meaning is encoded in neural NLP systems.
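The static-versus-contextual distinction can be shown with a deliberately tiny sketch. Real contextual models (ELMo, BERT, GPT) compute representations with deep networks; here we merely blend a word's static vector with the mean of its neighbours' vectors, which is enough to show the key property that a static table cannot have: the same word receives different vectors in different sentences. All vectors and the blending rule are illustrative assumptions.

```python
# Hypothetical 2-d static vectors for a classic ambiguous word.
static = {
    "bank":  [1.0, 0.0],
    "river": [0.0, 1.0],
    "money": [1.0, 1.0],
}

def contextual(tokens, i, alpha=0.5):
    """Blend the static vector of tokens[i] with its neighbours' mean.

    A stand-in for a contextual encoder: the output depends on the
    whole sentence, not just the word identity.
    """
    base = static[tokens[i]]
    neigh = [static[t] for j, t in enumerate(tokens) if j != i]
    mean = [sum(vals) / len(neigh) for vals in zip(*neigh)]
    return [(1 - alpha) * b + alpha * m for b, m in zip(base, mean)]

# "bank" in two sentences: a static table returns one vector for both,
# while the context-dependent encoding disambiguates them.
river_bank = contextual(["river", "bank"], 1)
money_bank = contextual(["money", "bank"], 1)
print(river_bank != money_bank)  # True
```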
Last updated: March 6, 2026