Attention Mechanism
Deep Learning
A technique that allows neural networks to focus on relevant parts of the input when producing each element of the output.
An attention mechanism is a component in neural networks that computes a weighted combination of input representations, where the weights indicate the relevance of each input element to the current processing step. This allows the model to dynamically focus on the most pertinent information rather than relying on a fixed-size hidden state.
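The core idea of a weighted combination can be sketched in a few lines. The vectors and relevance weights below are made-up toy values, not drawn from any real model:

```python
# Three input vectors of dimension 2 and hypothetical relevance weights
# (in a real model, the weights would be computed from the data).
inputs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights = [0.7, 0.2, 0.1]  # relevance weights, summing to 1

# Output is the weighted sum of the input vectors.
output = [sum(w * v[i] for w, v in zip(weights, inputs)) for i in range(2)]
print(output)  # approximately [0.9, 0.4]
```

Because the weights depend on the current processing step, the same inputs can be combined differently at each step, which is what lets the model "focus".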
In the context of transformers, self-attention (also called scaled dot-product attention) computes queries, keys, and values from the input. The attention score between any two positions is the dot product of their query and key vectors, scaled by the square root of the key dimension and normalized with a softmax function. Multi-head attention extends this by running several attention operations in parallel, allowing the model to attend to information from different representation subspaces.
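The computation above can be sketched with NumPy. This is a minimal single-head illustration with arbitrary toy dimensions, not a production implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    d_k = Q.shape[-1]
    # Dot-product scores between every pair of positions,
    # scaled by the square root of the key dimension.
    scores = Q @ K.T / np.sqrt(d_k)          # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights               # weighted combination of values

# Toy example: sequence of 4 positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)                    # (4, 8)
print(np.allclose(w.sum(axis=1), 1.0))  # True
```

Multi-head attention would run several such computations in parallel on learned projections of the input and concatenate the results; that wrapper is omitted here for brevity.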
Attention mechanisms were originally developed for sequence-to-sequence models in machine translation but have since been adopted across virtually all areas of deep learning. They are the key innovation that makes transformers so effective and have been adapted for use in computer vision (vision transformers), speech processing, and protein structure prediction.
Last updated: February 20, 2026