
Activation Function

Deep Learning

A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to learn complex patterns.

An activation function is a non-linear transformation applied to the weighted sum of inputs at each neuron in a neural network. Without activation functions, a neural network would be equivalent to a single linear transformation regardless of its depth, severely limiting its ability to model complex relationships.
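This collapse is easy to verify numerically: composing two linear layers (with no activation between them) gives exactly the same map as a single layer whose weight matrix is their product. A minimal sketch with NumPy, using arbitrary random weights for illustration:

```python
import numpy as np

# Without a non-linearity, stacked linear layers collapse
# into a single equivalent linear transformation.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first "layer"
W2 = rng.standard_normal((2, 4))  # second "layer"
x = rng.standard_normal(3)        # an input vector

# Two linear layers applied in sequence...
deep = W2 @ (W1 @ x)

# ...equal one layer with the combined weights W2 @ W1.
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # → True
```

Inserting any non-linear function between the two matrix multiplies breaks this equivalence, which is precisely what lets depth add expressive power.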

Common activation functions include the Rectified Linear Unit (ReLU), which outputs the input if positive and zero otherwise; the sigmoid function, which squashes values to the range (0, 1); the hyperbolic tangent (tanh), which maps values to (-1, 1); and the Gaussian Error Linear Unit (GELU), which is widely used in transformers. Each has trade-offs in terms of gradient flow, computational cost, and suitability for specific tasks.
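The four functions above can be written in a few lines of NumPy. This is an illustrative sketch, not a library implementation; the GELU here uses the common tanh-based approximation rather than the exact Gaussian CDF form:

```python
import numpy as np

def relu(x):
    # ReLU: identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes inputs into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps inputs to (-1, 1); zero-centred, unlike the sigmoid.
    return np.tanh(x)

def gelu(x):
    # GELU via the widely used tanh approximation of x * Phi(x).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x**3)))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))          # [0. 0. 2.]
print(sigmoid(0.0))     # 0.5
```

Note that GELU is smooth everywhere, whereas ReLU has a kink at zero; this smoothness is one reason GELU is favoured in transformer architectures.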

The choice of activation function significantly impacts training dynamics. ReLU and its variants (Leaky ReLU, PReLU) largely mitigated the vanishing gradient problem that plagued sigmoid and tanh activations in deep networks, because their derivative does not saturate for positive inputs; the leaky variants additionally avoid "dead" neurons whose gradient is zero everywhere. More recent activation functions like SiLU/Swish and GELU have shown improvements in certain architectures, particularly in large language models and vision transformers.
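The gradient behaviour behind this can be made concrete by comparing derivatives directly. A small sketch: the sigmoid's derivative peaks at 0.25 and shrinks toward zero for large-magnitude inputs, while ReLU passes a unit gradient for any positive input (and Leaky ReLU keeps a small slope, here the illustrative default 0.01, on the negative side):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x));
    # at most 0.25 (at x = 0), vanishing for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Unit gradient for positive inputs, zero otherwise.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # Keeps a small slope alpha on the negative side,
    # so neurons never stop receiving gradient entirely.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid_grad(x))  # vanishes at the extremes; 0.25 at x = 0
print(relu_grad(x))     # [0. 0. 1.]
```

Multiplying many sub-0.25 factors across layers during backpropagation is what drives sigmoid gradients toward zero in deep networks; ReLU's unit gradient on the positive half-line avoids that shrinkage.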

Last updated: February 20, 2026