>_TheQuery

Universal Approximation Theorem

Deep Learning

A theorem proving that a feedforward neural network with a single hidden layer and a non-polynomial activation function can approximate any continuous function on a compact domain to arbitrary precision, given enough neurons.

The Universal Approximation Theorem (Cybenko 1989; Hornik et al. 1989) states that a feedforward network with one hidden layer containing finitely many neurons can approximate any continuous function on a compact domain to any desired accuracy, provided the activation function is non-polynomial (Cybenko's original proof covered sigmoidal activations; Leshno et al. 1993 established the general non-polynomial condition, which includes ReLU). Each hidden neuron computes a ridge function σ(w·x + b), constant along directions orthogonal to its weight vector; by summing many such ridges with different orientations, positions, and output weights, any continuous surface can be approximated.
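The one-dimensional case can be made concrete with a small constructive sketch: with ReLU activations, a single hidden layer can exactly interpolate any target at n + 1 grid points, and the maximum error shrinks as the width grows. The function names, target, and grid below are illustrative, not from any particular library.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pwl_approximator(f, n_neurons, lo=0.0, hi=1.0):
    """Build a one-hidden-layer ReLU network that interpolates f at
    n_neurons + 1 evenly spaced knots, piecewise linearly."""
    knots = np.linspace(lo, hi, n_neurons + 1)
    vals = f(knots)
    slopes = np.diff(vals) / np.diff(knots)
    # Output weights: the first neuron sets the initial slope; each
    # subsequent neuron adds the change in slope at its knot.
    c = np.concatenate(([slopes[0]], np.diff(slopes)))
    bias = vals[0]  # network output at x = lo

    def net(x):
        x = np.asarray(x, dtype=float)
        h = relu(x[..., None] - knots[:-1])  # hidden-layer activations
        return bias + h @ c

    return net

def target(x):
    return np.sin(2 * np.pi * x)

grid = np.linspace(0.0, 1.0, 2001)
for width in (8, 32, 128):
    net = pwl_approximator(target, width)
    err = np.max(np.abs(net(grid) - target(grid)))
    print(f"width={width:4d}  max |error| = {err:.5f}")
```

Note that the weights here are written down directly rather than learned, which is exactly the spirit of the theorem: it asserts that such weights exist, not that training will find them.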

However, the theorem is often misinterpreted as saying that "neural networks can learn anything." It only guarantees that an approximation exists; it does not guarantee that gradient descent will find it, that a reasonable number of neurons will suffice (the required width can grow exponentially with the input dimension), or that the approximation will generalize from finite training data. It is a statement about representational capacity, not about learnability.

The practical significance is that neural networks are not fundamentally limited in what functions they can represent. The real challenges lie in optimization (finding good weights), generalization (performing well on unseen data), and efficiency (using a reasonable number of parameters). Deep networks are preferred over wide shallow ones because they can represent compositional functions exponentially more efficiently, exploiting the hierarchical structure present in real-world data.
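The depth-efficiency point can be illustrated with the well-known triangle-map construction (Telgarsky 2016): composing a two-ReLU "hat" function k times produces a sawtooth with 2^k linear pieces using only 2k neurons in total, whereas a one-hidden-layer ReLU network with n neurons can produce at most n + 1 linear pieces and so would need at least 2^k − 1 neurons. A small NumPy sketch, counting linear pieces numerically on a fine grid (names are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hat(x):
    # Triangle map on [0, 1]: rises 0 -> 1 on [0, 0.5], falls back to 0
    # on [0.5, 1]. Exactly two ReLU units.
    return 2 * relu(x) - 4 * relu(x - 0.5)

xs = np.linspace(0.0, 1.0, 100001)
pieces_by_depth = {}
y = xs
for depth in range(1, 6):
    y = hat(y)  # compose the hat map once more
    # Count linear pieces by counting slope changes on the fine grid.
    slopes = np.diff(y) / np.diff(xs)
    pieces = 1 + int(np.sum(np.abs(np.diff(slopes)) > 1e-6))
    pieces_by_depth[depth] = pieces
    print(f"depth {depth}: {2 * depth} ReLUs total, {pieces} linear pieces")
```

The piece count doubles with each added layer while the neuron count grows only linearly, which is the kind of exponential separation between deep and shallow networks the paragraph above refers to.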

Last updated: February 22, 2026