Cybenko theorem

Formulated by Cybenko in 1989, the theorem states:

Let $$\varphi$$ be any continuous sigmoidal function, i.e., a continuous function with $$\varphi(\xi) \rightarrow 1$$ as $$\xi \rightarrow +\infty$$ and $$\varphi(\xi) \rightarrow 0$$ as $$\xi \rightarrow -\infty$$ [e.g., the logistic function $$\varphi(\xi) = 1/(1+e^{-\xi})$$]. Then, given any continuous real-valued function $$f$$ on $$[0,1]^n$$ (or any other compact subset of $$R^n$$) and any $$\epsilon > 0$$, there exist vectors $$\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_N, \boldsymbol{\alpha},$$ and $$\boldsymbol{\theta}$$ and a parameterized function $$G(\cdot,\mathbf{w},\boldsymbol{\alpha},\boldsymbol{\theta}): [0,1]^n \rightarrow R$$ such that $$|G(\mathbf{x},\mathbf{w},\boldsymbol{\alpha},\boldsymbol{\theta}) - f(\mathbf{x})| < \epsilon$$ for all $$\mathbf{x} \in [0,1]^n$$, where $$G(\mathbf{x},\mathbf{w},\boldsymbol{\alpha},\boldsymbol{\theta}) = \sum_{j=1}^N\alpha_j\varphi(\mathbf{w}_j^T\mathbf{x} + \theta_j)$$ and $$\mathbf{w}_j \in R^n$$, $$\alpha_j, \theta_j \in R$$, $$\mathbf{w} = (\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_N)$$, $$\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_N)$$, and $$\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_N)$$.
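
For concreteness, $$G$$ is exactly what a single-hidden-layer network computes, and it can be written directly in a few lines of NumPy. The sketch below is illustrative only; the function names, array shapes, and parameter values are arbitrary choices, not part of the theorem.

```python
import numpy as np

def sigmoid(xi):
    # Logistic sigmoid, one admissible choice of the sigmoidal phi.
    return 1.0 / (1.0 + np.exp(-xi))

def G(x, w, alpha, theta):
    # G(x; w, alpha, theta) = sum_{j=1}^{N} alpha_j * phi(w_j^T x + theta_j)
    # Shapes: x (n,), w (N, n), alpha (N,), theta (N,).
    return alpha @ sigmoid(w @ x + theta)

# Example with n = 2 inputs and N = 3 hidden units (arbitrary values).
x = np.array([0.2, 0.7])
w = np.array([[1.0, -2.0], [0.5, 0.5], [-3.0, 1.0]])
alpha = np.array([0.4, -1.1, 2.0])
theta = np.array([0.0, 0.3, -0.5])
print(G(x, w, alpha, theta))
```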

This theorem says that a single-hidden-layer, feed-forward neural network is capable of approximating any continuous multivariate function to any desired degree of accuracy, and that failure to approximate a function arises from poor choices of $$\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_N, \boldsymbol{\alpha},$$ and $$\boldsymbol{\theta}$$ or from an insufficient number of hidden neurons.
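
Note that the statement is existential: it guarantees that good parameters exist but says nothing about how to find them. As a minimal numerical sketch of the idea (assuming the logistic sigmoid, the target $$f(x) = \sin(2\pi x)$$ with $$n = 1$$, randomly drawn hidden-layer parameters, and a least-squares fit of $$\boldsymbol{\alpha}$$, all of which are illustrative choices rather than part of the theorem), the worst-case error typically shrinks as the number of hidden neurons $$N$$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

# Target: a continuous function on [0, 1] (n = 1 for simplicity).
def f(x):
    return np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 200)

for N in (2, 8, 32, 128):
    # Hypothetical choice of hidden parameters w_j, theta_j: the theorem
    # only asserts that *some* choice works; here they are drawn at random.
    w = rng.normal(scale=10.0, size=N)
    theta = rng.uniform(-10.0, 10.0, size=N)
    # Hidden activations: Phi[i, j] = sigmoid(w_j * x_i + theta_j).
    Phi = sigmoid(np.outer(x, w) + theta)
    # Fit the output weights alpha by least squares on the sampled grid.
    alpha, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)
    # Worst-case error on the grid, standing in for sup |G - f|.
    err = np.max(np.abs(Phi @ alpha - f(x)))
    print(f"N = {N:3d}   max |G - f| on grid = {err:.4f}")
```

Least squares minimizes the mean-squared error rather than the sup-norm appearing in the theorem, but for a smooth target like this one the printed worst-case error still tends to drop steadily as $$N$$ increases.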