Universal Approximation Theorem
Overview
The universal approximation theorem shows that, in general, a neural network with only two layers
is enough to approximate any reasonable function. In practice, however, adding further layers can let a
network reach the same accuracy with fewer neurons and less computation.
Theorem
Let {% K \subset \mathbb{R}^n %} be closed and bounded (compact) and let
{% f:K \rightarrow \mathbb{R}^m %} be continuous with {% f_i(x) \geq 0 %} for all
{% x \in K %} and each {% i %}. Then for every {% \epsilon > 0 %} there exists an
artificial neural network {% h:\mathbb{R}^n \rightarrow \mathbb{R}^m %} with two layers
using the ReLU activation function such that
{% \sup_{x \in K} \| h(x) - f(x) \| < \epsilon. %}
See Berlyand.
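To illustrate the constructive flavor of the theorem in the one-dimensional case ({% n = m = 1 %}), a two-layer ReLU network can realize the piecewise-linear interpolant of a continuous nonnegative function on a compact interval, and the sup-norm error shrinks as the number of hidden units grows. The Python sketch below is a hypothetical example rather than the construction from the source; the helper build_relu_approximator and the target {% |\sin(x)| %} on {% K = [0, 2\pi] %} are assumptions chosen for illustration.

```python
import numpy as np

# Sketch (assumed example): a one-hidden-layer ReLU network
#   h(x) = f(t_0) + sum_i c_i * ReLU(x - t_i)
# reproduces the piecewise-linear interpolant of f at the knots t_i,
# so sup_K |h - f| can be made arbitrarily small for continuous f.

def relu(z):
    return np.maximum(z, 0.0)

def build_relu_approximator(f, a, b, num_knots):
    """Return h(x), a two-layer ReLU network interpolating f at uniform knots on [a, b]."""
    t = np.linspace(a, b, num_knots)           # knot locations (hidden-layer biases)
    y = f(t)
    slopes = np.diff(y) / np.diff(t)           # slope of each linear piece
    # Output-layer weights: first slope, then the successive slope changes.
    c = np.concatenate(([slopes[0]], np.diff(slopes)))

    def h(x):
        x = np.asarray(x, dtype=float)
        # Hidden layer: ReLU(x - t_i); output layer: weighted sum plus bias f(a).
        hidden = relu(x[..., None] - t[:-1])
        return y[0] + hidden @ c

    return h

# Example target: f(x) = |sin(x)|, which is nonnegative as the theorem assumes.
f = lambda x: np.abs(np.sin(x))
h = build_relu_approximator(f, 0.0, 2 * np.pi, num_knots=200)

xs = np.linspace(0.0, 2 * np.pi, 10_000)
print("sup-norm error:", np.max(np.abs(h(xs) - f(xs))))
```

Increasing num_knots drives the printed sup-norm error below any prescribed {% \epsilon %}, mirroring the statement of the theorem for this simple one-dimensional case.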