Overview
Given a layer with {% n_{in} %} inputs and {% n_{out} %} outputs, Xavier initialization draws each weight independently from the uniform distribution {% \mathcal{U} %}:
{% W \sim \mathcal{U}(-\sqrt{\frac{6}{n_{in}+n_{out}}}, \sqrt{\frac{6}{n_{in}+n_{out}}}) %}
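The uniform rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function name `xavier_uniform`, the list-of-lists weight layout, and the layer sizes 256 and 128 are assumptions made for the example, not part of any particular framework.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng=None):
    """Return an n_in x n_out list of lists drawn from U(-limit, limit)."""
    if rng is None:
        rng = random.Random()
    # Bound of the uniform distribution: sqrt(6 / (n_in + n_out))
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_out)]
            for _ in range(n_in)]

# Hypothetical layer sizes; a seeded generator keeps the draw reproducible.
W = xavier_uniform(256, 128, rng=random.Random(0))
```

In practice a deep learning framework would provide this initializer directly. The sketch only shows how the bound follows from {% n_{in} %} and {% n_{out} %}.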
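A roughly equivalent initialization uses a normal distribution with mean zero and the same variance, since {% \mathcal{U}(-a, a) %} has variance {% a^2/3 %}, which here equals {% \frac{2}{n_{in}+n_{out}} %}: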
{% W \sim \mathcal{N}(0, \frac{2}{n_{in}+n_{out}}) %}
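As a rough check, the normal form can be sampled and its variance compared with that of the uniform form. The sketch below assumes the second argument of {% \mathcal{N} %} is the variance, so the standard deviation passed to the sampler is its square root. The name `xavier_normal` and the layer sizes are hypothetical.

```python
import math
import random
import statistics

def xavier_normal(n_in, n_out, rng=None):
    """Return an n_in x n_out list of lists drawn from N(0, 2 / (n_in + n_out))."""
    if rng is None:
        rng = random.Random()
    # random.gauss takes a standard deviation, so take the root of the variance.
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_out)]
            for _ in range(n_in)]

n_in, n_out = 256, 128  # hypothetical layer sizes
W = xavier_normal(n_in, n_out, rng=random.Random(0))
flat = [w for row in W for w in row]

target_var = 2.0 / (n_in + n_out)

# Variance of U(-a, a) is a^2 / 3; with a = sqrt(6 / (n_in + n_out))
# this matches the variance of the normal form.
uniform_limit = math.sqrt(6.0 / (n_in + n_out))
uniform_var = uniform_limit ** 2 / 3.0

sample_mean = statistics.mean(flat)
sample_var = statistics.pvariance(flat)
```

With this many samples, `sample_var` should land close to `target_var`, and `uniform_var` equals it up to floating-point rounding, which is the sense in which the two forms are roughly equivalent.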