The derivation of the ordinary least squares regression coefficients relies on the calculus method of
minimization: the derivative of the objective is set to zero.
The estimated coefficient vector {% \vec{b} %} is the value of {% \vec{\beta} %} that minimizes the sum of squared errors.
The residual for observation {% i %}, where {% x_i^T %} is the {% i %}-th row of {% X %}, is
{% \text{Residual}_i = y_i - x_i^T \vec{\beta} %}
The sum of the squared residuals (SSR) is
{% SSR = (\vec{y} - X \vec{\beta})^T (\vec{y} - X \vec{\beta}) %}
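The SSR formula translates directly into a few lines of array code. The following is a minimal sketch, assuming NumPy; the function name `sum_squared_residuals` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def sum_squared_residuals(X, y, beta):
    """Return SSR = (y - X beta)^T (y - X beta)."""
    residuals = y - X @ beta  # one residual per observation
    return float(residuals @ residuals)

# Hypothetical toy data: an intercept column plus one regressor.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# y = 1 + 2x fits these points exactly, so the SSR is zero.
ssr_exact = sum_squared_residuals(X, y, np.array([1.0, 2.0]))

# Shifting the intercept by 1 leaves a residual of 1 on each of the 3 points.
ssr_off = sum_squared_residuals(X, y, np.array([0.0, 2.0]))
```

Any coefficient vector other than the minimizer gives a larger SSR, which is what the rest of the derivation exploits.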
We seek the value of {% \beta %} that minimizes the SSR. Expanding the product gives
{% SSR = y^T y - \beta^T X^T y - y^T X \beta + \beta^T X^T X \beta %}
Noting the following two identities, the second of which holds for a symmetric matrix {% A %} (here {% A = X^T X %})
{% \frac{d (a^T \beta)}{d\beta} = a %}
{% \frac{d (\beta^T A \beta)}{d\beta} = 2A\beta %}
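The two derivative identities can be sanity-checked numerically with central finite differences. This is only a sketch, assuming NumPy; the helper `numerical_gradient` and the random test values are hypothetical. The quadratic-form identity is checked with a symmetric matrix, since a matrix of the form {% X^T X %} is what plays that role in the SSR.

```python
import numpy as np

def numerical_gradient(f, beta, eps=1e-6):
    """Approximate the gradient of scalar function f at beta by central differences."""
    grad = np.zeros_like(beta)
    for i in range(beta.size):
        step = np.zeros_like(beta)
        step[i] = eps
        grad[i] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
a = rng.normal(size=3)
M = rng.normal(size=(5, 3))
A = M.T @ M                 # symmetric by construction, like X^T X
beta = rng.normal(size=3)

# d(a^T beta)/d(beta) should be close to a
grad_linear = numerical_gradient(lambda b: a @ b, beta)

# d(beta^T A beta)/d(beta) should be close to 2 A beta for symmetric A
grad_quadratic = numerical_gradient(lambda b: b @ A @ b, beta)
```

Comparing `grad_linear` with `a` and `grad_quadratic` with `2 * A @ beta` (for example with `np.allclose`) confirms both identities up to floating-point error.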
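Since {% y^T X \beta %} is a scalar, it equals its transpose {% \beta^T X^T y %}, so the derivative of the SSR is
{% \frac{d \, SSR}{d\beta} = -2 X^T y + 2 X^T X \beta %}
Setting this to zero gives the normal equations {% X^T X b = X^T y %}. When {% X^T X %} is invertible, we get
{% b = (X^T X)^{-1} X^T \vec{y} %}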
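The closed-form result can be tried on simulated data. The sketch below assumes NumPy; the function name `ols_coefficients`, the simulated data, and the "true" coefficients are hypothetical. It solves the linear system {% X^T X b = X^T y %} with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not part of the derivation itself, and it compares the result with NumPy's own least squares routine.

```python
import numpy as np

def ols_coefficients(X, y):
    """Solve X^T X b = X^T y for b, assuming X^T X is invertible."""
    # Solving the linear system avoids computing (X^T X)^{-1} explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical simulated data: intercept plus two regressors, small noise.
rng = np.random.default_rng(42)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

b = ols_coefficients(X, y)

# Cross-check against NumPy's least squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum the residuals are orthogonal to the columns of X.
gradient_at_b = X.T @ (y - X @ b)
```

Here `b` and `b_lstsq` should agree to floating-point precision, and `gradient_at_b` should be numerically zero, which is the first-order condition used in the derivation.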