Kernel Regression

Overview


The kernel method is one way to take a linear regression and turn it into a method that can capture a nonlinear relationship among the selected features, by first mapping the features into a new feature space in a nonlinear fashion.

The technique is similar to other kernel methods: the regression is first recast in a form where the sample values enter only through an embedded kernel function, after which the regression can be run with any kernel function substituted in.

Effectively, the technique retains the linear structure of the model while pushing the nonlinearity into the feature map, which is accessed only through the kernel function.

Ridge Regression


Consider a set of samples
{% \vec{x} \in \mathbb{R}^n %}
and a (possibly nonlinear) feature map
{% \phi %}
What you would seek to do is to recast the regression formula so that the values of the sample set only show up as part of an inner product, such as the kernel
{% \kappa(\vec{x}_i, \vec{x}_j) = \; < \phi(\vec{x}_i), \phi(\vec{x}_j) > %}
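As a concrete illustration, a kernel can be evaluated directly from the raw samples without ever constructing the feature map explicitly. The sketch below is plain JavaScript; the Gaussian kernel and its bandwidth are illustrative choices, not something fixed by the derivation here.

// Linear kernel: just the ordinary inner product <xi, xj>.
function linearKernel(xi, xj) {
  return xi.reduce((s, v, k) => s + v * xj[k], 0);
}

// Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 sigma^2)), which corresponds to an
// inner product <phi(xi), phi(xj)> in an (infinite-dimensional) feature space.
function gaussianKernel(xi, xj, sigma = 1.0) {
  let sq = 0;
  for (let k = 0; k < xi.length; k++) {
    const d = xi[k] - xj[k];
    sq += d * d;
  }
  return Math.exp(-sq / (2 * sigma * sigma));
}

console.log(linearKernel([1, 2], [3, 4]));      // 11
console.log(gaussianKernel([1, 2], [3, 4]));    // exp(-4), roughly 0.0183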
Starting from the ridge regression cost
{% Cost(\vec{w}) = \lambda ||\vec{w}||^2 + \sum_{i=1}^n (y_i - g(\vec{x}_i))^2 %}
the problem to solve is
{% \min_{\vec{w}} \; \lambda ||\vec{w}||^2 + \sum_{i=1}^n (y_i - g(\vec{x}_i))^2 %}
which, with the linear model
{% g(\vec{x}) = \; < \vec{w} , \vec{x} > %}
becomes
{% \min_{\vec{w}} \; \lambda ||\vec{w}||^2 + \sum_{i=1}^n | y_i - < \vec{w} , \vec{x}_i > |^2 %}
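This cost can be evaluated directly. A minimal sketch, assuming the samples are stored as the rows of X and g is the linear model above:

// Ridge cost: lambda * ||w||^2 + sum_i (y_i - <w, x_i>)^2
function ridgeCost(w, X, y, lambda) {
  const dot = (a, b) => a.reduce((s, v, k) => s + v * b[k], 0);
  let cost = lambda * dot(w, w);
  for (let i = 0; i < X.length; i++) {
    const r = y[i] - dot(w, X[i]);
    cost += r * r;
  }
  return cost;
}

console.log(ridgeCost([2, -1], [[1, 0], [0, 1], [1, 1]], [2, -1, 1], 0.1));  // 0.5: all residuals are zero, leaving 0.1 * ||w||^2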
If you differentiate the cost function with respect to the weight vector and set the result to zero, you get
{% X^TX\vec{w} + \lambda \vec{w} = (X^TX + \lambda I_n) \vec{w} = X^Ty %}
{% g(\vec{x}) = <\vec{w},\vec{x}> = y^TX(X^TX + \lambda I_n)^{-1} \vec{x} %}
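A sketch of this primal computation in plain JavaScript, using a small Gauss-Jordan solver for the linear system; the data and regularisation value are made up for illustration.

// Solve the linear system M z = b by Gauss-Jordan elimination with partial pivoting.
function solve(M, b) {
  const n = M.length, A = M.map((row, i) => [...row, b[i]]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(A[r][c]) > Math.abs(A[p][c])) p = r;
    [A[c], A[p]] = [A[p], A[c]];
    for (let r = 0; r < n; r++) if (r !== c) {
      const f = A[r][c] / A[c][c];
      for (let k = c; k <= n; k++) A[r][k] -= f * A[c][k];
    }
  }
  return A.map((row, i) => row[n] / A[i][i]);
}

const dot = (a, b) => a.reduce((s, v, k) => s + v * b[k], 0);

// Primal ridge regression: w = (X^T X + lambda I)^{-1} X^T y
function ridgeFitPrimal(X, y, lambda) {
  const d = X[0].length;
  const XtX = Array.from({ length: d }, (_, i) =>
    Array.from({ length: d }, (_, j) =>
      X.reduce((s, row) => s + row[i] * row[j], 0) + (i === j ? lambda : 0)));
  const Xty = Array.from({ length: d }, (_, i) => X.reduce((s, row, r) => s + row[i] * y[r], 0));
  return solve(XtX, Xty);
}

// Example: data generated by y = 2 x1 - x2
const X = [[1, 0], [0, 1], [1, 1], [2, 1]];
const y = [2, -1, 1, 3];
const w = ridgeFitPrimal(X, y, 0.01);
console.log(w, dot(w, [3, 1]));   // w close to [2, -1], prediction close to 5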
Alternatively, the stationarity condition can be rearranged so that the weight vector is expressed as a linear combination of the training samples, with a vector of dual coefficients:
{% \vec{w} = \lambda ^{-1} X^T(y - X\vec{w}) = X^T\alpha %}
{% \alpha = \lambda ^{-1} (\vec{y} - X\vec{w}) %}
Substituting this expression for the weight vector back in gives
{% \lambda \alpha = (\vec{y} - XX^T \alpha) %}
{% (XX^T + \lambda I_n) \alpha = y %}
{% \alpha = (XX^T + \lambda I_n)^{-1} y %}
where
{% (XX^T) _{ij} = \; < x_i , x_j > %}
which we will refer to as the Gram matrix and represent as G.
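Computing the Gram matrix needs nothing but inner products between the training samples; a small sketch:

// Gram matrix: G[i][j] = <x_i, x_j>, i.e. G = X X^T with the samples as rows of X.
function gramMatrix(X) {
  const dot = (a, b) => a.reduce((s, v, k) => s + v * b[k], 0);
  return X.map(xi => X.map(xj => dot(xi, xj)));
}

console.log(gramMatrix([[1, 0], [0, 1], [1, 1]]));
// [[1, 0, 1],
//  [0, 1, 1],
//  [1, 1, 2]]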

The prediction for a new datapoint would be
{% g(\vec{x}) = \; < \vec{w} , \vec{x} > \; = \sum_{i=1} ^n \alpha _i < \vec{x}_i , \vec{x} > \; = y^T(G + \lambda I_n) ^{-1} \vec{k} %}
where
{% \vec{k}_i = < \vec{x}_i , \vec{x} > %}
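A sketch of this dual computation, kept self-contained (the same Gauss-Jordan solver as above is repeated, and the data are the illustrative values used earlier): the coefficients are computed from the Gram matrix, and a new point is predicted purely from inner products with the training samples.

const dot = (a, b) => a.reduce((s, v, k) => s + v * b[k], 0);

// Gauss-Jordan elimination with partial pivoting: solve M z = b.
function solve(M, b) {
  const n = M.length, A = M.map((row, i) => [...row, b[i]]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(A[r][c]) > Math.abs(A[p][c])) p = r;
    [A[c], A[p]] = [A[p], A[c]];
    for (let r = 0; r < n; r++) if (r !== c) {
      const f = A[r][c] / A[c][c];
      for (let k = c; k <= n; k++) A[r][k] -= f * A[c][k];
    }
  }
  return A.map((row, i) => row[n] / A[i][i]);
}

// Dual ridge regression: alpha = (G + lambda I)^{-1} y with G[i][j] = <x_i, x_j>.
function ridgeFitDual(X, y, lambda) {
  const G = X.map(xi => X.map(xj => dot(xi, xj)));
  const M = G.map((row, i) => row.map((v, j) => v + (i === j ? lambda : 0)));
  return solve(M, y);
}

// Prediction: g(x) = sum_i alpha_i <x_i, x>
function predictDual(X, alpha, x) {
  return X.reduce((s, xi, i) => s + alpha[i] * dot(xi, x), 0);
}

const X = [[1, 0], [0, 1], [1, 1], [2, 1]];
const y = [2, -1, 1, 3];
const alpha = ridgeFitDual(X, y, 0.01);
console.log(predictDual(X, alpha, [3, 1]));   // close to 5, matching the primal solution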

Kernel Ridge Regression


If the same derivation is carried out on the mapped features, so that the residuals take the form
{% | y - < \vec{w} , \phi(\vec{x}) > | %}
then the Gram matrix becomes
{% G _{ij} = \; < \phi(x_i) , \phi(x_j) > %}
and
{% \vec{k}_i = < \phi(\vec{x}_i) , \phi( \vec{x} ) > %}
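Putting the pieces together, kernel ridge regression simply replaces every inner product above with a kernel evaluation. A minimal sketch with a Gaussian kernel; the kernel choice, bandwidth, regularisation value, and sine-curve data are illustrative assumptions, not dictated by the derivation.

// Gaussian kernel standing in for <phi(x_i), phi(x_j)>.
const kappa = (xi, xj, sigma = 1.0) => {
  const sq = xi.reduce((s, v, k) => s + (v - xj[k]) * (v - xj[k]), 0);
  return Math.exp(-sq / (2 * sigma * sigma));
};

// Gauss-Jordan elimination with partial pivoting: solve M z = b.
function solve(M, b) {
  const n = M.length, A = M.map((row, i) => [...row, b[i]]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(A[r][c]) > Math.abs(A[p][c])) p = r;
    [A[c], A[p]] = [A[p], A[c]];
    for (let r = 0; r < n; r++) if (r !== c) {
      const f = A[r][c] / A[c][c];
      for (let k = c; k <= n; k++) A[r][k] -= f * A[c][k];
    }
  }
  return A.map((row, i) => row[n] / A[i][i]);
}

// Fit: alpha = (G + lambda I)^{-1} y, with G[i][j] = kappa(x_i, x_j).
function kernelRidgeFit(X, y, lambda) {
  const G = X.map(xi => X.map(xj => kappa(xi, xj)));
  const M = G.map((row, i) => row.map((v, j) => v + (i === j ? lambda : 0)));
  return solve(M, y);
}

// Predict: g(x) = sum_i alpha_i kappa(x_i, x), the dual prediction with kernel evaluations.
function kernelRidgePredict(X, alpha, x) {
  return X.reduce((s, xi, i) => s + alpha[i] * kappa(xi, x), 0);
}

// Example: a nonlinear target y = sin(x) sampled at a few points.
const X = [[0], [1], [2], [3], [4], [5]];
const y = X.map(([v]) => Math.sin(v));
const alpha = kernelRidgeFit(X, y, 1e-3);
console.log(kernelRidgePredict(X, alpha, [2.5]), Math.sin(2.5));  // interpolated value vs true value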

let rg = await import('/lib/machine-learning/kernel/v1.0.0/regression.js');
					
Try it!
