Kernel Regression
Overview
The kernel method is one way to take a linear regression and
turn it into a model that can capture nonlinear relationships
among the selected features, by first mapping the features
(nonlinearly) into a new feature space.
The technique follows the same pattern as other kernel methods:
the regression is first recast in a form that works through an
embedded kernel function, after which the regression can be run
with whatever kernel function is chosen.
Effectively this retains the linear structure of the model,
while pushing the nonlinearity into the feature map through the kernel
function.
Ridge regression
Suppose the samples are vectors {% \vec{x} \in \mathbb{R}^n %} with corresponding targets {% y_i %},
and let {% \phi %} denote a (possibly nonlinear) feature map.
What you would seek to do is recast the regression formula so that
the values of the sample set appear only inside inner products,
which can then be evaluated through a kernel function
{% \kappa(\vec{x_i}, \vec{x_j}) = \; < \phi(\vec{x_i}), \phi(\vec{x_j}) > %}
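As a concrete example, for points in {% \mathbb{R}^2 %} the feature map {% \phi(\vec{x}) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) %} gives
{% < \phi(\vec{x}), \phi(\vec{z}) > \; = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = \; < \vec{x}, \vec{z} >^2 %}
so the kernel {% \kappa(\vec{x}, \vec{z}) = \; < \vec{x}, \vec{z} >^2 %} evaluates the inner product in the mapped space without ever computing {% \phi %} explicitly.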
Start with ordinary ridge regression, where the prediction is {% g(\vec{x}) = \; < \vec{w} , \vec{x} > %} and the cost adds a squared-norm penalty on the weight vector,
{% Cost(\vec{w}) = \lambda ||\vec{w}||^2 + \sum_{i=1}^n (y_i - g(\vec{x}_i))^2 %}
so the weights solve
{% \min_{\vec{w}} \; \lambda ||\vec{w}||^2 + \sum_{i=1}^n (y_i - g(\vec{x}_i))^2 %}
Stacking the samples as the rows of a data matrix {% X %} and collecting the targets in the vector {% \vec{y} %}, this is
{% \min_{\vec{w}} \; \lambda ||\vec{w}||^2 + || \vec{y} - X\vec{w} ||^2 %}
Differentiating the cost function with respect to the weight vector {% \vec{w} %}
and setting the result to zero gives
{% X^TX\vec{w} + \lambda \vec{w} = (X^TX + \lambda I_n) \vec{w} = X^T\vec{y} %}
Solving for {% \vec{w} %} and substituting into the prediction gives
{% g(\vec{x}) = \; <\vec{w},\vec{x}> \; = \vec{y}^TX(X^TX + \lambda I_n)^{-1} \vec{x} %}
Alternatively, the same equation can be rearranged so that the weight vector is a linear combination of the training samples,
{% \vec{w} = \lambda ^{-1} X^T(\vec{y} - X\vec{w}) = X^T\alpha %}
with the dual coefficients
{% \alpha = \lambda ^{-1} (\vec{y} - X\vec{w}) %}
Substituting {% \vec{w} = X^T\alpha %} back in,
{% \lambda \alpha = \vec{y} - XX^T \alpha %}
{% (XX^T + \lambda I_n) \alpha = \vec{y} %}
{% \alpha = (XX^T + \lambda I_n)^{-1} \vec{y} %}
where
{% (XX^T) _{ij} = \; < \vec{x}_i , \vec{x}_j > %}
which we will refer to as the Gram matrix, and represent as G.
The prediction for a new data point would be
{% g(\vec{x}) = \; < \vec{w} , \vec{x} > \; = \sum_{i=1}^n \alpha _i < \vec{x}_i , \vec{x} > \; = \vec{y}^T(G + \lambda I_n) ^{-1} \vec{k} %}
where
{% \vec{k}_i = < \vec{x}_i , \vec{x} > %}
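As a rough sketch of this dual computation in plain JavaScript (the names dot, gramMatrix, solveLinearSystem, fitDualRidge, and predict are made up here for illustration, and are not part of the regression.js module loaded in the Try it! section), one could write:
// Inner product of two sample vectors.
const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);
// Gram matrix G with entries G[i][j] = <x_i, x_j>.
function gramMatrix(X) {
  return X.map(xi => X.map(xj => dot(xi, xj)));
}
// Solve A v = b by Gaussian elimination with partial pivoting.
// Works on an augmented copy [A | b], so the inputs are left untouched.
function solveLinearSystem(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    // Pivot on the largest remaining entry in this column.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];
    // Eliminate the entries below the pivot.
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  // Back substitution.
  const v = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let c = r + 1; c < n; c++) s -= M[r][c] * v[c];
    v[r] = s / M[r][r];
  }
  return v;
}
// Dual ridge regression: alpha = (G + lambda I)^(-1) y.
function fitDualRidge(X, y, lambda) {
  const G = gramMatrix(X);
  const A = G.map((row, i) => row.map((g, j) => g + (i === j ? lambda : 0)));
  return solveLinearSystem(A, y);
}
// Prediction g(x) = sum_i alpha_i <x_i, x>.
function predict(X, alpha, x) {
  return X.reduce((s, xi, i) => s + alpha[i] * dot(xi, x), 0);
}
// Toy data: noisy samples of y = 2*x1 - x2.
const X = [[1, 0], [0, 1], [1, 1], [2, 1]];
const y = [2.1, -0.9, 1.0, 3.1];
const alpha = fitDualRidge(X, y, 0.01);
console.log(predict(X, alpha, [1, 2])); // about 0.1 (the noise-free value 2*1 - 1*2 would be 0)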
Kernel ridge regression
In the case where the samples are first passed through the feature map, so that each residual takes the form
{% | y_i - < \vec{w} , \phi(\vec{x}_i) > | %}
the same derivation carries through with {% \phi(\vec{x}_i) %} in place of {% \vec{x}_i %}.
Then the Gram matrix becomes
{% G _{ij} = \; < \phi(\vec{x}_i) , \phi(\vec{x}_j) > \; = \kappa(\vec{x}_i, \vec{x}_j) %}
and
{% \vec{k}_i = \; < \phi(\vec{x}_i) , \phi( \vec{x} ) > \; = \kappa(\vec{x}_i, \vec{x}) %}
so both fitting {% \alpha %} and predicting at a new point require only kernel evaluations, never the explicit feature map.
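The same sketch carries over to the kernel case by swapping the plain inner product for a kernel function. Below, a Gaussian (RBF) kernel is used purely as an illustration; the code assumes the solveLinearSystem helper from the ridge regression sketch above is in scope, and it is separate from the site's regression.js module loaded next:
// Gaussian (RBF) kernel: kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
function rbfKernel(x, z, sigma) {
  const sq = x.reduce((s, xi, i) => s + (xi - z[i]) ** 2, 0);
  return Math.exp(-sq / (2 * sigma * sigma));
}
// Kernel ridge regression: G_ij = kappa(x_i, x_j), alpha = (G + lambda I)^(-1) y.
// Reuses solveLinearSystem from the ridge regression sketch above.
function fitKernelRidge(X, y, lambda, kernel) {
  const G = X.map(xi => X.map(xj => kernel(xi, xj)));
  const A = G.map((row, i) => row.map((g, j) => g + (i === j ? lambda : 0)));
  return solveLinearSystem(A, y);
}
// Prediction g(x) = sum_i alpha_i kappa(x_i, x).
function predictKernel(X, alpha, x, kernel) {
  return X.reduce((s, xi, i) => s + alpha[i] * kernel(xi, x), 0);
}
// Toy data: a nonlinear target y = x^2 sampled at a few 1-D points.
const kernel = (a, b) => rbfKernel(a, b, 1.0);
const Xs = [[-2], [-1], [0], [1], [2]];
const ys = [4, 1, 0, 1, 4];
const alphas = fitKernelRidge(Xs, ys, 0.01, kernel);
console.log(predictKernel(Xs, alphas, [1.5], kernel)); // in the ballpark of the true value 1.5^2 = 2.25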
let rg = await import('/lib/machine-learning/kernel/v1.0.0/regression.js');
Try it!