Feature Extraction and Kernel Regression

Overview


Given a dataset with columns {% X_1, X_2, ..., X_m %}, the standard OLS regression can be stated as
{% y = \alpha + \beta_1 X_1 + \cdots + \beta_m X_m %}
The {% X_i %} can represent any features of the given record, including features computed from the others. For example, given a single feature {% X %}, one can derive additional features from it and run a regression as follows.
{% y = \alpha + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 %}
Here the additional features are just powers of the first feature.
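As a sketch of this idea, the polynomial regression above can be fit with ordinary least squares by building a design matrix whose columns are the powers of {% X %}. The data below is synthetic and purely illustrative.

```python
import numpy as np

# Hypothetical data: one feature X and a cubic response with noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=50)
y = 1.0 + 0.5 * X - 2.0 * X**2 + 0.3 * X**3 + rng.normal(0, 0.1, size=50)

# Design matrix with columns [1, X, X^2, X^3]: the intercept plus
# the derived power features described in the text.
A = np.column_stack([np.ones_like(X), X, X**2, X**3])

# Solve the least-squares problem for (alpha, beta_1, beta_2, beta_3).
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
```

`coeffs` then holds the fitted intercept and the three slope coefficients in order.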

Kernel Regression


The linear regression method can be extended using kernel methods to include an arbitrary number of features, and in fact can accommodate an infinite number of features. The resulting method is referred to as kernel regression.
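One common concrete instance is kernel ridge regression with a Gaussian (RBF) kernel, whose implicit feature map is infinite-dimensional. The following sketch, using synthetic data and hypothetical parameter values, fits such a model by solving the regularized linear system in the kernel matrix.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel: k(a, b) = exp(-gamma * ||a - b||^2).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Hypothetical training data: one input column, sinusoidal target.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0])

# Kernel ridge regression: solve (K + lambda * I) alpha = y,
# where K is the kernel matrix over the training points.
K = rbf_kernel(X, X)
lam = 1e-3  # illustrative regularization strength
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predict at new points as a kernel-weighted sum over training data.
X_new = np.array([[0.0], [1.5]])
y_pred = rbf_kernel(X_new, X) @ alpha
```

The model never forms the infinite feature vectors explicitly; all computation goes through the kernel matrix, which is the essence of the kernel trick.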