Machine Learning Theory

Overview

Machine learning is a framework that allows machines to update their behaviour as they see new data. That is, the machine "learns" from its environment in order to optimize its functioning.

Classification vs Regression

Broadly speaking, machine learning problems fall into two categories, distinguished by the kind of values the learned function returns.

Classification: assigns each datapoint a label from a finite set of possible labels.
Regression: assigns each datapoint a numeric value from a continuous range of values.
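
As a minimal sketch of this distinction, the two toy functions below differ only in the kind of value they return. The function names, the threshold and the coefficients are hypothetical, chosen purely for illustration.

```python
from typing import Literal

# A finite label set (illustrative assumption).
Label = Literal["spam", "ham"]


def classify(word_count: int) -> Label:
    """Toy classifier: returns one label out of a finite set."""
    return "spam" if word_count > 100 else "ham"


def regress(floor_area: float) -> float:
    """Toy regressor: returns a value from a continuous range."""
    return 1500.0 * floor_area + 20000.0


print(classify(250))  # a label
print(regress(80.0))  # a real number
```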

Response Function

The goal of machine learning is to "learn" a function

{% f: X \to Y %}

where X is the set of possible inputs (or feature vectors), and Y is the set of possible outputs (classifications or forecasts). This function is called the response function.

As a general rule, an input x in X and an output y in Y are both taken to be column vectors.

Loss Function

Learning from a dataset requires some way to measure how close a given prediction is to the actual value. That is, for a data point with input x (the feature vector), there is an associated value y and an output f(x) of the function under consideration. If f(x) equals y, we say that there is zero error. If f(x) does not equal y, there is some error, which is typically termed a loss.

The size of the loss is defined by a loss function L, which assigns a loss to each triple (x, y, f(x)).

{% \text{loss} = L(x, y, f(x)) %}

Then, given a finite dataset of pairs (x_i, y_i), one can measure the total amount of error, or loss, as

{% \text{Empirical Loss} = \sum_i L(x_i, y_i, f(x_i)) %}
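
The sum above can be sketched in a few lines of Python. This is only an illustration: the squared-error loss, the candidate function `f` and the three data points are assumptions made for the example, not choices taken from the text.

```python
from typing import Callable, Sequence, Tuple

LossFn = Callable[[float, float, float], float]


def squared_loss(x: float, y: float, prediction: float) -> float:
    """L(x, y, f(x)) for squared error; x is unused but kept to match the triple."""
    return (y - prediction) ** 2


def empirical_loss(
    f: Callable[[float], float],
    data: Sequence[Tuple[float, float]],
    loss: LossFn = squared_loss,
) -> float:
    """Sum the loss of f over every (x_i, y_i) pair in the sample."""
    return sum(loss(x, y, f(x)) for x, y in data)


def f(x: float) -> float:
    """A hypothetical candidate response function."""
    return 2.0 * x + 1.0


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0)]
print(empirical_loss(f, data))  # 1.0: only the last point is mispredicted, by 1
```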

The empirical loss measures the loss on a given sample set. Of course, the goal is to minimize the loss on datapoints that the algorithm has not seen. From a statistical perspective, this is interpreted to mean that the goal of machine learning is to minimize the expected loss, which is termed the risk.

{% \text{Risk} = \mathbb{E}[ L(x, y, f(x)) ] %}
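
To make the expectation concrete, the sketch below computes the risk exactly for a toy problem in which the joint distribution of (x, y) is assumed to be known. In practice it is not known. The probabilities, the zero-one loss and the predictor `f` are all hypothetical.

```python
from typing import Dict, Tuple

# Assumed joint distribution P(x, y) over binary inputs and labels.
joint: Dict[Tuple[int, int], float] = {
    (0, 0): 0.5,
    (0, 1): 0.125,
    (1, 0): 0.125,
    (1, 1): 0.25,
}


def zero_one_loss(x: int, y: int, prediction: int) -> float:
    """Loss of 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if prediction == y else 1.0


def f(x: int) -> int:
    """Hypothetical predictor: guess that the label equals the input."""
    return x


def risk(joint: Dict[Tuple[int, int], float]) -> float:
    """Expected loss: each outcome's loss weighted by its probability."""
    return sum(p * zero_one_loss(x, y, f(x)) for (x, y), p in joint.items())


print(risk(joint))  # 0.25, the probability that y differs from x
```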

Loss Minimization

The goal of machine learning, as stated above, is risk minimization. However, the risk cannot be computed directly, because the full set of possible data is never available. The only data we have is the sample dataset, the one we train on.

The basic recipe of machine learning consists of the following two steps:

Constrain the set of functions that are valid candidates for the response function.
Pick the function from among the candidates that minimizes the loss on the training set.
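
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate set is restricted to lines through the origin with a slope taken from a small grid, the loss is squared error, and the training data is made up.

```python
from typing import List, Tuple

# Hypothetical training set of (x, y) pairs.
train: List[Tuple[float, float]] = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.5)]

# Step 1: constrain the candidates to f_w(x) = w * x, with w on a grid 0.0 .. 3.0.
candidate_slopes = [0.5 * k for k in range(7)]


def training_loss(w: float) -> float:
    """Summed squared error of f_w on the training set."""
    return sum((y - w * x) ** 2 for x, y in train)


# Step 2: pick the candidate with the smallest loss on the training set.
best_w = min(candidate_slopes, key=training_loss)

print(best_w, training_loss(best_w))  # 2.0 0.25
```

A grid search is used only because it makes the "pick the minimizer" step explicit. Practical methods usually search a continuous family of functions with an optimizer instead.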

For a detailed discussion, see loss minimization.

Topics