Overview
The affine function represents a standard (non-activation) layer in a neural network. It is given by the following equation:
{% \vec{y} = \textbf{W} \vec{x} + \vec{b} %}
where {% \textbf{W} %} is the weight matrix and {% \vec{b} %} is a vector called the bias.
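As a quick worked example of {% \vec{y} = \textbf{W} \vec{x} + \vec{b} %}, here is the computation done with plain nested arrays, storing vectors as n-by-1 column matrices (the same convention the code below uses):

```javascript
// Worked example of y = W x + b with a 2x3 weight matrix.
// Vectors are stored as n-by-1 column matrices.
const W = [[1, 2, 3],
           [4, 5, 6]];
const x = [[1], [0], [-1]];
const b = [[10], [20]];

// y_i = sum_j W[i][j] * x[j] + b[i]
const y = W.map((row, i) =>
    [row.reduce((sum, w, j) => sum + w * x[j][0], 0) + b[i][0]]
);
console.log(y); // [[8], [18]]
```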
Implementation
The following code is a basic implementation of the affine function, using the linear algebra library `la`.
function affine(matrix, bias){
    return {
        type: 'affine',
        matrix: matrix,
        bias: bias,
        // evaluate the function on the inputs
        evaluate: function(input){
            let result = la.multiply(this.matrix, input);
            result = la.add(result, this.bias);
            return result;
        },
    };
}
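The `la` helpers that `evaluate` relies on are not shown here. Assuming vectors are n-by-1 column matrices, minimal stand-ins could look like the following sketch:

```javascript
// Minimal stand-ins for the two `la` helpers that `evaluate` uses.
// Assumption: vectors are n-by-1 column matrices.
const la = {
    // matrix product of an m-by-n matrix and an n-by-p matrix
    multiply: (A, B) => A.map(row =>
        B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0))
    ),
    // elementwise sum of two same-shaped matrices
    add: (A, B) => A.map((row, i) => row.map((a, j) => a + B[i][j])),
};

function affine(matrix, bias){
    return {
        type: 'affine',
        matrix: matrix,
        bias: bias,
        evaluate: function(input){
            let result = la.multiply(this.matrix, input);
            result = la.add(result, this.bias);
            return result;
        },
    };
}

const layer = affine([[1, 2], [3, 4]], [[1], [1]]);
console.log(layer.evaluate([[1], [1]])); // [[4], [8]]
```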
Input Gradient
The gradient with respect to the input {% \vec{x} %} is the transpose of the weight matrix:
{% \frac{d \textbf{W} \vec{x}}{d \vec{x}} = \textbf{W}^T %}
inputGradient: function(input){
    return la.transpose(this.matrix);
},
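Under this convention, entry (j, k) of the returned matrix is {% \partial y_k / \partial x_j %}. A standalone finite-difference check of that (flat arrays for brevity; `matVec` is a local helper, not part of `la`):

```javascript
// Finite-difference check that d(Wx)/dx matches W^T under the
// convention that entry (j, k) of the gradient is dy_k / dx_j.
const W = [[1, 2], [3, 4], [5, 6]];   // 3x2: y has 3 entries, x has 2
const matVec = (A, v) => A.map(row => row.reduce((s, a, j) => s + a * v[j], 0));

const x = [0.5, -1.0];
const eps = 1e-6;
const numeric = x.map((_, j) => {     // row j: perturb x_j
    const xp = x.slice(); xp[j] += eps;
    const xm = x.slice(); xm[j] -= eps;
    const yp = matVec(W, xp), ym = matVec(W, xm);
    return yp.map((v, k) => (v - ym[k]) / (2 * eps));
});
console.log(numeric); // approximately [[1, 3, 5], [2, 4, 6]], i.e. W transposed
```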
Parameter Gradient
For the parameter gradient, we vectorize the weight matrix. The linear algebra library provides a function which calculates the vectorized derivative of matrix multiplication.
parameterGradient: function(input){
    let grad1 = la.vectorGradient(this.matrix, input);
    let identity = la.identity(this.bias.length);
    for(let row of identity){
        grad1.push(row);
    }
    return grad1;
},
Note that we append the vectorized gradient of the bias onto the bottom of the vectorized weight gradient; it is given by the following:
{% \frac{d \vec{b}}{d \vec{b}} = \mathbb{I} %}
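The internals of `la.vectorGradient` are not shown. Assuming {% \mathrm{vec}(\textbf{W}) %} stacks the rows of {% \textbf{W} %}, and that the result has one row per parameter and one column per output (matching the {% \textbf{W}^T %} convention above), a function with that behavior could be sketched as:

```javascript
// Sketch of what la.vectorGradient(matrix, input) plausibly computes,
// under the assumption that vec(W) stacks the rows of W.
// d y_k / d W_ij = x_j when k == i, and 0 otherwise,
// so the result is the Kronecker product I (x) x.
function vectorGradient(matrix, input){
    const m = matrix.length, n = matrix[0].length;
    const grad = [];
    for (let i = 0; i < m; i++) {       // row block for W's row i
        for (let j = 0; j < n; j++) {   // one gradient row per W[i][j]
            const row = new Array(m).fill(0);
            row[i] = input[j][0];       // only output i depends on W[i][j]
            grad.push(row);
        }
    }
    return grad;
}

console.log(vectorGradient([[1, 2], [3, 4]], [[5], [7]]));
// [[5, 0], [7, 0], [0, 5], [0, 7]]
```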
Update
Each layer that has parameters (and therefore a parameter gradient) must implement an update method that takes a parameterGradient and updates its parameters.
In the case of the affine function, the parameter gradient has been vectorized, so the weight and bias gradients need to be extracted from the vectorized form.
/*
    Adds to the target matrix
*/
function addTo(target, matrix){
    for(let i = 0; i < target.length; i++){
        for(let j = 0; j < target[0].length; j++){
            target[i][j] += matrix[i][j];
        }
    }
}
update: function(parameterGradient){
    // the bias gradient occupies the last bias.length rows of the
    // vectorized gradient, in order
    let newBias = [];
    for(let i = 0; i < this.bias.length; i++){
        newBias.push([parameterGradient[parameterGradient.length - this.bias.length + i][0]]);
    }
    let weightGradient = la.unvec(parameterGradient, this.matrix.length, this.matrix[0].length);
    // `step` is the learning rate, defined elsewhere
    addTo(this.bias, la.multiply(step, newBias));
    addTo(this.matrix, la.multiply(step, weightGradient));
},
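`la.unvec` is assumed to invert the row-stacking vectorization: it reshapes the first rows·cols entries of the vectorized gradient back into a rows-by-cols matrix, leaving the trailing bias entries behind. A sketch under that assumption:

```javascript
// Sketch of la.unvec, assuming vec(W) stacks the rows of W:
// reshape the first rows*cols entries of a column vector back into
// a rows-by-cols matrix, row by row (trailing bias entries ignored).
function unvec(vector, rows, cols){
    const out = [];
    for (let i = 0; i < rows; i++) {
        const row = [];
        for (let j = 0; j < cols; j++) {
            row.push(vector[i * cols + j][0]);
        }
        out.push(row);
    }
    return out;
}

console.log(unvec([[1], [2], [3], [4], [9]], 2, 2));
// [[1, 2], [3, 4]]  (the trailing [9] is a bias entry, left out)
```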
Clone
Each layer should also implement a clone function that creates a layer exactly like the current layer.
Full Implementation
/*
    Adds to the target matrix
*/
function addTo(target, matrix){
    for(let i = 0; i < target.length; i++){
        for(let j = 0; j < target[0].length; j++){
            target[i][j] += matrix[i][j];
        }
    }
}
function affine(matrix, bias){
    let layer = {
        type: 'affine',
        matrix: matrix,
        bias: bias,
        // evaluate the function on the inputs
        evaluate: function(input){
            let result = la.multiply(this.matrix, input);
            result = la.add(result, this.bias);
            return result;
        },
        // give the gradient with respect to the inputs
        inputGradient: function(input){
            return la.transpose(this.matrix);
        },
        // give the gradient with respect to the vectorized parameters
        parameterGradient: function(input){
            let grad1 = la.vectorGradient(this.matrix, input);
            let identity = la.identity(this.bias.length);
            for(let row of identity){
                grad1.push(row);
            }
            return grad1;
        },
        update: function(parameterGradient){
            // the bias gradient occupies the last bias.length rows of the
            // vectorized gradient, in order
            let newBias = [];
            for(let i = 0; i < this.bias.length; i++){
                newBias.push([parameterGradient[parameterGradient.length - this.bias.length + i][0]]);
            }
            let weightGradient = la.unvec(parameterGradient, this.matrix.length, this.matrix[0].length);
            // `step` is the learning rate, defined elsewhere
            addTo(this.bias, la.multiply(step, newBias));
            addTo(this.matrix, la.multiply(step, weightGradient));
        },
        clone: function(){
            // copy the parameters so that updating the clone does not
            // modify the original layer
            return affine(
                this.matrix.map(row => row.slice()),
                this.bias.map(row => row.slice())
            );
        }
    };
    return layer;
}