The Linear Layer

We now introduce Linear layers, a neural network layer abstraction that lets us build feedforward networks quickly. For various historical and non-historical reasons, you may see other deep learning resources or libraries refer to these as dense or perceptron layers, but they all mean essentially the same thing. For our purposes, a Linear layer is one that applies a linear transformation plus a shift (together, an affine transformation) to its input. That is, for an input $X \in \mathbb{R}^{m \times n}$, the layer applies a transformation of the form

$$\text{Linear}(X) = W X + b$$

We call $W$ the weight matrix of the layer and $b$ the bias vector. If you view $W$ as applying a linear map to $X$, then $b$ lets us shift that mapping off the origin, which is key to the representational power of the affine transformation. You can refer to the Wikipedia article on affine transformations to learn more about their interesting properties, but some of the key ones are that they preserve (the first of which we'll sanity-check in code right after this list)

  1. Collinearity (points lying on the same line still lie on a common line after the transformation)
  2. Parallelism (parallel lines remain parallel after the transformation)
  3. Convexity (convex sets in the domain remain convex after the transformation is applied)
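
To see the first property in action, here's a quick numerical check with plain NumPy (the matrix, bias, and points below are arbitrary, chosen only for illustration):

import numpy as np

# A concrete 2-D affine map f(x) = Wx + b (values chosen arbitrarily).
W = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([1.0, -2.0])

def f(x):
    return W @ x + b

p = np.array([0.0, 0.0])
q = np.array([4.0, 2.0])
mid = 0.5 * (p + q)  # the midpoint of the segment from p to q

# Collinearity: the image of the midpoint is the midpoint of the images,
# so p, q, and mid remain collinear after the transformation.
print(np.allclose(f(mid), 0.5 * (f(p) + f(q))))  # True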

Now that we've introduced the Linear layer, let's work on its implementation within our flamethrower library.

The Implementation

First things first, let's recall the Module class of our nn library. It provides the basic construction for neural network "parts" that we can string together into a full model, so our Linear layer class will inherit from nn.Module. Now we can turn to the class's __init__ method. Looking back at the layer definition, all we need to do is specify the two parameters that comprise it: $W$ and $b$. Remember, $X$ is a matrix whose rows are training examples and whose columns are features, so the number of columns of $X$ is the in_size of our layer. Instead of multiplying by $W$ on the left as in the layer definition, we'll actually multiply by it on the right to make the dimensions work out. Therefore, $W$ will have in_size rows and out_size columns.
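
Before writing any class code, here's a quick shape check with plain NumPy (the sizes are made up, purely for illustration):

import numpy as np

m, in_size, out_size = 4, 3, 2          # 4 examples, 3 features, 2 outputs

X = np.random.randn(m, in_size)         # rows are examples, columns are features
W = np.random.randn(in_size, out_size)  # in_size rows, out_size columns
b = np.zeros((1, out_size))             # broadcasts across all m rows

out = X @ W + b
print(out.shape)                        # (4, 2): one out_size-dimensional output per example

With the shapes figured out, let's start the implementation of __init__.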

The __init__ Method


def __init__(self, in_size, out_size):
    super(Linear, self).__init__()
    self.in_size = in_size
    self.out_size = out_size

Remember that because we're inheriting from nn.Module, we need to use the Python built-in super() function to initialize the parent class.

Next, let's add a couple of enhancements to the __init__ implementation above. First, let's defer the actual initialization of the model parameters, W and b, to a private method _init_params(). Second, let's add a keyword argument use_bias to __init__ that lets us specify whether the layer should use a bias. The updated __init__ now looks like this.


def __init__(self, in_size, out_size, use_bias=True):
    super(Linear, self).__init__()
    self.in_size = in_size
    self.out_size = out_size
    self.use_bias = use_bias
    self._init_params()

The _init_params Method

Now, let's turn our attention to actually filling the W and b parameters with their initial values. If you've read the lesson on parameter initialization (which you should have!), you'll know there are various schemes for sampling the initial values of weight matrices. We'll implement these in a separate module, flamethrower.nn.initialize, which we'll import as init. The _init_params method will just take an optional init_fn keyword argument: a reference to the desired initialization function, which handles all of the value sampling. The default will be glorot_uniform, and it handles the initialization of W. For b, we'll simply initialize it to a vector of zeros, a common and effective choice for initial biases.
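
We'll write the real initializers when we get to that module; purely as a sketch of the idea (and assuming a NumPy backend, which isn't necessarily what the library uses internally), a Glorot uniform initializer might look roughly like this:

import numpy as np

def glorot_uniform(fan_in, fan_out):
    # Sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    # which keeps the scale of activations roughly constant from layer to layer.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))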

Another thing we want to do is wrap W and b as Tensor objects. This ensures they're tracked by autograd and can be optimized via backpropagation during neural network training.

Finally, we'll want to call self.new_param(param_name, param) for each of the parameters we initialize. This is an underlying method of the Module class and allows the parameters to be tracked as part of the module, so that they can be used by Optimizers (more on these later). Let's see what all of this looks like in code.


def _init_params(self, init_fn=None):
    # Fall back to Glorot uniform initialization if no init_fn is provided.
    if not init_fn:
        init_fn = init.glorot_uniform
    self.W = Tensor(init_fn(self.in_size, self.out_size))
    self.new_param('W', self.W)
    if self.use_bias:
        # Biases start at zero, as discussed above.
        self.b = Tensor(tl.zeros((1, self.W.shape[1])))
        self.new_param('b', self.b)

The forward Method

Now it's time for the bread and butter of the implementation. If you'll recall from the Module section, every Module must implement a forward() method, which specifies the computation performed when the module is called on an input. In our case, we just implement the equation from the Linear layer definition, with X multiplied on the right as discussed above. Remember, we can use the @ operator for matrix multiplication. The code looks as follows.


def forward(self, X):
    if self.use_bias:
        return X @ self.W + self.b
    else:
        return X @ self.W

The Entire Thing (Imports and All)


from .module import Module
from flamethrower.autograd import Tensor

import flamethrower.autograd.tensor_library as tl
import flamethrower.autograd.tensor_library.random as tlr
import flamethrower.nn.initialize as init

class Linear(Module):
    def __init__(self, in_size, out_size, use_bias=True):
        super(Linear, self).__init__()
        self.in_size = in_size
        self.out_size = out_size
        self.use_bias = use_bias
        self._init_params()

    def _init_params(self, init_fn=None):
        # Fall back to Glorot uniform initialization if no init_fn is provided.
        if not init_fn:
            init_fn = init.glorot_uniform
        self.W = Tensor(init_fn(self.in_size, self.out_size))
        self.new_param('W', self.W)
        if self.use_bias:
            # Biases start at zero, as discussed above.
            self.b = Tensor(tl.zeros((1, self.W.shape[1])))
            self.new_param('b', self.b)

    def forward(self, X):
        if self.use_bias:
            return X @ self.W + self.b
        else:
            return X @ self.W
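
To round things out, here's a quick hypothetical usage example. The batch size and layer sizes are made up, and it assumes the result of forward exposes a .shape attribute the same way W does:

import numpy as np

layer = Linear(16, 4)               # map 16 input features to 4 outputs

X = Tensor(np.random.randn(8, 16))  # a batch of 8 examples wrapped as a Tensor
out = layer.forward(X)

print(out.shape)                    # expected: (8, 4)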
