Copyright (C) 2013 Sergey Demyanov
contact: [email protected]
This library has been written as a part of my project on facial expression analysis. It contains an implementation of convolutional neural nets for Matlab, written both in Matlab and in C++. The C++ version works about 2 times faster. Both implementations work identically.
GENERAL INFORMATION
A convolutional neural net (CNN) is a type of deep learning classification algorithm that can learn useful features from raw data by itself. Learning is performed by tuning its weights. CNNs consist of several layers, usually convolutional and subsampling layers alternating with each other. A convolutional layer filters its input with a small matrix of weights and applies a non-linear function to the result. A subsampling layer does not contain weights and simply reduces the size of its input by averaging or max-pooling. The last layer is fully connected: each of its outputs is connected by weights to all outputs of the previous layer. Its output is also modified by a non-linear function. If your neural net consists only of fully connected layers, you get a classic neural net.
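The following Python sketch illustrates one convolutional layer followed by one subsampling layer on a single map. It is not the library's code, which is Matlab and C++, and the function names are hypothetical. It assumes "valid" filtering with no padding, so an R x C map and a k x k kernel give an (R-k+1) x (C-k+1) map, and it assumes non-overlapping pooling windows. The kernel is not flipped, matching the "filtering" behaviour of the "c" layer.

```python
import math

def filter2d_valid(x, k):
    """Slide kernel k over map x without flipping it ("valid" filtering, no padding)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def sigm(z):
    """Logistic sigmoid non-linearity."""
    return 1.0 / (1.0 + math.exp(-z))

def pool(x, s, mode="mean"):
    """Non-overlapping s x s pooling; both map sides must be divisible by s."""
    out = []
    for i in range(0, len(x), s):
        row = []
        for j in range(0, len(x[0]), s):
            block = [x[i + a][j + b] for a in range(s) for b in range(s)]
            row.append(max(block) if mode == "max" else sum(block) / len(block))
        out.append(row)
    return out

# Example: a 5x5 input map and a 2x2 kernel give a 4x4 map, pooled down to 2x2.
x = [[5 * r + c for c in range(5)] for r in range(5)]
k = [[1, 0], [0, 1]]
fmap = filter2d_valid(x, k)                      # 4x4 map before the non-linearity
act = [[sigm(v) for v in row] for row in fmap]   # what a "c" layer with "sigm" outputs
pooled = pool(act, 2, "max")                     # what an "s" layer with "max" outputs
print(pool(fmap, 2, "max"))                      # [[18, 22], [38, 42]]
```

Mean pooling replaces each 2x2 block by its average instead of its maximum. Neither pooling mode has weights to learn.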
The learning process consists of 2 steps, the forward and backward passes, which are repeated for all objects in the training set. On the forward pass each layer transforms the output of the previous layer according to its function. The output of the last layer is compared with the label values, and the total error is computed. On the backward pass the layers are traversed in reverse order, and each layer computes the derivatives of the error with respect to its outputs and weights. After the backward pass is finished, the weights are changed in the direction that decreases the total error. This process is performed for a batch of objects simultaneously, in order to decrease the sample bias. After all objects have been processed, the process may be repeated with different batch splits.
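The forward pass, backward pass, per-batch update and epochs can be sketched in Python for the simplest possible net: a single sigmoid output unit trained with squared error. This illustrates the procedure only and is not the library's implementation. The function names, the squared-error choice and the OR toy data are assumptions. The "shuffle" flag mimics the "shuffle" parameter: with False, batches are taken in natural order.

```python
import math
import random

def sigm(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ts, alpha=1.0, batchsize=2, numepochs=100, shuffle=True, seed=0):
    """Mini-batch gradient descent for one sigmoid unit with squared error."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    w, b = [0.0] * n_in, 0.0
    order = list(range(len(xs)))
    for _ in range(numepochs):
        if shuffle:
            rng.shuffle(order)                  # a different batch split every epoch
        for start in range(0, len(order), batchsize):
            batch = order[start:start + batchsize]
            gw, gb = [0.0] * n_in, 0.0
            for idx in batch:
                x, t = xs[idx], ts[idx]
                # forward pass
                y = sigm(sum(wi * xi for wi, xi in zip(w, x)) + b)
                # backward pass: derivative of 0.5 * (y - t)^2 w.r.t. the pre-activation
                dz = (y - t) * y * (1.0 - y)
                for i in range(n_in):
                    gw[i] += dz * x[i]
                gb += dz
            # one update per batch, against the batch-averaged derivative
            m = len(batch)
            w = [wi - alpha * g / m for wi, g in zip(w, gw)]
            b -= alpha * gb / m
    return w, b

# Toy data: logical OR of two inputs.
xs = [[0, 0], [0, 1], [1, 0], [1, 1]]
ts = [0, 1, 1, 1]
w, b = train(xs, ts, alpha=1.0, batchsize=2, numepochs=1000)
preds = [round(sigm(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x in xs]
```

With shuffle=False and a fixed initialization, two runs give identical weights, which is the property the "shuffle" parameter controls.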
DESCRIPTION
The library was written for Matlab, and its functions can be called only from Matlab scripts. It operates on 2-dimensional objects, such as images, which are stored in a 3-dimensional array whose last index is the object number. The labels must be stored in a 2-dimensional array whose first index corresponds to the class: for each object, the element is 1 for its class and 0 for all other classes.
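As a Python sketch of this data layout, nested lists stand in for Matlab arrays and indices are 0-based instead of 1-based. The helper names are hypothetical, and reading the second label index as the object number is an assumption based on the description above. The sketch stacks images with the object number as the last index and builds the 0/1 label array with the class as the first index.

```python
def stack_maps(images):
    """Stack equally sized 2-D images into one array indexed [row][col][object]."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[[img[r][c] for img in images] for c in range(cols)] for r in range(rows)]

def to_label_matrix(labels, numclasses):
    """Return a numclasses x numobjects 0/1 array: Y[c][n] == 1 iff object n has class c."""
    Y = [[0] * len(labels) for _ in range(numclasses)]
    for n, c in enumerate(labels):
        Y[c][n] = 1
    return Y

def from_label_matrix(Y):
    """Recover each object's class as the row index of its largest entry."""
    return [max(range(len(Y)), key=lambda c: Y[c][n]) for n in range(len(Y[0]))]

# Two 2x2 "images": the first has class 1, the second has class 0.
X = stack_maps([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
Y = to_label_matrix([1, 0], 2)
print(X[0][1])  # [2, 6]: pixel (0, 1) of both objects
print(Y)        # [[0, 1], [1, 0]]
```

The same row-of-largest-entry rule can turn a matrix of network outputs back into predicted classes.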
The library contains 3 main functions to call:
- [weights, trainerr] = cnntrain(layers, params, train_x, train_y, funtype, weights_in (optional)) Performs neural net training. Returns the weights of all layers as a single vector.
- [pred, err] = cnntest(layers, weights, test_x, test_y, funtype) Calculates the test error. It is based on cnnclassify, which returns only the predictions.
- [weights_in] = genweights(layers, funtype); Returns randomly generated weights for the neural net. If you need repeatable results, pass these weights to cnntrain or cnntest.
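Two ideas can be sketched in Python: keeping the weights of all layers in one flat vector, and reusing one generated vector for repeatable runs. The layer-by-layer concatenation order, the uniform initialization range and the helper names are all assumptions made for illustration. The actual layout used by cnntrain and genweights is not described here.

```python
import random

def gen_weights(sizes, seed, scale=0.1):
    """Hypothetical genweights analogue: one flat vector covering layers of the given sizes."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(sum(sizes))]

def split_weights(flat, sizes):
    """Cut the flat vector back into per-layer pieces, in layer order (assumed layout)."""
    if len(flat) != sum(sizes):
        raise ValueError("vector length does not match the layer sizes")
    out, start = [], 0
    for n in sizes:
        out.append(flat[start:start + n])
        start += n
    return out

sizes = [3, 4, 2]                # hypothetical per-layer weight counts
w1 = gen_weights(sizes, seed=7)
w2 = gen_weights(sizes, seed=7)  # same seed, same initial weights
parts = split_weights(w1, sizes)
```

Generating the vector once and passing the same vector to every run removes the initialization as a source of randomness. This is why the stored vector, rather than the generator, is what gets reused.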
Parameters:
layers - the structure of the CNN. It is set up as a cell array, where each element represents an independent layer. Layers can be one of 4 types:
- i - input layer. It must be the first layer, and only the first one. Must contain the "mapsize" field, a vector of 2 integer values representing the object size. May also contain the "outputmaps" field, which specifies the number of independent data sources. In this case the input data must be a cell array with "outputmaps" cells.
- c - convolutional layer. Must contain the "kernelsize" field, which defines the filter size; it must not be greater than the size of the maps on the previous layer. Must also contain the "outputmaps" field, which is the number of maps for each object on this layer. If the previous layer has "m" maps and the current one has "n" maps, the total number of filters on it is m * n. Although it is called convolutional, it performs filtering, that is, a convolution with the kernel flipped in both dimensions.
- s - scaling layer. Reduces the map size by pooling. Must contain the "scale" field, which is also a vector of 2 integer values.
- f - fully connected layer. Must contain the "length" field, which defines the number of its outputs. Fully connected layers must come after all other layers, and for the last layer the length must coincide with the number of classes. May also contain the "dropout" field, which determines the probability of dropping the input elements. It should not be too large, otherwise almost all inputs are dropped.
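Map sizes can be traced through the layers to check a configuration before training. The Python sketch below represents each layer as a dict whose keys mirror the field names above. Several details are assumptions rather than taken from the library: the "type" key, the 2-element "kernelsize", the "valid" size rule (size - kernelsize + 1) for "c" layers, and the requirement that "s" layers divide the map size exactly. The sketch also counts the m * n filters of each "c" layer.

```python
def describe(layers):
    """Return (type, rows, cols, maps) after each layer and the total number of filters."""
    rows = cols = maps = filters = 0
    seen_f = False
    trace = []
    for layer in layers:
        kind = layer["type"]
        if (kind == "i") != (not trace):
            raise ValueError('"i" must be the first layer, and only the first')
        if seen_f and kind != "f":
            raise ValueError('"f" layers must follow all other layers')
        if kind == "i":
            rows, cols = layer["mapsize"]
            maps = layer.get("outputmaps", 1)
        elif kind == "c":
            kr, kc = layer["kernelsize"]
            if kr > rows or kc > cols:
                raise ValueError("kernelsize exceeds the previous map size")
            filters += maps * layer["outputmaps"]        # m * n filters
            rows, cols = rows - kr + 1, cols - kc + 1    # assumed "valid" filtering
            maps = layer["outputmaps"]
        elif kind == "s":
            sr, sc = layer["scale"]
            if rows % sr or cols % sc:
                raise ValueError("map size is not divisible by scale")
            rows, cols = rows // sr, cols // sc
        elif kind == "f":
            seen_f = True
            # every output is connected to all rows * cols * maps previous outputs;
            # afterwards the layer is treated as "length" maps of size 1x1
            rows, cols, maps = 1, 1, layer["length"]
        else:
            raise ValueError("unknown layer type: " + kind)
        trace.append((kind, rows, cols, maps))
    return trace, filters

layers = [
    {"type": "i", "mapsize": [28, 28]},
    {"type": "c", "kernelsize": [5, 5], "outputmaps": 6},
    {"type": "s", "scale": [2, 2]},
    {"type": "c", "kernelsize": [5, 5], "outputmaps": 12},
    {"type": "s", "scale": [2, 2]},
    {"type": "f", "length": 10},
]
trace, filters = describe(layers)
print(filters)  # 78 = 1*6 + 6*12
```

For this example the maps shrink 28 -> 24 -> 12 -> 8 -> 4 before the 10 outputs of the fully connected layer.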
All layers except "i" may contain the "function" field, which defines their action as follows:
- c and f - it defines the non-linear function. It can be either "sigm" or "relu", for sigmoids and rectified linear units respectively. The default value is "sigm".
- f - it can also be "SVM", which uses the SVM error function. See this article for the details. It has been tested only for the final layer.
- s - it defines the pooling procedure, that can be either "mean" or "max". The default value is "mean".
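On the backward pass, each "function" needs a derivative, and each pooling procedure determines how the error derivative is passed back to the pooled inputs. The minimal Python sketch below uses hypothetical names and writes the derivatives in terms of the layer output y. That is a common trick, assumed here rather than taken from the library. The "SVM" option is not sketched.

```python
import math

def sigm(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def dsigm(y):
    """Derivative of sigm, written in terms of its output y = sigm(z)."""
    return y * (1.0 - y)

def drelu(y):
    """Derivative of relu in terms of its output: 1 where the unit is active, else 0."""
    return 1.0 if y > 0 else 0.0

def pool_backward(block, grad, mode="mean"):
    """Pass the derivative of one pooled output back to its s*s inputs (flattened)."""
    if mode == "max":
        k = block.index(max(block))          # only the maximal input receives it
        return [grad if i == k else 0.0 for i in range(len(block))]
    return [grad / len(block)] * len(block)  # "mean": shared equally among the inputs
```

With "max" pooling, only one input per window learns from a given output. With "mean" pooling, all inputs in the window receive an equal share.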
params - defines the learning process. It is a structure with the following fields. If some of them are absent, the default values are used.
- alpha - defines the learning rate. Default is 1. If the last layer uses "SVM", it should be about 10 times lower.
- batchsize - defines the size of batches. Default is 50.
- numepochs - the number of times the training procedure is repeated with different batch splits. Default is 1.
- momentum - defines the actual direction of weight change according to the formula m * dp + (1-m) * d, where m is momentum, dp is the previous change and d is the current derivative. Default is 0.
- adjustrate - defines how much the learning rate of each particular weight is changed. If the signs of the previous and current updates coincide, we add adjustrate to the learning rate. If not, we multiply the learning rate by (1 - adjustrate), i.e. decrease it. Default is 0.
- maxcoef - defines the maximum and minimum learning rates, that are alpha * maxcoef and alpha / maxcoef respectively. Default is 10.
- balance - boolean variable. Balances the errors according to the class frequencies. Useful for highly unbalanced datasets. Default is 0.
- shuffle - determines whether the input dataset is shuffled. If you want repeatable results, you need to set it to 0; in this case the batches are created in natural order: the first "batchsize" objects become the first batch, and so on. Otherwise, it should be 1. Default is 1.
- verbose - controls the output during learning. For 0 there is no output, for 1 it prints only the epoch number, and for 2 it prints both the epoch and batch numbers. Default is 2.
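The interplay of alpha, momentum, adjustrate and maxcoef in a single weight update can be sketched as below. This is one plausible reading of the parameter descriptions, not the library's code. It assumes a per-weight gain coefficient, so the rate of a weight is alpha * gain, clamped to [alpha / maxcoef, alpha * maxcoef]. It also assumes that the sign test compares the current derivative with the previous change, that the gain is unchanged when either of them is zero, and that the previous change is stored before scaling by the rate. All names are hypothetical.

```python
def update(w, d, dp, gain, alpha=1.0, momentum=0.0, adjustrate=0.0, maxcoef=10.0):
    """One update of a list of weights.

    d    - current derivatives of the error
    dp   - previous changes (unscaled directions, see the assumptions above)
    gain - per-weight coefficients; the rate of weight i is alpha * gain[i]
    """
    new_w, new_dp, new_gain = [], [], []
    for wi, di, dpi, gi in zip(w, d, dp, gain):
        if di * dpi > 0:            # signs coincide: increase the rate additively
            gi += adjustrate
        elif di * dpi < 0:          # signs differ: decrease it multiplicatively
            gi *= 1.0 - adjustrate
        gi = min(max(gi, 1.0 / maxcoef), maxcoef)  # rate stays in alpha/maxcoef .. alpha*maxcoef
        step = momentum * dpi + (1.0 - momentum) * di   # m * dp + (1 - m) * d
        new_w.append(wi - alpha * gi * step)            # move against the derivative
        new_dp.append(step)
        new_gain.append(gi)
    return new_w, new_dp, new_gain
```

With the defaults (momentum 0, adjustrate 0), this reduces to plain gradient descent with learning rate alpha.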
weights - the weight vector obtained from genweights or cnntrain, which is used to initialize the weights. Can be used for testing, repeating the results or continuing the training procedure.
funtype - defines the actual function that is used. Can be either "mexfun" or "matlab". "mexfun" is faster, but "matlab" makes it easier to debug and to see the intermediate results.
TECHNICAL DETAILS
- To compile the C++ version, you need the Boost library. Just modify the paths in compile.m and run it. I have tested it only on Windows, but it should work on Linux as well. You can also download the binaries from my website.
- The "cnnexamples.m" file requires the "mnist_uint8.mat" file to run. You can get it from Matlab Central File Exchange; just download it and save it in the ./data folder.
- Randomness comes not only from the initial weights and the input data order. Another source of randomness is the dropout mask. Therefore, if you want repeatable results, you need to set all dropout rates to 0.
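A dropout mask is a random 0/1 multiplier per input, so it changes from batch to batch unless the random generator is reset. The Python sketch below uses hypothetical names and is not the library's code. It shows that a zero rate keeps every input and that a fixed seed reproduces the same mask. Whether the library lets you seed its dropout generator is not stated here, which is why setting the rates to 0 is the safe option.

```python
import random

def dropout_mask(n, rate, rng):
    """Return n multipliers: 0 (dropped) with probability `rate`, otherwise 1."""
    return [0 if rng.random() < rate else 1 for _ in range(n)]

def apply_mask(inputs, mask):
    """Zero out the dropped inputs."""
    return [x * m for x, m in zip(inputs, mask)]

inputs = [0.3, -1.2, 0.8, 2.5]
kept = apply_mask(inputs, dropout_mask(4, 0.0, random.Random()))  # rate 0: nothing dropped
a = dropout_mask(1000, 0.5, random.Random(42))
b = dropout_mask(1000, 0.5, random.Random(42))                    # same seed, same mask
print(kept)    # [0.3, -1.2, 0.8, 2.5]
print(a == b)  # True
```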
SOME COMMENTS
- The library was developed for Matlab, but it probably works in Octave as well. If the Matlab "imdilate" function does not work, you can use the mex function "maxscale" instead: just uncomment it in the corresponding block and, if necessary, compile it with 'mex maxscale'.
- In order to achieve compatibility with mex, there are some unnecessary transpose operations in the Matlab code. If you do not need this compatibility, you can remove them.
ACKNOWLEDGEMENTS
- The original Matlab code and the "mnist_uint8.mat" workspace were created by Rasmus Berg Palm and can be found in his DeepLearnToolbox. The Matlab version largely keeps the same structure.
- The C++ version was inspired by Yichuan Tang and his solution for the Kaggle Facial Expression Recognition Challenge. The structure of the C++ code originated from there.