This is a C++ implementation of a basic, general-purpose feedforward neural network with a freely configurable number of layers and a freely configurable number of neurons in each layer.
There are many ready-to-use NN frameworks for C++ and especially Python which, to some degree, allow you to avoid getting your feet wet with the nuts and bolts of the underlying principles. But I wanted to learn the fundamentals and create my own working NN from scratch, without any external dependencies.
The main program's purpose is to train a NN to recognize handwritten digits as provided by the MNIST dataset. Since the MNIST dataset contains annotated 28x28-pixel pictures of handwritten digits, the input layer of the program's neural network has 28x28=784 neurons. And since there are 10 different digits to be detected, the output layer has 10 neurons. Each neuron's output indicates the (non-)detection of a specific digit - the neuron with the highest output value "wins the vote", so to speak.
In the hidden layer, there are 100 neurons. This number has been chosen by trial and error. Altogether, the network achieves a detection rate of about 95%.
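The "wins the vote" rule can be sketched in a few lines. This is only an illustration: the function name `detectedDigit` is made up here and is not taken from the program's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): returns the index of the
// output neuron with the highest output value, i.e. the detected digit.
std::size_t detectedDigit( const std::vector<double>& outputs )
{
   assert( !outputs.empty() );
   std::size_t best = 0;
   for( std::size_t i = 1; i < outputs.size(); i++ )
   {
      if( outputs[i] > outputs[best] )
      {
         best = i;
      }
   }
   return best;
}
```

If several neurons share the highest value, this sketch picks the first one.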
I gained the knowledge about neural networks mainly from D. Kriesel's A Brief Introduction to Neural Networks.
Clone with:
git clone https://github.com/chn-dev/NeuralNetwork
and then compile with:
cd NeuralNetwork
cmake . -B build
cd build
make
Assuming you have downloaded the MNIST training and test datasets (mnist_train.csv and mnist_test.csv respectively), you invoke the main program like this:
./NeuralNetwork /path/to/mnist_train.csv /path/to/mnist_test.csv
The program reports its progress during training and testing. After testing, it prints the success rate.
In feedforward neural networks, the neurons are arranged in layers, whereby neurons of a given layer are connected to all neurons of the previous layer. There is an input layer (where all neurons have only one input), an arbitrary number of hidden layers and an output layer. Signals are fed from the input layer through the hidden layers to the output layer.
The input layer is a special case because it just passes the input signals unaltered through to the first hidden layer.
A neuron consists of:
- a specific number of inputs: $i_1 .. i_n$
- a weight for each of its inputs: $w_1 .. w_n$
- an activation function: $f_{act}()$
- an output: $o$
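In order to calculate the output of a neuron, the activation function is applied to the weighted sum of the inputs:

$$o = f_{act}( w_1 i_1 + w_2 i_2 + ... + w_n i_n ) = f_{act}\left( \sum_{k=1}^{n} w_k i_k \right)$$

The logistic function is a widely used activation function:

$$f_{act}(x) = { 1 \over { 1 + e ^ { -x } } }$$

Therefore, we calculate the neuron's output with:

$$o = { 1 \over { 1 + e ^ { -\sum_{k=1}^{n} w_k i_k } } }$$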
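As a minimal sketch, the output of a single neuron with the logistic activation function could be computed like this. The names `logistic` and `neuronOutput` are chosen for illustration and need not match the program's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The logistic activation function 1 / ( 1 + e^-x ).
double logistic( double x )
{
   return 1.0 / ( 1.0 + std::exp( -x ) );
}

// Hypothetical helper: applies the activation function to the weighted
// sum of the inputs. inputs[k] belongs to weights[k].
double neuronOutput( const std::vector<double>& inputs,
                     const std::vector<double>& weights )
{
   assert( inputs.size() == weights.size() );
   double sum = 0.0;
   for( std::size_t k = 0; k < inputs.size(); k++ )
   {
      sum += weights[k] * inputs[k];
   }
   return logistic( sum );
}
```

A weighted sum of 0 yields an output of 0.5. Large positive sums approach 1 and large negative sums approach 0.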
In order to query the neural network as a whole, we feed the components of the input vector into the neurons of the input layer and then successively feed the outputs of each layer into the inputs of the next layer by applying the formula above.
Let's consider a simple 2x3 neural network with an input layer, one hidden layer and an output layer, each with 2 neurons:
The input vector is $(i_1, i_2)$. We feed the two components $i_1$ and $i_2$ into the network:

- Since neurons $n_1$ and $n_2$ are part of the input layer, they pass their inputs unaltered through to their outputs. This means that $o_1 = i_1$ and $o_2 = i_2$.
- The outputs $o_3$ and $o_4$ of neurons $n_3$ and $n_4$ are calculated with:

  $$o_3 = { 1 \over { 1 + e ^ { -( w_1 o_1 + w_2 o_2 ) } } }$$

  $$o_4 = { 1 \over { 1 + e ^ { -( w_3 o_1 + w_4 o_2 ) } } }$$

- The outputs $o_5$ and $o_6$ of neurons $n_5$ and $n_6$ are calculated with:

  $$o_5 = { 1 \over { 1 + e ^ { -( w_5 o_3 + w_6 o_4 ) } } }$$

  $$o_6 = { 1 \over { 1 + e ^ { -( w_7 o_3 + w_8 o_4 ) } } }$$

After replacing $o_3$ and $o_4$, we obtain:

$$o_5 = { 1 \over { 1 + e ^ { -( w_5 { 1 \over { 1 + e ^ { -( w_1 i_1 + w_2 i_2 ) } } } + w_6 { 1 \over { 1 + e ^ { -( w_3 i_1 + w_4 i_2 ) } } } ) } } }$$

$$o_6 = { 1 \over { 1 + e ^ { -( w_7 { 1 \over { 1 + e ^ { -( w_1 i_1 + w_2 i_2 ) } } } + w_8 { 1 \over { 1 + e ^ { -( w_3 i_1 + w_4 i_2 ) } } } ) } } }$$
This is still a small neural network. You can see that as neural networks become bigger, the necessary calculations rapidly become more and more complex and nested. Luckily, computers can do this very quickly for us, even with millions of neurons in hundreds of layers.
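In code, the nesting disappears because each layer is simply computed from the outputs of the previous one. The following sketch assumes a made-up data layout in which a layer is a list of neurons and each neuron is a list of input weights. The alias `Layer` and the function `feedForward` are illustrative and not taken from the program. The input layer is not stored, since it only passes its inputs through.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: layer[j][k] is the weight of input k of neuron j.
using Layer = std::vector<std::vector<double>>;

// Feeds a signal through all layers and returns the output of the last layer.
std::vector<double> feedForward( const std::vector<Layer>& layers,
                                 std::vector<double> signal )
{
   for( const Layer& layer : layers )
   {
      std::vector<double> next;
      next.reserve( layer.size() );
      for( const std::vector<double>& weights : layer )
      {
         assert( weights.size() == signal.size() );
         double sum = 0.0;
         for( std::size_t k = 0; k < weights.size(); k++ )
         {
            sum += weights[k] * signal[k];
         }
         // Logistic activation function
         next.push_back( 1.0 / ( 1.0 + std::exp( -sum ) ) );
      }
      signal = next;
   }
   return signal;
}
```

For the example network, the hidden layer would be `{ { w1, w2 }, { w3, w4 } }` and the output layer `{ { w5, w6 }, { w7, w8 } }`.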
Training data is a set of input vectors of the form
and a set of expected network responses of the form
This is often called "annotation" in training datasets. For example, the MNIST dataset contains 28x28 pixel vectors representing handwritten digits plus an appropriate annotation indicating what the digit actually is, for example: "3". This is needed to evaluate the error of the network. The NeuralNetwork program translates this annotation to the expected output vector $(0, 0, 0, 1, 0, 0, 0, 0, 0, 0)$, so we expect neuron 3 (counting from 0) to detect digit 3 by outputting 1. All other neurons are expected to output 0.
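Translating an annotation into an expected output vector might look like the following sketch. The function name `expectedOutput` and its parameters are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the expected output vector for an annotated
// digit. The neuron belonging to the digit is expected to output 1,
// all other neurons are expected to output 0.
std::vector<double> expectedOutput( std::size_t digit, std::size_t numOutputs = 10 )
{
   assert( digit < numOutputs );
   std::vector<double> expected( numOutputs, 0.0 );
   expected[digit] = 1.0;
   return expected;
}
```

Calling `expectedOutput( 3 )` gives a vector of ten values in which only the element at index 3 is 1.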
The neural network responds to the input vector
A perfect neural network would always respond with
There will always be an error which can be calculated with
Each component of the error vector is actually the mean squared error (MSE), whereby
For a given training input vector, the error only depends on the input weights. What we do is to successively feed training vectors into the network, calculate the output error vector and reduce that error a little bit by adjusting the input weights of every neuron.
So let's consider a single neuron:
We can calculate the output error with:
This is the multidimensional error function, with the input weights
The gradient of the error function indicates its slope at a specific point
We call
The gradient
Note that the activation function
Having done this, we can continue with the derivative of the error function:
With
and
we obtain:
So, what we do is to feed the training value into the neuron, determine the difference
This adjustment of the input weights of a neuron can be done directly within the output layer of the network. That's because the training dataset contains expected outputs of the output layer.
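The weight adjustment for a single neuron can be sketched as follows. Note the assumptions: this is the commonly used delta rule for a logistic neuron with a squared error, the learning rate parameter is named `learningRate` here, and the function `trainNeuron` is made up for this example. It may differ in detail from what the program does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: adjusts the input weights of one logistic neuron in
// place and returns the neuron's output before the adjustment.
double trainNeuron( std::vector<double>& weights,
                    const std::vector<double>& inputs,
                    double expected,
                    double learningRate )
{
   assert( weights.size() == inputs.size() );

   // Query the neuron
   double sum = 0.0;
   for( std::size_t k = 0; k < weights.size(); k++ )
   {
      sum += weights[k] * inputs[k];
   }
   const double output = 1.0 / ( 1.0 + std::exp( -sum ) );

   // Difference between expected and actual output
   const double difference = expected - output;

   // Derivative of the logistic function at the weighted sum
   const double slope = output * ( 1.0 - output );

   // Move every weight a little bit in the direction that reduces the error
   for( std::size_t k = 0; k < weights.size(); k++ )
   {
      weights[k] += learningRate * difference * slope * inputs[k];
   }
   return output;
}
```

Calling the function repeatedly with the same training value moves the output step by step towards the expected value. A smaller learning rate gives smaller steps.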
We don't have any direct information about the errors (
Let's once more consider the example 2x3 neural network:
We feed the input vector
into the network and obtain
as the response from the network. The training dataset tells us that the expected network output is
from which we can calculate the errors of the output layer:
With this information, we can adjust the input weights
The errors of hidden layer(s), in this example
Neurons
Likewise, we can express the error
We propagate the output layer's errors back to the hidden layer. This can be done successively for all hidden layers. Let's generalize this.
Assuming that
Whereby
It turns out that it practically doesn't matter whether the denominator is there or not when training the network, so we can just as well omit it:
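In code, the simplified rule without the denominator might look like the sketch below: the error of a hidden neuron is the sum of the errors of the following layer, each weighted by the connecting weight. The layout `layer[j][k]` and the name `backpropagateErrors` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: layer[j][k] is the weight of input k of neuron j.
using Layer = std::vector<std::vector<double>>;

// Propagates the errors of a layer back to the layer in front of it.
// nextLayer holds the input weights of the layer whose errors are known.
std::vector<double> backpropagateErrors( const Layer& nextLayer,
                                         const std::vector<double>& nextErrors )
{
   assert( !nextLayer.empty() );
   assert( nextLayer.size() == nextErrors.size() );

   std::vector<double> errors( nextLayer[0].size(), 0.0 );
   for( std::size_t j = 0; j < nextLayer.size(); j++ )
   {
      assert( nextLayer[j].size() == errors.size() );
      for( std::size_t k = 0; k < errors.size(); k++ )
      {
         // Neuron k of the previous layer is connected to neuron j via weight [j][k]
         errors[k] += nextLayer[j][k] * nextErrors[j];
      }
   }
   return errors;
}
```

In the example network, with the output layer given as `{ { w5, w6 }, { w7, w8 } }`, the error of $n_3$ becomes $w_5$ times the error of $n_5$ plus $w_7$ times the error of $n_6$.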
Here's a step-by-step summary of the training process. Repeat the steps for all available training data vectors.
- Feed a training data vector into the network
- Calculate the errors $\delta$ of the output layer
- Successively backpropagate the output errors to the hidden layers, from back to front
- Adjust the input weights $w$ of all neurons
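The four steps can be put together in one function. This is a sketch under several assumptions and not the program's actual code: the layout `layer[j][k]`, the names `trainStep` and `learningRate`, and the weight update via the delta rule for logistic neurons are all chosen for this example. Errors are propagated back weighted only by the connecting weights, as described above. Textbook backpropagation would additionally multiply them by the slope of the activation function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: layer[j][k] is the weight of input k of neuron j.
using Layer = std::vector<std::vector<double>>;

// Performs one training step and returns the summed squared output error
// measured before the weights were adjusted.
double trainStep( std::vector<Layer>& layers,
                  const std::vector<double>& input,
                  const std::vector<double>& expected,
                  double learningRate )
{
   assert( !layers.empty() );

   // Step 1: feed the training vector forward, remembering every output.
   // outputs[0] is the input vector, outputs[l + 1] the output of layers[l].
   std::vector<std::vector<double>> outputs;
   outputs.push_back( input );
   for( const Layer& layer : layers )
   {
      std::vector<double> out;
      for( const std::vector<double>& w : layer )
      {
         assert( w.size() == outputs.back().size() );
         double sum = 0.0;
         for( std::size_t k = 0; k < w.size(); k++ )
         {
            sum += w[k] * outputs.back()[k];
         }
         out.push_back( 1.0 / ( 1.0 + std::exp( -sum ) ) );
      }
      outputs.push_back( out );
   }

   // Step 2: errors of the output layer (expected minus actual output).
   assert( expected.size() == outputs.back().size() );
   std::vector<double> errors( expected.size() );
   double squaredError = 0.0;
   for( std::size_t j = 0; j < expected.size(); j++ )
   {
      errors[j] = expected[j] - outputs.back()[j];
      squaredError += errors[j] * errors[j];
   }

   // Steps 3 and 4: walk from back to front. Each weight is first used to
   // propagate the error back and only then adjusted.
   for( std::size_t l = layers.size(); l-- > 0; )
   {
      const std::vector<double>& in = outputs[l];
      const std::vector<double>& out = outputs[l + 1];
      std::vector<double> previousErrors( in.size(), 0.0 );
      for( std::size_t j = 0; j < layers[l].size(); j++ )
      {
         const double slope = out[j] * ( 1.0 - out[j] );
         for( std::size_t k = 0; k < in.size(); k++ )
         {
            previousErrors[k] += layers[l][j][k] * errors[j];
            layers[l][j][k] += learningRate * errors[j] * slope * in[k];
         }
      }
      errors = previousErrors;
   }
   return squaredError;
}
```

Calling `trainStep` once per training vector, for all vectors of the training dataset, corresponds to the loop described in the summary.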
That's it. Have fun!
Copyright (c) 2023 by chn