Training a neural network
A neural network (NN) can solve problems that are not linearly separable, such as the XOR logic. An NN is organized in several layers of interconnected neurons. There are many possible network configurations, but in our application we will use a fully connected multilayer perceptron, whose dimensions depend on the problem to be solved. Solving the XOR logic requires 3 layers: an input, a hidden and an output layer. The input layer is composed of as many neurons as there are inputs; for the XOR logic we have two inputs, each taking the value 0 or 1. As another example, an NN used in image processing would have as many input neurons as there are pixels in the image. The number of neurons in the hidden layer is given by a sizing rule derived from Kolmogorov's theorem:
hidden = 2n + 1, where n is the number of inputs
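As a minimal sketch (the function name is my own, not from the original), the sizing rule can be expressed directly in code:

```python
def hidden_size(n_inputs):
    # Sizing rule derived from Kolmogorov's theorem: 2n + 1 hidden neurons
    return 2 * n_inputs + 1

# XOR has 2 inputs, so the hidden layer gets 2*2 + 1 = 5 neurons
print(hidden_size(2))
```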
Referring to the diagram below, each neuron of the hidden and output layers computes a weighted sum of its inputs plus a weighted bias. This sum is then fed into an activation function, which shapes the output. In our example we use the sigmoid function, which maps any input to an output in the range ]0 ; 1[.
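A single neuron's computation can be sketched as follows (the helper names and the example weights are illustrative, not taken from the original network):

```python
import math

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into the open interval ]0, 1[
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, fed into the activation
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(s)

# Example with arbitrary weights: sum is 0*0.5 + 1*(-0.3) + 0.1 = -0.2
print(neuron_output([0, 1], [0.5, -0.3], 0.1))
```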
This type of system is trained to solve a specific task with the help of examples for which the input and the desired output are known. In the case of the XOR logic, the output must be 1 if and only if the two inputs differ from each other. For animal and object image recognition, labeled image banks are therefore used to train the NNs. During the training phase the network configuration is tested and the parameter values are corrected according to the performance of the network.
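The XOR truth table can serve directly as the labeled training set, with a simple error measure to score a configuration (a sketch; the names and the mean-squared-error choice are my own assumptions):

```python
# XOR truth table used as the labeled training set: (inputs, desired output)
XOR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def mean_squared_error(predict):
    # Average squared difference between the network's output and the target
    return sum((predict(x) - t) ** 2 for x, t in XOR_DATA) / len(XOR_DATA)

# An untrained constant "network" answering 0.5 everywhere scores 0.25
print(mean_squared_error(lambda x: 0.5))
```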
Thus we find an analogy with the brain's neurons: when recognizing a cat, for example, attributes such as whiskers are associated with the activation of a group of neurons, which simulates associative memory. On the other hand, an NN can be trained to recognize handwritten digits with a very acceptable success rate (95%), but if a cat image is submitted to it, this NN will be certain that the image represents, for example, a 4. An NN will only perform the task for which it has been trained, and will only output responses related to that task.
Finally, our network will have the following structure:
Each colored line represents the transfer of a weighted value. The output results from the following calculation (refer to the figure above for the colors):
We can therefore see that with our architecture, two inputs eventually produce a single output, and that all the weights and biases of the network influence this output.
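For our 2-5-1 architecture (2 inputs, 5 hidden neurons from the 2n + 1 rule, 1 output), the total number of adjustable parameters can be counted as follows (the function name is illustrative):

```python
def parameter_count(n_in, n_hidden, n_out):
    # Weights: input->hidden and hidden->output connections
    weights = n_in * n_hidden + n_hidden * n_out
    # Biases: one per hidden and per output neuron
    biases = n_hidden + n_out
    return weights + biases

# 2-5-1 network: 10 + 5 weights and 5 + 1 biases, i.e. 21 parameters
print(parameter_count(2, 5, 1))
```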
The classical learning method for an NN is error backpropagation: we submit a known dataset to the network, and by comparing the output produced to the expected output, we compute the error committed by the system. From this we can deduce which weights and biases are most responsible for the error and correct them little by little. We can picture the configuration of the NN as a point evolving in a 21-dimensional space (one dimension per parameter) that is slowly steered towards the optimal configuration.
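A minimal backpropagation loop for the 2-5-1 XOR network might look like this. This is a sketch under my own assumptions (random initialization, squared error, learning rate 0.5), not the original implementation:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

random.seed(0)
# 2-5-1 network: input->hidden weights, hidden biases, hidden->output weights, output bias
w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
b_h = [random.uniform(-1, 1) for _ in range(5)]
w_ho = [random.uniform(-1, 1) for _ in range(5)]
b_o = random.uniform(-1, 1)

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w_ih, b_h)]
    o = sigmoid(sum(w * hj for w, hj in zip(w_ho, h)) + b_o)
    return h, o

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA)

err_before = total_error()
lr = 0.5  # assumed learning rate
for epoch in range(5000):
    for x, t in DATA:
        h, o = forward(x)
        # Output error term: derivative of squared error times sigmoid'
        delta_o = (o - t) * o * (1 - o)
        # Propagate the error back through the hidden->output weights
        delta_h = [delta_o * w_ho[j] * h[j] * (1 - h[j]) for j in range(5)]
        w_ho = [w_ho[j] - lr * delta_o * h[j] for j in range(5)]
        b_o -= lr * delta_o
        for j in range(5):
            for i in range(2):
                w_ih[j][i] -= lr * delta_h[j] * x[i]
            b_h[j] -= lr * delta_h[j]
err_after = total_error()
print(err_before, err_after)
```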
Now we will experiment with a different learning method: particle swarm optimization (PSO). As a reminder, PSO is an algorithm which, using a swarm of particles, approximates the extremum of a function. Here our function is the error committed by the NN. We assign to each particle of a swarm of 20 a set of 21 random parameter values, then submit the NN to a known dataset, with the NN taking the coordinates of each particle in turn as its parameters. The resulting outputs are used to assess the efficiency of each configuration, and thus to compute an error to be minimized. This way, the swarm of particles gradually moves through a 21-dimensional solution space to approximate the optimal configuration for our problem.
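The idea can be sketched as follows: 20 particles, each a 21-coordinate configuration of the 2-5-1 network, minimizing the XOR error. The inertia and attraction coefficients (0.7 and 1.5) and the parameter layout are my own assumptions, not the values used in the original experiment:

```python
import math
import random

def sigmoid(x):
    if x < -60.0:  # guard against math.exp overflow for very negative inputs
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))

def network_error(p):
    # Assumed layout of the 21 parameters of the 2-5-1 network:
    # p[0:10] input->hidden weights, p[10:15] hidden biases,
    # p[15:20] hidden->output weights, p[20] output bias
    err = 0.0
    for (x0, x1), t in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
        h = [sigmoid(p[2 * j] * x0 + p[2 * j + 1] * x1 + p[10 + j])
             for j in range(5)]
        o = sigmoid(sum(p[15 + j] * h[j] for j in range(5)) + p[20])
        err += (o - t) ** 2
    return err

random.seed(1)
DIM, N = 21, 20  # 21 parameters per particle, 20 particles
pos = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N)]
vel = [[0.0] * DIM for _ in range(N)]
pbest = [p[:] for p in pos]  # each particle's best position so far
pbest_err = [network_error(p) for p in pos]
g = min(range(N), key=lambda k: pbest_err[k])
gbest, gbest_err = pbest[g][:], pbest_err[g]
err_start = gbest_err

for it in range(200):
    for k in range(N):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            # Inertia + pull towards personal best + pull towards global best
            vel[k][d] = (0.7 * vel[k][d]
                         + 1.5 * r1 * (pbest[k][d] - pos[k][d])
                         + 1.5 * r2 * (gbest[d] - pos[k][d]))
            pos[k][d] += vel[k][d]
        e = network_error(pos[k])
        if e < pbest_err[k]:
            pbest[k], pbest_err[k] = pos[k][:], e
            if e < gbest_err:
                gbest, gbest_err = pos[k][:], e

print(err_start, gbest_err)
```

Because the global best is only replaced by strictly better configurations, the final error can never exceed the best initial random configuration; in practice the swarm drives it far lower.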
PSO’s use in training a neural network
A few explanations:
On the image below, the cube on the right is used to visualize the progress of the NN's learning: the outputs of the network are projected on the edges of the cube. We can see, for example, that for the input pair [0 , 0] the NN outputs a value close to 0 (in blue), in accordance with the XOR logic. Conversely, the pair [0 , 1] outputs a value close to 1 (in red), as two inputs that differ from each other must output a 1 in XOR logic.
To achieve these results, the weights and biases of the network evolved with the PSO algorithm, until acceptable values were reached at the 66th iteration.