# Artificial Neural Networks in NetLogo

Last modified: December 26, 2018


Continuing with AI algorithms implemented in NetLogo, in this post we will see how to build a simple model to experiment with Artificial Neural Networks (ANN). We will restrict ourselves to the most common and classical one, the Multilayer Perceptron... my apologies to those of you waiting for something about Deep Neural Networks (DNN, such as those used in AlphaGo from Google, or CaptionBot from Microsoft). Maybe in a later post I will try to extract the main features of some convolutional neural network and test it on a very simple NetLogo model, but I am afraid that we would need too many computational resources to obtain anything of interest with this tool... Who knows? I will keep thinking.

The Multilayer Perceptron is one of the main ANN structures in use today... indeed, it is the foundation of the new DNN methods and still appears in some of their stages. Although it has some shortcomings (for example, the learning process becomes hard to control when several hidden layers are needed to model more complex problems, because the Gradient Descent Method used for tuning the weights behaves badly when composed across many layers), it can solve a lot of interesting problems and offers a very visual way to understand the fundamentals of this area of Machine Learning.

As you can find a lot of good resources out there about the basics of ANN (better than I could write here), we will focus this post only on the implementation in NetLogo of a more or less flexible multilayer perceptron, hoping that the use of agents and links in the model helps the reader to understand the central ideas of how it works.

The model we will build here is a slight variant of the one that comes with the official NetLogo distribution. As a first approach, we will use it to compute boolean functions,

$f:\{0,1\}^n \to \{0,1\}^k$

that is, the neurons of the input layer will take only boolean values ($0$/$1$) and the result will be a boolean string. All these restrictions can easily be adapted to fit other requirements, and then the main learning process does not have to change, only the setup. Indeed, our networks will return a value in the interval $[0,1]$, which we can transform into a boolean value (for example, with an appropriate step function). The NetLogo model comes with two example boolean functions:

• Majority (it returns $1$ if the input has more $1$'s than $0$'s), and
• Even (it returns $1$ if there is an even number of $1$'s).

For example, for the input $(1,0,1)$ both functions return $1$: there are more $1$'s than $0$'s, and the number of $1$'s (two) is even.

Again, this affects only auxiliary procedures, not the main ones for the learning process, and it is easy to add more functions, or adapt the code to work with different kinds of functions.

We will train the network to compute a function from a sample-set of pairs $(input,\ output)$ containing the correct value that the network must return (computed with the function we are trying to learn), and, as usual in this learning model:

1. Start from random values of the weights.
2. Repeat a number of times (epochs):
   1. Take every sample from the samples dataset.
   2. Propagate: compute the value of the network for the sample.
   3. Compute the error between the expected value and the obtained one.
   4. Back-Propagate: adjust the weights in some way to reduce this error.

With this in mind, let's see the data structures to be used in the implementation.

## NetLogo Implementation

The global variables needed to operate the model are (together with those defined by the interface controls):

```
globals [
  data-list    ; List of pairs [Input Output] to train the network
  inputs       ; List with the binary inputs in the training
  outputs      ; List with the binary outputs in the training
  epoch-error  ; Error in every epoch during training
]
```


We will define several breeds of agents in order to ease the model:

• one for every type of layer neuron (input, hidden and output); in this way it is simpler to provide different behaviours for them, and
• one for the bias neuron. We use a single bias neuron because the information every other neuron needs from it is stored in the weight of the link connecting them.

All of them will have properties for storing the activation value of the neuron (output) and the backpropagated gradient that must act on the links reaching it. Also, we add a weight property for the links:

```
breed [bias-neurons bias-neuron]
breed [input-neurons input-neuron]
breed [output-neurons output-neuron]
breed [hidden-neurons hidden-neuron]

turtles-own [activation grad]  ; Output value and backpropagated gradient of every neuron
links-own   [weight]           ; Weight of every connection
```

After deciding the number of neurons in every layer (through the interface controls), the setup procedure will prepare the geometry of the network (note the different shapes of the neurons in each layer), initialize the global variables for the learning process, and create a sample-set of pairs according to the boolean function we want to learn:

```
to setup
  clear-all
  ; Building the Front Panel
  ask patches [ set pcolor 39 ]
  ask patches with [pxcor > -2] [ set pcolor 38 ]
  ask patches with [pxcor > 4]  [ set pcolor 37 ]
  ask patch -6 10 [ set plabel-color 32 set plabel "Input" ]
  ask patch  3 10 [ set plabel-color 32 set plabel "Hidden Layer" ]
  ask patch  8 10 [ set plabel-color 32 set plabel "Output" ]
  ; Building the network
  setup-neurons
  ; Recolor neurons and links
  recolor
  ; Initializing global variables
  set epoch-error 0
  set data-list []
  set inputs []
  set outputs []
  ; Create samples
  create-samples
  ; Reset timer
  reset-ticks
end
```
```
; Auxiliary procedure to set up the neurons
to setup-neurons
  set-default-shape input-neurons  "square"
  set-default-shape output-neurons "neuron-node"
  set-default-shape hidden-neurons "hidden-neuron"
  set-default-shape bias-neurons   "bias-node"

  ; Create Input neurons
  repeat Neurons-Input-Layer [
    create-input-neurons 1 [
      set size min (list (10 / Neurons-Input-Layer) 1)
      setxy -6 (-19 / Neurons-Input-Layer * (who - (Neurons-Input-Layer / 2) + 0.5))
      set activation random-float 0.1 ]]

  ; Create Hidden neurons
  repeat Neurons-Hidden-Layer [
    create-hidden-neurons 1 [
      set size min (list (10 / Neurons-Hidden-Layer) 1)
      setxy 2 (-19 / Neurons-Hidden-Layer * (who - Neurons-Input-Layer
                                                 - (Neurons-Hidden-Layer / 2) + 0.5))
      set activation random-float 0.1 ]]

  ; Create Output neurons
  repeat Neurons-Output-Layer [
    create-output-neurons 1 [
      set size min (list (10 / Neurons-Output-Layer) 1)
      setxy 7 (-19 / Neurons-Output-Layer * (who - Neurons-Input-Layer
                                                 - Neurons-Hidden-Layer
                                                 - (Neurons-Output-Layer / 2) + 0.5))
      set activation random-float 0.1 ]]

  ; Create the Bias neuron
  create-bias-neurons 1 [ setxy -1.5 9 ]
  ask bias-neurons [ set activation 1 ]

  ; Create the connections between layers
  connect input-neurons  hidden-neurons
  connect hidden-neurons output-neurons
  connect bias-neurons   hidden-neurons
  connect bias-neurons   output-neurons
end
```

```
; Auxiliary procedure to totally connect two groups of neurons
to connect [neurons1 neurons2]
  ask neurons1 [
    create-links-to neurons2 [ set weight random-float 0.2 - 0.1 ]
  ]
end
```


The procedure that creates the samples modifies some global variables:

• inputs, to store the list of inputs of every sample,
• outputs, to store the list of outputs of every sample, in the same order, and
• data-list, to store the pairs [input output] of every sample.

```
to create-samples
  set inputs (n-values num-samples [ (n-values Neurons-input-layer [one-of [0 1]]) ])
  set outputs map [ x -> (list evaluate Function x) ] inputs
  set data-list (map [ [x y] -> (list x y) ] inputs outputs)
end
```


In order to show the dynamics of the network while it is learning, we have a procedure, recolor, that recolors the neurons and the links (and adjusts the thickness of the latter) to show their values:

• neurons: $0$-white, $1$-yellow. It uses the step function to discretize the activation value, and
• links: blue for positive weights, red for negative ones, with intensity and thickness proportional to the absolute value of the weight.

```
to recolor
  ask turtles [
    set color item (step activation) [white yellow]
  ]
  let MaxP max [abs weight] of links
  ask links [
    set thickness 0.05 * abs weight
    ifelse weight > 0
      [ set color lput (255 * abs weight / MaxP) [0 0 255] ]
      [ set color lput (255 * abs weight / MaxP) [255 0 0] ]
  ]
end
```

```
; Step Function
to-report step [x]
  ifelse x > 0.5
    [ report 1 ]
    [ report 0 ]
end
```

The propagation procedure is very simple when working with agents. Every neuron computes its activation value by applying the sigmoid function to the weighted sum of the activations of the neurons feeding it:

```
; Forward Propagation of the signal along the network
to Forward-Propagation
  ask hidden-neurons [ set activation compute-activation ]
  ask output-neurons [ set activation compute-activation ]
  recolor
end

to-report compute-activation
  report sigmoid (sum [ [activation] of end1 * weight ] of my-in-links)
end

; Sigmoid Function
to-report sigmoid [x]
  report 1 / (1 + e ^ (- x))
end
```


With all the previous procedures we have a functional model that uses an ANN to compute functions. Now we will provide the backpropagation procedure that allows the network to approximate functions from a sample-set of correct values. The calculations you can find in the next procedure are the standard ones that derive from the Gradient Descent Method for minimizing the output error by varying the weights of the links.

Of course, the error is computed only from the differences between the output of the output neurons and the correct values we know they must provide (it is a supervised learning algorithm):
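In formulas, the updates implemented below are the standard Gradient Descent rules for sigmoid units. An output neuron with activation $a$ and expected value $y$ gets the gradient

$grad = a\,(1 - a)\,(y - a)$

a hidden neuron receives its gradient backpropagated from the neurons $k$ it feeds,

$grad = a\,(1 - a) \sum_{k} w_{k}\, grad_k$

and every link from neuron $i$ to neuron $j$ changes its weight by

$w_{ij} \leftarrow w_{ij} + \eta \cdot grad_j \cdot a_i$

where $\eta$ is the Learning-rate set in the interface.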

```
to Back-propagation
  let error-sample 0
  ; Compute error and gradient of every output neuron
  (foreach (sort output-neurons) outputs [
    [n y] ->
    ask n [ set grad activation * (1 - activation) * (y - activation) ]
    set error-sample error-sample + ((y - [activation] of n) ^ 2) ])
  ; Accumulate the average error of the output neurons in this epoch
  set epoch-error epoch-error + (error-sample / count output-neurons)
  ; Compute gradient of hidden layer neurons
  ask hidden-neurons [
    set grad activation * (1 - activation) * sum [weight * [grad] of end2] of my-out-links ]
  ; Update the weights of all the links
  ask links [
    set weight weight + Learning-rate * [grad] of end2 * [activation] of end1 ]
  set epoch-error epoch-error / 2
end
```


If we had more hidden layers (in this model we have only one), we would need to go back from the output layer to the input layer, layer by layer, computing the grad value of all the neurons of every layer. After that, we could update the weights of all the links. At this point we can see one of the problems of this method when working with a large number of layers (deep networks): the farther a layer is from the output layer, the smaller the effect of the gradient on its connection weights, and then the model can't learn the correct weights in the layers distant from the output one (where the correct error to be reduced can be computed).

With this individual learning procedure we can now code the training procedure that will learn from all the samples of the set. It goes through all the samples only once (one epoch, in ANN terminology), so it can be called from a Forever button, for example (in order to randomize the epoch, and to prevent the network from memorizing the order of the dataset, we shuffle the set in every epoch).

During the training, the procedure will plot the average error reached in the current epoch, so we can evaluate whether the network is learning correctly. Observe that this will depend on the complexity of the function, as well as on the structure of the network (maybe it is too simple for the function) and on the sample dataset generated for the training process.

```
to train
  set epoch-error 0
  ; For every training sample
  foreach (shuffle data-list) [
    d ->
    ; Take the input and correct output
    set inputs first d
    set outputs last d
    (foreach (sort input-neurons) inputs [
      [n x] -> ask n [ set activation x ] ])
    ; Forward Propagation of the signal
    Forward-Propagation
    ; Back Propagation from the output error
    Back-propagation ]
  plotxy ticks epoch-error ; Plot the error
  tick
end
```


After the training, you can test whether the network has approximated the function by trying random inputs (usually, with some reserved test data) and comparing with the correct value that you can compute with the real function (it is an advantage of using toy models instead of real ones, where usually we have no idea about the "real" function). The next procedure may help you with the test: it will create a random input, and will return a pair [correct-output network-output] (where network-output is obtained from the continuous activations of the output layer):

```
to-report test
  let inp n-values Neurons-input-layer [one-of [0 1]]
  let out (list evaluate Function inp)
  set inputs inp
  active-inputs
  Forward-Propagation
  report (list out ([activation] of output-neurons))
end
```

```
; Activate the input neurons with the given inputs
to active-inputs
  (foreach (sort input-neurons) inputs [
    [n x] -> ask n [ set activation x ] ])
  recolor
end
```


If you want, you can prepare a procedure to repeat some tests and calculate the average error of the set:

```
to-report multi-test [n]
  let er 0
  repeat n [
    let t test
    ; Compare the first component of the correct and computed outputs
    set er er + ((first first t) - (first last t)) ^ 2 ]
  report er / n
end
```


Finally, we provide the functions you can find in the NetLogo model to make some experiments:

```
to-report evaluate [f x]
  report runresult (word f x)
end

to-report Majority [x]
  let ones  length filter [ x -> x = 1 ] x
  let zeros length filter [ x -> x = 0 ] x
  report ifelse-value (ones > zeros) [1] [0]
end

to-report Even [x]
  let ones length filter [ x -> x = 1 ] x
  report ifelse-value (ones mod 2 = 0) [1] [0]
end
```


If you want to add any other function, simply add the code as a reporter and add the name of the function to the Chooser control (Function) in the interface.
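As a sketch of such an addition (the reporter name Xor2 is hypothetical, not part of the original model), a function returning the exclusive-or of the first two inputs could look like this:

```
; Hypothetical example: XOR of the first two input neurons
; (remember to add "Xor2" to the Function chooser in the interface)
to-report Xor2 [x]
  report ifelse-value ((item 0 x) != (item 1 x)) [1] [0]
end
```

XOR is a classical test case because, unlike Majority, it is not linearly separable, so the network really needs the hidden layer to learn it.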

In this link you can play with a web version of the model (NetLogo Web doesn't allow runresult on strings; for that reason, the model running on the web is slightly different in the treatment of the evaluation functions):

Note that it is running on NetLogo Web, so it is much slower than the desktop version, which you can find here.

You can play around with all the parameters and try to discover how they affect the quality of the solution (remember to repeat the experiments, since the training sample-set is randomly created and can affect the accuracy). It may also be of interest to try changing the number of neurons in the hidden layer.

## To know more...

Wikipedia Neural Networks

Michael Nielsen's Neural Networks

The Nature of Code (Ch. 10)

ANN for Beginners

NetLogo Book