
Artificial Neural Networks in NetLogo

Last modified: April 23, 2017


As a way to continue with the AI algorithms implemented in NetLogo, in this post we will see how to build a simple model to explore Artificial Neural Networks (ANN). We will restrict ourselves to the most common and classical kind, the Multilayer Perceptron Network... my apologies to those of you who were hoping to see something here about Deep Neural Networks (DNN, such as those used in AlphaGo from Google, or CaptionBot from Microsoft). Maybe in a later post I will try to extract the main features of some convolutional neural network and test them on a very simple NetLogo model, but I am afraid we would need too many computational resources to obtain anything of interest with this tool... Who knows? I will keep thinking.

The Multilayer Perceptron is one of the main ANN structures in use today... indeed, it is the foundation of the newer DNN methods and still appears in some of their stages. Although it has some shortcomings (for example, the learning process becomes hard to control when several hidden layers are needed to model more complex problems, because the Gradient Descent Method used for tuning the weights composes badly across layers), it can solve a lot of interesting problems and offers a very visual way to understand the fundamentals of this area of Machine Learning.

Since you can find a lot of good resources out there about the basics of ANN (better than anything I could write here), this post will focus only on the implementation in NetLogo of a more or less flexible multilayer perceptron network, hoping that the use of agents and links in the model helps the reader understand the central ideas of how it works.

The model we will prepare here is a slight variant of the one that comes with the official NetLogo distribution. As a first approach, we will use it to compute boolean functions,

\[f:\{0,1\}^n \to \{0,1\}^k\]

that is, neurons from the input layer will take only boolean values (\(0\)/\(1\)) and the network will return a boolean string. All these restrictions can easily be changed to fit other requirements; the main learning process does not have to change, only the setup. Indeed, our networks will return a value in the interval \([0,1]\), which we can transform into a boolean value (for example, with an appropriate step function). The NetLogo model comes with two example boolean functions:

  • Majority (it returns \(1\) if the input has more \(1\)'s than \(0\)'s), and
  • Even (it returns \(1\) if there is an even number of \(1\)'s).

Again, this affects only the auxiliary procedures, not the main ones for the learning process, and it is easy to add more functions, or to adapt the code to work with different kinds of functions.

We will train the network to compute a function from a sample-set of pairs [input output], where each output is the correct value the network must return (computed with the function we are trying to learn). As usual in this learning model:

  1. Start from random weight values
  2. Repeat a number of times:
    1. Take every sample
    2. Propagate: Compute the value of the network for it
    3. Compute the error between the expected value and the obtained one
    4. Back-Propagate: Adjust the weights in some way to reduce this error
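
The loop above is independent of NetLogo, so before diving into agents and links it may help to see the same math as a plain Python sketch. Everything here (the 2-2-1 network size, the learning rate, the AND target function) is an illustrative choice of mine, not part of the NetLogo model; only the sigmoid activations and the Gradient Descent weight update mirror what the model does:

```python
import math
import random

random.seed(42)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Tiny 2-2-1 network as plain weight lists; the extra slot in each
# row is the weight of the bias connection (a constant input of 1).
n_in, n_hid = 2, 2
w_hid = [[random.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_out = [random.uniform(-0.1, 0.1) for _ in range(n_hid + 1)]

def forward(inp):
    # Propagate: each neuron applies sigmoid to its weighted input sum
    hid = [sigmoid(sum(w * x for w, x in zip(ws, inp + [1]))) for ws in w_hid]
    out = sigmoid(sum(w * x for w, x in zip(w_out, hid + [1])))
    return hid, out

def train_epoch(samples, rate=1.0):
    err = 0.0
    for inp, target in samples:
        hid, out = forward(inp)
        # Back-Propagate: gradient of a neuron = a * (1 - a) * (incoming error)
        g_out = out * (1 - out) * (target - out)
        g_hid = [h * (1 - h) * w_out[j] * g_out for j, h in enumerate(hid)]
        # Update every link: weight += rate * grad(end2) * activation(end1)
        for j, h in enumerate(hid + [1]):
            w_out[j] += rate * g_out * h
        for j in range(n_hid):
            for i, x in enumerate(inp + [1]):
                w_hid[j][i] += rate * g_hid[j] * x
        err += (target - out) ** 2
    return err / len(samples)

# Learn a toy boolean function f:{0,1}^2 -> {0,1} (here, AND)
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
errors = [train_epoch(samples) for _ in range(2000)]
```

After the epochs, the per-epoch error in `errors` shrinks and the thresholded outputs match the target function.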

With this in mind, let's see the data structures to be used in the implementation.

NetLogo Implementation

The global variables that we need to operate the model are (together with those from the interface controls):

globals [
    data-list    ; List of pairs [Input Output] to train the network
    inputs       ; List with the binary inputs in the training
    outputs      ; List with the binary output in the training
    epoch-error  ; error in every epoch during training
    ]

We will work with several breeds of agents:

  • one for each type of layer neuron (input, hidden and output), because we may want a different behaviour for each of them, and
  • one for the bias neuron; we use only one because the information needed by every other neuron is stored in the weight of the link connecting it to the bias.

All of them will have properties for storing the activation value of the neuron (output) and the backpropagated gradient that must act on the links reaching it. Also, we add a weight property for the links:

breed [bias-neurons bias-neuron]
bias-neurons-own [activation grad]
    
breed [input-neurons input-neuron]
input-neurons-own [activation grad]
  
breed [output-neurons output-neuron]
output-neurons-own [activation grad]
    
breed [hidden-neurons hidden-neuron]
hidden-neurons-own [activation grad]
    
links-own [weight]

After deciding the number of neurons in every layer (through the interface controls), the setup procedure will prepare the geometry of the network (note the different shapes of the neurons in different layers), initialize the global variables for the learning process, and create a sample-set of pairs according to the boolean function we want to learn:

to setup
    clear-all
    ; Building the Front Panel
    ask patches [ set pcolor 39 ]
    ask patches with [pxcor > -2] [set pcolor 38]
    ask patches with [pxcor > 4] [set pcolor 37]
    ask patch -6 10 [ set plabel-color 32 set plabel "Input"]
    ask patch  3 10 [ set plabel-color 32 set plabel "Hidden Layer"]
    ask patch  8 10 [ set plabel-color 32 set plabel "Output"]
    ; Building the network
    setup-neurons
    setup-links
    ; Recolor of neurons and links
    recolor
    ; Initializing global variables
    set epoch-error 0
    set data-list []
    set inputs []
    set outputs []
    ; Create samples
    create-samples
    ; Reset timer
    reset-ticks
end

; Auxiliary Procedure to setup neurons
to setup-neurons
    set-default-shape input-neurons "square"
    set-default-shape output-neurons "neuron-node"
    set-default-shape hidden-neurons "hidden-neuron"
    set-default-shape bias-neurons "bias-node"
    ; Create Input neurons
    repeat Neurons-Input-Layer [
      create-input-neurons 1 [
        set size min (list (10 / Neurons-Input-Layer) 1)
        setxy -6 (-19 / Neurons-Input-Layer * (who - (Neurons-Input-Layer / 2) + 0.5))
        set activation random-float 0.1 ]]
    ; Create Hidden neurons
    repeat Neurons-Hidden-Layer [
      create-hidden-neurons 1 [
        set size min (list (10 / Neurons-Hidden-Layer) 1)
        setxy 2 (-19 / Neurons-Hidden-Layer * (who - Neurons-Input-Layer
                                                   - (Neurons-Hidden-Layer / 2) + 0.5))
        set activation random-float 0.1 ]]
    ; Create Output neurons
    repeat Neurons-Output-Layer [
      create-output-neurons 1 [
        set size min (list (10 / Neurons-Output-Layer) 1)
        setxy 7 (-19 / Neurons-Output-Layer * (who - Neurons-Input-Layer
                                                   - Neurons-Hidden-Layer
                                                   - (Neurons-Output-Layer / 2) + 0.5))
        set activation random-float 0.1 ]]
    ; Create Bias Neuron
    create-bias-neurons 1 [ setxy -1.5 9 ]
    ask bias-neurons [ set activation 1 ]
end

; Auxiliary Procedure to create connections between neurons
to setup-links
    connect input-neurons hidden-neurons
    connect hidden-neurons output-neurons
    connect bias-neurons hidden-neurons
    connect bias-neurons output-neurons
end

; Auxiliary procedure to totally connect two groups of neurons
to connect [neurons1 neurons2]
    ask neurons1 [
      create-links-to neurons2 [ set weight random-float 0.2 - 0.1 ] ]
end

The procedure to create samples modifies some global variables:

  • inputs, to store the list of inputs of every sample,
  • outputs, to store the list of outputs of every sample, in the same order, and
  • data-list, to store the pairs of [input output] of every sample.
to create-samples
    repeat num-samples [
      let inp n-values Neurons-input-layer [one-of [0 1]]
      let out (list evaluate Function inp)
      set inputs lput inp inputs
      set outputs lput out outputs
      set data-list lput (list inp out) data-list  ]
end

In order to show the dynamics of the network while it is learning, we have a procedure, recolor, that adequately recolors the neurons and links (also adjusting their thickness) to show their values:

  • neurons: 0-white, 1-yellow. It uses the step function to discretize the value,
  • links: negative-red, positive-blue, value-thickness.
to recolor
    ask turtles [
      set color item (step activation) [white yellow]
      ]
    let MaxP max [abs weight] of links
    ask links [
      set thickness 0.05 * abs weight
      ifelse weight > 0
        [ set color lput (255 * abs weight / MaxP) [0 0 255]]
        [ set color lput (255 * abs weight / MaxP) [255 0 0]]  ]
end
    
; Step Function
to-report step [x]
    ifelse x > 0.5
      [ report 1 ]
      [ report 0 ]
end

The Propagation procedure is really easy when working with agents. Every neuron computes its activation value by applying the sigmoid function to the weighted sum of activations of the neurons feeding it:

; Forward Propagation of the signal along the network
to Forward-Propagation
    ask hidden-neurons [ set activation compute-activation ]
    ask output-neurons [ set activation compute-activation ]
    recolor
end
    
to-report compute-activation
    report sigmoid (sum [ [activation] of end1 * weight] of my-in-links)
end
    
; Sigmoid Function
to-report sigmoid [x]
    report 1 / (1 + e ^ (- x))
end
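
The sigmoid is the key to the gradient computations that come next: the factor `activation * (1 - activation)` appearing in the backpropagation code is exactly its derivative, since \(\sigma'(x) = \sigma(x)(1-\sigma(x))\). A quick numeric check of that identity in Python (a sanity check of the math, not part of the NetLogo model):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)):
# compare the analytic formula against a centered finite difference.
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    a = sigmoid(x)
    analytic = a * (1 - a)
    numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6
    assert abs(analytic - numeric) < 1e-6
```

This is why the code can compute the gradient of a neuron from its stored activation alone, without ever re-evaluating the weighted input sum.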

With all the previous procedures we have a functional model that uses an ANN to compute functions. Now we will provide the backpropagation procedures that allow the network to approximate functions from a sample-set of correct values. The calculations you can find in the next procedure are the standard ones that derive from the Gradient Descent Method for minimizing the output error by varying the weights of the connections.

Take into account that, if we had more hidden layers (in this case we have only one), we would need to go back from the output layer to the input layer, computing layer by layer the grad value of all the neurons in it. After that, we could update the weights of all the links. At this point we can see one of the problems of this method when working with deep networks: the farther a layer is from the output layer, the smaller the effect of the gradient on its connection weights, and then the model can't learn the correct weights in layers distant from the output one (where the correct error to be reduced can be computed).

Of course, the error is computed only from the differences between the output of output-neurons and the correct values we know they must provide (it is a supervised learning algorithm):

to Back-propagation
    let error-sample 0
    ; Compute error and gradient of every output neurons
    (foreach (sort output-neurons) outputs [
      ask ?1 [ set grad activation * (1 - activation) * (?2 - activation) ]
      set error-sample error-sample + ( (?2 - [activation] of ?1) ^ 2 )])
      ; Average error of the output neurons in this epoch
    set epoch-error epoch-error + (error-sample / count output-neurons)
    ; Compute gradient of hidden layer neurons
    ask hidden-neurons [
      set grad activation * (1 - activation) * sum [weight * [grad] of end2] of my-out-links ]
    ; Update link weights
    ask links [
      set weight weight + Learning-rate * [grad] of end2 * [activation] of end1 ]
    set epoch-error epoch-error / 2
end
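
The vanishing-gradient problem mentioned above can be seen numerically: each extra sigmoid layer multiplies the backpropagated gradient by another `activation * (1 - activation)` factor, and that factor is at most \(0.25\) (its peak, reached at \(x=0\)). A short Python illustration of the bound, independent of the NetLogo code:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def dsigmoid(x):
    a = sigmoid(x)
    return a * (1 - a)

# sigmoid'(x) peaks at x = 0 with value 0.25, so the gradient reaching
# a layer k sigmoid layers away from the output carries a product of k
# such factors, each <= 0.25.
peak = dsigmoid(0)
for k in range(1, 6):
    print(k, peak ** k)  # upper bound on the gradient factor after k layers
```

With saturated neurons (activations near 0 or 1) the factors are far smaller than 0.25, which is why deep stacks of sigmoid layers learn so slowly with plain Gradient Descent.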

With this individual learning procedure we can now code the training procedure that will learn from all the samples in the set. It runs through all the samples only once (one epoch, in ANN terminology), so it can be called from a Forever button, for example.

During the training, the procedure will plot the error reached in the current epoch, so we can evaluate whether the network is learning correctly. Observe that this will depend on the complexity of the function, as well as on the structure of the network (maybe it is too simple for the function) and the sample-set generated for the training process.

to train
    set epoch-error 0
    ; For every training sample
    foreach data-list [
      ; Take the input and correct output
      set inputs first ?
      set outputs last ?
      ; Load input on input-neurons
      (foreach (sort input-neurons) inputs [
        ask ?1 [set activation ?2] ])
      ; Forward Propagation of the signal
      Forward-Propagation
      ; Back Propagation from the output error
      Back-propagation ]
    plotxy ticks epoch-error ; Plot the error
    tick
end

After the training you can test whether the network has approximated the function by trying random inputs and comparing with the correct value that you can compute with the real function (it is an advantage of using toy models instead of real ones, where usually we have no idea about the "real" function). The next procedure may help you with the test: it will create a random input, and will return a pair [correct-output network-output] (network-output will be the continuous activation of the output-layer):

to-report test
    let inp n-values Neurons-input-layer [one-of [0 1]]
    let out (list evaluate Function inp)
    set inputs inp
    active-inputs
    Forward-Propagation
    report (list out [activation] of output-neurons)
end
    
; Activate input neurons with read inputs
to active-inputs
    (foreach inputs (sort input-neurons ) [
       ask ?2 [set activation ?1]])
    recolor
end

If you want, you can prepare a procedure to repeat some tests and calculate the average error of the set:

to-report multi-test [n]
    let er 0
    repeat n [
      let t test
      set er er + ((first first t) - (first last t)) ^ 2]
    report er / n
end

Finally, we provide the test functions you can find in the NetLogo Model:

to-report evaluate [f x]
    report runresult (word f x)
end
    
to-report Majority [x]
    let ones length filter [ ? = 1] x
    let ceros length filter [ ? = 0] x
    report ifelse-value (ones > ceros) [1] [0]
end
    
to-report Even [x]
    let ones length filter [? = 1] x
    report ifelse-value (ones mod 2 = 0) [1] [0]
end

If you want to add any other function, simply add the code as a reporter and add the name of the function to the Chooser control (Function) in the interface (NetLogoWeb doesn't allow runresult on strings; for that reason, the version running on the web here is slightly different in this respect).

In this link you can play with a web version of the model:

Note that it is running on NetLogoWeb, so it is much slower than the normal version, which you can find here.

While it is running you can play around with all the parameters and try to discover how they affect the quality of the solution (remember to repeat the experiments, since the training sample-set is random). It may be of interest to change the number of neurons in the hidden layer... and to think about how this affects the two test functions of the model.

To know more...

Wikipedia Neural Networks

Michael Nielsen's Neural Networks

The Nature of Code (Ch. 10)

ANN for Beginners

NetLogo Book
