Hopfield/Brody mouse mus silicium

Description of my entry to competition B
by Seb Wills, 11th December 2000

This document describes my winning entry to a competition set up by John Hopfield and Carlos Brody to design a neural network which can recognise spoken words. Details of the competition and its winners are here. The mus silicium site gives the background to this competition, and lists the rules and constraints under which competition entries had to be created.

The underlying principle on which the "brain" operates is the one deduced by the Inference Group, and which was the winning entry to competition A. It is described in detail here. In summary, a spoken word is recognised by the presence of the word's features (e.g. sequences of pitch) in their correct temporal relationships. The neural network recognises the word by the coincidence of the firing rates of a set of neurons whose firing rates decay linearly over time, each at a different rate. For example, if event 1 occurs at an early time during the word and event 2 occurs at a later time, then we use a neuron which begins firing at event 1 and whose firing rate decays slowly, and a second neuron which begins firing at event 2 and whose firing rate decays quickly. At some time after event 2, the two rates will be equal. In this way, the firing rates of a set of neurons can be made to coincide at a designated "end of word" time, if and only if the input sound presented to the network (i.e. the times at which each event occurs) is similar to the one used when selecting the set of decay rates. This scheme also has the property that the word will still be recognised if it is spoken quickly or slowly compared to the training example: scaling all the event times by a common factor simply scales the coincidence time by the same factor, so the same set of decay rates still coincides. (see diagrams on this page)
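To make the timing mechanism concrete, here is a small numerical sketch. This is my own illustration rather than code from the entry, and all the numbers (initial rate, event times, slopes) are invented:

    # Two neurons decay linearly from the same initial firing rate; their
    # rates coincide shortly after the second event.  (Python sketch with
    # invented numbers, not part of the actual entry.)
    f0 = 100.0           # initial firing rate (Hz) when a neuron is triggered

    def rate(t, t_event, slope):
        # Firing rate at time t of a neuron triggered at t_event,
        # decaying linearly at the given slope (Hz per second).
        if t < t_event:
            return 0.0
        return max(0.0, f0 - slope * (t - t_event))

    t1, t2 = 0.1, 0.4    # event 1 early in the word, event 2 later (seconds)
    slow, fast = 50.0, 200.0

    # Setting f0 - slow*(t - t1) = f0 - fast*(t - t2) and solving for t:
    t_c = (fast * t2 - slow * t1) / (fast - slow)
    print(t_c, rate(t_c, t1, slow), rate(t_c, t2, fast))   # 0.5 80.0 80.0

Running this shows both rates passing through 80 Hz at t = 0.5 s, a little after the second event, which is exactly the coincidence the recogniser looks for.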

In the mus silicium competition, there are 40 features of the sound which are detected, which I call "event channels" (they are a mixture of "on", "off" and "peak" events for a variety of frequency bands of the sound file). For each of these we have at our disposal 20 area A neurons, each of which, when triggered by its corresponding feature, outputs a current that decays at its own fixed rate. A simplified diagram showing 4 event channels and 10 "time channels" is shown below:


Layout of area A neurons
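As a rough sketch of this bank of decaying currents (my own illustration: the 40 x 20 layout is from the competition, but the slope values and the initial current are invented), area A can be modelled as a grid of linear decays:

    import numpy as np

    N_EVENT_CHANNELS = 40   # on/off/peak detectors across the frequency bands
    N_TIME_CHANNELS = 20    # area A neurons per event channel

    # Invented spread of decay slopes, one per "time channel"; with an
    # initial current of 1.0 these currents last between 2 s and 0.2 s.
    decay_slopes = np.linspace(0.5, 5.0, N_TIME_CHANNELS)

    def area_a_current(t, event_times, i0=1.0):
        # Output current of every area A neuron at time t.  event_times
        # holds the time at which each event channel fired (np.nan if it
        # never fired during this input).
        dt = t - event_times[:, None]                 # (40, 1) time since trigger
        current = i0 - decay_slopes[None, :] * dt     # (40, 20) linear decays
        current[np.isnan(current) | (dt < 0) | (current < 0)] = 0.0
        return current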

In my version of the mouse brain, we use 800 area W neurons (400 excitatory (alpha) neurons and 400 inhibitory (beta) neurons). We preserve the tonotopic mapping which is found in area A, i.e. the 400 alpha (or beta) neurons are arranged in a grid with a row per event channel. Because there are 40 event channels, this gives 10 neurons in each row (with the beta neurons arranged in an identical second grid). The network is required to learn to recognise 10 different input patterns (spoken words). The training algorithm used allocates one column in the aforementioned grid to each training pattern:


Layout of area W neurons
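In code terms the allocation is just an indexing convention (my own sketch of one natural choice):

    N_PATTERNS = 10   # one column of the 40 x 10 grid per training word

    def w_indices(event_channel, pattern):
        # One alpha and one beta per (event channel, pattern) cell; the
        # betas form an identically-shaped parallel grid, giving
        # 2 * 40 * 10 = 800 area W neurons in total.
        alpha = event_channel * N_PATTERNS + pattern   # 0..399
        beta = 400 + alpha                             # 400..799
        return alpha, beta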

For a given event channel and training pattern, we therefore have a single pair of area W neurons to play with. The alpha neuron is connected to the output of the area A neuron which, if the input to the network was the training pattern in question, would have its output most closely matching a predetermined "coincidence current" at a predetermined "coincidence time". The coincidence current and time are given by the point at which the fastest-decaying current triggered by the last event in the pattern crosses the slowest-decaying current triggered by the first event in the pattern. The beta neuron corresponding to each alpha neuron has no inputs except a strong connection from its partner alpha neuron, which ensures that whenever the alpha fires, the beta also fires.
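Continuing the sketch from the area A section (reusing the invented decay_slopes and area_a_current, and inventing a toy set of training event times), the wiring choice for one pattern might look like this:

    rng = np.random.default_rng(1)
    # Toy training word: the time at which each event channel fired.
    event_times = rng.uniform(0.05, 0.45, N_EVENT_CHANNELS)

    first = np.nanmin(event_times)         # time of the earliest event
    last = np.nanmax(event_times)          # time of the latest event
    slow, fast = decay_slopes[0], decay_slopes[-1]

    # Coincidence time: where the slowest decay started at the first event
    # crosses the fastest decay started at the last event (same algebra as
    # in the two-neuron example), and the common current at that moment:
    t_c = (fast * last - slow * first) / (fast - slow)
    i_c = 1.0 - slow * (t_c - first)

    # Per event channel, connect the alpha to the area A neuron whose
    # current at t_c lies closest to i_c.  (Channels that never fired
    # would need special-casing in a real implementation.)
    currents = area_a_current(t_c, event_times)                # (40, 20)
    best_time_channel = np.argmin(np.abs(currents - i_c), axis=1)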

With the connections described so far, if the network was presented with one of the training patterns, the column of area W corresponding to that training pattern would contain neurons which, at a particular "coincidence time" shortly after the end of the word, would all be firing at the same rate. However, this is not sufficient for the network to recognise the word (i.e. for a gamma neuron with certain connections from area W to fire only in this situation). The additional trick is that the area W neurons which are firing at the same rate need to synchronise not only their firing rate but also their relative phase, i.e. they need to all fire simultaneously. If we then have weak connections from all those area W neurons to a particular gamma neuron, the simultaneous spikes add up to a large peak in the gamma neuron's total input, causing it to fire in synchrony with the W neurons. When the area W neurons are not firing in synchrony, the input to the gamma neuron is distributed randomly over time, and is therefore not strong enough to bring it over threshold.
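A toy calculation illustrates why synchrony matters to the gamma neuron (entirely my own construction: the exponential synapse model, time constant, weight and threshold are all invented). The same ten spikes cross threshold when they coincide, but not when they are spread over a firing cycle:

    import numpy as np

    def gamma_drive(spike_times, t, tau=0.002, weight=0.15):
        # Total input to the gamma neuron at time t from weak W->gamma
        # connections, modelled as exponentially-decaying post-synaptic
        # currents with time constant tau.
        dt = t - np.asarray(spike_times)
        return weight * np.sum(np.exp(-dt[dt >= 0] / tau))

    threshold = 1.0
    in_phase = np.full(10, 0.100)                 # ten W spikes together
    spread_out = np.linspace(0.100, 0.140, 10)    # same spikes over 40 ms

    print(gamma_drive(in_phase, 0.1005) > threshold)     # True  (~1.17)
    print(gamma_drive(spread_out, 0.1405) > threshold)   # False (~0.13)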

The simultaneous firing of a set of area W neurons is attained by making lateral connections to each alpha neuron in the set from all the other alpha and beta neurons in the set. Because of the different nature of the post-synaptic currents from the alpha and beta neurons, the overall effect is to advance the phase of an alpha neuron's firing cycle if it is 'late' compared to the other neurons in its set, and to retard it if it is 'early'. In other words, in a situation where the natural firing rates of all the neurons in a set are almost equal, they will tend to synchronise completely. The only other connections that are needed are connections from every alpha and beta cell in the set to a gamma neuron which is supposed to recognise the training pattern in question.
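A crude phase model shows why advance/retard coupling of this kind locks the set together (again my own sketch, not the actual post-synaptic dynamics; the rate, coupling strength and duration are invented). Each neuron's phase is nudged towards the circular mean phase of its set:

    import numpy as np

    rng = np.random.default_rng(0)
    phase = rng.uniform(0.0, 2 * np.pi, 10)   # ten alphas, random phases
    omega = 2 * np.pi * 40.0                  # common firing rate (~40 Hz)
    k, dt = 30.0, 0.001                       # coupling strength, time step

    for _ in range(500):                      # simulate half a second
        mean = np.angle(np.mean(np.exp(1j * phase)))   # circular mean phase
        # Advance 'late' neurons, retard 'early' ones:
        phase += dt * (omega + k * np.sin(mean - phase))

    order = np.abs(np.mean(np.exp(1j * phase)))   # 1.0 = perfect synchrony
    print(round(order, 3))                        # approaches 1.0

Because the natural rates are equal, any initial spread of phases collapses, and the whole set ends up firing simultaneously, which is the condition the gamma neuron detects.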

To summarise, the 800 neurons available in area W are divided into 10 entirely unconnected sets, each attempting to recognise one of the ten training patterns. Within each set, there are lateral connections to every alpha in the set from every other neuron in the set, and connections from each alpha to its partner beta neuron. There is also a connection from every cell in the set to the one gamma cell which corresponds to the set. The brain is therefore split into 10 entirely separate brains, each trying to recognise one of the training patterns. One would expect this "split brain" architecture not to perform as well as an architecture which distributed the information about each training pattern across the whole network; however, experiments carried out in the limited time available showed better performance for the "split brain" model.