Abstract

The brain-inspired, artificial neural network approach offers the ability to develop attractors for each pattern if feedback connections are allowed. It also exhibits great stability and adaptability with regard to noise and pattern degradation, and can perform generalization tasks. In particular, the Bidirectional Associative Memory (BAM) model has shown great promise for pattern recognition because of its capacity to be trained using a supervised or unsupervised scheme. This paper describes such a BAM, one that can encode patterns of real and binary values, perform multistep pattern recognition of variable-size time series, and accomplish many-to-one associations. Moreover, it will be shown that the BAM can be generalized to multiple associative memories, and that it can be used to store associations from multiple sources as well. The various behaviors are the result of topological rearrangements only, and the same learning and transmission functions are kept constant throughout the models. Therefore, a consistent architecture is used for different tasks, thereby increasing its practical appeal and modeling importance. Simulations show the BAM's various capacities, using several types of encoding and recall situations.

1. Introduction

Being able to recognize and recall patterns of various natures and in different contexts is something that human beings accomplish routinely and with little effort. However, these tasks are difficult to reproduce in artificial intelligent systems. A successful approach has consisted of distributing information over parallel networks of processing units, as done in biological neural networks. This brain-inspired, artificial neural network approach offers the ability to develop attractors for each pattern if feedback connections are allowed (e.g., [1, 2]). It also exhibits great stability and adaptability with regard to noise and pattern degradation, and can perform generalization tasks. In particular, the Bidirectional Associative Memory (BAM) model has shown great promise for pattern recognition because of its capacity to be trained using a supervised or unsupervised scheme [3]. Given bipolar column-matrix pairs $(\mathbf{x}_k, \mathbf{y}_k)$, BAM learning is accomplished with a simple Hebbian rule, according to the equation

$$\mathbf{W} = \mathbf{Y}\mathbf{X}^{T}. \tag{1.1}$$

In this expression, $\mathbf{X}$ and $\mathbf{Y}$ are matrices that represent the sets of bipolar pairs to be associated, and $\mathbf{W}$ is the weight matrix. To assure perfect storage and retrieval, the input patterns need to be orthogonal ($\mathbf{X}^{T}\mathbf{X} = \mathbf{I}$, with $\mathbf{I}$ the identity matrix; [4]). In that case, all the positive eigenvalues of the weight matrix will be equal and, when an input is presented for recall, the correct output can be retrieved. For example, if the input represents the encoded pattern $\mathbf{x}_k$, then the output will be given by

$$\mathbf{W}\mathbf{x}_k = \mathbf{Y}\mathbf{X}^{T}\mathbf{x}_k = \mathbf{y}_k. \tag{1.2}$$

The output will thus correspond to the encoded stimulus. Equation (1.1) uses a one-shot learning process, since the Hebbian association is strictly additive. A more natural learning procedure would make the learning incremental but, then, the weight matrix would grow unbounded with the repetition of the input stimuli during learning. In addition, the eigenvalues would reflect the frequency of presentation of each stimulus. This property may be acceptable for orthogonal patterns, but it leads to disastrous results when the patterns are correlated. In that case, the weight matrix will be dominated by its first eigenvalue, and this will result in recalling the same pattern whatever the input. For correlated patterns, a compromise is to use a one-shot learning rule to limit the domination of the first eigenvalue, and to use a recurrent nonlinear transmission function to allow the network to filter out the different patterns during recall. Kosko's BAM effectively used a signum transmission function to recall noisy patterns, despite the fact that the weight matrix developed by using (1.1) is not optimal. The nonlinear transmission function usually used by the BAM network is expressed by the following equations:

$$\mathbf{y}(t+1) = \mathrm{sgn}\big(\mathbf{W}\mathbf{x}(t)\big), \qquad \mathbf{x}(t+1) = \mathrm{sgn}\big(\mathbf{W}^{T}\mathbf{y}(t)\big), \tag{1.3}$$

where $\mathrm{sgn}$ is the signum function and $t$ is a given discrete time step.

Initially, BAMs had poor storage capacity, could only store binary patterns, and were limited to static inputs. Nowadays, storage and recall performance have much improved; BAMs can learn and recall binary patterns with good performance (e.g., [5–20]), and that capability has been extended to real-valued patterns (e.g., [7, 8, 21–24]). Attention has also been given to learning and retrieving temporal sequences (see [25] for a review). These multistep associations have been studied essentially with gradient descent techniques (e.g., [26, 27]), competitive techniques (e.g., [28]), or complex architectures (e.g., [29, 30]). Few authors have studied this type of association in the case of simple BAM networks (e.g., [7, 31–33]).
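As a concrete illustration of the classical scheme in (1.1)-(1.3), the following minimal sketch shows one-shot Hebbian storage followed by bidirectional signum recall. It is not taken from the model described in this paper, and the example vectors are made up for illustration only.

```python
import numpy as np

# Minimal sketch of Kosko-style BAM learning (1.1) and recall (1.3).
# X and Y hold one bipolar pattern per column; values are hypothetical.
X = np.array([[ 1, -1],
              [-1,  1],
              [ 1,  1],
              [-1, -1]])          # two 4-dimensional stimuli
Y = np.array([[ 1, -1],
              [-1,  1],
              [-1,  1]])          # two 3-dimensional associated outputs

W = Y @ X.T                        # one-shot Hebbian learning, W = Y X^T

def sgn(a):
    """Signum transmission function; ties broken toward +1."""
    return np.where(a >= 0, 1, -1)

# Bidirectional recall: iterate (1.3) until the pair stabilizes.
x = np.array([1, -1, 1, 1])        # noisy version of the first stimulus
for _ in range(10):
    y = sgn(W @ x)                 # x-layer -> y-layer
    x = sgn(W.T @ y)               # y-layer -> x-layer
print(y, x)
```

With orthogonal patterns a single pass already returns the stored associate; for correlated patterns, it is the repeated nonlinear iteration that filters the noise out, as discussed above.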

In this paper, the focus will be on BAM properties for different kinds of associations. First, the transmission function will be analyzed, both analytically and numerically. This will be followed by simulations of the classic case of one-to-one association. The next section will present many-to-one association, which allows mean extraction from the presentation of different exemplars. Then, we focus on temporal association; in this case, both limit cycle and steady-state behaviors will be presented. Finally, more complex architectures will be explored by using the Multidirectional Associative Memory (MAM) architecture while keeping the same learning and transmission functions.

2. Analysis of the Transmission Function

This section describes how the transmission function is derived from both analytical and numerical results of the one-dimensional cubic function.

2.1. One-Dimensional Symmetric Setting

The transmission function used in the model is based on the classic Verhulst equation [34]. Since this quadratic function has only one stable fixed-point, it is extended to a cubic form described by the dynamic equation

$$\frac{dx}{dt} = f(x) = \mu x - x^{3}, \tag{2.1}$$

where $x$ represents the input value and $\mu$ is a free parameter that affects the equilibrium states of (2.1). Figure 1 illustrates its shape for $\mu = 1$. The fixed-points are the roots of $f(x) = 0$. For example, if $\mu = 1$ the corresponding fixed-points are $-1$, $0$, and $1$. The stability properties of the fixed-points are determined by computing the derivative of (2.1):

$$f'(x) = \mu - 3x^{2}. \tag{2.2}$$

If the derivative at a given fixed-point is greater than zero, then a small perturbation results in growth; if the derivative is negative, then a small perturbation results in decay. The first situation represents an unstable fixed-point, whereas the second represents a stable one. In the previous example, both $x = -1$ and $x = 1$ are stable fixed-points, while $x = 0$ is an unstable fixed-point. This is illustrated in Figure 1 by filled and empty circles, respectively.
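The fixed-point and stability analysis of (2.1)-(2.2) can be checked numerically. The short sketch below is illustrative only; the variable names and the value $\mu = 1$ simply reproduce the example in the text.

```python
import numpy as np

# Minimal sketch: fixed points and stability of dx/dt = mu*x - x**3 (2.1)-(2.2).
mu = 1.0

f = lambda x: mu * x - x**3           # cubic dynamic equation (2.1)
df = lambda x: mu - 3 * x**2          # its derivative (2.2)

fixed_points = np.roots([-1.0, 0.0, mu, 0.0])    # roots of -x^3 + mu*x = 0
for x_star in sorted(fixed_points.real):
    kind = "stable" if df(x_star) < 0 else "unstable"
    print(f"x* = {x_star:+.2f} is {kind}")       # -1 and +1 stable; 0 unstable
```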

Another way to visualize the dynamics of a recurrent neural network is based on the physical idea of energy [1]. The energy function, noted $E$, can be defined by

$$\frac{dx}{dt} = -\frac{dE}{dx}, \tag{2.3}$$

where the negative sign indicates that the state vector moves downhill in the energy landscape. Using the chain rule, it follows from (2.3) that

$$\frac{dE}{dt} = \frac{dE}{dx}\frac{dx}{dt} = -\left(\frac{dx}{dt}\right)^{2} \leq 0. \tag{2.4}$$

Thus, $E$ decreases along trajectories or, in other words, the state vector globally converges towards lower energy states. Equilibrium occurs at locations of the vector field where $dx/dt = 0$; local minima of $E$ correspond to stable fixed-points, and local maxima correspond to unstable ones. To find them, we need to find $E$ such that

$$E(x) = -\int \left(\mu x - x^{3}\right) dx. \tag{2.5}$$

The general solution is

$$E(x) = \frac{x^{4}}{4} - \frac{\mu x^{2}}{2} + c, \tag{2.6}$$

where $c$ is an arbitrary constant (we use $c = 0$ for convenience). Figure 2 illustrates the energy function when $\mu = 1$. The system exhibits a double-well potential with two stable equilibrium points ($x = -1$ and $x = 1$).

The parameter $\mu$ plays an important role in determining the number and positions of the fixed-points. Figure 3 illustrates the situation. For $\mu$ less than zero, the system has only one (stable) fixed-point, $x = 0$; for $\mu$ greater than zero, there are three fixed-points, $x = 0$ and $x = \pm\sqrt{\mu}$, of which the first one is unstable and the other two are stable. Finally, for $\mu$ equal to zero, we have a pitchfork bifurcation point. We deduce from the preceding that we must have $\mu > 0$ for the system to store binary stimuli.

2.2. M-Dimensional Symmetric Setting

In an $N$-dimensional space, (2.1) takes a vector form and becomes

$$\frac{dx_i}{dt} = \mu_i x_i - x_i^{3}, \quad i = 1, \ldots, N, \tag{2.7}$$

where $\mathbf{x} = (x_1, \ldots, x_N)^{T}$, the $\mu_i$ are the per-unit parameters, and $N$ is the network's dimension. When the coupling between units is diagonal, the system is uncoupled and the analysis is straightforward. As in the one-dimensional system, the fixed-points are defined by the roots of (2.7). The stability properties of each one are determined by finding the eigenvalues of the Jacobian matrix. Depending on the eigenvalues, different types of fixed-point behaviors are obtained. For example, if $\mu_1 = \mu_2 = 1$ there will be nine fixed-points, as illustrated in Figure 4. Four of them are stable: $(1,1)$, $(1,-1)$, $(-1,1)$, and $(-1,-1)$; they are indicated by filled circles. The other five are unstable and indicated by empty circles. Here also, the stability of the fixed-points can be determined from the energy of the system. In matrix notation, the energy function is

$$E(\mathbf{x}) = \frac{1}{4}\sum_{i=1}^{N} x_i^{4} - \frac{1}{2}\,\mathbf{x}^{T}\mathrm{diag}(\mu_1, \ldots, \mu_N)\,\mathbf{x}. \tag{2.8}$$

In the case of bipolar patterns, and if all the $\mu_i$ are set to 1, the energy of the system will be equal to $-N/4$.

The function is plotted in Figure 5 for a two-dimensional system, using the previous values of $\mu_1$ and $\mu_2$. The stable fixed-points correspond to the local minima in the plot, and they partition the recall space into four equal wells. For example, if the desired correct output (fixed-point) is $(1, 1)$, the probability that this fixed-point attracts a pattern whose elements are uniformly distributed over $[-1, 1]$ is 25%.

2.3. Numerical Approximation Using Euler’s Method

From the previous analysis, two stable fixed-points exist when $\mu$ is positive. For simplicity, if all the $\mu_i$ are set to one, then (2.7) becomes

$$\frac{d\mathbf{x}}{dt} = \mathbf{x} - \mathbf{x}^{3}. \tag{2.9}$$

This last differential equation can then be approximated in discrete time by using the Euler method:

$$\mathbf{x}(t+1) = \mathbf{x}(t) + \delta\left(\mathbf{x}(t) - \mathbf{x}^{3}(t)\right), \tag{2.10}$$

where $\delta$ is a small positive constant and $t$ is a given discrete time step. Rearranging the terms yields

$$\mathbf{x}(t+1) = (\delta + 1)\,\mathbf{x}(t) - \delta\,\mathbf{x}^{3}(t). \tag{2.11}$$

However, as it is, (2.11) does not reflect the impact of weight connections. To take into account that the connection strength between units is modified by a learning rule, we pose $\mathbf{a}(t) = \mathbf{W}\mathbf{x}(t)$, where $\mathbf{W}$ represents the weight connections. Equation (2.11) then becomes

$$\mathbf{x}(t+1) = (\delta + 1)\,\mathbf{W}\mathbf{x}(t) - \delta\,\big(\mathbf{W}\mathbf{x}(t)\big)^{3}. \tag{2.12}$$

This last function is illustrated in Figure 6. As the figure shows, if $\mathbf{W}\mathbf{x}(t) = 1$, then $\mathbf{x}(t+1)$ will have the same value of 1.
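To make the recall step concrete, here is a minimal sketch of the Euler-derived transmission function (2.12); the value of δ, the identity weight matrix, and the sample state are illustrative assumptions, not values prescribed by the model.

```python
import numpy as np

# Minimal sketch of the Euler-derived transmission function (2.12),
# applied element-wise to the activation a = Wx.
delta = 0.1

def transmit(W, x, delta=delta):
    """One recall step: x(t+1) = (delta+1)*W*x(t) - delta*(W*x(t))**3."""
    a = W @ x
    return (delta + 1) * a - delta * a**3

# With an identity weight matrix, bipolar values are fixed points.
W = np.eye(3)
x = np.array([1.0, -1.0, 0.4])
for _ in range(20):
    x = transmit(W, x)
print(x)     # drifts towards the stable states -1 and +1
```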

The slope (derivative) of the transmission function determines the stability of its fixed-points. In our case, it is desired that the fixed-points have a stable monotonic approach. Therefore, the slope must be positive and less than one [35]:

$$0 < \frac{dx(t+1)}{dx(t)} < 1. \tag{2.13}$$

For bipolar stimuli, this condition is satisfied when $\delta < 0.5$; the slope at the fixed-points $x = \pm 1$ is then $1 - 2\delta$. Another way to analyze the behavior of the network over the whole range of $\delta$ is to perform a Lyapunov analysis. This analysis allows us to discriminate an aperiodic (chaotic) attractor from a periodic one. The Lyapunov exponent for the case of a one-dimensional network is approximated by

$$\lambda \approx \frac{1}{n}\sum_{t=1}^{n} \ln\left|\frac{dx(t+1)}{dx(t)}\right|, \tag{2.14}$$

where $n$ is the number of network iterations, set to 10,000 to establish the approximation. Again, for simplicity, we perform the analysis in the case of independent units. In other words, the derivative term is obtained from (2.13) when $w = 1$ (for simplicity), so that it is given by

$$\frac{dx(t+1)}{dx(t)} = (\delta + 1) - 3\delta\, x^{2}(t). \tag{2.15}$$

In order to estimate the range of $\delta$ values for a given period, a bifurcation diagram can be computed. Figure 7 illustrates both the Lyapunov and bifurcation analyses and shows that for values of $\delta$ less than 1.0, the network converges to a fixed-point attractor. However, at higher values (e.g., 1.6), the network may converge to an aperiodic attractor.
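The Lyapunov estimate (2.14)-(2.15) can be reproduced with a few lines of code. The sketch below assumes a single uncoupled unit ($w = 1$) and illustrative δ values and iteration counts; it is not the authors' original script.

```python
import numpy as np

# Minimal sketch of the Lyapunov-exponent estimate (2.14)-(2.15) for one unit.
def lyapunov(delta, n=10_000, x0=0.4):
    x, acc = x0, 0.0
    for _ in range(n):
        x = (delta + 1) * x - delta * x**3                  # transmission map, w = 1
        acc += np.log(abs((delta + 1) - 3 * delta * x**2))  # log |derivative| (2.15)
    return acc / n

for delta in (0.1, 0.5, 1.0, 1.6):
    print(delta, lyapunov(delta))   # negative: periodic; positive: aperiodic (chaos)
```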

2.4. Numerical Approximation Using the Fourth-Order Runge-Kutta Method

Another way to approximate the network's dynamics numerically is to use the fourth-order Runge-Kutta (FORK) method. Contrary to Euler's method, FORK averages several estimates to approximate the next time step:

$$\mathbf{x}(t+1) = \mathbf{x}(t) + \frac{\delta}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right),$$
$$\mathbf{k}_1 = g\big(\mathbf{x}(t)\big), \quad \mathbf{k}_2 = g\Big(\mathbf{x}(t) + \tfrac{\delta}{2}\mathbf{k}_1\Big), \quad \mathbf{k}_3 = g\Big(\mathbf{x}(t) + \tfrac{\delta}{2}\mathbf{k}_2\Big), \quad \mathbf{k}_4 = g\big(\mathbf{x}(t) + \delta\,\mathbf{k}_3\big), \tag{2.16}$$

where $g(\mathbf{x}) = \mathbf{x} - \mathbf{x}^{3}$ is the right-hand side of (2.9) and $\delta$ is a small approximation parameter. Again, to take into account weight connections, we pose $\mathbf{a}(t) = \mathbf{W}\mathbf{x}(t)$. Equation (2.16) thus becomes the same update applied to $\mathbf{a}(t)$ instead of $\mathbf{x}(t)$:

$$\mathbf{x}(t+1) = \mathbf{a}(t) + \frac{\delta}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right), \quad \mathbf{k}_1 = g\big(\mathbf{a}(t)\big), \ \ldots \tag{2.17}$$

This last function is illustrated in Figure 8. As the figure shows, if $\mathbf{W}\mathbf{x}(t) = 1$, then $\mathbf{x}(t+1)$ will also take the value 1. Again, as for any nonlinear dynamic system, to guarantee that a given output converges to a fixed-point, the slope of the transmission function must be positive and less than one [35]:

$$0 < \frac{dx(t+1)}{dx(t)} < 1. \tag{2.18}$$

This condition is again satisfied for sufficiently small values of $\delta$ in the case of bipolar stimuli. Here also, bifurcation and Lyapunov exponent diagrams were computed. Figure 8 shows that if the value of $\delta$ is lower than 1.7, the network converges to a fixed-point. However, if the value is too high, then a given input will converge to an aperiodic attractor.
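Rather than expanding (2.17) symbolically, the FORK recall step can be coded directly as the four-stage update. The following sketch assumes the activation $\mathbf{a} = \mathbf{W}\mathbf{x}$ and uses illustrative parameter values; it is only one possible reading of the update described above.

```python
import numpy as np

# Minimal sketch of one FORK (fourth-order Runge-Kutta) recall step for
# dx/dt = x - x^3, with the activation taken as a = Wx.
def fork_transmit(W, x, delta=0.1):
    g = lambda v: v - v**3            # right-hand side of (2.9)
    a = W @ x                         # activation through the weights
    k1 = g(a)
    k2 = g(a + 0.5 * delta * k1)
    k3 = g(a + 0.5 * delta * k2)
    k4 = g(a + delta * k3)
    return a + (delta / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

W = np.eye(3)
x = np.array([1.0, -1.0, 0.4])
for _ in range(200):
    x = fork_transmit(W, x)
print(x)    # bipolar values remain fixed; intermediate values drift to +/- 1
```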

2.5. Performance Comparison between Euler and FORK Approximations

The purpose of this simulation was to compare the performance of the Euler and FORK approximations. Although FORK gives a more precise approximation of the ordinary differential equation (2.7), we need to evaluate whether a better approximation translates into a larger radius of attraction.

2.5.1. Methodology

A low and a high memory load were used as a basis for comparison. For the low memory load situation, the first 10 correlated patterns were associated together. The patterns are 7 × 7 pixel images of alphabetic characters, where a white pixel is given the value −1 and a black pixel the value 1. This led to a memory load equal to 20% (10/49) of the 49-dimensional space capacity. Normally, such a load cannot be handled by Kosko's BAM, whose capacity is about 14%. For the high memory load situation, the associations were extended to all 26 correlated patterns. This last situation led to a memory load equal to 53% (26/49). Usually, around 50% is about the maximum load an optimized Hebbian-type associative memory can store without major performance decline [36]. Figure 9 illustrates the stimuli used for the simulation. The images were converted to vectors of 49 dimensions before being input to the network.

Both methods were tested with the output parameter (δ) set to 0.05 and 0.025. These values are much lower than the theoretical limit that was found analytically. The simulation results (not reported) show that higher values make the model unable to associate the patterns. The associations between patterns were accomplished using the learning rules (3.4) and (3.5) that will be described in the next section.

After associating the desired pattern pairs, the radii of attraction obtained by Euler and FORK were evaluated on noisy recall tasks. The first task consisted of recalling noisy patterns obtained by adding random, normally distributed vectors to the patterns. Each noise vector had a mean of 0 and a standard deviation of σ, where σ represents the desired proportion of noise. For the simulation, σ varied from 0.1 to 1.0. The second task was to recall the correct associated stimulus from a noisy input obtained by randomly flipping pixels in the input pattern. The number of pixel flips varied from 0 to 10, thus corresponding to a noise proportion of 0 to 20%.
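For concreteness, the two noise conditions could be generated as in the sketch below, where the bipolar stand-in pattern, σ, and the flip count are illustrative placeholders rather than the actual stimuli.

```python
import numpy as np

# Minimal sketch of the two noisy-recall conditions on a 49-dimensional pattern.
rng = np.random.default_rng(0)
pattern = rng.choice([-1.0, 1.0], size=49)        # stand-in for a 7x7 letter image

# Task 1: additive Gaussian noise with standard deviation sigma.
sigma = 0.5
noisy_gauss = pattern + rng.normal(0.0, sigma, size=pattern.shape)

# Task 2: randomly flip a fixed number of pixels (multiply by -1).
n_flips = 10                                      # about 20% of 49 pixels
idx = rng.choice(pattern.size, size=n_flips, replace=False)
noisy_flip = pattern.copy()
noisy_flip[idx] *= -1
```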

2.5.2. Results

Figure 10 illustrates the various performances. If the memory load is low (10 patterns over 49 pixels), as shown in the left column, then Euler's approximation has a slight advantage over FORK for random pixel flipping. This advantage increases if the memory load is high (26 patterns over 49 pixels), as shown in the right column. Moreover, the higher the value of the output parameter, the higher the performance. However, if the output parameter is set too high, the network might not converge to a fixed-point. Therefore, the output transmission function used in the rest of the paper is based on Euler's method.

3. Model Description

3.1. Architecture

The proposed BAM architecture is similar to the one proposed by Hassoun [37], as illustrated in Figure 11, where $\mathbf{x}(0)$ and $\mathbf{y}(0)$ represent the initial input-states (stimuli); $t$ is the number of iterations over the network; and $\mathbf{W}$ and $\mathbf{V}$ are weight matrices. The network is composed of two Hopfield-like neural networks interconnected in head-to-toe fashion. These interconnected layers allow a recurrent flow of information that is processed in a bidirectional way. The y-layer returns information to the x-layer and vice versa. If similar patterns are presented at both layers, then the network acts like an autoassociative memory; if the patterns at the x-layer are associated with different ones at the y-layer, then the network acts like a heteroassociative memory [3]. As a result, it encompasses both unsupervised (self-supervised) and supervised learning. In this particular model, the two layers can be of different dimensions and, contrary to usual BAM designs, the weight matrix from one side is not necessarily the transpose of that from the other side.

3.2. Transmission Function

Based on the numerical results of Section 2.5.2, the transmission function is expressed by the following equations. The input activations to each of the y- and x-layers are computed as follows:

$$\mathbf{a}(t) = \mathbf{W}\mathbf{x}(t), \qquad \mathbf{b}(t) = \mathbf{V}\mathbf{y}(t). \tag{3.1}$$

The outputs are then obtained using Euler's approximation (2.12), defined by the following equations:

$$y_i(t+1) = (\delta + 1)\,a_i(t) - \delta\, a_i^{3}(t), \quad i = 1, \ldots, N,$$
$$x_j(t+1) = (\delta + 1)\,b_j(t) - \delta\, b_j^{3}(t), \quad j = 1, \ldots, M, \tag{3.2}$$

where $N$ and $M$ are the numbers of units in each layer, $i$ and $j$ are the indices of the respective vector elements, $\mathbf{x}(t)$ and $\mathbf{y}(t)$ represent the layers' contents at time $t$, and $\delta$ is a general output parameter. The shape of this function is illustrated in Figure 6. This function has the advantage of exhibiting continuous-valued (or gray-level) attractor behavior (for a detailed example, see [7]) when used in a recurrent network. Such properties contrast with standard nonlinear transmission functions, which only exhibit bipolar attractor behavior (e.g., [3]).

3.3. Learning

The network tries to solve the following nonlinear constraints:

$$\mathbf{y} = f(\mathbf{W}\mathbf{x}), \qquad \mathbf{x} = f(\mathbf{V}\mathbf{y}), \tag{3.3}$$

where $f$ is the transmission function defined previously (3.2). The form of the constraints and the recurrent nature of the underlying network call for a learning process that is executed online: the weights are modified as a function of the input and the obtained output. In addition, since incremental learning is favored, the network must be able to self-converge. As a result, the learning is based on time-difference Hebbian association [7, 38–40]. It is formally expressed by the following equations:

$$\mathbf{W}(k+1) = \mathbf{W}(k) + \eta\,\big(\mathbf{y}(0) - \mathbf{y}(t)\big)\big(\mathbf{x}(0) + \mathbf{x}(t)\big)^{T}, \tag{3.4}$$

$$\mathbf{V}(k+1) = \mathbf{V}(k) + \eta\,\big(\mathbf{x}(0) - \mathbf{x}(t)\big)\big(\mathbf{y}(0) + \mathbf{y}(t)\big)^{T}, \tag{3.5}$$

where $\eta$ represents the learning parameter and $k$ the learning trial. In (3.4) and (3.5), the weight updates follow this general procedure: first, the initial inputs $\mathbf{x}(0)$ and $\mathbf{y}(0)$ are fed to the network; then, those inputs are iterated $t$ times through the network (Figure 11). This results in outputs $\mathbf{y}(t)$ and $\mathbf{x}(t)$ that are used for the weight updates. Therefore, the weights will self-stabilize when the feedback is the same as the initial inputs ($\mathbf{y}(t) = \mathbf{y}(0)$ and $\mathbf{x}(t) = \mathbf{x}(0)$); in other words, when the network has developed fixed-points. This contrasts with most BAMs, where the learning is performed solely on the activation (offline). Learning convergence is a function of the value of the learning parameter $\eta$. For simplicity, we assume that input and output are the same ($\mathbf{x} = \mathbf{y}$). Therefore, to find the maximum value that the learning parameter can take, we require the derivative (slope) of the learning equation to be positive and solve it for $\eta$ [35]:

$$0 < \frac{\partial w(k+1)}{\partial w(k)} < 1. \tag{3.6}$$

If $\mathbf{W}(0) = \mathbf{0}$ and in the case of a one-dimensional pattern in a one-dimensional network, the situation simplifies to

$$w(k+1) = w(k) + \eta\left(1 - \left[(\delta+1)\,w(k) - \delta\, w^{3}(k)\right]^{2}\right). \tag{3.7}$$

In this case, the network converges when $w = 1$, and the solution will be

$$\eta < \frac{1}{2(1-2\delta)}, \quad \delta \neq \frac{1}{2}. \tag{3.8}$$

As for any network, as the network increases in dimensionality, $\eta$ must be set to lower values. Therefore, in the case of a BAM of $N$ and $M$ dimensions, the learning parameter must be set according to

$$\eta < \frac{1}{2(1-2\delta)\,\mathrm{Max}[N, M]}, \quad \delta \neq \frac{1}{2}. \tag{3.9}$$
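A single learning trial of (3.4)-(3.5) can be sketched as follows. The dimensions, η, δ, the number of iterations $t$, and the ordering of the x→y and y→x passes are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

# Minimal sketch of one time-difference Hebbian learning trial (3.4)-(3.5).
def transmit(M, v, delta):
    a = M @ v
    return (delta + 1) * a - delta * a**3

def learning_trial(W, V, x0, y0, eta=0.01, delta=0.1, t=1):
    x, y = x0.copy(), y0.copy()
    for _ in range(t):                       # iterate the network t times
        y = transmit(W, x, delta)            # x-layer -> y-layer
        x = transmit(V, y, delta)            # y-layer -> x-layer
    W += eta * np.outer(y0 - y, x0 + x)      # update per (3.4)
    V += eta * np.outer(x0 - x, y0 + y)      # update per (3.5)
    return W, V

# Example: one pair of bipolar stimuli of different dimensions.
x0 = np.array([1.0, -1.0, 1.0, 1.0])
y0 = np.array([-1.0, 1.0, -1.0])
W = np.zeros((y0.size, x0.size))
V = np.zeros((x0.size, y0.size))
for _ in range(200):
    W, V = learning_trial(W, V, x0, y0)
```

The updates vanish once the network's feedback reproduces the initial inputs, which is the self-stabilization property described above.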

4. Simulations

The following simulations first provide a numerical example to illustrate how learning and recall are performed using a small network. The next simulation then reproduces a classic BAM one-to-one association task. Finally, the third simulation extends the association property of the model to a many-to-one association task.

4.1. Numerical Example

The first simulation uses a toy example to show in detail how the learning and recall processes are performed.

4.1.1. Learning

For this simulation, the transmission function parameter (δ) was set to 0.1 and the learning parameter (η) was set to 0.01. Also, to limit the simulation time, the number of output iterations performed before each weight matrix update was set to the same value for all simulations. Learning was carried out according to the following procedure:

Weight initialization at zero;
(1) random selection of a pattern pair;
(2) computation of $\mathbf{y}(t)$ and $\mathbf{x}(t)$ according to the transmission function (3.2);
(3) computation of the weight matrix ($\mathbf{W}$ and $\mathbf{V}$) updates according to (3.4) and (3.5);
(4) repetition of steps (1) to (3) until the weight matrices converge.
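A minimal sketch of this whole procedure, with made-up stimuli, illustrative parameter values, and a single x→y→x cycle per trial (none of which are taken from the original simulations), could read:

```python
import numpy as np

# Minimal sketch of the learning procedure above, using the transmission
# function (3.2) and the learning rules (3.4)-(3.5).
def transmit(M, v, delta):
    a = M @ v
    return (delta + 1) * a - delta * a**3

delta, eta, n_trials, t = 0.1, 0.01, 100, 1
pairs = [
    (np.array([ 1.0, -1.0,  1.0]), np.array([ 1.0,  1.0, -1.0, -1.0])),
    (np.array([-1.0,  1.0,  1.0]), np.array([-1.0,  1.0,  1.0, -1.0])),
]

# Weight initialization at zero.
W = np.zeros((pairs[0][1].size, pairs[0][0].size))
V = np.zeros((pairs[0][0].size, pairs[0][1].size))

rng = np.random.default_rng(0)
for _ in range(n_trials):
    x0, y0 = pairs[rng.integers(len(pairs))]    # (1) random selection of a pair
    x, y = x0, y0
    for _ in range(t):                          # (2) compute y(t) and x(t)
        y = transmit(W, x, delta)
        x = transmit(V, y, delta)
    W += eta * np.outer(y0 - y, x0 + x)         # (3) update W per (3.4)
    V += eta * np.outer(x0 - x, y0 + y)         # (3) update V per (3.5)
# (4) in practice, trials are repeated until W and V converge
```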

The stimuli pairs for the simulation are given by

First Learning Trial
Assume that the second pair is randomly selected. Then, (3.1) and (3.2) yield zero-vectors for $\mathbf{y}(1)$ and $\mathbf{x}(1)$, since the weight connections are initially set at zero. Using (3.4) and (3.5), the weight matrices are then updated.

Second Learning Trial
Assume that the first pair is randomly selected. Using (3.1) and (3.2), we compute $\mathbf{y}(1)$ and $\mathbf{x}(1)$; then, using (3.4) and (3.5), the weight matrices are updated again. Notice that the weight matrices are not the transpose of each other, as they are in Kosko's model [3].

Subsequent Learning Trials
If the process is iterated over and over (e.g., 100 learning trials), the weight matrices converge to stable values.

4.1.2. Recall

To test whether the network can effectively recall a given pattern, a pattern is selected and iterated through the network until stabilization. For example, if a noisy version of pattern 2 is presented to the network, the output of the second layer is computed with (3.1) and (3.2); this output is then sent back to the first layer, and the cycle is repeated until convergence. In this example, the network stabilizes on the second stored pair; the new input is therefore classified as part of the second pattern pair. The next section presents a classic example of one-to-one association.

4.2. One-to-One Association

The task of the network consists of associating a given picture with a name. Therefore, given a picture, the network should output the corresponding name and, given a name, the corresponding picture.

4.2.1. Methodology

The first stimulus set represents 8-bit grey-level pictures; each image forms a 912-dimensional real-valued vector. They are reduced-size versions of the California Facial Expressions set (CAFE, [41]). Each pixel was normalized to values between −1 and 1. The second set consists of 4-letter words, placed on a pixel grid, that identify each picture. For each name, a white pixel is assigned a value of −1 and a black pixel a value of +1. Each name forms a 217-dimensional bipolar vector. Therefore, one weight matrix has 217 × 912 connections and the other 912 × 217 connections. The network task was to associate each image with its corresponding name, as depicted in Figure 12. The learning parameter (η) was set to 0.0025 and the output parameter (δ) to 0.1. Both values met the requirements for weight convergence and fixed-point development. Since the model's learning is online and iterative, the stimuli were not presented all at once. In order to save computational time, the number of iterations performed before each weight update was kept low. The learning followed the same general procedure described in the previous simulation:
(1) initialization of weights to zero;
(2) random selection of a pair following a uniform distribution;
(3) stimuli iteration through the network according to (3.1) and (3.2) (one cycle);
(4) weight update according to (3.4) and (3.5);
(5) repetition of steps (2) to (4) until the desired number of learning trials is reached.

4.2.2. Results

It took about 12,000 learning trials before the learning converged. The network was then able to perfectly associate a name with the corresponding picture, and vice versa. Tables 1 to 4 illustrate some examples of the noise tolerance and pattern completion properties of the network. More precisely, Table 1 shows how the network was able to recall the proper name from a noisy input. The input consisted of the first picture contaminated with a noise level of 22%; in other words, 200 pixels (out of 912) were randomly selected and their values were multiplied by −1.

In addition, because of the bidirectional nature of the network, it is possible not only to recall the appropriate name but also to clean the noisy input. Table 2 shows an example of pattern completion. In this case, the eye band consisted of pixels of zero value. The network was able to correctly output the name and restore the missing eyes. The network can also output a picture given a noisy name. For example, Table 3 shows that the network was able to recall the appropriate picture even though the input name was incorrect (Kate instead of Katy). Finally, Table 4 shows that since "Pete" is the only name that begins with a "P", only the first letter is necessary for the network to output the correct face. The general performance of the network has been evaluated by Chartier and Boukadoum [7]. The next section extends the simulations by introducing many-to-one associations.

4.2.3. Transmission Function with Hard Limits

For some applications (e.g., [7]), it is desired that the transmission function lie within a fixed range of values. Saturating limits at −1 and 1 can be added to (3.2), and the transmission function is then expressed by the following equations:

$$y_i(t+1) = \begin{cases} 1, & \text{if } a_i(t) > 1, \\ -1, & \text{if } a_i(t) < -1, \\ (\delta+1)\,a_i(t) - \delta\, a_i^{3}(t), & \text{otherwise}, \end{cases} \qquad x_j(t+1) = \begin{cases} 1, & \text{if } b_j(t) > 1, \\ -1, & \text{if } b_j(t) < -1, \\ (\delta+1)\,b_j(t) - \delta\, b_j^{3}(t), & \text{otherwise}. \end{cases}$$

In contrast to a sigmoid transmission function, this function is not asymptotic at the ±1 values, and it still has the advantage of exhibiting continuous-valued (or gray-level) attractor behavior when used in a recurrent network. Figure 13 illustrates the shape of the transmission function when δ = 0.5.
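In code, the hard limits amount to clipping the Euler-derived output, as in this illustrative sketch (δ = 0.5 is only an example value):

```python
import numpy as np

# Minimal sketch of the transmission function with saturating limits at -1/+1.
def transmit_clipped(W, x, delta=0.5):
    a = W @ x
    out = (delta + 1) * a - delta * a**3
    return np.clip(out, -1.0, 1.0)     # hard limits keep the output in [-1, 1]
```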

We compared the same network with and without hard limits on the same task (Figure 9) previously described in Section 2.5.1. The performance of the network was evaluated with transmission function parameter (δ) values between 0.05 and 0.3; values outside this range gave worse results. In addition, the network was compared with the best results (δ = 0.05) obtained from the simulation illustrated in Figure 10, where no hard limits were used. After the association of the desired pattern pairs, the radius of attraction of the network was evaluated under noisy inputs obtained by randomly flipping pixels in the input pattern. The number of pixel flips varied from 0 to 10.

Figure 14 illustrates the various performances. The results show that under low-to-medium noise, the network performs best (about a 10% increase) if no hard limits are used. However, under medium-to-high levels of noise the situation is reversed; the network performs best (about a 5% increase) if hard limits are used.

4.3. Many-to-One Association

This simulation illustrates many-to-one association. The idea is to associate the different emotions depicted by different people with the proper name tag. Therefore, upon the presentation of a given image, the network should output the corresponding emotion. The simulation is based on Tabari et al. [42].

4.3.1. Methodology

As in the previous section, the first stimulus set represents 8-bit grey-level pictures; each image forms a 912-dimensional real-valued vector. They are reduced-size versions of the California Facial Expressions sample (CAFE, [41]). Each image pixel was rescaled to values between −1 and 1. For each of the 9 individuals, 7 images reflect a given emotion (anger, disgust, fear, happy, maudlin, neutral, and surprised). The second set consists of letters, placed on a 7 × 7 grid, that identify each emotion (A, D, F, H, M, N, and S). For each letter, a white pixel is assigned a value of −1 and a black pixel a value of +1. Each letter forms a 49-dimensional bipolar vector. Therefore, one weight matrix has 49 × 912 connections and the other 912 × 49 connections. The network task was to associate each emotion, expressed by different persons, with the corresponding letter (Figure 15). The learning parameter (η) was set to 0.0025 and the output parameter (δ) to 0.1. Both values met the requirements for weight convergence and fixed-point behavior. The learning followed the general procedure expressed previously.

4.3.2. Results

Once the learning trials were finished (about 1000 epochs), the network was able to correctly recall each emotion from the different people expressing it. As in one-to-one association, the network was able to remove the noise and correctly recall the appropriate emotion tag. Table 5 shows two examples of noisy images that were contaminated by 200 pixel flips (22%). Table 6 shows that the network recalled the appropriate emotion tag even in the absence of picture parts that are usually deemed important for identifying emotions, that is, the eyes and the mouth.

However, the most important property in this context is the fact that the two weight matrices are dynamically linked together. As was shown before, the network will output the corresponding letter given a face, but which face should the network output for a given letter? In this case, going from the letter layer to the picture layer implies a one-to-many association. Without an architecture modification that allows contextual encoding [7], the network will not be able to output the correct pictures. Rather, the network will average all the people's emotional expressions. Therefore, as shown in Figure 16, the network is able to extract the features in the images that make each emotion distinct.

5. Model Extension to Temporal Associative Memory

Until now, the associations between the different stimuli have been static. In some situations, the network can also perform multistep pattern recognition [43]. Such is the case when the output is a time series. To allow such encoding, the network's architecture must be modified.

5.1. Architecture Modification

Figure 17 shows that, instead of having two heteroassociative networks connected together as in the Okajima et al. model [44], the network is composed of a heteroassociative and an autoassociative layer. The heteroassociative layer is used to map a given pattern to the next layer, whereas the autoassociative layer acts like a time delay circuit that feeds back the initial input to the heteroassociative layer. With this delay, it becomes possible for the overall network to learn temporal pattern sequences. Figure 17 illustrates the difference between a temporal associative memory and a bidirectional associative memory.

The initial input pattern is fed to the heteroassociative part, and its output feeds the autoassociative part. The latter yields an output that is fed back into the heteroassociative part as well as into the autoassociative part itself. Thus, the autoassociative part serves as a context unit for the heteroassociative part. These two networks work in synergy to allow the necessary flow of information for, as our simulations will show, multistep pattern recall while still being able to filter noise out of the inputs.

5.2. Simplification of the Learning Function

In the case of autoassociative learning, the function expressed by (3.5) can be simplified by using the fact that the weight matrix is then square and symmetric. The symmetry property has the effect of canceling the cross terms in (3.5), and the autoassociative learning function becomes

$$\mathbf{W}(k+1) = \mathbf{W}(k) + \eta\left(\mathbf{x}(0)\,\mathbf{x}(0)^{T} - \mathbf{x}(t)\,\mathbf{x}(t)^{T}\right), \tag{5.1}$$

where $\mathbf{W}$ represents the connection weight matrix. The learning rule is then the sum of a positive and a negative Hebbian term [7, 38].
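A minimal sketch of this simplified autoassociative rule follows; the pattern, η, δ, and the single recall iteration per trial are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the autoassociative rule: a positive Hebbian term on the
# initial state minus a negative one on the state after iteration.
def transmit(W, x, delta=0.1):
    a = W @ x
    return (delta + 1) * a - delta * a**3

x0 = np.array([1.0, -1.0, 1.0, -1.0])
W = np.zeros((x0.size, x0.size))
eta = 0.01
for _ in range(500):
    x_t = transmit(W, x0)                              # one pass through the layer
    W += eta * (np.outer(x0, x0) - np.outer(x_t, x_t)) # update per (5.1)
```

Both outer products are symmetric, so the weight matrix remains symmetric throughout learning, as required by the simplification above.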

5.3. Simulation 1: Limit Cycle Behavior

This simulation illustrates how the network can perform temporal association. The task consists of learning different planar rotations of the same object.

5.3.1. Methodology

Five different pattern sequences must be learned. Each sequence is composed of the same image under 45-degree planar rotations. Table 7 illustrates the five pattern sequences. The network has to associate each pattern of a sequence with the following one. In other words, the network will associate the 45° image with the 90° one, the 90° with the 135°, the 135° with the 180°, the 180° with the 225°, the 225° with the 270°, the 270° with the 315°, the 315° with the 0°, and the 0° with the 45° image. Therefore, the steady state of the network should be a limit cycle of period 8. Each binary stimulus was placed on a pixel grid, where white and black pixels were assigned −1 and 1 values, respectively. The free parameters η and δ were set in accordance with the specifications given in Chartier et al. [8].

5.3.2. Results

The network took about 2000 learning trials to converge and was able to correctly learn the 5 pattern sequences. It was also able to operate with noisy inputs and perform some generalization. For instance, Table 8 illustrates how the network was able to remove noise from the "fish" image as the temporal sequence was recalled. Table 9 shows that the network can generalize its learning to similar objects. In this case, a new "butterfly" picture was used as the initial input. As the stimulus iterated through the network, the network recalled the original "butterfly" depicted in Table 7 (the butterfly output at step 8 is different from the new one presented at step 0). Finally, Table 10 shows that the network can correctly output the image sequence under small planar rotation variations of the initial input image. In this particular example, the new "dinosaur" picture represents a 275° rotation (270° + 5°). Although that particular rotation was not part of the initial sequence set, the network was able to correctly recall the appropriate picture sequence.

5.4. Simulation 2: Fixed-Point Behavior

Again, this simulation illustrates how the network can perform temporal association. The task is to learn different planar rotations of the same object. Contrary to the previous simulation, once the network has gone through every planar rotation, it converges to a fixed-point.

5.4.1. Methodology

Five different pattern sequences must be learned. Each sequence is composed of the same image under 45° planar rotations. Table 11 illustrates the five pattern sequences. The network must associate each pattern of a sequence with the following one. In other words, the network will associate the 45° image with the 90° one, the 90° with the 135°, the 135° with the 180°, the 180° with the 225°, the 225° with the 270°, the 270° with the 315°, the 315° with the 0°, and the 0° image with itself. This last association creates a fixed-point attractor that can then be used for other types of association in more complex architectures, as will be shown in the next section. Each binary stimulus was placed on a pixel grid, where white and black pixels were assigned −1 and 1 values, respectively. The free parameters η and δ were set in accordance with the specifications given in [8].

5.4.2. Results

The network took about 2000 learning trials before convergence occurred, and it was able to correctly learn the 5 pattern sequences. The same examples of noise tolerance and generalization were used to compare the network's performance between the limit cycle and the fixed-point conditions. Table 12 shows that the network can eliminate noise as the sequence is recalled. Table 13 shows learning generalization to other similar objects. After 9 time steps, the network recalled the "butterfly" that is depicted in Table 11. Finally, Table 14 shows that the network can also output the correct image sequence under small initial variations of planar rotation. In this particular example, the "dinosaur" picture represents a 275° rotation, as in the previous section.

6. Multidirectional Associative Memories

Sometimes associations must be generalized to more than two directions. To deal with multiple associations, Hagiwara [45] proposed the Multidirectional Associative Memory (MAM). The architecture was later extended to deal with temporal sequences [46]. This section shows some properties of a MAM that is composed of one temporal heteroassociative memory and one bidirectional associative memory. As in the previous simulations, only the network topology is modified, as Figure 18 shows; the learning and transmission functions remain the same. In this simulation, the information received by the y-layer at time $t$ consists of contributions from both the x-layer and the z-layer. The feedback sent by the z-layer is useful only once the network reaches its steady state, the last pattern of the stimulus set. Therefore, to minimize its effect during the temporal recall, the feedback value of the z-layer is kept lower than that of the x-layer. This is formally expressed by

$$\mathbf{y}(t+1) = f\big(\mathbf{W}_{1}\mathbf{x}(t) + \gamma\,\mathbf{W}_{2}\mathbf{z}(t)\big), \qquad 0 < \gamma < 1, \tag{6.1}$$

where $f$ is the transmission function (3.2), $\gamma$ is the z-layer feedback parameter, and $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ represent the weight connections linking the x-layer and the z-layer to the y-layer, respectively. Taken together, $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ form the $\mathbf{W}$ weight matrix.
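The combined y-layer update might be sketched as follows, where W1, W2, gamma, and delta are illustrative placeholders for the quantities just described rather than the authors' actual implementation.

```python
import numpy as np

# Minimal sketch of the y-layer update in the MAM setting: the y-layer combines
# the x-layer input with an attenuated z-layer feedback before transmission.
def transmit(a, delta=0.1):
    return (delta + 1) * a - delta * a**3

def mam_y_update(W1, W2, x, z, gamma=0.2, delta=0.1):
    a = W1 @ x + gamma * (W2 @ z)      # attenuated feedback from the z-layer
    return transmit(a, delta)
```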

6.1. Methodology

The simulation performs the multistep pattern recognition described in the previous section in combination with a standard one-to-one association. The one-to-one association is performed only when the network is at a given fixed-point. In other words, the association will occur only when the output is at a 0° planar rotation for the recalled sequence (Table 11). The association is made between the given picture and the first letter of its name, as illustrated in Figure 19. The learning (η), transmission function (δ), and z-layer feedback parameters were set to values satisfying the constraints described above, with the z-layer feedback kept low.

6.2. Results

Of course, if the temporal or bidirectional associative sections are independently activated, the resulting network performance will be the same as in the previous sections. What matters in this case is the output state of the network as a whole. The best case is when a pattern that is close to the fixed-point behavior (315°) is presented to the network; the effect of the feedback coming from the z-layer will then be limited. At the opposite extreme, the worst case is when a pattern with a 45° planar rotation is presented. In this case, 8 time steps are needed before the output vector reaches its fixed-point. From this point on, the picture identification process can occur. If the feedback coming from the z-layer is too strong, it could deflect the output vector's trajectory. This could make the network converge to the wrong fixed-point and lead to an incorrect name tag. Because of this, the feedback value of the z-layer must be kept low.

Table 15 shows an example of the "runner" given an initial 45° planar rotation. After one time step, the network correctly outputs the 90° picture (Out 1). Of course, the name tag is incorrect (Out 2). The new input will then correspond to the output of the picture layer (Out 1) combined with the feedback given by the z-layer (Out 2, not shown); the overall effect, Out 1 + Out 2, is used as the new input. The process is repeated until convergence. After 9 time steps, Table 15 shows that the correct 0° "runner" picture is output, as well as the correct name tag letter "R".

7. Conclusion

Several interesting properties of a BAM architecture have been shown. First, it was shown that simple time-difference Hebbian learning can perform one-to-one and many-to-one associations with both binary and real-valued patterns. In addition, the BAM can be modified in such a way that both heteroassociation and autoassociation are accomplished within the same architecture. In this case, the autoassociative part acts like a time delay, and the overall network can be used for temporal association. A simple case of one-step delay was shown. However, by adding l extra layers of autoassociation, the network can easily be modified to handle l-step delays. Finally, if a combination of temporal and bidirectional associations is grouped into a multidirectional associative memory, more complex behaviors can be obtained.

In all cases, the same learning and transmission functions are used; the only modification concerns the network topology. This property gives the network high internal consistency. More complex architectures are possible, but a truly powerful implementation improvement would be to develop an algorithm that guides the architecture's growth based on the problem to be solved. The architecture could then be modified as a function of the task to be performed and the desired behaviors, using BAMs as building blocks.

Acknowledgment

This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.