Abstract

Episodic memory, the memory of personal events and history, is essential for understanding the mechanisms of human intelligence. Neuroscience evidence has shown that the hippocampus, a part of the limbic system, plays an important role in the encoding and retrieval of episodic memory. This paper reviews computational models of the hippocampus and introduces our own computational model of human episodic memory based on neural synchronization. Results from computer simulations demonstrate that our model provides advantages for instantaneous memory formation and for selective retrieval that enables memory search. Moreover, the model was found to be able to predict human memory recall when it integrates human eye movement data recorded during encoding. Such a combined approach of computational modeling and experiment is effective for theorizing about human episodic memory.

1. Introduction

In 1982, Marr [1] argued for the importance of computational theory in understanding information processing in the brain and presented “three levels at which any machine carrying out an information-processing task must be understood (p. 25)” as follows.

(i) Computational theory. What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?
(ii) Representation and algorithm. How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation?
(iii) Hardware implementation. How can the representation and algorithm be realized physically?

As an example, consider the brain function of “associative memory of visual stimuli A and B.” At the level of computational theory, it is asked what relationship between stimuli A and B results in the memory; for example, the correlation coefficient of the presentation sequences of A and B will indicate the strength of association between A and B. At the level of representation and algorithm, each visual stimulus can be represented by an N-dimensional binary vector pattern, where the overlap between stimuli A and B is an important parameter for their association. The correlation of the vector patterns can be represented by a matrix whose element w_ij denotes the connection strength between the ith and jth units, and the matrix is formed by the Hebb rule through repeated presentation of the stimuli. At the level of hardware implementation, it is asked what neuronal activations and dynamics implement the above algorithm; for example, neuronal synchronization dynamics might play an important role in the synaptic plasticity underlying the Hebb rule. These three levels of understanding can be considered separately, yet all of them are necessary for a complete understanding of the function of visual associative memory.
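As a purely illustrative sketch of the representation-and-algorithm level, the following Python fragment forms a Hebbian weight matrix from repeated co-presentation of two binary stimulus vectors; the dimension, learning rate, and number of presentations are arbitrary choices for the example and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                          # dimension of the binary stimulus vectors (arbitrary)
eta = 0.1                        # learning rate (arbitrary)
stim_A = rng.integers(0, 2, N)   # stimulus A as an N-dimensional binary pattern
stim_B = rng.integers(0, 2, N)   # stimulus B as an N-dimensional binary pattern

# Hebb rule: repeated co-presentation of A and B strengthens connections
# between units that are active at the same time.
W = np.zeros((N, N))
for _ in range(20):                        # repetitive presentation of the paired stimuli
    x = np.clip(stim_A + stim_B, 0, 1)     # joint activation during one presentation
    W += eta * np.outer(x, x)              # w_ij grows when units i and j co-fire
np.fill_diagonal(W, 0.0)

# The overlap between A and B (a computational-theory-level quantity)
# is reflected in the learned connection strengths.
print("overlap(A, B) =", int(stim_A @ stim_B))
print("mean weight between A-active and B-active units =",
      W[np.ix_(stim_A == 1, stim_B == 1)].mean())
```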

In the case of the memory function, the main problem is how to theorize it: a simple record-and-playback scenario is not sufficient, and the difficulty lies at the level of computational theory. For example, how does the brain organize experiences into memories that can be applied to novel situations? Models in artificial intelligence focus on the level of computational theory, whereas models in neuroscience further address the representation-and-algorithm level and the implementation level. The final goal is a computational theory of the memory function that is common to artificial intelligence models and neuroscience models, while the neuroscience models have the advantage of theorizing the memory function in cooperation with experimental evidence.

This paper reviews computational models of human episodic memory, which is associated with personal history and the contextual information of the environment. Section 2 summarizes the functional aspects of episodic memory and the contribution of the hippocampus to this memory. Section 3 reviews computational models of the hippocampus. Sections 4 and 5 describe our computational model of human episodic memory and its application to the simulation of human memory using eye movement data. Section 6 summarizes the paper and provides future directions.

2. What Is Episodic Memory?

2.1. Episodic Memory in the Hippocampus

Patient H.M., who had bilateral hippocampal damage [2], clearly demonstrated the significant role of the hippocampus in the formation of new memories. H.M. had a normal IQ, normal language skills, and intact procedural memory, yet had great difficulty recognizing the current location and time (e.g., events in which H.M. himself had taken part several minutes earlier). This kind of memory is categorized as “episodic memory” [3] and is known to be maintained by the hippocampus. Even when damage to the hippocampus occurs in childhood, patients show difficulty in the formation and maintenance of episodic memory [4]. This is one of the reasons why the hippocampus is considered an essential structure for maintaining episodic memory.

In 1983, Tulving [5] proposed that episodic memory can be modeled as an association of “what,” “where,” and “when” information. In relation to this proposal, a simplified version of the episodic memory model, the object-place association model, is often used in experiments with humans [6–9], monkeys [10, 11], and rats [12]. In a typical task, participants are asked to remember the identities and locations of objects on a table during a short period. After a short delay, the participants are asked to retrieve the identities of the objects and reconstruct their arrangement. Patients with hippocampal damage have great difficulty in performing such tasks [6–9]. This evidence suggests that the hippocampus uses the object-place representation as part of episodic memory.

Anatomically, the hippocampus is known to receive convergent projections of object and space information through the parahippocampal region [13] (Figure 1). The object information originates in the parvocellular system, which carries color information; it then forms the ventral visual pathway, converges on the perirhinal cortex in the parahippocampal region, and enters the hippocampus. The space information originates in the magnocellular system, which covers a wide visual field; it then forms the dorsal visual pathway, converges on the parahippocampal cortex, and enters the hippocampus. This anatomical structure is consistent with the object-place memory of the hippocampus, so the object-place memory paradigm is a good tool for evaluating the neural mechanism of episodic memory in the hippocampus.

2.2. Neural Dynamics of the Hippocampus

The hippocampus is part of the limbic system and is characterized by a closed-loop circuit [14] (Figure 1). Cortical input enters through the superficial layer of the entorhinal cortex, is transmitted sequentially to the dentate gyrus, the CA3 region, and the CA1 region, and returns to the deep layer of the entorhinal cortex. The hippocampus has long been considered to implement an associative memory [15], and the CA3 region, with its massive recurrent connections, is considered the major network for hippocampal memory [16]. These structures are similar in the rodent and primate hippocampus, so a common principle of memory function is expected [17].

In the CA1 and CA3 regions of the rat hippocampus, many neurons show selective activation when the animal passes through a specific portion of the environment [18]. Such neurons are called “place cells” and have also been found in monkeys [19] and humans [20]. The hippocampus is thought to represent a map of the environment, the “cognitive map,” and the place cells are therefore considered a neuronal basis of the cognitive map [18]. In monkeys, another type of neuronal selectivity has also been reported: “view cells,” which encode the spatial location at which the monkey is looking in the environment [10]. Interestingly, their activation is not determined by a specific visual feature; it is therefore considered to result from the integration of information about body motion, head direction, and self-location. Both place cells and view cells are considered to contribute to spatial navigation in the environment.

In the rat hippocampus, a 4–12 Hz (theta-band) oscillation of the local field potential (LFP) appears prominently while the animal moves through the environment, and place cell firing is known to be synchronized with the LFP theta rhythm [21]. Moreover, the firing phase within the LFP theta cycle gradually advances as the rat passes through the cell’s place field [22]. This phenomenon is called “theta phase precession.” The firings of different place cells have different phases according to when their place fields were entered, which results in sequential place cell firing within a theta cycle that represents a temporally compressed sequence of place field activations [23]. More importantly, the time differences of the sequential firings agree with the asymmetric time window of a modified Hebb rule [24, 25]. The firing pattern of theta phase precession is therefore expected to contribute to the formation of the cognitive map in the hippocampus.
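To make the relationship between temporal compression and the asymmetric time window concrete, the sketch below compares the weight change produced by a behavioral-time-scale interval with that produced by the compressed interval within a theta cycle; the exponential window shape and its time constants are illustrative assumptions rather than parameters from [24, 25].

```python
import numpy as np

def asymmetric_hebb_window(dt_ms, a_plus=1.0, a_minus=0.5, tau_ms=20.0):
    """Weight change for a spike-time difference dt = t_post - t_pre.
    Positive dt (pre fires first) potentiates; negative dt depresses.
    The shape and constants are assumptions for illustration."""
    dt = np.asarray(dt_ms, dtype=float)
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau_ms),
                    -a_minus * np.exp(dt / tau_ms))

# Two place cells whose fields are entered seconds apart in real time, but whose
# firings are compressed to tens of milliseconds within one theta cycle.
behavioral_dt = 2000.0   # ms between place-field entries at the behavioral time scale
compressed_dt = 30.0     # ms between the same cells' spikes within a theta cycle

print("dw at behavioral time scale:", asymmetric_hebb_window(behavioral_dt))
print("dw within a theta cycle   :", asymmetric_hebb_window(compressed_dt))
# Only the compressed time difference falls inside the asymmetric window,
# producing a directional (cell A -> cell B) weight increase.
```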

3. Computational Models of the Hippocampus

In this section, we review models of the hippocampus using a classification based on the overlap of input patterns and the asymmetry of the CA3 connection weights (Table 1). The CA3 region has been considered the major region for maintaining hippocampal memory [16], so this classification is applicable to many models of the hippocampus. Although each model has its own advantages for specific problems, and the use of the CA3 network depends strongly on the dynamics of the units and of adjacent systems, the classification is useful for surveying the function and dynamics of hippocampus models.

The CA3 region is regarded as a center of the memory function and is modeled as an associative network [26–29]. In an associative network, multiple vector patterns can be stored in the CA3 connection weights, and one of the patterns can be recalled through mutual activation among units. Memory encoding is implemented by the Hebb rule, in which the connections between simultaneously activated units are strengthened. Recall is implemented by mutual unit activation through the connections, whereby a stored vector pattern can be self-organized and completed from an initial activation of part of the pattern. The performance of pattern completion improves when the overlap between arbitrary pairs of vector patterns is small and random. In agreement with this model, an experimental study in rats demonstrated that the CA3 region is essential for pattern completion [30].
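The following sketch illustrates this storage and pattern-completion behavior with a simple autoassociative network of binary units; the network size, sparsity, and the fixed-activity recall rule are assumptions for illustration and do not reproduce any specific CA3 model.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, sparsity = 200, 5, 0.1           # units, stored patterns, fraction active (assumed)

# Store P sparse random binary patterns with a covariance-style Hebb rule.
patterns = (rng.random((P, N)) < sparsity).astype(float)
W = np.zeros((N, N))
for p in patterns:
    W += np.outer(p - sparsity, p - sparsity)
np.fill_diagonal(W, 0.0)

# Recall: start from a degraded cue (half of the active units of pattern 0 removed)
# and complete it through mutual activation at a fixed activity level.
cue = patterns[0].copy()
active = np.flatnonzero(cue)
cue[active[: len(active) // 2]] = 0.0

state = cue
k = int(sparsity * N)                  # keep the k most excited units active
for _ in range(10):
    h = W @ state
    state = np.zeros(N)
    state[np.argsort(h)[-k:]] = 1.0

overlap = state @ patterns[0] / patterns[0].sum()
print(f"overlap with the stored pattern after completion: {overlap:.2f}")
```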

In the above associative network the connection weights are symmetric, but they become asymmetric under the Hebb rule with an asymmetric time window [25]. Models with asymmetric CA3 connections have shown that a sequence of vector patterns can be stored and recalled through mutual unit activation [31, 32]. Importantly, these models can deal with temporal information by means of the asymmetric connections. Moreover, the temporal compression produced by phase precession has been demonstrated to be advantageous for sequence memory formation [33–35].
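A minimal sketch of sequence storage with asymmetric connections is given below: each pattern is heteroassociated with its successor, and cueing the first pattern replays the stored order. The sizes and the one-step winner-take-all recall rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, k = 200, 6, 20                   # units, sequence length, active units per pattern (assumed)

# A sequence of sparse binary patterns x_1 ... x_L.
seq = np.zeros((L, N))
for t in range(L):
    seq[t, rng.choice(N, k, replace=False)] = 1.0

# Asymmetric Hebb rule: associate each pattern with its successor,
# as an asymmetric time window would do for temporally ordered firing.
W = np.zeros((N, N))
for t in range(L - 1):
    W += np.outer(seq[t + 1], seq[t])      # links pattern t (pre) to pattern t+1 (post)

# Recall: cue with the first pattern and let the asymmetric weights
# drive the activity forward through the stored order.
state = seq[0]
for t in range(1, L):
    h = W @ state
    state = np.zeros(N)
    state[np.argsort(h)[-k:]] = 1.0
    print(f"step {t}: overlap with stored pattern {t} = {state @ seq[t] / k:.2f}")
```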

In cognitive map theory [18], the map of the environment is represented by a network of place cells whose population activity gradually changes as the rat passes through the environment. Such neuronal activation has been modeled by a “continuous attractor network” [36], in which the overlap of the positional input vectors is given by a function of spatial geometry (e.g., input vectors of neighboring positions have a large overlap and input vectors of distant positions have a small overlap) and the CA3 connections are symmetric. This model demonstrated that the population activity of place cells representing a location in the environment can self-organize from an initial state of random unit activation. When asymmetric connections are introduced into the CA3 network, the models further show an ability for spatial navigation, where sequential activation toward a goal location is evoked from an arbitrary location in the environment [37–40].
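The continuous-attractor idea can be illustrated with a ring of place-like units whose symmetric, distance-dependent connections, combined with a fixed activity level standing in for global inhibition, let a localized activity bump self-organize from random initial activation; all sizes and parameters below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 120, 15                         # ring of place-like units; k active units (assumed)
pos = np.arange(N) * 2 * np.pi / N     # preferred positions on the ring

# Symmetric, distance-dependent connections: nearby units excite each other most.
d = np.angle(np.exp(1j * (pos[:, None] - pos[None, :])))   # circular distance
W = np.exp(-(d ** 2) / (2 * 0.3 ** 2))

# Start from random unit activation and let recurrent excitation plus a fixed
# activity level (a crude stand-in for global inhibition) self-organize a bump.
state = np.zeros(N)
state[rng.choice(N, k, replace=False)] = 1.0
for _ in range(30):
    h = W @ state
    state = np.zeros(N)
    state[np.argsort(h)[-k:]] = 1.0

active = np.sort(np.flatnonzero(state))
# Typically a contiguous block of units: a bump that could sit anywhere on the ring.
print("active units:", active)
```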

A combination of discrete and continuous input vectors has also been used to represent an environment containing objects. Rolls et al. [41] proposed a unified network of discrete and continuous attractors, in which both kinds of patterns are associated through symmetric connections that implement pattern completion over both discrete and continuous patterns. Byrne et al. [42] proposed a network model including the medial temporal system and the parietal system, in which the CA3 region represents both object and space information; interestingly, this model implements the mental imagery of navigation by integrating movement signals into the appropriate population activation of place cells. The present authors proposed a model of a cognitive map for object-place associations [43]. The input overlap is similar to that of the above models, while asymmetric connections according to phase precession are introduced. The model can store multiple object-place associations in a hierarchical network structure whose asymmetric connections represent inclusion relationships among visual features. In such a structure, a set of object-place associations can be recalled sequentially.

Let us consider the relationship between these models and episodic memory. According to Tulving’s proposal [5], episodic memory is modeled as an association of “what,” “where,” and “when” information. The models with asymmetric connections, whether with discrete or continuous input, can deal with “when” information. On the other hand, the models with combined discrete-continuous input can deal with the “what-where” associations that are often used as an experimental model of episodic memory in animals [10–12]. For a comprehensive understanding of episodic memory, it appears necessary to investigate the integration of “when” with “what-where” in future studies. Here the dynamics of phase precession is a strong candidate for integrating “what-where-when,” because it has already been demonstrated to encode both “when” and “what-where” information. In the next section, we review a model of “what-where” association based on theta phase precession.

4. A Computational Model of the Episodic Memory Based on Neural Synchronization

In this section, our computational model of episodic memory based on the neural synchronization of phase precession [43] is reviewed.

4.1. Representation of Object and Scene Information

Figure 2(a) shows the information flow of the model, which follows experimental proposals [13, 17]. Retinal information produces two visual pathways that converge on the parahippocampal region: the perirhinal cortex receives object information through the ventral visual pathway, and the parahippocampal cortex receives space information through the dorsal visual pathway. Subsequently, the object and space information converge on the hippocampus, which stores object-place associations in the connection weights of the CA3 network.

In the model, a one-dimensional environment is assumed: a grayscale pattern containing multiple objects with different colors (Figure 2(b)). The object information is represented by the color features at the center of the visual field, which produces a discrete vector pattern. The scene information is represented by the spatial frequency components of an object-centered grayscale pattern within a 120-degree-wide visual field, which produces a spatially continuous vector pattern. In this representation, the scene information plays an essential role in binding multiple object-place associations in an environment; that is, the overlap between scene representations works as a tag for combining two scenes, and their relative orientation and distance can be obtained by calculating the shift between the two visual patterns (Figure 2(c)).
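The binding role of scene overlap can be illustrated by recovering the relative shift between two overlapping one-dimensional views by cross-correlation; the random panorama, the one-degree sampling, and the use of raw grayscale windows instead of spatial frequency components are simplifications introduced only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

# A 360-degree, one-dimensional grayscale panorama at 1-degree resolution (toy data).
panorama = rng.random(360)

def view(center_deg, width_deg=120):
    """Grayscale pattern inside a 120-degree window centered on a fixation."""
    idx = (np.arange(-width_deg // 2, width_deg // 2) + center_deg) % 360
    return panorama[idx]

scene_a = view(100)      # view while fixating one object
scene_b = view(130)      # view while fixating another object, 30 degrees away

# Because the two views overlap, their relative orientation can be recovered by
# finding the shift that maximizes their correlation.
shifts = np.arange(-60, 61)
corr = [np.corrcoef(scene_a[max(0, s):120 + min(0, s)],
                    scene_b[max(0, -s):120 - max(0, s)])[0, 1] for s in shifts]
print("estimated shift (deg):", shifts[int(np.argmax(corr))])   # 30, the true separation
```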

Multiple object-place associations are encoded through a sequence of “saccadic” eye movements in which one of the objects is successively brought to the center of the visual field. Since the size of a saccade is typically less than 10 degrees [44], the scene vector pattern has a large overlap with the subsequent scene vector, whereas the object vector pattern changes drastically at each saccade. Thus, the eye movements produce a sequence consisting of discrete and continuous vector components. It should be noted that the eye movement sequence is assumed to catch the objects “randomly”; it is not like scan path theory [45], in which a stereotyped eye movement is repeated when viewing a picture.
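The sketch below caricatures such an input sequence: random fixations on a few nearby objects yield object features that change abruptly between fixations, while the 120-degree scene windows of successive fixations continue to overlap strongly. The object positions and color codes are toy values, not stimuli from the model.

```python
import numpy as np

rng = np.random.default_rng(5)

# Object identities, angular positions (deg), and discrete color features (toy values).
objects = {"A": 40, "B": 44, "C": 49}
colors = {"A": np.array([1, 0, 0]),
          "B": np.array([0, 1, 0]),
          "C": np.array([0, 0, 1])}

def scene_window(center_deg, width=120):
    """Set of panorama degrees visible in a 120-degree field centered on a fixation."""
    return set(((np.arange(-width // 2, width // 2) + center_deg) % 360).tolist())

# A "random" saccade sequence over the objects (not a stereotyped scan path).
fixations = [rng.choice(list(objects)) for _ in range(6)]

for prev, cur in zip(fixations, fixations[1:]):
    obj_overlap = float(colors[prev] @ colors[cur])          # changes abruptly across objects
    shared = scene_window(objects[prev]) & scene_window(objects[cur])
    scene_overlap = len(shared) / 120                        # stays large for small saccades
    print(f"{prev}->{cur}: object overlap {obj_overlap:.0f}, scene overlap {scene_overlap:.2f}")
```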

4.2. Memory Encoding Based on Neural Synchronization

The visual input sequence of object and scene information is stored in the CA3 connection weights by means of theta phase precession, which has computational advantages for encoding sequences [34, 35], spatiotemporal patterns [46], and maps of the environment [40].

In the model, the visual input sequence is translated into a phase precession pattern in the entorhinal cortex: each neural unit shows oscillatory activity in response to an excitatory visual input, and its oscillation frequency is assumed to increase gradually while the excitatory input persists [34]. Phase-locking dynamics between the units’ oscillations and a global theta rhythm then produces a gradual phase advancement of each unit’s firing within the theta cycle. The CA3 region receives this pattern, which is stored in the CA3 connection weights according to the Hebb rule with an asymmetric time window. The connections between object and scene units that are activated simultaneously at each eye fixation are strengthened, but the phase precession adds a further effect: units activated earlier and persistently have advanced to earlier phases of the theta cycle, whereas intermittently activated units fire only at later phases. The modified Hebb rule with an asymmetric time window therefore forms asymmetric connections from the persistently activated units to the intermittently activated units (Figure 2(d)).
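The sketch below caricatures this encoding step: activation duration is mapped to a firing phase within the theta cycle, and an asymmetric window then strengthens only the connections from earlier-firing (persistently activated) units to later-firing (intermittently activated) units. The phase mapping, the window shape, and its time constant are illustrative assumptions and not the equations of the published model [34, 43].

```python
import numpy as np

units = ["scene1", "scene2", "objA", "objB"]
duration_ms = np.array([800.0, 600.0, 150.0, 120.0])   # how long each unit has been driven (toy)

theta_ms = 125.0                                       # ~8 Hz theta cycle
# Longer activation -> phase advanced toward the start of the cycle (toy mapping).
phase_ms = theta_ms * (1.0 - np.clip(duration_ms / 1000.0, 0, 1))

def dw(pre_phase, post_phase, tau=40.0):
    """Asymmetric Hebb window: potentiate only when the presynaptic unit
    fires before the postsynaptic unit within the cycle (assumed shape)."""
    dt = post_phase - pre_phase
    return np.exp(-dt / tau) if dt > 0 else 0.0

W = np.zeros((len(units), len(units)))                 # W[post, pre]
for i in range(len(units)):
    for j in range(len(units)):
        if i != j:
            W[i, j] = dw(phase_ms[j], phase_ms[i])

print("firing phase within the cycle (ms):", dict(zip(units, np.round(phase_ms, 1))))
print("scene1 -> objA weight:", round(W[2, 0], 3))     # persistent -> intermittent: strengthened
print("objA -> scene1 weight:", round(W[0, 2], 3))     # reverse direction: not strengthened
```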

The activation duration of each unit varies with the eye movement sequence, but on average the larger overlap of scene input vectors produces longer activation of scene units than of object units. The resulting network therefore comes to include unidirectional connections from scene units to object units, even for a random eye movement sequence. Together with the symmetric autoassociative connections, the network is characterized by a layered structure of symmetric connections within layers and asymmetric connections from scene units to object units between layers (Figure 2(e)). We refer to this structure as a hierarchical cognitive map for object-place associations [43]. Interestingly, the network represents “inclusion” relationships of visual features at multiple spatial scales, and it can be organized during an encoding period of several seconds. This structure is expected to contribute to an efficient memory storage of the global environment, as demonstrated in psychological studies of human cognitive maps [47, 48].

4.3. Memory Retrieval

The hierarchical cognitive map of object-place associations has several advantages for memory retrieval. When the CA3 units of one object-scene association are activated as an initial cue, the units of other associations are automatically activated through the CA3 recurrent connections. Since the network is organized asymmetrically from the top to the bottom layers, the activity propagation likewise proceeds from the top to the bottom layers (Figure 2(f)). During this propagation, the asymmetric connections between object and scene units support the synchronized activation of corresponding object and scene units. A set of object-place associations is then recalled, with the individual object-place associations appearing in sequence [43]. Such activation of multiple associations from a single cue is an advantage of the hierarchical network.

When a small part of the hierarchical network is activated as an initial cue, an interesting retrieval pattern appears [49]. Under a global inhibition of the network, initial activation of units in the top layer results in a sequential retrieval of all object-place associations. In contrast, activation of units in a middle layer results in a constrained retrieval, in which only the associations that include the visual feature of the initial cue are activated (Figure 2(g)). Such selective retrieval relates to memory search, where coarse scene information can evoke a set of possible object-place associations. Because the asymmetric connections in the network represent inclusion relationships among visual features, an initial cue consisting of any partial feature is expected to evoke the set of object-place associations that contain it. This property is important for understanding the memory search mechanism of the hippocampus, which maintains a large memory content.
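A toy sketch of this cue-dependent retrieval is given below: the hierarchical map is reduced to a small tree with asymmetric links from each scene node to its children, and recall spreads downward from the cue while recalled nodes are treated as inhibited. The tree, the node names, and the spreading rule are illustrative assumptions rather than the dynamics of the model in [49].

```python
# Toy hierarchical network: a coarse scene at the top, finer scenes below,
# and object-place associations at the leaves, with asymmetric links
# from each node to its children.
children = {
    "scene_top": ["scene_left", "scene_right"],
    "scene_left": ["objA@place1", "objB@place2"],
    "scene_right": ["objC@place3", "objD@place4"],
}

def retrieve(cue):
    """Sequential recall: repeatedly activate the children of recalled nodes;
    recalled nodes are then inhibited and not revisited."""
    recalled, frontier = [], [cue]
    while frontier:
        node = frontier.pop(0)
        recalled.append(node)
        frontier += children.get(node, [])
    return [n for n in recalled if "@" in n]   # report the object-place associations

print("cue at the top layer   :", retrieve("scene_top"))    # all associations, in sequence
print("cue at the middle layer:", retrieve("scene_left"))   # only associations under that cue
```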

4.4. Experimental Support for the Model

The model predicts a positive correlation among saccade rate, EEG theta power, and memory recall performance. We have evaluated this prediction by analyzing brain signals from human participants (see [50] for a review). In EEG measurements during object-place memory encoding, the 7.0 Hz EEG power and coherence at the central region were found to correlate significantly with subsequent successful recall [51]. The coherence between EEG theta power and saccade rate was also found to correlate with subsequent successful recall [52]. These results indicate that EEG theta-related neural dynamics plays an important role in memory encoding with eye movements.

Moreover, the results of simultaneous EEG-fMRI measurement showed that scalp EEG theta power during object-place memory encoding correlates with BOLD responses in the medial prefrontal, medial posterior, and right parahippocampal regions [53]. This result does not show a direct link between the hippocampus and theta dynamics, but it does suggest that the medial temporal memory system, consisting of the hippocampus and the parahippocampal region, uses theta dynamics for memory encoding.

5. Simulation of the Episodic Memory Based on the Computational Model

It has been shown that the recall performance for object-place associations can be predicted by either EEG theta power [43] or BOLD responses [55] during encoding. This fact leads to the prediction that a computational model integrating such experimental data could predict subsequent recall well. At the same time, this provides a good validation of the computational model: if the brain really uses the dynamics of the model, then the model should have this predictability; otherwise the model should be rejected.

This section reviews an application of the computational model of object-place associations to the prediction of human recall from eye movement data recorded during encoding [54, 56]. In the analysis, the eye movement data of our previous report [51] were used, consisting of 350 trials of object-place memory encoding from eleven participants. In the task, the participants were asked to remember the identities and locations of four objects in a grid for 8 seconds. After a 10-second delay period, they were asked to reconstruct the arrangement of the objects with a mouse on the display; the delay period contained a secondary task of randomly targeted saccades to inhibit memory rehearsal (these data were also used to calibrate the eye cameras). Both temporal parameters affect the participants’ correct recall rate, and they were chosen so that the rate was around 50%. Each participant performed 30 trials of the encoding task. Trials in which no eye movement was recorded were discarded from the analysis. The inter-object saccade interval of the remaining data was in the normal range (579.7 milliseconds), and almost all fixations fell on the objects.

To apply the model to the experimental data, its visual features were adapted to include the object shapes used in the experiment and multiscale receptive fields for the location of eye fixation. In the model, the visual input at a fixation location was represented by the activations of 9 object units and 36 scene units. Each sequence of eye movements was translated into a visual input sequence and stored in a CA3 connection matrix by using phase precession and the Hebb rule with an asymmetric time window; the resulting connection matrices therefore varied across trials even when the stimulus was identical (note that the eye movements were not stereotyped).

In the statistical procedure, the correlation coefficient between each predictor and human recall was calculated for each participant (Figure 3(a)), and the coefficients were averaged over participants. In order to evaluate the importance of the hierarchical structure for recall prediction, the following four computational predictors and three traditional experimental predictors were used. The computational predictors are (1) the connection weight sum, (2) the asymmetric connection weight sum, (3) the hierarchical connection weight sum, and (4) the computational recall evoked by an initial input to the top layer; here w_ij denotes the CA3 connection weight from the jth to the ith unit over which the sums are computed, and h_i denotes the hierarchy level of the ith unit, which enters the hierarchical connection weight sum. The experimental predictors are (5) the blink rate, (6) the saccade rate, and (7) the EEG 7 Hz power at the central region. The meaning of these predictors is discussed further with the results below.
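A sketch of the per-participant correlation procedure is shown below on synthetic stand-in data; the predictor values and recall outcomes are randomly generated for illustration only and do not reproduce the experimental data of [51].

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in data: for each participant, one predictor value and one binary
# recall score per encoding trial (synthetic values, not experimental data).
n_participants, n_trials = 11, 30
predictor = rng.random((n_participants, n_trials))
recall = (rng.random((n_participants, n_trials)) < 0.5 + 0.3 * (predictor - 0.5)).astype(float)

# Statistical procedure described above: compute the predictor-recall correlation
# within each participant, then average the coefficients across participants.
per_participant_r = np.array([np.corrcoef(predictor[s], recall[s])[0, 1]
                              for s in range(n_participants)])
print("mean correlation across participants:", round(per_participant_r.mean(), 3))
```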

The results are shown in Figure 3(b). Only three predictors, the hierarchical connection weight sum, the computational recall, and the EEG theta power, were found to correlate significantly with human recall. This indicates that the computational model fed with eye movement data has a predictability similar to that of the EEG theta power. On the other hand, the experimental predictors of blink rate and saccade rate did not show a significant correlation with human recall. These results indicate that the computational network extracted a memory-dependent component from the eye movements during encoding. Together with the absence of a significant correlation between the other computational predictors (the connection weight sum and the asymmetric connection weight sum) and human recall, the hierarchical structure itself is considered to be an important factor for predicting human recall.

From a computational point of view, the hierarchical connection weight sum increases when the eye movements evenly catch neighboring and distant pairs of objects with a saccade interval of more than 250 milliseconds. To evaluate this point experimentally, we tested further experimental predictors (e.g., the variance of the fixation duration on individual objects), but we have not found a suitable one (data not shown). It may be that more complicated memory-related components, such as the order of the fixated objects, are extracted by the model. These results suggest that the model dynamics exists in the human brain and works during object-place memory encoding and retrieval.

6. Summary and Future Directions

We have reviewed computational models of episodic memory in the hippocampus and a simulation of human episodic memory based on a computational model. The hippocampus has a clear functional role in episodic memory together with a beautiful anatomical organization; thus many models have been proposed. A computational model of the hippocampus based on the neural synchronization of phase precession [43] produces neural dynamics of episodic memory formation characterized by one-trial learning of multiple object-place associations and by selective retrieval realizing memory search. The model was further applied to experimental data analysis, where a neural network organized by human eye movement data was found to predict human object-place memory recall. This suggests that the model’s dynamics exists in the brain and works during memory encoding and retrieval. It also indicates the importance of bridging computational models and experimental studies for theorizing human episodic memory (Figure 4). In the following subsections, questions for future research are discussed.

6.1. Neural Mechanism of Memory Retrieval

Section 4.3 showed that retrieval in the computational network is constrained by the initial cue, but how the cue itself is defined remains a problem for understanding human episodic memory. The initial cue could be modeled by the context of a situation, such as task demands and intentional effort. Such context information is proposed to be represented in the prefrontal region [57, 58], and the framework of the computational model of the hippocampus is being extended to include the prefrontal and other regions. Simultaneous recordings from the prefrontal region and the hippocampus in rats have recently become possible [59], and such data will give insight into a new framework of episodic memory.

6.2. Representation of the Episodic Memory

The representation of episodic memory is still an important issue. In computational models of cognitive maps in rats, the representation of the cortical inputs to the hippocampus has been discussed. Hartley et al. [60] proposed boundary vector cells (BVCs) as components of the cortical input leading to place cell properties. De Araujo et al. [61] proposed an angular combination of visual cues and showed that the size of the visual field is critical for forming place and view cell properties in rats and primates. Among Marr’s three levels of understanding, the level of representation and algorithm is key to combining computational models and experiments. Although there are few computational proposals on the representation of human episodic memory, virtual maze experiments in humans [62, 63] may produce essential data for linking human experience and episodic memory. Moreover, such work will be a great step toward understanding human episodic memory outside the laboratory.

In addition to the above discussion, it is necessary to integrate “when” information with the “what-where” association models, as discussed in Section 3. In Section 4, we reviewed a computational model of object-scene association based on theta phase precession. The model has also been shown to encode and recall temporal sequences through asymmetric connections [34, 35], although it may require some balance in the use of the asymmetric connections for representing both “when” information and “what-where” associations. Further evaluation is necessary, in terms of both representation and dynamics, for a comprehensive understanding of episodic memory.

6.3. Computational Models-Experiments-Combined Approach

Computational models have been applied to the analysis of experimental fMRI data. Tanaka et al. [64] applied the temporal difference (TD) learning algorithm to BOLD signal analysis and detected a topographical map of the time scales of reward prediction. Anderson et al. [65] applied their information processing model to the analysis of BOLD responses and evaluated the functional roles of their regions of interest. As described in Section 5, our computational model of the hippocampus has been applied to the analysis of human eye movement data and has shown its ability to predict human memory recall. The model is also applicable to brain signal analysis, and its performance there is now under evaluation. These studies demonstrate the efficacy of computational model-based analyses for understanding system-level brain functions. Recently, methods of fMRI signal decoding have been developed that read the perceptual state of an observer [66]. The computational models mentioned here should contribute to such brain signal decoding and thereby to validating the existence of their dynamics within the brain.

Acknowledgment

This study was supported by the MEXT KAKENHI (20220003).