Abstract

In recent years, our research in computational neuroscience has focused on understanding how populations of neurons encode naturalistic stimuli. In particular, we focused on how populations of neurons use the time domain to encode sensory information. In this focused review, we summarize this recent work from our laboratory. We focus in particular on the mathematical methods that we developed for the quantification of how information is encoded by populations of neurons and on how we used these methods to investigate the encoding of complex naturalistic sounds in auditory cortex. We review how these methods revealed a complementary role of low frequency oscillations and millisecond precise spike patterns in encoding complex sounds and in making these representations robust to imprecise knowledge about the timing of the external stimulus. Further, we discuss challenges in extending this work to understand how large populations of neurons encode sensory information. Overall, this previous work provides analytical tools and conceptual understanding necessary to study the principles of how neural populations reflect sensory inputs and achieve a stable representation despite many uncertainties in the environment.

1. Introduction

Our sensory percepts and our interaction with the environment arise from neural representations of the external world. An important question is therefore how the characteristics of external events, such as sensory stimuli, are represented by patterns of neural activity in the brain. Answering this question amounts to determining the neural code [1–3], more formally defined as the smallest set of response patterns capable of encoding relevant stimulus parameters [4].

Two dimensions of neural representations are important for characterizing a neural code. The first is defined by space: sensory processing is based on spatially distributed populations of neurons, ranging from localized groups to populations of neurons spread across brain areas [5, 6]. The second dimension is defined by time: neuronal responses evolve over time, and the temporal structure of neural activity is often required to explain speeded reactions. Under most circumstances, neglecting the temporal dimension of neural activity results in a much impoverished representation of the sensory input [4, 7].

In this review, we focus on the recent work of our laboratory towards understanding the temporal dimension of neural codes in the auditory system. We first discuss our general mathematical approach, based on the principles of information theory, to evaluate the information content of different components of neural activity. We then discuss how this can be applied to neural data and to understand how auditory cortical neurons encode information about complex naturalistic sounds. In particular, we review our work showing that neural activity is patterned on multiple timescales carrying complementary information, ranging from millisecond precision spike patterns to slower oscillatory patterns.

The analysis of neural activity is a technically challenging problem. A typical analysis of the structure and information content of time-varying spike trains starts by aligning the spikes with sensory events (e.g., with the stimulus onset, or a reference point during the stimulus time course). Then, spike trains are partitioned into representative time intervals. These steps are necessary even for just plotting the data, and for any subsequent analysis, such as those attempting to decode the information carried by the temporal structure of the spike train. The experimenter conducts these procedures using measurements from a laboratory-based computer clock that registers stimuli and neural activity with very high accuracy. Likewise, if a decoding mechanism in the brain uses information encoded in temporally precise codes, it may be able to do so only after obtaining precise knowledge about the timing of sensory events and having access to a representation of time intervals with a reasonable degree of precision.

This raises a crucial question: how can the brain decode the information carried by the temporal variations of neural responses, given that the brain does not have access to the laboratory computer clock with its exact measures of time intervals and time of stimulus presentation [8–10]? We hence phrase the problem of interpreting and deciphering neural activity from a decoding perspective, that is, how a higher-level brain area (or an experimenter) can make best sense of the spiking activity observed in sensory cortices.

In this review, we focus on the problem of how decoders may extract information from spike times using different reference frames. We first describe relevant analytical approaches to address this problem and we then review recent studies investigating intrinsic reference frames derived from local network activity.

2. Information Theoretic Tools Available to Estimate and Compare Different Codes and Reference Frames

To study the role of spike timing in sensory decoding, it is necessary to have quantitative tools to assess the amount of information carried by different putative coding schemes. Shannon information, abbreviated hereafter as information, offers a rigorous measure to compute single-trial stimulus discriminability:

$$I(S;R) = \sum_{s,r} P(s,r)\,\log_2 \frac{P(s,r)}{P(s)\,P(r)}, \qquad (1)$$

where $P(s,r)$ is the joint probability of presenting a stimulus $s$ and observing a response $r$, $P(r)$ is the probability of observing the response $r$ across all stimuli, and $P(s)$ is the probability of each stimulus. Information quantifies the reduction of uncertainty (i.e., the gain in knowledge) about the stimuli obtained when the neural response of a single trial is observed (averaged over stimuli and responses). It is measured in bits, where one bit of information indicates that on average the uncertainty is reduced by a factor of two. Information provides an upper bound on the amount of knowledge about stimuli that can be extracted by any algorithm extracting knowledge from neural responses. The fact that mutual information quantifies single trial stimulus knowledge is particularly appealing because neural systems usually must discriminate or identify stimuli on a single encounter.
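As a minimal sketch of how the Shannon information defined above can be evaluated once the stimulus-response probabilities are known, the following Python function computes it from a joint probability table. The function name and the two toy tables are illustrative assumptions, not part of the original analysis.

```python
from math import log2

def mutual_information(joint):
    """Shannon information (bits) from a table joint[s][r] = P(s, r)."""
    p_s = [sum(row) for row in joint]        # marginal P(s)
    p_r = [sum(col) for col in zip(*joint)]  # marginal P(r)
    info = 0.0
    for s, row in enumerate(joint):
        for r, p_sr in enumerate(row):
            if p_sr > 0:  # terms with P(s, r) = 0 contribute nothing
                info += p_sr * log2(p_sr / (p_s[s] * p_r[r]))
    return info

# Two equiprobable stimuli, each always evoking its own response.
perfect = [[0.5, 0.0], [0.0, 0.5]]
# Two equiprobable stimuli whose responses are unrelated to the stimulus.
independent = [[0.25, 0.25], [0.25, 0.25]]

print(mutual_information(perfect))      # 1.0 bit: uncertainty halved
print(mutual_information(independent))  # 0.0 bits: nothing learned
```

The first table illustrates the statement that one bit corresponds to a reduction of uncertainty by a factor of two.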

One can evaluate the capacity of different candidate neural codes by computing the information carried by the neural codes based on different response aspects (e.g., the timing or total number of spikes) and defined relative to different reference frames. For example, in previous work, we determined whether precise spike times carry significant information by comparing the information obtained from responses quantified using different timing precisions. We then evaluated the extent to which the information present in spike timing can be decoded based on reference frames intrinsically available to the brain by comparing the information that can be extracted from responses defined using the experimenter’s clock with responses defined using an internal reference.

Information can be calculated by means of the stimulus-response probabilities $P(s,r)$, $P(r)$, and $P(s)$. Determining information by means of these probabilities is the so-called direct method for calculating information [3]. In an idealized case, these could be measured precisely. However, for experimental data, these probabilities need to be estimated from limited available data, such as a finite (and often small) number of trials. As a consequence, a systematic error (bias) is present in the estimated probabilities and hence the derived information values. The correction of this bias has been the subject of extensive research (see [11, 12] for reviews). An important consideration is that in general more complex codes defined by many parameters (such as, for example, finely timed sequences of spike times) tend to have a larger upward bias than simpler codes defined by a smaller number of parameters (such as, for example, those based on spike counts or on coarse measures of spike times). This means that, in a naïve analysis, the more complex codes may artificially appear to have higher information than in reality. Considerable work of our group has concentrated on trying to estimate and remove this upward bias as precisely as possible and in developing classes of information estimators that have a tendency for being biased downward (rather than upward) when using more complex codes (see [11, 12] for reviews). These downward biased estimators are important for the questions about the nature of the neural code, as they allow conservative conclusions about the role of spike timing in comparison to methods that may overestimate the significance of a specific code.
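The upward bias of the naïve (plug-in) direct estimate can be illustrated with a small simulation. In the sketch below, responses are drawn independently of the stimulus, so the true information is zero and any positive estimate is pure bias. The trial counts, the number of stimuli, and the two response alphabet sizes are assumptions chosen for illustration only, and no bias correction is applied.

```python
import random
from collections import Counter
from math import log2

def plugin_information(stimuli, responses):
    """Naive direct estimate (bits) from empirical frequencies."""
    n = len(stimuli)
    c_sr = Counter(zip(stimuli, responses))
    c_s, c_r = Counter(stimuli), Counter(responses)
    return sum((c / n) * log2(c * n / (c_s[s] * c_r[r]))
               for (s, r), c in c_sr.items())

def mean_plugin_information(n_symbols, trials_per_stim=30, n_stim=4,
                            n_rep=200, seed=0):
    """Average estimate when responses carry no stimulus information."""
    rng = random.Random(seed)
    stimuli = [s for s in range(n_stim) for _ in range(trials_per_stim)]
    total = 0.0
    for _ in range(n_rep):
        # Responses are drawn uniformly, independently of the stimulus.
        responses = [rng.randrange(n_symbols) for _ in stimuli]
        total += plugin_information(stimuli, responses)
    return total / n_rep

simple_code_bias = mean_plugin_information(n_symbols=2)    # e.g. a coarse code
complex_code_bias = mean_plugin_information(n_symbols=64)  # e.g. 6 binary bins
print(simple_code_bias, complex_code_bias)
```

With the same number of trials, the code with the larger response alphabet yields a much larger spurious information value, which is the effect that bias-correction procedures aim to remove.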

When using very high dimensional neural responses, it becomes impossible to correct the bias of the direct information measures defined by (1). Given a typically available number of stimulus repeats (~30–60 trials), this is the case for codes characterized by a cardinality of 50–100 or more elements, corresponding, for example, to spike times computed in 6–7 or more consecutive small time bins, the population rate of 5 neurons or more, or a mixture thereof. In such cases, information metrics are often computed using an intermediate decoding step. In this approach, the most likely stimulus that elicited a given response is determined using a cross-validated decoding algorithm [13, 14]. Then, the information extracted through the stimulus reconstruction scheme can be quantified as follows [14–16]:

$$I(S;S^{p}) = \sum_{s,s^{p}} Q(s,s^{p})\,\log_2 \frac{Q(s,s^{p})}{Q(s)\,Q(s^{p})}, \qquad (2)$$

where $Q(s,s^{p})$ is the joint probability that in a trial the decoding procedure reports that the stimulus $s^{p}$ was presented when the true presented stimulus is $s$, and $Q(s)$ and $Q(s^{p})$ are the corresponding marginal probabilities. The decoded information quantifies (in bits) the average knowledge gained, per trial, when predicting the stimulus using a specific algorithm and takes into account both the fraction of correct decoding and the spread of the decoding errors.
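The dependence of decoded information on both the fraction of correct decoding and the spread of the errors can be made concrete with a short sketch. The function below computes the information between true and decoded stimuli from a confusion matrix of trial counts. The two matrices are invented for illustration and have the same fraction correct (50%) but different error structure.

```python
from math import log2

def decoded_information(confusion):
    """Information (bits) between true and decoded stimulus.

    confusion[s][sp] = number of trials with true stimulus s decoded as sp.
    """
    n = sum(sum(row) for row in confusion)
    q_true = [sum(row) / n for row in confusion]       # marginal over true
    q_dec = [sum(col) / n for col in zip(*confusion)]  # marginal over decoded
    info = 0.0
    for s, row in enumerate(confusion):
        for sp, count in enumerate(row):
            if count > 0:
                q = count / n
                info += q * log2(q / (q_true[s] * q_dec[sp]))
    return info

# Errors always fall on one particular wrong stimulus (structured errors).
concentrated = [[10, 10, 0], [0, 10, 10], [10, 0, 10]]
# Errors are spread evenly over all wrong stimuli.
spread = [[10, 5, 5], [5, 10, 5], [5, 5, 10]]

info_concentrated = decoded_information(concentrated)
info_spread = decoded_information(spread)
print(info_concentrated, info_spread)
```

Although both decoders are correct on half of the trials, the one with structured errors conveys more information, which a percent-correct measure alone would not reveal.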

Information depends on both the choice of the stimulus set and of the quantification of the neural response. Stimulus set here refers to both the stimulus material used for the experimental paradigm (e.g., simple tones versus natural sounds) and how the presented material is grouped or divided into the stimulus dimension used for the information theoretic analysis. In our work, we mostly concentrated on studying how neurons encode time-varying natural sounds or video clips. Such stimulus material is difficult to analyze in terms of sensory coding because it contains many different feature dimensions that vary continuously at different time scales. To create the stimulus set for analysis, we used a feature agnostic approach: we divided the presentation time of the dynamic stimulus material presented in the experiment into different segments of length $T$ (a parameter that was varied in the range from a few ms to several seconds) and each segment was considered as a different stimulus for the analysis (see schematic in Figure 1). We then computed the information about which stimulus segment elicited the considered response. This procedure has several advantages. The first is that it is simple to apply and it lends itself to comparisons between different experimental datasets and between experimental and theoretical studies. The second is that it does not make any assumption as to which specific features of the dynamic stimulus triggered the neural response and so can potentially capture information about all possible dynamical stimulus features presented experimentally [17].
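The feature agnostic construction of the stimulus set can be sketched as follows: the presentation time is cut into consecutive segments, the segment index serves as the stimulus label, and the response in each segment (here simply the spike count, as one possible choice) is paired with that label. The function name, the spike times, and the segment length are hypothetical and serve only to illustrate the bookkeeping.

```python
def segment_responses(trials, duration_ms, segment_ms):
    """Pair each stimulus segment with the response it evoked.

    trials: list of per-trial spike-time lists (ms from sequence onset).
    Returns parallel lists (stimulus labels, spike counts), one entry per
    trial and segment.
    """
    n_seg = int(duration_ms // segment_ms)
    stimuli, responses = [], []
    for spikes in trials:
        counts = [0] * n_seg
        for t in spikes:
            k = int(t // segment_ms)
            if 0 <= k < n_seg:  # ignore spikes outside the analysed window
                counts[k] += 1
        for k, c in enumerate(counts):
            stimuli.append(k)    # the segment index is the "stimulus"
            responses.append(c)  # the response observed in that segment
    return stimuli, responses

# Two repeats of a 200 ms sequence, cut into two 100 ms segments.
trials = [[5, 12, 130], [7, 128, 131]]
stimuli, responses = segment_responses(trials, duration_ms=200, segment_ms=100)
print(stimuli)    # [0, 1, 0, 1]
print(responses)  # [2, 1, 1, 2]
```

The resulting label-response pairs are the input from which the stimulus-response probabilities can then be estimated.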

2.1. Precise Spike Times Encode Stimulus Information

Here, we summarize our investigation on the role of spike timing in encoding complex time-varying stimuli. The work in this subsection is a summary of the work previously reported in [18, 19] to which we refer for full details.

We recorded responses of single neurons in the caudal auditory cortex of passively listening macaque monkeys. In a first experiment, we recorded the responses to a sequence of pseudorandom tones (so-called “random chords”), a stimulus with short correlation time scale, hence rapid dynamic content. Because of this short intrinsic time scale, and given the debate about whether there could be a precise temporal encoding in the absence of response locking to fast sequences of stimulus presentations [20], such stimuli are particularly suited to determine whether precise spike timing can contribute to the encoding of complex sounds within a rich acoustic background.

We thus used Shannon information ($I(S;R)$; see (1)) to directly quantify the information carried by temporal spike patterns sampled at different temporal precisions. Shannon information was computed from many (average 55) repeats of the same stimulus sequence using the direct method [3]. We computed the stimulus information carried by spike patterns characterized as “binary” (spike/no spike) sequences sampled in fine (1 ms) time bins and used a temporal shuffling procedure to compare the stimulus information encoded at different “effective” response precisions (Figure 2(a)). This shuffling procedure entailed shuffling spikes in nearby time bins and can be used to progressively degrade the effective precision of a spike train without affecting the statistical dimensionality of the data. We considered randomly selected epochs from the long stimulus sequence as “stimuli”; the resulting information estimates (averaged across many selections of stimulus epochs) hence indicate how well these different sounds can be discriminated given the observed responses.
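One way to picture such a shuffling procedure is sketched below. It assumes that bins are permuted within non-overlapping windows, independently in each trial; the exact shuffling scheme used in the original studies may differ. The key property shown is that the number of bins per response word is unchanged while the spike timing within each window is randomized.

```python
import random

def degrade_precision(binary_trials, window_bins, seed=0):
    """Shuffle bins within consecutive windows, independently per trial.

    binary_trials: list of trials, each a list of 0/1 values in fine bins.
    window_bins:   effective precision, in units of the fine bin width.
    """
    rng = random.Random(seed)
    degraded = []
    for trial in binary_trials:
        shuffled = list(trial)
        for start in range(0, len(shuffled), window_bins):
            chunk = shuffled[start:start + window_bins]
            rng.shuffle(chunk)  # destroys timing inside the window only
            shuffled[start:start + window_bins] = chunk
        degraded.append(shuffled)
    return degraded

# Three trials with identical 1 ms precise patterns over 8 bins.
original = [[1, 0, 0, 0, 0, 0, 1, 0] for _ in range(3)]
coarse = degrade_precision(original, window_bins=4)
print(coarse)
```

Because the response word keeps its original length, information values at different effective precisions are computed on data of the same dimensionality and are therefore affected similarly by limited sampling.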

Estimating stimulus information from responses at effective precisions coarser than 1 ms resulted in a considerable information loss. Across the population of neurons, the dependency of stimulus information on response precision was quantified by normalizing information values at coarser precisions by the information derived from the original (1 ms precision) response (Figure 2(b)). Across neurons, the information loss amounted to 5%, 11%, and 20% (median) for effective precisions of 4, 6, and 12 ms, respectively, demonstrating that a considerable fraction of the encoded stimulus information is discarded when sampling the same response at coarser temporal resolution.

In a second experiment, we further tested the importance of reading responses at millisecond precision using an additional set of neurons recorded during the presentation of natural sounds. This stimulus comprised a continuous sequence of environmental sounds, animal vocalizations, and conspecific macaque vocalizations, and the information values hence indicate the relevance of finely timed activity to discriminate natural sounds occurring within a context of similar sounds. Across the entire sample of neurons, the proportional information lost by reducing the effective temporal precision was smaller than the results obtained during stimulation with random chords (Figure 2(c)). Importantly, however, a subset of neurons carried considerable stimulus information at high temporal precision and revealed a significant information loss when ignoring the temporal response precision. The fraction of neurons with significant information loss (bootstrap test; ) was 11% and 17% at 6 and 12 ms, respectively. For those with significant loss at 12 ms the information transmitted per spike dropped from 3.8 bit/spike at 1 ms to 3.2 and 2.7 bit/spike (median) at 6 and 12 ms precision, respectively. This demonstrates that millisecond precise spike timing can also carry additional information about natural sounds that cannot be recovered from the spike count on the scale of about 10 ms or coarser. Recent work by other groups corroborates these results using similar studies and analysis methods in the auditory inferior colliculus [21] and the rat auditory cortex [22].

3. A Candidate Intrinsic Indicator of Stimulus Onset: Stereotyped Neurons

In the analysis presented above, information is analyzed by aligning spikes and sensory events using a reference frame such as a laboratory-based computer clock. This procedure is typical for the vast majority of analyses about the information content of spike times, but it is problematic. It raises the question of how the brain may succeed in interpreting the information carried by the temporal variations of neural responses without the benefit of a computer clock measuring perfect time intervals and providing the exact time of stimulus presentation [8–10]. In “active sampling” situations, where the motor system actively initiates or modulates the sampling of external information [10, 23, 24], sensory systems may receive a motor efference copy that reduces temporal uncertainty about stimulus timing [25–27]. However, in more general situations (e.g., when sampling is not actively initiated or when the stimulus appears at unpredictable times), such efference mechanisms are not available. Under such conditions, the system must have an intrinsic temporal reference or an otherwise intrinsic mechanism deriving a signature of stimulus occurrence in a stimulus-derived, hence bottom-up, manner.

One possibility is that a neural population event could provide an estimate of the time of the stimulus, which could then be used to measure the relative timing of subsequent spikes. For a population event to act as a plausible “clock” that indicates the stimulus onset, the event must enable the extraction of information about complex natural stimulus features in an alert animal with sufficient robustness across trials, without needing to rely on any external predictive clues about stimulus timing. We investigated the feasibility of a relative coding scheme and its robustness with regard to these requirements in the auditory cortex of awake primates [28]. Using a paradigm that minimizes predictive cues about stimulus onset, we recorded the responses of single neurons from primary auditory cortex to naturalistic sounds made of conspecific vocalizations and vocalizations or noises of other animals (Figure 3(a)). In this section, we review this work, following what we summarized in [29].

We started the analysis by measuring the single-trial response latencies of the recorded neurons using a statistical algorithm. Then, for each neuron, we computed the standard deviation of the response latency across all trials of each stimulus as well as the average response latency over all stimuli. This was useful to characterize the trial-to-trial variability in the response latency of the neuron. We found a clear dichotomy in the population with respect to the response latency variability. Some neurons exhibited very low variability in their response latency, while many others displayed higher variability. Based on this finding, we applied a threshold on the latency variability (Figure 3(b)) and divided the population into two distinct groups, which we named “stereotyped” neurons and “modulated” neurons. Neurons that were classified into the stereotyped group (approximately one-fourth of the population) had very low latency variability. Modulated neurons, on the other hand, had larger variability in their latencies. Example responses of one stereotyped and one modulated neuron are shown in Figure 3(b). When we examined the response characteristics of neurons in the two groups, we found two other distinctions. Stereotyped neurons responded to all tested sounds, while modulated neurons responded only to some sounds. In addition, stereotyped neurons exhibited much shorter mean response latencies ( ms) compared to modulated neurons ( ms; two-sample t-test ).

Stereotyped neurons are thus distinct from modulated neurons due to their fast, reliable, and nonspecific responses. This suggests that stereotyped neurons may provide an intrinsic reference signal of the stimulus time. This reference signal could enable a putative downstream neuron to extract the information carried by the time-varying responses of stimulus-modulated neurons. We tested this hypothesis in the following way. We defined the stimulus onset using two alternative reference frames, one being the precise stimulus time as measured by the laboratory clock (external reference) and the other being the response onset of a simultaneously recorded stereotyped neuron (internal reference). We formulated two candidate codes by aligning the spike trains of the modulated neurons to either one of the two references. We found that when the responses of the modulated neurons were aligned with respect to the stimulus onset using the extremely precise external reference, the responses showed temporally precise stimulus modulated spike patterns (Figure 3(c)) indicating that information is conveyed in the auditory cortex through precise spike timing [19, 30]. When the responses of the modulated neurons were aligned using the internal reference provided by the single-trial onsets of a simultaneously recorded stereotyped neuron, these temporal response patterns were largely preserved (Figure 3(c)). We then computed the information carried by each of these codes. We found that only little of the information about the sound identity carried by the externally referenced time-varying neural responses was lost when computing information with the internally referenced time-varying neural responses (Figure 3(d)). This is due to the temporal reliability of stereotyped neurons. Importantly, when we used the response onset time of a modulated neuron as the reference point of stimulus onset, the temporal response patterns were highly degraded (Figure 3(c)) and resulted in a higher information loss (Figure 3(d)).
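The realignment step behind the internally referenced code can be sketched in a few lines. In this toy example, which uses invented numbers, the stimulus starts at a different and unknown clock time in each trial, a stereotyped neuron responds with a fixed latency, and the spikes of a modulated neuron are re-expressed relative to that response onset. Real stereotyped neurons have small but nonzero latency jitter, which this idealized example ignores.

```python
def realign(spike_times_per_trial, reference_per_trial):
    """Express each trial's spike times relative to that trial's reference."""
    return [[t - ref for t in spikes]
            for spikes, ref in zip(spike_times_per_trial, reference_per_trial)]

# Hypothetical stimulus onsets (ms on an arbitrary clock), unknown to a decoder.
stimulus_onsets = [100.0, 250.0, 400.0]
# Idealized stereotyped neuron: first spike exactly 10 ms after each onset.
stereotyped_onsets = [t0 + 10.0 for t0 in stimulus_onsets]
# Modulated neuron: stimulus-specific pattern 15 and 32 ms after each onset.
modulated_spikes = [[t0 + 15.0, t0 + 32.0] for t0 in stimulus_onsets]

internal = realign(modulated_spikes, stereotyped_onsets)
print(internal)  # the same pattern in every trial: [5.0, 22.0]
```

The absolute spike times differ across trials, but relative to the internal reference the pattern is identical, so a downstream decoder would not need access to the stimulus onset itself.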

We finally investigated whether the selective pooling of stereotyped neurons could act as a reliable indicator of stimulus onset. By using a computational modelling approach, we estimated that more than 95% of the full information that is contained in the spike times measured with respect to the precise stimulus onset could be recovered when the reference point for stimulus onset is calculated by pooling the responses of about 25 stereotyped neurons [28].

Previous studies have shown that the relative timing of neural responses can carry considerable sensory information and sometimes even more than the absolute timing relative to the stimulus [9, 31, 32]. Our results show that the relative timing of neural responses to an intrinsically defined population event can constitute a highly informative code also in the alert animal and for complex and suddenly appearing stimuli.

4. A Candidate Intrinsic Mechanism to Partition Spike Sequences: Network Oscillations

Access to information in temporally precise codes by the brain requires not only the existence of intrinsic indicators of stimulus onset, but also the existence of internal mechanisms to partition neural responses into precisely defined time intervals. The problem of maintaining a precise representation of time intervals is likely more difficult when considering the partitioning of long-lasting neural responses, such as those generated during presentation of long-lasting stimuli. The auditory system is often exposed to continuous stimuli such as speech and has to represent individual sound objects within the evolving stimulus stream [33, 34], for example, individual words in speech. Based on these considerations, we conducted a study to investigate which temporal aspects of the activity of the primary cortical auditory network could act as a temporal frame that provides an informative partitioning of long spike trains into finer time intervals.

We presented a 52-second continuous sequence of naturalistic sounds, such as animal calls and environmental sounds (Figure 4(a)) and recorded the responses of neurons from monkey primary auditory cortex [18, 32]. In this section, we review this work, following what we summarized in [29].

We first defined the stimulus set by randomly selecting sets of 10 epochs from the long sound sequence (Figure 4(a)). The response of one example neuron across trials is shown in Figure 4(b). Previous work on the data (see e.g., [19]) had revealed that the spike patterns encoded information with high temporal precision (in the range of few milliseconds) when the spike trains were stimulus aligned and partitioned into equally spaced time bins based on the laboratory clock (time-partitioned spike trains, Figure 4(c)).

In the present study, we investigated whether we could use a reference frame purely based on intrinsic network activity to partition responses into informative spike patterns. We considered slow oscillatory network activity, which has previously been suggested as a potential reference signal for neural processing [35]. Rhythms with cycle lengths of 100 ms or longer, such as those in the delta or theta bands, are often observed in sensory cortices during naturalistic stimulation [18, 36, 37].

We asked whether the phase of the network oscillation permits partitioning long spike sequences. In natural sounds, low frequency components in the theta (2–6 Hz) frequency range contain important acoustic information that is crucial for speech comprehension [38]. Slow rhythmic network activity in the auditory cortex entrains to the presentation of natural sounds [18, 34, 36, 39, 40]. This causes the phase of the oscillation to be reliably time-locked to the stimulus. The phase may then indicate salient points along the continuously varying stimulus [41]. As a result, phase differences can be used as a surrogate measure of time intervals during stimulation, at least on the order of a few tens of milliseconds. Thus, we used the phase of the theta band of the local field potential (LFP) as an oscillatory reference to partition spike trains (Figure 4(c)). Specifically, we divided the full phase cycle of the oscillation into phase ranges (or phase bins) and allocated each spike in a spike train to the corresponding phase bin based on the instantaneous phase of the oscillation at the time of the spike (phase-partitioned spike trains, Figure 4(c)). This is an alternative way of assigning spikes into representative intervals, using the oscillatory phase as a virtual time axis. Note that the phase epochs may not be equally spaced in time as a result of natural variability in network rhythms.
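The allocation of spikes to phase bins can be sketched as follows. The example assumes that the instantaneous phase of the band-limited LFP at each spike time has already been extracted (for instance with a band-pass filter followed by a Hilbert transform, a step not shown here), and it uses four phase bins as an illustrative choice.

```python
from math import pi

def phase_partition(spike_phases, n_phase_bins=4):
    """Count spikes in equal-width bins of oscillation phase.

    spike_phases: instantaneous LFP phase (radians) at each spike time.
    Returns one spike count per phase bin, covering [0, 2*pi).
    """
    width = 2 * pi / n_phase_bins
    counts = [0] * n_phase_bins
    for phase in spike_phases:
        wrapped = phase % (2 * pi)  # map any angle into [0, 2*pi)
        k = min(int(wrapped // width), n_phase_bins - 1)  # guard rounding
        counts[k] += 1
    return counts

# Phases (radians) at which five spikes occurred in one stimulus epoch.
word = phase_partition([0.1, 1.0, 1.7, 3.2, -0.2])
print(word)  # [2, 1, 1, 1]
```

The resulting vector of counts plays the same role as a time-binned response word, with the phase bins replacing clock-defined time bins.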

We found that the phase-partitioned spike trains still had clear stimulus dependence (Figure 4(b)). For comparison, we formulated a time-partitioned spike code, where time intervals were defined using the laboratory clock, and a spike count based code, which takes the sum of all spikes without considering the time structure. Then, we estimated the information in each of the three coding schemes about the defined stimulus set (where different sections of the long sound sequence were defined as stimuli). We evaluated the viability of the phase-partitioned code by comparing its information with those in the other two coding schemes.

The phase-partitioned code had a large information gain compared to the spike count (40%, population mean, Figure 4(d)) and was able to recover almost all (86%, Figure 4(d)) of the information conveyed by the time-partitioned code. Moreover, the excess information in either partitioning scheme over the spike count was highly correlated across neurons (Spearman’s rank correlation ). Therefore, good stimulus discrimination displayed by one partitioning scheme implies good discrimination performance from the other. Notably, the information recovered by the phase-partitioned code in some of the neurons was higher than that for the time-partitioned code. This suggests that the oscillatory phase during which these neurons fire was more reliable and stimulus specific than the precise timing to the stimulus itself [32].

An important question is where these oscillations come from, or how they are generated. To address this we performed additional modelling studies [42–44]. Results from these studies and experimental data [36, 39, 40] show that low frequency oscillations are generated by entrainment of cortical activity to the low frequency components of the stimulus dynamics. These low frequency variations are a very prominent component of natural stimuli. In naturalistic movies, the power spectrum of most visual features decreases proportionally to the square of the frequency [45], meaning that the components of natural movies with higher amplitudes are those at low temporal frequencies. Similar results apply to the auditory domain [46].

Notably, a reference frame based on the oscillatory phase arises from the intrinsic network activity and is likely to be directly accessible within the local cortical network [47, 48]. This is because low frequency LFPs reflect changes in neuronal excitability that are spatially coherent over several millimeters [49, 50] and often accompanied by coherent fluctuations of neural membrane potentials [51] whose low frequency phase provides an effective reference signal for decoding spike information [52]. Given that the majority of synapses are made within local networks [53], pre- and postsynaptic neurons likely have access to the same slow rhythm for the majority of cortical connections.

In sum, our investigations suggest that network oscillations may be able to act as a highly effective, biologically plausible, and purely internal reference frame to generate informative spike patterns.

5. Characterizing the Role of Response Timing in Population Activity: Results, Challenges, and Ideas for Future Work

The studies discussed above focused mostly on how the timing of single neurons encoded information. An important question is how these results generalize to populations of neurons, or, in other words, how to include the spatial dimension of neural codes along with the temporal dimension. This is particularly difficult because of the combinatorial problem of considering many space-time parameters. The number of possible spike patterns grows exponentially with the number of neurons and the number of time bins. This is known as the curse of dimensionality and is a fundamental and, in its general form, unsolved problem of computation and sampling [54]. As noted above, for the specific case of information estimates from neural data, the curse of dimensionality problem arises primarily because of the limited amount of data that can be collected from a neural system (especially from behaving subjects), rather than from computation time issues (see [11, 12] for recent reviews). The limited amount of experimental neural data that can be collected severely limits the size of the neural populations that can be analyzed and ultimately requires additional techniques to study high dimensional activity patterns.
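The exponential growth of the response space can be made concrete with a short calculation. The sketch below counts the possible binary (spike/no spike) population patterns for a few combinations of neurons and time bins. The specific population sizes are arbitrary examples, and the trial count in the comment refers to the typical range of stimulus repeats mentioned earlier.

```python
def n_binary_patterns(n_neurons, n_time_bins):
    """Number of distinct spike/no-spike patterns across neurons and bins."""
    return 2 ** (n_neurons * n_time_bins)

# With roughly 30-60 repeats per stimulus, only the first case comes close
# to being sampled adequately.
for n_neurons, n_bins in [(1, 6), (5, 6), (20, 6)]:
    print(n_neurons, n_bins, n_binary_patterns(n_neurons, n_bins))
# 1 neuron,   6 bins: 64 patterns
# 5 neurons,  6 bins: 1073741824 patterns (2**30)
# 20 neurons, 6 bins: 2**120 patterns
```

Even a handful of neurons read out at fine temporal resolution produces far more possible patterns than trials that can be recorded, which is why dimensionality reduction or decoding-based approaches become necessary.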

One possible scenario is that sensory areas in the brain process information using small but highly informative ensembles; that is, they effectively rely on a subset of the many available neurons. If this were true, analysis could be limited to those “relevant” ensembles and the combinatorial space-time problem of large populations could possibly be avoided altogether. Notably, in a recent study on the encoding of natural sounds in primate auditory cortex, we found strong evidence that this is possible [55]. We found that a small fraction of cells carried the vast majority of information available in a much larger sample of recorded neurons. Hence, rather than using all neurons to decode stimulus identity, similar or sometimes even more information could be recovered when studying only a selected subset of neurons. Moreover, we could determine optimal subpopulations by the encoding timescales of the neurons in the pool of recorded cells, thereby providing a plausible way to identify and read out optimal populations in biologically realistic circuits.

Another possibility is that the information available in precisely timed spike patterns of some neurons is replaceable by the information provided by the spike counts of other neurons in the population. This would reduce the complexity of the combinatorial problem tremendously by reducing the temporal granularity of the response readout. However, in our study, we found that the informative subpopulations carried their information by means of temporally precise spiking [55]. This means that in order to read out these populations optimally (i.e., to achieve the best possible performance), time could not be replaced by space. In other words, the additional information provided by temporal response patterns was not encoded by the spike counts of other neurons. This suggests that the code by which auditory cortical neurons carry information is genuinely made of space and time.

When considering very large populations of neurons, a typical procedure is to reduce the dimensionality of the problem to overcome the curse of dimensionality. Several techniques are available to search for structure in the neural interactions that allows the representation of the data to be simplified. One possibility is to make assumptions about the coding characteristics within pools of neurons, for instance, by disregarding the identity of the neuron that fires each spike and computing a so-called pooled code [56]. Another possibility is to preprocess the data with a general dimensionality reduction technique. Many such techniques, with different constraints and objectives, are available. One example is nonnegative matrix factorization (NMF), which is particularly suitable for nonnegative data such as neural responses. This technique approximates the data as a product of nonnegative components, yielding a data representation that is parts-based and sparse [57].
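The two reduction strategies named above can be sketched in a few lines. The example below is only an illustration: it assumes spike counts stored as a trials × neurons × bins array, and it uses the multiplicative update rules commonly used for NMF with a squared-error objective, which may differ from the implementations in the cited studies. All function names and parameter values are hypothetical.

```python
import numpy as np


def pooled_code(spikes: np.ndarray) -> np.ndarray:
    """Sum spikes over neurons, discarding which neuron fired.

    spikes: array of shape (n_trials, n_neurons, n_bins) with spike counts.
    Returns an array of shape (n_trials, n_bins).
    """
    return spikes.sum(axis=1)


def nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factorize nonnegative X (samples x features) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared (Frobenius) reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_features)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update components
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update per-sample loadings
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spikes = rng.poisson(0.3, size=(50, 8, 20))   # synthetic spike counts
    pooled = pooled_code(spikes)                  # shape (50, 20)
    X = spikes.reshape(50, -1).astype(float)      # one space-time word per trial
    W, H = nmf(X, n_components=3)
    print(pooled.shape, W.shape, H.shape)
```

The pooled code reduces the response of each trial from neurons × bins values to one value per bin, whereas NMF keeps neuron identity but expresses each trial as a nonnegative combination of a small number of components.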

Recently, Delis and colleagues developed a variant of NMF called sample-based nonnegative matrix trifactorization [58]. The method was originally proposed for the analysis of muscle synergies. What makes it potentially interesting for the space-time problem of neural population responses is its ability to decompose its input into space-by-time components in a data-driven way. The method is illustrated in Figure 5. A nonnegative input matrix, which may consist of the time-varying responses of a population of neurons, is decomposed into three components: temporal modules, spatial modules, and activation coefficients. Temporal modules are temporal activity patterns in the data, while spatial modules are groups of neurons that are active in fixed proportions. Temporal and spatial modules are constant across trials, whereas activation coefficients are trial dependent. By using spatial and temporal modules that are fixed across trials, the method reduces the dimensionality of the data considerably, under the assumption that the composition of coactive neurons is stimulus-driven and reliable within the dataset. The method thereby identifies functional units in space and time that can be further analyzed with regard to stimulus information or other properties of interest. Although these methods have not yet been applied to neural data, they have the potential to tackle long-standing problems in the analysis of large-scale populations and to facilitate future studies of very large populations of neurons.
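As a rough illustration of the space-by-time decomposition described above, the sketch below approximates each single-trial response matrix as a product of shared temporal modules, trial-dependent activation coefficients, and shared spatial modules. It is not the algorithm of [58]: the multiplicative update rules for a squared-error objective, the time × neurons orientation of the input, and all names and parameters are assumptions made for this example.

```python
import numpy as np


def space_by_time_nmf(M, n_temporal, n_spatial, n_iter=300, seed=0, eps=1e-12):
    """Approximate each trial M[s] (bins x neurons) as Wt @ H[s] @ Ws.

    M:  nonnegative array, shape (n_trials, n_bins, n_neurons)
    Wt: temporal modules, shape (n_bins, n_temporal), shared across trials
    Ws: spatial modules, shape (n_spatial, n_neurons), shared across trials
    H:  activation coefficients, shape (n_trials, n_temporal, n_spatial)
    """
    rng = np.random.default_rng(seed)
    n_trials, n_bins, n_neurons = M.shape
    Wt = rng.random((n_bins, n_temporal)) + eps
    Ws = rng.random((n_spatial, n_neurons)) + eps
    H = rng.random((n_trials, n_temporal, n_spatial)) + eps
    for _ in range(n_iter):
        # Trial-dependent activation coefficients (modules held fixed).
        H *= (Wt.T @ M @ Ws.T) / (Wt.T @ Wt @ H @ Ws @ Ws.T + eps)
        # Temporal modules, accumulated over all trials.
        HWs = H @ Ws                                    # (trials, n_temporal, neurons)
        num = np.einsum("stn,spn->tp", M, HWs)          # sum_s M_s (H_s Ws)^T
        den = Wt @ np.einsum("spn,sqn->pq", HWs, HWs)   # Wt sum_s (H_s Ws)(H_s Ws)^T
        Wt *= num / (den + eps)
        # Spatial modules, accumulated over all trials.
        WtH = Wt @ H                                    # (trials, bins, n_spatial)
        num = np.einsum("stq,stn->qn", WtH, M)          # sum_s (Wt H_s)^T M_s
        den = np.einsum("stq,str->qr", WtH, WtH) @ Ws   # sum_s (Wt H_s)^T (Wt H_s) Ws
        Ws *= num / (den + eps)
    return Wt, H, Ws


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    M = rng.poisson(1.0, size=(40, 25, 10)).astype(float)  # synthetic responses
    Wt, H, Ws = space_by_time_nmf(M, n_temporal=3, n_spatial=2)
    print(Wt.shape, H.shape, Ws.shape)
```

In this sketch each trial is summarized by only n_temporal × n_spatial activation coefficients instead of bins × neurons values, and these coefficients could then be used as a reduced response space for further analysis.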

Last, when considering population codes, the correlation structure between the responses of different neurons becomes an important factor determining how independently or synergistically different neurons reflect the sensory environment. In our study [55], we found that correlations between neurons of subpopulations did not have a strong impact on information and could be safely ignored. But this may not always be the case. Several methods have been developed to analyze the impact of correlations on information and to construct models that take correlations into account [59, 60]. For small ensembles of neurons and a rate coding assumption, the information theoretic importance of taking detailed correlations into account can be tested within a maximum entropy framework [61] and, if necessary, models of detailed population dependencies can be constructed [62, 63]. Moreover, detailed interactions in the spike timing of populations can be investigated in terms of the rate of synchronous discharge [56]. The techniques developed in these studies are useful to study neural populations in the presence of strong correlations and to test the implications of such correlations in a principled and analytical framework.

6. Discussion

The quest for the neural code has been going on for several decades and is still an open question that raises heated discussions among neuroscientists [4, 64–66]. Our contribution to the understanding of this question has been to develop methods, based on the principles of information theory, for the unbiased quantification of the information carried by different kinds of neural codes. These methods have the advantage that they can be applied equally to any kind of stimulus material, including complex and naturalistic stimuli, thereby making it possible to investigate neural representations in conditions closer to real life. Importantly, our methods allow the investigation of how different coding schemes complement each other and cooperate. This helped to reconcile previous, somewhat contradictory results by showing that auditory cortical neurons may multiplex sensory information across multiple response time scales [18].

We feel that the information theoretic formalism introduced here offers very significant opportunities for understanding how different coding mechanisms and different scales cooperate. We therefore see the further development of these methods (in particular, to be able to consider the information provided by many spatial and temporal scales simultaneously) as an important area for progress in computational neuroscience in the coming years. The ideas for information-rich dimensionality reduction of neural population responses discussed in the previous section may be instrumental to this progress.

Evidence that information is represented by temporal spike patterns does not imply that the nervous system can make use of such temporally precise codes. A common criticism is that spike timing information encoded in variables such as poststimulus latency, which are defined with respect to external events such as stimulus onset, cannot be utilized by a downstream neuron because a biological system is not able to measure such variables. Therefore, identifying intrinsic temporal reference frames that enable direct decoding of spike trains without reference to external frameworks is crucial to link temporally precise spike codes to behavior. Studies over the last few years have begun to investigate this problem, and the results reviewed here provide a series of useful insights.

One insight from recent work is that the population activity of the network itself can generate a sufficient reference frame for reconstructing informative spike patterns. As we reviewed in this paper, considerable sensory information can be recovered even under challenging conditions, including natural stimuli whose presentation timing cannot be predicted from stimulus regularities [28], or long stretches of natural stimuli [32]. In visual cortex, Shriki and colleagues [67] studied the encoding of visual orientation and also reported a subpopulation of stereotyped neurons with reliable nonstimulus-selective response latency. Similar to our results in the auditory system, these visual neurons with invariant robust latencies could be used to compute informative spike times from other neurons with longer and stimulus-selective latencies [67]. In some cases, internally referenced codes may outperform externally referenced ones [9, 31, 32]. This may, for example, happen when variations in spike timing are coordinated across neurons due to a common covarying factor. In this case, spike timing relative to the stimulus is more adversely affected than the relative timing between neurons [68].
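The effect of a common covarying factor on externally and internally referenced latencies can be illustrated with a toy simulation. The sketch below is not taken from the cited studies: it assumes Gaussian timing jitter, a shared trial-by-trial shift that affects a reference neuron and a stimulus-selective neuron equally, and hypothetical latency and jitter values.

```python
import numpy as np


def simulate_latencies(n_trials=5000, common_sd=5.0, private_sd=1.0, seed=0):
    """Latencies (ms) of a reference neuron and a stimulus-selective neuron.

    Both neurons share a trial-by-trial timing shift (common_sd) and have
    independent private jitter (private_sd). All values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    common = rng.normal(0.0, common_sd, n_trials)   # shared covarying factor
    reference = 10.0 + common + rng.normal(0.0, private_sd, n_trials)
    selective = 25.0 + common + rng.normal(0.0, private_sd, n_trials)
    return reference, selective


reference, selective = simulate_latencies()
absolute_sd = selective.std()                  # latency measured from stimulus onset
relative_sd = (selective - reference).std()    # latency measured from the reference neuron
print(f"SD re. stimulus onset: {absolute_sd:.2f} ms, "
      f"SD re. reference neuron: {relative_sd:.2f} ms")
```

Under these assumptions, the variance of the latency relative to stimulus onset is the sum of the common and private variances, whereas the variance of the relative latency is twice the private variance. The internally referenced measure is therefore less variable whenever the common jitter exceeds the private jitter.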

Another insight is that distinct populations within a single area separately encode the stimulus timing and the stimulus identity [28, 67]. While some neurons show time-dependent stimulus-selective responses, other neurons exhibit short-latency and unselective responses that reflect stimulus occurrence [28]. It is possible that the neurons showing unselective responses have been systematically ignored in previous work. These neurons could act as “saliency detector” neurons and may serve to ensure that the early poststimulus part of neural responses (which is the most informative one in many cases [4, 20, 69, 70]) is not missed. Future work needs to elucidate whether and how these neurons interact with slow network rhythms to collectively form reliable and precise intrinsic temporal reference frames for neural coding.

The work reviewed here does not address how the computations needed for decoding spike timing information may be implemented in real neural networks at the biophysical level. However, some insights about these mechanisms may be gained using computational models [71]. Sensitivity to temporal spike patterns at different scales, for example, can arise from synaptic mechanisms like short-term depression or facilitation [72, 73]. Recent work has suggested that downstream learning and decoding of temporal patterns of spikes may rely upon spike timing-dependent plasticity (STDP). If downstream neural networks are equipped with STDP, then they can easily localize a repeating spatiotemporal spike pattern embedded in equally dense background spike trains [74]. Such plasticity of decoding mechanisms may be facilitated by the fact that internally referenced patterns of neural activity show a degree of robustness in their coarse structure across stimulation conditions and during spontaneous activity [75]. Model neurons equipped with STDP robustly detect a pattern of spikes encoded by the phase of a subset of afferents, even when these patterns are presented at unpredictable intervals and even when only a fraction of afferents are organized according to the phase [76].

Together, the observations reviewed in this paper support the view that transmitting, learning, and decoding spike timing information based on internal temporal frames are computational capabilities of the microcircuitry of cortical sensory structures [71, 77]. Thus, precise spike timing of individual neurons and neural populations may play an important role in the neural cortical encoding of sensory signals.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors acknowledge the financial support of the VISUALISE and SICODE projects of the Future and Emerging Technologies (FET) Programme within the Seventh Framework Programme for Research of the European Commission (FP7-ICT-2011.9.11) under Grant Agreement nos. FP7-600954 and FP7-284553 and the European Community’s Seventh Framework Programme FP7/2007-2013 under Grant Agreement no. PITN-GA-2011-290011. This work was further supported by the Max Planck Society and was part of the research program of the Bernstein Center for Computational Neuroscience, Tübingen, funded by the German Federal Ministry of Education and Research (BMBF; FKZ: 01GQ1002). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper.