Abstract

In recent years numerous improvements have been made in multiple-electrode recordings (i.e., parallel spike-train recordings) and spike sorting to the extent that nowadays it is possible to monitor the activity of up to hundreds of neurons simultaneously. Due to these improvements it is now potentially possible to identify assembly activity (roughly understood as significant synchronous spiking of a group of neurons) from these recordings, which—if it can be demonstrated reliably—would significantly improve our understanding of neural activity and neural coding. However, several methodological problems remain when trying to do so and, among them, a principal one is the combinatorial explosion that one faces when considering all potential neuronal assemblies, since in principle every subset of the recorded neurons constitutes a candidate set for an assembly. We present several statistical tests to identify assembly neurons (i.e., neurons that participate in a neuronal assembly) from parallel spike trains with the aim of reducing the set of neurons to a relevant subset of them and this way ease the task of identifying neuronal assemblies in further analyses. These tests are an improvement of those introduced in the work by Berger et al. (2010) based on additional features like spike weight or pairwise overlap and on alternative ways to identify spike coincidences (e.g., by avoiding time binning, which tends to lose information).

1. Introduction

The principles of neural coding and information processing in biological neural networks are still not well understood and are the topic of ongoing debate. As a model of network processing, neuronal assemblies were proposed in [1], which are intuitively understood as groups of neurons that tend to exhibit synchronous spiking.

In recent years considerable improvements have been made in multiple-electrode recordings and spike sorting (see, e.g., [2, 3]) that allow monitoring the activity of up to hundreds of neurons simultaneously. These improvements open the possibility of identifying neuronal assemblies from multiple-electrode recordings using statistical data analysis techniques. However, several methodological problems remain when trying to do so and, among them, a principal one is the combinatorial explosion that we face when considering all potential neuronal assemblies (since in principle every subset of the recorded neurons constitutes a candidate set for an assembly). For this reason, most studies that deal with temporal spike correlation still resort to analyzing only pairwise interactions (see, e.g., [47]), thus considerably reducing the computational complexity of such task. There are approaches in the literature that try to infer higher-order correlation and potential assembly activity by building primarily on these pairwise interactions (see, e.g., [811]) but, although they can sometimes provide a hint of higher-order correlation and even closely identify assembly activity (provided it is sufficiently pronounced), higher-order correlations need to be checked directly in order to properly identify neuronal assemblies, mostly for two reasons: first, to make sure that the activity reported is actually that of an assembly and not just of several overlapping pairs and, second, to increase the sensitivity for assembly activity as pairwise tests may not be affected sufficiently by assembly activity (see, e.g., [12, 13]). Some approaches already do so (see, e.g., [1416]) yet they are all generally limited to a small number of neurons. Others presented in some of our recent companion papers (see, e.g., [1719]) push this limitation by employing frequent item set mining methodology and algorithms to ease and speed up the search through all the candidate sets for potential assemblies, yet combinatorial explosion remains a fundamental problem (especially since statistical tests aiming at identifying assembly activity often rely on randomization or surrogate data approaches, which drive up the computational complexity even further).

In this paper we present several statistical tests to identify individual assembly neurons (i.e., neurons that are part of an assembly). Our tests extend and considerably improve those presented in [20], which were based on time binning and were mostly intended to identify exact (or almost exact) spike synchrony—which is more a theoretical simplification for modelling purposes rather than a realistic assumption. With the new tests introduced in this paper we can do much better: first, we introduce new features into the tests that make them more sensitive (like, e.g., spike weights or pairwise overlap of spikes) and, second, we introduce new ways to identify spike coincidences (i.e., we introduce alternatives to time binning to avoid the loss of detectable synchronous activity). The main motivation of our tests is to reduce the set of neurons only to a relevant subset of them and in this way ease the task of identifying neuronal assemblies in further analyses (i.e., by reducing the total number of neurons to those that tested positive in our approach, the combinatorial explosion can be reduced significantly). The idea of all tests that we present in this paper is fairly simple: we evaluate whether an individual neuron is involved significantly more often in some correlated-spiking event (that depends on the particular test) than it would be expected by chance under the assumption of noncorrelation (i.e., independence). In order to assess significance we estimate the distribution of our test statistics by means of randomized trials (i.e., collections of parallel spike trains): modifications of our original data that are intended to keep all its essential features except synchrony for the neuron we are testing.

The paper is structured as follows: in Section 2 we mainly introduce some notation that we will be using throughout the paper and briefly discuss the notion of spike synchrony, central to our research. In Section 3 we introduce our test statistics to identify assembly neurons. First, in Section 3.1 we provide four statistical tests that rely on a window-based approach to identify spike coincidences. Technically speaking, different collections of windows provide different ways of counting spike coincidences and thus different tests. We consider in our evaluations two collections of windows: the first one we consider is a partition of the recording time of our spike data into equal intervals (i.e, time bins), on which the bin-based model (the almost exclusively applied model of synchrony in the neurobiology literature) relies in order to identify spike coincidences. The second one we consider, more in keeping with a time-continuous account of spiking activity, is a collection of sliding windows (one for each spike time) able to account for all spike coincidences in our spike trains that fall within the window length and that is consistent with the common, intended characterization of spike synchrony in the field, which regards two or more spikes as synchronous if they lie within a certain distance from each other (to be determined by the modeller). Second, in Section 3.2, we offer a graded, continuous alternative to some of the previous tests. In Section 4 we briefly discuss the complexity of computing the test statistics presented in the two previous sections. In Section 5 we evaluate the performance of our new test statistics on artificially generated collections of spike trains based on parameters learned from typical real recordings, compared to the performance of those in [20], and show that the former clearly outperform the latter. Finally, in Section 6 we summarize results.

2. Preliminary Definitions, Remarks, and Notation

Let be our set of items (i.e., in our context, neurons). We will be working with parallel spike trains, one for each neuron in , formalized as spike-time sequences (i.e., point processes) of the form , for and (the recording time), where is the number of times neuron fires in the interval . We denote the set of all these sequences by . Sets of sequences like constitute our raw data.

In order to identify (potential) assembly neurons and, ultimately, neuronal assemblies we need to determine first what constitutes spike synchrony: exact spike coincidences cannot be expected and thus an alternative, nontrivial characterization of synchrony is needed. Generally it is considered that two or more spikes are synchronous (or coincident)—that is, they constitute a synchronous event—if they lie within a certain (user-defined) distance from each other, say . We will assume this notion of spike synchrony throughout.

The bin-based method, the almost exclusively applied method for dealing with synchronous spiking in the neurobiology literature, builds on the notion of synchrony above: the recording time is partitioned into time bins (i.e., windows) of equal length ( above, the time distance within which the modeller intends to define synchrony) and all those spikes that lie in the same time bin are regarded as synchronous. Notice though that the bin-based method can fail to identify some synchronous events: two or more spikes can be separated by a time distance way smaller than and lie in two distinct time bins—what we called in other companion papers the boundary problem, which we addressed by means of an alternative method to identify and count spike coincidences which builds on an alternative window set, defined in the next section (that matches the intended characterization of spike synchrony given above), introduced in [17]. In order to illustrate the relevance of the boundary problem and the huge impact that time-bin boundaries have on the identification of synchrony we show, in Figure 1, the probability that spike coincidences of different sizes (with respect to different ratios between the scatter of the spikes—the time span of the spikes in the coincidence—and bin width) are cut by a time-bin boundary.

3. Statistics

In order to identify assembly neurons from -like data we propose here several statistics based on a variety of ideas (already briefly sketched in Section 1).

3.1. First Set: Window-Based Tests

We first present four test statistics that are based on counts of spike coincidences in a collection of (sliding) bins or windows . We denote the number of such windows by (i.e., ).

We denote by the number of windows, where all neurons in a set fire. To simplify we sometimes avoid set notation: instead of writing, for example, , we write , for .

is the subset of neurons that fire in the th window.

Conditional Pattern Cardinalities (CPC1). This test (first introduced in [20] for time binning) builds on the idea that neurons participating in assemblies should have more neurons firing synchronously with them (due to the spikes of the other assembly neurons and the background spikes that are merely synchronous by chance) than it would be expected by chance under the assumption that they are not assembly neurons. Therefore, if belongs to a neuronal assembly, the average cardinality of the spike coincidences in which neuron participates should be bigger than that expected by chance (i.e., under the assumption of independence).

In order to formalize our test statistic () we first define the amounts and as follows, for : where is the indicator function of the set (i.e., if and otherwise) and is a user-specified variable that, for values greater than , weights large cardinalities more strongly than smaller ones (on the understanding that mainly large cardinalities tell us about assembly activity while small ones can simply respond to chance events). In other words, is the unconditional average (for ) pattern cardinality (taking all windows into account) while is the conditional average pattern cardinality given neuron (i.e., conditional on neuron : only windows containing a spike of neuron are taken into account). If neuron does not participate in an assembly the two averages should not differ significantly. However, if neuron participates in an assembly then we would expect to be (significantly) larger. Therefore, by comparing the two averages we obtain a test for assembly participation. We formalize this comparison by defining the test statistic with respect to neuron , as follows:

Conditional Item Frequencies (CIF1). This test (first introduced in [20] for time binning) is based on the idea that, if belongs to one or more neuronal assemblies, it should fire more often with other neurons, namely, those that are also part of the assembly or assemblies, than it would be expected by chance under the assumption that it is not an assembly neuron.

For each neuron (for ) we consider the number of windows, where neurons fire together and its expected number to build our test statistic (the latter is estimated as , with —the estimated firing frequency of neuron ). If exceeds (significantly) then neurons are likely to be part of the same assembly, due to which we see more cooccurrences of spikes of these two neurons that can be expected by chance. If, on the contrary, is less than , it is highly likely that the observed cooccurrences are merely chance events. We formally express this intuition by means of our test statistic as follows, for neuron :where is, here and throughout the rest of this section, a boolean operator that returns value if the condition holds (i.e., in this statistic, if ) and otherwise (note that we are only interested in the former case, which could be indicative that neuron belongs to an assembly). The value offers the possibility of weighting large numbers of spike coincidences for pairs of the form (over the expected ones) more than smaller ones.

Conditional Item Weight (CIW1). The previous test statistic (i.e., ) was built based on the number of observed and expected spike coincidences of sets of the form (where is the neuron tested and ) without taking into account the cardinality of the sets in the windows, where such -coincidences occurred. It is plausible that -coincidences that cooccur with many more spikes are more indicative of correlation (assembly activity) than only a few cooccurrences. Basically, in order to build this new statistical test, we combine the idea on which is based (i.e., that larger pattern cardinalities are possibly indicative of assembly activity) and that of (i.e., that a neuron participating in an assembly fires more often together with some other specific neurons—those also in the assembly) and combine them by weighting spike cooccurrences with the corresponding pattern cardinality. This test statistic goes beyond what was presented in [20] and, given that we are bringing together two pieces of information that proved effective for our purposes (pattern cardinality and coincident spiking with other specific neurons), it can be expected to yield considerably better performance.

We formalize this idea by means of our test statistic . In order to define such statistic we first need the values and defined as follows:

In other words, gives us the sum of the cardinalities of all sets of neurons in that fire together with neuron over our collection of windows (i.e., the occurrences of spikes of neuron are weighted with the cardinality of the pattern in the window they appear in. Thus, is the total size of patterns containing a spike of neuron ). Similarly, gives us the sum of the cardinalities of all sets of neurons in that fire together with neurons and over (i.e., the cooccurrences of spikes of neurons are weighted with the cardinality of the pattern in the window in which they occur).

We define the test statistic , with a user-specified power , as follows: with the estimated firing frequency of neuron . The parameter , as in previous statistics and in those that follow, offers the possibility of weighting larger (average) spike coincidences more than smaller ones.

Conditional Pattern Overlap (CPO1). While all preceding statistics were computed from aggregates over values computed from individual windows, for the test statistic we present now, we consider pairs of windows in which the neuron tested fires together with another set of neurons. The idea underlying this statistic is that cooccurrences of spikes of neuron with those of any other neuron (as considered in the two preceding statistics) may still be chance events. However, if spikes of several other neurons all occur together twice (as we look at pairs of windows) with spikes of the tested neuron , this is a much stronger indicator of assembly activity. Apart from this difference, this statistic employs the same idea as , only that the overlap of pairs takes the role of a single pattern.

We formalize this idea by means of the test statistic , which we define as follows:where excludes patterns overlapping only in one neuron.

A simple example on how these test statistics that we have just presented are computed is given in Figure 2.

In Section 5 we report results on the evaluation of these statistical tests for two window sets of particular interest, which we denote by and (except , which was only evaluated on ). “b” stands for “bin” and “w” for “(sliding) window.” The subscript reflects the dependence of on the underlying collection of spike trains :(i) is a partition (of intervals of length , the time span within which we define spike synchrony) of the recording time ;(ii) is the set given by all the intervals of the form , for all (in ) and all . The real value refers to the particular (user-defined) time span.

Our definition of is motivated by the bin-based model of synchrony that, as mentioned earlier, partitions the recording time into time bins of equal length and counts as synchronous those spikes that lie in the same bin (which constitutes the most popular method for the identification of synchronous spiking in the neurobiology literature and the reference for the statistical tests presented in [20]). However, as we explained earlier (and illustrated by means of Figure 1), such an account of synchronous spiking leads to missing potential synchronous groups: groups of spikes that lie within the time span that determines synchrony (say , as above)—and thus should be identified as synchronous—but that, due to the placement of the bin boundaries, fall into different time bins and are thus not reported as synchronous by the bin-based model. In order to bring more flexibility to the placement of the bin boundaries and this way achieve a better account of spike synchrony some possibilities come naturally to our mind. Maybe the most natural way would be to look at each spike and check its neighborhood, considering a time span in each direction (i.e., considering the window , for the corresponding spike time). However, this has the disadvantage that looking only at in each direction may still miss synchronous spiking, hence the natural possibility of considering a neighborhood with span in each direction, but this increases the number of chance occurrences. The next option is then to let a window (of length ) slide over the spike trains stopping at each spike, which captures each spike coincidence in the range given by at least once. Such a collection of windows is given by .

3.2. Second Set: Time-Continuous Approach

In this section we offer a continuous version of some of the previous statistical tests that are implicitly built on a graded, continuous notion of spike synchrony.

We consider, for each spike and , an influence region that corresponds to the distance within which two or more spikes are regarded as synchronous (i.e., for a time span , we would define the influence region of spike as the interval ). From the influence region we define the function as follows:

In what follows we will represent spikes by these maps (i.e., will be represented by above). We call functions of this form influence maps (and the windows of the form underlying them are called influence regions). Such functions constitute the building blocks of the synchrony model that we introduce in our companion paper [21], which is characterized by a graded notion of synchrony (which differs substantially from the intended notion of synchrony in this paper, which is bivalent): the degree of synchrony among two or more spikes is defined as the integral (i.e., area) of the intersection of their corresponding influence maps. Such degree is thus a value in the interval (e.g., if the time distance between any two spikes is greater than or equal to and if there is exact time synchrony between them).

Next we define as follows: where is the map corresponding to spike . In other words, any spike time that lies in an interval of the form , for a spike of neuron (and that, thus, should be regarded as synchronous with ), will be given, by , value .

3.2.1. Conditional Pattern Cardinalities (CPC2)

We introduce now a continuous version of the test statistic , in terms of influence regions and influence maps, which we denote by .

Formally, for , we define the values and as follows: with

Here is, as in previous statistics and all others that we will be presenting in this section, a weighting parameter that, for values greater than , weights large spike coincidences more strongly than smaller ones. As with , and measure average spike cardinalities (notice that gives us, at each , the number of influence regions corresponding to spikes of neurons in that overlap and, thus, the number of spikes that lie in the window ). As with and in , we expect to be bigger than if is an assembly neuron. Based on this intuition, we formally define the test statistic as follows, for :

3.2.2. Conditional Item Frequencies (CIF2)

We present now an adaptation of the test statistic to influence maps and a continuous domain, which we will denote by , and that responds to the same ideas as .

For each neuron we define the values and as follows:

We formally define the statistic as follows, for neuron : where is the boolean operator returns value if and otherwise.

3.2.3. Conditional Item Weight (CIW2)

A continuous version of the test statistic is that which we denote by .

In order to formalize our continuous version of the statistic we first define the values and as follows:

We define as follows, for neuron : where is the frequency .

As before, is the boolean operator returns value if and otherwise.

4. Computational Complexity

In this section we briefly analyze the complexity of computing our statistics.

First of all, if we take as reference the window set (i.e., binning), we have that , , and are linear in the number of windows in . Also, as it is probably clear, is constant in the number of neurons (only the pattern cardinality is taken into account; the composition of the pattern itself is irrelevant) and and are linear (since one needs to loop over the neurons). More formally, we have that the complexity of computing is at most of the order , where is the number of time bins, and that the complexity of and is of the order , where is the number of spikes and is the number of neurons. As for , it is quadratic in the number of time bins and linear in the number of neurons. More formally, we have that its complexity is of the order (this bound could be reduced by the size of the largest set of neurons that fires together in a window, which would replace ). If, instead, we consider the window set then we have that the computation of , , and is linear in the total number of spikes and that and are also linear in the number of neurons. Formally, the complexity of is of the order and that of and is of the order (where, as before, could be replaced by the largest number of neurons firing together in a window). The statistics , , and have the same complexities as its window-based counterparts.

5. Evaluation

In this section we show some results concerning the evaluation of our statistical tests on artificially generated collections of spike trains. Such artificially generated collections, in which all assemblies—and thus assembly neurons—are known, are necessary in order to assess whether our test statistics do what they are supposed to do which is to identify all assembly neurons and discard all those that are not. Only on such data a proper evaluation of our test statistics is possible.

For the results reported in this section we generate our collections of spike trains as follows: for each signature (where stands for the size of the neuronal group and for the number of spike coincidences injected) we generate trials, each consisting of spike trains (one for each neuron) independently generated as -second Poisson processes (i.e., ) of constant rate 20Hz (which represent the background activity), with injected spike coincidences of a particular -neuron pattern containing the neuron we are testing for (for the neurons with injected synchronous spikes, a corresponding number of background spikes were removed and thus the background firing of the assembly neurons was adjusted accordingly). In order to generate such coincidences a random choice of points in the interval is considered for each trial and added to the background spiking activity. In trials with nonexact coincidences (i.e., jittered trials, as opposed to nonjittered trials with exact coincidences) a random shift is added, which we model by means of a uniform random variable on the interval (i.e., ±1.5 maximal millisecond shift, in keeping with the time span and the corresponding length of windows and influence regions that we are considering for our statistics). More results and diagrams corresponding to artificially generated data with slightly different settings can be found in http://www.borgelt.net/docs/napa.pdf. The general conclusions that could be drawn from them do not differ from those reported here.

5.1. Significance

To estimate the distribution of the test statistics we generate surrogate data from our original spike trains as follows: modifications of the original data that are intended to keep all its essential features except synchrony among the neuron we are testing and the others (see, e.g., [22] or [23] for a survey and analysis of methods to generate surrogate data from parallel spike trains). In order to keep as many properties of the original data as possible we create only a surrogate train for the neuron we are currently testing, which replaces the original train. The trains of all other neurons are left unchanged. With the surrogate train the test statistic is recomputed. Generating a surrogate train and recomputing the test statistic are repeated 1000 times, in order to obtain an estimate of the distribution of the test statistic. We then determine the fraction of surrogate trains that produced a test statistic value exceeding the one obtained with the actual (real) train and thus obtain a value. Note that, for testing another neuron, the original (real) train of any neuron tested before is used. That is, no surrogate trains are evaluated for neurons other than the one to be tested.

5.2. Results

Figures 36 feature diagrams with rates of false negatives for each signature , with over the trials; that is, the rate of trials (over ) in which the tested neuron that belongs to the group with injected coincidences is not identified as an assembly neuron—on the understanding that a group of neurons of size at least with at least spike coincidences in our trials constitutes a potential neuronal assembly (see, e.g., [17] or [18] for a better insight). Maybe it is worth stressing that, if we were to test a neuron that does not belong to an assembly, it would be identified by our test statistics as an assembly neuron (i.e., a false positive) in about of our trials (which is probably clear, since this is our significance level, learned from uncorrelated trials).

In Figure 3 we show results for the window-based statistics , , , and on (i.e., when considering time binning for the identification of spike coincidences). The first two test statistics (i.e., and ) were already introduced and evaluated in a companion paper ([20]) on artificially generated trials based on slightly different—but essentially comparable—settings. As the diagrams in Figure 3 show, the two new test statistics and introduced in this paper report considerably lower rates of false negatives than those already introduced in [20] on nonjittered trials (the best performance being that of that, as was seen in the previous section, is more costly than the other three in terms of computational efficiency). The performance of all such statistics with respect to tends to be substantially better than statistics with for most signatures: for an increase in the exponent yields an increase in sensitivity towards smaller patterns (i.e., towards smaller values for ) while for such an increase yields an improvement in sensitivity towards a smaller number of coincidences (i.e., towards smaller values for ). combines both effects, since it combines pattern cardinality assessment and coincidence counts (which is precisely what was intended with the definition of this statistic). The effect of on is even higher, since it exploits cooccurrences not only of pairs but of larger groups of neurons.

Figure 4 shows results for the same window-based statistics on jittered trials. As can be expected, the performance of all such statistics worsens substantially when dealing with nonexact spike coincidences. This is due to the above mentioned boundary problem when using the window set (i.e., binning): two or more spikes can be less than milliseconds apart (in our evaluations ) but still lie in different windows and thus be regarded as nonsynchronous (a detailed analysis and quantification of the effect of the boundary problem can be found in our companion paper [24]).

In order to improve performance when dealing with nonexact spike coincidences and to overcome the boundary problem in binning we introduced an alternative window set for our window-based statistics and a time-continuous alternative to them by means of our test statistics , , and . Diagrams in Figure 5 show the performance of our window-based statistics , , and on ( becomes very inefficient on —due to the much larger number of windows and its quadratic complexity in the number of windows—and thus was not tested). Performance of such statistics on is, for most signatures, better than the corresponding performance on (more so with respect to ). As was mentioned, such improvement is mostly due to the fact that, by considering in place of , we identify all injected coincidences.

Diagrams in Figure 6 show the performance of our test statistics , , and . Overall, the performance of our window-based statistics on and the corresponding time-continuous statistics (based on influence regions and influence maps) are not clearly distinguishable from the diagrams and, among all test statistics introduced in this paper, and seem to yield the best results.

We are currently exploring possibilities to transfer the ideas on which is based to work with without incurring quadratic computational complexity and also in the time-continuous approach. Although it is unclear how the statistic could be expressed in terms of influence regions (in the time-continuous approach), with such a transfer one can hope to achieve even better performance, as was seen for when working with .

6. Conclusion

In this paper we have presented several test statistics to identify assembly neurons from multiple-electrode recordings. The aim of such statistics is to reduce the set of neurons to a relevant subset of them and in this way ease the task of identifying neuronal assemblies in further analyses (a task which, due to the large amount of neurons that can nowadays be recorded, is undermined by the computational explosion that comes from having to consider every possible subset of them as a potential neuronal assembly).

We have provided two types of statistics as follows: the window-based statistics (, , , and ) and the time-continuous statistics (, , and ). The former rely on a window-based approach to identify spike coincidences and the latter on what we called influence regions (i.e., a time span around each spike within which synchrony with other spikes is defined—two or more spikes are synchronous in these settings if their influence regions overlap). For the window-based statistics we considered two window sets in our evaluations as follows: a partition of the recording time of our spike data into equal intervals (which is called binning)—on which the bin-based model of synchrony relies in order to identify spike coincidences—and a collection of sliding windows (one for each spike time), able to account for all spike coincidences in our spike trains that fall within the window length, which is more in keeping with the common, intended characterization of spike synchrony in the field, which regards two or more spikes as synchronous if they lie within a certain distance from each other.

Two of the window-based statistics ( and ) were first presented and evaluated with binning in a companion paper ([20]) for artificially generated nonjittered trials (i.e., with exact spike coincidences injected). In this paper we have shown that the two novel window-based statistics here presented (i.e., and ) perform substantially better in such settings, in terms of rates of false negatives. Performance of the latter is still better on jittered trials, yet, in these settings, test statistics based on the sliding-window set and the time-continuous ones yield much better results, as was shown.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The work presented in this paper was partially supported by the Spanish Ministry for Economy and Competitiveness (MINECO Grant no. TIN2012-31372) and by the Goverment of the Principality of Asturias (Programa Asturias Grant no. CT14-05-2-06).