Abstract

Crowd behaviour challenges our fundamental understanding of social phenomena. Involving complex interactions between multiple temporal and spatial scales of activity, its governing mechanisms defy conventional analysis. Using 1.5 million Twitter messages from the 15M movement in Spain as an example of multitudinous self-organization, we describe the coordination dynamics of the system measuring phase-locking statistics at different frequencies using wavelet transforms, identifying 8 frequency bands of entrained oscillations between 15 geographical nodes. Then we apply maximum entropy inference methods to describe Ising models capturing transient synchrony in our data at each frequency band. The models show that all frequency bands of the system operate near critical points of their parameter space and while fast frequencies present only a few metastable states displaying all-or-none synchronization, slow frequencies present a diversity of metastable states of partial synchronization. Furthermore, describing the state at each frequency band using the energy of the corresponding Ising model, we compute transfer entropy to characterize cross-scale interactions between frequency bands, showing a cascade of upward information flows in which each frequency band influences its contiguous slower bands and downward information flows where slow frequencies modulate distant fast frequencies.

1. Introduction

Coordinated activity is a powerful force in creating and maintaining social ties [1]. From communal dances in ancient human groups to civic festivals in the French Revolution or goose-stepping formations and stiff right arm salutes in Nazi marches and rallies [1, p. 136, p. 148-149], visceral, emotional sensations of shared movement have been used to create communal identities and to shape political landscapes. Historically, forms of distributed communication and coordination have often come together with episodes of large-scale mobilization and social change, as the widespread print-shop networks of radical reforming movements with the generalization of the printing press during the 16th century German Reformation [2] or postal networks of the Republic of Letters in the Age of Enlightenment a century later [3]. Today, amidst unprecedented development of communication technologies, new forms of coordination for large and scattered communities have been unleashed around the globe.

The rise of new digital communication tools and network technologies is accelerating fast bidirectional communication, generating new forms of collective communication and action. Digital communications tools increase the autonomy and influence of the social groups making use of them. They do this by promoting forms of mass self-communication [4], processes of collective intelligence using pools of social knowledge [5], or smart mobs exploiting new found communication and computing capabilities via ubiquitous devices [6]. From protest movements including the Arab Spring or the Occupy movements to autonomous responses in the face of natural disasters (e.g., Hurricane Sandy or the Tōhoku earthquake), several examples highlight the increasing power of digitally connected social and political grassroots movements to shape events. Recognition of a growing influence has brought with it heightened scholarly interest in its explanation: how such movements arise and self-organize, what mechanisms underlie their formation, and how are they able to constitute autonomous social and political subjects? [7]. Recent advances have described specific elements of connected multitudes: the geographical diffusion of trends [8], the interplay between exogenous and endogenous dynamics [9], or the connection between social media and collective activity in physical spaces [10]. Nevertheless, many of the mechanisms so far explored are specific to a particular scale or level of description of social dynamics. General mechanisms offering explanatory insights across different levels remain poorly articulated. The same problem applies to qualitative analyses trying to capture general principles of connected multitudes. 
These include perspectives stressing the individualistic logic pervading digital communication tools operating through sharing personalized content in social media [11], in sharp contrast with narratives highlighting the powerful aggregating and unifying affordances of digital communication tools [12]. We argue that these tensions can be reconciled. Using the analogy of biological brains, what constitutes social collective “brains” as complex entities probably cannot be captured by a single level of description. Instead, it may involve the capacity to display coordination at multiple scales [13], perhaps resembling neural large-scale synchronization over multiple frequency bands [14–16]. Howsoever, the principles operating behind networks of connected multitudes require further conceptual and experimental development to address gaps in extant theory.

Propitiously, the rise of social media and digital data-mining creates the opportunity for a novel analysis of human social systems [17], providing mechanisms for explaining their behaviour and opening up the interactions between different scales of activity for detailed investigation. This opportunity provides an entry point into theoretical debates from where we can begin to generate hypotheses based on inferences from social experimental data. It is a position from which to undertake the difficult task of conceptualizing and describing the interwoven network of causal relations at different levels of description in social systems.

We use a data set of 1.5 million Twitter messages to explore transient phase-locking synchronization as a general mechanism explaining interactions within and between temporal scales. In particular, we use a well-known social event of large-scale social and political self-organization: the massive political protest of the 15M movement in Spain, emergent in the aftermath of the 2011 Arab Spring and widely thought to be facilitated through digital social platforms [18]. The exemplar of the 15M movement is interesting for a number of reasons: First, it consists in a self-organized social movement arising from online communication in a distributed network of citizens and civil associations (without significant coverage by mainstream media until days after the protests had taken to the streets). Second, the movement led to massive, nationwide demonstrations and encampments, creating a decentralized collective agency which has had a profound impact in Spanish politics [19–21]. Finally, a series of studies have characterized some of the emerging properties of the 15M and how it exhibits features typical of critical systems and distributed self-organization [22, 23].

Using this data set, we propose phase-locking statistics between geographical nodes at different frequencies as a generic description of coordination in a nationwide social system. This description allows us to use maximum entropy techniques to extract Ising models mapping the statistical mechanics of the system at each frequency band and thus obtain a deeper understanding of the spatiotemporal patterns of coordination within and between frequency bands. Inspecting the properties of the models at each frequency band, we observe that all bands are operating near a critical point but that different frequencies play different roles in the system. While fast bands alternate states of (almost) full synchronization and full desynchronization, bands with slower frequencies display a wide range of possible configurations of metastable states with clusters of partial synchronization. Furthermore, applying transfer entropy in the energy landscape at each frequency described by the Ising models, we characterize cross-scale interactions showing an asymmetry between upward and downward influences, where high-frequency synchronization influences nearby slower frequencies, while slow frequency bands are able to modulate distant faster bands. We argue that our results offer a promising step towards the description of general mechanisms operating at different scales, suggesting the existence of general rules for scaling up and down the dynamics of multitudinous collective systems.

2. Results

We use a data set of 1,444,051 time-stamped tweets from 181,146 users, collected through the Twitter streaming API between 13 May 2011 and 31 May 2011 [20] using T-Hoarder [24]. Messages were captured over 17 days during the Spanish 15M social unrest events in 2011, each containing at least one of a set of 12 keywords or hashtags related to the protest (see [20] for a detailed description). We extracted geographical information from the location information of users (see Supplementary Materials), selecting the 15 urban areas with the largest number of messages. Using this information, we generated time series reflecting the number of tweets emitted from each city in intervals of 60 seconds.

2.1. Synchronization at Multiple Frequencies

One of the most prominent features of the 15M movement was its fast territorial development. Without any coordination centre or any formal organization, the movement was able to reproduce a network of camps across Spanish cities in a period of a few days. As this coordination between geographical nodes takes place at several temporal scales, we propose a generic description of these interactions based on the temporal coordination of oscillations at multiple frequencies. We analyse the coordination between populations at main Spanish cities using Morlet wavelet filtering to extract the phase content $\theta_i(t, f)$ of the activity time series at city $i$ at time $t$ and frequency $f$, with a span of frequencies covering periods from 10 minutes to 3 hours, logarithmically distributed at regular intervals. We use phase-locking statistics [25] to define phase-locking values between two cities $i$ and $j$ as

$$\Phi_{ij}(t, f) = \frac{A_{ij}(t)}{\delta} \left| \sum_{t' = t - \delta/2}^{t + \delta/2} e^{\mathrm{i}\,(\theta_i(t', f) - \theta_j(t', f))} \right|,$$

where $\delta$ is the size of the window of temporal integration: $\delta = n_c / f$, $n_c$ being the number of cycles over which we analyse phase-locking. We choose $n_c$ to be similar to the values typically used in neuroscience, ensuring that we are detecting sustained synchronization. $A_{ij}(t)$ is a corrector factor removing spurious synchronization when the network is inactive (e.g., during nighttime, see Supplementary Materials).
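As a rough illustration of the windowed phase-locking value described above, the following sketch computes it directly from two phase series. It assumes the phases have already been extracted (for instance by wavelet filtering) and it omits the corrector factor for inactive periods; the function name, the window length, and the synthetic signals are all hypothetical choices made for the example.

```python
import cmath
import math


def phase_locking_value(theta_i, theta_j, t, delta):
    """Windowed phase-locking value between two phase series.

    theta_i, theta_j: phases in radians, one sample per time step.
    t: index at the centre of the window; delta: window length in samples.
    Returns a value in [0, 1]; 1 means a constant phase difference.
    """
    start = max(0, t - delta // 2)
    stop = min(len(theta_i), t + delta // 2 + 1)
    total = sum(cmath.exp(1j * (theta_i[s] - theta_j[s]))
                for s in range(start, stop))
    return abs(total) / (stop - start)


period = 60   # samples per cycle (hypothetical)
n_cycles = 5  # hypothetical number of cycles in the window
delta = n_cycles * period

# Same frequency with a fixed lag: the phases stay locked.
a = [2 * math.pi * s / period for s in range(600)]
b = [phase + 0.8 for phase in a]
# A 20% faster oscillation: the phase difference drifts steadily.
c = [1.2 * phase for phase in a]

plv_locked = phase_locking_value(a, b, 300, delta)
plv_drift = phase_locking_value(a, c, 300, delta)
```

In this toy case the lagged copy yields a value close to 1, while the drifting pair averages out to a value close to 0, which is the contrast the statistic is meant to capture.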

Statistical significance of phase-locking values is determined by comparing them to phase-locking values of surrogate time series obtained using the amplitude adjusted Fourier transform [26]. We use surrogate time series to estimate a significance threshold for the phase-locking values $\Phi_{ij}(t, f)$ for all values of the frequency $f$. The average phase-locking values of surrogate time series were used to compute a threshold $\Phi^{\mathrm{th}}(f)$, indicating a value higher than a fixed percentage of surrogate data. Using this threshold, we define phase-locking links $\phi_{ij}(t, f)$ between two cities $i$ and $j$ as statistically salient values of $\Phi_{ij}(t, f)$:

$$\phi_{ij}(t, f) = \begin{cases} 1 & \text{if } \Phi_{ij}(t, f) > \Phi^{\mathrm{th}}(f), \\ 0 & \text{otherwise.} \end{cases}$$
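The thresholding step can be sketched as follows, under the assumption that links are binary and that the threshold is a percentile of the surrogate values. The 95% cut-off, the function names, and the toy numbers are hypothetical; the generation of surrogates with the amplitude adjusted Fourier transform is not shown.

```python
def surrogate_threshold(surrogate_values, percent):
    """Value below which roughly `percent` % of the surrogate values fall."""
    ordered = sorted(surrogate_values)
    index = min(len(ordered) - 1, len(ordered) * percent // 100)
    return ordered[index]


def phase_locking_links(plv, threshold):
    """Binary link matrix: 1 where a pair of distinct nodes exceeds the threshold."""
    n = len(plv)
    return [[1 if i != j and plv[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical surrogate phase-locking values and a hypothetical 95% cut-off.
surrogates = [k / 100 for k in range(100)]
threshold = surrogate_threshold(surrogates, 95)

# Toy phase-locking values between three nodes at one time and frequency.
plv = [[1.0, 0.97, 0.30],
       [0.97, 1.0, 0.50],
       [0.30, 0.50, 1.0]]
links = phase_locking_links(plv, threshold)
total_links = sum(map(sum, links)) // 2  # each undirected link is counted twice
```

Here only the first two nodes exceed the surrogate threshold, so the toy network has a single phase-locking link.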

As we document in [27], using phase-locking statistics we find widespread moments of significant synchrony at different instants, often corresponding with important moments of the 15M protests. In addition, in the Supplementary Materials (Section S4), we provide an analysis of the stability of the synchronization patterns found by wavelet filtering in comparison with other choices of window width for filtering the data.

For illustrative purposes, in Figure 1 we show the total number of phase-locking links for a specific day of the protests. At faster frequencies (shorter periods), we observe short and less intense instants of synchrony, while at slower frequencies synchrony lasts for longer periods of time. Using wavelet pattern matching [28] over the total number of phase-locking links at each frequency, after applying a linear detrending, we detect frequency peaks of synchronization in the system (see Supplementary Materials, Figure S1 and Table S1), identifying eight main frequency bands of synchronization $f_k$, $k = 1, \ldots, 8$, where larger $k$ corresponds to larger timescales (i.e., slower frequencies).

2.2. Pairwise Maximum Entropy Modelling of Phase-Locking Statistics

In order to inspect how these phase-locked coalitions are operating at each frequency band, we derive from our data statistical mechanics models of the system. With these models we can infer macroscopic properties from microscopic descriptions of the system. Specifically, we use Ising models, which consist of discrete variables that traditionally are assumed to represent the magnetic moments of atomic spins that can be in one of two states (+1 or −1). In our case, positive spins will represent the presence of synchronizing activity of a node at a particular frequency. Spins are connected to other spins in the networks, allowing pairwise interaction between nodes. This is the least-structured (i.e., maximum entropy) model that is consistent with the mean activation rate and correlations of the nodes in the network. Pairwise maximum entropy models have been successfully used to map the activity of networks of neurons [29], antibody sequences [30], or flocks of birds [31]. These models, instead of being postulated as approximations of real phenomena, can infer exact mappings capturing measured properties of a system (means and correlations in our case), making them good candidates for capturing the structures underlying social coordination.

Using Ising models, we infer the probability distribution of possible states of the network at a specific synchronization frequency, corresponding to all the combinations of binary possibilities of each node being or not being phase-locked to other nodes in the network. For simplicity, we consider the state of a node equal to one when the node is active in a synchronized cluster (i.e., when it holds at least one phase-locking link with another node), and otherwise the node is set to the inactive state.

The maximum entropy distribution consistent with a known average energy is the Boltzmann distribution $P(\sigma) = \frac{1}{Z} e^{-\beta E(\sigma)}$, where $\sigma$ is a state of the network, $Z$ is the partition function, and $\beta = 1/(k_B T)$, $k_B$ being Boltzmann’s constant and $T$ the temperature. The energy of the model with pairwise interactions is defined as $E(\sigma) = -\sum_i h_i \sigma_i - \sum_{i<j} J_{ij} \sigma_i \sigma_j$, where “magnetic fields” $h_i$ represent influences in the activation of individual nodes and “exchange couplings” $J_{ij}$ stand for the tendencies correlating the activity between nodes. Without loss of generality, we can set the temperature $T = 1$. Considering a pairwise model, the resulting distribution of the maximum entropy model is

$$P(\sigma) = \frac{1}{Z} \exp\left[ \beta \left( \sum_i h_i \sigma_i + \sum_{i<j} J_{ij} \sigma_i \sigma_j \right) \right],$$

where $h_i$ and $J_{ij}$ are adjusted to reproduce the measured mean and correlation values between nodes in the network.
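For a handful of nodes, the Boltzmann distribution of a pairwise model can be computed by brute-force enumeration. The sketch below assumes spins taking values ±1, fields `h`, and couplings `J` stored in the upper triangle of a matrix; the parameter values are made up for illustration and are not those inferred from the data.

```python
import itertools
import math


def energy(state, h, J):
    """Pairwise Ising energy: E = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j."""
    n = len(state)
    field = sum(h[i] * state[i] for i in range(n))
    coupling = sum(J[i][j] * state[i] * state[j]
                   for i in range(n) for j in range(i + 1, n))
    return -field - coupling


def boltzmann(h, J, beta=1.0):
    """Probability of every state, P(s) = exp(-beta * E(s)) / Z."""
    states = list(itertools.product((-1, 1), repeat=len(h)))
    weights = [math.exp(-beta * energy(s, h, J)) for s in states]
    Z = sum(weights)  # partition function
    return {s: w / Z for s, w in zip(states, weights)}


# Three nodes, no fields, uniform positive couplings (hypothetical values).
h = [0.0, 0.0, 0.0]
J = [[0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
P = boltzmann(h, J)
most_likely = max(P, key=P.get)
```

With positive couplings and no fields, the two fully aligned states share the highest probability. Enumeration scales as $2^N$, so it is only practical for small networks such as the 15 nodes considered here.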

From the frequency bands extracted in the previous section, we extract models of pairwise correlations at the corresponding frequencies. For each frequency band, we infer an Ising model solving the corresponding inverse Ising problem, using a coordinate descent algorithm (see Methods) for fitting the parameters $h_i$ and $J_{ij}$ that reproduce the means and correlations found in the series of states for the description of phase-locking relations at each frequency.
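The inverse Ising problem amounts to adjusting the fields and couplings until the model reproduces the measured means and correlations. The paper uses a coordinate descent algorithm for this; the sketch below swaps in plain gradient ascent on the log-likelihood with exact enumeration, which is simpler to read but only feasible for small systems. The learning rate, number of steps, and the "true" parameters used to generate target moments are hypothetical.

```python
import itertools
import math


def model_moments(h, J):
    """Exact means <s_i> and correlations <s_i s_j> of a pairwise Ising model."""
    n = len(h)
    states = list(itertools.product((-1, 1), repeat=n))
    weights = []
    for s in states:
        e = -sum(h[i] * s[i] for i in range(n))
        e -= sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights.append(math.exp(-e))
    Z = sum(weights)
    means = [sum(w * s[i] for s, w in zip(states, weights)) / Z for i in range(n)]
    corrs = [[sum(w * s[i] * s[j] for s, w in zip(states, weights)) / Z
              for j in range(n)] for i in range(n)]
    return means, corrs


def fit_ising(data_means, data_corrs, rate=0.1, steps=2000):
    """Gradient ascent: move each parameter by the data-model moment gap."""
    n = len(data_means)
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        means, corrs = model_moments(h, J)
        for i in range(n):
            h[i] += rate * (data_means[i] - means[i])
            for j in range(i + 1, n):
                J[i][j] += rate * (data_corrs[i][j] - corrs[i][j])
    return h, J


# Target moments generated from a known model (hypothetical parameters).
true_h = [0.2, -0.1, 0.3]
true_J = [[0.0, 0.5, -0.3],
          [0.0, 0.0, 0.2],
          [0.0, 0.0, 0.0]]
target_means, target_corrs = model_moments(true_h, true_J)
fit_h, fit_J = fit_ising(target_means, target_corrs)
fit_means, fit_corrs = model_moments(fit_h, fit_J)
```

Because the log-likelihood of a maximum entropy model is concave in its parameters, this moment-matching update converges to the unique solution for a sufficiently small learning rate.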

The accuracy of the inferred models can be evaluated by testing how much of the correlation structure of the data is captured. One measure to evaluate this is the ratio of multi-information between model and real data [32]. In our case, our data limits us to computing the entropy of small sets of nodes (between 5 and 7). Limiting our entropy calculations to random sets of five to seven nodes (see Table S2), we can see in Figure S3 and Table S3 that our models are able to capture around 70% of the correlations in the data for subsets of the indicated sizes (see Supplementary Materials for a detailed description).

Once we have extracted a battery of models $P_k(\sigma)$, indicating the probability distributions of phase-locking configurations at different frequency bands, we explore the thermodynamic (macroscopic) properties associated with them. First, we observe that all the models are poised close to critical points. One signature of criticality we find is that the probability distribution of the states $\sigma$ follows Zipf’s law (Figure 2(a)), especially for slower frequency bands. Finding a scale-free distribution in our model is consistent with power laws appearing in the dynamics of the temporal series of tweet activity found in this data set [27] or in structural parameters in similar data sets [22]. Nevertheless, the sole occurrence of a power law is generally insufficient to assess the presence of criticality and may arise naturally in some nonequilibrium conditions. Thus, further evidence is necessary to test if the system is in a critical point.

The Ising model allows us to find further evidence of the critical behaviour of the model by exploring divergences of some variables in its parameter space. By introducing a fictitious temperature parameter $T$ (previously assumed to be equal to 1), we can explore the parameter space of the system and look for critical points. Modifying the value of $T$ is equivalent to a global rescaling of the parameters of the model, transforming $h_i \to h_i / T$ and $J_{ij} \to J_{ij} / T$, thus exploring the parameter space along one specific direction.

Specifically, a sufficient condition for describing a critical point in the parameter space of an Ising model is the divergence of its heat capacity, which is defined as

$$C(T) = T \frac{\partial S(T)}{\partial T},$$

where $S(T)$ is the Shannon entropy of the probability distribution of the Ising model at temperature $T$. A divergence in the heat capacity of the system is an indicator of critical phenomena. As we observe in Figure 2(b), for all frequency bands the peak of the heat capacity is around the value $T = 1$, suggesting that the models are poised near critical points. Inferring the Ising models to match random subsets of the network nodes (see Supplementary Materials), we observe how the normalized peak in the heat capacity averaged over 100 random models diverges with the system size (the specific, although representative, case of one frequency band is shown in Figure 2(c); see Figure S5 for other frequencies), where the peak grows with the system size at a nearly linear rate over the range of sizes analysed (see Supplementary Materials). Together with the Zipf distribution, the divergence of the heat capacity suggests that social coordination phenomena in the 15M social network are operating in a state of criticality [32].
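For a small model, the heat capacity can be evaluated exactly as a function of the fictitious temperature. The sketch below relies on the standard identity $C(T) = T\,\partial S/\partial T = \mathrm{Var}(E)/T^2$ (with $k_B = 1$) and scans a grid of temperatures to locate the peak. The six-node network with uniform couplings of 0.3 and the temperature grid are hypothetical and unrelated to the inferred models.

```python
import itertools
import math


def heat_capacity(h, J, T):
    """Heat capacity C(T) = Var(E) / T^2 of a pairwise Ising model (k_B = 1)."""
    n = len(h)
    energies = []
    for s in itertools.product((-1, 1), repeat=n):
        e = -sum(h[i] * s[i] for i in range(n))
        e -= sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        energies.append(e)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / T) for e in energies]
    Z = sum(weights)
    mean_e = sum(w * e for w, e in zip(weights, energies)) / Z
    mean_e2 = sum(w * e * e for w, e in zip(weights, energies)) / Z
    return (mean_e2 - mean_e ** 2) / T ** 2


# Fully connected toy network: 6 nodes, no fields, couplings of 0.3.
n = 6
h = [0.0] * n
J = [[0.3 if j > i else 0.0 for j in range(n)] for i in range(n)]

temperatures = [0.2 + 0.1 * k for k in range(99)]  # 0.2 up to 10.0
capacities = [heat_capacity(h, J, T) for T in temperatures]
peak_C = max(capacities)
peak_T = temperatures[capacities.index(peak_C)]
```

In a finite system the heat capacity cannot truly diverge, so in practice one looks at how the height of this peak scales as the number of nodes grows, as done in the analysis of random subsets of nodes.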

The fact that all frequency bands are operating near critical points does not mean that they are displaying the same behaviour. We can extract more information about the behaviour of the system at each frequency by analysing the presence of locally stable or metastable states in the system. Metastable states are defined as states whose energy is lower than that of any of their adjacent states, where adjacency is defined by single spin flips. This means that in a deterministic system (i.e., a Hopfield network with $T = 0$) these points would act as attractors of the system. In our statistical model, metastable states are points in which the system tends to be poised, since their probability is higher than that of any of their adjacent states. Finding the metastable states of the models at each frequency, we observe how the number of metastable states increases for slower frequencies (Figure S4(B)), as the model presents a higher number of negative (inhibitory) couplings (see Figures S4(A) and S6(B)). A detailed list of metastable states and their basins of attraction can be found in Table S4.
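The definition of metastable states translates directly into a search over all configurations: a state qualifies when every single spin flip raises the energy. The sketch below assumes spins in ±1 and uses two made-up four-node networks, one uniformly coupled and one split into two weakly coupled clusters, to show how the coupling structure changes the set of metastable states.

```python
import itertools


def energy(state, h, J):
    """Pairwise Ising energy with couplings stored in the upper triangle of J."""
    n = len(state)
    e = -sum(h[i] * state[i] for i in range(n))
    e -= sum(J[i][j] * state[i] * state[j]
             for i in range(n) for j in range(i + 1, n))
    return e


def metastable_states(h, J):
    """States whose energy is lower than that of all single-spin-flip neighbours."""
    found = []
    for state in itertools.product((-1, 1), repeat=len(h)):
        e = energy(state, h, J)
        neighbours = (state[:k] + (-state[k],) + state[k + 1:]
                      for k in range(len(state)))
        if all(energy(nb, h, J) > e for nb in neighbours):
            found.append(state)
    return found


h = [0.0, 0.0, 0.0, 0.0]

# Uniform positive couplings (hypothetical): all-or-none configurations only.
J_uniform = [[1.0 if j > i else 0.0 for j in range(4)] for i in range(4)]
uniform_states = metastable_states(h, J_uniform)

# Two tightly coupled pairs joined by weak couplings (hypothetical).
J_clustered = [[0.0, 1.0, 0.1, 0.1],
               [0.0, 0.0, 0.1, 0.1],
               [0.0, 0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0, 0.0]]
clustered_states = metastable_states(h, J_clustered)
```

The uniformly coupled network has only the two fully aligned states as local minima, whereas the clustered network also admits states in which each pair is internally aligned but the pairs disagree, a toy analogue of partial synchronization.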

Moreover, if we count the number of nodes that are phase-locked (i.e., the number of nodes in the active state) for each metastable state represented in Figure 3, we observe important distinctions among frequency bands. For faster frequency bands, there are only a few metastable states: a state where no nodes are phase-locked (i.e., the system is completely desynchronized) and a few states where almost all nodes are phase-locked. Thus, at fast frequencies synchronization rapidly spreads from zero to all nodes in the network. On the other hand, for slower frequencies the number of metastable states grows and the number of phase-locked nodes for each state decreases. This shows that slow-frequency synchronization allows the creation of a variety of clusters of partial synchronization, allowing parts of the network to sustain a differentiated behaviour.

These results suggest that fast and slow synchronization frequencies in the network operate in complementary regimes, all operating near critical points, the former rapidly propagating information to all the network and the latter sustaining a variety of configurations responding to specific situations. Systems in critical points present a wide range of dynamic scales of activity and maximal sensitivity to external fluctuations. These features may be crucial for large systems that are self-organized in a distributed fashion. The presence of these complementary modes of critical behaviour at different frequency bands suggests that the system might be operating in a state of self-organized criticality, in which frequency bands adaptively regulate each other in order to maintain a global critical behaviour.

2.3. Cross-Scale Interactions between Frequency Bands

Modelling phase-locking statistics provides a characterization of the interactions within frequency bands of synchronization. Furthermore, differences in the metastable states at each frequency band suggest what kind of interactions take place between distinct temporal scales. Because our definition of phase-locking statistics is restricted to interactions within the same frequency, we cannot use the computed phase-locking statistics to directly model interscale phase-locking between different frequencies (e.g., 2 : 1 phase-locking). However, we can use the thermodynamic descriptions of the system provided by maximum entropy models to simplify the analysis of interscale relations in real data.

Analysis of multiscale causal relations is typically a difficult task, and in our case we have to deal with a system with a high number of dimensions (15 nodes at each of 8 frequency bands). Nevertheless, the Ising models describe the stability of the configurations of the 15 nodes in the network at each frequency band with an energy value. Thus, an easier way to describe multiscale interactions is to observe how fluctuations in the energy at one level affect the energy of the system at other levels, reducing the dimensions we have to deal with to only the frequencies of synchronization.

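We characterize the information flow between frequency bands using transfer entropy [33] between energy levels at each frequency band. Transfer entropy captures the decrease of uncertainty in the state of a variable $X$ derived from the past state of another variable $Y$:

$$T_{Y \to X}(\tau) = \sum_{x_{t+\tau},\, x_t,\, y_t} P(x_{t+\tau}, x_t, y_t) \log \frac{P(x_{t+\tau} \mid x_t, y_t)}{P(x_{t+\tau} \mid x_t)},$$

where $x_t$ denotes the state of $X$ at time $t$ and $\tau$ indicates the temporal distance used to capture interactions.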
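A plug-in estimate of transfer entropy for discrete series can be sketched as below. It assumes the lagged form in which the future of the target at distance `tau` is conditioned on the present of the target and of the source, estimates probabilities by simple counting, and reports the result in bits. The function name and the synthetic binary series are hypothetical; in the paper the inputs would be the discretized energy values of two frequency bands.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target, tau=1):
    """Plug-in transfer entropy from `source` to `target` at lag tau, in bits."""
    triples = Counter()
    for t in range(len(target) - tau):
        triples[(target[t + tau], target[t], source[t])] += 1
    total = sum(triples.values())

    future_and_past = Counter()   # counts of (x_future, x_now)
    past_and_source = Counter()   # counts of (x_now, y_now)
    past = Counter()              # counts of x_now
    for (x_future, x_now, y_now), count in triples.items():
        future_and_past[(x_future, x_now)] += count
        past_and_source[(x_now, y_now)] += count
        past[x_now] += count

    te = 0.0
    for (x_future, x_now, y_now), count in triples.items():
        p_joint = count / total
        p_given_both = count / past_and_source[(x_now, y_now)]
        p_given_past = future_and_past[(x_future, x_now)] / past[x_now]
        te += p_joint * math.log2(p_given_both / p_given_past)
    return te


# Synthetic check: the target copies the source with a delay of one step.
rng = random.Random(0)
source = [rng.randint(0, 1) for _ in range(5000)]
target = [0] + source[:-1]

te_forward = transfer_entropy(source, target, tau=1)
te_backward = transfer_entropy(target, source, tau=1)
```

In this synthetic case the forward estimate is close to 1 bit and the backward estimate close to 0, reflecting the direction of the dependence. Plug-in estimates are biased for short series or many bins, which is one reason to keep the number of discretization bins small.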

In order to compute transfer entropy over energy values between timescales, we discretize the values of energy into a variable with 3 discrete bins using the Jenks-Caspall algorithm [34]. The value of 3 bins was selected to optimize the computation of joint probability density functions (see Supplementary Materials), although we tested values from 2 to 6 bins with similar results. Using transfer entropy, we estimate the causal interactions between energetic states at each timescale by computing the values of transfer entropy between each pair of frequency bands (see Figure S7 for a representation of transfer entropy functions) for values of the temporal distance $\tau$ from 1 minute up to 8.5 hours, logarithmically distributed at regular intervals.

To simplify the interpretation of the data, we compute the average value of transfer entropy (across the logarithmic range of $\tau$) for each pair of frequency bands (Figure 4(a)). Moreover, we separate the values of upward and downward flows of information for each node as a function of the distance between frequency bands, and upward and downward entropies are divided by their maximum values in order to compare transfer entropy between nodes with distinct values of entropy. In Figure 4(b), we observe upward and downward flows of information. As we can see, upward flows decrease markedly with distance between scales. In contrast, downward flows increase slightly with distance between scales.

These results show an interesting picture of cross-scale interactions. While in upward interactions energy at each frequency band only influences neighbouring slower bands, in downward interactions slow frequency bands modulate distant faster bands. We also observe this in the schematic in Figure 5(a), where for simplicity only the largest values of upward and downward transfer entropy are displayed for each frequency band. These results suggest that there might be general rules for scaling up and scaling down social coordination dynamics in a nested structure of frequency bands. The mechanisms involved might resemble those found in neuroscience, where upward cascades have been found to take place in the form of avalanches propagating local synchrony and downward cascades take the form of phase-amplitude modulation of local high-frequency oscillations by large-scale slow oscillations [16]. Future research is required for testing the application of these rules to other social coordination phenomena and the specific mechanisms operating behind upward and downward cross-scale interactions.

3. Discussion

It is appealing to think that general coordinative mechanisms may be suited to explain the behaviour of social systems at different scales. Here, using a large-scale social media data set, we have shown how the application of maximum entropy inference methods over phase-locking statistics at different frequencies offers the prospect of understanding collective phenomena at a deeper level. The presented results provide interesting insights about the self-organization of digitally connected multitudes. Our contribution shows that phase-locking mechanisms at different frequencies operate in a state of criticality, rapidly integrating the activity of the network at fast frequencies while building up an increasing diversity of distinct configurations at slower frequencies. Moreover, the asymmetry between upward and downward flows of information suggests how social systems operating through distributed transient synchronization may create a hierarchical structure of temporal scales, in which hierarchy is not reflected in a centralized control but in the asymmetry of information flows between the coordinative structures at different frequencies of activity. This offers a tentative explanation of how a unified collective agency, such as the 15M movement, might emerge in a distributed manner from mechanisms of transient large-scale synchronization. Of particular interest would be to test the extent to which our findings about the structural and functional relations of social coordination apply to other self-organizing social systems, or their relation with mechanisms of cross-scale interactions known from large-scale systems neuroscience. A new generation of experimental findings based on statistical mechanics models may provide the opportunity to discover the mechanisms behind multitudinous social self-organization.

4. Methods

4.1. Data Availability

The data employed in this study was kindly provided by the authors of [20].

4.2. Learning Pairwise Maximum Entropy Models from Data

Ising models are inferred using an adapted version of the coordinate descent algorithm described in [35]. The coordinate descent algorithm works by iteratively adjusting a single weight $h_i$ or $J_{ij}$ that will maximize an approximation of the change in the empirical logarithmic loss between the observed data and the model, computed through the means and correlations present in the empirical data and the model. The code implementing the coordinate descent algorithm is available at https://github.com/MiguelAguilera/Rhythms-of-the-Collective-Brain-code/.

Conflicts of Interest

The author declares no competing financial interests.

Acknowledgments

This research was supported in part by the Spanish National Programme for Fostering Excellence in Scientific and Technical Research Project PSI2014-62092-EXP, Projects TIN2016-80347-R and FFI2014-52173-P funded by the Spanish Ministry of Economy and Competitiveness, and the UPV/EHU postdoctoral training program ESPDOC17/17.

Supplementary Materials

Table S1: frequencies of salient synchronization. Table representing the frequency values corresponding to the peaks represented in Figure S1(B).

Figure S1: peaks of salient synchronization. (A) Sum of total phase-locking links for each value of frequency (solid line). We detect a log-linear trend that we remove for detecting synchronization peaks. (B) Detrended sum of phase-locking links for each value of frequency (solid line). Synchronization peaks found using a two-dimensional wavelet transform (black dots).

Figure S2: stability of synchronization patterns. Average value of the sum of derivatives of the salient values of synchronization for different multiplicators of the width of the wavelet windows. We find that small or large multiplicators reduce the stability of salient synchronization patterns, suggesting that wavelet filtering is a good strategy for defining the windows for phase-estimation filtering.

Table S2: number of state transitions for each frequency. Number of transitions between states from the data used for computing the Ising model at each selected frequency.

Figure S3: accuracy of the model. Values of the multi-information ratio calculated for Ising models computed from subsets of combinations of nodes at each frequency. Each boxplot represents the distribution of the ratio for 100 randomly selected subsets of nodes at a specific frequency band, comparing the multi-information of the model and real data.

Figure S4: negative couplings and number of metastable states. (A) Ratio of negative couplings for the inferred values of $J_{ij}$ for each frequency. (B) Count of the number of metastable states for each frequency.

Figure S5: divergence of the heat capacity of the system. (A) Normalized heat capacity of the Ising models for sizes 6, 9, and 12 (averaged over 100 random models) and 15, where the larger peaks correspond to larger sizes. (B) Linear trend (solid line) of the peaks of the heat capacity (dots) with respect to the size of the system.

Table S3: distributions of multi-information ratios. Mean and standard deviation for each distribution in Figure S3.

Figure S6: parameters of the Ising models. For each frequency, we depict the parameters $h_i$ (A) and $J_{ij}$ (B) of the inferred Ising models.

Figure S7: transfer entropy. We represent transfer entropy functions for values of the temporal distance $\tau$ from 1 minute to 8.5 hours, logarithmically distributed at regular intervals. Rows and columns specify the two frequency bands of each pair. For each graph, the vertical axis represents the value of transfer entropy and the horizontal axis the value of $\tau$ in minutes.

Table S4: metastable states. Metastable state (where positive spins are marked with 1s and negative with 0s), probability of the metastable state, and basin of attraction of the metastable state.