Abstract

We have retrieved and analyzed several millions of Twitter messages corresponding to the Spanish general elections held on the 20th of December 2015 and repeated on the 26th of June 2016. The availability of data from two electoral campaigns that are very close in time allows us to compare collective behaviors of two analogous social systems with a similar context. By computing and analyzing the time series of daily activity, we have found a significant linear correlation between both elections. Additionally, we have revealed that the daily number of tweets, retweets, and mentions follow a power law with respect to the number of unique users that take part in the conversation. Furthermore, we have verified that the topologies of the networks of mentions and retweets do not change from one election to the other, indicating that their underlying dynamics are robust in the face of a change in social context. Hence, in the light of our results, there are several recurrent collective behavioral patterns that exhibit similar and consistent properties in different electoral campaigns.

1. Introduction

Nowadays, social networking sites (SNS) are a well-established communication medium. They are used by a huge user base to share experiences, discuss opinions, read the news, etc. Twitter is one of the most dynamic SNS with respect to the interactions among users and one of the most powerful with respect to the potential information that can be extracted for research purposes. Moreover, in the last quarter of 2015 this social network had 305 million monthly active users, and the latest data show that this number had risen to 335 million by the second quarter of 2018 [1].

These facts have stimulated the development of research projects from a wide variety of fields, from sociology to network science, that have provided new perspectives to the study of user behavior. These studies unravel the emergent patterns in our collective behavior and show how they can be used to gain insight about relevant social topics like economy [2], marketing [3], politics [4, 5], mobility [6], or polarization [7, 8].

In this work, we are interested in electoral processes. In these kinds of contexts, people use social media as a communication medium to exchange opinions. However, there are also some users whose purpose is to influence the conversation in a way that may affect the voting choices of the people. Although these are online actions, they have an impact on the offline world.

The concise character of a tweet turns it into a powerful tool to share breaking news and take part in dynamic debates [9]. That is why it is one of the most used media by journalists, politicians, and politics enthusiasts to share their views and news. Nowadays, even traditional media often use Twitter as a news source. Furthermore, politicians consider Twitter one of the most consequential social media for their job [10], although other online networking sites like Facebook also hold a high relevance.

The Obama campaign of 2008 established, for the first time, the efficacy of Twitter as a communication medium in a political context [11, 12]. Since then, it has gained more and more relevance in political campaigns. This trend has also inspired an ever-growing number of academic works that use Twitter data to analyze different aspects of political campaigns.

Some lines of research are centered around predicting the outcome of elections following diverse techniques [4, 5, 13]. Other works investigate the interaction among parties to assess the cohesion of the parties [14] or the level of debate [5]. There are also studies that analyze the content of the tweets in order to detect their sentiment [15–17]. Two extensive reviews that cover a variety of subjects within the area can be found in [18, 19].

In a previous paper [5] the Spanish general elections of 2011 were studied using Twitter data. That work was focused on the prediction of the outcome of the election using a metric that measures political sentiment. The authors also explore the interactions among users by analyzing the structural and dynamical patterns of the complex networks emergent from the mention and retweet networks. From the study of the communication dynamics among politicians, they found a lack of debate, and a network growth model was proposed to reproduce these interactions.

Although we compare some of our results to those presented in that work, in this study we do not aim to predict the outcome of the elections or simply study the relations among users and politicians. We take advantage of the availability of Twitter data gathered during two electoral campaigns (the Spanish general elections of 2015 and the repetition of the elections in 2016) that are very close in time to compare the collective user behavior manifested in two analogous social systems with a similar context. The individual activity and the political actors may be different; in fact, in the second election two parties formed a coalition, altering the political landscape. Our objective is to study and characterize emergent behaviors that are recurrently manifested in political contexts.

To this end, we have computed and compared time series of daily activity for both elections, finding that the temporal series of activity for both electoral periods are significantly correlated. Furthermore, our results suggest that the number of tweets, retweets, and mentions follow a power law with respect to the number of unique users that take part in the conversation. In order to explore the evolution of the interactions among users, we have built networks of mentions and retweets and studied their temporal evolution. This has enabled us to verify that they show similar topological properties in both electoral periods. Besides, we have studied the mention and retweet subgraphs induced by political users and obtained results that imply a lack of communication among different parties and are in agreement with previous works [5]. Furthermore, we have found that an intensification of the interaction can be detected between parties after the formation of a coalition.

The paper is organized as follows. In the second section we describe the political context, some relevant aspects of the interaction mechanisms in Twitter, the characteristics of our dataset, and the methodology followed to build the networks of interactions. In the third section, we present and discuss our results with respect to the user activity and the evolution of the mention and retweet networks. We also use a metric to discuss the influence of regular users and politicians. Furthermore, we analyze the degree of debate among politicians of different parties. Finally, we summarize the main conclusions.

2. Materials and Methods

2.1. Political Context

Spain has a bicameral parliamentary system, where the lower house is called Congress of Deputies and the upper house, the Senate. For elections to the Congress of Deputies, held every four years, each of the 50 provinces serves as an electoral district, with the number of deputies representing it determined by its population. Under a proportional representation electoral system governed by the d’Hondt formula, ballots are cast for a provincewide party list rather than for candidates representing individual constituencies.

About four-fifths of the members of the Senate are directly elected via a plurality system at the provincial level. Each province is entitled to four representatives; voters cast ballots for three candidates, and those with the most votes are elected. The remainder of the senators are appointed by the regional legislatures. Because representation is not based upon population, in the Senate smaller and more-rural provinces generally are overrepresented in relation to their overall population [21]. Hence, with the purpose of illustrating the Spanish political landscape, in this work we have chosen to report only the results for the Congress of Deputies, as the distribution of seats is supposed to be more representative.

On the 20th of December 2015, the Spanish general elections were held. The PP (Partido Popular, People’s Party) and the PSOE (Partido Socialista Obrero Español, Spanish Socialist Workers’ Party), which constituted the traditional two-party system, had lost a lot of social support, while the emerging parties Podemos (We Can) and Cs (Ciudadanos, Citizens) were on the rise. This caused a transition from a two-party system to a multiparty system [22].

In spite of that, the PP, which was holding the government, was still leading the polls. The rest of the parties were behind but not too far away. In fact, the supports for the other three main parties fluctuated so much during the year before the election that it seemed impossible to predict, by looking at the polls, which would be the final ranking of votes [23].

The result was a fragmented parliament where no party held an absolute majority and large coalitions were needed to form a government. The votes and seats that each party obtained are displayed in Table 1. After several months of negotiations there was no agreement between any group of parties large enough to obtain a parliamentary majority that would allow the formation of a government. This led to the announcement of a new election on the 26th of June 2016.

Before this new election, one of the emerging parties, Podemos, formed a coalition with IU (Izquierda Unida, United Left). This alliance was called UP (Unidos Podemos, United We Can).

The 2016 election resulted in a parliament that was almost as fragmented as the one in 2015. The votes and seats obtained by each party are presented in Table 1. In October 2016, the PP finally won the investiture vote and formed a minority government. (In June 2018, a motion of no confidence against the ruling PP succeeded and the PSOE formed a minority government.)

2.2. Description of the Data

We have worked with Twitter messages retrieved with the Twitter Streaming API. This API allows downloading tweets matching a set of keywords. In order to avoid biased results, we have chosen the following neutral keywords to filter the messages:

(i) Keywords for the 2015 election: 20D, 20D2015, #EleccionesGenerales2015

(ii) Keywords for the 2016 election: 26J, 26J2016, #EleccionesGenerales2016, #Elecciones26J

We have downloaded tweets during a period of more than two months before and after each election. However, the core of our analysis has been focused on the 15 days of the official electoral campaign, the reflection day (day before the election), the election day, and the day after. During that period of 18 days we have retrieved 1793145 tweets for the 2015 election and 1755438 for the 2016 election.

Besides the keywords used to retrieve the data, the most used hashtags that appear in our dataset are the following:

(i) Top hashtags of 2015: #podemos, #psoe, #partidopopular, #ciudadanos, #l6elecciones, #7deldebatedecisivo, #votapsoe, #possible, #pp, #mivotocuenta, #españa, #españaenserio, #podemos20dic, #20dicpodemos, #podemosremontada, #votapp, #hevotado, #votapodemos20d

(ii) Top hashtags of 2016: #afavor, #unidospodemos, #l6elecciones, #votapsoe, #Debate13j, #psoe, #cambioamejor, #votapp, #españa, #ciudadanos, #partidopopular, #elcanvipossible, #unsiporelcambio, #avotar, #lasonrisadeunpais, #brexit, #eleccionesgenerales

We have also compiled lists of Twitter accounts associated with the four main parties (PP, PSOE, Podemos and Cs) and with IU. The latter is relevant because, in the 2016 election, as we explained in Section 2.1, this party formed a coalition (UP) with Podemos, a fact that is reflected on the data. These lists of accounts allow us to analyze the difference of behaviors of regular users and politicians.

In order to build the set of political accounts, we have looked into the Twitter lists (which are lists of accounts elaborated by the users) defined by relevant official accounts associated with each party. We have downloaded those lists that include politicians, political institutions, or supporters of the party. The total number of retrieved political users that participated in the conversation was 5227 in 2015, with an average number of followers of 4044, and 5012 in 2016, with an average number of followers of 4662.

2.3. How Does Twitter Work?

In this section we briefly describe some of the characteristics of Twitter, specifically the means of communication among users, which are the source used to build the networks of interactions.

In Twitter there are several mechanisms of interaction among users. The first one is to follow other users: a user receives every tweet posted by the accounts she follows. Besides, there are five other mechanisms of interaction within the social network: the direct message, the mention, the retweet, the quote, and the reply. We will focus on the mentions and the retweets, as they are the most widely used.

The mention is a public direct communication mechanism. It consists of including the name of a user in a tweet ([..]@username[…]). This way, the mentioned user will receive the message even if she is not following the user that posted it. There exists a convention in Twitter according to which, whenever a given user is mentioned inside a tweet, her @username is used instead of her real name, independently of whether the intention is to establish a communication with the target user or not.

The retweet is a broadcast mechanism. Every message retweeted by a user is forwarded to her followers in the same way as an original tweet. In a political context, retweeting usually, but not always, implies the endorsement of the ideas contained in the message.

2.4. Building the Networks of Mentions and Retweets

Here we will explain the methodology that we have adopted to build the networks of interactions from Twitter data. The interactions that we have analyzed are the mention and the retweet.

We have built mention networks by considering each user participating in the conversation as a node. Two nodes j and i are joined with a directed link from j to i when user j mentions user i; that is, j posts a tweet including @name_of_i. The result is a directed and weighted network where the weight of each link corresponds to the total number of tweets where j mentions i gathered during a given interval of time. The in-degree, or $k_{in}$, of a user in this network corresponds to the total number of users that have mentioned her. The out-degree, $k_{out}$, corresponds to the total number of users that she has mentioned.
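The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a hypothetical list of (author, text) pairs, mentions are extracted with a simple regular expression (a real pipeline would more likely read the mention metadata delivered with each tweet), and usernames are lowercased because Twitter handles are case-insensitive.

```python
import re
from collections import Counter

# Assumption: a mention is "@" followed by word characters.
MENTION_RE = re.compile(r"@(\w+)")

def build_mention_network(tweets):
    """Return {(j, i): weight}, where weight = number of tweets in which j mentions i."""
    weights = Counter()
    for author, text in tweets:
        # A set, so that repeating @i inside one tweet is counted once.
        for target in {name.lower() for name in MENTION_RE.findall(text)}:
            weights[(author.lower(), target)] += 1
    return weights

def in_out_degree(weights):
    """k_in: distinct users that mention a node; k_out: distinct users it mentions."""
    k_in, k_out = Counter(), Counter()
    for source, target in weights:
        k_out[source] += 1
        k_in[target] += 1
    return k_in, k_out

# Hypothetical toy conversation.
tweets = [
    ("ana", "@bob what do you think about the debate?"),
    ("ana", "@bob @carla see you at the polls"),
    ("carla", "I agree with @bob"),
]
weights = build_mention_network(tweets)
k_in, k_out = in_out_degree(weights)
```

In this toy conversation the link from ana to bob has weight 2, while bob has in-degree 2 because two distinct users mention him.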

The retweet network is built in a similar way as the mention network. The nodes are the users and user j is joined with a directed link to user i if j retweets a message originally posted by i. If a user j retweets an original tweet by i and a user k retweets the message broadcast by j, there are links from j to i and from k to i, but not from k to j. Figure 1 illustrates this mechanism.

Let us stress this idea: links in the retweet network join retweeters with original posters; there are no links between the middlemen that broadcast the original message. The reason for this choice of methodology is that the Twitter API used to download the data at that moment only provided information about the original poster of a tweet in the retweet metadata. The retweet network is then a directed and weighted network where the weights of the links from j to i are the number of messages of i retweeted by j during a given period of time. The in-degree, $k_{in}$, of a user in this network corresponds to the total number of users that have retweeted her messages. The out-degree, $k_{out}$, corresponds to the total number of users whose messages she has retweeted. The retweet mechanism usually results in networks that display large star subgraphs.
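A minimal sketch of this rule follows. The record layout and the field names `user` and `original_user` are hypothetical stand-ins for the retweet metadata; the only property taken from the text is that each retweet names its original poster and not the middleman.

```python
from collections import Counter

def build_retweet_network(records):
    """Return {(retweeter, original_poster): number of retweeted messages}."""
    weights = Counter()
    for record in records:
        weights[(record["user"], record["original_user"])] += 1
    return weights

def in_strength(weights):
    """Total number of retweets received by each original poster."""
    strength = Counter()
    for (_, poster), weight in weights.items():
        strength[poster] += weight
    return strength

# Hypothetical records: k sees i's tweet through j's retweet,
# but the metadata still points to i.
records = [
    {"user": "j", "original_user": "i"},  # j retweets a tweet by i
    {"user": "k", "original_user": "i"},  # k retweets it via j: link k -> i
    {"user": "k", "original_user": "i"},  # k retweets a second tweet by i
]
weights = build_retweet_network(records)
s_in = in_strength(weights)
k_in_of_i = sum(1 for (_, poster) in weights if poster == "i")
```

Because the middleman never appears in the records, no link from k to j can be created, which is what produces the star-like subgraphs mentioned above.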

We have considered two different temporal scales. On one hand, we have built daily networks, counting the 24 hours of a day from 4 AM (UTC) (in Spain, the local time corresponds to UTC+1 for winter time and UTC+2 for summer time) of that day until 4 AM of the following day. This way we capture full human activity cycles. This approach is based on a work by Morales et al. [24], where human synchronization is studied at several scales, finding strong periodicities linked to day-night and social cycles as well as a developing global synchronization that adds a new perspective to the phenomenon of globalization. On the other hand, we have built aggregated networks for the 15 days of electoral campaign plus the next three days (before election, election, and after election).
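The daily windows can be obtained by shifting each timestamp back by four hours before taking its date. The sketch below assumes that timestamps are available as timezone-aware UTC `datetime` objects; the function names are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY_START_HOUR_UTC = 4  # each "day" runs from 4 AM UTC to 4 AM UTC

def activity_day(timestamp_utc):
    """Date of the 4 AM to 4 AM (UTC) window that contains the timestamp."""
    return (timestamp_utc - timedelta(hours=DAY_START_HOUR_UTC)).date()

def daily_activity(timestamps):
    """Number of messages in each daily window."""
    return Counter(activity_day(t) for t in timestamps)

# Hypothetical timestamps around the 2015 election day.
stamps = [
    datetime(2015, 12, 20, 3, 59, tzinfo=timezone.utc),  # still the 19 Dec window
    datetime(2015, 12, 20, 4, 0, tzinfo=timezone.utc),   # first minute of the 20 Dec window
    datetime(2015, 12, 21, 2, 30, tzinfo=timezone.utc),  # late night, 20 Dec window
]
counts = daily_activity(stamps)
```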

3. Results

3.1. Temporal Evolution of User Activity

In order to characterize the global activity of the users (number of tweets posted during a given period of time), we have computed time series of total daily activity for the whole period considered. Additionally, we have analyzed the temporal evolution of the distribution of daily activity per user.

In the left panels of Figure 2 we have plotted the time series of daily activity for both electoral campaigns. We have aligned them such that the voting day coincides with day 0. As we can see in that figure, before the electoral campaign the time series of daily activity present low values, but they start to rise around the onset of both campaigns.

In line with the existing literature [5, 25, 26], the features of the time series of activity can be explained to a large extent by offline events. There is a huge peak of messages on the election day, which accounts for more than 30% of all messages during the period of study (36% in the 2015 election and 32% in 2016). There are also smaller peaks during the period of the electoral campaign that can be explained by the main electoral debates.

In Spain, preelectoral silence is mandatory during the period that spans from 12 AM of the day before the election to the closing of the polls. This period is called reflection day, and a number of restrictions apply to the behavior of political parties and media. Among others, electoral events cannot be held and the spread of electoral propaganda is forbidden [27]. The time series experiences a decrease that day, which may be triggered by the fact that politicians cease to campaign in both online and offline media. Because of this lack of stimulus, the conversation among common users decays.

It is worth pointing out the similarity of the temporal evolution of the user activity in both elections. In Figure 2, we have plotted (in logarithmic scale) the time series of one period against the other such that the election days coincide. By performing a linear regression of these data, a significant linear correlation has been found. We obtain when we perform the linear regression including the day of the elections and if we exclude it. However, notice that the slopes present only a difference of . This correlation can be partially explained by the similarity of the contexts and by the recurrent structure of the electoral campaign. It always starts 16 days before the election and lasts for 15 days. Moreover, the main debates were both held 13 days before the election. Due to the fact that the number of tweets in the 2016 election was slightly lower than in 2015, the slope of the linear regression is lower than 1.
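The comparison of the two aligned series amounts to an ordinary least-squares fit on logarithmic axes. The sketch below uses only the standard library; the daily counts are synthetic numbers generated for illustration and are not taken from the dataset.

```python
import math

def loglog_regression(x, y):
    """Least-squares fit of log10(y) against log10(x): (slope, intercept, Pearson r)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Synthetic daily activity, aligned so that the same index is the same campaign day.
activity_first = [1000, 2000, 5000, 20000, 600000]
activity_second = [0.8 * v ** 0.95 for v in activity_first]  # built with slope 0.95

slope, intercept, r = loglog_regression(activity_first, activity_second)
```

A slope below 1 on these axes means that the second series grows more slowly than the first as activity increases, which is the situation described above for 2016 relative to 2015.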

Another relevant property of the user activity is displayed in Figure 3, where we show that the total number of tweets, retweets, and mentions per day follows a power law with respect to the number of unique users each day. Accordingly, we have fitted our data to a power law with exponent $\alpha$. The values of the exponent and correlation coefficients of the fits are displayed in Table 2. To understand this behavior we have followed the work by Leskovec et al. [28], where it is shown that when real-world networks evolve through time, the number of links $L(t)$ scales with the number of nodes $N(t)$ as

$$L(t) \propto N(t)^{\alpha}.$$

Notice that, whereas in 2015 the growth for the three quantities was slightly super-linear with respect to the number of users, in 2016 we observe an approximately linear behavior. Hence, in 2015 when more users join the conversation, the activity experiences a proportionally higher increment than in 2016.
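As a worked illustration of what super-linear growth means, suppose that the daily activity $A$ scales with the number of unique users $U$ as a power law with exponent $\alpha$. The symbols $A$ and $U$ and the value $\alpha = 1.1$ are assumptions chosen for illustration; they are not the fitted values of Table 2. Doubling the number of users then multiplies the activity by $2^{\alpha}$:

```latex
A \propto U^{\alpha}
\quad\Longrightarrow\quad
\frac{A(2U)}{A(U)} = 2^{\alpha},
\qquad
2^{1} = 2 \ \ (\text{linear}),
\qquad
2^{1.1} \approx 2.14 \ \ (\text{super-linear}).
```

With a linear law the activity merely doubles, whereas with the hypothetical super-linear exponent it grows by about 7% more than that.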

In order to further explore the characteristics of the user behavior, we have also analyzed the temporal evolution of the distribution of the daily user activity shown in the left panels of Figure 4. The activity distributions show a heterogeneous character, a result that is in agreement with the literature [29]. We have fitted them to the following discrete power law:

$$p(x) = \frac{x^{-\gamma}}{\zeta(\gamma, x_{\min})},$$

where in this case $x$ is the daily activity and $\zeta(\gamma, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\gamma}$ is the generalized Riemann (Hurwitz) zeta function.

In the right panels of Figure 4 we have plotted the temporal evolution of the exponent of the power law. It has been computed by numerically solving the following equation to obtain the maximum likelihood estimator [30] (MLE) $\hat{\gamma}$ of $\gamma$:

$$\frac{\zeta'(\hat{\gamma}, x_{\min})}{\zeta(\hat{\gamma}, x_{\min})} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i,$$

where the prime denotes differentiation with respect to the first argument and $n$ corresponds to the number of users. The uncertainty of $\hat{\gamma}$ has been computed as follows:

$$\sigma = \frac{1}{\sqrt{n \left[ \dfrac{\zeta''(\hat{\gamma}, x_{\min})}{\zeta(\hat{\gamma}, x_{\min})} - \left( \dfrac{\zeta'(\hat{\gamma}, x_{\min})}{\zeta(\hat{\gamma}, x_{\min})} \right)^{2} \right]}}.$$

This technique has been shown to be more precise than a least-squares fit to the log-log plot, which usually yields incorrect results when computing the parameters of power law distributions [30]. Every other exponent of power laws of discrete data has been computed in the same way. Since the minimum value of activity is 1 (and is also the most abundant in the data), we have fixed $x_{\min} = 1$ when computing the MLE of $\gamma$.
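The estimation procedure can be sketched with the standard library alone. The sketch below is not the implementation used for the figures: it assumes that the exponent lies between 1.05 and 6, approximates the zeta function with a partial sum plus an Euler-Maclaurin tail, finds the MLE by minimizing the negative log-likelihood with a golden-section search (equivalent to solving the MLE equation), and obtains the uncertainty from a finite-difference second derivative of ln ζ. The sample is synthetic.

```python
import math
import random

def zeta(gamma, x_min=1, terms=1000):
    """Hurwitz zeta, sum over k >= x_min of k**-gamma (partial sum + Euler-Maclaurin tail)."""
    n = x_min + terms
    head = sum(k ** -gamma for k in range(x_min, n))
    tail = n ** (1 - gamma) / (gamma - 1) + 0.5 * n ** -gamma + gamma * n ** (-gamma - 1) / 12
    return head + tail

def neg_log_likelihood(gamma, log_sum, n, x_min=1):
    """Negative log-likelihood of the discrete power law p(x) = x**-gamma / zeta."""
    return n * math.log(zeta(gamma, x_min)) + gamma * log_sum

def fit_discrete_power_law(data, x_min=1, lo=1.05, hi=6.0, tol=1e-6):
    """Return (gamma_hat, sigma) for the values of data that are >= x_min."""
    xs = [x for x in data if x >= x_min]
    n = len(xs)
    log_sum = sum(math.log(x) for x in xs)
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:  # golden-section search; the objective is convex in gamma
        c, d = b - phi * (b - a), a + phi * (b - a)
        if neg_log_likelihood(c, log_sum, n, x_min) < neg_log_likelihood(d, log_sum, n, x_min):
            b = d
        else:
            a = c
    gamma_hat = (a + b) / 2
    # sigma = 1 / sqrt(n * d^2/dgamma^2 ln zeta), with a central finite difference.
    h = 1e-3
    log_zeta = lambda g: math.log(zeta(g, x_min))
    curvature = (log_zeta(gamma_hat + h) - 2 * log_zeta(gamma_hat) + log_zeta(gamma_hat - h)) / h ** 2
    return gamma_hat, 1 / math.sqrt(n * curvature)

# Synthetic sample drawn from a discrete power law with exponent 2.5
# (support truncated at 10000, which has a negligible effect here).
rng = random.Random(0)
support = range(1, 10001)
sample = rng.choices(support, weights=[k ** -2.5 for k in support], k=5000)
gamma_hat, sigma = fit_discrete_power_law(sample)
```

The second derivative of ln ζ equals the bracketed term in the uncertainty formula, so the finite difference is just a numerical shortcut for it.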

We can see that the values of the exponent fluctuate within a small interval () during the considered period for both elections. These values are perfectly compatible with those presented in [5], where the authors obtained a value of .

The small fluctuations of the exponent tell us that the collective behavior does not change much from one day to the next. Since the exponent of the distribution controls how fast it decays, a smaller exponent means that the activity reaches higher values (and vice versa). Consequently, Figure 4 shows a low exponent on election day, due to the increase in activity. Additionally, during the electoral silence, consistently with the results shown in Figure 2, we observe a decrease in activity. Notice, however, that while in Figure 4 the points corresponding to the electoral silence are the highest, they are not the lowest in the activity time series of Figure 2. This implies that, although each individual user tends to post fewer messages that day, there are still many users taking part in the conversation.

3.2. Temporal Evolution of Mention and Retweet Networks

We have analyzed the temporal evolution of the aggregated mention and retweet networks at two different temporal scales. On one hand, we have aggregated the networks for the whole campaign period (plus the next three days); on the other hand, we have performed an analysis of the temporal evolution of the networks by aggregating the data for each day separately and computing time series for different metrics.

In Figure 5 we show the strongly connected component of the aggregated networks of 2015. Colors correspond to the communities computed with the Louvain algorithm [20]. We have obtained a modularity of for the retweet network and for the mention network. Other community detection algorithms were applied obtaining analogous results [31]. We have indicated the most probable affiliations of the nodes of each community. In order to do that we have visually inspected which nodes (or users) are the most central in each community. The centralities of the nodes have been computed with the metric [32], which measures the influence of a node based on its neighborhood. The sizes of the nodes in Figure 5 have been represented proportional to .

Every well-defined group seems to correspond to a political party. Whereas in the representation of the mention network the different groups are well-defined and the most central nodes correspond to the leaders of the communities and are politicians or political parties, in the retweet network a mixing of nodes of different communities placed in the center of the representation can be appreciated. Most of these nodes present high centralities and belong to different communication media. This is in good agreement with the literature [33].

We have computed several statistical properties of these networks: the number of nodes and links, the density, the average clustering coefficient, and the in- and out-degree distributions. We have found that the degree distributions are very heterogeneous and fitted them to a power law following the methodology described in Section 3.1. The in-degree distribution is more heterogeneous than the out-degree distribution; that is, the exponent of the in-degree distribution is smaller than that of the out-degree distribution. This is due to the fact that the out-degree is associated with individual efforts whereas the in-degree corresponds to collective actions. An individual is physically limited to posting a given number of tweets during a given period of time, whereas a large group of users may post many more messages in the same period of time. Because of this, high in-degree values are more probable than high out-degree values, leading to the aforementioned relationship between the exponents.

The properties of the aggregated networks are displayed in Table 3, where we can appreciate clear differences between the mention and the retweet networks. First of all, the former presents higher average and maximum degree. Whereas the higher in-degree exponent of the mention network shows that its in-degree distribution is slightly less heterogeneous than the one for the retweets, the lower value of the out-degree exponent implies that the out-degree distribution is more heterogeneous for mentions than for retweets. The first phenomenon can be attributed to the construction methodology of the networks: since all retweeters are linked to the original poster, high values of the in-degree are slightly more common in the retweet network. The second phenomenon arises from the fact that there can be more than one mention per tweet and in every retweet there is at least one mention of the original poster.

The higher average clustering for mention networks can be explained by the different uses of mentions and retweets. Mentions are a communication and allusion mechanism whereas retweets are used mainly to broadcast messages, leading to more clustered networks in the first case and to more star-like subgraphs in the second. Nevertheless, the construction mechanism of the networks also plays a relevant role, since the relationships between middlemen are not present in the retweet network.

The results show very little change from one election to the other, suggesting that the underlying dynamics of the networks of interactions are consistent and, to some extent, independent of the context.

With respect to the temporal evolution of the exponent of the in-degree distribution for mention and retweet networks, which is shown in Figure 6, it remains approximately constant for 47 days before the election (for simplicity, only 16 days before the election are shown in the figure). The average values during that period are and for the mention networks and and for retweet networks, with a confidence interval of . These results indicate a high consistency of the user behavior for both elections. The exponents are slightly higher to the one of the aggregated networks because in those networks almost everyone has more mentions, making the tail of the distribution longer and heavier.

In the case of the temporal evolution of the out-degree distribution exponent, displayed in Figure 6, it also remains approximately constant for 47 days before the election. The average values during that period are and for the mention networks and and for the retweet networks, with a confidence interval of . Analogously to the in-degrees, the similarity of the values of the exponents from one year to the other suggests a recurrent behavioral pattern. They are also similar to the values of the aggregated network, but higher, meaning that the heterogeneity is lower. The reason is again the construction mechanism of these networks. People are active during all the period under study, producing more links each day, thus slightly changing the distribution to a more heterogeneous one as more days are aggregated.

We have explored the degree correlations of the networks and their evolution in several ways. First, we have computed the degree assortativities [34], presented in Table 4, and compared them with assortativities of randomized networks via a Z-Score in order to determine if the assortativities are a result of the degree distributions alone or of more complex correlations. The Z-Scores can be consulted in Table 5. The randomization has been carried out using 500 realizations of the directed configuration model [35] implemented in the NetworkX Python module [36].
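The Z-Score procedure can be illustrated with a dependency-free sketch. It is not the NetworkX-based implementation used for the tables: the directed configuration model is emulated by shuffling the targets of the edge list (which preserves every in- and out-degree and, like stub matching, allows self-loops and repeated links), only the correlation between source out-degree and target in-degree is computed, and the edge list is a hypothetical toy graph.

```python
import math
import random
from collections import Counter

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def degree_assortativity(edges):
    """Pearson correlation, over links, of source out-degree and target in-degree."""
    k_out = Counter(s for s, _ in edges)
    k_in = Counter(t for _, t in edges)
    return pearson([k_out[s] for s, _ in edges], [k_in[t] for _, t in edges])

def configuration_model(edges, rng):
    """Random rewiring that keeps the in- and out-degree of every node."""
    targets = [t for _, t in edges]
    rng.shuffle(targets)
    return [(s, t) for (s, _), t in zip(edges, targets)]

def assortativity_z_score(edges, realizations=500, seed=0):
    rng = random.Random(seed)
    observed = degree_assortativity(edges)
    null = [degree_assortativity(configuration_model(edges, rng)) for _ in range(realizations)]
    mean = sum(null) / len(null)
    std = math.sqrt(sum((v - mean) ** 2 for v in null) / (len(null) - 1))
    return observed, (observed - mean) / std

# Hypothetical toy graph: six users point to a hub, plus a few extra links.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"), ("e", "hub"),
         ("f", "hub"), ("hub", "x"), ("hub", "y"), ("x", "y"), ("a", "x")]
observed, z = assortativity_z_score(edges)
```

A Z-Score far from zero indicates that the observed correlation is not explained by the degree sequence alone; the toy graph is too small for its value to be significant and only demonstrates the mechanics.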

As we can see in Table 4, the assortativities display a recurrent order among networks and years: . Consistently with the rest of the measures presented so far, they exhibit a high similarity among years. Moreover, when we compare the results of the degree correlations of the retweet network to those of the 2011 election [5], we see that they are very similar. This result suggests a recurrent behavior in the communication patterns of the users.

This order is maintained for the Z-Scores (see Table 5), meaning that the assortative relations are still more assortative than what we should expect for a random network and that the disassortative ones are more disassortative.

With respect to the temporal evolution of the daily degree assortativities that is displayed in Figure 7, we can see that they fluctuate around a more or less fixed value. However, these fluctuations are sometimes strong enough to alter the previously described order.

These fluctuations seem to be caused by disruptive events, both exogenous and endogenous. On day -6 of 2015 a debate was celebrated between the leader of the PP (then, the president) and the leader of the PSOE. A similar pattern, but weaker, can be observed in the time series of assortativity on day -13, which coincides with the other important debate of the campaign. On day -11 of 2016 we have found a viral tweet that contained a comical video about Spanish politics. The retweets received by that particular tweet amounted to of all the tweets published that day in the analyzed conversation, when the average proportion of retweets received by the most popular tweets each day during the 2016 campaign was .

3.3. Communication Efficiency

In order to measure the global influence of a user on the network, we have used the user efficiency metric [29]. In Twitter, a message can be considered to have reached a higher impact if it gets a high number of retweets (RTs). The higher a user's average number of RTs per original tweet, the more efficient the user is. Hence, the efficiency of a user is defined as follows:

$$\eta = \frac{s_{in}}{A},$$

where $s_{in}$ is the in-strength of the user in the retweet network and $A$ is her activity, that is, the total number of posted messages. It should be noted that . Therefore, this measure represents the collective response to the individual action.
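As a minimal sketch of the definition, the efficiency of each user is the number of retweets she receives divided by the number of messages she posts. The input layout, a list of (author, retweeted user or None) pairs, is a hypothetical simplification of the tweet metadata.

```python
from collections import Counter

def user_efficiency(messages):
    """Efficiency = in-strength in the retweet network / activity, for every active user."""
    activity = Counter(author for author, _ in messages)  # messages posted
    in_strength = Counter(original for _, original in messages if original is not None)
    return {user: in_strength[user] / count for user, count in activity.items()}

# Hypothetical conversation: ana posts two tweets and collects three retweets.
messages = [
    ("ana", None), ("ana", None),
    ("bob", "ana"), ("carla", "ana"), ("dan", "ana"),
    ("bob", None),
]
efficiency = user_efficiency(messages)
```

Here ana reaches an efficiency of 1.5, while the users who only retweet or are never retweeted have efficiency 0.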

We have computed the distribution of efficiency for the global conversation including regular users and the politicians that participated in it. In order to compare the behavior of regular users with politicians, we have also considered the distribution of efficiency corresponding to the groups of accounts of the four main political parties described in Section 2.2.

In Figure 8 we represent, for both electoral campaigns, the probability distributions of efficiency for Twitter accounts associated with each party and for the whole set of users. This distribution corresponds to the probability mass function. There, we can appreciate that the distributions follow the same functional form for common users and users associated with political parties. In particular, they show a heterogeneous behavior characterized by a heavy tail with power law decay. This has been tested by fitting the tail of the distribution to different functions (exponential, gamma, lognormal, and power law) and choosing the best one according to the Akaike Information Criterion (AIC) [37]. The transition to the power law happens around . The same kind of behavior is observed in both elections. Notice that the shape of the efficiency distribution is not specific of our electoral context, since this structure has been observed in efficiency distributions corresponding to Twitter conversations of different natures and sizes [29].

We have compared the efficiency patterns of the political accounts of each party to those of the whole set of users for both elections. In order to do that, we have divided the interval that spans all the possible values of efficiency (shown in Figure 8) into logarithmic bins. Then, for each bin, we have computed the probability of finding an account of a given party and the probability of finding a user account belonging to the global conversation. Finally, we have plotted the difference between these two probabilities for the four main parties, as shown in Figure 9.

In order to assess the significance of this result, we have taken 100 samples of 1000 randomly chosen users and computed their efficiency probability differences with respect to the bulk of users as described above. The average values of the differences and their standard deviations are represented, respectively, as blue dots and a blue-grey shadow in Figure 9.
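The binning and the random-sample baseline described in the two preceding paragraphs can be sketched as follows. The function names, the sign convention of the difference (party minus global), the bin range, and the toy efficiency values are all assumptions made for the example; the sample sizes are scaled down from the 100 samples of 1000 users used in the text.

```python
import bisect
import math
import random
import statistics

def log_bin_edges(lo, hi, n_bins):
    """Edges of n_bins bins equally spaced in log10 between lo and hi (lo > 0)."""
    step = (math.log10(hi) - math.log10(lo)) / n_bins
    return [10 ** (math.log10(lo) + k * step) for k in range(n_bins + 1)]

def bin_probabilities(values, edges):
    """Fraction of values in each bin; values outside [edges[0], edges[-1]] fall in no bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            k = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[k] += 1
    return [c / len(values) for c in counts]

def baseline(global_values, edges, n_samples, sample_size, seed=0):
    """Mean and standard deviation, per bin, of (random sample - global) differences."""
    rng = random.Random(seed)
    p_global = bin_probabilities(global_values, edges)
    diffs = []
    for _ in range(n_samples):
        sample = rng.sample(global_values, sample_size)
        p_sample = bin_probabilities(sample, edges)
        diffs.append([ps - pg for ps, pg in zip(p_sample, p_global)])
    columns = list(zip(*diffs))  # one column of differences per bin
    return [statistics.mean(c) for c in columns], [statistics.pstdev(c) for c in columns]

# Hypothetical efficiencies: the party accounts are shifted towards high values.
edges = log_bin_edges(0.01, 100, 4)
global_values = [0.05] * 75 + [0.5] * 75 + [5] * 25 + [50] * 25
party_values = [0.5, 5, 5, 50]

p_global = bin_probabilities(global_values, edges)
p_party = bin_probabilities(party_values, edges)
difference = [pp - pg for pp, pg in zip(p_party, p_global)]

means, stds = baseline(global_values, edges, n_samples=100, sample_size=50)
```

A party whose differences lie well outside the band spanned by `means` plus or minus `stds` departs from what a random group of users of comparable size would show.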

In that figure, it can be noticed that accounts belonging to political parties tend to exhibit higher probabilities than regular users in the high-efficiency region and lower probabilities in the low-efficiency region. The behavior of this measure is similar for all the parties. According to this result, the collective reactions to the communication strategies of each party are comparable. Additionally, it is shown that, in general, political accounts are more efficient in propagating messages than regular users.

3.4. Communication among the Elites

While most of the users act as passive listeners or broadcasters, Twitter conversations are usually driven by a small elite of influential accounts [5, 38]. The composition of such an elite varies depending on the main subject of the conversation and on the interaction medium [33]. As we have shown in the previous section, politicians are among the most efficient users, implying that they are highly influential. Given the relevance of this group of users, in this section we study the communication dynamics among them. To this end, we have retrieved lists of user accounts associated with each party following the process described in Section 2.2.

In order to analyze the communication among politicians, we have computed the subgraphs induced by the user accounts associated with political parties. Then, we have grouped nodes belonging to the same party in supernodes, obtaining a C-network. The resulting supernodes correspond to groups of users and the weights of the links among those supernodes are the sum of the weights of all the links that join a user from one group with a user of another in the original network. The resulting colored adjacency matrices are displayed in Figure 10, where the numbers are the weights of the links, that is, the total number of times that users of the party in a row have mentioned (retweeted) a user of the party in a column. The color is related to the proportion of mentions (retweets) from party $i$ to party $j$ relative to the total number of mentions (retweets) by party $i$, such that if $w_{ij}$ is an element of the adjacency matrix, the color of the cell would be proportional to $w_{ij} / \sum_k w_{ik}$.
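The aggregation into supernodes and the row normalization used for the cell colors can be sketched as follows. The function names, the edge-list layout `(source, target, weight)`, and the toy accounts are assumptions made for the example.

```python
from collections import defaultdict

def c_network(edges, party_of):
    """Aggregate a weighted directed network into party-level supernodes.

    edges: iterable of (source_user, target_user, weight).
    party_of: dict mapping political accounts to their party; links touching
              any other account are dropped (induced subgraph).
    """
    weights = defaultdict(float)
    for source, target, w in edges:
        if source in party_of and target in party_of:
            weights[(party_of[source], party_of[target])] += w
    return dict(weights)

def row_proportions(weights):
    """Divide each link weight by the total outgoing weight of its source party."""
    row_totals = defaultdict(float)
    for (source_party, _target_party), w in weights.items():
        row_totals[source_party] += w
    return {(s, t): w / row_totals[s] for (s, t), w in weights.items()}

# Hypothetical accounts: "x" is a regular user, so its link is ignored.
edges = [("a1", "a2", 3), ("a1", "b1", 1), ("b1", "b2", 4), ("x", "a1", 5)]
party_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

weights = c_network(edges, party_of)
proportions = row_proportions(weights)
```

In this toy case three quarters of the interactions started by party A stay within the party, which is the quantity that the cell color encodes.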

As we can see, in line with previous works [5, 15], there is an almost complete lack of communication between accounts of different parties. The diagonal of the adjacency matrix (the self-links) holds the heaviest weights by more than two orders of magnitude. For the retweet networks, there are almost no links at all between different parties. Taking into account the fact that retweeting a message normally implies an endorsement of the ideas of the original poster, this is not surprising. In the case of the mention networks, there are more message exchanges between parties than in the retweet networks.

We have included the party IU in this analysis to study the effect of its agreement with Podemos to form a coalition (UP) in the 2016 election. The results displayed in Figure 10 show that, although both parties were already interacting in 2015, they exchanged many more messages in 2016.

We have also computed the evolution of the assortative mixing [39] by political affiliation using two different partitions of the nodes. In the first partition we assign each node to its own party, such that we have five groups of nodes corresponding to PP, PSOE, Cs, Podemos, and IU. This is the partition by party. In the second partition, we assign the nodes belonging to IU and Podemos to the same group, leaving the rest of the nodes in their own parties. Hence, we have four groups that correspond to PP, PSOE, Cs, and UP = (Podemos+IU). We have called this second partition the coalition partition.

The assortative mixing is a metric used to test whether the links of a network preferentially join nodes of the same kind, preferentially join nodes of different kinds, or are placed at random. In our case, the different kinds would be the different parties in one case and the coalitions in the other. In order to compute the assortative mixing, the nodes are classified in groups and the proportions $e_{ij}$ of strength (that is, the total weights of the links) from group $i$ to group $j$ are computed. In our case, since we have already aggregated the nodes by party in the C-networks, we can use the elements $w_{ij}$ of the adjacency matrix of such C-networks to determine $e_{ij}$:

$$e_{ij} = \frac{w_{ij}}{\sum_{k,l} w_{kl}}.$$

Then, $a_i = \sum_j e_{ij}$ is the proportion of links with an origin node of type $i$ and $b_j = \sum_i e_{ij}$ the proportion of links with a target node of type $j$. If we ignore the network structure, the probability of finding a link with origin node of type $i$ and target node of type $j$ would be $a_i b_j$. Taking that into account, the assortative mixing of a network is defined as follows:

$$r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}.$$

This metric takes the value $r = 1$ for a perfectly assortative network, $r = 0$ for a random network, and $r = r_{\min} = -\sum_i a_i b_i / (1 - \sum_i a_i b_i)$, which lies in the range $-1 \le r_{\min} < 0$ [39], for a perfectly disassortative one.
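The computation can be sketched as follows, using the standard definition of [39]: the group-to-group weights are normalized by the total weight, and the fraction of weight that stays inside groups is compared with the fraction expected from the row and column sums alone. The function names and the link weights are made up for illustration; the example also shows how merging two groups yields the coalition partition.

```python
def assortative_mixing(weights, groups):
    """Assortativity coefficient from group-to-group link weights.

    weights: dict {(origin_group, target_group): total weight}; every key is
             assumed to involve only groups listed in `groups`.
    """
    total = sum(weights.values())
    e = {(i, j): weights.get((i, j), 0) / total for i in groups for j in groups}
    a = {i: sum(e[(i, j)] for j in groups) for i in groups}  # share of links leaving i
    b = {j: sum(e[(i, j)] for i in groups) for j in groups}  # share of links entering j
    within = sum(e[(i, i)] for i in groups)
    expected = sum(a[i] * b[i] for i in groups)
    return (within - expected) / (1 - expected)  # undefined if there is a single group

def merge_groups(weights, mapping):
    """Relabel groups through `mapping` (unlisted groups are kept) and add up weights."""
    merged = {}
    for (i, j), w in weights.items():
        key = (mapping.get(i, i), mapping.get(j, j))
        merged[key] = merged.get(key, 0) + w
    return merged

# Hypothetical weights: Podemos and IU exchange some messages, PP is isolated.
weights = {
    ("Podemos", "Podemos"): 40, ("IU", "IU"): 20, ("PP", "PP"): 30,
    ("Podemos", "IU"): 5, ("IU", "Podemos"): 5,
}
r_party = assortative_mixing(weights, ["Podemos", "IU", "PP"])

coalition_weights = merge_groups(weights, {"Podemos": "UP", "IU": "UP"})
r_coalition = assortative_mixing(coalition_weights, ["UP", "PP"])
```

With these invented numbers the exchange between Podemos and IU counts as cross-group traffic in the partition by party but as internal traffic in the coalition partition, so `r_coalition` is larger than `r_party`.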

In Figure 11 we have plotted the temporal evolution of the assortative mixing by party and coalition. As we can see, the assortative mixing by party and coalition remains very high for mentions and retweets. The retweet network is however the most assortative, often reaching values of .

The drop in assortativity the day after the elections means that politicians of different parties interacted more with each other that day than during the campaign. This seems to be caused partially by some exchange of messages commenting on the consequences of the election results. There are tweets containing criticism of adversaries, congratulation messages to related parties, and tweets trying to convince or push potential allies to form coalitions. Notice, however, that this decrease, although significant, is not large: the value reached in 2015 is around 0.87 and in 2016 it is 0.89. Consequently, we attribute the decrease both to the drop in the number of posted messages the day after the elections, which makes the data noisier, and to an increase in message exchange between parties.

The most remarkable feature of these time series is the difference between the elections of 2015 and 2016. In the latter, the parties IU and Podemos formed a coalition, a fact that is reflected here in the following way: whereas in 2015 coalition and party assortativities are almost equal, in 2016 the coalition assortativity is clearly higher for both networks. This means that, in 2016, the communication between users from Podemos and IU is high enough to lower the assortativity computed with the partition by party.

In order to assess the relevance of this effect, we have performed a paired t-test on the assortativity time series, pairing each assortativity value from 2015 with its counterpart from 2016. The null hypothesis is that the assortative mixing values for both elections have the same expected value. The results presented in Table 6 imply, for a statistical confidence of , a clear rejection of the null hypothesis for the party time series, while in the case of the coalition time series the null hypothesis cannot be rejected.
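The statistic behind a paired t-test can be sketched with the standard library alone. The function name and the two short series below are invented for illustration and are not the values of Table 6. Turning the statistic into a p-value requires the Student's t distribution, which a statistics package can provide (for instance, `scipy.stats.ttest_rel` performs the whole test).

```python
import math
import statistics

def paired_t_statistic(x, y):
    """t statistic and degrees of freedom of a paired t-test on two equal-length series."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    differences = [xi - yi for xi, yi in zip(x, y)]
    n = len(differences)
    # Mean difference divided by its standard error (sample standard deviation / sqrt(n)).
    t = statistics.mean(differences) / (statistics.stdev(differences) / math.sqrt(n))
    return t, n - 1

# Hypothetical assortativity values for matching campaign days.
series_2015 = [0.95, 0.96, 0.94, 0.97]
series_2016 = [0.93, 0.95, 0.91, 0.96]
t_value, dof = paired_t_statistic(series_2015, series_2016)
```

Pairing the values day by day removes the variability that both campaigns share, so the test focuses on the systematic shift between the two elections.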

4. Conclusions

Our main goal in this work was to perform a comparative analysis of user behavior on Twitter in two consecutive electoral campaigns in order to detect correlations and recurrent patterns. To this end, we have analyzed time series and interaction networks corresponding to two Twitter datasets downloaded during the Spanish electoral campaigns of 2015 and 2016. Although the individual activity and the political actors may have changed, we have shown evidence of recurrent activity patterns in different political campaigns. In particular, the activity time series for both elections exhibit a significant correlation. Moreover, we have found a power law relationship between the daily rate of tweets (retweets and mentions) and the number of unique users. Finally, besides the behavioral stabilities mentioned above, we have been able to detect the effect of a political coalition in the interaction networks through the study of the evolution of their properties.

The results that we have obtained from the computation and analysis of the daily user activity time series for both elections indicate that they present a significant linear correlation. Additionally, by studying the distribution of user activity we have found that in both elections its exponent fluctuates in the same tight interval. The value of the exponent obtained in a previous work [5] also lies within this interval. These facts suggest the existence of recurrent activity patterns in different political campaigns.

We have shown that the daily rates of tweets, retweets, and mentions follow a power law with respect to the number of unique users that participated in the conversation each day. However, whereas in 2015 the growth of the three quantities was slightly super-linear with respect to the number of users, in 2016 we observe an approximately linear behavior. Hence, in 2015, when more users join the conversation, the activity experiences a proportionally higher increment than in 2016.

We have assessed the consistency of the topology of the mention and retweet networks from one election to the other by computing the degree distribution and the degree correlations of the aggregated networks. The variation of the power law exponent of the degree distributions from one electoral period to the other is at most 1%, whereas the degree correlations shift by less than 10% from one year to the other. The values of these properties are also comparable to the results obtained in a previous work with a similar political context [5]. This indicates that the underlying interaction dynamics are robust in the face of a change in social context. However, the analysis of the daily evolution of the degree assortativity of the networks has enabled us to detect fluctuations that seem to be caused by disruptive events, both exogenous and endogenous, and that have a relevant impact on the political conversation.

By computing the distribution of the user efficiency for regular users and the accounts associated with each party, we have shown that its functional form is not dependent on the chosen group of users or on the particular electoral period under study. This adds further evidence of the universality of the efficiency patterns shown by Morales et al. [29], where conversations of different natures and sizes were analyzed. We have also computed the differences between users belonging to the global conversation and politicians and found that, in the case of the latter, high efficiencies have a higher probability with respect to regular users whereas low efficiencies present lower probabilities. The behavior of this measure is similar for all the parties. According to this result, the collective reactions to the communication strategies of each party are comparable. Additionally, it is shown that, in general, political accounts are more efficient in propagating messages than regular users. Politicians are aware of the relevance of social media and know how to leverage their power.

Our analysis of the mention and retweet C-networks induced by political accounts has enabled us to show the lack of debate among different political parties. This result is in good agreement with the existing literature [5, 15]. Furthermore, we have found that an intensification of the interaction between parties can be detected after the formation of a coalition.

In addition to the regularities in behavioral patterns that we have found by comparing two similar political contexts, several results are consistent with a previous study of the 2011 Spanish elections [5], suggesting that there exist collective behaviors that are robust in the face of social change and can be associated with the Spanish political landscape, but with a potential application beyond this social context. One possible use of these results would be to probe similar political processes and highlight anomalous behaviors that may indicate atypical Twitter uses in electoral contexts.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

An earlier version of this work was presented at the 9th International Conference on Complex Systems.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Contract no. MTM2015-63914-P.