#### Abstract

We present a social network and forensic analysis of the vote counts of Spanish national elections that took place in December 2015 and their sequel in June 2016. We initially consider the phenomenon of bipartisanship breakdown by analyzing spatial distributions of several bipartisanship indices. We find that such breakdown is more prominently close to cosmopolite and largely populated areas and less important in rural areas where bipartisanship still prevails, and its evolution mildly consolidates in the 2016 round, with some evidence of bipartisanship reinforcement which we hypothesize to be due to psychological mechanisms of risk aversion. Subsequently, a functional network analysis detects an effective partition of municipalities which remarkably coincides with the first-level political and administrative division of autonomous communities. Finally, we explore to which extent vote data are faithful by applying forensic techniques to vote statistics. Results based on deviation from Benford’s law are mixed and vary across different levels of aggregation. As a complementary metric, we further explore the cooccurring statistics of vote share and turnout, finding a mild tendency in the clusters of the conservative party to smear out towards the area of high turnout and vote share, what has been previously interpreted as a possible sign of incremental fraud.

#### 1. Introduction and Datasets

In the last decade and in parallel with the improvement of computational resources and the possibility of accessing, storing and manipulating massive digital records easily, the political science community has engaged with the task of producing quantitative and systematic methods to detect irregularities in electoral results [1]. In this work, we analyze the vote count statistics obtained in the Spanish national elections that took place in December 2015 as well as in their sequel of June 2016. Since the end of 2014, the emergence of new parties such as the antiausterity Podemos and the rise of other ones such as Ciudadanos (Cs) challenged an already decadent bipartisanship system, as was evidenced by the highly fragmented total vote share in 2015. These results further defined a new type of political equilibrium in Spain, where the quest for alliances across parties was required to form a workable majority. Unfortunately, this situation was not achieved and the parliament was unable to build the necessary coalitions to make such a workable majority, what triggered the onset of new elections only six months after the previous ones, in June 2016. These special and unique conditions, together with the fact that the polls and electoral surveys preceding and on the day of the elections, showed an unusually high discrepancy with the actual results motivating the use of some of the recently developed techniques for elections forensic analysis to scrutinize any source of irregularity in these elections.

High resolution vote count data (at several levels of aggregation down to the level of municipalities) have been extracted from the official webpage of the Ministerio Del Interior [2] (Spanish Ministry of Home Affairs) for both 2015 and 2016 elections. For concreteness we have focused on vote counts on congress and discarded senate (see Figure 1 for a guide of the type of data available from the ministry of home affairs website).

The high quality of the dataset under study allows us to address important social and political questions in a scientific way. In this work, we have considered three specific questions, namely, (i) can we confirm that the bipartisanship system is challenged? (ii) Can a social network analysis reveal quantitative information on the voting profiles and similarities across municipalities and regions? And (iii) can we scrutinize these data against forensic techniques? And if so, is there any evidence of fraud?

To address the first question, we will define a set of bipartisanship indices and will explore the spatial distribution of these over the Iberian peninsula of Spain (at the fine-grained level of municipalities), while, for the second question, we will build on state-of-the-art community detection methods (Infomap) applied on a functional-like network of municipalities extracted via cosine similarity.

The third question will be addressed via two different studies. The first one addresses the deviation or conformance of vote counts statistics to the so-called Benford’s law [3, 4] that predicts that the first significant digits in some datasets (including vote counts) should follow an inverse-logarithmic distribution. The rationale for this analysis is that statistically significant deviations between the empirical distribution and the theoretical one point us towards electoral irregularities. These irregularities might in turn be due either to unintentional mismanagement of the voting process and/or to fraud. This type of analysis only flags the existence of such irregularities and gives no judgment on what was the cause for such irregularity. To complement this study, we then explore the presence and detection of sources of incremental and extreme fraud from the cooccurring statistics of vote and turnout numbers, following a recent study [5].

The rest of the paper goes as follows: in Section 2, we introduce similarity indices and explore the spatial distribution of these. Having access to the 2015 and 2016 election statistics, we will be able to explore the social effect of distrust and the possible longitudinal progression of bipartisanship breakdown. In Section 3, we perform the social network analysis of the data, creating functional networks via cosine similarity computed on the vote profile at the level of municipalities. Then, in Section 4, we focus on forensic methods. We introduce Benford’s law along with the precise types of statistical tests that have been proposed in the realm of election forensics, and we present the results obtained from these tests for both the December 2015 and June 2016 elections at three different levels of aggregation. The main results and interpretations on this first study are reported in this section and additional material and analysis are shown in Appendix. In this section, we also present the second forensic analysis that addresses the cooccurring statistics of vote and turnout numbers. Finally, in Section 5, we provide some discussion and conclude.

#### 2. Bipartisanship Indices

Vote counts can be aggregated at several spatial levels (municipalities, precincts, etc.). In general, for a specific region (e.g., a given municipality), vote data consist of a vector of vote percentage , where is the percentage of votes to party , and , being the total number of parties with representation in that region. Whereas in a majority of municipalities the main national-wide parties PP, PSOE, Cs, and Podemos have representation, other smaller, regional (or otherwise) parties also appear with different frequency in the different municipalities. In other words, (total number of parties) might fluctuate from municipality to municipality.

*Bipartisanship Index (BI)*. As a crude metric, we initially define a* bipartisanship index* () of a given municipality (resp., precinct, etc.) as the sum of the vote percentage ratio (between 0 and 1) of the two most-voted parties in . For a pure bipartisanship region , approaches 1, whereas in the ideal case of a multiparty region, BI would approach its minimum value . This metric allows us to compare the relative level of bipartisanship across regions.

In Figure 2, we provide a spatial heat map of Spain where we plot for each municipality , for the 2015 and 2016 elections data, respectively (note that, for illustration constraints, Canary Islands are not represented here, but we shall emphasize that data and results there are qualitatively equivalent to those obtained for the Iberian peninsula and Balearic Islands). First, we can observe that there is a clear tendency towards relatively lower bipartisanship in areas that correspond to highly populated regions, for example, Madrid and Catalonia. This finding is well aligned with the social observation that the process of bipartisanship breakdown, as other social changes initially develop close to important cosmopolite cities and then percolate to more rural areas. A second interesting finding is that from 2015 to 2016 there is a* stalling* in the bipartisanship breakdown, and its average index over all Spanish municipalities even slightly increases from in 2015 to in 2016 (this small increase is however within error bars so one cannot rule out this being a statistical artifact). A possible sociological interpretation is the following: after the 2015 elections, parliament was unable to build the necessary coalitions to make a workable majority, and this was the trigger to the new elections only six months after the previous ones, in June 2016. As society is averse to the uncertainty generated by frustrated elections, risk aversion might have stalled the overall inertia that a priori was driving the bipartisanship breakdown, and the fear of not being able to find workable majorities might have forced some voters in the most conservative regions to turn back to a bipartisanship strategy that indeed guarantees those much-needed majorities.

**(a)**

**(b)**

**(c)**

At a regional level, we can observe that this phenomenon is highly heterogeneous: whereas there is a very acute trend towards bipartisanship breakdown consolidation in specific regions such as the autonomous community of Valencia, in other regions such as the autonomous community of Galicia the trend is pretty much the opposite.

In Figure 2(c) we plot the frequency histogram of bipartisanship indices, for both 2015 and 2016. For 2015, a clear Gaussian-like shape emerges, and such shape is slightly perturbed for the 2016 case. In the latter case, we observe weird peaks and pits emerging in the distribution leaving a trace of wild fluctuations which are not present in the 2015 statistics. Comparing both histograms we can perceive a subtle shift towards higher values of BI.

*Entropy Index (E)*. Intuitively, bipartisanship tends to accumulate vote percentage in a few parties, whereas multiparty tends to spread out votes across parties. Following this argument, in order to quantify more precisely the concentration of vote percentages over the whole set of parties, one can define an* entropy index* for region as This quantity is bounded in , reaching the minimum for uniparty (a single party gets 100% of the votes in the region) and reaching its maximum for multiparty (all parties get the same percentage of votes).

Qualitatively, we have found similar results when exploring spatial distributions of the entropy index to those obtained with the more crude quantifier BI (see Figure 3).

**(a)**

**(b)**

**(c)**

*Diversity Index*. Inspired by ecological metrics [6], here we further define the diversity index of a given region as the effective number of political parties. This refers to the number of “equally voted parties” needed to obtain the same mean proportional parties vote percentage as that observed in the dataset (where all parties may not be equally abundant): where is the entropy index as defined above. In other words, counts, assuming that effective parties all get the same number of votes, the number of such effective parties one would need to find the same entropy index as found by computing to the vote statistics. In Figures 4(a) and 4(b) we plot such index for 2015 (a) and 2016 (b) elections. We clearly observe two important stylized facts: namely, (i) as already found in BI and , there is a clear separation between regions close to cosmopolite and largely populated cities, whose diversity index tends to be large, entailing a high number of effective parties at play, and regions which are typically less populated (rural areas) with lower diversity index. (ii) At odds with BI, the overall index evidences a marked decrease between 2015 and 2016, visually observed by a drift in heat map towards lighter color (i.e., lower values of diversity) and a change in the diversity distribution (Figure 4(c)).

**(a)**

**(b)**

**(c)**

#### 3. Functional Network Analysis

In this section, we make use of tools from Network Theory [7] to explore the vote similarity across regions. Attached to each municipality, we consider the vote percentage vector defined above. We measure the similarity between the voting statistics of two municipalities and via the so-called cosine similarity: where is the standard scalar product and is the norm. By construction, , where complete similarity () is reached when the vote statistics are identical and null similarity is reached when , that is, when a nonnull percentage to a given party in one municipality is always matched to a null percentage in the other municipality.

The matrix naturally defines a fully connected weighted network where nodes are municipalities, and every pair of nodes and are linked with an edge with weight . In our database, we have a total number of about municipalities, hence a fully connected network of about nodes and edges. In Figure 5 we plot the edge’s weight probability density . As expected, weights are concentrated in a region relatively close to the maximum value, something that can be justified already by noting that, in most of the municipalities, all four parties PP, PSOE, Cs, and Podemos have a nonnull vote percentage. To get some insight beyond this simple statistic, we now run an algorithm to detect communities, that is, large groups of nodes which are similar to each other and less similar to the nodes in the other groups. To do this, we run Infomap [8, 9] on the undirected and weighted network after a simple thresholding is performed on : for a given node , we only conserve the largest similarity weights such that all nodes at least have degree . In other words, we perform a parallel pruning on the edges, starting from those with smaller weights, and we prune the network up to the point where we cannot prune anymore if we want to make sure every node has a degree . The first level of Infomap reveals a total of 14 communities in 2015 and 16 communities in 2016. In Figure 6 we plot a spatial projection of the network, where nodes are municipalities. For exposition reasons, edges are not plotted in the figure as otherwise the image would not carry much information. Nodes belonging to the same network community are colored equally. Resemblance between network communities obtained via Infomap in our functional network and actual Spanish autonomous communities is remarkable, and such similitude is more acute in 2015, where bipartisanship breakdown is slightly more pronounced, than in 2016, where bipartisanship slightly reduced, as reported by the diversity index, and this has the effect of fuzzing up the relation between network communities and autonomous community divisions. Interestingly, one finds what we could call stable and compact autonomous communities (those which have a well-defined, stable over time and cohesive functional community counterpart): Catalonia, Madrid, Basque Country, and Navarra, whereas other set of communities have a clear counterpart in 2015 but a fuzzier one in 2016 (e.g., Comunitat Valencia, Andalucia, and Murcia). Another set of autonomous communities present a highly heterogeneous voting profiles and do not present any clear functional community counterpart. Theoretical digressions and insights that could give a sociopolitical justification for this classification are left as an open question for future work.

#### 4. Forensic Analysis

##### 4.1. Benford’s Law

The first significant digit (or leading digit) of a number is defined as its nonzero leftmost digit (e.g., the leading digit of 123 is 1 whereas the leading digit of 0.025 is 2). The so-called Benford’s law is an empirical statistical law stating that in particular types of numeric datasets the probability of finding an entry whose first significant digit is decays logarithmically aswhere stands here for the decimal logarithm (note that trivially . Perhaps counterintuitively, this law is quite different from the expected distribution arising from an uncorrelated random process (e.g., coin tossing or extracting numbers at random from an urn) which would yield a uniform distribution where every leading digit would be equally likely to appear. The logarithmically decaying shape given in (4) was empirically found first in 1881 by astronomer Simon Newcomb and later popularized and exhaustively studied by Benford [10]. Empirical datasets that comply to Benford’s law emerge in disparate places as for stock prizes or physical constants, and some mathematical sequences such as binomial arrays or some geometric sequences have been shown to conform to Benford. A possible origin of this law has been rigorously explained by Hill [11], who proved a central limit-type theorem by which random entries picked from random distributions form a sequence whose leading digit distribution converges to Benford’s law. Another explanation comes from the theory of multiplicative processes, as it is well known that power-law distributed stochastic processes follow Benford’s law for the specific case of a density (see [12] and references therein for details). In practice, this law is expected to emerge in a range of empirical datasets where part or all of the following criteria hold: (i) the data ranges a broad interval encompassing several orders of magnitude rather uniformly, (ii) the data are the outcome of different random processes with different probability densities, and (iii) the data are the result of one or several multiplicative processes.

Mainly advocated by Nigrini [3], the application of Benford’s law to detect fraud and irregularities, by observing anomalous and statistically significant deviations from (4) for datasets which otherwise should conform to that distribution, has become popular in recent years, and from now on we quote this a 1BL test. Mansilla [13] and Roukema [14] applied this methodology to assess Mexican and Iranian vote count results, respectively. On the other hand, Mebane [4] advocates instead to look at the second significant digit (which follows an extended version of Benford’s law [15]) and argues that the frequencies of election vote counts at precinct level approximate a Benford distribution for the second digit, and accordingly mismanaged or fraudulent manipulation of vote counts would induce a statistically significant deviation in the distribution of the second leading digit, detected by a simple Pearson goodness-of-fit test. Mebane applied this so-called 2BL test to assess the cases of Florida 2004 and Mexico 2006, and other authors have subsequently applied this in many other occasions (see [16] and references therein). In this case the theoretical distribution takes a more convoluted shape than (4), namely,and a good numerical approximation [15, 16] is given by We start by exploring 1BL and 2BL tests applied to vote count statistics nationally using the fine-grained data given by splitting vote counts at the level of municipalities (with over 8000 samples, vote counts ranging in about five orders of magnitude). With sociopolitical impact in mind, from now on we focus on the vote statistics to the main, national-wide parties PP, PSOE, Cs, and Podemos.

Results for the 1BL are shown in Figures 7(a) and 7(c) ((a, c) depicts results for the 2015 elections while (b, d) does the same for the 2016 case). As expected the distributions seem to be close to Benford’s law for all political parties, at least visually, and there are no obvious differences between 2015 and 2016. To have a better quantitative understanding, we have made use of two statistics: (i) the classical Pearson’s and (ii) the mean absolute deviation (MAD) test as proposed by Nigrini [3]. In both cases the null hypothesis is that data conform to Benford’s law. The former statistic reads where and are the theoretical and observed relative frequencies of each digit and for 1BL and for 2BL. This statistic has 8 degrees of freedom for 1BL and 9 for 2BL (as in this latter case the digit zero has to be incorporated as a candidate) and is to be compared to certain critical values, such that if then is rejected with the selected level of confidence level . For degrees of freedom, the critical values at the 95% and 99% are 15.507 and 20.090, respectively, whereas, for degrees of freedom, the critical values at the % and % are 16.919 and 21.666, respectively.

**(a)**

**(b)**

**(c)**

**(d)**

The mean absolute deviation is defined as where is the initial digit (1 for 1BL, 0 for 2BL). Whereas this statistic lacks clear cut-off values, Nigrini provides the following rule of thumb for 1BL: MAD between 0 and 0.004 implies close conformity; from 0.004 to 0.008 acceptable conformity; from 0.008 to 0.012 marginally acceptable conformity; and, finally, greater than 0.012 nonconformity. To the best of our knowledge, the critical values for MAD have not yet been established for 2BL so all over this work we will assume the same ones as for 1BL.

Results for 1BL can be found in Table 1. We conclude that, for the Pearson test, cannot be rejected with sufficiently high confidence in three out of the four main political parties but the result for PP is consistently large and suggests rejection of the null hypothesis with a confidence of 99%. These results are in contrast with those found using the MAD statistic, where according to Nigrini all political parties conform to Benford’s law (PP only showing acceptable conformity and the rest showing close conformity).

The results on 2BL are shown in Figures 7(b) and 7(d) and test statistics are summarized again in Table 1. These suggest an overall conformance to the second digit law, with exception flagging nonconformance raised by that rejects at 95% for Podemos (2015), Unidos-Podemos (2016), and PSOE (2015).

###### 4.1.1. Individual Analysis at the Precincts Level

In order to give a closer look to the vote count distributions we now explore the statistics taking place at each separate precinct. At this point we need to recall that among other criteria Benford’s law is expected to emerge in datasets where data range several orders of magnitude. This hypothesis was fulfilled at the national scale as the population of municipalities ranges several orders of magnitude (–, see Table 2) if we consider all of them. However this is not straightforward at the precinct level, where the number of municipalities is highly heterogeneous from precinct to precinct. In Figure 8 we have checked that the number of orders of magnitude that vote counts span at the precinct level is indeed linearly correlated with the number of municipalities the precinct contains (). This means that the larger the number of municipalities considered in a single analysis is, the more we should expect data to conform to Benford’s law.

**(a)**

**(b)**

That being said, for each and every precinct in Spain, we proceed to extract the frequencies of the first and second significant digits found for all the municipalities inside that precinct and make a goodness-of-fit test between these empirical distributions and 1BL and 2BL using both and MAD statistics. Results on 1BL are summarized for the case of in Figures 9(a) and 9(c) and in Figure 10, finding an overall good conformance to 1BL at the precinct level. Conversely, MAD statistics (Figure 11) say just the opposite, suggesting systematic nonconformance. As for the 2BL test, there exists a strong deviation from the expected distribution (Figures 9(b) and 9(d) for and Figure 11 for MAD), and both statistics consistently reject the null hypothesis of conformance to Benford’s law for all political parties.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(a)**

**(b)**

**(c)**

**(d)**

Now, note that at each individual precinct we expect statistics to be a priori poorer than at the national scale, as the average number of municipalities per precinct is of the order of (see Table 2 for details), that is, one order of magnitude smaller. As MAD does not include any correction term that depends on the sample set, one should therefore take the results associated with MAD with a pinch of salt. This is not necessarily the case for the as this latter statistic takes into account in its definition the number of samples. In any case, in order to assess whether the strong nonconformance to 2BL at this level of aggregation is just due to finite size effects we explore the dependence of both and MAD results on the precinct’s size. Accordingly, in Figure 12 we plot for each precinct its and MAD result as a function of the number of municipalities present in that precinct. As expected, we find that MAD suffers from finite size effects and is over conservative for small sample sizes; however, this effect is rather weak and not enough to explain the systematic nonconformance to 2BL. In the case of the statistic we observe quite the opposite effect: the larger the number of municipalities in a given precinct, the more likely the null hypothesis to be rejected. An equivalent size dependence analysis for the 1BL test is reported in Figure 13.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

**(d)**

###### 4.1.2. Aggregate Analysis at the Precincts Level

To round off our analysis with a third level of aggregation, we explore conformance to 1BL and 2BL when vote counts are aggregated per precinct. In this case, we only have 52 samples (52 precincts) so we expect the distributions to be more noisy. From the previous analysis, we learned that MAD suffers from finite size effects so we expect MAD to be more conservative than at this level of aggregation. In Figure 14 we show the results for 2016 and we refer the readers to Figure 15 to find analogous results for 2015, which do not show substantial differences at the qualitative level. As expected the distributions show larger fluctuations and, in absolute terms, deviate more from the theoretical laws (depicted in dotted lines). In Table 3 we depict and MAD statistics, which, again as expected, show inconsistent results: while systematically cannot be rejected with above 95% confidence level, in turn MAD systematically suggests nonconformity. We conclude that this level of aggregation is less informative than previous ones.

**(a)**

**(b)**

**(a)**

**(b)**

##### 4.2. Cooccurrence Heat Maps

Our second analysis is inspired by a recent study [5] that explore the cooccurring statistics of vote and turnout numbers and the associated double mechanism of incremental and extreme fraud by plotting two-dimensional histograms (heat maps) reporting, for a given political party, the percentage of vote (vote share) it got as compared to the percentage of participation. According to Klimek and coauthors, incremental fraud occurs when, with a given rate, ballots for one party are added to the urn and/or votes for other parties are taken away, and this mechanism is revealed when the histograms smear out towards the top-right corner of the histograms. On the other hand, extreme fraud (which corresponds to reporting close to complete turnout and almost all votes for a single party) emerges when the distribution transitions from unimodal to bimodal and one of the modes corresponding to a cluster that concentrates close to that corner of 100% participation (complete turnout) and very large vote percentage. They applied these statistical principles to several national elections, concluding that in the cases of Russia and Uganda fraudulent manipulation was the most likely underlying mechanism. In Figure 16 we plot such heat maps for the 2016 case for all four political parties. Data for PSOE, Unidos-Podemos, and Cs do not show any sign of fraudulent manipulation. In the case of PP results are less clear, as there indeed exists a (rather weak) tendency of the data to smear out towards the top-right corner (results for 2015 are very similar and have been reported in Figure 17). We do not find any sign of systematic extreme fraud, although it is worth stating that we have found a small subset of municipalities where just one party received 100% of the vote share (see Table 4). Without exception, this is the popular party (PP), something that is in principle suspicious. Nevertheless a closer inspection reveals that these municipalities are extremely small and thus consensus in one political option cannot be ruled out statistically.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

**(d)**

A further interesting peculiarity for the case of the conservative party PP is the existence of two clusters of municipalities (bimodal distribution) that gathers two different voting strategies: one relatively small, located at small vote share and the other one at high vote share, which is more spread out (we do not observe bimodality for the rest of political parties). We have labeled municipalities according to which cluster they belong (assigning a brown label for the larger cluster and a turquoise label for the smaller one) and plotted them in Figure 18. Just by visual inspection we can appreciate that the category linked with the smaller cluster is mainly formed by Catalonia and the Basque Country (regions with proindependence aspirations and a strong nationalist tradition), something that was recently pointed out independently [17], and some further municipalities in regions that have been considered PSOE strongholds historically.

**(a)**

**(b)**

#### 5. Discussion

In this work, we have studied the statistical properties of vote counts in the Spanish national elections that took place in December 2015 and June 2016, focusing on three separate questions: (i) breakdown of bipartisanship, (ii) region-to-region similarity in vote percentage, and (iii) election forensics for fraud detection.

On relation to (i), our results highlight that the bipartisanship system has suffered a clear breakdown in 2015, at least in regions associated with a more widespread cosmopolite society. Such breakdown consolidates over time but does not increase, probably due to the risk aversion of not finding workable majorities in the second election round and even evidences a subtle decrease as captured by the diversity index . Bipartisanship breakdown is actually a quite complex phenomenon with a high degree of heterogeneity at a regional level, probably due to regional political particularities.

Second, on relation to (ii), we have constructed a functional network of municipalities via cosine similarity of voting profiles. Interestingly, there is a very good matching between network communities which emerge by a community detection algorithm on the functional network and the actual Spanish autonomous communities. In particular, a classification of autonomous communities emerges naturally: we find some autonomous communities whose functional network community counterpart is more cohesive and stable over time (e.g., Catalonia, Basque Country, Madrid, and Navarra), some whose counterpart is only well-defined in 2015 when bipartisanship breakdown was more acute (e.g., Murcia and Valencia), and the rest where there is no clear matching. Beyond the probably amplified role played by regionalist parties in 2015, we do not have clear sociopolitical explanations for such emergent classification and we leave this as an open problem. Other aspects left for future work include network pruning using different criteria to the ones applied in this work, such as using a fixed similarity threshold.

On relation to fraud detection, for the 2016 elections, the unusually high discrepancy found between electoral surveys preceding and on the day of the elections (26th June) and the actual electoral results have been a source of debate and controversy in Spanish media. To the best of our knowledge, this work is among the first systematic analysis of its kind for Spanish elections (see however [17, 18]). The first and general conclusion on relation to question (iii) we have extracted is that the voting distributions do not show any systematic and significant change between the 2015 and the 2016 elections, as all statistical results are qualitatively identical. This is in line with the original analysts thesis that were discussed soon after it was learned that Spain had to go into a second election given the inability of the parliament to find a suitable coalition, but at odds with most of the polls and surveys of vote intention which were predicting a much different scenario as 26th June approached.

The first analysis is based on the hypothesis that, under clean conditions, vote count data should conform to Benford’s law. At the national scale we have found a general good qualitative and quantitative conformance to Benford’s law for the first (1BL) and second (2BL) digits, with small deviations only occurring for 1BL in the conservative party, where the null hypothesis can be rejected at 99% confidence in both years according to the standard Pearson hypothesis test, a result which is not confirmed using an alternative test (mean absolute deviation) proposed by Nigrini. For 2BL only flags up some concerns at the 95% confidence level for Podemos/Unidos-Podemos, but the null hypothesis cannot be rejected at 99% and, again in this case, MAD statistic is less conservative and accepts the null hypothesis for every party.

If we change the resolution and explore results for each individual precinct, results show a completely different story: conformance to 1BL is accepted according to but systematically rejected according to MAD, and conformance to 2BL is consistently rejected according to both and MAD statistics for every precinct and every political party. We have also shown that these are genuine results that cannot be associated with a lack of statistics. Finally, by aggregating vote counts per precinct and analyzing conformance to 1BL and 2BL at this level of aggregation, we obtain inconsistent and therefore inconclusive results, as cannot reject the null hypothesis above 95% confidence level systematically but conversely MAD suggests systematic nonconformance. This lack of consistency raises the question about what level of aggregation might be better suited for BL-type analysis and which statistic is more reliable when assessing the goodness-of-fit issues that certainly deserve further investigation.

Given the somewhat mixed results and acknowledging that the applicability of Benford’s law tests to election forensic is not completely free from controversy [19, 20], as a complementary analysis we further explored the correlations between percentage of participation and percentage of votes for each municipality, plotting two-dimensional histograms to detect the presence of so-called incremental and/or extreme fraud as described by Klimek et al. [5]. Our results suggest that the results for PSOE, Unidos-Podemos, and Cs are apparently free from these mechanisms whereas in the case of PP we find a weak evidence of cluster smearing out similarly to what Klimek et al. refer to incremental fraud, an evidence which needs to be studied in more detail. The heat map of the conservative party also shows two clusters instead of a single one, hence the bimodality in the vote share tendency: there exist two different groups of municipalities, including a small one where the tendency is to give a small vote share to PP and a larger one where the vote share takes larger values. Interestingly, according to a spatial analysis, we have been able to confirm that the low vote share cluster typically corresponds to regions which are considered nationalist (Catalonia and Basque Country) where the strength of regional options outperforms those that prevail at a nationwide scale.

All in all, these results suggest that further investigations and enquiries should be conducted in order to confirm and clarify the presence or absence of some of these apparent irregularities, to elucidate their source and quantify their impact in election results. In this respect, systematic comparative studies with historical bipartisanship Spanish data and analogous data (analysis at different levels of aggregation) from other similar democratic countries are needed.

#### Appendix

In this appendix we depict several additional figures and tables that complement the main study (see the main text for references to each of these figures). See Figures 3, 8, 10, 11, 13, 15, 17, 19, 20, and 21 and Table 2.

**(a)**

**(b)**

**(a)**

**(b)**

#### Additional Points

*Availability of Data and Materials*. The datasets supporting the conclusions of this article are available under request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Authors’ Contributions

Juan Fernández-Gracia and Lucas Lacasa designed the study, Juan Fernández-Gracia performed the data gathering and analysis, Juan Fernández-Gracia and Lucas Lacasa interpreted the results, Lucas Lacasa wrote the manuscript, and Juan Fernández-Gracia and Lucas Lacasa revised the manuscript.

#### Acknowledgments

The authors thank M. Antònia Tugores for her helpful assistance in the data gathering process, Luis F. Lafuerza for fruitful suggestions, and D. Kobak for pointing out the recent works [17, 18]. Lucas Lacasa acknowledges funding from EPSRC Early Career Fellowship EP/P01660X/1.