Abstract

This work presents a theoretical and numerical analysis of the conditions under which distributed sequential consensus is possible when the state of a portion of nodes in a network is perturbed. Specifically, it examines the consensus level of partially connected blockchains under failure/attack events. To this end, we developed stochastic models for both the verification probability once an error is detected and the network breakdown when consensus is not possible. Through a mean field approximation for the network degree we derive analytical solutions for the average network consensus in the thermodynamic limit of large graph size. The resulting expressions allow us to derive connectivity thresholds above which networks can tolerate an attack.

1. Introduction

Trust is usually conceived as the additive aggregation of reliable pieces. However, when it comes to cyber-security or privacy requirements, the challenge is how to collaboratively create trust out of uncertain sources in a networked environment [16]. A remarkable success story of this approach is Bitcoin [7]. In Bitcoin, trust is built by a set of agents—miners—who collaborate in sequencing blocks of transactions in a chain. Blockchain (BC) is the underpinning technology of Bitcoin, a protocol in which miners compete to solve a computationally expensive problem, known as Proof-of-Work (POW) [8]. The miners’ results are then assembled in a distributed data chain. The outcomes are only embedded in the final version of the chain after consensus, which is only reached if the order relationships are consistent. POW is a proxy of trust and, hence, reliability increases as the chain grows; it is incrementally more difficult to revert—hack—the chain, since this requires increasing computing power. Thus, although each agent generates insecure information locally, the resulting aggregate becomes more and more reliable over time.

Recently, however, these advantages have also raised the question of how the BC paradigm can be exported to domains other than cryptocurrency, such as the Internet-of-Things (IoT) or Wireless Sensor Networks (WSN) [9, 10]. The difficulty arises from the limitations of the BC architecture, which hamper the possibility of extending it to small devices (e.g., sensors). Sensors, in particular, lack the computing power to perform POW. An even more challenging fact is that BC requires full connectivity to operate (which is unfeasible for WSNs). Therefore, the question at issue is how to design blockchains without POW and under partial connectivity while maintaining robustness to failures and attacks.

Distributed consistency is not a novel concept. In [11] the authors analyse the consistency of distributed databases by using algorithms which are closely related to epidemiological models [12]. Two information diffusion mechanisms, antientropy and rumor mongering, happen to be particularly useful for modelling distributed consistency. Antientropy regularises entries in the databases while rumor mongering updates the latest information content from neighbour instances. This trade-off between ordered and random infection allows the authors to find exponential epidemic growth by using a mean field approach. The concept of diffusion in partially connected networks is treated rigorously in [13] in the context of glassy relaxation. Here, the geometrical aspects of the return probability of a Markovian hypercube walk are also analysed using mean field theory.

The effect of graph topology on information spreading has been extensively discussed in the literature (e.g., [14–16]). However, the model in [16] (a random graph superposed to a ring lattice) is particularly relevant to our discussion, since it ensures a minimum connectivity while maintaining the small-world property (i.e., high clustering coefficient and small characteristic path length [17]).

In [18] the general distributed consensus problem is described: a set of nonfailing sites must decide on a common value. The authors of that study found that the key components for consensus breakdown are asynchronicity and failure, which both inject uncertainty into the system at different scales. Distributed consensus in networks is also analysed in [19], where the authors address the most important applications of the concept, such as clock synchronisation in WSNs. The authors introduce the average consensus as the limit to which initial states converge, provided this limit is equal to the averaged initial values. Interestingly, a randomised consensus protocol (where only a fraction of sites needs to agree on a value) is shown to be more robust against crash than a deterministic algorithm [20].

When consensus is not reached, systems usually break down. From the point of view of control theory, a number of interesting results have been obtained in studies focused on this issue, for example, [19], aimed at self-healing the system immediately after failure. However, security and resilience are multidimensional objects which can be tackled more consistently through a complex systems approach [21, 22]. For instance, [23] proposes a phone call model where players broadcast rumors randomly among their partners. The authors study the effect of node failure and concentrate on an interesting result: if failure patterns are random, node crashes leave only a small number of uninformed players with high probability. The work also establishes a lower bound on the number of transmissions required by any randomised rumor spreading algorithm running for a given number of rounds. This is consistent with what we know from network science [24]; random failures do not spread so easily. The model considered in [25] consists of sites running processes asynchronously where failures are modelled as a Bernoulli process. In [26] the problem is set in terms of a voter model and an invasion process; agreed values are exported from a set of sites but imported errors infect the rest of the nodes.

When it comes to blockchain implementations, [27] analyses information propagation in the Bitcoin network. This work highlights the limitations of the synchronisation mechanisms in BC and the system’s weaknesses under attack. Here, the communication network is modelled as a random graph with a mean degree of ≈32, and it is found that the block verification process can be a major contributor to propagation delay and inconsistency. In their experiments the authors show that the probability distribution of the rate at which nodes learn about a block has a long tail. This means that there is a nonnegligible portion of nodes which does not receive information in a timely manner. The effect is equivalent to considering an incomplete consensus network. A typical example of an organised attack in the BC is the so-called selfish-mine strategy. This consists of a subset of nodes that diffuse information partially to targets, instead of distributing updates homogeneously [28]. In [29] a Markov chain model is used to analyse the selfish-mine strategy in Bitcoin. This and other block-withholding behaviour can have a devastating effect on performance if the dishonest community is around half the size of the network.

All these works provide key insights into the problem of network resilience, diffusion, and consensus from different perspectives. However, to the authors’ knowledge, a mathematical model of partially connected blockchains is still missing. Therefore, in this paper we make a theoretical and numerical analysis of the conditions under which a distributed sequential consensus is possible. Specifically, we examine the consensus level of partially connected blockchains under failure/attack events. To this end, we develop stochastic models for both the verification probability once an error is detected and the network breakdown when consensus is not possible. The resulting expressions allow us to derive connectivity thresholds above which networks can tolerate an attack.

The paper is organised as follows. In Section 2 we formulate the problem. The results obtained in the study are presented in Section 3. Section 4 illustrates the model with a proof-of-concept example. Finally, we present the conclusions obtained from our research and discuss the possibilities for future work in Section 5.

2. Problem Formulation

Blockchains can be conceived as dynamical distributed databases whose constituents (blocks) are collaboratively and incrementally built by a set of agents. There are three key factors in this process: (a) how information spreads, (b) how consensus can be achieved, and (c) how errors affect the overall performance. We elaborate on these elements below.

2.1. Partial Connectivity in Consensus Networks

From a network perspective we consider a Peer-to-Peer (P2P) infrastructure with two types of nodes: communication sites and processing sites, the miners (Figure 1). Users connected to nodes can launch transactions to other users in the network. If a group of users is involved in a transaction arrangement, one or more miners can attempt to verify the intended transactions and, if successful, pack them into a block. This problem can be conceived as the interplay of three graphs: communication, transactions, and miners. As stressed, the usual BC protocol takes the full graph for granted, which is not always possible; there may be either failures or intentional attacks on a portion of the network. However, it is unlikely for a network to get disconnected under normal operation. Hence, graph connectedness is a reasonable lower bound assumption (particularly in the case of sensor networks and IoT). This leads us to consider the network proposed in [16]: a graph $G = C_N \cup G_p$, consisting of a random graph $G_p$, with connection probability $p$, superposed to a ring lattice $C_N$. This model still exhibits the small-world property found in [14, 15] but it is closer to the real requirements of minimum connectivity found in WSNs and other networked systems such as computer networks [30].
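
For concreteness, the following minimal sketch (in Python, assuming the NetworkX library; the function name is ours) builds this substrate: a ring lattice that guarantees connectedness, with a superposed random graph adding shortcuts.

```python
import networkx as nx

def ring_plus_random(N, p, seed=None):
    """Superpose a random graph G(N, p) on a ring lattice C_N (model of [16])."""
    ring = nx.cycle_graph(N)                    # guarantees connectedness
    shortcuts = nx.gnp_random_graph(N, p, seed=seed)
    return nx.compose(ring, shortcuts)          # union of both edge sets

G = ring_plus_random(N=100, p=0.05, seed=42)
k_bar = 2 * G.number_of_edges() / G.number_of_nodes()
print(f"mean degree ~ {k_bar:.2f}")             # ~ 2 + p*N for small p
```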

At this point it seems that information spreading in the BC can be reduced to the well-known problem of diffusion on graphs. This area is covered extensively in the literature (see, e.g., [13]). However, BC diffusion holds some subtleties under the hood, as we show below.

2.2. Why Order Matters: Sequential Diffusion

At every transaction arrangement, the ordering of the operations in the set is a key factor. Consider the simple arrangement shown in Figure 1(b), which represents four possible transactions. These can be formalised as the directed links shown in the graph. There are $4! = 24$ ways to sort this set, but not all of them are consistent. The type of consistency we refer to is that which avoids the double-spending problem. Take two possible order relationships $\sigma_1$ and $\sigma_2$, implemented by bijections from $\{1, 2, 3, 4\}$ onto the transaction set. The first ordering does not induce any inconsistency, but the set ordered by $\sigma_2$ violates the double-spend constraint depending on the transaction weights $w_t$. If we label by $s(t)$ the state vector at step $t$, a transition, say the transaction from node $i$ to node $j$ with weight $w_t$, results in the update equation
$$s(t+1) = s(t) - w_t\,L_t\,e_i, \tag{1}$$
where $e_i$ represents the row-base vector of the emitting node and $L_t$ is the graph Laplacian corresponding to the transaction subgraph. The ordering $\sigma$ allows writing compact update equations as
$$s(t) = s(0) - \sum_{\tau=1}^{t} w_{\sigma(\tau)}\,L_{\sigma(\tau)}\,e_{\sigma(\tau)}, \tag{2}$$
where $w_{\sigma(\tau)}$, $e_{\sigma(\tau)}$, and $L_{\sigma(\tau)}$ represent the transaction weights, base vectors, and graph Laplacians for each transaction. In Table 1 we show the evolution of states in the case $\sigma_2$ with a given initial state and weights. Notice that node 3 has run out of funds at step 2 but it still intends to perform a transaction to node 5 at step 3. This is like having a balance of $10 in a bank account and spending it twice by sending $10 to two different recipients. When it comes to measurements in WSNs (say energy consumption data) avoiding these inconsistencies is imperative [31]. If a miner attempted to pack these transactions together into a block, he would reach an inconsistency. These order constraints make BC diffusion different from regular graph diffusion [13]. In fact, the BC protocol ensures that double-spending paradoxes cannot occur by imposing constraints such as $s_i(t) \ge 0$ for all $i$ and $t$. An example of this is the distributed ledger in Bitcoin [27]. The next question is how this ordering couples with failures in the network.
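
The nonnegativity constraint is easy to illustrate in code. The toy below (our own illustration with hypothetical weights and balances, not the values of Table 1) applies the same four transactions in two different orders and flags the ordering that drives a balance negative.

```python
def apply_sequence(balances, transactions, order):
    """Apply transactions in the given order; flag the first double-spend step."""
    s = dict(balances)
    for step, idx in enumerate(order):
        src, dst, w = transactions[idx]
        if s[src] < w:                  # source has run out of funds: double spend
            return step, s, False
        s[src] -= w
        s[dst] += w
    return len(order), s, True

# (from, to, weight) tuples; weights and initial balances are hypothetical
txs = [(3, 5, 10), (1, 3, 10), (3, 4, 5), (2, 3, 5)]
init = {1: 10, 2: 5, 3: 10, 4: 0, 5: 0}

for order in [(1, 3, 0, 2), (0, 2, 1, 3)]:      # two of the 4! possible orderings
    step, final, ok = apply_sequence(init, txs, order)
    print(order, "consistent" if ok else f"inconsistent at step {step}")
```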

2.3. Attack and Failure in Consensus Dynamics

Blockchain technology copes with the above restrictions efficiently by elevating the transaction order relationships to the block scale. Thus, every block (which can hold one or more transactions) in the resulting blockchain builds on top of the preceding block to preserve sequential diffusion. This strategy can, however, be used by dishonest users to create massive damage in the network. Consider the case depicted in Figure 2, where 6 miners collaborate to build a blockchain. Without loss of generality we can label the miners according to the order of their block resolution (it is very unlikely that two miners solve a block at the same time and, if this happens, BC would still sort the resulting blocks in order with high probability [20]). Node 3 is a failure node; it sends an error/attack either to a nonneighbouring node (a) or to a miner who happens to be the one solving the next block (b). Below each graph, we also show the evolution of the chain. In this schematic, rows represent sites and columns represent iterations within the cycle. A row stands for the local instance of the chain at a given site and a column represents the collective blockchain built up to a given step. The blockchain is constructed as follows. At step $t = 0$ all sites own the 0-genesis block. At step $t$, if the solving miner finds no error in the last block of his local instance of the chain, he solves the next block and broadcasts the solution to his neighbours. The nonreached sites simply replicate their state. However, if the sending site is a failure node, it will broadcast a failure to one of its neighbours. In this case, if the affected node finds the error in its solving step, it still has a chance to restore the block upon consensus from its acquaintances. In case this consensus is not possible, the blockchain breaks down. This flow is depicted in Figure 3.

Both situations shown in Figure 2 trigger different phenomena and have different effects on the overall network performance. In the first case, the error (marked as such in the table) has no chance of being restored and it persists in the blockchain. However, in the second case an additional recovery step can restore the error to the agreed value. Notice also that, since the network is not fully connected, there are sites that lack state updates, so their local instances of the chain are not synchronised. This limits the information spreading in the network, as we show in the next section.

We highlight the fact that, in the Bitcoin implementation, miners asynchronously relay blocks and transactions as soon as they either receive or mine them [32]. In our case agents hold received blocks and transmit their knowledge to neighbours sequentially. In Figure 4 we compare the sequence diagrams for both models in the case of three miners (for the sake of simplicity we have only considered one thread per miner; since mining times are much larger than relay times, splitting mining and relay processes into two threads would not affect the conclusions of this comparison). Without loss of generality, miners solve blocks in first, second, and third order. In the Bitcoin blockchain implementation (a) the processes of mining and relaying blocks have different timescales: ≈10 minutes for mining and a few seconds for block forwarding. However, in a context where POW is absent (b), the mining lags tend to zero and the processes of mining, verification, and relay converge. In (a), if miner 1 at time $t_1$ sends a block to miner 2, this miner will forward it to miner 3 after a short verification lag $\delta$. Then, miner 2 will release his own block after a long mining delay. However, in (b), since there is no POW, miner 2 will broadcast his block to his neighbours soon after, at epoch $t_1 + \delta$. This enables saving time and reducing network traffic considerably.

2.4. Mathematical Model

By putting all these facts together, we obtain a minimum blockchain model that captures the dynamics described above: (a) partial connectivity, (b) sequential diffusion, and (c) failure spreading. Below we develop a stochastic process analysis to examine the averaged network performance under different conditions.

With the graph model of size $N$ described in Section 2.1, we represent the information block (or measured state in general) at site $i$ at the $t$th iteration as $s_i(t)$. As stressed above, all sites start from the 0-genesis block: $s_i(0) = 0$. Then, following the flow depicted in Figure 3, at iteration $t$ node $i$ checks its state and adds a block to the chain. We collect the number of sites matching the current block in the variable $m_t$, which is equal to the node degree $k_i$ plus a noise term $\xi_t$. If $i$ sends an error signal to its successor which cannot be reverted to the agreed state, then $m_t = 0$; in any other case $m_t = k_i + \xi_t$. The performance ratio per iteration, $\mu_t = m_t/N$, is a measure of the consensus level reached at step $t$. Depending on whether consensus is reached or not, the whole chain may collapse. In an ensemble of chains we define the failure and matching random variables, $F$ and $M_t$, respectively; $F = 1$ in case there is one or more steps where consensus is not possible. Hence the ensemble mean for $\mu$ can be expressed as
$$\langle \mu \rangle = (1 - P_f)\,\frac{\bar{k} + 1}{N}, \tag{3}$$
where $\bar{k} = pN$—with $p$ as connection probability—represents the network average degree and $P_f$ stands for the failure probability. Since a chain failure can only happen after verification, $P_f = P_v\,P_{f|v}$, where $P_v$ is the verification probability and $P_{f|v}$ the respective conditional probability.
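
The ensemble mean (3) is straightforward to evaluate numerically. The sketch below (our own helper, using the notation reconstructed above) shows how the consensus level scales with connectivity for a given failure probability.

```python
def mean_consensus(N, p, P_f):
    """Ensemble mean (3): consensus level under failure probability P_f."""
    k_bar = p * N                       # mean field network degree
    return (1.0 - P_f) * (k_bar + 1.0) / N

# Failure-free upper bound: reaches 1 (100% efficiency) only at full connectivity
for p in (0.1, 0.5, 1.0):
    print(p, mean_consensus(N=1000, p=p, P_f=0.0))
```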

Notice that even in the failure-free case there is an upper bound on the mean efficiency imposed by the lack of full connectivity (full connectivity and full recovery with $P_f = 0$ would result in the limit $\langle \mu \rangle = 1$, i.e., 100% efficiency). Hence, both size and connectivity limit network performance due to the partial sequential diffusion specific to the BC architecture. Next, we look into the chain failure probability.

Firstly, it is clear that a failure can only happen when at iteration $t$ the last block of the solving node is an error state. This requires (a) the emitting node to be an attack node, which occurs with probability $\alpha$ (the ratio of attacking nodes), and (b) the receiving node to be precisely the solver of the next block. Since connections in $G_p$ are uniformly random, the verification probability at step $t$, $p_v(t)$, can be expressed as the product of these two probabilities. Also, because the chain is verified if at least one step needs verification, the probability of blockchain verification is
$$P_v = 1 - \prod_{t=1}^{N}\big(1 - p_v(t)\big). \tag{4}$$
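
Equation (4) is simply the complement of "no step triggers verification". A minimal numerical sketch, assuming for illustration a constant per-step probability $p_v$ (a simplification of the step-dependent quantity above):

```python
import math

def chain_verification_prob(p_v, n_steps):
    """Eq. (4) with a constant per-step verification probability p_v."""
    return 1.0 - (1.0 - p_v) ** n_steps

# For small p_v this approaches the exponential form 1 - exp(-n_steps * p_v)
print(chain_verification_prob(0.001, 1000))   # ~ 0.632
print(1.0 - math.exp(-1.0))                   # ~ 0.632
```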

3. Main Results

3.1. Mean Field Approximation for the Chain Verification Probability

By using a mean field approximation [11] we replace the node degree by the mean network degree $\bar{k}$. In this case, for large $N$, (4) renders
$$P_v \approx 1 - \big(1 - \bar{p}_v\big)^{N} \simeq 1 - e^{-N\bar{p}_v}, \tag{5}$$
where $\bar{p}_v$ denotes the mean field per-step verification probability. In Figure 5 we compare expression (5) with Monte Carlo simulation. For a ring lattice of size $N$ we generated synthetic networks with increasing numbers of connections and attack strengths until graph saturation. Each experimental point (50 runs with the same parameters) represents the fraction of networks that reported a verification step. As the ratio of attacking nodes increases, verifications grow exponentially, like the epidemics in [11]. As expected, graph connectivity (measured as the percentage of additional links until saturation) decreases the verification rate.
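
The Monte Carlo protocol can be reproduced schematically as follows. The per-step rule below (an attack node forwards an error to one uniformly chosen neighbour, and a verification fires when that neighbour happens to be the next block solver) is our own reading of the dynamics, not the authors' exact simulator.

```python
import random
import networkx as nx

def run_chain(G, alpha, rng):
    """One chain cycle; True if a verification step is triggered."""
    order = list(G.nodes)
    rng.shuffle(order)                        # block-solving order
    for t in range(1, len(order)):
        sender = order[t - 1]
        if rng.random() < alpha:              # sender happens to be an attack node
            target = rng.choice(list(G[sender]))   # error goes to one neighbour
            if target == order[t]:            # ...who is the next block solver
                return True
    return False

rng = random.Random(1)
N, p, alpha, runs = 200, 0.02, 0.2, 50
frac = sum(
    run_chain(nx.compose(nx.cycle_graph(N), nx.gnp_random_graph(N, p, seed=r)),
              alpha, rng)
    for r in range(runs)
) / runs
print(f"fraction of chains reporting a verification step: {frac:.2f}")
```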

It is important to provide this estimate because a large number of verification steps translates directly into cost and efficiency losses in real implementations. Next, we investigate in detail the probability of a chain failure after a verification step.

3.2. Network Consensus Mechanisms

As stressed, if a node sends an error code to its successor at iteration $t$, there is a chance for the receiving node to revert this error, provided that the consensus reached among its neighbours is over a given threshold. The problem can be formulated as follows. Let $V_i$ represent the neighbours of node $i$. Notice that, at iteration $t$, the nodes already updated in the cycle hold the current value, while the remaining nodes in $V_i$ can attain any value from the set of preceding states. Given a consensus threshold $q$, let $f_{\max}$ denote the maximum frequency of values in $V_i$ which are different from the current value. There is agreement when
$$f_{\max} \le \big\lfloor q\,|V_i| \big\rfloor, \tag{6}$$
where the notation $\lfloor \cdot \rfloor$ stands for the floor value. Notice that $q = 1/2$ defines a simple majority based consensus among the sites.
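
The agreement rule (6) amounts to checking that no conflicting value gathers more than $\lfloor q\,|V_i| \rfloor$ supporters. A small sketch (helper names are ours):

```python
from collections import Counter
from math import floor

def agrees(neighbour_values, correct_value, q=0.5):
    """Rule (6): no conflicting value may exceed floor(q * |V_i|) supporters."""
    conflicts = Counter(v for v in neighbour_values if v != correct_value)
    f_max = max(conflicts.values(), default=0)
    return f_max <= floor(q * len(neighbour_values))

print(agrees([7, 7, 3, 7, 2], correct_value=7))   # True: consensus restores the block
print(agrees([3, 3, 3, 7, 2], correct_value=7))   # False: a conflicting value dominates
```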

Consequently, inspired by the antientropy and rumor mongering concepts [11], we split the consensus problem of (6) into two mechanisms: clustering and random infection (we use the mathematical epidemiology terminology: infected nodes are those receiving a given state; notice that in our case infection is not necessarily a negative phenomenon unless the broadcasted quantity is an attack). In the former, neighbours get their update directly through the links of the emitting node. In the latter case, neighbours eventually agree on a value arriving from other sites or from their own replications along the preceding steps in the blockchain cycle.

Notice that the number of symbols in circulation increases with the number of iterations. Therefore, it is increasingly less likely to reach consensus by random infection. On the other hand, the link consensus mechanism does not decay with the iterations. Hence, link consensus will dominate over random consensus in the thermodynamic limit $N \to \infty$. For a reasonably large network size this enables us to neglect the random term contribution to the failure probability. Below we elaborate more on this stochastic approximation.

3.3. Stochastic Network Failure in the Thermodynamic Limit

As demonstrated before, cluster consensus occurs when at least $\lfloor q(k-1)\rfloor + 1$ sites out of the $k - 1$ possible nodes (the $-1$ term is because one site already holds an error state) hold the agreed state. Equivalently, it can be assumed that the restoring node is connected to at least $\lfloor q(k-1)\rfloor + 1$ updated nodes. In this way, we can model the process as a Bernoulli trial (akin to [25]) where the success variable follows the binomial $X \sim B(k-1, p)$. The resulting failure probability renders
$$P_{f|v} = P\big(X \le \lfloor q(k-1)\rfloor\big) = \sum_{j=0}^{\lfloor q(k-1)\rfloor} \binom{k-1}{j}\,p^{j}\,(1-p)^{k-1-j}. \tag{7}$$
Since the blockchain failure probability can be expressed as
$$P_f = P_v\,P_{f|v}, \tag{8}$$
by using (8) and (3) we arrive at the expression
$$\langle \mu \rangle = \big(1 - P_v\,P_{f|v}\big)\,\frac{\bar{k}+1}{N}. \tag{9}$$
Now, provided that $\bar{p}_v$ is small compared to 1, we approximate the logarithm in the exponential form of (5) by its first-order series expansion, $\ln(1-\bar{p}_v) \approx -\bar{p}_v$. By implementing the same mean field approximation as for $P_v$ in the preceding section we obtain the equation
$$\langle \mu \rangle_{\mathrm{MF}} = \Big(1 - \big(1 - e^{-N\bar{p}_v}\big)\,P^{\mathrm{MF}}_{f|v}\Big)\,\frac{\bar{k}+1}{N}, \tag{10}$$
where $P^{\mathrm{MF}}_{f|v}$ denotes the corresponding mean field approximation for $P_{f|v}$ (with $k$ replaced by $\bar{k}$). In Figure 6 we show the mean field approximation to the blockchain performance, measured as the average network consensus, for simple majority consensus ($q = 1/2$) and increasing attack strength. As for $P_v$, we generated synthetic networks with increasing numbers of connections and attack strengths until graph saturation. For each network instance, we monitored the number of sites holding the current block at each iteration within the blockchain cycle. This gives us an empirical estimate for the network match per iteration, $\mu_t$. Then, we averaged these quantities over the cycle, which results in a measure of the network performance (i.e., consensus level). Finally, we obtained the mean value of this quantity from our Monte Carlo dataset. Each experimental point represents 50 runs with the same parameters.
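
The conditional failure probability (7) is a binomial CDF and can be evaluated directly; a sketch using SciPy (the function name is ours):

```python
from scipy.stats import binom

def p_fail_given_verification(k, p, q=0.5):
    """Eq. (7): P(X <= floor(q*(k-1))) with X ~ B(k-1, p)."""
    return binom.cdf(int(q * (k - 1)), k - 1, p)

# Failure probability drops sharply once connectivity exceeds the threshold q
for p in (0.3, 0.5, 0.7):
    print(p, p_fail_given_verification(k=50, p=p))
```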

Notice that 100% performance—the blockchain limit—can only be achieved for full connectivity ($\bar{k} = N - 1$). The upper bound (black straight line) limits the network match for partial connectivity; as we increase the link probability the performance increases according to (10). Also, stronger attack strategies (larger attack ratios) result in lower performance, as expected.

A remarkable result in Figure 6 is that, beyond a critical value of the connectivity, $p_c$, consensus is only limited by information spreading and not by failure. This fact motivates us to look closely at possible estimates of $p_c$.

3.4. Estimate for the Attack Tolerance Critical Connectivity

Noticing that $P_{f|v}$ is nothing else than the cumulative distribution function of the binomial $B(k-1, p)$, we use the normal approximation for the binomial distribution as
$$P_{f|v} \approx \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\lfloor q(k-1)\rfloor - \mu_B}{\sigma_B\sqrt{2}}\right)\right], \tag{11}$$
where $\operatorname{erf}$ is the error function and
$$\mu_B = (k-1)\,p, \qquad \sigma_B = \sqrt{(k-1)\,p\,(1-p)}. \tag{12}$$
If $\epsilon$ denotes a small quantity, we inquire into the conditions under which $P_{f|v}$ tends to $\epsilon$, or more specifically $P_{f|v} \le \epsilon$. To this end, we derive conditions for equality in this expression from (11) and (12). Also, by approximating the floor function by its argument, the following condition holds:
$$(k-1)\,(q - p) = \sqrt{2}\,\sigma_B\,\operatorname{erf}^{-1}(2\epsilon - 1). \tag{13}$$
In the large $N$ limit, and also assuming $k - 1 \simeq \bar{k} = pN$, $\sigma_B$ can be approximated as
$$\sigma_B \simeq p\,\sqrt{N(1-p)}. \tag{14}$$
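
The normal approximation (11)-(12) can be checked against the exact binomial CDF; the snippet below (our own sketch, with a continuity correction added for accuracy) illustrates the agreement:

```python
from math import erf, sqrt
from scipy.stats import binom

def normal_cdf_approx(k, p, q=0.5):
    """Eqs. (11)-(12) with a continuity correction (our addition)."""
    n = k - 1
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    x = int(q * n)                              # floor(q*(k-1))
    return 0.5 * (1.0 + erf((x + 0.5 - mu) / (sigma * sqrt(2.0))))

print(normal_cdf_approx(k=50, p=0.6))           # normal approximation
print(binom.cdf(int(0.5 * 49), 49, 0.6))        # exact binomial CDF
```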

From (14) one could attempt to solve (13) for $p$ as a function of $q$, $N$, and $\epsilon$. But a closed-form solution is not possible because the function $\operatorname{erf}^{-1}$ diverges for $\epsilon \to 0$. Still, an interesting case occurs in the limit $N \to \infty$. At this limit, (13) only makes sense if $q - p$ vanishes or, equivalently, if $p = q$. However, this value does not provide the asymptotic limit we are looking for.

If we express the deviation of $p$ from $q$ through the rescaling $x = q - p$ and we also rewrite (13) in terms of $x$, we obtain
$$x = \sqrt{\frac{1-p}{N}}\;\Phi(\epsilon), \tag{15}$$
where we have introduced the function $\Phi(\epsilon) = \sqrt{2}\,\operatorname{erf}^{-1}(2\epsilon - 1)$.

An operative approximation is possible by considering $1 - p \simeq 1 - q$ inside the square root. Then, by using the corresponding solution of (15), for large $N$ we find
$$p_c \simeq q - \sqrt{\frac{1-q}{N}}\,\Phi(\epsilon). \tag{16}$$
This is nothing more than a useful parametrisation of (13). For $\epsilon = 1/2$ we recover the case $p_c = q$. However, larger values of $|\Phi(\epsilon)|$ (i.e., smaller tolerances $\epsilon$) allow us to explore the limit closely. For instance, for simple majority consensus ($q = 1/2$) and representative values of $N$ and $\epsilon$, we arrive at a solution $p_c$ slightly above $1/2$. This means that, for maximum attack strength, beyond $p_c$ the percentage deviation of the consensus level from its upper connectivity bound is lower than 15%. By setting other attack tolerance thresholds, the value of $p_c$ can be adjusted in different realisations of the blockchain network. The value represented in Figure 6 can then be conceived as a reasonable threshold for the minimum network connectivity ensuring attack tolerance with the above parameters.
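
Alternatively to the analytical parametrisation, the threshold can be located numerically as the smallest $p$ for which $P^{\mathrm{MF}}_{f|v}$ drops below a tolerance $\epsilon$. A sketch with illustrative parameter values (not the paper's exact ones):

```python
from scipy.optimize import brentq
from scipy.stats import binom

def p_critical(N, q=0.5, eps=1e-3):
    """Smallest p with P_f|v(p) <= eps, using the mean field degree k = p*N."""
    def f(p):
        k = max(int(p * N), 2)
        return binom.cdf(int(q * (k - 1)), k - 1, p) - eps
    return brentq(f, 0.4, 0.999)                # bracket contains the sign change

print(f"p_c ~ {p_critical(N=1000):.3f}")        # slightly above q = 1/2, as expected
```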

4. Proof-of-Concept Example

Notice that the mathematical model addressed in this work abstracts the specifics of transactions, blocks, network architectures, communication protocols, and so on. The implementer must therefore provide definitions for (a) what a transaction is, (b) a criterion for consistent ordering of transactions (this is equivalent to defining the analogue of the double-spending problem), (c) how transactions can be packed into blocks, and (d) how information is spread over the network. When these specifications are provided, there are at least two possible scenarios where the findings addressed in this work can be applied: Wireless Sensor Networks and the Internet-of-Things.

As stressed, there are fundamental discrepancies between the proposed model and the current blockchain protocol implementation in cryptocurrencies. In particular, in our approach the information is not transmitted immediately to the miners once blocks are created; it is sequentially diffused as shown in Figure 4. This has its pros and cons depending on the application domain.

When there is no Proof-of-Work requirement, the block mining lags tend to zero and the verification and generation delays become comparable. This way, the blockchain construction speed is dominated by network latency. Therefore, in the absence of POW, one can reschedule the agents' diffusion to save network operations. In the following we show a proof-of-concept example in the IoT domain where we compare our model with an asynchronous diffusion scheme akin to the conventional blockchain implementation. In the context of IoT, consider the problem of human mobility tracking where two individuals leave rooms A and B to reach rooms D and E through hall C (Figure 7). Five presence sensors A–E are continuously capturing data of the form $(s, t, e)$, where $s$ identifies the sensor, $t$ represents the measurement time, and $e$ stands for the presence event. Measures are collected at intervals $\Delta T$ and then checked for consistency. Within $\Delta T$, time is split into subintervals of length $\tau$. These quantities represent the minimum displacement time between home areas or any other relevant time scale. In general they will be functions of the sensor sampling rates. Therefore, we discretise the continuous variable $t$ into measurement epochs $t_k$ implicitly defined as $t_k = k$ for $t \in [k\tau, (k+1)\tau)$. This allows preprocessing raw data into a dataset $D$ with entries of the form $(s, t_k)$, where we also drop the $e = 0$ values. Maintaining our cryptocurrency metaphor, we define transactions as ordered pairs in $D$: $(s, t_k) \to (s', t_{k'})$. For instance, $(A, t_1) \to (C, t_2)$ represents the movement of a person from room A at epoch $t_1$ to the hall C at epoch $t_2$. Some transactions do not represent real movement (e.g., pairs whose epochs are not consecutive). A possible criterion for the validness of a transaction is $t_{k'} = t_k + 1$. This restricts the type of movements allowed in a specific way, but any other criterion can also be defined.
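
The preprocessing pipeline just described can be sketched as follows (type and function names are ours; the consecutive-epoch criterion is the one proposed above):

```python
def discretise(readings, tau):
    """Bin raw (sensor, time, presence) triples into epochs; drop e = 0."""
    return [(s, int(t // tau)) for (s, t, e) in readings if e != 0]

def valid_transaction(src, dst):
    """A movement is valid only if it joins consecutive epochs."""
    return dst[1] == src[1] + 1

data = [("A", 0.2, 1), ("B", 0.3, 1), ("C", 1.1, 1), ("C", 1.4, 1),
        ("D", 2.2, 1), ("E", 2.5, 1), ("A", 0.8, 0)]
print(discretise(data, tau=1.0))
print(valid_transaction(("A", 0), ("C", 1)))   # True: one-epoch displacement
print(valid_transaction(("A", 0), ("E", 2)))   # False: skips an epoch
```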

Next we define a path as an ordered sequence of transactions. If $T$ denotes the set of possible transactions among the measurements in $D$ collected within $\Delta T$, consider two possible paths:
$$\pi_1 = \big\{(A,t_0)\to(C,t_1),\;(B,t_0)\to(C,t_1),\;(C,t_1)\to(D,t_2),\;(C,t_1)\to(E,t_2)\big\},$$
$$\pi_2 = \big\{(A,t_0)\to(C,t_1),\;(C,t_1)\to(D,t_2),\;(C,t_1)\to(E,t_2),\;(B,t_0)\to(C,t_1)\big\}.$$
Both paths represent the movement of two individuals from A, B to D, E. However, $\pi_2$ is not consistent, since the person in B intends to move from C to E before reaching C.
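
This consistency check is the mobility analogue of the double-spend test: a departure from a (room, epoch) entry can only be packed if a matching arrival is already in the chain. A sketch with the two paths above (the orderings are illustrative reconstructions):

```python
from collections import Counter

def path_consistent(path, initial_entries):
    """A departure from (room, epoch) needs a matching arrival already packed."""
    balance = Counter(initial_entries)
    for src, dst in path:
        if balance[src] <= 0:
            return False                 # spends an arrival not yet in the chain
        balance[src] -= 1
        balance[dst] += 1
    return True

pi1 = [(("A", 0), ("C", 1)), (("B", 0), ("C", 1)),
       (("C", 1), ("D", 2)), (("C", 1), ("E", 2))]
pi2 = [(("A", 0), ("C", 1)), (("C", 1), ("D", 2)),
       (("C", 1), ("E", 2)), (("B", 0), ("C", 1))]
print(path_consistent(pi1, [("A", 0), ("B", 0)]))   # True
print(path_consistent(pi2, [("A", 0), ("B", 0)]))   # False: C -> E packed too early
```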

Since we neglect POW, we can consider blocks containing one transaction only, which can therefore be generated immediately. The order criterion provides the means for building the information chain while avoiding the type of order inconsistencies discussed above.

We also consider a minimal set of three distributed agents (miners in our analogy) which will build the chain. Depending on the network architecture and the communication protocol, the information flow among agents can be defined in different ways. However, the model provided in Section 2 allows a considerable reduction of network operations, which is more amenable to an IoT implementation. In the bottom panel of Figure 7 we use simplified sequence diagrams to compare the information flow of the blockchain (a) and sequential diffusion (b) models, as we did in Figure 4. In the lower part, we have also included a summary of the local information stored at each node.

Without loss of generality, the mining ordering can be mapped to nodes 1–3 (again, as in Figure 4, we use a single thread for verification and mining processes in the nodes, since mining times are much larger than verification times). In (a), node 1 first extracts and validates a transaction from the dataset and broadcasts the corresponding block to the network (1-2). After validating the block, node 2 in turn forwards it to node 3 (3-4). At a later time, node 2 validates a second transaction, adds it to its local copy of the chain, and distributes the information among the other nodes (6, 7). Next, node 3 has itself mined a third transaction, which is then validated and sent to the network (9–11). Finally, node 1 only finds it consistent to add the last pending transaction to its local chain and then broadcasts the information to the network for its validation and transmission (13–15).

However, in the sequential diffusion model (b), as stressed, agents do not immediately forward transactions/blocks as they receive them; nodes propagate information when they generate new blocks. In the absence of POW, agents can synchronise to save unnecessary communication processes. This way, node 2 does not forward the first block to node 3 (2′) after receiving it from node 1 (1′); the information is sent when packing its own block (3′), and so on. This reduces the network traffic considerably. When a miners' round is completed, the last node sends a sync message (dotted line from 8′ to 9′) to the first mining node of the next round (node 1 in this case), until there are no more transactions to verify. If there are $n$ agents and $m$ transactions, the number of messages grows with $m$, $n$, and the degree $d$ of each node in the agents' network in scheme (a), whereas in (b) each block is broadcast only once, so the relay overhead is avoided. The maximum overhead is attained for the full graph, where $d = n - 1$ and both models coincide.
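
A back-of-the-envelope message count illustrates the saving. The counting rules below are our own simplification (hop-by-hop relaying in (a), one broadcast per packed block plus a sync per round in (b)), not measurements of either protocol:

```python
import math

def messages_immediate_relay(m, n, d):
    """(a) every block is relayed hop by hop until all n agents know it."""
    relays_per_block = math.ceil((n - 1) / d)
    return m * d * relays_per_block

def messages_sequential(m, n, d):
    """(b) one broadcast per packed block, plus one sync message per round."""
    return m * d + math.ceil(m / n)

# For the full graph (d = n - 1) the two counts coincide up to sync messages
for d in (2, 3):                 # agent network degree; d = 3 is the full graph here
    print(d, messages_immediate_relay(m=6, n=4, d=d), messages_sequential(m=6, n=4, d=d))
```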

Since WSN and IoT devices in general have very low battery capacities, the affordable network traffic is dramatically limited. Therefore, the model addressed here can add value in these situations.

5. Summary and Discussion

In this paper we have analysed, both theoretically and numerically, the conditions under which distributed sequential consensus is possible in the presence of partial connectivity and uncertainty. A minimal sequential diffusion model, consisting of the superposition of a ring lattice with a random graph along with ordered infection rules, allowed us to capture key blockchain elements: partial connectivity, sequential diffusion, and failure spreading.

In our setting, a mean field approximation for the network degree was helpful in deriving closed-form expressions for the probability of chain verification once errors are detected. We found that verifications grow exponentially with attack strength. As expected, graph connectivity reduces verification rates. This is a remarkable result because a large number of verification steps translates directly into cost and efficiency losses in real implementations.

We have also provided expressions for the probability of network breakdown when consensus is not possible. To this end, we have investigated analytically the constituents of the consensus problem in blockchains. We found that clustering dominates over random infection in the large network size limit. This allowed us to derive an expression for the average network performance as a function of connectivity and failure strength. We validated this expression by Monte Carlo simulation. As expected, 100% performance—the blockchain limit—can only be achieved for full connectivity. Furthermore, partial connectivity imposes an upper bound on the network match. Stronger attack strategies result in lower performance.

The resulting expressions allow us to derive connectivity thresholds above which networks can tolerate an attack. Beyond that threshold, consensus is only limited by information spreading and not by failure. A set of reasonable assumptions and algebraic manipulations allowed us to derive operational expressions for this bound. Specifically, for simple majority based consensus, we arrived at a solution for the critical connectivity slightly above $1/2$. This means that, in a scenario with maximum attack strength, beyond this critical connectivity the percentage deviation of the blockchain consensus with respect to the upper connectivity bound is lower than 15%.

Clearly, this contribution is just a first step towards understanding partially connected blockchains; the problem needs further elaboration in order to foster more robust implementations. For instance, we have neglected some communication issues such as delay and bandwidth limitations. In future work we will research other topological models, such as scale-free and spatial networks with heterogeneous links. Multiplex networks will also allow us to inquire into different attack patterns and possible counterattacking strategies.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by the Regional Ministry of Education from Castilla y León (Spain) and the European Social Fund under the MOVIURBAN project with Ref. SA070U16.