Abstract

Bitcoin mining consumes tremendous amounts of electricity to solve hash puzzles. At the same time, large-scale applications of artificial intelligence (AI) require efficient and secure computing. Vast numbers of computing devices are in use, and their hardware resources are highly heterogeneous, so a cooperation mechanism is needed to coordinate the devices, along with a sound computation structure for dispersed data. In this paper, we propose an architecture in which devices (also called nodes) reach a consensus on task results using off-chain smart contracts and private data. The proposed distributed computing architecture can accelerate computing-intensive and data-intensive supervised classification algorithms with limited resources. It significantly strengthens privacy protection and prevents leakage of distributed data, and it supports heterogeneous data, making computing on each device more efficient. We prove the correctness and robustness of our system mathematically and derive the condition for stopping a given task. In the experiments, we transformed Bitcoin's hash collision into distributed computing on several nodes and evaluated training and prediction accuracy on handwritten digit images (MNIST). The experimental results demonstrate the effectiveness of the proposed method.

1. Introduction

Artificial intelligence (AI) has significantly affected many aspects of human life, solving tasks such as image classification and object detection with supervised classification algorithms. Supervised classification algorithms use computational methods to learn directly from data, and prediction accuracy generally grows with the number of training samples. Increasing the number of training samples, however, also increases training time. Thus, an architecture in which nodes can quickly obtain results and reach a consensus through off-chain smart contracts and private data would be extremely useful.

At present, research on blockchain-based computing power can be summarized as follows:
(1) Some works leverage auction mechanisms to offload tasks [1–6]. In these works, an application is divided into multiple tasks, and the tasks are offloaded to a cloud server or edge servers; time consumption, energy consumption, and the edge servers' reputation serve as auction indexes. To our knowledge, heterogeneous devices and privacy issues are not considered.
(2) Several methodologies use deep learning to derive task offloading for heterogeneous devices [7–9]. However, privacy issues are not well solved, or the computing devices must work in a permissioned network.
(3) Distributed computing based on Federated Learning (FL) has been proposed because it protects privacy and reduces the network burden [10–13]. FL can complete AI computation without disclosing data, but it is not well suited to heterogeneous devices.

Smart contracts are naturally a distributed computation technology. Although mature, on-chain smart contracts have shortcomings in running complex programs: Bitcoin scripts are not Turing-complete [13], and Ethereum does not support the execution of complex computations [14–16].

Calculating over distributed and heterogeneous data must yield a consistent result while accounting for speed, energy consumption, and privacy protection. Reputation is an important index for evaluating nodes: nodes with a higher reputation can process more tasks and earn more rewards. Many blockchain-based computing models punish malicious nodes [17–19]. However, with different training samples and devices, even honest nodes can make mistakes in AI calculations (e.g., supervised classification algorithms). Such penalties cause large fluctuations in node reputation and distort the calculation results.

In our proposed model, off-chain smart contracts and private data (edge data centers) are leveraged for multiparty computing. Our method speeds up training and improves prediction accuracy and privacy. Our experimental results on MNIST show that the cooperation of many low-power nodes is competitive with a centralized server. The key points of this paper can be summarized as follows:
(1) We propose a computing model with strong privacy protection and broad device compatibility for supervised classification algorithms, which accelerates training and improves the accuracy of prediction results.
(2) For the proposed architecture, we prove that the prediction result with the most supporters is the one most likely to be correct.
(3) Using the Nakamoto consensus, we quantify how the number of nodes and the accuracy of single-node prediction affect the entire blockchain, and we derive the condition for task termination.
(4) We further discuss the influence of malicious nodes and lazy nodes on the prediction results and prove robustness against both.

The remainder of this article is organized as follows. In Section 2, the background of the study is discussed, and the rationale behind the design is explained. In Section 3, the novel distributed computing architecture is demonstrated. Section 4 explains how the methodology improves prediction accuracy and the robustness of the supervised classification algorithm. Section 5 presents the experimental results and discusses the performance of the proposed architecture. Finally, Section 6 presents the conclusions of this study.

2. Background

Similar to traditional programs, smart contracts can be stored and executed. However, smart contracts are distributed programs residing in the blockchain; they are triggered automatically according to instructions and do not require the participation of a third party. For a given trigger event, the result of distributed execution must be unique, so all nodes must agree on the solution through what is known as a consensus mechanism. This section introduces the consensus mechanism in Bitcoin and presents relevant research on smart contracts.

2.1. Bitcoin and Nakamoto Consensus

At present, blockchain technology has become a research hot spot in finance, IoT, copyright protection, and information technology. A blockchain is a decentralized peer-to-peer (P2P) architecture whose nodes are the network participants. Blockchain establishes transparency and trust without relying on a third party.

As the first widely deployed, decentralized global currency, Bitcoin has attracted increasing attention. Nodes in Bitcoin compete to solve challenging Proof of Work (PoW) problems, with solutions worked out roughly every ten minutes. The winners who solve a problem receive bonuses as rewards, which are stored in blocks. Blocks propagate among Bitcoin nodes through the network, so the bonuses are redundantly recorded by every node. When a block is accepted and added to the blockchain, the height of the blockchain increases by one. In some cases, multiple nodes solve the problem before receiving solutions from others, so multiple blocks may be generated at a given height.

A consensus mechanism is an algorithm by which a group of nodes reaches agreement on events and their order. There are many consensus mechanisms, such as PBFT [20], Paxos [21], Raft [22], Proof of Stake (PoS) [23], Tangle [24], and Shimmer [19]. The core technology for reaching consensus in Bitcoin is the Nakamoto consensus, as presented in Figure 1. In the figure, blocks are chained in succession from Block0 to Blockn−1, where Block0 is at height 0 and Blockn−1 is at height n − 1. At height n, node Alice and node Bob declare their blocks Blockn and Block′n concurrently. Node Carol receives Blockn before Block′n, so Blockn is added to Carol's fork as a tip, and Block′n is stored as a backup. The same goes for Alice and Bob: each builds a fork using its own block and stores the other's block in memory as a backup. Alice and Carol thus keep the same tip in memory, while Bob keeps a different one. At height n + 1, Carol publishes its block Blockn+1, with Blockn's hash kept in Blockn+1's header. Because the fork with Blockn+1 is longer than the one with Block′n, when Bob receives Blockn+1, it keeps Blockn+1 in memory, activates Blockn+1's previous blocks (i.e., Blockn), and reserves Block′n as a backup. At height n + 2, Bob publishes its new block Blockn+2 with the header pointing to Blockn+1. As a result, Alice's bonus in Blockn is accepted by every node, and Bob's bonus in Block′n is ignored.
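For illustration, the fork-choice logic just described can be sketched as follows. This is a simplified model: real Bitcoin compares accumulated proof-of-work rather than raw height, and the structures and names here are illustrative, not taken from any implementation.

#include <map>
#include <string>

// Minimal sketch of longest-chain fork choice in the spirit of Figure 1.
struct Block {
    std::string hash;
    std::string prevHash;
    int height = 0;
};

struct Node {
    std::map<std::string, Block> blocks;  // every block seen, incl. backups
    std::string tip;                      // hash of the tip of the best fork
    int tipHeight = -1;

    // Nakamoto consensus: adopt the block if it extends a fork longer than
    // the current best; otherwise keep it only as a backup in case its
    // fork later becomes the longest.
    void Receive(const Block& b) {
        blocks[b.hash] = b;               // store (possibly as backup)
        if (b.height > tipHeight) {       // longer fork wins
            tip = b.hash;
            tipHeight = b.height;
        }                                 // equal height: first-seen stays tip
    }
};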

At height n, there are two competing blocks, forming what are called forks. After several rounds of competition, the longest fork is considered the best chain in Bitcoin. The computing competition on the PoW problem, driven by Bitcoin's computing-power incentives, is also called mining [25]. Bitcoin adjusts the mining difficulty to ensure that a result is worked out every 10 minutes. In early October 2020, the Bitcoin difficulty was 19.30 T, and the hash rate reached 138.09 exahashes per second (EH/s) [26]. Such enormous computing power makes Bitcoin the most energy-consuming application. According to Digiconomist [27], the estimated power used by miners to verify Bitcoin blockchain transactions is 70.89 TWh per year, more than the annual electricity consumption of Colombia and 41 other countries. The power wasted on hash collisions has therefore become an emerging concern.

2.2. Smart Contracts

The concept of the smart contract was proposed by Nick Szabo in the 1990s [28]; he proposed embedding contract terms into computer components. However, the concept remained theoretical because the necessary technologies and protocols were not available at the time. Today, these requirements are met, allowing smart contracts to be implemented with blockchain technology.

There are two types of smart contracts: on-chain and off-chain. On-chain smart contracts are executed by all nodes in the network; examples include Bitcoin scripts [29], Ethereum smart contracts [30], and Fabric chaincodes [31]. On-chain smart contracts have three disadvantages. First, they must be run by all nodes, so they scale poorly. Second, because of the Turing halting problem (infinite loops cannot be detected in advance), a script that loops forever keeps consuming execution resources until the system crashes. Third, although external data can be fed to smart contracts by oracles, that data is visible to all nodes.

Off-chain smart contracts are executed outside the core protocol, and only a subset of nodes needs to execute them; examples include the ongoing IOTA Smart Contracts [32], FastKitten [17], Ekiden [33], and ZoKrates [34]. In these systems, task computation is performed off-chain using multiparty computation (MPC) [35, 36], and consensus is reached on-chain. While off-chain smart contracts do not burden the network and can handle heterogeneous data, their overall security depends on the security of each device. These works are summarized in Table 1. Although they adopt off-chain smart contracts to enable efficient, low-cost decentralized task execution, none of them considers heterogeneous data and devices.

3. Novel Distributed Computing Architecture

Traditionally, a single computer spends much time training a large number of samples. Meanwhile, many types of computing devices are used ubiquitously around the world, such as smartphones, smart vehicles, and wearable devices. We therefore propose a blockchain-based architecture for supervised classification that gathers the computing power of scattered equipment and reduces calculation time while ensuring accuracy. The devices are heterogeneous: some have powerful hardware, some have capable operating systems, and some hold training samples well suited to particular tasks. To retain each device's advantages, we propose a blockchain framework named RapidTrainChain with flexible off-chain smart contracts and a compatible consensus, named Proof of Prediction (PoP), for node cooperation. In PoP, the longest chain is selected as the best chain. Each block of the chain stores transactions and task solutions; blocks with the same solution are linked in the same fork, while different solutions reside in different forks.

RapidTrainChain is designed as a distributed computing system that maximizes overall performance and protects data. The system architecture is shown in Figure 2. Off-chain algorithms and private data are managed by the devices, which are also called nodes in the blockchain. When RapidTrainChain receives a task, each node determines whether to start working on it; a free node triggers its off-chain smart contract through an interface to start computing. After the nodes complete a task, the prediction results are stored in blocks of RapidTrainChain. The same solutions are stored in the same fork, and the longest fork is considered the best chain; the solution stored in the blocks of the best chain is the final solution for the task. A node can decide when to stop working, as discussed in Section 4.5.

In contrast with hash collision in Bitcoin or Ethereum, the accuracy of a prediction result cannot be verified by other nodes, so every legal prediction result from every node is stored in RapidTrainChain. Multiple nodes may also generate the same prediction result simultaneously, giving rise to blocks with the same prediction stored on different forks. In that case, the prediction result with the most supporters might fail to constitute the best chain. To ensure that blocks supporting the same prediction end up in the same fork, every node checks whether its own prediction is consistent with the one in the latest received block; if it is consistent and its block is not already in that fork, the node publishes a new block following the received one. The workflow is shown in Figure 3.

All nodes begin training at t0. Bob finishes training first and publishes its prediction result in the yellow block at t1. Later, Alice and Carol publish their results at t2. Alice's and Carol's prediction results are the same but differ from Bob's. Because Alice and Carol publish their blocks simultaneously, they cannot follow each other's block, and they cannot follow Bob's block because their prediction results differ from his. At t3, Alice and Carol again publish their blocks Blockn+1 and Block′n+1 separately, so both forks are the longest. At t4, Dave publishes a prediction result that matches Alice's and Carol's. Dave received Blockn+1 earlier than Block′n+1, so its Blockn+2 follows Blockn+1, and the fork with Blockn+2 becomes the best chain. Every node has its own off-chain smart contracts and private training data. When a new task starts, nodes train on their private samples with their off-chain smart contracts and publish the prediction results in blocks. Blocks with the same prediction result are connected to the same fork, while different prediction results are stored in different forks. The result with the most supporters, that is, the longest fork, constitutes the best chain. The accuracy of the prediction is discussed in Section 4.1.
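The publication rule described above can be sketched as follows (the type and function names are illustrative, not taken from the RapidTrainChain implementation): a node extends the received block only when the predictions match and its own block is not yet in that fork, so identical predictions accumulate in one fork.

#include <string>

struct PoPBlock {
    std::string hash, prevHash;
    int height = 0;
    int prediction = 0;                    // the solution this block supports
};

// PoP rule from Figure 3: extend the newest received block if and only if
// its prediction matches ours and our block is not already in that fork.
bool ShouldExtend(const PoPBlock& received, int myPrediction,
                  bool myBlockAlreadyInFork) {
    return received.prediction == myPrediction && !myBlockAlreadyInFork;
}

PoPBlock PublishFollowing(const PoPBlock& received, int myPrediction) {
    PoPBlock b;
    b.prevHash = received.hash;            // chain onto the matching fork
    b.height = received.height + 1;
    b.prediction = myPrediction;
    return b;                              // hash is computed on broadcast
}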

Parameters and descriptions used in this paper are listed in Table 2.

In both PoP and PoW, the longest fork is the best chain, but PoP differs from PoW in the following ways:
(1) PoP does not work on hash collision or any one fixed algorithm; it works with various off-chain smart contracts.
(2) PoP does not wait a fixed period for confirmation and security; the confirmation condition for task solving is discussed in Sections 4.3 and 4.4.
(3) Nodes do not verify the correctness of prediction results, but they reject illegal blocks.
(4) Identical prediction results are chained into blocks of the same fork.

4. Quantification and Proofs

Smart contracts are executed on multiple nodes in a distributed way. Because no single node can hold all the training samples, a node may produce incorrect results. This section shows that the consensus mechanism ensures that the voted result is correct: even when some nodes skip calculations or cheat RapidTrainChain out of selfish motives, they do not prevent RapidTrainChain from obtaining the right results.

4.1. Accuracy Estimation

Because the data is distributed and private, one node holds comparatively fewer training samples than a centralized system would, which suggests that a single node has low predictive accuracy. However, nodes collaborating through blockchain technology can provide highly accurate prediction results.

In this paper, RapidTrainChain works on the premise that appropriate private data and smart contracts are adopted. For example, if a delivery person's smartphone shows relatively more people ordering hotpot today, the weather is more likely to be cold than hot or mild. This suggests that takeout order data, together with appropriate off-chain smart contracts on smartphones, could be used for weather inference. Under this premise, nodes are more likely to choose the correct classification.

As in the example, the weather may be hot, mild, or cold, so |Cweather| = 3. The correct solution is also called the target; if today is cold, targetweather = cold. Under the premise that private data and off-chain smart contracts are appropriate, nodei is more likely to predict cold weather than any other classification; that is, p_i,cold > p_i,c for c ∈ {hot, mild}.

Just like voting, the solution with the most supporters is elected, and the more participating nodes there are, the higher the accuracy. RapidTrainChain's accuracy changes with the number of nodes and the accuracy of each node. Let us start the proof with a simple case:
(1) For any solution to a given task, each node has the same probability of working it out; that is, p_i,c = p_j,c for any two nodes nodei and nodej. In particular, every node predicts targetx with the same probability p.
(2) Except for targetx, the probability of a node getting each of the other solutions is the same; that is, p_c = (1 − p)/(|Cx| − 1) for every c ∈ Cx with c ≠ targetx.

Algorithm 1 calculates RapidTrainChain's accuracy P. Using this algorithm, we obtain the curves in Figure 4 with |Cx| = 4, p ranging from 0.004 to 1, and |Nodes| from 1 to 30. When p = 1/|Cx|, no matter how large |Nodes| is, the curve neither rises nor falls. When p > 1/|Cx|, with plenty of nodes working together, the curves ascend to 1. When p < 1/|Cx|, with adequate nodes, the curves descend to 0. Thus, if the nodes are more likely to choose the correct classification after training (i.e., p > 1/|Cx|) and enough nodes work together, the system's prediction is almost surely correct. Equation (1) shows the trend of P:

  lim_{|Nodes|→∞} P = 1 if p > 1/|Cx|;  P = 1/|Cx| if p = 1/|Cx|;  lim_{|Nodes|→∞} P = 0 if p < 1/|Cx|.   (1)

Result: Accuracy of RapidTrainChain, P
Input: number of nodes, |Nodes|; accuracy of a node, p; number of classifications, |Cx|

// Algorithm 1, cleaned up into runnable C++: P is the probability that
// strictly more nodes predict target_x than predict any other
// classification, under assumptions (1) and (2) above.
#include <algorithm>
#include <cmath>

// Binomial coefficient C(n, k), computed iteratively as a double.
double Binom(int n, int k) {
    double r = 1.0;
    for (int i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}

// Sum of multinomial coefficients over all ways to distribute `rest`
// nodes among `slots` wrong classifications so that each classification
// gets fewer than `cap` nodes (cap = number of nodes predicting target_x).
double Decompose(int rest, int slots, int cap, double coeff) {
    if (slots == 0)
        return rest == 0 ? coeff : 0.0;
    double sum = 0.0;
    for (int k = 0; k <= std::min(rest, cap - 1); ++k)
        sum += Decompose(rest - k, slots - 1, cap, coeff * Binom(rest, k));
    return sum;
}

// Accuracy of RapidTrainChain, P.
double Accuracy(int nodes, double p, int classes) {
    double q = (1.0 - p) / (classes - 1);   // probability of each wrong class
    double P = 0.0;
    for (int t = 1; t <= nodes; ++t)        // t nodes predict target_x
        P += Decompose(nodes - t, classes - 1, t, Binom(nodes, t))
             * std::pow(p, t) * std::pow(q, nodes - t);
    return P;
}
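For instance, assuming the Accuracy function from the listing above, the trend in Figure 4 can be explored point by point (the parameter values below are illustrative):

#include <cstdio>

int main() {
    // |Cx| = 4 as in Figure 4: accuracy climbs toward 1 as nodes are added,
    // since a single node beats random guessing (p = 0.4 > 1/|Cx| = 0.25).
    for (int nodes : {1, 10, 20, 30})
        std::printf("|Nodes| = %2d  P = %f\n", nodes, Accuracy(nodes, 0.4, 4));
}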

Suppose that the accuracy of a node named nodej is p_j; the total accuracy of RapidTrainChain (P) can then be characterized as follows:
(1) If nodej properly trains samples and computes more accurately than nodei, that is, p_j > p_i, while the number of nodes remains constant, the accuracy P is improved.
(2) If nodej properly trains samples but computes less accurately than nodei, that is, 1/|Cx| < p_j < p_i, then given enough nodes computing at least as accurately as nodej, the accuracy P is still high.
(3) If nodej skips training and directly makes predictions, the classifications are selected at random; that is, p_j,c = 1/|Cx| for every c ∈ Cx. This situation is discussed in Section 4.2.
(4) If nodej improperly trains samples, it is more likely to pick wrong solutions; that is, p_j < 1/|Cx|. This situation is discussed in Sections 4.3 and 4.4.

4.2. Robustness with Lazy Nodes

Some nodes may skip training and make predictions directly; their results are picked uniformly at random from the set of all solutions (Cx), and such nodes are called lazy nodes. Nodes that properly train samples are called honest nodes. The accuracies of the two types are as follows:
(1) For lazy nodes, p_lazy,c = 1/|Cx| for every c ∈ Cx.
(2) For honest nodes, p_targetx > 1/|Cx|,
where lazy_node ∈ Nodes_lazy and honest_node ∈ Nodes_honest, p_targetx denotes an honest node's probability of predicting targetx, and p_c its probability of predicting another solution c. Therefore, the expected fork lengths for the different solutions are

  E(L_targetx) = |Nodes_honest| · p_targetx + |Nodes_lazy| / |Cx|,   (2)
  E(L_c) = |Nodes_honest| · p_c + |Nodes_lazy| / |Cx|,  c ∈ Cx, c ≠ targetx.   (3)

Equation (2) is the expected length of the fork with targetx, and equation (3) is the expected length of any other fork. The distance between the tips of the forks is equation (2) minus equation (3), that is, |Nodes_honest| · (p_targetx − p_c). As long as p_targetx > 1/|Cx| and p_c < p_targetx, the greater the number of honest nodes, the more likely the fork with targetx is the longest. The number of lazy nodes does not change this expected distance and therefore does not affect accuracy.
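To illustrate equations (2) and (3), the following Monte Carlo sketch (the parameter values and function names are our own, not from the paper's implementation) estimates how often the fork with targetx is the longest as lazy nodes are added:

#include <cstdio>
#include <random>
#include <vector>

// Lazy nodes vote uniformly over |Cx| solutions; honest nodes pick
// target_x (class 0) with probability p and a wrong class uniformly
// otherwise. We estimate how often target_x's fork is strictly longest.
double TargetWinsRate(int honest, int lazy, double p, int classes,
                      int trials = 100000) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::uniform_int_distribution<int> anyClass(0, classes - 1);
    std::uniform_int_distribution<int> wrongClass(1, classes - 1);
    int wins = 0;
    for (int t = 0; t < trials; ++t) {
        std::vector<int> len(classes, 0);
        for (int i = 0; i < honest; ++i)            // honest: target w.p. p
            ++len[u(rng) < p ? 0 : wrongClass(rng)];
        for (int i = 0; i < lazy; ++i)              // lazy: uniform guess
            ++len[anyClass(rng)];
        bool longest = true;
        for (int c = 1; c < classes; ++c)
            if (len[c] >= len[0]) { longest = false; break; }
        wins += longest;
    }
    return static_cast<double>(wins) / trials;
}

int main() {
    // Per equations (2) and (3), the expected gap between the target fork
    // and the others does not depend on the number of lazy nodes.
    for (int lazy : {0, 20, 100})
        std::printf("lazy = %3d  rate = %f\n",
                    TargetWinsRate(30, lazy, 0.4, 4));
}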

4.3. Solutions Competitions

Suppose that one solution ck is easier to work out than all other solutions except targetx; that is, p_ck > p_c for every c ∈ Cx with c ∉ {targetx, ck}. The race between the fork with ck and the fork with targetx can then be characterized as a binomial random walk. The probability proz that the fork with ck catches up with the fork with targetx from z blocks behind is

  proz = 1 if p_ck ≥ p_targetx;  proz = (p_ck / p_targetx)^z if p_ck < p_targetx.   (4)

If p_ck < p_targetx, proz drops exponentially as z increases. As the number of blocks supporting targetx grows, the chance that the fork with ck catches up becomes vanishingly small.

Consider when RapidTrainChain is sufficiently certain that ck cannot win the task: targetx has been added to a block, and at least z more nodes support targetx than support ck. Assuming that the honest nodes spend roughly the same time making predictions, ck's potential progress follows a Poisson distribution with expected value

  λ = z · (p_ck / p_targetx).   (5)

Equation (5) is the same as in the "Calculations" section of the Bitcoin whitepaper [39]. Using Nakamoto's method, the probability of ck ever catching up can be expressed as

  pro = 1 − Σ_{k=0}^{z} (λ^k · e^{−λ} / k!) · (1 − (p_ck / p_targetx)^(z−k)).   (6)

According to equation (6), the probability changes with p_ck / p_targetx and z. The curves in Figure 5 show that the probability drops exponentially with z. The closer p_ck and p_targetx are to each other, the more blocks are needed to ensure that the fork with ck does not catch up with the fork with targetx. If the probability of getting solution ck is half that of targetx, that is, p_ck = p_targetx / 2, the probability of catching up is 0.010943 when z = 21. In this case, when one fork is 21 blocks longer than another, the solution in the longest fork is very likely to be targetx.
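Equation (6) can be evaluated with a direct adaptation of the C code in the whitepaper's "Calculations" section, replacing the attacker/honest hash-rate ratio with the prediction-probability ratio r = p_ck / p_targetx (a sketch, not the authors' code):

#include <cmath>
#include <cstdio>

// Probability that the fork with c_k ever catches up from z blocks
// behind, per equation (6); lambda is its expected progress, eq. (5).
double CatchUpProbability(double r, int z) {
    double lambda = z * r;
    double sum = 1.0;
    for (int k = 0; k <= z; ++k) {
        double poisson = std::exp(-lambda);
        for (int i = 1; i <= k; ++i)
            poisson *= lambda / i;         // Poisson density at k
        sum -= poisson * (1 - std::pow(r, z - k));
    }
    return sum;
}

int main() {
    // With p_ck half of p_targetx and a 21-block lead, the catch-up
    // probability is about 0.01, consistent with the value above.
    std::printf("%f\n", CatchUpProbability(0.5, 21));
}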

4.4. Competition with Sybil Attackers

In distributed systems, there are many kinds of attackers, such as Sybil attackers. Sybil attackers continuously propose a certain wrong prediction (cw); that is, p_Sybil,cw = 1. Suppose that the number of Sybil attackers is |Nodes_Sybil|. The expected fork lengths are then

  E(L_targetx) = |Nodes_honest| · p_targetx,   (7)
  E(L_cw) = |Nodes_honest| · p_cw + |Nodes_Sybil|.   (8)

Equation (7) is the expected length of the fork with targetx, while equation (8) is the expected length of the fork with cw. Equation (7) minus equation (8) is the distance between the tips of the two forks, that is, |Nodes_honest| · (p_targetx − p_cw) − |Nodes_Sybil|.

More than |Nodes_Sybil| / (p_targetx − p_cw) honest nodes are needed to ensure RapidTrainChain's accuracy. Suppose p_cw = p_targetx / 2. After a certain period, when most honest nodes have finished making predictions, if one fork is 21 blocks longer than the others (as described in Section 4.3), the solution in this fork is very likely to win.
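Under this reconstruction of the threshold, the required number of honest nodes follows directly from keeping the tip distance in equations (7) and (8) positive (an illustrative sketch; the parameter values are assumptions):

#include <cstdio>

// Smallest integer strictly greater than |Nodes_Sybil| / (p_targetx - p_cw).
int MinHonestNodes(int sybil, double pTarget, double pWrong) {
    return static_cast<int>(sybil / (pTarget - pWrong)) + 1;
}

int main() {
    // e.g., 10 Sybil attackers, p_targetx = 0.4, p_cw = 0.2:
    std::printf("%d honest nodes needed\n", MinHonestNodes(10, 0.4, 0.2));
}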

4.5. Task Duration

RapidTrainChain works on one task until one fork is unambiguously the longest, such that the length of the fork satisfies the condition

  L_ck − max{L_c : c ∈ Cx, c ≠ ck} > |Nodes| − Σ_{c ∈ Cx} L_c,   (9)

where ck is a solution belonging to Cx and L_c denotes the length of the fork supporting solution c. If the fork with ck is the longest and no other fork can possibly become longer, even if every node that has not yet published supports another solution, RapidTrainChain stops the task. In a public network, however, the number of nodes is effectively unbounded, so condition (9) may never hold; instead, RapidTrainChain should stop working on a task once the other forks are very unlikely to catch up with the leading one. As discussed in Sections 4.3 and 4.4, if there are more than |Nodes_Sybil| / (p_targetx − p_cw) honest nodes after a certain period and supposing p_c ≤ p_targetx / 2 for the competing solutions, RapidTrainChain stops the task when one fork is 21 blocks longer than the others.
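A minimal sketch of these two stopping rules, assuming fork lengths are tracked per solution (the names and the helper are illustrative, not the authors' implementation):

#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// forkLengths[c] is the current length of the fork for solution c;
// totalNodes is |Nodes|, meaningful only when the node count is known.
bool ShouldStop(const std::vector<int>& forkLengths, int totalNodes) {
    if (forkLengths.empty()) return false;
    std::vector<int> sorted = forkLengths;
    std::sort(sorted.begin(), sorted.end(), std::greater<int>());
    int longest = sorted[0];
    int second = sorted.size() > 1 ? sorted[1] : 0;
    int published = std::accumulate(sorted.begin(), sorted.end(), 0);
    int remaining = totalNodes - published;   // nodes yet to publish
    // Rule 1 (closed network): no other fork can possibly overtake, eq. (9).
    if (longest > second + remaining) return true;
    // Rule 2 (public network): catch-up probability is negligible once
    // the lead reaches 21 blocks (Sections 4.3 and 4.4, p_c <= p_targetx/2).
    return longest - second >= 21;
}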

5. Implementation and Evaluation

We enhanced the mining function in the Bitcoin source code (generatetoaddress in mining.cpp), the validation function (CheckProofOfWork in pow.cpp), and other Bitcoin-related functions to invoke off-chain smart contracts, which train samples, make predictions, and store the prediction results in blocks. We set up one powerful computer and 10 less powerful nodes in RapidTrainChain, as shown in Table 3. Each node in RapidTrainChain holds 5,000 training samples, while the powerful computer holds 50,000. We monitored the performance of the RapidTrainChain nodes and compared it with the computer's performance. To make the performance data comparable, the RapidTrainChain nodes and the powerful computer repeatedly adopted the same algorithm for training their samples and making predictions.
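As a rough illustration of the kind of change involved (only generatetoaddress and CheckProofOfWork above come from the Bitcoin source; the types and functions below, such as RunOffChainContract, are hypothetical names, not actual Bitcoin Core APIs), the hash-collision loop is replaced by a call into the node's off-chain contract:

// Hypothetical sketch of the PoP hooks; illustrative names only.
struct Prediction { int taskId; int label; };

// Called from the block-generation path (cf. generatetoaddress in
// mining.cpp): train on local private samples and emit a prediction
// instead of searching for a nonce.
Prediction RunOffChainContract(int taskId) {
    // ... invoke the node's local training/prediction program ...
    return Prediction{taskId, /*label=*/0};
}

// Called from validation (cf. CheckProofOfWork in pow.cpp): PoP nodes
// cannot verify a prediction's accuracy, so they only check that the
// block's embedded prediction is well formed (Section 3).
bool CheckProofOfPrediction(const Prediction& p) {
    return p.taskId >= 0 && p.label >= 0;   // legality, not correctness
}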

The node performances are summarized in Figure 6. The main highlights are as follows:
(1) From Figure 6(a), RapidTrainChain and the powerful computer start simultaneously at time = 20 seconds. RapidTrainChain stops at time = 1240 seconds, while the powerful computer stops at time = 5420 seconds; the powerful computer takes more than four times as long as RapidTrainChain.
(2) Figures 6(a) and 6(c) present the CPU and memory performance charts, showing that a single node in RapidTrainChain consumes far less CPU and memory. This is because each node trains fewer samples and the calculation burden is shared.
(3) Figure 6(d) shows the data storage performance chart. A single node in RapidTrainChain requires slightly more storage than the powerful computer because it must store messages from other nodes.
(4) As shown in the network performance chart in Figure 6(b), the powerful computer consumes little bandwidth, while RapidTrainChain uses some bandwidth for block transfer.

We used a convolutional neural network (CNN) to make predictions on MNIST. Because existing CNNs perform very well in digit recognition, prediction accuracy would be high even with only 5,000 training samples. We therefore divided the 50,000 samples into 20 parts (each containing 2,500 samples) and assigned the parts to 40 nodes. As shown in Figure 7, the accuracy of RapidTrainChain is much higher than the average accuracy of the individual nodes and increases with the number of nodes.

6. Conclusion

In this paper, we presented RapidTrainChain, a novel supervised learning approach based on Bitcoin. We introduced a rapid, compatible consensus mechanism (PoP) that helps RapidTrainChain make accurate predictions, and we formalized the cooperation mechanism to reduce the workload of a single node while maintaining overall accuracy, improving overall efficiency, and ensuring overall privacy. In our analysis and experiments, we showed the influence of honest nodes, lazy nodes, and Sybil attackers on the overall accuracy, and we implemented the proposed algorithm and evaluated its efficiency. Our results suggest that RapidTrainChain does not depend heavily on the computational power of single nodes and is friendly to heterogeneous devices. We found that the more nodes participate in RapidTrainChain, the more secure the system becomes. Moreover, the number of nodes does not noticeably affect the processing time for a given task, and RapidTrainChain can be applied in a public network.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded in part by the National Natural Science Foundation of China (Grant no. 61772352), National Key Research and Development Project (Grant nos. 2020YFB1711800 and 2020YFB1707900), the Science and Technology Project of Sichuan Province (Grant nos. 2019YFG0400, 2021YFG0152, 2020YFG0479, 2020YFG0322, and 2020GFW035), the R&D Project of Chengdu City (Grant no. 2019-YF05-01790-GX), National Natural Science Foundation of China (Grant no. 61871422), and Science and Technology Program of Sichuan Province (Grant no. 2020YFH0071).