Abstract

Blockchain is an emerging technology that promises many exciting applications in various fields, including financial, medical, energy, and logistics management. However, there are still some limitations in the existing blockchain framework that prevent its widespread adoption in the commercial world. One important limitation is the storage requirement, wherein each blockchain node has to store a copy of the distributed ledger. Thus, as the number of transactions increases, this storage requirement grows quadratically, eventually limiting the scalability of a blockchain system. Moreover, the public ledger in a blockchain framework allows anyone in the network to audit the transactions, which may not be favourable in some privacy-sensitive applications. In this paper, a secret-sharing scheme is proposed to reduce the size of the blockchain transactions. Each transaction block is divided into t parts, and each part is 1/t the size of the transaction block. We use the secret-sharing mechanism to share the t parts into n shares. Hence, each node stores not a whole transaction block but a single share of it. The proposed scheme can reduce the storage cost of a blockchain transaction to 1/t of its original size without introducing an additional recovery communication cost; however, robustness to node failure is reduced as a tradeoff. Meanwhile, the proposed scheme is more efficient and secure than other state-of-the-art schemes that aim to reduce blockchain storage for industrial big data.

1. Introduction

Internet commerce has come to rely almost exclusively on financial institutions that act as trusted third parties to handle electronic payments. Although this system is suitable for most transactions, it still has the inherent weaknesses of trust-based models. Blockchain is a paradigm-shifting technology that has emerged over the past decade and is based on peer-to-peer communication technology, network theory, and cryptography [1]. In a traditional centralized database model, the central server stores every transaction record, which can be disastrous if the central server is attacked. In contrast, blockchain technology is a distributed solution that does not depend on trust in a third party. Since every node in the blockchain network keeps a copy of the public ledger, it is possible to audit the transactions locally without referring to a centralized authority. Therefore, blockchain can also be viewed as a distributed ledger system, which eliminates this disadvantage of the traditional centralized database model. However, the storage cost in blockchain is huge [2], which is one of the factors currently limiting the widespread adoption of blockchain technology. Since each node needs to store the public ledger locally, the storage cost of an entire blockchain network grows quadratically.

To resolve the storage issue in blockchain, Dai et al. [3] proposed a low-storage blockchain system that employs network coding theory to divide the transaction data into multiple blocks and then stores the blocks at different nodes. This realizes distributed storage (DS), with the transaction data recovered through network coding (NC). They proposed two kinds of NC-based DS, one with a deterministic rate (NC-DRDS) and the other rateless (NC-RLDS), to deal with a fixed and a variable number of blockchain nodes, respectively. In another approach, Dorri et al. [4] proposed a memory-optimized and flexible blockchain (MOF-BC) that allows the user to summarize or remove part of the “aged” blockchain transactions. This reduces the ability to perform a public audit, as some of the information is eventually erased. The idea of a distributed storage blockchain (DSB) was first introduced by Raman and Varshney in 2018 [5]. The blockchain transaction block is first encrypted and then stored at different nodes (together with the encryption key) using a secret-sharing scheme. In this way, the storage of a blockchain is reduced significantly. Later, Kim et al. [6] improved DSB by proposing a local secret-sharing (LSS) scheme, which can tolerate a single peer failure.

Secret sharing is used as a basic cryptographic primitive in computer science, including for electronic voting [7], key-aggregate authentication [8], distributed cloud computing [9], secret image sharing [10, 11], and data hiding [12]. Secret sharing was first introduced in 1979 by Shamir [13], based on the Lagrange interpolating polynomial, while Blakley [14] independently took another approach based on hyperplane geometry. In 1983, Mignotte’s scheme [15] and Asmuth-Bloom’s scheme [16] were proposed based on the Chinese remainder theorem. A perfect secret-sharing scheme [13] has two properties: (1) any $t$ or more shares can recover the secret; (2) any $t-1$ or fewer shares reveal no information about the secret. In such a scheme, sharing another secret requires the dealer to redistribute every participant’s secret share. Later, several schemes were presented for multiple secret sharing. In a multisecret sharing scheme, every user only needs to keep one share, and many secrets can be shared independently without updating the share.

Most multisecret sharing schemes require the disclosure of large amounts of public information. Chien et al. [17] proposed a multisecret scheme based on systematic block codes, while Yang et al. [18] improved the amount of public information required by Chien et al. Further, Yang’s scheme is based on Shamir’s secret-sharing scheme such that fewer than the threshold number of pieces do not leak any information. Dehkordi and Mashhadi’s scheme [19] focused on improving the efficiency of computations involved in share creation and secret reconstruction rather than space efficiency. Our multisecret sharing is based on recursion, in which a secret is first divided into pieces and then the pieces are encoded one by one in such a manner that the shares of the already encoded pieces are reused to create new shares for the next piece.

In this paper, we improve existing work [3, 5, 6] by proposing a low-storage scheme with multisecret sharing based on polynomial interpolation. Our idea is different from Raman and Varshney [5] and Kim et al. [6] as we do not encrypt the blockchain transaction block. Instead, our proposed scheme divides the blockchain transaction block directly into multiple shares and stores them in different nodes. The advantages of our work can be summarized as follows.
(1) Efficiency: the size of each share is on the order of |S|/t; that is, the storage space and communication cost are reduced effectively.
(2) Robustness: no side information needs to be stored with the shares, and no public information needs to be disclosed.

2. Blockchain and Storage Issues

Blockchain is a distributed ledger system that requires each participating node to store a copy of the ledger of transaction records. Every time a transaction block is created, it is first verified by the neighbouring nodes and then goes through the consensus (mining) process, in which nodes compete to solve a difficult cryptographic puzzle. The node that successfully solves this puzzle is rewarded with some incentives. The transaction block is then added to the ledger, wherein each block is linked to the previous block through a chain of hash values. This data structure suffers from a scalability issue, as the storage space required to keep the entire ledger grows quadratically when the number of blockchain nodes increases.
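The hash chaining described above can be illustrated with a minimal Python sketch. The block layout (`index`, `prev_hash`, `transactions`), the JSON encoding, and the helper names are assumptions made for illustration only; real systems use their own binary formats and add the consensus step, which is omitted here.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 of a block's canonical JSON encoding (illustrative format)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Append a block that commits to the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})


def is_consistent(chain: list) -> bool:
    """Check that every block still points at the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["alice->bob:5"])
append_block(chain, ["bob->carol:2"])
append_block(chain, ["carol->dave:1"])
print(is_consistent(chain))  # True

# Altering an old block breaks the link stored in its successor.
chain[1]["transactions"] = ["bob->mallory:2"]
print(is_consistent(chain))  # False
```

Because every node keeps the full `chain` in this model, the per-node storage grows with every appended block, which is the cost the rest of the paper sets out to reduce.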

3. Overview of Shamir’s Secret-Sharing Scheme

3.1. Shamir’s $(t, n)$-Threshold Secret Sharing

In 1979, Shamir developed a $(t, n)$-threshold secret-sharing scheme [13] based on polynomial interpolation, wherein a univariate polynomial $y = f(x)$ of degree $t-1$ is uniquely defined by $t$ points $(x_i, y_i)$ with distinct $x_i$, for $i = 1, \ldots, t$. The scheme can decompose one secret into $n$ shares, with $t$ shares required to recover the original secret, where $t \le n$. However, the secret cannot be recovered with fewer than $t$ shadows. It consists of the following two phases.

3.2. Shadow Distribution Phase

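The trusted dealer starts with a secret integer, $s$, that is to be distributed among $n$ users. Thus, the dealer runs the following:
(1) Choose a prime $p > \max(s, n)$.
(2) Randomly select $t-1$ independent coefficients $a_1, \ldots, a_{t-1} \in \mathbb{Z}_p$, to constitute a random polynomial $f(x)$ with degree $t-1$ over $\mathbb{Z}_p$:
$$f(x) = s + a_1 x + a_2 x^2 + \cdots + a_{t-1} x^{t-1} \bmod p.$$
(3) Choose $n$ distinct nonzero elements of $\mathbb{Z}_p$, denoted as $x_i$, for $i = 1, \ldots, n$.
(4) Compute $y_i = f(x_i)$ and securely transfer the share $y_i$ to user $P_i$, along with the public index $x_i$.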

3.3. Secret Reconstruction Phase

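Assume that $t$ users, $P_1, \ldots, P_t$, pool their shares to compute the secret $s$. Their shadows provide $t$ distinct points $(x_i, y_i)$, $i = 1, \ldots, t$, which allow the computation of the coefficients of $f(x)$ by Lagrange interpolation. The secret, $s = f(0)$, can be expressed as
$$s = \sum_{i=1}^{t} c_i y_i \bmod p,$$
where $c_i = \prod_{1 \le j \le t,\, j \ne i} \frac{x_j}{x_j - x_i} \bmod p$, for $i = 1, \ldots, t$.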
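The two phases can be sketched in a few lines of Python. This is a minimal illustration rather than a hardened implementation: the function names, the choice of public indexes $x_i = i$, and the Mersenne prime $2^{61} - 1$ are assumptions made for the example, and the modular inverse via `pow(den, -1, p)` requires Python 3.8 or later.

```python
import secrets


def split_secret(secret: int, t: int, n: int, p: int) -> list:
    """Shadow distribution: return n shares (x_i, y_i); any t of them recover the secret."""
    assert 0 <= secret < p and 1 <= t <= n < p
    # f(x) = secret + a_1 x + ... + a_{t-1} x^{t-1} over Z_p, with random a_i.
    coeffs = [secret] + [secrets.randbelow(p) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):  # public indexes x_i = 1..n (an assumption)
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of f(x) mod p
            y = (y * x + c) % p
        shares.append((x, y))
    return shares


def recover_secret(shares: list, p: int) -> int:
    """Secret reconstruction: Lagrange interpolation evaluated at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * xj % p
                den = den * (xj - xi) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret


P = 2**61 - 1  # a prime larger than the secret and n
shares = split_secret(123456789, t=3, n=6, p=P)
print(recover_secret(shares[:3], P))  # 123456789
print(recover_secret([shares[0], shares[2], shares[5]], P))  # 123456789
```

Any three of the six shares give the same result, while passing only two shares to `recover_secret` interpolates a different, lower-degree polynomial and returns an unrelated value except with negligible probability.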

Figure 1 shows an example of Shamir’s (3, 6) secret sharing, which can be adopted by blockchain. Assume that the transaction block is divided into three equal-length packets, which are expressed as integers $a_0$, $a_1$, and $a_2$. We use them as the coefficients to construct a 2-degree polynomial, which is expressed as $f(x) = a_0 + a_1 x + a_2 x^2 \bmod p$. The shadows $y_i = f(x_i)$ are then generated and distributed to the blockchain nodes with public indexes $x_i$, $i = 1, \ldots, 6$.
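As a small worked instance of using the packets as polynomial coefficients, consider the following numbers. They are illustrative assumptions and not the values shown in Figure 1: the prime is $p = 17$, the three packets are $(a_0, a_1, a_2) = (3, 5, 2)$, and the six nodes use the indexes $1, \ldots, 6$. The shadows held by nodes 1, 2, and 3 are enough to solve for all three packets.

```latex
\begin{aligned}
f(x) &= 3 + 5x + 2x^2 \bmod 17,\\
\bigl(f(1), \ldots, f(6)\bigr) &= (10,\ 4,\ 2,\ 4,\ 10,\ 3),\\
2a_2 &\equiv f(3) - 2f(2) + f(1) = 4 \;\Rightarrow\; a_2 = 2,\\
a_1 &\equiv f(2) - f(1) - 3a_2 = -12 \equiv 5,\\
a_0 &\equiv f(1) - a_1 - a_2 = 3 \pmod{17}.
\end{aligned}
```

Each node stores a single value in $\mathbb{Z}_{17}$, the same size as one packet, yet any three nodes together determine the whole block.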

4. Blockchain Transactions with Secret-Sharing Scheme

In this section, we describe our proposal of using a secret-sharing scheme to reduce the storage cost of blockchain. In our sharing scheme, each block is stored in a distributed manner among all nodes. Our idea, inspired by Parakh and Kak [20], is applied to a transaction block $B$ as follows. The flowchart of the sharing process and the reconstruction process is shown in Figure 2.

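Firstly, we divide the block $B$ into $t-1$ pieces of size $|B|/(t-1)$, denoted as $s_1, s_2, \ldots, s_{t-1}$, such that their concatenation $s_1 \| s_2 \| \cdots \| s_{t-1} = B$. All the arithmetic performed is finite field arithmetic on $\mathbb{Z}_p$, where $p$ is larger than every $s_i$ and $n$. Then, we share them recursively as follows. The first step is to run Shamir’s (2, 2) secret sharing for the piece $s_1$. That is, generate randomly a 1-degree polynomial $f_1(x) = s_1 + a_1 x \bmod p$ whose constant term is $s_1$, and compute $y_{1,1} = f_1(1)$ and $y_{1,2} = f_1(2)$ to obtain two shares for $s_1$. Next, generate a 2-degree polynomial $f_2(x) = s_2 + y_{1,1} x + y_{1,2} x^2 \bmod p$ whose constant term is the next piece $s_2$, and the coefficients are the previous two shares. By induction, assume that we have obtained $i$ shares of the piece $s_{i-1}$, denoted as $y_{i-1,1}, \ldots, y_{i-1,i}$; we can generate an $i$-degree sharing polynomial $f_i(x)$ for the next piece $s_i$ by using $y_{i-1,j}$ as the $j$-th coefficient term for $j = 1, \ldots, i$. At the last step, we generate a $(t-1)$-degree sharing polynomial:
$$f_{t-1}(x) = s_{t-1} + y_{t-2,1} x + y_{t-2,2} x^2 + \cdots + y_{t-2,t-1} x^{t-1} \bmod p,$$
and the final shares are
$$y_j = f_{t-1}(x_j), \quad j = 1, \ldots, n,$$
where $x_1, \ldots, x_n$ are public indexes of nodes. The sharing scheme is shown in Algorithm 1.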

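Algorithm 1
Given the block $B$ and public indexes $x_1, \ldots, x_n$
 Divide $B$ into $t-1$ equal length pieces $s_1, \ldots, s_{t-1}$
 Select a large prime $p > \max(s_1, \ldots, s_{t-1}, n)$
 Select a random number $a_1 \in \mathbb{Z}_p$, and construct a 1-degree polynomial $f_1(x) = s_1 + a_1 x \bmod p$
 Compute $y_{1,1} = f_1(1)$ and $y_{1,2} = f_1(2)$
 for $i = 2$ to $t-2$ do
  Construct $i$-degree polynomial $f_i(x) = s_i + \sum_{j=1}^{i} y_{i-1,j} x^j \bmod p$
  for $j = 1$ to $i+1$ do
   Compute $y_{i,j} = f_i(j)$
  end for
 end for
 Construct $(t-1)$-degree polynomial $f_{t-1}(x) = s_{t-1} + \sum_{j=1}^{t-1} y_{t-2,j} x^j \bmod p$
 for $j = 1$ to $n$ do
  Compute $y_j = f_{t-1}(x_j)$
 end for
 Distribute the share $y_j$ to the corresponding node with public index $x_j$, for $j = 1, \ldots, n$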
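The recursive sharing procedure can be sketched in Python as follows. This is an illustrative reading of the scheme under stated assumptions: the block is cut into $t-1$ pieces, intermediate shares are taken at $x = 1, 2, \ldots$, the public node indexes default to $1, \ldots, n$, and the prime $2^{61} - 1$ is chosen only because it exceeds the small demo pieces. The helper names `split_block`, `poly_eval`, and `share_block` are hypothetical.

```python
import secrets


def poly_eval(coeffs: list, x: int, p: int) -> int:
    """Evaluate a polynomial (lowest-degree coefficient first) at x over Z_p."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % p
    return y


def split_block(block: bytes, t: int):
    """Cut the block into t-1 equal-length pieces, returned as integers.

    The block is zero-padded on the right; a real system would also record
    the original length so the padding can be removed unambiguously.
    """
    k = t - 1
    size = -(-len(block) // k)  # ceiling division
    padded = block.ljust(size * k, b"\x00")
    pieces = [int.from_bytes(padded[i * size:(i + 1) * size], "big") for i in range(k)]
    return pieces, size


def share_block(pieces: list, n: int, p: int, indexes=None) -> list:
    """Recursively share t-1 pieces into n shares with threshold t = len(pieces) + 1."""
    t = len(pieces) + 1
    assert all(0 <= s < p for s in pieces) and t <= n < p
    xs = indexes or list(range(1, n + 1))
    # Step 1: (2, 2) Shamir sharing of the first piece, f_1(x) = s_1 + a_1 x.
    coeffs = [pieces[0], secrets.randbelow(p)]
    # Step i: the i shares of s_{i-1} become the coefficients of f_i, whose constant term is s_i.
    for i in range(2, t):
        prev_shares = [poly_eval(coeffs, j, p) for j in range(1, i + 1)]
        coeffs = [pieces[i - 1]] + prev_shares
    # Final step: evaluate the (t-1)-degree polynomial at the public node indexes.
    return [(x, poly_eval(coeffs, x, p)) for x in xs]


P = 2**61 - 1
block = b"tx-data-16-bytes"
pieces, size = split_block(block, t=5)  # four 4-byte pieces
shares = share_block(pieces, n=7, p=P)
print(len(shares), size)  # 7 4
```

Only the final shares are kept; the intermediate shares are discarded after they have been absorbed as coefficients, so each node stores a single field element.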

The reconstruction is the inverse process and is carried out in a last-in, first-out manner. Any $t$ of the shares can interpolate the $(t-1)$-degree polynomial $f_{t-1}(x)$ with the constant term $s_{t-1}$. Then, interpolate the $(t-2)$-degree polynomial $f_{t-2}(x)$ with the constant term $s_{t-2}$, by taking the $j$-th coefficient of $f_{t-1}(x)$ as the point at $x = j$, for $j = 1, \ldots, t-1$. The recursion repeats until $s_1$ is obtained. Then, the whole block is given by the concatenation $B = s_1 \| s_2 \| \cdots \| s_{t-1}$. Algorithm 2 shows how to reconstruct the shared transaction block.

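Algorithm 2
Given any $t$ shares $y_{j_1}, \ldots, y_{j_t}$
 Interpolate $t$ points, $(x_{j_k}, y_{j_k})$ for $k = 1, \ldots, t$, to generate $(t-1)$-degree polynomial $f_{t-1}(x)$
 Compute the constant term $s_{t-1}$ and the coefficients $y_{t-2,1}, \ldots, y_{t-2,t-1}$
for $i = t-2$ to 1 do
 Interpolate $i+1$ points $(j, y_{i,j})$, $j = 1, \ldots, i+1$, to generate $i$-degree polynomial $f_i(x)$
 Compute the constant term $s_i$ and, if $i > 1$, the coefficients $y_{i-1,1}, \ldots, y_{i-1,i}$
end for
return $B = s_1 \| s_2 \| \cdots \| s_{t-1}$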
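A minimal Python sketch of this backward reconstruction is given below. To keep the example self-contained it also includes a compact sharing routine; both functions assume that the block has $t-1$ pieces, that intermediate shares were taken at $x = 1, 2, \ldots$, and that node indexes are $1, \ldots, n$. The function names and the prime $2^{31} - 1$ are illustrative assumptions.

```python
import secrets


def interpolate(points: list, p: int) -> list:
    """Coefficients (lowest degree first) of the unique polynomial through `points` over Z_p."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1  # running product of (x - x_j) and of (x_i - x_j)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):  # multiply basis by (x - x_j)
                new[d] = (new[d] - c * xj) % p
                new[d + 1] = (new[d + 1] + c) % p
            basis = new
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, -1, p) % p
        for d, c in enumerate(basis):
            result[d] = (result[d] + c * scale) % p
    return result


def evaluate(coeffs: list, x: int, p: int) -> int:
    return sum(c * pow(x, d, p) for d, c in enumerate(coeffs)) % p


def share_pieces(pieces: list, n: int, p: int) -> list:
    """Compact recursive sharing, included only so the demo can produce shares."""
    coeffs = [pieces[0], secrets.randbelow(p)]
    for i in range(2, len(pieces) + 1):
        coeffs = [pieces[i - 1]] + [evaluate(coeffs, j, p) for j in range(1, i + 1)]
    return [(x, evaluate(coeffs, x, p)) for x in range(1, n + 1)]


def reconstruct_pieces(shares: list, p: int) -> list:
    """Recover [s_1, ..., s_{t-1}] from any t shares (x_j, y_j)."""
    coeffs = interpolate(shares, p)  # f_{t-1}; its constant term is s_{t-1}
    pieces = [coeffs[0]]
    while len(coeffs) > 2:
        # The coefficient of x^j in f_i equals f_{i-1}(j), so it is a point of f_{i-1}.
        points = [(j, coeffs[j]) for j in range(1, len(coeffs))]
        coeffs = interpolate(points, p)
        pieces.append(coeffs[0])
    return pieces[::-1]


P = 2**31 - 1
shares = share_pieces([11, 22, 33, 44], n=7, p=P)  # threshold t = 5
print(reconstruct_pieces(shares[2:7], P))  # [11, 22, 33, 44]
```

The pieces come out last-first, which is why the list is reversed before being returned; concatenating them in order then yields the block.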

In Figure 3, a block $B$ is divided into 4 pieces $s_1$, $s_2$, $s_3$, and $s_4$. Assume that $B$ is to be shared between 7 parties such that any 5 of them can reconstruct all the 4 secrets. We then choose a prime $p$ and a random number $a_1$.

Theorem 1. Our proposed scheme divides the original file $B$ into $n$ pieces in such a way that (i) any $t$ packets can recover the original file $B$, but (ii) no group of $t-1$ pieces can do so.

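Proof. Based on the Lagrange interpolation, given $t$ points $(x_1, y_1)$, $(x_2, y_2)$, $\ldots$, $(x_t, y_t)$ with distinct $x_i$’s, there is a unique polynomial $f(x)$ of degree $t-1$ such that $f(x_i) = y_i$, for $i = 1, \ldots, t$, i.e.,
$$f(x) = \sum_{i=1}^{t} y_i \ell_i(x) \bmod p,$$
where $\ell_i(x) = \prod_{1 \le j \le t,\, j \ne i} \frac{x - x_j}{x_i - x_j}$.
Therefore, $s_{t-1} = f_{t-1}(0)$ can be reconstructed correctly. To reconstruct the next piece $s_{t-2}$, a $(t-2)$-degree polynomial $f_{t-2}(x)$ is generated by taking the $j$-th coefficient of $f_{t-1}(x)$ as the point at $x = j$, for $j = 1, \ldots, t-1$. Correctness is still guaranteed by Lagrange interpolation. Then, based on recursion, $s_{t-3}$, $s_{t-4}$, $\ldots$, $s_1$ can be correctly reconstructed one by one.
On the contrary, if only $t-1$ of these packets are available, they do not suffice to compute the $s_i$’s. For each candidate value $s' \in \mathbb{Z}_p$, we can construct a unique polynomial $f'(x)$ of degree $t-1$ such that $f'(0) = s'$ and $f'(x_i) = y_i$ for $i = 1, \ldots, t-1$. These $p$ possible polynomials have equal probability; therefore, the real value of the packets cannot be determined.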
Moreover, we can extend the above proposed scheme to make the reconstructed block verifiable. In general, each transaction block in a blockchain system (e.g., Bitcoin) consists of a block header and a data block. All transactions are included in the data block as leaf nodes of a Merkle tree [21], and the root hash of the Merkle tree is stored in the block header (see Figure 4). Denote the block header and data block of a transaction block, $B$, as $B_h$ and $B_d$, respectively. Now, we perform the above sharing scheme for $B_h$ and $B_d$ independently. After reconstructing $B_h$ and $B_d$, the correctness of $B_d$ can be verified against the root hash value in the block header $B_h$.
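The verification step can be sketched as follows. This is a simplified illustration, not Bitcoin's exact encoding: it uses a single SHA-256 per node (Bitcoin applies SHA-256 twice and has its own serialization), it duplicates the last hash on odd-sized levels, and the function names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> bytes:
    """Root hash of a binary Merkle tree whose leaves are the hashed transactions."""
    if not transactions:
        raise ValueError("a data block needs at least one transaction")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:  # odd number of nodes: pair the last one with itself
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_data_block(header_root: bytes, transactions: list) -> bool:
    """Accept the reconstructed data block only if it matches the root in the header."""
    return merkle_root(transactions) == header_root


txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
root = merkle_root(txs)  # stored in the block header when the block is created

print(verify_data_block(root, txs))  # True
print(verify_data_block(root, [b"alice->bob:5", b"bob->mallory:2", b"carol->dave:1"]))  # False
```

In the setting described above, `root` would come from the reconstructed header and `txs` from the reconstructed data block, so a wrong or corrupted share used during reconstruction is detected by a root mismatch.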

5. Practical Implementation Issues

5.1. Storage Cost

Assuming that the block header is stored in and the data block is stored in , then the blockchain’s storage cost per transaction per peer is

The storage cost of the DSB proposed by Raman and Varshney [5] is

This is effectively smaller compared to the traditional blockchain. Kim et al. [6] proposed the LSS scheme to further reduce the storage cost to

The comparison of storage cost for different schemes is detailed in Table 1.

Compared to the traditional blockchain, our proposed scheme can effectively reduce the storage cost by , so the storage cost becomes . From Table 1, it is clear that our proposed scheme is more efficient than other prior work including Kim et al.’s [6]. This can effectively reduce the deployment and maintenance cost in blockchain.

5.2. Recovery Communication Cost

Our proposed scheme can effectively reduce the storage cost in blockchain communication, but it also introduces some additional communication cost. In traditional blockchain, each node keeps a public ledger locally. If any node failure happens, the failed node can recover the ledger by getting it from other peers, so the communication cost is proportional to the storage cost. The minimum communication cost is essentially

When there is a node failure, our proposed scheme requires secret shares to recover the transaction block, so the recovery communication cost is dependent on the number of shares and proportional to the storage cost:

Figure 5 shows the tradeoff between storage and recovery communication cost for a specific case. It is clearly shown that our proposed scheme can achieve lower storage cost but does not increase the recovery communication cost. This is a huge advantage compared to original blockchain and the state-of-the-art work by Kim et al.

5.3. Robustness to Peer Failures

In regard to the robustness to peer failures, traditional blockchain can tolerate $n-1$ node failures, at the expense of a large ledger stored in every node. The schemes proposed by Raman and Varshney [5] and Kim et al. [6] can only tolerate and failures, respectively, which is significantly fewer than the traditional blockchain. Our proposed scheme can tolerate $n-t$ node failures, which is better than these two state-of-the-art schemes [5, 6]. Figure 6 shows the comparison of robustness where .

Another interesting work was presented by Dai et al. [3] recently based on network coding theory. Compared to them, our proposed scheme shares the same storage cost, communication cost, and robustness. However, their scheme can guarantee the recovery of transaction block if the collected packets is no less than . It may leak information to malicious node if the collected packets is less than . On the contrary, our proposed scheme utilizes Shamir’s secret-sharing scheme, which can assure that if the collected packets is less than , no additional information is being leaked. Hence, it is more advantageous compared to the scheme proposed by Dai et al.

6. Conclusions

Traditional blockchain requires huge storage space as the number of nodes increases, which limits the scalability of this emerging technology. In this article, we proposed a low-storage scheme based on secret sharing to reduce the storage cost of blockchain. The proposed scheme is able to reduce the storage cost of traditional blockchain to $1/t$ of its original size without introducing additional communication cost. It is also more efficient than the other recent studies [5, 6] that attempted to solve the same problem. Although the scheme proposed by Dai et al. [3] is as efficient as our proposed scheme, it leaks partial information and is considered less secure. The proposed scheme requires both $t$ and $n$ to be fixed, so it is more suitable for private and consortium blockchain systems where the number of nodes is predetermined. This can be a limitation, as the number of nodes in a public blockchain changes dynamically. In the future, we plan to extend this scheme to support public blockchains.

Data Availability

The relevant analysis data used to support the findings of this study are included in the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partly supported by the National Natural Science Foundation of China (Grant no. 11804114), the Natural Science Foundation of Fujian Province, China (Grant nos. 2017J01761 and 2018J01537), the Science and Technology project of Fujian Province (Grant no. 2019H0021), and the Science and Technology project of Xiamen Municipal (Grant no. 3502Z20173028).