Abstract

Blockchain has become popular in the Internet of Things (IoT) field because its tamper resistance and decentralization suit IoT deployments well. The number of IoT devices and leaders (who own IoT devices) is increasing exponentially, and thus, data privacy and security are significant concerns. In this paper, we summarize some issues of the BeeKeeper system, a blockchain-based IoT system proposed by Zhou et al., and then present an improved solution for decentralized data aggregation (DDA) on IoT. Firstly, we formally state the security requirements of DDA. Secondly, we propose our basic DDA system, which uses secret sharing to improve efficiency and smart contracts as the computing processors. Moreover, the proposed full-fledged system achieves data sharing (e.g., a leader accessing data of others’ devices), which is realized by using local differential privacy and cryptographic primitives such as token-based encryption. Finally, to show feasibility, we provide implementations and experiments for the DDA systems.

1. Introduction

The Internet of Things (IoT) extends the Internet to physical devices that jointly realize a service or achieve a target functionality. For example, these devices (embedded systems or sensors) communicate and interact with one another over the Internet and can also be remotely monitored and controlled. IoT has enabled numerous applications in the consumer, commercial, industrial, and infrastructure sectors. For education purposes, Raspberry Pi serves as a cheap and fundamental platform for developing IoT applications. Typically, IoT systems collect data for future use (e.g., analysis of consumer behavior), and such data are stored in centralized cloud storage. Hence, users are forced to trust that the centralized cloud will protect their unencrypted data.

The issues of centralized cloud-based IoT systems can be summarized as follows.
(i) Centralization: the cloud is in charge of providing and maintaining the service. If it crashes, the centralized system no longer works [1]. For example, denial of service attacks may target the single cloud to terminate the functionality.
(ii) Trust: the users outsource their data to the cloud [2] and hold only limited control over their data. They have to fully trust that the cloud does not modify or delete the stored data.
(iii) Scalability: more efficient IoT devices and higher-bandwidth communication produce big data streams [3]. Therefore, the centralized cloud must be correspondingly efficient or scalable to handle those data.
(iv) Data security: the stored data are unencrypted, so sensitive data are directly revealed to the cloud. Encryption may be a solution, but processing over encrypted data remains an issue. In the past, plenty of clever notions [4–6] have been proposed to address this.

Blockchain, introduced by Nakamoto [7], is the core technology of Bitcoin. It can be regarded as a distributed ledger consisting of blocks chained in order. The potential advantages of blockchain for improving IoT systems are as follows:
(i) Decentralization and tamper resistance [8]: the decentralization of blockchain frees our IoT system from setting up centralized nodes. Anyone (a.k.a. a node or participant in the protocol) can join or leave the protocol execution at any time without permission from an authority. In general, blockchain is an effective method for maintaining a public, immutable, and ordered ledger of records (for instance, in Bitcoin, these records are simply transactions); that is, records can be added to the end of the ledger at any time (but only to the end of it). Additionally, it is guaranteed that previously added records cannot be removed or reordered and that all honest nodes have a consistent view of the ledger, which is referred to as consistency.
(ii) Anonymity: since data exchange between the nodes of the blockchain follows a fixed and predictable algorithm, the blockchain network is trustless and can exchange data based on addresses instead of identities.
(iii) Smart contracts (SC): SC are a powerful application of blockchain, in which SC programs are deployed on the blockchain and can be triggered or executed by events. SC can also make our IoT devices smarter, since their behavior can be specified by SC.

In a sense, blockchain is a potential solution to the above-mentioned issues in the IoT scenario, including centralization, trust, and scalability, since it provides decentralized services, distributed trust, and a permissionless property that supports scalability.

1.1. Related Work

Very recently, Zhou et al. [9] proposed a blockchain-based threshold IoT system named BeeKeeper. This system applies secret sharing to encrypt data. In this system, there are three entities: a leader, the leader’s devices, and a certain number of servers, all of which communicate only with the blockchain. The threshold is denoted by $t$-out-of-$n$ ($t \le n$), which means that there are $n$ servers in total and any $t$ of them suffice to complete the task. The procedure of the system is as follows. Devices collect data and then submit them to the blockchain. Servers retrieve them from the blockchain, run some operations, and return the processed data to the blockchain. Finally, the leader can submit his/her request to the blockchain and obtain the (aggregation) result as long as at least $t$ servers complete the processing over the data. The same authors [10] have also proposed an improvement named BeeKeeper 2.0, which offers some additional functionalities, but the overall framework is not significantly changed. In addition to data aggregation, blockchain can also be applied to key management in IoT [11].

1.2. Contributions

The motivation of this paper comes from a few concerns observed in BeeKeeper, briefly stated as follows. (1) At least $t$ servers must be kept operating to maintain functionality. (2) Blockchain serves only as the communication platform and database. (3) The underlying secret sharing is realized by using polynomials. Our main results are conceptually simple solutions: we use smart contracts (supporting data aggregation) to overcome (1) and (2) and rely on additive secret sharing to improve efficiency with respect to (3). We propose a warm-up system which is composed of the above building blocks and achieves decentralized data aggregation on IoT. However, in the warm-up system, all devices belong to the leader, which means that he/she cannot obtain results collected by other leaders’ devices. Supporting such access is a useful extension, but it raises privacy issues for accessing others’ data. Hence, we provide a full-fledged system that achieves this by using local differential privacy and token-based encryption. Accordingly, we also state the security definitions for our systems and prove their security under these definitions. Table 1 briefly summarizes the differences from the previous works. The proposed DDA/DDA+ does not need to set up servers to handle complex calculations. The aggregation of data is left to the record nodes of DDA/DDA+ or the owner of the smart contract (i.e., the leader), which operate with a simple logic design. For the verification function, Zhou et al. [9, 10] require additional verification for all participants. In our DDA+ scheme, however, only the devices need to verify the legitimacy of the leader, for which we provide a token tk. At a high level, DDA+ adds data privacy protection: we use local differential privacy to protect sensitive device data during collection. In addition, our smart contract provides multiple functions rather than only processing transfer operations. Finally, we complete the security proof of the scheme.

1.2.1. Comparisons to the Previous Version [12]

In the full version, we go through the main ideas of protocol design and then give a solid description of security. In addition, we present the full-fledged construction from local differential privacy and prove the security in a formal way. Finally, experiments are provided to show its effectiveness.

1.3. Organization

The remainder of this paper is organized as follows. First of all, Section 2 briefly describes the basic technologies of the decentralized data aggregation (DDA) system. Then, in Section 3, we present the system model and security requirements of this system. Thereafter, the warm-up and full-fledged DDA systems are described in Sections 4 and 5, respectively. Implementations and analysis of the systems are given in Section 6. Finally, we provide the conclusions of this paper in Section 7.

2. Preliminaries

In this section, we briefly introduce some preliminaries. We use $\lambda$ to denote the security parameter and PPT to denote probabilistic polynomial-time adversaries. All assumptions are parameterized by $\lambda$. A function $\mathrm{negl}(\lambda)$ is negligible if, for any polynomial function $\mathrm{poly}$ and all sufficiently large $\lambda$, $\mathrm{negl}(\lambda) < 1/\mathrm{poly}(\lambda)$ holds.

2.1. Secret Sharing

Secret sharing [13, 14] is a notion that divides a secret into shares. The secret can be recovered by combining a certain number of shares. Here, secret sharing for $n$ participants allows the secret to be recovered only when all $n$ shares are combined, so-called $(n, n)$ secret sharing. For our specific functionality, we state the definition as follows.

Additive secret sharing: a secret message secret is split into $n$ shares ($s_1$, $s_2$, …, $s_n$) in the secret sharing scheme. This scheme consists of five algorithms.
(i) ShareGen($1^\lambda$, $n$, secret): given the security parameter $\lambda$, the number of devices $n$, and secret as input, it outputs $s_1$, …, $s_n$.
(ii) SecretRec($s_1$, …, $s_n$): taking all the shares $s_1$, …, $s_n$ as input, this algorithm outputs secret.
(iii) AddOnShare($x_i$, $s_i$): it takes data $x_i$ and share $s_i$ as input and generates the masked value $c_i$.
(iv) PartialRec($c_1$, …, $c_n$): this algorithm takes $c_1$, …, $c_n$ as input and outputs partrec as the result of partial recovery.
(v) Extract(partrec, secret): it takes partrec and secret as input and returns sumx.
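The five algorithms can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that shares are integers modulo a public modulus `Q` (a hypothetical choice), that the shares sum to the secret, and that AddOnShare masks a reading by modular addition. The function names simply mirror the algorithm names above.

```python
import secrets

# Assumption: all values live in the integers modulo a public modulus Q.
Q = 2**61 - 1


def share_gen(n, secret, q=Q):
    """ShareGen: split `secret` into n additive shares summing to it mod q."""
    shares = [secrets.randbelow(q) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % q)
    return shares


def secret_rec(shares, q=Q):
    """SecretRec: all n shares together reproduce the secret."""
    return sum(shares) % q


def add_on_share(x, share, q=Q):
    """AddOnShare: mask one device reading with that device's share."""
    return (x + share) % q


def partial_rec(masked, q=Q):
    """PartialRec: add up the masked readings (no secret needed)."""
    return sum(masked) % q


def extract(partrec, secret, q=Q):
    """Extract: remove the secret to obtain the sum of the readings."""
    return (partrec - secret) % q
```

For example, with five devices holding the readings 3, 1, 4, 1, 5, masking each reading with its share and calling `extract(partial_rec(masked), secret)` returns 14 as long as the true sum is smaller than `Q`.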

We give the formal security definition of secret sharing as follows.

Definition 1. Let $n$ be the number of participants, which implies that the secret can be recovered only by collecting all $n$ shares. Accordingly, any $n-1$ shares reveal no information about the secret: an (unbounded) adversary holding any $n-1$ distinct shares can do no better than guess the secret uniformly at random from its domain.

The construction can be easily realized by using additive secret sharing.

2.2. Smart Contract-Based Aggregation Service

The smart contract-based aggregation service mainly achieves identification of devices and data aggregation. Note that smart contracts do not provide any confidentiality. The SC provides the specified functionality to the specified authorized user and provides data signature and verification, since the underlying blockchain works with signatures. In general, a smart contract-based aggregation service is composed of the following algorithms.
(i) InitContract: given only the security parameter as input, it creates and deploys the personal smart contract on the blockchain and outputs the corresponding smart contract address (SC.Address).
(ii) AddDevice: it takes as inputs the smart contract address SC.Address, the personal identity of the system leader leaderId, the device’s identity id, and the list of all device identities. It outputs the flag flagid of every id (flag indicates that the device holds permission to execute certain functions in SC) and the updated identity list.
(iii) Upload: it takes as inputs the smart contract address SC.Address, the device’s identity id, the corresponding flag flag, the collected data m, and the database DB of all collected data. It outputs the resulting database DB0.
(iv) Aggregation: it takes as inputs the smart contract address SC.Address, the resulting database DB0, and the personal identity of the system leader leaderId. It returns the aggregation result.
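The access-control logic of these algorithms can be sketched with an in-memory Python stand-in. This is only an illustration and not contract code: the class `AggregationContract` and its method names are hypothetical, and the `caller` argument stands in for the transaction sender that a real blockchain would authenticate through signatures.

```python
class AggregationContract:
    """Hypothetical in-memory model of the aggregation service."""

    def __init__(self, leader_id):
        # InitContract: the deployer becomes the leader of this contract.
        self.leader_id = leader_id
        self.flags = {}  # device id -> permission flag
        self.db = {}     # device id -> latest uploaded value

    def add_device(self, caller, device_id):
        """AddDevice: only the leader may grant a device its flag."""
        if caller != self.leader_id:
            raise PermissionError("only the leader may add devices")
        self.flags[device_id] = True
        return self.flags[device_id]

    def upload(self, device_id, value):
        """Upload: only a flagged device may write to the database."""
        if not self.flags.get(device_id, False):
            raise PermissionError("device is not authorized")
        self.db[device_id] = value

    def aggregation(self, caller):
        """Aggregation: only the leader may read the aggregate."""
        if caller != self.leader_id:
            raise PermissionError("only the leader may aggregate")
        return sum(self.db.values())
```

The three permission checks correspond to the three properties that the service is expected to satisfy: an outsider cannot add devices, an unflagged device cannot upload, and an outsider cannot trigger aggregation.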

The smart contract-based aggregation service system holds the following security properties.

Definition 2. The probability that a polynomial-time adversary without leaderId and the corresponding conditions can execute the AddDevice algorithm for a device id in this system is negligible.

Definition 3. The probability that a polynomial-time adversary, acting as a device without the corresponding flag in this smart contract-based aggregation service system, can upload data is negligible.

Definition 4. The probability that a polynomial-time adversary without leaderId and the corresponding conditions can execute the Aggregation algorithm in this system is negligible.

3. System Model

3.1. Syntax of Decentralized Data Aggregation

In this section, we aim for a concrete notion, called decentralized data aggregation (DDA). We formally present the system model below. In a nutshell, DDA consists of three phases, denoted by DDA = (Initialization, DataCollection, and DataReconstruction). Some notations are listed in Table 2. The DDA syntax is formally described as follows. (i)Initialization (, , , L.EOA, .EOA, , ): the algorithm takes as inputs the security parameter , , device quantity , leader’s public address L.EOA, public address of the -th device .EOA, leader’s private key , device address set and outputs , SC.Address, , (ii)Data Collection (SC.Address, L.EOA, , , ): the algorithm takes as inputs contract address SC.Address, leader’s public address L.EOA, the raw data for the -th device , device stored database DB, the private key of the -th device and outputs (iii)Data Reconstruction (): the algorithm takes as inputs ciphertext , leader’s private key , and outputs

3.2. Security Requirements

In our security model, privacy-preserving aggregation is our main concern. Since the smart contract is completely public, we assume that the contract code is secure and that there are no inside threats. To clarify our security definitions, we assume that there are only three parties in the system: the leader, the smart contract, and the devices belonging to the leader’s IoT. We state the following security definitions for all probabilistic polynomial-time (PPT) adversaries $\mathcal{A}$, whose computation is polynomially bounded in $\lambda$.

(1) Single Device Security (SDS). We assume an adversary $\mathcal{A}$ who knows the probability distribution of the plaintext space of one of the devices but does not have any knowledge of the key. $\mathcal{A}$ can eavesdrop on the communication between the leader and the IoT device and thus observe the ciphertext of an IoT device. The security experiment is as follows:
(i) Setup: the challenger randomly selects the secret and obtains the sequence of shares, and then $\mathcal{A}$ outputs a pair of messages $(m_0, m_1)$.
(ii) Challenge phase: the challenger randomly selects a share and a bit $b$, and then the ciphertext of $m_b$ under that share is computed and given to $\mathcal{A}$. We refer to it as the challenge ciphertext. Note that the challenge consists of a single ciphertext generated by a single device operation.
(iii) Guess: $\mathcal{A}$ outputs a bit $b'$.
We define $\mathcal{A}$’s advantage as $\mathrm{Adv}^{\mathrm{SDS}}_{\mathcal{A}}(\lambda) = |\Pr[b' = b] - 1/2|$.

Definition 5 (SDS). We say that DDA meets Single Device Security if for all probabilistic polynomial-time adversaries $\mathcal{A}$, $\mathrm{Adv}^{\mathrm{SDS}}_{\mathcal{A}}(\lambda)$ is negligible.

(2) Threshold Device Security (NDS). We assume an adversary $\mathcal{A}$ who is given $n-1$ shares but cannot infer the secret. Collecting the knowledge of $n-1$ pieces reveals no significant information about the secret. The security experiment is as follows:
(i) Setup: the challenger randomly selects the secret and obtains the sequence of shares.
(ii) Challenge phase: the challenger randomly selects $n-1$ of the shares and sends them to $\mathcal{A}$.
(iii) Guess: $\mathcal{A}$ outputs a guess of the secret.
We define $\mathcal{A}$’s advantage as the probability that its guess equals the secret.

Definition 6 (NDS). We say that DDA meets Threshold Device Security if for all probabilistic polynomial-time adversaries $\mathcal{A}$, this advantage is negligible.

(3) Smart Contract Security (SCS). We assume an adversary $\mathcal{A}$ (A.EOA) who tries to modify the uploaded data. To enter SC, A.EOA must be authorized by L.EOA through AddDevice; without this authorization, $\mathcal{A}$ does not obtain a flag flagid from L.EOA. The security experiment is as follows:
(i) Setup: the challenger deploys the smart contract and returns SC.Address to $\mathcal{A}$.
(ii) Challenge phase: the challenger randomly selects identities of devices to run AddDevice.
(iii) Guess: $\mathcal{A}$ outputs flagid.
We define $\mathcal{A}$’s advantage as the probability that the flagid it outputs is valid.

Definition 7 (SCS). We say that DDA meets Smart Contract Security if for all probabilistic polynomial-time adversaries $\mathcal{A}$, this advantage is negligible.

(4) Outlier Security (OS). We assume that the external adversary $\mathcal{A}$ gets the result of the aggregation of all ciphertexts. The security experiment is as follows:
(i) Setup: the challenger randomly selects the secret, and the set of shares is generated by the Initialization algorithm. $\mathcal{A}$ outputs a pair of messages $(m_0, m_1)$.
(ii) Challenge phase: the challenger randomly chooses a bit $b$, generates the ciphertext aggregated over the $n$ devices by running DataCollection, and gives it to $\mathcal{A}$.
(iii) Guess: $\mathcal{A}$ outputs a bit $b'$.
We define $\mathcal{A}$’s advantage as $\mathrm{Adv}^{\mathrm{OS}}_{\mathcal{A}}(\lambda) = |\Pr[b' = b] - 1/2|$.

Definition 8 (OS). We say that DDA meets Outlier Security if for all probabilistic polynomial-time adversaries $\mathcal{A}$, $\mathrm{Adv}^{\mathrm{OS}}_{\mathcal{A}}(\lambda)$ is negligible.

4. Warm-Up: A Basic DDA System

Inspired by the secure decentralized privacy system [15] and the secret sharing [14], we propose the basic decentralized privacy data aggregation on the architecture of smart contracts. In this basic scheme, we mainly use cryptographic primitives such as secret sharing to protect the security of devices’ data. As shown in Figure 1, the system consists of three main entities: leaders, devices that provide data, and smart contracts.

4.1. Construction

The details of the proposed system are elaborated as follows.
(i) Initialization:
(1) The leader uses ShareGen to generate $n$ shares and then deploys one share to each device that he/she controls, so each device holds its own share.
(2) The leader creates SC and then runs the AddDevice program to register his/her own devices.
(ii) DataCollection:
(1) The leader requests data from SC. To protect the privacy of the raw data in SC, each device uses AddOnShare to encrypt its data. SC receives the ciphertext and then updates the database by using Upload.
(2) The leader runs Aggregation. The data are aggregated using the leader-specified algorithm PartialRec.
(iii) DataReconstruction: the leader uses his/her secret value to recover the aggregated raw data by running Extract.
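The correctness of the three phases can be checked with a short derivation. It is a sketch under stated assumptions rather than the authors' exact formulas: we assume that the shares $s_1, \ldots, s_n$ sum to the leader's secret $S_0$ modulo a public modulus $q$, that AddOnShare adds the share to the reading $x_i$, and that PartialRec adds up the masked values $c_i$.

```latex
\begin{align*}
c_i &= x_i + s_i \pmod q
  && \text{(AddOnShare on device } i\text{)} \\
\mathit{partrec} &= \sum_{i=1}^{n} c_i
  = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} s_i
  = \sum_{i=1}^{n} x_i + S_0 \pmod q
  && \text{(PartialRec in SC)} \\
\mathit{sumx} &= \mathit{partrec} - S_0
  = \sum_{i=1}^{n} x_i \pmod q
  && \text{(Extract by the leader)}
\end{align*}
```

Under these assumptions, SC only ever handles masked values, and the sum is exact whenever $\sum_i x_i < q$.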

4.2. Security Analysis

The security proof of the proposed DDA is given in the supplemental materials. Here, we sketch the security with respect to SDS, NDS, SCS, and OS.
(i) (SDS) As noted earlier (Definition 5), each share is a one-time key, so the proposed system is equivalent to a one-time pad. $\mathcal{A}$ does not have the key; in order to distinguish the single device’s ciphertext, $\mathcal{A}$ can trivially succeed with probability 1/2 by outputting a random guess, and it can do no better.
(ii) (NDS) As noted earlier (Definition 6), the shares given to $\mathcal{A}$ reveal no information at all about the secret, so the proposed system is equivalent to secret sharing. From Definition 1, the adversary can only select a value at random from the domain of the secret, and its probability of success is that of a uniform guess.
(iii) (SCS) As noted earlier (Definition 7), an external $\mathcal{A}$ needs to be authorized by the leader (AddDevice) to add its own data to the specified database (Upload). From Definition 2, unless $\mathcal{A}$ can break the security of the smart contract, the probability of success is negligible.
(iv) (OS) As noted earlier (Definition 8), the secret is a one-time key, so the proposed system is equivalent to a one-time pad. $\mathcal{A}$ does not have the key; in order to distinguish the aggregated ciphertexts of the $n$ devices, $\mathcal{A}$ can trivially succeed with probability 1/2 by outputting a random guess, and it can do no better.

5. Full-Fledged DDA+ System

In practical applications, leaders need crowdsourced data because more information enables better, more informed decisions. Consider the case where the data collected by the leader is the total number of smokers in a region. When a new person joins the statistics, the leader can query the count before and after, thereby exposing the private information of the new user (assuming that smoking is sensitive data for individuals). For any such crowdsourcing, privacy preservation mechanisms should be used to reduce and control the privacy risks introduced by the data collection process [16]. We therefore seek a balance between the privacy leakage and the usefulness of the collected data.

Firstly, we consider centralized differential privacy (also called differential privacy in the curator model). As shown in Figure 2, we could let SC or a third party apply a differentially private noise-adding mechanism, such as the Laplace or exponential mechanism [17, 18]. However, such a method cannot be realized here. The specific security considerations are stated as follows:
(i) SC is currently fully open, so the noise addition process would also be exposed.
(ii) Even if part of the data processing is not public, centralized privacy may cause collusion problems.

These privacy issues are well studied in the area of differential privacy. Alternatives such as local differential privacy overcome several issues of the curator model [16].

5.1. Additional Primitives

(1) Local Differential Privacy (LDP). We explicitly describe the LDP setting and its concepts. In the LDP setup, there is a group of leaders, and the $i$-th leader has a private value in a certain domain. These leaders interact with an untrusted aggregator such that the aggregator learns statistical information about the distribution of private values in the leader population, while the information leakage for each individual is bounded. Specifically, each leader uses a randomized algorithm to perturb the private value and sends the perturbed value to the aggregator. The aggregator then processes the collected reports to recover statistical information. The algorithm satisfies the following property.

Definition 9 (LDP) [19, 20]. A randomized function $f$ satisfies $\epsilon$-local differential privacy ($\epsilon$-LDP) if and only if for any two input tuples $t$ and $t'$ and for any possible output $t^*$ of $f$, we have $\Pr[f(t) = t^*] \le e^{\epsilon} \cdot \Pr[f(t') = t^*]$.

(2) Token-Controlled Public Key Encryption. The point of a token-controlled public key encryption scheme is that the sender randomly picks a token from a predefined token space. The sender encrypts the information using the token and the receiver’s public key. In addition, the sender needs to send the token to a “semitrusted” third party. The receiver decrypts using the valid token and the private key. We now formally define a token-controlled public key encryption scheme as follows.

Definition 10 (TCPKE) [21]. A token-controlled public key encryption (TCPKE) scheme consists of the following algorithms:
(i) Key generation: it takes a security parameter as input and outputs a private and public key pair. Note that the public key includes the security parameter, a finite plaintext space, a finite token space, and a ciphertext space.
(ii) Token generation: it takes a security parameter as input and randomly outputs a token tk.
(iii) Encryption: it takes the public key, a token, and a plaintext as input and outputs a ciphertext.
(iv) Decryption: it takes the private key, a token, and a ciphertext as input; this algorithm outputs a plaintext or a special symbol $\perp$ denoting null.

5.2. Details of DDA+

Based on the above security concerns, we modify the model as follows.
(i) LDP [19]: differential privacy is deployed on the local device side, and the other devices respond to the leader with strong $\epsilon$-differential privacy guarantees.
(ii) TCPKE [21]: a cryptographic primitive (the token) is added to ensure the unforgeability of the data requester and to resist external adversaries.

As shown in Figure 3, the proposed full-fledged DDA+ system consists of two parts. (1) The data interaction between the leader and his/her own devices, which is discussed in Section 4. (2) The interaction between the leader and external devices. As a result, the leader collects only randomized answers from each external IoT device (by LDP techniques). We now show the details of data interaction with external devices.

5.2.1. Definition

We slightly modify the definition of the DDA model. Note that for each IoT, we assume a universal for the ease of presentation and analysis. The DDA+ syntax is formally described as follows: (i): the algorithm takes as inputs , leader’s private key , the private address registered by the leader on a platform L.EPK, device address set , and generates (ii): the algorithm takes as inputs SC.Address, the leader’s ciphertext and token , leader’s public address L.EOA, device private key , the raw data for the -th device , device stored database NDB, the device’s privacy budget, and outputs NC(iii)DataReconstruction : the algorithm takes inputs as ciphertext NC, leader’s private key , and outputs NM

5.2.2. Construction

(i) Initialization:
(1) The leader uses ShareGen to generate $n$ shares of the secret S0. Note that these shares are used by devices that are not associated with the leader.
(2) The leader creates SC and then runs the AddDevice program to authorize the external devices.
(3) Each device runs the TCPKE key generation algorithm and generates a private and public key pair.
(4) The leader runs the TCPKE token generation algorithm and randomly chooses a token tk. Then, the leader encrypts the shared secret value under the device’s public key and tk. Finally, the leader releases the resulting ciphertext. Note that tk is transmitted to SC, SC signs tk for the device specified by the leader, and the device verifies it.
(ii) DataCollection:
(1) The device decrypts the ciphertext to obtain its share.
(2) Within the local differential privacy budget $\epsilon$, the device uses the LDP perturbation algorithm to generate the noisy data.
(3) To protect the privacy of the noisy data in SC, the device runs the AddOnShare algorithm. Then, it updates the database NDB where the data are stored.
(4) The leader runs the Aggregation program. The data are aggregated using the leader-specified algorithm PartialRec.
(iii) DataReconstruction: the leader decrypts the aggregated ciphertext using his/her secret value to restore the aggregated noisy data.

The leader can query multiple times for data statistics and mining.

5.2.3. Security Analysis

The extended DDA+ system still satisfies the security of SDS, NDS, SCS, and OS. The proof of the scheme is similar to that of Section 4, and we do not discuss it further.

5.2.4. Privacy Analysis

According to Definition 9, the leader who receives the perturbed tuple $t^*$ cannot distinguish whether the true tuple is $t$ or another tuple $t'$ with high confidence (controlled by the parameter $\epsilon$), regardless of the background information of the leader. This provides plausible deniability to the owner of the data. The attributes of the tuple can be either numerical or categorical. Depending on the private data types of IoT devices, different statistical methods are used for estimation; more details are given in Nguyen et al. [22]. For example, for a single binary attribute, it is sufficient to estimate the distribution of IoT device data using the typical randomized response [23]. Consider the following scenario: sensors (IoT devices) are distributed among patients, each patient has a given disease with some probability, and we want to estimate the expected proportion of patients with the disease. If the aggregator (leader) directly obtained the corresponding data of each user for statistics, the patients’ privacy would be revealed. Thus, each user reports her true answer with probability $p$ and a random answer with probability $1 - p$. Then, we can simply calculate the expected value.

According to Definition 9, -LDP require that .
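To make the randomized response argument concrete, the following derivation works through one common variant. It is a sketch under assumptions that the text does not fix: the "random answer" is taken to be a uniform bit, $\pi$ denotes the true proportion of patients with the disease, and $\hat{f}$ denotes the observed fraction of reports equal to 1. These symbols are introduced here for illustration only.

```latex
\begin{align*}
\Pr[\text{report} = 1] &= p\,\pi + \frac{1 - p}{2}
  \quad\Longrightarrow\quad
  \hat{\pi} = \frac{\hat{f} - (1 - p)/2}{p}, \\[4pt]
\frac{\Pr[\text{report} = 1 \mid \text{true} = 1]}
     {\Pr[\text{report} = 1 \mid \text{true} = 0]}
  &= \frac{p + (1 - p)/2}{(1 - p)/2}
  = \frac{1 + p}{1 - p} \le e^{\epsilon}
  \quad\Longleftrightarrow\quad
  p \le \frac{e^{\epsilon} - 1}{e^{\epsilon} + 1}.
\end{align*}
```

If the device instead flips its bit with probability $1 - p$ (rather than answering uniformly at random), the same argument gives the condition $p/(1 - p) \le e^{\epsilon}$.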

6. Implementations

We discuss the implementations in the aspects of LDP, secret sharing, and smart contracts.

6.1. The LDP Perturbation Data Function

We use Python 2.7 programming language to implement the experiments. Then, we take Raspberry Pi 3 as a data collector (the leader’s device or other device) and they equip a quad-core ARMv7 CPU 1200 MHz and 1 GB RAM. Moreover, we use the Google RAPPOR [24] to realize the LDP. In DDA+ system, the devices run this algorithm to generate the noisy data. Suppose each device only stores a 1-bit binary data, where the probability of the number of 1s in all the devices’ data is . Each device determines a true value (without flipping the 1-bit data) with a probability . After the randomized response, we can obtain the noisy data. For analyzing the deviation from the original and noisy data aggregations, we fix and then set and to get and , respectively. The deviation of pure data and noisy data with different is shown in Figures 4 and 5. By observation over the experimental results, it suffices to use if we accept 10% deviation on 300 devices ( if on more than 500 devices).
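The deviation experiment can be approximated with a few lines of Python. This is a simplified sketch and not the RAPPOR-based code used in the experiments: it assumes the bit-flipping variant of randomized response, in which a device keeps its bit with probability $p = e^{\epsilon}/(1 + e^{\epsilon})$ and flips it otherwise, and all function names are hypothetical.

```python
import math
import random


def keep_probability(epsilon):
    """Largest keep probability allowed by epsilon-LDP for bit flipping."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb(bit, p, rng):
    """Keep the true bit with probability p, otherwise flip it."""
    return bit if rng.random() < p else 1 - bit


def estimate_fraction(reports, p):
    """Unbiased estimate of the true fraction of 1s (requires p != 0.5)."""
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(n_devices, true_fraction, epsilon, seed=0):
    """Return (true fraction, estimated fraction) for one simulated run."""
    rng = random.Random(seed)
    p = keep_probability(epsilon)
    bits = [1 if rng.random() < true_fraction else 0 for _ in range(n_devices)]
    reports = [perturb(b, p, rng) for b in bits]
    return sum(bits) / n_devices, estimate_fraction(reports, p)
```

Calling `simulate(300, 0.3, epsilon)` for several values of `epsilon` and seeds gives a rough feeling for how the deviation between pure and noisy aggregates shrinks as the privacy budget or the number of devices grows.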

6.2. Performance of the Secret Sharing Scheme

For evaluating the performance, we use Python 2.7 with the random module, the numpy module, etc., to implement the experiment, running on an Intel(R) Core(TM) i7-7700 CPU at 3.60 GHz with 8 GB of memory under Windows 10, which also serves as the operating environment of Ethereum. Firstly, the leader uses ShareGen to generate shares and sends them to the devices. The devices use AddOnShare to encrypt the data. Next, whenever the leader wants to request some data, he/she uses PartialRec and Extract to recover the data. The execution time of each algorithm is shown in Figure 6.

6.3. Simulations on Smart Contracts

We run our DDA+ system on the Ethereum blockchain using Solidity 0.6.12 and deploy it on the Ropsten test network on February 17, 2021, when the gas price was approximately 2.1578 Gwei. Note that, at the same time, the average Ethereum gas price on the public network was 183.087 Gwei. In our simulation, we first deploy the main contract, where we use the AddDevice and RemoveDevice transactions to add and delete a device, respectively. If the leader would like to access the data, he/she sends an Aggregation transaction. When devices collect data, they send Upload transactions to the blockchain. As shown in Table 3, every transaction type has a different processing fee. We found that the highest cost is incurred by contract deployment, while the other types of transactions require a lower cost.
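The relation between gas usage and processing fee can be illustrated with a small helper. The two gas prices are the ones quoted above; the gas amount of 21000 is only an illustrative value (the base cost of a plain Ether transfer), not a figure from Table 3, and the function name is hypothetical.

```python
from decimal import Decimal

GWEI_PER_ETHER = Decimal(10) ** 9

# Gas prices quoted in the text (in Gwei).
ROPSTEN_GWEI = Decimal("2.1578")
MAINNET_GWEI = Decimal("183.087")

# Illustrative gas amount only; not a measured value for any DDA+ transaction.
EXAMPLE_GAS = 21000


def fee_in_ether(gas_used, gas_price_gwei):
    """Transaction fee = gas used x gas price, converted from Gwei to Ether."""
    return Decimal(gas_used) * Decimal(gas_price_gwei) / GWEI_PER_ETHER


ropsten_fee = fee_in_ether(EXAMPLE_GAS, ROPSTEN_GWEI)
mainnet_fee = fee_in_ether(EXAMPLE_GAS, MAINNET_GWEI)
```

Because the fee is linear in the gas price, the same transaction would have cost roughly 85 times more on the public network than on Ropsten at the quoted prices.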

7. Conclusions

In this paper, we propose the DDA/DDA+ systems, which use smart contracts as the core to achieve decentralization and rely on privacy and cryptographic algorithms to protect the data of IoT devices and satisfy the security requirements. To summarize our techniques, we use secret sharing to protect data processing efficiently, and we apply local differential privacy and tokens to extend access to other leaders’ devices.

Data Availability

Data sharing not applicable—no new data generated, or the article describes entirely theoretical research.

Disclosure

The earlier version of this work was published in DSC 2021.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported in part by the Ministry of Science and Technology of Taiwan (Nos. 106-2218-E-155-008-MY3 and 109-2628-E-155-001-MY3). We thank Zhong-Yi Guo and Yunmin He (YZU) for initial discussions.