Abstract

In a wireless sensor network, data privacy with minimum network bandwidth usage is addressed using homomorphic data aggregation schemes. Most of the schemes which ensure end-to-end privacy provide collective integrity verification of the aggregated data at the receiver end. The presence of corrupted values affects the integrity of the aggregated data and results in the rejection of the whole data by the base station (BS), thereby leading to the wastage of bandwidth and other resources of the energy-constrained wireless sensor network. In this paper, we propose a secured data aggregation scheme that slices the data generated by each sensor node deployed in a layered topology and enables en route aggregation. A novel encoding of data and hash slices based on child order is proposed to enable concatenation-based additive aggregation and smooth extraction of slices from the aggregate by the BS. Elliptic curve-based homomorphic encryption is adopted to ensure end-to-end confidentiality. To the best of our knowledge, the proposed scheme is the first which facilitates the BS to perform node-wise integrity verification, filter out only the corrupted portion, and implement dynamic queries over the received data. Communication- and computation-based performance analysis shows the efficiency of the proposed scheme for varied network sizes. The scheme can resist eavesdropping, node compromising, replay, malleability, selective dropping, and collusion attacks.

1. Introduction

The wireless sensor network (WSN) traces its origin to SOSUS (SOund SUrveillance System), developed by the U.S. Navy in the years following World War II for detecting Soviet submarines by deploying acoustic sensors on the SOFAR channel [1]. Nowadays, wireless sensor networks are extensively used in many applications and play a vital role in monitoring remote places which are inaccessible to humans and require continuous scanning. Surveillance is done by collecting measurable factors like temperature, humidity, dissolved gases, current, and water turbulence of the environment using tiny sensing devices scattered at different distances. Though these miniature devices have become a boon of today's technological development, they suffer from resource constraints such as limited storage, low processing power, and a small power backup. Hence, data transmission, which is the major cause of energy depletion in a wireless sensor network, has attracted research attention. Efficient routing protocols [2–4] and mobile sinks have been proposed to minimize transmission. Sensor nodes are usually densely deployed, which leads to redundancy in the measured data and consumes considerable communication bandwidth. Hence, data aggregation, which reduces communication by aggregating redundant data, becomes vital in wireless sensor networks. Also, sensor nodes deployed in remote hostile areas are liable to man-in-the-middle attacks [5] due to the lack of physical protection and the untrustworthy transmission channel. Hence, there is a need for privacy-preserving schemes. Privacy in data aggregation is preserved either by encryption schemes or by nonencryption schemes like synopsis diffusion [6] and iterative filtering techniques [7].

Our survey [8] on existing secured data aggregation schemes reveals that the number of communication paths between nodes and the number of hops along each path are determined by the network model adopted for organizing the nodes. A network model which organizes the nodes as a tree suffers from a high probability of communication loss due to the existence of a single communication path with multiple hops [9, 10]. The existence of a single path also prevents the nodes from sharing their roles dynamically. In [11], sensor nodes are organized as clusters, which enables dynamic sharing of the aggregator role within a cluster. But like the tree topology, it also suffers from a high probability of communication loss due to the existence of a single path. The problem of both tree and cluster topologies is addressed in [12] by organizing the nodes as layers, where the nodes are grouped into different layers based on their hop distance from the base station.

Privacy in data aggregation involves two conflicting objectives: (1) the mechanisms that keep the network secure (privacy) require additional data transmission, whereas (2) data aggregation aims at reducing data communication. Many privacy-preserving secure data aggregation protocols have been developed to meet these two conflicting objectives simultaneously. These schemes can be broadly classified into two groups based on how security is achieved: by encryption or by nonencryption [6, 7, 13]. Schemes like [11, 14, 15] employ encryption to conceal the data and prevent intruders from tapping information by overhearing the wireless channel. The scheme in [15] verifies authentication using an ID-based aggregate signature. Concealment of data can be at two different levels: between the hops or between the source end and the destination end. Concealment between the hops needs the encrypted data to be unwrapped at every hop for aggregation [16]. Unwrapping an encrypted packet opens a hole for intruders to access the data by compromising the aggregator node. To strengthen the security, the actual data is hidden from intruders by slicing it into different pieces in [17]. These pieces are encrypted, grouped into two sets, and forwarded to the aggregators of disjoint trees where they get decrypted and aggregated. Girao et al. [18] proposed a scheme to conceal the data between the source end and the destination end by applying the aggregation function over the encrypted data instead of the plain data using privacy homomorphism (PHM). The PHM-based schemes can again be classified, based on keys, into symmetric and asymmetric PHM-based schemes. The authors in [11, 19] employed the same key for both encryption and decryption. This reduces the communication and computation cost with less message expansion. But sharing the secret key among all the nodes makes the system more vulnerable to attack.
To address this vulnerability, different keys are used for encryption and decryption in [20–22]. The key used for decryption is kept secret at the base station, and the key used for encryption is shared among all the nodes. The scheme proposed in [14] employs asymmetric PHM and aggregates cipher texts of multiple applications generated using different keys. Privacy homomorphism-based encryption, which ensures confidentiality, is vulnerable to malleability attack, and hence, the encrypted data can be falsified by intruders sitting on the way to the BS. This problem is addressed by verifying the integrity of the aggregated data collectively at the BS. In [11], the BS performs collective integrity verification by employing privacy homomorphism and structuring the data as a complex number comprising the encrypted data and the privacy factor. Another solution using collective integrity verification at the BS, proposed in [9] and referred to as ASAHS, is made scalable by making every active node increment an array element whose index corresponds to the range of the value to be contributed by it. The array allows the BS to perform any operation over the sensed data. Since the aggregated values represent only ranges of sensed data, the BS cannot know the exact sensed values. This makes the scheme unsuitable for applications where accuracy is of utmost importance. Though the collective integrity of the aggregated value is verified by the BS, it requires the BS to share a secret cipher text with each leaf node. Boudia et al. [23] proposed a secure additive homomorphic aggregation scheme, referred to as SASPKE, using a symmetric stateful key shared between the BS and the sensor nodes. For secure sharing of this stateful key, it employs an asymmetric key using the elliptic curve cryptosystem (ECC). This scheme ensures end-to-end concealment, collective data integrity, and authentication by using homomorphic encryption.
For authentication, a hash-based message authentication code (MAC) is generated using the stateful key, which can be unique for each message. Chen et al. [22] proposed an encoding-based secured data aggregation scheme, referred to as RCDA, to support dynamic queries by recovering node-wise data. But the integrity verification is done in a collective manner at the BS. Since PH schemes allow operations over the encrypted data without decryption, a malicious node can inject manipulations as if it were a valid node. This malleability attack is addressed in MRCDA [24] by verifying the integrity at every intermediate aggregator node with the help of a MAC generated using a pairwise shared symmetric key, a secret key, and a random function. The end-to-end integrity is verified by the BS with the help of a MAC generated using another key pair shared between the BS and all nodes. But continuous compromising of nodes violates the integrity, which leads to the failure of collective integrity verification. Li et al. [10] verified the integrity of homomorphic aggregated data at the BS as well as along the forwarding path. To reduce the impact of the node compromising attack as in [24], integrity is ensured by the aggregator node and monitoring nodes by employing a randomly located subMAC. Detection of false data by checking the integrity at intermediate nodes [10, 24] prevents invalid data from reaching the BS.

In all the schemes discussed above, failure of collective integrity verification leads to the rejection of the whole aggregated value including the valid data. This results in the wastage of bandwidth. Ozdemir and Yang [25] proposed another solution, referred to as IPHCDA, which improves the granularity of integrity verification by allowing the BS to verify the integrity at the cluster level. In our proposed scheme, we further improve this granularity by enabling the BS to perform integrity verification at the individual sensor node level and separate the corrupted data from the valid data. Hence, the bandwidth consumption problem of collective integrity verification, caused by the rejection of valid data along with invalid data, is addressed in the proposed scheme.

A discrete integrity assuring slice-based secured data aggregation scheme for wireless sensor networks is proposed in this paper. The major contributions of the proposed scheme are as follows:
(1) The scheme supports concatenation-based additive aggregation of sliced messages at every hop en route to the BS by adopting an additive Elliptic Curve-ElGamal- (EC-EG-) based homomorphic encryption mechanism. To enable this, a novelty is added by creating the sliced messages using a child order-based encoding mechanism.
(2) The proposed scheme allows smooth extraction of node-wise sliced messages from the aggregate. The BS is made to perform integrity verification at the individual sensor node level and filter out only the data of corrupted nodes.
(3) The scheme allows the BS to implement dynamic queries like sum, mean, max, and min using a single round of data collection. The scheme is made secure against eavesdropping, node compromising, replay, malleability, selective dropping, and collusion attacks by adopting the slice-based homomorphic encryption mechanism.

The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 describes the role of hashing function and comparisons of hashing functions. Section 4 first describes the network model. Then, TinyMD5 and EC-ElGamal cryptosystem are reviewed. Section 5 details our proposed scheme for discrete integrity assuring slice-based secured data aggregation. Security analysis is discussed in Section 6. Section 7 presents performance and comparative analysis. Finally, concluding notes are made in Section 8.

2. Related Work

In [26], the authors have proposed two schemes based on slicing for secure data aggregation. In Scheme I, each datum is sliced into pieces and confusion is added by mixing the slices of nodes, at the cost of additional transmissions. The originality of the slices is validated by attaching a hash to each slice, and confidentiality is ensured using the symmetric advanced encryption standard. Homomorphic encryption is adopted to ensure end-to-end confidentiality, but the scheme supports only additive aggregation of slices. In Scheme II, the communication cost due to homomorphism is reduced by introducing noise data while mixing the slices, which strengthens the security. The attack of compromising the aggregator node is not addressed in either scheme. Liu et al. proposed a slice-based secured data aggregation scheme [27] that reduces the number of slices using Euclidean-based decomposition. Different aggregation functions are supported by adding mark bits along with the data slices, but multiple functions cannot be supported in a single query. In addition, the data integrity is not verified at all levels. In [28], the authors proposed a secure data aggregation scheme to support multiple functions in a single query by representing the data as the number of occurrences of values using homomorphic encryption. The authors in [29] proposed cluster-based secure data aggregation using symmetric key encryption. The authenticity of the cluster head is ensured before the transmission using a hash value. The BS is enabled to detect malicious nodes by analysing the timestamp retrieved from the aggregate. But the scheme requires a separate symmetric key for each node participating in the data collection round.
The authors in [17] proposed a secured data aggregation scheme using elliptic curve-based homomorphic encryption, where the integrity of the data is verified collectively at all levels, both at the CH and at the BS, by employing two different MACs: one using a pairwise symmetric key and the other using homomorphism. Another elliptic curve-based scheme [22] allows the BS to verify the integrity of the aggregate and to extract node-wise data. This helps the BS to apply any mathematical function over the data in a single query. But the absence of node-wise integrity verification at the BS results in the rejection of the whole received value including the valid data.

Considering the issues discussed above, we propose a slicing-based scheme which ensures end-to-end confidentiality with a reduced number of slices and allows the BS to reject only the corrupted data instead of the whole received value by verifying node-wise integrity. The scheme also supports the BS in computing multiple functions (like sum, mean, max, and min) in a single query.

3. Preliminaries

Three major security elements which are vital in wireless communication are confidentiality, integrity, and authentication. Confidentiality can be achieved by applying encryption mechanisms, whereas the originality and authenticity of the received data can be verified by generating a message digest using hash functions with a secret key. This makes it difficult for adversaries to introduce changes in the original data without making the required modification in the digest. Hence, hash functions play a vital role in security schemes.

Most existing hash algorithms follow the Merkle-Damgård construction. The message of arbitrary length is split into blocks of predefined size, and from each block, a fixed-size digest is generated by applying mathematical functions [30]. This process is repeated for all the blocks in a cyclic manner. MD2 and MD4 are 128-bit hash functions and were proven vulnerable to preimage and collision attacks. MD5 generates a 128-bit message digest in 4 rounds, as opposed to MD4, which takes 3 rounds. The use of a unique additive constant in each round enables MD5 to produce an avalanche effect at a faster rate than MD4, which uses the same constant for all rounds. The authors in [31] proposed a modified version of MD5 called TinyMD5, which reduces the message digest from 128 bits to 32 bits, making it an energy-efficient hash function suitable for energy-starved wireless sensor networks. It is adopted in our scheme to generate the hash value for verifying whether the received message is intact. In TinyMD5, the major differences lie in the initial values of the four 8-bit buffers, the size of the operational data, and the constants used for removing duplication in the generated hash value. Though the weakness of the MD hash functions with respect to collision attacks is proven and has led cryptographers to recommend the SHA hash function series, the SHA digests of more than 128 bits incur communication overhead in energy-constrained wireless sensor networks. A comparative analysis of some existing hash functions is given in Table 1.

4. Specification of DIA-SSDA Method

In this section, first, the network model which illustrates the deployment of nodes around the base station is discussed. Then, a review of TinyMD5 and the Elliptic Curve-ElGamal cryptosystem is presented.

4.1. Network Model

We consider a WSN with a ring-based layered topology. The nodes are organized as layers such that each layer comprises nodes with the same hop distance from the BS. Layer 1 includes all the nodes which can be reached from the BS in one hop. Layer 2 includes nodes which are at a two-hop distance from the BS. Similarly, layer n includes all the nodes which are n hops away from the BS. Refer to Figure 1.

The layers, which form rings of nodes around the BS, enable multihop communication [12]. Also, the existence of multiple paths between the nodes and the BS in the layered architecture allows dynamic sharing of the aggregator role among the nodes [32]. This prevents one node from getting overloaded by playing the role of aggregator as in cluster and hierarchical network models. It also minimizes the probability of communication loss. To facilitate the propagation of packets from the source layer to the BS through multiple paths, each node stores information about its neighbour nodes in the next layer towards the BS.

4.2. TinyMD5

In a wireless sensor network, the aggregated data moving towards the BS is vulnerable to attack en route. Adversaries may introduce unauthorized modifications and affect the integrity of the data. Hence, to verify the integrity of the message, a hash value is sent along with the data. We consider an algorithm called TinyMD5, proposed by Bok et al. [31], for generating a tiny hash value with minimized computation. The scheme generates an intermediate 128-bit hash value from the original data using standard MD5. Then, this 128-bit intermediate hash value is reduced to a 32-bit lightweight hash value using a block processing method. Since TinyMD5 is a one-way hash function, the plain text cannot be recovered from the digest. The generation of the 32-bit hash using TinyMD5 is illustrated in Figure 2.

The four 8-bit registers A, B, C, and D hold the final output of TinyMD5.
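The reduction of a 128-bit MD5 digest into four 8-bit registers can be sketched as follows. Since the exact block processing method of TinyMD5 [31] is not reproduced in this paper, the XOR-folding step below is an assumption for illustration only:

```python
import hashlib

def tiny_md5(data: bytes) -> bytes:
    """Sketch of a 32-bit digest: compute standard MD5 (128 bits), then
    fold the 16 bytes into four 8-bit registers A, B, C, D by XOR.
    The folding rule is an assumption; TinyMD5's actual block
    processing [31] may differ."""
    full = hashlib.md5(data).digest()   # 16-byte intermediate digest
    regs = [0, 0, 0, 0]                 # registers A, B, C, D
    for i, byte in enumerate(full):
        regs[i % 4] ^= byte             # fold byte i into register i mod 4
    return bytes(regs)                  # 4 bytes = 32-bit digest
```

Whatever the concrete folding rule, the output stays deterministic and one-way, which is all the integrity check in Section 5.5 relies on.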

4.3. EC-ElGamal Cryptosystem

Mykletun et al. [21] proposed the Elliptic Curve-ElGamal (EC-ElGamal) cryptosystem, a transformed version of the original ElGamal cryptosystem. It provides the same level of privacy as RSA but with a smaller key size and the same resistance against security attacks. Hence, the elliptic curve cryptosystem is considered suitable for bandwidth-constrained wireless sensor networks. Its resistance against security attacks is based on the elliptic curve discrete logarithm problem: given the multiplier and the product, finding the multiplicand is computationally difficult when both the multiplier and the product represent points over an elliptic curve. The plain text is mapped to points on the elliptic curve independently of the transformation needed for generating the cipher text. The mapping function map is additively homomorphic, i.e., map(m1) + map(m2) = map(m1 + m2). In this cryptosystem, the private and public key parameters are generated as follows:
(1) The elliptic curve E over the finite field F_p, with p a prime number and the order of E having a large prime factor, is constructed.
(2) The random private key x is chosen and kept secret in the BS.
(3) The public key parameters (E, p, G, Y), where G is the generator and Y = xG, are stored in all sensors during deployment.

Encryption and decryption are done as follows:
(1) Encryption: C = (C1, C2) = (kG, map(m) + kY), where k is a random number.
(2) Decryption: m = rmap(C2 − xC1), where rmap reverse-maps the point to the scalar value.
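The additive homomorphism of EC-ElGamal can be demonstrated on a toy curve. The parameters below (y² = x³ + 2x + 2 over F₁₇ with generator (5, 1) of prime order 19) are a small textbook example, not parameters of the proposed scheme, and the brute-force rmap works only for small plain texts:

```python
# Toy additive EC-ElGamal: curve y^2 = x^3 + 2x + 2 (mod 17),
# generator G = (5, 1) of prime order 19. Sketch only; a real
# deployment would use a standard curve such as secp256r1.
P, A = 17, 2
G = (5, 1)
O = None  # point at infinity

def ec_add(p1, p2):
    """Elliptic curve group law (affine coordinates)."""
    if p1 is O: return p2
    if p2 is O: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, p):
    """Double-and-add scalar multiplication k*p."""
    r = O
    while k:
        if k & 1: r = ec_add(r, p)
        p = ec_add(p, p); k >>= 1
    return r

def encrypt(m, k, Y):       # map(m) = m*G; C = (kG, map(m) + kY)
    return (ec_mul(k, G), ec_add(ec_mul(m, G), ec_mul(k, Y)))

def agg(c, d):              # homomorphic (component-wise) addition
    return (ec_add(c[0], d[0]), ec_add(c[1], d[1]))

def decrypt(c, x):          # M = C2 - x*C1; rmap by small exhaustive search
    neg = lambda p: O if p is O else (p[0], (-p[1]) % P)
    M = ec_add(c[1], ec_mul(x, neg(c[0])))
    for m in range(19):     # rmap: feasible only for tiny plaintexts
        if ec_mul(m, G) == M:
            return m
```

Adding two cipher texts and decrypting the result yields the sum of the plain texts, which is exactly the property the en route aggregation of Section 5.2 relies on.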

5. Working Principle

Before deployment, the BS generates the private key and public key parameters using the EC-ElGamal cryptosystem [21]. The random private key is taken from the curve and kept secret in the BS. The public key parameters are stored in all sensors during deployment along with the node ID, which is generated by the BS. To construct the BS-centred layered topology, the BS broadcasts a Build_Layer message which carries the node ID and layer number. The layer number of the BS is fixed as zero. Each node which receives this message from the BS marks its layer number as 1 and broadcasts a new Build_Layer message. On receiving a Build_Layer message carrying layer number l from a node u, a node fixes its layer number and neighbours as follows:
(1) If the node receives the Build_Layer message for the first time and its layer number is not yet fixed, it sets its layer number to l + 1 and marks u as its neighbour in the parent layer.
(2) If its layer number L is already fixed and l = L − 1, then u is marked as a parent neighbour. If l = L + 1, then u is marked as a child neighbour.
(3) A node which cannot find any child neighbour forms the last layer and broadcasts a Build_Complete message, which carries the node ID and layer number, to its parent neighbours. Each node which receives this message stores it and rebroadcasts it towards the BS.
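The Build_Layer flooding above is, in effect, a breadth-first search rooted at the BS. A minimal sketch over a hypothetical adjacency list (function and variable names are ours, not the paper's):

```python
from collections import deque

def build_layers(adjacency, bs=0):
    """Assign each node its hop distance (layer) from the BS and record
    parent/child neighbours, mimicking the Build_Layer flooding.
    `adjacency` is a hypothetical dict: node -> list of radio neighbours."""
    layer, parents, children = {bs: 0}, {bs: []}, {}
    queue = deque([bs])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in layer:                    # first Build_Layer received
                layer[v] = layer[u] + 1
                parents[v] = [u]
                children.setdefault(u, []).append(v)
                queue.append(v)
            elif layer[v] == layer[u] + 1 and u not in parents[v]:
                parents[v].append(u)              # additional parent path
                children.setdefault(u, []).append(v)
    return layer, parents, children
```

Note that a node reachable through several one-hop-closer neighbours records all of them as parents, which is what gives the layered topology its multiple paths towards the BS.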

In each sensor node, the parent neighbours are grouped into sets, each containing the same fixed number of parents. This ensures multiple paths between the source node and the BS. In this parent-child relationship, the children are grouped such that each group has 2 children, and within a group each child is assigned a number called the child order, which is either 1 or 2. It is kept secret in the parent as well as in the child. It helps to identify a suitable parent for routing the sliced aggregated message to the BS. When there is data to be sent to the BS, the sensor node generates the hash value of the raw data and slices both the raw and hashed data. Then, sliced messages carrying the pieces of raw and hash values, along with additional information required to help the BS match the received slices, are created. Next, the sliced messages are encoded and encrypted using the public key shared with the BS. The nodes en route to the BS group the received messages according to the child order and aggregate them. The communication overhead caused by slicing is minimized by aggregation at every hop. Thus, the whole procedure is composed of 5 steps: slicing, encryption and aggregation, decryption of the aggregated message, data recovery, and integrity verification. The first 2 steps occur in the sensor nodes, whereas the last 3 are carried out by the BS. The definitions of the symbols used in the explanations are given in Table 2. The detailed explanations of these 5 steps are as follows.

5.1. Slicing

The node generates the hash value of the sensed data using TinyMD5 and slices both the sensed data and its hash into pieces. The number of slices at a node is chosen at random between 1 and the maximum number of slices. For each slice, a sliced message (SM) is formed by appending the sliced data piece with one sliced hash, the node ID (generated by the BS during node deployment), the number of slices, and the nonce. The node ID and nonce help the BS to sync the received slices correctly. The slicing reduces the size of each message. The SMs are encoded, encrypted, and forwarded to a random parent neighbour in the upper layer, chosen from the parent set based on the child order. This child order is used for the encoding. The SMs are generated and encoded as follows:
(1) The hash value of the data sensed by the node is computed using the TinyMD5 hash function.
(2) The sensed data and its hash value are split into slices of equal length. The number of slices remains 1 at the one-hop layer; for the remaining layers, it is chosen at random.
(3) The encoded sliced message (ESM) for each slice is created as follows:
(a) Let and
(b) While (), compute and increment by 1
(c) where is the sliced message length in bits and is the child order of the node for the parent chosen at random from the parent set. Repeat step (b) by initializing x to 1 and b to 0. Increment β by (tx ∗ (gi − 1)) if the group order (gi) is not 1.
(d)
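Since parts of the encoding steps are given symbolically, the following is only a sketch of the underlying idea: each fixed-length sliced message is left-shifted by a multiple of the message length determined by the child order, so that plain addition of two siblings' encodings amounts to concatenation. The constant SM_LEN and the function names are our assumptions:

```python
SM_LEN = 16  # hypothetical sliced-message length in bits

def encode(sm: int, child_order: int) -> int:
    """Left-shift a fixed-width SM by (child_order - 1) * SM_LEN bits.
    Adding an order-1 and an order-2 encoding then concatenates them,
    because their nonzero bits occupy disjoint positions."""
    assert 0 <= sm < (1 << SM_LEN) and child_order in (1, 2)
    return sm << ((child_order - 1) * SM_LEN)

def slices(value: int, t: int, width: int = SM_LEN):
    """Split `value` (width bits) into t equal-width pieces,
    most significant piece first."""
    piece = width // t
    mask = (1 << piece) - 1
    return [(value >> (piece * (t - 1 - i))) & mask for i in range(t)]
```

For example, encode(0x12, 1) + encode(0x34, 2) equals 0x340012: the zero padding inserted by the encoding is what lets homomorphic addition of cipher texts act as concatenation of the underlying SMs.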

5.2. Encryption and Aggregation

The encoded SMs are encrypted using the public key, following the EC-ElGamal encryption described in Section 4.3.

5.2.1. Aggregation

Let the received cipher texts at a node be the packets received for aggregation. The received cipher texts are grouped into sets, each with 2 cipher texts, based on the child order assigned to each child node. Each set is then aggregated. The number of 0's used in the encoding of an SM enables smooth aggregation of the cipher texts in each set, as it is computed based on the child order. For each group, the aggregated cipher text is created as follows:
(1) Initialize the aggregate cipher text.
(2) Add each cipher text in the group to the aggregate.

The node forwards the aggregated cipher text to a parent in the next upper layer. The parent is chosen based on the child order used for the encoding.
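The grouping step can be sketched as follows, with plain integers standing in for the additively homomorphic cipher texts; the dictionary layout keyed by child order is our assumption about the bookkeeping at the aggregator:

```python
def aggregate_by_child_order(received):
    """`received` maps child_order (1 or 2) -> list of cipher texts.
    Pairs one order-1 and one order-2 cipher text per group and adds
    them; with the concatenation-based encoding, each sum carries both
    SMs in disjoint bit positions."""
    groups = list(zip(received.get(1, []), received.get(2, [])))
    return [a + b for a, b in groups]
```

With the encoding of Section 5.1, adding an order-1 value 0x12 to an order-2 value 0x340000 yields 0x340012, i.e., both SMs survive intact inside one aggregate.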

5.3. Decryption of the Aggregated Data

The received aggregated cipher text is decrypted by the BS using the secret key. Following EC-ElGamal, the decrypted message is obtained as rmap(C2 − xC1).

5.4. Data Recovery (DR)

The BS extracts the individual SMs from the decrypted aggregated message using the fixed sliced message length: the decrypted aggregate is split into consecutive pieces of that length, each piece being one SM.
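Under the concatenation-based encoding, this extraction amounts to masking off fixed-width chunks of the decrypted aggregate. A sketch, with the message length and count as hypothetical parameters:

```python
def recover_sms(aggregate: int, sm_len: int, count: int):
    """Peel `count` fixed-width sliced messages off the decrypted
    aggregate, least significant chunk first, assuming the
    concatenation-based encoding with sm_len bits per SM."""
    mask = (1 << sm_len) - 1
    return [(aggregate >> (sm_len * i)) & mask for i in range(count)]
```

This inverts the child order-based encoding: the aggregate 0x340012 from the aggregation example splits back into the SMs 0x12 and 0x34.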

5.5. Integrity Verification

The BS decodes each recovered message and extracts the sliced sensed value, the sliced hash value, the number of slices, the node ID, and the nonce using their index positions: the sliced sensed value occupies the leading positions up to the length of the sensed value, the sliced hash value occupies the following positions, and the node ID and nonce come next. The sliced messages are mapped to their corresponding siblings using the node ID and nonce. Once all the SMs of a node are available, the extracted and mapped sliced sensed values are merged to form the received sensed data. The hash value of the received sensed data is calculated by the BS using TinyMD5. The received partial hash values are merged and compared with the calculated hash value. If both are the same, integrity is ensured. Otherwise, the sensed value of the corresponding node is dropped by the BS. Node-wise integrity verification by the BS is given as follows:
(1) For each recovered message, extract:
(a) the sliced sensed value
(b) the sliced hash value
(c) the node ID
(d) the nonce
(2) For each extracted node ID, repeat the following:
(a) If the number of recovered SMs equals the number of slices:
(i) merge all data slices of the node
(ii) merge all hash slices of the node
(b) Compute the hash of the merged data.
(c) If the computed hash equals the merged received hash, integrity verification for the node is successful and the node's data is accepted. Else, the node's data is rejected.
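Node-wise verification can be sketched as follows; the (data_slice, hash_slice) pair layout and the 4-byte MD5 truncation standing in for TinyMD5 are assumptions for illustration:

```python
import hashlib

def verify_node(node_id, sms, t, hash_fn=None):
    """Sketch of node-wise verification. Each element of `sms` is a
    hypothetical (data_slice, hash_slice) pair of byte strings already
    mapped to `node_id`. Merge the data slices, recompute the digest,
    and compare it with the merged hash slices."""
    if hash_fn is None:
        # Stand-in 32-bit digest; the scheme itself uses TinyMD5 [31].
        hash_fn = lambda b: hashlib.md5(b).digest()[:4]
    if len(sms) != t:
        return False                     # missing slices: reject the node
    data = b"".join(s[0] for s in sms)   # merge data slices
    received_hash = b"".join(s[1] for s in sms)  # merge hash slices
    return hash_fn(data) == received_hash
```

Only the node whose recomputed digest disagrees with its merged hash slices is rejected; the aggregates of all other nodes are unaffected, which is the granularity gain over collective verification.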

5.6. Illustrative Example

To demonstrate the working principle of our scheme, we have taken a simple WSN deployed in a ring topology with 8 sensor nodes such that nodes are in layer 3, are in layer 2, and are in layer 1 (refer to Figure 3). Let the data sensed at each node and their corresponding hash values be , , , , , , , and . To keep the explanation simple, we omit the node ID, number of slices, and time stamp in the sliced messages and set the number of slices to 2. The data slices generated at a node from its data are . The hash is sliced into . The encoded sliced messages at are and , i.e., where ; hence, and where . Let decimal digits. Hence, . The encoded SMs of the other nodes are , , , , , and .

The encoded SMs at layer 3 are encrypted and forwarded to the parent node based on the child order. The parents of node are . The child order of is 1 and 2, respectively. Cipher text of , i.e., , is forwarded to and to .

To demonstrate the aggregation process, consider the cipher texts received at . These cipher texts are grouped into 2 sets based on child order such that each group includes cipher texts with child order 1 and 2. Hence, the groups at are and . The encoded SMs of are , and , . The aggregated cipher text of is and its corresponding encoded SM .

In the same way, the encoded SM of aggregate at , , , , and .

The BS extracts the node-wise sliced messages from the decrypted aggregate by using the sliced message length. The SMs are matched by comparing the node ID, and the order is determined using the time stamp. For example, the extracted SMs of are . From the SMs, the data slices and hash slices are retrieved and merged to form the final data value 71 and hash value 75. Now, the BS generates a new hash using the retrieved data value and compares it with the received hash. If both are equal, the data is accepted. Hence, node-wise data integrity is verified.

6. Attack Model/Security Analysis

In this section, we present various attack models for wireless sensor networks and the ability of our proposed scheme to defend against them.

6.1. Eavesdropping Attack

Since the encrypted sliced data packets (ESMs) can be decrypted only using the private key known to the BS, the intruder cannot access the sliced data. Even if the intruder manages to break the keys, the partial sliced information in an ESM prevents the intruder from obtaining the actual sensed data. In addition, to learn the sensed information, the intruder has to find the synchronized matching ESMs by compromising multiple nodes. Since the ESMs take random paths, this is highly improbable. Thus, slicing and encryption provide double protection against the eavesdropping attack and ensure confidentiality.

6.1.1. Double Protection through Encryption and Slicing

From the two cipher text components C1 = kG and C2 = map(m) + kY, it is computationally difficult for the intruder to obtain k given the generator G and kG. This difficulty is based on the elliptic curve discrete logarithm problem.

6.2. Node Compromising Attack

Double protection of the sensed data offers strong resistance against the intermediate node compromising attack. By compromising an intermediate node, the attacker can only obtain the ciphered data, which can be deciphered only by the BS. Even if the key is broken by cryptanalysis, the attacker can access only partial sliced data, because the aggregated message at the compromised node contains only a combination of sliced messages (SMs).

Each SM contains only a sliced part of the data sensed by its source node.

6.3. Selective Dropping Attack

An intermediate compromised node may corrupt the data aggregation by selectively dropping packets. With the help of the node ID and the number of slices, the BS can obtain the expected number of SMs of a particular node and detect selective dropping of sliced packets as follows:

If the number of received SMs of a node is less than its number of slices

    a selective dropping attack is detected

Else

    no selective dropping attack

The ability of the BS to derive the number of slices from the node ID allows it to detect the number of packets lost during transmission.
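The check above can be sketched as follows; the header-tuple representation and the map of expected slice counts are our assumptions about the bookkeeping at the BS:

```python
from collections import Counter

def detect_dropping(received_sms, expected_slices):
    """`received_sms` is a list of (node_id, nonce) headers extracted at
    the BS; `expected_slices` maps node_id -> expected number of slices.
    Returns the set of nodes whose slice count falls short, indicating
    possible selective dropping."""
    counts = Counter(nid for nid, _ in received_sms)
    return {nid for nid, t in expected_slices.items() if counts[nid] < t}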

6.4. Collusion Attack

A collusion attack allows the raw data to be extracted in a collaborative manner. In our proposed scheme, the data travels in the form of multiple slices, and each slice takes a different route. The path taken depends on the parent node chosen at random. Because of this randomness, finding synchronized sliced packets by compromising multiple nodes and mounting a collusion attack becomes difficult.

6.5. Forgery Attack/Replay Attack

The replay attack, which falsifies duplicate-sensitive aggregation values like SUM, COUNT, and AVG, can be detected and filtered out by the BS. This is done by counting the number of SMs with the same node ID and verifying whether the count equals the number of slices. The node ID and nonce provide a unique identification for each packet. A forgery attack is possible in 2 ways: (i) without compromising a node and (ii) by compromising a node. Since the information is sliced into different packets and each sliced packet carries a slice of the hash of the whole message, a packet forged without compromising a node can be easily detected by the BS.
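The replay check can be sketched as follows; as before, the (node_id, nonce) header tuples and the map of expected slice counts are assumed representations of what the BS extracts:

```python
from collections import Counter

def detect_replay(received_sms, expected_slices):
    """Flag a node if SMs with the same (node_id, nonce) header appear
    more than once, or if its SM count exceeds the expected number of
    slices -- both symptoms of replayed packets."""
    dup = {h for h, c in Counter(received_sms).items() if c > 1}
    counts = Counter(nid for nid, _ in received_sms)
    over = {nid for nid, t in expected_slices.items() if counts[nid] > t}
    return {nid for nid, _ in dup} | over
```

Because the nonce makes each legitimate SM header unique, a replayed copy necessarily collides with an already-received header and is filtered out before aggregation results are accepted.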

6.6. Malleability Attack

A cryptosystem using privacy homomorphism is intrinsically malleable, which paves the way for an intruder to modify a ciphertext without knowing the original data, such that the modified ciphertext still decrypts to related plaintext. The slicing and node-wise integrity verification at the BS adopted in our proposed system help remove malleable data smoothly without affecting the clean data. Illegal manipulation by malicious antagonists of the aggregated packet of a compromised node affects only the slices of data merged in that packet. Since the packets carry only the aggregation of slices, it is difficult to craft a manipulation that imitates valid data, and such a manipulation cannot affect the entire aggregate value as in MRCDA and other systems. Also, node-wise integrity verification filters out only the affected slices and their corresponding pairs, not the entire received aggregate as in other existing systems. This eliminates the problem of collective verification, where a failed integrity check leads to rejection of the whole aggregated data, both modified and unmodified, and thus to wasted bandwidth.

6.7. Node-Specific Retransmission of Damaged Data

In our proposed scheme, each sliced message carries the ID of the source node that generated the message. The BS derives the number of slices generated by each node from this ID. Once all the slices of the original message are available, the BS performs integrity verification as specified in Section 5.5. In the event of any deviation in integrity verification or any missing slices, the BS can request retransmission of the invalid data from the source node, whose ID is extracted from the received message.

6.8. Security Proof

The security proof is presented by analysing the possible ways of ESM distribution and the probability that an intruder finds the corresponding sibling packets of the same node.

Slicing and shuffling, which take place as the packets travel towards the BS, have a significant impact on the distribution of sibling packets, which in turn affects the probability of finding all sibling packets.

6.8.1. Possible Ways of Packet Distribution at Each Layer

Case 1. Packets and neighbours are distinguishable, more than one packet can be placed in the same neighbour, and no neighbour is left empty (possible only when the number of packets is at least the number of neighbours).
The possible ways of distributing the packets among the neighbours present in the layer can be counted using the Stirling number of the second kind [33] as follows.

Case 2. Packets and neighbours are distinguishable, more than one packet can be placed in the same neighbour, and no neighbour is left empty, under the complementary relation between the number of packets and the number of neighbours. Hence, the number of possible packet distributions at a layer is the product of the possible ways of distributing packets to the neighbours of each node at that layer.
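As a concrete illustration of the counting argument, the snippet below computes the Stirling number of the second kind and the resulting number of onto placements of p distinguishable packets onto m distinguishable neighbours; the m! factor is the standard correction for distinguishable neighbours and is an assumption about the paper's elided formula.

```python
import math

def stirling2(n, k):
    """Stirling number of the second kind S(n, k): partitions of n
    distinguishable items into k non-empty indistinguishable blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Standard recurrence: S(n, k) = k*S(n-1, k) + S(n-1, k-1)
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def onto_distributions(p, m):
    """Ways to place p distinguishable packets into m distinguishable
    neighbours with no neighbour empty: m! * S(p, m)."""
    return math.factorial(m) * stirling2(p, m)
```

For instance, three packets over two neighbours admit 2! * S(3, 2) = 6 onto placements.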

6.8.2. Possible Ways of Packet Distribution in All the Layers

The possible distribution of sibling packets is the product of the possible distributions of packets in each layer, which grows cubically with the number of neighbours and exponentially with the number of sibling packets. An increase in the number of levels also has a significant impact on the distribution.

6.8.3. Probability of Getting All Sibling Packets of the Same Node

The sibling packets generated from the same node travel towards the BS along different paths. Hence, the possible distribution of packets increases exponentially with the number of layers crossed. The probability that an intruder obtains all the sibling packets of the same node is calculated using the relative position of the siblings at a given instance. This can be categorized into three cases.

Case 1. All siblings are available in one node at some layer positioned between the source layer (sl) and the BS; the probability depends inversely on the total number of nodes between the source layer and the BS.

Case 2. All siblings are available in different nodes at the same layer positioned between the source layer and the BS, where nsbl denotes the total number of siblings.

Case 3. All siblings are available in different nodes at different layers positioned between the source layer and the BS. The above mathematical model shows that the probability of obtaining all sibling packets is inversely proportional to the number of nodes participating in data gathering. Results obtained by substituting real-time values for the above three cases are given in Figure 4 and Table 3. The results show that the probability approaches zero when the sibling packets are distributed across different nodes at different layers. This makes it highly difficult for an intruder to compromise multiple nodes and acquire all the related sibling packets of the same node.

7. Performance and Comparative Analysis

In this section, we analyse the performance of our proposed scheme by measuring the energy-consuming factors which include the communication and the computation overhead.

7.1. Communication Cost

First, the communication cost is analysed by measuring the impact of topology on the number of hops required for the packets to reach the destination from the origin and then by measuring message size. We consider ring topology, cluster topology, and hierarchical cluster topology for comparison.

The topology of a sensor network with a widespread monitoring area spans many layers around the BS, and the number of hops experienced by the packets is high in schemes with no data aggregation. Our proposed slicing scheme over a sensor network with ring topology reduces the number of hops by merging the sliced packets at every hop; hence, the total hops taken by all the packets are fewer. Though the number of hops in the hierarchical cluster-based topology is comparatively less, it increases when the network monitoring area spans a wide range, as it does in cluster-based schemes. Table 4 shows the hop count based on network topology.

The comparative analysis of communication cost based on network topology with varying sizes is presented in Figure 5. Though the cluster topology experiences less communication overhead than the other two topologies, it requires a dynamic change in the role of cluster head to prevent nodes from becoming overloaded [34]. This leads to variation in the distance between the dynamic CH and the cluster members close to the border of the cluster, which adds to the communication cost, as the power required to transmit data is directly proportional to the distance between sender and receiver [35].

Hence, the dynamic change of CH in cluster topology leads to more power consumption. This problem is minimized in sensor network with ring topology where the hop distance between the nodes is not dynamic.

To analyse the impact of different deployments on the number of hops, a wireless network with 1000 nodes is considered. The number of levels or layers is kept at 9 for both the hierarchical-based cluster topology and the ring topology. Different testing scenarios are created by introducing randomness in the number of nodes per layer in ring topology and the number of clusters per level in hierarchical-based cluster topology. Testing results are given in Figure 6. Results obtained by increasing the number of levels in the network with 1000 nodes are given in Figure 7. From the results presented in Figures 6 and 7, it is observed that, in our scheme, an increase in layers or a change in deployment has no impact on the overall number of hops. Hence, nodes can be deployed freely. There is no restriction on the distribution of nodes, unlike in hierarchical-based cluster topology, whose communication cost is directly proportional to the cluster size and the hierarchical level.

In cluster topology, the number of clusters has no impact on the number of hops required to reach the BS. Though it relaxes constraints on the number of clusters and the cluster size, there is a limit on the distance between the cluster head and the BS for smooth, direct communication. Since the maximum number of hops to reach the BS is restricted to two, the network cannot scale vertically; hence, a condition is indirectly imposed on the deployment of nodes and on the assignment of the cluster head role [34] to cluster members. Ring topology, in our proposed scheme, allows the network to scale both vertically and horizontally around the BS. The existence of multipath communication allows dynamic sharing of the aggregator role, so nodes are prevented from overloading. Though the communication cost is slightly higher, the bandwidth consumed by retransmitting both affected and unaffected packets, which are discarded under collective verification in existing schemes, is eliminated by dropping only the corrupted packets.

Now, we analyse the communication cost by measuring the message size. The ciphertext of our proposed scheme, generated using the Elliptic Curve ElGamal (EC-EG) cryptosystem, has two components, each referring to a different point on the elliptic curve.

A point on an elliptic curve over a finite field is represented by a pair of coordinates, each the size of the field modulus. Using a point compression mechanism, it can be represented by only one coordinate plus one or more additional bits [21]. Hence, the size of one ciphertext is twice the compressed point size. From the analysis in [21], the EC-EG cryptosystem achieves the same 1024-bit security level as three other elliptic curve schemes [36], Elliptic Curve Okamoto-Uchiyama, Elliptic Curve Naccache-Stern, and Elliptic Curve Paillier, using only a 163-bit modulus. Hence, the ciphertext in EC-EG is smaller than in the other cryptosystems. We consider schemes based on elliptic curve cryptosystems that ensure integrity as suitable candidates for a comparative analysis of bandwidth consumption. The candidate schemes are MRCDA, ASAHS, RCDA-HOMO, and IPHCDA. Considering the security level offered, a 163-bit modulus is adopted for the schemes based on EC-EG, and a 1024-bit modulus is adopted for ASAHS and IPHCDA. When analysing the data size, the concatenation-based additive aggregation adopted in the proposed scheme, as well as in RCDA and MRCDA, has an impact on the data size: the value range of data that can be sensed decreases as the cluster size and the number of layers increase. Table 5 gives the formulas for computing the ciphertext size and the communication cost based on hops and message size in bits.
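The ciphertext-size arithmetic can be made explicit; the single extra bit for point compression follows the convention cited above, and field sizes are in bits.

```python
def ec_elgamal_ciphertext_bits(field_bits, compressed=True):
    """Bits needed for one EC-ElGamal ciphertext (two curve points).

    A compressed point is one field element plus one extra bit; an
    uncompressed point stores both coordinates.
    """
    point_bits = field_bits + 1 if compressed else 2 * field_bits
    return 2 * point_bits

# 163-bit field with point compression: 2 * (163 + 1) = 328 bits
```

The compressed 328-bit figure is roughly a quarter of a 1024-bit-modulus ciphertext component, which is the source of the bandwidth advantage discussed here.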

Simulated results of communication cost for varying network sizes are given in Figure 8. For the simulation, we kept the slicing size s at two for our scheme. ASAHS is implemented with a vector size of 5, and for IPHCDA, the region size is kept at the minimum of two. When comparing the total packet size, which includes the ciphertext and the additional information needed for integrity verification, the message size in our scheme is about one-third that of ASAHS and IPHCDA. In ASAHS, the communication cost based on hops and message size grows with the vector size, and in IPHCDA, it grows with the number of regions.

Compared with RCDA and MRCDA, the communication cost of our scheme is slightly higher, but this becomes acceptable considering the problem of collective integrity verification. In collective integrity verification-based schemes, the presence of a single malicious node in a cluster leads to the rejection of the aggregate received from that cluster, resulting in wasted bandwidth. To analyse this bandwidth wastage, a network of 150 nodes is considered. When malicious nodes spread among multiple clusters and affect integrity, the bandwidth wastage in RCDA increases by 9.6 for every unit increase in infected clusters, whereas in our proposed scheme it increases by 2.64 for every increase of 5 in infected nodes. This is illustrated in Figure 9, which shows that collective integrity verification has a greater impact on communication cost than node-wise integrity verification.

The saving in bandwidth by node-wise integrity verification (SBW_NIV) is computed as SBW_NIV = BWW_CIV - BWW_NIV,

where BWW_CIV stands for the bandwidth wasted by the rejection of the aggregate in collective integrity verification and BWW_NIV refers to the bandwidth wasted by the rejection of false data in node-wise integrity verification. The overall saving in bandwidth in our proposed scheme (OBW_NIV) is computed as the difference between SBW_NIV and the additional bandwidth consumed by slicing (ABW_S).

When comparing ABW_S with SBW_NIV, the difference, i.e., the overall saving in bandwidth, grows with the number of affected nodes/clusters in our proposed scheme compared to RCDA, as shown in Figure 10. Hence, bandwidth wastage is comparatively reduced in our proposed scheme by filtering out only the corrupted part of the data. In addition, the scalability problem of RCDA, which prevents the network from spanning more than two hops from the BS, is addressed in the proposed scheme, which supports networks scaling beyond two hops from the BS.
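The bandwidth bookkeeping described in this subsection reduces to simple differences; the variable names mirror the abbreviations in the text, and the sign convention (saving minus slicing overhead) is an assumption consistent with the surrounding discussion.

```python
def bandwidth_savings(bww_civ, bww_niv, abw_s):
    """Return (SBW_NIV, OBW_NIV).

    SBW_NIV = BWW_CIV - BWW_NIV  (saving from node-wise verification)
    OBW_NIV = SBW_NIV - ABW_S    (net saving after slicing overhead)
    """
    sbw_niv = bww_civ - bww_niv
    return sbw_niv, sbw_niv - abw_s
```

A positive OBW_NIV means the saving from rejecting only corrupted data outweighs the extra traffic introduced by slicing.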

7.2. Computation Cost

Computation is another energy-consuming factor in a wireless sensor network. We measure the computation cost of our proposed method by counting the elementary operations (+, −, ×, pow, mod, and comparison) and other operations such as hashing, mapping, counter generation, and the additional encryption and decryption required for message generation, aggregation, data recovery, and decryption.

7.2.1. Message Generation

To generate a message, each node has to perform slicing and hashing, and the map function requires one point multiplication to map the plaintext to a point on the elliptic curve. The encryption requires two point multiplications and one point addition per slice, yielding totals of point additions and point multiplications that scale with the number of slices across the whole network.
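The additive homomorphism that makes en-route aggregation possible can be demonstrated with a toy exponential ElGamal over the integers; the real scheme works on a 163-bit elliptic curve, where the multiplications below become point additions and the exponentiations become scalar multiplications. All parameters here are tiny demo values, not the paper's.

```python
import random

p, g = 467, 2          # toy prime and generator, demo only
x = 127                # BS private key (assumption for the demo)
y = pow(g, x, p)       # public key

def enc(m):
    """Enc(m) = (g^r, g^m * y^r): exponential ElGamal, additively
    homomorphic in m."""
    r = random.randrange(1, p - 1)
    return pow(g, r, p), (pow(g, m, p) * pow(y, r, p)) % p

def add_ct(c1, c2):
    """Component-wise product of ciphertexts adds the plaintexts."""
    return (c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p

def dec(c, max_m=100):
    """Strip the mask, then brute-force the small discrete log (demo)."""
    gm = (c[1] * pow(c[0], p - 1 - x, p)) % p
    for m in range(max_m + 1):
        if pow(g, m, p) == gm:
            return m
    raise ValueError("plaintext out of range")
```

Aggregating Enc(3) and Enc(4) and then decrypting yields 7, while no intermediate node ever sees either plaintext.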

7.2.2. Aggregation

At every hop towards the BS, the parent node aggregates the received sliced packets using two elliptic curve additions, which allows each sliced message to be embedded in the aggregated message without mixing with other values and enables node-wise data recovery. Since each node splits its data into slices, the total aggregation cost grows with the number of slices in the network.
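How addition can act as concatenation is easy to see with plain integers; the fixed 16-bit slot width and the slot-per-child-order layout are illustrative assumptions, and in the actual scheme the sum is carried out homomorphically on ciphertexts.

```python
SLICE_BITS = 16  # illustrative fixed slot width per slice

def encode(slice_value, slot):
    """Shift a slice into its child-order slot so that summing the
    encodings concatenates slices without overlap."""
    assert 0 <= slice_value < (1 << SLICE_BITS)
    return slice_value << (slot * SLICE_BITS)

def aggregate(encodings):
    """En-route aggregation is a plain sum (one homomorphic addition
    per merge in the real scheme)."""
    return sum(encodings)

def extract(agg, slot):
    """BS-side recovery of the slice occupying a given slot."""
    return (agg >> (slot * SLICE_BITS)) & ((1 << SLICE_BITS) - 1)
```

Because each slice occupies a disjoint bit range, the BS can peel every slice back out of the aggregate, which is what enables the node-wise integrity verification.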

7.2.3. Data Recovery and Decryption

To decipher an aggregated message received from the nodes at the layer next to the BS, a single point multiplication and a single point addition are required. Accordingly, the total cost of deciphering at the BS scales with the number of aggregated messages received.

The different computations incurred at the various phases are shown in Table 6, and a comparison of the computation cost of the candidate schemes, based on results obtained by replacing the variables in Table 6 with real-time values for varying network sizes, is presented in Figure 11. The comparison shows that, as in other secured data aggregation mechanisms, the overall computation cost of our scheme increases with the number of nodes. It is higher than that of RCDA and lower than that of ASAHS.

This additional cost becomes reasonable because it enables the BS to verify integrity node-wise, in contrast to the collective integrity verification practiced in RCDA and ASAHS, where the BS rejects the entire message, including the uncorrupted data, and wastes bandwidth. That waste is eliminated by dropping only the corrupted data instead of the entire message.

7.3. Comparative Analysis

We select five schemes, RCDA, SASPKE, ASAHS, IPHCDA, and MRCDA, as the most appropriate aggregation schemes for comparison, because all of them ensure end-to-end confidentiality of the sensed data by enabling intermediate nodes to aggregate encrypted data using asymmetric homomorphic encryption. They also allow the BS to extract individual node data and apply dynamic queries, but they verify the originality of the received data collectively, so the whole received data is rejected when integrity verification fails, wasting bandwidth. This is addressed in the proposed scheme by rejecting only the corrupted data, safely filtering the defective portion out of the received data. In our proposed scheme, the ability of the BS to extract node-wise raw data from the received aggregated packets, together with its knowledge of the number of sliced packets sent by each node, helps to address node compromising, replay, malleability, selective dropping, and collusion attacks. To the best of our knowledge, no single system in the comparison list addresses all of these attacks. Though our system's performance is slightly inferior to that of existing systems, security against these attacks is comparatively strengthened by adopting slicing-based encryption with randomness in the paths taken by the packets from the source node to the BS. This makes it suitable for systems where security plays a vital role. A comparison of our proposed system with existing systems based on quality measures is given in Table 7.

8. Conclusion

The scheme proposed in this paper is a novel slice-based secured data aggregation scheme for wireless sensor networks deployed in insecure, attack-prone environments. Our scheme merges the benefits of layered topology, aggregation, and slicing. It enables the BS to perform node-wise integrity verification by safely extracting all the slices from the aggregated data and to obtain individual node data by merging the slices. In addition, the BS can apply any query over the data. Elliptic curve-based encryption and slicing provide double protection and make our system secure against eavesdropping, node compromising, replay, malleability, selective dropping, and collusion attacks. Though slicing results in a slight compromise in the number of hops compared to cluster topology-based schemes, considering the bandwidth wasted in collective integrity-based schemes through rejection of the whole data, our scheme provides an affordable solution by filtering out only the corrupted data, thereby improving the bandwidth utilization of the network and providing the WSN end user with the exact data of individual nodes instead of an aggregate message.

Data Availability

The data used for performance analysis are specified in the article itself.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.