Special Issue: Data Security and Privacy for Fog/Edge Computing-Based IoT
STQ-SCS: An Efficient and Secure Scheme for Fine-Grained Spatial-Temporal Top-k Query in Fog-Based Mobile Sensor-Cloud Systems
With the emergence of the fog computing and sensor-cloud computing paradigms, end users can transparently retrieve the desired sensory data generated by any wireless sensor network (WSN) in a fog-based sensor-cloud system. However, the fog nodes and the cloud servers may suffer from many kinds of attacks on the Internet and become semitrusted, which threatens the security of query processing in the system. In this paper, we investigate the problem of secure, fine-grained spatial-temporal Top-k query in fog-based mobile sensor-cloud systems (FMSCSs) and propose a novel scheme named STQ-SCS to tackle the problem, based on virtual grid construction and size-order encryption-binding techniques. STQ-SCS preserves the privacy of the sensed data items and their scores and enables end users to verify the completeness of the results of fine-grained spatial-temporal Top-k queries with a 100% success rate, even if the fog nodes and the cloud servers are not totally trustworthy. Besides its good security performance, simulation results indicate that STQ-SCS is also efficient: it incurs a much lower communication cost than the state-of-the-art schemes for securing fine-grained spatial-temporal Top-k query in FMSCSs.
As one important component of the Internet of Things (IoT), wireless sensor networks (WSNs) can be used in many application scenarios and are still being studied by many researchers, even though extensive research has been carried out on WSNs for the past two decades. However, traditional WSNs are usually single-user centric: a user deploys and owns its own WSN, and other parties are not able to access the sensed data generated by that WSN. To remedy this shortcoming, researchers have conceived a new paradigm in recent years, namely, the sensor-cloud paradigm [5–7]. A typical sensor-cloud model is shown in Figure 1(a), where the sensor-cloud architecture serves as the intermediate stratum between the end users and the physical sensor nodes. However, early sensor-cloud architectures are not perfect, and they encounter many new challenges, such as providing real-time services and efficiently managing the physical sensor nodes. Subsequently, a new sensor-cloud architecture, namely, the fog-based sensor-cloud framework, was proposed; its basic model is shown in Figure 1(b). The main difference between early sensor-cloud architectures and the fog-based sensor-cloud framework is that the latter has a fog layer while the former do not. The fog layer is mainly composed of fog nodes, which can fuse and store the collected sensed data, respond to real-time applications, and efficiently manage the physical sensor nodes. In the fog-based sensor-cloud framework, end users can retrieve the sensed data items they are interested in directly from the nearby fog nodes, and, if the nearby fog nodes do not hold the data they want, they can obtain the shared sensed data from the cloud by sending queries to it.
Although the fog-based sensor-cloud framework brings many benefits, it encounters many potential security threats. The fog nodes may be captured by nearby attackers or may suffer from attacks arising from the cloud. In other words, the fog nodes may become untrusted [9, 10] under such attacks. Meanwhile, the application servers in the cloud face many kinds of attacks, and some of the cloud servers may also not be trustworthy [11–13]. Against this background, how to ensure the integrity and the confidentiality of the sensed data items retrieved by the end users in fog-based sensor-cloud systems is a pressing and difficult problem. The problem is much more challenging in fog-based mobile sensor-cloud systems (FMSCSs), where the sensor nodes are mobile, because the sensed data retrieved by end users must also satisfy the spatial-temporal requirements of the queries they launch.
In this paper, we focus on fine-grained spatial-temporal Top-k queries and tackle the abovementioned problem. The concept of fine-grained spatial-temporal Top-k queries is defined in Definition 1 in Section 3. In a word, a fine-grained spatial-temporal Top-k query is a query that tries to find the top k sensed data items generated in a specific time interval and a specific region of a specific WSN deployment field. To the best of our knowledge, there is currently no work studying the problem of secure fine-grained spatial-temporal Top-k query in fog-based sensor-cloud systems. In brief, the main contributions of this paper are twofold:
(i) It studies the problem of secure fine-grained spatial-temporal Top-k query in FMSCSs and proposes a novel scheme named STQ-SCS to ensure the integrity and confidentiality of the sensed data items retrieved by end users. It provides a sound theoretical analysis of the security of STQ-SCS. According to the analysis, STQ-SCS is able not only to preserve the privacy of the sensed data items retrieved by end users but also to detect incomplete query results for fine-grained spatial-temporal Top-k queries under the security model presented in this paper.
(ii) Extensive simulations were conducted, and the results show that STQ-SCS is much more efficient than the related state-of-the-art schemes.
The remainder of this paper is organized as follows. Section 2 summarizes the related schemes; Section 3 describes the system model, the security model, the definitions of some terminologies, and the problem statement; Section 4 presents the proposed scheme STQ-SCS in detail; Section 5 analyzes the security of STQ-SCS; in Section 6, STQ-SCS is compared with the related state-of-the-art schemes through extensive simulations; Section 7 provides performance evaluation; and Section 8 concludes this study.
2. Related Works
Since there is currently no work on secure fine-grained spatial-temporal Top-k query in FMSCSs, this section mainly reviews the related works in Cloud Computing, Two-tiered Wireless Sensor Networks (TWSNs), and Two-tiered Mobile Wireless Sensor Networks (TMWSNs).
2.1. Securing Top-k Queries in Cloud Computing
Top-k queries in the cloud are generally processed securely over data that are outsourced to cloud servers by a single data owner. In Cloud Computing, the data owner knows all of its outsourced data and thus can construct an encrypted data structure, such as EHL, the binary heap, or other tree-like structures [16–18], over the whole data set to facilitate Top-k query without losing data privacy. In FMSCSs, by contrast, except for the fog nodes, which are considered not fully trusted, each sensor node knows only a small part of the whole data generated by the WSN where it is located; it therefore cannot construct an encrypted data structure over the whole data before outsourcing its data to a fog node or the cloud.
Moreover, existing schemes for secure Top-k query in Cloud Computing rely on the strong processing ability and rich resources of the cloud servers, and they do not consider sensor nodes, which are resource-limited and weak in computing. Thus, they are not suitable for FMSCSs.
2.2. Securing Top-k Queries in TWSNs
The earliest study of securing Top-k queries in TWSNs proposed three schemes to preserve the completeness of the Top-k query results in TWSNs. The three schemes are based on the MAC (Message Authentication Code) technique, which requires each sensed data item to be attached with a MAC as its proof data. Many other schemes that use a similar technique then appeared, such as those in [19–24]. However, the MAC-based technique is relatively inefficient, because attaching a MAC to each sensed data item adds a large quantity of extra data: a MAC is reported to take almost 40% of the volume of a sensed data item.
Besides the MAC-based technique, other methods were also proposed to ensure the privacy of the sensed data and the completeness of the Top-k query results in TWSNs, such as inserting digital watermarks or dummy readings into the normal ones and constructing data aggregation trees [26, 27]. However, inserting digital watermarks or dummy readings into the measured data makes it hard and complicated for the users to extract the normal readings from the mixed ones, and it also adds a lot of redundant data, which increases the communication cost of both the sensor nodes and the fog nodes.
What is more, all of these schemes were proposed for TWSNs, where nodes are static, and they cannot fully address the security threats faced by spatial-temporal Top-k query in FMSCSs, where attackers can launch much more covert attacks. When a mobile sensor node travels from the queried region to other regions, or vice versa, during the queried time interval, some of the sensed data generated by the sensor node may fall in the queried region while others do not. Obviously, the sensed data generated outside the queried region by the traveling sensor node do not satisfy the requirements of the spatial-temporal Top-k query. However, few of the schemes for securing Top-k queries in TWSNs consider this, which leaves loopholes for attackers to launch new kinds of covert attacks. For example, the attackers may replace the data items that a sensor node generated in the queried region with those that the same sensor node produced outside the queried region.
2.3. Securing Top-k Queries in TMWSNs
The first work on securing Top-k queries in TMWSNs was done by Liu et al. in 2015, when they presented a novel network architecture, namely, TMWSNs, and proposed a scheme named VTMSN to ensure the completeness of spatial-temporal Top-k query in TMWSNs. The main techniques used in VTMSN are symmetric encryption and information binding. Specifically, it binds the score of each sensed data item with its corresponding generation time, location, and value ranking order by concatenating them and encrypting them with the kept symmetric key. Although the binding relationships make it harder for attackers to undermine the completeness of the query results, VTMSN still has shortcomings. One is that it cannot preserve the privacy of the sensed data items, since it leaves the data items disclosed to the fog nodes for ease of Top-k query processing; another is that a large volume of location data must be transported together with the sensed readings, which greatly increases the communication cost of the sensor nodes and fog nodes.
To overcome the latter shortcoming of VTMSN, Wu et al. proposed a scheme named EVTopk in 2016. EVTopk preserves the completeness of the Top-k query results by using an HMAC (Hash-based Message Authentication Code), which is formed by applying hashing and encryption operations to the concatenation of the score, the location, and the neighboring HMAC. However, since each sensed data item must be attached with an HMAC in EVTopk, the HMACs account for a large proportion of the data reports of the sensor nodes and of the query results. Moreover, EVTopk does not preserve data privacy either. A comparative study of the two schemes, EVTopk and VTMSN, was also made. To further decrease the volume of the proof data in the data reports and the query results, a scheme named VIP-TQ was proposed in 2018 to preserve the integrity of the query results for spatial-temporal Top-k query in TMWSNs. In VIP-TQ, sensed data are bound together with their location as well as their neighboring data score using pairwise-key-based encryption. Although the binding can effectively prevent compromised fog nodes from undermining the integrity of the Top-k query results, it leaves the scores of the sensed data disclosed to the storage nodes, which increases the risk of divulging the privacy of the sensed data. In the same year, Ma et al. proposed two other schemes, namely, SSSTQ1 and SSSTQ2, for securing spatial-temporal Top-k queries in TMWSNs. However, a large number of original locations associated with the sensed data items are added to the data reports and the query results for integrity verification, which heavily increases the communication cost of the systems.
In summary, although existing works contain many schemes related to secure Top-k query, they either have obvious shortcomings or cannot be used in FMSCSs, which motivates the further work in this paper.
3. Models, Notations, and Problem Statement
3.1. System Model
The system model of FMSCSs is shown in Figure 2. In the model, TA is short for trusted authority, which is a trustworthy party. TA is used to authenticate the identity of end users and MWSNOs (Mobile Wireless Sensor Network Owners) and to distribute the secret keys to them. Each fog node in the fog layer connects to and manages one MWSN (Mobile Wireless Sensor Network), and each MWSN is assumed to be composed of mobile sensor nodes and to be owned by an MWSNO. Specifically, the main responsibilities of each fog node are as follows: (1) collecting, processing, and storing the sensed data items uploaded by the sensor nodes in its corresponding MWSN; (2) managing the mobile sensor nodes in its corresponding MWSN; and (3) responding to the queries that may be sent from the cloud or from the end users directly. End users can retrieve the desired data by sending queries to the cloud, or to the fog nodes directly if they are not far from the fog nodes. If a cloud server receives a query from an end user, it first determines the fog node that satisfies the region requirement of the query and then sends the query to that fog node; if a fog node receives a query, it processes the query locally and sends the query result to the party (the cloud or the end user) that sent the query.
The mobile sensor nodes in WSNs periodically upload their sensed data to the corresponding fog nodes in the fog layer. We divide time into epochs, and take the time length of each epoch as the period for each sensor node to upload its sensed data items. We assume that mobile sensor nodes in each WSN do not move all the time. They stay at some target locations for certain time intervals when they reach the positions, and go on moving to other target locations if it is necessary. Moreover, we assume that the mobile sensor nodes only generate sensed data items when they are staying at their target locations. Besides, it is assumed that each mobile sensor node just moves within the WSN field where it is located, since it will cost a lot of energy for the sensor nodes to move among different WSN-deployed fields.
In this paper, we use the set to denote the sensed data items generated by sensor node at its target location in the epoch , where is the total number of the sensed data items generated by at its target location in . For any sensed data item , its corresponding data score can be worked out using a public scoring function , namely, . Without loss of generality, we assume different sensed data items have distinct scores . Moreover, in order to facilitate presentation, we assume that the ranking orders of the sensed data items generated by any sensor node at a target location are consistent with their subscript digital numbers. For example, there is , where and are the node ID and the target location ID of , respectively. The specific meanings of the notations used in this paper are listed in Table 1.
3.2. Definitions
In this section, we introduce the definitions of some terminologies used in this paper:
(i) Fine-grained spatial-temporal Top-k query: it is the query that tries to find the top k sensed data items that have the biggest (or the smallest) scores among all the sensed data items generated in the queried region during the queried time interval, where the queried region is a subregion of the deployment field of the MWSN whose ID is given in the query. A fine-grained spatial-temporal Top-k query in FMSCSs therefore specifies the ID of the queried MWSN, the number k, the queried region, and the queried time interval.
(ii) Queried node and queried location: given a spatial-temporal Top-k query, if a target location of any mobile sensor node falls in the queried region during the queried time interval, the target location is one of the queried locations of the query; if at least one of the target locations of a mobile sensor node is one of the queried locations of the query, the sensor node is called a queried node of the query.
(iii) Qualified Top-k data items: given a spatial-temporal Top-k query, a sensed data item is called a qualified Top-k data item of the query if it satisfies the following two conditions: (1) it was generated in the queried region during the queried time interval; (2) among all the N sensed data items generated in the queried region during the queried time interval, there are at least N − k data items whose scores are smaller (or bigger) than its score, where N refers to the total number of the sensed data items generated in the queried region during the queried time interval.
(iv) Data-proof Packet: for any target location of any mobile sensor node, the Data-proof Packet refers to the subreport produced by the sensor node for the sensed data generated at that target location during an epoch. Specifically, it consists of the pairwise-key-encrypted sensed data items and the OPE-encrypted scores ("OPE" is short for "order-preserving encryption"), as well as some proof information generated by the sensor node at the target location during the epoch. More specific contents of the Data-proof Packet are described in Algorithm 1 in Section 4.
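To make the definitions above concrete, the following minimal Python sketch evaluates a fine-grained spatial-temporal Top-k query on plaintext data, that is, without any of the protection that STQ-SCS adds. The names `Item`, `in_region`, and `top_k_query`, the rectangular encoding of the queried region, and the use of a single epoch as the queried time interval are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical encoding of a queried region: (x_min, y_min, x_max, y_max).
Region = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Item:
    node_id: int                   # ID of the sensor node that generated the item
    location: Tuple[float, float]  # target location where the item was generated
    epoch: int                     # epoch in which the item was generated
    value: float                   # sensed reading


def in_region(location: Tuple[float, float], region: Region) -> bool:
    """Return True if the location lies inside the rectangular region."""
    x, y = location
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max


def top_k_query(items: List[Item], k: int, region: Region, epoch: int,
                score: Callable[[Item], float] = lambda it: it.value,
                largest: bool = True) -> List[Item]:
    """Plaintext reference semantics: the qualified Top-k data items."""
    # Condition (1): generated in the queried region during the queried epoch.
    candidates = [it for it in items
                  if it.epoch == epoch and in_region(it.location, region)]
    # Condition (2): among the candidates, keep the k best-scored items.
    candidates.sort(key=score, reverse=largest)
    return candidates[:k]
```

An item generated by a queried node outside the queried region, or in another epoch, is filtered out before ranking; this is exactly the kind of real-but-unqualified item that the security model later has to guard against.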
3.3. Security Model
In FMSCSs, the fog nodes and the cloud servers are assumed to be untrusted, while most of the mobile sensor nodes and TA are trustworthy. We assume that the untrusted fog nodes and cloud servers are not only curious but also malicious. Specifically, a curious fog node or cloud server will try to disclose the sensed data items as well as the data scores computed with the public scoring function, and a malicious fog node or cloud server will do its best to undermine the completeness of the results of the fine-grained spatial-temporal Top-k queries. To execute a malicious attack, an untrusted fog node may put none or only part of the qualified Top-k data items into the Top-k query result, and it may also put some fabricated data items and/or unqualified-but-real ones into the query result when processing a spatial-temporal Top-k query. For example, an incomplete query result may omit one of the qualified data items, or may replace a qualified data item with a real but unqualified sensed data item or with a fabricated data item. An untrusted cloud server may also make wrong deletions or replacements to undermine the integrity of the query results before it transmits them to end users.
In our security model, the privacy of the sensed data items generated by the mobile sensor nodes in FMSCSs, and of their corresponding scores, should be protected. Other information, such as the spatial-temporal Top-k query itself and the generation locations of the sensed data items, will be leaked to fog nodes, because it is hard to enable fog nodes to process spatial-temporal Top-k queries smoothly and successfully without such leaks. Fortunately, the leaked information poses little threat to the safety of the systems. Moreover, each mobile sensor node is assumed to be equipped with tamper-proof hardware, so that adversaries cannot disclose the encryption materials stored in the hardware even if they capture the sensor node.
3.4. Problem Statement and Design Goal
Under the system and security models described above, the problem tackled in this paper can be stated as follows: how can the end users in FMSCSs obtain the results of the fine-grained spatial-temporal Top-k queries they launch without disclosing the sensed data items and their corresponding scores to the fog nodes and the cloud servers, and verify the completeness of those results correctly and efficiently? Our design goal is to propose a novel scheme that enables efficient, privacy-preserving, and integrity-verifiable query processing for fine-grained spatial-temporal Top-k queries in FMSCSs. Specifically, the following three goals should be achieved:
(i) The privacy preservation goal: our proposed scheme should preserve the privacy of the sensed data items and their scores collected from the mobile sensor nodes.
(ii) The integrity verification goal: our proposed scheme should enable end users to verify the completeness of spatial-temporal Top-k query results, no matter which of the attacks introduced in the security model are adopted.
(iii) The efficiency goal: our proposed scheme should be efficient in communication and computation. It should greatly decrease the additional communication cost of the sensor nodes, since the sensor nodes are energy-limited. Here, the additional communication cost mainly refers to the cost of transmitting the proof data that are used to verify the completeness of the query results.
4. Our Scheme STQ-SCS
This section presents our scheme STQ-SCS. We first give a high-level description of the scheme. At first, each MWSNO obtains the secret keys from TA and preloads the keys to its own MWSN. Then, using the secret keys, each sensor node encrypts its own sensed data items and their scores and uploads the encrypted data items and scores to the corresponding fog node. If an end user wants to retrieve the result of a fine-grained spatial-temporal Top-k query, it sends the query to the cloud server, or to the fog node directly if it is near the fog node of the target MWSN. If a cloud server receives the query, it first determines which fog node should be the target node of the query and then sends the query to that fog node. When the target fog node receives the query, it works out all the qualified Top-k data items, puts them into the query result packet, and sends the packet to the cloud server, or to the end user directly if the query was received from the end user. If a cloud server receives the query result from the fog node, it transmits the query result to the end user who launched the query.
As a whole, STQ-SCS can be divided into five parts: (1) secret key distribution; (2) virtual grid construction; (3) secure data preprocessing; (4) secure spatial-temporal Top-k query processing; and (5) completeness verification of the query results. The following sections describe the five parts of STQ-SCS in detail.
4.1. Secret Key Distribution
In STQ-SCS, all secret keys used in FMSCSs are distributed by TA. To obtain the secret keys, each MWSNO sends TA a key-request message, which contains its own public key, the ID of its own MWSN, the IDs of the mobile sensor nodes in the MWSN, and some authentication information. After authenticating the identity of the MWSNO using an existing authentication method such as UAP-BCIoT, TA knows whether the MWSNO has the authority to obtain the secret keys. If TA decides to send the keys to the MWSNO, it generates a master key for the MWSN and a pairwise key for each mobile sensor node in the MWSN, encrypts them using the public key of the MWSNO, and then sends them to the MWSNO. The pairwise keys and the master key are each generated with an existing key generation method. In a similar way, legal end users can also obtain the keys of each mobile sensor node in any MWSN from TA.
In our scheme, two encryption methods are leveraged to encrypt the sensed data items and their scores: one is a recent order-preserving encryption (OPE) scheme, and the other is pairwise-key-based encryption. The former is used to encrypt the scores of the sensed data items using the master keys, while the latter is used to encrypt the sensed data items and the proof data, such as the target locations of the sensor nodes and the ranking orders of the sensed data items, using the pairwise keys. Section 4.3 describes this in detail.
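The reason OPE-encrypted scores let a fog node rank data items without learning the scores can be illustrated with a toy construction. The sketch below is emphatically not the OPE scheme used by STQ-SCS and is not secure; it is a hypothetical keyed, strictly increasing mapping over small non-negative integers, built from the standard `hmac` and `hashlib` modules, and the names `_gap` and `toy_ope_encrypt` are invented for this illustration.

```python
import hashlib
import hmac


def _gap(key: bytes, i: int) -> int:
    """Keyed pseudo-random positive gap in the range 1..1000 for position i."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:4], "big") % 1000


def toy_ope_encrypt(key: bytes, score: int) -> int:
    """Toy order-preserving mapping (illustration only, NOT secure).

    The ciphertext of `score` is the sum of the keyed gaps for positions
    0..score. Every gap is at least 1, so a larger score always maps to a
    strictly larger ciphertext, while the exact values depend on the key.
    The cost is linear in `score`, so this only suits small integer scores.
    """
    if score < 0:
        raise ValueError("score must be a non-negative integer")
    return sum(_gap(key, i) for i in range(score + 1))
```

A party that holds only the ciphertexts can sort them and obtain the same ranking as the plaintext scores, which is the property the fog nodes rely on; the pairwise-key encryption of the data items themselves is a separate, conventional symmetric encryption.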
4.2. Construction of the Virtual Grids
In STQ-SCS, the sensor deployment field is divided into many virtual grids. Each virtual grid should be as small as possible so that the central location of the grid can be approximately taken as the location of every point in the grid in real applications. Then, we design an ID distribution law for the virtual grids. Based on the law, the real locations of each mobile sensor node can be worked out easily if the IDs of the virtual grids where it has moved to are known.
Specifically, the ID distribution law is described as follows. Suppose the field in which an FMSCS is deployed is a square with side length $L$. STQ-SCS divides the square into $n \times n$ small virtual grids of side length $l$, where $l$ is a small number that divides the length $L$ with no remainder and $n = L/l$. Clearly, the smaller $l$ is, the larger $n$ is. Then, each virtual grid is given an ID, which is a sequence number ranging from 1 to $n^2$. The virtual grids in the first row at the upper side of the square are given the IDs $1, 2, 3, \ldots, n-1$, and $n$, respectively, from left to right; the IDs $n+1, n+2, \ldots, 2n-1$, and $2n$ are assigned in order to those in the second row; and so on; those in the last row have the IDs $(n-1)n+1, (n-1)n+2, \ldots, n^2-1$, and $n^2$, respectively.
Using this ID distribution law, each sensor node first works out the IDs of the virtual grids that it has moved to and then takes the IDs as the coordinate values of its target locations.
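The ID distribution law amounts to a row-major numbering of the grid cells, which can be sketched in a few lines of Python. The function names `grid_id` and `grid_center`, the parameter names `side` and `cell`, and the choice of the upper-left corner as the coordinate origin with the y axis pointing downward are assumptions made for this sketch rather than details given in the paper.

```python
from typing import Tuple


def _cells_per_row(side: int, cell: int) -> int:
    """Number of virtual grids per row; the cell size must divide the side."""
    if cell <= 0 or side % cell != 0:
        raise ValueError("cell size must divide the field side length exactly")
    return side // cell


def grid_id(x: float, y: float, side: int, cell: int) -> int:
    """Map a point to the ID (1..n*n) of the virtual grid containing it.

    Assumes the origin is the upper-left corner of the field, x grows to the
    right and y grows downward, so row 0 is the first (upper) row.
    """
    n = _cells_per_row(side, cell)
    if not (0 <= x <= side and 0 <= y <= side):
        raise ValueError("point lies outside the deployment field")
    col = min(int(x // cell), n - 1)  # points on the far edge join the last cell
    row = min(int(y // cell), n - 1)
    return row * n + col + 1


def grid_center(gid: int, side: int, cell: int) -> Tuple[float, float]:
    """Recover the central location of a virtual grid from its ID."""
    n = _cells_per_row(side, cell)
    if not 1 <= gid <= n * n:
        raise ValueError("grid ID out of range")
    row, col = divmod(gid - 1, n)
    return ((col + 0.5) * cell, (row + 0.5) * cell)
```

With a field of side 100 and cells of side 10, the first row receives the IDs 1 to 10 and the second row starts at 11, in line with the law above. Because a single integer ID stands in for a coordinate pair, a sensor node can report a target location compactly, and the central location of the cell can be recovered from the ID alone.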
4.3. Secure Data Preprocessing
This section describes how each sensor node generates its data report, which is uploaded to the corresponding fog node at the end of each epoch, from its own sensed data items under the privacy and integrity preservation requirements. Specifically, for any sensor node, the procedure of data report generation in STQ-SCS is shown in Algorithm 1.
In the protocol, the sensor node first computes the score of each sensed data item it has generated, based on the public scoring function; then, it works out a Data-proof Packet for each of the target locations that it has moved to during the epoch. To do this, three cases are considered, according to whether the sensor node generated no, exactly one, or more than one sensed data item at the target location in the epoch. In the first case, the Data-proof Packet should include an encrypted item showing that no sensed data were generated by the sensor node at that target location in the epoch, produced by a symmetric encryption operation with the pairwise key of the sensor node. In the second case, the Data-proof Packet should contain an encrypted item indicating that only one sensed data item was generated by the sensor node at that target location in the epoch, and it also needs to include both the pairwise-key-encrypted score and the OPE-encrypted score of that data item. The former is used as part of the proof information for integrity verification, and the latter is used by fog nodes to process spatial-temporal Top-k queries smoothly. The only sensed data item should also be encrypted using the pairwise key and included in the Data-proof Packet. In the third case, the contents of the Data-proof Packet are a little more complex. Specifically, it contains not only the OPE-encrypted scores and the pairwise-key-encrypted data items and scores but also the chaining relationships of the ranked sensed data items. The chaining relationships, which prevent adversaries from destroying the integrity of the Top-k query results by dropping part of the qualified Top-k data items, are achieved by encrypting each sensed data item together with its ranking order number, called the sequence number in the rest of this paper, using the pairwise key. Moreover, each sensed data item is bound together with its corresponding target location to further strengthen the integrity preservation of the Top-k query results. The final output of Algorithm 1 is the data report that is uploaded to the fog node corresponding to the sensor node.
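Since Algorithm 1 is not reproduced here, the following Python sketch only illustrates the binding idea described above: each ranked data item is encrypted under the pairwise key together with its sequence number, target location, and epoch, and is accompanied by an OPE-encrypted score. Everything concrete in it is an assumption made for illustration: the `seal`/`unseal` helpers are a toy stand-in for the pairwise-key-based encryption (a hash-derived XOR keystream plus an HMAC tag, not a vetted cipher), the packet layout and the `last` flag are hypothetical, and `ope_encrypt` is whatever order-preserving function the caller supplies.

```python
import hashlib
import hmac
import json
import os


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    blocks = (hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
              for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]


def seal(key: bytes, payload: dict) -> bytes:
    """Toy stand-in for pairwise-key encryption: nonce || ciphertext || tag."""
    plain = json.dumps(payload, sort_keys=True).encode()
    nonce = os.urandom(16)
    body = bytes(a ^ b for a, b in zip(plain, _keystream(key, nonce, len(plain))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> dict:
    """Check the tag, then decrypt; raises ValueError on tampering or wrong key."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("sealed payload was modified or the key is wrong")
    plain = bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))
    return json.loads(plain)


def build_packet(pairwise_key, ope_encrypt, node_id, grid_id, epoch, items, score):
    """Hypothetical Data-proof Packet for one target location in one epoch."""
    bind = {"node": node_id, "grid": grid_id, "epoch": epoch}
    ranked = sorted(items, key=score, reverse=True)  # rank 1 = best score
    if not ranked:
        # Case "no data": an encrypted marker proves that nothing was generated.
        return {"grid": grid_id, "entries": [],
                "proof": seal(pairwise_key, {**bind, "count": 0})}
    entries = []
    for seq, item in enumerate(ranked, start=1):
        sealed = seal(pairwise_key, {**bind, "item": item, "score": score(item),
                                     "seq": seq, "last": seq == len(ranked)})
        # The fog node sees only the OPE score; the rest is opaque to it.
        entries.append({"ope_score": ope_encrypt(score(item)), "sealed": sealed})
    return {"grid": grid_id, "entries": entries, "proof": None}
```

Because the sequence number and the grid ID sit inside the sealed payload, a fog node that drops the second-ranked item, or substitutes an item generated at another location, produces a gap or a mismatch that a key holder can notice after decryption.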
4.4. Secure Spatial-Temporal Top-k Query Processing
This section presents how a fine-grained spatial-temporal Top-k query is processed in FMSCSs under our proposed scheme STQ-SCS. When a cloud server receives a fine-grained spatial-temporal Top-k query from an end user, it first finds the destination of the query according to the mapping relationships between the MWSN IDs and the fog nodes (information about the mapping relationships is assumed to be stored in the cloud server). Then, the cloud server sends the query to the target fog node. When the target fog node receives the query, it processes the query according to Algorithm 2 and then sends the processing result back to the cloud server. If the query was sent directly by an end user, the fog node sends the query result back to the end user directly.
In Algorithm 2, the fog node first processes every data report uploaded by the sensor nodes in the queried MWSN and then packs all the processing results of these data reports to form the final query result of the spatial-temporal Top-k query. Specifically, lines 1–9 find out, for each sensor node in the MWSN, the number of its target locations that fall in the queried region and the corresponding Data-proof Packets generated at those locations; lines 12 to 42 form a large loop, which is used to process every data report generated in the MWSN in the queried time interval. Line 14 shows the processing result of a data report in the case that no target location of the sensor node falls in the queried region in the queried time interval; lines 16–39 describe the procedure of processing a data report in the case that at least one target location of the sensor node falls in the queried region in the queried time interval. In the latter case, all the Data-proof Packets that correspond to the target locations located in the queried region are processed based on the exact values of the total data number and/or the qualified data number corresponding to each such location. While the Data-proof Packets are processed, the OPE-encrypted items are all removed from the original packets, since their only use is to let the fog node find the qualified Top-k data items encrypted with the pairwise keys. Moreover, all the unqualified data items, except for the one that follows the last qualified Top-k data item in each Data-proof Packet, are also removed from each original packet, and the retained one is used for completeness verification of the spatial-temporal Top-k query results.
4.5. Completeness Verification of the Query Results
The procedure for an end user to verify the completeness of the Top-k query result is presented in Algorithm 3, the output of which is the value of a Boolean variable. Depending on this value, the query result is considered either incomplete or complete; in the latter case, the final result set in Algorithm 3 is composed of all the qualified Top-k data items corresponding to the fine-grained spatial-temporal Top-k query.
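Algorithms 2 and 3 are not reproduced here, so the interplay between the fog-side processing of Section 4.4 and the user-side verification can only be sketched under simplifying assumptions. In the Python sketch below, the pairwise-key ciphertexts are modelled as plain dictionaries under the key `sealed`, which the fog node is assumed unable to read or forge; the field `ope` stands in for an OPE-encrypted score; every queried location is assumed to hold at least one data item (the encrypted "no data" marker is omitted); and the verifier is assumed to know the set of queried (node, grid) pairs. The function names and the `last` flag are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (node_id, grid_id) identifies one queried location


def make_packet(node: int, grid: int, epoch: int, scores: List[int]) -> List[dict]:
    """Ranked entries of one location; 'sealed' models an opaque ciphertext."""
    ranked = sorted(scores, reverse=True)
    return [{"ope": s,  # stand-in for the OPE score (same order as s)
             "sealed": {"node": node, "grid": grid, "epoch": epoch,
                        "seq": i + 1, "score": s, "last": i == len(ranked) - 1}}
            for i, s in enumerate(ranked)]


def fog_process(packets: Dict[Key, List[dict]], k: int) -> Dict[Key, List[dict]]:
    """Fog side: keep the qualified items plus one following item per packet."""
    pool = sorted((e["ope"] for p in packets.values() for e in p), reverse=True)
    threshold = pool[k - 1] if len(pool) >= k else -math.inf
    result = {}
    for key, entries in packets.items():
        qualified = sum(1 for e in entries if e["ope"] >= threshold)
        # OPE scores are dropped; only the sealed payloads are returned.
        result[key] = [e["sealed"] for e in entries[:qualified + 1]]
    return result


def verify(result, expected_keys, k: int, epoch: int):
    """User side: return (complete, top_k_scores) after 'decrypting' payloads."""
    if set(result) != set(expected_keys):
        return False, []  # a queried location is missing or an extra one appears
    pool = []
    for (node, grid), sealed in result.items():
        if not sealed:
            return False, []
        for i, s in enumerate(sealed):
            bound = (s["node"], s["grid"], s["epoch"]) == (node, grid, epoch)
            if not bound or s["seq"] != i + 1:
                return False, []  # wrong location or epoch, or a gap in the chain
            pool.append(s["score"])
    top = sorted(pool, reverse=True)[:k]
    threshold = top[-1] if len(top) == k else -math.inf
    for sealed in result.values():
        tail = sealed[-1]
        # Withheld items score below the tail, so the tail must either be the
        # last item of its location or not exceed the k-th best score.
        if not tail["last"] and tail["score"] > threshold:
            return False, []
    return True, top
```

The retained item that follows the last qualified one acts as a boundary proof: because items are chained in rank order, showing that the boundary item does not beat the k-th best score shows that nothing withheld from that location could belong in the result.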