Abstract

Several methods use cache for decreasing the number of connections to protect privacy of user data and improve performance in Location Based Services (LBS). Many of these methods require users to trust other users or third parties, which could be servers. An intruder could be disguised as a user or a third party. In this article, we propose a new method, known as “Double Cache Approach”, which uses a pair of caches to reduce the vulnerability of trust between users or third party and offers a vast improvement in privacy and security of user data in healthcare and other applications that use LBS. This approach divides the area into many cells and manages the cooperation among users within two caches at the access point with wireless communication. To demonstrate the superiority, we also provide simulation results of user queries, comparing the proposed method with those using only one cache. We believe that our approach would solve the trust problem optimally, achieve a comprehensive protection for users’ data, and enhance the privacy and security levels.

1. Introduction

Healthcare is one of the critical domains [1] where Internet of Things (IoT) has enhanced its quality and usability [2, 3] with smart phones or wearable devices like RFID tags. Most of the healthcare applications use Location Based Services (LBS) for searching points or places of interest like hospitals, health centres, and pharmacies with the help of GPS technology [46]. This technology plays a significant role in medical emergencies, healthcare applications, and establishing contacts in other domains [7, 8]. However, the use of LBS services entails significant risks of breach of user's privacy and security [9, 10]. The attacker may be able to determine the location and track and build profile and pattern of the user’s movements [11, 12]. The attackers can also gain user’s personal and sensitive information like their whereabouts at a specific time, job, health conditions, financial and social status, and religion, political, and ethical inclinations [1315], which could limit future use of e-healthcare systems [1618]. A server provider can also breach the privacy, consequences of which could be quite serious [19, 20].

Many methods and techniques exist to protect privacy in application and services of IoT but none of them can ensure total protection. Before embarking on these methods, let us discuss the standard format of queries that are launched from clients to LBS server. As shown in Figure 1, there are three components in a query, namely, ID, Location, and Query-Type. Existing approaches are designed to defend privacy from an outer attacker, server provider, and all other types of attacks [21] by protecting one or more components. The methods or approaches for privacy of data in LBS, as in [22], may be grouped into eight classes, namely,(i)Trusted Third Party (TTP) (also known as Cloaking area & K-anonymity);(ii)Obfuscation and Land-marking;(iii)Mix Zone, Private Information;(iv)Retrieval (PIR) and Encryption;(v)Dummies;(vi)Cooperation between Peers;(vii)Caching;(viii)Hybrid.

Each one of them has one or more kind of limitations and consolidated open problems [2125], which are described below.(1)How to deal with Anonymizer or TTP without having trust?(2)How to generate high-efficiency dummies?(3)How to establish reliability between users and reputation of peers?(4)Can the cache hit ratio be improved to achieve more privacy?(5)Can the overhead computation be relaxed in the PIR approach?

In this paper we shall resolve first four problems by proposing the following:(1)An approach, known as Double Cache, for protecting privacy in the applications of LBS from different attacks in all fields including the healthcare;(2)A new method for generating smart dummies with high performance;(3)A new technique for managing cache and enhancing performance;(4)A solution for the trust issue between peers.

2. Literature Review

With the emergence of the IoT paradigm, enhancement and developments of services have taken place in several domains. In particular, healthcare has been a major beneficiary. Healthcare is now benefited by sophisticated applications and tools such as wearable sensors, which can(i)Monitor patients and senior citizens continuously;(ii)Help people with breathing problems to find less polluted routes;(iii)Help in statistical or data mining processing;(iv)Determine the likelihood of spreading a contagious viruses and diseases in specific areas(v)Frequently update donor locations;(vi)Help in emergencies and ambulance services;(vii)Assist drones in transporting medical supplies to obscure locations;(viii)Track and monitor activities of infants or children and subsequently report any associated emergencies to their parents or guardians [1, 26, 27].

Since most of these services use mobile LBS, requiring wireless connections, privacy can easily be violated by server providers through revealing the locations, movements pattern, habits, and behaviours of the people they serve [25, 28, 29].

Use of cache for preserving privacy is a relatively new approach and hence only a small number of researchers have discussed it so far. Most of classical approaches like collaboration, dummy, and K-Anonymity have used additional storage; however they have suffered from many limitations including uncertainty, overhead on user's device, disconnecting, and trust [30]. Concept of a TTP-Free class was introduced which did not need any trust with LBS server or any third party. Instead, it relied upon collaboration between users and peers to preserve privacy; however the issue of trust between peers still needed a resolution [31]. In pursuit of a resolution of trust issue, some researchers have proposed an idea of exchanging some information about previously visited area among peers to create a robust cloak area, which paved the way to consider storing some information on the client side [32]. The method in [23] used cache in TTP to store answers of some queries of users for the purpose of using them in future. It increased the privacy and performance level but resulted in decreasing the number of connections to server. This approach posed the challenges of cache management processes like data freshness, query selection, data consistency, cache hitmiss ratio, and estimation of required area to cache.

In the K-Anonymity approach, TTP would send the queries of K-Users to LBS server and store the answers for future queries [33]. This is an effective approach to protect privacy, although the cost of computation and connection turns out to be high. Use of cache was suggested to overcome this but as we know cache would require its management. To overcome the limitations of K-Anonymity, some more complex methods were suggested [34]. There are other techniques where cache is used as an aid tool in the cooperation systems to enhance the privacy and decrease the overhead on connections to LBS server [35]. In [25], mobile cache is used for saving results of the queries which can be used as answers for user queries.

Another approach known as “Mobi-Cache” is based on two ideas. The first one is an algorithm for dummies selection (DSA) in addition to an enhancing method (en-DSA) based on the Entropy of each cell (ratio of queries raised from each cell), used for optimizing the benefit of these dummies for new queries. The second idea proposed is using the cache at the access point available in each cell or specific area to store the answers of queries and dummies [36]. By having description and more details of DSA and en-DSA algorithms for careful generation, the dummies can be used or requested in the future as they achieve increased privacy and performance. The main idea is to divide the area into many cells (equal sizes) and then calculate the Entropy metric for each one through the number of queries that are launched from it. After that, the dummies would be selected from the cells that have similar value to increase LBS server's uncertainty and use the dummies' answers in the future [37].

In the subsequent research, focus was on usage of cache, which was also divided the area into many cells, pointing each to an access point. This approach relaxed the overhead problem of dummy approach and that of the need for trusted parties [38]. In [39], Long Statistical Attack (LSA) and Regional Statistical attack (RSA) were proposed. As an enhancement of this approach, two new techniques were introduced. The first of them suggested dividing cells into four levels of privacy and changing size of each one to be bigger or smaller to achieve the required Entropy Value, and the second proposed using multinames (M-Name) for each user instead of single name (S-Name) [5]. It divided previous approaches to three classes according to the trust, namely, TTP based, Semi TTP, and Free TTP, and suggested a protocol for dealing between peers lacking trust, by hiding precise location to create a suitable cloaking area.

Collaboration between peers to obfuscate their locations into cloak area with K-users has led to more research [40], which supported continuous queries, as opposed to a snapshot, by way of using the cache. In [41], a new technique of cooperation between users by exchanging queries between themselves before sending them to a server in the Private Information Retrieval (PIR) systems was proposed to hide the identity of each one and retrieve the data from the server without revealing it. Although it enhanced the users’ privacy in LBS server, it also relied upon the trust among peers in addition to overhead of using encryption.

Authors in [23] have created an integration between the cache technique and another one for exchanging queries among each pair of users to provide a solution of some problems and challenges. This includes the overhead in generating dummies and the need to trust in third party (TP) and the cache hit ratio, but at the cost of trusting another peer. Throughout this article, this approach will be referred to as P2PCache, which provided a new method to manage cache and the freshness issue of cache. A research survey of protection strategies and attack models is discussed in [42], which has outlined the following research topics in the privacy domain:(1)Using semantics of data or locations to enhance the privacy;(2)Preserving privacy for the location of data collection;(3)Using an efficient indexing technique to relax the high cost;(4)Creating a generic framework to address all privacy elements.

In view of the forging discussion, we have the following open problems in privacy domain:(1)How to create a trusted party;(2)How to optimize the cost of connections and overhead in some techniques;(3)How to efficiently manage cache, freshness, and enhance the hit-ratio;(4)How to effectively manage connections and collaboration among peers.

In this paper, as follows, we have provided solution for these problems.

3. Proposed Method

The major issues of user privacy are in the existing methods being centred around the trust and cache management.

3.1. Trust Management Issues

As mentioned earlier, ensuring trust between users and third party (server) is an open problem. The “trust” can be classified as follows:(i)Trust with a service provider: it is regarded as the biggest threat because the service providers have all information about the users. Many techniques, as discussed before, had tried to address this issue but the problem was far from being resolved.(ii)Trust with third party: this is known as anonymizer and used to avoid the trust with LBS server by hiding the identity of users, obfuscation their locations, generating dummy users, managing cloaking area, or helping peers to create a protected area. But this ended up shifting the problem from LBS server to another server and so did not produce an effective solution.(iii)Trust with peers: in this case, users cooperate with each other to protect themselves. This technique as provided in [23] is superior though it still has its own issues. On one hand, the threat may still come from peers and on the other hand the management of cache is a complex procedure.

3.2. Cache Management Issues

As we discussed before, the cache was utilized to store some results of user’s queries for answering future queries. In addition, it used so-called freshness technique to refresh the stored results. However, the freshness technique suffers from the following drawbacks:(i)Sometimes the cache may not be utilized at all and so it will be cleared;(ii)Using refresh process would affect system performance;(iii)No distinction exists between queries that are submitted repeatedly and the ones which are submitted once only, which would adversely affect the efficiency of cache itself;(iv)Filling cache by generated dummies would affect the cache hit-ratio adversely.

To overcome these anomalies, we propose a new technique, known as Double Cache Approach (DCA), which uses a pair of caches and replaced the freshness technique with a proposed one.

3.3. Double Cache Approach (DCA)

Initially the use of cache in privacy domain was to reduce the number of connections to LBS server, which was considered to be the biggest threat for users. In doing so, the privacy level was enhanced and so was the performance. However, this did not solve any of the open problems in the LBS system.

Before providing the details of the proposed DCA, we explain it by means of a simple example. Suppose a client or user A wants to choose a less polluted walkway without revealing their identity and the chosen walkway. For example, the query of A would be sent to another user B, who would submit it to LBS server. Consequently, server receives wrong information about B, but A would receive right answer for . To hide the information of A from B, a pair of cache would be used. This process can be repeated by A with other users. We shall call this method as the Double Cache Approach (DCA) and demonstrate the capabilities of that DCA to protect the patient location from malicious SP without effecting the quality of services. It should be noted the other known methods as those discussed in [4346] cannot protect the privacy of A from B or SP. In reality, DCA can be seen as an enhancement of the P2PCache and removes the following anomalies of [23]:(a)Lack of trust between peers, regarded as an open problem.(b)Self-management for communication between peers affecting performance.

In DCA, we have a pair of caches, namely, Cache1 and Cache2, for addressing the trust issue between peers themselves and achieving new benefits. We shall use Cache1 for collecting previous answers and Cache2 for swapping queries between peers and managing the cooperation among them.

3.3.1. First Scenario (Cache2 with First-In-First-Out)

Suppose a user A wants to submit a query to LBS server. Before doing so, A should search for an answer of the query in Cache1. If an answer was not found, should be swapped with , a query from some other user X in Cache2, and then send to LBS. It is unlikely to find Cache2 empty but in case it was, user would submit just a dummy query to LBS server and store the query in Cache2. When an answer of was received, A would store it and look for an answer of in Cache1. In the same manner X can swap with a query of another user Y. This process is demonstrated in Figure 2.

3.3.2. Second Scenario (Cache2 with Priority)

Unlike the previous case, user A can directly submit query to Cache2, and wait for an answer in Cache1. However, this may cause a considerable delay in getting an answer. To increase the priority of query resolution, A should submit one or (preferably) more queries of other users to LBS server. This method is shown in Figure 3. Let us calculate priority for query position in Cache2. For example, let NQC be the number of current queries in the Cache2, UQC the number of queries which user U has stored in Cache2, and UQL number of queries which U has submitted to LBS server. SoThe idea behind this second scenario is to increase the level of cooperation among peers for mutual benefit through saving of power consumption for users with limited battery/bandwidth. This situation does not arise in the first scenario, which forces users to connect LBS server. In contrast, the second scenario allows users with more resources to get more than one query from the cache and direct them to the LBS server. Users with low battery would place their queries in the cache and not connect to server.

According to waiting issue:UQC = 1 often, because user usually needs answer for one query in the same time and cell.If UQL = 1, it would revert to the first scenario.If UQL > 1, that means the frequency of outing of queries to LBS server would outnumber the incoming quires to cache NQC will tend to 1, eliminating need for any waiting.If UQL=0, the query will be at the position NQC, which is the last of position in the cache. In this situation, user might wait until another user gets more than one query from cache.

Therefore, the waiting would occur only rarely, and if it did, it would be in the case when none of the other users would have taken more than one query from the cache, which is not realistic in the LBS system because all users would want to send more dummies to the LBS server. Moreover, if user was patient, they could easily take a query from the cache and send it to the LBS server to return to first scenario. Note that both scenarios are starvation-free.

3.4. Proposed Cache Management Technique

The DCA regards cache as a special queue and eradicates previous drawbacks by exploiting its size. It will hence store answers in Cache1 and if Cache1 was full, the oldest answer (Min ID) would be removed and the new answer would be added and Max ID would be incremented by 1. However, if answer already existed in Cache1, then only ID of answer would be changed to MaxID+1. We have also used Bloom filter (Figure 4) (Hashing Technique) to increase the efficiency of cache and decrease the response time, especially in the miss-hit cases, where it would directly find out if the required answer for specific query was removed from cache or not determined by (O(I)), where O in [38] is the number of elements in cache. Bloom filter directly gets an answer for O(I), if the element existed in cache or not before I search.

3.5. Algorithm of Cache Management

See Algorithm 1.

Input: Query (q) of User A   Output: Answers of q
// Search-Function
1 if (q Query in Cache1) then
2 = ( in Cache1) + 1
3 Return
4 end
5 else
6 Insert (q in Cache2)
7 Get first q~ of Cache2 // Unknown User
8 Insert to Cache1 with
9 Query with in the Cache1 will be deleted.
10 end
11 Recall Search-Function
12 End Function
3.6. Benefits of DCA

Use of additional cache to handle query swapping among peers, with a new technique for managing caches, resolves the issue of trust between peers and enhances performance of the whole system without having to generate dummies. DCA facilitates superior collaboration among peers, unlike that in [23], where the user had to search and contact a peer directly. In particular:(i)answers of queries would be kept in Cache1 permanently without having to refresh them;(ii)DCA would utilize the whole size of Cache1;(iii)no dummy queries would be stored in the Cache1;(iv)user would not need to trust other peers;(v)query swapping would improve the performance and the cache hit-ratio, in addition to increasing the level of privacy by feeding LBS with misleading information about users.

3.7. Mathematical Proof for Superiority of DCA

We divide the area for N different cells and assume that is the probability of the number of queries produced from the cell i, soIf user sends k queries to server, one of them real and others (k-1) dummies, thenwhere Pi denotes the probability of qi being a real query from k ones which will be (1/k) only if the whole queries have same priority.

If H is the Cache hit ratio, thenThe privacy level, according to Entropy metrics (E), lies in , where E = 0 means that a server knows the user’s query, and E = 1 means it does not. ThenEarlier methods have tried to increase E [38] and added k-1 dummies to the real query before sending it to server to make E larger. In case of the method of [23], E would attain maximum value of 1 for server, because query would not be sent by the user. However, the E value peers will be zero because when swapping user sends his real query to other peer so Pi = 1 thereby would be zero, which is a drawback in [23]. DCA has solved this issue by avoiding the direct dealing between peers and so E = 1, both for server and peers. This would achieve more privacy in addition to better performance. Note that the privacy and performance can be increased in two ways, namely, if H or E becomes larger.

4. Threats Models and Security/Privacy Analysis of DCA

Service providers, who can access, manipulate, and reveal user data from LBS, can be classified as active attackers, whereas the other users who only eavesdrop, may be known as passive attackers. Here we discuss the effectiveness of DCA in dealing with these kinds of attackers in the following scenarios:(i)Semantic context: attacker has personal information like age, work history, etc. of the user. In DCA a user is not required to send personal information and exchange their query with random and this kind of threat is eliminated.(ii)Trajectory infers: tracing user. DCA solves this problem as the user would be in a different location, which the attacker cannot trace.(iii)Historical (temporal) attack: it happens when attacker accesses and analyses a lot of user queries. In DCA, this would not happen because a user would always submit other user’s query to LBS.(iv)Inversion attack: this happens when the attacker knows user’s algorithm or technique of protection. In DCA, again this would not happen because, even the attacker knows the used technique, they would not know the real query of the user.

Other types of attacks relate to skills and knowledge of attackers about different kinds of information of cells, as follows:(i)Diversity level: kinds of people of interest (POIs).(ii)Closeness level (uniformity): cells have been adjacent to each other.(iii)Congestion: number of users are close to be the same in each cell.(iv)Location homogeneity: all POIs in the cell have same location (healthy buildings).(v)Knowledge about the map: an attacker may be skilled to determine the type of area or crop some parts of it to make the area smaller to detect more accurate information about user’s location.

In DCA, the possibilities of these can be diminished by having unequal sizes of cells. In addition, we propose that the users should change their alias when they enter a new cell.

4.1. Matrices

Here we discuss and compare privacy and performance metrics of other approaches [5, 35, 42, 47] with DCA.

4.1.1. Privacy Metrics

Many of these metrics are used to measure the system efficiency and compare between approaches. Here we discuss only the critical ones.

K-Anonymity. It refers to number of dummies which are sent to LBS server with real queries. If the number of dummies, k, equals to nine, the user would send ten queries to server (the user should keep the value of k to the minimum). We know that

Entropy (E). It is the most important measure which refers to privacy level and quantifies the anonymity and the amount of data that LBS server has from each user. The user should aim to have it close to 1. As in (6), it is defined as

Ubiquity (U). It refers to user’s existence at each point in the cell to deny attackers to detect identity. It is easily achieved in DCA because of query swapping mechanism. Actually, U is used to measure the movements of users and probability of their existent in a specific location in a cell and is defined as

Uncertainty/Estimate Error (EE). This metric is related to the server to measure the amount of error in trying to determine the position of users. In DCA, EE will be maximum because server provider would not have any clue of the user location (only false information). It is defined asThe Entropy of Privacy is related to the amount of right information in the LBS server about user A. Therefore, if the amount of right information in LBS server equals 0, that means the Entropy will be maximum (E = 1) and it leads to maximum uncertainty (EE = 100%).

4.1.2. Performance Metrics

The performance is dependent on factors like number of dummies, algorithm of generating dummies or obfuscation, encryption, cache management, number of queries sent to LBS server, and cost. In DCA, cache concept, smart dummies, and cooperation are used, to measure the performance as follows.

Cache Hit-Ratio (CHR). We use cache hit ratio to measure the percentage of the queries answered by the cache, which depends on the number of connections to LBS server and is defined as

Response Time (RT). It is related to the requested time of all operations in the system. It is defined aswhere ST is the time of sending and returning a query and PT is the processing time.

5. Simulation and Results

This section contains simulation of implementation of main features of DCA that draws a comparison with P2PCache [23] and other methods, namely, Enhanced-CaDSA, Enhanced-DLS, and MoCrowd [36, 38]. For this, we have used MATLAB 2015. We propose to divide the area into 100 × 100 cells with 10,000 virtual users. Out of the par of caches, first (cache1) was dedicated for saving the answers of queries with POI included in cells with wireless connections and the second (cache2) to store queries and manage the swapping among peers and by avoiding direct cooperation.

As mentioned earlier, the format of any query in LBS applications is Latitude, Longitude, POI/TYPE, USER-ID where POI represent the type of user’s query. Enhanced approach changed the location part to cell-number and added time-stamp instead of USER-ID for cache freshness operation. Therefore, each query needed less than 1 KB of the cache and after 4 hours the storage cost was 53.7 KB [38]. Consequently, the total size of cache in each cell was 100 KB, which was sufficient. The count of POIs from Google-API for New-York city was around 250,000, requiring 250 MB storage. Finally, either 3G or 4G Wi-Fi connection in the smart cities environment or in specific areas and infrastructure would be adequate.

In DCA, same query format was used as in P2PCache but removed the part of the USER-ID and Time-Stamp Latitude, Longitude, POI to make it cost less. However, we added another cache (cache2) to facilitate query swapping.

We now provide examples to prove our claim of supremacy of DCA over earlier available methods. Note that the connection between peers would rely on the mobile app and Wi-Fi, despite having Cache2.

5.1. Cost Communication

It is a measure of the number of connections in LBS server and amount of data on the link, which in our approach, as shown in Figure 5, has better performance than any other methods described in [38]. Notice that DCA and P2PCache have used the minimum and same number of queries sent to LBS server; however the difference lies in the management of cooperation between peers.

5.2. Response Time

DCA overcame the anomalies of P2PCache [23] and Enhanced-CaDSA discussed in [38], and, with the help of Bloom filter, saved time in case of miss-hit in the cache2. Furthermore, the swapping technique did not need searching for or direct dealing with peers, resulting in improvement of performance and enhancement of privacy, as can be seen in Figure 6.

5.3. Cache Hit Ratio

This metric can be improved by storing the queries in Cache2, which may later be requested by other users. As shown in Figure 7, DCA and P2PCache achieved the best result because we submitted real queries in Cache2 instead of dummies and better management of cache as explained earlier.

5.4. Privacy Metric (Entropy)

Other methods, as in [38], have used dummies (k = 10, 20 or 30) along with the real query to send to LBS to increase the privacy and to maximize this metric. However, it would reduce performance and cache hit-ratio. In DCA, user would just send only one query and the same would be done by the other user, resulting in enhancing the performance and cache hit ration, in addition to maximizing Entropy Value to 1 as shown in Figure 8.

5.5. Amount of Data Collected by the LBS Server

Like Entropy, it relates to the amount of information that the server can obtain for each user. When k increases, the ratio would decrease because the server cannot distinguish the actual query from dummies. Thus, DCA would perform better than other approaches because the number of queries is only one, information obtained by server is zero, and so is the ratio.

5.6. Amount of Data Collected by the LBS Server

It refers to degree of spread of a user in a cell and so if it increases, so does privacy because LBS server would not be able to determine the real location of user. Ubiquity (Figure 9) in DCA is the best as compared with other methods because it has best Entropy and the swapping technique in Cache2, which amounts to better spread of data in the whole cell.

5.7. Performance of Searching in Cache

As we discussed before, the Enhanced-CaDSA approach [38] uses freshness-time method to manage the content of cache and refresh the time-stamp of the remaining duration of each query in the cache. In the case of DCA, we achieve better results by updating the position of higher order queries and removing the least requested ones in Cache1. In the simulation shown in Figure 10, we have used the hit-ratio, Tc-hit, the response time in the hit case, TC-mis, the response time in the missed case, Tl, the response time from LBS server, Ct, and the response time of collaboration among peers. Using Bloom filter, we get TC-mis ≈ 0, and, for Cache2, we get Ct ≈ 0. Also Avg_enhance = + (1-H) (Tl+Tc-miss), Avg_Previous = + (1-H)(Tl) + Ct, and Avg_DoubleC = + (1-H)(Tl)

5.8. Comparing between Cache Management Methods

In earlier methods of freshness [38], for each stored query time, there are N write operations on cache, where N is the count of current stored queries in cache. When N is changed, so is the ratio of management time. In the hit case of cache, there is just one read operation, and in the miss case there are N read operations. In the miss case of DCA, there is just one read operation, and in the hit case there is one write and one read operation. There is no dealing with freshness here; instead, after each hit or miss, there is one write operation to change the order of selected query. From Figure 11, we can note that the time of management will be static in DCA, whereas it would increase adversely with increase in the time period of frequency freshness.

5.9. Limitations

For performance, trust, and privacy, DCA approach would be better. However, there are some drawbacks:(a)Cost of distribution of the caches by dividing at the access points cells in the physical environment would be higher. Therefore, we have proposed that this issue should be dealt with smart city, which the governments or significantly large organisations can afford by investing the resources required.(b)Cache needs to be protected from eavesdropper or hacker.(c)DCA would work well with LBS but there are other privacy issues in healthcare domain which we shall address in future.

6. Conclusions

In this paper, we proposed DCA as a novel approach for preserving privacy and security in LBS of IoT applications in general and the medical field in particular. It is the first technique which uses the idea of two caches, addresses the trust among peers, and manages the relations between them in wireless environment. Simulation results have shown overcoming superiority of this approach over previously known ones dealing with the privacy, cost of communication, and performance.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.