Abstract

Location-aware publish/subscribe is an important location-based service built on the server-initiated model. Often, the owner of massive spatio-textual messages and subscriptions outsources its location-aware publish/subscribe services to a third-party service provider, for example, a cloud service provider, which is responsible for delivering messages to their relevant subscribers. The issue arising here is that the messages delivered by the service provider might be tailored for profit, intentionally or not. Therefore, it is essential to develop mechanisms that allow subscribers to verify the correctness of the messages delivered by the service provider. In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. We propose an authenticated framework that not only delivers messages efficiently but also enables subscribers to authenticate them at low cost. Extensive experiments on a real-world dataset demonstrate the effectiveness and efficiency of the proposed framework.

1. Introduction

With the rapid development of the mobile Internet and positioning-enabled devices (e.g., smartphones), massive amounts of data that contain both textual and geographical location information are being generated at an unprecedented scale on the Web. This enables location-based services (LBS), such as Foursquare (https://foursquare.com) and Yelp (https://www.yelp.com), to be extensively deployed in many systems and widely accepted by Internet users. Location-aware publish/subscribe is an important kind of service in LBS based on the server-initiated model (as opposed to the user-initiated model, such as spatial-keyword query). For example, in a Groupon system, subscribers register spatio-textual subscriptions to capture their interests (e.g., “Adidas shoe discount at Beijing, China”); for the rest of this paper, we use “subscriber” and “subscription” interchangeably when the context is clear. For each Groupon message with a textual description and a location (e.g., “Adidas running shoes at cheap prices at Adidas factory store, Beijing, China”), the system delivers the message to relevant subscribers.

Location-aware publish/subscribe is a compute-intensive task. To deliver each message efficiently to its relevant subscribers, the data owner of massive spatio-textual messages and subscriptions would need to build up basic IT infrastructure and hire specialized personnel to strengthen its computing capability. However, as such costs might be unaffordable for small-to-medium businesses, outsourcing the data and computations to a third-party service provider (e.g., a cloud service provider) has become an appealing option. Yet this outsourcing model presents a great challenge: the messages delivered by the service provider might be incomplete or incorrect. There are a variety of reasons for this. First, the service provider might deliver tailored messages to favor its sponsors. Second, the service provider might use inferior algorithms and deliver suboptimal messages to the subscribers to save computing resources. Third, with the growing popularity of the cloud, more and more security breaches and attacks on such systems have been reported. If an attacker takes control of the service provider’s server, the attacker may forge messages for its own interest.

The aforementioned reasons necessitate the development of mechanisms that allow subscribers to authenticate the messages delivered by the service provider. They should be verified in terms of two conditions: (1) soundness and (2) completeness. The former means that the messages are not tampered with, while the latter implies that no valid message is missing.

In this paper, to take one step further towards the practical deployment of location-aware publish/subscribe in untrusted outsourcing environments, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. To address this problem, we present an authenticated location-aware publish/subscribe framework. We assume that messages are allowed a maximum delay before being delivered to their corresponding subscribers. The data owner organizes the messages that arrive within this delay in an authenticated data structure (ADS) called the TMR-tree. Based on the TMR-tree, the service provider first computes the relevant messages for each subscription. For this step, we present an inverted index pruning technique that reduces the number of traversals of the inverted index (used to index the subscriptions), thus improving the efficiency of computing the relevant messages for each subscription. Then, the service provider constructs a verification object (VO) for each subscription, with which the corresponding subscriber can authenticate the messages delivered to it. A thorough experimental study on a real-world dataset is conducted over a wide range of workload settings to evaluate the effectiveness and efficiency of the proposed framework in terms of various performance metrics.

Roadmap. The rest of this paper is organized as follows. Section 2 introduces some preliminaries, which include system model, problem definition, and background knowledge. Section 3 presents our proposed authenticated location-aware publish/subscribe framework. In Section 4, we experimentally evaluate the performance of our proposed framework. Related work on the location-aware publish/subscribe and authenticated query processing is surveyed in Section 5. In the end, we conclude the paper in Section 6.

2. Preliminaries

In this section, we first describe our system model. Then, we define the problem studied in this paper. Finally, we introduce some background knowledge on the cryptographic primitives and the location-aware publish/subscribe method that underlie our proposed framework.

2.1. System Model

As shown in Figure 1, our system involves four entities: the data owner, the service provider, the subscribers, and the key distribution center (KDC).

First, the data owner builds an authenticated data structure (ADS) over the messages that arrive within each delivery interval (recall that there is a predefined maximum permissible delivery delay) and signs the ADS using the private key distributed by the KDC. Then, the data owner outsources the location-aware publish/subscribe services to the service provider, which provides the storage resources for the messages, the ADS, the signature of the ADS, and the algorithms. Based on the ADS, the service provider finds the messages that are relevant to the registered subscriptions and constructs a verification object (VO) for each subscription. After that, the service provider delivers the messages and the VO to the corresponding subscribers. The subscribers authenticate the soundness and completeness of these messages using the VO and the public key distributed by the KDC.

Throughout this paper, we assume that (1) the KDC and the data owner are trusted, but the service provider is a potential adversary and might fabricate messages (intentionally or not); (2) neither the KDC nor the data owner colludes with the service provider; (3) the computation and storage capacities of the service provider are polynomially bounded.

2.2. Problem Definition

In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. That is, the subscribers first register their interests as subscriptions in the system. Then, the service provider not only needs to efficiently deliver the messages within each delivery interval to the subscribers whose subscriptions are highly relevant to the messages, but also needs to construct a VO for each subscriber so that the subscriber can authenticate the soundness and completeness of the delivered messages. The VO should be as small as possible to minimize the communication cost between the service provider and the subscribers. Meanwhile, the VO should be easy for subscribers to check, to minimize the computational cost at the subscriber side.

2.3. Background Knowledge
2.3.1. Cryptographic Primitives

We present the essential cryptographic primitives on one-way hash function, cryptographic signature, and Merkle hash tree as follows.

One-Way Hash Function. A one-way hash function maps a message of arbitrary length to a fixed-length output. It works in one direction: it is easy to compute the hash value of a given message, but it is computationally infeasible to find a message that maps to a given hash value.

Cryptographic Signature. A cryptographic signature (or simply signature) is a mathematical scheme for demonstrating the authenticity of a digital message. A signer obtains a pair of keys, a private key and a public key, from the KDC. The former is kept secret by the signer and the latter is publicly distributed. A digital message can be signed using the private key. The authenticity of the message can then be verified with the public key by anyone who receives the message.

Merkle Hash Tree. The Merkle hash tree (MHT) [1] is an authenticated data structure used for collectively authenticating a set of messages. The MHT is a binary tree built in a bottom-up manner, by first computing the hash values of the messages in the leaf nodes. The hash value of each internal node is derived from its two child nodes. Finally, the hash value of the root is signed by the owner of the messages. The MHT can be used to authenticate any subset of messages, in conjunction with a proof. The proof consists of the signed root and the sibling nodes (auxiliary hash values) on the path from the root down to the messages that need to be authenticated.
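The bottom-up construction and the proof check described above can be illustrated with a minimal sketch. It assumes SHA-256 as the one-way hash function and pads a level that has an odd number of nodes by duplicating its last hash; both choices, as well as the function names build_levels, proof, and verify, are illustrative assumptions rather than part of [1]. Signing of the root is omitted, so the root hash is simply treated as trusted.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash function (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_levels(messages):
    """Build the tree bottom-up; levels[0] holds leaf hashes, levels[-1] the root."""
    level = [h(m) for m in messages]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed padding rule: duplicate the last hash of an odd level.
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(message, path, trusted_root):
    """Recompute the root from a message and its proof, then compare."""
    digest = h(message)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == trusted_root
```

In a deployment, the root hash would additionally be signed by the owner of the messages, and the verifier would check that signature with the public key before trusting the root.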

2.3.2. Location-Aware Publish/Subscribe

We present the state-of-the-art method [2] for location-aware publish/subscribe as follows.

A location-aware publish/subscribe service delivers each message to its relevant subscribers, who register spatio-textual subscriptions to capture their interests. Each message and each subscription has a spatial location, given by its latitude and longitude, and a set of keywords, where each keyword is associated with a weight that can be set to the inverse document frequency (idf) of the keyword. To quantify the relevancy between a subscription and a message, [2] used a spatio-textual similarity function (1) that combines a textual similarity function, which is similar to the weighted Jaccard coefficient, with a spatial similarity function. The spatial similarity is computed from the Euclidean distance between the subscription and the message and from the maximum user-tolerated Euclidean distance between subscriptions and messages (which can be set to the maximum distance between subscriptions). A preference parameter tunes the relative weight of the textual and spatial similarity. A subscription and a message are called relevant if their similarity exceeds a threshold. Since subscribers usually have different preferences and requirements (e.g., some subscribers prefer highly relevant results while others want to get more results), each subscriber is allowed to set its own preference parameter and threshold. A parameterized spatio-textual subscription therefore consists of a location, a keyword set, a preference parameter, and a threshold. Figure 2 shows an example of 11 parameterized spatio-textual subscriptions and 7 messages.
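To make the relevancy test concrete, the following sketch computes a spatio-textual similarity under explicitly assumed forms, since the exact formulas of [2] are not reproduced here. The assumptions are: the textual similarity is the weight of the subscription keywords found in the message divided by the total weight of the subscription keywords; the spatial similarity decreases linearly with the Euclidean distance and is clipped at 0; and the preference parameter (named alpha here) weights the spatial part while 1 - alpha weights the textual part. All function and parameter names are hypothetical.

```python
import math


def textual_similarity(sub_keywords, msg_keywords, weights):
    """Assumed form: matched subscription weight over total subscription weight."""
    total = sum(weights[t] for t in sub_keywords)
    matched = sum(weights[t] for t in sub_keywords if t in msg_keywords)
    return matched / total if total > 0 else 0.0


def spatial_similarity(sub_loc, msg_loc, max_dist):
    """Assumed form: 1 - dist / max_dist, clipped at 0."""
    return max(0.0, 1.0 - math.dist(sub_loc, msg_loc) / max_dist)


def similarity(sub_loc, sub_keywords, msg_loc, msg_keywords,
               weights, alpha, max_dist):
    """Assumed combination: alpha weights the spatial part, 1 - alpha the textual part."""
    return (alpha * spatial_similarity(sub_loc, msg_loc, max_dist)
            + (1.0 - alpha) * textual_similarity(sub_keywords, msg_keywords, weights))


def is_relevant(sim, theta):
    """A subscription and a message are relevant when the similarity reaches theta
    (treating the boundary case as relevant is an assumption)."""
    return sim >= theta
```

With the assumed forms, a subscription at (0, 0) with keywords weighted 2 and 1, of which only the first appears in a message at (3, 4), has textual similarity 2/3 and, for a maximum distance of 10, spatial similarity 0.5.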

To deliver messages to relevant subscribers efficiently, [2] proposed a spatial-oriented prefix to prune irrelevant subscriptions and devised a filter-verification framework. In particular, with respect to the textual filter, [2] claimed that if a subscription is relevant to a message , they must share at least one common keyword in the so-called prefix of , which is computed from the textual similarity threshold. More specifically, based on (1), given a subscription , since the spatial similarity cannot exceed 1, [2] deduced a textual similarity threshold

When , based on , [2] selected a prefix for each subscription . The keywords in are sorted by their weights in descending order and a minimum such thatis computed, where is the total weight of keywords in . Therefore, the prefix of can be defined as . Since the total weight of keywords after is smaller than , if a subscription is relevant to a message (i.e., ), they must share at least one common keyword in .

When , a message may be relevant to no matter whether they share common keywords. To address this issue, for a subscription , if , [2] introduced a virtual dummy keyword “” with weight of 0 (i.e., ), and the prefix of includes its keywords and “”.
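The prefix selection can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of [2]: the textual similarity is assumed to be the matched keyword weight divided by the total keyword weight of the subscription, so a prefix is long enough once the total weight of the remaining keywords drops below the textual threshold times the total weight. The dummy keyword is represented by the hypothetical token "*", and the dummy case is assumed to apply when the textual threshold is not positive.

```python
DUMMY = "*"  # hypothetical stand-in for the virtual dummy keyword (weight 0)


def select_prefix(keywords, weights, textual_threshold):
    """Return the prefix of a subscription's keywords, sorted by descending weight."""
    ordered = sorted(keywords, key=lambda t: weights[t], reverse=True)
    if textual_threshold <= 0:
        # Assumed dummy case: a message may be relevant without sharing any
        # keyword, so the prefix holds all keywords plus the dummy keyword.
        return ordered + [DUMMY]
    total = sum(weights[t] for t in ordered)
    remaining = total
    for length, keyword in enumerate(ordered, start=1):
        remaining -= weights[keyword]
        # Stop at the shortest prefix whose suffix alone cannot reach the threshold.
        if remaining < textual_threshold * total:
            return ordered[:length]
    return ordered
```

Under these assumptions, a message that shares no keyword with the prefix can match at most the suffix weight, which is below the threshold, so the subscription can be skipped for that message.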

Regarding the spatial filter, based on the first match keyword (denoted by ) between and (i.e., does not contain keywords before in ), [2] estimated an upper textual similarity bound of to as follows:Accordingly, [2] estimated a lower spatial similarity bound between and as follows:For any message, if its spatial similarity to is smaller than the lower spatial similarity bound , the subscription can be safely pruned.

Since given a subscription and a message , we do not know which keyword is their first match keyword (if they have), and the first match keywords for different messages to the subscription are different, for each keyword in , and [2] computed the lower spatial similarity bound . This prefix of each subscription with lower spatial similarity bound is called spatial-oriented prefix. If subscription is relevant to message , there must exist a keyword in such that .

Based on the spatial-oriented prefix, [2] devised a filter-verification framework. In particular, an inverted index is built on the spatial-oriented prefixes first. Then, in the filter phase, for each message keyword , the framework retrieves the inverted list of and for each subscription in , if , is put into the candidate set. In the verification phase, based on (1), the framework verifies whether each candidate is an answer, and if yes, the message is delivered to .

3. Authenticated Location-Aware Publish/Subscribe Framework

In this section, we present our proposed authenticated location-aware publish/subscribe framework. In the publish/subscribe scenario, the messages delivered to the subscribers need to be verified for correctness (i.e., soundness and completeness). However, unlike the subscription set, the message set is unbounded and can be regarded as a data stream. In such a situation, we (in practice, the data owner) cannot construct an authenticated data structure (ADS) over the unbounded message set and, based on such a structure, construct the VO for the subscribers’ authentication. An intuitive solution is therefore to sign every incoming message, so that when a signed message is delivered to its corresponding subscribers, they can authenticate it. However, when many messages need to be delivered to a single subscriber (e.g., one that registers many interests in the framework), every message carries its own signature, so the communication cost between the service provider and this subscriber is high. Moreover, since signature verification is not a cheap operation, the authentication cost at the subscriber is also high. To tackle this problem, we present an authenticated location-aware publish/subscribe framework, which not only delivers messages more efficiently than the framework in the existing work [2], but also enables subscribers’ authentication with low communication and authentication cost.

The main idea of our framework is to assume that messages are allowed a maximum delay before being delivered to their corresponding subscribers. Under this assumption, a batch of messages, rather than a single message, can be processed at a time. We organize these messages in a Merkle hash tree (MHT) like structure (i.e., the TMR-tree). When more than one message is delivered to a subscriber, only one signature is returned, thereby reducing the communication and authentication cost. Moreover, recall that, in [2], an inverted index of the spatial-oriented prefixes of all the subscriptions is constructed, and when a message arrives, the framework looks up the inverted index to compute which subscriptions are relevant to this message. The message needs to be compared with every subscription in the inverted list of every message keyword. To reduce the computational cost and improve the efficiency of message delivery, we present an inverted index pruning technique. By exploiting the spatial proximity between the messages (which are also organized in an R-tree like structure), we can prune from the inverted index the subscriptions that cannot become delivery destinations, so that they are not involved in further computation.

3.1. Text-Aware Merkle R-Tree (TMR-Tree)

We first introduce the method of constructing the ADS, called the Text-aware Merkle R-tree (TMR-tree), at the data owner side. Consider a predefined maximum permissible delivery delay. The data owner builds one TMR-tree over all the messages within every time interval. Specifically, the TMR-tree has four main features:
(i) The messages in each interval are spatially organized in an R-tree.
(ii) Each node has a pseudo-text, which is the union of the keywords in its children’s texts.
(iii) Similar to the MHT, the TMR-tree stores one hash value in each node. Assume the default fanout of the TMR-tree is 2. A leaf node stores a hash value computed over its two children (messages), and an internal node stores a hash value computed over its two children and their hash values. More specifically, both the spatial and the textual information of each child are involved in the computation: the child’s location and text if the node is a leaf node, and the child’s location, pseudo-text, and hash value if the node is an internal node.
(iv) The hash value of the root of the TMR-tree is signed by the data owner, producing the root signature.

Example 1. Figure 3 shows an example of the TMR-tree constructed over the messages in Figure 2. Here the fanout of the TMR-tree is set to 2. Each leaf node has a pseudo-text that is the union of its children’s texts. Since every node, whether leaf or internal, is represented by a rectangle (the Minimum Bounding Rectangle (MBR) of the messages or of the other MBRs it contains), its location is defined by two points, which can be the bottom-left and upper-right corners of the rectangle. Each leaf node stores a hash value that summarizes the authentication information of the node; it is computed from the spatial and textual information of its children (messages). Similarly, the hash value stored in an internal node is computed from the spatial and textual information of its children together with their hash values.
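The node layout described above can be sketched as follows. The sketch only illustrates how a location, a pseudo-text, and a hash value are attached to each node; it assumes SHA-256, a simple string serialization of locations and keywords, and the hypothetical names Message, Node, make_leaf, and make_internal. It does not implement R-tree packing, and the signature on the root hash is omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Message:
    x: float
    y: float
    text: frozenset


@dataclass
class Node:
    children: list
    mbr: tuple        # (min_x, min_y, max_x, max_y)
    text: frozenset   # pseudo-text: union of the children's keywords
    digest: bytes


def encode(location, text):
    """Assumed serialization of spatial and textual information."""
    return (repr(tuple(location)) + "|" + ",".join(sorted(text))).encode()


def make_leaf(messages):
    """Leaf hash covers the location and text of every child message."""
    mbr = (min(m.x for m in messages), min(m.y for m in messages),
           max(m.x for m in messages), max(m.y for m in messages))
    text = frozenset().union(*(m.text for m in messages))
    payload = b"".join(encode((m.x, m.y), m.text) for m in messages)
    return Node(list(messages), mbr, text, hashlib.sha256(payload).digest())


def make_internal(nodes):
    """Internal hash covers each child's MBR, pseudo-text, and hash value."""
    mbr = (min(n.mbr[0] for n in nodes), min(n.mbr[1] for n in nodes),
           max(n.mbr[2] for n in nodes), max(n.mbr[3] for n in nodes))
    text = frozenset().union(*(n.text for n in nodes))
    payload = b"".join(encode(n.mbr, n.text) + n.digest for n in nodes)
    return Node(list(nodes), mbr, text, hashlib.sha256(payload).digest())
```

Because every hash depends on the spatial and textual content below it, changing a keyword or a coordinate of any message changes the root hash, which is what the root signature protects.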

3.2. Filter-Verification Framework with Inverted Index Pruning

We use the idea of filter-verification framework proposed in [2] and, based on this, present an inverted index pruning technique to reduce the computational cost, thereby improving the efficiency of messages delivery.

Since a location and a pseudo-text are associated with a node in the TMR-tree, each node in the TMR-tree can be treated as a dummy message. The main idea of our inverted index pruning technique is based on the following proposition.

Proposition 2. If some subscriptions in the inverted index of spatial-oriented prefixes are not relevant to a node (a dummy message) in the TMR-tree, these subscriptions can be safely pruned from the inverted index, since the node’s children (other dummy messages or true messages) certainly cannot be relevant to these subscriptions either.

Proof. If a subscription in is not relevant to a node , we have , . According to (3), we have, that is, is ’s child, since , we haveTherefore, is not relevant to either.

Algorithm 1 shows the pseudo-code of our framework with the inverted index pruning technique. It takes the TMR-tree and as input. For each node and each message in leaf nodes in the TMR-tree, we use () to denote the inverted index where the subscriptions in () are likely to be relevant to (), which is pruned from of ’s (’s) parent. We let () denote of ’s (’s) parent. Thus, of each node (or message) is pruned from its , which is its parent’s . In the beginning, we set the root’s (i.e., TMR-tree.Root.) as , which is the inverted index of spatial-oriented prefixes of all the subscriptions without any pruning (line (1)). Then, we initialize an empty stack and push the root into it (lines (2)-(3)). In the filter phase, every element in is computed until is empty (lines (4)–(11)). First, we pop an element from (line (5)). Then, for each keyword in , we retrieve the inverted list of and for each subscription in , if , is added to (lines (6)–(9)). Other subscriptions which do not satisfy the condition in line (8) are pruned from . If and is not a message, that is, there exist some subscriptions in which might be relevant to ’s children, ’s children are put into stack (lines (10)-(11)). The subscriptions in each are the candidates which might be relevant to . In the verification phase, for each message , we verify whether each candidate in is the answer of and if yes, is added to the answer set of , that is, (lines (12)–(15)). After computing all the messages in the TMR-tree, all the answer sets () are together returned (line (16)). Here we assume that there are messages that come within . Each subscription in is the delivery destination of the message .

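Algorithm 1: Filter-verification framework with inverted index pruning.
Input: TMR-tree: the text-aware Merkle R-tree on the messages within the delivery interval; the inverted index of spatial-oriented prefixes
Output: the answer set of each message in the TMR-tree
(1) set the parent inverted index of TMR-tree.Root to the unpruned inverted index of spatial-oriented prefixes;
(2) initialize an empty stack;
(3) push TMR-tree.Root onto the stack;
(4) while the stack is not empty do
(5)     pop an element from the stack;
(6)     for each keyword in the text (or pseudo-text) of the element do
(7)         for each subscription in the inverted list of the keyword in the element’s parent inverted index do
(8)             if the spatial similarity between the element and the subscription is not smaller than the subscription’s lower spatial similarity bound for the keyword then
(9)                 add the subscription to the element’s inverted index;
(10)    if the element’s inverted index is not empty and the element is not a message then
(11)        push the element’s children onto the stack;
(12) for each message in the TMR-tree do
(13)    for each candidate subscription in the message’s inverted index do
(14)        if Verify(candidate, message) then
(15)            add the candidate to the answer set of the message;
(16) return the answer sets of all messages;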
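The filter phase of Algorithm 1 (lines (1)–(11)) can be sketched in a few lines. The sketch makes several assumptions that are not taken from the source: nodes and messages are plain dictionaries with the hypothetical keys "text", "mbr", "children", and "id"; the inverted index maps a keyword to a list of (subscription id, lower spatial similarity bound) pairs; the spatial similarity between a subscription and a node is derived from the minimum distance between the subscription’s location and the node’s MBR; and the dummy keyword and the verification phase are omitted.

```python
import math


def min_dist(point, mbr):
    """Minimum Euclidean distance from a point to a rectangle (0 if inside)."""
    x, y = point
    x1, y1, x2, y2 = mbr
    dx = max(x1 - x, 0.0, x - x2)
    dy = max(y1 - y, 0.0, y - y2)
    return math.hypot(dx, dy)


def spatial_sim(point, mbr, max_dist):
    """Assumed spatial similarity: 1 - dist / max_dist, clipped at 0."""
    return max(0.0, 1.0 - min_dist(point, mbr) / max_dist)


def filter_with_pruning(root, index, sub_locs, max_dist):
    """Return {message id: set of candidate subscription ids}."""
    candidates = {}
    stack = [(root, index)]  # each element is paired with its parent's index
    while stack:
        elem, parent_index = stack.pop()
        pruned = {}
        for keyword in elem["text"]:
            for sub_id, bound in parent_index.get(keyword, []):
                # Keep a subscription only if the element is spatially close enough.
                if spatial_sim(sub_locs[sub_id], elem["mbr"], max_dist) >= bound:
                    pruned.setdefault(keyword, []).append((sub_id, bound))
        children = elem.get("children")
        if children is None:
            # A true message: its surviving subscriptions are the candidates.
            candidates[elem["id"]] = {s for lst in pruned.values() for s, _ in lst}
        elif pruned:
            # Descend only if some subscription may still match a child.
            for child in children:
                stack.append((child, pruned))
    return candidates
```

Messages below a node whose pruned index is empty never appear in the result, which corresponds to an empty candidate set. Each candidate would then be checked with the full spatio-textual similarity in the verification phase.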

Example 3. Figure 4 shows an example of procedures of our proposed inverted index pruning technique in filter-verification framework. In step ①, the root is popped from stack first. Then, for each keyword in ( to and ), we retrieve the corresponding inverted list in to compute the spatial similarity between the root and each subscription in this inverted list. Take subscription as an example, we retrieve and compute the spatial similarity between the root and : , which equals 1. Since is greater than , is added to . Notice that the values of spatial similarity between the root and , , in and , , in are smaller than the values of these subscriptions’ ; that is, they do not satisfy the condition in line (8) in Algorithm 1. Therefore, they are pruned from and are not added to . Then, since , its children and are pushed into stack . Similarly, in step ②, in , in , in , , in , and in are pruned from . Note that, in step ③, since does not contain the keywords and , we need not compute the spatial similarity between and each subscription in inverted lists and (indicated by “” in the example), and these subscriptions are also pruned from . At last, in step ④, the candidates which might be relevant to are , , , , and . Compared with all the subscriptions in the unpruned inverted index , the computational cost is reduced dramatically.

Time Complexity. For the convenience of comparison between the state-of-the-art method for location-aware publish/subscribe and our proposed filter-verification framework with inverted index pruning technique, we first give the time complexity of filter-verification framework proposed in [2] as the following proposition.

Proposition 4. The time complexity of delivering one message to its relevant subscriber in filter-verification framework proposed in [2] is , where is the inverted index built on the spatial-oriented prefixes.

Proof. In the filter phase, for each keyword in (including the dummy keyword “”), the framework retrieves the inverted list of and for each subscription in , if , is a candidate to the message and we add it into the candidate set. Therefore, the time complexity of filtering is .
In the verification phase, based on the spatio-textual similarity function (see (1)), and are relevant, if and only if can be easily computed in time and can be materialized; thus it is easy to compute . To compute , we check whether each keyword in appears in . If yes, we add the corresponding weight into . To facilitate the checking, we build a hash map for keywords in . (We only need to build the hash map for the message once.) Thus the time complexity of verifying a subscription is . Therefore, the time complexity of delivering one message to its relevant subscriber in filter-verification framework proposed in [2] is .

The time complexity of our proposed filter-verification framework with inverted index pruning technique is given by the following proposition.

Proposition 5. The time complexity of delivering messages that come within to their relevant subscribers in our proposed filter-verification framework with inverted index pruning technique is , where is the fanout of the TMR-tree, is a node in the TMR-tree (also can be treated as a dummy message), and is the inverted index associated with ’s parent in which the subscriptions are likely to be relevant to ’s parent.

Proof. A TMR-tree is constructed over the messages that come within . If the fanout of the TMR-tree is , in the worst case, the height of the TMR-tree (excluding the layer of messages) is . Thus, the number of internal and leaf nodes in the TMR-tree (assuming the root has depth 1) isIn the filter phase, when visiting a node (or a message ) in the TMR-tree, we retrieve its , prune (or ), and generate (or ). Therefore, the time complexity of filtering with inverted index pruning is .
In the verification phase, suppose each message within is delivered to only one subscriber. The time complexity of verifying whether subscriptions are the answers of messages is . Therefore, the time complexity of delivering messages that come within to their relevant subscribers in our proposed filter-verification framework with inverted index pruning technique is .

Compared with the filter-verification framework proposed in [2], our proposed filter-verification framework with the inverted index pruning technique needs to visit more inverted indexes (those associated with the internal and leaf nodes of the TMR-tree). However, since the subscriptions in each node’s inverted index are progressively pruned from the root down to the leaf nodes of the TMR-tree, the total number of inverted index traversals is reduced. Our framework is therefore expected to be efficient, which is also demonstrated by our experimental study (Section 4).

3.3. VO Construction and Authentication

After finding the subscribers who are the delivery destinations of the messages, that is, the answer sets of the messages, the service provider still needs to construct a VO for each subscriber for authentication. Algorithm 2 shows the pseudo-code of constructing the VOs. It takes the TMR-tree and the answer set of each message as input. First, we initialize a VO for each subscription in the answer sets with the root of the TMR-tree (lines (1)-(2)). Then, we initialize an empty queue and put the root into it (lines (3)-(4)). Elements are processed until the queue is empty (lines (5)–(12)). When the distance between a message and the element picked from the queue is smaller than 0, that is, the message is in the subtree rooted at the element, for each subscription in the answer set of the message, we replace the element in the subscription’s VO with three parts: (1) the token “[”; (2) the element’s children; and (3) the token “]” (lines (6)–(10)). Note that we use a pair of tokens “[” and “]” to indicate the scope of the entries of a node in the VO. Then, if the element’s children are not in the queue and they are not messages, they are put into the queue (lines (11)-(12)). At last, the constructed VO is delivered to each subscriber with the corresponding messages (line (13)). Here the VOs cover all the subscribers to whom messages will be delivered.

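Algorithm 2: VO construction.
Input: TMR-tree: a text-aware Merkle R-tree on the messages within the delivery interval; the answer set of each message in the TMR-tree
Output: the VO for each subscription
(1) for each subscription in the answer sets do
(2)     initialize the subscription’s VO with TMR-tree.Root;
(3) initialize an empty queue;
(4) put TMR-tree.Root into the queue;
(5) while the queue is not empty do
(6)     pick an element from the queue;
(7)     for each message do
(8)         if Dist(message, element) is smaller than 0 then
(9)             for each subscription in the answer set of the message do
(10)                replace the element in the subscription’s VO with “[” + the element’s children + “]”;
(11)    if the element’s children are not in the queue and they are not messages then
(12)        put the element’s children into the queue;
(13) return the VOs;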

Example 6. Following the example in Figure 2, after computing the delivery destinations of messages to , we obtain their answer sets as follows: , , , , , , and . Here we take the construction of the of () as an example (shown in Figure 5) and messages , , , and will be delivered to . In step ①, the root is picked from the queue . Obviously, , , , and are all smaller than 0. Therefore, we replace “root” in with “”. Since and are not in and they are not messages, they are put into . The steps ②, ③, and ④ are similar to ①. In step ⑤, since , , , and are not in the subtree root at , that is, , , , and are all greater than 0, “” in is not replaced and thus remains unchanged. Due to the similar reason, in step ⑥, still remains unchanged. At last, in step ⑦, after computing the last element in , is “”.

To authenticate the soundness of the delivered messages, each subscriber needs to scan its VO to recompute the hash value of the root of the TMR-tree and compare it against the root signature using the data owner’s public key distributed by the KDC. Since each VO includes the entries that have been visited during message delivery, the subscriber can simulate the TMR-tree traversal, recursively reconstruct each node, and compute its hash value in a bottom-up manner. Specifically, each node and its hash value can be computed from the entries of its children, which are delimited by “[” and “]”.

To authenticate the completeness of the delivered messages, the subscriber needs to check that each delivered message is indeed present in the VO and satisfies the subscriber’s preference parameter and threshold. In addition, the subscriber needs to check that the other entries returned in the VO do not satisfy them.

Example 7. Still taking as an example, the subscriber can recursively reconstruct from and , from and , from and , from and , and at last the root from and and compute its hash value to compare it against the root signature to authenticate the soundness of delivered messages , , , and . As for authenticating the completeness of , , , and , the subscriber needs to recompute whether they satisfy and , while and do not.
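The soundness check can be illustrated by a short sketch that recomputes the root hash from a nested VO. The encoding is an assumption made for this sketch only: a VO entry is a tuple tagged "msg" (a delivered or sibling message), "node" (an unexpanded node with its MBR, pseudo-text, and stored hash value), or "open" (an expanded node, playing the role of the “[” and “]” tokens). SHA-256 and the serialization in encode are also assumptions, and the comparison against the signed root is left to the caller.

```python
import hashlib


def encode(location, text):
    """Assumed serialization of spatial and textual information."""
    return (repr(tuple(location)) + "|" + ",".join(sorted(text))).encode()


def summarize(entry):
    """Return (mbr, text, digest) of a VO entry; digest is None for a message."""
    kind = entry[0]
    if kind == "msg":                      # ("msg", (x, y), keywords)
        _, (x, y), text = entry
        return (x, y, x, y), frozenset(text), None
    if kind == "node":                     # ("node", mbr, pseudo_text, digest)
        _, mbr, text, digest = entry
        return tuple(mbr), frozenset(text), digest
    # ("open", [children]): rebuild the node from the entries it encloses.
    parts = [summarize(child) for child in entry[1]]
    mbr = (min(p[0][0] for p in parts), min(p[0][1] for p in parts),
           max(p[0][2] for p in parts), max(p[0][3] for p in parts))
    text = frozenset().union(*(p[1] for p in parts))
    payload = b""
    for (child_mbr, child_text, child_digest), child in zip(parts, entry[1]):
        if child[0] == "msg":
            payload += encode(child_mbr[:2], child_text)
        else:
            payload += encode(child_mbr, child_text) + child_digest
    return mbr, text, hashlib.sha256(payload).digest()


def root_digest(vo):
    """Hash value of the TMR-tree root as recomputed from the VO."""
    return summarize(vo)[2]
```

A subscriber would compare the result of root_digest with the root hash recovered from the data owner’s signature. For completeness, the subscriber would additionally recompute the similarity of every message and node entry in the VO against its own preference parameter and threshold.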

From the example we can see that when more than one message is delivered to a subscriber, only one signature is returned, thus reducing the communication and authentication cost.

Space and Time Complexity. We first give a baseline method for the problem of authenticating messages in outsourced location-aware publish/subscribe services. Then, we give the space complexity of its VO and compare it with that of our proposed authenticated location-aware publish/subscribe framework. We also compare the authentication time complexity of the baseline method and our framework.

Baseline: the data owner signs every message within each delivery interval, and when the signed messages are delivered to their corresponding subscribers, each VO consists of the delivered messages and their signatures. Then, the subscriber can verify soundness by computing the hash value of each message in the VO and comparing it against the message’s signature. Recomputing the spatio-textual similarity between each message in the VO and the subscription enables the subscriber to verify completeness.

The space complexity of the VO of the baseline method is given by the following proposition.

Proposition 8. If there are messages which are delivered to a subscriber at one time, the size for , that is, the space complexity of , is , where is the size of the signature and each includes the size of its spatial and textual information.

In comparison with the baseline method, the space complexity of the VO of our proposed authenticated location-aware publish/subscribe framework is given by the following proposition.

Proposition 9. If there are messages which are delivered to a subscriber at one time, the space complexity of is , where is the size of the signature. is a dummy message and we assume there are dummy messages that are included in .

From the above propositions we can see that, in our proposed filter-verification framework with inverted index pruning technique, only one signature is returned even if more than one message is delivered to a subscriber. Although our framework includes dummy messages in its VO, the VO is still smaller than that of the baseline method when the number of delivered messages is large, since signatures are space consuming.

Since the authentication time is correlated with the size of the VO, the time complexity of authentication in our proposed filter-verification framework with inverted index pruning technique is also smaller than that of the baseline method.
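The size comparison can be made concrete with a small calculation. This is a sketch under stated assumptions: the baseline VO carries one signature per delivered message, while the framework's VO carries a single signature plus some dummy entries. The function names and all byte sizes below are hypothetical and are not taken from the experiments.

```python
def baseline_vo_size(msg_sizes, sig_size):
    """Baseline: every delivered message travels with its own signature."""
    return sum(msg_sizes) + len(msg_sizes) * sig_size

def framework_vo_size(msg_sizes, dummy_sizes, sig_size):
    """Framework: messages, dummy entries, and one signature for the root."""
    return sum(msg_sizes) + sum(dummy_sizes) + sig_size

# Hypothetical sizes in bytes: 10 messages of 200 B, 6 dummy entries of 20 B,
# and a 128 B signature.
msg_sizes = [200] * 10
dummy_sizes = [20] * 6
sig_size = 128

print(baseline_vo_size(msg_sizes, sig_size))                 # 3280
print(framework_vo_size(msg_sizes, dummy_sizes, sig_size))   # 2248
```

Under these assumptions the framework's VO is smaller whenever the total size of the dummy entries is less than the size of the signatures it saves, that is, the number of delivered messages minus one, times the signature size.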

4. Experimental Study

In this section, we proceed to conduct extensive experiments to evaluate the performance of our proposed authenticated location-aware publish/subscribe framework.

4.1. Experiment Setup
4.1.1. Datasets

Similar to [2], we use a real-world dataset, POI, which contains 10 million points of interest in the USA. We randomly select 1–5 keywords from each POI to generate subscriptions; thus the average number of keywords in each subscription is 3. The maximum permissible response delay and the message delivery interval are both set to 5 min. During this interval, we randomly select 2000 POIs as messages. To generate long messages, we combine 10 POIs into a single message. The average number of keywords in each message is 41.

4.1.2. Parameters

The performance of our proposed framework is evaluated by varying the preference (0.1, 0.3, 0.5, 0.7, and 0.9) and the threshold (0.5, 0.6, 0.7, 0.8, and 0.9). In the default setting, we set the preference to 0.5 and the threshold to 0.7. When we vary one parameter, the other is kept at its default value. We use inverse document frequency (idf) to generate keyword weights.
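As an illustration of idf-based keyword weights, the following minimal sketch assumes the classic variant, idf(t) = ln(N / df(t)), where N is the number of keyword sets and df(t) is the number of sets containing keyword t. The text does not state which idf variant is used, so the formula and the helper name `idf_weights` are assumptions.

```python
import math
from collections import Counter

def idf_weights(keyword_sets):
    """Map each keyword to ln(N / df), counting each set at most once."""
    n = len(keyword_sets)
    df = Counter(term for keywords in keyword_sets for term in set(keywords))
    return {term: math.log(n / count) for term, count in df.items()}

# Hypothetical toy corpus of three keyword sets.
corpus = [
    ["adidas", "shoes"],
    ["adidas", "discount"],
    ["coffee"],
]
weights = idf_weights(corpus)
print(weights["coffee"] > weights["adidas"])  # True: rarer keywords weigh more
```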

4.1.3. System Configuration

All the experiments are run on a server with Intel(R) Xeon(R) CPU E5-2609 v2 @2.5 GHz (Quad Core) and 64 GB RAM, running Linux Ubuntu. We use in-memory setting and the programs are implemented in C++.

4.1.4. Performance Metrics

The metrics for performance evaluation include
(i) PAS and PC: percentages of accessed subscriptions and candidates, that is, the ratios of the subscriptions accessed in the inverted index of spatial-oriented prefixes and of the candidates to the total number of subscriptions
(ii) FS: time of finding the relevant subscriptions for each message within the delivery interval
(iii) CVO: time of constructing the VO
(iv) VOS: VO size, which affects the communication cost between the service provider and subscribers
(v) AM: time of authenticating the messages at the subscriber side

Note that, in our framework, we process a batch of messages at one time; thus each time we first get a total value of each metric. Then, for the metrics PAS, PC, and FS, we report the average value corresponding to each message and, for the metrics CVO, VOS, and AM, we report the average value corresponding to each subscriber.

4.1.5. Algorithms

For metrics (i), (ii), and (iii), the algorithms evaluated in our experiments include (1) SP (the method of finding the relevant subscriptions for each message using the spatial-oriented prefixes, which is proposed in [2]); (2) SP + IIP (our filter-verification framework with inverted index pruning technique); (3) VOC (our method of constructing the VO).

For metrics (iv) and (v), algorithms to be evaluated include (1) ALPF (our authenticated location-aware publish/subscribe framework) and (2) BL (the baseline method).

Note that, to the best of our knowledge, this is the first attempt to define and solve the problem of authenticating messages in outsourced location-aware publish/subscribe services. Therefore, no existing algorithm is included in our experiments for comparison.

4.2. Performance Study
4.2.1. Cost at the Service Provider

The cost at the service provider is evaluated from two aspects. First, in Figure 6, we evaluate the ratios of accessed subscriptions and candidates to the total number of subscriptions (PAS and PC) as a function of the preference and threshold parameters, where the accessed subscriptions are those accessed in the inverted index and the candidates are those verified using the Verify function in Algorithm 1. Second, as shown in Figure 7, we evaluate the running time as a function of the same two parameters, which includes the time of finding the relevant subscriptions for each message (FS) and of constructing the VO (CVO).

According to Figures 6 and 7, we make the following observations. First, SP + IIP outperforms SP; that is, the PAS and PC of SP + IIP are both smaller than those of SP (shown in Figure 6), and the FS of SP + IIP is smaller than that of SP (shown in Figure 7). The reason is that SP + IIP uses the inverted index pruning technique to prune subscriptions from the inverted index of spatial-oriented prefixes. These pruned subscriptions are not relevant to the messages and thus need not be involved in the computation. Second, as the threshold increases, the performance of SP and SP + IIP improves, because for a larger threshold fewer subscriptions need to be visited and verified, and there is a greater opportunity to prune irrelevant subscriptions. Third, as the preference decreases, SP and SP + IIP take much longer, because for a smaller preference the spatial similarity is more important and accurate prefix bounds cannot be estimated. Fourth, as the preference (threshold) increases, CVO increases (decreases) slightly, since we use the answers of each message to construct the VO and CVO depends on the number of answers: with the increase of the preference (threshold), we get more (fewer) answers and thus CVO increases (decreases). Fifth, although our framework spends extra time constructing the VO for subscribers' authentication, the total running time (FS + CVO) is still better than that of SP. For example, in Figure 7, when , SP + IIP costs around 60 ms and VOC costs about 20 ms; thus the total running time is about 80 ms, which is still less than the 90 ms cost of SP.
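The filter-and-verify pattern behind these observations can be sketched as follows. This is not the SP or SP + IIP algorithm: it uses a plain keyword inverted index instead of spatial-oriented prefixes, and the similarity function (a preference-weighted sum of keyword overlap and a distance-based spatial score) is an assumption made for illustration. All names and values are hypothetical.

```python
import math
from collections import defaultdict

def build_index(subs):
    """Inverted index: keyword -> ids of subscriptions containing it."""
    index = defaultdict(set)
    for sid, sub in subs.items():
        for kw in sub["keywords"]:
            index[kw].add(sid)
    return index

def similarity(sub, msg, alpha, max_dist):
    """Assumed form: alpha * textual + (1 - alpha) * spatial, both in [0, 1]."""
    text = len(sub["keywords"] & msg["keywords"]) / len(sub["keywords"])
    spatial = max(0.0, 1.0 - math.dist(sub["loc"], msg["loc"]) / max_dist)
    return alpha * text + (1 - alpha) * spatial

def match(msg, subs, index, alpha, threshold, max_dist):
    """Filter by shared keywords, then verify each candidate exactly."""
    if threshold > 1 - alpha:
        # A subscription sharing no keyword scores at most 1 - alpha,
        # so it can never reach the threshold and is safe to skip.
        candidates = set().union(*(index.get(kw, set()) for kw in msg["keywords"]))
    else:
        # The keyword filter would be unsafe here; fall back to a full scan.
        candidates = set(subs)
    answers = {sid for sid in candidates
               if similarity(subs[sid], msg, alpha, max_dist) >= threshold}
    return answers, len(candidates)

subs = {
    1: {"keywords": {"adidas", "shoes"}, "loc": (0.0, 0.0)},
    2: {"keywords": {"coffee"}, "loc": (0.0, 0.0)},
    3: {"keywords": {"adidas", "discount"}, "loc": (9.0, 0.0)},
}
msg = {"keywords": {"adidas", "running", "shoes"}, "loc": (1.0, 0.0)}
index = build_index(subs)
answers, accessed = match(msg, subs, index, alpha=0.5, threshold=0.7, max_dist=10.0)
print(answers, accessed)
```

In this sketch the filter only helps when the threshold exceeds one minus the preference. That is one simple way to see why a smaller preference, which gives more weight to spatial similarity, leaves less room for keyword-based pruning.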

4.2.2. Cost between the Service Provider and Subscribers

We evaluate the metric VOS, that is, the VO size, which affects the communication overhead between the service provider and subscribers. Figure 8 shows VOS under the experimental settings when varying the preference and threshold parameters.

From Figure 8, we make the following observations. First, ALPF outperforms BL, since we process a batch of messages rather than one message at a time, and when many messages are delivered to a subscriber, the VO contains only one signature, which is computed using the root hash value of the TMR-tree. In BL, however, the VO includes the signature of every message. Second, with the increase of the preference (threshold), VOS increases (decreases) in a near-linear manner. The reason is that VOS depends on the number of messages delivered to each subscriber. When the preference (threshold) increases, the number of answers of each message increases (decreases) and, correspondingly, the number of messages delivered to each subscriber increases (decreases). Third, the biggest value of VOS is about 240 KB when . This value is acceptable, especially when more than one message needs to be verified by a subscriber.

4.2.3. Cost at the Subscribers

The last metric, AM, that is, the time of authenticating the messages at the subscriber side, is evaluated. AM is crucial since the subscribers may have limited computing resources. Figure 9 shows AM as a function of the preference and threshold parameters.

According to Figure 9, we first find that, in ALPF, it always costs subscribers less time to authenticate the messages delivered to them than in BL. This is because, when soundness is verified in ALPF, subscribers only need to decrypt one signature and recompute the root hash value of the TMR-tree to compare against it. In BL, however, the number of decryption operations equals the number of messages delivered to the subscriber, and decryption is expensive compared with hashing. Thus, ALPF outperforms BL. Second, we find that, with the increase of the preference (threshold), AM increases (decreases) in a near-linear manner, since AM is closely related to VOS and follows the same trend. Third, in the worst case, authenticating the messages costs subscribers about 1.2 s, which is reasonable and would not noticeably degrade the subscriber experience.

4.2.4. Security Analysis

In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. Therefore, our goal of security analysis is to prove that our proposed authenticated location-aware publish/subscribe framework can guarantee the verification of soundness and completeness of messages by their corresponding subscribers.

Proof of Soundness. Assume that a message delivered to a subscriber is bogus or modified. In this paper, we adopt the commonly used hash function SHA1 [3]. Because SHA1 is assumed to be collision-resistant and the hash value of the root of the TMR-tree is computed recursively from the messages that arrive within the delivery interval, which must include this message, the recomputed root hash value of the TMR-tree will not match the signature, and the subscriber detects the mismatch. Therefore, through our framework, subscribers can verify that the messages received from the service provider are sound.

Proof of Completeness. Consider a message that satisfies the parameters of a subscriber and should therefore be delivered to that subscriber. For the recomputed hash value of the root of the TMR-tree to match the signature (i.e., soundness is satisfied), there are the following two cases:
(i) The message is included in the corresponding VO. In this case, the subscriber can confirm whether this message is a result using the returned spatial and textual information of the message.
(ii) The message is not included in the corresponding VO. In this case, it must be in a subtree whose root node is included in the VO. However, the subscriber cannot confirm that this node fails to satisfy the parameters, since if the message is a result, the node must also satisfy them, which alerts the subscriber to a potential violation of completeness.
Therefore, through our framework, subscribers can verify that the messages received from the service provider are complete.

5. Related Work

Our work is related to location-aware publish/subscribe and authenticated query processing. Sections 5.1 and 5.2 review the related work in these areas.

5.1. Location-Aware Publish/Subscribe

Recently, location-aware publish/subscribe has attracted considerable attention. Most studies in this field can be categorized according to the method used to evaluate the relevancy between subscriptions and messages [2, 4–8]. In particular, [4–7] use a spatial region to indicate the spatial information of each subscription, spatial overlap to evaluate spatial similarity, and “AND”/“OR” semantics or Boolean expressions to evaluate textual relevancy, while [2, 8] combine the textual relevancy and spatial similarity into a ranking function to quantify the relevancy between subscriptions and messages.

More specifically, regarding the first category, Chen et al. [4] study the problem of matching Boolean range continuous queries over a stream of incoming spatio-textual messages in real time. A Boolean range continuous query continually retrieves the spatio-textual messages that arrive before the user-specified expiration time, are located in the query range, and satisfy the user’s keywords, which are connected by “AND” or “OR” semantics. The authors present the IQ-Tree, a hybrid index based on the Quad-tree and inverted files. In [5], Li et al. study location-aware publish/subscribe, which delivers a message to the subscribers whose subscriptions overlap spatially with the message and whose keywords are all contained in the message (“AND” semantics). They propose the Rt-tree, which extends the R-tree by selecting some representative keywords from subscriptions and adding them to R-tree nodes to enable textual pruning. The matching algorithms of both [4] and [5] follow the filtering-and-refinement paradigm. More recently, studying the same problem, Wang et al. [6] find that, in [4, 5], the spatial factor is always prioritized during index construction regardless of the keyword distribution of the query set, and that the inverted indexing technique is not well suited to textual filtering. Therefore, they combine keyword partitioning and space partitioning in one tree structure when constructing the index for queries, based on the expected matching cost. They compute the cost from the number of queries associated with each partition and the probability that the partition is explored during message matching, instead of from the complexity of the filter and verification steps. Guo et al. [7] study filtering dynamic streams for continuous moving Boolean subscriptions. Different from previous works, their approach continuously monitors users’ locations and sends nearby messages in real time, and it allows users to specify their interests with Boolean expressions, which provides better flexibility and expressiveness in shaping an interest.

With respect to the second category, as introduced in Section 2.3, Hu et al. [2] study parameterized location-aware publish/subscribe, which requires subscribers to specify parameters to enable personalized filtering. In [8], Chen et al. study top-k spatial-keyword publish/subscribe, which aims to continuously feed the user with new spatio-textual messages whose temporal spatial-keyword scores are ranked within the top k. They use a Quad-tree to partition the whole space. Each subscription is assigned to a number of covering cells, forming a disjoint partition of the entire space, and an inverted file ordered by subscription id is built to organize the subscriptions assigned to each cell.

5.2. Authenticated Query Processing

Authenticated query processing has been studied extensively. Most studies on query authentication are based on an authenticated data structure (ADS), the Merkle hash tree (MHT) [1], as introduced in Section 2.3. The notion of the MHT is generalized to multiway trees and widely adapted to various index structures. Typical examples include the Merkle B-tree and its variant, the Embedded Merkle B-tree [9]. Following the concept of the MHT, the authenticated query processing problem has also been studied for relational data [9, 10], data streams [11–15], and textual search engines [16].

In the spatial database domain, there are also many query authentication applications based on the MHT. Yang et al. [17] first introduce the query authentication problem to the domain of spatial data and study the authentication of spatial range queries. They propose an ADS called the MR-tree, which combines the ideas of the MB-tree [9] and the R*-tree [18]. Yiu et al. investigate how to efficiently authenticate moving kNN queries [19], moving range queries [20], and shortest-path queries [21]. More recently, Hu et al. [22] and Chen et al. [23] develop new schemes for range and top-k query authentication that preserve the location privacy of queried objects. Besides, Lin et al. [24] investigate the authentication of location-based skyline queries and propose a new ADS called the MR-Sky-tree. Authentication of reverse nearest neighbor queries is studied by Li et al. in [25]. For mixed data types, such as spatio-textual data, Su et al. [26] and Wu et al. [27] study the authentication problem for snapshot and moving top-k spatial-keyword queries, respectively. Yan et al. [28] explore the authentication problem in the area of spatio-textual similarity joins. Instead of supporting only relational data as [10] does, the authentication schemes proposed in [28] can support spatial data. Zhang et al. [29] study the authentication of location-based top-k queries, which ask for the POIs in a certain region with the highest ratings for a POI attribute of interest.

Besides the MHT, some other index structures can be used to construct the ADS, such as the Voronoi diagram and the prefix-tree. Hu et al. [30] propose a novel approach that authenticates spatial queries based on the neighborhood information derived from the Voronoi diagram. The problem of authenticating query results in data integration services is studied by Chen et al. in [31], which addresses multisource data authentication that can simultaneously support a wide range of query types. Based on the prefix-tree, they propose the Homomorphic Secret Sharing Seal, which merges the authentication codes of nonresult values with a common prefix, thus allowing them to be verified as a whole.

6. Conclusion

In this paper, we have studied the problem of authenticating messages in outsourced location-aware publish/subscribe services. We propose an authenticated location-aware publish/subscribe framework, which includes an ADS called the TMR-tree to organize the messages that arrive within each delivery interval, a filter-verification framework with inverted index pruning technique to efficiently deliver the messages to their relevant subscribers, and methods for constructing the VO and authenticating the delivered messages at the subscriber side. Experimental results on a real-world dataset show that our framework achieves high performance.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61502047 and the Coconstruction Program with the Beijing Municipal Commission of Education.