Abstract
Location-aware publish/subscribe is an important location-based service based on the server-initiated model. Often, the owner of massive spatio-textual messages and subscriptions outsources its location-aware publish/subscribe services to a third-party service provider, for example, a cloud service provider, which is responsible for delivering messages to their relevant subscribers. The issue arising here is that the messages delivered by the service provider might be tailored for profit purposes, intentionally or not. Therefore, it is essential to develop mechanisms that allow subscribers to verify the correctness of the messages delivered by the service provider. In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. We propose an authenticated framework that not only delivers the messages efficiently but also enables subscribers to authenticate them at low cost. Extensive experiments on a real-world dataset demonstrate the effectiveness and efficiency of our proposed authenticated framework.
1. Introduction
With the rapid development of the mobile Internet and positioning-enabled devices (e.g., smart phones), a massive amount of data containing both text information and geographical location information is being generated at an unprecedented scale on the Web. This enables location-based services (LBS), such as Foursquare (https://foursquare.com) and Yelp (https://www.yelp.com), to be extensively deployed in many systems and widely accepted by Internet users. Location-aware publish/subscribe is an important kind of LBS based on the server-initiated model (as opposed to the user-initiated model, such as spatial-keyword queries). For example, in a Groupon system, subscribers register their spatio-textual subscriptions to capture their interests (e.g., “Adidas shoe discount at Beijing, China”) (for the rest of this paper, we use “subscriber” and “subscription” interchangeably if the context is clear). For each Groupon message with textual description and location (e.g., “Adidas running shoes at cheap prices at Adidas factory store, Beijing, China”), the system delivers the message to relevant subscribers.
Location-aware publish/subscribe is a compute-intensive task. To deliver each message efficiently to its relevant subscribers, the data owner of massive spatio-textual messages and subscriptions would need to build up basic IT infrastructure and hire specialized personnel. However, as such cost might be unaffordable for small-to-medium businesses, outsourcing the data and computations to a third-party service provider (e.g., a cloud service provider) has become an appealing option. Yet, this outsourcing model presents a great challenge: the messages delivered by the service provider might be incomplete or incorrect. There are a variety of reasons for this. First, the service provider might deliver tailored messages to favor its sponsors. Second, the service provider might use inferior algorithms and deliver suboptimal messages to the subscribers to save computing resources. Third, with the growing popularity of the cloud, more and more security breaches and attacks on such systems have been reported. If an attacker takes control of the service provider’s server, it may forge the messages for its own interest.
The aforementioned reasons necessitate the development of mechanisms that allow subscribers to authenticate the messages delivered by the service provider. They should be verified in terms of two conditions: (1) soundness and (2) completeness. The former means that the messages are not tampered with, while the latter implies that no valid message is missing.
In this paper, to take one step further towards practical deployment of location-aware publish/subscribe in untrusted outsourcing environments, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. To address this problem, we present an authenticated location-aware publish/subscribe framework. We assume that messages may be delivered to their corresponding subscribers with a bounded maximum delay. The data owner organizes the messages that arrive within each such delay interval in an authenticated data structure (ADS) called the TMR-tree. Based on the TMR-tree, the service provider first computes the relevant messages for each subscription. During this process, we apply an inverted index pruning technique that reduces the number of traversals of the inverted index (used to index the subscriptions), thus improving the efficiency of computing the relevant messages for each subscription. Then, the service provider constructs a verification object (VO) for each subscription, with which the corresponding subscriber can authenticate the messages delivered to it. A thorough experimental study on a real-world dataset is conducted over a wide range of workload settings to evaluate the effectiveness and efficiency of our proposed framework in terms of various performance metrics.
Roadmap. The rest of this paper is organized as follows. Section 2 introduces some preliminaries, which include the system model, problem definition, and background knowledge. Section 3 presents our proposed authenticated location-aware publish/subscribe framework. In Section 4, we experimentally evaluate the performance of our proposed framework. Related work on location-aware publish/subscribe and authenticated query processing is surveyed in Section 5. Finally, we conclude the paper in Section 6.
2. Preliminaries
In this section, we first describe our system model. Then, we define the problem studied in this paper. Finally, we introduce some background knowledge on the cryptographic primitives and the location-aware publish/subscribe method that underlie our proposed framework.
2.1. System Model
As shown in Figure 1, our system involves four entities: the data owner, the service provider, the subscribers, and the key distribution center (KDC).
First, the data owner builds an authenticated data structure (ADS) over the messages that arrive within each delivery interval (recall that the interval length is a predefined maximum permissible delivery delay) and signs the ADS using the private key distributed by the KDC. Then, the data owner outsources the location-aware publish/subscribe services to the service provider, who provides the storage resources for the messages, the ADS, the signature of the ADS, and the algorithms. Based on the ADS, the service provider finds the messages that are relevant to the registered subscriptions and constructs a verification object (VO) for each subscription. After that, the service provider delivers the messages and the VO to the corresponding subscribers. The subscribers authenticate the soundness and completeness of these messages using the VO and the public key distributed by the KDC.
Throughout this paper, we assume that (1) the KDC and the data owner are trusted, but the service provider is a potential adversary and might fabricate the messages (intentionally or not); (2) neither the KDC nor the data owner colludes with the service provider; (3) the computation and storage capacities of the service provider are polynomially bounded.
2.2. Problem Definition
In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. That is, the subscribers first register their interests as subscriptions in the system. Then, the service provider not only needs to efficiently deliver the messages that arrive within each delivery interval to the subscribers whose subscriptions have high relevancy to the messages, but also needs to construct a VO for each subscriber so that the subscriber can authenticate the soundness and completeness of the delivered messages. The VO should be as small as possible to minimize the communication cost between the service provider and the subscribers. Meanwhile, the VO should be easy to verify, to minimize the computational cost at the subscriber side.
2.3. Background Knowledge
2.3.1. Cryptographic Primitives
We present the essential cryptographic primitives, namely, the one-way hash function, the cryptographic signature, and the Merkle hash tree, as follows.
One-Way Hash Function. A one-way hash function maps a message of arbitrary length to a fixed-length output, its hash value. It works in one direction: it is easy to compute the hash value of a given message, but it is computationally infeasible to find a message that maps to a given hash value.
Cryptographic Signature. A cryptographic signature (or simply signature) is a mathematical scheme for demonstrating the authenticity of a digital message. A signer applies for a pair of private key and public key from the KDC. The former is kept secret by the signer and the latter is publicly distributed. A digital message can be signed using the private key. The authenticity of the message can be verified by anyone who receives this message using the public key.
Merkle Hash Tree. The Merkle hash tree (MHT) [1] is an authenticated data structure used for collectively authenticating a set of messages. The MHT is a binary tree built in a bottom-up manner, by first computing the hash values of the messages in the leaf nodes. The hash value of each internal node is derived from its two child nodes. Finally, the hash value of the root is signed by the owner of the messages. The MHT can be used to authenticate any subset of messages, in conjunction with a proof. The proof consists of the signed root and the sibling nodes (auxiliary hash values) on the path from the root down to the messages that need to be authenticated.
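As a minimal sketch of the MHT idea described above, the following Python code builds the tree bottom-up and checks one message against the root with its sibling hashes. The use of SHA-256, the padding rule for levels with an odd number of hashes, and all function names are assumptions made for illustration; signing the root is left out, so `root` stands for a hash that the verifier already trusts.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash function (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_levels(messages):
    """Return the tree as a list of levels, leaves first; the last level holds the root."""
    level = [h(m) for m in messages]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed padding rule: repeat the last hash so every node has two children.
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return path


def verify(message, path, root):
    """Recompute the root from one message and its sibling hashes."""
    digest = h(message)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root
```

A verifier that holds only the authenticated root can thus check any single message with a number of hashes that is logarithmic in the number of messages.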
2.3.2. Location-Aware Publish/Subscribe
We present the state-of-the-art method [2] for location-aware publish/subscribe as follows.
A location-aware publish/subscribe service delivers each message to its relevant subscribers, who register spatio-textual subscriptions to capture their interests. Both a message and a subscription consist of a spatial location, given by its latitude and longitude, and a set of keywords, where each keyword is associated with a weight that can be set to the inverse document frequency (IDF) of the keyword. To quantify the relevancy between a subscription and a message, [2] used a spatio-textual similarity function (1) that combines a textual similarity function, which is similar to the weighted Jaccard coefficient, with a spatial similarity function. The spatial similarity is defined in terms of the Euclidean distance between the two locations and the maximum user-tolerated Euclidean distance between subscriptions and messages (which can be set to the maximum distance between subscriptions). A preference parameter tunes the relative weight of textual and spatial similarity. A subscription and a message are called relevant if their similarity exceeds a threshold. Since subscribers usually have different preferences and requirements on the preference parameter and the threshold (e.g., some subscribers prefer highly relevant results while others want to get more results), subscribers are allowed to set these two parameters themselves. A parameterized spatio-textual subscription therefore additionally carries its own preference parameter and threshold. Figure 2 shows an example of 11 parameterized spatio-textual subscriptions and 7 messages.
To deliver messages to relevant subscribers efficiently, [2] proposed a spatial-oriented prefix to prune irrelevant subscriptions and devised a filter-verification framework. In particular, with respect to the textual filter, [2] showed that if a subscription is relevant to a message, they must share at least one common keyword in the so-called prefix of the subscription, which is computed from a textual similarity threshold. More specifically, based on (1), since the spatial similarity cannot exceed 1, [2] deduced a textual similarity threshold for each subscription from its similarity threshold and preference parameter.
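When the textual similarity threshold is positive, [2] selected a prefix for each subscription based on this threshold. The keywords of the subscription are sorted by their weights in descending order, and the shortest leading run of keywords is chosen such that the total weight of the keywords after it falls below the bound determined by the textual similarity threshold and the total weight of the keywords in the subscription. This leading run is the prefix of the subscription. Since the total weight of the keywords after the prefix is smaller than that bound, if a subscription is relevant to a message, they must share at least one common keyword in the prefix.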
When the textual similarity threshold is not positive, a message may be relevant to the subscription regardless of whether they share common keywords. To address this issue, for such a subscription, [2] introduced a virtual dummy keyword with a weight of 0, and the prefix of the subscription includes all of its keywords and the dummy keyword.
Regarding the spatial filter, based on the first match keyword between a subscription and a message (i.e., the message does not contain any keyword that precedes it in the prefix of the subscription), [2] estimated an upper bound on the textual similarity between the subscription and the message. Accordingly, [2] estimated a lower bound on the spatial similarity that the subscription and the message must reach to be relevant. For any message, if its spatial similarity to the subscription is smaller than this lower spatial similarity bound, the subscription can be safely pruned.
Given a subscription and a message, we do not know in advance which keyword is their first match keyword (if any), and different messages have different first match keywords with respect to the same subscription. Therefore, [2] computed the lower spatial similarity bound for each keyword in the prefix. The prefix of each subscription, annotated with these lower spatial similarity bounds, is called the spatial-oriented prefix. If a subscription is relevant to a message, there must exist a keyword in the prefix, shared with the message, for which the spatial similarity between the subscription and the message is no smaller than the corresponding lower bound.
Based on the spatial-oriented prefix, [2] devised a filter-verification framework. In particular, an inverted index is first built on the spatial-oriented prefixes. Then, in the filter phase, for each message keyword, the framework retrieves the inverted list of that keyword and, for each subscription in the list, puts the subscription into the candidate set if its spatial similarity to the message is no smaller than the lower spatial similarity bound stored for that keyword. In the verification phase, based on (1), the framework verifies whether each candidate is an answer and, if so, delivers the message to it.
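The filter-verification idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of [2]: the similarity is assumed to be a linear combination `alpha * SSIM + (1 - alpha) * TSIM` with `0 < alpha < 1`, the textual similarity is assumed to be the weight of the shared keywords divided by the total keyword weight of the subscription, the spatial similarity is assumed to be `1 - dist / max_dist` clipped at 0, and "relevant" is read as similarity no smaller than the threshold. All names, including the `DUMMY` keyword, are hypothetical.

```python
import math
from collections import defaultdict

DUMMY = "*"  # hypothetical stand-in for the virtual dummy keyword


def ssim(s, m, max_dist):
    """Assumed spatial similarity: 1 - distance / max_dist, clipped at 0."""
    return max(0.0, 1.0 - math.dist(s["loc"], m["loc"]) / max_dist)


def tsim(s, m, w):
    """Assumed textual similarity: shared weight over the subscription's total weight."""
    total = sum(w[t] for t in s["kw"])
    return sum(w[t] for t in s["kw"] if t in m["kw"]) / total


def sim(s, m, w, max_dist):
    a = s["alpha"]
    return a * ssim(s, m, max_dist) + (1 - a) * tsim(s, m, w)


def spatial_prefix(s, w):
    """Return [(keyword, lower spatial bound)] for the prefix of subscription s."""
    a, theta = s["alpha"], s["theta"]
    kws = sorted(s["kw"], key=lambda t: -w[t])
    total = sum(w[t] for t in kws)
    theta_t = (theta - a) / (1 - a)  # textual threshold when spatial similarity is 1
    if theta_t > 0:
        p, rest = 0, total
        # Shortest prefix whose remaining weight is below theta_t * total.
        while p < len(kws) and rest >= theta_t * total:
            rest -= w[kws[p]]
            p += 1
        prefix = kws[:p]
    else:
        prefix = kws + [DUMMY]
    bounds, suffix = [], total
    for t in prefix:
        ub_text = suffix / total  # best textual similarity if t is the first match
        bounds.append((t, (theta - (1 - a) * ub_text) / a))
        suffix -= w.get(t, 0.0)
    return bounds


def build_index(subs, w):
    """Inverted index: keyword -> [(subscription id, lower spatial bound)]."""
    index = defaultdict(list)
    for sid, s in subs.items():
        for t, lb in spatial_prefix(s, w):
            index[t].append((sid, lb))
    return index


def deliver(m, subs, index, w, max_dist):
    """Return the ids of the subscriptions that message m should be delivered to."""
    candidates = set()
    for t in list(m["kw"]) + [DUMMY]:  # filter phase
        for sid, lb in index.get(t, []):
            if ssim(subs[sid], m, max_dist) >= lb:
                candidates.add(sid)
    return {sid for sid in candidates  # verification phase
            if sim(subs[sid], m, w, max_dist) >= subs[sid]["theta"]}
```

Under these assumptions the filter is only a necessary condition, so the result of `deliver` coincides with a brute-force scan of all subscriptions while touching only the inverted lists of the message keywords.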
3. Authenticated Location-Aware Publish/Subscribe Framework
In this section, we present our proposed authenticated location-aware publish/subscribe framework. In the publish/subscribe scenario, the messages delivered to the subscribers need to be verified for correctness (i.e., soundness and completeness). However, unlike the subscription data, the message data set is unbounded and can be regarded as a data stream. In such a situation, the data owner cannot construct an authenticated data structure (ADS) over the unbounded message data and, based on such a structure, have the VO constructed for the subscribers’ authentication. Intuitively, therefore, the data owner would need to sign every incoming message so that, when the signed message is delivered to its corresponding subscribers, they can authenticate it. However, when many messages need to be delivered to a single subscriber (e.g., a subscriber who registers many interests in the framework), every message carries its own signature, so the communication cost between the service provider and this subscriber is high. Moreover, since verifying a signature is not a cheap operation, the authentication cost at the subscriber is also high. To tackle this problem, we present an authenticated location-aware publish/subscribe framework, which not only delivers the messages more efficiently than the framework of the existing work [2], but also enables the subscribers’ authentication with low communication and authentication cost.
The main idea of our framework is to assume that messages may be delivered to their corresponding subscribers with a bounded maximum delay. Under this assumption, a batch of messages, rather than a single message, can be processed at a time. We organize these messages in a Merkle hash tree (MHT)-like structure (i.e., the TMR-tree). When more than one message is delivered to a subscriber, only one signature is returned, thereby reducing the communication and authentication cost. Moreover, recall that, in [2], an inverted index of the spatial-oriented prefixes of all the subscriptions is constructed and, when a message arrives, the framework searches the inverted index to determine which subscriptions are relevant to this message. The message must be compared with every subscription in the inverted list of each of its keywords. To reduce the computational cost and improve the efficiency of message delivery, we present an inverted index pruning technique. By exploiting the spatial proximity of the messages in a batch (the messages are also organized in an R-tree-like structure), we can prune from the inverted index those subscriptions that cannot become delivery destinations, so that they need not be involved in further computation.
3.1. Text-Aware Merkle R-Tree (TMR-Tree)
We first introduce the method of constructing the ADS, called the Text-aware Merkle R-tree (TMR-tree), at the data owner side. Consider a predefined maximum permissible delivery delay. The data owner builds one TMR-tree over all the messages that arrive within every time interval of that length. Specifically, the TMR-tree has four main features:
(i) The messages in each interval are spatially organized in an R-tree.
(ii) Each node has a pseudo-text, which is the union of the keywords in its children’s texts. For example, the pseudo-text of a node with two children is the union of the texts (or pseudo-texts) of these two children.
(iii) Similar to the MHT, the TMR-tree stores one hash value in each node. Assume the default fanout of the TMR-tree is 2. A leaf node stores a hash value computed over its two child messages, and an internal node stores a hash value computed over its two child nodes together with their hash values. More specifically, the spatial and textual information of each child is involved in the computation: the hash of a leaf node covers the location and text of each child message, and the hash of an internal node covers the MBR, pseudo-text, and hash value of each child node.
(iv) The hash value of the root of the TMR-tree is signed by the data owner, producing the root signature.
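The four features above can be illustrated with the following Python sketch. It is not the authors' construction: the grouping of messages into nodes (sorting by coordinates and pairing neighbours) replaces a real R-tree insertion algorithm, the byte encoding of locations and texts is hypothetical, SHA-256 is an assumed hash function, and signing the root hash is omitted.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Entry:
    mbr: tuple                # (xmin, ymin, xmax, ymax); a message is a degenerate box
    text: frozenset           # text of a message or pseudo-text of a node
    children: list = field(default_factory=list)  # empty for a message
    digest: bytes = b""       # stored hash value; unused for a message


def encode(e):
    """Hypothetical serialization of spatial and textual information."""
    return (repr(e.mbr) + "|" + ",".join(sorted(e.text))).encode()


def message(x, y, *keywords):
    return Entry((x, y, x, y), frozenset(keywords))


def make_node(children):
    """Build a node whose MBR, pseudo-text and hash summarize its children."""
    xs0, ys0, xs1, ys1 = zip(*(c.mbr for c in children))
    mbr = (min(xs0), min(ys0), max(xs1), max(ys1))
    text = frozenset().union(*(c.text for c in children))
    # Leaf nodes hash location and text of each message; internal nodes
    # additionally include the hash value stored in each child node.
    payload = b"".join(encode(c) + c.digest for c in children)
    return Entry(mbr, text, list(children), hashlib.sha256(payload).digest())


def build_tmr_tree(messages, fanout=2):
    """Bottom-up construction over a non-empty batch of messages."""
    level = sorted(messages, key=lambda e: e.mbr[:2])  # assumed grouping heuristic
    while len(level) > 1 or not level[0].children:
        level = [make_node(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]
```

Because every hash covers the spatial and textual information of the children, changing a single keyword or coordinate of any message changes the root hash that the data owner signs.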
Example 1. Figure 3 shows an example of the TMR-tree constructed over the messages in Figure 2. Here the fanout of the TMR-tree is set to 2. Each leaf node has a pseudo-text that is the union of its children’s texts. Since every node, whether leaf or internal, is represented by a rectangular area (the Minimum Bounding Rectangle (MBR) of the messages or of the other MBRs it contains), its location is defined by two points, which can be the bottom-left and upper-right corners of the rectangle. For example, a leaf node is the MBR of its two messages, and its location consists of the rectangle’s bottom-left and upper-right points. A leaf node stores a hash value that summarizes the authentication information of its subtree and is computed from the spatial and textual information of its children (messages). Similarly, the hash value stored in an internal node is computed from the spatial and textual information of its children and their hash values.
3.2. Filter-Verification Framework with Inverted Index Pruning
We adopt the idea of the filter-verification framework proposed in [2] and, based on it, present an inverted index pruning technique to reduce the computational cost, thereby improving the efficiency of message delivery.
Since a location and a pseudo-text are associated with each node in the TMR-tree, each node can be treated as a dummy message. The main idea of our inverted index pruning technique is based on the following proposition.
Proposition 2. If some subscriptions in the inverted index of spatial-oriented prefixes are not relevant to a node (a dummy message) in the TMR-tree, these subscriptions can be safely pruned from the index, since the node’s children (other dummy messages or true messages) cannot be relevant to these subscriptions either.
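Proof. If a subscription in the index is not relevant to a node, then, for every prefix keyword of the subscription that appears in the pseudo-text of the node, the spatial similarity between the subscription and the node is smaller than the corresponding lower spatial similarity bound. According to the spatial similarity function (3), the spatial similarity does not increase as the distance grows. For any child of the node, the child lies inside the MBR of the node, so its distance to the subscription is no smaller than that of the node, and hence its spatial similarity to the subscription is no larger than that of the node. Moreover, the keywords of the child are a subset of the pseudo-text of the node. Therefore, the subscription is not relevant to the child either.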
Algorithm 1 shows the pseudocode of our framework with the inverted index pruning technique. It takes the TMR-tree and the inverted index of spatial-oriented prefixes as input. For each node and each message in the TMR-tree, we maintain a pruned inverted index, which contains the subscriptions that are likely to be relevant to that node or message and which is obtained by pruning the pruned inverted index of its parent (its parent index). In the beginning, we set the parent index of the root to the inverted index of the spatial-oriented prefixes of all the subscriptions, without any pruning (line (1)). Then, we initialize an empty stack and push the root onto it (lines (2) and (3)). In the filter phase, elements are popped from the stack and processed until the stack is empty (lines (4)–(11)). First, we pop an element from the stack (line (5)). Then, for each keyword in the text of the element, we retrieve the inverted list of that keyword from the element’s parent index and, for each subscription in the list, add the subscription to the element’s pruned index if its spatial similarity to the element is no smaller than the corresponding lower spatial similarity bound (lines (6)–(9)). Subscriptions that do not satisfy the condition in line (8) are thereby pruned. If the element’s pruned index is not empty and the element is not a message, that is, some subscriptions might still be relevant to the element’s children, the children are pushed onto the stack (lines (10) and (11)). The subscriptions in the pruned index of each message are the candidates that might be relevant to it. In the verification phase, for each message, we verify whether each candidate is an answer and, if so, add it to the answer set of the message (lines (12)–(15)). After all the messages in the TMR-tree have been processed, all the answer sets, one for each message that arrived within the delivery interval, are returned together (line (16)). Each subscription in the answer set of a message is a delivery destination of that message.
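The filter phase described above can be sketched in Python as follows. The sketch covers only the pruning traversal and rests on several assumptions: the spatial similarity is taken to be `1 - dist / max_dist` clipped at 0, the distance between a subscription and a node is taken to be the minimum distance from the subscription's location to the node's MBR, and the inverted index is a plain dictionary from keyword to `(subscription id, lower spatial bound)` pairs. The names `msg`, `node`, `DUMMY`, and `filter_with_pruning` are hypothetical; the verification phase would then apply (1) to each returned candidate.

```python
import math

DUMMY = "*"  # hypothetical name for the virtual dummy keyword


def msg(mid, x, y, *keywords):
    """A real message: a point location plus its keywords."""
    return {"id": mid, "mbr": (x, y, x, y), "text": set(keywords)}


def node(*children):
    """A TMR-tree node: MBR and pseudo-text are the union over its children."""
    boxes = [c["mbr"] for c in children]
    mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {"mbr": mbr, "text": set().union(*(c["text"] for c in children)),
            "children": list(children)}


def ssim_upper(point, mbr, max_dist):
    """Spatial similarity between a location and the nearest point of an MBR."""
    dx = max(mbr[0] - point[0], 0.0, point[0] - mbr[2])
    dy = max(mbr[1] - point[1], 0.0, point[1] - mbr[3])
    return max(0.0, 1.0 - math.hypot(dx, dy) / max_dist)


def filter_with_pruning(root, full_index, sub_loc, max_dist):
    """Filter phase only: return {message id: candidate subscription ids}."""
    candidates = {}
    stack = [(root, full_index)]  # (entry, pruned index of its parent)
    while stack:
        entry, parent_index = stack.pop()
        pruned = {}
        for t in entry["text"] | {DUMMY}:
            kept = [(sid, lb) for sid, lb in parent_index.get(t, [])
                    if ssim_upper(sub_loc[sid], entry["mbr"], max_dist) >= lb]
            if kept:
                pruned[t] = kept
        if "children" not in entry:  # a real message: record its candidates
            candidates[entry["id"]] = {sid for lst in pruned.values() for sid, _ in lst}
        elif pruned:  # descend only while some subscription survives
            stack.extend((child, pruned) for child in entry["children"])
    return candidates
```

Since a child lies inside its parent's MBR and its keywords are a subset of the parent's pseudo-text, any subscription that passes the test at a message also passes it at every ancestor, so the candidates obtained with pruning equal those obtained by scanning the unpruned index for each message separately.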

Example 3. Figure 4 illustrates the procedure of our proposed inverted index pruning technique in the filter-verification framework. In step ①, the root is first popped from the stack. Then, for each keyword in the pseudo-text of the root, we retrieve the corresponding inverted list from the unpruned inverted index and compute the spatial similarity between the root and each subscription in this list. For one subscription in the example, the spatial similarity to the root equals 1, which is greater than its lower spatial similarity bound, so the subscription is added to the root’s pruned index. For several other subscriptions, the spatial similarity to the root is smaller than their lower spatial similarity bounds; that is, they do not satisfy the condition in line (8) of Algorithm 1. Therefore, they are pruned and are not added to the root’s pruned index. Then, since the root’s pruned index is not empty, its two children are pushed onto the stack. Similarly, in step ②, further subscriptions are pruned. Note that, in step ③, since the pseudo-text of the visited node does not contain two of the indexed keywords, we need not compute the spatial similarity between the node and the subscriptions in the inverted lists of these keywords, and these subscriptions are also pruned. Finally, in step ④, only five candidates that might be relevant to the message remain. Compared with scanning all the subscriptions in the unpruned inverted index, the computational cost is reduced substantially.
Time Complexity. To ease the comparison between the state-of-the-art method for location-aware publish/subscribe and our proposed filter-verification framework with the inverted index pruning technique, we first give the time complexity of the filter-verification framework proposed in [2] as the following proposition.
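Proposition 4. The time complexity of delivering one message to its relevant subscribers in the filter-verification framework proposed in [2] is the sum of the lengths of the inverted lists of the message’s keywords, taken from the inverted index built on the spatial-oriented prefixes, plus the cost of verifying the resulting candidates.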
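Proof. In the filter phase, for each keyword in the message (including the dummy keyword), the framework retrieves the inverted list of the keyword and, for each subscription in the list, checks whether its spatial similarity to the message is no smaller than the corresponding lower spatial similarity bound; if so, the subscription is a candidate for the message and we add it to the candidate set. Therefore, the time complexity of filtering is linear in the total length of the inverted lists of the message’s keywords.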
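In the verification phase, based on the spatio-textual similarity function (see (1)), a subscription and a message are relevant if and only if their similarity reaches the threshold of the subscription. The spatial similarity can be computed in constant time, and the total keyword weight of the subscription can be materialized in advance; thus the remaining cost lies in the textual similarity. To compute it, we check whether each keyword of the subscription appears in the message. If so, we add the corresponding weight to the textual similarity. To facilitate the check, we build a hash map over the keywords of the message. (We only need to build the hash map once per message.) Thus the time complexity of verifying a subscription is linear in its number of keywords. Therefore, the time complexity of delivering one message to its relevant subscribers in the filter-verification framework proposed in [2] is the filtering cost plus the verification cost of the candidates.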
The time complexity of our proposed filter-verification framework with the inverted index pruning technique is given by the following proposition.
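Proposition 5. The time complexity of delivering the messages that arrive within one delivery interval to their relevant subscribers in our proposed filter-verification framework with the inverted index pruning technique is the sum, over all nodes and messages of the TMR-tree, of the size of the parent inverted index visited at each of them, plus the verification cost. It depends on the fanout of the TMR-tree, which determines the number of nodes (each of which can also be treated as a dummy message), and on the inverted index associated with each node’s parent, which contains the subscriptions likely to be relevant to that parent.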
Proof. A TMR-tree is constructed over the messages that arrive within one delivery interval. Given the fanout of the TMR-tree, in the worst case, the height of the TMR-tree (excluding the layer of messages) is logarithmic in the number of messages. Thus, the number of internal and leaf nodes in the TMR-tree (assuming the root has depth 1) is the sum of the numbers of nodes over all levels. In the filter phase, when visiting a node (or a message) in the TMR-tree, we retrieve its parent index, prune it, and generate the pruned index of the node (or message). Therefore, the time complexity of filtering with inverted index pruning is the total size of the parent indexes visited over all nodes and messages.
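In the verification phase, suppose each message within the delivery interval is delivered to only one subscriber. The time complexity of verifying whether the subscriptions are the answers of the messages is then linear in the number of messages times the number of keywords per subscription. Therefore, the time complexity of delivering the messages that arrive within one delivery interval to their relevant subscribers in our proposed filter-verification framework with the inverted index pruning technique is the sum of this filtering cost and this verification cost.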
Compared with the filter-verification framework proposed in [2], our proposed filter-verification framework with the inverted index pruning technique needs to visit more inverted indexes (the inverted indexes associated with the internal and leaf nodes in the TMR-tree). However, since the subscriptions in each node’s inverted index are progressively pruned on the way from the root to the leaf nodes of the TMR-tree, the total number of inverted list entries traversed is reduced. Our framework is therefore expected to be more efficient, which is confirmed by our experimental study (Section 4).
3.3. VO Construction and Authentication
After finding the subscribers who are the delivery destinations of the messages, that is, the answer sets, the service provider still needs to construct a VO for each subscriber for authentication. Algorithm 2 shows the pseudocode of constructing the VO. It takes the TMR-tree and the answer set of each message as input. First, for each subscription in the answer sets, we initialize a VO with the root of the TMR-tree (lines (1) and (2)). Then, we initialize an empty queue and put the root into it (lines (3) and (4)). Elements are picked from the queue and processed until the queue is empty (lines (5)–(12)). When the distance between a message and the picked element is smaller than 0, that is, the message is in the subtree rooted at the element, then, for each subscription in the answer set of the message, we replace the element in the VO of the subscription with three parts: (1) the token “[”; (2) the element’s children; and (3) the token “]” (lines (6)–(10)). Note that we use a pair of tokens “[” and “]” to indicate the scope of the entries in a node. Then, if the element’s children are not in the queue and they are not messages, they are put into the queue (lines (11) and (12)). Finally, the constructed VO is delivered to each subscriber together with the corresponding messages (line (13)), one VO for each subscriber to whom messages will be delivered.

Example 6. Following the example in Figure 2, after computing the delivery destinations of the seven messages, we obtain their answer sets. Here we take the construction of the VO of one subscription, to which four messages will be delivered, as an example (shown in Figure 5). In step ①, the root is picked from the queue. The distances between the four messages and the root are all smaller than 0. Therefore, we replace “root” in the VO with the bracketed list of the root’s two children. Since these two children are not in the queue and they are not messages, they are put into the queue. Steps ②, ③, and ④ are similar to ①. In step ⑤, since none of the four messages is in the subtree rooted at the picked node, that is, their distances to it are all greater than 0, the entry of this node in the VO is not replaced and thus the VO remains unchanged. For a similar reason, the VO remains unchanged in step ⑥. Finally, in step ⑦, after the last element in the queue has been processed, the VO is complete.
To authenticate the soundness of the delivered messages, each subscriber needs to scan its VO to recompute the hash value of the root of the TMR-tree and compare it against the root signature using the data owner’s public key distributed by the KDC. Since each VO includes the entries that have been visited during message delivery, the subscriber can simulate the traversal of the TMR-tree, recursively reconstruct each MBR, and compute its hash value in a bottom-up manner. Specifically, each MBR and its hash value can be computed from the entries of its child node, which are delimited by “[” and “]”.
To authenticate the completeness of the delivered messages, the subscriber needs to check that each message in the results is indeed present in the VO and that it satisfies the subscription’s preference parameter and threshold. In addition, the subscriber needs to check that the other entries returned in the VO do not satisfy them.
Example 7. Still taking the VO of Example 6 as an example, the subscriber can recursively reconstruct each node from the two entries enclosed in its pair of brackets, level by level, and finally reconstruct the root from its two children and compute its hash value, which is compared against the root signature to authenticate the soundness of the four delivered messages. To authenticate the completeness of these four messages, the subscriber needs to recompute whether they satisfy the subscription’s preference parameter and threshold, while the remaining entries in the VO do not.
From the example we can see that when more than one message is delivered to a subscriber, only one signature is returned, thus reducing the communication and authentication cost.
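The VO construction and the soundness check can be sketched together in Python. The sketch is a recursive reformulation rather than the queue-based Algorithm 2, a Python list stands for a pair of tokens “[” and “]”, the serialization `enc` and the use of SHA-256 are assumptions, and the signature check is replaced by a comparison with a root hash that is assumed to be already authenticated. All function names are hypothetical.

```python
import hashlib


def H(data):
    return hashlib.sha256(data).digest()


def enc(mbr, text):
    """Hypothetical serialization of spatial and textual information."""
    return (repr(tuple(mbr)) + "|" + ",".join(sorted(text))).encode()


def union_mbr(boxes):
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def msg(mid, x, y, *keywords):
    return {"id": mid, "mbr": (x, y, x, y), "text": frozenset(keywords)}


def node(*children):
    digest = H(b"".join(enc(c["mbr"], c["text"]) + c.get("hash", b"") for c in children))
    return {"mbr": union_mbr([c["mbr"] for c in children]),
            "text": frozenset().union(*(c["text"] for c in children)),
            "hash": digest, "children": list(children)}


def holds(entry, delivered):
    """True if the subtree rooted at entry contains a delivered message."""
    if "children" not in entry:
        return entry["id"] in delivered
    return any(holds(c, delivered) for c in entry["children"])


def build_vo(entry, delivered):
    """Service provider side: expand only the nodes that hold a delivered message."""
    if "children" in entry and holds(entry, delivered):
        return [build_vo(c, delivered) for c in entry["children"]]
    # Unexpanded entry: a message, or a dummy message with its stored hash.
    return {k: v for k, v in entry.items() if k != "children"}


def rebuild(vo):
    """Subscriber side: recompute (MBR, pseudo-text, hash) of a bracketed entry list."""
    parts = [rebuild(e) if isinstance(e, list)
             else (e["mbr"], e["text"], e.get("hash", b"")) for e in vo]
    return (union_mbr([p[0] for p in parts]),
            frozenset().union(*(p[1] for p in parts)),
            H(b"".join(enc(m, t) + h for m, t, h in parts)))
```

The subscriber accepts the delivered messages as sound only if the hash returned by `rebuild` matches the signed root hash. For completeness, it would additionally recompute the similarity of every entry in the VO and check that exactly the delivered messages satisfy its parameters.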
Space and Time Complexity. We first give a baseline method for the problem of authenticating messages in outsourced location-aware publish/subscribe services. Then, we give the space complexity of its VO and compare it with that of our proposed authenticated location-aware publish/subscribe framework. We also compare the time complexity of authentication in the baseline method and in our framework. Baseline: the data owner signs every message within the delivery interval and, when the signed messages are delivered to their corresponding subscribers, each VO consists of the delivered messages and their signatures. Then, the subscriber can verify the soundness by computing the hash value of each message in the VO and comparing it against the message’s signature. Recomputing the spatio-textual similarity between each message in the VO and the subscription enables the subscriber to verify the completeness.
The space complexity of the VO of the baseline method is given by the following proposition.
Proposition 8. If several messages are delivered to a subscriber at one time, the size of the VO, that is, its space complexity, is the sum over these messages of the size of the message plus the size of a signature, where the size of each message includes the size of its spatial and textual information.
In comparison, the space complexity of the VO of our proposed authenticated location-aware publish/subscribe framework is given by the following proposition.
Proposition 9. If several messages are delivered to a subscriber at one time, the space complexity of the VO is the total size of these messages, plus the total size of the dummy messages included in the VO, plus the size of one signature.
From the above propositions we can see that, in our proposed filter-verification framework with the inverted index pruning technique, if more than one message is delivered to a subscriber, only one signature is returned. Although our framework includes dummy messages in its VO, the VO is still smaller than that of the baseline method when the number of delivered messages is large, since signatures are space consuming.
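A back-of-the-envelope reading of Propositions 8 and 9 makes the trade-off explicit. The symbols below are introduced here only for illustration: k is the number of messages delivered to a subscriber, j is the number of dummy messages in the VO, |m| and |n| are assumed common sizes of a message and of a dummy message, and |sig| is the size of a signature.

```latex
\begin{align*}
|VO_{\mathrm{baseline}}| &= k\,\bigl(|m| + |sig|\bigr), \\
|VO_{\mathrm{ours}}|     &= k\,|m| + j\,|n| + |sig|, \\
|VO_{\mathrm{ours}}| < |VO_{\mathrm{baseline}}|
  &\iff j\,|n| < (k-1)\,|sig|.
\end{align*}
```

Under these assumptions, the saving grows with k as long as the space taken by the dummy messages grows more slowly than the space of the k - 1 signatures that are no longer sent.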
Since the authentication time is correlated with the size of the VO, the time complexity of authentication of our proposed filter-verification framework with the inverted index pruning technique is also smaller than that of the baseline method.
4. Experimental Study
In this section, we conduct extensive experiments to evaluate the performance of our proposed authenticated location-aware publish/subscribe framework.
4.1. Experiment Setup
4.1.1. Datasets
Similar to [2], we use a real-world dataset, POI, which contains 10 million points of interest (POIs) in the USA. We randomly select 1–5 keywords from each POI to generate subscriptions. Thus the average number of keywords in each subscription is 3. The maximum permissible response delay and the message delivery interval are both set to 5 minutes. During this interval, we randomly select 2000 POIs as messages. To generate long messages, we combine 10 POIs into a single message. The average number of keywords in each message is 41.
4.1.2. Parameters
The performance of our proposed framework is evaluated by varying the preference parameter (0.1, 0.3, 0.5, 0.7, and 0.9) and the threshold (0.5, 0.6, 0.7, 0.8, and 0.9). In the default setting, the preference parameter is 0.5 and the threshold is 0.7. When we vary one parameter, the other is kept at its default value. We use inverse document frequency (IDF) to generate keyword weights.
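As an illustration of IDF-based keyword weighting, the following minimal sketch computes a weight for every keyword in a small collection. The exact IDF variant used in the experiments is not stated, so the common form log(N / df) is an assumption here, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return {keyword: log(N / df)} where N is the number of documents and
    df is the number of documents containing the keyword (assumed variant)."""
    n = len(documents)
    document_frequency = Counter()
    for keywords in documents:
        # Count each keyword at most once per document.
        document_frequency.update(set(keywords))
    return {term: math.log(n / df) for term, df in document_frequency.items()}


# Hypothetical keyword sets, for illustration only.
docs = [
    ["adidas", "shoes"],
    ["adidas", "discount"],
    ["coffee"],
]
weights = idf_weights(docs)
```

Rare keywords receive larger weights than frequent ones, so a match on a rare keyword contributes more to textual relevancy.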
4.1.3. System Configuration
All experiments are run on a server with an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.5 GHz (quad core) and 64 GB RAM, running Ubuntu Linux. We use an in-memory setting, and the programs are implemented in C++.
4.1.4. Performance Metrics
The metrics for performance evaluation include the following:
(i) PAS and PC: percentage of accessed subscriptions and percentage of candidates, that is, the ratios of the subscriptions accessed in the inverted index of spatial-oriented prefixes and of the candidates to the total number of subscriptions
(ii) FS: time of finding the relevant subscriptions for each message within the delivery interval
(iii) CVO: time of constructing the VO
(iv) VOS: VO size, which affects the communication cost between the service provider and subscribers
(v) AM: time of authenticating the messages at the subscribers' side
Note that, in our framework, we process a batch of messages at one time; thus each time we first get a total value of each metric. Then, for the metrics PAS, PC, and FS, we report the average value corresponding to each message and, for the metrics CVO, VOS, and AM, we report the average value corresponding to each subscriber.
4.1.5. Algorithms
For metrics (i), (ii), and (iii), the algorithms evaluated in our experiments are (1) SP (the method of finding the relevant subscriptions for each message using the spatial-oriented prefixes, which is proposed in [2]); (2) SP + IIP (our filter-verification framework with the inverted index pruning technique); and (3) VOC (our method of constructing the VO).
For metrics (iv) and (v), the algorithms evaluated are (1) ALPF (our authenticated location-aware publish/subscribe framework) and (2) BL (the baseline method).
Note that, to the best of our knowledge, this is the first attempt to define and solve the problem of authenticating messages in outsourced location-aware publish/subscribe services. Therefore, no existing algorithm is included in our experiments for comparison.
4.2. Performance Study
4.2.1. Cost at the Service Provider
The cost at the service provider is evaluated from two aspects. First, in Figure 6, we evaluate the ratios of accessed subscriptions and candidates to the total number of subscriptions (PAS and PC) as a function of the preference parameter and the threshold, where the accessed subscriptions are those accessed in the inverted index and the candidates are the subscriptions verified using the Verify function in Algorithm 1. Second, as shown in Figure 7, we evaluate the running time as a function of the preference parameter and the threshold, which includes the time of finding the relevant subscriptions for each message (FS) and the time of constructing the VO (CVO).
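Figure 6: PAS and PC, with panels (a) and (b) each varying one of the two parameters.
Figure 7: Running time (FS and CVO), with panels (a) and (b) each varying one of the two parameters.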
According to Figures 6 and 7, we make the following observations. First, SP + IIP outperforms SP; that is, the PAS and PC of SP + IIP are both smaller than those of SP (Figure 6), and the FS of SP + IIP is smaller than that of SP (Figure 7). The reason is that SP + IIP uses the inverted index pruning technique to prune subscriptions from the inverted index of spatial-oriented prefixes. These pruned subscriptions are not relevant to the messages and thus need not be involved in the computation. Second, as the threshold increases, the performance of SP and SP + IIP improves, because a larger threshold means that fewer subscriptions need to be visited and verified and there is greater opportunity to prune irrelevant subscriptions. Third, as the preference parameter decreases, SP and SP + IIP take much longer, because for a smaller preference parameter the spatial similarity is more important and the methods cannot estimate accurate prefix bounds. Fourth, as the preference parameter (threshold) increases, CVO increases (decreases) slightly, since we use the answers of each message to construct the VO and CVO depends on the number of answers: as the preference parameter (threshold) increases, we get more (fewer) answers and thus CVO increases (decreases). Fifth, although our framework spends extra time constructing the VO for subscribers' authentication, the total running time (FS + CVO) is still better than that of SP. For example, in one setting shown in Figure 7, SP + IIP costs around 60 ms and VOC costs about 20 ms, so the total running time is about 80 ms, which is still less than the 90 ms cost of SP.
4.2.2. Cost between the Service Provider and Subscribers
We evaluate the metric VOS, that is, the VO size, which affects the communication overhead between the service provider and subscribers. Figure 8 shows VOS under the experimental settings when varying the preference parameter and the threshold.
Figure 8: VOS, with panels (a) and (b) each varying one of the two parameters.
From Figure 8, we make the following observations. First, ALPF outperforms BL because we process a batch of messages rather than one message at a time: when many messages are delivered to a subscriber, the VO contains only one signature, which is computed using the root hash value of the TMR-tree. In BL, however, the VO includes a signature for every message. Second, as the preference parameter (threshold) increases, VOS increases (decreases) in a near-linear manner. The reason is that VOS depends on the number of messages delivered to each subscriber. When the preference parameter (threshold) increases, the number of answers of each message increases (decreases) and, correspondingly, the number of messages delivered to each subscriber increases (decreases). Third, the largest observed value of VOS is about 240 KB. This value is acceptable, especially when more than one message needs to be verified by a subscriber.
4.2.3. Cost at the Subscribers
The last metric to be evaluated is AM, that is, the time of authenticating the messages at the subscribers' side. AM is crucial since the subscribers may have limited computing resources. Figure 9 shows AM as a function of the preference parameter and the threshold.
Figure 9: AM, with panels (a) and (b) each varying one of the two parameters.
According to Figure 9, we first find that ALPF always requires less time than BL for subscribers to authenticate the messages delivered to them. This is because, when soundness is verified in ALPF, subscribers only need to decrypt one signature and recompute the root hash value of the TMR-tree to compare against it. In BL, however, the number of decryption operations equals the number of messages delivered to the subscriber, and decryption is not a cheap operation compared with hashing. Thus, ALPF outperforms BL. Second, we find that, as the preference parameter (threshold) increases, AM increases (decreases) in a near-linear manner, since AM is closely related to VOS and follows the same trend. Third, in the worst case, authenticating the messages costs subscribers about 1.2 s, which is reasonable and should not noticeably degrade the subscribers' experience.
4.2.4. Security Analysis
In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. Therefore, the goal of our security analysis is to prove that our proposed authenticated location-aware publish/subscribe framework enables subscribers to verify the soundness and completeness of the messages delivered to them.
Proof of Soundness. Assume that a message delivered to a subscriber is bogus or has been modified. In this paper, we adopt the commonly used hash function SHA-1 [3]. Because SHA-1 is assumed to be collision-resistant and the hash value of the root of the TMR-tree is computed recursively from the messages that arrive within the delivery interval, which must include the message in question, the recomputed root hash value of the TMR-tree will not match the signature, and the subscriber can detect the mismatch. Therefore, through our framework, subscribers can verify that the messages received from the service provider are sound.
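The soundness argument can be illustrated by recomputing a root hash over a batch of messages and comparing it with a trusted value. The following is a minimal sketch under several assumptions: it uses a plain binary hash tree rather than the TMR-tree, it models the signed root simply as a trusted root hash (signature verification is omitted), and the function names, the domain-separation prefixes, and the sample messages are hypothetical. SHA-1 is used only to mirror the text.

```python
import hashlib


def leaf_hash(message: bytes) -> bytes:
    # Prefix 0x00 separates leaf hashes from inner-node hashes.
    return hashlib.sha1(b"\x00" + message).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an inner node.
    return hashlib.sha1(b"\x01" + left + right).digest()


def root_hash(messages):
    """Recursively combine message hashes into a single root hash."""
    if not messages:
        raise ValueError("at least one message is required")
    level = [leaf_hash(m) for m in messages]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            # An unpaired last node is combined with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            next_level.append(node_hash(left, right))
        level = next_level
    return level[0]


# Hypothetical batch; the owner would sign trusted_root.
batch = [b"msg-1: shoes, Beijing", b"msg-2: coffee, Beijing", b"msg-3: books, Beijing"]
trusted_root = root_hash(batch)

# A modified message changes the recomputed root, so the check fails.
tampered = [batch[0], b"msg-2: COFFEE, Shanghai", batch[2]]
sound = root_hash(batch) == trusted_root
detected = root_hash(tampered) != trusted_root
```

Any change to a message propagates up to the root, so a subscriber who recomputes the root and compares it with the signed value detects the modification unless a hash collision occurs.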
Proof of Completeness. Consider a message that satisfies the preference parameter and the threshold of a subscriber and should therefore be delivered to that subscriber. For the recomputed hash value of the root of the TMR-tree to match the signature (i.e., the soundness is satisfied), there are the following two cases:
(i) The message is included in the corresponding VO. In this case, the subscriber can confirm whether this message is a result by using the returned spatial and textual information of the message.
(ii) The message is not included in the corresponding VO. In this case, it must be in a subtree whose root node is included in the VO. However, the subscriber cannot confirm that this node fails to satisfy the preference parameter and the threshold, since if the message is a result, the node must also satisfy them, which alerts the subscriber to a potential violation of completeness.
Therefore, through our framework, subscribers can verify that the messages received from the service provider are complete.
5. Related Work
Our work is related to location-aware publish/subscribe and authenticated query processing. Sections 5.1 and 5.2 review the related work in these areas.
5.1. Location-Aware Publish/Subscribe
Recently, location-aware publish/subscribe has attracted considerable attention. Most studies in this field can be categorized according to how they evaluate the relevancy between subscriptions and messages [2, 4–8]. In particular, [4–7] use a spatial region to represent the spatial information of each subscription, spatial overlap to evaluate spatial similarity, and “AND”/“OR” semantics or Boolean expressions to evaluate textual relevancy, while [2, 8] combine textual relevancy and spatial similarity into a ranking function to quantify the relevancy between subscriptions and messages.
More specifically, regarding the first category, Chen et al. [4] study the problem of matching Boolean range continuous queries over a stream of incoming spatio-textual messages in real time. A Boolean range continuous query continually retrieves the spatio-textual messages that arrive before the user-specified expiration time, satisfy the user's keywords connected by “AND” or “OR” semantics, and are located in the query range. The authors present the IQ-tree, a hybrid index based on the Quadtree and inverted files. In [5], Li et al. study location-aware publish/subscribe, which delivers a message to the subscribers whose subscriptions have spatial overlap with the message and whose keywords are all contained in the message (“AND” semantics). They propose a tree index that extends the R-tree by selecting some representative keywords from subscriptions and adding them to R-tree nodes to enable textual pruning. The matching algorithms of both [4] and [5] follow the filtering-and-refinement paradigm. More recently, Wang et al. [6] study the same problem and find that, in [4, 5], the spatial factor is always prioritized during index construction regardless of the keyword distribution of the query set, and that the inverted indexing technique is not well suited to textual filtering. Therefore, they apply both keyword partitioning and space partitioning in one tree structure when constructing the index for queries, guided by the expected matching cost. They compute the cost based on the number of queries associated with each partition and the probability that the partition is explored during message matching, instead of the complexity of the filter and verification steps. Guo et al. [7] study filtering dynamic streams for continuous moving Boolean subscriptions. Different from previous works, their approach continuously monitors users' locations and sends nearby messages in real time, and it allows users to specify their interests with Boolean expressions, which provides better flexibility and expressiveness in shaping an interest.
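The Boolean matching semantics used by this first category can be sketched as a simple predicate. This is a minimal illustration only, not the matching algorithm of any cited work: it omits all indexing, models the subscription region as an axis-aligned rectangle and the message location as a point, and every class and function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


@dataclass(frozen=True)
class Subscription:
    region: Rect
    keywords: frozenset
    mode: str = "AND"  # "AND": all keywords required; "OR": any keyword suffices


@dataclass(frozen=True)
class Message:
    x: float
    y: float
    keywords: frozenset


def matches(sub: Subscription, msg: Message) -> bool:
    """Spatial test first, then the textual test under AND/OR semantics."""
    if not sub.region.contains(msg.x, msg.y):
        return False
    if sub.mode == "AND":
        return sub.keywords <= msg.keywords
    if sub.mode == "OR":
        return bool(sub.keywords & msg.keywords)
    raise ValueError(f"unknown mode: {sub.mode}")


# Hypothetical data, for illustration only.
area = Rect(0.0, 0.0, 2.0, 2.0)
msg = Message(1.0, 1.0, frozenset({"adidas", "shoes", "discount"}))
sub_and = Subscription(area, frozenset({"adidas", "discount"}), "AND")
sub_and_miss = Subscription(area, frozenset({"adidas", "nike"}), "AND")
sub_or = Subscription(area, frozenset({"adidas", "nike"}), "OR")
sub_far = Subscription(Rect(5.0, 5.0, 6.0, 6.0), frozenset({"adidas"}), "AND")
```

The indexes discussed above exist to avoid evaluating such a predicate against every subscription; the predicate itself only fixes what counts as a match.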
With respect to the second category, as introduced in Section 2.3, Hu et al. [2] study parameterized location-aware publish/subscribe, which requires subscribers to specify parameters to enable personalized filtering. In [8], Chen et al. study top-k spatial-keyword publish/subscribe, which aims to continuously feed the user with new spatio-textual messages whose temporal spatial-keyword scores are ranked within the top k. They use a Quadtree to partition the whole space. Each subscription is assigned to a number of covering cells, which form a disjoint partition of the entire space, and an inverted file ordered by subscription id is built to organize the subscriptions assigned to each cell.
5.2. Authenticated Query Processing
Authenticated query processing has been studied extensively. Most studies on query authentication are based on an authenticated data structure (ADS), the Merkle hash tree (MHT) [1], as introduced in Section 2.3. The notion of the MHT has been generalized to multiway trees and widely adapted to various index structures. Typical examples include the Merkle B-tree and its variant, the Embedded Merkle B-tree [9]. Following the concept of the MHT, the authenticated query processing problem has also been studied for relational data [9, 10], data streams [11–15], and textual search engines [16].
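The basic mechanism of a Merkle hash tree can be sketched as follows: a verifier holding only a trusted root hash checks one item using the sibling hashes on its path to the root. This is a minimal, generic illustration, not the construction of any cited scheme; SHA-256, the prefix bytes, the rule of pairing an unpaired node with itself, and all function names are assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(items):
    """Return all tree levels, from leaf hashes up to the single root hash.
    Requires a non-empty list of items."""
    level = [h(b"\x00" + item) for item in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair an unpaired node with itself
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect (sibling_hash, node_is_right_child) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # unpaired node: its sibling is itself
        proof.append((level[sibling], index % 2 == 1))
        index //= 2
    return proof


def verify(item: bytes, proof, root: bytes) -> bool:
    """Recompute the root from the item and its proof, then compare."""
    digest = h(b"\x00" + item)
    for sibling, node_is_right in proof:
        if node_is_right:
            digest = h(b"\x01" + sibling + digest)
        else:
            digest = h(b"\x01" + digest + sibling)
    return digest == root


# Hypothetical records, for illustration only.
records = [b"r0", b"r1", b"r2", b"r3", b"r4"]
levels = build_levels(records)
root = levels[-1][0]
proof_for_r4 = make_proof(levels, 4)
```

The proof grows with the height of the tree rather than with the number of records, which is why tree-based authenticated structures keep the verification object small.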
In the spatial database domain, there are also many query authentication applications based on the MHT. Yang et al. [17] first introduce the query authentication problem to the domain of spatial data and study the authentication of spatial range queries. They propose an ADS called the MR-tree, which combines the ideas of the MB-tree [9] and the tree structure of [18]. Yiu et al. investigate how to efficiently authenticate moving kNN queries [19], moving range queries [20], and shortest-path queries [21]. More recently, Hu et al. [22] and Chen et al. [23] develop new schemes for range and top-k query authentication that preserve the location privacy of queried objects. In addition, Lin et al. [24] investigate the authentication of location-based skyline queries and propose a new ADS called the MR-Sky-tree. Authentication of reverse nearest neighbor queries is studied by Li et al. in [25]. For mixed data types, such as spatio-textual data, Su et al. [26] and Wu et al. [27] study the authentication problem for snapshot and moving top-k spatial-keyword queries, respectively. Yan et al. [28] explore the authentication problem in the area of spatio-textual similarity joins. Instead of supporting only relational data as [10] does, the authentication schemes proposed in [28] can support spatial data. Zhang et al. [29] study the authentication of location-based top-k queries, which ask for the POIs in a certain region that have the highest ratings for a POI attribute of interest.
Besides the MHT, there are other index structures that can be used to construct the ADS, such as the Voronoi diagram and the prefix tree. Hu et al. [30] propose a novel approach that authenticates spatial queries based on the neighborhood information derived from the Voronoi diagram. The problem of authenticating query results in data integration services is studied by Chen et al. in [31], which addresses multisource data authentication that can simultaneously support a wide range of query types. Based on the prefix tree, they propose the Homomorphic Secret Sharing Seal, which merges the authentication codes of non-result values with a common prefix, thus allowing them to be verified as a whole.
6. Conclusion
In this paper, we have studied the problem of authenticating messages in outsourced location-aware publish/subscribe services. We propose an authenticated location-aware publish/subscribe framework, which includes an ADS, the TMR-tree, to organize the messages that arrive within the delivery interval; a filter-verification framework with the inverted index pruning technique to efficiently deliver the messages to their relevant subscribers; and methods for constructing the VO and authenticating the delivered messages at the subscribers' side. Experimental results on a real-world dataset show that, compared with the baseline method, our framework reduces both the VO size and the authentication time at the subscribers' side.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61502047 and the Co-construction Program with the Beijing Municipal Commission of Education.