Abstract

Indoor shopping trajectories, which can be derived from user-generated WiFi logs using indoor localization technology, provide a new approach to understanding users’ behaviour patterns in urban shopping malls. In this paper, we propose a location-aware Point-of-Interest (POI) recommendation service for urban shopping malls that offers a user a set of indoor POIs by considering both personal interest and location preference. The POI recommendation service can not only improve the user’s shopping experience but also help store owners better understand users’ shopping preferences and intent. Specifically, the proposed method consists of two phases: offline modelling and online recommendation. The offline modelling phase learns user preference by mining his/her historical shopping trajectories. The online recommendation phase automatically produces the top-K recommended POIs based on the learnt preference. To demonstrate the utility of our proposed approach, we have performed a comprehensive experimental evaluation on a real-world dataset collected from 468 users over 33 days. The experimental results show that the proposed recommendation service achieves much better recommendation performance than several existing benchmark methods.

1. Introduction

Indoor location-based services, such as shopping flow monitoring, mobile location-based advertisement, and POI recommendation, are expected to grow significantly in the next decade due to the popularity of mobile devices and the development of indoor positioning technologies. Previous studies on this topic mainly focus on providing basic services, such as indoor positioning [1], indoor navigation [2], or indoor tracking [3]. By contrast, few studies perform in-depth analysis of users’ location information in indoor environments, which is a fundamental context for location-based services. For instance, the time customers spend in shops, the way customers reach these shops (e.g., going directly to a shop or just wandering around the shopping mall), and the frequency with which customers check in at shops are all useful for understanding users’ behaviour patterns and preferences.

Similar to online behaviour analysis in E-commerce, this kind of in-depth location analysis is called physical analytics [4], which has been demonstrated as a revolutionary new technology for connecting consumers with shop brands. Specifically, physical analytics in a shopping mall can benefit three parties: (1) For users, context-aware personal services (e.g., personalized recommendation and optimal shopping routes) can be provided based on their preferences derived from physical analytics. (2) For shop owners, physical analytics benefit targeted advertising, since potential consumers can be found based on their preferences. (3) For shopping mall managers, physical analytics can monitor traffic flow in real time and discover correlations between shops and users, which is useful for optimizing the shopping mall layout. For E-commerce recommendation, website cookies are sufficient to learn a user’s preference. Physical analytics, however, faces a serious challenge because users’ behaviour information, such as their shopping trajectories, is difficult to obtain. Even worse, users’ location information in indoor environments is usually incomplete and scattered. Fortunately, WiFi check-in logs provide a new platform for generating users’ trajectories in indoor environments, since free WiFi is increasingly available in many indoor spaces, such as urban shopping malls and museums. Additionally, customers’ check-in activities imply their preferences: since most people have a finite amount of resources (e.g., time and money), they tend to visit their favourite shops (as shown in Figure 1).

As mobile devices and social media become more pervasive, user-generated information (e.g., check-in records) from these platforms provides rich information for understanding user preference in depth. Recently, a few studies [5, 6] on POI recommendation from location-based social networks (LBSNs) have been proposed, but these approaches face several challenges. The first challenge is that check-in frequency alone is insufficient to learn customer preference; other pieces of context information (such as the residence time of a visit) can better reflect the level of a customer’s interest in a shop. The second challenge is data sparsity: check-ins are few in reality (most customers visit a shop only once), so check-in frequency alone may not be sufficient to learn their preferences. Moreover, existing POI recommendation methods in LBSNs cannot make recommendations for people who are not members of the LBSN.

To tackle these challenges, we propose an indoor POI recommendation method for urban shopping malls with the following three contributions:
(i) It generates the user’s indoor spatiotemporal trajectory from user-generated WiFi logs. The proposed approach first recognizes POI entrances by utilizing the fluctuation of WiFi radio signal strength within a small time window when the user passes a physical boundary point, and then splits the WiFi logs into subsequences according to the identified POI entrances. Finally, the approach maps each subsequence to a POI based on indoor fingerprint-based localization.
(ii) It utilizes a two-layer relation graph to capture the multiple relations among users and POIs from users’ historical spatiotemporal trajectories. It then estimates the User-User relation with a random walk-based propagation algorithm and performs indoor POI recommendation using user-based collaborative filtering.
(iii) We evaluate our method on indoor POI recommendation using a real-world dataset collected from 468 users over 33 days, on which it outperforms six baselines.

Since check-in frequency alone is insufficient to learn a user’s preference, our approach extracts various kinds of context information (such as check-in frequency and check-in time) from the user’s indoor spatiotemporal trajectory for mining their preference. In addition, our approach collects user-generated WiFi logs by passive crowdsourcing, which requires neither additional infrastructure nor user involvement. Therefore, our proposed approach can make recommendations for users who are not members of an LBSN. To deal with the data sparsity challenge, our approach infers the User-User relation using a random walk-based propagation algorithm. A random walk on the graph can alleviate the sparsity problem in indoor POI recommendation by utilizing both the User-Store relation and the User-User relation. Typically, most users have few check-ins and tend to visit a small number of POIs; thus the data on directly connected vertices (e.g., User-POI vertex pairs and User-User vertex pairs) are sparse. Fortunately, one vertex can reach another vertex through intermediate vertices (denoted as a hidden propagation path), and these hidden propagation paths allow a better estimate of the relation strength between two vertices that are not directly connected.

The remainder of the paper is organized as follows. Section 2 surveys related work on mining indoor trajectories and indoor POI recommendation. Section 3 introduces our proposed approach for generating indoor trajectories using WiFi RSSI. Section 4 describes our indoor POI recommendation algorithm in detail. Section 5 reports the experimental results. Finally, we present our conclusion and future work in Section 6.

2. Related Work

In this section, we survey related work on generating indoor trajectories and on indoor POI recommendation and discuss how these works differ from our research.

2.1. Generating Indoor Trajectory

We are aware of only a few works [7–10] that directly address the generation of users’ trajectories in indoor space. Specifically, Prentow et al. [7] proposed a bootstrapping approach to construct indoor trajectories for mitigating indoor positioning error biases. Werner et al. [8] utilized WiFi RSSI as an information source to infer indoor trajectories from a given trajectory database. Radaelli et al. [9] utilized sequential pattern mining to identify typical trajectories of indoor objects by exploiting presence sensors, such as Bluetooth and RFID. Dakkak et al. [10] presented a method to generate indoor trajectories using classical predictors and digital fractional differentiation.

Fortunately, a number of works on indoor tracking are relevant to generating indoor trajectories, as it is simple to obtain an indoor trajectory from movement tracking data. In general, existing studies on indoor tracking can be divided into three categories: (1) Indoor tracking by location fingerprinting-based approaches [2, 11]. The key idea is to obtain the user’s location using a fingerprinting-based positioning algorithm, which usually consists of two phases: offline construction of the fingerprint map and online positioning. (2) Indoor tracking by triangulation positioning approaches [3]. The main idea is to obtain the user’s location using at least three anchors for triangulation. (3) Indoor tracking by dead reckoning approaches [12], which calculate the user’s current position from a previously determined position using the built-in sensors (e.g., gyroscope, accelerometer, and compass) of mobile devices.

However, fingerprinting-based indoor tracking is time-consuming and vulnerable to many factors, such as heterogeneous devices or environmental changes. Triangulation positioning incurs a heavy infrastructure cost since it requires anchors that cover the indoor space. Dead reckoning approaches rely on the initial location and thus suffer from cumulative error.

2.2. Indoor POI Recommendation

To the best of our knowledge, only a few studies [13–15] address the problem of indoor POI recommendation based on users’ trajectories. Specifically, Lin [13] proposed an indoor location system that regards the stay time in certain shops as an item rating. Fang et al. [15] mined customers’ preferences from WiFi RSSI patterns, namely, the time spent in a store and the check-in frequency of the store, and then proposed a recommendation system for indoor shopping. Jin et al. [14] proposed an indoor hotspot detection method that considers users’ interests in locations and the relationship between users and locations.

We also surveyed related works on indoor recommendation that consider the user’s context. For instance, [16] proposed a store recommendation system that mines the context of decision-making behaviour using eye-tracking data, [17] proposed a POI discovery approach that matches the user profile with semantically enhanced POIs, [18] proposed a recommender system that helps users shop for technical products by considering user preference and technical product attributes, and [19] proposed an automatic mobile assistant for museum visits based on WiFi-based indoor positioning. Additionally, Shin et al. [20] constructed an indoor database platform for indoor location-based services.

Our proposed indoor POI recommendation method differs from the above-mentioned works in the following three aspects: (1) We generate users’ indoor trajectories using opt-in WLAN, without the time-consuming and labour-intensive construction of a fingerprint map for each small grid of indoor space required in [2, 11]. (2) We use only the indoor trajectories to learn users’ preferences for making recommendations, unlike [17, 18], which need additional user profiles. (3) Existing indoor POI recommendation algorithms, such as [7–10], merely use user-based or item-based collaborative filtering for making recommendations and suffer from the data sparsity problem, since numerous users have only a few check-ins. To address this challenge, we first utilize a two-layer relation graph to capture the multiple relations among users and POIs from users’ historical trajectories. Then, we infer the User-User relation with a random walk-based propagation algorithm and perform indoor POI recommendation using user-based collaborative filtering.

3. Generate Indoor Spatiotemporal Trajectory

Unlike outdoor trajectories, which can be easily obtained from the large number of user-generated GPS traces, users’ indoor moving trajectories are considerably more difficult to obtain. To generate a user’s indoor trajectory, our approach utilizes WiFi RSSI, taking advantage of the widespread deployment of WLAN infrastructure in indoor spaces. We formally describe our method for generating indoor trajectories as follows.

3.1. Problem Definition

For ease of the following presentation, we first define the key data structures and notations used in the problem of generating indoor spatiotemporal trajectory.

Definition 1 (indoor POI). $P = \{p_1, p_2, \ldots, p_n\}$ denotes the set of POIs, where $n$ is the number of POIs and a POI $p_i$ refers to an indoor geographical region that may be useful or interesting for a user.

Definition 2 (indoor spatiotemporal trajectory). An indoor spatiotemporal trajectory is a sequence of indoor POIs consecutively visited by a user, defined as $Traj = \langle v_1, v_2, \ldots, v_m \rangle$, where $m$ is the number of POIs visited by user $u$ in one visit and $v_i = (u, p_i, t_i)$ is a triple meaning that user $u$ checks in at POI $p_i$ at timestamp $t_i$.

In short, user’s visiting history can be regarded as a set of indoor trajectories. Clearly, user’s trajectories imply spatial and temporal information.

Definition 3 (POI feature). A POI feature is defined as a tuple $f = (p, \mathbf{r})$, where $\mathbf{r} = (r_1, r_2, \ldots, r_k)$ is a $k$-dimensional vector holding the scanned WiFi RSSI recorded from the surrounding WiFi APs in POI $p$, and $k$ is the number of WiFi access points (APs) in the indoor space.
According to the principle of indoor fingerprint-based localization [21], the WiFi RSSI collected in a physical place can be regarded as the location landmark for positioning.

Definition 4 (POI feature set). The POI feature set is the collection of the features of all POIs, denoted by $F$.

Definition 5 (WiFi logs). A user-generated WiFi log is defined as $L = \{l_1, l_2, \ldots, l_T\}$, where $l_t$ denotes the WiFi RSSI record scanned by the user’s mobile device at time $t$, $1 \le t \le T$.

As mentioned above, our approach utilizes the existing WiFi infrastructure to derive users’ interaction behaviours from WiFi logs, which requires neither additional infrastructure nor user involvement. To achieve both goals, we utilize WiFi probe requests to collect the data in a nonintrusive way. WiFi probe requests are frames that are broadcast by mobile phones to discover nearby WiFi APs and can be sniffed by WiFi-compatible antennas on 802.11b/g/n channels. According to [22], mobile phones broadcast WiFi probe requests every few seconds. Collecting data through WiFi probe requests therefore allowed us to track every mobile device that connects to the WiFi infrastructure. In our experiment, every device that connects to the WLAN infrastructure at each POI has agreed to this data collection as part of the sign-on agreement. To protect privacy, we collect user-generated WiFi logs as hashed entities with no additional knowledge about them and stop collecting data when the user leaves the shopping mall. We believe that this is a privacy-safe application.

Based on the above definitions, we formulate the problem of generating a user’s indoor spatiotemporal trajectory as follows: given the user-generated WiFi logs $L$ of one visit and the POI feature set $F$, obtain the corresponding indoor spatiotemporal trajectory $Traj$.

3.2. Solving Approach

The idea behind our approach is to utilize the WiFi RSSI fluctuation within a small time window when the user passes a physical boundary point. The principle is that WiFi RSSI is relatively stable within a small area according to the indoor propagation model of wireless signals [23]. However, some indoor physical constraints, such as POI entrances and stairs, cause the WiFi RSSI to change dramatically even within a small area. Table 1 shows an RSSI sequence extracted from a small time window that includes 7 WiFi RSSI records collected while a user walks through a POI entrance: the first records are collected outside the POI, the following ones near the POI entrance, and the last ones inside the POI. We further calculate the Euclidean distance between each pair of adjacent records. The WiFi RSSI clearly shows a dramatic “jump” when people walk through the POI entrance.
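The adjacent-record distance can be illustrated with a minimal sketch. The RSSI values below are hypothetical and only mimic the outside / entrance / inside pattern described for Table 1; the function name `adjacent_distances` is likewise an assumption made for illustration.

```python
import math

def adjacent_distances(records):
    """Euclidean distance between each pair of consecutive RSSI vectors."""
    return [math.dist(a, b) for a, b in zip(records, records[1:])]

# Hypothetical window of 7 records over 3 APs (dBm); not the values of Table 1.
window = [
    (-60, -75, -90), (-61, -74, -90), (-60, -76, -89),  # outside the POI
    (-70, -66, -80),                                     # near the entrance
    (-82, -55, -70), (-81, -56, -71), (-82, -55, -70),   # inside the POI
]

dists = adjacent_distances(window)
# Index of the largest change between two consecutive records.
jump_index = max(range(len(dists)), key=dists.__getitem__)
```

In this toy window the distances between records on the same side of the entrance stay small, while the distances around the entrance record are an order of magnitude larger, which is the "jump" the approach relies on.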

Based on the above analysis, our approach generates user’s indoor spatiotemporal trajectory by the following two steps.

Step 1 (recognize POI entrance). Given the user-generated WiFi logs $L$, we define $V_w$ as the WiFi RSSI variation in time window $w$, as shown in (1), where $k$ is the number of WiFi APs and $v_j$ is the variation of the WiFi RSSI from WiFi AP $j$ during this time window, as calculated in (2), where $\bar{r}_j$ is the average WiFi RSSI from AP $j$ and $r_j^t$ is the WiFi RSSI from AP $j$ at timestamp $t$.

Based on the WiFi RSSI variation, we obtain the times at which the user may pass a POI entrance: a time segment is a candidate if its WiFi RSSI variation is larger than a threshold $\theta$.
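A minimal sketch of this thresholding step is given below. The exact form of the variation is not recoverable from the text, so the sketch assumes that the per-AP variation is the variance of that AP's RSSI inside the window and that the window variation is the sum over APs; the names `window_variation`, `candidate_entrance_times`, and `theta` are illustrative.

```python
def window_variation(window):
    """Assumed form: sum over APs of the variance of that AP's RSSI in the window."""
    n_aps = len(window[0])
    total = 0.0
    for j in range(n_aps):
        values = [record[j] for record in window]
        mean_j = sum(values) / len(values)
        total += sum((v - mean_j) ** 2 for v in values) / len(values)
    return total

def candidate_entrance_times(log, w, theta):
    """Start indices of the windows of size w whose variation exceeds theta."""
    return [t for t in range(len(log) - w + 1)
            if window_variation(log[t:t + w]) > theta]

# Hypothetical log over 2 APs: 5 stable records outside, then 5 stable records inside.
log = [(-60, -80)] * 5 + [(-80, -60)] * 5
candidates = candidate_entrance_times(log, w=4, theta=50)
```

Only the windows that straddle the change are reported; windows that lie entirely on one side have zero variation under this assumed definition.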

As mentioned above, we recognize indoor POI entrances using the WiFi RSSI jump characteristic, which may produce some false recognitions, since other factors (e.g., crowd passing and room layout change) may result in a similar RSSI jump. However, the RSSI jump caused by these factors is temporary, while that caused by POI entrances is stable (there is an obvious RSSI jump whenever users pass through a POI entrance). Therefore, we utilize a user-specific threshold method to remove false recognitions. Formally, let $r_t$ be the scanned WiFi RSSI record at which the user may walk through a POI entrance, and let $\delta$ and $\eta$ be the user-specific thresholds. We consider $r_t$ a false recognition if condition (3) holds, where $d(r_t, r_i)$ is the Euclidean distance between $r_t$ and $r_i$ and $N$ is the number of RSSI records $r_i$ whose Euclidean distance to $r_t$ is smaller than $\delta$.

Step 2 (map WiFi RSSI to POI). After obtaining the time set $TS$ at which the user walks through POI entrances, we first split the WiFi logs $L$ into a subsequence set $S$ and then map each subsequence to the corresponding POI using the POI feature set $F$. Finally, we construct the indoor spatiotemporal trajectory according to Definition 2.

The framework of generating indoor spatiotemporal trajectory is described in Algorithm 1.

Require: (1) User-generated WiFi logs $L$; (2) POI
     fingerprint set $F$.
Ensure: Indoor spatiotemporal trajectory $Traj$.
   Initialize user trajectory $Traj = \emptyset$.
   Obtain the time set $TS$ at which the user walks through
      POI entrances according to RSSI variation.
   Split $L$ into a subsequence set $S = \{s_1, \ldots, s_m\}$ at the times in $TS$.
   for each $s_i \in S$ do
       Initialize POI set $PS = \emptyset$.
       for each record $l_t \in s_i$ do
              Obtain the corresponding POI $p_t$ at time $t$ using nearest
                 neighbour.
              Add $p_t$ to $PS$.
       end for
       Find the POI $p^*$ that occurs most often in $PS$.
       Obtain the timestamp $ts_i$ of $s_i$.
       Construct triple $(u, p^*, ts_i)$ and add it to trajectory $Traj$.
   end for
   return Indoor spatiotemporal trajectory $Traj$.
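The mapping stage of Algorithm 1 can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes that the entrance times are already available as indices into the log, that the nearest neighbour uses Euclidean distance, and that the timestamp of a subsequence is that of its first record. The names `nearest_poi` and `build_trajectory` are hypothetical.

```python
import math
from collections import Counter

def nearest_poi(record, fingerprints):
    """fingerprints: list of (poi_id, rssi_vector); 1-nearest neighbour by Euclidean distance."""
    return min(fingerprints, key=lambda f: math.dist(record, f[1]))[0]

def build_trajectory(user, log, entrance_times, fingerprints):
    """log: list of (timestamp, rssi_vector); entrance_times: sorted indices into log."""
    bounds = [0] + list(entrance_times) + [len(log)]
    trajectory = []
    for start, end in zip(bounds, bounds[1:]):
        segment = log[start:end]
        if not segment:
            continue
        # Map every record of the subsequence to a POI, then take a majority vote.
        votes = Counter(nearest_poi(r, fingerprints) for _, r in segment)
        poi = votes.most_common(1)[0][0]
        # Assumption: the visit is stamped with the first record of the subsequence.
        trajectory.append((user, poi, segment[0][0]))
    return trajectory

# Hypothetical fingerprints and log over 2 APs.
fingerprints = [("A", (-50, -90)), ("B", (-90, -50))]
log = [(0, (-52, -88)), (1, (-51, -91)), (2, (-88, -52)), (3, (-89, -49))]
traj = build_trajectory("u1", log, entrance_times=[2], fingerprints=fingerprints)
```

The majority vote makes the POI label of a subsequence robust to a few individual records that are matched to the wrong fingerprint.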

4. Indoor POI Recommendation Algorithm

In this section, we first introduce the key data structures and notations used in our POI recommendation algorithm and then present the offline modelling part and online recommendation part of the proposed algorithm.

4.1. Preliminary

For ease of the following presentation, we define the key data structures and notations used in the proposed approach. Table 2 lists the relevant notations used in this paper.

Definition 6 (User-POI visiting interaction). A visiting interaction is a triple $(u, p, ts)$, which means that user $u$ visits POI $p$ at a particular timestamp $ts$. The history of interactions between users and POIs is given by the set of all such triples.

Note that the interaction between a user and a POI carries three-dimensional information: user, POI, and timestamp, which provides rich information for learning the latent relationship between user and POI. Typically, different users are interested in different POIs in an indoor environment. For instance, customers choose which shops to visit according to their income level or visit clothing shops based on their personal dress style. The timestamp of an interaction carries certain behaviour patterns; for example, people usually eat lunch between 12:00 and 14:00, which means that a customer has a higher probability of checking in at a restaurant than at a clothing store during this period. In addition, the stay time when a customer visits different shops implies her preference: since most users have finite resources, more time spent in a shop indicates more interest in it. The time dimension can therefore help to better recommend POIs to users.

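Definition 7 (User-POI relation). The relation strength $r_{u,p}$ indicates the interest of user $u$ in POI $p$. The interest of $u$ in all POIs is defined as a vector $\mathbf{r}_u = (r_{u,p_1}, r_{u,p_2}, \ldots, r_{u,p_n})$, where $n$ is the number of POIs.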

Definition 8 (User-User relation). The relation strength $s_{u,v}$ indicates the similarity of the POI interests of two users $u$ and $v$.

Definition 9 (relation graph). We denote the graph by $G = (V, E)$, which is an undirected bipartite graph. Here $V = U \cup P$, where $U$ and $P$ are the sets of users and POIs, respectively. Edges $E = E_{UU} \cup E_{UP}$, where $E_{UU}$ represents the relation between users, and $E_{UP}$ represents the relation strength between users and POIs.

4.2. Offline Modelling

In this subsection, we first describe the offline modelling part of the proposed method, a graph-based model that captures the User-User and User-POI relations, and then present the inference process. The goal of the inference process is to derive a pairwise relevance score for each pair of users. To achieve this, a random walk with restart (RWR) [24] is performed on the constructed relation graph, and the relevance scores reflect the relation strength between users.

4.2.1. Relation Graph Construction

Our approach for constructing relation graph is comprised of two stages.

In the first stage, we calculate the initial relation strength of User-POI and User-User from user’s visiting trajectories. More exactly, the two kinds of relation strength calculation are described as follows.

User-POI Relation Calculation. We calculate the initial relation strength of a user to a POI by considering two factors: the check-in frequency of the POI and the average stay time at the POI. The more often a user visits a POI and the longer the user stays, the greater the user’s interest in this POI. Therefore, we calculate the User-POI relation strength as shown in (4), where $t_{u,p}^{j}$ is the stay time of user $u$ in POI $p$ in trajectory $j$, $t_{u,\max}^{j}$ is the longest POI stay time of user $u$ in trajectory $j$, and $T_u$ is the set of history trajectories of $u$.
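As a hedged illustration of how check-in frequency and stay time can be combined, the sketch below assumes one plausible form: for every historical trajectory, the stay time at a POI is divided by the longest stay time in that trajectory, and these normalised values are summed over trajectories. This form and the name `user_poi_strength` are assumptions; the original equation may differ.

```python
def user_poi_strength(trajectories):
    """trajectories: list of dicts {poi_id: stay_time_seconds}, one dict per mall visit.

    Assumed form: sum over trajectories of (stay time / longest stay time in
    that trajectory). Repeated visits and long stays both raise the score.
    """
    strength = {}
    for stays in trajectories:
        longest = max(stays.values(), default=0)
        if longest <= 0:
            continue
        for poi, stay in stays.items():
            strength[poi] = strength.get(poi, 0.0) + stay / longest
    return strength

# Hypothetical history of one user: two mall visits.
history = [{"A": 600, "B": 300}, {"A": 200}]
strengths = user_poi_strength(history)
```

Normalising by the longest stay of each trajectory keeps a single very long mall visit from dominating the score.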

User-User Relation Calculation. Following the principle of collaborative filtering, we use Pearson’s correlation [25] to measure the User-User relation strength based on the User-POI relation. The relation strength is defined as in (5), where $\bar{r}_u$ and $\bar{r}_v$, respectively, represent the average relation strength that users $u$ and $v$ have given to all POIs, $P_{uv}$ represents the POIs that are visited by both users $u$ and $v$, and $\bar{r}_u$ and $\bar{r}_v$ are calculated according to (6).
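The standard Pearson correlation over co-visited POIs can be sketched as follows. The sketch assumes that each user's average is taken over all POIs for which that user has a relation strength, and it returns 0 when two users share no POI or when the denominator vanishes; both conventions and the name `pearson_similarity` are assumptions.

```python
import math

def pearson_similarity(r_u, r_v):
    """r_u, r_v: {poi_id: relation strength} for two users."""
    common = r_u.keys() & r_v.keys()  # POIs visited by both users
    if not common:
        return 0.0
    mean_u = sum(r_u.values()) / len(r_u)
    mean_v = sum(r_v.values()) / len(r_v)
    num = sum((r_u[p] - mean_u) * (r_v[p] - mean_v) for p in common)
    den_u = math.sqrt(sum((r_u[p] - mean_u) ** 2 for p in common))
    den_v = math.sqrt(sum((r_v[p] - mean_v) ** 2 for p in common))
    den = den_u * den_v
    return num / den if den else 0.0

# Hypothetical strengths: user v rates every POI twice as high as user u.
sim = pearson_similarity({"A": 2.0, "B": 1.0, "C": 0.5},
                         {"A": 4.0, "B": 2.0, "C": 1.0})
```

Because the two hypothetical users differ only by a scale factor, their similarity is 1 up to floating-point error.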

At the second stage, we model the POIs, users, and User-POI interactions in a relation graph based on the two kinds of relation strengths obtained from the first stage, as shown in Figure 2. The relation graph contains a user layer and a POI layer. Each node in the user layer represents a user, and the edge between any two users represents their relation strength. Each node in the POI layer represents a POI, while the edge between a user and a POI represents the User-POI relation strength. We formally describe the relation graph construction as follows.

As described in Definition 9, we construct the relation graph with a user layer and a POI layer. The edge weight in the user layer is calculated according to (5), while the edge weight between a user and a POI is calculated according to (4). We formulate the weight matrix $W$ of graph $G$ as (7), where $W_{UU}$ and $W_{UP}$ are the weight matrices of edges $E_{UU}$ and $E_{UP}$, respectively. Since we do not consider the relation between POIs, $W_{PP}$ is a zero matrix.
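Under the description above, the weight matrix has a natural block form. The following is a reconstruction from the text rather than the original typesetting, and the block names are assumed: the user-user block holds the Pearson-based weights, the user-POI block holds the User-POI relation strengths, and the POI-POI block is zero.

```latex
W =
\begin{bmatrix}
  W_{UU}        & W_{UP} \\
  W_{UP}^{\top} & \mathbf{0}
\end{bmatrix},
\qquad
W_{UU} \in \mathbb{R}^{|U| \times |U|},\quad
W_{UP} \in \mathbb{R}^{|U| \times |P|}.
```

The transpose in the lower-left block follows from the graph being undirected.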

4.2.2. User-User Relation Inference

Existing indoor POI recommendation algorithms, such as the user-based method [13] or the item-based method [15], suffer from a serious data sparsity problem, since numerous users have only a few POI check-ins. In addition, a small number of historical trajectories cannot reflect a user’s POI preference well enough to infer a reasonable User-POI relation, which may result in the cold-start problem in a recommendation system. To address this challenge, we perform a random walk with restart to derive the relation between each pair of users. This method first infers transition probabilities between users based on their similarities and then models finite-length random walks on the user space to compute predictions. It is especially useful when training data are scarce, that is, when typical similarity measures fail to capture the actual relationships between users (as under data sparsity). Our method resembles the studies in [26, 27] in the manner in which they deal with the sparsity problem. The goal of RWR in our approach is to find the neighbour users with the top-K highest relevance scores for a given user based on their historical POI check-ins.

Random walk on relation graph can alleviate the sparsity problem in indoor POI recommendation by fusing multirelation between users and POIs. Users usually have few check-ins for most POIs; thus the directly connected vertices are sparse. However, one vertex can reach another vertex through hidden propagation path, which can better estimate the relation strength between two vertices that are not directly connected with these hidden propagation paths. The intuition on how hidden propagation paths can alleviate sparsity can be explained by the following example.

Example 10. Suppose we need to infer the relation between two vertices (as shown in Figure 3) that are not directly connected in the relation graph. When a third vertex has an edge with each of them, we can find hidden relations between the two vertices through such intermediate vertices. More exactly, in Figure 3 we can find one hidden propagation path from the User-POI relation and two hidden propagation paths from the User-Store relation. Such hidden propagation paths help to infer the relation strength of two vertices that are not connected directly, thus alleviating the data sparsity problem.

Without loss of generality, we assume the random walker starts from a user node $u$ on graph $G$. Then, the random walker iteratively moves to other nodes that share an edge with its current node, with a probability that is proportional to the edge weight between them. At each step, the walker also has a restart probability $c$ of returning to $u$. We obtain the steady-state probability of the walker visiting each of the other vertices when the RWR process has converged. The RWR process can be formulated as (8), where $\mathbf{q}$ and $\mathbf{r}^{(t)}$ are two vectors, $\mathbf{q}$ is the initial relation strength of the target user to other users, calculated as in (5), $\mathbf{r}^{(t)}$ represents the probability distribution at step $t$ with $\mathbf{r}^{(0)} = \mathbf{q}$, and $T$ is the transition matrix, which is obtained from the weight matrix $W$ of $G$ by row normalization, as shown in (9), where $D$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$.
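A minimal sketch of a random walk with restart is given below. It uses the textbook iteration in which the score vector is repeatedly multiplied by the transposed row-normalised transition matrix and mixed with the restart vector. The restart probability `c`, the tolerance, and the assumption that the restart vector is non-negative with a positive sum are illustrative choices, not values taken from the paper.

```python
def row_normalise(W):
    """T = D^{-1} W, where D_ii is the sum of row i; all-zero rows stay zero."""
    T = []
    for row in W:
        s = sum(row)
        T.append([x / s if s else 0.0 for x in row])
    return T

def random_walk_with_restart(W, q, c=0.15, tol=1e-10, max_iter=1000):
    """Iterate r <- (1 - c) * T^T r + c * q until the L1 change is below tol.

    W: square weight matrix (list of lists); q: restart vector, assumed
    non-negative with a positive sum (it is normalised to sum to 1 here).
    """
    T = row_normalise(W)
    n = len(W)
    total = sum(q)
    q = [x / total for x in q]
    r = q[:]
    for _ in range(max_iter):
        nxt = [(1 - c) * sum(T[i][j] * r[i] for i in range(n)) + c * q[j]
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r

# Hypothetical 3-node path graph 0 - 1 - 2, walker restarting at node 0.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(W, q=[1, 0, 0])
```

Node 2 has no direct edge to node 0 in this toy graph, yet it receives a non-zero score through node 1, which is the hidden-propagation-path effect described above.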

4.3. Online Recommendation

After constructing the relation graph, the problem of recommending POIs to a target user $u$ can be converted into calculating the relation strength between vertex $u$ and the unvisited POIs and then generating the POI recommendation list according to the top-K ranking of relation strength.

Following the principle of collaborative filtering, we calculate the relation strength between vertex $u$ and an unvisited POI $p$ as in (10), where $s_{u,v}$ represents the relation strength between $u$ and $v$, $r_{v,p}$ is the relation strength of $v$ to $p$, and $\bar{r}_v$ is the average relation strength of $v$ over all POIs.
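The prediction and ranking step can be sketched with the usual mean-centred user-based collaborative filtering formula: the target user's average strength plus the similarity-weighted deviation of each neighbour from that neighbour's own average. Whether the paper's equation has exactly this normalisation is an assumption, as are the names `predict_strength` and `recommend_top_k`.

```python
def predict_strength(target, poi, strengths, similarity):
    """strengths: {user: {poi: strength}}; similarity: {user: score w.r.t. target}."""
    mean_t = sum(strengths[target].values()) / len(strengths[target])
    num = den = 0.0
    for v, s in similarity.items():
        if v == target or poi not in strengths[v]:
            continue
        mean_v = sum(strengths[v].values()) / len(strengths[v])
        num += s * (strengths[v][poi] - mean_v)
        den += abs(s)
    # Fall back to the target's own average when no neighbour visited the POI.
    return mean_t + num / den if den else mean_t

def recommend_top_k(target, all_pois, strengths, similarity, k):
    """Rank the POIs the target has not visited by predicted strength."""
    unvisited = [p for p in all_pois if p not in strengths[target]]
    ranked = sorted(
        unvisited,
        key=lambda p: predict_strength(target, p, strengths, similarity),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical data: target user "u" and two neighbours.
strengths = {
    "u": {"A": 1.0},
    "v": {"A": 1.0, "B": 3.0, "C": 0.5},
    "w": {"A": 2.0, "C": 1.0},
}
similarity = {"v": 0.9, "w": 0.4}
top = recommend_top_k("u", ["A", "B", "C"], strengths, similarity, k=1)
```

In this toy example POI B is ranked first because the most similar neighbour spent far more than their average effort on it.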

Algorithm 2 formally describes the proposed two-stage approach for recommending indoor POIs to users in an urban shopping mall. In the first stage, as shown in Lines 2~5, we calculate the initial User-POI relation strength and User-User relation strength from users’ indoor spatiotemporal trajectories; then, as depicted in Lines 6~8, we construct the relation graph and perform a random walk with restart to infer the User-User relation strength. In the second stage, we calculate the relation strength of the target user to his/her unvisited POIs, as shown in Lines 11~13. Finally, we obtain the top-K POIs with the highest relation strength as the POI recommendation list.

Require: (1) User set $U$, POI set $P$, and users’ indoor spatiotemporal
     trajectories $Traj$; (2) Target user $u$.
Ensure: POI recommendation list of user $u$.
(1)    Stage 1: Relation graph construction and inference.
(2)    for each user in $U$ do
(3)      Calculate initial User-POI relation strength according to Equation (4);
(4)      Calculate initial User-User relation strength according to Equation (5);
(5)    end for
(6)    Construct relation graph $G$;
(7)    Obtain transition probability matrix $T$ according to Equation (9);
(8)    Perform random walk with restart to infer User-User relation;
(9)    Stage 2: POI online recommendation.
(10)   Obtain the unvisited POI set $P_u$ of target user $u$.
(11)   for each $p \in P_u$ do
(12)     Calculate User-POI relation strength according to Equation (10).
(13)   end for
(14)   Rank $P_u$ according to the calculated relation strength.
(15)   return Select top-K POIs as the recommendation list of $u$.

5. Experiment Evaluation

In this section, we report the results of a series of experiments conducted to evaluate the performance of the proposed approach in generating indoor spatiotemporal trajectories and recommending top-K POIs to users, followed by a discussion. Our experimental environment is a large indoor shopping mall with four floors and over 60 shops. We regard each shop as an indoor POI; the shops belong to 4 categories given by the mall owner, as shown in Table 3.

5.1. Generate Indoor Spatiotemporal Trajectory

In this subsection, we describe the experimental settings for generating user’s indoor spatiotemporal trajectory using WiFi logs including datasets, comparative approaches, and the evaluation metric. We then report the performance of our proposed method and compare it with three baseline methods.

5.1.1. Experimental Datasets

To evaluate our proposed approach for generating users’ indoor spatiotemporal trajectories from WiFi logs, we developed an Android mobile application to collect the experimental dataset with a sampling rate of 1 Hz. The format of each record is a triple $(mac, time, rssi)$, where $mac$ is the MAC address of the scanning device and $time$ is the time at which the data are collected. $rssi$ is the scanned RSSI from the surrounding WiFi APs, which is represented as a series of tuples $(ap_i, rssi_i)$, where $ap_i$ represents the MAC address of the $i$th AP and $rssi_i$ is the scanned RSSI from that AP.

We invited 25 volunteers carrying mobile phones to collect WiFi logs along 117 predefined trajectories for evaluating the performance of mining indoor trajectories. The predefined information includes the check-in time and check-out time of each shop, which can be regarded as ground-truth data for evaluating the generated indoor spatiotemporal trajectories. After analysis, there are 57 different WiFi APs in all WiFi logs; we extend each RSSI record to a 58-dimensional vector. For a WiFi AP with no scanned RSSI value, we set −110 dBm as the default value in the RSSI record. An example of an RSSI record is shown in Table 4. In addition, we collect 100 WiFi RSSI records in each shop to construct the POI feature set according to Definition 4. Note that constructing the POI feature set is neither time-consuming nor labour-intensive, since two hours are enough to collect 6000 RSSI records (the sample rate is 1 Hz).
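The conversion of a raw scan into a fixed-length RSSI vector, with −110 dBm filled in for APs that were not heard, can be sketched as follows. The function name `to_rssi_vector` and the example MAC addresses are hypothetical; the vector length here simply equals the number of APs in the chosen ordering.

```python
DEFAULT_RSSI = -110  # dBm, used for APs that do not appear in a scan

def to_rssi_vector(scan, ap_order, default=DEFAULT_RSSI):
    """scan: {ap_mac: rssi_dbm}; ap_order: fixed list of all known AP MAC addresses."""
    return [scan.get(mac, default) for mac in ap_order]

# Hypothetical example with three known APs, one of which was not heard.
ap_order = ["aa:aa", "bb:bb", "cc:cc"]
vector = to_rssi_vector({"aa:aa": -60, "cc:cc": -72}, ap_order)
```

Keeping one fixed AP ordering for all records ensures that Euclidean distances between records compare the same AP in every dimension.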

5.1.2. Comparative Approaches

We compare our proposed method for generating indoor trajectories with the following three competitor methods:
(i) RSSI-NN. This approach obtains the corresponding POI of each RSSI record using a fingerprint-based positioning method [28], which regards the raw WiFi RSSI as the POI feature and uses nearest neighbour as the matching method.
(ii) DIFF-NN. DIFF-NN [29] uses the difference of RSSI between each pair of WiFi APs as the POI feature to solve the RSSI variation problem caused by heterogeneous devices.
(iii) Weight-RSS. Weight-RSS [1] utilizes both the raw WiFi RSSI and their relation to design a stable POI fingerprint and uses weighted k-nearest neighbour as the matching method.

After mapping each record of the WiFi logs to a POI with the above three methods, we construct the corresponding indoor trajectory according to Definition 2.

5.1.3. Evaluation Metric

To evaluate the performance of generating indoor spatiotemporal trajectories from WiFi RSSI, we first need to define the distance between two trajectories. Given two trajectories, consider the set of POIs common to both. The longest common subsequence of the two trajectories is defined over this common POI set, where a POI is counted as a match only if the difference between its stay times in the two trajectories is less than a threshold.

Following [30], we define the distance between two trajectories in terms of their longest common subsequence and the numbers of POIs in the two trajectories.
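A distance of this kind can be sketched as follows. The matching rule (same POI, stay-time difference below a threshold) follows the description above, but the normalisation by the longer trajectory is an assumption made for illustration and is not necessarily the one used in [30]. The names `Visit`, `lcs_length`, and `trajectory_distance` and the sample trajectories are hypothetical.

```python
from typing import List, Tuple

Visit = Tuple[str, float]  # (POI identifier, stay time in seconds)


def lcs_length(a: List[Visit], b: List[Visit], max_stay_diff: float) -> int:
    """Length of the longest common subsequence of two trajectories.

    Two visits match when they refer to the same POI and their stay times
    differ by less than max_stay_diff.
    """
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, (poi_a, stay_a) in enumerate(a, start=1):
        for j, (poi_b, stay_b) in enumerate(b, start=1):
            if poi_a == poi_b and abs(stay_a - stay_b) < max_stay_diff:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def trajectory_distance(a: List[Visit], b: List[Visit], max_stay_diff: float) -> float:
    """Assumed normalisation: 1 - LCS / max(len(a), len(b)); 0 means identical."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b, max_stay_diff) / max(len(a), len(b))


# Hypothetical ground-truth and generated trajectories.
truth = [("A", 300.0), ("B", 600.0), ("C", 120.0)]
generated = [("A", 280.0), ("B", 900.0), ("C", 130.0)]
distance = trajectory_distance(truth, generated, max_stay_diff=60.0)
print(distance)
```

In this example the visits to A and C match, while the visit to B does not because its stay time is off by 300 s, so two of three visits are shared.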

5.1.4. Experimental Results

(1) Recognize Shop Entrance. In our proposed method, three parameters directly affect the performance of recognizing shop entrances and need to be determined: the time window size used to calculate the RSSI “jump” variation, and two user-specific thresholds used to remove false recognitions. The value used for removing false recognitions of shop entrances is set empirically.

Table 5 shows the recognition accuracy for shop entrances as a function of the time window size and the user-specific threshold. From this table, we observe the following: (1) the accuracy drops sharply when the user-specific threshold is lower than 150 or greater than 350, and it peaks between these two values. (2) With the threshold fixed, the accuracy increases as the time window size grows from 1 to 5 and decreases slightly when the time window size is larger than 5, since the difference between the RSSI fluctuation caused by a shop entrance and that caused by other factors becomes smaller as the time window grows. Overall, the best performance (71%) is achieved at a time window size of 5 with the best-performing threshold.

Figures 4(a) and 4(b) show the performance of filtering false identifications as a function of the user-specific threshold and the time window size, respectively. From the two figures, we observe the following: (1) With the time window size fixed, the filtering performance declines sharply when the variation coefficient is lower than 200 and reaches its best accuracy at the threshold value shown in Figure 4(a). (2) With the threshold fixed, the filtering accuracy increases as the time window size grows from 3 to 5 and decreases slightly when the time window size is larger than 5. The reason is that the difference in RSS fluctuation between physical boundary points and normal locations becomes smaller as the time window size increases.

With the time window size and the user-specific threshold fixed, we investigate the recognition accuracy at different sampling intervals in Figure 4(c). The recognition performance increases slightly as the sampling interval grows from 500 ms to 1000 ms and drops significantly when the sampling interval is larger than 1000 ms; the best recognition performance is achieved at a sampling interval of 1000 ms. For example, after removing false recognitions, the recognition accuracy is 70.4% at a sampling interval of 1000 ms but only 24.7% at 3000 ms. The reason is that when RSS values are collected over a relatively short or long period (e.g., 0.5 s or 3 s), the RSS values at both physical boundary points and normal locations fluctuate wildly while users are moving, so physical boundary points cannot be identified effectively.

(2) Map WiFi Logs to Indoor Spatiotemporal Trajectory. Figure 5 shows the performance of generating indoor spatiotemporal trajectories with the different methods. The three comparative methods follow a similar trend, since they are all based on fingerprint-based positioning using RSSI. Our method outperforms all three. More precisely, the percentage of trajectories whose distance is less than 0.5 is 81% for our method, 67% for DIFF-NN, 60% for Weight-RSS, and 46% for RSSI-NN. Our method performs better because it recognizes POI entrances more reliably, using the RSSI variation characteristics caused by a POI entrance, and can therefore divide the WiFi RSSI records among the corresponding POIs more accurately. In contrast, the performance of the other three methods declines rapidly under the RSSI variation caused by heterogeneous devices or environmental changes.

In addition, we compare the processing time of generating trajectories for the four methods, as shown in Figure 6. DIFF-NN is very time-consuming compared to the other three methods. The reason is that DIFF-NN suffers from the curse of dimensionality when there are numerous WiFi APs, since it constructs the POI feature from the difference of each pair of WiFi APs (with the 57 APs in our logs, up to 57 × 56 / 2 = 1596 pairwise differences). Our method requires the least time, because it first splits the RSSI sequence into POIs and therefore does not need to process WiFi RSSI records collected outside POIs.

5.2. Indoor POI Recommendation

In this subsection, we describe the experimental settings for indoor POI recommendation, including the datasets, the comparative approaches, and the evaluation metrics. We then report the performance of our proposed method and compare it with five baseline methods.

5.2.1. Experimental Datasets

We gathered an anonymous dataset from 468 registered customers using an opt-in WLAN in an urban shopping mall over 33 days. We first filter out the WiFi logs generated by mall workers and shop employees using the check-in frequency. More precisely, we consider a user to be a mall worker or shop employee if his/her check-in frequency is more than 15 during the 33 days. After preprocessing, the dataset consists of 1021 WiFi logs and over 9,556,560 WiFi RSSI records. More details of the dataset are shown in Table 6.
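The staff filter described above amounts to a frequency cut-off, which can be sketched as follows. The threshold of 15 comes from the text; the assumption that one WiFi log corresponds to one check-in, the function name `filter_staff`, and the sample identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Threshold from the text: more than 15 check-ins in 33 days marks a user as staff.
MAX_CUSTOMER_CHECKINS = 15


def filter_staff(logs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only logs of users with at most MAX_CUSTOMER_CHECKINS logs.

    logs holds (user_id, log_id) pairs; one log is assumed to be one check-in.
    """
    logs = list(logs)
    counts = Counter(user for user, _ in logs)
    return [(user, log_id) for user, log_id in logs if counts[user] <= MAX_CUSTOMER_CHECKINS]


# Hypothetical data: a customer with 3 logs and an employee with 20 logs.
logs = [("u1", f"log{i}") for i in range(3)] + [("staff", f"s{i}") for i in range(20)]
kept = filter_staff(logs)
print(len(kept))
```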

5.2.2. Comparative Approaches

We compare our method with the following well-known recommendation algorithms, which have been widely used in POI recommendation systems.

(i) Content-Based k-Nearest Neighbours Algorithm (CBNN) [31]. CBNN utilizes all users’ trajectories to create a POI-POI matrix, each entry of which represents the similarity of two POIs based on their visiting correlation. The similarity of two POIs is defined in terms of the sets of trajectories that include each of them, as given in (13). For an unvisited POI, CBNN first retrieves the k nearest neighbour POIs that have been visited by the target user using the POI-POI matrix. It then calculates the recommendation score of the unvisited POI using the User-POI relation.

(ii) Item-Based Collaborative Filtering Algorithm (Item-Based CF) [32]. This method formulates the User-POI matrix and the POI-POI matrix according to (4) and (13) and then applies item-based collaborative filtering to calculate the recommendation score of unvisited POIs for the target user.

(iii) User-Based Collaborative Filtering Algorithm (User-Based CF) [33]. Similar to Item-Based CF, this method first obtains the User-POI matrix and the User-User matrix according to (4) and (5) and then applies user-based collaborative filtering to calculate the recommendation score of unvisited POIs.

(iv) Collaborative Filtering Based Location Cooccurrence (LCCF) [34]. LCCF calculates the recommendation score of an unvisited POI by combining other users’ visiting histories of that POI. Each customer is represented by a binary check-in vector whose entry for a POI is 1 if the customer has visited that POI and 0 otherwise. The recommendation score between a user and an unvisited POI is then calculated from the check-ins of other users at that POI together with the similarity between the target user and each other user, where the similarity is the cosine similarity between their check-in vectors.

(v) Rule-Based Recommendation Algorithm (RBCA) [15]. RBCA estimates a user’s POI preference by linearly fusing three factors: the time spent in a POI, the check-in frequency of a POI, and the match between promotional activities in the POI and the user’s preference towards promotional activities. Recommendation rules are extracted according to two assumptions: first, the higher a user’s preference for a POI, the more likely he/she is to visit it; second, a user will visit a POI whose promotions are in line with his/her preference.

(vi) Graph Recommendation Based Random Walk with Restart (RWR) [24]. This method first constructs a graph in which each node denotes a POI and the weights between nodes are assigned using the relations between POIs. RWR then treats POI recommendation as an entity ranking problem on the graph and utilizes the personalized PageRank algorithm [35] to generate the top-k recommended POIs.
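As an illustration of the location co-occurrence idea behind LCCF, the sketch below scores each unvisited POI by a similarity-weighted sum of other users' check-ins. The weighted-sum form is an assumption made for illustration (the exact formula and any normalisation in [34] may differ); the function names and the check-in data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine similarity of two check-in vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lccf_scores(target: str, checkins: Dict[str, List[int]]) -> Dict[int, float]:
    """Score every POI the target user has not visited.

    Assumed form: score(u, j) = sum over other users v of sim(u, v) * c[v][j].
    """
    c_u = checkins[target]
    scores: Dict[int, float] = {}
    for j, visited in enumerate(c_u):
        if visited:
            continue  # only unvisited POIs are candidates
        scores[j] = sum(
            cosine(c_u, c_v) * c_v[j]
            for other, c_v in checkins.items()
            if other != target
        )
    return scores


# Hypothetical binary check-in vectors over four POIs.
checkins = {"u": [1, 1, 0, 0], "v": [1, 1, 1, 0], "w": [0, 0, 0, 1]}
scores = lccf_scores("u", checkins)
print(scores)
```

Here user v shares two POIs with the target user, so the POI that only v has visited (index 2) is ranked above the POI visited only by the dissimilar user w (index 3).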

5.2.3. Evaluation Metric

For each target user, consider the set of POIs that the user has visited in the test set. The hits for that user are the recommended POIs that appear in this set, that is, the recommended POIs in which the user is actually interested. We then define the recommendation hit rate in terms of the number of hits and k, the length of the recommendation list.
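A hit rate of this kind can be computed as in the sketch below. The per-user ratio of hits to k, averaged over users, is an assumed form chosen to match the description (it coincides with precision at k); the paper's exact formula is not reproduced here. The function name `hit_rate_at_k` and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    recommended: Dict[str, List[str]], visited: Dict[str, Set[str]], k: int
) -> float:
    """Average over users of |top-k recommendations that were visited| / k."""
    per_user = []
    for user, ranked in recommended.items():
        hits = set(ranked[:k]) & visited.get(user, set())
        per_user.append(len(hits) / k)
    return sum(per_user) / len(per_user) if per_user else 0.0


# Hypothetical ranked recommendations and test-set visits for two users.
recommended = {"u1": ["a", "b", "c", "d"], "u2": ["e", "f", "g", "h"]}
visited = {"u1": {"a", "c", "x"}, "u2": {"h"}}
print(hit_rate_at_k(recommended, visited, k=4))
print(hit_rate_at_k(recommended, visited, k=1))
```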

To evaluate the effectiveness of the proposed POI recommendation algorithm, we randomly select 30% of the indoor trajectories as the test set and use the rest as the training set. We use the training set to construct the relation graph for inferring the User-User relation and the User-POI relation. The test set includes 133 users, who serve as the target users: we perform a top-k recommendation task on these 133 target users to evaluate the performance of the proposed recommendation algorithm.

In addition to hit rate, we also measure the recommendation diversity of the proposed method, which indicates how surprising the recommended results are. Following the work in [36], we exploit the category information of POIs to evaluate the recommendation diversity. For each user, the diversity is computed from two sets: the categories that the user likes in the training data and the categories that appear in the user’s recommendation list. We then measure the total diversity of a recommendation algorithm by averaging the diversity over all users in the test data.

5.2.4. Experimental Results

In this subsection, we first report the impact of model parameters for the proposed algorithm and then present the results of our experiments for all users and cold-start users, respectively.

(1) Impact of Model Parameters. Tuning model parameters, such as the parameter used for inferring the User-User relation by random walk with restart, is critical to the performance of the proposed algorithm. We study the impact of this parameter on the dataset. With the length of the recommendation list set to 5, we test the performance of the proposed recommendation model while varying the parameter and present the results in Figure 7. From the figure, we observe the following: (1) the hit rate increases slightly as the parameter grows from 0.1 to 0.7 and then decreases when the parameter is greater than 0.7; (2) the best performance is achieved at 0.7, where a user checks in at 28.1% of the POIs in the recommendation list if we recommend 5 POIs to that user. The reason is that the convergence of random walk with restart is determined by this parameter; that is, a greater value leads to faster convergence and thus can produce better recommendations. An overly large value, however, causes the walk to return to the target user with high probability when selecting recommended neighbours, which reduces the number of high-quality recommended neighbours and in turn decreases the recommendation performance.
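The effect of the parameter can be seen in a small random walk with restart computed by power iteration. This is a generic sketch, not the paper's relation graph: it assumes the parameter is the restart probability, uses a toy three-user chain with hypothetical identifiers, and sends the walker back to the start node from any node without outgoing edges.

```python
from typing import Dict


def random_walk_with_restart(
    adj: Dict[str, Dict[str, float]],
    start: str,
    restart: float,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Stationary visiting probabilities of a walk that jumps back to start
    with probability `restart` at every step.

    adj maps each node to its weighted out-neighbours; every neighbour must
    also be a key of adj.
    """
    p = {node: 1.0 if node == start else 0.0 for node in adj}
    for _ in range(max_iter):
        nxt = {node: restart if node == start else 0.0 for node in adj}
        for src, neighbours in adj.items():
            total = sum(neighbours.values())
            if total == 0:
                # Node without outgoing edges: the walker returns to start.
                nxt[start] += (1.0 - restart) * p[src]
                continue
            for dst, weight in neighbours.items():
                nxt[dst] += (1.0 - restart) * p[src] * weight / total
        if sum(abs(nxt[node] - p[node]) for node in adj) < tol:
            return nxt
        p = nxt
    return p


# Toy user graph (hypothetical): u1 - u2 - u3.
adj = {"u1": {"u2": 1.0}, "u2": {"u1": 1.0, "u3": 1.0}, "u3": {"u2": 1.0}}
low = random_walk_with_restart(adj, "u1", restart=0.3)
high = random_walk_with_restart(adj, "u1", restart=0.9)
print(low)
print(high)
```

With a restart probability of 0.9 almost all of the probability mass stays on the target user u1, leaving little to distinguish the neighbours; this mirrors the observation that an overly large value reduces the number of useful recommended neighbours.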

(2) Effectiveness of Recommendation for All Users. Figure 8 shows the recommendation hit rate of the six recommendation algorithms. Note that we only perform experiments for short recommendation lists, since larger values of k are usually ignored in top-k recommendation tasks and there are only 60 POIs in total. The six algorithms differ considerably in top-k hit rate. As shown in Figure 8, the hit rate of our method is about 71% when k = 1 and 35% when k = 4, which means that a user has a 71% probability of checking in at the recommended POI if we recommend only one POI. Similarly, if we recommend 4 POIs to the user each time, only 35% of the recommended POIs attract the user’s attention. CBNN and LCCF perform worst in the experiment, showing that location cooccurrence alone is not enough to learn a user’s preference. Item-based CF and User-based CF achieve better performance than CBNN, showing the advantage of using collaborative filtering to model users’ preferences and POIs’ characteristics. Our proposed method outperforms the five other recommendation algorithms significantly, showing the advantage of our relation graph in capturing multiple relations between users and POIs, which allows better recommendations for “cold-start” users who have little historical visiting information.

Table 7 reports the diversity of the six recommendation algorithms. From this table, we observe the following: (1) Compared to the recommendation algorithms using POI correlation (such as CBNN, LCCF, and user-based CF), item-based CF achieves much better diversity. For example, when the length of the recommendation list is 5, item-based CF improves diversity by about 19% compared to CBNN, 24% compared to LCCF, and 13% compared to user-based CF. The reason is that user-based CF usually recommends highly popular POIs, so it fails to discover POIs in the long tail, which results in low recommendation diversity. (2) Our method achieves slightly lower diversity than item-based CF (item-based CF is better by about 2.3% on average) but much better diversity than the other four algorithms. This is achieved by utilizing random walk with restart to derive a pairwise score between each pair of users, which alleviates the data sparsity problem to a certain extent.

(3) Effectiveness of Recommendation for Cold-Start Users. Figure 9 reports the recommendation performance of the six recommendation algorithms for cold-start users; as before, we only perform experiments for short recommendation lists. From this figure, we observe the following: (1) the performance of all six algorithms degrades significantly for cold-start users compared to all users, showing that the data sparsity caused by cold-start users poses a serious challenge for indoor POI recommendation. (2) Our method outperforms the other five methods, improving the hit rate by 6% on average, which shows the advantage of using random walk with restart to learn a user’s preference.

Table 8 reports the diversity of the six recommendation algorithms for cold-start users. The results show that, compared to traditional user-based models (user-based CF, LCCF, and CBNN), item-based CF achieves better diversity. For example, when the length of the recommendation list is 5, item-based CF improves diversity by about 19% compared to CBNN, 21% compared to LCCF, and 15% compared to user-based CF. The diversity improvement of item-based CF over user-based CF is thus larger for cold-start users (15%) than for all users (13%). We further observe that the proposed algorithm outperforms user-based CF but has slightly lower diversity (by about 5%) than item-based CF.

In summary, the proposed method substantially improves the diversity of existing user-based POI recommendation approaches while also achieving a better hit rate. This improvement is achieved by exploiting random walk with restart to construct the relation graph, which can better model the preferences of users with few POI check-ins.

5.2.5. A Case Study

We perform a case study on the usefulness of indoor POI recommendation. A group including 25 volunteers participated in the case study.

Dataset. As mentioned above, we invited 25 volunteers carrying mobile phones to collect 117 WiFi logs for evaluating the performance of generating indoor trajectories. We reuse this dataset for the case study, as shown in Table 9. The volunteers are divided into 5 groups (A, B, C, D, and E) according to the average number of shops visited per customer; we regard the volunteers in group E, who visited only four shops on average, as “cold-start” users.

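Evaluation Metric. We utilize NDCG@k to evaluate the performance of indoor POI recommendation. Each recommended POI is assigned a relevance value, and NDCG@k is the discounted cumulative gain of the recommendation list truncated at position k, normalized by the value obtained for the ideal ranking list, as given in (19).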

For instance, given a user’s recommendation list for five stores and the ideal recommendation list provided by that user, we can calculate NDCG@5 = 0.659 according to (19).
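A standard NDCG@k computation is sketched below. It assumes the common logarithmic discount (relevance divided by log2 of rank + 1) and takes the ideal list to be the same relevance values sorted in descending order; the exact form of (19) and the lists behind the 0.659 example are not reproduced here, and the relevance values shown are hypothetical.

```python
from math import log2
from typing import Sequence


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items (assumed log2 discount)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance values, listed in the order the POIs were recommended.
recommended_relevance = [3, 2, 3, 0, 1]
print(ndcg_at_k(recommended_relevance, k=5))
```

A list that is already in the ideal order scores 1.0, and any swap that pushes a more relevant POI further down the list lowers the score.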

Results. Figure 10 shows the NDCG@5 for the five user groups with three recommendation models: user-based CF, RWR, and our method. The NDCG@5 is calculated as follows. For a specific user, each recommendation model first derives his/her shopping preference from the check-in records and then generates a top-5 recommendation list. Each user also has an ideal ranked list in mind for the recommendation results. From the model’s recommendation list and the user’s ideal ranked list, the NDCG@5 is calculated according to (19). From Figure 10, we observe the following: (1) for all five groups of users, the NDCG@5 of our proposed model is better than that of the compared models (user-based CF and RWR). For example, the performance improvement for group C users is about 13% and 8% compared with user-based CF and RWR, respectively; (2) the performance improvement of the recommendation models using random walk is significant for “cold-start” users. For example, the performance improvement of our method for group E users is about 15% compared with user-based CF. The results suggest that our recommendation algorithm can learn a user’s preference even from few POI check-ins by using a graph-based model to capture the latent relations between users and POIs.

To investigate the recommendation effectiveness for “cold-start” users, we further calculate NDCG@k at several values of k for each user in group E, as shown in Figure 11. From this figure, we observe that, for all cold-start users, the NDCG@k of our proposed model is much better than that of the compared models (user-based CF and RWR), which again shows the advantage of deriving a user’s preference with a graph-based model that captures the latent relations between users and POIs.

6. Conclusion

This paper proposed a location-aware Point-of-Interest recommendation system for urban shopping malls that recommends a set of POIs to a user by mining the user’s preference towards POIs from his/her historical indoor trajectories. For generating indoor spatiotemporal trajectories, we proposed a novel method that utilizes the propagation characteristics of WiFi RSSI in indoor space. The proposed recommendation system can not only improve a user’s shopping experience but also help shop owners better understand users’ shopping preferences and intent. By constructing a relation graph model and exploiting random walk with restart, our recommendation algorithm can learn a user’s preference even from few POI check-ins. We evaluated the proposed recommendation model on a real dataset collected from 468 users over 33 days. The experimental results show that our approach significantly outperforms existing recommendation algorithms in recommendation hit rate while achieving diversity close to that of the most diverse baseline, item-based CF.

As future work, we plan to consider the POI semantic information (such as POI service or online reviews) for further improving the recommendation performance.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been supported by Hangzhou Key Laboratory for IoT Technology & Application.