Unlike outdoor trajectory prediction, which has been studied for many years, predicting the movement of a large number of users in indoor spaces such as shopping malls has only recently become a hot and challenging issue, owing to the ubiquity of mobile devices and free Wi-Fi services in shopping centers. To address the indoor trajectory prediction problem, this paper proposes a hybrid method based on the Hidden Markov model (HMM). The proposed approach first clusters Wi-Fi access points according to their similarities; then, a frequent subtrajectory-based HMM that captures the moving patterns of users is investigated. In addition, we assume that a customer’s visiting history has certain patterns; thus, we integrate trajectory prediction and shop category prediction into a unified framework, which further improves the predictive ability. A comprehensive performance evaluation was conducted using a large-scale real dataset collected between September 2012 and October 2013 from over 120,000 anonymized, opt-in consumers in a large shopping center in Sydney; the experimental results show that the proposed method outperforms the traditional HMM and performs well enough to be usable in practice.

1. Introduction

During the past decade, a large body of research has focused on trajectory prediction [1, 2]. While most studies to date have concentrated on outdoor scenarios with GPS or GPS-like positioning only, research shows that people spend a large share of their time in indoor environments such as shopping malls, office buildings, airports, conference facilities, and private homes [3, 4]; thus, existing trajectory prediction research falls short in another important setting, namely, the indoor scenario. The main reason is that reliable and accurate indoor positioning systems have been lacking, and outdoor positioning technologies such as GPS cannot identify the indoor location of a user accurately either.

With the prevalence of Wi-Fi-enabled mobile devices and the increasing number of indoor Wi-Fi-enabled venues, breakthroughs in indoor trajectory prediction have been made in recent years. Shopping malls are providing free Wi-Fi connections to attract and retain users. For example, the giant supermarket chain Tesco has rolled out free Wi-Fi services in hundreds of its stores in the UK [5]. Wi-Fi is almost a must-have for shopping malls, and because of the ubiquitous Wi-Fi services, it is becoming easier to track shoppers’ footpaths and physical movements by capturing the Wi-Fi signals emitted by mobile devices and collecting their MAC addresses while the shoppers move around the shopping mall.

In this paper, we propose an effective method to predict a customer’s next visiting place in an indoor space based on his location history, where each location is represented by a specific Wi-Fi access point. Shoppers and retailers can benefit from indoor trajectory prediction in many ways. For example, by being aware of the movement of customers in advance, vendors can quickly target likely shoppers and push advertisements through an online advertisement system to attract their attention, making it possible to boost sales even before customers physically approach the store. Moreover, the pedestrian flow in the shopping mall can be predicted, making it possible to avoid congestion and to maximize the effect of coupon promotions that deliver coupons in a specific region at a certain time. Last but not least, through the indoor prediction system, managers can allocate staff in the indoor space more efficiently.

In all, the contributions of the paper are as follows:
(1) We investigate which hidden states of the Hidden Markov model are appropriate for the indoor trajectory prediction problem
(2) A novel clustering algorithm based on the similarity of Wi-Fi access points has been studied
(3) From a spatial perspective, we introduce frequent subtrajectories into the indoor trajectory prediction problem, which improves the prediction ability of our approach
(4) From a contextual perspective, we investigate the contextual information of shop categories, which is unique to the indoor scenario, and integrate it into our trajectory prediction problem, which improves the prediction ability
(5) Extensive experiments have been carried out on a real-world dataset to evaluate our method. The results show a notable improvement in the accuracy of indoor trajectory prediction when our approach is adopted

The remainder of this paper is organized as follows. We first present a literature review of the research area in Section 2 before describing the preliminaries about indoor trajectories and related issues of the Hidden Markov model in Section 3. We present the data processing technique in Section 4; then, we build the HMM-associated indoor trajectory prediction models and present the fusion framework, which integrates trajectory prediction and shop category prediction, in Section 5. In Section 6, we evaluate the proposed method and analyze the experimental results. Finally, we conclude the paper in Section 7.

2. Related Work

There has been extensive research on localization and mobility prediction utilizing wireless networks and mobile devices [6-8]. However, unlike the outdoor positioning system (GPS), indoor positioning systems based on Bluetooth, infrared, RFID, or Wi-Fi have matured only in recent years and have just started to emerge in commercial markets [9]. While much more research on trajectory prediction has been published in the context of outdoor positioning systems [10], there is still not sufficient related work on the same application in indoor spaces.

Based on the data collected, there are two types of trajectory prediction: continuous, where the positioning system generates continuous coordinate values, and discrete, where the positioning system provides discrete values such as the IDs of positions [11]. The accuracy of continuous prediction can be evaluated by the distance between the predicted result and the corresponding actual position, while the accuracy of discrete prediction, which returns the predicted IDs of positions, can be evaluated as the probability that the predicted IDs are the actual IDs. Our work falls into the second category, where each indoor trajectory point is represented by a Wi-Fi access point with a unique ID.

The methods proposed for trajectory prediction can be divided into two main categories: (1) mining frequent patterns of trajectories and (2) establishing a model of moving objects to achieve the prediction.

A number of research works have been conducted on extracting trajectory patterns for prediction [12-14]. Apart from the work based on the geographical characteristics of trajectories, semantic trajectories have also been mined and used for prediction [15]. While the aforementioned work adopts the trajectory patterns for prediction directly, our work differs in that we treat the patterns as hidden states that cannot be observed.

Although there are many pattern mining-based prediction methods, they lack a mechanism for continuously updating movement patterns. Various machine learning techniques, such as dynamic Bayesian networks [16, 17], have therefore been developed to predict users’ locations given their historical movements. However, our background research indicates that the most frequently used methods are the Markov Chain, the Hidden Markov model, and their variations.

In [18], the authors proposed a Markov transition probability based on a cell-based organization of the target space and the Markov Chain model. In more recent work [2], the authors present a Markov Chain-based model for predicting the next location of a student on campus; they incorporate the notion of time into the prediction algorithm, which they coined the trajectory prediction algorithm (TPA).

Among the variations of the Markov model, the authors of [11] proposed a mixed Markov Chain model (MMM), an intermediate model between an individual model and a universal one: it clusters individuals into a set of latent groups based on their mobility behaviors and transition traces, and each group has its own Markov Chain model that needs to be learned. The results show that its prediction accuracy is higher than that of the Markov Chain model and HMMs [11]. However, learning and storing the generated models consume considerable time and storage space. To address this problem, the authors of [19] extended a mobility model called the Mobility Markov Chain (MMC) to incorporate the previously visited locations.

HMM-based trajectory prediction can be classified into two categories: parameter learning and structure learning [20]. For parameter learning, [21] developed the mixed hidden Markov models (MHMM), and [22] proposed an HMM-based trajectory prediction algorithm in which the main parameters can be adapted autonomously. For structure learning, [23] introduces an algorithm for modeling and recognizing the temporal structure of visual activities based on HMM; in [24], the authors presented models for pedestrian behavior analysis by building geometric and probabilistic models sequentially.

Existing works on predicting indoor movements employ pattern mining and variants of the Markov Chain [1, 10, 17, 19, 25, 26]; however, the semantic information unique to indoor environments, such as shop categories, has not been studied for prediction because of the limitations of their data sources. What is more, their methods were tested only on small datasets containing a small number of users or trajectory points, and in [19], the datasets were collected in a controlled environment: data was gathered from participants who were aware of the experiments. In contrast, in this article, we use a large-scale dataset collected between September 2012 and October 2013 from over 120,000 anonymized users (907,084 associations were detected, forming 261,369 indoor trajectories) in a large mall with 67 Wi-Fi access points (APs) across 90,000 square meters. Additional information about the physical environment was provided by the owner of the mall, including floor plans of the stores, the shop categories, and the locations of the Wi-Fi access points. Such data provides a unique opportunity to analyze the interaction between users’ physical movement and semantic movement in indoor spaces, and our method takes into consideration the frequent subtrajectories and the shop categories a customer visits, which have not been investigated before.

3. Problem Statement and Preliminaries

To better describe the proposed method, we first define the indoor trajectory and its related properties. Table 1 lists all the notations used in the paper; their specific meanings are given where they first appear.

3.1. Basic Concepts

In this section, we will introduce some basic concepts first.

Definition 1. An AP denotes a Wi-Fi access point in the shopping mall. Each AP has a unique ID and covers several shops, and each shop belongs to a specified shop category. We denote an AP as $ap_i = \{c_1, c_2, \ldots, c_k\}$, where $\{c_1, c_2, \ldots, c_k\}$ is a subset of all the shop categories.

As the example in Table 2 illustrates, a Wi-Fi access point $ap_i$ covers six shops whose shop categories are Women’s Fashion, Unisex Fashion, Women’s Footwear, and General Footwear; thus, we denote $ap_i = \{\text{Women’s Fashion}, \text{Unisex Fashion}, \text{Women’s Footwear}, \text{General Footwear}\}$, where $|ap_i|$ denotes the number of $ap_i$’s shop categories (here, four).

Definition 2. The similarity between two APs is defined as follows:
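
The defining formula is not reproduced in this text. Purely as an illustration of how a set-overlap similarity between two APs could be computed from their shop-category sets, the sketch below uses the Jaccard index; this choice is an assumption and may differ from the measure actually used in the paper. The function name `ap_similarity` and the category "Cafe" are hypothetical.

```python
def ap_similarity(categories_a, categories_b):
    """Jaccard overlap of two APs' shop-category sets (assumed measure)."""
    a, b = set(categories_a), set(categories_b)
    if not a and not b:
        return 0.0  # two APs without categories share nothing
    return len(a & b) / len(a | b)


# Category sets; the first mirrors the Table 2 example, the second is made up.
ap_1 = {"Women's Fashion", "Unisex Fashion", "Women's Footwear", "General Footwear"}
ap_2 = {"Women's Fashion", "General Footwear", "Cafe"}

# Two shared categories out of five distinct ones.
print(ap_similarity(ap_1, ap_2))
```

Any other overlap measure with values in [0, 1] could be substituted without changing the grouping procedure that consumes it.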

Definition 3. An indoor trajectory $T = \{p_1, p_2, \ldots, p_n\}$ is composed of a sequence of time-ordered APs, where $p_i = (ap_i, t_i)$, $1 \le i \le n$, represents the indoor trajectory point when associating with $ap_i$ at timestamp $t_i$ in the indoor space.

Definition 4. The indoor trajectory base stores the logging information of a large number of customers in the indoor space; $D = \{T_1, T_2, \ldots, T_{|D|}\}$ represents the trajectory set, and $|D|$ is the total number of trajectories.

3.2. Hidden Markov Models

In this section, we briefly introduce the basic concept of Hidden Markov model [27], indoor trajectory HMM, and shop category HMM.

The Hidden Markov model is a statistical model used to describe the Markov process with unknown parameters. It is a two-tiered random process in which the upper layer is composed of Markov Chain that describes the transition between hidden states, and the bottom layer is a random model that depicts the relationship between observation symbols (observation states) and hidden states. Based on HMM, our problem can be formalized as follows.

Definition 5. In indoor trajectory prediction based on HMM, assume an indoor trajectory is depicted as $T = \{p_1, p_2, \ldots, p_t\}$, where $t$ is a distinct timestamp, and $\lambda = (S, O, \pi, A, B)$ is the HMM-based indoor trajectory prediction model, where $S$ stands for the hidden states and $O$ for the observation states in the Hidden Markov model, $\pi$ is the initial state distribution matrix, $A$ is the hidden state transition matrix, and $B$ is the confusion (emission) matrix; the goal is to compute the position of the user at timestamp $t+1$ based on the previous $t$ timestamps.

Similar to the classic Hidden Markov model, our indoor trajectory HMM is depicted as follows:

Definition 6. The indoor trajectory HMM is a five-tuple $\lambda = (S, O, \pi, A, B)$, defined as follows:

$S$ is a set of indoor trajectory hidden states. The hidden states are denoted by $S = \{s_1, s_2, \ldots, s_N\}$ and satisfy the Markov property, where $N$ indicates the number of hidden states.

$O$ is a set of observation symbols; in our case, it denotes a set of Wi-Fi access points expressed by $O = \{o_1, o_2, \ldots, o_M\}$, where $M$ corresponds to the total number of observable states.

$\pi = \{\pi_i\}$ denotes the initial probability of choosing state $s_i$. For example, when $N = 3$, the probabilities of choosing hidden states $s_1$, $s_2$, and $s_3$ are $\pi_1$, $\pi_2$, and $\pi_3$, respectively; then, the initial state probability matrix is $\pi = (\pi_1, \pi_2, \pi_3)$.

$A = \{a_{ij}\}$ is the indoor trajectory hidden state transition probability matrix, where $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$, $1 \le i, j \le N$, indicates that the probability of choosing state $s_j$ at timestamp $t+1$ is $a_{ij}$, given state $s_i$ at timestamp $t$.

$B = \{b_j(k)\}$ is the confusion matrix (emission probability matrix); it describes the probabilities linking the hidden states to the observation states in the HMM, where $b_j(k) = P(o_k \mid s_j)$ represents the probability of transformation from hidden state $s_j$ to observation state $o_k$.

After defining the parameters of the indoor trajectory HMM, we introduce how to compute the three basic parameters, namely, the initial probability matrix $\pi$, the transition probability matrix $A$, and the confusion matrix $B$. Overall, we calculate the parameters of the HMM by a statistical (counting) method.

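The calculation of the initial probability matrix $\pi$ is as follows:

$$\pi_i = \frac{\mathrm{Num}(s_i)}{\sum_{j=1}^{N} \mathrm{Num}(s_j)},$$

where $N$ denotes the number of hidden states and $\mathrm{Num}(s_i)$ denotes the frequency of hidden state $s_i$ appearing in all hidden state sequences corresponding to observation states.

Then, the computation of the state transition probability matrix $A$ is as follows:

$$a_{ij} = \frac{\mathrm{Num}(s_i \rightarrow s_j)}{\mathrm{Num}(s_i)},$$

where $\mathrm{Num}(s_i \rightarrow s_j)$ denotes the count of hidden state $s_j$ appearing after hidden state $s_i$ in the training set and $\mathrm{Num}(s_i)$ represents the count of hidden state $s_i$ emerging in the training set.

Finally, and similar to the calculation of probability matrix $A$, the calculation of the confusion matrix $B$ is as follows:

$$b_j(k) = \frac{\mathrm{Num}(o_k, s_j)}{\mathrm{Num}(s_j)},$$

where $\mathrm{Num}(o_k, s_j)$ denotes the count of observed state $o_k$ occurring along with the hidden state $s_j$ and $\mathrm{Num}(s_j)$ represents the count of hidden state $s_j$ occurring in the training set.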
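
The counting estimates described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `estimate_hmm`, the dictionary-based matrices, and the toy sequences are hypothetical, and transition rows are normalised by the number of outgoing transitions (rather than the raw state count) so that each row sums to one.

```python
from collections import Counter


def estimate_hmm(hidden_seqs, observed_seqs):
    """Estimate pi, A, B by relative frequencies from aligned sequences."""
    state_count = Counter()   # occurrences of each hidden state
    trans_count = Counter()   # occurrences of (state, next state)
    out_count = Counter()     # outgoing transitions per state
    emit_count = Counter()    # occurrences of (state, observation)
    for states, obs in zip(hidden_seqs, observed_seqs):
        for s, o in zip(states, obs):
            state_count[s] += 1
            emit_count[(s, o)] += 1
        for s, s_next in zip(states, states[1:]):
            trans_count[(s, s_next)] += 1
            out_count[s] += 1
    total = sum(state_count.values())
    pi = {s: c / total for s, c in state_count.items()}
    A = {(i, j): c / out_count[i] for (i, j), c in trans_count.items()}
    B = {(s, o): c / state_count[s] for (s, o), c in emit_count.items()}
    return pi, A, B


# Toy data: hidden states are AP groups, observations are AP ids.
hidden = [["g1", "g2", "g2"], ["g1", "g1", "g2"]]
observed = [["ap1", "ap3", "ap4"], ["ap2", "ap1", "ap3"]]
pi, A, B = estimate_hmm(hidden, observed)
print(pi["g1"], A[("g1", "g2")], B[("g2", "ap3")])
```

Pairs that never occur in the training data are simply absent from the dictionaries, which a caller can treat as probability zero or smooth as needed.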

The shop category HMM is motivated by real-world behavior: when a customer goes into a shopping mall, it is highly likely that his visiting sequence follows certain patterns. In the example shown in Figure 1, after buying clothes, he would like to eat something; after that, he may go to the cinema to have a rest. Customers’ behaviors are therefore expected to be highly predictable. To validate to what extent the shop categories a customer visits can be accurately predicted, we design a shop category-based Hidden Markov model to test this assumption.

Similar to the prediction of indoor trajectory, the prediction of shop category based on HMM is depicted as follows:

Definition 7. In shop category prediction based on HMM, assume that a customer has visited shops in a shopping mall that belong to categories $\{c_1, c_2, \ldots, c_k\}$; our goal is to predict the next shop category the customer is going to visit based on the Hidden Markov model.

Since the details of shop category HMM are similar to that of indoor trajectory HMM, we omit the detailed descriptions here.

4. Indoor Trajectory Data Preprocessing

Before constructing the HMM, we have to preprocess the data which includes grouping access points and extracting hidden states.

4.1. Group Analysis for APs

In order to obtain the hidden states, we first propose a similarity-based grouping method which groups the access points according to their pairwise similarities. The similarity parameter $\theta$ plays a critical role in the algorithm, which differs fundamentally from traditional clustering methods because it does not need to iteratively visit each Wi-Fi access point. Algorithm 1 depicts the procedure of the grouping method. The basic idea of our grouping algorithm is as follows: (1) initialize groupNum, which specifies the ID of groups (line 1); (2) visit the access points in TLink and check whether a point has been visited; if so, skip it; otherwise, assign groupNum as its group ID, mark it as “visited,” and create a new group for it (lines 2-10); and (3) traverse the access points TLink[j] in TLink, where $j > i$; if the similarity between TLink[i] and TLink[j] is no less than $\theta$, then we assign groupNum as the group ID of TLink[j] and mark it as “visited” (lines 11-20).

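Algorithm 1: Similarity-based grouping of access points.
input: A linked list TLink storing the access points and similarity threshold $\theta$
output: A set of groups $G$ satisfying the condition with respect to $\theta$
1: Initialize groupNum;
2: for i = 1 to TLink.length do
3:  if TLink[i].visit == TRUE then
4:   CONTINUE;
5:  else
6:   TLink[i].id = groupNum;
7:   TLink[i].visit = TRUE;
8:   Create a new group G[groupNum];
9:   G[groupNum].append(TLink[i]);
10:  end if
11:  for j = i + 1 to TLink.length do
12:   if TLink[j].visit == TRUE then
13:    CONTINUE;
14:   end if
15:   if Sim(TLink[i], TLink[j]) ≥ θ then
16:    TLink[j].id = groupNum;
17:    TLink[j].visit = TRUE;
18:    G[groupNum].append(TLink[j]);
19:   end if
20:  end for
21:  groupNum = groupNum + 1;
22: end for
23: Return G
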
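
The grouping procedure of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch only: the function names, the dictionary input, and the toy category sets are hypothetical, and the Jaccard index stands in for the paper's similarity measure, which is an assumption.

```python
def jaccard(a, b):
    """Assumed similarity: Jaccard overlap of two category sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def group_access_points(ap_categories, threshold, similarity=jaccard):
    """Single-pass grouping: each unvisited AP seeds a group and absorbs
    every later unvisited AP whose similarity to the seed >= threshold."""
    group_of = {}  # AP id -> group id; doubles as the "visited" flag
    groups = []
    ap_ids = list(ap_categories)
    for i, seed in enumerate(ap_ids):
        if seed in group_of:
            continue
        group_id = len(groups)
        group_of[seed] = group_id
        members = [seed]
        for other in ap_ids[i + 1:]:
            if other in group_of:
                continue
            if similarity(ap_categories[seed], ap_categories[other]) >= threshold:
                group_of[other] = group_id
                members.append(other)
        groups.append(members)
    return groups


# Hypothetical APs with made-up category labels.
aps = {
    "ap1": {"fashion", "footwear"},
    "ap2": {"fashion", "footwear", "jewellery"},
    "ap3": {"food"},
    "ap4": {"food", "cafe"},
}
print(group_access_points(aps, threshold=0.5))
print(group_access_points(aps, threshold=0.7))
```

Raising the threshold yields more, smaller groups, which corresponds to a larger number of hidden states in the resulting HMM.
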
4.2. Extracting Subtrajectory Patterns

In this section, we describe how to extract the frequent subtrajectory patterns from the training data set.

A subtrajectory in this study means a run of consecutive trajectory points; the length of a subtrajectory ranges from 2 to $n$, where $n$ is the number of trajectory points in the given trajectory. As the example in Table 3 shows, given a trajectory sequence $\{p_1, p_2, p_3, p_4\}$, the set of subtrajectories with length two is $\{(p_1, p_2), (p_2, p_3), (p_3, p_4)\}$; then, we sort the subtrajectory patterns in descending order of frequency and choose the top-$k$ patterns as hidden states.

After computing the top frequent subtrajectory patterns, an additional step preprocesses the raw trajectories into frequent subtrajectory-based trajectories. First, we select the frequent subtrajectory patterns according to their occurrence counts; after that, we merge the trajectory points of each matched pattern into a supernode. The procedure is illustrated in Figure 2.
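
As an illustration of the two steps above, the following sketch counts contiguous subtrajectories of a fixed length, keeps the top-k, and then rewrites a trajectory with matched patterns merged into supernodes. The function names and toy trajectories are hypothetical, and the greedy left-to-right matching is an assumption about how the merge could be done.

```python
from collections import Counter


def top_k_subtrajectories(trajectories, length, k):
    """Return the k most frequent contiguous subtrajectories of a given length."""
    counts = Counter()
    for traj in trajectories:
        for start in range(len(traj) - length + 1):
            counts[tuple(traj[start:start + length])] += 1
    return [pattern for pattern, _ in counts.most_common(k)]


def merge_supernodes(traj, patterns):
    """Greedily replace matched patterns (longest first) with one supernode."""
    by_length = sorted(patterns, key=len, reverse=True)
    merged, i = [], 0
    while i < len(traj):
        for pattern in by_length:
            if tuple(traj[i:i + len(pattern)]) == pattern:
                merged.append(pattern)   # the tuple acts as the supernode
                i += len(pattern)
                break
        else:
            merged.append(traj[i])       # no pattern starts here
            i += 1
    return merged


trajectories = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "a", "b"]]
patterns = top_k_subtrajectories(trajectories, length=2, k=2)
print(patterns)
print(merge_supernodes(["a", "b", "c"], patterns))
```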

5. Indoor Trajectory Prediction via HMM

In this section, we start by giving the working mechanism of the proposed method; then, we will present the details of each step.

5.1. Working Mechanism

The working mechanism of our proposed method is illustrated in Figure 3. At its most basic level, our solution contains two essential phases, namely, (1) model training, which retrieves the hidden states corresponding to the historical trajectories and builds the indoor trajectory Hidden Markov model, and (2) prediction, which, given a sequence of observed trajectory points, solves the decoding problem in the HMM and calculates the probabilities of the candidate access points to be visited by the user next. Hidden states contain much information that cannot be observed directly, but this information is highly valuable. We are interested in finding the most probable hidden state sequence with $t$ timestamps corresponding to the given trajectory sequence $T$. This can be answered by solving the decoding problem in the indoor trajectory HMM.

Definition 8. In the decoding problem, we are given $O = \{o_1, o_2, \ldots, o_t\}$, where the $o_i$ represent a chain of trajectory points, and the model $\lambda$; $Q = \{q_1, q_2, \ldots, q_t\}$ is the corresponding hidden state sequence of $O$. We can solve this problem using the Viterbi algorithm by calculating the maximum probabilities and storing the hidden states that result in the maximum.

5.2. Viterbi Algorithm

In this section, we briefly introduce the Viterbi algorithm. First, we introduce the four symbols used here:
(1) $\delta_t(i)$ represents the maximum probability, over all paths $q_1, q_2, \ldots, q_{t-1}$, of producing $o_1, o_2, \ldots, o_t$ and being in hidden state $s_i$ when the observed state is $o_t$ at timestamp $t$
(2) $\psi_t(i)$ represents a status value, which saves the last optimal state leading to the current state
(3) $P^*$ is the output probability
(4) $q_t^*$ saves the final choice of optimal hidden state at timestamp $t$

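Algorithm 2: Viterbi algorithm.
input: Observations $O = \{o_1, o_2, \ldots, o_n\}$ of length n, indoor trajectory HMM $\lambda = (S, O, \pi, A, B)$
output: best path $Q^*$ that generates the observations
1: // initialization
2: for each state $s_i$ in hidden states do
3:  $\delta_1(i) = \pi_i b_i(o_1)$;
4:  $\psi_1(i) = 0$;
5: end for
6: // recursion
7: for each time step t from 2 to n do
8:  for each state $s_j$ in hidden states do
9:   $\delta_t(j) = \max_i [\delta_{t-1}(i) a_{ij}] \, b_j(o_t)$;
10:   $\psi_t(j) = \arg\max_i [\delta_{t-1}(i) a_{ij}]$;
11:  end for
12: end for
13: // termination
14: $P^* = \max_i \delta_n(i)$;
15: $q_n^* = \arg\max_i \delta_n(i)$;
16: // backtracking: $q_t^* = \psi_{t+1}(q_{t+1}^*)$ for $t = n-1, \ldots, 1$
17: Return $Q^* = (q_1^*, q_2^*, \ldots, q_n^*)$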

The pseudocode of the Viterbi algorithm is shown in Algorithm 2. It first initializes the probability cache for the first observation and sets to zero the variable used to track the index of the max node (lines 2-5); then, over all remaining observations and states, it calculates the partial maximum and stores the index that delivers it (lines 7-12); next, it finds the end of the Viterbi path with the maximum probability by taking the arg max (lines 14-15); and finally, it backtracks and returns the hidden state sequence corresponding to the given observation sequence (line 17). Thus, given a sequence of trajectory points, we can obtain the most probable hidden state sequence through the Viterbi algorithm.
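
A compact Python sketch of the Viterbi steps described above is given below. It is a textbook-style illustration rather than the authors' code: the dictionary-based matrices, the state names `g1` and `g2`, the AP ids, and all probability values are made up for the example.

```python
def viterbi(observations, states, pi, A, B):
    """Return (most probable hidden state path, its probability)."""
    # Initialization: probability of starting in each state and emitting o_1.
    delta = [{s: pi.get(s, 0.0) * B.get((s, observations[0]), 0.0) for s in states}]
    psi = [{s: None for s in states}]
    # Recursion: best predecessor for every state at every later time step.
    for t in range(1, len(observations)):
        delta.append({})
        psi.append({})
        for j in states:
            best_prev = max(states, key=lambda i: delta[t - 1][i] * A.get((i, j), 0.0))
            delta[t][j] = (delta[t - 1][best_prev] * A.get((best_prev, j), 0.0)
                           * B.get((j, observations[t]), 0.0))
            psi[t][j] = best_prev
    # Termination and backtracking.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


states = ["g1", "g2"]
pi = {"g1": 0.6, "g2": 0.4}
A = {("g1", "g1"): 0.7, ("g1", "g2"): 0.3, ("g2", "g1"): 0.4, ("g2", "g2"): 0.6}
B = {("g1", "ap1"): 0.5, ("g1", "ap2"): 0.4, ("g1", "ap3"): 0.1,
     ("g2", "ap1"): 0.1, ("g2", "ap2"): 0.3, ("g2", "ap3"): 0.6}

path, prob = viterbi(["ap1", "ap2", "ap3"], states, pi, A, B)
print(path, prob)
```

For long trajectories, working with log-probabilities instead of raw products avoids numerical underflow; the plain products are kept here to mirror the pseudocode.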

5.3. Shop Category Augmented Trajectory Prediction

In this section, we will present the details of shop category prediction and fuse it with the indoor trajectory HMM to help improve the prediction accuracy.

Given a trajectory sequence $T = \{p_1, p_2, \ldots, p_n\}$, the number of shop category sequences is $\prod_{i=1}^{n} |ap_i|$, where $|ap_i|$ is the number of shop categories that the trajectory point $p_i$ covers. As the example in Table 4 shows, given a trajectory sequence $\{p_1, p_2, p_3, p_4\}$, each Wi-Fi access point covers several shop categories; in our example, $p_1$ covers four shop categories, namely, $c_{1,1}$, $c_{1,2}$, $c_{1,3}$, and $c_{1,4}$; $p_2$ covers three shop categories, namely, $c_{2,1}$, $c_{2,2}$, and $c_{2,3}$; and $p_3$ covers three shop categories, namely, $c_{3,1}$, $c_{3,2}$, and $c_{3,3}$. First, we generate the possible shop category sequences according to the first three ($n-1$) trajectory points; in the example, there are a total of $4 \times 3 \times 3 = 36$ possible shop category sequences. Then, the generated shop category sequences are used for prediction: based on the shop category HMM, we obtain the possible shop categories to be visited next. In our example, if $c_{4,1}$ or $c_{4,2}$ (the shop categories that $p_4$ covers) is in the set of predicted shop categories, we say that the shop category to be visited next is accurately predicted.
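
The enumeration of candidate shop category sequences is a Cartesian product over the category sets of the visited points. The short sketch below illustrates the count for the example above; the category labels such as `c11` and the function name are placeholders, not values from Table 4.

```python
from itertools import product


def category_sequences(categories_per_point):
    """All shop category sequences: one category chosen per trajectory point."""
    return list(product(*categories_per_point))


# Categories covered by the first three points of a four-point trajectory.
covered = [
    ["c11", "c12", "c13", "c14"],  # first point covers four categories
    ["c21", "c22", "c23"],         # second point covers three
    ["c31", "c32", "c33"],         # third point covers three
]
sequences = category_sequences(covered)
print(len(sequences))   # 4 * 3 * 3 combinations
print(sequences[0])
```

Because the number of sequences grows multiplicatively with trajectory length, it may be preferable to iterate over `product(...)` lazily rather than materialising the full list for long trajectories.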

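Algorithm 3: Computing the shop category weight of candidate APs.
input: Candidate AP list L1, Predicted shop category list L2
output: Each candidate AP’s shop category weight W
1: Initialize W;
2: for ap in L1 do
3:  for c in the shop categories of ap do
4:   if c in L2 then
5:    Assign the shop category weight to W[ap];
6:   end if
7:  end for
8: end for
9: Return W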

In order to improve the precision of trajectory prediction, we integrate the shop category prediction with trajectory prediction. Each candidate’s shop category weight is computed according to Algorithm 3: list L2 stores the possible shop categories to be visited next; then, for each candidate access point in L1, if one of its surrounding shop categories is in L2, we assign the corresponding value to its shop category weight; finally, we apply Formula (7) to compute the unified score of each access point to be visited, where $\alpha$ is the weight we assigned to the shop category.
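
Formula (7) is not reproduced in this text, so the following sketch only illustrates one plausible form of the fusion: a linear combination of the trajectory score and a binary shop category weight, controlled by a weight parameter `alpha`. The linear form, the binary weight, the function names, and all example values are assumptions made for illustration.

```python
def category_weights(candidate_categories, predicted_categories, weight=1.0):
    """Give a candidate AP the weight if any of its categories was predicted."""
    predicted = set(predicted_categories)
    return {ap: (weight if predicted & set(cats) else 0.0)
            for ap, cats in candidate_categories.items()}


def fused_scores(trajectory_scores, weights, alpha):
    """Assumed fusion: (1 - alpha) * trajectory score + alpha * category weight."""
    return {ap: (1 - alpha) * score + alpha * weights.get(ap, 0.0)
            for ap, score in trajectory_scores.items()}


# Hypothetical HMM scores and category coverage for three candidate APs.
trajectory_scores = {"ap1": 0.5, "ap2": 0.3, "ap3": 0.2}
candidate_categories = {"ap1": {"Food"}, "ap2": {"Cinema", "Food"}, "ap3": {"Books"}}

weights = category_weights(candidate_categories, ["Cinema"])
fused = fused_scores(trajectory_scores, weights, alpha=0.2)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

In this toy case the category evidence lifts the second-ranked AP above the first, which is the kind of re-ranking the fusion step is meant to achieve.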

5.4. Indoor Trajectory Prediction Algorithm

In this section, we present the details of the algorithm for predicting the next location in indoor scenarios.

As shown in Algorithm 4, we aim to predict the position at timestamp $t+1$ given the previous $t$ trajectory points; the details of the algorithm are as follows: (1) initialize the parameters (line 1); (2) check whether the length of the trajectory meets the requirement (lines 2-4); (3) use the Viterbi algorithm to compute the most probable hidden states corresponding to the observation sequence (line 5); (4) traverse all the possible places to be visited next, calculate their probabilities, and rank the candidate APs in decreasing order (lines 6-9); and (5) return the next possible access point (line 10).

Algorithm 4: Indoor trajectory prediction.
input: A set of indoor trajectory sequences T
output: The predicted trajectory point at timestamp $t+1$.
1: Initialize parameters;
2: if T.length == 0 then
3:  Return FALSE;
4: end if
5: RT = Viterbi(T);
6: for i = 0 to m_DimA do
7:  Compute ProNext[i];
8:  Rank the candidate APs by ProNext;
9: end for
10: Return the top-ranked AP
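
The ranking step of Algorithm 4 can be sketched as follows, assuming that the score of a candidate AP is obtained by moving one step forward from the last decoded hidden state and summing over the possible next hidden states. This scoring rule, the function name `rank_next_aps`, and the toy matrices are assumptions for illustration and are not taken from the paper.

```python
def rank_next_aps(last_state, states, candidate_aps, A, B):
    """Score each AP o by sum_j A[last_state, j] * B[j, o] and rank them."""
    scores = {}
    for ap in candidate_aps:
        scores[ap] = sum(A.get((last_state, j), 0.0) * B.get((j, ap), 0.0)
                         for j in states)
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Made-up model: two hidden states and three access points.
states = ["g1", "g2"]
A = {("g1", "g1"): 0.7, ("g1", "g2"): 0.3, ("g2", "g1"): 0.4, ("g2", "g2"): 0.6}
B = {("g1", "ap1"): 0.5, ("g1", "ap2"): 0.4, ("g1", "ap3"): 0.1,
     ("g2", "ap1"): 0.1, ("g2", "ap2"): 0.3, ("g2", "ap3"): 0.6}

# Suppose the Viterbi path for the observed trajectory ended in state g2.
ranking, scores = rank_next_aps("g2", states, ["ap1", "ap2", "ap3"], A, B)
print(ranking, scores)
```

The head of `ranking` plays the role of the returned access point, and the first five entries correspond to the candidates evaluated by a top-five metric.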

6. Experimental Evaluation

In this section, we would like to evaluate and compare our algorithm against baseline methods. All algorithms were implemented using Python 3.5. The dataset was split into two sets, namely, training dataset and testing dataset, in terms of check-in time rather than using a random partition method. The intuition is that in practice we can only use the past check-in data to predict the future check-in events. The training set was used to learn the predicting models of the proposed method, and the testing set was used to evaluate the accuracy of the prediction model.
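
A time-based split of this kind can be sketched as below. The record layout, the field names, and the cutoff date are hypothetical; the paper does not state where the boundary between training and testing data was placed.

```python
from datetime import datetime


def split_by_time(trajectories, cutoff):
    """Train on trajectories that start before the cutoff, test on the rest."""
    ordered = sorted(trajectories, key=lambda traj: traj["start"])
    train = [traj for traj in ordered if traj["start"] < cutoff]
    test = [traj for traj in ordered if traj["start"] >= cutoff]
    return train, test


# Made-up trajectory records within the collection period.
trajectories = [
    {"user": "u1", "start": datetime(2013, 9, 2, 10, 0), "aps": ["ap3", "ap7"]},
    {"user": "u2", "start": datetime(2012, 10, 5, 14, 30), "aps": ["ap1", "ap2", "ap5"]},
    {"user": "u3", "start": datetime(2013, 3, 18, 9, 15), "aps": ["ap4", "ap4", "ap9"]},
]

train, test = split_by_time(trajectories, cutoff=datetime(2013, 8, 1))
print(len(train), len(test))
```

Splitting on time rather than at random prevents future check-ins from leaking into the training set.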

6.1. Experiment Setup
6.1.1. Dataset

The trajectory dataset we used was obtained from a large inner city shopping mall with 67 Wi-Fi access points (APs) across 90,000 square meters between September 2012 and October 2013. The mall contains over 200 stores that belong to 34 shop categories (e.g., Women/Men’s Fashion and General Footwear). Such data provides a unique opportunity for us to analyze the interaction between users’ physical movement and semantic movement in indoor spaces. The statistics of the data set are shown in Table 5.

Figure 4 depicts the number of trajectories with respect to their length. We omit trajectories shorter than three points, which are too short to convey meaningful visiting patterns. As can be seen from the figure, more than 90% of people access three to nine Wi-Fi access points when they are in the shopping mall.

Based on the floor plan, the proximal areas of the APs are classified into three main categories: food court, which consists of 11 APs; retail, which is made up of 46 APs; and navigational, to which 10 APs belong. Table 6 shows the distribution of visiting time per category. Seven percent of users’ visiting time was spent in navigational areas, and the rest was divided between the food court and the retail context (see Table 6). The largest average duration per AP was measured in the food court. In addition, from the average visiting time per user visit, we observed that indoor users tend to spend more time in a retail context than in other physical contexts.

6.1.2. Parameters

Table 7 summarizes all of the parameters used in this work. Each parameter was set to its default value in the experiments in which other parameters were being varied.

6.1.3. Performance Metrics

In general, our trajectory prediction technique computes a score for each candidate item (i.e., an AP in this paper) regarding a target user and returns the APs with the top-$k$ highest scores as prediction results for the target user. To evaluate the quality of our method, we apply three metrics:
(1) p@1: it measures the percentage of times in which we found the correct next location.
(2) p@5: it measures the percentage of times in which the correct next location was among the top-five most probable locations.
(3) Mean reciprocal rank (MRR): it measures the average of the reciprocal ranking positions of the correct next location.

To examine the effectiveness of our proposed method, we have implemented four algorithms for comparison: (1) a naive prediction algorithm based on HMM (Naive), (2) a prediction algorithm that considers AP similarities (SM), (3) a prediction algorithm that takes frequent subtrajectory patterns into consideration (FS), and finally (4) a prediction algorithm that fuses subtrajectory patterns and shop categories into a unified model (FSS).
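
The three metrics can be computed from ranked prediction lists as in the following sketch. The function name `evaluate` and the toy rankings are illustrative; a correct location that does not appear in the ranking is assumed to contribute a reciprocal rank of zero.

```python
def evaluate(ranked_predictions, actual_next):
    """Return (p@1, p@5, MRR) for ranked candidate lists and true next APs."""
    hits_at_1 = hits_at_5 = 0
    reciprocal_sum = 0.0
    for ranking, truth in zip(ranked_predictions, actual_next):
        if ranking[:1] == [truth]:
            hits_at_1 += 1
        if truth in ranking[:5]:
            hits_at_5 += 1
        if truth in ranking:
            reciprocal_sum += 1.0 / (ranking.index(truth) + 1)
    n = len(actual_next)
    return hits_at_1 / n, hits_at_5 / n, reciprocal_sum / n


# Three test cases: a top-1 hit, a hit at rank 2, and a miss.
rankings = [["ap1", "ap2", "ap3"], ["ap2", "ap1", "ap3"], ["ap3", "ap2", "ap1"]]
truths = ["ap1", "ap1", "ap4"]
p_at_1, p_at_5, mrr = evaluate(rankings, truths)
print(p_at_1, p_at_5, mrr)
```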

6.2. Experimental Results

This section analyzes the extensive experimental results. We first investigate the effect of similarity parameter on the prediction accuracy; then, we will check the trend of accuracy when varying the number of frequent subtrajectory patterns; finally, we will compare the fusion method with the aforementioned approaches and discuss some important findings about trajectory prediction in indoor scenarios.

6.2.1. Effect Analysis of the Similarity Threshold

In this section, we investigate the performance of grouping APs into clusters as hidden states according to AP similarities and check the trend of prediction accuracy under different similarity thresholds. The experiments evaluate the effect of the similarity threshold $\theta$, varying it from 0.3 to 0.7 in steps of 0.1 and obtaining a new HMM each time. Note that we only report the results for $\theta$ ranging from 0.3 to 0.7 because when $\theta$ is larger than 0.5, it has little impact on the final results, as Figure 5 shows. Then, we check the performance of the different HMMs and analyze the accuracy trend.

As shown in Figure 5, as the similarity threshold increases, the accuracy of p@5 increases and then remains quite stable once the similarity threshold is larger than 0.5. The reason may be that as the similarity threshold grows, Wi-Fi access points are less likely to be clustered into a group; thus, the number of hidden states in the model also increases, which strengthens the model’s predictive ability, so at the early stage, the prediction accuracy increases. However, beyond a certain threshold, the newly generated hidden states no longer affect the model’s predictive ability much. Thus, we find that the model’s predictive ability does not always improve as the number of hidden states increases.

The trend is more obvious in Figure 6, which illustrates the prediction accuracy for different lengths of trajectories. The length of the trajectories ranges from five to nine; we can conclude from the figure that as the similarity threshold increases, the accuracy of p@5 increases steadily. Another important finding is that as the length of the trajectories increases, the prediction accuracy also increases. The reason may be that with longer trajectories, more historical trajectory points are taken into consideration when computing the most probable hidden state sequence, leading to a better prediction result. This matches real-world experience, because we know a user better as his visiting sequence becomes longer.

6.2.2. Effect Analysis of the Number of Frequent Subtrajectories

In this section, we investigate the performance of the frequent subtrajectory-based HMM. We incrementally add the top frequent subtrajectory patterns to the hidden states of our model. The number of frequent subtrajectories ranges from 5 to 21, covering a growing share of the total Wi-Fi access points. The results are depicted in Figure 7.

As can be seen from the figure, the prediction accuracy for p@1, p@5, and MRR does not increase steadily when we increase the number of frequent subtrajectories. The best result occurs when we employ the top 15 frequent patterns.

6.2.3. Accuracy on Shop Category Prediction

In this section, we check to what extent we can accurately predict the shop categories that are going to be visited next.

Table 8 shows the prediction results. We investigated trajectories whose length ranges from 4 to 7; the number of generated shop sequences ranges from 24870 to 919024, and the prediction accuracy for each length is listed in the table. Thus, we can surmise that human movement exhibits certain patterns and that adopting the shop category will increase the accuracy of trajectory prediction.

6.2.4. Effect Analysis of Fusion Methods

In this section, we check whether combining the access point and the shop category into a unified model yields a better result. Figure 8 shows the results obtained when we vary the weight parameter $\alpha$ of the shop category; the optimal value of $\alpha$ is around 0.8. We can conclude from the results that introducing the shop category helps increase the location prediction accuracy: with additional information about the customers, we can predict their whereabouts more precisely. Table 9 compares the results obtained through the four different methods. As can be seen from the table, the naive method performs worst, mainly because it does not consider the features of the indoor setting. Our frequent subtrajectory-based method greatly outperforms the naive and similarity-based methods for p@1, p@5, and MRR. For p@1 in particular, the accuracy of the frequent subtrajectory-based method is nearly double that of the similarity-based method, indicating that the physical movement pattern plays a crucial role in indoor trajectory prediction. Further, when we integrate the shop category into trajectory prediction, the prediction accuracy improves for p@1, p@5, and MRR, indicating that customers’ movements have certain semantic intentions in real life which can be used to help increase the accuracy of indoor trajectory prediction.

7. Conclusion

Trajectory prediction in indoor space is a challenging topic. Unlike the outdoor scenario, where travelers walk along a footpath or road, in the indoor setting, even within a small region, there are multiple choices for the next location to go to. In order to accurately predict the mobility of customers, we employed HMMs to depict the transitions between trajectory points. The proposed approach first mines the hidden states of indoor trajectory points; to obtain the hidden states, we put forward two methods: the first groups the Wi-Fi access points according to their similarities and uses the groups as hidden states, and the second treats the frequent subtrajectories as hidden states. Then, we use the HMM to model the trajectories and calculate the relevant parameters. After that, the most likely hidden sequence corresponding to the given trajectory is calculated through the Viterbi algorithm, and the ranking list for the next probable access point is generated. In order to enhance the prediction ability, we fuse the results of trajectory prediction and shop category prediction into a unified framework. Finally, we have conducted extensive experiments to evaluate the prediction accuracy of our proposed method using a large-scale real dataset collected from a shopping mall and compared it with baseline methods; the results show that our newly proposed approach outperforms the baselines significantly in terms of prediction accuracy.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant Nos. 61672179, 61370083, and 61402126; the Natural Science Foundation of Heilongjiang Province under grant No. F2015030; the Youths Science Foundation of Heilongjiang Province of China under grant No. QC2016083; the Fundamental Research Funds for the Central Universities under grant No. HEUCFM180601; and the Heilongjiang Postdoctoral Science Foundation under grant No. LBH-Z14071.