Abstract

We present a probabilistic method for predicting the context of mobile users based on their historic context data. The method predicts general context based on probability theory through a novel graphical data structure, a kind of weighted directed multigraph. User context data are transformed into this graphical structure, in which each node represents a context or a combined context and each directed edge indicates a context transfer with a time weight inferred from the corresponding time data. We also consider the periodic property of context data and devise a solution tailored to data with this property. Through experiments, we show the merits of the presented method.

1. Introduction

In recent years, the functionality of mobile devices has been greatly extended. Besides call and message services, mobile users play games, listen to music, and watch TV with their devices, and many people can no longer get along without them [1]. Moreover, a mobile device can detect its location and sense its own status as well as the current weather, such as temperature, humidity, and atmospheric pressure [2]. As the historic context data of mobile users accumulate, it becomes quite important to provide users with advanced services based on their expected context.

Advertisers want to know which mobile users will do something, or be located in some place, within a specific time period [3]. Assume that, as a result of context prediction, a mobile user is expected to reach a location in the near future. Then, as an example service, stores near the location can send ads to him or her. For instance, the owner of a restaurant near the location can send ads to users who will soon be near the restaurant (see Figure 1(a)). Also, mobile users can get some information in advance about, for example, what to do, where to go, or whom to call or message [4]. The user may want to get new information related to the location. For example, the user can get parking information for a resort, where he or she will probably go soon, while still staying at the restaurant (see Figure 1(b)). Context prediction of mobile users can thus provide both advertisers and mobile users with useful information services.

For such advanced services, we should predict a user's context correctly using his or her historic context data. Any context data gathered by a mobile device or its sensors are available. User contexts are thus composed of user activities, user locations, weather information, and so on, all of which are time-stamped. In this paper, we present a probabilistic method for context prediction based on user historic context data. In recent years, there have been many studies on predicting the context of mobile users. However, most of them focused only on location prediction, and many of them were based solely on machine learning or rule-based techniques. Compared with these, the most distinguishable feature of the proposed method is that it predicts general context based on probability theory through a novel graphical structure with time-based edge weights.

The remainder of the paper is organized as follows. In Section 2, we introduce recent related work about context prediction in mobile environments. In Section 3, we present our prediction scenarios. We propose our probabilistic method using a new graphical structure called time-inferred pattern network in Section 4. In Section 5, we provide some test results. Finally we give our conclusions in Section 6.

2. Related Work

In recent years, there have been many studies on predicting the context of mobile users. However, most of them focused only on location prediction, and many of them were based solely on machine learning techniques [5-7]. Also, some of them were based on simple rule-based methods [8, 9].

Our study requires a user's historic context data, but there have also been approaches that work without such data. Karmouch and Samaan [10] predicted the traveling trajectory and destination using knowledge of the user's preferences and analyzed spatial information, without the user's historic data. Ying et al. [11] predicted the next location of a user's movement based on both the geographic and semantic features of users' trajectories. Their prediction model uses a cluster-based strategy that evaluates the next location of a mobile user based on the frequent behaviors of similar users in the same cluster. Voigtmann et al. [12] also presented a collaborative context prediction technique to overcome gaps of missing context information in the user's context history.

As approaches similar to ours, there have been studies that are probability-based, graph-based, or time-aware. Liu and Karimi [7] proposed trajectory prediction methods using a probability-based model and a learning-based model. Wang and Cheng [13] devised an approach for mining periodic maximal promising movement patterns based on a graph structure and a random sampling technique. Chen et al. [14] introduced graph-matching algorithms for mining user movement behavior patterns. Laasonen [15] used other context variables, such as time, to predict mobile user routes. Bradley and Rashad [16] introduced a time-based location prediction technique by mining mobile sequential patterns. However, all of these are quite simple approaches. Here we present a unified approach using probability theory on a new time-inferred graph structure.

Predicting the location of mobile users has been a frequently tackled subtask of mobile context prediction in recent research. There have also been a few studies on predicting general context beyond location, but many of them were likewise based solely on machine learning from user behavior [17, 18]. Tseng and Lin [19] mined and predicted user behavior patterns based on the assumption that the location and the service are inherently coexistent. Hassani and Seidl [20] introduced a method for predicting the next health context of mobile users using multiple contextual streams that influence the health context. In this study, we also consider multiple contexts for predicting any type of context.

3. Prediction Scenarios

3.1. Used Context Types

In mobile computing, location is usually used to approximate context and to implement context-aware applications [6, 7, 9]. However, there is more to context than location, as illustrated in Figure 2 [21]. We use two types of context data: sensing data and log data. Sensing data consist of location, device status (e.g., idle or calling), activity (e.g., exercising, listening to music, watching TV, or playing games), and weather, such as weather type, temperature, humidity, and wind direction/speed. The log data used are as follows: call log, SMS log, played-music log, Email log, chatting log, and visited-Web log. These data of each mobile user are periodically transferred to a server. The server analyzes the historic context data and predicts the next context of each user. The next subsections give our prediction scenarios.

3.2. Prediction on Multiple Contexts

From multiple types of historic context data, we predict the context that belongs to the given target context type and is most likely to occur within the given time period. The motivation for using multiple types of context data comes from the assumption that heterogeneous context, as well as homogeneous context, considerably affects the next context. Figure 3 illustrates context prediction based on multiple types of context data. In the figure, given context data of location and weather for three days in time order, we want to predict the third context of the fourth day.

3.3. Prediction on Periodic Context

This prediction scenario is an extension of the one given in Section 3.2. We assume that users may show different patterns according to some period. For example, the patterns of a typical user on weekdays will be quite different from those on weekends. Figure 4 illustrates context prediction based on periodic context data classified by “a day of the week.” This user may show different patterns on Monday and on Saturday. If we ignore the “day of the week” data type, inaccurate prediction results may be obtained even with a good prediction algorithm.

4. Proposed Method

4.1. Input Context Data

Input data are given in the following form. First, the target context type to predict is given. The maximum prediction time is also provided. Historic context data of a user are given as a series of tuples (starting time, ending time, context type, context element), sorted by starting time in nondecreasing order. An example of historic context data is as follows: (12:00, 13:00, location, restaurant), (13:10, 13:30, activity, listening to music), (13:20, 17:50, weather, rainy), (13:20, 18:00, location, resort), and so on.
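For illustration, the input form above can be sketched in code; the record type, field names, and the sample date below are our own choices, not prescribed by the paper.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ContextRecord:
    start: datetime   # starting time
    end: datetime     # ending time
    ctx_type: str     # e.g., "location", "activity", "weather"
    element: str      # e.g., "restaurant", "rainy"

    @property
    def stay_time(self) -> float:
        """Stay time in seconds (ending time - starting time)."""
        return (self.end - self.start).total_seconds()

# The example history from the text, sorted by starting time.
fmt, day = "%Y-%m-%d %H:%M", "2013-05-01 "
history = [
    ContextRecord(datetime.strptime(day + s, fmt),
                  datetime.strptime(day + e, fmt), t, el)
    for s, e, t, el in [
        ("12:00", "13:00", "location", "restaurant"),
        ("13:10", "13:30", "activity", "listening to music"),
        ("13:20", "17:50", "weather", "rainy"),
        ("13:20", "18:00", "location", "resort"),
    ]
]
# Starting times must be nondecreasing, as the input form requires.
assert all(a.start <= b.start for a, b in zip(history, history[1:]))
```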

4.2. Time-Inferred Pattern Network

Context history can be represented as a graph. For context prediction, we use a new data structure called the time-inferred pattern network (TIPN). A TIPN is a kind of directed weighted multigraph in which each node represents a context or a combination of contexts. There are three types of nodes. Type-I nodes represent given context elements (e.g., restaurant, listening to music, rainy, resort). Each type-I node has the information of the average and the standard deviation of stay times, where stay time means the difference between starting time and ending time (i.e., stay time = ending time − starting time). Type-II nodes deal with ordered pairs of context elements belonging to different context types (e.g., (restaurant, listening to music), (restaurant, rainy), (listening to music, rainy), (listening to music, resort), (rainy, resort)). Each type-II node has the information of the average, the standard deviation, and the frequency of time gaps, where time gap means the difference between the starting times of the two context elements (i.e., time gap = starting time of the second context element − that of the first). Each type-III node is for a series of context elements that frequently appear; the number of context elements of a type-III node is not limited. Each type-III node has the same information as a type-II node, except that the time gap becomes the difference between the starting times of the first context element and the last one. Each directed edge contains transfer information from some (combined) context to a target context element: the average, the standard deviation, and the frequency of time gaps, where the time gap is the difference between the starting times of the starting node and the ending one. Each edge also has a weight, which represents the magnitude of the possibility of moving from context to context; repeated occurrences of the same pattern increase the weight.
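The structure above can be sketched as follows; the class and attribute names are illustrative assumptions, since the paper does not prescribe an implementation:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class TimeStats:
    mean: float = 0.0   # average of stay times / time gaps
    std: float = 0.0    # standard deviation
    freq: int = 0       # frequency (number of observations)

@dataclass
class Edge:
    gap: TimeStats = field(default_factory=TimeStats)  # time-gap statistics
    weight: float = 0.0  # magnitude of possibility of the transition

class TIPN:
    """Sketch of the time-inferred pattern network (a directed
    weighted multigraph)."""
    def __init__(self):
        # node key -> node time statistics; a key is a context element
        # (type I), an ordered pair (type II), or a longer tuple (type III)
        self.nodes: dict = {}
        # (src key, dst key) -> list of parallel edges (multigraph)
        self.edges: dict = defaultdict(list)

    def add_node(self, key):
        self.nodes.setdefault(key, TimeStats())

    def add_edge(self, src, dst) -> Edge:
        e = Edge()
        self.edges[(src, dst)].append(e)
        return e

net = TIPN()
net.add_node("restaurant")                    # type-I node
net.add_node(("restaurant", "rainy"))         # type-II node
net.add_edge("restaurant", "resort").weight = 1.0
```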

4.2.1. Information of Nodes and Edges

In this subsection, we summarize the information maintained for the nodes and edges of a TIPN. We assume a given set of context types, and we have three types of nodes. For a type-I node of each context type, we maintain the average of stay times, the standard deviation of stay times, and the frequency. Similarly, for a type-II node of a context pair of different context types, we maintain the ordered context pair, the average of time gaps, the standard deviation of time gaps, and the frequency. For a type-III node, coming from a frequently appearing context sequence, we maintain the common context sequence, the average of time gaps, the standard deviation of time gaps, and the frequency. Finally, we allow multiple edges between a node and a target node; for each such edge, we maintain the average of time gaps, the standard deviation of time gaps, the frequency, and the weight. Figure 5 shows multiple directed edges between nodes in a TIPN.

4.2.2. Preprocessing

To make type-III nodes, we find frequent subpatterns in the context log. For each context element of the target context type, we extract the subsequences ending with that element within a time window. (Since the maximum prediction time is given, we do not use context data beyond the maximum prediction time when we generate a TIPN.) For each pair of extracted subsequences, we find their longest common subsequence (LCS) [22]. Using Levenshtein distance [23] and k-means clustering [24] on the resulting LCSes, we obtain the optimal number of clusters and the clusters themselves. For each cluster, we find a central sequence that is as short as possible. We add these central sequences to the type-III node set. If the length of a central sequence is just one, we discard the corresponding node.
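The core pairwise operation of this step, the longest common subsequence, can be sketched with the standard dynamic program of [22]; the sequences below are hypothetical, and the Levenshtein/k-means clustering of the resulting LCSes is omitted.

```python
# Longest common subsequence of two context-element sequences
# (classic O(m*n) dynamic programming with backtracking).
def lcs(a, b):
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack to recover one LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

# Two hypothetical extracted subsequences ending with "office".
s1 = ["home", "bus", "office", "cafe", "office"]
s2 = ["home", "office", "cafe", "gym", "office"]
common = lcs(s1, s2)   # their common movement pattern
```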

4.2.3. Construction

We construct a TIPN after the preprocessing step that creates type-III nodes. Starting with an empty TIPN composed of isolated nodes, we iteratively construct nodes and edges while reading the input context data one by one. Algorithm 1 shows the detailed pseudocode for constructing a TIPN.

Start with an empty TIPN composed of isolated nodes;
for i ← 1 to n do   // read input context data c_1, …, c_n
 // constructing type-I nodes and their edges
 Update info of type-I node of c_i;
 for j ← i + 1 to n do
  if type-I node of c_j belongs to the target context type then
   Update edge from node of c_i to node of c_j;
 // constructing type-II nodes and their edges
 for j ← i + 1 to n do
  if context type of c_i ≠ context type of c_j then
   for k ← j + 1 to n do
    if type-I node of c_k belongs to the target context type then
     Update info of type-II node (c_i, c_j);
     Update edge from node (c_i, c_j) to node of c_k;
 // constructing type-III nodes and their edges
 for j ← i to n do
  if there is a type-III node starting with the node of c_i then
   for k ← j + 1 to n do
    if type-I node of c_k belongs to the target context type then
     Update info of that type-III node;
     Update edge from the type-III node to node of c_k;

When we update an edge, we test the accumulated time gaps for the edge against a normal distribution. If they do not follow a normal distribution, we divide the edge into two or more edges, so there may be multiple edges between each pair of nodes. The weight of each edge is determined in proportion to its frequency and to the magnitudes of its corresponding time gaps.
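Each “Update info” step in Algorithm 1 maintains an (average, standard deviation, frequency) triple incrementally. One way to do this, assuming the paper's unspecified update is a standard online one, is Welford's algorithm:

```python
import math

# Incremental (average, standard deviation, frequency) maintenance for
# a node's stay times or an edge's time gaps, via Welford's online
# algorithm; the paper does not prescribe this particular algorithm.
class RunningStats:
    def __init__(self):
        self.freq = 0       # frequency
        self.mean = 0.0     # running average
        self._m2 = 0.0      # sum of squared deviations from the mean

    def update(self, x: float):
        self.freq += 1
        delta = x - self.mean
        self.mean += delta / self.freq
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # population standard deviation (0 for an empty sample)
        return math.sqrt(self._m2 / self.freq) if self.freq > 0 else 0.0

gaps = RunningStats()
for g in [10.0, 20.0, 30.0]:   # observed time gaps, e.g., in minutes
    gaps.update(g)
```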

4.2.4. Maintenance

It is clear that recent context data are more important than old ones, so we devise an aging strategy for edges that gives more weight to recent data. We periodically apply it to the edge weights, once per unit time, for example, a day, a week, a month, or a year. That is, every edge weight w(e) is periodically updated as w(e) ← α · w(e) for all edges e, where 0 < α < 1. (A fixed value of α was used in our experiments.) Also, we remove weakly connected edges: given a threshold θ for removing edges, we remove an edge when its weight is less than θ.
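A minimal sketch of one aging period, with illustrative values for the decay factor α and the removal threshold θ (the paper's actual settings are not reproduced here):

```python
# Periodic aging of edge weights with decay factor alpha in (0, 1) and
# pruning of weakly connected edges against a threshold theta.
def age_and_prune(edge_weights: dict, alpha: float = 0.9,
                  theta: float = 0.05) -> dict:
    aged = {e: w * alpha for e, w in edge_weights.items()}
    # Remove edges whose decayed weight falls below the threshold.
    return {e: w for e, w in aged.items() if w >= theta}

weights = {("restaurant", "resort"): 1.0, ("home", "office"): 0.05}
weights = age_and_prune(weights)   # after one aging period
```

After one period the strong edge survives with a decayed weight, while the weak edge drops below θ and is pruned.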

When a TIPN has already been constructed and new context data have been added, we apply the same procedure of Algorithm 1, starting with the previously constructed TIPN instead of an empty one.

4.3. Prediction Algorithm
4.3.1. Assumptions

In general, we assume that the time-gap data of each edge and the stay-time data of each node follow a normal distribution. If the collected data are sufficient, this assumption is reasonable by the law of large numbers in probability theory [25]. However, when the collected data are sparse, the assumption is not good. In such cases, we assume the data follow a uniform distribution. In more detail, we assume that the data follow U(μ − √3 σ, μ + √3 σ), where μ and σ denote the average and the standard deviation of the sampled time data, respectively. The following fact supports this assumption.

Fact 1. The average and the variance of U(μ − √3 σ, μ + √3 σ) are μ and σ², respectively.

Proof. Let a random variable X follow U(μ − √3 σ, μ + √3 σ), and let f be the probability density function of X, so that f(x) = 1/(2√3 σ) on this interval. Then E[X] = ∫ x f(x) dx = ((μ − √3 σ) + (μ + √3 σ))/2 = μ, and Var[X] = ((μ + √3 σ) − (μ − √3 σ))²/12 = (2√3 σ)²/12 = σ².
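Fact 1 can also be checked numerically; the following Monte-Carlo sketch (sample size and seed are arbitrary) draws from U(μ − √3 σ, μ + √3 σ) and confirms that the sample average and standard deviation approach μ and σ:

```python
import math
import random
import statistics

# Draw n samples from U(mu - sqrt(3)*sigma, mu + sqrt(3)*sigma).
def sample_uniform(mu: float, sigma: float, n: int, seed: int = 0):
    rng = random.Random(seed)
    half = math.sqrt(3.0) * sigma
    return [rng.uniform(mu - half, mu + half) for _ in range(n)]

mu, sigma = 5.0, 2.0
xs = sample_uniform(mu, sigma, 200_000)
mean_hat = statistics.fmean(xs)   # should be close to mu
std_hat = statistics.pstdev(xs)   # should be close to sigma
```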

4.3.2. Sum of Edge Distribution and Node One

First we consider the sum of normal distributions; this is well known in probability theory, for example, [25]. If X and Y are independent random variables that are normally distributed, then their sum is also normally distributed. That is, if X ∼ N(μ_X, σ_X²), Y ∼ N(μ_Y, σ_Y²), and X and Y are independent, then X + Y ∼ N(μ_X + μ_Y, σ_X² + σ_Y²). This means that the sum of two independent normally distributed random variables is normal, with its mean being the sum of the two means and its variance being the sum of the two variances (i.e., the square of the standard deviation is the sum of the squares of the standard deviations).

Next we consider the sum of uniform distributions. We assume that X and Y are independent random variables that are uniformly distributed, that is, X ∼ U(a, b) and Y ∼ U(c, d). Let X′ be the uniform random variable obtained by shifting the range of X by −a, that is, X′ = X − a ∼ U(0, b − a). Let Y′ be the uniform random variable obtained by shifting the range of Y by −c, that is, Y′ = Y − c ∼ U(0, d − c). Assuming that b − a ≤ d − c without loss of generality, the sum of X and Y is Z = X′ + Y′ + (a + c). The cumulative distribution function of Z′ = X′ + Y′ is obtained by convolving the two shifted densities.

By differentiating this function and then shifting the range of Z′ by a + c to the right, we obtain the following trapezoidal probability density function of Z (see Figure 6): with w₁ = b − a and w₂ = d − c, f_Z(z) = (z − (a + c))/(w₁w₂) for a + c ≤ z ≤ a + c + w₁; f_Z(z) = 1/w₂ for a + c + w₁ ≤ z ≤ a + c + w₂; f_Z(z) = ((b + d) − z)/(w₁w₂) for a + c + w₂ ≤ z ≤ b + d; and f_Z(z) = 0 otherwise.
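The trapezoidal density can be checked numerically; this sketch assumes independent X ∼ U(a, b) and Y ∼ U(c, d) with b − a ≤ d − c and verifies that the density integrates to one:

```python
# Probability density of Z = X + Y for independent X ~ U(a, b) and
# Y ~ U(c, d): a trapezoid over [a + c, b + d].
def uniform_sum_pdf(z, a, b, c, d):
    w1, w2 = b - a, d - c
    assert w1 <= w2, "assume b - a <= d - c without loss of generality"
    lo, hi = a + c, b + d
    if z < lo or z > hi:
        return 0.0
    if z <= lo + w1:                  # rising edge of the trapezoid
        return (z - lo) / (w1 * w2)
    if z <= lo + w2:                  # flat top
        return 1.0 / w2
    return (hi - z) / (w1 * w2)      # falling edge

# Midpoint-rule integration over the support: total mass should be ~1.
a, b, c, d = 0.0, 1.0, 0.0, 2.0
n = 100_000
h = (b + d - (a + c)) / n
mass = sum(uniform_sum_pdf(a + c + (i + 0.5) * h, a, b, c, d)
           for i in range(n)) * h
```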

Finally, we consider the sum of a normal distribution and a uniform one. Let X be a random variable that is normally distributed, that is, X ∼ N(μ, σ²). Let Y be a random variable that is uniformly distributed, that is, Y ∼ U(a, b). We assume that X and Y are independent. We cannot obtain a closed form of the probability density function of Z = X + Y, but fortunately the following cumulative distribution function is available: F_Z(z) = (1/(b − a)) ∫ₐᵇ Φ((z − y − μ)/σ) dy, where Φ is the cumulative distribution function of the standard normal distribution.
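This CDF can be evaluated numerically using only the standard normal CDF Φ; a sketch using a simple midpoint rule (the integration granularity n is an arbitrary choice):

```python
import math

def phi(x: float) -> float:
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# F_Z(z) for Z = X + Y, X ~ N(mu, sigma^2), Y ~ U(a, b), independent:
# F_Z(z) = (1/(b - a)) * integral over [a, b] of Phi((z - y - mu)/sigma) dy,
# evaluated by a midpoint rule with n subintervals.
def normal_plus_uniform_cdf(z, mu, sigma, a, b, n=10_000):
    h = (b - a) / n
    return sum(phi((z - (a + (i + 0.5) * h) - mu) / sigma)
               for i in range(n)) * h / (b - a)

F = normal_plus_uniform_cdf
lo = F(-10.0, mu=0.0, sigma=1.0, a=0.0, b=1.0)   # far left tail, ~0
hi = F(10.0, mu=0.0, sigma=1.0, a=0.0, b=1.0)    # far right tail, ~1
```

By symmetry of this example (X symmetric about 0, Y about 0.5), F at z = 0.5 equals 1/2, which makes a convenient spot check.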

4.3.3. Algorithm

When the current time is t and recent context data are given, we list all possible context elements and their possibilities for a specific user within the given time period [t, t + Δ], where Δ does not exceed the maximum prediction time. To predict context, we use an already constructed TIPN. The possibility of each target node is computed from the following three factors: (i) the probability of moving from each node to the target node along the corresponding edge, (ii) the context predictability of each node, and (iii) the degree of importance of the edge to the target node.

Now we define each factor and give the formula to compute the possibility. (i) First, we calculate the probability of moving from each node to the target node in relation to the corresponding edge. We assume that the time-gap data of each edge and the stay-time data of each node follow a normal or uniform distribution. To compute the probability precisely, we make good use of conditional probability and the sum distributions of Section 4.3.2: from the density of the edge's time gap, the density of the sum of the edge's time gap and the node's stay time, and the elapsed time since the starting time of the latest context related to the node, the probability is the chance of arriving at the target node within the prediction period and not having departed from the node before the current time. (ii) Context predictability means the degree of easiness of prediction for the target context type. It can be derived from a similarity measure between each node and the target context type; we use a variance-based measure, which ranges from 0 to 1.

(iii) The degree of importance of the edge from a node to the target node can be simply defined from the edge weights, for example, as the weight of the edge normalized by the total weight of the edges entering the target node. Now we can define the possibility of each target node as the weighted sum of the per-edge probabilities of factor (i), weighted by the predictability of factor (ii) and the importance of factor (iii).
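A minimal sketch of this weighted sum; the tuple layout and the normalization of the importance factor are our own assumptions about an otherwise unspecified combination:

```python
# Possibility of a target node as a weighted sum over incoming edges of
# (transition probability) x (context predictability of the source
# node) x (normalized edge importance). Structure is illustrative.
def target_possibility(incoming):
    """incoming: list of (prob, predictability, weight) per edge."""
    total_w = sum(w for _, _, w in incoming)
    if total_w == 0:
        return 0.0
    return sum(p * c * (w / total_w) for p, c, w in incoming)

# Two hypothetical candidate edges into the target node:
score = target_possibility([
    (0.8, 0.9, 3.0),   # likely transition from a reliable source node
    (0.2, 0.5, 1.0),   # weaker evidence from a second node
])
```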

4.4. Managing Periodic Data

To manage periodic context data, we define basic periodic elements, for example, the set of “days of the week.” First, we construct one TIPN per basic periodic element. Some groups of basic elements may have similar patterns; if so, we should merge such TIPNs into one before applying the prediction algorithm given in Section 4.3. The similarity between any two TIPNs, T₁ and T₂, can be obtained from the normalized summation of the similarities s between the distributions of their corresponding edges (see the next subsections).

If the similarity between two TIPNs is close to one, we merge the two TIPNs, and this process continues until no similar TIPNs remain. We now present how to merge two TIPNs. We simply merge the nodes and edges of the TIPNs. Nodes and edges carry the information of their statistical distributions, so we have to consider the sum of distributions. We assume that each node and each edge follows a normal or uniform distribution. Since the sum of normal or uniform distributions is given in Section 4.3.2, we only have to obtain the related statistics. Assuming that we have two statistics of the average, the standard deviation, and the frequency, that is, (μ₁, σ₁, n₁) and (μ₂, σ₂, n₂), their sum becomes n = n₁ + n₂, μ = (n₁μ₁ + n₂μ₂)/n, and σ² = (n₁(σ₁² + μ₁²) + n₂(σ₂² + μ₂²))/n − μ².
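The pooled-statistics rule can be sketched and checked against a direct computation on the concatenated sample; the sample values are arbitrary:

```python
import math
import statistics

# Merge two (average, standard deviation, frequency) summaries into the
# statistics of the pooled sample, as needed when merging the nodes and
# edges of two similar TIPNs.
def merge_stats(m1, s1, n1, m2, s2, n2):
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    # Second moment E[X^2] of the pooled sample, from per-part moments.
    ex2 = (n1 * (s1 * s1 + m1 * m1) + n2 * (s2 * s2 + m2 * m2)) / n
    return m, math.sqrt(ex2 - m * m), n

# Sanity check against direct computation on the concatenated sample.
a, b = [1.0, 2.0, 3.0], [4.0, 6.0]
m, s, n = merge_stats(statistics.fmean(a), statistics.pstdev(a), len(a),
                      statistics.fmean(b), statistics.pstdev(b), len(b))
```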

4.4.1. Similarity between Normal Distributions

Let edges e₁ and e₂ follow normal distributions N(μ₁, σ₁²) and N(μ₂, σ₂²), respectively. As shown in Figure 7, there are three cases to consider. The first case is that e₁ and e₂ follow the same distribution, that is, μ₁ = μ₂ and σ₁ = σ₂; in this case, it is obvious that s = 1. The second case is that the probability density functions of e₁ and e₂ have only one intersection, that is, σ₁ = σ₂ and μ₁ ≠ μ₂. We assume that μ₁ < μ₂ without loss of generality. The intersection point is then x = (μ₁ + μ₂)/2, and the overlap is s = F₂(x) + (1 − F₁(x)), where F₁ and F₂ are the corresponding cumulative distribution functions. The third case is that the probability density functions of e₁ and e₂ have two intersections, that is, σ₁ ≠ σ₂. Assume that σ₁ < σ₂ without loss of generality. Then s = F₁(x₁) + (F₂(x₂) − F₂(x₁)) + (1 − F₁(x₂)), where x₁ < x₂ are the two intersection points. We can get x₁ and x₂ by solving the equation f₁(x) = f₂(x) for the two density functions f₁ and f₂.
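Rather than the case analysis, the overlap area s = ∫ min(f₁, f₂) can also be computed by brute-force numeric integration, which makes a useful cross-check; grid width and resolution here are arbitrary:

```python
import math

def npdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma * sigma)) / (
        sigma * math.sqrt(2.0 * math.pi))

# Similarity of two edge distributions as the overlap area of their
# densities: s = integral of min(f1, f2), by midpoint-rule integration
# over a grid covering +/- 8 standard deviations of both densities.
def normal_overlap(mu1, s1, mu2, s2, n=50_000):
    lo = min(mu1 - 8.0 * s1, mu2 - 8.0 * s2)
    hi = max(mu1 + 8.0 * s1, mu2 + 8.0 * s2)
    h = (hi - lo) / n
    return sum(min(npdf(lo + (i + 0.5) * h, mu1, s1),
                   npdf(lo + (i + 0.5) * h, mu2, s2))
               for i in range(n)) * h

same = normal_overlap(0.0, 1.0, 0.0, 1.0)      # identical: s ~ 1
shifted = normal_overlap(0.0, 1.0, 2.0, 1.0)   # equal sigma, one crossing
```

For the equal-sigma case the closed form is s = 2Φ(−(μ₂ − μ₁)/(2σ)), which the numeric result should match.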

4.4.2. Similarity between Uniform Distributions

Let edges e₁ and e₂ follow uniform distributions U(a₁, b₁) and U(a₂, b₂), respectively. We assume that b₁ − a₁ ≤ b₂ − a₂ without loss of generality. As shown in Figure 8, there are two cases to consider: the ranges may partially overlap, or one may contain the other. For both cases, s becomes L/(b₂ − a₂), where L = max(0, min(b₁, b₂) − max(a₁, a₂)) is the length of the overlapping interval and 1/(b₂ − a₂) is the smaller of the two density heights.

4.4.3. Similarity between Normal Distribution and Uniform One

Let edges e₁ and e₂ follow a normal distribution N(μ, σ²) and a uniform distribution U(a, b), respectively. As shown in Figure 9, there are four cases to consider: one case for h ≥ f(μ) and three cases for h < f(μ), where f is the density function of N(μ, σ²) and h = 1/(b − a) is the height of the uniform density. In the case of h ≥ f(μ), the normal density lies entirely below the uniform one on [a, b], so s becomes the normal probability mass on [a, b], that is, s = F(b) − F(a), where F is the cumulative distribution function of N(μ, σ²). When h < f(μ), the probability density function of e₁ has two intersection points x₁ and x₂ with the line y = h; these points, the roots of the equation f(x) = h, become x₁,₂ = μ ∓ σ√(2 ln(1/(√(2π) σ h))), respectively. The latter three cases are classified by the position of [a, b] relative to [x₁, x₂], and in each case s is obtained by integrating the pointwise minimum of the two densities over [a, b].

5. Test Results

We tested the proposed method using real context data of a mobile user. There were four types of context in our test: location, call, SMS, and activity, with 5, 6, 2, and 2 context elements, respectively. We set location to be the target context type to predict. The maximum prediction time is 6 hours, and the time period to predict is a given interval of minutes after the current time t. As training data to construct a TIPN, we used data for 160 hours (about 7 days). As test data for the prediction algorithm of Section 4.3, we used data for the 6 hours before the current time t. The resultant TIPN had 15 type-I nodes, 13 type-II nodes, and 4 type-III nodes; each obtained type-III node contained at least two context elements, since length-one central sequences are discarded in preprocessing. We performed two tests. The first is context prediction on nonperiodic weekday data; Table 1 shows the results. To see the effect of node extension (i.e., of introducing type-II and type-III nodes), we used three TIPNs: one with only type-I nodes, one with type-I and type-II nodes, and one with all types of nodes. All methods successfully predicted context with high probability, and we could also see that type-II and type-III nodes help predict context more accurately.

The second test is context prediction on periodic data. As basic periodic elements, we used the days of the week. The test user had two patterns: a weekday pattern and a weekend one. We made tests for both cases; Table 2 shows the results. “All” means prediction on the whole context data, and “Periodic” means prediction applying the method of Section 4.4. We could see that “Periodic” was better than “All”; in the weekend test case, “All” even produced a wrong answer.

6. Conclusions

We proposed a novel probabilistic approach for context prediction of mobile users based on their historic context data. The proposed method predicts general context based on probability theory through a new graphical structure called time-inferred pattern network (TIPN).

The accumulated context data of a user are transformed into a TIPN, in which each type-I node represents a single context, each type-II node comes from a pair of contexts, each type-III node, the most general, is made from a series of contexts, and each directed edge indicates a context transfer with a time weight inferred from the corresponding time data. With the constructed TIPN, assuming nodes and edges follow normal or uniform distributions, we apply a scoring method derived from probability theory to predict the next context.

Unlike traditional approaches using only location, we used other types of context, such as call, SMS, and activity, together with location. We also considered context data with a periodic property, providing a good solution for context prediction of mobile users with such patterns. Our empirical studies showed the strengths of the proposed prediction algorithms: nodes corresponding to more complex context patterns helped predict the next contexts more accurately, as did the classification and clustering of daily context data patterns with a periodic property. Although the proposed method showed some merits through a simple comparison in our experiments on a small data set, we did not compare the proposed method with other state-of-the-art ones. Such a comparison on large-scale data will be a good direction for future work.

Acknowledgments

The present research has been conducted in 2013 during the sabbatical research year granted by Kwangwoon University. This work was partly supported by the Samsung Electronics Co., Ltd. and the Advanced Research on Meteorological Sciences through the National Institute of Meteorological Research of Korea in 2013 (NIMR-2012-B-1). A preliminary version of this paper appeared in the Proceedings of the Annual ACM Symposium on Applied Computing, pp. 1015–1019, 2010. The authors would like to thank Mr. Wonkook Kim for his valuable suggestions in improving this paper.