Abstract

Location-based recommendation of information such as points of interest (POIs) and advertisements has recently been studied extensively in both academia and industry. In this paper, we first construct a region-based location graph (RLG), in which each region node connects to user nodes and to business information nodes. We then propose a location-based recommendation algorithm on the RLG that combines users’ short-range mobility, formed by daily activity, with their long-distance mobility, formed by social network ties, and can therefore recommend both local and long-distance business information to users. Moreover, the algorithm combines user-based with item-based collaborative filtering and alleviates the cold start problem from which traditional recommender systems often suffer. Empirical studies on large-scale real-world data from Yelp show that our method outperforms the compared methods in recommendation accuracy.

1. Introduction

Rapid technological development has brought an increasing number of mobile devices equipped with GPS capabilities, such as laptops, PDAs, and mobile phones. As a result, checking in has become part of daily life for millions of users, who share their locations, tips, and experiences about points of interest (POIs) with their friends in location-based social networks. How to provide timely and personalized information and sharing services based on users’ location information is attracting growing attention from both industry and the research community, and it has formed an independent research area known as location-based services (LBS). Personalized information recommendation is particularly important: it helps users discover new POIs or special promotions and explore their city, and it helps advertisers deliver advertisements to targeted users.

Much work on location-based recommendation of information (e.g., POIs and advertisements) has been done in both research and industry [13]. Collaborative filtering (CF) is the mainstream approach to this task. Both memory-based and model-based collaborative filtering methods have been proposed and investigated to learn users’ preferences in LBS from their location check-in data [1, 4, 5]. However, previously proposed methods treat all check-ins as a whole and usually overlook the basic laws governing human motion and dynamics. In addition, little research has addressed the cold start problem, which arises because users rarely rate items in location-based recommender systems. As shown in [6], human movement combines periodic movement that is geographically limited with seemingly random jumps that are correlated with social networks. About 50% to 70% of all human movements are short-ranged and periodic both spatially and temporally and are not affected by the social network structure, while about 10% to 30% are long-distance and random and are usually influenced by social network ties. Hence, location-based recommendation should be sensitive to the range of users’ movement, and in this paper we show how to alleviate the cold start problem in location-based recommender systems using these basic laws of user movement.

Unlike previous work, our goal in this paper is to recommend information within the scope of users’ movement in a very sparse rating system. This task is harder than traditional location-based recommendation or prediction because the recommended business information must lie within the scope of the user’s daily movement. It is also more significant, since it can provide personalized local information combined with long-range travel information close to the homes of the user’s friends. If we can divide a user’s movement region into two parts, a local part and a remote part, then we can recommend the user’s favorite business information in each part. First, however, we must determine each user’s scope of movement from the check-in log. In location-based services or social networks, the places where users check in are usually points of interest or parkland they are visiting, so we cannot obtain users’ continuous location trajectories from check-in data; more importantly, this results in a very sparse dataset for location-based recommender systems.

Based on these two properties of check-in data and the findings in [6], we focus on explicitly modeling users’ local movement and long-distance travel preferences for recommendation from their check-in data. There are two challenges: (1) how to determine users’ local and remote movement regions and (2) how to find users’ favorite business information in each movement region. To address these challenges, we propose a region-based location graph (RLG) and design new algorithms to make accurate top-N recommendations on the RLG. The distinctive feature of the proposed model is the introduction of movement region nodes, covering both the local and the remote movement region of each user, which help users find local neighbors and remote friends for collaborative information recommendation. The model captures users’ local visiting interests through user-local movement region connections and their long-distance travel interests through user-remote movement region connections. When the local movement regions of two users intersect, we call the two users local neighbors; when the local movement region of one user intersects the remote movement region of another, we call the two users remote friends.

To summarize, our main contributions are as follows.
(1) We construct a region-based location graph (RLG), in which each region node connects with user nodes and business nodes.
(2) When the local movement regions of two users intersect, we call the users local neighbors; when the local movement region of one user intersects the remote movement region of another, we call the two users remote friends. Based on the RLG framework, we propose a novel location-based business information recommendation algorithm.
(3) We compare our approach with other methods on a real dataset and show its performance in alleviating the cold start user problem in location-based recommender systems.

2. Related Work

In recent years, mobile communication and mobile positioning technologies have developed rapidly, and the spread of social network sites in particular has created new opportunities for social applications of geospatial information. Users’ willingness to share their current locations and experiences has made it possible to build location-based recommender systems (LBRS) on user-generated content, and such systems have received much attention from both the academic community and industry.

Currently, there are two lines of work on location-based recommendation [5]. One line of research is based on GPS trajectory logs [7–11]. GPS trajectory data usually cover a small number of users but contain dense records [12, 13]. Many collaborative filtering algorithms that treat locations or POIs as the items of a traditional recommender system have been proposed, such as collective matrix factorization [8], tensor factorization [9], a memory-based collaborative location model [10], and a pattern recognition model [11]. The other line of work focuses on data from location-based social networks, which are large-scale and very sparse [1, 4, 14]. Here, geographical influences have been modeled and fused with traditional CF algorithms, for example, by modeling the check-in probability as a power-law function of distance over the whole check-in history [1], by modeling users’ multicenter check-in behaviors with multicenter Gaussians [4], and by mining user check-in behaviors [15].

The crucial point of location-based recommender systems is that they enable users to read or ask for recommendations in the vicinity of a location they are visiting or used to visit, so recommendations must be strongly bound to users’ movement regions. However, there is little previous research on this issue, and the cold start problem caused by very sparse data in location-based recommender systems is not yet well studied. Because a user can visit only a limited number of locations, especially when traveling to a new city, the user-location matrix is very sparse, which poses a big challenge to traditional collaborative filtering-based location recommender systems. To this end, Bao et al. [16] proposed a location recommender system consisting of two main parts: offline modeling and online recommendation. The offline modeling part models each individual’s personal preferences with a weighted category hierarchy and, using an iterative learning model, infers the expertise of each user in a city with respect to different categories of locations from the user’s location history. The online recommendation part uses a preference-aware candidate selection algorithm to select candidate local experts in a geospatial range who match the user’s preferences and then infers a score for each candidate location from the opinions of the selected local experts. The significance of recommender systems in location-based services and this promising solution motivate the further investigation in this paper.

3. Data Model and Problem Definition

In this section, we briefly introduce the related data model and define users’ favorite business information finding problem in their different movement regions.

3.1. Data Model

Unlike GPS trajectory data, users’ check-in data in location-based social networks are not continuous in either the spatial or the temporal dimension. The places where users check in are usually points of interest or parkland they are visiting. For example, when a user has dinner in a restaurant, he may share some information about this restaurant and his experience with his friends on a social network or a review website. Such check-in data indicate that the scope of a user’s daily motion covers all businesses reviewed by him; in other words, the businesses reviewed by a user lie within the scope of his daily motion.

Suppose that is a mobile user set, where   is the total number of mobile users. Each user has some essential attributes, such as gender, age, and occupation, which is denoted by the form of . For a user’s reviewed and favorite businesses  , which can be formed as the following triple , each business has some basic attributes, including location (e.g., longitude and latitude, denoted by ) and service categories (e.g., restaurant, hotel, bar, and shopping mall, denoted by , is 1 or 0).

Our data take the form of triples, which can be modeled by a tripartite graph [17] or a tensor [18]. Both representations treat location as a universal dimension shared by all users. In fact, however, there is a one-to-one correspondence between business and location in users’ check-in data, and many users will never review businesses that lie outside the region in which they move in their daily lives. As argued in [6], most users’ motion is composed of short-ranged daily travel between home and workplace, which is periodic both spatially and temporally, and long-distance travel, which is more influenced by social network ties. In a recommender system, the fixed correlation between business and location is typically not significant, whereas the movement region plays an important role in generating recommendations, and the correlation between a user and his movement regions is more relevant than that between the user and the locations of the businesses he reviewed.

Therefore, according to the geographical distribution of all businesses reviewed by each user, we divide a user’s movement region into a local movement region and a remote movement region. Consider the geographical center of a user’s movement region and the farthest distance from this center that the user can reach in his daily life. If there is a radius, no larger than this farthest distance, such that the percentage of the user’s reviewed businesses lying in the circle around the center with that radius reaches a fixed threshold, we call this circle the user’s local movement region and the remaining region his remote movement region. Distances between two points are Euclidean, and, according to the conclusion in [6], we set the threshold to 0.7. In addition, we call two users local neighbors if their local movement regions overlap, whereas we call them remote friends if the remote movement region of one user and the local movement region of the other overlap.
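
As a rough illustration of this definition, the following Python sketch derives a circular local movement region from the coordinates of the businesses a user reviewed and tests whether two users are local neighbors. It rests on assumptions that the text does not state: the center is taken to be the arithmetic mean of the coordinates, the radius is the smallest center-to-business distance that covers the required fraction, and coordinates are treated as planar so that the Euclidean distance applies. The names `local_region` and `are_local_neighbors` are hypothetical.

```python
import math

def local_region(points, theta=0.7):
    """Return ((cx, cy), radius) of a circular local movement region.

    points: list of (x, y) coordinates of the businesses reviewed by one user.
    theta:  fraction of reviewed businesses the circle must cover (0.7 in the text).
    Assumption: the center is the mean of the coordinates, and the radius is the
    smallest center-to-business distance that covers a fraction theta of them.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    # Euclidean distance from the center to every reviewed business, ascending.
    dists = sorted(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    k = math.ceil(theta * len(points))  # number of businesses that must be covered
    return (cx, cy), dists[k - 1]

def are_local_neighbors(region_a, region_b):
    """Two circular regions overlap when their centers are at most r_a + r_b apart."""
    (ax, ay), ra = region_a
    (bx, by), rb = region_b
    return math.hypot(ax - bx, ay - by) <= ra + rb
```

For latitude and longitude spread over a single city, the planar treatment is only an approximation; a great-circle distance could be substituted without changing the structure of the sketch.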

3.2. Problem Definition

Intuitively, location-based recommendation is trying to find potential favorite business information within users’ entire motion range. With data model defined above, we formally define this problem as follows: there are some basic datasets of all users, including review log set , local neighbor set and remote friend set . For a specific query user, one recommendation method should return a ranked list of businesses which the user would like, and what is more, the ranking score in the process should consider both user’s different movement region and social relationship.

4. RLG Construction

In this section, we treat the two movement regions of users as new nodes, which enable new linkages between users and the location of their reviewed business and construct a graph and name it region-based location graph (RLG), which contains three types of nodes: user node, movement region node, and business node. In this way, we can transform into by formation of users’ two movement regions.

RLG is a bipartite graph , where denotes the set of all user nodes, is the set of users’ movement region nodes, and is the set of business nodes. is a nonnegative weight function for all edges. Figure 1 is a simple example of RLG containing two user nodes, 4 region nodes, and 6 business nodes. It shows that user interacts with his two movement regions and ; likewise, the user node interacts with his two movement region nodes and , Furthermore, region node is also linked to business nodes , , and because user node   reviewed business nodes  , , and in his local movement region node  , and region node is connected with business nodes because user node   reviewed businesses node   in his remote movement region node  . Similarly, region node is connected with business nodes and and region node is connected with business node .

In RLG, each user node connects with two movement regions, and the two movement regions only connect with some business nodes which were reviewed by user. If two users coreviewed a business, then their two movement regions would be overlapping; for example, in Figure 1, user and user coreviewed, respectively, business in their local movement regions. This means that user is the local neighbor of user and the two regions are overlapping; that is to say, user could reach ’s local movement region and user could reach ’s local movement region, and they both would like some businesses in the region. Thus, based on the above empirical observations, we connect user node with region node and connect user node with region node in RLG (dotted lines), and if we start working from a user node (), passing through a region node (), we will find out his local neighbor () and we can reach an unknown business () in his local region; namely, . In the same way, we can obtain another path which connects user node with business node or . Hence, in this way, we can help user search for favorite business information in his movement region from local neighbors or remote friends.

The edge weights of RLG between user node and region node are defined as

Given an edge , if , , its weight will be and if , , its weight will be , and, , and since each user’s local movement region or remote region is a part of his whole movement region and the probability of the user being active in local movement region is much greater than that in the remote region, so we let as a learning parameter in the following experiment, and in this way, we use different weights to model the influence of local mobility preferences and long-distance mobility preferences. is the similarity between the two users who coreviewed lots of businesses. When one of the two users only reviewed one business, we use Jaccard coefficient (as shown in formula 6) between the businesses reviewed by them as their similarity , while when the two users reviewed more than one business, we calculate by using the Cosine similarity (as shown in formula 7).
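
The choice between the two similarity measures can be sketched in Python as follows. The sketch uses the standard textbook definitions of the Jaccard coefficient and the Cosine similarity, which may differ in detail from formulas 6 and 7; the function names and the dictionary layout (business identifier mapped to review score) are assumptions made for illustration.

```python
import math

def jaccard(items_a, items_b):
    """Jaccard coefficient between two collections of reviewed businesses."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(ratings_a, ratings_b):
    """Cosine similarity between two rating vectors stored as dicts."""
    common = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_similarity(ratings_a, ratings_b):
    """Use Jaccard when either user reviewed a single business, Cosine otherwise."""
    if len(ratings_a) == 1 or len(ratings_b) == 1:
        return jaccard(ratings_a, ratings_b)
    return cosine(ratings_a, ratings_b)
```

Falling back to the Jaccard coefficient for single-review users avoids a Cosine value that would depend on one rating only.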

The edge weights of RLG between region node and business node are defined as

The definition means that the higher the review score is, the more the user likes the business in a movement region. The transition probability matrix of the RLG is composed of two blocks: a matrix of transition probabilities between user nodes and region nodes, as defined in formula 2, and a matrix of transition probabilities between region nodes and business nodes, as defined in formula 3. Both are symmetric matrices. We use a random walk with restart to simulate the location-based business recommendation process.

5. Making Recommendation on RLG

So far, several graph-based methods have been introduced into recommender systems [19–21] to model the interaction between users and items on a graph and to compute node similarity from a global perspective, instead of by local pairwise computation over a neighborhood [19]. They essentially transform the recommendation process into a search problem on a graph. Random walks on graphs have shown rather good performance in graph-based recommender systems. PageRank, a typical random walk algorithm, has been widely used in search engines to rank items globally.

We now describe the recommendation process as a graph search problem on the RLG, using the example shown in Figure 1. Suppose that the system needs to recommend businesses to an active user in one of his movement regions. We first determine the association between the user and each business that this user has not yet reviewed. The businesses are then sorted by association, and the top-ranked businesses are chosen for recommendation.

In our model, the association between two nodes is determined by all paths connecting them. For the pair of a user node and a business node , we compute the association between them as the sum of weights of all distinct paths that connect and . In this computation, we differentiate two types of paths-paths via local neighbor nodes and paths via remote friend nodes. The length of the two paths is 4. As we defined problem earlier, for a user , if the larger score is on a business reviewed by most of his local neighbors, this user probably will review this business in his local movement region. We adapt modification [22] of PageRank algorithm to calculate the association between an active user and recommended business . For the ease of the algorithms description, let denote the set of nodes that can form paths from to business nodes; let denote a node regardless of this is a user node, region node, or business node; let denote the weight of the link between nodes and ; and let denote the link weight between node and ; then, the matrix is formed from . Furthermore, let denote the association degree between and node when considering paths of length . The algorithm for computing association between and business nodes is shown in Algorithm 1.

(1) Initialize for all ,
(2) for , or until convergence do
(3) for each node do
(4) 
(5)  for each node do
(6)   if or
(7)  
(8)  end for
(9) end for
(10) end for
(11) return , is the association between an active and a business via paths of the length .

Here, is a parameter that downweighs longer paths. We fix with 0.5 in our experiments. The most time consuming part of this algorithm is from line 3 to line 9 which requires computations over all . However, the matrix is very sparse with most elements that are equal to zero and are symmetric. This allows us to use sparse and triangular matrix representation for , which can reduce the complexity to , where is the maximum number of nonzero elements for each row of matrix .
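
To make the role of the damping parameter concrete, the following Python sketch propagates scores from an active user through a small weighted graph and multiplies the score by the parameter at every step, so that longer paths contribute less. It is a generic damped propagation written under stated assumptions, not a reproduction of Algorithm 1: it sums over walks rather than distinct paths, it stores the graph as nested dictionaries, and the node names, edge weights, and the function name `damped_associations` are hypothetical.

```python
from collections import defaultdict

def damped_associations(weights, source, max_len=4, alpha=0.5):
    """Accumulate damped walk weights from `source` to every reachable node.

    weights: dict mapping node -> {neighbor: edge weight}.
    max_len: longest walk considered.
    alpha:   factor applied once per step, so a walk of length l is scaled by alpha**l.
    """
    current = {source: 1.0}       # scores of walks of the current length
    total = defaultdict(float)    # scores summed over all lengths so far
    for _ in range(max_len):
        nxt = defaultdict(float)
        for u, score in current.items():
            for v, w in weights.get(u, {}).items():
                nxt[v] += alpha * w * score
        for v, s in nxt.items():
            total[v] += s
        current = nxt
    return dict(total)

# Hypothetical example: u1 reaches b1 through his own local region r1 and
# reaches b2 through r2, the local region of a local neighbor.
graph = {
    "u1": {"r1": 0.7, "r2": 0.3},
    "r1": {"b1": 1.0},
    "r2": {"b2": 1.0},
}
scores = damped_associations(graph, "u1", max_len=2)
reviewed = {"b1"}
# Rank only business nodes that the user has not reviewed yet.
ranking = sorted(
    (n for n in scores if n.startswith("b") and n not in reviewed),
    key=scores.get,
    reverse=True,
)
```

Because only nonzero edges are stored, each step touches only existing links, which mirrors the sparsity argument made above.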

6. Experimental Evaluation

6.1. Data Set

We use a publicly available dataset from Yelp in the following experiments [23]. The data come from Phoenix, a US city, and each reviewed business has a location given by a unique pair of latitude and longitude coordinates. The dataset contains 43873 users, 229907 reviews, and 11537 businesses. About half of all users reviewed only one business, and consequently the dataset is very sparse (99.9545% sparsity). Other information about the dataset is given in Table 1.
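
The sparsity figure can be checked from the reported counts with a few lines of Python. The check assumes that every review corresponds to a distinct user-business pair, which the text does not state explicitly.

```python
users, businesses, reviews = 43873, 11537, 229907

# Fraction of user-business pairs that carry a review.
density = reviews / (users * businesses)
# Sparsity is the fraction of pairs without a review.
sparsity = 1 - density
print(f"sparsity = {sparsity:.6%}")
```

Under this assumption the result agrees with the reported value to within rounding in the fourth decimal place.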

In the following section, we will discuss the location distribution of businesses reviewed by all users which can reveal all users’ movement regions. We firstly randomly select ten users from the dataset, and the total number of their reviewed businesses, respectively, is 1, 1, 14, 22, 28, 49, 69, 91, 102, and 112, and the location distribution of businesses reviewed by ten users is as shown in Figure 2. The data from Figure 2 indicates that almost all businesses reviewed by each user are located in a certain region, and most of regions are overlapping, and obvious zoning appeared in the whole region. We can get an observation which is the same as the observation in [6]. To further verify the above-mentioned conclusion, we do some statistical analyses on the percentage of businesses reviewed by each user in three circle regions, whose center is user’s movement center and radii are, respectively, , , and (as shown in Figures 3, 4, and 5), and we, respectively, call them region, region, and region.

We can see from the three figures that the larger the number of businesses reviewed by a user is, the higher the proportion of businesses in the central zone is. It also demonstrates that most mobility of all users is restricted in a local region by their daily activities. Furthermore, the percentage of all users who reviewed more than half of businesses in region is, respectively, 48.95%, 86.03%, and 99.48% in the three figures. Therefore, we call region the local movement region of each user and the other is the remote region.

6.2. Evaluation Metric and Compared Methods

We use Hit ratio [24] as the metric for the top- recommendation. Our dataset is split into training part and testing part: for each user, the latest businesses he reviewed are selected as test data and other businesses are selected as training data. When a recommendation is made, we always generate a list of () businesses named for every user in his whole movement region. If the test business appears in the recommendation list, we call it a hit, and then the Hit ratio can be calculated as follows:
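
A common way to compute the Hit ratio is the number of hits divided by the number of test cases, which the following Python sketch implements. It assumes one held-out test business per user; the function name `hit_ratio` and the dictionary layout are hypothetical.

```python
def hit_ratio(recommendations, held_out):
    """Fraction of users whose held-out business appears in their recommendation list.

    recommendations: dict mapping user -> ranked list of recommended businesses.
    held_out:        dict mapping user -> the single test business of that user.
    """
    if not held_out:
        return 0.0
    hits = sum(
        1 for user, business in held_out.items()
        if business in recommendations.get(user, [])
    )
    return hits / len(held_out)
```

A user without any recommendation list simply counts as a miss, which keeps the denominator equal to the number of test users.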

We compared the Top- recommendation performance of our method with several existing methods: popularity-based (Pop@1), item-based collaborative filtering (ItemKNN@1), user-based collaborative filtering (UserKNN@1), and their extended methods under condition of users’ movement regions divided into two parts (Pop@2, ItemKNN@2, and UserKNN@2).

Popularity-Based (Pop) Method. Popularity-based method generates a ranking list based on the popularity of businesses in the training dataset. It is not a personalized method and consequently generates the same list of recommended businesses for every user.
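
A minimal Python sketch of such a popularity baseline is shown below. It assumes that popularity is measured by the number of reviews a business received in the training data; the function name `popularity_ranking` and the pair layout are hypothetical.

```python
from collections import Counter

def popularity_ranking(train_reviews, n):
    """Return the n most reviewed businesses in the training data.

    train_reviews: iterable of (user, business) pairs.
    The same list is returned for every user, since the method is not personalized.
    """
    counts = Counter(business for _, business in train_reviews)
    return [business for business, _ in counts.most_common(n)]
```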

Item-Based Collaborative Filtering (ItemKNN) Method. Item-based collaborative filtering method finds the businesses which are similar to some businesses reviewed by each user. The similarity between two businesses can be calculated as follows:

User-Based Collaborative Filtering (UserKNN) Method. User-based collaborative filtering method finds the similar users to the active user and aggregates the ratings of the similar users. The similarity between two users and aggregation function are as follows:

6.3. Experimental Results

In this section, we illustrate the results of all methods and show performance of our method for all cold users who do not review any business information. We will firstly investigate the impact of parameter and then compare the Hit ratios of the four methods recommending cold users.

6.3.1. The Impact of Parameter

We focus on analyzing parameter which governs the influence of users’ local mobility preference formed by their daily activity and long-distance mobility preference formed by their social network ties in location-based businesses recommendation. When tuning , the results of how Hit ratios change against all algorithms are shown in Figure 6. Pop@1, ItemKNN@1, and UserKNN@1 do not have parameter ; thus, their Hit ratios are drawn as a straight line. The results show that Pop@2, ItemKNN@2, and UserKNN@2, respectively, outperform Pop@1, ItemKNN@1, and UserKNN@1, and our method always outperforms others whatever parameter is. Moreover, when we set parameter to , we can get the best results of Hit ratio for most of the methods, so we simply fix parameter to in the following experiments.

6.3.2. Make Recommendation for Cold Users

One challenge for most existing methods is that recommendation accuracy suffers when the user-business matrix is very sparse. From Table 2, we can see that over half of the users in the dataset reviewed just one business. Traditional user-based collaborative filtering cannot recommend any business to these users because it is difficult to find their nearest neighbors. For such a user, we regard the location of the single business he reviewed as the center of his local region and the average local-region radius of the other users as its radius. We use formula 6 to calculate the similarity between users, and thus our method possesses the advantages of both user-based and item-based collaborative filtering. Our method can alleviate the sparsity problem by exploiting movement regions to find users’ local neighbors and remote friends. To verify this hypothesis, we use four methods to recommend businesses in their local regions to all cold users, and the results are shown in Table 3. Our method performs better than the other methods.

7. Conclusion

User mobility often exhibits short- and long-distance components, formed respectively by daily activity and by social network ties. Tracking and leveraging these components for location-based business information recommendation poses great challenges. In this paper, we construct a region-based location graph (RLG) that combines users’ short-range mobility, formed by daily activity, with their long-distance mobility, formed by social network ties, and can therefore recommend both local and long-distance business information to users. Moreover, the method combines user-based with item-based collaborative filtering and can generate recommendations for cold start users; consequently, it alleviates the cold start problem from which traditional recommender systems often suffer. Experiments on a real dataset confirm that the proposed method is more effective than the other methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 60872051) and the Mutual Project of Beijing Municipal Education Commission, China.