Abstract

Choosing where to go from an endless number of options under specific circumstances is difficult. Recommender systems are meant to help users deal with this problem and make more appropriate decisions. The aim of this study is to recommend new venues to users according to their preferences. For this purpose, a hybrid recommendation model is proposed that integrates user-based and item-based collaborative filtering and content-based filtering with contextual information in order to overcome the disadvantages of each individual approach. In addition, for each user-venue pair, the contextual circumstances under which the user will like the venue are predicted. Moreover, the threshold values that determine whether a user likes a venue are set separately for each user. Results are evaluated with both offline experiments (precision, recall, F-1 score) and a user study. Both the experimental evaluation on a real-world dataset and the user study showed that the proposed system improves upon the baseline approaches.

1. Introduction

Social media platforms are very rich data resources that researchers can mine to gain insight into user preferences. The increasing use of location-related technologies enables the development of location-based services, and location-based social networks (LBSNs), which offer new possibilities for user interaction, have emerged as a result. These systems, which allow users to share their visits and explore other locations, have accumulated a huge amount of data about users through extensive use over time. Location Recommendation Systems (LRSs) have been developed to discover the information embedded in these data and to provide location suggestions to users.

There are three main recommendation techniques, all of which are also applied to location recommendation: content-based filtering (CBF), collaborative filtering (CF), and hybrid recommendation. Content-based filtering uses information about the items themselves and tries to find the items most similar to the user's previous preferences. Collaborative filtering recommends an item according to the similarity between one user's preferences and those of other individuals. Hybrid approaches, which combine at least two existing approaches, have recently been recognized for their ability to improve prediction. Contextual information (weather, time, date, etc.) is especially important in the travel and tourism domains. Therefore, context-aware recommender systems should be used more widely for location recommendation than for other types of recommendation (products, movies, music, etc.). However, most recommendation engines fail to consider contextual information for location recommendation.

Personalization, which should be handled from different angles, is another issue for recommendation systems. The effect of each variable used in the recommendation may vary among different users. For instance, two people may like the same places but in different contextual circumstances. Therefore, it is important to consider the changing effects of contextual variables on different users.

In this study, a contextually personalized hybrid location recommender system is developed. For this purpose, users' check-in histories, properties of the visited locations (distance, category, popularity, and price), and contextual data (weather, season, date, and time of visits) were collected from Twitter, Foursquare, and Weather Underground. A hybrid approach (user-based collaborative filtering, item-based collaborative filtering, content-based filtering, and context-aware recommendation) was applied, and the results were evaluated with both offline experiments (precision, recall, F-1 score) and a user study. This study is an extended version of our previous study [1]. Its scientific contributions can be listed as follows:
(i) Three different types of variables (user-related, venue-related (content), and contextual) that have not been used together in existing recommender systems were combined in one algorithm to develop a novel recommender system.
(ii) An artificial neural network was applied to determine the weight of each algorithm (user-based collaborative filtering, item-based collaborative filtering, content-based filtering, and context-aware recommendation) in the hybrid recommendation system.
(iii) The threshold values determining whether a user likes a venue were determined separately for each user.
(iv) A contextually personalized recommendation was generated by determining which contextual circumstances are most appropriate for each specific user-venue pair.
(v) The data sparsity problem was alleviated.
(vi) Overspecialization was reduced.
(vii) The cold start problem was partially solved.

Developing location recommender systems is attractive to researchers because of its importance in both academia and business. Therefore, although the topic's history spans less than a decade, there are many studies on it.

Content-based algorithms use the content information of a location to handle the data sparsity problem that may occur in CF algorithms. Table 1 presents the most frequently used content information variables. The category variable specifies the category of a venue (restaurant, shopping center, theater, etc.). The distance variable specifies the distance from the user (GPS location, center of visited venues, etc.) to the venue. The tag variable specifies tags given by users (can be visited with friends, romantic, etc.). The tips/comments variable contains the tips and comments that users write about the venue. The popularity variable denotes the value of a location as indicated by ratings, number of visits, etc. The tags and tips/comments variables are used for sentiment analysis, which is outside the scope of this study. For this reason, only the category, distance, and popularity variables were selected.

Collaborative filtering algorithms can be categorized as memory-based or model-based. Memory-based CF is further divided into two categories: user-based CF, which uses user similarity for recommendation [35–37], and item-based CF, which uses item similarity for recommendation [38, 39]. Data mining techniques such as neural networks [40], Naïve Bayesian modeling [13, 23], association rule mining [41], and SVD [2] are used for model-based CF.

The contextual approach emerged after the traditional approaches, which simply focus on the past preferences of customers. Context represents the set of conditions surrounding a user-item pair and affects the relation between them. A context-aware recommender system may consider either user context (income, profession, age, current user location, mood, status, etc.) or environmental context (current time, weather, traffic conditions, events, etc.) [14, 42]. Contextual information is crucial, especially for location recommendation, because users' decisions to visit venues depend more on environmental factors than their decisions about other things (buying a product, listening to music, etc.) do. Even though contextual information is critical rather than supplementary for location recommendation systems, existing systems neither use it commonly nor use it effectively. The literature emphasizes that context-based algorithms are needed for effective recommender systems [43–46].

Contextual information for location recommendation can be specified with many variables. The most frequently used variables are presented in Table 2.

As can be seen from Table 2, time and weather condition (e.g., sunny, rainy, and snowy) variables have been used for location recommendation more frequently than other variables. Therefore, time and weather conditions are selected as the contextual variables for this study.

Each filtering approach has different drawbacks. For example, the disadvantages of CF are the cold start problem, data sparsity, and scalability [57]. On the other hand, the need for information about items and overspecialization are the drawbacks of CBF [58]. Hybrid approaches combine at least two of the existing approaches and aim to minimize or remove the drawbacks that these approaches have when used individually. Therefore, hybrid systems can be the solution for better recommendation [43, 44]. Even though some hybrid systems have been presented in the literature, there is still room to improve the performance of location recommender systems. Several studies consider different hybrid algorithms for location recommendation [2, 7, 19, 48, 50, 51, 59]. Seven types of hybridization techniques are described in the literature, namely, weighted, switching, mixed, feature combination, cascade, feature augmentation, and metalevel [60].

This study aims to develop a personalized hybrid recommendation system using user and location similarity, location-related properties (distance, category, popularity, and price classification), and the varying effects of contextual data (weather, season, date, and time of visits) across users. The weighted hybridization method is used to achieve better performance and to avoid the drawbacks of any individual recommendation approach.

3. Methodology

3.1. Data Collection

The aim of this study is to recommend new venues to the users according to their preferences. Therefore, a location-based social network should be chosen to collect the necessary data. For this purpose, two popular social networks, Twitter and Foursquare, were chosen. In order to crawl users’ check-in history, Twitter is used since Foursquare does not allow direct streaming of user check-ins. Foursquare was chosen to collect the characteristics of various venues since it is one of the most popular location-based social networks and provides the characteristics of various venues with its API.

Twitter's REST API, which follows the REST architectural style popular for web APIs and retrieves data with a pull strategy, was used in this study. In particular, the “GET search/tweets” endpoint, which returns a collection of relevant tweets matching a specified query, was used. When a user who has linked his/her Swarm account to Twitter checks in, a tweet including all the check-in data appears on the user's Twitter timeline. First, the Twitter user IDs of users who checked in with the Swarm application and shared their check-ins on Twitter were collected over a two-month period. After that, the entire geocoded tweet history of these users, going back to 2011, was retrieved and stored. The Twitter API (version 1.1) was called from PHP code to collect the dataset, which was stored in a MySQL database.

When a user checks in using Swarm and shares this check-in on Twitter, the related tweet includes a URL starting with “https://www.swarmapp.com/,” which contains a venue id at the end. Those venue ids were sent to the Foursquare API (https://api.foursquare.com/v2/venues/VENUE_ID) and venue name, category, latitude, longitude, check-in count, visitor count, tip count, and price classification of venues were collected as venue attributes.
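The venue-lookup step can be sketched in Python as follows. This is a minimal sketch under two assumptions: the venue id is the last path segment of the Swarm URL, as described above, and the helper names are hypothetical. Authentication and versioning parameters that the Foursquare v2 API requires are deliberately omitted.

```python
from typing import Optional
from urllib.parse import urlparse

SWARM_PREFIX = "https://www.swarmapp.com/"
# Endpoint template quoted in the text; credential/version query parameters omitted.
FOURSQUARE_VENUE_URL = "https://api.foursquare.com/v2/venues/{venue_id}"


def extract_venue_id(tweet_url: str) -> Optional[str]:
    """Return the last path segment of a Swarm URL, or None for any other URL."""
    if not tweet_url.startswith(SWARM_PREFIX):
        return None
    path = urlparse(tweet_url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment or None


def venue_details_url(venue_id: str) -> str:
    """Build the Foursquare venue-details request URL for one venue id."""
    return FOURSQUARE_VENUE_URL.format(venue_id=venue_id)


# Example with a made-up URL (not a real check-in):
vid = extract_venue_id("https://www.swarmapp.com/v/abc123")
print(venue_details_url(vid))  # https://api.foursquare.com/v2/venues/abc123
```

Non-Swarm URLs return None, so tweets without a check-in link can be skipped before any API call is made.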

Weather history data were collected from the “Weather Underground” website. Each check-in date was matched with the corresponding date in the weather history to obtain the weather condition on the day of the visit (sunny, rainy, snowy, etc.).

3.2. Data Preprocessing

Data preprocessing transforms raw data into an understandable format. Real-world datasets are often incomplete and inconsistent and may lack certain behaviors, so preprocessing is necessary to prepare them for further processing. Data reduction, one of the data preprocessing steps, was applied to the raw dataset in order to obtain more accurate results.

The raw dataset consisted of 6738 users, 60202 venues, and 226227 visits. Data reduction was performed according to the following criteria:
(i) Only Istanbul check-ins were retained in order to increase visit frequencies.
(ii) Foursquare has various main venue categories. For this study, “restaurant” was chosen as the main category, and all of its subcategories were used, because check-in frequency is high at restaurants.
(iii) Users who visited only one venue were removed.
(iv) Venues that were visited by only one user were removed.

After that, 1101 users, 711 venues, and 4694 visits remained in the dataset.
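The two frequency filters, criteria (iii) and (iv), can be sketched as a single pass over the visit list. The sketch assumes the Istanbul and restaurant filters were already applied and that each visit is a (user, venue) pair; the function name is hypothetical. The text does not say whether the filters were repeated until no single-venue users or single-user venues remained, so the sketch applies them once.

```python
from collections import defaultdict


def reduce_visits(visits):
    """Drop users with one distinct venue and venues with one distinct user.

    visits: list of (user_id, venue_id) pairs, one per check-in.
    """
    venues_per_user = defaultdict(set)
    users_per_venue = defaultdict(set)
    for user, venue in visits:
        venues_per_user[user].add(venue)
        users_per_venue[venue].add(user)
    return [
        (user, venue)
        for user, venue in visits
        if len(venues_per_user[user]) > 1 and len(users_per_venue[venue]) > 1
    ]


visits = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a")]
print(reduce_visits(visits))  # [('u1', 'a'), ('u1', 'b'), ('u2', 'a'), ('u2', 'b')]
```

Counting distinct venues (a set) rather than raw check-ins means that a user who checked in at one restaurant many times is still removed.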

The terms used in this study can be found in Table 3.

3.3. Rating

Foursquare does not provide direct ratings of venues by individual users. Therefore, for each user-venue pair, a rating was calculated by linearly normalizing the user's visit frequencies to the range 1 to 5. If a user's maximum and minimum numbers of visits are equal, the rating was set to 1.
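A minimal sketch of this per-user normalization, assuming standard min-max scaling onto [1, 5] (the exact equation is not reproduced in this text) and a hypothetical input holding one user's visit count per venue:

```python
def visit_ratings(visit_counts: dict) -> dict:
    """Map one user's visit count per venue onto a 1-5 rating by min-max scaling."""
    lo, hi = min(visit_counts.values()), max(visit_counts.values())
    if hi == lo:
        # All venues visited equally often: the text sets the rating to 1.
        return {venue: 1.0 for venue in visit_counts}
    return {venue: 1.0 + 4.0 * (count - lo) / (hi - lo)
            for venue, count in visit_counts.items()}


print(visit_ratings({"a": 1, "b": 3, "c": 5}))  # {'a': 1.0, 'b': 3.0, 'c': 5.0}
```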

3.4. Distance

The latitude and longitude values of venues, which were collected from Foursquare, were converted into the x, y, z coordinates (Equations (2)–(4)):

User centers were calculated by taking the weighted average of the x, y, z coordinates of all visits of each user in order to capture his/her activity area. The Euclidean distance from each venue to the user center was calculated and used as the distance variable (Equation (5)):
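Because Equations (2)–(5) are not reproduced here, the following is only a sketch. It assumes the standard conversion of latitude and longitude to Cartesian coordinates on a unit sphere (multiply by the Earth's radius to obtain kilometers) and assumes that each visited venue is weighted by the user's number of visits to it. Function names are hypothetical.

```python
import math


def to_xyz(lat_deg: float, lon_deg: float) -> tuple:
    """Convert latitude/longitude in degrees to unit-sphere x, y, z coordinates."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def user_center(visits: list) -> tuple:
    """Weighted mean of visit coordinates; visits = [((lat, lon), weight), ...]."""
    total = sum(weight for _, weight in visits)
    sums = [0.0, 0.0, 0.0]
    for (lat, lon), weight in visits:
        for axis, value in enumerate(to_xyz(lat, lon)):
            sums[axis] += weight * value
    return tuple(s / total for s in sums)


def venue_distance(center: tuple, lat: float, lon: float) -> float:
    """Euclidean (chord) distance from the user center to a venue."""
    return math.dist(center, to_xyz(lat, lon))
```

The weighted center usually lies inside the sphere. For ranking venues this is harmless: for unit vectors p, |c − p|² = |c|² + 1 − 2c·p, so ordering venues by this distance is the same as ordering them by angular distance from the center's direction.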

3.5. Popularity

Foursquare provides four variables about a venue: check-in count, like count, user count, and tip count. In this study, these variables were considered indicators of the popularity of a venue. Therefore, a single popularity variable was derived from these four properties by applying Principal Component Analysis (PCA), a dimension reduction technique. Before applying PCA, sampling adequacy should be checked. For this purpose, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity were used. As shown in Table 4, the KMO value is 0.821 and the significance of Bartlett’s test of sphericity is 0.001. A KMO value of at least 0.6 is generally considered acceptable, and Bartlett’s test is significant at the 1% alpha level. These results show that the sample is adequate for PCA.

Ninety-three percent of the total variance was explained by only one component (Table 5). Therefore, it can be concluded that one variable, which was named “popularity,” can be used instead of four variables.

The component matrix shows the correlation between each variable and the component. Correlation values range from −1 to +1, and the values in Table 6 indicate a strong positive correlation between the component and each of the variables.
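A minimal PCA sketch with NumPy. It assumes the four count columns are standardized first, which is equivalent to PCA on the correlation matrix; this is a common default that the text does not state explicitly. The function returns the first-component score of each venue, the share of variance explained (the quantity reported in Table 5), and the loadings, i.e., the component-variable correlations shown in Table 6. It assumes that no column is constant. The function name is hypothetical.

```python
import numpy as np


def popularity_pca(counts):
    """counts: (n_venues, 4) array of check-in, like, user, and tip counts."""
    X = np.asarray(counts, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize each column
    corr = np.corrcoef(X, rowvar=False)               # 4x4 correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    first = eigvecs[:, -1]
    if first.sum() < 0:                               # fix the arbitrary sign
        first = -first
    explained = eigvals[-1] / eigvals.sum()           # variance share of component 1
    loadings = first * np.sqrt(eigvals[-1])           # component-variable correlations
    scores = Z @ first                                # one popularity value per venue
    return scores, explained, loadings
```

If `explained` is high, as in the 93% reported above, the score vector can replace the four counts as a single popularity variable.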

3.6. Category

All food subcategories collected through the Foursquare (FS) API were included in this study. The dataset contains 34 restaurant categories, including the cuisines of different countries. A user-category matrix, which presents the number of visits of each user in each subcategory, was prepared.

3.7. Price

Foursquare uses four price classes: 1-cheap, 2-average, 3-expensive, and 4-very expensive. The user-price matrix, which presents the number of visits of each user in each price class, was prepared using the data from the FS API.

3.8. Time

Twitter provides a UNIX timestamp for each tweet; to obtain the date and time, it was converted to a date and time stamp. For this study, season, day, and period of the day were used as contextual variables. Some values of some contextual variables showed similar characteristics; for example, users have the same check-in patterns on all weekdays. Therefore, the contextual variables were discretized, which yielded better performance. Days were discretized as “weekday” and “weekend” [51, 61]. Check-ins made in spring or summer were categorized as “hot season” check-ins, while check-ins made in autumn or winter were categorized as “cold season” check-ins [61]. In the studies of Majid et al. [48] and Wang et al. [62], time is discretized as morning, afternoon, evening, and night. In [61], a day is discretized as morning and evening only. However, exploration of the check-in behaviors in the dataset showed that discretization into morning, noon, and evening would be more suitable. The time range 07:00 to 11:59 was defined as “morning,” 12:00 to 16:59 as “noon,” and 17:00 to 06:59 as “evening.” This discretization aims to make recommendations more accurate.
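The discretization rules above can be sketched as follows. Two assumptions are labeled in the code: timestamps are converted with a fixed UTC+3 offset for Istanbul (historical daylight-saving changes are ignored), and spring plus summer is taken as March through August, the meteorological Northern Hemisphere seasons. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+3 offset for Istanbul; historical DST changes are ignored.
ISTANBUL = timezone(timedelta(hours=3))


def time_context(unix_ts: int, tz: timezone = ISTANBUL) -> dict:
    """Discretize a UNIX timestamp into day type, season, and period of day."""
    dt = datetime.fromtimestamp(unix_ts, tz)
    day = "weekend" if dt.weekday() >= 5 else "weekday"   # Saturday=5, Sunday=6
    # Assumption: spring + summer = March..August ("hot"), otherwise "cold".
    season = "hot" if 3 <= dt.month <= 8 else "cold"
    if 7 <= dt.hour < 12:
        period = "morning"        # 07:00-11:59
    elif 12 <= dt.hour < 17:
        period = "noon"           # 12:00-16:59
    else:
        period = "evening"        # 17:00-06:59
    return {"day": day, "season": season, "time": period}
```

For example, a check-in on Saturday, 15 July 2017 at 09:30 local time maps to weekend, hot, and morning.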

3.9. Weather

Weather condition data, another contextual variable, were collected from the Weather Underground API, which provides more than 10 different weather conditions (sunny, rainy, snowy, rainy and stormy, snowy and stormy, etc.). Some weather categories showed the same patterns of user behavior, so they were discretized into three main categories: “sunny,” “rainy” (all categories including rain), and “snowy” (all categories including snow). For each venue, the percentage of visits made under each contextual circumstance was calculated as the number of visits in that category divided by the total number of visits.
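The weather mapping and the venue-level percentages can be sketched as follows. The keyword rules are an assumption for illustration: a condition that mentions snow is treated as snowy, one that mentions rain as rainy, and every other condition as sunny. The condition strings are hypothetical examples, not the Weather Underground vocabulary.

```python
from collections import Counter


def discretize_weather(condition: str) -> str:
    """Map a raw weather condition string onto sunny / rainy / snowy."""
    text = condition.lower()
    if "snow" in text:        # assumption: snow takes precedence over rain
        return "snowy"
    if "rain" in text:
        return "rainy"
    return "sunny"            # assumption: everything else counts as sunny


def context_percentages(labels: list) -> dict:
    """Share of a venue's visits that fall into each contextual category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


visits = ["Clear", "Light Rain", "Rain and Thunderstorm", "Heavy Snow"]
print(context_percentages([discretize_weather(c) for c in visits]))
# {'sunny': 0.25, 'rainy': 0.5, 'snowy': 0.25}
```

The same `context_percentages` helper applies to the day, time, and season labels, giving one row of the venue-context matrix per venue.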

3.10. Development of Recommendation System

Development steps of this recommendation system are explained in detail below.

User similarity values were calculated with cosine similarity (Equation (6)) from the user-venue matrix, which contains users' ratings of venues. The user's ratings of other venues were then predicted from these user similarity values (Equation (7)):
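A minimal NumPy sketch of these two steps. Because Equation (7) is not reproduced here, the sketch assumes the common form of user-based prediction: a similarity-weighted average of the ratings that the other users who rated the target venue gave it, where 0 marks an unrated venue. Function names are hypothetical.

```python
import numpy as np


def cosine_similarity_rows(M) -> np.ndarray:
    """Pairwise cosine similarity between the rows of M (all-zero rows get 0)."""
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=1)
    norms[norms == 0] = 1.0                  # avoid division by zero
    unit = M / norms[:, None]
    return unit @ unit.T


def predict_user_based(R: np.ndarray, sim: np.ndarray, user: int, venue: int) -> float:
    """Similarity-weighted average of other users' ratings of `venue`."""
    raters = [u for u in range(R.shape[0]) if u != user and R[u, venue] > 0]
    num = sum(sim[user, u] * R[u, venue] for u in raters)
    den = sum(abs(sim[user, u]) for u in raters)
    return float(num / den) if den > 0 else 0.0


R = np.array([[5, 0, 3],     # user-venue rating matrix, 0 = not visited
              [4, 2, 3],
              [1, 5, 0]], dtype=float)
sim = cosine_similarity_rows(R)
print(round(predict_user_based(R, sim, user=0, venue=1), 2))  # 2.46
```

The same `cosine_similarity_rows` call works unchanged on the user-category, user-popularity, and user-price matrices, since each row is again one user's profile.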

User similarity values were also calculated with cosine similarity, in the same manner as in Equation (6), from the user-category matrix (Table 7), which presents the number of venues that each user visited in each category.

Popularity values were discretized into three categories, namely, high, medium, and low, according to their normalized popularity values obtained from PCA. Then, the user-popularity preference matrix (Table 8) was generated, which keeps the number of venues that each user visited in each popularity category.

User similarity values were calculated with cosine similarity, in the same manner as in Equation (6), from the user-popularity matrix, which presents the number of venues that each user visited in each popularity class.

The user-price preference matrix (Table 9), which keeps the number of venues that each user visited in each price category, was also constructed. User similarity according to price preferences was calculated in the same manner as in Equation (6).

Equation (7) was also used to calculate predicted ratings from the user similarities based on the category, popularity, and price preferences of users.
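The user-category, user-popularity, and user-price matrices (Tables 7–9) share one construction: count the distinct venues each user visited in each class. The sketch below assumes equal-width cut points on min-max normalized PCA scores for the low/medium/high popularity classes, because the text does not give the exact cut points. The function names are hypothetical. Rows of the resulting matrices can then be compared with cosine similarity.

```python
import numpy as np


def popularity_classes(scores) -> np.ndarray:
    """Min-max normalize PCA scores and cut them into low/medium/high."""
    s = np.asarray(scores, dtype=float)
    norm = (s - s.min()) / (s.max() - s.min())
    # Hypothetical equal-width cut points at 1/3 and 2/3.
    return np.where(norm < 1 / 3, "low", np.where(norm < 2 / 3, "medium", "high"))


def user_class_matrix(visits, venue_class, labels) -> np.ndarray:
    """Count distinct venues per user in each class.

    visits: iterable of (user_index, venue_index) pairs (repeat visits allowed)
    venue_class: class label of each venue, indexed by venue_index
    labels: ordered list of class labels, one column each
    """
    pairs = set(visits)                      # count each venue once per user
    n_users = max(u for u, _ in pairs) + 1
    column = {label: i for i, label in enumerate(labels)}
    M = np.zeros((n_users, len(labels)))
    for u, v in pairs:
        M[u, column[venue_class[v]]] += 1
    return M


classes = popularity_classes([0.0, 5.0, 10.0])
print([str(c) for c in classes])   # ['low', 'medium', 'high']
print(user_class_matrix([(0, 0), (0, 0), (0, 2), (1, 1)], classes, ["low", "medium", "high"]))
```

For the category and price matrices, `venue_class` would simply hold each venue's Foursquare subcategory or price class instead of a popularity label.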

Venue similarity values were calculated from the user-venue matrix with cosine similarity (Equation (8)). The user's ratings of other venues were also predicted from these venue similarity values (Equation (9)):

The venue-context matrix (Table 10), which keeps track of the contextual features of the venues, was prepared. This matrix contains the percentages of each venue's visits made under different contextual circumstances. Using this matrix, venue similarities were also calculated with Equation (8), and predicted ratings were calculated as in Equation (9) from the contextual similarities of venues.
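Item-based prediction can be sketched in the same spirit. Because Equation (9) is not reproduced here, the sketch again assumes the common similarity-weighted form. Venue similarity is the cosine between venue rows, so one helper serves both the rating-based similarity (rows of the transposed user-venue matrix) and the contextual similarity (rows of the venue-context matrix in Table 10). Function names are hypothetical.

```python
import numpy as np


def row_cosine(F) -> np.ndarray:
    """Pairwise cosine similarity between rows of a venue-feature matrix."""
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=1)
    norms[norms == 0] = 1.0
    unit = F / norms[:, None]
    return unit @ unit.T


def predict_item_based(R: np.ndarray, venue_sim: np.ndarray, user: int, venue: int) -> float:
    """Weighted average of the user's own ratings, weighted by venue similarity."""
    rated = np.flatnonzero(R[user] > 0)
    rated = rated[rated != venue]
    weights = venue_sim[venue, rated]
    den = np.abs(weights).sum()
    return float(weights @ R[user, rated] / den) if den > 0 else 0.0


R = np.array([[5, 0, 3],      # user-venue rating matrix, 0 = not visited
              [4, 2, 3],
              [1, 5, 0]], dtype=float)
rating_sim = row_cosine(R.T)  # venues as rows
print(round(predict_item_based(R, rating_sim, user=0, venue=1), 2))  # 4.17
# For contextual similarity, pass row_cosine(venue_context_matrix) instead.
```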

The distance-based rating is calculated under the assumption that the shorter the distance between a venue and the user, the more frequently the user will visit that venue [63–66]. Although users are more willing to check in at venues near their centers, each user perceives distance differently. Therefore, the optimal coefficients of a power-law distribution [49, 64, 65] were determined to model a user's willingness to go to and check in at a place.
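The text does not reproduce the three-coefficient power-law model, so the following is only a simplified, hypothetical illustration: a two-parameter power law p = a·d^b fitted by least squares in log-log space, which is a common way to estimate such curves. Here p could be, for example, the share of a user's check-ins at a given distance; that choice is an assumption. Distances and shares must be positive for the logarithms to exist.

```python
import numpy as np


def fit_power_law(distances, shares):
    """Fit shares ~ a * distances**b via a straight line in log-log space."""
    log_d = np.log(np.asarray(distances, dtype=float))
    log_p = np.log(np.asarray(shares, dtype=float))
    b, log_a = np.polyfit(log_d, log_p, 1)     # slope, intercept
    return float(np.exp(log_a)), float(b)


def distance_score(d: float, a: float, b: float) -> float:
    """Willingness score for a venue at distance d (b < 0 means decay)."""
    return a * d ** b


d = np.array([1.0, 2.0, 4.0, 8.0])
a, b = fit_power_law(d, 0.5 * d ** -1.5)       # synthetic data, exact power law
print(round(a, 3), round(b, 3))                # 0.5 -1.5
```

Fitting one (a, b) pair per user is what lets the model reflect that each user perceives distance differently.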

In this study, a weighted hybridization technique was used to compute the scores of recommended items from all available recommendation algorithms. Instead of using equal weights, artificial neural network (ANN) analysis was applied to find the optimal weight for each technique. The inputs to the ANN were the ratings produced by all available recommendation algorithms, and the output to be predicted was the actual rating. Final ratings were calculated by multiplying the rating from each algorithm by the corresponding ANN weight and summing the weighted ratings.
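The weighted combination can be sketched as below. As a plainly named substitute for the ANN, the weights are learned with ordinary least squares. A single linear neuron without hidden layers or bias, trained on squared error, would converge toward the same weights, but the network used in this study may have a different architecture. The four input columns (user-based CF, item-based CF, content-based, and contextual ratings) and the synthetic data are illustrative only.

```python
import numpy as np


def learn_weights(component_ratings, actual_ratings) -> np.ndarray:
    """Least-squares weights mapping component ratings to actual ratings."""
    X = np.asarray(component_ratings, dtype=float)
    y = np.asarray(actual_ratings, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def hybrid_rating(component_ratings, weights) -> np.ndarray:
    """Final rating = sum over algorithms of weight * that algorithm's rating."""
    return np.asarray(component_ratings, dtype=float) @ weights


rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(50, 4))          # columns: user CF, item CF, content, context
true_w = np.array([0.4, 0.3, 0.2, 0.1])      # synthetic weights for illustration
w = learn_weights(X, X @ true_w)
print(np.round(w, 3))                         # recovers the synthetic weights
```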

To decide whether to recommend a venue to a user, a separate threshold value was used for each user, set to that user's average rating. If the calculated rating (Equation (11)) was greater than the user's threshold, the user was predicted to like that venue. This is the first version of the algorithm, named “HybRecSys.”

Existing recommender systems do not consider that contextual circumstances affect different users' preferences differently. For instance, one user may prefer a venue on a rainy weekday at noon, while another user may prefer the same venue in another context. To handle this issue, our system calculates the probability of visiting a venue in each specific contextual category. For instance, the following equation calculates the probability that user i visits venue j in the morning:

For each user-venue pair, there are 36 different contextual circumstances (day = weekday, weekend; time = morning, noon, evening; season = hot, cold; weather = sunny, rainy, snowy). For each situation, probabilities were calculated, and the resulting table was constructed (Table 11).
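Enumerating the 36 circumstances and their context totals can be sketched as follows. The sketch assumes that the per-category visit probabilities of one user-venue pair are already available (for example, as in Table 11); the probability computation itself is not reimplemented here, and the input values are made up. The function name is hypothetical.

```python
from itertools import product

CONTEXT_VALUES = {
    "day": ("weekday", "weekend"),
    "time": ("morning", "noon", "evening"),
    "season": ("hot", "cold"),
    "weather": ("sunny", "rainy", "snowy"),
}


def context_totals(probs: dict) -> dict:
    """Sum the four per-variable probabilities for every circumstance.

    probs[variable][category] is the probability for one user-venue pair.
    Returns {(day, time, season, weather): total}, 2 * 3 * 2 * 3 = 36 entries.
    """
    names = list(CONTEXT_VALUES)
    totals = {}
    for combo in product(*(CONTEXT_VALUES[n] for n in names)):
        totals[combo] = sum(probs[n][c] for n, c in zip(names, combo))
    return totals


probs = {  # illustrative values; each variable's categories sum to 1
    "day": {"weekday": 0.6, "weekend": 0.4},
    "time": {"morning": 0.2, "noon": 0.5, "evening": 0.3},
    "season": {"hot": 0.7, "cold": 0.3},
    "weather": {"sunny": 0.8, "rainy": 0.15, "snowy": 0.05},
}
totals = context_totals(probs)
best = max(totals, key=totals.get)
print(len(totals), best, round(totals[best], 2))  # 36 ('weekday', 'noon', 'hot', 'sunny') 2.6
```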

The columns of Table 11 are, in order: user id, venue id, the average rating of the user, the percentage that the user will visit the venue in the given time category, in the given day category, in the given season category, and in the given weather category, the total point from all contextual variables (the sum of these four percentages), the predicted rating, and the final decision (Like).

For each contextual variable, the probabilities of its categories sum to one. For instance, since the day variable has two categories, weekday and weekend, if a user's probability of visiting a specific venue on a weekday is 0.6, then the probability that the user will visit the same venue on a weekend is 0.4. Because each of the four contextual variables contributes at most 1, their sum (Context_total) has a maximum value of 4. The final decision on whether a user will like a venue depends on two conditions: the predicted rating must be greater than the user's average rating, and Context_total must be at least 2 out of 4. The final version of the algorithm was called “contextually personalized HybRecSys,” and the whole development process is depicted in Figure 1.
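The two-part decision rule can be written as a short predicate. This is a sketch with a hypothetical function name: the per-user threshold is the mean of the user's ratings, and the context requirement follows the at-least-2-out-of-4 rule stated above. Dropping the context condition gives the simpler HybRecSys rule.

```python
from statistics import mean


def will_like(predicted_rating: float, user_ratings: list, context_total: float,
              min_context: float = 2.0) -> bool:
    """Contextually personalized HybRecSys decision for one circumstance."""
    threshold = mean(user_ratings)          # per-user threshold: average rating
    return predicted_rating > threshold and context_total >= min_context


print(will_like(4.0, [3, 4, 2], context_total=2.6))   # True
print(will_like(4.0, [3, 4, 2], context_total=1.9))   # False
print(will_like(2.5, [3, 4, 2], context_total=3.0))   # False
```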

4. Evaluation of Recommendation System

This section explains the performance evaluation of the proposed system in detail. There are three types of experimental settings for evaluating recommender systems [67]:
(1) Offline experiments, in which a precollected dataset of users is used to validate the results.
(2) User studies, in which a small sample of users is asked to perform several tasks requiring interaction with the recommendation system.
(3) Online evaluations, in which different algorithms are evaluated on a live recommender system using a small percentage of the traffic.

4.1. Offline Experiments

Contextually Personalized HybRecSys was compared with five algorithms: user-based K-nearest neighbors (KNN) [68], item-based KNN [69], biased matrix factorization [70], SVD++ [71], and HybRecSys [1]. The first four algorithms are available in LibRec, a Java library for recommender systems, and were run with LibRec's default settings. The fifth algorithm, HybRecSys, is the earlier version of Contextually Personalized HybRecSys. Other hybrid algorithms described in the literature could not be included in this study because their source code was not available.

Precision, recall, and F-1 measures were used as evaluation metrics in this study. Precision is the percentage of correctly recommended items among all recommended items. Recall is the percentage of the items liked by the user that were correctly recommended. The F1 score is the harmonic mean of precision and recall and summarizes both in a single measure.
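These definitions translate directly into code. The sketch treats one user's recommended items and liked items as sets of hypothetical item ids.

```python
def precision_recall_f1(recommended, liked):
    """Precision, recall, and F1 for one user's recommended vs. liked items."""
    rec, lik = set(recommended), set(liked)
    hits = len(rec & lik)                       # correctly recommended items
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(lik) if lik else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1


p, r, f = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
print(p, round(r, 3), round(f, 3))   # 0.5 0.667 0.571
```

As a consistency check, the harmonic mean of the user-study precision (0.282) and recall (0.276) is about 0.279, which matches the F1 score reported for the user study.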

To split the data into training and test sets, K-fold cross-validation (K = 10) was used. The dataset was divided into 10 disjoint sets, ensuring that each set contains about 10% of each user's visits. In each fold, one set was used as the test set and the remaining nine as the training set.
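A per-user fold assignment like the one described can be sketched as follows, assuming visits are (user, venue) pairs and a fixed random seed. Each user's visits are shuffled and dealt round-robin from a random starting fold, so every fold receives about 10% of each user's visits. Users with fewer than 10 visits simply do not appear in every test fold. Names are hypothetical.

```python
import random
from collections import defaultdict


def per_user_folds(visits, k=10, seed=0):
    """Return a fold index in [0, k) for each visit, balanced within each user."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for i, (user, _venue) in enumerate(visits):
        by_user[user].append(i)
    fold_of = [0] * len(visits)
    for indices in by_user.values():
        rng.shuffle(indices)
        start = rng.randrange(k)              # avoid overloading fold 0
        for pos, i in enumerate(indices):
            fold_of[i] = (start + pos) % k
    return fold_of


def split_fold(visits, fold_of, test_fold):
    """One fold is the test set; the remaining k - 1 folds form the training set."""
    train = [v for v, f in zip(visits, fold_of) if f != test_fold]
    test = [v for v, f in zip(visits, fold_of) if f == test_fold]
    return train, test
```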

Figure 2 shows the precision, recall, and F1 measure of each algorithm; Contextually Personalized HybRecSys outperforms all other algorithms on all three measures.

In terms of precision, Contextually Personalized HybRecSys is followed by HybRecSys, user-based KNN, item-based KNN, biased matrix factorization, and SVD++, in that order. In terms of recall, it is followed by HybRecSys, biased matrix factorization, item-based KNN, SVD++, and user-based KNN. In terms of F-1 measure, it is followed by HybRecSys, item-based KNN, biased matrix factorization, user-based KNN, and SVD++.

4.2. User Study for Contextually Personalized HybRecSys

As indicated in the literature, user studies are very helpful for understanding whether users like the recommendations and for collecting more detailed data about the recommendation system [45, 57, 67]. Although user studies are difficult, time-consuming, and costly to conduct, they are recommended after offline experiments to validate the offline results [67].

4.3. Steps of the User Study

For this study, a user study was conducted with the users in our dataset. The Twitter account of each user was checked to learn whether the profile accepted direct messages. Out of 1101 users, 195 accounts accepted direct messages. These users were invited by direct message to participate in the user study, and a small incentive (a movie ticket) was promised for participation.

Twenty-four users replied to the message and agreed to participate. The user evaluation then proceeded in the following steps:
(1) The algorithm predicted each user's ratings for all venues except the venues the user had already visited.
(2) From these predictions, the algorithm recommended the three top-rated venues to each of the 24 users, who were asked via Twitter message whether any of the recommended venues attracted their attention.
(3) Twenty-one users replied to the recommendations. Only one of them said that none of the venues was suitable; the others were interested in at least one of the recommended venues.
(4) A survey, which also included the Foursquare links of the recommended venues, was prepared according to the participants' choices and sent to them.
(5) The users were asked to fill out the survey after visiting the recommended venue or after examining the venue's Foursquare page.

4.4. Survey Questions of the User Study

There are 12 questions in the survey. The first question measured how much the participants liked the recommended venues. It was asked on a 5-point Likert scale (1-Not Like At All, 2-Not Like, 3-Not Sure, 4-Like, 5-Like Very Much).

The following four questions were asked to measure the appropriateness of the category, price, popularity, and the location of the recommended venue for the participant. They were also asked using a 5-point Likert scale (1-Not Appropriate At All, 2-Not Appropriate, 3-Not Sure, 4-Appropriate, 5-Very Appropriate).

The next four questions were asked to understand in which contextual circumstances the user would prefer the recommended venue. These were asked as fixed-sum scale questions: the participants were asked to distribute one hundred points among the categories of a contextual variable according to how likely they would be to visit the venue in each category.

The last two questions were demographic questions. They were asked to learn the age, gender, and the education level of the participants.

4.5. Results of the User Study

The participants' average age is 30, with ages ranging from 19 to 38. There are eight women and 12 men among the participants. Twelve of the participants graduated from high school, six have a bachelor's degree, and two have a master's degree. Table 12 presents the participants' answers to the first five questions of the survey.

Ninety percent of the participants liked the recommended venues, and ten percent were unsure whether they liked them. Ninety percent of the participants thought that the price class of the recommended venue was suitable for them. Forty percent of the participants thought that the category of the recommended venues was very appropriate for them, 20% thought that it was appropriate, and the remaining 40% were unsure. Seventy percent of the participants thought that the popularity class of the recommended venues was appropriate for them, and 10% thought that it was very appropriate. On the other hand, 10% were unsure, while another 10% thought that the popularity class of the venues was not appropriate. Moreover, the participants were given the address of the recommended venue and asked whether the location was appropriate. Forty percent said very appropriate and 30% said appropriate, while the remaining 30% said not appropriate.

The next four questions were asked to understand in which contextual circumstances participants would prefer to visit the recommended venues. Table 13 presents each participant's answers to these questions together with the algorithm's predictions for that participant.

The predicted results were compared with the participants' actual answers, and the recommendations were evaluated with precision, recall, and F-1 measures. Table 14 presents the precision, recall, and F-1 scores for the 20 participants. The resulting precision is 0.282, the recall is 0.276, and the F1 score is 0.279.

The participants made some comments about the venues via Twitter messages. Some of the participants had already been to some of the recommended restaurants and supported the recommendations with the following statements:
(i) Participant 1 (Gender: Male, Age: 37, Education: High School): “I always prefer going these two restaurants that you have recommended.”
(ii) Participant 2 (Gender: Male, Age: 38, Education: High School): “I have already been to one of the restaurants.”
(iii) Participant 3 (Gender: Female, Age: 19, Education: High School): “I have visited one of the restaurants before.”
(iv) Participant 4 (Gender: Female, Age: 38, Education: Bachelor’s Degree): “I have already visited venue1 and venue2.”

These statements suggest that the recommender system can predict user preferences accurately. Even participants who had not visited the recommended venues before stated that they liked them, which suggests that the developed system also offers diversity, novelty, and serendipity.

5. Conclusion

In this study, a contextually personalized hybrid recommendation model was proposed. This model integrates user-based and item-based collaborative filtering, content-based filtering together with contextual information in order to get rid of the drawbacks of each approach. Different data sources were used to collect the data: Visiting history of users was collected from Twitter. Venue characteristics (distance, category, popularity, and price classification) were collected from Foursquare, and contextual information (weather, season, date, and time of visits) related to each visit were collected from Weather Underground website. For content-based filtering, the variables distance, category, popularity, and price classification that have not been used before in one algorithm were used in order to determine content-based user similarity. Weather conditions, season, date, and time of each visit were used cumulatively as the properties of venues, and contextual similarities of venues were utilized. The artificial neural network algorithm was applied to determine the weights of each algorithm. Ratings coming from different algorithms (user-based CF, item-based CF, content-based filtering, and rating calculated from the contextual similarities of venues) were used as the predictors of the actual rating. Final ratings were calculated by multiplying the weights retrieved from neural network and the ratings from different algorithms. In addition, in order to make a more accurate recommendation and make the recommender system contextually personalized, our system calculates the probability of visiting a venue in a specific contextual category for each user-venue pair. The decision of recommendation of a venue to the specific user is made according to two rules. If the calculated rating is greater than the average rating of the user and the total contextual score is greater than two then that venue will be recommended to that user under the specific contextual circumstances. 
The hybrid system thus mitigates the disadvantages that each approach exhibits when it is used on its own.
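The combination step and the two-rule decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the final rating is the weighted sum of the four component ratings, with the weights taken as already learned (the study obtains them from an artificial neural network, which is not reproduced here). The predictor names, the function names `hybrid_rating` and `should_recommend`, and the example numbers are hypothetical. The total contextual score is treated as a precomputed input, since its exact computation is not given in this section.

```python
from typing import Mapping

# Hypothetical names for the four component ratings listed in the text.
PREDICTORS = ("user_cf", "item_cf", "content", "context")


def hybrid_rating(component_ratings: Mapping[str, float],
                  weights: Mapping[str, float]) -> float:
    """Combine the component ratings as a weighted sum.

    Assumption: each weight (e.g. learned by a neural network) multiplies
    its component rating, and the products are summed into one rating.
    """
    return sum(weights[name] * component_ratings[name] for name in PREDICTORS)


def should_recommend(rating: float, user_mean: float,
                     contextual_score: float, threshold: float = 2.0) -> bool:
    """Apply the two rules: rating above the user's average rating
    AND total contextual score above the threshold (two in the text)."""
    return rating > user_mean and contextual_score > threshold


if __name__ == "__main__":
    # Hypothetical weights and component ratings, for illustration only.
    weights = {"user_cf": 0.3, "item_cf": 0.3, "content": 0.2, "context": 0.2}
    ratings = {"user_cf": 4.0, "item_cf": 3.5, "content": 4.5, "context": 3.0}
    r = hybrid_rating(ratings, weights)
    print(round(r, 2), should_recommend(r, user_mean=3.2, contextual_score=2.5))
```

Keeping the decision rule separate from the rating computation makes the per-user threshold (the user's average rating) explicit, and it can be swapped without retraining the weights.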

Contextually personalized HybRecSys was compared with user-based and item-based KNN, biased matrix factorization, and SVD++. These algorithms were evaluated with ranking-prediction metrics (precision, recall, and F-1 measure). Training and test datasets were created using K-fold cross-validation (K = 10). The results show that contextually personalized HybRecSys outperforms all four baseline algorithms on every evaluation metric. Contextually personalized HybRecSys effectively alleviates the data sparsity problem by using venue category, popularity, and price to model user preferences.
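The evaluation protocol above can be illustrated with a short sketch of the ranking metrics and of a 10-fold split. This is an illustrative assumption, not the study's evaluation code: precision, recall, and F-1 are computed here for one user's recommendation list against that user's held-out relevant venues, and the folds are formed by shuffling interaction indices with a fixed seed. The helper names `precision_recall_f1` and `k_fold_splits` are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple


def precision_recall_f1(recommended: Iterable[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F-1 for one user's recommendations."""
    rec = set(recommended)
    hits = len(rec & relevant)                      # recommended AND relevant
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_fold_splits(n_items: int, k: int = 10,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once and yield k (train, test) index splits.

    Every index lands in exactly one test fold; the other k - 1 folds
    form the training set for that round.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]       # round-robin fold assignment
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits
```

In such a setup, each of the ten rounds trains every algorithm on the training indices, generates recommendations, and averages the per-user metrics over the test fold; the reported figures would then be the means over the ten rounds.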

In addition, recommending venues that are very similar to previous visits leads to overspecialization. This problem is also reduced because the system models user preferences from several aspects instead of relying solely on venue characteristics. As a result, the quality of the recommendations improves, and the recommender system gains diversity, novelty, and serendipity.

The algorithm partially solves the cold start problem, which can be caused by either a new user or a new item. Even if a new user has rated only one venue, the algorithm infers that user's preferences from the characteristics of the venue (category, price, and popularity). Moreover, the algorithm identifies the contextual circumstances under which the user prefers that specific venue. By looking at these characteristics, the algorithm can therefore recommend venues to a new user.
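The content-based part of this cold-start behavior can be sketched as ranking candidate venues by their similarity to the single venue a new user has rated. This is a hypothetical illustration, not the study's similarity measure. The `Venue` fields, the assumed price-tier range of 1 to 4, the popularity normalization to [0, 1], and the equal-weight similarity in `similarity` are all assumptions. The contextual part of the behavior described above is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Venue:
    name: str
    category: str
    price_tier: int      # assumption: price classification on a 1..4 scale
    popularity: float    # assumption: popularity normalized to [0, 1]


def similarity(a: Venue, b: Venue) -> float:
    """Hypothetical content similarity in [0, 1]: the equal-weight mean of
    category agreement, price-tier closeness and popularity closeness."""
    category_match = 1.0 if a.category == b.category else 0.0
    price_match = 1.0 - abs(a.price_tier - b.price_tier) / 3.0  # max tier gap is 3
    popularity_match = 1.0 - abs(a.popularity - b.popularity)
    return (category_match + price_match + popularity_match) / 3.0


def cold_start_rank(rated: Venue, candidates: List[Venue],
                    top_n: int = 3) -> List[str]:
    """Rank unseen candidates by similarity to the one venue the user rated."""
    unseen = [v for v in candidates if v != rated]
    ranked = sorted(unseen, key=lambda v: similarity(rated, v), reverse=True)
    return [v.name for v in ranked[:top_n]]
```

With a single rated café, for example, another café with the same price tier and similar popularity ranks above a bar or a more expensive, less popular café. The same function applies unchanged to a newly added venue, because it needs only content attributes and no rating history.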

The most important feature that distinguishes the developed algorithm from others is contextual personalization. Whereas most recommendation engines do not consider contextual information for location recommendation, contextually personalized HybRecSys takes it into account and recommends the right venue under the right conditions.

Although the initially collected dataset was large, it became moderately small after filtering. Nevertheless, this small dataset produced significant results, which may improve further with larger datasets. As future work, contextually personalized HybRecSys is planned to be applied to larger datasets. Venue opening and closing hours could also be checked to obtain better results. To solve the cold start problem entirely, the system could recommend a venue to a new user by looking at the current contextual circumstances and recommending the venue most preferred under those circumstances. Likewise, a newly added venue could be recommended on the basis of its content-related characteristics. The online evaluation was conducted with only 20 participants, which is a relatively small sample for obtaining statistically precise and consistent results; more users should therefore be reached in future studies. Finally, the algorithm could be embedded in a mobile application distributed through mobile app stores, so that its performance can be measured in online experiments. As users download and use the application, they will rate the recommendations, providing online feedback that can be used to improve the algorithm.
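The future-work idea of recommending the venue most preferred under the current contextual circumstances to a brand-new user can be sketched as a context-filtered popularity count. This is a hypothetical illustration, not part of the evaluated system. It represents a context as a (weather, time of day) pair and counts how often each venue was visited under exactly that context. The `Context` alias and the function name `most_preferred_in_context` are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical context representation: (weather, time_of_day).
Context = Tuple[str, str]


def most_preferred_in_context(checkins: Iterable[Tuple[str, Context]],
                              context: Context, top_n: int = 3) -> List[str]:
    """Return the venues visited most often under exactly this context.

    Ties keep first-seen order, because Counter.most_common orders
    equal counts by insertion order.
    """
    counts = Counter(venue for venue, ctx in checkins if ctx == context)
    return [venue for venue, _ in counts.most_common(top_n)]
```

Because this fallback needs no history for the target user, it complements the content-based ranking for new venues: one handles new users and the other handles new items.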

Data Availability

The Twitter and Foursquare data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was funded by Bogazici University Scientific Research Fund (grant no. 11463).