Abstract
Because the user recommendation matrix is highly sparse, it is difficult to recommend relevant services to users correctly with methods based on location and collaborative filtering, and the measured similarity between users is low. This paper proposes a fusion method based on KL divergence and cosine similarity. A comparison of three similarity metrics at different K values shows that KL divergence and cosine similarity have advantages, so the two are fused and the resulting user similarity is combined with user preference. Compared with the location-based collaborative filtering (LCF) algorithm, the user-based collaborative filtering (UCF) algorithm, and the user recommendation algorithm (F2F), the proposed method has advantages in precision, recall, and F_{1}. At different median values, the proposed method also has an advantage in the experimental results.
1. Introduction
With the rapid development of spatial information technology, spatial information such as smart sign-in, mobile services, and GPS has become one of the research hotspots in recent years. Accurate position prediction has very important application value in urban planning [1, 2], traffic forecasting [3, 4], advertising push [5, 6], and disease prevention [7]. The existing models for discovering the law of user mobility, such as the Markov model and PMM, have their own advantages. However, they still have the following defects: (1) the impact of time on user location changes cannot be truly and quantitatively reflected; (2) successive and interrelated effects cannot be truly and quantitatively reflected.
With the maturity of smart sensing devices, mobile phone positioning systems, etc., people usually carry more than one sign-in tag. Each type of check-in label produces a corresponding type of check-in data, and the various types of check-in data are generated in large quantities, providing researchers with a large amount of data for analysis.
In recent years, many scholars have used check-in data to conduct research, for example, using network sign-in data to cluster interest groups and recommend content to user groups [8] and using urban residents' bus card information to study people's travel characteristics and hot business circles [9]. However, the existing research is limited to a single type of sign-in data. Although such data are large in number, they are usually sparse in time and space [10], which reduces the reliability of the user similarity calculation and the quality of the neighbor search, so the recommendation effect needs to be improved.
For the traditional recommendation problem, researchers have improved the similarity calculation method in order to improve the search quality of neighboring users. For example, literature [11] calculates the similarity between users with a Jaccard similarity coefficient improved by a modified formula. By considering the relationship between the commonly scored items and all scored items of two users, as well as the effect of the users' score differences on commonly scored items, it improves the search quality of neighboring users. Wang et al. [12] proposed an entropy-based user similarity measurement method that considers the relative error of user scores and improves the search quality of neighboring users, but it does not consider the influence of time and geographical location on the recommendation results. In location recommendation, existing research methods use friend relationships and check-in time information to improve the quality of neighborhood search [13]. Literature [14] proposes a friend-based collaborative filtering algorithm that finds neighboring users only among a user's friends, but the accuracy of this method is limited, and literature [15, 16] also shows that the user's friend relationship brings limited improvement in recommendation accuracy. Another approach constructs a user-location-time three-dimensional matrix, considers the temporal periodicity of user check-ins, and obtains a time-aware user similarity, which improves the accuracy of the recommendation.
Existing location recommendation algorithms give little consideration to the geographical location factor when calculating user similarity. In view of this problem, this paper proposes a fusion algorithm that integrates geographic location preference with multiple similarity measures, so that the user similarity calculation yields higher-quality neighboring users and thereby improves recommendation accuracy.
2. Geographical Relationship Recommendation Complex Model
Given the geographical location relationship, the question is how users in different locations can establish connections with other users. A relationship can be established between users who do not share a geographic location, and users in the same geographic location can also establish relationships. For users who have not yet established relationships, we analyze whether there is a similarity in their spatial positions, and the relationship between users is established through this similarity.
Let the user set $U = \{u_1, u_2, \ldots, u_m\}$ represent all users, where $m$ is the number of users, and let $L = \{l_1, l_2, \ldots, l_N\}$ represent the $N$ geographic locations. A user $u \in U$ visits locations $l \in L$, and each location is described by <longitude, latitude> coordinates. The check-in matrix of the user set is

$$F = [f_{ij}]_{m \times N},$$

where $f_{ij}$ represents the number of visits by user $u_i$ to geographic location $l_j$; this number determines the size of the user's interest in the location. The matrix $F$ is obviously extremely sparse. The user's model and interest predictions for geographic locations are obtained by analyzing the matrix at different points in time together with the associated visit frequencies.
2.1. Location Information Model
This paper uses the whole user set to model the overall geographical location. This captures user behavior as a whole, but it makes individual preferences difficult to describe. To describe the behavior of user preferences as closely as possible, we model the geographic location with an adaptive kernel density estimate. The estimate is defined in terms of the spatiotemporal distance between two geographic locations, the set of sampled geographic locations, the number of samples in that set, and a kernel function. This paper uses the Gaussian kernel function, which has a smoothing parameter (the kernel width). A normality test is applied to the distribution of distances between users and geographic locations, and the optimal kernel width estimate is used to obtain the best effect. If the distance distribution is approximately normal, the kernel width is computed from the standard error of the sample, which in turn is computed from the median absolute difference of adjacent samples. According to the above formula (5), the objective function modeled by the geolocation kernel estimation method gives the user's preference for a geographic location. The user's interest in a geographic location is then predicted from this preference using a weighting parameter that evaluates the user's preference for geography, a normalization coefficient, and a term indicating how strongly the geographic location is correlated with the address location.
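The kernel density step can be illustrated with a minimal Python sketch. The formulas of this section are not fully recoverable here, so the sketch rests on stated assumptions: the standard Gaussian kernel, the normal-reference bandwidth rule h = 1.06 σ n^(-1/5), and a robust spread estimate σ = MAD / 0.6745 (median absolute deviation about the median). The function names `gaussian_kernel`, `robust_sigma`, `rule_of_thumb_bandwidth`, and `kde` are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def gaussian_kernel(x):
    """Standard Gaussian kernel K(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def robust_sigma(samples):
    """Assumed spread estimate: median absolute deviation divided by 0.6745."""
    center = median(samples)
    mad = median(abs(s - center) for s in samples)
    return mad / 0.6745


def rule_of_thumb_bandwidth(samples):
    """Assumed normal-reference rule h = 1.06 * sigma * n^(-1/5)."""
    return 1.06 * robust_sigma(samples) * len(samples) ** (-0.2)


def kde(distances, bandwidth):
    """Density at a query location, given its distances to each sampled location."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(distances)
    return sum(gaussian_kernel(d / bandwidth) for d in distances) / (n * bandwidth)


# Distances (in arbitrary units) from a candidate location to a user's check-ins.
distances = [0.5, 1.0, 1.5, 2.0, 8.0]
h = rule_of_thumb_bandwidth(distances)
density = kde(distances, h)
```

A candidate location close to many of the user's past check-ins receives a higher density than a distant one, which is the behavior the preference model relies on.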
2.2. Geographic Similarity Calculation Model
Users who are geographically close to each other are more likely to communicate offline. We assume that users with higher geographic overlap are more likely to become friends in virtual and real social interactions, and we obtain the geographic location relationship between users by calculating the similarity of their sign-in histories. We have explored three commonly used similarity calculation methods: cosine similarity, KL divergence, and Jaccard similarity. The users' check-in history data, that is, the matrix F, is used to evaluate the geographic commonality between users.
2.2.1. Cosine Similarity
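Cosine similarity is the cosine of the angle between two vectors, that is, their inner product divided by the product of their norms, so it is independent of the magnitude of the vectors. For the nonnegative check-in vectors used here, its value lies in [0, 1]. In matrix F, with $f_{ik}$ denoting the number of check-ins of user $u_i$ at location $k$, the geographic similarity of users $u_i$ and $u_j$ is as follows:

$$\mathrm{sim}_{\cos}(u_i, u_j) = \frac{\sum_{k=1}^{N} f_{ik} f_{jk}}{\sqrt{\sum_{k=1}^{N} f_{ik}^{2}}\,\sqrt{\sum_{k=1}^{N} f_{jk}^{2}}},$$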
where N is the number of locations.
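As a minimal illustration, the cosine similarity of two rows of a check-in matrix can be sketched in Python as follows. The toy matrix `F` and the function name `cosine_similarity` are assumptions made for this example, not data or code from the experiments.

```python
import math


def cosine_similarity(row_i, row_j):
    """Cosine of the angle between two check-in count vectors of equal length."""
    dot = sum(a * b for a, b in zip(row_i, row_j))
    norm_i = math.sqrt(sum(a * a for a in row_i))
    norm_j = math.sqrt(sum(b * b for b in row_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # a user with no check-ins is treated as dissimilar to everyone
    return dot / (norm_i * norm_j)


# Toy check-in matrix: rows are users, columns are locations.
F = [
    [3, 0, 1, 0],  # user 0
    [6, 0, 2, 0],  # user 1: same proportions as user 0, twice the visits
    [0, 5, 0, 2],  # user 2: disjoint locations
]

same_pattern = cosine_similarity(F[0], F[1])
no_overlap = cosine_similarity(F[0], F[2])
```

Users 0 and 1 visit the same places in the same proportions, so their similarity is essentially 1 even though user 1 has twice as many check-ins; users 0 and 2 share no location, so their similarity is 0.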
2.2.2. KL Divergence Similarity
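KL divergence, also called relative entropy, measures the difference between two probability distributions. For users $u_i$ and $u_j$, let $p_i(k)$ and $p_j(k)$ be the probabilities that the next preferred geographic location is location $k$. The relative entropy of $p_i$ with respect to $p_j$ over the geographic locations is

$$D_{KL}(p_i \,\|\, p_j) = \sum_{k=1}^{N} p_i(k) \log \frac{p_i(k)}{p_j(k)}.$$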
To prevent zero probabilities, 1 is added to both the numerator and the denominator, and the KL divergence of the two users is then computed from these smoothed values.
When the paths or location preferences of the two users are the same, the above formula yields 0. Therefore, to map the similarity to [0, 1], the divergence is normalized and the normalized value is subtracted from 1. This ensures that the more similar the geographic probability distributions of two users are, the larger the similarity value. Note that the divergence is asymmetric, so when recommending geographic locations to other users, one fixed direction of the divergence is used to indicate the similarity between the user and other users.
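The divergence-based similarity can be sketched as follows. The exact smoothing and normalization are not recoverable from the text, so this example assumes add-one (Laplace) smoothing of the check-in counts and the mapping sim = 1 - D / (1 + D), which sends a divergence of 0 to a similarity of 1 and larger divergences toward 0. Both choices, and the function names, are illustrative assumptions.

```python
import math


def smoothed_distribution(counts):
    """Add-one smoothing (assumed): every location gets a nonzero probability."""
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


def kl_divergence(p, q):
    """Relative entropy D(p || q) using the natural logarithm."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))


def kl_similarity(counts_i, counts_j):
    """Map the divergence from user i to user j into (0, 1] (assumed normalization)."""
    p = smoothed_distribution(counts_i)
    q = smoothed_distribution(counts_j)
    d = max(0.0, kl_divergence(p, q))  # guard against tiny negative rounding error
    return 1.0 - d / (1.0 + d)


identical = kl_similarity([4, 0, 2], [4, 0, 2])
forward = kl_similarity([9, 1, 0], [1, 1, 8])
backward = kl_similarity([1, 1, 8], [9, 1, 0])
```

Because the divergence is asymmetric, `forward` and `backward` differ, which is why a single direction has to be fixed when ranking candidates for a target user.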
2.2.3. Jaccard Similarity
The Jaccard similarity evaluates the similarity of two sets. Here it is computed over the users' geographic location histories: the location sets of two users are compared, and the similarity is the share of their combined location history that the two users have in common. In the formula, each user is represented by his or her geographic location history, and N is the total number of locations.
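Since the exact Jaccard formula of this section is not recoverable, the sketch below shows two common forms under stated assumptions: the standard set-based Jaccard index over visited locations, and a count-weighted variant (sum of minima over sum of maxima) that sums over all N locations. Which of the two the experiments use is an assumption left open here; the function names are hypothetical.

```python
def jaccard_similarity(locations_i, locations_j):
    """Set-based Jaccard index: |intersection| / |union| of visited locations."""
    set_i, set_j = set(locations_i), set(locations_j)
    union = set_i | set_j
    if not union:
        return 0.0
    return len(set_i & set_j) / len(union)


def weighted_jaccard(counts_i, counts_j):
    """Count-weighted variant: sum of minima over sum of maxima across locations."""
    numerator = sum(min(a, b) for a, b in zip(counts_i, counts_j))
    denominator = sum(max(a, b) for a, b in zip(counts_i, counts_j))
    return numerator / denominator if denominator else 0.0


set_score = jaccard_similarity({"cafe", "gym", "park"}, {"gym", "park", "mall"})
count_score = weighted_jaccard([3, 0, 1], [6, 0, 2])
```

In the weighted variant, two users with identical visiting proportions but different activity levels score only 0.5, which illustrates how a count-based Jaccard measure can be sensitive to data magnitude.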
3. Recommendation Algorithm Based on Geographic Location Preference
After the user's preference for geographic locations and the similarity between users have been calculated, the remaining question is how to recommend location information to the user effectively. The proposed approach considers both the individual user's preference and the similarity between users and integrates the two: candidates are first ranked by individual user preference, and user similarity is then applied.
If multiple users continuously visit the same geographic location during a given time period, these users share the same geographic preference during that period. The similarity between users is computed from the vector model of users and the clustering of preferred geographic locations. The more times two users visit the same location, the higher their interest similarity at that location. The user interest similarity at a location is defined in terms of the kth cluster, whose cluster center is geographic location k, and the access frequency of the ith user at position k; the result is the similarity of user i and user j in geographic interest, with values ranging from 0 to 1. Formula (14) uses this similarity evaluation to measure the similarity of the points of interest of different users over the K geographic locations, and its value is the recommended location similarity. If the result is equal to 1, the users have extremely high similarity at the location; if it is 0, there is no similarity at that location.
The more times a user visits a location, the greater the probability that the user will visit it next time. If the user's visit count is also large relative to the average for that location, the user has a stronger preference for the geographic location than other users. The preference model quantifies this with two ratios: the ratio of the number of visits by user i to geographic location K to the total number of visits by user i over the set of all visited locations, and the ratio of the number of times user i visits location K to the average number of times the location is visited.
Because the geographic location of the user is added to the matrix decomposition optimization as a constraint, the loss function satisfies not only the constraints of the matrix decomposition but also the geographical constraints. The constraint formula involves the implicit feature factor of each user, the predicted probability that two users are recommended as friends, and the set of all current friends of a user. It is based on the assumption that the more similar two users are, the more similar their implicit factors obtained from the matrix decomposition are. The smaller the similarity, the greater the difference between the two users and the greater the difference between their implicit factors. Conversely, the larger the similarity, the smaller the difference between the two users and the smaller the difference between their implicit factors (Algorithm 1).

The above is the recommendation algorithm based on user similarity. With this algorithm, the similarity between the target user and other users is calculated to determine which users should be recommended, and the candidates are recommended in order of their recommendation level.
4. Experimental Method Analyses
4.1. Experimental Evaluation Method
We use the Gowalla and Foursquare datasets. Gowalla [17] and Foursquare [18] are location-based social networking sites that let users share locations, events, routes, etc., by signing in. From the Gowalla dataset, 3,000 users, 2,530 points of interest, and 50,724 check-in records were randomly extracted. The Foursquare dataset contains 1,083 users, 400 location types, 38,333 locations, and 227,427 user check-in records, with a dataset sparseness of 99.45%. To ensure the effectiveness of the experiment, user data with fewer than four check-ins and fewer than four registrations are deleted. Finally, 1,071 users, 307 place types, 5,291 locations, and 92,056 user sign-in records are obtained, and the sparseness of the dataset is 98.38%. Both datasets are public geographic location datasets that contain a large number of points of interest, sign-in data, user status information, etc., so using them gives the experiments high persuasiveness and credibility.
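The reported sparseness figures can be reproduced with a short calculation, assuming that sparseness is defined as one minus the number of check-in records divided by the size of the user-by-location matrix, with each record counted as one filled cell. That definition and the function name `sparsity` are assumptions; the counts are the Foursquare figures quoted above.

```python
def sparsity(num_users, num_locations, num_checkins):
    """Share of empty cells in a user-by-location matrix (assumed definition)."""
    return 1.0 - num_checkins / (num_users * num_locations)


# Foursquare before filtering: 1,083 users, 38,333 locations, 227,427 records.
before = sparsity(1083, 38333, 227427)

# Foursquare after filtering: 1,071 users, 5,291 locations, 92,056 records.
after = sparsity(1071, 5291, 92056)

print(f"{before:.2%}")
print(f"{after:.2%}")
```

Under this assumption the two values agree with the 99.45% and 98.38% stated for the dataset before and after filtering.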
The goal of the recommendation is to recommend N geographic locations of interest to the user; all candidates are ranked by the user's preference to obtain the recommended result. The precision (accuracy rate) P, the recall rate R, and the F_{1}-measure F_{1} are selected as the evaluation indices of the experimental effect, where F_{1} is a comprehensive evaluation index based on precision and recall. The indicators are calculated as follows:

$$P = \frac{|\mathrm{Rec}(u) \cap \mathrm{Test}(u)|}{|\mathrm{Rec}(u)|}, \qquad R = \frac{|\mathrm{Rec}(u) \cap \mathrm{Test}(u)|}{|\mathrm{Test}(u)|}, \qquad F_{1} = \frac{2PR}{P + R},$$

where $\mathrm{Rec}(u)$ is the set of geographic locations recommended to user u and $\mathrm{Test}(u)$ is the set of geographic locations that user u has checked in at in the test set. These three indicators are used to measure the recommendation results: P, R, and F_{1} indicate the precision, the recall, and the combined measure of the two, respectively.
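For a single user, the three indicators can be computed as in the following sketch, which uses the standard set-based definitions of precision, recall, and F1. How the per-user values are aggregated over all test users is not specified here and is left out; the function name and the toy location identifiers are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if hits == 0:
        return precision, recall, 0.0  # avoid dividing by zero when nothing matches
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four recommended locations, three locations actually visited in the test set.
p, r, f1 = precision_recall_f1(["l1", "l2", "l3", "l4"], ["l2", "l4", "l9"])
```

Here two of the four recommendations are correct, so precision is 2/4 and recall is 2/3. A longer recommendation list tends to raise recall and lower precision, which is why F1 is used as the combined index.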
This paper first analyzes the F_{1} performance of the three methods for calculating geographic similarity, then analyzes the performance of each comparison method in terms of precision, recall, and F_{1}, and compares these with the performance of the hybrid method proposed in this paper. Extensive experiments and analyses were carried out for the top 1, 4, 8, 12, 18, 22, and so on, recommended users. Finally, we analyze the influence of the blending parameter and of the user recommendation constraints on the proposed method. The similarity performance is shown in Figure 1.
Figure 1 shows the results of ranking recommendations using only geographic similarity. That is, cosine similarity, KL divergence, and Jaccard similarity are each used to calculate the geographical similarity between users, the similarity is used as the recommendation score, and the k users with the highest scores are recommended to the target user.
In this paper, geographic commonality is meant to capture users' relative preference for geographic locations, not the specific number of times a user visits a certain location. Because each user's living habits and patterns are different, some users rarely go out and have few visits, while others go out often and have many. The difference caused by this data magnitude is therefore useless information, and we should pay more attention to whether a user prefers location A or location B. Since the Jaccard similarity calculation is strongly affected by the magnitude of the data, this method performs the worst. The performance of cosine similarity and KL divergence is comparable; cosine similarity is slightly better, except that KL divergence performs better when k > 8.

The cosine similarity method treats a user's check-in history as a vector in which each location is a dimension, and the calculated similarity is the cosine of the angle between the vectors. The KL divergence method treats the user's history as the user's preference distribution over locations, and the calculated divergence is the difference between two distributions, which matches the notion of a difference in location preference between users. Therefore, both cosine similarity and KL divergence achieve good results.

In the recommendation list, users at the top of the list are more likely to get the attention and acceptance of the target user, so in measuring recommendation performance these top users are more relevant than later ones. Since cosine similarity and KL divergence both perform well, we fuse the two to calculate the geographical commonality between users. The formula is as follows:

$$\mathrm{sim}(u_i, u_j) = \alpha\, \mathrm{sim}_{KL}(u_i, u_j) + (1 - \alpha)\, \mathrm{sim}_{\cos}(u_i, u_j),$$

where $\alpha \in [0, 1]$.
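A minimal sketch of the fusion and ranking step is given below. It assumes the fused score is a convex combination weighted by α, which is consistent with the statement in Section 4.2 that α = 0 corresponds to cosine similarity and α = 1 to KL divergence. The function names and the toy scores are hypothetical, and the default α = 0.6 is simply the value reported as best in the experiments.

```python
def fused_similarity(sim_kl, sim_cos, alpha=0.6):
    """Convex combination (assumed): alpha weights KL, (1 - alpha) weights cosine."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_kl + (1.0 - alpha) * sim_cos


def top_k_users(candidate_scores, k):
    """Return the k candidate users with the highest fused similarity."""
    ranked = sorted(candidate_scores.items(), key=lambda item: item[1], reverse=True)
    return [user for user, _ in ranked[:k]]


# Toy (KL similarity, cosine similarity) pairs of three candidates for one target user.
raw = {"u2": (0.9, 0.2), "u3": (0.5, 0.7), "u4": (0.1, 0.1)}
scores = {user: fused_similarity(kl, cos) for user, (kl, cos) in raw.items()}
recommended = top_k_users(scores, 2)
```

Setting `alpha=0.0` or `alpha=1.0` recovers the two single-measure rankings, so the same code can be used to compare the fused method against its components.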
4.2. Comparison of Experimental Methods
To verify the validity of the experiment, the following comparison algorithms are used in this paper: (1) the location-based collaborative filtering (LCF) algorithm [19]; (2) the user-based collaborative filtering (UCF) algorithm [20]; (3) the F2F algorithm [21], which ranks candidate users according to the number of common friends and recommends the top k users with the most common friends to the target user. The fused KL and cosine method proposed in this paper is denoted KLC.
The influence of the parameter α on the recommendation result is shown in Figure 2. The recommended list length is fixed at 10, the number of neighboring users is fixed at 10, and α is taken in the interval [0, 1]. When α = 0, the method reduces to cosine similarity; when α = 1, it reduces to KL divergence. As α increases from 0, the F_{1} value increases gradually. When α = 0.6, the F_{1} value reaches its maximum, indicating that the comprehensive recommendation effect is the best. When α continues to increase, the F_{1} value gradually decreases.
Figures 3, 4, and 5 compare the precision, recall, and F_{1} values of each algorithm for different recommended list lengths N. Here α is fixed at 0.6, and the number of neighboring users is fixed at 10. It can be seen that when N is small, the precision is higher and the recall is lower. As N increases, the precision gradually decreases and the recall gradually increases.
The recommendation effect of KLC is better than that of UCF and LCF, which indicates that, compared with the traditional proximity-based collaborative filtering algorithms, the proposed method finds better neighboring users. At the same time, the recommendation effect of KLC is better than that of F2F, which shows that user similarity that considers time and space perception is better than unilateral similarity. It can be seen from Figure 5 that the comprehensive index F_{1} of KLC is better than that of the other algorithms; when N = 20, the F_{1} value of KLC reaches its highest, and the recommendation effect is the best. When N is 25, 30, 35, and 40, F2F is inferior to UCF, indicating that there is a certain deficiency in using the spatial similarity of users. The LCF effect is poor in all comparisons and does not achieve the expected recommendation quality.
Figure 6 shows the F_{1} values of the four methods for different numbers of recommended users, with α = 0.6 and the number of neighboring users fixed at 10.
As shown in Figure 6, the K value ranges from 2 to 18. UCF and KLC are better than LCF and F2F. When K = 10, K = 14, and K = 16, LCF is better than F2F. Among the four methods, KLC performs best; it reaches its maximum at K = 12, where it is significantly higher than the other three methods. The above comparison experiments show that the user similarity fusion algorithm improves the search quality of neighboring users and thus improves the recommendation effect.
5. Conclusion
To recommend geographic locations based on the similarity between different users in spatial geographic location, this paper proposes a method based on the fusion of KL divergence and cosine similarity. A comparison of three similarity metrics at different K values shows that KL divergence and cosine similarity have advantages, so the two are fused and the resulting user similarity is combined with user preference. Finally, compared with LCF and UCF, the proposed method has advantages in precision, recall, and F_{1}. At different median values, the proposed method also has advantages in F_{1}. Future work will address the problem of matching similar users through graph theory [22, 23] to improve the effect of user recommendation.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant no. KJQN202002104).