Abstract
Traditional recommendation models struggle to extract informative features, to learn deep representations, and to obtain features automatically and effectively, which leads to low recommendation accuracy. To address these problems, this paper proposes a personalized tourism route recommendation model for intelligent service robots that uses deep learning in a big data environment. First, relevant websites are crawled to obtain the basic information and comment text of tourism service items and of users, and these data are preprocessed, for example by data cleaning. Then, a neural network model based on the self-attention mechanism is proposed, in which data features are obtained with the Gaussian kernel function and the node2vec model, and the self-attention mechanism captures the long-term and short-term preferences of users. Finally, the processed data are input into the trained recommendation model to generate a personalized tourism route recommendation scheme. Experiments with the proposed model on the PyTorch deep learning framework show that its Pre@10 and Rec@10 values are 88% and 83%, respectively, and its mean square error is 1.537. These results are better than those of the comparison models and closer to the tourists’ real routes.
1. Introduction
With the continuous improvement of living standards, people’s demand for tourism and leisure is increasing year by year. The development and prosperity of the tourism industry have made travel increasingly popular. In preparing for a trip, planning the route is an extremely important step [1]. People can consult travel guides or collect information on the internet; however, this is time-consuming and inefficient. Moreover, the information they find may not match their current needs [2]. It is therefore difficult to find information that fits the purpose, and the information provided by different people can differ greatly. In addition, the rapid growth in the number of social network users has caused a rapid increase in network information. When users face massive network data, they cannot quickly select the information they need [3, 4]. Users therefore hope to obtain, automatically, personalized travel recommendations that meet their specific needs, which would help them quickly filter out useless information from a large amount of travel information and improve the efficiency and comfort of integrating information [5]. Personalized route recommendation platforms are diverse. Among them, intelligent service robots in scenic spots occupy an important position because of their high efficiency and convenience. Therefore, their recommendation algorithm model is also very important.
The purpose of personalized travel route recommendation is to recommend a travel route composed of multiple points of interest (POI) based on the user’s personalized interests and the user’s own travel restrictions. In the personalized recommendation of POI, the key is to comprehensively model the user interests and similarities between the users. A comprehensive analysis of the user’s personalized preference for different POIs and the degree of similarity correlation between each POI is used to determine the user’s personalized interest [6]. In addition, similarity matching is used to find the degree of association between the users. By the analysis of similar users and similar mobile patterns, personalized POI recommendations are made to users. At the same time, personalized travel route recommendation also needs to take into account the user’s personalized factors and generate a travel route that meets their own travel restrictions for each user [7].
Personalized travel route recommendation has been studied extensively at home and abroad. Reference [8] introduced the evolution of travel recommendation systems in detail and examined their characteristics and current limitations. It also discussed the key algorithms used for classification and recommendation and the indicators that can be used to evaluate algorithm performance. Among recommendation methods, the most common is collaborative filtering, which realizes personalized recommendation by mining the similarity between users. Reference [9] used a sequential pattern mining algorithm to generate fine-grained candidate POI routes from POI access sequences and thereby recommend the best tourism route. However, its overall recommendation efficiency is low, and it is insufficient for planning routes with many complex points of interest. Reference [10] proposed a personalized and content-adaptive cultural heritage route recommendation to achieve the best cultural heritage experience with context-aware routes, in which a first-order Markov model is used to model movement and realize route recommendation. The overall recommendation effect is good; however, it takes a long time and is not suitable for immediate recommendation. Reference [11] proposed a new travel route mining method that considers the theme level and characteristics of scenic spots, in which the scenic spots are layered by theme according to the location information of popular scenic spots. A travel path data set is constructed, and travel routes are mined in combination with the theme level; however, the matching of user interest points is still insufficiently considered.
In recent years, the demand for tourist route recommendation has grown rapidly, and, thanks to advances in computer and communication technology, machine learning has been widely used for automatic recommendation [2]. Among machine learning methods, deep learning algorithms have achieved excellent results in many fields and can effectively process unstructured multimedia data. Some scholars have begun to use convolutional neural networks to solve the feature engineering problems faced by recommender systems [12]. Reference [13] proposed a matrix factorization algorithm based on two-stage clustering. Using the social network subgraph integrated with preference similarity scores, combined with geographic spatial influence, the cluster refinement of preference embeddings is extended to the cluster refinement of geographic preference embeddings. In this way, the best route recommendation under complex conditions is realized. Reference [14] proposed a personalized travel recommendation scheme based on weighted multi-information constraint matrix factorization. However, the accuracy of its optimal travel route needs to be improved. Reference [15] proposed a route recommendation method based on interest topic and distance matching, which obtains the best travel path by analyzing the user’s real historical travel footprint and scenic spot residence time in combination with the given travel time limit. However, this method has poor timeliness and adaptability and cannot be applied to an independent intelligent robot platform.
Based on the above analysis, traditional tourism route recommendation models have difficulty capturing the long-term preferences of users and suffer from poor recommendation results caused by sparse data. This paper therefore proposes a personalized travel route recommendation method for intelligent service robots using deep learning in a big data environment. It can be applied to intelligent service robots placed in the halls of scenic spots to realize personalized travel route recommendation. To fuse multisource heterogeneous data, the proposed model uses the Gaussian kernel function, the node2vec model, and other techniques to construct embedded representations of users, time, space, POI score, access frequency, and social relationships, which are then sent to the deep learning network for analysis. This addresses the low recommendation accuracy caused by sparse data. Experimental results based on the PyTorch deep learning framework show that the proposed model integrates user preference, geographical factor, and theme factor features and can better complete tourism route recommendation, with a Pre@5 of 95%.
2. POI Recommendation Problem Description
The user check-in records in social networks contain a large amount of high-value information about POIs and user preferences, which provides an opportunity for in-depth research on personalized POI recommendation. However, in practical applications, users differ in their preferences for POI categories [16]. Existing POI recommendation methods are mostly implemented with content-based or model-based collaborative filtering, and the themes of POIs and the relationships between themes are not fully considered. Therefore, in personalized recommendation, the keys to improving effectiveness are to combine the theme factors of POIs, obtain more effective features from the limited user access information, and select appropriate models to achieve distinguishable user preference modeling [17].
Finding effective features from the check-in data is the key to improving the quality of POI recommendations. Traditional methods only learn linear or low-order interactions between features, and they cannot effectively integrate the features in a location-based social network (LBSN) [18]. With its rapid development in recent years, deep learning can learn high-order features and their interactions from the input of specific tasks. Therefore, a deep neural network recommendation framework that combines a deep neural network (DNN) with the LDA topic model and a matrix factorization algorithm is proposed, named DLM. The user preference feature, geographic factor feature, and probability topic feature in the LBSN are integrated into the POI recommendation task using word-embedding technology. High-level interactions between the features are learned through neural networks, and personalized recommendations are made to users.
3. Proposed Model
3.1. Overall Framework
The implementation framework of the proposed recommendation model is divided into four modules: data preprocessing, deep prediction model construction, network training, and final recommendation list generation. The framework is shown in Figure 1. The model preprocesses the acquired data and uses the Gaussian kernel function and the node2vec model to obtain embedded representations of POI location and social relationships. Both are input into the self-attention module to capture user preferences and obtain an ideal personalized tourism recommendation scheme.

Data acquisition and preprocessing form a complex process. Relevant websites are crawled to obtain the basic information and comment text of travel service items and of users. These data are then preprocessed: the crawled data are cleaned to filter out incomplete data and junk data.
3.2. Construction of Deep Prediction Models
Construction of the deep prediction model. Neural network technology is used to construct a network model and process the preprocessed data. The feature extraction ability of deep learning is used to obtain the corresponding features, and the model predicts the user’s rating of each tourism service item [19].
Network training. The constructed network is trained in a supervised manner on the training sample data. Training mines the latent factors between users and tourism service items and learns a representation of the interaction between them.
Recommendation list generation. The test data are input into the trained model, which predicts the user’s ratings of travel service items and sorts the items by rating. A personalized recommendation list is then generated for each user, completing the recommendation.
3.3. Data Acquisition and Data Preprocessing
3.3.1. Data Collection
The collected data mainly includes two parts.
The first part is the user’s basic information and the comment text data. The user’s basic information data mainly includes the user’s gender, age, occupation, city, and historical comment items. The comment data mainly includes the user’s comment text information on past travel service items and corresponding scores. This part of the content is mainly used to extract user behavior characteristics, analyze user preferences, and build user characteristic models.
The second part is the basic information of tourism service items and the comment data of tourism service items. The basic information of the travel service item includes the name, location, and label of the travel service item. The comment data mainly refers to the comment text information and the corresponding score obtained by the tourism service item. This part of the data is used to extract the attribute characteristics of the tourism service items and construct the characteristic model of the tourism service items.
To obtain these data, web crawlers built on the Scrapy framework are used to crawl the related travel websites. Scrapy is a Python-based crawler framework that is highly flexible and controllable and can be extended to distributed crawling. Scrapy also encapsulates many crawler implementation details, so that development can focus on data extraction.
3.3.2. Data Cleaning
To ensure that the recommendation results are valid, the data should be complete and reliable. Therefore, the crawled data must first be cleaned to filter out incomplete data and junk data. Data cleaning follows these steps:
(1) Users with incomplete basic information are filtered out. When basic information is incomplete, the corresponding features cannot be extracted, so the influence of basic information on tourism service items and users cannot be mined. Ensuring the integrity of basic information therefore plays an important role in model building.
(2) Spam comments are filtered out. Inspection of the scraped comment data shows that comments consisting of only one word, or containing no words that express a positive or negative sentiment, can be treated as spam, because these data contribute nothing to the establishment of the model.
(3) Special symbols in the comment content are filtered out. These symbols do not help in mining the characteristics of the comment content.
After data cleaning is completed, complete and reliable data are obtained for further processing.
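The three cleaning steps can be illustrated with a minimal Python sketch. The record layout (`user_id`, `gender`, `age`, `occupation`, `city`, `text`) and all function names are hypothetical, and the spam rule is simplified to a word-count threshold; detecting comments without sentiment words would additionally require a sentiment lexicon, which is not shown.

```python
import re

# Hypothetical field names; the crawled schema is not specified in the text.
REQUIRED_USER_FIELDS = ("gender", "age", "occupation", "city")

def is_complete(user):
    """Step 1: a user is kept only if every basic-information field is present."""
    return all(user.get(field) not in (None, "") for field in REQUIRED_USER_FIELDS)

def strip_symbols(text):
    """Step 3: replace non-word, non-space characters, then collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()

def is_spam(text, min_words=2):
    """Step 2 (simplified assumption): comments shorter than min_words are spam."""
    return len(text.split()) < min_words

def clean(users, comments):
    """Apply the three steps and return the surviving users and comments."""
    kept_users = {u["user_id"]: u for u in users if is_complete(u)}
    kept_comments = []
    for comment in comments:
        if comment["user_id"] not in kept_users:
            continue  # comment belongs to a user removed in step 1
        text = strip_symbols(comment["text"])
        if is_spam(text):
            continue
        kept_comments.append({**comment, "text": text})
    return list(kept_users.values()), kept_comments
```

Symbols are stripped before the spam check so that a comment consisting only of punctuation is also discarded.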
3.4. Network Building
The proposed model contains two components: information embedding and information interaction. Its structure is shown in Figure 2. In the information embedding module, the input data are first multi-hot encoded, and the user POI check-in sequence model is constructed to generate the latent representation matrix. Then, in the auxiliary information extraction part, POI geographic location information is extracted with the Gaussian kernel function, and the data are normalized by softmax. In the information interaction module, deeper interactions in the data are obtained with self-attention to learn the long-term and short-term preferences of users, and the information is fused using three bottleneck layers to obtain the final prediction.
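The text does not give the exact form of the self-attention computation. As a hedged illustration, the following sketch implements the common scaled dot-product self-attention over a sequence of embedding vectors, with queries, keys, and values all taken to be the inputs themselves (no learned projections). This simplification and the function names are assumptions, not the model's actual layer.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(sequence):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    sequence: list of equal-length float vectors (one per check-in).
    Returns one output vector per input position.
    """
    dim = len(sequence[0])
    outputs = []
    for query in sequence:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights = softmax(scores)
        # Each output is the attention-weighted average of all positions.
        outputs.append([sum(w * value[c] for w, value in zip(weights, sequence))
                        for c in range(dim)])
    return outputs
```

Because every output is a weighted average over the whole sequence, a position can draw on both recent and distant check-ins, which is the property that lets attention relate short-term and long-term behavior.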

3.4.1. User Representation Embedding
Given a user’s check-in record, the goal is to learn a latent representation of the POI sequence in order to improve the accuracy of POI recommendation. In the proposed model, a transition matrix is designed so that the user’s spatiotemporal intentions and the latent characteristics of POIs are mapped into the feature space. The input is the user’s check-in vector in multi-hot form, in which an entry of 1 means that the user has visited the corresponding POI at the corresponding time. The mapping applies the weight and bias associated with the user to this vector and yields the latent representation vector of the user.
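A minimal sketch of this embedding step is given below. It assumes a plain affine mapping of the multi-hot vector (weight times input plus bias), ignores the time dimension for brevity, and uses hypothetical function names; the actual model may apply an activation function or a different parameterization.

```python
def multi_hot(visited, num_pois):
    """Return a 0/1 vector with a 1 at the index of every visited POI."""
    vector = [0.0] * num_pois
    for poi in visited:
        vector[poi] = 1.0
    return vector

def embed(x, weight, bias):
    """Affine map h = W x + b (assumed form).

    weight: list of rows, one row per latent dimension.
    bias:   list with one entry per latent dimension.
    """
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

# Example: 4 POIs, the user visited POIs 0 and 2, latent dimension 2.
x = multi_hot([0, 2], 4)
W = [[0.5, 0.1, -0.2, 0.0],
     [0.0, 1.0, 1.0, 1.0]]
b = [0.1, 0.0]
h = embed(x, W, b)
```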
3.4.2. Spatio-Temporal Information Embedding
In the user’s check-in record, the user’s behavior is usually limited to several specific areas, which is the well-known geographical clustering phenomenon in check-in activities [20]. From this phenomenon, it can be inferred that users prefer to visit unvisited POIs near the POIs they have visited before, and that this preference depends on the attributes of the two POIs and the distance between them [21, 22]. If the attributes are the same, then the closer the two POIs are, the more likely the user is to visit the unvisited POI [23]. To incorporate the geographical distance attribute of the POI, the Gaussian kernel function is used to extract the neighbor-aware influence of the checked-in POI, which is expressed as follows:
$$K(l_i, l_j) = \exp\left(-\frac{\lVert l_i - l_j \rVert^2}{2\sigma^2}\right)$$
Among them, $l_i$ and $l_j$ are the geographic coordinates of the two POIs visited by the user. The value range of the Gaussian kernel is $(0, 1]$. $\sigma$ is the bandwidth, which controls the radial range of action. Finally, by calculating the Gaussian kernel value of each POI pair, the Gaussian kernel value vector can be obtained.
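The kernel computation can be sketched as follows, assuming the common Gaussian form exp(-d^2 / (2 * bandwidth^2)) and treating coordinates as planar points so that d is the Euclidean distance. Real latitude and longitude pairs would normally be converted to a metric distance first; the function names are hypothetical.

```python
import math

def gaussian_kernel(loc_a, loc_b, bandwidth):
    """Gaussian kernel of the Euclidean distance between two (x, y) points."""
    dx = loc_a[0] - loc_b[0]
    dy = loc_a[1] - loc_b[1]
    squared_distance = dx * dx + dy * dy
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))

def pairwise_kernel_vector(locations, bandwidth):
    """Kernel value for every unordered pair of POIs, in (i, j) order with i < j."""
    values = []
    for i in range(len(locations)):
        for j in range(i + 1, len(locations)):
            values.append(gaussian_kernel(locations[i], locations[j], bandwidth))
    return values
```

A smaller bandwidth makes the kernel decay faster, so only very close POIs influence each other; a larger bandwidth spreads the influence over a wider area.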
3.4.3. POI Score and Access Frequency Embedding
By preprocessing the LBSN data set, users’ ratings and access frequencies can be obtained. Normalization is used to scale these values to (0, 1), so that, for each user, the check-in score probabilities sum to 1 and the access frequency probabilities sum to 1. This makes it easier to characterize the user’s preference for the POIs they have visited, which is expressed as follows:
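$$p^{r}_{u,i} = \frac{r_{u,i}}{\sum_{j} r_{u,j}}, \qquad p^{f}_{u,i} = \frac{f_{u,i}}{\sum_{j} f_{u,j}}$$
where $r_{u,i}$ and $f_{u,i}$ are the score and access frequency of user $u$ for POI $i$, and the sums run over the POIs visited by user $u$. By calculating the score probability value and the access frequency probability value of each user separately, the probability value vectors $p^{r}_{u}$ and $p^{f}_{u}$ can be obtained.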
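As a small illustration, the per-user normalization can be written as below. The sketch assumes that each value is divided by the user's total, which is the simplest scheme that makes the probabilities sum to 1; the function name and the dictionary layout are hypothetical.

```python
def to_probabilities(values):
    """Normalize one user's per-POI values (scores or visit counts) to sum to 1.

    values: dict mapping POI id -> non-negative number.
    """
    total = sum(values.values())
    if total == 0:
        # No signal for this user; return zeros rather than dividing by zero.
        return {poi: 0.0 for poi in values}
    return {poi: value / total for poi, value in values.items()}

# Example: one user's visit counts for two POIs.
frequency_probs = to_probabilities({"poi_1": 2, "poi_2": 6})
```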
3.4.4. User Social Relationship Embedding
The node2vec model is selected to extract the user relationship features. For each user, the node2vec model generates random walk sequences of a fixed length. The walks are then used to train Skip-Gram with hierarchical softmax. Finally, users related to the current user are found, so that a latent representation matrix of the user’s social relationships can be obtained.
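The walk-generation stage can be sketched as follows. For brevity the sketch uses unbiased walks, which correspond to node2vec with return parameter p = 1 and in-out parameter q = 1; the biased second-order transition and the subsequent Skip-Gram training are not shown. The graph layout (adjacency lists keyed by user) and function names are assumptions.

```python
import random

def random_walk(graph, start, length, rng):
    """Walk up to `length` nodes from `start`, choosing neighbors uniformly."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # isolated node or dead end
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(graph, walks_per_node, length, seed=0):
    """Generate `walks_per_node` walks starting from every user in the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # vary the start order between passes
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks
```

The resulting walks play the role of sentences: feeding them to a Skip-Gram model yields one embedding vector per user, and users that co-occur in many walks end up close together in the embedding space.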
3.4.5. Attached Information Display
To handle the complex interaction between users and POIs and to further improve POI recommendation, the various kinds of auxiliary information are integrated. They are combined using the element-wise product to obtain the latent representation vector of the auxiliary information.
3.5. Network Training
The pairwise Bayesian Personalized Ranking (BPR) loss function is used to learn the model parameters. The objective function is as follows:
$$L = \sum_{(u,i,j) \in D} -\ln \sigma\left(\hat{y}_{u,i} - \hat{y}_{u,j}\right) + \lambda \lVert \Theta \rVert_2^2$$
where $D = \{(u,i,j) \mid (u,i) \in P, (u,j) \in Q\}$ represents the paired training data. $P$ represents the observed visit records. $Q$ represents the unobserved visit records. $\hat{y}_{u,i}$ and $\hat{y}_{u,j}$, respectively, represent the preference of user $u$ for the target POIs $i$ and $j$. $\sigma$ is the sigmoid function. $\Theta$ represents all trainable model parameters. $\lambda$ represents the L2 regularization parameter that controls overfitting. The mini-batch Adam optimization algorithm is used to optimize the prediction model and update the model parameters.
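For illustration, the BPR objective can be evaluated with the short sketch below. It assumes the standard formulation, in which each training pair contributes the negative log-sigmoid of the score difference between a visited and an unvisited POI, plus an L2 penalty on the parameters. The function names and the flat parameter list are hypothetical, and no gradient step is shown.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bpr_loss(score_pairs, params, reg):
    """Standard BPR objective (assumed form).

    score_pairs: iterable of (score_visited, score_unvisited) for one user each.
    params:      flat iterable of model parameters for the L2 penalty.
    reg:         L2 regularization strength.
    """
    ranking_term = -sum(log_sigmoid(pos - neg) for pos, neg in score_pairs)
    penalty = reg * sum(p * p for p in params)
    return ranking_term + penalty
```

The loss decreases as visited POIs are scored higher than unvisited ones, so minimizing it optimizes the ranking directly rather than the absolute score values.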
Although deep learning models have strong representation capabilities, they often suffer from overfitting [24]. Dropout is an effective way to prevent neural networks from overfitting. To make the model generalize well to unobserved data, Dropout is used in training. Dropout randomly drops nodes of the neural network, including those of the information self-encoding layer, with a certain probability.
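The mechanism can be illustrated with the following sketch of inverted dropout, the variant in which surviving activations are rescaled by 1 / (1 - rate) during training so that no rescaling is needed at inference. This mirrors the behavior of common framework layers such as PyTorch's `nn.Dropout`, but the function here is a hypothetical stand-alone illustration, not the model's code.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout over a list of activations.

    rate: probability of zeroing each activation, in [0, 1).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    keep = 1.0 - rate
    # Keep each unit with probability `keep` and rescale it; otherwise zero it.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```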
3.6. Generate Recommendation List
After the network model is trained using the above optimization algorithm, the vector information of tourism service items and users can be input into the network model. The trained model outputs the user’s predicted score for each tourism service item. The items are sorted by this predicted score, and the top N travel service items with the highest scores are recommended to the user, which generates the personalized recommendation list and completes the recommendation.
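The ranking step amounts to sorting predicted scores and truncating the list, as in the sketch below. The optional exclusion of already visited items is an assumption added for illustration, and the function name is hypothetical.

```python
def top_n(predicted_scores, n, exclude=()):
    """Return the ids of the n items with the highest predicted scores.

    predicted_scores: dict mapping item id -> predicted score for one user.
    exclude:          item ids to skip, e.g. items the user already visited.
    """
    skipped = set(exclude)
    candidates = [(item, score) for item, score in predicted_scores.items()
                  if item not in skipped]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:n]]

# Example: predicted scores for one user.
scores = {"a": 4.5, "b": 3.0, "c": 4.9, "d": 2.0}
recommendation = top_n(scores, 2)
```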
4. Experiment and Analysis
The data used in the experiment is from a tourism website, and the basic information data example of its users is shown in Table 1. There will be many comments in a scenic spot. Similarly, a user will also have multiple comments, and each comment has a corresponding user and scenic spot identity (ID) and corresponding score. Because of the limited size of the table, there is no detailed display here (the comments in the table only show the content of the comment text).
The proposed model is run on the PyTorch 1.2.0 framework, with the dropout rate set to 0.5.
4.1. Evaluation Indicators
Two widely used indicators are used to evaluate the performance of different recommendation models, namely precision and recall (denoted by Pre@N and Rec@N, respectively), and the calculation is as follows:
$$\text{Pre@}N = \frac{1}{|U|} \sum_{u \in U} \frac{|\text{Top-}N_u \cap K_u|}{N}, \qquad \text{Rec@}N = \frac{1}{|U|} \sum_{u \in U} \frac{|\text{Top-}N_u \cap K_u|}{|K_u|}$$
where $|U|$ represents the number of users. $N$ represents the number of recommended POIs. $\text{Top-}N_u$ represents the list of the first $N$ points of interest recommended by the recommendation model to the target user $u$. $K_u$ represents the real check-in list of the user in the test set, i.e., the set of POIs that the user has actually visited according to the historical access record.
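A minimal sketch of the two metrics is given below, assuming they are averaged over users, with per-user precision equal to hits divided by N and per-user recall equal to hits divided by the number of POIs the user actually visited. The function name and data layout are hypothetical.

```python
def precision_recall_at_n(recommended, actual, n):
    """Average Pre@N and Rec@N over users.

    recommended: dict user -> ranked list of recommended POI ids.
    actual:      dict user -> collection of POI ids the user really visited.
    """
    users = list(recommended)
    precision_sum = 0.0
    recall_sum = 0.0
    for user in users:
        visited = set(actual.get(user, ()))
        hits = len(set(recommended[user][:n]) & visited)
        precision_sum += hits / n
        # A user with no test check-ins contributes zero recall.
        recall_sum += hits / len(visited) if visited else 0.0
    return precision_sum / len(users), recall_sum / len(users)

# Example with two users and N = 2.
recs = {"u1": [1, 2, 3], "u2": [4, 5]}
truth = {"u1": {1, 3, 5}, "u2": {5}}
pre, rec = precision_recall_at_n(recs, truth, 2)
```

In the example, each user has one hit among the top two recommendations, so the precision is 0.5 for both, while the recall is 1/3 for `u1` and 1 for `u2`.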
In addition, for user $u$ and tourism service item $i$ in the test set, $\hat{r}_{u,i}$ represents the predicted score generated by the recommendation algorithm, and $r_{u,i}$ represents the actual score of user $u$ on item $i$. Hence, the mean square error (MSE) can be defined as follows:
$$\text{MSE} = \frac{1}{n} \sum_{(u,i)} \left(\hat{r}_{u,i} - r_{u,i}\right)^2$$
where $n$ is the number of observed values in the test set.
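The MSE computation is short enough to state directly. The sketch below takes a list of (predicted, actual) score pairs from the test set; the function name is hypothetical.

```python
def mean_square_error(score_pairs):
    """Mean of squared differences between predicted and actual scores."""
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("at least one (predicted, actual) pair is required")
    return sum((predicted - actual) ** 2 for predicted, actual in pairs) / len(pairs)

# Example: errors of 1 and 0 give an MSE of 0.5.
example_mse = mean_square_error([(4.0, 5.0), (3.0, 3.0)])
```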
4.2. Performance Comparison with Other Algorithms
4.2.1. Influence Analysis of Model Characteristics
Since each feature in the recommendation model will have a certain impact on the recommendation results, user preferences (UP), geographic factor (GF), and thematic factor (TF) are successively added to the proposed model, and experiments are carried out. The comparison results under Pre@5, Pre@10, and Pre@20 are shown in Figure 3.

As can be seen from Figure 3, the proposed model integrates user preference features, geographical factor features, and topic factor features, and its recommendation accuracy is better than the model with user preference and geographical factors. Taking Pre@5 as an example, its accuracy is as high as 95%. At the same time, it can also be seen that the recommendation effect obtained by the fusion of the three-factor features is significantly better than that of the single factor features or the fusion of the two-factor features.
4.2.2. Comparative Analysis of Accuracy and Recall
In addition, the accuracy and recall of the proposed model are compared with those of the models in references [9, 11, 14] on the data set. The comparison results for Pre@5, Pre@10, and Pre@20 are shown in Figure 4, and the comparison results for Rec@5, Rec@10, and Rec@20 are shown in Figure 5.


It can be seen from Figures 4 and 5 that the accuracy and recall of the proposed model are significantly better than those of the other recommendation models. Taking Pre@10 and Rec@10 as examples, their values are 88% and 83%, respectively, while those of the other models are less than 80%. The proposed model uses the Gaussian kernel function to obtain the pairwise distance between the corresponding POIs in the user check-in record, uses the node2vec model to extract the network structure characteristics of the user’s social relationships, and captures the user’s preferences with the self-attention mechanism. Hence, the overall recommendation effect is good. Reference [9] uses a sequential pattern mining algorithm to build a POI knowledge base and massive structured POI access sequences to recommend the best tourism route, but it does not consider the influence of geography and other factors. Therefore, the accuracy and recall of its recommendation scheme are low: its Pre@20 and Rec@20 are both less than 50%. Reference [11] formed a standardized travel data set by preprocessing the data, for example by word segmentation and denoising, layered the scenic spots according to the location information of popular scenic spots, and recommended travel routes in combination with the theme level and scenic spot characteristics; however, it did not mine user information in depth. Hence, the performance of its recommendation model was poor. Reference [14] proposed a personalized travel recommendation model based on weighted multi-information constraint matrix factorization, which comprehensively describes users and travel locations using photos, user access sequences, and text tags, and allocates different weights in combination with the common access probability based on geographical distance, which can achieve better travel route recommendation. However, because it is a traditional method, its recommendation performance is lower than that of the proposed model using deep learning. Taking Rec@5 as an example, its value is 9% lower.
4.2.3. MSE Analysis of Different Models
To demonstrate the recommendation performance of the proposed model, it is compared with the models in references [9, 11, 14]. The results are shown in Table 2.
It can be seen from Table 2 that the MSE values of references [9, 11] are almost the same, differing by only 0.031. Because these methods lack in-depth analysis of geography, user preferences, and other factors, their recommended results deviate greatly from the actual route. Reference [14] uses the weighted multi-information constraint matrix factorization method to realize personalized travel recommendation, which considers many factors; however, it lacks a powerful learning algorithm. Therefore, the accuracy of its recommendation result is not high, and its MSE is 1.705. After preprocessing such as data cleaning, the proposed model uses the deep learning algorithm to extract user features and carry out the corresponding learning and classification. It considers not only the comment text of users and tourism service items but also their basic information. Therefore, an ideal recommendation scheme is obtained, and its MSE is only 1.537. In conclusion, the above results demonstrate the effectiveness and superiority of the proposed recommendation model.
5. Conclusion
In recent years, with the popularization of the internet and the continuous development of information technology, people’s demand for tourism has become richer and more diverse. Effective and timely recommendation is of great significance for providing efficient and high-quality personalized tourism services. Therefore, based on the deep learning algorithm in the big data environment, a personalized tourism route recommendation model that can be applied to intelligent service robots in scenic spot halls is proposed. To address the long time span of users’ travel and the dynamic changes in their preferences, the proposed model uses the self-attention mechanism module to filter the POI features related to the user’s long-term preferences in each sequence, thereby improving the accuracy of travel route recommendation. The experimental results based on the PyTorch deep learning framework show that the proposed model completes data feature extraction and prediction using a deep learning network based on a self-attention mechanism and comprehensively considers all kinds of data. Its Pre@10 and Rec@10 values are 88% and 83%, respectively, and its mean square error is 1.537, which gives it certain advantages in tourism route recommendation.
At present, knowledge graphs are widely used, and using this technology to extract latent interaction representations that humans cannot readily notice, in order to make effective recommendations, has become a hot research topic. Data often have diverse and heterogeneous representations, such as the type of a POI, its access time, its traffic time, its cost, and its location. Mining the attribute information of these entities with knowledge graph technology can further advance recommendation systems.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.