A Hybrid Method to Solve Data Sparsity in Travel Recommendation Agents Using Fuzzy Logic Approach
Travel recommendation agents have been a helpful tool for travelers in their decision-making for destination choices. It has been shown that sparsity can significantly impact the accuracy of recommendation agents. The COVID-19 outbreak has affected the tourism and hospitality industry of almost all countries in the world. Tourists who had planned to travel are canceling or postponing trips due to the pandemic, which in turn reduces the rate of travelers’ online reviews of tourism products. Hence, the lack of data, in terms of ratings and textual reviews on hotels, will be a major issue for travel recommendation agents in tourism and hospitality during the COVID-19 outbreak. This is a new challenge for researchers developing travel recommendation agents. Machine learning has been found to be effective in dealing with data sparsity in recommendation agents. Therefore, developing new algorithms would be helpful to overcome the sparsity issue in travel recommendation agents. This research provides a new method through neurofuzzy, dimensionality reduction, and clustering techniques and evaluates it on the TripAdvisor dataset to see its effectiveness in solving the sparsity issue. The results showed that the method, which combines fuzzy logic with clustering and dimensionality reduction, is more efficient in addressing the sparsity problem and presents more accurate results. The results of the method evaluation are presented and discussed, and several suggestions are provided for future studies.
Recommendation systems are software agents that aim to solve the information overload problem and enable real-time decision-making. The use of recommendation agents in tourism has been effective for travelers’ decision-making [1–3]. These systems have helped travelers to find the most suitable destinations according to their preferences. It was shown that the accuracy of the recommendations relies on the richness of the available data. In fact, data quality plays a vital part in recommending accurate items to users. Sparsity, which is a major issue in these systems [6–11], has significantly impacted the accuracy of recommendations in many application domains such as healthcare, tourism, and e-commerce.
The COVID-19 outbreak has affected the tourism and hospitality industry of almost all countries in the world [12, 13]. Tourists who had planned to travel are canceling or postponing trips due to the pandemic, which has negatively impacted the hospitality industry. As a result, many hotel managers will not receive enough feedback from travelers to measure the quality of their services, and there will not be enough online customer reviews on the performance criteria of hotels for their evaluation. Thus, the lack of both quantitative data (numerical ratings) and qualitative data (textual reviews) on hotels will be a major issue for travel recommendation agents during the COVID-19 outbreak, particularly in the tourism and hospitality sector. It is therefore important to provide appropriate strategies for obtaining sufficient, up-to-date data on tourism products from users so that travel recommendation agents remain accurate during and after the COVID-19 era. In addition, it is vital to overcome the sparsity of user-hotel interaction data so that recommendation agents produce reliable recommendations.
To address real-world obstacles, there is a need to handle several uncertain variables. In a changing and uncertain environment, several shortcomings in data such as fuzziness, randomness, incompleteness, and indistinguishability can exist. Fuzzy logic presents a broad range of approaches to inspect the uncertainty in data. Contrary to traditional set theory, in which items are either classified into a group or not, in fuzzy set theory, items can be a part of a group to some degree. This theory has been deployed to represent qualitative data effectively and has shown the ability to address several issues with very robust outcomes. The use of fuzzy logic has been explored in several fields such as software project management, electronic learning, and trust models. Recommender Systems (RSs) based on fuzzy logic have been advanced since 2008. Hence, several researchers have explored fuzzy logic in several domains related to recommender systems, such as consensus ranking, item and trust-aware collaborative filtering, correlation-based similarity, competence RSs, situation-aware collaborative filtering (CF), tourism systems, the stock market, movie RSs, automatic group RSs, knowledge-based RSs, and multicriteria collaborative filtering.
This research provides a new method through fuzzy logic, dimensionality reduction, and clustering techniques and evaluates it on the TripAdvisor dataset to see its effectiveness in solving the sparsity issue. Data segmentation was performed using Expectation-Maximization (EM), with which similar users were better detected with lower computation time. EM was run for different numbers of clusters, and the best cluster size was used in the next stage. Dimensionality reduction was applied using Higher-Order Singular Value Decomposition (HOSVD). We also used an adaptive neurofuzzy inference system (ANFIS) to predict users’ overall ratings according to the predicted criteria ratings. Finally, the recommendations were produced using a dense tensor of the data. The findings of the study were evaluated using several measures. The results of the method evaluation are presented and discussed, and several suggestions are provided for future studies.
The remainder of this paper is structured as follows. In Section 2, we present a bibliometric analysis of the sparsity problem in the recommender system literature. In Section 3, we provide a literature review on RSs in tourism and hospitality. In Section 4, we provide the proposed method and the mathematical background of the proposed techniques. In Section 5, the data collection is provided. Results are elaborated in Section 6. We perform method evaluation in Section 7. Finally, we present the discussion and conclusions, respectively, in Section 8 and Section 9. To simplify, the acronyms used in this work are presented in Table 1.
2. Sparsity Issue in Recommender Systems
A CF recommender system presents items to users based on other individuals’ choices. It assumes that individuals who had comparable choices previously are more likely to share similar preferences later. The core concept of the CF method is that it collects data on users’ choices of several products or services by referring to users’ ratings. Users with the same rating patterns are regarded as similar users, and the similarity value is calculated by a particular algorithm such as K-Nearest Neighbor (KNN). Although the CF approach has earned researchers’ interest recently, its performance depends fundamentally on the feedback provided by users. Users may, for several reasons, not be inclined to provide feedback, which leads to the data sparsity issue. Data sparsity is a basic obstacle of the CF method. The number of users, products, and services in any recommender system might impact the number of items rated by individuals. Efficient estimation of ratings from a limited number of examples is important. Without sufficient information, however, it is difficult for the CF approach to present effective recommendations to users. A sparsity issue emerges because each user interacts with only a limited set of products in a specific domain. For example, the MovieLens datasets contain a rating matrix in which users provide their ratings of movies, yet only 10% of this matrix is filled by users, which indicates the data sparsity problem.
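As a minimal illustration of how the sparsity level of a rating matrix can be quantified, the following Python sketch computes the share of user-item cells that hold no rating. The function name `sparsity_level` and the toy ratings are illustrative assumptions; they are not taken from the MovieLens or TripAdvisor data.

```python
def sparsity_level(ratings, n_users, n_items):
    """Return the fraction of user-item cells that hold no rating.

    `ratings` maps (user, item) pairs to a numeric rating; cells that are
    absent from the mapping are treated as unrated.
    """
    total_cells = n_users * n_items
    if total_cells == 0:
        raise ValueError("the rating matrix must have at least one cell")
    return 1.0 - len(ratings) / total_cells


# Hypothetical toy data: 3 users, 4 hotels, 3 observed ratings.
toy_ratings = {(0, 0): 5, (1, 2): 3, (2, 3): 4}
toy_sparsity = sparsity_level(toy_ratings, n_users=3, n_items=4)
```

In this toy case nine of the twelve cells are empty, so a neighborhood-based CF method has very little overlap between users to work with.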
Many methods have tried to address the data sparsity of RSs, and they can be classified based on several aspects. The first method tries to utilize additional data about users. Additional data such as friends’ connections and consumer-generated tags can be integrated into RSs to identify similar neighbors. To address data sparsity in collaborative filtering recommender systems, Guo et al. designed the TrustSVD model, in which the implicit and explicit impacts of social trust were utilized to estimate items for real users. Their proposed model surpassed trust-based and rating-based approaches in prediction accuracy among users with various trust levels. Tang et al. presented the SoDimRec model, in which heterogeneous social links and weak dependencies among them are considered. The model showed good performance on real-world datasets. In another study by Krohn-Grimberghe, Drumond, Freudenthaler, and Schmidt-Thieme, the Multirelational Matrix Factorization (MRMF) technique was utilized within a Bayesian personalized ranking (BPR) framework, based on several social connections among users, for recommending items to users. The authors indicated the effectiveness of the presented technique in addressing the lack of social relations among users on a real dataset. Zhao et al. focused on the utilization of social relations to build accurate ranking models. The authors assumed that a product purchased by an active consumer is preferred over a product purchased by a friend, which in turn is preferred over a product purchased by other users. They designed the social Bayesian personalized ranking model based on that assumption and presented promising findings on real-world datasets. To enhance the performance of RSs in a sparsity context, Feng et al. presented a multifactor similarity metric that locates nonlinear and linear connections among users from excessive behavior. The disadvantage of these methods is that they need built-in social connections, which may not always be available.
The second method tries to integrate additional data about items (features and content). McAuley and Leskovec presented the hidden factors as topics model, which integrates product factors with review texts to enhance rating prediction. The proposed method can address the issue of new items with limited ratings. Zhu et al. tried to address the sparsity of ratings by using side information about items extracted from social media. They also converted user-item ratings into weighted topic-item ratings. Compared with other methods on real-world datasets, the outcomes indicated the efficiency of the presented method. He and McAuley deployed deep networks to investigate the influence of visual characteristics extracted from item images. The deployed approach enhanced the performance of Top-N product suggestions. In a study by Vasile et al., categorical side data of products was utilized in the recommender system, and the metadata was incorporated into the embedding representation of products. The research findings indicated that the new product representations improved the performance of RSs on a music dataset. One limitation of this class of methods is overspecialization, which leads to presenting a limited range of products to users.
The third approach considers several interactions between users and products. Pan et al. presented an adaptive Bayesian personalized ranking to address the problem of heterogeneous implicit feedback. The authors assumed that products with browse actions are favored over products without any. The outcomes of the study affirmed that the deployed method leverages uncertainty in examination records efficiently with respect to several ranking-oriented assessment measures. Qiu et al. differentiated consumers’ preferences based on three classes of products: the purchased products, products with auxiliary behaviors, and products without any behavior. The authors developed a trinity preference-based BPR model aiming to improve the outcomes of RSs. Loni et al. proposed the MF-BPR approach. They designed a nonlinear sampling technique for the standard BPR method, in which the sampling probability depends on the degree of positive feedback an individual may have on a product.
To get an insight into the current research regarding the sparsity problem in the recommender system, a visualization of the keyword cooccurrence network is generated through a bibliometric analysis. The main outcome of the bibliometric analysis that we performed is a cooccurrence-keywords map of the research topic. We used the following keywords to retrieve the related terms from the Scopus database: ((“sparse data” OR “sparsity”) AND (“recommender system” OR “recommendation system” OR “recommender agent” OR “recommendation agent” OR “recommender engine” OR “recommendation engine”)).
In Figure 1, the distance between two elements is determined by the number of studies in which both keywords occur: a large number of cooccurrences is indicated by a short path between the items. This distance is used in the cooccurrence map to group the keywords into segments. Besides, bigger frames indicate more occurrences in the studies; hence, as the figure presents, “recommender systems,” “data sparsity,” and “collaborative filtering” are the most frequent keywords in the selected studies. The diagram contains four segments: segment 1 (14 keywords), segment 2 (12 keywords), segment 3 (10 keywords), and segment 4 (9 keywords). In segment 1 (red color), the core keywords are “recommender systems” and “data sparsity.” In segment 2 (green color), the core keyword is “collaborative filtering.” In segment 3 (blue color), the core keywords are “matrix algebra” and “factorization.” In segment 4 (yellow color), the core keywords are “prediction accuracy” and “context-aware recommender system.”
3. Related Work on Recommendation Agents in Tourism and Hospitality
Tourism is the movement of people from one geographical place (usually their place of residence) to another place to achieve various individual, business, and leisure goals. The most frequently used description of the tourism system is that of Leiper, who indicated that the tourism system is a three-part system that comprises production, transportation, and tourism destinations, and that these parts are placed within economic, social, and environmental contexts. Through several business and marketing activities, tourism plays a vital part in the economic, social, and cultural advancement of most regions. An important feature of tourism activities is their close link with tourists’ preferences and interests [55, 56].
The fast advancement of social platforms allows online commerce to change from a product-based platform to a social-based one [57–59]. Thus, online business has taken on an emerging form that adopts Web 2.0 characteristics to engage consumers and motivate them to communicate, which in turn can add financial value for businesses. Hence, the fast advancements in the Internet, communication technologies, and social media have had an intense impact on the tourism business. Nowadays, tourists tend to utilize smart devices to help them browse and find places that suit their aims. In this social-based environment, tourists look for other tourists who have shared their experiences and knowledge in order to increase their awareness of the offered services and, finally, reach the right choices. People tend to consider the opinions of other individuals and make choices under the impact of social ties [63, 64].
RSs deployed in tourism applications are widely recognized as destination recommendation systems or Travel Recommender Systems (TRSs). RSs in the tourism field are beneficial tools for consumers and travel agents. Trying to imitate the interaction with a real tourism agent, several tourism providers have integrated RSs into their web portals. Using TRSs, tourists can easily reach relevant information about the locations they require, which leads to less time spent on booking decisions and to more tailored suggestions that meet their preferences. A TRS is an intelligent system that supports tourism services by presenting guidelines and suggestions to tourists. TRSs can be categorized as either web-based TRSs or mobile TRSs.
This paper develops a new method for the proposed recommender system through clustering, dimensionality reduction, and the neurofuzzy system (see Figure 2). In the first step of the method, we perform data segmentation using EM. There are many types of clustering techniques for data segmentation. It has been demonstrated that EM clustering techniques are robust when the dataset is sparse. Through segmentation, similar users can be better detected with lower computation time. This also helps the ANFIS technique to better construct the prediction models when the data is large. In addition, in this stage, EM is applied for different numbers of clusters, and the best cluster size is selected for further analysis in the next step, which is dimensionality reduction through HOSVD. HOSVD aims to reduce the dimensions of the data so that similarities between users and items can be calculated in lower dimensions. In the next step, we use a neurofuzzy approach, ANFIS, to predict users’ overall ratings according to the predicted criteria ratings. Accordingly, a dense tensor of the data is used in the next stage of the method for the recommendation procedures. The method predicts the hotels’ ratings for the users and recommends hotels according to the developed fuzzy-based algorithms. The results are finally evaluated using several metrics: Precision, F1-measure, RMSE (Root Mean Squared Error), and MAE (Mean Absolute Error). The techniques used in the proposed recommendation method are introduced in the following sections.
4.1. Prediction and Recommendation Procedure
In this section, we first present core concepts of fuzzy sets, which are used to complete the prediction and recommendation procedure in the proposed method.
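Definition 1. For a fuzzy number (FN) $\tilde{a}$, the $\alpha$-cut for $\alpha \in [0,1]$ in fuzzy set $\tilde{a}$ ($a_{\alpha}^{U}$ indicates the upper bound and $a_{\alpha}^{L}$ indicates the lower bound of the closed interval) is defined as
\[ \tilde{a}_{\alpha} = \left[ a_{\alpha}^{L},\; a_{\alpha}^{U} \right]. \]
Definition 2. We define the membership function (MF) for a triangular FN $\tilde{a} = (a, b, c)$ through the triplet $(a, b, c)$ as
\[ \mu_{\tilde{a}}(x) = \begin{cases} \dfrac{x - a}{b - a}, & a \le x \le b, \\[4pt] \dfrac{c - x}{c - b}, & b < x \le c, \\[4pt] 0, & \text{otherwise}. \end{cases} \]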
Definition 3. For any , the group of all finite positive FN on , and we have
Definition 4. Suppose and are two FNs. Then if and for any .
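The fuzzy-number concepts used above can be illustrated numerically. The following Python sketch evaluates a triangular membership function and the closed interval obtained by cutting a triangular FN at a given level. The parameter names `a`, `b`, `c` (left foot, peak, right foot) and `alpha` (cut level) are notational assumptions made for this sketch, as are the function names.

```python
def triangular_mf(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c)."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


def alpha_cut(a, b, c, alpha):
    """Closed interval [lower, upper] of points with membership >= alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower = a + alpha * (b - a)
    upper = c - alpha * (c - b)
    return lower, upper


# Hypothetical fuzzy rating "about 2" on a 1-to-3 support.
degree = triangular_mf(1.5, 1.0, 2.0, 3.0)
interval = alpha_cut(1.0, 2.0, 3.0, 0.5)
```

As the cut level rises toward 1, the interval shrinks to the single peak value, which is why comparisons between FNs are usually stated for every cut level.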
Learning the prediction functions for both items and users is taken into account in the present work, following prior works on multicriteria collaborative filtering (MC-CF). A weighted approach is used in every cluster to combine the prediction functions. Following previous studies [69, 70], we utilize equation (4) as a common weighting scheme, in which each of the two prediction functions is assigned its own weight. According to the definitions of fuzzy sets, we extend the above prediction method to fuzzy-based prediction.
Definition 5. As defined by Adomavicius and Kwon, a multicriteria recommender system incorporates preferences on multiple criteria or dimensions, which provide more information about the items and users. For more literature on multicriteria recommender systems, see the previous studies by Adomavicius and Kwon [70, 71].
Sparsity has been a major disadvantage of many collaborative filtering RSs and can significantly impact the accuracy of item recommendations. CF algorithms require a sufficient amount of rating data and generate poor recommendations when few ratings are available. Clustering and fuzzy rule-based methods are used in the present work to deal with this problem. Clustering techniques aim to improve the efficiency of recommendation agents [70, 72]. The fuzzy logic approach has been demonstrated to be effective in solving the sparsity issue in CF-based recommender systems [23, 32, 73]. Moreover, the technique proposed by Adomavicius and Kwon was used in the present work to establish the similarities of users (see equation (6)), after which it was applied as a fuzzy-based similarity computation technique (see equation (7)): the average similarity of two users is obtained with the suggested method according to equation (6), and its fuzzy-set counterpart is defined in equation (7). In the online recommendation phase, the recommender system first identifies the active (target) users along with the active (target) hotels. The tasks of rating prediction and recommendation are carried out in the next stage. The first task predicts the ratings for a list of hotels according to the priorities of a specific active user. In the second task, the system ranks the list of items that the active user has not rated, after which the Top-N recommendations, that is, the first N hotels in the ranked list, are provided.
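The ranking task in the online phase can be sketched as follows. This is a minimal Python illustration, assuming that predicted ratings are already available as a mapping from hotel identifier to score; the function name `top_n_recommendations`, the hotel identifiers, and the tie-breaking rule (alphabetical order) are assumptions made for the example.

```python
def top_n_recommendations(predicted, already_rated, n):
    """Return the n unrated hotels with the highest predicted rating.

    `predicted` maps hotel id -> predicted rating for the active user.
    `already_rated` is the set of hotels the active user has rated.
    Ties are broken alphabetically so the result is deterministic.
    """
    candidates = [
        (hotel, score)
        for hotel, score in predicted.items()
        if hotel not in already_rated
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [hotel for hotel, _ in candidates[:n]]


# Hypothetical predicted ratings for one active user.
predicted_ratings = {"h1": 4.5, "h2": 3.0, "h3": 4.9, "h4": 4.5}
top_two = top_n_recommendations(predicted_ratings, already_rated={"h3"}, n=2)
```

Here the hotel with the highest score is skipped because the user has already rated it, and the two next-best hotels are returned.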
4.2. Data Segmentation
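As a probabilistic method, the Gaussian Mixture Model (GMM) uses a parametric Gaussian distribution to describe each cluster of the data. Accordingly, a linear superposition of $K$ Gaussian components is used to model $p(x)$, which is the distribution of the entire dataset (see equation (8)):
\[ p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \tag{8} \]
where $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. In the GMM, the $K$ component distributions together establish $p(x)$, and maximum likelihood is utilized to estimate the unknown parameters by maximizing the log-likelihood function over the $N$ available training samples $X = \{x_1, \dots, x_N\}$, as shown in the following equation:
\[ \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k). \]
It has been shown that the EM algorithm is effective for clustering when the available dataset is incomplete. The GMM parameters are computed by the EM algorithm as follows:
(1) Initialization: The initial estimates $\pi_k$, $\mu_k$, $\Sigma_k$, $k = 1, \dots, K$, are selected and then the preliminary log-likelihood is computed as follows:
\[ L^{\mathrm{old}} = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k). \]
(2) Expectation Step (E-Step): In this step, we compute the responsibilities
\[ \gamma_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}. \]
(3) Maximization Step (M-Step): In this step, the new estimates are computed as
\[ N_k = \sum_{n=1}^{N} \gamma_{nk}, \quad \mu_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, x_n, \quad \Sigma_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (x_n - \mu_k^{\mathrm{new}})(x_n - \mu_k^{\mathrm{new}})^{T}, \quad \pi_k^{\mathrm{new}} = \frac{N_k}{N}. \]
(4) Convergence Check: In this step, the new log-likelihood is computed as
\[ L^{\mathrm{new}} = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k^{\mathrm{new}} \, \mathcal{N}(x_n \mid \mu_k^{\mathrm{new}}, \Sigma_k^{\mathrm{new}}). \]
(5) Return to the second step if $|L^{\mathrm{new}} - L^{\mathrm{old}}| > \varepsilon$ for a predetermined threshold $\varepsilon$; else end the algorithm.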
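The five EM steps listed above can be sketched in a few lines of Python. The sketch below is a simplified illustration for one-dimensional data with scalar variances, not the multivariate implementation used in the study; the function names, the variance floor of `1e-6`, and the toy data are assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_gmm_1d(data, weights, means, variances, tol=1e-6, max_iter=200):
    """Fit a 1-D Gaussian mixture with EM, starting from the given estimates."""
    n, k = len(data), len(means)
    old_ll = log_likelihood(data, weights, means, variances)   # step (1)
    for _ in range(max_iter):
        # Step (2), E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # Step (3), M-step: re-estimate means, variances, and mixing weights.
        nk = [sum(r[j] for r in resp) for j in range(k)]
        means = [sum(r[j] * x for r, x in zip(resp, data)) / nk[j] for j in range(k)]
        variances = [
            max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nk[j], 1e-6)
            for j in range(k)
        ]
        weights = [nk[j] / n for j in range(k)]
        # Steps (4) and (5): stop once the log-likelihood no longer changes.
        new_ll = log_likelihood(data, weights, means, variances)
        if abs(new_ll - old_ll) <= tol:
            break
        old_ll = new_ll
    return weights, means, variances, new_ll


# Hypothetical toy data with two well-separated groups.
toy_data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
fit_weights, fit_means, fit_vars, fit_ll = em_gmm_1d(
    toy_data, weights=[0.5, 0.5], means=[0.0, 6.0], variances=[1.0, 1.0]
)
```

A production implementation would typically work with log-densities to avoid numerical underflow and would guard against components that receive no responsibility; both are omitted here for brevity.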
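HOSVD is an extension of the classical Singular Value Decomposition (SVD) to tensors and is based on the multilinear rank. In this study, HOSVD aims to discover the latent relationships among the entities in a 3-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$. The tensor stores the data in 3 dimensions: hotels, hotels’ criteria, and users (see Figure 3). The decomposition of the tensor is performed by unfolding the tensor into the 2D matrices $A_{(1)}$, $A_{(2)}$, and $A_{(3)}$ and applying SVD to each of them, as
\[ A_{(n)} = U^{(n)} \, \Sigma^{(n)} \, V^{(n)T}, \quad n = 1, 2, 3. \]
The core tensor $\mathcal{S}$ is obtained by the following equation, using the left singular vectors $U^{(1)}$, $U^{(2)}$, and $U^{(3)}$ of $A_{(1)}$, $A_{(2)}$, and $A_{(3)}$:
\[ \mathcal{S} = \mathcal{A} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 U^{(3)T}. \]
Accordingly, we can obtain the approximation of the tensor as
\[ \hat{\mathcal{A}} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}. \]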
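The unfolding, core-tensor, and approximation steps can be sketched with NumPy as follows. This is a minimal truncated-HOSVD illustration, not the MATLAB implementation used in the study; the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct`, the tensor shape, and the chosen ranks are assumptions made for the example.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize the tensor so that the chosen mode indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply the tensor by a matrix along the given mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Return the core tensor and one truncated factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the core and the factors."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx


# Hypothetical tensor: 4 hotels x 3 criteria x 5 users, random ratings in [1, 5].
rng = np.random.default_rng(0)
ratings_tensor = rng.uniform(1.0, 5.0, size=(4, 3, 5))

full_core, full_factors = hosvd(ratings_tensor, ranks=(4, 3, 5))
small_core, small_factors = hosvd(ratings_tensor, ranks=(2, 2, 2))
dense_approx = reconstruct(small_core, small_factors)
```

Keeping all singular vectors reproduces the tensor exactly, whereas truncating the ranks yields a dense low-rank approximation whose entries can serve as estimates for cells that were originally empty.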
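To find similar users in each cluster and perform the neighborhood formation, we used the cosine similarity formula for two vectors $a$ and $b$, as provided in the following equation:
\[ \mathrm{sim}(a, b) = \cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}, \]
where $\lVert a \rVert$ and $\lVert b \rVert$, respectively, represent the Euclidean norms of vectors $a$ and $b$.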
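A minimal Python sketch of the cosine similarity used for neighborhood formation is given below. The function name and the convention of returning 0.0 for an all-zero vector are assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical latent user vectors after dimensionality reduction.
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the measure depends only on direction, two users whose reduced profiles differ by a constant factor are treated as maximally similar.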
The ANFIS model was introduced by integrating an Artificial Neural Network (ANN) and a Fuzzy Inference System (FIS). ANFIS overcomes shortcomings of ANN and FIS, such as overfitting and sensitivity to the definition of membership functions, and thereby gives better outcomes in prediction problems. The most common FIS type used for ANFIS is the Sugeno type, for which a robust learning algorithm is utilized to choose the parameters of the model. ANFIS is structured in five layers (see Figure 4), each of which contains several nodes, as in an ANN. ANFIS involves several steps: input data fuzzification, construction of the fuzzy database, construction of the fuzzy rule base, decision-making, and data defuzzification.
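To make the five-layer structure concrete, the following Python sketch performs a single forward pass through a first-order Sugeno-type system of the kind that ANFIS tunes. It does not include any training; the triangular MF parameters, the use of the product as the rule operator, the grid of all MF combinations as the rule base, and the consequent coefficients are all assumptions made for the illustration.

```python
from itertools import product
from math import prod


def triangular_mf(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def sugeno_forward(inputs, mf_params, consequents):
    """Forward pass of a first-order Sugeno system with a grid rule base.

    `mf_params[i]` lists the (a, b, c) triplets for input i.
    `consequents[r]` holds [bias, coef_1, ..., coef_n] for rule r, where
    rules are ordered as itertools.product over the MFs of each input.
    """
    # Layer 1: fuzzification of every input.
    memberships = [
        [triangular_mf(x, *params) for params in input_mfs]
        for x, input_mfs in zip(inputs, mf_params)
    ]
    # Layer 2: firing strength of each rule (product of memberships).
    firing = [prod(combo) for combo in product(*memberships)]
    if len(firing) != len(consequents):
        raise ValueError("one consequent is required per rule")
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    # Layer 3: normalization of the firing strengths.
    normalized = [w / total for w in firing]
    # Layer 4: linear consequent of each rule.
    rule_outputs = [
        c[0] + sum(coef * x for coef, x in zip(c[1:], inputs))
        for c in consequents
    ]
    # Layer 5: weighted sum gives the crisp output.
    return sum(w * f for w, f in zip(normalized, rule_outputs))


# Hypothetical setup: two criteria ratings on a 1-to-5 scale, three MFs each.
three_mfs = [(-1.0, 1.0, 3.0), (1.0, 3.0, 5.0), (3.0, 5.0, 7.0)]
averaging_rules = [[0.0, 0.5, 0.5]] * 9   # every rule averages the two inputs
overall = sugeno_forward([2.0, 4.0], [three_mfs, three_mfs], averaging_rules)
```

With two inputs and three MFs per input the grid rule base already contains nine rules, which illustrates how quickly the rule count grows as inputs are added.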
5. Data Collection and Analysis
The data for this research were collected from TripAdvisor. Numerical ratings given by travelers to hotels in Malaysia were considered. The frequency of ratings in different years is presented in Table 2. The numerical ratings reflect travelers’ assessments of various hotel characteristics. TripAdvisor allows travelers to rate each hotel on six main criteria: “Service,” “Cleanliness,” “Value,” “Location,” “Rooms,” and “Sleep Quality.”
Correlation among the criteria is presented in Table 3. Travelers can also indicate their degree of satisfaction with the quality of the service based on the six main criteria. The ratings were gathered from ten hotels in Malaysia on the TripAdvisor website from 2015 through 2021. A predesigned web-based crawler was used to gather the data, with which 28173 ratings were crawled. The gathered data was cleaned by removing ratings that contain only an overall rating, which left 22506 ratings. Users’ rating information for overall and criteria ratings is presented in Table 4.
6. Result and Discussion
In this study, EM was chosen as the clustering approach for data segmentation. Choosing the appropriate number of segments is a vital issue in any clustering technique and must be done carefully to obtain segments of the best quality. Because the EM method maximizes the likelihood, a model selection criterion that balances likelihood against model complexity, such as the Akaike Information Criterion (AIC), is needed to compare different numbers of segments. Hence, we applied AIC to determine the optimal number of segments in the Expectation-Maximization algorithm. Besides, we used the 10-fold cross-validation approach in the Expectation-Maximization clustering process to get an unbiased outcome. The results of the segmentation by EM for TripAdvisor are presented in Table 5, with the best AIC value (478069.6732) obtained for 8 segments. One-way ANOVA results for clusters versus input attributes are also presented in Table 5. The results of the test indicated that the differences in the means among the segments are statistically significant. The numbers of ratings in segment 1 to segment 8 are 2200, 2960, 3213, 2827, 2963, 1848, 3205, and 3290, respectively. The clustering quality criterion and the eight cluster centroids are presented in Table 6. For the majority of the criteria, the lowest centroid values belong to segment 6 and the highest values belong to segment 1. In Table 7, clustering quality criteria based on overall ratings are presented.
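The AIC-based choice of the number of segments can be sketched as follows. AIC is computed as twice the number of free parameters minus twice the maximized log-likelihood, and the model with the lowest value is preferred. The parameter count below assumes a Gaussian mixture with full covariance matrices, which is an assumption made for this sketch, and the log-likelihood values are hypothetical numbers rather than results from the study.

```python
def gmm_param_count(n_components, n_features):
    """Free parameters of a GMM with full covariance matrices."""
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    mixing_weights = n_components - 1   # weights sum to one
    return means + covariances + mixing_weights


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(log_likelihoods, n_features):
    """Pick the number of components with the lowest AIC.

    `log_likelihoods` maps a candidate number of components to the
    maximized log-likelihood reached by EM for that candidate.
    """
    scores = {
        k: aic(ll, gmm_param_count(k, n_features))
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical log-likelihoods for 1, 2, and 3 components on 1-D data.
candidate_ll = {1: -120.0, 2: -100.0, 3: -99.5}
best_k, aic_scores = best_by_aic(candidate_ll, n_features=1)
```

In this toy case the third component raises the likelihood only marginally, so the penalty for its extra parameters outweighs the gain and two components are selected.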
MATLAB R2020 software was utilized in this research. We performed HOSVD in each cluster to address the sparsity issue. Using the HOSVD technique, the neighborhood was constructed to find similar users in each cluster for unknown ratings. Then we applied ANFIS in each cluster to construct the prediction models. To train the models in the MATLAB Fuzzy Logic Toolbox, there are two widely used learning approaches: backpropagation and hybrid. As the number of inputs to the ANFIS model increases, the number of MF combinations (fuzzy rules) grows exponentially, which makes developing the models computationally expensive. Each input has three MFs assigned to it, and “linear” was chosen as the “MF Type” for the output. The ANFIS structure becomes more complex as the number of input MFs increases. Hence, converging to the target error needs more iterations, and the training process requires more time. The outcomes of the defuzzification stage are then utilized for the overall rating predictions. Following the application of Principal Component Analysis (PCA) on the clusters, ANFIS models were deployed to determine the relative significance of the criteria and to estimate the overall ratings according to the performance criteria Rooms, Value, Location, Service, Cleanliness, and Sleep Quality. We used both the hybrid and the backpropagation learning approaches in the ANFIS technique. In total, eight ANFIS prediction models were created from the input and output data of the segments. Principal Components (PCs) were chosen as the inputs of the ANFIS models in the fuzzification stage. Hence, suitable MFs were determined for all PCs. We selected triangular MFs and created the ANFIS models based on this selection. This type of MF was shown to be effective in developing recommendation systems [32, 76]. Other types of MFs, such as trapezoidal and Gaussian MFs, can also be used in developing the prediction models.
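Different models were constructed using triangular MFs (see Figure 5). This MF is described by the following equation:
\[ \mu(x; a, b, c) = \max\!\left( \min\!\left( \frac{x - a}{b - a},\; \frac{c - x}{c - b} \right),\; 0 \right), \]
where $a$ and $c$ are the feet of the triangle and $b$ is its peak.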
7. Method Evaluation and Comparisons
We deployed ANFIS to build item prediction models from TripAdvisor ratings. The deployed method has several advantages over previous methods. First, we used ANFIS to learn the prediction models from the training datasets. We chose this technique because ANFIS can predict overall ratings with high accuracy and is flexible in inducing fuzzy rules from the ratings in each segment. Second, ANFIS has a flexible architecture, which makes it appropriate for representing given input-output pairs through a group of induced fuzzy IF-THEN rules with suitable and varied MFs.
In this study, R2 was used as the goodness-of-fit measure, which is computed by equation (20). R2 is the proportion of variance in the observed data that is explained by the model. R2 values fall in the interval 0–1, and higher values indicate a greater capability of explaining the variance. Another measure, RMSE, was also utilized to quantify the error on the training, validation, testing, and all datasets. The following formulas were used in the evaluation process:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}, \]
\[ R^2 = \frac{SS_R}{SS_T}, \tag{20} \]
where $\hat{y}_i$ indicates the predicted labels, and $y_i$ indicates the true labels. $SS_R$ is the regression sum of squares (i.e., explained sum of squares), and $SS_T$ is the total sum of squares, which is proportional to the variance of the data.
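The error and fit measures described above can be sketched in Python as follows. The R2 function follows the description in the text (explained sum of squares divided by total sum of squares); the function names and the toy ratings are assumptions made for the example, and MAE is included because it is reported alongside RMSE later in the evaluation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    absolute = [abs(p - t) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r_squared(y_true, y_pred):
    """Explained sum of squares divided by total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_regression = sum((p - mean_true) ** 2 for p in y_pred)
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    return ss_regression / ss_total


# Hypothetical overall ratings and predictions.
true_ratings = [1.0, 2.0, 3.0, 4.0]
predicted_ratings = [2.0, 2.0, 4.0, 4.0]
error_rmse = rmse(true_ratings, predicted_ratings)
error_mae = mae(true_ratings, predicted_ratings)
```

Note that the explained-variance ratio coincides with the more common form, one minus the residual sum of squares over the total sum of squares, only for least-squares fits with an intercept; for other models the two can differ, so it is worth stating which form is reported.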
Figures 6(a) and 6(b) show the training RMSE over the epochs for the various EM segments, for the hybrid and backpropagation methods, respectively. As we used EM for the data segmentation, a total of 8 prediction models were constructed according to the number of segments generated by EM. For the prediction error in the EM clusters, the average RMSE, MAE, and R2 were measured after 200 epochs. The results are presented in Table 8. It is found that the hybrid approach in ANFIS provides better RMSE and R2 values for the training, testing, checking, and all data.
To evaluate the deployed techniques against previously used approaches in MC-CF, we utilized the precision and recall metrics, which are broadly utilized to evaluate the quality of collaborative filtering. The F-measure, which is defined as the harmonic mean of recall and precision, is also broadly utilized to assess the quality of RSs. The recall and precision measures are provided in equation (21) and equation (22). In equation (23), we present the F-measure.
True relevant (TR) represents the number of truly relevant predictions (how many of the items suggested as relevant are truly relevant). False relevant (FR) represents the number of suggested items that were predicted to be relevant but are in fact nonrelevant. Precision is therefore the share of the suggested items that are truly relevant:
\[ \mathrm{Precision} = \frac{TR}{TR + FR}. \tag{22} \]
The F-measure combines both measures as
\[ F_{\beta} = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}, \tag{23} \]
where the parameter $\beta$ indicates the relative influence of both measures; $\beta = 1$ is usually considered.
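The decision-support metrics can be sketched in Python as follows. Precision is computed from the true relevant and false relevant counts described above; recall is computed here as the share of all relevant items that were suggested, which is the standard definition and an assumption of this sketch, as are the function name and the toy hotel identifiers. The recommendation list is assumed to contain no duplicates.

```python
def precision_recall_f(recommended, relevant, beta=1.0):
    """Return (precision, recall, F-measure) for one recommendation list."""
    recommended = list(recommended)
    relevant = set(relevant)
    true_relevant = sum(1 for item in recommended if item in relevant)
    false_relevant = len(recommended) - true_relevant
    suggested = true_relevant + false_relevant
    precision = true_relevant / suggested if suggested else 0.0
    recall = true_relevant / len(relevant) if relevant else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        return precision, recall, 0.0
    f_measure = (1 + beta ** 2) * precision * recall / denominator
    return precision, recall, f_measure


# Hypothetical Top-4 list and set of hotels the user actually liked.
top_4 = ["h1", "h2", "h3", "h4"]
liked = {"h1", "h3", "h5"}
p_at_4, r_at_4, f1_at_4 = precision_recall_f(top_4, liked)
```

With `beta=1.0` the F-measure reduces to the harmonic mean of precision and recall, which is the F1 value reported for the Top-N lists.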
Precision and F1, as decision support accuracy metrics, were utilized to assess the presented approach for various values of Top-N. We considered N = 5 to N = 100, which means that we assessed the approach when the proposed collaborative system suggests the top 5 to the top 100 hotels. Table 9 shows the precision and F1 values for various Top-N. The precision and F1 values obtained from our new approach are high relative to those of HOSVD + ANFIS, Pearson Nearest Neighbor, and SVD. The results show that our approach surpassed the Pearson Nearest Neighbor approach in all Top-N recommendations. For F1, the approach which utilized EM, HOSVD, and ANFIS works better than the Pearson Nearest Neighbor approach and achieved F1 = 0.9112 in the Top-100 recommendations. The advantage of our approach can be attributed to the use of the fuzzy logic approach in the MC-CF part. These outcomes support our hypothesis that the presented approach can produce accurate and scalable outcomes relative to the Pearson Nearest Neighbor approach.
We also assessed the approach at various degrees of sparsity and measured the average MAE. To this end, we generated 10 sets of data with different sparsity degrees from the TripAdvisor data (i.e., 99.5%, 98.5%, 97.5%, 96.5%, 95.5%, 94.5%, 93.5%, 92.5%, 91.5%, and 90.5%). We deployed the approach on these sets and compared the outcomes with other RSs techniques. As Figures 7(a) and 7(b) show, the MAE values of the presented approach are lower than those of the other approaches for all sparsity degrees of the dataset. In addition, the MAE of the Pearson Nearest Neighbor approach grows much faster than that of the other approaches. The outcomes also indicate that combining clustering and dimensionality reduction with the neurofuzzy method gives better prediction accuracy than the other approaches on the sparser datasets. Thus, the approach that used the fuzzy logic approach with the aid of EM, HOSVD, and ANFIS is more efficient in addressing the sparsity problem and presenting more accurate results.
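One way to build datasets at fixed sparsity degrees is sketched below. It assumes that sparsity is the share of empty cells in the user-item matrix and that sparser sets are obtained by keeping a random subset of ratings; the text does not state how the 10 sets were generated, so both points are assumptions, and the toy matrix is hypothetical.

```python
import random


def sparsity(ratings, n_users, n_items):
    """Share of user-item cells that have no rating."""
    return 1.0 - len(ratings) / (n_users * n_items)


def sparsify(ratings, n_users, n_items, target, seed=0):
    """Keep a random subset of ratings so that sparsity is close to target.

    ratings: dict mapping (user, item) to a rating value.
    """
    keep = round((1.0 - target) * n_users * n_items)
    if keep > len(ratings):
        raise ValueError("dataset is already sparser than the target")
    rng = random.Random(seed)  # fixed seed keeps the subsets reproducible
    kept_keys = rng.sample(sorted(ratings), keep)
    return {key: ratings[key] for key in kept_keys}


# The ten sparsity degrees listed in the text.
DEGREES = [0.995, 0.985, 0.975, 0.965, 0.955,
           0.945, 0.935, 0.925, 0.915, 0.905]

# Hypothetical fully rated toy matrix: 20 users by 10 items.
full = {(u, i): (u + i) % 5 + 1 for u in range(20) for i in range(10)}
subsets = {d: sparsify(full, 20, 10, d) for d in DEGREES}
```

Each subset can then be split into training and test ratings, and the MAE of every method can be recorded per sparsity degree to produce curves like those in Figures 7(a) and 7(b).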
8. Discussion and Research Implications
The current health outbreak caused a steep decline in the travel and tourism market [77, 78], which in turn reduced the rate at which travelers rate tourism products. Both industry and research fields need to gather and analyze the available data to investigate, design, and deploy appropriate recovery policies in the tourism sector. An appropriate recommender system can help tourists overcome their uncertainty about travel plans and destination choice, particularly during the current crisis. However, CF recommendation agents suffer from a serious data sparsity issue that broadly impacts the performance of RSs, particularly in domains where users rate only a small number of items on average. In fact, data sparsity is a major issue in CF recommendation agents [81, 82], because the lack of information makes it difficult for the system to present efficient suggestions to users. This issue can arise when the consumer interacts with only a limited portion of the products in a specific application field. Data sparsity affects the quality of RSs in two major ways. First, it is hard to locate the neighbors of a specific user in sparse data, particularly when historical data is lacking, so a high degree of sparsity restricts the system’s ability to build reliable neighborhoods. Even when a few neighbors exist, this issue can impact the accuracy of recommendations. Second, inadequate ratings from neighbors reduce the coverage of the suggestions and the ability of the deployed algorithm to present a novel and accurate list of recommendations. Unrated items and inactive travelers (in the context of TRSs) cannot be involved in the CF procedure.
ML techniques have been used in several RSs studies [67, 84]. ML techniques simulate people’s learning process, enable the identification and acquisition of new information, and accordingly enhance the effectiveness of particular functions based on the resulting information. ML has been used to address the sparsity problem in several studies through various techniques such as sparse Bayesian extreme learning machine, spectral coclustering, and KNN.
A considerable amount of research has therefore been performed in both the business and academic communities to advance efficient prediction models for multicriteria RSs [32, 70, 88, 89]. In the context of the current COVID-19 crisis, to the best of our knowledge, no previous study has investigated the sparsity issue in travel recommendation agents. As the COVID-19 pandemic continues, the question of how to manage its continuing influence on the tourism and hospitality sectors has received much attention from researchers and business managers. The tourism and hospitality sectors have been broadly influenced by traveler-generated media in the form of online reviews and ratings. Travelers often seek help on where to travel and what activities to do, and they usually look for suggestions presented by other travelers. Still, human capabilities are limited, which can make it difficult to find suitable suggestions that match their needs and wishes. Hence, during this critical situation, RSs can be used effectively as intelligent tools and supporting systems that help users reach the right choice. The deployed RSs should be flexible enough to address the continuing obstacles of the current crisis and meet the travelers’ needs. In this work, a new approach was presented to enhance the prediction outcomes of RSs, particularly to address the data sparsity problem during the current global crisis.
The research has several theoretical, practical, and methodological contributions. From a theoretical standpoint, this research presents one of the first attempts to overcome the sparsity problem during the emerging and novel crisis of COVID-19, which supports the current research in RSs literature. To meet the aim of this research, we presented a new recommendation method and evaluated it on the TripAdvisor dataset. From a practical point of view, the research offers insights that can be applied in the design of travel RSs to overcome the sparsity problem, particularly the lack of data and the uncertainty in the current pandemic. Enhancing the accuracy of RSs is one of the basic aims of deploying new recommendation techniques, and inadequate data is a critical obstacle when working with decision-making systems. Hence, from a methodological point of view, it is important to present new methods to overcome the sparsity problem in RSs. As indicated by the research outcomes, the deployed techniques (dimensionality reduction, clustering, and neurofuzzy) showed robust performance in addressing the sparsity problem. The deployed method can be used effectively to address inefficient recommendations when only a small number of ratings is available.
In this study, we attempted to solve the sparsity problem and to improve the performance of RSs in the tourism context. The aim was to provide a method that solves the sparsity issue in tourism datasets and can effectively help travel recommendation agents in the current outbreak. The approach presented in this research uses dimensionality reduction and clustering techniques with the aid of supervised machine learning. The method was assessed on a dataset from TripAdvisor using precision, F1, and MAE measures, and the results were compared with previous algorithms. The precision, F1, and MAE outcomes indicated that the use of clustering, dimensionality reduction, and neurofuzzy approaches was efficient in enhancing the performance of the MC-CF. The outcomes also indicated that the hybrid recommendation approach can be used to address the sparsity problems of RSs in the tourism context. In this research, EM was used for clustering, and nonincremental ANFIS and HOSVD were used for prediction and for reducing the dimensionality of the data in tensors, respectively. Incremental ANFIS and HOSVD can help the RS present more scalable recommendations than the nonincremental HOSVD and SVD. In future research, we aim to investigate the integration of other clustering approaches and ensemble learning approaches into the proposed method. In addition, the proposed approach was assessed only on the hotel domain; future research can consider other domains such as e-commerce RSs. We also aim to further enhance the suggested approach and assess it with additional measures such as diversity and novelty.
Data are available on request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This research was made possible by a generous fund from the deanship of scientific research at Taif University, Taif, Saudi Arabia, under Taif University Researchers Supporting Project, Project no. TURSP-2020/344. This research was supported by Princess Nourah Bint Abdulrahman University Researchers Supporting Project no. PNURSP2022R4, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia.
W. Husain and L. Y. Dih, “A framework of a personalized location-based traveler recommendation system in mobile application,” International Journal of Multimedia and Ubiquitous Engineering, vol. 7, no. 3, pp. 11–18, 2012.
A. Levi, O. Mokryn, C. Diot, and N. Taft, “Finding a needle in a haystack of reviews: cold start context-based hotel recommender system,” in Proceedings of the sixth ACM conference on Recommender systems, pp. 115–122, Association for Computing Machinery, New York, NY, USA, September 2012.
A. Hoque, F. A. Shikha, M. W. Hasanat, I. Arif, and A. B. A. Hamid, “The effect of Coronavirus (COVID-19) in the tourism industry in China,” Asian Journal of Multidisciplinary Studies, vol. 3, no. 1, pp. 52–58, 2020.
L. A. Zadeh, Fuzzy Logic, Neural Networks, and Soft Computing, Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers by Lotfi A Zadeh, World Scientific, New York, NY, USA, pp. 775–782, 1996.
R. Colomo-Palacios, I. González-Carrasco, J. L. López-Cuadrado, and A. García-Crespo, “ReSySTER: a hybrid recommender system for Scrum team roles based on fuzzy and rough sets,” International Journal of Applied Mathematics and Computer Science, vol. 22, pp. 801–816, 2012.
A. Jain and C. Gupta, Fuzzy Logic in Recommender Systems, Fuzzy Logic Augmentation of Neural and Optimization Algorithms: Theoretical Aspects and Real Applications, Springer, New York, NY, USA, pp. 255–273, 2018.
G. Castellano, M. G. Cimino, A. M. Fanelli, B. Lazzerini, F. Marcelloni, and M. A. Torsello, “A collaborative situation-aware scheme based on an emergent paradigm for mobile resource recommenders,” Journal of Ambient Intelligence and Humanized Computing, vol. 4, no. 4, pp. 421–437, 2013.
S. R. d M. Queiroz, F. d A. de Carvalho, G. L. Ramalho, and V. Corruble, Making Recommendations for Groups Using Collaborative Filtering and Fuzzy Majority, Brazilian Symposium on Artificial Intelligence, Springer, New York, NY, USA, pp. 248–258, 2002.
L. Martínez, M. J. Barranco, L. G. Pérez, and M. Espinilla, “A knowledge based recommender system with multigranular linguistic information,” International Journal of Computational Intelligence Systems, vol. 1, no. 3, pp. 225–236, 2008.
G. Guo, J. Zhang, and N. Yorke-Smith, “Trustsvd: collaborative filtering with both the explicit and implicit influence of user trust and of item ratings,” in Proceedings of the AAAI Conference on Artificial Intelligence, Association for Computing Machinery, New York, NY, USA, January 2015.
J. Tang, S. Wang, X. Hu et al., “Recommendation with social dimensions,” in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, California, CA, USA, February 2016.
A. Krohn-Grimberghe, L. Drumond, C. Freudenthaler, and L. Schmidt-Thieme, “Multi-relational matrix factorization using bayesian personalized ranking for social network data,” in Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, pp. 173–182, Association for Computing Machinery, New York, NY, USA, February 2012.
T. Zhao, J. McAuley, and I. King, “Leveraging social connections to improve personalized ranking for collaborative filtering,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 261–270, Association for Computing Machinery, New York, NY, USA, November 2014.
J. McAuley and J. Leskovec, “Hidden factors and hidden topics: understanding rating dimensions with review text,” in Proceedings of the 7th ACM Conference on Recommender Systems, pp. 165–172, Association for Computing Machinery, New York, NY, USA, October 2013.
X. Zhu, Z. Y. Ming, Y. Hao, and X. Zhu, “Tackling data sparseness in recommendation using social media based topic hierarchy modeling,” in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, AAAI Press, California, CA, USA, July 2015.
R. He and J. McAuley, “VBPR: visual bayesian personalized ranking from implicit feedback,” in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, California, CA, USA, February 2016.
F. Vasile, E. Smirnova, and A. Conneau, “Meta-prod2vec: product embeddings using side-information for recommendation,” in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 225–232, Association for Computing Machinery, New York, NY, USA, September 2016.
H. Qiu, G. Guo, J. Zhang, Z. Sun, H. T. Nguyen, and Y. Liu, “Tbpr: trinity preference based bayesian personalized ranking for multivariate implicit feedback,” in Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, pp. 305-306, Association for Computing Machinery, New York, NY, USA, July 2016.
B. Loni, R. Pagano, M. Larson, and A. Hanjalic, “Bayesian personalized ranking with multi-channel user feedback,” in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 361–364, Association for Computing Machinery, New York, NY, USA, September 2016.
N. Leiper, Tourism Systems: An Interdisciplinary Perspective, Department of Management Systems, Business Studies Faculty, New Zealand, Oceania, 1990.
S. L. Smith, Tourism Analysis: A Handbook, Routledge, England, UK, 2014.
U. Gretzel and K. H. Yoo, “Use and Impact of Online Travel Reviews, Information and Communication Technologies in Tourism 2008,” in Proceedings of the International Conference in Innsbruck, pp. 35–46, DBLP, Germany, January 2008.
D. Buhalis and A. Amaranggana, Smart Tourism Destinations Enhancing Tourism Experience through Personalisation of Services, Information and Communication Technologies in Tourism, Springer, New York, NY, USA, pp. 377–389, 2015.
M. Nilashi, O. bin Ibrahim, N. Ithnin, and N. H. Sarmin, “A multi-criteria collaborative filtering recommender system for the tourism domain using Expectation Maximization (EM) and PCA–ANFIS,” Electronic Commerce Research and Applications, vol. 14, no. 6, pp. 542–562, 2015.
X. Xu, K. Dutta, and C. Ge, “Do adjective features from user reviews address sparsity and transparency in recommender systems?” Electronic Commerce Research and Applications, vol. 29, pp. 113–123, 2018.
M. Grčar, D. Mladenič, B. Fortuna, and M. Grobelnik, “Data sparsity issues in the collaborative filtering framework,” in Proceedings of the International Workshop on Knowledge Discovery on the Web, pp. 58–76, Springer, New York, NY, USA, October 2005.
Q. Shambour, “A user-based multi-criteria recommendation approach for personalized recommendations,” International Journal of Computer Science and Information Security, vol. 14, no. 12, p. 657, 2016.