Abstract

With the rapid development of mobile services, Quality of Service (QoS) has become an essential factor in meeting end users’ personalized requirements on the nonfunctional performance of mobile services. However, most QoS values in real cases are unattainable because a service user invokes only a few specific mobile services. Therefore, how to predict the missing QoS values and recommend high-quality services to end users has become a significant challenge in mobile service recommendation research. Previous QoS prediction research demonstrates that the nonfunctional performance of mobile services is closely related to users’ location information. However, most location-aware QoS prediction methods ignore the fact that the QoS values observed by different users in the same location region may be untrustworthy, which leads to inaccurate and unreliable prediction results. To make credible location-aware QoS prediction, we propose a hybrid matrix factorization method that integrates location and reputation information (LRMF) to predict the unattainable QoS values. Our approach first clusters users into different locational regions based on their geographical distribution and then computes users’ reputation to identify untrustworthy users in every locational region. Finally, the unknown QoS values are predicted by integrating locational cluster information and users’ reputation into a hybrid matrix factorization model. Comprehensive experiments are conducted on a public QoS dataset that contains real-world service invocation records. The evaluation results indicate that our LRMF method can effectively reduce the impact of unreliable users on QoS prediction and make credible mobile service recommendations.

1. Introduction

Owing to the flexibility and extensibility of mobile application development techniques, tens of thousands of hybrid mobile services with similar functions have been developed and published in mobile application stores. However, this abundance leads to an information overload problem in mobile service retrieval systems. To tackle this challenge, Quality of Service (QoS) is used in service-oriented systems to analyse the nonfunctional performance of mobile services [1–4]. QoS has been widely used in service selection, composition, and recommendation research [5–9]. In real-world service invocation scenarios, users search for and select only a few specific mobile services under the unpredictable Internet environment. For the many unknown mobile services, it is impractical to make users invoke each of them and evaluate their nonfunctional performance. Therefore, accurate QoS prediction for unknown mobile services is a critical step towards high-quality service recommendation in the mobile service computing paradigm.

Collaborative filtering (CF) is widely utilized in e-commerce recommender systems to predict missing rating values. Traditional CF models generally fall into two categories: memory-based and model-based. In QoS prediction, memory-based CF generally finds a subset of similar users for the target user and recommends high-quality mobile services shared by these similar users to the target user [10]. Model-based methods train a model on users’ historical QoS records and then predict the QoS values for unknown mobile services [11, 12]. Although CF models have proved effective in QoS prediction for different mobile services, the prediction accuracy is still unsatisfactory because of the cold-start problem.

To reduce the impact of the cold-start problem, more contextual information is introduced into QoS prediction models because QoS values are greatly affected by context factors (e.g., location distribution and invocation time) on the Internet. Based on this observation, the location-aware CF model was proposed to predict unknown QoS values in service recommendation [13]. Users in one location region generally share the same IT infrastructure and therefore have similar Internet usage experience when they invoke mobile services; as reported in [14], QoS performance is strongly correlated with the location information of users. Figure 1 gives an example of service invocation with location information. User 1 would have QoS records (such as response time) similar to those of other users in the US, while user 2 may share similar QoS records with other users in India when they invoke services such as YouTube and Twitter.

Previous location-aware CF approaches usually compute the similarity of all users in one specific location region to find the Top-K most similar neighbours for the target user [12, 15]. However, some unreliable users who submit untrustworthy QoS values will be indiscriminately included in the neighbourhood set. Unreliable users may provide random QoS values, or submit better values to improve the visibility of their own services and worse values for others’ applications [16]. Those untrustworthy QoS values have a marked negative effect on prediction accuracy. Therefore, it is essential to account for the credibility of available QoS values in the prediction process to enhance the accuracy and persuasiveness of the service recommendation mechanism.

Based on the above observations, this paper proposes a hybrid matrix factorization algorithm that integrates users’ reputation and locational information to predict the unattainable QoS values. Complementary to previous service recommendation methods, which only adopt available QoS values, our study aims to make credible and accurate QoS predictions for mobile service recommendation by considering the reputation of different users’ QoS usage experience. We exploit personal geographical distance and QoS values to find locationally similar neighbourhoods and discover the latent connectivity between the target user and his/her neighbours. Meanwhile, we use users’ reputation to control the weight of users’ latent feature learning. Finally, these constraints are integrated into a matrix factorization model to make credible personalized QoS predictions. The following contributions are achieved in this paper:
(1) We first cluster users into different locational regions based on their geographical distribution and design an iterative method to compute users’ reputation scores from the QoS usage data they provide. A subset of trustworthy users in each locational region can then be identified by the rank of their reputation scores.
(2) In the next step, a trustworthy neighbourhood is identified by incorporating both users’ location distribution and reputation scores. By integrating the latent features of both the available QoS values and those shared by the neighbourhood, a hybrid matrix factorization model is proposed to make high-quality QoS predictions.
(3) Results of experiments conducted on a public QoS dataset show that, by considering data credibility, our method achieves higher prediction accuracy than previous studies that do not account for untrustworthy QoS values.

The remainder of this paper is organized as follows: Section 2 introduces related studies; Section 3 presents the basic principles of our method; Section 4 elaborates the design of our proposed method; Section 5 discusses the experiments and analyses the results; finally, Section 6 concludes the paper and outlines future studies.

2. Related Work

2.1. QoS-Based Service Computing

QoS plays a central role in service-oriented architectures, especially in service discovery and recommendation research. Al-Masri and Mahmoud [17] first calculated users’ preferences from their historical QoS data and then proposed a service discovery approach by ranking users’ QoS parameters. Kritikos and Plexousakis [18] extracted the contextual information of QoS from the service description file and designed a service discovery method. Rosario et al. [19] proposed soft probabilistic contracts on QoS parameters to compose web services and validated their method with the TOrQuE tool to show that it outperforms previous studies. Hadad et al. [20] proposed a web service composition framework that exploits both transactional properties and QoS values. This framework composes existing web services into a workflow that satisfies users’ preferences on nonfunctional requirements. However, these methods were evaluated only on synthetic datasets and therefore lack validation on real-world service invocations.

2.2. Collaborative Filtering

Collaborative filtering is a common algorithm adopted by many recommendation systems, such as the well-known commercial system of Amazon [21]. Memory-based collaborative filtering approaches make predictions and recommendations by calculating the similarity of users or items [21, 22]. These methods use the entire user-item matrix as input data, which takes a lot of time and memory in an online recommendation system. Model-based collaborative filtering methods generally train a predefined model on available data, predict the missing values in the test dataset, and then select appropriate items as a candidate list for the target user [23, 24]. Model-based approaches can learn the model quickly with little runtime and memory overhead, so they are often adopted in online recommendation systems.

The matrix factorization model is now widely adopted in many online recommendation systems for its effectiveness and efficiency. Mnih and Salakhutdinov [11] introduced the probabilistic theory of matrix factorization and validated the performance of this method on a well-known film recommendation system. Zhang et al. [25] designed a personalized recommendation approach by integrating the original matrix factorization with a constraint term extracted from users’ personal information. It groups users into different clusters according to the statistics of user behaviours on different tags and uses this constraint as a regularization term in the matrix factorization model to enhance prediction accuracy. Ma et al. [26] improved the matrix factorization approach with users’ social information to enhance the prediction accuracy of social recommendation. This approach uses users’ social relationships as an additional constraint that reflects users’ latent interest in items in the user-item matrix factorization. Recently, matrix factorization methods have been widely introduced into service recommendation research [10, 15]. Although matrix factorization methods improve prediction accuracy, none of them recognizes that QoS credibility deserves serious consideration.

2.3. Location-Based QoS Prediction

Location information has been widely used for service recommendation in recent years. Ali and Solis [27] presented a novel distributed service architecture that can adapt to changes in Internet resources and location topology. Wei et al. [15] first calculated users’ similarity from their real-world distances and then clustered users into geographical sets as a constraint term of matrix factorization to build a location-aware QoS prediction approach. Tang et al. [28] addressed the sparsity issue in QoS-aware service recommendation by integrating collaborative filtering with users’ geographical data. Lee et al. [29] adopted preference propagation among users in the same location region to improve prediction accuracy. This work clusters users and services into different groups by their locational information and then uses preference propagation to compute the similarity between different users and between different services. Finally, a matrix factorization model is introduced to predict missing QoS values by integrating these constraints. Gonsalves and Patil [30] exploited users’ location information and QoS values to cluster users and web services and then proposed a CF algorithm to make personalized web service recommendations. It first uses the Pearson correlation coefficient (PCC) to identify different user and service regions and then exploits K-nearest neighbour (KNN) and support vector machine (SVM) methods in a CF framework to predict missing QoS values. However, the above studies do not consider users’ reputation and neglect the fact that available QoS values may be untrustworthy even when they are provided by users in the same location region.

2.4. Reputation

Following the success of reputation mechanisms in applications (e.g., YouTube and Twitter) in avoiding possible deception risks, some researchers have introduced reputation into QoS prediction to enhance the reliability of service-oriented computing. The reputation values derived from QoS data can measure whether the available QoS values are trustworthy. Qiu et al. [31] designed a QoS prediction method that calculates users’ reputation to obtain higher accuracy for service recommendation. In their work, the reputation of different users is computed and ranked to find the subset of unreliable users. Subsequently, the memory-based collaborative filtering model combined with reputation achieves better QoS prediction. Based on Qiu’s work, Xu et al. [32] presented an improved QoS prediction method with users’ reputation (RMF), which introduces users’ reputation weights into a matrix factorization approach to predict QoS for unknown services and then recommend high-quality services to the target user. Mehdi et al. [33] introduced a stochastic approach to evaluate the reputation of services by leveraging the correlation among different QoS metrics. Comi et al. [34] proposed a hybrid service composition method that exploits users’ QoS reputation to help users discover and select high-quality services in a multicloud environment. However, these studies do not take location information into consideration in QoS prediction.

3. Principles and Reputation Analysis

In this section, we first introduce the main principles of our LRMF method and then present the reputation analysis of different users.

3.1. Principles of LRMF

Previous CF-based QoS prediction approaches (e.g., [7, 35]) only utilize the available QoS values in a collaborative filtering model to make personalized service recommendations. However, these methods ignore the fact that users’ reputation and location have a great impact on the prediction results. Therefore, we design a novel mobile service recommendation system that considers both users’ reputation and location information when predicting missing QoS values. As presented in Figure 2, historical invocations with QoS and location data are submitted to the server database during service invocation. The reputation and location information are then calculated from the collected historical QoS dataset, and finally the missing QoS values are predicted with the reputation and location information. The main workflow of LRMF is as follows:
(1) The multisource invocation records are collected and submitted to the database when users invoke different mobile services.
(2) In data preprocessing, the available historical QoS values and users’ location information are extracted as the input data of our method.
(3) Based on the available QoS values and location data, users’ reputation scores are computed by our iterative method, and the location regions are identified from users’ real-world location distribution. The trustworthiness of users in different location regions can then be evaluated.
(4) A hybrid matrix factorization model integrating location grouping and users’ reputation is used to predict the missing QoS values of unknown mobile services.
(5) Finally, by combining the predicted results and the available QoS data, high-quality services are discovered and recommended to the target user.

3.2. User Reputation Analysis

In order to identify untrustworthy users in a given geographical region, we analyse the QoS values from users’ historical service invocation records. Figure 3 shows the response times observed by users in the same region (the United States in this example) for three randomly selected services from the real-world QoS dataset [36]. Figure 4 shows the response times of 100 randomly selected services invoked by 5 randomly selected users in the same region.

Figure 3 shows that the response time of service invocation varies among users even when they are in the same location group. Although most of the response time values fall into the normal range, i.e., [0, 2], some users still submit outlier QoS records when they invoke services. It is unlikely that a single QoS record would deviate too much from the normal value. For a specific user, if most QoS values deviate significantly from the normal range, he/she is probably an unreliable user. To test whether there are unreliable users, we analyse the QoS data submitted by 5 randomly selected users on 100 services.

It is obvious in Figure 4 that user 4 is an unreliable user because all of his/her submitted values deviate greatly from the normal range [0, 2]. Based on this analysis, we can compute the reputation of a user from his/her past QoS records and decide whether the user should be regarded as unreliable. The algorithm for computing users’ reputation is introduced in detail in Section 4.3.
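The screening idea in this analysis can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `outlier_fraction`, the matrix layout (users in rows, services in columns), and the indicator matrix `I` are hypothetical, and the normal range [0, 2] is taken from the discussion above. It is not the reputation algorithm of Section 4.3.

```python
import numpy as np

def outlier_fraction(R, I, low=0.0, high=2.0):
    """Per-user fraction of observed response times outside [low, high].

    R: response-time matrix (users x services).
    I: indicator matrix, 1.0 where a value was observed, 0.0 otherwise.
    """
    outside = I * ((R < low) | (R > high))   # 1.0 where an observed value is an outlier
    counts = np.maximum(I.sum(axis=1), 1)    # avoid division by zero for idle users
    return outside.sum(axis=1) / counts

# Toy data: the second user reports mostly out-of-range values.
R = np.array([[0.5, 1.0, 1.5],
              [5.0, 6.0, 0.3]])
I = np.ones_like(R)
fractions = outlier_fraction(R, I)
```

A user whose fraction is close to 1, like user 4 in Figure 4, would be a candidate for closer inspection.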

4. Hybrid Matrix Factorization Based on Location and Reputation

We first present the original matrix factorization approach in this section and then propose our improved matrix factorization model.

4.1. Matrix Factorization Prediction

Matrix factorization utilizes low-rank approximation to fit the sparse user-item matrix. It factorizes the original sparse matrix into two low-rank matrices with a small number of factors. This factorization is based on the hypothesis that users’ QoS values are largely determined by a small number of latent factors. An objective function can then be defined as the total error between the original values and the values predicted by the product of the two low-rank matrices.

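We suppose there is a QoS matrix $R \in \mathbb{R}^{m \times n}$ with users in the rows and services in the columns, and two low-rank matrices $U$ and $S$ that represent the user-specific and service-specific features, respectively. The QoS matrix $R$ can be approximated by the product of the low-rank matrices $U$ and $S$ as follows:

$$R \approx U S, \tag{1}$$

where $U \in \mathbb{R}^{m \times l}$ and $S \in \mathbb{R}^{l \times n}$. $l$ is the number of latent factors.

Then, the objective function can be defined by minimizing the sum of squared errors between the available values in the original matrix $R$ and the corresponding predicted values in matrix $US$:

$$\min_{U,S} \mathcal{L} = \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(r_{ij} - U_i S_j\right)^2, \tag{2}$$

where $r_{ij}$ represents the available QoS value provided by user $i$ on service $j$; $U_i$ denotes the $i$th row of $U$; $S_j$ is the $j$th column of $S$. However, the available QoS values are limited in real invocation scenarios, so an improved objective function can be defined to solve this issue:

$$\min_{U,S} \mathcal{L} = \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}\left(r_{ij} - U_i S_j\right)^2, \tag{3}$$

where $I_{ij} = 0$ indicates that the QoS value provided by user $i$ on service $j$ is unknown and $I_{ij} = 1$ otherwise. Two regularization terms are added to Equation (3) to avoid the overfitting problem as follows:

$$\min_{U,S} \mathcal{L} = \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}\left(r_{ij} - U_i S_j\right)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_S}{2}\|S\|_F^2, \tag{4}$$

where $\|\cdot\|_F$ is the Frobenius norm and $\lambda_U$ and $\lambda_S$ are the regularization weights. Equation (4) is the objective function of the matrix factorization approach, which minimizes the squared error between the predicted values and the original values. The gradient descent algorithm is adopted to train our model and reach a local minimum of (4) as follows:

$$U_i \leftarrow U_i - \eta \frac{\partial \mathcal{L}}{\partial U_i}, \tag{5}$$

$$S_j \leftarrow S_j - \eta \frac{\partial \mathcal{L}}{\partial S_j}, \tag{6}$$

where $\eta$ is the learning rate.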
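The regularized factorization and its gradient descent updates can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors’ implementation: the function names `train_mf` and `objective`, the symbols `lam_u`, `lam_s`, and `eta`, the random initialization, and the use of batch (rather than stochastic) gradient descent are all assumptions made for the example.

```python
import numpy as np

def train_mf(R, I, l=10, lam_u=0.1, lam_s=0.1, eta=0.01, epochs=200, seed=0):
    """Fit R ~ U @ S on observed entries (I == 1) by batch gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, l))   # user-specific features, one row per user
    S = 0.1 * rng.standard_normal((l, n))   # service-specific features, one column per service
    for _ in range(epochs):
        E = I * (R - U @ S)                 # errors on observed entries only
        grad_U = -E @ S.T + lam_u * U       # gradient of the regularized objective w.r.t. U
        grad_S = -U.T @ E + lam_s * S       # gradient of the regularized objective w.r.t. S
        U -= eta * grad_U                   # update step for U
        S -= eta * grad_S                   # update step for S
    return U, S

def objective(R, I, U, S, lam_u=0.1, lam_s=0.1):
    """Regularized squared error on the observed entries."""
    E = I * (R - U @ S)
    return 0.5 * np.sum(E ** 2) + 0.5 * lam_u * np.sum(U ** 2) + 0.5 * lam_s * np.sum(S ** 2)
```

After training, `U @ S` gives predictions for every user-service pair, including the entries where `I` is 0.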

4.2. Locational Grouping

In this part, we group users by their real-world location to acquire a subset of users who are geographically similar to the target user. Since location information significantly affects QoS, it should be considered a significant factor in QoS prediction [13]. We calculate users’ physical distance to generate the location region. The Euclidean distance between users $i$ and $k$ can be computed by the following definition:

$$D(i,k) = c\,\sqrt{(lon_i - lon_k)^2 + (lat_i - lat_k)^2}, \tag{7}$$

where $lon_i$ and $lat_i$ represent the longitude and latitude of user $i$ in the real world, respectively. $c$ is a constant that converts degrees into meters on a 2D plane. In our study, $c$ takes the value of 111,261.

The geographical region can then be generated by selecting the set of users whose distance to the target user, calculated by Equation (7), is small. On the one hand, this set cannot be too small; otherwise, too many similar users would be filtered out. On the other hand, it cannot be too large; otherwise, different locations would not be correctly distinguished. For a target user $i$, the region can be defined as follows:

$$N(i) = \{\, k \mid D(i,k) \le \theta,\ k \ne i \,\}, \tag{8}$$

where $\theta$ is a positive, adjustable locational threshold that determines the region size, and $N(i)$ denotes the subset of users who are in the same geographical region as user $i$. Here, we use the real-world distance (i.e., distance based on longitude and latitude) rather than country-level grouping (i.e., differentiating users by their countries).
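The locational grouping can be sketched as follows. The example assumes that Equation (7) is the planar Euclidean distance over longitude and latitude scaled by the constant 111,261, and that Equation (8) keeps every other user within the threshold; the function names `pairwise_distance` and `region` and the toy coordinates are hypothetical.

```python
import numpy as np

C_METERS_PER_DEGREE = 111_261  # degree-to-meter constant stated in the text

def pairwise_distance(lon, lat, c=C_METERS_PER_DEGREE):
    """Planar Euclidean distance between all user pairs, in meters."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    dlon = lon[:, None] - lon[None, :]
    dlat = lat[:, None] - lat[None, :]
    return c * np.sqrt(dlon ** 2 + dlat ** 2)

def region(D, i, theta):
    """Indices of users within distance theta of user i, excluding i itself."""
    return [k for k in range(D.shape[0]) if k != i and D[i, k] <= theta]

# Three users on the same latitude: two close together, one far away.
D = pairwise_distance(lon=[0.0, 0.1, 10.0], lat=[0.0, 0.0, 0.0])
neighbours_of_0 = region(D, 0, theta=20_000.0)   # 20 km threshold
```

Because the distance is planar, it ignores the curvature of the Earth; a great-circle distance would be a natural substitute if users are spread over large areas.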

4.3. Reputation Algorithm

We first define $c_i$ as the reputation score of user $i$. If user $i$ gets a higher reputation score, he/she can be considered a more reliable user. We then propose an iterative method, given in Equation (9), to compute users’ reputation scores, where $k$ is the iteration index, $d$ is the damping factor in [0, 1], $c_i^k$ is the reputation of user $i$ at iteration $k$, $\mathcal{S}_i$ is the set of services invoked by user $i$, $\bar{r}_j$ represents the average value of service $j$, and $\mathcal{U}_j$ denotes the users who invoked service $j$. Each user is initially assumed to be trustworthy, so the reputation is assigned an initial value of 1 in the first step of the iteration.

As the definition shows, users’ reputation is calculated iteratively. The reputation score of each user is computed from the difference between the available QoS values in the original matrix and the average QoS values of all services invoked by that user.
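A minimal Python sketch of such an iterative reputation computation is given below. The update rule is an assumed form built from the quantities named above (a reputation-weighted average per service, then one minus the damped mean absolute deviation of each user from those averages); it should not be read as the exact Equation (9). The function name `reputation`, the stopping tolerance, and the requirement that QoS values are pre-scaled to [0, 1] are likewise illustrative assumptions.

```python
import numpy as np

def reputation(R, I, d=0.1, max_iter=100, tol=1e-6):
    """Iteratively score users; R is scaled to [0, 1], I marks observed entries."""
    m, n = R.shape
    c = np.ones(m)                             # every user starts fully trusted
    n_users = np.maximum(I.sum(axis=0), 1)     # number of users who invoked each service
    n_items = np.maximum(I.sum(axis=1), 1)     # number of services invoked by each user
    for _ in range(max_iter):
        # Reputation-weighted average QoS value of each service.
        avg = (I * R * c[:, None]).sum(axis=0) / n_users
        # Mean absolute deviation of each user's values from the service averages.
        dev = (I * np.abs(R - avg[None, :])).sum(axis=1) / n_items
        c_new = 1.0 - d * dev                  # damped penalty for deviating users
        converged = np.max(np.abs(c_new - c)) < tol
        c = c_new
        if converged:
            break
    return c

# Four consistent users and one user who reports very different values.
R = np.full((5, 4), 0.2)
R[4, :] = 0.9
I = np.ones_like(R)
scores = reputation(R, I, d=0.5)
```

In this toy example the fifth user deviates most from the service averages and therefore ends up with the lowest score.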

4.4. QoS Prediction Based on Location and Reputation

The method in Section 4.1 is the original matrix factorization method, which predicts QoS values from historical invocation records. To take full advantage of users’ location data and reputation scores, a hybrid matrix factorization method is proposed in Equation (10), where the parameter $c_i$ denotes the reputation score of user $i$. Previous studies make QoS predictions based on the premise that users in $N(i)$ experience similar service invocations and observe similar QoS usage experience [15, 28]. However, some unreliable users provide untrustworthy QoS values and degrade the prediction results. Therefore, we introduce $c_i$ into Equation (10) to regulate the credibility of different users. Meanwhile, users’ location information should be introduced into our method to enhance the prediction accuracy, as discussed in Section 4.2. Therefore, a locational constraint term is defined in Equation (11), where $\alpha$ denotes the relative proportion of the location grouping, $U_i$ represents the latent factors of user $i$, $N(i)$ represents the subset of users who are near user $i$, and $Sim(i,k)$ represents the similarity between user $i$ and user $k$, defined in Equation (12) in terms of $D(i,k)$, the real-world distance between user $i$ and user $k$ calculated by Equation (7).

The gradients of the objective function in Equation (10) with respect to $U_i$ and $S_j$, given in Equations (13) and (14), are used in the gradient descent method.

Following the update process in Equations (5) and (6), we update $U$ and $S$ with the two derivatives in Equations (13) and (14) until we reach a local minimum of the objective function in Equation (10).
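As a hedged illustration, the hybrid objective could take the following form. This is an assumption assembled from the definitions in this section, not a verbatim copy of Equation (10): the reputation score $c_i$ weights the squared error of user $i$, and a locational term weighted by $\alpha$ pulls $U_i$ towards the similarity-weighted features of the neighbours in $N(i)$. The symbols $I_{ij}$, $\lambda_U$, and $\lambda_S$ follow the usual matrix factorization notation and are also assumed, and the form of $Sim(i,k)$ is left unspecified.

```latex
\mathcal{L}(U,S) =
  \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}\, c_i \left(r_{ij} - U_i S_j\right)^2
  + \frac{\alpha}{2}\sum_{i=1}^{m}\Bigl\| U_i - \sum_{k \in N(i)} Sim(i,k)\, U_k \Bigr\|_2^2
  + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_S}{2}\|S\|_F^2
```

Under this form, a user with a low reputation score contributes less to the fit of the latent factors, while the locational term still ties that user’s features to those of nearby users.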
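The gradient step for a reputation-weighted, location-regularized factorization can be sketched in Python as follows. The objective coded here is an assumed form (reputation-weighted squared error, plus a term that pulls each user’s features towards the similarity-weighted features of nearby users, plus Frobenius regularization); it is not claimed to match Equations (10), (13), and (14) term by term. The matrix `W` is a hypothetical input holding the similarity of user `i` to neighbour `k` in `W[i, k]` and zero elsewhere, so the sketch makes no assumption about how that similarity is derived from distance.

```python
import numpy as np

def lrmf_loss(R, I, U, S, c, W, alpha, lam_u, lam_s):
    """Assumed hybrid objective: weighted fit + locational term + regularization."""
    E = I * (R - U @ S)                      # errors on observed entries
    fit = 0.5 * np.sum(c[:, None] * E ** 2)  # each user's error weighted by reputation c_i
    G = U - W @ U                            # row i: U_i minus weighted neighbour features
    loc = 0.5 * alpha * np.sum(G ** 2)
    reg = 0.5 * lam_u * np.sum(U ** 2) + 0.5 * lam_s * np.sum(S ** 2)
    return fit + loc + reg

def lrmf_grads(R, I, U, S, c, W, alpha, lam_u, lam_s):
    """Analytic gradients of lrmf_loss with respect to U and S."""
    E = I * (R - U @ S)
    CE = c[:, None] * E
    G = U - W @ U
    grad_U = -CE @ S.T + alpha * (G - W.T @ G) + lam_u * U
    grad_S = -U.T @ CE + lam_s * S
    return grad_U, grad_S

def lrmf_step(R, I, U, S, c, W, alpha=0.001, lam_u=0.1, lam_s=0.1, eta=0.01):
    """One gradient descent update of U and S."""
    grad_U, grad_S = lrmf_grads(R, I, U, S, c, W, alpha, lam_u, lam_s)
    return U - eta * grad_U, S - eta * grad_S
```

Repeating `lrmf_step` until the loss stops decreasing corresponds to iterating the updates to a local minimum; the default values of `alpha`, `lam_u`, `lam_s`, and `eta` are placeholders rather than the settings used in the experiments.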

5. Experiments

5.1. Experiment Setup

A series of experiments is conducted on a well-known public QoS dataset provided in previous related work [7]. The dataset contains 1,974,675 response time records of service invocations, collected from 339 users on 5,825 services. Figure 5 shows the distribution of all values in the dataset; nearly 90% of all response time values are in the range [0, 2]. To simulate a real-world service invocation scenario, several response time records are randomly removed to generate random unreliable users, and different numbers of unreliable users are introduced into the dataset to study the impact of users’ reputation on prediction accuracy.

5.2. Evaluation Metrics

In the experiment, we utilize two well-known statistical metrics, i.e., mean absolute error (MAE) and root mean squared error (RMSE), to evaluate our predicted results. The closer the prediction is to the actual QoS value, the smaller the MAE and RMSE and the higher the prediction accuracy. MAE is given by

$$\mathrm{MAE} = \frac{1}{N}\sum_{i,j}\left|r_{ij} - \hat{r}_{ij}\right|, \tag{15}$$

and RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i,j}\left(r_{ij} - \hat{r}_{ij}\right)^2}, \tag{16}$$

where $r_{ij}$ represents the available QoS value in the original matrix, $\hat{r}_{ij}$ denotes the predicted QoS value, and $N$ represents the number of unknown QoS values.
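Both metrics are straightforward to compute. The Python sketch below uses the standard definitions of MAE and RMSE; the function names and the toy values are illustrative only.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between actual and predicted QoS values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted QoS values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Three held-out values, one of them predicted badly.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
example_mae = mae(actual, predicted)
example_rmse = rmse(actual, predicted)
```

Because RMSE squares each error before averaging, a single large error raises it more than it raises MAE, which is why the two metrics are usually reported together.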

5.3. Comparison Study

The proposed approach is compared with the following CF approaches:
(i) UPCC. This approach utilizes PCC to calculate the similarity between users [37]. In service computing research, this approach can be employed to predict QoS values.
(ii) IPCC. This approach is a common commercial recommendation method. Service recommendation systems usually utilize this method to calculate service similarities and make predictions [38].
(iii) UIPCC. Both UPCC and IPCC are employed together in the prediction framework [39].
(iv) PMF. This approach uses probability theory to explain how matrix factorization makes predictions [11].
(v) LBR1. This approach predicts QoS values by combining geographical information with the matrix factorization approach [15].
(vi) RMF. This approach utilizes available QoS values to calculate users’ reputation and adopts a matrix factorization model to predict the missing QoS values of unknown services [32].
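The PCC similarity underlying the UPCC baseline can be sketched as follows. This is a generic illustration rather than the exact formulation of [37]: the function name `pcc_user_similarity` is hypothetical, and each user’s mean is taken over the co-invoked services only, which is one of several common variants.

```python
import numpy as np

def pcc_user_similarity(R, I, a, u):
    """Pearson correlation between users a and u over their co-invoked services."""
    common = (I[a] > 0) & (I[u] > 0)          # services observed by both users
    if common.sum() < 2:
        return 0.0                            # not enough overlap to correlate
    ra, ru = R[a, common], R[u, common]
    da, du = ra - ra.mean(), ru - ru.mean()
    denom = np.sqrt((da ** 2).sum() * (du ** 2).sum())
    return float((da * du).sum() / denom) if denom > 0 else 0.0

# Toy data: user 1 scales with user 0, user 2 moves in the opposite direction.
R = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [4.0, 3.0, 2.0, 1.0]])
I = np.ones_like(R)
sim_01 = pcc_user_similarity(R, I, 0, 1)
sim_02 = pcc_user_similarity(R, I, 0, 2)
```

IPCC applies the same correlation to columns (services) instead of rows (users).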

To simulate service invocation cases, a set number of elements are removed from the original QoS matrix. After this preprocessing, the density of the final matrix is set to 5%, 10%, 15%, and 20%, respectively. The damping factor $d$ is set to 0.1 to compute users’ reputation. We also set , dimensionality = 50, and . The number of unreliable users is 10, about 2.79% of all users. The overall comparison details are presented in Tables 1 and 2.

The comparison results in Tables 1 and 2 show that the proposed LRMF achieves higher prediction accuracy than the other state-of-the-art approaches and obtains the best prediction performance among the compared methods.
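The density setting can be illustrated with a short Python sketch that keeps a random fraction of the matrix entries as training data and treats the rest as removed. The function name `sample_density` and the fixed random seed are assumptions made for the example; the exact sampling procedure used in the experiments is not specified here.

```python
import numpy as np

def sample_density(R, density, seed=0):
    """Return an indicator matrix that keeps about `density` of the entries of R."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(R.size)
    n_keep = int(round(density * R.size))            # number of entries kept for training
    keep = rng.choice(R.size, size=n_keep, replace=False)
    flat[keep] = 1.0
    return flat.reshape(R.shape)

# A 20 x 50 toy matrix at 5% density keeps 50 of its 1,000 entries.
R = np.zeros((20, 50))
I_train = sample_density(R, 0.05)
```

The entries where the indicator is 0 can then serve as the held-out values on which MAE and RMSE are computed.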

5.4. Impact of Unreliable Users

The number of unreliable users determines the amount of untrustworthy QoS values in the training dataset, which greatly influences the prediction method. In the experiment, the number of unreliable users is varied from 10 to 80 with the dimensionality set to 80 and the matrix density set to 5%, 10%, 15%, and 20%, respectively.

As shown in Figure 6, both the MAE and RMSE values of LRMF become significantly smaller as the matrix becomes denser under all conditions. This also indicates that more available QoS values lead to better prediction results. When the number of unreliable users varies in the range of [30, 80], neither MAE nor RMSE increases much, which demonstrates that our LRMF method can limit the negative impact of unreliable users and improve QoS prediction accuracy.

5.5. Impact of Parameter $\alpha$

The parameter $\alpha$ determines the proportion of the location region factor in our proposed method. On the one hand, too large a value of $\alpha$ makes the prediction depend too strongly on the geographical region; on the other hand, if $\alpha$ is too small, the location region cannot contribute enough to the objective function. The comparative study for $\alpha$ is conducted with dimensionality = 50 and matrix densities of 5% and 20%. We also add 10 unreliable users in the experiment settings. The details of the experiment are presented in Figure 7.

As presented in Figures 7(a) and 7(b), the MAE reaches a minimum when $\alpha$ is around $10^{-3}$; when $\alpha$ is smaller or larger than this value, the MAE fluctuates considerably. Similarly, in Figures 7(c) and 7(d), the RMSE reaches a turning point when $\alpha$ increases to $10^{-2}$ and then increases slightly. The analysis shows that users’ location data has a remarkable impact on the prediction accuracy of LRMF.

5.6. Impact of Parameter $\theta$

In LRMF, the location group threshold $\theta$ determines the geographical region size. If $\theta$ is too small, the region is very small and only users within a very short distance fall into it. If $\theta$ is too large, many more users are assigned to the same geographical region, which introduces more noisy data and weakens the local factor.

To study how the parameter $\theta$ affects the prediction approach, the dimensionality is set to 50 and the matrix density to 10% and 30%. We also add 10 unreliable users to the experiment settings.

Figure 8 illustrates how the parameter $\theta$ affects the prediction accuracy. The error values first decrease as $\theta$ increases, but both kinds of error values gradually increase once $\theta$ passes a threshold. This can be explained as follows: when $\theta$ is smaller than a certain value, the location region lacks enough similar users for the active user, which prevents the crowd from contributing its collective intelligence. When $\theta$ is assigned a large value, too much noisy data is included in the location region. Both cases have a negative impact on prediction accuracy.

5.7. Impact of Density

To study the impact of matrix density on the prediction result, we add 10 unreliable users and set the dimensionality to 10 and 50 while varying the matrix density from 5% to 20%.

As shown in Figure 9, both MAE and RMSE decrease at first as the matrix density increases. The curve then flattens as the matrix density continues to increase. These results show that the sparsity of the original data has a great impact on prediction accuracy: when the original sparse matrix becomes denser through the collection of more QoS values, the prediction accuracy of our proposed method improves considerably.

5.8. Impact of Dimensionality

The number of latent features is controlled by the dimensionality in our LRMF method. To conduct the comparative experiments, we vary the dimensionality from 10 to 100 with a matrix density of 5% and with 10 and 30 unreliable users, respectively.

Figure 10 shows that a proper value of dimensionality achieves a better prediction result, i.e., the smallest prediction error. Both MAE and RMSE decrease at first because more latent factors are added to the factorization process. When the dimensionality exceeds a certain threshold, more noise may be brought into the training model, causing overfitting. As a result, the dimensionality threshold in our model is set to approximately 80.

6. Conclusions and Future Studies

Based on the premise that the reputation of mobile service users has a great effect on the prediction of unknown QoS values, this paper proposes an efficient prediction method that simultaneously exploits users’ reputation and geographic distribution to make personalized service recommendations. We first cluster service users into different locational groups by their real-world geographical information and then calculate their reputations from their historical QoS values. Finally, a hybrid matrix factorization model is proposed that integrates users’ reputation and geographic data to predict unknown QoS values. Experimental analysis on a public QoS dataset demonstrates the effectiveness of our LRMF method for QoS prediction. The analysis shows that there are unreliable users in some location regions who submit untrustworthy QoS values to gain benefits for their own services. The mobile service recommendation approach proposed in this study can reduce the negative effect of unreliable users and recommend high-quality and credible mobile services to end users.

In this paper, we only consider users’ information when predicting unknown QoS values for mobile service recommendation. In fact, the geographical distribution of services would also be useful when identifying users’ location regions based on the distance between user and service. Therefore, it is possible to design a more accurate location grouping model by combining both users’ and services’ location information. Besides, to address possible deficiencies of the reputation computation algorithm, we will try to introduce intelligent methods, e.g., deep learning and reinforcement learning, to design better algorithms for QoS prediction. Furthermore, the reliability of a user may be affected by his/her trusted friends, so in future work we will continue to track users’ reputation in their social networks to make high-quality service recommendations.

Data Availability

The real-world WS-DREAM data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (nos. 6167060382 and 61602070), the New Academic Seedling Cultivation and Exploration Innovation Project (no. [2017]5789-21), and the China Scholarship Council (no. 201706050085).