Research Article  Open Access
Wenming Ma, Rongjie Shan, Mingming Qi, "General Collaborative Filtering for Web Service QoS Prediction", Mathematical Problems in Engineering, vol. 2018, Article ID 5787406, 18 pages, 2018. https://doi.org/10.1155/2018/5787406
General Collaborative Filtering for Web Service QoS Prediction
Abstract
To avoid expensive and time-consuming evaluation, collaborative filtering (CF) methods have been widely studied for web service QoS prediction in recent years. Among the various CF techniques, matrix factorization is the most popular, and much effort has been devoted to improving it. The key idea of matrix factorization is to assume that the rating matrix is low rank and to project users and services into a shared low-dimensional latent space, making a prediction with the dot product of a user latent vector and a service latent vector. Unfortunately, unlike ratings in recommender systems, QoS usually takes continuous values over a very wide range, so the low rank assumption might incur high bias. Furthermore, when the QoS matrix is extremely sparse, the low rank assumption also incurs high variance. To reduce the bias, we must use more complex assumptions. To reduce the variance, we can adopt complex regularization techniques. In this paper, we propose a neural network based framework, named GCF (general collaborative filtering), with dropout regularization, to model the user-service interactions. We conduct our experiments on a large real-world dataset, the QoS values of which are obtained from 339 users on 5825 web services. The comprehensive experimental studies show that our approach offers higher prediction accuracy than the traditional collaborative filtering approaches.
1. Introduction
In recent years, web services have become one of the most popular techniques for building large distributed systems with interoperable machine-to-machine interaction over the Internet. Through automatic selection, a business organization’s internal applications or services can seamlessly integrate the services of others. Web service selection in terms of functional properties has been extensively studied for many years [1–3], but finding services that satisfy the functional requirements is not enough. Even if web services provide the same functionality, they usually differ in QoS (Quality-of-Service) properties, because QoS depends on the circumstances of users and web services. With the increasing number of web services that have identical or similar functionalities, the selection process must also aim at the services that best meet the customers’ requirements in terms of QoS. Due to the notable importance of QoS in building reliable and user-friendly distributed applications, QoS-based web service selection has been gaining much attention from both academia and industry [4–6].
Web service QoS refers to nonfunctional properties, such as availability, price, failure probability, response time, and throughput. It is crucial to know the exact QoS values to select the best services. Unfortunately, evaluating the QoS values of real-world web services is difficult and sometimes even impractical. It is time-consuming and resource-consuming for each user to evaluate the QoS values of all the service candidates. Moreover, commercially valuable web services are usually paid services hosted by large business organizations or companies, so users who have not paid cannot invoke them for evaluation.
An attractive alternative is to predict the QoS value of a service by taking advantage of the past usage experiences of a small number of other users who have evaluated this service. Therefore, collaborative filtering (CF) techniques, which have been used in recommender systems for a decade [7, 8], have been widely studied in recent years for web service QoS prediction. In general, two types of methods are commonly used in collaborative filtering recommender systems: neighborhood based CF and model based CF. Neighborhood based collaborative filtering algorithms include user-based algorithms and item-based algorithms, but such techniques do not work well for sparse rating data. Among the various model based collaborative filtering approaches, matrix factorization is considered to be the state of the art in recommender systems [9–11].
The core idea of matrix factorization is the assumption that the rating matrix is low rank, which means that many rows and columns are linearly correlated. It projects users and services into a shared latent space, and the interaction between a user and a service is then modeled as the inner product of their latent vectors.
Many researchers have proposed matrix factorization methods for web service QoS prediction by simply borrowing similar models already studied in recommender systems. However, we found that they did not recognize that QoS often takes continuous values over a very wide range, whereas the user-item rating in a recommender system is usually a discrete value belonging to a limited set, such as an integer ranging from 1 to 5. The low rank assumption would incur high bias for QoS prediction due to the low capacity of matrix factorization models. Furthermore, when the QoS value matrix is extremely sparse, matrix factorization models also have high variance, because many different factorizations fit the observed entries equally well. To reduce the bias, we need more complex models with high capacity. To reduce the variance, we need more complex regularization techniques.
Recently, deep neural networks have achieved remarkable success in many applications, especially in computer vision, speech recognition, and natural language processing. The exploration of deep learning technologies for recommender systems or QoS prediction has received relatively less attention. Deep neural networks have very high capacity, but they also have very flexible and practical regularization methods. One of the most popular regularization methods is dropout, which is widely used in real-world applications.
Some recent studies, such as Google’s Wide and Deep model and Microsoft’s Deep Crossing model, have tried to employ deep learning for recommendation, but they primarily model auxiliary information [12–14]. Other studies, such as neural collaborative filtering (NCF), an approach that learns user-item interactions with neural networks, outperformed the state-of-the-art methods by a large margin [15]. However, NCF models focus on implicit feedback and item ranking, which is not appropriate for QoS prediction. Our work, inspired by the above approaches, presents some novel neural architectures for web service QoS prediction. The main contributions of our work are as follows:
(i) We formalize the high bias and high variance problem of traditional matrix factorization models (the MFHBV problem for short) for web service QoS prediction.
(ii) We propose a novel neural network based framework for web service QoS prediction, termed GCF (general collaborative filtering). The traditional matrix factorization models are special cases of this framework. We also provide four different implementations of GCF, including NLMF (NonLinear Matrix Factorization), DNNCF (Deep forward Neural Network Collaborative Filtering), ResNetCF (Residual Network Collaborative Filtering), and HyResNetCF (Hybrid ResNetCF).
(iii) We perform experiments on a large real-world dataset, and extensive experimental investigations are conducted to study the QoS value prediction accuracy of our GCF approaches.
The remainder of this paper is organized as follows: We present our GCF framework in Section 2. Section 3 presents several GCF implementations for web service QoS prediction. Section 4 describes our experiments. Section 5 introduces the related work, and Section 6 concludes this paper.
2. GCF Framework
2.1. QoS Prediction Task
The QoS prediction problem is closely related to matrix completion, which is the process of filling in the entries of a matrix that has some unknown missing values. Figure 1 shows an example of a QoS prediction scenario, including 6 users and 6 web services from three different locations. Each user stores the response times of several services measured by itself, but does not measure the response time of all 6 services. For example, a user in location 1 may know only the QoS values of one service in location 3 and two services in location 2. All the measured QoS values constitute a sparse matrix, some entries of which are not defined. The QoS prediction task is to estimate the missing values in the matrix by using the observed values.
However, the QoS matrix is usually very large and sparse, and the missing values make up a large proportion of the matrix. Therefore, the traditional missing value estimation methods need specialized adaptations.
In fact, the QoS prediction task is a special regression problem. Usually, a user or a service does not have concrete features. Although we can use hand-designed features, such as users’ demographic characteristics or services’ text descriptions, we only have interaction data most of the time. We must learn abstract features for the users or items.
Generally speaking, when modeling the QoS prediction task, we need to solve the following problems:
(i) We learn the latent vector for each user and service. For the $i$th user and the $j$th service, the latent vectors are denoted $p_i$ and $q_j$, respectively.
(ii) We then need to define a prediction function $f$.
(iii) A loss function should be defined, such as MSE (Mean Square Error) or MAE (Mean Absolute Error).
(iv) A learning algorithm is designed to train the function $f$ using the observed values in the matrix. Note that $p_i$, $q_j$, and $f$ usually are learned jointly in the same algorithm.
To conduct the QoS prediction task, a lot of matrix factorization (MF) based models have been proposed. Section 2.2 will analyze the limitation of matrix factorization and then our GCF framework is outlined in Section 2.3.
2.2. Formulization of MFHBV Problem
In recommender systems, matrix factorization models use the dot product of two latent vectors as the predicted value, which is defined as
$$\hat{r}_{ij} = p_i^{\top} q_j .$$
Suppose the size of the true rating matrix $R$ is $m \times n$. The length $k$ of $p_i$ or $q_j$ is usually less than $m$ and $n$. $R$ can be approximated by the product of two low-rank matrices: $R \approx P Q^{\top}$. The $i$th row of $P$ is $p_i^{\top}$, and the $j$th row of $Q$ is $q_j^{\top}$. $P$ and $Q$ are learned from the observed entries of $R$.
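As a minimal sketch of the dot-product prediction described above, the following pure-Python snippet builds every predicted entry from two small factor matrices. The names `P`, `Q`, `predict`, and `R_hat`, as well as all numeric values, are illustrative assumptions and are not taken from the paper.

```python
# Illustrative latent factors (assumed values): 3 users, 4 services, k = 2.
P = [[1.0, 0.5], [0.2, 1.5], [0.9, 0.1]]                 # one row per user
Q = [[0.4, 1.0], [1.2, 0.3], [0.0, 2.0], [0.7, 0.7]]     # one row per service


def predict(p, q):
    """Dot product of a user latent vector and a service latent vector."""
    return sum(a * b for a, b in zip(p, q))


# Full predicted matrix. Because it equals P times the transpose of Q,
# its rank is at most k, which is the low rank assumption in action.
R_hat = [[predict(p, q) for q in Q] for p in P]
```

Every row of `R_hat` is a linear combination of only two underlying directions, which is why the model cannot represent matrices whose rows vary more freely.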
This might work well because the user-item rating is usually a discrete value belonging to a small set. However, QoS often takes continuous values over a very wide range, so the low rank assumption might incur high bias.
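Denote by $r_u$ the ratings of user $u$ for all $n$ services or items; $r_u$ is the $u$th row of matrix $R$. Cosine similarity is a common measure of similarity between two nonzero vectors. The cosine similarity between $r_u$ and $r_v$ is defined as
$$\cos(r_u, r_v) = \frac{r_u \cdot r_v}{\lVert r_u \rVert \, \lVert r_v \rVert}.$$
Suppose the ratings are normalized and standardized. When the ratings for each service are very close, $r_u$ and $r_v$ can approximately be considered linearly correlated. Here, we use the expected value of the sum of absolute differences of ratings to measure the credibility of linear correlation (CLC) of the two vectors:
$$\mathrm{CLC}(r_u, r_v) = E\Big[\sum_{j=1}^{n} \lvert r_{uj} - r_{vj} \rvert\Big].$$
A small value of CLC means high credibility of linear correlation.
Suppose the ratings are chosen from a set $S$, where $r_{\min} = \min S$ and $r_{\max} = \max S$. The value of $\lvert r_{uj} - r_{vj} \rvert$ lies between $0$ and $r_{\max} - r_{\min}$. If this value is uniformly chosen from $0$ to $r_{\max} - r_{\min}$, then the expected value of $\lvert r_{uj} - r_{vj} \rvert$ is as follows:
$$E\big[\lvert r_{uj} - r_{vj} \rvert\big] = \frac{r_{\max} - r_{\min}}{2}.$$
Now, we can conclude that
$$\mathrm{CLC}(r_u, r_v) = \frac{n\,(r_{\max} - r_{\min})}{2}.$$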
Therefore, CLC is determined by the number of services (or items) and the difference between the maximum and minimum of the ratings. Usually, in recommender systems, ratings are discrete values with a limited range, such as 1 to 5. The CLC would then be $2n$, where $n$ is the number of services (half the rating range of 4, summed over $n$ services). However, QoS may take continuous values. The response time typically lies in a range of 0 to 19 (s), and the throughput is often from 0 to 2000 (kb/s). The CLC is so large in QoS prediction applications that many rows or columns cannot simply be considered linearly correlated. That is why the low rank assumption incurs high bias for QoS prediction.
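To make the comparison concrete, the arithmetic can be sketched as follows. The text describes the CLC as the expected sum of absolute rating differences and assumes each difference is uniform between zero and the rating range; under that reading, the expected CLC is the number of services times half the range. The function name `expected_clc` is a hypothetical helper, and the ranges are the ones quoted in the paragraph above.

```python
def expected_clc(n_services, r_min, r_max):
    """Expected sum of absolute rating differences over n_services,
    assuming each absolute difference is uniform on [0, r_max - r_min]."""
    return n_services * (r_max - r_min) / 2.0


n = 5825  # number of services in the dataset used in the experiments

clc_ratings = expected_clc(n, 1, 5)          # discrete 1-to-5 ratings: 2 * n
clc_response_time = expected_clc(n, 0, 19)   # response time in seconds: 9.5 * n
clc_throughput = expected_clc(n, 0, 2000)    # throughput in kb/s: 1000 * n
```

Under these assumptions, the response time CLC is several times larger than the rating CLC, and the throughput CLC is larger by a factor of 500.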
Furthermore, the QoS rating matrix is usually extremely sparse. Suppose we know the value of the true latent vector $q_j$ for each service $j$, and now we want to learn $p_i$ for a specific user $i$. Each observed rating of user $i$ gives one linear equation in $p_i$. If the number of rows (equations) is less than the number of columns (unknowns), we would have infinitely many solutions for $p_i$. However, in practice, we need to learn both the users’ latent vectors and the services’ latent vectors. If the observed ratings are extremely sparse, there are infinitely many ways to factorize the matrix. That is why the low rank assumption incurs high variance.
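The underdetermined case can be illustrated with a toy example. The sketch below assumes a latent size of 2 and a user with a single observed rating; all names and values are hypothetical and chosen only so that the arithmetic is exact.

```python
def dot(p, q):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(p, q))


# Assumed setup: latent size k = 2, and the user has rated only one service.
q_known = [1.0, 1.0]   # hypothetical true latent vector of the rated service
observed = 2.0         # hypothetical observed QoS value

# One equation, two unknowns: every p with p[0] + p[1] == 2 fits exactly.
candidates = [[0.0, 2.0], [1.0, 1.0], [2.0, 0.0], [-3.0, 5.0]]
fits = [dot(p, q_known) == observed for p in candidates]

# The same candidates disagree on a service the user has not rated.
q_other = [1.0, 0.0]
predictions = [dot(p, q_other) for p in candidates]
```

All four candidates reproduce the observed value, yet their predictions for the unrated service range from -3.0 to 2.0, which is the variance the paragraph describes.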
One might consider some regularization methods for matrix factorization. However, this only reduces the variance, but cannot reduce the bias, and sometimes might increase the bias. Therefore, we need more complex models with high capacity to reduce the bias and then use complex regularization technologies to reduce the variance. The next section will introduce the details of our GCF framework, which is more general and flexible and can learn the user-service interactions more accurately.
2.3. General Collaborative Filtering
In fact, the process of learning the prediction function can be classified into three categories:
(i) Predefine the features for users and items ($p_i$, $q_j$) and only learn the weights $w$.
(ii) Predefine the weights $w$ and learn the features for users and items ($p_i$, $q_j$).
(iii) Learn features and parameters simultaneously.
Generally speaking, the first type of learning process is related to content-based recommender algorithms, while the other two types are related to collaborative filtering recommender algorithms.
The second type of learning process mainly involves matrix factorization models, where the predicted QoS value is usually defined as
$$\hat{r}_{ij} = w^{\top} (p_i \odot q_j),$$
where $\odot$ denotes the elementwise product of vectors. In this case, the weights are predefined as a vector $w = (1, 1, \ldots, 1)^{\top}$.
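A short sketch of this weighted form follows, assuming the predefined weight vector is all ones, which is what makes the weighted sum equal to the plain dot product. The function name `weighted_interaction` and the numeric values are illustrative assumptions.

```python
def weighted_interaction(p, q, w):
    """Weighted sum of the elementwise product of two latent vectors."""
    return sum(wi * pi * qi for wi, pi, qi in zip(w, p, q))


p = [1.0, 2.0, 3.0]
q = [0.5, 0.0, 2.0]

# Fixed weights of one: the prediction reduces to the dot product of p and q.
ones = [1.0] * len(p)
mf_prediction = weighted_interaction(p, q, ones)

# Hypothetical non-uniform weights, as a model that learns w could produce.
learned_like = weighted_interaction(p, q, [0.5, 1.0, 0.1])
```

Letting the weights vary is the smallest step away from matrix factorization; the framework in this section goes further by also transforming the latent vectors nonlinearly.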
Matrix factorization models use the dot product of two vectors as the predicted value. However, the underlying relationship between the QoS value and latent vectors might be too complex to be captured well by using only the dot product.
In this paper we present a general collaborative filtering (GCF) framework that uses the third type of learning process. Suppose each user has a unique index value ranging from 1 to , and each service has a unique index value ranging from 1 to . Here, we use the integer to represent the index value of the user and use the integer to represent the index value of service . Formally, we can say that and . The prediction function of GCF is as follows:where is a function of transforming index values to a shared kdimensional latent space: , .
We provide three types of definitions for :where and are two different transformed interactive features of and :
Note that we can use either or or both of them in our model. , , and are termed as pFactor. The traditional matrix factorization method is just a special case of the first type of GCF framework, which only uses and an identical feature transformation that , .
To learn GCF models, we need to solve the following optimization problem:
is the set of ratings observed in . Note that “” means the regularization policy we adopt. Theoretically, we can use any kind of regularization, but we mainly use Dropout in our models [16, 17].
Now, there are two remaining problems to be solved:
(i) How to implement the function that transforms the index values into latent factors?
(ii) How to implement the function that transforms latent factors into interactive features?
The next section will give the implementation details of how to solve these two problems. However, the GCF we proposed in this paper is a general framework, and anyone can design different strategies to solve the two problems above. We believe our GCF framework would inspire more researchers to present various algorithms for different applications.
3. Implementation
In this section, we use neural network architecture to implement the GCF models. We first present a solution of implementing the function and then elaborate some strategies on how to design . Finally, we present some important algorithms.
3.1. Neural Latent Feature Embedding
In fact, each user or service has a unique identifier in a real-world system. We suppose that each identifier is a unique integer. If the identifier of a user or a service is equal to $i$, it refers to the $i$th user or service.
Now let us use a onehot vector to represent a user or a service. Suppose we have at most users and services. The user and the can be expressed aswhere the superscript “r” refers to the “raw” feature. Our neural latent feature embedding is defined as follows:where is a matrix and is a matrix. Expanding the formulas and , we can see that
Therefore, is the th column of matrix , and is the th column of matrix . Actually, the traditional matrix factorization predicted QoS matrix can be expressed as by using our architecture. and are termed as embedding layers in the neural network architecture. Figure 2 shows the architecture of the neural latent feature embedding. On top of the embedding layers, we can design various other neural layers.
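The equivalence between multiplying by a one-hot vector and selecting a column can be checked with a small sketch. It assumes an embedding matrix with one column per service and uses 0-based indices for convenience; `one_hot`, `mat_vec`, and the matrix values are hypothetical.

```python
def one_hot(index, size):
    """Raw feature: 1 at position `index` (0-based here), 0 elsewhere."""
    return [1.0 if t == index else 0.0 for t in range(size)]


def mat_vec(matrix, vec):
    """Multiply a k x n matrix (given as a list of rows) by a length-n vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# Hypothetical embedding matrix with k = 2 rows and n = 4 columns.
Q = [[0.1, 0.2, 0.3, 0.4],
     [1.0, 2.0, 3.0, 4.0]]

j = 2
via_product = mat_vec(Q, one_hot(j, 4))   # matrix times one-hot vector
via_lookup = [row[j] for row in Q]        # direct column selection
```

Both routes give the same latent vector, which is why an embedding layer can be implemented as a table lookup instead of a full matrix product.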
3.2. Learning High Level Interactive Features
After embedding users and services as latent features, we need to transform the latent features to more complex representations.
Here, we use a series of nonlinear functions to transform , , or where can be any kind of simple nonlinear activation function or complex compound function.
For the simple nonlinear activation function, we adopt ReLU [18–20], which is defined as
$$\mathrm{ReLU}(x) = \max(0, x).$$
Note that we usually add a dropout regularization function for some .
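Suppose $h$ is the output of a transformation function; if we use dropout, we need a random vector $d$, where the length of $d$ is equal to that of $h$ and $d_t \sim \mathrm{Bernoulli}(1 - \rho)$. $\rho$ is the drop ratio. Now $h$ becomes $d \odot h$.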
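The following sketch shows ReLU followed by a dropout mask in plain Python, under the assumption that each unit is kept independently with probability one minus the drop ratio. The `rescale` flag is an added assumption: libraries such as Keras multiply the kept units by 1 / (1 - rate) during training, whereas the masking described in the text does not mention rescaling.

```python
import random


def relu(x):
    """Elementwise max(0, v)."""
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, rescale=False):
    """Zero each unit with probability `rate` (training-time behaviour).

    With rescale=True the kept units are divided by (1 - rate).
    """
    keep = 1.0 - rate
    mask = [1.0 if rng.random() < keep else 0.0 for _ in x]
    factor = 1.0 / keep if rescale else 1.0
    return [v * m * factor for v, m in zip(x, mask)]


rng = random.Random(0)                       # fixed seed for repeatability
hidden = relu([-1.0, 0.5, 2.0, -0.3])        # negative inputs become 0.0
dropped = dropout(hidden, rate=0.5, rng=rng)
```

At prediction time the mask is not applied; only training passes use it.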
For the complex compound function, we provide a novel crossing residual unit (CRU):
Instead of adding to , we add to . Our CRU can reduce the variance when using dropout. Figure 3 gives an intuitive explanation of CRU. If we use CRUs, in the last CRU is termed as tFactor, the size of which is important for training.
3.3. Typical GCF Models and Algorithms
Depending on how to choose the pFactors, transformation functions, and prediction functions, we can get various GCF models with different neural network architectures. Table 1 lists 5 types of typical GCF models, and a general architecture for these different models is shown in Figure 4.

UCMF (UnConstrained Matrix Factorization) is actually the traditional matrix factorization, and the predicted QoS value is the dot product of the user and service latent vectors. NLMF (NonLinear Matrix Factorization) is a simple extension of the traditional matrix factorization, which replaces the two latent vectors with their nonlinearly transformed counterparts. The transform functions are a series of ReLUs. UCMF and NLMF belong to the first type of GCF framework. DNNCF and ResNetCF belong to the second type of GCF framework. However, DNNCF uses ReLUs, whereas ResNetCF uses CRUs. HyResNetCF belongs to the third type of GCF framework. It uses ReLUs to transform , and uses CRUs to transform . Note that UCMF is just a neural network version of traditional matrix factorization, so in the later discussion GCF models refer to NLMF, DNNCF, ResNetCF, and HyResNetCF, not including UCMF.
Now, we give some important algorithms to implement GCF models. Due to limited space, we only elaborate the algorithms related to HyResNetCF; the algorithms of the other models are similar. We implement the algorithms using the Keras framework, so the pseudocode is similar to Keras code [21].
Algorithms 1 and 2 show how to build and train the model, respectively. Embedding function refers to , which is implemented according to formula (13). Concat function merges two vectors into one long vector. CRU function refers to the CRU neural unit, which is implemented according to formula (17). Dense is a fully connected layer. Split function separates users’ ids, service ids, and QoS values from the dataset. Given a training set, the training time is mainly determined by the model size, batch size, learning rate, and the optimization function. In this paper, we use Adam as the optimizer, which uses a separate learning rate for each parameter and automatically adapts these learning rates throughout the course of learning.
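To illustrate how Adam adapts a separate step size for each parameter, here is a minimal pure-Python sketch of the standard Adam update applied to a toy quadratic. It is not the paper's training code: the objective, the starting point, the step count, and the name `adam_step` are assumptions, and the hyperparameter defaults are the commonly used ones.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running first and second moment estimates."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g
        vi = beta2 * vi + (1 - beta2) * g * g
        m_hat = mi / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = vi / (1 - beta2 ** t)   # bias-corrected second moment
        # The effective step differs per parameter through sqrt(v_hat).
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


# Toy objective (assumption): f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2.
w, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 2001):
    grads = [2 * (w[0] - 3), 20 * (w[1] + 1)]
    w, m, v = adam_step(w, grads, m, v, t, lr=0.01)
```

Although the second coordinate has a gradient ten times larger, both coordinates initially move by roughly the learning rate per step, because each gradient is divided by its own running magnitude.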


4. Experimental Results
4.1. Dataset and Evaluation Metrics
We evaluate the proposed methods using the response time values of a large publicly accessible dataset: WS-DREAM dataset#1, obtained from 339 users on 5825 web services [22]. Note that the original dataset is a dense full matrix. We construct training and test sets as follows:
(i) We first filter out zero values, which are not useful for evaluation, and obtain the filtered dataset.
(ii) We then randomly select a fraction (the sparsity level, i.e., the sample density) of the QoS values from the filtered dataset as the training set, and use the remaining values as the test set. We construct 6 different datasets with sample densities of 5%, 10%, 15%, 20%, 25%, and 30%.
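The two construction steps above can be sketched as follows. This is an illustration on synthetic records, not the actual dataset loader: the record layout `(user_id, service_id, qos)`, the function name `split_by_density`, the seed, and the rounding of the training size are all assumptions.

```python
import random


def split_by_density(records, density, seed=0):
    """records: list of (user_id, service_id, qos). Returns (train, test)."""
    nonzero = [r for r in records if r[2] != 0]       # step (i): drop zero values
    rng = random.Random(seed)
    shuffled = nonzero[:]
    rng.shuffle(shuffled)
    n_train = int(round(density * len(shuffled)))     # step (ii): sample by density
    return shuffled[:n_train], shuffled[n_train:]


# Tiny synthetic matrix (assumed values) flattened to records: 4 users, 10 services.
records = [(u, s, 0.0 if (u + s) % 5 == 0 else 0.1 * (u + s + 1))
           for u in range(4) for s in range(10)]

train, test = split_by_density(records, density=0.25)
```

Repeating the call with different seeds gives independent splits at the same density, which is how repeated experiments per density level could be organized.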
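We adopt two evaluation metrics: Mean Absolute Error (MAE) and Normalized MAE (NMAE). The MAE metric is widely employed to measure QoS prediction quality and is defined as
$$\mathrm{MAE} = \frac{1}{N} \sum_{(i,j)} \lvert r_{ij} - \hat{r}_{ij} \rvert,$$
where $r_{ij}$ is the observed QoS value, $\hat{r}_{ij}$ is the predicted value, the sum runs over the test entries, and $N$ is the number of test entries.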
Since different QoS properties of web services have different value ranges, we also use the Normalized MAE to give a better interpretation of the MAE metric. Here, we define the NMAE as follows:
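Both metrics are simple to compute. The sketch below assumes that NMAE divides the MAE by the mean of the actual values, which is one common convention in QoS prediction work; this normalizer is an assumption and may differ from the one defined in the paper. The sample values are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error over the test entries."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmae(actual, predicted):
    """MAE divided by the mean actual value (assumed normalizer)."""
    return mae(actual, predicted) / (sum(actual) / len(actual))


actual = [0.5, 1.0, 2.0, 4.5]       # hypothetical response times in seconds
predicted = [0.7, 0.8, 2.5, 4.0]

mae_value = mae(actual, predicted)
nmae_value = nmae(actual, predicted)
```

Because NMAE is a ratio, it can be compared across QoS properties with very different scales, such as response time and throughput.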
4.2. Performance Comparison
We compare our GCF methods with some traditional collaborative filtering methods: UPCC [23], IPCC [23], UIPCC [24], and UCMF. For UPCC, IPCC, and UIPCC, different tradeoff parameters (the parameters of top similar users or services) are tried, and finally we choose . For UCMF, DNNCF, NLMF, ResNetCF, and HyResNetCF, the sizes of latent factors are also set as 10. For DNNCF, NLMF, ResNetCF, and HyResNetCF, different hidden layers and different hidden unit sizes are tried. Here, the maximum number of hidden layers is limited to 3. Finally, NLMF uses one hidden layer and the other GCF models use 3 hidden layers. We tested the batch size of , 256, 512, , the learning rate of , 0.0005, 0.001, , and the training epoch of , 20, 30, 40, 50, 60, 70, .
For UCMF, we did not adopt any regularization because we found that it does not improve the prediction accuracy. For GCF methods, we adopt dropout regularization. The drop rate is set to 0.5 for layers whose size is equal to or larger than 64, and to 0.2 for layers whose size is smaller than 64.
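The drop rate rule stated above can be written as a one-line helper. The function name `drop_rate` and the example layer widths are hypothetical; only the threshold of 64 and the two rates come from the text.

```python
def drop_rate(layer_size):
    """Drop rate by layer width: 0.5 for size >= 64, otherwise 0.2."""
    return 0.5 if layer_size >= 64 else 0.2


hidden_sizes = [128, 64, 32]                 # hypothetical layer widths
rates = [drop_rate(s) for s in hidden_sizes]
```

Wider layers have more redundant units, so they tolerate the heavier rate, while narrow layers would lose too much information at 0.5.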
We conduct 10 experiments for each model and each sparsity level and then average the prediction accuracy values. The raw feature of a service is a one-hot vector of length 5825, in which only the element at the service's index is 1. The size of the weights matrix is 5825 × 10. This matrix is learned jointly with the other parameters of the whole neural network. The output of each hidden layer represents different abstract features learned by the network. Each higher level feature is determined by the lower level feature and the corresponding weights matrix. These matrices are also learned together during training.
The average total training and testing time (from density 5% to 30%) per experiment of each model finally chosen is as follows: UPCC took about 12 minutes, IPCC took about 13 minutes, UIPCC took about 15 minutes, UCMF took about 18 minutes, DNNCF took about 27 minutes, NLMF took about 19 minutes, ResNetCF took about 42 minutes, and HyResNetCF took about 53 minutes.
The results are reported in Tables 2 and 3 and Figure 5. We can make the following observations:
(i) As the sparsity level (sample density) increases, all the models achieve much better prediction accuracy (lower MAE and NMAE). When the sparsity level is set to 30%, all the models have similar prediction accuracy except UPCC.
(ii) Our GCF methods outperform the traditional collaborative filtering methods, especially when the QoS rating matrix is extremely sparse. HyResNetCF has the best prediction performance.
(iii) DNNCF, ResNetCF, and HyResNetCF are all large networks. Although they achieve similar prediction performance, we found that DNNCF has higher variance.
(iv) UCMF has very high variance when the QoS rating matrix is very sparse, and the variance decreases as the sparsity level becomes larger. However, it still has high bias and lower prediction performance than the GCF models.


Figure 5: (a) MAE w.r.t. density; (b) NMAE w.r.t. density.
When the QoS matrix density is set to 5%, the MAE of UCMF is 0.6284, but the MAE of GCF models is 0.5248 (DNNCF), 0.5176 (NLMF), 0.5154 (ResNetCF), and 0.5111 (HyResNetCF), respectively. On the other hand, the SD (Standard Deviation) of UCMF is 0.0256, but the SD of GCF models is 0.0078 (DNNCF), 0.0094 (NLMF), 0.5154 (ResNetCF), and 0.0070 (HyResNetCF), respectively. From the empirical results, we can see that GCF models have lower variance and lower MAE than UCMF. According to basic machine learning theory, prediction loss is determined by both the bias and the variance. Since GCF models have lower prediction loss as well as lower variance, the results suggest that they also have lower bias. This supports our assumption in Section 2.3.
To understand why deep GCF models can achieve better prediction performance, let us show how MAE changes during the training process with different sparsity levels. Here, we compare the UCMF and HyResNetCF methods. The results are shown in Figure 6, and we can make the following observations:
(i) The training MAEs of UCMF decrease sharply at the early stage, then become steady or increase slightly for a while, and then decrease as the number of training epochs increases.
(ii) The test MAEs of UCMF have a knee point when the sample density is set to 5%, 10%, or 15%. If we set the sparsity level to 15%, after the knee point, the MAEs first increase but decrease after a period of time and finally become steady. However, for the densities of 20%, 25%, and 30%, the test MAEs of UCMF first increase for a while and then decrease and finally become steady.
(iii) The training MAEs of HyResNetCF decrease sharply at the early stage and then decrease more slowly as the number of training epochs increases.
(iv) The test MAEs of HyResNetCF become steady after a few training epochs and are always lower than the test MAEs of UCMF.
Figure 6: (a) Density = 5%; (b) Density = 10%; (c) Density = 15%; (d) Density = 20%; (e) Density = 25%; (f) Density = 30%.
Therefore, HyResNetCF is more robust and efficient than UCMF. Regardless of the sparsity level we choose, HyResNetCF achieves better prediction performance using only a small number of training epochs.
To give a more intuitive explanation of the advantages of deep GCF models, we visualize the service latent factors of UCMF, NLMF, and HyResNetCF, using 5% and 30% sparsity levels. We use t-SNE to map the 10-dimensional service latent factors to 3-dimensional space. From Figure 7, we can see the following:
(i) If the training set is larger, i.e., the sparsity level is 30%, each model can capture distinguishable structures.
(ii) When the sparsity level is 5%, the training set is very small. UCMF learns poor latent factors with a disordered structure. However, NLMF and HyResNetCF can still capture clearer structures.
(iii) NLMF and HyResNetCF learn different latent factor structures, yet both of them achieve similar prediction performance.
Figure 7: (a) UCMF, density = 5%; (b) NLMF, density = 5%; (c) HyResNetCF, density = 5%; (d) UCMF, density = 30%; (e) NLMF, density = 30%; (f) HyResNetCF, density = 30%.
4.3. Impact of Network Depth
To see how network depth influences the prediction performance, we change the number of CRUs of ResNetCF model and the size of for each CRU. Here, if we use CRUs, and the size of tFactor, i.e., , is 8, then the size of is 16, the size of is 32, and so on. Therefore, we test the prediction performance, given different and size of tFactor. The results are shown in Figures 8 and 9 and Tables 4 and 5.


Figure 8: (a) MAE w.r.t. number of CRUs; (b) NMAE w.r.t. number of CRUs.
Figure 9: (a) MAE w.r.t. number of residual units; (b) NMAE w.r.t. number of residual units.
Due to their extremely large size, models with tFactor size greater than 8 use at most 7 CRU layers. It is too slow to train deeper networks in a limited period of time, but 7 CRU layers are sufficient to verify the impact of network depth. Therefore, Figure 8 and Tables 4 and 5 provide the performance of at most 7 CRUs. However, if the size of tFactor is equal to 8, we need more CRUs to observe the change in performance. Thus, in Figure 9 we plot the prediction performance with respect to the number of CRUs, from 1 to 10, for the model whose tFactor size is 8. Note that # tFactor in Tables 4 and 5 refers to the tFactor size. We can make the following observations:
(i) As the number of CRUs increases, the performance becomes much better, but it degrades when the number of CRUs exceeds some threshold value. Therefore, we should choose an appropriate network depth.
(ii) It seems that models with smaller tFactor size can achieve better performance using a deeper architecture. When setting the tFactor size to 8, we get the lowest MAE and NMAE using 7 CRUs.
(iii) If we use a shallower architecture, a larger tFactor size can achieve better prediction performance.
5. Related Work
Collaborative filtering (CF) algorithms have been widely used in recommender systems. To select the best items for the users, CF usually computes ratings that estimate how interested the users are in the items and then selects the top items in terms of the ratings. Predicting QoS values is mostly the same as predicting ratings in recommender systems. Users have only partial knowledge of the QoS properties of all the candidate services. We can build the user-service matrix just like the rating matrix and apply collaborative filtering to make predictions [25–27].
Memory based CF fits the models by directly estimating the parameters using some heuristic algorithms. In fact, memory based CF models are generalized k-nearest-neighbors (KNN) algorithms [28–31]. Generally, there are two types of memory based CF models: user-based CF and item-based CF. User-based CF utilizes the most similar other users to predict how much the user potentially likes a specific item, while item-based CF makes a recommendation according to the user’s past experience with similar items. Most improvements to memory based CF models concern how to design an appropriate similarity function for a specific task. Xiaokun Wu et al. computed the similarity between users or between items by comparing the attribute values directly. Huifeng Sun et al. proposed a similarity measure named normal recovery (NR), unifying the similarity of the scaled user vectors (or item vectors) in different multidimensional vector spaces. Other studies also provide their own similarity measures based on different assumptions and constraints. Because it lacks the ability to make complex assumptions and integrate side information, memory based CF usually cannot achieve higher prediction accuracy than model based CF.
Model based CF, as a special machine learning algorithm, is a generalized regression or classification method that estimates the coefficients from data. Unlike in typical supervised machine learning problems, there are no obvious raw features for data represented by the rating matrix. Therefore, matrix factorization has become the main technology to handle such data [32–38]. The main difference between traditional matrix factorization approaches is how they represent the latent vectors. Probabilistic Matrix Factorization (PMF) supposes that the rating given a specific user and a specific item follows a Gaussian distribution, and that the user latent vector and item latent vector have zero-mean Gaussian distributions [39]. Nonnegative Matrix Factorization (NMF) learns the optimal nonnegative latent factors from data, which usually deals with the task of rating prediction on explicit feedback [40]. Generally speaking, nonnegative matrix factorization is used to deal with implicit feedback, where the dataset only includes whether the users are interested in some items but does not include whether the users dislike some items. For explicit feedback, the ratings are usually normalized by subtracting the mean, so nonnegative matrix factorization is not suitable. Our task is predicting QoS values, which belong to explicit feedback. Furthermore, our GCF framework is a deep neural network architecture. Although we can add a constraint that the output of the embedding layer is nonnegative, the output of the layers above could be any value again. So the nonnegative constraint has little influence on the GCF framework. However, we can easily use dropout to avoid overfitting for the GCF models. Some studies also integrate matrix factorization with memory-based CF algorithms. Even though matrix factorization CF algorithms have state-of-the-art performance, capturing only linear interactions cannot yield further significant improvement.
To find nonlinear interactions efficiently, some recent studies have proposed deep learning based collaborative filtering technologies. Two remarkable examples are Google’s Wide & Deep [12] and Microsoft’s Deep Crossing [13]. Both are generalized linear regression/classification models in which the interaction between features is represented by deep neural networks, such as an MLP or a Residual Network. However, they are designed for tasks with many features, whose interactions involve more than just the user and the item. Neural Collaborative Filtering (NCF) is designed purely for user and item interactions [15]. It combines the linear interaction and the nonlinear interaction by applying embeddings and the multiplication of embedding latent vectors. However, these three studies focus on classification tasks. Our work is motivated by NCF, but we focus on regression tasks, which have different evaluation protocols. In our experiments, our models clearly improve service QoS prediction.
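The idea of combining a linear and a nonlinear interaction can be sketched as a single forward pass. This is a simplified, NCF-style illustration under our own assumptions (one hidden ReLU layer, a linear output for regression, and hypothetical helper names), not the architecture of NCF or of the GCF models themselves.

```python
import random


def relu(x):
    return [max(0.0, v) for v in x]


def dense(x, W, b):
    """Affine layer: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_dense(n_in, n_out, rng):
    W = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def hybrid_forward(p_u, q_s, hidden, out):
    """Predict a real-valued QoS from a user embedding and a service embedding."""
    linear = [a * b for a, b in zip(p_u, q_s)]      # element-wise product
    nonlinear = relu(dense(p_u + q_s, *hidden))     # MLP on the concatenation
    return dense(linear + nonlinear, *out)[0]       # linear output (regression)


rng = random.Random(0)
p_u, q_s = [1.0, 2.0], [0.5, -1.0]
hidden = init_dense(4, 3, rng)

# If the output layer sums the linear part and ignores the hidden units,
# the model reduces to the dot product used by matrix factorization.
mf_out = ([[1.0, 1.0, 0.0, 0.0, 0.0]], [0.0])
mf_prediction = hybrid_forward(p_u, q_s, hidden, mf_out)

# A general output layer mixes both kinds of interaction.
general_prediction = hybrid_forward(p_u, q_s, hidden, init_dense(5, 1, rng))
```

The `mf_out` case makes the relationship explicit: plain matrix factorization is recovered by a particular choice of output weights, while other weights let the hidden layer contribute nonlinear interaction features.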
Another way of modeling nonlinear interactions is kernel matrix factorization, which is similar to the NLMF method in our GCF framework [41, 42]. However, kernel matrix factorization is a shallow architecture with limited learning capacity. NLMF is more flexible, because the network depth can be adjusted to increase or reduce the learning capacity. Furthermore, kernel matrix factorization needs a matrix inversion in each training iteration, which makes it hard to train on large datasets. Finally, NLMF can benefit from newer regularization technologies, such as dropout and batch normalization, which are more powerful than the L1 or L2 regularizers used in kernel matrix factorization.
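For readers unfamiliar with dropout, the following is a minimal sketch of the commonly used "inverted dropout" formulation applied to one layer's activations. The function name and the rate of 0.5 are illustrative assumptions and do not reflect the configuration used in the experiments.

```python
import random


def dropout(x, rate, training, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(x)  # dropout is disabled at prediction time
    keep = 1.0 - rate
    # Dividing by `keep` preserves the expected value of each activation.
    return [v / keep if rng.random() < keep else 0.0 for v in x]


rng = random.Random(0)
at_inference = dropout([1.0, 2.0], 0.5, False, rng)
at_training = dropout([1.0] * 10000, 0.5, True, rng)
```

Unlike an L1 or L2 penalty, which shrinks the weights directly, dropout perturbs the activations during training, so it can be attached to any layer of a deep model without changing the loss function.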
6. Conclusion
Due to the wide range of QoS values, traditional collaborative filtering methods, especially matrix factorization models, cannot capture the complex structures of user-service interactions. We give a formal description of the MFHBV problem of traditional matrix factorization models and then provide a more general collaborative filtering framework, called the GCF framework. Traditional matrix factorization is just a special case of the GCF framework. Instead of using the dot product of latent factors to make predictions, we use nonlinear transformation functions to obtain high-level interaction features and thereby increase the capacity of the models. At the same time, we use more complex regularization technologies to reduce the variance of the models.
There are three types of GCF framework, which use different interaction features to predict QoS values. We design five instances of these three framework types, using different neural network architectures and including the traditional matrix factorization model. In particular, we present a novel neural unit, called the crossing residual unit (CRU). Generalization power is usually estimated by the prediction performance on the test set. In this paper, two GCF models, ResNetCF and HyResNetCF, obtain the lowest MAE and NMAE on the test set, which suggests that using CRUs gives models better generalization power. We use dropout as the regularization method in GCF models. Finally, we conduct extensive experiments on a real-world dataset. The experimental results show that our GCF models (not including UCMF) outperform the traditional collaborative filtering methods. Furthermore, we give some intuitive explanations of why GCF models have lower test error and why they capture the user-service interaction better. To verify whether deeper neural networks can achieve better performance, we design several ResNetCF architectures. Adding more layers appears to improve the prediction accuracy, but only up to some threshold number of layers.
However, the experiments focus only on predicting response time. The values range from 0 to 19 s, and most of them are around 3 s. In future work, we will conduct more experiments on QoS datasets whose values have a much wider range.
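The two error metrics named above can be sketched as follows. We assume the definition common in the QoS prediction literature, where NMAE is the MAE divided by the mean of the observed test values; the function names and sample values are illustrative and the paper's own formulas are given in its evaluation section.

```python
def mae(actual, predicted):
    """Mean absolute error between observed and predicted QoS values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmae(actual, predicted):
    """MAE normalized by the mean observed value (assumed definition)."""
    return mae(actual, predicted) / (sum(actual) / len(actual))


# Made-up response times in seconds.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
test_mae = mae(actual, predicted)
test_nmae = nmae(actual, predicted)
```

Normalizing by the mean observed value makes errors comparable across QoS attributes with different ranges, which matters when the range of values is as wide as it is for response time.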
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61602399 and No. 61502410) and the Shandong Provincial Natural Science Foundation, China (No. ZR2016FB22).
References
[1] F. Chen, C. Lu, H. Wu, and M. Li, “A semantic similarity measure integrating multiple conceptual relationships for web service discovery,” Expert Systems with Applications, vol. 67, pp. 19–31, 2017.
[2] J. Serrano-Guerrero, J. A. Olivas, F. P. Romero, and E. Herrera-Viedma, “Sentiment analysis: A review and comparative analysis of web services,” Information Sciences, vol. 311, pp. 18–38, 2015.
[3] J. J. Samper Zapater, D. M. Llidó Escrivá, F. R. Soriano García, and J. J. Martínez Durá, “Semantic web service discovery system for road traffic information services,” Expert Systems with Applications, vol. 42, no. 8, pp. 3833–3842, 2015.
[4] N. Temglit, A. Chibani, K. Djouani, and M. A. Nacer, “A Distributed Agent-Based Approach for Optimal QoS Selection in Web of Object Choreography,” IEEE Systems Journal, 2017.
[5] M. Wu, X. Xiong, J. Ying, C. Jin, and C. Yu, “QoS-driven global optimization approach for large-scale web services composition,” Journal of Computers, vol. 6, no. 7, pp. 1452–1460, 2011.
[6] Z. Zheng, X. Wu, Y. Zhang, M. R. Lyu, and J. Wang, “QoS ranking prediction for cloud services,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 6, pp. 1213–1222, 2013.
[7] X. Yang, Y. Guo, Y. Liu, and H. Steck, “A survey of collaborative filtering based social recommender systems,” Computer Communications, vol. 41, pp. 1–10, 2014.
[8] J. Wei, J. He, K. Chen, Y. Zhou, and Z. Tang, “Collaborative filtering and deep learning based recommendation system for cold start items,” Expert Systems with Applications, vol. 69, pp. 1339–1351, 2017.
[9] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, “An efficient nonnegative matrix-factorization-based approach to collaborative filtering for recommender systems,” IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
[10] A. Hernando, J. Bobadilla, and F. Ortega, “A non negative matrix factorization for collaborative filtering recommender systems based on a Bayesian probabilistic model,” Knowledge-Based Systems, vol. 97, pp. 188–202, 2016.
[11] P. Pirasteh, D. Hwang, and J. J. Jung, “Exploiting matrix factorization to asymmetric user similarities in recommendation systems,” Knowledge-Based Systems, vol. 83, no. 1, pp. 51–57, 2015.
[12] C. Heng-Tze, K. Levent, H. Jeremiah, and T. Shaked, “Wide & Deep Learning for Recommender Systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–20, 2016.
[13] Y. Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. C. Mao, “Deep crossing: Web-scale modeling without manually crafted combinatorial features,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, pp. 255–262, USA, August 2016.
[14] S. Zhang, L. Yao, and A. Sun, “Deep Learning based Recommender System: A Survey and New Perspectives,” Computer Science: Information Retrieval, 2017.
[15] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua, “Neural Collaborative Filtering,” in Proceedings of the 26th International Conference, pp. 173–182, Perth, Australia, April 2017.
[16] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[17] S. Wager, S. Wang, and P. Liang, “Dropout training as adaptive regularization,” in Proceedings of the 27th Annual Conference on Neural Information Processing Systems, NIPS 2013, USA, December 2013.
[18] W. Shang, J. Chiu, and K. Sohn, “Exploring normalization in deep residual networks with concatenated rectified linear units,” in Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pp. 1509–1516, USA, February 2017.
[19] K. Hara, D. Saito, and H. Shouno, “Analysis of function of rectified linear unit used in deep learning,” in Proceedings of the International Joint Conference on Neural Networks, IJCNN 2015, Ireland, July 2015.
[20] P. Wang, R. Ge, X. Xiao, Y. Cai, G. Wang, and F. Zhou, “Rectified-Linear-Unit-Based Deep Learning for Biomedical Multilabel Data,” Interdisciplinary Sciences: Computational Life Sciences, vol. 9, no. 3, pp. 419–422, 2017.
[21] F. Chollet, “Keras: the Python deep learning library,” https://keras.io/.
[22] “WS-DREAM: A Package of Open Source-Code and Datasets to Benchmark QoS Prediction Approaches of Web Services,” https://github.com/wsdream.
[23] Z. Zheng, H. Ma, M. R. Lyu, and I. King, “WSRec: a collaborative filtering based web service recommender system,” in Proceedings of the IEEE International Conference on Web Services (ICWS '09), pp. 437–444, IEEE, July 2009.
[24] Z. Zheng, H. Ma, M. R. Lyu, and I. King, “QoS-aware web service recommendation by collaborative filtering,” IEEE Transactions on Services Computing, vol. 4, no. 2, pp. 140–152, 2011.
[25] X. Wu, B. Cheng, and J. Chen, “Collaborative Filtering Service Recommendation Based on a Novel Similarity Computation Method,” IEEE Transactions on Services Computing, vol. 10, no. 3, pp. 352–365, 2017.
[26] J. Liu, M. Tang, Z. Zheng, X. Liu, and S. Lyu, “Location-aware and personalized collaborative filtering for web service recommendation,” IEEE Transactions on Services Computing, vol. 9, no. 5, pp. 686–699, 2016.
[27] D. Margaris, C. Vassilakis, and P. Georgiadis, “An integrated framework for adapting WS-BPEL scenario execution using QoS and collaborative filtering techniques,” Science of Computer Programming, vol. 98, pp. 707–734, 2015.
[28] A. Bellogín, P. Castells, and I. Cantador, “Neighbor Selection and Weighting in User-Based Collaborative Filtering,” ACM Transactions on the Web (TWEB), vol. 8, no. 2, pp. 1–30, 2014.
[29] Z. Jia, Y. Yang, W. Gao, and X. Chen, “User-based collaborative filtering for tourist attraction recommendations,” in Proceedings of the IEEE International Conference on Computational Intelligence and Communication Technology (CICT '15), pp. 22–25, February 2015.
[30] H. Sun, Z. Zheng, J. Chen, and M. R. Lyu, “Personalized web service recommendation via normal recovery collaborative filtering,” IEEE Transactions on Services Computing, vol. 6, no. 4, pp. 573–579, 2013.
[31] J. Wu, L. Chen, Y. Feng, Z. Zheng, M. C. Zhou, and Z. Wu, “Predicting quality of service for selection by neighborhood-based collaborative filtering,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 2, pp. 428–439, 2013.
[32] Z. Zheng, H. Ma, M. R. Lyu, and I. King, “Collaborative web service QoS prediction via neighborhood integrated matrix factorization,” IEEE Transactions on Services Computing, vol. 6, no. 3, pp. 289–299, 2013.
[33] J. Zhu, P. He, Z. Zheng, and M. R. Lyu, “Online QoS Prediction for Runtime Service Adaptation via Adaptive Matrix Factorization,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 10, pp. 2911–2924, 2017.
[34] R. Zhu, D. Niu, and Z. Li, “Robust web service recommendation via quantile matrix factorization,” in Proceedings of the 2017 IEEE Conference on Computer Communications, INFOCOM 2017, USA, May 2017.
[35] Y. Zhang, Z. Zheng, and M. R. Lyu, “Exploring latent features for memory-based QoS prediction in cloud computing,” in Proceedings of the 30th IEEE International Symposium on Reliable Distributed Systems (SRDS '11), pp. 1–10, IEEE, Madrid, Spain, October 2011.
[36] D. Yu, Y. Liu, Y. Xu, and Y. Yin, “Personalized QoS prediction for web services using latent factor models,” in Proceedings of the 11th IEEE International Conference on Services Computing (SCC '14), pp. 107–114, July 2014.
[37] Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 426–434, New York, NY, USA, August 2008.
[38] W. Lo, J. Yin, S. Deng, Y. Li, and Z. Wu, “An Extended Matrix Factorization Approach for QoS Prediction in Service Selection,” in Proceedings of the 2012 IEEE International Conference on Services Computing (SCC), pp. 162–169, Honolulu, HI, USA, June 2012.
[39] R. Salakhutdinov and A. Mnih, “Probabilistic Matrix Factorization,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1257–1264, 2007.
[40] D. D. Lee and H. S. Seung, “Algorithms for nonnegative matrix factorization,” in Advances in Neural Information Processing Systems, pp. 556–562, 2000.
[41] X. Liu, C. Aggarwal, Y.-F. Li, X. Kong, X. Sun, and S. Sathe, “Kernelized matrix factorization for collaborative filtering,” in Proceedings of the 16th SIAM International Conference on Data Mining 2016, SDM 2016, pp. 378–386, USA, May 2016.
[42] S. Rendle and L. Schmidt-Thieme, “Online-updating regularized kernel matrix factorization models for large-scale recommender systems,” in Proceedings of the 2008 ACM Conference on Recommender Systems, pp. 251–258, 2008.
Copyright
Copyright © 2018 Wenming Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.