Abstract

With the advent of the mobile network, the fusion of cloud computing and fog computing is becoming feasible to promise lower latency and short-fat connection. However, there are a lot of redundant cloud-aware services with identical functionalities but a different quality of service (QoS) in the fog cloud environment. In fact, since QoS information is stored in distributed fog servers rather than remote cloud, it is hard for individuals to make recommendation and selection with sparse QoS information. Collaborative filtering is an important method for the sparsity problems and has been widely adopted on the prediction of missing QoS values. Focusing on the fact that existing researchers often ignore the QoS fluctuation in a wide range in the fog cloud environment, a novel neighbor-based QoS prediction method is proposed for service recommendation, in which a concept and calculation method is put forward to describe the stable status of services and users with quantifiable QoS values, and a NearestGraph algorithm is further designed to recognize stable or unstable candidate along with their popularity by a nearest neighbor graph structure which can help to make missing QoS values prediction in a certain order to improve final prediction accuracy. Experimental results confirm that the proposed method is effective in predicting unknown QoS values in terms of service recommendation accuracy and efficiency.

1. Introduction

As the development of mobile Internet and agility of distributed system services [1, 2], cloud computing is migrating to the fusion of cloud and fog computing since fog computing is able to better satisfy demands on lower latency and short-fat connection. At present, the composed distributed system [3] is becoming the main solution accepted by the majority [4]. However, a wide range of cloud-aware services is produced to cater for the fog cloud environment. In this situation, mobile users often feel confused to select proper cloud-aware services due to the appearance of redundant cloud-aware services with identical functionalities but a different quality of service (QoS) [5, 6]. Recommender systems are designed to address the suitable matching problem with mobile users and cloud-aware services under information overload.

The key to cloud-aware service selection and recommendation is QoS [7]. QoS is defined as a set of properties of specific cloud-aware services such as response-time, throughput, reputation, and the like, which is treated as an important criterion to distinguish among different functionally equivalent services [8]. In the fog cloud environment, QoS information is normally collected and stored in various fog servers, instead of being transferred to the remote cloud directly, due to the big volume of data and heavy transmission cost. In this situation, QoS information is always distributed but not centralized [9], which means QoS information is often sparse and unavailable for mobile users. Therefore, motivated by making an effective recommendation, it is a feasible way to complete missing QoS values by making predictions.

In fact, all roles in the fog cloud environment have the motivation to predict QoS before their assignments. A typical example of the fog cloud environment [1012] is shown in Figure 1, which includes three roles, service user, service broker, and service designer. The same problem is happening to each role on how to manage cloud-aware services with high-quality performance. For example, service users expect more qualified services which respond more quickly while meeting basic functions. In general, it is necessary for the three roles in the fog cloud environment to predict QoS values due to the following reasons: service user can only get a limited number of QoS values caused by time-and-money-consuming QoS invocation, which makes it difficult for cloud-aware service recommender to make a decision, service broker always has a strong desire to manage cloud-aware services with good performances, and service designer needs to deploy cloud-aware services that satisfy QoS constraints to avoid punishment. Therefore, QoS prediction is a critical issue for cloud-aware service deployment, selection, and recommendation.

At present, the studies on QoS prediction have made certain progress in recent years. Many scholars prefer to “fill” the unknown QoS values through historical QoS information and formulate it as a matrix completion problem [13]. Chen et al. [14] take user-service geographical location into account to improve prediction accuracy. Wang et al. [15] introduce more QoS values affecting aspects such as time and location. Wu et al. [16] answer this problem by considering the relationship between similarity and candidate’s consistency. Some other researchers devote them to finding solutions on how to improve the poor credibility of a fog cloud environment. Tang et al. [17] apply the trust concept for cloud-aware service QoS prediction. Su et al. [18] make a prediction for missing QoS values based on the trust relationship.

However, to the best of our knowledge, there is still a lack of research efforts explicitly targeting on the fluctuation of QoS values related to mobile users’ status and services’ status. In a highly dynamic Internet environment, QoS values of cloud-aware services often fluctuate in a large range due to the variety of users’ mobile networking environments and physical distance between mobile users and fog servers. There is a current situation that some services perform more “unstable” according to a study on the real world QoS[19]. We select two services in random which is invoked by 339 users and draw their QoS values distribution as shown in Figure 2. We all feel service in blue is more unstable compared with service in orange intuitively in Figure 2. Therefore, we can conclude that a cloud-aware service with a wide QoS range performance is not of general applicability and should not be recommended to other users since it is difficult to make an accurate prediction when candidate services with “unstable” performance are employed.

In this paper, the problem of QoS prediction is formulated to leverage historical QoS information. Inspired by the fact of QoS fluctuation, we propose a novel neighbor-based QoS prediction algorithm under the assumption that QoS values have a close relation with services and users in the fog cloud environment. In our approach, a concept and quantization method is put forward to represent the stable status of services and users in the fog cloud environment. And a graph structure is adopted to recognize stable or unstable candidate and to expose their popularity at the same time. Based on this, a NearestGraph method is used to generate an optimal prediction order to get the higher prediction accuracy.

The remainder of this paper is organized as follows. Section 2 introduces related works of QoS prediction and existing methods. Section 3 presents our proposed QoS prediction method for cloud-aware service recommendation. Section 4 provides our experimental results and the details of our experimental implementation. Section 5 sets out our conclusion and looks forward to future works.

At present, there are a lot of efforts and results devoted to tackling the issue of QoS prediction. Initially, scholars adopt static methods to make a QoS prediction. Static methods use the arithmetic average value for prediction, including average QoS value from global, user, and service, respectively. These methods are simple and easy to implement, disregarding the situation-aware factors of users and services. Moreover, these static methods cannot reflect the dynamic properties of QoS values, which are leading to greater prediction error between predicted and actual values according to our experimental results in Section 4.

Motivated by the success of traditional recommender systems, existing works on QoS prediction in the fog cloud environment is usually based on collaborative filtering. Collaborative filtering (CF) methods are widely used to rate prediction in recommender systems. It exploits the similarity between users’ experiences to predict user preference on unknown items. The intuitive idea is to identify “similar” users with the active user and to predict the active user’s preference based on these similar users’ feedback. CF can be further divided into two main categories: model-based method and neighbor-based method.

Model-based method makes a prediction from the known QoS values by learning a predictive model [20]. Observed values are used to learn two matrices which are the basis to calculate similarities among users. However, model-based method suffers from the ignorance of the low-rank structure of real world user-service matrices [21]. The main idea of the model-based method is based on matrix completion, in which the key is to exploit the low-rank structure of the user-service matrix. Lee et al. [22] present an algorithm for nonnegative matrix factorization indicating that there is only a small number of factors influencing the service performance. Some scholars think QoS has strong relation with time and put forward an online prediction. Zhu et al. [23] propose a method for running cloud-aware service to predict its QoS value.

The neighbor-based method uses QoS values of similar users or services to make QoS predictions directly. Shao et al. [24]first introduce a collaborative filtering approach for similarity mining and inference based on historical QoS information in the user-service matrix. They perform positive and negative user similarity calculations separately and integrate them using a weighted mean equation. Zhu et al. [25]give a QoS prediction approach based on multidimension, which takes timing constraints, QoS, throughput, fairness, and load balancing into account. Zheng et al. [5] propose a novel QoS ranking prediction model with the consideration of different cloud users having different preferences for different QoS attributes values.

In this paper, we mainly focus on the neighbor-based collaborative filtering since it is simple to implement and the prediction results are often easy to explain. The prediction accuracy of the neighbor-based method is highly influenced by the available similar candidates. Similar candidates play an effective role in QoS prediction phase mainly since they come from similar computation and assign more or less importance to the target in the prediction. However, the sparsity of QoS information always degrades the accuracy of QoS prediction. In our proposed approach, we address this challenge by introducing graph structure to expose candidate’s own popularity.

3. NearestGraph QoS Prediction

In this section, the problem of QoS prediction is described and formulated in Section 3.1. After that, both user-user similarity and service-service similarity are computed in Section 3.2 to select neighbors. Our NearestGraph algorithm is presented in Algorithms 1 and 2 to predict missing QoS values at last.

Input:  usrG: user nearest neighbor graph; : user-service matrix
Output:  : prediction order for user-based CF
 1: =;
 2: for each 0, usrG.vertices  do  add property usrWeight, usrIndegree to a graph usrG=(usrID,usrEdge)
 3:  usrG← usrWeight given by Eq.(6);
 4:  usrG← usrIndegrees given by usrG.inDegrees;
 5: end for
 6: repeat
 7:  select usrID from usrG.vertices where usrG.vertices.Weight.max in select from usrG.vertices where
  usrG.vertices.Indegree.min;
 8:  .append(usrID);
 9:  usrGusrIndegrees given by usrG.inDegrees;
 10: until usrG.vertices.count is 0
Input:  usrG: service nearest neighbor graph; : user-service matrix
Output:  : prediction order for service-based CF
 1: =;
 2: for each 0, serG.vertices  do  add property serWeight, serIndegree to a graph serG=(serID,serEdge)
 3:  serGserWeight given by Eq.(7);
 4:  serGserIndegrees given by serG.inDegrees;
 5: end for
 6: repeat
 7:  select serID from serG.vertices where serG.vertices.Weight.max in select from serG.vertices where
  serG.vertices.Indegree.min;
 8:  .append(serID);
 9:  serGserIndegrees given by serG.inDegrees;
 10: until  serG.vertices.count is 0
3.1. Problem Description

In this paper, the problem of QoS prediction is described as follows: considering a fog cloud environment with users and cloud-aware services, the QoS of cloud-aware services rated by users is represented as an matrix . The entry is a -dimensional vector representing the QoS values of criteria. Let , be the set of users, let , be the set of n cloud-aware services, is set of all tuples , and is set of all unknown tuples Then the missing information is filled based on the existing information The order of filling in the matrix is expressed as and there is an optimal order to satisfy the higher accuracy.

Figure 3 shows a matrix formed by users and cloud-aware services. The shaded part of the matrix indicates the user has invoked the cloud-aware service and has rated the corresponding QoS value. The blank part indicates the user has not invoked the cloud-aware service and the QoS values need to be predicted. The objective of the missing QoS value prediction is to make the user-service matrix denser within certain iteration phases [26].

Due to analysis of real world QoS datasets, QoS values can vary widely and are highly skewed with large variances that degrade the accuracy of prediction. Without loss of generality, we apply the following function to QoS data in order to map QoS values onto the interval where and are the minimum and maximum QoS values, respectively.

3.2. Neighbors Selection

We can find the neighborhood similarities of users and services by employing Pearson Correlation Coefficient (PCC). PCC is widely used in neighbor-based recommendation systems for similarity computation and proved to have high accuracy. In this paper, we adopt an enhanced-PCC method proposed by Zheng for the neighborhood similarity computation on both sets of users and services. The similarity between two users and is defined by the following equation:where sim falls into the interval , is the number of cloud-aware services that are invoked by the two users, and and are the number of cloud-aware services invoked by user and user , respectively. is a vector of QoS values of cloud-aware service observed by user and represents the average QoS values of different cloud-aware services observed by user .

Similar to the user similarity computation, we also employ enhanced-PCC to compute the similarity between cloud-aware service and as follows:where sim falls into the interval , is the number of users who invoked both cloud-aware services and previously, and and are the number of users who invoked cloud-aware services and , respectively. is a vector of QoS values of user when he invokes cloud-aware service and represents average QoS values of different users when they invoke cloud-aware service .

After the similarity computations, we can get the user similarity matrix and the service similarity matrix. At the same time, we can also identify their neighbors by similarity values in the ascending order. Traditional top-K algorithms select the top most similar neighbors for making missing value prediction. In practice, some neighbors with negative similarity values could greatly decrease the prediction accuracy. In this paper, we exclude dissimilar neighbors with negative enhanced-PCC values. We employ the following equation to find a set of proper similar users for user as :where is the ranking position of user in the similarity neighbors of user and indicates the lowest ranking position manually.

In the same way, we can get the set of proper similar cloud-aware services for cloud-aware service as :where is the ranking position of cloud-aware service in the similarity neighbors of cloud-aware service and indicates the lowest ranking position manually.

3.3. Predicting Missing QoS Values with NearestGraph

After user neighbors selection, we find an interesting fact that some users or services are relatively “popular” to others. For example, user is on the top-1 similarity ranking position of user . It also happens to user when user ranks top-1 in user ’s similar neighbors. User and User may be confronted with the same situation. This is not an occasional case but happens for most similar neighbors. In order to expose this kind of popularity of users or services, we construct a directed graph by nearest neighbor graph as shown in Figure 4.

In Figure 4, a user is represented by usrG=(usrID, usrEdge), in which usrID labels a user and usrEdge shows the relationship between the user and his most similar neighbor-a directed edge will line from a user to his most similar neighbor. Therefore, the indegree of a vertex in our nearest neighbor graph indicates the degree to which other vertices are in favor of this vertex. A vertex with larger indegree means it is very “popular” and will have a higher influence on other vertices. It can be understood as the relationship between celebrities and fans in social networking sites. A celebrity who has more fans means greater appeal, which reveals the greater influence at the same time. Similar method can be used to represent cloud-aware service by servG=(servID, servEdge), in which servID labels a service and servEdge shows the relationship of a service and his most similar neighbor.

Furthermore, to reflect the stability of different users and cloud-aware services as shown in Figure 2, a concept of candidate stability is also proposed. We employ the following equation to describe the stability of user’s status.

Similarly, we can describe the stability of cloud-aware service’s status as follows.where a smaller value of stability will indicate a more stable status.

In order to introduce the stability of users or services, we further extend above nearest neighbor graph to usrG=(usrID, usrWeight, usrEdge) and servG=(servID, servWeight, servEdge), respectively, in which usrWeigh is the stability of user’s status and servWeigh is the stability of service’s status.

In this paper, we believe both popularity and stability will play an important role in QoS prediction and should be used to obtain better prediction accuracy. Therefore, we propose an algorithm called NearestGraph to achieve this goal, which can generate an optimal prediction order by nearest neighbor graph based on different popularity and stability. The main strategies of NearestGraph are the following three key points.(1)Select the usrID with the minimum indegree.(2)Select the usrID with the maximum weight if more than one vertex has the same indegree.(3)Select the usrID with the minimum dictionary order if more than one vertex has the same indegree as well as weight.

The reason why we use those three rules comes from two facts: those vertices with larger indegree, which means they are more popular, will affect more users and should be kept longer in our nearest neighbor graph to make full use of their important influence; those vertices with larger weight, which means they are more stable, will have more positive impact on QoS prediction and should be kept longer in our nearest neighbor graph to make full use of their important influence. Now we take an example, shown in Figure 5, to illustrate the process of our NearestGraph algorithm.

Phase (a) is the initial state of nearest neighbor graph in which a property of vertex called weight is introduced to represent the status of stability and the directed edge is used to show the relationship between the user and his most similar neighbor. For example, vertex with weight 0.42 means it is more stable than vertex with weight 0.40, and vertex pointing to vertex expresses is the most similar neighbor of . Then we will decide which vertex will be predicted according to the weight and indegree shown in nearest neighbor graph. According to No. 1 strategy of NearestGraph, will be predicted first since it has minimum indegree. In the next phase (b), there are two vertices with the same indegree after applying No. 1 strategy. Here No. 2 strategy can help us to make a decision in such a situation. should be predicted in phase (b) for its high stability. Then we loop through the three-key-point strategies to obtain a complete prediction order until there is only one vertex left in the graph structure. We can get a prediction order : in this example. The details of our NearestGraph algorithm for are as Algorithm 1.

Based on the prediction order generated by Algorithm 1, user-based method employs the values of entries to predict the missing entry in the user-service matrix as follows:where and are the average existence QoS values of different cloud services rated by and , respectively.

We can also give the prediction order for in a similar way as shown in Algorithm 2. And the values from service prediction order are correspondingly employed for prediction in service-based method as follows:where and are the average existence QoS values of and rated by different users, respectively.

In this paper, both user-based and service-based approaches are adopted as follows:The mixed approach can help us to get much more missing QoS values and therefore can improve the accuracy of prediction. The parameter controls how much fusion proportion of these two methods and can be trained on a sample dataset from the real world.

The complete QoS prediction algorithm is summarized in Algorithm 3.

Input: : user-service matrix; : lowest ranking position; : degree of fusion prediction results
Output:
 1: for all   do
 2:  compute the similarity sim by Eq.(2)
 3:  compute the similarity sim by Eq.(3)
 4: end for
 5: for all   do
 6:  similar user set by Eq.(4)
 7:  similar service set by Eq.(5)
 8: end for
 9: Learn by applying Algorithm 1
 10: Learn by applying Algorithm 2
 11: for all   do
 12:  compute by Eq.(8)
 13:  compute by Eq.(9)
 14:  fill by Eq.(10)
 15: end for

4. Experiment

In this section, we evaluate the effectiveness of our proposed method on a distributed and parallel platform, Spark system. Section 4.1 introduces two typical metrics to assess the prediction accuracy. The comparison experiments on the prediction accuracy are conducted with different baseline algorithms in neighbor-based CF fields in Section 4.2 and three key parameters of NearestGraph on the prediction accuracy are further demonstrated in Sections 4.3, 4.4, and 4.5. All the experiments are conducted by using 4 PCs with i5-4460 CPU and 16G RAM as our hardware platform.

We evaluate the QoS prediction accuracy of our proposed method based on a real world QoS dataset which is widely used to evaluate the performance of QoS prediction. It contains response-time (response-time measures the time duration between user sending a request and receiving a response) and throughput (throughput stands for the data transmission rate of a user invoking a service) of 5828 services invoked by 339 distributed computers located in 30 countries from PlanetLab. According to statistics of this QoS dataset as shown in Table 1, the range of response-time and throughput are 0−20 s and 0−1000 kbps, respectively, and the means of response-time and throughput are 0.910 s and 47.386 kbps, respectively.

There are 100837 QoS records about response-time property and 143422 QoS records about throughput property in this QoS dataset. The corresponding user-service matrices on both these two QoS properties have some entries with the value of -1, which means the current QoS value cannot be obtained or the service is unreachable in the real world. Therefore, the entries with the value of -1 are where we need to predict in the matrix.

4.1. Metrics

To evaluate the performance of our proposed NearestGraph method, we compare its prediction accuracy with some neighbor-based CF methods by computing mean absolute error (MAE) and root-mean-square error (RMSE), which is to calculate the errors between predicted values and real values. The metric MAE is defined asand RMSE is defined aswhere is the QoS value of cloud-aware service observed by user , is QoS value of cloud-aware service that would be observed by user as predicted by a method, and is the number of predicted QoS values. According to the definitions, the smaller value of metric indicates the higher accuracy of prediction.

4.2. Performance Comparison

In this part, we conduct an overall comparison experiment on our NearestGraph method and some baseline algorithms in neighbor-based CF fields on both MAE and RMSE. They are listed as follows:

UMean: mean QoS values obtained by a user are used to predict the missing QoS value which has not been obtained by this user.

IMean: mean QoS values obtained by all users are used to predict the missing QoS value which has not been obtained by some users.

UPCC: it is a user-based collaborative filtering method, which uses similar users calculated by Pearson Correlation Coefficient to make a prediction [24].

IPCC: it is an item-based collaborative filtering method, which uses similar items calculated by Pearson Correlation Coefficient to make a prediction [27].

WSRec: it is a hybrid collaborative filtering method that combines IPCC and UPCC and uses both similar users and similar services for QoS prediction [5].

In order to simulate the users’ invocation of cloud-aware services in the real world, we remove some entries from user-service matrix in random and compare their values with predicted ones. For example, 10% represents that we randomly remove 90% entries and use the remaining 10% entries to predict the values of removed entries. The parameter settings of NearestGraph are and in the experiments.

Experiment results are shown in Table 2. We highlight the best performance of all methods for each row in Table 2. We can easily see from Table 2 that NearestGraph always obtains the minimum MAE and RMSE of response-time and throughput almost for all different matrix densities, which means it can improve the prediction accuracy. Moreover, with the value of matrix density increasing from 10% to 30%, the MAE and RMSE of NearestGraph method become smaller and smaller since a denser matrix will provide more information for the missing QoS value prediction.

Comparing the MAE and RMSE of response-time and throughput in Table 2, we can also find that the MAE and RMSE of response-time are larger than those of throughput which means NearestGraph performs better on throughput property than on response-time property. Taking the matrix density of 30% as an example, we can calculate the MAE improvements of NearestGraph over WSRec, respectively. For the response-time property, the MAE improvement is , while for the throughput property, the MAE improvement is . That confirms that our proposed method focuses on facts of the QoS fluctuation and can make a better performance in a wide range of QoS values (just like the range of throughput is 0-1000 kbps and the range of response-time is only 0-20 s).

4.3. Impact of Matrix Density

In order to explore the impact of matrix density, we compare the prediction accuracy of all the methods under different matrix densities and present the results in Figure 6. The density of the matrix increases from 10% to 30% with a step of 10%. The parameter settings in this experiment are and =0.5.

The MAE and RMSE results of response-time are shown in Figures 6(a) and 6(b) and the MAE and RMSE results of throughput are shown in Figures 6(c) and 6(d). In these figures, the green line NearestGraph stands for is always below any other lines, which means our proposed NearestGraph method gets the smallest values of MAE and RMSE under different matrix densities. Moreover, we can observe that the performance of our NearestGraph method improves with the increase of matrix density, which indicates that collecting more QoS information will greatly enhance prediction accuracy when the matrix is sparse.

4.4. Impact of

The parameter here controls how much fusion proportion of user-based and service-based method. A larger value of means user-based approach will contribute more to the hybrid prediction. In Figure 7, we study the impact of parameter in the proposed NearestGraph method on prediction accuracy by varying the values of from 0 to 1 with a step of 0.1 under the condition of =10.

Figures 7(a) and 7(b) show the MAE and RMSE results of response-time and throughput, respectively. The prediction accuracies increase when we increase the value of at first. But when surpasses a certain threshold, the prediction accuracy decreases with the further increase of . From Figure 7, we can also find that NearestGraph gets the best performance when .

4.5. Impact of

The parameter determines the size of candidates sets including similar users and similar services. In Figure 8, we study the impact of parameter in the proposed NearestGraph method on prediction accuracy by varying the values of from 2 to 20 with a step of 2 under the condition of =0.5.

Figures 8(a) and 8(b) represent the MAE and RMSE results of response-time and throughput, respectively. The experimental results show that our NearestGraph will achieve best prediction accuracy (minimum MAE and RMSE) when is set around 10. This is because too small value will exclude useful information from some similar candidates, while too large value will introduce noise from dissimilar candidates, which will impact the prediction accuracy.

5. Conclusion and Future Work

In the fog cloud environment, to reduce the data transmission cost from mobile users to the cloud, QoS information is often first handled by distributed fog servers instead of being sent to a remote cloud directly. However, such a cross-platform data distribution will lead to the sparsity of QoS information for service recommendation. Focusing on the fact that existing researches on missing QoS value prediction often ignore the QoS fluctuation in a wide range especially in the fog cloud environment, we propose a novel QoS prediction method by using NearestGraph algorithm for service recommendation. The key point of our approach proposed on the neighbor-based method is the construction of nearest neighbor graph which is designed to expose stable and popular candidates, and the choice of making prediction in a certain order, which applies priorities to different candidates instead of traversing candidates in random to promote the final accuracy. Through a set of experiments on a real world distributed service quality dataset WS-DREAM for stimulating the fog cloud environment, we validate the feasibility of our method in terms of service recommendation accuracy and confirm the motivation that NearestGraph can get a good performance in large fluctuation of QoS properties. In summary, the paper makes the following key contributions:(1)We emphasize the fact of real world QoS values fluctuation in a wide range and take it into account to solve the inaccuracy of predicting missing values.(2)We reveal the inner features of candidates behind neighbors and take their outer characteristic, stability, and popularity, in the fog cloud environment by constructing the nearest neighbor graph.(3)Graph structure is employed to develop prediction order and enhance prediction accuracy.

Currently we predict the values of different QoS attributes separately. And we are going to investigate on the correlations and combinations on the QoS attributes in the future. Furthermore, we will use time series analysis for prediction and extend NearestGraph to describe accurate user and service status in the fog cloud environment.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge the support received from the National Natural Science Foundation of China (61300176, 61473031, 61472029, and 61672086) and Fundamental Research Funds for the Central Universities (2016JBZ005).