Abstract

Service recommendation has become one of the most effective approaches to quickly extract insightful information from big educational data. However, the sparsity of the educational service quality data (from multiple platforms or parties) used to make service recommendations often leads to few or even no recommended results. Moreover, to protect sensitive business information and comply with the law, preserving user privacy during this multisource data integration process is an important but challenging requirement. Considering these challenges, this paper integrates Locality-Sensitive Hashing (LSH) with hybrid Collaborative Filtering (HCF) techniques for robust and privacy-aware data sharing between the different platforms involved in the cross-platform service recommendation process. Furthermore, to minimize the “False negative” recommended results incurred by LSH and improve the success rate of recommendation, we propose two optimization strategies that reduce the probability that similar neighbours of a target user or similar services of a target service are overlooked by mistake. Finally, we conduct a set of experiments on a real distributed service quality dataset, i.e., WS-DREAM, to validate the feasibility and advantages of our proposed recommendation approach. The experimental results show that our proposal performs better than three competitive methods in terms of efficiency, accuracy, and success rate while guaranteeing privacy-preservation.

1. Introduction

With the advent of the Web of Things (WoT), tremendous computing resources or services (e.g., web APIs) are emerging rapidly on the Web [1-4], imposing a heavy burden on the service selection decisions of target users in the education domain. In this situation, various lightweight service recommendation techniques, e.g., Collaborative Filtering (CF), have been proposed to alleviate this service selection burden. Typically, by analysing the historical service usage data (e.g., the quality data of services invoked by users), a recommender system can capture the personalized preferences of a user and output appropriate services to him/her; this way, complex requirements from the user can be satisfied [5-7].

However, in the big data environment, the recommendation bases for educational decisions are sometimes not centralized but are distributed across multiple platforms [8, 9]. Considering the example in Figure 1, user u1 invoked web service s1 from platform P1 and user u2 invoked web service s2 from platform P2. Thus previous service quality values of s1 and s2 are recorded in platforms P1 and P2, respectively. In this situation, to make comprehensive and accurate service recommendations to the target user, it is necessary for the recommender system to integrate or fuse the distributed educational data across platforms P1 and P2 properly.

However, there are still several challenges in this data integration process. First, to protect sensitive business information [10, 11] and comply with the law, platform P1 is often reluctant to share its data with P2 and vice versa [12]. Such a cross-platform data sharing failure severely impedes the subsequent service recommendations. Besides, the possible sparsity of the service quality data [13, 14] stored in platforms P1 and P2 often leads to few (or even no) recommended results, which decreases the target user’s satisfaction significantly. Namely, the robustness of the recommender system is not as high as expected.

Considering the drawbacks, a time-efficient and privacy-preserving neighbour search technique, Locality-Sensitive Hashing (LSH), is employed for cross-platform service recommendations, so that the multiple platforms involved in the distributed recommendation process can share their data with each other efficiently and securely. Furthermore, we combine the LSH technique with hybrid CF (i.e., HCF, including user-based CF and item-based CF), to propose a novel privacy-preserving cross-platform service recommendation approach, named . Benefiting from the advantages of LSH in terms of search efficiency and privacy-preservation, our proposal can achieve a good trade-off among recommendation efficiency, accuracy, successful rate, and privacy-preservation.

In summary, our contributions are three-fold.

(1) We integrate the LSH technique with hybrid CF to guarantee efficient and secure data sharing between different platforms involved in the cross-platform service recommendations in a big educational data environment.

(2) Two solutions are suggested to reduce the probability of “False negative” results (i.e., high-quality recommended results that are overlooked by mistake) incurred by the inherent shortcoming of LSH and thereby increase the success rate of recommendation.

(3) A wide range of experiments are conducted on a real distributed service quality dataset, i.e., WS-DREAM, to validate the feasibility of our proposal. Experimental results show that our proposed approach outperforms the other state-of-the-art approaches.

The remainder of this paper is structured as follows. Related work is presented in Section 2. In Section 3, we introduce the preliminary knowledge of the LSH technique to be used in our approach. In Section 4, we introduce the details of our proposed privacy-preserving cross-platform service recommendation approach, i.e., . In Section 5, a set of experiments are conducted on WS-DREAM dataset to validate the feasibility of our proposal. Finally, in Section 6, we conclude the paper and discuss the future research directions.

2. Related Work

To the best of our knowledge, the existing privacy-preservation techniques adopted in the field of service recommendations can be divided into the following four categories: K-anonymity, data obfuscation, data decomposition, and Locality-Sensitive Hashing. Next, we introduce the related work from these four perspectives, respectively.

2.1. K-Anonymity

As an effective privacy-preservation technique, K-anonymity is successfully applied in [15] to protect the sensitive data of users. The authors in [16] employ the K-anonymity technique to generalize the location information that users left in the past so as to protect the users’ location privacy when making a recommendation decision. Generally, a larger K value means better privacy-preservation performance. However, as the K value becomes larger, the availability of the anonymized data is reduced significantly, thereby decreasing the accuracy of recommended results.

2.2. Data Obfuscation

The random data obfuscation technique has been proposed in various applications, where the real service quality data are replaced by obfuscated data so that the private information hidden in the real service quality data can be protected. However, as the data used to make recommendation decisions have already been obfuscated beforehand, the service recommendation accuracy is reduced accordingly. The Differential Privacy (DP) technique is employed in [17] to obfuscate the sensitive service quality data by noise injection so as to hide the real service quality data when making service recommendation decisions. However, the time complexity of the Differential Privacy technique is often high. Besides, when the service quality data for recommendation decisions are updated frequently, the accumulated noise grows increasingly large; in this situation, the data availability is reduced, which influences the accuracy of returned results to some extent.

2.3. Data Decomposition

In [18], the authors propose a data decomposition mechanism to achieve the privacy-preservation goal in service recommendations. Concretely, each piece of sensitive quality data is split into multiple segments that carry less private information; afterwards, these service quality segments are sent to different user clients for storage. Thus, when a user requests service recommendations, the multiple service quality segments kept by each user client are integrated for the subsequent recommendation decision-making process. As each user only possesses multiple service quality segments from different quality data, instead of the whole service quality data, the sensitive information from users is secured. However, this approach still fails to secure certain private information, for example, the intersection of services executed by different users.

2.4. Locality-Sensitive Hashing (LSH)

As an effective technique for quick neighbour search over massive and high-dimensional data, LSH has recently been introduced into service recommendation for privacy-preservation. In our previous work [19, 20], the LSH technique is combined with user-based CF to protect the sensitive service quality data engaged in the recommendation process. In [21], LSH is employed to build service indices in the distributed environment, so as to reduce the cross-platform data communication cost and improve the recommendation efficiency. However, these LSH-based service recommendation approaches do not consider the low success rate incurred by the possible sparsity of recommendation data. Moreover, they seldom study the “False negative” recommended results or the corresponding resolutions.

With the above analyses, we can conclude that existing privacy-preserving service recommendation approaches either fall short in the efficiency and the capability of privacy-preservation, or they probably overlook high-quality recommended results so that the users’ satisfaction degree is decreased. Considering these drawbacks, we integrate the LSH technique and hybrid CF in this paper to propose a novel privacy-preserving service recommendation approach named . The details of our proposal will be introduced in Section 4.

3. Preliminary Knowledge

In Section 3.1, we first formulate the privacy-preserving service recommendation problems to be addressed in this paper. Afterwards, in Section 3.2, we briefly introduce the rationale of the LSH technique to be used in our service recommendation approach.

3.1. Problem Formulation

To facilitate the following discussions, we introduce the symbols used in this paper below. U and WS denote the user set and the service set, respectively; utarget and starget denote a target user and a target service (i.e., a service preferred by the target user), respectively; q is a quality dimension of web services, e.g., response time or throughput (for simplicity, only one quality dimension is considered in this paper); qi,j denotes the quality of q of service sj (∈WS) ever invoked by user ui (∈U), and the qi,j data are often distributed across different platforms in the big data environment.

With the above formulation, our focused privacy-preserving service recommendation problems can be specified more formally as follows: recommend appropriate services from set WS to target user utarget based on the historical qi,j data across different platforms and meanwhile protect the real value of qi,j so that the users’ private information hidden in qi,j data is still secure.

3.2. Locality-Sensitive Hashing

Locality-Sensitive Hashing has been considered one of the most effective techniques for similar neighbour search due to the following two properties [22]. Here, A and B are two points in the original data space, and h(.) denotes an LSH function that is responsible for transforming points A and B into the corresponding hash values h(A) and h(B), respectively.

Property 1. If A and B are close in original data space, then they will be projected into the same bucket (i.e., h(A) = h(B)) after hashing with high probability.

Property 2. If A and B are not close in original data space, then they will be projected into different buckets (i.e., h(A) ≠ h(B)) after hashing with high probability.

Thus, inspired by these two properties, we can utilize the hash values h(A) and h(B) (with little or no privacy) to evaluate the approximation degree of original points A and B, without revealing the details of A and B. This way, the private information of points A and B can be protected.

4. : Service Recommendation Based on LSH and Hybrid CF

In this section, we introduce our proposed privacy-preserving service recommendation approach, i.e., . Concretely, in Section 4.1, we utilize the LSH technique and user-based CF to make service quality prediction; in Section 4.2, we utilize the LSH technique and item-based CF to make service quality prediction. Finally, in Section 4.3, we integrate the predicted results of Sections 4.1 and 4.2 and then make service recommendations accordingly.

4.1. Service Quality Prediction Based on LSH and User-Based CF

In this subsection, we utilize user-based CF and LSH to look for a target user’s similar neighbours (denoted by set Neighbour_set(utarget)) in a privacy-aware and scalable manner, and then the method makes service quality prediction based on the derived similar neighbours in Neighbour_set(utarget).

First, for any u ∈ U, the quality data over dimension q are converted into an n-dimensional quality vector u = (qu,1, …, qu,n). Here, qu,j denotes the quality value of q of sj invoked by user u (typically, qu,j = 0 if user u did not invoke sj in the past) and n is the number of candidate web services. Next, we introduce how to utilize the LSH technique to transform vector u, which carries much private information, into the corresponding user index h(u) with little privacy, based on a pre-selected LSH function h(.).

The concrete form of the LSH function h(.) depends heavily on the “distance” used for user similarity measurement; in other words, different types of similarity “distance” correspond to different kinds of LSH functions. As the Pearson Correlation Coefficient (PCC) is often utilized to calculate user similarity in existing recommender systems, we choose the LSH function corresponding to the PCC distance in this paper. More concretely, the LSH function h(.) in (1) is adopted [23]. Here, v is an n-dimensional vector (v1, …, vn), where each vj (1 ≤ j ≤ n) is a random value from a prescribed range, and the operator in (1) represents the dot product between two vectors. This way, through (1), we can transform vector u with much privacy into a Boolean value h(u) with little privacy.
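The transformation described above can be sketched as follows. This is only a minimal illustration under stated assumptions: equation (1) is not reproduced in this text, so the sign-of-dot-product form of h(.) and the sampling range [-1, 1] for each vj are assumptions, and the names make_hash_vector and lsh_bit are hypothetical.

```python
import random


def make_hash_vector(n, rng):
    """Draw a random n-dimensional vector v (range [-1, 1] is an assumption)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def lsh_bit(quality_vector, v):
    """Assumed form of h(.): 1 if the dot product is positive, otherwise 0."""
    dot = sum(q * w for q, w in zip(quality_vector, v))
    return 1 if dot > 0 else 0


rng = random.Random(42)
v = make_hash_vector(4, rng)

# Hypothetical response-time vectors; 0 means "service not invoked".
u1 = [0.3, 0.0, 1.2, 0.8]
u2 = [2 * x for x in u1]  # same direction as u1, so the sign of the dot product is shared

bit_u1 = lsh_bit(u1, v)
bit_u2 = lsh_bit(u2, v)
```

Only the single bit leaves the platform in this sketch; the quality vector itself is never shared, which is the property the privacy argument relies on.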

As LSH is essentially a probability-based similar neighbour search technique, one hash function h(.) is often not enough to find the similar neighbours of a target user accurately. In view of this observation, we amplify the performance of LSH by introducing r hash functions and L hash tables into the similar neighbour search process. Concretely, in each hash table, we build an index for user u, denoted by H(u) = (h1(u), …, hr(u)). Furthermore, two users u1 and u2 are regarded as similar iff the condition in (2) holds, where Hx(u1) and Hx(u2) denote the indices of u1 and u2 in the x-th hash table (i.e., Tablex), respectively.
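The effect of r and L on the search can be made concrete with the standard amplification formula for LSH. The sketch below is not taken from the paper: it assumes that each single hash bit agrees for two users independently with probability p, and the function name collision_probability is hypothetical.

```python
def collision_probability(p, r, L):
    """Probability that two users share a bucket in at least one of L tables.

    Assumes each of the r bits of an index agrees independently with
    probability p, so a full index matches with probability p ** r.
    """
    return 1.0 - (1.0 - p ** r) ** L


# Under this assumption, a larger r makes a match less likely (stricter filter),
# while a larger L makes a match more likely (more chances to collide).
strict = collision_probability(0.9, r=14, L=10)
stricter = collision_probability(0.9, r=16, L=10)
fewer_tables = collision_probability(0.9, r=14, L=6)
```

Under the independence assumption this is consistent with the later observation that a larger r yields a stricter search condition and therefore more overlooked neighbours.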

Likewise, for the target user, i.e., utarget, we can calculate his/her user index value Hx(utarget) in Tablex (x = 1, …, L), according to the same LSH functions and LSH tables. Then, through the condition in (2), we can determine the similar neighbours of utarget and put them into set Neighbour_set(utarget). The pseudocode of the above neighbour search process is presented in Algorithms 1 and 2, where Algorithm 1 is used to build the L hash tables for users offline and Algorithm 2 is used to search for the similar neighbours of the target user.

Algorithm 1: Building hash tables for users (offline)
Inputs:
    L: number of LSH tables
    r: number of LSH functions
Output: Table1, …, TableL
For x = 1, …, L do // build hash tables offline
    For k = 1, …, r do
        For i = 1, …, m do
            Build user sub-index hk(ui) based on random LSH function hk(.)
    For i = 1, …, m do
        Build user index Hx(ui) = (h1(ui), …, hr(ui))
    Build hash table Tablex from all the “ui → Hx(ui)” mappings
Return Table1, …, TableL

Algorithm 2: Neighbouring-user-search
Inputs: utarget // a target user
Output: Neighbour_set(utarget)
For x = 1, …, L do
    Find the bucket bt corresponding to Hx(utarget) in Tablex
    If ui ∈ bt and ui ≠ utarget
        Then put ui into Neighbour_set(utarget)
Return Neighbour_set(utarget)
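Algorithms 1 and 2 can be sketched in a few lines of Python. The sketch is illustrative only: the sign-of-dot-product hash and the range [-1, 1] are assumptions (equation (1) is not reproduced here), and build_tables, index_of, and neighbour_search are hypothetical names.

```python
import random
from collections import defaultdict


def index_of(quality_vector, hash_vectors):
    """Index H(u) = (h1(u), ..., hr(u)) as a tuple of bits (assumed sign hash)."""
    return tuple(
        1 if sum(q * w for q, w in zip(quality_vector, v)) > 0 else 0
        for v in hash_vectors
    )


def build_tables(vectors, L, r, seed=0):
    """Offline step: build L hash tables, each using r random hash vectors."""
    rng = random.Random(seed)
    n = len(next(iter(vectors.values())))
    tables, all_hash_vectors = [], []
    for _ in range(L):
        hash_vectors = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(r)]
        table = defaultdict(set)  # bucket index -> set of user ids
        for user_id, q in vectors.items():
            table[index_of(q, hash_vectors)].add(user_id)
        tables.append(table)
        all_hash_vectors.append(hash_vectors)
    return tables, all_hash_vectors


def neighbour_search(target_id, vectors, tables, all_hash_vectors):
    """Online step: collect users sharing a bucket with the target in any table."""
    neighbours = set()
    for table, hash_vectors in zip(tables, all_hash_vectors):
        bucket = table.get(index_of(vectors[target_id], hash_vectors), set())
        neighbours |= bucket - {target_id}
    return neighbours
```

In a cross-platform setting, each platform would compute the indices of its own users locally with shared hash vectors and exchange only the indices; how the hash vectors are distributed is outside the scope of this sketch.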

However, as LSH is a probability-based neighbour search technique, “False negative” search results are inevitable. In other words, some similar neighbours of a target user may be overlooked by mistake in the LSH-based neighbour search process described above. In view of this drawback, we propose two optimization strategies to reduce the “False negative” probability and improve the success rate of neighbour search. Next, we introduce these two strategies, respectively.

Strategy 1 (neighbour propagation (for users)). The neighbour relationship between different users is essentially determined by the similarity of user preferences, and user preference similarity obeys a kind of propagation rule. Let us consider the example in Figure 2 where three users and four web services are present. The user-service ratings (1~5) are shown in Figure 2(a), according to which we can determine the neighbour relationship between u1 and u2 as well as the neighbour relationship between u1 and u3. In this situation, we can infer that u2 and u3 are possible neighbours (marked with a dotted line in Figure 2(b)) as both of them share the same or similar preferences with u1, although u2 and u3 are not direct neighbours based on the user-service rating data in Figure 2(a). This way, through the neighbour propagation rule illustrated in Figure 2, we can find more possible neighbours of a target user (in an indirect manner) so as to reduce the “False negative” probability.

Next, we introduce how to integrate the neighbour propagation strategy (for users) into the abovementioned LSH-based neighbouring user search process. Concretely, if users ua and utarget are projected into an identical bucket in any of the L hash tables and users ua and ub are projected into an identical bucket in any of the L hash tables, then according to the neighbour propagation strategy (for users), we can infer that ub is a possible neighbour of utarget and put ub into Neighbour_set(utarget). The pseudocode of Strategy 1 is presented in Algorithm 3.

Algorithm 3: Neighbour propagation (for users)
Inputs:
    utarget // a target user
    Neighbour_set(ui) // before neighbour propagation (for users)
Output: Neighbour_set(utarget) // after neighbour propagation (for users)
For each ua ∈ Neighbour_set(utarget) do
    For each ub ∈ Neighbour_set(ua) do
        If ub ∉ Neighbour_set(utarget)
            Then put ub into Neighbour_set(utarget)
Return Neighbour_set(utarget)
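A minimal sketch of the propagation step follows. It applies a single hop of propagation and iterates over a copy of the direct neighbours, which is one possible reading of Algorithm 3; the function name propagate_neighbours and the dictionary layout are hypothetical.

```python
def propagate_neighbours(target, neighbour_sets):
    """Enlarge the target's neighbour set with the neighbours of its neighbours.

    neighbour_sets maps each user id to the set of its direct LSH neighbours.
    Only one hop of propagation is applied (an assumption of this sketch).
    """
    direct = set(neighbour_sets.get(target, set()))
    enlarged = set(direct)
    for ua in direct:
        for ub in neighbour_sets.get(ua, set()):
            if ub != target:
                enlarged.add(ub)
    return enlarged


# The situation of Figure 2: u1 is a direct neighbour of both u2 and u3.
neighbour_sets = {"u1": {"u2", "u3"}, "u2": {"u1"}, "u3": {"u1"}}
enlarged_u2 = propagate_neighbours("u2", neighbour_sets)
```

Here u3 is added to the neighbour set of u2 through u1, mirroring the dotted line in Figure 2(b).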

Strategy 2 (condition relaxation for neighbour search (for users)). According to the inherent characteristic of the LSH technique, the number of hash functions (i.e., r) plays an important role in the neighbour search process. Generally, a larger r value often means stricter filtering condition for neighbour search and thereby leads to higher probability of “False negative” search results. Considering this, we relax the search condition for neighbours of the target user to reduce the “False negative” probability. Next, we introduce the concrete condition relaxation process.

According to the neighbour search condition in (2), ui is regarded as a neighbour of utarget iff H(ui) = H(utarget) holds in any hash table, where H(ui)= (h1(ui), …, hr(ui)) and H(utarget) = (h1(utarget), …, hr(utarget)). Namely, all the r bit values in H(ui) are required to be equal to the r bit values in H(utarget), respectively. Hence, to relax the search condition for neighbours of utarget and guarantee high similarity between utarget and his/her neighbours ui, one bit difference between the indices of utarget and ui is permitted.

For example, if Hx(utarget) = (1, 1, 1) holds in hash table Tablex, then the neighbour ui’s index in Tablex is permitted to be (0, 1, 1) or (1, 0, 1) or (1, 1, 0). In other words, any user whose index is equal to (0, 1, 1) or (1, 0, 1) or (1, 1, 0) is a possible neighbour of utarget. This is the main idea of our proposed search condition relaxation strategy (for users). The pseudocode of Strategy 2 is presented in Algorithm 4, where Hx(utarget)k denotes the relaxed search condition (i.e., k-th bit is different from that of Hx(utarget)) for utarget in Tablex; for example, if Hx(utarget) = (1, 1, 1), then Hx(utarget)2 = (1, 0, 1) holds.

Algorithm 4: Condition relaxation for neighbour search (for users)
Inputs:
    utarget // a target user
Output: Neighbour_set(utarget) // after condition relaxation for neighbour search (for users)
For x = 1, …, L do
    For k = 1, …, r do
        Use Hx(utarget)k in place of Hx(utarget) // only the k-th bit of the original index is flipped
        Neighbouring-user-search (utarget, TB) // Algorithm 2
Return Neighbour_set(utarget)
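The relaxed search condition can be sketched for a single hash table as follows; repeating it over the L tables is left to the caller. The names relaxed_indices and relaxed_search are hypothetical, and the table layout (index tuple mapped to a set of user ids) is an assumption of this sketch.

```python
def relaxed_indices(index):
    """Yield every index that differs from `index` in exactly one bit."""
    for k in range(len(index)):
        yield index[:k] + (1 - index[k],) + index[k + 1:]


def relaxed_search(target_id, target_index, table):
    """Collect users whose index equals the target's or differs in one bit."""
    found = set(table.get(target_index, set()))
    for idx in relaxed_indices(target_index):
        found |= table.get(idx, set())
    found.discard(target_id)
    return found


# Hypothetical table: bucket index -> users in that bucket.
table = {(1, 1, 1): {"t", "a"}, (1, 0, 1): {"b"}, (0, 0, 1): {"c"}}
relaxed = relaxed_search("t", (1, 1, 1), table)
```

Each relaxed index is derived from the original index, so the bit flips never accumulate across iterations; user "c" differs in two bits and is therefore still excluded.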

Through Strategies 1 and 2, we can obtain an enlarged set of neighbours of the target user, i.e., Neighbour_set(utarget). Next, we make service quality predictions based on the elements in Neighbour_set(utarget). Concretely, for a web service sj never invoked by utarget before, its predicted quality over dimension q by utarget, denoted by qtarget,j, can be calculated by (3).
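Equation (3) is not reproduced in this text, so the following sketch should be read as one plausible prediction rule rather than the paper's formula: it assumes that the predicted quality is the plain average of the values observed by those neighbours who invoked the service. The name predict_user_based and the nested-dictionary layout are hypothetical.

```python
def predict_user_based(service, neighbours, quality):
    """Predict the target user's quality for `service` from neighbours' data.

    quality maps user id -> {service id: observed value}; a missing key means
    the user never invoked the service. Averaging is an assumption here.
    """
    observed = [
        quality[u][service]
        for u in neighbours
        if service in quality.get(u, {})
    ]
    if not observed:
        return None  # no neighbour invoked the service: no prediction possible
    return sum(observed) / len(observed)


quality = {"u1": {"s1": 0.4, "s2": 1.0}, "u2": {"s1": 0.6}}
predicted_s1 = predict_user_based("s1", {"u1", "u2"}, quality)
```

Returning None when no neighbour has invoked the service makes the failure case explicit, which is the situation the success rate in Section 5 is concerned with.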

4.2. Service Quality Prediction Based on LSH and Item-Based CF

Similar to Section 4.1, in this subsection, we first utilize item-based CF and LSH techniques to look for the similar services (named “neighbouring services”) of target service starget (denoted by set Neighbour_set(starget)) in a privacy-aware and scalable manner, and then make service quality predictions based on the elements in Neighbour_set(starget).

First, for any web service sj ∈ WS, its historical quality data over dimension q observed by users can be specified by an m-dimensional vector sj = (q1,j, …, qm,j), where qi,j (1 ≤ i ≤ m) denotes the quality value of q of service sj invoked by user ui and m is the number of users. Next, we utilize the LSH technique to transform vector sj with private information into a corresponding service index h(sj) with little privacy, based on the random LSH function h(.) in (1). Here, v is an m-dimensional real vector (v1, …, vm), where each vi (1 ≤ i ≤ m) is a random value from a prescribed range.

This way, we can transform vector sj with much privacy into a Boolean value h(sj) with little privacy.

Likewise, we amplify LSH by integrating r hash functions and L hash tables. Then, in each hash table, we build an index for service sj, denoted by H(sj) = (h1(sj), …, hr(sj)). Furthermore, two services s1 and s2 are regarded as neighbouring services iff the condition in (4) holds, where Hx(s1) and Hx(s2) denote the indices of services s1 and s2 in the x-th hash table (i.e., Tablex), respectively.

Then, through (4), we can find out the neighbouring services of starget and put them into Neighbour_set(starget). Note that if multiple target services are present, then it is necessary to repeat the above process for each target service to discover all the qualified neighbours. The pseudocode is presented in Algorithms 5 and 6, where Algorithm 5 is used to build the L hash tables for services offline and Algorithm 6 is used to search for the neighbouring services of the target service (repeat Algorithm 6 if multiple target services are present).

Algorithm 5: Building hash tables for services (offline)
Inputs:
    L: number of LSH tables
    r: number of LSH functions
Output: Table1, …, TableL
For x = 1, …, L do // build hash tables offline
    For k = 1, …, r do
        For j = 1, …, n do
            Build service sub-index hk(sj) based on random LSH function hk(.)
    For j = 1, …, n do
        Build service index Hx(sj) = (h1(sj), …, hr(sj))
    Build hash table Tablex from all the “sj → Hx(sj)” mappings
Return Table1, …, TableL

Algorithm 6: Neighbouring-service-search
Inputs: starget // a target service
Output: Neighbour_set(starget)
For x = 1, …, L do
    Find the bucket bt corresponding to Hx(starget) in Tablex
    If sj ∈ bt and sj ≠ starget
        Then put sj into Neighbour_set(starget)
Return Neighbour_set(starget)
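Since Algorithms 5 and 6 mirror Algorithms 1 and 2 with services in place of users, one way to realise them is to turn the user-by-service quality matrix into per-service column vectors and reuse the same hashing machinery. The helper below is a hypothetical sketch of that transposition step only.

```python
def service_vectors(user_vectors):
    """Turn {user: [q_1, ..., q_n]} rows into {service index: [q_1, ..., q_m]}.

    Users are ordered by sorted id so that every service vector uses the same
    user ordering; this ordering convention is an assumption of the sketch.
    """
    users = sorted(user_vectors)
    n = len(user_vectors[users[0]])
    return {j: [user_vectors[u][j] for u in users] for j in range(n)}


# Two users, three services: each service becomes a 2-dimensional vector.
columns = service_vectors({"u1": [1.0, 2.0, 3.0], "u2": [4.0, 5.0, 6.0]})
```

The resulting m-dimensional vectors can then be hashed and bucketed exactly as the user vectors are.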

However, similar to Section 4.1, “False negative” search results are also inevitable; in other words, certain real neighbouring services of a target service may be deemed non-neighbours. Considering this drawback, Strategies 3 and 4 (variants of Strategies 1 and 2 in Section 4.1) are proposed to reduce the “False negative” probability.

Strategy 3 (neighbour propagation (for services)). Let’s consider the example in Figure 3 where four users and three web services are present. The user-service ratings (1~5) are shown in Figure 3(a), according to which we can determine that s1 and s2 are neighbouring services and s1 and s3 are neighbouring services. In this situation, we can infer that s2 and s3 are possible neighbouring services (marked with dotted line in Figure 3(b)). Thus through the propagation rule illustrated in Figure 3, we can obtain more neighbouring services of a target service so that the “False negative” probability is reduced.

Next, we introduce how to integrate the neighbour propagation strategy (for services) into the abovementioned LSH-based neighbouring service search process. Concretely, if services sa and starget are projected into an identical bucket in any of the L hash tables and services sa and sb are projected into an identical bucket in any of the L hash tables, then according to the neighbour propagation strategy (for services), we can infer that sb is probably a neighbouring service of starget and hence put sb into Neighbour_set(starget). The pseudocode of Strategy 3 is presented in Algorithm 7.

Algorithm 7: Neighbour propagation (for services)
Inputs:
    starget // a target service
    Neighbour_set(sj) // before neighbour propagation (for services)
Output: Neighbour_set(starget) // after neighbour propagation (for services)
For each sa ∈ Neighbour_set(starget) do
    For each sb ∈ Neighbour_set(sa) do
        If sb ∉ Neighbour_set(starget)
            Then put sb into Neighbour_set(starget)
Return Neighbour_set(starget)

Strategy 4 (condition relaxation for neighbour search (for services)). Similar to Strategy 2, in Strategy 4, we relax the search condition for neighbouring services of the target service to reduce the “False negative” probability of search results. Concretely, according to the neighbouring service search condition in (4), service sj is regarded as a neighbouring service of starget iff all the r bit values in H(sj) are equal to the r bit values in H(starget), respectively. Therefore, to relax the search condition for neighbouring services of starget and meanwhile guarantee the high similarity between starget and its neighbouring services sj, one bit difference between the indices of starget and sj is permitted.

For example, if condition Hx(starget) = (1, 1, 1) holds in hash table Tablex, then any service whose index is equal to (0, 1, 1) or (1, 0, 1) or (1, 1, 0) is a possible neighbouring service of starget. This is the main idea of our proposed search condition relaxation strategy (for services). The pseudocode of Strategy 4 is presented in Algorithm 8, where Hx(starget)k denotes the relaxed search condition (i.e., the k-th bit is different from that of Hx(starget)) for starget in Tablex; e.g., Hx(starget)2 = (1, 0, 1) holds if Hx(starget) = (1, 1, 1).

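Algorithm 8: Condition relaxation for neighbour search (for services)
Inputs:
    starget // a target service
Output: Neighbour_set(starget) // after condition relaxation for neighbour search (for services)
For x = 1, …, L do
    For k = 1, …, r do
        Use Hx(starget)k in place of Hx(starget) // only the k-th bit of the original index is flipped
        Neighbouring-service-search (starget, TB) // Algorithm 6
Return Neighbour_set(starget)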

Through Strategies 3 and 4, we can obtain an enlarged set of neighbouring services of the target services, i.e., Neighbour_set(starget). Next, for each service sj never invoked by utarget, its predicted quality over dimension q rated by utarget, denoted by qtarget,j, is calculated by equation (5) where sj∈Neighbour_set(starget). Here, qtarget,target denotes the real service quality of starget observed by utarget. Furthermore, if service sj appears multiple times in Neighbour_set(starget), then the average predicted quality is adopted.

4.3. Aggregation of Predicted Service Quality and Service Recommendation

We aggregate the two quality values predicted by (3) and (5) into a comprehensive quality in (6). Here, quser and qitem denote the qtarget,j values predicted in (3) and (5), respectively; α and β (0 ≤ α, β ≤ 1 and α + β = 1) are the aggregation coefficients. Finally, we choose the service sj whose predicted value (i.e., qtarget,j in (6)) is the best and return it to utarget.
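The aggregation and selection step can be sketched as below. Equation (6) is not reproduced in this text; the weighted sum α·quser + β·qitem is assumed from the description of the coefficients, and falling back to the single available prediction when the other is missing is an additional assumption of this sketch. The names aggregate and recommend are hypothetical.

```python
def aggregate(q_user, q_item, alpha=0.5):
    """Weighted combination of the two predictions, with beta = 1 - alpha."""
    if q_user is None:
        return q_item  # assumption: fall back to the item-based prediction
    if q_item is None:
        return q_user  # assumption: fall back to the user-based prediction
    return alpha * q_user + (1.0 - alpha) * q_item


def recommend(predictions, smaller_is_better=True):
    """Return the service id with the best predicted quality, or None."""
    scored = {s: q for s, q in predictions.items() if q is not None}
    if not scored:
        return None  # recommendation failure: nothing could be predicted
    pick = min if smaller_is_better else max
    return pick(scored, key=scored.get)


# Response time is a "smaller is better" dimension.
best = recommend({"s1": aggregate(0.4, 0.8), "s2": aggregate(None, 0.3), "s3": None})
```

The flag smaller_is_better is needed because “best” depends on the quality dimension: lower is better for response time, while higher is better for throughput.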

5. Experiments

In this section, we conduct a group of experiments to validate the feasibility of our proposed approach in terms of service recommendation efficiency, accuracy, and success rate. Concretely, in Section 5.1, we introduce the experiment dataset and the configurations adopted for the experiments; in Section 5.2, experiment comparison results are presented; in Section 5.3, further discussions are given.

5.1. Experiment Configurations

Our experiments are based on a real distributed web service quality dataset WS-DREAM [24] that collects real-world service quality data from 339 users on 5825 web services (hosted in different countries). Each country that hosts a group of services is considered to be an individual platform for recommendation scenario simulation. Additionally, partial real values in the dataset are dropped for prediction needs. Moreover, only one quality dimension of services, i.e., response time, is considered in our experiments for simplicity. The target user is selected randomly from the user set in WS-DREAM, whose invoked services are regarded as the target services recruited in Section 4.2.

In order to validate the feasibility of our proposed approach, we test the time cost and MAE of our proposal and compare them with three other state-of-the-art recommendation approaches: UPCC [25], P-UIPCC [17], and PPICF [18]. Concretely, UPCC is the benchmark service recommendation approach based on user-based CF; P-UIPCC utilizes “divide-merge” operations over sensitive service quality data; in PPICF, the real service quality data are transformed into obfuscated data, and the obfuscated data are then used to make service quality predictions and service recommendations.

The experiments were conducted on a Dell laptop with 2.80 GHz processors and 2.0 GB RAM. The machine runs Windows XP and JAVA 1.5. Each experiment was carried out ten times, and the average experimental results were adopted finally.

5.2. Experiment Results and Analyses

In the experiments, five profiles are tested and compared to validate the feasibility of our proposal. Here, the matrix density denotes the density of the user-service quality matrix used to make service recommendations; L and r denote the number of hash tables and the number of hash functions, respectively; α = β = 0.5 holds in (6).

Profile 1 (computational time of four approaches w.r.t. matrix density). We measure the computational time of the recommendation process and the scalability of the four approaches with respect to the matrix density. The experimental parameters are set as follows: the matrix density is varied from 5% to 25%, L = 10, and r = 14. Experimental results are shown in Figure 4.

As the experimental results in Figure 4 indicate, the computational time of all four approaches increases with the growth of the service quality matrix density, because all the user-service quality data need to be considered in the four approaches and, therefore, more computational time is required when the quality matrix becomes denser. However, our proposed approach outperforms the other three approaches in terms of recommendation efficiency and scalability because most jobs in our approach (e.g., building the user indices) can be done offline before a service recommendation request arrives, while the time complexity of the remaining jobs (e.g., online neighbour search) is rather small. So generally, our proposal can satisfy the quick response requirements of target users.

Profile 2 (accuracy of returned results by four approaches w.r.t. matrix density). We test and compare the recommendation accuracy (i.e., MAE, the smaller the better) of the four approaches. The experimental parameters are set as follows: the matrix density is varied from 5% to 25%, L = 10, and r = 14. The comparison results are presented in Figure 5.

Figure 5 indicates that the accuracy of the results returned by P-UIPCC and PPICF is not high; the reason is that, in order to secure sensitive user privacy, the service quality data engaged in the recommendation process have already been obfuscated in P-UIPCC and PPICF. In contrast, our approach performs better than the other three approaches in terms of recommendation accuracy, because only the “most similar” neighbouring users and neighbouring services are returned by LSH and used to make service recommendations. Therefore, the recommendation accuracy is improved considerably.
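For reference, MAE (mean absolute error) is the average absolute difference between predicted and real quality values. The sketch below computes it over the services for which both values are available; the function name mean_absolute_error and the dictionary layout are hypothetical, and the numbers are illustrative rather than experimental results.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over services present in both mappings."""
    pairs = [
        (predicted[s], actual[s])
        for s in predicted
        if s in actual and predicted[s] is not None
    ]
    if not pairs:
        raise ValueError("no overlapping predictions to evaluate")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Illustrative values only: two predicted response times against the real ones.
mae = mean_absolute_error({"s1": 0.5, "s2": 1.0}, {"s1": 0.4, "s2": 1.3})
```

Services without a prediction are skipped here, so MAE alone does not capture recommendation failures; those are what the success rate measures separately.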

Profile (recommendation efficiency of our approach w.r.t. L and r). In this profile, we test the recommendation efficiency of our approach with respect to L and r. The parameters are set as follows: the matrix density is 25%, L is varied from 6 to 14, and r is varied from 8 to 14. The experimental results are shown in Figure 6.

As shown in Figure 6(a), the time cost of our proposal generally increases with L, because all L hash tables need to be traversed in order to find the similar neighbours of the target user by (2) and the similar neighbouring services of the target services by (4). In contrast, Figure 6(b) shows that the time cost decreases when r grows. This is because a larger r value often means a stricter search condition for neighbouring users or neighbouring services; therefore, fewer search results are obtained when r is large, and less time is needed to evaluate and rank them.
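The roles of L and r in the neighbour search can be illustrated with a small sketch. It assumes a sign-of-random-projection hash family and dense quality vectors; both are assumptions made for illustration, and all function names are hypothetical rather than part of our implementation. Building the tables corresponds to the offline stage, while a query traverses all L tables online.

```python
import random
from collections import defaultdict


def make_hash_family(dim, r, L, seed=0):
    """Draw L tables, each with r random projection vectors in [-1, 1]^dim."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(r)]
            for _ in range(L)]


def signature(vec, planes):
    """r-bit signature: one bit per projection, set when the dot product is positive."""
    return tuple(1 if sum(a * b for a, b in zip(vec, p)) > 0 else 0
                 for p in planes)


def build_index(vectors, family):
    """Offline stage: place every key into one bucket per hash table."""
    tables = [defaultdict(set) for _ in family]
    for key, vec in vectors.items():
        for table, planes in zip(tables, family):
            table[signature(vec, planes)].add(key)
    return tables


def query(vec, family, tables):
    """Online stage: traverse all L tables and union the matching buckets."""
    candidates = set()
    for table, planes in zip(tables, family):
        candidates |= table.get(signature(vec, planes), set())
    return candidates


# Hypothetical quality vectors: u2 is a positive multiple of u1, u3 is its negation.
vectors = {"u1": [1.0, 2.0, 3.0], "u2": [2.0, 4.0, 6.0], "u3": [-1.0, -2.0, -3.0]}
family = make_hash_family(dim=3, r=14, L=10, seed=42)
tables = build_index(vectors, family)
print(sorted(query([1.0, 2.0, 3.0], family, tables)))  # u3 never shares a bucket with u1
```

In this sketch the query cost grows with L because one bucket lookup is made per table, whereas a larger r lengthens each signature and thus shrinks the buckets that have to be evaluated and ranked afterwards.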

Profile (recommendation accuracy of our approach w.r.t. L and r). We test the recommendation accuracy of our proposed approach with respect to L and r. The experimental parameters are set as follows: the matrix density is 25%, L is varied from 6 to 14, and r is varied from 8 to 14. The experimental results are shown in Figure 7.

As Figure 7 shows, the recommendation accuracy of our approach increases (i.e., MAE drops) as L decreases and r grows. This is because a smaller L value or a larger r value often means a stricter search condition for neighbouring users and services; in this situation, only the “most similar” neighbouring users or neighbouring services are returned to make service recommendations. Therefore, the recommendation accuracy is improved accordingly.
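The effect of L and r on the strictness of the search condition can be made concrete with the standard LSH amplification formula: if two vectors agree on a single hash function with probability p, they share a bucket in at least one of L tables of r functions each with probability 1 - (1 - p^r)^L. The sketch below evaluates this formula; the function name and the value p = 0.9 are illustrative assumptions, not measured quantities.

```python
def candidate_probability(p, r, L):
    """Probability that two vectors collide in at least one of L tables,
    where each table concatenates r hash functions that agree with probability p."""
    return 1.0 - (1.0 - p ** r) ** L


# Illustrative single-function agreement probability (assumed, not measured).
p = 0.9
for L in (6, 10, 14):
    for r in (8, 14):
        print(f"L={L:2d} r={r:2d} -> {candidate_probability(p, r, L):.3f}")
```

The probability falls as r grows and as L shrinks, which matches the observation that such settings admit only the most similar neighbours.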

Profile (recommendation successful rate comparison). LSH is essentially a probability-based similar neighbour search technique; therefore, our proposed LSH-based service recommendation approach cannot guarantee that a satisfying recommended result is always returned to the target user. In other words, recommendation failure is inevitable. However, as discussed in Section 2, the hybrid CF method can reduce the failure rate to some extent. Therefore, in this profile, we test the recommendation successful rate of our proposal and compare it with the following two benchmark approaches: (1) LSH integrated with user-based CF (i.e., the DistSRLSH approach in [26]) and (2) LSH integrated with item-based CF.

Here, we define the successful rate of a recommendation approach as the ratio of the number of successful recommendations to the total number of recommendations; its value lies in [0, 100%]. The parameter settings are as follows: the matrix density is 25%, L = 1, and r is varied from 8 to 14. The experimental results are presented in Figure 8.
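The definition above amounts to a simple ratio, as the following minimal sketch shows. It assumes that a failed recommendation is represented by None; this representation and the function name are hypothetical choices made for illustration.

```python
def successful_rate(results):
    """Fraction of recommendation requests that returned a result (not None)."""
    if not results:
        raise ValueError("at least one recommendation request is required")
    successes = sum(1 for value in results if value is not None)
    return successes / len(results)


# Hypothetical outcomes of four requests: two predictions and two failures.
print(successful_rate([0.4, None, 1.2, None]))  # 2 successes out of 4 requests
```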

As Figure 8 shows, the successful rates of the three recommendation approaches all decrease as r grows. This is because a larger r value often means a stricter filtering condition for the search of neighbouring users and neighbouring services; therefore, the successful rate of recommendations is reduced accordingly. In other words, there is a trade-off between the successful rate and r; specifically, when r is large enough (e.g., when r = 14, 15…), the successful rate approaches 0. However, as Figure 8 shows, our approach still outperforms the other two approaches in terms of successful rate, because it recruits hybrid CF for recommendation and thereby integrates the advantages of both user-based CF and item-based CF.
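The reason why hybrid CF raises the successful rate can be sketched as follows: a hybrid prediction fails only when both the user-based and the item-based neighbour searches fail. The averaging rule and the function name below are assumptions chosen for illustration and are not the combination formula of our approach.

```python
def hybrid_predict(user_based, item_based):
    """Combine two optional predictions; None means the corresponding search failed."""
    available = [value for value in (user_based, item_based) if value is not None]
    if not available:
        return None  # both searches failed, so the recommendation fails
    # Assumed combination rule: plain average of the available predictions.
    return sum(available) / len(available)


print(hybrid_predict(2.0, None))   # falls back to the user-based prediction
print(hybrid_predict(None, None))  # fails only when both components fail
```

Under this reading, the set of requests that the hybrid method can answer is the union of those answered by its two components, so its successful rate is never lower than that of either component.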

5.3. Further Discussions

Our experiments adopt only one service quality dimension, i.e., response time, without considering the multiple dimensions that may exist [27–37] or their respective weights [38–44]. In future research, we will integrate the dimension and weight information into our approach to make it more comprehensive. Besides, only one type of service quality data is considered in the experiments. In the future, we will therefore further extend our proposal by considering the possible data diversity in the big data environment [45–50].

6. Conclusions

Collaborative service recommendation has become an effective technique to quickly extract insightful information from big educational data. However, traditional service recommendation approaches often assume that the service usage data used to make recommendations are centralized, without considering the multisource property of service usage data or the privacy leakage risks during multisource educational data integration. Besides, existing service recommendation approaches often suffer from low robustness due to possible data sparsity. In view of these drawbacks, we combine the LSH technique and hybrid Collaborative Filtering (HCF) for distributed service recommendations in the big data environment. Furthermore, to minimize the “false negative” recommended results incurred by the inherent shortcoming of LSH, two solutions are introduced in this paper to reduce the probability that similar users and similar services are overlooked by mistake and thereby enhance the success rate. A wide range of experiments deployed on a real-world dataset demonstrates the performance of our approach in terms of efficiency, accuracy, and successful rate while securing the sensitive user information.

However, only one quality dimension of web services is considered in the recommendation model, which is often not enough for practical recommendation requirements. In the future, we will further refine our work by considering multiple quality dimensions as well as their linear correlations [51–53] and nonlinear correlations [54–58]. Besides, data type diversity is another challenge in the big data environment. Therefore, in future research, we will continue to extend our proposal by integrating multisource data with diverse data types, e.g., discrete data [59–63], binary data [64], and fuzzy data [65–67].

Data Availability

The web service quality data used to support the findings of this study have been deposited in the WS-DREAM repository (http://inpluslab.com/wsdream/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the Natural Science Foundation of China (No. 61872219).