Abstract

IoT service recommendation techniques can help a user select appropriate IoT services efficiently. Aiming at improving the recommendation efficiency and preserving the data privacy, the locality-sensitive hashing (LSH) technique is adopted in service recommendation. However, existing LSH-based service recommendation methods ignore the intrinsic temporal feature of IoT services. In light of this challenge, we integrate the temporal feature into the conventional LSH-based method and present a time-aware approach with the capability of privacy preservation for IoT service recommendation across multiple platforms. Experiments on a real-world dataset are conducted to validate the advantage of our proposed approach in terms of accuracy and efficiency in recommendation.

1. Introduction

The rapidly increasing number of IoT devices and services is continuously producing a vast amount of data. Among those data, quality of service (QoS) data is generated whenever a user invokes a service. Service recommendation techniques can employ this historical quality data to reduce users’ service selection burden and help them find appropriate services efficiently. Taking the collaborative filtering recommendation algorithm as an example, a recommender system predicts the quality value that a target user would experience when invoking a service from the quality data of similar users and then suggests a list of candidate services to the target user.

However, in the IoT environment, traditional recommendation algorithms are often inapplicable in practice [1]. One fundamental reason is that users may choose services from different platforms. For example, in the context of a smart home, a user may use a service provided by Hikvision to capture video data, a service from Huawei to monitor the air quality, and a service from Xiaomi to control household appliances. Therefore, the historical QoS data of users is not centralized but stored across different platforms [2, 3]. Moreover, because of conflicts in economic interests and data privacy concerns, service providers are unwilling to share their data with each other. Furthermore, the volume of the historical QoS data is often massive, so it is impractical to share such a large amount of data with other platforms.

Considering these challenges, the locality-sensitive hashing (LSH) algorithm [4] has been adopted in service recommender systems. Specifically, by hashing the high-dimensional historical quality data of users into scalar hash values, user privacy is preserved. Besides, as the hashing process is conducted individually on each platform, there is no need to transfer historical data.

However, most LSH-based service recommender systems view the historical QoS data as static and unique, rarely taking other context factors (e.g., time) into account. This may lead to less reasonable and less accurate recommendation results.

In light of the abovementioned challenges, we integrate the temporal dimension into the conventional LSH-based recommendation method and present a time-aware recommendation approach for IoT services across multiple platforms. The proposed approach achieves higher accuracy than conventional recommendation approaches that have the capability of privacy preservation. The contributions of this paper are summarized as follows.
(1) We improve the conventional LSH-based recommender system by incorporating a time factor, so as to adapt to the intrinsic feature of IoT services and achieve better performance in IoT service recommendation.
(2) We conduct extensive experiments on a real-world dataset to validate the advantage of our method. Experimental results demonstrate that the proposed method outperforms the other state-of-the-art approaches in both recommendation accuracy and efficiency, while preserving data privacy among multiple platforms.

The rest of this article is structured as follows. Section 2 reviews recent work on recommender systems for IoT services. Section 3 formulates the problem of IoT service recommendation and presents our motivation. Section 4 describes a time-aware cross-platform IoT service recommendation approach with privacy-preserving capability. Section 5 presents the implementation and results of the experiments. Section 6 summarizes the paper and discusses future work.

2. Related Work

In this section, we review related research on time-aware IoT service recommendation with privacy preservation from the following three aspects.

2.1. IoT Service Recommendation

Most existing research on recommender systems for IoT services can be classified into three groups: content-based filtering, collaborative filtering, and link-based methods [1]. In [5, 6], Mashal et al. formulated IoT recommendation as a hypergraph model, which connects users, objects, and services with hyperedges, and presented a graph-based recommender system that considers the unique features of IoT services. Yao et al. put forward a unified framework based on probabilistic factors; they calculated user similarity and device similarity and then fused them to make more accurate recommendations [7, 8]. In [9], Mashal et al. proposed a multiagent approach to establish a distributed recommender system in the IoT environment. However, none of the above works considers the time factor, which is one of the most common features of IoT services. This may decrease the accuracy of the recommendation results.

2.2. Time-Aware Service Recommendation

A number of research works have taken the time factor into account to obtain more accurate recommendation results. In [10, 11], Wang and Zhu presented a spatial-temporal QoS value prediction approach. Temporal sequences of historical QoS data are employed to build feature models, while the spatial information of web services is exploited to reduce the search space. In [12], Zhong and Fan built a time-aware recommender system for mashup creation, presenting a service pattern extraction method based on LDA and time series prediction. In [13], Yu and Huang took both time and location factors into account; they represented the temporal quality data as a three-dimensional matrix and used CF techniques for prediction and recommendation.

All of the abovementioned methods employ the time factor to enhance the performance of the recommender system; however, none of them takes privacy preservation into account, which is necessary when QoS data is collected from different platforms to obtain more comprehensive user preferences.

2.3. Privacy-Preserving Service Recommendation

As IoT service data from different platforms contains sensitive user information, it is crucial to preserve users’ privacy while sharing valuable data across platforms [14–19]. In [20], Ma et al. proposed a K-anonymity method to protect user privacy by hiding sensitive user identification information. However, this may reduce data availability and accordingly decrease the performance of recommender systems. In [21], Dou et al. suggested publishing only the optimal data rather than all the observed QoS data; however, some sensitive user information may still be leaked. In [4], Qi et al. first introduced the LSH technique into service recommendation; by hashing high-dimensional QoS data into low-dimensional indices, data privacy can be protected in an efficient way. However, the time factor is still not considered in that approach.

In conclusion, existing IoT service recommendation methods fail to take the time factor and privacy preservation into account simultaneously [22–26]. In light of this challenge, we improve the conventional LSH method and present a time-aware cross-platform IoT service recommendation algorithm with privacy preservation.

3. Formulation and Motivation

3.1. Problem Formulation

Concretely, our time-aware cross-platform IoT service recommendation model can be formulated as a five-tuple IoTSerRect-LSH (SP, U, IS, H, $u_{target}$), where
(1) SP is the set of IoT service platforms; its k-th element represents the k-th IoT service platform. Each platform provides a part of a user’s QoS data.
(2) U is the set of IoT service users; its k-th element denotes the k-th IoT service user. As a user may invoke IoT services from different platforms, his/her historical QoS data is stored across multiple platforms.
(3) IS is the set of IoT services; each element denotes the j-th IoT service on the i-th platform, and each platform provides its own number of IoT services.
(4) H is the historical user-service quality data, organized by time slot; its k-th element holds the data at the k-th time slot, and each entry is the quality value of the j-th IoT service when the i-th user invokes it at the k-th time slot.
(5) $u_{target}$ is the user who needs a service recommendation.

3.2. Motivation

As shown in Figure 1, consider three IoT service platforms: Alibaba, Huawei, and Google. Each platform deploys its own set of IoT services, and users keep invoking IoT services on these platforms to finish certain tasks.

For the conventional user-based CF recommendation approach, making a recommendation for a target user first requires finding that user’s similar users according to the historical QoS data [27–29]. However, the collaboration among Alibaba, Huawei, and Google faces three challenges. (1) Because of user privacy, the platforms cannot share their QoS data with each other [30–33]. (2) Since the user-service quality data keeps being updated over time, its volume becomes increasingly massive, which significantly reduces the collaboration efficiency and scalability [19, 34]. (3) As a user often invokes an IoT service repeatedly, each user-service pair corresponds to a series of QoS values rather than a single value, which makes the traditional user-based recommendation approach unsuitable for this situation [35, 36].

Considering the abovementioned challenges, we present a novel LSH-based IoT service recommendation method. The method will be elaborated in Section 4.

4. Privacy-Preserving and Time-Aware LSH-Based IoT Service Recommendation Algorithm: IoTSerRect-LSH

In this section, we present a time-aware LSH-based recommendation algorithm for IoT services across multiple platforms, denoted as IoTSerRect-LSH. In summary, the proposed algorithm is divided into three steps:
Step 1: calculating user indices at each time slot based on LSH. For an IoT service platform $sp_k$ ($1 \le k \le z$), each user $u$’s QoS data in $sp_k$ at the t-th time slot is mapped into $H_{t,k}(u)$, which denotes the k-th subindex of user $u$ at the t-th time slot. User $u$’s complete index at the t-th time slot, $H_t(u)$, is merged offline as $(H_{t,1}(u), \ldots, H_{t,c}(u))$. The set of indices of all users at each time slot is regarded as a hash table.
Step 2: finding the top-K users most similar to $u_{target}$. Compute $u_{target}$’s complete index at each time slot according to step 1. Then, calculate the similarity between $u_{target}$ and the other users according to the user indices at each time slot, and return the top-K users with the highest similarity scores.
Step 3: recommending an IoT service for $u_{target}$. Predict the quality values of the services never invoked by $u_{target}$ based on the quality data of $u_{target}$’s top-K neighbors. Then, return the quality-optimal one to $u_{target}$.

Next, we describe the implementation of each step in detail.

4.1. Step 1: Calculating User Indices at Each Time Slot Based on LSH

In the IoT environment, a user usually invokes an IoT service at intervals and thus continuously generates a sequence of QoS data [37]. To reduce the scale of the historical QoS data, we employ only the data of the latest t time slots. We denote the QoS data in the p-th platform at a given time slot as the matrix shown in (1), where the i-th row describes the quality values of all services provided by the p-th platform as invoked by the i-th user at that time slot. Note that if the i-th user does not employ the j-th IoT service at that time slot, the corresponding entry has no observed value.
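As an illustration of this data layout, the following minimal Python sketch builds one user-by-service matrix per time slot for a single platform and keeps only the latest t slots. The function name `build_slot_matrices`, the record format, and the use of `None` for unobserved entries are assumptions made for this sketch, not part of the original formulation.

```python
def build_slot_matrices(records, users, services, latest_t):
    """Build one user-by-service QoS matrix per time slot for one platform.

    records  : list of (user, service, time_slot, qos_value) tuples
    users    : ordered list of user identifiers (matrix rows)
    services : ordered list of service identifiers (matrix columns)
    latest_t : positive number of most recent time slots to keep

    Unobserved user-service pairs are stored as None (an assumption of
    this sketch; the source only states that such entries are unobserved).
    """
    # Keep only the latest_t most recent time slots.
    slots = sorted({slot for _, _, slot, _ in records})[-latest_t:]
    user_pos = {u: i for i, u in enumerate(users)}
    service_pos = {s: j for j, s in enumerate(services)}
    matrices = {
        slot: [[None] * len(services) for _ in users] for slot in slots
    }
    for user, service, slot, value in records:
        if slot in matrices:
            matrices[slot][user_pos[user]][service_pos[service]] = value
    return matrices
```

Each platform would build such matrices locally, so no raw QoS record has to leave the platform at this stage.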

Next, to preserve the user’s privacy, we utilize the LSH technique to map a user’s QoS data to a less sensitive user index. Since classical CF-based recommendation algorithms often employ the Pearson correlation coefficient (PCC) as a measure of user similarity, we adopt the LSH function for the PCC distance to realize the transformation.
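For reference, the PCC between two users can be computed as in the minimal Python sketch below. This is the standard textbook definition restricted to the services both users have invoked; the function name `pcc`, the `None` marker for unobserved values, and the choice to return 0.0 when the coefficient is undefined are assumptions of this sketch.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient of two users' QoS vectors.

    x, y : equally long lists of quality values; None marks a service
           the user has not invoked. Only co-invoked services are used.
    Returns 0.0 when the coefficient is undefined (fewer than two
    co-invoked services, or zero variance).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

Computing this value directly requires both users’ raw QoS vectors in one place, which is exactly what the LSH-based index avoids.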

For each platform and each time slot, we randomly generate a vector with components drawn from the range [−1, 1], with one component per service on that platform. For each user, the LSH function for a given time slot is defined in (2).

Thus, according to (2), a user’s observed IoT service data at the p-th platform is firstly mapped into a less-sensitive one-dimensional float value; then, all the float values at each platform are accumulated and mapped into a binary value.
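The two-stage mapping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function names are hypothetical, unobserved values (`None`) are assumed to contribute zero to the projection, and the bit is assumed to be 1 when the accumulated value is positive and 0 otherwise.

```python
import random


def random_vector(dim, rng):
    """Draw a dim-dimensional vector with components in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def platform_projection(qos_row, vector):
    """Run locally on one platform: project a user's QoS row onto the
    platform's random vector. Only this single float leaves the platform.
    Unobserved entries (None) contribute nothing (sketch assumption)."""
    return sum(q * v for q, v in zip(qos_row, vector) if q is not None)


def hash_bit(projections):
    """Accumulate the per-platform floats and map the sum to one bit
    (assumed convention: 1 if the sum is positive, else 0)."""
    return 1 if sum(projections) > 0 else 0
```

With this split, a coordinator only ever sees one float per platform and user, never the underlying quality values.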

Because LSH is a probability-based neighbor search method, a single hash function usually leads to less accurate search results. Thus, we adopt amplified LSH by employing multiple hash functions and hash tables. Concretely, for each time slot, we define r hash functions based on r vectors randomly generated from the range [−1, 1]. Afterwards, we obtain a t × r 0-1 matrix for each user, as represented in (3). The indices of all users in U are stored in a hash table.
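A minimal Python sketch of this amplification step is given below: it draws r random vectors per time slot and per platform and produces a user’s t × r 0-1 index. The names `make_hash_family` and `user_index`, the nested-list data layout, and the handling of unobserved values are assumptions made for illustration.

```python
import random


def make_hash_family(num_slots, num_functions, platform_dims, seed=0):
    """Create the random vectors for one hash table.

    family[slot][f][platform] is a vector in [-1, 1]^dim, where dim is
    the number of services on that platform (platform_dims[platform]).
    """
    rng = random.Random(seed)
    return [
        [
            {p: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for p, dim in platform_dims.items()}
            for _ in range(num_functions)
        ]
        for _ in range(num_slots)
    ]


def user_index(user_rows, family):
    """Map a user's QoS data to a t x r 0-1 matrix.

    user_rows[slot][platform] is the user's QoS row on that platform at
    that slot; None marks an unobserved value and is skipped (assumption).
    Returns a list of t tuples, each holding r bits.
    """
    index = []
    for slot_rows, slot_functions in zip(user_rows, family):
        bits = []
        for vectors in slot_functions:
            # Accumulate the per-platform projections, then take the sign.
            total = sum(
                q * v
                for p, row in slot_rows.items()
                for q, v in zip(row, vectors[p])
                if q is not None
            )
            bits.append(1 if total > 0 else 0)
        index.append(tuple(bits))
    return index
```

Calling `make_hash_family` with L different seeds would yield the vectors for L independent hash tables.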

4.2. Step 2: Finding Out Top-K Users Most Similar to $u_{target}$

In step 1, we have generated a hash table in which a user’s quality data in the recent t time slots is mapped into a t × r 0-1 matrix. For convenience, we denote the matrix in (3) as shown in (4), where each element is a vector referring to one row of the matrix, i.e., the r hash values at one time slot. Thus, the hash values of all users’ quality data in the recent t time slots can be represented by a t × m matrix with one column per user (shown in (5)).

Because of the probability-based nature of LSH, finding neighbors according to a single hash table is too strict. Thus, we build L hash tables by repeating step 1 L times, which reduces the number of possible “false-negative” neighbors.

Next, we compute the similarity between $u_{target}$ and each of the other users at every time slot by comparing the entry of $u_{target}$ with the other elements in the corresponding row of (5). The per-slot similarity between $u_{target}$ and another user is first initialized to 0; for every hash table in which the two users’ hash values at that time slot are identical, we increment it by one. The overall similarity between $u_{target}$ and the other user is then calculated with the formula in (6), where a tunable parameter controls the influence of each time slot. The tunable parameters meet the constraint defined in (7).

Finally, the K users with the highest similarity values are returned as the similar user set of $u_{target}$, denoted as SIM_U_Set($u_{target}$).
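The neighbor search of step 2 can be sketched in Python as follows. This is an illustration under assumptions rather than the exact formulas (6) and (7): a match at a time slot is taken to mean that the two users’ full r-bit rows are equal in a hash table, the per-slot weights are assumed to be nonnegative and to sum to 1, and all function names are hypothetical.

```python
def slot_similarity(target_tables, other_tables, slot):
    """Count the hash tables in which both users have identical hash
    values at the given time slot."""
    return sum(
        1
        for t_idx, o_idx in zip(target_tables, other_tables)
        if t_idx[slot] == o_idx[slot]
    )


def weighted_similarity(target_tables, other_tables, weights):
    """Combine the per-slot match counts with one weight per time slot
    (assumed nonnegative and summing to 1)."""
    return sum(
        w * slot_similarity(target_tables, other_tables, slot)
        for slot, w in enumerate(weights)
    )


def top_k_similar(target, indices, weights, k):
    """Return the k users most similar to the target user.

    indices maps each user to a list of L indices (one per hash table);
    each index is a list of per-slot bit tuples. Ties are broken by
    user identifier to keep the result deterministic.
    """
    scores = {
        user: weighted_similarity(indices[target], tables, weights)
        for user, tables in indices.items()
        if user != target
    }
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:k]
```

Giving larger weights to recent time slots would make the neighbor set follow a user’s latest behavior more closely; the source leaves the weights tunable.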

4.3. Step 3: Recommending IoT Service for $u_{target}$ at Time Slot t

In step 2, we have obtained the top-K most similar users of $u_{target}$, i.e., SIM_U_Set($u_{target}$). Next, we recommend the optimal IoT service for $u_{target}$ at time slot t based on SIM_U_Set($u_{target}$). In detail, we first predict the quality values of the IoT services that have not been employed by $u_{target}$ at time slot t according to formula (8); then, we make a recommendation for $u_{target}$ according to the predicted quality values.
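A minimal Python sketch of this final step is shown below. Because formula (8) is not reproduced here, the sketch assumes the simplest neighbor-based predictor, namely the mean of the values observed by the users in the similar user set; the function names, the dictionary-based data layout, and the `lower_is_better` flag (response time is better when lower) are likewise assumptions.

```python
def predict_quality(neighbor_rows, service):
    """Predict a service's quality as the mean of the values observed by
    the neighbors (assumed predictor). Returns None if no neighbor has
    invoked the service."""
    values = [row.get(service) for row in neighbor_rows]
    observed = [v for v in values if v is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def recommend(target_row, neighbor_rows, services, lower_is_better=True):
    """Predict the quality of every service the target has not invoked
    and return (best_service, predictions).

    target_row and each neighbor row map service identifiers to the
    quality value observed at the current time slot.
    """
    predictions = {}
    for service in services:
        if target_row.get(service) is not None:
            continue  # already invoked by the target user
        predicted = predict_quality(neighbor_rows, service)
        if predicted is not None:
            predictions[service] = predicted
    if not predictions:
        return None, predictions
    pick = min if lower_is_better else max
    best = pick(predictions, key=predictions.get)
    return best, predictions
```

For a quality dimension such as throughput, where higher values are better, the same sketch applies with `lower_is_better=False`.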

5. Experiments

To verify the feasibility of our proposed method, extensive experiments are conducted on a real-world QoS dataset, i.e., WS-DREAM [17]. The dataset consists of quality values (i.e., response time and throughput) of 4532 web services invoked by 142 users at 64 different time slots. In our experiments, we consider only one quality dimension, i.e., response time. Moreover, each country that hosts services is used to simulate a geographically distributed IoT service platform.

Next, we generate test datasets by randomly removing a proportion of the data from the dataset. If P% of the QoS values have been erased, the sparsity of the test dataset is defined as P%.
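The construction of a test dataset with a given sparsity can be sketched in Python as follows. The function name `make_test_dataset`, the dictionary layout keyed by (user, service, time slot), and the fixed random seed are assumptions for illustration; the removed values are returned separately so that they can serve as ground truth when computing prediction errors.

```python
import random


def make_test_dataset(qos, sparsity_percent, seed=0):
    """Randomly erase sparsity_percent % of the QoS values.

    qos : dict mapping (user, service, time_slot) to a quality value
    Returns (kept, removed): the sparse test dataset and the erased
    values, which act as ground truth for evaluation.
    """
    rng = random.Random(seed)
    keys = sorted(qos)  # sorted for reproducibility across runs
    num_removed = round(len(keys) * sparsity_percent / 100)
    removed_keys = set(rng.sample(keys, num_removed))
    kept = {k: v for k, v in qos.items() if k not in removed_keys}
    removed = {k: qos[k] for k in removed_keys}
    return kept, removed
```

Repeating the call with different seeds yields the multiple test datasets over which results are averaged.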

Because privacy preservation follows from the intrinsic nature of LSH, the privacy-preservation capability of our approach is not evaluated separately here. Specifically, we evaluate and compare three measures.
(1) Time cost: the time consumption of our proposed approach, which is employed to measure the efficiency of a recommender system.
(2) MAE (mean absolute error): the average absolute difference between the predicted value and the real value, which is employed to measure the accuracy of a recommender system.
(3) RMSE (root mean square error): the square root of the average squared error between the predicted value and the real value, which is also employed to measure the accuracy of a recommender system.
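The two accuracy measures follow their standard definitions and can be computed as in the short Python sketch below; the function names and the list-based inputs are choices made for this illustration.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and real quality values."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def rmse(predicted, actual):
    """Root mean square error between predicted and real quality values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

Because RMSE squares each error before averaging, it penalizes large individual prediction errors more heavily than MAE does, which is why the two measures are reported together.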

Concretely, we evaluate and compare the following profiles in our experiments.

5.1. Profile 1: Determining the Best Value of Parameters L and r

In our approach, the number of hash tables (i.e., L) and the number of hash functions in a hash table (i.e., r) can significantly influence the accuracy of the recommender system. Thus, in this profile, we compare the prediction accuracy (i.e., RMSE and MAE) under different L-r settings to find the best values of L and r. For each L-r setting, we repeat our approach on 50 different test datasets, all with the sparsity fixed to 10%. We choose 50 different test datasets because the average accuracy tends to converge after 50 repetitions (as shown in Figure 2).

Table 1 reports the MAE and RMSE values of our algorithm under different combinations of L, r, and K when the sparsity of the test dataset is set to 10%. The results reveal that our algorithm achieves the best accuracy when L = 16 and r = 8. This holds when K varies from 5 to 10.

5.2. Profile 2: Finding the Proper Value for Parameter K

In our proposed method, we need to find a target user’s K most similar users. To determine a proper value of K, we conduct experiments with various values of K, where parameters L and r are fixed to 16 and 8, respectively. Figure 3 shows that, after repeating the experiment on different test datasets 50 times, our method achieves the highest average accuracy (in terms of both MAE and RMSE) when K = 8.

5.3. Profile 3: Accuracy Comparison of Three Recommendation Algorithms

To demonstrate the feasibility of our method, we compare IoTSerRect-LSH with two state-of-the-art algorithms, UPCC [18] and SerRecdistr-LSH [4], under different dataset sparsity levels. Based on the findings in Profile 1 and Profile 2, the parameters of IoTSerRect-LSH are set as follows: L = 16, r = 8, and K = 8. The following observations are made based on the results presented in Figure 4:
(1) The accuracy of all three approaches, measured by MAE and RMSE, degrades as the dataset sparsity increases. When the sparsity is high, fewer useful QoS values are available, which leads to less precise predictions.
(2) SerRecdistr-LSH performs much worse than our approach since it does not consider the historical temporal information.
(3) When the dataset sparsity is lower than 50%, the accuracy of our approach in terms of MAE is slightly lower than that of UPCC. In spite of this, our algorithm still outperforms the other algorithms in most cases. The reason is that, when the dataset sparsity is low, the quality values at the current time slot are sufficient for UPCC to find the most similar users, whereas temporal information becomes necessary for sparser datasets.

5.4. Profile 4: Efficiency Comparison of Three Recommendation Algorithms

In this profile, we make a comparison among the three recommendation algorithms in terms of the efficiency. All of the experiments are performed on a computer with Intel i5 processor and 16.0 GB RAM, which runs Windows 10 and Python 3.6. The experimental results (in Figure 5) show that the time consumption of UPCC is much higher than the other approaches since the UPCC method calculates correlation coefficients online. Furthermore, the time cost of our method is close to that of SerRecdistr-LSH since they both adopt the offline recommendation strategy.

6. Conclusion

The number of IoT services is growing rapidly, and recommendation technology can significantly relieve target users’ selection burden. However, because of user privacy concerns, it is impractical for different platforms to directly share their data with each other. In this paper, we proposed a novel cross-platform LSH-based IoT service recommendation method with privacy preservation. Through the LSH technique, high-dimensional historical quality data is hashed into less sensitive indices. Moreover, historical data at different time slots is employed to make more accurate recommendations. Finally, a number of experiments were conducted on the real-world dataset WS-DREAM. The experimental results demonstrated the advantage of our approach in terms of accuracy and efficiency, while privacy is preserved through the use of LSH.

However, some limitations should be addressed. As we did not consider the periodicity of the quality value of IoT service, the number of time slots is fixed in the current method. Furthermore, spatial information of both users and service providers is not employed in our approach. Therefore, we will further enhance our algorithm by taking more context factors such as periodicity [38, 39] and location [40, 41] into account.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.