Abstract

Intelligent computing provides efficient, real-time, and secure data analysis services for the Internet of Things (IoT). As the number of IoT devices increases, IoT generates massive, diverse, and multisource datasets that can be used to further improve IoT services. Models trained by intelligent computing on data from a single system or sensor often lack global properties, and sending all data directly to the computing platform wastes network bandwidth and may cause network congestion and even privacy leakage. To ensure the quality of service and privacy of IoT applications, we propose a framework that integrates edge computing and blockchain to provide lightweight data fusion and secure data analysis for IoT. We propose a lightweight data fusion method that reduces the amount of data at the node level and prevents network congestion and bandwidth waste. Furthermore, we propose a hierarchical fuzzy hashing method to check and locate anomalies in IoT machine learning models, ensuring the validity of IoT intelligent computing and the security of sensitive data. Finally, we demonstrate the effectiveness of the proposed method through experiments.

1. Introduction

The Internet of Things (IoT) refers to the interconnection and integration of things in the physical world and cyberspace through the Internet or other communication networks [1]. With the increasing number of inexpensive embedded information-sensing devices and sensors, such as smartphones, mobile devices, wireless sensor network devices, and ubiquitous communication devices, IoT has come to play an essential role in various practical systems. At present, IoT has penetrated many scenarios, such as medicine, industry, transportation, and agriculture, to provide intelligent services and applications [2].

With the addition of a large number of sensor devices, IoT generates massive, multisource, heterogeneous, and sparse datasets [2]. The collected datasets can help improve IoT services and enhance smart services. However, it is unwise to send massive, multisource, heterogeneous IoT data directly to the cloud, as this wastes network bandwidth and can even cause network congestion and packet loss [3]. Therefore, the original data collected by a node needs to be processed by data fusion before transmission to prevent network congestion and packet loss. Data fusion is performed locally by nodes to discard multiple copies of the same data, reduce the size and dimension of the data, and optimize data quality to facilitate the extraction of useful information. Data fusion is therefore a vital technique for mining essential information from large and complex data to improve data quality and facilitate decision-making.

To provide efficient, real-time, and secure data analysis services for the IoT, intelligent computing platforms for the IoT have gradually entered the research field [4]. However, current intelligent computing platforms often suffer from two flaws. (1) IoT data obtained from discrete embedded devices or a single sensor alone often lacks global properties and may even introduce modeling anomalies and training noise. (2) Centralized machine learning models may pose risks to the privacy of the training data [5–7], and sensitive IoT data may disclose the actual physical processes of industrial manufacturing sites.

To solve the above problems, we propose a distributed framework integrating edge computing and blockchain to provide lightweight data fusion and secure big data analysis services for IoT. The edge nodes train on the fused data to obtain local models and transmit them to the cloud server. The cloud server obtains the global model by merging the local models. Training machine learning models in a distributed fashion across edge nodes reduces the risk of data privacy breaches, as the data does not leave the local training nodes. At the same time, we use the blockchain to ensure the consistency of the local models and the global model and propose a hierarchical fuzzy hashing method to detect and locate anomalous positions in the model. The main contributions of this paper are as follows:
(1) We propose a distributed structure combining edge computing and blockchain to provide lightweight data fusion and secure big data analysis for the IoT.
(2) We propose a lightweight data fusion method, which uses an adaptive distance index and maximum/minimum functions to identify and delete redundant data at the node level.
(3) We propose a hierarchical fuzzy hashing method to check and locate anomalies in IoT machine learning models and ensure their security.

The rest of this paper is organized as follows. In Section 2, we outline the work related to our proposed method. In Section 3, we describe our proposed framework and algorithm in detail. The experimental results and performance evaluation are summarized in Section 4. Finally, we conclude in Section 5.

2. Related Work

2.1. IoT and Data Fusion

Many mathematical theories have been used for data fusion in IoT. The authors of [8] introduced a method based on probability distributions to express the dependence between random variables. However, it is difficult for the method in [8] to obtain the density function and prior probabilities, and its performance is limited when processing complex multivariable data simultaneously. Hong et al. [9] proposed an evidence-based reasoning method that introduces beliefs to represent uncertainty in the world. Song et al. [10] proposed a lightweight data analysis method. However, the mass function problem [9] limits its practical application. Chen et al. [11] proposed a multisensor data fusion method for load monitoring. Pan et al. [12] used a fault diagnosis method to fuse data from wind turbine gearboxes. Lin et al. [13] proposed a data fusion strategy based on transfer learning for industrial networking. Dina et al. [14] used a spatiotemporal fusion method for the real-time fusion of IoT data. However, the methods in [11–14] often require prior knowledge or prior modeling and cannot be applied at the node level.

At present, some work has focused on the privacy protection of data fusion. Shang et al. [15] used probability density function values and distribution factors to fuse data from different sensors but paid little attention to robustness and verifiability. Wang et al. [16] combined differential privacy with deep learning to achieve privacy protection but did not consider robustness. Lin et al. [17] proposed a privacy-enhanced data fusion strategy for medical data, but this method is less versatile.

2.2. IoT with Blockchain and Edge Computing

As a paradigm shift, blockchain is transforming the IoT field by providing a decentralized environment [18]. Yang et al. discussed the impact of blockchain on IoT in terms of security, privacy, scalability, etc. [19]. Lin et al. [20] pointed out the benefits that blockchain provides to smart cities and outlined a research route for trading computing resources in smart cities. Li et al. [21] investigated the benefits of blockchain decentralization for microgrid energy supply. Duan et al. [22] proposed a communication network architecture for the Internet of Vehicles that introduces edge computing. The authors of [23] discussed the trend toward integrating IoT and blockchain and introduced future research directions.

Asante et al. [24] pointed out that physical devices on the blockchain often lack sufficient storage and computing resources, which requires edge computing to solve this problem. Wang et al. [25] utilized edge computing and blockchain to exchange medical IoT data efficiently. Guo et al. [26] used blockchain technology to solve the problem of information silos between IoT platforms. Jindal et al. [27] leveraged blockchain and edge computing to secure vehicle-to-grid energy transactions. Liu et al. [28] proposed a new wireless blockchain framework supporting mobile edge computing to offload intensive mining computations from edge computing nodes. Sun et al. [29] proposed a system combining federated learning, edge computing, and blockchain to achieve model sharing.

3. Data Fusion and Security Data Analysis for IoT

Sensor nodes for various applications in IoT transmit their data to cloud data centers through network gateways and edge servers. Unstructured data and spatiotemporal stream data collected by sensors can affect data center processing speed and decision-making behavior, potentially causing bandwidth bottlenecks, latency, and throughput degradation for time-critical and latency-sensitive applications. Large amounts of redundant data will lead to packet loss, longer delays, and network congestion [30]. At the same time, a single centralized database solution is often not suitable for complex IoT systems, so we consider integrating the distributed structure of blockchain and IoT to provide secure big data analysis services. This section discusses our proposed data fusion method and security protection strategy to eliminate redundant data and provide secure and effective data analysis services for IoT applications.

3.1. IoT Framework Integrating Edge Computing and Blockchain

In IoT applications, if the data collected by the IoT application layer is transmitted to the cloud layer directly or through the edge, it may cause congestion and significant delay at the cloud service layer. Therefore, we add data fusion at the nodes for optimization. IoT often involves private data, such as electricity consumption data in smart grids and medical and health data, which should not be leaked. At the same time, the IoT network requires real-time monitoring, forecasting, and other services, such as traffic forecasting and fire alarms, and the data often needs to be processed locally. Given the above, in our model, edge nodes perform local model training on the data to protect its privacy. After the edge nodes transmit their local model parameters to the cloud server, the cloud server trains the global model and distributes it back to the edge nodes. The edge nodes use the data analysis results or model parameters from the cloud server to process data in time, meeting the timeliness requirements of the IoT network. At the same time, in the proposed architecture, the blockchain verifies the consistency and validity of the data transmitted between the local and cloud layers to prevent poisoning attacks. Figure 1 shows the overall framework of this paper.

3.2. Lightweight Data Fusion Method

To eliminate correlated and redundant data in IoT, we use a lightweight data fusion method at the node level. The proposed method utilizes an adaptive distance index and maximum/minimum functions at the node level to identify and remove redundant data [31, 32]. Our method can be flexibly extended to any application. On the one hand, redundant data can be identified and removed; on the other hand, the changing trend of the data is captured sensitively to ensure data quality. At the same time, owing to the adaptive distance index, our method can automatically fuse the data even without prior knowledge.

The parameters of the fusion algorithm, including the maximum and minimum values, the distance index, the fusion index, and two lists, are stored in the node's buffer. The maximum value, minimum value, distance index, and fusion index are denoted by v_max, v_min, d, and f, respectively. The fusion index f is a constant: the larger the fusion index, the lower the fusion rate; the smaller the fusion index, the higher the fusion rate. The two lists are the distance interval list and the data packet list, denoted by L_d and L_p, respectively. At time t, the sensed value is denoted as v_t. In the beginning, when the first sensor value is sensed by a node, v_max = v_min = v_t and v_t is appended to L_p. The parameter update procedure is shown in Algorithm 1.

Input: sensor values v_1, …, v_n; fusion index f; distance index d; v_max; v_min; distance list L_d; packet list L_p
Output: fused packet list L_p
1. for t in range (0, n):
2.  if t == 0:
3.   v_max = v_min = v_t
4.   L_p.append (v_t)
5.  else:
6.   L_p.append (v_t)
7.   L_d.append (|v_t - v_{t-1}|)
8.  if v_t > v_max: v_max = v_t
9.  if v_t < v_min: v_min = v_t
10.  update the distance index d adaptively from v_max, v_min, and f
11.  scan the buffered packets with the updated d:
12. for i in range (len(L_d)):
13.  if L_d[i] does not satisfy the distance index d:
14.   if L_p[i] == v_max or L_p[i] == v_min:
15.    keep L_p[i]
16.    break
17.   else:
18.    remove L_p[i] and L_d[i]
19.    break
20.  end if
21. end for
22. end for
23. return L_p

The parameters v_max and v_min are used to capture the magnitude of the data change and to mark fixed (extreme) points on the data change curve. It is assumed that the sampling rate of each node is r packets per second. The node sends data to the gateway at intervals of length T. In our method, data fusion is performed in each time interval. After a sensor value enters the node, it first updates the v_max, v_min, and distance index values. The new packet and its distance from the previous packet are added to the packet list and the distance list, respectively. We then use the updated distance index to scan the distance list, delete the data packets that do not meet its requirement, and update the distance list. The data fusion steps are shown in Algorithm 1.
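
As a minimal illustration of this node-level procedure, the Python sketch below assumes that the adaptive distance index is computed as d = (v_max - v_min) / f and that packets closer than d to their predecessor are dropped unless they are extreme points; this threshold rule, the function name fuse_window, and the example readings are illustrative assumptions rather than the exact rule of Algorithm 1.

```python
# Illustrative sketch of node-level lightweight data fusion.
# Assumption: the adaptive distance index is derived from the observed value
# range and the fusion index f as d = (v_max - v_min) / f; packets whose
# distance to the previous packet falls below d are treated as redundant,
# while the extreme points (v_max, v_min) are always retained.

def fuse_window(values, f=4.0):
    """Fuse one window of sensor readings and return the retained packets."""
    if not values:
        return []

    v_max = v_min = values[0]
    packets = [values[0]]        # retained data packets (L_p)
    distances = [0.0]            # distance of each packet to its predecessor (L_d)

    for v in values[1:]:
        packets.append(v)
        distances.append(abs(v - packets[-2]))
        v_max = max(v_max, v)
        v_min = min(v_min, v)

        d = (v_max - v_min) / f  # adaptive distance index (assumed rule)

        # Drop redundant buffered packets, always keeping the extreme points.
        keep_p, keep_d = [packets[0]], [0.0]
        for p, dist in zip(packets[1:], distances[1:]):
            if dist >= d or p in (v_max, v_min):
                keep_p.append(p)
                keep_d.append(dist)
        packets, distances = keep_p, keep_d

    return packets


if __name__ == "__main__":
    readings = [20.1, 20.1, 20.2, 20.15, 22.8, 22.9, 20.3, 20.25]
    print(fuse_window(readings, f=4.0))  # redundant readings are dropped; extreme points are kept
```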

Our method automatically adjusts the distance index to fit the characteristics of the data according to its varying range. Based on the distance index, the data is fused while the extreme points are retained. Our method preserves the data features well and reduces the fusion loss. At the same time, the fusion index can be used to flexibly adjust the sensitivity of the fusion to adapt to different scenarios. The fused data significantly reduces data redundancy, packet collision probability, and network delay, thereby alleviating network congestion.

3.3. Secure and Privacy-Preserving Anomaly Detection and Localization Algorithms

After data fusion, each node transmits the fused data to the cloud data center through network gateways and edge servers. The gateway acts as a relay node to monitor IoT devices and offload data to edge servers. The collaborative work of the edge service layer and the cloud service layer provides efficient, accurate, and real-time data processing and decision support for IoT applications [33].

The edge service layer consists of edge nodes, and the cloud service layer consists of peer cloud servers. In our model, the edge nodes are responsible for training the local models, and the cloud server merges the local models to obtain the global model. The edge nodes transmit the trained local model parameters to the cloud server, and the cloud server trains the global model according to all collected local models. Our strategy addresses the defect that local data cannot represent global attributes in IoT and guarantees local data privacy. In IoT machine learning scenarios, such as smart medical care and smart grids, attackers may attack the machine learning models [1]. Here, we consider using blockchain technology to build a defense mechanism; that is, we use the blockchain to protect the consistency and immutability of the local models and the global model. This method can improve the scalability of the IoT network, eliminate single points of failure, and ensure the model's validity. Figure 2 shows the basic idea of the hierarchical fuzzy hashing algorithm. The local models of the edge nodes differ, as illustrated in Figure 2, due to the diverse types of sensors used by the edge nodes.
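
As a minimal illustration of this edge/cloud division of labor, the following Python sketch trains a toy linear model at each edge node and merges the local parameters at the cloud by simple averaging; the averaging rule, the toy model, and all function names are illustrative assumptions, not the exact training and merging procedure used in our framework.

```python
import numpy as np

# Illustrative sketch of the edge/cloud workflow: edge nodes train on their
# own fused data (raw data never leaves the node), and the cloud merges the
# local parameters. Assumption: merging is done by simple averaging.

def train_local_model(data, global_params, lr=0.01, epochs=5):
    """Edge node: train a toy linear regression model on local data."""
    w = global_params.copy()
    X, y = data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

def merge_local_models(local_params):
    """Cloud server: merge local models into a global model (assumed: averaging)."""
    return np.mean(np.stack(local_params), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_w = np.zeros(3)
    # Each edge node holds its own local dataset.
    edge_data = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
    for _round in range(3):
        locals_ = [train_local_model(d, global_w) for d in edge_data]
        global_w = merge_local_models(locals_)  # only parameters are exchanged
    print(global_w)
```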

We introduce the blockchain to record the parameters of the local and global models in order to check erroneous information inputs. The immutability of the data recorded on the blockchain prevents attackers from changing the model parameters after training. That is, the blockchain constitutes a defense tool against model attacks. Referring to [4, 34], we consider using a hierarchical fuzzy hash function to replace the information record structure of the traditional blockchain.

We propose the hierarchical fuzzy hashing method mainly in consideration of the unpredictability of IoT attacks. Hierarchical fuzzy hashing reduces the amount of data transmitted in the network while guaranteeing data security. We divide the blockchain into two levels of information, the local chain and the global chain. The lower-level information consists of fuzzy hash values, which can detect abnormal changes in the model parameters. To improve transmission efficiency, we save the hash of the fuzzy hash value in the global information. Figure 3 shows the basic idea of the distributed anomaly detection and localization algorithm combined with the blockchain.

We use the hierarchical fuzzy hashing algorithm to reduce data transmission loss by preserving sub-information of the recorded information. The global model is recorded in the global chain using the hash of its fuzzy hash value. The edge node compares the hash of its local-chain record with the record information of the global chain. When there is a mismatch, the fuzzy hash values recorded in the global chain and the local chain are compared to detect the anomaly. We use the fuzzy hash values to calculate the Hamming distance between the models, determining the degree of attack on the model and locating the attacked positions. Our hierarchical fuzzy hashing method and anomaly detection algorithm improve the efficiency of the IoT network and ensure the model's security while guaranteeing anomaly detection.
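
As a simplified illustration of this two-level check, the Python sketch below uses per-chunk SHA-256 digests as a stand-in for fuzzy hashes of the model parameters and approximates the poisoning degree by the fraction of mismatched chunks; a real deployment would use similarity-preserving fuzzy hashes and Hamming distances as described above, so the chunking scheme and names here are illustrative assumptions.

```python
import hashlib
import numpy as np

# Simplified sketch of the hierarchical check. Per-chunk SHA-256 digests stand
# in for the fuzzy hashes of the model parameters (lower level); the global
# chain stores only the hash of the concatenated chunk digests (upper level).

def chunk_digests(params, n_chunks=8):
    """Lower level: one digest per parameter chunk (stand-in for fuzzy hashes)."""
    chunks = np.array_split(np.asarray(params, dtype=np.float32), n_chunks)
    return [hashlib.sha256(c.tobytes()).hexdigest() for c in chunks]

def top_hash(digests):
    """Upper level: hash of the chunk digests, recorded on the global chain."""
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def detect_and_locate(global_params, local_params, n_chunks=8):
    g_digests = chunk_digests(global_params, n_chunks)
    l_digests = chunk_digests(local_params, n_chunks)
    if top_hash(g_digests) == top_hash(l_digests):
        return {"anomalous": False, "chunks": [], "poisoning_ratio": 0.0}
    # Top-level mismatch: fall back to the lower level to locate the anomaly.
    bad = [i for i, (g, l) in enumerate(zip(g_digests, l_digests)) if g != l]
    return {"anomalous": True, "chunks": bad,
            "poisoning_ratio": len(bad) / n_chunks}

if __name__ == "__main__":
    benign = np.linspace(0.0, 1.0, 1024)
    poisoned = benign.copy()
    poisoned[300:360] += 0.5  # tampered parameter segment
    print(detect_and_locate(benign, poisoned))
```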

4. Experimental Results

In this section, we evaluate the efficiency of our proposed lightweight data fusion and the effectiveness of the hierarchical fuzzy hashing security strategy based on experimental metrics.

4.1. Data Fusion Algorithm Test

We perform data fusion at the node level and use the data collected by the Intel Berkeley Research Lab [35] to test the data fusion algorithm's performance and fusion rate.

We first measured how well the lightweight fusion algorithm captures data features. Taking the temperature sensor data of a particular day as an example, we use one hour as the fusion period; the fusion result is shown in Figure 4. Figure 4 shows the data packets transmitted by our proposed lightweight data fusion algorithm under different fusion indexes. As seen from the figure, our proposed algorithm dramatically reduces the number of packets while preserving the characteristics and quality of the data. The experimental results demonstrate that our method reduces data redundancy and packet collision probability and alleviates network congestion.

We then tested the fusion performance of the data fusion algorithm against existing data fusion methods, including stratified sampling [3] and EECC [36]. Figure 5 shows the proportion of data transmitted to the gateway by the different data fusion methods. In Figure 5, the horizontal axis represents the total number of data packets N, and the vertical axis is the percentage of transmitted data packets, N_c/N, where N_c is the number of data packets after compression. As shown in Figure 5, our method significantly outperforms the other algorithms in compression performance. During the fusion process, stratified sampling takes the extreme value of each layer after stratification, which leads to a higher percentage of fused data delivered to the gateway, while EECC increases the number of datagrams due to the transmission of multiple copies. This experiment shows that our method achieves a better data fusion effect than the existing algorithms.

4.2. Anomaly Detection and Localization Algorithm Test

We tested the model anomaly detection and anomaly localization methods. We consider the case where an attacker produces anomalous models by retraining the model with additional data containing Trojan triggers. The main purpose of our method is to check whether a model is poisoned by comparing it with a benign model and then to use the local fuzzy hashes to locate the anomalous positions and give the poisoning ratio of the model.

We test the feasibility and effectiveness of our proposed method for anomaly detection and localization of parameters using a set of machine learning models. We focus on how the method detects and locates anomalies. We consider the global model to be benign. The experiments assume that the attacker generates abnormal models by retraining the model with additional data containing Trojan triggers. We first compare the two-level hash values of the local and global models to check whether the model is poisoned. When model poisoning is detected, we use the fuzzy hash values to locate the anomalous positions and give the model poisoning ratio.

Figure 6 shows the hierarchical fuzzy hashing algorithm's anomaly detection and localization process after model poisoning. As shown in Figure 6, when the global hash and the local hash are entirely different, the fuzzy hash values are backtracked (when the global hash is consistent with the local hash, this step is not performed). It can also be seen from Figure 6 that the poisoned model is not completely inconsistent with the correct model. According to the differences in the fuzzy hash values, we can locate the poisoning positions and determine the poisoning degree of the model. This experiment demonstrates that our proposed hierarchical hashing can check for anomalous models and locate the anomalous positions.

This experiment provides an extensive evaluation of the performance of our proposed hierarchical hash function. To compare the performance of the hierarchical hash function on the blockchain system, we first take the method that uses neither machine learning nor hierarchical fuzzy hashing as the baseline (the edge nodes directly transmit all data to the cloud server for processing). Meanwhile, we use the fuzzy hashing method of [5] as another model for comparison. We use data from the same sensor for the performance tests.

We compare the proposed hierarchical method with the baseline method and the method in [5]. For simplicity, we use the central node as the verification node to improve the system's efficiency. That is, after the central node (the cloud server) verifies an operation, it is written to the global chain, while each edge node maintains its own local chain and periodically transmits encrypted data to the cloud server. In this experiment, there are five edge nodes and one central server. Figure 7 shows the relationship between the number of blocks processed by a node and the required time.

We also tested the end-to-end latency by simulating the process of storage and anomaly queries (as shown in Figure 8). Compared with the other methods, our proposed hierarchical fuzzy hashing algorithm requires less time on average to process packets. As shown in Figure 8, the proposed hierarchical fuzzy hashing significantly improves the processing capacity of the blockchain system.

4.3. Data Packet Overhead

Finally, we compare the data packet transmission overhead of the overall system. The purpose of this experiment is to evaluate the impact of our proposed data fusion algorithm and hierarchical hash function on the data transmission overhead of IoT. We compared the sizes of the packets transmitted by the edge nodes to the cloud server with 5, 10, and 15 nodes. As shown in Figure 9, the packet size after processing by our method is much smaller than the packet size before processing. The above experiments demonstrate that the use of data fusion, machine learning, and hierarchical fuzzy hashing can positively impact IoT data analysis services, reducing processing time and transferred data volume while improving data confidentiality and fault tolerance.

5. Conclusion

In this paper, in view of the rapid increase in the scale and quantity of IoT data, we propose a method that integrates edge computing and blockchain to provide data fusion and secure big data analysis for the IoT. We propose a node-level lightweight data fusion method and a hierarchical fuzzy hashing method, which reduce the amount of data transmitted by the IoT and ensure the security of IoT data. We reduce the amount of transferred data at the node level using the lightweight data fusion method. The edge nodes compute local models to reduce the pressure on the cloud server, and the cloud server merges the local models to improve the representativeness of the model. We utilize blockchain and hierarchical fuzzy hashing to ensure the consistency and validity of the local and global models while maintaining data privacy. Our experimental results show that our method dramatically reduces the amount of data transmitted by the IoT and improves the system's security. In the future, we aim to solve the load imbalance problem of edge nodes to enhance the utilization of edge devices.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Jiangsu Provincial Key Research and Development Program under Grant BE2020001-1, the Independent Project of the State Key Laboratory of Robotics (2022-Z03), and the Liaoning Natural Science Foundation (2021-MS-030).