Abstract

To address real-time detection and localization of moving objects, this paper applies a deep learning algorithm to moving object detection in complex scenes and proposes a novel target motion detection method based on deep learning over wireless sensor networks. The method uses a deep learning model to extract latent visual representation features through offline ranking learning of a similarity function and online incremental model updating, and it uses a hierarchical clustering algorithm to detect and locate targets. A cascade of a low-precision histogram and a high-precision histogram then determines the correct position of the target, which completes the detection of the moving target. To assess the strengths and weaknesses of the deep learning algorithm relative to traditional moving object detection methods, a large number of comparative experiments were carried out, and the results were analyzed qualitatively and quantitatively from a statistical perspective. The results show that, compared with the traditional methods, the proposed deep learning algorithm based on the wireless sensor network is more efficient. The detection and positioning method does not accumulate errors and is robust, and the moving target can be detected accurately at a small computational cost.

1. Introduction

With the development of big data and the Internet of Things, research on moving target recognition is becoming increasingly important in pervasive computing, and it has many application scenarios. In intelligent medicine, medical staff in a trauma emergency room may forget steps of a treatment procedure under heavy workload, which delays treatment [1]. By installing Kinect devices and RFID readers in the trauma emergency room, attaching RFID tags to medical supplies, and analyzing the images and the signal strength of the sensor data during treatment, a system can track the progress of the procedure and prompt the staff with the next step. In monitoring systems, surveillance in public places can detect the abnormal behavior of suspicious persons in crowded places such as subway stations and airports. In behavior analysis, for example in sports competitions, motion data can be collected through wearable sensor devices on the athletes’ clothing and analyzed further to inform the coach’s guidance [2]. In smart homes, sensor devices can monitor the behavior of moving objects in real time; when a dangerous accident occurs, the home detection system can promptly send the status of the person involved to medical staff or to their children, which saves rescue time and protects the person’s safety [3]. Therefore, the recognition of moving objects plays a very important role in people’s lives.

The data for moving target recognition come mainly from two sources: image data obtained from video equipment such as cameras and monitoring systems, and sensor data [4]. Sensor data are collected mainly through accelerometers, gyroscopes, magnetometers, Bluetooth, routers, RFID readers, and sound sensors. Compared with image data obtained by cameras and similar devices, activity recognition using sensors can better protect users. First, in a real deployment, camera placement is often limited by the external environment, and the image data acquired by a camera may leak personal privacy [5]. Second, the information captured by a camera contains interference from nontarget objects, which complicates later data processing. With the continuous development of sensor technology and pervasive computing, and with the need to protect personal privacy, recognition of daily human activity from sensor data is becoming more and more popular [6]. Activity recognition that relies on resources such as Bluetooth and wireless signals rather than dedicated sensing hardware is also developing rapidly. Sensor-based activity data reflect the characteristics of time series, which improves recognition accuracy [7]. In recent years, sensor-based activity recognition has made progress in gait detection, behavior analysis, mobile user monitoring, and health management.

The deep learning algorithm can automatically learn high-level features of sensor data with an end-to-end deep neural network, replacing the manual feature extraction of traditional machine learning methods. However, there is little research on the transfer of deep learning algorithms in activity recognition, and its effect, mechanism, and influencing factors are still unclear. Therefore, this paper focuses on sensor data in moving target recognition and applies deep learning models and transfer methods. The approach makes full use of the ability of mobile devices to communicate and to collect and process data, and it collects users’ travel data through different wireless sensors. After traditional statistical features are extracted from the sampled data, deep features are learned through several deep learning network models, and important hyperparameters are monitored and adjusted to improve recognition accuracy. Fine-grained recognition of moving targets improves the reliability of recognition and supports detection with high precision, low energy consumption, and robustness.

2. Deep Learning-Based Wireless Sensor Networks

2.1. Wireless Sensor Network

The deployment of wireless sensor networks and the hardware and software design of sensor nodes are the premise and foundation of wireless sensor network topology control and the key to controlling and optimizing the network topology effectively. The system architecture of the wireless sensor network is shown in Figure 1; the network is composed of sensor nodes, sink nodes, task management nodes, the Internet, and satellites. In practical applications, sensor nodes are generally deployed at random in the monitored area and form a wireless network through self-organization [8, 9]. Each node is responsible for collecting and transmitting its own information and also for forwarding information from other nodes. To improve the accuracy and credibility of the information obtained, a sensor node first performs preliminary processing on the detected information and then transmits it to the sink node by multihop relay, from where it reaches the monitoring center through a mobile communication network, the Internet, or a satellite. Through the task management node, the network can be monitored, tasks can be released, and the network data can be managed.

A sensor node is an embedded system. Because of constraints on volume, price, energy, and other factors, its computing power, data storage capacity, and communication ability are weak. Usually, a sensor node communicates only with the neighbors within its own communication range and transmits information to nodes outside that range through multihop routing. Compared with general sensor nodes, the sink node usually has strong computing power, data storage capacity, and communication ability. It can be regarded as an enhanced sensor node with a sufficient energy supply, or as a special gateway device with a wireless communication interface. The sink node is the bridge between the wireless sensor network and the external network. Through protocol conversion, it realizes the communication between the management node and the wireless sensor network: the collected information is forwarded to the external network, and the tasks submitted by the management node are released to the sensor network.

2.2. Wireless Sensor Network Data Acquisition Based on the Wireless Sensor Network

The sensor domain is regarded as the basic unit of the network. A domain is composed of a sink and ordinary nodes. The sink node is assumed to have sufficient power and a certain computing power, whereas the energy and computing power of ordinary nodes are limited. Ordinary nodes are evenly distributed around the sink and can be replenished by some form of energy. The domain covers a D × D area that is divided into B × B blocks. In each block, patrol nodes are generated at random and are responsible for event discovery, node wake-up, and neural computing.

When the network is built for the first time and no kernel is available, data collection is carried out after the network is formed, and kernel training is completed by the sink or base station (BS). The kernels obtained in this way can also be used for later networking. The training process includes model compression by pruning, quantization, and Huffman coding, in which quantization uses a formula to update the cluster centers with the gradients from error backpropagation.
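The quantization step can be illustrated with a minimal sketch. It assumes a weight-sharing scheme in which every weight is replaced by the center of its cluster and the gradient of a center is the sum of the gradients of its member weights; the source does not give the formula, so this scheme, the function names, and the learning rate are assumptions made for illustration only.

```python
def assign_clusters(weights, centroids):
    """Index of the nearest centroid for every weight."""
    return [min(range(len(centroids)), key=lambda k: abs(w - centroids[k]))
            for w in weights]


def update_centroids(centroids, assignments, grads, lr=0.1):
    """One gradient step on the shared weights.

    Assumption: the gradient of a centroid is the sum of the
    backpropagated gradients of all weights assigned to it.
    """
    centroid_grads = [0.0] * len(centroids)
    for k, g in zip(assignments, grads):
        centroid_grads[k] += g
    return [c - lr * g for c, g in zip(centroids, centroid_grads)]


# Made-up weights, centroids, and gradients for illustration.
weights = [0.9, 1.1, -1.0, -0.8]
centroids = [-1.0, 1.0]
grads = [0.2, 0.4, -0.1, -0.3]

assignments = assign_clusters(weights, centroids)
new_centroids = update_centroids(centroids, assignments, grads)
```

Only the cluster index of each weight and the small table of centers need to be stored on a node, which is what makes the subsequent Huffman coding of the indices worthwhile.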

The sensor domain is managed by the sink. At the beginning, the sink partitions the sensor domain, records the partition blocks, and randomly generates the patrol nodes; the kernels are randomly distributed to all nodes, or stored in the nodes before network deployment, and the nodes then go dormant. A node can replenish its energy in some form, and charging can continue while the node is dormant [10]. Figure 2 shows the process of network operation.

The basic unit domain of the network operates in rounds, and the whole domain is managed by the sink to maximize the distributed and parallel processing advantages of WSNs. In each round, a certain number of patrol nodes are randomly selected in each area to form the event sensing chain of the whole sensing domain [11]. For example, with 100 nodes in each zone, a 1 s acquisition interval in each round, and two patrol nodes selected per round, rotating through all nodes in the area takes about 50 s. Each block forms a data packet and applies a pooling operation to it to obtain a pooled output. The pooled output is collected by the neighboring patrol nodes, and patrol nodes in the same zone share the data. Pooling over the region therefore yields a B × B output, one value per block.
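The rotation time and the block-wise pooling can be checked with a short sketch. The function names are hypothetical, the readings are made up, and max pooling is an assumption, since the text does not state which pooling operation is used.

```python
import math


def rotation_time(nodes, patrols_per_round, interval_s):
    """Seconds needed for every node in a zone to serve as patrol once."""
    return math.ceil(nodes / patrols_per_round) * interval_s


def block_pool(grid, b, pool=max):
    """Pool a D x D grid of readings into a b x b output, one value per block.

    Assumes that b divides D evenly.
    """
    size = len(grid) // b  # side length of one block
    return [[pool(grid[r][c]
                  for r in range(i * size, (i + 1) * size)
                  for c in range(j * size, (j + 1) * size))
             for j in range(b)]
            for i in range(b)]


# Example from the text: 100 nodes, 2 patrols per round, 1 s interval.
seconds = rotation_time(100, 2, 1)

# Made-up 4 x 4 readings pooled into B x B = 2 x 2 outputs.
readings = [[1, 2, 5, 6],
            [3, 4, 7, 8],
            [9, 10, 13, 14],
            [11, 12, 15, 16]]
pooled = block_pool(readings, 2)
```

Passing a different function as `pool` (for example a mean) changes the pooling rule without changing the block layout.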

The index table of the extracted kernels is filled in first, and the index table and the original data are uploaded first. Further feature extraction is performed by the patrol nodes along the data flow and is completed in parallel. After feature extraction is completed, the feature image is uploaded. If extraction with all kernels has been completed, only the feature image and the index table are transmitted [12]. If it has not been completed, the sink performs the subsequent feature extraction and computes the remaining levels of the whole model to realize event recognition.

2.3. Deep Learning Algorithm Model and Behavior Recognition of Wireless Sensor Data

The algorithm model based on deep learning can effectively extract low-dimensional features of data. The feature extraction and classification model proposed in this paper is a single-hidden-layer neural network. Its parameters are optimized so that the output of the hidden layer can reconstruct the input as closely as possible; the output of the hidden layer can then be regarded as the low-dimensional feature after dimension reduction. The deep learning model algorithm is shown in Figure 3.
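A minimal sketch of such a single-hidden-layer reconstruction network is given below. The sigmoid encoder, linear decoder, squared-error loss, learning rate, and toy data are all assumptions chosen for illustration; the paper does not specify them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Autoencoder:
    """Single-hidden-layer autoencoder: sigmoid encoder, linear decoder."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        """Hidden-layer output, used as the low-dimensional feature."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, x, lr=0.1):
        """One SGD step on the reconstruction error; returns the squared error."""
        h = self.encode(x)
        y = self.decode(h)
        err = [yi - xi for yi, xi in zip(y, x)]
        # Back-propagate to the hidden layer before the decoder is updated.
        dh = [sum(err[i] * self.w2[i][j] for i in range(len(x))) * h[j] * (1 - h[j])
              for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w2[i][j] -= lr * err[i] * h[j]
            self.b2[i] -= lr * err[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.w1[j][k] -= lr * dh[j] * x[k]
            self.b1[j] -= lr * dh[j]
        return sum(e * e for e in err)


# Made-up 4-dimensional readings that lie on a 2-dimensional structure.
data = [[0.1, 0.2, 0.1, 0.2], [0.9, 0.8, 0.9, 0.8],
        [0.2, 0.9, 0.2, 0.9], [0.8, 0.1, 0.8, 0.1]]
model = Autoencoder(n_in=4, n_hidden=2)
losses = [sum(model.train_step(x) for x in data) for _ in range(500)]
features = model.encode(data[0])  # 2-dimensional feature of the first sample
```

After training, `encode` plays the role of the dimension-reduction step: its output would be passed to the classifier in place of the raw reading.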

Before the sensor node data are classified and fused, the corresponding feature extraction and classification model needs to be trained. A deep learning algorithm based on the wireless sensor network is designed for both cases, training samples with and without label information, under the following assumptions. (1) Each network node has a unique ID number and a fixed position after random deployment; all nodes have the same initial energy, which cannot be replenished. (2) The sink node is deployed outside the sensing area, with a fixed location, sufficient energy, and strong storage and computing capacity. (3) The sink node can send data directly to any node, whereas the power of ordinary nodes is limited; each node can obtain its own location information.

With the continuous development of behavior recognition technology and the rapid development of wireless sensor technology, human behavior recognition based on wireless sensors is gradually becoming a research hotspot in this field. Because of the many advantages of sensors, this work uses sensor data to study human behavior recognition, which is defined as follows.

Suppose that the user performs some predefined human behaviors, represented by the set $A = \{A_i\}_{i=1}^{m}$, where $A_i$ represents a specific human behavior and $m$ represents the number of human behavior categories. Under different behaviors, the sensor readings differ and follow different patterns. The sensor readings over a period of time can be expressed as $x = \{x_1, \ldots, x_t, \ldots, x_n\}$, where $x_t$ is the sensor reading at time $t$. The actual category of human behavior is expressed in formula (1), where $C^{*}_t$ represents the actual human behavior at time $t$:

$$C^{*} = \{C^{*}_t\}_{t=1}^{n}, \quad C^{*}_t \in A. \tag{1}$$

The goal of human behavior recognition is to establish a behavior recognition model $h$ with a learning algorithm and to use it to predict human behavior. The predicted behavior category is expressed as $C$:

$$C = \{C_t\}_{t=1}^{n} = h(x), \quad C_t \in A. \tag{2}$$

In the mathematical sense, the learning goal of human behavior recognition is to learn the model $h$ by minimizing the difference $L$ between the predicted behavior category $C$ and the actual behavior category $C^{*}$. In general, the model $h$ does not take the sensor reading $x$ directly as its input; the sensor reading is transformed first, and the transformation is expressed as $\phi(x)$. The learning goal is then expressed as follows:

$$\min_{h} \; L\big(h(\phi(x)),\, C^{*}\big). \tag{3}$$

At present, the sensor-based human behavior recognition model can be represented as shown in Figure 4. It mainly includes sensor data preprocessing (signal denoising and sliding window segmentation), feature extraction, and model learning. In terms of the mathematical description above, the sample x is obtained after sliding window segmentation, feature extraction corresponds to the transformation of the sensor readings, and the learning model h corresponds to the classification algorithm. Each part of the behavior recognition process is described in detail as follows.
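The two preprocessing steps can be sketched as follows. A moving average stands in for signal denoising, which is an assumption because the paper does not name the filter, and the window length and step are hypothetical parameters.

```python
def moving_average(readings, k):
    """Simple denoising: mean of every k consecutive readings."""
    return [sum(readings[i:i + k]) / k for i in range(len(readings) - k + 1)]


def sliding_windows(readings, window, step):
    """Split a reading sequence into fixed-length, possibly overlapping windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [readings[start:start + window]
            for start in range(0, len(readings) - window + 1, step)]


raw = list(range(10))                     # stand-in for readings x_1 ... x_n
smoothed = moving_average([1, 2, 3, 4], 2)
samples = sliding_windows(raw, window=4, step=2)
```

With a step smaller than the window, consecutive samples overlap (here by 50%), so each sample passed to feature extraction still carries context from its neighbors.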

3. Analysis of Motion Behavior Sensor Data and Feature Extraction

3.1. Data Analysis of the Motion Behavior Sensor

The dataset, the data processing algorithm, and the model are evaluated experimentally. First, the effectiveness of the target dataset is evaluated by comparison with seven sample datasets. Then, the data processing algorithm is evaluated by comparing the original sensor acceleration data as input with the same data converted into image data, to see whether the conversion affects the trained model. Finally, comparative experiments are carried out on the optimal values of some experimental parameters.

Figure 5(a) shows that when a person goes downstairs, the center of gravity changes in the vertical direction and also in the horizontal direction along with the movement, but mainly along the z-axis. Therefore, the data change along the z-axis is more obvious than along the x-axis and y-axis. In the figure, the initial acceleration values of the x-axis and y-axis are close to 0, and that of the z-axis is about 2. The z-axis acceleration fluctuates within [−2, 2], most z-axis values are greater than 0, and the x-axis and y-axis are relatively stable. The z-axis value fluctuates strongly at the beginning of sampling and changes little later.

Figure 5(b) shows the upstairs behavior, in which the center of gravity again changes both vertically and horizontally with the movement. Theoretically, when the participant goes upstairs, the z-axis value should be positive, and the values on the x-axis and y-axis should fluctuate slightly. In the figure, the initial z-axis acceleration is close to −1 and increases continuously. More than 90% of the z values are positive, the z-axis fluctuates within [−1, 2], and the x-axis and y-axis are relatively stable. This is consistent with the actual situation.

Figure 5(c) shows the standing action. If the participant stands absolutely still, the acceleration values of x and y should be zero, while the z-axis has an acceleration value due to the gravity of the Earth. In the figure, the x-axis value is 0 in 95% of the whole interval, the y value is close to 0.4 throughout, and the z-axis value is stable around −0.5. The deviation from the ideal values is due to error in the actual data acquisition.

Figure 5(d) shows that when participants sit, the triaxial data tend to be stable, and most of the fluctuations are not obvious. Figure 5(e) shows that when participants run, the values of all three axes fluctuate obviously. The three-axis acceleration oscillates around 0 with obvious regularity, the fluctuation ranges of the three axes are all [−2, 2], and the initial values are all around 0. Because running is a vigorous activity compared with standing, the three-axis values change greatly. Figure 5(f) shows that when participants walk, the three-axis data fluctuate significantly. The y-axis acceleration oscillates around 0 with obvious regularity. The fluctuation range of the x-axis is [−1, 1], that of the y-axis is [−1, 2], and that of the z-axis is [−1, 3], and the initial values are all around 0. The accelerations of walking and running are similar on the x-axis and y-axis, and the z-axis acceleration of walking is higher than that of running.

3.2. Data Feature Extraction and Analysis of Motion Recognition

The UCI daily and sports dataset and the USC-HAD dataset were used. UCI daily and sports contains data from 8 people, collected with acceleration, gyroscope, and magnetometer sensors, covering 19 categories of daily activities. USC-HAD contains data from 14 people, collected with acceleration and gyroscope sensors, covering 12 kinds of daily activities; we merge the elevator-up and elevator-down activities into one, which leaves 11 kinds. To study the distribution of features, we select two people from the two datasets, use 80% of their data to train the model, and change the number of output nodes in the last fully connected layer to 2, so that this layer extracts a two-dimensional feature for each activity whose distribution can be observed in the two-dimensional coordinate plane.

In Figure 6, the horizontal and vertical axes represent the coordinate values of the two-dimensional features. The figure shows that the feature distributions of many activity classes lie very close together, and some intersect; this is called too small class spacing. Comparing the data of UCI daily and sports with those of USC-HAD, the distribution of the UCI dataset is more dispersed than that of the USC dataset and shows larger class spacing, while the too small class spacing of the USC dataset may lead to feature mixing, confusion of the classifier, and incorrect classification. The class spacing cannot be defined simply as the distance between the feature centers [6, 13]. Because the features of activity recognition are distributed in long strips, we take the class spacing to be the distance between the edges at which the distributions of two classes are closest. The intraclass distance is the mean square distance between the sample points of the same class. Because the features here are two-dimensional, both can be measured by Euclidean distance; general features are high-dimensional, and their measurement needs further research. The features extracted by a CNN are often distributed in a long strip, so the distribution of a single activity tends to be very long; this is called a large intraclass distance. In addition, using only one person’s data can easily lead to overfitting. Overall, the features of activity recognition data show a small distance between classes and a large distance within classes. If the distance between classes is too small, the features may be mixed and the classifier confused. If the distance within a class is too large, the features at its edge come closer to those of other classes, which makes recognition difficult. Finally, the similarity of activities as perceived by humans cannot be equated with the similarity of the sensor data features.
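The two distances discussed above can be sketched for two-dimensional features as follows. Reading "distance between the closest edges" as the minimum pairwise Euclidean distance, and "mean square distance" as the mean of squared pairwise distances, are interpretations rather than definitions taken from the paper; the feature points are made up.

```python
import math
from itertools import combinations


def inter_class_spacing(a, b):
    """Class spacing: distance between the closest edges of two classes,
    taken here as the minimum pairwise Euclidean distance."""
    return min(math.dist(p, q) for p in a for q in b)


def intra_class_distance(points):
    """Intraclass distance: mean squared Euclidean distance over all
    pairs of points in one class."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs)


# Made-up 2-D features; "walking" forms an elongated strip.
walking = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
running = [(3.0, 0.0), (3.0, 4.0)]

spacing = inter_class_spacing(walking, running)
spread = intra_class_distance(walking)
```

In this toy case the class centers are far apart while the closest edge points are only 1 unit apart, which illustrates why the center distance alone does not describe the class spacing of strip-shaped distributions.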

4. Moving Object Detection Based on Deep Learning-Based Wireless Sensor Networks

Based on the characteristics of an ergodic process, a histogram measurement model of the received signal strength value based on the detection window is established through a single-link experiment. Two sensor nodes form a link of a certain length, and the target to be detected and located moves back and forth at uniform speed along the link in a given environment; the attenuation of the received signal strength value on the single link is recorded statistically, together with the length and environment of the link. These steps are repeated, and the resulting attenuation distributions are averaged to obtain the reference received signal strength value histogram. In the wireless sensor network, a circular window with radius R is defined as the detection window, and every link that passes through the detection window is defined as an affected link of that window. During real-time localization, the statistical histogram of the attenuation of the received signal strength value over the affected links is the candidate received signal strength value histogram.
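How a candidate histogram might be built for one position of the detection window can be sketched as follows. The node names, coordinates, attenuation values, and bin edges are made up, and the segment-to-circle test is one straightforward way to decide whether a link passes through the window.

```python
import math


def link_hits_window(p1, p2, center, radius):
    """True if the link (segment p1-p2) passes within `radius` of `center`."""
    (x1, y1), (x2, y2), (cx, cy) = p1, p2, center
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    # Projection of the window centre onto the segment, clamped to its ends.
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((cx - x1) * dx + (cy - y1) * dy) / length_sq))
    return math.hypot(x1 + t * dx - cx, y1 + t * dy - cy) <= radius


def attenuation_histogram(attenuations, bin_edges):
    """Normalised histogram of RSS attenuation values over bins [lo, hi)."""
    counts = [0] * (len(bin_edges) - 1)
    for a in attenuations:
        for i in range(len(counts)):
            if bin_edges[i] <= a < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Hypothetical links (node pairs with coordinates) and attenuations in dB.
links = {("n1", "n2"): ((0, 0), (4, 0)), ("n3", "n4"): ((0, 3), (4, 3))}
attenuation = {("n1", "n2"): 6.5, ("n3", "n4"): 0.4}

window_center, R = (2, 0.5), 1.0
affected = [k for k, (a, b) in links.items()
            if link_hits_window(a, b, window_center, R)]
candidate_hist = attenuation_histogram([attenuation[k] for k in affected],
                                       [0, 2, 4, 6, 8])
```

Moving `window_center` over a grid of positions and repeating the last two statements yields one candidate histogram per scanned position.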

The histogram measurement model of the received signal strength value and the Bhattacharyya distance are used to compute the possible positive positions of the target, and a clustering algorithm is used to realize target detection and location. The detection window scans the whole network area, and at every scanned position a histogram of the candidate received signal strength value for that position is obtained. Once the candidate histograms of all positions at the current time are available, each is compared with the reference received signal strength value histogram. The comparison measures the dissimilarity of the two discrete distributions with the Bhattacharyya distance, and a threshold is set on this distance: when the distance between the candidate histogram at a position and the reference histogram is lower than the threshold, the position is marked as a positive position. The hierarchical clustering algorithm then clusters all the positive positions in the network according to spatial distance; each resulting class represents a target, and the number of targets is the final number of classes [14, 15].
In addition, the clustering algorithm includes the following steps: (1) each positive position is initially defined as its own class, which gives n different classes, and the distance between classes is defined as the spatial distance between the positive positions they contain; (2) the pair of classes with the smallest distance is found, and if this distance is less than the preset threshold, the two classes are merged into one class whose location is the average of the coordinates of its positive positions; (3) the distances between the new class and all the remaining classes are calculated, and steps (2) and (3) are repeated until no pair of classes is closer than the threshold.
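The thresholding and clustering steps can be sketched as follows. The histograms, positions, and both thresholds are made up, and measuring the distance between classes as the Euclidean distance between their average coordinates is an assumption made for this illustration.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalised histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return float("inf") if bc == 0 else -math.log(bc)


def positive_positions(candidates, reference, threshold):
    """Positions whose candidate histogram is close enough to the reference."""
    return [pos for pos, hist in candidates.items()
            if bhattacharyya_distance(hist, reference) < threshold]


def cluster_positions(positions, merge_threshold):
    """Agglomerative clustering: repeatedly merge the two closest classes
    until the smallest distance is no longer below the threshold."""
    def centroid(c):
        return (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))

    clusters = [[p] for p in positions]
    while len(clusters) > 1:
        pairs = [(math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d >= merge_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


# Made-up reference histogram and candidate histograms per scanned position.
reference = [0.1, 0.2, 0.3, 0.4]
candidates = {
    (1.0, 1.0): [0.1, 0.2, 0.3, 0.4],
    (1.5, 1.0): [0.1, 0.25, 0.3, 0.35],
    (8.0, 8.0): [0.15, 0.2, 0.25, 0.4],
    (4.0, 4.0): [0.7, 0.3, 0.0, 0.0],
}
positives = positive_positions(candidates, reference, threshold=0.05)
targets = cluster_positions(positives, merge_threshold=2.0)
```

Each entry of `targets` is the average coordinate of one class, and `len(targets)` is the estimated number of targets; because every frame is processed from scratch, an error in one frame is not carried into the next.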

The positive position of the target is determined by cascading a low-precision histogram and a high-precision histogram. First, the low-precision received signal strength value histogram of each position is calculated and used to exclude the large number of areas that obviously contain no target. Then, in the remaining area, the high-precision received signal strength value histogram is calculated and used to further confirm the positive position of the target [16]. Figure 7 shows the schematic diagram of RSS value time distribution based on the single link.
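The cascade can be sketched as a two-stage filter. The bin counts, thresholds, value range, and attenuation samples below are made up, and treating "low precision" as "fewer bins" is an assumption made for this illustration.

```python
import math


def histogram(samples, n_bins, lo, hi):
    """Normalised histogram over n_bins equal-width bins on [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        if lo <= s < hi:
            counts[min(int((s - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def bhattacharyya_distance(p, q):
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return float("inf") if bc == 0 else -math.log(bc)


def cascade_detect(samples_by_pos, ref_samples, lo, hi,
                   coarse_bins=2, fine_bins=8, coarse_thr=0.1, fine_thr=0.1):
    """Coarse histogram test first; fine histograms only for the survivors."""
    coarse_ref = histogram(ref_samples, coarse_bins, lo, hi)
    survivors = [pos for pos, s in samples_by_pos.items()
                 if bhattacharyya_distance(histogram(s, coarse_bins, lo, hi),
                                           coarse_ref) < coarse_thr]
    fine_ref = histogram(ref_samples, fine_bins, lo, hi)
    return [pos for pos in survivors
            if bhattacharyya_distance(histogram(samples_by_pos[pos], fine_bins, lo, hi),
                                      fine_ref) < fine_thr]


# Made-up attenuation samples (dB) for the reference and three positions.
ref_samples = [5, 6, 6, 7, 7, 7]
samples_by_pos = {
    (2, 2): [5, 6, 6, 7, 7, 6],   # close to the reference
    (5, 5): [0, 1, 0, 1, 0, 1],   # rejected by the coarse stage
    (7, 1): [4, 4, 4, 5, 4, 4],   # passes coarse, rejected by the fine stage
}
detected = cascade_detect(samples_by_pos, ref_samples, lo=0, hi=8)
```

Positions without a target are typically the large majority, so most of them leave the pipeline after the cheap coarse comparison and the fine histogram is computed only for the few remaining candidates.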

As shown in Figure 7, there are two sections of RSS value data from different time periods on the same link, during which targets move back and forth on the link. The characteristic parameters calculated for the two sections of data are basically constant: −59.5 dBm, 4.59 dBm, and −58.7 dBm. As long as the environment is generally stable, the RSS measurement value on the link can therefore be regarded as a stationary process.

As shown in Figure 8, the upper part gives the RSS attenuation histogram on a 3 m long link, and the lower part gives the average of the RSS attenuation histograms measured on different links. Their distributions are very similar, so the RSS measurement value on the link can be regarded as an ergodic process. This means that the reference RSS histogram, built from statistics over time, is comparable with the candidate RSS histogram, built from statistics collected over space in actual operation.

As shown in Figure 9, the detection accuracy of the histogram method using the traditional methods in the literature is lower than that using the deep learning algorithm. In the cascade of the low-precision and high-precision histograms, the positive position of the target is determined as follows. For the detection and positioning result of each frame, the detection window must scan the whole network, calculate the histogram at each position, and compare it with the reference RSS histogram. To save computation, the first step calculates the low-precision RSS histogram of each location, and these histograms exclude a large number of regions that obviously contain no target. The traditional methods in the literature do not have this advantage.

Figure 10 compares the real trajectories and the tracked trajectories of the three targets in the horizontal and vertical directions. For most of the time, the estimated position of each target agrees well with the real position; at a few positions, the number of targets is estimated incorrectly.

5. Conclusion

This paper studies sensor data in moving target recognition and applies deep learning algorithms and transfer methods. It makes full use of the ability of mobile devices to communicate and to collect and process data, and it collects users’ travel data through different wireless sensors. After traditional statistical features are extracted from the sampled data, deep features are learned through several deep learning network models, and important hyperparameters are monitored and adjusted to improve recognition accuracy. A received signal strength histogram measurement model based on the detection window is established through a single-link experiment; this model and the Bhattacharyya distance are used to compute the possible positive positions of the target; the hierarchical clustering algorithm is used to detect and locate targets; and a cascade of the low-precision histogram and the high-precision histogram determines the positive position of the target. The resulting method achieves fast real-time detection with the deep learning algorithm based on the wireless sensor network. The detection and positioning method does not accumulate errors, and the established measurement model has a sufficient theoretical basis, high precision, and strong robustness; at the same time, the histogram cascade keeps the overall computational cost within the range of real-time use. Fine-grained recognition of moving targets improves the reliability of recognition and supports detection with high precision, low energy consumption, and robustness.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.