Abstract

As one of the crucial sensing methods, multisensor fusion recognition aids the Internet of Things (IoT) in connecting things through ubiquitous perceptual terminals. The small size, slow flight speed, low flight altitude, and low electromagnetic intensity of unmanned aerial vehicles (UAVs) have put enormous strain on air traffic management and airspace security, so effective UAV target detection is urgently needed. Radio monitoring, acoustic detection, computer vision, and radar signal detection are the technologies commonly used in this field. However, radio monitoring has low accuracy, acoustic detection has a limited range, computer vision is limited by weather conditions, and radar signals at low altitudes are affected by ground clutter. To address these issues, this paper proposes an information fusion strategy operating at two levels, data-level fusion and decision-level fusion, in which computer vision and radar signals complement each other to improve detection accuracy. The fusion method at each level is introduced in detail, and its effectiveness is demonstrated by a series of comprehensive experiments. The results show that the fusion method improves detection accuracy and that the proposed method continues to work even when one of the single methods fails.

1. Introduction

The rapid development of the types and quantities of UAVs [1] has broadened their range of applications from military [2–4] to civilian uses (e.g., smart agriculture [5, 6], smart cities [7], and the environment [8]). These developments show that UAVs will have a significant impact on future production and lifestyles. At the same time, certain UAVs may be used illegally, which may disturb the normal airspace order [9]. Such issues put pressure on air traffic management and even threaten airspace security. Faced with these problems, it is crucial to employ adequate detection methods to accurately detect and track UAVs. UAV target detection refers to positioning, detecting, warning about, or classifying UAVs through related technologies. By achieving this goal, researchers can obtain more precise information on UAVs and provide support for subsequent activities such as path planning, obstacle avoidance, and equipment maintenance. Computer vision, remote sensing, and radar detection are the three main approaches in this field [9]. In this paper, we focus on the combined use of computer vision and radar detection.

The identification of UAVs based on computer vision mainly analyses images captured by cameras or snapshotted from video streams. These digital images may contain different kinds of UAVs, which are judged by the corresponding algorithms. Benefitting from easy image acquisition, simple data processing, and low equipment cost and weight, this method has attracted considerable attention and has been applied commercially, for example, by CASIA [10]. In [11], the authors proposed the CenterNet method for drone target detection, applied TensorRT to accelerate and split the network model, and proposed a method for location tracking using multiple cameras. However, this method also has limitations in this particular field. For example, a method based on computer vision cannot work properly at night.

Radar is another commonly used type of equipment for detecting UAVs. By processing the reflection of electromagnetic waves, the direction, distance, position, and speed of UAVs can be determined. Electromagnetic waves are not affected by lighting conditions and can penetrate clouds [12], so radar can easily detect high-altitude targets. With the continuous development of radar technology and the application of passive radar, radars can provide all-round airspace surveillance [13]. In [14], micro-Doppler characteristics are used to identify UAVs, and the cepstrum method is used to extract the number and speed of the UAV rotors. However, as an application of electromagnetic waves, radar signals are easily interfered with by other electromagnetic waves [15]. At the same time, the radar may be unable to detect low-altitude targets. As a result, the radar approach is not omnipotent, and a blind detection area remains in some cases.

To overcome the shortcomings of computer vision and radar signals in detection, we propose a set of information fusion methods that combine computer vision and radar signals, overcoming the limitations of any single method and maximizing the detectable range of UAVs. In the process of information fusion, we fuse the information at two levels, namely the data level and the decision level. By implementing fusion at both levels, we examine the efficiency of the information fusion methods and compare the improvement in detection accuracy against each single method.

The main contributions of this paper are as follows:

(1) According to the two levels of information fusion, information fusion models suitable for computer vision and radar signals are given, respectively. For each model, the modules within it are described in detail to establish its feasibility in theory. The methods used in each model can be replaced, which makes the model extensible.

(2) For each model, an implementation based on specific methods is designed. To demonstrate its effectiveness, comprehensive experiments were performed to compare the detection accuracy before and after fusion. The experimental results show that both implementations are feasible and effective.

The remainder of this paper is arranged as follows: Section 2 discusses related work on computer vision and radar in UAV detection, as well as existing information fusion methods. Section 3 introduces the proposed method and gives specific implementations of the data fusion model and the decision fusion model. Section 4 presents comprehensive experiments implementing the two methods and analyses them separately. Finally, the conclusion of this paper is given in Section 5.

Figure 1 depicts a typical scene of identifying UAVs. In the city, radars and cameras on rooftops continuously monitor the airspace. When an unknown UAV enters the monitored area, it is detected by both cameras and radars. The system then identifies the UAV by fusing the data collected by the cameras and radars. Because both cameras and radars are used, UAV identification is more efficient and accurate than with either method alone.

2.1. Computer Vision

Artificial intelligence (AI), particularly deep learning, has offered significant technical assistance for computer vision. To achieve the objective of detection, cameras are used to acquire images, and the corresponding algorithms are used to determine whether there are specific targets. This technology has been used widely in facial recognition, medical diagnosis, UAV detection, and other sectors, with positive results. The following methods are commonly used in UAV detection.

2.1.1. Convolutional Neural Network (CNN)

CNN is widely used in UAV detection because convolutional kernels can effectively extract target features from images. To detect UAVs in video sequences, Aker and Kalkan [16] proposed an end-to-end model based on CNN. They also proposed an algorithm based on background subtraction to solve the problem of insufficient data for training the model. To identify UAVs accurately and determine their types and flight modes, Allahham et al. [17] proposed a new detection method that uses a multichannel one-dimensional CNN and achieved good results on the DroneRF dataset. Reducing background interference can improve detection accuracy; therefore, Zhang et al. [18] used Mask R-CNN to eliminate invalid areas in the UAV detection process and used an attention mechanism to detect the targets.

2.1.2. YOLO (You Only Look Once)

Unlike other methods, the YOLO method requires only a single recognition pass. Hu et al. [19] improved YOLO v3 to make it more suitable for detecting small targets such as UAVs. Owing to the unique advantages of YOLO, achieving real-time detection has become a focus of UAV detection [20, 21].

Other computer vision methods are also being applied in this field, such as boosting [22], fuzzy clustering [23], and multiple neural networks (MNNs) [24]. These methods have unique characteristics and play an important role in detecting UAVs.

2.2. Radar Methods

Radar plays a significant role in the detection of UAVs due to its intrinsic properties. Moving from military to civilian applications, this technology has begun to evolve toward ease of use and low cost. The radar technologies used in this field mainly include digital array radar [14], multiple-input multiple-output (MIMO) radar [25], continuous wave radar [26], synthetic aperture radar (SAR), and inverse synthetic aperture radar (ISAR).

SAR can penetrate clouds, smoke, and fog and produce high-resolution images [27–29], which reduces the impact of weather conditions on detection. To detect suspicious UAVs while reducing cost, Park et al. [30] designed a system based on low-cost SAR. The system is autonomous and mobile and performed well in tests.

ISAR emerged during the evolution of SAR and plays an important role in the detection of long-range targets because it provides high-resolution imaging. Pieraccini et al. [31] examined the radar cross section (RCS) of tiny UAVs and employed ISAR for 2-dimensional (2D) and 3-dimensional (3D) imaging, with good experimental results. The authors in [32] proposed introducing Bayesian statistics into ISAR to address the problem of a small RCS. The efficiency of this strategy, which used the posterior probability density to determine the imaging results, was confirmed by simulation.

2.3. Information Fusion Methods

Some studies examined multiple types of fusion procedures in order to enhance accuracy. In the research of Kim et al. [33], new images were synthesized for UAV detection; this method combined the time-domain and frequency-domain information of the micro-Doppler signature (MDS), and these images formed the data set for classification. Training a CNN on them improved accuracy by more than 5%, which demonstrated the effectiveness of this merging method. Joshi et al. [34] reviewed 112 articles, all of which fused optical and radar remote sensing data, which is of great significance to the research in this paper. These studies were applied in the field of land observation, and many showed that fusion performed better than any single method. For traditional classification algorithms, the most common approach was to fuse before classification, with pixels as the input. That review discussed the related articles from multiple perspectives and fully explained the application status of fusion methods.

3. Methods to Identify UAVs by Means of Information Fusion

The redundancy design of the system helps to improve system performance and robustness. This work introduces an information fusion method based on computer vision and radar signals to enlarge the detectable range of UAVs. A two-level information fusion system, including data fusion and decision fusion, is designed in this work. On the one hand, the UAV’s position, namely its coordinates, is the primary target of the data fusion part; on the other hand, decision fusion aims to fuse the unique features of the UAV. Figure 2 presents the UAV identification system based on information fusion.

3.1. Data Fusion
3.1.1. Digital Data Processing

When a 3D object is photographed by an optical camera, it is turned into a 2D image, but its relative position in the picture remains unchanged. An image is composed of many pixels, so pixel coordinates can be used to express an object's position when determining its location. However, an appropriate coordinate system must be selected. The image coordinate system, the camera coordinate system, and the world coordinate system are the three types of coordinate systems used for photographs. Among them, the image coordinate system is a 2D coordinate system, and the other two are 3D coordinate systems. In this paper, the camera captures images of the UAVs directly, so the camera coordinate system is the best option for calibrating the UAV’s position. In the processing of digital data, the position of a UAV is therefore marked with 2D coordinates.

3.1.2. Radar Data Processing

Broadband radar can identify the target's direction and distance via the echo. Through the micro-Doppler effect of the UAV, the position of the target can be obtained, and then, according to its orientation, other characteristics such as the height of the target can also be derived. Therefore, unlike the camera, the radar can measure height, so the target's position obtained in this way is 3D information.

3.1.3. Data Fusion Method

In order to better detect the same UAV target, optical images and radar positioning can be combined. If the coordinates obtained by the two sensors are the same after being transformed into a common coordinate system, it can be determined that they correspond to the same UAV, which realizes detection at the data level. Transforming the 3D coordinates of the object obtained by the radar into the 2D coordinates of the image taken by the camera is essentially a perspective projection problem. This problem can be solved in the following three steps:

(1) Determine the Projection Plane. The 3D coordinates of the object obtained by the radar take the radar as the origin of the reference system, so the observation point of the optical camera generally does not coincide with the origin of the radar coordinates. As shown in Figure 3, the reference point A and the observer coordinate S are determined, an arbitrary reference direction point B is taken, and a sight distance is then set to determine the projection plane HPFK. In this case, the projection plane can be written in the general form

ax + by + cz + d = 0.

The coefficients a, b, and c are given by the components of the plane's normal vector, namely the direction of the line from the observer S to the reference point A, and d is determined by requiring the plane to pass through the point located at the chosen sight distance from S along that direction.

(2) Determine the Projection Plane Coordinate System. After the reference point A is given, the line between the observer and the reference point is the normal vector of the projection plane. This straight line I can be written in parametric form as (x, y, z) = S + t(A - S), where t is a parameter.

The straight-line equation and the projection plane equation can be combined to find the intersection point D, which is set as the origin of the projection plane. In the same way, the intersection point E of the line between the reference direction point and the observer with the projection plane can be obtained. These two 3D points correspond to two 2D points on the projection plane, and the vector from D to E is set as the positive y-axis direction on the projection plane.

Using the straight line I as the axis, the reference direction point is rotated clockwise about this axis by 90° (as viewed from the observer's position), so that no ambiguity arises (i.e., no two candidate points are generated), and a new coordinate point C is obtained. Similarly, this point is connected with the observer, and the intersection point F of the connecting line with the projection plane is found, so that the vector from D to F defines the positive x-axis direction on the projection plane.

(3) Conversion of 3D Coordinates to 2D Coordinates. To determine the coordinates of a target point, the observer and the target are connected directly, and the intersection of this line with the projection plane is the projection of the target onto the 2D plane. Since the origin and the positive x-axis and y-axis directions on the projection plane have already been determined, the projection coordinates of the target point on the 2D plane can be obtained.
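The three steps above can be summarised in a short sketch. The following Python function is a minimal illustration, assuming NumPy and using S, A, and B for the observer, reference point, and reference-direction point of Figure 3; the function name, variable names, and the sight-distance parameter are chosen here for illustration and are not taken from the paper.

```python
import numpy as np

def radar_point_to_image_plane(S, A, B, target, sight_distance):
    """Project a radar-derived 3D point onto the 2D projection plane of Section 3.1.3.

    S: observer (camera) position, A: reference point, B: reference-direction point,
    target: 3D UAV position from the radar, all given in the radar reference frame.
    """
    S, A, B, target = (np.asarray(p, dtype=float) for p in (S, A, B, target))

    # Step 1: the plane normal is the direction from the observer S to the reference
    # point A; plane HPFK lies at the chosen sight distance from S along this normal.
    n = (A - S) / np.linalg.norm(A - S)
    d = -np.dot(n, S + sight_distance * n)        # plane: n . x + d = 0

    def intersect(P):
        """Intersection of the line from S through P with the projection plane."""
        direction = P - S
        t = -(np.dot(n, S) + d) / np.dot(n, direction)
        return S + t * direction

    # Step 2: build the 2D coordinate system on the plane.
    D = intersect(A)                              # origin of the plane coordinate system
    E = intersect(B)                              # fixes the positive y-axis direction
    y_axis = (E - D) / np.linalg.norm(E - D)
    # Rotating B about the axis S-A by 90 degrees and projecting gives the x-axis;
    # the cross product below is an equivalent shortcut (flip its sign if the
    # clockwise convention of the text points the other way).
    x_axis = np.cross(y_axis, n)
    x_axis /= np.linalg.norm(x_axis)

    # Step 3: project the target and read off its 2D coordinates on the plane.
    proj = intersect(target)
    return np.array([np.dot(proj - D, x_axis), np.dot(proj - D, y_axis)])
```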

Figure 4 shows how the coordinates of the UAV in the picture are converted to 3D coordinates in the real world. The radar coordinates and the coordinates of the camera are taken as known, along with the maximum horizontal and vertical viewing angles of the camera and the width and height of the photo in pixels. A cartesian coordinate system is set up to locate the points on the photo, with the upper left corner and the lower right corner as its extreme points. From the coordinates of the UAV in the picture, its coordinates in the cartesian coordinate system with the centre of the picture as the origin can be obtained, and these give the horizontal and vertical angles by which the UAV is deflected relative to the direction in which the camera points.

The direction vector of the camera erection is known. Deflecting this vector by the two angles obtained above gives the direction vector of the UAV relative to the camera.

The line connecting the camera and the UAV can then be expressed parametrically as the camera position plus this direction vector scaled by a parameter.

Similarly, another parametric equation of a line to the UAV can be worked out, and the intersection point of the two straight lines gives the UAV coordinate.
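To complement the projection above, the reverse step of Figure 4, from a pixel in the photo to a sight line in 3D and then to an estimated UAV position, can be sketched as follows. The function names, the choice of the world z-axis as the "up" direction, and the assumption that the image v-axis points downward are all illustrative; in addition, because two 3D lines rarely intersect exactly, the midpoint of their closest approach is used here as the estimated coordinate.

```python
import numpy as np

def pixel_to_sight_line(cam_pos, cam_dir, img_size, view_angles, pixel):
    """Sight line from the camera through one image pixel.

    cam_pos: camera position (3,); cam_dir: unit vector of the camera erection direction;
    img_size: (width, height) of the photo in pixels; view_angles: (horizontal, vertical)
    maximum viewing angles in radians; pixel: (u, v) coordinates of the UAV in the photo,
    with the origin at the upper left corner.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    cam_dir = np.asarray(cam_dir, dtype=float)
    w, h = img_size
    # Pixel offsets from the image centre, mapped to deflection angles.
    alpha = (pixel[0] - w / 2) / (w / 2) * (view_angles[0] / 2)
    beta = (pixel[1] - h / 2) / (h / 2) * (view_angles[1] / 2)
    # Orthonormal basis around the camera direction (world z-axis assumed to be "up").
    up = np.array([0.0, 0.0, 1.0])
    right = np.cross(cam_dir, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, cam_dir)
    # Deflect the camera direction by the two angles; v grows downward, hence the minus sign.
    direction = cam_dir + np.tan(alpha) * right - np.tan(beta) * true_up
    return cam_pos, direction / np.linalg.norm(direction)

def closest_point_between_lines(p1, d1, p2, d2):
    """Midpoint of the shortest segment between two sight lines, used as the UAV estimate."""
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    # Solve for the parameters minimising |(p1 + t1*d1) - (p2 + t2*d2)|.
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(p2 - p1) @ d1, (p2 - p1) @ d2])
    t1, t2 = np.linalg.solve(A, b)
    return ((p1 + t1 * d1) + (p2 + t2 * d2)) / 2
```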

3.2. Decision Fusion
3.2.1. Digital Decision Processing

Since YOLO was put forward [35, 36], it has been widely used because it can detect targets in real time. With YOLOv3 [37], detection accuracy for small targets improved significantly, which makes the YOLOv3 algorithm well suited to UAV detection. First, real-time detection meets the requirements of UAVs in fast flight: unlike static or slow-moving objects (such as pedestrians and ships), whose timing requirements are not strict, a UAV may move quickly, so the requirement for timeliness is naturally higher. Second, UAVs may fly at high altitude and therefore appear as very small targets in the camera, so feature extraction with a plain CNN and similar methods may not be effective. YOLOv3 rebuilt the neural network structure and reconstructed the loss function with a stronger focus on small targets, which also suits UAV detection. For these reasons, YOLOv3 is used to process the optical images, and the detection results are obtained before decision fusion. Figure 5 depicts the workflow of YOLOv3.
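As a rough illustration of how optical frames might be passed through a YOLOv3 model before decision fusion, the sketch below uses OpenCV's DNN module with a Darknet model; the file names, input size, and confidence threshold are placeholders and not the configuration used in this paper.

```python
import cv2

# Hypothetical file names; any YOLOv3 Darknet config/weights trained on UAV images would do.
net = cv2.dnn.readNetFromDarknet("yolov3-uav.cfg", "yolov3-uav.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_uavs(image, conf_threshold=0.5):
    """Return [(x, y, w, h, confidence), ...] for UAVs detected in one frame."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    detections = []
    for output in net.forward(layer_names):
        for row in output:
            confidence = float(row[5:].max())       # best class score for this box
            if confidence > conf_threshold:
                # Box centre and size are relative to the input; rescale to pixels.
                cx, cy, bw, bh = row[0] * w, row[1] * h, row[2] * w, row[3] * h
                detections.append((cx - bw / 2, cy - bh / 2, bw, bh, confidence))
    return detections
```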

Simple Online and Real-time Tracking (SORT) is a simple and efficient tracking method based on the Kalman filter and the Hungarian matching algorithm. The main shortcoming of the SORT algorithm is that the association metric it uses is valid only when the uncertainty of the state estimation is low; otherwise, tracking fails when the target is occluded.

Building on the SORT algorithm, the DeepSORT algorithm combines the motion information and the appearance information of the target as its association metric. In this way, the DeepSORT algorithm can keep tracking occluded targets.

The DeepSORT algorithm uses the detector's results to initialize the tracker and sets a counter for each tracker. The counter is incremented after each Kalman filter prediction and reset to zero when the prediction matches a detection. If no suitable detection is matched within a period of time, the tracker is deleted.

The DeepSORT algorithm combines motion information and appearance information and matches prediction boxes and tracking boxes using the Hungarian algorithm. For motion information, the algorithm uses the Mahalanobis distance to describe the degree of association between prediction results and detection results. When the uncertainty of the target's motion is low, the Mahalanobis distance is a suitable association factor. However, when the target is occluded or the camera view shakes, relying on the Mahalanobis distance alone leads to target identity switches, so appearance information must also be considered. The Mahalanobis distance provides reliable target location information for short-term prediction, while the cosine similarity of appearance features can be used to recover the target identification (ID) when the target reappears. Through linear weighting, the two metrics complement each other. Figure 6 depicts the workflow of the DeepSORT algorithm.
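In the original DeepSORT formulation, this linear weighting is a convex combination of the two distances. Writing d^{(1)}(i, j) for the Mahalanobis distance between the ith track prediction and the jth detection, and d^{(2)}(i, j) for the cosine distance between their appearance features, the association cost is

c_{i,j} = \lambda \, d^{(1)}(i, j) + (1 - \lambda) \, d^{(2)}(i, j),

and a pair is admitted to the Hungarian matching only if both distances fall below their respective gating thresholds. In the DeepSORT paper, setting the weight \lambda close to zero is recommended when the camera moves substantially, which lets the appearance term dominate.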

3.2.2. Radar Decision Processing

Chen [38] found that the micro-Doppler phenomenon caused by micromotion can also be observed in microwave radar systems, proposed a mathematical expression for the micro-Doppler effect, and argued that it has potential application value in target feature extraction. In addition to its main translational motion, a target may also undergo other mechanical motions. These additional mechanical motions modulate the frequency of the radar echo, producing the micro-Doppler effect.

The influence of mechanical motion on the radar echo differs considerably between kinds of objects (for example, the flapping of birds' wings versus the rotation of UAV rotors). Therefore, micro-Doppler features are effective for distinguishing different types of objects. At the same time, the micro-Doppler effect of UAVs also differs greatly with the speed, number, length, and other factors of the rotors, so it also plays an important role in identifying different types of UAVs.

For micro-Doppler feature extraction, Fourier methods and time-frequency analysis methods are mainly used. Because the Fourier transform cannot provide frequency information as a function of time, it is not the mainstream method for micro-Doppler feature extraction. For time-frequency analysis, the main methods include the short-time Fourier transform [39], the generalized S-transform [40], and the Gabor transform [41–43]. The Gabor transform is a short-time Fourier transform with a Gaussian window. It has no cross-terms, a fast operation speed, and clear time-frequency characteristics, so it is suitable for extracting the micro-Doppler features of UAVs. This is why this method is chosen in this paper.
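A minimal sketch of this processing step, assuming SciPy is available and using placeholder values for the sampling rate, window length, and Gaussian width:

```python
import numpy as np
from scipy.signal import stft

def micro_doppler_spectrogram(echo, fs=10_000, nperseg=256, gaussian_std=32):
    """Time-frequency map of a radar echo using a Gaussian-windowed STFT.

    echo: real or complex baseband radar echo samples; fs: sampling rate in Hz.
    The micro-Doppler signature of the rotors appears as periodic modulation
    around the body Doppler line in the returned magnitude |Zxx|.
    """
    f, t, Zxx = stft(
        echo,
        fs=fs,
        window=("gaussian", gaussian_std),   # Gaussian window, i.e., a Gabor transform
        nperseg=nperseg,
        noverlap=nperseg // 2,
        return_onesided=np.isrealobj(echo),  # keep both sidebands for complex echoes
    )
    return f, t, np.abs(Zxx)
```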

We initially estimate the position of the UAV using a one-dimensional image after receiving the radar echo of the UAV target. Then, based on the target's distribution, we choose a suitable approach for isolating the UAV target. Finally, to accomplish identification by the UAV radar system, we employ the cepstrum approach to extract the properties of the UAV. Figure 7 shows the workflow of identifying UAVs with the radar system.
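The cepstrum step mentioned above can likewise be sketched in a few lines: the real cepstrum (inverse FFT of the log magnitude spectrum) turns the periodic blade flashes of the rotors into a quefrency peak, whose location gives the blade-flash period. The quefrency search range below is an illustrative assumption.

```python
import numpy as np

def blade_flash_period(echo, fs, min_period_s=1e-3):
    """Estimate the blade-flash period of a UAV rotor from its radar echo
    using the real cepstrum: IFFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(echo)
    cepstrum = np.real(np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)))
    # Ignore the low-quefrency region, which is dominated by the spectral envelope.
    start = int(min_period_s * fs)
    peak = start + np.argmax(cepstrum[start:len(cepstrum) // 2])
    return peak / fs   # seconds per blade flash; its reciprocal is the flash rate
```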

3.2.3. Decision Fusion Methods

When multiple methods are used to detect targets, all the detection results can be fused in the decision-making stage, which is an effective way to turn weak classifiers into strong ones. This is also an important idea behind the boosting algorithm [44].

Suppose X is an unknown target of a certain category. Each detection method to be fused outputs a probability vector over the target categories and has an associated recall rate and accuracy rate. With several such detection methods available, the following rules were used for decision fusion.

Maximum rule:

Minimum rule:

Mean rule:

Product rule:

Recall rule:

Accuracy rule:
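Writing p_{ij} for the probability that the ith detection method assigns to category j, r_i and a_i for the recall and accuracy of that method, and N for the number of fused methods (notation introduced here for illustration), the six rules listed above can be written in their standard forms as follows; the normalised weightings shown for the recall and accuracy rules are an assumption about the exact weighting used.

p_j^{\max} = \max_{i} p_{ij}, \qquad
p_j^{\min} = \min_{i} p_{ij}, \qquad
p_j^{\text{mean}} = \frac{1}{N} \sum_{i=1}^{N} p_{ij}, \qquad
p_j^{\text{prod}} = \prod_{i=1}^{N} p_{ij},

p_j^{\text{recall}} = \sum_{i=1}^{N} \frac{r_i}{\sum_{k=1}^{N} r_k} \, p_{ij}, \qquad
p_j^{\text{acc}} = \sum_{i=1}^{N} \frac{a_i}{\sum_{k=1}^{N} a_k} \, p_{ij}.

Under any of these rules, the fused decision is the category with the largest fused score, \hat{c} = \arg\max_{j} p_j.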

Figure 2 depicts the decision fusion model. The model shows that the two approaches are independent until a single detection result is obtained. Because of this independent processing, multiple detection methods do not interfere with each other, and using both image and radar modalities ensures the greatest data diversity. The model is scalable because the detection methods used can be replaced. This decision-level fusion is also reflected in the properties of the data: the use of images mainly extracts features of the targets, such as the lines and textures of objects, and judges whether the targets exist, while the use of micro-Doppler signals mainly extracts time-frequency information and analyses the change of frequency over a period of time to infer the target's status. Therefore, the stage before decision fusion can also be seen as an interaction of different features, and these highly dissimilar features jointly complete the task of target detection.

4. Results and Discussion

The previous sections explain how the system works. In actual operation, the identification results may be affected by external conditions. Therefore, this section uses measured results to verify the accuracy of the system's identification of UAVs.

4.1. Introduction to the Experimental System

Considering the actual operation scenario of the system, the experiment was carried out outdoors. The test equipment includes an antenna, a vector network analyzer, a high-speed camera, a turntable, a computer, one single-rotor UAV and one quadrotor UAV, a signal amplifier, and a power supply. The devices were connected as shown in Figure 8.

4.2. Outdoor Test

Figure 9 shows the experimental system.

We conducted the outdoor experiments in three different scenarios: good light with a short detection distance, poor light with a short detection distance, and good light with a long detection distance. We chose these three scenarios to check whether the information fusion method still works when a single method fails in an unfavourable situation. The computer vision method fails to identify UAVs when the light is poor. As for the radar method, UAVs are small targets and their echo signals are weak, so it is hard to extract useful information about the targets when they are far away. Moreover, the rotors of common UAVs are made of composite materials, and their echoes are even weaker than those of metal rotors. The radar may therefore be unable to detect UAVs once they are beyond its detection range. In our experiments, we chose 4 meters as the short detection distance and 8 meters as the long detection distance.

For every scenario, the outdoor experiment ran three sets of tests on the single-rotor and quadrotor UAVs. After the UAVs take off, the turntable is activated so that the radar system and camera system can scan them. The radar system and camera system collect data and transmit it to the computer for processing. The processed data from the images taken by the camera are shown in Figure 10.

4.3. Experimental Results and Discussion

The experimental results show that the identification system first confirms, by comparing target coordinates, that the radar system and the camera system are identifying the same target. The system then compares the identification accuracy of the two subsystems. Finally, the identification system outputs the result of the subsystem with the higher identification accuracy in the given environment, which improves the accuracy by up to 9.5%. At the same time, the robustness of the identification work is guaranteed (Table 1).

Whether based on computer vision or radar, a single identification method works well when the light is good enough and the detection distance is short. However, the detection accuracy of a single method can never exceed that of the proposed information fusion method, because the maximum rule is chosen for decision-making in the proposed method. The gain in detection performance is significant, and the robustness of the identification work is guaranteed despite a higher cost, which makes the approach more practical.

In this scenario, the images of UAVs from the camera are clear enough, and it is easy to identify UAVs and track their location in the images. The detection distance is also critical for radar detection of UAV targets. Because UAVs are made of composite materials to reduce weight, their radar echo is much weaker than that of metal targets. The micro-Doppler signature is produced by the rotors of the UAVs, which are usually tiny compared with the whole UAV body. Under current technological conditions, a short detection distance is necessary to ensure the extraction of the micro-Doppler signature; otherwise, the radar cannot receive a strong enough echo from the UAVs, and the micro-Doppler signature cannot be extracted.

Table 2 shows that the recognition function of the computer vision method fails in the poor-light scenario. The cameras cannot capture usable images in dark environments, so recognition cannot be carried out, and the computer vision method fails to detect UAVs in this kind of scenario, which is dangerous for airspace surveillance. However, the radar still works well in dark environments, so the proposed information fusion method also continues to work because it includes the radar system. The additional equipment contributes to more robust performance.

Table 3 shows the weakness of the radar's recognition performance when the UAVs are far from the radar. The radar echo is weak at long distances; the longer the distance, the weaker the echo, especially when the UAVs are made of composite materials such as plastic and carbon fibre. However, the computer vision method can still work in this scenario, even though the UAVs appear smaller in the pictures captured by the cameras, so the robustness of the identification work is again guaranteed.

From experiments 2 and 3, we can conclude that the proposed method always maintains the detection function no matter which part fails, which is even clearer from Figure 11. We equipped additional devices to obtain a more robust system regardless of the extra cost because, in airspace surveillance, detection performance is the more critical concern.

5. Conclusions

In this work, we presented a UAV target identification method based on the information fusion of computer vision and radar signals. The system uses coordinates to confirm that the radar system and camera system are identifying the same target. The system then compares the identification results of the two single systems to give the final identification result. Comprehensive experiments verified that the system can identify a single-rotor UAV and a quadrotor UAV and that it is superior to either single method. Accepting extra cost in exchange for a more robust system function is a worthwhile choice.

In the future, we will replace YOLOv3 with the newer YOLOv5, which should provide better performance. After some improvements to YOLOv5, we will also try to construct a real-time UAV optical image detection system on a hardware platform. Owing to time constraints, the current UAV data set is insufficient, and situations in the real world are more complicated. Therefore, more UAVs, seasonal factors, and targets similar to UAVs (such as birds and kites) can be added to continue improving the UAV data sets. More decision-making algorithms are also planned.

Data Availability

The data used to support the findings of this study are included within the article.

Disclosure

A preprint has previously been published [45].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work by Chaoqun Fang was supported by the Central Guidance on Local Science and Technology Development Special Fund of Shenzhen City under Project no. 2021Szvup079. The work by Tao Hong was supported by the National Natural Science Foundation of China under Grant no. 61827901.