Abstract

Precise detection and positioning of weapons and equipment is required under complex ground backgrounds and changing aerial weather. Compared with traditional convolutional neural networks, the Capsule Network (CapsNet) is better suited to identifying weapons and equipment in complex backgrounds because it takes vectors as input, which preserves feature information such as the direction and angle of the target. This paper therefore proposes a radar target classification algorithm combining CapsNetv2 with infrared lidar. The algorithm simplifies the 9 × 9 convolutional layer of the traditional capsule network with a 1 × 1 reduction layer and a 3 × 3 convolution kernel, and adopts a double capsule layer that produces two prediction boxes to improve recognition accuracy; at the same time, the output vector retains direction and angle information, allowing radar targets in various complex backgrounds to be classified more accurately. Applied to the MSTAR dataset, the proposed method raises the radar target positioning accuracy to 99.5%. Finally, compared with AlexNet and YOLOv4, the proposed radar target recognition method identifies weapons and equipment in complex backgrounds accurately and quickly, significantly improving the efficiency of military inspection.

1. Introduction

Radar technology plays an important role in modern target detection and is widely used in military and civil transportation because it operates in all weather conditions and in all directions. Target recognition is one of the basic tasks in computer vision: identifying the target area and obtaining its accurate position lay the foundation for subsequent information processing by the carrier and improve the perception ability of machine recognition. Current mainstream models process visible-light images, but such images are susceptible to environmental lighting; under low-light, dark, or shadowed conditions with surrounding interference, the data becomes much harder to process. To achieve highly reliable classification and recognition, modern pattern recognition theories and methods are usually used for classifier design, such as statistical pattern classification, feature extraction, and neural-network-based classification.

Statistics-based classification and recognition algorithms use probability models to estimate the feature vector distribution of each category and classify unknown samples accordingly. For example, Shen Yanyan derived a likelihood function from ocean-wave radar echoes and used a Bayesian classifier for classification [1]. Liu Jingrui et al. built a weather radar warning system using probabilistic statistical models to distinguish strong from weak rainfall [2]. However, the weather environment is complex and changeable, with many interference factors, making radar signal processing far more difficult in practice than processing and identifying visible-light images.

Traditional feature-based methods match known features by extracting feature points from the target. Commonly used feature descriptors include the histogram of oriented gradients (HOG) [3], the scale-invariant feature transform (SIFT) [4], and speeded-up robust features (SURF) [5]. In 2001, the American company ENSCO developed the Visual Identity System (VIS) track video detection system to monitor the working status of PandaPal fasteners in real time [6]. In 2005, the German railway engineering company GBM Wiebe developed the GeoRail-Xpress comprehensive inspection vehicle, capable of inspecting the entire railway electrical equipment system in real time [7]. However, because multiple regions of each image must be extracted and classified, recognition was slow and could not meet the requirements of real-time detection.

In recent years, target recognition algorithms based on deep learning have made significant progress over traditional detection algorithms. Representative algorithms include R-CNN [8], Fast R-CNN [9] and Faster R-CNN [10]. However, their detection pipelines are complicated and their real-time performance is poor. AlexNet [11] and YOLO [12], which appeared subsequently, can meet real-time detection requirements but typically require large amounts of data for network training, their trained weights overfit easily, and their hardware requirements are relatively high. In 2017, Hinton proposed that the Capsule Network (CapsNet) could replace the traditional CNN, bringing new opportunities to deep learning [13]. For example, [14] used a capsule network to classify handwritten digits; because digits have simple, uniform characteristics, the recognition rate was high. Radar targets, however, are generally weapons and equipment with complex structures and are easily affected by conditions such as illumination and angle.

This paper proposes a radar target recognition and location algorithm based on CapsNetv2. For weapons and equipment, characteristics such as light intensity, position, deformation, angle, and texture must be considered, so these features are selected as the input vectors. A 1 × 1 reduction layer combined with a 3 × 3 convolutional layer is then used to simplify the traditional capsule network's 9 × 9 capsule neurons. The MSTAR dataset is then trained and learned through the double-layer capsule network, yielding two prediction boxes, one large and one small, to complete recognition under different complex backgrounds. Finally, the improved CV model and the infrared lidar use edge detection to accurately locate hazardous enemy weapons and equipment.


2. CapsNetv2

In 2011, Hinton proposed the concept of the capsule [15]. Unlike traditional scalar neurons, a capsule is a vector composed of many neurons. The length of the capsule's output vector indicates the probability that the entity passed up by the lower network exists, and its direction represents the actual state of the entity, that is, its "instantiation parameters", as shown in Figure 1 [16].

The dynamic routing algorithm determines how the output of a lower-level capsule is distributed to the higher-level capsules.

The coupling coefficients are updated by a softmax over the similarity weights:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$$

where $c_{ij}$ is the dynamic routing coupling coefficient and $b_{ij}$ is the initial similarity weight, updated at each iteration by $b_{ij} \leftarrow b_{ij} + \hat{\mathbf{u}}_{j|i} \cdot \mathbf{v}_j$.

The total input $\mathbf{s}_j$ of capsule $j$ is obtained from the lower-level capsule outputs $\mathbf{u}_i$ and the coupling coefficients:

$$\mathbf{s}_j = \sum_i c_{ij}\, \hat{\mathbf{u}}_{j|i}, \qquad \hat{\mathbf{u}}_{j|i} = \mathbf{W}_{ij}\, \mathbf{u}_i$$

where $\hat{\mathbf{u}}_{j|i}$ is derived from $\mathbf{u}_i$, and $\mathbf{W}_{ij}$ is the weight matrix of the capsule network.

The output should be expressed as a probability, so its length is compressed into [0, 1] by the nonlinear "squash" function:

$$\mathbf{v}_j = \frac{\|\mathbf{s}_j\|^2}{1 + \|\mathbf{s}_j\|^2}\, \frac{\mathbf{s}_j}{\|\mathbf{s}_j\|}$$
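To make the routing computation concrete, the following is a minimal NumPy sketch of the squash function and routing-by-agreement described by the formulas above; the array shapes and the three-iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Nonlinear compression: scales the vector norm into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: predictions from lower capsules, shape (num_lower, num_upper, dim)."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))                       # initial similarity weights b_ij
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients c_ij (softmax)
        s = (c[..., None] * u_hat).sum(axis=0)                 # weighted sum s_j
        v = squash(s)                                          # output capsules v_j
        b += (u_hat * v[None, ...]).sum(axis=-1)               # agreement update: b_ij += u_hat · v_j
    return v
```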

The principle of CapsNetv2 is roughly the same as that of the original capsule network. The image is first fed into the convolutional layer (ReLU), and a primary capsule layer is obtained through the convolution operation. The primary capsule layer data is then transmitted to the image capsule layer through the dynamic routing algorithm (squash), the image capsule layer passes its data on to the feature capsule layer, and finally a fully connected layer reorganizes and models the feature capsule layer data. CapsNetv2, however, contains two image capsule layers and two feature capsule layers: if one capsule layer overfits during training, the other can still be trained successfully. The structure of CapsNetv2 is shown in Figure 2.

The radar target images comprise 3 categories: BTR70 (armored transport vehicle), BMP2 (infantry fighting vehicle), and T72 (tank). The moduli of the three target output vectors are calculated, and the vector with the largest modulus indicates the most probable target category.
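Continuing the sketch above, classification by maximum modulus might look as follows, reusing `dynamic_routing` and assuming the hypothetical `u_hat` predictions yield one output capsule per class:

```python
import numpy as np

classes = ["BTR70", "BMP2", "T72"]
v = dynamic_routing(u_hat)                    # output capsules, shape (3, dim)
moduli = np.linalg.norm(v, axis=-1)           # vector length ~ class-existence probability
predicted = classes[int(np.argmax(moduli))]   # largest modulus -> most probable category
```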

AlexNet, YOLOv4 and the traditional capsule network are used to benchmark the performance of CapsNetv2 on the same image dataset. Table 1 compares the Top-1 and Top-5 classification performance of each model, where the GPU is a Titan X and the CPU is an Intel i7-10700 (4 GHz).

As shown in Table 1, CapsNetv2 achieves higher Top-1 and Top-5 classification accuracy than AlexNet, YOLOv4 and CapsNet, and its recognition time on both GPU and CPU is shorter, indicating better overall performance.

3. Principles of Radar Target Positioning

3.1. Radar Image Preprocessing

In radar target recognition, the collected radar image suffers from noise, jitter, and weak light caused by complex backgrounds, weather and other factors, all of which degrade model training and recognition. The collected original image therefore needs to be preprocessed and corrected before the target's feature values are extracted and the target is separated from the background.

The preprocessing steps include grayscale conversion, binarization, noise reduction, filtering and edge extraction; the flow chart is shown in Figure 3. A code sketch of these steps appears after the list.
(1) Perform grayscale processing on images of the different categories in the MSTAR dataset; the results are shown in Figure 4(a).
(2) Binarize the grayscale image to remove the influence of the complex background, setting each pixel to 0 or 255, where the target gray value is 255 and the background is 0, as shown in Figure 4(b).
(3) Noise reduces image quality, and collected radar target images are usually accompanied by auxiliary and anti-jamming equipment that introduce considerable Gaussian noise; this article therefore applies Gaussian filtering, as shown in Figure 4(c).
(4) Compared with the Robert, Sobel and LOG operators, the Canny algorithm detects a more complete target edge, so this paper uses the Canny algorithm to extract the radar target's edges, as shown in Figure 4(d).
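A compact OpenCV version of the four steps could look as follows; the Otsu threshold, the 5 × 5 Gaussian kernel and the Canny thresholds are assumptions, since the paper does not list its parameter values.

```python
import cv2

def preprocess(path):
    """Grayscale -> binarize -> Gaussian denoise -> Canny edges, as in Section 3.1."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)                     # (1) grayscale conversion
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)    # (2) target = 255, background = 0
    denoised = cv2.GaussianBlur(binary, (5, 5), 0)                    # (3) Gaussian filtering
    edges = cv2.Canny(denoised, 50, 150)                              # (4) Canny edge extraction
    return edges
```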

3.2. Improved CV Model

The Chan-Vese (CV) model is used to separate the fuselage and barrel of the T72 tank. The energy functional of the CV model is [17]:

$$E(c_1, c_2, C) = \mu \cdot L(C) + \lambda_1 \int_{\text{outside}(C)} |I(x, y) - c_1|^2\, dx\, dy + \lambda_2 \int_{\text{inside}(C)} |I(x, y) - c_2|^2\, dx\, dy$$

In the formula, $\mu$ is the CV model constant; $L(C)$ is the arc length of the curve $C$, forming the length term that smooths the evolution curve; $\lambda_1$ and $\lambda_2$ are the weight coefficients, both greater than 0; $I(x, y)$ is the image pixel gray value; $c_1$ and $c_2$ are the average gray values of the pixels outside and inside the evolution curve, respectively; and $H(\cdot)$ is the regularized step (Heaviside) function used when the curve is embedded as a level set.

Considering that targets such as armored vehicles and tanks are regular, horizontally symmetric shapes, adding the level set method better handles topological changes of the contour. The Euler-Lagrange level-set evolution equation is:

$$\frac{\partial \phi}{\partial t} = \delta_\varepsilon(\phi)\left[\mu\, \mathrm{div}\!\left(\frac{\nabla \phi}{|\nabla \phi|}\right) - \lambda_1 (I - c_1)^2 + \lambda_2 (I - c_2)^2\right]$$

where $\phi$ is the global level-set function, $\delta_\varepsilon(\cdot)$ is the regularized impulse (Dirac) function of the CV model, $\mathrm{div}$ is the divergence operator, and $\mathrm{div}(\nabla\phi/|\nabla\phi|)$ is the curvature of the evolution curve.
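As an illustration, scikit-image ships a Chan-Vese implementation that minimizes essentially this energy; the filename and parameter values below are purely illustrative, and the keyword names may differ slightly across scikit-image versions.

```python
from skimage import io, img_as_float
from skimage.segmentation import chan_vese

image = img_as_float(io.imread("t72.png", as_gray=True))   # hypothetical input image
# mu weights the length term; lambda1 and lambda2 weight the two region-fitting terms.
mask = chan_vese(image, mu=0.25, lambda1=1.0, lambda2=1.0,
                 tol=1e-3, max_num_iter=300, dt=0.5,
                 init_level_set="checkerboard")
# mask is a boolean segmentation separating target and background.
```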

This paper selects the T72 heavy tank, whose barrel is a distinctive feature, as the segmentation object. A rectangle is set as the initial contour of the CV model, and the image is then corrected by the Hough transform to locate the target accurately.

Figure 5 shows the original image (a), the initial circular contour (b), the level set function (c) and the elliptical contour positioning result corrected by the Hough transform (d). For the distinctive barrel on the tank, once the initial contour of the CV model is refined by the Hough transform, the elliptical contour locates the whole tank better than the circular initial contour. This article therefore adopts the CV model revised by the Hough transform for positioning the T72 tank.

3.3. Infrared Lidar Positioning

Suppose $T_{lc}$ is the calibrated transformation matrix from the lidar coordinate system to the camera coordinate system; then

$$P_c = T_{lc}\, P_l = \begin{bmatrix} R & t \\ \mathbf{0}^{T} & 1 \end{bmatrix} P_l$$

where $P_l$ and $P_c$ are the homogeneous coordinates of a point in the lidar and camera frames, $R$ is the rotation matrix and $t$ is the translation vector.

From this transformation, the relative three-dimensional coordinates of the target in the camera frame can be obtained.

Let $O\text{-}x_c y_c z_c$ be the camera coordinate system, where $O$ is the optical center. A point $P(x_c, y_c, z_c)$ in space corresponds to a point $p(x, y)$ on the image plane:

$$x = f\,\frac{x_c}{z_c}, \qquad y = f\,\frac{y_c}{z_c}$$

where $f$ is the focal length of the camera.

Let $o\text{-}uv$ be the pixel coordinate system, whose $u$ and $v$ axes are parallel to the $x$ axis (pointing right) and the $y$ axis (pointing down), respectively. If the pixel coordinates are scaled by $\alpha$ on the $u$ axis and $\beta$ on the $v$ axis, with origin offset $(u_0, v_0)$, the relationship between the image coordinates and the pixel coordinates is:

$$u = \alpha x + u_0, \qquad v = \beta y + v_0$$

Substituting $f_x = \alpha f$ and $f_y = \beta f$, the relations above can be rewritten in matrix form as a homogeneous linear equation:

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = K P_c$$

where $K$ is the intrinsic parameter matrix of the camera.

Combining this intrinsic model with the lidar-to-camera transformation, the real coordinates of the target in space can be recovered and then combined with CapsNetv2 to realize the recognition and positioning of the radar target.
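A minimal NumPy sketch of this lidar-to-pixel chain, assuming a calibrated rotation R, translation t and intrinsic matrix K (the numeric values below are purely illustrative):

```python
import numpy as np

def lidar_to_pixel(P_l, R, t, K):
    """Map a lidar point P_l (3,) to pixel coordinates via the calibrated
    extrinsics (R, t) and the camera intrinsic parameter matrix K."""
    P_c = R @ P_l + t            # lidar frame -> camera frame
    uvw = K @ P_c                # pinhole projection in homogeneous coordinates
    return uvw[:2] / uvw[2]      # divide by depth z_c -> pixel (u, v)

# Illustrative calibration only: identity rotation, 10 cm offset, assumed intrinsics.
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(lidar_to_pixel(np.array([2.0, 0.5, 10.0]), R, t, K))
```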

4. Algorithm Implementation

Figure 6 shows the specific process of the proposed radar target recognition and positioning model based on CapsNetv2; a code-level sketch follows the list.
(1) Target images such as BTR70, BMP2 and T72 in the MSTAR dataset are input into CapsNetv2 as different feature vectors.
(2) The primary capsule layer is obtained through the convolution operation of the 1 × 1 reduction layer and the 3 × 3 convolution layer; the two image capsule layers are then trained and used for prediction separately, producing two prediction boxes of 8 × 8 × 255 and 16 × 16 × 255.
(3) The dynamic routing update formula is iterated to obtain the feature capsule layer.
(4) The class whose output vector has the largest modulus is taken as the radar target's classification, and the two prediction boxes verify each other.
(5) A clear and complete edge line is obtained through edge extraction of the classified target image.
(6) The improved CV model and the precise positioning of the infrared lidar yield the real three-dimensional coordinates of the target.
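Putting the six steps together, a high-level sketch of the pipeline might read as follows; `capsnet`, `locate_with_cv_model` and the other helpers are stand-ins for the components sketched earlier in this paper's sections, not released code.

```python
import numpy as np

CLASSES = ["BTR70", "BMP2", "T72"]

def detect_and_locate(image_path, lidar_points, capsnet, R, t, K):
    """End-to-end sketch of the Figure 6 pipeline under the assumptions above."""
    v = capsnet.predict(image_path)                  # steps 1-3: CapsNetv2 forward pass
    moduli = np.linalg.norm(v, axis=-1)              # step 4: max-modulus classification
    label = CLASSES[int(moduli.argmax())]
    edges = preprocess(image_path)                   # step 5: edge extraction (Section 3.1)
    contour = locate_with_cv_model(edges)            # step 6a: improved CV model (hypothetical helper)
    coords = [lidar_to_pixel(p, R, t, K)             # step 6b: infrared lidar correction
              for p in lidar_points]
    return label, contour, coords
```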

5. Experimental Verification

The method proposed in this paper is implemented in MATLAB R2014b and TensorFlow. The 6000 complex-background radar target images in the MSTAR dataset are used as the training set, and 20% of the training set is randomly selected as the test set to verify classification accuracy.

Example target images are shown in Figure 7, where (a) and (b) are T72 tanks in sand and forest environments, (c) and (d) are BTR70 armored vehicles in sand and forest environments, and (e) and (f) are BMP2 infantry fighting vehicles in sand and forest environments.

Figure 7 shows that the CapsNetv2 can accurately identify radar targets in different complex backgrounds and has good robustness.

5.1. Different Network Training Effects

To verify the practicability and recognition accuracy of CapsNetv2, radar target images with different complex backgrounds were used for training, and the performance of CapsNetv2 was compared with that of the AlexNet and YOLOv4 deep learning models. The learning rate and step count were varied to select the best parameter values; a learning rate of 0.5 and a total of 3000 steps proved optimal. The training results are represented by the loss values shown in Figure 8.

Figure 8 shows the loss functions of AlexNet, YOLOv4 and CapsNetv2, from which the following conclusions can be drawn:
(1) The loss value trends downward for all three networks, falling very quickly in the first half of training. However, the initial loss value of CapsNetv2 is only 0.9, lower than those of AlexNet and YOLOv4; AlexNet must shuffle the data each time it is read, while YOLOv4's MSE loss has inherent problems and needs to be replaced by an IoU loss.
(2) After 3000 steps, the final loss of CapsNetv2 is 0.00015, roughly ten times smaller than that of AlexNet. This is because CapsNetv2 uses a simpler convolutional layer with a reduction layer and trains two image capsule layers, so a model that has not overfit can always be selected, giving good robustness.
(3) Because AlexNet has no corner feature points and training on a small-sample dataset cannot stabilize the model, its training jitters once the loss reaches 0.5 and terminates early. CapsNetv2, by contrast, retains the different feature information and trains more stably, highlighting its superior performance.
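For context on these loss curves, capsule networks are typically trained with the margin loss of the original CapsNet paper (Sabour et al., 2017); this article does not state its exact loss formulation, so the following is only a representative sketch, not the authors' objective.

```python
import numpy as np

def margin_loss(v, labels, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard capsule margin loss; v: (batch, num_classes, dim) output
    capsules, labels: (batch,) integer class indices."""
    lengths = np.linalg.norm(v, axis=-1)                  # ||v_k|| per class
    T = np.eye(v.shape[1])[labels]                        # one-hot targets T_k
    L = (T * np.maximum(0.0, m_pos - lengths) ** 2
         + lam * (1 - T) * np.maximum(0.0, lengths - m_neg) ** 2)
    return L.sum(axis=-1).mean()
```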

To further verify the learning performance of the improved capsule network, image data were randomly selected from the database and compared across several different algorithms. The results are shown in Figure 9.

It can be seen from Figure 9 that the recognition rate of the CapsNetv2 is higher than that of AlexNet and YOLOv4. Through continuous learning, the recognition accuracy reaches 99.5%. This is the result of learning by multiple vector capsules and retaining different feature vectors (such as amplitude and angle). At the same time, the two image capsule layers can be predicted separately, which reduces the phenomenon of over-fitting and the possibility of misclassification.

Table 2 compares the recognition times of different algorithms. It can be seen from the table that compared with AlexNet and YOLOv4, the CapsNetv2 has a shorter classification time and is more suitable for detecting radar targets in different complex backgrounds.

5.2. Target Positioning in a Complex Background

The improved CV model is used to locate the radar target images identified by CapsNetv2. As shown in Steps 4 to 6 of Figure 6, the precise positioning of the radar target comprises edge extraction, CV model positioning and infrared lidar correction. The positioning results are shown in Figure 10.

It can be seen from Figure 10 that the improved CV model proposed in this paper, corrected by the infrared lidar, can accurately locate radar targets in images with different backgrounds.

From the different types of radar targets identified by CapsNetv2, a group of 20 images was randomly selected for precise positioning, and the results were compared with the infrared imaging method and the local feature analysis method [18]. Table 3 compares the positioning accuracies of the different methods.

Table 3 shows that the proposed radar target recognition and positioning method that combines the CapsNetv2 and the CV model is more suitable for small sample learning and has better training effects. Thus, the proposed method has higher positioning accuracy than the other methods and is suitable for different complex backgrounds.

6. Conclusion

With the continuous improvement of military technology, real-time detection of different radar targets under different complex backgrounds has become particularly important. This paper proposes a radar target detection model based on CapsNetv2 and an improved CV model corrected by infrared lidar. The proposed model can identify radar targets in complex backgrounds and accurately locate their positions. The target positioning algorithm was simulated and experimentally verified, and the following conclusions can be drawn:
(1) CapsNetv2 has strong self-learning and adaptive capabilities and trains well on small sample sets. It can effectively detect different types of radar targets and suppress interference from complex backgrounds, with a recognition rate as high as 99.5%. The reason is that the input of CapsNetv2 is a vector, which retains the target's feature information to the greatest extent, and training through the double image capsule layer effectively reduces overfitting, allowing different radar targets to be classified more accurately.
(2) The radar target image identified and classified by CapsNetv2 is segmented by the improved CV model and finally corrected by the infrared lidar, which accurately locates the target's position. The positioning accuracy of the proposed method reaches 97.5%, making it better suited to precise radar target positioning than the other methods.

The method proposed in this paper achieves radar target recognition under complex backgrounds and provides accurate location information, meeting the requirements of real-time inspection. However, training CapsNetv2 on large numbers of images takes a relatively long time, so reducing the training time of the capsule network will be the focus of future research.

Data Availability

The MSTAR data and the CapsNetv2 solution data used to support the findings of this study were supplied by Jiaxing Hao under license and so cannot be made freely available. Requests for access to these data should be made to Jiaxing Hao, [email protected].

Conflicts of Interest

The authors declare that they have no conflicts of interest.