#### Abstract

Space target identification is key to missile defense. Micromotion, as an inherent attribute of the target, can be used as the theoretical basis for target recognition. Time-varying micro-Doppler (m-D) frequency shifts induce frequency modulations on the target echo, which is referred to as the m-D effect. m-D features are widely used in space target recognition because they reflect the physical attributes of space targets. However, traditional recognition methods require human participation, which often leads to misjudgment. In this paper, an intelligent recognition method for space target micromotion is proposed. First, accurate and suitable models of the warhead and decoy are derived, and the corresponding m-D formulae are given. Next, we present a deep-learning (DL) model composed of a one-dimensional parallel structure and long short-term memory (LSTM), and we use this DL model to recognize the time-frequency distributions (TFDs) of different targets. Finally, simulations are performed to validate the effectiveness of the proposed method.

#### 1. Introduction

Space target defense is a fundamental aspect of modern air defense operations [1, 2]. Currently, target recognition technology is developing rapidly. Micromotion, as an intrinsic attribute, can be used as the theoretical basis for ballistic target feature extraction and recognition [3] and has attracted extensive attention in the field of target recognition. It is well known that micromotion, as an inherent feature of a moving target, can be used to describe weak kinematic features. Thus, m-D features can be used as the theoretical basis for target recognition [4, 5]. However, the coexistence of multiple targets with different motion forms in the midcourse phase makes space target identification a complex challenge [6]. Chen et al. established a model of common micromotion forms in [7] and unified the model. In [8], a micromotion model closer to the real situation is constructed by using high-frequency electromagnetic calculation, and the echo data of a precession cone target and a human body are obtained. According to the theory of electromagnetic scattering, at high frequencies the electromagnetic characteristics of a space target are approximately equal to the sum of the contributions of a few scattering centers. In [9], scattering centers are divided into localized scattering centers (LSC), distributed scattering centers (DSC), sliding-type scattering centers generated by edge diffraction (SSCE), and sliding-type scattering centers on the curved surface of the space target (SSCS).

Many studies have been conducted on the m-D recognition of space targets by using the Cadence Velocity Diagram (CVD), time-frequency spectrograms, ISAR images, and other traditional methods. For instance, [10] proposed three technical frameworks for m-D classification based on the CVD. The genetic algorithm-general parameterized time-frequency transform (GA-GPTF) method has been proposed to accurately estimate m-D parameters [11]. Kei Suwa used ISAR movie images to extract three-dimensional structural features of a target [12]. However, traditional space target recognition requires manual participation, and the recognition result depends on manual judgment.

In recent years, deep learning (DL) has been widely used in various fields [13, 14], and target recognition has also benefited from DL techniques. Kim and Moon [15] used a convolutional neural network to classify the micro-Doppler spectra of different targets. In [16], a DL network was used to recognize high resolution range profile (HRRP) images of different targets, and the recognition accuracy exceeded 90%. In [17], a new all-convolutional network was proposed to reduce the number of free parameters in the intelligent target recognition of SAR images. Compared with other images, time-frequency spectrograms have a strong temporal correlation, which can better reflect the m-D effect of space targets. Hence, in this paper, we propose a network structure that utilizes time-frequency spectrograms to classify space targets.

The remainder of this paper is organized as follows. The models of the warhead and decoy are introduced in Section 2. The deep learning network design of the proposed method is described in detail in Section 3. Simulations that verify the accuracy of the method are presented in Section 4. Finally, Section 5 presents the conclusions.

#### 2. Model of Warhead and Decoy

The warhead moves in the form of precession or nutation because it is affected by spin and lateral disturbance. In contrast, the imitation decoy has no attitude-adjusting device, so it moves by rotation or vibration [18].

##### 2.1. Decoy Mode

As shown in Figure 1, the radar coordinate system and the target coordinate system are both left-handed coordinate systems. According to the theory of scattering centers, point *P* can be considered as an LSC.

For a rotating decoy, the target rotation axis is a straight line passing through the point ; the azimuth and elevation angles of the target coordinate system are and , respectively; the azimuth and elevation angles between the radar line of sight (LOS) and are and , respectively. The distance between and is , and the distance between and is . According to [7], at time , the distance is

where , is the rotation angular velocity, and is the matrix related to the angle parameters of the rotation axis. The following expression describes .

In equation (1), the micromotion of the rotating target conforms to the law of sinusoidal modulation, which is affected by factors such as the radar line of sight and the target rotation attribute.

For a vibrating decoy, is the amplitude of vibration, and is the vibration angular velocity. The vibration direction and the LOS direction in the target coordinate system can be described by the vectors and , respectively. At time , the distance between and is , which can be described as

From equation (3), the distance from the vibrating target to the radar is sinusoidally modulated.
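The sinusoidal modulation noted for equations (1) and (3) can be illustrated with a minimal sketch. It assumes a far-field point scatterer whose range along the LOS reduces to a constant plus a single sinusoid; the function names, the projection factor `cos_beta`, the sign convention of the Doppler shift, and the numeric values in the usage lines are illustrative assumptions, not quantities taken from the paper.

```python
import math

C = 3.0e8  # speed of light in m/s


def rotation_range(t, r0, rho, omega, phi0=0.0):
    """Assumed far-field range of a rotating point scatterer.

    r0 is the range of the rotation center, rho the rotation radius
    projected onto the LOS, omega the rotation angular velocity (rad/s).
    """
    return r0 + rho * math.sin(omega * t + phi0)


def rotation_micro_doppler(t, rho, omega, carrier_hz, phi0=0.0):
    """Micro-Doppler shift taken here as f = (2 / lambda) * dr/dt (assumed sign)."""
    wavelength = C / carrier_hz
    return 2.0 / wavelength * rho * omega * math.cos(omega * t + phi0)


def vibration_range(t, r0, amplitude, omega_v, cos_beta):
    """Assumed far-field range of a vibrating point scatterer.

    cos_beta is the cosine of the angle between the vibration direction
    and the LOS (hypothetical parameter name).
    """
    return r0 + amplitude * cos_beta * math.sin(omega_v * t)


# Hypothetical usage: 10 GHz carrier, 0.5 m radius, 4*pi rad/s rotation.
peak_shift_hz = rotation_micro_doppler(0.0, 0.5, 4.0 * math.pi, 10e9)
```

Under these assumptions the peak shift scales with the product of rotation radius and angular velocity, which is why targets with different micromotion parameters leave different traces in the time-frequency plane.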

##### 2.2. Warhead Mode

In Figure 2(a), the center of mass of the warhead is defined as the origin of the coordinate system. The spin angular velocity of the target is , the cone rotation angular velocity is , and the azimuth and elevation angles of the target symmetry axis are and , respectively. The angle between the LOS and axis is . In particular, point A is fixed at the top of the warhead and can be regarded as a localized scattering center. Points B and C are located at the edge of the warhead; their positions slide with the movement of the target [9], so they can be regarded as SSCE. According to [19], the m-D distance of each scattering point can be expressed as follows:

where is the time variable, is the radius of the target bottom, and are the distances from the top of the cone to the center of mass and from the center of mass to the center of the bottom, respectively. can be expressed as follows:

Figure 2: Warhead models: (a) cone-shaped warhead; (b) warhead with a streamlined structure.

According to equation (4), the m-D of the sliding scattering point does not conform to the sinusoidal modulation law.

A warhead moving in the form of nutation is in effect a precessing warhead with an additional swing. According to the derivation, the nutation m-D distance of each scattering point can also be expressed as equation (4). The only difference is that the expression of is modulated by the swing parameters. Assume that the swing amplitude is , denotes the elevation angle at the initial time, and is the swing angular velocity. Thus, we can rewrite as follows:

Moreover, space targets have a special streamlined structure, as shown in Figure 2(b). Point *D* can be considered as the SSCS.

The angle between the LOS and axis is . and are the lengths of the major and minor axes of the streamlined structure, respectively. The coordinates of point *D* can be expressed as follows:

where , .

Vector representation of LOS direction can be described as follows:

The m-D distance of the SSCS can be expressed as follows:

where is the transfer matrix of degree rotation around the *Z* axis. can be deduced by the following:

The m-D characteristics of SSCS do not conform to the sinusoidal modulation law.

#### 3. Deep Learning Recognition Network

Figure 3 shows the overall network architecture for space target recognition, which consists of a time-frequency transform, 1-D parallel structures for local feature learning, an LSTM layer for global temporal information extraction, and softmax for classification. For traditional image recognition, because the image is 2-D, the convolution layer in the network usually adopts a 2-D structure. In this paper, considering the temporal correlation of images, we treated time-frequency spectrograms as multiple channels (time dimension) of a 1-D (frequency dimension) image.

##### 3.1. Parallel Structure

The development of deep learning (DL) has led to many breakthroughs in the field of target recognition. Unlike traditional manual target recognition, intelligent target recognition based on DL can extract features automatically.

As a DL method, the convolutional neural network (CNN) transforms the original data into a more abstract representation through simple nonlinear models. Many scholars have designed different network structures based on the CNN, such as AlexNet [20], VGG-16 [21], and VGG-19 [22].

Unfortunately, these networks all share a deep sequential structure. Their convolution layers are connected in series; hence, at each stage a single convolution layer extracts only one type of feature.

To extract different features simultaneously, we introduce a 1-D parallel structure into the proposed architecture.

The parallel structure is shown in Figure 4.

In Figure 4, the parallel structure uses three different types of 1-D convolution kernels, namely, , , and convolution kernels, in which represents the kernel width. We introduce a design strategy for neural network architecture based on fractal theory. To reduce the computation and expand the network depth, we take a convolution layer as the initial layer, connect the two initial layers, and then use the join operation to merge the connected structure with the convolution layer to form the second layer framework. Then, we connect the two second layer frameworks and use the join operation to merge the connected structure with the convolution kernels. In this structure, one max pooling operation is also performed. For brevity, the ReLU activation function is not shown.

Whereas a single convolution kernel learns only one type of feature, the parallel structure can automatically extract different features and can handle more complex tasks. Different processing branches use different convolution kernels, so that the module captures features at different scales. In this way, good results can be obtained in space target recognition.
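The idea of parallel branches with different kernel widths can be sketched as follows. This is only an illustration of running several 1-D kernels side by side and joining their outputs as channels; it does not reproduce the fractal layout, the join operation, or the max pooling of Figure 4, and the kernel widths and weights in the usage lines are hypothetical values, since the widths used in the paper are not recoverable here.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation that keeps the input length.

    The kernel width is assumed to be odd so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(weight * padded[i + j] for j, weight in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def parallel_block(signal, kernels):
    """Apply each kernel in its own branch and return one channel per branch."""
    return [relu(conv1d_same(signal, kernel)) for kernel in kernels]


# Hypothetical usage: one frequency slice and three branches of width 1, 3, 5.
frequency_slice = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
branches = parallel_block(
    frequency_slice,
    [[1.0], [0.25, 0.5, 0.25], [0.2, 0.2, 0.2, 0.2, 0.2]],
)
```

Each branch sees the same input but responds to structure at a different scale, which is the property the parallel design relies on.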

##### 3.2. LSTM

Because the traditional recognition method uses only the envelope information of the time-frequency spectrogram and does not consider the temporal correlation between the frequency cells, we adopted the Long Short-Term Memory (LSTM) model to process the spectrogram.

LSTM is an efficient time series processing unit and has been widely applied [23, 24]; it can handle time series by learning the long-term dependence information between the time steps of the sequence data. The structure of the LSTM is illustrated in Figure 5.

The core of the LSTM is the cell state, which runs through the whole cell. Information can be removed from or added to the cell state through structures called gates.

The LSTM includes a forget gate, an input gate, and an output gate. The forget gate determines what information should be discarded from the cell state. The input gate uses the sigmoid function to judge what new information would be useful for the cell state, while the tanh function is used to obtain new candidate cell information. The output gate uses the sigmoid function to select which parts of the cell state to output; the cell state is passed through a tanh layer, which yields a vector with values between −1 and 1, and this vector is multiplied by the sigmoid output to obtain the output of the structure.

After the 1-D parallel structure process, the output can be regarded as the frequency feature vectors arranged in the time dimension; therefore, we can utilize the LSTM to learn about contextual time information of multichannel 1-D frequency feature vectors.
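The gate operations described in this section can be written out as a single time step of a standard LSTM cell. The sketch below uses the common textbook formulation; the parameter names (`Wf`, `Wi`, `Wg`, `Wo` and the matching biases) are illustrative assumptions, and the paper's own implementation may differ in detail.

```python
import math


def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, x, h_prev, bias):
    """Compute W @ [x; h_prev] + b with plain lists."""
    stacked = list(x) + list(h_prev)
    return [
        sum(w * v for w, v in zip(row, stacked)) + b
        for row, b in zip(weights, bias)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    # Forget gate: how much of the old cell state to keep.
    f = [sigmoid(v) for v in affine(params["Wf"], x, h_prev, params["bf"])]
    # Input gate: how much of the candidate to write.
    i = [sigmoid(v) for v in affine(params["Wi"], x, h_prev, params["bi"])]
    # Candidate cell information from the tanh layer.
    g = [math.tanh(v) for v in affine(params["Wg"], x, h_prev, params["bg"])]
    # Output gate: which parts of the cell state to expose.
    o = [sigmoid(v) for v in affine(params["Wo"], x, h_prev, params["bo"])]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Applied to the recognition network, `x` would be the frequency feature vector for one time step, and the step would be repeated along the time dimension.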

#### 4. Simulation Results

In this section, the system implementation details and the performance analysis are introduced first. Then, the recognition performance of different network configurations is analyzed.

##### 4.1. System Implementation Details and Performance Analysis

Some radar parameters are listed as follows: the radar transmission carrier frequency is approximately 10 GHz, the pulse repetition frequency is 2000 Hz, the pulse width is 10 , and the observation time is 2 s. Some target parameters are listed as follows: Target 1 is a precession cone warhead target; its height is 2.4 m, radius is 0.5 m, and cone rotation angular velocity . Target 2 is a nutation cone warhead target; its height is 2 m, radius is 0.5 m, and cone rotation and swing angular velocities are and , respectively. Target 3 is a rotation decoy target; its height is 2.18 m, radius is 0.52 m, and rotation angular velocity . Target 4 is a precession streamlined-structure warhead target that has an SSCS; its cone rotation angular velocity , major and minor axes are 2.5 and 0.5 m, respectively. Target 5 is a vibration decoy target; its height is 1.92 m, radius is 0.48 m, and vibration angular velocity . The time-frequency spectrograms of the five targets are illustrated in Figure 6.

Figure 6: Time-frequency spectrograms of the five targets, panels (a) to (e).

The short-time Fourier transform (STFT) is an effective time-frequency transform method. Its basic idea is to use a window function to extract the signal in a short time interval and to use the FFT to analyze the signal frequency in each interval. In this paper, we use the window function to divide the time-frequency spectrogram into 153 parts, and these 153 parts are then fed to the 1-D parallel network.
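The windowing idea behind the STFT can be sketched as follows. For self-containment the sketch evaluates a direct DFT on each windowed segment instead of calling an FFT routine, which gives the same spectrum at higher cost; the Hann window, the window length, and the hop size are illustrative assumptions and are not the settings that yield the 153 parts mentioned above.

```python
import cmath
import math


def stft(signal, win_len, hop):
    """Return a list of frames, each a list of win_len complex DFT bins."""
    # Symmetric Hann window (an assumed choice of window function).
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / (win_len - 1))
        for n in range(win_len)
    ]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(win_len)]
        # Direct DFT of the windowed segment (an FFT would be used in practice).
        spectrum = [
            sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                for n in range(win_len)
            )
            for k in range(win_len)
        ]
        frames.append(spectrum)
    return frames


# Hypothetical usage: a complex tone that falls exactly on DFT bin 8.
tone = [cmath.exp(2j * math.pi * 8 * n / 32) for n in range(128)]
spectrogram = [[abs(value) for value in frame] for frame in stft(tone, 32, 16)]
```

Each row of `spectrogram` is one time slice of the time-frequency distribution, so the rows can be treated as the time-ordered 1-D frequency vectors that the network consumes.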

In Figure 6, the features of the different m-D time-frequency spectrograms are consistent with the theoretical analysis in Section 2. Because measured samples of space targets are difficult to obtain, all data were generated by electromagnetic simulation. The sampling interval and the range of each parameter are shown in Table 1. Data were acquired at SNRs from −10 dB to 10 dB in steps of 2 dB. The total number of target time-frequency spectrograms was 11 × 1000. Of the collected time-frequency spectrograms, 70% are used as training images, and the remaining 30% are used as test images.
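The data preparation described above can be sketched as follows, under two assumptions that the text does not state explicitly: the factor 11 is read as the number of SNR levels between −10 dB and 10 dB in 2 dB steps, and the noise is modeled as complex white Gaussian noise scaled to the measured signal power. The function names and the random split are hypothetical.

```python
import math
import random


def snr_grid(start_db=-10, stop_db=10, step_db=2):
    """SNR levels in dB, inclusive of both end points."""
    return list(range(start_db, stop_db + 1, step_db))


def add_noise(signal, snr_db, rng):
    """Add complex white Gaussian noise so the result has the given SNR."""
    signal_power = sum(abs(s) ** 2 for s in signal) / len(signal)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(noise_power / 2.0)  # per real and imaginary part
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in signal]


def split_indices(count, train_fraction, rng):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(count))
    rng.shuffle(indices)
    cut = round(count * train_fraction)
    return indices[:cut], indices[cut:]


rng = random.Random(0)  # fixed seed so the split is reproducible
levels = snr_grid()
train_idx, test_idx = split_indices(len(levels) * 1000, 0.7, rng)
```

Under this reading, the 70/30 split corresponds to 7700 training and 3300 test spectrograms.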

The initial configuration of deep learning recognition network settings is shown in Table 2. For brevity, the ReLU activation function is not enumerated.

For space target recognition, the signal-to-noise ratio (SNR) is an important factor that affects the recognition accuracy. Therefore, in this paper, we focus on the recognition performance of the network under different SNRs. The detailed results are shown in Table 3.

The confusion matrix of the network at −10 dB is illustrated in Figure 7. The remaining confusion matrices are not displayed here because of space constraints. From Table 3 and the confusion matrices, we can clearly see that the recognition accuracy increases as the SNR improves. When the SNR reaches 10 dB, the accuracy is close to 1. As the SNR decreases, noise makes it difficult to identify the m-D features in the target time-frequency spectrograms, and the classification accuracy is reduced. Note that even when the SNR is relatively low, the network can still recognize the three motion forms of rotation, vibration, and nutation with high accuracy. Because the time-frequency spectrograms of the precession cone warhead and the precession streamlined-structure warhead are similar to a certain degree, the m-D features of the SSCS model should be investigated further in future work.
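For reference, the quantities read from a confusion matrix can be computed as in the following sketch. The row-as-true-class convention, the function names, and the toy labels in the usage lines are illustrative assumptions; they are not results from this paper.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples with rows as true classes and columns as predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, predicted_class in zip(true_labels, predicted_labels):
        matrix[true_class][predicted_class] += 1
    return matrix


def accuracy(matrix):
    """Overall accuracy: diagonal sum divided by the total count."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """Fraction of each true class that was recognized correctly."""
    return [
        row[k] / sum(row) if sum(row) else 0.0
        for k, row in enumerate(matrix)
    ]


# Hypothetical toy labels for three classes.
toy_matrix = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3)
```

Per-class recall is the quantity that reveals which pairs of targets, such as two warheads with similar spectrograms, are confused with each other.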

To further verify the recognition effect of the network proposed in this paper, we used different state-of-the-art networks on the same data set for an effective comparison.

In this paper, the training parameters of the traditional networks were set as follows. The mini-batch size for each training iteration was 128, and the maximum number of epochs was 24. The learning rate was initially set to 0.001 and was decreased by a factor of 10 after every 10 epochs; in total, the learning rate was decreased twice. We trained the networks using the stochastic gradient descent with momentum (SGDM) optimizer. Figure 8 shows the root-mean-square error (RMSE) over the epochs for the traditional networks at 6 dB, which gives an overview of the training process.
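The step-decay schedule described above can be sketched as follows, assuming epochs are counted from 1. The momentum coefficient in the SGDM update is not given in the text, so the value 0.9 is an assumed default, and both function names are hypothetical.

```python
def learning_rate(epoch, initial=1e-3, drop_factor=10.0, drop_period=10):
    """Learning rate for a 1-based epoch under piecewise-constant decay."""
    drops = (epoch - 1) // drop_period
    return initial / drop_factor ** drops


def sgdm_step(weight, velocity, gradient, rate, momentum=0.9):
    """One SGDM update; momentum=0.9 is an assumption, not a reported value."""
    velocity = momentum * velocity - rate * gradient
    return weight + velocity, velocity


# Learning rate for each of the 24 epochs.
schedule = [learning_rate(epoch) for epoch in range(1, 25)]
```

With 24 epochs and a drop every 10 epochs, the rate changes after epochs 10 and 20, which matches the two decreases stated in the text.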

As shown in Figure 8, the RMSE gradually decreased and then stabilized. The test recognition accuracy of the different networks as a function of SNR is illustrated in Figure 9.

As depicted in Figure 9, the recognition accuracy of the network proposed in this paper is higher than that of the traditional networks. Furthermore, its classification performance is better and its robustness is stronger when the SNR is low.

To verify the effectiveness of the method, the proposed network is compared with the support vector machine (SVM) classifier [25] and m-D threshold recognition [26]. The recognition accuracy of these existing micromotion feature extraction algorithms is shown in Figure 10 for SNRs ranging from −10 dB to 10 dB.

It can be seen from Figure 10 that the recognition accuracy of the network proposed in this paper is the highest. When the SNR is below 0 dB, the recognition accuracy of the traditional classification methods is less than 60%, while the target classification accuracy of the deep learning model is still relatively high, at around 95%. Traditional micromotion recognition methods generally need an SNR above 6 dB to obtain a higher accuracy, which indicates that the proposed network has better anti-noise performance than the traditional algorithms.

##### 4.2. Analysis of Network Configurations on Recognition Performance

In this section, we focus on the influence of network configurations on network performance. Based on the initial network structure, we changed the number of one-dimensional parallel structures or the number of LSTM units in order to observe their effects. As presented in Table 4, the number of LSTM units is doubled at each step from 128 to 512, following the principle that the number of LSTM units (N) is usually half of, or equal to, the kernel width of the previous layer (K). The number of one-dimensional parallel structures is increased from 5 to 11 in steps of 2.

Similarly, we trained the differently configured networks under different SNR conditions. The comparison results are illustrated in Figure 11.

Figure 11: Recognition accuracy of the differently configured networks under different SNRs, panels (a) and (b).

From Figure 11, we can observe that the network with 256 LSTM units achieves a higher classification accuracy. As far as the parallel structure is concerned, performance is significantly better when the number of structures reaches 7 or 9. However, errors are inevitable at low SNR (−10 dB), because the time-frequency spectrograms are greatly affected by noise. In this case, the recognition accuracy of the network remains lower than 0.8 regardless of how the network configuration is adjusted.

#### 5. Conclusions

To address space target identification in ballistic missile defense systems, we divided space targets into warheads and decoys according to their movement modes and proposed a new network based on the parallel structure and LSTM units. In practical applications, recognition based on time-frequency spectrograms is greatly affected by noise. To evaluate the recognition performance of the proposed network accurately, we obtained its recognition accuracy under different SNR conditions. The comparison showed that the recognition accuracy of the proposed network is higher than that of the traditional networks. Moreover, we optimized the network by searching over the number of parallel structures and LSTM units.

It is worth noting that the space target recognition in this paper is implemented on the premise that the group target has been separated, that is to say, the process of group target signal separation is ignored. However, the separation of group target signals is a very complex issue, and the result of this step directly affects the effect of the subsequent target recognition. Therefore, the focus of our next research is the separation of the target signals of complex space groups.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

This research was funded by the National Natural Science Foundation of China (No. 61701528) and in part by the Natural Science Foundation of Shaanxi Province (No. 2019JQ-497).