Abstract

To address the nonstationary characteristics of the echo signal from a high-speed maneuvering target, a signal feature extraction method that combines time-frequency analysis with a convolutional neural network is proposed, and automatic detection of radar moving targets in a noisy environment is realized. Firstly, the echo signal is modelled as a more accurate Gaussian modulation-linear frequency modulation (GM-LFM) signal and converted into a time-frequency image by the second-order synchroextracting transform (SET2). Then, ridge extraction (RE) is applied to extract the maximum energy ridge from the time-frequency distribution, and the data set is constructed from the maximum energy ridges. Finally, the data set is input into AlexNet for training, and the deep-level features of the echo signal are extracted to realize automatic moving target detection. Simulation results show that SET2 and RE effectively enhance the time-frequency characteristics of the echo signal in a noisy environment, and that the detection accuracy and noise robustness of the proposed method are better than those of the methods based on the first-order synchroextracting transform (SET1) and the smoothed pseudo-Wigner–Ville distribution (SPWVD).

1. Introduction

As an important branch of radar signal processing, moving target detection plays an important role in target search, tracking, and recognition. Radar distinguishes targets by the different Doppler frequency shifts caused by their different speeds and extracts the moving target echo signal with a filter [1]. Due to the complex detection environment and the variety of target motion characteristics, radar echo signals usually present nonstationary characteristics. The moving target detection method based on the Fourier transform (FT) performs best for targets in uniform motion, but it is no longer effective for high-speed maneuvering targets because of spectrum divergence and poor energy accumulation [2].

As a powerful tool for studying nonstationary signals, time-frequency analysis (TFA) provides the joint distribution of a signal in the time-frequency plane. TFA-based moving target detection methods complete the detection by extracting the features corresponding to the target in the time-frequency plane, using tools such as the short-time Fourier transform (STFT), fractional Fourier transform (FRFT), Wigner–Ville distribution (WVD), and Choi–Williams distribution (CWD) [3–5]. These methods improve the real-time response rate and require no a priori knowledge of the motion parameters. However, they rely on manual extraction of echo signal features, which makes it difficult to extract deep-level features with high discrimination [6]. As an artificial intelligence method, deep learning (DL) can automatically mine abstract features from the input data without complicated manual feature extraction and has excellent image classification and recognition ability [7]. Therefore, many scholars have explored the combination of TFA and DL to establish intelligent detection methods, which replace tedious hand-designed features and realize the automatic recognition of radar signals at a low signal-to-noise ratio (SNR) [8–13]. In these methods, TFA is first applied to transform the echo signal into a time-frequency image, and the deep-level features are then automatically extracted by DL to realize radar moving target detection. The combination of TFA and DL not only gives full play to the advantages of DL in image recognition but also effectively improves the noise robustness of the detection method. For example, Ni et al. apply the synchrosqueezing transform (SST) to convert radar signals into time-frequency images and build a multiresolution convolutional neural network (CNN) with three different kernel sizes to extract more deep features [8]. Pan et al. use the smoothed pseudo-Wigner–Ville distribution (SPWVD) to generate the time-frequency images of intercepted signals and propose a novel MIML-DCNN to automatically recognize overlapping LPI radar signals [9]. Su et al. use the STFT to obtain the time-frequency image of the echo signal from a sea surface moving target, and three types of CNN models (i.e., LeNet, AlexNet, and GoogLeNet) are used in binary detection and multiple micromotion classification [12]. The echo signal model and the TFA method play a key role in these methods and directly affect the accuracy of target detection. However, the above methods still have problems with the signal model, time-frequency resolution, and noise robustness. For example, the echo signal is modelled as a constant amplitude linear frequency modulation (LFM) signal, which cannot accurately represent the echo signal of a high-speed maneuvering target [14]. Due to its fixed window function, the STFT cannot obtain high time resolution and high frequency resolution simultaneously. Because SST squeezes the time-frequency (TF) coefficients into the instantaneous frequency (IF) trajectory, the unavoidable noise is squeezed in as well, which results in poor noise robustness [15].

Therefore, to address these problems, a radar moving target detection method based on the second-order synchroextracting transform (SET2) and AlexNet is proposed. Firstly, the echo signal of a high-speed maneuvering target is modelled as a more accurate Gaussian modulation-linear frequency modulation (GM-LFM) signal, and the echo signals are converted into time-frequency images by SET2. Then, the maximum energy ridges are extracted from the time-frequency images and used to construct the data set. Finally, the data set is input into AlexNet for training, and the deep-level features of the echo signal are extracted to realize automatic detection. The main contributions of this paper are as follows: (1) SET2 is used to build the time-frequency image of the echo signal of a high-speed maneuvering target, which effectively enhances the energy concentration of the time-frequency distribution; (2) ridge extraction is further applied to the time-frequency distribution of the echo signal, which effectively reduces the impact of noise on network training; (3) target detection is converted into image recognition and classification, and AlexNet is used to realize the automatic detection of radar moving targets.

The remaining sections of this paper are organized as follows: in Section 2, the echo signal model is presented briefly. In Section 3, the principle and definition of SET2 are introduced. A radar moving target detection method is proposed in Section 4, followed by the simulation analysis presented in Section 5. Finally, the conclusion is stated in Section 6.

2. Echo Signal Model

It is assumed that radar transmits linear frequency modulation (LFM) signal, i.e.,where denotes the rectangular function, and denote the carrier frequency and chirp rate, B and T denote the bandwidth and pulse width. The received signal at time t iswhere denotes the radar cross-section of target. denotes the time delay, where denotes the speed of light, denotes the distance between target and radar, denotes the slow time from pulse to pulse in a coherent processing interval. After demodulation and pulse compression, the echo signal can be expressed as follows:

Then, is expanded into a Taylor series about , i.e., where and denote the target speed and coherent integration time. Taking the first three terms of equation (4) gives the quadratic approximation of , i.e., where and denote the initial velocity and acceleration of the target. In this paper, three types of moving targets are considered: uniform speed, uniform acceleration, and uniform deceleration. The IFs of the uniform speed target and of the uniformly accelerating or decelerating target are where denotes the radar wavelength.
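As a numerical illustration of these two IF laws, the sketch below evaluates the Doppler frequency of a target whose radial velocity is either constant or changes linearly with slow time. It assumes the usual two-way Doppler relation (frequency equal to twice the radial velocity divided by the wavelength) with a positive sign for the chosen velocity direction. The function name `doppler_if` and all parameter values are hypothetical and are not the simulation settings of this paper.

```python
import numpy as np

def doppler_if(t_m, v0, a, wavelength):
    """Doppler frequency (Hz) for radial velocity v0 + a * t_m (two-way propagation)."""
    return 2.0 * (v0 + a * t_m) / wavelength

# Illustrative values only: 3 cm wavelength, 1 s of slow time.
t_m = np.linspace(0.0, 1.0, 5)
wavelength = 0.03

f_uniform = doppler_if(t_m, v0=30.0, a=0.0, wavelength=wavelength)   # constant IF
f_accel = doppler_if(t_m, v0=30.0, a=15.0, wavelength=wavelength)    # IF rises linearly
f_decel = doppler_if(t_m, v0=30.0, a=-15.0, wavelength=wavelength)   # IF falls linearly

# Under this convention the slope of the IF law is 2 * a / wavelength.
slope_accel = 2.0 * 15.0 / wavelength
```

Under this convention, a uniform speed target gives a horizontal line in the time-frequency plane, while acceleration and deceleration give lines with positive and negative slope, respectively.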

The dynamic imaging method after translational compensation for maneuvering targets usually assumes that the echo signal of a single scattering point is a constant amplitude LFM signal. In fact, in addition to the time-varying Doppler of each scattering point echo, the amplitude of the echo signal is also time-varying. For example, when the moving target has a dihedral or trihedral structural member, the scattering points have strong directionality. For a high-speed maneuvering target, when the scattering points move across range cells, some scattering points leave or enter the analysed range cell partway through the observation. Therefore, the echo signal of each scattering point of the moving target is an amplitude-modulated, frequency-modulated signal, and amplitude modulation-linear frequency modulation (AM-LFM) signals can be used to approximate the echo signal [14]. In this paper, the echo signal model of the radar moving target is established as a GM-LFM signal, i.e., where denote the amplitude, initial frequency, and chirp rate of the echo signal.
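A Gaussian-modulated LFM signal of this kind can be generated numerically as sketched below. The envelope parameterization (centre `t0` and width `sigma`), the function name `gm_lfm`, and all numerical values are assumptions chosen for illustration and are not the settings used in this paper.

```python
import numpy as np

def gm_lfm(t, amp, t0, sigma, f0, k):
    """Gaussian-modulated LFM: Gaussian envelope multiplied by a linear chirp."""
    envelope = amp * np.exp(-((t - t0) ** 2) / (2.0 * sigma ** 2))
    phase = 2.0 * np.pi * (f0 * t + 0.5 * k * t ** 2)   # quadratic phase
    return envelope * np.exp(1j * phase)

# Illustrative values only.
fs = 1000.0                  # sampling frequency, Hz
t = np.arange(1000) / fs     # 1 s observation
s = gm_lfm(t, amp=1.0, t0=0.5, sigma=0.2, f0=100.0, k=200.0)

# The IF is the phase derivative divided by 2*pi, i.e. f0 + k * t.
# A first difference of the unwrapped phase estimates it at the sample midpoints.
inst_freq = np.diff(np.unwrap(np.angle(s))) * fs / (2.0 * np.pi)
```

Setting `k` to zero gives the constant-IF case, and changing its sign switches between the acceleration and deceleration cases.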

3. Second-Order Synchroextracting Transform

Yu et al. proposed a first-order synchroextracting transform (SET1) [15]. The signal model assumed in SET1 is a constant amplitude single-frequency signal (see Figure 1), which cannot accurately characterize a nonstationary signal. Compared with the constant amplitude single-frequency signal with a linear phase, the GM-LFM signal represents a nonstationary signal more accurately. Recently, Bao et al. proposed a second-order synchroextracting transform (SET2) based on the GM-LFM signal model [16], which has better time-frequency resolution and signal reconstruction accuracy than SET1. The amplitude of the GM-LFM signal is a Gaussian function, and its phase is a quadratic polynomial function [17], i.e., where represents the phase of . The IF of the GM-LFM signal can be obtained by the derivative of , i.e.,

The derivative of equation (9) with respect to is given by the following equation:where

Combined with equation (10), the IF of the GM-LFM signal can also be given by the following equation:where represents the imaginary part of a complex number .

The Gaussian STFT of a signal is defined as follows:where Gaussian window function is given by the following equation:
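A discrete Gaussian-window STFT can be sketched with NumPy as follows. The window is parameterized here by a standard deviation `sigma` in seconds and truncated at four standard deviations; both choices, as well as the function name `gaussian_stft`, are assumptions made for this illustration. Each column is the FFT of a windowed segment, so the phase reference differs from the continuous-time definition, which does not matter when only magnitudes are used.

```python
import numpy as np

def gaussian_stft(x, fs, sigma, n_fft=256):
    """STFT of x with a Gaussian window of standard deviation `sigma` (seconds).

    Returns (freqs, G), where G has shape (n_fft, len(x)).
    """
    half = int(np.ceil(4.0 * sigma * fs))          # truncate the window at 4 sigma
    if 2 * half + 1 > n_fft:
        raise ValueError("n_fft is too small for the chosen window length")
    tau = np.arange(-half, half + 1) / fs
    window = np.exp(-tau ** 2 / (2.0 * sigma ** 2))
    padded = np.pad(x.astype(complex), half)       # zero-pad both ends
    G = np.empty((n_fft, len(x)), dtype=complex)
    for n in range(len(x)):
        segment = padded[n:n + 2 * half + 1] * window   # segment centred on sample n
        G[:, n] = np.fft.fft(segment, n=n_fft)
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    return freqs, G
```

A narrower window improves time resolution at the cost of frequency resolution, which is the trade-off that the synchroextracting postprocessing is meant to mitigate.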

Substituting equation (9) into equation (14), the Gaussian STFT of GM-LFM signal can be given by the following equation:where

Take the partial derivative on both sides of equation (16) about , thenwhere denotes the STFT of with window function . From equation (18), then

Take the partial derivative on both sides of equation (19) about , thenwhere and denote the STFT of with window function and . According to equations (19) and (20), and can be given by the following equation:

GM-LFM signal can be used for local approximation of the analyzed signal at any time, so the IF of the analyzed signal can be estimated from the IF of the GM-LFM signal. At any point , the IF estimator of the analyzed signal can be given by the following equation:

In order to obtain the ideal time-frequency distribution of a signal, only the amplitude and instantaneous frequency at the IF of the GM-LFM signal are used to construct the time-frequency distribution of the signal. Therefore, SET2 is defined as follows [16]:
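The extraction idea behind SET-type transforms, namely keeping only the STFT coefficients that lie on the estimated IF and discarding all others, can be sketched on a discrete grid as follows. This is a simplified illustration and not the exact definition in [16]: the IF estimate `if_est` is assumed to be supplied by a separate estimator (for SET2, the second-order estimator described above), and the half-bin tolerance is a discretization choice made here.

```python
import numpy as np

def synchroextract(G, if_est, freqs):
    """Keep STFT coefficients whose bin frequency matches the local IF estimate.

    G      : complex STFT, shape (n_freq, n_time)
    if_est : IF estimate at every TF point, same shape as G (Hz)
    freqs  : uniformly spaced frequency axis, shape (n_freq,)
    """
    df = abs(freqs[1] - freqs[0])
    on_ridge = np.abs(if_est - freqs[:, None]) <= df / 2.0   # discrete delta function
    return np.where(on_ridge, G, 0.0)

# Toy example: every TF point reports an IF of 20.2 Hz, so only the 20 Hz row survives.
freqs = np.arange(0.0, 50.0, 1.0)
G = np.ones((50, 10), dtype=complex)
if_est = np.full((50, 10), 20.2)
Te = synchroextract(G, if_est, freqs)
```

Unlike synchrosqueezing, which reassigns all coefficients to the IF trajectory, this operation simply discards the off-ridge coefficients.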

4. Detection Method

4.1. Principle

Below, the following echo signals of uniform acceleration, uniform deceleration, and uniform speed moving targets are used to analyze the time-frequency characteristics, i.e.,where , . The time width is s, and the sampling frequency is .

For the noiseless case, the time-frequency distributions of the echo signals are established by STFT, synchrosqueezing transform (SST), second-order SST (SST2), SET1, and SET2 (see Figure 2) for moving targets with uniform acceleration, uniform deceleration, and uniform speed. For a target with uniform acceleration or deceleration, the IF of the echo signal changes linearly with time , while the IF of the echo signal of a uniform speed target does not change with time . Since SST and SET are both postprocessing methods applied to the STFT, the time-frequency distributions obtained by SST and SET have better energy concentration than that of the STFT. In addition, Rényi entropy is used as an index to measure the energy concentration of a time-frequency distribution; a smaller Rényi entropy means better energy concentration (see Table 1). Table 1 shows that the time-frequency distribution obtained by SET2 has better energy concentration than those of the other methods.
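The Rényi entropy of a time-frequency distribution can be computed as in the following sketch. It assumes an entropy of order 3 and normalizes the magnitude of the distribution to unit sum; the order and normalization behind Table 1 may differ, so the values are only meant to show how concentration affects the index.

```python
import numpy as np

def renyi_entropy(tfd, alpha=3.0):
    """Renyi entropy (in bits) of a TF distribution, using its normalized magnitude."""
    p = np.abs(tfd)
    p = p / p.sum()                                   # treat the magnitudes as a distribution
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

# A single nonzero coefficient is perfectly concentrated; a flat map is maximally spread.
concentrated = np.zeros((8, 8))
concentrated[3, 4] = 1.0
spread = np.ones((8, 8))

entropy_concentrated = renyi_entropy(concentrated)
entropy_spread = renyi_entropy(spread)
```

For a flat distribution over N points the entropy equals the base-2 logarithm of N, and it decreases as the energy gathers on fewer points.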

The radar moving target detection method is built on the characteristic extraction of the echo signal, but the received echo signal usually contains both target information and various kinds of noise. Therefore, characteristic extraction from the echo signal in a noisy environment plays a key role in a radar moving target detection method. For the echo signal of the uniform acceleration target, Gaussian white noise is added to the signal, and the time-frequency distributions of the noisy GM-LFM signal are obtained by the five methods (see Figure 3). Since SST squeezes all time-frequency coefficients into the IF trajectory, the noise is inevitably gathered into the SST coefficients [15]. Different from SST, SET uses only the TF coefficients with the maximum value to generate the time-frequency distribution, so the energy concentration of the time-frequency distribution obtained by SET is better than that of SST. Meanwhile, because the GM-LFM signal model is used in SET2, the time-frequency distribution obtained by SET2 has better TF resolution for echo signals with time-varying IF. In addition, the corresponding Rényi entropies at different SNRs are shown in Figure 4. For the noisy GM-LFM signal, the time-frequency distribution obtained by SET2 has the minimum Rényi entropy, which means better energy concentration and noise robustness. Better energy concentration and noise robustness are conducive to accurately extracting the time-frequency characteristics of the GM-LFM signal and to the subsequent extraction of deep-level characteristics by AlexNet.

4.2. AlexNet

In this paper, the deep-level characteristics of the echo signal are extracted by AlexNet. As a deep convolutional neural network, AlexNet uses stacked convolution layers to extract image characteristics and applies dropout and data augmentation to suppress overfitting [18]. AlexNet has eight learned layers: five convolutional layers and three fully connected layers. AlexNet has the following important characteristics.

4.2.1. ReLU Nonlinearity

The commonly used sigmoid and tanh activation functions saturate when the magnitude of the input is large, so their gradients become very small and training converges slowly. AlexNet uses the nonsaturating nonlinear function ReLU, f(x) = max(0, x), as the activation function. With gradient descent, networks with this nonsaturating nonlinearity train much faster than networks with saturating nonlinearities.
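The saturation effect can be checked numerically with the short sketch below, which compares the gradients of the sigmoid and ReLU functions at a few sample inputs. The sample values are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.5, 10.0])
grad_sigmoid = sigmoid_grad(x)   # nearly zero at both extremes (saturation)
grad_relu = relu_grad(x)         # stays at 1 for every positive input
```

For large positive inputs the sigmoid gradient is almost zero, whereas the ReLU gradient remains 1, which keeps the weight updates from vanishing.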

4.2.2. Overlapping Pooling

Overlapping pooling means that the stride of the pooling operation is smaller than the size of the pooling window, so adjacent pooling windows overlap. Compared with LeNet, in which the stride equals the pooling window size, the overlapping pooling used in AlexNet helps to suppress overfitting during training.
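The difference between the two pooling schemes is illustrated by the following NumPy sketch. The 3 × 3 window with stride 2 follows the original AlexNet [18], the 2 × 2 window with stride 2 represents the non-overlapping case, and the 7 × 7 input is an arbitrary example; the pooling settings used in this paper's network may differ.

```python
import numpy as np

def max_pool2d(x, size, stride):
    """Max pooling of a 2-D array without padding."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
overlapping = max_pool2d(x, size=3, stride=2)       # stride < window size
non_overlapping = max_pool2d(x, size=2, stride=2)   # stride == window size
```

With overlapping windows, each input pixel near a window boundary contributes to more than one output value.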

4.2.3. Dropout

Dropout sets the output of each hidden neuron to zero with probability 0.5. The neurons that are "dropped out" in this way do not contribute to the forward pass and do not participate in back-propagation. The neural network therefore samples a different architecture each time an input is presented, but all these architectures share weights. This technique reduces complex coadaptations of neurons and effectively suppresses overfitting of the model.
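A minimal NumPy sketch of the dropout forward pass is given below. It uses the "inverted" variant, which rescales the surviving activations during training so that no scaling is needed at test time; the original AlexNet instead multiplies the outputs by 0.5 at test time [18]. The function name and signature are hypothetical.

```python
import numpy as np

def dropout_forward(h, p_drop, rng, training=True):
    """Inverted dropout: zero each activation with probability p_drop during training."""
    if not training:
        return h, None                      # all neurons are used at test time
    mask = rng.random(h.shape) >= p_drop    # True where the neuron is kept
    return h * mask / (1.0 - p_drop), mask  # rescale so the expected output is unchanged

rng = np.random.default_rng(0)
h = np.ones(10000)
h_train, mask = dropout_forward(h, p_drop=0.5, rng=rng)
h_test, _ = dropout_forward(h, p_drop=0.5, rng=rng, training=False)
```

The same mask must be applied to the gradients in back-propagation, so that dropped neurons receive no update for that input.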

In this paper, the softmax function is used to complete the classification task in the last fully connected layer. Batch normalization (BN) is used to replace the local response normalization (LRN) [19] of AlexNet. BN has the following advantages: (1) training is less sensitive to initialization, and a larger learning rate can be used; (2) BN makes the optimization landscape smoother, so the gradient is more predictable and stable [20].
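The two operations can be sketched in NumPy as follows. The batch normalization shown is the training-time forward pass over a batch of feature vectors with a scale `gamma` and shift `beta`; the running statistics used at inference time are omitted, and the function names are chosen here for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=3.0, size=(256, 4))   # batch of 256, 4 features
normalized = batch_norm(features, gamma=1.0, beta=0.0)

logits = np.array([[2.0, 0.5, -1.0]])   # three classes, as in the three target types
probabilities = softmax(logits)
```

After normalization each feature has approximately zero mean and unit variance over the batch, which is what makes larger learning rates tolerable.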

4.3. Algorithm

The flow of the proposed radar moving target detection method is shown in Figure 5, and the specific steps are listed as follows:
Step 1. Data preprocessing: the echo signal of the radar moving target is demodulated and pulse compressed.
Step 2. Time-frequency distribution establishment: SET2 is used to establish the time-frequency image of the echo signal.
Step 3. Data set construction: ridge extraction is performed on the time-frequency distribution to extract the maximum energy ridge, and the data set is constructed from the maximum energy ridges.
Step 4. Model training: the training set is input into AlexNet for training, and the deep-level characteristics of the echo signal are extracted.
Step 5. Model parameter optimization: the validation set is used to optimize the hyperparameters of the model, and the model is preliminarily evaluated.
Step 6. Target detection: the test set is used to test the generalization ability of the model, so as to complete the moving target detection.

5. Simulation

This paper uses PyCharm 2019, Python 3.6, CUDA 10.1, cuDNN 7.6, and the Keras library of the TensorFlow deep learning framework to build AlexNet, and uses an RTX 2060 GPU to train the model. Firstly, the images in the training set are preprocessed, and the preprocessing parameters are listed in Table 2. The images in the training set are randomly shuffled and scaled to 227 × 227 by bicubic interpolation, and the batch size is 32 images.
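The 227 × 227 input size matches the geometry of the original AlexNet [18], as the following arithmetic sketch shows. The kernel sizes, strides, paddings, and the 256 output channels of the last convolution layer are those of the original AlexNet and are assumed here; the network configured in Table 3 may differ.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width of a convolution or pooling layer (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, padding) of the original AlexNet layout.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def feature_map_sizes(input_size=227):
    """Spatial size of the feature map after each layer."""
    sizes = {}
    size = input_size
    for name, kernel, stride, pad in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        sizes[name] = size
    return sizes

sizes = feature_map_sizes(227)
flattened = sizes["pool5"] ** 2 * 256   # input length of the first fully connected layer
```

With this layout the feature map shrinks from 227 to 55, 27, 13, and finally 6 pixels per side, so the first fully connected layer receives 9216 values.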

The parameters of AlexNet are listed in Table 3, and the selected optimizer is stochastic gradient descent (SGD). SGD updates the parameters using randomly sampled mini-batches, which greatly accelerates each parameter update. The initial learning rate is set to 0.01, which improves the performance of the deep learning model and reduces the training time. The dropout rate of the hidden nodes is set to 0.5, at which the number of randomly generated network structures is the largest [21]. The number of training epochs is set to 25, and the detection accuracy and loss function curves are shown in Figure 6. As training proceeds, the detection accuracy on the training set and validation set increases and finally approaches 100%, and the loss function decreases and finally approaches 0.

5.1. Effect of RE on Detection Performance

The sampling frequency and time width of the echo signal are , and the chirp rate and initial frequency are listed in Table 4. For the uniform acceleration and deceleration targets, it is difficult to distinguish those with small k from the uniform speed targets with k = 0. To improve the detection accuracy for these uniform acceleration and deceleration targets with small k, more echo signals with small k are generated for training in the data set. Then, Gaussian white noise is added to the GM-LFM signal, and the variation range of SNR is with an interval of 2 dB. For each SNR, 900 time-frequency images of each type of signal are obtained by SET2, of which 70% are used for model training and 30% are used for model validation.
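The noise addition and the 70%/30% split can be sketched as follows. The sketch assumes complex Gaussian white noise whose power is set from the measured average signal power, and a random permutation for the split; the function names and the random seed are hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add complex white Gaussian noise so that the result has the requested SNR (dB)."""
    signal_power = np.mean(np.abs(signal) ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(signal.shape) + 1j * rng.standard_normal(signal.shape)
    )
    return signal + noise

def split_indices(n_items, train_fraction, rng):
    """Randomly split item indices into a training part and a validation part."""
    order = rng.permutation(n_items)
    n_train = int(round(train_fraction * n_items))
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)
train_idx, val_idx = split_indices(900, 0.7, rng)   # 900 images per signal type and SNR
```

With 900 images per class and SNR, this gives 630 training images and 270 validation images.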

As Figure 3 shows, noise still exists in the time-frequency images of the echo signal. In order to further reduce the impact of noise on network training, RE is applied to remove noise and strengthen the time-frequency characteristics of the echo signal. RE uses a penalized forward-backward greedy algorithm to find the maximum-energy ridges by minimizing −ln A at each time point, where A is the absolute value of the time-frequency distribution, and optionally constrains jumps in frequency with a penalty. RE is performed on the time-frequency distribution obtained by SET2, and the extracted maximum-energy ridges are shown in Figure 7. Figure 7 shows that RE effectively removes the noise in the time-frequency distribution, and the time-frequency characteristics extracted by SET2 and RE (abbreviated as SET2 + RE) are generally consistent with the theoretical results (see equations (6) and (7)). In brief, SET2 + RE can accurately extract the time-frequency characteristics of the GM-LFM signal, and the better resolution and noise robustness are conducive to the subsequent deep-level feature extraction by AlexNet.
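The principle of penalized ridge extraction can be illustrated with the following sketch. It is a simplified dynamic-programming variant, not the forward-backward greedy algorithm itself: it finds the single path through the time-frequency plane that minimizes the sum of the negative log-magnitudes plus a penalty on frequency jumps. The squared-jump penalty, the small constant `eps`, and the function name are assumptions made here.

```python
import numpy as np

def extract_ridge(tfd, penalty=1.0, eps=1e-12):
    """Return the frequency-bin index of the maximum-energy ridge at each time.

    tfd has shape (n_freq, n_time); the cost of a path is the sum of -ln|tfd|
    along it plus penalty * (bin jump)^2 between consecutive time points.
    """
    cost = -np.log(np.abs(tfd) + eps)
    n_f, n_t = cost.shape
    bins = np.arange(n_f)
    jump = penalty * (bins[:, None] - bins[None, :]) ** 2   # jump[i, j]: move from bin j to bin i
    total = cost[:, 0].copy()                               # best cost of paths ending in each bin
    back = np.zeros((n_f, n_t), dtype=int)
    for n in range(1, n_t):
        candidates = total[None, :] + jump                  # candidates[i, j]
        back[:, n] = np.argmin(candidates, axis=1)
        total = candidates[bins, back[:, n]] + cost[:, n]
    ridge = np.empty(n_t, dtype=int)
    ridge[-1] = int(np.argmin(total))
    for n in range(n_t - 1, 0, -1):                         # trace the best path backwards
        ridge[n - 1] = back[ridge[n], n]
    return ridge
```

A larger penalty yields a smoother ridge that is less easily pulled away by isolated noise peaks, while a penalty of zero reduces the method to picking the largest coefficient at each time point.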

For each type of echo signal, 100 time-frequency images are generated by SET2 and SET2 + RE for model testing, and the test results are listed in Table 5. Because the proposed detection method is built on the time-frequency characteristics of the echo signal, the quality of time-frequency feature extraction in a noisy environment directly affects the detection accuracy. Table 5 shows that the detection accuracy gradually improves as the noise intensity decreases. At the same SNR, the combination of RE and SET2 improves the detection accuracy compared with the SET2-based detection method without RE. At low SNR in particular, the application of RE significantly improves the detection accuracy.

5.2. Comparison

Next, the proposed method is compared with the detection methods based on SET1 + RE and the smoothed pseudo-Wigner–Ville distribution (SPWVD) [13]. At each SNR, 900 time-frequency images of each type of signal are obtained by SET2 + RE, SET1 + RE, and SPWVD, of which 70% are used for model training and 30% are used for model validation.

For each type of signal, another 100 time-frequency images are generated for model testing, and the test results are listed in Table 6. The trends of detection accuracy with SNR are shown in Figure 8, where Figures 8(a)–8(c) correspond to the moving targets with uniform acceleration, uniform deceleration, and uniform speed, respectively. From Table 6 and Figure 8, the following results can be obtained: (1) With the increase of SNR, the detection accuracy of the three methods gradually increases. When , the detection method based on SET2 + RE can accurately identify all kinds of targets. (2) Because SET2 gives more distinct time-frequency characteristics of the echo signal, the detection accuracy obtained by SET2 + RE is higher than that of SET1 + RE at low SNR. The results show that SET2 effectively improves the accuracy and noise robustness of the detection method. (3) The SPWVD-based detection method uses mean filtering to denoise the time-frequency image, but mean filtering also destroys the detail of the time-frequency image (the ridge) when filtering out the noise. In contrast, RE is capable of removing the noise while retaining the time-frequency characteristics of the echo signal to the greatest extent. Therefore, at low SNR, the detection accuracy based on SET2 + RE is higher than that of SPWVD.

6. Conclusion

In this paper, a radar moving target detection method based on SET2 + RE and AlexNet is proposed, which combines the advantages of TFA and CNN. The method converts the echo signal into a time-frequency image by SET2, so that the time-frequency characteristics of the echo signal are extracted with high resolution in a noisy environment. Based on the time-frequency images of the echo signal, the deep-level features of the echo signal are extracted by AlexNet to realize automatic moving target detection. Simulation results show that the application of RE improves the detection accuracy and noise robustness of the detection method, especially at low SNR. The detection accuracy of the proposed method is higher than that of the methods based on SET1 + RE and SPWVD.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 62201298 and 62161040), the Inner Mongolia University of Science and Technology Innovation Fund (Grant no. 2019QDL-B39), and the Fundamental Research Funds for Inner Mongolia University of Science and Technology.