Abstract
Feature extraction and recognition of signals are the basis of cognitive radio. Traditional manual extraction of signal features becomes difficult in the complex electromagnetic environment. Although convolutional neural networks (CNNs) can extract signal features automatically, they recognize electromagnetic signals at low signal-to-noise ratios (SNRs) with low accuracy due to the agility of the signals. Considering the great potential of spiking neural networks (SNNs) in classification, a spiking convolutional neural network (SCNN) for the recognition of electromagnetic signals is proposed in this paper. The SCNN effectively integrates the spatial feature extraction ability of CNNs with the temporal feature extraction ability of SNNs. Since the SCNN is difficult to train directly, a surrogate-gradient strategy is adopted to train it. Taking the 2-dimensional time-frequency distributions of 6 signals as input, the SCNN can effectively identify the different signals at low SNRs. The method proposed in this paper contributes to promoting the research and application of SNNs in the recognition of electromagnetic signals.
1. Introduction
Modern electronic warfare is developing rapidly with the rise of electronic information technology, and electromagnetic signal identification has become a critical part of cognitive radio (CR) [1]. However, electromagnetic signal waveforms are agile and heavily interfered with in the diverse electromagnetic environment. The agility of signals degrades earlier recognition methods based on interpulse features [2]. Therefore, in-pulse features such as time-frequency features, wavelet packet features, wavelet ridge-frequency features, and higher-order spectra are widely used for signal recognition in combination with support vector machines (SVMs) and deep learning (DL) [3]. As a type of DL, convolutional neural networks (CNNs) are widely used in target recognition and image classification. Due to their powerful capability for feature extraction and generalization, CNNs are quite effective in the recognition of radar signals [4, 5] and communication signals [6, 7].
In recent years, many studies have applied CNNs to electromagnetic signal recognition. Some methods focus on preprocessing the input signals of CNNs to reduce noise interference [8–10]. Ye et al. use time-frequency distributions preprocessed with binarization as the input of a 3-layer CNN for signal recognition; the recognition accuracy remains above 90% even at −6 dB [8]. Yao and Wang preprocess the time-frequency maps with further methods, including symmetric mapping, primary energy ridge extraction, binarization, and image resetting, to reduce noise interference; a pretrained CNN identifying the preprocessed time-frequency maps then obtains a higher accuracy than methods with manual feature extraction [9]. Denoising is another preprocessing option: when a residual neural network (ResNet) is used to denoise the time-frequency maps of radar signals, the Inception-V4 network, which combines the inception structure with residual connections, achieves a recognition accuracy of over 90% for signals at −10 dB [10]. Meanwhile, different features have been constructed for signal recognition [11, 12]. Pu et al. take orthogonal slices of the Gaussian-smoothed ambiguity function as the feature extraction target; by learning the features of the orthogonal slices, their CNN achieves a recognition accuracy of 88.5% for signals at −6 dB [11]. Xie et al. first use chirp decomposition for a preliminary classification of signals and then take the Zernike moments extracted from time-frequency maps as the input features of a ResNet to complete the further classification; their method achieves good robustness to variations in signal parameters [12]. Other methods are committed to improving the network structure to raise recognition accuracy [13–15]. Lin et al. propose a deep residual shrinkage attention network for signal recognition, using an attention mechanism in the ResNet to reduce the impact of redundant information; the network achieves excellent recognition accuracy and good scalability at the same time [13]. In addition, combining a long short-term memory (LSTM) network with a CNN can outperform either network alone: the combined network fuses time-domain features with time-frequency-domain features, improving signal recognition performance at low SNRs [14, 15].
Nonetheless, some challenges remain to be addressed. (1) Most CNN-based methods have low recognition accuracy for signals at low SNRs. (2) Recognizing electromagnetic signals with CNNs requires preprocessing methods or manually constructed features to reduce noise interference, which increases the complexity of signal processing. (3) Electromagnetic signal data, especially radar signal data, are scarce and can hardly meet the requirements of training deep CNNs; furthermore, overly large networks are difficult to deploy on mobile devices.
In contrast, spiking neural networks (SNNs) have great development potential due to their biological interpretability and low power consumption. Li et al. use Gaussian tuning coding to convert time-frequency maps into pulses and then process the pulses with a single-layer SNN composed of tempotron neurons for signal recognition; their simulations show that the single-layer SNN performs better than a 3-layer CNN [16]. This result suggests that even a simple SNN may outperform a deeper CNN. Although deep SNNs were initially little used because of the difficulty of building and training them, studies have shown that SNNs can be converted from CNNs [17–19] or trained with surrogate gradients [20, 21]. These new methods for building and training SNNs give them good classification performance. SNNs have achieved superior performance in efficiently processing complex and noisy spatial-temporal information [22], show higher potential for robustness than CNNs in many respects [23], and carry information that is less susceptible to random background noise [24].
Thus, the purpose of this study is twofold. First, we propose a spiking convolutional neural network (SCNN) for signal recognition. The SCNN combines the SNN with the CNN, integrating the temporal and spatial information of the feature maps for better feature extraction; at the same time, the SCNN has a smaller model size than CNNs of the same accuracy. Second, we give a surrogate-gradient method for training the SCNN and analyze in detail two key parameters that affect its performance. The remainder of this paper is organized as follows. Section 2 describes the principles of the time-frequency transform and SNNs as well as the methods used in this paper. Section 3 determines the optimal values of the two key parameters of the SCNN and compares the SCNN's performance with that of other networks; the noise generalization performance of the SCNN is analyzed at the end of that section. Finally, conclusions are given in Section 4.
2. Theory and Methods
Figure 1 illustrates the process of identifying electromagnetic signals with the SCNN. The signals are first transformed into time-frequency maps, which are then used as the input of the SCNN to complete the classification. The first two layers of the SCNN are convolutional layers, each followed by a layer of leaky integrate-and-fire (LIF) neurons. The feature maps output by the LIF neurons are passed to a pooling layer to reduce their dimension, and the two fully connected layers are each followed by a layer of integrate-and-fire (IF) neurons. The number of LIF or IF neurons equals the size of the feature map of the preceding layer, so the LIF and IF layers do not change the size of the feature map.
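For concreteness, the following is a minimal PyTorch sketch of this layout built with the spikingjelly library (activation_based API; older releases expose the same classes under clock_driven). The channel counts, kernel sizes, 64×64 input resolution, and the default values of the time step T and gradient factor alpha are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from spikingjelly.activation_based import neuron, surrogate

class SCNN(nn.Module):
    def __init__(self, num_classes=6, T=8, alpha=2.0):  # T, alpha illustrative
        super().__init__()
        self.T = T                                # number of simulation time steps
        sg = surrogate.ATan(alpha=alpha)          # Atan surrogate gradient (Section 2.4)
        # Static encoder: the first convolution has no temporal state, so it is
        # computed once, outside the time-step loop (see Section 2.3).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
        )
        self.spiking = nn.Sequential(
            neuron.LIFNode(surrogate_function=sg),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            neuron.LIFNode(surrogate_function=sg),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 128),         # assumes 64x64 grayscale input
            neuron.IFNode(surrogate_function=sg),
            nn.Linear(128, num_classes),
            neuron.IFNode(surrogate_function=sg),
        )

    def forward(self, x):                         # x: (batch, 1, 64, 64)
        x = self.encoder(x)                       # static input, encoded once
        counts = 0.0
        for _ in range(self.T):                   # accumulate output spikes over T steps
            counts = counts + self.spiking(x)
        return counts / self.T                    # firing rate f_i = N_i / T
```

Because the spiking neurons are stateful, spikingjelly's functional.reset_net(model) must be called between batches to clear the membrane potentials (see the training sketch in Section 3).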

2.1. Time-Frequency Distribution of Electromagnetic Signals
To prevent the features of one-dimensional signals from being buried by noise, we extract signal features from time-frequency distributions. Different kernel functions correspond to different time-frequency distributions, and a series of them have been proposed, mainly including the Choi-Williams distribution (CWD), the smoothed pseudo-Wigner-Ville distribution (SPWVD), and the reduced interference distribution (RID) [25]. The CWD, which has the smallest cross-terms among all unprocessed Cohen-class distributions, is chosen in this paper. Its kernel function is shown in equation (1) as follows:

$$\phi(\theta,\tau)=\exp\left(-\frac{\theta^{2}\tau^{2}}{\sigma}\right),\tag{1}$$

where $\sigma>0$ is an attenuation factor that controls the suppression of cross-terms. The corresponding CWD is described in equation (2) as follows:

$$\mathrm{CWD}_{x}(t,\omega)=\iint\sqrt{\frac{\sigma}{4\pi\tau^{2}}}\exp\left(-\frac{\sigma(u-t)^{2}}{4\tau^{2}}\right)x\left(u+\frac{\tau}{2}\right)x^{*}\left(u-\frac{\tau}{2}\right)e^{-j\omega\tau}\,du\,d\tau.\tag{2}$$
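To make the computation concrete, the following is a simplified NumPy sketch of a discrete CWD. The lag range, the boundary handling by index clipping, and the window normalization are simplifying assumptions; in practice an optimized time-frequency toolbox would be used.

```python
import numpy as np
from scipy.signal import hilbert

def cwd(x, sigma=1.0, n_lags=64):
    """Simplified discrete Choi-Williams distribution of a real signal x.
    sigma controls cross-term suppression (smaller sigma -> stronger smoothing).
    Returns a (len(x), n_lags) magnitude time-frequency map."""
    z = hilbert(x)                              # analytic signal
    N = len(z)
    acf = np.zeros((N, n_lags), dtype=complex)  # kernel-smoothed local autocorrelation
    acf[:, 0] = np.abs(z) ** 2                  # tau = 0: the kernel reduces to a delta
    mu = np.arange(-n_lags, n_lags + 1)
    for tau in range(1, n_lags):
        # Time-smoothing window derived from the CWD kernel of equation (1);
        # normalizing the window absorbs the sqrt prefactor of equation (2).
        g = np.exp(-sigma * mu ** 2 / (4.0 * tau ** 2))
        g /= g.sum()
        for n in range(N):
            ip = np.clip(n + mu + tau, 0, N - 1)
            im = np.clip(n + mu - tau, 0, N - 1)
            acf[n, tau] = np.sum(g * z[ip] * np.conj(z[im]))
    # Fourier transform over the lag axis yields the frequency axis (equation (2)).
    return np.abs(np.fft.fft(acf, axis=1))
```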
We use six signals commonly used in communication and radar, including the binary phase shift keying signal (2PSK), binary frequency shift keying signal (2FSK), quadratic frequency shift keying signal (4FSK), continuous wave signal (CW), linear frequency modulated signal (LFM), and nonlinear frequency modulated signal (NLFM). The Choi-Williams distributions of the 6 signals at 0 dB are given in Figure 2.

Figure 2: Choi-Williams distributions of the six signals at 0 dB: (a) 2PSK, (b) 2FSK, (c) 4FSK, (d) CW, (e) LFM, and (f) NLFM.
2.2. Spiking Neural Network
Various spiking neuron models have emerged, including the Hodgkin-Huxley (HH) model, the integrate-and-fire (IF) model, the leaky integrate-and-fire (LIF) model, and the Izhikevich model [22]. Although the HH model approximates biology well, it is computationally complex and inconvenient to use in practice. The simplified LIF and IF models have become widely used because they reduce the computational effort while retaining the threshold-based membrane potential dynamics of biological neurons. Both LIF and IF neurons are integrative models: their membrane potential is influenced not only by the input at the current moment but also by the membrane potential at the end of the previous moment. When the membrane potential does not exceed the threshold, the charging process of a continuous-time neuron can be expressed as equation (3) as follows:

$$\frac{dV(t)}{dt}=f\big(V(t),X(t)\big),\quad V(t)<V_{\mathrm{th}},\tag{3}$$

where $V(t)$ is the membrane potential and $X(t)$ is the input.
Equations (4) and (5) show the charging processes of LIF and IF neurons, respectively:

$$\tau_{m}\frac{dV(t)}{dt}=-\big(V(t)-V_{\mathrm{reset}}\big)+X(t),\tag{4}$$

$$\frac{dV(t)}{dt}=X(t),\tag{5}$$

where $\tau_{m}$ is the membrane time constant and $V_{\mathrm{reset}}$ is the reset voltage. An analytical solution to equation (4) cannot be obtained because the input $X(t)$ is a variable, so the continuous differential equation is approximated by the discrete difference equation (6):

$$V_{t}=V_{t-1}+\frac{1}{\tau_{m}}\Big[-\big(V_{t-1}-V_{\mathrm{reset}}\big)+X_{t}\Big].\tag{6}$$
For both discrete LIF and IF neurons, forward propagation includes two further processes besides charging: discharge and reset. The discharge process can be expressed as equation (7):

$$S_{t}=\Theta\big(H_{t}-V_{\mathrm{th}}\big),\tag{7}$$

where $H_{t}$ is the membrane voltage after charging but before any pulse is released, $V_{\mathrm{th}}$ is the threshold voltage at which the neuron is activated, $S_{t}$ is the output pulse with a value of 1 for a released pulse and 0 otherwise, and $\Theta(\cdot)$ is the step function. LIF and IF neurons differ only in the charging process: the membrane potential of an IF neuron remains unchanged in the absence of input, whereas the membrane potential of a LIF neuron leaks in the absence of input, gradually decaying back toward the resting potential.
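As a numerical illustration of this discrete charge-fire-reset cycle, the following sketch steps a LIF (or IF) neuron according to equations (6) and (7); the time constant, threshold, and input value are illustrative defaults, not the paper's settings.

```python
import torch

def lif_step(v, x, tau_m=2.0, v_th=1.0, v_reset=0.0, leaky=True):
    """One discrete update of a (batch of) LIF or IF neurons.
    v: membrane potential from the previous step; x: current input."""
    if leaky:
        # LIF charging (equation (6)): the potential decays toward v_reset.
        h = v + (x - (v - v_reset)) / tau_m
    else:
        # IF charging: the potential simply integrates the input.
        h = v + x
    s = (h >= v_th).float()               # fire: Heaviside step of equation (7)
    v_next = s * v_reset + (1.0 - s) * h  # hard reset after a spike
    return s, v_next

# Minimal usage: a constant suprathreshold input drives periodic firing.
v = torch.zeros(1)
for t in range(10):
    s, v = lif_step(v, torch.tensor([1.5]))
    print(int(s.item()), round(v.item(), 3))  # spikes alternate: 0, 1, 0, 1, ...
```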
2.3. Forward Propagation
As shown in Figure 1, the first convolutional layer is used directly for feature extraction and has no temporal state. Together with the LIF neurons in the second layer, it forms a learnable encoding layer. Since the input of the first layer does not vary with time, additional computational effort can be saved by placing this layer outside the time-step loop. In addition, batch normalization is performed after each convolutional layer to accelerate the convergence of the SCNN.
Figure 3 shows the activation process of the SCNN in forward propagation. The eigenvalues of the time-frequency maps are assigned to the LIF neurons, whose membrane potentials start to accumulate from the first time step. When the activation threshold $V_{\mathrm{th}}$ is reached, a LIF neuron releases a pulse and its membrane potential is set to $V_{\mathrm{reset}}$. If the neuron is not activated, the membrane potential is carried over directly to the next time step; at the same time, if there is no input, the membrane potential decays at a certain rate. This process is repeated until the end of the given number of time steps $T$. The size of the output feature map after the LIF neurons remains the same as the input size. This process completes the extraction of temporal features and achieves the fusion of spatial and temporal information.

Unlike the fully connected layers of CNNs, which use probabilities for classification, the SCNN classifies signals by the frequency of the pulses issued within the $T$ time steps. A higher frequency indicates a higher probability. The pulse frequency can be obtained as in equation (8):

$$f_{i}=\frac{N_{i}}{T},\tag{8}$$

where $i$ is the class index of the signals and $N_{i}$ denotes the cumulative number of pulses issued by the $i$-th output neuron within the $T$ time steps.
Among all neurons in the last layer, the neuron with the highest pulse frequency corresponds to the predicted class of the input signal. Finally, the recognition accuracy of the SCNN can be determined by equation (9):

$$\mathrm{Acc}=\frac{N_{c}}{N_{\mathrm{total}}},\tag{9}$$

where $N_{c}$ is the number of correctly predicted signals and $N_{\mathrm{total}}$ is the total number of signals.
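A minimal sketch of this rate-based readout, with illustrative spike counts:

```python
import torch

def classify_by_rate(spike_counts, T):
    """Equation (8): firing frequency f_i = N_i / T per output neuron;
    the class with the highest rate is the prediction."""
    freqs = spike_counts / T
    return freqs.argmax(dim=1)

def accuracy(preds, labels):
    """Equation (9): fraction of correctly predicted signals."""
    return (preds == labels).float().mean().item()

# Example: cumulative counts for 2 signals over T = 8 steps, 6 classes.
counts = torch.tensor([[1., 0., 6., 0., 1., 0.],
                       [0., 7., 0., 1., 0., 0.]])
print(classify_by_rate(counts, T=8))   # tensor([2, 1])
```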
2.4. Backward Propagation
If the SCNN updates its weights by backpropagating the error with gradient descent, the chain rule in equation (10) gives its gradient transfer:

$$\frac{\partial L}{\partial W}=\frac{\partial L}{\partial S}\,\frac{\partial S}{\partial V}\,\frac{\partial V}{\partial W},\tag{10}$$

where $S$ denotes the output pulse and $V$ denotes the membrane potential. As equation (7) shows, $\partial S/\partial V$ in equation (10) cannot be derived since $\Theta(\cdot)$ is a step function, so the weights cannot be updated directly by gradient descent. Therefore, the method of surrogate gradient is used to train the SCNN. Commonly used surrogate functions are the sigmoid function and the Atan function, whose approximation effects are similar. The Atan function can be expressed as in equation (11):

$$g(x)=\frac{1}{\pi}\arctan\left(\frac{\pi}{2}\alpha x\right)+\frac{1}{2}.\tag{11}$$
The approximation effect of the Atan function on the step function under different $\alpha$ is shown in Figure 4. $\alpha$ can be regarded as the gradient factor: the larger its value, the larger the gradient near the threshold and the closer $g(x)$ is to the step function. However, a too large $\alpha$ tends to cause gradient explosion near the threshold and gradient vanishing far from it, which finally prevents the weights from being updated in training. In this paper, the Atan function is used as the surrogate function, and we analyze the effect of $\alpha$ to find the value giving optimal network performance.
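As a concrete illustration, the following PyTorch sketch implements a step activation whose backward pass uses the Atan surrogate of equation (11); the normalization follows common SNN toolkits (e.g., spikingjelly's ATan) and is our assumption, since the paper does not spell out its exact form.

```python
import math
import torch

class AtanSpike(torch.autograd.Function):
    """Step-function spike with an Atan surrogate gradient.
    Forward implements the exact Heaviside firing rule of equation (7);
    backward replaces its ill-defined derivative with that of equation (11),
    g(x) = arctan(pi*alpha*x/2)/pi + 1/2."""

    @staticmethod
    def forward(ctx, x, alpha):
        # x = H_t - V_th: membrane potential relative to the threshold.
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        return (x >= 0).to(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        a = ctx.alpha
        # g'(x) = alpha / (2 * (1 + (pi*alpha*x/2)^2))
        sg = a / (2.0 * (1.0 + (math.pi * a * x / 2.0) ** 2))
        return grad_out * sg, None        # no gradient for alpha itself

# Usage inside a neuron's discharge step: spikes = AtanSpike.apply(h - v_th, 2.0)
```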

3. Results and Discussion
In order to verify the advantages of the proposed SCNN in recognizing signals at low SNRs, 800 time-frequency maps of electromagnetic signals are generated at every 3 dB step in the SNR range from 0 dB to −15 dB; 600 of them are used for training and 200 for testing. To reduce the computational effort, grayscale time-frequency maps are used for training and testing, scaled to a fixed size by bicubic interpolation. Tables 1 and 2 show the signals' parameters and the simulation environment, respectively. A batch consists of 64 time-frequency maps. The learning rate starts at 0.1 and is then decayed by cosine annealing. Finally, the SCNN is optimized with the SGD optimizer.
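The following sketch assembles these settings into a training loop. The model comes from the sketch in Section 2, the data loader is assumed to yield batches of 64 grayscale time-frequency maps with labels 0..5, and the momentum value, epoch count, and MSE-on-rates loss are our assumptions rather than the paper's stated choices.

```python
import torch
import torch.nn.functional as F
from spikingjelly.activation_based import functional

epochs = 100                                       # assumed training length
model = SCNN(num_classes=6)                        # sketch from Section 2
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for maps, labels in train_loader:              # assumed DataLoader of CWD maps
        rates = model(maps)                        # output firing rates, shape (64, 6)
        target = F.one_hot(labels, num_classes=6).float()
        loss = F.mse_loss(rates, target)           # push the correct neuron's rate to 1
        optimizer.zero_grad()
        loss.backward()                            # gradients flow through the Atan surrogate
        optimizer.step()
        functional.reset_net(model)                # clear membrane potentials between batches
    scheduler.step()                               # cosine-annealed learning rate
```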
3.1. Impact of the Time Step
As the SCNN accumulates pulses over time, its performance is affected by the length of the accumulation time. If the accumulation time is too short, the spiking neurons cannot be activated to fuse temporal and spatial information. If it is too long, the network becomes very deep in time, and the computational effort and inference delay of the SCNN become too large to meet the real-time requirement of signal recognition. Therefore, we first determine an optimal time step $T$ for accumulating pulses. Theoretically, $T$ only affects the depth of the SCNN in time, while the gradient factor $\alpha$ only affects the backpropagation of errors. Numerous experiments also show that $T$ and $\alpha$ affect the performance of the SCNN independently, so their optimal values can be obtained by varying one while keeping the other fixed. First, $\alpha$ is kept unchanged and $T$ is varied from 2 to 14. When recognizing the 6 signals from 0 dB to −15 dB, the accuracy trends over the iterations are consistent for the same $T$ at different SNRs, so only the results of different $T$ at −12 dB are shown in Figure 5 for reasons of space. Once $T$ is large enough, the accuracy of the SCNN reaches its maximum and remains relatively stable after 40 iterations. Since a further increase in $T$ only increases the inference delay, the smallest such $T$ is chosen as the optimal time step.

3.2. Impact of the Gradient Factor
Since $T$ and $\alpha$ affect the performance of the SCNN independently, the time step is kept at the optimal value obtained above while the gradient factor $\alpha$ is varied from 2 to 8. When recognizing the 6 signals from 0 dB to −15 dB, the accuracy trends over the iterations are consistent for the same $\alpha$ at different SNRs, so only the results at −12 dB are shown in Figure 6 for reasons of space. As shown in Figure 6, the convergence of the SCNN worsens with increasing $\alpha$; the recognition accuracy is highest and most stable for the smallest tested value, $\alpha = 2$, which is finally adopted.

3.3. Result Comparison of Different Networks
Six electromagnetic signals at SNRs from 0 dB to −15 dB are identified with the SCNN using the optimal $T$ and $\alpha$ determined above. The other simulation settings are given at the beginning of Section 3. Table 3 shows the composition and size of the different networks. SNN1 is a single-layer SNN consisting of tempotron neurons [16]. CNN3 is the 3-layer CNN proposed by Ye et al. [8]. The 5-layer LeNet5 is used by Guo et al. [26]. The improved AlexNet8 is proposed by Yang et al. [27]. VGG16 is used by Li and Zhu [28]. Furthermore, as the residual structure improves the performance of deep CNNs, we make a further comparison with the 18-layer ResNet18 [29].
As shown by the recognition accuracy of the different networks in Figure 7, the proposed SCNN and ResNet18 achieve the best recognition accuracy among all the models at different SNRs. ResNet18 performs well because of its residual structure [29]. Although the performance of ResNet18 is comparable to that of the SCNN, its model size is about 9 times that of the proposed SCNN, as shown in Table 3. Figure 7 also shows that the recognition accuracy of CNN3 is relatively low due to its simple structure. As the depth increases, the fitting ability of the CNN-based models is enhanced, and their recognition accuracy for the six signals improves at all SNRs. Even so, the recognition accuracy of deep CNN-based models such as VGG16 is still lower than that of the proposed SCNN. The outstanding performance of the SCNN compared to CNNs is attributed to the information it accumulates in time, which compensates for its lack of spatial depth. The fusion of temporal and spatial information gives the SCNN good antinoise ability.

Based on the previous analysis, VGG16 performs best among all the CNNs without a residual structure. To further illustrate the advantages of the SCNN over ordinary CNNs without a residual structure, the confusion matrices of VGG16 and the SCNN are given in Figure 8. As the SNR decreases, the energy of the noise masks the energy of the signals, and the energy distributions of different signals become blurred in the time-frequency maps, leading to a decrease in the recognition accuracy of both VGG16 and the SCNN. Overall, however, the recognition accuracy of the SCNN remains higher than that of VGG16.

Figure 8: Confusion matrices of VGG16 and the SCNN, panels (a)-(f).
Figure 9 shows the recognition accuracy of the SCNN for each signal at different SNRs. The recognition of 2PSK is the most susceptible to noise because the energy of its frequency components is relatively dispersed compared with the other signals, as shown in Figure 2.

3.4. Analysis of Generalization Performance for Noise
In order to verify the generalization performance of the SCNN with respect to noise, the 6 electromagnetic signals at 0 dB, −3 dB, −6 dB, and −9 dB are used for training and testing. As shown in Table 4, the recognition accuracy is very high when the SNRs of the training signals are lower than or comparable with those of the test signals; only when the SNRs of the test signals are much lower than those of the training signals does the recognition accuracy decline. This indicates that the SCNN generalizes well to noise within a certain SNR range.
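A hypothetical sketch of this cross-SNR protocol is given below: a model trained at one SNR is evaluated on test sets at every SNR, producing a Table 4-style grid. The helpers train_model(snr) and test_loaders[snr] are assumed for illustration and are not the paper's code.

```python
import torch
from spikingjelly.activation_based import functional

snrs = [0, -3, -6, -9]                          # training and testing SNRs (dB)
grid = {}                                       # (train_snr, test_snr) -> accuracy
for train_snr in snrs:
    model = train_model(train_snr)              # assumed helper: trains one SCNN
    model.eval()
    with torch.no_grad():
        for test_snr in snrs:
            correct = total = 0
            for maps, labels in test_loaders[test_snr]:  # assumed per-SNR loaders
                preds = model(maps).argmax(dim=1)        # highest firing rate wins
                functional.reset_net(model)              # clear state between batches
                correct += (preds == labels).sum().item()
                total += labels.numel()
            grid[(train_snr, test_snr)] = correct / total
```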
4. Conclusions
The SCNN proposed in this paper uses convolutional layers as its visual perception front end. LIF neurons and IF neurons are used to construct the convolutional layers and fully connected layers of the SCNN, respectively. The feature maps are converted into pulses to obtain temporal information, which enables the SCNN to perform well in the recognition of electromagnetic signals. The outstanding recognition performance of the SCNN for electromagnetic signals at low SNRs is verified through the recognition of 6 signals at different SNRs. As a third-generation neural network, an SNN transmits information in the form of pulses and therefore has stronger biological interpretability, and its low power consumption promises further benefits for signal recognition.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the State Key Laboratory of CEMEE (Grant no. CEMEE2022K0102A).