Abstract

With the development of artificial intelligence technology, deep learning has been applied to automatic modulation classification (AMC) and achieved very good results. In this paper, we introduce an improved deep neural architecture for radio signal identification, an important facet of the spectrum-sensing capability required by software-defined radio. The proposed network is based on the Inception-ResNet architecture, with several kernel sizes and the number of module repetitions changed to adapt it to modulation classification. The modules are repeated more times to increase the depth of the network and its ability to learn features, and they combine the advantages of the Inception network and ResNet, giving a faster convergence rate and a larger receptive field. Experiments in this paper show that the proposed network has excellent performance for modulation classification: its classification accuracy is the highest of the six methods compared across the SNR range, peaking at 93.76% at an SNR of 14 dB, which is 6 percentage points higher than LSTM and 13 percentage points higher than MentorNet, Inception, and ResNet alone. In addition, the average accuracy of the proposed method from 0 to 18 dB is 3 percentage points higher than that of the GAN network. This work provides a new idea for the modulation classification of discrete-time signals.

1. Introduction

With the rapid development of communication technology, the wireless communication environment is becoming increasingly complex, and communication signals with various types of modulation are becoming more diverse [1]. Automatic modulation classification (AMC) plays an important role in modern wireless communication [2]: signal analysis and processing can be carried out only after the modulation of the signal has been recognized [3]. AMC finds applications in various commercial and military areas. For example, recognition of the modulation type is used in software-defined radio (SDR) to adapt to various communication systems in the shortest time without requiring control overhead [4, 5]. Under such conditions, advanced automatic modulation classification techniques are required. AMC is also essential for identifying the source of received wireless signals [6, 7].

At present, AMC methods can be divided into two categories: likelihood-based (LB) and feature-based (FB) [8]. An LB modulation classifier recognizes the modulation of a signal by comparing the likelihood function values of the received signal over a known pool of modulations [6]. It has been used for modulation classification in multiple-channel environments with high accuracy [9]. However, it requires some parameters to be known in advance, such as the carrier frequency, code rate, and channel parameters, and it becomes very complex when unknown parameters are introduced, which makes it difficult to design a practical signal-acquisition system. Some researchers have studied ways to simplify the likelihood function, but simplification leads to information loss and inaccurate results [10]. As the LB method is sensitive to parameter-estimation deviations and model mismatches, it is not applicable in many practical communication scenarios.

In the FB method, features of the received signal are extracted and the modulation is identified by either comparing the features with threshold values or feeding them to a pattern recognizer [11, 12]. Many traditional pattern recognition methods require features of the signal to be extracted manually, such as instantaneous statistics, high-order statistics, time-frequency characteristics, and asynchronous delay sampling characteristics [13]. These features are then used as the input of a classifier, such as a decision tree or support vector machine [14]. Although this approach is simple and requires little computation, it performs poorly on non-linear problems. Moreover, manually selected features may not reflect the characteristics of signals with different modulations, and improper feature selection reduces the recognition accuracy of the classifier.

In recent years, great progress has been made in artificial intelligence, and the computing power of a single chip has been greatly improved, which has allowed deep learning algorithms to be widely used in modulation classification [15, 16]. Deep learning solves the core problem of how to automatically select and extract features from samples, and it combines simple features into more efficient and complex ones to achieve classification [17]. In addition, deep neural networks have a multi-layer structure that extracts signal features well, avoiding the tedious manual selection of data features [18]. At present, CNNs, Google's Inception, and the residual network (ResNet) have been used in modulation classification with good results.

A CNN-based modulation classification method was proposed that can be applied directly to the sampling sequence of the intermediate-frequency signal [19]. Rajendran et al. showed that a recurrent neural network (RNN) can also be used for modulation classification of the sampled intermediate-frequency signal [20]. Hu et al. studied the effect of different noise on modulation classification using an RNN [21].

A generative adversarial network (GAN) was proposed for data augmentation with RGB three-channel constellations in the modulation classification field [22]. Wang et al. established a two-level convolutional neural network (CNN) architecture to distinguish 16QAM from 64QAM; in the second CNN, 16QAM and 64QAM constellation diagrams were used as input to obtain a higher recognition rate [23]. However, this method relies on the constellation diagram and can only identify the baseband signal; its accuracy is very low when modulated signals are sampled directly at radio frequency. A convolutional long short-term deep neural network (CLDNN) was proposed by combining CNN and long short-term memory (LSTM) architectures into one deep neural network, which takes advantage of CNNs, LSTMs, and conventional deep neural network architectures and enables the learning of long-term dependencies [24]. A ResNet architecture with added bypass connections was proposed to distinguish between different modulation types and showed good performance [17]. Peng et al. proposed two convolutional neural network- (CNN-) based DL models, AlexNet and GoogLeNet [25]. Cheong et al. extended previous work on automatic modulation classification (AMC) using deep neural networks (DNNs) and evaluated the performance of these architectures on signals [26]. Xie et al. proposed two algorithms, an M2M4-aided algorithm and a multi-label DL-based algorithm, to combat varying SNR [27]. Peng et al. converted the raw modulated signals into images with a grid-like topology and fed them to a CNN for training [28].

The dataset used in this paper was released at the Proceedings of the GNU Radio Conference in 2016 for applications of machine learning (ML) to the radio signal processing domain [29]. Li et al. used a GAN with data augmentation for modulation classification on this dataset, and the highest accuracy was no more than 90% [30]. LSTM, Inception, and residual networks have also been used for modulation classification on this dataset, with the highest accuracy among them around 88%. Du et al. proposed a new network structure called the fully dense neural network (FDNN) for automatic modulation classification, whose average accuracy from 0 to 18 dB is 89.6% [31]. Zeng et al. proposed a CNN architecture with spectrogram images as the signal representation and achieved good recognition accuracy in 2019, but its average accuracy from 0 to 18 dB was less than 90% [32]. Wu et al. proposed a convolutional neural network with multi-feature fusion for automatic modulation classification, with an overall classification accuracy of less than 93% [33]. In addition, Zhang et al. proposed an automatic digital modulation classification network based on curriculum learning, with an overall classification accuracy of less than 90% [34]. The highest accuracy of the method proposed in this paper is 93.76% at 14 dB, and its average accuracy from 0 to 18 dB is 93.04%, which is excellent performance.

This paper is organized as follows: Section 2 introduces the related work. Section 3 formulates the basic principle of ResNet. Section 4 gives the basic principle of the Inception network. Section 5 details the proposed method. Section 6 presents the results of simulations and experiments to support the theoretical analysis. Section 7 concludes the paper.

2. Related Work

2.1. Basic Principle of Signal Modulation

At present, digital modulation technology is widely used in wireless communication. Although modulation and demodulation are more difficult for digital than for analog modulation and the signal processing system is more complex, digital modulation has strong anti-interference capability and lends itself to modern digital signal processing techniques for analysis. Therefore, digitally modulated signals are widely used in practice, and their modulation classification is the subject of this paper.

A general expression for the received baseband complex envelope is

$$r(t) = s(t; \mathbf{u}) + n(t), \tag{1}$$

where $s(t; \mathbf{u})$ denotes the baseband complex envelope of the received signal with no noise and $n(t)$ is noise. $s(t; \mathbf{u})$ can be expressed as

$$s(t; \mathbf{u}) = a e^{j(2\pi f_0 t + \theta)} \sum_{k} e^{j\phi_k}\, s_k\, g\bigl(t - (k-1)T - \epsilon T\bigr), \tag{2}$$

where $a$ represents the amplitude of the signal, $f_0$ denotes the offset of the carrier frequency, $\theta$ is the time-invariant carrier phase, $\phi_k$ is the phase jitter, $s_k$ is the $k$-th transmitted symbol, $T$ is the symbol period, $\epsilon$ is the normalized epoch for the time offset between the transmitter and the signal receiver, $g(t) = p(t) * h(t)$ is the composite effect of the residual channel, with $h(t)$ denoting the channel impulse response and $*$ denoting mathematical convolution, and $p(t)$ is the transmit pulse shape. In equation (2), $\mathbf{u}$ is the multidimensional vector that includes the deterministic unknown signal or channel parameters for the modulation type. The goal of modulation classification is to recognize the modulation type from the received signal [30]. Amplitude shift keying (ASK), frequency shift keying (FSK), phase shift keying (PSK), and quadrature phase shift keying (QPSK) are commonly used modulation types.
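To make the signal model concrete, the following NumPy sketch generates 128 samples of a received signal according to equations (1) and (2) under simplifying assumptions: QPSK symbols, a rectangular pulse shape $g(t)$, and no residual channel. All parameter values here are illustrative and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 200e3          # sample rate (Hz); illustrative
T = 8 / fs          # symbol period: 8 samples per symbol
K = 16              # number of symbols -> 128 samples total
a, f0, theta = 1.0, 250.0, 0.1   # amplitude, carrier offset, carrier phase
eps = 0.0           # normalized timing offset

# QPSK symbols s_k drawn from the constellation {e^{j(pi/4 + m*pi/2)}}
symbols = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, K)))
t = np.arange(int(K * T * fs)) / fs

# Sum over symbols per equation (2), with a rectangular pulse g(t)
s = np.zeros_like(t, dtype=complex)
for k, sk in enumerate(symbols):
    g = ((t >= k * T + eps * T) & (t < (k + 1) * T + eps * T)).astype(float)
    s += sk * g
s *= a * np.exp(1j * (2 * np.pi * f0 * t + theta))

# Add complex Gaussian noise n(t) at a chosen SNR: r(t) = s(t; u) + n(t)
snr_db = 10
noise_power = np.mean(np.abs(s) ** 2) / 10 ** (snr_db / 10)
n = np.sqrt(noise_power / 2) * (rng.standard_normal(t.size)
                                + 1j * rng.standard_normal(t.size))
r = s + n
```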

2.2. Deep Neural Network for Classification Modulation

Convolutional layers are a common element in all state-of-the-art deep neural networks. A convolutional layer usually consists of convolutional filters. The size of the convolution kernels is typically very small, such as 1 × 1 through 5 × 5. The transfer function for a standard convolutional layer [6] is

$$y_i = \sigma\bigl(b_i + W_i * x\bigr), \tag{3}$$

where $y_i$ is the output feature map for the $i$-th filter, $b_i$ and $W_i$ represent the learned bias and filter weight parameters, $x$ represents the input activations, $*$ denotes the convolution operation, and $\sigma$ denotes a (typically non-linear) activation function [35].
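As a minimal illustration of equation (3), the sketch below builds a single convolutional layer in Keras; the filter count, kernel size, and input shape are illustrative choices, not parameters specified by the paper.

```python
import tensorflow as tf

# Equation (3) as a Keras layer: y_i = sigma(b_i + W_i * x).
# Conv2D learns the filter weights W_i and biases b_i; ReLU is the
# non-linear activation sigma. The 2 x 128 shape matches the I/Q
# representation used later in the paper.
conv = tf.keras.layers.Conv2D(
    filters=64,            # number of filters, i.e. output feature maps y_i
    kernel_size=(1, 3),    # small kernel, as is typical (1x1 through 5x5)
    padding="same",
    activation="relu",
)
x = tf.random.normal((1, 2, 128, 1))   # (batch, height, width, channels)
y = conv(x)                            # shape (1, 2, 128, 64)
```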

A visible trend in neural networks for classification tasks is building deeper networks to learn more complex functions and hierarchical feature relationships. Deep networks enable more complex functions to be learned more readily from raw data.

Typically, applying deep neural networks to modulation classification is a matter of
(i) designing a network architecture,
(ii) training the network to select the weights that minimize the loss, and
(iii) validating and testing the network in practice.

To this end, we use various machine learning classifiers based on deep neural network architectures, where a training dataset is used to train the network, and then the classification accuracy is computed over the classification output for a testing dataset.

Figure 1 illustrates a diagram of deep learning for modulation classification. Generally speaking, the classification method mainly includes three parts: signal preprocessing, feature extraction, and evaluation. The deep neural network can automatically combine simple basic features into more complex features gradually, achieve effective feature expression of data samples, and maintain high recognition accuracy. In this paper, a supervised recognition method is used. Firstly, a large number of labeled samples are used to train the deep neural network, and then the trained model is used to recognize the unknown samples.

3. Residual Network

Although neural networks with more layers have better learning ability, degradation sometimes occurs, which leads to low accuracy. An effective approach, which won ImageNet 2015, is the residual network. A residual network adds one layer's output to the output of the layer two layers deeper, as shown in Figure 2. Vanishing gradients are largely resolved by the normalization techniques that have been widely adopted; network depth is instead limited by the training complexity of deep networks, which can be reduced with residual functions [36].

ResNet improves on the original network structure by adding an identity-mapping connection. Instead of learning the desired underlying mapping $H(x)$ directly, the stacked layers learn the residual function $F(x) = H(x) - x$, and the input $x$ is added to the learned function so that the block outputs $H(x) = F(x) + x$. This method adds no extra parameters or computation, while the training speed and recognition accuracy of the model are significantly improved.
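A minimal sketch of such a residual block in Keras follows; it assumes the input already has `filters` channels so that the identity shortcut and the learned function F(x) can be added directly. The kernel sizes and filter counts are illustrative.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Minimal residual block: the identity input x is added to the
    learned function F(x), so the block outputs H(x) = F(x) + x.
    Assumes x already has `filters` channels so shapes match."""
    shortcut = x
    y = layers.Conv2D(filters, (1, 3), padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, (1, 3), padding="same")(y)   # F(x)
    y = layers.Add()([y, shortcut])                         # F(x) + x
    return layers.Activation("relu")(y)
```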

4. Inception Network

The Inception architecture is one successful approach to increasing the depth and learning ability of a network. The network consists of repeated Inception modules. As shown in Figure 3, each Inception module contains four parallel paths, and its output is the concatenation of the four parallel outputs. The first path is a bank of 1 × 1 convolutions that forwards selected information; these 1 × 1 convolutions are a form of selective highway network that simply passes information forward without transformation. The second and third paths are 1 × 1 convolutions followed by banks of 3 × 3 and 5 × 5 convolutions, respectively, which give the network a larger receptive field. The last parallel path is a 3 × 3 pooling layer followed by 1 × 1 convolutions. Intermediate Inception modules in the network are connected to Softmax classifiers that contribute to the network's global loss during training; these classifiers are believed to help increase the model's ability to learn features [37].
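The following Keras sketch illustrates the four parallel paths of an Inception module as described above; the filter counts are illustrative, and the auxiliary Softmax classifiers are omitted.

```python
from tensorflow.keras import layers

def inception_module(x, f1, f3, f5, fp):
    """Four parallel paths whose outputs are concatenated, as in Figure 3.
    Filter counts f1/f3/f5/fp are illustrative, not taken from the paper."""
    p1 = layers.Conv2D(f1, (1, 1), padding="same", activation="relu")(x)
    p2 = layers.Conv2D(f3, (1, 1), padding="same", activation="relu")(x)
    p2 = layers.Conv2D(f3, (3, 3), padding="same", activation="relu")(p2)
    p3 = layers.Conv2D(f5, (1, 1), padding="same", activation="relu")(x)
    p3 = layers.Conv2D(f5, (5, 5), padding="same", activation="relu")(p3)
    p4 = layers.MaxPooling2D((3, 3), strides=(1, 1), padding="same")(x)
    p4 = layers.Conv2D(fp, (1, 1), padding="same", activation="relu")(p4)
    return layers.Concatenate()([p1, p2, p3, p4])
```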

5. Proposed Method

5.1. The Structure of the Proposed Network

The architecture of the proposed network is based on the Inception-ResNet network, with several kernel sizes and the number of module repetitions changed to adapt it to modulation classification for software-defined radio [38]. The proposed network consists of a stem module, three kinds of Inception-ResNet modules, and reduction modules. It combines the advantages of the Inception and ResNet modules, which gives a faster convergence rate. The network is shown to have excellent performance for modulation classification in the experiments in this paper.

For the residual versions of the Inception networks, Inception blocks cheaper than the originals are used. Each Inception block is followed by a filter-expansion layer (a 1 × 1 convolution without activation) that scales up the dimensionality of the filter bank before the addition, in order to match the depth of the input. This compensates for the dimensionality reduction induced by the Inception block.

Figure 4(a) shows the overall schema of the proposed network; the dimension of the input data is 1 × 2 × 128, as set in the dataset of modulated signals. There are three kinds of Inception-ResNet modules in Figure 4(a), whose structures are shown in Figures 5(a)–5(c), respectively. Module A is repeated 10 times, module B 20 times, and module C 10 times. To make the residual operation feasible, a 1 × 1 convolution is added in each Inception-ResNet module to match the depth of the network. The schema of the stem of the proposed network is shown in Figure 4(b); it contains 6 convolutional layers and 1 max pooling layer.
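The sketch below illustrates the general pattern of one Inception-ResNet module with the filter-expansion layer described above. It is a simplified illustration with assumed branch structure and filter counts, not a reproduction of the exact modules in Figures 5(a)–5(c).

```python
from tensorflow.keras import layers

def inception_resnet_block(x):
    """Sketch of one Inception-ResNet module: a cheap Inception block,
    a 1x1 filter-expansion convolution (no activation) to match the
    input depth, then the residual addition. Branch filter counts and
    kernel sizes are illustrative."""
    depth = x.shape[-1]
    b1 = layers.Conv2D(32, (1, 1), padding="same", activation="relu")(x)
    b2 = layers.Conv2D(32, (1, 1), padding="same", activation="relu")(x)
    b2 = layers.Conv2D(32, (1, 3), padding="same", activation="relu")(b2)
    mixed = layers.Concatenate()([b1, b2])
    # Filter expansion: 1x1 convolution without activation to restore depth
    up = layers.Conv2D(depth, (1, 1), padding="same", activation=None)(mixed)
    return layers.Activation("relu")(layers.Add()([x, up]))
```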

5.2. Loss Function

For a multi-class classification task such as modulation recognition, the objective function is usually the categorical cross-entropy. Categorical cross-entropy (equation (4)) is a measure of the difference between two probability distributions. For deep learning classification tasks, the predicted distribution is usually a Softmax (equation (5)) of the output of the classifier network, which is then converted to a one-hot encoding for classification. The error is calculated in the forward pass, and the weights are adjusted using the chain rule to find each parameter's contribution to the error in the backward pass. This kind of output layer, optimization, and loss function has been used very successfully for multi-class vision tasks such as object recognition on the ImageNet dataset [39].

The categorical cross-entropy is defined as

$$H(p, q) = -\sum_{i} p_i \log q_i. \tag{4}$$

In equation (4), $p$ and $q$ are two probability distributions and $H(p, q)$ measures the difference between them.

The Softmax function is defined as

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K. \tag{5}$$

In equation (5), the standard exponential function is applied to each element of the input vector $z$, and these values are normalized by dividing by the sum of all the exponentials; this normalization ensures that the components of the output vector sum to 1.
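Both functions are straightforward to implement; the following NumPy sketch computes equations (4) and (5) for a single example.

```python
import numpy as np

def softmax(z):
    """Equation (5): exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

def categorical_cross_entropy(p, q, eps=1e-12):
    """Equation (4): H(p, q) = -sum_i p_i log q_i."""
    return -np.sum(p * np.log(q + eps))

logits = np.array([2.0, 1.0, 0.1])      # raw network outputs (illustrative)
q = softmax(logits)                     # predicted distribution
p = np.array([1.0, 0.0, 0.0])           # one-hot true label
loss = categorical_cross_entropy(p, q)  # ~0.417
```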

6. Experiment and Discussion

6.1. Dataset

We use the RadioML2016.10b dataset as the basis for evaluating the modulation recognition task. The dataset was first released at the 6th Annual GNU Radio Conference in 2016. It allows machine learning researchers with new ideas to dive directly into an important technical area without collecting or generating new data, and it allows direct comparison with the efficacy of prior work. The dataset is generated with GNU Radio and consists of digital and analog modulations at varying signal-to-noise ratios. Details about its generation can be found in [30]. Figure 6 shows a high-level framework of the data generation.

The time segments were sampled randomly from the output stream of each simulation and stored in an output vector commonly used with Keras, Theano, and TensorFlow. The dataset uses a 4D real float32 vector of the form N_examples × N_channels × Dim1 × Dim2, a representation commonly used for RGBA values in imagery, where each example consists of 128 complex floating-point time samples: N_channels = 1, Dim1 = 2 (holding the I and Q channels), and Dim2 = 128.
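A common way to load this dataset in Python is sketched below; the file name `RML2016.10b.dat` and the latin1 pickle encoding follow the conventions of the public release and are assumptions, not details given in this paper.

```python
import pickle
import numpy as np

# The public release is a pickled dict keyed by (modulation, snr) tuples,
# with each value an array of shape (N, 2, 128) holding I/Q samples.
with open("RML2016.10b.dat", "rb") as f:
    data = pickle.load(f, encoding="latin1")

mods = sorted({mod for mod, snr in data.keys()})
snrs = sorted({snr for mod, snr in data.keys()})

X, labels = [], []
for (mod, snr), samples in data.items():
    X.append(samples)
    labels.extend([(mod, snr)] * samples.shape[0])
X = np.vstack(X)             # (N_examples, 2, 128)
X = X[..., np.newaxis]       # (N_examples, 2, 128, 1), channels-last for Keras
```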

6.2. Training Process

The dataset is divided into two parts: 960000 samples are used for training the deep neural network and 240000 samples for validation. All models are built and trained with Keras on a TensorFlow backend using a TITAN RTX 24G GPU. The Adam optimizer was used for all architectures, and the loss function was the categorical cross-entropy. We used ReLU activation functions for all layers except the last dense layer, where we used a Softmax activation. We used a mini-batch size of 128 and learning rates of 0.01 and 0.001. Figure 7 shows the flow diagram of our experiment.
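The training setup described above can be expressed in Keras as follows; `model`, `X_train`, `y_train`, `X_val`, and `y_val` are assumed to come from the earlier steps, and the epoch count is illustrative since training is actually terminated by early stopping (Section 6.3).

```python
import tensorflow as tf

# Adam optimizer and categorical cross-entropy loss, as described above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    X_train, y_train,           # one-hot labels for the 10 modulation types
    batch_size=128,
    epochs=100,                 # illustrative upper bound; see early stopping
    validation_data=(X_val, y_val),
)
```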

6.3. Trick during the Process

In order to get better performance and avoid overfitting, we introduce several tricks in training; a minimal sketch of them follows this list.

(a) Batch normalization. Normalize the input data to between −1 and 1:

$$\hat{x} = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1, \tag{6}$$

where $x$ is the input data of the network and $\hat{x}$ is the normalized data. This trick can accelerate training and prevent overfitting.

(b) Early stopping. A compromise is to train on the training dataset but stop at the point when performance on a validation dataset starts to degrade. This simple, effective, and widely used approach to training neural networks is called early stopping. Stopping the training of a neural network before it has overfit the training dataset can reduce overfitting and improve generalization. The challenge is to train the network long enough to learn the mapping, but not so long that it overfits the training data. Model performance on a holdout validation dataset can be monitored during training, and training can be stopped when the generalization error starts to increase. Using early stopping requires selecting a performance measure to monitor, a trigger to stop training, and the model weights to use. In this paper, we stop the training process when the validation loss does not decrease within 10 epochs. As shown in Figure 8, the validation loss is 0.8820 at the 8th epoch and remains above 0.8820 for the following 10 epochs; we therefore adopt the parameters at the 8th epoch to evaluate the performance of the model.

(c) Dropout. If the probability of discarding a neuron in a layer is $p$, then the retention probability is $1 - p$. This trick can prevent the network from overfitting to a certain extent. In the dropout modules in this paper, the retention probability is 0.8.
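The sketch below shows one way to express these tricks in Keras; note that Keras parameterizes dropout by the discard probability, so a retention probability of 0.8 corresponds to `rate=0.2`. The normalization formula follows equation (6) above, which is our min-max reading of "between −1 and 1".

```python
import tensorflow as tf

# (a) Input normalization per equation (6): scale samples into [-1, 1].
def normalize(x):
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1

# (b) Early stopping: stop when the validation loss has not decreased
# for 10 epochs, keeping the best weights seen so far. Pass this to
# model.fit(..., callbacks=[early_stop]).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)

# (c) Dropout with retention probability 0.8 (discard probability 0.2).
dropout = tf.keras.layers.Dropout(rate=0.2)
```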

6.4. Results and Discussion
6.4.1. Baseline

In this paper, the performance of the proposed method is compared with five methods: CNN, ResNet, Inception, LSTM, and one of the state-of-the-art methods, MentorNet [34]. The optimizer adopted in this paper is Adam with its default values (beta1 = 0.9, beta2 = 0.999, epsilon = 1e−8, decay = 0). For each method, the performance under different parameters is compared in order to obtain the highest classification accuracy; here, the influence of different learning rates and batch sizes is considered. The initial learning rate (lr) is set to 0.001 or 0.01, and the batch size to 128, 512, or 1024. We observed that learning rates and batch sizes that are too high or too low can yield homogeneous predictions and low accuracy. Figure 9 illustrates the overall classification accuracy of the six methods from −20 dB to 18 dB with different parameters.

As shown in Figure 9, the overall classification accuracy of the proposed method and the five other methods from −20 dB to 18 dB is compared under different parameters. In this experiment, the highest classification accuracy in Figure 9 for each method is adopted as its baseline. The baseline results are obtained from the CNN, pure ResNet, pure Inception, LSTM, and MentorNet networks on the RML2016b dataset. The accuracy of the baselines and the proposed method on the validation dataset is shown in Table 1.

6.4.2. Overall Classification Accuracy

Figure 10 shows the overall classification accuracy of CNN, LSTM, Inception, ResNet, MentorNet, and the proposed method on the validation dataset from −20 dB to 18 dB. The classification accuracy of all six methods increases gradually with SNR and then remains stable. The classification accuracy of the proposed method is the highest across the SNR range. It reaches 90% at 0 dB, while the accuracy of the other methods is below 90%, and it peaks at 93.76% at an SNR of 14 dB, which is 6 percentage points higher than LSTM and 13 percentage points higher than MentorNet, Inception, and ResNet alone. This experiment verifies the performance of the proposed method.

6.4.3. Classification Accuracy of Each Kind of Modulated Signal

Figure 11 shows the normalized confusion matrices of each kind of modulated signal, including 8PSK, AM-DSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM, for the six methods at an SNR of 18 dB. In Figure 11, the rows represent the true labels of the modulated signals and the columns represent the labels predicted by the deep neural networks. All data are normalized, so the entries on the diagonal are the classification accuracies. For example, in Figure 11(a), there are two non-zero values in the top row, 0.95 and 0.05, which indicates that the proposed method classifies 8PSK with 95% accuracy and recognizes 5% of 8PSK signals as QPSK. From the matrices, the classification accuracy of 8PSK, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM obtained with the proposed method is the highest among the six methods at 18 dB.
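The normalized matrices of Figure 11 can be reproduced from predictions with a few lines of NumPy; the sketch below assumes integer class indices for the true and predicted labels.

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs, then normalize each row so that
    the diagonal entry is the per-class accuracy, as in Figure 11."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)
```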

From Figure 11, we obtain the classification accuracy of each modulated signal for the six methods at an SNR of 18 dB. In the same way, we obtain the classification accuracy of each modulated signal from −20 dB to 18 dB for the proposed method, LSTM, CNN, Inception, ResNet, and MentorNet, which is shown in Figure 12.

Figure 12 shows the classification accuracy of each modulated signal from −20 dB to 18 dB for the proposed method, LSTM, CNN, Inception, ResNet, and MentorNet. The classification accuracy of each modulation type shows a general upward trend with increasing SNR and then fluctuates within a narrow range. With the proposed method, the classification accuracy of 8PSK, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, and QPSK exceeds 95%, the largest number of modulation types above 95% among the six methods. The classification accuracy of GFSK and QAM64 is close to 100% with the proposed method, while that of the other methods is less than 80%. In addition, the classification accuracy of WBFM is close to 100% with the proposed method, while that of the other methods is less than 40%. We also see that the classification accuracy of AM-DSB with the proposed method first increases and then declines from −20 dB to 18 dB, which shows that the proposed method is less well suited to recognizing the AM-DSB modulation type. Overall, the proposed method has the best performance in classifying the modulated signals of the RML2016b dataset among the six methods adopted in this paper.

6.4.4. Computation Complexity

Table 2 shows the computational complexity of the different methods. Total parameters indicates the number of parameters in each model; training and inference time refer to the time cost of the training and validation processes; and FLOPs (floating point operations) also represent the complexity of a deep neural network. From the table, the number of total parameters in the proposed network is the highest among the six methods, and its training time, inference time, and FLOPs are also higher than those of the other methods. Consequently, it takes more time to train the proposed model.
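The quantities in Table 2 can be measured in several ways; the sketch below shows one plausible approach (not necessarily the one used for the paper's measurements) for the parameter count and inference time, using `model` and `X_val` from the earlier steps.

```python
import time

# Total trainable and non-trainable parameters, as reported in Table 2.
total_params = model.count_params()

# Inference time: wall-clock time to predict over the validation set.
start = time.perf_counter()
model.predict(X_val, batch_size=128, verbose=0)
inference_time = time.perf_counter() - start
```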

7. Conclusion

In this paper, we introduced an improved deep neural architecture for radio signal identification, an important facet of the spectrum-sensing capability required by software-defined radio. The goal was to achieve feature extraction by learning from the original sampled signals on the training dataset and to evaluate the performance on the validation dataset. We compared the classification accuracy of the proposed method with that of LSTM, CNN, Inception, ResNet, and one of the state-of-the-art methods, MentorNet. The experiments show that the classification accuracy of the proposed method is the highest of the six methods across the SNR range, peaking at 93.76% at an SNR of 14 dB, which is 6 percentage points higher than LSTM and 13 percentage points higher than MentorNet, Inception, and ResNet alone. The proposed method thus has excellent performance for modulation classification. We believe that the significance of this paper extends beyond the modulation recognition task itself and that it offers a new idea for modulation classification.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

S. S. contributed to data curation; P.W. and X.W. contributed to methodology; J.Z. contributed to resources; B.S. contributed to software; J.W. contributed to supervision.

Acknowledgments

This work was supported by Youth Science Foundation of Hunan Natural Science Foundation under Grant No. 2019JJ50121.