Abstract

Deep learning is a new direction of research for specific emitter identification (SEI). Radio frequency (RF) fingerprints of the emitter signal are subtle and sensitive to noise, and it is difficult to assign labels containing category information in noncooperative communication scenarios. As a result, network models obtained by conventional supervised learning methods yield poor identification performance. To address this limitation, this paper proposes a semisupervised SEI algorithm based on bispectrum analysis and virtual adversarial training (VAT). Bispectrum analysis is performed on RF signals to enhance individual discriminability, and a convolutional neural network (CNN) is used for RF fingerprint extraction. We first use a small amount of labelled data to train the CNN in an adversarial manner, which improves the antinoise performance of the network in a supervised setting. Virtual adversarial samples are then calculated for VAT, which makes full use of the labelled data and a large amount of unlabelled training data to further improve the generalization capability of the network. Numerical experiments on a set of six universal software radio peripheral (USRP; model B210) devices demonstrate the stable and fast convergence of the proposed method, which achieves approximately 90% classification accuracy at 10 dB. Finally, the classification performance of our method is verified using other evaluation metrics, including receiver operating characteristic and precision-recall.

1. Introduction

Specific emitter identification (SEI) refers to the technology used to identify individual emitters using distinctive external features of the signal, called radio frequency (RF) fingerprints [1]. These fingerprints arise from differences between the hardware characteristics of individual emitters and are hard to reproduce or eliminate. Moreover, the RF fingerprints of an emitter are independent of the content of the signal and may be consistent across different parts of the signal from the same emitter. Even for emitters produced by the same manufacturer in the same batch, the RF fingerprints still differ from each other. Therefore, RF fingerprints can be used as a unique identification measure, which enables SEI technology to be employed in military and civilian applications [2, 3].

In recent years, researchers have proposed various approaches for RF fingerprint extraction, which is a method to obtain the hardware characteristics of the emitter. For example, Padilla et al. [4] proposed methods for extracting parameters (transient waveform, instantaneous phase, and amplitude of the signal) of the RF signals, which are used as RF fingerprints, by means of the preamble in the communication. The method can identify 28 different Wi-Fi devices with an accuracy higher than 95%. However, because this method can only be applied to communication signals with preamble, its application scope is limited. Lopez-Risueno et al. [5] proposed using short-time Fourier transform to obtain the time-frequency energy spectrum of the signal, where the fingerprint features of the emitter signal are extracted based on the differences in the time-frequency energy spectrum. However, this method is based on a linear transformation, which is not suitable for nonlinear radiation source signals. Zhou et al. [6] proposed a feature extraction method based on the bispectrum-radon transform, which used bispectrum analysis to characterize the RF fingerprints and completed feature compression through radon transform. The method identified 6 ADS-B emitters with an accuracy of 90.25%. The high-order spectrum analysis method, however, is only able to extract some of the features of the emitter signal while losing some of the important subtle features, which results in a lower identification performance. Yuan et al. [7] extracted 13 types of feature parameters of emitter transient characteristics through empirical mode decomposition (EMD) and Hilbert transform to form RF fingerprints and effectively identified 8 mobile phones. Moreover, the method can theoretically be used with any type of emitters as it does not require prior information for the RF signal. 
However, the method only applies to the transient signal of an emitter, which is challenging to capture in practical applications. Satija et al. [8] proposed an SEI approach based on variational mode decomposition and spectral features (VMD-SF). The advantage of the method lies in its adaptability to both single-hop and relaying scenarios under both additive white Gaussian noise (AWGN) and flat-fading channels. Further, the method has a low computational cost and satisfactory real-time performance. However, the performance of this method has only been verified using simulated signal data, so its practicability requires further research.

Although the conventional feature extraction scheme can reflect and amplify the individual differences of emitters, it needs to blindly try all the previous manually predefined RF fingerprint features to find an effective feature extraction method for a specific task. However, the complexity of the emitter signal renders it impossible to represent the signals using a unified mathematical model. Therefore, for the target signal of interest, the choice of the feature extraction method can only depend on the subjective judgment and cognitive level of the researchers, which cannot fully reflect the differences between individual emitters.

With advancements in artificial intelligence, techniques such as machine learning (ML), deep learning (DL), and reinforcement learning (RL) have been widely used in many fields [9]. In addition to applications in traditional computer vision (CV) and natural language processing (NLP), deep learning technologies have achieved great success in emerging fields, such as the Internet of Things (IoT) [9], physical layer communication [10, 11], and edge intelligence [12–14]. Furthermore, deep learning is now used in SEI, a new research direction in which neural networks extract the fingerprint features of the emitter signal comprehensively and deeply to improve recognition performance. Wong et al. [15] used a convolutional neural network (CNN) to estimate the gain and phase deviations of the in-phase and quadrature components of an emitter signal, achieving SEI based on the estimated gain and phase deviations. Their method does not require preprocessing, such as signal synchronization and carrier frequency tracking, and can be applied to signals of multiple modulation types. He et al. [16] used a long short-term memory (LSTM) network to learn RF fingerprints from preprocessed RF signals. Compared with a CNN, an LSTM is more suitable for processing time series such as one-dimensional signals, which implies that it can achieve better identification performance. However, LSTM also introduces significant computational costs and training difficulties, resulting in poor real-time performance. Building on network models such as CNN and LSTM, new deep learning-based network models and algorithms for SEI have been proposed recently. Qian et al. [17] proposed a multilevel sparse representation-based identification approach for SEI, which combines a CNN for RF fingerprint extraction with principal component analysis for sparse representation. 
The method can classify nine transmitters with a classification accuracy of over 90% using a small number of training samples. A complex-valued neural network (CVNN) proposed in [18] was used to process complex baseband signals to perform SEI. In addition, a network compression algorithm was proposed to reduce the model size and decrease the training complexity. The CVNN-based SEI method can achieve nearly 100% recognition accuracy at high SNRs. Furthermore, the network size can be compressed by nearly 70%–90%, and the training complexity decreases at different SNR levels. A fixed neural network structure typically suffers from poor flexibility when processing complex RF signals in different scenarios. Hence, a balanced network architecture search (NAS) mechanism proposed in [19] was applied to conduct SEI. The framework uses a recurrent neural network (RNN) as the controller and cooperates with a balance function to search for the optimal network structure, thereby providing a suitable scheme for processing RF signals in the SEI task at hand.

SEI based on deep learning has been extensively studied; however, most studies focus on supervised learning models, which assume that all training samples have label information. In most cases, SEI is used in noncooperative scenarios. A typical application is radio surveillance, which is the application scenario for the SEI system considered in this study. SEI can be used to distinguish between legal and illegal radio stations, identify different types of illegal radio stations (which can then be targeted to eliminate their interference), and effectively control the utilization of spectrum resources. It is difficult to assign label information representing the signal category to signals from illegal radio stations. Thus, only a few signal samples contain label information. Using a large number of unlabelled signal samples combined with a few labelled signal samples to conduct deep learning-based semisupervised SEI is therefore a worthwhile research area. To address this problem, we propose a semisupervised SEI scheme based on bispectrum analysis and virtual adversarial training (VAT). Bispectrum analysis, a signal preprocessing method, is used to obtain the representation of RF fingerprints. On this basis, we use a CNN to extract the RF fingerprints, and VAT is performed to fully utilize the labelled samples and the large amount of unlabelled signal samples to conduct semisupervised SEI. Unlike existing supervised learning-based SEI methods, our research focuses on semisupervised learning-based SEI. On the one hand, our method improves the antinoise performance of the SEI system by changing the traditional network training mode. On the other hand, and more importantly, our method makes full use of the many unlabelled signal samples to carry out semisupervised training through VAT, and is thus applicable to SEI in noncooperative scenarios.

The main contributions of this work are summarized as follows:
(1) To enhance the individual discriminability of different emitters, the bispectrum distribution is used as a characteristic representation of the RF signal, which lays the foundation for RF fingerprint extraction based on the CNN.
(2) Considering that RF fingerprints are susceptible to noise, we perform adversarial training [20, 21] on the CNN to improve the antinoise performance of the network. In addition, we propose an algorithm for calculating the minimum adversarial samples, i.e., the smallest perturbations that cause misidentification, for use during adversarial training to maximize the antinoise performance of the network.
(3) After improving the antinoise performance of the network through adversarial training, we further propose a VAT method to train the CNN using labelled samples and large amounts of unlabelled signal samples collected in noncooperative scenarios, which can greatly enhance the generalization capacity of the network and improve the identification performance of the system.
(4) Various experimental results are presented. We first evaluate the convergence of the proposed method. Then, we measure the classification accuracy of the proposed framework with respect to several factors, including SNR, ratio of labelled to unlabelled samples, communication propagation channel, and number of emitters. The results show that our method performs well for SEI in noncooperative scenarios.

The remainder of this paper is organized as follows. In Materials and Methods, we introduce the signal preprocessing method based on bispectrum analysis and the basic concepts of CNNs, adversarial training, and VAT. Our proposed method of VAT-based semisupervised SEI is also introduced in this section, together with the experiments conducted on a real-world RF dataset generated through a software-defined radio (SDR) platform. Results and Discussion present and discuss the experimental results. Finally, we conclude the paper.

2. Materials and Methods

2.1. Bispectrum Distribution of RF Signal

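Bispectrum analysis is a special case of higher-order spectral analysis, which has demonstrated its superiority in processing non-Gaussian and nonstationary signals [22, 23]. The bispectrum distribution of a signal $x(t)$ can be obtained by calculating the two-dimensional Fourier transform of the third-order cumulant of the signal:

$$B(\omega_1, \omega_2) = \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} c_{3x}(\tau_1, \tau_2)\, e^{-j(\omega_1 \tau_1 + \omega_2 \tau_2)},$$

where $\omega_1$ and $\omega_2$ represent the two-dimensional frequency and the third-order cumulant $c_{3x}(\tau_1, \tau_2)$ of the zero-mean signal can be expressed as

$$c_{3x}(\tau_1, \tau_2) = E\{x^{*}(t)\, x(t+\tau_1)\, x(t+\tau_2)\},$$

where $E\{\cdot\}$ denotes expectation and $^{*}$ denotes complex conjugation.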

Figure 1 shows the bispectrum distributions of two different RF signals. The visible differences between their bispectrum distribution features demonstrate that bispectrum analysis is an effective signal preprocessing method for enhancing individual discriminability.
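As an illustration of how a bispectrum distribution can be estimated from sampled data, the following minimal sketch uses the direct (Fourier-based) estimator, which averages the triple product $X(f_1) X(f_2) X^{*}(f_1 + f_2)$ over segments instead of transforming an estimated cumulant. The function names, the segment length, and the toy three-tone signal are assumptions made for this example and are not taken from the paper's implementation.

```python
import cmath
import math


def dft(segment):
    """Plain O(M^2) discrete Fourier transform of one segment."""
    m = len(segment)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * n / m) for n, v in enumerate(segment))
        for k in range(m)
    ]


def bispectrum(signal, seg_len):
    """Direct bispectrum estimate averaged over non-overlapping segments.

    Returns a seg_len x seg_len matrix B with
    B[f1][f2] = mean over segments of X(f1) * X(f2) * conj(X(f1 + f2)).
    """
    n_seg = len(signal) // seg_len
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    acc = [[0j] * seg_len for _ in range(seg_len)]
    for s in range(n_seg):
        seg = signal[s * seg_len:(s + 1) * seg_len]
        mean = sum(seg) / seg_len
        spec = dft([v - mean for v in seg])  # remove the segment mean first
        for f1 in range(seg_len):
            for f2 in range(seg_len):
                f3 = (f1 + f2) % seg_len  # frequency bins wrap around
                acc[f1][f2] += spec[f1] * spec[f2] * spec[f3].conjugate()
    return [[v / n_seg for v in row] for row in acc]


# Toy signal (hypothetical): tones at bins 2, 3 and 5 = 2 + 3 with coupled phases.
signal = [
    sum(math.cos(2 * math.pi * k * n / 16) for k in (2, 3, 5))
    for n in range(32)
]
B = bispectrum(signal, seg_len=16)
```

In this toy case the estimate is large at the bin pair (2, 3), where the three tones are phase coupled, and vanishes at pairs such as (2, 4) that involve a bin with no tone. This is the kind of structure that makes the bispectrum a discriminative representation.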

2.2. Convolutional Neural Network

It is difficult to fully extract RF fingerprints using the conventional methods mentioned earlier. In this study, we employed neural networks, which have shown excellent performance in processing large quantities of data and extracting deep features [24, 25]. We used a CNN for RF fingerprint feature extraction and signal classification. A CNN is mainly composed of three structures: a convolution layer, an activation function, and a pooling layer. The convolution layer mainly extracts the features of the signals. In the convolution process, multiple convolution kernels are convolved with the feature maps of the previous layer to realize feature extraction and mapping. Each convolution layer includes many convolution kernels, thus realizing multifeature extraction of signals. In a CNN, each feature map is computed with the same weight parameters at every position, which is called weight sharing. Weight sharing reduces the number of parameters in the model and makes the extracted features robust to shifts in the position of the input. The activation function introduces nonlinearity into the convolutional layer so that the network does not reduce to a linear combination of the input vectors, which helps in extracting complex and deep-level features [26]. The pooling layer is used for subsampling; its main goal is to reduce the resolution of the feature maps and thus facilitate the extraction of deep-level features.

When using a CNN to achieve signal classification, the feature space output by the CNN is used as the input of the fully connected neural network (FCN). The features extracted from the convolution layer are vectorized through the fully connected layer, thus mapping the distributed feature representation to the label space of samples. Finally, the output probability is mapped through the SoftMax function, and the maximum output probability corresponds to the recognized signal category.
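The final decision step described above can be sketched in a few lines. The scores below are hypothetical outputs of the last fully connected layer for six emitters; only the SoftMax mapping and the choice of the most probable class follow the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (predicted class index, class probabilities)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical scores produced by the last dense layer for six emitters.
label, probs = classify([1.2, 0.3, 4.1, -0.5, 2.0, 0.0])
```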

2.3. Adversarial Training

Adversarial training is an important method to enhance the robustness of neural networks. Training samples mixed with minor perturbations, which are subtle but can cause misidentification, are fed to the neural network so that it adapts to such changes and becomes robust to interference. This method is widely used to defend against adversarial samples [27] and is one of the most effective means of adversarial defence. Conversely, adversarial samples can be used for adversarial attacks [28]: the attacker constructs an adversarial input vector based on the adversarial sample so that the machine learning model misjudges it. Adversarial training can also be extended to VAT for semisupervised learning based on unlabelled data [29].

The RF fingerprints of the emitter signal, which are caused by subtle hardware differences between emitters, are generally not evident and can easily be corrupted by noise and misidentified. To solve this problem, we used adversarial training to train the CNN by adding adversarial samples to the training data, which improves the network’s recognition of adversarial samples and its robustness. The key to adversarial training is the generation of adversarial samples. For a trained CNN, a subtle perturbation $r$ is added to the input vector $x$. When the network loss reaches its maximum value, the corresponding perturbation, denoted $r_{adv}$, yields the adversarial sample $x + r_{adv}$ [27]. This subtle perturbation most likely results in the misjudgment of the original neural network. Therefore, the CNN needs to be trained with the input vector $x + r_{adv}$ and the real label $y$ to improve its recognition performance for adversarial samples, which enhances network robustness [28].

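Let $x_l$, $y_l$, and $x_{ul}$ represent the labelled training samples, their corresponding labels, and the unlabelled training samples, respectively. We also assume that $x_l$ and $x_{ul}$ are uniformly represented by $x_*$. The loss function of the adversarial training is defined as [28]

$$L_{adv}(x_l, \theta) = D[q(y \mid x_l),\, p(y \mid x_l + r_{adv}, \theta)],$$

where $D[p, p']$ is used to measure the divergence between two distributions $p$ and $p'$; $q(y \mid x_l)$ is the true distribution of the output label, which can be approximated by a one-hot vector of $y_l$; $p(y \mid x_l + r_{adv}, \theta)$ is the distribution predicted by the CNN with parameters $\theta$; and $r_{adv}$ represents the adversarial perturbation that generates the adversarial sample, i.e., the perturbation $r$ that maximizes the divergence, which can be expressed as follows:

$$r_{adv} = \arg\max_{r;\ \|r\|_2 \le \epsilon} D[q(y \mid x_l),\, p(y \mid x_l + r, \theta)],$$

where $\epsilon$ is the perturbation weighting coefficient.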

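After obtaining the adversarial samples of all signal types, we regarded the loss value generated by the adversarial samples as part of the original loss value and added it to the original loss function in the form of regularization, which is expressed as

$$\mathcal{L}_{adv}(\mathcal{D}_l, \theta) = \ell(\mathcal{D}_l, \theta) + \lambda\, \frac{1}{|\mathcal{D}_l|} \sum_{x_l \in \mathcal{D}_l} L_{adv}(x_l, \theta),$$

where $\mathcal{D}_l$ represents the labelled training set, $\lambda$ represents the weighting coefficient of regularization, $L_{adv}(x_l, \theta)$ is the adversarial loss of a labelled sample $x_l$, $\theta$ denotes the CNN parameters, and $\ell$ represents the supervised loss function for the CNN, which can be calculated through the cross-entropy function.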

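The parameters $\theta$ of the CNN are tuned using the backpropagation algorithm:

$$\theta \leftarrow \theta - \eta\, \nabla_{\theta} \mathcal{L}_{adv}(\mathcal{D}_l, \theta),$$

where $\mathcal{L}_{adv}$ is the full loss function of adversarial training and $\eta$ represents the learning rate.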

2.4. Virtual Adversarial Training (VAT)

The adversarial training algorithm trains a CNN in a supervised learning model, where all the training samples must be labelled. However, in noncooperative communication scenarios, only a small number of signal samples are labelled. Using a small number of labelled samples to train the CNN through adversarial training results in poor generalization capacity.

To exploit the information in the unlabelled signals, we adopted VAT [30], which employs both the labelled and unlabelled training data to smooth the output space of the neural network, i.e., it minimizes the change in the network output when the input is locally perturbed. VAT has therefore proved effective for semisupervised learning [31].

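However, in the semisupervised learning model, there are many unlabelled training samples $x_{ul}$ for which the true label distribution $q(y \mid x_{ul})$ is unknown. Therefore, unlabelled training samples cannot be used directly to train the CNN through the adversarial training algorithm. Note that, given a large amount of labelled training samples, $p(y \mid x_*, \theta)$ approaches $q(y \mid x_*)$. We can therefore use “virtual” labels that are probabilistically generated from $p(y \mid x_*, \hat{\theta})$ in place of the labels unknown to the user, and then compute the adversarial direction based on these virtual labels [32, 33]. The loss function for VAT can be expressed as

$$L_{vadv}(x_*, \theta) = D[p(y \mid x_*, \hat{\theta}),\, p(y \mid x_* + r_{vadv}, \theta)],$$

where $x_*$ denotes either a labelled or an unlabelled sample, $D$ measures the divergence between two distributions, $\hat{\theta}$ represents the weight parameters of the neural network in the current training state, and $r_{vadv}$ represents the virtual adversarial perturbation, which generates the virtual adversarial sample $x_* + r_{vadv}$:

$$r_{vadv} = \arg\max_{r;\ \|r\|_2 \le \epsilon} D[p(y \mid x_*, \hat{\theta}),\, p(y \mid x_* + r, \hat{\theta})],$$

where $\epsilon$ is the perturbation weighting coefficient.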

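After obtaining the virtual adversarial samples, the full loss function is given by

$$\mathcal{L}_{vadv}(\mathcal{D}_l, \mathcal{D}_{ul}, \theta) = \ell(\mathcal{D}_l, \theta) + \alpha\, \frac{1}{|\mathcal{D}_l| + |\mathcal{D}_{ul}|} \sum_{x_* \in \mathcal{D}_l \cup \mathcal{D}_{ul}} L_{vadv}(x_*, \theta),$$

where $\mathcal{D}_l$ and $\mathcal{D}_{ul}$ represent the labelled and unlabelled training sets, respectively; $L_{vadv}(x_*, \theta)$ is the VAT loss of a sample $x_*$; and $\alpha$ represents the regularization coefficient that needs to be set in advance. $\ell$ represents the supervised loss function of the CNN, which is equivalent to Equation (6).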

Equation (10) shows that both labelled data and a large amount of unlabelled data are used to carry out semisupervised training. The labelled data are combined with the unlabelled data to conduct virtual adversarial training, while supervised learning uses the labelled data to guide network training. The loss function of VAT can be regarded as a measure of the local smoothness of the current network, and its optimization smooths the network output space. The regularization coefficient $\alpha$ controls the relative balance between supervised learning and virtual adversarial training, ensuring the effect of semisupervised training.

Finally, the parameter of the CNN is tuned according to the backpropagation algorithm.

2.5. Semisupervised SEI Based on VAT

VAT is a semisupervised learning model that can be applied to SEI in noncooperative scenarios. However, networks trained with a very small amount of labelled signal data have poor generalization capability; such networks are likely to assign wrong virtual labels to the unlabelled signal data, which can severely harm the subsequent classification. To solve this problem, we first trained the network model with the labelled signal data via adversarial training, which improves the generalization capability of the network. Furthermore, we calculated a specific value for the perturbation weighting coefficient $\epsilon$ of the adversarial sample, namely the minimum value of $\epsilon$ that causes misidentification. The corresponding $r_{adv}$ is then the smallest perturbation that can lead to misidentification. Using it as the adversarial sample for adversarial training maximizes the antinoise performance of the neural network. The network parameters obtained through adversarial training are then used directly in VAT to conduct semisupervised SEI. The procedure of semisupervised SEI based on VAT is summarized in Algorithm 1.

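Algorithm 1: Semisupervised SEI based on VAT.
Required:
$x$: signal data, including labelled data $x_l$ and unlabelled data $x_{ul}$
$y_l$: labels corresponding to the labelled signal data
$N$: number of training iterations
$\lambda$, $\alpha$: regularization coefficients of (virtual) adversarial training
$\ell$: supervised loss function of network model
$L_{adv}$: loss function of adversarial training
$\mathcal{L}_{adv}$: full loss function of adversarial training
$L_{vadv}$: loss function of virtual adversarial training
$\mathcal{L}_{vadv}$: full loss function of virtual adversarial training
$\theta$: parameters of network model
$\eta$: learning rate
$\epsilon_{max}$: the upper limit of the perturbation weighting coefficient
$\epsilon_{min}$: the lower limit of the perturbation weighting coefficient
$T$: number of iterations of the perturbation weighting coefficient
Stage 1 (adversarial training with labelled data):
1. Initialize $\theta$ and $\eta$
2. Search between $\epsilon_{min}$ and $\epsilon_{max}$, for at most $T$ iterations, for the minimum $\epsilon$ that causes misidentification
3. for each of the $N$ training iterations do
4. Compute the adversarial sample $r_{adv}$ for $x_l$ with this $\epsilon$
5. Compute $L_{adv}$ and the full loss $\mathcal{L}_{adv}$ with coefficient $\lambda$
6. Update $\theta \leftarrow \theta - \eta \nabla_{\theta} \mathcal{L}_{adv}$
7. end for
Stage 2 (VAT with labelled and unlabelled data, starting from the $\theta$ obtained in Stage 1):
8. for each of the $N$ training iterations do
9. Compute the virtual adversarial sample $r_{vadv}$ for $x_l$ and $x_{ul}$
10. Compute $L_{vadv}$ and the full loss $\mathcal{L}_{vadv}$ with coefficient $\alpha$
11. Update $\theta \leftarrow \theta - \eta \nabla_{\theta} \mathcal{L}_{vadv}$
12. end for
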
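The search for the minimum perturbation weighting coefficient can be illustrated with a small sketch. The text only states that the coefficient has an upper limit, a lower limit, and an iteration count; using bisection between the two limits is an assumption made here, as are the function names and the toy two-class linear model that stands in for the CNN.

```python
import math


def smallest_fooling_epsilon(is_fooled, eps_low, eps_high, n_iter):
    """Bisect [eps_low, eps_high] for the smallest epsilon with is_fooled(eps) True.

    Assumes is_fooled is monotone: False below some threshold, True above it.
    Returns None when even eps_high does not change the prediction.
    """
    if not is_fooled(eps_high):
        return None
    for _ in range(n_iter):
        mid = 0.5 * (eps_low + eps_high)
        if is_fooled(mid):
            eps_high = mid  # mid already fools the model: search below it
        else:
            eps_low = mid   # mid is too weak: search above it
    return eps_high


# Toy linear two-class model (hypothetical): class 1 if w.x + b > 0.
w, b = [3.0, 4.0], -6.0
x = [1.0, 2.0]  # scored 3 + 8 - 6 = 5, so it is assigned to class 1
w_norm = math.sqrt(sum(v * v for v in w))
direction = [-v / w_norm for v in w]  # unit vector that lowers the score fastest


def is_fooled(eps):
    """True when the perturbed input is no longer assigned to class 1."""
    x_adv = [xv + eps * dv for xv, dv in zip(x, direction)]
    return sum(wv * xv for wv, xv in zip(w, x_adv)) + b <= 0.0


eps_star = smallest_fooling_epsilon(is_fooled, eps_low=0.0, eps_high=4.0, n_iter=30)
```

For this toy model the score decreases by 5 per unit of perturbation, so the search converges to a coefficient close to 1. With a CNN, `is_fooled` would instead compare the predicted class of the perturbed sample with the true label.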
2.6. Implementation Details

The calculation of (virtual) adversarial samples is essential for the (virtual) adversarial training algorithm. However, in practice, we cannot obtain a closed form of $r_{adv}$ or $r_{vadv}$ as defined in Equation (4) or Equation (8). Therefore, in this section, we provide the core implementation details of the proposed algorithm: calculating (virtual) adversarial samples in an approximate manner.

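The calculation of $r_{adv}$ in this study can be approximated with a linear approximation of the divergence $D$ with respect to the perturbation $r$ in Equation (4), which can be expressed as

$$r_{adv} \approx \epsilon\, \frac{g}{\|g\|_2}, \qquad g = \nabla_{x_l} D[q(y \mid x_l),\, p(y \mid x_l, \theta)],$$

where $x_l$ is a labelled sample, $q(y \mid x_l)$ is its true label distribution, $p(y \mid x_l, \theta)$ is the distribution predicted by the CNN with parameters $\theta$, and $\epsilon$ is the perturbation weighting coefficient.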

For a neural network model, the gradient of the divergence with respect to the input can be computed through forward- and backpropagation.
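The linear approximation of the adversarial perturbation can be sketched on a model small enough to differentiate by hand. The example below assumes a linear SoftMax classifier (an illustrative stand-in for the CNN) and cross-entropy as the divergence, so the gradient with respect to the input has the closed form $W^{T}(p - y)$; the weights, input, and value of the perturbation coefficient are hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, bias, x):
    """Class probabilities of a linear softmax model: softmax(W x + b)."""
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


def cross_entropy(probs, label):
    return -math.log(probs[label])


def adversarial_perturbation(weights, bias, x, label, eps):
    """r_adv ~= eps * g / ||g||_2, with g the gradient of the loss w.r.t. x."""
    probs = predict(weights, bias, x)
    # For softmax + cross-entropy, dLoss/dlogits = probs - one_hot(label).
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Chain rule through logits = W x + b gives g = W^T delta.
    grad = [sum(weights[k][j] * delta[k] for k in range(len(delta)))
            for j in range(len(x))]
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(x)
    return [eps * g / norm for g in grad]


# Hypothetical three-class model with a two-dimensional input.
weights = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]
bias = [0.0, 0.0, 0.0]
x, label, eps = [0.5, -1.0], 0, 0.1
r_adv = adversarial_perturbation(weights, bias, x, label, eps)
x_adv = [xv + rv for xv, rv in zip(x, r_adv)]
```

With a deep network the same gradient would be obtained by one forward pass and one backward pass to the input, as stated above.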

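Furthermore, the calculation of $r_{vadv}$ is performed in an approximate manner, which can be described as follows: for an input training sample $x_*$, a random vector $d$ of the same size is drawn from the standard Gaussian distribution and normalized to unit length. Then, $r_{vadv}$ is obtained by taking the gradient of the divergence $D$ with respect to $r$ at $r = \xi d$ for a small constant $\xi > 0$:

$$g = \nabla_{r} D[p(y \mid x_*, \hat{\theta}),\, p(y \mid x_* + r, \hat{\theta})] \Big|_{r = \xi d}, \qquad r_{vadv} \approx \epsilon\, \frac{g}{\|g\|_2},$$

where $\hat{\theta}$ denotes the current network parameters and $\epsilon$ is the perturbation weighting coefficient.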

For the neural network model, the gradient used for the virtual adversarial perturbation can likewise be computed through forward- and backpropagation.
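The approximate calculation of the virtual adversarial perturbation can be sketched in the same spirit. The example assumes a linear SoftMax classifier in place of the CNN and the Kullback-Leibler divergence as $D$, so that the gradient with respect to $r$ has the closed form $W^{T}(p(x + r) - p(x))$; the model, the small constant `xi`, and all numeric values are hypothetical. No label is used anywhere, which is what allows unlabelled samples to contribute.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, bias, x):
    """Class probabilities of a linear softmax model: softmax(W x + b)."""
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def virtual_adversarial_perturbation(weights, bias, x, eps, xi=1e-3, rng=random):
    """Approximate r_vadv with one gradient evaluation at r = xi * d."""
    p_clean = predict(weights, bias, x)  # acts as the "virtual" label
    # Random direction d: Gaussian draw rescaled to unit length.
    d = [rng.gauss(0.0, 1.0) for _ in x]
    d_norm = math.sqrt(sum(v * v for v in d))
    d = [v / d_norm for v in d]
    # Gradient of KL(p_clean || p(x + r)) w.r.t. r at r = xi * d:
    # W^T (p(x + r) - p_clean) for the linear softmax model.
    p_shift = predict(weights, bias, [xv + xi * dv for xv, dv in zip(x, d)])
    delta = [ps - pc for ps, pc in zip(p_shift, p_clean)]
    grad = [sum(weights[k][j] * delta[k] for k in range(len(delta)))
            for j in range(len(x))]
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(x)
    return [eps * g / norm for g in grad]


# Hypothetical three-class model and an unlabelled two-dimensional sample.
weights = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]
bias = [0.0, 0.0, 0.0]
x, eps = [0.5, -1.0], 0.1
r_vadv = virtual_adversarial_perturbation(weights, bias, x, eps, rng=random.Random(0))
x_vadv = [xv + rv for xv, rv in zip(x, r_vadv)]
# Local smoothness penalty for this sample: divergence between clean and perturbed outputs.
lds = kl_divergence(predict(weights, bias, x), predict(weights, bias, x_vadv))
```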

2.7. Signal Data Collection and Experimental Setup

We demonstrate our network model based on a software-defined radio platform composed of GNU Radio and seven USRP model B210 devices. By combining GNU Radio with the USRP, we then define the transceiver of radio signals through the PC to form a complete communication system composed of software and hardware. This platform realizes communication functions, and signal modulation and demodulation can be done at the software level.

A computer running Ubuntu 18.04 was connected to the USRPs to build the communication system. We used six USRPs as transmitters and one USRP as the receiver and collected six types of RF signals, one from each transmitter. The transmitters operated at a centre frequency of 2.4 GHz, and the received signals were sampled at a rate of 16 MHz. The signal modulation mode was QPSK, and the bandwidth was 1.2 MHz.

For each emitter, 20,000 segments of RF signals were collected in a lab environment; thus, the signal-to-noise ratio (SNR) of the signals was high. A preliminary measurement showed that the SNR was more than 50 dB. Therefore, we assumed that the collected signals were not affected by noise. For each class of sampled signal, we calculated the average symbol energy within the input frame as $E_s$ and used MATLAB to add different levels of simulated additive white Gaussian noise (AWGN) to set the ratio of symbol energy to noise density ($E_s/N_0$) to 0 dB, 2 dB, …, 20 dB. The noisy signal data were processed by bispectrum analysis to obtain bispectrum distributions of uniform size. Figure 2 shows the dataset structure of our experiments. The dataset contains six classes of sampled signal data. Each class includes 20,000 segments at a specific SNR ($E_s/N_0$), each of which is transformed into a bispectrum distribution.
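The noise-addition step was carried out in MATLAB in this study; the following Python sketch only illustrates the underlying arithmetic. It assumes complex baseband samples, a known number of samples per symbol (a hypothetical parameter here), and the usual discrete-time convention in which the noise power per sample equals the signal power times the samples per symbol divided by the linear symbol-energy-to-noise-density ratio.

```python
import cmath
import math
import random


def add_awgn(samples, es_n0_db, samples_per_symbol, rng=random):
    """Add complex white Gaussian noise for a target Es/N0 given in dB."""
    # Average symbol energy: mean sample power times samples per symbol.
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    es = power * samples_per_symbol
    n0 = es / (10.0 ** (es_n0_db / 10.0))
    # Total noise variance n0 per complex sample: n0 / 2 per real dimension.
    sigma = math.sqrt(n0 / 2.0)
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in samples]


# Hypothetical clean QPSK symbols at one sample per symbol.
rng = random.Random(0)
clean = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4)))
         for _ in range(20000)]
noisy = add_awgn(clean, es_n0_db=10.0, samples_per_symbol=1, rng=rng)
```

With unit-power symbols and a 10 dB target, the added noise has an average power close to 0.1 per sample.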

For 20,000 bispectrum distributions of each class signal at a specific SNR, 80%, 10%, and 10% were allocated to training, validation, and testing, respectively. For the training and validation datasets, we set the ratio of labelled data to unlabelled data at 10%.
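The data partition described above can be sketched as follows for the training part of one class at one SNR. The function name and the fixed random seed are assumptions of this example, and the 10% figure is interpreted as the ratio of labelled to unlabelled samples, so the labelled fraction of the training set is 0.1 / 1.1.

```python
import random


def split_indices(n_items, labelled_to_unlabelled, seed=0):
    """Shuffle item indices and split them 80/10/10, then mark labelled items."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * 8 // 10
    n_val = n_items // 10
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    # labelled : unlabelled = ratio  =>  labelled fraction = ratio / (1 + ratio)
    n_labelled = round(len(train) * labelled_to_unlabelled / (1.0 + labelled_to_unlabelled))
    return {
        "labelled": train[:n_labelled],
        "unlabelled": train[n_labelled:],
        "val": val,
        "test": test,
    }


parts = split_indices(20000, labelled_to_unlabelled=0.10)
```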

The proposed semisupervised SEI network architecture was built with the Keras framework on top of TensorFlow, and the network was trained on a system running Windows 10 with an Intel(R) Core(TM) i9-10900 CPU, 16 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU.

The structure and the detailed architecture parameters of the CNN are shown in Figure 3. Three convolutional layers are utilized, and the ReLU function is used as the activation function of the convolutional layers because it has a stable output and mitigates the vanishing gradient problem. The features extracted by the convolution layers are vectorized through the fully connected layer (Dense I), thus mapping the distributed feature representation to the label space of the samples. Finally, the output probability is mapped through the SoftMax function of the last fully connected layer (Dense II). Dense II contains one neuron per transmitter. By comparing the output probabilities of these neurons, the maximum probability corresponds to the recognized emitter category.

In practice, the network depth (the number of convolution layers) is determined through simulations. Table 1 summarizes the architecture details of the CNN for different depths.

We evaluated the classification accuracy for different network depths based on the RF signal dataset mentioned above. Figure 4 shows that the classification accuracy improved with the increase in the number of convolution layers, and it remained at a high level when the number of convolution layers was 3. However, when the number of convolution layers reached 5, the network was too deep to fit the input signal data, and the classification accuracy slightly decreased. Considering classification accuracy and network complexity, the optimal network depth is 3 convolutional layers.

The hyperparameters used to train the network model are summarized in Table 2.

We used the validation dataset for hyperparameter tuning. The learning rate is the most significant hyperparameter; it directly controls the magnitude of the parameter updates during training and affects the effective capacity of the model. To tune the learning rate, we first used a small amount of data to train the network to determine the order of magnitude of the learning rate. We then chose a specific value within this range as the initial learning rate. During training, as the number of training iterations increased, the learning rate decayed exponentially until the validation loss converged, and the learning rate at this point was taken as optimal. Batch size is a relatively independent hyperparameter that determines the direction of gradient descent. For the batch size, we considered candidate values of 128, 256, 512, 1024, and 2048, evaluated how the classification accuracy on the validation dataset changed over time, and selected the batch size corresponding to the fastest improvement in classification performance. The number of epochs determines the training time. After each training epoch, the classification performance of the model on the validation dataset was evaluated, and training was stopped when the classification performance stopped increasing. Therefore, a large value was generally assigned to the maximum number of epochs. The dropout rate, which is used to prevent overfitting, was set to 0.5. By tuning these hyperparameters, the network model can achieve better performance.
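Two of the tuning mechanisms described above, exponential learning-rate decay and stopping once validation performance no longer improves, can be sketched as follows. The decay rate, the decay interval, and the `patience` parameter are hypothetical; the text does not report the values used, and the validation accuracies are made-up numbers for illustration.

```python
def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    """Learning rate after `step` updates: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def early_stopping_epoch(val_scores, patience):
    """Index of the best epoch seen before training would be stopped.

    Scanning stops once `patience` epochs have passed without a new best score.
    """
    best_epoch, best_score = 0, val_scores[0]
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


# Hypothetical settings and validation accuracies per epoch.
lr_after_10_steps = exponential_decay(0.01, decay_rate=0.5, step=10, decay_steps=10)
val_accuracy = [0.50, 0.60, 0.70, 0.69, 0.68, 0.71, 0.70]
stop_short = early_stopping_epoch(val_accuracy, patience=2)
stop_long = early_stopping_epoch(val_accuracy, patience=3)
```

A short patience stops at the first plateau (epoch 2 here), whereas a longer one tolerates the temporary dip and reaches the later, better epoch 5. This is one reason a large maximum number of epochs is combined with validation-based stopping.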

3. Results and Discussion

3.1. Convergence Performance

We evaluated the convergence performance of the neural networks trained using the proposed approach. We first collected RF signal data with an SNR of 10 dB from six USRPs, with each device representing a class of signal. The maximum epoch of network training was set at 200. Moreover, we chose the training loss value and test loss value as metrics to evaluate the convergence performance.

Figure 5 shows that the loss function of the network tended to be stable after approximately 80 training epochs, which means our approach has a relatively fast convergence speed. Moreover, the training loss function curve and the test loss function curve are relatively smooth with no noticeable fluctuation. This indicates that the training process was stable. The loss value of training and test procedure decreased as the number of iterations increased, i.e., the two curves exhibit a downward trend, which indicates that the network model performs well in both the training dataset and the test dataset and that no overfitting or underfitting problem occurred. The results show that our approach can be used for semisupervised training of neural networks.

3.2. Classification Accuracy

We first demonstrated the superiority of the proposed method over the method that uses only labelled data to train the CNN. The t-distributed stochastic neighbour embedding (t-SNE) algorithm [34], a widely used dimensionality reduction method, was used to visualize the feature parameters extracted by the neural network model. Figure 6 shows the t-SNE distributions of the feature parameters extracted by the two methods.

As shown in Figure 6, compared with the algorithm using only labelled data to train the CNN, the feature parameters of the RF fingerprints extracted by the proposed algorithm cluster more tightly within classes and are better separated between classes. This indicates that the proposed VAT algorithm, which uses a large number of unlabelled signal samples for training, can improve the neural network’s generalization ability. This allows the network to extract the characteristic parameters of an individual emitter more comprehensively, thus improving the classification of RF signals.

We then considered four factors that significantly affect classification accuracy: (1) SNR, (2) ratio of labelled to unlabelled samples, (3) communication propagation channel, and (4) number of emitters.

Classification accuracy vs. SNR: first, we tested the classification accuracy on different SNRs. Both the labelled and unlabelled data were used to conduct semisupervised training on CNN based on the proposed method. Figure 7 shows the confusion matrix of classification at different SNRs.

As shown in Figure 7, all emitters maintain a relatively average classification accuracy, and no serious confusion occurs in the classification between individual emitters. This shows that our method can fully extract RF fingerprints and effectively distinguish between individual emitters, thus proving the effectiveness of the deep learning-based SEI. In addition, our method can classify six emitters with an average accuracy of more than 83% at 4 dB and 93% at 10 dB, which demonstrates that our proposed VAT-based semisupervised SEI is robust to noise interference.
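The per-emitter and average accuracies discussed above follow directly from a confusion matrix. The sketch below assumes that rows correspond to true classes and columns to predicted classes; the 3 x 3 matrix is made up for illustration and does not reproduce the results in Figure 7.

```python
def per_class_accuracy(confusion):
    """Row-normalised diagonal: fraction of each true class recognised correctly."""
    return [row[k] / sum(row) for k, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Fraction of all test samples that lie on the diagonal."""
    correct = sum(row[k] for k, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Illustrative counts only (rows: true emitter, columns: predicted emitter).
confusion = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 20, 80],
]
class_acc = per_class_accuracy(confusion)
mean_acc = overall_accuracy(confusion)
```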

Then, we compared the classification accuracy of the proposed method with that of the method using only labelled data to train the CNN and with the methods previously proposed in [29, 35]. The experimental results are shown in Figure 8.

Figure 8 shows that our approach achieves the highest classification accuracy among the compared SEI schemes. Compared to the method using only labelled data to train the CNN, our method improves classification accuracy by 15%–20% on average. This is because VAT smooths the output space of the network and thereby enhances its generalization ability, which effectively overcomes noise interference and improves classification accuracy. Compared to the method proposed in [29], our method improves classification accuracy by 5%–10% on average. This is because our improved VAT method augments the antinoise performance and generalization capability of the network through adversarial training in the pretraining process. This results in the assignment of accurate virtual labels to the unlabelled signal data, thus contributing to the effectiveness of VAT. Compared to the method based on metalearning proposed in [35], our method shows an advantage in classification accuracy at low SNRs, although the two methods have similar classification accuracy only when . Therefore, the experimental results show that our method adapts well to the task of SEI at low SNRs.

Classification accuracy vs. ratio of labelled to unlabelled samples: we evaluated the classification accuracy at different ratios of labelled to unlabelled samples. We fixed the number of unlabelled samples at 10,000 for each signal class and varied the number of labelled samples from 200 to 1200, so that the ratio of labelled to unlabelled samples ranged from 2% to 12%. These labelled and unlabelled training samples, with SNRs from 0 dB to 20 dB, were used to train the CNN with VAT.
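The ratios follow directly from the sample counts. The short sketch below reproduces the arithmetic; the step of 200 labelled samples between settings is an assumption made for illustration, since only the end points (200 and 1200) are stated.

```python
UNLABELLED_PER_CLASS = 10_000

# Labelled samples per class; the step of 200 is assumed, not stated.
labelled_counts = list(range(200, 1201, 200))

# Ratio of labelled to unlabelled samples, in percent.
ratios_percent = [100 * n / UNLABELLED_PER_CLASS for n in labelled_counts]

for n, r in zip(labelled_counts, ratios_percent):
    print(f"{n} labelled / {UNLABELLED_PER_CLASS} unlabelled = {r:.0f}%")
```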

Figure 9 shows that the classification accuracy improved when the ratio of labelled to unlabelled samples increased from 2% to 6%, but stabilized when the ratio reached approximately 8%. Furthermore, even in the worst case of 2%, the classification performance did not deteriorate significantly: the classification accuracy was more than 80% at 10 dB and approximately 90% at 20 dB. According to these results, our approach needs only a small number of labelled samples to achieve a high and stable classification accuracy, which makes it suitable for practical situations.

We further compared the classification performance of our method with that of the methods in [29, 35] at ratios of 2%, 4%, and 6%. To eliminate the effect of noise interference, the SNR of the signal dataset was set to 20 dB. The experimental results are shown in Table 3. For all the ratios, our proposed method outperformed the existing methods, which further supports its suitability for noncooperative scenarios.

Classification accuracy vs. communication propagation channel: we evaluated the classification accuracy of the proposed algorithm in different propagation channels. The collected RF signals were transmitted over an AWGN channel, a Rayleigh channel, and a Rice channel. Figure 10 shows the classification accuracy for the different channels as a function of the SNR. With the Rayleigh and Rice channels, the classification accuracy was lower than with the AWGN channel because the RF fingerprints of the emitted signals become less apparent and harder to distinguish after being affected by multiplicative noise. The proposed method essentially constructs (virtual) adversarial samples against additive noise, but not multiplicative noise, which limits the adaptiveness of the SEI system in Rayleigh and Rice channels. However, the decline in classification accuracy, which is 5%–10% on average, is not significant. This demonstrates that, under interference from multiplicative noise, the proposed approach can still calculate approximate and relatively effective (virtual) adversarial samples for the signal samples according to the current network model, which improves the antinoise performance and generalization ability of the network to some extent. The experimental results further demonstrate the strong classification performance of the proposed method.
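The difference between additive and multiplicative noise can be made concrete with a small simulation. The sketch below is illustrative only and is not the channel simulator used in the experiments: the function names, the single flat-fading gain per burst, the Rician K-factor of 3, and the unit-power test tone are all assumptions.

```python
import cmath
import math
import random


def awgn(signal, snr_db, rng):
    """Additive noise: add complex white Gaussian noise at the given SNR,
    measured against the power of the signal passed in."""
    power = sum(abs(s) ** 2 for s in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # per real and imaginary component
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in signal]


def fading_gain(k_factor, rng):
    """One complex flat-fading gain with unit mean power.
    k_factor = 0 gives Rayleigh fading; k_factor > 0 gives Rician fading
    with a line-of-sight component."""
    los = math.sqrt(k_factor / (k_factor + 1))
    sigma = math.sqrt(1 / (2 * (k_factor + 1)))
    return complex(los + rng.gauss(0, sigma), rng.gauss(0, sigma))


def fading_channel(signal, snr_db, k_factor, rng):
    """Multiplicative noise: scale and rotate the whole burst by one random
    gain (block fading is an assumption), then add AWGN."""
    h = fading_gain(k_factor, rng)
    return awgn([h * s for s in signal], snr_db, rng)


rng = random.Random(0)
tone = [cmath.exp(1j * 0.1 * n) for n in range(2000)]  # unit-power test signal
rx_awgn = awgn(tone, 10, rng)
rx_rayleigh = fading_channel(tone, 10, 0.0, rng)
rx_rice = fading_channel(tone, 10, 3.0, rng)
```

In the AWGN case the received signal differs from the transmitted one only by an added term, which is the kind of perturbation the (virtual) adversarial samples model. In the fading cases the signal itself is rescaled and rotated by a random gain before noise is added, which is consistent with the lower accuracy reported for the Rayleigh and Rice channels.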

Classification accuracy vs. number of emitters: we evaluated the classification accuracy for different numbers of emitters. We collected RF signals from up to 14 individual emitters and evaluated the classification performance of the network as the number of emitters to be classified varied from 6 to 14. The experiment was conducted over an AWGN channel with .

Figure 11 shows how the classification accuracy changes with the number of emitters. The experimental results indicate that although classification performance deteriorates as the number of emitters increases, the network can classify up to 14 emitters with an accuracy of more than 85%. In general, as the number of emitters to be identified increases, the network scale should also be increased, resulting in a higher computational cost. Nevertheless, our method can classify more individual emitters with high accuracy while maintaining the existing network scale. This benefits from the training algorithm based on VAT and indicates that the proposed method has good scalability for large emitter populations.

3.3. Other Evaluation Metrics for Classification Performance

Classification accuracy is the ratio of the number of correctly classified samples to the total number of test samples, which can reflect only the overall classification performance. However, it is difficult to determine whether each class of RF signals is correctly classified, particularly when each class is in a minority with respect to the rest of the RF signals, leading to the class imbalance problem. In this case, classification accuracy is not a comprehensive evaluation measure of classification performance.
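A small numerical example illustrates why accuracy alone can mislead under class imbalance. The sample counts below are hypothetical; they mimic a one-versus-rest split in which one device out of six is the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / sum(1 for t in y_true if t == positive)


# One device (label 1) against the other five pooled together (label 0).
y_true = [1] * 100 + [0] * 500
# A degenerate classifier that never predicts the positive device.
y_pred = [0] * 600

print(accuracy(y_true, y_pred))  # high, although the device is never found
print(recall(y_true, y_pred))    # zero
```

The classifier is right for five sixths of the samples while never identifying the device of interest, which is the failure mode that threshold-independent metrics such as ROC and precision-recall curves expose.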

We used the receiver operating characteristic (ROC) and precision-recall curves as metrics to further evaluate the classification performance of the proposed approach. To identify six USRPs, we chose one device as the positive class with a weight of five and treated the remaining five devices as a single negative class with a weight of one. The six classes of RF signals were collected at an SNR of 10 dB and used for training and identification. The ROC and precision-recall curves for each of the six devices are shown in Figure 12.

As shown in Figure 12(a), the ROC curve for each device lies in the upper left of the figure. This implies that the system achieves a high true-positive rate with a low false-positive rate. Precision and recall are two evaluation indices that trade off against each other. Figure 12(b) shows that the precision-recall curve for each device lies in the upper right of the figure, indicating that precision remains high as recall increases. We also calculated the area under the curve (AUC) for each device. The ROC AUC and precision-recall AUC for each device are more than 90%, which further supports the strong classification performance of the proposed method. Table 4 summarizes the mean ROC AUC and mean precision-recall AUC at SNRs of 0 to 20 dB. As expected, the proposed framework achieves a better mean AUC at higher SNRs and shows only a slight drop-off at lower SNRs.
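The sketch below shows one way the two curves and their areas can be computed for a single device in a one-versus-rest setting. It is an illustration under assumptions, not the evaluation code behind Figure 12: the scores and labels are made up, the function names are hypothetical, and the class weight is interpreted as a per-sample weight on the positive class.

```python
def roc_and_pr_points(scores, labels, pos_weight=1.0):
    """Sweep the decision threshold from the highest score downwards.

    labels: 1 for the device treated as positive, 0 for the pooled rest.
    pos_weight: weight of each positive sample (negatives weigh 1).
    Returns ROC points (fpr, tpr) and precision-recall points (recall, precision).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = pos_weight * sum(labels)
    total_neg = float(len(labels) - sum(labels))
    tp = fp = 0.0
    roc, pr = [(0.0, 0.0)], []
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Samples with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += pos_weight
            else:
                fp += 1.0
            i += 1
        roc.append((fp / total_neg, tp / total_pos))
        pr.append((tp / total_pos, tp / (tp + fp)))
    return roc, pr


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def average_precision(pr_points):
    """Step-wise area under the precision-recall curve."""
    area, prev_recall = 0.0, 0.0
    for rec, prec in pr_points:
        area += (rec - prev_recall) * prec
        prev_recall = rec
    return area


# Made-up classifier scores for one device versus the rest.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

roc, pr = roc_and_pr_points(scores, labels)
roc_w, pr_w = roc_and_pr_points(scores, labels, pos_weight=5.0)
roc_auc = trapezoid_area(roc)
roc_auc_weighted = trapezoid_area(roc_w)
ap_plain = average_precision(pr)
ap_weighted = average_precision(pr_w)
```

In this sketch the ROC AUC is unchanged by the positive-class weight, because the true-positive and false-positive rates are each normalized within their own class, whereas precision mixes the two classes and therefore rises when positives are weighted more heavily.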

4. Conclusion

To address the shortcomings of traditional SEI based on deep learning, this paper proposes an SEI method based on bispectrum analysis and VAT. First, bispectrum analysis is performed on the RF signals as a way of signal preprocessing. Noting that the emitter signal is susceptible to noise interference and only a small number of labelled training samples are available with many of the samples being unlabelled in a noncooperative communication scenario, we calculated the virtual adversarial samples for both labelled and unlabelled signal samples, using which we then calculated the corresponding loss functions. Loss functions based on the labelled samples were also calculated. Using the two loss functions and the preset harmonic parameters, the objective function of the neural network was calculated. Through iterative tuning, the neural network model corresponding to the minimum loss function value of the verification dataset was obtained as the optimal output. Finally, the neural network model could be used for SEI.
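The way the two loss functions combine into one objective can be summarized in equations. The block below is a sketch in the notation of the standard VAT formulation and may differ from the symbols defined earlier in this paper, which take precedence. All symbols are assumptions introduced here: $D_l$ and $D_{ul}$ are the labelled and unlabelled sample sets, $p(y \mid x; \theta)$ is the network output, $\hat{\theta}$ denotes the current parameters held fixed, $\epsilon$ bounds the perturbation norm, and a single weight $\alpha$ stands in for the preset harmonic parameters.

```latex
% Sketch in standard VAT notation; symbols are assumptions, not the paper's.
\begin{align}
r_{\mathrm{vadv}}(x) &= \arg\max_{\|r\|_2 \le \epsilon}
  \mathrm{KL}\!\left[\, p(y \mid x; \hat{\theta}) \,\middle\|\, p(y \mid x + r; \theta) \,\right], \\
\mathcal{L}_{\mathrm{vat}} &= \frac{1}{|D_l| + |D_{ul}|} \sum_{x \in D_l \cup D_{ul}}
  \mathrm{KL}\!\left[\, p(y \mid x; \hat{\theta}) \,\middle\|\, p\bigl(y \mid x + r_{\mathrm{vadv}}(x); \theta\bigr) \,\right], \\
\mathcal{L}_{\mathrm{sup}} &= -\frac{1}{|D_l|} \sum_{(x, y) \in D_l} \log p(y \mid x; \theta), \\
\mathcal{L} &= \mathcal{L}_{\mathrm{sup}} + \alpha \, \mathcal{L}_{\mathrm{vat}}.
\end{align}
```

The first line defines the virtual adversarial perturbation, which requires no label and can therefore be computed for unlabelled samples as well. The model kept as the final output is the one whose loss on the verification dataset is smallest during training.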

Numerical experiments were conducted to evaluate the performance of the proposed method. First, convergence experiments showed that our approach has stable and fast convergence. Second, we considered four factors that impact the classification accuracy of our method. The classification accuracy vs. SNR experiment showed that our method is robust to noise. The classification accuracy vs. ratio of labelled to unlabelled samples experiment showed that our method can handle weak labelling problems in practical situations. The classification accuracy vs. propagation channel experiment showed that our method can also resist the interference of nonlinear multiplicative noise on RF signals and maintain a relatively high classification accuracy. The classification accuracy vs. number of emitters experiment showed that our method exhibits good scalability for large emitter populations. Moreover, we used two other metrics, ROC and precision-recall, to further evaluate the classification performance of our method. The AUCs of the ROC and precision-recall curves were calculated to represent the correct recognition rate of each device, and the experimental results demonstrated the classification performance of our method more comprehensively.

Future research will consider the following two aspects. (1) Various emitter devices, in addition to USRPs, will be used to collect more types of RF signals, to avoid relying on one type of signal data for experimentation and to verify the scalability of the proposed method for RF signals emitted by different types of devices. Furthermore, these devices will be operated in an outdoor environment to obtain more realistic signal data and verify the practicability of our method. (2) Our method is essentially for closed-set identification: the RF signal classes to be identified are the same as those in the database used for training. Consequently, our method cannot handle anomalous emitters on which it was not previously trained, a situation that is common in real-world applications. Therefore, future work will focus on detecting anomalous emitters while classifying known emitters.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 91538201, in part by the Taishan Scholar Project of Shandong Province under Grant ts201511020, and in part by a project supported by the Chinese National Key Laboratory of Science and Technology on Information System Security under Grant 6142111190404.