#### Abstract

The rapid development of communication and computer technology has created many application scenarios for the fingerprint identification of communication equipment. The technology is of great significance in electronic countermeasures, wireless network security, and other fields and has been widely studied in recent years. Fingerprint identification of communication equipment relies on the fingerprint characteristics carried by the signals a device transmits, which differ from those of other devices; by establishing the connection between these characteristics and the hardware, the individual device can be identified. This paper studies the key technologies of fingerprint recognition of communication equipment, including signal acquisition, signal feature extraction, classifier design, and device recognition from transient signals. Fingerprint recognition based on ensemble learning and deep learning is taken as the main research content, a fingerprint recognition scheme for communication equipment is given, and the proposed scheme is verified with measured data. For the transient signal of communication equipment, an algorithm that uses the short-term periodicity of the signal is presented. For the steady-state signal, feature extraction is realized, and the autoencoder feature and four kinds of integral bispectrum features are analyzed and visualized. For individual recognition based on ensemble learning, a scheme based on the Extreme Gradient Boosting (XGBoost) classification model is studied, and Gradient Boosting Decision Tree (GBDT) models with different parameters are used as the primary learners of a stacking classifier. Steady-state signal recognition of mobile phones based on deep learning is also studied. The results show that stacking with multiple GBDT models with different parameters as the primary learners improves the recognition rate by about 2% compared with GBDT.

#### 1. Introduction

The identification of wireless communication equipment is one of the important means of obtaining information in modern information warfare. It makes it possible to track equipment, which enables target monitoring, tracking, electronic jamming, and military strikes against critical communications equipment and electronic carriers. This technology can not only accurately capture the attribute information of individual devices but also track enemy equipment and thereby infer the deployment of enemy forces, analyze the composition of enemy communications networks, and provide significant and timely support for military intelligence. Unlike traditional modulation pattern recognition in communication reconnaissance, individual recognition of wireless communication equipment is usually realized by extracting the subtle features of the equipment. Generally speaking, the electronic components of wireless communication equipment have tolerances; that is, manufacturing errors cause the electrical parameters of components to deviate from their nominal values, which ultimately leads to differences in the signals the equipment transmits.

In communication systems, it is usually the information carried in the signal that is studied. However, the received signal carries not only the required information but also hardware information about the transmitter. This hardware information is called the “fingerprint” of the source device. Extracting device hardware features from received communication signals is called “RF fingerprint extraction,” and using the extracted RF fingerprints to identify different wireless communication devices is called “RF fingerprint identification.” With the rapid development of wireless network technology, wireless network security has gradually become a research focus of academic groups and research institutions at home and abroad. At present, key-based authentication is the main method of user identity authentication; if the key is leaked, the user’s personal information is no longer secure. Since the fingerprint characteristics of hardware devices are stable and difficult to imitate, the security of wireless networks would be greatly improved if the authentication process combined software and hardware authentication [1].

In the field of electronic countermeasures, target location and target recognition are two key tasks. Target positioning technology is mature: several global navigation satellite systems have been in practical use for many years, offer high positioning accuracy, and are widely used in civil applications. The development of target recognition, however, has been relatively slow. In recent years, protocol standards have been changing constantly and the number of devices has been increasing exponentially, which makes it difficult for some traditional countermeasures to meet the requirements of fingerprint identification of communication devices in the current complex and changeable electromagnetic environment. In a confrontation, it is necessary to intercept enemy communication signals, but the lack of prior knowledge makes it difficult to identify unknown communication signals from a limited number of samples. Therefore, it is of great significance for the development of electronic countermeasures to find an identification method with stronger generality and better performance that further improves the identification of individual communication equipment when signal samples are limited [2].

At present, there is no detailed technical reference standard and complete authoritative theory in the field of fingerprint identification of communication equipment. Therefore, the research of communication equipment fingerprint identification technology not only has engineering practical value in military and civil fields but also has theoretical value, which is of great significance to the development of many fields [3].

#### 2. State of the Art

Deep learning (DL) is a new research direction in the field of machine learning (ML). It is introduced into machine learning to make it closer to its original goal: artificial intelligence (AI).

Deep learning is to learn the internal rules and representation levels of sample data, and the information obtained in the learning process is of great help to the interpretation of data such as text, image, and sound. The ultimate goal is for machines to be able to learn analytically, like humans, and to recognize data such as text, images, and sound. Deep learning is a complex machine learning algorithm that has achieved far more results in speech and image recognition than previous related technologies.

Deep learning has made a lot of achievements in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization technology, and other related fields. Deep learning enables machines to imitate human activities such as hearing and thinking and solves many complex pattern recognition problems, making great progress in artificial intelligence-related technologies. Deep learning model can be used for communication device individual recognition technology.

The purpose of fingerprint identification of communication equipment is to distinguish different devices by their transmitted signals. This technology mainly involves feature extraction from the transmitted signals and classifier design [4]. As the manufacturing process of hardware continues to improve, the circuit differences between devices of the same type become smaller and smaller, so the differences between the signals transmitted by different devices become harder and harder to distinguish. Fingerprint features are characteristics of the transmitted signal, and they should have at least two properties, discriminability and universality: discriminability means that a feature can be used to distinguish individual communication devices, that is, it represents only the characteristics of one device, and universality means that the feature exists in different communication devices. The quality of the features directly affects the recognition performance of the subsequent classifier. Therefore, extracting more effective signal fingerprint features for all kinds of communication equipment is an important link in fingerprint recognition of communication equipment, and it is a hot topic of current research [5].

In recent years, signal fingerprint identification of communication equipment has attracted wide attention in the military field. According to the existing literature, many experts and scholars at home and abroad have conducted in-depth research in this field. The US Naval Research Institute has long been in a leading position, especially in the direction of radar. Domestic research started at the end of the 20th century, and Cai Zhongwei put forward the first domestic framework for fingerprint identification of communication equipment in the early 21st century. The signal of a communication device is the combined result of multiple internal components, and the unintentional modulation that these components introduce arises in the same process that generates the signal. It is therefore difficult to establish a unified and accurate mathematical model, and the authoritative theories and technical standards in this field have not been unified. Scholars have conducted a lot of research on this issue [6].

Generally, the signal sent by communication equipment while it is switching on or off, switching communication mode, and so forth is called the transient signal, and the signal generated once the communication process has stabilized is called the steady-state signal. Compared with the transient signal, the steady-state signal is easier to obtain, but its characteristics are more difficult to extract and analyze [7].

Transient feature research mainly includes two aspects: one is to detect signal endpoints, and the other is to extract signal features. In terms of signal endpoint detection, in 2011, Hu Liting used short-term signal energy to conduct endpoint detection of wireless card signal [8]. Wang et al. proposed a popular method of waveform feature in 2015, which is used to extract weak features from the waveform feature space obtained by binary wavelet packet transform. Literature in 2016 proposed a method based on recursive graph analysis for signal detection. In the aspect of transient signal feature extraction, the early polynomial coefficient feature extraction method is simple and widely tested. Its limitation is that the order needs to be manually specified and it is difficult to describe subtle signal changes. In the study of transient characteristics, wavelet coefficient features are widely used as a transform domain method. In 1995, Choe proposed using Daubechies wavelet characteristics of signals to identify equipment radiation signals [9]. Hall further improves the generalization performance of fingerprint recognition of communication equipment by using multiple transform domain coefficients including wavelet coefficients for feature coefficient fusion. In the literature in 2012, short-time Fourier transform (STFT) was applied to transient signals to obtain the characteristics of the transform domain, and peak-to-peak value and variance of the characteristics of the transform domain were taken as the fingerprint characteristics of transient signals. This method achieved good recognition effect. Huang proposed in 2016 using Normalized Permutation Entropy (NPE) to describe the fingerprint characteristic information of communication equipment [10]. This method has a good effect on individual radio stations, but it is greatly affected by the amount of data [11].

Steady-state signal process is usually complicated, time-varying, and nonstationary, and it is difficult for conventional low-order statistical methods to accurately analyze the signal. Therefore, some high-order methods have obvious advantages. In 1993, literature proposed using J and R features of signals as steady-state signal features, which were extracted by high-order statistics [12]. The Hilbert-Huang Transform (HHT) method proposed by Huang is an adaptive analysis method, and many research works are carried out based on this method [13]. In 1999, Ureten et al. extracted signal amplitude and phase features after HHT transformation for recognition [14]. Yuan constructed several subdivided characteristic parameters based on Hilbert spectrum for signal analysis, which achieved good results in experiments on multiple mobile phones. Many transform domain methods have good advantages in feature discrimination, and the unintentional modulation generated by internal devices can often be detected by transform domain methods. Therefore, this method has been widely concerned. In 2011, Lei et al. used the constructed fuzzy function to slice the signal and obtain the features [15]. In 2012, literature constructed holder coefficient as the signal features. In 2015, Mingqian et al. proposed the use of generalized cumulant and instantaneous phase as signal characteristics for modulation signals radiated by communication equipment, and the method achieved good results [16]. In 2018, Yang et al. extracted signal envelope by constructing high-order cumulant detection function and used envelope fractal features for clustering, which can effectively suppress noise [17].

In the field of fingerprint recognition of communication equipment, many research works directly use feature to train classifier after feature extraction, without considering the matching relationship between feature and classifier. In recent years, studies have been carried out to address this problem [18]. In 2015, Reising conducted feature subset test on original features using the effect feedback of classifier training stage and analyzed which subset contributed more to the classification effect. Experiments showed that only 10% feature subset could still maintain the classification effect of all features [19]. In 2016, Bihl studied the influence of different feature selection methods on the classification effect of equipment individuals [20].

In classifier design, the fingerprint recognition of communication equipment is regarded as a special case of machine learning, and the classifier is trained on the features extracted from the original signal. A good classifier must have strong generalization ability and a high recognition rate [21]. In the current research on fingerprint recognition of communication equipment, the design of the classifier often needs to be determined according to the specific application scenario, and factors such as dataset size, feature dimension, and number of categories should be considered. Classifier design has been extensively studied in various fields, including speech recognition, medical diagnosis, and automatic driving [22]. Although many classifier models from mature fields are available for reference, in practical engineering applications classifiers should be designed according to the characteristics of the data, and the final classifier should be selected based on its performance on measured data. In current engineering fields related to machine learning, the most common classifiers include the support vector machine (SVM), the decision tree, and the k-Nearest Neighbor (kNN) classifier. These shallow models often achieve good results on binary classification tasks such as spam discrimination, but when the number of classes increases, their representation ability is usually insufficient, and it is difficult to model multiclass tasks. In recent years, researchers in the field of fingerprint recognition of communication equipment have studied shallow models. In 2012, Danev et al. used the Mahalanobis distance between features after dimensionality reduction to distinguish two individual communication devices. However, the generalization performance of this method decreases sharply when the number of device categories increases. In 2017, an improved semisupervised support vector machine classifier was used to classify radar signals, effectively overcoming shortcomings of traditional classification such as low accuracy and unstable classification performance.

In contrast to shallow models, deep learning models can effectively fit large-scale data, have strong expressive ability, and do not rely heavily on feature engineering, so more and more research on deep learning has appeared in recent years. A review of the relevant literature shows that deep learning models have also been studied, and progress has been made, in the field of fingerprint recognition of communication devices. In 2016, Li et al. proposed a method combining multilevel modeling and deep learning, which reduced the dependence on prior knowledge and achieved good performance. In 2018, Lida et al. used a supervised dimensionality reduction method to reduce bispectral features and trained a convolutional neural network to recognize them. This method reduces the computation of the classifier network, but some classification information is lost during the supervised dimensionality reduction. The above methods are effective to some degree but still require a lot of debugging. As deep learning has been studied in more and more fields, a relatively complete theoretical basis has formed; in practical engineering applications, however, the adjustment of model parameters still requires human participation.

#### 3. Methodology

##### 3.1. Introduction to Ensemble Learning

Ensemble learning is a method that combines multiple single classifiers in some way to obtain better generalization. It is also called a classifier system or committee. A single classifier in ensemble learning is called a base classifier. A traditional base classifier model, such as the naive Bayes classifier, the support vector machine, or the logistic regression model, usually seeks the optimal solution within a given model, that is, within a given function space. Ensemble learning usually consists of two steps: generating the base classifiers and integrating the models. In the generation step, each base classifier should have a good recognition rate on the classification task, and different base classifiers should differ from one another; such differences can be obtained by choosing different base classifier models, by using different training sample sets, or by setting different model parameters. The differences between base classifiers are the key to improving the generalization ability of the final model. In the integration step, the final decision is obtained by combining the results of the multiple base classifiers in a certain way, as shown in Figure 1.

Integration methods usually weight the base classifiers differently, for example by reducing the weight of a classifier with poor classification performance, so that the final decision has a higher recognition rate. Commonly used integration strategies include the averaging method, the voting method, and the Bayesian decision method.
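The voting and averaging strategies can be illustrated with a minimal sketch. The probability vectors and the weights below are hypothetical values chosen for illustration, not outputs of the classifiers studied in this paper, and the function names are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the largest number of base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_average(probabilities, weights):
    """Weighted average of the class-probability vectors of the base classifiers."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


# Three hypothetical base classifiers scoring one sample over three devices.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]]

# Voting method: each classifier casts its most probable label.
labels = [p.index(max(p)) for p in probs]
voted = majority_vote(labels)

# Averaging method: the second classifier is assumed to be less reliable,
# so its weight is reduced before the probabilities are averaged.
averaged = weighted_average(probs, weights=[1.0, 0.5, 1.0])
fused = averaged.index(max(averaged))
```

Both strategies select device 0 here; lowering the weight of the second classifier simply reduces its pull toward device 1.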

Compared with traditional single classifier, ensemble learning has many advantages. For a single classifier, the applicable space of the model often depends on the selection of the model. Therefore, it is often difficult to improve the effect of a single classifier by designing the classifier on this basis. A single classifier is easy to fall into local optimum and can get good results for some samples, while the recognition rate of the other part is low; that is, a single classifier is unstable for the prediction results of the whole sample. Ensemble learning combines the prediction results of multiple base classifiers. If the effect of some classifiers fluctuates greatly, it has little influence on the final prediction results. In other words, ensemble learning reduces the influence of a single classifier on the final results, so that the deviation of the system is reduced and the overall model is better, as shown in Figure 1.

##### 3.2. Introduction to the GBDT Algorithm

The block diagram of GBDT is shown in Figure 2.

In the specific implementation, GBDT needs to train *K* columns of trees in parallel, where *K* is the number of categories. The input of each column of trees contains all feature samples, but the sample labels differ between columns. The fitting loss is defined by the cross entropy loss function in the following formula:

The algorithm first initializes the cumulants of all samples and then passes them through the Softmax function, which is defined in the following formula:

At this point, the probability distribution of every sample is uniform. Trees are then trained for each category in turn. Since the probability that a sample belongs to each category must be calculated after each iteration, the next tree can be trained only after the previous tree has been trained in all *K* categories. When training each tree, the residual of each feature sample for the current tree needs to be calculated, as shown in the following formula, where the cumulant is the accumulated output of the first *m* − 1 trees in the *k*-th column for the sample and *y*_{k} is the probability that the sample belongs to the *k*-th category after the first *m* − 1 trees have been accumulated. The current tree is built from the feature samples and these residuals, and the mean square error of the residuals falling into each node is minimized through node splitting, as shown in the following formula, where the tree splits nodes iteratively until it reaches its maximum depth. The output value of each leaf node is approximated by a Newton iteration from the initial value 0, as shown in the following formula:

The cumulant is obtained by adding the leaf outputs scaled by the attenuation step size, as shown in the following formula, where the region term is the *j*-th leaf node of the *m*-th tree in the *k*-th column. When the model recognizes a feature sample, the category with the largest cumulant is selected as the predicted category of the sample.
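The first boosting round described above can be sketched as follows. This is a minimal illustration that assumes the usual multiclass formulation, in which the residual of a sample is its one-hot label minus the Softmax probability (the negative gradient of the cross entropy with respect to the cumulant). The sample, the step size of 0.1, and the idealized "tree" that fits the residuals exactly are assumptions made for the example.

```python
import math


def softmax(cumulants):
    """Map the K per-category cumulants of one sample to probabilities."""
    peak = max(cumulants)  # subtracting the maximum avoids overflow
    exps = [math.exp(f - peak) for f in cumulants]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(onehot, probs):
    """Cross entropy loss of one sample."""
    return -sum(y * math.log(p) for y, p in zip(onehot, probs) if y > 0)


def residuals(onehot, cumulants):
    """Negative gradient of the cross entropy with respect to each cumulant."""
    return [y - p for y, p in zip(onehot, softmax(cumulants))]


K = 6                              # six devices, as in the experiments
onehot = [0, 0, 1, 0, 0, 0]        # hypothetical sample from the third device
cumulants = [0.0] * K              # assumed initial value of every cumulant

start_probs = softmax(cumulants)   # uniform: 1/K for every category
start_residuals = residuals(onehot, cumulants)

# Idealized round: pretend each of the K trees reproduces its residual exactly,
# then accumulate the outputs scaled by the (assumed) attenuation step size.
step_size = 0.1
cumulants = [f + step_size * r for f, r in zip(cumulants, start_residuals)]
predicted = cumulants.index(max(cumulants))
```

After this single round the cumulant of the true category is the largest and the cross entropy is lower than at initialization; a real implementation would fit regression trees to the residuals rather than copy them.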

##### 3.3. Introduction to XGBoost Algorithm

XGBoost (Extreme Gradient Boosting) is an efficient gradient boosting decision tree algorithm. It improves on the original GBDT and greatly improves the performance of the model. It is a forward additive model whose core is the boosting idea, which integrates several weak learners into a strong one. That is, multiple trees make decisions together: each tree fits the difference between the target value and the summed prediction of all previous trees, and all the results are summed to obtain the final result, which improves the performance of the whole model.

XGBoost is made up of multiple Classification and Regression Trees (CART), so it can handle classification, regression, and other problems.

For a *k*-category classification problem, the XGBoost algorithm needs to train *k* groups of trees in total, where each group is fitted to one category. The input to the *k*-th group of trees is the dataset *X*, which contains the features of each data sample and its true value: if the data sample belongs to the *k*-th category, the value is 1; otherwise, it is 0. Each group consists of several trees. For any sample, the *m*-th tree in the *k*-th group corresponds to an output value, which is defined in the following formula, where the cumulative term is the cumulative fitting value of the first *m* trees in group *k* for the sample and the per-tree term is the fitting value of the sample in the *m*-th tree of group *k*. The loss function of the model when training the *m*-th tree in the *k*-th group is defined in the following formula, where *L*^{t} is the loss and its first- and second-order derivatives with respect to the cumulant are given in the two following formulas:

The term that does not depend on the current tree is regarded as a constant when the minimum is sought, so formula (8) is equivalent to the following formula:

For the *k*-th tree, in order to prevent the model from overfitting the data, the number of leaf nodes *T* is penalized, and, in order to prevent the output from fluctuating between trees, the output values of all *T* leaf nodes are also penalized. The regularization term is defined in the following formula:

Each data sample eventually falls on a leaf node of the *k*-th tree, so the loss function is as shown in the following formula, where *G*_{j} is the *j*-th leaf node of the tree. To simplify the formula, new variables are defined in the following formula:

Then, the loss of the *t*-th tree is shown in the following formula:

For the output value of any leaf node, solve the quadratic function to obtain the leaf node value as shown in the following formula:

At this point, the loss is minimized. If there are *N* categories in total and *M* trees are trained for each category, then a total of *N* × *M* trees are trained, as shown in Figure 3. The construction of each tree also aims to minimize the value of the loss function. That is, for any node *R* in the tree, we want to find a split such that the total loss of the left and right child nodes receiving the data samples in *R* is reduced as much as possible compared with the original loss.

According to the above, when each leaf node takes its optimal value, the minimum loss of the node is given by the following formula, where *G* and *H* are, respectively, the sums of the first- and second-order gradients of the data falling into node *R*. The loss after partitioning is given by the following formula:

We then look for the split that maximizes the gain, which is calculated as shown in the following formula, where L_{1} and L_{2} are the first-order and second-order gradients, calculated after the data in node *R* fall into the left and right child nodes, respectively. For the dataset *X* of node *R* with *k* feature dimensions in total, find the maximum and minimum values of the data in each dimension and obtain *P* partition points on that feature, as shown in the following formula:

The gains of the *P* partition points of every feature are calculated, and the feature dimension and partition point with the maximum gain are taken as the split of this node. In particular, if all the calculated gains of a node are negative, the node is not split further.
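The leaf value and the split search can be sketched as follows, using the standard XGBoost expressions: the optimal leaf value is −*G*/(*H* + λ), the minimum loss of a leaf is −*G*²/(2(*H* + λ)) + γ, and the gain of a split is the parent loss minus the summed loss of the two children. The names `lam` and `gamma` (the two penalty coefficients), the evenly spaced placement of the *P* partition points between the minimum and maximum, and the toy gradients are assumptions made for illustration.

```python
def leaf_weight(G, H, lam):
    """Optimal output value of a leaf with gradient sums G and H."""
    return -G / (H + lam)


def leaf_loss(G, H, lam, gamma):
    """Minimum loss of a single leaf, including the per-leaf penalty gamma."""
    return -0.5 * G * G / (H + lam) + gamma


def split_gain(GL, HL, GR, HR, lam, gamma):
    """Loss of the unsplit node minus the total loss of its two children."""
    parent = leaf_loss(GL + GR, HL + HR, lam, gamma)
    children = leaf_loss(GL, HL, lam, gamma) + leaf_loss(GR, HR, lam, gamma)
    return parent - children


def candidate_points(values, P):
    """P evenly spaced partition points strictly between min and max (assumed)."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / (P + 1) for i in range(1, P + 1)]


def best_split(X, g, h, P=3, lam=1.0, gamma=0.0):
    """Return (gain, feature, threshold) of the best split, or None."""
    best = None
    for d in range(len(X[0])):
        column = [row[d] for row in X]
        for t in candidate_points(column, P):
            left = [i for i, v in enumerate(column) if v <= t]
            right = [i for i, v in enumerate(column) if v > t]
            if not left or not right:
                continue
            gain = split_gain(
                sum(g[i] for i in left), sum(h[i] for i in left),
                sum(g[i] for i in right), sum(h[i] for i in right),
                lam, gamma,
            )
            # Only a positive gain justifies splitting the node.
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, d, t)
    return best


# Toy node: four samples, two features, hypothetical gradients.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 6.0]]
g = [-1.0, -1.0, 1.0, 1.0]
h = [1.0, 1.0, 1.0, 1.0]
chosen = best_split(X, g, h)
no_split = best_split(X, [1.0] * 4, h)
```

In this toy node the first feature at threshold 1.5 separates the negative from the positive gradients and gives the largest gain, whereas identical gradients yield no positive gain, so the node is left unsplit.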

##### 3.4. Introduction to the Stacking Algorithm

In ensemble learning, when there is a lot of training data, the “learning method” is a powerful combination strategy; that is, the individual learners are combined through another learner. Stacking is a typical example of the learning method. The individual learners are called primary learners, and the learner used to combine them is called the secondary learner or metalearner. This paper applies the algorithm to the identification of individual communication equipment. The general flow of the stacking algorithm is to use the output of the previous layer of classifiers to train the current classifier. Figure 4 shows the framework of the stacking model, which consists of two layers: the independent individual classifiers in the first layer are the primary learners, and the classifier in the second layer that combines them is the secondary learner.

The basic process of the stacking algorithm is to train the layer 1 classifiers on the original dataset and to use their predictions as the new training set for the layer 2 classifier; that is, the input features of the layer 2 classifier are the fused predictions of the layer 1 classifiers, and the sample labels in the dataset do not change. The Bagging algorithm aggregates the predictions of multiple classifiers with a combination method such as voting, whereas the stacking algorithm replaces the voting in Bagging with a metaclassifier, which reduces the variance and bias of the overall model through further training.
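The two-layer flow can be sketched with deliberately simple learners. This is a toy illustration and not the GBDT-based scheme of this paper: the primary learners are hypothetical nearest-class-mean classifiers that each see one feature, and the secondary learner is a lookup table trained on their predictions. All names and data are assumptions.

```python
from collections import Counter, defaultdict


def fit_centroid(X, y, feature):
    """Primary learner: nearest class mean on a single feature."""
    sums, counts = defaultdict(float), Counter()
    for row, label in zip(X, y):
        sums[label] += row[feature]
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda row: min(means, key=lambda c: abs(row[feature] - means[c]))


def fit_stacking(X, y):
    """Train layer 1 on the data, then layer 2 on the layer 1 predictions."""
    primaries = [fit_centroid(X, y, f) for f in range(len(X[0]))]
    # New training set: one tuple of primary predictions per sample;
    # the labels are unchanged.
    meta_X = [tuple(p(row) for p in primaries) for row in X]
    table = defaultdict(Counter)
    for meta_row, label in zip(meta_X, y):
        table[meta_row][label] += 1
    fallback = Counter(y).most_common(1)[0][0]

    def predict(row):
        meta_row = tuple(p(row) for p in primaries)
        if meta_row in table:
            return table[meta_row].most_common(1)[0][0]
        return fallback  # unseen combination of primary predictions

    return primaries, predict


# Hypothetical two-feature samples from three devices.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.9]]
y = [0, 0, 1, 1, 2, 2]
primaries, stacked_predict = fit_stacking(X, y)
first_layer = [[p(row) for row in X] for p in primaries]
stacked = [stacked_predict(row) for row in X]
```

Each primary learner alone confuses some devices on these samples, while the secondary learner recovers every training label from the pair of primary predictions. The example evaluates on its own training data only to keep the flow visible; scikit-learn offers gradient boosting and stacking estimators for practical use.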

##### 3.5. Perceptrons and Multilayer Networks

The perceptron is composed of two layers of neurons. The input layer receives multiple external inputs and passes them to the output layer after operation, as shown in Figure 5.

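The perceptron can easily realize the logical AND, OR, and NOT operations. The output is obtained by applying the activation function to the weighted input, as shown in the following formula:

$$y = f\left(\sum_{i} w_i x_i - \theta\right)$$

where $w_i$ is the weight of input $x_i$, $\theta$ is the threshold, and $f$ is the activation function.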

Different logical operations can be realized by setting different parameter values. More generally, the weights can be learned from a given training dataset. If the threshold *θ* is regarded as the connection weight of a node with a fixed input of −1, the learning of the weights and the threshold can be unified as the learning of weights. The perceptron is trained as follows: the weights are randomly initialized and, given a training sample (*x*, *y*), are updated repeatedly according to the difference between the predicted result and the true result until convergence, as shown in the following formula:

$$w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i$$

where $\eta$ is the learning rate and $\hat{y}$ is the output value of the current perceptron. Only the output layer of the perceptron has neurons that apply an activation function, so its learning ability is very limited. The AND, OR, and NOT problems are linearly separable; that is, a linear hyperplane can be found that separates the different patterns. For simple nonlinear problems such as the XOR problem, the perceptron cannot provide a solution: during repeated iteration, the weights keep oscillating and do not stabilize. Therefore, in order to solve complex nonlinear problems, it is necessary to build a higher-level model, that is, a multilayer network based on the principle of the perceptron. Such a network has several hidden layers between the input layer and the output layer, and the neurons of the hidden layers and the output layer apply a nonlinear mapping through the activation function. The common neural network structure is the “multilayer feedforward neural network,” as shown in Figure 6.
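The perceptron training rule described above can be sketched in a few lines. The threshold is folded into the weight vector as the weight of a fixed input of −1. For reproducibility, the sketch starts from zero weights instead of random ones and uses a step activation and a learning rate of 1; these choices, like the function names, are assumptions made for the example.

```python
def step(z):
    """Step activation: fire when the weighted input reaches the threshold."""
    return 1 if z >= 0 else 0


def predict(weights, x):
    extended = list(x) + [-1]  # last weight plays the role of the threshold
    return step(sum(w * v for w, v in zip(weights, extended)))


def train_perceptron(samples, lr=1, max_epochs=100):
    """Return (weights, converged) after repeated error-driven updates."""
    weights = [0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, y in samples:
            y_hat = predict(weights, x)
            if y_hat != y:
                errors += 1
                extended = list(x) + [-1]
                # w_i <- w_i + lr * (y - y_hat) * x_i
                weights = [w + lr * (y - y_hat) * v
                           for w, v in zip(weights, extended)]
        if errors == 0:
            return weights, True
    return weights, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_converged = train_perceptron(AND)
xor_weights, xor_converged = train_perceptron(XOR)
and_outputs = [predict(and_weights, x) for x, _ in AND]
```

Training on AND stops after a few passes with an error-free weight vector, while training on XOR exhausts the epoch budget because no single hyperplane separates the two classes.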

The neurons in the input layer receive the feature samples, the hidden layers and the output layer process them, and the final result is given by the neurons in the output layer. Neurons in the same layer are not connected to each other, but, in some complex network structures, connections can exist across layers, and adjacent layers need not be fully connected. The activation function of each neuron can be different. The commonly used activation functions are ReLU, Sigmoid, and Tanh. Because the weights are updated through the backpropagation algorithm, the activation function needs to be differentiable. If the neuron outputs of each layer are not mapped by an activation function, the output of the whole network is obtained by multiplying the weight matrices of the different layers in sequence, and its effect is the same as that of a single-layer perceptron, so linearly inseparable problems cannot be solved. The activation function introduces nonlinearity into the network model, which enables a multilayer network to solve many complex nonlinear problems.
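The collapse of stacked linear layers into a single layer can be checked numerically. The weight matrices and the input in this small sketch are hypothetical, and bias terms are omitted for brevity.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    """Product of a matrix and a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}

W1 = [[1, -2], [3, 1]]   # hidden layer weights (hypothetical)
W2 = [[2, 1]]            # output layer weights (hypothetical)
x = [1, 2]

# Two layers without activation equal one layer with the product matrix.
stacked = matvec(W2, matvec(W1, x))
collapsed = matvec(matmul(W2, W1), x)

# With ReLU between the layers the result is no longer that linear map.
nonlinear = matvec(W2, [relu(z) for z in matvec(W1, x)])
```

Here `stacked` and `collapsed` are identical, whereas inserting ReLU changes the output, which is the nonlinearity that the multilayer network relies on.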

#### 4. Result Analysis and Discussion

Signals were collected from the six mobile phones described in the previous section. The four kinds of integral bispectrum (SIB, RIB, CIB, and AIB) were computed for each signal and then concatenated to obtain the integral bispectrum feature set. The feature sample set was split at a ratio of 3 : 2: 2100 feature samples from each mobile phone were used as training samples to train the model, and the remaining samples were used as the test set to evaluate the recognition performance.
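A per-phone split of this kind can be sketched as follows. The helper name and the placeholder samples are hypothetical, and the figure of 3500 samples per phone is only inferred from the stated 3 : 2 ratio and the 2100 training samples; it is not given explicitly in this section.

```python
def split_per_class(samples_by_class, train_parts=3, test_parts=2):
    """Split each class separately so every class keeps the same train/test ratio."""
    train, test = [], []
    for label, samples in samples_by_class.items():
        cut = len(samples) * train_parts // (train_parts + test_parts)
        train += [(s, label) for s in samples[:cut]]
        test += [(s, label) for s in samples[cut:]]
    return train, test


# Placeholder data: integers stand in for feature vectors.
# Assumption: 3500 samples per phone, inferred from 2100 training samples at 3 : 2.
samples_by_phone = {phone: list(range(3500)) for phone in range(1, 7)}
train_set, test_set = split_per_class(samples_by_phone)
print(len(train_set), len(test_set))
```

Splitting each phone separately keeps the classes balanced in both sets, which matters when the average recognition rate is reported over all six phones.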

##### 4.1. GBDT and XGBoost Results and Analysis

The hyperparameters of the GBDT model are tuned to suit the four integral bispectrum features. The number of trees for each class should not be too large: the experiments show that, beyond a certain point, additional trees have only a slight effect, increasing the amount of computation without improving the generalization performance. Therefore, a total of 100 trees are set in the experiment. Correspondingly, the shrinkage step size of the iterative residual update should not be too small: the experiments show that if the step size is too small, each tree captures too little bispectral information, so more trees are needed [23] and more computation is introduced. The tree splitting criterion is the minimum mean square error, the fit is evaluated with the cross-entropy error function (one of the commonly used loss functions), and the iterative residual of each sample is obtained from the partial derivative of that sample's error function with respect to the current cumulative output.
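The residual computation can be made concrete with a small sketch. It assumes the usual multiclass formulation in which each class has its own accumulated score (the "current cumulant" in the text) and the scores are turned into probabilities by a softmax; the paper does not spell out this form, so the function names and the softmax choice are assumptions. Under that assumption, the negative partial derivative of the cross-entropy error with respect to the score of class *k* is simply the one-hot target minus the predicted probability.

```python
import math


def softmax(scores):
    """Convert per-class accumulated scores into probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, label):
    """Cross-entropy error of one sample whose true class index is `label`."""
    return -math.log(softmax(scores)[label])


def residuals(scores, label):
    """Negative gradient of the cross-entropy w.r.t. each class score: y_k - p_k."""
    probs = softmax(scores)
    return [(1.0 if k == label else 0.0) - p for k, p in enumerate(probs)]


# Before any tree is fitted, all six class scores are equal (here zero).
initial_scores = [0.0] * 6
res = residuals(initial_scores, label=2)
print(res)
```

In each boosting round, one regression tree per class would be fitted to these residuals and its output, scaled by the step size, added to the accumulated score of that class.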

The individual recognition results of GBDT on 6 mobile devices are shown in Figure 7.

The average recognition rate of the six phones reached 96.5%. Compared with the naive Bayes model, which assumes that the features are independent, the recognition effect of GBDT is clearly better, which indicates that the different dimensions of the extracted integral bispectrum features are correlated. Since GBDT considers only the first-order gradient information of the error function with respect to the current cumulative output of the sample during training, its computation amount is small. It is therefore well suited as a primary learner when a model fusion technique such as stacking is used, keeping the overall computation low while the integration of multiple models improves the generalization performance. The *F*1 value takes both precision and recall into consideration. It can be seen from Table 1 that the *F*1 values of the six mobile phones are all greater than 0.92, indicating that GBDT has a good recognition effect on the steady-state signals after bispectral feature extraction.
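For reference, the *F*1 value is the harmonic mean of precision and recall, and it can be computed per class directly from a confusion matrix. The sketch below uses a made-up three-class confusion matrix purely for illustration; the numbers are not taken from the paper's tables, and the function name is hypothetical.

```python
def per_class_f1(confusion):
    """Return (precision, recall, F1) per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


# Illustrative numbers only, not measured data.
toy_confusion = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 0, 10],
]
toy_scores = per_class_f1(toy_confusion)
for k, (p, r, f1) in enumerate(toy_scores):
    print(k, round(p, 3), round(r, 3), round(f1, 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a class scores a high *F*1 only when it is both rarely missed and rarely confused with other classes.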

Table 1 shows the individual identification results of GBDT for the six mobile devices.

When the number of iterations in the model, namely, the number of trees for each class, reaches a certain level, the recognition accuracy of the model becomes basically stable. For the steady-state signal of mobile phones, the integral bispectrum feature is more suitable for the XGBoost model than the autoencoder feature, and it achieves good results more easily. In the experiment, the XGBoost model parameters are set as follows: the maximum depth of each tree is 5, the penalty weight on the leaf node output values is 1.1, the penalty weight on the number of leaf nodes is 0.001, the forward coefficient is 0.5, and the number of trees for each class is 150. The recognition results of XGBoost on the six mobile phones obtained with these settings are shown in Figure 8.
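To make the role of these parameters explicit, the standard regularized objective of XGBoost can be written as below. The mapping of the text's parameter names onto the usual symbols is an interpretation rather than something stated in the paper: the penalty weight on leaf output values is read as *λ* = 1.1, the penalty weight on the number of leaf nodes as *γ* = 0.001, and the forward coefficient as the shrinkage factor *η* = 0.5.

$$
\mathrm{Obj} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2},
$$

$$
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t\left(x_i\right),
$$

where *l* is the per-sample loss, *f*_k is the *k*-th tree, *T* is the number of leaf nodes of a tree, and *w*_j is the output value of its *j*-th leaf. Under this reading, a larger *λ* shrinks the leaf outputs, a larger *γ* discourages additional splits, and a smaller *η* reduces the contribution of each new tree.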

##### 4.2. Stacking Results and Analysis

Figure 9 compares the four integral bispectrum features under the stacking classifier and the SVM classifier. The SVM was trained in a one-versus-rest manner for the six mobile phones, giving six independent SVMs; the predicted class of a test feature sample is the class whose SVM produces the maximum output value.
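The one-versus-rest decision rule amounts to taking the arg max over the outputs of the independent binary classifiers. The sketch below illustrates only this decision step: the linear scorers are hypothetical stand-ins for trained SVM decision functions, and three classes are used instead of six to keep the example short.

```python
def one_vs_rest_predict(scorers, x):
    """Return the index of the binary scorer with the largest output for sample x."""
    scores = [score(x) for score in scorers]
    return max(range(len(scores)), key=lambda k: scores[k])


def make_linear_scorer(w, b):
    """Hypothetical stand-in for one trained binary SVM decision function w.x + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


# One scorer per class; weights are illustrative, not trained values.
scorers = [
    make_linear_scorer([1.0, 0.0], 0.0),
    make_linear_scorer([0.0, 1.0], 0.0),
    make_linear_scorer([-1.0, -1.0], 0.5),
]

print(one_vs_rest_predict(scorers, [2.0, 0.5]))
print(one_vs_rest_predict(scorers, [0.1, 3.0]))
print(one_vs_rest_predict(scorers, [-1.0, -1.0]))
```

Each scorer is trained to separate one phone from all the others, so the largest output identifies the phone that the sample most resembles.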

It can be seen from Figure 9 that, for each of the four integral bispectrum features, the average recognition rate of stacking is higher than that of SVM, but the recognition rate obtained with any single integral bispectrum feature is still unsatisfactory. After the model has been trained on the combined integral bispectrum features, the test set is used for testing, and the confusion matrix obtained is shown in Figure 10.

As can be seen from Figure 10 and Table 2, the recognition effect for Nokia 3 and Fuzhongfu 6 is relatively poor, and misrecognition is most likely to occur between phones 3 and 4 and between phones 5 and 6. Table 2 gives the indicator scores for steady-state signal identification of the six mobile phones with stacking. The average recognition rate is 98.71%, about 2 percentage points higher than the 96.5% obtained with a single GBDT.

It can be seen from Table 2 that the *F*1 values of the six mobile phones are all greater than 0.97, and the comparison between Tables 1 and 2 shows that the *F*1 value of each mobile phone is higher than that obtained with GBDT, indicating the advantage of the stacking algorithm for steady-state signal feature recognition. The accuracy and recognition rate of the algorithm are higher than those of the other algorithms compared, which suggests that this recognition technique is suitable for wider application.

#### 5. Conclusion

This paper mainly discusses individual recognition technology for communication equipment based on ensemble learning and introduces the GBDT, XGBoost, and stacking algorithms in detail. After integral bispectrum feature extraction, the recognition rate of XGBoost on the steady-state signal of individual mobile phones reaches more than 98%. Using multiple GBDT models with different parameters as the primary learners, the recognition rate of stacking is about 2% higher than that of GBDT, indicating the effectiveness of integrating multiple heterogeneous models to enhance individual recognition. A framework for steady-state signal recognition of communication equipment based on ensemble learning is proposed. The comparison of recognition rates across several algorithms demonstrates the superiority of the improved algorithm presented in this paper.

#### Data Availability

The labeled datasets used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.