The development of 5G technology has opened a new era for the Internet of Things (IoT), and at the same time it poses major challenges for electromagnetic spectrum monitoring and sensing. Digital modulation recognition is an important component of electromagnetic spectrum sensing. In increasingly complex wireless transmission environments, especially in noncooperative communication, it is becoming more difficult to receive target signals and to accurately extract effective semantic information from the diverse modulated signals in the electromagnetic spectrum. With the rapid development of network information and wireless communication technology, IoT networks built from many sensors within a prescribed distance have attracted wide attention from researchers in related fields. This paper proposes a distributed collaborative sensing architecture, based on feature fusion, for spectrum semantic recognition of communication signals. Multiple sensors communicate wirelessly to form a self-organizing network that cooperatively senses signal semantic information, and the signal features at each sensor in the distributed network are extracted. The extracted sensor features are then semantically analyzed and modeled, and the effective features are fused to complete the perception and recognition process. Even if the channel environment of a small number of receiving nodes deteriorates in a complex transmission environment, signal features can still be accurately extracted, classification and recognition performance similar to or better than that of the best channel state can be achieved, and the fault tolerance of the system is effectively improved. The architecture can also enhance the performance of spectral semantic information sensing and recognition in the IoT environment.

1. Introduction

With the emergence of 5G technology, the rapid development of the communication field has opened a new era of IoT, bringing richer application scenarios and increasingly complex electromagnetic environments, which pose huge challenges for spectrum monitoring and management and for electromagnetic spectrum sensing and utilization. In a complex electromagnetic environment where everything is interconnected, accurate perception of semantic spectrum information and identification of signal modulation methods can provide important information for communication networking and related tasks, thereby effectively improving spectrum utilization efficiency. Typical spectrum semantic sensing and recognition (SMSR) methods are mainly divided into traditional likelihood function-based decision-making [1, 2], feature extraction-based pattern recognition [3, 4], and other methods, as well as the deep learning (DL) methods that have emerged recently [5]. Traditional methods mainly extract specific features manually, so the recognition effect depends largely on manual experience, which leads to poor recognition performance and few recognizable types. Since the advent of AlphaGo in 2015, more and more researchers have focused on DL, which has achieved good results in classification tasks thanks to its outstanding feature extraction capabilities. O’Shea and West [6] built a simulated communication model through GNU Radio and collected 11 types of communication signals, using a Convolutional Neural Network (CNN) to extract signal features from the In-phase and Quadrature (I/Q) components. This DL method uses no expert-derived features and shows a large accuracy improvement over traditional statistical methods. Wu et al. [7] improved the CNN by adding a Long Short-Term Memory (LSTM) structure, which improved the network’s ability to extract temporal features of the signal and increased the signal recognition accuracy to 80%. Wu et al. [8] combined the cyclic spectrogram and the constellation diagram and ran simulations on a public dataset; at a signal-to-noise ratio (SNR) of 0 dB, the accuracy reaches 80% and the training time is shortened. In the following years, researchers used more complex DL architectures [9-13], with extracted features as inputs and neural network pruning to improve operating efficiency. As the electromagnetic environment of signals grows more complex, DL has gradually become the mainstream approach in spectrum semantic sensing (SMS) and modulation recognition, owing to its powerful feature extraction capabilities and robustness. Although communication signal modulation recognition technology has gradually matured and produced abundant results [14], the rapid development of wireless communication technology has made signal transmission scenarios increasingly diverse and application requirements change ever faster [15], both of which drive the evolution of modulation methods. Modulation recognition technology therefore needs to be updated continually as application scenarios and requirements change.

In an actual wireless communication environment, single-node SMS technology is easily affected by factors such as multipath effects, hidden terminals, and path loss and may fail to obtain correct sensing results. With the rapid development of network information and wireless communication technology, IoT networks built from a large number of nodes within a prescribed distance have attracted wide attention from researchers in related fields, and the number of network devices and sensors deployed in the physical environment is increasing rapidly. This growth also brings new challenges to wireless SMS and recognition, and research on distributed network architectures [16] that combine multiple receivers has emerged in response. Distributed multisensor node wireless SMSR technology can be divided into data layer-based, feature layer-based, and decision-making layer-based fusion schemes. Zhang et al. [17] proposed an automatic modulation recognition scheme based on multisensor signal fusion that can provide higher reliability than single-sensor signals. Dulek [18] proposed a classifier based on online and distributed expectation maximization, which can achieve a classification and recognition effect similar to the best channel state performance. Distributed recognition technology is also widely used in optical fiber vibration sensing recognition [19, 20]; Sun et al. [21] developed an improved deep learning method based on a serial fusion feature extraction model for an optical fiber distributed vibration sensing system, which can automatically extract and identify effective features. Distributed fusion schemes based on the feature layer mostly rely on handcrafted features [22-24], but in noncooperative communication scenarios, the received signal is usually weak, which makes it difficult to obtain an accurate feature expression.
Although the fusion scheme based on the data layer can enhance the received signal strength to a certain extent, it usually requires centralized calculation and processing of the data of every node in the fusion center, which overloads the fusion center. The distributed recognition architecture based on the decision-making layer needs to determine the influence of each node on the final decision. Although the decision result can improve the recognition performance, it requires prior information such as the SNR of the signal at each receiver node, which is not conducive to signal recognition in noncooperative communication scenarios. Therefore, in order to improve the performance of spectrum semantic perception and recognition in complex electromagnetic environments such as noncooperative communication scenarios, this study uses the outstanding feature extraction capabilities of DL methods to propose a distributed collaborative recognition scheme based on feature fusion. The main contributions of this study are as follows:
(1) A distributed multisensor signal reception scene is built, with one transmitter and multiple receiver nodes for signal reception. The spectral semantic information in distributed scenarios is modeled, and the transmission of communication signals through channels in different states is simulated. The specific scene settings are introduced in the next section. In addition, the proposed method can, to a certain extent, solve the problem that accurate SMSR and modulation recognition cannot be performed in noncooperative communication confrontation scenarios because the received signals are weak.
(2) DL algorithms are used to extract the feature semantic information of the signals at multiple receiver nodes. A more accurate feature expression is obtained by fusing the multinode feature semantic information, and the fused features are reduced in dimensionality and passed through the classifier to complete the distributed, coordinated communication signal recognition. This can eliminate, to a certain extent, the uncertainty of communication signal recognition caused by poor channel conditions.

2. Materials and Methods

In the IoT, the rapid development and widespread use of digital communication technology have brought huge challenges to electromagnetic spectrum sensing. Recognition of communication signals, as one of the key technologies for electromagnetic spectrum monitoring, is of great significance in both military and civilian fields. However, most current research is limited to a single node, where channel conditions and received signal strength directly affect recognition performance. A single node is sensitive to environmental changes and has poor fault tolerance, so its recognition performance degrades when the channel environment is poor. With the development of wireless sensor networks, spectral semantic sensing, signal estimation, and recognition algorithms have received more and more attention. Multiple sensors deployed in a distributed manner across the monitoring area form a wireless self-organizing network, in which the sensors collaborate to sense the objects in the detection area and finally send the collected semantic information to the control center for further processing. Multinode data fusion, feature fusion, and similar methods can greatly reduce the ambiguity of unknown signals: when the channel conditions of a small number of sensor nodes deteriorate, the recognition probability can still be maintained. In this study, the communication signal features extracted by each sensor node are fused, and the semantic information of the fused features is used to identify the signal modulation mode, completing the identification process of the entire distributed algorithm. Figure 1 shows the distributed cooperative signal sensing and recognition framework based on feature fusion used in this study, which can be divided into three modules: the distributed signal receiving module, the CLDNN feature semantic extraction module, and the fusion classification and recognition module.

The specific process is shown in Algorithm 1.

Algorithm 1: Multireceiver distributed collaborative identification process
Step 1: Signal reception: the data received by each receiver is a baseband modulated complex sequence $r_i(n)$ of length $N$;
Step 2: Signal preprocessing: extract the real and imaginary parts of the complex baseband signal sequence as the $I$ and $Q$ channels, which are stored as a two-dimensional matrix of size $2 \times N$;
Step 3: Feature extraction: input the processed data of each node into the feature extraction network for training, map the data to the high-dimensional feature space, and extract the feature vector after training;
Step 4: Feature fusion: fuse the feature vectors of the different receivers from Step 3 and apply dimensionality reduction to the high-dimensional features, retaining the main discriminative features and removing redundant ones, to obtain the fused feature vector;
Step 5: Classification output: send the fused feature vector to the classifier for classification and recognition, and output the recognition result.
2.1. Distributed Signal Reception

Through the research and development of networking information systems and autonomous sensing and intelligent information equipment, the electronic equipment system is developing towards decentralization, networking, and distributed coordination, which greatly improves the level of electronic warfare. The network communication system based on the distributed concept is promoting the development of combat intelligence. The distributed scenario of this study is shown in Figure 2.

In the distributed scenario built in this paper, there are one transmitter and multiple receivers. The number of receivers is determined by the requirements of the actual scene, and different numbers of receivers generally affect the final recognition performance. The transmitter sends the communication signal to each receiver through a different channel. After the signal features at each receiver are extracted, the feature semantic information of the different receivers is analyzed and fused in the fusion classification center.

In the signal receiving module, the receiver converts the received signal into a baseband modulated signal through digital downconversion and other processing. The signal received by each receiver over an AWGN channel can be expressed as

$$r(n) = h\,s(n) + w(n)$$

where $s(n)$ is the signal sequence sent by the transmitter, $h$ represents the channel coefficient, $r(n)$ is the signal at the receiving end, and $w(n)$ is Gaussian white noise. However, due to the influence of distance and other factors in actual signal transmission, the signals at different receiver nodes experience different delays. The model can be further expressed as

$$r_i(n) = h_i\,s(n - \tau_i) + w_i(n)$$

where $\tau_i$ represents the channel transmission delay of the $i$th receiver. The received signal quality of different receivers mainly depends on $h_i$ and $w_i(n)$.

For the digital receivers popular on the market, especially software-defined radio (SDR) platforms, the received communication signal is usually a baseband complex sequence. It is therefore necessary to perform modulation recognition starting from the baseband data. In this study, a vector $\mathbf{r}_i$ is used to represent the received complex signal sequence with noise, and the signal received by the $i$th receiver can be expressed as

$$\mathbf{r}_i = [r_i(1), r_i(2), \ldots, r_i(N)]$$

where $N$ is the signal length.

Feature extraction is performed on the received signals of each receiver node and then sent to the fusion classification center for fusion analysis and modeling of feature semantic information to further complete the recognition.
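As an illustration of the reception model in this subsection, the following minimal NumPy sketch generates one received sequence per receiver from a single transmitted sequence. It is a sketch under stated assumptions rather than the simulation code used in this study: the function name `simulate_receivers`, the QPSK test signal, integer sample delays, and all numerical values are hypothetical.

```python
import numpy as np

def simulate_receivers(s, gains, delays, snrs_db, rng=None):
    """Return one noisy, delayed, scaled copy of s per receiver.

    s        : complex baseband sequence sent by the transmitter
    gains    : channel coefficient of each receiver (hypothetical values)
    delays   : integer sample delay of each receiver (assumption)
    snrs_db  : target SNR in dB at each receiver
    """
    rng = np.random.default_rng() if rng is None else rng
    received = []
    for h, tau, snr_db in zip(gains, delays, snrs_db):
        # Delay by tau samples: prepend zeros and keep the original length.
        delayed = np.concatenate([np.zeros(tau, dtype=complex), s])[: len(s)]
        faded = h * delayed
        # Scale complex Gaussian noise so that signal power / noise power = SNR.
        sig_power = np.mean(np.abs(faded) ** 2)
        noise_power = sig_power / (10 ** (snr_db / 10))
        noise = np.sqrt(noise_power / 2) * (
            rng.standard_normal(len(s)) + 1j * rng.standard_normal(len(s))
        )
        received.append(faded + noise)
    return np.stack(received)  # shape: (number of receivers, len(s))


# Example: 128 QPSK symbols observed by three receivers.
rng = np.random.default_rng(0)
symbols = rng.integers(0, 4, size=128)
s = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))
r = simulate_receivers(s, gains=[0.5, 0.8, 1.0], delays=[3, 1, 0],
                       snrs_db=[-4, 0, 4], rng=rng)
```

Each row of `r` plays the role of the complex baseband sequence available at one receiver node.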

2.2. Feature Extraction and Analysis

In recent years, DL methods have stood out among many machine learning methods by their neural network architecture, algorithms, and optimization technologies. They have been widely used in machine vision and speech recognition and have achieved a series of breakthroughs. DL is a method that effectively uses a data-driven approach to extract features and accurately identify it. Compared with manual feature design and extraction, DL algorithms can effectively extract the shallow features and implicit features of the data, while also saving time. The natural attributes of big data in the communication field have led scholars to explore the possibility of applying DL to the communication field, such as the use of deep neural networks for modulation recognition and radar waveform recognition. Therefore, this study will also use the method based on DL for the feature extraction of the distributed collaborative recognition of communication signals.

CNNs are mostly used in the field of image processing. In recent years, they have also been widely used in the communication field for automatic modulation recognition of signals. The convolutional layer extracts the local features of the data through the convolution kernel. The LSTM network is a special recurrent neural network designed to avoid the long-term dependency problem. A deep neural network (DNN) differs from a recurrent neural network (RNN) and a CNN in that DNN here refers specifically to a fully connected neuron structure that includes neither convolutional units nor temporal associations. It can map the extracted features into a feature space in which the output is easier to classify.

CNN is good at reducing frequency variations, LSTM is good at temporal modeling, and DNN is suitable for mapping features to a more separable space. Therefore, CNN, LSTM, and DNN are complementary in modeling capabilities. Sainath et al. [25] used this complementarity to build a CLDNN network model for speech signal recognition, which performs better than any of the three models used alone. There is a natural similarity between speech signals and communication signals; indeed, a speech signal can be regarded as a kind of communication signal. In terms of data representation, a digital modulated signal, like the data in natural language processing, is a discrete correlated sequence in the time domain. Beyond the information it carries, however, the more important thing is its modulation information. Unlike in natural language processing, this modulation information is related only to the current symbol and a few adjacent symbols rather than to a long time sequence. Because the signal still has obvious timing features, this paper builds on [21] a CLDNN network model better suited to signal modulation recognition and uses it for feature extraction. Its structure is shown in Figure 3.

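In traditional feature extraction methods, only one channel of data is usually used for processing. In order to fully extract the subtle feature semantic information of the received signal, this study converts the received complex sequence $\mathbf{r}$ into two channels of data:

$$I = \operatorname{Re}(\mathbf{r}), \quad Q = \operatorname{Im}(\mathbf{r})$$

where $I$ is the real part of the complex modulated signal and $Q$ is the imaginary part. The two-channel data is stored as a two-dimensional matrix that serves as the input of the feature extraction network, that is, $\mathbf{X} \in \mathbb{R}^{2 \times N}$, where $N$ is the signal length, as shown in

$$\mathbf{X} = \begin{bmatrix} I(1) & I(2) & \cdots & I(N) \\ Q(1) & Q(2) & \cdots & Q(N) \end{bmatrix}$$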
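The conversion described above can be sketched in a few lines of NumPy. The helper name `to_iq_matrix` and the three-sample input are hypothetical and serve only to show the resulting $2 \times N$ layout.

```python
import numpy as np

def to_iq_matrix(r):
    """Stack the real (I) and imaginary (Q) parts of a complex sequence
    into a 2 x N real matrix."""
    r = np.asarray(r)
    return np.stack([r.real, r.imag], axis=0)

# Hypothetical three-sample complex sequence.
r = np.array([1 + 2j, 3 - 1j, -0.5 + 0.25j])
x = to_iq_matrix(r)   # row 0 holds I, row 1 holds Q
```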

The two-dimensional matrix of the digital modulation signal is input into the CNN, and feature extraction and dimensionality reduction at this stage are completed by convolution and pooling operations. To suit the input sequence, this study uses the one-dimensional convolution kernel commonly used for sequences, instead of the traditional two-dimensional convolution kernel, to extract features, as shown in Figure 4. The one-dimensional convolution is expressed as

$$x_j^{l}(n) = f\left(\sum_{i=1}^{C}\sum_{k=1}^{K} w_{ij}^{l}(k)\,x_i^{l-1}(n+k-1) + b_j^{l}\right)$$

where $x_j^{l}$ is the $j$th feature of the $l$th layer, $w_{ij}^{l}$ is the weight of the $j$th feature of the $l$th layer, $b_j^{l}$ is the offset of the $j$th feature in the $l$th layer, $C$ is the number of feature maps in the $(l-1)$th layer, $K$ represents the size of the one-dimensional convolution kernel, and $f(\cdot)$ represents the activation function. In this study, the main features of the modulated signal are extracted by 7 layers of one-dimensional convolution kernels. The numbers of convolution kernels are 64, 128, 128, 256, 256, and 512, respectively.

To reduce the size of the model and increase the calculation speed, this study uses one-dimensional max pooling with a stride of 2 to downsample the convolutional feature maps. The operation can be expressed as

$$x^{l} = \operatorname{down}\left(x^{l-1}\right)$$

where $x^{l-1}$ and $x^{l}$ represent the feature maps of the $(l-1)$th and $l$th layers, respectively, and $\operatorname{down}(\cdot)$ represents the downsampling.
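To make the convolution and pooling operations concrete, the sketch below implements one one-dimensional convolutional layer and one max pooling step in plain NumPy. Several choices are assumptions not stated in the text: ReLU as the activation function, no padding, a kernel size of 3, and the toy input; the function names are hypothetical.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """One 1-D convolutional layer (no padding) followed by ReLU.

    x : (C_in, N) input feature maps
    w : (C_out, C_in, K) kernels
    b : (C_out,) offsets
    """
    c_out, c_in, k = w.shape
    n_out = x.shape[1] - k + 1
    y = np.empty((c_out, n_out))
    for n in range(n_out):
        # Sum over input maps and kernel taps for every output map at once.
        y[:, n] = np.tensordot(w, x[:, n:n + k], axes=([1, 2], [0, 1])) + b
    return np.maximum(y, 0.0)  # ReLU is an assumption

def max_pool1d(x, size=2, stride=2):
    """1-D max pooling along the last axis."""
    n_out = (x.shape[1] - size) // stride + 1
    return np.stack(
        [x[:, i * stride:i * stride + size].max(axis=1) for i in range(n_out)],
        axis=1,
    )

# Toy example: a 2 x 8 I/Q-like input, four all-ones kernels of size 3.
x = np.arange(16, dtype=float).reshape(2, 8)
w = np.ones((4, 2, 3))
b = np.zeros(4)
y = conv1d_relu(x, w, b)   # shape (4, 6)
p = max_pool1d(y)          # shape (4, 3)
```

With a stride of 2, each pooling step roughly halves the sequence length, which is what shortens the sequence passed on to the later stages.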

After the input data vector is convolved and pooled, the feature space at this time can be expressed as . The long input information is converted into a shorter high-level feature sequence as the input of the LSTM module, using its features to learn the features of several adjacent symbols and input into the DNN network module, and the extracted features are mapped to a feature space that is easier to separate, and the final feature vector is obtained. The LSTM and DNN network parameter settings are shown in Table 1.

2.3. Fusion and Classification

The previous section introduced how to extract the features of the digital modulation signals at different receiver nodes. To make full use of the semantic information of the different receivers in the communication network and further improve the accuracy of signal recognition, this section introduces how the feature semantics of different receiver nodes are fused and used for collaborative identification.

Feature-level fusion refers to a fusion method that is completed by integrating or combining features from all nodes. Its purpose is to use the complementarity of each single node semantic information to synthesize the extracted features into a feature that is more discriminative than the input feature.

This study uses the aforementioned method to extract features from the received data, obtaining a 128-dimensional feature for each receiver node’s data as the feature vector of that node’s signal sequence. The resulting fused feature vector is

$$F = \Phi(f_1, f_2, \ldots, f_M)$$

where $F$ represents the feature vector after fusion, $f_i$ represents the data feature vector of the $i$th receiver obtained using the feature extraction method in this study, with a size of $1 \times 128$ for each sample, $\Phi(\cdot)$ represents the fusion operation on the feature vectors of the nodes, $M$ is the number of receivers, and the size of the fused feature vector is $M \times 128$.
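The fusion step can be illustrated with a short NumPy sketch. The exact fusion operator is not spelled out in the text, so the sketch assumes the simplest reading: the per-receiver 128-dimensional vectors are either stacked row by row or concatenated into one long vector. The function name `fuse_features`, the `mode` argument, and the random features are hypothetical.

```python
import numpy as np

def fuse_features(features, mode="stack"):
    """Fuse per-receiver feature vectors.

    features : list of M arrays, each of length d (128 in this study)
    mode     : "stack"  -> (M, d) matrix, one row per receiver
               "concat" -> flat vector of length M * d
    """
    f = np.stack([np.asarray(v, dtype=float) for v in features], axis=0)
    if mode == "stack":
        return f
    if mode == "concat":
        return f.reshape(-1)
    raise ValueError(f"unknown mode: {mode}")

# Random stand-ins for the features of five receivers.
rng = np.random.default_rng(0)
per_receiver = [rng.standard_normal(128) for _ in range(5)]
fused = fuse_features(per_receiver)                 # shape (5, 128)
flat = fuse_features(per_receiver, mode="concat")   # shape (640,)
```

Either layout keeps every receiver's features intact, leaving the removal of redundancy to the subsequent dimensionality reduction.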

After extracting effective feature vectors from single-node receiver data and analyzing them, perform feature vector fusion on multinode feature data. The obtained feature fusion vector can reduce the influence of channel quality and signal strength on the extracted features and can represent more modulation information about the signal than the vector extracted by a single node. But at the same time, multiple nodes also increase the complexity of problem analysis. Therefore, it is necessary to process the fusion features, so as to reduce the feature parameters while retaining the effective features to the greatest extent and complete the comprehensive analysis of the feature data. In summary, it is necessary to reanalyze the closely related feature parameters, eliminate redundant feature quantities, and finally realize the information contained in each feature with fewer comprehensive feature parameters. This study uses the same network model for feature extraction on different receiver node data, so the Principal Component Analysis (PCA) algorithm, which is often used in high-dimensional vector analysis, is used to process the fused feature data.

Assuming that the number of receiver nodes is $M$, the feature of each node has a dimension of $n$, the data of each node can be expressed as $x_i$, the data feature of each node can be expressed as $f_i \in \mathbb{R}^{n}$, and the error in the sample mapping process can be expressed as

$$E = \frac{1}{M}\sum_{i=1}^{M}\left\| f_i - \hat{f}_i \right\|^{2}$$

where $\hat{f}_i$ represents the new feature after mapping, and the dimension remains unchanged. The process of PCA is as follows.

(1) Feature normalization

Normalize the training samples to obtain the normalization parameters, and then normalize the test samples with the same parameters.

(2) Calculate the covariance matrix of the samples

$$\Sigma = \frac{1}{M}\sum_{i=1}^{M} f_i f_i^{T}$$

Use the singular value decomposition method to calculate the eigenvalues and eigenvectors of the covariance matrix:

$$\Sigma = U S V^{T}$$

where $U$ is the dimensionality reduction matrix, whose columns are the eigenvectors of the covariance matrix in one-to-one correspondence with the eigenvalues on the diagonal of $S$. Its dimension is $n \times n$, and if the first $k$ columns of the matrix are selected as $U_{\mathrm{reduce}}$, the sample features are reduced to $k$ dimensions.

(3) Dimensionality reduction analysis

All nodes’ data samples can be expressed as $X = [f_1, f_2, \ldots, f_M]^{T}$, and the dimensionality-reduced feature matrix $Z$ is obtained according to

$$Z = X\,U_{\mathrm{reduce}}$$

where the dimension of $X$ is $M \times n$ and the dimension of $U_{\mathrm{reduce}}$ is $n \times k$; the matrix dimension after dimensionality reduction is therefore $M \times k$. The size of the dimensionality reduction error mainly depends on the selection of $k$. The larger the value of $k$, the more eigenvectors are included in $U_{\mathrm{reduce}}$, the more of the original features are retained, and the smaller the error; however, redundant features are also retained, and the amount of calculation increases. To retain 99% of the system’s uncertainty (variance), $k$ can be determined from

$$\frac{\sum_{j=1}^{k} S_{jj}}{\sum_{j=1}^{n} S_{jj}} \geq 0.99$$

Through the above steps, the eigenvalues of the principal components can be determined, and the new feature space after dimensionality reduction and the new fusion feature vector can be obtained.
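The three PCA steps can be sketched with NumPy as follows. This is an illustrative sketch rather than the code used in this study: it assumes that each row of the input matrix is one fused training sample, the function names `fit_pca` and `apply_pca` are hypothetical, and the synthetic low-rank data only serves to exercise the 99% criterion.

```python
import numpy as np

def fit_pca(x_train, retain=0.99):
    """Fit PCA on the rows of x_train and keep the smallest k whose
    leading singular values cover `retain` of their total."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero
    xn = (x_train - mean) / std              # (1) feature normalization
    sigma = xn.T @ xn / xn.shape[0]          # (2) covariance matrix
    u, s, _ = np.linalg.svd(sigma)           # columns of u are eigenvectors
    ratio = np.cumsum(s) / np.sum(s)
    k = min(int(np.searchsorted(ratio, retain) + 1), len(s))
    return mean, std, u[:, :k]

def apply_pca(x, mean, std, u_reduce):
    """(3) Project normalized samples onto the retained directions."""
    return ((x - mean) / std) @ u_reduce

# Synthetic data: 200 samples in 10 dimensions that really live in 3.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 3))
x_train = latent @ rng.standard_normal((3, 10)) \
    + 1e-3 * rng.standard_normal((200, 10))
mean, std, u_reduce = fit_pca(x_train)
z = apply_pca(x_train, mean, std, u_reduce)
```

The normalization parameters and `u_reduce` are fitted on the training samples only and then reused unchanged for the test samples, matching step (1) above.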

The feature vector after the dimensionality reduction process reduces the redundancy of the feature semantics and retains the main semantic information. At this time, the feature vector dimension becomes 30. Input the fused dimensionality reduction feature vector into the classifier for classification. Then, the modulation method of the current sample can be obtained.

In machine learning, the function of a classifier is to judge the class to which a new observation belongs on the basis of labeled training data. In this paper, after the data features of the different nodes are extracted by the deep learning model, a classifier is used to determine the category of the unknown sample data. Commonly used classifiers include $k$-nearest neighbors (KNN), decision tree classifiers, and support vector machines (SVM). SVM does not rely on statistical methods, which simplifies the usual classification and regression problems, and it can find the key samples that are critical to the task, so this method is used in this paper for the final classification and recognition.

3. Results and Discussion

3.1. Experimental Dataset

In this section, a number of experiments are carried out to evaluate the performance of the proposed model and algorithm. The dataset used in the experiments is generated by simulation, and its parameters are shown in Table 2. Low-order modulated signal features are easier to extract, but high-order modulated signals are usually used in actual communication scenarios. The dataset therefore covers eight modulation modes, including easily confused ones. Each modulation type at each SNR includes 1000 instances, 80% of which are selected as the training set and 20% as the test set.
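The 80/20 partition can be sketched as below for one modulation type at one SNR. The random shuffling, the seed, and the function name `split_indices` are assumptions; the text only states the proportions.

```python
import numpy as np

def split_indices(n_instances=1000, train_fraction=0.8, seed=0):
    """Shuffle instance indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_instances)
    n_train = int(round(train_fraction * n_instances))
    return order[:n_train], order[n_train:]

# 1000 instances of one modulation type at one SNR.
train_idx, test_idx = split_indices()
```

Applying the split separately to every modulation type and SNR keeps the two sets balanced across classes and noise levels.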

3.2. Experimental Settings
3.2.1. Comparison of Feature Extraction Network Models

All experiments in this study are carried out in the TensorFlow framework on a GeForce RTX 2060 GPU. In all simulation experiments, the model is trained with the adaptive moment estimation (Adam) optimizer, and the learning rate is set to 0.0001. To prevent overfitting, an early stopping mechanism is added to the training process: when the loss has not decreased for 30 consecutive iterations, training is considered complete and the model is tested.
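The early stopping rule can be sketched independently of any framework. In the sketch below, a list of loss values stands in for a real training loop, and the function name `train_with_early_stopping` is hypothetical; whether the monitored quantity is the training or the validation loss is not stated in the text, so that choice is left to the caller.

```python
def train_with_early_stopping(losses, patience=30):
    """Return (last epoch run, best loss seen).

    losses   : iterable of loss values, one per epoch
               (stands in for a real training loop)
    patience : number of consecutive epochs without improvement tolerated
    """
    best = float("inf")
    wait = 0
    last = -1
    for epoch, loss in enumerate(losses):
        last = epoch
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:   # no improvement for `patience` epochs
                break
    return last, best

# The loss improves for three epochs and then stays flat.
history = [1.0, 0.8, 0.7] + [0.75] * 50
stopped_at, best_loss = train_with_early_stopping(history)
```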

To verify the effectiveness of the feature extraction module in this study, several different network models are trained and tested on the RadioML2016.10a dataset. The test results are shown in Figure 5.

As can be seen from Figure 5(a), on the open-source RadioML2016.10a dataset, the CLDNN network model (CLDNNnet) exhibits the best recognition performance. In terms of parameters, Figure 5(b) shows that ResNet and CLDNNnet have the fewest network parameters, which saves computing resources, and CLDNNnet outperforms ResNet in recognition performance. Therefore, the subsequent experiments in this paper use CLDNNnet to extract the features of the different nodes.

3.2.2. Distributed Collaborative Recognition Results and Analysis

In an actual sensor network, the signal energy and quality differ at each receiving node because of factors such as the transmission distance and the location of the monitoring nodes. In the simulation environment, this is mainly reflected in the different SNRs of the signals received at each node. Therefore, in this experiment, the channels are set to be independent, and the signal quality at different receivers under different channel states is simulated with Gaussian white noise of different power. The experimental verification is divided into the following two cases:
(a) Keep the average SNR of each node the same, and test the recognition effect of feature fusion with different numbers of receiver signals.
(b) Node 1 represents the node with the worst channel quality, and its SNR, $\mathrm{SNR}_1$, varies from -20 dB to 12 dB; each remaining node is 2 dB higher than the previous one, namely, $\mathrm{SNR}_2 = \mathrm{SNR}_1 + 2$ dB, $\mathrm{SNR}_3 = \mathrm{SNR}_1 + 4$ dB, ….
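The SNR assignment of case (b) can be written down directly. The sketch assumes that each further node is 2 dB better than the previous one and that node 1 is swept in 2 dB steps; the sweep step, the choice of five receivers, and the function name `receiver_snrs` are assumptions.

```python
def receiver_snrs(snr1_db, num_receivers, step_db=2):
    """SNR of each node when node 1 is the worst and every further node
    is step_db better than the previous one."""
    return [snr1_db + step_db * i for i in range(num_receivers)]

# Node 1 sweeps from -20 dB to 12 dB; five receivers per setting.
sweep = {snr1: receiver_snrs(snr1, 5) for snr1 in range(-20, 13, 2)}
```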

Figure 6(a) illustrates the recognition performance of different numbers of receivers under the same average SNR. As the number of receivers increases, the recognition performance after feature fusion improves steadily, and the improvement is most significant at low SNRs. With 7-receiver cooperative recognition, the average recognition accuracy over the 8 kinds of modulated signals reaches 81.3% at -4 dB, which is nearly 10% higher than the single-receiver recognition accuracy. This helps to realize modulation recognition of weak and poor-quality communication signals in noncooperative communication scenarios. The confusion matrix in Figure 6(b) shows that the recognition errors mainly come from confusion between 8PSK and 16PSK and between 16QAM and 64QAM, which have similar features.

Under the setting of experiment (b), the cooperative recognition accuracy of different numbers of receivers from -20 dB to 12 dB is shown in Figure 7. As the number of receivers increases, the accuracy improves steadily, and the improvement is most obvious when the SNR is low. Figure 8 shows that, for signals transmitted over different channels, at both low and high SNR, the recognition performance of the fused node always tends toward that of the best-performing receiver node. In a distributed sensor network, fused recognition of the features of different receiver nodes can largely eliminate the ambiguity of unknown signals. When the transmission channel conditions of a small number of signals deteriorate, a high recognition probability can still be maintained.

In Figure 9, receiver 1 has the worst channel conditions, and its recognition accuracy at 0 dB is only 77.31%, with serious confusion between 8PSK and 16PSK and between 16QAM and 64QAM. Receiver 3, which has better channel conditions, still shows serious confusion between 16QAM and 64QAM. This is because, despite the improved channel conditions, the model is still unable to discriminate accurately between confusable samples of signals with similar characteristics. Receiver 5, owing to good channel conditions and high received signal quality, achieves an average recognition accuracy of 99.3% at 0 dB. Although some receivers have poor channel conditions, the average recognition accuracy at 0 dB still reaches 99.44% when the five receivers cooperate, and there is basically no recognition confusion. The distributed architecture of the sensor network uses the complementarity of the data features of each receiver to synthesize the extracted features into a feature that is more discriminative than the input features, thereby greatly improving the recognition effect in harsh environments without affecting the performance of other receivers.

3.2.3. Comparison of Different Channel Transmission Conditions

In actual signal transmission on different channels, different magnitudes of frequency offset, phase offset, multipath fading, etc. are often generated. In order to further explore the identification method of distributed multireceiver node feature fusion in different channel environments, on the basis of the above dataset, different degrees of frequency offset, phase offset, and multiple fading delays are added to affect the channel. In order to facilitate comparison, experiments are carried out by adding noise of the same magnitude to each node. The parameters of the further generated dataset are shown in Table 3.

Datasets 1 and 2 are generated by simulation over Rayleigh fading channels. Different fading coefficients and delays are applied to simulate receivers placed at different distances from the transmitter, and the two datasets represent the signals received by receivers 1 and 2, respectively. In addition, two datasets with only a frequency offset or only a phase offset simulate channels with better transmission conditions and correspond to receiver 3 and receiver 4, respectively.
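
The channel impairments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the generator used for Table 3: the Rayleigh channel is modeled as a block-fading tapped delay line with one complex Gaussian coefficient per path, and the delays, gains, offsets, sample rate, and SNR are hypothetical values chosen for the example, as are all function names.

```python
import cmath
import math
import random


def apply_frequency_offset(iq, offset_hz, sample_rate_hz):
    """Rotate sample n by 2*pi*offset*n/fs to model a carrier frequency offset."""
    step = 2.0 * math.pi * offset_hz / sample_rate_hz
    return [x * cmath.exp(1j * step * n) for n, x in enumerate(iq)]


def apply_phase_offset(iq, phase_rad):
    """Rotate every sample by a constant phase."""
    rotation = cmath.exp(1j * phase_rad)
    return [x * rotation for x in iq]


def rayleigh_multipath(iq, delays, gains, rng):
    """Block-fading tapped delay line: each path gets one complex Gaussian
    coefficient (Rayleigh-distributed magnitude) held for the whole frame."""
    taps = [g * complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
            for g in gains]
    out = [0j] * len(iq)
    for delay, tap in zip(delays, taps):
        for n in range(delay, len(iq)):
            out[n] += tap * iq[n - delay]
    return out


def add_awgn(iq, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    power = sum(abs(x) ** 2 for x in iq) / len(iq)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)) / 2)  # per I/Q component
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in iq]


# Hypothetical example: one QPSK frame seen through three kinds of channel.
rng = random.Random(0)
qpsk = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4)))
        for _ in range(128)]
rx_fading = add_awgn(rayleigh_multipath(qpsk, [0, 2], [1.0, 0.5], rng), 8, rng)
rx_freq = add_awgn(apply_frequency_offset(qpsk, 100.0, 200e3), 8, rng)
rx_phase = add_awgn(apply_phase_offset(qpsk, math.pi / 8), 8, rng)
```

Here `rx_fading` plays the role of a poorly placed receiver such as receiver 1 or 2, while `rx_freq` and `rx_phase` correspond to the milder channels assigned to receivers 3 and 4. The same noise level is applied to all three, mirroring the equal-noise setup described above.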

Although the channels of a few nodes in the distributed system architecture deteriorate severely, the recognition performance improves after fusion, and the fused result always tends toward that of the best-performing node. There is even a certain degree of improvement over the best receiving node. As shown in Figure 10, the recognition performance of receivers 1 and 2 is far worse than that of receivers 3 and 4, but the fused result is significantly better.

The features extracted from the signals of different receivers are visualized after dimensionality reduction so that the effect of feature extraction can be observed more directly. The feature visualization at 8 dB is shown in Figure 11. Because the transmitted signal passes through a complex channel environment and is affected by a variety of factors, the signal quality at receivers 1 and 2 is poor and feature extraction is difficult. Figures 11(a) and 11(b) show that the signal features of these two receivers overlap seriously and cluster poorly. Accurate feature representations are difficult to obtain, which results in poor classification and recognition performance. When the features of these two nodes are fused, serious overlap remains and good recognition results are still difficult to achieve, as shown in Figure 11(c). However, when the signal features of the receivers with better transmission conditions are added, as shown in Figure 11(d), there is essentially no overlap. The signal features of each modulation type can be clearly distinguished, and the features of the same modulation type are tightly clustered, so better recognition results can be achieved. This experiment further illustrates that, although the distributed collaborative recognition framework based on feature fusion can improve the signal recognition performance to a certain extent, the recognition performance will not improve greatly when all of the fused transmission channels are severely deteriorated.
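
The dimensionality reduction step behind such a visualization can be sketched as follows. The text does not state which method was used for Figure 11, so this Python example assumes principal component analysis (PCA) as a stand-in and implements it with power iteration using only the standard library. The function names and the random demonstration data are hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _matvec(matrix, vector):
    return [_dot(row, vector) for row in matrix]


def _dominant_eigvec(matrix, rng, iters=300):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix."""
    v = [rng.gauss(0.0, 1.0) for _ in matrix]
    for _ in range(iters):
        w = _matvec(matrix, v)
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:  # degenerate (zero) matrix: keep the current direction
            break
        v = [x / norm for x in w]
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def pca_project_2d(features, seed=0):
    """Project feature vectors onto their two leading principal components."""
    rng = random.Random(seed)
    n, dim = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in features]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    pc1 = _dominant_eigvec(cov, rng)
    lam1 = _dot(pc1, _matvec(cov, pc1))
    # Deflation: remove the first component so power iteration finds the second.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(dim)]
                for i in range(dim)]
    pc2 = _dominant_eigvec(deflated, rng)
    points = [(_dot(r, pc1), _dot(r, pc2)) for r in centered]
    return points, (pc1, pc2)


# Hypothetical stand-in for extracted features: most variance lies on axis 0.
demo_rng = random.Random(1)
demo = [[demo_rng.gauss(0, 5.0), demo_rng.gauss(0, 1.0), demo_rng.gauss(0, 0.1)]
        for _ in range(200)]
points, (pc1, pc2) = pca_project_2d(demo)
```

Each entry of `points` is a 2-D coordinate that could be scattered and colored by modulation label. Well-separated, tight clusters in such a plot would correspond to the situation described for Figure 11(d), and overlapping clouds to Figures 11(a) to 11(c).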

4. Conclusions

To address the difficulty of accurately extracting signal features and perceiving signal semantic information under complex channel transmission in noncooperative communication scenarios, this paper exploits the strong feature extraction capability of DL algorithms to propose a multisensor distributed cooperative sensing architecture for spectrum semantic recognition based on feature fusion. The distributed wireless sensor network monitors the electromagnetic spectrum to perceive the semantic information of wireless communication signals. The simulation experiments show that the proposed distributed collaborative sensing and recognition architecture can mitigate the problems of single-node reception, namely the difficulty of accurately analyzing spectrum semantic information under complex channel conditions, poor adaptability to the environment, and low recognition performance. In particular, in adversarial noncooperative communication scenarios, the proposed algorithm is more conducive to extracting the features of weak signals and to achieving spectrum semantic perception and recognition. However, this solution still cannot guarantee good recognition performance in more complex signal transmission environments with higher recognition accuracy requirements. Since the datasets in these experiments are all generated by simulation, future work will apply the method to measured datasets to further verify the effectiveness of the algorithm. In addition, this paper implements a fusion scheme at the feature layer only. How to coordinate recognition across the data layer, feature layer, and decision layer without increasing the computing load is an important research direction.

Data Availability

The RadioML2016.10a dataset used in this paper is an open-source dataset described in Reference [6]. Other datasets used in this article can be obtained by contacting the email [email protected].

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


Acknowledgments

This work is supported by the National Natural Science Foundation of China (grant number 62001137), the Natural Science Foundation of Heilongjiang Province (grant number JJ2019LH2398), and the Foundation of Key Laboratory of Signal and Information System of Shandong Province.