Abstract

In wide-area distributed scenarios, it is particularly important to carry out information security situation awareness for the air traffic management (ATM) system with its integrated air-ground structure. The operation data of the communication, navigation and surveillance (CNS) equipment of the ATM system are multi-dimensional, complex, and strongly correlated. In situation awareness feature extraction, problems such as poor model accuracy, weak feature expression ability, and low classification performance arise. A feature association algorithm is designed to solve these problems, and based on it, a deep-related sparse autoencoder (DRSAE) model built on an improved sparse autoencoder is established. In the DRSAE model, L1 regularization and a Kullback–Leibler divergence (KLD) sparsity term are used to penalize the parameters of the encoder network, and the number of hidden layers is increased so that the model can optimize the global encoder network by iteratively training one encoder at a time. Moreover, the proposed DRSAE model is compared with other feature extraction models, such as principal component analysis (PCA), the autoencoder (AE), and the sparse autoencoder (SAE), using a support vector machine (SVM) classifier for evaluation. Compared with these models, the proposed DRSAE model shows good robustness in feature extraction for the ATM system, and the obtained features have strong expression ability, which enhances the classification performance of the model and facilitates situation awareness.

1. Introduction

The communication, navigation and surveillance (CNS) equipment is the key equipment of the air traffic management (ATM) system, and its safe operation is vital to ensuring stable information exchange within the ATM system. CNS equipment carries various types of ATM data, and its security involves all levels of the ATM system. In the process of interactive information transmission, the security threats to the CNS equipment of the ATM system generally include insecure physical access to IT infrastructure, unencrypted communication of satellite or ground systems, insufficient security configuration of IT equipment hardware and software, access by unauthorized personnel, malware infection, etc. [1]. If confidential information is subject to such security threats, the flight status of aircraft will be affected, and in severe cases the safety of aircraft navigation may be threatened. In addition, when the equipment fails due to human or design causes, the ATM information exchange business will be suspended.

The high-dimensional and complex operation data of ATM system equipment make it difficult for decision-making managers to perceive the security situation of the system. To ensure the availability, confidentiality, and integrity of the operation information of ATM system equipment, it is imperative to perceive the information security situation of the ATM system. The situation awareness of CNS equipment aims to reduce the effect on the operation of the ATM system of network threats, human operation errors, equipment failures, etc., so that managers can understand the security status of the system from a macro perspective in a timely manner and make correct decisions [2]. In situation awareness research, feature extraction is the first step in obtaining situation elements. Feature extraction transforms a multi-dimensional feature space into a universal low-dimensional feature space through linear or nonlinear transformation, which facilitates the subsequent situation assessment of the system [3].

The ATM system involves a wide range of users, so guaranteeing data security in the ATM system is fundamental. This paper starts with the communication system equipment, navigation system equipment, surveillance networks such as the Automatic Dependent Surveillance-Broadcast (ADS-B) system, and management networks such as the System-Wide Information Management (SWIM) platform of the ATM system, analyzes the security threats faced by each system's equipment and the existing system vulnerabilities, and studies the feature extraction of equipment operation data. The main innovations are as follows:

(i) The operation data of ATM equipment are multi-dimensional, voluminous, and time-space dependent, and there are correlations between the features. To prevent overfitting and excessive complexity, the data should be dimensionally reduced so that a simple, representable low-dimensional structure can represent them. Therefore, a feature association algorithm is proposed, which analyzes the correlation between data features and performs feature selection on the initial dataset to prevent overfitting of feature extraction models.

(ii) An improved feature extraction model, the deep-related sparse autoencoder (DRSAE) model, is proposed in combination with the feature association algorithm. The model uses multiple hidden layers to extract features, and the sparsity of the weights and of the neuron activation degrees in the hidden layers is limited by L1 regularization and the Kullback–Leibler divergence (KLD) to increase the accuracy of encoder feature extraction.

The remainder of this article is organized as follows. The second section analyzes existing research on autoencoder-based feature extraction methods. The third section describes the proposed feature association algorithm and the DRSAE model and elaborates on the principle and structure of the model. In the fourth section, the ATM dataset is used for feature association analysis and feature extraction experiments, a support vector machine (SVM) classifier is used to evaluate the classification performance of the feature extraction models, and the superiority of the proposed DRSAE model in ATM feature extraction is demonstrated through simulation experiments on the ATM system. Finally, the fifth section summarizes the contributions and limitations of this paper and outlines future work.

2. Related Work

The ATM system is vulnerable to network attacks because of its various types of business and the multi-source heterogeneity of equipment operation data, and the overall equipment architecture is complex. Once key equipment has problems, the stability of the whole ATM system is affected. Therefore, the information security of the ATM system has received extensive attention in the aviation field. The Federal Aviation Administration (FAA) and other stakeholders are experimenting with data interconnection and sharing. ATM requires the highest level of resilience and multi-level redundancy and operates in a dynamic environment; this fragile environment exposes ATM to new attack vectors. Moreover, connecting some legacy ATM systems to the network introduces serious threats and vulnerabilities [4]. For the information security of the ATM system, Chivers [5] proposed control consistency as a method to check and exchange control set information in the management system and studied the systematic defects of ATM security. The ADS-B surveillance system within ATM is vulnerable to security threats; Leonardi et al. [6] proposed using the carrier phase of the transmitter as a feature to determine the type of aircraft, thereby distinguishing legitimate from false information. At present, international research on feature extraction for the security situation of CNS equipment in the ATM system is still at an initial stage, and no systematic theoretical framework has been formed. Therefore, this paper is devoted to studying feature extraction methods for the ATM system security situation.

Traditional linear feature extraction maps the initial features to a lower dimension through linear projection. This approach can no longer meet the requirements of processing the multi-dimensional data generated by complex information exchange in the era of big data: after dimensionality reduction, it can only ensure that the mathematical relationships between data remain relatively unchanged. Nonlinear dimensionality reduction, in contrast, preserves the feature information of complex nonlinear data during dimensionality reduction, ensuring that the essential features of the original data do not change; it belongs to nonlinear manifold learning [7]. Nonlinear dimensionality reduction based on deep learning does not rely on manual effort or expert knowledge for feature extraction but uses hyperparameter-adaptive feedback training to obtain the optimal model. Autoencoders show significant advantages in deep learning-based feature extraction, and many scholars have refined them to improve the accuracy of feature extraction models. Xi et al. [8] used a feature correlation-based autoencoder (AE) anomaly detection method to reduce the impact of correlations between data and constructed a data association model combined with a graph neural network to fuse sample features and increase the detection precision of the model. Liu et al. [9] proposed a batch-normalized stacked sparse autoencoder (SSAE) method to diagnose equipment faults, which has better detection ability than other methods. Lee et al. [10] proposed a feature extraction method based on a deep unsupervised sparse autoencoder (SAE) for data classification, which improved classification performance and detection speed, although the performance on sparse classes was worse than on other classes. Xu et al. [11] proposed a deep belief sparse autoencoder (DBSAE) that captures features of unlabeled dissolved gas analysis (DGA) raw data, with a supervised back propagation network used to implement transformer fault diagnosis. Marir et al. [12] proposed a new stacked denoising SAE method, implemented with a Spark-based iterative simplification paradigm, to improve detection performance and algorithm efficiency. Tang et al. [13] considered the full structure of features in the AE, constraining it by adding low-rank properties and a Laplacian structure on the features. To address the lack of fault datasets, fuzzy features, interaction between components, and coupling of fault features, Xie et al. [14] proposed an improved SAE method combined with a multi-level denoising strategy to diagnose electromagnetic interference faults; the method uses a relational constraint to limit the relationships between electromagnetic interference data, and multi-level denoising is carried out on fault data to enhance the ability of feature expression. Feng and Duarte [15] proposed an unsupervised feature selection method based on graphs and AEs to perform spectrogram analysis and feature extraction. Han et al. [16] proposed an ensemble autoencoder (EAE) model based on the SAE and the denoising autoencoder (DAE) and used a convolutional neural network (CNN) pooling layer to control model overfitting, but this model uses only a single hidden layer, and its feature extraction accuracy is not high. Miao et al. [17] designed a sparse representation convolutional autoencoder (SRCAE) model for equipment fault analysis, which made the performance of SRCAE better than that of a deep neural network (DNN), but the proposed model did not consider the multi-source heterogeneity of the equipment data.

The operation data of ATM system equipment are multi-source, complex, and feature-correlated. To transform complex, redundant high-dimensional data into low-dimensional data that are easier to study, and to realize situation awareness of the ATM system so as to provide a decision basis for managers, it is indispensable to study a situation feature extraction method suitable for the ATM system.

3. Materials and Methods

A feature selection method can reduce the data dimension, and its output dimensions are simply a subset of the input dimensions. The AE is different: by learning the potential relationships between data, it reconstructs data with a structure similar to the input, and it is characterized by input-output correlation, lossy output, and unsupervised, automatic learning. The AE uses an artificial neural network to reduce the dimension of the data by minimizing the reconstruction loss, which does not involve dataset annotation and thus reduces the workload of manual or automatic labeling.

3.1. Autoencoder

A typical AE mainly consists of an encoder, a hidden unit, and a decoder, with the encoder and decoder connected by the hidden unit of the neural network. The AE uses back propagation to drive the target output toward a structure approximately equal to the input. Feature extraction focuses mainly on the structure of the encoder: after the encoder network is trained, the data can be converted into a new feature representation, namely, the features obtained by dimensionality reduction [18]. In addition, output results similar to the input data are obtained by reconstructing the features through the decoder. Figure 1 shows the architecture of an AE.

The encoder in an AE is the mapping from the input layer $X \in \mathbb{R}^n$ to the hidden layer $H \in \mathbb{R}^m$, that is, the encoding process, where m is the number of neurons in H and n is the feature dimension of X. The function f(x) of H can be formulated as follows:

$$h = f(x) = a_e(W_e x + b_e) \quad (1)$$

where $a_e(\cdot)$ is the nonlinear activation function of the encoder, which converts the input data into the output signal through a nonlinear transformation, and $W_e$ and $b_e$, respectively, represent the weight and bias of the mapping from X to H.

The decoder in the AE is a reconstruction mapping from the hidden layer $H \in \mathbb{R}^m$ to the output layer $Y \in \mathbb{R}^n$, that is, the decoding process. The function g(h) of Y is shown as follows:

$$y = g(h) = a_d(W_d h + b_d) \quad (2)$$

where $a_d(\cdot)$ is the nonlinear activation function of the decoder, and $W_d$ and $b_d$, respectively, represent the weight and bias of the mapping from H to Y.

The above values of $W_e$, $b_e$, $W_d$, and $b_d$ can be obtained by training the whole AE model. The AE is trained to drive the reconstruction error between X and Y toward zero, so as to obtain the feature representation in the hidden layer H that best captures the relationships within the input data X, that is, so that $Y \approx X$ as far as possible. The mean squared error (MSE) [19] can represent the similarity between X and Y:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 \quad (3)$$

where $x_i$ and $y_i$ are data items of X and of the target output Y, respectively, and n is the number of data units of X and Y.
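To make this concrete, the following is a minimal sketch of eqs. (1)-(3) in Keras, the framework used later in Section 4; the 8-to-4 dimensions and the sigmoid activations here are illustrative assumptions, not the tuned values of this paper.

```python
# Minimal AE sketch: encoder eq. (1), decoder eq. (2), MSE loss eq. (3).
# Dimensions (n=8, m=4) are illustrative assumptions.
from tensorflow.keras import layers, models

n, m = 8, 4                                       # input dimension n, hidden neurons m
x_in = layers.Input(shape=(n,))                   # input layer X
h = layers.Dense(m, activation="sigmoid")(x_in)   # eq. (1): h = a_e(W_e x + b_e)
y = layers.Dense(n, activation="sigmoid")(h)      # eq. (2): y = a_d(W_d h + b_d)
ae = models.Model(x_in, y)
ae.compile(optimizer="adam", loss="mse")          # eq. (3): reconstruction MSE
# ae.fit(X, X, epochs=100, batch_size=64)         # unsupervised: target equals input
```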

3.2. Feature Association Algorithm

The operation data of ATM are multi-source and heterogeneous, the complexity of the data space is high, the dependence between data features is strong, and the correlation between data attributes is prominent. During network training, because the features interact, some data attributes may play a greater role in the overall data relationships than others, which affects the features extracted by the encoder to a certain extent and makes it impossible to fully represent the potential regularities in the data. This paper uses a method based on data attributes to establish a feature association model, so as to decouple attributes and features and make the attribute associations clearly expressed [20].

Since some attribute features interact with each other, each attribute is regarded as an independent variable in this paper so that data attributes can be represented accurately. The established feature association model based on data attributes is shown in Figure 2.

First, let the original feature set of the input data X be $T = \{t_1, t_2, \ldots, t_k\}$, where k is the total number of attributes of X, that is, the initial data feature dimension. Secondly, the feature set T is divided according to the correlation of the actual meanings of the attributes and is further expressed as $T = \{R_1, R_2, \ldots, R_n\}$, where n is the number of attribute types into which the feature set T is divided and $R_i$ represents a class of features related by attribute; then $R_1 = \{t_1, \ldots, t_{p_1}\}$, $R_2 = \{t_{p_1+1}, \ldots, t_{p_1+p_2}\}$, …, $R_n = \{t_{k-p_n+1}, \ldots, t_k\}$, where $p_1, p_2, \ldots, p_n$ are the numbers of attributes of the n types of features, respectively.

Taking the features $t_1$ and $t_2$ in the $R_1$ class as an example, the degree of correlation between $t_1$ and $t_2$ is measured by the correlation coefficient between $t_1$ and $t_2$. The correlation [21] between features $t_1$ and $t_2$ can be formulated as follows:

$$c(t_1, t_2) = \frac{\sum_{i=1}^{n'}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n'}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n'}(y_i - \bar{y})^2}} \quad (4)$$

where $x_i$ is a data item of the feature variable $t_1$, $y_i$ is a data item of the feature variable $t_2$, and $n'$ is the total number of input data items. The correlation coefficient matrix $C_1$ of the features in $R_1$ is shown as follows:

$$C_1 = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p_1} \\ c_{21} & c_{22} & \cdots & c_{2p_1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{p_1 1} & c_{p_1 2} & \cdots & c_{p_1 p_1} \end{bmatrix} \quad (5)$$

where $p_1$ is the number of attributes of the $R_1$ class features. The attributes of the data are completely correlated with themselves, so in (5) above $c_{ii} = 1$, and the correlation coefficient matrix $C_1$ of $R_1$ can be rewritten as follows:

$$C_1 = \begin{bmatrix} 1 & c_{12} & \cdots & c_{1p_1} \\ c_{21} & 1 & \cdots & c_{2p_1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{p_1 1} & c_{p_1 2} & \cdots & 1 \end{bmatrix} \quad (6)$$

The result of calculating the correlation c lies in the range [−1, 1]. As c approaches 1, the two feature variables are positively correlated; as c tends to −1, they are negatively correlated; and as c tends to 0, the two feature variables do not influence each other.

The correlation coefficient matrix C of each type of feature is obtained by performing correlation analysis on the n types of features, and a threshold value is set. Excluding the diagonal elements of C, when the absolute value of any other element of C is greater than or equal to this threshold, the two features of that element are considered strongly correlated. The mean and variance of the two features' data items are then compared; since the size of the variance determines the degree of influence of a feature on the overall data, the feature with the smaller variance is deleted, that is, the feature that has less impact on the overall data is removed to reduce the feature dimension.

After the correlation analysis is performed on each type of feature, the correlation analysis of the whole dataset needs to be performed again. If the resulting correlation coefficient matrix shows a correlation between attribute features in different categories, take feature $t_1$ in the $R_1$ class and feature $t_{p_1+1}$ in the $R_2$ class as an example, that is, the element $c(t_1, t_{p_1+1})$. When $|c(t_1, t_{p_1+1})|$ is greater than or equal to the threshold, compare the attribute numbers $p_1$ and $p_2$ of the $R_1$ and $R_2$ classes. When $p_1 < p_2$, the $R_2$ class has more attributes than the $R_1$ class, so to maintain a relative balance in the number of attributes across feature classes, the feature $t_{p_1+1}$ of the $R_2$ class is deleted. When $p_1 > p_2$, the principle is the same. When $p_1 = p_2$, deleting the feature $t_1$ of the $R_1$ class or the feature $t_{p_1+1}$ of the $R_2$ class has the same effect.

The feature association algorithm processes the data before feature extraction, which is equivalent to performing feature selection on the data, and it alleviates the influence of data correlations that would otherwise affect the accuracy and robustness of feature learning.
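As an illustration, the following is a sketch of the within-class step of the feature association algorithm, assuming the dataset is held in a pandas DataFrame; the function name and the `groups` mapping are hypothetical, and the cross-class step (comparing the class sizes $p_1$ and $p_2$) would follow the same pattern on the pruned columns.

```python
# Sketch of the within-class feature association step (eqs. (4)-(6) plus
# the variance rule). Names here are illustrative, not the authors' code.
import pandas as pd

def drop_correlated(df: pd.DataFrame, cols, threshold=0.9):
    """For one feature class R_i: if |c| >= threshold for a pair of
    features, delete the one with the smaller variance."""
    corr = df[cols].corr()                  # Pearson correlation matrix, eq. (5)
    keep = list(cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in keep and b in keep and abs(corr.loc[a, b]) >= threshold:
                keep.remove(a if df[a].var() < df[b].var() else b)
    return keep

# Hypothetical division of the 9 ATM attributes into the classes R1, R2, R3:
# groups = {"R1": ["PH", "NH", "NOR"], "R2": ["TN", "NF", "EIR"],
#           "R3": ["NA", "NS", "NG"]}
# selected = [c for cols in groups.values() for c in drop_correlated(df, cols)]
```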

3.3. DRSAE Feature Extraction Model

In the training process of a traditional AE, the MSE may become too small, which leads to overfitting of the model, weak generalization ability of the network, and an inability to learn important data features effectively. When the dimension m of the hidden layer is greater than or equal to the dimension n of the input layer, the AE may simply copy the input to the output. The SAE therefore adds a sparsity penalty constraint to the loss function of the encoding network so that the encoder can obtain a high-dimensional, deeper feature representation; by limiting the weights W between the input layer and the hidden layer, the generalization ability of the encoder is improved to some extent [22].

The SAE generally uses L1 regularization as the encoder constraint and the sigmoid function as the hidden layer activation function. Owing to the properties of the sigmoid activation function, the original optimal solution acquires different offsets, so that in the end some elements of the hidden layer output are close to 0, that is, in the inactive state, while others are close to 1, that is, in the active state, which makes the encoding network sparse and avoids overfitting of the model. L1 regularization constrains the weight vector $w_i$, and the calculation equation is as follows:

$$\Omega_{L1} = \lambda \sum_{i=1}^{n} \|w_i\|_1 \quad (7)$$

where $\lambda$ is the sparsity constraint, which is used to control the degree of regularization, and $w_i$ is the weight of the hidden layer H for each input sample $x_i$. After the L1 regularization restriction, the loss function of the encoder is shown as follows:

$$L = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 + \lambda \sum_{i=1}^{n} \|w_i\|_1 \quad (8)$$
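In Keras, this L1 weight penalty can be attached directly to the encoding layer; a minimal sketch follows, where the value 0.0005 for λ anticipates the tuning result reported in Section 4.4 and is otherwise an assumption.

```python
# Sketch: eq. (8) realized by attaching an L1 penalty (eq. (7)) to the
# encoder weights; lambda = 0.0005 follows the tuning in Section 4.4.
from tensorflow.keras import layers, regularizers

encoder_layer = layers.Dense(
    4, activation="sigmoid",
    kernel_regularizer=regularizers.l1(0.0005))  # adds lambda * sum|w| to the loss
```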

In addition to using L1 regularization to constrain the loss function, the relative entropy, namely, the Kullback–Leibler divergence (KLD), can also be used as a penalty term that asymmetrically measures the difference between the probability distributions of X and the target Y to limit the sparsity of the network. The KLD [23] is defined as follows:

$$\mathrm{KL}(p_x \| p_y) = \sum_{i} p(x_i)\log\frac{p(x_i)}{p(y_i)} \quad (9)$$

where $p_x$ and $p_y$ are Bernoulli distributions of the random variables x and y, and $p(x_i)$ and $p(y_i)$ are the probability distribution functions of $x_i$ and $y_i$. In the SAE, the average activation degree $a_j$ of the jth neuron in H over the input samples is shown as follows:

$$a_j = \frac{1}{n}\sum_{i=1}^{n} a_{h(j)}(x_i) \quad (10)$$

where h(j) is the jth component of the matrix vector of H and $a_{h(j)}$ is the overall activation degree of the jth neuron in H; when the input sample is $x_i$, $a_{h(j)}(x_i)$ indicates the activation degree of the jth neuron in H, and its value depends on the weight and bias in (1).

To ensure the sparsity of the network, the neurons in H need to be in an inhibited state most of the time; that is, the number of neurons in the active state must be much smaller than the number in the inactive state. Let a be the expected average activation degree of the neurons in H for the input samples, which is a value close to 0 (usually set to 0.05). In an ideal state, the actual average activation degree $a_j$ of each neuron should equal the sparsity parameter a, and the KLD of a and $a_j$ is shown as follows:

$$\mathrm{KL}(a \| a_j) = a \log\frac{a}{a_j} + (1-a)\log\frac{1-a}{1-a_j} \quad (11)$$

The smaller $\mathrm{KL}(a \| a_j)$ is, the smaller the difference between the input and output distributions of the SAE and the smaller the loss of the encoder network. The target loss function of the SAE is shown as follows:

$$L_{\mathrm{SAE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 + \mu \sum_{j=1}^{m} \mathrm{KL}(a \| a_j) \quad (12)$$

where $\mu$ is the penalty constraint parameter imposed on $a_j$ when $a_j$ is not equal to a, that is, the penalty factor of the penalty term $\sum_{j}\mathrm{KL}(a \| a_j)$, which represents the sparsity degree of the neurons in H.
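Keras has no built-in KLD sparsity penalty of this form, so a custom activity regularizer is one way to realize eq. (12); the sketch below is our own illustrative implementation, with the clipping added only to keep the logarithms finite.

```python
# Sketch of the KLD sparsity penalty in eq. (12) as a Keras activity
# regularizer; the class itself is illustrative, not the authors' code.
import tensorflow as tf
from tensorflow.keras import regularizers

class KLSparsity(regularizers.Regularizer):
    """Adds mu * sum_j KL(a || a_j) to the loss, eqs. (11)-(12)."""
    def __init__(self, a=0.05, mu=1.0):
        self.a, self.mu = a, mu
    def __call__(self, h):
        a_j = tf.reduce_mean(h, axis=0)                  # eq. (10), batch mean
        a_j = tf.clip_by_value(a_j, 1e-7, 1.0 - 1e-7)    # keep log() finite
        kl = (self.a * tf.math.log(self.a / a_j)
              + (1 - self.a) * tf.math.log((1 - self.a) / (1 - a_j)))
        return self.mu * tf.reduce_sum(kl)               # eq. (11) summed over j

# Usage: layers.Dense(4, activation="sigmoid",
#                     activity_regularizer=KLSparsity(a=0.05, mu=1.0))
```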

When the encoder uses the sigmoid activation function, the output feature values of H lie in the range (0, 1); the loss function can then be calculated using the KLD, but the network is prone to gradient saturation. When the encoder uses the rectified linear unit (ReLU) activation function, there is no vanishing gradient problem, but the output feature values of the hidden layer lie in the range [0, +∞), and the denominator in the KLD may be 0. Therefore, the KLD cannot be used to constrain the model in that case, and only L1 regularization can be used to make the encoder learn the sparsity characteristics of the data.

In addition, when training the encoder network, the setting of the hyperparameters, namely, the sparsity parameters, affects the sparsity of the network's target output. The stochastic gradient descent (SGD) algorithm and adaptive moment estimation (Adam) optimizers are usually used to train the parameters iteratively through back propagation to determine the optimal parameter values and thus minimize the loss function.

In this paper, the SAE is used to extract features from the multi-dimensional data of the ATM system, converting the equipment operation data from a multi-dimensional space to a one-dimensional space or to a dimensionality favorable for subsequent research. By improving the hidden layer of the original AE, the SAE treats some nodes of each unit in the hidden layer as inactive and studies only the correlations and characteristics of the other nodes in that layer [24], which can improve the accuracy of the encoder's feature extraction.

Increasing the number of hidden layers in the SAE allows the encoder to learn more useful hidden structures and representations of the data, turning the SAE into a deep sparse autoencoder (DSAE), which is more accurate for system feature extraction than an SAE with a single hidden layer. Therefore, to address the coupling between features of the ATM system, this paper increases the number of hidden layers in the SAE, improving the quality of the data fed into the network and training the autoencoder layer by layer. Finally, a DSAE feature extraction model is formed. The structure of the DSAE is shown in Figure 3.

The DSAE uses the MSE to measure the similarity between the input and output of the autoencoder and uses L1 regularization to impose a regularization constraint on the encoder, which makes the encoder generate a sparse weight matrix. Combined with the KLD, the activation degree of the neurons in H of the encoder is then limited to increase the accuracy of the feature extraction model. Since the gradient is computed by averaging partial derivatives, which only requires summing the partial derivatives of all terms, the loss function can be expressed directly as the sum of its components. According to (8) and (11), the objective loss function of the DSAE is shown as follows:

$$L_{\mathrm{DSAE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 + \lambda \sum_{i=1}^{n}\|w_i\|_1 + \mu \sum_{j=1}^{m} \mathrm{KL}(a \| a_j) \quad (13)$$

In addition, the data in this paper are sparse, features that occur with low frequency need larger updates, and the learning rate should gradually decrease with the number of updates. Therefore, the Adam optimizer is used for the encoder parameters to compute an adaptive learning rate for each parameter, and the optimal parameter combination is found through back propagation during training.

On the basis of DSAE, the proposed feature association algorithm is integrated into DSAE to form DRSAE feature extraction model. The established DRSAE model is shown in Figure 4.

Since the DRSAE model integrates the feature association algorithm, the initial input of the model changes, and the loss function of the improved DRSAE model changes accordingly. According to (13), the loss function of the DRSAE is as follows:

$$L_{\mathrm{DRSAE}} = \frac{1}{s}\sum_{i=1}^{s}(x_i - y_i)^2 + \lambda \sum_{i=1}^{s}\|w_i\|_1 + \mu \sum_{j=1}^{m'} \mathrm{KL}(a \| a_j) \quad (14)$$

where $x_i \in X'$, $y_i \in Y'$, $X'$ and $Y'$ are the input and output datasets of the model after feature association selection, respectively, $X' \subseteq X$, $Y' \subseteq Y$, s is the dimension of the model's input and output data after feature association selection, $m'$ is the number of neurons in the hidden layer after feature association selection, and $s \leq n$.

The algorithm steps of the DRSAE feature extraction model are as follows:

(i) Normalize the dataset.

(ii) Perform feature correlation analysis on the data: obtain the correlation coefficient matrix of each type of feature by calculating the correlations, compare the correlations against the threshold, and delete the features that have little impact on the overall data to obtain a new dataset.

(iii) Divide the data into a training set and a testing set according to the size of the dataset, and input the training set into the DSAE.

(iv) Train the encoder models one by one, learning the features of the data in an unsupervised manner through the fully connected layers of the autoencoder so that the reconstruction loss between output and input tends to zero as far as possible.

(v) Use the encoder output weights of the previously trained associative SAE as the encoder input weights of the current associative SAE, and train the current associative SAE.

(vi) Connect the trained encoder layers of each SAE to form the DRSAE, initialize the parameters of the entire DRSAE model with the previously trained model parameters, and carry out global optimization with the Adam optimizer.

(vii) After training the model, extract the output of the last hidden layer of the encoder, that is, the final extracted features; input these features and their corresponding labels into the chosen classifier, and carry out supervised learning on the labeled sample data.

(viii) After training the classifier, obtain the classification result of the model.
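A condensed sketch of steps (iii)-(vii) is given below; it reuses the KLSparsity regularizer sketched earlier, takes the hidden dimensions (8, 7, 4) and the batch size and epoch settings from the tuning in Section 4.4, and uses illustrative names such as pretrain_layer, x_train, and y_train that are not from the original paper.

```python
# Sketch of the DRSAE training pipeline: greedy layer-wise pretraining
# (steps (iv)-(v)), stacking (step (vi); global fine-tuning omitted for
# brevity), and SVM classification (step (vii)). Names are illustrative.
from tensorflow.keras import layers, models
from sklearn.svm import SVC

def pretrain_layer(codes, dim_out, epochs=100, batch_size=64):
    """Train one SAE on `codes`; return its encoder and the new codes."""
    x_in = layers.Input(shape=(codes.shape[1],))
    h = layers.Dense(dim_out, activation="sigmoid",
                     activity_regularizer=KLSparsity(a=0.05, mu=1.0))(x_in)
    y = layers.Dense(codes.shape[1], activation="sigmoid")(h)
    sae = models.Model(x_in, y)
    sae.compile(optimizer="adam", loss="mse")
    sae.fit(codes, codes, epochs=epochs, batch_size=batch_size, verbose=0)
    encoder = models.Model(x_in, h)
    return encoder, encoder.predict(codes, verbose=0)

codes = x_train                           # data after feature association selection
encoders = []
for dim in (8, 7, 4):                     # encoder hidden dimensions, Section 4.4
    enc, codes = pretrain_layer(codes, dim)
    encoders.append(enc)

svm = SVC(kernel="rbf", C=0.5).fit(codes, y_train)   # RBF kernel, C = 0.5
```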

4. Results and Discussion

In this paper, a Windows 7 (64-bit) operating system with an Intel Core processor is used as the experimental environment. Experiments on the equipment operation data of the ATM system are carried out in Python, and the Keras deep learning framework is used to train the feature extraction and classification models.

4.1. Parameter Description

The DRSAE method in this paper first performs feature association analysis on the data, then learns the data features in an unsupervised manner with the sparse autoencoder, adds an SVM classifier after the last layer of the encoder, and performs supervised learning combined with the data labels; the whole model is thus a semi-supervised learning model. The hidden layers of the model use the sigmoid nonlinear activation function for the nonlinear transformations, and the SVM uses the radial basis function (RBF) as its kernel function to classify the normal and abnormal data. The values of the model parameters used in the experiments are set as shown in Table 1.

The threshold in the feature association algorithm is set based on an empirical judgment of the strength of the ATM data feature associations, which introduces some subjectivity; it is usually set to 0.9 in the experiments. The value of a is close to 0 because, in the ideal state, the actual average activation degree $a_j$ of each neuron should equal the sparsity parameter a; in the experiments, it is generally set to 0.05. The settings of λ, μ, and the hidden layer dimensions are explained in the experiments below. Since the SVM classifier is used only to evaluate the performance of the feature extraction models and its setting does not affect our study, the penalty coefficient of the SVM objective function is fixed at 0.5.

4.2. Data Information and Preprocessing

According to the regulations for operation and maintenance of the China Civil Aviation Communication, Navigation and Surveillance System [25], this paper conducts simulation experiments on the operation data of ATM system equipment. Three representative pieces of data are selected from the ATM system equipment operation dataset, as shown in Table 2; the attributes include planned total working hours (PH), normal working hours (NH), normal operation rate (NOR), total number of equipment (TN), number of faulty equipment (NF), equipment intact rate (EIR), number of accidents (NA), number of serious errors (NS), and number of general errors (NG). A piece of data without accidents and errors is regarded as normal data and is recorded as 0, and a piece of data with accidents or errors is regarded as abnormal data and is recorded as 1.

The calculation methods of NOR and EIR are shown as follows:

$$\mathrm{NOR} = \frac{\mathrm{NH}}{\mathrm{PH}} \times 100\% \quad (15)$$

$$\mathrm{EIR} = \frac{\mathrm{TN} - \mathrm{NF}}{\mathrm{TN}} \times 100\% \quad (16)$$

The normalization method used for data preprocessing [26] is shown as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \times (\mathrm{max} - \mathrm{min}) + \mathrm{min} \quad (17)$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the original data, max is the maximum value of the target normalization range, min is the minimum value, and the range in this experiment is set to [0, 1].
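With a target range of [0, 1], eq. (17) reduces to standard min-max scaling; a short sketch with scikit-learn follows, where x_raw is an assumed NumPy array holding the raw attribute columns.

```python
# Sketch of eq. (17) with target range [0, 1]; `x_raw` is an assumed array
# of the raw attribute columns (PH, NH, ..., NG).
from sklearn.preprocessing import MinMaxScaler

x_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(x_raw)
```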

4.3. Feature Association Analysis

The features of the ATM operation dataset comprise 9 attributes, which are divided into three classes, namely, the running time features (3 attributes), the running quantity features (3 attributes), and the running error features (3 attributes), recorded as the R1, R2, and R3 features, respectively. Feature association analysis is carried out for each of these three feature classes, and the heat maps of the correlation coefficient matrices of each feature class are shown in Figure 5.

According to the feature association algorithm, the threshold is set to 0.9: when the absolute value of the correlation between two feature attributes is greater than or equal to 0.9, the pair is flagged and one of the two features is deleted. It can be seen from Figure 5 that the relationships among the R1 class and R3 class feature attributes are weak, with absolute correlations below the 0.9 threshold. Attributes 2 and 3 in the R2 class, however, are strongly correlated, with an absolute correlation of 0.98, above the 0.9 threshold. Following the feature association algorithm, the variances of the two are compared, and the data item of attribute 2 is deleted. After this analysis, the number of R1 features remains 3, the number of R2 features becomes 2, the number of R3 features remains 3, and the resulting data dimension is 8.

After performing the above correlation analysis on the features of the R1, R2, and R3 categories and eliminating one of the strongly correlated feature attributes, the overall correlation of the new data is analyzed. The correlation coefficient matrix of the overall data is shown in Figure 6.

In Figure 6, the absolute values of the correlations between the attributes of the R1, R2, and R3 features are all less than 0.9, so the feature numbers of R1, R2, and R3 remain unchanged, and the final data dimension is 8.

4.4. Experimental Results and Analysis

The feature extraction models are used to train and test on the ATM dataset, and the SVM is used to judge the data category. All models are set to reduce the input data from 8 dimensions to 4 dimensions. In the initial experiments, assume that the first encoding layer of the DRSAE model reduces the 8-dimensional input to a 6-dimensional space and the second encoding layer reduces the 6-dimensional input to a 4-dimensional feature space; the dimensional changes of the two decoding layers are the reverse of those of the encoding layers. Finally, the reconstructed final features are obtained through iterative training.

To select appropriate parameter values for the proposed DRSAE model, comparative experiments were conducted on how changes in each parameter affect the results. When the batch size is 32, 64, 128, and 256, respectively, the changes in the classification score of the DRSAE model after feature extraction are shown in Figure 7. When the number of epochs of the entire DRSAE is 50, 100, 150, and 200, respectively, the changes in the classification accuracy of the DRSAE model after feature extraction are shown in Figure 8.

It can be seen from Figures 7 and 8 that the batch size and the number of epochs have a certain impact on the classification results. When the batch size is 64 and the number of epochs is 100, the classification score and accuracy are best, at 63.89% and 71.36%, respectively.

In the DRSAE model, the value of the penalty constraint μ strongly affects the actual average activation degree $a_j$ of each neuron. Five evaluation indexes are used to evaluate the influence of μ on the classification results: accuracy, precision, true-positive rate (TPR), false-positive rate (FPR), and F-score [27]. The evaluation results are shown in Figure 9.
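For reference, the five indexes can be computed from the binary confusion matrix as sketched below; treating abnormal data (label 1) as the positive class is our assumption, and y_true and y_pred are illustrative names.

```python
# Sketch: the five evaluation indexes from a binary confusion matrix,
# with abnormal data (label 1) assumed to be the positive class.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
tpr       = tp / (tp + fn)                       # true-positive rate (recall)
fpr       = fp / (fp + tn)                       # false-positive rate
f_score   = 2 * precision * tpr / (precision + tpr)
```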

For the classification evaluation of the different feature extraction models, higher values of accuracy, TPR, precision, and F-score, together with a lower FPR, indicate better classification performance and higher model accuracy. It can be seen from Figure 9 that when μ is 2, the FPR is lowest, at 25.49%, indicating that the proportion of abnormal data classified by the classifier as normal relative to the total abnormal data in the actual dataset is smallest. The F-score combines TPR and precision well. When μ is 1, the F-score is highest, at 78.84%, and the classification accuracy is also highest, at 74.37%; the FPR is 48.04%, higher than when μ is 2. On balance, the model performs best when μ is 1.

In the DRSAE model, the dimension of each hidden layer must also be determined in this experiment. Since the DRSAE model is designed with multiple hidden layers and the initial dimension of the ATM data used in this experiment is 8, too many hidden layers would merely increase the complexity of feature extraction, and deeper layers may bring overfitting. Therefore, five hidden layers are used for feature extraction, of which the first three are the hidden layers of the encoder and the rest are the hidden layers of the decoder. The ROC and AUC evaluation indexes for binary classification are used to determine the results. The ROC curves corresponding to different hidden layer dimensions are shown in Figure 10.

The AUC is the area between the ROC curve and the x-axis; the larger the AUC, the better the classification performance of the model. As can be seen from Figure 10, when the dimensions of the three hidden layers of the encoder are 8, 7, and 4, respectively, the AUC is largest, at 0.89. Therefore, the dimensions of the hidden layers of the encoder are set to (8, 7, 4).

In the DRSAE model, the influence of λ, which controls the degree of regularization, on the experimental results is shown in Figure 11.

As can be seen from Figure 11, the classification results perform best when λ = 0.0005.

In addition, the DRSAE model can be regarded as combining three methods: the feature association method, the L1 regularization method, and the KLD method. The ablation experiment on the model is shown in Table 3.

It can be seen from Table 3 that when the feature association algorithm is not used, the classification accuracy of the model is below 70%. When the feature association algorithm is added, the classification accuracy improves because association analysis is performed on the features first, realizing feature selection. Using only the KLD method has a greater effect on the model than using only the L1 regularization method, and after the feature association algorithm is added, the KLD method still contributes more than the L1 regularization method. The DRSAE model combines all three methods, and its classification accuracy is greatly improved. Therefore, all three methods are important to the DRSAE model, and none of them can be dispensed with.

The DRSAE model in this paper is compared with a typical linear dimension reduction method, the principal component analysis (PCA) [28] model, and with nonlinear dimension reduction methods, the AE [29] and SAE [30] models, using the parameter settings in Table 1, with the SVM classifier used to classify the data. The feature extraction time and classification time of each model are shown in Table 4.

The ATM system operation data used in this experiment are simulation data, and the amount of data is not large, so all models process them quickly. In Table 4, the PCA model can only reduce dimensionality linearly, the model is relatively simple, and its feature extraction time is the shortest. The DRSAE model proposed in this paper takes the longest, because it performs nonlinear dimensionality reduction and the model is more complex. The classification time of the DRSAE model is 0.005 s, which is mid-range. Compared with the AE and SAE models, this model adds hidden layers and regularization limits, so the feature extraction is more complex and the classification time is slightly longer; however, the time remains well under 1 s and can be ignored.

The DRSAE model is compared with the PCA, AE, and SAE models, and the classification evaluation results of all models on the testing set are shown in Figure 12.

As can be seen from Figure 12, for the operation data of ATM equipment, the linear PCA feature extraction method can only perform linear transformations, so its classification accuracy is the lowest, at 63.32%. Nonlinear dimensionality reduction methods such as AE and SAE can perform both linear and nonlinear transformations, and their classification accuracy improves accordingly, indicating that nonlinear dimension reduction extracts more effective feature representations for the current dataset with its complex feature space. The nonlinear DRSAE model in this paper further adds sparsity restrictions and more hidden layers; it achieves the highest classification accuracy of 85.43%, about 9% higher than the SAE model, and has the strongest feature extraction performance. Its FPR is 19.61%, a mid-range value attributable to model training error. The F-score of the DRSAE model is the highest, at 85.85%, indicating that the situation features of the ATM system extracted by the DRSAE method have strong expression ability and that the DRSAE feature extraction model offers high accuracy and strong classification performance, which facilitates the subsequent situation assessment of the ATM system.

5. Conclusions

In this paper, an SAE-based feature extraction method for the ATM system is proposed. To address the correlation among ATM system data features, a feature association algorithm is designed, and a deep-related sparse autoencoder (DRSAE) model is established. L1 regularization and the KLD sparsity term are combined to constrain the encoder network, and after iterative training of the individual encoders, the whole encoder network is globally optimized. Finally, an SVM classifier is used to evaluate the DRSAE model against other feature extraction models. The experimental results show that, compared with linear dimension reduction methods such as PCA and nonlinear dimension reduction methods such as AE, the DRSAE model accounts for the correlations between data, and increasing the number of hidden layers enables the model to extract more expressive and robust features, achieving smaller structural loss, lower classification error rates, and stronger overall performance.

However, this paper does not conduct a multi-class evaluation of the model, and the model's performance can be further optimized. In the future, more advanced classifiers will be used to achieve multi-class classification of situation data, and a new situation assessment model will be proposed to evaluate the situation of the ATM system by calculating situation values, so as to provide a more reliable decision-making basis for controllers.

Data Availability

The ATM dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the joint funds of National Natural Science Foundation of China and Civil Aviation Administration of China (U1933108 and U2133203), the National Natural Science Foundation of China (62172418), the Natural Science Foundation of Tianjin China (21JCZDJZ00830), and the Open Fund of Key Laboratory of Airworthiness Certification Technology of Civil Aviation Aircraft (SH2021111907).