Abstract

As encryption protocols and network services become more varied, the same application can show very different traffic characteristics under different protocols. However, few existing studies on encrypted application classification take the type of encryption protocol into account. To achieve refined classification of encrypted applications, this paper proposes the Encrypted Two-Label Classification using CNN (ETCC) method, which identifies both the protocol and the application. ETCC is a two-stage, two-label classification method. The first stage classifies the protocol used by the encrypted traffic. The second stage then classifies the application with the classifier that corresponds to that protocol. Experimental results show that ETCC achieves 97.65% accuracy on the public CICDarknet2020 dataset.

1. Introduction

According to the forecast of Cisco’s Annual Internet Report [1], the number of global Internet users will grow from 3.9 billion in 2018 (51% of the global population) to 5.3 billion by 2023 (66% of the global population). Over the same period, the number of devices connected to IP networks will reach 29.3 billion, more than three times the global population. As more users and devices connect to the network, applications become increasingly diverse and Internet communication becomes increasingly complex, which in turn makes network management more complicated [2]. Identifying the application type of network traffic can improve network management. For example, some applications consume a large share of network bandwidth and cause other applications to perform poorly. If Internet Service Providers (ISPs) can provide different levels of quality of service according to application type, the unfair use of network resources can be mitigated [3] and users’ Internet experience improved.

On the other hand, encryption technologies are widely used in network communications to meet users’ needs for security and privacy [4]. Secure Sockets Layer (SSL), Virtual Private Networks (VPN), Secure Shell (SSH), and The Onion Router (Tor) are currently the most common encryption methods [5]. However, encryption does more than protect users’ privacy: it also introduces new threats, because it can help attackers hide malicious behavior. Network managers need to identify encrypted traffic in a timely manner. This lets them quickly and accurately locate attacks on the network, cut off their transmission paths, and reduce the harm of malicious behavior to users. Encryption also creates difficulties for IT teams. Because encryption changes the payload, accurately identifying encrypted network protocols and encrypted network applications becomes harder, which complicates traffic analysis and network management [6]. Moreover, even when traffic can be classified accurately, real-time performance is difficult to guarantee.

Encryption invalidates many early traffic classification methods, including port-based, entropy-based, payload-based, and pattern matching-based classification. The reason is that the port, entropy, payload, and header of network traffic can change with encryption [6]. In recent years, machine learning has become the most common approach to classifying encrypted traffic. Encryption usually applies only to the payload, and machine learning methods rely on statistical features rather than payload values, so they are less affected by encryption. For this reason, machine learning-based methods are generally more accurate than the earlier methods.

Most encrypted application classification methods use a single label: one classifier directly determines the application type of the network traffic. However, the characteristics of an application differ under different encryption protocols. An encryption protocol has two main phases, connection initialization and encrypted data transmission. Connection initialization is further divided into the initial handshake, identity verification, and shared key establishment. Different encryption protocols are built on different principles, so these phases differ greatly from one protocol to another. As a result, the final encrypted traffic is represented differently [5]. Therefore, classifying encrypted applications on the basis of a known encryption protocol can give more accurate results than single-label classification.

In this paper, we propose an Encrypted Two-Label Classification method, referred to as ETCC, to improve the accuracy of encrypted application classification. ETCC is a two-stage, two-label classification method whose two labels are the encryption protocol and the application. The first stage classifies the protocol used by the encrypted traffic. The second stage then classifies the application with the classifier that corresponds to that protocol. The contributions of this paper are summarized as follows:
(1) We propose a two-stage, two-label scheme called ETCC, which performs refined application classification according to the encryption protocol used.
(2) In the second stage, encrypted traffic is assigned to the application classifier that matches its protocol type, instead of all traffic sharing the same classifier.
(3) Our scheme identifies both the protocol and the application, which can meet a variety of needs.

The rest of this paper is organized as follows. Section 2 introduces encrypted traffic classification methods and multilabel classification methods. Section 3 presents our scheme for refined application classification. Section 4 presents experiments and evaluations. Finally, Section 5 concludes our work and outlines future work.

In this section, we introduce methods for classifying encrypted traffic and methods for multilabel classification. These works inspired our research.

Early research commonly used port-based, entropy-based, payload-based, and pattern matching-based classification. In the early days of the Internet, every application had a fixed port number assigned by IANA [7]. The application type could therefore be determined simply by looking up the port in the IANA TCP/UDP port number list. However, techniques such as port obfuscation and network address translation (NAT) have since emerged, and port-based methods are no longer feasible. Entropy-based methods classify encrypted traffic by measuring the randomness (entropy) of its content. Casino et al. [8] propose a method that distinguishes encrypted from nonencrypted traffic based on the entropy value. To ensure real-time performance, they analyze only a random subset of the traffic rather than the complete traffic. Payload-based methods cannot inspect the contents of encrypted packets and are therefore no longer usable [9]. Pattern matching-based methods check the header format to determine whether traffic is encrypted and which encryption protocol it uses, but they cannot determine the application type. In summary, more advanced methods are needed for encrypted traffic classification.

The most commonly used methods are based on machine learning, and they differ mainly in feature extraction, model selection, and parameter settings. Liu et al. [10] consider only the first N packets in a sliding window. This reduces both the dimensionality of the encrypted traffic features and the number of packets processed per flow. Similarly, Hasan et al. [11] analyze the first 64 packets to identify Android applications and conclude that most Android applications can be identified from the TCP/IP header. Shen et al. [12] combine the certificate packet length and the size of the first application data packet into a unique fingerprint for a given application. They then use a second-order Markov chain to classify encrypted applications. Cui et al. [13] propose the SPCaps model, which uses capsule neural networks (CapsNet) to learn the spatial features of encrypted traffic. Its advantage is that it learns the position of features within a packet and the order between packets at the same time. Ly Vu et al. [14] use time series as the entry point for classifying encrypted traffic. Their method has two steps. The first step extracts behavior patterns from the time series of packets. The second step classifies samples according to the correlation between time series. Zeng et al. [15] take a more comprehensive view and analyze temporal and coding features in addition to spatial features. However, these works still ignore the burstiness of network traffic and cannot capture complex nonlinear features. The framework proposed in [16] uses multifractal feature extraction, which can capture the self-similarity of network traffic structure over a wide time range. Because it is difficult to extract a comprehensive feature set by hand, Wang et al. [17] take a different approach: they convert the flow directly into an image and feed it to the model for classification. Lotfollahi et al. [18] classify encrypted traffic with a CNN and with a stacked autoencoder (SAE). Their approach requires no expert feature extraction and has served as a reference for many later studies.
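The flow-to-image idea can be sketched in a few lines. This sketch assumes that the first 784 bytes of a flow are truncated or zero-padded and reshaped into a 28 x 28 grayscale image, a common choice in image-based traffic classifiers. The byte count, the padding scheme, and the function name are assumptions, not necessarily the choices made in [17].

```python
from typing import List


def flow_to_image(flow_bytes: bytes, side: int = 28) -> List[List[float]]:
    """Truncate or zero-pad a flow's raw bytes and reshape them into a side x side image.

    Pixel values are scaled to [0, 1] by dividing each byte by 255.
    """
    n = side * side
    padded = flow_bytes[:n].ljust(n, b"\x00")  # keep the first n bytes, pad with zeros
    return [[padded[r * side + c] / 255.0 for c in range(side)] for r in range(side)]
```

The resulting 2-D array can be fed to a CNN the same way as a small grayscale picture. No hand-crafted features are involved.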

The methods above target general scenarios, but specialized methods can be more efficient in specific scenarios. Shen et al. [19] study traffic classification on Ethereum. These flows are all generated on the same platform, which makes them harder to distinguish. To address this, Shen et al. analyze where existing methods tend to misclassify. They then extract features from three aspects: packet length, packet bursts, and time series. Orsolic et al. [20] aim to evaluate quality of experience (QoE) and provide better services to users. They propose YouQ, a system that collects YouTube videos and estimates the QoE of each video session from its traffic characteristics. Similarly, Tarun Mangla et al. [21] estimate the QoE of encrypted HTTP-based adaptive streaming (HAS) sessions. Anderson et al. [22] analyze TLS-encrypted sessions in commercial malware sandboxes and two enterprise networks, and they report that the choice of features has a large impact on performance. Pierre-Olivier Brissaud et al. [23] aim to monitor specific users and propose a scheme for monitoring HTTP/2 communication over the TLS protocol. The scheme detects whether a user has performed certain specified operations. The QUIC (Quick UDP Internet Connection) protocol is a new encrypted Internet communication protocol that is becoming a default. It provides many improvements that speed up HTTP communication while making it more secure. However, because the protocol is new, little data is available. Rezaei et al. [24] propose a semisupervised method that first trains the model on a large amount of unlabeled data and then retrains it on a small amount of labeled data. This greatly reduces the amount of labeled data required for network traffic classification.

Studies on multilabel classification are few. There are two common ways to handle multilabel classification. One converts the problem into several single-label classification problems, and the other combines the multiple labels into a single label. Grigorios Tsoumakas et al. [25] give a detailed introduction to multilabel classification and compare several classification methods, which provided much guidance for our research. Tien Thanh Nguyen et al. [26] propose a Bayes-based method that considers not only the relationship between labels and features but also the relationships between label pairs. Jesse Read et al. [27] construct a multilabel Hoeffding tree with classifiers at the leaves, and they establish new benchmarks in predictive performance and time complexity. Darshin Kalpesh Shah et al. [28] classify multilabel text with RNN and LSTM models, which perform significantly better than Logistic Regression and ExtraTrees. Ou Guangjin et al. [29] present a multilabel zero-shot learning model based on graph convolutional networks to recognize novel categories. Most multilabel classification methods assume that labels are independent. However, Nadia Ghamrawi et al. [30] study the problem of high label dependence. Jesse Read et al. [31] also study strong dependencies between labels and use a chaining method to model the label relationships. Pengcheng Yang et al. [32] treat multilabel classification as a sequence generation problem and classify with a sequence generation model. Their experiments show that this method effectively captures correlations between labels. These works informed our design. Similarly, this paper proposes a two-stage, two-label method that classifies protocols in the first stage and applications in the second stage. 
The biggest difference between our method and other multilabel classification methods is that our method selects the classifier for the second stage according to the result of the first stage. This enables refined classification, and the two labels together can meet various needs.
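The two standard problem transformations mentioned above can be illustrated on (protocol, application) label pairs. In binary relevance, each label value gets its own independent 0/1 target. In label powerset, each observed label combination becomes one single-label class. This is a hypothetical illustration of those general techniques, not part of ETCC. ETCC instead routes each flow by its first-stage prediction. The function and type names are illustrative.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # hypothetical (protocol, application) label pair


def label_powerset(labels: List[Pair]) -> Tuple[List[int], Dict[Pair, int]]:
    """Map each distinct (protocol, application) combination to one class id."""
    classes: Dict[Pair, int] = {}
    encoded = []
    for pair in labels:
        if pair not in classes:
            classes[pair] = len(classes)  # new combination -> next unused id
        encoded.append(classes[pair])
    return encoded, classes


def binary_relevance(labels: List[Pair]) -> Dict[str, List[int]]:
    """Build one independent 0/1 target vector per label value.

    Assumes protocol names and application names never coincide.
    """
    values = sorted({v for pair in labels for v in pair})
    return {v: [1 if v in pair else 0 for pair in labels] for v in values}
```

Label powerset preserves label co-occurrence but multiplies the number of classes. Binary relevance ignores the dependence between protocol and application, which is exactly the relationship ETCC exploits.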

3. Methodology

In this section, we propose ETCC, a two-stage, two-label scheme for classifying encrypted applications. The scheme consists of three modules: the preprocessing module, the first label module, and the second label module. They preprocess the data, classify protocols, and classify applications, respectively. Figure 1 presents the details.

3.1. Preprocessing Module

This module is used to process raw data and convert them into a format suitable for the input of the classifier.

First, we collect encrypted traffic and label it with its protocol and application.

Second, we select and extract features. A flow is a collection of packets that share the same IP five-tuple {Source IP, Destination IP, Source Port, Destination Port, Protocol}. The packets of a flow usually belong to the same encryption protocol and application, so we process data in units of flows. We use spatial and temporal features to distinguish encrypted traffic, because these two types of features are not easily affected by encryption. Spatial features relate to packet counts and sizes. Temporal features relate to time series. The specific features are shown in Table 1.
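The flow definition and a few Table 1-style features can be sketched as follows. This is a minimal sketch under stated assumptions. The `Packet` record is hypothetical, since a real pipeline would parse packets from a capture. Flows are kept unidirectional for simplicity, although a bidirectional flow would also merge the reverse direction. Only five illustrative features are computed, not the full set from Table 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    # Hypothetical packet record; a real pipeline would parse these from a pcap file.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    timestamp: float  # seconds
    length: int  # bytes


def group_flows(packets: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Group packets by their IP five-tuple (unidirectional, for simplicity)."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)].append(p)
    return dict(flows)


def flow_features(pkts: List[Packet]) -> Dict[str, float]:
    """Compute a few spatial (size/count) and temporal (timing) features of one flow."""
    pkts = sorted(pkts, key=lambda p: p.timestamp)
    duration = pkts[-1].timestamp - pkts[0].timestamp
    total_bytes = sum(p.length for p in pkts)
    iats = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]  # inter-arrival times
    return {
        "flow_duration": duration,
        "mean_packet_length": total_bytes / len(pkts),
        "flow_bytes_per_s": total_bytes / duration if duration > 0 else 0.0,
        "flow_packets_per_s": len(pkts) / duration if duration > 0 else 0.0,
        "flow_iat_mean": mean(iats) if iats else 0.0,
    }
```

Each flow then becomes one fixed-length feature vector. The classifiers that follow consume these vectors rather than individual packets.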

Third, we use the Sequential Floating Forward Selection (SFFS) algorithm [33] to select the most suitable features. We finally selected 41 features covering Port, Protocol, Flow Duration, Length of Packet, Flow Bytes/s, Packets/s, Flow IAT, Forward IAT, Backward IAT, Flag Count, and Active Time. The detailed features are shown in Table 1. With this reduced feature set, the classifier generalizes better and runs faster.
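The SFFS procedure can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in the paper. It assumes a caller-supplied `score` function that rates a feature subset (higher is better), for example the cross-validated accuracy of a classifier trained on that subset. The names `sffs` and `score` are hypothetical. Each step adds the most useful feature. A "floating" step then removes features for as long as that beats the best subset of the smaller size seen so far.

```python
from typing import Callable, Dict, FrozenSet, Iterable

Score = Callable[[FrozenSet[str]], float]  # hypothetical subset-scoring function


def sffs(features: Iterable[str], score: Score, k: int) -> FrozenSet[str]:
    """Sequential Floating Forward Selection of k features (higher score is better)."""
    pool = list(features)
    if not 0 < k <= len(pool):
        raise ValueError("k must be between 1 and the number of features")
    current: FrozenSet[str] = frozenset()
    best: Dict[int, FrozenSet[str]] = {}
    best_score: Dict[int, float] = {}

    def record(subset: FrozenSet[str], s: float) -> bool:
        """Keep subset as the best of its size only if it strictly improves the record."""
        if s > best_score.get(len(subset), float("-inf")):
            best[len(subset)], best_score[len(subset)] = subset, s
            return True
        return False

    while len(current) < k:
        # Inclusion: add the feature whose addition gives the highest score.
        add = max((f for f in pool if f not in current),
                  key=lambda f: score(current | {f}))
        current = current | {add}
        record(current, score(current))
        # Conditional exclusion (the floating step): drop the least useful feature
        # while the reduced subset beats the best subset of that size seen so far.
        while len(current) > 2:
            drop = max(current, key=lambda f: score(current - {f}))
            reduced = current - {drop}
            if not record(reduced, score(reduced)):
                break
            current = reduced
    return best[k]
```

Requiring a strict improvement over the per-size record guarantees that the floating step terminates. Unlike plain forward selection, SFFS can discard a feature chosen early once later additions make it redundant.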

Finally, we apply Min-Max Scaling [34] to normalize each feature. This meets the input requirements of the supervised classifier and speeds up model training. Min-Max Scaling is defined as

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_{\max}$ is the maximum value of the sample data, $x_{\min}$ is the minimum value of the sample data, $x$ is the current sample value, and $x'$ is the normalized value of the current sample.
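Applied column by column, Min-Max Scaling maps each feature value x to (x - x_min) / (x_max - x_min). It can be sketched in plain Python as follows. Two details are assumptions not stated in the text. First, the minimum and maximum are fitted on the training data and reused for the test data. Second, constant features, where the maximum equals the minimum, are mapped to 0 to avoid division by zero. With fitted statistics, test values outside the training range can fall outside [0, 1].

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Record the per-feature minimum and maximum over the (training) rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_scale(rows: Sequence[Sequence[float]],
                  mins: List[float], maxs: List[float]) -> Matrix:
    """Apply x' = (x - x_min) / (x_max - x_min) feature by feature.

    Constant features (x_max == x_min) are mapped to 0.0 to avoid division by zero.
    """
    out = []
    for row in rows:
        out.append([(x - lo) / (hi - lo) if hi > lo else 0.0
                    for x, lo, hi in zip(row, mins, maxs)])
    return out
```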

After this, the feature values are all mapped to the interval [0,1] and fed into the first label module.

3.2. First Label Module

We use this module to classify encrypted traffic into m protocol categories. We tested both CNN and LSTM classifiers and adopted the CNN, which performs better. The reasons for choosing the CNN are discussed in Section 4.3.

Figure 2 depicts the architecture of the CNN, which consists of convolution, pooling, flatten, and dense layers. The convolution layer extracts different features from the input. Stacking several convolution layers greatly increases the amount of computation, so a pooling layer reduces it through downsampling. Next, the flatten layer converts the convolved feature maps into a one-dimensional vector that can be connected to the dense layer. Finally, the dense layer combines all local features into global features to produce the classification result.
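The layer sequence described above (convolution, pooling, flatten, dense) can be traced with a toy forward pass in plain Python. This sketch only shows the data flow. The single-channel 1-D input, the kernel values, the pooling size of 2, the dense weights, and the final softmax are illustrative assumptions. The paper's actual layer sizes and trained weights are not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def conv1d(x: Vector, kernels: List[Vector], biases: Vector) -> List[Vector]:
    """Valid 1-D convolution (cross-correlation, as in most DL libraries) + ReLU."""
    maps = []
    for w, b in zip(kernels, biases):
        k = len(w)
        maps.append([max(0.0, sum(w[j] * x[i + j] for j in range(k)) + b)
                     for i in range(len(x) - k + 1)])
    return maps


def max_pool(fmap: Vector, size: int = 2) -> Vector:
    """Non-overlapping max pooling: keeps the largest value in each window."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]


def dense(v: Vector, weights: List[Vector], biases: Vector) -> Vector:
    """Fully connected layer with one weight row per output unit."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + b for row, b in zip(weights, biases)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def cnn_forward(x: Vector, kernels: List[Vector], conv_b: Vector,
                dense_w: List[Vector], dense_b: Vector) -> Vector:
    """convolution -> pooling -> flatten -> dense -> class probabilities."""
    pooled = [max_pool(f) for f in conv1d(x, kernels, conv_b)]
    flat = [v for f in pooled for v in f]  # flatten layer
    return softmax(dense(flat, dense_w, dense_b))
```

For example, with input `[0, 1, 0, 1, 0, 1]` and a single kernel `[1, -1]`, the convolution gives `[0, 1, 0, 1, 0]`. Pooling reduces it to `[1, 1]`, which the dense layer maps to one score per class.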

Figure 3 depicts the architecture of the LSTM. The input and output layers of the LSTM are similar to those of the CNN, and the difference lies in the intermediate computation. An LSTM cell combines two pieces of information, the new input and the memory of previous steps. This allows the LSTM to make effective use of historical information and learn long-range dependencies [35].
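For reference, the standard LSTM cell with a forget gate updates its state as follows. The paper does not specify which LSTM variant it uses, so this is the textbook formulation rather than the authors' exact configuration. Here $x_t$ is the new input, $h_{t-1}$ and $c_{t-1}$ carry the previous memory, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate $f_t$ decides how much of the old memory $c_{t-1}$ to keep, and the input gate $i_t$ decides how much of the new candidate $\tilde{c}_t$ to write. Mixing these two sources is what lets the cell retain long-range information.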

For each flow, the output layer produces a probability distribution over the m protocol categories, $P = (p_1, p_2, \ldots, p_m)$. The predicted category is the one with the highest probability, $\hat{y} = \arg\max_{1 \le i \le m} p_i$.

Finally, this yields the protocol type of each encrypted flow. The traffic, now divided into m protocol groups, is sent to the next module.

3.3. Second Label Module

Given the known encryption protocol, this module further classifies encrypted applications into n categories.

We prepare m classifiers, one for each of the m encryption protocols identified by the previous module. Encrypted traffic is sent to the classifier that matches its protocol type, and each classifier handles the application classification of only one protocol. Using different classifiers for different protocols gives more accurate results.
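The per-protocol dispatch can be sketched as follows. This is a minimal sketch, assuming each classifier is any callable that maps a feature vector to a label string. In ETCC those would be the trained CNNs. The `TwoStageClassifier` class and the toy rule-based classifiers in the usage example are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]  # any model mapping features to a label


class TwoStageClassifier:
    """Route each flow to the application classifier of its predicted protocol."""

    def __init__(self, protocol_clf: Classifier, app_clfs: Dict[str, Classifier]):
        self.protocol_clf = protocol_clf  # first label: protocol
        self.app_clfs = app_clfs  # second label: one classifier per protocol (m entries)

    def predict(self, x: Features) -> Tuple[str, str]:
        protocol = self.protocol_clf(x)
        app = self.app_clfs[protocol](x)  # only the matching classifier is consulted
        return protocol, app

    def predict_batch(self, xs: List[Features]) -> List[Tuple[str, str]]:
        return [self.predict(x) for x in xs]
```

Usage with toy stand-in classifiers:

```python
clf = TwoStageClassifier(
    protocol_clf=lambda x: "Tor" if x[0] > 0.5 else "VPN",
    app_clfs={"Tor": lambda x: "chat" if x[1] > 0.5 else "video",
              "VPN": lambda x: "email"},
)
clf.predict([0.9, 0.2])  # ("Tor", "video")
```

If the first stage predicts a protocol that has no registered classifier, the lookup raises `KeyError`. This corresponds to the unknown-protocol limitation noted in Section 5.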

As in the first label module, we evaluate CNN and LSTM classifiers and finally adopt the CNN. The performance of the two algorithms is discussed in Section 4.3.

4. Experiment and Evaluation

In this section, we conduct experiments to evaluate ETCC and compare it with the state-of-the-art method. We deploy our model on a machine running Ubuntu 16.04 with an NVIDIA GTX 1050 GPU.

4.1. Dataset Description

Three public datasets CICDarknet2020 [36], ISCXTor2016 [37], and ISCXVPN2016 [38] are used to evaluate ETCC. These datasets include four types of protocols and five types of applications. The four protocols are Tor, Non-Tor, VPN, and Non-VPN. The five applications are chat, FTP, email, audio, and video, as shown in Table 2.

CICDarknet2020 is a complete dataset covering both Tor traffic and VPN traffic. The number of samples of each type is shown in Table 3. ISCXTor2016 contains only Tor and Non-Tor traffic, and ISCXVPN2016 contains only VPN and Non-VPN traffic. We therefore merge them into a single dataset called ISCX-Tor-VPN. To eliminate errors caused by sample selection, ISCX-Tor-VPN uses the same number of samples as CICDarknet2020. In addition, we split the data into a training set and a test set at a ratio of 4:1.
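A 4:1 split can be sketched as below. Applying the ratio within each class (stratification) is an assumption, because the text only specifies the overall 4:1 ratio. Stratification keeps small classes, such as email, represented in both sets. The function name and the fixed seed are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], train_ratio: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with the ratio applied within each class.

    Stratifying per class is an assumption; the text only states a 4:1 ratio.
    """
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[int] = []
    test: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)
```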

4.2. Parameter Settings

We ran experiments with each classifier in each stage.

For the first label module, the structures of the CNN and LSTM classifiers are shown in Figure 4. The dropout layer randomly discards neurons with a certain probability, which prevents overfitting and improves generalization. We use ReLU as the activation function and categorical_crossentropy as the loss function, with a batch size of 32 and 15 epochs. For the optimizer, the CNN classifier uses SGD and the LSTM classifier uses Adam.

For the second label module, we have four classifiers to classify encrypted applications. The structures of the CNN classifier and the LSTM classifier are shown in Figure 5. Other parameters are the same as the last module.

4.3. Results and Discussion

In this section, we analyze the performance of ETCC on the two datasets and compare ETCC with the state-of-the-art method.

We first evaluate the classification results of the first label module. Figure 6 shows the resulting confusion matrices, in which rows represent the true category and columns represent the predicted category. Each value is the proportion of flows of the true (row) category that are classified into the predicted (column) category.
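The row-normalized confusion matrices described above can be computed as follows. In this minimal sketch, rows hold true categories and columns hold predicted categories. Each row is divided by its total, so every entry is the fraction of that true category assigned to each predicted category. The function names are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: List[str]) -> List[List[int]]:
    """Counts: rows are true categories, columns are predicted categories."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def row_normalize(m: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total; rows with no samples stay all zero."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out
```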

Figure 6 shows that, for the same model, the results on CICDarknet2020 are better than those on ISCX-Tor-VPN. The CICDarknet2020 data are generated in a single network environment, whereas ISCX-Tor-VPN is a mixed dataset. This makes Tor and Non-Tor, and VPN and Non-VPN, harder to tell apart in ISCX-Tor-VPN. Of the two classifiers, the CNN clearly performs better, so we choose it as the first-stage classifier. We also find that the most easily confused pairs are VPN and Tor, and Non-VPN and Non-Tor. This is understandable, since the two types of encrypted traffic share some characteristics, as do the two types of nonencrypted traffic.

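Tables 4 and 5 show the experimental results of the second label module when the first label module uses the CNN classifier. Accuracy, precision, recall, and F1 score are used to evaluate the scheme. They are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$

where, for category X, $TP$ is the number of samples correctly classified into X, $TN$ is the number correctly classified into Not-X, $FP$ is the number incorrectly classified into X, and $FN$ is the number incorrectly classified into Not-X.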
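The per-category counts described above (true positives, true negatives, false positives, and false negatives for category X versus Not-X) can be read off a multiclass confusion matrix. This minimal sketch assumes rows are true classes and columns are predicted classes. The function name is illustrative.

```python
from typing import Dict, List


def per_class_metrics(m: List[List[int]], k: int) -> Dict[str, float]:
    """One-vs-rest accuracy, precision, recall, and F1 for class index k.

    m[i][j] counts samples whose true class is i and predicted class is j.
    """
    total = sum(sum(row) for row in m)
    tp = m[k][k]
    fp = sum(m[i][k] for i in range(len(m))) - tp  # predicted k, truly another class
    fn = sum(m[k]) - tp  # truly k, predicted as another class
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, for the matrix `[[8, 2], [1, 9]]` and class 0, there are 8 true positives, 1 false positive, 2 false negatives, and 9 true negatives. This gives an accuracy of 0.85, a precision of 8/9, a recall of 0.8, and an F1 score of 16/19.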

As Tables 4 and 5 show, CNN performs better than LSTM. On CICDarknet2020, CNN is better on every metric except the F1 score of Tor. On ISCX-Tor-VPN, CNN is better on every metric except the precision of Tor, the precision of Non-VPN, and the F1 score of Non-VPN. A likely reason is that CNN captures local features better, whereas the strength of LSTM lies in memorizing context information. In our dataset, the category of a flow has little relationship with the flows before and after it, so CNN performs better. Therefore, we also choose CNN as the second-stage classifier. With CNN, even the lowest metric exceeds 91.1%.

Tables 6 and 7 show the performance of the second label module with the CNN classifier. The Non-Tor and Non-VPN classifiers perform better than the Tor and VPN classifiers, which indicates that encryption makes traffic classification more difficult. Another observation is that the precision of email is very low, because the dataset contains very few email samples. We expect this effect to diminish when the sample sizes are balanced. Moreover, audio and video achieve the best classification results.

Finally, Table 8 compares the results on CICDarknet2020 and ISCX-Tor-VPN with the state-of-the-art method [39]. The result on CICDarknet2020 is better than that on ISCX-Tor-VPN for the reason given earlier: ISCX-Tor-VPN is a mixed dataset, so its data are less distinguishable. Compared with [39], all metrics improve except the precision of email and the recall of video. Total precision and recall increase by 1% and 1.6%, respectively. Overall, the two-stage, two-label approach of ETCC improves the classification accuracy of encrypted applications. This supports the view that applications have different characteristics under different protocols, and that classifying applications on the basis of known protocols gives more accurate results.

5. Conclusion and Future Work

In this paper, we propose a two-stage, two-label scheme to achieve refined classification of encrypted applications. The first stage classifies the protocol used by the encrypted traffic. The second stage then classifies the application with the classifier that corresponds to that protocol. The experimental results show that our scheme is effective and feasible.

This paper addresses two-label classification. In the future, we will consider more labels and propose more practical solutions. In addition, our method relies on identifying the encryption protocol, so the application classification results will be affected when traffic uses an unknown encryption protocol. We will therefore address traffic that uses unknown encryption protocols in future work.

Data Availability

The datasets used in this paper are mainly obtained from the following websites: https://www.unb.ca/cic/datasets/darknet2020.html; https://www.unb.ca/cic/datasets/tor.html; https://www.unb.ca/cic/datasets/vpn.html. The raw/processed data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported in part by the National Natural Science Foundation of China under Grant no. 61702267, in part by Jiangsu Planned Projects for Postdoctoral Research Funds, and in part by the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing.