Abstract

Due to the proliferation of mobile applications, mobile traffic identification plays a crucial role in understanding network traffic. However, the pervasive unconcerned apps and the emerging apps pose great challenges to mobile traffic identification methods based on supervised machine learning, since such methods merely identify and discriminate several apps of interest. In this paper, we propose a three-layer classifier using machine learning to identify mobile traffic in open-world settings. The proposed method is capable of identifying traffic generated by unconcerned apps and zero-day apps; thus it can be applied in the real world. A self-collected dataset that contains 160 apps is used to validate the proposed method. The experimental results show that our classifier achieves over 98% precision and produces far fewer false positives than the state of the art.

1. Introduction

Mobile apps are now the most popular way to access the Internet. Smart Insights [1] reports that mobile devices dominate in minutes spent online across countries and that more than 80% of mobile minutes are spent on apps. According to Statista [2, 3], as of the first quarter of 2018, Android users were able to choose between 3.8 million apps in Google Play, and an average of 6,140 mobile apps were released through the Google Play Store every day. BrightEdge [4] reports that 57% of all online traffic was on mobile and tablet in 2017. Hence the focus of research is shifting from traditional workstation traffic identification to mobile traffic identification, which in this paper is the task of associating network traffic with a certain app.

Mobile traffic identification plays an important role in network management, marketing research, and user characteristic analysis [5, 6]. For example, based on this technology, a network administrator can find out which apps are popular in the network and optimize resource allocation accordingly to improve the user experience. A company can monitor whether employees use disallowed apps, such as games and shopping apps, during working hours. For advertisers, understanding in which areas and time periods a certain app is popular with users can help them create a better advertising strategy. For market researchers, understanding which apps the users of interest use can help them analyze the interests and needs of those users, so that further business activities can be carried out. For example, if a person uses a flight booking app frequently, then that person may be a potential customer of travel services.

Although the mobile traffic identification task looks similar to the traditional workstation traffic identification task, the particularities of mobile traffic pose great challenges for traditional identification methods. First, mobile traffic is almost entirely carried over HTTP/HTTPS, so the port-based approach can only identify mobile traffic as Web traffic. Second, many apps use encryption protocols for data transmission in order to protect user privacy. Admittedly, some encryption protocols expose useful information during their negotiation process, such as the TLS SNI (Server Name Indication), so that part of the encrypted traffic can be identified by the DPI (Deep Packet Inspection) approach. However, the SNI field is sometimes blank and not every SSL/TLS connection has a negotiation phase, which decreases the effectiveness of this method. For example, we randomly checked 50 HTTPS connections from MOMO, a social app in China: 9 connections had no negotiation process, and 11 connections had a negotiation process but a blank SNI field. Third, mobile apps often use third-party libraries, so different apps may generate similar traffic. It is difficult to discriminate such traffic via DPI technology or IP address. This problem can be circumvented to some extent if such traffic is treated as a separate category. Fourth, CDNs (Content Delivery Networks) are used by many apps to improve the user experience. As a result, a server’s IP address can be shared by multiple apps. For example, we found the IP address 101.226.220.12 serving at least five apps at the same time. Additionally, some apps do not use DNS to obtain the IP address of the server. For example, WeChat, a popular instant messaging app in China, is observed to return a list of hundreds of server IP addresses in response to a particular request from the client; thus the client no longer needs to perform a DNS query. The above scenarios reduce the traffic volume that can be identified by the DNS-based approach. For the above reasons, traditional traffic identification methods are insufficient for handling mobile traffic.

Statistics-based methods have recently attracted extensive research. They use raw traffic data or side-channel information leaked from network traffic to train machine learning classifiers. Many methods have been proposed and the results are encouraging. However, it is impossible to identify the traffic of all apps due to the large number of mobile apps; thus a classifier usually identifies only several apps of interest. The massive unknown traffic that comes from unconcerned apps and emerging apps (also called zero-day apps in this paper) then poses great challenges to the classifier.

In general, there are two ways to enable a classifier to handle unknown instances in machine learning. One is to construct an N+1-class classifier, and the other is to achieve multiclass classification with multiple binary classifiers. The N+1-class classifier treats the unknown instances as one category. A major drawback of this method is that the training set is always insufficient, since it is not possible to collect all unknown instances. The latter approach learns each known category’s patterns separately by training one binary classifier per category. Only when the predictions from all binary classifiers are negative is the instance classified as unknown. The first drawback of this method is the same as that of the former method. Another drawback is that the prediction criterion is prone to identifying unknown instances incorrectly.

In this paper, we propose a three-layer classifier to identify mobile traffic under open-world settings. This classifier is capable of excluding unknown app traffic even if the training set is insufficient. The first layer performs a coarse-grained classification to exclude unconcerned app traffic whose patterns have been learned. Then the second layer performs a fine-grained classification to discriminate between the traffic of different target apps. Finally, the third layer learns the patterns of each target app's traffic from different perspectives and sets a strict prediction criterion to exclude the false positives caused by unknown traffic. Besides, we only use side-channel traffic information and raw traffic data as traffic features. To the best of our knowledge, this is the first work to identify mobile traffic in open-world settings that contain unknown app traffic.

The main contributions of our work are as follows. Firstly, we propose a novel multilayer classifier that can identify target app traffic and exclude unknown app traffic. This approach can be applied to the identification of mobile traffic in the real world. Secondly, we collect a representative mobile traffic dataset to validate our method. This dataset contains network traffic from 160 apps that are installed on 12 mobile devices. Finally, our method outperforms the state of the art. The results show that the proposed classifier achieves more than 98% precision with the lowest number of false positives.

The rest of the paper is organized as follows: Section 2 surveys related work; Section 3 describes the proposed multilayer classifier architecture; Section 4 evaluates the proposed method; Section 5 gives a brief discussion; Section 6 concludes the paper.

2. Related Work

For the reasons explained above, port-based, DPI-based, and DNS-based approaches have deficiencies when they are applied to mobile traffic identification. Please refer to [7] for a detailed survey on using DPI-based methods to identify mobile traffic. Here we highlight several DNS-based and machine learning-based traffic identification methods that have recently been proposed.

2.1. DNS-Based Traffic Identification

Bermudez et al. [8] presented a notable work that associates network traffic with domain names. They extracted the 3-tuple (i.e., ) by parsing the captured DNS packets and labelled traffic with domain name according to . Then offline analysis was performed based on the label of traffic, including the distribution of domain names and server IP addresses and the domain names or services offered by a certain CDN vendor. The authors pointed out that about 73% of server IP addresses have a unique domain name and 82% of domain names have a single IP address. Similar mechanisms were proposed in [9, 10]. In addition, Mori et al. [10] enriched the tuple library by combining the DNS response information of multiple users and ignoring the TTL, thus increasing the identifiable traffic volume. However, the above work only maps the traffic to the corresponding domain name without further identifying its related apps.

Trevisan et al. [11] investigated the effectiveness of the DNS-based traffic identification method. They showed that about 65% of server IP addresses have a unique domain name, but less than 15% of the traffic is attributable to these addresses. By manually mapping domain names to services, up to 55% of the traffic can be identified. The authors further explored how the associations between domain names and IP addresses evolve over time and discovered that some IP addresses become invalid over time. Therefore, although the DNS-based traffic identification method is simple and straightforward, only a small portion of mobile traffic can be handled.

2.2. Machine Learning-Based Traffic Identification

Wang et al. [12] proposed a system for identifying mobile apps. They collected traffic from 13 target apps by running apps dynamically for 5 minutes and then a Random Forest classifier was trained. Since the sample size in this work is inadequate, it is difficult to assess whether the results of this work are representative.

Alan et al. [13] identified thousands of apps using the launch-time traffic generated by target apps. The results showed that the classification accuracy reached 88% when training and test sets are collected on the same device; otherwise the accuracy would drop by as much as 26%.

AppScanner [14, 15] proposed a scheme for fingerprinting and identifying apps. The authors collected network traffic generated by different versions of apps installed on two Android devices. Then Support Vector Classifiers and Random Forest Classifiers were trained to classify 110 apps. Bursts of data are considered in this work to extract statistical features. Additionally, they improved the performance of the classifier by detecting “ambiguous flows”. Moreover, AppScanner also used a postvalidation mechanism to reject samples with low prediction credibility. The experimental results reported 96% average accuracy in the best case with recall lower than 40%. This work suffers from the fact that bursts are used to model traffic; thus AppScanner is only feasible in simple networks, such as a network that contains a single mobile device. It cannot be applied to high-speed backbone networks because bursts are unlikely to be extractable there.

Some approaches based on CNNs (Convolutional Neural Networks) also achieve notable performance. For example, Chen et al. [16] encoded HTTP plaintext requests and identified 20 apps using a 2D-CNN, but this technique only works with unencrypted traffic. Wang et al. [17] proposed a classifier for identifying malicious traffic based on a 2D-CNN. The raw traffic data is converted into 2D vectors as the input of the classifier. Another work of theirs [18] held the view that traffic is essentially sequential data, so a 1D-CNN model is used to identify traffic. The accuracy reached 86.6% for fine-grained classification. They showed that the classifier achieves better performance when using raw traffic data from all protocol layers than when using the payload only. However, as pointed out by Giuseppe et al. [19], the data provided in this case is always in the form of PCAP files, which contain information that could introduce a bias in the classification results. Deep Packet [20] proposed a similar mechanism to identify mobile traffic using a 1D-CNN and a Stacked AutoEncoder (SAE). Giuseppe et al. compared four NN-based traffic identification methods, and [18] gives the best performance when identifying the traffic generated by Android apps.

Although the aforementioned studies have proven to be effective, the proposed methods do not take into account the impact of unknown traffic on the classifier, thus impeding their application in the real-world networks.

3. Methodology

To handle the real-world mobile traffic identification task, a classifier needs to meet two requirements. One is identifying the target app traffic correctly and the other is eliminating a large amount of unknown traffic even if the unknown traffic training set is insufficient. Based on these two requirements, we present a three-layer classifier and the architecture is depicted in Figure 1.

We first introduce the terms defined in this paper. A bidirectional flow, which is a set of packets carrying the same 5-tuple (i.e., source IP address, source port, destination IP address, destination port, and transport protocol), is used to decompose the captured traffic into discrete units. Flow is used to represent bidirectional flow in the rest of the paper when it does not cause ambiguity. For a TCP connection, SYN and RST/FIN indicate the beginning and end of the flow, respectively. A timeout mechanism (90s) is used to determine the end of a flow when a termination is not observed. Since mobile apps mostly use HTTP/HTTPS, only TCP flows are considered in this paper. However, the proposed method can be ported to work with UDP traffic without any changes. Target represents the traffic that comes from the apps of interest (also called target apps in the rest of the paper). Appi represents the i-th target app. The unknown traffic that comes from the unconcerned apps and zero-day apps is defined as the Other category. Inspired by AppScanner, “ambiguous flows”, that is, traffic that is common to more than one app, are also used in our method. A new ambiguity detection method, which will be described in detail later in this paper, is designed to extract ambiguous flows.
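The flow definition above can be illustrated with a minimal Python sketch. The `Packet` record, the flag strings (`"S"`, `"SA"`, `"FA"`, ...), and the function names are hypothetical and only serve the illustration; the sketch also simplifies TCP teardown by closing a flow at the first FIN or RST and ignoring packets that arrive when no flow is open, which is an assumption rather than the behavior of the authors' tool.

```python
from collections import namedtuple

# Hypothetical packet record; flags is a string such as "S", "SA", "A", "FA", "R".
Packet = namedtuple("Packet", "ts src_ip src_port dst_ip dst_port proto flags")

FLOW_TIMEOUT = 90.0  # seconds, the timeout stated in the text


def flow_key(pkt):
    """Direction-independent key so both directions map to the same flow."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.proto)


def split_into_flows(packets):
    """Group packets into bidirectional flows.

    A flow starts at a SYN and ends at the first FIN/RST, or when the gap
    to the next packet with the same key exceeds FLOW_TIMEOUT.
    """
    active = {}    # key -> packets of the currently open flow
    finished = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = flow_key(pkt)
        flow = active.get(key)
        if flow is not None and pkt.ts - flow[-1].ts > FLOW_TIMEOUT:
            finished.append(active.pop(key))      # closed by timeout
            flow = None
        if flow is None:
            if "S" not in pkt.flags:
                continue                          # stray packet, no open flow
            flow = active[key] = []
        flow.append(pkt)
        if "F" in pkt.flags or "R" in pkt.flags:
            finished.append(active.pop(key))      # explicit termination
    finished.extend(active.values())              # still open at end of capture
    return finished
```

Sorting the two endpoints inside `flow_key` is one simple way to make the key symmetric; any canonical ordering of the endpoints would serve the same purpose.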

3.1. Coarse-Grained Classification

The first layer of the architecture is a coarse-grained binary classifier that identifies flows as Target or Other. This binary classifier does not attempt to discriminate Target from Other accurately, which would be unrealistic because the classifier cannot be trained on the universe of unknown flows. Although the unknown instances are insufficient, the classifier can still learn some patterns of unknown traffic from the existing instances. Thus the primary purpose of this stage is to eliminate as much unknown traffic as possible without misidentifying Target flows, thereby reducing incorrect classifications by the subsequent second-layer classifier. Hence this binary classifier is expected to have a high recall, but may have a low precision, on the Target class. This can be achieved by assigning appropriate weights to training instances.

3.2. Fine-Grained Classification

The second layer is responsible for fine-grained classification; i.e., the classifier in this layer aims to distinguish between the traffic of different target apps. If there are N target apps, then an N+1-class classifier is trained in this layer. The N+1 classes consist of the N target apps and the ambiguous flows class. The flows classified as Target in the first layer are classified by this fine-grained classifier. The possible classification results of a flow at this stage are as follows:
(1) Classified as ambiguous flow: the classifier cannot attribute the flow to a target app and refuses to give an explicit label.
(2) Classified as Appi and the flow belongs to Appi: the classification produces a true positive.
(3) Classified as Appi but the flow belongs to another target app or an unknown app: the classifier produces a false positive on Appi.

In closed-world settings, the primary purpose of a classifier is to distinguish the traffic of different target apps effectively; the traffic generated by other apps is not under consideration. By contrast, in open-world settings, unknown traffic is the main source of false positives and dramatically decreases the performance of the classifier. Therefore, the third layer is designed to verify the classification results of the second layer.

3.3. False Positive Exclusion

The third layer aims to eliminate the false positives caused in previous layers, i.e., to eliminate the misclassified target traffic in the second layer and unknown traffic that is not excluded by the first layer. The involved classification categories in this stage include N target apps, Other class, and ambiguous flow class. Then (N+2)(N+1)/2 binary classifiers are trained using One vs One.

If a flow is classified as Appi in the second layer, then it will be classified by N+1 binary classifiers in this layer. The N+1 binary classifiers are Appi vs Appj (j not equal to i), Appi vs Other, and Appi vs ambiguous flow. The output of this layer is Appi only when all binary classifiers classify the flow as Appi. Otherwise, this stage refuses to give a prediction. Multiple base classifiers and different traffic features can be utilized to train these binary classifiers so that mobile traffic can be portrayed from different perspectives. In short, the classifiers designed in this stage start with the assumption that if a flow belongs to Appi, it should be identified as Appi regardless of the feature or model used.

In this way, the third stage focuses on portraying each target category from multiple perspectives. Therefore, even if the patterns of unknown traffic are not learned by the classifiers, the strict prediction criterion of the third layer enables the classifier to eliminate nontarget instances effectively. Additionally, although One vs One is used to train (N+2)(N+1)/2 classifiers, each flow that arrives at the third layer only needs to be classified by N+1 classifiers.

In fact, the third stage excludes the unknown traffic at the expense of the number of flows whose prediction is valid. However, we hold the same view as [15]: false positives are usually undesirable for app identification.
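The unanimous-vote criterion of the third layer can be sketched in Python as follows. The function name, the dictionary layout of the pairwise classifiers, and the use of plain callables in place of trained models are assumptions made for illustration only.

```python
def third_layer_verify(flow, app_i, categories, pairwise):
    """Accept the second-layer label app_i only on a unanimous vote.

    categories: all N+2 labels (N target apps, "Other", "Ambiguous").
    pairwise:   dict mapping frozenset({a, b}) to a list of binary
                classifiers (callables returning a or b); the list holds
                one entry per base model or feature view.
    Returns app_i, or None when the layer refuses to give a prediction.
    """
    for other in categories:
        if other == app_i:
            continue
        for clf in pairwise[frozenset((app_i, other))]:
            if clf(flow) != app_i:
                return None   # a single dissenting vote rejects the flow
    return app_i
```

Because a single dissent rejects the flow, adding more base models or feature views can only lower the number of accepted flows, which matches the trade-off between false positives and recall described above.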

3.4. Classifier Implementation
3.4.1. Ambiguous Flows Extraction

An Android app is built to aid us in collecting and labelling mobile traffic. Further information on this tool is available in Section 4. However, there is some “noisy data” in the captured data. First, network traffic coming from different apps using the same third-party libraries has different app labels. Second, we do not impose any restrictions on the user behavior, but some user actions may cause flows to have wrong labels. For example, if a user clicks an Appj’s link in Appi and keeps on using Appj in Appi process, the generated traffic will be labelled as Appi rather than Appj. In fact, the pattern of such traffic is in accord with that of Appj. Thus the classifier will be given contradictory training examples. We adopt the concept of “ambiguous flow” given in AppScanner to alleviate these problems.

In contrast to AppScanner, which trained a Random Forest classifier to extract such traffic, this paper uses a heuristic rule to extract ambiguous flows. Network traffic coming from the same third-party library may have identical server IP addresses and ports. Similarly, in the second case described above, the server IP address and port of the mislabelled traffic should also be associated with the traffic generated by Appj. Based on this assumption, we extract the ambiguous flows as follows. First, the training set is grouped according to the (server IP address, port) pair. Then for each group, if the traffic in the group has multiple labels and there is no dominant category, that is, no category accounts for more than 90% of the total sample size in the group, the flows in this group are relabelled as ambiguous flows.
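A minimal Python sketch of this heuristic is given below. It assumes that the grouping key is the server IP address and port discussed in the preceding sentences; the function name, the tuple layout of a flow, and the `"Ambiguous"` label string are hypothetical.

```python
from collections import Counter, defaultdict

AMBIGUOUS = "Ambiguous"
DOMINANCE = 0.9  # a label is dominant if it covers more than 90% of a group


def relabel_ambiguous(flows):
    """flows: list of (server_ip, server_port, label) training instances.

    Returns the list of labels after relabelling, in the input order.
    """
    groups = defaultdict(list)
    for idx, (ip, port, _label) in enumerate(flows):
        groups[(ip, port)].append(idx)   # assumed grouping key

    labels = [label for _, _, label in flows]
    for members in groups.values():
        counts = Counter(labels[i] for i in members)
        top_share = counts.most_common(1)[0][1] / len(members)
        # Multiple labels and no dominant one: the whole group is ambiguous.
        if len(counts) > 1 and top_share <= DOMINANCE:
            for i in members:
                labels[i] = AMBIGUOUS
    return labels
```

Groups with a dominant label keep their original labels, including the minority instances, since the rule as stated only relabels groups without a dominant category.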

3.4.2. Traffic Features

We designed 37 traffic features, including packet length related features, time interval related features, packet numbers, and ports. Then the correlation-based feature selection and best-first search provided in Weka [21] were used to select an effective feature set. Additionally, Bela et al. [22] showed that a P2P traffic classifier can reach an accuracy of over 95% using only the first 16 bytes of the first packet of each flow, so these byte values are also included. As listed in Table 1, our final feature set has 29 features: 12 statistical features, 16 byte values, and the destination port. In order to classify traffic in real time, we extract all features from the first five packets with non-null payload of each flow, given that Bernaille et al. [23] found that the first five packets of a TCP connection are effective for traffic classification.
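The shape of such a feature vector can be sketched in Python as follows. The 12 statistical features of Table 1 are not reproduced here; the six statistics below are illustrative placeholders, and the zero-padding of short payloads is likewise an assumption.

```python
from statistics import mean

N_PACKETS = 5   # first five packets with non-null payload
N_BYTES = 16    # byte values taken from the first payload packet


def extract_features(packets, dst_port):
    """packets: time-ordered list of (timestamp, payload_bytes) of one flow.

    Returns byte values + illustrative statistics + destination port,
    or None if the flow carries no payload.
    """
    head = [(ts, p) for ts, p in packets if len(p) > 0][:N_PACKETS]
    if not head:
        return None
    first_payload = head[0][1]
    # First 16 byte values, zero-padded for short payloads (assumption).
    byte_values = list(first_payload[:N_BYTES])
    byte_values += [0] * (N_BYTES - len(byte_values))
    lengths = [len(p) for _, p in head]
    gaps = [b[0] - a[0] for a, b in zip(head, head[1:])] or [0.0]
    # Placeholder statistics; the paper's actual 12 features are in Table 1.
    stats = [min(lengths), max(lengths), mean(lengths),
             min(gaps), max(gaps), mean(gaps)]
    return byte_values + stats + [dst_port]
```

Since only the first five payload-carrying packets are inspected, the vector is available early in the connection, which is what makes real-time classification feasible.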

3.4.3. Base Classifier

Previous work [12, 14, 15] has shown decision tree-based models have an impressive performance in identifying mobile traffic. Therefore, we use decision tree-based models as base classifiers to implement our method. The details of each layer classifier are shown in Table 2.

The training set includes 2 categories in the first layer, i.e., Target and Other, and 16 features are used to train a Random Forest classifier. Next, ambiguous flows are extracted from the training set. The training set for the Random Forest classifier in the second layer consists of the instances of the ambiguous flows class and the remaining instances of N target apps. In the third stage, in order to describe traffic from a different perspective, port and 12 statistical features are used as traffic features and two different decision tree-based classifiers are trained. The training set in this stage includes N+2 categories. The N+2 categories include the ambiguous flows class, N target app classes, and Other class. Other class contains unknown instances that are misclassified by the first layer classifier. This is because the third layer classifier has no need to learn the features of an unknown flow if the flow is excluded in the first layer. Since we use two models at this stage, the third layer contains (N+2)(N+1) classifiers in total, and a flow entering this stage needs to be classified by 2(N+1) classifiers.
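The classifier counts stated above can be checked by enumerating the One vs One pairs directly. The following Python sketch uses N = 7, the number of target apps chosen in the evaluation; the function name and label strings are hypothetical.

```python
from itertools import combinations


def third_layer_counts(n_targets, n_models=2):
    """Return (total binary classifiers, classifiers applied to one flow)."""
    categories = [f"App{i}" for i in range(1, n_targets + 1)]
    categories += ["Other", "Ambiguous"]           # N+2 categories in total
    pairs = list(combinations(categories, 2))      # (N+2)(N+1)/2 pairs
    per_flow = [p for p in pairs if "App1" in p]   # pairs involving one app: N+1
    return len(pairs) * n_models, len(per_flow) * n_models


print(third_layer_counts(7))  # (72, 16)
```

With N = 7 and two base models, this gives (N+2)(N+1) = 72 classifiers in total and 2(N+1) = 16 classifiers per flow.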

4. Evaluation

4.1. Dataset Collection

Since mobile traffic involves private user data, no public mobile traffic dataset is currently available. Therefore, existing work uses self-collected datasets to validate the proposed methods. However, the commonly used data collection methods have drawbacks and scale poorly. First, mobile devices generate a lot of background traffic, so running one application at a time does not yield accurate ground truth; additional measures are still needed to exclude background traffic [15]. Additionally, although there are tools, such as Network Log [25], that can collect and label traffic according to the app process, these tools always require a rooted device, which results in poor scalability.

To overcome the above deficiencies, we built an Android app based on the VPNService framework provided by the Android system. This tool does not require the mobile device to be rooted. When it runs in the background, it captures traffic and labels it according to the app process that generates it. The captured traffic is saved as a pcap file, and the file is sent to a server every 5 minutes. Our data collection has the following benefits over manually running an application in a limited environment or using UI fuzzing techniques to run the apps automatically. First, once the app is launched, the device uploads its traffic whenever the user uses it. Thus there is no need for further human intervention. Second, some execution paths in an app cannot be reached by UI fuzzing, but this problem does not exist in our approach. Third, the captured traffic is generated in various network environments, making our dataset more representative. Fourth, this method is not affected by background traffic and scales well. However, since other mobile operating systems do not provide similar interfaces, this tool is only available for Android devices. We are developing a functionally similar app for iOS to obtain traffic from iOS devices in future work.

Using the aforementioned tool, we collected the mobile traffic generated by the mobile devices of 12 users over nearly three months. Although all devices run the Android system, they come from different vendors such as HuaWei, XiaoMi, and Samsung. The final captured dataset contains network traffic from 160 apps. Besides, the traffic is generated in various network environments covering 3G, 4G, and Wi-Fi. The collected data is divided into two datasets as shown in Table 3.

Dataset1 is used to train and test our three-layer classifier. Seven apps with more than 5000 flows in Dataset1 are chosen as target apps in our setting; the remaining 131 apps act as unconcerned apps. Dataset2 is only used to test the classifier. It is worth mentioning that Dataset2 contains 22 apps that have not been seen in Dataset1; thus the 22 apps can be regarded as zero-day apps and they account for 5569 flows. The details of the two datasets are listed in Table 4.

4.2. Evaluation Metrics and Experimental Setup

Five evaluation metrics are used: True Positive (TP), False Positive (FP), False Negative (FN), precision, and recall. For an app A, TP refers to the number of samples correctly classified as A. FP refers to the number of samples incorrectly classified as A. FN refers to the number of samples incorrectly classified as non-A. Then the precision and recall of identifying A are TP/(TP+FP) and TP/(TP+FN), respectively.
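These definitions translate directly into code. The following Python sketch computes the five metrics for one app from lists of true and predicted labels; the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption.

```python
def per_app_metrics(y_true, y_pred, app):
    """Return (TP, FP, FN, precision, recall) for a single app."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == app and p == app for t, p in pairs)
    fp = sum(t != app and p == app for t, p in pairs)
    fn = sum(t == app and p != app for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall
```

Under these definitions, a flow of app A that the classifier refuses to label counts as a false negative for A, which is why a strict rejection criterion lowers recall without affecting precision.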

The Scikit-learn [26] machine learning library is used to implement our classifiers. We compare our classifier with the state of the art [18], and a single Random Forest classifier is trained as the baseline. The baseline is an N+1-class Random Forest classifier that includes 30 trees with a maximum depth of 20, and it is trained on the 29 features described in Table 1. To implement the 1D-CNN model proposed in [18], the first 784 bytes of the payload of each flow are converted into a 1D vector, and an N+1-class 1D-CNN classifier is trained by Keras [27] with TensorFlow [28] as the backend. The parameters of the 1D-CNN classifier are consistent with those in [18]. The ambiguous flows detection is not applied to these two classifiers. The parameters of our classifier are as follows. The Random Forest classifiers of the first two layers each include 30 trees with a maximum depth of 20. The Random Forest classifiers of the third layer include 20 trees with a maximum depth of 20, and the XGBoost classifiers include 10 trees with a maximum depth of 5.

4.3. Evaluation on Dataset1

Before evaluating our classifier, we first verify how much mobile traffic can be identified using the DNS-based method described by Trevisan et al. [11]. First, we extract the (domain name, IP address) pairs from the DNS traffic in Dataset1. Then the IP addresses with a single domain are used to identify traffic. This method can identify up to 30.66% of the flows, which account for 20.8% of the bytes. Hence only a minority of the traffic can be identified by the DNS-based method.
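The DNS-based check can be sketched in Python as follows, assuming the DNS responses have already been parsed into (domain name, IP address) pairs and each flow is summarized by its server IP address and byte count. The function names and data layout are hypothetical.

```python
from collections import defaultdict


def unique_domain_ips(dns_pairs):
    """Return {ip: domain} for IP addresses seen with exactly one domain."""
    domains_by_ip = defaultdict(set)
    for domain, ip in dns_pairs:
        domains_by_ip[ip].add(domain)
    return {ip: next(iter(d)) for ip, d in domains_by_ip.items() if len(d) == 1}


def dns_coverage(flows, ip_to_domain):
    """flows: list of (server_ip, n_bytes).

    Returns the share of flows and the share of bytes whose server IP
    address maps to a single domain name.
    """
    hits = [(ip, n) for ip, n in flows if ip in ip_to_domain]
    total_bytes = sum(n for _, n in flows)
    return len(hits) / len(flows), sum(n for _, n in hits) / total_bytes
```

The two shares can differ considerably, as in the figures reported above, because flows to shared infrastructure such as CDNs tend to carry more bytes than flows to single-domain servers.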

To evaluate our classifier, Dataset1 is randomly split into a training set (70% of samples) and a testing set (30% of samples). For each classifier the evaluation process is repeated 10 times (with different splits each time) and the average results are presented. The average precision, recall, and FP for target apps of each classifier are listed in Table 5. The detailed results are illustrated in Figures 2–4.

The results show that our method achieves the highest precision of nearly 99% and produces a much smaller average FP number than the other two classifiers. The 1D-CNN classifier has the highest FP number, resulting in its lowest precision. Our method produces a 94% reduction in FP compared to the baseline, which indicates that the third layer of our classifier has a better capability of excluding unconcerned traffic. However, our classifier has the lowest recall. As can be seen in Figure 3, the low recall on TaoBao, BaiDu, and QQ lowers the average recall.

The recall of the second and third layer of the proposed classifier is shown in Figure 5. It can be seen that the classifier already has a low recall on the latter three apps in the second stage. In the second stage, 57.29%, 47.43%, and 33.97% of the instances of the latter three apps were classified as ambiguous flows, respectively, which is the main reason for the classifier’s low recall. Then we examined the training data carefully. It is interesting to note that the latter three apps have a lot of associations with the unconcerned apps in Dataset1. For example, QQ is an instant messaging app, but it integrates a lot of functions, such as news, mail management, and music playing. Moreover, these additional functions have independent apps which belong to the same company as QQ. This results in the fact that some traffic of QQ has similar behavior patterns to that of other apps. And the classifier would reject the judgment of these flows aggressively in order to prevent false positives. A similar phenomenon exists in TaoBao and BaiDu.

Additionally, we investigate whether the low recall is related to the sample size, considering that the sample sizes of the latter three apps are indeed lower than those of the other four apps. To this end, we oversampled the training set using SMOTEENN [29] before training the second and third layer classifiers. The results are shown in Table 4 and Figures 2–4. It can be seen that the sample size is not the main factor affecting the recall of the classifier. Oversampling does increase the recall of apps with smaller sample sizes but also greatly increases the FP.

4.4. Evaluation on Dataset2

We retrain the three classifiers with Dataset1 as training set and evaluate them on Dataset2. The results are listed in Table 6.

The results are similar to the evaluation results for Dataset1 and show that the three-layer classifier has a better capability of excluding unknown traffic than the other two classifiers. The proposed classifier produces 152 false positives in total, among which 45 flows come from zero-day apps. By contrast, Random Forest produces 1478 false positives, among which 403 flows come from zero-day apps. 1D-CNN produces 3963 false positives, among which 1348 flows come from zero-day apps. Therefore, our classifier can exclude up to 99.2% of zero-day traffic.
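The 99.2% figure can be reproduced from the numbers above if the exclusion rate is taken to be one minus the share of zero-day flows that end up as false positives. This reading of the metric is an assumption; the short Python sketch below applies it to all three classifiers.

```python
ZERO_DAY_FLOWS = 5569  # flows from the 22 zero-day apps in Dataset2

# False positives caused by zero-day flows, as reported in the text.
zero_day_fp = {"three-layer": 45, "Random Forest": 403, "1D-CNN": 1348}


def exclusion_rate(false_positives, total=ZERO_DAY_FLOWS):
    """Share of zero-day flows that were not labelled as a target app."""
    return 1 - false_positives / total


for name, fp in zero_day_fp.items():
    print(f"{name}: {exclusion_rate(fp):.1%}")
```

For the three-layer classifier this gives 1 - 45/5569, which rounds to 99.2%.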

Additionally, some interesting results are observed by scrutinizing the remaining false positives produced by our classifier. For example, 31 flows are misclassified as Tencent Video, among which 3 flows belong to QQ Music and 28 flows belong to Tencent News. It is noteworthy that Tencent Video, QQ Music, and Tencent News are all developed by Tencent, and Tencent News needs to access a lot of video resources which are also accessed by Tencent Video. A similar situation exists in QQ, where 46 false positives are from several other apps of Tencent, such as Tencent Maps and Tencent Weibo. Besides, we extracted the server IP address of the flow which is misclassified as Sougou Pinyin and then filtered the flows in the Dataset1 with the same server IP address. We note that all labels of those training samples are Sougou Pinyin. Therefore, the misclassified flow is likely to have an inaccurate ground truth.

4.5. Encrypted and Unencrypted Traffic Identification

To examine how the proposed classifier behaves on encrypted and unencrypted traffic, we apply it to each type of traffic separately. For simplicity, flows over port 443 are considered to be encrypted traffic in this paper. All other flows are considered unencrypted traffic, even if the data of a flow is encrypted before transmission.
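The labeling rule above amounts to a single port comparison. The following sketch assumes a hypothetical `Flow` record whose field names are not taken from the paper, and assumes that "port 443" refers to the server-side port.

```python
from dataclasses import dataclass

HTTPS_PORT = 443  # the only port treated as encrypted in this paper


@dataclass(frozen=True)
class Flow:
    """Hypothetical minimal flow record; field names are assumptions."""
    server_port: int
    size_bytes: int
    app: str


def is_encrypted(flow: Flow) -> bool:
    """Port-based heuristic: a flow counts as encrypted iff it uses port 443.

    Flows on other ports count as unencrypted even if their payload was
    encrypted by the app before transmission.
    """
    return flow.server_port == HTTPS_PORT


if __name__ == "__main__":
    print(is_encrypted(Flow(server_port=443, size_bytes=1200, app="QQ")))
    print(is_encrypted(Flow(server_port=80, size_bytes=500, app="QQ")))
```

The heuristic is cheap to apply but can mislabel flows in both directions, which is worth keeping in mind when reading the per-app results.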

In Dataset1, 70% of the encrypted flows and 70% of the unencrypted flows are used to train the three-layer classifier, and the remainder are used as the test set. The encrypted traffic distribution of Dataset1 and the identification results are shown in Table 7. The encryption ratio of an app is the ratio of the size of its encrypted flows to the size of all its flows.
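As a rough illustration of this setup, the sketch below computes an encryption ratio and performs a 70/30 split. It assumes each flow is a `(server_port, size_bytes)` pair and that size is measured in bytes; the paper only says "size", so both the representation and the unit are assumptions, as are the function names.

```python
import random
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def encryption_ratio(flows: Iterable[Tuple[int, int]], encrypted_port: int = 443) -> float:
    """Return encrypted size / total size for one app's flows.

    Each flow is a (server_port, size_bytes) pair; returns 0.0 for no traffic.
    """
    encrypted = total = 0
    for port, size in flows:
        total += size
        if port == encrypted_port:
            encrypted += size
    return encrypted / total if total else 0.0


def split_70_30(flows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `flows` and split it 70% train / 30% test."""
    shuffled = list(flows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    app_flows = [(443, 600), (80, 300), (8080, 100)]  # made-up sizes
    print(encryption_ratio(app_flows))
    train, test = split_70_30(app_flows)
    print(len(train), len(test))
```

Applying `split_70_30` to the encrypted and unencrypted flows separately keeps the proportion of each traffic type the same in the training and test sets.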

Table 7 shows that the precision for unencrypted traffic is only slightly higher than that for encrypted traffic, whereas the recall for unencrypted traffic is much higher. This indicates that identifying encrypted traffic is indeed more difficult than identifying unencrypted traffic. One possible reason is that the proposed classifier uses the payload byte values as features, and encryption removes the distinctive patterns from those byte values. In addition, the encryption ratio of BILIBILI is relatively high, and its recall is also higher than that of the other apps. TaoBao also has a high encryption ratio, but its recall is lower. In contrast, QQ has a low encryption ratio and a low recall. Therefore, the identification performance on encrypted traffic is not directly related to the encryption ratio of an app. Identifiable and unidentifiable encrypted traffic can be compared further in future work.

5. Discussion

Mobile traffic identification in the real world requires more than merely identifying and discriminating the traffic of apps of interest. It also requires eliminating the massive volume of traffic from unknown apps. In contrast to other methods proposed for closed-world settings, our method takes both requirements into account. The experimental results show that it performs better than the state of the art. Several observations from this work deserve further discussion.

First, the results show that 1D-CNN is poor at excluding unknown traffic. 1D-CNN uses the payload as input, so the extracted features are limited to sequence features in the payload, which may explain its poor performance. In contrast, models based on decision trees and side-channel features are more robust. Second, the evaluation results for Dataset2 suggest that it is difficult to completely discriminate an app from other apps, especially when two apps are closely related, for example when they share functionality and are developed by the same company. In this case, both apps access the same resources and thus generate similar traffic that cannot be discriminated. It is worth mentioning that half of the false positives produced by our classifier are caused by this. Therefore, the impact of various associations between apps on identification tasks deserves further study. Third, the experimental results show that encrypted traffic is more difficult to identify than unencrypted traffic. To better identify encrypted traffic, dedicated traffic features could be designed for it, and encrypted and unencrypted traffic could be identified separately.

The limitation of our classifier is that its second layer aggressively excludes many true positives, resulting in a low recall for some apps. In future work, we will try to design different ambiguous traffic extraction methods to detect ambiguous flows and thus improve the performance of our classifier.

6. Conclusion

In this paper, we proposed a three-layer classifier to identify mobile traffic. The classifier can distinguish the traffic of different target applications and eliminate unknown traffic effectively. We collected a representative dataset to validate the classifier. The proposed classifier has a precision of 98.9% and produces far fewer false positives than the state of the art. Additionally, the experimental results show that our classifier is highly capable of detecting traffic from zero-day apps, which meets the requirements of mobile traffic identification in real-world networks.

Data Availability

The mobile network traffic data used to support the findings of this study have not been made freely available because of the need to protect user privacy. Requests for access to these data should be made to Shuang Zhao, [email protected].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61379148.