Abstract

The post-COVID-19 world has increased dependence on online businesses for day-to-day transactions over the Internet, especially from smartphones and other handheld devices. This increased dependence has created new attack surfaces that need to be evaluated by security researchers. The large market share of Android attracts malware authors to launch more sophisticated malware (12,000 per day), so detecting it is becoming crucial. In this paper, we propose PICAndro, which enhances the accuracy and depth of malware detection and categorization using packet inspection of captured network traffic. The identified network interactions are represented as images, which are fed into a CNN engine. PICAndro shows improved performance, with an accuracy of 99.12% for malware detection and 98.91% for malware class detection, and with high precision.

1. Introduction

Cell phones have become a vital part of our routine for accessing valuable services such as mobile banking, shopping, food delivery, and governance. The data transferred by these apps are sensitive, and many malicious applications are designed to obtain such information by different means [1]. Cybercriminals resort to social engineering, most commonly by passing a malicious application off as another popular and desirable one. Recently, the popular and attention-grabbing name “Coronavirus” has been used in different ways for malicious purposes, such as in package names concealing spyware, banking Trojans, adware, and droppers [2]. This was not limited to naming: the pandemic theme was also used in application user interfaces. Mobile malware, and adware in particular, often comes in the form of a gaming or entertainment app that seems harmless, while users are unaware that their device is performing malicious activities in the background [3]. Mobile malware is therefore on the rise, with attackers shifting their efforts to smartphones and tablets as global mobile markets come under attack. Staying secure means recognizing the risk, understanding common threats, and adopting an effective malware detection mechanism [4]. Figure 1 illustrates the rise in research publications in the domain of malware detection and related terms.

Existing malware detection mechanisms rely on two methods, dynamic and static analysis. In addition, recent literature reviews on machine learning have influenced research studies and explored technical details of machine learning-based malware detection. Numerous past works address Android malware detection, yet the vast majority of them use a limited set of features to distinguish malware [6–8]. Each kind of feature captures only a few properties of an application. Android malware detectors are vulnerable and can be evaded. A robust approach is required to establish a durable defense against these adversarial attacks, one that is difficult to bypass. The growing risk of malware threats has pushed the Android antimalware industry to develop solutions for mitigating malware threats on Android cell phones and other Android devices [9, 10].

In this paper, we propose PICAndro (Packet InspeCtion-based Android malware detection), a network interaction-based detection framework. We first generate network traffic logs through dynamic analysis of an input APK (an Android executable) and then convert the network traffic into network interactions through packet inspection. These interactions are represented as gray-scale images, which are fed into a convolutional neural network model for training. Finally, the model is evaluated against the dataset for malware detection.

The paper is organized as follows. Section 2 discusses the related work. In Section 3, we present the proposed framework, followed by the performance evaluation in Section 4. We discuss issues related to the proposed approach and compare it with existing work in Section 5, followed by the conclusion in Section 6.

2. Related Work

The growing volume of Android malware brings more security issues to mobile users and makes the malware challenging to identify [11]. Researchers have focused on various solutions for Android malware detection.

2.1. Network Traffic-Based Android Malware Detection

Various solutions have been proposed in the literature to identify and classify Android malware. Malik and Kaushal [14] presented CREDROID, a semiautomated Android malware detection approach using network traffic. The authors focused on DNS server and remote server traffic to identify malware transferring sensitive information. The proposed solution cannot identify malware that has no network interactions. Li et al. [12] proposed a technique to detect malware based on network traffic monitoring and used SVM for feature extraction. The authors focused on improving the ability of Android terminals to defend against malicious attacks. Arora et al. [13] focused on malware detection in Android-based mobiles. The authors used rule-based classifiers in traffic analysis for malware detection.

Zulkifli et al. [16] proposed a dynamic malware detection technique based on decision tree algorithms, emphasizing behavioral aspects of the network. The authors used the Drebin and Contagio datasets for feature selection. Wang et al. [29] focused on multilayer traffic analysis for malware detection. The authors proposed lightweight malware detection based on a combination of network traffic analysis and machine learning. The proposed approach is applied on the server side only. Zaman et al. [30] focused on malware detection and proposed a method based on behavioral analysis using syscall tracing. Abuthawabeh and Mahmoud [17] proposed a model for Android malware detection and categorization based on conversation-level network traffic features. However, the authors do not include feature extraction at run time.

2.2. Deep Learning-Based Android Malware Detection

Different methodologies have been proposed in past research with the aim of identifying Android malware using deep learning mechanisms. Alzaylaee et al. [21] proposed a deep learning-based malware detection approach for mobile applications based on dynamic analysis with the help of stateful input generation. The proposed method can detect zero-day Android malware. The authors evaluated 31,125 Android applications and 420 static and dynamic features. Wu [31] presented a detailed study of deep learning-based Android malware detection solutions and classified them by technique. Yuan et al. [19] proposed a deep learning-based malware detector based on rule mining techniques. The authors extracted 192 features from both static and dynamic analysis with the help of a DBN-based deep learning model.

Kim et al. [20] presented a detailed study of multimodal deep learning for malware detection and proposed a malware detection framework based on static analysis. The framework is flexible, so that more features can be added in the future as required. Sihag et al. [22] proposed a deep learning-based Android malware detection framework using dynamic features. The authors applied dynamic analysis to the logs of Android APKs and processed the resulting features. The proposed approach was tested on 13,533 applications to extract behavioral patterns. Zhang et al. [23] focused on feature selection and processing and proposed an Android malware detection approach based on the text sequences of apps generated by AndroPyTool. Bayazit et al. [24] proposed a neural network-based Android malware detection mechanism based on IP feature selection. The authors used the CICMalDroid2017 dataset for analysis. Each IP address was converted into integers and subdivided into four numbers.

2.3. Image-Based Android Malware Detection

Darwaish et al. [28] presented an image-based Android malware detection approach that is robust against various adversarial settings. The authors checked the proposed approach against two novel attacks. Ding et al. [25] proposed a CNN-based static Android malware detection method. The authors treated the bytecode file as a binary stream and converted it into a 2D matrix. Mercaldo and Santone [18] focused on the familial classification problem of malware and evaluated their approach against 50,000 samples. They represented mobile applications as gray-scale images to identify the malware families to which they belong.

The static analysis approach can be affected by code obfuscation and code manipulation techniques [32]. Ünver and Bakour [27] proposed a framework for distinguishing Android applications as benign or malware. Yang and Wen [33] inspected unzipped files from APK files using image patterns with the help of a random forest classifier.

Table 1 provides a comprehensive overview of research work in network, deep learning, and image-based Android malware detection approaches available in the literature.

3. Design of PICAndro

In this section, we discuss the overview and design of the proposed framework.

3.1. Overview

The architecture of the proposed PICAndro framework is illustrated in Figure 2. The objective of the framework is to classify a given Android application executable (APK) based on its network behavior. The network behavior of an APK is extracted by executing it in an emulated environment. Captured network interactions in the form of packets are inspected to extract network flows and sessions, which are then represented as images. The generated images are fed into a convolutional neural network to train the model, which is then evaluated against the test dataset to answer our research questions. The proposed approach consists of the modules described below.

3.2. Dynamic Analysis

Two types of approaches, namely, static and dynamic analysis, are used to extract application features. Static (code) analysis analyzes an app by scanning its code, whereas dynamic analysis extracts features by executing it. We employ dynamic analysis to record application behavior, as it is effective against evasive applications [34]. The first module of the proposed approach runs sample Android applications on an emulator to log application behavior and capture network traffic [35]. User interactions were fed to the emulator using the Monkey tool. The captured log includes system calls, network traffic, binder calls, and composite behavioral interactions. For our analysis, we focus on network traffic only.
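As a rough illustration of how random user events can be injected with the Monkey tool, the following sketch only builds an adb command line and does not run it. The package name, event count, seed, and throttle values are hypothetical placeholders and are not the settings used in this work.

```python
import shlex


def monkey_command(package, events=500, seed=42, throttle_ms=200):
    """Build (but do not execute) an adb command that drives one app with Monkey.

    All default values are illustrative assumptions, not the paper's settings.
    """
    return [
        "adb", "shell", "monkey",
        "-p", package,                   # restrict events to this package
        "-s", str(seed),                 # fixed seed gives a repeatable event stream
        "--throttle", str(throttle_ms),  # delay between events, in milliseconds
        "-v",                            # verbose logging
        str(events),                     # number of pseudo-random events to inject
    ]


# Hypothetical package name used purely for illustration.
cmd = monkey_command("com.example.sample")
print(shlex.join(cmd))
```

The command list could be passed to a process runner while the emulator's traffic is being captured; that step is left out here because it depends on the analysis environment.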

3.3. Image Representation

The captured traffic comprises packets of different sizes and different network interactions. Packet inspection at different levels of network granularity yields different network interactions. The proposed work uses flows and sessions as network interactions and does not consider per-packet interactions. A flow is defined as the set of packets sharing the same five identifiers, namely, source and destination IP addresses, source and destination port numbers, and protocol, whereas a session is the collection of flows in both directions that belong to one connection. For data uniformity, we consider only the first N × N bytes of a flow/session when representing it as an N × N image. The starting bytes of a flow/session best reflect its characteristics, as they contain connection information and a small amount of data content. Each byte of the network interaction is represented as a pixel (e.g., 0xff represents a white pixel and 0x00 a black pixel). The steps involved are defined in Algorithm 1. Figure 3 shows the 20 × 20 image representation of flows in malware samples from different families.
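The distinction between flows and sessions can be sketched as two different grouping keys over the same packets. This is a minimal illustration under stated assumptions: the Packet record, its field names, and the sample addresses are hypothetical, packets are assumed to be already parsed, and the bytes of each packet are simply concatenated in arrival order.

```python
from collections import defaultdict, namedtuple

# Hypothetical, already-parsed packet record (not a real pcap library type).
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto data")


def flow_key(p):
    """Unidirectional 5-tuple: packets in one direction share a flow."""
    return (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)


def session_key(p):
    """Direction-independent key: both directions of a connection share a session."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)


def group(packets, key):
    """Concatenate packet bytes per group, in arrival order."""
    groups = defaultdict(bytearray)
    for p in packets:
        groups[key(p)] += p.data
    return {k: bytes(v) for k, v in groups.items()}


packets = [
    Packet("10.0.2.15", 40000, "93.184.216.34", 443, "TCP", b"\x16\x03"),
    Packet("93.184.216.34", 443, "10.0.2.15", 40000, "TCP", b"\x16\x03\x03"),
    Packet("10.0.2.15", 40001, "8.8.8.8", 53, "UDP", b"\x12"),
]
flows = group(packets, flow_key)
sessions = group(packets, session_key)
print(len(flows), len(sessions))  # 3 flows, 2 sessions
```

The two TCP packets form two flows (one per direction) but a single session, which is why session counts are lower than flow counts for the same traffic.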

Algorithm 1: Image representation of network interactions (NInt).
Input: RawTraffic.pcap
Output: Gray-scale images
Extract NInt from RawTraffic.pcap
foreach NInt do
  if NInt is empty then
    continue
  end
  if NInt already exists then
    continue
  end
  if size of NInt > N × N bytes then
    consider first N × N bytes
  end
  if size of NInt < N × N bytes then
    pad with 0x00 until size is N × N bytes
  end
  Generate N × N size gray-scale image
end

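
The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in this work: the function name is hypothetical, images are kept as nested lists of pixel intensities rather than written to image files, and duplicates are assumed to mean byte-for-byte identical interactions.

```python
def to_gray_images(interactions, n_side=20):
    """Convert raw network-interaction bytes into n_side x n_side pixel grids."""
    size = n_side * n_side
    seen = set()
    images = []
    for raw in interactions:
        if not raw:        # skip empty interactions
            continue
        if raw in seen:    # skip duplicates (assumed: identical byte strings)
            continue
        seen.add(raw)
        # Keep the first `size` bytes, then pad with 0x00 up to `size`.
        fixed = raw[:size].ljust(size, b"\x00")
        # Each byte becomes one pixel: 0 is black, 255 is white.
        rows = [list(fixed[r * n_side:(r + 1) * n_side]) for r in range(n_side)]
        images.append(rows)
    return images


samples = [b"", b"\xff\x00\x10", b"\xff\x00\x10", bytes(range(256)) * 2]
images = to_gray_images(samples, n_side=20)
print(len(images), images[0][0][:4])
```

Here the empty interaction and the repeated one are dropped, the short interaction is zero-padded, and the 512-byte interaction is truncated to its first 400 bytes.
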
3.4. Convolution Neural Networks

We employ a convolutional neural network (CNN)-based deep learning method for image classification. The CNN model is first fed with Network Interaction (NInt) images of size N × N. The first convolution layer performs the convolution operation with 32 kernels (of size 3 × 3). Its results (N × N × 32 output shape) are fed into a 2 × 2 max-pooling layer. This is followed by a second convolution layer with 64 kernels (of size 3 × 3) and a second 2 × 2 max-pooling layer. The last two layers are dense layers (dropout = 0.1). For the output layer, we use sigmoid and softmax functions for binary and multiclass classification, respectively. The rectified linear unit (ReLU) activation function is used for hidden layers, as it forwards only the positive part of its argument.
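To make the layer dimensions concrete, the sketch below traces tensor shapes through the two convolution and pooling stages. It assumes single-channel input, stride-1 convolutions that preserve spatial size (consistent with the N × N × 32 output shape stated above), and 2 × 2 pooling with stride 2 that rounds down; these are assumptions, and the widths of the dense layers are not specified here, so they are left out.

```python
def trace_shapes(n, in_channels=1):
    """Trace output shapes and convolution parameter counts for an n x n input."""
    shapes = [("input", (n, n, in_channels))]
    params = {}
    h = w = n
    c = in_channels
    for i, filters in enumerate((32, 64), start=1):
        # 3 x 3 kernels: weights per filter = 3 * 3 * input channels, plus one bias.
        params[f"conv{i}"] = 3 * 3 * c * filters + filters
        c = filters
        shapes.append((f"conv{i}", (h, w, c)))   # size-preserving convolution (assumed)
        h, w = h // 2, w // 2                     # 2 x 2 max pooling, stride 2 (assumed)
        shapes.append((f"pool{i}", (h, w, c)))
    shapes.append(("flatten", (h * w * c,)))
    return shapes, params


shapes, params = trace_shapes(20)
for name, shape in shapes:
    print(name, shape)
print(params)
```

Under these assumptions, a 20 × 20 image is reduced to a 5 × 5 × 64 feature map, that is, 1600 values entering the dense layers.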

3.5. Classification Model

The representation of network interaction behavior enables us to detect misbehavior by samples effectively. The above discussed CNN model is trained on the dataset, which is then used for classification and detection. The classification results are then used for performance evaluation.

4. Performance Evaluation

In this section, we first introduce the dataset and evaluation parameters. We then evaluate our proposed approach against the following research questions:
RQ1. Can PICAndro detect malware samples with high accuracy?
RQ2. Can PICAndro effectively classify malware samples into their classes?
RQ3. Which network interaction among flow and session is better for network traffic-based detection?
RQ4. Which image size is most effective for representing network interactions?

4.1. Dataset and Evaluation Metrics

To answer the listed RQs, we evaluate the performance and efficiency of PICAndro against a dataset. A dataset of Android applications consisting of 13533 samples was collected from different sources (Benign samples = 2621 and malicious samples = 11712) [36–39]. The dataset comprises different Malware types and Benign samples. Samples were categorized for 2-class (Malware and Benign) and 5-class (Adware, Banking, Benign, Riskware, and SMS) scenarios. Samples were analyzed using dynamic analysis, and network traffic was recorded. Of the initially collected APK samples, those that did not execute during dynamic analysis or did not generate network traffic were not considered further. The captured traffic was inspected to identify network interactions (flows and sessions). Table 2 describes the successfully analyzed sample APKs in each category of the dataset, together with the generated flow and session statistics. The Riskware category generated the largest number of interactions per sample (#Sessions/#APKs and #Flows/#APKs), with 34.6 sessions and 61.1 flows per sample. The SMS category generated the fewest network interactions, around 2.6 sessions and 5.2 flows per sample. Figure 4 represents the file size distribution of sample APK files among the categories of the dataset. Figure 5 illustrates the number of packets per session and flow for the captured traffic. Each network interaction is then represented as an image of N × N size. For experimental purposes, we considered multiple image sizes: 400 (20 × 20), 625 (25 × 25), 784 (28 × 28), and 900 (30 × 30) bytes.

Parameters listed in Table 3 are considered to evaluate the PICAndro framework.
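
For reference, the four parameters named in the research questions (Precision, Recall, F-measure, and Accuracy) can be computed from confusion-matrix counts as sketched below. The standard textbook definitions are assumed here, and the counts in the example are made up for illustration; they are not results from this work.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard detection metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up counts: 90 malicious interactions caught, 5 missed, 10 false alarms.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print({k: round(v, 4) for k, v in m.items()})
```

Because F-measure combines Precision and Recall, it is less sensitive than Accuracy to an imbalance between Benign and malicious interactions.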

4.1.1. RQ1: Can PICAndro Detect Malware Samples with High Accuracy?

The problem of malware detection deals with identifying malicious network interactions from the dataset. From the dataset, we created a binary classification scenario for both flow- and session-based network interactions. For flow-based binary classification, 81149 Benign and 322768 malicious flows were generated. For session-based binary classification, 41145 Benign and 178341 malicious sessions were generated. Table 4 shows the results of PICAndro against Precision, Recall, F-measure, and Accuracy parameters. The following conclusions are drawn from it:
(i) For binary classification, both flow- and session-based scenarios perform satisfactorily on the dataset, with accuracy greater than 99%
(ii) For 2-class classification, all scenarios with different image sizes perform considerably well with reference to the evaluation parameters

RQ1 answer: PICAndro can effectively detect malware samples with high accuracy.

4.1.2. RQ2: Can PICAndro Effectively Classify Malware Samples into Their Classes?

The problem of classifying malicious samples into their respective malware classes is popularly known as malware type detection/classification. For the performance evaluation of PICAndro, we considered a 5-class classification scenario for both flow- and session-based network interactions. The classes considered were Adware, Banking, Benign, Riskware, and SMS. For session-based classification, 23566 Adware, 30066 Banking, 41145 Benign, 114150 Riskware, and 10559 SMS class unique sessions were generated. For flow-based classification, 46435 Adware, 54000 Banking, 81149 Benign, 201570 Riskware, and 20763 SMS class unique flows were generated. Table 4 shows the results of PICAndro against Precision, Recall, F-measure, and Accuracy parameters for RQ2. The following conclusions are drawn from it:
(i) The proposed work performs satisfactorily on the dataset, with accuracy greater than 98.5% and F-measure greater than 98%
(ii) For 5-class classification, all scenarios with different image sizes perform considerably well with reference to the evaluation parameters

RQ2 answer: PICAndro can effectively classify malicious samples into their class/type with high accuracy and F-measure.
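
The per-class counts reported for RQ2 can be cross-checked against the malicious totals reported for RQ1, and they also show how unevenly the classes are represented. The sketch below uses only the counts stated in the text; the helper names are illustrative.

```python
# Unique interactions per class, as reported in the text.
flows = {"Adware": 46435, "Banking": 54000, "Benign": 81149,
         "Riskware": 201570, "SMS": 20763}
sessions = {"Adware": 23566, "Banking": 30066, "Benign": 41145,
            "Riskware": 114150, "SMS": 10559}


def malicious_total(counts):
    """Sum every class except Benign."""
    return sum(v for k, v in counts.items() if k != "Benign")


def shares(counts):
    """Fraction of all interactions contributed by each class."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


print(malicious_total(flows), malicious_total(sessions))
for name, counts in (("flows", flows), ("sessions", sessions)):
    print(name, {k: round(v, 3) for k, v in shares(counts).items()})
```

The malicious totals obtained this way agree with the 322768 flows and 178341 sessions used in the binary scenario, and Riskware alone accounts for roughly half of all interactions in both representations.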

4.1.3. RQ3: Which Network Interaction among Flow and Session Is Better for Network Traffic-Based Detection?

In the proposed work, we study the effectiveness of network interactions for network traffic classification. Multiple works on packet-based classification (malware and intrusion detection) exist in the literature. We try to identify which network interaction, flow or session, provides the better network representation. For both binary and 5-class classification, flow-based detection outperforms the session-based approach. For each image representation of size 20 × 20, 25 × 25, 28 × 28, and 30 × 30, the flow-based approach shows better performance in terms of Precision, Recall, F-measure, and Accuracy. Only in a single scenario, 5-class classification with an image size of 25 × 25, does the session-based network interaction show a slight improvement over the flow-based one. The accuracy curves for the training and test datasets for the best results in each scenario are shown in Figure 6. The following conclusions can be drawn from it:
(i) For binary classification, flow-based detection (99.12% accuracy and 97.76% F-measure) outperforms the session-based (99.09% accuracy and 97.57% F-measure) approach
(ii) For 5-class classification, flow-based detection (98.91% accuracy and 98.49% F-measure) outperforms the session-based (98.56% accuracy and 98.05% F-measure) approach

RQ3 answer: flow network interaction is better for network traffic-based detection.

4.1.4. RQ4: Which Image Size Is Most Effective for Representing Network Interactions?

In the proposed work, we study the effectiveness of network representation in the form of N × N images. Multiple works exist on image-based malware detection approaches, where code segments are represented as images. We try to identify which image size provides the better network representation. Figure 7 illustrates the confusion matrix from the 5-class classification of the dataset based on flow-based network interactions represented as 20 × 20 images. The following conclusions can be drawn from it:
(i) For 5-class classification, the 20 × 20 image representation outperforms other image sizes.
(ii) For 2-class classification, the 20 × 20 image representation outperforms other image sizes. The 28 × 28 image representation in the flow-based scenario also performs equally well in one instance.

RQ4 answer: image size 20 × 20 is most effective for representing network interactions.

5. Discussion

In this section, we compare our proposed system against state-of-the-art Android malware detection systems that use network traffic. The efficiency and performance of the proposed solution are compared with those of previous studies in Table 5. It lists the features employed to solve the Android malware detection problem using network traffic, along with the datasets used, the techniques, and the reported performance. The preceding evaluation demonstrates the efficacy of our method in detecting recent malware using their network traffic.

Our proposed approach suffers from a few limitations. Dynamic analysis is used to execute the sample APK in the emulator. As dynamic analysis suffers from code coverage issues, random events were generated in the emulator using the Monkey tool to explore the components of each activity. This increases the probability of triggering malicious behavior. However, it is possible that some malicious code segments were not triggered. A stateful input generator for emulation can be explored in the future to achieve better code coverage and more realistic traffic.

6. Conclusion

Malware is an increasing threat to smartphone users. Ever-evolving malware uses hardening methods to evade antivirus scanners. We introduce an Android malware detection method that uses network interactions (flows and sessions generated by packet inspection) represented as images. Evaluation of the proposed approach demonstrates its potential, as it outperforms existing approaches and identifies malicious interactions with few false alarms. It shows improved performance, with an accuracy of 99.12% and 98.91% for malware detection and malware class detection, respectively. In the future, the PICAndro framework can be extended to include static and dynamic features beyond network features alone (for example, system calls, API calls, permissions, and network statistical information). Visual image analysis can also be explored for identifying malware families.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C101218712) and was supported by the Nuclear Safety Research Program through the Korea Foundation of Nuclear Safety (KoFONS) using the financial resource granted by the Nuclear Safety and Security Commission (NSSC) of the Republic of Korea (No. 2101058).