Abstract

With the advent of the 5G network, edge devices and mobile multimedia applications have come into widespread use, and malware targeting edge devices has followed. In the fourth quarter of 2020 alone, 43 million pieces of malware targeting mobile devices appeared. Many researchers have therefore studied methods to quickly protect users from malware; in particular, they have studied deep learning-based classification models that detect malware with high accuracy on mobile devices. However, such deep learning-based classifiers consume a lot of resources, whereas mobile devices have limited hardware resources such as RAM and battery. Such approaches are therefore difficult to use on mobile devices in practice. In this work, we study how a deep learning classifier classifies malware and propose a novel approach to generate a light-weight classifier that detects malware efficiently and effectively. The approach builds on the insight that malware exhibits distinctive features because it is programmed to perform malicious actions such as leaking information. By analyzing and extracting the distinctive features that a deep learning classifier uses on malicious datasets, we generate a light-weight, highly accurate rule-based classifier, called LiDAR, that efficiently detects malware on edge devices. On an edge device, LiDAR detects malware with 94% accuracy (F1-score), shows 85.67% and 328.24% lower CPU and RAM usage, respectively, than a CNN classifier, and classifies 454.37% faster than the CNN classifier.

1. Introduction

With the introduction of the 5G network, we have entered the era of the Internet of Things (IoT), in which ever more devices are connected, and the number of edge devices is growing rapidly. It is expected that there will be more than 7.49 billion edge device (e.g., smartphone and wearable device) users worldwide in 2025 [1]. Because edge devices are so heavily used, multimedia applications are heavily used as well: cumulative downloads of multimedia applications (e.g., WhatsApp, YouTube, and Facebook) amount to about 28.4 billion or more [2]. Furthermore, mobile multimedia usage takes up about 4.23 hours per day [3]. Unfortunately, due to severe security threats (e.g., botnets and man-in-the-middle attacks) and major privacy violations (e.g., leaks of social security numbers, credit card numbers, and passwords), the use of edge devices is still risky [4–8]. For example, a single wrong click can launch a malicious program that causes damage such as personal information leakage or financial loss. In the fourth quarter of 2020, 43 million pieces of malware targeting mobile devices appeared [9].

Such threats have led to the release of many commercial antivirus products such as Avast, Kaspersky, McAfee, and Norton. However, those antivirus products have a fatal limitation: they cannot detect unknown malware because they mostly rely on the signatures of known malicious applications [10]. To overcome this limitation, many research works have focused on developing malware detection approaches that use deep learning algorithms to protect users [11–23].

Recently, along with the advances in mobile systems-on-a-chip (SoCs), there has been an increasing push to run malware detection schemes directly on edge devices [11, 12]. Executing the schemes on the edge devices improves the service response time by eliminating the data transfer overhead, and it can save up to 46% of the system overhead compared with local execution [24]. However, running deep learning-based malware detection approaches on edge devices is still at a nascent stage, since edge devices are usually energy and resource constrained [25]. Running complex neural networks with many layers, nodes, and features makes an edge device consume at least 60% of its CPU (six cores) and about 10 GB of RAM [26, 27]. Although previously studied deep learning-based malware detection approaches achieve very high accuracy, they are hard to apply on edge devices, whose execution resources are limited. Consequently, it is important to develop approaches that make deep learning-based malware detection usable on edge devices.

In this work, we propose a novel approach to generate a deep learning-based light-weight classifier, named LiDAR, to enable efficient malware detection at the edge. To build LiDAR, we first analyze malicious datasets: an SMS spam dataset, an e-mail spam dataset, and an Android malware dataset. We then extract word tokens from the datasets and train a convolutional neural network (CNN) on the extracted word tokens. From the trained CNN, we extract the features with high weight values using gradient-weighted class activation mapping (Grad-CAM) [28], a method that produces visual explanations for the decisions of a large class of CNN-based models, on the assumption that those features contribute strongly to the prediction accuracy. Based on those features, we build a light-weight rule-based classifier.

To show the efficiency and effectiveness of LiDAR, we evaluate it on a workstation as well as on a Raspberry Pi. Our evaluation results demonstrate that LiDAR significantly improves resource utilization and classification time compared with state-of-the-art CNN-based classifiers while keeping a practical accuracy: on average, LiDAR shows 85.67% and 328.24% lower CPU and RAM usage, respectively, than a CNN classifier and classifies Android malware 454.37% faster than a CNN classifier, while achieving 93% prediction accuracy.

In summary, our contributions are as follows:
(i) First, we analyze the general malware detection process of deep learning-based classification models using spam datasets and an Android application dataset.
(ii) Second, based on the analysis, we use a deep learning algorithm to find distinctive features of malware, and we design a light-weight classifier with high accuracy to efficiently detect malware on edge devices.
(iii) Lastly, we thoroughly evaluate a prototype of LiDAR and compare it against deep learning classifiers in terms of computation resources and classification time. Our approach shows 85.67% and 328.24% lower CPU and RAM usage than CNN classifiers, with 94% accuracy (F1-score).

In this section, we introduce the advantages and disadvantages of Android malware detection using deep learning-based approaches. We also discuss the features of Android malware commonly employed by previous studies.

2.1. A Limitation of Deep Learning-Based Android Malware Classification Approaches

Recently, a surge of studies has proposed detecting Android malware with deep learning-based approaches using various features [11–23]. Advances in deep learning algorithms help achieve high accuracy by learning distinctive features of the data with complex neural networks. Table 1 shows the accuracies (or F1-scores) of previous deep learning-based malware detection approaches, along with the algorithms and features used. However, classifiers generated by deep learning algorithms usually require high computation time and resource usage, because many approaches feed excessive and detailed features into complex neural networks to achieve high accuracy [11–19]. Consequently, even though they achieve high accuracy, it is difficult to employ them in practice on most smart edge devices, which have limited computing resources.

2.2. Commonly Used Features for Android Malware Detection

Table 1 summarizes state-of-the-art deep learning-based malware detection approaches. In general, the methods are built on various features, including permissions and/or API calls. Permissions carry information on system-level functionalities, such as the current location and network status. API calls relate to the functionalities that an application provides to users (e.g., SMS functions, call functions, and read and write functions). Malicious applications usually exploit specific permissions or API calls, such as reading sensitive data (e.g., a function reading a password) or transferring data (e.g., a function writing to a socket), to leak private data or capture user behaviors. By using combinations of such features, previous approaches aimed not only to detect malware but also to discover its malicious behaviors to assist the holistic analysis process.

However, edge use cases do not necessarily need such detailed features, because we merely need to discover whether an application is malicious, not what its malicious behaviors are in detail. Also, malicious applications usually share distinct features because they are programmed to inflict damage on users, such as sensitive information leaks or financial loss. Based on this insight, we propose a way to generate a light-weight classifier that can efficiently detect malicious applications.

3. Overview

We first analyze how deep learning-based classifiers classify malware. Based on the analysis, we aim to design an approach that generates a light-weight classifier with high accuracy to efficiently detect malware on edge devices. To achieve this goal, we employ a deep learning algorithm to find distinctive features of malware. Since we cannot directly obtain the distinctive features from the trained neural network because of its limited explainability, we use Grad-CAM, which visualizes how much each feature contributes to the classification result. Based on the extracted distinctive features, we build a light-weight rule-based classifier, named LiDAR. It is worth noting that our approach applies not only to the malware classification problem but also to other types of data with remarkable features, such as scam e-mail. In general, such “malicious” samples in any dataset are distinguishable from benign samples because attackers create them with features that benign samples do not commonly share. Therefore, by using distinctive features of malware, we can reduce the number of features and lower the classification overhead of malware detection.

In the following sections, we show how we collected the dataset for this study (in Section 4.1), how we preprocess the dataset (in Section 4.2), how we learn features of the dataset by using a deep learning algorithm (in Section 4.3), how we select important features with a visual explanation technique from the deep learning-based model (in Section 4.4), and how we generate a light-weight classifier based on the features (in Section 4.5).

4. Design

In this section, we describe our approach to generating a light-weight classifier, based on the learning result of a deep learning algorithm, that classifies data samples with distinctive features, such as malware. Figure 1 shows the overview of our approach.

4.1. Dataset

In this work, we collected 24,232 real-world data samples, as shown in Table 2, consisting of an SMS spam message dataset [29], an e-mail spam dataset [30], and an Android malware dataset with samples that appeared from 2019 to 2020 [4]. Using our dataset, we demonstrate that the malicious samples have notable features that distinguish them from benign samples and that we can thus generate a much lighter classifier than deep learning-based models.

4.2. Preprocessing

To make light-weight classifiers, we use word tokens. We thus transform the malicious datasets (i.e., the SMS spam dataset, e-mail spam dataset, and Android malware dataset) into word tokens. Finally, we remove duplicated word tokens.

4.2.1. Word Normalization

We normalize the dataset to remove texts that are unnecessary for malware classification, such as special characters, newlines, and stopwords. We then group texts that mean the same thing (e.g., [email protected] to email address, https (http) to http address, and $ to dollar). Android malware datasets, on the other hand, have many text features (see Section 2.2). Hence, we use Android framework APIs as the main feature of Android malware. We also use FlowDroid [31] to extract API call graphs (ACGs), with which we can track data flows between a point where sensitive data is read and another point where that data is exported.
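The text normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stopword list, the regular expressions, and the single-token placeholders `emailaddress` and `httpaddress` are all hypothetical choices made for the example.

```python
import re

# Hypothetical stopword list, for illustration only; the paper does not specify one.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "is", "at"}

# Order matters: e-mail addresses and URLs are grouped before punctuation is stripped.
# The placeholder spellings are assumptions made for this sketch.
GROUPING_RULES = [
    (re.compile(r"\S+@\S+\.\S+"), " emailaddress "),
    (re.compile(r"https?://\S+"), " httpaddress "),
    (re.compile(r"\$"), " dollar "),
]


def normalize(text):
    """Lowercase, group equivalent texts, drop special characters and stopwords."""
    text = text.lower()
    for pattern, placeholder in GROUPING_RULES:
        text = pattern.sub(placeholder, text)
    # Replace special characters with spaces; split() below also absorbs newlines.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def unique_tokens(tokens):
    """Remove duplicated word tokens while keeping first-seen order."""
    return list(dict.fromkeys(tokens))


sample = "WIN $1000 now!\nMail claim@prize.example or visit http://prize.example/win to WIN"
print(unique_tokens(normalize(sample)))
```

Grouping runs before special characters are stripped because the `@`, `://`, and `$` symbols are exactly what identify the texts to be grouped.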

4.2.2. Word Encoding for the Malware Dataset

To learn the malware dataset efficiently, we convert each preprocessed word token in the dataset to an integer. Tokens that were not seen in the learning process are mapped to the “Unknownword” token. Lastly, we add padding so that all samples in the malware dataset have the same length.
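As a rough sketch of this encoding step, the following Python maps tokens to integers, falls back to the “Unknownword” token for unseen words, and pads to a fixed length. The reserved ids (0 for padding, 1 for unknown), the truncation of over-long samples, and the function names are assumptions made for illustration; the paper does not state them.

```python
UNKNOWN = "Unknownword"
PAD_ID = 0  # assumption: id 0 is reserved for padding


def build_vocab(token_lists):
    """Assign an integer id to every token seen during learning."""
    vocab = {UNKNOWN: 1}  # assumption: id 1 is reserved for unknown tokens
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, send unseen tokens to UNKNOWN, and pad to max_len."""
    ids = [vocab.get(tok, vocab[UNKNOWN]) for tok in tokens]
    ids = ids[:max_len]  # assumption: over-long samples are truncated
    return ids + [PAD_ID] * (max_len - len(ids))


training = [["win", "dollar", "now"], ["call", "now"]]
vocab = build_vocab(training)
print(encode(["win", "free", "now"], vocab, max_len=5))  # -> [2, 1, 4, 0, 0]
```

Here `free` never appeared in the training samples, so it is encoded with the id of the unknown token.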

4.3. CNN Architecture

We employ a simple CNN as the deep learning algorithm [32, 33]. CNNs are widely used to find common features of the malware word tokens that frequently occur in actual malware datasets [34]. We use a standard convolutional neural network architecture. The input first goes through an embedding layer and then a one-dimensional convolutional layer (Conv1D) with ReLU activations. The last layer is a dense layer, applied after the data is flattened into a vector. The Conv1D layer uses a kernel size of 1, so it processes one word at a time and captures a feature of each word. We also apply the sigmoid activation function at the output to classify binary labels.
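To make the data flow through this architecture concrete, the following is a minimal forward-pass sketch in plain Python. It is not the authors' TensorFlow/Keras model: the layer sizes are illustrative, the weights are random and untrained, and the function name `forward` is hypothetical. It only shows that a Conv1D layer with kernel size 1 applies the same linear map to each word's embedding independently.

```python
import math
import random


def forward(token_ids, emb, conv_w, conv_b, dense_w, dense_b):
    """Embedding -> Conv1D (kernel size 1) + ReLU -> Flatten -> Dense + sigmoid."""
    # Embedding: look up one vector per token id -> [seq_len][emb_dim].
    x = [emb[t] for t in token_ids]
    # Conv1D with kernel size 1: each filter is a dot product with a single
    # position's embedding, followed by ReLU -> [seq_len][n_filters].
    feats = [
        [max(0.0, sum(xi * w for xi, w in zip(vec, filt)) + b)
         for filt, b in zip(conv_w, conv_b)]
        for vec in x
    ]
    # Flatten, then a dense layer with a single sigmoid output.
    flat = [v for row in feats for v in row]
    logit = sum(f * w for f, w in zip(flat, dense_w)) + dense_b
    return feats, 1.0 / (1.0 + math.exp(-logit))


# Illustrative sizes only; none of these values come from the paper.
VOCAB, EMB_DIM, N_FILTERS, SEQ_LEN = 6, 4, 3, 5
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


emb = [rand_vec(EMB_DIM) for _ in range(VOCAB)]
conv_w = [rand_vec(EMB_DIM) for _ in range(N_FILTERS)]
conv_b = [0.0] * N_FILTERS
dense_w = rand_vec(SEQ_LEN * N_FILTERS)
dense_b = 0.0

feature_maps, p_malicious = forward([2, 1, 4, 0, 0], emb, conv_w, conv_b, dense_w, dense_b)
print(len(feature_maps), len(feature_maps[0]), p_malicious)
```

Because each row of `feature_maps` depends on exactly one input token, per-position scores derived from these maps can be attributed directly to individual words.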

4.4. Feature Selection

To investigate how different word token features contribute to the accuracy of a CNN classifier, we use Grad-CAM. Grad-CAM makes it possible to visualize each feature map layer and to understand how the input data of a CNN affect the classification. Grad-CAM can also extract weight values without architectural changes or retraining. It exploits the feature maps extracted from the Conv1D layer to identify the impact of the features on the classification results, weighting the feature maps by the gradients of the target class score flowing into the final convolutional layer. As a result, Grad-CAM yields a heat map of weight values for the word tokens, which can be used for the light-weight classification.
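The core Grad-CAM computation can be sketched for the one-dimensional case as follows. This follows the standard Grad-CAM formulation (average the gradients per feature map, take the weighted sum of the feature maps, apply ReLU); whether the authors keep the final ReLU or also retain negative values to mark benign tokens is not stated, so treat this as an assumption. The activations and gradients below are made-up numbers, and `grad_cam_1d` is a hypothetical name.

```python
def grad_cam_1d(activations, gradients):
    """Return one Grad-CAM score per token position.

    activations, gradients: nested lists shaped [seq_len][n_filters], holding the
    Conv1D feature maps and the gradient of the class score with respect to them.
    """
    seq_len = len(activations)
    n_filters = len(activations[0])
    # alpha_k: gradient for feature map k, averaged over all positions.
    alphas = [
        sum(gradients[t][k] for t in range(seq_len)) / seq_len
        for k in range(n_filters)
    ]
    # Weighted sum of the feature maps at each position, followed by ReLU.
    return [
        max(0.0, sum(alpha * activations[t][k] for k, alpha in enumerate(alphas)))
        for t in range(seq_len)
    ]


# Made-up values for three tokens and two filters.
activations = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
gradients = [[0.2, -0.1], [0.4, -0.3], [0.3, -0.2]]
heat = grad_cam_1d(activations, gradients)
print([round(h, 3) for h in heat])  # -> [0.3, 0.0, 0.05]
```

In this toy example the first token gets the highest score because it activates the filter whose averaged gradient is positive for the target class.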

Table 3 shows extracted features of the malware dataset using Grad-CAM. Higher values indicate malicious features, while lower values indicate benign features.

4.5. LiDAR

To build the light-weight classifier, we identify the features that are important for classifying malware in the malware dataset based on the weight values of the extracted features (Section 4.4). As a running example, Figure 2 shows malicious data from the SMS spam dataset classified by the CNN algorithm according to the weight values. In Figure 2, the first three words have malicious weight values, and the others have benign weight values. In this case, on average, more than one-third of the words in the 600 training SMS spam samples are identified as having malicious weight values. This means that more than one-third of the features in the malware dataset are distinctly malicious and that the dataset can be classified by the number of malicious values. The rule-based classifier is built on this observation by analyzing the number of malicious weight values. Because the CNN classifier classifies malware not by the context information of the SMS spam dataset but by the observed number of distinct words, the rule-based classifier can be built using the following two conditions: (i) If more than one-third of the words in a data sample are prelearned malicious words, we classify the sample as malware. (ii) Otherwise, when a sample does not have more than one-third of prelearned malicious or benign words, we apply a heuristic condition: if the sample contains more malicious words than benign words, we classify it as malware. By exploiting the distinctive features of malware, we can generate an effective classifier that is much lighter than a deep learning classifier, although deciding the threshold for classifying malware requires manual effort.
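One possible reading of the two conditions above is sketched below. The word sets are made-up examples, and the function name, the handling of empty samples, and the order in which the two rules are checked are assumptions; the one-third threshold is the heuristic value described in the text.

```python
def classify(tokens, malicious_words, benign_words, threshold=1 / 3):
    """Return True if the sample is classified as malware.

    Rule (i): more than `threshold` of the tokens are prelearned malicious words.
    Rule (ii): otherwise, malicious words outnumber benign words.
    Tokens in neither set are ignored by both counts.
    """
    if not tokens:
        return False  # assumption: an empty sample is treated as benign
    n_malicious = sum(tok in malicious_words for tok in tokens)
    n_benign = sum(tok in benign_words for tok in tokens)
    if n_malicious / len(tokens) > threshold:
        return True
    return n_malicious > n_benign


# Hypothetical word sets standing in for the Grad-CAM-selected features.
malicious = {"win", "prize", "claim"}
benign = {"meeting", "lunch", "tomorrow"}

print(classify(["win", "prize", "now", "call"], malicious, benign))         # -> True, rule (i)
print(classify(["lunch", "meeting", "tomorrow", "win"], malicious, benign))  # -> False
print(classify(["claim", "your", "seat", "for", "today"], malicious, benign))  # -> True, rule (ii)
```

Classification only needs set lookups and two counters, so it runs without loading any neural network at inference time.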

5. Evaluation

In this section, we evaluate our approach to demonstrate its efficiency and effectiveness. We use a Raspberry Pi with the ARM64 architecture as well as a workstation. For convenience, we refer to the CNN classifier as CNNc, the CNN classifier using high-weight features as CNNg, and our approach, the Light-weight Deep Learning-based Malware Classifier, as LiDAR.

5.1. Experiment Setup

We performed our evaluations on a workstation running Ubuntu 18.04 with two 20-core Intel Xeon Gold 6230 CPUs at 2.10 GHz, 256 GB of RAM, and an NVIDIA GeForce RTX 2080 GPU. We also conducted experiments on a Raspberry Pi 4 Model B (Rev 1.4) running Ubuntu 18.04 with a 4-core Cortex-A72 (ARM v8) and 4 GB of RAM. We implemented LiDAR using Python v3.7.1, TensorFlow GPU v1.14.0, Keras v2.2.4, and CUDA v11.2, and we used FlowDroid v1.5 to extract ACGs.

Table 4 shows the number of words used by each classifier in the performance comparison.

5.2. Evaluation Metrics

To explore the effectiveness and efficiency, we used the following metrics:
(1) CPU Usage. We take 100% to be the maximum workload that a single CPU core can handle and report each classifier’s CPU usage on that scale (e.g., a CPU usage of 200% means that two full cores are needed to perform a classification).
(2) RAM Usage. We measure the resident set size (RSS) of a classifier when it runs.
(3) Classification Time. We measure the total execution time of a classifier.
(4) F1-Score. We use the F1-score of the classification results to show the effectiveness of each classifier.
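For reference, two of these metrics can be computed as in the sketch below: the F1-score as the harmonic mean of precision and recall, and the classification time as wall-clock time around a call. The helper names and the label convention (1 for malicious, 0 for benign) are assumptions for illustration; the paper does not describe its measurement tooling.

```python
import time


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), with 1 = malicious."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score, elapsed = timed(f1_score, y_true, y_pred)
print(round(score, 4))  # -> 0.6667
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is 2/3 as well.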

5.3. Evaluation Results on the Workstation

In this section, we evaluate classifiers on a workstation using malware dataset (SMS spam dataset, e-mail spam dataset, Android malware dataset).

Figure 3 and Table 5 show the experimental results. CNNc used an average of 3,107% of the CPU, and CNNg used an average of 2,867%. LiDAR, on the other hand, used an average of 108% of the CPU, which is much lower than the CPU usage of CNNc and CNNg. In addition, the RAM usage of LiDAR is on average 500.4% and 311.02% lower than that of CNNc and CNNg, respectively, as shown in Table 4. These results yield a significant improvement in the classification time of LiDAR (on average 228% and 46.78% faster than CNNc and CNNg, respectively). Nevertheless, LiDAR achieves an F1-score almost similar to those of CNNc and CNNg; the accuracy difference from CNNc and CNNg is only 3.87%. These results imply that LiDAR strikes a good trade-off between performance and prediction accuracy.

5.4. Evaluation Results on the Raspberry Pi

Table 6 and Figure 4 illustrate the evaluation results of each classifier on the Raspberry Pi. CNNc and CNNg used 285% and 278% of the CPU on average, whereas LiDAR used 154% on average, which is 80.98% and 85.67% lower than the CPU usage of CNNc and CNNg. The RAM usage of CNNc and CNNg is on average 328.24% and 171.13% higher, respectively, than that of LiDAR. As a result, the average classification time of LiDAR is 454.37% and 127.95% faster than that of CNNc and CNNg. Despite these improvements, the F1-score differs from those of CNNc and CNNg by only 3.87%, as in the experimental results on the workstation. Consequently, we observe that LiDAR offers a good compromise between performance and classification accuracy in both environments.
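Several of the percentages reported in this evaluation exceed 100% for "lower" or "faster", which cannot be a reduction relative to the baseline's own value. One reading consistent with such figures, offered here as an assumption and not as the authors' stated formula, is that the difference is expressed relative to LiDAR's value. The sketch below applies that reading to the average CPU usages quoted above; the function name is hypothetical.

```python
def percent_relative_to_ours(baseline, ours):
    """Difference between a baseline and our value, as a percentage of our value."""
    return (baseline - ours) / ours * 100.0


# Average CPU usages on the Raspberry Pi quoted in the text (percent of one core).
cnnc_cpu, cnng_cpu, lidar_cpu = 285.0, 278.0, 154.0

print(round(percent_relative_to_ours(cnnc_cpu, lidar_cpu), 2))  # -> 85.06
print(round(percent_relative_to_ours(cnng_cpu, lidar_cpu), 2))  # -> 80.52
```

These values are close to, but not identical with, the reported 80.98% and 85.67%. The source does not say how its figures were aggregated, so the gap is left unexplained here.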

6. Conclusion

With the advent of the 5G network, a large amount of malware targeting IoT devices has appeared. Accordingly, much research has focused on deep learning-based approaches to quickly protect users from malware. However, such deep learning-based approaches consume a lot of resources. In this work, to enable efficient malware detection on edge devices, we proposed a novel approach to generate a light-weight classifier, LiDAR. We analyzed spam and malware features using deep learning-based Grad-CAM. Based on the distinct features extracted by Grad-CAM, we built LiDAR as a rule-based classifier. Our evaluation results show that LiDAR can effectively detect malware, achieving a prediction accuracy of 92.78% while using only 154% of the CPU and 205.15 MB of memory, which results in a significant improvement in classification time: roughly two times faster than a CNN-based deep learning model on average.

6.1. Limitations and Future Works

First, LiDAR has the out-of-vocabulary problem, as other deep learning-based approaches do. If our classifier meets an unknown word token, the token is simply ignored. Therefore, to use LiDAR in practice, it is important to continuously learn emerging malware. In addition, like other malware classification approaches, LiDAR cannot detect heavily obfuscated malware, because we cannot find effective word tokens in malware to which obfuscation techniques such as class encryption have been applied. We note that classifying unknown and obfuscated malware is a challenging problem and that this limitation is common among deep learning-based approaches. We leave these limitations as future work.

Data Availability

The data used to support the findings of this study were supplied by Jinsung Kim under license and so cannot be made freely available. Requests for access to these data should be made to Jinsung Kim ([email protected]).

Conflicts of Interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) Grant through the Korean Government (MSIT) under Grant NRF-2021R1A4A1029650.