Abstract

Thousands of malicious applications invade the Google Play Store every day while appearing to be legitimate applications. These malicious applications can link to the malware known as Dresscode, which was created for network hacking and for stealing information. Since Android smartphones are indispensable, they need efficient and robust protection so that they remain safeguarded against novel malware. In this paper, we propose RoughDroid, a floppy analysis technique that can discover Android malware applications directly on the smartphone. RoughDroid is based on seven feature sets from the XML manifest file of an Android application, plus three feature sets from the Dex file. These feature sets pass through the Rough Set algorithm to elastically classify the Android application as either benign or malicious. The experimental results mainly consider the 20 most common malware families, plus three new malware families (Grabos, TrojanDropper.Agent.BKY, and AsiaHitGroup) that invaded the Google Play Store in 2017. According to the experimental results, RoughDroid achieves a 95.6% detection rate for the malware families at a 1% false-positive rate. Finally, RoughDroid is a lightweight approach for directly examining downloaded applications on the smartphone.

1. Introduction

Android is currently the world’s most popular mobile operating system. It has surpassed Windows as the world’s most popular OS, yet some Android applications have been found to secretly steal user details from other applications. Recently, the Gadget Hacks website reported that, after evaluating 110,150 Android applications over a period of three years, researchers found thousands of pairs of applications that could potentially leak sensitive phone or personal information and allow unauthorized applications to access privileged data. With hundreds of thousands of applications in various markets, Android offers a wealth of functionality to its users. Smartphones running Android are increasingly targeted by attackers and infected with malicious software [1].

Google took down over 700,000 bad Android applications in 2017, 70% more than in 2016 [2]. In addition to the existing malware families, three new Android malware families (Grabos, TrojanDropper.Agent.BKY, and AsiaHitGroup) invaded the Google Play Store in 2017 [3]. There is clearly an urgent need to stop the spread of malware on Android markets and smartphones. The Android platform provides several security measures that hinder the installation of malware applications, most notably the Android permission system. To perform certain tasks on the Android device, such as taking a picture, an application must explicitly request permission from the user during the installation process. Some users blindly grant permissions to unknown applications without carefully reviewing them.

As a result, malicious software is hardly constrained by the Android permission system in practice. Opening your Android phone or tablet to applications and games from outside Google’s protective walled garden also makes your device considerably more vulnerable to malware. This is the price you pay for free software [4]. There should be a method that is able to restrict these malicious applications.

1.1. Motivation

To the best of our knowledge, a large body of research has studied methods for analyzing and detecting Android malware applications before their installation. These methods can be roughly categorized into approaches using dynamic analysis and approaches using static analysis. Some techniques, such as TaintDroid [5], DroidRanger [6], and DroidScope [7], monitor the behavior of applications at run-time. Although run-time monitoring is very effective in identifying malicious activity, it incurs a significant overhead and cannot be applied directly on mobile devices. On the other hand, static analysis techniques, such as Stowaway [8] and RiskRanker [9], usually induce only a small run-time overhead. While these approaches are efficient and scalable, they mainly build on manually crafted detection patterns, which are often not available for new malware instances. This motivates us to propose a new Android malware detection scheme, based on the Rough Set algorithm, that makes it possible to identify malware directly on the smartphone during the installation process.

1.2. Main Contributions

In this paper, we introduce RoughDroid, a new broad floppy analysis malware detector that runs on Android smartphones at installation time and relies on a robust feature extraction framework. The main contributions can be summed up as follows:
(i) Effective Detection: We introduce a novel scheme (RoughDroid) that combines floppy analysis and machine learning and is capable of identifying Android malware with high accuracy and few false alarms. It is also independent of manually crafted detection patterns.
(ii) Various Features: RoughDroid gathers numerous features from the manifest file as well as the application’s Dex code. These features are categorized into ten feature sets.
(iii) Rough-Based Detection: The proposed scheme takes adware Android applications into account when detecting malware applications. This is possible because the detection process is executed elastically using the Rough Set algorithm, which introduces a flexible (not straight-line) classification into benign and malware applications.
(iv) Lightweight Analysis: For efficiency, we apply linear-time analysis and learning techniques that enable detecting malware on the smartphone as well as analyzing large sets of applications in a reasonable time.

Finally, the experiments with 131,611 applications and 5,560 malware samples, in addition to 158 malware applications representing three new malware families from 2017, demonstrate the efficacy of our method for directly checking downloaded applications on the smartphone.

1.3. Organization

The rest of this paper is organized as follows: Section 2 introduces our RoughDroid scheme with its ten feature sets and the Rough Set detection algorithm. Section 3 presents the experimental evaluation of RoughDroid by comparing it with some popular detection schemes and ten of the most common antivirus scanners. Section 4 introduces the related work and a brief comparison between existing Android malware detection schemes and RoughDroid. This is followed by the conclusion in Section 5.

2. RoughDroid

In this paper, we present RoughDroid, a lightweight technique for detecting Android malware that infers detection patterns automatically. In addition, it allows identifying malware directly on the smartphone. RoughDroid performs a broad floppy analysis, gathering as many features from an application’s code and manifest as possible. These features are organized in sets of strings (for instance, API calls and network addresses) and embedded in a joint vector space. As an example, an application sending premium SMS messages is mapped to a specific region of the vector space associated with the corresponding permissions, intents, and API calls. This geometric representation allows RoughDroid to identify combinations and patterns of features indicative of malware automatically, by using machine learning techniques.

To this end, our technique uses a broad floppy analysis that extracts feature sets from different sources and analyzes these features in an expressive vector space. This process is shown in Figure 1 and described as follows:
(I) Floppy Analysis: RoughDroid floppily inspects a given Android application and extracts various feature sets from the application’s manifest and disassembled Dex code. The manifest and the disassembled Dex code are inspected in a parallel sweep to reduce the analysis time.
(II) Constructing Vector Space: The extracted feature sets are then mapped to a joint vector space, where patterns and combinations of the features can be analyzed geometrically.
(III) Rough-Based Detection: The embedding of the feature sets allows us to identify malware using efficient machine learning techniques (the Rough Set algorithm).

2.1. Floppy Analysis

As the first step, RoughDroid performs a lightweight floppy analysis of a given Android application. The floppy extraction of features has to run in a constrained environment and complete in a timely manner. The user may skip the ongoing process if the analysis takes too long. Accordingly, it becomes essential to select features that can be extracted efficiently. We therefore focus on the manifest and the disassembled Dex code of the application, both of which can be obtained with a parallel sweep over the application’s content. To allow for a generic and extensible analysis, we represent all extracted features as sets of strings, such as permissions, intents, and API calls. Specifically, we extract the following ten sets of features.

2.1.1. Manifest Feature Sets

The manifest is an essential file in the Android system that describes the functionality and requirements of an application to Android. AndroidManifest.xml is located at the root of the project and contains several different feature sets.

A simple XML manifest file generated for an Android application tested under RoughDroid is shown in Figure 2. The presented XML file declares seven different feature sets, as follows:
Hardware Components: The hardware features requested by an application. The figure indicates three requested hardware features (microphone, telephony, and location.gps). An application that has access to the GPS and network modules is, for instance, able to collect fine location information and send it to an attacker over the network.
Software Components: The software features that the application uses or requires. The figure declares sip.voip, which allows the application to use Session Initiation Protocol (SIP) services and make VoIP calls.
Requested Permissions: Permissions are central to the Android security mechanisms. The figure mentions three dangerous permissions (RECORD_AUDIO, SEND_SMS, and ACCESS_FINE_LOCATION) that are granted to the application by the user at installation time.
App Components: A set of Boolean attributes that enable certain services for the application, such as allowBackup and directBootAware.
App Activities: The activities that the application is allowed to execute, together with their attributes, such as directBootAware and hardwareAccelerated.
Intent Filters: The types of intents that an activity, service, or broadcast receiver can respond to, such as action.MAIN and action.EDIT.
App Services: The services declared as components of the application, together with their attributes, such as directBootAware and exported.

The information stored in the AndroidManifest.xml file can be retrieved efficiently on the device using the Android Asset Packaging Tool, which allows us to extract the aforementioned feature sets.
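The manifest feature sets described above can be illustrated with a minimal sketch. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary form), and the sample manifest, the function name `extract_manifest_features`, and the dictionary keys are hypothetical choices made for this illustration, not part of RoughDroid itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by android:* attributes in a decoded manifest.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = "{" + ANDROID_NS + "}name"

# Hypothetical manifest, loosely modelled on the features named in the text.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-feature android:name="android.hardware.microphone"/>
    <uses-feature android:name="android.hardware.telephony"/>
    <uses-feature android:name="android.hardware.location.gps"/>
    <uses-feature android:name="android.software.sip.voip"/>
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION"/>
    <application android:allowBackup="true">
        <activity android:name=".MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
            </intent-filter>
        </activity>
        <service android:name=".SyncService" android:exported="false"/>
    </application>
</manifest>
"""


def extract_manifest_features(manifest_xml):
    """Return the seven manifest feature sets as sets of strings."""
    root = ET.fromstring(manifest_xml)
    features = {key: set() for key in (
        "hardware", "software", "permissions", "app_components",
        "activities", "intent_filters", "services")}

    # Hardware and software components are both declared via <uses-feature>.
    for node in root.iter("uses-feature"):
        name = node.get(NAME, "")
        if name.startswith("android.hardware."):
            features["hardware"].add(name)
        elif name.startswith("android.software."):
            features["software"].add(name)

    for node in root.iter("uses-permission"):
        features["permissions"].add(node.get(NAME, ""))

    # Attributes of <application>, stored as "attribute=value" strings.
    app = root.find("application")
    if app is not None:
        for key, value in app.attrib.items():
            features["app_components"].add(key.split("}")[-1] + "=" + value)

    for node in root.iter("activity"):
        features["activities"].add(node.get(NAME, ""))
    for node in root.iter("service"):
        features["services"].add(node.get(NAME, ""))
    for intent_filter in root.iter("intent-filter"):
        for action in intent_filter.findall("action"):
            features["intent_filters"].add(action.get(NAME, ""))
    return features


features = extract_manifest_features(SAMPLE_MANIFEST)
for set_name, values in features.items():
    print(set_name, sorted(values))
```

Representing every feature as a string keeps the extraction generic: a new manifest element only needs one more loop, and the later embedding step does not have to change.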

2.1.2. Disassembled Code Feature Sets

We implement a lightweight disassembler, which takes the Dalvik Executable (Dex) as input and provides RoughDroid with complete information about the API calls and the data used in the application. The Dex file contains a set of class definitions and their associated adjunct data. Table 1 introduces a simple example of a Dex file, which holds optimized bytecode for the Dalvik virtual machine. Every Android application has its own classes.dex file, which references all methods and classes used within the application. Essentially, any activity, object, or fragment used within the codebase is transformed into bytes within a Dex file that can be run as an Android application.

We are mainly interested in the API calls and method calls, because they can be easily extracted from the Dex file of an application, as follows:
Access to Undocumented/Hidden APIs: Applications may be restricted from accessing APIs that are undocumented in the Android Software Development Kit (SDK). RoughDroid looks for the occurrence of such requests in the Dex file in order to gain a deeper understanding of the behavior of an application.
Suspicious APIs: Requesting sensitive data or resources of the Android phone may lead to malicious behavior. We place more importance on the following set of suspicious APIs:
(i) Sensitive data (IMEI and USIM number leakage) APIs, such as getDeviceId(), getSimSerialNumber(), and getImei();
(ii) Network communication APIs, such as setWifiEnabled() and execHttpRequest();
(iii) Location leakage APIs, such as getLastKnownLocation(), getLatitude(), getLongitude(), and requestLocationUpdates();
(iv) APIs for sending and receiving SMS/MMS messages, such as sendTextMessage(), sendBroadcast(), and sendDataMessage().
Restricted API Calls: The Android permission system restricts access to a set of critical API calls. Our approach looks for the occurrence of these calls in the Dex code in order to gain a deeper understanding of an application’s functionality.
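As a rough illustration of how suspicious API calls might be collected from disassembled code, the sketch below scans smali-style text for `invoke-*` instructions and matches the called method names against the API names listed above. The disassembly format, the regular expression, the sample snippet, and the function name `find_suspicious_calls` are assumptions made for this example; the disassembler used by RoughDroid is not specified at this level of detail.

```python
import re

# API names taken from the list in the text, grouped by category.
SUSPICIOUS_APIS = {
    "sensitive_data": {"getDeviceId", "getSimSerialNumber", "getImei"},
    "network": {"setWifiEnabled", "execHttpRequest"},
    "location": {"getLastKnownLocation", "getLatitude", "getLongitude",
                 "requestLocationUpdates"},
    "sms": {"sendTextMessage", "sendBroadcast", "sendDataMessage"},
}

# Matches smali-style calls such as:
#   invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()...
INVOKE_RE = re.compile(
    r"invoke-\w+(?:/range)?\s+\{[^}]*\},\s*(L[^;]+;)->(\w+)\(")

# Hypothetical disassembly excerpt used only for demonstration.
SAMPLE_SMALI = """
invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
invoke-virtual {v1, v2}, Landroid/location/LocationManager;->getLastKnownLocation(Ljava/lang/String;)Landroid/location/Location;
invoke-static {v3}, Lcom/example/Util;->log(Ljava/lang/String;)V
"""


def find_suspicious_calls(smali_text):
    """Map each suspicious category to the set of matching 'Class->method' calls."""
    hits = {}
    for match in INVOKE_RE.finditer(smali_text):
        class_name, method = match.groups()
        for category, names in SUSPICIOUS_APIS.items():
            if method in names:
                hits.setdefault(category, set()).add(class_name + "->" + method)
    return hits


hits = find_suspicious_calls(SAMPLE_SMALI)
print(hits)
```

Matching on the method name alone is deliberately coarse; a stricter variant could also check the declaring class to reduce accidental matches with application-defined methods of the same name.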

2.2. Vector Space Construction

Malicious activity is usually reflected in specific patterns and combinations of the extracted features. As an example, a malware application sending the fine location of a smartphone might have the permission android.permission.ACCESS_FINE_LOCATION in the requested permissions set and the hardware feature android.hardware.location.gps in the hardware components set. Ideally, we would like to formulate Boolean expressions that capture such dependencies between features and return true if malware is found.

We need to place the feature sets extracted from an Android application in a vector. In our experiments, the vector space contains approximately 550,000 different extracted features. If the application contains a given feature, the vector element for that feature is set to 1; otherwise, it is set to 0. A simple structure as an example of the vector space is shown in (1). Regardless of the dimension of the vector space, it is sufficient to store only the features extracted from an application in order to represent its vector sparsely, using either hash tables [10] or Bloom filters [11].
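The sparse binary embedding described above can be sketched as follows. The feature strings, the `build_index`, `embed`, and `to_dense` helpers, and the two toy applications are hypothetical; the sketch only shows why storing the indices of present features is enough to represent a very high-dimensional 0/1 vector.

```python
def build_index(apps):
    """Assign a fixed dimension (column index) to every distinct feature."""
    index = {}
    for app_features in apps:
        for feature in sorted(app_features):
            index.setdefault(feature, len(index))
    return index


def embed(app_features, index):
    """Sparse vector: the set of dimensions whose value is 1."""
    return {index[f] for f in app_features if f in index}


def to_dense(sparse_vector, dimension):
    """Expand a sparse vector to an explicit 0/1 list (only for inspection)."""
    return [1 if i in sparse_vector else 0 for i in range(dimension)]


# Two toy applications described by feature strings (illustrative only).
apps = [
    {"permission::SEND_SMS", "hardware::telephony"},
    {"permission::ACCESS_FINE_LOCATION", "hardware::location.gps",
     "permission::SEND_SMS"},
]

index = build_index(apps)
vectors = [embed(app, index) for app in apps]
for vector in vectors:
    print(sorted(vector), to_dense(vector, len(index)))
```

With roughly 550,000 dimensions and only a handful of features per application, the sparse form needs memory proportional to the number of extracted features rather than to the size of the vector space.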

2.3. Rough-Based Detection

Rough Set based data analysis [12–14] starts after constructing the vector space (feature table), as described in Section 2.2. Each row represents a specific feature obtained from a certain feature set for a specific Android application in our scheme. The Rough system has multiple entities and stages.
(i) Feature Table: It is a pair $\mathcal{A} = (U, A)$, where $U$ is a nonempty finite set of Android applications called the universe and $A$ is a nonempty finite set of features such that $a : U \to V_a$ for every $a \in A$. The set $V_a$ is called the value set of $a$, and elements of $U$ are called Android applications.
(ii) Decisions: It is the feature table in the form $\mathcal{A} = (U, A \cup \{d\})$, where $d \notin A$ is the decision feature. The features of $A$ are called conditional features or simply conditions.
(iii) Approximations: Let $B \subseteq A$ and $X \subseteq U$:
(a) Lower Approximation: $\underline{B}X$ consists of all Android applications that definitely belong to $X$.
(b) Upper Approximation: $\overline{B}X$ contains all Android applications that possibly belong to $X$.
(c) Boundary Region: The difference between the upper and lower approximations, $\overline{B}X - \underline{B}X$, constitutes the boundary region of the Rough Set algorithm. The boundary, positive, and negative regions [15] are described as follows. An Android application of the negative region does not belong to $X$, an application of the positive region belongs to $X$, and an application of the boundary region only possibly belongs to $X$. These approximation sets and regions are shown in Figure 3.
(d) Approximation Accuracy: The roughness accuracy of any subset $X \subseteq U$ with regard to $B$, denoted $\alpha_B(X)$, is quantified by $\alpha_B(X) = |\underline{B}X| / |\overline{B}X|$, where $|X|$ denotes the cardinality of $X$. For the empty set, we define $\alpha_B(\emptyset) = 1$. It is worth noting that $0 \le \alpha_B(X) \le 1$. If $\alpha_B(X) = 1$, the set $X$ is crisp with respect to $B$. If $\alpha_B(X) < 1$, $X$ is rough with respect to $B$.
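The approximations above can be made concrete with a small worked example. The sketch below follows the standard rough set definitions: applications with identical values on the chosen features are indistinguishable, the lower approximation collects the groups fully contained in the target set, and the upper approximation collects the groups that overlap it. The five-application feature table, the feature names, and the helper functions are hypothetical, and the sketch does not claim to reproduce the decision rule used inside RoughDroid.

```python
from collections import defaultdict


def equivalence_classes(table, features):
    """Group applications that share the same values on the given features."""
    classes = defaultdict(set)
    for app, row in table.items():
        classes[tuple(row[f] for f in features)].add(app)
    return list(classes.values())


def approximations(table, features, target):
    """Return (lower, upper) approximations of the target set of applications."""
    lower, upper = set(), set()
    for group in equivalence_classes(table, features):
        if group <= target:   # every member is in the target: definitely in X
            lower |= group
        if group & target:    # at least one member is in the target: possibly in X
            upper |= group
    return lower, upper


def accuracy(lower, upper):
    """Approximation accuracy |lower| / |upper|, defined as 1 for the empty set."""
    return 1.0 if not upper else len(lower) / len(upper)


# Toy feature table: 1 means the feature is present in the application.
table = {
    "app1": {"SEND_SMS": 1, "location.gps": 1},
    "app2": {"SEND_SMS": 1, "location.gps": 1},
    "app3": {"SEND_SMS": 1, "location.gps": 0},
    "app4": {"SEND_SMS": 1, "location.gps": 0},
    "app5": {"SEND_SMS": 0, "location.gps": 0},
}
malware = {"app1", "app2", "app3"}  # target set X (decision feature = malware)

lower, upper = approximations(table, ["SEND_SMS", "location.gps"], malware)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary), accuracy(lower, upper))
```

Here app3 (malware) and app4 (benign) have identical features, so both fall into the boundary region; the accuracy of 0.5 indicates that the malware set is rough with respect to these two features.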

3. Evaluation

After presenting RoughDroid in detail, we now proceed to an empirical evaluation of its performance. To do so, we first describe the dataset used and then run several experiments to evaluate the detection performance.

3.1. Considered Data Sets

Our experiments are based on a dataset of real Android applications and real malware. We use the Drebin dataset [1], which comprises 131,611 software samples collected from various sources, including the Google Play Store, Chinese and Russian markets, and other Android websites. Additionally, it includes 5,560 malware applications from 179 distinct malware families, for example, FakeInstaller, DroidKungFu, Plankton, Opfake, GingerMaster, BaseBridge, Iconosys, Kmin, FakeDoc, Geinimi, Adrd, DroidDream, Linux/Lotoor, GoldDream, MobileTx, FakeRun, SendPay, Gappusin, Imlog, and SMSreg.

In addition, we have considered 158 Android applications representing three new malware families (Grabos, TrojanDropper.Agent.BKY, and AsiaHitGroup) that invaded the Google Play Store in 2017. It should be mentioned that adware applications are also considered in our dataset.

3.2. Performance Analysis

RoughDroid does not need initial training in advance, which is one of its basic advantages. RoughDroid analyzes each application in a broad floppy way and gathers a large collection of features categorized into ten feature sets. It should be mentioned that the results are the average of 25 trials under the same environmental conditions. Our analysis compares RoughDroid’s results with those obtained from related approaches and from ten popular antivirus scanners, and finally uses RoughDroid to determine the detection rate for the most popular malware families.

3.2.1. RoughDroid and Related Approaches

We first compare the performance of RoughDroid against related static approaches for the detection of Android malware. Specifically, we consider Drebin [1], RCP [16], Peng et al. [17], and SigPID [18]. The results of these experiments are displayed in Figure 4 as ROC curves. RoughDroid outperforms the four aforementioned approaches by detecting 95.6% of the malware applications at a false-positive rate (FPR) equal to 1%.
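For readers who want to reproduce this kind of operating point, the sketch below shows one common way to read a detection rate off a ROC curve at a fixed false-positive rate. It assumes each application receives a numeric maliciousness score, which is an assumption of this example; the scores, labels, and the function name `detection_rate_at_fpr` are synthetic and unrelated to the reported results.

```python
import math


def detection_rate_at_fpr(scores, labels, max_fpr=0.01):
    """Detection rate when at most max_fpr of benign apps may be flagged.

    labels: 1 for malware, 0 for benign. An application is flagged when its
    score is strictly greater than the returned threshold.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malware = [s for s, y in zip(scores, labels) if y == 1]
    # Number of benign applications we are allowed to flag by mistake.
    allowed = math.floor(max_fpr * len(benign) + 1e-9)
    threshold = benign[allowed] if allowed < len(benign) else float("-inf")
    detected = sum(score > threshold for score in malware)
    return detected / len(malware), threshold


# Synthetic data: 100 benign scores (0.00 .. 0.99) and 4 malware scores.
benign_scores = [i / 100 for i in range(100)]
malware_scores = [0.995, 0.985, 0.5, 0.999]
scores = benign_scores + malware_scores
labels = [0] * len(benign_scores) + [1] * len(malware_scores)

rate, threshold = detection_rate_at_fpr(scores, labels, max_fpr=0.01)
print(rate, threshold)
```

In this toy setting exactly one benign application out of 100 scores above the threshold, so the false-positive rate is 1%, and three of the four malware samples are detected.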

Also, according to the statistical measures introduced in Table 2, the asymptotic probability of each of the five schemes is much smaller than 0.05; thus, we can conclude that all schemes are effective. In addition, the area under the curve of RoughDroid is 0.95633, which is the closest to 1.0; hence, RoughDroid is the best scheme at successfully detecting malware among Android applications. The excellent performance of RoughDroid results from the various feature sets that are used to model the malicious activity of an application.

3.2.2. RoughDroid and Popular AV Scanners

RoughDroid shows better performance than the related approaches [1, 16–18]. We also compare it with ten selected antivirus scanners on the considered dataset. It should be mentioned that we consider a false-positive rate of 1%, which we assume to be sufficiently low for practical scenarios.

The experimental results are displayed in Table 3. The best antivirus scanner detects over 90% of the malware applications. RoughDroid provides the best performance, with a detection rate of 95.6%.

3.2.3. Detecting Malware Families

When evaluating the detection performance of an approach, the balance of malware families in the dataset is very important. If the number of applications of a particular malware family is much larger than that of the other families, the detection result might depend mostly on that family. An unrealistic solution to this problem is to use the same number of applications for each malware family.

We place more emphasis on the 20 most common malware families (FakeInstaller, DroidKungFu, Plankton, Opfake, GingerMaster, BaseBridge, Iconosys, Kmin, FakeDoc, Geinimi, Adrd, DroidDream, Linux/Lotoor, GoldDream, MobileTx, FakeRun, SendPay, Gappusin, Imlog, and SMSreg), plus the three new malware families (Grabos, TrojanDropper.Agent.BKY, and AsiaHitGroup) that invaded the Google Play Store in 2017.

We perform three more experiments by restricting the number of applications of a particular family in the test set. In the first experiment, we provide no applications of the family. In the second experiment, we put 10 randomly selected applications of the family back into the test set. Finally, in the third experiment, we put 20 randomly selected applications of the family back into the test set. The results of these three experiments are shown in Figure 5. RoughDroid can reliably detect all families with an average accuracy of 95.6% at a false-positive rate of 1%. The figure also shows that five families (Kmin, MobileTx, FakeRun, Grabos, and AsiaHitGroup) are perfectly detected.

4. Related Work

To the best of our knowledge, Android malware detection and classification have been a wide research area over the last decade. The work falls into three basic categories based on the detection technique, that is, static analysis, dynamic analysis, and machine learning analysis. Several methods have been proposed for statically analyzing an Android application, such as [8, 9, 18, 19]. There are also contributions based on dynamic analysis, such as [5–7, 20, 21]. To realize data placement that considers both the energy consumption in the private cloud and the cost of renting public cloud services, the authors in [22] have proposed a cost- and energy-aware data placement method for privacy-aware applications over big data in a hybrid cloud.

Furthermore, the detection techniques in [16, 17, 23, 24] are based on machine learning. The authors in [25] propose a new bio-key generation algorithm called FVHS, which combines the benefits of biometric authentication and user-key authentication. Also, in [26] the authors suggest a new scheme named FREDP (File Remotely keyed Encryption and Data Protection). This scheme involves interaction between a terminal and a private cloud. The authors in [27] propose a new identity-based blind signature scheme based on the number theory research unit (NTRU) lattice.

The authors in [28, 29] propose a new access control scheme for cloud infrastructure as a service. Also, a trust-based access control model is proposed in [30]. In addition, a cryptographic access control scheme is introduced in [31]. The authors in [32] propose a new space metric optimization driven deep-learning framework for age-invariant facial recognition. A complete review of Blockchain and intrusion detection is available in [33]. Reference [34] introduced JFCGuard for detecting juice filming charging attacks, and [35] enhanced network capacity. A location-based privacy-preserving scheme is introduced in [36].

Due to the sparsity of big rating data in E-commerce, both similar friends and similar product items may be absent from the user-product purchase network, which leads to a big challenge to the recommendation of appropriate product items to the target user. The authors in [37] propose a structural balance theory-based recommendation scheme. Also, protecting users’ privacy is challenging when IBM releases its own data to Amazon. In addition, the recommendation efficiency and scalability are often low when the user-service quality data of Amazon and IBM are updated frequently. Thus, the authors in [38] have proposed a privacy-preserving and scalable service recommendation approach based on distributed locality-sensitive hashing.

Based on deep learning, [39] proposed a novel finger vein recognition algorithm. For social networks, [40] introduced a measure for social influence. An early detection scheme for IP traffic is introduced in [41]. A new instant encrypted transmission is proposed in [42]. Based on trusted routing, a sensitive analysis of attack-pattern is proposed in [43]. Finally, [44] presents a new scheme M-SSE that achieves both forward and backward security based on a multicloud technique.

5. Conclusion

This paper introduced RoughDroid, a new broad floppy analysis malware detector that runs on Android smartphones at installation time and relies on a robust feature extraction framework. RoughDroid performs a broad floppy analysis, gathering numerous features from an application’s Dex code as well as its manifest file. It is based on ten feature sets. It then uses the Rough Set algorithm to check the behavior of an Android application. The experimental results showed that RoughDroid detects 95.6% of the malware applications at a false-positive rate of 1%, which means that RoughDroid outperforms the well-known detection approaches (Drebin [1], RCP [16], Peng et al. [17], and SigPID [18]). RoughDroid was also compared with the ten most popular antivirus scanners and proved efficient in practical scenarios. Finally, RoughDroid is able to perfectly detect five malware families (Kmin, MobileTx, FakeRun, Grabos, and AsiaHitGroup).

Data Availability

The data used to support the findings of this study are available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by Guangzhou Scholars Project for Universities of Guangzhou (No. 1201561613). Also, this work was supported by the Egyptian Ministry of Higher Education, the Arab Republic of Egypt.