Abstract

As the most widely applied mobile operating system for smartphones, Android is challenged by fast growing security problems, which are caused by malicious applications. Behaviors of malicious applications have become more and more inconspicuous, which largely increase the difficulties of security detection. This paper provides a new security sandbox approach of Android based on hook mechanism, to further enrich Android malware detection technologies. This new sandbox monitors the behaviors of target application by using a process hook-based dynamic tracking method during its running period. Compared to existing techniques, this approach can create a virtual space where apk can be installed, run, and uninstalled, and it is isolated from the outside and a risk assessment approach based on behavior analysis is given so that users can obtain an explicit risk prognosis for an application to improve their safety. Tests on malware and normal application samples verify this new security sandbox.

1. Introduction

The security of Android is highly valued by and of wide concern to the industry [1] since a huge number of mobile applications are developed and used based on Android system. Various security issues of Android apps are continually being discovered and discussed, ranging from sensitive data leakage [24] to privilege escalation [57]. The proliferation of malicious apps on Android devices and the theft of user privacy data have become major problems for the development of the Android biosphere [8].

Previous methods have normally used Android internal sandbox to protect Android system in the public network [9, 10]. However, the internal sandbox hardly analyzes the risk behaviors of Android applications. Most of the prior studies for risk behaviors of Android applications rely on static analysis and dynamic analysis. Research on static analysis of Android application behavior, including MDroid [11] and MHealth Apps [12] static analysis tools has achieved good results in the application of behavior. On the other hand, static analyses may have their limitations, for instance, detecting sensitive data leakage [1316] and analyzing capability leakage [5, 17]. In addition, the effectiveness of static analysis is restricted due to the distinctive features of Android’s programming paradigm. Android apps are based on the Android OS, which can be regarded as a giant set of libraries containing both Java code and native code (so far, Android has consisted of more than 13 million lines of code [18]). Reference [19] proposed a verifiable diversity ranking search scheme over encrypted outsourced data while preserving privacy in cloud computing, which also supports Android data.

Contrary to static analysis, dynamic analysis executes selected program paths and thus can precisely identify property violations [20, 21]. Nonetheless, Android apps are tightly coupled with the Android OS which consists of a set of libraries containing both Java and native code and complex interprocess communications. To address this, Qi [22] proposed the taint propagation information tracking system based on TaintDroid, by modifying the Android system source code. Reference [23] implemented a tool named DroidInjector, which reproduced the malicious behavior of the application through the simulation of the application behavior. Gharib [24] put forward a system named ProfileDroid, which can depict the application behavior from 2 aspects of semantics and behavior and improve the accuracy of behavior analysis. Ardeshiricham [25] then designed a more detailed system named DroidScope. Reference [26] proposed an Activity call graph analysis method to generate UI interaction script automatically for Android applications. It can not only depict the application behavior semantically but also analyze the behavior characteristics of the application components. Lin has designed and implemented a tool named sandbox [27]. It can analyze sensitive actions from 2 aspects of general behavior and internal behavior. Reference [28] presented a scheme named SecDisplay for trusted display service; it protects sensitive data displayed from being stolen or tampered surreptitiously by a compromised OS. To detect DDoS attack, Cheng [29] proposed an abnormal network flow feature sequence prediction approach which could fit to be used as a DDoS attack detector in the big data environment and solve the aforementioned problems. And that method can also be used for detecting risk behaviors of Android app. Liu [30] designed a more practical privacy protection data aggregation protocol based on a new trusted model to avoid attack from DDoS.

In general, dynamic analysis on the application behavior can accurately judge the malicious behavior; however it cannot guarantee that all the codes for the implementation of application coverage, so that there might be a high missing rate. Static analysis can ensure the code coverage, but the accuracy of static analysis is easily affected by code obfuscation technology. For this reason, the paper begins with the application of Android risk behavior perspective, dynamic analysis based on the idea of the design and implementation with hook mechanism. It can automatically install, start, and uninstall the application and can simulate user actions in the application after the start, at the same time to monitor and record the behavior of the application. On this way, we put forward an evaluation method based on risk behavior. A security risk for the application of behavior is analyzed, so that users about risk related applications can have a clear anticipation.

2. Preliminary

2.1. Information Entropy

Information entropy, also known as Shannon entropy, is proposed by Shannon to solve the problem of quantitative measurement of information. In information theory, entropy is used to measure the expected value of a random variable. It represents the amount of information lost in the process of information transmission before being accepted and is also the entropy of information. In thermodynamics, the definition of entropy is the logarithm of the number of possible states of a system; its physical meaning is a measure of the degree of disorder in the system. Entropy is the measure of uncertainty of random variables in information theory. Information entropy considers that the magnitude of information of a message is directly related to its uncertainty. Named after Boltzmann’s H-theorem, Shannon denoted the entropy of a discrete random variable with possible values asHere is the expected value, and is the information content of . is itself a random variable. If denotes the probability mass function of then the entropy can explicitly be written as

2.2. A Hook Example

As printf function, for example, if the hook module wants to intercept the calls to printf() and redirect to another function, it should write the redirected function address to the offset addresses of the symbol printf defined in the relocation sections, after the linker loaded the dynamic library into memory.

To replace the call of the printf() function with the call of the redirected hooked_printf() function, as shown in the software flow diagram in Figure 1, a hook function should be implemented between the dlopen() and libtest() calls. The hook function will first get the offset address of symbol printf, which is 0x1fe0 from the relocation section named  .rel.dyn. The hook function then writes the absolute address of hooked_printf() function to the offset address. After that, when the code in libtest2() calls into the printf(), it will enter the hooked_printf() instead.

3. Methods

The goal of sandbox is to avoid both apps and Android system code modifications. The design of sandbox is to directly modify the apps virtual-memory tampering with ART internals representation of Java classes and methods. ART-sandbox consists of two components. The first component is the core engine written in C and the other one is the Java side that is a bridge for calling from user-defined Java code to sandbox’s core. The core engine aims to find target methods in virtual memory, load user-supplied DEX files, hijack the vtable, and set native hooks. Moreover, it registers the native methods callable from the Java side. Sandbox is configured by reading a user-supplied JSON formatted configuration file containing the target methods list.

Figure 2 represents the apps memory layout while sandbox hooking library is enabled. The hooking library is loaded inside the apps virtual memory (step 1), and then sandbox loads the user-defined patch code by DexClassLoaders methods (step 2). After this, sandbox uses its internal functions to retrieve target methods reference. It can hook these methods by both vtable and virtual methods hijacking (step 3).

To get the target methods reference, sandbox uses the JNI function FindMethodID. Sandbox overwrites the target methods entry within both the vtable and virtual methods array by writing the address of the methods patch code. The original methods reference is not modified by sandbox and its address is stored inside the internal data structures. This address will be used to call the original method implementation.

3.1. Hook Virtual Space

When the Android application starts the Activity, the ActivityManager.start-Activity() method will eventually be invoked no matter what API call is passed. This call is a remote Binder service (speeding up the call), and the Android application will first look up the Binder service cache in the local process. Virtual app(VA) intervened in the invocation process by the following way.

(1) Replace the local ActivityManagerServise(AMS) Binder service for the proxy object that is constructed for the VA to take over the call. This step is realized by Java reflection technology.

(2) After taking over the AMS, when the startActivity is invoked to open more applications, the Activity in the VA modification Intent is the occupied Activity that has been declared in VA. The goal of this step is to directly start Activity without AndroidManifest.xml.

(3) When the multiple application process is started, the message processing callback is increased by the ActivityThread.mH.mCallback. This step takes over the message callback of more than one application master threads.

On the basis of the above modification, the multiple application Activity startup process can be divided into the following two steps: start Activity (shown as in Figure 3) and resume Activity (shown as in Figure 4).

3.2. APP Risk Behavior

An application invokes an API that truly reflects the behavior of the application in the Android system. For example, an application that generates network behavior will certainly invoke the API associated with the network communication. An application that generates file behavior will certainly call the file-related APIs. Hence, to describe an application’s behavior, one can use it as a standard for invoking the API. Once users have installed malicious applications in Android, these apps typically have some of the following common malicious behaviors in Android:

(1) The backstage sending the charge message and calling the toll telephone.

(2) Theft of user information (including mobile phone messages, call records, mobile phone IMEI, IMSI number, and user-used operators);

(3) Access to the user’s location information, to open the mic recording and camera in the backstage.

(4) Backstage networking, transmission of user information, and consumption of user network traffic.

(5) Camouflage process, in the backstage to kill other mobile phone processes (such as Alipay application process) and then camouflage another process to cheat.

According to the description above, if an application invokes one or more APIs required to implement the above behavior, there is a certain degree of risk that the user can install the application. Table 1 shows some of the APIs and their behavior levels.

When the risk is large to a certain extent, users should be informed. The level of risk represented by different APIs is not the same for risk APIs. In general, the user’s economic interests as a direct risk measurement criteria:

(1) APIs that may directly cause loss of property to the user, with the highest degree of risk

(2) The API that can obtain or disclose user’s privacy, its risk degree being secondary

(3) To modify the system settings and User Configuration, damage to the system environment of the API, the degree of risk being relatively low.

Based on the theory of information entropy, this paper proposes a new approach to evaluate the risk behavior of Android using information entropy. One installs the application into the sandbox and runs and simulates user actions for a fixed number of times, such as 500. Suppose that, in this process, all the sensitive APIs invoked are , and set the information entropy used to evaluate the risk behavior of the application to , and set s as the total API numbers that has occurred for all behavior:

Because malicious applications generally focus on risk behavior, they frequently invoke sensitive APIs. This paper uses a set of sensitive APIs to describe and characterize an application. If is greater, its entropy will be more than the normal application of information entropy. By calculating an application on the sensitive API set of information entropy , one can judge the level of risk.

4. Experiments and Results

We firstly test one app which is named authclient.apk downloaded from Google. Figures 5 and 6 show the test results.

This paper collects 1200 malicious Android apps that have been identified by each major mobile phone’s security platform as a sample of malicious applications and crawls the top-ranked downloads in Google’s official App Store, a total of 400 applications, as a normal application sample. In general, Google’s official App Store is generally considered to be a more secure app store, and the application of the top-ranked Google stores has withstood the test of countless users and security vendors. It can be considered a safe and normal application. Based on the above 2 sample sets, the sandbox is used to monitor and record its API calls, and the results are statistically categorized as shown in Figures 7 and 8.

From the above results, network API calls occupy the main position for both malicious and normal application, and that is a trend in the future development of Android applications. In addition to the network API, there are malicious applications to other sensitive API calls and normal application of the obvious difference. As can be seen from Figures 6 and 7, malicious applications have significantly more frequent calls to sensitive APIs than normal applications, and these APIs can capture the user’s IMEI code, IMSI code, phone number, SIM card sequence code, etc., so these APIs are obviously more risky than network APIs much higher.

5. Discussion and Conclusion

With the rapid development of mobile Internet, the Android applications have become an indispensable tool for people’s daily life, and understanding the risks of Android applications is very important. This article focuses on the analysis and evaluation of the Android application risk behavior, monitors and records the behavior of Android applications through the Android sandbox, and uses the information entropy theory to analyze and evaluate the risk behavior of Android applications. This method can provide reference for application store review and also make Android users have a clear assessment of the risks associated with the application. Based on the above methods, this paper collects more than 1200 malicious applications and 400 normal applications, calculates the information entropy of both, compares them, and verifies the validity of the method described in this paper.

However, before being widely deployed in practical applications, this proposed sandbox has to tackle the privacy and efficiency challenges in sharing schemes [31]. We need to send users’ data to clouds and guarantee the security of users data while ensuring rapid transmission of instant data [32, 33] and the security of public storage clouds [34, 35]. Note that mining data from multiple data sources to extract useful information [36, 37] for better understanding of security risk evaluation should be considered in the future study.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Foundation of Science, China (41274109), the Innovative Team Project of Sichuan Province (2015TD0020), and the New Zealand Marsden Fund.