Abstract

In order to solve the problem that Android platform’s sand-box mechanism prevents security protection software from accessing effective information to detect malware, this paper proposes a malicious software detection method based on power consumption. Firstly, the mobile battery consumption status information was obtained, and the Gaussian mixture model (GMM) was built by using Mel frequency cepstral coefficients (MFCC). Then, the GMM was used to analyze power consumption; malicious software can be classified and detected through classification processing. Experiment results demonstrate that the function of an application and its power consumption have a close relationship, and our method can detect some typical malicious application software accurately.

1. Introduction

Smart phones and other mobile terminals have become the most important part of modern life, and Android operating system in the smart phone has more than 50% share. However, not only do open source characteristics of the Android platform provide convenient conditions for software developers, but also security problem has become a hot spot of concern. According to the latest statistics of F-Secure [1], in 2014 the Android platform has 277 species threats, such as unauthorized download applications and GPS location tracking. So distinguishing the mobile normal software and malicious software is a hot problem to be solved.

At present, there are two kinds of methods for malware detection: one is static analysis method, and the other is dynamic analysis method. The static analysis is mainly based on application decompiled and gets source code of malicious behavior.

However, due to the limitation of software code analysis method and decompiled technology, dynamic analysis has gradually become the trend of the mobile malware detection technology development.

In dynamic analysis field, Zheng et al. [2] propose a dynamic analysis system running in the Dalvik virtual machine, but because of the limitation of Dalvik layer in Android, it cannot monitor the detailed information of underlying code. Enck et al. [3] set up a monitoring program running in simulation environment; this method can effectively achieve real-time behavior for a specified application, but it cannot run on Android devices without ROOT access.

For the Android platform application software, there are two ways to get power consumption information. We can use native Android API to calculate power consumption. This method has been used by some application software, such as PowerTutor [4].

Another method is proposed by Curti et al. [5] who rewrite the underlying driving mode for power information. Compared with the first method, this method is more accurate, but it has high intervention to the system.

In the study of power consumption analysis method to detect malicious software, Jacoby et al. [6] propose a model which is based on power consumption to detect intrusion.

Based on this, Buennemeyer et al. [7] and Kim et al. [8] use the Android mobile phone battery configuration files to detect malicious software. But all of the above methods only consider the amount of power consumption, without taking the timing characteristics of power consumption into account.

According to the above problem, this paper puts forward a kind of mobile malware detection model based on mobile phone power consumption. The model structure is shown as in Figure 1, and it mainly includes four function parts:(1)Battery consumption monitor: monitoring of the application software in a certain time range of power consumption.(2)Analysis waveform characteristics: analysis of the diagram of power consumption and extraction of the feature of power consumption waveform.(3)Waveform feature matching: matching the feature of power consumption waveform which is calculated by the analysis module in the prebuild feature database.(4)Output results: output of the classification results for each power consumption feature.

2. Battery Power Consumption Monitoring

The method for monitor battery power consumption is the key to this detection model; if battery power consumption data is not accurate, it will directly affect the accuracy of the results. In this paper, we use PowerTutor [4] to monitor battery power consumption information of the specified application.

PowerTutor can get the battery power consumption time-series data of each mobile application by monitoring their components including CPU, audio, screen, Wi-Fi, 3G, and GPS.

During the power consumption monitoring, we found that power consumption has the same state transition characteristics as sound signal did. But the state of power consumption is much more stable, and 5-minute sampling time can cover all status changes. So, in the rest of this paper, for each application we take 5 minutes to monitor battery consumption.

Figure 2 shows the CPU power consumption of two Android applications (“UC browser” and “iReader”) within 5 minutes. It can be found from Figure 2 that the consumption curves of two applications with different functions are significantly different. It also shows that we can effectively distinguish different applications through the analysis of the power consumption.

3. Waveform Feature Extraction and Analysis

We know that applications with different functions have different power consumption from Figure 2. But when using these consumption data to classify software, we also need to consider the influence of different factors on the power consumption, such as uncertainty user input, different screen resolution, and the influence of different hardware acceleration technology on power consumption. In order to reduce the above factors on the result of application software to identify or test, it is important to choose an effective method to extract the feature information of power consumption waveform.

In the field of waveform characteristics analysis, the most widely used and the most mature way is Mel frequency cepstrum coefficient (MFCC). It has been widely used in speech recognition and has high identification accuracy [9, 10].

This paper designs a MFCC calculation process for waveform feature extraction which was mentioned before (see Figure 3).

MFCC computation process is as follows.

(1) Frame Pretreatment. Frame pretreatment in this paper only needs to do framing and window processing for the original input data of . The purpose is to extract short signal features (similar to audio processing). In order to facilitate the calculation of subsequent steps, set the frame length 5 s; frame offset (the overlap between two frames) is 2.5 s. During the processing, we use Hamming window; the calculation formula is

Among them, the 0.54 and 0.46 are the constant which the Hamming window defined, is sampling points per frame, and is the short signal feature which is the result of the Hamming window formula.

(2) DFT (Discrete Fourier Transform) Linear Spectrum Calculation. We use Fourier transform to calculate the short-time energy and frequency domain signals. First of all, add some 0 after each frame in the time domain signal to make sure the frame length is (); then, we can get the linear spectrum through the discrete Fourier transform. The calculation method is

(3) Mel Frequency Filter Calculation. The input frequency signal through a number of triangle pass filters and Mel frequency spectrum signal are obtained by filtering calculation. Among them, each of the triangular filter center frequency calculation steps is as follows.

Set the number of filter , signal sampling frequency is , and the max Mel frequency is

Center frequency spacing between each filter is

(4) Logarithmic Energy Calculation. In order to have better robustness of noise, it is indispensable to calculate the logarithmic energy of Mel frequency spectrum. Set the number of triangular filter for ; logarithmic spectrum of linear spectrum is calculated as follows:

(5) Cepstrum Calculation. Through the discrete cosine transform (DCT), we can transfer the logarithmic spectrum to cepstrum; the formula is

4. Waveform Feature Matching

Feature matching includes two processes. Firstly, create the battery power consumption characteristics database. And then match software battery power consumption characteristics of input signal with the feature database. In the feature matching calculation, we choose Gaussian mixture model (GMM) [11]. The advantage of GMM is that it can approximate to arbitrary shape distribution probably and the algorithm complexity is low. Battery power consumption feature matching process design is as follows.

(1) Build Characteristic Database. The key to build battery power consumption characteristics database is essentially to calculate Gaussian distribution parameters of the given MFCC feature sequence. Therefore, we can use the maximum likelihood estimation to obtain probability distribution parameters; calculation method is as follows.

Set the MFCC sequence . Each component of is independent of each other; the likelihood for the GMM model parameters is

The eligible parameters are to find a group which makes largest:or

While using maximum likelihood estimation, we choose to use the expectation maximization (EM) algorithm [12] to get the estimated parameters. The calculation steps are as follows.

For each training sample , is the category which this sample may belong to, and is a certain probability distribution of ; then

Estimated parameters are calculated as follows:

The EM algorithm can converge because of Jensen’s inequality [13].

According to formula (10) and Jensen’s inequality, when calculating expectation we assume that the value of is unchanged; then we can get the equation

But in each iteration, the value of is changed; if we treat as variable and calculate its derivative, we can get

Through formula (12) and formula (13) we proved is monotonically increasing. So one way to convergence is that no longer changes, or change very little.

(2) Features Value Matching. After building the database, the given is a known characteristic. For the existing categories , calculate the probability value . Characteristic value matching is finding , which can make ; then is the output of feature matching.

5. Experiments and Result Analysis

In the experiment, first we use the method proposed in Section 3 to build GMM model for game, browser, music player, and malicious software. And for the malicious software, we simulate a floating point number attack for the Android platform. In the construction of the GMM model, we repeatedly monitor the software’s power consumption in 5 minutes and calculate the MFCC features for the reset of 3 types of applications and then use the EM algorithm to calculate the GMM model parameters.

Experiments are designed on 6 cell phones, including Samsung GALAXY S5 and LG G2 mobile phones. Battery monitoring tool is PowerTutor [4]. In the experiments, we only monitor and collect the CPU power consumption data when the application software is running. The cycle time of monitoring is 5 minutes and sampling frequency is 300 ms.

MFCC feature calculation results are shown as in Figure 4; it is the test result of the application that is named iReader. The GMM parameters model is shown in Figure 5.

We can see from Figure 4 although power consumption MFCC feature values for the same application in different mobile phones are not completely the same, the distribution is roughly the same. It also proves that the MFCC feature extracting can largely reduce the equipment and other impacts caused by factors.

And, in Figure 5, the GMM model also shows that the estimated parameters can be calculated using the extracted MFCC parameters and the result is convergence.

In application software testing, firstly we build GMM model for four kinds of typical applications, including games, browser, music player, and malicious software. In this test, this paper only simulates one kind of malicious software which is floating point attack.

During the test step, this paper selects 100 applications in each group. Under the normal condition, we monitor the CPU battery consumption during 5 minutes and then classify it through the proposed power analysis model. Four kinds of typical application test results are shown in Table 1, and the detection rate of the four kinds of typical applications is shown in Table 2.

The malware detection rate in Table 2 (negative rate, NR) shows that the proportion of application was classified as malware, benign detection rate (positive rate, PR) shows the proportion of application software was classified as benign, and accuracy rate (AR) shows the proportion of application by accurate classification. Table 2 data show that the proposed power analysis model to differentiate benign software and malicious software is highly accurate.

The experiment proves that CPU power consumption characteristics of simulated malicious software have a significant difference from others, and through the analysis of the power consumption we can detect and classify malicious software.

Since our method is based on the Gaussian mixture model, we know that the model order will affect the test results. So during the experiments, we change the model order from 2 to 12 with the same sampling time and the same sample data of music applications and compare the results. And the experimental results are shown in Table 3.

The data in Table 3 can prove that, with the increase of GMM’s order number, detection accuracy has a significant rising. But when the GMM’s order is greater than 4, detection precision begins to decline. This is because as the GMM’s order is larger the model becomes more complex; the matching process and parameters calculating process will take a longer time. Since our experiment takes 5 minutes’ sampling time, when GMM’s order becomes larger the sample data is too little to calculate accurately the higher order of GMM’s parameters.

6. Conclusions

Based on battery power consumption sequence characteristics of the application software which is collected from mobile terminals such as mobile phone, this paper introduces MFCC feature extraction algorithm and GMM model classification algorithm and then proposes an Android malware detection method. What is more, the experiment proves that the method has effective identification and detection of malicious software.

In the future, we will try to make MFCC parameters extraction and classification of GMM model integrate into the mobile terminal equipment and consider the battery consumption of other hardware when application software is running.

Competing Interests

The authors declare that they have no competing interests.