Abstract

In recent years, the number of malware Apps that users download to their devices has risen. In this paper, we propose an agent-based model to quantify the evolution of Android malware infection, modeling the behavior of the users and the different markets from which they may download Apps. The model predicts the number of infected smartphones depending on the type of malware. Additionally, we estimate the cost that users have to bear when malware is present in their devices. We are able to analyze which part is more critical: the users, who give indiscriminate permissions to Apps or do not protect their devices with antivirus software, or the Android platform, whose vulnerabilities allow devices to be rooted. We focus on the community of Valencia, Spain, although the results can be extrapolated to other places where the number of Android smartphones remains fairly stable.

1. Introduction

The security of devices connected to the Internet has long been a concern for governments and companies as well as for individual users. However, smartphone users do not seem to perceive this threat, given their potentially risky behavior and the sensitive data and pictures they store in their devices. Moreover, the risk increases with new company policies that allow employees to use their own smartphones at work to access sensitive company data and applications (bring your own device, BYOD).

Different types of malware have already been documented [1, 2], and this threat must be studied to quantify the users’ potential risk. Here, we focus on the Android platform because most smartphones run the Android OS [3].

The first study characterizing viruses on the Android mobile OS appeared in 2011 [1, 2]. This study categorizes the types and families of viruses found according to the type of installation, activation, effects on the infected device, the user's management of permissions, and so forth, showing the diversity of virus families and the ineffectiveness of traditional antivirus methods on mobile devices.

Several works have also addressed the analysis and detection of malware on the Android platform [4–6]. Their common objective is to propose new methods of virus detection on mobile devices from a dynamic point of view, that is, to detect anomalous or unwanted behavior of the device at runtime (system calls, network access, and memory or file modifications). In contrast, static, classic antivirus methods rely on repositories of previously known viruses and therefore do not protect the user when an unknown new type of virus spreads. However, dynamic detection is poorly suited to mobile devices because of its CPU and memory consumption. Both approaches, static and dynamic, have their own advantages and disadvantages, and both may be bypassed and fail to prevent the spread of new viruses.

1.1. State of the Art

In the literature, there are several approaches to the mathematical modeling of the spread of viruses on mobile devices. In [7] the authors describe a framework and the main guidelines for designing reliable agent-based malware models considering infections via SMS/MMS, Bluetooth RF, IM, P2P, and email. In [8–10] the authors propose approaches based on mathematical epidemic techniques, in which the malware infection follows dynamics similar to those of infectious diseases.

Also, there are models based on the physical architecture of the mobile and wireless networks [12] or based on the mobility of the users, but they do not consider the interconnectivity based on the exchange of applications [9].

To the best of our knowledge, no paper quantifies, predicts, or simulates how users install malware Apps. The literature does include applications of machine learning techniques to detect malware Apps in the markets [6]. Nevertheless, none of the above approaches takes into account an infection model based on an App-market ecosystem, which is how the smartphone environment works.

1.2. Our Proposal

The model guidelines suggested in [7] are probably the best suited to the current scenario. That contribution proposes an agent-based model of malware dynamics covering all of the possible infection models except the App-market ecosystem model. The integration of the App-market ecosystem is the key aspect that we consider in this paper.

As indicated previously, researchers and companies have characterized mobile malware and proposed alternative methods to prevent, detect, and avoid it. Different companies also periodically publish mobile malware reports with estimations and statistics. However, the literature lacks studies that quantify the effects of malware infection on the Android platform and provide realistic data on the extent of the threat, as our model does [13]. Our model (App-Model) complements the agent-based malware modeling suggested in [7] by introducing a new infection process based on applications downloaded from the App-market. Figure 1 gives a rough description of the items we deal with to build the App-Model.

The App-Model quantifies the evolution of Android malware infection (to know the real threat for the users), the number of potentially infected smartphones (to estimate the population of smartphones affected by malware), and the type of malware that affects these infected smartphones in the community of Valencia, Spain [14]. The results can be extrapolated to other regions where the number of Android smartphone users is fairly stable.

Other approaches, such as machine learning or data mining techniques, could be used to study the evolution of the malware infection; however, these techniques do not take into account the behavior of the actors (markets, Apps, and clients). Knowing their behavior and how they interact allows us to simulate new scenarios in which the behavior may be different and to predict the evolution of the malware infection under these changes.

Additionally, the results of the model allow us to analyse the critical part of the smartphone business model with respect to malware; that is, we find out which part is more critical: the users, who give indiscriminate permissions to Apps or do not protect their devices with mobile antivirus software, or the Android platform, whose vulnerabilities allow devices to be rooted. Furthermore, we are able to estimate the cost that users have to bear when their devices carry malware that causes financial charges.

The paper is organized as follows. In Section 2 we present the agents of the model: Apps, markets, users, habits, and so forth. In Section 3 we describe how the agent-based model evolves over time. Section 4 is devoted to carrying out simulations, presenting the results, and discussing them. Conclusions are drawn in Section 5.

2. Material and Methods

To conduct our study, we set the time step to one month. The starting time point (month 0) is July 2011. This date was chosen because in July 2011 no, or only very few, smartphones could have been infected.

The agent-based approach allows the analysis of service interactions among the agents and fits perfectly the relation between mobile device users and App-markets.

In this model, two domains and their agents are considered: the markets, where the agents are the Apps that belong to the different markets, and the users, where the agents are the mobile devices (or clients) that belong to each user. The behavior of the agents has been studied in [15].

The App has the attributes malware and type, which indicate whether an App is malware and its type, respectively. A malware App can carry one or more of the malicious payloads described. If it carries more than one payload, we assign it the type of its highest-ranked payload, in the order financial charge, privilege escalation, remote control, and information collection [15].

The attributes of the client are whether it is infected or not, the OS version, whether the device has protection software or not, and the kind of infection. Privilege escalation malware affects the client depending on the OS version [15].

Additionally, we consider that clients download a certain number of Apps every month, determined by the download method; choose the downloaded Apps with the selection method; and determine whether each downloaded App infects the client or not with the infection method. The download process works as follows.
(i) Download method: we assume that the number of Apps downloaded by a user in a month follows a Poisson distribution,
$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots, \tag{1}$$
where $k$ is the number of downloaded Apps and $\lambda$ is the average number of Apps downloaded every month in every smartphone.
(ii) Selection method: given the number of Apps obtained by the download method, this method randomly selects that many Apps from the markets. The selection depends on the popularity and the number of downloads.
(iii) Infection method: among the selected Apps, we take those that are malware, and this method determines whether each one affects the client or not, depending on the App attributes (malware and type) and the client attributes (OS version and antivirus).
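As an illustration of the download method, the following minimal sketch draws a monthly download count from a Poisson distribution by inverting its cumulative distribution. The function names, the `max_k` safety cap, and the use of Python's standard `random` module are assumptions made for this example; the average of 6.2 Apps per month is the value reported in [15].

```python
import math
import random

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k downloads in a month under a Poisson law."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sample_downloads(lam: float, rng: random.Random, max_k: int = 1000) -> int:
    """Draw a monthly download count by inverting the Poisson CDF.

    max_k is a hypothetical safety cap that guarantees termination.
    """
    u = rng.random()        # uniform value in [0, 1)
    k = 0
    p = math.exp(-lam)      # P(X = 0)
    cumulative = p
    # Advance until the cumulative probability exceeds the uniform draw.
    while cumulative <= u and k < max_k:
        k += 1
        p *= lam / k        # P(X = k) obtained from P(X = k - 1)
        cumulative += p
    return k

rng = random.Random(2011)   # fixed seed only to make the example repeatable
monthly_downloads = [sample_downloads(6.2, rng) for _ in range(5)]
```

Any other Poisson sampler would serve equally well; the inversion is shown only because it makes the role of the random draw explicit.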

In the remainder of this section, we summarize the main results of [15] that are used throughout this paper.

2.1. Official Market

Let us describe the main features of the official market, also known as Google Play [16].
(i) New Apps: the number of new Apps entering the official market every month shows a linear trend, so it is modeled, in the mean square sense, by a linear function of the number of months elapsed since July 2011 (expression (2)). More details are in [15].
(ii) Malware Apps: analogously, the number of malware Apps in the official market is modeled by a function of the number of months elapsed since July 2011 (expression (3)). More details are in [15].
(iii) Distribution of Apps according to their popularity: Apps are classified according to their popularity. The initial distribution of the 221 875 Apps in July 2011 is given in Table 1.
(iv) Distribution of malware Apps according to their popularity: the initial distribution of malware Apps by popularity (July 2011) can be seen in Table 2. Repackaging [2], that is, taking a popular App, introducing some malware code, and uploading it again to the market, is taken into account.
(v) Malware detection: the assumed effectiveness of the App scanning service of the Android market is around 40% [15].
(vi) Distribution of malware Apps according to their type: the distribution of malware Apps in the official market according to [2] is shown in Table 3.

2.2. Nonofficial Market

Let us describe the markets other than Google Play [17].
(i) New Apps: the number of new Apps entering the nonofficial market every month also shows a linear trend, so it is modeled, in the mean square sense, by a linear function of the number of months elapsed since July 2011 (expression (4)). Details can be found in [15].
(ii) Malware Apps: similarly, the number of new malware Apps entering the nonofficial market every month is given by a function of the number of months elapsed since July 2011 (expression (5)).
(iii) Distribution of Apps according to their popularity: we classify the Apps by their popularity in the nonofficial market as shown in Table 4.
(iv) Distribution of malware Apps according to their popularity: we classify the malware Apps by their popularity in the nonofficial market as shown in Table 5.
(v) Malware detection: there is no antivirus service in the nonofficial market.
(vi) Distribution of malware Apps according to their type: the distribution of malware Apps in the nonofficial market by type, according to [2], is shown in Table 6.

2.3. Users

Let us describe the main features of the users’ behavior.
(i) Number of users: in the community of Valencia there were 1 176 954 Android smartphones. We consider this value constant over time.
(ii) Smartphone renewal: a user changes his/her smartphone every 11.5 months, on average [18].
(iii) Users with antivirus installed in their devices: we assume that 33% of users have antivirus installed in their devices [19]. However, this figure is under discussion; therefore, in the simulations we take values over a whole interval, the maximum unbiased likely interval, rather than this single figure. Moreover, the assumed effectiveness of this antivirus software is between 20.2% and 79.6% [2].
(iv) Average number of Apps downloaded: every user downloads an average of 6.2 Apps per month [15].
(v) OS version evolution and infection by privilege escalation malware: the evolution of the OS versions installed on the smartphones is shown in Table 7. Table 8 shows the percentage of devices that can be affected by the most common Android privilege escalation vulnerabilities [20].
(vi) App downloads by popularity: Apps are grouped into four classes by their number of downloads worldwide: fewer than 500, between 500 and 5000, between 5000 and 50000, and more than 50000. For each class we consider the average number of downloads of its Apps in the community of Valencia and the probability that a user in the community of Valencia downloads an App of that class; these quantities are related through expressions (6), (7), and (8). More details are in [15].
(vii) When can a user be infected by a malware App? A downloaded malware App infects the client if one of the following conditions is met.
(a) The downloaded malware App is of the type privilege escalation, the OS is vulnerable, and there is no antivirus installed.
(b) The downloaded malware App is of the type remote control, financial charge, or information collection and there is no antivirus installed.
(c) The downloaded malware App is of the type privilege escalation, the OS is vulnerable, and the installed antivirus does not detect the malware (the antivirus is not effective).
(d) The downloaded malware App is of the type remote control, financial charge, or information collection and the installed antivirus does not detect the malware (the antivirus is not effective).
(viii) Probability that a user detects that his/her smartphone is infected and fixes it: as mentioned above, the average replacement cycle of smartphones is 11.5 months [18]. In addition, the user detects and repairs infections caused by financial charge malware when he/she receives the mobile bill.
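The four infection conditions can be condensed into a single decision rule, sketched below. The class and function names are hypothetical, and modeling antivirus detection as one random draw against an effectiveness value is an assumption made for illustration, not necessarily how the original simulation was coded.

```python
import random
from dataclasses import dataclass

@dataclass
class Client:
    os_vulnerable: bool   # OS version exposed to privilege escalation
    has_antivirus: bool   # antivirus software installed on the device

def infects(malware_type: str, client: Client,
            av_effectiveness: float, rng: random.Random) -> bool:
    """Return True if a downloaded malware App infects the client."""
    # Privilege escalation malware needs a vulnerable OS (conditions a, c).
    if malware_type == "privilege_escalation" and not client.os_vulnerable:
        return False
    # Without antivirus the malware always succeeds (conditions a, b).
    if not client.has_antivirus:
        return True
    # With antivirus it succeeds only if it goes undetected (conditions c, d).
    detected = rng.random() < av_effectiveness
    return not detected

rng = random.Random(7)
unprotected = Client(os_vulnerable=False, has_antivirus=False)
result = infects("financial_charge", unprotected, 0.5, rng)  # True: no antivirus
```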

3. The App-Model Evolution Rules

The users and the markets have their own rules that define the initialization point and the evolution of the agent sets. The evolution rules for the client agents simulate the behavior of the users, establishing how many Apps a client downloads monthly, how the client selects Apps based on their popularity, whether a downloaded App infects the device, and how often a user changes his/her device.

The evolution rules for the App-markets establish the number of new Apps in every market each month, how the markets control newly submitted Apps (Google Play uses Bouncer, which scans submitted Apps looking for malware), how the markets distribute the Apps by popularity, and so forth.

Then, using the considerations introduced so far, we are going to describe the evolution rules of the model. Recall that the time period is a month and the starting point of the model corresponds to July 2011.

First, we sample the download percentages per popularity class as described in (6), (7), and (8). Then, for every month, the following steps are carried out.
(i) State the official market:
(a) determine the number of Apps in this market in the current month according to (2);
(b) distribute them according to their popularity following the percentage values in Table 1;
(c) determine the number of malware Apps in this market in the current month according to (3);
(d) distribute them according to their popularity following the percentage values in Table 2;
(e) malware detection: 40% of malware is detected and removed.
(ii) State the nonofficial market:
(a) determine the number of Apps in this market in the current month according to (4);
(b) distribute them according to their popularity following the percentage values in Table 4;
(c) determine the number of malware Apps in this market in the current month according to (5);
(d) distribute them according to their popularity following the percentage values in Table 5.
(iii) User behavior. For every user:
(a) Download method: take a random value and obtain from it the number of downloaded Apps according to the Poisson distribution (see expression (1)).
(b) Selection method: select each App from either market with a probability of 50%, with its popularity class drawn according to the download probabilities per popularity, and with the App being malware or not according to the malware probability of the official or the nonofficial market, respectively.
(c) Infection method: if any of the downloaded Apps are malware, for each malware App, one has the following.
(1) If it has been downloaded from the official market, determine its type with the probabilities given in Table 3. Whether it infects the smartphone then depends on the OS installed (Table 7), on whether there is an antivirus, and on the antivirus effectiveness.
(2) If it has been downloaded from the nonofficial market, determine its type with the probabilities given in Table 6. Whether it infects the smartphone then depends on the OS installed (Table 7), on whether there is an antivirus, and on the antivirus effectiveness.
(d) Check whether the user detects that the smartphone is infected and fixes it. This happens only when the malware is of the financial charge type, and the repair is done at the end of the month.
(e) Check whether the user changes his/her smartphone (every 11.5 months on average).
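The last two user rules, the end-of-month repair of financial charge infections and the smartphone renewal, can be sketched as follows. Two assumptions are made here and are not stated in the text: the renewal is modeled as an independent monthly event with probability 1/11.5, and a renewed smartphone starts clean. The function name and the string labels for malware types are hypothetical.

```python
import random
from typing import Optional

def end_of_month_update(infection_type: Optional[str], rng: random.Random,
                        mean_renewal_months: float = 11.5) -> Optional[str]:
    """Return the infection state of a client after the end-of-month checks."""
    # Financial charge malware is noticed on the mobile bill and repaired.
    if infection_type == "financial_charge":
        infection_type = None
    # Assumption: renewal happens with monthly probability 1 / 11.5,
    # and the new device is not infected.
    if rng.random() < 1.0 / mean_renewal_months:
        infection_type = None
    return infection_type

rng = random.Random(2014)
state = end_of_month_update("financial_charge", rng)  # None: repaired after the bill
```

Under this assumption the time between renewals is geometric with mean 11.5 months; a fixed replacement schedule would be an equally plausible reading of the rule.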

The algorithmic evolution of the App-Model described above is drawn as the flowchart in Figure 2. The left side of the figure represents the evolution of the clients and the right side the evolution of the Apps; both evolve in parallel. The start point represents the initial month of the model, where the model creates the clients and sets their attributes. After this, at every step, the model begins its evolution and all of the clients (left side of the figure) run their methods in the order shown and change their attributes if needed. Also at every step, the model establishes the markets, which change every month, sets the attributes of the Apps, and groups the Apps by number of downloads (right side of the figure). After this, the number of Apps in the markets is recalculated according to their evolution curve. The whole process runs in parallel, but at every step the selection method of the clients can be executed only after the Apps are grouped. After the last step of the evolution of the model, the end point of the simulation is reached.

4. Results and Discussion

Once the model has been built and the evolution rules are stated, some model parameters remain unknown but satisfy certain restrictions:
(i) the App download percentages per popularity, which satisfy (6), (7), and (8);
(ii) the percentage of smartphones with antivirus, which lies in the interval taken from [19]; and
(iii) the effectiveness of the antivirus protection, which lies between 20.2% and 79.6% [2].

First, we check whether the model output depends on the number of simulated smartphone users. If it does, we have to simulate the behavior of all 1 176 954 users. Otherwise, we can reduce the number of users in order to run the simulations much faster.

Secondly, we perform a large number of runs in order to estimate the number of monthly infections by malware Apps.

4.1. Model Evolution Depending on the Number of Users

In this first experiment, we fix the values of the unknown model parameters and run simulations for 1000, 5000, 7000, 10000, 15000, 20000, 30000, 40000, 50000, 65000, 80000, 100000, 120000, and 150000 users over a fixed number of months. Table 9 compares, for a fixed month, the percentages of cumulative (aggregated) and residual (new) infected users. Few differences can be noted. Therefore, we do not need to simulate the 1 176 954 Android smartphones in the community of Valencia to obtain reliable and accurate results. After some tests, we decided to consider 50000 users.

4.2. Estimations

In order to compute reliable estimations based on 95% confidence intervals (CI 95%), we use the technique called Latin hypercube sampling (LHS) [21] to select the sets of parameters to be substituted into the model. Latin hypercube sampling (a type of stratified Monte Carlo sampling) is an efficient method for obtaining balanced samples of all input parameters simultaneously. Moreover, the random selection of the parameter sets by LHS allows us to study the model sensitivity through the CI 95%.
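To make the stratification idea concrete, the sketch below implements a basic Latin hypercube sampler with the Python standard library: each parameter range is split into as many equal strata as samples, one point is drawn in each stratum, and the strata are shuffled independently per parameter. The function name is hypothetical. The bounds 20.2% and 79.6% are the antivirus effectiveness limits quoted from [2]; the second pair of bounds is only a placeholder, since the interval for the share of protected smartphones is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple

def latin_hypercube(n_samples: int, bounds: Sequence[Tuple[float, float]],
                    rng: random.Random) -> List[Tuple[float, ...]]:
    """Return n_samples points with exactly one point per stratum and parameter."""
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n_samples equal-width strata.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)  # decouple the strata across parameters
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

rng = random.Random(42)
bounds = [
    (0.202, 0.796),  # antivirus effectiveness, limits from [2]
    (0.0, 1.0),      # placeholder interval for the share of protected devices
]
parameter_sets = latin_hypercube(5, bounds, rng)
```

Each tuple in `parameter_sets` would then be substituted into the model for one run.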

In our case, taking 50000 smartphone users, starting in July 2011 and finishing in December 2014, and following the evolution rules, LHS was used to generate 100000 different values of each input parameter, sampled as follows.
(1) Sample the App download percentages per popularity so that they satisfy (6), (7), and (8).
(2) Sample the percentage of smartphones with antivirus uniformly in its interval.
(3) Sample the effectiveness of the antivirus protection uniformly in its interval.

We used these samples to run 100000 evaluations of the model, obtaining 100000 model outputs (infected smartphones) for each month. Then, for each month, we take the 100000 model outputs and calculate the mean and the 95% confidence interval from the empirical 2.5% and 97.5% percentiles.
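The per-month summary described above can be sketched as follows. The function name is hypothetical and the data in the usage line are synthetic, chosen only so that the percentiles are easy to check by hand; they are not model outputs.

```python
import statistics
from typing import Sequence, Tuple

def mean_and_ci95(outputs: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and empirical 2.5% / 97.5% percentiles of the model outputs."""
    mean = statistics.fmean(outputs)
    # Cutting into 40 equal parts yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(outputs, n=40, method="inclusive")
    return mean, cuts[0], cuts[-1]

# Synthetic outputs 0, 1, ..., 1000 for one month (illustration only).
mean, lower, upper = mean_and_ci95(list(range(1001)))
# mean = 500.0, lower = 25.0, upper = 975.0
```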

Figure 3 shows the evolution of the cumulative infections from July 2011 to December 2014 with a 95% confidence interval. Table 10 gives the numerical values of the mean and CI 95% of the cumulative infections in the community of Valencia in July 2013, July 2014, and December 2014.

Figure 4 shows the evolution of the new (residual) infected smartphones every month, with a 95% confidence interval, from July 2011 to December 2014. It can be seen that the number of newly infected smartphones has roughly stabilized since October 2012. Table 11 gives the numerical values of the mean and CI 95% of the residual infections in the community of Valencia in July 2013, July 2014, and December 2014.

Finally, Figure 5 shows the mean and the 95% confidence interval of the cumulative number of smartphones infected by privilege escalation (PE) and financial charge (FC) malware. Comparing Figure 5 to Figure 3, we can see that financial charge malware accounts for about half of the infected smartphones, in accordance with [2, 22]. Table 12 gives the numerical values of the mean and CI 95% of the cumulative infections in the community of Valencia in July 2013, July 2014, and December 2014.

4.3. Model Validation

In [13], Patterson reports on a talk given by Google’s Android security chief, Adrian Ludwig, at the Virus Bulletin conference in Berlin. In this talk, Ludwig said that the problem Google wants to solve is that most independent security researchers do not have access to a platform such as Google to measure how many times a malware App has been installed. He also mentioned that security researchers are very good at finding and fixing malware but that, in the absence of reliable data on how frequently a malware App has been installed, the threat level can become exaggerated. Reports that reach publication are often extremely exaggerated. To emphasize this point, Ludwig revealed in his analysis that some of the most publicized recent malware discoveries are installed in fewer than one per million installations. Additionally, he reported that, based on data from tracking over one and a half billion App installs, Google obtained convincing evidence that the rate of potentially harmful Apps installed is stable at about 1 200 per million App installs, or about 0.12%.

Furthermore, official reports such as the F-Secure Report (mobile threat report, September 2013), the Trend Micro Report (Trend Labs Security report, 3Q 2013), or the Secure List Report (mobile malware evolution, February 2014) do not only show the number of devices affected by installed malware Apps but also show the number of Apps detected as malware.

As a consequence, comparing the figures given by the proposed model to the real ones is not an easy task because of the lack of real data. In fact, to our knowledge, the only available figure on potentially harmful Apps installed is the one mentioned above: stable and about 0.12%.

Taking into account that the conference was held on October 3rd, 2013 [13], we may compare this figure with the prediction of the model for new smartphone infections in September 2013: stable, with a mean of 0.33% and CI 95% [0.21%, 0.44%].

Hence, our model predicts a stable rate of harmful Apps installed, as Google reports, and a somewhat higher number of infected smartphones than Google does. This difference may be due to the improvement of malware detection techniques during the period covered by our simulation: antivirus software may have become more effective than assumed in the initial parameters of our simulation, which would reduce the number of malware installations in the Google analysis. With this in mind, we consider that our model provides valid results in terms of both the estimated number of infected smartphones and the stable evolution of the infections.

5. Conclusion

In this paper we present an agent-based model to quantify the evolution of Android malware infection. Some model outputs are compared to data given by Google; the results are fairly similar, both being stable, with the model predictions a little higher.

Considering the parameters of our model and our simulations, the results show that, given a specific population of devices with Android OS, one has the following.
(i) A mean of 0.3% of devices are infected every month by some kind of malware. This number is stable over time from October 2012 onwards, even considering the growing curves for the total number of Apps and of malware Apps.
(ii) Taking into account cumulative values from July 2011 to December 2014, we predict that the infections will reach a mean of around 13.83% of the total number of devices considered.
(iii) Of this 13.83%, around half (48%) will be infections by financial charge malware, and around a third (27%) will be infections by privilege escalation malware. The remainder (25%) will be infections by remote control and information collection malware.
(iv) The infections by financial charge, remote control, and information collection malware are attributable to the users, because they give indiscriminate permissions to the Apps and do not properly protect their mobiles with antivirus software. Therefore, two-thirds of the infections are caused by these two factors, which shows that the most critical part for malware infections in smartphones is the users' habits and the ineffectiveness of traditional antivirus software, not the OS vulnerabilities.
(v) Quantifying and monetizing the incidence of financial charge malware: we can consider that, of the 0.3% of devices newly infected during a month, half are infected by financial charge malware and that every infection causes a monthly overrun of 30 euros in every device (we have some examples of mobile bills whose owners suffered a financial charge malware infection, and the amounts of these bills are around 30 euros). Considering that the total population of Android devices in Spain is 10 853 813, the number of devices infected by financial charge malware during a month is 16 280 (i.e., 0.15%), and the financial charge caused by this kind of malware during a month is 488 400 euros.
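The arithmetic behind the monthly cost estimate can be reproduced with a few lines. The function and parameter names are hypothetical; the inputs are the figures quoted above (10 853 813 devices, 0.3% new infections per month, half of them financial charge, 30 euros per infection). Truncating the device count to an integer is an assumption that matches the reported 16 280.

```python
def monthly_financial_charge_cost(devices: int, monthly_infection_rate: float,
                                  financial_charge_share: float,
                                  cost_per_infection_eur: int):
    """Return (infected devices, total monthly cost in euros)."""
    # 0.3% x 50% = 0.15% of all devices; truncated to whole devices.
    infected = int(devices * monthly_infection_rate * financial_charge_share)
    return infected, infected * cost_per_infection_eur

infected, cost = monthly_financial_charge_cost(10_853_813, 0.003, 0.5, 30)
# infected = 16280, cost = 488400
```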

With our model, we provide realistic data that can be used to quantify the real threat for the users and the number of potentially infected smartphones. In view of these results, we consider that preventive strategies against mobile malware should focus mainly on new approaches that detect malware before it is downloaded by the users because, as we have shown, the users' decisions and the ineffectiveness of the traditional antivirus software approach are the critical part for the infections.

Moreover, the presented model shows that, despite the increasing number of Apps, the number of newly infected smartphones has reached stable figures, so no significant change in the current stable trend is expected.

One of the most interesting features of the presented model is that if some of the parameters vary because of changes in the behavior of the actors (markets, Apps, and clients) we only have to tune the corresponding model parameters and perform the simulations to predict the evolution of the infected smartphones for the new scenario.

Finally, we want to point out that this model and these simulations can be extrapolated to other regions where the number of Android smartphones is fairly stable over time.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgment

This work has been partially supported by the Ministerio de Economía y Competitividad Grant MTM2013-41765-P.