Big Data Management and Analytics for Mobile Crowd SensingView this Special Issue
How Dangerous Are Your Smartphones? App Usage Recommendation with Privacy Preserving
With the rapid proliferation of mobile devices, explosive mobile applications (apps) are developed in the past few years. However, the functions of mobile apps are varied and the designs of them are not well understood by end users, especially the activities and functions related to user privacy. Therefore, understanding how much danger of mobile apps with respect to privacy violation to mobile users is becomes a critical issue when people use mobile devices. In this paper, we evaluate the mobile app privacy violation of mobile users by computing the danger coefficient. In order to help people reduce the privacy leakage, we combine both the user preference to mobile apps and the privacy risk of apps and propose a mobile app usage recommendation method named AppURank to recommend the secure apps with the same function as the “dangerous” one for people use. The evaluation results show that our recommendation can reduce the privacy leakage by 50%.
The rapid growth of mobile devices has been leading to the prosperity of mobile applications (apps). For example, as of the end of 2014, the number of apps on Play Store has been over 1.4 million and over 1.2 million on Apple Store. This number is still growing dramatically with the proliferation of mobile devices. Mobile users download these mobile apps and use them on their mobile devices to satisfy different purposes. However, the functions of mobile apps are varied and the designs of them are not well understood by end users, especially the activities and functions related to user privacy. Indeed, to improve user experience and the functionality of mobile apps, developers start to move their eyes on the personalized service that can be provided by apps. They develop new functional apps or enhance the capability of apps by digging into the personal information, such as location information, contacts, camera, messaging, and even calling service. However, when users launch apps on their mobile devices, they may fall into danger as some unknown activities or functions might cause privacy issue.
Although app stores (e.g., Google Play) may remove those apps with malfunctions or low quality periodically, many well developed apps with privacy violations are not perceived by the stores. In other words, app stores release the right to end users to let them decide whether to install the apps or not. In such case, most end users download and install those apps by neglecting the privacy warning. Even when some users do notice the privacy issue, as the functionality and preference of mobile apps, they still install them into the mobile device. Once the apps are installed, users’ privacy information will be leaked when they launch those apps.
As mobile apps serve for different functionalities, different types of privacy information may be leaked by users for launching different types of mobile apps. For instance, a location-based service (LBS) needs to collect user real time location information from users. It may refer to home and workplace where users may not expect to be exposed. A social-aware service needs to extract contacts from users, which violates many users’ personal life as well, not to mention the information collected for function-unrelated purpose. In fact, it is reported that users have growing concern about their privacy while using mobile apps. A recent survey from IDG news reveals that over 30% of mobile users prefer to uninstall those apps after learning the personal information they collected. Unfortunately, users do not know how much of the personal information has been collected and how much danger of mobile apps with respect to privacy violation to mobile users. Therefore, it is crucial to understand how dangerous are installed apps on mobile devices as well as how much privacy risk is taken by launching those apps. It will also be beneficial for mobile users to know how to reduce the privacy leakage by using those apps with less privacy concern and meanwhile maintain the quality of experience.
Thus far, majority of mobile app recommendation approaches have been developed based on the popularity of apps while neglecting the privacy issues existing in mobile apps [1–3]. Several privacy-concerned app detection and recommendation mechanisms are proposed to discover the malfunction of mobile apps. They either focus on the service provider side to let app stores recommend those apps with less privacy concern without considering the user personalization  or target the developer side to investigate the inside of apps, find malware code, and restrict the app access permission [5–9]. However, these actions need the cooperation with either service providers or apps developers, which makes them difficult to be implemented in practice.
In this paper, in order to measure how much privacy a mobile user attempts on the mobile phone and help to recommend apps with less privacy, we propose a privacy evaluation mechanism by analyzing the app usage data. The violation of privacy depends not only on the risks of apps, but also on the user usage pattern. Although some apps violate user privacy heavily, user information cannot leak if the app has not been used. To evaluate the privacy leakage, we define a danger coefficient to quantify the privacy and analyze the privacy violation distribution of mobile users. To reduce the privacy leakage from mobile devices, on one hand, we need to understand how much privacy a mobile app can expose. On the other hand, we need to investigate the apps usage of different users. Therefore, we combine both the user preference to mobile apps and the privacy risk of mobile apps for apps recommendation. To formulate the user preference to mobile apps, we apply the distribution of common preference probabilistic method, which can enrich the context of personalized preference. To understand the privacy violation of mobile apps, we measure the privacy access permissions of mobile apps. Finally, we seek a balance between the user preference to apps and the privacy violation of apps to propose an app usage recommendation method named AppURank. According to the functionality of mobile apps, we classify them into groups by the topic model. The proposed approach recommends the same functional apps with lighter privacy concern and high user preference. We evaluate our mobile app recommendation approach with extensive experiments. The results show that our proposed recommendation method can halve the danger of mobile devices and meanwhile maintain the same level of user preference.
The contributions of the paper are summarized as follows:(i)We carry out a mechanism to evaluate the extent of privacy leakage when people use their mobile devices. We define a danger coefficient to measure the privacy violation of apps from usage perspective and analyze the privacy violation distribution of mobile users.(ii)We propose an app usage recommendation approach for end users, named AppURank, by combining user preference, privacy risks, and functionality of apps. The proposed method is to recommend people with preferred apps but with less privacy violation.(iii)We evaluate the app recommendation method on our collected data. It shows that the method can reduce the danger of mobile devices to half and meanwhile maintain the same level of usage preference.
The rest of the paper is structured as follows: Section 2 reviews the related literature. Section 3 describes the problem, introduces the definition of danger coefficient of mobile users, and provides a recommendation on the app usage that can minimize the risk of privacy leakage. Section 4 shows the experimental results. Conclusions are finally given in Section 5.
2. Related Literature
In this section, we review the state of the art for the privacy leakage evaluation of mobile apps and mobile app recommendation approaches.
The privacy issue of mobile apps has been studied for many years. One group of previous studies regarding privacy issue of mobile apps concerns the risk analysis of mobile apps. For instance, Au et al.  surveyed the permission systems of smartphone operating systems from the amount of controls, the information released to users, and the levels of interactivity from users. Felt et al.  focused on the permission request of various mobile apps to determine whether Android developers follow least privilege with their permission requests. They further built Stowaway to detect overprivilege in Android apps. Enck et al.  proposed TaintDroid, which provided real time analysis of mobile apps on the monitoring of their data access by leveraging Android’s virtualized execution environment. In contrast, majority of mobile app privacy studies are proposed for privacy violation or mobile app malfunction detection. To deal with the information stealing of mobile apps, Zhou et al.  carried out the TISSA system, which can empower users to flexibly control the accessibility of mobile apps to personal information. Enck et al.  exploited a rule-based certification model named Kirin to perform lightweight certification of mobile apps at installing time to reduce the privacy issue caused by mobile apps.
However, these mechanisms need investigation on the code installed in mobile apps in order to identify the privacy violation, which is difficult for all apps installed on mobile devices. Meanwhile, people do not like apps scanning their mobile devices all the time. In this paper, we propose the privacy preserving method by recommendation approach, which can avoid the installation of risky apps.
The majority of mobile app recommendation methods consider the popularity or user preference as factors for the decision making. For instance, AppJoy  made personalized mobile app recommendation by analyzing how users actually use their installed apps. They applied collaborative filtering algorithm for individual recommendation. Yu et al.  and Zhu et al.  considered the user context for mobile app recommendation and used Latent Dirichlet Allocation (LDA) topic model to describe the problem of mobile apps recommendation. Few papers focus on the mobile apps based on privacy issue. Peng et al.  proposed a risk ranking method of Android apps using probabilistic generative model to tell users the privacy risk of mobile apps before installation. The other one by Zhu et al.  used the modern portfolio theory to recommend mobile apps from the perspective of app store by considering the awareness of security and privacy. Our paper tries to recommend mobile apps from personalized user-app usage perspective to avoid privacy violation. There are some literatures focusing on permission settings recommendation to preserve user privacy. Lin et al.  proposed to provide reasonable default settings to help users configure their privacy settings by identifying distinct privacy profiles. Liu et al.  proposed and implemented PriWe, which leveraged the crowd sourced permission settings to understand users’ privacy expectation and provides app specific recommendations to mitigate information leakage.
We use the similar methods as  to obtain the privacy risk of mobile apps. However, different from the state of the art [4, 10], we not only consider the privacy risk of mobile apps in general, but also combine the function relativity of mobile apps to the permissions and people usage patterns of mobile apps to evaluate the privacy danger of mobile apps to users. We then propose a mobile app recommendation method by considering both user preferences and mobile app privacy.
3. Problem Formation
When an app is installed or launched on mobile device, it always asks the permission to access certain information. The permission means the capability users grant to mobile apps so that mobile apps can access certain part of mobile users’ information. These permissions are associated with mobile apps to either help the mobile apps to achieve some functions (e.g., localization) or fulfill the mobile apps to collect user data. The permissions requested by a mobile app are independent of each other. When mobile app is installed or launched on the smartphone, the users have the right to make decision for the app to access the permission of information. In fact, the information those apps intend to access may refer to the sensitive personal private data, such as location information and the control of hardware (e.g., camera). For instance, Table 1 illustrates the access permission list requested by a version of Google search app in an app store. It can be seen that some information requested by the app permissions, such as network connection and storage, is necessary for the function of the app (i.e., information searching), referred here to as “function-related” for the app. At the same time some other information requested by the app permissions, such as phone state, personal data, payment service, and system tools, is not highly correlated with the function of the app, referred here to as “function-unrelated” for the app. As a matter of fact, for different types of permissions, the degree to which they violate the privacy is different. For instance, as shown in Table 1, the payment service permission that allows running the operation for payment is more severe than the permission of network connection. There are two reasons for such a judgement. First, the payment service permission may cause the economic loss, which is more vital than the network connection. Second, the payment service is function-unrelated while network connection is function-related for searching function. Therefore, all the information involved in an app can be categorized into several tiers according to its degree of privacy violation and its relativity to app function: (a) “normal permission,” which does not involve sensitive information of mobile users, such as network connection and storage; (b) “severe permission,” meaning the information is severely related to user privacy, such as personal information, location, and payment service; and (c) “system permissions,” related to the control of hardware and system, such as the access of hardware and setup of system level configuration. Therefore, combining the function and the extent of privacy violation, six different types of information permission are considered: function-related with normal permission, function-related with severe permission, function-related with system permission, function-unrelated with normal permission, function-unrelated with severe permission, and function-unrelated with system permission.
Although apps violate user privacy by permissions of accessing user information, mobile users still install and launch different types of mobile apps on their smartphones according to their preferences. In order to recommend apps by considering both user privacy preserving and user preference, in this paper, we discuss three issues: how to measure the privacy risk of an app and the privacy violation to mobile users by launching mobile apps, how to determine user preference to mobile apps, and how to balance the privacy violation to mobile users and the user-app preference to meet the requirement of users.
In the following, we will define the danger coefficient to quantify the privacy violations of permissions to mobile users and also address the above-mentioned three issues to recommend mobile apps from usage perspective.
3.2. Danger Coefficient
Generally speaking, the privacy information is normally leaked when people launch an app with privacy permission. We introduce here a new parameter, called danger coefficient (DC), capable of expressing the leakage of privacy when users run apps on their mobile device.
To evaluate the DC of each user, two factors need to be determined. The first one is the privacy risk of each app, which is reflected by the permissions that the app asks from users. We measure the privacy risk of permissions requested by the app and consider it as one factor for evaluating the app’s DC. The second one is the app usage pattern by the user. In what pattern the mobile app is used indicates the probability that the privacy information disclosed by the app will be leaked, which is considered as the other factor for evaluating app’s DC.
We now address the problem by characterizing and quantify the privacy risk of permissions and then the privacy risk of an app. Permissions can be classified into three classes, which are normal permissions, critical permissions, and system permissions as we discussed in the previous section. Requesting a more critical permission increases risk more than requesting a less critical one. For the quantification, we leverage the probabilistic approach proposed in  to evaluate the risk of each of the categories of permissions.
For the generic app , the permissions that will be accessed are denoted by the set , where is the total number of permissions. The generic variable is binary and assumes value 0 or 1 if permission is not or is present in app . For mobile apps, different types of permissions may correlate with each other. For instance, permissions related to network (including Internet access, checking WiFi state, checking network status, changing WiFi status, and changing network connection) are mutually correlated. However, such dependence introduces sophisticated analysis to conduct privacy risk evaluation. In contrast, the study  discovers that assuming the independence of different permissions can still perform well for the overall privacy risk evaluation but it is more simplistic for the analysis compared with dependence situations. Furthermore, the assumption of independence of different permissions allows a monotonic model, which allows the consideration of each individual permission. Besides, it can also help to differentiate different classes of permissions.
Therefore, if we use to indicate the privacy risk factor for app , is generated by independent Bernoulli random variables and is given by where is the probability that permission is accessed by an app.
Following , is obtained using a function. That is,where is the total number of apps used for evaluation. In this paper, the value of is set to 900 as the dataset contains 900 mobile apps. To suggest the several privacy violation risks, we set , with less penalty effect for critical permissions. For normal permissions, we set , , which is normal distribution suggesting the less effect of the privacy risk, , with less penalty effect for critical permissions, for normal permissions the value , , which is normal distribution suggesting the less effect of the privacy risk. With this method, different types of permission privacy risks can be distinguished.
Furthermore, apps may request both function-related and function-unrelated permissions. For example, an app providing map service needs location information as its function-related permission, whereas if an app serves as chatting service, requesting calling permission will very likely be considered as function-unrelated permission. If an app requests a function-related permission, the privacy violation is considered much weaker than that of function-unrelated permission request. To show the difference, we assign different weights for the privacy risk and define the weight factor of permission for app . The weights of the function-related permissions should be less than that of the function-unrelated permissions, as the function-related permissions are about to enable functions and services, whereas the function-unrelated permissions intend to collect user privacy information. For instance, we take an empirical value that the weight for function-related permissions is 0.5 and for function-unrelated permissions is 1. Then the overall risk factor of app that takes into account the weight of the different risks is given by
Considering the app privacy risk is monotonically decreasing with respect to the probability of using granted permissions, which means removing a permission always reduces the risk value of an app, for privacy risk calculation of an app, we use the following function :
As far as the app usage pattern is concerned, a user with longer time duration of using an app will have more chance to access privacy information. Although there are some mobile apps that are launched in the background and steal the user privacy information without being invoked by users, they are not measured and counted in our analysis. In this paper, we assume that the probability that a mobile app accesses privacy information is proportional to the time duration in which the app is used. Therefore, we measure the usage pattern by expressing the fraction of time user in which uses app ; that is, , where is the total time usage of app by user and is the total observation interval of the system.
The state of the art suggests that interevent times of human behaviors follow Poisson distribution . In this paper, we assume the interusage time follows a Poisson distribution with the parameter equal to for user running app . Then .
The DC measures the danger degree of a user using a mobile device. For a user , the danger coefficient can be derived by combining the app privacy risk and the app usage pattern and is expressed as where is the number of apps launched in the device. The larger indicates the more chance for privacy violation when user launches apps in the mobile device. For a user with multiple devices, each device can be evaluated by the above procedure for the danger coefficient estimation. We use DC as a metric to evaluate the danger degree of the recommendation algorithms in Section 4.
3.3. User-App Preference
The user-app preference reflects the preference relationship between individuals and apps. As each individual user may not display enough information to fully discover his or her individual preference, we employ an individual user-app preference based on the distribution of common preference, as presented in . Specifically, we firstly investigate the common preference of many users and then represent each user’s preference by a distribution of common preference. If the common preferences are presented by , the conditional probability that a user prefers the category a given the set of all apps can be represented as For given apps and user , and are constant. Therefore, we havewhere the preference of user to the category of apps is determined by the common preferences of many users (i.e., ) and also the user’s personal preference conveyed by a distribution of common preferences (i.e., ). The calculation of common preferences of other users and the distribution of common preferences of user can be presented by the normalized number of apps launched by user and other users.
The above user-app preference suggests the preference of a user to a category of mobile apps. To identify categories of mobile apps, we first find the categories of each mobile app with its assigned categories. However, such classification is not fine-grained enough for app usage preference recommendation. For example, many mobile apps are associated with offline services, such as online banking apps or social media apps. A Facebook social app cannot be replaced by a Twitter social app. Therefore, for each category of mobile apps, we use a topic model to categorize apps in different coarse grained preference categories into different groups according to their functions and usage context. For this purpose, we employ Latent Dirichlet Allocation (LDA) model , in which an app is associated with a word in a document, and each category is a topic. With such topic model, apps are put into different fine-grained preference groups. Specifically, for each user, we extract the context components (i.e., time stamp, and location) from the usage record and consider the set of context components of a user as this user’s bag of context components. For the fine-grained preference of each user, the procedure is conducted as follows. It begins with a random assignment of fine-grained preference to each context component. It then iteratively estimates the conditional probability of assigning of the preference to context component and updates the preference of each context component according to the latest calculation. The assignment will converge finally, which means each context component is assigned with a fine-grained preference. Then the user fine-grained preference will be determined by her context component bag. But LDA has some drawbacks to identify the sequence of words and also it is in the topics composition in which the same words appear in the multiple topics. In our case, these drawbacks do not affect the performance of mobile app group identification. For the fine-grained preference classification, only the special words associated with the fine-grained group are used. Moreover, the classification does not involve the sequence of words. Therefore, LDA model is capable of fine-grained classification for mobile apps. If a person launches an app on the mobile device, it is indicated that the person is interested in the functionality provided by the app. If the app is considered with high risk, then another app in the same category should be recommended.
3.4. App Usage Recommendation
As the high risky app may cause more privacy leakage, recommending the user preferred apps with less privacy risk is more desirable. In this section, we will propose a method for app usage recommendation combining required function, user preference, and privacy, named AppURank.
Regarding required functions, we recommend apps which are in the same category as the launched apps to guarantee the function similarity. Then we recommend apps according to preference and privacy. The objective of the app usage recommendation is finding a group of apps consisting of a collection of apps with the corresponding weight and which display minimal privacy risk and meanwhile satisfy the individual preference and function requirement. The general formulation of the objective can be presented aswhere suggests is one mobile app in category and the value of depends on the privacy and individual preference requirement of users. For example, if a user considers privacy more important, then the value of will be larger, while if a user takes user preference more into account, then the value of will be smaller. Given the vector , the corresponding preference vector and risk vector are represented by and . Equation (8) can be rewritten as
In this case, and are independent, and the selection of is to rank mobile apps according to user preference and privacy risk according to the calculation of and and then combine the two aspects, as specified by (9). The app with the highest combination value is considered as the most recommended app. Apps with top values are the top apps on the recommendation list.
4. Performance Results
In this section, we describe the experimental data and evaluate the privacy risk of mobile apps, danger coefficient of mobile users, and AppURank recommendation method.
4.1. Experimental Data
In this paper, we need two data traces to evaluate the danger coefficient and the proposed recommendation approach.
One data trace contains apps with their requested permissions, and the permissions are marked with function-related and function-unrelated. There are 900 apps and their permissions in the data trace. For each app, we collect the permission information from the privacy description in the store and use a matrix to store them. The rows are apps, and columns are permissions. If an app requests a permission, the element in the matrix will be 1; otherwise, it will be 0. Then we take the function relativity into consideration. We investigate the function relativity in two stages. In the first stage, we crawl the coarse grained categories of mobile apps and identify the permission by their main functions. For example, a location-based social service should need the location information, which means location information is function-related for the location-based social service. In the second stage, we manually identify the other permissions accessed by each of the mobile apps. For instance, an alarm app could require access to microphone to let the user record some memo/voice to be played during timeout. If the permission is requested by malicious functions of an app, it is considered as function-unrelated. For a function-related permission, the value of element is changed to . For function-unrelated permissions, the value of element is still . Then the privacy risk of each app in this data trace can be calculated using the expression of in (4). Figure 1 shows the distribution of apps in terms of the number of requested permissions. It shows that about 20% of apps request less than 10 permissions. The percentage of apps requesting 10 to 20 permissions reaches over 75%, and about 5% of apps request over 20 permissions. On average, each app requests 12 permissions. Figure 2 shows the distribution of apps in terms of different types of permissions. More than 70% of apps request Internet access, phone state, and network state. There are several apps requests for external device format, which highly violate the user privacy and even refer to the security of mobile devices. Regarding function relativity of apps, we draw Figure 3 to show the average number of function-related and function-unrelated permissions of different categories of apps. It shows that function-unrelated permissions are more than function-related permissions in most cases. In the apps of security, car/cab, social, voice, reader, helper, and tools, the function-unrelated permissions are requested twice as much as function-related permissions.
The other data trace is the mobile users and their mobile app usage pattern. In order to obtain such data, we design a mobile app named AntTest (http://www.wandoujia.com/apps/edu.bupt.anttest) on Android platform. Indeed, the AntTest application is developed to measure network speed of mobile device. For such purpose, the data such as app usage pattern of each user is recorded every 5 seconds, which can be exactly applied in this study. We put the AntTest in the app store to provide a way for users to download. So far, the app has been available for more than days since March 2014. The total number of records is over 103 million produced by 13,969 users. The detailed data description is presented in Table 2. The format of one record is presented as , where RecordID is the ID of the record, IMEI indicates the user equipment ID, AppID is identified by the name and package of the app, and Time indicates the time of the record. We anonymize the dataset to conduct the experiment for the privacy concern.
4.2. Evaluation of App Usage
In order to evaluate the privacy risk when people launch mobile apps, we measure the app privacy risk, app usage pattern, and DC of users when they launch apps from their mobile devices.
For the given mobile apps in the permission dataset, we measure the privacy risk of each app, which is one factor of DC, combining the requested permissions and function relativity. The average privacy risks (by (4)) of different mobile app categories are presented in Figure 4. It shows that all the categories are with privacy risk more than 15. According to the statistics shown in Figure 1, one mobile app can access as many as 25 permissions in the dataset, indicating the upper bound of the privacy risk factor is about 60. According to our observation, apps are risky if their privacy risks are over 20 (1/3 of the upper bound). From the figure, the privacy risk expectation of apps for video, audio, web browser, shopping, voice, tools, and security is high. In contrast, apps for news, live streaming, photography, reading, and radio are relatively secure.
To show the impact of function relativity, we conduct the evaluation with varied , which is weight for function-related and function-unrelated permissions. We fix to be equal to 1 for function-unrelated permissions and vary for function-related permissions with 0.2, 0.5, and 0.8 (as the legend) in terms of five-category apps (WiFi, streaming, social, video, and mail). The result as shown in Figure 5 presents the average privacy risk of the five categories of mobile apps. It shows that the trend of privacy risk does not change with different values. Moreover, the higher the value of chosen for function-related permissions selected, the higher the privacy risk the mobile apps own. Among all the rest of experimental results, we use for function-related permissions and use for the function-unrelated permissions to reflect the different privacy violation concern with respect to function relativity of mobile apps.
The app usage pattern, which is the other factor of DC, measures the time usage of different apps. Figure 6 shows the usage time distribution of all apps with respect to mobile apps usage (Figure 6(a)) and users usage time (Figure 6(b)). They record 900 mobile apps and around 2,000 users who run these 900 apps in the datasets. Both of them follow the heavy tail distribution (see the straight line). Majority of apps have short usage time, while few apps have a long usage time. Similarly, most users use their mobile devices for short time, while few users have long time duration for mobile device usage. To show the usage pattern of specific mobile apps, we select several typical apps (i.e., WeChat, QQ, and Tencent video) and show their usage patterns in Figure 7. The plot shows that over 90% of users run WeChat for less than s, while this number of users decreases to 75% for Tencent video. This is due to the different attributes of the apps functionalities, where WeChat is used for messaging, whereas Tencent video is used for video playing. The usage time duration for video playing apps (i.e., Tencent video) should be longer than that for messaging apps (i.e., WeChat and QQ).
(a) App usage time
(b) User usage time
We measure the DC of mobile devices according to user-app usage pattern and the risk of mobile apps. Specifically, we evaluate the DC of 50 sampled users given the collected information of mobile apps and draw the DC value of each person. In our experiment, the DC value is bounded by . For the convenience of illustration, we ordered the DC values as shown by the line with forward triangles in Figure 8. It shows that all users are with DC value larger than 20, and the highest one reaches almost 30. From our observation, if DC value is over 20, it indicates that the mobile phone usage becomes risky. Our selected 50 users are all in the risky status.
4.3. Evaluation of App Usage Recommendation
To evaluate the performance of the proposed recommendation approach, we calculate DC values of people under different conditions to quantify how much danger of mobile apps with respect to privacy violation to mobile users. Specifically, we vary the parameter from 0 to 1 and finally to 100, to see the DC turbulence for different users. We compare the proposed approach with the PNB method , which only considers the app risk for recommendation. Indeed, PNB provides the baseline for the evaluation.
The result of the evaluation is shown in Figure 8. Besides the ground truth observed in the real world, it presents the DC value of the proposed recommendation approach in case of , , and and PNB method. When , the recommendation is led by the personalized user-app preference, as illustrated by the line with plus marks. It shows that the danger coefficient is much higher than ground truth if the privacy risk issue is neglected. Meanwhile, it also shows that some users’ preference can reach a lower danger coefficient. This is due to the intrinsic low risk of the mobile apps.
Furthermore, we consider the situation in which both user-app preference and privacy have the same importance, which is considered setting , and the line with backward triangles is obtained. The DC value is much lower than the ground truth. Only two users are still with DC value higher than 20. We further increase the value of to 100 to show the DC values when the risk takes the dominant role. The result is shown as the line with solid circles. It reaches almost the baseline obtained from PNB marked by the line with circles. Without loss of generality, we consider that the majority of people would like to consider preference and privacy equally important. In such case (), the DC values reduce to about 50% on average compared with ground truth.
We further investigate the recommendation results for news app categories. We first show several common permissions of the news apps in Table 3. In all 7 listed permissions, only network connection is the function-related permission to news reading. The remaining 6 permissions such as location, SMS, contacts, recording, camera, and system settings are function-unrelated to these apps. For each app, we mark a cross if it asks for a certain permission in the corresponding place. It shows that the app online retail requests only one function-related permission. CCTV and Sina request one function-unrelated permission system setting besides network connection. Baidu, Sohu, NetEase, Phoenix, and the paper request even more function-unrelated permissions. The most severe one is Tencent, which requests nearly all the listed permissions. We show the recommendation list of news apps as a function of different parameters shown in Table 4. When , Baidu News is the first app recommended, indicating it fits user preference for news reading. The paper news is the last one as it is not as popular as the others. When , Baidu News is still in the first place. If people consider privacy and preference equally important, Baidu News is the best choice. However, Tencent News goes to the last position. This is mainly due to its violation of user privacy. If the privacy is considered as the most important factor (when and PNB), the online retail news app becomes the first recommended app, due to its lowest privacy violation to users, as we have shown in Table 3. In contrast, Baidu News falls into the fourth position. Tencent News still holds the last position, given its high violation to user privacy.
In this paper, we proposed a method to evaluate the privacy risk from mobile apps when people use a mobile device. We evaluated the mobile app privacy risk and defined a danger coefficient for each user by combining the mobile apps risk and user preference. According to the requirement of privacy and satisfaction of app usage preference, we proposed a mobile app recommendation method named AppURank. The evaluation results showed that the privacy risks of apps are different, and the DC of mobile users is very high for many of them. The proposed recommendation method can help to reduce the danger coefficient by 50% on average and meanwhile maintains personalized user preference. For the future work, we will build the mobile app recommendation system on the mobile device and evaluate the performance of the recommendation algorithm by implementation and deployment.
The authors declare that they have no competing interests.
This work has been partially sponsored by National Science Foundation of China (no. 61502045), EU FP7 IRSES Mobile Cloud Project (Grant no. 612212), the 111 Project (no. B08004), the Fundamental Research Funds for the Central Universities, and the Beijing Higher Education Young Elite Teacher Project.
K. Yu, B. Zhang, H. Zhu, H. Cao, and J. Tian, “Towards personalized context-aware recommendation by mining context logs through topic models,” in Advances in Knowledge Discovery and Data Mining: 16th Pacific-Asia Conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29-June 1, 2012, Proceedings, Part I, vol. 7301 of Lecture Notes in Computer Science, pp. 431–443, Springer, Berlin, Germany, 2012.View at: Publisher Site | Google Scholar
H. Zhu, H. Xiong, Y. Ge, and E. Chen, “Mobile app recommendations with security and privacy awareness,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), pp. 951–960, ACM, New York, NY, USA, August 2014.View at: Publisher Site | Google Scholar
W. Enck, P. Gilbert, B.-G. Chun et al., “An information-flow tracking system for realtime privacy monitoring on smartphones,” in Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI '10), pp. 1–6, USENIX Association, Berkeley, Calif, USA, 2010.View at: Google Scholar
Y. Zhou, X. Zhang, X. Jiang, and V. W. Freeh, “Taming information-stealing smartphone applications (on android),” in Proceedings of the 4th International Conference on Trust and Trustworthy Computing (TRUST '11), pp. 93–107, Springer, Pittsburgh, Pa, USA, June 2011.View at: Google Scholar
J. Lin, B. Liu, N. Sadeh, and J. I. Hong, “Modeling users' mobile app privacy preferences: restoring usability in a sea of permission settings,” in Proceedings of the Symposium on Usable Privacy and Security (SOUPS '14), pp. 199–212, USENIX Association, Menlo Park, Calif, USA, July 2014.View at: Google Scholar
R. Liu, J. Cao, L. Yang, and K. Zhang, “PriWe: recommendation for privacy settings of mobile apps based on crowdsourced users' expectations,” in Proceedings of the IEEE International Conference on Mobile Services (MS '15), pp. 150–157, New York City, NY, USA, June 2015.View at: Publisher Site | Google Scholar
D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.View at: Google Scholar