Abstract

As mobile devices such as smartphones become popular, malicious attackers are increasingly choosing them as targets. The risk of attack is steadily increasing because most people store personal information such as messages, contacts, and financial information on their smartphones. In particular, vulnerabilities in the installed operating systems (e.g., Android and iOS) trade at high prices on the black market. In addition, the development of Internet of Things (IoT) technology has created a hyperconnected society in which various devices are connected to one network. Smartphone security is therefore becoming an important factor, because smartphones are used to control these devices remotely. A typical attack that threatens smartphone security induces the user to install a malicious application. However, most studies focus on detecting malicious applications. This study proposes a method, combined with machine learning algorithms, for evaluating the threat posed by malicious applications installed in the Android OS environment. In addition, we present future directions from the perspectives of cyber threat intelligence and situational awareness, two issues of recent interest.

1. Introduction

Internet of Things (IoT) technology is being applied to various fields such as public safety, transportation, industry, and healthcare, and its expansion contributes to the quality of human life. A Gartner report predicted that the IoT market would grow at an average annual rate of 28.8 percent, from about $300 billion in 2015 to more than $1 trillion in 2020 [1]. A report published in 2019 further predicts that IoT technology will be combined with artificial intelligence technologies to develop into “Intelligent Things”. With the expansion and evolution of IoT technologies, many companies are developing application services that connect their IoT products to smartphone platforms [2]. For example, Apple developed the “W1” chip to build its IoT ecosystem and began embedding it in IoT products and the iPhone [3]. Samsung is working to install “Tizen”, its proprietary operating system, on smartphones, smart cars, and smart home appliances [4]. In addition, Hyundai plans to mass-produce a Near Field Communication (NFC) smart key application in 2019 that lets users control their vehicles through smartphones [5].

The combination of IoT and AI technologies is extending the influence of cyberspace into the real world. In the past, cyber-attacks occurred only in cyberspace, and damage rarely extended to environments not directly connected to it [6]. Today, however, the damage caused by cyber-attacks spreads beyond software and hardware, causing a wide range of secondary damage such as leakage of personal information, changes in the direction of national policy, and manipulation of public opinion. A typical threat is mobile malware such as spyware. Thirty-five percent of attackers use this method of attack, and the main targets include the personal and confidential information of high-ranking government officials. In addition, RSA's 2016 report, Current State of Cybercrime, stated that mobile fraud targeting financial information increased by about 173% from 2013 to 2015 [7].

Global security companies have recently stated that the likelihood of cyber-attacks on mobile devices is increasing. They argue that North Korea is concentrating its cyber-attacks on mobile environments with weak security. Palo Alto Networks reported that malicious applications disguised as legitimate apps on the Google Play Store targeted Korean-speaking users of Samsung smartphones. McAfee identified “Lazarus”, a hacker organization believed to be based in North Korea, as being behind attacks that used malicious Android backdoor applications disguised as Bible applications. In addition, South Korean security companies reported a spy app that, in 2016, seized information, text messages, and contact lists from about 10 high-ranking officials in the national defense, diplomacy, and security fields [8]. Attacks are increasing because personal and financial information is often stored on smartphones, and smartphones sit at the center of connecting and controlling IoT devices.

Changes in how personal information is stored, together with the growing use of smartphones to operate IoT equipment, are increasing the frequency of attacks on smartphones. These mobile threats can cause damage in both cyberspace and the real world. To provide situational awareness to decision-makers, the detection of malicious mobile applications must be extended to threat evaluation. This study proposes a threat assessment method that applies the Factor Analysis of Information Risk (FAIR) risk model to features extracted during Android malware detection with basic machine learning algorithms. The following sections describe the relationship between mobile threats and cyberspace, threat assessment, and the limitations of mobile malware detection. We then propose a threat assessment procedure from the perspective of situational awareness and present the results in the section on threat assessment for Android malware applications.

2. The Relationship between Mobile Threats and Cyberspace

With the development of IoT, the border between cyberspace and the real world is disappearing, and cyber-attacks increasingly have negative effects in real space. In general, cyberspace is defined as a virtual space that enables communication in the information environment, overcoming the temporal and spatial limits of reality through a virtual network of electronic devices and the electromagnetic spectrum. This environment is interdependent with the Internet, networks, embedded processors, people, society, and policy, so the damage from cyber-attacks is likely to expand in the future.

2.1. Mobile Threats to IoT Devices

Mobile devices such as smartphones and tablet PCs can serve as gateways to IoT equipment. In recent years, many companies have marketed IoT devices as smart devices that users control through applications installed on their smartphones. Consequently, if a smartphone's security is weak, data from IoT equipment, including private personal information, may be leaked. Mobile threats related to IoT devices can be classified into data leakage, unsecured Wi-Fi, network spoofing, phishing attacks, spyware, and so on.

Data Leakage. Mobile applications often cause unintended data leakage. Users typically grant applications broad privileges, yet the security of these applications is far from perfect [9, 10], and many of them are distributed free of charge through official app stores. Such information leakage may be prevented through text data. However, about 51.6% of Android applications carry a risk of information leakage, and an attacker can access the leaked information through access tokens [11]. Thus, when a user transmits data to an IoT device through a smartphone, private information may be exposed to an attacker.

Unsecured Wi-Fi. In most public places (e.g., airports and parks), users can connect to wireless hotspots and tend to use free Wi-Fi networks. However, free Wi-Fi generally provides weak security. N. Sombatruang et al. confirmed that personal photos, e-mail, documents, and login credentials are transmitted unencrypted in packets from mobile devices connected to unsecured Wi-Fi [12]. This suggests that if a smartphone stores data transmitted from IoT devices, that information could be leaked through fake or unsecured Wi-Fi.

Network Spoofing. In network spoofing, an attacker sets up a rogue access point in a public place such as a coffee shop, library, or airport. Attackers often give these access points common names such as “Free Airport Wi-Fi” or “Coffee House” to lure users. Many users create an “account” with a password to access the free service, and users tend to reuse the same e-mail and password combination across multiple services [13]. This could give an attacker access to the victim's e-mail, e-commerce, and IoT equipment accounts.

Phishing Attacks. Mobile devices are constantly exposed to phishing attacks because they are almost always powered on. Mobile users are also more vulnerable than desktop users: when they receive a phishing e-mail, they are less likely to receive warnings from security programs. Such an attack can provide a path for malware to enter the user's mobile device [14, 15], which may give the attacker control over the mobile device, the IoT devices connected to it, and the personal information stored on it.

Spyware. Mobile spyware is an application that monitors and records a user's private and personal information without the user's permission. Spyware is installed when a user installs another application or is redirected to a malicious website, or when someone physically unlocks the device. When a mobile device is infected with spyware, the spyware can eavesdrop on nearby conversations or access data stored on or transmitted to the device. For example, if a smartphone is placed near a keyboard, its accelerometer sensor can be used to infer what was typed on the computer [16]. In addition, an application with a backdoor may abuse push notification services to leak a user's personal information [17].

2.2. Extending the Mobile Threats to the Real World with IoT Devices

Mobile threats can affect not only mobile devices but also IoT equipment and the people who use IoT technology. In other words, the mobile device acts as a medium connecting cyberspace and real space. This concept can be explained through the cyberspace layers described in US Joint Publication (JP) 3-12(R), Cyberspace Operations. According to this document, cyberspace is composed of a physical layer, a logical layer, and a persona layer. The physical layer contains physical devices such as IoT devices, routers, switches, and mobile devices at their geographic and physical locations. The logical layer consists of the network and communication relationships among the devices in the physical layer. The persona layer represents human beings in virtual space through the electronic services provided at the physical and logical layers, and corresponds to identifiers such as an IP address, an e-mail address, or an ID [18].

Later models subdivide and extend the basic concepts of the physical, logical, and persona layers. David Clark divided cyberspace into a physical layer and a logical layer, an information layer that represents the flow of information in cyberspace, and a top layer, the people layer, through which human decisions are projected into cyberspace. In this model, the services of cyberspace depend on one another across layers, which provides a basis for threats in cyberspace to be reflected in real space [19]. The US Department of Homeland Security has defined a cyber ecosystem that extends the concept of the cyberspace layers. In a cyber ecosystem, human beings and social organizations such as private companies and governments construct a virtual ecosystem by interacting with electronic devices and services such as hardware and software [20]. This concept reflects the shrinking gap between cyberspace and real space due to the development of IoT and AI (Artificial Intelligence) technology. The cyber ecosystem consists of 15 layers, including the cyberspace layers, and this hierarchy suggests that impacts in cyberspace can affect social or policy systems (see Figure 1).

This means that cyberspace threats delivered through mobile devices can manifest through IoT equipment. The reason lies in the elements that make up IoT: identification, sensing, communication, computation, service, and semantics [21].

Identification. Identification explicitly identifies each object in the logical layer. It is provided by naming a device, for example with an electronic product code (EPC) or ubiquitous code, or by addressing a specific object; IPv6 is used to assign a unique address to each entity [22]. Mobile devices could be exposed to threats if attackers set up fake access points that use the names of such equipment.

Sensing. Sensing is the process of collecting information from objects through RFID tags, smart sensors, and similar devices. Since most IoT devices are connected to a mobile device, an attacker can gain access to private data if that mobile device is exposed to spyware, phishing attacks, and similar threats [23]. Likewise, when a user connects to unsecured Wi-Fi, the ID and password of an IoT device may be exposed, allowing the device to be illegally monitored or personal information to be leaked.

Communication. Connecting various IoT devices to one another is one of the main purposes of IoT technology. This element sends and receives messages, files, and other information stored on IoT devices using communication technologies such as Near Field Communication (NFC), Bluetooth, Wi-Fi, and Long-Term Evolution (LTE) [24]. However, if a mobile device participating in this communication is exposed to threats such as spyware, private information held by the IoT equipment connected to it may be exposed.

Computation. This element performs calculations on the information that sensors collect from objects and carries out processing for IoT applications. A typical example is the operating system; smartphone operating systems are mainly Android and iOS. OS vulnerabilities are exploited by spyware and other malware. In other words, when a smartphone with an IoT application installed is attacked, the connected IoT devices may also be exposed to threats [25].

Service. Services are provided by IoT applications and can be divided into four types. Identity-related services obtain the ID of the object that sent a request. Information aggregation services collect all information from an object. Collaboration services make decisions based on the gathered information and send appropriate responses to the devices. Finally, ubiquitous services respond to devices immediately, regardless of time and location [26]. However, malicious applications can abuse these services for data leakage and user surveillance.

Semantics. This element reduces the gap between cyberspace and real space, making it convenient for users to collect and use data from IoT equipment. It gathers information from IoT equipment and makes appropriate decisions to send responses to the mobile device [27]. Because decision-making authority rests with the human user, who controls the equipment remotely through the mobile device, an attacker who compromises the mobile device itself can exploit the semantics element.

3. Threat Assessment

To respond to attacks, managers perform risk assessments on the assets they hold. A risk assessment considers threats, vulnerabilities, assets, and missions (or countermeasures). When the assets to be protected are clearly defined, the administrator can focus on the threats associated with them. Recently, Cyber Threat Intelligence (CTI) has been used to counter APT attacks. However, because attackers choose their attack methods across many different cases, threats need to be organized into tactics, techniques, and procedures (TTP). This section describes how threat assessment relates to situational awareness (SA) for evaluating cyber threat information, and the Factor Analysis of Information Risk (FAIR) model applied in this study.

3.1. Threat Assessment in Situational Awareness

Situational awareness (SA) is the process of recognizing environmental elements in a threat situation, or in the time and space where a specific event occurs, and establishing countermeasures. SA procedures serve as a framework for recognizing, judging, and responding to threats in areas such as terrorism, security, and cyber security. In the military sector, SA is used in command and control systems that support decision-making.

The basic concept of SA derives from Endsley's model, which describes perceiving elements of the environment, comprehending changes in the current situation, and projecting their future consequences. Endsley's model has three phases: perception, comprehension, and projection. “Perception” recognizes the status and attributes of relevant elements in the current environment. “Comprehension” interprets the current situation from the elements collected during “Perception”. “Projection” assesses how the information analyzed in “Comprehension” will affect the future state of the operating environment (see Figure 2) [28].

In cyberspace, SA models are being developed from the basic concept of Endsley's model. J. Okolica et al. proposed a cyber situational awareness model (CSAM) with business continuity planning (BCM). This model updates cyberspace assets or systems in real time and predicts future threats through sense, evaluate, and assess steps [29]. G.P. Tadda and J.S. Salerno developed the Situational Awareness Reference Model (SARM) to improve the understanding of diverse data in cyberspace; SARM can actively respond to changing threats in real time [30]. N. Evancich et al. proposed Effective Cyber Situational Awareness (ECSA), which focuses on network security. ECSA is divided into three stages: “Network Awareness”, “Threat Awareness”, and “Operational Awareness”. In “Network Awareness”, a decision-maker recognizes the characteristics of the network's assets and security. “Threat Awareness” detects attack vectors that have entered the network. “Operational Awareness” measures the damage the threat causes to network operation capability. ECSA improves on CSAM in decision-making, collaboration, and resource management [31].

Recent SA models seek to recognize changes in the current situation by detecting threats. From the viewpoint of SA, threat assessment links policy and technology: beyond threat detection in the perception stage, it draws on techniques such as predicting future threats, predicting attack paths, and identifying countermeasures [32]. Threat assessment in cyberspace is an engineering methodology that detects, identifies, and prioritizes cyber threats in order to apply countermeasures that reduce vulnerability to cyber-attacks. Much research and development has addressed the evaluation of cyber threats. The CyberPrep Working Group has established cyber-aware enterprise transformation strategies that reflect an understanding of APT attacks, organizational responses, and cybersecurity investment strategies. MITRE's center for resiliency experimentation developed cyber resiliency engineering, a methodology for processes, personnel, and individual systems that supports resilience strategies and techniques for mission functions against cyber threats. In addition, threat assessment uses System/Acquisition Mission Assurance Engineering (SAMAE), which applies the System Development Lifecycle (SDLC), to analyze knowledge of APT attacks [33]. These threat assessment approaches share the following elements:
(i) Identify and prioritize high-risk tactics, techniques, and procedures (TTP) that can affect cyber assets.
(ii) Identify and prioritize effective countermeasures against the identified TTP.
(iii) Recommend countermeasures that reduce the likelihood of attacks on cyber assets.

Risk assessment methods that measure risk by first fixing the scope of cyber assets are limited in settings such as the IoT environment, where the boundary between cyberspace and real space is unclear. Because the IoT environment offers diverse attack paths, threat assessment can help improve the risk management approach to assets [34].

3.2. FAIR Model for Threat Assessment

The Factor Analysis of Information Risk (FAIR) model is a risk management model developed by J.A. Jones. It focuses on the probability of threat events and measures risk in terms of frequency and magnitude: FAIR combines Loss Event Frequency (LEF) and Loss Magnitude (LM) to measure risk (see Figure 3). The model carefully distinguishes possibility from probability. Possibility is binary: a threat condition is either possible (100%) or not (0%), whereas probability reflects the continuum between absolute certainty and impossibility. This probabilistic approach makes it possible to weigh the probability of gain against the probability of loss, and it has the advantage that the range of risk acceptable to the decision-maker (or manager) can be quantified.
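To make the combination of LEF and LM concrete, the following is a minimal Monte Carlo sketch, not a reference FAIR implementation. It assumes that risk is computed as the product LEF × LM, and that each factor is given as a (minimum, most likely, maximum) estimate sampled from a triangular distribution; the function name `simulate_risk` and all numbers are hypothetical.

```python
import random
import statistics


def simulate_risk(lef_range, lm_range, trials=10_000, seed=0):
    """Monte Carlo estimate of risk as LEF x LM (an illustrative assumption).

    lef_range: (min, most_likely, max) loss events per year.
    lm_range:  (min, most_likely, max) loss magnitude per event.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    samples = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode)
        lef = rng.triangular(lef_range[0], lef_range[2], lef_range[1])
        lm = rng.triangular(lm_range[0], lm_range[2], lm_range[1])
        samples.append(lef * lm)
    samples.sort()
    return {
        "mean": statistics.fmean(samples),
        "p10": samples[int(0.10 * trials)],  # optimistic end of the range
        "p90": samples[int(0.90 * trials)],  # pessimistic end of the range
    }


if __name__ == "__main__":
    # Hypothetical estimates: 0.5-6 loss events per year, $1k-$20k per event.
    print(simulate_risk((0.5, 2, 6), (1_000, 5_000, 20_000)))
```

Reporting percentiles rather than a single number reflects the probabilistic view described above: a decision-maker can compare, for example, the 90th percentile against the level of risk they consider acceptable.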

In this model, the part that can serve as a framework for threat assessment is LEF. LEF is the frequency with which an asset is likely to be harmed by a threat agent within a given period or situation, and it consists of Threat Event Frequency (TEF) and Vulnerability (VUL) [36].

Threat Event Frequency (TEF). TEF is the frequency with which a threat agent is likely to act on an asset within a specified period, regardless of whether the action succeeds. A typical example is a hacker who attacks a web server without success: the attack is not a loss event, but it is a threat event. TEF consists of Contact and Action, because an attacking action can occur only after the threat agent makes contact. Contact is the frequency with which a threat agent may come into contact with an asset within a given period, and contact is classified as “Random”, “Regular”, or “Intentional”. Action is the probability that a threat agent will actually attack an asset once contact has occurred. Action presupposes an intelligent threat, such as a “thinking” threat agent or a malicious program.
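The Contact/Action decomposition can be expressed as expected threat events = contact frequency × P(action | contact). The sketch below assumes that reading, and also assumes that the different contact types contribute independently so their TEF values can be summed; the class `ContactProfile`, the function `total_tef`, and the example numbers are hypothetical.

```python
from dataclasses import dataclass

CONTACT_TYPES = ("Random", "Regular", "Intentional")


@dataclass(frozen=True)
class ContactProfile:
    """One way a threat agent comes into contact with an asset (hypothetical)."""
    kind: str                 # one of CONTACT_TYPES
    contacts_per_year: float  # Contact: expected contacts within the period
    p_action: float           # Action: P(attack | contact)

    def __post_init__(self):
        if self.kind not in CONTACT_TYPES:
            raise ValueError(f"unknown contact type: {self.kind}")
        if self.contacts_per_year < 0:
            raise ValueError("contact frequency must be non-negative")
        if not 0.0 <= self.p_action <= 1.0:
            raise ValueError("p_action must be a probability")

    def tef(self) -> float:
        # Expected threat events = contacts x probability that contact becomes action
        return self.contacts_per_year * self.p_action


def total_tef(profiles):
    """TEF summed over contact types, assuming they act independently."""
    return sum(p.tef() for p in profiles)


if __name__ == "__main__":
    profiles = [
        ContactProfile("Random", 12, 0.05),
        ContactProfile("Intentional", 4, 0.5),
    ]
    print(total_tef(profiles))  # roughly 2.6 threat events per year
```

Note that none of these events is a loss yet; whether a threat event becomes a loss event depends on the vulnerability factor.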

Vulnerability (VUL). VUL is the probability that an asset is unable to resist the action of a threat agent. Vulnerability arises from the difference between the force a threat agent can apply and the asset's ability to resist it, so vulnerability is always relative to the type of threat. VUL is measured as a combination of Threat Capability (TCap) and Control Strength (CS). TCap is the expected level of force a threat agent can apply that could negatively affect the asset. Because threat agents are not all created equal, agents within the same threat community do not have the same capability. Moreover, a threat agent's TCap may be high against its intended target yet low against other targets. CS is the strength with which an asset resists threats, measured against the threat agent or threat community. If CS is defined over a small number of controls, the probability value of each control can be calculated independently.
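One way to combine TCap and CS into VUL is to treat VUL as the probability that a sampled threat capability exceeds a sampled control strength, and then take LEF = TEF × VUL. Both choices, and the 0 to 100 percentile scale used for TCap and CS, are assumptions of this sketch rather than statements of the paper's method; `vulnerability` and `loss_event_frequency` are hypothetical names.

```python
import random


def vulnerability(tcap, cs, trials=20_000, seed=1):
    """Estimate VUL = P(TCap > CS) by Monte Carlo sampling.

    tcap, cs: (min, most_likely, max) on a 0-100 percentile scale
    (an assumed convention for this sketch).
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        t = rng.triangular(tcap[0], tcap[2], tcap[1])  # sampled threat capability
        c = rng.triangular(cs[0], cs[2], cs[1])        # sampled control strength
        if t > c:  # the asset fails to resist this threat action
            failures += 1
    return failures / trials


def loss_event_frequency(tef, vul):
    """LEF as TEF x VUL: only threat events that overcome the controls become losses."""
    return tef * vul


if __name__ == "__main__":
    vul = vulnerability(tcap=(40, 60, 90), cs=(30, 50, 70))
    print(round(vul, 2), round(loss_event_frequency(2.6, vul), 2))
```

Because VUL is estimated relative to a given TCap range, the same controls yield different VUL values for different threat communities, which matches the point above that vulnerability is relative to the type of threat.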

The elements that measure these threats can be used to build a profiling list of the assets that may ultimately be attacked and the threat agents that may attack them. Groups of attackers are called threat communities (TCom) and, following “NIST SP 800-30: Risk Management Guide for Information Technology Systems”, are divided into Nation-States, Groups, and Individuals. Nation-States are threat agents that conduct cyber-attacks on behalf of, or with the support of, a government; typical examples are APT37 (North Korea), APT32 (Vietnam), and APT33 (Iran). Groups are threat agents that conduct cyber-attacks for political ideals or for legitimate or illegal gain, such as Anonymous, APT18 (Wekby), APT19 (Codoso), and APT28 (Tsar). These organizations receive informal support; if analysis of their cyber threat TTP later shows that the sponsor is a state or government, the threat community may be changed to Nation-States. Individuals include outsiders and insiders who perform cyber-attacks based on personal beliefs or for retaliation; a typical example is Edward Snowden. Based on the TCom classification, threat profiling stores and manages the attack patterns and characteristics of malicious attackers: motive, primary intent, sponsorship, preferred general target characteristics, preferred targets, capability, personal risk tolerance, and concern for collateral damage.
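A threat profile of this kind maps naturally onto one record per threat agent. The sketch below is hypothetical: the class and field names are ours, the fields simply mirror the eight profiling attributes listed above, and `reclassify` encodes the rule that a Group whose sponsor turns out to be a state is moved to Nation-States.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class TCom(Enum):
    """Threat communities as classified in the text."""
    NATION_STATES = "Nation-States"
    GROUPS = "Groups"
    INDIVIDUALS = "Individuals"


@dataclass
class ThreatProfile:
    """Profiling attributes listed in the text; values are free text here."""
    name: str
    tcom: TCom
    motive: str
    primary_intent: str
    sponsorship: str
    preferred_general_target_characteristics: str
    preferred_targets: Tuple[str, ...]
    capability: str
    personal_risk_tolerance: str
    concern_for_collateral_damage: str

    def reclassify(self, sponsor_is_state: bool) -> None:
        """Move a Group to Nation-States once its sponsor is found to be a state."""
        if self.tcom is TCom.GROUPS and sponsor_is_state:
            self.tcom = TCom.NATION_STATES


if __name__ == "__main__":
    # Entirely hypothetical profile, for illustration only.
    profile = ThreatProfile(
        name="Example Group",
        tcom=TCom.GROUPS,
        motive="financial gain",
        primary_intent="data theft",
        sponsorship="informal",
        preferred_general_target_characteristics="mobile users",
        preferred_targets=("smartphones", "IoT hubs"),
        capability="moderate",
        personal_risk_tolerance="medium",
        concern_for_collateral_damage="low",
    )
    profile.reclassify(sponsor_is_state=True)
    print(profile.tcom.value)  # Nation-States
```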

4. Limitation of Mobile Malware Detection

Detecting attackers' malicious behavior is one of the important technologies in the field of security. As mobile devices become more useful in IoT environments, threats now damage not only individual devices but also the wider social environment. Against this background, most studies on malicious behavior detection focus on automation, such as training-set construction and detection algorithms based on machine learning [37].

Z. Aung and W. Zaw proposed a semi-supervised algorithm to detect Android malware. They extracted permission features from Android APK files, clustered files suspected of malicious activity, and then classified the malware clusters using three methods: a decision tree algorithm, random forests (RF), and classification and regression trees (CART). On 500 sample Android applications, the RF algorithm achieved a high accuracy of 91.8% [38]. D. J. Wu et al. proposed a static analysis-based mechanism to detect Android malware. Their model uses the permissions and intents declared in the manifest file as features, clusters them, and finally applies a classification algorithm to detect Android malware. They experimented with Expectation Maximization (EM), K-means clustering, k-nearest neighbor (k-NN), and Naïve Bayes classification, and found that combining K-means and k-NN achieved an accuracy of 97.87% [39]. N. Khan et al. proposed an efficient method for detecting malicious JavaScript in web applications. They used the wrapper method to select a feature subset that reduces dimensionality, and supervised learning algorithms to achieve high accuracy. They applied support vector machine (SVM), Naïve Bayes, decision tree, k-NN, and RF algorithms and found that k-NN achieved about 98.3% accuracy [40].
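Several of the studies above classify applications by the permissions they request. As an illustration only, the sketch below applies k-NN to permission sets, treating each application as a binary permission vector so that Hamming distance equals the size of the symmetric difference of the two sets. The training data is a hypothetical toy set (the permission names are real Android permission constants), and this is not the feature pipeline of any cited study.

```python
from collections import Counter


def hamming(a: frozenset, b: frozenset) -> int:
    """Hamming distance between binary permission vectors, computed on sets."""
    return len(a ^ b)  # permissions requested by exactly one of the two apps


def knn_predict(train, query, k=3):
    """Majority vote among the k training apps with the closest permission sets."""
    neighbors = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (requested permissions, label).
TRAIN = [
    (frozenset({"INTERNET"}), "benign"),
    (frozenset({"INTERNET", "ACCESS_NETWORK_STATE"}), "benign"),
    (frozenset({"CAMERA", "INTERNET"}), "benign"),
    (frozenset({"READ_SMS", "SEND_SMS", "INTERNET"}), "malicious"),
    (frozenset({"READ_SMS", "READ_CONTACTS", "INTERNET"}), "malicious"),
    (frozenset({"SEND_SMS", "READ_CONTACTS", "RECEIVE_BOOT_COMPLETED"}), "malicious"),
]

if __name__ == "__main__":
    query = frozenset({"READ_SMS", "SEND_SMS", "READ_CONTACTS", "INTERNET"})
    print(knn_predict(TRAIN, query))  # malicious
```

An odd `k` avoids ties in this two-class example; real detectors would use far larger permission vocabularies and training sets.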

H. S. Ham et al. studied Android malware detection from the viewpoint of secondary impacts, such as privacy infringement and information leakage, arising from the combination of IoT and smartphones. They used a linear SVM algorithm to detect Android malware and showed that it yielded 95.7%, higher than the other machine learning classification algorithms (average precision = 68.7%) [41]. J. Sahs and L. Khan detected Android malware using a one-class SVM algorithm. Using application permissions as features, they obtained a true negative rate of about 50% and a true positive rate of about 90%, with 91 malicious applications out of 2081 Android applications [42]. M. G. Schultz et al. studied the detection of malicious executables using features such as the DLLs and DLL function calls a program uses, the strings it contains, and the raw hexadecimal representation of its binary. This study compared the accuracy of a signature-based method, RIPPER, Naïve Bayes, and Multi-Naïve Bayes, and found that Multi-Naïve Bayes achieved the highest accuracy, 97.76% [43].

K. Riad and L. Ke detected malicious applications in the Google Play Store. They proposed RoughDroid, a floppy analysis technique that can discover Android malware applications directly on smartphones. Using features extracted from the XML manifest file and Dex file of each Android application, they obtained an accuracy of 95.6% [44]. I. Martin et al. analyzed indirect functions and metadata to identify patterns in Android malware applications. Their research drew on information published on the Android Market, such as application developers and certificate issuers, to detect malware. They used logistic regression (LR), SVM, and RF to build a small, efficient classifier that could detect malware applications early in the sandbox [45]. W. Niu et al. proposed a method that detects the command and control (C&C) domains of advanced persistent threat (APT) malware with high accuracy by analyzing mobile DNS logs. The study scored domains using Alexa rankings and VirusTotal and identified malware C&C domains using the Global Abnormal Forest (GAF) algorithm. They confirmed that the proposed method achieves more than 99% accuracy, outperforming the local outlier factor (LOF), k-NN, and iForest algorithms [46].

Most studies focus on improving the effectiveness or accuracy of malware detection. In recent years, however, CTI-based security has focused on preventing incidents. Detecting an attacker's malicious behavior remains an important technology, but an important open challenge is whether detection results can provide criteria for responding to threats. MITRE, in presenting its Threat Assessment and Remediation Analysis (TARA) methodology, emphasized that recognizing threats is a key task for the security field today [33]. In addition, FireEye, a global security firm, publishes quarterly intelligence reports and argues that measuring and evaluating cyber threats is essential [47].

5. Threat Assessment Procedure from the Perspective of SA

In terms of the situational awareness model, threat measurement tends to depend on human cognitive judgment. This study measures the LEF factors of the FAIR model as probability distributions, to minimize qualitative judgment and provide objective indicators to decision-makers. These probability distributions are derived from detection results on actual malware data, which makes the threat assessment more objective. The proposed SA model consists of three levels. The first level, Malware Awareness, detects the attacker's malicious behavior using supervised learning. The second level, Threat Awareness, maps the features of the detected malicious activity to the LEF factors of the FAIR model and then evaluates the threat; this level also clusters the threats into the number of classes the decision-maker requires. The final level, Decision-Making Awareness, provides decision-makers with optimized threat levels (see Figure 4) [48].

5.1. Malware Awareness

As the expansion of the IoT environment makes mobile devices more useful, cyber-attacks on smartphones are increasing. In particular, attacks on the mobile environment are mostly carried out through malicious applications, and most of these attacks are previously unknown. Existing rule-based algorithms, however, are limited in their ability to detect new threats (e.g., unknown malicious applications). The Malware Awareness level therefore aims to detect threats using machine learning. The features that yield high detection performance when classifying threats (e.g., in terms of ROC, true positive rate, or accuracy) are used as the probability distributions of the FAIR model's LEF factors for threat measurement.
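Selecting features by detection performance requires computing the metrics named above. The sketch below derives the true positive rate, true negative rate, precision, and accuracy from binary predictions, and the ROC area under the curve from classifier scores using its pairwise-ranking definition (the probability that a random malicious sample scores higher than a random benign one, with ties counted as one half). The labels and scores in the example are hypothetical.

```python
def _ratio(num, den):
    return num / den if den else 0.0


def detection_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = malicious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "tpr": _ratio(tp, tp + fn),        # true positive rate (recall)
        "tnr": _ratio(tn, tn + fp),        # true negative rate (specificity)
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def roc_auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.7, 0.3, 0.95]
    print(detection_metrics(y_true, y_pred))  # every metric is 0.75 here
    print(roc_auc(y_true, scores))            # 0.9375
```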

5.2. Threat Awareness

The probability distributions of the features that produced high detection results at the Malware Awareness level become the input values of the LEF factors. This study matches the detection results to TEF and VUL (TCap, CS). However, threat classes are difficult to distinguish clearly from an LEF measured as the joint probability of TEF and VUL. To solve this problem, this research uses the K-means clustering algorithm. Based on the matched results, the decision-maker clusters the features into $K$ threat classes. For example, with five ratings from “Very High” to “Very Low”, $K$ is 5. Let $\mathbf{x}_1, \dots, \mathbf{x}_N$ be the probability values of the features matched to TEF and VUL, and let $\boldsymbol{\mu}_k$ be the center of the $k$th threat cluster. The indicator $r_{nk} = 1$ if the $n$th feature belongs to the $k$th threat cluster, and $r_{nk} = 0$ otherwise. The distortion measure for clustering the threat classes, TC, can then be expressed as [49]

$$TC = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2. \qquad (1)$$

To minimize TC, we use an iterative procedure in the style of the EM (Expectation Maximization) algorithm, which alternately fixes one of $r_{nk}$ and $\boldsymbol{\mu}_k$ and optimizes the other until TC converges. With $\boldsymbol{\mu}_k$ fixed, each feature is assigned to its nearest center ($r_{nk} = 1$ for $k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2$). With $r_{nk}$ fixed, TC is a quadratic function of $\boldsymbol{\mu}_k$, so setting its derivative with respect to $\boldsymbol{\mu}_k$ to zero gives the cluster center used to form each threat class [50]:

$$\boldsymbol{\mu}_k = \frac{\sum_{n} r_{nk} \, \mathbf{x}_n}{\sum_{n} r_{nk}}. \qquad (2)$$
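The K-means step for threat classes can be sketched as Lloyd's algorithm, which alternates the two updates described in the text: assign each feature vector to its nearest center (the indicators `r_nk`), then move each center `mu_k` to the mean of its members, and finally report the distortion TC. This is a minimal sketch under our own assumptions: centers are initialized by sampling `k` input points with a fixed seed, an empty cluster keeps its previous center, and the (TEF, VUL) probability pairs in the test are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means; returns (labels, centers, TC) with TC the total distortion."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # initial mu_k
    labels = None
    for _ in range(iters):
        # Assignment step: r_nk = 1 for the nearest center, with mu_k held fixed.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so TC can no longer decrease
        labels = new_labels
        # Update step: each mu_k becomes the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    tc = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, tc


if __name__ == "__main__":
    # Hypothetical (TEF probability, VUL probability) pairs for six features.
    pts = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.13), (0.90, 0.88), (0.92, 0.91), (0.89, 0.93)]
    labels, centers, tc = kmeans(pts, k=2)
    print(labels, [tuple(round(c, 3) for c in ctr) for ctr in centers], round(tc, 4))
```

With five ratings from “Very High” to “Very Low”, the same function would be called with `k=5`. Lloyd's algorithm only finds a local minimum of TC, so practical use would typically compare several seeds.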

5.3. Decision-Making Awareness

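Even if the threat classes are clustered with the K-means algorithm at the Threat Awareness Level, the result may still be of limited use for semantic decision making. To overcome this limitation, this level optimizes the threat classes by clustering them with a Gaussian Mixture Model (GMM). In this paper, we assume that the distribution of the threat classes is a mixture of $K$ Gaussian distributions, whose characteristics the GMM can infer statistically. The value of $K$ is the number of threat classes set at the Threat Awareness Level. The GMM performance measure is the log likelihood function; its parameters are estimated by maximizing it over the threat data obtained at the Threat Awareness Level. The parameters of the log likelihood function, which optimize the threat at the Decision-Making Awareness Level, are the means $\mu_k$ and covariances $\Sigma_k$ of the Gaussian distributions and the mixing probabilities $\pi_k$:

$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$$

Because these parameters are also difficult to estimate by updating them jointly, this level uses the EM algorithm, which updates them alternately. The Decision-Making Awareness Level maximizes the log likelihood function to provide an optimized threat class for decision makers. The algorithm either fixes $\mu_k$, $\Sigma_k$, and $\pi_k$ and calculates $\gamma(z_{nk})$, or fixes $\gamma(z_{nk})$ and computes $\mu_k$, $\Sigma_k$, and $\pi_k$. Here $\gamma(z_{nk})$ is the responsibility of the $k$th component for $x_n$, a quantity defined through latent variables $z_{nk}$ that makes the optimization easier to compute [51, 52]:

$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$

The equations that optimize the threat class are (5), (6), and (7):

$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n \tag{5}$$

$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{\top} \tag{6}$$

$$\pi_k = \frac{N_k}{N} \tag{7}$$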
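For scalar threat values the covariance $\Sigma_k$ reduces to a variance $\sigma_k^2$, and the alternating E and M steps can be sketched in plain Python as follows. Initializing from K-means centers, the fixed iteration count, the variance floor, and the helper names `normal_pdf` and `gmm_em` are assumptions for this illustration only.

```python
import math


def normal_pdf(x, mean, var):
    # Univariate Gaussian density N(x | mean, var)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em(xs, mu, var, pi, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (mu, var, pi, gamma, loglik)."""
    n, k = len(xs), len(mu)
    mu, var, pi = list(mu), list(var), list(pi)
    gamma = []
    for _ in range(n_iter):
        # E-step: with mu_k, sigma_k^2, pi_k fixed, compute gamma(z_nk)
        gamma = []
        for x in xs:
            w = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(w)
            gamma.append([wj / total for wj in w])
        # M-step: with gamma fixed, re-estimate mu_k, sigma_k^2, pi_k
        for j in range(k):
            n_k = sum(row[j] for row in gamma)
            mu[j] = sum(row[j] * x for row, x in zip(gamma, xs)) / n_k
            var[j] = sum(row[j] * (x - mu[j]) ** 2 for row, x in zip(gamma, xs)) / n_k
            var[j] = max(var[j], var_floor)  # assumption: guard against collapse
            pi[j] = n_k / n
    # Log likelihood of the final parameters, the quantity EM increases
    loglik = sum(
        math.log(sum(pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)))
        for x in xs
    )
    return mu, var, pi, gamma, loglik


# Hypothetical data, initialized from hypothetical K-means centers
xs = [0.05, 0.10, 0.12, 0.50, 0.52, 0.90, 0.95]
mu, var, pi, gamma, loglik = gmm_em(xs, [0.09, 0.51, 0.925], [0.01] * 3, [1 / 3] * 3)
```

Unlike the hard assignments $r_{nk}$ of K-means, each row of `gamma` gives a soft membership of $x_n$ in every threat class.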

6. Result of Threat Assessment for Android Malware Application

To maximize the availability of IoT devices, companies developing them encourage users to install companion applications on their smartphones to connect the devices. However, when a user's smartphone is exposed to a malicious application, the personal information stored on the smartphone and the private information collected through the IoT devices may be leaked to the attacker. Therefore, when malicious behavior is detected on a smartphone at the center of such a multi-link system, assessing the threat with CTI is as important as producing the CTI. This section measures the threat of malicious applications in the Android environment by applying the threat assessment procedure from the SA model perspective proposed above.

6.1. Detection of Android Malicious Application

The data of this study are the malicious application data obtained through static analysis by J. Jang et al. The data consist of APIs related to the malicious behavior of Android applications and a list of system commands. The dataset contains 2000 applications (1500 normal and 500 malicious) [53]. On these data, this study applied classification algorithms that achieved high accuracy in related studies (e.g., k-NN, SVM, and logistic regression). The results were evaluated with a confusion matrix.

k-Nearest Neighbor (k-NN). k-NN is a nonparametric method used for classification or regression. Its input consists of the k closest training examples in the feature space, and an object is classified by a majority vote of its neighbors, i.e., it is assigned to the class most common among its k nearest neighbors. This study used several k-NN variants. The basic k-NN algorithm uses the Euclidean distance. Cosine k-NN measures the similarity between vectors by the cosine of the angle between them in an inner product space and classifies by this cosine similarity. Cubic k-NN uses the Minkowski distance with exponent 3, which this study regards as effective for high-dimensional feature spaces. Weighted k-NN weights each neighbor by the reciprocal of its squared distance, so that training data close to a new input have more influence on the decision than neighbors far away [54].
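The distance measures and voting rules of the four k-NN variants can be compared in a minimal pure-Python sketch. The function names and the toy binary permission vectors are hypothetical; the variants follow the description above (Euclidean, cosine, Minkowski with exponent 3, and inverse-squared-distance weighting), and this is not the implementation used in the study.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


def cubic(a, b):
    # Minkowski distance with exponent p = 3
    return sum(abs(x - y) ** 3 for x, y in zip(a, b)) ** (1 / 3)


def knn_predict(train_x, train_y, query, k=3, dist=euclidean, weighted=False):
    # Keep the k training samples closest to the query (stable on ties)
    scored = sorted(((dist(x, query), y) for x, y in zip(train_x, train_y)),
                    key=lambda p: p[0])[:k]
    votes = defaultdict(float)
    for d, y in scored:
        if not weighted:
            votes[y] += 1.0          # plain majority vote
        elif d == 0:
            return y                 # exact match: inverse distance is unbounded
        else:
            votes[y] += 1 / d ** 2   # weighted k-NN: weight = 1 / distance^2
    return max(votes, key=votes.get)


# Hypothetical binary permission vectors (1 = permission requested)
X = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
y = ["malicious", "malicious", "normal", "normal", "normal"]
for name, d in [("basic", euclidean), ("cosine", cosine_distance), ("cubic", cubic)]:
    print(name, knn_predict(X, y, [1, 1, 1, 0], dist=d))
print("weighted", knn_predict(X, y, [1, 1, 1, 0], weighted=True))
```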

Support Vector Machine (SVM). SVM classifies data through a set of hyperplanes. It chooses the hyperplane that keeps the classification error small; that is, it computes the hyperplane with the largest distance (margin) to the nearest training data points of either class. To compute this, the algorithm defines a kernel function $k(x_i, x)$. Each term of the sum $\sum_i \alpha_i y_i k(x_i, x)$ measures the degree of closeness of a data point $x$ to the corresponding training point $x_i$, so the sum measures the relative proximity of $x$ to the points of each class. Linear SVM uses a first-order (linear) kernel function to find the hyperplane that maximizes the margin while classifying the data well. However, since data often cannot be separated completely by a straight line, other kernel functions are needed for accurate classification. In this study, polynomial kernel functions of first to third order (linear, quadratic, and cubic SVM) and a Gaussian kernel function were used [55].
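The sketch below shows the polynomial and Gaussian kernels named above and the kernel-sum decision function $f(x) = \sum_i \alpha_i y_i k(x_i, x) + b$. The coefficients $\alpha_i$ and $b$ are hand-picked for illustration rather than learned by SVM training, and the XOR-like points are hypothetical; the point is that the linear kernel cannot separate them while the quadratic kernel can.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(a, b, degree=1, c=0.0):
    # degree 1 with c = 0 is the linear kernel; 2 and 3 give quadratic and cubic
    return (dot(a, b) + c) ** degree


def gaussian_kernel(a, b, sigma=1.0):
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))


def decision(x, support, labels, alphas, b, kernel):
    # f(x) = sum_i alpha_i y_i k(x_i, x) + b; its sign gives the predicted class
    return sum(a * y * kernel(s, x) for s, y, a in zip(support, labels, alphas)) + b


# Hypothetical XOR-like points that no straight line separates
pts = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
ys = [1, 1, -1, -1]
alphas = [1.0] * 4  # hand-picked, not produced by SVM training
linear = lambda a, b: poly_kernel(a, b, degree=1)
quadratic = lambda a, b: poly_kernel(a, b, degree=2)
print([decision(p, pts, ys, alphas, 0.0, linear) for p in pts])     # [0.0, 0.0, 0.0, 0.0]
print([decision(p, pts, ys, alphas, 0.0, quadratic) for p in pts])  # [8.0, 8.0, -8.0, -8.0]
```

With the linear kernel every point scores 0, so no class can be assigned; the quadratic kernel gives the correct sign for all four points.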

Logistic Regression. Logistic regression models the relationship between a dependent variable and independent variables as an explicit function that can be used for prediction. It is a classification technique in which the dependent variable is categorical, so that, given input data, the result is assigned to a specific class. This study treated the malicious application features as categorical data [56].
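A minimal logistic regression trained by batch gradient descent on binary permission features illustrates how the model assigns a class from categorical inputs. The learning rate, epoch count, label encoding (1 = malicious), toy data, and function names are assumptions for illustration only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    # 1 = malicious, 0 = normal (hypothetical label encoding)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Hypothetical binary permission vectors (1 = permission requested)
X = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 0, 0, 0]
```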

The Malware Awareness Level of this study derives its results from the confusion matrix, which evaluates the performance of machine learning algorithms. The confusion matrix consists of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), with normal behavior treated as the positive class. TP counts normal behavior classified as normal, FN counts normal behavior classified as malicious, FP counts malicious behavior classified as normal, and TN counts malicious behavior classified as malicious. From these indicators, the true positive rate (TPR), positive predictive value (PPV), false negative rate (FNR), accuracy (ACC), and F1-score can be measured (see (8) to (12)) [57].

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{8}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{9}$$

$$\mathrm{FNR} = \frac{FN}{FN + TP} \tag{10}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + FN + FP + TN} \tag{11}$$

$$F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}} \tag{12}$$
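The five indicators can be computed directly from the four confusion-matrix counts. The sketch below follows the text's convention that normal behavior is the positive class; the function name is hypothetical and the counts in the example are invented, not the results reported in Table 1.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Indicators from confusion-matrix counts (positive class = normal app)."""
    tpr = tp / (tp + fn)                    # true positive rate
    ppv = tp / (tp + fp)                    # positive predictive value
    fnr = fn / (tp + fn)                    # false negative rate, equals 1 - TPR
    acc = (tp + tn) / (tp + fn + fp + tn)   # accuracy
    f1 = 2 * ppv * tpr / (ppv + tpr)        # harmonic mean of PPV and TPR
    return {"TPR": tpr, "PPV": ppv, "FNR": fnr, "ACC": acc, "F1": f1}


# Hypothetical counts, not results from Table 1
print(confusion_metrics(tp=90, fn=10, fp=5, tn=45))
```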

The Android applications are represented by 136 features corresponding to permissions and intents, on which the classifiers achieved high accuracy (see Table 1; Figure 5). In particular, the algorithms with the best results were the SVM variants (linear, quadratic, and cubic SVM) and logistic regression. The features that yield high detection results (“permission” and “intent”) are used in Threat Awareness as threat factors of the FAIR model.

6.2. Threat Assessment of Android Malicious Application

Malicious Android applications can be detected accurately with previously studied machine learning algorithms. However, from the situational awareness (decision-maker) perspective, detecting a threat alone is of limited help in choosing a countermeasure. In other words, decision-makers must weigh the cost of ignoring a threat to their assets against the cost of responding to it. To make the most effective use of machine learning, the threat assessment method proposed in this study uses the LEF factors of the FAIR model.

Threat Event Frequency (TEF) of Malicious Application. In the FAIR model, TEF consists of two factors, Contact and Action. Contact indicates the possibility of an attack and is usually measured as a frequency. Action indicates whether the attack is actually carried out in the real environment and is mainly expressed as a probability. The combination of these two factors thus expresses the frequency of threats from malicious applications in the Android environment. However, this frequency does not indicate whether an attack succeeds. The features used for detection at the Malware Awareness Level were identified as “permission” and “intent”. “Permission” features grant access to most of the data on an Android device, such as the camera, location, and call history; typical examples are android.permission.GET_ACCOUNTS, android.permission.BLUETOOTH_ADMIN, and android.permission.READ_SMS. Most “intent” features (e.g., android.intent.action.MAIN, android.intent.action.BOOT_COMPLETED, and android.intent.action.DATA_SMS_RECEIVED) are less relevant to a malicious application's access to data and were therefore excluded from the threat assessment. As mentioned in Limitation of Mobile Malware Detection, previous research has shown that “permission” features characterize malicious applications. Based on the detection results, TEF is set to the probability distribution of the “permission” features and reflected in the threat assessment.
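As a hedged sketch of the two quantities discussed above, the code estimates how often each “permission” feature occurs among malicious samples and combines a contact frequency with a probability of action, a common way of forming TEF in FAIR (TEF = contact frequency × probability of action). The app records, the 12 contacts per year, and the function names are hypothetical, and the text does not specify exactly how the feature probabilities are mapped to TEF.

```python
def permission_probabilities(apps, permissions):
    """Empirical fraction of malicious apps that request each permission."""
    n = len(apps)
    return {p: sum(p in app for app in apps) / n for p in permissions}


def threat_event_frequency(contact_per_year, p_action):
    # FAIR-style combination: contact frequency x probability of action
    return contact_per_year * p_action


# Hypothetical malicious samples, each a set of requested permissions
apps = [
    {"android.permission.READ_SMS", "android.permission.GET_ACCOUNTS"},
    {"android.permission.READ_SMS"},
    {"android.permission.BLUETOOTH_ADMIN", "android.permission.READ_SMS"},
    {"android.permission.GET_ACCOUNTS"},
]
perms = ["android.permission.READ_SMS", "android.permission.GET_ACCOUNTS",
         "android.permission.BLUETOOTH_ADMIN"]
probs = permission_probabilities(apps, perms)
print(probs)  # READ_SMS 0.75, GET_ACCOUNTS 0.5, BLUETOOTH_ADMIN 0.25
# Hypothetical: 12 contacts per year, action probability from READ_SMS
print(threat_event_frequency(12, probs["android.permission.READ_SMS"]))  # 9.0
```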

Vulnerability (VUL) of Malicious Application. VUL arises from the difference between the capability of the threat (TCap, Threat Capability) and the defensive ability (CS, Control Strength) of the target asset. The threat in this process is whether an application installed in the Android environment has access to the device's data. Therefore, TCap can be defined by the probability of the “permission” features included in Android malicious applications. In other words, the more access privileges a malicious application has, the more capable the threat. This capability lies in the malicious “permissions” through which an attacker could illegally acquire various pieces of information. CS is the defensive ability of the asset. It corresponds to how difficult it is to find vulnerabilities in the Android environment and exploit them, for example with an exploit kit. CS can therefore be set based on the prices at which Android exploits are traded. “Zerodium” is a representative company that acquires mobile exploits, and this study sets CS for the Android OS from “Zerodium Payouts for Mobiles” (see Table 2) [58].
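In FAIR, vulnerability is commonly interpreted as the probability that threat capability exceeds control strength, $\mathrm{VUL} = P(\mathrm{TCap} > \mathrm{CS})$, which can be estimated by Monte Carlo sampling. The function name and the Beta and triangular distributions below are placeholders on an assumed 0 to 1 scale, not the paper's TCap values or the Zerodium-based CS of Table 2.

```python
import random


def estimate_vulnerability(sample_tcap, sample_cs, n=100_000, seed=1):
    """Monte Carlo estimate of VUL = P(TCap > CS)."""
    rng = random.Random(seed)
    exceed = sum(sample_tcap(rng) > sample_cs(rng) for _ in range(n))
    return exceed / n


# Placeholder distributions (hypothetical): TCap from permission-based
# capability, CS from exploit prices rescaled to the same 0-1 range
vul = estimate_vulnerability(
    lambda r: r.betavariate(6, 4),          # threat capability, mean 0.6
    lambda r: r.triangular(0.3, 0.9, 0.5),  # control strength, mode 0.5
)
print(round(vul, 3))
```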

For the threat assessment, this study assumed that the probability distributions of TEF and VUL for Android malicious applications are independent. Under this assumption, not only TEF and VUL but also the threat class (LEF) can be measured. However, it is difficult for a decision maker to judge the current situation intuitively from such a combination. For this reason, the method described above adds an optimization step through GMM. That is, at the Threat Awareness Level, the decision maker can classify threats into three (Low, Moderate, and High), five (Very Low, Low, Moderate, High, and Very High), or seven classes by changing the value of $K$, and the result can be visualized after the GMM optimization. The advantage of subdividing the threat classes is that it helps determine the time and cost to invest in countermeasures against the threat (see Figure 6).
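Because TEF and VUL are assumed independent, LEF samples can be generated by drawing each factor separately and multiplying them, following the FAIR relation LEF = TEF × VUL. The function name and distributions are hypothetical; the resulting samples are the kind of values that the K-means and GMM steps of Sections 5.2 and 5.3 would then group into threat classes.

```python
import random
import statistics


def simulate_lef(sample_tef, sample_vul, n=50_000, seed=2):
    """Draw TEF and VUL independently and return LEF = TEF * VUL samples."""
    rng = random.Random(seed)
    return [sample_tef(rng) * sample_vul(rng) for _ in range(n)]


# Hypothetical inputs: TEF between 4 and 12 events/year, VUL ~ Beta(3, 5)
lef = simulate_lef(lambda r: r.uniform(4, 12), lambda r: r.betavariate(3, 5))
deciles = statistics.quantiles(lef, n=10)  # nine cut points of the LEF distribution
print(round(statistics.mean(lef), 2), [round(q, 2) for q in deciles])
```

Under independence the mean LEF is simply the product of the two means (here about 8 × 0.375 = 3), but the full sampled distribution is what the clustering steps operate on.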

A representative example from the threat assessment perspective is TARA, whose basic strategy is to measure threats and identify sophisticated countermeasures. In particular, TARA assumes that countermeasures are independent of one another when selecting them, which ignores the possibility that a selected countermeasure reduces the effectiveness of the others. If, as in the approach presented in this study, the elements of the threat are identified and expressed probabilistically, countermeasures can be chosen conditionally on the threat. In addition, combining this approach with recent threat detection techniques based on machine learning, deep learning, and artificial intelligence can make decision-making more efficient by enabling real-time threat assessment.

7. Summary and Discussion

The expansion of the IoT environment is strengthening the connection between cyberspace and physical space. For efficient control of the IoT environment, many companies are also concentrating on developing mobile devices such as smartphones (hardware development). However, the security of applications installed on smartphones remains weak, which leads to more attacks by malicious actors. Malicious applications are being uploaded to markets (e.g., Google Play Store) without being verified by the administrator. Many studies have focused on detecting such malicious applications, and their accuracy and efficiency are approaching a commercial level. However, detection results for malicious applications are of limited use for decision-making from a manager's point of view. Moreover, existing risk assessment studies concentrate on owned assets and therefore tend to oversimplify the threat, and few studies evaluate threats in connection with machine-learning-based threat detection. This study proposes a method that extends machine-learning-based threat detection for applications installed on the Android OS into threat assessment. The proposed scheme consists of Malware Awareness (Level 1), which detects malicious behavior of Android applications; Threat Awareness (Level 2), which rates it; and Decision-Making Awareness (Level 3), which optimizes the threat class.

The reason for taking the SA perspective is also related to CTI, a recent issue. The availability of CTI is an essential element of threat assessment, and TARA, developed by MITRE, likewise emphasizes identifying countermeasures through threat assessment. In addition, cyber-SA framework research from the “Cybaware” project identified asset, configuration, impact, threat, and visualization as key research areas [59]. In particular, work in the threat area has identified and evaluated attacker types (TTPs: tactics, techniques, and procedures) and objectives and has developed countermeasures. Therefore, the proposed approach can contribute to threat detection and to the production, measurement, and evaluation of CTI in the security field.

Data Availability

The data used to support the findings of this study are available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by Defense Acquisition Program Administration and Agency for Defense Development under Contract UD160066BD.