Abstract

Smartphones offer many types of connectivity, such as WiFi, infrared, Bluetooth, GPRS, GPS, and GSM. Their ubiquitous computing features make them a vital part of our lives. The boom in smartphone technology has unfortunately attracted hackers and crackers as well. Smartphones have become an ideal target for malware, gray ware, and spyware writers, who exploit smartphone vulnerabilities and insecure communication channels. For every security service introduced, a counterattack to breach it appears, and vice versa. Because smartphone technologies are so diverse, no single security mechanism can remedy phishing attacks in all circumstances. Therefore, a novel antiphishing architecture is needed that can provide web page protection and authentication against falsified web pages on smartphones. In this paper, we develop a cluster-based antiphishing (CAP) model, a lightweight scheme designed specifically for smartphones to save energy in portable devices. The model identifies, clusters, and prevents phishing attacks on smartphone platforms. Our CAP model detects and prevents illegal access to smartphones by clustering data into legitimate/normal and illegitimate/abnormal. First, we evaluated our scheme with mathematical and algorithmic methods. Next, we conducted a real test bed experiment to identify and counter phishing attacks on smartphones, which yielded 90% accuracy in the detection system as true positives and less than 9% of the results as false negatives.

1. Introduction

A phishing attack is used to obstruct and limit legitimate user access to the resources of service providers on global networks. A phish exploits the victim’s system resources to acquire confidential and nonconfidential data. The attack can be a single, standalone attack on the system resources or can be distributed, known as a phishing Distributed Denial of Service (DDoS) attack. A phishing attack focuses on a single system by using different launching pads [1]. Although neither standalone nor distributed phishing attacks necessarily harm the data permanently and directly, they deliberately compromise the resource availability on which the cornerstone security services depend. Phishing attacks create congestion in networks by generating enormous data traffic in the vicinity of the victim’s system, which is enough to prevent a legitimate packet from reaching its destination. In phishing attacks, the attacking traffic not only affects legitimate users but also attacks the target system, either to degrade the system performance or, in some cases, to stop the service availability. Compared to other cyberattacks, phishing attacks are harder and more complex to circumvent [2]. Most often, phishing attacks exploit network bandwidth and connectivity, degrading performance in the systems they compromise, whether through network-based or host-based attacks, as shown in Figure 1. Consequently, phishing attacks are successful at halting, interrupting, and degrading the real-time performance of the system by draining all its resources [3].

Figure 1 shows the multiple incident-level attacks that can occur on smartphones. Among these attacks, a phishing attack is the worst, as it damages smartphones more than other attacks such as ransomware, backdoors, Denial of Service (DoS/DDoS), bot activity, and worm propagation. A phishing attack harms smartphones in both active and passive attack scenarios. In active attack scenarios, a phishing attack damages all the data content of the smartphone, while in passive attack scenarios, a phishing attack compromises the smartphone and then uses it as a launching pad against other systems. The other attacks shown in Figure 1 target the system in either active or passive mode, but not both. For example, the other attacks in Layer 1 are DoS/DDoS, privilege escalation (internal/external), unauthorized access, and malware, which are active attacks only. Similarly, the attacks in Layer 2 of Figure 1 can each be easily classified as active or passive. In this work, the challenge among all the various incident types is the phishing attack, in which an intruder can harm the victim in both an active and a passive manner.

Confidentiality, integrity, and availability (CIA) are the three pillars underlying security services. Neglecting any one of these tenets, deliberately or inadvertently, might lead to open security breaches and unresolved vulnerabilities, consequently leading to a loss of credentials, reputation, and financial gains. Hence, a focus on web security against phishing attacks is necessary and must counter the latest exploitation techniques [3]. Among all cyberattacks, phishing attacks appear friendly but target financial transactions and highly confidential data. Their exploitation might result in the loss of financial gains and other critical losses. Phishing attacks employ spam emails that target online banking and money transfer accounts, all of which rely on secret PINs and passwords for user authentication, which is why hackers focus on banking and money transfer websites [4].

Cluster-based searching is vital, particularly in cybersecurity, because the algorithms use Markov chains to rank the data into clusters. Cluster-based searches also work on probability, selecting the cluster that is most appropriate for the query in process. In addition, cluster-based searches save battery life and execution time on smartphones.

The following are the objectives of this work:
(1) To counter the phishing attacks on smartphones.
(2) To falsify the fake websites containing phishing scams into a cluster form.
(3) To report the true positives and true negatives.
(4) To gray list and then blacklist the phishing attack links.

The paper is organized as follows: Section 2 presents the literature review with a critical analysis along with the security trial parameters. Section 3 explains the proposed scheme. Section 4 evaluates the experimental results with the help of detailed algorithms and mathematical and statistical methods; it also discusses a real test bed experiment conducted with the Weka tool and the JavaScript language using datasets from UCI and Mendeley. Finally, Section 5 summarizes the key findings and presents our future directions.

2. Literature Review

Technologies such as the Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Enhanced Data Rates for GSM Evolution (EDGE), Universal Mobile Telecommunications System (UMTS), Bluetooth, and infrared make smartphones devices of connectivity. However, this connectivity also serves as a gateway for malware, gray ware, and spyware. GSM, a second-generation (2G) global communications technology, enables messaging among smartphones and replaced the first-generation (1G) analog-centered services with a digital, full-duplex, circuit-switched network for voice telephony [4].

GPRS, a 2.5-generation technology, was developed to improve the data rates and decrease the connection access time of 2G by implementing the packet switching mechanism and introducing the Wireless Application Protocol (WAP) and Multimedia Messaging Service (MMS). EDGE improved on GPRS by enhancing its data rates and service reliability [5].

UMTS, developed in 2002, attained a data rate limit of 2 Mbps, with packet-switched and circuit-switched networks supported concurrently. Several services can be accessed simultaneously by the consumer, such as streaming, discussions, and collaboration with colleagues. Bluetooth, introduced in 1999, is based on short-wavelength radio transmission and is customary for data communication as well as personal area networks. Bluetooth provides an optimal level of security and short-range communication of up to 100 meters with negligible cost and power consumption [6, 7].

Based on the legality, delivery methods, and user authentication, mobile threats are classified into three main categories: malware, gray ware, and spyware, with regard to assorted attack vectors, motivations, and defense mechanisms [8].

To launch cyberattacks on smartphones, attackers exploit all kinds of illegitimate activities, such as spam emails, spam messages, and Twitter messages. These cyberattacks either damage all the data contained in the smartphone or compromise the smartphone to use it as a launching pad against other platforms. In [9–11], all schemes used data-mining techniques in different ways to counter cyberattacks on smartphones.

In [12, 13], the authors highlighted phishing attacks targeting smart grids to launch a phishing attack and compromise the system data to roll back the system. The authors proposed a data-mining technique to identify and counter phishing attacks and falsify fake pages and messages.

Table 1 shows the critical evaluation of the literature review with respect to the classifications. The table consists of seven columns, in which the first column states the approach of the scheme with its reference, while the second column explains the classification category to which the scheme belongs.

In the third column, the mechanism of the antiphishing scheme and how the scheme works are briefly stated. In the fourth column, the contribution or strength of the scheme is highlighted with its achievements. In the fifth column, the weaknesses or limitations of the scheme are mentioned with its possible vulnerabilities. In the sixth column, we state the scenario in which the scheme is implemented. The last column lists the tool/technology used by the scheme as reported in the literature.

The main aim of malware is to annoy genuine users, damage the platform, steal private data, or exploit system or policy vulnerabilities without any notice to the victim. Malware includes viruses, worms, Trojans, rootkits, and botnets. A computer virus is defined as a self-replicating piece of code, and a worm is a self-copying program [19]. Trojans impersonate software that appears to provide services but in reality is a malicious program. A rootkit installs Trojans and then disables firewalls and antivirus software. Finally, a botnet is a group of “zombies,” each of which is an infected computer or device, used to infect victims for organized crime [20].

Malware is prohibited in numerous countries, such as the United States, and in some cases of malware sharing, jail sentences have been administered [21].

The main objective of spyware is to determine the position of a node and retrieve its history for a specific span of time. Depending on the circumstances, spyware may be legitimate or illicit. For example, if a person installs personal spyware on a child’s or spouse’s smartphone, the spyware is not intended to defraud the victim. However, if the spyware is installed without the user’s consent, successfully gains access to the device, and sends confidential information to an intruder rather than to the real author, then it is illegitimate [22].

Collecting user information for the sole purpose of profiling and then advertising is the main intention of gray ware, as indicated in Table 2. The gray ware distributor’s corporate objective is not to harm the user but rather to provide some sort of functionality and value to the host user. If a user finds the data collection process of gray ware questionable, the user can complain and block the services of the gray ware. In contrast to malware and spyware, the illegal use of gray ware is punished by fines rather than prison sentences in countries where there is a rule of law. Therefore, gray ware is sometimes described as lying at the boundary of legitimacy and illegitimacy. Based on the principle of confidentiality and the consumer’s right to file grievances, gray ware companies must disclose their data collection practices [23–26].

Another novelty of this research is that we classify phishing schemes in a way that makes them easy to detect. We organized phishing attacks into multiple classes so that each category could be tested against our proposed CAP model. Based on the literature review, we classified the phishing attacks as shown in Figure 2. The main categories include Internet Protocol (IP), Uniform Resource Locator (URL), Domain Name System (DNS), certificate-based, social engineering, and technical maneuver [27–29].

In the first category, IP-based phishing attacks are classified. In the second category, URL-based phishing techniques are classified, which are subclassified into abnormal URL, URL of anchor, URL of long address, and repeating the same characters of the URLs. In the third category, we classified all phishing attacks based on social engineering. In the fourth category, we classified all phishing attacks that can be caused by technical maneuvers. In the fifth category, we classified phishing attacks that can occur from DNS poisoning. In the last category, we classified attacks that can occur via digital certificates. This category is subclassified into Secure Socket Layer (SSL) and centralized authority certificates.
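The IP-based and URL-based categories above lend themselves to simple lexical checks. The following is a minimal Python sketch of such checks, not the CAP model itself; the function name `url_flags` and both thresholds are hypothetical values chosen only for illustration, since the paper does not specify them.

```python
import re
from urllib.parse import urlparse

# Hypothetical thresholds for illustration; the paper does not specify them.
MAX_URL_LENGTH = 75        # URLs longer than this count as a "long address"
MAX_REPEATED_CHARS = 4     # more than this many extra repeats of one character is flagged


def url_flags(url: str) -> dict:
    """Return one boolean flag per IP/URL-based category discussed above."""
    host = urlparse(url).hostname or ""
    return {
        # IP-based: the host is a raw dotted IPv4 address instead of a domain name.
        "ip_host": re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None,
        # Long address: the full URL exceeds the assumed length limit.
        "long_url": len(url) > MAX_URL_LENGTH,
        # Repeated characters: one character followed by at least
        # MAX_REPEATED_CHARS copies of itself (five or more in a row).
        "repeated_chars": re.search(r"(.)\1{%d,}" % MAX_REPEATED_CHARS, url) is not None,
    }


if __name__ == "__main__":
    print(url_flags("http://192.168.0.1/login"))
    print(url_flags("http://example.com/aaaaaa"))
```

Flags of this kind would only be inputs to a classifier; on their own they do not decide whether a page is a phishing page.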

3. Proposed Schemes

A novel cluster-based antiphishing (CAP) mechanism is necessary to secure users from cybersecurity attacks. It should address the following concerns: preventing access to phishing websites and phishing attacks; shielding vital e-mail communication from phishers; detecting deceptive websites via (a) appropriate Domain Name System and IP/MAC address matching and (b) authentication followed by authorization of the website; data leakage resulting from device loss or theft; unintentional disclosure of data from smartphones; attacks on decommissioned smartphones; mitigating spyware attacks; monitoring network spoofing attacks; and financial malware attacks.

Our scheme operates in three main phases. In the first phase, the classification of incoming data is observed on the basis of a map provided in the second phase. In the second phase, packets are being clustered into their classified groups. In the third phase, digital forensics of the malicious packets are investigated, tracking back the culprits for future blacklists or recovery that can be used as a honey pot. As our scheme focuses on smartphones, which are capable of using only lightweight software, we designed our novel solution for implementations on base stations (centralized) rather than smartphones (distributed). This focus allows for placing all of the mechanisms in one package because the smartphone market is full of variants with different architectures of software and hardware. For smartphones, not only the attack (phishing) is distributed but also the tools and techniques are multiple and distributed in nature, such as social engineering and website spoofing techniques.
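The three phases described above can be sketched as a small pipeline. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the classifier is supplied by the caller, and the `run_phases` name and the `BLACKLIST_AFTER` threshold for promoting a gray-listed source to the blacklist are hypothetical.

```python
from collections import defaultdict

# Hypothetical: number of malicious packets from one source before it is blacklisted.
BLACKLIST_AFTER = 3


def run_phases(packets, is_malicious):
    """packets: iterable of (source, payload); is_malicious: caller-supplied classifier."""
    clusters = {"legitimate": [], "illegitimate": []}
    offences = defaultdict(int)
    for source, payload in packets:
        # Phase 1: classify the incoming packet.
        label = "illegitimate" if is_malicious(payload) else "legitimate"
        # Phase 2: place the packet in the cluster for its class.
        clusters[label].append((source, payload))
        # Phase 3: keep a per-source record so that culprits can be traced later.
        if label == "illegitimate":
            offences[source] += 1
    gray_list = {s for s, count in offences.items() if count < BLACKLIST_AFTER}
    blacklist = {s for s, count in offences.items() if count >= BLACKLIST_AFTER}
    return clusters, gray_list, blacklist


if __name__ == "__main__":
    sample = [("a", "ok"), ("b", "bad"), ("b", "bad"), ("b", "bad"), ("c", "bad")]
    print(run_phases(sample, lambda payload: payload == "bad"))
```

Keeping the classifier as a parameter reflects the centralized design described above: the base station can swap detection logic without changing how packets are clustered or listed.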

4. Results and Discussion

The CAP mechanism is evaluated with the following three methods: (1) algorithms, (2) mathematical and statistical formulae and tools (e.g., SPSS), and (3) test bed implementation via the Wireshark tool. The evaluation matrix consists of the following four main components: “true positive” measures the rate of correctly detected phishing attacks relative to all phishing attacks; “true negative” measures the proportion of correctly identified legitimate instances relative to all legitimate instances; “false positive” measures the proportion of legitimate instances that are incorrectly identified as phishing attacks relative to all legitimate instances; and finally, “false negative” measures the rate of phishing attacks that are incorrectly identified as legitimate relative to all phishing attacks.
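The four components of the evaluation matrix can be computed directly from raw counts. The sketch below is a minimal Python illustration of those definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
def detection_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the four rates of the evaluation matrix from raw counts."""
    phishing = tp + fn      # all actual phishing instances
    legitimate = tn + fp    # all actual legitimate instances
    return {
        "true_positive_rate": tp / phishing,
        "false_negative_rate": fn / phishing,
        "true_negative_rate": tn / legitimate,
        "false_positive_rate": fp / legitimate,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(detection_rates(tp=90, tn=95, fp=5, fn=10))
```

Because the true positive and false negative rates share the same denominator, they always sum to 1, as do the true negative and false positive rates.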

4.1. Algorithm

In this section, the pseudocode of the CAP mechanism algorithm exploiting the IRC messenger of a smartphone is shown in Algorithm 1. In Step 1, we define all the parameters involved in the execution of our scheme: PS, N, S, E, C, M, and P, all of which are labelled as described. In Step 2, the values received by the scheme as inputs are validated, and each entry is checked with four subsequent IF and ELSE conditions. For example, if the argument E is received, it denotes the end time of communication or a NULL value; if the argument C is received, it identifies the name of the channel through which the communication is required; if the argument M is received, it identifies the Internet Relay Chat (IRC) messenger to communicate with; and if the argument P is received, it gives the port number through which the communication will take place. In Step 3, communication is initiated with the IRC messenger, and a connection object is then created in Step 4. Subsequently, a channel is created for communication between the nodes. In Step 5, an event handler that is already defined in JavaScript is activated to monitor malicious activities. The communication is countered if it is malicious; otherwise, it proceeds as normal.

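//PS: packet size defined in RFC, N: number of packets (data rate) defined in RFC, S: start time of the data communication, E: end time of data communication, C: channel name, M: IRC messenger/IRC name, P: port number.
(1) Assign the parameters PS, N, S, E, C, M, and P.
(2) Validate the entries/values of the parameters defined:
IF S ≥ 0 THEN
IF E = 1 THEN end time of communication ELSE 0,
IF C = 1 THEN channel name ELSE 0,
IF M = 1 THEN IRC messenger name ELSE 0,
IF P = 1 THEN port number ELSE 0.
(3) Initiate communication with the IRC messenger.
(4) Create a connection object, then create a channel for communication between the nodes.
(5) Activate the event handler: IF the communication is malicious THEN counter it ELSE proceed as normal.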
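
The parameter validation and event handler described for Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' JavaScript implementation: the network connection to the IRC messenger is omitted, the channel and messenger names are assumed to be required, and "malicious" is assumed to mean traffic that exceeds the packet size PS or the packet count N. The names `Params`, `validate`, and `on_packets` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Params:
    ps: int                      # PS: maximum packet size
    n: int                       # N: maximum number of packets (data rate)
    s: float                     # S: start time of the communication
    e: Optional[float] = None    # E: end time, or None for NULL
    c: Optional[str] = None      # C: channel name
    m: Optional[str] = None      # M: IRC messenger name
    p: Optional[int] = None      # P: port number


def validate(params: Params) -> bool:
    """Validate the entries before any communication is initiated."""
    if params.s < 0:
        return False
    if params.e is not None and params.e < params.s:
        return False
    if params.p is not None and not (0 < params.p < 65536):
        return False
    # Assumption: a channel name and a messenger name are both required.
    return bool(params.c) and bool(params.m)


def on_packets(params: Params, packet_sizes: Sequence[int]) -> str:
    """Event handler: counter traffic that exceeds PS or N, otherwise proceed."""
    too_many = len(packet_sizes) > params.n
    too_large = any(size > params.ps for size in packet_sizes)
    return "counter" if (too_many or too_large) else "proceed"


if __name__ == "__main__":
    params = Params(ps=512, n=3, s=0.0, c="#channel", m="irc.example", p=6667)
    if validate(params):
        print(on_packets(params, [100, 200]))
        print(on_packets(params, [100, 600]))
```
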
4.2. Mathematical and Statistical Model

In our mathematical and statistical model, we attempted to best generalize the CAP model to all possible incoming phishing attacks. We checked the maximum size and threshold value of the packets as well as their starting and ending limits. Furthermore, we tested combinations of various types of attacks from multiple resources and finally, the probability of the attack to occur:

For all packets, $\sum \text{bits} = \text{maximum packet size}$ (defined in the RFC).

$\int_{\text{starting}}^{\text{ending}} \text{packet ratio} = \text{threshold}$ (defined in the RFC),

where “starting” indicates the lower limit and “ending” indicates the upper limit.

$^{n}C_{r}$: combination of different packets from various resources,

$^{n}C_{r} = \dfrac{n!}{r!\,(n-r)!}$,

where $n$ is the number of all incoming packets and $r$ is the number of malicious packets.

$^{n}P_{r}$: permutation of different malign and nonmalign packets,

$^{n}P_{r} = \dfrac{n!}{(n-r)!}$.

Hence, the probability of launching a successful attack.
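
The combination and permutation counts defined above can be checked numerically. The short Python sketch below illustrates only those two counting formulas; the values of n and r are hypothetical and are not taken from the experiments.

```python
import math

# Hypothetical values: 10 incoming packets, 3 of which are malicious.
n, r = 10, 3

combinations = math.comb(n, r)   # n! / (r! (n - r)!)
permutations = math.perm(n, r)   # n! / (n - r)!

print(combinations, permutations)  # prints "120 720"
```

The two counts differ by a factor of r!, because a permutation distinguishes the order in which the r packets arrive while a combination does not.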

4.3. Implementation via Weka Tool with Results

Figure 3 shows the results obtained with the Weka tool after the successful launch of a phishing attack on a smartphone. To validate our scheme, we considered the following system setup. The principal server, which hosts the DNS, is considered the main target of the phishing attack. The clients use JavaScript code to implement bots on the server, keeping the JavaScript code in an infinite loop within the malicious code. Wireshark, installed on the server, records the system’s state and utilization before and after the attack. Results are displayed for both scenarios in the real test bed implementation phase.

JavaScript code used for phishing attack: the malicious code below was used against our CAP model to test its capability for countering a phishing attack.

<html>
<head></head>
<body>
hello me PHISHING attack.
<script language="JavaScript">
  // Prompt the visitor for a name, as in the original test snippet.
  var name = prompt("R u ready", "Name");
</script>
</body>
</html>

The results are shown in Figure 3, in which blue shows the true positive data while red shows the false negative data encountered by our CAP model. The CAP model produces more than 90% accuracy in its detection system, classified as true positive (blue packets). Similarly, the CAP model reduces the percentage of false negatives (red packets) to a single digit, that is, less than 9%.

4.4. Real Test Bed Experiment Conducted Using Datasets from UCI and Mendeley

As is standard when evaluating performance, 10-fold cross-validation is performed for each classifier. This methodology divides a dataset into 10 subsets of equal size; in each round, one subset is used for testing and the remaining nine are used for training, and the process repeats until each subset has been used for testing [30]. Our investigation of antiphishing revealed the following techniques for analysis, evaluation, and experimentation, as provided in Table 3. These 10 techniques were then tested on the UCI and Mendeley datasets.
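
The splitting scheme behind 10-fold cross-validation can be sketched in a few lines of Python. This is a simplified illustration, not Weka's implementation: it assumes the data are already in random order and does not stratify by class, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(1353), start=1):
        print(fold, len(train), len(test))
```

A classifier's reported accuracy is then the average over the 10 rounds, so every instance contributes to the test result exactly once.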

4.4.1. Dataset Taken from UCI

Source: Neda Abdelhamid, Auckland Institute of Studies, nedah '@' ais.ac.nz.

(1) Dataset Information. In online communications such as e-banking and e-commerce, phishing attacks are considered a threat. Features related to legitimate and phishing websites were collected from 1353 websites. The phishing websites were taken from PhishTank (http://www.phishtank.com), where anyone can collect information about phishing attacks, and Yahoo was used together with a web script developed in PHP to collect the legitimate websites. Of the 1353 websites, 548 were legitimate, 702 were phishing URLs, and 103 were suspicious URLs. The results are shown in Table 4.

In the second column of Table 4, the different techniques tested on the UCI dataset are listed. As shown in the third column, J48 has the highest number of correctly classified instances and the lowest number of incorrectly classified instances. However, in terms of percentages, J48 is still less effective than our CAP model, which produces 90% true positive and 9% false negative results. As shown in the sixth column, the precision value of J48 is 0.899, while its accuracy in the last column is 89.8%. Similarly, none of the other techniques applied to the UCI dataset in Table 4 reaches 90% accuracy. Therefore, we can deduce that the accuracy of our CAP model is higher than that of the other techniques tested on the UCI dataset.

4.4.2. Dataset Taken from Mendeley

Source: phishing web pages from PhishTank; legitimate web pages from Alexa and Common Crawl.

(1) Dataset Information. In this scenario, the dataset under consideration covers 10,000 websites; 48 features were extracted from 5000 phishing and 5000 legitimate websites. The results are shown in Table 5.

As a second test case, the different techniques were applied to the Mendeley dataset. As seen in the third column of Table 5, J48 shows the highest precision (0.873) and accuracy (87.31%), yet it is still below our CAP model. We can conclude that, across both datasets and all techniques tested, no method performed better than our proposed CAP model.

5. Conclusion

Fake commercial advertisements can play the role of honey pots for phishing attacks, as they behave like original finance and business sector advertisements. Once users follow the fake websites and log on a single time, phishing hackers can steal their passwords, transact according to their own wishes, and possibly change the login details. Currently, all phishing hackers must pass through some sort of Internet service provider (ISP), whose administrator is responsible for countermeasures. Techniques such as content filtering, heuristics engines, IP blacklisting, and fingerprinting are currently employed; however, the problem of spamming followed by phishing is not yet contained. None of the schemes in the broad classification above focuses on smartphone platforms, and all the schemes reviewed in this work are silent about the digital forensics of the phishing attack. Our scheme identified and flagged all the phishing attack packets in one single solution. The test bed results showed that the CAP model successfully identified and countered phishing attacks on smartphones. The CAP model produces more than 90% accuracy in its detection system, classified as true positive. Similarly, the CAP model reduces the percentage of false negatives to a single digit, namely, less than 9%. As a future direction, our model can be extended to edge, fog, and cloud computing environments, as the CAP model is a lightweight scheme that can easily be integrated into such energy-constrained computing zones.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.