Security and Communication Networks


Research Article | Open Access

Volume 2018 |Article ID 9706706 | https://doi.org/10.1155/2018/9706706

Jungwoo Seo, Sangjin Lee, "Abnormal Behavior Detection to Identify Infected Systems Using the APChain Algorithm and Behavioral Profiling", Security and Communication Networks, vol. 2018, Article ID 9706706, 24 pages, 2018. https://doi.org/10.1155/2018/9706706

Abnormal Behavior Detection to Identify Infected Systems Using the APChain Algorithm and Behavioral Profiling

Academic Editor: Petros Nicopolitidis
Received 28 Jan 2018
Revised 20 May 2018
Accepted 06 Aug 2018
Published 04 Sep 2018

Abstract

Recent cyber-attacks have used unknown malicious code or advanced attack techniques, such as zero-day attacks, making them extremely difficult to detect using traditional intrusion detection systems. Botnet attacks, for example, are a very sophisticated type of cyber-security threat. Malicious code or vulnerabilities are used to infect endpoints. Systems infected with this malicious code connect a communications channel to a command and control (C&C) server and receive commands to perform attacks on target servers. To effectively protect a corporate network’s resources against such threats, we must be able to detect infected systems before an attack occurs. In this paper, an attack pattern chain algorithm (APChain) is proposed to identify infected systems in real-time network environments, and a methodology for detecting abnormal behavior through network-based behavioral profiling is explained. APChain analyzes the attribute information of real-time network traffic, connects chains over time, and conducts behavioral profiling of different attack types to detect abnormal behavior. The dataset used in the experiment employed real-time traffic accumulated over a period of six months, and the proposed algorithm was developed into a prototype for the experiment. The C&C channel detection accuracy was measured at 0.996, the true positive rate at 1.0, and the false positive rate at 0.003. This study proposes a methodology that can overcome the limitations of conventional security mechanisms and suggests an approach to the detection of abnormal behavior in a real-time network environment.

1. Introduction

According to the 2016 Internet Security Report [1], targeted attacks, such as spear phishing, increased by 55% in 2015 compared to the previous year. Notably, in 2015, attacks using zero-day vulnerabilities increased by 125%. Every year, more than 10 new zero-day vulnerabilities are reported. Moreover, 430 million new pieces of malicious code were discovered in 2015, a 36% increase from the year before. New vulnerabilities include targeted attacks, smartphone threats, social media scams, and Internet of Things (IoT) vulnerabilities.

In the past, attacks tended to infect nonspecific systems extensively with malicious code; however, cyber-attacks that pursue specific objectives against chosen targets, such as leaking important information or destroying the system, are becoming more common [2]. Attackers distributing malicious code can control a remote host through a command and control (C&C) channel and connect to backdoor networks to perform their attacks. An effective way to detect hosts that are infected by a botnet is to detect the C&C channel and eliminate the infected system before the attack can be launched. However, a number of major challenges exist in detecting C&C channels. For example, attackers may regularly change the address of the C&C server or use evasive methods such as proxy server traffic redirection. Additionally, because C&C channels use HTTP or HTTPS protocols to communicate, they are difficult to distinguish from general web traffic, making it challenging to establish a definitive countermeasure.

The research in [3, 4] identified botnet C&C channels in an internal network without any prior information. However, as mentioned in [5], because botnet attacks have distinguishable characteristics, an improved detection algorithm is needed [6]. For example, the research in [3] detected C&C channels by checking the active responses from a host group at regular time intervals. However, to communicate with the C&C server, a botnet attempts to make contact in irregular connection cycles, presenting a problem for existing methodologies employed to detect C&C channels. For that reason, recent studies, including [4, 7–9], have switched their focus to detecting abnormal behaviors of infected systems.

The effective detection of a botnet requires a detailed understanding of the internal network environment and the information service, as well as the configuration of multiple network monitoring environments such as log analysis, file integrity checking, registry monitoring, and rootkit detection. However, when configuring a host-based and network-based consolidated monitoring environment in a corporate network environment, resource utilization and performance limitations are typically encountered. Setting up many rule sets and excessive anomaly detection protocols to identify botnets in a large volume network environment will increase resource inefficiency and, in some cases, threaten the operability of the internal network. Because of these problems, companies limit the rule sets for their information security system or shut down their detection function. Therefore, understanding the characteristics of a botnet, improving resource efficiency, and providing stable detection performance are important elements of any response to intrusions.

In this paper, in order to detect botnets, the attribute information of network traffic is used to construct an attack pattern chain algorithm (APChain) over time, and behavioral profiling is conducted to detect abnormal activity. With this method, real-time network traffic analysis, optimal resource utilization, and encrypted packet attack detection are possible.

The remainder of the paper is structured as follows. Section 2 reviews related work and current challenges, while Section 3 presents a conceptual overview of the proposed approach. Section 4 describes the system model for the proposed algorithm’s architecture. Section 5 presents an experimental evaluation of the approach using a real-time traffic dataset. Finally, conclusions are drawn in Section 6.

2. Related Work

Botnets have become a major threat on the Internet and extensive research has been conducted over the past several years to detect them. Botnet detection can be classified into two main types: vertical correlation [10] and horizontal correlation [3, 11].

BotHunter [10] is an example of detection based on vertical correlation. It observes a single machine and compares its behavior with a model of bot behavior. It recognizes correlated dialog trails consisting of multiple stages and representing successful bot infection. Therefore, this strategy is also referred to as “dialog correlation.” BotHunter is designed to track two-way communication flows between internal assets and external entities and consists of a correlation engine driven by several malware-focused network detection sensors. The BotHunter correlator links the dialog trail of inbound intrusion alarms with those outbound communication patterns to detect infected local hosts. When a sequence of evidence matches BotHunter's infection dialog model, a report is produced that captures all of the events relevant to the local host’s role in the infection process.

BotHunter has some important limitations. For example, it is restricted to the life cycle of the predefined infection model, and some stages, such as C&C communication, provide only signature-based sensors. Thus, BotSniffer and BotMiner are often used to complement BotHunter. They do not necessarily require the observation of multiple different stages within an individual host, nor do they require botnet-specific signatures.

BotSniffer [3] and BotMiner [11] operate on the principle of horizontal correlation by observing correlations and similarity across multiple hosts. Because bots conduct preprogrammed activities related to the C&C channel under the control of a botmaster, bots within the same botnet will exhibit spatial-temporal correlation and similarity. BotSniffer is designed primarily to detect centralized C&C channels; it monitors multiple rounds of spatial-temporal correlation and the similarity in activity responses from a group of hosts that share a common centralized server connection such as IRC or HTTP. BotSniffer can achieve the theoretical bounds for false positive and false negative rates within a reasonable detection time using statistical algorithms. BotMiner presents a more general detection framework that is independent of botnet C&C protocols and structure. It clusters together similar communication traffic and similar malicious traffic and performs cross-cluster correlation to identify hosts that share similar communication and malicious activity patterns. These hosts are then considered bots within the monitored network. BotSniffer has an important limitation in that it is restricted to the detection of botnets that primarily use centralized C&C channels. With horizontal correlation, it is difficult to detect a botnet in a small network environment and to classify hosts within the same network range that are infected by botnets with different characteristics. In addition, BotHunter, BotSniffer, and BotMiner all usually require a relatively long time to observe multiple stages of botnet communication.

Other strategies for botnet detection have also been proposed. In [12], specific traffic information is extracted to run a learning module that can detect a C&C channel. NetFlow records (flow size, client access patterns, and temporal behavior) are used to identify the characteristics of the traffic; using the learning module, reference values are created to match the C&C channel characteristics. A detection module then conducts matching to detect the botnet C&C channel. In [13], HTTP-based C&C channels are detected by analyzing ordinary HTTP requests and C&C channel characteristics. To precisely differentiate between C&C and legitimate domains, a CODDs-defined approach is proposed. The proposed algorithm analyzes the DNS information requests corresponding to these domains during a particular time window.

In this paper, the APChain algorithm is proposed to detect abnormal behavior. Abnormal behavior traffic is detected in the initial stages with the goal of quickly eliminating systems infected by malware. We configure port mirroring on the backbone switch to create APChain in a real-time network. Over a period of time, attribute information is linked to the chain, and the results from APChain are then used to conduct behavioral profiling to detect abnormal behavior.

This study makes three main contributions. First, this paper proposes an approach that differs from traditional methods of botnet detection. The proposed methodology constructs APChain using traffic attribute information and detects abnormal behavior such as botnets through behavioral profiling. Our system is able to detect new types of botnet and can improve its accuracy by modifying its behavioral profiling algorithm. Second, botnets can have very flexible C&C channels. They can use different protocols such as IRC and HTTP and can encrypt content for C&C communication [3, 10, 11, 14]. This paper proposes a detection scheme that groups hosts with similar behavior over time. We are able to determine the IP address of the C&C server, the connection time, and connection count, among other parameters. As a result, the proposed approach detects variable C&C channels by tracking communication with the C&C server. Finally, current detection techniques are based on the inspection of network traffic. However, recent malware uses encrypted C&C traffic or code obfuscation to evade these detection techniques. The proposed methodology utilizes the attribute information of the protocol header to overcome problems associated with analyzing the payload of network traffic. Therefore, it is possible to detect encrypted C&C channels and to detect the abnormal behavior of obfuscated packets.

3. System Overview

In this section, we investigate the overall concept of the proposed methodology, the configuration of APChain in detecting abnormal behavior, and the behavioral profiling process.

3.1. Overview of the Proposed Methodology

Recent cyber-attacks are initiated by using advanced social engineering or by sending a targeted e-mail to attack targets [15]. If a host is infected by malicious code, it attempts to communicate with a C&C server, and a communications channel is formed using Internet Relay Chat (IRC) or the HTTP protocol. When a communications channel is established, the host receives an attack command from the C&C server or updates the binary file. At this time, APChain is configured to monitor real-time traffic and detect abnormal behavior, followed by behavioral profiling.

The methodology proposed in this paper is presented in Figure 1. It consists of a five-step process to detect abnormal behavior. First, real-time traffic is collected by the flow collector and attribute information from the collected traffic is extracted. Second, this traffic attribute information is analyzed in order to configure APChain. APChain’s role is to connect the features of the traffic attribute information into a chain and to match this to abnormal behavior. Third, behavioral profiling based on APChain is conducted. Fourth, suspicious patterns are categorized and abnormal behavior is detected. Finally, intrusion responses against the abnormal behavior are executed, and network forensics for an audit trail are conducted.

3.2. Attack Pattern Chain Algorithm (APChain)

In order to detect abnormal behavior, attribute information is extracted from the DMZ or the internal backbone switch, and a chain is configured that contains the attribute information over time.

3.2.1. Attribute Information for APChain

Network traffic headers contain a variety of information such as the IP addresses and protocols of the origin and destination. The standardized extraction defined by the communication protocol enables efficient analysis. An 8-tuple {source IP, source port, destination IP, destination port, protocol, access time, MAC, URL}, which contains the fundamental attribute information required to create APChain, is extracted from the traffic header and payload, and a 10-tuple {SN, SGN, TIN, CND, RATD, CNTI, RATD, CNTI, ACTTI, SCTTI} is created, which provides additional attribute information based on the extracted 8-tuple. The 8-tuple defines the network connection between the origin and the destination systems, and the 10-tuple is utilized as analysis data for the detection of a host’s abnormal behavior.

3.2.2. Execution Results of APChain

The 8-tuple {source IP, source port, destination IP, destination port, protocol, time, MAC, URL}, collected through the network switch of the experimental environment depicted in Figure 10, is used as input to create APChain. The output of the APChain algorithm in Table 3 consists of this input together with the derived attributes {SN, SGN, TIN, CND, RATD, CNTI, RATD, CNTI, ACTTI, SCTTI}. Table 1 presents the APChain table created using the traffic attribute information.


| Attribute | Type | Description |
|---|---|---|
| SN | Varchar | Sequence number |
| SGN | Varchar | Sequence group number with the same destination IP |
| Sip | Varchar | Source IP |
| Sport | Integer | Source port |
| Dip | Varchar | Destination IP |
| Dport | Integer | Destination port |
| MAC | Varchar | Media Access Control address |
| PT | Varchar | Protocol |
| TS | Date | Time stamp |
| URL | Varchar | Uniform Resource Locator address |
| TIN | Integer | Current access time minus the previous access time |
| CND | Integer | Cumulative number of times a connection is made to the same target IP address |
| RATD | Integer | Connection time interval to the same target IP address |
| CNTI | Integer | Cumulative number of times a connection is made to the same target IP address over 30 minutes |
| ACTTI | Integer | Average connection time interval to the same target IP address over 30 minutes |
| SCTTI | Integer | Standard deviation of the connection time interval to the same target IP address over 30 minutes |
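
As a rough illustration of how one row of the APChain table might be held in memory, the sketch below maps the attributes of Table 1 onto a Python dataclass. The class name `APChainRecord`, the lowercase field names, the use of seconds for the interval fields, and the example values are assumptions made for illustration; they are not taken from the prototype described in this paper.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class APChainRecord:
    """One APChain record; attribute names and types follow Table 1."""
    sn: str        # SN: sequence number, e.g. "S00001"
    sgn: str       # SGN: group number shared by records with the same destination IP
    sip: str       # Sip: source IP
    sport: int     # Sport: source port
    dip: str       # Dip: destination IP
    dport: int     # Dport: destination port
    mac: str       # MAC: Media Access Control address
    pt: str        # PT: protocol
    ts: datetime   # TS: time stamp
    url: str       # URL: Uniform Resource Locator address
    # Derived attributes; using seconds as the unit is an assumption of this sketch.
    tin: int = 0    # TIN: current access time minus previous access time
    cnd: int = 0    # CND: cumulative connections to the same target IP
    ratd: int = 0   # RATD: interval since the last connection to the same target IP
    cnti: int = 0   # CNTI: connections to the same target IP over 30 minutes
    actti: int = 0  # ACTTI: average interval to the same target IP over 30 minutes
    sctti: int = 0  # SCTTI: standard deviation of that interval over 30 minutes


FIELD_NAMES = [f.name for f in fields(APChainRecord)]

# Hypothetical example values (documentation-range IP addresses).
record = APChainRecord(
    sn="S00001", sgn="G00001",
    sip="192.0.2.10", sport=49152,
    dip="203.0.113.7", dport=80,
    mac="D0-27-88-47-15-4B", pt="TCP",
    ts=datetime(2016, 8, 12, 7, 5, 28), url="www.cnn.com",
)
```

The derived attributes default to zero here so that a record can be created from the extracted 8-tuple first and completed once the chain statistics are available.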

Figure 2 shows an example of APChain configuration for the abnormal traffic behavior of a host infected by malicious code.

For example, the intrusion server (217.11.xxx.78) connects to the malicious server (12.5.xxx.48), which distributes malicious code, infecting the host. The infected host periodically communicates with the C&C server (106.23.xxx.129) and updates the binary malicious code or receives an attack command. At this time, to configure APChain to detect abnormal behavior, port mirroring is configured on the backbone switch, and network traffic attribute information is extracted to create APChain. This traffic attribute information is utilized to calculate additional attributes and added to APChain.

When connections are made to a web server with the same target address, they are defined in APChain by the same group name, “GXXX53-PXXXX,” and, after they are interconnected in the chain, behavioral profiling is conducted to detect abnormal behavior.

3.3. Behavioral Profiling

Behavioral profiling analyzes attack types based on their characteristics and categorizes any traffic that exhibits abnormal behavior. Table 2 shows the behavioral profiling algorithm using APChain.


Algorithm 1. Behavioral Profiling

Input: Result of Function APChain (T), where T is a collection of network traffic.
Output: Results of abnormal behavior
Function Behavioral_Profiling (T):
    h: the host infected by malware
    d: the destination servers
    R ← RATD field value of APChain
    C ← CNTI field value of APChain
    A ← ACTTI field value of APChain
    S ← SCTTI field value of APChain
    while (not stop condition) do
        if Abnormal_Behavior (C&C) then
            m: the C&C server
            /* the host attempts to connect to the C&C server */
        if Abnormal_Behavior (Pharming) then
            m: the fake website
            /* the host connects to the fake website */
        if Abnormal_Behavior (DDoS) then
            m: the victim system
            /* the host executes a DDoS attack */
        if Abnormal_Behavior (IP-spoofed DDoS) then


Algorithm 2. Attack pattern chain (APChain)

/* T: set of packets */
Function APChain (T):
    while (not stop condition) do
        /* array for the records of APChain */
        for T is not Ø do
            SN      /* S00001 */
            SGN     /* G00001 */
            Sip     /* 58.203.xxx.xxx */
            Sport   /* port number */
            Dip     /* 211.106.xxx.xxx */
            Dport   /* port number */
            MAC     /* D0-27-88-47-15-4B */
            TS      /* 2016.08.12 07:05:28 */
            URL     /* www.cnn.com */
            TIN     /* the cycle interval between the previous packet and the current packet */
Function CND (C,T):
    for … do
        if … then
    return accVal
Function RATD (C,T):
    if … then
    return interVal
Function CNTI (C,T):
    for … do
        if … then
    return accVal
Function ACTTI (C,T):
    for … do
        if … then
    return

The behavioral profiling algorithm in Table 2 consists of case studies for three representative attack types. The first case analyzes the RATD, CNTI, ACTTI, and SCTTI field values of APChain and confirms whether there is communication with a C&C server to detect a C&C channel. The second case analyzes the URL, RATD, and CNTI fields of APChain and checks for website tampering in order to detect pharming attacks. The third case analyzes the RATD, CNTI, ACTTI, and MAC fields of APChain and checks for IP-spoofing by calculating the traffic frequency in order to detect IP-spoofing DDoS botnets. Figure 3 presents the abnormal behavior detection process of APChain-based behavioral profiling in a real-time network environment.

3.4. Elimination of Whitelist-Based False Positives

In this paper, we target the early detection of abnormal behavior using APChain and behavioral profiling. However, because the proposed methodology utilizes the attribute information of network traffic, false positives may occur when normal traffic is included in the detection results. For this reason, the proposed algorithm exchanges data with the external system or categorizes normal traffic such as web service calls based on a whitelist and eliminates them from the analysis [16, 17].

The IP addresses registered on the whitelist are classified into two types. The first type belongs to the Internal Whitelist, which records the packet attributes (e.g., Sip, Dip, and interval) of traffic regularly exchanged between the internal and external environments. When batch scripts such as crontab jobs are used to exchange data with the external environment, the IP address, port, and execution cycle are compared and analyzed for the Internal Whitelist, and the analysis is saved in a file.

However, abnormal behavior is difficult to detect once a host is registered in the Internal Whitelist, because its communication is categorized as trustworthy. Therefore, the condition in (1) is periodically checked to detect abnormal behavior by the hosts registered in the Internal Whitelist; any host exhibiting abnormal behavior is excluded from the Internal Whitelist.

The second type of IP address is those associated with sites on well-known Global Whitelists, such as those compiled by antivirus software companies. Through the continuous updating of the Global Whitelist, new C&C IP addresses are added, and IP addresses and domain names are included. Figure 4 outlines the elimination of whitelist-based normal IP addresses from the behavior profiling process.

The elimination process for whitelist-based false positives is shown in Figure 4. The traffic from January to June 2017 collected in the experimental environment presented in Figure 10 is analyzed and whitelist-based IP addresses involved in normal communication are eliminated from the analysis.
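
The whitelist step can be sketched as a simple filter applied before behavioral profiling. The data layout below is an assumption made for illustration: the Internal Whitelist is modeled as a mapping from (Sip, Dip, Dport) to the expected execution cycle in seconds, the Global Whitelist as a set of IP addresses and domain names, and the `tolerance` parameter is hypothetical. An internally whitelisted flow whose interval drifts from its registered cycle is kept for analysis, which is one possible reading of the periodic check described above, not the paper's exact condition.

```python
def is_whitelisted(rec, internal, global_wl, tolerance=5):
    """Return True if a record should be treated as normal traffic.

    internal:  {(sip, dip, dport): expected interval in seconds}  (assumed layout)
    global_wl: set of trusted IP addresses and domain names       (assumed layout)
    """
    if rec["Dip"] in global_wl or rec.get("URL") in global_wl:
        return True
    expected = internal.get((rec["Sip"], rec["Dip"], rec["Dport"]))
    if expected is None:
        return False
    # Assumed check: traffic stays whitelisted only while it keeps its registered cycle.
    return abs(rec["RATD"] - expected) <= tolerance


def remove_whitelisted(records, internal, global_wl):
    """Keep only the records that still need behavioral profiling."""
    return [r for r in records if not is_whitelisted(r, internal, global_wl)]


# Hypothetical example data.
internal = {("10.0.0.5", "198.51.100.9", 21): 600}
global_wl = {"www.cnn.com"}
records = [
    {"Sip": "10.0.0.5", "Dip": "198.51.100.9", "Dport": 21, "URL": "", "RATD": 602},
    {"Sip": "10.0.0.5", "Dip": "198.51.100.9", "Dport": 21, "URL": "", "RATD": 45},
    {"Sip": "10.0.0.5", "Dip": "203.0.113.7", "Dport": 80, "URL": "www.cnn.com", "RATD": 3},
    {"Sip": "10.0.0.5", "Dip": "192.0.2.44", "Dport": 8080, "URL": "", "RATD": 30},
]
kept = remove_whitelisted(records, internal, global_wl)
```

In this example the off-cycle transfer and the connection to the unlisted address remain in `kept`, while the regular batch transfer and the globally whitelisted site are dropped.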

3.5. Characteristics of C&C Channels and Their Detection Method
3.5.1. Characteristics of C&C Channels

Attackers use either an encrypted communications channel or an alternative communication method to hide C&C channels. When hosts infected with malicious code establish an Internet-enabled communications environment, a channel is created to communicate with a C&C server and, through this, the infected host receives an attack command from the C&C server or extracts vital data from the host on its internal network [7, 18–20]. A C&C server can have between 1 and N C&C channels with hosts, each involving repeated connection and standby requests. Because the communication cycles of C&C channels differ depending on the characteristics of the malicious code, it is impossible to detect all C&C channels using fixed, regular time intervals. Even though a C&C channel may exist between a host and a C&C server according to the configuration conditions, this does not mean that communication will always occur on a regular cycle. Moreover, a host infected with malicious code downloads the IP address of a new C&C server from the original C&C server and creates a communication channel with this new server. Figure 5 shows the connection and standby cycles of a C&C channel connection to a Linux/Xor.DDoS botnet.

3.5.2. C&C Channel Detection Method

Analysis of the real-time traffic occurring in the experimental environment allowed us to categorize C&C channels into three types. The first type either engages in socket communication with the external server via the Internet to exchange data or uses a file transfer program such as the File Transfer Protocol (FTP) to exchange data with anonymous or authenticated users. In instances such as this, in which data is exchanged with the external system, a high connection frequency is observed at regular time intervals. The second type is when a connection is made to the website through browsers to surf the web. The host connects to a specific website and regularly requests a service or requests a random service from an unspecified website. In the case where a service is requested via a website, the connection frequency has significant variation at random time intervals. The third type is when a host is infected by malicious code upon connecting to a botnet or malicious code distribution site and attempts to communicate with an external network. The infected host configures a communication channel with the C&C server and either receives an attack command or updates the configuration file. When communication with a C&C server occurs, the connection cycle varies according to the characteristics of the malicious code and exhibits a high connection frequency.

Further analysis of the network traffic characteristics over time confirms that, according to the traffic type, a difference occurs between the connection frequency and connection time intervals. Figure 6 presents the connection frequency and intervals for three types of traffic: a C&C channel used to communicate with a C&C server, communication with an external file exchange system to exchange data, and website service requests made upon connecting to a website. If it were possible to analyze the traffic collected in real time, as was done in the experimental environment, it would be possible to detect attacks that have singularities such as those of the C&C channel.

In this paper, the following attributes are used to analyze the characteristics of a C&C channel and its connection frequency to configure APChain: CND, RATD, ACTTI, and SCTTI. The time interval for the analysis of the traffic attributes is set to 30 minutes. This is because the average lead time for a host infected by malicious code to perform malicious activity is three hours; the goal is therefore to detect abnormal behavior using the proposed algorithm before the infected host can initiate malicious activities.

3.6. Characteristics of Pharming

Pharming is a type of cyber-attack in which name resolution, for example on a caching DNS server, is manipulated so that specific domain names resolve to a forged IP address. Users input a normal website address to request a web service but are instead connected to a fake site created by the attacker. The fake site has the same form as the normal website and obtains personal information such as a user’s login credentials and account information. Therefore, an effective method to detect pharming attacks is to identify real websites and detect whether the DNS cache information has been falsified.

The proposed methodology analyzes the traffic attribute information over time and categorizes pharming attacks. For example, when a host is infected by a pharming attack, even if it were to request a normal website, as in Figure 7, a fake website is delivered with a falsified domain name and, as (2) shows, the number of calls to the fake website increases at a specific point in time.

In this paper, to detect pharming attacks, the URL and IP address of the website the host is connecting to and the connection time are analyzed to configure APChain. APChain’s URL, Dip, CND, RATD, and CNTI are used as key elements in behavioral profiling to detect pharming attacks.
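
One way to picture the pharming check is to compare the destination IP observed for a URL against addresses previously known for that URL and to flag a mismatch once calls to the new address accumulate. This is only a sketch under stated assumptions: the `known_ips` baseline, the function name `suspect_pharming`, and the `cnti_threshold` value are hypothetical and do not reproduce the condition in (2).

```python
def suspect_pharming(records, known_ips, cnti_threshold=3):
    """Flag records whose URL is served from an address outside its known set.

    known_ips: {url: set of destination IPs previously seen for that URL}
               (a hypothetical trusted baseline, not part of the paper)
    """
    suspects = []
    for rec in records:
        trusted = known_ips.get(rec["URL"])
        if trusted is None:
            continue  # no baseline for this URL, so nothing to compare against
        # Assumed rule: unknown address for a known URL plus a burst of calls to it.
        if rec["Dip"] not in trusted and rec["CNTI"] >= cnti_threshold:
            suspects.append(rec)
    return suspects


# Hypothetical example data.
known_ips = {"bank.example": {"203.0.113.7"}}
records = [
    {"URL": "bank.example", "Dip": "203.0.113.7", "CNTI": 8},   # genuine site
    {"URL": "bank.example", "Dip": "198.51.100.66", "CNTI": 5},  # same name, new address
    {"URL": "bank.example", "Dip": "198.51.100.66", "CNTI": 1},  # too few calls so far
    {"URL": "news.example", "Dip": "192.0.2.1", "CNTI": 9},      # no baseline
]
flagged = suspect_pharming(records, known_ips)
```

A production check would also have to tolerate legitimate address changes, for instance sites served from content delivery networks, which this sketch ignores.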

3.7. Characteristics of IP-Spoofing DDoS Botnets

Because of the increase in network bandwidth and the development of hardware, recent DDoS attacks generate far larger volumes of traffic than have been seen in the past [21, 22]. Malicious code such as the IP-spoofing DDoS botnets Linux.Shelldos and Linux.Xor.DDoS receives attack commands from a C&C server and generates large volumes of packets that threaten to paralyze communications.

Security systems such as firewalls are set up to detect and block this type of attack. However, if a host infected by malicious code launches a large number of DDoS attacks, the internal network can still be paralyzed by the sheer volume of traffic, even when a security system detects and blocks the attacks. Therefore, if a host infected by an IP-spoofing DDoS botnet is not detected and eliminated quickly, the internal network will be subject to the threat for a very long period of time.

Figure 8(a) shows that the internal network is affected by a large amount of traffic originating from a host infected with a DDoS botnet. Figure 8(b) presents the traffic from the internal network using PRTG software. The infected host generates a large volume of traffic from the host to the external server and failures occur on the internal network. Figure 9 shows the packets generated from a host infected by an IP-spoofing DDoS botnet.

In this paper, to detect IP-spoofing DDoS botnets, the attributes of outbound traffic are analyzed, and the following values are used to configure APChain: Sip, Sport, MAC, CND, RATD, CNTI, ACTTI, and SCTTI. Additionally, behavioral profiling is used to confirm whether IP-spoofing has occurred or not and to identify the host IP infected by the botnet.
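
The idea of using the MAC attribute to identify a spoofing host can be illustrated as follows: if a single MAC address emits a high volume of outbound packets under many different source IP addresses, the source IPs are likely forged and the MAC points to the infected host. The function name and both thresholds are assumptions for illustration, and the sketch presumes that the mirrored traffic still carries the originating host's MAC address.

```python
from collections import defaultdict


def spoofing_suspects(records, min_packets=100, max_source_ips=1):
    """Return MAC addresses that send many packets under several source IPs.

    Both thresholds are hypothetical; they are not values from the paper.
    """
    source_ips = defaultdict(set)   # MAC -> distinct source IPs observed
    packets = defaultdict(int)      # MAC -> number of outbound packets
    for rec in records:
        source_ips[rec["MAC"]].add(rec["Sip"])
        packets[rec["MAC"]] += 1
    return sorted(
        mac for mac in source_ips
        if len(source_ips[mac]) > max_source_ips and packets[mac] >= min_packets
    )


# Hypothetical example: one host forging source IPs, one host behaving normally.
records = [{"MAC": "AA-AA-AA-AA-AA-AA", "Sip": f"192.0.2.{i}"} for i in range(150)]
records += [{"MAC": "BB-BB-BB-BB-BB-BB", "Sip": "10.0.0.8"} for _ in range(150)]
suspects = spoofing_suspects(records)
```

Here only the first MAC address is reported, because the second host sends the same volume of traffic from a single, stable source IP.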

4. System Model

In this section, we outline the proposed algorithm for detecting abnormal behavior and examine the abnormal behavior detection methods according to attack type in the form of case studies.

4.1. Collection of Network Traffic

Port mirroring is configured to the backbone switch for the collection of real-time network traffic. Port mirroring is set up to collect all of the traffic routed from the backbone switch to analyze the attribute information. Figure 10 presents a diagram outlining the collection of real-time network traffic.

4.2. Extraction of Attribute Information

Attribute information extracted from real-time traffic in the experimental environment presented in Figure 10 is stored in the flow log database (FLDB). The FLDB stores the features of the attributes extracted from the traffic headers; both normal and malicious traffic (e.g., Trojans and botnets) are included.

The collection of network traffic collected at the backbone switch is represented as $T = \{t_1, \ldots, t_{n-1}, t_n\}$, and the traffic attribute information is defined as $t = \{t_{ip}, t_{port}, t_{time}, t_{mac}, \ldots, t_{url}\}$. The size of the attribute information stored in the FLDB is calculated cumulatively as new attribute information is stored. The protocol stack for extracting packet attribute information is shown in Figure 11.
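
To make the storage step concrete, the sketch below keeps extracted attribute tuples in an in-memory SQLite table and reports the cumulative number of stored entries after each batch. The table name `flow_log`, its column layout, and the helper `store_flows` are assumptions for illustration; the paper does not specify the FLDB schema or database engine.

```python
import sqlite3

# In-memory database standing in for the FLDB (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flow_log (
           sip TEXT, sport INTEGER, dip TEXT, dport INTEGER,
           pt TEXT, ts TEXT, mac TEXT, url TEXT
       )"""
)


def store_flows(conn, flows):
    """Insert 8-tuples and return the cumulative number of stored entries."""
    conn.executemany("INSERT INTO flow_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)", flows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM flow_log").fetchone()[0]


# Hypothetical example flows.
first_batch = [
    ("10.0.0.5", 49152, "203.0.113.7", 80, "TCP",
     "2016-08-12 07:05:28", "D0-27-88-47-15-4B", "www.cnn.com"),
    ("10.0.0.5", 49153, "198.51.100.9", 21, "TCP",
     "2016-08-12 07:05:40", "D0-27-88-47-15-4B", ""),
]
second_batch = [
    ("10.0.0.6", 50000, "203.0.113.7", 443, "TCP",
     "2016-08-12 07:06:02", "D0-27-88-47-15-4C", ""),
]
size_after_first = store_flows(conn, first_batch)
size_after_second = store_flows(conn, second_batch)
```

The returned count grows with every batch, mirroring the cumulative size calculation described above.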

4.3. Attack Pattern Chain (APChain) Creation

The APChain algorithm configures a chain of traffic attribute information over time. Behavioral profiling using APChain is then conducted to detect abnormal behavior. Table 3 shows the algorithm used to configure APChain using traffic attribute information extracted from the protocol stack (Figure 11).

The source IP, source port, destination IP, destination port, access time, and MAC value are extracted from the network traffic, and the URL information from the payload is analyzed to configure APChain. Additionally, the connection time and connection frequency of traffic that have the same target IP address are calculated and stored within APChain.
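
The derived attributes can be sketched directly from their definitions in Table 1. The function below is a minimal illustration, not the prototype's implementation: it assumes packets arrive as time-sorted `(ts, sip, dip)` tuples with `ts` in seconds, uses the population standard deviation for SCTTI, and labels groups in order of first appearance. All of these choices, and the name `build_apchain`, are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

WINDOW_SECONDS = 30 * 60  # 30-minute analysis window used for CNTI, ACTTI, SCTTI


def build_apchain(packets):
    """Build APChain records from (ts, sip, dip) tuples sorted by time."""
    chain = []
    seen = defaultdict(list)   # dip -> timestamps of connections so far
    groups = {}                # dip -> group label (SGN)
    prev_ts = None
    for number, (ts, sip, dip) in enumerate(packets, start=1):
        times = seen[dip]
        tin = 0 if prev_ts is None else ts - prev_ts    # gap to the previous packet
        ratd = ts - times[-1] if times else 0           # gap to the same target IP
        times.append(ts)
        recent = [t for t in times if ts - t <= WINDOW_SECONDS]
        gaps = [b - a for a, b in zip(recent, recent[1:])]
        if dip not in groups:
            groups[dip] = f"G{len(groups) + 1:05d}"
        chain.append({
            "SN": f"S{number:05d}",
            "SGN": groups[dip],
            "Sip": sip, "Dip": dip, "TS": ts,
            "TIN": tin,
            "CND": len(times),                      # all connections to this target
            "RATD": ratd,
            "CNTI": len(recent),                    # connections within the window
            "ACTTI": mean(gaps) if gaps else 0,     # average interval in the window
            "SCTTI": pstdev(gaps) if gaps else 0,   # spread of the interval
        })
        prev_ts = ts
    return chain


# Hypothetical example: one host contacting two destinations.
example = build_apchain([
    (0, "10.0.0.5", "203.0.113.7"),
    (60, "10.0.0.5", "203.0.113.7"),
    (90, "10.0.0.5", "198.51.100.9"),
    (120, "10.0.0.5", "203.0.113.7"),
])
```

A perfectly periodic connection, as in this example, yields an SCTTI of zero, which is the kind of regularity that later profiling steps can look for.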

Each APChain record is 102 bytes in size, and the number of APChain records created equals the number of inbound-to-outbound transfers. Figure 12 shows the results of APChain creation using attribute information from the collected traffic.

4.4. Abnormal Behavior Detection Using Behavioral Profiling

In this section, we will review the process and algorithm for behavior profiling using APChain and investigate the detection of abnormal behavior using three case studies.

4.4.1. C&C Channel Detection [Case Study A]

Hosts are infected with malicious code by malicious code distribution sites, phishing emails, or social networks. Infected hosts attempt to connect to receive an attack command from a C&C server, to update the C&C server list, or to update the binary file. At this time, the C&C channel uses random ports higher than the known ports and configures a channel with at least one C&C server. Therefore, one method to effectively block botnet attacks is to detect the communication channel with the C&C server and eliminate it before the host infected by malicious code can manifest an attack. The hypothesis that we established for C&C channel detection is presented in Table 4. C&C channel detection results are verified in the experiments.


Hypothesis 1. C&C channel detection

Given an environment:
Let $t \in T$, where $t$ represents the network traffic currently being analyzed.
Let $h \in H$, where $h$ represents a host infected by malicious code.
Let $s \in S$, where $s$ represents a C&C server.
$HS$: the infected host attempts to connect to the C&C server.
A host infected with a botnet creates a C&C channel in order to communicate periodically with the C&C server. As a result, the frequency with which the host connects to a particular system increases, and if this pattern is repeated often enough, it can be considered unusual traffic. During this exchange, the host receives attack commands or updated binary files from the C&C server while repeating the connection requests and responses.
Therefore, the set (G) of C&C channels can be formed by detecting and grouping the requests from hosts connecting to the C&C server.
$G = \{H_1S_1, H_2S_2, \ldots, H_nS_n\}$

A botnet contains the commands used to trigger communication between the host and the C&C server, and the communication method for creating the C&C channel and the connection cycle are configured in it. Therefore, if we can detect hosts infected by malicious code connecting to a C&C server, we can block the attack before it is actually carried out.

In this paper, the following process is used to detect a C&C channel between a host infected by a botnet and a C&C server. First, the whitelist-based false positive elimination process checks whether the traffic to be analyzed contains a whitelisted IP address. If the IP address is on the list, the traffic is categorized as normal; if not, it is analyzed further for abnormal behavior. Second, the frequency (CNTI) with which the host connects to the target system is calculated and compared with the threshold value. If the CNTI value is higher than the threshold value, communications are considered to be occurring regularly, and the analysis moves on to the next step. Third, the connection time interval (RATD) for traffic that has the same target IP address is compared with the average connection time interval (ACTTI), which is calculated every 30 minutes. The standard deviation of the average connection time interval (SCTTI) is also calculated, and if the RATD satisfies the resulting condition on ACTTI and SCTTI, the corresponding traffic is suspected to be a C&C channel. Figure 13 displays the distribution of the connection frequencies and connection cycles of traffic suspected to be C&C channels. Finally, if the connection frequency of a bot to a suspected C&C channel exceeds the threshold value, the traffic is categorized as abnormal behavior.

Normal packets and abnormal packets can be categorized by analyzing the network traffic collected in the experimental environment shown in Figure 10, and traffic suspected as being a C&C channel can be categorized according to the hypothesis defined in Table 4.

Figure 13 presents the connection frequency between a host and a suspected C&C channel. Let A denote APChain's ACTTI, S its SCTTI, and x the APChain connection frequency; the traffic suspected as being a C&C channel, C, is expressed in terms of these quantities. ACTTI is used as the reference value for C&C channel detection. It is not a fixed value; it varies according to the connection cycle of the target IP address. Table 5 presents the algorithm used to detect a C&C channel. The proposed algorithm was developed into a prototype, and its performance is verified in the experiment and evaluation sections of this paper.


Algorithm 3. C&C channel detection

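/* T: set of packets */
/* C: set of APChain fields */
Function Behavioral_Profiling (T):
    while (not stop condition) do
        for each packet t in T do
            if (IP address of t is not on the whitelist) then
                if (CNTI exceeds the threshold value) then
                    if (RATD satisfies the condition on ACTTI and SCTTI) then
                        if (accumulated connection count exceeds the threshold value) then
                            return suspected C&C channel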
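The four checks described for C&C channel detection (whitelist, connection frequency, interval regularity, and a final threshold) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' prototype. The regularity test used here, an interval within one standard deviation of the mean, is a stand-in, because the exact condition on RATD, ACTTI, and SCTTI is not given in the text. The function name, the input layout, and both default thresholds are likewise hypothetical, and the 30-minute recalculation window is omitted.

```python
from statistics import mean, pstdev


def detect_cc_channels(connections, whitelist, freq_threshold=10, hit_threshold=5):
    """Return (host, destination) pairs suspected of being C&C channels.

    connections: dict mapping (host_ip, dst_ip) to a time-sorted list of
                 connection timestamps in seconds (hypothetical layout).
    """
    suspects = []
    for (host, dst), times in connections.items():
        # Step 1: whitelisted destinations are treated as normal traffic.
        if dst in whitelist:
            continue
        # Step 2: the connection frequency must exceed the threshold.
        if len(times) <= freq_threshold:
            continue
        # Step 3: compare each interval with the average interval.
        intervals = [later - earlier for earlier, later in zip(times, times[1:])]
        actti = mean(intervals)    # average connection time interval
        sctti = pstdev(intervals)  # its standard deviation
        # ASSUMPTION: "regular" means within one standard deviation of the mean.
        hits = sum(1 for i in intervals if actti - sctti <= i <= actti + sctti)
        # Step 4: enough regular connections mark the pair as abnormal.
        if hits > hit_threshold:
            suspects.append((host, dst))
    return suspects
```

With this band, bursty but frequent traffic also collects hits, so the two thresholds carry most of the discriminating power in this simplified form.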

4.4.2. Pharming Attack Detection [Case Study B]

Pharming attacks target specific individuals or organizations and infect them via spear phishing or malicious code distribution sites. Pharming attacks falsify Windows environment files or DNS addresses, so even if a host infected by pharming were to call a fake site, the intrusion response system would categorize the corresponding traffic as normal. The hypothesis for detecting a pharming attack is shown in Table 6.


Hypothesis 2. Pharming attack detection

Given an environment:
Let $t \in T$, where $t$ represents the network traffic currently being analyzed.
Let $h \in H$, where $h$ is a host infected with malicious code.
Let $s \in S$, where $s$ is a fake website.
$HS$: the infected host connects to the fake website.
The user requests a legitimate website, but a host infected by pharming redirects the connection to a fake website. Consequently, when the URL of a legitimate website is requested, a connection to a specific destination IP address may be suspected as a pharming attack.
Therefore, we analyze the URLs requested by hosts and compare them to the IP addresses of the destination websites. The set (G) contains hosts that request the same destination IP addresses.
$G = \{H_1S_1, H_2S_2, \ldots, H_nS_n\}$

The pharming detection algorithm proposed in this paper analyzes whether the site that the host connects to is fake. For example, when a host attempts to connect to a website with a specific IP address through different domain names, a pharming attack can be suspected. The average time interval of the connection to a fake website is RATD, and the connection frequency is CNTI. Given a threshold y, the traffic suspected as pharming, P, is expressed in terms of RATD, CNTI, and y. Table 7 presents the algorithm used to detect a pharming attack through behavioral profiling.


Algorithm 4. Pharming attack detection

/* T: set of packets */
/* C: set of APChain fields */
Function Behavioral_Profiling (T):
while (not stop condition) do
for T is not null do
if then
if then
/* Analysis of connections to websites suspected as pharming and connections to a specific target IP address */
return pharming
Function n-gram(C):
if then
/* Accuracy is increased using the n-gram algorithm */
return true
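The pharming heuristic described above, a single destination IP address reached through different domain names, can be illustrated with the Python sketch below. The text says only that an n-gram algorithm is used to increase accuracy, not how; the use of character bigram Jaccard similarity to separate ordinary aliases (similar names) from unrelated names is therefore an assumption of this example, as are the function names, the input tuple layout, and the 0.5 similarity cutoff.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Set of character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard similarity of the two strings' character n-gram sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def suspect_pharming(requests, min_domains=2, max_similarity=0.5):
    """Return destination IPs reached through dissimilar domain names.

    requests: iterable of (host_ip, requested_domain, dst_ip) tuples
              (hypothetical layout).
    """
    domains_by_ip = defaultdict(set)
    for _host, domain, dst_ip in requests:
        domains_by_ip[dst_ip].add(domain)

    suspects = []
    for dst_ip, domains in domains_by_ip.items():
        names = sorted(domains)
        if len(names) < min_domains:
            continue
        # ASSUMPTION: aliases of one site look alike; unrelated names do not.
        dissimilar = any(
            ngram_similarity(a, b) < max_similarity
            for i, a in enumerate(names)
            for b in names[i + 1:]
        )
        if dissimilar:
            suspects.append(dst_ip)
    return suspects
```

A production detector would also need the frequency and interval conditions mentioned in the text; they are left out here to keep the domain-to-IP comparison visible.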

4.4.3. IP-Spoofing DDoS Botnet Detection [Case Study C]

As network infrastructure and hardware have advanced, the large volume of traffic that a small number of systems can generate has become the main threat to the stability of an internal network. In particular, if malicious code such as Windows- or Linux-related DDoS botnets is not detected in time, large volumes of traffic are generated on the internal network, and the entire network may shut down. It is therefore imperative to detect and effectively eliminate hosts infected with malicious code. However, hosts that carry out IP-spoofing DDoS attacks are not easy to detect, and related research is limited. In this paper, we configure APChain and propose a method to detect IP-spoofing DDoS botnets through behavioral profiling. To maintain a communication channel with a C&C server, a host infected with an IP-spoofing DDoS botnet attempts to connect and, if successful, moves to a standby state to receive attack commands. When the host receives an attack command from the C&C server, it generates a large volume of dummy packets and sends them to the attack target. Each packet carries a spoofed source IP address and dummy data in its payload. The proposed methodology uses APChain's connection frequency (CND), connection time interval, and average connection time interval (ACTTI) values to conduct behavioral profiling and detect DDoS attacks. The hypothesis employed to detect IP-spoofing DDoS botnets is shown in Table 8.


Hypothesis 3. IP-spoofing DDoS botnet detection

Given an environment:
Let $t \in T$, where $t$ represents the network traffic currently being analyzed.
Let $h \in H$, where $h$ is a host infected with malicious code.
Let $d \in D$, where $d$ represents a target system for a DDoS attack.
$HD$: the infected host executes a DDoS attack on the target system.
A host infected with an IP-spoofing DDoS botnet receives an attack command from the C&C server and carries out a DDoS attack, and the source IP address of the attacking host is modified. A DDoS attack can then be suspected if the host sends large amounts of traffic to the destination system. In addition, an IP-spoofing DDoS botnet can be identified if a particular host uses different source IP addresses but the same MAC address.
Therefore, the set (G) consists of hosts that perform IP-spoofing DDoS attacks.
$G = \{H_1D_1, H_2D_2, \ldots, H_nD_n\}$

The algorithm proposed to detect IP-spoofing DDoS botnets identifies hosts that have different source IP addresses but the same MAC address. For example, let CNTI be the traffic frequency for a host connecting to the target IP address, ACTTI the average connection cycle, and y the threshold value; the traffic suspected to be a DDoS attack, D, is expressed in terms of these quantities. If such traffic has different source IP addresses but the same MAC address, it is presumed to originate from an IP-spoofing DDoS botnet.

To identify the infected host, the MAC address associated with the suspected spoofed IP address is searched for in the MAC field of the APChain table, and the record that was first registered in the APChain table is extracted to confirm the source IP address. The host whose IP address matches this source IP address is assumed to be the bot. Table 9 summarizes the IP-spoofing DDoS botnet detection algorithm using behavioral profiling.


Algorithm 5. IP-spoofing DDoS botnet detection

/* T: set of packets */
/* C: set of APChain fields */
Function Behavioral_Profiling (T):
while (not stop condition) do
C ← Call Function APChain (T)
for T is not null do
if then
if then
accVal1 += 1
if then
if then
if then
return Call Function infection_host ()
Function infection_host ()
while (not stop condition) do
if then
return
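The MAC-based identification of a spoofing host can be sketched in Python as follows. The sketch assumes a simplified view of the APChain table as (time, source IP, MAC) tuples and uses a plain packet count as the volume condition; the function name, the tuple layout, and the default thresholds are hypothetical and stand in for the CNTI and ACTTI conditions, whose exact form is not given in the text.

```python
from collections import defaultdict


def find_spoofing_hosts(records, min_packets=100, min_distinct_ips=2):
    """Map each suspicious MAC address to its first registered source IP.

    records: iterable of (time, src_ip, mac) tuples (hypothetical layout).
    A MAC is suspicious when it sent at least min_packets packets using at
    least min_distinct_ips different source IP addresses.
    """
    packets_by_mac = defaultdict(int)
    ips_by_mac = defaultdict(set)
    first_ip = {}  # MAC -> source IP of the earliest record
    for _time, src_ip, mac in sorted(records):
        packets_by_mac[mac] += 1
        ips_by_mac[mac].add(src_ip)
        first_ip.setdefault(mac, src_ip)
    return {
        mac: first_ip[mac]
        for mac in first_ip
        if packets_by_mac[mac] >= min_packets
        and len(ips_by_mac[mac]) >= min_distinct_ips
    }
```

Taking the earliest record per MAC follows the description that the first registered record reveals the host's real source IP address; this only holds if the host was observed before it began spoofing.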

5. Experimental Evaluation

The test bed was prepared as shown in Figure 10, and a prototype was developed to evaluate the performance of the proposed algorithm. Real-time network traffic collected from the test-bed environment and datasets downloaded from the Malware Capture Facility Project (MCFP) were used as data for the experiment. The dataset included botnets that create and utilize C&C channels.

In this section, the following four key aspects are covered:

(i) Explanation of the datasets used in the experimental environment

(ii) Evaluation of the accuracy of the proposed algorithm

(iii) Measurement of the performance of the developed prototype

(iv) Effectiveness and accuracy of the experimental results

5.1. Experimental Environment and Performance

The experimental test environment is divided into a server farm domain, a user domain, and a branch office. Real-time traffic is collected from the server farm domain and the user domain. The intranet bandwidth is 10 Gbps, and the Internet environment bandwidth is 2 Gbps. The experimental environment for performance evaluation is presented in Table 10.


| Environment | Description |
|---|---|
| Servers | 100 active servers in a server farm |
| Host | 1,500 active hosts in the internal network |
| Bandwidth | 10 Gbps internal network and 2 Gbps external network |
| Traffic flow | 530 GB of data monthly |

The prototype was implemented in an environment consisting of an Intel i7 8-core CPU, 16 GB of RAM, and an 8 TB HDD with Java used as the programming language.

5.2. Test Dataset

The datasets used in the experiment are network traffic collected in real-time from the test-bed environment and traffic representing malicious behavior from the MCFP [23, 24]. The datasets are explained below.

Dataset_1 consists of traffic collected and analyzed in real time from the experimental environment shown in Figure 10. The experimental environment comprises 100 servers and 1,500 hosts; the collected data include network traffic from the server farm and user domains. The dataset contains traffic flows collected over a period of six months, from January 2017 to June 2017. It includes malicious code that corresponds to the attack types specified in the System Model section. The traffic flow consists of 150,000,000 records and is 3.185 TB in size; its characteristics are presented in Table 11.


| Environment | Description |
|---|---|
| Period of traffic collection | From January to June 2017 |
| Number of records | 150,000,000 |
| Collected record size | 3.185 TB |
| Servers and Hosts | 100 servers, 1,500 hosts |
| Flow type | Botnet, normal |
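As a quick consistency check on these figures, the reported monthly volume and the six-month total agree, and the total implies an average size per record. The calculation below assumes decimal units (1 TB = 10^12 bytes) and treats every record as contributing equally, which is a simplification.

```latex
530\ \text{GB/month} \times 6\ \text{months} = 3180\ \text{GB} \approx 3.18\ \text{TB}
\qquad
\frac{3.185 \times 10^{12}\ \text{B}}{1.5 \times 10^{8}\ \text{records}} \approx 2.12 \times 10^{4}\ \text{B per record}
```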

The scenarios used in Dataset_1 are summarized in Table 12. Dataset_1 includes the header information (source IP, destination IP, protocol, etc.), MAC address, and URL of packets extracted from the network traffic. It includes both normal and malicious traffic.


| Scenario | Capture name | Size | Threat type |
|---|---|---|---|
| S01 | KU-Malware-01 | 2.6 GB | Bot |
| S02 | KU-Malware-02 | 4.3 GB | Bot |
| S03 | KU-Malware-03 | 1.4 GB | Bot |
| S04 | KU-Malware-04 | 412 MB | Bot |
| S05 | KU-Malware-05 | 1.3 GB | Pharming |
| S06 | KU-Malware-06 | 5.4 GB | Pharming |
| S07 | KU-Malware-07 | 354 MB | Pharming |
| S08 | KU-Malware-08 | 4.3 GB | DDoS bot |
| S09 | KU-Malware-09 | 3.8 GB | DDoS bot |
| S10 | KU-Malware-10 | 2.8 GB | DDoS bot |

Scenarios S01 and S02 contain packets from a system located in the DMZ on which malicious code was installed through a backdoor using a seized account; the traffic includes a communications channel with a C&C server. In S03, a web application system (WAS) administrator account was left set to its initial values; after the attacker seized the system, a webshell was uploaded to install malicious code. S04, S05, S06, and S07 are scenarios in which a host that connected to a malicious code distribution site is infected with malicious code. Scenarios S08, S09, and S10 involve a host infected with malicious code that maintains a communication channel with a C&C server and conducts an IP-spoofing DDoS attack [25].

Dataset_2 is malicious code traffic distributed by the MCFP [23]. The MCFP is a research project created by the Czech Technical University ATG Group with the objective of capturing, analyzing, and distributing malicious code traffic, and it distributes datasets to assist in the development of various detection methods. The malicious code traffic distributed by the MCFP can be downloaded from its website [1] and includes a PCAP file, a netflow file, and a readme file. However, because the PCAP files contain personal information, not all of the data are provided. The PCAP files used in the experiment are those whose traffic contains information about the C&C server. The MCFP dataset used in the experiment is summarized in Table 13.


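| Scenario | Capture name | Size | Bot |
|---|---|---|---|
| S11 | CTU-Malware-13 | 305 MB | Murlo |
| S12 | CTU-Malware-42 | 5.75 GB | Neris |
| S13 | CTU-Malware-44 | 4.78 GB | Rbot |
| S14 | CTU-Malware-46 | 371 MB | Virut |
| S15 | CTU-Malware-47 | 3.05 GB | Menti |
| S16 | CTU-Malware-78 | 6.3 GB | Zeus |
| S17 | CTU-Malware-116 | 317 MB | Kazy |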

Dataset_2 (Table 13) contains traffic from Trojan horses that are installed via user e-mail, messenger applications, or reference libraries. Scenarios S12 and S13 contain the Neris and Rbot bots, and scenarios S14 and S15 contain the Virut and Menti bots. The PCAP file used for the analysis contains both normal and botnet traffic and was captured on the main router of the university network.

5.3. APChain Creation

The configuration of APChain is critical to detecting abnormal behavior and is an important factor in detection speed and accuracy. APChain extracts attribute information from real-time traffic, performs additional analysis, and links the attribute information over time. Figure 14 shows the time (GT) needed to generate APChain records per packet and the size (FS) of the APChain table.

According to the experimental results in Figure 14, the APChain algorithm requires between 102 and 115 bytes of storage per packet to generate the APChain table, and 100,000 packets require a storage capacity of 10.9 MB. Between 5 and 8 milliseconds are needed to analyze the attribute information of a packet and generate a record in the APChain table.

Of the roughly 6 milliseconds needed overall to generate an APChain record, the ACTTI and SCTTI fields account for 3 milliseconds and 2 milliseconds, respectively. Therefore, to apply the proposed methodology to a real-time network environment, the ACTTI and SCTTI fields of the APChain algorithm must be calculated efficiently.
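The storage and timing figures above can be checked with a short calculation. The Python sketch below assumes decimal megabytes (1 MB = 10^6 bytes) and strictly sequential record generation; both are assumptions of this example, and the helper names are hypothetical.

```python
def apchain_storage_mb(packets, bytes_per_record):
    """Storage for one record per packet, in decimal megabytes (assumed unit)."""
    return packets * bytes_per_record / 1_000_000


def records_per_second(ms_per_record):
    """Sequential throughput if each record takes ms_per_record milliseconds."""
    return 1000 / ms_per_record


# Bounds implied by the reported 102 to 115 bytes per packet.
low_mb = apchain_storage_mb(100_000, 102)
high_mb = apchain_storage_mb(100_000, 115)

# Throughput range implied by the reported 5 to 8 milliseconds per record.
slowest = records_per_second(8)
fastest = records_per_second(5)
```

Under these assumptions, the reported 10.9 MB for 100,000 packets lies between the two storage bounds, and a single sequential worker would generate on the order of a hundred to a few hundred records per second, which is why the cost of the ACTTI and SCTTI calculations matters for real-time use.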

5.4. Performance Evaluation

In this section, the proposed algorithm is applied to the attack types defined in the system model in order to detect these attacks and evaluate its performance.

5.4.1. C&C Channel Detection [Case Study A]

Hosts infected by a botnet maintain a communication channel with a C&C server, through which they receive attack commands or configuration updates before the attack is initiated. Therefore, if the channel between the infected host and the C&C server can be detected before important information is breached, attacks can be detected or blocked within the lead time they need to succeed. Accordingly, the hypothesis used to detect C&C channels is defined in Table 4, and the proof for the hypothesis is provided in Table 14. The performance of the proposed algorithm is verified using experiments, the results of which are presented in Table 15.


Proof 1. Detection of C&C channels

Given an environment,
$h \in H$, where $h$ is a host infected with malware.
$s \in S$, where $s$ is a C&C server.
If the connection cycle from the host to the destination system satisfies condition (x), then the accumulated count (ac) is calculated, and if the accumulated count satisfies the threshold, the traffic is defined as a C&C channel.