Abstract

As Advanced Persistent Threat (APT) attacks become increasingly frequent around the world, security experts are starting to look at how to observe, predict, and mitigate the damage they cause. Meanwhile, Internet of things (IoT) devices are vulnerable and heavily exposed to the Internet, which makes them easy for hacker organizations to use in launching APT attacks. A skilled attacker can take down millions of IoT devices in a short time. Once an IoT botnet is built, attackers can use it to launch complex attacks that could damage Internet infrastructure and cause network disconnection. This paper proposes GroupTracer, a framework for observing and predicting IoT attacks. GroupTracer is designed to automatically extract TTP profiles (i.e., tactics, techniques, and procedures), which describe the behavior of attackers, and to dig out the potential attacker groups behind complex attacks. First, it captures attacks with IoT honeypots and extracts relevant fields from logs. Then, attack behaviors are automatically mapped to the ATT&CK framework to achieve automatic TTP profile extraction. After that, GroupTracer builds four feature groups (TTP profile, Time, IP, and URL features) with a total of 18 features, mines potential attack groups through a hierarchical clustering algorithm, and compares the clustering results with two baseline algorithms. As the ground truth labels are unknown, we apply three internal validation indexes to evaluate the clustering quality. Experimental results show that the proposed framework performs well in uncovering potential groups, with the Calinski–Harabasz index reaching 3416.93. Finally, attack trees are generated for each cluster, where nodes indicate attack commands and edges represent command sequences. These attack trees could help better understand each attack group’s actions and techniques.

1. Introduction

The Global Research and Analysis Team (GReAT) at Kaspersky points out that Advanced Persistent Threat (APT) activity has become increasingly complex and destructive [1], since APT groups launch targeted attacks on critical infrastructure and attempt to compromise central networks. Meanwhile, the Internet of things (IoT) has become the foremost security threat to personal privacy, corporate information security, and even critical infrastructure, since IoT devices are inherently vulnerable and easy to exploit while being heavily exposed to the Internet. What is worse, attackers can use open-source tools to quickly assemble malware that can scan, penetrate, and control IoT devices. Skilled hackers can take down millions of IoT devices in a short time. Once IoT botnets are formed, attackers can launch an APT attack that endangers Internet infrastructure and causes network disconnections (e.g., the Dyn cyberattack [2] and the VPNFilter event [3]). Because the number of attacks on IoT devices, which are perfect tools for APT attacks, has risen dramatically, the emerging challenge is how to observe and predict attacks on IoT devices by individuals or even attacker groups.

1.1. Describing Individual Behavior

Behavior detection methods for attacks are mostly based on Indicators of Compromise (IOCs) extracted from rule-based methods or traditional blacklists. However, the information conveyed by such IOCs is not enough to describe the rich and varied network security environment, for the following reasons:
(i) IOCs are unstable and easily changed by attackers. For example, if adversaries use an anonymous proxy service such as Tor, they can change IPs frequently with little effort and without being noticed.
(ii) IOCs cannot express how the attacker interacts with the victim system, so the process of the attack cannot be represented.
(iii) Redundancy occurs when IOCs are used to express an attack. In other words, more IOCs do not necessarily lead to a better description.

Bianco proposed the Pyramid of Pain [4], in which each level of the pyramid represents a different type of attack indicator used to detect the activities of the adversary, and the most valuable attack indicator is the attacker’s TTPs. A TTP profile [5] describes the flow that adversaries go through to accomplish their mission, from initial access to impact and every step in between, and is rich enough to support a comprehensive analysis of the aggressive behaviors of individuals or attack groups. Meanwhile, defense is shifting from vulnerability-centric to threat-centric. A flexible and efficient security architecture can only be constructed with a sufficient understanding of the threats to critical assets, which depends on an overall comprehension of the attack tactics, techniques, and behavior patterns (i.e., TTPs). However, at this stage, there is no mature method to normalize the description of attacks on IoT devices and map them to the analysis model. A method for automatic TTP profile extraction of IoT device attacks is therefore needed.

1.2. Clustering Attackers into Groups

With the rapid growth of APT activities, the threat landscape is evolving from single hackers to well-organized attack actor groups (e.g., Darkhotel [6] and Turla [7]). How to find and depict the behavior of an attack actor group among an ocean of attacks becomes a challenge. Behavioral analysis in sandboxes [8, 9] and binary analysis [10, 11] seem like promising approaches, as they can match malicious samples used by attackers to known or novel malicious families and capture their behaviors to observe the similarities between these attackers. However, malicious families and attack groups have a many-to-many relationship, so we cannot rely on the analysis of malicious samples alone to find the group behind attacks. Considering the excellent performance of data-driven approaches in the field of network security [12–14], we try to tackle the challenges from a data-driven perspective.

Given the challenges presented above, this paper aims to develop knowledge bases that map attacker payloads to the ATT&CK framework, in order to extract TTP profiles and generate behavior fingerprints for attackers and thereby discover the groups behind active campaigns. The ultimate purpose is to observe the behavior of attack actor groups and predict attacks in the Internet of things.

1.3. Contributions

The three main contributions of the paper are as follows:
(i) Comprehensive Description of Attacker Behavior. GroupTracer leverages four feature groups (TTP profile, Time, IP, and URL) derived from log data to characterize different actions of attackers, which addresses the emerging challenge of observing and predicting attacks on IoT devices by individuals. The TTP profile depicts the techniques, tactics, and procedures of the attacker. The Time feature group provides statistical characteristics based on attack duration, number of attacks, and time zone of the attacker. The IP and URL feature groups both involve the type of IP/URL and the malicious index, while the latter also analyzes the downloaded file.
(ii) Automatic TTP Profile Extraction. Considering that the data source is honeypot log data, which collects the payloads used by attackers, we construct a 1st and a 2nd knowledge base, which store the mappings between commands and TTPs. By using these knowledge bases, GroupTracer maps commands derived from payloads to the ATT&CK framework to extract the TTP profile, which bridges the gap between cyber threat intelligence (CTI) and the attacker.
(iii) Group Clustering and Attack Tree Generation. GroupTracer combines the four feature groups with a hierarchical clustering algorithm to build an attack group clustering model that aims to find the potential groups behind complex attacks. To better understand each attack group’s behaviors, GroupTracer also introduces an attack tree construction method in which nodes describe attack commands and edges represent command sequences. The evaluation shows that GroupTracer performs well, with the Calinski–Harabasz index reaching 3416.93.

The remainder of this paper is organized as follows: Section 2 explains the related work about the fundamental techniques used in our framework. Section 3 presents the data collection, flow of feature processing, application of clustering algorithm, and attack tree creation in GroupTracer. The entire experiment and evaluation process is elaborated in Section 4. Finally, the conclusion and future work are discussed in Section 5.

2. Related Work

2.1. Application of IoT Honeypot

To study cyberattacks and defend against them, tools for proactive defense have been proposed. For instance, honeypots, which can capture attacks, document intrusion information about the tools and behaviors of hackers, and prevent attacks from spreading beyond the compromised system [15], have been widely used in cybersecurity. Due to the vulnerable and destructive nature of IoT devices [16–18], the number of IoT honeypots based on different protocols is rapidly increasing, and several IoT honeypots already exist [19]. The work in [20] uses IoTPOT, a novel honeypot that simulates Telnet-enabled IoT devices, handles commands sent by attack actors, analyzes malicious families on different CPU architectures, and provides an in-depth analysis of ongoing attack behavior. However, IoTPOT focuses on observing the characteristics of malicious families (e.g., spread tendency and ultimate goal) and the relationships between these families. It does not use existing data to analyze in detail the behavior of the aggressors behind attacks and the associations between them, which is the center of cyber threat intelligence. A honeypot that emulates a ZigBee gateway and aims at assessing ZigBee attack intelligence and IoT cyberattack behavior is proposed in [21]. Although that paper analyzes the commands in the honeypot data at great length and classifies them into six categories of attacks, it does not mine the TTPs of these attacks, which would help analysts in threat modeling. Heo and Shin [22] analyze connection-level log data to study Telnet service scanning and provide solid evidence for the existence of IoT botnets. However, their dataset contains only connection metadata, so there is no way to analyze the payload in the packets, which is critical evidence of an attack, and the results are therefore not entirely convincing.

In conclusion, most published methodologies have focused on a single service such as Telnet or ZigBee and analyzed features of malicious families. GroupTracer applies more broadly to protocols in which command execution vulnerabilities occur and depicts the characteristics of attack behaviors. Besides, most previous studies have not analyzed payloads at the TTP level, and some have not analyzed payloads at all. GroupTracer extracts attack techniques, tactics, and procedures (TTPs) from payloads and uses payloads to build attack trees for potential attack groups to demonstrate their attack behaviors more specifically.

2.2. Cyber Threat Intelligence and TTP Extraction

Gartner defines cyber threat intelligence (CTI) as evidence-based knowledge, which can be utilized to inform decisions concerning the subject’s response to menace or compromise [23]. With the rapid evolution of the cyber threat landscape, the demand for high quality and fast speed of CTI exchange that allows organizations to respond to emerging threats at the tactical level is becoming increasingly urgent. TTP describes the techniques, tactics, and attack patterns used by the adversary and can be presented in structured text formats that meet the high demand. Husari et al. [24] develop TTPDrill that can achieve the automatic and context-aware analysis of CTI to generate TTPs precisely. Their work bridges the gap between unstructured cyber threat intelligence and structured techniques, tactics, and procedures. Nevertheless, their data source is the cyber threat intelligence, which means that only after CTI is produced, can TTPDrill construct a complete attack pattern. Our work aims at decreasing the time-to-defend even more.

2.3. Group Cluster

Clustering and correlation have been studied extensively and are employed in a multitude of data-driven domains, including security and privacy [25]. In a similar direction to this paper, the work in [26] applies an unsupervised method to characterize and classify security-related anomalies and attacks observed in honeypots without a learning phase, labeled traffic, or an attack signature database. Cho et al. [27] compare the similarity of distributed domains to predict membership in the same group, which makes it possible to respond to future attacks. Azevedo et al. correlate IOCs from different OSINT feeds and cluster them to obtain enriched IOCs. This work allows the identification of attacks that would be impossible to identify by analyzing IOCs individually. One work that inspires us comes from Ghiëtte et al. [28]. They dissect the SSH protocol to fingerprint tools based on cipher suites and SSH version strings, employing key exchange algorithms and SSH banners to cluster similar tool usage into collaborating individuals and even campaigns. However, as noted in [4], adversaries can employ or create another tool with the same capability to evade detection.

By comparison, GroupTracer employs honeypot log data, in which timestamps, IP addresses, and payloads sent by attackers are usually recorded, and considers four different perspectives (e.g., TTPs and Time) to generate feature groups. For example, it generates TTP profiles by mapping payloads to the ATT&CK framework based on command characteristics. Then, it clusters similar adversaries’ behaviors into groups based on these features. Our work draws on the strengths of the mentioned studies and addresses their weaknesses.

3. Framework

The ultimate goal of this paper is to automatically extract TTP profiles and cluster attack groups in the Internet of things. Figure 1 overviews the flow of GroupTracer. Firstly, it captures attacks, generates raw data, and extracts features from specific fields (e.g., timestamp, payload, and timezone). We deploy numerous honeypots on the Internet to capture attacks. Secondly, it enriches these features. As for generating the TTP profile feature group, it cuts payload into commands, maps these commands to the ATT&CK framework, and then generates Abstract Syntax Tree of the commands for a second mapping to techniques and tactics. After generating all feature groups, encoding and TF-IDF algorithm are utilized to vectorize these string-type features. Thirdly, it combines all feature vectors and leverages the hierarchical clustering algorithm to cluster these attackers into groups. Finally, attack trees for each group, where nodes are commands and edges are command sequences, are created from their payloads to characterize attack profiles.

3.1. Raw Data Collection

The log format of the open-source honeypots contains general fields and particular fields. There are 12 general fields common to all honeypots: the source IP and the source port number; the IP and destination port number of the honeypot; the time when the event occurred; the underlying protocol used by the honeypot; the current geographic location of the IP and related information such as the time zone, which is usually obtained through the external API GeoLite [29]; the current version of the honeypot, added by the honeypot developer; a field that is mostly a hash value generated from a string of two other general fields and a honeypot-specific field; the type of event recorded in the log; the type of honeypot; and the ID of the docker container. The particular fields are determined by the specific protocol used by the IoT honeypot. GroupTracer mainly leverages three general fields in the honeypot log data, namely, the src_ip, timestamp, and geoip fields, as these three fields provide the information needed to generate the Time and IP feature groups. Given that the payload may appear in different fields, GroupTracer locates the corresponding field through string matching. To ensure the universality of the framework and the integrity of the payload, the contents of all fields that contain payloads are spliced together.

3.2. Feature Extraction

The following subsections detail how GroupTracer converts these fields into four feature groups, namely, the TTP profiles, Time, IP, and URL. src_ip field is considered to be the primary key in all fields because the probability of an IP being used by multiple groups is minimal, even if the individual IP is assigned dynamically.

3.2.1. TTP Profile Feature Group Generation

The TTP profile consists of the tactics and techniques used by attack actors. A tactic depicts the common strategy of a threat action (e.g., execution and defense evasion). The ATT&CK framework provides 12 categories for the corresponding techniques, and GroupTracer extracts these category names as the tactics in TTP profiles. A technique describes an attack technique implemented by attack actors under a specific tactic. For example, defense evasion is a tactic that can be performed by a technique named file deletion.

GroupTracer produces the TTP profile primarily from a command execution perspective. There is a crucial trade-off to resolve. On the one hand, the classification of commands should be as accurate as possible. On the other hand, command parameter values sometimes affect the classification needlessly. The following two commands illustrate both cases:
(i) rm -rf ~/.bash_history
(ii) rm -rf xb.sh xb.sh xb2.sh xb1.sh
Both of these commands can be classified as technique file deletion. Command (i) can be further subdivided into technique clear command history, whereas (ii) can only be classified as technique file deletion. For (i), the whole statement is a valid classification basis, while for (ii), only rm can serve as a valid classification basis, and all other parts are redundant.

To resolve the above dilemma, a quadratic (two-stage) mapping method is proposed. Figure 2 illustrates the process of quadratic mapping. There are two knowledge bases in GroupTracer. One saves the mapping of the entire command statement to tactics and techniques for the first mapping, and the other stores the mapping of the abstract command structure to tactics and techniques for the second mapping. In the first mapping, GroupTracer splits the original payload into several commands and looks these commands up in the 1st knowledge base to obtain some of the TTP profiles. In the second mapping, as shown in Figure 3, GroupTracer generates the Abstract Syntax Tree [30] for each payload to obtain command nodes and extract abstract command structures. These abstract structures are then used as input to the 2nd knowledge base to acquire the remaining TTP profiles. Table 1 shows only some mapping samples due to space limitations. After obtaining the TTP profiles, GroupTracer encodes the corresponding strings into a numeric feature vector. In the final data structure, the techniques and tactics are of variable length.
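The two-stage lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledge-base entries, the function names, and the splitting on `;` and `&&` are assumptions made for illustration, and the abstract command structure is approximated by the command name alone rather than by a full Abstract Syntax Tree.

```python
import shlex

# Hypothetical knowledge bases; the entries are illustrative only.
# 1st knowledge base: whole command statement -> (tactic, technique)
KB_FULL = {
    "rm -rf ~/.bash_history": ("Defense Evasion", "Clear Command History"),
}
# 2nd knowledge base: abstract command structure -> (tactic, technique).
# Assumption: the abstract structure is reduced to the command name.
KB_ABSTRACT = {
    "rm": ("Defense Evasion", "File Deletion"),
}

def split_commands(payload):
    """Split a payload into command statements on ';' and '&&'."""
    parts = payload.replace("&&", ";").split(";")
    return [p.strip() for p in parts if p.strip()]

def abstract_structure(command):
    """Drop parameter values and keep only the command name."""
    tokens = shlex.split(command)
    return tokens[0] if tokens else ""

def extract_ttp(payload):
    """Return the set of (tactic, technique) pairs found by both mappings."""
    profile = set()
    for command in split_commands(payload):
        if command in KB_FULL:                  # first mapping: full statement
            profile.add(KB_FULL[command])
            continue
        key = abstract_structure(command)       # second mapping: abstract form
        if key in KB_ABSTRACT:
            profile.add(KB_ABSTRACT[key])
    return profile

profile = extract_ttp("rm -rf ~/.bash_history; rm -rf xb.sh xb.sh xb2.sh xb1.sh")
```

Trying the full statement first keeps the more specific technique when one exists, while the fallback on the command name ignores parameter values that carry no classification signal.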

3.2.2. IP and URL Feature Groups Generation

The features of the IP and URL feature groups are shown in Table 2. The first three are common to both feature groups, and the last one is unique to the URL feature group. The country can be obtained through Ipdb [31]. VirusTotal [32] provides multiple antivirus scanning engines (e.g., Kaspersky URL advisor, Malware Domain Blocklist [33], and Dr.Web Link Scanner [34]) for URL scanning. GroupTracer first uses VirusTotal to scan unknown IPs/URLs and then takes the number of antivirus engines that return malicious results as the malicious index of those IPs/URLs. According to their purpose, there are seven types of IP addresses, shown in Table 3. The framework uses the RTBAsia API [35] to obtain the IP type. Download is an optional feature of the URL feature group, depending on the type of vulnerability in the honeypot, because in some cases the download installs a backdoor for subsequent manipulation, while in others it provides tools for privilege escalation. When attacking different honeypots, hackers in the same group are likely to download files for different purposes, which would greatly degrade clustering that relied only on downloaded file names. After obtaining these feature groups, GroupTracer encodes the corresponding strings into a numeric feature vector.

3.2.3. Time Feature Group Generation

As attacker groups tend to use a tool framework to attack the specified target, there is a corresponding regularity in the IP time zone, attack duration, number of attacks, etc. Algorithm 1 describes the generation of the basic time-series features for a given timestamp set, which holds the timestamps of a specific IP. The collection of these per-IP timestamp sets is used as input to generate the start time, the attack duration, and the number of attacks in each duration. In addition, GroupTracer reuses this code to select the final threshold. When selecting the threshold, we wrap the procedure in an outer for loop in which the threshold value increases by 1 on each iteration. Meanwhile, we count the number of attack periods for each IP and sum these counts. If the sum has not changed compared to the previous iteration, we leave the loop and output the final threshold.

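Algorithm 1: Generation of the basic time-series features.
Require: The collection of timestamp sets, one set per IP
Ensure: Basic time-series features for all IPs
The procedure loops over all per-IP timestamp sets. For each set, it initializes the per-IP state and loops over the timestamps, where an if/else on each step either extends the current attack period or closes it and opens a new one. After the inner loop, a final conditional is evaluated, and the features of the IP are stored.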
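The segmentation that Algorithm 1 is described as performing can be sketched in Python as follows. This is a reconstruction from the surrounding prose, not the authors' code: the function name, the dictionary input, the use of numeric timestamps in seconds, and the convention that a gap equal to the threshold stays in the same period are all assumptions.

```python
def basic_time_features(timestamps_by_ip, threshold):
    """Split each IP's timestamps into attack periods.

    Two consecutive attacks belong to the same period when the gap
    between them is at most `threshold` (assumed to be in seconds).
    Each period is reported as (start_time, duration, number_of_attacks).
    """
    features = {}
    for ip, timestamps in timestamps_by_ip.items():
        periods = []
        start = prev = None
        count = 0
        for t in sorted(timestamps):
            if start is not None and t - prev <= threshold:
                count += 1                       # the current period continues
            else:
                if start is not None:            # close the previous period
                    periods.append((start, prev - start, count))
                start, count = t, 1              # open a new period
            prev = t
        if start is not None:                    # flush the last open period
            periods.append((start, prev - start, count))
        features[ip] = periods
    return features

# Hypothetical log: five attacks from one IP, with one long pause.
example = basic_time_features({"192.0.2.1": [0, 10, 20, 5000, 5030]}, threshold=60)
```

With a threshold of 60, the example yields two periods for the single IP: one starting at 0 that lasts 20 and contains three attacks, and one starting at 5000 that lasts 30 and contains two.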

The threshold indicates how small the time interval between two attacks must be for them to be considered part of the same attack period, and it is used to divide the attack durations. Partitioning each attack period and characterizing the behavior (e.g., duration and number of attacks) of each attack reliably is key to the reliability of the time analysis, since the members of a group may be similar in these respects. To select a number as the initial threshold, GroupTracer maps all the time interval values into the following 5 time buckets: <1 s, [1 s, 1 min], (1 min, 1 h], (1 h, 1 day], >1 day. The results show that 99.91% of time gaps fall into the first four buckets, so the initial threshold is set to 1 h. Then, GroupTracer employs Algorithm 1 to carry out the threshold selection procedure. It first counts the total number of attack durations for each IP and then adjusts the threshold slowly until the number of attack periods for most IPs is almost unchanged. If multiple thresholds give the same result, the smaller one is chosen.
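The threshold selection loop can be sketched as below, under several assumptions that the text does not settle: the step size and unit of the increment, the stopping rule (here, the first time the total number of attack periods is unchanged between two consecutive thresholds), and the helper names `count_periods` and `select_threshold`, which are hypothetical.

```python
def count_periods(timestamps, threshold):
    """Number of attack periods: a gap larger than threshold starts a new one."""
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > threshold)

def select_threshold(timestamps_by_ip, initial, step=1, max_rounds=1000):
    """Increase the threshold until the total period count stops changing.

    When two consecutive thresholds give the same total, the smaller of
    the two is returned, following the rule of preferring the smaller one.
    """
    threshold = initial
    previous_total = None
    for _ in range(max_rounds):
        total = sum(count_periods(ts, threshold)
                    for ts in timestamps_by_ip.values())
        if total == previous_total:
            return threshold - step              # smaller threshold, same result
        previous_total = total
        threshold += step
    return threshold - step                      # last threshold evaluated

# Hypothetical data: gaps of 10, 15, 75 for one IP and 12 for another.
data = {"a": [0, 10, 25, 100], "b": [0, 12]}
chosen = select_threshold(data, initial=9)
```

In this toy input the totals are 6, 5, and 5 for thresholds 9, 10, and 11, so the loop stops at 11 and returns 10. A stricter rule, such as requiring the count to stay stable over several steps, may be closer to "almost unchanged for most IPs" and would be a small change to the stopping test.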

After obtaining the final threshold and the basic time-series features, namely, the start time, the attack duration, and the number of attacks in each attack duration, we extract statistical features from them. GroupTracer builds a new time series for the relevant features of each IP, taking the start time as the independent variable and the attack duration and number of attacks as the dependent variables. Several features shown in Table 4, which proved at an early stage to be significant time-series characteristics, are applied. After generating the time series, GroupTracer automatically calculates these selected features to produce the final feature vectors.

By encoding the time zone field appearing in the data, GroupTracer turns the string-type features into integer vectors. Finally, the statistical features from the time series and the encoded time zone constitute the Time feature group.

3.3. Group Clustering Algorithm

As the hacker groups tend to utilize customized frameworks, features like time gap and TTP profiles have specific patterns. Therefore, attacker behavior naturally forms clusters. This paper is aimed at identifying such natural clusters to dig out potential groups behind active campaigns.

We employ the well-known hierarchical clustering algorithm [36] as it captures the hierarchical structure between clusters, which helps security experts to observe the relationship between clusters and subclusters. Moreover, hierarchical clustering is suitable for arbitrary shape clustering and is insensitive to the input order of samples. Figure 4 depicts a bottom-up approach to perform hierarchical clustering. Each sample represents a unique cluster in the beginning, and then we choose Euclidean distance to calculate the similarity between each cluster and merge these clusters successively. The threshold applies when forming the final flat clusters. The Euclidean distance is computed as follows [37]:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2},$$
where $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ are two points in Euclidean n-space.
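A minimal, dependency-free sketch of bottom-up clustering with a distance threshold is given below. The linkage criterion is not stated in the text, so single linkage is assumed here, and the function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def agglomerative(points, threshold):
    """Merge the two closest clusters until the closest pair is farther
    apart than `threshold`. Single linkage is an assumption of this sketch.
    Returns clusters as lists of indices into `points`."""
    clusters = [[i] for i in range(len(points))]   # every sample starts alone

    def linkage(c1, c2):
        return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:                    # cut: form the flat clusters
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

groups = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], threshold=2)
```

The nested search is quadratic per merge and only suitable for small inputs. In practice a library routine, for example SciPy's `linkage` followed by `fcluster`, would replace it.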

3.4. Attack Tree Creation Method

After digging out potential groups, GroupTracer gathers all payloads and generates an attack tree for each cluster to illustrate and better understand group behaviors. Algorithm 2 processes all payloads from a cluster; its input is the set of all payloads collected from a given cluster.

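Algorithm 2: Attack tree creation.
Require: Payloads of a given cluster
Ensure: Attack tree for the cluster
(1)–(2) Initialize an empty list [] and an empty dictionary {}.
(3)–(13) For all payloads: apply two preprocessing calls to the payload (lines 4 and 5); then, depending on the condition in line 6, either loop over the result and extend the list (lines 7–9) or extend the list directly (line 11). Together these lines fill the list of two-step command sequences.
(14)–(16) For all entries of the list: count their occurrences.
(17)–(19) Build the attack tree from the counted sequences and output it.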

In the directed graph, nodes are command names and edges represent the command sequence between two commands. The out-degree of each command determines the size of its node. If a command exists only at the end of the payload, its size can instead depend on the in-degree. The width of each edge is decided by its weight, which describes how frequently the command sequence occurs. For instance, suppose we have some payloads from a cluster whose command sequences, shown in Table 5, are obtained after command segmentation and abstraction. GroupTracer runs Algorithm 2 to generate the attack tree, as illustrated in Figure 5. Lines 3–13 generate a list that stores two-step command sequences, i.e., each command sequence is turned into the list of its consecutive command pairs. Lines 14–16 count the number of occurrences of all two-step command sequences.
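The construction described above can be sketched with the standard library alone. The sketch assumes that command segmentation and abstraction have already produced one list of command names per payload, and that node size is the weighted out-degree with a fallback to the weighted in-degree for commands that only end payloads. Both choices, and the function name, are assumptions.

```python
from collections import Counter

def build_attack_tree(command_sequences):
    """Build weighted edges and node sizes from abstract command sequences.

    command_sequences: one list of command names per payload.
    Returns (edge_weights, node_size), where edge_weights maps
    (source, target) to the number of times that two-step sequence occurs.
    """
    pairs = []
    for commands in command_sequences:
        pairs.extend(zip(commands, commands[1:]))   # consecutive command pairs
    edge_weights = Counter(pairs)                   # occurrences per pair

    out_degree, in_degree = Counter(), Counter()
    for (src, dst), weight in edge_weights.items():
        out_degree[src] += weight
        in_degree[dst] += weight

    nodes = {c for commands in command_sequences for c in commands}
    # Assumption: fall back to in-degree when a command has no outgoing edge.
    node_size = {n: out_degree[n] if out_degree[n] else in_degree[n]
                 for n in nodes}
    return edge_weights, node_size

# Hypothetical command sequences from one cluster.
sequences = [["cd", "wget", "chmod", "sh"], ["cd", "wget", "sh"], ["cd", "rm"]]
edges, sizes = build_attack_tree(sequences)
```

The returned edge weights and node sizes can be handed to any graph-drawing tool to set edge widths and node sizes.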

4. Experiment and Metrics

4.1. Dataset

In this section, we first describe the datasets obtained from two kinds of IoT honeypots: UPnP-SOAP multiport honeypot and the Netis router backdoor honeypot. The time of data collection, the number of IPs, and the number of log entries are shown in Table 6.

4.1.1. UPnP-SOAP Multiport Honeypot

In the UPnP service, the SOAP protocol assists in defining device types and other related information [38]. Therefore, a huge number of IoT devices provide SOAP services, some of which do not require authentication. The honeypot simulates the behavior of the 11 most frequently scanned ports (e.g., 52881, 5500, and 2048) and the relevant SOAP service paths and returns the corresponding information after being scanned. Six nodes are deployed on the Internet, averaging about 700 log entries per day. The dataset contains 153,413 log entries from 2,652 IPs over 196 days in 2019 (Table 6). Each log entry is identified by five fields, one of which is globally unique in our datasets.

4.1.2. Netis Router Backdoor Honeypot

The Netis router listens on port 53413 (UDP) by default. After sending a specific string to it, the attacker can gain root login and then execute commands to perform a series of malicious behaviors. The honeypot has seven nodes deployed on the Internet, averaging about 300 log entries per day. Our Netis dataset contains 241,593 log entries from only 373 IPs over 279 days in 2019 (Table 6). Unlike the UPnP-SOAP honeypot, a log entry in the Netis honeypot is characterized by four fields.

Given that both types of honeypots simulate command execution vulnerabilities, we can leverage the commands executed by attackers to discover their techniques and tactics. Therefore, GroupTracer can be used to analyze the data of these two honeypots at the same time. Our datasets contain ten types of techniques that can be grouped into six tactics. These tactics are as follows [39]:
(i) Defense evasion: avoiding being detected while adversaries are intruding on the victim system.
(ii) Discovery: figuring out the environment of the victim system.
(iii) Lateral movement: moving through the victim environment. Adversaries might download their tools from remote servers to achieve lateral movement.
(iv) Execution: running malicious code. The purpose of adversary-controlled code might be communicating with C&C servers or stealing data.
(v) Impact: expanding the impact of the intrusion on the victim system.
(vi) Collection: gathering information related to the adversary’s objectives.

Table 7 shows techniques corresponding to each tactic and commands mapped to these techniques in our datasets.

4.2. Clustering Performance Evaluation

After clustering, we need measurement indicators to evaluate the effect. In general, the measurement for the quality of a clustering algorithm can be categorized into two kinds of criteria [40]: internal validation and external validation. External criteria are based on the previous knowledge about the data and require that ground truth labels are known. However, the labels of samples in this paper are not available. Thus, the internal validation is more suitable for our evaluation. More specifically, three internal indexes are utilized in this evaluation. These indexes measure whether clusters are compact and well separated.
(i) Calinski–Harabasz (CH) [41]: The index is defined as follows:
$$CH = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \times \frac{N - k}{k - 1},$$
where $B_k$ denotes the between-cluster scatter matrix and $W_k$ is the within-cluster scatter matrix. $N$ is the number of samples, and $k$ represents the number of classes. A larger value of $CH$ indicates a better clustering solution.
(ii) Silhouette Coefficient [42]: The Silhouette Coefficient is composed of two scores: $a$ is the mean distance between a sample and the rest in the same cluster, and $b$ is the mean distance between a sample and all samples in the next nearest class. $SC$ for all samples is given as the mean of the Silhouette Coefficient for each point. For a single sample $x$, $SC(x)$ can be computed by
$$SC(x) = \frac{b - a}{\max(a, b)}.$$
The Silhouette Coefficient index ranges from −1 to 1; −1 represents a weak clustering effect, and 1 means a good classification effect. 0 indicates the overlap of clusters. A higher $SC$ indicates a better clustering quality.
(iii) Davies–Bouldin (DB) [43]: DB can be measured as follows:
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}},$$
where $k$ denotes the number of clusters, and $i$, $j$ represent different cluster labels ($i \neq j$). $s_i$ and $s_j$ are the average distances from all samples in clusters $i$ and $j$ to their respective cluster centroids, and $d_{ij}$ is the distance between these centroids. A smaller DB value means a better clustering result.
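To make the centroid-based definitions concrete, the following sketch computes the Calinski–Harabasz and Davies–Bouldin indexes in plain Python. The function names are hypothetical, Euclidean distance is assumed, and degenerate inputs (a single cluster, zero within-cluster scatter, or coincident centroids) are not handled.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of points."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def group_by_label(points, labels):
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return groups

def calinski_harabasz(points, labels):
    """Ratio of between- to within-cluster scatter, scaled by (N-k)/(k-1)."""
    groups = group_by_label(points, labels)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(len(g) * dist(centroid(g), overall) ** 2
                  for g in groups.values())
    within = sum(dist(p, centroid(g)) ** 2
                 for g in groups.values() for p in g)
    return (between / within) * ((n - k) / (k - 1))

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d_ij ratio."""
    groups = list(group_by_label(points, labels).values())
    cents = [centroid(g) for g in groups]
    spread = [sum(dist(p, c) for p in g) / len(g)
              for g, c in zip(groups, cents)]
    k = len(groups)
    total = 0.0
    for i in range(k):
        total += max((spread[i] + spread[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Two tight, well-separated toy clusters.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
lab = [0, 0, 1, 1]
ch = calinski_harabasz(pts, lab)
db = davies_bouldin(pts, lab)
```

For this toy input the between-cluster scatter is 100 and the within-cluster scatter is 4, so CH is 25 × 2 = 50, while each cluster has spread 1 and the centroids are 10 apart, so DB is 0.2.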

4.3. Experiment Design

In the following, we evaluate GroupTracer by examining the cluster quality, i.e., how well clusters capture similar attack actors. The primary evaluation is for the group clustering of GroupTracer. We compare the evaluation results of GroupTracer based on the three indicators we mentioned above with the performance of group clustering based on the other two baseline algorithms to conclude that GroupTracer has excellent performance.

4.3.1. Comparison Baselines

Meanshift [44] and K-means [45] clustering algorithms are chosen to be the baselines. Meanshift is a centroid-based algorithm that requires an iterative step. It continuously calculates the expected moving distance of the center point and moves until the final condition is reached. For a given sample set, K-means divides it into K clusters, minimizing a criterion (e.g., within-cluster sum-of-squares). K is a positive integer number and must be predefined.

We first extract the required features according to Section 3.2 and convert them into usable feature matrices. Before performing the final group clustering, we normalize the feature matrices. The reason is that clustering algorithms all use a distance measure to determine whether an object x is more likely to belong to the same cluster as an object y than to the same cluster as an object z. These distance measures are affected by the scale of the variables. By putting all variables into the same range, we can weigh all variables equally, which matters especially when the feature vectors are generated in different ways.

The main idea of normalization is to calculate the $L_p$-norm of each sample and then divide each element in the sample by that norm. As a result, the $L_p$-norm of each processed sample is equal to 1. In the experiment, $p$ is set to 2, and the $L_p$-norm of a vector $x = (x_1, \ldots, x_m)$ can be defined as [37]

$$\|x\|_p = \left( \sum_{i=1}^{m} |x_i|^p \right)^{1/p}.$$
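The normalization step can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the code used in the experiments; the names `lp_norm` and `normalize_rows` are hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero.

```python
def lp_norm(x, p=2):
    """L_p norm of a vector: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def normalize_rows(matrix, p=2):
    """Scale every sample (row) so that its L_p norm equals 1."""
    normalized = []
    for row in matrix:
        norm = lp_norm(row, p)
        # Assumption: an all-zero row has no direction and is left as-is.
        normalized.append([v / norm for v in row] if norm > 0 else list(row))
    return normalized
```

For example, with $p = 2$ the sample `[3, 4]` has norm 5 and is mapped to `[0.6, 0.8]`, whose norm is 1.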

To eliminate the influence of irrelevant variables as much as possible, we use all data in the dataset in the experiments for each algorithm. For GroupTracer, we iterate over the threshold value, for 20 iterations in total, and pick the threshold that yields the highest clustering quality. We then run K-means under multiple configurations and keep the clustering solution with the best Calinski–Harabasz index. Meanwhile, we run the Meanshift algorithm with varying window sizes to obtain its best result.
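The threshold selection described above amounts to a simple parameter sweep, which can be sketched as follows. This is a toy illustration under stated assumptions, not GroupTracer's clustering: the samples are one-dimensional values rather than 18-feature vectors, single-linkage clustering with a distance threshold stands in for the hierarchical clustering step, and all function names (`cluster_1d`, `ch_index_1d`, `sweep_thresholds`) are hypothetical.

```python
def cluster_1d(values, threshold):
    """Single-linkage clustering of 1-D values with a distance threshold.

    In one dimension this reduces to sorting the values and starting a new
    cluster wherever the gap between neighbours exceeds the threshold.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > threshold:
            current += 1
        labels[nxt] = current
    return labels


def ch_index_1d(values, labels):
    """Calinski-Harabasz index for 1-D data (larger is better)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if k < 2 or k >= n or within == 0:
        return float("-inf")  # degenerate cases are skipped in this sketch
    return (between / (k - 1)) / (within / (n - k))


def sweep_thresholds(values, thresholds):
    """Return (threshold, score, labels) for the best-scoring threshold."""
    best = None
    for t in thresholds:
        labels = cluster_1d(values, t)
        score = ch_index_1d(values, labels)
        if best is None or score > best[1]:
            best = (t, score, labels)  # ties keep the smallest threshold
    return best
```

Calling `sweep_thresholds(values, range(1, 21))` mirrors the 20 iterations used in the experiment. When several thresholds produce the same partition, they also produce the same score, which is consistent with an index that stays flat over a range of threshold values.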

4.4. Experiment Result and Discussion

The evaluation results of GroupTracer and the two comparison baselines under the metrics described above are shown in Table 8. When running each algorithm, changing the independent variables (e.g., threshold and quantile) affects the Calinski–Harabasz index the most, so this index is taken as the primary reference when evaluating each algorithm. The Calinski–Harabasz index of GroupTracer reaches 3416.93 when the threshold is set to 10, which is about three times that of the K-means model and 1.5 times that of the Meanshift model. When the Silhouette Coefficient is taken as the reference instead, GroupTracer still performs best. Although GroupTracer does not achieve the best Davies–Bouldin index, the gap is within an acceptable range. GroupTracer generates 4 clusters, while the K-means model generates 3 clusters and the Meanshift model generates only 2 clusters for all data.

The evaluation of GroupTracer using the three metrics is illustrated in Figure 6, where the threshold ranges from 1 to 20. In Figure 6(a), the Silhouette Coefficient index rises until the threshold reaches 9 and stabilizes at a relatively high level when the threshold ranges from 10 to 17. In Figure 6(b), the Davies–Bouldin index shows an overall downward trend until the threshold reaches 10, where it stabilizes at its lowest point (0.7367); after that, the index rises rapidly. In Figure 6(c), the Calinski–Harabasz index stays at its highest level (3416.93) when the threshold is between 10 and 17, and the number of clusters remains 4. In summary, all three indicators show that the cluster quality is highest and stable when the number of clusters is 4.

In Figure 7, when the CH index reaches its highest point, the Silhouette Coefficient and the DB index both have the worst effect. Similarly, in Figure 8, when the CH index reaches its highest level, the DB index is in a lower position. These two figures show that the trends of the indicators do not match, which suggests that the two baselines are not well suited to our dataset.

5. Limitation and Future Work

Our research proposes a framework that can dig out potential groups behind active campaigns. This new technique can make full use of information from attack campaigns. However, our current design is preliminary. We only focus on the IPs used by the potential groups and do not go further to identify which specific groups were involved, that is, we do not attempt to map them to real-world groups [46]. Moreover, GroupTracer can only deal with honeypot logs that contain attack payloads.

In future work, we expect to combine NLP techniques with cyber threat intelligence to precisely match these potential groups to real-world APT groups, where the attack tree may help to extract a group profile. Moreover, an experiment to verify the effectiveness of the attack tree should also be carried out. Further, we expect to incorporate more data sources, such as system logs and network traffic, to expand our knowledge base and perform a more comprehensive analysis.

6. Conclusion

In this work, we propose GroupTracer, a framework for clustering attack actors from IoT honeypot logs. GroupTracer aims to extract TTP profiles automatically and dig out potential groups behind active campaigns. By mapping payloads to the ATT&CK framework, GroupTracer can effectively extract structured TTP profiles using two knowledge bases. In addition, the framework leverages four feature groups (namely, Time, TTPs, IP, and URL), a total of 18 features derived from log entries, to capture the natural hierarchical structure of attacker groups. Finally, GroupTracer constructs an attack tree for each cluster to represent the group's actions. In the experiment, we compare our algorithm with two baseline algorithms. The evaluation on 395,006 log entries from 3,025 IPs shows the high performance of GroupTracer, with the Calinski–Harabasz index reaching 3416.93. Moreover, our proposed framework is generalizable because it works from a log-analysis perspective, so its application is not limited to IoT honeypots.

Data Availability

The research data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61902265) and the National Key Research and Development Program (2016YFE0206700 and 2018YFB0804503).