Abstract

With the expansion of cyberspace and the increasing importance of users for privacy protection, anonymous network research has been further developed, especially for the numerous internet of things (IoT) devices. However, when we repeat the existing experiment in an anonymous network, there are many problems such as too old version, low realism, poor operability, and so on. In this paper, we analyzed the design requirements and topology of the new experimental platform. Two topologies with different levels of complexity are designed. We also set up a practical anonymous network simulation platform called LUNAR with virtualization, software-defined networking (SDN), and other technologies to solve those problems. The platform we proposed supports multiprotocol and reproducible complex networks with centralized management. Finally, we implement our simulation platform and reproduce two typical attacks, that is, time-linked Tor node reset attack and website fingerprint attacks on The Onion Router (Tor) network, to evaluate the platform. Experiments results indicate the practicality and superiority of our simulation platform in terms of anonymous network simulation.

1. Introduction

Anonymous network is a privacy protection technology that can protect the private information of the users and the service providers [1, 2]. It can hide the true identity and location of one or both parties in the communication so that the attacker cannot know the sender and receiver of the data, decrypt the plaintext information or associate the transmission of information between the sender and the receiver. The global anonymous network represented by Tor [3] and I2P [4] is widely used in the fields of anonymous access, anonymous communication, and hidden services [5, 6]. These networks have millions of daily users and thousands of relay nodes. Users use anonymous network such as Tor to conduct web pages, browse, online communication, and virtual trade. Due to the characteristics of anonymity, the dark web [7] has two sides. On the one hand, it can be used to protect the privacy of Internet users; on the other hand, it can also be concealed criminal traces or other malicious behavior [8].

1.1. Related Work

Since the birth of the anonymous network, the research on its security about the simulation platform has never stopped [915]. For example, [16] proposes a method for connecting IoT devices in a client-server configuration to utilize the Tor network for addressing and secure communication between IoT and CPS devices. Reference [17] introduces a new approach to exploiting Tor’s anonymous communication to handle distributed attacks against smart devices on the Internet and demonstrates the effectiveness of Tor in IoT devices. Reference [18] designs a higher bandwidth Onion IoT gateway to provide robust security protection for vulnerable IoT devices hidden behind an IoT gateway. Reference [19] develops, investigates, and evaluates the performance of machine-learning-based darknet traffic detection systems (DTDS) in IoT networks. Reference [20] presents deep learning recurrent LSTM-based technique to classify the traffic over IoT-cloud platforms. Reference [21] realizes resource-conserving access control and end-to-end security for IoT devices and deploys onion routing for the IoT within the well-established Tor network enabling IoT devices to leverage resources to achieve the same grade of anonymity as readily available to traditional devices. SDN [22] can decouple the control plane from the data plane of forwarding devices, and [23] proposes an SDN-based solution to mitigate the privacy threats by anonymizing both MAC and IP addresses. Reference [24] studies ARP spoofing attacks based on SDN technology. However, this experiment is only limited to the ARP protocol and does not discuss other protocols or networks. Reference [25] simulates the dark network scene on the OpenStack platform to detect dark network resources and obtain dark network information. Reference [26] proposes a novel methodology for constructing a real private tor network by editing and controlling Raspberry Pis. However, there is little research on anonymous network simulation platforms design, mostly focusing on improving existing platforms. Reference [27] shows an anonymous network named Crowds. They use Peersim [28] for Crowds anonymous network simulation. In this kind of network, researchers can conduct anonymous tracking. But the author did not carry out a comprehensive design for different forms of networks. Reference [29] designs and implements a new Tor network simulation model based on Shadow [30]. The current simulation platforms have more or less different shortcomings. Some existing platforms are no longer updated, or they are currently no version available to use. Almost all the platform can only simulate or emulate part of establishing a network connection, which is not suitable for various simulation requirements. At the same time, the modification cost is high and has an impact on the simulation results. Table 1 are statistics for Tor network security research.

1.2. Our Work

In this paper, we design and set up a practical anonymous network simulation platform called LUNAR. The overall design principle of LUNAR is to establish an anonymous network simulation platform with convenient operation, large scale, and a high degree of reality under feasible resource constraints. Besides, the platform can ensure the correctness of the Tor network simulation operation, ensure that the virtual nodes can be linearly expanded, and form a large-scale simulation platform.

Our main contributions are as follows:(1)We summarize all experiments conducted on anonymous networks and classify them into five categories in Section 2.1, from which we find six network topology requirements of the simulation platform in Section 2.3.(2)We provide five design requirements for the current new anonymous network simulation platform in Section 2.2.(3)We devise two kinds of an anonymous network topology for our simulation platform, that is, a basic one for a single small-scale experiment and a more complex one for a complex anonymous network experiment in Section 3.1. Our proposed simulation platform is also compared with five previous ways to conduct anonymous network experiments in Section 2.1.(4)We implement our highly scalable simulation platform and reproduce two types of typical attacks, that is, time-linked Tor node reset attacks and website fingerprint attacks on the Tor network, on our simulation platform in Section 4. Experiments indicate our simulation platform can provide a practical anonymous network environment.

2. Problem Description

2.1. Compare with Existing Platform

The existing methods for conducting anonymous network experiments are roughly divided into several categories. Table 2 is the comparison of the advantages and disadvantages of five experimental platforms.(1)Experiments in real networks. The advantage is that the threshold for conducting small-scale experiments is low, and the realism is high. But it can only provide local experiment results, observation angles, and control methods, and it is impossible to implement a global experiment scenario [31].(2)Experiments with a large research network such as PlanetLab [32]. It consists of more than 1,000 servers distributed around the world. It can apply to several servers to form a network segment and deploy experiment software on it. But the operability is still limited, and experiments can only be carried out in a limited time frame and scale [31].(3)Simulation experiments, such as ExperamenTor [33]. It uses ModelNet [34] as a virtual machine and network simulation to realize the experiment environment. Although the efficiency is partially improved, because of the age, we cannot download the available version.(4)Emulation test, such as TorPs [35]. It simulates the process of the Tor network selection relay node to establish a link. TorPs is suitable for experiments related to improving or changing the link selection algorithm, but not applicable to operational experiments.(5)Semi-simulation and semi-emulation, such as Shadow [30]. It implemented a network layer emulation and application layer simulation test environment. However, the use of Shadow as a research platform requires a certain amount of modification cost, and the modification will directly affect the efficiency, accuracy, and intuitiveness of the experiment results.

Compared with our design, these simulation platforms make different trade-offs in extensibility, global control, operability, and cost, which cannot meet the higher requirements of modern new experiments. For example, Shadow implements a low-level takeover to meet scalability, but it is less efficient.

2.2. Design Goals

Combined with the existing simulation platform analysis, the new platform needs to meet the following design requirements:(1)Support multiprotocol. The experiment platform should support the environment reproduction of Tor, I2P, Freenet, and other anonymous networks designed or modified by themselves and can test and demonstrate the functions, performance, and security of these protocols.(2)Support native code. The simulation platform should directly install and run the original code or installation package of each anonymous network software and its modified variants. Ensure the accuracy of the simulation results.(3)Reproduce network conditions. The simulation platform can simulate different link bandwidths and network congestion, support background traffic, and support network monitoring, interference, and control at different levels (node level, network segment level, and self-made domain level). This satisfies the researchers’ needs for each dimension of the network layer.(4)Save resources. Although the current hardware resource cost is getting lower and lower, the simulation platform’s node size has also been greatly improved. As a result, it is still necessary to prevent the hardware cost from exploding with the scale expansion, making the demand linear with the node size.(5)Centralized control and observation. In order to achieve the global perspective simulation, the platform needs to have centralized management configuration means and a unified result observation method, which can be used by researchers to use the platform to carry out experiments to lower the threshold and provide convenience.

2.3. Platform Functions

Experiments on standard anonymous networks can be summed up in the following types.

2.3.1. Performance Analysis

Tor’s anonymous network has been criticized for its network performance due to the particularity of its protocol [36, 37]. Researchers have been looking for ways to improve the Tor network’s overall performance, either redesigning the protocol and architecture, optimizing Tor’s own weighted link selection algorithm, or enhancing its handling of network congestion, packet encryption, and decryption at the application layer.

2.3.2. Passive Listening

In passive listening, the attacker can control the critical location nodes such as the malicious guard node and exit node, hijack the traffic, carry out the relevant time attack, fingerprint attack, and other attack means to achieve the goal of deanonymization. Traffic feature extraction and matching are often vital in passive listening.

2.3.3. Active Interference

Active interference refers to the attacker’s artificial change of the link establishment process, traffic path selection, and other key points to remain anonymous. For example, an attacker at the AS (autonomous system) level can change the network traffic of the Internet to achieve more granular traffic filtering by implementing BGP hijacking and asymmetric traffic analysis to obtain more accurate analysis results.

2.3.4. Denial of Service

The denial of service is an attack on the Tor network itself. The target of these attacks is often the relay node. Since the overhead of the relay node processing the data packet is much larger than the overhead of generating the data packet, the attacker can continuously send the CREATE data packet to consume the relay node’s computing resources, thereby reducing the performance of the entire Tor network.

2.3.5. Application Layer Induction

In addition to the underlying protocol in the overall composition of the Tor network, the entire application layer system also exists in the surface web website. Browsers may have loopholes and problems, so it is also necessary to test and audit the anonymous network’s service provider. Security vulnerabilities may occur on it.

3. Our Proposed Platform

3.1. Topology Design

We summarize the common anonymous network attack and defense experiment scenarios, extract the rule conditions that satisfy them, and design a standard network topology, which can project various simulation scenes into the topology without distortion. We use a white spot as the terminal node and use a black spot as the basic network node. Therefore, we design a basic topology that meets different conditions.

The topology shown in Figure 1 consists of N routing nodes and N terminal nodes. The routing nodes are connected end to end to form a ring network. Each routing node is externally connected with a terminal node. The routing node acts as the control node of the termination node. The control node of the point can realize monitoring and flow control. Adjacent routing nodes can form an autonomous system to reproduce the characteristics of the anonymous network in the home domain. After the ring network is destroyed on a link, another half of the line can be dynamically routed. This topology simplifies the network structure and node relationships but retains the various network features required for anonymous network experiments and can be used for a single small-scale experiment.

In addition, we have designed a more complex network topology, as shown in Figure 2. The topology separates the routing node used to monitor the terminal traffic from the basic routing node used to transmit information and is connected by an N-route node in a star topology to form a basic transmission network, each of which is on the basic routing node. A routing node is dedicated to the control node, and M terminal nodes are connected to the control node to form an autonomous system. In the experiment, the basic routing nodes are not touched, and only the terminal nodes and control nodes required by the simulation environment are operated. This topology is closer to the real Internet environment, ensuring the availability of the underlying network for complex anonymous network experiments.

The focus of our design is not on traffic collection and stratification but on the design of the topology. Traffic in the protocol layer is the problem of upper-layer applications. Further research applications such as Tor can be deployed on our proposed topology.

3.2. Advantages

Our platform’s network topology has the following advantages:(1)It contains the basic network node responsible for transmission. The basic network node is equivalent to the infrastructure on the Internet and is only responsible for forwarding network packets, not participating in the encryption and decryption of anonymous networks.(2)It runs anonymous network programs in the terminal nodes. Terminal nodes may assume the role of dark web infrastructure such as hidden service directory, may also act as users of anonymous networks, run anonymous service-side programs or client services, and may also be used as a simulation of various services in the surface web so that experimenters from the anonymous network to initiate access to them, such as DNS, Web Server, etc.(3)It interworks between any two nodes. The same experiment instance is performed in an interworking network. If multiple simulation experiments are required at the same time, numerous network instances can be pulled up.(4)No single node can carry all the traffic in the network. The simulation platform should avoid a situation where the entire network is down due to the failure of the single node that carries all the traffic.(5)The communication between any two terminal nodes has no less than two routing lines. The network infrastructure under the dark web (usually the Internet) has certain robustness and will not be faulty due to a single node. And it leads to the whole network.(6)All communication traffic of any terminal node needs to flow through at least one corresponding basic network node. In the simulation experiment for the anonymous network, there may be scenarios in which the traffic of the terminal node is monitored or controlled in the system (rather than being handled by the node itself), such as being controlled by the operator. The node in this topology can implement this condition, which we call the control terminal node.

4. Evaluation

As the simulation platform is virtualized by KVM (Kernel-based virtual machine) and SDN network layer, each node’s operation is consistent with that of the actual Tor network. No additional functions or redundant configuration is required. Log in to the corresponding node shell to perform steps such as compiling and running Tor, starting the Tor process, and changing the Tor configuration. This section mainly introduces the platform’s feasibility and superiority by recreating the simulation approach of two typical attacks on our testing platform.

4.1. Experiment Setup

In order to achieve the design goals of the topology and simulation platform, we choose to use a host virtualization technology to create virtual nodes on a set of server clusters and use SDN technology to connect nodes according to the designed topology to form a virtual network, running different software. The network traffic sent by the virtual machine is transmitted by the SDN controller, using the same communication protocol (IP, TCP, and UDP) as the real network, and finally sent to receive the traffic. The design of the simulation platform is shown in Figure 3.

The simulation platform is built on a server cluster connected by LAN. KVM is used as the virtual machine controller; OVS (Open vSwitch) is used as the SDN switch; and CentOS is used as the operating system on both the server and the virtual machine. The foundation of the experimental platform is composed of KVM, OVS, and CentOS. The routing node is responsible for data forwarding at the network layer, and the complete Tor process is run in the terminal node. For different Tor roles, such as directory servers, we can flexibly allocate more resources to meet its computing and storage requirements.

The version of Tor selected for this paper is tor-0.3.4.8. System version is CentOS, and Linux version is 3.10.0-693.el7.x86 64. The simulation platform designed in this paper provides a related server and virtual machine images.

The platform provides a set of scripting tools, as the management layer of the simulation platform, which realizes the generation and configuration of the virtual machine and virtual network topology, pulls up and manages instances of the simulation environment, controls virtual machine terminals, and changes network bandwidth and congestion.

Another set of scripts implements the business layer of the simulation platform. It can install and configure the basic components of classic anonymous networks such as Tor, I2P, and Freenet and build an independent and complete anonymous network simulation environment. On this basis, the experimenter can also log in to the virtual machine to install other servers or client software and access the anonymous network. Figures 4 and 5 are examples of OnionRoute and HiddenService configuration.

Finally, the platform provides an observation layer. The participants can obtain the control of the corresponding terminal node or routing node according to the scenario setting and role assignment and specify the network traffic mirroring and log information of the location.

4.2. Time-Linked Tor Node Reset Attack
4.2.1. Attack Fundamentals

We divide the link in the process of communication between the client and the anonymous server into several parts for the subsequent explanation. The first is the link from the client to the rendezvous node through the 2-hop node, which we collectively call the CR link part. The link from the anonymous server to the rendezvous node through the 3-hop node is collectively referred to as the HR part.

In the dark web environment, only the hidden server’s entry node knows the real IP of the anonymous service. Suppose we currently have a certain number of entry nodes, clients, and rendezvous nodes.

The attack is divided into three steps:(1)Open the Tor2Web mode at the client we control. Select the RP that we control as the rendezvous node. The client sends a request to a specific anonymous server continuously for a certain period, and the anonymous server further establishes long-link communication with the client.(2)Select to turn off the Tor service at the RP, and the HS will disconnect the HR link at this time.(3)Through our client to access a specific anonymous server again to send a continuous request within a certain period of time, re-establish the CR and HR long link. However, the anonymous server at this time has already established an HR of three hops different from the previous one, and thus, the selected entry node has also changed.

Since we have a certain number of entry nodes, we repeat step 2 and 3 until a specific anonymous server connects to the entry node we control.

Finally, according to the characteristics of packet time association and continuity, the IP of the anonymous server is determined so as to achieve the effect of the anonymous server to anonymity. Figure 6 shows the overall flowchart.

4.2.2. Reproduction Process

First, we modify the client and hidden server configuration files to enable the client Tor2web mode. Also, specify the rendezvous node. After the client and hidden server configuration are completed, continuous requests are sent to the hidden server through the client for a certain period of time. Then use tcpdump crawl traffic data at all controllable portal nodes as well as at the client. Next, reset the Tor service at the rendezvous node, forcing the anonymous server to send a DESTROY instruction to destroy the link.

Repeat the reset service, and the client sends a continuous request for the steps so that the hidden server continuously destroys, re-selects the entry node, and re-establishes the link. Observing the traffic at the controllable node, in order to make the observation effect better, it is necessary to filter out the traffic of other interference items such as the directory server synchronization descriptor. The attack is stopped until the hidden server selects the controllable entry node as a node. Eventually, we can find the real IP (10.0.30.101) of the hidden server and complete the attack. Figure 7 shows the successful attack result.

4.3. Website Fingerprint Attacks on Tor Networks
4.3.1. Attack Fundamentals

Anonymous communication hides the address of the source and destination. The layered encryption of the traffic allows the attacker to detect the content information of the traffic. However, data traffic still has other dimensions. These features can be used to attack the anonymous system.

In order to achieve the purpose of the attack, the attacker first needs to obtain the fingerprint information of N websites, collect the traffic when visiting N websites, set up feature vectors, and train the classifier by using machine learning algorithms. When the target client accesses the website, the client can obtain the client traffic, and the import classifier determines whether the target client has accessed one or several of the N websites according to the classification result. Therefore, the prerequisite for a successful website fingerprint attack is to be able to collect traffic from the client. The attacker is required to be an ISP-level adversary.

4.3.2. Reproduction Process

In this experiment, the Alexa China Top100 website was selected for fingerprint collection.

First, the local client opens Tor, connects to the Tor network, and accesses the above website. In the entry node, tcpdump is simulated by the ISP-level attacking attacker to grab the traffic sent and received by the client. The experiment will access each website 100 times with a random interval in between. The data set is the time delay and size of traffic data packets visiting the corresponding website. The extracted feature is a combination of these delays and sizes, such as the delay of the first 300 data, the size of the first 300 packets, and so on.

The selection of fingerprint features in this experiment is crucial for the entire attack because the selection characteristics largely determine the correct rate of classifier classification. Based on previous researchers’ fingerprint attack experience on anonymous systems, this topic selects the extracted features shown in Table 3.

With the random forest algorithm using the Scikit-learn library in python, the data set is handed over to our classifier. The training is done by means of a tenfold cross-validation method. The ten-fold cross-validation method is used in the training, that is, the data set is divided into ten equal parts, one of which is used as the test set, and the rest are used for training and evaluation, so as to make the classification effect of our classifier better.

We have already built the classifier in the previous work, and then we will introduce the overall attack process. The flow chart of the attack process is shown in Figure 8. Randomly visit the website at the client to simulate normal user behavior and listen to traffic at the ingress route to simulate ISP-level adversaries. The monitored user traffic is processed by our script to extract features. The feature data is then passed to the classifier classification to obtain an output.

To test the accuracy of our classification, assume that the adversary is only interested in the Alexa Top10 website. Control the client to randomly access the website in Alexa Top10 for a total of 200 times, and the monitored traffic data is repeated. Finally, the statistical results in Table 4 are obtained.

The success rate of using fingerprint attacks on different websites to destroy client anonymity is basically above 80%, indicating that this attack is established in the Tor network.

Our design can meet the needs of different applications, and the experimental platform has good scalability. Users can quickly understand and familiarize themselves with the method of use, and the experimental platform has good ease of use. Compared with the existing research platform, our design has good generality.

5. Conclusion

This paper designs a new anonymous network research platform, which reduces the research threshold for anonymous networks and is suitable for comprehensive and efficient simulation of real anonymous network research. The research platform uses virtualization to quickly build the basic nodes in the anonymous network, avoiding deploying a large number of group hardware facilities in the network. The platform uses SDN, which can be defined and controlled by software programming, simplifying the network. The steps of network deployment meet the needs of the anonymous network for complex network environments. The platform can also deploy and run anonymous Tor source codes and other anonymous network source codes to ensure the network simulation operation’s correctness and ensure that the virtual nodes can be linearly expanded. Finally, the paper highlights the platform’s feasibility and superiority by processing two typical attacks on the platform. In the future, we will focus on technical research of anonymous networks in the platform, seeking to identify possible anonymous network vulnerabilities and implementing improvements.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.