Abstract

By using radio channel information to detect spoofing attacks, channel-based physical layer (PHY-layer) enhanced authentication can provide lightweight security for 5G wireless communications. One major obstacle to the application of PHY-layer authentication is its detection rate. In this paper, a novel authentication method is developed that detects spoofing attacks without a dedicated test threshold; instead, a trained model determines whether the user is legitimate. Unlike threshold-test PHY-layer authentication, the proposed AdaBoost-based PHY-layer authentication algorithm increases the authentication rate with a one-dimensional test statistic feature. In addition, an authentication model with two-dimensional test statistic features is presented to further improve the detection rate. To evaluate the feasibility of our algorithm, we implement the PHY-layer spoofing detectors in a multiple-input multiple-output (MIMO) system over universal software radio peripherals (USRPs). Extensive experiments show that the proposed methods achieve high detection performance without substantially increasing the computational complexity.

1. Introduction

The 5G mobile communication system imposes requirements of high speed, high efficiency, and high security under three typical application scenarios: enhanced Mobile Broadband (eMBB), Large-Scale Internet of Things (IoT), and ultra Reliable & Low-latency Connections (uRLLC) [1, 2]. The enhanced mobile broadband scenarios include high-traffic, high-density wireless networks used indoors or in urban areas, as well as continuous wide-area coverage of wireless mobile networks in rural areas. Meanwhile, 5G involves the interconnection and communication among a large number of machines and devices, which is a necessary condition for the operation of the IoT [3]. Many mobile devices access the wireless network at the same time, which places a heavy authentication computing burden on the network. Therefore, lightweight access methods are required for the dense application scenarios of 5G wireless communication networks.

In response to this need, researchers have studied lightweight security measures based on computational cryptography [4, 5]. However, it remains very difficult to design cipher algorithms that fit resource-constrained application scenarios such as wireless mobile terminals, the IoT, and sensor networks. Therefore, new technologies are needed to construct lightweight security schemes. In the last decade, research on PHY-layer security technology has brought new vitality to the wireless mobile communication industry [6–10]. Physical layer characteristics are difficult to counterfeit, so they can provide a high level of security at low cost and compensate for the shortcomings of cipher-based security technologies. Consequently, physical layer characteristics that can be used to improve the security of wireless communications have attracted wide attention from researchers.

Several PHY-layer authentication techniques have been proposed. In [11–17], the received signal strength (RSS), channel impulse response (CIR), and channel state information (CSI) are used to detect identity-based attacks in wireless networks, such as man-in-the-middle and denial-of-service (DoS) attacks. The work in [18] presents a PHY-layer authentication framework that can be adapted for multicarrier transmission. To detect Sybil attacks, [19, 20] present a PHY-layer authentication protocol that is combined with higher-layer authentication and exploits the fact that the channel response decorrelates rapidly in space, and channel-based detection of Sybil attacks in wireless networks is implemented. In [21], Peng Hao et al. developed a practical authentication scheme that monitors and analyzes the packet error rate (PER) and received signal strength indicator (RSSI) at the same time to enhance the spoofing attack detection capability. In [22–24], the authors analyzed the spatial decorrelation property of the channel response and validated the efficacy of channel-based authentication for spoofing detection in MIMO systems by comparing the channel information “difference” of two or more frames.

However, the above-mentioned works need manually set thresholds to detect spoofing attacks. In practice, the threshold range cannot be determined accurately, which results in low-precision spoofing detection. In this paper, a machine learning based PHY-layer authentication is developed, which provides an intelligent decision method instead of a one-dimensional test threshold. Specifically, an AdaBoost [25, 26] based algorithm with a one-dimensional feature is employed to detect spoofing attacks. To enhance authentication performance, a two-dimensional feature is also used. The major contributions of this paper are summarized as follows:

(1) An AdaBoost based PHY-layer authentication algorithm is proposed to increase the authentication rate.

(2) An authentication model based on a two-dimensional feature is established, which detects spoofing better than the one-dimensional authentication method.

(3) The proposed PHY-layer channel authentication scheme is implemented in a real-world environment based on MIMO-OFDM systems. The simulation results show that the detection rate is greatly increased.

The rest of this paper is organized as follows. Section 2 describes the system model and problem formulation. Our proposed algorithm for PHY-layer authentication is presented in Section 3. The system experiment and simulation results are presented in Section 4. Section 5 concludes the paper.

2. System Model

In this section, we provide a system model of physical layer authentication and hypothesis testing.

2.1. Three-Party MIMO System Model

As shown in Figure 1, our analysis is based on an Alice-Bob-Eve model in a MIMO system, where Alice and Bob are legitimate users equipped with N_T and N_R antennas, respectively. Eve, who is also equipped with multiple antennas, attempts to spoof Alice by using her identity. The three parties are assumed to be located at spatially separated positions. To detect this spoofing, Bob tracks the uniqueness of the wireless channel responses to discriminate between legitimate signals from Alice and illegitimate signals from Eve. This is physical layer authentication. The detailed process is as follows. Signals carrying pilots, which are used to estimate the channel response of the corresponding transmitter, are transmitted over the wireless multipath channel to the receiver. Each data transmission contains multiple frames, and each frame consists of multiple OFDM symbols.

Bob is assumed to obtain the Alice-Bob channel information for any frame index k through channel estimation and to store it. Later, when Bob receives the next data frame, the (k+1)th frame, he estimates its channel response, whose origin is not yet known. Bob compares this new channel response with the stored channel of Alice to determine whether the corresponding signal was actually sent by Alice.

If the two channel responses are sufficiently close, Bob considers the sender's identity valid and stores the new channel information. Otherwise, Bob determines that the sender's identity is invalid and discards the data frame.
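
The accept-and-store, reject-and-discard process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ChannelAuthenticator`, the helper `normalized_difference`, the use of a flat list of complex channel coefficients, and the threshold value are all hypothetical and are not taken from the paper.

```python
def normalized_difference(h_ref, h_new):
    """Squared distance between two channel vectors, normalized by the
    energy of the reference (an assumed, generic closeness measure)."""
    num = sum(abs(a - b) ** 2 for a, b in zip(h_ref, h_new))
    den = sum(abs(a) ** 2 for a in h_ref)
    return num / den


class ChannelAuthenticator:
    """Keeps the last accepted channel estimate and checks new frames against it."""

    def __init__(self, h_alice, threshold):
        self.h_ref = list(h_alice)   # channel of the last verified frame
        self.threshold = threshold   # assumed closeness threshold

    def authenticate(self, h_new):
        accepted = normalized_difference(self.h_ref, h_new) <= self.threshold
        if accepted:
            # Valid sender: store the new channel as the reference
            self.h_ref = list(h_new)
        # Invalid sender: the frame is dropped and the reference is kept
        return accepted


# Toy channel vectors (made-up values, for illustration only)
bob = ChannelAuthenticator([1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j], threshold=0.1)
frame_from_alice = [1.02 + 0.98j, 0.49 - 0.21j, -0.31 + 0.79j]
frame_from_eve = [-0.4 + 0.1j, 0.9 + 0.7j, 0.2 - 0.6j]
print(bob.authenticate(frame_from_alice))  # small change in the channel
print(bob.authenticate(frame_from_eve))    # large change in the channel
```

Updating the reference only on acceptance lets Bob follow slow channel variation from Alice while never adopting a rejected channel as the new reference.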

The channel information of the two frames is obtained by the channel estimation algorithm. Each data frame contains several OFDM symbols. Thus, the channel information of a frame, given in (1), is the collection of its per-symbol channel estimates, where each entry denotes the channel information of one OFDM symbol.

2.2. Hypothesis Testing

A binary hypothesis test is performed to authenticate the identity of consecutive data frames. Suppose the receiver Bob has verified that the kth data frame originates from the legitimate sender Alice and has extracted its channel information, while the sender of the (k+1)th data frame is still unknown and its channel information has also been extracted. The null hypothesis H0 states that the packet was indeed sent by Alice. The alternative hypothesis H1 states that the real sender of the packet is not Alice. The spoofing detection builds the hypothesis test given by (2), where all elements of the two noise terms are i.i.d. complex Gaussian noise samples. If the channel information were used directly for hypothesis testing, the noise variables would have to be taken into account, which would increase the authentication complexity. Since the two noise terms have the same statistical characteristics, using the “difference” of the channel information eliminates the influence of the noise variables. Physical layer authentication thus becomes a comparison between the “difference” of the channel information and a preset threshold. Equation (2) can then be expressed as (3), in which the difference operator returns the difference between its two arguments A and B, and the result is compared with the test threshold.

Under the null hypothesis H0, the identity is legitimate, and Bob accepts this hypothesis if the test statistic he computes is below the threshold. Otherwise, Bob accepts the alternative hypothesis H1 that the identity is illegitimate. The channel response “difference” is recorded as T, so (3) can also be written as (4): Bob accepts H0 when T is below the threshold and accepts H1 otherwise.

As shown in (4), physical layer authentication is in effect a comparison between the channel information “difference” and the authentication threshold. Thus, the channel information difference and the authentication threshold are the key elements of physical layer authentication. A test statistic measures the similarity of the channel information and quantifies the channel information difference. In this paper, we use two test statistics, T_A and T_B. In particular, assume that Bob obtains the channel responses of two consecutive frames from Alice. We build the test statistics T_A and T_B from these two frames to discriminate between Alice and Eve. Subsequently, Bob acquires the channel information of the (k+1)th frame.

The test statistic T_A is calculated as in (5), where the phase offset is given by (6).

From (5), T_A can be taken as the difference of the subcarrier amplitudes, which avoids the effect of the phase offset.

The phase offset accounts for the measurement errors in the phase of the channel response between two consecutive data frames. Each channel response consists of frequency domain channel matrices, one N-dimensional square matrix per OFDM symbol, and the phase offset is defined for the element in each row and column. The second test statistic, T_B, given in (7), is based on both amplitude and phase information. We use T_A and T_B, respectively, as the one-dimensional test statistic for detecting spoofing attacks. Unfortunately, it is hard to find the threshold that achieves the highest authentication detection rate. To tackle this problem, we propose a learning algorithm based on AdaBoost for physical layer authentication, in which T_A and T_B are used as training features.
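
To make the idea of amplitude-based and amplitude-and-phase-based statistics concrete, the following Python sketch computes two generic statistics of this kind from two consecutive channel estimates. The exact expressions of (5) to (7) are not reproduced here: the function names, the normalization by the reference energy, and the least-squares choice of the common phase offset are assumptions made for illustration, loosely in the style of channel-based authentication tests, and they should not be read as the definitions of T_A and T_B.

```python
import cmath


def phase_offset(h_ref, h_new):
    """Common phase rotation that best aligns h_ref with h_new
    (least-squares choice; an assumed form)."""
    s = sum(b * a.conjugate() for a, b in zip(h_ref, h_new))
    return cmath.phase(s)


def amplitude_statistic(h_ref, h_new):
    """Normalized difference of the amplitudes only; insensitive to phase."""
    num = sum((abs(b) - abs(a)) ** 2 for a, b in zip(h_ref, h_new))
    den = sum(abs(a) ** 2 for a in h_ref)
    return num / den


def amplitude_phase_statistic(h_ref, h_new):
    """Normalized difference of the complex values after removing
    the common phase offset between the two frames."""
    rot = cmath.exp(1j * phase_offset(h_ref, h_new))
    num = sum(abs(b - a * rot) ** 2 for a, b in zip(h_ref, h_new))
    den = sum(abs(a) ** 2 for a in h_ref)
    return num / den


# Toy example: the same channel observed with a 0.7 rad phase drift,
# and an unrelated channel (made-up values)
h_k = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]
h_same = [h * cmath.exp(0.7j) for h in h_k]
h_other = [-0.4 + 0.1j, 0.9 + 0.7j, 0.2 - 0.6j]
print(amplitude_statistic(h_k, h_same), amplitude_phase_statistic(h_k, h_same))
print(amplitude_statistic(h_k, h_other), amplitude_phase_statistic(h_k, h_other))
```

In this toy case a pure phase drift leaves both statistics close to zero, while an unrelated channel gives clearly larger values, which is the behaviour a spoofing detector relies on.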

3. Physical Authentication with AdaBoost Algorithm

In this section, we propose a learning algorithm based on AdaBoost for physical authentication.

3.1. AdaBoost Algorithm

AdaBoost, short for adaptive boosting, was developed by Yoav Freund [24] and is the most widely used form of boosting algorithm. Boosting is a powerful technique that combines multiple base classifiers [25] to produce a form of committee whose performance can be significantly better than that of any of the base classifiers. The principle of AdaBoost is to improve performance iteratively: it is adaptive in the sense that subsequent weak classifiers, also called learners, are adjusted in favor of the instances misclassified by previous classifiers. AdaBoost can be seen as a particular method of training a boosted classifier. A boosted classifier has the form given in (8), a sum of weak classifiers, where each weak classifier takes a sample as input and returns a value indicating the class of that sample. The weak classifiers, such as decision trees or support vector machines (SVMs), are trained in sequence. Each weak classifier is trained on a weighted form of the data set, in which the weighting coefficient associated with each data point depends on the performance of the previous classifiers. More specifically, data points that are misclassified by one of the weak classifiers are given greater weight when they are used to train the next weak classifier. As illustrated in Figure 2, once all the classifiers have been trained, or no misclassified data points remain, the final model is generated through a weighted majority voting scheme.
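
For reference, the standard binary AdaBoost formulation from the boosting literature can be written as follows. The symbols used here (G_m for the m-th weak classifier, \alpha_m for its coefficient, e_m for its weighted error rate, w_{m,i} for the weight of sample i in round m, labels y_i \in \{-1, +1\}, M rounds, and N samples) are notation assumed for this illustration and may differ from the notation of the paper's own equations.

```latex
\begin{align*}
w_{1,i} &= \frac{1}{N}, \qquad i = 1, \dots, N, \\
e_m &= \sum_{i=1}^{N} w_{m,i}\, \mathbb{1}\!\left[ G_m(x_i) \neq y_i \right], \\
\alpha_m &= \frac{1}{2} \ln \frac{1 - e_m}{e_m}, \\
w_{m+1,i} &= \frac{w_{m,i} \exp\!\left( -\alpha_m y_i G_m(x_i) \right)}
                  {\sum_{j=1}^{N} w_{m,j} \exp\!\left( -\alpha_m y_j G_m(x_j) \right)}, \\
F(x) &= \operatorname{sign}\!\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right).
\end{align*}
```

A weak classifier with a small error rate e_m receives a large coefficient \alpha_m, and the weights of the samples it misclassifies are multiplied by e^{\alpha_m} before renormalization, which is what makes the next round focus on them.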

3.2. Physical Authentication with AdaBoost Algorithm

Physical layer authentication with the AdaBoost algorithm is proposed for spoofing detection. The procedure of the algorithm is illustrated in Figure 3. Bob collects and records the channel matrix obtained by channel estimation from the pilots sent by Alice. When Bob receives the next data frame from Alice, he collects its channel information as well. In this way, Bob collects the channel information of N consecutive frames from Alice and stores it. At the same time, Eve sends data frames to Bob and claims to be Alice. In practical communication scenarios, we do not know where or who the Eves are. However, the proposed scheme needs samples from Eve for training and testing. Therefore, one or several Eve nodes are set up for this purpose. Bob likewise extracts the channel information of N consecutive frames from Eve and stores it.

Bob then preprocesses the data set. First, Bob calculates the values of the two data sets, one collected from Alice and one collected from Eve. Second, Bob calculates the test statistics T_A and T_B as in (9) and (10).

Finally, Bob generates a training data set with two categories. The first, given in (11a) and (11b), is obtained by substituting the channel information collected from Alice into (9) and (10), which yields T_A and T_B; its label indicates that the transmitter is the legal transmitter, Alice. The second, given in (12a) and (12b), is obtained by substituting the channel information collected from Eve into (9) and (10), which again yields T_A and T_B; its label indicates that the transmitter is the illegal transmitter, Eve. Bob uses these two labeled training data sets as the input training set.

Spoofing detection is essentially a binary classification problem, which we solve with the AdaBoost algorithm. The training data consists of a set of sample points. Each sample point comprises an input sample and a class label that takes one of two values. Each sample point is given an associated weight parameter, indexed by the training round and by the sample point, which is initially set to the same value for all sample points. We suppose that a procedure is available for training a weak classifier on weighted sample points. At each iteration of the training process, AdaBoost trains a new weak classifier on the sample points, whose weighting coefficients are adjusted according to the performance of the previously trained weak classifier so as to give greater weight to the misclassified data points. The classification error rate in (13) is used to evaluate the misclassified data. The coefficient of the weak classifier is then calculated as in (14). Finally, we generate the final model in (8), in which different weights are given to different weak classifiers. The AdaBoost algorithm is given in Algorithm 2. The dimension of each training sample can be doubled by combining the one-dimensional test statistics T_A and T_B, which yields a new two-dimensional feature authentication model for spoofing detection. Therefore, the input training data set T of the AdaBoost algorithm is one of the following two sets:

One-dimensional test statistic training data set, given in (15).

Two-dimensional test statistic training data set, given in (16).

4. Experimental Verification

In this section, we describe the system setup and the experimental procedure used to evaluate how well Algorithm 1 distinguishes Alice from Eve.

Algorithm 1: PHY-layer authentication based on AdaBoost.
Input:
The channel information of the legal transmitter or the illegal transmitter.
Process:
1: Bob calculates the values of the data sets from Alice and the simulated Eve.
2: Bob preprocesses the data sets.
3: The data set is divided into two parts: a training data set and a testing data set.
4: Use the training data set to obtain the weak classifiers.
5: Use the AdaBoost algorithm to generate a strong classifier.
6: Use the testing data set to verify whether the classifier achieves the target detection rate; if not, return to the first step.
7: The final classifier is the authentication decision model, which judges whether new packets are legitimate or illegal.
End

Algorithm 2: AdaBoost algorithm.
Input:
The training data set T.
Process:
1: Initialize the weight distribution of the sample points.
2: for each training round do
3: Use the training data set with the current weight distribution to learn and obtain the weak classifier.
4: Calculate the classification error rate of the weak classifier on the training data set.
5: Calculate the coefficient of the weak classifier.
6: Update the weight distribution of the training data set.
7: Construct a linear combination of the weak classifiers.
End for
return: the final classifier.
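
The steps of the AdaBoost listing above can be illustrated with a small, dependency-free Python sketch. It is a minimal version under stated assumptions rather than the authors' MATLAB implementation: the weak classifier is assumed to be a one-feature threshold stump, labels are assumed to be +1 for the legal transmitter and -1 for the illegal one, and all function names are hypothetical.

```python
import math


def stump_predict(stump, row):
    """Threshold stump: returns pol if row[f] > thr, otherwise -pol."""
    f, thr, pol = stump
    return pol if row[f] > thr else -pol


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidates: one below the minimum, plus midpoints between neighbours
        cands = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in cands:
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict((f, thr, pol), row) != yi)
                if best is None or err < best[0]:
                    best = (err, (f, thr, pol))
    return best


def adaboost_predict(model, row):
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


def adaboost_train(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                      # step 1: uniform initial weights
    model = []
    for _ in range(rounds):                # step 2: training rounds
        err, stump = train_stump(X, y, w)  # steps 3 and 4: weak classifier and error
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # step 5: coefficient
        model.append((alpha, stump))       # step 7: extend the linear combination
        # step 6: raise the weights of misclassified points, then renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
        if all(adaboost_predict(model, row) == yi for row, yi in zip(X, y)):
            break                          # no misclassified training points remain
    return model


# Toy one-dimensional data that no single threshold separates (made-up values)
X = [[i / 10] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_train(X, y, rounds=10)
print([adaboost_predict(model, row) for row in X] == y)
```

The same functions accept two-dimensional rows, so a sample can be either a single test statistic or a pair of test statistics without any change to the training loop.
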
4.1. System Setup

We consider spoofing detection with a receiver called Bob, a legal transmitter called Alice, and a spoofing node called Eve. They were placed at three separate locations in a room, surrounded by many other devices such as printers, desktops, and other equipment, as shown in Figure 4. Obstacles in the wireless channels from Alice to Bob and from Eve to Bob cause scattering and refraction in the room. As shown in Figure 5, we set up an experimental platform implemented on USRPs, and the experiments were performed in an indoor environment. Bob is equipped with an 8 × 8 MIMO system, Alice with a 2 × 2 MIMO system, and Eve with a 2 × 2 MIMO system. The signals are sent over 2 antennas each at a center frequency of 3.5 GHz with a bandwidth of 2 MHz.

4.2. Experiment

In the experiment, the following steps are taken.

Step 1. Bob extracts channel information from Alice and Eve by the existing channel estimation mechanisms, respectively.

Step 2. Bob preprocesses the data set according to (5), (7), (9), and (10), with the values normalized so that the threshold lies between 0 and 1.

Step 3. Bob generates the two-class training data sets according to (11a), (11b), (12a), and (12b).

Step 4. The two-class training data set T is generated according to (15) or (16).

Step 5. Bob trains a strong classifier on the two-class training data set by using the AdaBoost algorithm, implemented in MATLAB.

Step 6. Bob uses the strong classifier to classify the test set and obtains the authentication detection rate.
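
The data preparation part of these steps (normalization, labeling, choice of one-dimensional or two-dimensional features, and the split into training and test sets) can be sketched as follows. This is an illustrative sketch only: the function names, the joint min-max normalization over the frames of both transmitters, the label values +1 and -1, and the 30% test fraction are assumptions, not details taken from the experiment.

```python
import random


def min_max_normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_dataset(ta_alice, tb_alice, ta_eve, tb_eve, two_dimensional=True):
    """Return (features, label) pairs: +1 for Alice frames, -1 for Eve frames."""
    # Normalize each statistic jointly over both transmitters (assumed choice)
    ta = min_max_normalize(list(ta_alice) + list(ta_eve))
    tb = min_max_normalize(list(tb_alice) + list(tb_eve))
    labels = [1] * len(ta_alice) + [-1] * len(ta_eve)
    if two_dimensional:
        features = [[a, b] for a, b in zip(ta, tb)]   # both statistics per frame
    else:
        features = [[a] for a in ta]                  # first statistic only
    return list(zip(features, labels))


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle reproducibly and split into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up statistic values for three Alice frames and two Eve frames
samples = build_dataset([0.1, 0.2, 0.3], [1.0, 2.0, 3.0], [0.6, 0.9], [5.0, 4.0])
train_set, test_set = train_test_split(samples, test_fraction=0.4, seed=1)
print(len(train_set), len(test_set))
```

Keeping the feature vector as a list of length one or two means that the same classifier code can be trained on either variant of the training set.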

In the experiment, five hundred frames were collected, and the values of the test statistics were normalized between 0 and 1. The test statistic of the channel information of Alice and Bob as a function of the frame index is shown in Figure 6(a), in which the red points are computed by (5) and the green points by (9). As can be seen, the two sets of points overlap. Figure 6(b) shows the test statistic in which the red points are computed by (7) and the green points by (10); here the overlapping area is large. This clearly shows that it is difficult to find a manual test threshold that gives the best authentication accuracy. Moreover, we use T_A, T_B, and the frame number to draw a three-dimensional plot. As shown in Figure 7, it is hard to identify the data sets with the traditional manual threshold method in the three-dimensional case. However, an authentication model based on a machine learning algorithm can effectively solve this problem: the AdaBoost adaptive adjustment algorithm produces a curved dividing surface that performs the identification.

4.3. Simulation Results

In this section, simulation results are provided to demonstrate the performance of the proposed authentication scheme.

For comparison, we considered the PHY-layer spoofing detection of [15] with a varying test threshold. From Figure 8, we can see that when the test threshold equals 0.4, the best authentication detection rates obtained with T_A and T_B reach 79.8% and 65.4%, respectively. In addition, our proposed method, which combines the two test statistics T_A and T_B into a two-dimensional feature, can improve the detection accuracy. We use T_A, T_B, and the frame number to draw a three-dimensional plot. Figure 9 compares the spoofing detection of the three methods: the manual threshold method based on the test statistic T_A achieves a 79.8% detection rate, the machine learning based authentication method with the test statistic T_A achieves 87.1%, and the machine learning based authentication method with the two-dimensional features T_A and T_B achieves 91.3%, at the cost of 10% more computational complexity.
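
The manual threshold baseline with a varying test threshold can be mimicked by a simple sweep, sketched below in Python. The sketch rests on assumptions: the detection rate is taken here to be the fraction of all frames classified correctly by the rule "accept if the statistic does not exceed the threshold", the threshold grid has 101 points on [0, 1], and the statistic values are made up for illustration; none of this reproduces the measured curves of Figure 8.

```python
def detection_rate(stat_alice, stat_eve, threshold):
    """Fraction of frames classified correctly by 'accept if T <= threshold'."""
    correct = sum(1 for t in stat_alice if t <= threshold)   # Alice accepted
    correct += sum(1 for t in stat_eve if t > threshold)     # Eve rejected
    return correct / (len(stat_alice) + len(stat_eve))


def sweep_thresholds(stat_alice, stat_eve, steps=100):
    """Try thresholds 0, 1/steps, ..., 1 and return the first best one."""
    best_thr, best_rate = 0.0, -1.0
    for i in range(steps + 1):
        thr = i / steps
        rate = detection_rate(stat_alice, stat_eve, thr)
        if rate > best_rate:
            best_thr, best_rate = thr, rate
    return best_thr, best_rate


# Made-up normalized statistics whose value ranges overlap
alice = [0.05, 0.10, 0.20, 0.35, 0.50]
eve = [0.30, 0.45, 0.60, 0.80, 0.90]
print(sweep_thresholds(alice, eve))
```

Because the two value ranges overlap, no threshold on the grid classifies every frame of this toy data correctly, which is the limitation of a single manual threshold that the learned classifier is meant to address.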

To sum up, the proposed authentication scheme outperforms the manual threshold strategy [15]. Based on the above observations, the proposed machine learning based authentication scheme with the two-dimensional feature not only performs better than the manual method but also achieves a higher authentication rate than the same algorithm with a one-dimensional feature.

5. Conclusions

In this paper, a machine learning based physical layer channel authentication scheme for 5G wireless communication security is proposed. The machine learning authentication method decides whether the received packets come from a legitimate transmitter or from a counterfeiter by using one-dimensional or two-dimensional joint features. The effectiveness of the proposed authentication scheme is validated by extensive simulations. All the data used in the simulations are collected from a real OFDM-MIMO communication platform, which provides a real communication environment. The authentication results show that the proposed methods detect spoofing attacks at a higher rate than manual threshold based physical layer authentication schemes. Because the classifier can be trained offline, the proposed method can perform authentication quickly. In future work, we will investigate whether other machine learning algorithms can further optimize our authentication model and whether a test statistic can be found that yields a larger difference in channel information.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was supported by NSFC (no. 61572114), Sichuan Sci & Tech. Achievements Transformation Project (no. 2016CC003), Sichuan Sci & Tech. Service Development Project (no. 18KJFWSF0368), Hunan Provincial Nature Science Foundation Project 2018JJ2535, Chile CONICYT FONDECYT Regular Project 1181809, and National Key R&D Program of China (2018YFB0904900 and 2018YFB0904905).