Abstract

Physical layer identification is an emerging technique that exploits physical layer features to identify wireless devices. Two metrics are central to any identification scheme: the identification accuracy and the maximum number of devices that can be identified. Existing works primarily focus on feature correlation analysis for multifeature selection without investigating the least upper bound (supremum) of the performance of a single feature. The supremum indicates the limit of the performance and thus offers another perspective for evaluating the quality of features and improving the performance of the identification scheme. Therefore, this paper investigates the supremum of the performance of the most commonly used physical layer feature, i.e., carrier frequency offset (CFO). Specifically, we offer a rigorous mathematical analysis and derive the closed-form expression of the supremum of identification accuracy based on the max-min distance analysis (MMDA) criterion. We then analyze the supremum of the number of distinguishable devices. Finally, we conduct a simulation study to verify the theoretical analysis.

1. Introduction

Device identification plays a vital role in wireless networks and is conventionally realized with pre-distributed information such as IP addresses, MAC addresses, and international mobile station equipment identity (IMEI) numbers. With this information, basic access control [1] and location tracking [2] can be implemented. However, these addresses and numbers can easily be spoofed, exposing wireless devices and infrastructures to security threats [3]. Furthermore, collecting identity information is often restricted for business, privacy, and legal reasons, even though identity is necessary for some applications. Therefore, there is an urgent need to find a more reliable or complementary way to identify devices.

Recently, the rich characteristics of the physical layer have been intensively investigated to implement device identification in wireless networks [4–6], also known as physical layer identification. Various physical layer features can be extracted and used as the device’s identity. These features stem from the small-scale hardware impairments in the transceivers or the location-specific characteristics of the wireless channel between the transmitter and receiver. According to the signal types collected for feature extraction, there are two categories of identification schemes, i.e., transient and steady-state signal-based ones [7]. Since the steady-state signal is easier to capture than the transient one and has attracted more attention from researchers, we concentrate on this type. Two approaches to physical layer identification based on steady-state signals are reported in the literature, distinguished by classifier type.

The first approach, called shallow classifier-based device identification, implements the identification with hand-crafted features calculated from the received signals by carefully designed feature extraction algorithms. These features, usually relying on expert knowledge, are then exploited with traditional shallow classifiers such as support vector machine (SVM) and K-nearest neighbor (KNN) or with binary hypothesis testing to identify and authenticate transmitters [7–11].

The second approach takes advantage of the powerful learning ability of deep learning to identify wireless devices with the collected raw in-phase and quadrature (IQ) signal or its transformed information [6, 12–19]. Hence, it is known as deep learning-based physical layer identification. In this approach, hidden features can be automatically extracted from wireless frames with the aid of the representation learning ability of deep learning methods without using explicit feature calculation algorithms.

Both approaches are regarded as multiclass classification problems when using machine learning classifiers to discriminate multiple devices. Intuitively, the identification accuracy decreases as the quantity of devices increases. In other words, if we want to achieve a desired accuracy, the quantity of devices that can be identified is limited. Recent work supports this claim experimentally: the accuracies drop for both WiFi and ADS-B datasets using two deep learning models when the quantity of devices increases from 100 to 10,000 [19]. Other works focus on combining multiple features to improve identification performance [8], based on the view that a single feature leads to limited identification accuracy or a limited number of distinguishable devices. The user capacity of the physical layer identification system is investigated in [20, 21], where the authors consider the frequency characteristics from the fast Fourier transform (FFT) as the radio frequency fingerprints.

Beyond the related works mentioned above, a detailed analysis of the supremum of the identification accuracy and of the number of distinguishable devices for specific hand-crafted features is still lacking, i.e., what is the highest identification accuracy, and how many devices can be identified at most under given conditions? This is important for investigating the performance and quality of a specific physical layer feature in an identification scheme and gives insight into approaches to improve the performance, such as identification with multiple features. Therefore, this paper focuses on issues not touched upon in existing works with the following contributions:

(1) First, with the max-min distance analysis (MMDA) criterion and other mathematical analyses, we derive the closed-form expression of the supremum of the identification accuracy of the shallow classifier-based scheme given specific conditions.

(2) Second, we analyze the supremum of the number of distinguishable devices (device quantity supremum) of the shallow classifier-based scheme given the accuracy constraint.

(3) Finally, through comprehensive simulations, we confirm the theoretical analysis and provide some interesting insights. The results indicate that the feature range (the value range of the physical layer features such as CFO specified in the standard protocol) and the SNR are the main factors affecting the identification performance, consistent with the theoretical analysis. We compare the accuracy of the shallow classifier– and deep learning–based identification schemes with the accuracy supremum. We also investigate the device quantity supremum by simulation with different CFO ranges and accuracy constraints at various SNRs.

The rest of this paper is organized as follows. Section 2 reviews related works and introduces basic knowledge. Section 3 describes the system and communication models. Section 4 focuses on the supremum analysis of the shallow classifier-based identification scheme. Section 5 presents the simulation settings and results. Finally, Section 6 concludes this paper.

In this section, we first present related works, and further introduce certain basic knowledge regarding machine learning.

2.1. Related Work

There is no difference between shallow classifier– and deep learning–based device identification in terms of essential processes, including feature extraction and device identification. However, steady-state radiometric features such as carrier frequency offset (CFO) [22] and in-phase and quadrature imbalance (IQI) [23] rely on the expert knowledge of signal processing. Therefore, the feature extraction is explicit and protocol-specific. On the other hand, deep learning methods [17, 24] can automatically extract implicit features rather than expert feature engineering based on the raw IQ samples of the signal. However, this process usually requires dedicated hardware, such as graphics processing units (GPUs), to accelerate the computation.

For the first type of identification approaches, the estimation method and the number of independent features are the key factors that affect the performance. It is acknowledged that employing multiple features can improve identification accuracy. Therefore, some existing works focus on exploring new features and integrating them with other features [8]. Peng et al. combine the differential constellation trace figure, CFO, modulation offset, and IQI to identify 54 ZigBee devices and achieve classification error rates of 4.48% and 9.42% under the line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios, respectively [25].

Recently, deep learning-based identification approaches have attracted considerable research attention. They apply various deep neural network models to implement the feature extraction and identification processes. Raw IQ samples or their transformed information, such as the power spectrum and the FFT sequence, can be used directly as the inputs of the models.

However, the upper bounds or the supremum of identification accuracy and the number of distinguishable devices for hand-crafted features remain unclear for device identification. The supremum of a single feature in terms of accuracy and device number indicates the ultimate performance, with which we can design a better identification scheme and implement a more appropriate feature selection and combination. Although Wang et al. [20, 21] explore the user capacity of the physical layer identification system, they consider the frequency characteristics from FFT as the radio frequency fingerprints without analyzing hand-crafted features.

2.2. Machine Learning

From the view of model structure, machine learning can be categorized into shallow classifiers and deep learning. Shallow classifiers, which usually adopt statistical models with only a few layers of composition, were the research mainstream before the breakthrough of deep learning. These classifiers include naive Bayes, support vector machine (SVM), AdaBoost, random forest, and KNN and are still adopted in many commercial classification systems. Deep learning technologies are neural networks with many layers of nonlinear information processing. In recent years, they have been widely studied in many fields such as computer vision, speech recognition, and cybersecurity.

2.2.1. Shallow Classifiers

Shallow classifiers are generally efficient and effective on high-quality samples [26]. This paper adopts KNN as the shallow classifier for device identification based on hand-crafted features. KNN is a simple but efficient machine learning algorithm, usually used for classification and regression. Typically, a new sample is assigned to the class that is most common among its K nearest neighbors as measured by a distance function such as the Euclidean distance, i.e., by majority voting among the neighbors [27].
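The majority-voting rule described above can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not the classifier implementation used in the simulations; the function name `knn_predict` and the toy feature values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Equal-weight vote over the labels of the k closest samples.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): one scalar feature per frame, e.g. an estimated CFO in ppm.
train_x = [(-1.1,), (-0.9,), (-1.0,), (0.9,), (1.0,), (1.2,)]
train_y = ["dev_a", "dev_a", "dev_a", "dev_b", "dev_b", "dev_b"]
print(knn_predict(train_x, train_y, (0.8,), k=3))
```

With a single scalar feature such as CFO, the Euclidean distance reduces to the absolute difference between feature values, so the separation between classes depends directly on how far apart the devices' true feature values lie.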

2.2.2. Deep Learning

There are different deep learning models, such as recurrent neural network (RNN), convolutional neural network (CNN), and generative adversarial network (GAN). This paper focuses on CNN since it has been investigated in much recent literature and has shown great potential in device identification. A general CNN comprises one or more convolutional layers, pooling layers, and fully connected (FC) layers [28]. The convolutional layers aim to promote important hidden features of the input data through the specially designed structures called “filters” having different dimensions, also known as feature extractors. Also, different types of CNN have been investigated for device identification. 1D and 2D CNNs with one/two-dimensional convolutional layers are exploited to identify wireless devices [19, 29]. Complex-valued neural networks are explored in [30] to improve the wireless identification performance.

3. System Model

In this section, we first describe the considered identification and communication models. Then, we formulate the problem of interest, i.e., the least upper bound analysis. Table 1 summarizes the main variables and notations used in this paper.

3.1. Identification Model

As shown in Figure 1, the model comprises a wireless receiver (RX) and wireless transmitters (TX). The receiver attempts to identify each transmitter using the received wireless frames. The receiver first collects the raw IQ samples of the concerned field, e.g., baseband preamble, via frame detection and synchronization from the received wireless signals subject to the concerned channels. The receiver can calculate the hand-crafted features using the raw IQ samples for identification. Also, it can directly use the raw IQ samples to identify transmitters with deep learning. Then, the transmitter identification will be formulated as a multiclass classification problem based on hand-crafted features or raw IQ samples, depending on the adopted classifier.

3.2. Communication Model

At the receiver, the passband signal is down-converted to the baseband. Then, the received baseband signal is sampled by the analog-to-digital converter (ADC) to obtain the discrete complex-valued preamble signal, i.e., the raw IQ samples. The baseband signal with CFO is given as [31]:

$$y(n) = x(n)\, e^{j 2\pi \epsilon n} + w(n), \quad n = 0, 1, \ldots, 2K-1, \tag{1}$$

where $\epsilon = \Delta f\, T_s$ is the normalized CFO, with $\Delta f$ being the CFO, which lies in a range expressed in parts per million (ppm) of the carrier frequency $f_c$. The range is usually specified in the communication standard concerned. Here, $T_s = 1/B$ is the sampling interval with $B$ being the total communication bandwidth, and $w(n)$ is the circular symmetric additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_w^2$. Let $s(n)$ denote the long training symbols with a length of $2K$, which usually contain two $K$-length repetitive sequences. This is a classical preamble structure in many wireless protocols, such as most specifications in the IEEE 802.11 family. When the baseband signal is only subject to the AWGN channel, $x(n)$ is the same as $s(n)$, and the received signal is denoted as (1). When the baseband signal is subject to a multipath channel, $x(n)$ is given by

$$x(n) = \sum_{l=0}^{L-1} h(l)\, s(n-l), \tag{2}$$

where $h(l)$ represents the channel coefficients of the multipath fading channel and $L$ is the maximum channel delay in samples. Notably, we assume the locations of the transmitters and the receiver are fixed. Hence, the Doppler offset is zero, and the channel profile is static during the operation time, which can be considered a quasi-static channel similar to [32]. This is reasonable for many wireless networks such as wireless sensor networks (WSN) and wireless local area networks (WLAN), where fixed sink nodes or routers create a static channel profile when the receiver location is also fixed. We then denote the concerned preamble containing the two repetitive long training sequences as $y(n)$ after synchronization and denote the SNR of the received signal as $\gamma$ with the unit of dB, which is calculated as follows:

$$\gamma = 10 \log_{10} \frac{P_s}{P_w}, \tag{3}$$

where $P_s$ and $P_w$ represent the power of the signal and the noise, respectively.
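To make the AWGN signal model concrete, the following sketch applies a CFO-induced phase rotation to a preamble made of two identical halves and adds circularly symmetric Gaussian noise scaled to a requested SNR in dB. It is an illustration under stated assumptions rather than the simulation code: the function name `add_cfo_and_awgn`, the half-length `K = 64`, the unit-modulus random training sequence, and the CFO value are all hypothetical choices.

```python
import cmath
import math
import random

def add_cfo_and_awgn(x, eps, snr_db, rng):
    """Return x(n) * exp(j*2*pi*eps*n) + w(n), with noise scaled to snr_db."""
    p_signal = sum(abs(v) ** 2 for v in x) / len(x)
    p_noise = p_signal / (10 ** (snr_db / 10))
    # Circularly symmetric noise: total variance p_noise split evenly over I and Q.
    sigma = math.sqrt(p_noise / 2)
    return [
        v * cmath.exp(2j * math.pi * eps * n)
        + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        for n, v in enumerate(x)
    ]

rng = random.Random(0)
K = 64  # assumed length of one training sequence
# Hypothetical unit-modulus training sequence, repeated twice.
half = [cmath.exp(2j * math.pi * rng.random()) for _ in range(K)]
x = half + half
y = add_cfo_and_awgn(x, eps=1e-4, snr_db=20, rng=rng)
print(len(y))
```

Here `eps` is the CFO normalized by the sampling rate, so a CFO of a given ppm value would be converted by multiplying the ppm fraction by the carrier frequency and by the sampling interval.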

3.3. Problem Formulation

For the shallow classifier-based identification scheme with CFO, the CFO will be first estimated from raw IQ samples of the preamble field of the received signal. Then with the collected feature of each frame, we can train a shallow classifier for device identification.

We are interested in how the identification accuracy varies with the range of a single feature and other conditions. Moreover, is there a supremum or upper bound on the identification accuracy and on the number of distinguishable devices for the considered feature? In short, our primary aim is to investigate the performance limits of each specific feature adopted in physical layer identification by answering the above questions.

4. Identification Performance and the Supremum

At the receiver, the raw IQ samples of the long training sequence are adopted to implement hand-crafted feature estimation for identification. We then analyze the supremum of identification accuracy and the device quantity supremum considering CFO.

4.1. CFO Estimation

Similar to [33–35], when the length of the long training sequence is larger than the maximum channel delay in (2), i.e., $K > L$ with $L$ being the maximum channel delay in samples, the CFO can be estimated by the two repetitive long training sequences. We first calculate the phase difference between the frequency responses of two identical and consecutive long training sequences as

$$\hat{\phi} = \angle \left( \sum_{n=0}^{K-1} y^{*}(n)\, y(n+K) \right), \tag{4}$$

where $y(n)$ is the complex I and Q samples of the frequency response of a training sequence and $n$ is the time index in a window of $K$ samples. Then, we achieve the estimated CFO as

$$\hat{\epsilon} = \frac{\hat{\phi}}{2\pi K}, \tag{5}$$

and according to [35], the standard deviation of the CFO estimation, denoted by $\sigma$ and given in (6), can be considered as the lower bound of the CFO estimation error in the receiver. Usually, there exists more than one method for estimating the same feature, each with a different estimation error, i.e., variance, which results in different performances for the shallow classifier-based identification.
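The repetition-based estimator can be illustrated as follows. This is a minimal sketch, assuming the CFO is normalized by the sampling rate and that the two received halves differ only by the CFO-induced phase rotation plus noise; the helper name `estimate_cfo`, the half-length `K = 64`, the noise level, and the test CFO are hypothetical.

```python
import cmath
import math
import random

def estimate_cfo(y, K):
    """Estimate the normalized CFO from two repeated K-sample halves of y."""
    # Correlate each sample of the first half with its copy in the second half.
    # The CFO appears as a common phase rotation of 2*pi*eps*K between them.
    corr = sum(y[n].conjugate() * y[n + K] for n in range(K))
    return cmath.phase(corr) / (2 * math.pi * K)

rng = random.Random(1)
K = 64            # assumed length of one training sequence
eps_true = 1e-3   # hypothetical normalized CFO
half = [cmath.exp(2j * math.pi * rng.random()) for _ in range(K)]
sigma = 0.05      # per-dimension noise standard deviation (about 23 dB SNR)
y = [
    v * cmath.exp(2j * math.pi * eps_true * n)
    + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
    for n, v in enumerate(half + half)
]
eps_hat = estimate_cfo(y, K)
print(f"true {eps_true:.3e}, estimated {eps_hat:.3e}")
```

Because the phase is only defined modulo $2\pi$, this sketch is unambiguous only for normalized CFOs of magnitude below $1/(2K)$. The spread of `eps_hat` over repeated noisy frames plays the role of the estimation standard deviation used in the analysis.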

4.2. The Supremum of the Identification Accuracy

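Since the true CFOs of the transmitters are independent uniform random variables in the concerned range [35, 36], we denote the true CFO of the $i$-th transmitter as $\epsilon_i \sim U(\epsilon_{\min}, \epsilon_{\max})$, $i = 1, \ldots, N$, where $N$ is the number of transmitters and the CFOs are indexed in ascending order. We define the spacing between two adjacent true CFOs as the distance $d_i = \epsilon_{i+1} - \epsilon_i$; then, we have $\sum_{i=1}^{N-1} d_i \le \epsilon_{\max} - \epsilon_{\min}$. Denoting the mean distance of the CFO as $\bar{d}$, we have

$$\bar{d} = \frac{1}{N-1} \sum_{i=1}^{N-1} d_i \le \frac{\epsilon_{\max} - \epsilon_{\min}}{N-1}. \tag{7}$$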

The overall accuracy is widely used for multiclass classification problems and is defined as

$$A = \frac{1}{T} \sum_{i=1}^{N} TP_i, \tag{8}$$

where $T$ is the total number of predictions and $TP_i$ is the number of true positive predictions when considering the classification as a binary classification regarding the $i$-th class and the other classes. We can also express the classification error rate $P_e$ and the accuracy $A = 1 - P_e$ in terms of the Q-function $Q(\cdot)$, as in (9) according to [37].

We assume that the variances of the estimated CFO of all devices are the same at a specific SNR, which means all the CFO samples are drawn from homoscedastic Gaussians with the same standard deviation as in (6). We can therefore adopt the MMDA criterion to achieve the maximum separation of all devices concerning CFO [38]. According to this criterion, to achieve a maximum classification accuracy, we have to maximize the minimum distance over all class pairs (pairs of devices) so that any two classes are separated as well as possible:

$$\max_{\epsilon_1, \ldots, \epsilon_N} \; \min_{1 \le i < j \le N} \left| \epsilon_i - \epsilon_j \right|, \tag{10}$$

where the inner minimization chooses the minimum CFO distance of all class pairs, while the outer maximization maximizes this minimum distance [38]. We denote by $\mathcal{D}$ the set of attainable minimum CFO distances.

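Theorem 1. For $N$ devices, the supremum of the minimum distance of the CFO in the range $[\epsilon_{\min}, \epsilon_{\max}]$ is $d_{\sup} = (\epsilon_{\max} - \epsilon_{\min})/(N-1)$.

Proof. Let $d_{\min} = \min_i d_i$ denote the minimum distance between adjacent true CFOs and $\mathcal{D}$ the set of values it can take. According to the definition of supremum, we first adopt proof by contradiction to prove that $d_{\sup}$ is an upper bound of the minimum distance set $\mathcal{D}$, i.e., $d_{\min} \le d_{\sup}$ for all $d_{\min} \in \mathcal{D}$, where $d_{\min} \le \bar{d}$ with the mean distance $\bar{d}$ as in (7). If $d_{\min} > d_{\sup}$, then we have $\sum_{i=1}^{N-1} d_i \ge (N-1)\, d_{\min} > \epsilon_{\max} - \epsilon_{\min}$, which is a contradiction. Therefore, $d_{\min} \le d_{\sup}$ holds. Second, we prove that $d_{\sup}$ is the minimum of the upper bounds. For any $\eta > 0$, we can find $d' \in \mathcal{D}$, obtained by spacing the $N$ CFOs equally over the range so that $d' = d_{\sup}$, which fulfills $d' > d_{\sup} - \eta$. This completes the proof.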
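A quick numerical check can illustrate Theorem 1: for any placement of the true CFOs in a fixed range, the smallest adjacent gap cannot exceed the range width divided by the number of devices minus one, and equal spacing attains this value. The sketch below is only an illustration; the 40 ppm wide range and the 50 devices are hypothetical values.

```python
import random

def min_spacing(values):
    """Smallest gap between adjacent values after sorting."""
    s = sorted(values)
    return min(b - a for a, b in zip(s, s[1:]))

def equally_spaced(lo, hi, n):
    """n points covering [lo, hi] with identical gaps (hi - lo) / (n - 1)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

lo, hi, n = -20.0, 20.0, 50      # illustrative range in ppm and device count
bound = (hi - lo) / (n - 1)      # claimed supremum of the minimum distance
rng = random.Random(0)
# Best minimum spacing found among 1000 random uniform placements.
best_random = max(
    min_spacing([rng.uniform(lo, hi) for _ in range(n)]) for _ in range(1000)
)
print(bound, min_spacing(equally_spaced(lo, hi, n)), best_random)
```

Random placements typically fall well short of the bound, which is why uniformly distributed true CFOs are expected to yield lower accuracy than the equally spaced case.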

Proposition 2. When the true CFOs of all devices are distributed with equal distance in the concerned range, i.e., the minimum distance of the CFO equals its supremum, the separation of all devices is the best. In this case, the accuracy attains its least upper bound, i.e., the accuracy supremum given in (11).

Proof. First, we prove is an upper bound of the accuracy . Since , where is the cumulative distribution function (CDF) for the standard Gaussian distribution. We have , where is the probability density function (PDF). And when , , which means the Q-function is a monotone decreasing convex function when . Using Jensen’s inequality, we have Combining Equation (11) and the expressions of and in Equations (9) and (12), we have , and thus, is an upper bound of .
Second, we prove is the minimum of the upper bounds of . With , we construct a function as Given that the Q-function is a monotone decreasing convex function when , we have , and when , .
With (12) and (15), then we have and we can find with fulfills , where is the set of identification accuracy. Because combined with (16) and (17) and applying the Lagrange mean value theorem, we have , i.e., as in (17), where , , and . Finally, according to the definition of supremum, the proof completes.

We can observe from (11) that the accuracy supremum is determined by the feature range, the number of transmitters to be identified, and the precision of the feature estimation method (i.e., the standard deviation of the estimate).
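The dependence on feature range, device number, and estimation precision can be explored numerically. The sketch below is not a reproduction of (11); it assumes a nearest-mean decision between equally spaced true CFOs with a common Gaussian estimation error, for which the accuracy takes the familiar form $1 - \frac{2(N-1)}{N} Q\!\left(\frac{d}{2\sigma}\right)$ with spacing $d$ equal to the range width divided by $N-1$. The function names and the numerical values are hypothetical, and a Monte Carlo run is included as a sanity check of the assumed expression.

```python
import math
import random

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def equal_spacing_accuracy(feature_range, n_devices, sigma):
    """Nearest-mean accuracy for equally spaced means with Gaussian noise (assumed model)."""
    d = feature_range / (n_devices - 1)
    # Interior classes can err on both sides, the two edge classes on one side only.
    p_err = 2 * (n_devices - 1) / n_devices * q_function(d / (2 * sigma))
    return 1 - p_err

def simulate_accuracy(feature_range, n_devices, sigma, trials, rng):
    """Monte Carlo estimate of the same nearest-mean accuracy."""
    d = feature_range / (n_devices - 1)
    hits = 0
    for _ in range(trials):
        i = rng.randrange(n_devices)
        obs = i * d + rng.gauss(0, sigma)
        # Nearest equally spaced mean, clipped to the valid class indices.
        guess = min(max(round(obs / d), 0), n_devices - 1)
        hits += guess == i
    return hits / trials

rng = random.Random(0)
print(equal_spacing_accuracy(40.0, 50, 0.3))
print(simulate_accuracy(40.0, 50, 0.3, 20000, rng))
```

Under this assumed model, widening the feature range or reducing the estimation standard deviation raises the accuracy, while adding devices lowers it, which matches the qualitative observation above.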

4.3. The Device Quantity Supremum

We define the supremum of the number of distinguishable devices, or the device quantity supremum, of an identification scheme as the largest device number for which the accuracy supremum still satisfies the given accuracy constraint. With this supremum, we can evaluate the performance limit of the adopted feature for device identification. Intuitively, the supremum is related to the identification accuracy as shown in (11). However, it is difficult to deduce a closed-form expression of the inverse function of (11) concerning the device number $N$ and the accuracy $A$, so we denote (11) as $A = f(N)$ for simplicity. Given the monotone decreasing property of $f$, we have the following proposition.

Proposition 3. For a specific feature range and feature estimation method, given the target accuracy $A_t$, the device quantity supremum is $N_{\sup} = \lfloor f^{-1}(A_t) \rfloor$, where $f$ is as shown in (11).

Proof. We define two functions and , then we can denote (11) as . Since both and are strictly monotone decreasing with and , will be strictly monotone increasing. Thus, will be a strictly monotone decreasing function too. According to the properties of the inverse of strictly monotone function, also will be a monotone decreasing function. The device number is an integer set, denoting as , and is the supremum of the number set . We prove it by contradiction as follows. Suppose that is not the supremum of , which means there is at least an integer that fullfils . Obviously, this is a contradiction because should be according to its definition, which means the premise cannot be true. Thus, the device number supremum is .

It indicates that the maximum number of transmitters that can be identified, i.e., the device quantity supremum under the constraint of the desired accuracy, is determined by the feature’s range and the precision of the estimation method. Although it is difficult to give the closed-form expression of the inverse function of (11), we can depict the relationship between the device number and the accuracy by simulation and observe the variation of the device quantity supremum.
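Since the inverse has no closed form, the device quantity supremum can be found by numerical search. The following sketch relies on an assumed equal-spacing accuracy expression, $1 - \frac{2(N-1)}{N} Q\!\left(\frac{d}{2\sigma}\right)$ with $d$ equal to the range width divided by $N-1$, which may differ from (11), and exploits its monotone decrease in the device number. The function name `max_devices` and all numerical values are hypothetical, and the target accuracy is assumed to lie strictly between 0 and 1.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def equal_spacing_accuracy(feature_range, n_devices, sigma):
    """Assumed accuracy model for equally spaced means (see lead-in)."""
    d = feature_range / (n_devices - 1)
    return 1 - 2 * (n_devices - 1) / n_devices * q_function(d / (2 * sigma))

def max_devices(feature_range, sigma, target_accuracy):
    """Largest N >= 2 whose assumed accuracy still meets the target."""
    if equal_spacing_accuracy(feature_range, 2, sigma) < target_accuracy:
        return 1  # even two devices cannot be separated well enough
    lo, hi = 2, 4
    # Grow the bracket until `hi` fails, then bisect; this relies on the
    # accuracy being monotonically decreasing in the device number.
    while equal_spacing_accuracy(feature_range, hi, sigma) >= target_accuracy:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if equal_spacing_accuracy(feature_range, mid, sigma) >= target_accuracy:
            lo = mid
        else:
            hi = mid
    return lo

for target in (0.9, 0.99, 0.999):
    print(target, max_devices(40.0, 0.05, target))
```

Repeating the search with the estimation standard deviation tied to the SNR would trace out curves of the kind discussed in the simulation study, with tighter accuracy targets yielding smaller device counts.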

5. Simulation Study

5.1. Simulation Settings

We simulated a typical wireless communication process of 802.11a based on OFDM. Similar to [16], we generated beacon frames for transmitter identification, where the (legacy) long training field (L-LTF) was adopted to estimate the CFO. We implemented data generation and processing, machine learning, and deep learning methods in MATLAB R2021a. The platform is a Dell Precision 3640 tower workstation (https://dl.dell.com/topicspdf/precision-3640-workstation_owners-manual2_en-us.pdf) with an Intel(R) Core(TM) i9-10900K CPU and 32 GB RAM running the Ubuntu 18.04 operating system. Further, we used an NVIDIA GeForce RTX 3080 GPU installed in the workstation to train and test the deep learning-based models. The main simulation parameters, including those of the communication system and the Rayleigh channel, are shown in Table 2.

As described in the system model of Section 3, the receiver collects signals from the transmitters and then uses the L-LTF to extract the features and identify the devices. Following the specification, the transmitted L-LTF sequences are the same for all transmitters, enabling the algorithm to avoid any data dependency. Since we assume the transmitters and receiver are static, the multipath channel profile and RF impairments do not vary in time.

After comparing several shallow classifiers in common use, we selected KNN for the shallow classifier-based identification. We set the number of neighbors to 100, the distance metric to “Euclidean”, and the distance weight to “Equal”.

We adopted the same CNN architecture as in [16] for the deep learning identification method. The detailed parameters of the CNN architecture, including convolutional layers (Conv2D), pooling layers (MaxPooling2D), and fully connected layers (FC, also known as dense layers), are shown in Table 3.

To minimize the sampling bias (when selecting data from the dataset) and ensure statistical confidence, we adopted 5-fold cross-validation (CV) in the classification evaluation. We split the dataset into five blocks, ensuring that each block has 200 random frames from each device since we collected 1000 frames per transmitter. Then, we performed five rounds of training and testing for each shallow classifier- and deep learning-based model. In each round, one block was selected as the test dataset, and the rest were used for training. We considered the averaged overall test accuracy of the five-round CV as the final metric to evaluate the identification performance, i.e., $\bar{A} = \frac{1}{5} \sum_{k=1}^{5} A_k$, where $A_k$ is the overall test accuracy of the $k$-th round of CV.
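The block construction and averaging described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study; the function names, the `(device, frame)` index representation, and the `evaluate` callback (which would train a model on the training indices and return its test accuracy) are hypothetical.

```python
import random

def stratified_folds(frames_per_device, n_devices, n_folds, rng):
    """Split (device, frame) indices so each fold holds an equal share per device."""
    folds = [[] for _ in range(n_folds)]
    per_fold = frames_per_device // n_folds
    for dev in range(n_devices):
        idx = list(range(frames_per_device))
        rng.shuffle(idx)  # random assignment of this device's frames to folds
        for k in range(n_folds):
            chunk = idx[k * per_fold:(k + 1) * per_fold]
            folds[k].extend((dev, frame) for frame in chunk)
    return folds

def cross_validate(folds, evaluate):
    """Average the test accuracy over rounds, each fold serving once as the test set."""
    scores = []
    for k, test in enumerate(folds):
        train = [s for j, f in enumerate(folds) if j != k for s in f]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# 1000 frames per device and 5 folds give 200 frames per device in each fold.
folds = stratified_folds(frames_per_device=1000, n_devices=3, n_folds=5,
                         rng=random.Random(0))
print([len(f) for f in folds])
```

Keeping the per-device share identical in every fold prevents any round from being tested on a device mix that differs from the training mix.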

5.2. Identification Accuracy and the Supremum
5.2.1. Identification Under AWGN Channel

As shown in Figure 2, in each CFO range (e.g., 2.5CFO denotes the 2.5 ppm CFO range), the identification accuracy of the shallow classifier-based scheme (i.e., with the KNN classifier and the hand-crafted CFO feature, denoted as HC) is always below the supremum (SUP). As the SNR increases, the accuracy first exceeds that of deep learning and then approaches the supremum. At SNRs up to 0 dB, the gap between the accuracy of the shallow classifier-based scheme and the supremum is small. When the SNR increases from 0 to 50 dB, the gap first widens and then closes. At sufficiently high SNR, the accuracy reaches the supremum of 100% in the range of 20 ppm. This is reasonable because the error of the estimated CFO is smaller at higher SNR, as discussed in Section 4. Moreover, the accuracy of the deep learning-based scheme (denoted as DP) in all CFO ranges converges to approximately 95%, except for 20 ppm, where the accuracy converges to 98%. It indicates that deep learning has limited discriminative capability for CFO at higher SNR compared with hand-crafted feature estimation. On the other hand, the deep learning-based scheme achieves better performance at lower SNR. In other words, the influence of SNR on the deep learning scheme is less significant than on hand-crafted feature estimation.

5.2.2. Identification on More Transmitters

To observe the supremum and identification accuracy at a larger device scale, we implemented the simulation with 400 transmitters. Figure 3 shows the results considering 400 transmitters with the same CFO ranges as in Figure 2. It also presents the identification accuracy of the shallow classifier-based and deep learning-based schemes with the same CFO ranges and the same transmitter quantity under the AWGN channel. Comparing Figures 3(a)–3(d) with Figures 2(a)–2(d), it is evident that the accuracy of both schemes with 400 transmitters is lower than that with 50 transmitters. The supremum decreases as well. Combined with Figure 2, we find that at the SNR of about 45 dB, no matter how the quantity of devices changes, the shallow classifier-based scheme consistently exceeds the deep learning-based one.

5.2.3. Identification Under Static Rayleigh Channel

In Figure 4, we also compare the identification accuracy under the static Rayleigh channel with the supremum. Comparing the performance of the deep learning-based scheme in Figure 2 with that in Figure 4, we can observe that the accuracy is clearly improved under the static Rayleigh channel in each CFO range. However, when looking into the three subfigures in Figure 4, we find that expanding the CFO range has little effect on the accuracy of the deep learning-based scheme. This result suggests that deep learning learns more information from the multipath channel than from the device’s feature, which is consistent with the study of [32]. In other words, the identification accuracy of the deep learning-based scheme is affected more by the channel than by device features. On the other hand, the shallow classifier-based scheme is less affected by the channel than the deep learning one, since its accuracy under the Rayleigh channel is only slightly improved in each CFO range. Especially at low SNRs, the gap between the accuracy of the shallow classifier-based scheme and the supremum under the Rayleigh channel is slightly smaller than that under the AWGN channel.

5.3. The Device Quantity Supremum

Figure 5 presents the device quantity supremum under different SNRs considering CFO as the physical layer feature. Figures 5(b) and 5(d) show the details of Figures 5(a) and 5(c), respectively, over a restricted range of the accuracy constraint. As shown in Figures 5(a) and 5(c), the supremum decreases as the desired identification accuracy increases and increases with the SNR. With the feature range of 2.5 ppm, the results in Figures 5(a) and 5(b) show that when the SNR is 20 dB, the receiver can identify no more than 15 transmitters under the required accuracy of 90%. Even at a high SNR of 40 dB, if a 90% accuracy is required, there should be no more than 140 transmitters. Moreover, when higher accuracy is required, the supremum becomes smaller. For example, with the accuracy requirement reaching 99.9%, the device quantity supremum decreases to 7 and 70 at the SNRs of 20 and 40 dB, respectively. Such a small CFO range is nonetheless quite common for some off-the-shelf wireless devices operating under rigorous synchronization requirements, such as mobile phones and high-end laptops [39].

In Figures 5(c) and 5(d), when the feature range is widened to 20 ppm, the supremum increases as well. For the same accuracy requirement of 90%, the device quantity supremum reaches 112 at the SNR of 20 dB, more than seven times that in the CFO range of 2.5 ppm. However, the supremum is only 11 when the SNR is 0 dB. With the high accuracy requirement of 99.9%, the supremum is 5, 56, and 563 at the SNRs of 0, 20, and 40 dB, respectively. Thus, it is evident that when only one feature such as CFO is considered, the device identification capability, i.e., the device quantity supremum, is small, especially when the range is small. Combining the analysis of the supremum with the simulation results, we find that a larger feature range, more precise feature estimation, and a higher SNR of the signal can improve the accuracy and the device quantity supremum.

6. Conclusion

This paper analyzed the accuracy supremum and the device quantity supremum of the shallow classifier-based physical layer identification scheme based on a hand-crafted feature. Specifically, we mathematically analyzed and deduced the closed-form expression of the accuracy supremum of the identification scheme based on CFO. We also investigated the device quantity supremum, i.e., the supremum of the number of distinguishable devices. The simulation results are consistent with the theoretical analysis. According to the analysis and simulation, the fingerprinting space is insufficient when only one feature, such as CFO, is considered. Thus, if we want to identify more devices, a larger feature range or a higher SNR of the signal can help.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key R&D Program of China (Grant No. 2018YFE0207600), the Natural Science Foundation of China (NSFC) under Grant 61972308, the Natural Science Basic Research Program of Shaanxi (Program No. 2019JC-17), the Hebei Science Supported Planning Projects Under Grant 20310701D, the JSPS A3 Foresight Program (Grant No. JPJSA3F20200001), and the Grants-in-Aid for Scientific Research (JSPS KAKENHI, Grant Number 20K19801).