Abstract

Wi-Fi-based indoor localization has received extensive attention in wireless sensing. However, most Wi-Fi-based indoor localization systems have complex models and high localization delays, which limit the universality of these localization methods. To solve these problems, a depthwise separable convolution-based passive indoor localization system (DSCP) is proposed. DSCP is a lightweight fingerprint-based localization system that includes an offline training phase and an online localization phase. In the offline training phase, the indoor scenario is first divided into different areas to set training locations for collecting CSI. Then, the amplitude differences of these CSI subcarriers are extracted to construct location fingerprints, thereby training the convolutional neural network (CNN). In the online localization phase, CSI data are first collected at the test locations, and then, the location fingerprint is extracted and finally fed to the trained network to obtain the predicted location. The experimental results show that DSCP has a short training time and a low localization delay. DSCP achieves a high localization accuracy, above 97%, and a small median localization distance error of 0.69 m in typical indoor scenarios.

1. Introduction

1.1. Background

As mobile smart devices and wireless networks influence all aspects of human production and life, location-based services (LBSs) have gradually become indispensable in people's lives [1]. Although global positioning systems (GPSs) have become the leader in outdoor navigation and localization, the severe attenuation of GPS signals caused by reinforced concrete makes it difficult to navigate and localize indoors. The great research significance and practical value of indoor localization have attracted many researchers, and a large number of indoor localization methods [2–4] have emerged. Currently, wireless sensor networks, computer vision, ultrasonic, ultrawideband, Bluetooth, and RFID [5, 6] are mainly used in indoor localization. However, most of these localization methods require a large number of hardware devices, which limits the universality of localization. With the widespread adoption of Wi-Fi, an increasing number of researchers have applied Wi-Fi to wireless sensing. Wi-Fi-based indoor localization methods [7–9] have gradually become mainstream.

The received signal strength (RSS), as the main energy characteristic measurement of wireless signals, can be directly obtained from most wireless terminals and is widely used in Wi-Fi-based indoor localization systems [10]. RADAR, proposed by Microsoft Research, was the first system to implement indoor localization using WLAN [11]. RADAR uses RSS and fingerprint matching and was the first to propose a fingerprint-based localization method. However, in the indoor environment, wireless signals propagate from the transmitter to the receiver through multiple paths due to the presence of obstacles such as walls and furniture. The signal received by the receiver is the superposition of signals from multiple paths. As an average of multipath signals, the RSS is unstable in indoor environments, which limits the reliability and localization accuracy of RSS-based indoor localization systems. Compared to RSS, channel state information (CSI) is fine-grained information from the PHY layer, which describes multipath propagation. Figuratively speaking, CSI is to RSS what a rainbow is to a sunbeam [12]. By modifying the firmware, a sampled version of the channel frequency response (CFR) can be obtained in the form of CSI from some Wi-Fi network interface cards (NICs), such as the Intel 5300. Each set of CSI values characterizes the amplitude and phase of the orthogonal frequency division multiplexing (OFDM) subcarriers. Therefore, CSI has better static stability and dynamic sensitivity [13]. Many researchers have begun to use CSI in indoor localization [14–16].

Fingerprints of different locations are generally different. Therefore, many researchers use classification algorithms to implement fingerprint-based localization. As traditional classification algorithms, kNN and Bayesian classification are widely used in fingerprint-based localization. There are many fingerprint-based localization systems in Wi-Fi wireless sensing. Sen et al. proposed a fingerprint-based localization system called PinLoc [17] using the distinguishability of CSI at different locations. Xiao et al. used the frequency diversity and spatial diversity of CSI to construct subchannel power information location fingerprints based on CSI amplitude and proposed the FIFS system [18]. Xiao et al. subsequently proposed the first CSI-based passive indoor localization system, Pilot [19]. Pilot uses probability estimation to match the obtained CSI with the fingerprint database to estimate the position. Chapre et al. used the amplitude differences and phase differences of CSI to construct fingerprints and proposed a Wi-Fi fingerprint system, CSI-MIMO [20]. Abdel-Nasser et al. [21] observed that the CSI probability density distribution of different locations nicely fits a Gaussian mixture distribution and used k-means to construct a cluster-based fingerprint database. To improve the localization accuracy, Sabek et al. proposed a single access point-monitoring point- (AP-MP-) based localization system, MonoStream [22], which models passive localization as an object recognition problem.

Compared with traditional methods, deep learning-based methods are more accurate. Wang et al. proposed deep learning-based CSI fingerprint indoor localization systems DeepFi and PhaseFi [23, 24] using the amplitude and phase of CSI, respectively. In the offline training phase, the system uses the weights of the deep learning training as the location fingerprints. In the online localization phase, the probability method based on the radial basis function is used to obtain the estimated location. Chen et al. proposed the first CNN-based Wi-Fi indoor localization system ConFi [25]. When extracting CSI features, ConFi simulates RGB images, which have three channels. The data of different transmitting-receiving antenna pairs in a multiple input multiple output (MIMO) system are regarded as the different channels of the image. ConFi predicts the location of the target by calculating the probability of the target at different training points. Similarly, Cai et al. constructed location fingerprints based on the amplitude of CSI and proposed a passive localization system PILC [26].

Most CSI fingerprint localization systems based on traditional methods have shortcomings, such as the single features of location fingerprint information and complex models. Although deep learning methods can mine more features, the training of network models generally takes considerable time and limits the real-time capability of localization. Therefore, we propose a CSI fingerprint localization system based on a lightweight convolutional neural network.

1.2. Our Contributions

In our method, we extract amplitudes of CSI subcarriers from data of different transmitting-receiving antenna pairs and calculate the differences. Then, we use amplitude differences to construct CSI feature images. Each channel of the CSI feature image corresponds to data of one transmitting-receiving antenna pair. We design a depthwise separable convolution-based network consisting of seven convolutional layers. In the offline training phase, we collect CSI indoors and construct CSI feature images for network training. In the online localization phase, CSI feature images of test data are sent to the trained network to obtain the predicted location.

The main contributions of this paper are as follows:
(i) This paper proposes to use depthwise separable convolutions in the network to speed up network training and reduce localization delay in an indoor localization system based on a CSI fingerprint
(ii) The proposed indoor localization system uses CSI feature images constructed from the amplitude difference of CSI subcarriers as location fingerprints. Similar to CSI feature images based on the amplitude of subcarriers, CSI feature images based on the amplitude difference of subcarriers combine time, frequency, and spatial domain information
(iii) The proposed indoor localization system uses CSI feature images based on the amplitude difference of subcarriers, which can reflect the difference in the signal attenuation of different subcarriers in multiple paths and amplify the differences between different location fingerprints

The remainder of this paper is organized as follows. The related work and the preliminary are presented in Section 2. The DSCP system is presented in Section 3. Section 4 provides the experimental validation, while Section 5 concludes the paper.

2. Preliminary

This section introduces the concepts needed for the subsequent analysis, including CSI background information, fingerprint-based localization, and depthwise separable convolution.

2.1. Channel State Information

CSI represents the channel characteristics of the communication link between the transmitter and the receiver, reflecting the effects of scattering, attenuation, etc. on signal propagation. According to IEEE 802.11n [27], a signal can be transmitted by OFDM through a set of mutually orthogonal subcarriers with different frequencies. To estimate the CSI matrix, the transmitter transmits a known pilot sequence $x$, and the combined received signal $y$ can be expressed as
$$y = Hx + n,$$
where $H$ is the CSI matrix and $n$ is the noise.

Therefore, the CSI matrix can be approximated as
$$\hat{H} = \frac{y}{x}.$$

For a system with $N$ subcarriers, $H$ can be expressed as
$$H = [H_1, H_2, \ldots, H_N],$$
where
$$H_i = |H_i| e^{j \angle H_i}.$$

For a MIMO system with $N_t$ transmitting antennas and $N_r$ receiving antennas, $H_i$ can be expressed as
$$H_i = \begin{bmatrix} h_{11} & \cdots & h_{1 N_r} \\ \vdots & \ddots & \vdots \\ h_{N_t 1} & \cdots & h_{N_t N_r} \end{bmatrix},$$
where $h_{mn}$ corresponds to the complex number describing the subcarrier amplitude and phase on the stream from transmitting antenna $m$ to receiving antenna $n$.
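
The relations above can be illustrated with a minimal Python sketch: a channel is estimated by dividing the received signal by a known pilot and is then split into amplitude and phase. The array shapes (3 antenna pairs, 30 subcarriers), the synthetic channel, the unit-modulus pilot, the noise level, and all variable names are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3 transmit-receive antenna pairs, 30 subcarriers.
n_pairs, n_subcarriers = 3, 30

# Synthetic "true" channel and a known unit-modulus pilot (both assumed).
H_true = (rng.normal(size=(n_pairs, n_subcarriers))
          + 1j * rng.normal(size=(n_pairs, n_subcarriers)))
x = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n_subcarriers))
noise = 0.01 * (rng.normal(size=H_true.shape)
                + 1j * rng.normal(size=H_true.shape))

# Received signal per subcarrier: channel times pilot plus noise.
y = H_true * x + noise

# Channel estimate: divide the received signal by the known pilot.
H_est = y / x

# Each complex entry carries one subcarrier's amplitude and phase.
amplitude = np.abs(H_est)
phase = np.angle(H_est)
```

Only `amplitude` would be carried forward in a system that, like the one described in this paper, ignores the phase.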

There are 56 subcarriers in total in the 20 MHz bandwidth and 114 subcarriers in the 40 MHz bandwidth. Grouping is a method that reduces the size of the CSI Report field by reporting a single value for each group of adjacent subcarriers [27]. The subcarrier frequency spacing is 312.5 kHz in the 20 MHz channel [28]. As shown in Table 1, for a 20 MHz bandwidth, there are three grouping modes defined in IEEE 802.11n, where $N_s$ is the number of subcarriers sent. This paper is based on the Wi-Fi CSI in a 20 MHz channel.

2.2. Fingerprint-Based Localization

In an indoor environment, due to walls and obstacles, wireless signals form multipath effects. Humans at different physical locations have different effects on wireless signal propagation paths. The differences can be expressed as fingerprints that reflect the characteristics of humans at different locations [29]. Generally, in the case of small indoor environment changes, a valid location fingerprint needs to meet the following two conditions:
(i) The fingerprints of different locations differ
(ii) The fingerprints of the same location at different times are stable

Fingerprint-based localization is usually divided into two phases: the offline training phase and the online localization phase. In this paper, in the offline training phase of DSCP, we first divide the indoor scenario and set sampling points. The features of the CSI data collected at different sampling points are called location fingerprints. We construct CSI feature images as location fingerprints using the amplitude differences of CSI subcarriers. Then, the location fingerprints are processed to construct a fingerprint map. We design the DSCP network based on depthwise separable convolutions. The DSCP network is trained with the location fingerprints of the fingerprint map during the offline training phase. In the online localization phase of DSCP, the target fingerprint is constructed from the CSI data of the target location and is input into the trained DSCP network. Finally, the predicted location is output by the network.

2.3. Depthwise Separable Convolution

Howard et al. of Google Inc. proposed a lightweight deep neural network, MobileNet [30], based on depthwise separable convolutions. Unlike a standard convolution, each depthwise separable convolution is made up of a depthwise convolution and a pointwise convolution. The depthwise convolution does not change the depth of the input image; each channel of the image is filtered by its own convolution kernel with a depth of 1. The pointwise convolution then applies a $1 \times 1$ convolution to the image to combine the channels. As shown in Figure 1, assume that the input image is $D_H \times D_W \times M$, where $D_H$ is the height of the image, $D_W$ is the width of the image, and $M$ is the number of channels, and that the size of the output image needs to be $D_H \times D_W \times N$. If a standard convolution kernel of size $D_K \times D_K$ is used, the calculation cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_H \cdot D_W$. Using a depthwise separable convolution, the total calculation cost is $D_K \cdot D_K \cdot M \cdot D_H \cdot D_W + M \cdot N \cdot D_H \cdot D_W$, which is only $\frac{1}{N} + \frac{1}{D_K^2}$ of that of the standard convolution.
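
The cost comparison in this paragraph can be checked numerically. The following Python sketch counts multiply-accumulate operations for both kinds of convolution; the function names and the example sizes (a 3 x 3 kernel, 3 input channels, 32 output channels, a 30 x 30 feature map) are hypothetical values chosen for illustration and are not taken from the DSCP network.

```python
def standard_conv_cost(k, m, n, h, w):
    """Multiply-accumulate count of a k x k standard convolution that maps
    m input channels to n output channels on an h x w output map."""
    return k * k * m * n * h * w


def depthwise_separable_cost(k, m, n, h, w):
    """Cost of a k x k depthwise convolution followed by a 1 x 1 pointwise
    convolution with the same input and output sizes."""
    depthwise = k * k * m * h * w   # one k x k filter per input channel
    pointwise = m * n * h * w       # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical example sizes (assumptions, not the paper's settings).
k, m, n, h, w = 3, 3, 32, 30, 30
ratio = depthwise_separable_cost(k, m, n, h, w) / standard_conv_cost(k, m, n, h, w)
print(f"cost ratio: {ratio:.4f}")
```

The ratio depends only on the kernel size and the number of output channels, so the saving grows as either of them increases.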

Although the calculation cost of a depthwise separable convolution is much lower than that of a standard convolution, the experiments of MobileNet show that replacing standard convolutions with depthwise separable convolutions reduces accuracy on ImageNet by only about 1%. Depthwise separable convolutions can therefore significantly improve the computational efficiency of a network at the cost of only a small reduction in accuracy. For this reason, we use depthwise separable convolutions in the DSCP network to speed up training and improve localization efficiency.

3. The DSCP System

This section introduces the DSCP system, including the architecture of the system, location fingerprints, the DSCP network structure, time complexity of the network, and location estimation.

3.1. System Architecture

As shown in Figure 2, the DSCP system architecture includes an offline phase and an online phase. In the offline phase, the CSI data of each training location are first collected, and then, the amplitude of the CSI is extracted. DSCP constructs CSI feature images as fingerprints of each training location for training the network. In the online phase, the CSI data of the target location are collected, and the fingerprint of the target location is extracted to be input into the trained network. Finally, the predicted location is output by the DSCP network. The DSCP network has seven convolutional layers, as shown in Figure 3.

3.2. Location Fingerprint

The Intel 5300 NIC reports CSI for 30 subcarriers. In this paper, we use the subcarrier amplitudes when constructing the location fingerprints. Each packet contains a CSI matrix whose entries are the complex numbers describing the subcarrier amplitudes and phases on each stream. For each transmitting-receiving antenna pair, a sliding window of size $w$ is used to select consecutive data packets in a sequence. To match the 30 subcarriers, we make the height and width of the CSI feature images the same. We therefore set $w$ to 30, which means that the subcarriers of 30 consecutive data packets are chosen to construct one matrix. Multiple sets of matrices are constructed. The structure of the CSI feature images is shown in Figure 4. Similar to RGB images, the data of each antenna pair correspond to one channel, and each location has several CSI feature images of size $30 \times 30 \times 3$ [25]. As shown in Figure 5, the CSI feature images formed by the amplitudes of subcarriers can reflect the differences between adjacent locations.

The amplitudes of the subcarriers characterize the power fading of the wireless signals between the transmitter and receiver [31]. The signal received by the receiver also contains measurement noise. The frequency difference between adjacent subcarriers is 625 kHz in our experiments. Subcarriers of different frequencies are affected differently by the multipath effect, and the difference is reflected in the subcarrier amplitudes. ConFi [25] is the first CNN-based Wi-Fi indoor localization system and proposes to use CSI feature images as location fingerprints. ConFi uses the amplitudes of subcarriers from consecutive data packets to construct CSI feature images and maps the CSI feature subimages from the different antenna pairs into the RGB channels of the image. The elements of each row are the amplitudes of the 30 subcarriers in one packet. The experiments of ConFi show that a CSI feature image of size $30 \times 30$ gets the highest accuracy, which is about 90% on the test set. Therefore, CSI feature images based on the amplitudes of CSI subcarriers are valid location fingerprints. In indoor environments, wireless signals are affected by multipath effects. We consider that the amplitudes of the subcarriers are greatly affected by the physical environment, which may mask the comparatively small influence of a human on the CSI. Different from ConFi, to amplify the differences between the fingerprints of different locations, we use the amplitude differences between adjacent subcarriers to construct the CSI feature images at different locations. The CSI feature images combine the time domain, frequency domain, and spatial domain information of the CSI data. Therefore, the CSI feature images of each location are used as the location fingerprints (as shown in Figure 6).

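For transmitting antenna $m$ and receiving antenna $n$, the amplitude of subcarrier $k$ in packet $t$ can be expressed as
$$A_{t,k}^{mn} = \left| h_{t,k}^{mn} \right|,$$
where $h_{t,k}^{mn}$ is the complex CSI value of that subcarrier.

The amplitude difference between adjacent subcarriers is
$$\Delta A_{t,k}^{mn} = A_{t,k+1}^{mn} - A_{t,k}^{mn}, \quad k = 1, \ldots, 29.$$

Then, for a sliding window of size $w$ starting at packet $t$, one channel of a CSI feature image can be expressed as
$$F_t^{mn} = \begin{bmatrix} \Delta A_{t,1}^{mn} & \Delta A_{t,2}^{mn} & \cdots & \Delta A_{t,29}^{mn} \\ \Delta A_{t+1,1}^{mn} & \Delta A_{t+1,2}^{mn} & \cdots & \Delta A_{t+1,29}^{mn} \\ \vdots & \vdots & \ddots & \vdots \\ \Delta A_{t+w-1,1}^{mn} & \Delta A_{t+w-1,2}^{mn} & \cdots & \Delta A_{t+w-1,29}^{mn} \end{bmatrix}.$$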

The pseudocode for CSI feature image construction is presented in Algorithm 1. The input of the algorithm consists of the packets, each with a CSI matrix, for each of the locations and the sliding window size. First, for the data of each transmitting-receiving antenna pair, we extract the amplitudes of the CSI subcarriers from each packet (line 4). Then, we compute the amplitude differences (line 5). Finally, we use a sliding window to group the amplitude differences of 30 packets to construct $n$ CSI feature images for each location (lines 10-13).

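Algorithm 1: CSI feature image construction.
Input: packets with CSI matrix for each of the locations, sliding window size $w$.
Output: CSI feature images of each location.
for each location do
 for each packet do
  for each transmitting-receiving antenna pair do
   For the antenna pair, extract the CSI amplitude of each subcarrier
   Compute the amplitude difference between adjacent subcarriers
  end
 end
end
Initialize the window position;
for each window position do
  Construct the CSI feature image based on the amplitude differences of the $w$ packets in the window;
  Slide the window to the next position;
end
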
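
The construction described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `build_feature_images`, the input layout `(packets, antenna_pairs, subcarriers)`, and the non-overlapping window step of 30 are all hypothetical choices, since the step by which the window slides is not stated here. Taking differences of 30 adjacent subcarriers yields 29 columns in this sketch.

```python
import numpy as np


def build_feature_images(csi, window=30, step=30):
    """Build CSI feature images from amplitude differences.

    csi: complex array of shape (packets, antenna_pairs, subcarriers).
    Returns an array of shape
    (n_images, window, subcarriers - 1, antenna_pairs),
    i.e. one image channel per antenna pair.
    Requires at least `window` packets.
    """
    amplitude = np.abs(csi)                 # amplitude of every subcarrier
    diff = np.diff(amplitude, axis=2)       # adjacent-subcarrier differences
    n_packets = diff.shape[0]

    images = []
    # Slide a window over the packet (time) axis; `step` is an assumption.
    for start in range(0, n_packets - window + 1, step):
        block = diff[start:start + window]  # (window, pairs, subcarriers - 1)
        # Move the antenna-pair axis last so it acts as the image channel.
        images.append(np.transpose(block, (0, 2, 1)))
    return np.stack(images)


# Usage with synthetic data: 90 packets, 3 antenna pairs, 30 subcarriers.
rng = np.random.default_rng(0)
csi = rng.normal(size=(90, 3, 30)) + 1j * rng.normal(size=(90, 3, 30))
images = build_feature_images(csi)
print(images.shape)
```

A smaller `step` would produce overlapping windows and therefore more feature images per location from the same recording.
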
3.3. Network Structure

Inspired by MobileNet, we design the DSCP network. The DSCP network is based on depthwise separable convolutions and has seven convolutional layers. The first convolutional layer of the DSCP network is a standard (full) convolution, and the others are depthwise separable convolutions. Table 2 shows the network structure. In the network, the strides of the standard convolution, the pointwise convolutions, and the first depthwise convolution are set to 1 to keep the size of the image. The strides of the second and third depthwise convolutions are set to 2. There is a batchnorm and a ReLU after each depthwise convolution and pointwise convolution. Since only 30 subcarriers can be extracted with the Intel 5300 NIC, the CSI feature images are much smaller than real images. Therefore, the kernel size of each DSCP depthwise convolution is set to $3 \times 3$. The output size of the final fully connected layer depends on the number of training locations. Adam [32] is an effective stochastic optimization algorithm with high computational efficiency and low memory requirements. We use Adam optimization in the network training.

Similar to processing images with the CNN, DSCP considers the amplitude differences between four adjacent subcarriers of the same packet in each convolution. Moreover, for the same pair of adjacent subcarriers, DSCP also considers their amplitude differences at three consecutive time points. Therefore, the DSCP network can learn the correlation of the CSI amplitude differences in the time and frequency domains.

The three channels of the CSI feature images correspond to the data of different transmitting-receiving antenna pairs in the MIMO system. A standard convolution considers the different channels of the image simultaneously. In contrast, the depthwise convolutions of DSCP apply a separate convolution kernel to each of the three channels of the CSI feature images. DSCP thus completely separates learning the frequency- and time-domain correlations from learning the spatial-domain correlation of the CSI subcarriers, reducing the coupling of the convolution kernels.
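
The separation described above can be made concrete with a small NumPy sketch of the two building blocks. It is an illustration under simplifying assumptions, not the DSCP network itself: it uses stride 1 with no padding (so the output shrinks), omits batchnorm and ReLU, and the function names, random weights, and the choice of 8 output channels are hypothetical.

```python
import numpy as np


def depthwise_conv(image, kernels):
    """Depthwise convolution, stride 1, no padding.

    image: (H, W, C); kernels: (k, k, C), one k x k kernel per channel.
    Each output channel depends only on the matching input channel.
    """
    h, w, c = image.shape
    k = kernels.shape[0]
    out = np.zeros((h - k + 1, w - k + 1, c))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k, :]            # (k, k, C)
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))
    return out


def pointwise_conv(image, weights):
    """1 x 1 convolution: mixes channels at every pixel.

    image: (H, W, C); weights: (C, N). Returns (H, W, N).
    """
    return image @ weights


# Usage on a synthetic 3-channel feature image (sizes are assumptions).
rng = np.random.default_rng(1)
feature_image = rng.normal(size=(30, 29, 3))
dw_kernels = rng.normal(size=(3, 3, 3))
pw_weights = rng.normal(size=(3, 8))

time_freq = depthwise_conv(feature_image, dw_kernels)   # per antenna pair
mixed = pointwise_conv(time_freq, pw_weights)           # across antenna pairs
print(time_freq.shape, mixed.shape)
```

In this sketch, the depthwise step only ever looks within one antenna pair's channel (time and frequency), and the pointwise step only ever combines channels at a single position (space), which is the decoupling the paragraph describes.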

3.4. Location Estimation

During the localization phase, test data are fed to the trained network, which outputs the probability of the test data at each of the training locations. Probability-based methods tend to have higher precision than classification-based methods. Therefore, we use the weighted mean method based on probability to estimate the final location. Assuming that there are $N$ training locations, the coordinate of the $i$th training location is $L_i$, and the probability of the test data at the $i$th training location is $p_i$, the estimated location $\hat{L}$ can be expressed as
$$\hat{L} = \sum_{i=1}^{N} p_i L_i.$$

The pseudocode for network training and location estimation is presented in Algorithm 2. The inputs of the algorithm are the CSI feature images for training and the CSI feature image of the target. First, the CSI feature images are used to train the DSCP network (lines 1-3). Then, the probability of the target at each training location is computed by sending the CSI feature image of the target to the trained DSCP network (line 6). Finally, we use the weighted mean method based on probability to estimate the location of the target (lines 8-10).

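Algorithm 2: Network training and location estimation.
Input: CSI feature images of the training data, CSI feature image of the target, coordinates $L_i$ of the training locations.
Output: Location $\hat{L}$ of the target.
for each training epoch do
 Train the DSCP network using the CSI feature images of the training data;
end
$\hat{L} \leftarrow 0$;
for each training location $i$ do
 Compute the probability $p_i$ of the target at location $i$ by the trained DSCP network;
end
for each training location $i$ do
 $\hat{L} \leftarrow \hat{L} + p_i L_i$;
end
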
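
The probability-weighted mean used for location estimation can be sketched in a few lines of Python. The function name `estimate_location`, the two-dimensional coordinates, and the renormalization of the probabilities are assumptions made for this illustration; the probabilities stand in for the output of a trained network.

```python
import numpy as np


def estimate_location(probabilities, coordinates):
    """Probability-weighted mean of the training-location coordinates.

    probabilities: length-N sequence, one value per training location.
    coordinates:   N x 2 array-like of (x, y) positions in meters.
    """
    p = np.asarray(probabilities, dtype=float)
    xy = np.asarray(coordinates, dtype=float)
    p = p / p.sum()    # guard against outputs that do not sum exactly to 1
    return p @ xy      # sum over i of p_i * (x_i, y_i)


# Usage: three hypothetical training locations on a 2 m grid.
coords = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
probs = [0.5, 0.5, 0.0]   # stand-in for the network's softmax output
print(estimate_location(probs, coords))
```

Because the estimate is a convex combination of the training coordinates, it can fall between sampling points, which a pure classification output cannot do.
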
3.5. Time Complexity of the Network

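The total time complexity of all convolutional layers [33] is
$$O\left( \sum_{l=1}^{d} n_{l-1} \cdot s_l^2 \cdot n_l \cdot m_l^2 \right),$$
where $l$ is the index of a convolutional layer, $d$ is the number of convolutional layers, $n_l$ is the number of output channels of the $l$th layer (so that $n_{l-1}$ is its number of input channels), $m_l$ is the length of the output feature image, and $s_l$ is the length of the filter. The time complexity of the network is the sum of the time complexities of the individual layers.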
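
As a rough illustration of how such a per-layer sum is evaluated, the following Python sketch adds up the cost of a list of standard convolutional layers, each contributing input channels times squared filter length times output channels times squared output length. The function name and the two example layers are hypothetical and do not reproduce the layer settings of Tables 2 and 3. For a depthwise layer the channel product would have to be replaced by the channel count alone, which this sketch does not model.

```python
def conv_time_complexity(layers):
    """Sum the cost of standard convolutional layers.

    layers: iterable of (in_channels, out_channels, filter_len, out_len).
    Each layer contributes in_channels * filter_len^2 * out_channels * out_len^2.
    """
    return sum(c_in * f * f * c_out * m * m
               for c_in, c_out, f, m in layers)


# Hypothetical two-layer example: a 3 x 3 layer followed by a 1 x 1 layer.
example_layers = [
    (3, 8, 3, 30),    # 3 -> 8 channels, 3 x 3 filter, 30 x 30 output
    (8, 16, 1, 30),   # 8 -> 16 channels, 1 x 1 filter, 30 x 30 output
]
total = conv_time_complexity(example_layers)
print(total)
```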

If only standard convolution kernels are used, the network has 4 convolutional layers; this variant is referred to as Conv_DSCP, and its structure is shown in Table 3. We estimate the time complexity of the DSCP and Conv_DSCP networks. For simplicity, we only consider the convolutional layers. For the same input image size, the calculation cost of Conv_DSCP is approximately 4,867,776 and that of DSCP is approximately 163,397.5 in the same units, which is only 3.36% of that of Conv_DSCP. Thus, DSCP can achieve a short network training time and a low localization delay.

4. Experiment Validation

To illustrate the performance of the proposed DSCP method, it is compared with three other methods: the two CNN-based methods ConFi [25] and PILC [26] and the deep learning-based method DeepFi [23]. All methods are evaluated on the same data.

4.1. Experiments Setup

In the experiments, we use a TP-LINK router as the transmitter. A Dell commercial desktop PC equipped with an Intel 5300 NIC and Ubuntu 14.04 is used as the receiver. TensorFlow is accelerated with an NVIDIA RTX2080 graphics card during the experiments. The sampling rate is set to 50 Hz. DSCP is a single-target localization system. When collecting data, there is only one person in the room. We wrote a script to perform data collection. The tester stands at each location for 60 seconds, and the receiver automatically collects 3,000 data packets. The Intel 5300 NIC has three antennas, and the wireless router has one antenna, which forms a $1 \times 3$ MIMO system. The amplitudes and phases of the 30 subcarriers can be acquired from the NIC. Due to hardware imperfections, there are errors in the phase information acquired by common commercial network cards, and the original phase information is often not used directly. It is common to calculate the exact phase shift by means of multiple linear regression and then experiment with the phase shift. To simplify the system, we only use the amplitude information of CSI.

In this paper, four indoor scenarios were selected for experiments (as shown in Figures 7–10). The sampling points were divided into training locations and test locations marked as blue and red, respectively. Scenario 1 was a laboratory. A total of 32 sampling points were set. The spacing between adjacent sampling points was 1 m. Twenty-four locations were selected for training and 8 for testing. Scenario 2 was a 40 m² laboratory. Forty-five locations were selected for training, and 32 locations were selected for testing. Scenario 3 was an office. Sixty-six locations were selected for training and 30 for testing. The spacing between adjacent sampling points was 0.6 m in both Scenarios 2 and 3. Scenario 4 was a typical living room. Fifteen locations were selected for training and 5 for testing. The spacing between adjacent sampling points was 0.5 m.

4.2. The Comparison Works

The performance of DSCP was evaluated in four scenarios. We compared DSCP with three deep learning-based methods, PILC [26], ConFi [25], and DeepFi [23]. ConFi was the first CNN-based Wi-Fi indoor localization system. PILC and ConFi both use CSI amplitudes to construct CSI feature images with three channels, which hold the data of different transmitting-receiving antenna pairs in a MIMO system. PILC is a passive localization system based on CNN. The PILC network has six layers. DeepFi uses the CSI information of all the subcarriers from three antennas and proposes to use the weights in the deep network to represent fingerprints. DeepFi uses a probabilistic data fusion method based on the radial basis function for online location estimation.

For DSCP, experiments based on CSI amplitudes and on amplitude differences were performed. The median distance error and the mean distance error of the different localization methods were calculated. The median distance error is the median of the localization errors of all test cases, and the mean distance error is their average.
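
The two error metrics, together with the fraction of test cases below a distance threshold as read from a CDF plot, can be computed as in the following Python sketch. The function names, the use of the Euclidean distance in two dimensions, and the sample coordinates are assumptions for illustration only and are not measured results.

```python
import numpy as np


def distance_errors(predicted, actual):
    """Euclidean distance between predicted and true positions (meters)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return np.linalg.norm(diff, axis=1)


def summarize(errors, threshold):
    """Median error, mean error, and fraction of errors below `threshold`."""
    errors = np.asarray(errors, dtype=float)
    return {
        "median": float(np.median(errors)),
        "mean": float(np.mean(errors)),
        "fraction_below": float(np.mean(errors < threshold)),
    }


# Usage with made-up positions (illustrative only).
predicted = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
actual = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
errors = distance_errors(predicted, actual)
summary = summarize(errors, threshold=2.0)
print(summary)
```

Evaluating `fraction_below` over a range of thresholds gives the empirical cumulative distribution function of the localization error.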

4.3. Model Evaluation

DSCP was compared with two CNN-based indoor localization methods, PILC and ConFi. As shown in Figure 11, the training loss of PILC started to converge when the number of epochs was approximately 50. The training loss of ConFi converged when the number of epochs was approximately 1,000. The training loss of DSCP converged when the number of epochs was approximately 20. In subsequent experiments, the number of epochs was set according to the convergence conditions of the different networks. The network training times of the three CNN-based localization methods are shown in Figure 12 and Table 4. Times are given in seconds.

In this experiment, 75% of all training location data were used as the training set, and the remaining 25% were used as the test set. The localization accuracies of DSCP, PILC, and ConFi were calculated. The batch sizes of DSCP and PILC were equal to the numbers of training locations. The numbers of epochs of DSCP, PILC, and ConFi were 30, 60, and 1,200, respectively. The batch size of ConFi was 256. As shown in Figure 13 and Table 5, the localization accuracies of DSCP in Scenarios 1 and 2 were both above 99%, and the localization accuracy in Scenario 3 reached 97%. The localization accuracy of DSCP in each scenario was higher than those of the other two methods.

4.4. Localization Performance

Scenario 1. Figure 14 and Table 6 show the localization errors of the four methods in Scenario 1. Scenario 1 was a typical laboratory with some indoor furniture. It can be seen from the experiments that DSCP obtained a median error of 1.75 m and a mean error of 1.77 m. Compared with other methods based on deep learning, the localization error of DSCP was smaller. DSCP was 57.71% lower than PILC in median distance error, 44.82% lower than ConFi, and 24.23% lower than DeepFi. Figure 15 presents the cumulative distribution function (CDF) of localization errors of the four methods in Scenario 1. With DSCP, approximately 40% of the test cases had an error under 1.5 m, and approximately 73% of the test cases had an error under 2 m.

Scenario 2. Figure 16 and Table 6 show the localization errors in Scenario 2. There was no obstacle in the experimental area of Scenario 2, and there were line of sight (LOS) paths between the transmitter and the receiver. As the experiments in Scenario 2 show, DSCP obtained a median error of 0.98 m and a mean error of 1.16 m. DSCP was 42.83% lower than PILC in median distance error, 38.93% lower than ConFi, and 29.77% lower than DeepFi. Figure 17 presents the CDF of localization errors of the four methods in Scenario 2. With DSCP, 55% of the test cases had an error under 1 m, and approximately 90% of the test cases had an error under 2 m.

Scenario 3. Figure 18 and Table 6 show the localization errors in Scenario 3. Scenario 3 was a typical office with a large number of office chairs in the room. The experimental results show that DSCP obtained a median error of 2.63 m and a mean error of 2.54 m. DSCP was 18.33% lower than PILC in median distance error, 12.31% lower than ConFi, and 4.42% lower than DeepFi. Figure 19 depicts the CDF of localization errors of the four methods in Scenario 3. With DSCP, 30% of the test cases had an error under 2 m, and approximately 75% of the test cases had an error under 3 m.

Scenario 4. Figure 20 and Table 6 show the localization errors in Scenario 4. Scenario 4 was a living room with ordinary furniture. The experimental results show that DSCP obtained a median error of 0.69 m and a mean error of 0.91 m. DSCP was 30.61% lower than PILC in median distance error and 50.90% lower than ConFi. Figure 21 depicts the CDF of localization errors of the four methods in Scenario 4. With DSCP, 45% of the test cases had an error under 0.5 m, and approximately 75% of the test cases had an error under 1 m.

The experiments show that the four methods achieved better performance in Scenarios 2 and 4 than in the other two scenarios. The reason may be that there was no obstacle between the transmitter and the receiver to block wireless signals in Scenarios 2 and 4. In Scenario 3, the multipath propagation of the wireless signal was more complicated, so the localization performance was not as good as that in the other scenarios. The location fingerprints based on the CSI amplitude differences were slightly better than those based on amplitudes.

5. Conclusion

This paper proposed DSCP, a passive CSI-based indoor localization system using Wi-Fi wireless sensing. A neural network based on depthwise separable convolutions was adopted in fingerprint-based localization to speed up network training and reduce localization delay. We conducted experiments in four typical indoor scenarios and compared DSCP with several existing CSI fingerprint-based indoor localization methods. Although the results vary with the indoor environment, the experiments showed that DSCP achieved better performance than the other methods in terms of localization accuracy and localization error. DSCP was also more efficient than the other CNN-based localization methods in network training and localization. A possible future extension is to use both the amplitude and the phase of CSI subcarriers when constructing CSI feature images and then to use these CSI feature images as location fingerprints in a depthwise separable convolution-based passive indoor localization system.

Data Availability

The CSI fingerprint data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

An earlier and preliminary version of this paper was presented at the 2020 IEEE Wireless Communications and Networking Conference (WCNC), https://ieeexplore.ieee.org/document/9120638.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China under grant nos. 61702284, 61873131, and 61872194 and Postgraduate Research & Practice Innovation Program of Jiangsu Province under grant no. KYCX19_0972.