Abstract

With the emergence of tools for extracting channel state information (CSI) from commercial WiFi devices, CSI-based device-free activity recognition has developed rapidly and has been widely used in security monitoring, smart homes, medical monitoring, and other fields. However, existing CSI-based activity recognition algorithms need a large number of training samples to achieve satisfactory recognition accuracy. To address this problem, an attention-based bidirectional LSTM method using multidimensional features (the MF-ABLSTM method) is proposed. In this method, signal preprocessing and the continuous wavelet transform are used to construct a time-frequency matrix; the sample entropy characterizes the statistical feature of the CSI amplitudes, the energy difference at a fixed time interval characterizes the time-domain feature of activities, and the energy distribution over different frequency components characterizes the frequency-domain feature of activities. By expanding the training samples with the proposed tensor prediction algorithm, accurate activity recognition can be achieved with only a few samples. Extensive experiments verify the good performance of the MF-ABLSTM method.

1. Introduction

In recent years, activity recognition technology has developed rapidly and has been widely used in smart homes, medical care, safety monitoring, and other fields. Activity recognition technologies can be divided into wearable device-based [1] and device-free technologies. The former requires the target to wear devices, which is inconvenient and increases the cost, whereas the latter does not require the target to carry any device. Therefore, device-free activity recognition has become the main research direction in this field.

Device-free activity recognition methods can be divided into video-based, RSS (Received Signal Strength)-based, and CSI (Channel State Information)-based methods. Video-based activity recognition is a popular approach [2, 3] that recognizes activities intuitively and with high accuracy. However, it has inherent shortcomings: it requires the target to be within the line of sight, requires good lighting conditions, and invades privacy. RSS-based methods recognize activities from the reflection, refraction, diffraction, and absorption of radio-frequency signals caused by the human body [4]. However, these methods require many nodes, which makes them costly, and RSS is vulnerable to environmental interference and has poor stability, so they are gradually being replaced by CSI-based methods. With the rapid development of the Internet of Things (IoT) [5, 6] and the emergence of tools for extracting CSI from commercial WiFi devices, CSI-based activity recognition has become a research hotspot in this field and has been widely used because of its advantages: convenient data acquisition, no need for additional devices or for the target to carry devices, no dependence on illumination or line of sight, no invasion of privacy, and good data stability.

CSI-based activity recognition methods fall mainly into two categories: machine learning-based and deep learning-based methods. Research on machine learning-based methods, such as those based on Support Vector Machine (SVM), K-means, and Naive Bayes, started earlier. These methods all need to denoise the original data, intercept the activity segments, and extract features, and then input the processed data into the corresponding machine learning algorithm to recognize the activities. These common machine learning-based methods have the following problems: (i) SVM is memory-intensive, requires careful selection of the kernel function, and is not suitable for large datasets. (ii) K-means requires the number of clusters K to be specified in advance, and selecting K is usually difficult; moreover, if the clusters in the training data are not spherical, K-means produces poor clusters. (iii) The Naive Bayes algorithm is too simple, so its activity recognition accuracy is poor.

In recent years, with in-depth research on deep learning by Stanford University and Google Inc., deep learning algorithms have achieved excellent results in image recognition, speech recognition, and natural language processing [7–9]. Long short-term memory (LSTM) is a special type of recurrent neural network (RNN) proposed by Hochreiter and Schmidhuber in 1997; it is suitable for processing and predicting time series with relatively long intervals and delays [10]. Some researchers have applied LSTM to CSI-based device-free activity recognition. For example, Damodaran and Schäfer [11] and Wang et al. [12] used LSTM to recognize human activities and locations, respectively, and achieved high recognition accuracy. However, these methods use only a single-layer unidirectional LSTM, which is too simple to recognize complex activities: using only forward (past) information, it is difficult to recognize the CSI changes caused by complex activities, so the recognition accuracy is low. To address this problem, Chen et al. [13] proposed an attention-based BLSTM method (called the ABLSTM method) for passive human activity recognition. In this method, representative features are learned bidirectionally from the raw CSI, and an attention mechanism assigns different weights to the learned features, achieving better human activity recognition performance. However, the denoised experimental data is input directly into the ABLSTM network and the features are extracted automatically, so the features are not targeted to the characteristics of human activities. Moreover, achieving higher recognition accuracy requires more training samples, which greatly increases the cost of collecting them.
When training samples are insufficient, overfitting occurs, which results in poor generalization of the ABLSTM network and reduces activity recognition accuracy. In practice, most users of an activity recognition system are unwilling to spend time collecting training samples; for example, elderly users of a nursing system are generally unwilling to cooperate in collecting a sufficient number of samples. Therefore, achieving high-accuracy human activity recognition with a small training set (i.e., small samples) is an urgent problem for deep learning methods. Meta-learning is a common approach to small-sample problems; it mainly uses learned prior knowledge together with small samples to recognize new patterns. However, the prior knowledge itself can only be acquired from a large number of training samples, so meta-learning is not suitable for the small-sample problem addressed in this paper.

To solve the small-sample activity recognition problem, we first build on the method of [13] and construct the input feature matrix of the ABLSTM network from the statistical features of the CSI amplitude and its time-frequency domain features. With these more targeted inputs, our multidimensional-feature ABLSTM network can extract human activity features and classify activities more accurately. Second, to further improve recognition accuracy, we propose a tensor prediction method that expands a small number of training samples into enough training samples with characteristics similar to the existing ones. Specifically, the main contributions of this paper are as follows:
(1) To reduce the cost of collecting training samples, a tensor prediction-based sample generation method is proposed. It generates a large number of training samples with characteristics similar to a small set of existing samples, improving the activity recognition accuracy of deep learning methods.
(2) To further improve the accuracy of ABLSTM-based human activity recognition, an ABLSTM deep learning method using multidimensional features (the MF-ABLSTM method) is proposed. In this method, the sample entropy characterizes the statistical feature of the CSI amplitudes, the energy difference at a fixed time interval characterizes the time-domain feature of human activities, and the energy distribution over different frequency components characterizes the frequency-domain feature of human activities. The feature vector composed of these features is input to the ABLSTM deep learning network.
(3) To verify the performance of the proposed method, extensive experiments are carried out. The results show that the MF-ABLSTM method still achieves more than 92% human activity recognition accuracy with small CSI sample sets.

The rest of this paper is organized as follows. Section 2 introduces the related work on human activity recognition. Section 3 introduces the tensor prediction algorithm for sample expansion. Section 4 describes the proposed MF-ABLSTM method in this paper. In Section 5, the performance of the proposed method is verified by experiments and discussed. Section 6 gives the conclusion of this paper.

2. Related Work

At present, a large number of CSI-based human activity recognition applications have emerged, including human daily activity recognition, gesture recognition, fall detection, and physiological index perception.

2.1. Daily Activity Recognition

Wang et al. [14] proposed an activity recognition algorithm based on channel selection. The algorithm actively selects WiFi channels of good quality, constructs an extended channel that hops seamlessly between adjacent channels, and then inputs time and frequency features into an LSTM network for activity recognition. Existing human activity recognition methods consider the temporal correlation of CSI in each subcarrier but ignore the spatial correlation. To solve this problem, Cui et al. [15] proposed the WiAReS system, which analyzes the spatial correlation as well as the temporal correlation. While preserving the locality of temporal and spatial patterns, WiAReS uses a Convolutional Neural Network (CNN) to extract features from CSI automatically, together with an ensemble structure that integrates a Multilayer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM). Sheng et al. [16] also considered spatial-temporal information and used a CNN to extract CSI features automatically, but they used BLSTM to recognize the activities and designed a transfer learning method. Wang et al. [17] proposed a CSI velocity model that quantifies the relationship between CSI changes and human movement speed, and a CSI activity model that quantifies the relationship between human movement speed and human activity; by combining the two models, they achieved high human activity recognition accuracy. Fang et al. [18] proposed a layered hybrid model based on a directed statistical model for scenarios in which new activities that were not predefined or trained appear in the environment. The model can be updated incrementally without collecting a large amount of training data or storing historical sensing data. Also targeting limited training data, Zhang et al. [19] proposed a data augmentation method that transforms and synthesizes CSI data and designed a Dense-LSTM deep learning model to solve the overfitting problem on small CSI datasets.

2.2. Gesture Recognition

Abdelnasser et al. [20] used RSSI to perform gesture recognition in three steps: primitive extraction, motion recognition, and motion mapping. The primitive extraction step involves discrete wavelet denoising, edge extraction, and primitive detection. In the motion recognition step, the primitive gestures are segmented and recognized. Finally, in the motion mapping step, the gesture composed of several motions is determined. Ohara et al. [21] recognized gestures using CSI extracted from smartphones. Their method extracts the component corresponding to the speed of the user's hand movement from the Doppler frequency shift and recognizes gestures without knowing the target's location. Bu et al. [22] proposed a gesture recognition method based on deep transfer learning. First, the CSI stream representing gestures is captured, and gesture segments are extracted from the amplitude change of the CSI. Then, the gesture segment data is expressed as an image matrix, and a CNN is used to extract features. Finally, deep transfer learning is used to complete gesture recognition. Zhang et al. [23] proposed WiNum, a gesture recognition system based on a gradient boosting decision tree (GBDT). The system uses the discrete wavelet transform to eliminate noise in the raw CSI data and a proposed adaptive segmentation algorithm (AGS) based on entropy difference to segment gestures. Experimental results show that the system achieves an average recognition accuracy of 91% for finger gestures. Thariq et al. [24] proposed a sign language recognition method based on wireless devices that recognizes 30 static gestures and 19 dynamic gestures, using SVM, KNN, and neural network algorithms to evaluate the gesture recognition accuracy. Experimental results show that the SVM algorithm achieves higher gesture recognition accuracy than the other algorithms in home and office environments.

2.3. Fall Detection

CSI-based human activity recognition is applied not only to daily activity and gesture recognition but also to specific scenarios, such as fall detection for the elderly. Han et al. [25] were the first to use CSI to detect falls of the elderly, issuing a warning when an elderly person was in danger. To reduce the influence of the environment on fall detection, Hu et al. [26] proposed DeFall, an environment-independent fall detection system consisting of an offline stage and an online stage. In the offline stage, the system first models the speed and acceleration of human falls and then uses dynamic time warping (DTW) to generate typical fall features. In the online stage, the system analyzes the real-time speed and acceleration of the human body, evaluates the similarity between the resulting CSI features and the typical features, and then detects falls. To compare different deep neural network algorithms, Cheng et al. [27] evaluated fall detection performance using CNN, GRU, and LSTM, respectively; the results show that GRU has the best fall detection performance. However, the long training time of deep learning algorithms limits their commercial application. To solve this problem, Ding et al. [28] proposed a method that identifies the fall state automatically using an RNN: the collected data is uploaded to a proxy server, which processes the data and identifies the fall state, and the client application obtains the result from the proxy server.

2.4. Physiological Index Perception

Model-based CSI recognition methods do not need an offline training process and can recognize fine-grained vital signs such as breathing. Zhang et al. [29] proposed a method of monitoring human respiration using the Fresnel diffraction model. In this method, the Fresnel diffraction model accurately quantifies the relationship between the diffraction gain and the slight displacement of the human chest, and the best respiration monitoring position is determined from a heatmap. Experimental results show that this method achieves more than 98% respiration rate monitoring accuracy. To study the influence of human body position and orientation on respiration detection, Wang et al. [30] proposed a theory relating body position and orientation to the detectability of respiration, which explains when and why a person's respiration can be detected by WiFi devices. In practical respiration detection, multi-person scenarios are inevitable. Based on the Fresnel zone model, Yang et al. [31] carefully placed the WiFi transceivers so that the signal components influenced by different people could be separated from the received CSI and attributed to the corresponding person. In addition, they considered the influence of sleep movements and sleep postures on the signal to improve the robustness of the system.

3. Training Sample Expansion

When deep learning algorithms are used for activity recognition, the more CSI training samples are available, the higher the recognition accuracy. However, collecting a large number of training samples greatly increases the collection cost and hinders transferring the algorithm to different environments. Therefore, in this paper, the few existing training samples are expanded using the following tensor prediction algorithm.

The existing CSI training samples are represented as a three-dimensional tensor whose three dimensions are the number of samples, the number of subcarriers, and the number of sampling points, respectively. A second tensor with twice as many samples is then constructed, in which the 2u-th and (2u+1)-th initial samples are both set equal to the u-th existing sample. Following [32], the optimization problem (1) is constructed over a three-dimensional tensor variable that represents the predicted tensor of the expanded samples; its objective contains a constant coefficient, a Frobenius-norm (F-norm) term, and a trace-norm term. The trace norm of the predicted tensor $\mathcal{X}$ is defined as

$$\|\mathcal{X}\|_* = \frac{1}{3}\sum_{i=1}^{3} \|\mathcal{X}_{(i)}\|_* \tag{2}$$

where $\mathcal{X}_{(i)}$ is the matrix obtained by unfolding $\mathcal{X}$ along its $i$-th dimension. As (2) shows, the trace norm of a tensor is the average of the trace norms of its unfoldings along the three dimensions, so problem (1) is equivalent to problem (3), in which the tensor trace norm is replaced by this average over the unfolded matrices. To simplify (3), Liu et al. [32] introduced an additional matrix together with weight coefficients for each term and transformed (3) into problem (4). The variables of (4), namely two tensors and the additional matrix, are optimized by block coordinate descent: in each step only one variable is allowed to change while the others are held fixed, and these updates are iterated to obtain the final prediction tensor.
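The unfolding operation and the averaged trace norm in (2) can be sketched with NumPy. This is a minimal sketch assuming a mode-$i$ unfolding that moves dimension $i$ to the rows and flattens the remaining dimensions into columns; that column order differs from some textbook conventions, but the trace norm (the sum of singular values) does not depend on column order. The function names `unfold`, `fold`, and `tensor_trace_norm` are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` unfolding: dimension `mode` becomes the rows, the rest are flattened."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of `unfold`: rebuild a tensor of the given shape from its unfolding."""
    moved = (shape[mode],) + tuple(s for k, s in enumerate(shape) if k != mode)
    return np.moveaxis(M.reshape(moved), 0, mode)

def tensor_trace_norm(X):
    """Average of the nuclear (trace) norms of all unfoldings, as in (2) for 3-D tensors."""
    return sum(np.linalg.norm(unfold(X, i), "nuc") for i in range(X.ndim)) / X.ndim

# Toy tensor: 2 samples x 3 subcarriers x 4 sampling points.
X = np.arange(24.0).reshape(2, 3, 4)
print([unfold(X, i).shape for i in range(3)])  # [(2, 12), (3, 8), (4, 6)]
print(np.allclose(fold(unfold(X, 1), 1, X.shape), X))  # True
```

For a rank-one tensor every unfolding has rank one, so all three trace norms coincide and (2) returns that common value.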

First, in the k-th iteration, only the additional matrix is allowed to change while the other variables are held fixed, which gives the subproblem (5) for the additional matrix.

Let $U\Sigma V^{\mathsf T}$ be the singular value decomposition of the unfolded matrix in (5), where $\Sigma$ is the diagonal matrix whose diagonal elements are the singular values in descending order. Using this decomposition, (5) has the closed-form solution (6) [32].
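Assuming, as in the SiLRTC-type algorithms of Liu et al. [32], that the solution (6) is singular value soft-thresholding with a threshold $\tau$ determined by the weight coefficients, the update can be sketched as follows. The name `svt` and the explicit `tau` argument are hypothetical; the exact threshold used in (6) is not reproduced here.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: shrink every singular value of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s is sorted in descending order
    s_shrunk = np.maximum(s - tau, 0.0)               # values below tau become zero
    return (U * s_shrunk) @ Vt                        # scale columns of U, then recombine

M = np.diag([3.0, 1.0])
print(svt(M, 1.5))  # approximately diag(1.5, 0)
```

Shrinking the singular values lowers the nuclear norm of the result and drops weak components entirely, which is why this step promotes a low-rank prediction.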

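Next, only the tensor synthesized from the additional matrix is allowed to change, with the other variables held fixed, which gives the subproblem (7). Equation (7) is solved by (8), in which a synthesis function builds this tensor from the result of the "fold" operation applied to the additional matrix.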

Finally, only the remaining tensor is allowed to change, which gives the subproblem (9); this subproblem is solved by (10).

After the prediction tensor has been calculated, the k-th iteration ends. The iteration stops when the root mean square error between the prediction samples of two successive iterations falls below a preset threshold or when a fixed number of iterations is reached.
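The overall iteration (duplicate the samples, alternate the block updates, stop on the RMSE criterion) can be sketched under simplifying assumptions. The objective below is an assumed SiLRTC-style form, $\sum_i \alpha_i\|M_i\|_* + \frac{\beta_i}{2}\|\mathcal{X}_{(i)} - M_i\|_F^2 + \frac{\gamma}{2}\|\mathcal{X} - \mathcal{T}_2\|_F^2$, with one auxiliary matrix per unfolding; it is not necessarily the paper's problem (4), it keeps only two variable blocks, and it omits the third tensor update of (9)–(10). The function name, default weights, and tolerance are hypothetical. Because every step treats duplicated samples symmetrically, each duplicated pair stays identical in this simplified version, so the sketch shows the structure of the iteration rather than how the paper's method makes the generated samples differ.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    moved = (shape[mode],) + tuple(s for k, s in enumerate(shape) if k != mode)
    return np.moveaxis(M.reshape(moved), 0, mode)

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def expand_samples(T, alpha=(1.0, 1.0, 1.0), beta=(1.0, 1.0, 1.0),
                   gamma=1.0, tol=1e-6, max_iter=100):
    """Block coordinate descent on the assumed objective described above."""
    T2 = np.repeat(T, 2, axis=0)  # 0-based samples 2u and 2u+1 both start as sample u
    X = T2.copy()
    for _ in range(max_iter):
        # Update each auxiliary matrix M_i with X fixed: singular value thresholding.
        Ms = [svt(unfold(X, i), alpha[i] / beta[i]) for i in range(3)]
        # Update X with the M_i fixed: closed-form weighted average of the folded M_i and T2.
        weighted = sum(beta[i] * fold(Ms[i], i, X.shape) for i in range(3))
        X_new = (weighted + gamma * T2) / (sum(beta) + gamma)
        rmse = np.sqrt(np.mean((X_new - X) ** 2))  # change between two successive rounds
        X = X_new
        if rmse < tol:
            break
    return X

T = np.random.default_rng(0).normal(size=(3, 4, 5))  # 3 samples, 4 subcarriers, 5 points
X = expand_samples(T)
print(X.shape)  # (6, 4, 5)
```

With all $\alpha_i = 0$ the thresholding is disabled and the sketch returns the duplicated tensor unchanged, which is a quick sanity check of the update formulas.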

4. MF-ABLSTM Method

4.1. System Framework

The framework of the proposed MF-ABLSTM method is shown in Figure 1. The method consists of two stages: offline training and online testing. In both stages, the collected CSI data are first preprocessed, including antenna and subcarrier selection, Gaussian filter denoising, and activity interval interception, and multidimensional features are then extracted: the energy difference at a fixed time interval, the energy distribution over different frequency components, and the sample entropy. In the offline training stage, the extracted features are used to train the ABLSTM network. In the online testing stage, the extracted features are input into the ABLSTM network, which recognizes the human activities using the parameters obtained in the training stage.

4.2. System Implementation
4.2.1. Construction of Time-Frequency Matrix

Neither time-domain nor frequency-domain features alone can fully characterize the influence of human activities on the CSI amplitude, so joint time-frequency features of the samples are used. To extract them, a time-frequency matrix is constructed for each sample as follows:
(i) The antenna whose data is used in the experiments is selected according to the overall fluctuation of its CSI amplitude.
(ii) The CSI amplitudes of the 30 subcarriers of the selected antenna are averaged, and the average is used as the data for subsequent processing.
(iii) Because the raw CSI amplitude contains considerable environmental noise, a Gaussian filter is applied to denoise the averaged CSI amplitude.
(iv) The collected CSI amplitude contains stationary intervals, when the human body is still, and fluctuation intervals, when the body is moving; only the fluctuation intervals carry features of human activities, so the classical variance method [33] is used to intercept the fluctuation interval from the filtered CSI amplitude.
(v) A continuous wavelet transform with the Morlet wavelet is applied to the intercepted CSI amplitude to construct the time-frequency matrix for subsequent feature extraction.
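Steps (i)–(v) can be sketched with NumPy as below. This is an illustration under stated assumptions, not the paper's implementation: the antenna criterion (largest standard deviation of the subcarrier-averaged amplitude), the Gaussian width `sigma`, the sliding window `win`, the "half of the maximum variance" threshold, the Morlet parameter `w0 = 6`, and the scales are all hypothetical choices, and the variance method of [33] may differ in detail.

```python
import numpy as np

def _centered_conv(x, k):
    """Convolution trimmed to len(x) samples, centred on the kernel."""
    full = np.convolve(x, k, mode="full")
    start = (len(k) - 1) // 2
    return full[start:start + len(x)]

def gaussian_smooth(x, sigma):
    """(iii) Gaussian low-pass filter; edges are padded to avoid artificial drops."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    k /= k.sum()
    return np.convolve(np.pad(x, r, mode="edge"), k, mode="valid")

def intercept(x, win, thresh=None):
    """(iv) Keep the span whose sliding-window variance exceeds a threshold."""
    var = np.array([x[i:i + win].var() for i in range(len(x) - win + 1)])
    if thresh is None:
        thresh = 0.5 * var.max()  # hypothetical rule, not taken from [33]
    active = np.flatnonzero(var > thresh)
    if active.size == 0:
        return x
    return x[active[0]:active[-1] + win]

def morlet_energy(x, scales, w0=6.0):
    """(v) Morlet CWT; returns |coefficient|^2 with one row per scale (frequency)."""
    rows = []
    for s in scales:
        t = np.arange(-int(4 * s), int(4 * s) + 1)
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        rows.append(np.abs(_centered_conv(x, np.conj(psi)[::-1])) ** 2)
    return np.array(rows)

def preprocess(csi_amp, sigma=5.0, win=50):
    """csi_amp: amplitude array of shape (antennas, packets, subcarriers)."""
    per_antenna = csi_amp.mean(axis=2)                   # (ii) average over subcarriers
    ant = int(np.argmax(per_antenna.std(axis=1)))        # (i) hypothetical: largest fluctuation
    smoothed = gaussian_smooth(per_antenna[ant], sigma)  # (iii)
    return intercept(smoothed, win)                      # (iv)

# Synthetic example: 3 antennas, 1000 packets, 30 subcarriers, activity on antenna 1.
rng = np.random.default_rng(1)
amp = 10 + 0.1 * rng.normal(size=(3, 1000, 30))
amp[1, 400:600, :] += 3 * np.sin(np.linspace(0, 20 * np.pi, 200))[:, None]
seg = preprocess(amp)
tf = morlet_energy(seg, scales=[2, 4, 8, 16])
print(len(seg), tf.shape)
```

Each row of `tf` corresponds to one wavelet scale (a frequency component) and each column to one time component, which is the layout used for the feature extraction that follows.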

4.2.2. Feature Extraction

To recognize human activities more accurately, we explore features that characterize human activities from two aspects: the statistical feature and the time-frequency features of the CSI amplitude. The sample entropy of the CSI amplitude is used as the statistical feature, characterizing the complexity of the CSI amplitude. To characterize how the CSI amplitude changes with human activities, the wavelet transform is used to map the CSI amplitude into the time-frequency domain and construct the time-frequency matrix. The energy difference at a fixed time interval and the energy distribution over different frequency components then characterize the variations in the time domain and the frequency domain, respectively. The statistical feature and the time-frequency features are combined into the feature vector that is input to the ABLSTM network.

(1) Statistical Feature. Because different human activities affect the CSI amplitude differently, the complexity of the CSI amplitude can serve as a feature for recognizing human activities. Sample entropy is a statistic that measures the complexity of a time series: the greater the sample entropy, the more complex the series. In this paper, the sample entropy of the CSI amplitude is used to characterize the complexity of the CSI amplitude changes caused by different human activities. The sample entropy is calculated as follows:
(1) Let the CSI amplitude after interval interception be the time series $\{x(1), x(2), \ldots, x(N)\}$, where $x(t)$ is the $t$-th time component, $t = 1, 2, \ldots, N$.
(2) For $i = 1, 2, \ldots, N-m$, construct the vectors $X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$, i.e., $m$ consecutive CSI amplitudes starting from $x(i)$.
(3) For two vectors $X_m(i)$ and $X_m(j)$ ($j \neq i$), define their distance as $d[X_m(i), X_m(j)] = \max_{0 \le k \le m-1} |x(i+k) - x(j+k)|$.
(4) Given a threshold $r$, for each $i$ count the number of $j$ ($j \neq i$) with $d[X_m(i), X_m(j)] < r$, divide this number by $N-m-1$, and denote the ratio by $B_i^m(r)$. Then average over all $i$: $B^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} B_i^m(r)$.
(5) Let $m \leftarrow m+1$, repeat steps (2)–(4) with the same $N-m$ starting points, and obtain $B^{m+1}(r)$. The sample entropy of the CSI amplitude is then $\mathrm{SampEn}(m, r, N) = -\ln \frac{B^{m+1}(r)}{B^{m}(r)}$.
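A plain-Python sketch of these steps follows. It assumes the Richman–Moorman convention (the same $N-m$ starting points for template lengths $m$ and $m+1$, self-matches excluded, strict inequality $d < r$) and returns infinity when no matches are found; the function names are hypothetical. The threshold is passed as an absolute value, so a setting such as $r = 0.2\,\mathrm{SD}$ is computed by the caller.

```python
import math
import statistics

def _match_ratio(x, m, r, n_templates):
    """Average over i of B_i: fraction of other templates within distance r of template i."""
    templates = [x[i:i + m] for i in range(n_templates)]
    total = 0.0
    for i, ti in enumerate(templates):
        matches = sum(
            1 for j, tj in enumerate(templates)
            if j != i and max(abs(a - b) for a, b in zip(ti, tj)) < r
        )
        total += matches / (n_templates - 1)
    return total / n_templates

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r, N) = -ln(B^{m+1}(r) / B^m(r)); inf if no matches are found."""
    n_templates = len(x) - m  # the same N - m starting points for lengths m and m + 1
    b = _match_ratio(x, m, r, n_templates)
    a = _match_ratio(x, m + 1, r, n_templates)
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)

# Synthetic, slowly varying "CSI amplitude"; r is set relative to its standard deviation.
amp = [10 + math.sin(0.3 * k) for k in range(200)]
r = 0.2 * statistics.pstdev(amp)
print(sample_entropy(amp, m=2, r=r))
```

Because a match of length $m+1$ implies a match of length $m$ for the same pair, the ratio never exceeds 1 and the entropy is never negative; a perfectly regular series gives 0.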

To further justify the use of sample entropy, we randomly select ten samples of each of the five activities (boxing, falling, running, walking, and sitting) and calculate their sample entropy. The results are shown in Figure 2, which compares the sample entropy of the CSI amplitude for the five activities; the horizontal axis is the sample index and the vertical axis is the sample entropy. Figure 2 shows that the CSI amplitudes of different activities have different sample entropies, while the sample entropy of different samples of the same activity is relatively stable. Therefore, the sample entropy of the CSI amplitude is used as the statistical feature for human activity recognition in this paper.

(2) Time-Frequency Features. In this paper, each CSI sample is transformed into a time-frequency matrix that contains both its time-domain and its frequency-domain information. Therefore, features characterizing different human activities are extracted from the time domain and the frequency domain, respectively.

Wang et al. [17] verified that the movement speed of different human body parts is directly related to human activities and has a quantitative relationship with the energy of the time-frequency components of the CSI amplitude. Therefore, parameters related to this energy can be extracted as features for human activity recognition. In this paper, the constructed time-frequency matrix is treated as a time-frequency image of the CSI sample. As shown in Figure 3, the time-frequency matrix can be expressed as

$$\mathbf{E} = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1L} \\ e_{21} & e_{22} & \cdots & e_{2L} \\ \vdots & \vdots & \ddots & \vdots \\ e_{K1} & e_{K2} & \cdots & e_{KL} \end{bmatrix}$$

where $f_i$ denotes the $i$-th frequency component ($i = 1, 2, \ldots, K$), $t_j$ denotes the $j$-th time component ($j = 1, 2, \ldots, L$), and $e_{ij}$ is the energy value of the time-frequency component $(f_i, t_j)$ of the CSI amplitude.

To extract time-domain features that characterize human activities, write the time-frequency matrix as $\mathbf{E} = [e_{ij}]$, where $e_{ij}$ is the energy of the $i$-th of $K$ frequency components at the $j$-th time component. The time components are divided into consecutive intervals of fixed step length $\Delta t$, the energy values in each interval are summed, and the difference between the sums of adjacent intervals is calculated:

$$S_k = \sum_{i=1}^{K} \sum_{j=(k-1)\Delta t+1}^{k\Delta t} e_{ij}, \qquad d_k = S_{k+1} - S_k$$

The differences $d_k$ form the time-domain feature vector $\mathbf{D}_n = [d_1, d_2, \ldots]$ of the $n$-th sample. $\mathbf{D}_n$ characterizes how different human activities change with time and movement speed, as shown in the upper-right subfigure of Figure 3.

To extract frequency-domain features, the $K$ frequency components of the time-frequency matrix $\mathbf{E} = [e_{ij}]$ are divided into consecutive intervals of fixed step length $\Delta f$, and the energy values in each interval are summed:

$$p_k = \sum_{i=(k-1)\Delta f+1}^{k\Delta f} \sum_{j} e_{ij}$$

where the inner sum runs over all time components and the sums $p_k$ form the frequency-domain feature vector $\mathbf{P}_n = [p_1, p_2, \ldots]$ of the $n$-th sample. $\mathbf{P}_n$ characterizes the frequency-domain energy distribution of different human activities, as shown in the lower-right subfigure of Figure 3. Because different frequencies correspond to different human movement speeds, different human activities have different frequency-domain energy distributions.
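The frequency-domain sums can be sketched in the same list-of-rows layout (rows are frequency components). As before, dropping a partial trailing interval is an assumption, and the function name is hypothetical; with a step of 1, as used in the experiments, the feature is simply the total energy of each frequency row.

```python
def frequency_energy(tf, df):
    """Frequency-domain feature: total energy in consecutive groups of df frequency rows."""
    n_freq = len(tf)
    return [
        sum(sum(row) for row in tf[start:start + df])
        for start in range(0, n_freq - df + 1, df)  # a partial trailing group is dropped
    ]

tf = [[1, 1, 2, 2, 3, 3],
      [0, 0, 1, 1, 0, 0]]
print(frequency_energy(tf, 1))  # [12, 2]
```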

In summary, the energy-difference feature and the frequency-energy-distribution feature characterize the changes in CSI amplitude caused by human activities in the time domain and the frequency domain, respectively, so using both features enables more accurate recognition of human activities.

4.3. MF-ABLSTM Algorithm
4.3.1. Attention Mechanism

When people observe a scene, they focus on specific parts according to their interest, and if a part appears repeatedly in similar scenes, they pay special attention to it when observing. Inspired by this phenomenon, the attention mechanism of neural networks, also called a resource allocation method [34], was developed. In the attention mechanism, a weight coefficient $\alpha_{ij}$ is generated automatically to characterize the correlation between the hidden state of the encoder and the hidden state of the decoder. The $i$-th output $y_i$ of the network depends on a context vector $c_i$, which is the weighted sum of the encoder hidden states $h_j$:

$$c_i = \sum_{j} \alpha_{ij} h_j$$

The context vector $c_i$ provides a mechanism for the decoder output to focus on important input features. Different attention mechanisms generate the weight coefficients $\alpha_{ij}$ in different ways; this paper uses the soft attention method [35].
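The weighted sum $c_i = \sum_j \alpha_{ij} h_j$ with softmax-normalized weights can be sketched as below. The scoring function, a dot product $w \cdot h_j$ with a learned vector $w$, is a hypothetical choice for illustration; the soft attention of [35] may compute the scores differently, for example through a small feed-forward layer.

```python
import math

def soft_attention(hidden, w):
    """Soft attention over hidden states h_j with a hypothetical linear score w . h_j."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden]
    top = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]          # weights are positive and sum to 1
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return alpha, context

alpha, context = soft_attention([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
print(alpha, context)  # [0.5, 0.5] [2.0, 1.0]
```

With a zero scoring vector all states receive equal weight and the context is their mean; a trained $w$ shifts weight toward the hidden states most relevant to the activity.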

4.3.2. BLSTM Algorithm

RNNs are widely used in many fields, such as speech recognition [36] and image processing [37], because they can process variable-length sequence data. However, when processing long sequences, RNNs suffer from vanishing and exploding gradients [38]. The LSTM network is a special kind of RNN that solves these problems well: by controlling multiple gates, it retains information that needs to be memorized for a long time and forgets unimportant information. Therefore, LSTM classifies long sequences with correlations among the data well. The structure unit of LSTM is shown in Figure 4, in which $h_{t-1}$ and $h_t$ are the outputs of the previous and current LSTM units, respectively; $c_{t-1}$ and $c_t$ are the memories of the previous and current LSTM units, respectively; $g_t$ is the input modulation gate; $i_t$ and $o_t$ are the input gate and output gate, respectively; and $f_t$ is the forget gate [39].
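One step of an LSTM unit can be sketched in plain Python, assuming the common formulation $i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i)$ (and likewise for $f_t$ and $o_t$), $g_t = \tanh(W_{xg}x_t + W_{hg}h_{t-1} + b_g)$, $c_t = f_t \odot c_{t-1} + i_t \odot g_t$, and $h_t = o_t \odot \tanh(c_t)$. Variants such as peephole connections exist, and [39] may use a different one; the parameter layout and function names here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params[gate] = (W_x, W_h, b) for gates 'i', 'f', 'o', 'g'."""
    def pre(gate):
        W_x, W_h, b = params[gate]
        return [a + u + v for a, u, v in zip(matvec(W_x, x), matvec(W_h, h_prev), b)]
    i = [sigmoid(z) for z in pre("i")]    # input gate i_t
    f = [sigmoid(z) for z in pre("f")]    # forget gate f_t
    o = [sigmoid(z) for z in pre("o")]    # output gate o_t
    g = [math.tanh(z) for z in pre("g")]  # input modulation gate g_t
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]  # new memory c_t
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                    # new output h_t
    return h, c

# Toy example: input size 1, hidden size 1, all weights and biases zero.
zero = {gate: ([[0.0]], [[0.0]], [0.0]) for gate in "ifog"}
h, c = lstm_step([1.0], [0.0], [1.0], zero)
print(c)  # [0.5]: with zero parameters every sigmoid gate is 0.5 and g_t = 0
```

The forget gate scales the old memory and the input gate admits new content, which is how the unit keeps long-term information while discarding the rest.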

However, the output of an LSTM unit depends only on the preceding data, not on the subsequent data. In human activity recognition, the CSI amplitude features are strongly correlated in both the forward and the backward directions. Therefore, a BLSTM network is used in this paper; its basic structure is shown in Figure 5. As Figure 5 shows, the BLSTM network consists of two LSTM networks running in the forward and backward directions. The output of the forward LSTM depends on the preceding data, and the output of the backward LSTM depends on the subsequent data. The output of the BLSTM unit at time $t$ is

$$y_t = W_{\overrightarrow{h}}\,\overrightarrow{h}_t + W_{\overleftarrow{h}}\,\overleftarrow{h}_t$$

where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of the forward and backward LSTM at time $t$, and $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ are their respective coefficients [40]. Therefore, the output of the BLSTM network is determined by the whole data sequence.
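The bidirectional combination $y_t = W_{\overrightarrow{h}}\overrightarrow{h}_t + W_{\overleftarrow{h}}\overleftarrow{h}_t$ can be sketched with a scalar LSTM (hidden size 1) for brevity. The toy parameters, the scalar coefficients `w_fwd` and `w_bwd`, and the absence of a bias term are assumptions of this sketch, not values from the paper.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, p):
    """Scalar LSTM step; p[gate] = (w_x, w_h, b) for gates 'i', 'f', 'o', 'g'."""
    z = {gate: p[gate][0] * x + p[gate][1] * h + p[gate][2] for gate in "ifog"}
    i, f, o = _sigmoid(z["i"]), _sigmoid(z["f"]), _sigmoid(z["o"])
    g = math.tanh(z["g"])
    c = f * c + i * g
    return o * math.tanh(c), c

def run_lstm(seq, p):
    """Run an LSTM over seq from left to right and return every hidden state."""
    h, c, states = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states

def blstm(seq, p_fwd, p_bwd, w_fwd, w_bwd):
    """y_t = w_fwd * forward h_t + w_bwd * backward h_t."""
    fwd = run_lstm(seq, p_fwd)
    bwd = run_lstm(seq[::-1], p_bwd)[::-1]  # re-align backward states with time t
    return [w_fwd * hf + w_bwd * hb for hf, hb in zip(fwd, bwd)]

p = {gate: (0.5, 0.3, 0.1) for gate in "ifog"}
y1 = blstm([1.0, 2.0, 1.0], p, p, 1.0, 1.0)
y2 = blstm([1.0, 2.0, 5.0], p, p, 1.0, 1.0)
print(y1[0] != y2[0])  # True: changing the last input changes the first output
```

The last line illustrates the point of the bidirectional structure: a unidirectional LSTM's first output could not depend on the final input, whereas the backward pass makes every $y_t$ depend on the whole sequence.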

5. Experimental Evaluation

5.1. Experimental Setup and Data Acquisition

In this paper, the experiments were carried out in a 5 m × 5 m laboratory containing tables, chairs, and experimental benches; its layout is shown in Figure 6. Two computers equipped with Intel 5300 wireless network cards serve as the signal transmitter and receiver, and both run the Ubuntu 10.04 operating system with the CSI tools installed. The transmitter sends signals through one antenna, and the receiver receives them through three antennas. The WiFi signal is transmitted at 2.4 GHz with a channel bandwidth of 20 MHz, and each receiving antenna provides CSI data for 30 subcarriers. The transmitting and receiving antennas are 2 m apart and 1 m above the ground. The volunteers stand midway between the transmitter and the receiver and perform boxing, falling, running, walking, and sitting, respectively. The receiver samples at 1000 Hz, each sample lasts 4 seconds, and 98 samples are collected for each activity. A common practice is to use about 70% of the samples for training and 30% for testing. However, to verify the performance of the proposed MF-ABLSTM method with small samples, in the following experiments we randomly select only 10 of the 98 samples of each activity as the training set and use the remaining 88 samples as the testing set. This split keeps the training set small while keeping the testing set large, so the experimental results have good statistical significance.
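The per-activity split can be sketched as follows. At 1000 Hz, each 4-second raw sample spans 4 × 1000 = 4000 packets before interval interception. The placeholder sample IDs, the fixed random seed, and the function name are hypothetical.

```python
import random

def split_per_activity(samples_by_activity, n_train=10, seed=0):
    """Randomly pick n_train samples per activity for training; the rest are for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for activity, samples in samples_by_activity.items():
        order = list(range(len(samples)))
        rng.shuffle(order)
        train += [(samples[k], activity) for k in order[:n_train]]
        test += [(samples[k], activity) for k in order[n_train:]]
    return train, test

activities = ["boxing", "falling", "running", "walking", "sitting"]
data = {a: [f"{a}_{k}" for k in range(98)] for a in activities}  # placeholder sample IDs
train, test = split_per_activity(data)
print(len(train), len(test))  # 50 440
```

Splitting within each activity, rather than over the pooled data, guarantees exactly 10 training samples per class, which keeps the small-sample setting balanced.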

5.2. Parameter Analysis
5.2.1. Analysis of Threshold Value

The value of sample entropy depends closely on the given threshold $r$. Pincus et al. [40] verified that sample entropy better measures the complexity of a time series when $r$ lies within 0.1–0.25 times the standard deviation of the series. Therefore, $r$ is set to 0.1SD, 0.15SD, 0.2SD, and 0.25SD in turn, where SD is the standard deviation of the CSI amplitude after interval interception, and the accuracy of MF-ABLSTM-based human activity recognition is analyzed for each value. The results are shown in Figure 7. Figure 7(a) shows that the recognition accuracy exceeds 90% in all cases, but the accuracy over the five experiments is more stable and relatively higher when $r$ is 0.2SD. Figure 7(b) likewise shows that the average recognition accuracy is highest, 92.6%, when $r$ is 0.2SD, so the sample-entropy threshold is set to 0.2SD in this paper.
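For reference, a minimal sample-entropy sketch with $r = 0.2\,\mathrm{SD}$ is given below. It follows the common Richman–Moorman definition (Chebyshev distance, self-matches excluded, the same $N-m$ templates counted at lengths $m$ and $m+1$). The embedding dimension `m = 2` is an assumption, since this section does not state the value used, and `sample_entropy` is a hypothetical name.

```python
import math
import random
import statistics


def sample_entropy(series, m=2, r_factor=0.2):
    """SampEn(m, r) with r = r_factor * standard deviation of the series (m=2 is assumed)."""
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                # Chebyshev distance: largest element-wise difference
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching template pairs of length m
    a = count_matches(m + 1)  # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined: no matches found
    return -math.log(a / b)


rng = random.Random(1)
noise = [rng.random() for _ in range(200)]
periodic = [0.0, 1.0] * 10
se_periodic = sample_entropy(periodic)
se_noise = sample_entropy(noise)
```

A perfectly regular series yields zero entropy, while an irregular series yields a larger value, which is why sample entropy can serve as a complexity-based statistical feature of CSI amplitudes.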

5.2.2. Analysis of Time-Domain and Frequency-Domain Step Lengths

The time-frequency features extracted in this paper depend closely on the time-domain step length and the frequency-domain step length, so it is necessary to analyze how different values of these two step lengths affect the accuracy of human activity recognition.

Based on the length of the time dimension of the time-frequency matrix, the time-domain step length is set to 20, 50, 100, 200, 350, and 700 in turn, with the frequency-domain step length fixed at 1, and the accuracy of MF-ABLSTM-based human activity recognition is analyzed. The results are shown in Figure 8. Figure 8(a) and Figure 8(b) show that when the time-domain step length is 200, the average recognition accuracy is the highest and the accuracy is relatively stable across the five experiments. When the time-domain step length is too large, there are too few time intervals, so the features computed over these intervals discriminate poorly between activities and the recognition accuracy drops. When it is too small, there are too many time intervals and the time-domain features of human activities become overly complex, which also lowers the recognition accuracy. Therefore, the time-domain step length is set to 200 in this paper.
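The role of the time-domain step length can be illustrated with a small sketch of an energy-difference feature. Several details here are assumptions rather than the paper's exact definition: the time-frequency matrix is a list of rows (frequency scales) by columns (time points) of CWT magnitudes, the energy of an interval is the sum of squared magnitudes over all frequencies, an incomplete final interval is dropped, and the function names are hypothetical.

```python
def interval_energies(tf_matrix, step):
    """Energy (sum of squared magnitudes over all frequencies) of each block of `step` time columns."""
    n_time = len(tf_matrix[0])
    energies = []
    for start in range(0, n_time - step + 1, step):  # an incomplete final block is dropped
        e = sum(abs(v) ** 2 for row in tf_matrix for v in row[start:start + step])
        energies.append(e)
    return energies


def energy_difference_feature(tf_matrix, time_step):
    """Differences between the energies of consecutive time intervals."""
    e = interval_energies(tf_matrix, time_step)
    return [e[k + 1] - e[k] for k in range(len(e) - 1)]


# Toy 2 x 4 matrix: rows are frequencies, columns are time points (made-up values).
tf = [[1, 1, 2, 2], [0, 0, 1, 1]]
coarse = energy_difference_feature(tf, 2)  # 2 intervals -> 1 difference
fine = energy_difference_feature(tf, 1)    # 4 intervals -> 3 differences
```

The feature length shrinks as the step grows, which mirrors the trade-off described above: a large step yields few, coarse intervals, while a small step yields many intervals and a longer, more complex feature.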

Based on the length of the frequency dimension of the time-frequency matrix, the frequency-domain step length is set to 1, 5, 10, 20, and 40 in turn, with the time-domain step length fixed at 200. The results are shown in Figure 9. Figure 9(a) and Figure 9(b) show that the average recognition accuracy decreases markedly as the frequency-domain step length increases. This is because the frequency-domain energy of human activities is concentrated mainly in the low-frequency components. The smaller the frequency-domain step length, the more finely the low-frequency part is resolved, the more discriminative the resulting activity features, and the higher the recognition accuracy. Therefore, the frequency-domain step length is set to 1 in this paper.
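A sketch of an energy-distribution feature shows why a larger frequency-domain step blurs the low-frequency information. The assumptions are that row 0 of the time-frequency matrix is the lowest frequency, that the energy of a band is the sum of squared magnitudes of its rows, and that a partial last band is kept; `frequency_energy_distribution` is a hypothetical name, not the paper's code.

```python
def frequency_energy_distribution(tf_matrix, freq_step):
    """Fraction of total energy in each band of `freq_step` adjacent frequency rows."""
    row_energy = [sum(abs(v) ** 2 for v in row) for row in tf_matrix]
    bands = [sum(row_energy[s:s + freq_step]) for s in range(0, len(row_energy), freq_step)]
    total = sum(bands)
    return [b / total for b in bands] if total > 0 else [0.0] * len(bands)


# Toy 4 x 2 matrix with energy concentrated in the two lowest-frequency rows (made-up values).
tf = [[2, 0], [1, 1], [0, 1], [0, 0]]
d1 = frequency_energy_distribution(tf, 1)  # one band per frequency row
d2 = frequency_energy_distribution(tf, 2)  # two rows merged per band
```

With a step of 1, the two dominant low-frequency rows remain separate entries (4/7 and 2/7); with a step of 2 they merge into a single 6/7 entry, so the distinction between them is lost.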

5.3. Analysis of MF-ABLSTM Algorithm
5.3.1. Analysis of Features

In this paper, the statistical, time-domain, and frequency-domain features of CSI amplitude are used to construct feature vectors for recognizing human activities. Each of these feature types can distinguish human activities to some degree on its own. To verify the effectiveness of combining them, the accuracy of ABLSTM-based human activity recognition is compared in five cases: no features (NF), statistical features only (SF), time-domain features only (TF), time-frequency domain features only (TFF), and time-frequency domain and statistical features (TFSF). Five experiments were conducted for each case, and the average was taken as the final result. The results are shown in Figure 10. Figure 10 shows that ABLSTM achieves the highest and most stable accuracy with TFSF, which is 8.5%, 6.1%, 4.9%, and 1.4% higher than with NF, SF, TF, and TFF, respectively. The reason is that the three types of features characterize human activities from different perspectives, so combining them effectively improves the recognition accuracy.

5.3.2. Analysis of Classification Algorithms

To evaluate the performance of the ABLSTM algorithm, the human activity recognition accuracies of five classification algorithms, ABLSTM, LSTM, CNN, RF (Random Forest), and DTW-KNN (Dynamic Time Warping-K Nearest Neighbors), are compared with 10 training samples per activity and TFSF features. The results are shown in Figure 11. Figure 11(a) shows that the ABLSTM algorithm used in this paper reaches a recognition accuracy of 92.6%, which is 11.4%, 5.3%, 20.2%, and 18.7% higher than that of LSTM, CNN, DTW-KNN, and RF, respectively. To further analyze the recognition accuracy of ABLSTM for each activity, its confusion matrix is constructed, as shown in Figure 11(b). Figure 11(b) shows that ABLSTM achieves at least 90% accuracy for all five activities, with the lowest accuracy (90%) for falling and the highest (99%) for boxing, which further verifies the effectiveness of the ABLSTM algorithm.
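Per-activity accuracies such as those read from Figure 11(b) are the diagonal entries of a confusion matrix divided by their row sums. The sketch below assumes that rows index the true activity and columns the predicted activity; the function names and the 2 × 2 counts are hypothetical and are not the paper's data.

```python
def per_class_accuracy(confusion):
    """Row i holds counts of true class i predicted as each class; returns per-class recall."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Fraction of all test samples that lie on the diagonal (correctly classified)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up counts for two classes, for illustration only.
cm = [[9, 1], [2, 8]]
```

For this toy matrix the per-class accuracies are 0.9 and 0.8 and the overall accuracy is 0.85. With an equal number of test samples per class, the overall accuracy equals the mean of the per-class accuracies.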

5.3.3. Analysis of Small Samples

The number of training samples strongly influences the recognition accuracy of the ABLSTM neural network; generally, the more training samples, the higher the accuracy. To verify the effectiveness of MF-ABLSTM when only a few training samples are available, we conduct the following experiments. We randomly select 40 of the 98 samples of each activity as the training set and use the remaining 58 samples as the testing set. The ABLSTM algorithm is trained on the first 10, 20, and 40 samples of the training set, respectively; these experiments are denoted ABLSTM(10), ABLSTM(20), and ABLSTM(40). The MF-ABLSTM algorithm is trained on 10, 20, and 40 samples, where the 10 samples are the first 10 samples of the training set and the 20 and 40 samples are expanded from these 10 samples by the proposed tensor prediction algorithm; these experiments are denoted MF-ABLSTM(10), MF-ABLSTM(20), and MF-ABLSTM(40). We then compare the recognition accuracies of these six experiments on the 58 testing samples; the results are shown in Figure 12. Figure 12 shows that the recognition accuracy of ABLSTM rises as the number of training samples increases, confirming that the number of training samples strongly affects the accuracy of ABLSTM. However, with only 10 training samples, the proposed MF-ABLSTM already achieves higher recognition accuracy than ABLSTM(40), and its accuracy increases steadily as the tensor prediction algorithm expands the training samples, which verifies that the proposed method effectively addresses the small-sample problem.

6. Conclusion

Device-free human activity recognition based on CSI has become an important research direction in intelligent sensing, and related results continue to emerge. However, existing methods still need a large number of training samples to achieve the desired recognition accuracy. To solve this problem, an MF-ABLSTM human activity recognition method based on small CSI samples is proposed. In this method, the proposed tensor prediction algorithm expands the training samples, the sample-entropy statistical features of the CSI amplitude and the time-frequency domain features of the time-frequency matrix are used to construct feature vectors representing human activities, and the ABLSTM neural network recognizes the activities. Experiments analyze different feature combinations, different numbers of training samples, and the performance of different classification algorithms. The results show that the proposed MF-ABLSTM method achieves high and stable recognition accuracy using only a few training samples.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 62076114 and Grant 71874025 and in part by the Humanities and Social Sciences Research Planning Foundation of the Ministry of Education under Grant 20YJA630058.