Abstract

Through-wall detection and classification are highly desirable for surveillance, security, and military applications in areas that cannot be sensed by conventional means. A key challenge in these applications is not only to sense the presence of individuals behind a wall but also to classify their actions and postures. Researchers have applied ultrawideband (UWB) radars to penetrate wall materials and make intelligent decisions about the contents of rooms and buildings. As a form of UWB radar, stepped frequency continuous wave (SFCW) radars have been preferred for advantages such as high mean transmitter power and high receiver sensitivity. Meanwhile, deep learning methods have achieved remarkable classification success across different problems. Since radar signals contain valuable information about the objects behind the wall, applying deep learning techniques to classify them opens a new direction for research. This paper focuses on classifying the human posture behind a wall using through-wall radar signals and a convolutional neural network (CNN). An SFCW radar is used to collect the radar signals reflected from the human target behind the wall. These signals are fed to a CNN to classify the presence of a human and whether the human is standing or sitting. The proposed approach achieves high classification accuracy without the detailed preprocessing operations and long-term data required by traditional approaches.

1. Introduction

The ability to image targets behind building walls or to detect people under debris, including classification of the human body, has been drawing attention over the last decade. For this purpose, unlike optical image processing, ultrawideband (UWB) radars, as radio frequency sources, are well suited to real-world problems. UWB radars are used for different applications such as the detection and classification of aircraft, collision avoidance, target detection, and measurement of the heart and respiration rate of a human. This kind of radar has several key advantages over narrowband continuous wave radars: a very high downrange resolution, which allows better separation between targets and clutter due to the large bandwidth; multiple target detection capability; good immunity against multipath interference; and detection of both an object and its position [1]. The concept underlying through-wall human detection using UWB radars is similar to that of radar imaging. A fraction of the transmitted RF signal passes through a nonmetallic wall, is reflected from the objects (including humans) in the room, and returns to the receiver after passing through the wall again, carrying a signature of those objects. By using this received signal, imaging of the objects is possible [2].

As a form of UWB radar, stepped frequency continuous wave (SFCW) radar approaches are commonly used in many practical applications including through-wall radar imaging and target ranging [3–7], medical imaging [8], and many applications utilizing ground penetrating radar (GPR), a kind of SFCW radar, for civil engineering [9, 10], structural static testing [11], quality estimation of the road surface layer [12], detection of pipes and cables buried in the ground [13, 14], archeological purposes [15], and unexploded ordnance disposal [16, 17]. These studies rely on SFCW radar signals since they make the spectrum directly accessible to the user. SFCW radar techniques also have benefits including high mean transmitter power and high receiver sensitivity [2, 4]. Not only do SFCW radars provide the ability to detect targets or events, but they also enhance the range accuracy, enable clutter rejection, and help reduce multipath effects.

Convolutional neural networks (CNNs) have been used for solving many different artificial intelligence problems, providing significant advantages over other machine learning approaches in solving complex learning tasks. In conventional classification approaches, features are manually designed and extracted and then passed to a traditional classifier such as a support vector machine (SVM). Because they have several feature extraction and signification layers, CNNs are capable of performing automatic preprocessing along with their neural network characteristics [18–21].

In the literature, studies that classify targets by processing radar signals can be divided into two categories: studies based on creating images from radar signals and studies based on traditionally extracting different features of the target. The first category suggests either operating at very high frequencies (e.g., millimeter wave or terahertz) [22], which do not penetrate walls, or using the SAR (synthetic aperture radar) algorithm [23]. The second category proposes to extract specific features of the target such as vital signs [24–26] or movement characteristics [27–29]. Micro-Doppler signatures [30] are commonly used to detect vital signs or to classify the target’s specific activities such as running, walking, or even falling. However, the main disadvantage of this approach is the need for continuous data over a certain period (5–30 s) for preprocessing and classification. This approach may not be practical for security, counterterrorism, or mission-critical operations where immediate decisions are important. Although the SAR algorithm provides detailed information on the absolute position and shape of the target, it is difficult to implement in practice. Therefore, there is a need for methods that have the simplest possible configuration and can classify instantly without long-term data.

This paper focuses on the classification of the human posture behind the wall using through-wall radar signals and a convolutional neural network (CNN). Owing to the advantages of the CNN, detailed preprocessing is not required for classification. The SFCW radar is used to collect radar signals reflected from the human target, and these signals are fed to the CNN to classify whether the human target is standing or sitting.

The paper is organized as follows: the SFCW radar concept as a form of UWB radar and CNNs are briefly introduced. After presenting the detection and classification approach of the study, the experimental setup and results are given and discussed. The study is concluded in Section 6, also proposing some future works.

2. SFCW Radar

The SFCW radar is a form of UWB radar with advanced features and considerable capabilities for a variety of applications. The main advantages of the SFCW radar are its high dynamic range and low noise floor. Furthermore, because it can avoid transmitting on certain frequencies, the SFCW radar is preferred for certain restricted applications. Owing to these advantages, SFCW radar systems are a better choice for through-wall imaging given the range, resolution, and propagation characteristics of UWB signals through a dielectric wall [31, 32]. Detailed information about the SFCW radar can be found in [33].

Assuming that the period of the repeated pulses of the SFCW signal is $T$, the initial (minimum) frequency is $f_0$, the frequency step increment is $\Delta f$, and the stepped frequency is $f_n$, the transmitted stepped frequency signal may be written as follows [3]:

$$s_n(t) = \exp\!\big(j(2\pi f_n t + \theta_n)\big), \quad nT \le t < (n+1)T, \tag{1}$$

where $\theta_n$ refers to the initial phase of the transmitted signal and

$$f_n = f_0 + n\,\Delta f, \quad n = 0, 1, \ldots, N-1. \tag{2}$$

The transmitted signal is reflected and echoed back from the target at the radial distance $R$. The target echo signal may be described as follows:

$$r_n(t) = \rho \exp\!\big(j(2\pi f_n (t - \tau) + \theta_n)\big), \tag{3}$$

where $\tau = 2R/c$ refers to the time delay of the echoed signal considering the two-way distance of the object at $R$ and $c$ refers to the speed of light. $\rho$ in (3) refers to the backscattering coefficient of the target. $\rho$ is assumed constant and uniform within the observation period of the radar signal.

The maximum unambiguous range that the radar can detect is decided by the step size $\Delta f$, as $R_{\max} = c/(2\Delta f)$. The resolution, which is the ability to distinguish two closely spaced targets, is determined by the bandwidth $B$, as $\Delta R = c/(2B)$.

In radar systems, the signal collected at any single measurement point is called an A-scan (1D data). The received signal obtained in the frequency domain over the entire bandwidth, namely, the A-scan data, is converted into the time domain by performing the Inverse Fast Fourier Transform (IFFT). As the time delay between the transmitted and received signals is directly related to the radial distance of objects in the radar’s range, the spatial-domain profile can easily be calculated by using the IFFT.
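The conversion from a frequency-domain A-scan to a range profile can be illustrated with a minimal sketch. It assumes an idealized single point target with no wall, noise, or antenna effects, and uses a plain inverse DFT from the standard library in place of an optimized IFFT. The function names are hypothetical; the 2.0 GHz start frequency and 201 points follow Section 5, and the 10 MHz step is the value implied by 201 points over a 2 GHz bandwidth.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def synth_a_scan(target_range_m, f0_hz, step_hz, n_points):
    """Ideal frequency-domain A-scan of one point target (assumption: no wall, no noise)."""
    tau = 2.0 * target_range_m / C  # two-way time delay
    return [cmath.exp(-2j * math.pi * (f0_hz + n * step_hz) * tau)
            for n in range(n_points)]


def range_profile(a_scan):
    """Inverse DFT of the A-scan; returns the magnitude of each range bin."""
    n_points = len(a_scan)
    profile = []
    for k in range(n_points):
        acc = sum(a_scan[n] * cmath.exp(2j * math.pi * n * k / n_points)
                  for n in range(n_points))
        profile.append(abs(acc) / n_points)
    return profile


def bin_to_range(k, step_hz, n_points):
    """Radial distance represented by range bin k."""
    return k * C / (2.0 * n_points * step_hz)


a_scan = synth_a_scan(target_range_m=3.0, f0_hz=2.0e9, step_hz=10.0e6, n_points=201)
profile = range_profile(a_scan)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
estimated_range_m = bin_to_range(peak_bin, step_hz=10.0e6, n_points=201)
```

In this sketch the strongest bin lands within one bin width (about 7.5 cm) of the simulated 3 m target, which is the sense in which the delay maps directly to radial distance.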

3. Convolutional Neural Networks

This section aims to introduce and clarify some concepts of convolutional neural networks. Detailed definitions of CNNs can be found in the literature [34–37].

A CNN is a class of deep multilayer feed-forward neural network machine learning algorithms inspired by the visual cortex of the brain. These network models are based on local receptive fields, shared weights, and spatial or temporal subsampling, which ensure some degree of shift, scale, and distortion invariance [38]. The CNN architecture allows the computer to “see”, that is, to recognize images by propagating raw natural images from an input layer, through a feature extraction module and a classification module, to class scores in the output layer [39]. The feature extraction module (made up of convolutional and subsampling (pooling) layers) automatically gathers relevant information such as colors, blobs, corners, oriented edges, endpoints, and higher order features through a feature learning process that filters the input for useful hierarchical abstract information [40].

In traditional machine learning approaches to pattern recognition, a hand-designed feature extractor such as the Histogram of Oriented Gradients (HOG), Bag of Features (BoF), Scale-Invariant Feature Transform (SIFT), a bank of Gabor filters, Local Binary Pattern (LBP), or Fisher vectors is used for feature extraction in a domain-specific feature-engineered process. Training such models on natural images would lead to problems such as the curse of dimensionality due to their high dimensionality and sparseness [36]. However, when training a CNN model, a filter/kernel is slid over the 2-dimensional input and a dot product is computed at each position (that is, a convolution operation is performed across the input volume) to produce a 2-dimensional feature map. The convolution operation (which gives CNNs their name) is followed by an additive bias and a squashing function (such as the sigmoid function, hyperbolic tangent function, or Rectified Linear Unit (ReLU)). Under a cross-correlation interpretation, the kernels act as detectors on the input or feature map for certain nonlinear features, responding strongly where those features appear on a given activation map [37].
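The convolution, bias, and squashing steps described above can be sketched in a few lines. This is an illustrative toy under stated assumptions (stride 1, no padding, a single channel, ReLU as the squashing function), not the network used in this study; the function names and the example kernel are hypothetical.

```python
def relu(x):
    """Rectified Linear Unit: passes positive values and clamps the rest to zero."""
    return x if x > 0.0 else 0.0


def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image (stride 1, no padding), add the bias, apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the kernel and the patch under it.
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            row.append(relu(acc + bias))
        feature_map.append(row)
    return feature_map


# A 3x4 input with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]
feature_map = conv2d_valid(image, edge_kernel)
```

The resulting 2x3 feature map is nonzero only in the column where the kernel straddles the edge, which illustrates the detector interpretation of a kernel.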

Parameters of a convolutional layer include the input/feature map size ($W$), stride ($S$), zero padding ($P$), and filter size ($F$). The spatial size of the activation map is computed as follows:

$$O = \frac{W - F + 2P}{S} + 1.$$
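As a worked illustration of the activation map size, the sketch below uses the widely used relation (W - F + 2P)/S + 1, writing W for the input size, F for the filter size, P for the zero padding, and S for the stride. The function name is hypothetical, and rejecting configurations where the filter does not tile the padded input is a design choice of this sketch rather than something stated in the text.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size of the activation map along one dimension."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        # Assumption: treat non-tiling configurations as an error instead of rounding.
        raise ValueError("filter does not tile the padded input with this stride")
    return span // stride + 1


same_size = conv_output_size(32, 5, stride=1, padding=2)  # padding chosen to keep the size
halved = conv_output_size(7, 3, stride=2, padding=0)
```

For instance, a 5-wide filter with padding 2 and stride 1 preserves a 32-wide input, while a 3-wide filter with stride 2 reduces a 7-wide input to 3.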

Pooling layers provide shift and scale invariance, thus reducing the sensitivity of the output. Moreover, they help reduce the model’s memory consumption by reducing the spatial size of the feature maps. Common pooling functions include max pooling, average pooling, and Region of Interest (ROI) pooling [37, 40, 41].
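A minimal sketch of max and average pooling on a single feature map follows. The window size and stride of 2 are illustrative assumptions, and the helper names are hypothetical.

```python
def pool2d(feature_map, reducer, size=2, stride=2):
    """Apply `reducer` (for example max) to each size x size window of the feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[reducer([feature_map[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)])
             for c in range(out_w)]
            for r in range(out_h)]


def average(values):
    return sum(values) / len(values)


fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 0, 5, 6],
      [1, 2, 7, 8]]
max_pooled = pool2d(fm, max)
avg_pooled = pool2d(fm, average)
```

Both variants shrink the 4x4 map to 2x2, so later layers store a quarter of the activations, and a small shift of a strong response inside a window leaves the max-pooled value unchanged.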

This CNN model resembles the unsupervised self-organized multilayer neocognitron model by Fukushima [42] that was inspired by experiments on the visual cortex of the cat and monkey done by Hubel and Wiesel [42, 43]. The model was made up of cells in a single cell plane (S-layer whose receptive fields are found on the input layer) as shown in Figure 1.

CNN training uses the backpropagation algorithm, in which a gradient descent search is performed to compute the weights that minimize the classification error [37]. In the backpropagation steps for training the CNN, a stochastic gradient descent search is performed to update the weights. This is done by evaluating the gradient on a single training sample or a small batch of training samples to obtain an approximate gradient, rather than accumulating the gradients over the entire training set. At each training iteration, a parameter $w$ is updated as follows:

$$w \leftarrow w - \eta \frac{\partial E}{\partial w},$$

where $\eta$ is the learning rate and $E$ is the classification error evaluated on the current sample or batch.

However, in a CNN, the partial derivative with respect to a shared weight is the sum of the partial derivatives with respect to the connections that share that weight.
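The effect of weight sharing on the gradient can be sketched with a one-dimensional convolution. This sketch assumes a squared-error loss and an arbitrary learning rate purely for illustration; it is not the loss or optimizer configuration of this study, and all names and numbers are hypothetical.

```python
def conv1d_valid(x, w):
    """1D sliding dot product: every output position reuses the same weights w."""
    return [sum(w[k] * x[i + k] for k in range(len(w)))
            for i in range(len(x) - len(w) + 1)]


def loss(x, w, target):
    """Squared-error loss (an assumption made for this illustration)."""
    y = conv1d_valid(x, w)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def shared_weight_gradient(x, w, target):
    """dE/dw[k] summed over every output position that shares w[k]."""
    y = conv1d_valid(x, w)
    errors = [yi - ti for yi, ti in zip(y, target)]
    return [sum(errors[i] * x[i + k] for i in range(len(errors)))
            for k in range(len(w))]


def sgd_step(w, grad, learning_rate):
    """One gradient descent update: w <- w - learning_rate * dE/dw."""
    return [wk - learning_rate * gk for wk, gk in zip(w, grad)]


x = [1.0, 2.0, -1.0, 0.5, 3.0]   # hypothetical input sample
target = [0.0, 1.0, 0.0, 2.0]    # hypothetical desired output
w = [0.2, -0.1]                  # shared kernel weights
grad = shared_weight_gradient(x, w, target)
w_next = sgd_step(w, grad, learning_rate=0.05)
```

Each kernel weight touches four output positions here, so its gradient accumulates four contributions before the single update is applied.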

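Other training recommendations include second-order methods such as the Gauss-Newton and Levenberg-Marquardt algorithms and quasi-Newton methods such as Broyden-Fletcher-Goldfarb-Shanno (BFGS). The classification module, usually a fully connected multilayer perceptron (MLP), is a trainable classifier that categorizes the resulting feature vectors into classes in the output layer by using loss functions such as softmax [38, 41], which is given as follows:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K,$$

where $z_i$ is the score of class $i$ and $K$ is the number of classes.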
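The softmax function and the cross-entropy loss usually paired with it can be sketched as follows. The three class names come from this study; the raw scores are made-up numbers used only for illustration, and subtracting the maximum score is a common numerical-stability step rather than something stated in the text.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


CLASSES = ("empty", "standing", "sitting")
probs = softmax([1.0, 3.0, 0.5])  # hypothetical output-layer scores
predicted = CLASSES[max(range(len(probs)), key=probs.__getitem__)]
loss_if_standing = cross_entropy(probs, CLASSES.index("standing"))
loss_if_sitting = cross_entropy(probs, CLASSES.index("sitting"))
```

The loss is small when the true class receives most of the probability mass and grows as that probability shrinks, which is what drives the weight updates during training.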

Recently, the performance of CNNs has greatly improved, surpassing humans in several tasks such as classification, segmentation, object detection, and playing games [36]. This is due to the increasing complexity of the models, larger numbers of training samples, and the implementation of new training techniques [44, 45]. These new training techniques include initialization schemes [46, 47], deep rectifier networks [48], batch normalization [49], dropout [50], softmax loss classifier networks [45, 46], and parallel programming with GPUs [40, 51].

4. Detection and Classification of the Human Posture

This study uses the powerful features of CNNs to detect a human behind a wall from SFCW radar signals and to classify the scene as containing a standing human, a sitting human, or no human. To detect the human and to classify the posture behind the plastered brick wall, the test data were gathered by an SFCW radar system consisting of a vector network analyzer (VNA) and two horn antennas. Since the SFCW radar signals carry valuable information about the object behind the wall, the experiment is evaluated by acquiring the SFCW radar signals for 3 different cases: an empty scene, a scene with a standing human target, and a scene with a sitting human target. The last two cases, with human targets, are evaluated with the human at both 2.5 and 5 meters away from the wall. A brief demonstration of the experimental setup can be seen in Figure 2.

Two S-parameter vectors are used for each sample. The sample matrix used as the input data for the CNN structure is generated from these two vectors and has the form $2 \times (N-1)$, which is shown as follows:

To classify the human target into three classes (empty, standing, and sitting), the generated data matrix is used as the input of the CNN. The CNN structure is constructed sequentially: a first block of convolutional, batch normalization, ReLU, and pooling layers; a second block of convolutional, batch normalization, ReLU, and pooling layers; and finally fully connected, softmax, and classification layers. Dropout is also applied in the fully connected layer. A brief demonstration of the suggested approach is given in Figure 3.

5. Experimental Setup and Results

In order to collect RF data reflected from a human behind the building wall, a Copper Mountain S5065 vector network analyzer (VNA) with a frequency range of up to 6.5 GHz is used. The stepped frequency waveforms are generated between 2.0 GHz and 4.0 GHz (2 GHz bandwidth ($B$)) with a step size of $\Delta f = 10$ MHz. Hence, the number of frequency points is $N = 201$. Also, 1000 readings per sample are collected with the given configuration. According to this setup, the maximum unambiguous radar range can be calculated as follows:

$$R_{\max} = (N - 1)\,\Delta R = (N - 1)\frac{c}{2B} = \frac{c}{2\Delta f} \approx 15\ \text{m},$$

where $R_{\max}$ denotes the maximum range, $c$ stands for the speed of light, and $\Delta R$ refers to the radar’s downrange resolution. Horn antennas are used for transmitting and receiving the RF signals during the experiments, and the distance between the antennas is set to 40 cm. Both the transmit and receive antennas are placed about 80 cm above the ground, and the distance between the wall and the antennas is approximately zero (see Figure 2(a)). Although 201 points are set as the frequency steps for better downrange resolution, only the first 100 data points are considered for ease of calculation, since the measurement distance in our test scenario is less than 7.5 m. In this experimental setup, micro movements are defined as minimal movement of the human, in which only the heartbeat and respiration contribute, and macro movements are defined as movement of the head and extremities in addition to the heartbeat and respiration.
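The radar parameters of this setup can be checked with a short calculation. The sketch below uses the standard SFCW relations (maximum unambiguous range c/(2Δf), downrange resolution c/(2B)); the function name is hypothetical, and the 10 MHz step is the value implied by 201 frequency points across the 2 GHz band.

```python
C = 299_792_458.0  # speed of light in m/s


def sfcw_parameters(f_start_hz, f_stop_hz, step_hz):
    """Return (number of frequency points, max unambiguous range, downrange resolution)."""
    bandwidth = f_stop_hz - f_start_hz
    n_points = round(bandwidth / step_hz) + 1
    max_range_m = C / (2.0 * step_hz)
    resolution_m = C / (2.0 * bandwidth)
    return n_points, max_range_m, resolution_m


n_points, max_range_m, resolution_m = sfcw_parameters(2.0e9, 4.0e9, 10.0e6)
half_range_m = max_range_m / 2.0  # span covered by roughly the first 100 range bins
```

With these inputs the unambiguous range is close to 15 m and the resolution close to 7.5 cm, so about the first 100 range bins span the 7.5 m mentioned in the text.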

The experimental data are obtained for 9 different measurement scenarios. These scenarios are the scene without a human (empty), the standing human scene, in which the human stands 2.5 m and 5 m away from the wall, and the sitting human scene, in which the human sits on a chair 2.5 m and 5 m away from the wall. In all scenes except the empty scene, both the micro and macro movement variations are also collected. Therefore, the experiments are evaluated on data from 9 distinct scenarios. Each scenario has 1000 readings, each belonging to one of 3 classes: empty, standing, or sitting. All experiments are evaluated without a wall, referred to as free space, and with two different dielectric wall structures, a brick wall and a drywall. The plastered brick wall is made of standard 135 mm thick bricks, and the plaster is approximately 10 mm thick and not homogeneous. The drywall was formed by combining 5 pieces of 12.5 mm thick plates side by side. There is a nonhomogeneous air gap of less than 10 mm between the plates.
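The scenario count can be made explicit with a short enumeration. This is only a bookkeeping sketch of the description above for a single wall condition; the tuple layout and variable names are hypothetical.

```python
from itertools import product

POSTURES = ("standing", "sitting")
DISTANCES_M = (2.5, 5.0)
MOVEMENTS = ("micro", "macro")
READINGS_PER_SCENARIO = 1000

# One empty scene plus every posture/distance/movement combination.
scenarios = [("empty", None, None)]
scenarios += list(product(POSTURES, DISTANCES_M, MOVEMENTS))

# Count how many readings end up under each of the three class labels.
readings_per_class = {}
for label, _distance, _movement in scenarios:
    readings_per_class[label] = readings_per_class.get(label, 0) + READINGS_PER_SCENARIO

total_readings = sum(readings_per_class.values())
```

Under this reading of the setup, the eight human scenarios and the single empty scenario give 9000 readings per wall condition, with fewer readings in the empty class than in the two posture classes.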

In the classification process, a CNN structure specific to the study is generated. The demonstration of the proposed CNN structure is given in Figure 4, and the CNN structure and the detailed layer definitions are given in Table 1. The training dataset is constructed from a randomly selected 80% of the readings, and the test dataset is constructed from the remaining readings. Therefore, 7200 readings are used in the training phase and 1800 readings are used in the test phase of the experiment. The proposed approach is run 30 times with different random seeds, and the results are presented as the means of the 30 runs. The confusion matrices of the experimental results are given in Tables 2–4 for the free-space, brick wall, and drywall scenarios, respectively, and the overall experimental results are given in Table 5.
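The evaluation protocol (a random 80/20 split repeated over 30 seeds, with results averaged) can be sketched as below. The helper names are hypothetical, and `run_fn` stands in for a full train-and-test cycle that is not shown; the sketch only illustrates the bookkeeping, not the authors' implementation.

```python
import random
from statistics import mean


def split_indices(n_readings, train_fraction, seed):
    """Randomly split reading indices into disjoint training and test index lists."""
    rng = random.Random(seed)
    indices = list(range(n_readings))
    rng.shuffle(indices)
    n_train = round(n_readings * train_fraction)
    return indices[:n_train], indices[n_train:]


def mean_over_runs(run_fn, seeds):
    """Average a per-run metric (for example test accuracy) over several seeds.

    `run_fn` is a placeholder: it should train and test one model for the
    given seed and return a single number.
    """
    return mean(run_fn(seed) for seed in seeds)


train_idx, test_idx = split_indices(9000, 0.8, seed=0)
```

Changing the seed changes which readings fall on each side of the split, which is why reporting the mean over the 30 runs gives a steadier estimate than any single split.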

While data are available for longer ranges, only data for a shorter distance are used in these tests and experiments. This reduces unwanted data such as reflection and multipath effects as far as possible. Since the reflected RF signals contain important information about the obstacles at the radial distance, the CNN is able to successfully differentiate situations where the dimensional differences are large. Increasing the downrange resolution by increasing the bandwidth will, therefore, increase the success of discrimination. By increasing the number of SFCW frequencies, more data will be obtained, and classification success will again increase for longer distances.

Although researchers focus on the reduction of negative effects of the wall, this approach may reduce success in the learning phase. The use of micro-Doppler methods would be appropriate for imaging and detection; the tests would need to be deepened for classification. Preprocessing is applied with the assumption that the wall is constant, and filters will destroy some valuable data on the environment.

The influence of the permeability parameter is clearly observed in the tests. For example, the classification success rate is slightly reduced because fewer signals return through walls with lower permeability, especially plastered brick walls.

In the tests and experiments, the micro movements which are more difficult to detect by the conventional methods and the macro movements whose effects are observed on the whole frequency band are evaluated separately. Macro movements which reduce valuable information about the environment behind the wall have reduced the classification success. However, increasing the amount of data in the learning and training process will reduce this effect. Thus, by increasing the training data, similar high classification accuracy has been achieved for both micro and macro movements as seen in Table 5.

6. Conclusion

This study focuses on assessing the classification of SFCW radar (as a form of UWB radar) signals using CNNs in order to detect the presence or absence of a human and the human posture. In the literature, there are techniques and methods using micro-Doppler signatures and the slow time data of target movements. However, in this study, long-term data are not needed, and classification is done instantaneously. Consequently, remarkable results have been obtained with this study, which is the first (to the best of the authors’ knowledge) to use a CNN to classify raw data obtained from through-wall radars without either detailed preprocessing or slow time data.

SFCW radar signals of the generated test scenes are obtained using a VNA. After the dataset is constructed from the readings, it is used with a CNN to classify the presence of a human and whether the human is standing or sitting.

Signals from another form of UWB radar, the frequency-modulated continuous wave (FMCW) radar, could also be used as the data source for the CNN. Since SFCW radar techniques have benefits including high mean transmitter power and high receiver sensitivity, the SFCW radar has been used in this study.

The proposed approach achieves remarkable and successful results without the detailed preprocessing operations used in the traditional approaches. Future work includes using synthetic aperture radar (SAR) for dataset construction and evaluating the performance of different kinds of deep learning classifiers for the study concept.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the STM Savunma Teknolojileri Mühendislik ve Ticaret A.Ş. company for their cooperation and support.