Abstract

Driver behavior and gesture recognition play a significant role in emerging next-generation vehicular technology. The driver’s face may provide important cues about his/her attention and fatigue. Therefore, the driver’s face pose is one of the key indicators to be considered by automatic driver monitoring systems in next-generation Internet of Vehicles (IoV) technology, where driver behavior monitoring is essential for reducing road accidents. This paper addresses the problem of monitoring driver attentiveness using face pose estimation in a nonintrusive manner. The proposed system is based on wireless sensing, leveraging channel state information (CSI) of WiFi signals. We present a novel classification algorithm that combines support vector machine (SVM) and k-nearest neighbor (KNN) classifiers to enhance the classification accuracy. Experimental results demonstrate that the proposed device-free wireless implementation can localize a driver’s face very accurately with an average recognition rate of .

1. Introduction

With the rapidly growing automobile industry, the Internet of Vehicles (IoV) has attracted many researchers due to its enormous commercial applications [1–4]. For next-generation intelligent vehicles, estimating the driver’s face pose may provide important cues for solving many human-centered problems, e.g., driver behavior recognition and driver attention analysis for safe driving. Face localization is a special case of head pose estimation that is widely used in various applications, e.g., saliency prediction, facial expression analysis, and video conferencing [5]. The nominal face orientation while driving is frontal. If the driver’s face is oriented in another direction (e.g., tilted down or sideways), this is due to either inattention or fatigue. The literature shows that analysis of the driver’s head generally indicates the driver’s attention level, as well as his/her distraction and fatigue [6, 7].

Over the past decades, various driver assistance systems have been designed to avoid accidents. These systems can provide essential information at an early stage so that possible accident risks are avoided. The driver’s attention can be analyzed from several visual or nonvisual parameters, e.g., heart rate variability [8], motion of the hands [9] or the feet [10], and gaze tracking [11] or eye blinking [12]. Among these, the face is an important indicator of the driver’s attention that deserves further consideration.

In general, the choice of the most suitable driver inattention monitoring system is crucial. For optimal performance, the sensing system should be noninvasive and able to perform accurately under various driving conditions, e.g., at night, under clouds, and at sunrise and sunset. In recent years, WiFi-based activity and gesture recognition systems leveraging channel state information (CSI) have emerged with remarkable performance [13–27]. Motivated by these results, we present a WiFi CSI-based wireless sensing framework: a simple yet accurate face localization system that overcomes the difficulties of existing methods. Our proposed WiFi CSI-based wireless sensing solution is nonintrusive to the user, calibration-free, and can work well in smoke, in darkness, and under both line-of-sight and non-line-of-sight conditions. As the low-density parity-check (LDPC) coding scheme has become a significant choice for WiFi (802.11n/ac/ad) [28–31], this idea may also accelerate the benefits of 5G in IoV.

In this research work, we propose a novel hybrid classification technique that combines a support vector machine (SVM) classifier with the k-nearest neighbor (KNN) classifier to enhance the recognition performance. Both SVM and KNN have been used effectively in various WiFi-based activity and gesture recognition systems [32–36]. The performance of KNN depends on the size of the training set. If the number of training samples is very small, KNN often cannot predict accurately because of high variance. Therefore, nearest neighbor classifiers do not generalize well. In contrast, the SVM classifier has good generalization capability because it is based on finding the optimal hyperplane for nonseparable input data. On the other hand, SVM uses all training samples, which may lead to time-consuming computation. To overcome the high computational burden of SVM and the large sample size required by KNN, a combination of SVM and KNN is proposed. The combination of the KNN and SVM algorithms (KSVM) yields excellent results and handles the multiclass setting with reasonable computational complexity in practice. The proposed KSVM technique follows two simple steps [37]. First, we find the close neighbors of a query sample. Second, we train a local SVM classifier on the set of collected neighbors. The proposed method outperforms nearest neighbor and support vector machines for large and multiclass data sets.

The presented scheme utilizes commercially available WiFi devices to record and acquire channel information, which is readily available in the form of CSI measurements [38]. The proposed mechanism leverages the variations in WiFi channel information caused by the driver’s poses within the WiFi coverage area. To the best of the authors’ knowledge, this is the first work on device-free WiFi CSI-based driver face localization that combines nearest neighbor classification with SVM.

Our main contributions are summarized below:
(i) We present a wireless device-free driver’s face localization system utilizing the CSI of WiFi signals.
(ii) We propose a novel hybrid classification method based on the combination of SVM and KNN to increase the recognition accuracy of the system with less computational burden.
(iii) To validate the reliability of our presented scheme, comprehensive experiments are performed in cluttered scenarios.
(iv) To evaluate the performance of our proposed classification method, we compare our experimental results with conventional classification algorithms.

The remainder of the paper is organized as follows. Section 2 briefly reviews the traditional methods relevant to our research. Section 3 gives an overview of the proposed framework. In Section 4, the detailed implementation of our proposed system is discussed. In Section 5, we explain the experimental settings and results. Section 6 discusses the main limitations of the presented technique. Finally, Section 7 provides the conclusion with suggestions for future work.

2. Related Work

This section reviews the existing WiFi CSI-based device-free gesture and activity recognition systems relevant to our research. WiFi-based fine-grained physical layer CSI has attracted many researchers because of its high localization accuracy. With the pervasiveness of wireless sensing technology, WiFi-based device-free indoor localization has become part of modern life [14–16]. Emerging device-free activity recognition systems take advantage of WiFi CSI to characterize human activities [39–42]. WiFi-based localization and recognition has been extended to Wi-COVID [26], a WiFi-based COVID-19 detection and patient monitoring system.

Recently, WiPass [25] introduced WiFi CSI-based smartphone keystroke recognition. WiFi CSI-based intrusion detection [33, 43, 44] and microactivity recognition [45] systems have been presented with remarkable recognition performance. A ubiquitous WiFi-based training-free localization system has also been suggested with good recognition results [46]. The authors of [17] presented a multiuser gesture recognition system using WiFi signals. DeepSeg [22] and Wihi [23] addressed WiFi-based activity recognition using a deep learning approach. A WiFi-based posture recognition system has been developed with good recognition results [21]. WiAct [19] proposed a device-free passive activity recognition system exploiting the correlations between WiFi CSI amplitude information and human body movement. The authors of [20] demonstrated the concept of temporal frequency for WiFi-based human activity recognition.

WiFi vision [13] introduced the idea of indoor positioning, WiFi imaging, daily activity recognition, gesture recognition, gait recognition, human identification, fall detection, and human detection using WiFi devices. WiGer [47] presented a WiFi-based hand gesture recognition system based on a fast dynamic time warping algorithm. In recent years, the idea of writing in the air [18, 48] has been introduced for virtual reality devices using WiFi signals, which is much more complex than simple gesture recognition. DF-WiSLR [27] is a sign language recognition model exploiting WiFi signals.

Traditionally, support vector machine (SVM) and k-nearest neighbor (KNN) methods have been widely used as stand-alone classifiers in numerous device-free WiFi CSI-based localization, activity, and gesture recognition systems. In this context, WiCatch [49] presented a WiFi CSI-based hand gesture recognition system leveraging the SVM classification method. Wi-Key [34] is a WiFi CSI-based system that recognizes keystrokes using a KNN classifier. WiFall [35] is a WiFi CSI-based abnormal behavior detection system leveraging the local outlier factor; in this model, a one-class SVM is used to classify the features.

In recent years, WiFi-based in-vehicle driver activity and gesture recognition systems have been introduced with good recognition performance [50–52]. WiFind [36] presented a WiFi CSI-based driver fatigue detection system based on a one-class SVM technique. WiDriver [53] relies on the driver’s hand movements to recognize driver actions using the CSI of WiFi signals. Different from existing systems, we use a hybrid classification approach, i.e., a combination of the KNN and SVM algorithms, for WiFi CSI-based driver face localization.

3. System Overview

In this section, we describe the essentials of WiFi CSI, the basic system architecture, and an overview of our presented classification mechanism.

3.1. CSI Overview

The proposed system is based on WiFi devices that support the IEEE 802.11n/ac protocols. The channel state information (CSI) of the WiFi signal is used as the information source. Commercial off-the-shelf WiFi devices that implement IEEE 802.11n/ac usually support multiple-input multiple-output (MIMO) technology and thus comprise multiple transmitter (Tx) and receiver (Rx) antennas. The CSI of a WiFi signal is fine-grained physical layer information based on the widely used orthogonal frequency division multiplexing (OFDM) technology.

In this work, a WiFi router or access point is used as the transmitter. The receiver is an Intel 5300 network interface card (NIC) that is used to collect CSI from the physical layer. Both transmitter and receiver use the IEEE 802.11n protocol. The channel properties are usually available in the form of CSI measurements on commercial WiFi devices [38]. The Intel 5300 NIC reports 30 subcarriers to record and acquire the channel variations of each Tx-Rx antenna pair in the OFDM system.

The commonly used narrowband flat-fading channel model for packet $i$, leveraging OFDM and MIMO techniques, is formulated as

$$y_i = H_i x_i + n_i, \quad i \in \{1, \dots, N\},$$

where $y_i$ is the received signal, and $x_i$ is the transmitted signal. $N$ is the total number of packets received. $H_i$ refers to the CSI channel matrix for packet $i$, and $n_i$ represents the Gaussian noise vector.

Let $N_r$ and $N_t$ refer to the total number of receiving and transmitting antennas, respectively. For each stream, the CSI matrix consists of complex values, one per subcarrier. For each Tx-Rx antenna pair, the CSI matrix $H$ is presented as

$$H = [h_1, h_2, \dots, h_{30}],$$

where $h_k$ carries both phase and amplitude measurements in the form of a complex number, calculated as

$$h_k = |h_k| \, e^{j \angle h_k},$$

where $|h_k|$ stands for the amplitude while $\angle h_k$ denotes the phase information.
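The amplitude and phase decomposition of the CSI can be illustrated with a short NumPy sketch. The array layout (packets, Tx antennas, Rx antennas, subcarriers) and the random test data are assumptions made for illustration only; they are not the output format of any particular CSI tool.

```python
import numpy as np

# Hypothetical CSI tensor: (packets, Tx antennas, Rx antennas, subcarriers).
# The sizes follow the setup in the text (1 Tx, 3 Rx, 30 subcarriers);
# the values are random placeholders, not measured CSI.
rng = np.random.default_rng(0)
n_packets, n_tx, n_rx, n_sub = 100, 1, 3, 30
shape = (n_packets, n_tx, n_rx, n_sub)
csi = rng.normal(size=shape) + 1j * rng.normal(size=shape)

amplitude = np.abs(csi)    # |h_k|
phase = np.angle(csi)      # angle of h_k, wrapped to (-pi, pi]

# Recombining amplitude and phase recovers the complex CSI values.
reconstructed = amplitude * np.exp(1j * phase)

# One stream per Tx-Rx antenna pair: (packets, streams, subcarriers).
streams = amplitude.reshape(n_packets, n_tx * n_rx, n_sub)
```

With one transmitting and three receiving antennas, `streams` holds three amplitude streams of 30 subcarriers per packet.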

3.2. System Architecture

Our WiFi CSI-based driver’s face localization system consists of the following three main modules: (1) a CSI preprocessing module, (2) a feature extraction module, and (3) a classification module, as shown in Figure 1.

The CSI preprocessing module collects and acquires the CSI data from the physical layer. Using basic filtering techniques, the CSI data is preprocessed to remove unwanted noise. The feature extraction module detects the relevant face poses and extracts meaningful features from the preprocessed CSI data. The classification module relies on a hybrid classification technique to recognize different poses. In the next section, each module is explained briefly.

4. Methodology

In this section, we will explain the complete flow of our system methodology, i.e., CSI preprocessing, pose detection, features extraction, and classification.

4.1. CSI Preprocessing

The acquired CSI data is a composite signal that comprises useful information and embedded unwanted noise from the surroundings. First, the received CSI is filtered using basic filtering techniques. From the literature, it is clear that human activities and poses occupy lower frequencies than the noise [43]. Therefore, we need to remove the high-frequency noise. For this purpose, a second-order low-pass Butterworth filter is implemented. In our experiments, the packets sampling rate () is adjusted at 80 packets/second that is equal to the normalized cutoff frequency . The received CSI data may be affected by some static path components. Therefore, we subtract the corresponding constant offsets from the streams to alleviate these static path components. The raw CSI phase cannot be used directly and can give wrong information because it is wrapped between $-\pi$ and $\pi$ [54]. Therefore, the measured CSI phase must be unwrapped. The raw and unwrapped CSI data are shown in Figures 2 and 3, respectively.
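The preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration, not the MATLAB implementation used in the experiments: the 5 Hz cutoff is an assumed value chosen only for the demonstration, the helper names `butter2_lowpass` and `iir_filter` are hypothetical, and the filter coefficients come from the standard bilinear-transform design of a second-order Butterworth low-pass.

```python
import math
import numpy as np

def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a 2nd-order Butterworth low-pass filter,
    obtained with the bilinear transform and frequency prewarping."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = np.array([b0, 2.0 * b0, b0])
    a = np.array([1.0,
                  2.0 * (k * k - 1.0) * norm,
                  (1.0 - math.sqrt(2.0) * k + k * k) * norm])
    return b, a

def iir_filter(b, a, x):
    """Direct-form I filtering of a 1-D signal with zero initial state."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y

FS = 80.0      # packets/second, as stated in the text
CUTOFF = 5.0   # Hz; ASSUMED for illustration, not taken from the paper

t = np.arange(0, 5, 1 / FS)
slow = np.sin(2 * np.pi * 1.0 * t)          # slow, pose-like variation
noise = 0.5 * np.sin(2 * np.pi * 30.0 * t)  # high-frequency disturbance

b, a = butter2_lowpass(CUTOFF, FS)
filtered = iir_filter(b, a, slow + noise)

# Remove the static (constant) component of the stream.
detrended = filtered - filtered.mean()

# Phase unwrapping: recover a continuous phase from values wrapped to (-pi, pi].
true_phase = np.linspace(0, 6 * np.pi, 200)
wrapped = np.angle(np.exp(1j * true_phase))
unwrapped = np.unwrap(wrapped)
```

The single forward pass introduces a phase delay; a zero-phase (forward-backward) variant is a possible alternative when the delay matters.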

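Due to the unsynchronized clocks of the transmitter and receiver, the raw CSI phase behaves extremely randomly. The relation between the true phase and the measured phase can be formulated as

$$\hat{\phi}_i = \phi_i - 2\pi \frac{k_i}{N_{\mathrm{FFT}}} \delta + \beta + Z,$$

where $\hat{\phi}_i$ represents the measured phase of subcarrier $i$ while $\phi_i$ shows the actual phase, $\delta$ is the time lag, $k_i$ stands for the subcarrier index, $N_{\mathrm{FFT}}$ is used for the size of the FFT, $\beta$ represents the unknown phase offset, and $Z$ indicates the random noise.

The phase error is a linear function of the subcarrier index $k_i$. We can formulate two calibration parameters $a$ and $b$ for the phase, represented as

$$a = \frac{\hat{\phi}_n - \hat{\phi}_1}{k_n - k_1}, \qquad b = \frac{1}{n} \sum_{j=1}^{n} \hat{\phi}_j,$$

where $n$ is the number of subcarriers.

We subtract $a k_i + b$ from the raw phase and get the sanitized phase as

$$\tilde{\phi}_i = \hat{\phi}_i - a k_i - b.$$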

The phase sanitization is performed on all the subcarriers and reassembled according to the corresponding amplitudes.
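A minimal sketch of this linear phase calibration is given below. The function name `sanitize_phase` and the symmetric subcarrier index set are assumptions for illustration; the slope is estimated from the first and last subcarriers and the offset from the mean phase, which is the commonly used form of this transformation.

```python
import numpy as np

def sanitize_phase(measured_phase, subcarrier_index):
    """Remove the linear phase error (slope and offset) across subcarriers."""
    phi = np.unwrap(np.asarray(measured_phase, dtype=float))
    k = np.asarray(subcarrier_index, dtype=float)
    slope = (phi[-1] - phi[0]) / (k[-1] - k[0])  # calibration parameter a
    offset = phi.mean()                          # calibration parameter b
    return phi - slope * k - offset

# Hypothetical symmetric index set with 30 entries (mean zero).
k = np.arange(-29, 30, 2)

true_phase = 0.3 * np.sin(k / 10.0)
# Measured phase = true phase + linear error caused by timing and phase offsets.
measured = true_phase + 0.05 * k + 0.7

clean_from_measured = sanitize_phase(measured, k)
clean_from_true = sanitize_phase(true_phase, k)
```

When the subcarrier indices are symmetric around zero, the sanitized phase does not depend on the unknown slope and offset, so both calls above give the same result.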

4.2. Pose Detection

In this section, we discuss how to detect the driver’s face-related activities or poses using the received CSI data and how to distinguish them from other in-vehicle activities. The first step towards driver’s face localization using CSI data is to detect whether the driver has performed some activity or not. For this purpose, the meaningful CSI streams caused by human motion are initially segmented using a step-by-step moving variance technique [41]. In each step, a sliding window of length $w$ is used across neighboring CSI packets. High values of the moving variance are caused by dynamic motions, whereas low values indicate slight fluctuations due to the surroundings. The moving variance is given as

$$v_j = \frac{1}{w} \sum_{i=1}^{w} \left( x_{j+i-1} - \mu_j \right)^2,$$

here, $\mu_j$ is the mean and can be defined as

$$\mu_j = \frac{1}{w} \sum_{i=1}^{w} x_{j+i-1},$$

where $w$ is the number of packets in the window, $i$ is the packet number in the sliding window, and $j$ is the packet number in the CSI stream. Afterward, we calculate the variance for the 30 filtered CSI streams and obtain a $30 \times M$ matrix $V$, where $M$ is the number of times the sliding window moves. After detecting the moving part of the CSI sequence, we accurately detect the face-related activities. Through experimental investigations, we find that face activities cause large fluctuations in the CSI streams. Therefore, we apply a threshold-based technique. We set a threshold $\tau$ and compare the maximum value of $V$ with $\tau$ for activity-related component selection. Mathematically,

$$D = \begin{cases} 1, & \max(V) > \tau, \\ 0, & \text{otherwise}, \end{cases}$$

where $D$ represents the presence of face-related activity. If $D$ is set to 1, a face-related activity is detected. The value of the threshold $\tau$ is empirically selected based on the variance of our preliminary measurements, which varies with the activities. To reduce the dimensionality of the acquired data, we perform principal component analysis (PCA) as demonstrated in [55]. After obtaining the required profile, we apply the feature extraction method.
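The moving-variance detection step can be sketched as follows. The window length, the threshold, and the synthetic streams are illustrative assumptions, and the population variance (division by the window length) is one possible normalization; the function names are hypothetical.

```python
import numpy as np

def moving_variance(stream, window):
    """Variance of every length-`window` slice, sliding one packet at a time."""
    x = np.asarray(stream, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return windows.var(axis=1)  # population variance of each window

def detect_activity(streams, window, threshold):
    """Return (flag, V): flag is 1 if any window variance exceeds the threshold."""
    v = np.stack([moving_variance(s, window) for s in streams])
    return int(v.max() > threshold), v

# Synthetic example: 30 streams of 400 packets with small background noise.
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal((30, 400))
active = quiet.copy()
# Inject a burst of motion-like fluctuation into packets 150..249.
active[:, 150:250] += np.sin(2 * np.pi * np.arange(100) / 20)

flag_quiet, _ = detect_activity(quiet, window=40, threshold=0.1)
flag_active, v = detect_activity(active, window=40, threshold=0.1)
```

Here `v` has one row per stream and one column per window position, so its shape is 30 by 361 for 400 packets and a window of 40.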

4.2.1. Feature Extraction

Based on the detailed analysis of the extracted moving average data and our preliminary experimental investigations, we specifically choose the following six statistical features: (i) mean, (ii) standard deviation, (iii) median absolute deviation, (iv) maximum value, and (v, vi) two percentile values. To differentiate the features of multiple poses, we integrate all obtained features into a tuple, defined as

$$F = (f_1, f_2, \dots, f_6),$$

where $F$ represents the set of all features, and $f_i$ stands for a feature.
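As an illustration, the six statistical features can be computed for one segment as in the sketch below. The percentile orders (25 and 75) are placeholders chosen for the example, since they are parameters of this hypothetical helper rather than values taken from the text.

```python
import numpy as np

def extract_features(segment, low_pct=25.0, high_pct=75.0):
    """Six statistical features of a 1-D segment.

    low_pct and high_pct are ASSUMED percentile orders used for illustration.
    """
    x = np.asarray(segment, dtype=float)
    median = np.median(x)
    return np.array([
        x.mean(),                        # (i) mean
        x.std(),                         # (ii) standard deviation
        np.median(np.abs(x - median)),   # (iii) median absolute deviation
        x.max(),                         # (iv) maximum value
        np.percentile(x, low_pct),       # (v) lower percentile
        np.percentile(x, high_pct),      # (vi) upper percentile
    ])

features = extract_features([1, 2, 3, 4, 5])
```

Stacking one such vector per detected segment gives the feature matrix passed to the classifier.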

4.3. Classification Module

Our presented KSVM classification algorithm combines the benefits of both KNN and SVM. It finds the $k$ nearest neighbors of the query sample and then trains an SVM classifier on them to perform the recognition [37].

4.3.1. KNN Classifier

The k-nearest neighbor (KNN) classifier is a very simple machine learning algorithm that is based on statistical data for the classification of features. The KNN method chooses the most relevant class of a testing sample from the available nearest samples. Suppose we have a training dataset, represented as

$$T = \{(x_i, y_i)\}_{i=1}^{N},$$

where $N$ is the number of samples $x_i$, and $y_i$ is the corresponding class label, with $y_i \in \{-1, +1\}$ in the binary case.

According to the nearest neighbor rule, any unknown sample $x$ is assigned to the class obtained by majority voting of its $k$ nearest neighbors in the dataset $T$. For binary classification problems, the decision rule of the KNN classifier is mathematically represented as

$$f_{\mathrm{KNN}}(x) = \mathrm{sign}\left( \sum_{i=1}^{k} y_{r_x(i)} \right),$$

where $r_x(i)$ denotes the index of the $i$-th nearest training sample to $x$.
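The binary nearest neighbor rule can be written in a few lines of NumPy. This is a sketch under the assumptions that labels are coded as -1 and +1, that the Euclidean distance is used, and that k is odd so that the vote cannot tie; the toy data are invented for the example.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    """Binary KNN: sign of the summed labels of the k nearest training samples."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    dist = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]           # indices of the k closest samples
    return int(np.sign(train_y[nearest].sum()))

# Toy data: two well-separated groups with labels -1 and +1.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [-1, -1, -1, 1, 1, 1]

label_a = knn_predict(X, y, [0.2, 0.1], k=3)
label_b = knn_predict(X, y, [5.5, 5.5], k=3)
```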

4.3.2. SVM Classifier

The support vector machine (SVM) is a linear discriminant classification algorithm. This machine learning method is based on the principle of maximizing the classification effect by establishing a hyperplane. The classification hyperplane provides the decision surface and maximizes the isolation boundary between different types of samples. A support surface is drawn on both sides of hyperplane containing the samples that are close to classification interface. The training samples on the support surface are called support vectors.

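The training dataset is defined as

$$T = \{(x_i, y_i)\}_{i=1}^{N},$$

where $N$ is the number of samples. Each sample $x_i$ is a $d$-dimensional vector while the class label $y_i$ is either 1 or -1.

We can map the samples to a feature space of higher dimensions using a nonlinear mapping transformation function $\Phi$. The SVM classifier takes the following decision rule:

$$f_{\mathrm{SVM}}(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right),$$

where $K(x_i, x) = \langle \Phi(x_i), \Phi(x) \rangle$ is the kernel function. The values of $\alpha_i$ and $b$ are adjusted to maximize the margin of the separating hyperplane. The sign function gives the result in binary form. To obtain the result in nonbinary form, we remove the sign function:

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b.$$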

4.3.3. KSVM Classifier

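The basic idea of the KSVM method follows a very simple procedure to classify the features. In the first step, it finds the $k$ nearest neighbors of the query sample $x$. Afterward, it trains an SVM on these neighbors that is used to perform the recognition. The training samples with the minimum values of the feature-space distance

$$\|\Phi(x_i) - \Phi(x)\|^2 = K(x_i, x_i) + K(x, x) - 2K(x_i, x)$$

are considered to be the closest samples to $x$. In any transformed feature space, the following inequality can be used for this purpose:

$$\|\Phi(x_{r_x(i)}) - \Phi(x)\| \le \|\Phi(x_{r_x(i+1)}) - \Phi(x)\|, \quad i = 1, \dots, N-1,$$

where $r_x(i)$ denotes the index of the $i$-th closest training sample to $x$.

The final decision function follows

$$f_{\mathrm{KSVM}}(x) = \mathrm{sign}\left( \sum_{i=1}^{k} \alpha_{r_x(i)} \, y_{r_x(i)} \, K(x_{r_x(i)}, x) + b \right).$$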
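The two-step procedure can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not the authors' implementation: the helper `ksvm_predict` is hypothetical, neighbors are searched with the Euclidean distance in the input space (which gives the same ordering as the feature-space distance for linear and RBF kernels), and a neighborhood containing a single class is resolved without training an SVM.

```python
import numpy as np
from sklearn.svm import SVC

def ksvm_predict(train_x, train_y, query, k=5, **svc_params):
    """Classify one query with an SVM trained only on its k nearest neighbors."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    query = np.asarray(query, dtype=float)

    # Step 1: collect the k nearest training samples.
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    labels = train_y[nearest]

    # If every neighbor has the same label, no SVM is needed (or trainable).
    if np.unique(labels).size == 1:
        return labels[0]

    # Step 2: train a local SVM on the neighbors and classify the query.
    local_svm = SVC(**svc_params)
    local_svm.fit(train_x[nearest], labels)
    return local_svm.predict(query.reshape(1, -1))[0]

# Toy data: two groups of four samples each.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 3], [3, 4], [4, 3], [4, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

pred_a = ksvm_predict(X, y, [0.5, 0.5], k=3)                   # pure neighborhood
pred_b = ksvm_predict(X, y, [3.5, 3.5], k=3)                   # pure neighborhood
pred_c = ksvm_predict(X, y, [1.2, 1.2], k=5, kernel="linear")  # mixed neighborhood
```

Because `SVC` handles more than two classes internally, the same helper applies to multiclass neighborhoods; training one small SVM per query trades a larger prediction cost for a much smaller training set.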

5. Experimentation and Evaluation

This section describes the experimentation settings and evaluation performance of our proposed KSVM scheme.

5.1. Experimentation Settings

In our presented framework, all experiments are performed with 802.11n-enabled WiFi devices as described in [50–52]. Specifically, a laptop equipped with an Intel 5300 NIC and three receiving antennas, i.e., $N_r = 3$, is used as the receiver. To record and acquire CSI data, we run the 802.11n CSI Tool [38] on the receiver with the Ubuntu 11.04 LTS operating system. A single-antenna TP-Link router, i.e., $N_t = 1$, is used as the transmitter or access point and operates at a frequency of 2.4 GHz. In this experiment, the receiver pings the access point at a rate of 80 packets/s. The system generates 3 CSI streams of 30 subcarriers each, forming a $1 \times 3$ MIMO system with a channel bandwidth of 20 MHz. We have used MATLAB R2016a to perform signal processing. We set up our testbed in a locally manufactured vehicle that is not equipped with preinstalled WiFi devices. Because no WiFi access point is available in our test vehicle, we installed a commercially available router (TP-Link) as the access point, placed on the dashboard in front of the driver’s seat. To acquire CSI, the receiver (laptop) is placed on the copilot’s seat. To evaluate the performance of our presented method, the following two scenarios are chosen:
(i) Scenario-I. This is the actual driving scenario, where all prescribed poses are performed while driving a vehicle. The vehicle is driven at an average speed of 20 km/h on a straight road 20 km long, as shown in Figure 4(a). To avoid interference from other in-vehicle activities, no other activity is performed while a pose is performed.
(ii) Scenario-II. In this scenario, all prescribed poses are performed in a vehicle standing in a garage. The garage size is feet, as shown in Figure 4(b).

In each experiment, 10 in-vehicle human poses, as shown in Table 1, are performed by five volunteers (3 male and 2 female university students). For each experiment, each volunteer repeated all poses 20 times. Therefore, the data set consists of a total of 1000 samples ($10 \times 5 \times 20$) for each experiment. The of the total samples are used for training purpose while for testing purpose. For cross-validation, we hold the testing samples out, i.e., the training data do not contain any samples from the testing data.

5.2. Performance Evaluation

For performance evaluation, we specifically choose recognition accuracy and the confusion matrix. The actual pose is shown in the columns of the confusion matrix while the classified pose is shown in the rows. As shown in Figure 5, the presented scheme can recognize 10 different in-vehicle poses with an average accuracy of and for scenarios I and II, respectively.

To evaluate the efficacy and reliability of the presented scheme, the obtained results are analyzed using different evaluation metrics, i.e., precision, recall, and $F_1$-score, defined as follows:

(1) Precision is the positive predictive value, mathematically represented as

$$\mathrm{Precision} = \frac{TP}{TP + FP},$$

where TP stands for true positive, and FP is false positive. A true positive (TP) is a sample that the model correctly assigns to the positive class. On the other hand, a false positive (FP) is a sample that the model incorrectly assigns to the positive class.

(2) Recall measures the sensitivity of a model and is described as the true positive rate (TPR). Recall is represented as

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

where FN stands for false negative, i.e., a sample that the model incorrectly assigns to the negative class.

(3) The $F$-measure or $F_1$-score is the harmonic mean of recall and precision, represented as

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
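The three metrics can be computed per class directly from a confusion matrix, as in the sketch below. It follows the convention stated in this section (rows hold the classified pose, columns the actual pose) and assumes that every class is both present and predicted at least once, so that no denominator is zero. The 2 by 2 matrix is an invented example.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F1 per class.

    cm[i, j] counts samples classified as class i whose actual class is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp   # classified as the class, actually another
    fn = cm.sum(axis=0) - tp   # actually the class, classified as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented two-class example.
cm = [[8, 1],
      [2, 9]]
precision, recall, f1 = per_class_metrics(cm)
```

For the first class this gives a precision of 8/9, a recall of 8/10, and an F1-score of 16/19.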

Figure 6 shows the results related to precision, recall, and $F_1$-score. The average minimum and maximum values are summarized in Table 2. It is clear from the results that, for both scenarios, all the prescribed poses are recognized within reasonable limits of precision, recall, and $F_1$-score.

The accuracy of KSVM is compared with that of stand-alone KNN and SVM, as shown in Figure 7. The overall results are summarized in Table 3. The results show that KSVM achieves higher recognition accuracy than stand-alone KNN or SVM.

To examine the computational complexity of our proposed KSVM algorithm, we compare the execution time of KSVM with that of KNN and SVM, as illustrated in Figure 8. The execution time of KSVM is lower than that of SVM but slightly higher than that of KNN. This trade-off is acceptable because the recognition accuracy of KSVM is considerably better than that of KNN.

A user independence test is performed to evaluate the generalization of the proposed KSVM scheme. For this purpose, the leave-one-participant-out cross-validation (LOPO-CV) method is adopted. In LOPO-CV, the training data contain no samples from the test user, i.e., the whole data set is used for training except one specific user’s data, which is used for testing. This procedure is repeated until each user has been treated as the test user. We have separately applied LOPO-CV to SVM, KNN, and KSVM to compare the results, as shown in Figures 9–11. The overall results are summarized in Table 4. The results show that the proposed mechanism generalizes across users with an average accuracy of 86.4% and 88.2% for scenarios I and II, respectively. Although the stand-alone SVM algorithm has comparatively good generalization capability, the overall performance is far better using KSVM.
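The LOPO-CV splitting can be sketched as a small generator. The participant identifiers and the per-user sample count (10 poses repeated 20 times by each of five volunteers) follow the description of the data set; the function name is hypothetical.

```python
import numpy as np

def lopo_splits(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each participant."""
    ids = np.asarray(participant_ids)
    for held_out in np.unique(ids):
        test_mask = ids == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Five volunteers, 200 samples each (10 poses x 20 repetitions).
participant_ids = np.repeat(np.arange(5), 200)
splits = list(lopo_splits(participant_ids))
```

Each of the five folds trains on 800 samples from four volunteers and tests on the 200 samples of the remaining one, so the reported accuracy is the average over the folds.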

The performance of the proposed KSVM algorithm is evaluated by comparing its recognition accuracy with that of conventional classification methods, i.e., Naive Bayes (NB) [56], artificial neural networks (ANN) [57, 58], decision tree (DT) [59], and sparse representation-based classification (SRC) [60, 61]. The overall comparison is shown in Figure 12, and the results are summarized in Table 5. The recognition performance of the KSVM scheme is higher than that of the conventional classification methods.

Extensive experiments are performed to observe the effect of varying the number of nearest neighbors $k$, as described in Table 6. One can observe that best results are obtained at ; therefore, optimal values are used throughout the experiments.

We have examined the robustness of our proposed KSVM scheme with different locations of transmitter (router) and receiver (laptop). Our actual layout is “L” as shown in Figure 13(a), while two varying layouts are “L-1” and “L-2,” as shown in Figures 13(b) and 13(c), respectively. The results with varying layouts are described in Table 7. From the results, it is clear that our presented mechanism has acceptable recognition performance at all in-vehicle layouts.

6. Results Analysis and Discussion

In this section, we discuss the obtained results along with their main limitations. We observed that all poses are recognized with very good accuracy using our proposed KSVM classification method; however, the accuracy may be degraded by several limiting factors. In this context, we observed that the CSI of the WiFi signal is highly influenced by moving objects. Therefore, other vehicles and people on the road outside may influence the recognition performance of the system [36]. Furthermore, the presented mechanism is designed for a single person, i.e., the driver, in the vehicle. In practice, however, more than one person may be inside the vehicle, which makes recognition more complex. To overcome these issues, we suggest additional signal processing, which will be considered in our future work. Moreover, the effect of the driver’s orientation and personalized driving habits needs to be considered in future studies.

Despite these limitations, the proposed device-free WiFi CSI-based driver pose localization system is easy to deploy and scalable. The presented KSVM classification algorithm is a general solution that can be applied to any device-free WiFi-based localization and gesture or activity recognition problem. In this research work, we have used this scheme for driver’s face localization. In our experiments, the overall performance of the proposed KSVM is far better than that of the compared methods. Several aspects still need to be considered in the future.

7. Conclusion

In this research work, we have proposed a novel classification scheme for WiFi CSI-based device-free driver’s face localization. We have presented a hybrid classification algorithm, KSVM, that is based on the combination of the traditional KNN and SVM classification methods. The experimental results show that the recognition performance is remarkably improved by our proposed KSVM algorithm. This hybrid classification scheme opens a new window for a diverse range of potential applications. In future work, we intend to explore more complex driving scenarios and to observe the impact of roadway types, building on the findings presented in this research work.

Data Availability

The data that support the research findings are available on request. The data are not publicly available due to privacy of research participants.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.