Abstract

Fatigue driving is both common and dangerous and is a significant factor in fatal car crashes. Machine learning researchers have utilized various sources of information to detect driver drowsiness. This study integrated the morphological features of both the eye and mouth regions and extensively investigated the fatigue detection problem from the aspects of feature numbers, classifiers, and modeling parameters. The proposed algorithm REcognizing the Drowsy Expression (REDE) achieved a 10-fold cross-validation accuracy of 96.07% and took about 21 milliseconds to process one image. REDE outperformed four existing studies in both fatigue detection accuracy and running time and is fast enough to handle real-time fatigue monitoring of video captured at 30 frames per second. To further facilitate the research of fatigue detection, the raw data and the feature matrix were also released.

1. Introduction

The AAA Foundation released a report on traffic safety in 2015, and the data strongly suggested the extent and seriousness of fatigue driving [1]. During 2009-2013, 21% of fatal car crashes involved a drowsy driver, and more than two out of five drivers (43.2%) admitted to having driven while fatigued, whereas only 17% of fatal car crashes during 1999-2008 involved a drowsy driver. This increasing trend of fatigue driving calls for better fatigue detection and warning systems [2].

Machine learning-based fatigue detection algorithms may be roughly grouped into three categories. Firstly, fatigue-specific patterns in a driver's physical behaviors, including eye behaviors [3-5] and yawning frequency [6], extracted from the captured video may be useful for detecting fatigue, and the integrated analysis of these multicue features may further improve the detection accuracy [7-10]. Secondly, fatigue-specific measurements may be extracted from a drowsy driver's physiological signals, e.g., heart rate variability [11] and EEG (electroencephalography) [2, 12, 13]. Thirdly, the external monitoring of vehicular behaviors may also facilitate driver fatigue detection. For example, Morris et al. predicted driver drowsiness from the car's lane deviation and vehicle heading variation [14], and Krajewski et al. monitored the behaviors of the steering wheel to predict whether the driver is fatigued [15].

This study hypothesized that the integration of multiple physical behaviors might achieve better fatigue detection performance, since many previous studies focused on a single type of physical behavior, e.g., eye activities or mouth yawning. This study extracted the Local Binary Pattern (LBP) features from the eye and mouth regions and summarized the principal components of the LBP features for training the fatigue detection models. The proposed algorithm REcognizing the Drowsy Expression (REDE) outperformed four existing fatigue detection studies in both accuracy and running time.

2. Material and Methods

2.1. Problem Definition

Fatigue detection is defined as a binary classification problem of determining whether a person is fatigued based on this person's face photo. The face photos of persons without fatigue are the positive samples and are denoted as P, while the face photos of persons in fatigue are the negative samples and are denoted as N. The total number of samples is n = |P| + |N|. Let x be the d-feature vector of a given sample; the binary classification problem investigated in this study is to assign x as either a positive or a negative sample [16].

2.2. Data Collection

There is no public database of fatigue visual images and videos, so we built our own video database from recruited volunteers. Table 1 describes the details of the participants. Seven male and seven female participants were recruited, and each of the fourteen volunteers signed the informed consent form. None of the participants had sleeping disorders that might affect neurocognitive ability, and none took food, drink, or medicine that affects the neurocognitive system [17].

Two videos were captured for each participant, one in the nonfatigue status and one in the fatigue status. A participant took a normal diet and a full rest on the first day and had one video recording, representing the nonfatigue status, at 8:00 AM of the next day. The participant then had no rest for 18 hours, and another video recording, representing the fatigue status, was taken at 3:00 AM of the third day. The video was captured at 30 frames per second using the CMOS 5-megapixel camera of a 13-inch MacBook Pro. Each video was recorded for 5 minutes and thus has 9000 (5 × 60 × 30) images.

We randomly extracted 300 images from each video and labeled each image as fatigue or nonfatigue by majority voting among 100 volunteers. Two videos were recorded for each participant, so there were 8400 (14 × 2 × 300) images in total. Let v_f be the number of votes that an image shows fatigue and v_n be the number of votes against this statement. An image was removed from further analysis if neither v_f nor v_n reached a clear majority; 1581 images were removed by this rule. An image with a nonfatigue expression was defined as a positive sample, while an image with a fatigue expression was a negative sample. Since a person usually stays nonfatigued longer than fatigued, we randomly chose 80 positive images and 40 negative images for each participant. The final dataset has 1680 samples, with 1120 positive and 560 negative samples, respectively.
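The frame sampling step can be scripted, for example, with OpenCV's VideoCapture interface; the following is a minimal sketch under that assumption, and the video file name in the usage line is hypothetical.

    import random

    import cv2  # OpenCV

    def sample_frames(video_path, n_frames=300, seed=0):
        """Randomly sample n_frames images from one recorded video."""
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # ~9000 for a 5-minute 30 fps video
        random.seed(seed)
        frames = []
        for idx in sorted(random.sample(range(total), n_frames)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # jump to the idx-th frame
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        cap.release()
        return frames

    # Hypothetical usage: frames = sample_frames("participant01_fatigue.mov")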

The raw images of the eyes and mouths were released at http://www.healthinformaticslab.org/supp/resources.php. The data matrix of grayscale values and extracted LBP (Local Binary Pattern) features were also released as a benchmark dataset at this website.

2.3. Data Preprocessing

This study hypothesized that the morphological patterns in the regions of the eyes and mouth may well represent the fatigue status. So we detected and extracted the regions of the two eyes and the mouth from each image using Dlib version 19.4.99 [18], which implements the method described by Kazemi and Sullivan [19]. The detection accuracy on a random subset was inspected manually, as illustrated in Figure 1.

Each image was preprocessed by the following steps, as illustrated in Figure 2. An image was first transformed into grayscale by the formula Gray = 0.299R + 0.587G + 0.114B, where R, G, and B are the pixel values of the red, green, and blue channels [8]. This conversion is provided by the function imread() from the OpenCV library. A gamma correction of 1/2.2, as recommended in [20], was applied to normalize the lighting variations for human face detection. Let M be the maximal grayscale pixel value of the image matrix after the gamma correction; the final image matrix was obtained by normalizing each pixel value by M. After detection with Dlib, the face, eye, and mouth images were each scaled to a fixed size in pixels.
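The preprocessing chain can be sketched as below. The Dlib landmark indices follow the standard 68-point model, while the landmark model file name, the eye/mouth target sizes, and the exact normalization detail are illustrative assumptions rather than the settings of this study.

    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    # Standard 68-point landmark model; the file name is an assumption.
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def preprocess(path, eye_size=(32, 16), mouth_size=(32, 32)):
        """Grayscale -> gamma correction -> normalization -> eye/mouth cropping.
        The target region sizes are illustrative placeholders."""
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)       # grayscale conversion
        gamma = np.power(gray / 255.0, 1.0 / 2.2)           # gamma correction 1/2.2
        img = (gamma / gamma.max() * 255).astype(np.uint8)  # normalize by the maximum M
        faces = detector(img, 1)
        if not faces:
            return None
        pts = predictor(img, faces[0])
        xy = np.array([(pts.part(i).x, pts.part(i).y) for i in range(68)])

        def crop(indices, size):
            x0, y0 = xy[indices].min(axis=0)
            x1, y1 = xy[indices].max(axis=0)
            return cv2.resize(img[y0:y1 + 1, x0:x1 + 1], size)

        left_eye = crop(list(range(36, 42)), eye_size)    # landmarks 36-41
        right_eye = crop(list(range(42, 48)), eye_size)   # landmarks 42-47
        mouth = crop(list(range(48, 68)), mouth_size)     # landmarks 48-67
        return left_eye, right_eye, mouth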

2.4. Extracting Features from the Images

This study extracted the Local Binary Patterns (LBP) from the eye and mouth images and then calculated the principal components of the LBP feature vectors, as illustrated in Figure 2.

With the default value of the LBP parameter pCellSize, each sample has 512 features for each eye image and 1024 features for the mouth image, giving 2048 features in total for the two eye images and one mouth image of each sample.
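The per-cell histogram extraction can be sketched as follows, using scikit-image's local_binary_pattern as a stand-in since the text does not name its LBP implementation. The cell geometry in the closing comment is one assumption consistent with the stated feature counts (256 bins per cell, two cells per eye image and four per mouth image).

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_features(region, cell_size=(16, 16)):
        """Concatenate one 256-bin LBP histogram per non-overlapping cell."""
        codes = local_binary_pattern(region, P=8, R=1, method="default")  # codes in [0, 255]
        ch, cw = cell_size
        feats = []
        for y in range(0, codes.shape[0] - ch + 1, ch):
            for x in range(0, codes.shape[1] - cw + 1, cw):
                hist, _ = np.histogram(codes[y:y + ch, x:x + cw], bins=256, range=(0, 256))
                feats.append(hist / hist.sum())  # normalized per-cell histogram
        return np.concatenate(feats)

    # Under this assumed geometry, a 32x16 eye region yields 2 cells (512 features)
    # and a 32x32 mouth region yields 4 cells (1024 features).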

2.5. Experimental Environments

This study integrated the morphological features of both eyes and the mouth to REcognize the Drowsy Expression (REDE). A binary classification model was trained over the principal components calculated from the LBP features of the facial images.

The experiments were implemented in Python and C++ on an Intel Core i5 2.70 GHz CPU with 8 GB of memory. The machine learning experiments were conducted with the Python scikit-learn version 0.18.2 [21]. The video processing and face capture were carried out with the software Photo Booth version 9.0.0 [22]. The whole procedure of extracting features and predicting the binary classification result for one image took approximately 0.02 seconds, so the pipeline can perform real-time monitoring of videos captured at 30 frames per second, as in this study.
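The per-image timing can be checked with a simple wall-clock measurement; the sketch below assumes a callable pipeline that wraps the preprocessing, feature extraction, and prediction steps.

    import time

    def time_per_image(images, pipeline):
        """Average wall-clock seconds of the full detect-extract-predict pipeline."""
        start = time.perf_counter()
        for img in images:
            pipeline(img)  # preprocessing + LBP + PCA projection + SVM prediction
        return (time.perf_counter() - start) / len(images)

    # Real-time monitoring at 30 fps leaves a budget of 1/30 s (~33 ms) per frame,
    # so a measured ~20 ms per image fits within it.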

2.6. Classification Performance Measurements

Sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews' correlation coefficient (MCC) are widely used to evaluate how well a binary classification model performs [16, 23, 24], and this study chose these four measurements to evaluate the fatigue detection algorithms. Let TP and FN be the numbers of positive samples that are predicted as positives and negatives, respectively, and let TN and FP be the numbers of negative samples that are predicted as negatives and positives, respectively. So the number of positive samples is TP + FN, and the number of negative samples is TN + FP. Sn and Sp are the percentages of correctly predicted positive and negative samples, i.e., Sn = TP/(TP + FN) and Sp = TN/(TN + FP). Acc is the overall percentage of correctly predicted samples, i.e., Acc = (TP + TN)/(TP + FN + TN + FP) [25]. MCC ranges between -1 and 1 and is defined as MCC = (TP × TN - FP × FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)) [26-28], where the function sqrt(x) returns the square root of x.
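These definitions translate directly into code, e.g.:

    from math import sqrt

    def binary_metrics(tp, fn, tn, fp):
        """Sn, Sp, Acc, and MCC from the four confusion-matrix counts."""
        sn = tp / (tp + fn)                    # fraction of positives predicted correctly
        sp = tn / (tn + fp)                    # fraction of negatives predicted correctly
        acc = (tp + tn) / (tp + fn + tn + fp)  # overall fraction predicted correctly
        denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        return sn, sp, acc, mcc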

All the performance measurements mentioned above were calculated using 10-fold cross-validation [29].

3. Results and Discussion

3.1. Baseline Information of the Dataset

After the video capturing and image preprocessing, we retained 1120 positive images and 560 negative images, as shown in Table 2. One male and one female participant wore glasses, so one-seventh of the facial images included for further analysis show glasses, in proportion to the participants.

3.2. Not All the Features Contribute to Fatigue Detection

Firstly, we investigated the differences of the 2048 LBP features between the fatigue and nonfatigue samples, as shown in Figure 3. The maximal difference was 0.0387, for the 452nd feature, and only 26 features had differences of at least 0.01. The t-test was used to evaluate the discriminative power of each LBP feature, as shown in Figure 3(b) [30]. Only 512 LBP features had significant discriminative power for fatigue detection, i.e., a t-test P value ≤ 0.05, so the majority of the LBP features did not discriminate the fatigue samples. The P values were calculated by the t-test function in the Python package scipy version 1.1.0 [31].
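A vectorized form of this per-feature test can be written with scipy, assuming a feature matrix X of shape (n_samples, 2048) and binary labels y:

    import numpy as np
    from scipy.stats import ttest_ind

    def feature_pvalues(X, y):
        """Two-sample t-test P value for every one of the 2048 LBP features."""
        _, pvals = ttest_ind(X[y == 1], X[y == 0], axis=0)  # vectorized over the columns
        return pvals

    # e.g., the number of significant features: np.sum(feature_pvalues(X, y) <= 0.05)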

So we calculated the principal components of all 2048 LBP features and evaluated the top 20 principal components on the fatigue detection problem, as shown in Figure 4. Although the principal components were calculated on the pooled data of both fatigue and nonfatigue samples, they already demonstrated significant differences between these two groups, as shown in Figure 4(a). The discriminative power of each principal component was evaluated by the t-test P value, and Figure 4(b) demonstrates that 11 of the top 20 principal components had a P value ≤ 0.05. The 9th principal component achieved the best discrimination, with a P value of 5.2805e-35. The principal components were calculated by the PCA function in the Python package scikit-learn version 0.18.2 [31].
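The corresponding analysis of the principal components can be sketched as follows (X and y as above):

    from scipy.stats import ttest_ind
    from sklearn.decomposition import PCA

    def component_pvalues(X, y, n_components=20):
        """t-test P value of each top principal component between the two groups."""
        scores = PCA(n_components=n_components).fit_transform(X)  # fitted on the pooled samples
        _, pvals = ttest_ind(scores[y == 1], scores[y == 0], axis=0)
        return pvals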

This study evaluated three image feature extraction algorithms, i.e., LBP, HOG [32], and LeNet [33], for their classification performance on the fatigue detection problem. The performances were measured by 10-fold cross-validation of the decision tree classifier (scikit-learn version 0.18.2). Figure 5 demonstrates that LBP achieved the best classification performance among the three algorithms; HOG performed slightly worse than LBP, and the LeNet features had an Acc 0.0458 lower than that of LBP. So the following sections used the LBP-extracted features to build the fatigue detection model.

3.3. Optimizing the Features

We evaluated different values of the cell size parameter (pCellSize) of the LBP feature extraction algorithm and of the component number parameter (pComponentNum) of the principal component analysis (PCA). The 10-fold cross-validation accuracy of the Support Vector Machine (SVM) classifier with the radial basis function (RBF) kernel was used to evaluate a given feature subset. The experiment was implemented with the Python scikit-learn version 0.18.2, and all the other parameters of these algorithms were left at their default values.
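One way to implement this evaluation, sketched below, wraps PCA and SVM-RBF into a scikit-learn pipeline so that the 10-fold cross-validation scores one candidate setting at a time; X_lbp denotes the LBP matrix extracted with one candidate pCellSize.

    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    def evaluate(X_lbp, y, n_components):
        """10-fold CV accuracy of SVM-RBF on the top principal components."""
        pipe = make_pipeline(PCA(n_components=n_components), SVC(kernel="rbf"))
        return cross_val_score(pipe, X_lbp, y, cv=10, scoring="accuracy").mean()

    # Sweeping the candidate pCellSize values (via re-extracted X_lbp) and the
    # candidate pComponentNum values reproduces the grids of Figures 6 and 7.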

The best prediction accuracy of 90.60% was achieved at the optimal cell size, as shown in Figure 6. Overall, the classifier SVM-RBF performed very well on the fatigue detection problem, achieving accuracies between 87.86% and 90.60%. When the cell width was fixed as 16 or 32, one cell height was clearly the best for SVM-RBF, while at width 8 the best height outperformed the runner-up by only 1.01%. Likewise, when the cell height was fixed as 16 or 32, one cell width was the best choice for the classifier SVM-RBF, while at height 8 the margin in accuracy was only 0.12%. The best-performing value of pCellSize was therefore chosen as the default for further analysis.

The classifier SVM-RBF reached its best accuracy of 95.18% at the optimal value of pComponentNum, as shown in Figure 7. A smaller or larger value of pComponentNum reduced the fatigue prediction accuracy, but SVM-RBF performed stably well once pComponentNum exceeded 11, with a maximal decrease of 0.65% in accuracy within that range. The experiments in the following sections were conducted with this optimized pComponentNum.

3.4. Choosing a Good Classifier

Six binary classification algorithms were compared for their prediction performance on the fatigue detection problem, as shown in Figure 8. The Support Vector Machines with the linear and RBF kernels are denoted as SVM-L and SVM-RBF, respectively. The k-nearest-neighbor (KNN) classifier was used to evaluate how a simple distance-based classifier performs on the fatigue detection problem. A decision tree (DTree) classifier has an inherent nature of easy interpretability, while a random forest (RForest) classifier tends to have better classification performance. The extreme gradient boosting (XGBoost) algorithm is a recently developed meta-algorithm that has outperformed existing classification algorithms in many cases [34]. XGBoost was implemented in the Python package XGBoost version 0.71, and all the other classifiers were provided by the Python package scikit-learn. The features were extracted from the images using the optimized values of pCellSize and pComponentNum from the previous sections.
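The comparison can be reproduced along the following lines, with all classifiers at their default parameters; X and y denote the PCA-reduced feature matrix and the labels.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    classifiers = {
        "SVM-L": SVC(kernel="linear"),
        "SVM-RBF": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(),
        "DTree": DecisionTreeClassifier(),
        "RForest": RandomForestClassifier(),
        "XGBoost": XGBClassifier(),
    }
    for name, clf in classifiers.items():
        acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
        print(f"{name}: {acc:.4f}")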

Figure 8 demonstrates that the classifier SVM-RBF performed the best on three of the four performance measurements of the fatigue detection problem. Although SVM-L shares the support vector framework with SVM-RBF, it performed much worse. XGBoost performed second best, with a 4.39% lower accuracy than SVM-RBF; it achieved the best sensitivity, but its specificity was worse than that of SVM-RBF. KNN produced the third best fatigue detection model, while the two tree-based classifiers RForest and DTree did not achieve satisfactory classification accuracies. So SVM-RBF was chosen as the classifier for the final model.

3.5. Optimizing the Parameters of the Classification Module

We further refined the best classifier SVM-RBF by screening for the best values of its two parameters C and Gamma, as shown in Figure 9. Twenty values between 0.125 and 3.000 with a step size of 0.125 were evaluated for the parameter C, and seven values {0.100, 0.178, 0.316, 0.562, 1.000, 1.334, 1.778} were evaluated for the parameter Gamma. A grid search over these two parameters was conducted, and the classifier SVM-RBF achieved its best accuracy of 96.07% at the optimal combination of C and Gamma.
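The grid search can be expressed with scikit-learn's GridSearchCV; the sketch below enumerates the stated candidate values, without restating the exact winning combination.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    param_grid = {
        "C": np.arange(0.125, 3.125, 0.125),  # candidate C values with step size 0.125
        "gamma": [0.100, 0.178, 0.316, 0.562, 1.000, 1.334, 1.778],
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="accuracy")
    search.fit(X, y)  # X: PCA-reduced features, y: fatigue labels
    print(search.best_params_, search.best_score_)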

3.6. Summarization of the Best Model

The best model to detect fatigue from the facial images was obtained after the above optimization procedure, as shown in Figure 10. The classifier SVM-RBF with the optimized values of the parameters C and Gamma achieved a satisfying performance on our dataset: the sensitivity and overall accuracy were very high, while the specificity may need further improvement.

3.7. Comparison with the Existing Studies

Four recent studies were chosen for comparison with our work, as shown in Table 3. Ma et al. refined the background removal of the two-stream video and trained a depth video-based convolutional neural network (CNN) model, achieving an overall fatigue detection accuracy of 91.57% [35]. Wathiq and Ambudkar proposed a fatigue detection model based on the texture features of facial components and achieved an accuracy of 94.32% on the largest dataset [36]. Wang et al. focused on the patterns of eye activities and implemented an Android-based fatigue detection system with an accuracy of 80.75% [37]. Gao and Wang described the fatigue status by the percentage of eye closure time, the average eye closure time, and the eye blinking frequency and achieved a fatigue detection accuracy of 88% [38].

The algorithm REDE proposed in this study achieved the best accuracy, 96.07%, using the SVM-RBF classifier and took 21 milliseconds to process one image, thus outperforming all four existing studies in fatigue detection accuracy. Only the Wathiq and Ambudkar study reported the processing time of one facial image, and their system ran about 3.33 times slower than REDE. REDE is also fast enough to support real-time fatigue monitoring, since the video capturing technology used in this study generates 30 images per second.

4. Conclusions and Future Directions

The facial image-based fatigue detection system REDE consists of three major steps. Firstly, the regions of the two eyes and the mouth are extracted from the facial image. Secondly, the LBP algorithm extracts features from the three regions, and the principal components are calculated by the PCA algorithm. Lastly, an SVM model with the RBF kernel is trained to perform the classification. REDE outperformed four existing fatigue detection algorithms and runs fast enough for real-time fatigue monitoring.
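Steps two and three combine naturally into a single scikit-learn estimator, as in the sketch below; X_lbp_train and x_lbp_new are assumed to come from step one, and the component number shown is a placeholder rather than the optimized value.

    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Steps two and three of REDE; step one (region extraction and LBP) produces X_lbp.
    rede = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))  # placeholder component number
    rede.fit(X_lbp_train, y_train)
    label = rede.predict(x_lbp_new.reshape(1, -1))  # 1: nonfatigue (positive), 0: fatigue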

We will focus on improving the first two steps of REDE in future work. To facilitate the fatigue detection community, we also released the images used for feature extraction in this study at http://www.healthinformaticslab.org/supp/resources.php. Readers and fatigue detection researchers may also directly use our LBP feature matrix extracted from these images to refine classification models.

Data Availability

To further facilitate the research of fatigue detection, the raw data and the feature matrix were also released at http://www.healthinformaticslab.org/supp/resources.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

FZ, XF, and KL conceived the project, designed the experiments, and drafted the manuscript. KL, SW, CD, and YH collected the data and carried out the experiments. FZ, KL, and XF proofed and polished the manuscript and organized this project.

Acknowledgments

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040400), Jilin Provincial Key Laboratory of Big Data Intelligent Computing (20180622002JC), the Education Department of Jilin Province (JJKH20180145KJ), and the startup grant of the Jilin University. This work was also partially supported by the Bioknow MedAI Institute (BMCPP-2018-001), and the High Performance Computing Center of Jilin University, China. We appreciate the constructive comments from the anonymous reviewers.