A driver face monitoring system is a real-time system that detects driver fatigue and distraction using machine vision approaches. In this paper, a new approach is introduced for driver hypovigilance (fatigue and distraction) detection based on symptoms related to the face and eye regions. In this method, face template matching and the horizontal projection of the top-half segment of the face image are used to extract hypovigilance symptoms from the face and eyes, respectively. Head rotation, a symptom used to detect distraction, is extracted from the face region. The symptoms extracted from the eye region are (1) percentage of eye closure, (2) eyelid distance changes with respect to the normal eyelid distance, and (3) eye closure rate. The first and second eye-region symptoms are used for fatigue detection; the last one is used for distraction detection. In the proposed system, a fuzzy expert system combines the symptoms to estimate the level of driver hypovigilance. The introduced method makes three main contributions: (1) simple and efficient head rotation detection based on face template matching, (2) adaptive symptom extraction from the eye region without explicit eye detection, and (3) normalization and personalization of the extracted symptoms using a short training phase. Together, these contributions yield an adaptive driver eye/face monitoring system. Experiments show that the proposed system is relatively efficient at estimating driver fatigue and distraction.

1. Introduction

Improving public safety and reducing accidents are among the important goals of Intelligent Transportation Systems (ITS). One of the most important factors in accidents, especially on rural roads, is driver fatigue and monotony. Fatigue reduces the driver's perception and decision-making capability to control the vehicle. Research shows that a driver is usually fatigued after 1 hour of driving. In the early afternoon, after lunch, and at midnight, driver fatigue and drowsiness are much greater than at other times. In addition, drinking alcohol, drug addiction, and using hypnotic medicines can lead to loss of consciousness [1, 2].

Different countries report different statistics on accidents caused by driver fatigue and distraction. Generally, driver drowsiness and lack of concentration are the main cause of about 20% of crashes and 30% of fatal crashes. In single-vehicle crashes (accidents in which only one vehicle is damaged) or crashes involving heavy vehicles, up to 50% of accidents are related to driver hypovigilance [1, 3–5]. According to current studies, driver face monitoring systems are expected to reduce the number of crashes by 10%–20% [6].

The driver face monitoring system is a real-time system that assesses the driver's physical and mental condition by processing images of the driver's face. The driver state can be estimated from eye closure, eyelid distance, blinking, gaze direction, yawning, and head rotation. The system raises an alarm in hypovigilance states, including fatigue and distraction. The major parts of the driver face monitoring system are (1) imaging, (2) the hardware platform, and (3) the intelligent software.

A driver face monitoring system faces two main challenges: (1) how to measure fatigue and (2) how to measure concentration.

The first challenge is how to define fatigue exactly and how to measure it. Despite progress in physiology and psychology, there is still no precise definition of fatigue, and consequently there is no measurable criterion or tool for it [3]. Although fatigue has not been precisely defined, it is related to several symptoms, including body temperature, electrical resistance of the skin, eye movement, breathing rate, heart rate, and brain activity [2, 3, 7, 8]. One of the first and most important symptoms of fatigue appears in the eyes. There is a very close relationship between the Psychomotor Vigilance Task (PVT) and the percentage of eyelid closure over time (PERCLOS). PVT measures a person's response speed to a visual stimulus. Therefore, in almost all driver face monitoring systems, eye closure is the first symptom used to measure fatigue.

The second challenge is measuring the driver's attention to the road. Attention can be partly estimated from the driver's head and gaze direction. The main problem is that a driver whose head faces forward and who looks toward the road is not necessarily paying attention to it. In other words, looking toward the road is not the same as paying attention to it [3].

In this paper, a new driver face monitoring system is proposed that adaptively extracts hypovigilance symptoms from the driver's face and eyes. The symptoms are then analyzed by a fuzzy expert system to determine the driver state. The remainder of the paper is organized as follows. Section 2 reviews previous research. Section 3 describes the proposed system in detail. Section 4 presents the experimental results and discussion. Section 5 concludes the paper.

2. Previous Works

Driver face monitoring systems can be divided into two general categories. In the first category, driver fatigue and distraction are detected only by processing the eye region. Much research is based on this approach, mainly because the main symptoms of fatigue and distraction appear in the driver's eyes. Moreover, processing the eye region has lower computational complexity than processing the whole face region. In the second category, the symptoms of fatigue and distraction are detected not only from the eyes but also from other regions of the face and head. In this approach, in addition to processing the eye region, other symptoms such as yawning and head nodding are also extracted.

A driver face monitoring system includes five main parts: (1) face detection, (2) eye detection, (3) face tracking, (4) symptom extraction, and (5) driver state estimation. This section reviews how these parts are implemented in different systems.

In most driver face monitoring systems, face detection is the first image processing operation. Face detection methods can be divided into two general categories [9]: (1) feature-based and (2) learning-based methods.

Feature-based methods assume that the face can be detected by applying heuristic rules to image features. These methods are usually used for detecting a single face in the image. Color-based face detection is one of the fast and common methods; it detects the face based on skin color and face shape. Color-based face detection may be applied in different color spaces, including RGB [10, 11], YCbCr [12], or HSI [13]. These algorithms have low accuracy in noisy images or in images with low illumination.

Learning-based face detection uses statistical learning methods and training samples to learn discriminative features. These methods benefit from statistical models and machine learning algorithms. Generally, learning-based methods have lower error rates for face detection, but they usually have higher computational complexity. Viola and Jones [14] presented an object detection algorithm that is very fast and robust. This algorithm was used in [15–17] for face detection.

Because of the importance of eye-related symptoms, almost all driver face monitoring systems process the eye region to extract symptoms. Eye detection is therefore required before the eye region can be processed. Eye detection methods can be divided into three general categories: (1) methods based on imaging in the infrared spectrum, (2) feature-based methods, and (3) other methods.

One of the fast and relatively accurate approaches to eye detection is based on imaging in the infrared (IR) spectrum. This approach exploits the physiological and optical properties of the eye in the IR spectrum. The pupil reflects IR beams and appears as a bright spot when the angle between the IR source and the imaging device is suitable. The pupil and eye are detected using this property. The systems proposed in [4, 18–20] used this approach for eye detection.

The feature-based eye detection approach includes various methods. Image binarization [5, 21, 22] and projection [23, 24] are two feature-based eye detection methods that assume the eye is darker than the facial skin. Because these methods are simple and have a high error rate, more complicated processing is usually needed to locate the eyes properly.

There are few methods for eye detection based on other approaches which were used in driver face monitoring systems. In [10], a geometrical face model with some feature-based methods was used to detect eyes. In addition, some systems such as [15] used hybrid methods for eye detection. In [15], elliptical gray-level template matching and IR imaging system were used for eye detection in day and night, respectively.

Usually, the entire image is searched to detect the face or eyes, which increases the computational complexity of the system. Therefore, after the initial detection of the face or eyes, face/eye tracking is usually performed in the following frames. Most driver face monitoring systems use the Kalman filter [4, 19, 25] or extended versions of it such as the Unscented Kalman Filter (UKF) [23]. However, some studies used a search window [18] or a particle filter (PF) [26] for tracking.

In driver face monitoring systems, useful symptoms for fatigue and distraction detection can be divided into three general categories: (i) symptoms related to the eye region; (ii) symptoms related to the mouth region; (iii) symptoms related to the head.

The eye is the most important area of the face in which the symptoms of fatigue and distraction appear. Therefore, many driver face monitoring systems detect driver fatigue and distraction based only on symptoms extracted from the eyes. The symptoms related to the eye region include PERCLOS [3, 4, 10, 15], eyelid distance [25, 27], eye blink speed [4, 10], eye blink rate [4, 19], and gaze direction [4].

Yawning is one of the hypovigilance symptoms related to the mouth region. This symptom was extracted by detecting the open mouth in [11, 16]. These systems detect the mouth based on the color features of the lips in the image.

Some fatigue and distraction symptoms are related to head. These symptoms include head nodding [5, 19] and head orientation [4, 10, 19]. Head nodding can be used for fatigue detection, and head orientation can be used for both fatigue and distraction detection. Driver nodding and lack of driver attention to the road can be detected by estimating the angle of head direction.

After symptom extraction, the driver state has to be determined. The determination of the driver state is considered as a classification problem. The simplest method for detecting the driver fatigue or distraction is based on applying a threshold on extracted symptom [22].

Knowledge-based approaches are another way of determining the driver state. In a knowledge-based approach, decisions about driver fatigue and distraction are based on the knowledge of an expert, usually expressed in the form of if-then rules. In [19, 25], fuzzy expert systems were used as the knowledge-based approach for estimating the driver state.

More complicated approaches such as the Bayesian network [4] and the naïve dynamic Bayesian network [26] have also been used for driver state determination. These approaches are usually more accurate than threshold-based and knowledge-based approaches; however, they are more complicated.

3. The Proposed System

The proposed system is a driver face monitoring system that detects driver hypovigilance (both fatigue and distraction) by processing the eye and face regions. The flowchart of the system is shown in Figure 1. After image acquisition, face detection is the first stage of processing. Then, the symptoms of hypovigilance are extracted from the face image. Although no explicit eye detection stage is used to locate the eyes in the face, some important symptoms related to the eye region (the top-half segment of the face) are extracted. Additionally, a template matching method is used to detect head rotation. Finally, a fuzzy expert system estimates driver hypovigilance.

Running the face detection algorithm on every frame is computationally expensive. Therefore, after the face is detected in the first frame, a face tracking algorithm tracks the driver's face in subsequent frames unless the face is lost. An auxiliary variable, denoted by sw in Figure 1, records the face tracking status. If sw is 0, the face is lost, and the face detection algorithm must be run to localize the driver's face. If sw is 1, the face has been tracked successfully by the face tracking method. At system initialization, sw is 0, so the system runs the face detection algorithm on the first frame.
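The switch between detection and tracking described above can be sketched as a small loop. This is only an illustration: `monitor`, `detect_face`, and `track_face` are hypothetical names, and the two callables stand in for whatever detector and tracker are actually used; each is assumed to return a face box or `None` when the face is not found.

```python
def monitor(frames, detect_face, track_face):
    """Yield (frame_index, face_box), switching between detection and tracking.

    detect_face(frame) and track_face(frame, previous_box) are placeholder
    callables (assumptions of this sketch) that return a box or None.
    """
    sw = 0       # 0: face lost, run detection; 1: face is being tracked
    box = None
    for idx, frame in enumerate(frames):
        if sw == 0:
            box = detect_face(frame)       # costly search over the whole frame
        else:
            box = track_face(frame, box)   # cheaper local search
        sw = 1 if box is not None else 0   # fall back to detection when lost
        yield idx, box
```

With this structure the detector runs on the first frame and again only on frames that follow a tracking failure.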

We used the Haar-like features and adaptive boosting method proposed by Viola and Jones [14] for face detection. The face detection algorithm was trained on about 3000 faces and about 300000 nonfaces. For face tracking, a full search is used to find the driver's face image in the new frame. The search region is centered on the face image from the previous frame, and its size scales with the size of the face image (1.5 times the size of the face image). The correlation coefficient between the face image and the subwindows of the search region is used as the matching criterion.
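A minimal sketch of such a full-search tracker is given below, assuming gray-level images stored as lists of rows and a search region 1.5 times the face size centered on the previous face position. The function names and the handling of flat patches are assumptions of this sketch, not details taken from the system itself.

```python
import math

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # flat patch: correlation undefined, treated as no match
    return cov / math.sqrt(va * vb)

def track_face(frame, template, prev_top, prev_left, scale=1.5):
    """Full search around the previous face position.

    Returns (best_score, top, left) of the best-matching subwindow.
    """
    h, w = len(template), len(template[0])
    H, W = len(frame), len(frame[0])
    # A region `scale` times the face size lets the window shift by this margin.
    margin_r = int((scale - 1) * h / 2)
    margin_c = int((scale - 1) * w / 2)
    flat_t = [p for row in template for p in row]
    best = (-2.0, prev_top, prev_left)
    for top in range(max(0, prev_top - margin_r),
                     min(H - h, prev_top + margin_r) + 1):
        for left in range(max(0, prev_left - margin_c),
                          min(W - w, prev_left + margin_c) + 1):
            patch = [frame[top + r][left + c]
                     for r in range(h) for c in range(w)]
            score = corr(flat_t, patch)
            if score > best[0]:
                best = (score, top, left)
    return best
```

Every candidate window costs a full correlation over the face area, which is consistent with tracking dominating the processing time reported in Section 4.3.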

3.1. The Symptom Extraction

In the proposed system, two types of symptoms are extracted: (1) the symptoms related to eye region and (2) the symptom related to face region. The symptoms related to eye region are PERCLOS, eyelid distance changes with respect to the normal eyelid distance (ELDC), and eye closure rate (CLOSNO). The symptom related to face region is head rotation (ROT).

3.1.1. The Symptoms Related to Eye Region

The proposed system uses the horizontal projection of the top-half segment of the face image to extract the symptoms of driver hypovigilance. The method takes a spatiotemporal approach to feature extraction without explicit eye detection; because it is adaptive, it is not very sensitive to illumination, skin color, or glasses. It is based on how the horizontal projection of the top-half segment of the face image changes over time. The horizontal projection $HP$ of an image $I$ with $W$ columns is computed by

$$HP(i) = \sum_{j=1}^{W} I(i, j). \tag{1}$$

The length of the horizontal projection $HP$ is equal to the height of the image $I$. In our proposed system, only the horizontal projection of the top-half segment of the face image is used, so the length of the horizontal projection is half the height of the driver's face image. Before the symptoms related to the eye region can be extracted, the system needs to be trained. Because eyelid behavior differs between individuals, estimating the driver's vigilance level from absolute values does not make a driver face monitoring system robust. Therefore, to develop a robust and adaptive system, the normal values of the vigilance symptoms must be estimated in a training phase. In our proposed method, "training" has a slightly different meaning than in general machine learning systems: it means extracting the normal values of the driver's vigilance symptoms. The training phase is therefore a short period during which we assume that the driver is fully alert and looking forward. In the training phase, the normal values of PERCLOS, CLOSNO, and ELDC are calculated. The normal values of PERCLOS and CLOSNO are referred to as the normal PERCLOS and the normal CLOSNO, respectively. Because the eye is not detected explicitly, the eyelid distance and the normal eyelid distance are estimated implicitly. The eyelid distance is estimated from the horizontal projection of the top-half segment of the face; therefore, the average horizontal projection of the top-half segment is computed during the training phase to estimate the normal eyelid distance.
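The horizontal projection of the top-half segment can be sketched in a few lines, assuming the face image is a gray-level image stored as a list of rows. The function name is an assumption of this sketch.

```python
def horizontal_projection_top_half(face):
    """Row sums over the top half of a gray-level face image (list of rows).

    The result has one entry per row, so its length is half the image height.
    """
    half = len(face) // 2
    return [sum(row) for row in face[:half]]
```

Rows that cross the dark eye region give smaller sums than rows of skin, which is why the shape of this vector changes when the eyelids close.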

Training takes about 1–2 minutes. During the first 100 frames of the training sequence, we assume that the driver's eyes are mostly open, so the horizontal projection of open eyes can be estimated by averaging the horizontal projections of these frames. This open-eye projection is denoted by $HP_O$ and is computed by (2), where $HP_t$ is the horizontal projection of frame $t$ and $n$ is 100:

$$HP_O = \frac{1}{n} \sum_{t=1}^{n} HP_t. \tag{2}$$

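Eye closure is detected by computing the correlation between the horizontal projection of the current frame, $HP_t$, and the open-eye projection $HP_O$. This correlation is denoted by $\rho_t$ and is computed by (3). If $\rho_t$ is larger than a predefined threshold, the eye is open in frame $t$; otherwise, the eye is closed:

$$\rho_t = \frac{\sum_i \left(HP_t(i) - \overline{HP_t}\right)\left(HP_O(i) - \overline{HP_O}\right)}{\sqrt{\sum_i \left(HP_t(i) - \overline{HP_t}\right)^2 \sum_i \left(HP_O(i) - \overline{HP_O}\right)^2}}. \tag{3}$$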
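The open-eye average and the correlation test can be sketched as follows. The threshold value of 0.9 is a hypothetical choice made only for this illustration, and the function names are likewise assumptions.

```python
import math

def mean_projection(projections):
    """Element-wise average of equal-length projection vectors."""
    n = len(projections)
    return [sum(col) / n for col in zip(*projections)]

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # flat vector: correlation undefined, treated as no match
    return cov / math.sqrt(va * vb)

def eye_is_open(hp_t, hp_open, threshold=0.9):
    """True when the current projection matches the open-eye projection.

    threshold=0.9 is a hypothetical value, not one reported for the system.
    """
    return corr(hp_t, hp_open) > threshold
```

In this toy setting, a projection with a pronounced dip at the eye rows correlates strongly with the open-eye average, while a flattened dip (closed eyelids) lowers the correlation below the threshold.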

After $HP_O$ has been computed as the horizontal projection of open eyes, a copy of it is stored as $HP_N$. $HP_O$ is updated as new frames are acquired using the fuzzy running average method [28], while $HP_N$ is not updated. In the fuzzy running average method, the update depends on the matching degree (correlation coefficient) $\rho_t$ of $HP_t$ and $HP_O$. The fuzzy running average is given by (4), where $\alpha$ is the weighting factor, which is calculated from $\rho_t$ as given in (5):

$$HP_O \leftarrow \alpha\, HP_O + (1 - \alpha)\, HP_t. \tag{4}$$

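In (5), $\alpha_{\min}$ is a constant (0.8 in our system) and represents the minimum value of $\alpha$. According to (5), $\alpha$ varies in the range $[\alpha_{\min}, 1]$. A higher $\alpha$ updates $HP_O$ more slowly. Therefore, $HP_O$ is updated during driving based on the changes of the current horizontal projection $HP_t$.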
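The running-average update can be sketched as below. The mapping from the correlation to the weighting factor is not recoverable from the text here, so `update_weight` is a hypothetical linear stand-in chosen only so that the weight stays between the minimum value 0.8 and 1; the assumption that a better match gives a faster update is also ours.

```python
def update_weight(rho, alpha_min=0.8):
    """Hypothetical mapping from correlation to the weighting factor.

    Assumption of this sketch: a good match (rho near 1) gives alpha_min
    (fastest update) and a poor match gives 1 (no update).
    """
    match = min(max(rho, 0.0), 1.0)          # clamp matching degree to [0, 1]
    return 1.0 - (1.0 - alpha_min) * match

def fuzzy_running_average(hp_open, hp_t, rho, alpha_min=0.8):
    """Blend the current projection into the open-eye projection."""
    alpha = update_weight(rho, alpha_min)
    return [alpha * o + (1.0 - alpha) * c for o, c in zip(hp_open, hp_t)]
```

Under this stand-in, frames with closed eyes (low correlation) leave the open-eye projection almost unchanged, while well-matching frames let it drift slowly with illumination or posture.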

The eye closure state is saved in a circular list $L$. If the eye is open, the current element of $L$ is set to 1; otherwise, it is set to 0. When $L$ is full, the oldest entry is replaced by the new one. The length of $L$, denoted by $N$, must be equal to the number of training frames (about 1500–3000). $L$ is used for computing PERCLOS and CLOSNO, whereas ELDC is computed using the correlation between the current horizontal projection and the training-phase projection $HP_N$, which is not updated. $HP_N$ implicitly represents the driver's eyelid distance in the normal state.

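PERCLOS is the proportion of eye closure during the last $N$ frames. With $L$ the circular list of eye states (1 for open, 0 for closed), it is computed by

$$\mathrm{PERCLOS} = 1 - \frac{1}{N} \sum_{i=1}^{N} L(i). \tag{6}$$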

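CLOSNO is the eye blink rate (frequency) in a given duration. If $D$ is the first difference of the eye-state list $L$, CLOSNO can be computed from $D$. According to (7), $D$ marks the start and stop frames of eye closure events with +1 and −1, respectively, and all other elements of $D$ are zero. CLOSNO is therefore computed by (8), which counts the closure events:

$$D(i) = L(i-1) - L(i), \tag{7}$$

$$\mathrm{CLOSNO} = \sum_{i} \max\left(D(i), 0\right). \tag{8}$$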
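Both quantities can be derived from a fixed-length history of eye states, as in the sketch below. The class and method names are assumptions, and `collections.deque` with `maxlen` is used here as one convenient way to get circular-list behavior.

```python
from collections import deque

class EyeStateHistory:
    """Fixed-length history of eye states (1 = open, 0 = closed)."""

    def __init__(self, length):
        # With maxlen set, appending to a full deque drops the oldest entry.
        self.states = deque(maxlen=length)

    def push(self, eye_open):
        self.states.append(1 if eye_open else 0)

    def perclos(self):
        """Fraction of stored frames in which the eye was closed."""
        if not self.states:
            return 0.0
        return 1.0 - sum(self.states) / len(self.states)

    def closno(self):
        """Number of closure events, counted as open-to-closed transitions."""
        s = list(self.states)
        return sum(1 for prev, cur in zip(s, s[1:]) if prev == 1 and cur == 0)
```

Dividing the count by the duration covered by the history gives a rate, such as closures per minute.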

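ELDC is computed from the correlation between the current horizontal projection of open eyes, $HP_O$, and the horizontal projection of open eyes from the training phase, $HP_N$, according to

$$\mathrm{ELDC} = S\left(\rho(HP_O, HP_N);\, a, c\right). \tag{9}$$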

In (9), $S$ is the sigmoid function, and $a$ and $c$ are its parameters, which set the slope and displacement of the sigmoid, respectively. The general form of the sigmoid function is

$$S(x; a, c) = \frac{1}{1 + e^{-a(x - c)}}. \tag{10}$$

In the proposed system, $a$ and $c$ are fixed constants. Because the sigmoid function takes values between 0 and 1, ELDC always lies between 0 and 1. If ELDC is near zero, the eyelid distance is normal; as ELDC approaches one, the eyelid distance approaches zero (the eye is closed).
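The sigmoid mapping can be sketched as follows. The slope and displacement used as defaults (`a=-20.0`, `c=0.8`) are hypothetical values chosen only for illustration; a negative slope is assumed so that a high correlation with the training-phase projection gives an ELDC near zero.

```python
import math

def sigmoid(x, a, c):
    """General sigmoid with slope a and displacement c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

def eldc(rho, a=-20.0, c=0.8):
    """Map a correlation value to an eyelid-distance-change score in (0, 1).

    a and c are hypothetical; the negative slope makes the score rise
    as the correlation with the normal projection falls.
    """
    return sigmoid(rho, a, c)
```

With these illustrative parameters, a correlation of 1.0 maps to a score below 0.05, and a correlation of 0.5 maps to a score above 0.95.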

3.1.2. The Symptom Related to Face Region

Head rotation is a symptom of distraction that the proposed system extracts from the face region. Head rotation is estimated from the changes of the face image with respect to a frontal face template. To compute the frontal face template, we assume that the driver's face is frontal during the first 100 frames; the average face image over these frames is used as the template. Then, the absolute difference between the face image in the current frame and the frontal face template is computed, and the head rotation ROT is estimated from this difference by (11).

ROT varies between 0 and 1. When the normalized difference is near zero, ROT is near zero too, and when it is near one, ROT is near 1. A greater value indicates more head rotation. The proposed method for head rotation estimation cannot determine the angle of rotation.
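One way to obtain a score with this behavior is sketched below. The exact estimator is not recoverable from the text here, so the normalization used (mean absolute difference divided by the maximum gray level) is purely an assumption of this sketch, as are the function names and the requirement that face and template have the same size.

```python
def average_image(frames):
    """Pixel-wise average of equally sized gray-level images (lists of rows)."""
    n = len(frames)
    return [[sum(px) / n for px in zip(*rows)] for rows in zip(*frames)]

def rot_estimate(face, template, max_level=255.0):
    """Mean absolute difference to the frontal template, scaled to [0, 1].

    This normalization is an assumed stand-in, not the system's own formula.
    """
    total = sum(abs(p - t)
                for face_row, tmpl_row in zip(face, template)
                for p, t in zip(face_row, tmpl_row))
    count = len(face) * len(face[0])
    return total / (count * max_level)
```

A score of this kind grows with any deviation from the frontal template, which matches the remark that the rotation angle itself cannot be determined.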

3.2. Fatigue and Distraction Detection

In the proposed system, driver fatigue and distraction are estimated using a fuzzy expert system (Figure 2). A fuzzy expert system is an expert system that uses fuzzy logic instead of Boolean logic. In other words, it is a collection of membership functions, an inference engine, and rules that are used to reason about inputs and generate proper outputs. First, the fuzzy expert system fuzzifies the crisp inputs using predefined membership functions to generate fuzzy inputs. The fuzzy inputs are then processed by the inference engine, in which the truth value of each rule in the rule base is computed using a fuzzy implication method (usually the Mamdani or Larsen method) and applied to the conclusion part of the rule. For each rule, the result is assigned to the corresponding output variable as a fuzzy subset. All of the fuzzy subsets assigned to an output variable are then combined to form a single fuzzy subset for that variable. Finally, the fuzzy subset of each output variable is defuzzified to generate the crisp output.

The proposed fuzzy expert system processes four inputs and generates two outputs. The inputs are (1) PERCLOS, (2) ELDC, (3) CLOSNO, and (4) ROT, and the outputs are (1) fatigue estimation and (2) distraction estimation. To build the fuzzy expert system, the Mamdani fuzzy inference method (also called the min-max method) is applied to a set of fuzzy rules. The fuzzy rules are shown in Tables 1 and 2. These rules were defined by an expert; they are not very complicated and are easy to understand.

The fuzzy membership functions of the inputs are depicted in Figures 3–6. According to Figures 3 and 4, the membership functions of PERCLOS and CLOSNO are defined relative to the normal PERCLOS and the normal CLOSNO, respectively. Additionally, ELDC and ROT are normalized during their computation and always vary between 0 and 1 (Figures 5 and 6). Therefore, the membership functions defined for the inputs are fully adaptive and normalized. The membership functions for the outputs are singletons and are depicted in Figures 7 and 8. Each membership function has 3 fuzzy subsets. A larger number of fuzzy subsets would require more rules in the rule base and make the system more complicated, whereas a smaller number would decrease the accuracy of driver state estimation.

The defuzzification method used in the proposed system is center of gravity (COG), a widely used defuzzification method.
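The inference chain described in this section (fuzzification, min-max rule evaluation, and center-of-gravity defuzzification over singleton outputs) can be sketched for two inputs and one output. Everything specific in the sketch is hypothetical: the membership breakpoints, the singleton positions, and the rule base (which simply takes the worse of the two input levels) are illustrative and do not reproduce Tables 1 and 2 or Figures 3–8. Inputs are assumed to be already scaled to the range 0 to 1.

```python
LEVELS = ["low", "medium", "high"]
SINGLETON = {"low": 0.0, "medium": 0.5, "high": 1.0}  # illustrative positions

def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    """Degrees of membership in three fuzzy subsets (illustrative shapes)."""
    return {
        "low": max(0.0, min(1.0, (0.5 - x) / 0.5)),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": max(0.0, min(1.0, (x - 0.5) / 0.5)),
    }

# Hypothetical rule base: the output level is the worse of the two inputs.
RULES = {
    (p, e): LEVELS[max(LEVELS.index(p), LEVELS.index(e))]
    for p in LEVELS for e in LEVELS
}

def estimate_fatigue(perclos, eldc):
    """Min-max inference with center-of-gravity over singleton outputs."""
    mp, me = fuzzify(perclos), fuzzify(eldc)
    strength = {level: 0.0 for level in LEVELS}
    for (p, e), out in RULES.items():
        fire = min(mp[p], me[e])                  # AND of antecedents: min
        strength[out] = max(strength[out], fire)  # aggregation per output: max
    total = sum(strength.values())
    if total == 0:
        return 0.0
    # With singleton outputs, COG reduces to a weighted average of positions.
    return sum(SINGLETON[l] * s for l, s in strength.items()) / total
```

Because the outputs are singletons, the center of gravity needs no numerical integration, which keeps this stage cheap compared with symptom extraction.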

4. Experimental Results

The proposed system was tested on 27 sequences which lasted about 76 minutes. The sequences were captured in both laboratory conditions (indoor) and real conditions (in vehicle) from 5 different individuals using a digital camera.

Because no tool exists for measuring fatigue and distraction, the proposed system cannot be evaluated objectively in a direct way. In this section, the proposed symptom extraction methods are evaluated first, and then an example sequence is examined to evaluate the system subjectively.

4.1. Experiments on Symptom Extraction

The accuracy of PERCLOS and CLOSNO depends directly on the accuracy of the eye closure detection algorithm, so we evaluate that algorithm in this section. The evaluation is based on two criteria: false positive rate (FPR) and false negative rate (FNR). A false positive occurs when the eye is open but the system detects it as closed. A false negative occurs when the eye is closed but the system detects it as open. Table 3 shows the FPR and FNR of the proposed eye closure detection algorithm in different states.
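Given per-frame ground truth and detections, the two error rates can be computed as in the sketch below. The function name is an assumption, and so are the denominators: the sketch uses the common convention of dividing false positives by the number of truly open frames and false negatives by the number of truly closed frames, which may differ from the convention behind Table 3.

```python
def closure_error_rates(truth_closed, detected_closed):
    """Return (FPR, FNR) for per-frame eye closure detection.

    Both arguments are equal-length sequences of booleans, True meaning
    the eye is closed in that frame.
    """
    pairs = list(zip(truth_closed, detected_closed))
    false_pos = sum(1 for t, d in pairs if not t and d)   # open, called closed
    false_neg = sum(1 for t, d in pairs if t and not d)   # closed, called open
    open_frames = sum(1 for t, _ in pairs if not t)
    closed_frames = sum(1 for t, _ in pairs if t)
    fpr = false_pos / open_frames if open_frames else 0.0
    fnr = false_neg / closed_frames if closed_frames else 0.0
    return fpr, fnr
```

For example, one false alarm among four open frames and one miss among two closed frames give an FPR of 0.25 and an FNR of 0.5.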

According to Table 3, the FNR of eye closure detection in the drowsy state without glasses is greater than in the normal state without glasses. In the drowsy state, the eyelid distance is reduced and blinking is slow, so the horizontal projection changes slowly across consecutive frames. As a result, many eye closure events are not detected, and the FNR is greater than in the normal state. The FPR of eye closure detection in the drowsy state, however, is very low compared with the normal state.

According to Table 3, both the FPR and the FNR of eye closure detection in the normal state with glasses are greater than in the normal state without glasses. With glasses, reflections may appear in the image as bright spots near the eye. These make changes in the horizontal projection of the top-half segment of the face harder to detect, so eye closure detection has a higher error rate.

To investigate the accuracy of ELDC, we tested our method on a 9-minute-long sequence. Figure 9 shows four sample frames of this sequence, in which the driver becomes drowsy after 7 minutes. Figure 10 shows the measured ELDC for this sequence. According to Figure 10, ELDC indicates the driver's drowsiness correctly.

The accuracy of the proposed head rotation detection method is investigated by applying a threshold to ROT: if ROT exceeds 0.3, head rotation is detected. In this experiment, an FPR of 9.2% and an FNR of 12.1% were achieved for head rotation detection. Figure 11 shows sample frames from a 2-minute-long sequence in which the driver rotated his head in different directions. In this figure, image (a) shows the driver's face without any rotation, and the other images show the driver's head rotated in different directions. The result of head rotation detection by the proposed method for this video sequence is depicted in Figure 12.

4.2. Experiments on Driver State Estimation

Evaluation of driver state estimation is a difficult task because there is not any criterion for measurement of fatigue and distraction. Therefore, objective evaluation is not possible for driver state estimation.

In this section, the symptoms extracted from a sample sequence are plotted, and the fatigue and distraction levels in the sequence are estimated by the proposed system. A ten-minute-long sequence is used in this experiment, and its first minute is used for training. According to the training phase, the normal PERCLOS is 0.02 and the normal CLOSNO is 13 times per minute. The curves of PERCLOS, CLOSNO, ELDC, and ROT for this sequence are plotted in Figures 13, 14, 15, and 16.

The estimated levels of fatigue and distraction are shown in Figures 17 and 18. According to Figure 17, the driver was semidistracted at about the 3rd minute. The estimated level of distraction appears correct, because CLOSNO decreased with respect to the normal CLOSNO during this time. In addition, the driver became drowsy after 7 minutes. The drowsiness state was estimated based on two symptoms: (1) an increase in PERCLOS from the 7th to the 8th minute and (2) an increase in ELDC after 8 minutes. These symptoms are depicted in Figures 13 and 15.

4.3. The Processing Speed

The proposed method was implemented in MATLAB R2008a and was tested on a personal computer with a 2.66 GHz Intel Core 2 Duo processor and 2 GB of RAM. The processing speed of the proposed method is more than 5 frames per second. Over 85% of the computational cost of the system is due to face tracking.

4.4. Comparison with Other Methods

In this section, we compare our system with previous systems. Unfortunately, we cannot compare the accuracy of different driver state estimation algorithms, because there is no scientific and precise criterion for measuring fatigue and distraction. Therefore, we only compare the accuracy of the different systems for symptom extraction.

For eye closure detection, the proposed algorithm is compared with the algorithms presented in [10, 19, 21]. The results of the comparison are shown in Table 4. This table shows that our proposed method performs well in comparison with the other methods, even though our experimental setup is more realistic and we used longer video sequences in our experiments.

For head rotation detection, the proposed method is compared with the algorithm presented in [19]. Unfortunately, the accuracy of other methods for head rotation detection was not reported. For example, accuracy of the methods presented in [4, 10] was not reported. In these papers, only the ability of system to measure head rotation in different direction and in a specific interval was reported.

Table 5 shows the comparison result of the proposed method and the method presented in [19]. The comparison result shows that our method achieves higher precision rate.

5. Conclusions

In this paper, a new adaptive method for symptom extraction and driver state estimation was proposed for driver hypovigilance detection. Two types of symptoms were considered: symptoms related to the eye region (PERCLOS, ELDC, and CLOSNO) and a symptom related to the face region (ROT). The proposed method extracts the symptoms related to the eye region using the horizontal projection of the top-half segment of the face without explicit eye detection; the symptom related to the face region is extracted using face template matching. The normal values of the extracted symptoms are calculated during a short training phase. Based on these normal values, an adaptive fuzzy expert system estimates the level of fatigue and distraction.

The short training phase makes the system robust and adaptive. In other words, the proposed system may be used efficiently for different individuals with different face and eyelid behaviors. Experiments show that the proposed method extracts the symptoms of driver fatigue and distraction accurately. Additionally, subjective evaluation indicates that the system estimates driver fatigue and distraction well.

The proposed method was tested on video sequences captured in the visible spectrum, but color information was not used in any part of the system. In other words, the proposed system operates on gray-level images in the visible spectrum. Therefore, the system may operate in the IR spectrum with a few changes. The main disadvantage of our system is its face tracking method, which is inaccurate and computationally very expensive. Adaptive filters such as the Kalman filter may reduce the complexity and increase the processing speed and accuracy of the system.