Abstract

This study proposes a method based on Dempster-Shafer theory (DST) and a fuzzy neural network (FNN) to improve the reliability of fatigue driving recognition. The method evaluates driving states by fusing multiple features. First, because there is no general method for defining the basic probability assignment (BPA) function, an FNN is used to obtain the BPA of each piece of evidence. Second, a modified algorithm that revises conflicting evidence is proposed to reduce unreasonable fusion results when unreliable information is present. Finally, the recognition result is obtained by combining the revised evidence with Dempster’s rule. Experimental results show that, by combining the information provided by multiple features, the proposed method obtains reasonable recognition results and describes driving states effectively and accurately.

1. Introduction

Fatigue is a common physiological phenomenon that reduces a driver’s attention and ability to control the vehicle. Fatigue driving is one of the major causes of road accidents and poses a significant threat to the safety of drivers and passengers. Common methods for detecting fatigue driving include measurements of physiological features, facial features, and features of driving behavior [1].

Physiological signals related to fatigue include the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), and heart rate variability (HRV) [2–5]. Fatigue recognition based on physiological signals is highly accurate, but the approach has limitations; in particular, acquiring these signals is intrusive and makes drivers uncomfortable. Noncontact measurements, such as facial features and driving behavior features, are practical and do not interfere with normal driving. Existing studies have shown that fatigued people exhibit visible changes in facial features, such as eye closure, shifts in gaze direction, yawning, and head movements [6–11]. Fatigue may also be reflected in driving behavior features, such as lane departure and steering wheel movements [12–14]. However, facial features are not always reliable indicators of driving state, because their extraction may be affected by variations in illumination and driver posture, so image processing algorithms cannot guarantee accurate recognition results. The reliability of methods based on driving behavior depends on road and weather conditions, vehicle type, and driving habits. Accurate evaluation of driving states is therefore difficult to achieve with a single feature in a complex environment. Models based on information fusion have been developed to increase the accuracy of fatigue recognition [15–17]. Lee and Chung [15] proposed a dynamic fatigue monitoring system based on a Bayesian network, and Deng et al. [17] proposed a fatigue monitoring method based on Dempster-Shafer theory (DST). However, the prior probabilities required by a Bayesian network are obtained from experts’ subjective experience, and the definition of the basic probability assignment (BPA) function in DST has no standard solution because fatigue features are nonlinear.

DST [18, 19] is a generalization of Bayesian inference and an effective method for handling imprecise and uncertain information; it has been widely used in information fusion. However, Dempster’s rule of combination often produces unreasonable fusion results when pieces of evidence conflict [20]. Generally speaking, highly conflicting evidence has two main causes [21]. The first is unreliable information caused by environmental disturbances or instrument errors. Many improved methods have been proposed to address this problem, most of which either modify the combination rule or revise the original evidence. Modified combination rules include Yager’s rule [22], Qian et al.’s rule [23], Lefevre et al.’s rule [24], and Dezert-Smarandache theory (DSmT) [25]. Methods that revise the original evidence include the weighted average strategy [26, 27] and the discount strategy [28–31]. Jiang et al. [32] used Z-numbers [33] to evaluate the fuzziness and reliability of uncertain sensor data in fusion. The second cause is an open-world environment, in which the frame of discernment is incomplete because of a lack of knowledge [34]. Traditional DST assumes a closed world, in which the frame of discernment contains all possible elements. Deng [21] proposed generalized evidence theory (GET) to handle uncertain information fusion in the open world, and Jiang et al. [35] measured the weight of evidence based on Deng entropy [36] to handle conflict.

This study proposes a method based on DST and a fuzzy neural network (FNN) [37] to recognize fatigue driving. The method uses facial features, extracted by cameras mounted in the vehicle, as fatigue parameters. Given the self-learning and self-adaptive abilities of FNN [38], the FNN is used to obtain the BPA of each piece of evidence. In the driving environment, conflicting information is mainly caused by environmental interference and measurement errors. To address this issue, a modified algorithm that takes the credibility of each piece of evidence into account is introduced. This algorithm revises the original evidence based on the discount strategy to improve the reliability of the recognition result.

The remainder of this paper is organized as follows. Section 2 summarizes the measurement of fatigue features. Section 3 proposes a recognition method based on multifeature fusion. Section 4 presents the experiments and results, and Section 5 gives the conclusions.

2. Measurements of Fatigue Features

2.1. States Evaluation Criteria

A driver’s mental state changes continuously. Driver fatigue is characterized by various facial features, such as drooping eyelids, eye closure, changes in gaze direction, yawning, and changes in head posture. A camera placed on the dashboard in front of the driver captures video of the driver’s facial expressions. Frames from the video showing several expressions are shown in Figure 1.

Kimura et al. [39] analyzed the feasibility of judging the degree of fatigue from facial expressions and the consistency between facial feature analysis and ratings of facial expression videos. In this paper, driver states are quantified into three levels with corresponding scores: Awake (0 points), Fatigue (1 point), and Severe fatigue (2 points). The evaluation criteria for driver states are shown in Table 1.

In the vehicle interior, extracting features accurately and quickly is difficult because of the limitations of image processing algorithms and environmental interference. This study therefore considers visual features that can be extracted and measured reliably, such as eye movement and mouth movement.

2.2. Facial Features Extraction

To obtain eye movement and mouth movement features, the eye and mouth regions must first be located. In this paper, the Constrained Local Model (CLM) algorithm [40] is adopted to localize facial landmarks. The facial feature points detected in several frames are shown in Figure 2.

The eye and mouth regions can be extracted from the coordinates of the feature points, as shown in Figure 3. By analyzing the relations between feature points, several state evaluation indexes can be obtained.

For fatigue detection based on eye features, we mainly analyze eyelid and iris movement. Eyelid movement, reflected by the opening and closing of the eyes, is one of the most relevant and most visually significant fatigue features. It is represented by the change in distance between the upper and lower eyelids over a period of time, as shown in Figure 4.

As shown in Figure 4, the eyelid distance is observed over a time window, and $L$ is the average distance between the eyelids. Following PERCLOS (Percentage of Eyelid Closure) [41], approximately $0.7L$ is taken as the threshold for deciding whether the eye is open; this threshold is drawn as the broken line in the figure. The number of blinks and the duration of each eye closure within a given time window can then be obtained. PERCLOS and MCD (Maximum Closing Duration) are chosen as the eyelid movement features for measuring fatigue: PERCLOS is the proportion of the time window during which the eye is closed, that is, the sum of all closing durations divided by the window width, and MCD is the longest single closing duration within the window.
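The eyelid features can be sketched in Python as follows. This is a minimal illustration, not the authors’ implementation: it assumes per-frame eyelid distances for one time window, treats a frame as closed when its distance falls below 0.7 times the window’s mean distance, and computes PERCLOS as the closed fraction of the window and MCD as the longest closed run in seconds. The function name `eyelid_features` and its parameters are hypothetical.

```python
from typing import List, Tuple


def eyelid_features(distances: List[float], fps: float,
                    ratio: float = 0.7) -> Tuple[float, float, int]:
    """Return (PERCLOS, MCD in seconds, blink count) for one time window.

    Assumption: a frame counts as "closed" when its eyelid distance is
    below ratio * L, where L is the mean eyelid distance in the window.
    """
    if not distances or fps <= 0:
        raise ValueError("need a non-empty window and a positive frame rate")
    mean_distance = sum(distances) / len(distances)
    threshold = ratio * mean_distance

    # Lengths (in frames) of consecutive closed runs; each run is one blink.
    runs: List[int] = []
    current = 0
    for d in distances:
        if d < threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # window ends while the eye is still closed
        runs.append(current)

    perclos = sum(runs) / len(distances)   # fraction of frames closed
    mcd = max(runs, default=0) / fps       # longest closure in seconds
    return perclos, mcd, len(runs)


# Example: 10 frames at 10 fps with two closures (2 and 3 frames long).
print(eyelid_features([10, 10, 10, 2, 2, 10, 10, 2, 2, 2], fps=10))
# → (0.5, 0.3, 2)
```

Counting runs rather than individual frames lets the same pass yield the blink count and the MCD.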

Iris movement is mainly reflected in changes in the driver’s gaze direction. To obtain iris movement features, the Canny edge detector [42] and the Hough transform [43] are used to locate the iris region and its center, as shown in Figure 5.

Figure 5 marks the left and right corners of the eye and the center of the iris at one time instant, together with the center of the iris at another time instant. Within a time window of given width, the pupil rest time and the total number of frames are counted. The features extracted from iris movement are AAI (Average Asymmetry of Iris) and PRPT (Percentage of Pupil Rest Time). Both are computed from the distances between these points and a threshold on iris movement.
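Because the AAI and PRPT formulas are not reproduced in this text, the following is only a hedged sketch under stated assumptions. It assumes AAI is the per-frame asymmetry |d(P, L) − d(P, R)| / d(L, R) of the iris center P relative to the eye corners L and R, averaged over the window, and that PRPT is the fraction of consecutive frame pairs in which the iris center moves less than the movement threshold. The name `iris_features` and its arguments are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def iris_features(left: List[Point], right: List[Point], iris: List[Point],
                  move_threshold: float) -> Tuple[float, float]:
    """Return (AAI, PRPT) for one time window (assumed definitions)."""
    n = len(iris)
    if n < 2 or len(left) != n or len(right) != n:
        raise ValueError("need at least two frames and matching point lists")

    # Assumed asymmetry: imbalance of the iris-to-corner distances,
    # normalized by the eye width so it does not depend on image scale.
    asymmetry = [abs(math.dist(p, l) - math.dist(p, r)) / math.dist(l, r)
                 for l, r, p in zip(left, right, iris)]
    aai = sum(asymmetry) / n

    # Assumed rest criterion: the iris center moves less than the
    # threshold between consecutive frames.
    rest = sum(1 for a, b in zip(iris, iris[1:])
               if math.dist(a, b) < move_threshold)
    prpt = rest / (n - 1)
    return aai, prpt


left = [(0.0, 0.0)] * 3
right = [(10.0, 0.0)] * 3
iris = [(5.0, 0.0), (5.0, 0.0), (8.0, 0.0)]
print(iris_features(left, right, iris, move_threshold=1.0))
```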

Yawning is another of the most visually significant fatigue features. The width and height of the mouth can be calculated from the feature point coordinates in the mouth region, as shown in Figure 6, and their aspect ratio is used to measure mouth opening.

Figure 7 shows mouth movement as reflected by changes in mouth opening within a time window. The mouth is regarded as open when the aspect ratio meets the opening threshold. A yawn is regarded as an episode in which the mouth stays open for more than three seconds. Within each time window, the number of yawns and the duration of each mouth opening are recorded.

The fatigue indexes extracted from Figure 7 are YF (Yawning Frequency) and AOT (Average Opening Time). YF is the number of yawns per unit time within the window, and AOT is the mean duration of the mouth openings in the window.
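A minimal sketch of the mouth features, assuming per-frame mouth aspect ratios (height divided by width) for one time window. It treats the mouth as open when the ratio exceeds an assumed threshold, counts a yawn when an opening lasts more than three seconds (as stated above), reports YF in yawns per minute, and reports AOT as the mean duration of all openings in the window. The per-minute unit, the threshold, and the name `mouth_features` are assumptions for illustration.

```python
from typing import List, Tuple


def mouth_features(ratios: List[float], fps: float, open_threshold: float,
                   yawn_seconds: float = 3.0) -> Tuple[float, float]:
    """Return (YF in yawns per minute, AOT in seconds) for one window."""
    if not ratios or fps <= 0:
        raise ValueError("need a non-empty window and a positive frame rate")

    # Durations (seconds) of consecutive runs with the mouth open.
    durations: List[float] = []
    run = 0
    for r in ratios + [float("-inf")]:  # sentinel closes a trailing run
        if r > open_threshold:
            run += 1
        elif run:
            durations.append(run / fps)
            run = 0

    yawns = sum(1 for d in durations if d > yawn_seconds)
    window_minutes = len(ratios) / fps / 60.0
    yf = yawns / window_minutes
    aot = sum(durations) / len(durations) if durations else 0.0
    return yf, aot


# 10 s at 1 fps: one 4 s opening (a yawn) and one 1 s opening.
print(mouth_features([0.2, 0.8, 0.8, 0.8, 0.8, 0.2, 0.8, 0.2, 0.2, 0.2],
                     fps=1.0, open_threshold=0.5))
```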

3. Framework of Multifeature Fusion Recognition

The recognition method proposed in this paper combines the advantages of DST and FNN. A single FNN is divided into several subnetworks, each with nonlinear mapping capability, which are trained to process the information from different features and give a preliminary evaluation of driving states. The BPA function is defined by normalizing the network outputs. The fusion of highly conflicting evidence is handled by revising conflicting evidence with a discount strategy: the degree of conflict is measured by the correlation coefficient of evidence, from which the credibility of each piece of evidence is calculated and used as its discount factor. The revised evidence is then fused with DST. Figure 8 shows the framework of multifeature fusion fatigue recognition.

3.1. DST

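Let $\Theta$ be a finite set of mutually exclusive and exhaustive propositions, known as the frame of discernment. The power set $2^{\Theta}$ is the set of all subsets of $\Theta$. A BPA is a function $m: 2^{\Theta} \to [0, 1]$ that satisfies the following conditions:

$$m(\varnothing) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1. \tag{4}$$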

The value $m(A)$ represents the degree of evidential support for exactly the set $A$. If $m(A) > 0$, the subset $A$ is called a focal element.

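Let $m_1$ and $m_2$ be two BPAs defined on the frame $\Theta$ and derived from two independent sources, with focal elements $A_i$ and $B_j$, respectively. The combined BPA $m = m_1 \oplus m_2$ is calculated with Dempster’s rule of combination:

$$m(A) = \begin{cases} \dfrac{1}{1-K} \displaystyle\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j), & A \neq \varnothing, \\ 0, & A = \varnothing, \end{cases} \tag{5}$$

where the conflicting factor $K = \sum_{A_i \cap B_j = \varnothing} m_1(A_i)\, m_2(B_j)$ measures the degree of conflict between $m_1$ and $m_2$, and $1 - K$ is the normalization factor, which prevents nonzero mass from being assigned to the empty set.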
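Dempster’s rule and the conflicting factor can be sketched in Python as follows, representing each focal element as a `frozenset` of state names. This is a generic illustration of the rule as stated above, and the function name `dempster_combine` is ours. The example reproduces the behavior that motivates the revision of evidence: two sources that each give only 0.1 to “Fatigue” combine to full belief in “Fatigue,” with a conflicting factor close to 0.99.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

BPA = Dict[FrozenSet[str], float]


def dempster_combine(m1: BPA, m2: BPA) -> Tuple[BPA, float]:
    """Combine two BPAs with Dempster's rule; return (combined BPA, K)."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("K = 1: the evidence is totally conflicting")
    # Normalize by 1 - K so that no mass is assigned to the empty set.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}, conflict


A, F, S = frozenset({"Awake"}), frozenset({"Fatigue"}), frozenset({"Severe"})
m, K = dempster_combine({A: 0.9, F: 0.1}, {F: 0.1, S: 0.9})
print(K, m)
```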

3.2. Determination of BPA

In multifeature fusion recognition, each feature is treated as a piece of evidence. The pieces of evidence are combined according to Dempster’s rule, and the resulting new evidence is the basis for recognition. In practical applications, the BPA function is defined according to the characteristics of the data. Common approaches include empirical formulas, neural network methods, fuzzy set methods, and gray correlation analysis [44–47]. Because neural networks can generalize, after learning from a sufficient number of training samples they can play the role of experts in determining BPAs. In this study, BPAs are constructed by normalizing the output of the FNN. The BPA function is defined as follows:

$$m(A_k) = \frac{y_k}{\sum_{j=1}^{3} y_j}, \qquad k = 1, 2, 3, \tag{6}$$

where $A_k$ stands for a driving state and $y_k$ is the output of the $k$th output node.

3.2.1. Structure of FNN

To obtain driving states from the eyelid movement, iris movement, and mouth movement features, three FNNs with two inputs and three outputs are designed. As shown in Table 1, driving states are classified into three levels, Awake, Fatigue, and Severe fatigue, and the three FNN outputs correspond to these levels. The inputs are PERCLOS and MCD for the eyelid movement FNN, AAI and PRPT for the iris movement FNN, and YF and AOT for the mouth movement FNN. The structure of the FNN is shown in Figure 9. When a state is identified, the corresponding output node is set to 1; otherwise, it is set to 0. The BPA of each piece of evidence is then obtained by normalizing the FNN outputs over a period of time with formula (6).
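Formula (6) can be sketched as follows, assuming the FNN emits one 0/1 output vector (Awake, Fatigue, Severe fatigue) per prediction and that the outputs collected over the evaluation period are summed per node before normalization. The function name `bpa_from_outputs` is hypothetical.

```python
from typing import Dict, Sequence

STATES = ("Awake", "Fatigue", "Severe fatigue")


def bpa_from_outputs(outputs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Normalize accumulated FNN outputs into a BPA over the three states."""
    if any(len(y) != len(STATES) for y in outputs):
        raise ValueError("each output vector must have one value per state")
    totals = [sum(node) for node in zip(*outputs)]  # per-node sums
    grand_total = sum(totals)
    if grand_total <= 0:
        raise ValueError("outputs must contain at least one positive value")
    return {state: t / grand_total for state, t in zip(STATES, totals)}


# Four predictions within one period: three "Awake", one "Fatigue".
print(bpa_from_outputs([[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]))
# → {'Awake': 0.75, 'Fatigue': 0.25, 'Severe fatigue': 0.0}
```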

The FNN is composed of five layers: the input, fuzzy, fuzzy rules, normalized, and output layers.

The first layer is the input layer. Each node in this layer represents one input variable.

The second layer is the fuzzy layer. Each node represents one value of a linguistic variable and computes the membership degree of an input variable in the fuzzy set corresponding to that value. All the features defined in Section 2 increase as fatigue accumulates, so each input is described by three linguistic values: small, medium, and large. The membership functions of these linguistic values are assigned as follows:

The membership function of the linguistic value medium is a Gaussian function whose parameters are the center and width of the corresponding fuzzy set of each input variable. The membership functions of small and large are sigmoid functions, in which one parameter shifts the curve along the horizontal axis and another adjusts its shape. The number of nodes in this layer equals the total number of fuzzy partitions over all input variables, that is, 2 × 3 = 6 for each FNN with two inputs and three linguistic values.

The outputs of these functions take values between 0 and 1. The membership function curves given by formula (7) are shown in Figure 10.
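Formula (7) is not reproduced in this text, so the following sketch uses assumed standard forms that match the description of the membership (subjection) functions above: a Gaussian exp(−((x − c)/σ)²) for medium and logistic sigmoids 1/(1 + exp(∓b(x − a))) for small and large, where a shifts the curve along the horizontal axis and b sets its steepness. The parameter values in the example are hypothetical, not trained values.

```python
import math
from dataclasses import dataclass
from typing import Dict


def _logistic(z: float) -> float:
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class MembershipParams:
    small_shift: float    # a for "small" (decreasing sigmoid)
    small_slope: float    # b for "small"
    medium_center: float  # c for "medium" (Gaussian)
    medium_width: float   # sigma for "medium"
    large_shift: float    # a for "large" (increasing sigmoid)
    large_slope: float    # b for "large"


def fuzzify(x: float, p: MembershipParams) -> Dict[str, float]:
    """Membership degrees of input x in the small/medium/large fuzzy sets."""
    return {
        "small": _logistic(-p.small_slope * (x - p.small_shift)),
        "medium": math.exp(-((x - p.medium_center) / p.medium_width) ** 2),
        "large": _logistic(p.large_slope * (x - p.large_shift)),
    }


# Hypothetical parameters for a PERCLOS-like input in [0, 1].
params = MembershipParams(0.15, 40.0, 0.30, 0.10, 0.45, 40.0)
print(fuzzify(0.30, params))
```

The stable logistic avoids `math.exp` overflow when a steep slope meets an input far from the shift point.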

The third layer is the fuzzy rules layer, in which each node represents one fuzzy rule. The fitness (firing strength) of each rule is calculated from the membership degrees of its antecedents, and the number of nodes in this layer equals the number of fuzzy rules.

According to the facial expressions of fatigue, and taking the eyelid features as an example, the fuzzy rules can be described as follows:
If PERCLOS is small and MCD is small, then the output is Awake.
If PERCLOS is medium and MCD is large, then the output is Fatigue.
If PERCLOS is large and MCD is large, then the output is Severe fatigue.

The fuzzy rules of eyelid movement, iris movement, and mouth movement are shown in Tables 2, 3 and 4.

The fourth layer is the normalized layer, which normalizes the fitness of each rule by the total fitness of all rules.

The fifth layer is the output layer, also called the defuzzification layer. Each node in this layer represents one output variable, computed as a weighted sum of the normalized fitness values; the weights of the FNN are adjusted by the learning algorithm.
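Layers 3 to 5 can be sketched as follows for one two-input FNN. Because the rule tables and the fitness and defuzzification formulas are not reproduced here, the sketch assumes a full grid of 3 × 3 = 9 rules, rule fitness computed as the product of the two antecedent membership degrees, normalization by the total fitness, and outputs formed as weighted sums of the normalized fitness values. The name `fnn_forward` and the weight layout (one row of nine weights per output node) are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def fnn_forward(mu1: Sequence[float], mu2: Sequence[float],
                weights: Sequence[Sequence[float]]
                ) -> Tuple[List[float], List[float]]:
    """Return (outputs, normalized fitness) for membership degrees of two inputs.

    mu1, mu2: degrees (small, medium, large) of input 1 and input 2.
    weights: one row of len(mu1) * len(mu2) weights per output node.
    """
    # Layer 3: one rule per (label of input 1, label of input 2) pair.
    fitness = [a * b for a, b in product(mu1, mu2)]
    total = sum(fitness)
    if total <= 0:
        raise ValueError("no rule fires for these inputs")
    # Layer 4: normalize so the fitness values sum to 1.
    normalized = [f / total for f in fitness]
    # Layer 5: each output is a weighted sum of the normalized fitness values.
    outputs = [sum(w * f for w, f in zip(row, normalized)) for row in weights]
    return outputs, normalized


# Only the (small, small) rule fires, so each output equals its first weight.
W = [[1.0] + [0.0] * 8, [0.0] * 9, [0.5] * 9]
print(fnn_forward((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), W)[0])
# → [1.0, 0.0, 0.5]
```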

3.2.2. Learning Algorithm

The error of the FNN is defined in terms of the differences between the actual outputs and the expected outputs.

The error back-propagation (BP) algorithm is used to adjust the network parameters so that the actual output approaches the expected output. Based on the BP algorithm, the weights of the FNN are updated by gradient descent on this error.

The parameters of the membership functions are adjusted in the same way, with a learning rate controlling the step size of each update.
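The output-layer weight update can be sketched as plain gradient descent, assuming the error is the usual half sum of squared differences E = ½ Σ (expected − actual)². Under that assumption, the gradient with respect to a weight is −(expected − actual) times the corresponding normalized fitness value. Updates of the membership parameters follow the same pattern through the chain rule and are omitted. The function names are hypothetical.

```python
from typing import List, Sequence


def outputs(normalized: Sequence[float],
            weights: Sequence[Sequence[float]]) -> List[float]:
    """Layer-5 outputs: weighted sums of normalized rule fitness."""
    return [sum(w * f for w, f in zip(row, normalized)) for row in weights]


def squared_error(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Assumed error E = 0.5 * sum((expected - actual)^2)."""
    return 0.5 * sum((e - a) ** 2 for a, e in zip(actual, expected))


def update_weights(normalized: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   expected: Sequence[float],
                   learning_rate: float) -> List[List[float]]:
    """One gradient-descent step on E for the output-layer weights."""
    actual = outputs(normalized, weights)
    return [[w + learning_rate * (e - a) * f for w, f in zip(row, normalized)]
            for row, a, e in zip(weights, actual, expected)]


# Repeated updates drive the output toward the expected one-hot target.
nf = [0.5, 0.5]
W = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(100):
    W = update_weights(nf, W, expected=[1.0, 0.0], learning_rate=0.5)
print(squared_error(outputs(nf, W), [1.0, 0.0]))
```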

3.3. Revision of Evidence

Because of the limitations of DST, combining highly conflicting evidence often gives unreasonable results. While driving, errors in the feature parameters extracted by the camera are inevitable because of environmental interference. Therefore, Dempster’s rule cannot be applied directly when conflict exists.

Because the cause of a conflict remains unknown, evidence with low reliability should not be discarded completely. Thus, in the modified algorithm, the original pieces of evidence are revised by redistributing the mass of unreliable evidence before they are fused with DST.

The discount strategy proposed by Shafer [19] is applied in this method. The credibility of each piece of evidence is used as the discount factor to revise its original BPA. According to the discount rule, the portion of mass corresponding to the lack of credibility is transferred to the frame of discernment $\Theta$. Increasing the uncertainty of conflicting evidence in this way reduces its influence on the fusion result.

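The discount rule is defined as follows:

$$m^{\alpha}(A) = \begin{cases} \alpha\, m(A), & A \subset \Theta,\ A \neq \Theta, \\ \alpha\, m(\Theta) + 1 - \alpha, & A = \Theta, \end{cases} \tag{14}$$

where $\alpha \in [0, 1]$ is the discount factor, that is, the credibility of the evidence, and $m^{\alpha}$ is the revised BPA.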
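Shafer’s discount rule multiplies every mass except that of the frame by the discount factor α and moves the remaining 1 − α to the frame. A minimal sketch, using `frozenset` focal elements with the frame of discernment passed in explicitly; the name `discount` is ours.

```python
from typing import Dict, FrozenSet

BPA = Dict[FrozenSet[str], float]


def discount(m: BPA, alpha: float, frame: FrozenSet[str]) -> BPA:
    """Shafer discounting: scale masses by alpha, move 1 - alpha to the frame."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("the discount factor must lie in [0, 1]")
    revised = {s: alpha * w for s, w in m.items() if s != frame}
    revised[frame] = alpha * m.get(frame, 0.0) + (1.0 - alpha)
    return revised


THETA = frozenset({"Awake", "Fatigue", "Severe"})
m = {frozenset({"Awake"}): 0.8, frozenset({"Fatigue"}): 0.2}
print(discount(m, 0.5, THETA))  # Awake 0.4, Fatigue 0.1, frame 0.5
```

With α = 1 the evidence is unchanged, and with α = 0 all of its mass moves to the frame, so it no longer affects Dempster’s combination.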

The key to the discount strategy is to measure the credibility of evidence effectively. The conflicting factor $K$ in DST is commonly used to measure the degree of conflict: the larger the value of $K$, the higher the degree of conflict. However, $K$ does not always agree with the actual degree of conflict. For example, two identical BPAs that each assign 1/3 to three singleton states give $K = 2/3$ even though the two pieces of evidence agree completely. The conflicting factor therefore cannot accurately represent the degree of conflict.
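The weakness of the conflicting factor can be checked numerically. This short sketch (function name ours) computes K for two identical BPAs that each spread their mass evenly over three singleton states; K comes out as 2/3 even though the two sources agree exactly.

```python
from itertools import product
from typing import Dict, FrozenSet

BPA = Dict[FrozenSet[str], float]


def conflict_factor(m1: BPA, m2: BPA) -> float:
    """K: total product mass of focal-element pairs with empty intersection."""
    return sum(wa * wb for (a, wa), (b, wb) in product(m1.items(), m2.items())
               if not a & b)


same = {frozenset({s}): 1 / 3 for s in ("Awake", "Fatigue", "Severe")}
print(conflict_factor(same, same))  # about 0.667 despite identical evidence
```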

This paper proposes the correlation coefficient of evidence as a measure of the credibility of evidence.

Consider two pieces of evidence defined on the frame $\Theta$ with BPAs $m_1$ and $m_2$ and focal elements $A_i$ and $B_j$, respectively. Their correlation coefficient is defined from these BPAs and focal elements; the larger its value, the higher the degree of correlation between the two pieces of evidence.

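For $n$ pieces of evidence provided by multiple features, the pairwise correlation coefficients form the correlation matrix

$$R = \left(r_{ij}\right)_{n \times n},$$

where $r_{ij}$ is the correlation coefficient between the $i$th and $j$th pieces of evidence.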

Pieces of evidence with a high correlation coefficient support each other. The degree to which each piece of evidence is supported by the others is therefore defined from the correlation matrix, and this support degree represents the reliability of the evidence. The evidence with the highest support degree is taken as the standard evidence, and the weight of each piece of evidence relative to the standard evidence, that is, the ratio of its support degree to that of the standard evidence, is taken as its credibility.
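The credibility computation can be sketched as follows. The paper’s own correlation coefficient is not reproduced in this text, so this sketch substitutes an assumed cosine-style coefficient in which focal elements A and B overlap with weight |A ∩ B|² / (|A||B|). The support degree of each piece of evidence is assumed to be the sum of its correlations with all the others, and credibility is its support divided by the largest support, following the description above. All function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List

BPA = Dict[FrozenSet[str], float]


def _inner(m1: BPA, m2: BPA) -> float:
    """Overlap-weighted inner product of two BPAs (assumed form)."""
    return sum(wa * wb * len(a & b) ** 2 / (len(a) * len(b))
               for a, wa in m1.items() for b, wb in m2.items())


def correlation(m1: BPA, m2: BPA) -> float:
    """Assumed correlation coefficient in [0, 1]; 1 for identical BPAs."""
    denom = math.sqrt(_inner(m1, m1) * _inner(m2, m2))
    return _inner(m1, m2) / denom if denom > 0 else 0.0


def credibilities(bpas: List[BPA]) -> List[float]:
    """Support = sum of correlations with the other evidence; credibility =
    support relative to the best-supported (standard) evidence."""
    n = len(bpas)
    r = [[correlation(bpas[i], bpas[j]) for j in range(n)] for i in range(n)]
    support = [sum(r[i][j] for j in range(n) if j != i) for i in range(n)]
    best = max(support)
    if best <= 0:
        return [1.0] * n  # no evidence supports any other; leave undiscounted
    return [s / best for s in support]


A, F, S = (frozenset({x}) for x in ("Awake", "Fatigue", "Severe"))
evidence = [{A: 0.9, F: 0.1}, {A: 0.8, F: 0.2}, {S: 1.0}]
print(credibilities(evidence))  # the third, conflicting source gets 0.0
```

Each credibility can then serve as the discount factor α of the corresponding piece of evidence before Dempster’s combination.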

The credibility of each piece of evidence is taken as its discount factor, and the original pieces of evidence are revised according to the discount rule in formula (14). Combining the revised evidence with Dempster’s rule then yields more reasonable results.

3.4. Recognition of Driving States

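The driving state is identified from the BPA of the combined evidence. Let $A_1, A_2 \subset \Theta$ be defined by

$$m(A_1) = \max\{m(A_i) \mid A_i \subset \Theta\}, \qquad m(A_2) = \max\{m(A_i) \mid A_i \subset \Theta,\ A_i \neq A_1\},$$

that is, $A_1$ and $A_2$ are the states with the largest and second-largest combined masses.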

If the state with the largest combined mass satisfies the recognition rule, which involves two thresholds, it is regarded as the current driving state. In the proposed method, the two thresholds are set to 0.2 and 0.6, respectively.
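The recognition rule itself is not reproduced in this text. The sketch below assumes a common form of the DST decision rule: with A1 and A2 the states of largest and second-largest combined mass, A1 is accepted when m(A1) − m(A2) exceeds the first threshold (0.2), the mass on the whole frame m(Θ) is below the second threshold (0.6), and m(A1) exceeds m(Θ). This assignment of the two thresholds is an assumption, as is the name `recognize`.

```python
from typing import Dict, FrozenSet, Optional

BPA = Dict[FrozenSet[str], float]


def recognize(m: BPA, frame: FrozenSet[str], eps1: float = 0.2,
              eps2: float = 0.6) -> Optional[FrozenSet[str]]:
    """Return the recognized state, or None if the assumed rule rejects it."""
    # Rank proper subsets of the frame by combined mass (largest first).
    candidates = sorted(((w, s) for s, w in m.items() if s != frame),
                        key=lambda ws: ws[0], reverse=True)
    if not candidates:
        return None
    m1, a1 = candidates[0]
    m2 = candidates[1][0] if len(candidates) > 1 else 0.0
    m_theta = m.get(frame, 0.0)
    if m1 - m2 > eps1 and m_theta < eps2 and m1 > m_theta:
        return a1
    return None


THETA = frozenset({"Awake", "Fatigue", "Severe"})
A, F = frozenset({"Awake"}), frozenset({"Fatigue"})
print(recognize({F: 0.7, A: 0.1, THETA: 0.2}, THETA))  # frozenset({'Fatigue'})
```

Returning `None` instead of forcing a decision keeps ambiguous combined evidence from being reported as a definite state.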

4. Experiments

4.1. Data Collection

The experiment is carried out under real driving conditions and lasts two hours, during which video of the driver’s facial expressions is captured at a resolution of 640 × 480 and a frame rate of 30 fps. The video is divided into segments of equal length; in this paper, both the segment length and the interval of driving state prediction by the FNN are set to one minute. To obtain a standard for state assessment, three raters score the video segments based on the features in Table 1, and a segment’s driving state is confirmed when all three scores agree; otherwise, the segment is reassessed. From every 10 s of video, 30 frames are extracted at equal intervals as test samples, and all frames are resized to 437 × 437. Fatigue features are extracted with the methods in Section 2, and the state evaluation for each feature is obtained with its FNN. Within each video segment, BPAs are determined by normalizing the FNN outputs, and the recognition result is obtained with the multifeature fusion framework of Section 3. The video and frame extraction process is shown in Figure 11.
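The sampling scheme can be checked with a little arithmetic: at 30 fps a 10 s window holds 300 frames, so 30 equally spaced samples means every 10th frame, a one-minute segment yields 6 × 30 = 180 sampled frames, and two hours of video give 120 one-minute segments. The sketch below assumes sampling starts at the first frame of each window; the function name is hypothetical.

```python
from typing import List


def sample_indices(fps: int = 30, window_s: int = 10,
                   samples_per_window: int = 30,
                   segment_s: int = 60) -> List[int]:
    """Frame indices sampled at equal intervals within one video segment."""
    frames_per_window = fps * window_s               # 300 frames
    step = frames_per_window // samples_per_window   # every 10th frame
    indices: List[int] = []
    for start in range(0, fps * segment_s, frames_per_window):
        indices.extend(range(start, start + frames_per_window, step))
    return indices


idx = sample_indices()
segments = 2 * 3600 // 60
print(len(idx), idx[:4], segments)  # 180 [0, 10, 20, 30] 120
```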

The driver state assessments obtained from multifeature fusion and from a single feature can then be compared with the assessment standard derived from the rated video segments. Because the mouth movement feature may be affected by the driver talking, the eye feature is chosen as the single feature for comparison; it combines the eyelid movement and iris movement parameters.

4.2. Results and Analysis

The two-hour driving experiment yields 120 video segments in total: 92 segments rated “Awake,” 21 rated “Fatigue,” and 7 rated “Severe fatigue.” Several frames of fatigue recognition from a segment rated “Fatigue” are shown in Figure 12.

In measuring PERCLOS, MCD, AAI, PRPT, YF, and AOT, the width of the time window is set to 10 s. The eyelid movement, iris movement, and mouth movement features are the inputs of their corresponding FNNs. The three resulting BPAs of fatigue evidence, obtained with formula (6), are listed in Table 5 for the eyelid, iris, and mouth features.

The fusion result of DST with revision of evidence is shown in Table 6.

From Tables 5 and 6 and the recognition rule in Section 3.4, the driving state is identified as “Awake” based on a single feature, whereas multifeature fusion identifies it as “Fatigue.”

The experimental results for all video segments are shown in Figure 13 and Table 7. Figure 13 shows how the driving states recognized by multifeature fusion and by a single feature change over time.

Table 7 compares the two methods. Multifeature fusion recognition achieves a higher correct rate than single-feature recognition, showing that combining eye and mouth features in the multifeature fusion framework improves the accuracy of fatigue driving recognition.

5. Conclusions

This study proposed a method for fatigue driving recognition based on FNN and DST to address the complexity of fatigue information. The BPAs of multiple pieces of evidence based on different visual features are obtained with FNNs, and DST is applied to fuse the evidence. A modified algorithm with a discount strategy revises conflicting evidence to improve the rationality of the fusion results: the correlation coefficient of evidence is used to measure the degree of conflict, and the resulting credibility of each piece of evidence serves as its discount factor. The experimental results indicate that the method reduces the influence of unreliable information arising from environmental interference and measurement errors, and can therefore improve the accuracy and robustness of fatigue driving recognition.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Major Project of Natural Science by Education Department of Anhui Province (no. KJ2014ZD04), the Key Project of Natural Science by Education Department of Anhui Province (no. KJ2015A316), the Outstanding Young Talents at Home Visit the School Training Project (no. gxfxZD2016101), and National Natural Science Foundation of China (51605464).