Abstract

The aim of this study is to propose a system that is capable of recognising the identity of a person and indicating whether the person is drunk, using only information extracted from thermal face images. The proposed system is divided into two stages, face recognition and classification. In the face recognition stage, test images are recognised using robust face recognition algorithms: Weber local descriptor (WLD) and local binary pattern (LBP). The classification stage uses Fisher linear discriminant to reduce the dimensionality of the features, and those features are classified using a classifier based on a Gaussian mixture model, creating a classification space for each person, extending the state-of-the-art concept of a “DrunkSpace Classifier.” The system was validated using a new drunk person database, which was specially designed for this work. The main results show that the performance of the face recognition stage was 100% with both algorithms, while the drunk identification achieved a performance of 86.96%, which is a very promising result given that our database comprises 46 individuals, in comparison with other databases found in the literature.

1. Introduction

Thermoregulation is the process, controlled by the hypothalamus, by which a biological organism keeps its internal temperature within certain limits. For humans, the temperature reached in a normal state is approximately 36.7°C (internal) and 33.5°C (skin). If the internal temperature rises above 36.7°C, thermoregulation triggers two processes for heat loss, sweating and vasodilation. If the temperature falls below 36.7°C, the thermoregulatory system triggers thermogenesis processes (vasoconstriction and piloerection) to increase the temperature.

Some research studies have shown that the thermoregulatory system can be altered depending on mood or the consumption of certain foods [1]. In a series of publications, it is concluded that alcohol alters the correct operation of the system responsible for thermoregulation [2, 3], generating an induced vasodilation in the skin, which increases heat loss through convection, leading to a decrease in body temperature that is directly related to the amount of alcohol consumed.

The identification of drunk people has its basis in biology, medicine, and toxicology. Alcohol causes motor disturbances and disturbances in the psychic system, resulting in abnormal behaviour on a biological level, such as dilation of blood vessels [2–6] and increased blood pressure. In the case of the human face, there is a temperature increase in areas of high capillary density, such as around the nose, forehead, and eyes.

Despite the large number of applications in machine learning, such as face recognition, facial expressions, and personal identification, computer systems applied to the classification of drunk people have not been widely studied. The most notable work is that of researchers at the University of Patras, Greece [7–11], who have tried to distinguish between sober and drunk persons based on variations in facial features. In short, [7, 10, 11] show that the frontal region and nose are the most appropriate for acquiring information to classify people as drunk or sober, using a neural network to perform the classification task. In [9], it is concluded that alcohol causes an increase in the temperature of the eyes, which could be useful for classification. In [8], the extraction of the vascular network is proposed based on the works of Buddharaju et al. [3–6]. By comparing the total area in pixels of the vascular network of a sober subject with that of a drunk subject, it may be possible to obtain an indicator (feature) to identify whether the subject has consumed alcohol. In [10], the aim is to demonstrate that it is possible to differentiate between sober and drunk persons using the intensities of pixels located in certain regions or areas of the face (forehead, nose, and mouth). A space of separable features can be generated using these intensities; however, that study used only a small number of subjects (8 individuals) with similar characteristics (weight and height), and as such, it is not possible to ensure generalisation of the classifier.

Again in [10], a method is proposed to find regions with higher thermal variation in the face by comparing intensities of a person while sober and while in a drunken state. It is concluded that the forehead region shows an increase in temperature relative to the region of the nose. The feature extraction used in [10] is mainly based on analysing a number of pixels around interesting areas of the face, to which methods to reduce the dimensionality of the features are applied, such as linear discriminant analysis (LDA) or principal component analysis (PCA). The space generated by LDA, called “DrunkSpace,” is used to distinguish between the states of sober and drunk.

In this context, the purpose of this study is to generate a classification system, based on the “DrunkSpace” proposed in [10], to identify whether an individual is drunk, using approaches from computer vision and pattern recognition. The primary objective is to extract features (patterns) from thermal face images of a drunk person to build a Bayesian classifier based on a Gaussian mixture model (GMM) [12, 13]. The importance of thermal imaging is that it can be used to obtain patterns based on the thermal information of the face, which is linked to the processes of thermoregulation of the human face and the amount of alcohol consumed. In addition, since the few databases available contain only a small number of thermal images of drunk people, we propose to create a public drunk thermal database to study drunk person classification (available at https://goo.gl/7Gxs18).

2. Thermal Face Database

This section describes in detail how the Pontificia Universidad Católica de Valparaíso-Drunk Thermal Face database (PUCV-DTF) was acquired.

2.1. Recruitment

An open call was made through posters at the School of Electrical Engineering at the Pontificia Universidad Católica de Valparaíso. People who answered the call were informed of the research protocol and signed an informed consent form, which had previously been approved by the Ethics Committee of the university.

2.2. Participants

A total of 46 individuals (40 men and 6 women) were selected. Their average age was 24 years, with a standard deviation of approximately 3 years (the minimum age was 18 years, and the maximum was 29 years). All were in good health, without problems related to alcohol consumption; this was assessed with a screening test designed to exclude people who consume alcohol regularly.

2.3. Procedure

Subjects attended the robotics lab, where they rested for 30 minutes to let their metabolism stabilise to the temperature conditions of the laboratory. Each subject then consumed a 355 mL can of beer with 5.5% alcohol by volume, waited another 30 minutes in the laboratory, and then repeated the procedure until four beers had been consumed. See Table 1 for the whole capturing procedure.

Upon completion of the procedure, subjects with a blood alcohol concentration of about 0.8 g/L (drunk) were required to remain in place until the concentration decreased to under 0.2 g/L. This was verified with a breath test. It should be noted that a paramedic was in the lab during the entire experiment to check the status of the individuals.

2.4. Thermal Imaging

The thermal camera used was a FLIR TAU 2 [14] with a resolution of 640 × 480 pixels, a frame rate of 30 frames per second, a thermal sensitivity of 50 mK, and a spectral range between 7.5 and 13.5 μm. The database comprises 46 individuals and, for each of them, five subsets of 50 images, giving a total of 250 images per subject. The five subsets are “Sober,” “1 Beer,” “2 Beers,” “3 Beers,” and “4 Beers,” corresponding to the capture procedure summarised in Table 1. Following acquisition, the images were preprocessed: all images were cropped and aligned according to the coordinates of the eyes, which were marked manually, giving a final resolution of 81 × 150 pixels. The thermal images were then normalised using (1), which applies a linear mapping of the pixel intensity values to the range $[a, b]$:

$$I_{n}(x, y) = (b - a)\,\frac{I(x, y) - I_{\min}}{I_{\max} - I_{\min}} + a, \tag{1}$$

where $I_{\min}$ and $I_{\max}$ are the minimum and maximum values in the image: $I_{\min} = \min_{x, y} I(x, y)$ and $I_{\max} = \max_{x, y} I(x, y)$. In the experiments, the value range of [0,255] is used.

An example of a subject with the 5 subsets is shown in Figure 1. In order to highlight the information obtained from the alcohol consumption, colour has been applied to the images from the subsets. The figure also shows the forehead temperature (FT), the nose temperature (NT), and the alcohol tester (AT) measures. The average values of FT, NT, and AT for each of the classes are shown in parentheses. Note that the temperature of the thermal face image varies with alcohol consumption. In particular, the nasal contrast changes once the subject has consumed beer. This effect could be attributable to the thermoregulation process, to the breathing effect produced during the exhalation or inhalation phase, or to a combination of both.
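The linear mapping used for normalisation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `normalise`, the nested-list image representation, and the sample pixel values are assumptions made for the example.

```python
def normalise(image, lo=0.0, hi=255.0):
    """Linearly map the pixel intensities of a 2-D image (list of rows) to [lo, hi]."""
    i_min = min(min(row) for row in image)
    i_max = max(max(row) for row in image)
    if i_max == i_min:
        # A flat image has no contrast to stretch; map every pixel to lo.
        return [[lo for _ in row] for row in image]
    scale = (hi - lo) / (i_max - i_min)
    return [[lo + (p - i_min) * scale for p in row] for row in image]


# Hypothetical raw intensities, not taken from the database.
raw = [[20.0, 25.0], [30.0, 40.0]]
norm = normalise(raw)
```

The smallest pixel maps to the lower bound and the largest to the upper bound, so images captured at different absolute temperatures share the same intensity range.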

The range of alcohol concentration values obtained in the breath test measurements for each class can be observed in Table 2. Note that the classes overlap in terms of alcohol concentration because of the variability of the subjects captured (different weight, height, age, sex, etc.). However, in our experiments, we want to classify whether the subject is sober or has consumed any beer (classes 1 Beer, 2 Beers, 3 Beers, or 4 Beers), regardless of the alcohol concentration of the individual. The detailed values of the temperature and alcohol test measurements are given in a document attached at the database link.

3. Feature Extraction and Classification

The feature extraction process consists of selecting local regions of a thermal face image and then extracting the information using dimensionality reduction methods. As indicated above, the process carried out in this study is related to the generation of a “DrunkSpace,” as presented in [10]. In [10], the information from different regions of the face is extracted from a grid of 20 points. The problem with the grid proposed by [10] is that no biological justification is given for the location of the feature points. For this reason and inspired by [15], a different grid of 22 points is chosen. In [15], the aim is to generate a thermographic map of the human face, for which a number of sensors are located on the face and neck of each subject. The 22 points are selected at positions where capillaries and veins cross the face, as seen in any of the sample faces shown in Figure 2. Once the grid has been defined, the information is extracted from the thermal face images. As the intensity of a single grid pixel may be affected by noise, we consider a neighbourhood of 3 × 3 pixels around every point of the grid and compute the average intensity for each of the 22 regions of the face. Since the images of the database are aligned, the same grid was used as a single mask for all subjects of the database.
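The neighbourhood averaging described above can be illustrated with a short Python sketch. The function name `grid_features`, the toy 5 × 5 image, and the two grid coordinates are hypothetical; the actual positions of the 22 grid points are not listed in the text, and the sketch assumes every grid point lies at least one pixel away from the image border.

```python
def grid_features(image, grid, half=1):
    """Mean intensity of the (2*half+1) x (2*half+1) window around each grid point.

    image: 2-D list of rows; grid: list of (row, col) points.
    Assumes each point is at least `half` pixels from the image border.
    """
    features = []
    for r, c in grid:
        window = [image[i][j]
                  for i in range(r - half, r + half + 1)
                  for j in range(c - half, c + half + 1)]
        features.append(sum(window) / len(window))
    return features


# Toy 5 x 5 image and two illustrative grid points (not the real 22-point grid).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
features = grid_features(image, [(1, 1), (3, 3)])
```

With the real grid, the returned list would have 22 entries, one per facial region, and would form the feature vector of one thermal image.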

After having extracted the features of each thermal image, a feature vector of 22 dimensions was generated. Therefore, for a subject, there are 50 feature vectors for each class (“Sober,” “1 Beer,” “2 Beers,” “3 Beers,” and “4 Beers”). It is not recommended to use the 22-dimensional vector for classification due to its high dimensionality, the complexity of generating a hyperplane that separates all of these dimensions, and the high computational expense it would entail. Thus, to reduce the dimensionality, the Fisher linear discriminant analysis was used.

3.1. Fisher Linear Discriminant Analysis (FLD)

First, it is assumed that there is a classification problem which involves two different classes ($\omega_1$ and $\omega_2$), and for each class $\omega_i$, there are $N_i$ $m$-dimensional samples. Thus, there is a set of $N$ samples $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, $N_1$ of which correspond to the class $\omega_1$ and $N_2$ to $\omega_2$. The FLD method aims to obtain a transformation from the x-space to the y-space, through the linear projection of all the samples ($\mathbf{x}$) onto a line, using the weights $\mathbf{w}$. However, the line to be selected must maximise the separability of the projected samples between the different classes. The linear combination that allows us to project the samples from the x-space to the y-space is represented in (2):

$$y = \mathbf{w}^{T} \mathbf{x}, \tag{2}$$

where

$$\mathbf{w} = [w_1, \ldots, w_m]^{T}, \qquad \mathbf{x} = [x_1, \ldots, x_m]^{T}. \tag{3}$$

To find the adequate projection line, we must define a separation measurement between the projected data to then maximise this separation. The solution proposed by Fisher [16] is to maximise a function that represents the difference between the mean values of each class, normalised by a measurement of the variability inherent to each class. The objective function to maximise, $J(\mathbf{w})$, can be represented as a function of two scatter matrices, $S_W$ and $S_B$, as shown in (4):

$$J(\mathbf{w}) = \frac{\mathbf{w}^{T} S_B \mathbf{w}}{\mathbf{w}^{T} S_W \mathbf{w}}. \tag{4}$$

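The within-class scatter matrix $S_W$ can be expressed in terms of the original samples (x-space) or of the projected samples (y-space), and the same holds for the between-class scatter matrix $S_B$. Both definitions are shown in (5) and (6), together with the definition of the class scatter (covariance) matrix $S_i$ in (7):

$$\tilde{s}_1^{2} + \tilde{s}_2^{2} = \mathbf{w}^{T} S_W \mathbf{w}, \qquad S_W = S_1 + S_2, \tag{5}$$

$$(\tilde{\mu}_1 - \tilde{\mu}_2)^{2} = \mathbf{w}^{T} S_B \mathbf{w}, \qquad S_B = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{T}, \tag{6}$$

$$S_i = \sum_{\mathbf{x} \in \omega_i} (\mathbf{x} - \boldsymbol{\mu}_i)(\mathbf{x} - \boldsymbol{\mu}_i)^{T}, \tag{7}$$

where $\boldsymbol{\mu}_i$ is the mean value of the original samples of the $i$th class, $\tilde{\mu}_i = \mathbf{w}^{T} \boldsymbol{\mu}_i$ and $\tilde{s}_i^{2} = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^{2}$ are the mean and scatter of its projected samples, and $\mathbf{w}$ is the vector of projection weights. Finally, the optimal projection is obtained using the weights $\mathbf{w}^{*}$ (8):

$$\mathbf{w}^{*} = \arg\max_{\mathbf{w}} \frac{\mathbf{w}^{T} S_B \mathbf{w}}{\mathbf{w}^{T} S_W \mathbf{w}}. \tag{8}$$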

This optimal solution w is given by the eigenvector(s) of $S_W^{-1} S_B$ corresponding to the largest eigenvalue(s). Using FLD in our problem reduced the 22-dimensional vectors to two dimensions, while also maximising the distance between the means of the different classes and minimising the variance of each class. This is seen in the example in Figure 3. Figure 3(a) shows a plane onto which two randomly chosen features of a subject are projected. In this figure, it is clear that not all classes are separable, since many of the features overlap between classes. However, when applying the FLD method, the DrunkSpace obtained (see Figure 3(b)) is completely separable and it is possible to identify projected clusters for each class.
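As a hedged illustration of the two-class case described above, the following Python sketch computes a Fisher direction for toy two-dimensional data. It uses the well-known closed form for two classes, in which the maximiser of the Fisher criterion is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The function names and sample values are assumptions made for the example. The two-dimensional DrunkSpace built from four training classes would instead require the two leading eigenvectors of the matrix product of the inverse within-class scatter and the between-class scatter, which is not shown here.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def scatter(vectors, mu):
    """Class scatter matrix: sum of outer products of the centred samples."""
    m = len(mu)
    s = [[0.0] * m for _ in range(m)]
    for v in vectors:
        d = [v[k] - mu[k] for k in range(m)]
        for a in range(m):
            for b in range(m):
                s[a][b] += d[a] * d[b]
    return s


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fisher_direction(class1, class2):
    """Two-class Fisher direction: solve S_W w = (mu1 - mu2)."""
    mu1, mu2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, mu1), scatter(class2, mu2)
    m = len(mu1)
    s_w = [[s1[a][b] + s2[a][b] for b in range(m)] for a in range(m)]
    return solve(s_w, [p - q for p, q in zip(mu1, mu2)])


def project(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


# Toy 2-D samples (hypothetical, not taken from the database).
class1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0)]
class2 = [(4.0, 1.0), (5.0, 2.0), (6.0, 2.0), (7.0, 4.0)]
w = fisher_direction(class1, class2)
p1 = [project(w, x) for x in class1]
p2 = [project(w, x) for x in class2]
```

On this toy data, the two classes overlap along each original axis taken alone, yet their projections onto the Fisher direction do not overlap.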

3.2. Gaussian Mixture Model (GMM)

Once the dimensionality reduction (with FLD) of the data had been carried out, an approach called the Gaussian mixture model (GMM) was selected to perform the classification. A GMM is a probability distribution whose probability density function is a linear combination of a finite number of Gaussian distributions. Each one of these Gaussian distributions represents a different class. In our case, four distributions were used as training classes (Sober, 1 Beer, 2 Beers, and 3 Beers) to generate the GMM. The remaining class (4 Beers) was used as a test set. The classification of the test data is performed by analysing the likelihood that the data belong to each distribution of the GMM. Finally, the test data are assigned to the distribution (class) to which they are most likely to belong.

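The theory on which this approach is based is briefly described below. Let $Y$ be a $D$-dimensional real-valued random variable with a probability density function (pdf) that is written as a linear combination of elementary pdfs (see (9)). If the distributions that compose the mixture are Gaussian, the pdf is known as a Gaussian mixture:

$$p(\mathbf{y} \mid \theta) = \sum_{i=1}^{I} \alpha_i\, p(\mathbf{y} \mid C = i, \theta_i), \tag{9}$$

where $I$ represents the number of elementary components ($C$) of the mixture and $\theta$ represents the set of parameters $\{\alpha, \theta_1, \ldots, \theta_I\}$, where $\theta_i$ is the set of parameters associated with the $i$th distribution of the mixture and $\alpha = \{\alpha_1, \ldots, \alpha_I\}$ are the weights of the distributions of the mixture. The components of the mixture are Gaussian densities, whose mean and covariance are represented by $\theta_i = (\boldsymbol{\mu}_i, \Sigma_i)$:

$$p(\mathbf{y} \mid C = i, \theta_i) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu}_i)^{T} \Sigma_i^{-1} (\mathbf{y} - \boldsymbol{\mu}_i)\right). \tag{10}$$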

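The solution to the classification problem is explained below. First, consider a set of samples $\mathbf{y} = \{\mathbf{y}_1, \ldots, \mathbf{y}_J\}$, where $\mathbf{y}_j \in \mathbb{R}^{D}$ is one of the $J$ independent outcomes of the random variable $Y$; then, the likelihood of $\mathbf{y}$ is defined by the following likelihood function (for independent and identically distributed observations) given by (11):

$$L(\theta) = p(\mathbf{y} \mid \theta) = \prod_{j=1}^{J} p(\mathbf{y}_j \mid \theta). \tag{11}$$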

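Now, the likelihood of $\mathbf{y}$ should be maximised. Using some algebraic transformations (see [17]), it is possible to represent the likelihood function as an objective function to be maximised, applied to the Gaussian mixture model (12):

$$\hat{\theta} = \arg\max_{\theta} \sum_{j=1}^{J} \log \sum_{i=1}^{I} \alpha_i\, p(\mathbf{y}_j \mid C = i, \theta_i). \tag{12}$$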

This is a hard optimisation problem that is commonly solved using the expectation maximisation (EM) algorithm [18]. More information about Gaussian mixture models is available in [12, 13]. In the present study, the GMM procedure was implemented in Matlab R2015a and the EM algorithm was selected to perform the optimisation. Each Gaussian distribution of the mixture is defined by the cluster formed by the projected (FLD) samples of one of the training classes (“Sober,” “1 Beer,” “2 Beers,” and “3 Beers”), and the test data to be classified are the projected samples of the class “4 Beers.”
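The classification step can be sketched in Python under simplifying assumptions. Because each training class is labelled and contributes one Gaussian, the maximum-likelihood parameters of each component reduce here to the sample mean and covariance of that class, so no EM iteration is needed in this sketch; the study itself used the EM algorithm in Matlab. Equal mixture weights, the function names, and all sample coordinates are assumptions made for illustration.

```python
import math


def fit_gaussian(points):
    """Sample mean and 2 x 2 covariance (maximum-likelihood estimate, divides by n)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def gaussian_pdf(point, mean, cov):
    """Bivariate Gaussian density; cov is (sxx, sxy, syy) and must be non-singular."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d^T Sigma^{-1} d written out for a 2 x 2 covariance.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def posteriors(point, model):
    """model maps class label -> (weight, mean, cov); returns class posteriors."""
    joint = {c: w * gaussian_pdf(point, mu, cov) for c, (w, mu, cov) in model.items()}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}


# Hypothetical 2-D DrunkSpace projections for one subject (not real data).
train = {
    "Sober":   [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)],
    "1 Beer":  [(2.0, 0.1), (2.2, 0.0), (1.9, -0.1), (2.1, 0.2)],
    "2 Beers": [(4.0, 1.0), (4.1, 1.2), (3.9, 0.9), (4.2, 0.8)],
    "3 Beers": [(6.0, 2.0), (6.2, 2.1), (5.9, 1.8), (6.1, 2.2)],
}
# Equal weights are an assumption; every class has the same number of samples.
model = {label: (1.0 / len(train),) + fit_gaussian(pts) for label, pts in train.items()}

test_point = (6.3, 2.3)  # stands in for one projected "4 Beers" sample
post = posteriors(test_point, model)
predicted = max(post, key=post.get)
```

Repeating the prediction over all test samples of a subject and counting the labels would give per-class percentages of the kind reported for the example subject.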

Figure 4 shows an example of a GMM classifier for the same subject used in Figure 3. The figure shows the DrunkSpace classifier generated with the data from the training subsets: “Sober,” “1 Beer,” “2 Beers,” and “3 Beers.” The regions shown in Figure 4 were constructed evaluating the DrunkSpace projections from Fisher in the GMM classifier for the training subsets; thus, one DrunkSpace classifier was generated. Note that the regions obtained represent the probability of being in one of the classes: “Sober,” “1 Beer,” “2 Beers,” and “3 Beers.” The subset “4 Beers” is used to validate the classification. Figure 4 shows the test set (“4 Beers”) in magenta, which is classified primarily in the region belonging to the class “3 Beers.” The probabilities obtained for this example are 0% for the “Sober” class, 0% for the “1 Beer” class, 16% for the “2 Beers” class, and 84% for the “3 Beers” class.

4. Experiment and Evaluation of the Proposed System

The proposal for this study consists of two stages: face recognition and identifying a drunk person. The general outline of the proposed system is shown in Figure 5. The first stage determines the identity of individuals for further classification analysis. Once the faces of the subjects have been recognised in the first stage, the second stage is responsible for performing a feature extraction through the FLD method, and then the classification of drunkenness is performed using the GMM classifier. Each of the steps is explained in detail below. The database used in the study is the PUCV-DTF, which is described in Section 2.

4.1. Stage 1

Face recognition is a crucial stage for the complete system of identifying drunk people because it determines the identity of the individuals in the database. The face recognition system used in this study was analysed using two current descriptors commonly used in the literature: the LBP descriptor [19] and the WLD descriptor [20]. Both methods use the histogram intersection distance (HI) as a dissimilarity measure.

The experiment consisted of using images from the PUCV-DTF database to generate a gallery set and a test set. The gallery set is composed of the face images of the subjects in a sober state, while the test set is composed of images of the subjects after drinking beer (“1 Beer,” “2 Beers,” “3 Beers,” and “4 Beers”). The results obtained by the recognition system are shown in Table 3. As can be seen in Table 3, both descriptors achieve a 100% recognition rate for each test set, which is due to the absence of temporal variability in the images of the database, because they were acquired within a period of three hours. However, the performance of the face recognition system may decline if the images are acquired over a longer period of time [21–23].
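The gallery and test matching can be illustrated with a simplified Python sketch. It computes a plain global histogram of basic 3 × 3 LBP codes and matches a probe to the gallery entry with the largest histogram intersection; it is not the exact descriptor configuration of [19], and the function names and toy images are assumptions. Histogram intersection is a similarity score here; for normalised histograms, one minus the intersection can serve as a dissimilarity.

```python
def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3 x 3 LBP codes (interior pixels only)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def identify(probe_hist, gallery):
    """Return the gallery identity whose histogram intersects most with the probe."""
    return max(gallery, key=lambda k: histogram_intersection(probe_hist, gallery[k]))


# Toy "sober" gallery images for two hypothetical subjects.
img_a = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
img_b = [[5, 1, 5, 1], [1, 5, 1, 5], [5, 1, 5, 1], [1, 5, 1, 5]]
gallery = {"A": lbp_histogram(img_a), "B": lbp_histogram(img_b)}

# Probe: subject A with a uniform intensity offset, a crude stand-in for warming.
probe = [[p + 10 for p in row] for row in img_a]
match = identify(lbp_histogram(probe), gallery)
```

Because LBP codes depend only on the ordering of a pixel and its neighbours, the uniform offset applied to the probe leaves its histogram unchanged.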

4.2. Stage 2

Once the identity of the subject is recognised, we proceed to the stage of drunk state identification. To perform this task, the 2nd stage is divided into two substages: feature extraction and classification. The feature extraction substage involves the selection of the proper information from the face, which is then used to determine if the person is sober or drunk. The substage of classification consists of the generation of a classification subspace (called the DrunkSpace), where the extracted features will be projected, and then based on this projection, the classification of the state of the subject will be performed.

As mentioned in Section 3, the feature extraction is performed using a grid of 22 points. The FLD method is then used to reduce the dimensionality of the data by projecting the feature vectors from 22 dimensions to 2 dimensions. The classification space (DrunkSpace) is then generated for each subject based on the Gaussian mixture model approach (see Section 3). Each DrunkSpace was generated using Matlab R2015a. The training data used was that of the projections of the feature vectors from the images of the classes for the sober subject and the subject after drinking 1, 2, and 3 beers. The images of the subject after drinking 4 beers were used as the test set.

The classification experiment is carried out as follows: the test feature vector, corresponding to the set “4 Beers,” is projected onto the DrunkSpace. This projection is performed using the same eigenvectors used to generate the DrunkSpace (FLD) of each subject (using the training sets “Sober,” “1 Beer,” “2 Beers,” and “3 Beers”). Once the test data have been projected, the classifier returns the likelihood that the data belong to each training class (“Sober,” “1 Beer,” “2 Beers,” and “3 Beers”). The highest likelihood returned by the classifier indicates how the projected test data (“4 Beers”) are classified. For instance, if the classifier returns likelihoods of 0.1, 0.2, 0.3, and 0.4 for the projected test data belonging to each training class (“Sober,” “1 Beer,” “2 Beers,” and “3 Beers,” resp.), then the projected test data are considered to be identified as a drunk subject, with at least 3 beers. The average classification rate of this experiment over all subjects is summarised in Table 4.
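The decision rule in the example above amounts to taking the class with the highest likelihood and treating any class other than “Sober” as drunk. A minimal Python sketch, with the hypothetical helper name `decide`, is:

```python
CLASSES = ["Sober", "1 Beer", "2 Beers", "3 Beers"]


def decide(likelihoods):
    """Return (most likely class, drunk flag) for one set of class likelihoods."""
    best = max(range(len(CLASSES)), key=lambda i: likelihoods[i])
    label = CLASSES[best]
    # Any class other than "Sober" counts as a drunk identification.
    return label, label != "Sober"


# Likelihoods from the worked example in the text.
label, drunk = decide([0.1, 0.2, 0.3, 0.4])
```

For the likelihoods in the text, the rule selects “3 Beers” and therefore flags the subject as drunk.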

From the results shown in Table 4, it is important to note that the system achieved an identification rate of 86.96% for drunk people, if we consider the classification correct when the test data were identified as 1, 2, or 3 beers. If we break down this 86.96% into the percentages corresponding to each “drunk” class, we observe that 9.09% was classified in the “1 Beer” class and 29.39% in the “2 Beers” class, while 48.48% was classified in the “3 Beers” class. It is important to highlight the clear trend within the 86.96% correct identification: the classification rates increase progressively from the class “1 Beer” to the class “3 Beers.” This can be attributed to the fact that the test data being classified correspond to subjects who drank 4 cans of beer, leading to the conclusion that the projected features follow a certain dynamic, moving through certain regions of the DrunkSpace as the subject consumes alcohol. This is why almost half of the test data (“4 Beers”) were projected into the region of the DrunkSpace corresponding to the highest level of alcohol consumption.
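As a quick arithmetic check of the breakdown above, the three “drunk” class rates add up to the overall identification rate, and, assuming every test sample is assigned to one of the four training classes, the remainder corresponds to the “Sober” class:

$$9.09\% + 29.39\% + 48.48\% = 86.96\%, \qquad 100\% - 86.96\% = 13.04\%.$$

The overall figure is also numerically consistent with 40 of the 46 subjects ($40/46 \approx 0.8696$), although this is an inference from the numbers, since the text reports the rates without the underlying counts.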

5. Conclusions

This article presents a computer vision system that identifies people in a drunk state. The system is composed of two main stages, one for face recognition and the other for drunk classification. The face recognition stage provides the identity of an individual previously stored in a database, while the classification stage identifies the state of the individual, indicating if the subject has consumed alcohol. Inspired by [10], the classification stage uses the Fisher linear discriminant (FLD) method to reduce the dimensionality of the feature vectors and generate a subspace called “DrunkSpace.” We then use a Bayesian classifier based on Gaussian mixture models (GMM) to identify whether or not the subject is in a drunk state.

The results obtained in this study show that the proposed system to identify drunk people achieves a success rate of approximately 87%; that is, the system is capable of identifying if a person drank at least one can of beer. In addition, the proposed system achieves recognition rates of 100% in the face recognition stage using the LBP method or the WLD method. It is important that the face recognition stage should be robust, because obtaining the correct identity of the test subject allows us to select the correct “DrunkSpace” corresponding to the test subject, facilitating the work for the GMM-based classifier that determines if the individual is in a drunk state.

It is important to mention that the good results obtained are mainly due to the fact that the selected feature locations reflect metabolic changes in the face of the subjects, as well as processes related to other biological factors, such as thermoregulation, which can be observed with a thermal camera.

Based on the results obtained, we hope to encourage other researchers to study the classification of people in a drunk state, because it would lead to noninvasive systems which can be beneficial to society. As future work, we hope to generalise the classification problem to groups of people, that is, to generate a generic classifier which can be used to identify people in a drunk state independently of weight, sex, or height, rather than one built individually for each subject, as was done in this research.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by FONDECYT under Grant 11130466, Grant 1161584, and Grant 11150476 and in part by Pontificia Universidad Católica de Valparaíso DI Regular Code under Grant 039.420/2017.