Abstract

We examine the efficiency of four machine learning algorithms for the fusion of several biometrics modalities to create a multimodal biometrics security system. The algorithms examined are Gaussian Mixture Models (GMMs), Artificial Neural Networks (ANNs), Fuzzy Expert Systems (FESs), and Support Vector Machines (SVMs). The fusion of biometrics leads to security systems that exhibit higher recognition rates and lower false alarms compared to unimodal biometric security systems. Supervised learning was carried out using a number of patterns from a well-known benchmark biometrics database, and the validation/testing took place with patterns from the same database which were not included in the training dataset. The comparison of the algorithms reveals that the biometrics fusion system is superior to the original unimodal systems and also other fusion schemes found in the literature.

1. Introduction

Identity verification has many real-life applications such as access control and economic or other transactions. Biometrics measure the unique physical or behavioural characteristics of an individual as a means to recognize or authenticate their identity. Common physical biometrics include fingerprints, hand or palm geometry, and retina, iris, or facial characteristics. On the other hand, behavioural characteristics include signature, voice (which also has a physical component), keystroke pattern, and gait. Although some technologies have gained more acceptance than others, it is beyond doubt that the field of access control and biometrics as a whole shows great potential for use in end user segments, such as airports, stadiums, defence installations, but also the industry and corporate workplaces where security and privacy are required.

Biometrics may be used for identity establishment. A new measurement that purports to belong to a particular entity is compared against the data stored in relation to that entity. If the measurements match, the assertion that the person is whom they say they are is regarded as being authenticated. Some building access schemes work in this way, with the system comparing the new measure against the company’s employee database. Also authentication of the identity of a person is frequently used in order to grand access to premises or data.

Since authentication takes place instantaneously and usually only once, identity fraud is possible. An attacker can bypass the biometrics authentication system and continue undisturbed. A cracked or stolen biometric system presents a difficult problem. Unlike passwords or smart cards, which can be changed or reissued, absent serious medical intervention, a fingerprint or iris is forever. Once an attacker has successfully forged those characteristics, the end user must be excluded from the system entirely, raising the possibility of enormous security risks and/or reimplementation costs. Static physical characteristics can be digitally duplicated, for example, the face could be copied using a photograph, a voice-print using a voice recording, and the fingerprint using various forging methods. In addition static biometrics could be intolerant of changes in physiology such as daily voice changes or appearance changes.

Unimodal biometric systems have to contend with a variety of problems such as noisy data, intraclass variations, restricted degrees of freedom, non-universality, spoof attacks, and unacceptable error rates. Some of these limitations can be addressed by deploying multimodal biometric systems [1, 2] that integrate the evidence presented by multiple sources of information. Indeed, the development of systems that integrate two or more biometrics is emerging as a trend. Such systems, known as multimodal biometric systems, are expected to be more reliable due to the presence of multiple, (fairly) independent pieces of evidence. These systems are able to meet the stringent performance requirements imposed by various applications. They address the problem of nonuniversality, since multiple traits ensure sufficient population coverage. They also deter spoofing since it would be difficult for an impostor to spoof multiple biometric traits of a genuine user simultaneously. Furthermore, they can facilitate a challenge response type of mechanism by requesting the user to present a random subset of biometric traits thereby ensuring that a “live” user is indeed present at the point of data acquisition.

In the present paper we present the development and evaluation of four machine learning algorithms for the fusion of the similarity scores of several biometric experts to form a multimodal biometrics system, aiming to raise significantly the recognition accuracy and reduce the false acceptance and false rejection rates. The supervised algorithms are trained using samples from a well known biometrics database and then validated using samples from the same database that are different from the training ones. The aim is to compare the developed algorithms with existing techniques and also find the most efficient one out of the four, in order to use it for the fusion of novel biometrics within project HUMABIO [3].

2. Supervised Fusion Algorithms

Fusion at matching score level is the most common approach of fusion in multimodal biometric systems [2]. This fact is mainly due to the easy accessibility and availability of the matching scores in many biometric modules. The input for a fusion algorithm at matching score level is the (dis-) similarity score provided by a biometric module (Figure 1). There are different approaches of merging scores at the matching score level. In this approach, output scores of the individual matching algorithms constitute the components of a multidimensional vector for example, a 3D vector is created if scores from a face, gait, and voice matching module are available. The resulting multi-dimensional vector is then classified using a classification algorithm such as Support Vector Machines (SVM), Fuzzy Expert systems (FES), neural networks, and so forth, to solve the two class classification problem of classifying the output vector into either “impostor” (unauthorized users) or “genuine” (authorized users, or clients) class. One advantage of the approach is the fact that scores may be inhomogeneous such as a mix of similarities and distances possibly located in different intervals. Thus, no pre-processing is required for classification fusion.

In supervised learning, the learning algorithm is provided a training set of patterns (or inputs) with associated labels (or outputs). Usually, the patterns are in the form of attribute vectors and once the attribute vectors are available, machine-learning methods can be applied, ranging from simple Boolean operators, to Bayesian classification and more sophisticated methods. The performance of a fusion algorithm relies on the tuning of the system. This tuning usually consists in a group of hyper-parameters that can be set manually (such as type of kernel in SVMs, number of chromosomes in Genetic Algorithms (GA), etc.) and another group that is set during the training phase. The training is used so that the algorithm can estimate (learn) the client and impostor spaces and is crucial for the performance of the fusion system.

In this study four state of the art fusion techniques were utilized, namely Support Vector Machines, Fuzzy Expert Systems, Gaussian Mixture Models and Artificial Neural Networks. Each of these techniques follows a different philosophy for the fusion of the unimodal biometric inputs in order to produce an overall estimation of whether the person is a client or an impostor.

2.1. Support Vector Machine

A typical SVM implementation was developed [4]. A radial basis kernel function was used to map the input data to a higher dimensional space, in which they were linearly separable [5]. The radial basis kernel (RBF) was selected in order to handle probable nonlinearities between the input vectors and their corresponding class. It also has less hyperparameters than the polynomial kernel thus making training easier.

After the selection of the kernel, the process followed consists of identifying the best pair of and , that is, the pair with the best cross-validation accuracy. Following the guidelines found in [6], the training set was divided into equal sized subsets and one subset was the validation dataset using the classifier that was trained on the remaining subsets. The process was repeated sequentially until all subsets acted a validation dataset. The selection of and was done via grid search, that is, trying exponentially growing sequences of and [7]. The penalty parameters for each of the two classes (“Genuine”, “Impostor”) was done via complete enumeration trials.

The final trained SVM model was then used with the optimal , pair in order to check the classifier performance on the test dataset which is comprised of “unknown” patterns that were not used for the SVM training.

2.2. Fuzzy Expert System

A TSK FES [8] was developed as described in [9]. The FES’s premise space consisted of three inputs (). Each premise input was segmented by three trapezoid membership functions described by the following equation: where the parameters and locate the “feet” of the “th” trapezoid of the “th” premise input and the parameters and locate the “shoulders”.

This segmentation leads to the creation of 27 three-dimensional fuzzy rules (Figure 2).

The firing strength of the rule , representing the degree to which is excited by a particular premise input vector , is determined by

The premise inputs were selected via extensive experimentation from the available biometric experts (shown in Table 1). Each fuzzy rule produces an output which is a linear function of the unimodal classifiers’ scores shown in Table 1 so as to include all of the available information from the unimodal biometric experts: where NI is the number of unimodal classifiers.

The FES estimation of the client’s authenticity is a synthesis (weighted average) of the 27 fuzzy rule outputs:

This estimation is compared to a threshold and if it is higher than the FES classifies the pattern as genuine transaction while if it is lower it classifies it as impostor.

The parameters of the fuzzy rules’ linear output functions (3), the parameters of the trapezoid membership functions (1) that define the position of the fuzzy rules in the premise space (or in other words the segmentation of the premise inputs via the shape and positioning of the membership functions), and also the decision threshold are optimized by a real coded [9] genetic algorithm [10].

The GA fitness function was selected so as to minimize the false acceptance of impostors and maximize correct authentication rate through the evolution of the GA: where correct_out is the accumulated distance of the fuzzy output from the decision threshold in case of correct authentication (genuine or impostor), error_out is accumulated distance of the fuzzy output from the threshold in case of incorrect authentication for all training patterns, and false_normal is the number of falsely accepted impostor individuals.

In that way the solutions that have minimum false acceptance occurrences have higher fitness value. The same stands for solutions (chromosomes) that produce outputs that have larger distance from the threshold in correct authentications (more robust solutions) and smaller distance from the threshold in case of erroneous authentications (thus driving wrong solutions towards the threshold and rectifying them).

2.3. Gaussian Mixture Model

Bayesian classification and decision making is based on probability theory and the principle of choosing the most probable or the lowest risk (expected cost) option. The Gaussian distribution is usually quite good approximation for a class model shape in a suitably selected feature space. In a Gaussian distribution lies an assumption that the class model is truly a model of one basic class. However, if the actual model is multimodal, this model cannot capture coherently the underlying distribution. Gaussian Mixture Model (GMM) is a mixture of several Gaussian distributions and can therefore represent different subclasses within a class [11]. The probability density function is defined as a weighted sum of Gaussians: where is the weight of component , and

The parameter list defines a particular Gaussian mixture probability density function. Estimation of the Gaussian mixture parameters for one class can be considered as unsupervised learning of the case where the samples are generated by individual components of the mixture distribution and without the knowledge of which sample was generated by which component.

A GMM was developed, which comprised of four mixture components. The weights of the components were estimated after extensive experimentation.

2.4. Artificial Neural Network

The fourth algorithm was a three-layer feed-forward neural network (NN) [12]. The layers consist of input neurons, hidden neurons, and one output neuron where is equal to the number of unimodal biometric experts (from Table 1, ) and is set through experimentation equal to ten. The neurons are fully interconnected and a bias is applied on each neuron. The transfer function is selected to be sigmoid so as to address nonlinearities of the input data set. For the training of the weights, the typical back propagation method was used. The optimum number of training iterations and training parameters was set heuristically. Convergence was achieved after 500 iterations.

3. Benchmark Database

The developed fusion schemes were tested on the publicly available XM2VTS face and speech database [13, 14]. The XM2VTS database contains facial and speech data from 295 subjects, recorded during four sessions taken at one-month intervals. It includes similarity scores from five face experts and three speech experts. The protocol consists of two sets: the development (training) set and the evaluation (validation) set. The development set, which is used for training, contains scores from three multimodal recordings for each of 200 client users and eight transactions from each of the 25 impostors. The evaluation set, which is used for testing the system, contains scores from two multimodal recordings of the (same) 200 client users and eight transactions for each of the 70 (new) impostors. The impostors in the evaluation set are different from the impostors in the development. Thus, the development set contains 600 (200 × 3) client transactions and 40000 (25 × 200 × 8) impostor transactions whereas the evaluation set contains 400 (200 × 2) client transactions and 112000 (70 × 200 × 8) impostor transactions.

Two metrics are computed: the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) defined as

The Half Total Error Rate (HTER) is also reported which is defined as

The Equal Error Rate (EER) is computed as the point where FRR = FAR; in practice, FRR and FAR are not continuous functions and a crossover point might not exist. In this case, the interval [EERlo, EERhi] should be reported. Also, another useful tool for the evaluation of the performance of a biometric system is the Rate Operating Characteristic (ROC) curve, which is produced by plotting FAR versus FRR.

The EERs of the unimodal experts for the XM2VTS database are shown in Table 1.

Figure 3 shows indicative FAR-FRR diagrams for two face and voice unimodal biometrics experts of the XM2VTS database. It can be seen from the threshold range that the expert scores that characterize someone as impostor or genuine differ significantly. However, this does not pose a problem for the fusion algorithms.

4. Test Results

4.1. Comparative Results

Table 2 summarizes the results of the investigated machine learning algorithms for multimodal fusion. More specifically, the classification of the XM2VTS patterns was performed using the SVM, GMM, FES, and NN fusion schemes. Furthermore, some simple combination rules (AND, OR) were also tested. The first conclusion we can reach from the results illustrated in the table is that all of the fusion schemes perform better than the best performing unimodal expert (i.e., expert 6 with EER 1.09%) and the classification absolute improvement using the SVM fusion method is 0.84% (SVM HTER = 0.25%, ~77% relative improvement over unimodal expert 6 classification error). This corroborates the statement that the effective combination of information coming from different experts can improve significantly the performance of a biometric system. Moreover, this table also confirms the superiority of the SVM fusion scheme over the other machine learning techniques. More specifically, the FAR and FRR using the SVM fusion classifier on the XM2VTS database are 0.0% and 0.5%, respectively. The superior performance of the SVM fusion classifier over the second best FES, which is 0.2%, is mainly attributed to the more efficient modelling of the feature space. Moreover, the SVM fusion expert performs very satisfactory when the number of the feature vectors (comprised of the matching scores in this case) is relatively large, as in the case of the XM2VTS database, where there are 8 unimodal experts.

4.2. Comparison with State-of-the-Art Methods

An experimental study was also conducted to compare the developed schemes with state-of-the-art fusion methods on the XM2VTS database. The same unimodal experts and the same protocol were employed so that any performance gain or decrease can be attributed only to the fusion algorithm. The following table illustrates the HTER values of various fusion schemes, as reported in [15].

The first conclusion from Table 3 is that the reported results are inferior compared to the results presented in the previous section. Specifically, the best result, in terms of HTER, is 0.31%, whereas the best result of the algorithms presented in this paper is 0.25%. Moreover, it can be seen that the accuracy achieved in [15] for SVM classification is lower than the authentication accuracy produced by our implementation of the SVM fusion scheme. This could be attributed partly to the different implementation of the algorithm. The comparison results validate the superiority of the developed SVM scheme and indicate its appropriateness for the application scenarios examined within HUMABIO [16, 17].

5. Conclusions

Four machine learning algorithms were developed for the fusion of several biometric modalities in order to detect the most efficient one for use within the project HUMABIO. The algorithms were Gaussian Mixture Models, an Artificial Neural Network, a Fuzzy Expert System, and Support Vector Machines. The algorithms were trained and tested using a well-known biometric database which contains samples of face and speech and similarity scores of five face and three speech biometric experts. The fusion results were compared against existing fusion techniques and also against each other, showing that the fusion schemes presented in this paper produce better biometric accuracy from conventional methods. From the four algorithms, the most efficient one proved to be the support vector machines-based one offering significant performance enhancement over unimodal biometrics, over more traditionally combined multimodal biometrics, but also over the SoA.

Acknowledgments

This work was supported in part by the EC under Contract FP6-026990 HUMABIO [3, 16]. I. G. Damousis, when the research reported in this paper took place, was with the Informatics and Telematics Institute of the Centre for Research and Technology Hellas in Thessaloniki, Greece (e-mail: [email protected]). S. Argyropoulos, when the research reported in this paper took place, was with the Informatics and Telematics Institute of the Centre for Research and Technology Hellas in Thessaloniki, Greece (e-mail: [email protected]).