Single-modal biometric authentication has been widely used in wireless multimedia authentication. However, it is vulnerable to spoofing and its accuracy is limited. To tackle this challenge, we propose a multimodal fusion method for fingerprint and voiceprint based on a dynamic Bayesian method, which takes full advantage of the feature specificity extracted by each single-modal biometric system and authenticates users at the decision level. We demonstrate that this method can be extended to biometric authentication with more modalities and can achieve flexible authentication accuracy. Experiments show that the recognition rate and stability are greatly improved, by 4.46% and 5.94%, respectively, compared with unimodal authentication. Furthermore, the method achieves a 1.94% improvement over general multimodal biometric fusion recognition methods.

1. Introduction

Biometric feature analysis has been widely studied for decades because it is a vital means of authentication and security in computer vision. However, traditional single biometrics, such as fingerprint and vein recognition, have gradually revealed a drawback: they can be imitated with forged fingerprints or faces [1]. Because fusing features such as faces, fingerprints, palm prints, voices, and irises improves the stability, accuracy, and unforgeability of biometrics, multimodal biometric systems can relieve the problems of single-modal systems and provide substantial help for more secure authentication and identification.

There has been some research on multimodal biometrics. Conti et al. [2] fused fingerprint and iris using a homogeneous biometric vector obtained through Log-Gabor filtering. Nagar et al. [3] studied the fusion of three biometric traits (iris, fingerprint, and face) using the fuzzy vault and fuzzy commitment schemes to form a biometric cryptosystem framework. Snelick et al. [4] used new normalization and fusion strategies to fuse fingerprint and face biometrics at the score level. Muthukumar et al. [5] fused iris and fingerprint at the score level using an evolutionary algorithm, Particle Swarm Optimization, which helps the authentication system adapt to different security requirements. Shekhar et al. [6] used sparse matrices to fuse the same three traits (iris, fingerprint, and face); the sparse matrix method offers good recognition robustness. These improvements in biometric identification demonstrate the many advantages of multimodal biometric identification.

On the other hand, the existing research has limitations. First, the works above all integrate multiple biometrics at a single level without considering that multiple biometric features may interfere with each other and thereby reduce recognition performance. Second, most decision-level fusion methods use fixed weights. Such weights reflect the overall average quality of each modality, but they are not optimal for every individual decision. For example, if the fingerprint recognition rate is higher than the voiceprint recognition rate overall, the fingerprint receives the higher weight in fusion; however, an individual fingerprint sample is not always of better quality than the corresponding voiceprint sample.

Inspired by these observations, we propose a multibiometric fusion authentication solution using a dynamic Bayesian decision method, named MFDB-decision (Multibiometric Fusion using Dynamic Bayesian decision). The key idea of this work is to use matching-level scores to assist decision-level fusion of fingerprints and voiceprints, with the aim of recovering identity information lost at the decision level and overcoming the problems caused by fixed weights. We use fingerprint and voiceprint because of the stability of the fingerprint and the high user acceptance of the voiceprint. The proposed method can be extended to biometric fusion authentication with more modalities and is not limited to voiceprint and fingerprint.

The rest of this paper is organized as follows. Section 2 presents related work. Section 3 introduces preliminary research on fingerprint and voiceprint feature extraction. Section 4 describes the multibiometric authentication fusion algorithm MFDB-decision. Section 5 analyzes the experiments and results. Finally, Section 6 concludes the paper.

2. Related Work

There has been a great deal of research on applications of single-modal biometric identification, especially in biometric keys [7–9], cloud data security [10–13], blockchain [14], privacy preservation [15–18], and biometric template protection [19–21]. However, studies on multibiometrics remain comparatively few. The study by Windsor Holden [22] further extended the application of multibiometric methods from criminal investigation to areas of everyday life.

Multimodal biometrics research attempts to overcome the shortcomings of single-modal biometrics in recognition accuracy, robustness, and flexibility and to provide richer and more reliable biometric applications. At present, multimodal biometric systems mainly focus on fusing multimodal features at different levels to provide a unified data manipulation interface at the application layer [23]. Mehrotra et al. [24] proposed a multimodal classification approach based on the relevance vector classifier that combines incremental and granular learning; it can handle large-scale imbalanced datasets and achieves better performance in multimodal biometric classification and evaluation. Abdolahi et al. [25] proposed a multimodal fusion system for fingerprint and iris based on fuzzy logic and achieved an obvious improvement in recognition rate. However, this method gives no quantitative analysis of the effectiveness of the fusion process, and its results generalize poorly. Miao et al. [26] proposed a bin-based classifier framework for multibiometric fusion that embeds matching scores into a new image pixel space and obtains richer feature information for image-based biometrics. Chen et al. [27] proposed a framework for fusing face and fingerprint images using middle-level semantic features extracted from a local feature-image matrix. However, it is still unclear whether such features express all kinds of biometrics well. Khellat et al. [28] proposed a feature-level fusion method for three biometric traits that mainly relies on Fisher dimensionality reduction, so that feature fusion takes place in the reduced space. Mai et al. [29] proposed a binary feature fusion method that generates the fused features from sequences of feature bits using a machine learning algorithm that minimizes intraclass differences while maximizing interclass differences. These feature fusion algorithms still lack rigorous theoretical proof.

Some work has also been done on applications of multimodal biometrics. Liu et al. [30] applied multimodal biometric authentication to a single difficult biometric: they fused different feature modalities to recognize short-utterance speakers and achieved remarkable performance improvements. Gomez et al. [31] studied multibiometric template protection and proposed a technique based on homomorphic probabilistic encryption. Gurusamy et al. [32] studied the biometric characteristics of MRI and found that the wavelet transform better highlights the features of MRI images. Meng et al. [33] proposed a method that effectively detects hidden information in images by combining various image features through a Fast R-CNN network. However, the interpretability of R-CNN networks is currently insufficient.

To the best of our knowledge, although there are quite a few studies on multimodal biometrics, a rigorous demonstration of the effectiveness of multimodal biometric fusion has not yet been given. In this paper, we study the provable effect of multimodal biometric fusion, using fingerprint and voiceprint as an example. Research on fusing these two types of biometrics is still very limited.

3. Preliminary Research

Fingerprints, irises, faces, voiceprints, and finger veins are the most commonly used features in human biometrics. Fingerprint, voiceprint, and face samples are comparatively convenient to collect, and these traits are the most widely applied. However, face recognition requires a large face sample database, and its training and operating costs are high. In this paper, we therefore study the low-cost fingerprint and voiceprint traits.

3.1. Fingerprint Authentication Technology

The fingerprint authentication process is mainly divided into four steps, as shown in Figure 1:
(1) Fingerprint image acquisition, usually with optical instruments or similar equipment.
(2) Fingerprint image preprocessing, which finally produces the thinned fingerprint map.
(3) Fingerprint feature extraction, which extracts the fingerprint feature point information, serializes it into a fingerprint feature vector, and stores it as a fingerprint feature template.
(4) Fingerprint matching, which matches the extracted feature vector against the templates in the fingerprint database to confirm the identity of the person being authenticated.

In general, fingerprint authentication mainly depends on the uniqueness of each individual's fingerprint texture. Fingerprint image preprocessing removes noise and highlights and clarifies the fingerprint texture; the general process is shown in Figure 2. Preprocessing directly affects the performance of the entire fingerprint authentication system, and its main steps are as follows:
(1) Fingerprint image enhancement: algorithms such as frequency-domain transforms, filtering, denoising, and stitching of small fingerprint blocks are used to improve image quality, so that the ridges are better connected and clearer, false feature points are avoided, and the accuracy of fingerprint feature extraction improves.
(2) Fingerprint image binarization: the fingerprint image is converted into a black-and-white binary image by removing pixels locally while maintaining connectivity. This removes adhesion between ridges, reduces the complexity of feature extraction, and facilitates the subsequent thinning operation.
(3) Fingerprint image thinning (refinement): the binarized ridges are thinned into single-pixel lines that preserve the ridge flow regardless of ridge thickness. Feature points can be extracted more conveniently from the thinned fingerprint, which improves the accuracy of fingerprint matching.
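The binarization step can be illustrated with a minimal NumPy sketch. It assumes dark ridges on a light background and uses a simple block-wise local-mean threshold; the function name `local_mean_binarize` and the 8-pixel block size are illustrative choices, not the binarization method used in this paper.

```python
import numpy as np


def local_mean_binarize(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Block-wise local-mean binarization (hypothetical helper).

    Pixels darker than the mean of their block become 1 (ridge),
    the rest become 0 (valley). Assumes dark ridges on a light background.
    """
    img = np.asarray(img, dtype=float)
    out = np.zeros(img.shape, dtype=np.uint8)
    h, w = img.shape
    for r in range(0, h, block):
        for c in range(0, w, block):
            tile = img[r:r + block, c:c + block]
            # Threshold each block against its own mean to tolerate uneven brightness
            out[r:r + block, c:c + block] = tile < tile.mean()
    return out
```

A local threshold copes with brightness that varies across the print (for example, uneven finger pressure), where a single global threshold would misclassify ridges in the brighter region.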

Fingerprint feature extraction is the key step in fingerprint template generation; its main task is to obtain fingerprint feature point (minutia) information. In fingerprint identification, this information is generally used as the main feature and includes the type, position, and orientation-field value of each feature point. The comparison process determines from this information whether two feature points are the same. When two fingerprints share a sufficient number of identical feature points, they are considered to come from the same finger. The extracted feature point information is serialized to obtain the biometric template.
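A common way to locate minutiae on a thinned ridge map is the crossing number: for a ridge pixel whose 8 neighbours $P_1,\ldots,P_8$ are taken in circular order, $CN=\tfrac12\sum_{i=1}^{8}\lvert P_i-P_{i+1}\rvert$ with $P_9=P_1$; $CN=1$ marks a ridge ending and $CN=3$ a bifurcation. The sketch below assumes a binary skeleton (1 = ridge) and shows one possible extractor, not necessarily the one used in this paper; `crossing_number_minutiae` is a hypothetical name.

```python
import numpy as np


def crossing_number_minutiae(skel):
    """Locate ridge endings (CN == 1) and bifurcations (CN == 3) on a 1-pixel-wide skeleton.

    `skel` is a 2-D array with 1 for ridge pixels and 0 for background.
    Border pixels are skipped because they lack a full 8-neighbourhood.
    """
    skel = (np.asarray(skel) > 0).astype(int)
    # 8 neighbours in circular order: N, NE, E, SE, S, SW, W, NW
    ring = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    endings, bifurcations = [], []
    h, w = skel.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if not skel[r, c]:
                continue
            p = [int(skel[r + dr, c + dc]) for dr, dc in ring]
            # Half the number of 0/1 transitions around the pixel
            cn = sum(abs(p[i] - p[(i + 1) % 8]) for i in range(8)) // 2
            if cn == 1:
                endings.append((r, c))
            elif cn == 3:
                bifurcations.append((r, c))
    return endings, bifurcations
```

Each detected point, together with its type and local ridge direction, forms one entry of the minutia set that is serialized into the template.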

3.2. Voiceprint Authentication Technology

A complete speaker recognition system consists of acoustic feature extraction, voiceprint model building, and voiceprint matching, as shown in Figure 3. Feature extraction obtains acoustic features of speech, such as Mel-scale Frequency Cepstral Coefficients (MFCC), from the original waveform; a voiceprint model, such as a Gaussian Mixture Model (GMM), is then trained on these features and used as the template of the individual's speech characteristics. The system outputs the speaker authentication result by calculating the voiceprint matching score.

The GMM is the most commonly used of the existing voiceprint recognition models, as shown in Figure 4. The basic process is to extract the MFCC feature sequence of the speech, use the training data to estimate the model parameters, and obtain the individual's GMM template. The specific process is as follows.

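For any D-dimensional vector $\mathbf{x}$, the Gaussian mixture probability density function used to calculate the likelihood is

$$p(\mathbf{x}\mid\lambda)=\sum_{i=1}^{M} w_i\,p_i(\mathbf{x}), \tag{1}$$

where $w_i$ is the weight of the $i$-th Gaussian component, satisfying $\sum_{i=1}^{M} w_i=1$. The speaker model is $\lambda=\{w_i,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i\}_{i=1}^{M}$, where $\boldsymbol{\Sigma}_i$ is usually a diagonal matrix and $M$ is the number of Gaussian components. Each component density is defined as

$$p_i(\mathbf{x})=\frac{1}{(2\pi)^{D/2}\lvert\boldsymbol{\Sigma}_i\rvert^{1/2}}\exp\!\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right\}. \tag{2}$$

If $X=\{\mathbf{x}_1,\ldots,\mathbf{x}_T\}$ is the set of D-dimensional acoustic feature vectors used to train the model of speaker I, parameter estimation iteratively updates $\lambda$ so that $p(X\mid\lambda^{(k+1)})\ge p(X\mid\lambda^{(k)})$ until convergence. Given the training feature vectors of speaker I, the model parameters are usually obtained by the EM algorithm, whose iterations are as follows.

Weight update:

$$\bar{w}_i=\frac{1}{T}\sum_{t=1}^{T}\Pr(i\mid\mathbf{x}_t,\lambda). \tag{3}$$

Mean update:

$$\bar{\boldsymbol{\mu}}_i=\frac{\sum_{t=1}^{T}\Pr(i\mid\mathbf{x}_t,\lambda)\,\mathbf{x}_t}{\sum_{t=1}^{T}\Pr(i\mid\mathbf{x}_t,\lambda)}. \tag{4}$$

Variance update (elementwise, for diagonal covariances):

$$\bar{\boldsymbol{\sigma}}_i^{2}=\frac{\sum_{t=1}^{T}\Pr(i\mid\mathbf{x}_t,\lambda)\,\mathbf{x}_t^{2}}{\sum_{t=1}^{T}\Pr(i\mid\mathbf{x}_t,\lambda)}-\bar{\boldsymbol{\mu}}_i^{2}. \tag{5}$$

In the formulas above, the posterior probability of component $i$ is

$$\Pr(i\mid\mathbf{x}_t,\lambda)=\frac{w_i\,p_i(\mathbf{x}_t)}{\sum_{k=1}^{M}w_k\,p_k(\mathbf{x}_t)}. \tag{6}$$

The GMM parameters $w_i$, $\boldsymbol{\mu}_i$, $\boldsymbol{\Sigma}_i$, etc. constitute the voiceprint biometric template.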
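The weight, mean, and variance updates of the EM algorithm, together with the component posterior, can be sketched as a single iteration for a diagonal-covariance GMM in NumPy. The function names and the variance floor `var_floor` are assumptions added for illustration and numerical safety; the computation stays in the log domain so that component densities of high-dimensional MFCC vectors do not underflow.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """Log component densities log p_i(x_t) for diagonal covariances; returns (T, M)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                      # (T, M, D)
    return -0.5 * (D * np.log(2.0 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))


def em_step(X, w, means, variances, var_floor=1e-6):
    """One EM iteration for a diagonal GMM; also returns log p(X | parameters before the update)."""
    log_wp = np.log(w)[None, :] + log_gaussian_diag(X, means, variances)   # log w_i p_i(x_t)
    log_px = np.logaddexp.reduce(log_wp, axis=1)                          # log p(x_t | lambda)
    post = np.exp(log_wp - log_px[:, None])                               # Pr(i | x_t, lambda)
    n = post.sum(axis=0)                                                  # soft count per component
    new_w = n / X.shape[0]                                                # weight update
    new_means = (post.T @ X) / n[:, None]                                 # mean update
    new_vars = (post.T @ (X ** 2)) / n[:, None] - new_means ** 2          # variance update
    return new_w, new_means, np.maximum(new_vars, var_floor), float(log_px.sum())
```

Each call reports the log-likelihood of the parameters it started from, so successive values should not decrease; iteration can stop once the increase falls below a tolerance.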

Fingerprint and voiceprint features each have their own characteristics. Fingerprint features are presented as images, with the features hidden in the image texture. Recognition requires fine processing of this texture, which is susceptible to contamination and other interference that reduce recognition accuracy; high fingerprint recognition accuracy generally requires a high-quality fingerprint scanner. The speech spectrum is the main object of voiceprint analysis, and recognition requires fine processing of frequency-domain features. The specificity of these features is less intuitive, but their resistance to interference is slightly stronger than that of fingerprints. Effectively fusing the two types of features lets them complement each other and enhances the anti-interference ability of feature recognition.

4. Multibiometric Fusion Authentication Algorithm MFDB-Decision

Multimodal biometric authentication based on image feature fusion generally achieves good recognition accuracy in limited-sample tests. However, the validity of such algorithms often lacks theoretical proof, which makes their generalization questionable. In this section, we derive a provable multimodal fusion algorithm by combining the matching level and the decision level.

4.1. Optimizing the Multimodal Fusion Strategy

A multimodal biometric system combines two or more modalities through fusion at various levels [23]. From low to high, the fusion levels are as follows:
(1) Sensor level: the captured images are fused at the pixel level. Although this retains as much information as possible, it is inefficient and has poor real-time performance because of the large amount of sensor data to process. Furthermore, given the differences among signal acquisition devices, sensor-level fusion is not feasible in most cases.
(2) Feature extraction level: the feature vectors of two or more modalities are concatenated. Such fusion often produces very high-dimensional vectors. Moreover, features are currently selected rather arbitrarily, and the specificity of the selected features generally lacks large-scale testing.
(3) Matching score level: the matching scores from different modalities are combined. However, fusion is difficult when the two modalities compute their scores in different ways.
(4) Decision level: the decisions of multiple classifiers are consolidated, with few requirements on the relationship between the data.
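The difficulty noted in item (3), scores computed in different ways, is commonly handled by normalizing each modality's scores before combining them. The sketch below shows min-max and z-score normalization; these are standard techniques offered for illustration, not part of MFDB-decision, and the bounds and statistics are assumed to be estimated on training data.

```python
import numpy as np


def min_max_normalize(scores, lo, hi):
    """Map raw scores to [0, 1] using bounds lo < hi from training data; out-of-range values are clipped."""
    s = (np.asarray(scores, dtype=float) - lo) / (hi - lo)
    return np.clip(s, 0.0, 1.0)


def z_score_normalize(scores, mean, std):
    """Center and scale raw scores with a mean and standard deviation estimated on training data."""
    return (np.asarray(scores, dtype=float) - mean) / std
```

For example, a fingerprint similarity in [0, 1] and an unbounded GMM log-likelihood ratio can both be mapped to [0, 1] before they are weighted and combined.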

From the above analysis, although feature-level fusion may produce new effective features, it is difficult to guarantee their stability and reliability. Because stable and specific features can readily be extracted from each single modality, we build on single-modal fingerprint and voiceprint feature extraction and use the dynamic Bayesian method to combine score-level and decision-level recognition, which yields higher recognition accuracy and stability.

4.2. Dynamic Bayesian Decision for Minimizing the Error Risk

Because Bayesian decisions can achieve high accuracy when the biometric modalities are mutually independent, we use Bayesian decision theory [34, 35] as the underlying mechanism. In the ideal case where all relevant probabilities are known, Bayesian decision theory considers how to choose the optimal category label based on these probabilities. Suppose there are N possible categories, $\mathcal{Y}=\{c_1,c_2,\ldots,c_N\}$, and let $\lambda_{ij}$ be the loss incurred by classifying a sample whose true label is $c_j$ as $c_i$. Based on the posterior probability $P(c_j\mid\mathbf{x})$, the expected loss of classifying sample $\mathbf{x}$ as $c_i$ (the conditional risk on $\mathbf{x}$) is

$$R(c_i\mid\mathbf{x})=\sum_{j=1}^{N}\lambda_{ij}\,P(c_j\mid\mathbf{x}). \tag{7}$$

Our task is to find a decision rule $h:\mathcal{X}\to\mathcal{Y}$ that minimizes the overall risk

$$R(h)=\mathbb{E}_{\mathbf{x}}\bigl[R(h(\mathbf{x})\mid\mathbf{x})\bigr]. \tag{8}$$

Clearly, if $h$ minimizes the conditional risk $R(h(\mathbf{x})\mid\mathbf{x})$ for every sample $\mathbf{x}$, the overall risk $R(h)$ is also minimized. This yields the Bayesian decision rule: to minimize the overall risk, choose for each sample the category label that minimizes the conditional risk $R(c\mid\mathbf{x})$:

$$h^{*}(\mathbf{x})=\mathop{\arg\min}_{c\in\mathcal{Y}} R(c\mid\mathbf{x}), \tag{9}$$

where $h^{*}$ is the Bayes optimal classifier, its overall risk $R(h^{*})$ is called the Bayes risk, and $1-R(h^{*})$ reflects the best performance a classifier can achieve. For classification, $\lambda_{ij}$ is the 0-1 loss:

$$\lambda_{ij}=\begin{cases}0, & i=j,\\ 1, & \text{otherwise}.\end{cases} \tag{10}$$

The conditional risk then becomes $R(c\mid\mathbf{x})=1-P(c\mid\mathbf{x})$, so the Bayes optimal classifier that minimizes the classification error rate is

$$h^{*}(\mathbf{x})=\mathop{\arg\max}_{c\in\mathcal{Y}} P(c\mid\mathbf{x}). \tag{11}$$

In other words, under the 0-1 loss, maximizing the posterior probability is equivalent to minimizing the expected risk.
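The minimum-risk rule and its 0-1 loss special case can be checked numerically with a short sketch: compute the conditional risk of every label from a posterior vector and a loss matrix, then take the minimum. The function names are illustrative, and `loss[i, j]` is taken to be the loss of deciding $c_i$ when the true label is $c_j$.

```python
import numpy as np


def conditional_risk(posterior, loss):
    """R(c_i | x) = sum_j loss[i, j] * P(c_j | x) for every candidate label c_i."""
    return np.asarray(loss, dtype=float) @ np.asarray(posterior, dtype=float)


def bayes_decision(posterior, loss=None):
    """Index of the label with minimum conditional risk; uses the 0-1 loss when loss is None."""
    n = len(posterior)
    if loss is None:
        loss = 1.0 - np.eye(n)          # 0 on the diagonal, 1 elsewhere
    return int(np.argmin(conditional_risk(posterior, loss)))
```

With the 0-1 loss the decision coincides with the maximum-posterior label; an asymmetric loss matrix, for example one that heavily penalizes a particular confusion, can move the decision to a different label.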

4.3. Multibiometric Fusion Authentication using Dynamic Bayesian Decision (MFDB-Decision)

To fuse the fingerprint and voiceprint recognition systems, a score vector $\mathbf{x}=[s_f,s_v]$ containing the outputs of both recognition systems is constructed, where $s_f$ and $s_v$ are the scores obtained from the fingerprint and voiceprint recognition systems, respectively. Identity verification then becomes the problem of classifying the two-dimensional score vector $\mathbf{x}$ as accept (genuine) or reject (impostor).

The classifier of each modality should carry a different weight in the multimodal classifier. However, these weights should not be fixed, because the dominant biometric is not always the same one. We therefore propose the dynamic Bayesian decision algorithm MFDB-decision to obtain the best fusion recognition accuracy. Algorithm 1 is described in detail as follows.

Input: fingerprint matching scores, voiceprint matching scores, quality failure threshold
(1) repeat
(2) ifthen
(4) else ifthen
(6) else ifthen
(8) end if; output the category judgment result
(9) until the (k+1)-th test sample

The output of the algorithm is the category to which the test sample belongs. The inputs are the matching scores that the fingerprint and voiceprint classifiers produce for each of the N categories, where N is the total number of categories and k is the total number of test samples. The algorithm uses a weight $\omega_f$ for the fingerprint recognition system, a weight $\omega_v$ for the voiceprint recognition system, and a quality failure threshold $\tau$. Here, we set $\tau$ to the average score of the current person over the N templates, and the weight is cycled from 0.3 to 0.6 in steps of 0.01. In step 5, the algorithm uses (10) for the calculation. In step 7, because the voiceprint quality is high, the algorithm sets $\omega_v$ greater than $\omega_f$.
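The weight cycling from 0.3 to 0.6 in steps of 0.01 can be sketched as a grid search. The selection criterion (accuracy on a labelled validation set), the weighted-sum fusion of normalized scores, and the coupling $\omega_v=1-\omega_f$ are all assumptions for illustration, and `sweep_fingerprint_weight` is a hypothetical helper rather than part of Algorithm 1.

```python
import numpy as np


def sweep_fingerprint_weight(S_f, S_v, labels, start=0.30, stop=0.60, step=0.01):
    """Grid-search the fingerprint weight w_f; the voiceprint weight is taken as 1 - w_f.

    S_f, S_v: (K, N) normalized matching scores of K validation samples against N templates.
    labels:   (K,) true template indices.
    Returns (best_w_f, best_accuracy); ties keep the smallest weight.
    """
    S_f, S_v = np.asarray(S_f, dtype=float), np.asarray(S_v, dtype=float)
    labels = np.asarray(labels)
    grid = np.round(np.arange(start, stop + step / 2, step), 2)   # 0.30, 0.31, ..., 0.60
    best_w, best_acc = None, -1.0
    for w_f in grid:
        fused = w_f * S_f + (1.0 - w_f) * S_v                     # weighted-sum fusion
        acc = float(np.mean(fused.argmax(axis=1) == labels))
        if acc > best_acc:
            best_w, best_acc = float(w_f), acc
    return best_w, best_acc
```

Rounding the grid to two decimals avoids floating-point drift such as 0.44999999999999996 when the weights are reported.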

4.3.1. Fingerprint Matching Score Calculation

Fingerprint matching algorithms fall mainly into three kinds of schemes: correlation-based, minutiae-based, and non-minutiae-based matching. Each scheme first forms a fingerprint feature vector template, against which authentication is performed. After the feature vectors of two fingerprint images are extracted, they can be expressed as two finite point sets

$$\mathcal{T}=\{\mathbf{m}_1,\ldots,\mathbf{m}_p\},\ \mathbf{m}_i=(x_i,y_i,t_i,\theta_i);\qquad \mathcal{I}=\{\mathbf{m}'_1,\ldots,\mathbf{m}'_q\},\ \mathbf{m}'_j=(x'_j,y'_j,t'_j,\theta'_j), \tag{12}$$

where $x$ and $y$ are the coordinates of a minutia, $t$ denotes the minutia type (for example, a bifurcation or a ridge ending), and $\theta$ denotes the direction along the main ridge. Because fingerprints are collected while the finger is pressed on the sensor, the captured prints are easily offset. Therefore, geometric constraints are imposed on matching minutiae during authentication, namely on the spatial distance and on the deviation of minutia directions:

$$sd(\mathbf{m}'_j,\mathbf{m}_i)=\sqrt{(x'_j-x_i)^2+(y'_j-y_i)^2}\le r_0, \tag{13}$$

$$dd(\mathbf{m}'_j,\mathbf{m}_i)=\min\bigl(\lvert\theta'_j-\theta_i\rvert,\ 360^\circ-\lvert\theta'_j-\theta_i\rvert\bigr)\le\theta_0. \tag{14}$$

Following global registration, a local search can be performed [36], where $r_0$ is a reasonable distance threshold for the offset of a minutia and $\theta_0$ is the permissible deviation from the distortion estimate obtained from the ridge pattern. Two minutiae that satisfy both (13) and (14) are considered matching feature points, and two fingerprints with enough matching feature points are considered matched. The fingerprint matching score (15) is then calculated from $n_m$, the number of minutiae matched within the thresholds in both images, and from $p$ and $q$, the numbers of minutiae in the template vector and the test vector, respectively.
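Constraints (13) and (14) can be sketched as a greedy pairing of already-aligned minutiae, each given as `(x, y, theta)` with `theta` in degrees. Greedy first-fit pairing is a simplifying assumption (an optimal assignment may pair more minutiae), the minutia type is ignored, and because the exact form of score (15) is not given here, `similarity_score` uses $n_m^2/(p\,q)$ purely as a hypothetical normalization.

```python
import math


def count_matched_minutiae(template, query, r0, theta0):
    """Greedy one-to-one pairing of registered minutiae (x, y, theta_deg).

    A pair matches when its spatial distance is <= r0 and its circular
    direction difference is <= theta0 degrees.
    """
    used = set()
    matched = 0
    for (x, y, th) in template:
        for j, (xq, yq, thq) in enumerate(query):
            if j in used:
                continue
            sd = math.hypot(xq - x, yq - y)          # spatial distance
            diff = abs(thq - th) % 360
            dd = min(diff, 360 - diff)               # direction difference on the circle
            if sd <= r0 and dd <= theta0:
                used.add(j)
                matched += 1
                break
    return matched


def similarity_score(n_matched, n_template, n_query):
    """Hypothetical normalization n_m^2 / (p * q); the paper's formula (15) may differ."""
    return n_matched ** 2 / (n_template * n_query)
```

The circular difference matters near 0/360 degrees: directions of 350 and 5 degrees differ by 15 degrees, not 345.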

4.3.2. Voiceprint Matching Score Calculation

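For the D-dimensional acoustic feature vectors $\mathbf{x}_t$, $t=1,\ldots,T$, the Gaussian mixture probability density used to calculate the likelihood is given in (1). Parameter estimation iteratively updates the model parameters $\lambda$ so that $p(X\mid\lambda^{(k+1)})\ge p(X\mid\lambda^{(k)})$ until convergence. By Jensen's inequality, this reduces to maximizing the auxiliary function

$$Q(\lambda,\bar{\lambda})=\sum_{t=1}^{T}\sum_{i=1}^{M}\Pr(i\mid\mathbf{x}_t,\lambda)\,\log\bigl[\bar{w}_i\,\bar{p}_i(\mathbf{x}_t)\bigr] \tag{16}$$

subject to the constraint $\sum_{i=1}^{M}\bar{w}_i=1$. Setting the partial derivatives with respect to the means, weights, and covariances to zero (with a Lagrange multiplier for the weight constraint) yields update formulas for $\bar{w}_i$, $\bar{\boldsymbol{\mu}}_i$, and $\bar{\boldsymbol{\Sigma}}_i$. Given the training feature vectors of speaker I, the parameters are usually obtained by the EM algorithm, but its computational cost is high. This paper therefore adopts the adaptation method proposed by Reynolds: the speaker model is fitted to the observed feature vectors of speaker I by maximum a posteriori (MAP) adaptation of a universal background model (UBM), which turns the problem into an optimization problem. As in the E step of the EM algorithm, the sufficient statistics $\Pr(i\mid\mathbf{x}_t)$, $n_i=\sum_{t}\Pr(i\mid\mathbf{x}_t)$, $E_i(\mathbf{x})=\frac{1}{n_i}\sum_{t}\Pr(i\mid\mathbf{x}_t)\,\mathbf{x}_t$, and $E_i(\mathbf{x}^2)=\frac{1}{n_i}\sum_{t}\Pr(i\mid\mathbf{x}_t)\,\mathbf{x}_t^2$ of each UBM Gaussian component are computed first. The difference lies in the M step, where the weights, means, and variances of the speaker model are updated as

$$\hat{w}_i=\bigl[\alpha_i^{w}n_i/T+(1-\alpha_i^{w})\,w_i\bigr]\gamma, \tag{17}$$

$$\hat{\boldsymbol{\mu}}_i=\alpha_i^{m}E_i(\mathbf{x})+(1-\alpha_i^{m})\,\boldsymbol{\mu}_i, \tag{18}$$

$$\hat{\boldsymbol{\sigma}}_i^{2}=\alpha_i^{v}E_i(\mathbf{x}^2)+(1-\alpha_i^{v})\bigl(\boldsymbol{\sigma}_i^{2}+\boldsymbol{\mu}_i^{2}\bigr)-\hat{\boldsymbol{\mu}}_i^{2}. \tag{19}$$

Here $\alpha_i^{\rho}=n_i/(n_i+r^{\rho})$, $\rho\in\{w,m,v\}$, is an adaptation coefficient that controls the balance between the old and new estimates, where $r^{\rho}$ is a fixed relevance factor, and $\gamma$ ensures that the weights always sum to 1. Usually, only the means are updated. If the test speech feature vector set of speaker J, who claims to be the I-th speaker with model $\lambda_I$, is $Y=\{\mathbf{y}_1,\ldots,\mathbf{y}_T\}$ and the universal background model is $\lambda_{\text{UBM}}$, the system's log-likelihood score is

$$\Lambda(Y)=\frac{1}{T}\sum_{t=1}^{T}\bigl[\log p(\mathbf{y}_t\mid\lambda_I)-\log p(\mathbf{y}_t\mid\lambda_{\text{UBM}})\bigr]. \tag{20}$$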
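Mean-only MAP adaptation and the averaged log-likelihood-ratio score can be sketched in NumPy for diagonal-covariance GMMs. The speaker model keeps the UBM weights and variances, as is usual when only the means are adapted; the relevance factor `r=16.0` and all function names are assumptions for illustration.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """log N(x_t; mu_i, diag(var_i)) for all frames and components -> (T, M)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (D * np.log(2.0 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))


def avg_log_likelihood(X, w, means, variances):
    """(1/T) * sum_t log p(x_t | lambda) for a diagonal-covariance GMM."""
    log_wp = np.log(w)[None, :] + log_gaussian_diag(X, means, variances)
    return float(np.logaddexp.reduce(log_wp, axis=1).mean())


def map_adapt_means(X, ubm_w, ubm_means, ubm_vars, r=16.0):
    """Mean-only MAP adaptation with alpha_i = n_i / (n_i + r)."""
    log_wp = np.log(ubm_w)[None, :] + log_gaussian_diag(X, ubm_means, ubm_vars)
    post = np.exp(log_wp - np.logaddexp.reduce(log_wp, axis=1)[:, None])   # Pr(i | x_t)
    n = post.sum(axis=0)                                                   # n_i
    ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]                      # E_i(x)
    alpha = (n / (n + r))[:, None]
    return alpha * ex + (1.0 - alpha) * ubm_means


def llr_score(Y, spk_means, ubm_w, ubm_means, ubm_vars):
    """Average log-likelihood ratio of claimed-speaker model vs. UBM."""
    return (avg_log_likelihood(Y, ubm_w, spk_means, ubm_vars)
            - avg_log_likelihood(Y, ubm_w, ubm_means, ubm_vars))
```

Components that see little enrollment data (small $n_i$) keep means close to the UBM, which is what makes adaptation robust for short utterances.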

4.4. Theoretical Support for the MFDB-Decision Algorithm

Algorithm 1 leads to the following lemma.

Lemma 1. The MFDB-decision algorithm can be generalized to L classifiers when the classifiers are mutually independent.

Proof. Suppose a sample $Z$ to be identified is fed into L different classifiers, the output of the $l$-th classifier for $Z$ is $\mathbf{x}_l$, $l=1,\ldots,L$, and $Z$ must be identified as one of the N classes. According to Bayesian decision theory for minimizing the risk of loss, the fused sample is assigned to the class with the highest posterior probability among the N classes, so (11) can be written as

$$Z\to c_j\quad\text{if}\quad P(c_j\mid\mathbf{x}_1,\ldots,\mathbf{x}_L)=\max_{k=1,\ldots,N}P(c_k\mid\mathbf{x}_1,\ldots,\mathbf{x}_L). \tag{21}$$

We assume that the L classifiers are independent, so the class-conditional joint likelihood factorizes as

$$p(\mathbf{x}_1,\ldots,\mathbf{x}_L\mid c_k)=\prod_{l=1}^{L}p(\mathbf{x}_l\mid c_k). \tag{22}$$

Substituting (22) into (21) via Bayes' rule gives

$$Z\to c_j\quad\text{if}\quad P^{-(L-1)}(c_j)\prod_{l=1}^{L}P(c_j\mid\mathbf{x}_l)=\max_{k}\,P^{-(L-1)}(c_k)\prod_{l=1}^{L}P(c_k\mid\mathbf{x}_l). \tag{23}$$

If the posterior probability fluctuates only slightly around the prior probability, that is,

$$P(c_k\mid\mathbf{x}_l)=P(c_k)\,(1+\delta_{kl}),\qquad \lvert\delta_{kl}\rvert\ll 1, \tag{24}$$

then substituting (24) into (23) gives

$$P^{-(L-1)}(c_k)\prod_{l=1}^{L}P(c_k\mid\mathbf{x}_l)=P(c_k)\prod_{l=1}^{L}(1+\delta_{kl})\approx P(c_k)\Bigl(1+\sum_{l=1}^{L}\delta_{kl}\Bigr), \tag{25}$$

where the approximation uses the first-order Taylor expansion. Since $P(c_k)\,\delta_{kl}=P(c_k\mid\mathbf{x}_l)-P(c_k)$, we obtain a general multiclassifier fusion strategy that holds for independent feature classifiers:

$$Z\to c_j\quad\text{if}\quad (1-L)P(c_j)+\sum_{l=1}^{L}P(c_j\mid\mathbf{x}_l)=\max_{k}\Bigl[(1-L)P(c_k)+\sum_{l=1}^{L}P(c_k\mid\mathbf{x}_l)\Bigr]. \tag{26}$$

Combining the minimum-loss criterion (11) with (26), and letting each classifier make its own maximum-posterior decision, i.e., $\Delta_{kl}=1$ if $P(c_k\mid\mathbf{x}_l)=\max_{i}P(c_i\mid\mathbf{x}_l)$ and $\Delta_{kl}=0$ otherwise, the best overall decision is

$$Z\to c_j\quad\text{if}\quad\sum_{l=1}^{L}\Delta_{jl}=\max_{k}\sum_{l=1}^{L}\Delta_{kl}. \tag{27}$$

Under the premise that all L classifiers are correct, the fusion result of (27) is guaranteed to be correct. However, (27) ignores the influence of each classifier's posterior probability on the result: if some of the individual classifications are wrong, the errors accumulate and propagate, resulting in low robustness. To reduce the influence of such errors on the fused result, score-level information is added to assist the decision. That is, the weights are adjusted dynamically in each fusion round, mainly according to the scores the current user obtains at the matching level. When the current matching score of a modality is within the qualified threshold, its quality is judged good and a larger weight is assigned; otherwise, its quality is judged poor and a smaller weight is assigned. With a suitable threshold as the decision key, the score-assisted decision modifies (27) into

$$Z\to c_j\quad\text{if}\quad\sum_{l=1}^{L}\omega_l\,\Delta_{jl}=\max_{k}\sum_{l=1}^{L}\omega_l\,\Delta_{kl},\qquad \omega_l=\begin{cases}\omega_{\text{high}}, & s_l\ge\tau_l,\\ \omega_{\text{low}}, & s_l<\tau_l,\end{cases} \tag{28}$$

where $\omega_l$ is a dynamic weight reflecting the quality of the $l$-th single modality as judged from its matching score $s_l$, $\omega_{\text{high}}$ and $\omega_{\text{low}}$ are positive numbers with $\omega_{\text{high}}>\omega_{\text{low}}$, and $\tau_l$ is the quality threshold, which we usually set to the average score of the current person over the N templates (see Section 4.3 for the settings used in this paper). In this way, when errors accumulate, the influence of poor-quality features on the final vote is reduced, which improves the robustness of the algorithm. By formula (28), the lemma is proved.
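Rule (28) can be sketched as a weighted vote over the hard per-classifier decisions. The default weights `w_high=0.6` and `w_low=0.4` are hypothetical values chosen for illustration (the paper cycles weights between 0.3 and 0.6), and the function name `weighted_quality_vote` is also illustrative.

```python
import numpy as np


def weighted_quality_vote(posteriors, scores, thresholds, w_high=0.6, w_low=0.4):
    """Weighted vote over hard decisions in the spirit of rule (28).

    posteriors: (L, N) estimates of P(c_k | x_l) from L classifiers over N classes.
    scores:     (L,) matching-level score s_l of each classifier for this sample.
    thresholds: (L,) quality thresholds tau_l.
    Returns the index of the winning class (ties go to the lowest index).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    L, N = posteriors.shape
    votes = np.zeros((L, N))
    votes[np.arange(L), posteriors.argmax(axis=1)] = 1.0     # Delta_kl: each classifier's own decision
    # omega_l: larger weight when the matching score reaches the quality threshold
    weights = np.where(np.asarray(scores) >= np.asarray(thresholds), w_high, w_low)
    return int((weights[:, None] * votes).sum(axis=0).argmax())
```

With two modalities that disagree, the one whose matching score passes its threshold decides the outcome; if both pass or both fail, the weights are equal and the tie is broken by class index, which a real system would need to handle more carefully.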

Corollary 2. If the L classifiers are independent and the number of categories is fixed, the error rate of the MFDB-decision algorithm tends to zero as the number of classifiers tends to infinity.

Proof. Assume that the L classifiers are independent and that the MFDB-decision algorithm achieves the minimum error rate in Lemma 1. Then the classification error probability of each classifier is lower than that of random selection among N categories, i.e., $\varepsilon_l<\frac{N-1}{N}$. The error rate can therefore be expressed as

$$P_e=\prod_{l=1}^{L}\varepsilon_l<\Bigl(\frac{N-1}{N}\Bigr)^{L}, \tag{29}$$

where $\varepsilon_l$ is the error rate of the $l$-th classifier. When the number of categories N is fixed and the number of classifiers L tends to infinity, the classification error rate satisfies

$$0\le\lim_{L\to\infty}P_e\le\lim_{L\to\infty}\Bigl(\frac{N-1}{N}\Bigr)^{L}=0. \tag{30}$$

Since N is fixed, $\frac{N-1}{N}$ is a constant less than 1, so the limit in (30) equals 0 by the squeeze (sandwich) theorem. Therefore, as the number of classifiers L increases, the classification error tends to 0.
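The trend stated in Corollary 2 can be illustrated with a small Monte Carlo simulation. It assumes plurality voting over independent classifiers that are each correct with probability `p_correct` and otherwise choose a wrong class uniformly at random; this is a simplified model for illustration, not the MFDB-decision rule itself, and the parameters are arbitrary.

```python
import numpy as np


def simulate_plurality_error(L, N, p_correct, trials=5000, seed=0):
    """Monte Carlo error rate of plurality voting over L independent classifiers.

    The true class is 0. Each classifier is correct with probability p_correct and
    otherwise picks one of the N - 1 wrong classes uniformly at random.
    Ties between top-voted classes are broken uniformly at random.
    """
    rng = np.random.default_rng(seed)
    wrong = (1.0 - p_correct) / (N - 1)
    probs = np.array([p_correct] + [wrong] * (N - 1))
    errors = 0
    for _ in range(trials):
        counts = rng.multinomial(L, probs)                 # votes received by each class
        winners = np.flatnonzero(counts == counts.max())
        if rng.choice(winners) != 0:
            errors += 1
    return errors / trials
```

With `p_correct=0.6` and `N=5`, a single classifier errs about 40% of the time, and the estimated error shrinks toward zero as `L` grows, consistent with the corollary under this independence model.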

This section not only proves the feasibility of MFDB-decision theoretically but also identifies a condition under which the classification error rate decreases as classifiers are added. The proposed algorithm is therefore suitable for generalization to multimodal recognition beyond two modalities.

5. Experimental Results and Analysis

To test the effectiveness and practicability of the MFDB-decision algorithm, we used three public databases and one self-collected database. The three public databases were the FVC2002 DB1 database (fingerprint), the MIT Media Lab Speech Dataset (speech), and the TIMIT corpus (speech); the self-collected database was the hdu2016_40 database (short speech). FVC2002 DB1, released for the FVC2002 Fingerprint Verification Competition, is a standard difficult fingerprint dataset with 100 fingers and eight samples per finger. General algorithms do not achieve high recognition rates on such difficult fingerprints: without optimization, the recognition rate is generally below 95%. In the fingerprint verification competition, participants could raise the recognition rate above 99% through specific optimization, but such algorithms do not generalize well. Since this test mainly investigates the effect of fusing the two types of biometrics, very high single-modal accuracy would leave little room to observe that effect. Therefore, all of the following tests used a universal fingerprint recognition algorithm without optimization specific to the FVC database. The MIT Media Lab Speech Dataset consists of 48 enrolled speakers (22 female and 26 male) and 40 impostors (17 female and 23 male). Recordings were made with a handheld microphone and with an external headset in a quiet indoor room, a slightly noisy laboratory, and a noisy intersection; each speaker read 108 randomly selected words or sentences in 6 environments. The TIMIT corpus was sponsored by the Defense Advanced Research Projects Agency (DARPA). It contains 630 speakers from 8 dialect regions (438 male and 192 female), each reading 10 sentences, for a total of 6300 sentences: two dialect sentences (SA), five phonetically compact sentences (SX), and three phonetically diverse sentences (SI). The hdu2016_40 corpus contains 40 speakers; each speaker has 25 different short utterances, each utterance was recorded ten times, and each recording lasts 2–3 seconds. The MIT Speech Dataset and the TIMIT corpus are English datasets, whereas hdu2016_40 is a Chinese short-speech dataset. In long-speech voiceprint recognition, accuracy has reached 98% in low-noise environments; in short-speech voiceprint recognition (utterances shorter than 5 s), however, accuracy remains unsatisfactory, below 95%, even when the ambient noise is low. This paper tests short-speech voiceprint recognition. Because databases containing both fingerprints and voiceprints from the same subjects are rare, the experiment combined the four databases above for testing. To make the results more reliable, samples from the different databases were randomly selected and combined, and random sampling was repeated multiple times without repeating combinations. False rejection rate (FRR) and false acceptance rate (FAR) are used as the main performance indicators.
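FAR and FRR at a given decision threshold can be computed as in the sketch below. The convention that a score at or above the threshold is accepted, and the helper name, are assumptions for illustration; the genuine acceptance rate (GAR) used in the robustness experiments equals 1 − FRR.

```python
import numpy as np


def far_frr_gar(genuine_scores, impostor_scores, threshold):
    """Error rates at one decision threshold; a score >= threshold means 'accept'.

    FAR: fraction of impostor attempts accepted.
    FRR: fraction of genuine attempts rejected.
    GAR: genuine acceptance rate, 1 - FRR.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr, 1.0 - frr
```

Sweeping the threshold trades FAR against FRR; raising it rejects more impostors at the cost of rejecting more genuine users.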

5.1. Single Modal and Multimodal Comparison

To investigate the effectiveness of the MFDB-decision algorithm, we first examine how much it improves accuracy when the single-modal algorithms have difficulty achieving high accuracy. The four databases above form two dataset combinations: one combines the FVC2002 DB1 database, the MIT Speech Dataset, and the TIMIT corpus; the other combines the FVC2002 DB1 database and the hdu2016_40 short-speech database. The TIMIT corpus was used to train the English Universal Background Model (UBM), and the Chinese UBM was trained on a set of Chinese speech randomly collected from the Internet. In the experiment, fingerprint recognition used a general feature-based matching algorithm, and voiceprint recognition used the GMM. Neither algorithm received additional targeted optimization, so the recognition rates on the difficult fingerprints and on short speech were not high, as shown in Table 1.

Table 1 shows the experimental differences between the single-modal and multimodal settings. '#1' and '#2' denote the two voiceprint test databases, in Chinese and English, respectively. Because the Chinese database was collected by the authors, its voice quality is better, so its recognition accuracy is relatively high. The results show the superiority of our algorithm over single-modal recognition. For fingerprint matching, we use the method proposed in Section 4.3.1: formulae (13) and (14) compute the geometric distance and the angular deviation of the minutiae, and after setting the parameters of these two measures, the similarity score is calculated by (15). For voiceprint matching, we use the method proposed in Section 4.3.2 with a 128-mixture GMM, and the likelihood score is calculated by formula (20). The results show that, by fusing the two single modalities, the MFDB-decision algorithm achieves more stable and accurate results.
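The fingerprint step pairs minutiae whose geometric distance and angular deviation both fall within tolerances. The sketch below shows one common way to do this: greedy one-to-one pairing of already-aligned minutiae, scored as n² / (|T|·|Q|) where n is the number of matched pairs. The tolerances `r0` and `theta0`, the greedy pairing, and the normalization are assumptions chosen for illustration; they are not the paper's formulae (13) to (15), whose parameter values are not given here.

```python
import math

# (x, y, direction in radians); hypothetical minutia representation
Minutia = tuple[float, float, float]


def angle_diff(a: float, b: float) -> float:
    # Smallest absolute difference between two angles, in [0, pi]
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def match_score(template: list[Minutia], query: list[Minutia],
                r0: float, theta0: float) -> float:
    """Greedy one-to-one pairing of pre-aligned minutiae; returns n^2 / (|T| * |Q|)."""
    if not template or not query:
        return 0.0
    used: set[int] = set()
    matched = 0
    for tx, ty, tt in template:
        for j, (qx, qy, qt) in enumerate(query):
            if j in used:
                continue
            # a pair counts only if both the distance and the angle are within tolerance
            if math.hypot(tx - qx, ty - qy) <= r0 and angle_diff(tt, qt) <= theta0:
                used.add(j)
                matched += 1
                break
    return matched ** 2 / (len(template) * len(query))
```

Greedy pairing is simple but order-dependent; an optimal assignment would avoid that at higher cost.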

5.2. Robustness of the MFDB-Decision Algorithm

The fingerprints of 100 people (FVC2002 DB1, 100 training impressions and 700 testing impressions in total) were divided into 60 groups consisting of people 1–40, 2–41, 3–42, and so on. Each group thus contained 40 individuals, each with 1 training fingerprint, 7 testing fingerprints, 25×5 training voices, and 25×5 test voices, for a total of 40×7×25×5 = 35,000 tests per group. Every test fingerprint was paired with every test voiceprint of the same individual, and the average over the 35,000 tests was taken as the test result of the group. Figure 5 plots the recognition rate, i.e., the genuine acceptance rate (GAR), of each group after applying the MFDB-decision algorithm, where "1–10" in the legend denotes the recognition rate obtained after fusing the voiceprints and fingerprints of the first to tenth groups.
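The grouping and the per-group test count can be checked with a few lines of arithmetic. The sketch below builds the overlapping groups of 40 consecutive people and multiplies out the pairings, assuming each person's voices are split into 25 utterances × 5 recordings for testing, which is what the per-group total of 35,000 implies. The function names are hypothetical.

```python
def sliding_groups(n_people: int = 100, group_size: int = 40,
                   n_groups: int = 60) -> list[range]:
    # Group i (0-based) covers people i+1 .. i+group_size: 1-40, 2-41, 3-42, ...
    if n_groups + group_size - 1 > n_people:
        raise ValueError("not enough people for the requested number of groups")
    return [range(i + 1, i + group_size + 1) for i in range(n_groups)]


def tests_per_group(people: int = 40, test_fingerprints: int = 7,
                    utterances: int = 25, test_recordings: int = 5) -> int:
    # Each test fingerprint of a person is paired with each test voice of that person
    return people * test_fingerprints * utterances * test_recordings
```

With the defaults, the first group covers people 1 to 40, the last covers people 60 to 99, and each group yields 40 × 7 × 125 = 35,000 fused tests.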

Figures 5 and 6 show that the fusion model is highly stable, with recognition rates concentrated between 97% and 100%. Moreover, the fusion recognition rate is 4.46% and 5.96% higher than that of single-modal fingerprint and voiceprint recognition, respectively.

5.3. Effectiveness of the MFDB-Decision Algorithm

We test the effectiveness of the MFDB-decision algorithm by comparing its recognition rate with those of several general fusion algorithms. The experiment used the same grouping method as in Section 5.2. The fusion recognition rate of each group was calculated with each algorithm, and the average over all groups was taken as the final recognition rate of that fusion algorithm; averaging in this way reduces the influence of chance. We plotted the recognition rates of 10 randomly selected groups and compared them with two other methods, the AND rule and the fixed-weight voting method [37]. As shown in Figure 7, the MFDB-decision algorithm not only achieves a high recognition rate but also shows good stability, and its recognition rate is higher than those of the other algorithms. The fixed-weight method is more stable than the MFDB-decision algorithm, but its recognition rate is lower. In many cases, it is worthwhile to sacrifice some stability for a better recognition rate. Table 2 lists the average accuracy of the three multimodal methods.
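For reference, the two baseline decision rules can be sketched as follows: the AND rule accepts only when every modality accepts, and fixed-weight voting accepts when the weighted share of accepting modalities reaches a threshold. The weights and the 0.5 threshold are hypothetical, not the settings of [37], and the MFDB-decision algorithm itself (a dynamic Bayesian method) is not reproduced here. `mean_rate` mirrors the averaging of per-group recognition rates described above.

```python
from collections.abc import Sequence


def and_rule(decisions: Sequence[bool]) -> bool:
    # Accept only if every modality (e.g. fingerprint and voiceprint) accepts
    return all(decisions)


def fixed_weight_vote(decisions: Sequence[bool], weights: Sequence[float],
                      threshold: float = 0.5) -> bool:
    # Accept if the weighted share of accepting modalities reaches the threshold
    total = sum(weights)
    support = sum(w for d, w in zip(decisions, weights) if d)
    return support / total >= threshold


def mean_rate(group_rates: Sequence[float]) -> float:
    # Final recognition rate of a fusion method: mean over all groups
    if not group_rates:
        raise ValueError("no group rates given")
    return sum(group_rates) / len(group_rates)
```

With only two modalities and a 0.5 threshold, fixed-weight voting reduces to following the more heavily weighted modality, which helps explain why it is stable but cannot adapt its weights to the quality of each sample.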

The MFDB-decision algorithm uses score information from the matching level to assist recognition at the decision level, which helps recover information that is otherwise lost in the decision-making process. Figure 8 shows the detection error tradeoff (DET) curves of the various fusion methods. The PCA method uses the principal component analysis approach described in [38]. Because the unprocessed MFCC sequences of the voiceprints were not sufficiently discriminative, the accuracy of the PCA method was low in this experiment. The fuzzy rule method applies fuzzy logic at the decision level [25] and achieves a significant performance improvement. All curves follow a similar trend: FAR decreases as FRR increases. The results show that all of these fusion methods are effective for fusing fingerprint and voiceprint, but our algorithm achieves better results than the others. Each curve intersects the diagonal at the point where FRR = FAR, which is the equal error rate (EER) point; generally, the lower the EER, the better the algorithm performs.
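The EER point on a DET curve can be located numerically. The sketch below sweeps the threshold over every observed score, computes FAR and FRR at each one, and reports the point where the two are closest, taking their mean as the EER estimate. It assumes higher scores mean better matches; the function name and the example scores are hypothetical, and finer estimates would interpolate between thresholds.

```python
from collections.abc import Sequence


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> tuple[float, float]:
    """Approximate the EER by sweeping thresholds over all observed scores.

    Returns (eer, threshold) at the threshold where |FAR - FRR| is smallest.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best = None  # (gap, eer estimate, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

For genuine scores [0.9, 0.8, 0.4, 0.95] and impostor scores [0.1, 0.3, 0.6, 0.2], FAR and FRR meet at a threshold of 0.6, giving an EER of 0.25. The (FAR, FRR) pairs collected along the sweep are the points of the DET curve.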

6. Conclusions

In this paper, we proposed a multimodal biometric recognition algorithm, named MFDB-decision, and demonstrated its effectiveness. It addresses the problem that fixed weights cannot be assigned adaptively in multimodal recognition, which leads to poor fusion performance. We compared the fusion results with those of single-modal recognition and of other methods, and found that the method improved the recognition rate by an average of 5.0% or more. The multimodal fusion method we developed can also be applied to other fusion recognition tasks. Future work will focus on multimodal biometric key extraction, ubiquitous identity authentication, and encryption technologies.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research is supported by National Key R&D Program of China (no. 2016YFB0800201), National Natural Science Foundation of China (no. 61772162), joint fund of National Natural Science Fund of China (no. U1709220), and Zhejiang Natural Science Foundation of China (no. LY16F020016).