Feature-Level vs. Score-Level Fusion in the Human Identification System
The design of a robust human identification system is in high demand in most modern applications such as internet banking and security, where the multifeature biometric system, also called a feature fusion biometric system, is one of the common solutions that increase system reliability and improve recognition accuracy. This paper presents a comprehensive comparison between two fusion methods, feature-level fusion and score-level fusion, to determine which method improves the overall system performance more. The comparison takes into consideration the image quality of the six combination datasets as well as the type of feature extraction method applied. Four feature extraction methods, local binary pattern (LBP), gray-level co-occurrence matrix (GLCM), principal component analysis (PCA), and Fourier descriptors (FDs), are applied separately to generate the face-iris machine vector dataset. The experimental results show that the recognition accuracy improves significantly when a texture descriptor method, such as LBP, or a statistical method, such as PCA, is used with score-level rather than feature-level fusion for all combination datasets. The maximum recognition accuracy of 97.53% is obtained with LBP and score-level fusion, where the Euclidean distance (ED) is used to measure the maximum accuracy rate at the minimum equal error rate (EER) value.
Due to the digital revolution of the last decades, the high and accelerating demand for automated systems motivates developers, including biometric system designers, to work on improving recognition accuracy and enhancing system performance. The robustness of the biometric system has therefore become the main and vital concern for many developers. One of the most reliable approaches is to employ a multifeature biometric system rather than a unimodal system; consequently, the demand for more biometric traits such as fingerprint, face, iris, palm print, retina, hand geometry, voice, signature, and gait has increased. This paper aims to build a reliable biometric system by fusing the face and iris traits into one multimodal system. The recognition results of the two fusion methods are compared extensively to determine which method achieves the highest accuracy at the minimum EER value. The two fusion techniques are run on six combination datasets with four feature extraction methods: LBP, GLCM, PCA, and FDs. Each combination dataset contains 40 subjects, so the total number of test subjects is 240. The diversity of feature extraction methods and databases supports a deep investigation of the maximum system accuracy with respect to the fusion method, the feature extraction method, and the face-iris combination dataset.
Ijar and Nisha  presented a model for recognizing facial expression based on the support vector machine (SVM) algorithm. The Univalue Segment Assimilating Nucleus (USAN) approach was utilized as a feature extraction after resizing the image into 64 × 64 pixels, where 125 images were used for detecting the face region. The recognition accuracy was 90% with a 16% low error rate. Hamd and Mohammed  applied LBP, GLCM, FDs, and PCA feature extraction methods with feature-level fusion technique to improve the recognition accuracy of human identification system using face-iris traits. Three databases and two combinational datasets were utilized for testing the work. The maximum accuracy rate was measured using the Euclidean distance measurement; it was 100% using LBP and GLCM method, while the PCA and FDs method achieved only 97.5%.
Sharifi and Eskandari applied three fusion methods, score level, feature level, and decision level, to fuse the face and iris traits using the Log-Gabor method; moreover, the backtracking search algorithm (BSA) was employed to improve the recognition accuracy by selecting optimized weights for feature-level and score-level fusion. The verification rates showed a valuable improvement of the three fusion techniques over multimodal and unimodal systems. Zou et al. applied a fusion method to the extracted local and global features for scene classification using a collaborative representation-based classification. The experimental results confirmed that a fusion of LBP and Gabor filters gave significant results. Luo et al. presented a fusion of multifocus images for integrating partially focused images into a single one. The singular values of the high-order decomposition and the edge intensity were both applied to implement the proposal.
Hamd applied three neural network classifiers and the Euclidean distance to optimize the iris feature vector with respect to maximum accuracy. The maximum accuracy corresponding to the optimum machine vector was achieved with the probabilistic and backpropagation networks rather than with the radial basis function network and the Euclidean distance measurement. Dwivedi and Dey implemented an evaluation model using three virtual databases. The work proposed integrating Dempster–Shafer (D–S) theory based decision-level fusion, cancellable modalities for iris-fingerprint features, and score-level fusion. The mean-closure weighting (MCW) score level and D–S theory were both applied to minimize the boundaries in the individual score techniques. The test results of the proposed hybrid-fusion framework for three traits were satisfactory and more robust than those of the individual fusion methods (score level, feature level, and decision level).
Gunasekaran et al. used the local-derivative ternary pattern and the contourlet transform to extract the high variation coefficients. The multimodal fusion of iris, face, and fingerprint gave significant improvements in the accuracy rate. Islam developed a new combined approach of feature fusion and score fusion based on multibiometric systems. Features from left and right iris images were fused, and a log-likelihood ratio based score-level fusion was applied to find the score of iris recognition. The outputs of multiple classifier selection (MCS), namely the individual modalities of each iris (left and right), the feature fusion based modality, and the log-likelihood ratio based modality, were combined. The discrete hidden Markov model (DHMM) was used as the classifier in the model, and PCA was applied as the iris feature reduction approach. Each iris vector has 9600 feature values, of which only 550 were retained. The proposed system achieved a 90% enhancement in accuracy rate over the existing iris recognition approaches under natural lighting conditions at various noise levels.
The D–S theory was applied for fusion at score level by Nguyen et al. . The motivation for applying D–S theory to multibiometric fusion is to take advantage of the uncertainty concept in D–S theory to deal with uncertainty factors that impact biometric accuracy and reliability in less constrained biometric systems. A unified framework was developed to improve the recognition accuracy, where the combination of classifiers performance and quality measures for the input data was weighted and then fused at score level after selecting the significant factors that generate an overall quality score. The proposed approach is robust against image quality variation and classifier accuracy; furthermore, it enabled the multimodal biometric to operate in less constrained conditions. Typically, three masses were considered when evaluating a biometric match: the mass of a genuine enrolment attempt, the mass of an impostor enrolment attempt, and the mass of the uncertainty (either genuine or impostor) of the acquired data and/or classifier. The multimodal biometric fusion of D–S approaches has verified promising performance after appropriately modeling the uncertainty factors. The N classifier accuracy values are also supplied in the form of an EER to strengthen the fusion. These N trios were combined to output a fused score and a fused quality score. When only high-quality data was accepted, the proposed approach achieved a competitive performance (at or close to 100% and with 1% EER) compared with other conventional fusion approaches. The bio-secure DS2 benchmark database  was considered to conduct the tests. It consists of 3 face channels, 6 optical and 6 thermal fingerprint channels, and 2 iris channels (not applied). Those 17 channels contain the data match scores and quality measures information. High quality of face and optical fingerprint achieved high accuracy than low quality face and thermal fingerprint. 
In that work, the EERs varied from 0.08 to 0.09 for HR face optical fingerprint and to 0.33 for thermal fingerprint. The work did not make comparisons with Bayesian fusion approaches, and its results were only compared with other techniques that have similar data requirements. It also differs from our proposal in that it introduces uncertainty factors when computing the accuracy and uses biometric traits other than the iris in its score-level fusion approach.
An investigation was made by Abdolahi et al. to compare the performance of three multimodal approaches for recognizing a fusion of iris and fingerprint traits using the sum rule, the weighted sum rule, and a fuzzy logic method. A multimodal biometric system of combined iris-fingerprint features was implemented and compared with the proposed unimodal systems. The scores of both traits were fused in two phases, the matching score phase and the decision level phase, where the iris trait was given more weight in the fusion than the fingerprint. A small weight value was added to the matching distance using the fuzzy membership function to mimic human thinking and provide enhanced results. The experimental results showed that the fuzzy logic method introduced by Zadeh for fusion at the decision level is the best, followed in order by the weighted sum rule and the classical sum rule.
The accuracy rates, error rates, and matching time were reported as three items for performance evaluation at zero false acceptance rate (FAR) and the corresponding false rejection rate (FRR). The iris and fingerprint features were combined from an equal number of samples from the CASIA-Iris (V1 and V2) and FVC2004 fingerprint databases, and, as assumed by that work, the fusion of fingerprint and iris is more reliable than other biometrics such as the face. It therefore differs from our proposal by considering iris-fingerprint features rather than face-iris combinational features. Furthermore, that multibiometric system design is more complicated, as it applies three different matching algorithms and converts the iris and fingerprint scores to fuzzy sets (fuzzification), so that the fuzzy inference system reports the recognition state as very bad, bad, medium, good, very good, or excellent. The experimental outcomes gave the best compromise between FAR and FRR (0% FAR and 0.05% FRR) with a 99.975% accuracy rate and a 0.038 EER value, with a matching time of 0.1754 sec. Gender classification using a modified LBP was proposed by Ershad to handle the disadvantages of the basic LBP, with a new theory of nonlinear gender classification using the Tani-Moto metric as the distance measure. The comparison with some state-of-the-art algorithms showed the high quality of the proposed approach in terms of accuracy rate. Hammad et al. [15, 16] integrated information from two biometric modalities using a convolutional neural network (CNN) to improve the performance and make the system robust. Moreover, a Q-Gaussian multisupport vector machine (QG-MSVM) was proposed with decision-level fusion and feature-level fusion for [15] and score-level fusion for [16] to complete the classification stage and achieve high accuracy.
The proposed systems were tested on several databases for electrocardiogram (ECG) and fingerprint like PTB and LivDet2015 to show their efficiency, robustness, and reliability in contrast to existing multimodal authentication systems. The proposed system can be deployed in real applications according to their advantages.
The performance of different classification methods and fusion rules was studied by El_Rahman  in the context of multimodal and unimodal biometric systems utilizing the MIT-BIH for electrocardiogram (ECG) database and FVC2004 for fingerprint databases with 47 subjects from virtual multimodal database. The performance of the proposed unimodal and multimodal systems is measured using receiver operating characteristic (ROC) curve and area under the ROC curve (AUC). The experimental results indicated that AUC is 0.985 for sequential multimodal system and 0.956 for parallel multimodal system in contrast to the unimodal systems that achieved only 0.951 and 0.866 for the ECG and fingerprint databases, respectively. The proposed work has concluded that the overall performance of the multimodal systems is better than that of the unimodal systems tested on different classifiers and different fusion rules.
Consequently, few of the listed papers worked on a face-iris based multimodal human identification system that applies four different feature extraction methods along with two comparative fusion methods. Moreover, our proposal is carried out on six combinational datasets built from three face databases and two iris databases. The quality of the databases varies from good to low, and the low-quality facial/iris images add more challenges to the system performance, since blur, low illumination, different poses, and partial occlusion eventually affect the feature extraction results and the recognition accuracy.
2. The Proposed Methods
The two multimodal biometric systems have been designed and evaluated individually; their performances are then compared to determine the best multifeature model, that is, the one that achieves maximum accuracy for each feature extraction method. The three face and two iris databases are combined into six combination datasets. This variety in image sources and qualities supports an accurate evaluation of the overall system performance. An example of how the four feature extraction methods are applied to a facial or iris image is illustrated in the next sections.
2.1. Face-Iris Datasets
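The specifications of the five standard databases are tabulated in Table 1 [18–20]. Each subject must hold two biometric traits (face and iris) in a one-to-one relationship. Each database contains 40 subjects, and the multimodal system classifies 240 subjects (one test image with a different number of training images) over the six combination datasets, as explained in the following equation:
$$N_{\text{total}} = (3 \times 2) \times 40 = 240, \tag{1}$$
where 3 and 2 are the numbers of face and iris databases, respectively, and 40 is the number of subjects per combination dataset.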
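The pairing of face and iris databases can be sketched in a few lines of Python. This is only an illustration of the counting argument; the constant and function names are hypothetical and not taken from the original implementation.

```python
from itertools import product

# Database names as used in this paper; the variable names are illustrative.
FACE_DATABASES = ["UFI", "ORL", "AR"]
IRIS_DATABASES = ["CASIA-V1", "MMU-1"]
SUBJECTS_PER_DATASET = 40


def combination_datasets(face_dbs, iris_dbs):
    """Pair every face database with every iris database."""
    return [f"{face}-{iris}" for face, iris in product(face_dbs, iris_dbs)]


datasets = combination_datasets(FACE_DATABASES, IRIS_DATABASES)
total_subjects = len(datasets) * SUBJECTS_PER_DATASET
```

Here `datasets` holds the six face-iris combinations and `total_subjects` the overall number of classified subjects.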
The two databases, UFI and MMU-1, contain the most difficult images (partial occlusion, different poses, varying contrast, and illumination) that represent the biggest challenges for any biometric system during the detection and classification phases.
2.2. Iris Preprocessing
The image of the iris is preprocessed before applying the feature extraction methods. These processes are pupil localization, iris localization, and iris normalization. Pupil localization uses the connected component labelling algorithm (CCLA) to detect the region of connected pixel. Moreover, Canny detector and circular Hough transform are both applied for iris localization. The final step is the iris normalization which is implemented using Daugman’s Rubber Sheet model. After that, the four feature extraction methods are applied for iris feature extraction and machine vector generation .
2.3. Feature Extraction Methods
The LBP operation is based on the difference between the threshold value (center pixel) and its eight neighbours. The LBP code is set to zero if the difference is less than zero; otherwise, it is set to one. The LBP circle is preconstructed to produce a flexible number of P neighbours with radius R. The LBP operations are presented in equations (2) and (3) [2, 21]:
$$\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p}, \tag{2}$$
$$s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{3}$$
where $g_c$ is the gray value of the center pixel, $g_p$ is the gray value of the P neighbours ($p = 0$ to $P - 1$), and $x = g_p - g_c$ is the intensity difference for every pixel in the image.
The resulting codes are classified into uniform and nonuniform patterns depending on the number of transitions between 1 and 0 bits. The uniform pattern provides two advantages: it selects the important features such as lines, corners, and end edges, and it saves storage by reducing the code length from $2^P$ to $P(P - 1) + 3$. The uniform pattern is used in this work with a radius of 1 pixel and 8 neighbours, since increasing the radius has been reported to decrease the recognition accuracy. The LBP method has been modified into the local ternary pattern (LTP) and the improved local ternary pattern (ILTP) for richer pattern representation, where the LTP has a three-level coding scheme: positive value, negative value, and zero. This technique is very useful for noisy image applications. The ILTP patterns are classified into uniform and nonuniform groups, where the extracted patterns are tagged according to their degree of uniformity. The occurrence probability of those tags is extracted as features. This method is very suitable for rotation-invariant and noise-resistant image applications [23–25].
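The LBP coding and the uniform-pattern reduction described above can be sketched as follows. This is a minimal NumPy illustration assuming 8 neighbours at radius 1 on the square pixel grid (no circular interpolation) and a single shared bin for all nonuniform codes; the function names are hypothetical, and the block is not the implementation used in the experiments.

```python
import numpy as np


def lbp_codes(image):
    """8-neighbour, radius-1 LBP code for every interior pixel."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, col) visited in a fixed circular order.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Bit p is 1 when the neighbour is not darker than the center.
        codes += (neighbour >= center).astype(np.int32) << p
    return codes


def is_uniform(code, bits=8):
    """A pattern is uniform if it has at most two circular 0/1 transitions."""
    pattern = [(code >> i) & 1 for i in range(bits)]
    transitions = sum(pattern[i] != pattern[(i + 1) % bits] for i in range(bits))
    return transitions <= 2


UNIFORM_CODES = [c for c in range(256) if is_uniform(c)]


def uniform_lbp_histogram(image):
    """Normalized histogram: one bin per uniform code plus one shared bin."""
    codes = lbp_codes(image)
    lookup = np.full(256, len(UNIFORM_CODES), dtype=np.int64)
    lookup[UNIFORM_CODES] = np.arange(len(UNIFORM_CODES))
    hist = np.bincount(lookup[codes.ravel()], minlength=len(UNIFORM_CODES) + 1)
    return hist / hist.sum()
```

With P = 8 there are 58 uniform codes, so the histogram has 59 bins, which matches $P(P - 1) + 3$.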
The gray-level co-occurrence matrix is a second-order feature extraction method. It can work at different angles, horizontal 0°, vertical 90°, and diagonal 45° and 135°, where each direction determines a specific relationship. If the directional information is not important in the feature extraction, the four angles can be applied equally without any concern, and this is still true if the image is isotropic. Based on the GLCM, more than six features can be obtained: contrast, homogeneity, energy, entropy, correlation, variance, and autocorrelation. The computation starts by counting specific intensity pairs at specific distances and directional angles over a subimage. The result is a 2D square matrix whose size equals the number of intensity levels. An example of GLCM computation for three gray level values (1–3), 0° angle, and radius equal to 1 is explained in Figure 1.
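The counting step can be illustrated with a small sketch. The code below assumes gray levels indexed from 0, a non-symmetric count of (pixel, neighbour) pairs, and common textbook definitions of four of the features; other variants of these definitions exist. The function names and the sample image are hypothetical, and the image is not the one shown in Figure 1.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Count co-occurring gray-level pairs at offset (dy, dx).

    dx=1, dy=0 corresponds to the 0 degree direction with distance 1.
    """
    img = np.asarray(image)
    rows, cols = img.shape
    matrix = np.zeros((levels, levels), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[img[r, c], img[r2, c2]] += 1
    return matrix


def glcm_features(matrix):
    """Contrast, homogeneity, energy, and entropy of a GLCM."""
    p = matrix / matrix.sum()          # normalize counts to probabilities
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
        "energy": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }


# Hypothetical 3x3 image with gray levels 1..3, shifted to 0..2 for indexing.
sample = np.array([[1, 1, 2],
                   [2, 3, 3],
                   [1, 2, 3]]) - 1
sample_glcm = glcm(sample, levels=3)
sample_features = glcm_features(sample_glcm)
```

For this sample the matrix contains six pairs in total, and the contrast is 4/6 because four of the six pairs differ by exactly one gray level.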
The PCA is a statistical method that depends on computing the covariance matrix of the feature vectors and then its eigenvalues and eigenvectors. This method is a feature extraction and dimension reduction approach, where the dimension of the templates is reduced while maintaining the important features. The mathematical computations of PCA are given as steps in Equations (4)–(7). First, calculate the mean $\bar{x}$ for each image vector $x$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{4}$$
Second, form the mean-centered image for each vector by subtracting the image mean from the image vector, and then compute the covariance matrix $C$ of the vectors as in the following equation:
$$C(x, y) = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \tag{5}$$
where $\bar{x}$ and $\bar{y}$ represent the mean vector values, $x_i$ and $y_i$ are the present values of $x$ and $y$, and $n$ is the number of rows. Then, the eigenvalues $\lambda$ are calculated from matrix $C$ as in the following equation:
$$\det(C - \lambda I) = 0. \tag{6}$$
Finally, for each eigenvalue $\lambda$, the eigenvector $V$ is obtained as follows:
$$(C - \lambda I)\,V = 0. \tag{7}$$
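The PCA steps can be sketched with NumPy as below. The sketch assumes one feature vector per row and the sample covariance (division by n − 1); `pca_fit` and `pca_project` are hypothetical names. For full-size images, a reduced form of the eigenproblem is usually preferred for efficiency, which this illustration does not attempt.

```python
import numpy as np


def pca_fit(vectors, n_components):
    """Return the mean, the top eigenvalues, and the matching eigenvectors."""
    X = np.asarray(vectors, dtype=np.float64)    # one vector per row
    mean = X.mean(axis=0)                        # mean vector
    centered = X - mean                          # mean-centered data
    cov = np.cov(centered, rowvar=False)         # covariance matrix, 1/(n-1)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # symmetric eigenproblem
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return mean, eigenvalues[order], eigenvectors[:, order]


def pca_project(vectors, mean, components):
    """Project vectors onto the retained principal components."""
    return (np.asarray(vectors, dtype=np.float64) - mean) @ components
```

A typical use is to fit on the training templates, keep the leading components, and project both training and test vectors with the same `mean` and `components` before matching.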
The FD is a frequency-domain method derived from the Fourier transform. It is applied to describe the shape in the image, where the described object is invariant to position, rotation, and scale change. The Fourier descriptor provides mainly two components: the DC component, which represents the x-y coordinate of the center point of the border, and the radius of the circle that fits the border points. There are three steps for implementing the FDs procedure. First, the x-y coordinates of the border are converted to complex numbers. Second, the shape signature is formed by computing the centroid distance using equations (8) and (9). Finally, the Fourier coefficients are calculated by equation (10):
$$x_c = \frac{1}{N}\sum_{t=0}^{N-1} x(t), \qquad y_c = \frac{1}{N}\sum_{t=0}^{N-1} y(t), \tag{8}$$
$$r(t) = \sqrt{\left(x(t) - x_c\right)^2 + \left(y(t) - y_c\right)^2}, \tag{9}$$
$$a_n = \frac{1}{N}\sum_{t=0}^{N-1} r(t)\, e^{-j 2\pi n t / N}, \tag{10}$$
where $(x(t), y(t))$ are the coordinates of N samples on the boundary of an image region for t = 0, 1, …, N−1, $(x_c, y_c)$ is the center point of the region, $r(t)$ represents the described boundary shape, and $a_n$ is the Fourier transform coefficient.
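A compact sketch of the centroid-distance descriptor follows. It assumes the boundary is already available as an ordered list of (x, y) samples, and it normalizes the coefficient magnitudes by the DC term to obtain scale invariance, which is a common choice rather than something stated in the text. The function name is hypothetical.

```python
import numpy as np


def fourier_descriptors(boundary_xy, n_coeffs):
    """Centroid-distance Fourier descriptors of an ordered closed boundary."""
    pts = np.asarray(boundary_xy, dtype=np.float64)
    centroid = pts.mean(axis=0)                      # center point of the region
    r = np.hypot(pts[:, 0] - centroid[0],
                 pts[:, 1] - centroid[1])            # centroid-distance signature
    coeffs = np.fft.fft(r) / len(r)                  # discrete Fourier coefficients
    magnitudes = np.abs(coeffs)
    # Dropping the phase and dividing by the DC magnitude removes the
    # dependence on the starting point and on the scale of the shape.
    return magnitudes[1:n_coeffs + 1] / magnitudes[0]
```

Because the signature is measured from the centroid, translating the shape leaves the descriptors unchanged, and a perfect circle yields descriptors that are numerically zero.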
2.4. Feature-Level Fusion
The fusion of face and iris traits improves the stability and reliability of the recognition system by converting the two unimodal systems into one, known as a multimodal biometric system. Figure 2 explains the fusion steps between the face and iris features using four feature extraction methods: PCA, LBP, GLCM, and FDs.
The fused biometric system is tested on six combination datasets: UFI-CASIA-V1, UFI-MMU-1, ORL-CASIA-V1, ORL-MMU-1, AR-CASIA-V1, and AR-MMU-1. The features of each trait are extracted separately, and then the serial rule is applied to implement the fusion, where the face and iris features are concatenated sequentially to produce a new pattern for the classification step and the final decision. Based on the four feature extraction methods, four multimodal biometric systems are produced as shown in Figure 2. The concept of the serial rule is explained in the following equation [29, 30]:
$$F = [X_1, X_2, \ldots, X_q, Y_1, Y_2, \ldots, Y_m], \tag{11}$$
where $X$ refers to the facial features with vector size $q$ and $Y$ refers to the iris features with vector size $m$, where $m$ and $q$ are not equal.
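The serial rule amounts to a concatenation followed by a distance-based match. The sketch below assumes a nearest-neighbour decision with the Euclidean distance over a gallery of fused templates; the function names are hypothetical, and no feature scaling is applied before concatenation.

```python
import numpy as np


def serial_fusion(face_features, iris_features):
    """Concatenate face features (size q) and iris features (size m)."""
    face = np.ravel(np.asarray(face_features, dtype=np.float64))
    iris = np.ravel(np.asarray(iris_features, dtype=np.float64))
    return np.concatenate([face, iris])


def nearest_subject(probe, gallery):
    """Index of the gallery row closest to the probe (Euclidean distance)."""
    distances = np.linalg.norm(np.asarray(gallery) - probe, axis=1)
    return int(np.argmin(distances)), distances
```

Since the two feature sets can have very different ranges, one of them may dominate the distance in practice; whether and how to rescale the features before concatenation is a design decision this sketch leaves open.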
2.5. Score-Level Fusion
The fusion at the score level is commonly used in multibiometric systems [1, 2]. In this technique, the recognition results are calculated for each unimodal system separately, and then the recognition scores are fused into one multimodal system to enhance the overall system performance, as explained in Figure 3. First, the score vectors of the classification process for both traits (face and iris) are calculated separately and normalized as in Equation (12) at the minimum EER value. In the second step, the sum rule in Equation (13) is applied to fuse the face-iris scores. Finally, a decision is made using the threshold that gives the maximum fused system performance [31, 32]:
$$N_i = \frac{S_i - \min_i}{\max_i - \min_i}, \tag{12}$$
$$F = \sum_{i=1}^{M} N_i, \tag{13}$$
where $N_i$ represents the normalized score of biometric sample $i$ (face or iris), $\min_i$ and $\max_i$ are the minimum and maximum values in the score vector of sample $i$, respectively, $S_i$ is the score value of biometric sample $i$, and $M$ refers to the number of biometric systems that have been used (here $M = 2$). Figure 3 shows the steps of building four multimodal systems based on the score-level fusion technique.
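Min-max normalization followed by the sum rule can be sketched as follows. The example assumes that each unimodal matcher returns one score per enrolled subject and that the scores are distances, so the smallest fused value wins; both points are assumptions of this illustration, and the function names are hypothetical.

```python
import numpy as np


def min_max_normalize(scores):
    """Map a score vector linearly onto [0, 1]."""
    s = np.asarray(scores, dtype=np.float64)
    span = s.max() - s.min()
    if span == 0:
        # Degenerate case: all scores equal, so no ranking information.
        return np.zeros_like(s)
    return (s - s.min()) / span


def sum_rule_fusion(face_scores, iris_scores):
    """Fuse two normalized score vectors with the sum rule."""
    return min_max_normalize(face_scores) + min_max_normalize(iris_scores)


# Hypothetical distance scores of one probe against three enrolled subjects.
face_scores = [10.0, 20.0, 30.0]
iris_scores = [0.2, 0.1, 0.4]
fused_scores = sum_rule_fusion(face_scores, iris_scores)
best_match = int(np.argmin(fused_scores))
```

Normalizing first matters because the two matchers can report scores on very different scales; without it, the sum would be dominated by whichever trait has the larger numeric range.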
3. Results and Discussion
The performance of the multimodal systems has been evaluated on six combination datasets, with four feature extraction methods and two fusion methods. The maximum recognition rate is measured by the Euclidean distance at the minimum EER value, which results from the intersection of the FAR and FRR curves.
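The intersection of the FAR and FRR curves can be located numerically with a threshold sweep. The sketch below assumes distance scores (a comparison is accepted when its distance does not exceed the threshold) and takes the EER as the average of FAR and FRR at the threshold where they are closest; the function names and sample scores are hypothetical.

```python
import numpy as np


def far_frr_curves(genuine, impostor, thresholds):
    """FAR and FRR for each threshold, for distance scores."""
    g = np.asarray(genuine, dtype=np.float64)
    im = np.asarray(impostor, dtype=np.float64)
    far = np.array([(im <= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(g > t).mean() for t in thresholds])    # genuines rejected
    return far, frr


def equal_error_rate(genuine, impostor, thresholds):
    """EER and the threshold at which FAR and FRR are closest."""
    far, frr = far_frr_curves(genuine, impostor, thresholds)
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, float(thresholds[idx])


# Hypothetical genuine and impostor distances.
genuine_d = [0.10, 0.20, 0.30, 0.65]
impostor_d = [0.45, 0.70, 0.80, 0.90]
eer, eer_threshold = equal_error_rate(genuine_d, impostor_d,
                                      np.linspace(0.0, 1.0, 101))
```

In this toy example one of four impostors is accepted and one of four genuine attempts is rejected at the crossing point, so both error rates equal 0.25 there.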
4. Recognition Accuracy
From Table 2, the combination dataset UFI-CASIA-V1 achieved a maximum recognition rate of 85.2885 using score-level fusion and the LBP method, while feature-level fusion obtained a rate of 85.0641 using the same feature extraction method. These rates are clearly better than the 67.7885 and 83.0449 obtained by feature-level and score-level fusion, respectively, when the UFI-MMU-1 combination dataset is used. The low accuracy value of 67.7885 belongs to the GLCM approach, and feature-level fusion is strongly affected by the low-quality, blurred images in the UFI-MMU-1 dataset, while LBP with score-level fusion maintained its 83.0449 recognition rate on this low-quality and difficult combination dataset.
Table 3 presents the ORL-CASIA-V1 and ORL-MMU-1 recognition accuracies. Generally, score-level fusion achieved the maximum recognition rates for both combination datasets, reaching 97.5321 and 97.3077 using LBP and PCA, respectively. Despite the blurred iris images in ORL-MMU-1, LBP and PCA remain very competitive and reach their maximum accuracies with the score-level fusion technique. It is worth mentioning that feature-level fusion achieved excellent recognition rates of 97.4359 and 95.4808 using PCA with the same combination datasets, respectively. Finally, from Table 4, LBP achieved most of the maximum accuracies for the two fusion methods carried out on the AR-CASIA-V1 and AR-MMU-1 datasets. The maximum rate is 95.9615, registered for AR-CASIA-V1 with the score-level fusion technique. Therefore, score-level fusion with the LBP method gives the most stable and highest performance for recognizing the face-iris traits in our multimodal system.
5. Maximum Performance
This section describes analytically and graphically the behaviour of FAR, FRR, EER, and a threshold for the maximum recognition accuracies that have been satisfied by LBP and PCA methods in Table 3. Those good results for the two fusion methods are tabulated in Table 5. They reflect the highest competitive performance between the two fusion methods as well as the four feature extraction methods that are carried out on good quality ORL-CASIA-V1 combination dataset. Moreover, the score-level fusion has outperformed the feature-level fusion under all circumstances.
The shaded cells in Table 5 are graphically represented in Figures 4 and 5. These figures represent the maximum recognition rates in our multimodal system, and they clearly show the minimum intersection (EER) values between the FAR and FRR curves at 0.025 and 0.024 for the PCA and LBP methods, respectively. The two threshold values for those EERs are 0.06 and 0.02, as shown in Figures 4 and 5, respectively. For further analysis and a clear comparison, the recognition accuracies for the two fusion methods and four feature extraction approaches on the six combination datasets are graphically represented in Figures 6–11. These graphs provide full detail about the behaviour of our proposed multimodal systems under extensive tests. They show that score-level fusion achieved the higher rate in 22 of the 24 cases against feature-level fusion.
There is no doubt that a multifeature biometric system can improve on the performance of a unimodal system and increase its reliability and robustness. This research moves the design philosophy one step further toward the right selection of the fusion rule, giving interested developers a clear methodical view. The quality of the combinational dataset and the applied feature extraction method were the two factors taken into consideration, besides the fusion technique, to study the system performance in classifying 240 subjects. The experimental results show that score-level fusion outperformed feature-level fusion in achieving maximum accuracy rates across the four feature extraction methods. As a result, score-level fusion and the LBP method are both candidates for a reliable and robust human identification system using the face-iris traits as a multifeature biometric system. This work can be improved further if decision-level fusion is also considered when calculating the system performance in the classification stage.
The proposed comparisons are carried out on six face-iris combinations: AR-CASIA-v1, AR-MMU1, ORL-CASIA-v1, ORL-MMU1, UFI-CASIA-v1, and UFI-MMU1, where the three face databases are combined with the two iris databases to implement this work. All three face databases are available at https://www.face-rec.org/databases/, while the two iris datasets are available at http://www.cbsr.ia.ac.cn/english/IrisDatabase.asp and https://www.kaggle.com/naureenmohammad/mmu-iris-dataset.
Conflicts of Interest
The author declares that there are no conflicts of interest concerning the publication of this paper.
The author’s sincere thanks and gratitude go to Mustansiriyah University for its support to complete this work.
R. K. Ijar and V. Nisha, “Analysis of face recognition using support vector machine,” in Proceedings of the International Conference on Emerging Trends in Engineering, Science and Sustainable Technology (ICETSST), Thrissur, Kerala, India, January 2017.
J. Zou, W. Li, C. Chen, and Q. Du, “Scene classification using local and global features with collaborative representation fusion,” Information Sciences, vol. 348, pp. 209–226, 2016.
M. Abdolahi, M. Mohamadi, and M. Jafari, “Multimodal biometric system fusion using fingerprint and iris with fuzzy logic,” International Journal of Soft Computing and Engineering, vol. 2, no. 6, pp. 504–510, 2013.
S. F. Ershad, “Developing a gender classification approach in human face images using modified local binary patterns and tani-moto based nearest neighbor algorithm,” International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 12, no. 4, pp. 1–12, 2019.
M. A. Rahim, M. N. Hossain, T. Wahid, and M. S. Azam, “Face recognition using local binary patterns (LBP),” Global Journal of Computer Science and Technology Graphics & Vision, vol. 13, no. 4, 2013.
D. Huang, C. Shan, M. Ardebilian, Y. Wang, and L. Chen, “Local binary patterns and its application to facial image analysis: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 41, pp. 1–17, 2017.
D. Gadkari, Image Quality Analysis Using GLCM, University of Central Florida, Orlando, FL, USA, 2004.
E. Acar, “Extraction of texture features from local iris areas by GLCM and iris recognition system based on KNN,” European Journal of Technic, vol. 6, no. 1, 2016.
J. Meghana and C. Gururaj, “Iris detection based on principal component analysis with GSM interface,” International Journal of Advances in Electronics and Computer Science, vol. 2, no. 7, 2015.
N. Srivastava, “Fusion levels in multimodal biometric systems: a review,” International Journal of Innovative Research in Science, Engineering and Technology, vol. 6, no. 5, 2017.
S. M. Prakash, P. Betty, and K. Sivanarulselvan, “Fusion of multimodal biometrics using feature and score level fusion,” International Journal on Applications in Information and Communication Engineering, vol. 2, no. 4, pp. 52–56, 2016.