Abstract

Biometrics-based personal authentication has been found to be an effective method for recognizing, with high confidence, a person’s identity. With the emergence of reliable and inexpensive 3D scanners, recent years have witnessed a growing interest in developing 3D biometrics systems. It is commonly acknowledged that matching algorithms are crucial for such systems. In this paper, we focus on investigating identification methods for two specific 3D biometric identifiers, the 3D ear and the 3D palmprint. Specifically, we propose a Multi-Dictionary based Collaborative Representation (MDCR) framework for classification, which can reduce the negative effects caused by degraded local regions. With MDCR, a range map is partitioned into overlapping blocks and, from each block, a feature vector is extracted. At the dictionary construction stage, feature vectors from blocks at the same locations in gallery samples form a dictionary, so that multiple dictionaries are obtained. Given a probe sample, by coding each of its feature vectors over the corresponding dictionary, multiple class label predictions are obtained, and a simple majority-based voting scheme is then used to make the final decision. In addition, a novel patch-wise and statistics-based feature extraction scheme is proposed, combining the range image’s local surface type information and local dominant orientation information. The effectiveness of the proposed approach has been corroborated by extensive experiments conducted on two large-scale and widely used benchmark datasets, the UND Collection J2 3D ear dataset and the PolyU 3D palmprint dataset. To make the results reproducible, we have publicly released the source code.

1. Introduction

With heightened concerns about security [1], the need for reliable identity recognition techniques has increased significantly over the past decade. Driven by the needs of various applications, including immigration control, aviation security, and the safeguarding of financial transactions, establishing a person’s identity has attracted great interest from many research endeavors. To address this issue, biometric approaches, which are based on physical or behavioral characteristics of human beings, have recently been attracting increasing attention due to their user-friendliness and high accuracy. Since the 1990s, various biometric identifiers have been extensively studied, such as the fingerprint [2–4], 2D face [5–8], 3D face [9–11], iris [12–16], palmprint [17–21], hand geometry [22–24], finger-knuckle-print [25–30], palmvein [31], and ear [32, 33].

At present, most of the deployed biometrics systems depend on 2D images. Though a great deal of effort has been devoted to them over the past decades, 2D-image-based personal authentication systems still face some great challenges. Recently, reliable and cheap 3D scanners have emerged, providing researchers with new opportunities to develop recognition systems based on 3D shape information. Compared with their 2D counterparts, 3D data samples have some inherent advantages. For example, (1) they are less sensitive to illumination variations, pose changes, and contamination of the target’s surface; (2) they embed shape information related to the target’s anatomical structure; and (3) they are relatively more difficult to copy or counterfeit.

In this paper, we focus on two specific 3D biometric identifiers, the 3D ear and the 3D palmprint, whose associated recognition systems usually share a common architecture. As shown in Figure 1, a typical 3D ear or 3D palmprint recognition system comprises the following components: range data acquisition, preprocessing and extraction of the ROI (region of interest), feature extraction, and classification. In most cases, the acquired 3D ear or palmprint data is actually a range image. In this work, we assume that the ROI maps for 3D ears or palmprints are already available and we focus solely on devising a universal feature representation and classification scheme that can be used to classify both 3D ears and 3D palmprints. For schemes for 3D ear and 3D palmprint ROI extraction, readers can refer to [34, 35] and [36], respectively, for details.

The rest of this paper is structured as follows. Section 2 summarizes the relevant work and presents our contributions. Section 3 introduces our newly proposed feature extraction scheme for 3D range data. Section 4 gives the structure of our multi-dictionary based collaborative representation framework. Experimental results are reported in Section 5. Section 6 summarizes our work.

2. Related Work and Our Contributions

In this section, we first briefly review collaborative representation-based classification. Next, we review some representative approaches for matching 3D ears or palmprints. Then, our motivations and contributions are presented.

2.1. Recap of Collaborative Representation-Based Classification

Given a dictionary $D = [d_{1,1}, d_{1,2}, \ldots, d_{c,n_c}] \in \mathbb{R}^{m \times n}$ comprising all the feature vectors in the gallery, where $d_{i,j}$ is an $m$-dimensional $l_2$-normalized feature vector of the $j$-th sample of class $i$, $c$ is the number of classes, $n_i$ is the total number of samples of class $i$, $n = \sum_{i=1}^{c} n_i$ stands for the number of dictionary items, and an overcomplete dictionary is always assumed for robustness, collaborative representation based classification (CRC) [42, 43] argues that it is the collaborative representation (CR, i.e., using all the gallery samples to encode the probe sample) mechanism that actually improves the recognition accuracy. To this end, instead of using the $l_1$-norm regularization term employed in the sparse-representation-based classification framework (SRC) [6], we adopt CRC with the regularized least square method (CRC_RLS). With CRC_RLS, the representation coefficients $\hat{\boldsymbol{\alpha}}$ of the feature vector $\mathbf{y}$ of a probe sample over $D$ can be computed by solving
$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \left\{ \|\mathbf{y} - D\boldsymbol{\alpha}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_2^2 \right\}, \tag{1}$$
where the term $\|\mathbf{y} - D\boldsymbol{\alpha}\|_2^2$ penalizes the reconstruction error and $\lambda$ is a regularization constant, controlling the relative contributions of the two terms.

Mathematically, Eq. (1) is a ridge regression problem and it has a closed-form solution:
$$\hat{\boldsymbol{\alpha}} = \left(D^{T}D + \lambda I\right)^{-1} D^{T} \mathbf{y}, \tag{2}$$
where $I$ is an identity matrix. It needs to be noted that the term $\left(D^{T}D + \lambda I\right)^{-1} D^{T}$ is independent of $\mathbf{y}$. In this sense, we can compute it in advance, solely based on the gallery set, as a projection matrix, making it quite efficient for classification purposes. Then, $\mathbf{y}$’s class label is determined by the class-specific residuals defined as
$$r_i(\mathbf{y}) = \left\|\mathbf{y} - D\,\delta_i\!\left(\hat{\boldsymbol{\alpha}}\right)\right\|_2, \qquad \text{identity}(\mathbf{y}) = \arg\min_i r_i(\mathbf{y}), \tag{3}$$
where $\delta_i(\cdot)$ is a nonlinear mapping that fills zeros in the entries that are not associated with class $i$.
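As a concrete illustration of CRC_RLS, the following NumPy sketch precomputes the projection matrix of Eq. (2) and classifies a probe feature via the class-specific residuals of Eq. (3). The function names, the `labels` array, and the value of `lam` are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def crc_rls_projection(D, lam):
    """Precompute P = (D^T D + lam*I)^(-1) D^T for a gallery dictionary D
    whose columns are l2-normalized feature vectors (Eq. (2))."""
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T)

def crc_rls_classify(y, D, P, labels):
    """Code probe feature y over D and return the class with the smallest
    class-specific reconstruction residual (Eq. (3))."""
    alpha = P @ y                                  # closed-form ridge solution
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ alpha[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

Since the projection matrix depends only on the gallery, it can be computed once offline and reused for every probe, which is what makes CRC_RLS attractive for identification.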

2.2. Matching Methods for 3D Ears

Among the members of the biometric identifier family, the ear has recently drawn much attention because of its nonintrusiveness, uniqueness, and ease of data acquisition. After decades of anthropometric measurements of ear images collected from thousands of people, researchers found that the ear has a rich structure (see Figure 2(a)) and that no two ears are identical, even those of identical twins [44]. Furthermore, the anatomical structure of a particular ear does not change much over time.

A rendered 3D ear sample is shown in Figure 2(b), from which it can be seen that a 3D ear contains abundant discriminative shape structures. Research on 3D ear-based personal authentication started in 2003 [45]. Since then, substantial effort has been devoted to this area. In order to detect and segment ear regions from the original profile range images, various ideas have been proposed [34, 35, 46–49]. By contrast, most state-of-the-art 3D ear matching schemes, such as [34, 47, 49, 50], are based on ICP (iterative closest point) [51] or its variants. Although ICP is an attractive solution for one-to-one verification problems, it cannot handle one-to-many identification cases well. Generally speaking, ICP-based identification is quite time consuming. Suppose that each subject in the gallery set has multiple samples. With an ICP-based identification method, to determine the identity of a test sample, it is necessary to compare the test sample with all the gallery samples one by one. Such an exhaustive search strategy is obviously not computationally efficient, particularly when the number of gallery samples is very large. Hence, for large-scale identification applications, ICP-based approaches and their variants are not good candidates.

Zhang et al. [35] attempted to address the 3D ear identification problem using an SRC framework [6]. At the offline training stage, feature vectors are extracted from all the ear samples in the gallery set and form an overcomplete dictionary $D$. This implies that, if there are enough training samples for each class, it is highly likely that an incoming test sample can be represented as a linear combination of the samples in $D$ belonging to the same class. At the testing stage, the feature vector $\mathbf{y}$ is extracted from the test sample first and then coded over the dictionary $D$. Finally, $\mathbf{y}$’s label is determined by evaluating which class yields the least reconstruction residual. Quite recently, in [39], Zhang et al. improved their work by substituting the SRC framework with the LC-KSVD (label-consistent KSVD) classification framework [52].

For a more thorough and recent review on ear recognition, please refer to [33].

2.3. Matching Methods for 3D Palmprints

Palmprint refers to the texture patterns on the inner surface of a palm, which comprise mainly two kinds of physiological traits, the palmar friction ridges [53] and the palmar flexion creases [18]. The three major flexion creases, the distal transverse, proximal transverse, and radial transverse creases, are the most clearly visible. Palmar friction ridges and flexion creases have both been verified to be permanent, immutable, and unique to a specific person [53]. The palmprint is an important and appealing member of the biometrics family, having various desirable properties, e.g., high distinctiveness, robustness, and user-friendliness.

In 2009, Zhang et al. proposed to use 3D shape information to match two palmprints [36]. They developed an acquisition device using the structured-light technology. Using such a device (as shown in Figure 3), the 3D shape data and the 2D texture data can be simultaneously collected from a user’s palm. With the self-designed device, Zhang et al. constructed the first large-scale 3D palmprint dataset which is now publicly available [40]. Since then, various approaches have been proposed for matching 3D palmprints.

In [36], the authors computed mean curvature images (MCIs), Gaussian curvature images (GCIs), and surface type (ST) maps from 3D palmprints and used the normalized Hamming distance for matching. In [54], three levels of features were defined and extracted, including shape, principal line, and texture related features. In order to deal with minor alignment errors caused by imperfect ROI extraction, the authors performed an ICP-based alignment refinement on the feature maps. The apparent drawback of this approach lies in its high computational complexity. In their later work [55], Li et al. first computed the MCI from the given range map and then derived line and orientation features from it. They then fused the features at either the score level or the feature level for the final matching. In [37], surface curvature maps were extracted first and the normalized local correlation was then used for calculating the matching distance. In [56], Yang et al. resorted to the shape index representation to characterize the geometry of local regions of a 3D palmprint. After that, LBP (local binary patterns) [57] and Gabor wavelet features were derived from the shape index map. At the matching stage, the authors fused those two types of features at the score level. In [58], when matching two 3D palmprints, Liu and Li first extracted an MCI from each of them, then applied the OLOF (orthogonal line ordinal feature) operator [59] to the two MCIs to obtain two feature maps, and finally resorted to the normalized Hamming distance to compute the matching distance between the two feature maps. To account for the minor alignment error between two palmprint ROIs, a cross-correlation-based scheme was utilized to align the two feature maps.

Quite recently, Zhang et al. [38] argued that all the above-mentioned methods are appropriate only for one-to-one verification applications and are not suitable for large-scale one-to-many identification, mainly because they all adopt a brute-force search strategy for identification. Moreover, to deal with the minor misalignment between two ROIs, they use either multi-translation-based matching [36, 55] or explicit registration techniques [54, 58], neither of which is computationally efficient. As a solution, Zhang et al. [38] proposed a new scheme for 3D palmprint identification, namely CR_L2, which uses CRC_RLS [42] as the classification framework. Additionally, to represent a 3D palmprint sample, they proposed a patch-wise and statistics-based feature extraction scheme, which has the merits of high effectiveness and high robustness to minor misalignment.

2.4. Our Motivations and Contributions

Since both SRC [6] and CRC_RLS [42] are based on the collaborative representation, they are referred to as CR-based classification frameworks in this paper. Having investigated the literature, we find that CR-based approaches achieve state-of-the-art results for 3D ear or palmprint identification [35, 38]. In existing CR-based 3D ear or palmprint classification approaches, a single feature vector is extracted to represent a range image sample, which is actually a type of holistic representation. Accordingly, a single dictionary is constructed from the gallery set. With such a sample representation scheme, when local deformations, corruptions, or occlusions that are absent from the gallery samples appear in the probe sample, its feature vector is affected and, consequently, the representation coefficients become less informative.

To address this issue and to better exploit the discriminant information embedded in data samples, in this paper, we propose to use multiple feature vectors extracted from local blocks to represent a range image. Accordingly, multiple dictionaries are constructed from the gallery set, each of which is composed of feature vectors extracted from blocks having the same locations in different samples. Given a probe sample, its class label can be determined by solving multiple CR-based classification problems. The proposed classification scheme is termed as Multi-Dictionary based Collaborative Representation, MDCR for short.

We use a real example to demonstrate the design rationale of MDCR, as illustrated in Figure 4. Figure 4(a) shows the classification process of the CR-based approaches, while Figure 4(b) illustrates the idea of the proposed MDCR. Suppose that we have a test 3D ear sample whose ground-truth class label is $c$, and denote its holistic feature vector by $\mathbf{y}$. With the CR-based classification scheme, a single dictionary $D$ is built from the gallery samples. $\mathbf{y}$ is coded on $D$ to get its representation coefficients, and $\mathbf{y}$’s class label is determined by evaluating which class yields the minimum reconstruction error. In this example, the least reconstruction residual happens on class $h$, and thus the test sample is misclassified as class $h$. With the proposed MDCR classification scheme, the test sample is instead partitioned into $N$ blocks $b_j$ ($j = 1, \ldots, N$). In this example, we set $N$ to 4 for simplicity. Accordingly, four feature vectors $\mathbf{y}_j$ ($j = 1, \ldots, 4$) are extracted from the blocks $b_j$. At the dictionary construction stage, four dictionaries $D_j$ ($j = 1, \ldots, 4$) have already been constructed from blocks of gallery samples, where the blocks generating $D_j$ have the same locations as $b_j$. By coding each $\mathbf{y}_j$ on $D_j$, four class label predictions $l_1$, $l_2$, $l_3$, and $l_4$ are obtained. Finally, by using a simple majority-based voting scheme, the class label of the test sample is correctly determined as $c$. From this example, it can be seen that with MDCR, multiple label predictions are obtained for a test sample based on its multiple blocks and the final decision is made by voting. Such a classification strategy can reduce the negative effects brought by “bad” local regions (i.e., regions with local corruptions, occlusions, or deformations) in the test sample. The effectiveness of the multi-dictionary idea has also been corroborated in other research fields, such as visual tracking [60–63].

Another contribution of our work is that we propose a novel patch-wise and statistics-based feature extraction scheme, which combines the range image’s local surface type information and local dominant orientation information. To encode local dominant orientation, we resort to the competitive coding (CompCode) scheme [64]. The proposed feature extraction scheme is referred to as Local Histograms of Surface Types and CompCodes, LH_STCC for short.

The efficacy and efficiency of the proposed scheme have been verified by experiments performed on two large-scale benchmark datasets. Source codes have been publicly released at http://sse.tongji.edu.cn/linzhang/MDCR/index.htm to make the results fully reproducible.

A preliminary version of this paper was published in ICIC 2016 [65]. The following improvements are made in this version: (1) the related work is reviewed more comprehensively; (2) our motivations for the proposed approach are presented in a clearer and more detailed manner; (3) a novel approach, namely LH_STCC, is proposed to extract the feature vector from a 3D range scan; and (4) more methods relevant to our approach are evaluated in the experiments and a more thorough analysis of the experimental results is performed.

3. LH_STCC: A Novel Feature Extraction Scheme for Range Images

With the proposed MDCR classification framework, each range image is partitioned into a set of blocks and a feature vector needs to be extracted from each block. Since minor misalignment between two ROIs exists, it is highly desirable that the extracted feature vectors be robust to minor misalignment errors while possessing a high discriminating power. To satisfy these needs, Zhang et al. [38] proposed a patch-wise and statistics-based feature extraction scheme, which makes use of the local surface type information of the range image. In this paper, we extend Zhang et al.’s idea by integrating local surface type information with local dominant orientation information. Details of our feature extraction method are presented as follows.

A range image can be regarded as a 3D surface, containing various convex and concave local structures. Its points can be labeled with different “surface types” according to their intrinsic geometric characteristics. Assume that the range map is represented by $z = f(x, y)$, where $f(x, y)$ denotes the depth value at $(x, y)$. Its mean curvature $H$ and Gaussian curvature $K$ at $(x, y)$ are defined as follows [66]:
$$H = \frac{\left(1 + f_y^2\right) f_{xx} - 2 f_x f_y f_{xy} + \left(1 + f_x^2\right) f_{yy}}{2\left(1 + f_x^2 + f_y^2\right)^{3/2}}, \tag{4}$$
$$K = \frac{f_{xx} f_{yy} - f_{xy}^2}{\left(1 + f_x^2 + f_y^2\right)^{2}}, \tag{5}$$
where $f_x$, $f_y$ and $f_{xx}$, $f_{xy}$, $f_{yy}$ are the first-order and second-order partial derivatives of $f(x, y)$, respectively. Nine different surface types (STs) can be defined based on the signs of $H$ and $K$, as listed in Table 1. Thus, from $f$, an ST map $S$ can be derived, each element of which is an integer ranging from one to nine.
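The following NumPy sketch shows how an ST map could be computed in practice: the partial derivatives are estimated with finite differences, $H$ and $K$ are evaluated via Eqs. (4) and (5), and the sign pattern of $(H, K)$ is mapped to an integer label. The zero thresholds and the particular label encoding are placeholders; Table 1 defines the actual correspondence between sign patterns and surface types.

```python
import numpy as np

def surface_type_map(Z, eps_h=1e-4, eps_k=1e-4):
    """Mean/Gaussian curvature of the range map Z (Eqs. (4)-(5)) and a
    surface-type label in 1..9 per pixel from the signs of H and K."""
    fx, fy = np.gradient(Z)               # first-order partial derivatives
    fxx, fxy = np.gradient(fx)            # second-order partial derivatives
    _,   fyy = np.gradient(fy)
    g = 1.0 + fx**2 + fy**2
    H = ((1 + fy**2) * fxx - 2 * fx * fy * fxy + (1 + fx**2) * fyy) / (2 * g**1.5)
    K = (fxx * fyy - fxy**2) / g**2
    sh = np.digitize(H, [-eps_h, eps_h])  # 0: H<0, 1: H~0, 2: H>0
    sk = np.digitize(K, [-eps_k, eps_k])  # 0: K<0, 1: K~0, 2: K>0
    return 3 * sh + sk + 1                # one of nine labels per pixel
```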

On the other hand, the local dominant orientation has proved to be quite discriminative in the field of 2D biometrics. Thus, we regard the range map as a 2D gray-scale image and use the Gabor-filter-based CompCode [64] to extract its local dominant orientation information. In the spatial domain, the 2D Gabor filter is defined as
$$\psi(x, y; \lambda, \theta) = \exp\!\left(-\frac{x'^2}{2\sigma_x^2} - \frac{y'^2}{2\sigma_y^2}\right)\exp\!\left(i\,\frac{2\pi x'}{\lambda}\right), \tag{6}$$
where $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$. In Eq. (6), $\lambda$ is the wavelength of the sinusoid part, $\theta$ indicates the orientation of the Gabor function, and $\sigma_x$ and $\sigma_y$ represent the standard deviations of the Gaussian envelope. CompCode postulates that every pixel of an image lies on a negative “line” and it derives the line orientation with a set of predefined real Gabor filters having various orientations. Denote by $\psi_R^{\theta_j}$ the real part of the Gabor filter $\psi$ with orientation $\theta_j$. With a series of $\psi_R^{\theta_j}$s having the same parameter settings except the orientation, the local dominant orientation of $f$ at the position $(x, y)$ can be extracted and coded. Such a competitive coding operation can be expressed as
$$C(x, y) = \arg\min_{j}\;\left(f \ast \psi_R^{\theta_j}\right)(x, y), \tag{7}$$
where “$\ast$” denotes the convolution operation, $\theta_j = j\pi/N_o$, and $j = 0, 1, \ldots, N_o - 1$. $N_o$ denotes the number of possible orientations and was set to 6 in this paper. We denote the CompCode map computed from $f$ by $C$. Obviously, each element of $C$ is an integer in the range 0–5, representing the local dominant orientation.
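To make the competitive coding step concrete, here is a small NumPy/SciPy sketch that convolves the range map with the real parts of several oriented Gabor filters and keeps, for each pixel, the index of the most negative response, as in Eq. (7). The kernel size, wavelength, and standard deviations below are illustrative placeholders, not the parameter values used in the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_real(size, wavelength, theta, sigma_x, sigma_y):
    """Real part of the 2D Gabor filter of Eq. (6) at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 / (2 * sigma_x**2) + yr**2 / (2 * sigma_y**2))) \
           * np.cos(2 * np.pi * xr / wavelength)

def compcode_map(F, n_orient=6, size=17, wavelength=8.0, sigma=4.0):
    """Per-pixel index (0..n_orient-1) of the orientation giving the most
    negative real Gabor response (the 'negative line' assumption)."""
    responses = [convolve(F.astype(float),
                          gabor_real(size, wavelength, j * np.pi / n_orient,
                                     sigma, sigma))
                 for j in range(n_orient)]
    return np.argmin(np.stack(responses), axis=0)
```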

Then, based on $S$ (the surface type map) and $C$ (the CompCode map), we resort to the patch-wise and statistics-based scheme of [38] to build the feature vector. Specifically, we uniformly partition $S$ ($C$) into a set of regular patches. From each patch $p_i$, we compute a normalized histogram of surface types (CompCodes), denoted by $\mathbf{h}_i^{ST}$ ($\mathbf{h}_i^{CC}$). The dimension of $\mathbf{h}_i^{ST}$ ($\mathbf{h}_i^{CC}$) is 9 (6) because there are 9 different surface types (6 possible CompCodes). Finally, all the $\mathbf{h}_i^{ST}$s and $\mathbf{h}_i^{CC}$s are stacked together into a large histogram $\mathbf{h}$, which is taken as the feature vector for the range image $f$. The proposed feature extraction algorithm is referred to as Local Histograms of Surface Types and CompCodes, LH_STCC for short. The flowchart illustrating the building process of LH_STCC is shown in Figure 5.
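A minimal sketch of how the LH_STCC feature could be assembled from the two label maps is given below. The non-overlapping patch grid and the final l2 normalization are our own assumptions for illustration; the actual patch size is a tunable parameter of the method.

```python
import numpy as np

def patch_histograms(label_map, values, patch_size):
    """Concatenated, normalized histograms of the given label values
    computed over a regular grid of patches."""
    H, W = label_map.shape
    ph, pw = patch_size
    feats = []
    for r in range(0, H - ph + 1, ph):
        for c in range(0, W - pw + 1, pw):
            patch = label_map[r:r + ph, c:c + pw].ravel()
            hist = np.array([(patch == v).sum() for v in values], dtype=float)
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)

def lh_stcc(st_map, cc_map, patch_size=(16, 16)):
    """LH_STCC: local histograms of surface types (labels 1..9) stacked with
    local histograms of CompCodes (labels 0..5), then l2-normalized."""
    h = np.concatenate([patch_histograms(st_map, range(1, 10), patch_size),
                        patch_histograms(cc_map, range(0, 6), patch_size)])
    return h / (np.linalg.norm(h) + 1e-12)
```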

4. MDCR: Multi-Dictionary Based Collaborative Representation

In this section, our proposed MDCR classification framework is presented. Suppose that there is a gallery range image set comprising $n$ samples from $c$ classes. We uniformly partition each sample into $N$ overlapping blocks $b_{i,j}$ ($i = 1, \ldots, n$; $j = 1, \ldots, N$), where $i$ indexes the sample and $j$ the block location. A feature vector $\mathbf{x}_{i,j}$ is extracted from each $b_{i,j}$ using the feature extraction algorithm LH_STCC described in Section 3. Then, for each block location $j$, all the feature vectors are stacked together to form a dictionary $D_j$:
$$D_j = \left[\mathbf{x}_{1,j}, \mathbf{x}_{2,j}, \ldots, \mathbf{x}_{n,j}\right] \in \mathbb{R}^{m \times n}, \tag{8}$$
where $m$ is the dimension of the feature vector extracted from each block. In this way, we can obtain $N$ dictionaries $\{D_j\}_{j=1}^{N}$ from the gallery set.
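The sketch below builds the $N$ location-specific dictionaries from a gallery of range images. The uniform block grid produced by `extract_blocks` and the l2 normalization of the columns are illustrative assumptions; `feature_fn` stands for the LH_STCC extractor applied to a single block.

```python
import numpy as np

def extract_blocks(img, block_size, grid=(4, 4)):
    """Uniformly sample overlapping blocks of the given size on a grid."""
    H, W = img.shape
    bh, bw = block_size
    rows = np.linspace(0, H - bh, grid[0]).astype(int)
    cols = np.linspace(0, W - bw, grid[1]).astype(int)
    return [img[r:r + bh, c:c + bw] for r in rows for c in cols]

def build_dictionaries(gallery_imgs, feature_fn, block_size, grid=(4, 4)):
    """One dictionary per block location (Eq. (8)); column i of dictionary j
    is the normalized feature of block j of gallery sample i."""
    columns = None
    for img in gallery_imgs:
        feats = [feature_fn(b) for b in extract_blocks(img, block_size, grid)]
        if columns is None:
            columns = [[] for _ in feats]
        for j, f in enumerate(feats):
            columns[j].append(f / (np.linalg.norm(f) + 1e-12))
    return [np.stack(cols, axis=1) for cols in columns]    # each D_j is m x n
```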

Given a probe range image, we first partition it into $N$ blocks, as we did for the gallery samples. From each block $b_j$, a feature vector $\mathbf{y}_j$ ($j = 1, \ldots, N$) is extracted using the algorithm LH_STCC. Then, $\mathbf{y}_j$ can be coded as a linear combination of the column vectors in $D_j$. $\mathbf{y}_j$’s representation coefficients can be determined by solving a CRC_RLS problem, as depicted by Eq. (1), and are expressed as
$$\hat{\boldsymbol{\alpha}}_j = \left(D_j^{T} D_j + \lambda I\right)^{-1} D_j^{T} \mathbf{y}_j, \tag{9}$$
where $I$ is an identity matrix. Then, we can get a class label prediction $l_j$ for the probe range image by evaluating which class yields the minimum reconstruction residual for $\mathbf{y}_j$:
$$l_j = \arg\min_{i} \left\|\mathbf{y}_j - D_j\,\delta_i\!\left(\hat{\boldsymbol{\alpha}}_j\right)\right\|_2, \tag{10}$$
where $\left\|\mathbf{y}_j - D_j\,\delta_i(\hat{\boldsymbol{\alpha}}_j)\right\|_2$ represents the reconstruction error of $\mathbf{y}_j$ using gallery samples from class $i$. After processing all the feature vectors extracted from the probe image by Eqs. (9) and (10), we altogether get $N$ label predictions $\{l_j\}_{j=1}^{N}$. Finally, we apply a simple majority-based voting strategy on $\{l_j\}_{j=1}^{N}$ to get the final label prediction for the probe range image.
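Putting the pieces together, the following sketch classifies a probe range image with MDCR: each block feature is coded over its location-specific dictionary by CRC_RLS (Eq. (9)), a per-block label is obtained from the class residuals (Eq. (10)), and the final label is decided by majority voting. It reuses `extract_blocks` from the previous sketch, and the value of `lam` is again a placeholder.

```python
from collections import Counter
import numpy as np

def mdcr_classify(probe_img, dictionaries, labels, feature_fn,
                  block_size, grid=(4, 4), lam=1e-3):
    """MDCR identification: per-block CRC_RLS coding followed by voting."""
    classes = np.unique(labels)
    votes = []
    blocks = extract_blocks(probe_img, block_size, grid)
    for D_j, block in zip(dictionaries, blocks):
        y = feature_fn(block)
        y = y / (np.linalg.norm(y) + 1e-12)
        n = D_j.shape[1]
        alpha = np.linalg.solve(D_j.T @ D_j + lam * np.eye(n), D_j.T @ y)  # Eq. (9)
        residuals = [np.linalg.norm(y - D_j[:, labels == c] @ alpha[labels == c])
                     for c in classes]                                     # Eq. (10)
        votes.append(classes[int(np.argmin(residuals))])                  # per-block label
    return Counter(votes).most_common(1)[0][0]                            # majority vote
```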

Till now, we have presented our proposed universal method for classifying 3D ears or palmprints, which uses MDCR as the classification framework and LH_STCC as the feature extraction scheme. In the following, our proposed method will be referred to as MDCR+LH_STCC for short. The overall flowchart of MDCR+LH_STCC is shown in Figure 6, using 3D ear identification as a concrete example.

5. Experiments

5.1. Datasets and the Test Protocol

The experiments were conducted on two benchmark 3D biometrics datasets, one for 3D palmprint recognition [40] and one for 3D ear recognition [41]. The recognition rate is utilized as the performance measure. Furthermore, for each compared method, we also evaluated its time cost consumed by one identification operation. For a given test sample, the time cost consumed by one identification operation is composed of the time for feature extraction and the time for matching the test feature with the gallery feature set. The experiments were conducted on a workstation with an Intel Xeon E5-1650 CPU and 16GB RAM. The software platform was Matlab2015b.

5.2. 3D Palmprint Identification

The PolyU 3D palmprint dataset [40] contains 8000 samples acquired from 400 palms of 200 volunteers. 136 of the volunteers were male and the other 64 were female. For each palm, its 20 samples were collected in two individual sessions and 10 samples were taken in each session. The average time interval between the two collection sessions was about one month. The size of each sample is 128 × 128. Sample 3D palmprint ROI regions are shown in Figure 7. Figures 7(a) and 7(b) are taken from one palm while Figures 7(c) and 7(d) are taken from another palm. In our experiments, samples acquired in the first session were taken as the gallery set and the ones acquired in the second session were taken as the probe set. With such an experimental setting, the gallery set contains 400 classes, each of which includes 10 samples.

In order to show the superiority of our proposed MDCR+LH_STCC scheme for 3D palmprint identification, several representative approaches in this area were investigated, including the MCI-based one [36], the GCI-based one [36], the ST-based one [36], the local correlation (LC) based one [37], and the CR-based one (CR_L2) [38]. For each method, we strove to achieve its best performance by tuning all of its parameters. For MDCR+LH_STCC, when constructing local dictionaries, several different block sizes (104 × 104, 108 × 108, 112 × 112, 116 × 116, and 120 × 120) were tried and 116 × 116 was found to be the most appropriate. Thus, each palmprint range sample is empirically partitioned into 16 evenly distributed blocks of size 116 × 116. Accordingly, 16 local dictionaries were built from the gallery set.

The evaluation results are summarized in Table 2 and the associated CMC (cumulative matching characteristic) curves are shown in Figure 8, from which we obtain the following findings. First, in terms of classification accuracy, the proposed method MDCR+LH_STCC performs much better than the other evaluated approaches, achieving a 99.78% rank-1 recognition rate on the PolyU 3D palmprint dataset. Second, with respect to running speed, CR_L2 runs the fastest and MDCR+LH_STCC ranks second. The high computational complexity of MCI, GCI, and ST [36] is attributable to the multi-translation-based matching scheme they use. LC [37] computes a local correlation coefficient for every data point, making it rather slow.

To further illustrate the robustness of MDCR, we showcase some of its correct label predictions for 3D palmprint ROIs containing a large proportion of severely degraded areas. In Figure 9, images in the same row were collected from the same palm and the three rows correspond to three different palms. In each row, the left two images are from the gallery set while the right two are probe samples. It can be observed that the areas bounded by rectangles vary drastically among samples of the same palm. When a single holistic representation is extracted, these degraded regions are treated the same as clean ones and their corrupted information is mixed into the feature vector; as a result, the matching scores or representation coefficients computed from it become less informative. On the contrary, with multiple feature vectors uniformly extracted over the spatial domain, the incorrect predictions caused by the degraded regions are outweighed by the relatively larger number of correct predictions obtained from the remaining areas when a voting scheme is adopted. Thus, even in such challenging cases, MDCR can still perform accurate recognition with high confidence.

5.3. 3D Ear Identification

The UND Collection J2 3D ear dataset [41] is the largest 3D ear scan dataset to date, which comprises 2346 3D side face scans acquired from 415 persons. The range images of UND-J2 were captured by a Minolta Vivid 910 range scanner. Pose variations exist among data samples. In addition, in some samples, ear regions are occluded by hair or ear rings. Each range scan is of the size 640 × 480. Several samples selected from UND-J2 are shown in Figure 10.

To evaluate our method’s performance, experiments cannot simply be performed on the entire dataset because several individuals in UND-J2 have only 2 scans. As claimed by Wright et al. [6], sparse-coding-based identification schemes require a sufficient number of gallery samples for each class. As a result, four subsets of UND-J2 were created for our experiments, requiring that each class have more than 6, 8, 10, and 12 samples, respectively. For subset 1, six samples from each class were randomly selected to form the gallery set and the remaining ones formed the test set. For subset 2, eight samples from each class were randomly selected to form the gallery set and the remaining ones formed the test set. For subsets 3 and 4, similar schemes were adopted to create the gallery and test sets. For clarity, the key information of these four evaluation subsets is presented in Table 3.

In the experiments, a 3D ear ROI was first detected from the side face scan and then aligned by fitting its edges to a two-dimensional contour template [35]. The size of each ROI is 96 × 65. MDCR+LH_STCC was compared with the classical ICP-based method, Zhang et al.’s method [35], and our recent work (LCKSVD_LHST) [39], which is one of the state-of-the-art approaches in the area of 3D ear identification. LCKSVD_LHST differs from the proposed method mainly in three aspects. First, LCKSVD_LHST extracts only local histograms of STs from a 3D ear range image, whereas in this work we additionally encode local dominant orientation information by adapting the competitive code. Second, LCKSVD_LHST concatenates all the local histograms into one compact feature vector as a holistic representation; in contrast, in our current work, we uniformly partition a given 3D ear range image into blocks and compute one feature vector per block, so that multiple feature vectors are available for a robust representation. Third, in [39], we resorted to the LC-KSVD framework to learn a label-consistent dictionary with an $l_0$-norm sparsity constraint for discriminative sparse coding, which is an NP-hard problem approximately solved by the algorithm in [67]; in this work, we instead directly solve multiple CRC_RLS problems, which ensures efficiency. For MDCR+LH_STCC, when constructing local dictionaries, several different block sizes (72 × 48, 78 × 52, 84 × 56, 90 × 60, and 93 × 62) were tested and 90 × 60 was found to lead to the best performance. Thus, each ear range sample is empirically partitioned into 24 evenly distributed blocks of size 90 × 60. Accordingly, 24 local dictionaries were built from the gallery set. The evaluation results are listed in Tables 4 and 5: Table 4 gives the rank-1 recognition rate obtained by each method on each subset, and Table 5 presents the time consumed by one identification operation of each method on each subset.

It can be found that the proposed approach MDCR+LH_STCC can get the highest classification accuracy. In addition, MDCR+LH_STCC runs faster than the other competitors.

5.4. Analysis of Performance Improvement

Based on the aforementioned experimental results, it can be seen that the proposed method MDCR+LH_STCC has low computational complexity and can achieve state-of-the-art classification accuracy in two fields, 3D palmprint identification and 3D ear identification.

Actually, the proposed MDCR+LH_STCC approach is an extension of the method CR_L2 [38], which was originally proposed for 3D palmprint identification. CR_L2 uses CRC_RLS [42] as the classification framework and the local histograms of surface types (LH_ST) as the feature. Compared with CR_L2, the novelty of MDCR+LH_STCC lies mainly in two aspects. First, we enriched LH_ST into LH_STCC by additionally incorporating local dominant orientation information encoded by CompCode [64]. Second, instead of using a single-dictionary classification scheme as CRC_RLS does, we proposed a novel multi-dictionary collaborative representation-based classification framework, MDCR. Here, the performance enhancement contributed by each new aspect of MDCR+LH_STCC is examined. Denote by CRC_RLS+LH_STCC the algorithm using CRC_RLS as the classification framework and LH_STCC as the feature, and by MDCR+LH_ST the algorithm using MDCR for classification and LH_ST as the feature. Their performances for 3D palmprint identification and 3D ear identification are summarized in Tables 6 and 7, respectively. We also include the results achieved by CR_L2 and MDCR+LH_STCC in Tables 6 and 7 for a clear comparison.

It can be seen that the results listed in Tables 6 and 7 are quite consistent, and the following conclusions can be drawn. First, CRC_RLS+LH_STCC performs better than CR_L2, indicating that, as a feature extraction scheme, LH_STCC outperforms LH_ST: LH_ST’s discriminative capability is strengthened by incorporating local dominant orientation information, and the newly proposed LH_STCC encodes the local shape information of a 3D range scan quite well. Second, MDCR+LH_ST obtains better results than CR_L2, showing that the proposed classification framework MDCR outperforms the single-dictionary CRC_RLS; that is, a classifier based on multiple dictionaries constructed from local blocks works better than one based on a single global dictionary. Third, to some extent, the performance enhancements brought by the improved feature extraction strategy and the improved classification strategy are independent; when they work together, the performance is further boosted. That explains why MDCR+LH_STCC demonstrates the best performance in all cases.

6. Conclusions

Developing sophisticated 3D biometrics systems has attracted much attention from researchers recently. In this work, we proposed a universal framework, MDCR+LH_STCC, which can be used for both 3D ear and 3D palmprint identification. Our contributions are mainly twofold. First, we proposed a novel classification scheme, namely multi-dictionary based collaborative representation, MDCR for short. Second, we proposed a patch-wise and statistics-based feature extraction scheme, LH_STCC, which integrates the range image’s local surface type information and local dominant orientation information. Experiments performed on two benchmark datasets show that MDCR+LH_STCC yields much higher recognition accuracy than the compared competing methods. In addition, it runs fast and is thus quite suitable for large-scale time-critical identification applications.

Data Availability

The data including source code and 3D palmprint dataset that are used to support the findings of this study have been deposited at http://sse.tongji.edu.cn/linzhang/MDCR/index.htm.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded in part by the Natural Science Foundation of China under Grant No. 61672380, in part by the Fundamental Research Funds for the Central Universities under Grant No. 2100219068, and in part by the National Key Research and Development Project under Grant No. 2017YFE0119300.