Abstract

Finger vein recognition is a promising biometric technology that verifies identities via the vein patterns in the fingers. In this paper, (2D)² PCA is applied to extract features of finger veins, based on which a new recognition method is proposed in conjunction with metric learning. It learns a KNN classifier for each individual, in contrast to traditional methods where a fixed threshold is employed for all individuals. Besides, the SMOTE technique is adopted to address the class-imbalance problem. Our experiments show that the proposed method is effective, achieving a recognition rate of 99.17%.

1. Introduction

Finger vein recognition is a promising biometric technology that verifies identities through finger vein patterns. Medical studies have shown that finger vein patterns are unique and stable. Specifically, the finger vein patterns of one individual differ from those of others, and even the patterns captured from a single individual differ from one finger to another. Furthermore, the finger vein patterns remain invariant for healthy adults.

Compared with fingerprints, finger veins are hard to forge or steal as they are hidden inside the fingers. The contactless capture of finger veins ensures both convenience and cleanliness, and it is user-friendly. Furthermore, finger veins are less affected by physiological and environmental factors such as dry skin and dirt.

A typical finger vein recognition process is composed of the following four steps. Firstly, the finger vein images are obtained via the finger vein capturing devices. Secondly, the finger vein images are preprocessed. Thirdly, the features are extracted. Finally, the finger vein images are matched based on the extracted features.

The preprocessing procedure includes image enhancement, normalization, and segmentation. For image enhancement, Yang and Yan incorporated directional decomposition and Frangi filtering to enhance the image quality [1]. Yu et al. proposed an enhancement algorithm based on multi-threshold combination [2]. Yang and Yang introduced a multi-channel Gabor filter to enhance the images and obtained better performance [3]. Finger vein segmentation is also a very important step, and typical methods include line tracking [4], mean curvature [5], and region-growth-based features [6]. A detailed description of these approaches is beyond the scope of this paper; however, a summary with the typical references is provided in Table 1.

PCA is a popular linear dimensionality reduction and feature extraction technique with extensive applications in image processing. Wu and Liu extracted PCA features and then trained a neural network for matching, which resulted in a high recognition rate [7]. Since PCA transforms the 2-dimensional image matrix into a 1-dimensional vector, the covariance matrix is always large, and it is time-consuming to obtain the projection matrix composed of the covariance matrix's eigenvectors. Yang et al. proposed 2DPCA to reduce the size of the covariance matrix and save time in computing the projection matrices [8]. In order to represent the characteristics of 2-dimensional images more accurately, Zhang and Zhou introduced (2D)² PCA, which reflects the information of the image in both the row and column directions, uses less time to compute the projection matrices, and achieves better experimental results on face recognition [9].

Recently, more and more researchers have applied machine learning methods to finger vein recognition. Liu et al. introduced manifold learning to finger vein recognition [10]. Wu and Liu used PCA and LDA to extract features and trained an SVM model for recognition [11]. Measuring the distance between two samples is a prerequisite for many machine learning methods. For example, KNN requires a distance metric to find the neighbors of the target instance and then conducts classification or regression based on that metric. Typical distance metrics, such as the Euclidean distance, have made significant contributions in some application domains. In some conditions, however, these metrics cannot satisfy the assumption that the distances between instances from the same class are small while those between instances from different classes are large, which limits the utility of most machine learning methods.

There are two challenges for finger vein recognition: (1) how to efficiently extract distinguishing features and (2) how to design a strong classifier with high recognition rate and fast recognition speed to make the system more practical in real-world applications.

To overcome these two challenges, in this paper we apply (2D)² PCA to extract the features from finger vein images. In order to address the shortcoming of traditional distance-metric-based classifiers, we build a classifier for each individual based on metric learning. With regard to the training samples of each classifier, the number of positive samples is inadequate compared to the negative samples. Thus, we use the SMOTE technique to oversample the positive samples and balance the two classes before training the classifier. The experimental results show that the proposed method performs well on finger vein recognition.

The rest of this paper is organized as follows. The technical background is briefly introduced in Section 2. The proposed method is described in Section 3. Experimental results are provided in Section 4. Finally, this paper is concluded in Section 5.

2. Technical Background

2.1. (2D)² PCA

PCA is a typical linear dimensionality reduction and feature extraction method. Because it transforms the 2-dimensional image matrix into a 1-dimensional column vector, PCA often makes the corresponding covariance matrix very large, and computing its eigenvectors and eigenvalues becomes time-consuming. In order to solve this problem, Yang et al. proposed 2DPCA to extract the features [8]. 2DPCA computes PCA features directly from the image matrix without transforming it into a 1-dimensional column vector. Therefore, it reduces the size of the corresponding covariance matrix and obtains the feature projection matrix in less time. However, 2DPCA works only in the row direction of images. To address this problem, Zhang and Zhou proposed (2D)² PCA, which captures the image information in both the row and column directions [9]. Their experimental results show that (2D)² PCA outperforms 2DPCA and PCA in terms of both recognition rate and running time. The process of (2D)² PCA is described as follows.

Considering $M$ finger vein images, denoted by $\mathbf{A}_1,\ldots,\mathbf{A}_M$, we compute the mean image matrix $\bar{\mathbf{A}} = \frac{1}{M}\sum_{j=1}^{M}\mathbf{A}_j$ and the image covariance matrix $\mathbf{G}$ as
$$\mathbf{G} = \frac{1}{M}\sum_{j=1}^{M}\left(\mathbf{A}_j-\bar{\mathbf{A}}\right)^{T}\left(\mathbf{A}_j-\bar{\mathbf{A}}\right). \tag{1}$$

For an arbitrary image matrix $\mathbf{A}$, the key to obtaining the new features is the projection matrix $\mathbf{X}\in\mathbb{R}^{n\times d}$ with $n \geq d$. The new features are then calculated as $\mathbf{Y}=\mathbf{A}\mathbf{X}$. The total scatter of the projected samples is used to determine a good projection matrix $\mathbf{X}$, where the total scatter can be characterized by the trace of the covariance matrix of the projected feature vectors. From this point of view, we adopt the following criterion:
$$\begin{aligned}
J(\mathbf{X}) &= \operatorname{trace}\left(E\left[(\mathbf{Y}-E(\mathbf{Y}))(\mathbf{Y}-E(\mathbf{Y}))^{T}\right]\right)\\
&= \operatorname{trace}\left(E\left[(\mathbf{A}\mathbf{X}-E(\mathbf{A}\mathbf{X}))(\mathbf{A}\mathbf{X}-E(\mathbf{A}\mathbf{X}))^{T}\right]\right)\\
&= \operatorname{trace}\left(\mathbf{X}^{T}E\left[(\mathbf{A}-E(\mathbf{A}))^{T}(\mathbf{A}-E(\mathbf{A}))\right]\mathbf{X}\right).
\end{aligned} \tag{2}$$

So,𝐗𝐽(𝐗)=trace𝑇𝐆𝐗.(3)

It has been proven that $J(\mathbf{X})$ attains its maximum when the projection matrix $\mathbf{X}$ is composed of the $d$ orthonormal eigenvectors corresponding to the $d$ largest eigenvalues of $\mathbf{G}$. In this case, $\mathbf{X}$ is optimal, and $d$ can be controlled by setting a threshold as follows:
$$\frac{\sum_{i=1}^{d}\lambda_i}{\sum_{j=1}^{n}\lambda_j} \geq \theta, \tag{4}$$
where $\theta$ is a user-specified threshold and $\lambda_1,\lambda_2,\ldots,\lambda_n$ are the $n$ largest eigenvalues of $\mathbf{G}$, sorted in descending order.

Because $\mathbf{X}$ only reflects the information in the row direction, Zhang and Zhou proposed alternative 2DPCA, which reflects the information in the column direction, and combined 2DPCA with alternative 2DPCA to obtain a new method called (2D)² PCA [9]. The process of alternative 2DPCA is as follows.

Let the image matrix $\mathbf{A}_j = [(\mathbf{A}_j^{(1)})^T,\ldots,(\mathbf{A}_j^{(m)})^T]^T$ and the mean image matrix $\bar{\mathbf{A}} = [(\bar{\mathbf{A}}^{(1)})^T,\ldots,(\bar{\mathbf{A}}^{(m)})^T]^T$, where $\mathbf{A}_j^{(i)}$ and $\bar{\mathbf{A}}^{(i)}$ denote the $i$th row vectors of $\mathbf{A}_j$ and $\bar{\mathbf{A}}$, respectively. The image covariance matrix can be rewritten as
$$\mathbf{G}_1 = \frac{1}{M}\sum_{j=1}^{M}\sum_{k=1}^{m}\left(\mathbf{A}_j^{(k)}-\bar{\mathbf{A}}^{(k)}\right)^{T}\left(\mathbf{A}_j^{(k)}-\bar{\mathbf{A}}^{(k)}\right). \tag{5}$$

Similarly to the way the projection matrix $\mathbf{X}$ is obtained in 2DPCA, we can obtain the projection matrix $\mathbf{Z}\in\mathbb{R}^{m\times q}$ from (2) and (5). We can also determine $q$ in the same manner as $d$ in 2DPCA.

Using the projection matrices $\mathbf{X}$ and $\mathbf{Z}$ from 2DPCA and alternative 2DPCA, respectively, we obtain the new feature
$$\mathbf{C} = \mathbf{Z}^{T}\mathbf{A}\mathbf{X}. \tag{6}$$

We can see from (6) that the new feature $\mathbf{C}$ reflects more information of the image than the features obtained by 2DPCA or alternative 2DPCA alone. Furthermore, the dimension of $\mathbf{C}$ is smaller, and thus (2D)² PCA costs less time than 2DPCA and alternative 2DPCA for image processing.
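To make the above procedure concrete, the following sketch (a minimal Python illustration under our own naming; the paper's experiments were implemented in MATLAB) computes the two projection matrices and the feature matrix $\mathbf{C}$:

```python
import numpy as np

def two_directional_2dpca(images, theta=0.95):
    """A minimal (2D)^2 PCA sketch: learn the row-direction projection X
    and the column-direction projection Z from M image matrices (m x n).
    The threshold theta plays the role of Eq. (4)."""
    A = np.asarray(images, dtype=float)           # shape (M, m, n)
    D = A - A.mean(axis=0)                        # centered images

    G = np.einsum('jki,jkl->il', D, D) / len(A)   # n x n, cf. Eq. (1)
    G1 = np.einsum('jik,jlk->il', D, D) / len(A)  # m x m, column direction

    def top_eigvecs(S):
        w, V = np.linalg.eigh(S)                  # ascending for symmetric S
        w, V = w[::-1], V[:, ::-1]                # sort descending
        d = int(np.searchsorted(np.cumsum(w) / w.sum(), theta)) + 1
        return V[:, :d]

    return top_eigvecs(G), top_eigvecs(G1)        # X (n x d), Z (m x q)

def extract_feature(image, X, Z):
    """Eq. (6): the (2D)^2 PCA feature C = Z^T A X."""
    return Z.T @ image @ X
```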

2.2. Metric Learning

Most machine learning methods use distance metrics to measure the dissimilarity between instances. The main task of metric learning is to find a better distance metric, under which the distances between samples from the same class become small while those between samples from different classes become large. This helps to improve the performance of machine learning methods.

To overcome the limitations of KNN classifiers based on the Euclidean distance, Weinberger et al. proposed a metric learning method called LMNN (Large Margin Nearest Neighbor) [12], which learns a distance metric to improve the performance of KNN classifiers. The metric is obtained by learning a linear transformation matrix $\mathbf{L}$. Under this distance metric, the distances between same-class instances become smaller, and these instances are separated from the other instances by a large margin. The details are as follows.

Let π—π‘–βˆˆπ‘…π‘‘(𝑖=1,…,𝑛) denote the feature vector of training instances and let 𝑦𝑖 be the corresponding label. The essence of LMNN is to obtain a new distance 𝐷(π‘₯𝑖,π‘₯𝑗)=‖𝐋(π‘₯π‘–βˆ’π‘₯𝑗)β€–2=(π‘₯π‘–βˆ’π‘₯𝑗)𝑇𝐋𝑇𝐋(π‘₯π‘–βˆ’π‘₯𝑗) after learning a linear transformation 𝐋 matrix. With this distance metric, the distance between the instance and its π‘˜ nearest neighbors will be minimized and the distance between the instances in different classes will be larger. Figure 1 shows an example of LMNN.

In Figure 1, green circles denote instances from the first class, yellow squares denote instances from the second class, and red squares denote instances from the third class. Consider the instance denoted by the white circle, which is treated as a test instance from the first class, in the following analysis. Based on the Euclidean distance, we find its 4 nearest neighbors, and this test instance is misclassified into the second class. However, using the LMNN-learned metric, this instance is separated from the instances of the second and third classes, and the distances to its same-class neighbors are small. Now it is correctly classified into the first class.
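As a minimal numerical illustration of the learned metric (our own toy example, not LMNN's optimization procedure), the distance under a transformation $\mathbf{L}$ can be computed as:

```python
import numpy as np

def lmnn_distance(L, xi, xj):
    """Squared distance (xi - xj)^T L^T L (xi - xj) under the learned map L."""
    diff = L @ (xi - xj)
    return float(diff @ diff)

# A toy L that stretches the first feature axis changes neighborhoods:
L = np.diag([3.0, 1.0])
a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(lmnn_distance(L, a, b))          # 10.0, versus squared Euclidean 2.0
print(lmnn_distance(np.eye(2), a, b))  # 2.0, the Euclidean baseline
```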

2.3. SMOTE

The performance of machine learning algorithms is typically evaluated by prediction accuracy. However, this is not applicable when the data is imbalanced. Existing solutions to the class imbalance problem can be divided into two categories. One is to assign distinct costs to training examples. The other is to resample the original dataset, either by oversampling the minority class and/or undersampling the majority class.

Chawla et al. proposed an oversampling approach called SMOTE, where the minority class is oversampled by creating "synthetic" examples [17]. Each minority class sample is taken in turn, and synthetic examples are introduced along the line segments joining it to any or all of its $k$ nearest minority class neighbors. Depending upon the amount of oversampling required, neighbors are randomly chosen from the $k$ nearest neighbors.
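The following sketch shows the core interpolation step of SMOTE (a simplified illustration of the idea in [17], with our own function name and defaults):

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=None):
    """Create synthetic minority samples along line segments joining each
    chosen sample to one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]     # k nearest per sample

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))             # pick a minority sample
        j = neighbors[i, rng.integers(k)]        # pick one of its neighbors
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```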

3. The Proposed Method

The proposed method includes a training process and a recognition process. As shown in Figure 2, a classifier is built for each individual, where the samples from that individual are treated as positive and the others as negative. In the verification mode, we input a test sample into the corresponding classifier to verify whether the sample comes from this individual based on the classification result. In the identification mode, we input a test sample into every classifier and identify which individual the sample belongs to.

In the training process, it is necessary to preprocess the infrared images of the finger veins. Preprocessing includes grayscale conversion, ROI selection, and normalization (i.e., size normalization and gray normalization). After the preprocessing, we apply (2D)² PCA to extract the features of the training samples. Then we label the samples as positive or negative accordingly and oversample the positive samples with SMOTE. We learn a new distance metric, that is, the transformation matrix $\mathbf{L}$, with LMNN. Finally, we build the individual KNN classifier based on this new distance metric.

The preprocessing and feature extraction in the recognition process are similar to those in the training process. After that, we input the features of the samples into the trained classifiers to verify the individual based on the classification results.

3.1. Preprocessing

The preprocessing includes grayscale conversion, ROI selection, size normalization, and gray normalization.

3.1.1. Grayscale Conversion

The original image (an example is shown in Figure 3(a)) is a 24-bit color image with a size of 320×240. In order to reduce the computational complexity, we transform the original image into an 8-bit grayscale image using the equation $Y = 0.299R + 0.587G + 0.114B$, where $R$, $G$, and $B$ denote the values of the red, green, and blue components, each coded with 8 bits, and $Y$ is the pixel value after the grayscale transformation.
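A one-step sketch of this conversion (our own helper; the exact rounding is an implementation choice):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert a 24-bit color image (H x W x 3, uint8) to 8-bit grayscale
    using the luma weights from Section 3.1.1."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(float) @ weights).round().astype(np.uint8)
```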

3.1.2. ROI Selection

As the background of the finger vein region might include noise, we employ an edge-detection method to segment the finger vein region from the grayscale image. A Sobel operator with the $3\times3$ mask
$$\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$
is used to detect the edges of the finger. The width of the finger region is obtained from the maximum and minimum abscissa values of the finger profile, and the height of the finger region is detected similarly. A rectangular region is then captured based on this width and height. This rectangular region is called the ROI (as shown in Figure 3(b)).
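A rough sketch of this step (the edge threshold and the bounding-box heuristic are our own simplifications of the profile-based width/height detection):

```python
import numpy as np
from scipy.ndimage import convolve

def select_roi(gray, edge_thresh=50.0):
    """Sobel edge detection followed by the bounding box of the finger profile."""
    sobel = np.array([[-1, 0, 1],
                      [-2, 0, 2],
                      [-1, 0, 1]], dtype=float)
    edges = np.abs(convolve(gray.astype(float), sobel))
    rows, cols = np.nonzero(edges > edge_thresh)
    return gray[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
```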

3.1.3. Size Normalization

The size of the selected ROI differs from image to image due to personal factors such as finger size and changing location. Therefore, it is necessary to normalize the ROI to the same size before feature extraction by (2D)² PCA. We use bilinear interpolation for size normalization in this paper, and the size of the normalized ROI is set to 96×64 (as shown in Figure 3(c)).
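With OpenCV, for instance, this normalization is a single call (assuming 96 is the width and 64 the height of the normalized ROI):

```python
import cv2

# cv2.resize takes the target size as (width, height);
# INTER_LINEAR performs the bilinear interpolation used in this paper.
roi_norm = cv2.resize(roi, (96, 64), interpolation=cv2.INTER_LINEAR)
```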

3.1.4. Gray Normalization

In order to extract efficient features, gray normalization is used to obtain a uniform gray distribution (as shown in Figure 3(d)). Formally, we have
$$p(i,j) = \frac{p'(i,j) - G_1}{G_2 - G_1}, \tag{7}$$

where $p'(i,j)$ is the pixel value of the original image, $G_1$ is the minimum pixel value of the original image, $G_2$ is the maximum pixel value of the original image, and $p(i,j)$ is the pixel value of the image after gray normalization.

3.2. Training Process

After the preprocessing, we extract the features of each image by (2D)² PCA and assign labels to them. A classifier is trained for every individual, where the samples belonging to that individual are treated as positive and the others as negative. We oversample the positive samples with SMOTE to obtain an augmented training set in which the classes are roughly balanced. LMNN is then applied to the augmented training set to obtain a transformation matrix $\mathbf{L}$. With this new distance metric, a KNN classifier is built for classification.
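The per-individual training step could be sketched as follows (a hypothetical pipeline using the imbalanced-learn and metric-learn packages; parameter names, especially LMNN's, vary across package versions):

```python
from imblearn.over_sampling import SMOTE          # assumed dependency
from metric_learn import LMNN                     # assumed dependency
from sklearn.neighbors import KNeighborsClassifier

def train_individual_classifier(features, labels, person_id):
    """One-vs-rest training for one individual (Sec. 3.2). `features` holds
    the flattened (2D)^2 PCA feature matrices, one row per image."""
    y = (labels == person_id).astype(int)         # this individual vs. the rest
    X_bal, y_bal = SMOTE().fit_resample(features, y)   # oversample positives

    lmnn = LMNN(n_neighbors=3)                    # learn transformation L
    lmnn.fit(X_bal, y_bal)
    X_new = lmnn.transform(X_bal)                 # map into the new metric space

    knn = KNeighborsClassifier(n_neighbors=3).fit(X_new, y_bal)
    return lmnn, knn
```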

3.3. Recognition Process

In the verification mode, we input the feature vector of a test sample into the classifier that represents a certain individual, and then we verify whether the sample belongs to this individual based on the classification result. In the identification mode, we employ all classifiers to classify the test sample. If only one classifier C classifies it as positive, the sample belongs to the individual corresponding to classifier C. If several classifiers classify the sample as positive, we use the training accuracy for decision making: the sample belongs to the individual whose classifier has the best training accuracy.
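In code form, the identification rule might look like this (a sketch; `classifiers` and `train_acc` are our own bookkeeping conventions):

```python
def identify(x, classifiers, train_acc):
    """Assign x to the positive-voting classifier with the best training
    accuracy; return None if no classifier claims the sample."""
    votes = [pid for pid, predict in classifiers.items() if predict(x) == 1]
    if not votes:
        return None
    return max(votes, key=lambda pid: train_acc[pid])
```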

4. Experimental Result and Analysis

4.1. Database

The experiments were conducted on our finger vein database, which was collected from the right-hand index fingers of 80 individuals (64 males and 16 females, all Asian), where each index finger contributed 18 finger vein images. Each individual participated in two sessions, separated by two weeks. The age of the participants was between 19 and 60 years, and their occupations included university students, professors, and workers at our school. The capture device was manufactured by the Joint Lab for Intelligent Computing and Intelligent System of Wuhan University, China, and is illustrated in Figure 4.

The original spatial resolution of the data is 320×240. After ROI extraction and size normalization, the size of the region used for feature extraction is reduced to 96×64. Samples collected from the same finger belong to the same class. Therefore, there are 80 classes in our database, each containing 18 samples. Some typical finger vein images are shown in Figure 5.

4.2. Experimental Settings

All the experiments are implemented in MATLAB and conducted on a machine with a 2.4 GHz CPU and 4 GB of memory.

We design three experiments to verify the effectiveness of the proposed method. In Experiment 1, we extract features by (2D)² PCA and then compare the classification performance of the metric-learning-based method and the classic Euclidean-distance-based method. In Experiment 2, we compare the classification performance of the KNN classifier combined with LMNN using different numbers of training samples. In Experiment 3, we employ the SMOTE technique to further boost the performance.

Experiment 1. In this experiment, we first generate four data sets as follows. We select 480, 720, 960, and 1200 images (i.e., 6, 9, 12, and 15 images per individual) for training, and the remaining 960, 720, 480, and 240 images (i.e., 12, 9, 6, and 3 images per individual) are left for testing, respectively. The Euclidean-distance-based recognition method works in the following way. We treat the training samples from each individual as one class and construct a center point for each class, where the $i$th feature of the center point is calculated by averaging the corresponding feature values of the training samples. As there are 80 individuals, we obtain 80 center points $\mathbf{C}_i$ $(i=1,2,\ldots,80)$. For any testing sample $\mathbf{C}$, we compute the Euclidean distances from $\mathbf{C}$ to each center point, $D(\mathbf{C},\mathbf{C}_i)=\|\mathbf{C}-\mathbf{C}_i\|_2$ $(i=1,2,\ldots,80)$. If $D(\mathbf{C},\mathbf{C}_j)=\min_i D(\mathbf{C},\mathbf{C}_i)$, then $\mathbf{C}$ is assigned to the $j$th class.
The metric-learning-based method works similarly to the Euclidean-distance-based method except that it uses the learned distance metric $D(\mathbf{C},\mathbf{C}_i)=\|\mathbf{L}(\mathbf{C}-\mathbf{C}_i)\|_2$ $(i=1,2,\ldots,80)$. The recognition rates of the two methods are compared in Table 2.
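Both baselines reduce to a nearest-center rule; a minimal sketch (our own helper) covering the Euclidean and the learned-metric variants:

```python
import numpy as np

def nearest_center(test, centers, L=None):
    """Assign a test sample to the nearest class center, under the Euclidean
    distance (L is None) or the LMNN-learned metric L."""
    diffs = centers - test                # one row per class center
    if L is not None:
        diffs = diffs @ L.T               # map differences into the new space
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))
```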
It is clearly seen that the recognition rate of the metric-learning-based method is higher than that of the Euclidean-distance-based method. With the distance metric transformation, two samples from different classes with a small Euclidean distance are pushed farther apart, while two samples from the same class with a large Euclidean distance are pulled closer together. Furthermore, the samples from different classes are separated by a large margin. Next we provide an intuitive explanation based on the example shown in Figures 6 and 7.
These two figures show the data distribution of the data set with 480 training samples and 960 testing samples. We obtain 25 features for each sample using (2D)² PCA and select the 2 features with the largest contribution to the Euclidean distance metric. These two features constitute the vertical and horizontal coordinates of Figure 6. Similarly, these two features are transformed into the new metric space using LMNN, as shown in Figure 7. The samples of the first individual (including 6 training images and 12 testing images) are treated as positive, and the rest are considered negative. In Figures 6 and 7, we use red plus signs to denote positive training samples, green plus signs for positive testing samples, blue stars for negative training samples, and yellow stars for negative testing samples. Figure 6 shows that it is difficult to distinguish the first class from the others because the distances between samples of the first class and those of the other classes are not discriminative. This inherent drawback of the Euclidean distance significantly reduces the recognition performance. However, by using LMNN, the samples of the first class are gathered together, as shown in Figure 7. In detail, the positive samples are located mainly in the area with abscissa values between 0 and 10, whereas most negative samples are scattered outside this area. This makes it easier to discriminate samples of the first class from samples of the other classes.

Experiment 2. In this experiment, we select 6, 9, 12, and 15 images from each individual as training samples to build a KNN classifier. The underlying distance metric for each individual is learned by LMNN. Here the number of neighbors, that is, $k$, is empirically set to 3 in KNN. We obtain different recognition rates with different numbers of training samples, and the experimental results are shown in Figure 8.
Overall, the recognition rate increases as the number of training images increases. When the number of training images reaches 15, the recognition rate reaches 96.67%. It is also worth noting that, compared to Table 2, the KNN-based method outperforms both the above-mentioned metric-learning-based method and the Euclidean-distance-based method for the same number of training images.

Experiment 3. This experiment verifies that SMOTE can improve the classification performance. We select 1200 images (15 images per individual) for training and 240 images (3 images per individual) for testing. We use SMOTE to oversample the positive samples to 5, 10, 20, 30, 40, and 50 times the size of the original set. The recognition results are shown in Table 3. We observe that the recognition rate does not improve when only a small number of synthetic positive samples are added, as shown by SMOTE-5 and SMOTE-10. After that, the recognition rate increases by about 3% and finally reaches 99.17% with SMOTE-40 or SMOTE-50. Once the set of synthetic positive samples is sufficiently large, the recognition performance does not improve any further.

5. Conclusion

This paper proposes a new finger vein recognition method based on (2D)² PCA and metric learning. Firstly, we extract features by (2D)² PCA and then train a binary classifier for each individual based on metric learning. Furthermore, we address the class-imbalance problem by applying SMOTE oversampling before the classifier is trained. The experimental results show that the proposed method achieves a recognition rate of 99.17%. The contributions of this paper are as follows. (1) We apply (2D)² PCA to extract features of finger vein images; (2D)² PCA reflects the information in both the row and column directions and is more efficient for feature extraction than PCA and 2DPCA. (2) We build the KNN classifier based on metric learning with LMNN, which changes the sample distribution in the new metric space: LMNN makes the distances between samples from the same class smaller and the distances between samples from different classes larger. Furthermore, we employ a maximum-margin framework to improve the recognition performance, incorporated with individually trained classifiers that reflect the characteristics of the corresponding individuals. (3) We identify the class-imbalance problem: when building the classifier for an individual, the number of samples from the other individuals is considerably larger. We tackle it by oversampling the positive samples with SMOTE, and the experimental results validate its effectiveness. Promising future work includes the exploration of features with better discrimination as well as the processing of low-quality finger vein images.

Acknowledgments

This work is supported by National Natural Science Foundation of China under Grant nos. 61173069 and 61070097, and the Research Fund for the Doctoral Program of Higher Education under Grant no. 20100131110021. The authors would like to thank Shuaiqiang Wang and Guang-Tong Zhou for their helpful comments and constructive advice on structuring the paper. In addition, the authors would particularly like to thank the anonymous reviewers for their helpful suggestions.