Abstract

Orientation information is critical to the accuracy of ear recognition systems. In this paper, a new feature extraction approach is investigated for ear recognition by using orientation information of Gabor wavelets. The proposed Gabor orientation feature can not only avoid too much redundancy in conventional Gabor feature but also tend to extract more precise orientation information of the ear shape contours. Then, Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) is proposed for ear recognition. Compared with SRC in which the sparse coding coefficients can be negative, the nonnegativity of NSRC conforms to the intuitive notion of combining parts to form a whole and therefore is more consistent with the biological modeling of visual data. Additionally, the use of Gabor orientation features increases the discriminative power of NSRC. Extensive experimental results show that the proposed Gabor orientation feature based nonnegative sparse representation classification paradigm achieves much better recognition performance and is found to be more robust to challenging problems such as pose changes, illumination variations, and ear partial occlusion in real-world applications.

1. Introduction

Because of its wide use in many application domains, biometrics has been a hotspot of pattern recognition and modern security technique development in the past few years. Although great research interest has been devoted to face, fingerprint, and iris, it is believed that each biometric has its own strengths and weaknesses, and no single biometric can be expected to meet all the requirements imposed by all applications [1]. Therefore, further research efforts are needed to exploit other potential biometric modalities that can be conveniently acquired at lower cost. The ear is an emerging biometric modality with a rich and stable structure that does not change significantly as an individual grows [2]. The ear also has “desirable properties such as universality, uniqueness, and permanence” [3]. Moreover, the expression problem, which is a challenging and unsolved bottleneck for unconstrained face recognition, does not exist for ear recognition, as ears have a wealth of structural features that resist expression variations. Additionally, the acquisition of ear images is considered to be nonintrusive by most people in comparison with other biometric modalities such as fingerprints or iris. Because of these properties, academic interest in ear recognition has grown significantly in the past few years [4, 5].

Using 3D models for recognition is considered as a promising solution to ear identification with pose changes or illumination variations [6, 7]. However, the slow speed of 3D ear data acquisition limits its wide use in civilian scenarios. What is more, the currently employed 3D data acquisition device is not only expensive and large in size but also imposes more restrictions on the subject. These are obviously not suitable for nonintrusive human identification in real applications. Therefore, the focus of our work is to exploit the 2D ear images which can be conveniently acquired using low cost digital camera.

Varieties of approaches have been explored in the literature for ear identification by using 2D ear images. Chang et al. [8] applied classical Principal Component Analysis (PCA) to face and ear recognition separately, and experimental results showed that ear and face did not have much difference on recognition performance. Hurley et al. [9] developed a new method of localizing ear shape features by using force field transform. Arbab-Zavar and Nixon [10] proposed a model-based approach for ear recognition. The proposed method was based on a model derived by a stochastic clustering on a set of scale invariant features extracted from the training set. Kumar and Wu proposed an ear recognition approach by using the phase information of Log-Gabor filters to encode local ear structures. Yuan and Mu [11] proposed a 2D ear recognition approach based on local information fusion to deal with ear recognition under partial occlusion. However, most of these research works focused on ear recognition in constrained environments. Great efforts are still needed to effectively deal with challenging problems such as pose changes, illumination variations, and image occlusion so as to realize robust ear recognition in real-world applications.

Recently, Wright et al. [12] proposed an innovative classification algorithm SRC (sparse representation based classification) for face recognition, which is demonstrated to be especially effective to deal with image occlusion and corruption. The success of SRC boosts the research interests in sparse representation based ear recognition. Later, Naseem et al. [13] directly applied SRC to ear recognition, which is the first time to address ear biometrics in the context of sparse representation. Huang et al. [14] developed a robust face and ear based multimodal biometric system using sparse representation, which integrated the face and the ear at feature level. In the framework of sparse representation based classification listed above, extracted features from the training data are used to develop a dictionary. Then classification is performed by representing the extracted features of the test data as a sparse linear combination of atoms in the dictionary. However, what needs to be pointed out is that the extracted feature is still critical [15] to sparse representation based classification for ear recognition. Since the number of training samples is often limited in real applications, ear features such as Eigen ear and Random ear are not effective to handle the variations of illumination, pose, and local deformation. Therefore, in the framework of sparse representation based classification, significant efforts are required to investigate more discriminative features that are robust to pose changes, illumination variations, or ear occlusion.

The rich texture information contained in structures of the outer ear can be encoded using Gabor filters. Moreover, local Gabor features are less sensitive to variations of illumination or pose than the holistic features such as Eigen ear and Random ear. Therefore, Gabor related research has been explored in ear recognition. Wang and Yuan proposed to extract local frequency features by using Gabor features and general discriminant analysis [16]. Kumar and Zhang [17] used Log-Gabor wavelets to extract the phase information from the gray-level ear signals. Khorsandi et al. [18] presented a fully automated approach for ear recognition using sparse representation of Gabor features from the ear images. All of these methods lead to improved ear recognition performance. However, the redundant conventional Gabor features are defined via concatenation of the magnitude coefficients resulting in extremely high dimensionality of features since Gabor filters with multiple scales and directions are adopted.

This paper investigates an effective algorithm based on nonnegative sparse representation of Gabor orientation feature for ear recognition under pose changes, illumination variations, and image occlusion. Our feature extraction scheme is to construct Gabor orientation features. Compared with conventional Gabor features, the proposed Gabor orientation feature can not only reduce the information redundancy but also effectively describe the ear shape contours. Then, Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) is proposed for ear recognition. The proposed classification algorithm treats the Gabor orientation features from all the training samples as the dictionary of our nonnegative sparse representation classification (NSRC) for ear recognition, and then the test ear is represented as a linear additive (without subtraction) combination of the Gabor orientation features extracted from all the training ear samples. Compared with SRC in which the sparse coding coefficients can be negative, allowing the data to “cancel each other out” by subtraction, the nonnegativity of our Gabor orientation feature based nonnegative sparse representation classification algorithm conforms to the intuitive notion of combining parts to form a whole and therefore is more consistent with the biological modeling of visual data. Extensive experimental results show that the proposed classification paradigm achieves much better recognition performance and is found to be more robust to challenging problems including pose changes, illumination variations, and ear partial occlusion.

2.1. Gabor Features

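The Gabor filters with direction $\mu$ and scale $\nu$ can be defined as follows [19]:

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[\exp\left(i\, k_{\mu,\nu} \cdot z\right) - \exp\left(-\frac{\sigma^2}{2}\right)\right], \tag{1}$$

where $z$ denotes the pixel of an image, $\|\cdot\|$ represents the norm operation, and $k_{\mu,\nu}$ is defined as $k_{\mu,\nu} = k_\nu e^{i\phi_\mu}$ with $k_\nu = k_{\max}/f^{\nu}$, $\phi_\mu = \pi\mu/P$, where $P$ is the number of directions. $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain.

The Gabor transformation of an image $I(z)$ is the convolution of the image with the Gabor filters:

$$G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z). \tag{2}$$

Then the Gabor filtering coefficient $G_{\mu,\nu}(z)$ is a complex number that can be rewritten as

$$G_{\mu,\nu}(z) = M_{\mu,\nu}(z) \exp\left(i\,\theta_{\mu,\nu}(z)\right), \tag{3}$$

where $M_{\mu,\nu}(z)$ is the magnitude and $\theta_{\mu,\nu}(z)$ is the phase. Magnitude information is known to reflect the variations of local energy in the image. Therefore, in most Gabor based feature extraction research, conventional Gabor features are usually defined via concatenation of the magnitude coefficients [20].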
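As a minimal sketch of the Gabor kernel and magnitude response described above, the following NumPy code samples one complex kernel on a square grid and filters an image with it. The function names, the kernel size, the circular FFT-based convolution, and the default values `sigma = 2*pi`, `k_max = pi/2`, `f = sqrt(2)`, and `n_dirs = 4` are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi, k_max=np.pi / 2,
                 f=np.sqrt(2), n_dirs=4):
    """Complex Gabor kernel for direction mu and scale nu (assumed defaults)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = k_max / f ** nu                    # radial frequency of scale nu
    phi = np.pi * mu / n_dirs              # orientation angle of direction mu
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    k_sq, z_sq = k ** 2, x ** 2 + y ** 2
    envelope = (k_sq / sigma ** 2) * np.exp(-k_sq * z_sq / (2 * sigma ** 2))
    # Oscillatory part minus the DC-compensation term
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def gabor_magnitude(image, kernel):
    """Magnitude of the circular (FFT-based) convolution of image and kernel.

    Assumes the image is at least as large as the kernel.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w), dtype=complex)
    padded[:kh, :kw] = kernel
    # Shift the kernel centre to the origin so the response is not displaced
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    response = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded))
    return np.abs(response)
```

Calling `gabor_magnitude(image, gabor_kernel(mu, nu))` for every direction and scale yields the magnitude maps that conventional Gabor features concatenate. A spatial-domain convolution with boundary padding could be substituted for the circular convolution used here.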

2.2. Sparse Representation Based Classification (SRC)

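The sparse representation based classification (SRC) algorithm was proposed for robust face identification in [12]. We denote by $A_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ the set of training samples from the $i$th class. $s_{i,j}$, $j = 1, 2, \ldots, n_i$, is an $m$-dimensional vector obtained by stacking the columns of the $j$th sample of the $i$th class. The sparse representation makes an assumption that the input sample can be represented as a sparse linear combination of all the samples from the same class. Thus for a test sample $y_0 \in \mathbb{R}^m$ from the $i$th class, ideally, $y_0$ could be well approximated by the linear combination of all the samples within $A_i$; that is, $y_0 = \sum_{j=1}^{n_i} x_{i,j} s_{i,j} = A_i x_i$.

Assuming that we have $n$ training samples from $K$ classes, we define the concatenation of the training samples from all classes by $A = [A_1, A_2, \ldots, A_K] \in \mathbb{R}^{m \times n}$, where $n = n_1 + n_2 + \cdots + n_K$. Then the test sample can be rewritten in terms of all training samples as $y_0 = Ax$, $x = [0, \ldots, 0, x_i^T, 0, \ldots, 0]^T$.

In SRC without occlusion, the sparse solution of the input sample is solved via $\ell_1$-minimization

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad Ax = y_0. \tag{4}$$

Then classification is made by

$$\text{identity}(y_0) = \arg\min_{i} \|y_0 - A\,\delta_i(\hat{x})\|_2, \tag{5}$$

where $\delta_i(\hat{x})$ is a new vector whose only nonzero entries are those of $\hat{x}$ that are associated with class $i$.

For an image with occlusion or corruption, the occluded or corrupted test sample $y$ is rewritten as

$$y = y_0 + e_0 = [A, A_e] \begin{bmatrix} x \\ x_e \end{bmatrix} = Bw, \tag{6}$$

where $B = [A, A_e] \in \mathbb{R}^{m \times (n + n_e)}$. The test sample without occlusion or corruption $y_0$ and the corruption error $e_0$ have sparse representations over the training sample dictionary $A$ and the occlusion dictionary $A_e \in \mathbb{R}^{m \times n_e}$, respectively. The occlusion dictionary $A_e$ was defined as the identity matrix in our work. The sparse coding coefficient $w$ could be solved via $\ell_1$-minimization

$$\hat{w} = \begin{bmatrix} \hat{x} \\ \hat{x}_e \end{bmatrix} = \arg\min_{w} \|w\|_1 \quad \text{subject to} \quad Bw = y, \tag{7}$$

and classification is then made by

$$\text{identity}(y) = \arg\min_{i} \|y - A\,\delta_i(\hat{x}) - A_e \hat{x}_e\|_2. \tag{8}$$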
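The SRC procedure without occlusion can be sketched in a few lines of NumPy. This sketch does not solve the equality-constrained problem directly. As a stated substitution, it uses the iterative shrinkage-thresholding algorithm (ISTA) on the penalized form 0.5*||y - A x||^2 + lam*||x||_1. The function names, `lam`, and `n_iter` are hypothetical choices for illustration.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximately minimize 0.5*||y - A x||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # inverse Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))    # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

def src_classify(A, labels, y, lam=0.01):
    """Assign y to the class whose atoms give the smallest residual."""
    labels = np.asarray(labels)
    x = ista_l1(A, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

Here `A` holds one training sample per column and `labels[j]` is the class of column `j`. Keeping only the coefficients of one class at a time plays the role of the class-selection vector in the residual rule.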

3. Gabor Orientation Feature Based Robust Representation and Classification

3.1. Gabor Orientation Feature Extraction

Conventional Gabor feature extraction generates redundant features of extremely high dimensionality. Consequently, extracting Gabor features is computationally intensive, making the features impractical for real-time applications. Moreover, the redundant information in conventional Gabor features decreases their discriminative capability for classification.

In this paper, we extract the Gabor orientation information contained in the magnitude of the Gabor transform. The input image is first convolved with the Gabor kernel functions to obtain magnitude information across different directions and different scales. Then, for a fixed direction, the magnitude information of all the scales at this direction is accumulated to form the orientation feature. In comparison with conventional Gabor features, the Gabor orientation feature not only reduces the feature dimensionality by a factor equal to the number of Gabor scales but also reduces the redundancy between the orientation information across various scales. Moreover, the Gabor orientation feature extracted in this way strengthens the orientation information of the ear shape contours, which is critical to ear identification.

Gabor kernels with 3 different scales ($\nu \in \{0, 1, 2\}$) and four different directions ($\mu \in \{0, 1, 2, 3\}$) are used in this paper. We predefine the parameters $\sigma$, $k_{\max}$, and $f$. The whole process of the Gabor orientation feature extraction from the ear image is illustrated in Algorithm 1 below.

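Algorithm 1 (Gabor orientation feature extraction of ear images). Consider the following.
(1) Input: an ear image $I(z)$ and Gabor kernel functions $\psi_{\mu,\nu}(z)$, for $\mu \in \{0, 1, 2, 3\}$, $\nu \in \{0, 1, 2\}$.
(2) Convolve the ear image with the Gabor kernel functions: $G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z)$.
(3) Accumulate the magnitude information $M_{\mu,\nu}(z) = |G_{\mu,\nu}(z)|$ of all the scales at each direction: $O_\mu(z) = \sum_{\nu} M_{\mu,\nu}(z)$.
(4) Output: the Gabor orientation feature, defined as the concatenation of $O_\mu$ across all four directions: $[O_0; O_1; O_2; O_3]$.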
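The accumulation and concatenation steps of Algorithm 1 can be sketched as below, assuming the magnitude responses have already been computed and stacked into one array. The array layout `(n_dirs, n_scales, H, W)`, the function names, and the reading of "cumulating" as a pixelwise sum over scales are assumptions made for illustration.

```python
import numpy as np

def gabor_orientation_feature(magnitudes):
    """Sum magnitude maps over scales, then concatenate the direction maps.

    magnitudes: array of shape (n_dirs, n_scales, H, W).
    Returns a 1D vector of length n_dirs * H * W.
    """
    magnitudes = np.asarray(magnitudes, dtype=float)
    orientation_maps = magnitudes.sum(axis=1)   # shape (n_dirs, H, W)
    return orientation_maps.reshape(-1)

def conventional_gabor_feature(magnitudes):
    """Concatenate every magnitude map: length n_dirs * n_scales * H * W."""
    return np.asarray(magnitudes, dtype=float).reshape(-1)
```

With four directions and three scales, the orientation feature is three times shorter than the conventional concatenation, which matches the dimensionality reduction claimed for the method.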

Figure 1 illustrates the process of Algorithm 1 to obtain the Gabor orientation features of an input ear image. There is clearly a large amount of redundancy in the filtering responses across different scales and directions. The proposed Gabor orientation information across four different directions not only reduces the feature dimensionality by a factor equal to the number of Gabor scales but also enhances the orientation information of the ear shape contours, which is critical to ear recognition.

3.2. Nonnegative Sparse Representation of Gabor Orientation Features

In sparse representation classification described in Section 2.2, the input data is represented as a sparse linear combination of atoms in the dictionary involving both additive and subtractive operations. The negativity of the coding coefficients allows the data to “cancel each other out” by subtraction, which lacks physical interpretation for visual data. In fact, nonnegativity is more consistent with the biological modeling of visual data [21–23] and often produces much better results for data representation [24]. Lee and Seung in [25] argued forcefully for nonnegative representation. Other arguments for nonnegative representation were based on biological modeling, where such constraints are related to the nonnegativity of neural firing rates [26].

Nonnegative sparse representation is distinguished by the nonnegativity constraints it enforces on the dictionary and the sparse coding coefficients; that is, all the elements must be equal to or greater than zero. The nonnegativity constraint leads nonnegative sparse representation to a part-based representation. The nonnegative sparse representation model can be formulated as $\min_{x} \|x\|_0$ subject to $y = Dx$ with additional constraints $D \geq 0$, $x \geq 0$, which is different from the standard sparse representation model.

Appropriate dictionary design plays an important role in the framework of sparse representation based classification algorithm [27]. The proposed Gabor orientation feature can not only enhance orientation information of the ear shape but also tolerate image local deformation to some extent. In this paper, we propose to use Gabor orientation features extracted from all the training ear images as the dictionary of our Gabor orientation feature based nonnegative sparse representation classification model.

Supposing there exist $n$ training ear images from $K$ distinct classes with $n_i$ images from the $i$th class, we first extract the Gabor orientation feature from every training ear sample, identified with the vector $d_{i,j} \in \mathbb{R}^m$, called atoms of our Gabor orientation feature based nonnegative sparse representation classification. Here $i$ denotes the index of the class, $i = 1, 2, \ldots, K$, and $j$ denotes the index of the training sample, $j = 1, 2, \ldots, n_i$. Then, all the atoms from the $i$th class are arranged as columns of a matrix $D_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}]$. Finally, the dictionary of our Gabor orientation feature based nonnegative sparse representation classification is defined as the concatenation of the atoms from all classes: $D = [D_1, D_2, \ldots, D_K] \in \mathbb{R}^{m \times n}$.

Thus the linear representation of $y$, denoting the Gabor orientation feature of the test ear from the $i$th class, can be represented in terms of all the atoms in dictionary $D$ as $y = Dx$, where $x = [0, \ldots, 0, x_{i,1}, \ldots, x_{i,n_i}, 0, \ldots, 0]^T \in \mathbb{R}^n$ is a coefficient vector whose entries are zero except for those from the same class as the test ear signal. $x$ is called the sparse coding coefficient.

In SRC, the test data is represented as a combination of atoms in the dictionary involving both additive and subtractive interactions. Here, we propose to express the test ear sample as a linear additive (without subtraction) combination of all the atoms. Our proposed Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) treats Gabor orientation features as the atoms of the dictionary $D$. According to the feature extraction process presented in Section 3.1, the elements in the dictionary $D$ are all nonnegative, so additional constraints only need to be enforced on the sparse coefficient $x$; that is, $x_j \geq 0$, for $j = 1, 2, \ldots, n$. So the proposed Gabor orientation feature based nonnegative sparse representation (GNSRC) model for ear recognition is given below:

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Dx, \quad x_j \geq 0, \; j = 1, 2, \ldots, n. \tag{9}$$

Similar to SRC, after the nonnegative sparse coding coefficient $\hat{x}$ is computed by the algorithm given in Section 3.3, classification is then made by

$$\text{identity}(y) = \arg\min_{i} \|y - D\,\delta_i(\hat{x})\|_2, \tag{10}$$

where $\delta_i(\hat{x})$ is a new vector whose only nonzero entries are those of $\hat{x}$ that are associated with class $i$.

When the test ear image is occluded, the test ear sample is rewritten as

$$y = y_0 + e_0 = [D, D_e] \begin{bmatrix} x \\ x_e \end{bmatrix} = Bw, \tag{11}$$

where $B = [D, D_e]$, $D_e \in \mathbb{R}^{m \times n_e}$ is the occlusion dictionary, and $w_j \geq 0$, for $j = 1, 2, \ldots, n + n_e$.

Then the classification strategy in (10) should be modified as

$$\text{identity}(y) = \arg\min_{i} \|y - D\,\delta_i(\hat{x}) - D_e \hat{x}_e\|_2. \tag{12}$$
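The occlusion-aware classification step can be sketched as follows, assuming the occlusion dictionary is the identity matrix (as stated in Section 2.2) and that a nonnegative coefficient vector over the extended dictionary has already been computed by some solver. The function names and the convention that the first `n` entries of `w` belong to the training atoms are illustrative assumptions.

```python
import numpy as np

def extended_dictionary(D):
    """Stack the training atoms with an identity occlusion dictionary."""
    return np.hstack([D, np.eye(D.shape[0])])

def occlusion_residuals(D, labels, y, w):
    """Per-class residuals after removing the estimated occlusion term.

    w holds n coefficients for the atoms of D followed by m occlusion
    coefficients (identity occlusion dictionary assumed).
    """
    labels = np.asarray(labels)
    n = D.shape[1]
    x, x_e = w[:n], w[n:]
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - D[:, labels == c] @ x[labels == c] - x_e)
        for c in classes
    ])
    return classes, residuals
```

The predicted identity is `classes[np.argmin(residuals)]`. Subtracting the occlusion part before comparing classes means that the residual reflects how well one class explains the unoccluded content rather than the occluding block.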

The solution of our Gabor orientation feature based nonnegative sparse representation classification model is presented in Section 3.3.

3.3. Nonnegative Sparse Solution to Gabor Orientation Feature Based Nonnegative Sparse Representation Classification

Recent theoretical development in sparse representation reveals that $\ell_1$-minimization can be used to approximate the $\ell_0$-minimization problem, making the problem of (9) convex while still encouraging sparse solutions [28]. Moreover, it is suggested that $\ell_1$-minimization leads to more stable active sets and is preferred for classification tasks [27]. Therefore, we propose to approximate the solution of (9) via $\ell_1$-minimization. By replacing $\|x\|_0$ with the $\ell_1$ norm $\|x\|_1$, the Gabor orientation feature based nonnegative sparse representation classification algorithm can be modeled as

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \|y - Dx\|_2^2 \leq \varepsilon, \tag{13}$$

with constraints $x_j \geq 0$, for $j = 1, 2, \ldots, n$, where $\varepsilon$ is the reconstruction error tolerance. This can be rewritten as a standard problem in linear optimization under quadratic and linear inequality constraints.

Therefore, for an appropriate Lagrange multiplier $\lambda$, which controls the compromise between accurate reconstruction and sparseness, the solution to the problem (13) is precisely the solution to the unconstrained optimization problem:

$$\hat{x} = \arg\min_{x \geq 0} \; \frac{1}{2}\|y - Dx\|_2^2 + \lambda \|x\|_1. \tag{14}$$

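The dictionary $D$ is known because the atoms of $D$ are defined as the Gabor orientation features extracted from all the training ear samples. Therefore (14) is quadratic with respect to $x$. The global minimum can be calculated using optimization algorithms such as gradient descent and quadratic programming. In [29], an efficient algorithm was presented to solve (14). The update rule is given below:

$$x^{(t+1)} = x^{(t)} \odot \left(D^T y\right) \oslash \left(D^T D\, x^{(t)} + \lambda \mathbf{1}\right), \tag{15}$$

where $\lambda$ is the Lagrange multiplier of (14), and $\odot$ and $\oslash$ denote elementwise multiplication and division, respectively.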
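A minimal sketch of a nonnegative sparse coding solver is given below, assuming the update rule is the multiplicative rule commonly used for nonnegative sparse coding with a fixed nonnegative dictionary. The function names, the all-ones initialization, the iteration count, and the small constant `eps` guarding the division are assumptions for illustration.

```python
import numpy as np

def nonnegative_sparse_code(D, y, lam=0.01, n_iter=1000, eps=1e-12):
    """Multiplicative updates for min over x >= 0 of
    0.5*||y - D x||^2 + lam*sum(x), assuming D >= 0 and y >= 0."""
    Dty = D.T @ y                  # fixed numerator, nonnegative by assumption
    DtD = D.T @ D
    x = np.ones(D.shape[1])        # strictly positive starting point
    for _ in range(n_iter):
        # Elementwise rescaling keeps every coefficient nonnegative
        x = x * Dty / (DtD @ x + lam + eps)
    return x

def objective(D, y, x, lam):
    """Reconstruction error plus the sparsity penalty on nonnegative x."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(x)
```

Because each step only multiplies the current coefficients by a nonnegative ratio, no projection onto the nonnegative orthant is needed. A larger `lam` trades reconstruction accuracy for sparser coefficients.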

4. Experimental Results

In this section, we will investigate the use of our proposed Gabor orientation feature based nonnegative sparse representation classification to deal with ear recognition under challenging problems such as pose changes, illumination variations, and ear partial occlusion. Extensive experiments are carried out on UND database Collection J2 and USTB ear database to validate the claims of the previous sections.

4.1. Robustness to Challenging Problems: Pose Changes and Illumination Variations

UND database Collection J2 [30] is used in the experiment to evaluate the performance of our proposed algorithm for ear recognition under varying poses and illuminations. Established between 2003 and 2005, UND database Collection J2 consists of 415 subjects. Some of the ear images undergo obvious pose changes or illumination variations, and some are with occlusion. An improved Adaboost algorithm is used to detect and locate ear area automatically [31]. Typical cropped ear images from one subject are shown in Figure 2. It is obvious to see that ear samples in this database suffer from large pose changes and illumination variations.

In this experiment, we randomly select one ear image per subject for testing and the remaining ear images are used as training samples. No preprocessing operations such as denoising, illumination normalization, or pose normalization are carried out in this experiment. We compare the recognition performance of our proposed Gabor orientation feature with Eigen-ear, LBP (Local Binary Patterns), and conventional Gabor features using nearest neighbor (NN) as classifier. Figure 3 illustrates the cumulative match characteristics (CMC) curves of these four ear recognition approaches. The experimental results illustrate that our proposed Gabor orientation feature outperforms conventional Gabor feature, LBP, and Eigen-ear even using simple NN classifier for recognition. It demonstrates that our proposed Gabor orientation feature is effective to describe ear features.
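For reference, a cumulative match characteristic curve reports, for each rank r, the fraction of test samples whose correct identity appears among the r best-matching classes. A minimal sketch for a nearest neighbor matcher is shown below. The function name and the distance-matrix input are illustrative assumptions, and every probe identity is assumed to be present in the gallery.

```python
import numpy as np

def cmc_curve(distances, probe_labels, gallery_labels):
    """Cumulative match characteristic from a probe-by-gallery distance matrix.

    distances[i, j] is the distance from probe i to gallery sample j.
    Returns an array whose entry r-1 is the rank-r recognition rate.
    """
    distances = np.asarray(distances, dtype=float)
    probe_labels = np.asarray(probe_labels)
    gallery_labels = np.asarray(gallery_labels)
    classes = np.unique(gallery_labels)
    # Smallest distance from each probe to any sample of each class
    class_dist = np.stack(
        [distances[:, gallery_labels == c].min(axis=1) for c in classes],
        axis=1)
    ranked = classes[np.argsort(class_dist, axis=1)]
    # Zero-based position of the true class in each probe's ranking
    true_rank = np.argmax(ranked == probe_labels[:, None], axis=1)
    return np.array([(true_rank <= r).mean() for r in range(len(classes))])
```

The first entry of the returned array is the rank one recognition rate, and the curve is nondecreasing and reaches 1 at the last rank.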

In order to investigate the recognition performance of our proposed Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC), the number of training samples is required to be sufficiently large to accurately determine the identity of the test sample. Nevertheless, some subjects have only a few images (2 or 4 images) in this database. These subjects are not suitable for sparse representation classification. We therefore choose the subjects that have more than 10 images; a total of 60 subjects meet this criterion. We randomly select five images per subject for testing, and the remaining images are used for training. In the framework of sparse representation based classification, the dimensionality of the atoms in the dictionary greatly affects the running time of recognition. Therefore, after the conventional Gabor features and our proposed Gabor orientation features are extracted, the features are subsequently downsampled to dimensionalities of 60, 120, 240, 360, 480, 600, 720, and 840.

The rank one recognition performances of conventional Gabor features and our proposed Gabor orientation features using NN, SRC, and NSRC under different feature dimensionalities are illustrated in Figure 4. As can be seen in Figure 4, when using the same classifier for classification, our proposed Gabor orientation feature achieves much better performance than conventional Gabor feature. That is because, in comparison with conventional Gabor features, our proposed Gabor orientation features can not only reduce the redundancy between the orientation information across different scales but also enhance the orientation information of the ear shape contours. It is easy to see that the proposed Gabor orientation + NSRC achieves the best recognition performance on UND ear database J2 with large pose changes and illumination variations.

Table 1 lists the recognition performance comparisons of our proposed Gabor orientation feature using different classifiers: NN, SRC, and NSRC. As can be seen in Table 1, our proposed Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) achieves the best performance. When the feature dimension is 240 or higher, it achieves a recognition rate of more than 90%, which is a remarkable recognition performance on such a database under challenging practical conditions including pose changes and ambient illumination variations. That is because the nonnegativity of the proposed Gabor orientation feature based nonnegative sparse representation classification algorithm is more consistent with the biological modeling of visual data and therefore leads to an improved recognition performance.

From the experimental results given in Figure 4 and Table 1, we can conclude that the proposed Gabor orientation feature based nonnegative sparse representation classification algorithm is effective to deal with ear recognition under challenging conditions with varying poses and illuminations.

4.2. Robustness to Occlusion

Ear occlusion is considered a challenging and inevitable problem in real applications, as ears are often occluded by objects such as hair, hats, or earrings [6]. Occlusion poses a great obstacle to robust ear identification in real-world application scenarios. As a result, ear recognition with partial occlusion is addressed as an open challenging problem in most ear recognition related research [1, 7]. In this section, we focus on evaluating the robustness of our proposed Gabor orientation feature based nonnegative sparse representation classification algorithm (Gabor orientation + NSRC) for ear recognition under random occlusion.

Most of the available ear databases suffer from pose changes, illumination variations, and partial occlusion simultaneously. Our USTB ear database III [32] is publicly available for academic research. On our USTB ear database III, all images are acquired with color CCD camera under the white background and constant lighting. Furthermore, a total of 20 ear images are acquired for each subject, sufficient for the sparse representation based classification algorithm. Because of these properties, this database is suitable for carrying out specialized research on ear recognition with partial occlusion, excluding other influencing factors such as illumination variations and pose changes. Figure 5 presents a typical subject from this ear database.

We randomly occlude 5, 10, 15, 20, 25, 30, 35, 40, and 50 percent of each test ear image by replacing a block of the image with an unrelated image, in order to evaluate our proposed Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) for ear recognition under various levels of random occlusion. The location of the occlusion is randomly chosen and is unknown to the computer, which is reasonable for real-world ear recognition applications. We choose an interval of 5 percent because the ear is smaller in size than the face, so a small increase in the occluded area can have a severe impact on recognition performance. In our work, various levels of occlusion at arbitrary locations are evaluated thoroughly to demonstrate the effectiveness of the proposed algorithm for ear recognition under occlusion. Figure 6 illustrates the randomly occluded test ear images.
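The random block occlusion protocol can be sketched as below. The block shape is not specified in the text, so this sketch assumes a rectangle with the same aspect ratio as the image whose area is the requested fraction of the image. The function name and the way the occluding patch is cropped from the unrelated image are also assumptions.

```python
import numpy as np

def occlude_random_block(image, occluder, fraction, rng=None):
    """Replace a randomly placed block of `image` with pixels from `occluder`.

    The block covers roughly `fraction` of the image area and keeps the
    image's aspect ratio (an assumption). `occluder` must be at least as
    large as the block.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    bh = max(1, int(round(h * np.sqrt(fraction))))
    bw = max(1, int(round(w * np.sqrt(fraction))))
    top = int(rng.integers(0, h - bh + 1))    # random location, not stored
    left = int(rng.integers(0, w - bw + 1))
    out = image.copy()                        # leave the input untouched
    out[top:top + bh, left:left + bw] = occluder[:bh, :bw]
    return out
```

For example, `occlude_random_block(ear, unrelated, 0.25)` covers a quarter of the pixels of `ear`. The block position is drawn fresh on every call, so the classifier receives no information about where the occlusion lies.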

The whole process of Gabor orientation feature based sparse representation classification for an occluded test ear image is illustrated in Figures 7 and 8. The feature extraction process of the proposed Gabor orientation feature based sparse representation classification is illustrated in Figure 7. Figure 7(a) shows a 25% randomly occluded test ear sample from the first class of USTB ear database III. Figure 7(b) shows the Gabor filtering responses of the occluded test ear sample. Figures 7(c), 7(d), 7(e), and 7(f) illustrate the Gabor orientation information of the test ear sample across four different directions. Figure 8 illustrates the classification process of the proposed Gabor orientation feature based sparse representation classification algorithm. Figures 8(a) and 8(b) plot the sparse coding coefficients and representation residual using Gabor orientation feature based sparse representation classification (Gabor orientation + SRC). Figures 8(c) and 8(d) plot the nonnegative sparse coding coefficients and representation residual using Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC). We see that Gabor orientation + NSRC correctly classifies the 25% occluded test ear to the first class of the database. However, the occluded ear sample is wrongly classified using Gabor orientation + SRC. Although the representation coefficients are both sparse for Gabor orientation + SRC and Gabor orientation + NSRC, the main difference lies in that the representation coefficients of our proposed Gabor orientation + NSRC are all nonnegative. It demonstrates that nonnegativity is more consistent with the biological modeling of visual data and our Gabor orientation + NSRC exhibits greater robustness to ear image occlusion compared with Gabor orientation + SRC.

The recognition rates under occlusion using Gabor orientation + NSRC, Downsample + SRC, Random ear + SRC, and Eigen ear + SRC are illustrated in Figure 9. From the results in Figure 9, we can see that our Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) achieves the best recognition performance. Even when the occlusion percent reaches 25%, the proposed Gabor orientation + NSRC algorithm can still achieve a recognition rate of more than 90%, greatly surpassing the other three methods. As the occlusion percent becomes larger, the advantage of our proposed Gabor orientation + NSRC over the other three approaches grows. That is because, compared with the other three feature extraction methods, the proposed Gabor orientation feature can effectively encode more precise orientation information of the ear shape contours, which is more robust to local deformation of the ear image. Furthermore, the nonnegativity of Gabor orientation + NSRC conforms to human visual perception and therefore leads to better recognition performance. In short, the proposed Gabor orientation feature based nonnegative sparse representation classification shows greater robustness to partial ear occlusion and achieves a promising performance for ear recognition under random occlusion.

Table 2 lists the direct recognition performance comparisons between conventional Gabor feature and our proposed Gabor orientation feature under the framework of sparse representation based classification. As can be seen in Table 2, for the same Gabor orientation feature proposed in the paper, Gabor orientation + NSRC outperforms Gabor orientation + SRC greatly, especially when the occlusion percent surpasses 15%. The same phenomenon holds for conventional Gabor feature; that is, conventional Gabor + NSRC surpasses conventional Gabor + SRC greatly, especially when the occlusion percent surpasses 10%. It demonstrates that nonnegativity conforms to the intuitive notion of combining parts to form a whole and hence leads to improved performance for occluded ear recognition. Obviously, the proposed Gabor orientation + NSRC algorithm achieves the best recognition performance and shows great robustness to ear occlusion. Even when the ear occlusion percent reaches 30%, the proposed Gabor orientation + NSRC can still achieve a recognition rate of 88.35%, while none of the other three approaches achieves 50%. The experimental results listed in Table 2 demonstrate that the proposed Gabor orientation feature based nonnegative sparse representation classification algorithm exhibits more robustness to ear occlusion, especially for large scale occlusion.

5. Conclusions

In this paper, a new feature extraction approach is proposed by using orientation information of Gabor wavelets. The new Gabor orientation feature extracts orientation information of the ear across different directions and effectively describes the ear shape contour information. Then, combining the visual perception characteristics of Gabor orientation features and nonnegative sparse representation, we propose to use Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) for ear recognition under challenging problems such as pose changes, illumination variations, and ear occlusion. Extensive experimental results on the UND J2 ear database and the USTB ear database demonstrate the effectiveness of our proposed Gabor orientation features and their superiority over conventional Gabor features. In particular, when combined with nonnegative sparse representation classification (NSRC), the proposed Gabor orientation feature based nonnegative sparse representation classification algorithm achieves better recognition performance and shows greater robustness to pose changes, illumination variations, and occlusion, which are challenging problems for ear recognition in real applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (no. 61375010, no. 61170116, and no. 60973064), the Fundamental Research Funds for the Central Universities (no. FRF-SD-12-017A and no. FRF-TP-12-100A), and the Doctoral Fund of Ministry of Education of China (no. 20100006110014). The authors would like to express their sincere appreciation to the anonymous reviewers for their insightful comments, which greatly helped them to improve the quality of the paper.