#### Abstract

This paper develops an approach to measure the information content in a biometric feature representation of iris images. In this context, the biometric feature information is calculated using the relative entropy between the intraclass and interclass feature distributions. The collected data is regularized using a Gaussian model of the feature covariances in order to practically measure the biometric information with limited data samples. An example of this method is shown for iris templates processed using Principal-Component Analysis- (PCA-) and Independent-Component Analysis- (ICA-) based feature decomposition schemes. From this, the biometric feature information is calculated to be approximately 278 bits for PCA and 288 bits for ICA iris features using Masek's iris recognition scheme. This value approximately matches previous estimates of iris information content.

#### 1. Introduction

Biometric systems allow identification of human persons based on physiological or behavioral characteristics, such as voice, handprint, iris, or facial characteristics. Iris recognition offers great benefits with respect to other authentication techniques since it has one of the lowest error rates among biometric technologies in terms of identification and verification of individuals. However, one question that remains unclear is “how much information is there in an iris image?” This question is related to many issues in biometric technology from the point of view of uniqueness, identifiability, and discriminating information. Additionally, such a measure is relevant to biometric cryptosystems and privacy measures. Several authors have presented approaches relevant to this question. For example, Wayman [1] introduced a set of statistical approaches to measure the separability of Gaussian feature distributions using a “cotton ball model.” Another approach was developed by Daugman [2] to measure the information content of iris images based on the discrimination entropy [3, 4], calculated directly from the match score distributions. However, none of these methods measures biometric feature information in iris images at the feature level from an information theoretic point of view. In this paper we elaborate an approach to measure the biometric information in iris images using the relative entropy measure presented by Adler et al. [5]. Here, the term “biometric information” (BI) is defined as follows.

BI is the discriminating “extra bits” needed to represent an intraclass distribution with respect to the interclass feature distribution or, from a biometric recognition system point of view, the decrease in uncertainty about the identity of a person due to a set of biometric feature measurements.

Such an analysis is intrinsically tied to a choice of biometric features. Based on this definition, this paper develops a mathematical framework to measure biometric feature information for iris images processed using Daugman’s algorithm as implemented by Masek [6, 7]. In practice, only a limited number of samples is available for each person, which makes our measure ill-conditioned. To address this issue, we develop a stable algorithm based on distribution modeling and regularization. We then use this method to calculate the biometric feature information for the iris region using the relative entropy technique. Iris biometric feature information is calculated using two different feature decomposition algorithms based on Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Based on this, we then define the BI loss due to degradation in image quality as the relative change in the BI. For the degradation process, different levels of speckle, white Gaussian, and salt and pepper noise are applied to the original iris image, as well as blur, modeled as a degradation operator that maps the original high-quality images to degraded images $G$. For the case with no degradation, we measure the image from a person $p$ as part of a population $q$, while in the presence of degradation, we obtain a person’s image $\hat{p}$ as part of a population $\hat{q}$.

#### 2. Methods

In this section we develop an algorithm to calculate biometric information based on a set of iris features, using the relative entropy measure [5]. The developed method is divided into the following steps:

(i) iris region normalization and feature extraction using Log-Gabor filters,

(ii) PCA/ICA iris feature decomposition,

(iii) distribution modeling of iris biometric features,

(iv) biometric information calculation using the relative entropy measure.

The iris regions used in the entropy calculation are taken from the CASIA database and processed using Masek’s implementation described in [6].

##### 2.1. Iris Recognition

Iris recognition is a highly studied and evolved technology in biometrics. The iris is known to contain a rich texture, which means that unique information can be extracted from the iris to identify users. Iris features have been used to obtain high recognition accuracy for security applications [2]. Even though iris recognition has been shown to be extremely accurate for user identification, some issues remain for practical use of this biometric [8]. For example, the fact that the human iris is only about 11 mm in diameter makes it very difficult to image at high resolution without sophisticated camera systems. Traditional systems require user cooperation and interaction to capture the iris images. By observing the position of their iris on the camera system while being captured, users adjust their eye positions in order to locate their iris within a specific area on the screen [9, 10]. Many iris recognition techniques exist; the classical methods were developed by Daugman and Wildes [10, 11].

An example of an iris recognition system is shown in Figure 1, which illustrates the major stages from data acquisition to matching/decision outcomes [12]. The initial stage involves accurately segmenting the iris area from an eye image. This process consists of localizing the inner and outer iris boundaries, assuming they have circular or elliptical shapes. It also requires detecting and removing any eyelash noise from the image prior to segmentation. In order to compensate for variations in pupil size and in image capturing distance, the segmented iris region is mapped into a fixed-length, dimensionless polar coordinate system [2]. In terms of feature extraction, iris recognition approaches can be divided into three major categories: phase-based methods [11], zero-crossing methods [13], and texture analysis-based methods [10]. Finally, the iris templates are compared, and a dissimilarity metric is computed. If this value is higher than a threshold, the system outputs a nonmatch, meaning that the two templates belong to different irises. Otherwise, the system outputs a match, meaning that both templates were extracted from the same iris.
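The decision rule in the last stage can be illustrated with a short sketch. It assumes binary templates compared by a normalized Hamming distance, which is a common choice for phase-based iris codes but is not stated in the text above; the function names and the threshold value of 0.4 are hypothetical and chosen only for illustration.

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of disagreeing bits between two binary templates.

    Optional masks mark usable bits (True = usable); a bit masked out
    in either template is ignored.
    """
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    valid = np.ones(code_a.shape, dtype=bool)
    if mask_a is not None:
        valid &= np.asarray(mask_a, dtype=bool)
    if mask_b is not None:
        valid &= np.asarray(mask_b, dtype=bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no usable bits to compare")
    return np.count_nonzero((code_a ^ code_b) & valid) / n_valid

def decide(code_a, code_b, threshold=0.4):
    """Return 'nonmatch' if the distance exceeds the threshold, else 'match'.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    distance = hamming_distance(code_a, code_b)
    return "nonmatch" if distance > threshold else "match"
```

In practice the threshold would be selected from the intraclass and interclass distance distributions of the system under study.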

##### 2.2. Iris Normalization and Feature Extraction

This section describes the iris normalization, also known as the “Daugman Rubber Sheet Model” and the feature extraction process prior to iris encoding [2, 11]. The iris normalization stage is crucial since it results in a size-invariant representation of the original iris pixels by mapping the sampled iris pixels from the Cartesian coordinates to the normalized polar coordinates (Figure 2).
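A minimal sketch of the rubber sheet mapping is given below. It assumes circular pupil and iris boundaries that have already been located, linear interpolation between the two boundaries along each ray, and nearest-neighbour pixel sampling; the function name `rubber_sheet` and the default resolution of 20 by 240 samples are illustrative assumptions rather than values taken from the system described here.

```python
import numpy as np

def rubber_sheet(image, pupil, iris, n_radial=20, n_angular=240):
    """Unwrap the annulus between two circles into a rectangular array.

    pupil and iris are (x_center, y_center, radius) tuples.
    Returns an array of shape (n_radial, n_angular).
    """
    image = np.asarray(image, dtype=float)
    theta = np.linspace(0.0, 2.0 * np.pi, n_angular, endpoint=False)
    r = np.linspace(0.0, 1.0, n_radial)  # normalized radius in [0, 1]

    # Boundary points on the pupil and iris circles for every angle.
    xp = pupil[0] + pupil[2] * np.cos(theta)
    yp = pupil[1] + pupil[2] * np.sin(theta)
    xi = iris[0] + iris[2] * np.cos(theta)
    yi = iris[1] + iris[2] * np.sin(theta)

    # Linear interpolation between the two boundaries along each ray.
    x = (1.0 - r)[:, None] * xp[None, :] + r[:, None] * xi[None, :]
    y = (1.0 - r)[:, None] * yp[None, :] + r[:, None] * yi[None, :]

    # Nearest-neighbour sampling, clipped to the image borders.
    rows = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    cols = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    return image[rows, cols]
```

Because the radial coordinate is normalized to the interval from 0 to 1, the output has the same size regardless of pupil dilation or imaging distance.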

After data normalization, feature extraction becomes an essential part of any iris recognition system, since good identification rates are directly related to the uniqueness and variability of the extracted features used to distinguish between different biometric templates. In this paper, we use a Log-Gabor filter [14] to encode the spatial, frequency, and orientation information in the iris image. The Log-Gabor filter has an advantage over the Gabor filter used by Daugman, since the latter produces a DC component whenever the bandwidth is larger than one octave [15]. A Log-Gabor filter, which is Gaussian on a logarithmic frequency scale, has no DC component. The frequency response of a Log-Gabor filter is given by

$$G(f) = \exp\left(-\frac{\left(\log(f/f_0)\right)^2}{2\left(\log(\sigma/f_0)\right)^2}\right),$$

where $f_0$ and $\sigma$ represent the centre frequency and the filter bandwidth, respectively.
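The frequency response can be sketched numerically as follows. The sketch evaluates $\exp\left(-(\log(f/f_0))^2 / (2(\log(\sigma/f_0))^2)\right)$ on an FFT frequency grid; the function name and the default values for the centre frequency ($f_0 = 1/18$ cycles per sample) and the ratio $\sigma/f_0 = 0.5$ are illustrative assumptions.

```python
import numpy as np

def log_gabor_1d(n, f0=1.0 / 18.0, sigma_ratio=0.5):
    """Frequency response of a 1D log-Gabor filter on an n-point FFT grid.

    f0 is the centre frequency in cycles per sample and sigma_ratio is
    sigma / f0. The response is zero at DC and at negative frequencies.
    """
    freqs = np.fft.fftfreq(n)  # signed frequencies, cycles per sample
    response = np.zeros(n)
    pos = freqs > 0
    response[pos] = np.exp(
        -(np.log(freqs[pos] / f0) ** 2) / (2.0 * np.log(sigma_ratio) ** 2)
    )
    return freqs, response
```

The response is exactly zero at $f = 0$, which is the property that distinguishes the Log-Gabor filter from an ordinary Gabor filter of wide bandwidth.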

To encode iris information when working with an unwrapped iris matrix representation, each row of pixel intensities corresponds to a ring of pixels centered at the pupil center. In order to extract the phase feature templates, the Log-Gabor filter is applied to the 1D image vectors. Since the normalization process involves unwrapping the iris region from the circular shape to a rectangular matrix (i.e., from the Cartesian coordinates to the normalized polar coordinates), the spatial relationship along the concentric sampling rings and the radius becomes independent.
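The row-wise filtering can be sketched as below. The sketch assumes the filter is applied in the frequency domain to each row of the unwrapped iris and that the phase of the complex response is quantized to two bits per sample (sign of the real part and sign of the imaginary part), as is common in phase-based iris codes; the function name, default parameters, and the interleaved bit layout are assumptions made for illustration.

```python
import numpy as np

def encode_rows(polar, f0=1.0 / 18.0, sigma_ratio=0.5):
    """Filter each row with a 1D log-Gabor filter and quantize the phase.

    polar: array of shape (n_rows, n_cols), one row per sampling ring.
    Returns a boolean array of shape (n_rows, 2 * n_cols) in which the
    real-part and imaginary-part sign bits are interleaved.
    """
    polar = np.asarray(polar, dtype=float)
    n = polar.shape[1]

    # One-sided log-Gabor response (zero at DC and negative frequencies).
    freqs = np.fft.fftfreq(n)
    response = np.zeros(n)
    pos = freqs > 0
    response[pos] = np.exp(
        -(np.log(freqs[pos] / f0) ** 2) / (2.0 * np.log(sigma_ratio) ** 2)
    )

    # Multiply in the frequency domain; the result is complex-valued.
    filtered = np.fft.ifft(np.fft.fft(polar, axis=1) * response, axis=1)

    # Two bits per sample: the quadrant of the phase angle.
    bits = np.stack([filtered.real > 0, filtered.imag > 0], axis=-1)
    return bits.reshape(polar.shape[0], 2 * n)
```

Each row is processed independently, which matches the treatment of every ring of the unwrapped iris as a separate 1D signal.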

##### 2.3. Biometric Iris Feature Information

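In this section we develop an algorithm to calculate biometric information from a set of features extracted using Masek’s iris recognition system. The relative entropy measure has been shown to be the most appropriate information theoretic measure of the biometric feature information between the intra- and interclass biometric feature distributions [5]. Here, each class represents the features associated with a given individual, which vary due to measurement noise, lighting, pose, and ageing. The intraclass distribution $p$ measures features and their variability for a single person, while the interclass distribution $q$ describes features across the total population. The relative entropy $D(p\|q)$, also known as the Kullback-Leibler divergence (KLD), is defined to be the “extra bits” of information needed to represent $p$ with respect to $q$ [3]. $D(p\|q)$ is defined as follows:

$$D(p\|q) = \int_{\mathbf{x}} p(\mathbf{x}) \log_2 \frac{p(\mathbf{x})}{q(\mathbf{x})}\, d\mathbf{x},$$

where the integral is over all feature dimensions, and $p(\mathbf{x})$ and $q(\mathbf{x})$ are the probability density functions of the intraclass and interclass feature distributions, respectively.

In a generic biometric system, $N$ biometric features are measured to create a biometric feature vector $\mathbf{x}$ of size $N \times 1$ for each iris. For a person’s iris $p$ within the set of irises $q$, we have $N_p$ feature samples, while we have $N_q$ samples for the whole set. Defining $\mathbf{x}_i$ as an instance of the random variable $\mathbf{X}$, we calculate the population feature mean as follows:

$$\mu_q = E_q[\mathbf{x}] = \frac{1}{N_q}\sum_{i=1}^{N_q} \mathbf{x}_i.$$

The feature mean of an iris $p$, $\mu_p$, is defined analogously, replacing $q$ by $p$. The iris feature covariance matrix can be written as follows:

$$\Sigma_q = E_q\left[(\mathbf{x}-\mu_q)(\mathbf{x}-\mu_q)^{\top}\right] = \frac{1}{N_q-1}\sum_{i=1}^{N_q} (\mathbf{x}_i-\mu_q)(\mathbf{x}_i-\mu_q)^{\top}.$$

The individual’s iris feature covariance, $\Sigma_p$, is again defined analogously. One important general difficulty with direct information theoretic measures is that of data availability: estimating $p$ and $q$ directly would require far more samples than are typically available, so a parametric model is fitted instead. Based on the Gaussian model [5], we can write

$$p(\mathbf{x}) = \frac{1}{\sqrt{|2\pi\Sigma_p|}} \exp\left(-\tfrac{1}{2}(\mathbf{x}-\mu_p)^{\top}\Sigma_p^{-1}(\mathbf{x}-\mu_p)\right), \qquad q(\mathbf{x}) = \frac{1}{\sqrt{|2\pi\Sigma_q|}} \exp\left(-\tfrac{1}{2}(\mathbf{x}-\mu_q)^{\top}\Sigma_q^{-1}(\mathbf{x}-\mu_q)\right),$$

from which we can calculate the relative entropy as follows:

$$D(p\|q) = k\left(\ln|2\pi\Sigma_q| - \ln|2\pi\Sigma_p| + \operatorname{trace}\left((\Sigma_p + T)\Sigma_q^{-1} - I\right)\right),$$

where $T = (\mu_p-\mu_q)(\mu_p-\mu_q)^{\top}$ and $k = \log_2\sqrt{e}$.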
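As a sketch, the relative entropy between two Gaussian distributions can be evaluated numerically as below. The code uses the standard closed form $k\left(\ln|\Sigma_q| - \ln|\Sigma_p| + \operatorname{trace}((\Sigma_p + T)\Sigma_q^{-1} - I)\right)$ with $T$ the outer product of the mean difference and $k = \log_2\sqrt{e}$; the function name is hypothetical, and both covariance matrices are assumed to be nonsingular.

```python
import numpy as np

def gaussian_relative_entropy_bits(mu_p, cov_p, mu_q, cov_q):
    """Relative entropy D(p||q) in bits between two Gaussian distributions.

    Assumes cov_p and cov_q are symmetric and nonsingular.
    """
    mu_p = np.atleast_1d(np.asarray(mu_p, dtype=float))
    mu_q = np.atleast_1d(np.asarray(mu_q, dtype=float))
    cov_p = np.atleast_2d(np.asarray(cov_p, dtype=float))
    cov_q = np.atleast_2d(np.asarray(cov_q, dtype=float))

    diff = mu_p - mu_q
    T = np.outer(diff, diff)  # mean-difference term

    # Log-determinants; the 2*pi factors cancel in the difference.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)

    # trace((cov_p + T) cov_q^{-1} - I), using a linear solve
    # instead of an explicit inverse.
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p + T)) - mu_p.size

    k = 0.5 * np.log2(np.e)  # log2(sqrt(e)), converts nats to bits
    return k * (logdet_q - logdet_p + trace_term)
```

Using `slogdet` and `solve` rather than `det` and `inv` is a numerical precaution for feature vectors with hundreds of dimensions, where determinants easily underflow.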

This expression calculates the relative entropy in bits for Gaussian distributions $p$ and $q$. It satisfies most of the desired requirements for a biometric feature information measure introduced in [5]:

(1) If the intraclass feature distribution matches the interclass feature distribution, $p = q$, this yields $D(p\|q) = 0$, as required.

(2) As feature measurements improve, the covariance values, $\Sigma_p$, will decrease, resulting in a reduction in $|\Sigma_p|$ and an increase in $D(p\|q)$.

(3) If a biometric template has a feature distribution far from the population mean, $T$ will be larger, resulting in a larger value of $D(p\|q)$.

(4) Combinations of uncorrelated feature vectors yield the sum of the individual measures.

(5) Addition of features uncorrelated to iris features (i.e., iris noise) will not change $D(p\|q)$. Such a feature will have an identical distribution in $p$ and $q$.

The following sections describe methods to deal with numerical instability and with the common circumstance in which only a small number of samples of each individual’s iris images are available.
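Property (4) can be checked directly from the Gaussian form of the relative entropy, $D(p\|q) = k\left(\ln|\Sigma_q| - \ln|\Sigma_p| + \operatorname{trace}((\Sigma_p + T)\Sigma_q^{-1} - I)\right)$, where $\Sigma_p$ and $\Sigma_q$ are the intraclass and interclass covariances and $T$ is the outer product of the mean difference. The following is a short derivation sketch; it assumes two feature groups, labeled $a$ and $b$ here for illustration, that are uncorrelated under both distributions, so that both covariances are block diagonal.

$$
\Sigma_p = \begin{pmatrix}\Sigma_p^{a} & 0\\ 0 & \Sigma_p^{b}\end{pmatrix},\qquad
\Sigma_q = \begin{pmatrix}\Sigma_q^{a} & 0\\ 0 & \Sigma_q^{b}\end{pmatrix}
\;\Longrightarrow\;
\ln|\Sigma_q| - \ln|\Sigma_p| = \left(\ln|\Sigma_q^{a}| - \ln|\Sigma_p^{a}|\right) + \left(\ln|\Sigma_q^{b}| - \ln|\Sigma_p^{b}|\right)
$$

$$
\operatorname{trace}\left((\Sigma_p + T)\Sigma_q^{-1} - I\right)
= \operatorname{trace}\left((\Sigma_p^{a} + T^{aa})(\Sigma_q^{a})^{-1} - I\right)
+ \operatorname{trace}\left((\Sigma_p^{b} + T^{bb})(\Sigma_q^{b})^{-1} - I\right)
$$

The off-diagonal blocks of $T$ do not contribute to the trace because $\Sigma_q^{-1}$ is block diagonal, so $D(p\|q) = D^{a}(p\|q) + D^{b}(p\|q)$. Property (5) follows as a special case: if group $b$ has identical distributions under $p$ and $q$, then $D^{b}(p\|q) = 0$.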

##### 2.4. Regularization Methods for Degenerate Features

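In order to guard against numerical instability in our measures, we wish to extract a mutually independent set of $M$ “important” features. To do this, we use Principal Component Analysis (PCA) [16, 17] to generate a mapping $U^{\top}: \mathbf{X} \to \mathbf{Y}$ from the original biometric features $\mathbf{X}$ ($N \times 1$) to a new feature space $\mathbf{Y}$ of size $M \times 1$. The PCA may be calculated from a singular value decomposition (SVD) [18] of the feature covariance matrix, such that

$$U S_q U^{\top} = \operatorname{svd}(\Sigma_q).$$

Since $\Sigma_q$ is symmetric and positive semidefinite, $U$ is orthonormal and $S_q$ is diagonal. We choose to perform the PCA on the interclass distribution $q$, rather than $p$, since $q$ is based on far more data and is therefore likely to be a more reliable estimate. The values of $S_q$ indicate the significance of each feature in PCA space. A feature $j$ with small $[S_q]_{j,j}$ will have very little effect on the overall biometric feature information. We use this analysis to regularize $\Sigma_q$ and to reject degenerate features by truncating the SVD. We select a truncation threshold $j$ where $[S_q]_{j,j} < \epsilon\,[S_q]_{1,1}$ for a small relative tolerance $\epsilon$. Based on this threshold, $S_q$ is truncated to be $M \times M$ and $U$ is truncated to $N \times M$. Using the basis $U$ calculated from the interclass distribution, we decompose the intraclass covariance $\Sigma_p$ into feature space $\mathbf{Y}$:

$$S_p = U^{\top} \Sigma_p U,$$

where $S_p$ is not necessarily a diagonal matrix. However, since $p$ and $q$ describe somewhat similar data, we expect $S_p$ to have a strong diagonal component comparable to $S_q$. Based on this regularization scheme, the Gaussian relative entropy of Section 2.3 may be rewritten in the PCA space as follows:

$$D(p\|q) = k\left(\beta + \operatorname{trace}\left(U\left((S_p + S_t)S_q^{-1} - I\right)U^{\top}\right)\right),$$

where $\beta = \ln|S_q| - \ln|S_p|$ and $S_t = U^{\top} T U$, with $T = (\mu_p-\mu_q)(\mu_p-\mu_q)^{\top}$ the outer product of the mean difference and $k = \log_2\sqrt{e}$.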
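The truncation step can be sketched as follows. The code decomposes the interclass covariance by SVD, discards directions whose singular value falls below a relative tolerance, projects the intraclass statistics onto the retained basis, and evaluates the relative entropy there. The function name and the default tolerance `rel_tol=1e-10` are assumptions for illustration, and the projected intraclass covariance is assumed to be nonsingular.

```python
import numpy as np

def pca_relative_entropy_bits(mu_p, cov_p, mu_q, cov_q, rel_tol=1e-10):
    """D(p||q) in bits, evaluated in a truncated PCA basis of cov_q."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    cov_p = np.asarray(cov_p, dtype=float)
    cov_q = np.asarray(cov_q, dtype=float)

    # cov_q is symmetric positive semidefinite: cov_q = U diag(s) U^T,
    # with singular values returned in descending order.
    U, s, _ = np.linalg.svd(cov_q)

    # Reject degenerate directions (illustrative relative threshold).
    keep = s > rel_tol * s[0]
    U, s = U[:, keep], s[keep]

    # Project the intraclass covariance and the mean difference.
    S_p = U.T @ cov_p @ U
    d = U.T @ (mu_p - mu_q)
    S_t = np.outer(d, d)

    _, logdet_p = np.linalg.slogdet(S_p)
    beta = np.sum(np.log(s)) - logdet_p

    # Right-multiplying by the diagonal inverse of S_q divides column j
    # by s[j]; the retained dimension is subtracted for trace(I).
    trace_term = np.trace((S_p + S_t) / s) - s.size

    k = 0.5 * np.log2(np.e)
    return k * (beta + trace_term)
```

Because $U$ has orthonormal columns, the trace in the PCA-space expression is unchanged by the outer $U(\cdot)U^{\top}$, so the code evaluates it directly in the reduced space.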

##### 2.5. Regularization Methods for Insufficient Data

The expression developed in the previous section solves the problem of ill-posedness of $S_q$. However, $S_p$ may still be singular in the common circumstance in which only a small number of samples of each class (i.e., person’s eye) are available. Given $N_p$ images of an individual’s iris from which $M$ features are calculated, $S_p$ will be singular if $N_p \le M$, which will result in $D(p\|q)$ diverging to $\infty$. In practice, this is a common occurrence, since most biometric systems calculate many hundreds of features, and there are only rarely more than ten samples of each person. To address this issue, we develop an estimate which may act as a lower bound, under the following assumptions:

(1) Estimates of feature variances $[S_p]_{i,i}$ are valid for all $i$.

(2) Estimates of feature covariances $[S_p]_{i,j}$ for $i \ne j$ are only valid for the most important $L$ features, where $L < N_p$.

Entries of $S_p$ which are not considered valid based on these assumptions are set to zero by multiplying $S_p$ elementwise by a mask $\mathbf{M}$, where

$$[\mathbf{M}]_{i,j} = \begin{cases} 1 & \text{if } i = j \text{ or } (i \le L \text{ and } j \le L),\\ 0 & \text{otherwise.} \end{cases}$$

Using the decomposition of the intraclass covariance from Section 2.4,

$$[S_p]_{i,j} = [\mathbf{M}]_{i,j}\,\left[U^{\top}\Sigma_p U\right]_{i,j}.$$

This expression regularizes the intraclass covariance, $S_p$, and assures that $D(p\|q)$ does not diverge. To clarify the effect of this regularization on $D(p\|q)$, we note that intra-feature covariances decrease $|S_p|$ toward zero, leading to an entropy estimate that diverges to $\infty$.

We thus consider this regularization strategy to generate a lower bound on the biometric feature information. The selection of $L$ is a compromise between using all available measurements (by using a large $L$) and avoiding numerical instability when $S_p$ is close to singular (by using a small $L$).
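The masking step can be sketched as below. The mask keeps every variance on the diagonal and keeps covariances only within the leading block of the $L$ most important features; the function names are hypothetical, and the features are assumed to be ordered by decreasing importance, as they are after a PCA.

```python
import numpy as np

def covariance_mask(m, L):
    """m-by-m mask: ones on the diagonal and in the leading L-by-L block."""
    idx = np.arange(m)
    mask = np.eye(m, dtype=bool)
    # 0-based idx < L corresponds to the first L (most important) features.
    mask |= (idx[:, None] < L) & (idx[None, :] < L)
    return mask.astype(float)

def regularize_intraclass(S_p, L):
    """Zero the covariance estimates that are not considered valid."""
    S_p = np.asarray(S_p, dtype=float)
    return covariance_mask(S_p.shape[0], L) * S_p
```

For example, a covariance estimated from three samples of a five-dimensional feature vector has rank at most two and is singular, while its masked version with $L = 2$ is block diagonal and, for generic data, has full rank.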

##### 2.6. Average Information of a Biometric System

The previous sections have developed a measure of the biometric feature information content of a feature representation of a single iris template with respect to the feature distribution of the entire set. As discussed, the biometric feature information will vary between different iris samples; those with feature values further from the mean have larger biometric feature information. In order to use this approach to measure the biometric feature information content of a biometric system, we calculate the average biometric feature information over a group of irises. This is a measure of the system biometric information (SBI), which can be calculated by averaging the iris template BI over the entire set of irises $q$:

$$\mathrm{SBI} = E_p\left[D(p\|q)\right] = \frac{1}{N_I}\sum_{i=1}^{N_I} D(p_i\|q),$$

where $p_i$ is the intraclass distribution of iris $i$ and $N_I$ is the number of irises in the set.
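The averaging can be sketched as follows. To keep the example short, it assumes the features have already been decorrelated so that both distributions can be modeled with diagonal covariances; this is a simplification introduced here, not the full regularized estimate described above. The function name and the variance floor `var_floor` are hypothetical.

```python
import numpy as np

def system_biometric_information(features, labels, var_floor=1e-12):
    """Average per-iris relative entropy (bits) under a diagonal Gaussian model.

    features: array of shape (n_samples, n_features).
    labels: iris identifier for each sample.
    Returns (sbi, per_iris), with per_iris ordered as np.unique(labels).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Interclass (population) statistics.
    mu_q = features.mean(axis=0)
    var_q = features.var(axis=0, ddof=1)

    k = 0.5 * np.log2(np.e)
    per_iris = []
    for lab in np.unique(labels):
        x = features[labels == lab]
        mu_p = x.mean(axis=0)
        # Floor the variances to avoid log(0) with very few samples.
        var_p = np.maximum(x.var(axis=0, ddof=1), var_floor)
        d = k * np.sum(
            np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
        )
        per_iris.append(d)

    per_iris = np.array(per_iris)
    return float(per_iris.mean()), per_iris
```

Irises whose feature means lie far from the population mean contribute larger terms to the average, consistent with the discussion above.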

##### 2.7. Biometric Information Loss due to Degradation

In this section, we explore the effect of image degradation, and the resulting decrease in biometric quality, on the relative entropy measure. Intuitively, image degradation is expected to change the intra- and interclass distributions of the iris features, resulting in a loss of biometric information. In general, image degradation is a nonlinear process; however, in this paper we use a linear degradation model to explore its effect. Different degradation processes are applied to the iris images in order to generate the degraded features. Different levels of speckle, white Gaussian, and salt and pepper noise are applied to the original iris image, as well as Gaussian blur, which maps the original high-quality images to degraded images $G$. Features, $g$, are then extracted from the degraded images $G$ using the feature extraction methods given in Section 2. We then compute the biometric information $D(p\|q)$ for the nondegraded distributions and $D(\hat{p}\|\hat{q})$ for the degraded distributions.
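The four degradations can be sketched with NumPy as below. The sketch assumes images scaled to the range 0 to 1, models speckle as multiplicative Gaussian noise, and implements the blur as a separable Gaussian kernel with zero padding at the borders; all function names and parameterizations are illustrative assumptions, not the settings used for the experiments.

```python
import numpy as np

def add_gaussian_noise(img, var, rng):
    """Additive white Gaussian noise with the given variance."""
    noisy = img + rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_speckle(img, var, rng):
    """Multiplicative noise: img + img * n, with n zero-mean Gaussian."""
    noisy = img + img * rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_pepper(img, density, rng):
    """Set a fraction `density` of pixels to 0 or 1 with equal probability."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2.0] = 0.0
    out[(u >= density / 2.0) & (u < density)] = 1.0
    return out

def gaussian_blur(img, size, sigma):
    """Separable Gaussian blur with a size-by-size kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    # Convolve every row, then every column, with the 1D kernel.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, img
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, rows
    )
```

A typical use would sweep `var`, `density`, or the kernel parameters over increasing levels and recompute the features at each level.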

Here $D(p\|q)$ represents the relative entropy between the intraclass and interclass iris feature distributions prior to degradation, while $D(\hat{p}\|\hat{q})$ is the relative entropy between the degraded intraclass and interclass iris feature distributions. From this, we calculate the normalized mean square distance characterizing the loss of information caused by the degradation model on the underlying features as

$$\Delta\mathrm{BI} = \frac{1}{N_f}\sum_{i=1}^{N_f} \frac{\left|D_i(p\|q) - D_i(\hat{p}\|\hat{q})\right|^2}{\sigma^2_{D(p\|q)}},$$

where $\sigma^2_{D(p\|q)}$ and $N_f$ are the variance of $D(p\|q)$ and the number of feature samples, respectively, and the subscript $i$ denotes evaluation on feature sample $i$. ΔBI measures the relative distance offset between the original and degraded distributions. It is a unitless measure and may be interpreted as the fractional loss in BI due to a given image degradation. Using this degradation process, new sets of degraded features are obtained for different levels of noise variance (speckle and white Gaussian) and noise density (salt and pepper), and for space-invariant Gaussian blur operators of varying kernel size and standard deviation, applied to the entire iris image.
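A minimal sketch of the normalized mean square distance is given below. It takes the relative entropy values computed before and after degradation and normalizes the mean squared difference by the variance of the nondegraded values; the function name and the use of the sample variance (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np

def delta_bi(d_original, d_degraded):
    """Normalized mean square distance between two sets of BI values.

    d_original: relative entropy values before degradation.
    d_degraded: corresponding values after degradation.
    """
    d0 = np.asarray(d_original, dtype=float)
    d1 = np.asarray(d_degraded, dtype=float)
    if d0.shape != d1.shape:
        raise ValueError("inputs must have the same shape")
    variance = d0.var(ddof=1)  # sample variance of the nondegraded values
    return float(np.mean((d0 - d1) ** 2) / variance)
```

The result is zero when the degradation leaves every value unchanged and grows as the degraded values move away from the originals.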

#### 3. Results

Information in a feature representation of an iris is calculated using the described method for different irises. In order to test our algorithm, it is necessary to have multiple images of the same iris. For this reason, we used the CASIA database [19], which includes 756 iris images from 108 subjects, with 6 or 7 images per class (i.e., subject’s eye). The iris images were processed using Masek’s system, from which we calculate the PCA (eigeniris) features [17] and the ICA iris feature components [20] using the Gabor phase feature set. For the PCA and ICA feature decompositions, the 327 most dominant feature vectors (an arbitrary choice) were computed and used for subsequent analysis. The number of selected feature vectors does not affect the BI result; it is only for representation purposes. Figures 3, 4, and 5 illustrate the amount of biometric information calculated per PCA and ICA iris feature. Using the biometric information calculation procedure described in Section 2.3, the sum of the biometric information over the PCA iris features extracted from the set of irises taken from the CASIA database gives approximately 278 bits using Masek’s system. In addition, Gabor phase features were decomposed using the ICA technique described in [20] in order to obtain independent feature vectors. ICA has the advantage that it not only decorrelates the signals but also reduces higher-order statistical dependencies in order to make the signals as statistically independent as possible. For the ICA features, an average of 288 bits was computed for $D(p\|q)$. As can be seen, the amount of information per iris feature is very close for PCA and ICA features. ICA features tend to contain slightly more information since they fit the iris feature data model better.

In order to investigate the angular variations in iris information density, a plot of the biometric information as a function of the angle is shown in Figure 6; the iris region is encoded at a fixed radius and varying angles (1° to 360°). The BI changes only slightly as a function of the angle, which implies that the iris information is not a function of rotational angle. On the other hand, Figure 7 shows different results when the BI is plotted as a function of the normalized iris radius. A larger BI is seen at a smaller radius, which indicates that an iris segment (i.e., ring) contains more information closer to the pupil boundary. This result corresponds to visual intuition; the inner iris region includes the collarette, a boundary separating the pupillary zone and the ciliary zone which can be seen on the anterior surface of the iris [12], and has a more distinctive pattern. This result might also suggest that iris recognition can be performed well with a partial iris segment, which can be a plausible solution in some applications such as iris recognition at a distance or using off-angle images.

Using the image degradation process described in Section 2.7, 327 degraded feature vectors ($g$) are computed and used for subsequent analysis. The new iris features are calculated using the Log-Gabor filter described in Section 2.2. From the degraded features, ΔBI is computed for the degraded interclass and intraclass iris feature distributions using the definition given in Section 2.7. This measure represents the fractional amount of iris biometric information lost as a function of the degradation level. Figure 8 shows ΔBI computed as a function of noise and blur level for different images taken from the degraded iris image set. The x-axis represents 10 different levels (in increasing order) of Gaussian blur and of three different types of noise (speckle, white Gaussian, and salt and pepper). As seen in Figure 8, the relative information loss in an image increases with the amount of degradation. Interestingly, ΔBI tends to reach a steady state after some level of noise degradation, whereas the amount of BI lost keeps increasing as a function of blur level. This suggests that some features are unaffected by the noise degradation process and represent a lower bound on the information measure of an iris feature distribution. Since the iris contains a significant amount of detail (high frequencies), Gabor features extracted using the developed system tend to be more robust against noise but are severely affected by blur, since a significant amount of BI is lost at higher blur levels. Features affected only by noise tend to preserve valuable information at larger noise levels.


#### 4. Discussion

This work describes an approach to measure biometric feature information for iris images. Examples of its application were shown for two different feature decomposition algorithms based on PCA and ICA, where features are extracted using Masek’s implementation of Daugman’s iris recognition system [2, 6]. PCA is based on the assumption that the most discriminating information corresponds to maximum data variance, under the constraint of orthogonality, which gives uncorrelated components. This method effectively represents data in a linear subspace with minimum information loss. On the other hand, ICA is a demixing process whose goal is to express a set of random variables as linear combinations of statistically independent component variables. The major difference between the two techniques is that PCA uses only second-order statistics (variances corresponding to the largest eigenvalues of the sample covariance matrix), while ICA uses higher-order statistics. Depending on the application, ICA can provide a more powerful data description than PCA, under the assumption that the information that discriminates between classes is contained in the higher-order statistics.

The result of biometric feature information calculations (approximately 278 bits for PCA and 288 bits for ICA iris features) is compatible with previous analyses of iris recognition accuracy. For example, Daugman states that the combinatorial complexity of the phase information of the iris across different persons spans about 249 degrees of freedom [21]. Expressing this variation as discrimination entropy [3] and using typical iris and pupil diameters of 11 mm and 5 mm, respectively, the observed amount of statistical variability among different iris patterns corresponds to an information density of about 3.2 bits/mm² on the iris. From this, the ideal number of bits per iris follows by multiplying this density by the average iris area. Using our developed biometric information calculation scheme, we found that, depending on the feature decomposition and iris segmentation technique, we obtain on average 283 bits of information for the iris decomposition features. These results support Daugman’s theory: our iris images have an average iris diameter ranging approximately from 11 mm to 11.5 mm, which explains the difference in bits between our method and Daugman’s, where the iris diameter is assumed to be 11 mm. For instance, a positive difference of 0.5 mm in the iris diameter results in an increase of 28.52 bits using Daugman’s discrimination entropy measure. Hence, this explains the difference between our current results and Daugman’s assumptions, which seem to represent the lower bound for the entropy calculation.
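The area argument can be worked through explicitly. The following is a rough check, assuming the iris area is the annulus between the pupil and iris boundaries and using the quoted density of 3.2 bits/mm² with the diameters stated above; the symbols $A_{11}$ and $A_{11.5}$ are introduced here for the two iris diameters.

$$
A_{11} = \pi\left(5.5^2 - 2.5^2\right) = 24\pi \approx 75.4\ \text{mm}^2,
\qquad 3.2 \times 75.4 \approx 241\ \text{bits},
$$

$$
A_{11.5} = \pi\left(5.75^2 - 2.5^2\right) \approx 84.2\ \text{mm}^2,
\qquad 3.2 \times 84.2 \approx 270\ \text{bits}.
$$

Under these assumptions, a 0.5 mm increase in iris diameter adds roughly 28 bits, which is in line with the increase of 28.52 bits quoted above; the small gap presumably reflects rounding of the density.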

A plot of the biometric information as a function of the angle (Figure 6) shows that the information contained in the iris varies little when iris segments are encoded at a fixed radius and varying angle. On the other hand, the BI increases in the iris region closer to the pupil (Figure 7). Since a larger BI is calculated for smaller radii, this implies that the iris contains richer information in the proximity of the pupil, more specifically, within the collarette region. This result suggests that iris recognition may be possible using a partial iris area (i.e., the inner rings), which can facilitate the iris region segmentation process in applications where high-quality iris image acquisition and user cooperation are not possible. Subsequently, we introduced a measure of information loss as a function of image degradation. It is shown that the fractional BI loss (ΔBI), based on the relative entropy, increases with the blur level. However, it reaches a steady state after some amount of noise degradation, which suggests that some features are less affected by the noise degradation process than others. This shows the vulnerability of the iris Gabor features to blur.

In a general biometric system, the following issues associated with biometric features must be considered.

(i) Feature distributions vary. In this work, all features are modeled as Gaussian, which may be considered to estimate an upper bound for the entropy.

(ii) Feature dimensionality may not be constant. For example, the number of available minutiae points varies. The method presented in this work does not address this issue, since the dimensions of $p$ and $q$ must be the same. Generalized entropy measures exist which may allow an extension of this approach to nonconstant dimensional features.

It is interesting to note that the biometric entropy is larger for some iris features. Figure 1 shows a range of BI calculations for different feature numbers (varying on average from 0.5 to 2 bits per feature) for different individuals, which may help explain why some features are potentially more dominant than others. An analogy can be made with face recognition systems, since some users have more dominant facial features than others, which makes them easier to recognize. This is perhaps some evidence for the “biometrics zoo” hypothesis [22], which classifies users, in the context of a speaker recognition system, into different groups based on their tendency to affect the FAR and FRR of a biometric system. In general, it states that some individuals possess more reliable or recognizable features (i.e., subjects with features that are well separated from others in the database) compared to other users who are intrinsically difficult to recognize and who can degrade the performance of a biometric system by increasing the FRR or FAR.

While we have introduced a measure in the context of iris recognition, we anticipate that such a measure may help address many questions in biometrics technology, such as the following.

(i) Uniqueness of biometric features. A common question is “are biometric features really unique?” While Pankanti et al. [23] have recently provided a sophisticated analysis of this problem based on biometric feature distributions directly, a general approach based on information content would help address this question for other biometric modalities.

(ii) Performance limits of biometric matchers. While some algorithms outperform others, it is clear that there are ultimate limits to error rates, based on the information available in the biometric features. In this application, the biometric feature information is related to the discrimination entropy [2].

(iii) Biometric fusion. Systems which combine biometric features are well understood to offer increased performance [4]. It may be possible to use the measure of biometric feature information to quantify whether a given combination of features offers any advantage or whether the fused features are largely redundant. The example of fusion of FLD and PCA (200 features) given here clearly falls into the latter category since it does not necessarily offer double the amount of information.

#### 5. Conclusion

This work describes an approach to measure biometric feature information for iris images processed using Daugman and Masek’s methods [7]. Examples of its application were shown for two different iris feature decomposition algorithms based on PCA and ICA subspace analysis. The result of biometric feature information calculations (approximately 278 bits for PCA and 288 bits for ICA iris features) is compatible with previous analyses of iris recognition accuracy. In addition, it is shown that BI loss increases as a function of blur or noise degradation level. However, ΔBI reaches a steady state after some amount of noise degradation, which suggests that some iris Gabor features are less affected by noise but considerably degraded by blur due to the underlying nature of the iris texture.