Abstract

A qualia exploitation of sensor technology (QUEST) motivated architecture using algorithm fusion and adaptive feedback loops for face recognition in hyperspectral imagery (HSI) is presented. QUEST seeks to develop a general purpose computational intelligence system that captures the beneficial engineering aspects of qualia-based solutions. Qualia-based approaches are constructed from subjective representations and have the ability to detect, distinguish, and characterize entities in the environment. Adaptive feedback loops are implemented that enhance performance by reducing candidate subjects in the gallery and by injecting additional probe images during the matching process. The architecture presented provides a framework for exploring more advanced integration strategies beyond those presented. Algorithmic results and performance improvements are presented as spatial, spectral, and temporal effects are utilized; additionally, a Matlab-based graphical user interface (GUI) is developed to aid processing, track performance, and display results.

1. Introduction

Social interaction depends heavily on the remarkable face recognition capability that humans possess, especially the innate ability to process facial information. In a myriad of environments and views, people are able to quickly recognize and interpret visual cues from another person’s face. With an increasing focus on personal protection and identity verification in public environments and during common interactions (e.g., air travel, financial transactions, and building access), the performance capability of the human system is now a desired requirement of our security and surveillance systems. Face recognition is a crucial tool in current operations in Iraq and Afghanistan, used by allied forces to identify and track enemies [1] and effectively distinguish friendlies and nonenemies [2]. The human recognition process utilizes not only spatial information but also important spectral and temporal aspects.

Utilizing only visible wavelengths for computer vision solutions has significant downsides: features evident to humans may be too subtle for a machine to capture. Prior research has shown deficiencies in computer vision techniques compared to human or animal vision when detecting defects in parts [4] or performing biometric identification [5]. By increasing the spectral sampling to include nonvisible wavelengths, it might be possible to detect some of these subtle features included in the facial data. However, incorporation and handling of features in multispectral or hyperspectral imagery have not been fully investigated or subsequently extended to commercial applications [6].

The design of a biometric identification system should possess certain attributes to make it an effective operational system. These attributes include universality, distinctiveness, permanence, collectability, performance, acceptability, and circumvention [7]. Unfortunately, the face recognition modality suffers from weaknesses in the areas of uniqueness, performance, and circumvention [8]. The ability to mitigate these weaknesses and ultimately match or exceed the recognition capability of a human is the performance benchmark for computer-based face recognition applications. By incorporating additional information inherently present in HSI, the vulnerabilities of uniqueness, performance, and circumvention can be mitigated.

The Carnegie Mellon University (CMU) hyperspectral imagery (HSI) face database, graciously provided by Dr. Takeo Kanade, was used for this research [3]. Figure 1 depicts an example of this data over several sampled wavelengths. The utilization of HSI and the contextual information contained within these image cubes provide the tools to create a hierarchical methodology to address the challenges face recognition systems must overcome.

In this paper, various algorithms are used to exploit the inherent material reflectance properties in HSI to detect, segment, and identify subjects. A closed loop fusion hierarchy is applied to a suite of facial recognition algorithms to produce a cumulative performance improvement over traditional methods. A GUI tool is introduced which facilitates responsive operation as pictorial, numerical, and graphical results from the various algorithms are displayed. Experimental results are presented and recommendations for further research are suggested.

2. Face Recognition Architecture

There are three main focus areas for this research: the application of facial recognition algorithms to HSI, the use of feature and decision fusion for improved results, and adaptive feedback to re-examine and confirm the most difficult matches. This discussion starts with a review of the dataset to understand the dimensionality of the data and its exploitation potential.

2.1. Database Description

Hyperspectral imagery involves collecting narrow spectral band reflectances across a contiguous portion of the electromagnetic spectrum. The CMU database images contain 65 spectral bands covering the visible and near infrared (NIR) from 450 nm to 1090 nm with a 10 nm spectral sampling [3].

By taking advantage of a fundamental property of HSI (different materials reflect different wavelengths of light differently), skin, hair, and background materials are relatively easy to detect. The advantages of using higher-dimensional data over grayscale or 3-band “true” color images include the ability to detect skin segments, since the spectral reflectance properties of skin are well understood [9]. The segmented portions of the image can be used to provide context that aids traditional face recognition algorithms.

Leveraging the signatures available through HSI, features such as skin and hair can be detected using a straightforward method similar to the Normalized Difference Vegetation Index (NDVI) used in remote sensing to detect live vegetation [9]. A Normalized Differential Skin Index (NDSI) can be computed easily through the sum and difference of key spectral bands [9]. Applying this technique and a variety of edge detection methods, several contextual layers of an individual’s face can be extracted automatically from a hyperspectral image, as seen in Figure 2 [10]. For individuals attempting to conceal or alter their appearance, it is now possible to detect inconsistencies such as make-up and prosthetic devices due to their differing reflectance properties [11].

Denes et al. [3] noted that the prototype camera used for the CMU data was subject to stray light leaks and optical imperfections, remarking that “better face recognition clearly requires higher definition through a more sensitive, low noise camera or through higher levels of illumination.” Viewed from another perspective, this noisy data provided an ideal environment for the development of an integration strategy for real world applications. The findings from these previous efforts provide a foundation to construct an intelligent hierarchy that addresses challenges for recognition systems, using the face recognition biometric as a test bed.

The portion of the CMU database examined herein contains images for 54 different subjects, 36 of whom sat for two sessions on different days. This database subset comprises our gallery and probe sets (subjects to identify and a gallery to search). Additionally, a subset of subjects from the gallery and probe sets was available for multiple sessions: 3 sessions (28 subjects), 4 sessions (22 subjects), or 5 sessions (16 subjects). These additional images are used in the adaptive feedback process to analyze the ability to inject additional images for confirmation of a subject match.

2.2. Previous Hyperspectral Face Recognition Research

Robila [12] investigated using both the visible and NIR wavelengths, exploring the utility of spectral angles for comparison. Other research investigating NIR and visible wavelength faces includes Klare and Jain [13], who examined matching NIR faces to visible light faces. Bourlai et al. [14] presented an initial study of combining NIR and shortwave IR (SWIR) faces with visible for a more complete face representation, in addition to comparing cross-spectral matching (visible to SWIR). Kong et al. [15] delivered an overview of advantages and disadvantages of facial recognition methods with respect to the image wavelengths. Chou and Bajcsy [16] used hyperspectral images and experimented with segmenting different tissue types in the human hand. Elbakary et al. [17] used the K-means clustering algorithm to segment the skin surface in hyperspectral images and then measured the Mahalanobis distance between signatures to match subjects. Pan has accomplished the most extensive research utilizing spectral signatures of skin to identify individuals [18–20] and in a subsequent effort [21] explored the benefit of incorporating spatial measurements at various wavelengths. These efforts all produced valuable insight, but individually these techniques did not provide the desired performance for this challenging data set.

2.3. Recognition Algorithms

Traditional biometric systems are often open loop, comprising four basic components: sensor, feature extraction, matching, and decision making, as illustrated by the block diagram in Figure 3 [22]. However, such systems do not typically incorporate feedback from the feature extraction, matching, or decision making processes.

The sensor module acquires the biometric data, in this case a hyperspectral image, from the intended subject. The feature extraction module processes the captured data from the sensor and extracts features for detection. The matching module compares the extracted features against stored features saved in memory and generates comparisons called match scores. Match scores are comparisons made in a multidimensional comparison space and are a measure of distance between two images. The decision-making module takes these scores and determines the user’s identity by selecting the stored features (identification) associated with the smallest match score or by evaluating the obtained match score against a threshold for the claimed identity’s features (verification). Feature extraction algorithms, including hair and face detection, are considered part of preprocessing, while matching algorithms (Table 1 lists the specific algorithms considered) are divided into spatial, spectral, and interest point variants; since some functions calculate scores relative to the entire gallery, those are annotated as well.
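As a minimal sketch of the matching and decision-making modules just described, the following Matlab fragment selects the identity with the smallest match score (identification) or tests a claimed identity against a threshold (verification); the function name, feature layout, and threshold tau are illustrative assumptions, not the system's actual interface.

```matlab
% Hypothetical sketch: Euclidean match scores plus a decision rule.
% probeFeat: 1-by-d feature vector; galleryFeats: n-by-d stored features.
function [identity, score] = decideIdentity(probeFeat, galleryFeats, galleryIDs, tau)
    n = size(galleryFeats, 1);
    scores = zeros(n, 1);
    for k = 1:n
        scores(k) = norm(probeFeat - galleryFeats(k, :));  % smaller = better match
    end
    [score, idx] = min(scores);      % identification: smallest match score wins
    identity = galleryIDs(idx);
    if nargin > 3 && score > tau     % verification variant: threshold test
        identity = [];               % claim rejected
    end
end
```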

2.3.1. Preprocessing and Feature Extraction Algorithms

The face recognition methodology presented employed a variety of techniques to preprocess and segment the data. Many applications use images accompanied by manually selected eye coordinates that are subsequently used for alignment and sizing. This upfront effort can be time consuming and assumes human recognition and participation at the onset of the process. Typical manual preprocessing techniques [23] include selecting eye coordinates, geometric normalization, and masking. For the CMU dataset and the process presented, automated centroid detection based on hair or skin segmentation, elliptical face mask cropping, and/or subsequent histogram equalization was employed. Following this process, two separate algorithms were used to locate face and hair surfaces through NDSI [9].
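A minimal preprocessing sketch under this approach might center the face on the segmentation centroid, apply an elliptical mask, and equalize the histogram; the crop size, ellipse axes, and the assumption of a uint8 grayscale band are illustrative, and boundary checks are omitted for brevity.

```matlab
% Hypothetical sketch of automated centering, elliptical masking, and
% histogram equalization (Image Processing Toolbox).
function face = preprocessFace(gray, skinMask)
    stats = regionprops(skinMask, 'Centroid');   % centroid of the skin segment
    c = round(stats(1).Centroid);                % [x y]; first region for simplicity
    half = 64;                                   % assumed half-window size in pixels
    face = gray(c(2)-half : c(2)+half-1, c(1)-half : c(1)+half-1);
    [h, w] = size(face);
    [X, Y] = meshgrid(1:w, 1:h);                 % elliptical face mask
    inEllipse = ((X - w/2) / (0.42*w)).^2 + ((Y - h/2) / (0.50*h)).^2 <= 1;
    face(~inEllipse) = 0;                        % remove background clutter
    face = histeq(face);                         % histogram equalization
end
```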

2.3.2. Spatial Recognition Algorithms

Following the preprocessing functions, the image data is ready for recognition algorithms. The first area explored was the spatial domain of the hyperspectral data in the form of grayscale face images. The skin and hair segments were subsequently fed to supporting algorithms of increasing detail for matching. The first and most straightforward method was to use face measurements such as size, length, width, and eccentricity to create a string of parameters for comparison. These images can be the spatial face segments, the hair segments, or combined representations.

The method used for hair and face image matching was the eigenface method devised by Turk and Pentland [24]. Eigenface is a holistic approach developed as an attempt to replicate the human recognition process and as an alternative to many feature-based methods that utilized specific attributes but discarded much of the surrounding image and contextual information. An important aspect of this algorithm is the creation of the comparison space, or face space. In the eigenface algorithm, all probes are projected into a face space built from the gallery of potential candidates, and distances between faces are computed in that space [24]. The composition of the gallery’s subjects has a direct impact on the comparison scores and the quality of the matches. The creation and dimensionality of the face space are an active area of research [25]. In the final architecture, the gallery will be tailored based on the progression of matches and will play a role in the adaptive selection of matches.
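A sketch of the eigenface projection and face-space comparison [24] is given below; the gallery layout (one vectorized face per column) and the number of retained eigenfaces are assumptions for illustration.

```matlab
% Hypothetical eigenface sketch: build a face space from the gallery and
% match a probe by its nearest neighbor in that space.
G  = double(galleryImages);                 % d-by-N, one vectorized face per column
mu = mean(G, 2);
A  = G - mu;                                % mean-centered gallery
[U, ~, ~] = svd(A, 'econ');                 % columns of U are the eigenfaces
U  = U(:, 1:20);                            % retain 20 eigenfaces (assumed choice)
W  = U' * A;                                % gallery coordinates in face space
wP = U' * (double(probeImage(:)) - mu);     % project the probe into face space
dists = vecnorm(W - wP);                    % distance to every gallery face
[~, best] = min(dists);                     % closest gallery face = best match
```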

2.3.3. Spectral Recognition Algorithms

In addition to spatial recognition algorithms, spectral recognition can be considered with data that includes multiple spectral dimensions, such as the CMU dataset. For this analysis, spectral signatures of the hair and face segments are compared using spectral angle comparisons [12, 26]. Following methods used by Robila [26], spectral matching capability was evaluated using several variations. The first and most straightforward was simply a comparison of the average spectral angle. The variability of the spectral signatures, especially at the sensor wavelength limits, did have an effect on the overall performance. With that in mind, several of the wavelengths at the ends of the wavelength span were iteratively removed until maximum recognition performance was achieved.
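The spectral angle comparison reduces to the angle between two mean signatures; the sketch below also trims an assumed number of noisy bands from each end of the span, as described above.

```matlab
% Hypothetical spectral angle sketch: sig1 and sig2 are mean spectral
% signatures (e.g., averaged over segmented skin pixels), as vectors.
function theta = spectralAngle(sig1, sig2, trim)
    if nargin > 2                          % optionally drop noisy end bands
        sig1 = sig1(1+trim : end-trim);
        sig2 = sig2(1+trim : end-trim);
    end
    theta = acos(dot(sig1, sig2) / (norm(sig1) * norm(sig2)));
end
```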

Nunez’s research [9] provided a method, NDSI, to identify skin surfaces using only two wavelengths from hyperspectral images. The technique and its dimensionality reduction offered an attractive alternative to a more involved clustering method. Unfortunately, NDSI requires two key wavelengths, 1080 nm and 1580 nm, to calculate the index. The CMU data only spanned the spectral range from 450 nm to 1090 nm, so only one of the key wavelengths was contained in the data, and that wavelength was located at the performance boundary of the spectropolarimetric camera.

With advice from Nunez, a less effective but suitable alternative was devised that used a combination of indices designed to highlight the unique characteristics of the spectral signature of human skin and eliminate common confusers. By examining the available wavelengths in the data as well as the quality of the information, an alternative approach was designed to sum relevant wavelengths and create indices similar to NDSI that exploited the spectral characteristics of skin, as shown below.

NDSI Substitute Approach

$$\mathrm{NDSI} = \frac{\rho_{1080} - \rho_{1580}}{\rho_{1080} + \rho_{1580}},\tag{1}$$

where $\rho_\lambda$ denotes the reflectance at wavelength $\lambda$ (in nm). Equation (1) is the NDSI calculation; the alternative indices take the same normalized-difference form over band pairs available in the CMU data. Four such indices are used: ones highlighting the increase in reflectivity at the NIR wavelengths versus the blue wavelengths, one highlighting the characteristic water absorption dip at 980 nm, and a final check to remove potential plant material that can act as a confuser. By combining these indices that indicate the possibility of skin, the skin segment can be identified, when the combined value is greater than one, rather efficiently compared to K-means. The most effective implementation of this approach relied on two of these indicators to identify potential skin pixels. All pixels in the hyperspectral image cube that fell near the calculated average of the potential skin pixels were deemed a skin surface. A similar approach was used for the identification of hair segments in the image, this time using an NDVI calculation, (2), and then fine-tuning the selected segment using a Mahalanobis distance comparison over only the red (650 nm), green (510 nm), blue (475 nm), and NIR (1000 nm) wavelengths, from which the hair segments, including facial hair, were obtained.

NDVI Calculation

$$\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{red}}}.\tag{2}$$

With the unique ability to segment the skin and hair portions of the image, it was uncomplicated to include a centroid calculation to accomplish the task of automatically centering images for identification. These adjustments include the centering of all face images, leveling in the case of unintended rotation of the face, and resizing the image for a consistent scale across individuals or the population. Once this is accomplished, background clutter is removed by the application of an elliptical mask. Unfortunately, some important information is removed from the image in the process, including the relative shape of the head and a good portion of the hair on top of the head. This approach was initially attempted, but as our processing capability matured, we found this step crude in its application.
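A sketch of this index-based segmentation is given below; the 920 nm shoulder band, the thresholds, and the hair-candidate rule are illustrative assumptions, and the cube hsi(row, col, band) is assumed to hold double-valued reflectances.

```matlab
% Hypothetical index-based skin/hair segmentation sketch for the CMU cube
% (65 bands, 450-1090 nm at 10 nm spacing); band() picks the nearest band.
band = @(nm) round((nm - 450) / 10) + 1;
nirBlue = (hsi(:,:,band(1000)) - hsi(:,:,band(475))) ./ ...
          (hsi(:,:,band(1000)) + hsi(:,:,band(475)));   % NIR rise versus blue
water   = (hsi(:,:,band(920))  - hsi(:,:,band(980))) ./ ...
          (hsi(:,:,band(920))  + hsi(:,:,band(980)));   % 980 nm water absorption dip
ndvi    = (hsi(:,:,band(1000)) - hsi(:,:,band(650))) ./ ...
          (hsi(:,:,band(1000)) + hsi(:,:,band(650)));   % equation (2): plant confuser check
skinMask = (nirBlue + water) > 1 & ndvi < 0.5;           % combined index greater than one
hairCand = ~skinMask & ndvi < 0;                         % crude hair candidates; the text
                                                         % refines these with a Mahalanobis
                                                         % distance over four wavelengths
```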

2.3.4. Interest Point Recognition Algorithms

Finally, the face, hair, and combined representations are fed to Lowe’s scale- and orientation-robust scale invariant feature transform (SIFT) method to compare matching interest points, or SIFT keys [27, 28]. SIFT extracts these features, or key interest points, using a difference-of-Gaussians function. The local minima and maxima of this function are used to create feature vectors that describe the orientation and gradient based on neighboring pixels. These features are shown to be invariant to image scaling, translation, and rotation. To establish baseline performance, these methods are initially used in isolation and then used in combination to evaluate a range of fusion strategies.
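As an illustration of interest point matching with Lowe's distinctiveness ratio [27, 28], the sketch below counts matching SIFT keys between two faces, assuming d1 and d2 are precomputed descriptor matrices (one 128-element descriptor per row).

```matlab
% Hypothetical SIFT key matching via Lowe's ratio test.
matches = 0;
for i = 1:size(d1, 1)
    dists = vecnorm(d2 - d1(i, :), 2, 2);   % distance to every gallery key
    s = sort(dists);                        % nearest and second nearest
    if s(1) < 0.8 * s(2)                    % Lowe's distinctiveness ratio
        matches = matches + 1;              % accept as a matching SIFT key
    end
end
score = matches;                            % more matched keys = stronger match
```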

3. Adaptive Facial Recognition

3.1. Qualia Exploitation of Sensor Technology (QUEST) Motivated Methodology

Ultimately, the performance and computational demands of working with high-dimensional data required a strategy that utilized only the relevant information in a more effective manner. Intelligently handling high-dimensional biometric data involves dealing with varying levels of abstraction, learning, adaptation, organization, and exploiting structural relationships in the data [29].

Turning to the qualia exploitation of sensor technology (QUEST) methodology, we attempt to develop a general-purpose computational intelligence system that captures the advantages of qualia-like representations [30]. Qualia can be defined as a representation of the physical environment, a facet included in one’s intrinsically available internal representation of the surrounding world [31]. It is our goal to combine different qualia into a metarepresentation, so that sensory inputs can be integrated into a model that is adaptable, efficiently functional, and open to repeated deliberation. A guiding principle of QUEST highlights the use of qualia that map sensory input to more useful and efficient states that complement the reflexive, intuition level of processing. The functional requirement for a QUEST system is to possess the ability to detect, distinguish, and characterize entities in the environment [32].

In order to build a QUEST system for our task, it is important to develop and understand the concept of an agent [31]. An agent takes a subset of stimuli from the environment and processes this into relevant information. Information is defined as the reduction of uncertainty in that agent’s internal representation. An agent has knowledge of other agents and of their environmental representation, akin to a theory of mind with insight into their needs. The agent transmits selected aspects of its information representation to neighboring or “aligned” agents. Agents transmit stimuli upward to higher levels of abstraction and can also transmit information downward, providing details and context that can influence lower level agents (Figure 4). An entity uses various sets of these agents and their collective knowledge to create an internal representation of its environment.

The relevant information or context comprises biometric characteristics and cues across the electromagnetic spectrum. Rogers et al. [32] state that the concept of an agent is to synthesize aspects of the qualia provided to it by an “aligned” agent, such that an agent reduces the uncertainty in its internal representation by processing data into information. An agent communicates this context to other agents that use this information to improve their internal representation and reduce their uncertainty. Context can only be transmitted between agents that are aligned, as each agent contains a representation of the other’s environment. The combination of fiducial features and higher level abstracted characteristics creates this context. In the human recognition system, the mind stores data not so much as sensory numbers but as relative comparisons to prior experiences that can change over time. For a face recognition system, relative comparisons should serve an equally important role in refining the solution space and guiding the search process. The connections or links in our fusion hierarchy provide the context of the face. There are many links that can connect the internal and external facial features that have proved so important in human recognition research [33]. The links chosen can help incorporate higher levels of abstraction, such as important soft biometric cues [34], or can be the connection between spatial and spectral information.

Figure 5 illustrates the links and various identification algorithms employed in our HSI face recognition system. The concept in Figure 5 can be considered an extension of the general face recognition concept from Figure 3, incorporating multiple feature extractions and matching algorithms with a fusion-decided identity declaration. A combination of score and rank fusion strategies will be evaluated to obtain the best method to synthesize the results of the agent information.

3.2. Fusion Hierarchy

From the field of automatic target recognition, Ando [35] provides a useful hierarchy for processing the hyperspectral face images. At the lowest level, processing includes smoothing and segmenting the image. During mid-level processing, cues such as shading, texture, reflectance, and illumination are integrated. Lastly, high-level processing integrates information that is invariant across different viewpoints for final identification. Using this guide, the initial face recognition hierarchy could be achieved through incrementally applying segmentation, processing, and identification steps. However, a more efficient means involves parallel processing and score fusion of the segmentation, processing, and identification steps, utilizing not only information from the spatial dimension of the image but also spectral elements to assist in segmenting, processing, and identification. Figure 6 illustrates the combined and incremental approach, wherein the algorithmic scores are normalized and then fused across algorithms through score fusion.
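A minimal score-fusion sketch in the spirit of Figure 6 follows; S is assumed to be an nGallery-by-nAgents matrix of similarity-style scores (higher is better, one column per agent) and w a weight vector, with unity weights reproducing simple score fusion.

```matlab
% Hypothetical normalization and weighted score fusion across agents.
normS = (S - min(S)) ./ (max(S) - min(S));   % min-max normalize each agent's scores
fused = normS * w(:) / sum(w);               % weighted score fusion
[~, ranking] = sort(fused, 'descend');       % best fused score first
bestMatch = ranking(1);                      % rank-1 identity for this probe
```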

The straightforward fusion approach presented in Figure 6 did not provide the desired performance during initial testing. Subsequent adjustments, such as the implementation of feedback loops, would eventually prove necessary, but the general approach of progressing from easily processed general characteristics to more specific and more computationally intensive characteristics would remain apparent throughout the design of the final processing architecture.

3.3. Adaptive Feedback

As alluded to earlier, algorithms used herein, such as the eigenface algorithm, derive scores for a set of faces that remain constant within a static comparison space derived from the gallery of candidates for a situation. Our implementation of the eigenface method counterbalances this consistency with an adaptive training set wherein identified poor matches are removed. Eigenface is then rerun with a different training set, resulting in a different set of eigenfaces (principal components of the set of faces) for the next iteration. Additionally, for score fusion, each set of algorithmic scores must be normalized so that the algorithms employ consistent and fusible scales. The adaptive feedback strategy employed leverages the changing eigenface space and the normalized scores passed to the fusion algorithms, tailored during the matching process by removing the lowest scoring subjects.

To incorporate the ability to make relative comparisons over time, adaptive feedback loops were added to the established facial recognition hierarchy (Figure 3); within the adaptive fusion framework, this approach is depicted in Figure 7. Closed loop systems compare the measurement of the feedback with the desired output [36]. By incorporating feedback of decision making results, refining the decision-making accuracy is possible.

For the biometric system presented, there are two feedback loops. The first feedback loop is included to examine the improvement potential of changing the dimensionality of the candidate gallery, thus changing the relative scores of some algorithms. This procedure involves reducing the gallery size by removing the lowest scoring subjects. This process is applied only for subject matching scores that fall below a user-specified threshold. The second feedback loop incorporates multi-look functionality, adding the capability to test additional probe images if and when they become available. This facet represents a temporal dimension that comes with multiple probe images or with hyperspectral video that obtains a series of face images over time.
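A control-flow sketch of the two loops is given below, assuming similarity-style fused scores (higher is better); rematch(), the confidence threshold, and the 10 percent pruning rate (the rate used in the Section 5 experiments) are illustrative assumptions.

```matlab
% Hypothetical adaptive feedback sketch: prune the gallery and inject
% additional probe images until the best match clears a threshold.
[bestScore, scores] = rematch(probes, gallery);       % initial matching pass
while bestScore < threshold && numel(gallery) > 1
    [~, order] = sort(scores);                        % ascending: worst candidates first
    nDrop = ceil(0.10 * numel(gallery));              % loop 1: reduce the gallery
    gallery(order(1:nDrop)) = [];
    if ~isempty(extraProbes)                          % loop 2: multi-look injection
        probes{end+1} = extraProbes{1};
        extraProbes(1) = [];
    end
    [bestScore, scores] = rematch(probes, gallery);   % rerun the matching suite
end
```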

Both feedback loops can be active or applied individually. Finally, there are several control variables for the selection and weighting of the agents used in the fusion process. Research by Chawla and Bowyer [25] and Kuncheva [37] has highlighted the importance of randomness and diversity in the creation of classifier ensembles, so the controlled and random selection of these active agents is a current area of research.

At any stage of the hierarchy presented earlier in Figure 6, a Libet-level answer similar to intuition is created and is integrated at the higher, metarepresentation levels of the hierarchy. The incorporation of qualia occurs as deliberation is made over the combined evidence from prior agents. The qualia-based Cartesian theater created through the fusion representation provides an engineering advantage in the confidence assessment.

4. Graphical User Interface Tool

To facilitate interpretation of data analysis and assist with the visualization of results, a Matlab-based GUI tool was designed to operate and test the facial recognition software. The GUI tool, pictured in Figure 8, is a direct parallel to the architecture presented in Figure 7. A user can select the active agents, enable feedback loops, and select from either a score or rank fusion approach while simultaneously analyzing results.

The GUI displays the probe to be matched, with the best current match directly opposite. Below these displays, the top ten matches are shown as thumbnails along with their relative rankings and scores. The results of each algorithm can be viewed by selecting the algorithm of interest in the “Results to Display” drop down menu. If feedback loops are employed, a user can select which result set to view, accompanied by the dimensionality of the gallery, in the “Gallery Set Results to View” menu. The pictorial results can be viewed as either grayscale or color images.

A box plot is displayed for each probe under consideration to provide continuous score distribution feedback when viewing results for each face and method. Additionally, gallery matches for each probe are scrollable to enable the visual evaluation of results for the entire score distribution. To review the quantitative results, the user can choose from cumulative match score plots, box plots, or histogram depiction of the relative scores and statistics.

For processing purposes, Matlab’s multiple processor pooling was employed on a dual quad-core computer with 16 GB of RAM. The processing requirements of the hyperspectral data along with the chosen methods benefit from the use of parallel processing. However, for computational ease, an additional utility tool allows the user to view saved results of any prior run by simply loading a results file. This file displays the algorithms used, the types of feedback loops used, and the weighting schemes, and permits a user to view all results and face matches. The status bar notifies the user whether the selected computer can support running the complete suite of software tools.

5. Performance Assessment

5.1. Algorithm and Fusion Performance

During the initial testing of the CMU data, many of the same algorithms were utilized from previous HSI research [12, 17, 18, 21]. The results confirmed some of the challenges present in the CMU data, the difference being the quality gap between the CMU data and the grayscale AT&T data [37] or the more recent CAL HSI data [18] obtained with more modern equipment. Although the performance level of these algorithms was not replicated, the value of the various techniques is not diminished. A comparison of the previously published performance versus that obtained through our initial testing is shown in Figure 9 to establish a preliminary performance threshold.

Data processing starts with a common but now automated preprocessing step, followed by the extraction of basic face features and then a matching step where face features and characteristics are compared for subject matching. The average computation time for the preprocessing of each face is 14 seconds. Face matching algorithms take an additional average of 13 seconds to process each face against the gallery of 36 subjects for an algorithm suite consisting of 6 algorithms including SIFT, eigenface, various geometric comparisons, and NDSI. Processing time can vary depending on the number of algorithms or agents activated by the user.

Findings from this initial round of testing reinforce the need for a fusion framework that combines complementary aspects of these algorithms to enhance the performance capability regardless of data quality or environmental setting. Taking into account the processing time of some algorithms, a method to accomplish effective data reduction and processing should also be considered to reduce overall computational time. The next section briefly describes the results of integrating the separate algorithms into a hierarchy for a robust face recognition system.

5.2. QUEST Hierarchy Results and Findings

A combination of score and rank fusion strategies was tested, with the most effective being a weighted score fusion strategy wherein the overall matching score is a combination of weighted individual matching scores. Figure 10 illustrates a cumulative match score result using three eigenface-based methods (“hair,” “face,” and “skin”) and unity weighting; the right-hand figure illustrates the changes to the comparison space through dropping the two lowest performing faces from the gallery and reexamining only the lowest scoring half of the probe set. The cumulative match score plots depict the number of correct matches for each rank along the horizontal axis. Scores of “1” in the following figures indicate the ability to correctly identify all images during the first attempt. These figures should not be confused with ROC curves or a summation of match scores.
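For clarity, the sketch below computes such a cumulative match score curve, assuming a precomputed vector rankOfTruth in which element i holds the rank at which probe i's true identity appears in the sorted gallery.

```matlab
% Hypothetical cumulative match score (CMS) curve computation.
nProbes = numel(rankOfTruth);
cms = zeros(1, nGallery);
for r = 1:nGallery
    cms(r) = sum(rankOfTruth <= r) / nProbes;   % fraction identified within rank r
end
plot(1:nGallery, cms, '-o');                    % reaches 1 once every probe is matched
xlabel('Rank'); ylabel('Cumulative match score');
```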

Figure 10 displays score fusion improving the cumulative results over any one method. The reduced gallery included 15 probes to identify against a gallery of 34. Of particular interest is the rank at which the fused results reach 1: through the reduced gallery method, all subjects are identified within 30 matches, improving over 31 matches for the full gallery.

Figure 11 adds spectral and spatial recognition to the algorithms presented in Figure 10; the improvement is seen in reaching a cumulative match of “1” by 24 matches. The reduced scores converge to “1” more slowly than the full gallery (26 versus 24 matches); however, this is for a gallery space of the 17 most difficult subjects to identify for this algorithm set. Both results show a distinct improvement over fusion of only the eigenface methods in Figure 10.

Continuing the fusion methodology to incorporate interest point matching produces Figure 12, which depicts cumulative match score results obtained using unity weighting for all agents compared to double weighting for the SIFT algorithms, using either 6 or 7 agents. While the SIFT algorithm provides the majority of the contribution, it is only through the inclusion of the other identification methodologies, and their inherent segmentation capability, that the overall identification accuracy is increased to 100%.

Enhancing this fusion strategy with the addition of the adaptive gallery feedback loop and the multi-look functionality allows us to continually process the results until a chosen threshold or confidence level is achieved. Figure 13 depicts an example, using the “6 Agent” framework with unity weighting from Figure 12, where the poorest scoring match distribution is shown after initial matching and then after four feedback repetitions, during which the gallery size was reduced by 10 percent, and a new probe image was injected each time. Through this repetitive process, matches with the lowest matching scores are rechecked as poor candidates are removed from the gallery, and additional probe images are inserted into the process to confirm the correct identification.

6. Conclusion

Even with the distinctiveness that comes with every human being, no single metric or feature has demonstrated the ability to identify all individuals in both controlled and uncontrolled environments across large populations using a single modality. This challenge frequently leads to solutions that incorporate multiple modalities, which bring the close proximity and permission requirements that accompany the selected biometrics, not to mention additional equipment and complexity. An alternative may be to fuse contextual or complementary spatial, spectral, and temporal information in an architecture that enhances both effectiveness and efficiency. The use of hyperspectral imagery and a fusion hierarchy similar to the one presented in this paper offers many opportunities for the improvement of current face recognition systems and can be applied to a wider array of object recognition problems.