Biometric face recognition is becoming more frequently used in different application scenarios. However, spoofing attacks with facial disguises are still a serious problem for state of the art face recognition algorithms. This work proposes an approach to face verification based on spectral signatures of material surfaces in the short wave infrared (SWIR) range. They allow distinguishing authentic human skin reliably from other materials, independent of the skin type. We present the design of an active SWIR imaging system that acquires four-band multispectral image stacks in real-time. The system uses pulsed narrow-band illumination, which allows for fast image acquisition and high spectral resolution and renders it largely independent of ambient light. After extracting the spectral signatures from the acquired images, detected faces can be verified or rejected by classifying the material as “skin” or “no-skin.” The approach is extensively evaluated with respect to both acquisition and classification performance. In addition, we present a database containing RGB and multispectral SWIR face images, as well as spectrometer measurements of a variety of subjects, which is used to evaluate our approach and will be made available to the research community by the time this work is published.


Face recognition is a very important aspect for biometric systems and a very active research topic [1]. The human face has advantages over other biometric traits, as it can easily be captured in a nonintrusive way from a distance [2]. Consequently, biometric face recognition systems are becoming more frequently used, for example, at airports in the form of automated border control systems, for access control systems at critical infrastructure, or even for user log-on and authentication in computers or modern smartphones. However, despite the significant progress in the field, face recognition still faces serious problems in real-world scenarios when dealing with changing illumination conditions, poses, and facial expressions, as well as facial disguises (“fakes”), such as masks [3].

To overcome the problem of changing illumination conditions, the use of infrared imagery has been proposed in the recent years. Frontal illumination of faces with near infrared light that is invisible to the human eye helps to reduce the influence of ambient light significantly without distracting or blinding the subjects [4].

For the detection of fakes, also referred to as liveness detection, at least three forms of spoofing have to be considered: photographs, prerecorded or live video (e.g., shown on a mobile device), and partial or complete facial disguises such as masks. The impact of such attacks on face recognition has been researched in several studies, for example, in the context of the research project TABULA RASA [5]. Although some countermeasures for such attacks have been proposed [6–8], attacks with facial disguises and masks in particular are still a problem for state of the art face recognition systems.

Masks can be manufactured using very different materials with varying textures and surface properties, for example, paper, latex, rubber, plastics, or silicone. Due to the variations found in human skin color and texture, distinguishing any possible material from genuine human skin using only the visual domain is a very difficult task [9].

To overcome these problems, the use of infrared imaging has been proposed in prior work. Jacquez et al. [10] have shown that human skin has very specific remission characteristics in the infrared spectral range: the spectral remission of skin above 1200 nm is widely independent of the skin type and mainly influenced by the absorption spectrum of water. In addition, the spectral remission of most other materials differs strongly from that of skin: Figure 1 shows the remission intensities of human skin in the visual and infrared spectral range up to 1700 nm for six different skin types, denoted as skin types 1 (very light colored) to 6 (very dark colored) after Fitzpatrick [11], compared to the remission spectra of materials that might be used to create facial disguises.

In the literature, the infrared spectrum below 1400 nm is commonly referred to as the near infrared (NIR) band and the spectrum between 1400 nm and 3000 nm as the short wave infrared (SWIR) band. This work focuses on the spectral range of 900 nm up to 1700 nm. When describing this wavelength range, most researchers use only the term SWIR in order to distinguish it from work limited to the NIR range below 1000 nm. This paper will adopt this simplification and also use only the term SWIR in the following to describe this wavelength range. The existing approaches that make use of the SWIR spectral range can be classified into four groups: multispectral image acquisition using multiple cameras with band pass filters [9, 12], hyperspectral imagers [13], single cameras using filter-wheels with band pass filters for sequential multispectral image acquisition [14], and, more recently, single cameras with Bayer-like band pass filter patterns applied directly on the sensor [15]. All of these systems are passive (filter-based) and require sufficient illumination by daylight or external lighting. They will be discussed in detail in Section 2.

In our previous work, we presented an active multispectral point sensor for contactless skin detection which can be used for both safety and security applications, as well as a “proof of concept” of an active multispectral imaging system [16, 17]. Both the sensor and the imaging system acquire a “spectral signature” of object surfaces: a specific combination of remission intensities in distinct, narrow wavebands that is used for the classification of the object’s surface material.

The contributions of this work are twofold.

(1) Based on our prior work, we present an improved system design of an active multispectral camera system optimized for face verification. The system acquires four-band multispectral image stacks in the SWIR range in real-time. The main improvements are
(i) optimized illumination homogeneity,
(ii) extensive camera system calibration,
(iii) compensation of motion artifacts,
(iv) advanced classification methods,
(v) an elaborate evaluation regarding both skin detection and face verification.

(2) We present data from a study with more than 130 participants (at the time of writing) that combines spectral measurements at several points on faces and limbs with pictures taken with both an RGB camera and the presented multispectral camera system. A subset of this database, reduced by the images of participants that did not agree to publication, will be made available to the research community on our website (http://isf.h-brs.de/) by the time this work is published. We expect the published database to contain spectrometer data from at least 120 participants and image data from at least 50 participants.

The remainder of this paper is organized as follows: Section 2 gives an overview of the related work. Section 3 presents the design of the proposed camera system with a focus on hardware. Sections 4 and 5 describe the methods applied for image preprocessing and analysis. In Section 6, the camera system and the proposed skin and fake detection method are evaluated. For this purpose, a database of spectrometer measurements, as well as multispectral SWIR and RGB images, is presented. Section 7 concludes the paper.

In the following, we will focus on work that is directly related to our approach, that is, based on the SWIR spectral range. A more general, comprehensive overview of methods for face recognition in the infrared spectrum, including the thermal infrared range, can be found in [3].

Taking advantage of the specific remission characteristics of human skin in the SWIR spectral range for its detection is not a new idea, but this approach has (to the best of our knowledge) only rarely been researched in the literature.

In 2000, Pavlidis and Symosek [9] demonstrated that the SWIR range has many advantages for face detection in general and for disguise detection in particular. They proposed a dual band camera system, consisting of two coregistered cameras, with one camera having a spectral sensitivity below 1400 nm (ideally 800 nm to 1400 nm) and the second camera having a spectral sensitivity above 1400 nm (ideally 1400 nm up to 2200 nm). Their system can work with either sunlight or artificial illumination and it uses a fusion algorithm based on weighted differences to detect skin in the acquired images. Depending on the spectral distribution of the illumination source, the weighting factors have to be adapted, as the system is not independent of ambient light. The authors conclude that their system achieves very good face and disguise detection capabilities compared to systems in the visual spectrum, only limited when it comes to the detection of surgical face alterations, where they see an advantage of systems using the thermal infrared range. In a later publication [18], they presented an extension of the system with a third camera for the visual spectrum and a more advanced face detection approach that included multiband eye and eyebrow detection. Their system uses beam splitters to allow all cameras to view the scene from the same vantage point in order to avoid problems with image registration.

At the U.S. Air Force Institute of Technology, Nunez and Mendenhall [12, 13] researched the use of hyperspectral SWIR imagery to detect skin for remote sensing applications. The authors acquired images in 81 narrow spectral bands between 900 nm and 1744 nm with a hyperspectral camera and introduced a detailed reflectance model of human skin based on this data. For real-time and in the field use, the authors propose a multicamera system to acquire images in distinct narrow wavebands using different band pass filters on each camera. To avoid problems with image registration, this system uses dichroic mirrors to split up the beam so that all cameras share one single lens and view the scene from the same vantage point.

More recently, Bourlai et al. [14] presented a multispectral SWIR image acquisition system using a single camera with an attached rotating filter wheel. The filter wheel is equipped with five band pass filters with a full width at half maximum (FWHM) of 100 nm around the peak wavelengths 1150 nm, 1250 nm, 1350 nm, 1450 nm, and 1550 nm. By synchronizing the camera’s integration time to the filter wheel, the system can capture all five waveband images within 260 ms (i.e., at a rate of ≈3.8 frames per second (FPS)).

Bertozzi et al. [15] propose a camera with a broadband sensor for both the visual and SWIR spectral range (i.e., 400 nm to 1700 nm) that is equipped with a Bayer-like mosaic filter pattern directly on top of the pixel array. One clear filter (full bandwidth) is combined with three high pass filters with cut-off wavelengths of 540 nm, 1000 nm, and 1350 nm. By subtracting the acquired values of neighboring pixels with different filters, multispectral images in the four wavebands of approximately 400–600 nm, 600–1000 nm, 1000–1300 nm, and 1300–1700 nm can be calculated.

Due to the passive (filter-based) system design, the spectral distribution of the ambient illumination has a strong influence on the multispectral images acquired by any of these systems. In contrast to this, the approach proposed in this work uses active narrow-band illumination instead of filters and is largely independent of ambient light. It combines a comparably high acquisition speed with high spectral resolution and robust detection.

3. Camera System Design

The approach described in this work is composed of three major building blocks illustrated in Figure 2, which we explain in sequential order. This section describes the design goals and decisions for the camera system with a focus on the hardware. Section 4 presents the low-level image processing methods, while Section 5 will focus on higher level image processing and analysis.

3.1. Design Goals

In general, face detection approaches in the context of biometric applications have strong requirements with respect to robustness and speed of the detection. Here, robustness includes both accurate detection under varying external conditions such as lighting and a reliable exclusion of spoofing attacks.

Even though we do not tackle any specific application scenario, we formulate the following, rather generic design goals that allow the realization of various applications.
(i) The imaging system should be independent of ambient light. The spectral distribution or any flickering of the light source must not distort the extracted spectral signatures.
(ii) The acquisition time of a complete multispectral image stack should be as short as possible.
(iii) Moving objects must not lead to false classifications.
(iv) Face and disguise detection must work independent of a subject’s skin type, age, or gender.
(v) The operation range should be oriented at typical cooperative user scenarios with short ranges of several meters (as opposed to long range imaging scenarios with distances of more than 100 meters [19]).
(vi) The system should require only one single camera. This avoids the need to align the optical path of multiple cameras or to apply complex image registration methods and reduces the costs of the imaging system, as SWIR cameras are still very expensive.

None of the existing approaches described in Section 2 can reach all of these goals.

3.2. System Setup

Based on the specified design goals, we propose a system setup consisting of a single SWIR camera sensitive to a spectral range of 900–1700 nm with an attached LED ring light that illuminates the face of a subject in four distinct narrow wavebands within this spectral range (one at a time), as illustrated in Figure 3. A microcontroller system, which is embedded into the ring light module, triggers short pulses in alternating distinct wavebands and signals the camera to start and stop the exposure of a new image synchronized to the light pulse. The camera transmits the acquired images to a connected computer via Gigabit Ethernet, which in turn is connected to the microcontroller system via USB in order to configure and start the acquisition. We also developed a special software tool that allows a user to control the image acquisition and to perform all related image processing and analysis tasks with a graphical user interface.

3.3. Design of the LED Ring Light

Using LEDs to implement the illumination module is an obvious choice, as they produce rather narrow band illumination and can be pulsed with high intensities and variable frequencies. Based on findings in our previous work [16], we selected four wavebands for our current setup that are well suited for skin detection and designed an LED ring light with 90 LEDs. The number of LEDs for each waveband is shown in Table 1 and was chosen with regard to both the expected radiated power of each LED and a uniform distribution of the LEDs on the ring light.

A uniform distribution of the LEDs around the camera lens, as well as similar viewing angles and radiant patterns of the different LED types, is very important in order to achieve a homogeneous illumination. Otherwise, the extracted spectral signatures of an object would differ depending on the object’s position in relation to the ring light. To avoid this problem, we selected LEDs of the same model and manufacturer (Roithner-Laser ELD-935-525, ELD-1060-525, ELD-1300-535, and ELD-1550-525) and performed optical simulations to find the optimal distribution of the different numbers of LEDs per waveband. For this purpose, we modeled the single LEDs as light sources using the FRED Optical Engineering (Photon Engineering LLC, http://photonengr.com/) software by specifying their typical peak wavelengths, spectral and radiant power distributions as defined by their datasheets. FRED performs ray tracing to simulate the propagation of light from each light source to a virtual target plane. It also provides a scripting language and batch processing capabilities to run a series of simulations with different parameters. This way, we compared different placement patterns and varying positions for the LEDs by simulating the resulting intensity distribution for each waveband on the target plane. Ideally, the normalized intensity distributions of all wavebands should be identical, leading to a homogeneous “color” on the target. The best solution we found and the illumination distribution it created are shown in Figure 4. Due to the visualization of the four SWIR wavebands in yellow, red, green, and blue, the resulting (mixed) color is a soft yellow. Inhomogeneities would be noticeable by local changes in the color tone but cannot be observed.

3.4. Eye Safety Evaluation

Eye safety is a critical aspect of high power SWIR illumination sources, as radiation with a wavelength of up to 1400 nm can still penetrate the human eye and cause thermal damage to the retina. The directive 2006/25/EG of the European Union defines binding permissible limits for illumination systems with pulsed light sources, which should be measured as specified by the applicable standards. For our camera system, this is DIN EN 62471. The directive defines limits for the effective radiance on the retina, which is weighted by a factor depending on the wavelength of the radiation, and the total irradiance on the cornea in a measurement distance of  m.

As the necessary measurement setup was not available to us, we analyzed the incident power of the SWIR radiation on the eye of an observer standing in the “sweet spot” of the ring light based on the optical simulation. Assuming a pupil diameter of 7 mm, the maximum incident power at a distance of  m is achieved by the 935 nm waveband and reaches a level of  mW. This corresponds to a total irradiance of  W/m2.
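The step from incident power to total irradiance in the paragraph above is a division by the pupil area. The following minimal sketch illustrates that conversion under the assumption that the power is spread evenly over a circular pupil of 7 mm diameter. The function name and the example power are hypothetical placeholders and are not values from this work.

```python
import math

def pupil_irradiance(incident_power_w, pupil_diameter_m=0.007):
    """Return the irradiance in W/m^2 for power spread evenly over a circular pupil."""
    area_m2 = math.pi * (pupil_diameter_m / 2.0) ** 2
    return incident_power_w / area_m2

# Hypothetical input for illustration only, NOT a measured or simulated value.
example_power_w = 1.0e-4  # 0.1 mW
print(f"{pupil_irradiance(example_power_w):.2f} W/m^2")
```

The resulting value would then be compared against the irradiance limit of the directive for the corresponding wavelength range and exposure duration.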

Using a model of our ring light that is simplified in the “safe direction”, we cross-checked this result using the specifications given in the LEDs’ datasheet. The typical radiant intensity of one 935 nm LED is given as  W/sr. Now we assume (as a worst case) that all LEDs for the 935 nm waveband are continuously powered and directly adjacent, so that the combined radiant intensity of LEDs can be approximated as and the radiating surface as . Now we can calculate and as follows: with being a correction factor according to directive 2006/25/EG and  m being the distance of an observer according to DIN EN 62471.

Table 2 shows both our results and the limits defined by the EU directive. As expected, the total irradiance calculated using the simplified “worst case” model is a little higher than the results from simulation, showing its plausibility. Still, the calculated values are by far below the permissible limits, even if the observer stares right into the ring light for a very long time. This leaves some headroom for further increases of the ring light’s output power.

3.5. Image Acquisition Principle

In practice, the ring light is working as a pulsed light source. The microcontroller system enables its different wavebands one after the other in a fixed order and simultaneously triggers the camera exposure. To remove the influence of ambient light, in each acquisition cycle an additional camera exposure is triggered without the ring light flashing. This reference image is subtracted from each of the other images in preprocessing, so that only light emitted by the ring light in one single waveband remains on these images, which we call waveband images. Each set of waveband images and its corresponding reference image are combined in a multispectral image stack. This method works well for ambient light from continuous light sources, such as daylight. Here, all light sources with intensity variations that are either very slow or very fast compared to one full acquisition cycle can be regarded as continuous. However, “flickering” or pulsed light sources, changing their intensity with frequencies in a magnitude similar to the acquisition frequency, might cause distortions of the spectral signatures. In practice, most flickering light sources are incandescent or fluorescent lamps, flickering at twice the local power line frequency of 50 Hz or 60 Hz, therefore having periods of 10 ms or ≈8.3 ms, respectively. By using exposure times matching this period or any multiples of it, their influence can easily be reduced to a negligible level.
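The acquisition principle described above can be sketched in a few lines. The example below is only an illustration: images are plain nested lists, the function names are hypothetical, and clamping negative differences to zero is an assumption that the text does not specify.

```python
def flicker_period_ms(mains_hz):
    """Lamps on AC mains flicker at twice the power line frequency."""
    return 1000.0 / (2.0 * mains_hz)

def subtract_reference(waveband_image, reference_image):
    """Pixelwise subtraction of the unlit reference exposure (clamped at zero, an assumption)."""
    return [
        [max(w - r, 0) for w, r in zip(w_row, r_row)]
        for w_row, r_row in zip(waveband_image, reference_image)
    ]

def build_stack(raw_wavebands, reference_image):
    """raw_wavebands maps a waveband label (e.g. 1060) to its raw exposure."""
    return {
        band: subtract_reference(image, reference_image)
        for band, image in raw_wavebands.items()
    }

# Tiny 1x2 example: ambient light contributes 3 and 7 counts to the two pixels.
stack = build_stack({935: [[10, 9]], 1060: [[20, 7]]}, [[3, 7]])
print(stack)
print(flicker_period_ms(50), flicker_period_ms(60))
```

An exposure time equal to the flicker period (or a multiple of it) integrates over full flicker cycles, so the ambient contribution is the same in every exposure of the stack and cancels in the subtraction.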

Our current setup is based on an Allied Vision Goldeye G-032 SWIR camera, which is equipped with an indium gallium arsenide (InGaAs) sensor and features a maximum frame rate of 100 frames per second (FPS) at its full resolution of pixels with 14-bit A/D conversion. Due to the camera’s very short readout time, it can be operated at this frame rate with an exposure time close enough to 10 ms to remove the effect of flickering lamps. Figure 5 illustrates the chronological order of the signals given by the microcontroller system within one full acquisition cycle of 50 ms, resulting in an effective frame rate of 20 FPS.
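The relation between the camera frame rate and the rate of complete multispectral image stacks follows from simple arithmetic. The sketch below assumes five exposure slots per cycle (four waveband images plus one reference image) of 10 ms each, which corresponds to the camera running at 100 FPS. The function name and parameters are hypothetical.

```python
def acquisition_cycle(num_wavebands=4, exposure_slot_ms=10.0):
    """Return (cycle duration in ms, image stacks per second)."""
    exposures = num_wavebands + 1  # waveband images plus the unlit reference image
    cycle_ms = exposures * exposure_slot_ms
    return cycle_ms, 1000.0 / cycle_ms

cycle_ms, stacks_per_second = acquisition_cycle()
print(cycle_ms, stacks_per_second)  # prints 50.0 20.0
```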

4. Image Preprocessing

Each image acquired by the SWIR camera is transmitted to a PC via Gigabit Ethernet. Simultaneously, the microcontroller system tells the PC which waveband of the ring light has been active during the exposure via USB connection. Given this information, the software running on the PC performs several preprocessing steps to optimize and match the images in order to compose a multispectral image stack.

4.1. Fixed Pattern Noise Correction

Despite the camera’s internal two-point nonuniformity correction (NUC), underexposed images show significant fixed pattern noise depending on the actual pixel intensity. As the system design requires taking one reference image without flashing the ring light, this noise will have an influence on images taken in dark environments. To analyze the sensor’s behavior in detail, the sensor area was homogeneously illuminated using an adjustable quartz halogen lamp through an integrating (Ulbricht) sphere and 70 images with increasing brightness were taken. This image data is used as a look up table to apply a multiple-point nonuniformity correction to every single pixel. Figure 9 demonstrates the effectiveness of this method.
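One plausible way to use such calibration images as a per-pixel look up table is piecewise-linear interpolation between the recorded brightness levels. The sketch below shows this for a single pixel. It is an assumption about how the table could be applied, not a description of the actual implementation, and the choice of the target response (for example, the sensor-wide mean at each level) is likewise hypothetical.

```python
from bisect import bisect_right

def correct_pixel(raw, pixel_lut, target_lut):
    """Map a raw pixel value to a corrected value by piecewise-linear interpolation.

    pixel_lut:  raw responses of this pixel at each calibration level (strictly ascending).
    target_lut: desired responses at the same levels (e.g. the sensor-wide mean).
    """
    if raw <= pixel_lut[0]:
        return float(target_lut[0])
    if raw >= pixel_lut[-1]:
        return float(target_lut[-1])
    k = bisect_right(pixel_lut, raw) - 1  # pixel_lut[k] <= raw < pixel_lut[k + 1]
    x0, x1 = pixel_lut[k], pixel_lut[k + 1]
    y0, y1 = target_lut[k], target_lut[k + 1]
    return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)

# Three calibration levels instead of 70, for brevity.
print(correct_pixel(15, [10, 20, 30], [100, 200, 300]))
```

Values outside the calibrated range are clamped to the first or last table entry in this sketch; extrapolation would be an alternative.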

4.2. Motion Compensation

In the next step, the waveband images of one acquisition cycle are combined into a multispectral image stack. As the waveband images have been acquired sequentially, the positions of any moving object or person in the scene might have changed between each image of the stack. In practice, this will lead to motion artifacts and potentially cause false classifications due to distorted spectral signatures. This problem is common to all approaches that need to coregister sequentially acquired images, such as filter wheel camera systems [14].

To solve this problem, we propose a frame interpolation method based on motion estimation and compensation techniques to properly align all edges in every image of the stack. For this purpose, optical flow methods have proven to be a very effective, but computationally expensive approach [20]: sufficiently high performance for real-time applications can currently only be achieved by implementations using graphics hardware (GPUs). Hoegg et al. [21] demonstrated that this approach can also be used to compensate motion in coregistered sequential images acquired by a time of flight camera.

However, optical flow cannot be applied on our data directly, as illumination conditions and intensity values of object surfaces might differ strongly between the waveband images. In particular the first step in image merging, the subtraction of the (not actively illuminated) reference image, might cause problems: properly exposed image areas with much detail in the actively illuminated waveband images might be completely dark and without detail in the reference image.

Therefore, we use the following approach to motion compensation: consider a full multispectral image stack , with being a sequential number, consisting of images , acquired at times ,  . Furthermore, we assume a discrete and equidistant acquisition time for each image and a constant acquisition time for the full image stack, as illustrated in Figure 6.

As we cannot successfully apply optical flow directly to the sequence of images, that is, between and as shown in the upper row of Figure 7, we also consider a subsequent multispectral image stack and apply optical flow for corresponding images, that is, between and ,   in a bidirectional manner resulting in a set of displacement maps (vector fields). Consider

As and have both been acquired with the same illumination conditions, the results of this operation are much better, as shown in the lower row of Figure 7. Assuming a constant and linear motion between corresponding images and , every vector in the displacement maps describing the movement of pixel between and can be regarded as a linear combination of identical partial vectors describing a pixel’s movement between and . Based on this assumption, we now apply the forward and backward displacement maps partially to estimate the images at intermediate times , resulting in where indicates the application of displacement map to image .

Finally, for all ,  , the positions of moving objects will match their position in the reference image . Thus, any further processing, that is, subtracting from every waveband image ,  , and merging the images in one multispectral image stack, can be applied to these motion-corrected waveband images. For this application, the optical flow algorithm by Brox et al. [22], running on a GPU using a CUDA implementation, was found to be the best choice as it delivers very good results combined with acceptable run-times. Results of the motion compensation approach are presented in Section 6.
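The idea of applying a displacement map only partially can be illustrated with a small sketch. It assumes constant linear motion, so that a displacement measured over one full acquisition cycle can be scaled by the fraction of the cycle that separates an image from the reference time. The function names, the parameters, and the nearest-neighbour backward warping (which samples the displacement at the destination pixel) are simplifications chosen for illustration. The optical flow computation itself is not shown.

```python
def scale_flow(flow, fraction):
    """Scale every displacement vector (dx, dy) by the given fraction."""
    return [[(dx * fraction, dy * fraction) for dx, dy in row] for row in flow]

def warp_backward(image, flow, fill=0):
    """Build the warped image by looking up each pixel's source position."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            sx, sy = x - int(round(dx)), y - int(round(dy))
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out

def compensate(image, flow_full_cycle, steps_to_reference, steps_per_cycle):
    """Estimate the image at the reference time from a full-cycle displacement map."""
    fraction = steps_to_reference / steps_per_cycle
    return warp_backward(image, scale_flow(flow_full_cycle, fraction))

# A bright pixel in a scene that moves 4 pixels to the right per cycle;
# the image is shifted by half a cycle (2 of 4 steps).
image = [[0, 9, 0, 0, 0, 0, 0, 0]]
flow = [[(4.0, 0.0)] * 8]
print(compensate(image, flow, 2, 4))
```

A production implementation would interpolate subpixel positions and handle occlusions, which this sketch ignores.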

4.3. Calibration

With the multispectral image stack being properly aligned and the ambient illumination subtracted from all waveband images, lens distortion and differences in the illumination intensities can be corrected as the last step in the image preprocessing. For this purpose, three sets of multispectral image stacks are recorded for each lens. A checkerboard calibration pattern is used to calculate a correction matrix for the lens distortion for every waveband individually to compensate for different distortion characteristics due to lateral chromatic aberration of the lens. Additionally, a plain white surface is used to measure both vignetting of the lens and light distribution of the ring light for each waveband and to calculate a respective correction matrix that normalizes the illumination intensity over the image area. Finally, a “white reference” tile with uniform remission characteristics in the SWIR spectral range is used to measure absolute differences in illumination intensities between the wavebands, which are stored as a vector of correction factors for each waveband. This waveband specific correction data is applied to every image of the multispectral image stack after the reference image has been subtracted.
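The two intensity corrections described above (the per-pixel normalization derived from the plain white surface and the per-waveband factor derived from the white reference tile) can be sketched as follows. Normalizing to the mean of the white image is an assumption made for this example, the function names are hypothetical, and the lens distortion correction is omitted.

```python
def flat_field_gain(white_image):
    """Per-pixel gain that flattens an image of a plain white surface to its mean level."""
    pixels = [v for row in white_image for v in row]
    mean = sum(pixels) / len(pixels)
    return [[mean / v if v > 0 else 0.0 for v in row] for row in white_image]

def apply_calibration(image, gain, band_factor):
    """Apply the per-pixel gain and the per-waveband intensity factor."""
    return [
        [v * g * band_factor for v, g in zip(image_row, gain_row)]
        for image_row, gain_row in zip(image, gain)
    ]

# A 2x2 white image with a falloff toward one corner (illustrative values).
white = [[50, 100], [100, 150]]
gain = flat_field_gain(white)
print(apply_calibration(white, gain, band_factor=1.0))
```

Applying the gain to the white image itself yields a uniform image, which is a quick sanity check for the correction matrix.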

5. Image Analysis

The multispectral image stacks acquired by the camera system are automatically analyzed by software in two steps: first, a skin classification method analyzes the spectral signature of each pixel to detect areas that show human skin. Second, a face detection algorithm searches for faces in the 1060 nm waveband image, as this waveband is very well suited for this purpose: the remission intensity of skin is comparably high, with eyes and mouth appearing darker. Finally, the locations of detected faces are matched against the results of the skin classification in order to verify their authenticity.

5.1. Skin Classification

To optimize both classification accuracy and run-time performance, the skin classification method consists of two algorithms, one for coarse-grained and one for fine-grained classification. Both algorithms perform pixelwise classification using the spectral signatures of the individual pixels as follows:with each ,  , being the greyscale value of the examined pixel in spectral image of the multispectral image stack , which consists of spectral images.

For each pixel , the first algorithm calculates normalized differences for all possible combinations of greyscale values within as follows: with and . So for , we get a vector of normalized differences with for each pixel . The normalized differences range from . In contrast to the values of the spectral signatures, they are independent of the absolute brightness of the analyzed pixel , which differs with the measurement distance. This allows for a robust and fast classification of skin-like materials by specifying upper and lower thresholds for each normalized difference. However, this “difference filter” algorithm is not capable of distinguishing skin from materials that are very similar to skin, such as some kinds of silicone used for the creation of masks.
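As an illustration of the difference filter, the sketch below assumes the common form of a normalized difference, (g_a - g_b) / (g_a + g_b), evaluated for every pair of wavebands with a < b. This form is an assumption for the example, as are the function names and the threshold values, which are made up and not taken from this work.

```python
from itertools import combinations

def normalized_differences(signature):
    """Return (g_a - g_b) / (g_a + g_b) for every pair a < b of waveband values."""
    return [
        (ga - gb) / (ga + gb) if (ga + gb) != 0 else 0.0
        for ga, gb in combinations(signature, 2)
    ]

def difference_filter(signature, lower, upper):
    """True if every normalized difference lies within its [lower, upper] thresholds."""
    diffs = normalized_differences(signature)
    return all(lo <= d <= hi for d, lo, hi in zip(diffs, lower, upper))

# Four wavebands give six normalized differences.
signature = [400, 380, 200, 60]          # hypothetical greyscale values
lower = [-0.1, 0.2, 0.6, 0.2, 0.6, 0.4]  # hypothetical thresholds
upper = [0.1, 0.4, 0.8, 0.4, 0.8, 0.6]
print(normalized_differences(signature))
print(difference_filter(signature, lower, upper))
```

Scaling all four values by a common factor, as happens when the measurement distance changes, leaves the normalized differences unchanged, which is why thresholds on them are robust against brightness variations.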

Therefore, a second classification algorithm is applied on the samples classified as “skin-like.” Based on results of our previous work [23], we use support vector machines (SVMs) for this fine-grained classification. The SVMs were trained using normalized difference vectors , which were calculated (as described above) based on spectral signatures extracted from multispectral images of skin, skin-like materials, and other materials acquired with the presented camera system. As shown in Section 6, the SVM classifier performs much better than the difference filter but has a much higher computational complexity. Limiting the SVM classification to those samples that have been positively classified by the difference filter significantly reduces the typical run-time of the skin detection. In addition, outliers and “unknown” material samples (samples that were not included in the training data) are less likely to create false positives when using two different classifiers. All pixels classified as skin are stored in a binary image with representing skin and representing no-skin.
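The two-stage structure can be sketched independently of the concrete classifiers. In the example below, both stages are passed in as plain callables. The stand-in lambdas are hypothetical and only demonstrate that the expensive second stage is evaluated for coarse positives alone. A trained SVM would take the place of the second callable. Encoding skin as 1 and no-skin as 0 is an assumption of this sketch.

```python
def classify_pixels(features, coarse_filter, fine_classifier):
    """Return 1 (skin) or 0 (no-skin) per feature vector using a two-stage cascade."""
    labels = []
    for feature in features:
        if not coarse_filter(feature):
            labels.append(0)  # rejected cheaply, fine classifier is never called
        else:
            labels.append(1 if fine_classifier(feature) else 0)
    return labels

# Stand-in stages for illustration only.
fine_calls = []
coarse = lambda f: f[0] > 0.5
fine = lambda f: (fine_calls.append(f), f[1] > 0.5)[1]

print(classify_pixels([(0.9, 0.9), (0.9, 0.1), (0.1, 0.9)], coarse, fine))
print(len(fine_calls))
```

In this example the fine classifier runs for two of the three samples, since the third one is already rejected by the coarse stage.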

5.2. Face Detection

In the second step of the image analysis, we apply state of the art face detection algorithms on the 1060 nm waveband image to detect faces. We tested both the proprietary FaceVACS software from Cognitec Systems GmbH and an open source implementation of a local binary pattern histogram (LBPH) based face recognizer for this purpose. The result of the skin classification can optionally be used to improve the performance of the face detection algorithm by limiting the search for faces to areas in which skin has been detected. To verify the faces found in the image, their locations are matched with the result of the skin detection method. A face is verified as authentic if the ratio of “skin” to “no-skin” pixels within the facial area is above a specified threshold.
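The verification rule at the end of the paragraph above can be written down directly. The sketch below assumes that the detected face is given as an axis-aligned bounding box and that the skin classification result is a binary mask holding 1 for skin and 0 for no-skin. The function name, the box format, and the threshold are hypothetical.

```python
def verify_face(skin_mask, box, min_ratio):
    """Return True if the skin to no-skin pixel ratio inside the box exceeds min_ratio.

    box = (x, y, width, height) in pixel coordinates.
    """
    x0, y0, width, height = box
    region = [skin_mask[y][x0:x0 + width] for y in range(y0, y0 + height)]
    skin = sum(sum(row) for row in region)
    no_skin = sum(len(row) for row in region) - skin
    if no_skin == 0:
        return skin > 0  # region consists of skin pixels only
    return skin / no_skin > min_ratio

mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(verify_face(mask, (0, 0, 3, 3), min_ratio=3.0))
print(verify_face(mask, (1, 1, 3, 3), min_ratio=3.0))
```

A partial disguise lowers the ratio inside the facial area, so the threshold determines how much non-skin material (for example, glasses or hair) is tolerated before a face is rejected.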

6. Results and Discussion

The results of our work are separated into four subsections: first, we present images acquired with our multispectral camera system and the results of the image processing methods. Second, we describe the design of a study with (at the time of writing) more than 130 participants and present the acquired data, consisting of both spectrometer data and multispectral images. Based on this data, the performance and robustness of the proposed skin detection and classification approach are analyzed. Finally, the performance of the fake detection approach is evaluated.

6.1. Acquisition Quality and Performance

Figure 8 shows an example of the multispectral image stack acquired by our camera system after image processing, consisting of four waveband images and the reference image used to compensate for ambient light, as well as a color image taken with a high quality RGB camera for comparison.

Due to insufficiently corrected axial chromatic aberrations of the camera’s lens leading to a focus shift with increasing wavelengths, it is impossible to have all waveband images perfectly focused at the same time. This effect can only be reduced by stopping down the lens to a smaller aperture. As only the 1060 nm waveband image is used for face detection, we focus on this waveband image and accept a slight falloff in sharpness on the other waveband images.

6.1.1. Influence of Ambient Light

To evaluate the influence of ambient light on the camera system, a series of images of a reference target positioned at a distance of ≈1.5 m was taken under varying illumination conditions. The averaged illumination intensities measured on the reference target are shown in Figure 11. In this measurement, the ambient light is not yet subtracted from the signal pulses. Fluorescent lamps are barely visible to the SWIR camera, while daylight and incandescent lamps can increase the overall brightness significantly. Even without reaching saturation, the sensor shows some nonlinear behavior with increasing brightness levels: the actual signal strength, that is, the difference between the remission intensities with active ring light illumination and with ambient light only, decreases by up to ≈20% between dark and bright ambient illumination. However, the relative intensity differences between the wavebands stay almost the same, and the influence on the normalized differences between the wavebands is very small as long as the sensor is not saturated. Saturation can easily be avoided by dynamically reducing the exposure time, but this also reduces the acquired remission intensity of the SWIR pulses. Therefore, ambient light can largely be neglected for classification but might reduce the maximum operation distance of the camera system.
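The following sketch illustrates why a loss in signal strength hardly affects the normalized differences. It assumes the common form (a − b)/(a + b) for the normalized difference of two wavebands and an idealized, uniform 20% loss across all wavebands; the intensity values and function names are hypothetical.

```python
from itertools import combinations


def remove_ambient(raw, reference):
    """Subtract the ambient-only reference intensity from each waveband intensity."""
    return [max(value - reference, 0.0) for value in raw]


def normalized_differences(signature):
    """Normalized difference (a - b) / (a + b) for every pair of wavebands."""
    result = []
    for a, b in combinations(signature, 2):
        total = a + b
        result.append((a - b) / total if total else 0.0)
    return result


# Hypothetical raw intensities for the 935, 1060, 1300 and 1550 nm wavebands.
dark = remove_ambient([120.0, 200.0, 150.0, 60.0], reference=10.0)

# Bright ambient light: same target, signal assumed to be uniformly 20% weaker,
# superimposed on a much higher ambient level.
bright_raw = [0.8 * value + 90.0 for value in dark]
bright = remove_ambient(bright_raw, reference=90.0)

nd_dark = normalized_differences(dark)
nd_bright = normalized_differences(bright)
print(max(abs(a - b) for a, b in zip(nd_dark, nd_bright)))
```

Because a common scale factor cancels in (a − b)/(a + b), the two sets of normalized differences agree up to floating point rounding. In the real sensor the loss is not perfectly uniform, which is why the differences are only almost unchanged.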

6.1.2. Operation Range

The maximum operation distance of the camera system depends on several factors. The most important one is the radiated power of the ring light: with increasing distance to a target, the acquired remission intensities (the “signal”) strongly decrease until they can no longer be distinguished from noise. In addition, as described before, the signal strength slightly decreases with increasing ambient light, while the absolute (shot) noise increases [24]. To evaluate the quality of the signal, we measured both the noise level in the reference image and the signal amplitude for a target at different distances in both dark and bright environments. Following [25], we calculated the signal to noise ratio (SNR) as the ratio of the average signal amplitude on the target to the standard deviation within the same area in the reference image. Results are presented in Table 3. In our experiments, an SNR ≥ 20 dB was enough to ensure reliable skin classification. Therefore, even in bright daylight conditions (overcast sky at noon), the system can operate at distances of at least 4 meters.
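A minimal sketch of this SNR measurement is given below. It assumes that the amplitude ratio is converted to decibels as 20·log10(mean/σ), which is the usual convention for amplitude ratios but is an assumption here; the pixel values are hypothetical.

```python
import math
import statistics


def snr_db(signal_pixels, reference_pixels):
    """SNR in dB from the mean signal amplitude and the noise in the reference image.

    signal_pixels:    ambient-corrected intensities inside the target area.
    reference_pixels: intensities of the same area in the ambient-only reference image.
    Assumption: dB conversion via 20 * log10 of the amplitude ratio.
    """
    mean_signal = statistics.fmean(signal_pixels)
    noise_sigma = statistics.pstdev(reference_pixels)
    return 20.0 * math.log10(mean_signal / noise_sigma)


# Hypothetical target area: mean amplitude 100, noise standard deviation 5.
signal = [98.0, 102.0, 100.0, 100.0]
reference = [5.0, 15.0, 5.0, 15.0]
value = snr_db(signal, reference)
print(round(value, 2), value >= 20.0)
```

With these toy numbers the amplitude ratio is 20, which corresponds to roughly 26 dB and would therefore pass the 20 dB criterion stated above.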

Besides the signal to noise ratio, the resolution and field of view of the camera system also put a limit on the operation range. For reliable face detection and recognition, current state of the art algorithms require the image of a face to have an eye-to-eye resolution of pixels [4] or ≈1 pixel/mm. For our camera, we selected a lens with a focal length of 50 mm, which results in an angle of view of and an operation distance of .
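The relation between focal length, required resolution, and operation distance can be sketched with a simple pinhole camera model. The pixel pitch (30 µm) and sensor width (320 pixels) below are placeholders chosen for illustration only; they are not the parameters of the camera described here, so the printed values are not the figures of this work.

```python
import math


def object_resolution_px_per_mm(focal_length_mm, pixel_pitch_mm, distance_mm):
    """Pixels per millimeter on the object for a pinhole model (distance >> focal length)."""
    return focal_length_mm / (pixel_pitch_mm * distance_mm)


def max_distance_mm(focal_length_mm, pixel_pitch_mm, required_px_per_mm):
    """Largest distance at which the required object resolution is still reached."""
    return focal_length_mm / (pixel_pitch_mm * required_px_per_mm)


def angle_of_view_deg(focal_length_mm, sensor_width_mm):
    """Horizontal angle of view of a lens focused at infinity."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


FOCAL_LENGTH_MM = 50.0        # from the text
PIXEL_PITCH_MM = 0.03         # hypothetical placeholder
SENSOR_WIDTH_MM = 320 * 0.03  # hypothetical placeholder

limit = max_distance_mm(FOCAL_LENGTH_MM, PIXEL_PITCH_MM, required_px_per_mm=1.0)
print(round(limit), round(angle_of_view_deg(FOCAL_LENGTH_MM, SENSOR_WIDTH_MM), 1))
```

The model shows the trade-off behind the lens choice: a longer focal length extends the distance at which ≈1 pixel/mm is reached but narrows the angle of view.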

6.1.3. Calibration Results

Figure 9 shows the effectiveness of the fixed pattern noise correction method: it presents a “false color” representation of the upper three wavebands before and after correction. The 1060 nm waveband is mapped to the red (R), the 1300 nm waveband to the green (G), and the 1550 nm waveband to the blue (B) channel.

An evaluation of the illumination intensity and homogeneity of the ring light showed some unexpected results. First, the 935 nm waveband appears much darker than the other wavebands, although the combined radiated power of all 935 nm LEDs is much higher than that of the other wavebands. A likely explanation is the characteristic of the camera’s sensor, which is less sensitive in this waveband. Second, despite coming from the same manufacturer and having similar packages, the different LED types have slightly different radiation patterns. Therefore, in practice, the light distribution is less homogeneous than the simulated distribution. However, both the absolute intensity differences and the inhomogeneity can be corrected by applying the calibration data, as shown in Figure 12.

6.1.4. Motion Compensation

The results of the motion compensation approach are shown in Figure 10, with the original image on the left and the corrected image on the right, both represented as false color images with 3 wavebands. With a GPU-accelerated implementation using CUDA, the method based on the dense optical flow algorithm by Brox et al. [22] currently requires ≈110 ms to process the 3 images on our machine (Intel Core i7 4771 CPU, NVIDIA GTX 780 graphics card, Ubuntu Linux 14.04 64 bit, GCC 5.3, CUDA 6.5). When motion compensation is applied in real-time to a stream of acquired images, it becomes the bottleneck of the entire image processing chain and limits the frame rate of the camera system to currently ≈9 FPS with 3 or ≈6.5 FPS with 4 wavebands. Without motion compensation, the performance is only limited by the camera system’s maximum frame rate of  FPS with 3 or  FPS with 4 wavebands.
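The quoted frame rates follow directly from the processing time if motion compensation is the only bottleneck and image stacks are processed one after another. The short sketch below makes this arithmetic explicit under exactly that assumption; the function names are illustrative.

```python
def max_frame_rate(processing_ms):
    """Frames per second when each image stack needs processing_ms milliseconds."""
    return 1000.0 / processing_ms


def processing_time_ms(frames_per_second):
    """Processing time per image stack implied by a sustained frame rate."""
    return 1000.0 / frames_per_second


# About 110 ms per stack of 3 wavebands gives roughly 9 FPS.
print(round(max_frame_rate(110.0), 1))
# About 6.5 FPS with 4 wavebands implies roughly 154 ms per stack.
print(round(processing_time_ms(6.5)))
```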

6.2. Study Design

In order to evaluate the robustness of our approach to skin detection and to gather training data for the classification algorithms, we designed a study to acquire images of a representative number of persons with both our camera system and an RGB camera (Canon EOS 50D), as well as spectrometer (TQ irSys 1.7) data in the spectral range of 660 nm to 1700 nm. At the time of writing, the study is still ongoing. A subset of the resulting database, excluding the images of participants who do not agree to publication, will be made available to the research community by the time this work is published.

In the following, we present data from 135 participants. Multispectral SWIR images were taken of all 135 persons (76 women, 59 men), while RGB images and spectrometer measurements have only been acquired for 120 of them (73 women, 47 men). As the study was conducted at our university, the most common skin types were 2 and 3, and most of our participants were between 20 and 29 years old, with an average age of ≈28. The respective frequency distributions are shown in Tables 4 and 5. It has to be noted that several of our participants were wearing make-up. As this will be a common situation in real-life applications, testing the influence of make-up was part of this study.

For each subject, spectrometer data was acquired at 16 measuring points on face and arms: 5 points on the face (forehead, nose, cheek frontal and sideways, and the chin), 3 at the neck (front, sideways, and back), 2 at the ear, 4 at the arm (front and back of both upper arm and forearm), and 2 at the hand (palm and back). These points have been chosen as they cover all skin regions that are typically expected in the field of view of a camera meant for face detection.

With both the RGB camera and the multispectral camera system, 7 portrait pictures were taken for each subject: three frontal shots with different facial expressions, two shots from an angle of , and two profile shots from an angle of . Subjects wearing glasses were asked to take them off for these shots. In this case, we added an additional image with glasses on for comparison.

In Figure 13, we present both RGB and (false color) multispectral SWIR portrait images of six participants of our study representing the skin types 1 to 6 after Fitzpatrick [11]. As expected, the obvious differences in skin color in the RGB images are almost negligible in the SWIR images.

6.3. Robustness of Skin Detection and Classification

In the following, we will analyze both spectrometer and camera data in detail in order to prove the validity of our approach to skin detection.

6.3.1. Spectrometer Data

For this evaluation, we used spectrometer data from only 8 of the 16 measuring points (face and neck) of 101 subjects, leaving out hands, arms, and ears, resulting in a total of 808 skin samples. We combined these samples with 336 samples of different materials (including different plastics, textiles, metal, and wood) and transformed the spectrometer data by applying a model of the ring light’s LEDs in order to simulate the expected spectral signatures of the camera system. For this purpose, each sample’s reflectance spectrum is convolved with each LED’s emission spectrum [26].
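The transformation from a spectrometer measurement to a simulated camera signature can be sketched as follows. The sketch makes several assumptions that are not taken from this work: each LED is modeled as a Gaussian with a hypothetical full width at half maximum of 50 nm, the sensor sensitivity is ignored, and the combination of both spectra is implemented as an emission-weighted average of the reflectance.

```python
import math


def gaussian_led(wavelengths, center_nm, fwhm_nm):
    """Hypothetical LED emission model: Gaussian around center_nm."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((w - center_nm) / sigma) ** 2) for w in wavelengths]


def band_response(reflectance, emission):
    """Emission-weighted average reflectance seen under one LED type."""
    weight = sum(emission)
    return sum(r * e for r, e in zip(reflectance, emission)) / weight


def simulate_signature(wavelengths, reflectance, led_centers, fwhm_nm=50.0):
    """One simulated remission value per waveband."""
    return [band_response(reflectance, gaussian_led(wavelengths, center, fwhm_nm))
            for center in led_centers]


# Spectrometer grid from 660 nm to 1700 nm in 10 nm steps.
wavelengths = list(range(660, 1701, 10))
# Toy reflectance falling linearly from 0.6 at 660 nm to 0.1 at 1700 nm.
reflectance = [0.6 - 0.5 * (w - 660) / 1040 for w in wavelengths]
signature = simulate_signature(wavelengths, reflectance, [935, 1060, 1300, 1550])
print([round(value, 3) for value in signature])
```

The resulting four values play the role of one spectral signature, from which the normalized differences between the wavebands can then be computed.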

We calculated the normalized differences between all wavebands of the spectral signatures for all samples and applied a principal component analysis (PCA) on the data set. Figure 14 presents a plot of the two main components, which already separate most of the samples. Using difference filters by specifying minimum and maximum thresholds for each normalized difference in , all skin samples can be separated perfectly from all material samples, as shown in Table 6.
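A difference filter of the kind described above can be sketched as a set of minimum and maximum bounds, one pair per normalized difference. In this illustration the bounds are derived from the skin samples with a small safety margin; the margin, the sample values, and the function names are assumptions and do not reproduce the thresholds used in this work.

```python
def fit_difference_filter(skin_samples, margin=0.02):
    """Derive (minimum, maximum) bounds per normalized difference from skin samples."""
    bounds = []
    for values in zip(*skin_samples):
        bounds.append((min(values) - margin, max(values) + margin))
    return bounds


def is_skin(sample, bounds):
    """A sample passes the filter only if every normalized difference is within bounds."""
    return all(lower <= value <= upper
               for value, (lower, upper) in zip(sample, bounds))


# Hypothetical normalized differences (three per sample for brevity).
skin_samples = [(0.30, -0.10, 0.55), (0.34, -0.06, 0.60), (0.28, -0.12, 0.52)]
bounds = fit_difference_filter(skin_samples)

print(is_skin((0.31, -0.09, 0.57), bounds))  # inside all bounds
print(is_skin((0.05, 0.02, 0.10), bounds))   # outside the bounds
```

Such a filter is very cheap to evaluate per pixel, but it can only describe an axis-aligned box in the space of normalized differences.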

6.3.2. Camera Data

To analyze the data acquired with the camera system, we extracted the spectral signatures of skin and a variety of other materials from the images taken during the study. Pixels showing skin are stored as positive examples and “no-skin” pixels as negative examples. As with the spectrometer data, we applied a PCA to this data set. The two main components are illustrated in Figure 15 and perfectly separate the two classes. However, the difference filter classifier cannot separate all skin samples from all material samples, as shown in Table 7: some material samples belonging to “CP-Flesh,” a silicone mixture specifically designed to imitate human skin, show up as false positives. Therefore, we used LibSVM to train an SVM classifier on the data set. To evaluate the SVM’s performance, we applied a tenfold cross validation, with each fold randomly choosing 90% of the samples for training and 10% for testing. The results of the SVM are shown in Table 8: skin and material can be separated perfectly.
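The evaluation protocol described above (ten rounds, each with a random 90%/10% split) can be sketched independently of the classifier. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in; it is not an SVM and not the LibSVM setup used in this work, and the sample data are hypothetical. Note that a strict k-fold cross validation would instead partition the data into ten disjoint folds.

```python
import random


def fit_centroids(samples, labels):
    """Stand-in classifier (NOT an SVM): mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Label of the closest class centroid (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


def random_split_validation(samples, labels, rounds=10, test_fraction=0.1, seed=0):
    """Mean accuracy over `rounds` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, int(round(test_fraction * len(samples))))
    accuracies = []
    for _ in range(rounds):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit_centroids([samples[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return sum(accuracies) / rounds


# Hypothetical, well separated normalized differences for both classes.
skin = [(0.30 + 0.01 * k, -0.20) for k in range(10)]
material = [(-0.30 - 0.01 * k, 0.25) for k in range(10)]
accuracy = random_split_validation(skin + material,
                                   ["skin"] * 10 + ["no-skin"] * 10)
print(accuracy)
```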

By analyzing the data and reviewing the acquired images, we did not find a significant influence of make-up on the skin classification results. To probe the limits, we asked one subject to apply very large amounts of make-up and powder and acquired additional images. We found that only very thick layers of powder, which are clearly visible in both the RGB and the SWIR images, could influence the spectral signatures enough to lead to false negative results. Our approach to skin detection therefore proves to be robust against different skin types, typical make-up, and varying measurement conditions.

6.4. Evaluation of Face Verification

To analyze the face verification performance of the presented camera system, we first evaluated the usability and quality of the acquired images for face detection. Then, we tested the skin classification performance of our approach on different fakes and compared the results to the acceptance rate of state of the art face recognition software.

6.4.1. Usability of SWIR Images

To evaluate the usability of the SWIR images, we trained both the proprietary state of the art FaceVACS and the OpenCV implementation of the LBPH face recognizer with the RGB face images acquired in the context of our study. Then we presented the face images acquired with the multispectral camera system to the algorithms and tried to identify and verify the faces using only the 1060 nm waveband image.

FaceVACS identified all faces correctly. Furthermore, it verified of all faces with a probability score of and with . Only of all faces were verified with a probability score of , with being the minimum. These rare examples of low probability have been investigated in detail and might be caused by strong highlights in the eyes (reflections from the ring light) or differing head poses. However, the acceptance threshold of was met by all test images.

In contrast, the LBPH face recognizer performed surprisingly poorly: it identified only of all 1060 nm face images correctly and calculated very low confidence values for those that it actually verified. We compared this result to its performance when trained on additional SWIR images (which were not part of the test samples) and got a much better result of with much better confidence values for the verified test images. We conclude that the classifier used by this face recognizer relies on features that are not invariant to absolute greyscale values and excluded this algorithm from the further evaluation.

6.4.2. Fake Detection Performance

In the context of a previous research project together with the German Federal Office for Information Security (Bundesamt für Sicherheit in der Informationstechnik, BSI), several photo-fakes and masks, which mimic the face of one of our test subjects, were manufactured in order to test the vulnerability of face recognition systems and to develop respective countermeasures. Different materials have been used for these masks, including special silicone mixtures, plastics, hard resin, textiles, and paper. Make-up and paint have been applied to the masks to make them more realistic. With the genuine face of the subject enrolled in FaceVACS, all fakes and masks except the paper mask achieved a probability of more than 70% in verification when pictured using an RGB camera and were accepted. In particular, photo-fakes and prints on t-shirts achieved very high scores in FaceVACS because it lacks liveness detection.

Using images acquired with our camera system, most of the fakes achieved much lower scores in FaceVACS even without skin classification, because the colorants used are less visible in the SWIR range. This applies to most of the photo-fakes and prints, as well as plastics and hard resin masks: verification scores drop from down to less than . Figure 16 shows RGB and SWIR (false color) images of a subject wearing a mask created using a 3D printer, which is much easier to detect in the SWIR image. Transparent silicone masks, however, are still a significant problem.

Adding spectrometer measurements of all fakes to our database and training a new difference filter classifier showed that none of the skin-like fakes, such as the silicone masks, could be separated from skin easily. The same holds true for the camera data: we added images of the fakes to our data set and applied the difference filter classifier to it. The results are shown in Table 9: with this data set, more than of the material samples are classified as skin, namely, all of the silicone masks. Fortunately, an SVM classifier produces a much better result and achieves a precision of in a tenfold cross validation, as shown in Table 10: 87 () of the skin samples are rejected, but only 40 () of the material samples are classified as skin. Because each sample is a single pixel of an image, these few misclassifications have little influence on the verification of a whole face.
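For reference, the quantities discussed above relate to a confusion matrix as sketched below. The counts of 87 rejected skin samples and 40 accepted material samples are taken from the text; the class totals of 10000 samples each are hypothetical placeholders, so the printed rates are illustrative only and are not the results of this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Basic rates from a binary confusion matrix with "skin" as the positive class."""
    return {
        "precision": tp / (tp + fp),            # accepted samples that are really skin
        "recall": tp / (tp + fn),               # skin samples that are accepted
        "false_positive_rate": fp / (fp + tn),  # material samples accepted as skin
    }


FALSE_NEGATIVES = 87      # skin samples rejected (from the text)
FALSE_POSITIVES = 40      # material samples classified as skin (from the text)
TOTAL_SKIN = 10000        # hypothetical placeholder
TOTAL_MATERIAL = 10000    # hypothetical placeholder

metrics = classification_metrics(tp=TOTAL_SKIN - FALSE_NEGATIVES,
                                 fp=FALSE_POSITIVES,
                                 fn=FALSE_NEGATIVES,
                                 tn=TOTAL_MATERIAL - FALSE_POSITIVES)
for name, value in metrics.items():
    print(name, round(value, 4))
```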

Finally, we tested the classifiers with a new data set. We took images of two subjects with and without silicone masks and applied the difference filter and the SVM classifier successively to the images. The results of the difference filter are shown in the upper half of Figure 17: the classifier detects all skin pixels correctly but also classifies most of the fake pixels as skin.

A set of both true and false positive samples from the results of the difference filter classifier was annotated with correct classes and used as test set for the SVM classifier. The results are almost perfect, as shown in Table 11: only 16 samples (=pixels) of the fake material are still classified as skin, while no true skin pixels were rejected. These results also hold true in practice, as shown in the lower half of Figure 17: only pixels showing uncovered skin are left in the image, while the mask pixels are rejected. Thus, the detected faces without masks are verified as authentic with a very high ratio of skin to no-skin pixels within the facial area, while the faces with masks are reliably rejected as fakes.

7. Conclusions

We proposed an active multispectral SWIR camera system for real-time face detection and verification. The system acquires four-band multispectral image stacks within an acquisition time of 50 ms. The extraction of spectral signatures from the acquired images allows for reliable skin detection independent of skin type. Our approach requires only one SWIR camera and uses active small band illumination based on pulsed LEDs, making it widely independent of ambient light. Motion artifacts at moving objects due to sequential acquisition of waveband images are effectively removed by using optical flow based motion compensation techniques. The system can be used for a variety of application scenarios without the need for regular calibration.

For the application of face detection, recognition, and verification, the active frontal SWIR illumination ensures robust face detection and extraction of facial features. Based on the acquired multispectral images, the proposed analysis methods allow spoofing attacks using fakes or facial disguises such as silicone masks, which are still a major problem for state of the art face recognition systems, to be detected with significantly improved reliability.

In addition to the camera system, we presented a database of face images from several subjects in different poses and perspectives, acquired with both our camera system and an RGB camera, supplemented by spectrometer data in the wavelength range between 660 nm and 1700 nm. This database will be made available to the research community on our website by the time this work is published.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was funded by the German Federal Ministry of Education and Research as part of the program “FHprofUnt” (FKZ: 03FH044PX3) and supported by the German Research Foundation (DFG) as part of the research training group GRK 1564 “Imaging New Modalities.” The authors also thankfully acknowledge the support of the German Federal Office for Information Security (BSI).