Abstract

This paper presents a new anthropometrics-based method for generating realistic, controllable face models. Our method establishes an intuitive and efficient interface that facilitates interactive 3D face modeling and editing. It takes 3D face scans as examples in order to exploit the variations present in the real faces of individuals. The system automatically learns a model prior from datasets of example meshes of facial features using principal component analysis (PCA) and uses it to regulate the naturalness of synthesized faces. For each facial feature, we compute a set of anthropometric measurements to parameterize the example meshes into a measurement space. Using PCA coefficients as a compact shape representation, we formulate the face modeling problem in a scattered data interpolation framework which takes the user-specified anthropometric parameters as input. Solving the interpolation problem in a reduced subspace allows us to generate a natural face shape that satisfies the user-specified constraints. At runtime, a new face shape can be generated at an interactive rate. We demonstrate the utility of our method with several applications, including analysis of facial features of subjects in different race groups, facial feature transfer, and adapting face models to a particular population group.

1. Introduction

One of the most challenging tasks in graphics modeling is to build an interactive system that allows users to model varied, realistic geometric models of human faces quickly and easily. Applications of such a system range from entertainment to communications: virtual human faces need to be generated for movies, computer games, advertisements, or other virtual environments, and facial avatars are needed for video teleconferencing and other instant communication programs. Some authoring tools for character modeling and animation are available (e.g., Maya [1], Poser [2], DazStudio [3], PeoplePutty [4]). In these systems, deformation settings are specified manually over the range of possible deformations for hundreds of vertices in order to achieve desired results. An infinite number of deformations exist for a given face mesh, resulting in shapes that range from realistic facial geometries to implausible appearances. Consequently, interactive modeling is often a tedious and complex process requiring substantial technical as well as artistic skill. This problem is compounded by the fact that the slightest deviation from real facial appearance is immediately perceived as wrong by even the most casual viewer. While the existing systems have exquisite control rigs to provide detailed control, these controls are based on general modeling techniques such as point morphing or free-form deformations, and therefore lack intuition and accessibility for novices. Users often face a considerable learning curve to understand and use such control rigs.

To address the lack of intuition in current modeling systems, we aim to leverage anthropometric measurements as control rigs for 3D face modeling. Traditionally, anthropometry, the study of human body measurement, characterizes the human face using linear distance measures between anatomical landmarks or circumferences at predefined locations [5]. The anthropometric parameters provide a familiar interface while still giving users a high level of control. While these measurements are a compact description, they do not uniquely specify the shape of the human face. Furthermore, particularly for computer face modeling, the sparse anthropometric measurements taken at a small number of landmarks on the face do not capture the detailed shape variations needed for realism. The desire is to map such sparse data into a fully reconstructed 3D surface model. Our goal is a system that uses model priors learned from prerecorded facial shape data to create natural facial shapes that match anthropometric constraints specified by the user. The system can be used to generate a complete surface mesh given only a succinct specification of the desired shape, and it can be used by expert and novice alike to create synthetic 3D faces for myriad uses.

1.1. Background and Previous Work

A large body of literature on modeling and animating faces has been published in the last three decades. A good overview can be found in the textbook [6] and in the survey [7]. In this work, we focus on modeling static face geometry. In this context, several approaches have been proposed. They can be roughly classified into the creative approach and the reconstructive approach.

The creative approach is to facilitate manual specification of the new face model by a user. Parametric face models [8–11] and many commercial modelers fall into this approach. The desire is to create an encapsulated model that can generate a wide range of faces based on a small set of input parameters. Such models provide full control over the result, including the ability to produce cartoon effects, and offer highly efficient geometric manipulation. However, manually tuning parameters to generate realistic faces, without geometric constraints from real human faces, is difficult and time-consuming. Moreover, the choice of the parameter set depends on the face mesh topology, and therefore a group of vertices must be manually associated with each parameter.

The reconstructive approach is to extract face geometry from measurements of a living subject. In this category, image-based techniques [12–18] utilize an existing 3D face model and information from a few pictures (or video streams) to reconstruct the face geometry. Although these techniques can produce reconstructed face models easily, their drawbacks are inaccurate geometry reconstruction and the inability to generate new faces that have no image counterparts. Another limiting factor is that they give very little control to the user.

With a significant increase in the quality and availability of 3D capture methods, a common approach towards creating face models uses laser range scanners to acquire both the face geometry and texture simultaneously [19–22]. Although the acquired face data is highly accurate, substantial effort is needed to process the noisy and incomplete data into a model suitable for modeling or animation. In addition, the result of this effort is a model corresponding to a single individual, and each new face must be found on a real subject; the desired face may not even physically exist. Furthermore, the user does not have any control over the captured model to edit it in a way that produces a novel face.

Besides these approaches, DeCarlo et al. [23] construct a range of face models with realistic proportions using a variationally constrained optimization technique. However, without the use of the model priors, their method cannot generate natural models unless the user accurately specifies a very detailed set of constraints. Also, this approach requires minutes of computation for the optimization process to generate a face model. Blanz and Vetter [24] present a process for estimating the shape of a face from a single photograph. This is extended by Blanz et al. [25], who present a set of controls for intuitive manipulation of facial attributes. In contrast to our work, they manually assign attribute values to characterize the face shape, and devise attribute controls using linear regression. Vlasic et al. [26] use multilinear face models to study and synthesize variations in faces along several axes, such as identity and expression. An interface for gradient-based face space navigation has been proposed in [27]. Principal components that are not intuitive to users are used as navigation axes in face space, and facial features cannot be controlled individually. The authors focus on a comparison of different user interfaces.

Several commercial systems for generating composite facial images are available [28–30]. Although they are effective, a 2D face composite still lacks some of the advantages of a 3D model, such as complete freedom of viewpoint and the ability to be combined with other 3D graphics. Additionally, to our knowledge, no commercial 2D composite system available today supports automatic completion of unspecified facial regions according to statistical properties. FaceGen 3 [31] is the only existing system that we have found to be similar to ours in functionality. However, little information is available about how this functionality is achieved. As far as we know, it is built on [24] and the face mesh is not divided into independent regions for localized deformation. In consequence, editing operations on individual facial features tend to affect the whole face.

1.2. Our Approach

In this paper, we present a new method for interactively generating facial models from user-specified anthropometric parameters while matching the statistical properties of a database of scanned models. Figure 1 shows a block diagram of the system architecture. We use a three-step model fitting approach for the 3D registration problem. By bringing scanned models into full correspondence with each other, the shape variation is represented by using principal component analysis (PCA), which induces a low-dimensional subspace of facial feature shapes. We explore the space of probable facial feature shapes using high-level control parameters. We parameterize the example models using the face anthropometric measurements, and predefine the interpolation functions for the parameterized example models. At runtime, the interpolation functions are evaluated to efficiently generate the appropriate feature shapes by taking the anthropometric parameters as input. Apart from an initial tuning of feature point positions, our method works fully automatically. We evaluate the performance of our method with cross-validation tests. We also compare our method against optimization in the PCA subspace for generating facial feature shapes from constraints of the ground truth data.

In addition, the anthropometrics-based face synthesis method, combined with our database of statistics for a large number of subjects, opens up a variety of applications. Chief among these is analysis of facial features of different races. Second, the user can transfer facial feature(s) from one individual to another. This allows a plausible new face to be quickly generated by composing different features from multiple faces in the database. Third, the user can adapt the face model to a particular population group by synthesizing characteristic facial features from extracted statistics. Finally, our method allows for compression of data, enabling us to share statistics with the research community for further study of faces.

Unlike a previous approach [23], we utilize prior knowledge of the face shape in relation to the given measurements to regulate the naturalness of modeled faces. Moreover, our method efficiently generates a new face with the desired shape within a second. Our method also differs significantly from the approach presented in [24, 25] in several respects. First, they manually assign attribute values to the face shape and devise each attribute control individually using linear regression. We automatically compute the anthropometric measurements of the face shape and relate several attribute controls simultaneously by learning a mapping between the anthropometric measurement space and the feature shape space through scattered data interpolation. Second, they use a 3D variant of a gradient-based optical flow algorithm to derive the point-to-point correspondence between scanned models. This approach does not work well for faces of different races or under different illumination, given the inherent problem of using static textures. We present a robust method of determining correspondences that does not depend on texture information. Third, their method tends to control the global face and requires additional constraints to restrict the effect of editing operations to a local region. In contrast, our method guarantees local control thanks to its feature-based nature.

The main contributions of our work are as follows.

(i) A general, controllable, and practical system for face modeling and editing. Our method estimates high-level control models in order to infer a particular face from intuitive input controls. As correlations between control parameters and the face shape are estimated by exploiting the real faces of individuals, our method regulates the naturalness of synthesized faces. Unspecified parts of the synthesized facial features are automatically completed according to statistical properties.
(ii) A new algorithm which uses intuitive attribute parameters of facial features to navigate face space. Our system provides sets of comprehensive anthropometric parameters to easily control face shape characteristics, taking into account the physical structure of real faces.
(iii) A robust, automatic model fitting approach for establishing correspondences between scanned models.
(iv) An automatic runtime synthesis that is efficient in time complexity and runs fast.

The remainder of this paper is organized as follows: Section 2 presents the face data we use. Section 3 elaborates on the model fitting technique. Section 4 describes the construction of local shape spaces. The face anthropometric parameters used in our work are illustrated in Section 5. Section 6 and Section 7 describe our techniques of feature-based shape synthesis and subregion blending, respectively. After presenting and explaining the results in Section 8, we present a variety of applications of our approach in Section 9. Section 10 gives concluding remarks and our future work.

2. Scanned Data and Preprocessing

We use the USF face database [32], which consists of Cyberware face scans of 186 subjects with a mixture of gender, race, and age. The age of the subjects ranges from 17 to 68 years, and there are 126 male and 60 female subjects. Most of the subjects are Caucasians (129), with African-Americans making up the second largest group (37) and Asians the smallest group (20). All faces are without makeup and accessories. The laser scans provide face structure data containing approximately 180K surface points and a 360×524 reflectance (RGB) image for texture mapping (see Figures 2(a) and 2(b)). We also use a generic head model which consists of 1,092 vertices and 2,274 triangles. Prescribed colors are added to each triangle to form a smooth-shaded surface (see Figure 2(c)).

Let each 3D face scan in the database be $S_i$ ($i = 1, \dots, M$). Since the number of vertices in $S_i$ varies, we resample all faces in the database so that they have the same number of vertices, all in mutual correspondence. Feature points are identified semi-automatically to guide the resampling. Figure 3 depicts the process. As illustrated in Figure 3(a), a 2D feature mask consisting of polylines groups a set of 86 feature points that correspond to the feature point sets of MPEG-4 Facial Definition Parameters (FDPs) [33]. The feature mask is superimposed onto the front-view face image obtained by orthographic projection of a textured 3D face scan into an image plane. The facial features in this image are identified using Active Shape Models (ASMs) [34], and the feature mask is fitted to the features automatically. The 2D feature mask can also be manipulated interactively: a small amount of user interaction is needed to tune the feature point positions due to the slight inaccuracy of the automatic facial feature detection, but this is restricted to slight corrections of wayward feature points. The 3D positions of the feature points on the scanned surface are then recovered by back-projection into 3D space. In this way, we efficiently define a set of feature points on a scanned model $S_i$ as $U_i = \{\mathbf{u}_{i,1}, \dots, \mathbf{u}_{i,n}\}$, where $n = 86$. Our generic model $G$ is already tagged with the corresponding set of feature points $V = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ by default.
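To make the back-projection step concrete, the sketch below (in Python/NumPy for brevity; the paper's system is implemented in C++/OpenGL) recovers a 3D feature position for each fitted 2D mask point by selecting the front-most scan point under the orthographic ray. The function name and the assumption that the image plane shares the scan's x/y coordinates are ours, not from the paper.

```python
import numpy as np

def back_project_features(scan_vertices, feature_points_2d):
    """Recover 3D feature positions from 2D feature-mask points (hypothetical helper).

    scan_vertices     : (V, 3) array of scan surface points (x, y, z)
    feature_points_2d : (n, 2) feature positions in the front-view orthographic
                        image plane, assumed to share the scan's x/y coordinates
    Returns an (n, 3) array of 3D feature points on the scanned surface.
    """
    xy, z = scan_vertices[:, :2], scan_vertices[:, 2]
    features_3d = np.empty((len(feature_points_2d), 3))
    for i, p in enumerate(feature_points_2d):
        d2 = np.sum((xy - p) ** 2, axis=1)         # distance in the image plane
        near = np.where(d2 <= d2.min() + 1e-6)[0]  # candidates hit by the ray
        features_3d[i] = scan_vertices[near[np.argmax(z[near])]]  # front-most point
    return features_3d
```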

3. Model Fitting

3.1. Global Warping

The problem of deriving full correspondence for all models $S_i$ can be stated as follows: resample the surface of each $S_i$ using $G$ under the constraint that $\mathbf{v}_j$ is mapped to $\mathbf{u}_{i,j}$. The displacement vector $\mathbf{d}_{i,j} = \mathbf{u}_{i,j} - \mathbf{v}_j$ is known for each feature point $\mathbf{v}_j$ on the generic model and $\mathbf{u}_{i,j}$ on the scanned surface. These displacements are utilized to construct the interpolating function that returns the displacement for each generic mesh vertex:

$$\mathbf{f}(\mathbf{x}) = \sum_{j=1}^{n} \mathbf{w}_j \, \phi_j\bigl(\lVert \mathbf{x} - \mathbf{v}_j \rVert\bigr) + \mathbf{M}\mathbf{x} + \mathbf{t}, \qquad (1)$$

where $\mathbf{x} \in \mathbb{R}^3$ is a vertex on the generic model, $\lVert \cdot \rVert$ denotes the Euclidean norm, and $\phi_j$ is a radial basis function centered at $\mathbf{v}_j$. $\mathbf{w}_j$, $\mathbf{M}$, and $\mathbf{t}$ are the unknown parameters. Among them, $\mathbf{w}_j \in \mathbb{R}^3$ are the interpolation weights, $\mathbf{M} \in \mathbb{R}^{3\times3}$ represents rotation and scaling transformations, and $\mathbf{t} \in \mathbb{R}^3$ represents the translation.

Different functions for $\phi(r)$ are available [35]. We had better results with the multi-quadric function $\phi_j(r) = \sqrt{r^2 + \rho_j^2}$, where $\rho_j$ is the locality parameter used to control how the basis function is influenced by neighboring feature points; $\rho_j$ is determined as the Euclidean distance from $\mathbf{v}_j$ to the nearest other feature point. To determine the weights $\mathbf{w}_j$ and the affine transformation parameters $\mathbf{M}$ and $\mathbf{t}$, we solve the following equations:

$$\mathbf{d}_{i,j} = \mathbf{f}(\mathbf{v}_j), \quad j = 1, \dots, n, \qquad \sum_{j=1}^{n} \mathbf{w}_j = \mathbf{0}, \qquad \sum_{j=1}^{n} \mathbf{w}_j^{T} \mathbf{v}_j = 0. \qquad (2)$$

This system of linear equations is solved using LU decomposition to obtain the unknown parameters. Using the interpolation function defined in (1), we calculate the displacement vectors of all vertices to deform the generic model.
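The following is a minimal sketch of the global warping step, assuming the multi-quadric basis and side conditions reconstructed above. It is written in Python/NumPy for brevity (the paper's implementation is C++/OpenGL), and the function names are ours.

```python
import numpy as np

def fit_rbf_warp(v, u):
    """Fit f(x) = sum_j w_j * phi_j(||x - v_j||) + M x + t  (Eqs. (1)-(2)).

    v : (n, 3) feature points v_j on the generic model
    u : (n, 3) corresponding feature points u_{i,j} on the scan
    Returns (w, M, t, rho) with w (n, 3), M (3, 3), t (3,), rho (n,).
    """
    n = len(v)
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    rho = np.sort(dist, axis=1)[:, 1]               # distance to nearest other point
    Phi = np.sqrt(dist ** 2 + rho[None, :] ** 2)    # Phi[a, j] = phi_j(||v_a - v_j||)
    # assemble the (n+4) x (n+4) system: interpolation rows plus side conditions
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = Phi
    A[:n, n:n + 3] = v
    A[:n, n + 3] = 1.0
    A[n:n + 3, :n] = v.T                            # sum_j w_j^T v_j = 0
    A[n + 3, :n] = 1.0                              # sum_j w_j = 0
    b = np.zeros((n + 4, 3))
    b[:n] = u - v                                   # known displacements d_{i,j}
    sol = np.linalg.solve(A, b)                     # dense LU-based solve
    return sol[:n], sol[n:n + 3].T, sol[n + 3], rho

def apply_rbf_warp(x, v, w, M, t, rho):
    """Displace generic-mesh vertices x (m, 3) by the fitted warp f."""
    dist = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2)
    Phi = np.sqrt(dist ** 2 + rho[None, :] ** 2)
    return x + Phi @ w + x @ M.T + t
```

A typical call would fit the warp from the feature correspondences of one scan and then displace all generic-mesh vertices, e.g., `w, M, t, rho = fit_rbf_warp(V, U_i)` followed by `warped = apply_rbf_warp(generic_vertices, V, w, M, t, rho)`.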

3.2. Local Deformation

The warping with a small set of correspondences does not produce a perfect surface match. We further improve the shape using a local deformation which fits the globally warped generic mesh $G$ to the scanned model $S_i$ by iteratively minimizing the distance from the vertices of $G$ to the surface of $S_i$. To optimize the positions of the vertices of $G$, the local deformation process minimizes an energy function:

$$E(G) = E_{\mathrm{ext}}(G, S_i) + E_{\mathrm{int}}(G), \qquad (3)$$

where $E_{\mathrm{ext}}$ stands for the external energy and $E_{\mathrm{int}}$ for the internal energy.

The external energy term $E_{\mathrm{ext}}$ attracts the vertices of $G$ to their closest compatible points on $S_i$. It is defined as

$$E_{\mathrm{ext}}(G, S_i) = \sum_{j=1}^{N_G} \zeta_j \lVert \mathbf{x}_j - \mathbf{s}_j \rVert^2, \qquad (4)$$

where $N_G$ is the number of vertices on the generic mesh, $\mathbf{x}_j$ is the $j$th mesh vertex, and $\mathbf{s}_j$ is the closest compatible point of $\mathbf{x}_j$ on $S_i$. The weights $\zeta_j$ measure the compatibility of the points on $G$ and $S_i$. As $G$ closely approximates $S_i$ after the global warping procedure, we consider a vertex on $G$ and a point on $S_i$ to be highly compatible if the surface normals at the two points have similar directions. Hence, we define $\zeta_j$ as

$$\zeta_j = \begin{cases} \mathbf{n}(\mathbf{x}_j) \cdot \mathbf{n}(\mathbf{s}_j) & \text{if } \mathbf{n}(\mathbf{x}_j) \cdot \mathbf{n}(\mathbf{s}_j) > 0, \\ 0 & \text{otherwise}, \end{cases} \qquad (5)$$

where $\mathbf{n}(\mathbf{x}_j)$ and $\mathbf{n}(\mathbf{s}_j)$ are the surface normals at $\mathbf{x}_j$ and $\mathbf{s}_j$, respectively. In this way, dissimilar local surface patches are less likely to be matched; for example, front-facing surfaces will not be matched to back-facing surfaces. To accelerate the minimum-distance calculation, we precompute a hierarchical bounding box structure for $S_i$ so that the closest triangles are checked first.

The transformations applied to the vertices within a region of the surface may differ from each other considerably, resulting in a non-smoothly deformed surface. To enforce local smoothness of the mesh, the internal energy term $E_{\mathrm{int}}$ is introduced as follows:

$$E_{\mathrm{int}}(G) = \sum_{j=1}^{N_G} \sum_{k \in \Omega_j} \bigl\lVert (\mathbf{x}_j - \mathbf{x}_k) - (\tilde{\mathbf{x}}_j - \tilde{\mathbf{x}}_k) \bigr\rVert^2, \qquad (6)$$

where $\Omega_j$ is the set of all neighboring vertices $\mathbf{x}_k$ that are linked by edges to $\mathbf{x}_j$, and $\tilde{\mathbf{x}}_j$ and $\tilde{\mathbf{x}}_k$ are the original positions of $\mathbf{x}_j$ and $\mathbf{x}_k$ before local deformation. Including this energy term constrains the deformation of the generic mesh and keeps the optimization from converging to a solution far from the initial configuration.

Minimizing $E(G)$ is a nonlinear least-squares problem, and the optimization is performed using L-BFGS-B, a quasi-Newton solver [36]. The optimization stops when the difference between $E$ at the previous and current iterations drops below a user-specified threshold. After the local deformation, each mesh vertex takes the texture coordinates associated with its closest scanned data point for texture mapping. Finally, we reconstruct surface details in a hierarchical manner by taking advantage of the quaternary subdivision scheme and normal mesh representation [37]. A spatial correspondence across scans is thus established by the generated normal meshes. Figure 4 shows the results of model fitting.
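As an illustration of how the local deformation can be organized, the sketch below (Python with SciPy's L-BFGS-B; the paper's system is C++) evaluates $E = E_{\mathrm{ext}} + E_{\mathrm{int}}$ from Equations (3)-(6) and runs one inner solve with the closest compatible points held fixed; a full implementation would recompute the correspondences (and supply analytic gradients) in an outer loop. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def local_deformation_step(X0, X_tilde, edges, S_pts, S_normals, mesh_normals):
    """One inner solve of E(G) = E_ext + E_int (Eqs. (3)-(6)) with L-BFGS-B,
    holding the closest compatible points s_j fixed for this iteration.

    X0           : (N, 3) current generic-mesh vertex positions x_j
    X_tilde      : (N, 3) positions before local deformation (the x~_j)
    edges        : list of (j, k) index pairs of generic-mesh edges
    S_pts        : (N, 3) closest points s_j on the scan for each x_j
    S_normals    : (N, 3) scan normals n(s_j)
    mesh_normals : (N, 3) generic-mesh normals n(x_j)
    """
    # compatibility weights zeta_j (Eq. (5)): dot product of normals if positive, else 0
    zeta = np.maximum(np.einsum('ij,ij->i', mesh_normals, S_normals), 0.0)
    e_j, e_k = np.asarray(edges).T
    rest = X_tilde[e_j] - X_tilde[e_k]              # original edge vectors

    def energy(x_flat):
        X = x_flat.reshape(-1, 3)
        e_ext = np.sum(zeta * np.sum((X - S_pts) ** 2, axis=1))   # Eq. (4)
        e_int = np.sum(((X[e_j] - X[e_k]) - rest) ** 2)           # Eq. (6)
        return e_ext + e_int                                      # Eq. (3)

    res = minimize(energy, X0.ravel(), method='L-BFGS-B',
                   options={'ftol': 1e-8, 'maxiter': 200})
    return res.x.reshape(-1, 3)
```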

4. Forming Feature Shape Spaces

We perceive the face as a set of features. In this work, the global face shape is also regarded as a feature. Since all face scans are in correspondence through mapping onto the generic model, it is sufficient to define the feature regions on the generic model. We manually partition the generic model into four regions: eyes, nose, mouth and chin. This segmentation is transferred to all normal meshes to generate individualized feature shapes with correspondences (see Figure 5). In order to isolate the shape variation from the position variation, we normalize all scanned models with respect to the rotation and translation of the face before the model fitting process.

We form a shape space for each facial feature using PCA. Given the set $\Gamma = \{F\}$ of features, let $\{F_i\}_{i=1,\dots,M}$ be a set of example meshes of a feature $F$, each mesh being associated with one of the $M$ scanned models in the database. These meshes are represented as vectors that contain the $x$, $y$, $z$ coordinates of the $N$ vertices, $F_i = (x_{i1}, y_{i1}, z_{i1}, \dots, x_{iN}, y_{iN}, z_{iN}) \in \mathbb{R}^{3N}$. The average over the $M$ example meshes is $\psi_0 = (1/M)\sum_{i=1}^{M} F_i$. Each example mesh differs from the average by the vector $dF_i = F_i - \psi_0$. We arrange the deviation vectors into a matrix $\mathbf{C} = [dF_1, dF_2, \dots, dF_M] \in \mathbb{R}^{3N \times M}$. PCA of the matrix $\mathbf{C}$ yields a set of $M$ uncorrelated eigenvectors $\psi_i$ and their corresponding eigenvalues $\lambda_i$. The eigenvectors are sorted in decreasing order of their eigenvalues. Every example model can be regenerated using

$$F_i(\boldsymbol{\alpha}) = \psi_0 + \sum_{j=1}^{K} \alpha_{ij} \psi_j, \qquad (7)$$

where $0 < K < M$ and $\alpha_{ij} = (F_i - \psi_0) \cdot \psi_j$ are the coordinates of the example mesh in terms of the reduced eigenvector basis. We choose $K$ such that $\sum_{i=1}^{K} \lambda_i \geq \tau \sum_{i=1}^{M} \lambda_i$, where $\tau$ defines the proportion of the total shape variation to retain (98% in our experiments). In this model each eigenvector is a coordinate axis; we call these axes eigenmeshes.
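A compact sketch of the shape-space construction follows, using the thin SVD of the deviation matrix $\mathbf{C}$ as one common way to obtain the eigenmeshes (the paper does not prescribe a particular PCA routine); Python/NumPy, with hypothetical function names.

```python
import numpy as np

def build_feature_shape_space(F, tau=0.98):
    """PCA shape space for one facial feature (Section 4, Eq. (7)).

    F   : (M, 3N) array; row i is example mesh F_i flattened as (x1, y1, z1, ...)
    tau : proportion of total shape variation to retain (98% in the paper)
    Returns (psi0, Psi, alpha): mean shape, (3N, K) eigenmesh basis, (M, K) coordinates.
    """
    psi0 = F.mean(axis=0)
    C = (F - psi0).T                                  # deviation matrix, (3N, M)
    U, sigma, _ = np.linalg.svd(C, full_matrices=False)
    lam = sigma ** 2                                  # eigenvalues lambda_i (sorted)
    K = int(np.searchsorted(np.cumsum(lam) / lam.sum(), tau)) + 1
    Psi = U[:, :K]                                    # eigenmeshes psi_1 .. psi_K
    alpha = (F - psi0) @ Psi                          # alpha_ij = (F_i - psi0) . psi_j
    return psi0, Psi, alpha

def reconstruct_feature(psi0, Psi, a):
    """Eq. (7): F(a) = psi0 + sum_j a_j psi_j."""
    return psi0 + Psi @ a
```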

5. Anthropometric Parameters

Face anthropometry provides a set of meaningful measurements or shape parameters that allow the most complete control over the shape of the face. Farkas [5] describes a widely used set of measurements to characterize the human face. The measurements are taken between landmark points defined in terms of visually identifiable or palpable features on the subject's face, using carefully specified procedures and measuring instruments. These measurements use a total of 47 landmark points to describe the face. As described in Section 2, each example in our face scan database is equipped with 86 landmarks. Following the conventions laid out in [5], we have chosen a subset of 38 landmarks for anthropometric measurements (see Figure 6).

Farkas [5] describes a total of 132 measurements on the face and head. Instead of supporting all 132 measurements, we are only concerned with those related to five facial features (including global face outline). In this paper, 68 anthropometric measurements are chosen as shape control parameters. As an example, Table 1 lists the nasal measurements used in our work. The example models are placed in the standard posture for anthropometric measurements. In particular, the axial distances correspond to the 𝑥, 𝑦, and 𝑧 axes of the world coordinate system. Such a systematic collection of anthropometric measurements is taken through all example models in the database to determine their locations in a multi-dimensional measurement space.

6. Feature Shape Synthesis

From the previous stage we obtain a set of examples of each facial feature with measured shape characteristics, each example consisting of the same set of dimensions, where every dimension is an anthropometric measurement. The example measurements are normalized. Generally, we assume that an example model $F_i$ of feature $F$ has $m$ dimensions, where each dimension is represented by a value in the interval $(0,1]$. A value of 1 corresponds to the maximum measurement value of the dimension. The measurements of $F_i$ can then be represented by the vector

$$\mathbf{q}_i = (q_{i1}, \dots, q_{im}), \qquad q_{ij} \in (0, 1] \;\; \forall j \in [1, m]. \qquad (8)$$

This is equivalent to projecting each example model $F_i$ into a measurement space spanned by the $m$ selected anthropometric measurements. The location of $F_i$ in this space is $\mathbf{q}_i$.
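A small sketch of the normalization implied by Equation (8): each measurement dimension is divided by its maximum over the examples, and the same maxima are kept to scale user-specified inputs consistently (our assumption about how runtime inputs are normalized).

```python
import numpy as np

def normalize_measurements(raw):
    """Map raw anthropometric measurements into (0, 1] per dimension (Eq. (8)).

    raw : (M, m) array; raw[i, j] is the j-th measurement of example model i.
    Returns the normalized (M, m) matrix Q and the per-dimension maxima,
    which are reused to scale user-specified measurements the same way.
    """
    q_max = raw.max(axis=0)
    return raw / q_max, q_max
```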

With the input shape control thus parameterized, our goal is to generate a new deformation of the facial feature by computing the corresponding eigenmesh coordinates under control of the measurement parameters. Given an arbitrary input measurement vector $\mathbf{q}$ in the measurement space, such controlled deformation should interpolate the example models. To do this we interpolate the eigenmesh coordinates of the example models, obtaining a smooth mapping over the measurement space. Our feature shape synthesis problem is thus transformed into a scattered data interpolation problem. Again, RBFs are employed. Given the input anthropometric control parameters, a novel output model with the desired shapes of facial features is obtained at runtime by blending the example models. Figure 7 illustrates this process. Our scheme first evaluates the predefined RBFs at the input measurement vector and then computes the eigenmesh coordinates by blending those of the example models with respect to the produced RBF values and precomputed weight values. Finally, the output model with the desired feature shape is generated by evaluating the shape reconstruction model (7) at those eigenmesh coordinates. Note that there are as many RBF-based interpolation functions as there are eigenmeshes.

The interpolation is multi-dimensional. Considering the mapping from $\mathbb{R}^m$ to $\mathbb{R}^K$, the interpolated eigenmesh coordinates $a_j(\cdot)$, $1 \leq j \leq K$, at an input measurement vector $\mathbf{q} \in \mathbb{R}^m$ are computed as

$$a_j(\mathbf{q}) = \sum_{i=1}^{M} \gamma_{ij} R_i(\mathbf{q}) \quad \text{for } 1 \leq j \leq K, \qquad (9)$$

where $\gamma_{ij}$ are the radial coefficients and $M$ is the number of example models. Let $\mathbf{q}_i$ ($1 \leq i \leq M$) be the measurement vector of an example model. The radial basis function $R_i(\mathbf{q})$ is a multi-quadric function of the Euclidean distance between $\mathbf{q}$ and $\mathbf{q}_i$ in the measurement space:

$$R_i(\mathbf{q}) = \sqrt{\lVert \mathbf{q} - \mathbf{q}_i \rVert^2 + \rho_i^2} \quad \text{for } 1 \leq i \leq M, \qquad (10)$$

where $\rho_i$ is the locality parameter used to control the behavior of the basis function, determined as the Euclidean distance between $\mathbf{q}_i$ and the closest other example measurement vector.

The $j$th eigenmesh coordinate of the $i$th example model, $a_{ij}$, corresponds to the measurement vector of the $i$th example model, $\mathbf{q}_i$. Equation (9) should be satisfied for every pair $\mathbf{q}_i$ and $a_{ij}$ ($1 \leq i \leq M$):

$$a_{ij} = \sum_{k=1}^{M} \gamma_{kj} R_k(\mathbf{q}_i) \quad \text{for } 1 \leq j \leq K. \qquad (11)$$

The radial coefficients $\gamma_{ij}$ are obtained by solving this linear system using an LU decomposition. We can then generate the eigenmesh coordinates, and hence the shape, corresponding to any input measurement vector $\mathbf{q}$ according to (9).
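The precomputation and runtime evaluation described by Equations (9)-(11) can be sketched as follows (Python/NumPy; hypothetical function names). The resulting coordinates are then fed to the reconstruction of Equation (7).

```python
import numpy as np

def fit_measurement_rbf(Q, A):
    """Precompute the radial coefficients gamma (Eqs. (10)-(11)).

    Q : (M, m) normalized measurement vectors q_i of the example models
    A : (M, K) eigenmesh coordinates a_ij of the example models
    Returns (gamma, rho): (M, K) coefficients and (M,) locality parameters.
    """
    D = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2)
    rho = np.sort(D, axis=1)[:, 1]                 # distance to nearest other example
    R = np.sqrt(D ** 2 + rho[None, :] ** 2)        # R[a, i] = R_i(q_a), Eq. (10)
    gamma = np.linalg.solve(R, A)                  # Eq. (11), LU-based solve
    return gamma, rho

def synthesize_coordinates(q, Q, gamma, rho):
    """Eq. (9): eigenmesh coordinates a_j(q) for a user-specified vector q."""
    R_q = np.sqrt(np.sum((q - Q) ** 2, axis=1) + rho ** 2)
    return R_q @ gamma                             # (K,) coordinates for Eq. (7)
```

For example, `verts = psi0 + Psi @ synthesize_coordinates(q, Q, gamma, rho)` would produce the new feature shape, reusing the eigenmesh basis from Section 4.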

7. Subregion Shape Blending

After the shape interpolation procedure, the surrounding facial areas should be blended with the deformed internal facial features to generate a seamlessly smooth face mesh. The position of a vertex $\mathbf{x}_i$ in the feature region $F$ after deformation is denoted $\mathbf{x}'_i$. Let $\mathcal{V}$ denote the set of vertices of the head mesh and $\mathcal{V}_F$ the set of vertices of feature region $F$. For smooth blending, the positions of the vertices in $\mathcal{V}_{\bar{F}} = \mathcal{V} \setminus \mathcal{V}_F$, that is, those not inside the feature regions, should be updated along with the deformation of the facial features. For each vertex $\mathbf{x}_j \in \mathcal{V}_{\bar{F}}$, the vertex in each feature region $F$ that exerts influence on it, $\mathbf{x}^F_{k_j}$, is the one at minimal distance to it. It is desirable to use geodesic distance on the surface, rather than Euclidean distance, to measure the relative positions of two mesh vertices. We adopt an approximation of the geodesic distance based on a cylindrical projection, which is preferable for regions corresponding to a volumetric surface (e.g., the head). The idea is that the distance between two vertices on the projected mesh in the 2D image plane is a fair approximation of the geodesic distance. Thus, $\mathbf{x}^F_{k_j}$ is obtained as

$$\bigl\lVert \mathbf{x}_j - \mathbf{x}^F_{k_j} \bigr\rVert_G = \min_{\{i \,\mid\, \mathbf{x}_i \in \mathcal{V}_F\}} \bigl\lVert \mathbf{x}_j - \mathbf{x}_i \bigr\rVert_G, \qquad (12)$$

where the positions are those of the vertices on the projected mesh and $\lVert \cdot \rVert_G$ denotes the approximate geodesic distance. Note that the distance is measured offline on the original undeformed generic mesh. For each non-feature vertex $\mathbf{x}_j$, its position is updated in shape blending as

$$\mathbf{x}'_j = \mathbf{x}_j + \sum_{F \in \Gamma} \exp\Bigl(-\tfrac{1}{\alpha}\bigl\lVert \mathbf{x}_j - \mathbf{x}^F_{k_j} \bigr\rVert_G\Bigr) \bigl(\mathbf{x}'^{F}_{k_j} - \mathbf{x}^F_{k_j}\bigr), \qquad (13)$$

where $\Gamma$ is the set of facial features and $\alpha$ controls the size of the region influenced by the blending. We set $\alpha$ to 1/10 of the diagonal length of the bounding box of the head model. Figure 8(b) shows the effect of our shape blending scheme employed in synthesizing the nose shape.
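A sketch of the blending update, under our reading of Equation (13) as an exponential falloff of the influencing feature vertex's displacement with the approximate geodesic distance; the precomputed nearest-feature-vertex indices and distances correspond to Equation (12). The data layout and names are ours.

```python
import numpy as np

def blend_non_feature_vertices(X, X_def, feature_sets, nearest_vtx, geo_dist, alpha):
    """Propagate feature deformations to surrounding vertices (Eqs. (12)-(13)).

    X            : (N, 3) undeformed head-mesh vertex positions
    X_def        : (N, 3) positions after feature-shape synthesis (only feature
                   vertices have moved; non-feature vertices still equal X)
    feature_sets : dict feature -> array of its vertex indices
    nearest_vtx  : dict feature -> (N,) index of the influencing feature vertex
                   x^F_k for every mesh vertex (precomputed offline via Eq. (12))
    geo_dist     : dict feature -> (N,) approximate geodesic distance to x^F_k
    alpha        : falloff size (1/10 of the head bounding-box diagonal)
    """
    X_out = X_def.copy()
    all_feature = np.concatenate([np.asarray(v) for v in feature_sets.values()])
    non_feature = np.setdiff1d(np.arange(len(X)), all_feature)
    for feat in feature_sets:
        k = nearest_vtx[feat][non_feature]                   # influencing vertex index
        w = np.exp(-geo_dist[feat][non_feature] / alpha)     # falloff weight
        X_out[non_feature] += w[:, None] * (X_def[k] - X[k])     # Eq. (13)
    return X_out
```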

8. Results

Our method has been implemented in an interactive system with C++/OpenGL, in which the user can select facial features to work on interactively. A GUI snapshot is shown in Figure 9. Our system starts with a mean model which is computed as the average of the 186 meshes of the RBF-warped models and textured with the mean cylindrical full-head texture image [38]. Our system also allows the user to select the desired feature(s) from a database of pre-constructed typical features, which are shown in the small icons on the upper left of the GUI. Upon selecting a feature from the database, the feature is imported seamlessly into the displayed head model and can be further edited if needed. The slider positions for each of the available features in the database are stored by the system so that their configuration can be restored whenever the feature is chosen. Such a feature importing mode enables coarse-to-fine modification of features, making the face synthesis process less tedious. We invited several student users, who lack a graphics professional's modeling background, to create face models using our system. These novice users appreciated the intuitiveness and continuous variability of the control sliders. Table 2 shows the details of the datasets.

Figure 10 illustrates a number of distinct facial shapes synthesized to satisfy user-specified local shape constraints. Clear differences are found in the width of the nose alar wings, the straightness of the nose bridge, the inclination of the nose tip, the roundness of the eyes, the distance between the eyebrows and eyes, the thickness of the lips, the shape of the lip line, the sharpness of the chin, and so forth. A morph can be generated by varying the shape parameters continuously, as shown in Figures 10(b) and 10(c). In addition to starting with the mean model, the user may also select the desired head model of a specific person from the example database for further editing. Figure 11 illustrates face editing results on the models of two individuals for various user-intended characteristics.

In order to quantify the performance, we arbitrarily selected ten examples in the database for cross-validation. Each example was excluded from the example database when training the face synthesis system, and its shape measurements were used as a test input to the system. The output model was then compared against the original model. Figure 12 shows a visual comparison of the results. We assess the reconstruction by measuring the maximum, mean, and root mean square (RMS) errors from the feature regions of the output model to those of the input model. The 3D errors are computed as the Euclidean distance between corresponding vertices of the ground-truth and synthesized models. Table 3 shows the average errors measured for the ten reconstructed models. The errors are given both in absolute terms (mm) and as a percentage of the diameter of the output head model's bounding box.
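For reference, the error statistics reported in Table 3 can be computed per model as in the following sketch, assuming the ground-truth and synthesized meshes share the same vertex ordering (which holds after the resampling of Section 3):

```python
import numpy as np

def reconstruction_errors(ground_truth, synthesized, bbox_diameter):
    """Per-vertex Euclidean errors for the cross-validation (cf. Table 3)."""
    d = np.linalg.norm(ground_truth - synthesized, axis=1)   # (N,) vertex distances
    stats_mm = {'max': d.max(), 'mean': d.mean(),
                'rms': float(np.sqrt(np.mean(d ** 2)))}
    stats_pct = {k: 100.0 * v / bbox_diameter for k, v in stats_mm.items()}
    return stats_mm, stats_pct
```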

We compare our method against the approach of optimization in the PCA space (Opt-PCA). Opt-PCA performs optimization to estimate weights of the eigen-model (7). It starts from the mean model on which the anthropometric landmarks are in their source positions. The corresponding target positions of these landmarks are the landmark positions on the example model. We then optimize the mesh shape in the subspaces of facial features using the downhill simplex algorithm such that the sum of distances between the source and target positions of all landmarks is minimized. Table 4 shows the comparison between our method and Opt-PCA. Opt-PCA produces a large error since the number of landmarks is small and it is not sufficient to fully determine weights of the eigen-model. Opt-PCA is also slow since there are many PCA weights to be optimized iteratively.

Our system runs on a 2.8 GHz PC with 1 GB of RAM. Table 5 shows the time cost of different procedures. At runtime, our scheme spends less than one second in generating a new face shape upon receiving the input parameters. This includes the time for the evaluation of RBF-based interpolation functions and for shape blending around the feature region boundaries.

9. Applications

Apart from creating plausible 3D face models from users' descriptions, our feature-based face reconstruction approach is useful for a range of other applications. The statistics of facial features allow analysis of their shapes, for instance, to discern differences between groups of faces. They also allow synthesis of new faces for applications such as facial feature transfer between different faces and adaptation of the model to local populations. Moreover, our approach allows for compression of 3D face data, enabling us to share statistics with other researchers for the synthesis and further study of high-resolution faces.

9.1. Analyzing the Shape of Facial Features

As the first application, we consider analysis of the shape of facial features. This is useful for classification of face scans. We wish to gain insight into how facial features change with personal characteristics by comparing statistics between groups of faces. We calculate the mean and standard deviation statistics of anthropometric measurements for each facial feature of different groups. The morphometric differences between groups are visualized by comparing the statistics of each facial feature in a diagram. We follow this approach to study the effects of race and gender.

Race
To investigate how the shape of facial features changes with race, we compare three groups of 18–30 year-old Caucasian (72 subjects), Mongolian (18 subjects), and Negroid (26 subjects) subjects, each divided almost equally between the genders. The group statistics are shown in Figure 13, colored blue, green, and red, respectively. They show that the Caucasian nose is narrow, the Mongolian nose is of medium width, and the Negroid nose is wide. The statistics indicate a relatively protruding, narrow nose in the Caucasian group. The Mongolian nose is less protruding and wider, and the Negroid nose has the smallest protrusion. The nasal root depth and nasofrontal angle are the largest for the Caucasian group, exhibiting significant differences compared with the smaller Negroid and smallest Mongolian values. This suggests a high nasal root in Caucasians and a relatively flat nasal root in Negroids and Mongolians. Significant differences among the three races are also found in the inclination of the columella and the nasal tip angle, indicating a hooked nose in the Caucasian group and a snub nose in the Mongolian and Negroid groups.
For the eyes, the main characteristics of the Caucasian group are the largest eye fissure height and the smallest intercanthal width and eye fissure inclination angle. These suggest that Caucasian eyes typically have larger openings with horizontally aligned inner and outer eye corners. The Mongolian group has the largest intercanthal width, the greatest eye fissure inclination, the shortest eye fissure, and the smallest eye fissure height, which indicate relatively small eye openings separated by a large horizontal distance, with the inner eye corners positioned lower than the outer ones. The Negroid group has the largest eye fissure length and binocular width, which denote relatively wide eyes in this group.
As shown in Figure 13(c), many measurements of the mouth in the Negroid group (e.g., mouth width, upper and lower lip height, upper and lower vermilion height) are the largest among the three races. They are significantly different from those in the Caucasian or Mongolian group. The Mongolian group has a relatively narrow mouth and thin lips. In the Caucasian group, the skin portions of the upper and lower lips and their vermilion heights are the smallest. However, the proportions of the upper and lower lip heights are similar across the three races.
From the statistics illustrated in Figure 13(d), the Negroid chin is characterized by a long vertical profile dimension and small width. The smallest inclination of the chin from the vertical and the largest mentocervical angle also indicate a less protruding chin in the Negroid group. In the Mongolian group, the chin is the widest among the three races. The smallest chin height is found in the Caucasian group. Also, the Caucasian chin is slightly wider than the Negroid chin, but markedly narrower than the Mongolian one.

Gender
To study the effect of gender, Figure 14 compares 18–30-year-old Caucasian females (35 subjects, in red) to Caucasian males of the same age group (37 subjects, in blue). The change in the shape of facial features from females to males is different in character from the change between racial groups. The larger values of most distance measurements of the nose indicate that males have wide alar wings and a wide, long nose bridge. The value of the nasal root depth is also indicative of a high upper nose bridge in the male subjects. In females, the nose bridge and alar are narrower, and the nose tip is sharper and more protruding. In addition, the vertical profile around the junction of the nose bridge and the anterior surface of the forehead is flatter in females, as suggested by the larger nasofrontal angle. The inclination of the nose bridge and columella is similar in the two genders.
Regarding the anthropometric measurements of the eyes, males have larger intercanthal and binocular widths, which imply that their eyes are more separated with respect to the sagittal plane (the vertical plane cutting through the center of the face). The width of the eye fissure of males is slightly larger than that of females, whereas the heights of the eye fissure are similar in the two genders. Males also have a larger lower eyelid height. In females, the height of the upper eyelid and the distance between the eyebrows and eyes are larger. Another characteristic of females is the larger inclination of the eye fissure.
As shown in Figure 14(c), most distance measurements of the mouth are larger in the male group. This suggests that males have a much wider mouth with larger skin portions of the upper and lower lips. However, the vermilion heights of the upper and lower lips are similar in the two groups, indicating comparable lip thickness in the two genders. The differences exhibited in the angular measurements are indicative of more protruding lips and a more convex lip line in the female subjects.
The diagram in Figure 14(d) shows that the chin of males is characterized by large size in three dimensions (width, height and depth) due to the large underlying mandible. The greater inclination angle of the chin and smaller mentocervical angle also indicate a relatively protruding chin in males compared to that of females.

9.2. Facial Feature Transfer

In applications such as creating virtual characters for entertainment production, it is sometimes desirable to adjust a face so that certain facial features resemble those of a particular person. Therefore, it is useful to be able to transfer desired facial feature(s) between different human subjects. Given a database of example faces, one might wish to select one or more faces toward which to adjust the facial features.

Our high-level facial feature control framework allows the transfer of desired facial features from example faces to a source model in a straightforward manner. We can alter the feature of the source model with a feature-adjustment step which coerces the anthropometric measurement vector to match that of the target feature of an example face. The new shape of the selected feature is reconstructed on the source model and can be further edited if needed.
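Because each facial feature has its own measurement space, feature transfer amounts to replacing the source face's measurement vector for the selected feature with the target subject's values and re-running the Section 6 synthesis, roughly as in this sketch (the data layout is hypothetical):

```python
def transfer_feature(source_params, target_params, feature):
    """Feature transfer (Section 9.2): replace the source face's measurement
    vector for the chosen feature with the target subject's values, then
    re-run the Section 6 synthesis for that feature on the source model.

    source_params, target_params : dict feature -> normalized measurement vector
    """
    new_params = dict(source_params)
    new_params[feature] = target_params[feature].copy()
    return new_params
```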

Figure 15(a) shows the source model, which is the approximation of an example 3D scan by the deformed generic mesh. Figures 15(c) to 15(f) show the results of matching the shape measurements of the features of this model to those of the two example faces shown in Figure 15(b). The synthesis keeps the global shape of the source model while transferring features of the target subject to the source subject. With the decomposition of the face into local features, typical features of different target faces can be transferred in conjunction with each other to the same source model. Figure 16 shows a composite face built from the facial features of four individuals.

9.3. Face Adaptation to Local Populations

Adapting the model to local populations falls neatly into our framework. The problem of automatically generating a population is reduced to that of generating the desired number of plausible sets of control parameters. It is convenient to generate each parameter value independently, sampling it from a Gaussian distribution with the mean and variance observed for that parameter in the target group. The generated control parameter values both respect the given population distribution and, thanks to the use of interpolation in the local feature shape spaces, produce a believable face. Examples of this process are shown in Figure 17.
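A minimal sketch of the population-sampling step, assuming the per-group statistics are stored as mean/standard-deviation pairs for each normalized control parameter (the clipping to $(0,1]$ is our assumption to keep samples in the valid measurement range):

```python
import numpy as np

def sample_population(stats, n_faces, rng=None):
    """Draw control-parameter sets for a target population group (Section 9.3).

    stats   : dict parameter_name -> (mean, std) extracted for the group
    n_faces : number of faces to generate
    Each parameter is sampled independently from its Gaussian and clipped to
    the valid normalized range before being fed to the shape synthesis.
    """
    rng = np.random.default_rng() if rng is None else rng
    names = list(stats)
    mu = np.array([stats[k][0] for k in names])
    sd = np.array([stats[k][1] for k in names])
    samples = rng.normal(mu, sd, size=(n_faces, len(names)))
    return names, np.clip(samples, 1e-3, 1.0)   # keep values in (0, 1]
```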

9.4. Face Data Compression and Dissemination

For face synthesis based on a large example data set, the ability to organize examples into a database, compress them, and transmit them efficiently is a critical issue. The example face meshes used for this paper cannot be distributed at their full resolution because of their dense-data nature. In our method, we take advantage of the fact that the objects under consideration are of the same class and lie in correspondence to compress the data very efficiently. Instead of storing instances of geometry data for every example, we adopt a compact representation obtained by extracting the statistics with PCA, which is several orders of magnitude smaller than the original 3D scans. This reduces the storage from $M$ times the dimensionality of a high-resolution 3D scan (hundreds of thousands of points) to $K$ ($K \ll M$) times the dimensionality of an eigenmesh (several thousand vertices), with $M$ and $K$ being the numbers of examples and eigenmeshes, respectively. For all faces, we also make available the statistics of facial feature measurements within different population groups. These statistics, along with the eigenmeshes, should make it possible for other researchers to investigate new applications beyond the ones described in this paper.

10. Conclusion and Future Work

We have presented an automatic runtime system for generating varied, realistic face models. The system automatically learns a statistical model from example meshes of facial features and enforces it as a prior to generate and edit face models. We parameterize the feature shape examples using a set of anthropometric measurements, projecting them into measurement spaces. Solving the scattered data interpolation problem in a reduced subspace yields a natural face shape that achieves the goals specified by the user. With an intuitive slider interface, our system appeals to both beginning and professional users, and greatly reduces the time for creating natural face models compared to existing 3D mesh editing software. With the anthropometrics-based face synthesis, we explore a variety of applications, including analysis of the facial features of subjects of different races, transfer of facial features between individuals, and adapting face models to particular population groups.

The quality of the generated model depends on the model priors. Therefore, an appropriate database with a large number and variety of faces must be available. We would like to extend our current database to incorporate more 3D face examples of the Mongolian and Negroid races as well as to increase the diversity of age. We also plan to increase the number of facial features to choose from. To improve the system interface, we would like to integrate a “dragging” interaction mode which allows the user to directly choose one or more feature points of a facial feature and drag them to desired positions to generate a new facial shape. This involves updating multiple anthropometric parameters in one step and results in large-scale changes.