A Facial Expression Parameterization by Elastic Surface Model
We introduce a novel parameterization of facial expressions using an elastic surface model. Elastic surface models have been used as a deformation tool, especially for nonrigid organic objects. The expression parameters are either retrieved from existing articulated face models or obtained indirectly by manipulating facial muscles. The obtained parameters can be applied to target face models dissimilar to the source model to create novel expressions. Because the number of control points is limited, animation data created with the parameterization require less storage without reducing the range of deformation. The proposed method can be used in several ways: (1) creating a novel facial expression from scratch, (2) parameterizing existing articulation data, (3) parameterizing indirectly by muscle construction, and (4) providing a new animation data format that requires less storage.
1. Introduction
Recent interest in facial modeling and animation has been spurred by the increasing appearance of virtual characters in film and video, inexpensive desktop processing power, and the potential for a new 3D immersive communication metaphor for human-computer interaction. Classifying facial modeling and animation techniques is difficult, because exact classifications are complicated by the lack of exact boundaries between methods and by the fact that recent approaches often integrate several methods to produce better results. The classification of these methods is described in the survey report . Much effort has been put into physics-based muscle modeling to model anatomical facial behavior more faithfully. These methods fall into three categories: mass-spring systems, vector representations, and layered spring meshes. Mass-spring methods propagate muscle forces in an elastic spring mesh that models skin deformation . The vector approach deforms a facial mesh using motion fields in delineated regions of influence . A layered spring mesh extends a mass-spring structure into three connected mesh layers . Mesh deformation plays a central role in computer modeling and animation. Animators sculpt facial expressions and stylized body shapes. They assemble procedural deformations and may use complex muscle simulations to deform a character's skin. Despite the tremendous amount of artistry, skill, and time dedicated to crafting deformations, there are few techniques to help with reuse.
In this paper, a novel parameterization of facial expressions is introduced. The parameters can be learned from existing face models or created from scratch. The obtained parameters can be applied to target face models dissimilar to the source model from which the parameters are taken, in order to generate similar expressions on the target models. We also adopt a muscle-based animation system to obtain the parameters indirectly. Making expressions by manipulating each control point is tedious and difficult; the proposed system provides an easier and more intuitive alternative. Facial animation based on the parameterization requires less storage, especially for highly complex face models with large amounts of articulation data, without reducing the range of deformation.
In Section 3, the elastic skin model is described. Section 4 describes the details of the proposed facial parameterization. In Section 5, the expression cloning technique is introduced. In Section 6, a simplified facial muscle model is used to generate the parameters indirectly by manipulating the muscles. Section 7 describes the advantages of the proposed method and its prospects for continuous animation.
2. Related Works
Applying facial expressions from human faces to computer-generated characters has been widely studied [5–11]. To control facial movement, facial expressions are analyzed into the positions of feature points [5, 6, 11, 12] or the weights for blending premodeled expressions [7–10]. Our work is mostly motivated by the work . In order to transfer motion data from a source model to a target model, their method requires a dense mapping between the source and the target model. The motion vector at each vertex of the source model is then distributed among the corresponding vertices of the target model. Our method does not require such a dense mapping and is computationally more efficient. References [8, 9] adopt an example-based approach for retargeting facial expressions. An example-based approach requires a set of precomputed basis models to synthesize a new expression. This approach is effective, since animators can use their imagination to create a set of basis expressions so that a novel blended expression can represent their artistry. However, creating basis expressions is not trivial, and these methods might lack generality.
To compute facial parameters from existing models, we assume that there is a “point-to-point” correspondence between them in order to derive motion vectors for each expression. This assumption might be too restrictive in some cases; however, there are several techniques for establishing correspondences between two different models [13–15]. Harmonic mapping is a popular approach for recovering dense surface correspondences .
3. Elastic Facial Skin Model
In this section, the underlying theory of the elastic skin model is introduced. An intuitive surface deformation can be modeled by minimizing physically inspired elastic energies. The surface is assumed to behave like a physical skin that stretches and bends as forces act on it. Mathematically, this behavior can be captured by an energy functional that penalizes both stretching and bending [17–19]. Let $\mathbf{d}$ be the displacement function defined on the surface, and let $k_s$ and $k_b$ be the parameters that control the resistance to stretching and bending, respectively. The elastic energy is defined as
$$E(\mathbf{d}) = \int_{\Omega} k_s \left( \|\mathbf{d}_u\|^2 + \|\mathbf{d}_v\|^2 \right) + k_b \left( \|\mathbf{d}_{uu}\|^2 + 2\|\mathbf{d}_{uv}\|^2 + \|\mathbf{d}_{vv}\|^2 \right) du\, dv, \tag{1}$$
where the notations are defined as $\mathbf{d}_x = \partial \mathbf{d} / \partial x$, $\mathbf{d}_{xy} = \partial^2 \mathbf{d} / \partial x\, \partial y$.
In a modeling application one would have to minimize the elastic energy in (1) subject to the user-defined constraints. By applying variational calculus, the corresponding Euler-Lagrange equation that characterizes the minimizer of (1) can be expressed as
$$-k_s \Delta \mathbf{d} + k_b \Delta^2 \mathbf{d} = 0. \tag{2}$$
The Laplace operator in (2) corresponds to the Laplace-Beltrami operator . Using the well-known cotangent discretization of the Laplace operator, the Euler-Lagrange PDE turns into a sparse linear system:
$$\left( -k_s \mathbf{L} + k_b \mathbf{L}^2 \right) \mathbf{d}_i = 0 \quad (i \notin H \cup F), \qquad \mathbf{d}_i = \mathbf{h}_i \quad (i \in H), \qquad \mathbf{d}_i = 0 \quad (i \in F), \tag{3}$$
where $\mathbf{L}$ is the discrete Laplace operator, $H$ is the set of handle vertices with prescribed displacements $\mathbf{h}_i$, and $F$ is the set of fixed vertices. Interactively manipulating the handle changes the boundary constraints of the optimization. As a consequence, this system has to be solved in each frame. Note that restricting the handle to affine transformations allows us to precompute the basis functions of the deformation. So, instead of solving (3) in each frame, only the basis functions have to be evaluated . We elaborate on the details in the next section.
Figure 1 shows the results of deformation for two extreme cases (a) pure stretching and (b) pure bending .
In general, the order of the Laplacian operator corresponds to the continuity across the boundaries. For facial skin deformation, we use the pure bending surface model, because it retains continuity around the handle vertices, which proves to be a good approximation of the skin deformation of various expressions.
4. Facial Parameter Estimation
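In this section, we parameterize a set of existing face models using the elastic surface model. The facial parameters are calculated so that they are precise enough to approximate the deformation of a given facial expression. The input consists of a face model with a neutral expression and a set of face models with key expressions. So that every vertex can be matched, all the models share the same number of vertices and triangles and have identical connectivity. Equation (3) can be expressed in matrix form by reordering the rows so that the free vertices $U$ come first, followed by the fixed vertices $F$ and the handle vertices $H$:
$$\begin{pmatrix} \mathbf{A}_{U} & \mathbf{A}_{F} & \mathbf{A}_{H} \\ \mathbf{0} & \mathbf{I}_{F} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}_{H} \end{pmatrix} \begin{pmatrix} \mathbf{d}_{U} \\ \mathbf{d}_{F} \\ \mathbf{d}_{H} \end{pmatrix} = \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{P} \end{pmatrix}, \tag{4}$$
where the first block row contains the rows of the discretized operator of (3) that belong to the free vertices, and $\mathbf{P}$ stacks the displacements of the handle points. Denoting the system matrix by $\mathbf{M}$, the solution is
$$\mathbf{d} = \mathbf{M}^{-1} \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{I}_{H} \end{pmatrix} \mathbf{P} = \mathbf{B} \mathbf{P}. \tag{5}$$
Let the $k$th column vector of $\mathbf{B}$ be denoted by $\mathbf{b}_k$; then the right-hand side of (5) can be decomposed as
$$\mathbf{d} = \sum_{k=1}^{m} \mathbf{b}_k \mathbf{p}_k^{T}, \tag{6}$$
where $m$ is the number of handle points and $\mathbf{p}_k$ is the displacement of the $k$th handle point. Note that the basis functions $\mathbf{b}_k$ can be precomputed once the handle points are fixed, and they can be reused for all the expressive face models of the same subject.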
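The precomputation of the basis functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a uniform graph Laplacian stands in for the cotangent discretization, the system is solved densely with NumPy, and the names `graph_laplacian`, `basis_functions`, `k_s`, and `k_b` are hypothetical rather than taken from the paper.

```python
import numpy as np

def graph_laplacian(n_vertices, edges):
    """Uniform graph Laplacian L = A - D (assumption: replaces cotangent weights)."""
    L = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        L[i, j] += 1.0
        L[j, i] += 1.0
    L -= np.diag(L.sum(axis=1))       # subtract vertex degrees on the diagonal
    return L

def basis_functions(L, handle, fixed, k_s=0.0, k_b=1.0):
    """Return B (n x m): column k is the response to a unit move of handle k."""
    n = L.shape[0]
    A = -k_s * L + k_b * (L @ L)      # discrete form of -k_s*Lap + k_b*Lap^2
    constrained = list(fixed) + list(handle)
    A[constrained, :] = 0.0           # constrained rows become identity rows
    A[constrained, constrained] = 1.0
    rhs = np.zeros((n, len(handle)))
    rhs[handle, np.arange(len(handle))] = 1.0   # unit displacement per handle
    return np.linalg.solve(A, rhs)

# Toy example: a chain of 7 vertices, both ends fixed, handle in the middle.
edges = [(i, i + 1) for i in range(6)]
L = graph_laplacian(7, edges)
B = basis_functions(L, handle=[3], fixed=[0, 6])   # pure bending: k_s=0, k_b=1
```

On a real face mesh one would presumably use a sparse matrix and a sparse factorization instead of the dense solve, since the factorization can be reused for every handle column.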
The left-hand side of (6) can be computed for each expressive face by subtracting the neutral face. The facial parameters can then be computed by a least-squares approximation. We use QR factorization to solve the least-squares problem.
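A minimal sketch of this estimation step, assuming the basis functions are stored as an n x m matrix `B` and the per-vertex displacements (expressive face minus neutral face) as an n x 3 matrix `D`. The function name and the random test data are hypothetical.

```python
import numpy as np

def estimate_parameters(B, D):
    """Least-squares fit of P (m x 3) in B @ P ~= D using a thin QR factorization."""
    Q, R = np.linalg.qr(B)            # B = Q R, Q: n x m, R: m x m upper triangular
    return np.linalg.solve(R, Q.T @ D)

rng = np.random.default_rng(0)
B = rng.normal(size=(200, 5))         # hypothetical basis functions (n=200, m=5)
P_true = rng.normal(size=(5, 3))      # hypothetical 3D displacements of 5 control points
D = B @ P_true                        # expressive face minus neutral face
P = estimate_parameters(B, D)         # recovers P_true up to rounding error
```

Because the basis matrix depends only on the handle layout, the factorization could be computed once and reused for every expression of the same subject.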
To obtain the basis functions , the handle region corresponding to the facial control points needs to be defined. We adopt a subset of the facial control points defined in the MPEG-4 standard , which are distributed symmetrically over the entire front of the face. The total number of control points is 47 for our test model, and they are shown in Figure 2. Note that the number of facial control points and the location of each point are fully customizable. For instance, if mouth animation is the main objective, as in lip-sync animation, the majority of the control points should be placed around the lips to increase the degrees of freedom of the lip movement.
If no fixed region is defined on the face model, undesired deformation would occur on the back of the head and around the ears. This is because the solution of (3) would try to keep the boundary conditions around the deformed handle region . For our face models, the fixed region is defined empirically on the vertices that remain static under changes of expression. To find the fixed vertices, we let be the Euclidean distance between the tip of the nose and the center of the forehead; if the Euclidean distance between a vertex and the tip of the nose is greater than the threshold value defined as (for our test models), we put the vertex in the fixed region .
Figure 3 shows the face models generated by applying the computed facial parameters to the neutral face model. To make the differences visible, the corresponding original models are shown in the first row of Figure 3. All the expressions are reproduced closely, although some slight differences between the two sets of models are noticeable, such as the nostril area of the anger expression and the eyelid of the blink expression. These differences can be mitigated by placing additional control points around the affected areas.
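The distance test for the fixed region can be sketched as follows. The multiplier `factor` is a hypothetical parameter (the threshold value used for the test models is not reproduced here), and the four-vertex "head" is made-up data for illustration only.

```python
import numpy as np

def fixed_region(vertices, nose_tip, forehead, factor):
    """Indices of vertices farther from the nose tip than factor * d,
    where d is the nose-tip-to-forehead distance."""
    d = np.linalg.norm(vertices[forehead] - vertices[nose_tip])
    dist = np.linalg.norm(vertices - vertices[nose_tip], axis=1)
    return np.flatnonzero(dist > factor * d)

V = np.array([[0.0, 0.0, 1.0],    # 0: nose tip
              [0.0, 1.0, 0.8],    # 1: center of the forehead
              [0.5, 0.2, 0.7],    # 2: cheek
              [0.0, 0.0, -2.0]])  # 3: back of the head
fixed = fixed_region(V, nose_tip=0, forehead=1, factor=2.0)  # factor is an assumption
```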
4.1. Facial Expression Blending
Facial expression blending is a common technique in facial animation that creates a novel expression by blending existing expressions.
Given the set of facial parameters generated for each expression, a novel expression can be created by simply blending the facial parameters of the individual expressions:
$$\mathbf{P}_{\text{new}} = \sum_{i} w_i \mathbf{P}_i, \tag{7}$$
where $\mathbf{P}_i$ is the facial parameter of the $i$th expression and $w_i$ is the blending weight for each expression.
Figure 4 shows some examples of expression blending. The first row shows untextured images, and the second row shows textured images. The blending calculation is performed only at the facial control points, not at every vertex, so the computational cost is relatively low.
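As a small illustration of why blending is cheap, the sketch below blends two parameter sets defined on only two control points. The array layout (one 3D displacement per control point) follows the description in the text; the function name and the numeric values are hypothetical.

```python
import numpy as np

def blend_expressions(params, weights):
    """Weighted sum of per-expression parameter arrays, each of shape (m, 3)."""
    params = np.asarray(params, dtype=float)      # (n_expr, m, 3)
    weights = np.asarray(weights, dtype=float)    # (n_expr,)
    return np.tensordot(weights, params, axes=1)  # (m, 3)

smile = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
surprise = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 0.0]])
mixed = blend_expressions([smile, surprise], [0.5, 0.5])
```

The cost depends on the number of control points and expressions, not on the number of mesh vertices.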
Figure 4: Examples of expression blending. (c) Mixed expression.
Figure 5: (a) Target model 1. (b) Target model 2. (c) Target model 3.
We can also attenuate the displacement motion of each control point independently by adopting the importance map as suggested in  to increase the variety of expressions to be generated.
5. Facial Expression Cloning
Expression cloning is a technique that copies the expressions of a source face model onto a target face model. The mesh structures of the two models need not be the same. Our proposed facial parameterization can be used for this purpose.
The first step selects the facial control points on the target model, each of which corresponds exactly to a control point on the source model. It takes no more than twenty minutes to select all the facial control points on the target model. The second step computes the basis functions for the target model, as in the previous section. Table 1 shows the time taken to compute the basis functions for each target model. In the third step, we copy the expressions onto the target model. Given the facial parameters for each expression of the source model and the basis functions obtained from the target model, each expressive target model is computed by using (6).
To compensate for the scale difference between the source and the target model, each element of the facial parameters , a 3D displacement vector from the neutral face at the control point , is normalized such that its norm is measured in Facial Animation Parameter Units (FAPU). The FAPU is commonly set to the distance between the inner corners of the eyes of the model. We also assume that the model is aligned so that the -axis points through the top of the head, the -axis points through the left side of the head, and the face looks in the positive -axis direction. If the target model is not aligned with this coordinate system, it is aligned before the deformation is applied and then moved back to its original position after the deformation.
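The scale compensation and the cloning step can be sketched as follows, assuming both models are already aligned to the same coordinate system. The function names, the vertex indices of the eye corners, and the toy data are hypothetical.

```python
import numpy as np

def fapu(vertices, left_inner_eye, right_inner_eye):
    """FAPU taken as the distance between the inner corners of the eyes."""
    return np.linalg.norm(vertices[left_inner_eye] - vertices[right_inner_eye])

def clone_expression(P_source, fapu_source, fapu_target, neutral_target, B_target):
    """Rescale source parameters into target units and deform the target mesh."""
    P_target = P_source / fapu_source * fapu_target   # normalize, then rescale
    return neutral_target + B_target @ P_target

# Toy data: one control point, a target mesh with two vertices.
source_vertices = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
target_vertices = np.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
P_source = np.array([[0.0, 2.0, 0.0]])            # source parameters (1 x 3)
B_target = np.array([[1.0], [0.5]])               # target basis functions (2 x 1)
deformed = clone_expression(P_source,
                            fapu(source_vertices, 0, 1),
                            fapu(target_vertices, 0, 1),
                            np.zeros((2, 3)), B_target)
```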
In Figure 6 the source model and five basis expressions are shown in the first row and the cloned expressions on three different target models are shown in the following rows.
At the end of this paper, Figure 12 shows the various expressions generated by a set of facial parameters. For each row, the facial parameters are the same for all the models. The models are rendered with eyes to make each expression distinguishable.
6. Facial Deformation by Muscle Contraction
Facial animation by muscle contraction has been studied by many researchers [3, 23]. The simplified facial muscle model that is often used in computer animation should resemble anatomical models as closely as possible, because facial animation by muscle contraction must be general and fully understood by the animator. The basic canonical facial expressions have been thoroughly studied by Ekman and Friesen ; they described in detail how the facial parts should move in order to make certain expressions. Even though they pay little attention to the contraction of the facial muscles underlying the facial skin, their analysis was very helpful in manipulating our simplified facial muscles to create certain expressions.
We define two types of muscles: a linear muscle that pulls and a sphincter muscle that squeezes the nearby skin elements. A similar pseudomuscle model was first proposed in . In , the authors succeeded in generating various facial expressions from a set of simple facial muscles. Figure 7 shows the muscle models we use in the following sections.
Most previously proposed methods that use a facial muscle model attach nearby skin elements to a pseudomuscle in the registration stage and then deform the skin elements as the muscle contracts. The magnitude of the deformation is determined by the position of the skin element relative to the muscle. In a physically based facial modeling approach , the deformation is applied as nodal forces to a three-layered mass-spring facial skin system. The drawbacks of these approaches are the complexity of the interaction between the muscle and the nearby skin elements and the relatively high computational cost of the finite element method for skin deformation . The distinguishing feature of the proposed method is that each muscle's contraction drives only the nearby facial control points defined in the previous sections, not all the nearby skin elements. This approach avoids the painstaking task of registering all the skin elements to the corresponding muscle and has a low computational cost. Finally, the skin elements other than the control points are deformed as the solution of the elastic skin model described in Section 3. The details are described in the following sections.
6.1. Facial Muscle Registration
In order to generate natural expressions and make the system easier for the animator to operate, a set of simplified major facial muscles is defined, and each of them is attached to the mesh with reference to an anatomical model. These tasks are done manually and must be adapted to each different face model. Note that each muscle is a virtual edge connecting two vertices (attachment points) of the mesh.
The fundamental property of the linear muscle is that one end is the bony attachment, which remains fixed, while the other end is embedded in the soft tissue of the skin. When the muscle is operated, it contracts along the line between its two ends. Each muscle has maximum and minimum zones of influence; however, there are no analytic methods to measure them, since only the surface points can be measured and the range of influence varies greatly from face to face .
The sphincter muscles around the mouth and the eyes that squeeze the skin tissue are described as an aggregate of linear muscles radially arranged from a pseudocenter point. The pseudo center is calculated by fitting an elliptic curve to the points defining the sphincter muscle.
Figure 8 shows the two kinds of muscle model. The end point of each muscle is colored blue at the bony attachment and red at the skin attachment. As shown in Figure 7, the facial muscles are defined symmetrically across the center of the face; however, each muscle contracts independently.
Figure 8: Muscle models. (a) Linear muscle. (b) Sphincter muscle.
To compute the zones of maximum and minimum influence, we adopt the method proposed in . Each linear muscle has a radial falloff and an angular falloff calculated from the rest position. At the skin attachment (not at the bony attachment), the influence is maximum (1.0), and it gradually falls off to the minimum (0.0) along a cosine curve as in (8) and (9). Each muscle registers the nearby facial control points, along with its own radial influence and angular influence, if they reside in the influence region of the muscle. At the end of the registration, each control point is registered by one or more muscles, depending on the zones of influence. If any control point is not registered by any muscle, the zones of influence must be adjusted until every control point is registered by at least one muscle:
where is the radial distance between the facial control point and the muscle's bony attachment point
where is the angle between the linear muscle at rest position and the facial control point.
Figure 9 illustrates the registration of a facial muscle. The control point is influenced by two linear muscles and is registered with its own radial and angular influence values.
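The cosine falloffs can be illustrated with the sketch below. The exact expressions of (8) and (9) are not reproduced here, so the forms used are assumptions: the radial influence is taken to peak where the distance from the bony attachment equals the muscle length (that is, at the skin attachment), and `falloff_radius` and `max_angle` are hypothetical parameters describing the extent of the zone of influence.

```python
import math

def cosine_falloff(t):
    """1.0 at t = 0, decreasing along a cosine curve to 0.0 at t >= 1."""
    t = min(max(t, 0.0), 1.0)
    return math.cos(t * math.pi / 2.0)

def radial_influence(dist_to_bone, muscle_length, falloff_radius):
    # Assumption: the peak lies at the skin attachment (dist == muscle_length).
    return cosine_falloff(abs(dist_to_bone - muscle_length) / falloff_radius)

def angular_influence(angle, max_angle):
    # Assumption: the peak lies on the muscle axis (angle == 0).
    return cosine_falloff(abs(angle) / max_angle)

def is_registered(radial, angular, eps=1e-9):
    """A control point is registered only if both influences are non-negligible."""
    return radial > eps and angular > eps
```

A control point would then be stored with each muscle for which `is_registered` returns true, together with its two influence values.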
6.2. Facial Expression by Muscle Contraction
In our simplified muscle models, the linear muscle contracts only along the muscle's endpoints. When the muscle contracts, the displacement vector at the skin attachment point (the red point in Figure 8) from the rest position is calculated and it is dissipated among the registered facial control points. The amount of displacement of each registered control point is defined as a function of , , and . We use the following simple formula to drive the translation of the facial control point:
Since the facial control point might be registered by other muscles, the final displacement, the total of the displacement by each muscle, is given by
Finally, the model deformed by the muscle contractions is calculated as the product of the basis functions and the facial parameters .
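The accumulation over muscles and the final deformation can be sketched as follows. The per-muscle contribution is assumed here to be the product of the radial influence, the angular influence, and the displacement of the muscle's skin attachment; this product form, the data layout, and all names are assumptions made for illustration.

```python
import numpy as np

def control_point_displacements(n_points, registrations, muscle_disp):
    """Sum, per control point, of radial * angular * muscle displacement.

    registrations: list of (muscle_id, point_id, radial, angular) tuples.
    muscle_disp:   dict muscle_id -> 3D displacement of its skin attachment.
    """
    P = np.zeros((n_points, 3))
    for muscle_id, point_id, radial, angular in registrations:
        P[point_id] += radial * angular * np.asarray(muscle_disp[muscle_id], dtype=float)
    return P

def deform(neutral, B, P):
    """Deformed vertices = neutral vertices + basis functions times parameters."""
    return neutral + B @ P

# Toy example: control point 0 is registered by two muscles, point 1 by one.
registrations = [("m1", 0, 1.0, 0.5), ("m2", 0, 0.5, 1.0), ("m2", 1, 1.0, 1.0)]
muscle_disp = {"m1": [0.0, 2.0, 0.0], "m2": [2.0, 0.0, 0.0]}
P = control_point_displacements(2, registrations, muscle_disp)
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # hypothetical basis (3 vertices)
deformed = deform(np.zeros((3, 3)), B, P)
```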
In Figure 10, six canonical expressions are created by contracting the facial muscles with reference to the analysis results . In Figure 11, the amount of contraction of each muscle for each expression is shown in two separate graphs to reduce clutter. Note that the two muscles of a symmetric pair can be contracted independently. If the amount is positive, the muscle stretches, and if it is negative, the muscle shrinks along the line between the two ends of the linear muscle. The displacement vector of each muscle is calculated as
$$\boldsymbol{\delta} = c\, \hat{\mathbf{m}}, \tag{12}$$
where $\hat{\mathbf{m}}$ is the normalized muscle vector and $c$ is the amount of muscle contraction from the user setting.
Figure 11: Amount of muscle contraction for each expression. (a) Surprise, fear, disgust. (b) Anger, happy, sad.
The proposed parameterization of facial expression has advantages in terms of storage size. The conceptual storage size required to store the animation data is given by
for the proposed method, where is the size of storage required to store the precomputed basis functions and it is defined as
Meanwhile for a traditional system that simply stores the displacement vector at every vertex, it is given as
So the required storage size is decreased by the ratio
As this formula indicates, the proposed method requires less storage and has a much smaller memory footprint if the mesh is very large and the number of expressions exceeds the number of control points. This is a plausible scenario, since highly detailed facial animation might require a large number of blendshapes; for instance, the facial animation of Gollum in the feature film The Two Towers required 675 blendshapes .
Character animation, and facial animation in particular, requires continuous deformation over the animation time frame.
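A back-of-the-envelope comparison can be sketched as below. The counting scheme is an assumption: sizes are counted in scalar values, the basis functions are taken to need one scalar per vertex per control point, and each expression needs one 3D vector per control point (proposed) or per vertex (traditional). The 47 control points and 675 blendshapes come from the text; the vertex count of 100,000 is hypothetical.

```python
def storage_proposed(n_vertices, n_controls, n_expressions):
    basis = n_vertices * n_controls            # one scalar per vertex per control point
    params = 3 * n_controls * n_expressions    # one 3D vector per control point per expression
    return basis + params

def storage_traditional(n_vertices, n_expressions):
    return 3 * n_vertices * n_expressions      # one 3D vector per vertex per expression

n_vertices, n_controls, n_expressions = 100_000, 47, 675
ratio = (storage_proposed(n_vertices, n_controls, n_expressions)
         / storage_traditional(n_vertices, n_expressions))
```

Under these assumptions the cost of the basis functions is paid once, so the saving grows with the number of stored expressions.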
Key framing is a common technique to continuously deform a model by interpolating key framed poses or channel data. The various methods of interpolation and extrapolation control the visual effects of the continuous deformation in the animation time frame. By using the proposed parameterization, the displacement of each facial control point can be set directly by tweaking the position at each key-frame. It is also possible to set the facial control points indirectly by blending canonical expressions described in Section 4.1 or by the muscle contraction described in Section 6.
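Key framing on the facial parameters can be sketched as follows. Linear interpolation is used purely as an example of an interpolation scheme, and the function name and key-frame values are hypothetical; the interpolated parameters would then be multiplied by the basis functions to obtain the mesh for that frame.

```python
import numpy as np

def interpolate_parameters(key_times, key_params, t):
    """Linear interpolation of (m, 3) parameter arrays between key frames."""
    key_times = np.asarray(key_times, dtype=float)
    key_params = np.asarray(key_params, dtype=float)   # (n_keys, m, 3)
    t = float(np.clip(t, key_times[0], key_times[-1])) # clamp to the keyed range
    i = int(np.searchsorted(key_times, t, side="right")) - 1
    i = min(i, len(key_times) - 2)                     # keep i + 1 a valid key index
    a = (t - key_times[i]) / (key_times[i + 1] - key_times[i])
    return (1.0 - a) * key_params[i] + a * key_params[i + 1]

neutral = np.zeros((2, 3))                             # two control points at rest
smile = np.array([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
quarter = interpolate_parameters([0.0, 1.0], [neutral, smile], 0.25)
```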
Deformation by a limited number of control points has a low computational cost, since only the sum of the scalar products of the basis functions and the facial parameters is required to obtain the resulting model. Smooth deformation driven by a limited number of facial control points is a key technique when creating facial animation from motion capture (mocap) data. Several approaches have been studied. For instance, Lorenzo and Edge  use radial basis functions (RBFs) to transfer facial motion data to a target mesh dissimilar to the original actor from whom it was captured. Our proposed method can be coupled with mocap data if the control points on the target mesh exactly match the markers on the actor.
The elastic deformable surface model has been used as a deformation tool especially for elastic organic models. We have introduced a novel parameterization of facial expression using the elastic deformable surface model. The parameterization uses a small number of facial control points to deform a higher resolution surface smoothly and continuously.
Two approaches are introduced to obtain the parameters for each expression. The first approach retrieves the parameters directly from existing models by least-squares minimization. The parameters can also be created from scratch, with the animator moving each control point as desired.
The other approach obtains the parameters indirectly by manipulating the facial muscles. Muscle contraction is more intuitive and less daunting than the former method.
The obtained facial parameters can be applied to other target models even if the mesh structure (number of vertices, number of triangles, and connectivity) differs from that of the source model. A key-framed facial animation can use the proposed parameterization seamlessly. The parameter values could also be provided from mocap data.
The method could be used as a postprocessing tool to compress existing animation data, since it requires less storage, especially for highly complex meshes with large amounts of articulation data, while preserving the quality of the original animation as far as possible.
Future research could explore the automated detection of the control points on the face model. Several heuristic approaches have been studied [6, 28]. If the target mesh differs from the source mesh not only in size but also in the local features at the control points (for example, a human head and a dog head), the facial parameters obtained cannot be applied directly. A possible solution is to define the parameters with respect to a local coordinate system at each control point of the source model and then, for the target model, reconstruct the parameters using the local coordinate system at the corresponding control point. A similar method is suggested in .
References
J. Noh and U. Neumann, “A survey of facial modeling and animation techniques,” Technical Report 99-705, USC, 1998.
S. M. Platt and N. I. Badler, “Animating facial expressions,” in Proceedings of the International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '81), vol. 15, pp. 245–252, Dallas, Tex, USA, August 1981.
K. Waters, “A muscle model for animating three-dimensional facial expression,” in Proceedings of International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '87), vol. 21, pp. 17–24, Anaheim, Calif, USA, July 1987.
D. Terzopoulos and K. Waters, “Physically-based facial modeling, analysis, and animation,” Journal of Visualization and Computer Animation, vol. 4, pp. 73–80, 1990.
B. Guenter, C. Grimm, D. Wood, H. Malvar, and F. Pighin, “Making faces,” in Proceedings of the Annual Conference on Computer Graphics (SIGGRAPH '05), pp. 17–24, 2005.
J.-Y. Noh and U. Neumann, “Expression cloning,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pp. 277–288, 2001.
F. Pighin, R. Szeliski, and D. H. Salesin, “Resynthesizing facial animation through 3D model-based tracking,” in Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV '99), vol. 1, pp. 143–150, Kerkyra, Greece, September 1999.
B. Choe and H.-S. Ko, “Analysis and synthesis of facial expressions with hand-generated muscle actuation basis,” in ACM SIGGRAPH Courses, 2006.
H. Pyun, Y. Kim, and W. Chae, “An example-based approach for facial expression cloning,” in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 167–176, 2003.
P. Joshi and W. C. Tien, “Learning controls for blend shape based realistic facial animation,” in ACM SIGGRAPH Courses, 2006.
L. Williams, “Performance-driven facial animation,” Computer Graphics, vol. 24, no. 4, pp. 235–242, 1990.
M. Eck, T. DeRose, T. Duchamp, H. Hoppe, M. Lounsbery, and W. Stuetzle, “Multiresolution analysis of arbitrary meshes,” in Proceedings of the 22nd Annual ACM Conference on Computer Graphics and Interactive Techniques, pp. 173–180, Los Angeles, Calif, USA, August 1995.
A. W. F. Lee and D. Dobkin, “Multiresolution mesh morphing,” in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 343–350, 1999.
G. Celniker and D. Gossard, “Deformable curve and surface finite-elements for free-form shape design,” in Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques, pp. 257–266, 1991.
D. Terzopoulos and J. Platt, “Elastically deformable models,” in Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 205–214, 1987.
W. Welch and A. Witkin, “Variational surface modeling,” in Proceedings of the 19th Annual ACM Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '92), vol. 26, pp. 157–166, Chicago, Ill, USA, July 1992.
ISO, Mpeg-4 international standard, moving picture experts group, 2003.
Y. Lee and D. Terzopoulos, “Realistic modeling for facial animation,” in Proceedings of International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '95), pp. 55–62, 1995.
P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues, Prentice-Hall, Upper Saddle River, NJ, USA, 1975.
Y. Lee, D. Terzopoulos, and K. Waters, “Realistic modeling for facial animation,” in Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '95), pp. 55–62, 1995.
J. Fordham, “Middle earth strikes back,” Cinefex, no. 92, pp. 71–142, 2003.
M. S. Lorenzo and J. D. Edge, “Use and re-use of facial motion capture data,” in Proceedings of the Vision, Video, and Graphics, pp. 1–8, 2003.
Z. Deng and U. Neumann, “Data-driven 3D facial animation,” in Expressive Visual Speech Generation, pp. 29–59, Springer, Berlin, Germany, 2008.