Abstract

Facial animation has been one of the most actively researched 3D animation topics in recent years. However, facial animation requires a 3D facial animation model to be stored, and this model needs many triangles to describe facial expression animation accurately because the face presents many different expressions. Consequently, the storage costs associated with facial animation have increased rapidly. In an effort to reduce storage costs, researchers have sought to simplify 3D animation models using techniques such as Deformation Sensitive Decimation and Feature Edge Quadric. These studies have examined the problems of homogeneity of the local coordinate system between different expression models and of retaining model characteristics after simplification. This paper proposes a method that applies the Homogeneous Coordinate Transformation Matrix to solve the problem of homogeneity of the local coordinate system and the Maximum Shape Operator to detect shape changes in facial animation so as to properly preserve the features of facial expressions. Further, root mean square error and perceived quality error are used to compare the errors generated by different simplification methods in experiments. Experimental results show that, compared with Deformation Sensitive Decimation and Feature Edge Quadric, our method not only reduces the errors caused by simplification of facial animation but also retains more facial features.

1. Introduction

With the rapid advancement of information technology, 3D animation has become increasingly popular in fields such as the movie industry [1, 2], gaming [3-5], arts [6, 7], and education [8, 9]. Among the various 3D animation technologies, facial animation is the most commonly used technique. However, because facial animation contains a variety of expressions, the surface relief of 3D models in facial animation undergoes considerable changes.

Parke [10] was the first to propose the concept of facial animation. Subsequently, many related studies and techniques emerged, including Head Pose Estimation [11], Facial 3D Shape Estimation [12], Head Shop [13], The Digital Emily Project [14], Automatic Generation [15], Kinect-Based Facial Animation [16], Real-Time Facial Animation [17-21], and 3D Facial Similarity Measure [22].

However, detailed facial animation requires numerous triangles, which significantly increases storage costs. To reduce the number of triangles needed to describe facial animation, many researchers have proposed simplification methods for 3D animation, including Deformation Sensitive Decimation (DSD) [23, 24], Feature Edge Quadric (FEQ) [25], Facial Features Region Partition [26], and MPEG-4 Quadric-based LoD Simplification [27]. Most of these methods are derived from QSlim [28], proposed by Garland and Heckbert, and DSD [23], proposed by Mohr and Gleicher.

QSlim [28] is one of the most famous 3D model simplification methods [29]. This method can not only execute rapidly, but also reduce the errors caused by model simplification. Unfortunately, QSlim can only simplify 3D static models, whereas 3D animation typically contains various frame models. Consequently, in order to utilize QSlim for simplification of 3D animation, Mohr and Gleicher proposed the DSD method [23]. The method calculates and aggregates the error matrix of each vertex in different frame models to serve as a basis for simplification. However, DSD can cause destruction of the appearance characteristics of the model because it lacks complete homogeneous coordinate transformation.

In order to preserve facial expression features when simplifying facial animation, Kim et al. [25] established 32 feature points and adopted FEQ to preserve facial animation features, as shown in Figure 1; Wang et al. [26] used Facial Features Region Partition to set a scope on the face within which the desired facial features are preserved. Its simplification results are shown in Figure 2. However, in these methods, the feature regions must be set by the user before the shapes of the regions can be preserved. In other words, these methods can only preserve general facial features, such as the eyes and nose. Nongeneral features, such as wrinkles, most likely fall outside the set ranges, so their triangles can easily be removed, which damages the features of the regions that were not set.

In order to overcome this limitation of feature regions having to be set by users and to preserve the facial features during the simplification process, this paper proposes a DSD-based method that uses Homogeneous Coordinate Transformation Matrix (HCTM) to solve DSD’s homogeneous coordinate transformation problems and uses Maximum Shape Operator (MSO) to estimate the changes in each vertex point in the various frame models. In this way, the intensity values of each facial characteristic are automatically quantified and the characteristics of facial animation are moderately preserved during the simplification process.

2.1. Deformation Sensitive Decimation (DSD)

QSlim [28] is a Quadric Error Metrics- (QEMs-) based 3D model simplification method. This method first calculates the matrix $K_p$ of each triangular patch adjacent to a vertex: $K_p = pp^{T}$. In this formula, $p = [a\ b\ c\ d]^{T}$, and the matrix is based on the calculation of the distance of any point from the plane $ax + by + cz + d = 0$, where $a^2 + b^2 + c^2 = 1$. The sum of squared distances of any point to its adjacent triangles is computed by adding the matrices of all adjacent triangular patches. This sum, which can be obtained using formula (1), is the error value resulting from the simplification of vertex pairs. In the QSlim process, the simplification is performed in order from the vertex pair of the lowest error to the highest, as shown in Figure 3. Consequently, this method can efficiently generate a simplified low error model as follows:

$$\Delta(v) = v^{T} Q\, v, \qquad v = [x\ y\ z\ 1]^{T}. \tag{1}$$

In this formula,

$$Q = \sum_{p \in \mathrm{planes}(v)} K_p. \tag{2}$$
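The quadric error of QSlim can be illustrated with a minimal sketch. It assumes that each plane ax + by + cz + d = 0 is stored as a unit-normal 4-tuple (a, b, c, d); the function names `plane_quadric`, `add_quadrics`, and `quadric_error` are hypothetical and are not taken from QSlim itself.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def plane_quadric(plane: Sequence[float]) -> Matrix:
    """Return the 4x4 matrix p p^T for a plane p = (a, b, c, d)."""
    return [[pi * pj for pj in plane] for pi in plane]


def add_quadrics(q1: Matrix, q2: Matrix) -> Matrix:
    """Element-wise sum of two 4x4 quadrics."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(q1, q2)]


def quadric_error(q: Matrix, point: Sequence[float]) -> float:
    """Evaluate v^T Q v for the homogeneous point v = (x, y, z, 1)."""
    v = [point[0], point[1], point[2], 1.0]
    return sum(v[i] * q[i][j] * v[j] for i in range(4) for j in range(4))


# Two planes that meet along the x axis: z = 0 and y = 0.
q = add_quadrics(plane_quadric((0.0, 0.0, 1.0, 0.0)),
                 plane_quadric((0.0, 1.0, 0.0, 0.0)))

error_on_axis = quadric_error(q, (5.0, 0.0, 0.0))   # point lies on both planes
error_off_axis = quadric_error(q, (0.0, 3.0, 4.0))  # squared distances 3^2 + 4^2
```

A point on both planes has zero error, whereas a point away from them accumulates the squared distance to each plane, which is why low-error vertex pairs are simplified first.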

DSD [23] applies QSlim to the simplification of 3D animation models. This method calculates the error matrix of each vertex in the different frame models of the animation, as shown in formula (3), estimates the error that each vertex pair produces during the simplification process, and then decides which vertex pair should be simplified first, as shown in formula (4). DSD inherits the fast execution of QSlim but lacks a complete homogeneous coordinate transformation [30]. Consequently, it cannot generate the best vertex position after the simplification of vertex pairs, as illustrated in Figures 4 and 5. In these formulas, the error of a vertex pair is accumulated over all frame models of the 3D animation, using the new vertex generated from the pair in each frame model after simplification and the respective error matrices of the two vertices in the overall 3D animation model.
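The idea of accumulating the error of a vertex pair over frame models can be sketched as follows. This is an illustration only: it assumes that the cost of a pair is the sum, over frames, of the quadric error of the merged vertex evaluated against the summed per-frame quadrics of the two vertices, which may differ in detail from formulas (3) and (4). All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def plane_quadric(plane: Sequence[float]) -> Matrix:
    """Return the 4x4 matrix p p^T for a plane p = (a, b, c, d)."""
    return [[pi * pj for pj in plane] for pi in plane]


def quadric_error(q: Matrix, point: Sequence[float]) -> float:
    """Evaluate v^T Q v for the homogeneous point v = (x, y, z, 1)."""
    v = [point[0], point[1], point[2], 1.0]
    return sum(v[i] * q[i][j] * v[j] for i in range(4) for j in range(4))


def pair_cost_over_frames(quadrics_i: List[Matrix],
                          quadrics_j: List[Matrix],
                          merged_positions: List[Sequence[float]]) -> float:
    """Sum the merge error of one vertex pair over all frame models.

    quadrics_i[f] and quadrics_j[f] are the quadrics of the two vertices in
    frame f; merged_positions[f] is the merged vertex position in frame f.
    """
    total = 0.0
    for q_i, q_j, pos in zip(quadrics_i, quadrics_j, merged_positions):
        q_sum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(q_i, q_j)]
        total += quadric_error(q_sum, pos)
    return total


# Frame 1: both vertices lie on the plane z = 0.
# Frame 2: the surface has moved to the plane z = 1, stored as (0, 0, 1, -1).
k_z0 = plane_quadric((0.0, 0.0, 1.0, 0.0))
k_z1 = plane_quadric((0.0, 0.0, 1.0, -1.0))
cost = pair_cost_over_frames([k_z0, k_z1], [k_z0, k_z1],
                             [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])
```

In this toy case the merged position fits frame 2 exactly but is one unit off the surface in frame 1, so only frame 1 contributes to the cost.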

In addition, the DSD simplification method does not analyze the surface of the 3D animation model. Consequently, important features of the face are easily damaged during the facial animation simplification process. To solve this problem, Kim et al. proposed the Feature Edge Quadric method, which defines the important feature regions of the face and preserves the main facial animation features during the simplification process.

2.2. Feature Edge Quadric (FEQ)

FEQ was proposed by Kim et al. for the preservation of facial features. In this method, the feature points of 10 parts of the face, including the tip of the nose, top of the head, left/right side of the face, top/bottom of the nose, left/right eye socket, bottom left/right of the nose, lip contact line, top/bottom of the lip, chin, and throat, are set as shown in Figure 1. For error estimation, this method combines a basic quadric error and the Feature Edge Quadric. The former is based on QSlim and calculates the error matrix generated by each vertex pair during the simplification process. It is defined over the sets of triangles adjacent to each of the two vertices, the area of each of these triangles, and the error matrix of each of these triangles.

In the latter, the error matrix of the adjacent feature edges of each feature point is calculated to properly preserve feature edges during the facial animation simplification process.

To calculate the error matrix of a feature edge, FEQ first computes the planes that are orthogonal to the edge. These planes are calculated from the edge and the average normal vector of its two adjacent triangles, that is, the average of the normal vectors of the two triangles adjacent to the edge.

Through the integration of the basic quadric error and the Feature Edge Quadric, the formula of the error matrix can be deduced. It is defined over the set formed by the feature edges adjacent to a vertex point and a weight value (0-1) customized by users (0.5 by default).

Although the FEQ simplification method aims to preserve important features of the eyes, nose, mouth, and so forth, there are many more facial features than these organ features. The forehead, cheeks, and other areas also change significantly with changes in facial expressions. FEQ easily ignores these expression features, which causes a rapid increase in simplification errors during the simplification process. Therefore, this paper proposes a novel simplification method that integrates MSO and HCTM and properly takes into account the overall facial features, thereby reducing the errors generated during the simplification of facial animation.

3. Homogeneous Coordinate Transformation Matrix (HCTM)

Because 3D facial animation is composed of various different expression models, the tangent plane and normal vector of the same vertex point also vary with changes in expressions, as shown in Figure 6. In other words, the local coordinate system for the same vertex point varies in different frame models. Therefore, if the coordinate transformation system is not accurate, errors can be easily caused by the simplification of the facial animation models. For this reason, the DSD cannot accurately calculate the optimal vertex position after the simplification of vertex pairs. This paper adopts Theorems 1 and 2 and proposes HCTM to solve this problem.

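Theorem 1. The Homogeneous Coordinate Transformation Matrix that corresponds to a translation by $(t_x, t_y, t_z)$ is given by the following formula [31]:

$$T(t_x, t_y, t_z) = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Theorem 2. The Homogeneous Coordinate Transformation Matrix that corresponds to a rotation of $\theta$ about the axis in the direction of the unit vector $u = (u_x, u_y, u_z)$ is given by the following formula [31]:

$$R_u(\theta) = \begin{bmatrix} c + (1-c)u_x^2 & (1-c)u_x u_y - s u_z & (1-c)u_x u_z + s u_y & 0 \\ (1-c)u_x u_y + s u_z & c + (1-c)u_y^2 & (1-c)u_y u_z - s u_x & 0 \\ (1-c)u_x u_z - s u_y & (1-c)u_y u_z + s u_x & c + (1-c)u_z^2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

where $c = \cos\theta$ and $s = \sin\theta$.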
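The two theorems can be checked numerically with a short sketch. The function names `translation_matrix`, `rotation_matrix`, and `apply_transform` are hypothetical, and points are treated as column vectors (x, y, z, 1), which is an assumption about the convention used.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def translation_matrix(t: Sequence[float]) -> Matrix:
    """Homogeneous 4x4 matrix for a translation by t = (tx, ty, tz)."""
    tx, ty, tz = t
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_matrix(axis: Sequence[float], theta: float) -> Matrix:
    """Homogeneous 4x4 matrix for a rotation of theta about the given axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)  # make sure the axis is a unit vector
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [[c + k * ux * ux, k * ux * uy - s * uz, k * ux * uz + s * uy, 0.0],
            [k * ux * uy + s * uz, c + k * uy * uy, k * uy * uz - s * ux, 0.0],
            [k * ux * uz - s * uy, k * uy * uz + s * ux, c + k * uz * uz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def apply_transform(m: Matrix, point: Sequence[float]) -> List[float]:
    """Apply a homogeneous transform to a 3D point and return a 3D point."""
    v = [point[0], point[1], point[2], 1.0]
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(3)]


moved = apply_transform(translation_matrix((1.0, 2.0, 3.0)), (0.0, 0.0, 0.0))
turned = apply_transform(rotation_matrix((0.0, 0.0, 1.0), math.pi / 2),
                         (1.0, 0.0, 0.0))
```

Translating the origin by (1, 2, 3) yields (1, 2, 3), and a quarter turn about the z axis carries the x axis onto the y axis up to floating-point rounding.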

In this paper, HCTM sets the local coordinate space for each vertex point in the first expression model of the facial animation as the main coordinate space. Then, the vertex points in all other expression models are transformed into the corresponding local coordinate space in the first expression model.

Suppose that $v_i^f$ is the vertex point $v_i$ in frame $f$; $M_i^f$ is the Mesh formed by the adjacent vertex points of $v_i^f$; and $T_i^f$ and $N_i^f$ are the tangent plane and normal vector at $v_i^f$. HCTM must convert $M_i^f$ from its original local coordinate system to the local coordinate system in the first expression model. This process contains two steps: (i) translation of $M_i^f$ into the coordinate system that takes $v_i^1$ as the origin and $T_i^1$ and $N_i^1$ as the tangent plane and normal vector, and (ii) rotation of the tangent plane $T_i^f$ and normal vector $N_i^f$ so that they overlap with $T_i^1$ and $N_i^1$. The transformation matrix in step one is defined by Theorem 1, as in formulas (10) and (11).

The transformation matrix in step two is defined by Theorem 2, as in formula (12), in which $\theta$ is the angle between $N_i^f$ and $N_i^1$.

Integrating formulas (10) and (12), HCTM can be defined, as is shown in (13). Mesh and the error matrix can be converted from one to the other with (13), as is shown in formulas (14) and (15) as follows:

With HCTM, the same Mesh in different expression models can be converted to the local coordinate system in the first expression model, and the error calculation defined by DSD in formula (3) can be revised as shown in formula (16). This resolves the increase in error values that results from the incomplete homogeneous coordinate transformation in DSD.
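The translate-then-rotate procedure described in this section can be sketched as below. This is a hedged illustration rather than the paper's formulas (10) to (16): it assumes that the translation moves the frame vertex to the origin and that the rotation axis is the cross product of the two normal vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(a: Vec) -> List[float]:
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def rotate_about_axis(p: Vec, axis: Vec, theta: float) -> List[float]:
    """Rodrigues rotation of p about a unit axis through the origin."""
    c, s = math.cos(theta), math.sin(theta)
    k = cross(axis, p)
    d = dot(axis, p) * (1.0 - c)
    return [p[i] * c + k[i] * s + axis[i] * d for i in range(3)]


def to_reference_frame(point: Vec, vertex_f: Vec,
                       normal_f: Vec, normal_ref: Vec) -> List[float]:
    """Express a frame-f point in the local frame of the reference model."""
    # Step one: translate so that the frame-f vertex becomes the origin.
    local = [point[i] - vertex_f[i] for i in range(3)]
    # Step two: rotate the frame-f normal onto the reference normal.
    n_f, n_r = normalize(normal_f), normalize(normal_ref)
    axis = cross(n_f, n_r)
    sin_t, cos_t = math.sqrt(dot(axis, axis)), dot(n_f, n_r)
    if sin_t < 1e-12:
        if cos_t > 0.0:
            return local  # normals already aligned
        # Opposite normals: half turn about any axis perpendicular to n_f.
        helper = [1.0, 0.0, 0.0] if abs(n_f[0]) < 0.9 else [0.0, 1.0, 0.0]
        return rotate_about_axis(local, normalize(cross(n_f, helper)), math.pi)
    unit_axis = [a / sin_t for a in axis]
    return rotate_about_axis(local, unit_axis, math.atan2(sin_t, cos_t))


# A neighbour one unit along the frame normal (x axis) should end up one unit
# along the reference normal (z axis).
aligned = to_reference_frame((2.0, 1.0, 1.0), (1.0, 1.0, 1.0),
                             (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
# A neighbour lying on the rotation axis is left where it is.
on_axis = to_reference_frame((1.0, 2.0, 1.0), (1.0, 1.0, 1.0),
                             (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Once every frame is expressed in the same local frame, per-frame error matrices refer to comparable coordinates and can be summed meaningfully.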

4. Maximum Shape Operator (MSO)

To detect the shape features of the facial animation model, this paper presents a shape operator that estimates the changes in the 3D surface. In general, shape operator is mainly used to calculate the shape change of the specific tangent direction of the 3D object’s surface, as in Definition 3. However, in this paper, the shape change of the 3D object’s surface, on which Mesh lies, is estimated by the edge of the Mesh.

Definition 3. Let $M \subset \mathbb{R}^3$ be a regular surface, and let $U$ be a surface normal to $M$ defined in a neighborhood of a point $p \in M$. For a tangent vector $v_p$ to $M$ at $p$, we put

$$S(v_p) = -D_{v} U.$$

Then, $S$ is called the shape operator [32, 33].

The shape operator describes how much the normal vector field of a surface $M$ changes at a vertex point $p$ in a tangent direction $v_p$. To calculate the changes of the surface adjacent to each vertex point of the 3D model, this paper extends the shape operator to estimate and integrate the shape operator values of a vertex point and its adjacent vertex points. Suppose that vertex point $v$ has $n$ adjacent vertex points, called $v_1, v_2, \ldots, v_n$, and that its tangent vectors toward these vertex points are, respectively, $t_1, t_2, \ldots, t_n$, as shown in Figure 7. Then $S(t_1), S(t_2), \ldots, S(t_n)$, which signify the shape operator values of vertex $v$ in the $t_1, t_2, \ldots, t_n$ directions, can be generated according to the definition of the shape operator, as shown in formula (18). After the shape operator values in these different tangent directions are integrated, the surface change of the region in which the vertex lies is obtained, as shown in formula (19).

The shape operator can automatically analyze the surface changes of a facial expression model and extract important features, such as the eyes, nose, and mouth, as shown in Figure 8. This removes the drawback in FEQ that each feature region has to be defined by the user. However, facial animation is not static. As time passes, the face may produce various facial features, including nasolabial folds and forehead wrinkles. Therefore, in order to preserve facial expression features during the simplification of facial animation, this paper uses MSO to extract the facial animation features. This method calculates the shape operator value of each vertex point in all $k$ expression models (where $k$ is the number of expression models) and takes the maximum value as the eigenvalue for appearance analysis of the facial animation. The MSO is defined as follows:

$$\mathrm{MSO}(v_i) = \max_{1 \le f \le k} S_i^{f},$$

where $S_i^{f}$ is the shape operator value of vertex $v_i$ in frame $f$. Figure 9 shows the surface changes of the facial animation extracted by MSO. It can be seen that nasolabial folds are obvious in expressions of surprise, grin, laugh, and smile, but not obvious in expressions of fury, anger, rage, and sadness. If features that appear only in some expressions are ignored during the simplification of the facial animation, simplification errors can increase rapidly. With MSO, the proposed method can extract features such as nasolabial folds and thereby reduce the errors caused by the simplification of the facial animation.
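The per-vertex maximum over expression models can be sketched as follows. The discrete shape operator used here (the change in unit normal divided by edge length, averaged over neighbours) is an assumption made for illustration and is not necessarily the estimate of formulas (18) and (19); the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def shape_operator_value(vertex: Vec, normal: Vec,
                         neighbours: List[Vec],
                         neighbour_normals: List[Vec]) -> float:
    """Average rate of normal change along the edges leaving a vertex."""
    total = 0.0
    for p, n in zip(neighbours, neighbour_normals):
        edge_length = math.dist(vertex, p)
        normal_change = math.dist(normal, n)
        total += normal_change / edge_length
    return total / len(neighbours)


def maximum_shape_operator(per_frame_values: List[List[float]]) -> List[float]:
    """per_frame_values[f][i] is the value of vertex i in frame f.

    Returns, for every vertex, the maximum value over all frames.
    """
    return [max(values) for values in zip(*per_frame_values)]


# A flat neighbourhood: all normals agree, so the value is zero.
flat = shape_operator_value((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                            [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                            [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])

# A sharp fold: the neighbour's normal is perpendicular to the vertex normal.
fold = shape_operator_value((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                            [(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)])

# Two frames, two vertices: vertex 0 is only pronounced in the second frame.
mso = maximum_shape_operator([[0.0, 0.2], [1.5, 0.1]])
```

Taking the maximum rather than the mean keeps a feature that is pronounced in only one expression, such as a nasolabial fold in a smile, from being averaged away by the frames in which the region is flat.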

5. The Proposed Algorithm

The algorithm proposed in this paper is mainly based on DSD, with the introduction of MSO and HCTM to reduce the errors caused by the simplification of the face animation model. The main steps in this algorithm are as follows.

(1) Calculate the QEM of each vertex in each facial expression model by summing, over the set of triangles adjacent to the vertex in that model, the matrix formed from the plane of each triangle, as shown in formula (1).
(2) Calculate HCTM and update the QEM of each vertex in each facial expression model.
(3) Estimate the MSO of each vertex point from the shape operator of the vertex point and the vertex points adjacent to it in each facial expression model.
(4) Sum all QEMs of each vertex and import MSO into the QEM, using the MSO value of the vertex, the mean value of MSO, and the standard deviation of MSO. The original MSO value is between zero and 1,241.77, which is scattered and has some outliers. In order to balance the effect of QEM and MSO, the MSO is standardized using its mean and standard deviation, and the outliers above the cutoff are set to the maximum value of one.
(5) Calculate the minimum simplification error of each vertex pair that satisfies one of the following conditions:
(a) the pair is an edge; or
(b) the pair is not an edge, but its distance is smaller than the threshold value set by users.
(6) Choose the vertex pair with the smallest error in step (5) for simplification.
(7) Simplify the vertex pair into a single vertex point, and update the QEM of the new vertex point.
(8) Update all the information adjacent to the two vertex points of the simplified pair.
(9) Repeat steps (5) to (8) until the number of triangles has been reduced to the designated value.
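Steps (5) and (6) can be sketched in a few lines. The sketch assumes that vertices are indexed by integers and that the simplification error is supplied by a caller-provided function; the names `candidate_pairs` and `cheapest_pair`, and the placeholder cost used in the example, are hypothetical and do not reproduce the QEM and MSO weighting of step (4).

```python
import heapq
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]


def candidate_pairs(positions: List[Sequence[float]],
                    edges: List[Pair],
                    threshold: float) -> List[Pair]:
    """Pairs that are edges, or non-edges closer than the threshold."""
    pairs = {tuple(sorted(e)) for e in edges}
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < threshold:
            pairs.add((i, j))
    return sorted(pairs)


def cheapest_pair(pairs: List[Pair],
                  cost: Callable[[int, int], float]) -> Tuple[float, int, int]:
    """Return (error, i, j) for the pair with the smallest error."""
    heap = [(cost(i, j), i, j) for i, j in pairs]
    heapq.heapify(heap)
    return heapq.heappop(heap)


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.05, 0.0, 0.0), (5.0, 0.0, 0.0)]
edges = [(0, 1), (1, 3)]
pairs = candidate_pairs(positions, edges, threshold=0.1)


def placeholder_cost(i: int, j: int) -> float:
    # Stand-in for the real simplification error: squared distance.
    return math.dist(positions[i], positions[j]) ** 2


best = cheapest_pair(pairs, placeholder_cost)
```

Here vertices 1 and 2 are not connected by an edge but lie closer than the threshold, so they become a candidate pair and, under the placeholder cost, are selected first. A heap keeps the selection efficient when the loop of steps (5) to (8) is repeated many times.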

6. Experimental Results

This study used an Intel Core 2.2 GHz CPU with 1 GB RAM as the main execution environment. The eight experimental facial animation models included fury, surprise, anger, grin, laugh, rage, sadness, and smile. This experiment compared the differences between this method and DSD and FEQ and used root mean square (RMS) error and perceived quality to analyze the distortion errors caused by these methods so as to verify the superiority of this method over DSD and FEQ.

The eight original expression models adopted in this experiment are shown in Figure 10. Each model had 29,299 vertex points and 57,836 triangles. When the number of triangles in the facial animation model is reduced to 5,000, 2,000, 1,000, and 500 through DSD, the generated RMS error values are shown in Table 1.

The errors caused by DSD and by this method vary with the expression model. Take face 01 (fury) as an example: when the number of triangles in the facial animation model is reduced to 5,000, 2,000, 1,000, and 500 through DSD, the generated RMS error values are 5.7313 × 10−2, 11.3899 × 10−2, 22.9495 × 10−2, and 42.7859 × 10−2, respectively. With this method, the errors are reduced to 0.8427 × 10−2, 1.6240 × 10−2, 2.6013 × 10−2, and 4.7140 × 10−2, an improvement rate of between 85.30% and 88.98%. For the other expression models, this method is also better than DSD, with an improvement rate of more than 68.35%. In addition, the improvement rate grows as the number of triangles decreases: for face 01 (fury), it is 85.30% at 5,000 triangles and 88.98% at 500 triangles. In other words, this method is better than DSD at low triangle counts and can retain more facial shapes than DSD.

The average errors generated by all the facial expressions in the facial animation are shown in Table 2 and Figure 11. The table shows the errors generated from simplifying the entire 3D facial animation using DSD and this method, respectively. In Table 2, it can be seen that when the number of triangles is simplified to 5,000, 2,000, 1,000, and 500, the errors generated by DSD are 4.9897 × 10−2, 10.0919 × 10−2, 21.2115 × 10−2, and 38.8511 × 10−2, whereas those of this method are 1.2092 × 10−2, 2.4207 × 10−2, 3.7969 × 10−2, and 6.2438 × 10−2, which are improved by 75.77% to 83.93% compared with DSD.

In addition to the use of QEM for simplification like DSD, FEQ also estimates the simplification errors of important facial features such as the eyes, nose, mouth, and ears. Therefore, FEQ is also better than DSD in terms of simplification results. However, the features of facial animation include not only the eyes, nose, mouth, and ears, but also wrinkles on an angry forehead, protuberant cheeks, and obvious nasolabial folds on a smiling face.

The data in Table 3 and Figure 12 indicate that when the number of triangles is simplified to 5,000, 2,000, 1,000, and 500, the errors generated by FEQ are 1.2658 × 10−2, 2.7134 × 10−2, 5.2143 × 10−2, and 10.6767 × 10−2, which are better than DSD’s but are only limited to general facial features such as eyes, nose, mouth, and ears, having no obvious improvement in other expression features like wrinkles in forehead and nasolabial folds in cheeks.

The error comparison results in Table 4 indicate that when the number of triangles is simplified to 5,000, 2,000, 1,000, and 500, the respective errors generated by this method are 1.2092 × 10−2, 2.4207 × 10−2, 3.7969 × 10−2, and 6.2438 × 10−2, which are all better than FEQ's. The improvement rate is 4.47% when the number of triangles is simplified to 5,000, and it reaches 41.52% when the number is 500.
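The improvement rates quoted in this section follow from the relative reduction in error. The sketch below recomputes them from the average errors reported in the text (in units of 10−2); the function name `improvement_rate` is hypothetical.

```python
from typing import List


def improvement_rate(baseline_error: float, new_error: float) -> float:
    """Relative error reduction with respect to the baseline, in percent."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Average RMS errors (x 10^-2) at 5,000, 2,000, 1,000, and 500 triangles.
dsd_errors: List[float] = [4.9897, 10.0919, 21.2115, 38.8511]
feq_errors: List[float] = [1.2658, 2.7134, 5.2143, 10.6767]
our_errors: List[float] = [1.2092, 2.4207, 3.7969, 6.2438]

rates_vs_dsd = [improvement_rate(b, n) for b, n in zip(dsd_errors, our_errors)]
rates_vs_feq = [improvement_rate(b, n) for b, n in zip(feq_errors, our_errors)]
```

For example, at 5,000 triangles the rate against DSD is (4.9897 − 1.2092) / 4.9897 ≈ 75.77%, and at 500 triangles the rate against FEQ is (10.6767 − 6.2438) / 10.6767 ≈ 41.52%, matching the figures stated above.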

In the preservation of expression features, as shown in Figure 13, when the model is simplified to 5,000 triangular pieces, as shown in Figure 13(b), the FEQ method retains 129 triangles in the forehead wrinkled area; however, this method has 123 fully covered and 37 half-covered triangles in the same area. If we use 0.5 as a unit for each half-covered triangle, then our method obtains 141.5 triangles in the forehead wrinkled area. In other words, for this feature area, this method retains 10% more triangular pieces than the FEQ. Moreover, when the model is reduced to 1,000 triangles, as shown in Figure 13(c), the FEQ only has 15.5 triangles in the forehead wrinkled area (including 11 fully covered and 9 half-covered triangles), but this method has 28.5 triangles (20 fully covered and 17 half-covered triangles); 84% more triangles are retained compared with the FEQ.

In addition, in the cheeks and the nasolabial fold areas, as shown in Figure 14, when the number of triangles in the model is reduced to 1,000 using the FEQ, these areas retain 61 triangles. However, by using the proposed method to simplify the model, 113.5 triangles are retained (including 97 fully covered and 33 half-covered triangles), which is approximately 86% more triangles as compared to the FEQ.
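The triangle counts in the feature areas can be reproduced with the half-triangle convention described above, in which each half-covered triangle counts as 0.5. The function names are hypothetical; the input counts are those reported in the text.

```python
def effective_triangles(fully_covered: int, half_covered: int) -> float:
    """Count each half-covered triangle as 0.5 of a triangle."""
    return fully_covered + 0.5 * half_covered


def percent_more(ours: float, baseline: float) -> float:
    """How many percent more triangles are retained than the baseline."""
    return (ours / baseline - 1.0) * 100.0


# Forehead wrinkles at 5,000 triangles: FEQ retains 129.
forehead_5000 = effective_triangles(123, 37)
# Forehead wrinkles at 1,000 triangles.
forehead_1000_feq = effective_triangles(11, 9)
forehead_1000_ours = effective_triangles(20, 17)
# Cheeks and nasolabial folds at 1,000 triangles: FEQ retains 61.
cheeks_1000_ours = effective_triangles(97, 33)

gain_forehead_5000 = percent_more(forehead_5000, 129)
gain_forehead_1000 = percent_more(forehead_1000_ours, forehead_1000_feq)
gain_cheeks_1000 = percent_more(cheeks_1000_ours, 61)
```

The resulting gains are about 9.7%, 83.9%, and 86.1%, which correspond to the rounded figures of 10%, 84%, and 86% given in the text.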

From the simplified results of forehead wrinkles, cheeks, and nasolabial fold areas, it can be seen that this method retains a greater number of triangles than the FEQ. It also shows that, after simplification, this method gets closer to expressing the original facial model in these feature areas than the FEQ method.

To verify the effectiveness of our method, we also adopted perceived quality to further compare it with DSD and FEQ. Perceived quality is mainly used to compare visual differences. We used tensor-based perceptual distance measure (TPDM) [34], proposed by Torkhani et al., to evaluate the perceived quality values of the model before and after simplification. The perceived quality value is between zero and one. A value closer to one signifies that the appearance of the simplified model is very similar to the original, whereas a value closer to zero signifies that the appearance of the simplified model is very different from the original.

In Table 5, it can be seen that the perceived value generated by our method is 1.7413 × 10−1 to 3.0275 × 10−1, results that are clearly superior to DSD’s 2.4633 × 10−1 to 4.6055 × 10−1, with its improvement rate of 29.31% to 37.30%. In Table 6, it can be seen that the perceived value generated by FEQ is 1.7413 × 10−1 to 3.0275 × 10−1. Thus, this method generates a lower perceived value than FEQ and can achieve an improvement rate of 2.33% to 12.88%, as shown in Figure 15.

In addition, in terms of time cost, given that this improved method is based on DSD, it inherits benefits such as lower computation time. In order to perform the time cost analysis, this paper divided the entire implementation process into four phases, namely, the setup time, initialization time, simplification running time, and output time.

In the setup time phase, the main work is to input the original facial models, including vertex coordinates and triangle information. In the initialization time phase, this method calculates relevant information needed to simplify the model, including QEM, HCTM, and MSO. In the simplification running time phase, the method mainly records the execution time needed to simplify the model until it reaches a specific required number of triangles.

In the final output time phase, the method records the time needed to output the simplified facial model. The average time required for each phase is shown in Table 7. As is evident from the table, the method executes quickly: the overall simplification time is only about 8 seconds, of which the initialization time for HCTM and MSO is about 2-3 seconds, or about 37% of the total simplification time. The method is therefore efficient in model simplification.

7. Conclusions

In order to analyze the shape changes of facial animation and to reduce simplification errors, this paper proposed HCTM to correct the homogeneity of the local coordinate system across different models. It also adopted MSO to automatically analyze the degree of shape change in facial animation, to locate the regions with the most expression changes, and to remove the drawback that feature regions such as the eyes, nose, mouth, and ears had to be defined by users. In the experiments, RMS and perceived quality errors were used to compare the simplification results of the proposed method with those of DSD and FEQ. The experimental results show that the errors caused by this method are lower than those of DSD and FEQ. Furthermore, this method not only properly retains the facial features of fixed positions such as the eyes, nose, mouth, and ears but also preserves more triangles than the other methods in other important feature regions such as wrinkles on the forehead, cheeks, and nasolabial folds. Thus, it satisfies the requirement that the simplified facial animation should be as elaborate and natural as possible.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author thanks Professors Chassery, Wang, and Torkhani for providing the TPDM codes. Additionally, this research was supported by the Minghsin University of Science and Technology, Taiwan, under Grant MUST-104CSIE-2.