Abstract

A semantic ontology-driven hierarchical consistent segmentation algorithm was proposed to address the inconsistent segmentation of animation character models caused by pose changes. The mapping between semantic labels and local geometric features was extracted to form a segmentation ontology. During segmentation, a support vector machine (SVM) and local geometric features were used to identify semantic labels, and segmentation proceeded hierarchically as driven by these labels, which keeps the segmentation levels of animation character models consistent. Because segmentation boundaries keep approximately equal perimeters under pose changes, the Poisson equation was used to define isolines (contour lines) as candidate segmentation boundaries. This optimization makes the segmentation boundaries smooth and consistent under pose changes. In the experiments, animation character models of several classes in different poses were tested and analyzed, and consistent hierarchical segmentation was obtained. Compared with existing methods, the proposed segmentation ontology solves the problem of adaptively selecting optimization parameters for different model classes and improves segmentation quality. With the continuing development of deep learning, segmentation for animation and human pose recognition are expected to become increasingly important.

1. Introduction

In computer graphics fields such as animation and games, animation character models must be segmented into subparts so that their motion characteristics can be analyzed. Such segmentation results can also be applied to 3D animation modeling, multiview reconstruction, and other fields. Segmentation of 3D animation [1] character models has therefore attracted wide research interest.

For animation character models, it is crucial that segmentation results remain consistent under pose changes. The graphics literature offers many relevant results, which fall into two main categories. The first adopts geometric features that are insensitive to pose changes. Geodesic distance has been widely used in model segmentation because it is stable under pose changes. The shape diameter function approximates the local diameter (thickness) of the 3D model and is also stable under pose changes. Heat kernel signatures are not only stable under pose changes but also provide a theoretical framework for multiscale analysis from local to global. The second category is co-segmentation of model sequences: in recent years, researchers have exploited the similarity of shape features across multiple models to obtain consistent segmentations. Although these studies improve the consistency of segmentation results, they do not consider whether the results are semantically reasonable. To make segmentation results conform to users’ semantic perception, researchers have incorporated semantic knowledge into the segmentation process [2].

According to existing studies [3], in the segmentation of animation character models [4], prior knowledge [5] can effectively guarantee that segmentation results are hierarchically consistent. However, existing research has not discussed how to construct a normative knowledge structure that defines the parameters required for segmentation, even though these parameters strongly affect segmentation quality. Building on existing studies, this paper therefore proposes a segmentation ontology that formalizes the knowledge needed to segment animation characters. With this ontology, users can customize the segmentation hierarchy, shape features, optimization algorithm, and boundary optimization parameters of an animation character model to obtain the best segmentation. For boundary consistency, since the segmentation boundaries of a 3D model in different poses have equal perimeters, a segmentation boundary optimization method based on the Poisson equation [6] is proposed to make the segmentation boundaries more consistent.

2. Segmentation Ontology

To keep the segmentation results of 3D mesh models consistent across poses [7], domain knowledge at several levels must be used during segmentation, including the hierarchical structure of the model, the semantic labels of its subparts, the segmentation algorithm, the boundary optimization algorithm, and the optimization parameter settings. Defining a segmentation ontology yields an ontology representation of this domain knowledge, and such a standardized definition solves the following two problems.

(1) Consistent hierarchy. Models in different poses have different geometric features, so controlling the segmentation hierarchy with shape features alone leads to inconsistent hierarchies. Prior knowledge is needed to control the hierarchy and keep the segmentation results consistent.

(2) Approximately consistent location of segmentation boundaries. Even when the hierarchical structure is consistent, the consistency of the segmentation boundaries must still be ensured, because every segmentation result requires an optimization algorithm to obtain smooth boundaries. For different animation model sequences, the optimization algorithm and its matching parameters must be organized so that the optimal segmentation boundaries of a given model are consistent.

Following the definition of ontology [8], the segmentation ontology decomposes domain knowledge into two parts, the hierarchical structure and the shape descriptors required for segmentation, as shown in Figure 1. The two parts are linked by “include” and “use” relations. The “include” relation represents the inheritance of hierarchical structure during segmentation: it defines the hierarchical inheritance of semantic labels, which keeps the hierarchical structure consistent across poses.

The “use” relation supports local optimization in the segmentation process. It defines the shape feature descriptors and segmentation parameters, including the classifier, the shape features, and the parameters required for boundary optimization, which are then used to recognize semantic labels, optimize segmentation boundaries, and obtain the optimal segmentation result.

Figure 2 shows a segmentation ontology defined with the above knowledge structure and the corresponding segmentation results. The figure shows that the segmentation ontology is an extensible knowledge structure that defines both the semantic label recognition method for object segmentation and different optimization methods. As in other knowledge-driven approaches, semantic knowledge serves here as the segmentation criterion; the differences are as follows. (1) The segmentation ontology defined in this paper adopts statistical classification instead of human body proportion knowledge. Because the geometric features of the same subpart cluster together, the mapping between geometric features and semantic labels can be learned through training, whereas it is difficult to express this mapping with explicit parameters; this makes the segmentation ontology more general. (2) To address the parameter setting required for segmentation boundary optimization, the optimization method and its parameter definitions are separated into distinct domain knowledge, so that personalized parameters can be defined for different models and the optimal segmentation boundary can be obtained.

3. Semantic Ontology-Driven Hierarchical Segmentation of Animation Character Models

According to the above definition of segmentation ontology, the following steps are adopted to complete the segmentation of any model: (1) Extract external feature points and obtain the first layer of rough segmentation based on ontology knowledge; (2) Use a support vector machine (SVM) [9] to identify semantic labels of different subparts; (3) Obtain the hierarchical structure of the input model according to semantic labels and external feature points; (4) Optimize the segmentation boundary according to the optimization parameters defined by the segmentation ontology.

3.1. First Layer Rough Segmentation

Because the sum of geodesic distances attains extreme values at the external structural points (extremities) of animation character models with pronounced structural features, these points can be detected reliably with a salience function, and the resulting stable feature points are used to drive the first-layer rough segmentation. The salience function is the integral of geodesic distance over the surface of the 3D model [10]; in discrete form it is defined as

$$\psi_i = \sum_{j=1}^{N} g(p_i, p_j), \tag{1}$$

where $p_i$ and $p_j$ are mesh vertices with indices $i$ and $j$, $g(p_i, p_j)$ is the geodesic distance between the two vertices, and $N$ is the number of surface vertices of the 3D model. A mesh vertex whose salience value $\psi_i$ is greater than those of all its neighboring vertices is a candidate external feature point. Collecting these local maxima $l_i$ gives the set of local feature points

$$L = \{\, l_i \in V \mid \psi(l_i) > \psi(p_j)\ \ \forall p_j \in \mathcal{N}(l_i),\ 1 \le i \le N_L \,\}, \tag{2}$$

where $N_L$ is the number of local maxima, $\mathcal{N}(l_i)$ is the set of vertices adjacent to $l_i$, and $V$ is the set of mesh vertices. Because the salience function is very sensitive to geodesic distance, it over-detects: many local salient points are detected as well (Figure 3). The number of subparts defined in the segmentation ontology is therefore combined with k-means clustering [11] to obtain $N_E - 1$ clusters (the central cluster contains no salient points, and $N_E$ is the number of external feature points of the model). When a region contains several local salient points, the vertex $l_m$ with the largest salience value in the region is selected as the initial clustering point, and the external feature point is defined as

$$e_i = \arg\max_{l_m \in L_i} \psi(l_m), \tag{3}$$

where $L_i \subseteq L$ is the set of local salient points in the $i$-th region and $e_i$ is the detected external feature point. The set of external feature points of the model is thus $E = \{e_i \in V,\ 0 < i < N_E\}$.
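The detection step above can be sketched in Python. This is a minimal sketch under stated assumptions: the geodesic distance between vertices (written `g` in the comments) is approximated by shortest paths along mesh edges, since the text does not say which geodesic algorithm is used; the salience of a vertex is the plain sum of these distances; and the clusters passed to `pick_external_points` are assumed to come from the k-means step, which is not shown. All function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Weighted adjacency list: vertex -> [(neighbour, edge length)].
Graph = Dict[int, List[Tuple[int, float]]]


def build_graph(edges: List[Tuple[int, int, float]]) -> Graph:
    """Build an undirected mesh-edge graph from (u, v, length) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def edge_geodesics(graph: Graph, source: int) -> Dict[int, float]:
    """Dijkstra along mesh edges: an approximation (upper bound) of g(source, v)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def salience(graph: Graph) -> Dict[int, float]:
    """psi_i = sum over all vertices j of g(p_i, p_j)."""
    return {i: sum(edge_geodesics(graph, i).values()) for i in graph}


def local_maxima(graph: Graph, psi: Dict[int, float]) -> List[int]:
    """Vertices whose salience exceeds that of every adjacent vertex (the set L)."""
    return [i for i in graph if all(psi[i] > psi[j] for j, _ in graph[i])]


def pick_external_points(psi: Dict[int, float], clusters: List[List[int]]) -> List[int]:
    """Keep the most salient local maximum in each cluster of salient points."""
    return [max(cluster, key=lambda v: psi[v]) for cluster in clusters]


# Toy "character": a hub vertex 0 with three arms of two unit-length edges each.
star = build_graph([(0, 1, 1.0), (1, 2, 1.0), (0, 3, 1.0),
                    (3, 4, 1.0), (0, 5, 1.0), (5, 6, 1.0)])
psi = salience(star)
print(local_maxima(star, psi))  # [2, 4, 6]: the arm tips
```

Computing all-pairs distances this way costs one Dijkstra run per vertex, which is acceptable for illustration but not for dense production meshes.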

Because each external feature point represents a subpart of the model, the model can be divided into subparts by region growing. The smaller the salience value of a vertex, the more likely it is to belong to the rigid body part of the model. A priority queue is therefore built from the salience values of the vertices, and vertices are added in order until the convex subparts are separated into a series of disconnected regions, which completes the first layer of rough segmentation. In the implementation, to avoid the high complexity of extracting the rigid body region with multidimensional calibration, geodesic paths between external feature points are constructed first, and vertices are then tracked along these paths until the extracted rigid body region effectively separates the different subparts, at which point the algorithm terminates.
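The priority-queue description can be illustrated with a small sketch. It assumes one interpretation of the stopping rule: vertices are moved into the low-salience core in ascending salience order until the remaining vertices split into at least `n_parts` connected regions, with `n_parts` taken as the number of external feature points. It does not reproduce the geodesic-path tracking used in the actual implementation, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # vertex -> neighbouring vertices


def components(graph: Graph, nodes: Set[int]) -> List[Set[int]]:
    """Connected components of the sub-graph induced by `nodes`."""
    seen: Set[int] = set()
    parts: List[Set[int]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        part, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    part.add(v)
                    stack.append(v)
        parts.append(part)
    return parts


def rough_segment(graph: Graph, psi: Dict[int, float],
                  n_parts: int) -> Tuple[Set[int], List[Set[int]]]:
    """Grow the low-salience core until the rest splits into >= n_parts regions."""
    queue = [(psi[v], v) for v in graph]
    heapq.heapify(queue)  # priority queue keyed by salience, smallest first
    core: Set[int] = set()
    rest = set(graph)
    while queue:
        _, v = heapq.heappop(queue)
        core.add(v)
        rest.discard(v)
        parts = components(graph, rest)
        if len(parts) >= n_parts:
            return core, parts
    return core, []  # the rest never split into enough regions


star = {0: [1, 3, 5], 1: [0, 2], 2: [1], 3: [0, 4], 4: [3], 5: [0, 6], 6: [5]}
psi = {0: 9.0, 1: 12.0, 2: 17.0, 3: 12.0, 4: 17.0, 5: 12.0, 6: 17.0}
core, parts = rough_segment(star, psi, n_parts=3)
print(core, parts)  # {0} [{1, 2}, {3, 4}, {5, 6}]
```

Recomputing the components after every insertion keeps the sketch short; an incremental union-find structure would be the usual choice at mesh scale.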

3.2. Semantic Label Recognition

The shape diameter function [12] is used to define the local geometric features of each subpart. For each face, the shape diameter function constructs a cone around the inward normal at the face centroid and casts rays from the centroid into the cone. The rays are filtered according to their angle with the inward normal and a standard-deviation criterion, and the weighted average $d$ of the remaining ray lengths is taken as the value of the face; the set of these values is denoted $D$. As a result, the shape diameter function stays stable under pose changes, rotations, and similar transformations. Because it is based on ray lengths, however, it is not scale invariant. The normalized SDF [13] value of face $f$ is therefore computed as

$$\mathrm{nsdf}(f) = \frac{\log\left(\partial \cdot \dfrac{\mathrm{sdf}(f) - \min(D)}{\max(D) - \min(D)} + 1\right)}{\log(\partial + 1)}, \tag{4}$$

where $\partial$ is a normalization parameter; the logarithmic transformation enlarges small normalized SDF values, which preserves details as far as possible. Based on experimental analysis, $\partial = 4$. For the subpart containing feature point $e_i$, the normalized SDF values are used to construct the histogram feature vector

$$F_i = (\mu_1, \mu_2, \ldots, \mu_t), \tag{5}$$

where $\mu_j$ is the number of normalized SDF values in the interval $((j-1)/t,\ j/t]$ and $t$ is the number of equal bins dividing $[0, 1]$. To illustrate the intra-class clustering and inter-class separability of the SDF histogram, Figure 4 shows the histogram distributions of human arm and leg subparts [14]. In Figures 4(a)–4(c), the arm pose changes considerably, yet the resulting SDF histograms remain essentially stable, showing good intra-class clustering. This is mainly because the SDF approximates the local diameter of the model, which is stable under pose changes.
For another subpart, the leg (Figure 4(d)), the SDF histogram differs clearly from that of the arm, which allows the classifier to distinguish the semantic labels of different subparts of the human body accurately.
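The normalization and histogram construction described above can be sketched as follows. This assumes that the min and max are taken over the whole SDF set D and that a normalized value of exactly 0 is counted in the first bin (the half-open intervals would otherwise exclude it). `normalize_sdf` and `sdf_histogram` are hypothetical names, and the parameter `scale` stands for the normalization parameter set to 4 in the text.

```python
import math
from typing import List, Sequence


def normalize_sdf(values: Sequence[float], scale: float = 4.0) -> List[float]:
    """Min-max normalise the SDF set D, then log-stretch small values into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    result = []
    for v in values:
        x = (v - lo) / span if span > 0 else 0.0
        result.append(math.log(x * scale + 1.0) / math.log(scale + 1.0))
    return result


def sdf_histogram(nsdf: Sequence[float], t: int) -> List[int]:
    """mu_j = number of values in ((j - 1)/t, j/t] for j = 1..t."""
    mu = [0] * t
    for v in nsdf:
        # ceil(v * t) == j exactly when v lies in ((j - 1)/t, j/t],
        # up to floating-point rounding at the bin edges.
        j = math.ceil(v * t)
        j = min(max(j, 1), t)  # assumption: a value of exactly 0 goes into bin 1
        mu[j - 1] += 1
    return mu


raw = [1.0, 2.0, 3.0]  # hypothetical SDF set D of a tiny model
print(sdf_histogram(normalize_sdf(raw), t=4))  # [1, 0, 1, 1]
```

The middle value 2.0 normalizes to log 3 / log 5 (about 0.68), so the log stretch moves it from bin 2 into bin 3, which is the detail-preserving effect the text describes.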

Based on the local histogram of each subpart, an SVM identifies the corresponding semantic label, and the semantic label drives the next layer of segmentation. A pair $(Y_i, F_i)$ indicates that the histogram feature vector $F_i$ has semantic label $Y_i$. The binary classifier is defined as

$$\ell(F) = \operatorname{sgn}\left(\sum_{t=1}^{n} \alpha_t\, y_t\, K(F_t, F) + b\right), \tag{6}$$

where $K$ is the kernel function [15], $b$ is the intercept, $n$ is the number of support vectors, $t$ is the index of each support vector, $\alpha_t$ is the Lagrange multiplier [16], $F_t$ is the support vector, and $y_t$ is its semantic label. Because $\ell$ is only a binary classifier whereas semantic label recognition is a multiclass problem, a multiclass classifier is formed from equation (6) by pairwise combination [17]:

$$y_m = \arg\max_{y_i} V_T(y_i), \tag{7}$$

where $y_m$ is the label with the most votes and $V_T(y_i)$ is the number of votes for label $y_i$. Each binary classifier first produces a decision for the SDF feature vector, and the final semantic label is the one that receives the most votes.
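The binary decision function of equation (6) and the pairwise voting rule can be sketched with a Gaussian kernel. This is an illustrative sketch, not a trained SVM: the support vectors, multipliers, and intercepts below are hypothetical placeholders (one support vector per class, unit multipliers, zero intercept), chosen only to show how the decisions combine into votes. A zero decision value is mapped to the first label of the pair, which is an assumption.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float) -> float:
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma ** 2))


def binary_decision(F: Vector, support: List[Vector], alphas: List[float],
                    labels: List[int], b: float, sigma: float) -> int:
    """sign(sum_t alpha_t * y_t * K(F_t, F) + b), with a zero sum mapped to +1."""
    s = sum(a * y * gaussian_kernel(f_t, F, sigma)
            for f_t, a, y in zip(support, alphas, labels)) + b
    return 1 if s >= 0 else -1


Classifier = Callable[[Vector], int]


def vote(F: Vector, pairwise: Dict[Tuple[str, str], Classifier]) -> str:
    """One-vs-one voting: each pair classifier votes, the most-voted label wins."""
    votes: Counter = Counter()
    for (pos, neg), clf in pairwise.items():
        votes[pos if clf(F) > 0 else neg] += 1
    return votes.most_common(1)[0][0]  # ties: the label first counted wins


def toy_pair(p: Vector, q: Vector, sigma: float = 1.0) -> Classifier:
    """Hypothetical, untrained pair classifier with one support vector per class."""
    return lambda F: binary_decision(F, [p, q], [1.0, 1.0], [1, -1], 0.0, sigma)


# Hypothetical 2-bin histogram prototypes for three semantic labels.
protos = {"arm": [5.0, 1.0], "leg": [1.0, 5.0], "tail": [3.0, 3.0]}
pairwise = {(p, q): toy_pair(protos[p], protos[q]) for p, q in combinations(protos, 2)}
print(vote([4.8, 1.2], pairwise))  # arm
```

With k labels, one-vs-one combination trains k(k - 1)/2 binary classifiers; for the three labels here that is three.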

3.3. Equal Perimeter Boundary Segmentation Optimization

The above process yields a hierarchical segmentation based on ontology knowledge, but one major problem remains: the segmentation boundaries are rough and jagged and need to be optimized. Because the perimeter of a character model at a segmentation boundary stays equal under pose changes, this paper uses the Poisson equation to define the optimization algorithm. Taking isolines of the Poisson solution as candidate boundaries effectively avoids the influence of local curvature noise on the segmentation boundary, and the optimization parameters defined by the segmentation ontology are used to select the optimal boundary. Figures 5(a) and 5(d) compare the arm segmentation boundary before and after optimization.

3.3.1. Fuzzy Region Extraction

The fuzzy region is determined by the feature point and the boundary vertices of a specific part. The farthest point is defined as

$$p_f = \arg\max_{p \in B_R} g(e_i, p), \tag{8}$$

where $B_R$ is the set of vertices of the local region in which the feature point $e_i$ is located, and $g(\cdot, \cdot)$ denotes geodesic distance. For a given threshold parameter $T_r$, the fuzzy region is obtained as shown in Figure 5(b); every vertex $p_i$ in the fuzzy region must satisfy the threshold condition defined by $T_r$.

3.3.2. Isoline Extraction

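Within the fuzzy region, let $U$ be the set of vertices on the upper edge (far from the feature point $e_i$) and $K$ the set of vertices on the lower edge (close to $e_i$). The Poisson equation is defined as

$$\Delta \phi = 0, \tag{10}$$

where $\Delta$ is the Laplace operator, subject to the boundary constraints

$$\phi(u) = 1 \ \ \forall u \in U, \qquad \phi(k) = 0 \ \ \forall k \in K. \tag{11}$$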

Solving the resulting linear system yields a series of adjacent smooth isolines $\{I_1, \ldots, I_i, \ldots, I_z\}$, where $z$ is the number of isolines (candidate segment lines).
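A minimal sketch of the isoline step follows. It assumes the field is harmonic (Laplacian equal to zero) with value 1 on the upper-edge set U and 0 on the lower-edge set K, discretized with a uniform graph Laplacian and solved by Gauss-Seidel sweeps; meshes often use cotangent weights instead, and the text specifies neither the boundary values nor the discretization. Evenly spaced isovalues are also an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # vertex -> neighbouring vertices


def harmonic_field(graph: Graph, upper: Set[int], lower: Set[int],
                   tol: float = 1e-12, max_iter: int = 100_000) -> Dict[int, float]:
    """Discrete Laplace equation with phi = 1 on `upper`, phi = 0 on `lower`.

    Uniform (umbrella) Laplacian: each free vertex becomes the mean of its
    neighbours, iterated with Gauss-Seidel sweeps until the update is tiny.
    """
    phi = {v: (1.0 if v in upper else 0.0) for v in graph}
    free = [v for v in graph if v not in upper and v not in lower]
    for _ in range(max_iter):
        change = 0.0
        for v in free:
            new = sum(phi[u] for u in graph[v]) / len(graph[v])
            change = max(change, abs(new - phi[v]))
            phi[v] = new
        if change < tol:
            break
    return phi


def isoline_crossings(graph: Graph, phi: Dict[int, float],
                      level: float) -> List[Tuple[int, int, float]]:
    """Edges (u, v) crossed by the isoline phi = level, with the parameter t
    such that the crossing point is (1 - t) * p_u + t * p_v."""
    hits = []
    for u in graph:
        for v in graph[u]:
            if u < v and (phi[u] - level) * (phi[v] - level) < 0:
                hits.append((u, v, (level - phi[u]) / (phi[v] - phi[u])))
    return hits


# Path 0-1-2-3 with lower edge K = {0} and upper edge U = {3}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
phi = harmonic_field(path, upper={3}, lower={0})
z = 3
levels = [i / (z + 1) for i in range(1, z + 1)]  # assumption: evenly spaced isovalues
print([round(phi[v], 6) for v in path])  # [0.0, 0.333333, 0.666667, 1.0]
for level in levels:
    print(level, isoline_crossings(path, phi, level))
```

Vertices whose value equals the isovalue exactly are skipped by the strict sign test; a fuller implementation would handle that case and chain the crossing points of each level into a closed loop.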

3.3.3. Optimal Segment Line Selection

On the basis of the extracted isolines, the optimal segment line is selected by considering both position information and shape characteristics through an optimal decision function with two terms. The first term is the position constraint, which draws the optimal segment line toward the middle position. The second term is the shape feature constraint, which places the optimal segment line in a concave region, that is, makes its perimeter smaller than that of the isoline $K$ of the adjacent region. Here $r_i = m_i/(2\pi)$ is the local radius of isoline $I_i$, where $m_i$ is its perimeter, and $R(k)$ is a Gaussian function used to penalize isolines with large perimeters.
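The local radius used in the shape term, r_i = m_i / (2π), is the radius of a circle whose circumference equals the isoline perimeter. The sketch below computes it for an isoline given as an ordered, closed loop of 3D points (an assumption about how crossing points are chained); the position term and the Gaussian penalty R(k) are not sketched because their exact forms are not given here. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def loop_perimeter(points: Sequence[Point]) -> float:
    """m_i: length of a closed polyline; the last point connects back to the first."""
    n = len(points)
    return sum(math.dist(points[k], points[(k + 1) % n]) for k in range(n))


def local_radius(points: Sequence[Point]) -> float:
    """r_i = m_i / (2 * pi): radius of the circle with the same perimeter."""
    return loop_perimeter(points) / (2.0 * math.pi)


def circle(radius: float, n: int) -> List[Point]:
    """Sample a circular cross-section (e.g. of a limb) as an ordered loop."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n), 0.0) for k in range(n)]


square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(loop_perimeter(square))  # 4.0
print(round(local_radius(circle(2.0, 1000)), 3))  # 2.0
```

For a densely sampled circular cross-section the computed radius approaches the true radius, which is why the perimeter-based radius is a natural measure of limb thickness at a joint.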

4. Experimental Analysis and Discussion

The experimental data come from the TOSCA database [18] and include animation character models of humans, horses, lions, and centaurs. First, the parameters and performance of the classifier used for semantic labels are analyzed. Second, the stability of the algorithm is verified. Finally, the method is compared with existing similar algorithms.

4.1. Classifier Parameter Selection and Performance Analysis

The kernel function maps linearly inseparable vectors into a higher-dimensional space, where the classification surface is then constructed. This paper selects the Gaussian kernel function, which has a wide convergence domain, good performance, and few parameters [19]:

$$K(F_i, F_j) = \exp\left(-\frac{\lVert F_i - F_j \rVert^2}{2\sigma^2}\right). \tag{14}$$

The kernel parameter $\sigma$ and the error penalty parameter $C$ strongly affect the performance of the classifier. A grid search is used to find the optimal parameters, with both parameters ranging over $[2^{-10}, 2^{10}]$; Figure 6 shows the resulting distribution of classifier recognition accuracy. At $(\log_2 C, \log_2 \sigma) = (-1, -5)$, the accuracy reaches 100%. The optimal parameters of the four model classes are selected in the same way. Table 1 lists the classifier parameters and recognition rates of the different model classes, where the recognition rate is

$$\text{recognition rate} = \frac{RS}{TS} \times 100\%, \tag{15}$$

and $RS$ and $TS$ are the number of correctly recognized samples and the number of test samples, respectively. On the test set, only the centaur model had a recognition rate below 100% (92.59%); all other models reached 100%, indicating correct recognition. These results show that semantic labels can be recognized by statistical classification. The misidentifications arise mainly because the SDF features do not separate the centaur's legs from its tail sufficiently, so part of the tail is misidentified. Follow-up work should consider more discriminative features to further improve recognition accuracy.
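The grid search over powers of two can be sketched as below. In practice the evaluation function would train and validate an SVM for each (C, σ) pair; here it is a hypothetical stand-in (`toy_accuracy`) whose peak is placed at (log2 C, log2 σ) = (-1, -5) purely to exercise the search, not to reproduce Figure 6. The recognition rate helper follows the RS/TS definition in the text.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def grid_search(evaluate: Callable[[float, float], float],
                exponents: Iterable[int] = range(-10, 11)) -> Tuple[int, int, float]:
    """Try every (C, sigma) = (2^a, 2^b); return (log2 C, log2 sigma, accuracy) of the best."""
    exps = list(exponents)
    best: Optional[Tuple[int, int, float]] = None
    for log2_c in exps:
        for log2_sigma in exps:
            acc = evaluate(2.0 ** log2_c, 2.0 ** log2_sigma)
            if best is None or acc > best[2]:  # strict '>' keeps the first best pair
                best = (log2_c, log2_sigma, acc)
    if best is None:
        raise ValueError("exponent grid must not be empty")
    return best


def recognition_rate(rs: int, ts: int) -> float:
    """Correctly recognized samples RS over test samples TS, in percent."""
    return 100.0 * rs / ts


def toy_accuracy(c: float, sigma: float) -> float:
    """Hypothetical accuracy surface peaking at (log2 C, log2 sigma) = (-1, -5)."""
    return 1.0 - (abs(math.log2(c) + 1) + abs(math.log2(sigma) + 5)) / 40.0


print(grid_search(toy_accuracy))  # (-1, -5, 1.0)
```

If a library such as scikit-learn is used for the SVM, note that its RBF kernel is parameterized by gamma rather than σ, with gamma = 1 / (2σ²) for the kernel form above.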

4.2. Stability Analysis

Figure 7 shows the segmentation results of various animation character models in different poses; the same gray level denotes the same semantic label. Although the pose differences produced by these character models during motion are large (for example, in Figures 7(h) and 7(i)), the segmentation results are consistent. On the one hand, the shape diameter function is pose stable and the histogram representation is insensitive to local changes, which enables the SVM to identify semantic labels correctly. On the other hand, the joint perimeter remains unchanged during pose changes, and because the proposed algorithm uses this equal-perimeter property in boundary optimization, the segmentation boundaries are also highly consistent.

4.3. Comparative Analysis of Segmentation Consistency

To further verify the performance of the algorithm, its segmentation quality is compared with that of the method in [17]. Segmentation quality depends mainly on two aspects: (1) whether the segmentation results are globally consistent in semantic structure, and (2) whether the segmentation boundaries of a model sequence are consistent. Let the segmentation result of frame $X$ be

$$S_X = \{(F_x, C_x)\}, \tag{16}$$

where $F_x$ is the face index and $C_x$ is the segment index assigned to each face. The consistency similarity $YF$ of a segmentation sequence is computed from $C_x - C_y$, the difference between the segmentation results of the two models, with $GT$ counting the number of elements; this assumes that the two models have the same numbers of vertices and faces and that corresponding vertices and faces of every subpart share the same indices in both models. Equation (17) measures the segmentation consistency between two models well: the larger $YF$, the higher the consistency. The similarity between each of the $M$ frames and the reference segmentation result $S_f$ is computed, and the average of these $M$ similarities is taken as the consistency evaluation index $\xi$ of the model:

$$\xi = \frac{1}{M} \sum_{x=1}^{M} YF(S_x, S_f). \tag{18}$$
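A sketch of the consistency evaluation follows. It assumes one natural reading of YF: the share of faces whose segment indices agree (C_x - C_y equal to zero), which satisfies "larger means more consistent" but is not guaranteed to match the exact formula of equation (17). It also relies on the stated assumption that corresponding faces share indices and that equal segment indices denote the same semantic part. Function names and data are hypothetical.

```python
from typing import Sequence


def yf(c_x: Sequence[int], c_y: Sequence[int]) -> float:
    """Assumed YF: fraction of faces whose segment indices agree (C_x - C_y == 0)."""
    if len(c_x) != len(c_y) or not c_x:
        raise ValueError("models must have the same, non-zero number of faces")
    differing = sum(1 for a, b in zip(c_x, c_y) if a - b != 0)
    return 1.0 - differing / len(c_x)


def xi(frames: Sequence[Sequence[int]], reference: Sequence[int]) -> float:
    """Consistency index: mean YF of the M frames against the reference S_f."""
    return sum(yf(c, reference) for c in frames) / len(frames)


reference = [0, 0, 1, 1]               # hypothetical per-face segment indices of S_f
frames = [[0, 0, 1, 1], [0, 0, 1, 2]]  # M = 2 hypothetical frames
print(xi(frames, reference))  # 0.875
```

The second frame disagrees with the reference on one face out of four, giving YF = 0.75, and averaging with the perfectly consistent first frame gives 0.875.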

Table 2 lists the consistency results obtained for different models against the reference models. The method in [17] uses only the angle between the normals of adjacent faces as the optimization constraint, so it can hardly guarantee that the segmentation boundary converges to the joint position. In contrast, this paper extracts boundaries from equal-perimeter isolines, which are both very smooth and more consistent. In addition, using human body proportions as semantic knowledge for semantic label recognition is highly restrictive and applies only to human models. This paper instead constructs a trained semantic label recognizer, so hierarchical consistent segmentation of different model classes can be obtained simply by updating the training data. Overall, the proposed method is more general and more widely applicable.

Figure 8 compares the segmentation results of the methods discussed above, including a method that clusters SDF features directly. SDF features cluster well and yield similar segmentation results for character models in different poses; however, this segmentation does not conform to the semantic structure, as shown in Figure 8(c). The method in [17] correctly identifies the semantic structure of the human body, but it uses human body proportions as domain knowledge and therefore lacks generality. Moreover, although it also produces relatively smooth boundaries, their convergence positions are clearly inconsistent. For example, in the second-layer segmentation of the legs of the human model, that method relies mainly on the angle between triangles as its constraint, so its boundaries converge inconsistently. The boundaries of the proposed method converge essentially to the knee, because perimeter is used as the convergence condition for the segmentation boundary; as Table 2 shows, relatively consistent results are obtained.

5. Conclusion

Experiments show that selecting the optimal parameters through the ontology significantly improves segmentation quality. The main limitations of this study are that the proposed algorithm cannot be applied stably and reliably to animation character models with severe occlusion, and that the segmentation boundary needs further optimization. Further research is needed in the following two directions.

(1) Consistent segmentation of 3D dynamic data under occlusion. Partially occluded 3D data suffer from data loss, in which case both local features and subpart structure may change significantly. For such data, more stable local features must be developed on the basis of existing studies for semantic label recognition of subparts, and a rejection rate must be defined to remove unstable subparts.

(2) Collaborative boundary optimization. Although equal perimeters already yield fairly consistent segmentation results, the proposed method does not consider the correlation between models when optimizing boundaries. Future work could define collaborative boundary optimization algorithms over multiple models and obtain more consistent segmentation boundaries through clustering analysis.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Acknowledgments

This work was supported by the Fujian Province Social Science Planning Project “Fujian Folk Animation Research” (Grant no. FJ2021B195) and by the Sichuan Animation Research Center, a Key Research Base of Social Sciences of Sichuan Province, project “Sichuan Animation Industry Chain” (Grant no. DM2020002).