International Journal of Digital Multimedia Broadcasting
Volume 2019, Article ID 6816453, 9 pages
https://doi.org/10.1155/2019/6816453
Research Article

Projection Analysis Optimization for Human Transition Motion Estimation

Department of Computer Science, Guangdong University of Education, Guangzhou, Guangdong 510303, China

Correspondence should be addressed to Wanyi Li; luther1212@163.com

Received 7 November 2018; Revised 18 March 2019; Accepted 9 April 2019; Published 2 June 2019

Academic Editor: Jintao Wang

Copyright © 2019 Wanyi Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Estimating human transition motion without specialized software is a difficult task. Three-dimensional (3D) human motion animation is widely used in video games, movies, and other media, and human transition motion is necessary when making such animation. A method that generates the transition motion automatically would reduce production time and improve working efficiency. Thus a new method called latent space optimization based on projection analysis (LSOPA) is proposed to estimate the human transition motion. LSOPA is carried out with the assistance of Gaussian process dynamical models (GPDM); it builds an objective function to optimize the data in the low-dimensional (LD) space, and the optimized LD data are used to generate the human transition motion. LSOPA enables the GPDM to learn the high-dimensional (HD) data and estimate the needed transition motion. The performance of LSOPA is evaluated in the experiments.

1. Introduction

Three-dimensional (3D) human motion animation is applied in many fields, such as video games and movies. Estimating human transition motion is necessary for making many kinds of 3D animation [1–4] and is crucial to producing smooth 3D human motion animation; it is a branch of human motion estimation. Recent advanced methods for human motion estimation include multiview image segmentation [5], sparse representation [6], and a convolutional neural network (CNN) coupled with a geometric prior [7]. These methods focus on reconstructing 3D human motion from a 2-dimensional (2D) image sequence. The needed data are the high-dimensional (HD) data of the 3D human motion model, and a mapping is built between the model and the 2D image for each frame. However, it is difficult to build this mapping without overcomplete prior information, because of the data complexity during optimization. Some human poses contain limb ambiguity; for example, it is hard to determine from a silhouette which thigh is in front, since the right and left thighs cannot be distinguished. If we have enough prior information to resolve such ambiguity, the reconstruction can be achieved easily. Thus a generative model of 3D human motion is necessary: samples drawn from the model can be used to construct the prior information, and the model can also generate 3D human motion for making 3D character movies. Such a generative model can be built through unsupervised learning, which is a necessary complement to the advanced methods above. In this paper, how to generate the human transition motion is discussed in the following sections.

If there were a method that could estimate a valid human transition motion, animation making would take less time and the work would be easier. Thus a new method called latent space optimization based on projection analysis (LSOPA) is proposed to estimate the human transition motion. LSOPA is combined with Gaussian process dynamical models (GPDM) [8] to process the low-dimensional (LD) data. GPDM is derived from several dimension reduction models [9–13] and can provide predictions of the LD data. After dimension reduction, the LD data are optimized by LSOPA so that a valid human transition motion can be generated to achieve the estimation. The human motion is described by high-dimensional (HD) data. If an HD data sample is searched for directly in its own high-dimensional space, invalid data will be generated, meaning the generated 3D human motion will be abnormal [14]. GPDM is an unsupervised learning model: it can learn the HD data samples and estimate new ones, but it processes the LD data in the LD space. In the LD space, the LD data can be searched, and the corresponding valid HD data can be generated by the mapping from LD data to HD data. Some methods [15–17] can also process the LD data, but the LD data they generate are unreliable and nondeterministic because of the randomization in these methods. LSOPA processes the LD data better and ensures that a valid transition motion is generated. The performance of LSOPA is tested by the experiments in the corresponding section.

The human motion is described by a 3D human motion model with markers that indicate the limbs; each pose of the model is an HD data sample. When we make a 3D human motion animation and only have samples of two unrelated human motions, a transition motion is needed to connect them so that a complete and smooth 3D human motion is constructed. Meanwhile, the transition motion consists of many poses, each an HD data sample of the 3D human model, so estimating a valid transition motion is a challenging task. LSOPA takes advantage of the GPDM to generate a valid transition motion while avoiding invalid poses. The proposed method is discussed in the following sections.

2. Dimension Reduction

When we have a sequence of HD data samples, the corresponding LD data can be obtained from the equations of GPDM [8]. Equations (1)-(7) are used to compute (8), from which X and the other parameters are obtained. In (1) and (2), we have the two sequences, where Y denotes the HD data samples of the 3D human motion and X denotes the LD data of Y in the LD space after dimension reduction. Equations (3) and (4) compute the two kernel matrices from their corresponding kernel parameters, which satisfy the relations of (6) and (7), respectively. A scale diagonal matrix with a preset parameter is also used, the first latent point x1 follows a Gaussian distribution, and the dynamics are defined over the remaining latent sequence. The mapping from LD data to HD data can then be built as in (9).
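The equation images for (1)-(9) did not survive extraction. For reference, in the standard GPDM formulation of [8], from which these equations are drawn, the mean mapping from a latent point to a pose, presumably the content of (9), takes the following form; the notation here is a reconstruction, not the paper's original:

```latex
% GP predictive mean used as the LD-to-HD mapping (standard GPDM form):
y(x) = \mu_Y + Y^{\top} K_Y^{-1} k_Y(x)
% K_Y: kernel matrix over the latent training points X
% k_Y(x): vector of kernel values between x and those training points
% \mu_Y: mean of the HD training samples Y
```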

The GPDM has a dynamic process. It can predict the LD data in the latent space (also called LD space) and generate the needed HD data of human motion, so that it has better performance than other dimension reduction models. Thus GPDM can be selected to learn the samples of the two different human motions; then the LD space can be built to find the needed LD data of the transition motion, so that corresponding poses can be generated through the mapping of (9).
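As a numerical illustration of this LD-to-HD mapping, a minimal sketch with an RBF kernel and the GP predictive mean; the function names, kernel width, and noise level below are our assumptions, not the paper's:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0, noise=1e-6):
    """K_ij = exp(-gamma/2 * ||a_i - b_j||^2); a small noise term is added
    on the diagonal when A and B are the same point set (assumed values)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * gamma * d2)
    if A.shape == B.shape and np.allclose(A, B):
        K = K + noise * np.eye(len(A))
    return K

def gp_mean_map(x_star, X, Y, gamma=1.0):
    """Map one LD point x_star to an HD pose via the GP predictive mean
    y(x*) = k(x*, X) K^{-1} Y, in the spirit of the GPDM mapping of [8]."""
    K = rbf_kernel(X, X, gamma)
    k_star = rbf_kernel(x_star[None, :], X, gamma)  # shape (1, N)
    return (k_star @ np.linalg.solve(K, Y))[0]
```

At a training point the mean prediction essentially reproduces the training pose, which is a quick sanity check that the mapping from LD to HD data is valid.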

3. Latent Space Optimization Based on Projection Analysis

3.1. The LD Data of the Transition Motion in the LD Space

After the dimension reduction, the HD data samples of the two unrelated human motions can be seen in the LD space. The LD data of the two different sequences denote the corresponding poses of the two unrelated human motions, and there is an obvious distance between the two sets of LD data, as Figure 1 shows. From the two sets of LD data, we know the needed transition motion can be generated through the LD space, but an appropriate curve must be constructed to connect the two sets of LD data in the LD space. Thus, the optimization can begin.

Figure 1: The LD data of the two unrelated human motions.
3.2. Optimization in the LD Space

Assume that the two sequences of LD data obtained after dimension reduction correspond, respectively, to the two unrelated human motions, the first motion with its number of frames and the second with its own. The positions of the corresponding LD data can be seen in Figure 2, from which the derivations of (10)-(18) are obtained. In (10), the distance from an LD data sample to its projection A is defined, and the projected vector is the projection of the original vector in the chosen plane. Given a set of bases of the plane and a vector outside the plane, the projective operation yields the projection of a vector onto the plane, while the complement of the projective operation yields the component perpendicular to that projection. In (12) and (13), the complement of the projection vector is perpendicular to the plane; thus (14) can be obtained from (13). In (14), the projected vector lies in the plane, and (15) can be deduced from this. In (15), the projection is denoted by the bases and their coefficients. Substituting (15) into (14), (16) is deduced; simplifying (16) and computing the least-squares estimator of the coefficients gives (17). Substituting (12) and (17) into (15), (18) is obtained. Throughout (15)-(18), the bases are assumed to be linearly independent.
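The projection algebra of (12)-(18) can be sketched numerically. Below, the plane is spanned by the columns of a basis matrix, the coefficients come from the least-squares estimator as in (17), and the complement is the component perpendicular to the plane; the variable names are ours:

```python
import numpy as np

def project_onto_plane(H, E):
    """Split a vector H into its projection on the plane spanned by the
    columns of E (assumed linearly independent, as in (15)-(18)) and the
    perpendicular complement of that projection."""
    lam, *_ = np.linalg.lstsq(E, H, rcond=None)  # least-squares coefficients, cf. (17)
    proj = E @ lam                               # projection, lies in the plane
    perp = H - proj                              # complement, perpendicular to the plane
    return proj, perp
```

The complement is orthogonal to every basis vector of the plane, which matches the perpendicularity used to derive (14).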

Figure 2: The analysis of the optimized models.

Our task is to find the optimal LD data sequence of the transition motion between the two sets of LD data; thus we build constraint functions to describe the position of the needed LD data sequence. Equation (19) confirms the projection distance along the chosen vector for each LD data sample; (20) confirms the distance between consecutive samples; and in (21), a preset vector C in the plane serves as the reference for computing the angle, together with a vector perpendicular to the plane. In (19)-(21), the LD data of the transition motion are described in the LD space, with preset values adjusting the distances and a preset angle describing the deviation of the LD data positions. The projections and corresponding vectors are then used to build these constraint functions, so that an objective function with constraints is obtained in (22). The optimized objective function of (22) ensures that the frames of the transition motion vary regularly and that a visually valid transition motion is produced. The solution of (22) can be obtained by the interior-point method in [18].
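The exact objective and constraints of (19)-(22) are not recoverable from the extracted text. As a toy stand-in for the idea, the sketch below finds a transition path in the LD space between two endpoint poses by minimizing a smoothness objective with the endpoints fixed, using plain gradient descent rather than the interior-point solver of [18]; everything here is an illustrative assumption:

```python
import numpy as np

def optimize_transition(x_start, x_end, n_frames, iters=2000, lr=0.2):
    """Find n_frames LD points between x_start and x_end by minimizing
    sum_t ||x_{t+1} - x_t||^2 with both endpoints held fixed (a toy
    stand-in for the constrained objective (22))."""
    rng = np.random.default_rng(0)
    path = np.linspace(x_start, x_end, n_frames + 2)
    path[1:-1] += rng.normal(scale=0.5, size=path[1:-1].shape)  # deliberately bad start
    for _ in range(iters):
        # gradient of the smoothness objective w.r.t. the interior points only
        grad = 2.0 * (2.0 * path[1:-1] - path[:-2] - path[2:])
        path[1:-1] -= lr * grad
    return path
```

With this objective the optimum is the evenly spaced straight line between the endpoints; the real (22) adds projection-distance and angle constraints so the path bends toward valid poses instead of a straight line.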

3.3. The Procedure of the LSOPA

When the samples of the two unrelated human motions are obtained, they are denoted by the two HD data sequences. Then, we have the following.

LSOPA
(1) Initialize the LD data with principal component analysis (PCA), and preset the required parameter values.
(2) Preset the kernel parameters; then compute and update the kernel matrices and obtain the model parameters.
(3) Compute the constraint functions in (22) and obtain the optimized LD data of the transition motion.
(4) Compute the transition motion and output the 3D transition motion model. Finally, recompute and update the remaining parameters, and end the whole procedure.
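Step (1) above initializes the LD data with PCA; a minimal sketch of that initialization via SVD follows (the function name and latent dimension are our choices):

```python
import numpy as np

def pca_init(Y, d=3):
    """Initialize the LD data X for LSOPA step (1): project the HD motion
    samples Y (one pose per row) onto their top-d principal components."""
    Yc = Y - Y.mean(axis=0)                        # center the HD samples
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt[:d].T                           # N x d latent coordinates
```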

The method LSOPA can be written as pseudocode in Algorithm 1 for the transition motion estimation.

Algorithm 1: The brief procedure of LSOPA.

4. Experiment and Evaluation

We test the proposed LSOPA method on visual quality and estimation error. We choose line initialization [8] (LI) and random initialization [17] (RI) as the compared methods. LI means that the LD data of the transition motion are obtained linearly between the LD data of the two different motions; RI means that they are obtained randomly. The LD data samples (green ones) denoting the transition motion can be seen in Figures 3(a)-3(c). The tested transition motion has a total of 15 frames, and the test data come from the Carnegie Mellon University (CMU) motion capture database, abbreviated as the CMU database.

Figure 3: The LD data samples of the LSOPA, LI, and RI.

The three methods can each estimate a corresponding transition motion, but the results are obviously different. The visual effect can be seen in Figures 4(a)-4(c). In the estimation of the transition motion between walking and playing golf, the transition motion generated by LSOPA is smooth and natural, while the transition motions generated by the other two methods are invalid and messy, obviously abnormal human transition motions. In short, the transition motions of LI and RI do not resemble playing golf. This demonstrates that LSOPA has the best visual performance among the three methods.

Figure 4: The visual evaluation of the methods.

Then, the error and mean error of estimating the transition motion are tested. The error computation follows [19], but the scale of the markers in the 3D model must be considered. The error is computed as in (23): the distance between the true position and the estimated position of each joint marker in the 3D model is averaged, where Mar is the number of markers in the 3D model.
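Reading (23) as the mean Euclidean distance between true and estimated marker positions, in the spirit of the metric of [19], a minimal sketch:

```python
import numpy as np

def pose_error(true_markers, est_markers):
    """Per-frame error as in (23): mean Euclidean distance between the true
    and estimated 3D positions over the Mar joint markers (one row each)."""
    return np.linalg.norm(true_markers - est_markers, axis=1).mean()
```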

Four transition motions are tested: walking to playing golf, walking to dancing, walking to swimming, and walking to playing football. The contrast between the true data and the estimated data can be seen in Figures 5(a)-5(c). The error of each frame in the transition motion generated by LSOPA is the lowest among the three methods, and so is the mean error. Although the errors of some frames are close across methods, the error of LSOPA remains low and stable on the whole. The test results also show that LSOPA has the best performance in the error test.

Figure 5: The error of several transition motions.

5. Conclusion

LSOPA is proposed to solve the problem of estimating the human transition motion. It is tested in visual and error experiments, where its performance is the best among the three methods. However, LSOPA still cannot process some complex human transition motions well, and its optimization cannot be initialized randomly; thus LSOPA needs to be improved in future research. The improved method should carry out unsupervised learning of two complex unrelated human motions and estimate a smooth and valid transition motion between them. The 3D human motion model will also be replaced by other refined models [20–22] in future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work is supported by the University Young Creative Talent Project of Guangdong Province (no. 2016KQNCX111); Science and Technology Plan Project of Guangzhou (no. 201804010280); National Natural Science Foundation of China (no. 61772140); Science and Technology Planning Project of Guangdong Province (no. 2017A010101021); Appropriative Researching Fund for Professors and Doctors, Guangdong University of Education (no. 2015ARF17, no. 2015ARF25); Key Disciplines Of Network Engineering of Guangdong University of Education (no. ZD2017004); and Computer Practice Teaching Demonstration Center of Guangdong University of Education (no. 2018sfzx01).

References

  1. Z. Liu, L. Zhou, H. Leung et al., “High-quality compatible triangulations and their application in interactive animation,” Computers and Graphics, vol. 76, pp. 60–72, 2018.
  2. A. Sujar, J. J. Casafranca, A. Serrurier et al., “Real-time animation of human characters' anatomy,” Computers and Graphics, vol. 74, pp. 268–277, 2018.
  3. B. Walther-Franks and R. Malaka, “An interaction approach to computer animation,” Entertainment Computing, vol. 5, no. 4, pp. 271–283, 2014.
  4. D. Archambault and H. C. Purchase, “Can animation support the visualisation of dynamic graphs?” Information Sciences, vol. 330, pp. 495–509, 2016.
  5. L. Yebin, G. Juergen, S. Carsten et al., “Markerless motion capture of multiple characters using multiview image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2720–2735, 2013.
  6. X. Zhou, M. Zhu, S. Leonardos, and K. Daniilidis, “Sparse representation for 3D shape estimation: a convex relaxation approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1648–1661, 2017.
  7. X. Zhou, M. Zhu, G. Pavlakos, S. Leonardos, K. G. Derpanis, and K. Daniilidis, “MonoCap: monocular human motion capture using a CNN coupled with a geometric prior,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 4, pp. 901–914, 2019.
  8. J. M. Wang, D. J. Fleet, and A. Hertzmann, “Gaussian process dynamical models for human motion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 283–298, 2008.
  9. S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
  10. N. Lawrence, “Probabilistic non-linear principal component analysis with Gaussian process latent variable models,” Journal of Machine Learning Research, vol. 6, pp. 1783–1816, 2005.
  11. C. H. Ek, Shared Gaussian Process Latent Variables Models, Oxford Brookes University, Oxford, UK, 2009.
  12. J. Xinwei, G. Junbin, W. Tianjiang et al., “Supervised latent linear Gaussian process latent variable model for dimensionality reduction,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1620–1632, 2012.
  13. J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
  14. W.-Y. Li and J.-F. Sun, “Human motion estimation based on gaussion incremental dimension reduction and manifold boltzmann optimization,” Acta Electronica Sinica, vol. 45, no. 12, pp. 3060–3069, 2017.
  15. W. Li, J. Sun, X. Zhang, and Y. Wu, “Spatial constraints-based maximum likelihood estimation for human motions,” in Proceedings of the 2013 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC 2013), pp. 1–6, IEEE, Kunming, China, 2013.
  16. W. Li, J. Sun, and Z. Song, “Manifold latent probabilistic optimization for human motion fitting based on orthogonal subspace searching,” Journal of Information and Computational Science, vol. 11, no. 15, pp. 5357–5365, 2014.
  17. W. Li, J. Sun, and X. Zhang, “Learning for human transition motions estimation based on feature similarity optimization,” Journal of Computational Information Systems, vol. 10, no. 5, pp. 2127–2136, 2014.
  18. R. H. Byrd, M. E. Hribar, and J. Nocedal, “An interior point algorithm for large-scale nonlinear programming,” SIAM Journal on Optimization, vol. 9, no. 4, pp. 877–900, 1999.
  19. L. Sigal, A. O. Balan, and M. J. Black, “HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion,” International Journal of Computer Vision, vol. 87, no. 1-2, pp. 4–27, 2010.
  20. M. Sandau, H. Koblauch, T. B. Moeslund, H. Aanæs, T. Alkjær, and E. B. Simonsen, “Markerless motion capture can provide reliable 3D gait kinematics in the sagittal and frontal plane,” Medical Engineering & Physics, vol. 36, no. 9, pp. 1168–1175, 2014.
  21. E. Yeguas-Bolivar, R. Muñoz-Salinas, R. Medina-Carnicer, and A. Carmona-Poyato, “Comparing evolutionary algorithms and particle filters for markerless human motion capture,” Applied Soft Computing, vol. 17, pp. 153–166, 2014.
  22. J. Gall, B. Rosenhahn, T. Brox et al., “Optimization and filtering for human motion capture - a multi-layer framework,” International Journal of Computer Vision, vol. 87, no. 1-2, pp. 75–92, 2010.