Research Article | Open Access
Wanyi Li, Yuqi Zeng, Qian Zhang, Yilin Wu, Guoming Chen, "Human Motion Capture Based on Incremental Dimension Reduction and Projection Position Optimization", Wireless Communications and Mobile Computing, vol. 2021, Article ID 5589100, 9 pages, 2021. https://doi.org/10.1155/2021/5589100
Human Motion Capture Based on Incremental Dimension Reduction and Projection Position Optimization
Abstract
Three-dimensional (3D) human motion capture is an active research topic. With advances in networking, 3D human motion has become indispensable in multimedia works such as images, videos, and games, and it plays an important role in how all kinds of media are published and presented. Capturing 3D human motion is therefore a key technology for multimedia products. This paper proposes a new algorithm called incremental dimension reduction and projection position optimization (IDRPPO). The algorithm learns from sparse 3D human motion samples and generates new ones, providing a technique for making 3D character animation. By combining the Gaussian incremental dimension reduction model (GIDRM) with projection position optimization, the proposed algorithm learns the existing samples and establishes the mapping between the low-dimensional (LD) data and the high-dimensional (HD) data. Finally, IDRPPO can generate both the missing frames of the input 3D human motion and another type of 3D human motion.
1. Introduction
Three-dimensional (3D) human motion capture is applied in many fields, such as medical diagnosis, animation production, and 3D video game development [1–3]. Generating human motion in 3D is crucial to this work. Human motion in 3D is described by high-dimensional (HD) data, and a motion sequence consists of poses. Each pose can be represented by a human motion model. One complete motion cycle is called a gait.
3D human motion capture has developed into an active research topic, and there are various techniques for generating human motion in 3D. One popular technique is the reconstruction of 3D human motion from an image sequence, which needs complex preprocessing methods to extract image features and analyze feature samples, such as video event analysis and video feature analysis. Another is 3D human motion estimation by self-supervised learning, which learns sparse samples of one type of human motion and generates another type. This article mainly discusses the self-supervised learning approach. Self-supervised learning can be regarded, to some extent, as a special case of unsupervised learning. Existing methods have several shortcomings. In [6, 7], heuristic algorithms process the preprocessed image to generate the human motion. This costs too much time, and the quality of the generated motion is sensitive to the preprocessing quality of the image, so both accuracy and efficiency are low. Some dimension reduction models [8–11] can process human motion efficiently, but they can only visualize the HD data of the human motion in low-dimensional (LD) space. Some improved dimension reduction models provide two mappings between LD space and HD space, which makes it possible to generate an LD data sample and transform it into an HD data sample. These models are of great help in generating human motion, but they cannot produce another type of human motion. An improved method has also been proposed to fit the human motion sequence, but it needs to process the LD data in LD space, which makes generating the human motion more difficult. None of the methods above can quickly obtain one type of motion directly from another. In summary, generating one type of motion directly from another in a short time is not an easy task.
The CNN and its related networks have emerged in recent years (e.g., ResNet, AlexNet, VGG, SqueezeNet, DenseNet, and Inception), but these networks need long training times, large datasets, and a large hardware budget, often including high-end and costly GPUs. Thus, a new machine learning method needs to be proposed that is suitable for quickly making animations of 3D human characters. The proposed method can also generate new valid training data and the corresponding pseudolabel (self-encoded) data (LD data), which can be used to retrain the model and improve its predictions. In general, it can improve the self-supervised learning model. Because the proposed method processes a data sequence directly as a matrix, it can improve, to a certain extent, the performance of some tracking and estimation frameworks, such as self-supervised deep correlation tracking (self-SDCT). Without manual annotation, the proposed method can obtain new essential samples according to the data requirements of the self-supervised learning model, and these samples let the model update its generating mapping to improve tracking or estimation.
In this paper, a new algorithm called incremental dimension reduction and projection position optimization (IDRPPO) is proposed to address the problems mentioned above. It can generate one type of human motion from another, and the input motion samples may form an incomplete gait. The experimental tests of visual effect and error show the promising performance of IDRPPO. IDRPPO takes advantage of the Gaussian incremental dimension reduction model (GIDRM) and projection position optimization to carry out self-supervised learning on small-scale samples. GIDRM is similar to the bilinear analysis model of compound rank-k projections (CRP). Inspired by CRP, GIDRM is adopted to process the complex HD data of 3D human motion and to make these HD data visualized and regularized. Firstly, GIDRM can process a matrix directly without transforming it into vectors, which helps to decrease the computational complexity and improve the model flexibility. The matrix can denote the HD sample sequence of human motion or the corresponding LD data sequence. Secondly, GIDRM provides the LD space for searching and generating the optimal LD data sample, so that the corresponding 3D human motion can be reconstructed by its mappings. These two advantages are essential to the efficiency of IDRPPO in estimating 3D human motion. Thus, IDRPPO with GIDRM can learn one type of incomplete gait and then output both the missing frames of the incomplete gait and another type of motion. Our contributions are as follows:
(1) Address the problem of filling the missing frames in an incomplete motion cycle, making the motion cycle complete and smooth
(2) Address the problem of generating another type of motion cycle from the original incomplete motion cycle with the help of IDRPPO
The performance of IDRPPO is tested in the experiments, and the results indicate that IDRPPO achieves a promising visual effect and a low estimation error for human motion capture. The technical framework of IDRPPO is shown in Figure 1. The details of IDRPPO are discussed in the following sections.
2. Generation of Human Motion through IDRPPO
2.1. Gaussian Incremental Dimension Reduction Model
From Equation (2) and Equation (1), HD data sequence can be denoted by , . LD data sequence can be denoted by , . Kernel matrix is denoted by . are the kernel parameters of , and the other kernel matrix can be denoted by , . are the kernel parameters of . is the scale parameter matrix, then , . Let , and confronts the of -dimensional Gaussian distribution. and satisfy and , respectively. In Equation (1) and Equation (2), is known; thus, is constant, and the equivalence of can be got. The LD data and corresponding parameters can be obtained as follows: where , , the mapping from HD space to LD space can be built as follows: If two or more mappings from LD space to HD space need to be built, Equation (3) can be retrained according to the needs. After building the first mapping, the LD data from the first mapping can be fixed, which can be seen as the initial LD data of the second mapping training.
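To make the role of the kernel matrix and the Gaussian likelihood more concrete, the following is a minimal sketch of a generic Gaussian-process latent variable objective of the kind that GIDRM builds on. It is not the article's Equations (1)–(3): the RBF kernel form, the parameter names `alpha`, `gamma`, and `noise`, the omission of the scale parameter matrix, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, alpha=1.0, gamma=1.0, noise=1e-2):
    """RBF kernel matrix on LD points X (N x d), plus a noise term on the diagonal.
    alpha, gamma, and noise are hypothetical kernel parameters."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return alpha * np.exp(-0.5 * gamma * d2) + noise * np.eye(X.shape[0])

def gplvm_nll(X, Y, **kernel_params):
    """Negative log-likelihood of HD data Y (N x D) given LD data X, up to a constant:
    (D / 2) * log|K| + (1 / 2) * tr(K^{-1} Y Y^T)."""
    D = Y.shape[1]
    K = rbf_kernel(X, **kernel_params)
    L = np.linalg.cholesky(K)                   # K = L L^T
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log|K|
    A = np.linalg.solve(L, Y)                   # A = L^{-1} Y, so sum(A^2) = tr(K^{-1} Y Y^T)
    return 0.5 * D * log_det + 0.5 * np.sum(A ** 2)

# Toy HD sequence (30 frames, 4 dimensions); real motion data would have many more dimensions.
t = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
Y = np.stack([np.cos(t), np.sin(t), 0.5 * np.cos(2 * t), 0.5 * np.sin(2 * t)], axis=1)
Y = Y - Y.mean(axis=0)

# A common initialisation of the LD data: the first two principal components of Y.
U, S, _ = np.linalg.svd(Y, full_matrices=False)
X_init = U[:, :2] * S[:2]
nll_init = gplvm_nll(X_init, Y)
```

In a full training loop, this objective would be minimized with respect to the LD data and the kernel parameters. Fixing the LD data from a first training, as the paragraph above describes, corresponds to keeping `X` constant and optimizing only the parameters of a second mapping.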
Then, the mapping of the incremental dimension reduction is built as follows: where is radial basis function, . is the weight matrix, . is least squares estimator, . Then, denotes the new HD data sample, denotes the LD data of . If is known, the mapping from to can be given as follows: where , then we can get the equation as follows:
In Equation (7), is the error matrix, let . Then, , , and . Let , and is a diagonal matrix (). is an invertible matrix ( ). We have , then let and . The equation can be got as follows: Thus, Equation (8) can be written as: According to the properties of least squares, , we have: where . When training, the Nk orthogonal vectors can be replaced; the equation can be got as follows: Equation (11) is equivalent to the equation as follows: In Equation (12), and both are the sets of orthogonal vectors. is the subset of , is the set containing which is the vector from , then . When the tolerance is satisfied, the training can be finished. It means that the vector is selected as few as possible to minimize the variable NK for the satisfaction of the tolerance, so that the mapping training can be finished.
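The idea of selecting as few basis vectors as possible until a tolerance is met can be illustrated with a simplified sketch. The code below fits a Gaussian RBF mapping from HD samples to LD data by greedy forward selection, refitting the weights by least squares at each step. It is a plain greedy variant rather than the orthogonalized derivation of Equations (7)–(12), and the function names, the kernel width, and the toy data are assumptions.

```python
import numpy as np

def rbf_design(Y, centers, width):
    """Gaussian RBF features of HD samples Y (N x D) with respect to centers (M x D)."""
    d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf_greedy(Y, X, width=1.0, tol=1e-3):
    """Pick centers from the training samples one at a time, always adding the
    candidate that most reduces the squared residual, until the relative
    residual is <= tol. Returns (selected indices, weight matrix)."""
    N = Y.shape[0]
    Phi_all = rbf_design(Y, Y, width)  # one candidate column per training sample
    total = np.sum(X ** 2)
    selected, W = [], None
    while len(selected) < N:
        best = None
        for j in range(N):
            if j in selected:
                continue
            cols = selected + [j]
            Wj, *_ = np.linalg.lstsq(Phi_all[:, cols], X, rcond=None)
            err = np.sum((Phi_all[:, cols] @ Wj - X) ** 2)
            if best is None or err < best[0]:
                best = (err, j, Wj)
        err, j, W = best
        selected.append(j)
        if err / total <= tol:  # tolerance satisfied: stop adding centers
            break
    return selected, W

def predict_rbf(Y_new, Y_train, selected, W, width=1.0):
    """Map new HD samples to LD space with the selected centers."""
    return rbf_design(Y_new, Y_train[selected], width) @ W

# Toy data: 40 "HD" samples in 3 dimensions and 2-dimensional "LD" targets.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
Y = np.stack([np.cos(t), np.sin(t), 0.3 * np.sin(2.0 * t)], axis=1)
X = Y[:, :2].copy()

selected, W = fit_rbf_greedy(Y, X, width=1.0, tol=1e-3)
X_hat = predict_rbf(Y, Y, selected, W, width=1.0)
rel_err = np.sum((X_hat - X) ** 2) / np.sum(X ** 2)
```

The number of selected centers, `len(selected)`, plays the role of the quantity that the training tries to keep small. A tighter tolerance generally leads to more centers being kept.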
2.2. Projection Position Optimization
The learning of the incomplete gait of human motion needs projection position optimization in the LD space. Let us give some definitions: denotes the projected operation of vector , A is the first known LD data before the missing human motion sequence, B is the last known LD data after the missing human motion sequence, and denotes the LD data of the missing frames. According to Figure 2, we have: After dimension reduction, in Equation (14) is a preset parameter which denotes the distance between the missing dot and projection dot in Figure 2. The position of missing frames should satisfy Equation (13) and Equation (14); thus, Equation (3) can be trained optimally during the second training. Then, according to Equation (13) and Equation (14), the objective function and gradient function can be got respectively, as follows: From Equation (16), , “” denotes product of the entry of matrix. The solution of Equation (15) will not be a unique solution, but any of the solutions can keep the relative position of each missing frame in the LD space during training. Thus, the second training can obtain the LD data samples of missing frames. The solution of Equation (15) can be got by some traditional gradient optimization methods .
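As an illustration of this kind of optimization, the sketch below moves the LD points of the missing frames by gradient descent so that their projections onto the line through A and B are evenly spaced and each point lies at a preset distance from that line. The exact form of Equations (13)–(16) is not reproduced here. The even spacing of the projections, the squared-error objective, the step size, and all names (`ppo_objective_and_grad`, `optimize_positions`, `d`) are assumptions.

```python
import numpy as np

def ppo_objective_and_grad(P, A, B, d):
    """P: (m x k) LD positions of the missing frames; A, B: known LD end points.
    Assumed objective: squared error of the projections against evenly spaced
    targets on AB, plus squared error of the distance to the line against d."""
    m = P.shape[0]
    length = np.linalg.norm(B - A)
    u = (B - A) / length                        # unit direction of the line AB
    t = (P - A) @ u                             # scalar projection of each point
    target = length * np.arange(1, m + 1) / (m + 1)
    R = (P - A) - np.outer(t, u)                # component perpendicular to AB
    r = np.linalg.norm(R, axis=1)               # distance of each point to the line
    f = np.sum((t - target) ** 2) + np.sum((r - d) ** 2)
    grad = (2.0 * np.outer(t - target, u)
            + 2.0 * ((r - d) / np.maximum(r, 1e-12))[:, None] * R)
    return f, grad

def optimize_positions(P0, A, B, d, lr=0.1, iters=500):
    """Plain gradient descent on the objective above."""
    P = P0.copy()
    for _ in range(iters):
        _, g = ppo_objective_and_grad(P, A, B, d)
        P -= lr * g
    return P

# Toy example in a 2D LD space with five missing frames between A and B.
rng = np.random.default_rng(0)
A = np.array([0.0, 0.0])
B = np.array([1.0, 0.5])
m, d = 5, 0.1
P0 = rng.normal(scale=0.5, size=(m, 2))         # scattered initial LD positions
P = optimize_positions(P0, A, B, d)
f_final, _ = ppo_objective_and_grad(P, A, B, d)
```

In this sketch, as in the text, the minimizer is not unique: in 2D each point may settle on either side of the line. The ordering of the points along AB is the same in every solution, which is the property the second training relies on.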
2.3. The Procedure of Generating the Human Motion
Some definitions are listed as follows: and are denoted as HD data sample sequences of type I and II human motions, respectively; contains the missing frames; and are denoted as the LD data sequences of and , respectively; and are denoted as the new HD samples of type I and II human motions, respectively. Then, the procedure of generating the human motion is summarized as follows:
(1) Equation (3) can be used to process the which is containing missing frames for dimension reduction; then, and corresponding training parameters can be obtained (the external and internal iteration numbers of this step are set to and , respectively)
(2) Adopt the projection position optimization to process . It is equivalent to minimizing Equation (15) with the help of Equation (16) (the iteration number of this step is set to )
(3) The training parameters in step 1 and processed in step 2 can be taken into Equation (3) for the second training, then the training parameters, the updated and mapping from to can be obtained. The missing frames in the can be generated from processed in step 2. Build the mapping from to through Equation (5) next (the external and internal iteration numbers of building are set to and , respectively, the iteration numbers of building is set to )
(4) Build the mapping from to through Equation (3), is obtained from step 3, and is fixed during this training. After finishing the training of Equation (3), the mapping can be obtained (the external and internal iteration numbers of building are set to and , respectively)
(5) When there comes , can be generated by the equation .
The computational complexity of the whole algorithm usually depends on the iteration number of each step. The computational complexity is denoted by , which is mainly described by the time frequency. If the data preprocessing and matrix calculation are not considered, because they are not the core steps of the proposed algorithm, the computational complexity is .
Thus, the computational complexity depends on the iteration numbers and is dominated by whichever reaches the largest magnitude.
3. Experiment and Evaluation
Most heuristic algorithms and dimension reduction models cannot generate one type of human motion from another. Optimizing the projection position is the key to generating human motion. To evaluate its effect, the algorithm that uses incremental dimension reduction without projection position optimization is called IDRNPPO. Both IDRNPPO and IDRPPO are used to generate human motion in the experimental tests. In the experiments, the visual effect and the error of the missing frames and generated poses are the evaluation criteria. The missing frames are taken from the walking motion, and the generated motion is the running motion, which is generated from the walking motion. Our test environment is as follows:
GPU: Nvidia GTX 1660Ti 6GB
HD: 1.5TB solid state disk
Software: MATLAB R2009b
3.1. The Visual Comparison
Figure 3: (a) The samples of the input walking motion. (b) The samples of the output human running motion from IDRNPPO. (c) The samples of the output human running motion from IDRPPO.
Figure 4: (a) The samples of generating the walking missing frames from IDRNPPO. (b) The samples of generating the walking missing frames from IDRPPO.
As Figure 3 shows, the running poses from IDRPPO are visually better than those from IDRNPPO. The 30th, 35th, 40th, 45th, 48th, 52nd, and 58th frames from IDRNPPO are identical, so they cannot form a smooth sequence showing the running process. Likewise, in Figure 4, the missing frames generated by IDRNPPO for the input motion are identical and cannot display a smooth walking sequence. In contrast, the running motion and the missing walking motion from IDRPPO are smooth and form coherent sequences. The running time results are reported in Table 1. When generating the running motion, IDRNPPO takes 7.83 seconds and IDRPPO takes 7.96 seconds, so the two running times are close. When generating the missing walking motion, the running times are also close: 2.15 seconds for IDRNPPO and 2.13 seconds for IDRPPO. The running time test shows that the projection position optimization in IDRPPO adds little time cost. In Figure 5, the LD data of the missing frames from IDRNPPO and IDRPPO, shown in green in Figure 5(a) and Figure 5(b), respectively, are clearly different. Without projection position optimization, the LD data from IDRNPPO form a messy curve whose points are difficult to distinguish. By contrast, the LD data from IDRPPO are neat and smooth and fill in the missing part of the whole curve. Figure 5 thus also explains why the missing frames from IDRPPO form a smooth motion sequence. On the whole, Figures 3, 4, and 5 indicate that IDRPPO performs better than IDRNPPO.
Figure 5: (a) The LD data of the missing frames from IDRNPPO (green ones). (b) The LD data of the missing frames from IDRPPO (green ones).
3.2. The Error of the Generation
The errors of IDRPPO and IDRNPPO are shown in Figure 6. How the error is calculated can be seen in . From Figure 6, the errors of the human running motion and the missing walking motion from IDRPPO are lower overall than those from IDRNPPO. It is normal that some frames of both have similar errors in Figure 6(a), because some frames from IDRNPPO display the running motion correctly. However, the overall tendency of the errors can be evaluated by the mean error. The mean error of IDRPPO is lower than that of IDRNPPO, as depicted in Figure 6(b). From Table 1, the runtime testing results are 8.28 seconds (IDRNPPO) and 9.51 seconds (IDRPPO), respectively, which again indicates a small gap between the required running times of the two. Finally, the results in Figure 6 again show that IDRPPO generates motion better than IDRNPPO.
Figure 6: (a) The error of generating running motion. (b) The error of generating the missing walking motion.
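The per-frame error and mean error compared in this section can be computed along the following lines. This is a sketch under the assumption that the error of a frame is the average Euclidean distance between corresponding joints of the generated pose and the reference pose. The article's own error definition is given in its cited reference and may differ, and the array layout and the function name `frame_errors` are hypothetical.

```python
import numpy as np

def frame_errors(pred, truth):
    """pred, truth: arrays of shape (frames, joints, 3).
    Returns, for each frame, the mean Euclidean distance over joints."""
    return np.linalg.norm(pred - truth, axis=2).mean(axis=1)

# Tiny example: 2 frames, 3 joints. Frame 0 is offset by (3, 4, 0), frame 1 is exact.
truth = np.zeros((2, 3, 3))
pred = truth.copy()
pred[0] += np.array([3.0, 4.0, 0.0])

errors = frame_errors(pred, truth)  # per-frame errors: 5.0 and 0.0
mean_error = errors.mean()          # mean over the sequence: 2.5
```

Plotting `errors` against the frame index gives a curve comparable to a per-frame error plot, while `mean_error` summarizes the overall tendency for one method.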
4. Conclusion
IDRPPO is proposed to obtain 3D human motion. IDRPPO with GIDRM can learn an incomplete gait and generate another gait, which makes up for the defects of some self-supervised or unsupervised algorithms. The experiments show that the projection position optimization is crucial to the performance of IDRPPO. The experimental results show that IDRPPO is effective for making 3D human character animation and helps to generate the motion cycle quickly. IDRPPO can thus benefit small-scale self-supervised or unsupervised learning. However, IDRPPO cannot process complex and irregular human motion samples, which will be addressed in future research. The human motion model can be replaced by a more advanced model, so that high-level multimedia products can be made with this technique.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the Guangzhou Science and Technology Plan Project (No. 202002030232), the University Young Creative Talent Project of Guangdong Province (No. 2016KQNCX111 and No. 2019KQNCX095), the Natural Science Foundation of Guangdong Province (No. 2018A0303130169), the Guangdong Province Universities and Colleges Special Innovation Projects (Natural Science) (No. 2018KTSCX163), the High Education Teaching Reform Project of Guangdong Province (No. 440, approved in year 2020), the Teaching Quality and Teaching Reform Project of Guangdong University of Education (No. 2019jxgg07), the Key Disciplines of Network Engineering of Guangdong University of Education (No. ZD2017004), and the Computer Practice Teaching Demonstration Center of Guangdong University of Education (No. 2018sfzx01).
References
- E. E. Phelps, R. Wellings, F. Griffiths, C. Hutchinson, and M. Kunar, “Do medical images aid understanding and recall of medical information? An experimental study comparing the experience of viewing no image, a 2D medical image and a 3D medical image alongside a diagnosis,” Patient Education and Counseling, vol. 100, no. 6, pp. 1120–1127, 2017.
- T. Kühl, S. D. Navratil, and S. Münzer, “Animations and static pictures: the influence of prompting and time of testing,” Learning and Instruction, vol. 58, pp. 201–209, 2018.
- A. Dowsett and M. Jackson, “The effect of violence and competition within video games on aggression,” Computers in Human Behavior, vol. 99, pp. 22–27, 2019.
- X. Chang, Y.-L. Yu, Y. Yang, and E. P. Xing, “Semantic pooling for complex event analysis in untrimmed videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1617–1632, 2017.
- M. Luo, X. Chang, L. Nie, Y. Yang, A. G. Hauptmann, and Q. Zheng, “An adaptive semisupervised feature analysis for video semantic recognition,” IEEE Transactions on Cybernetics, vol. 48, no. 2, pp. 648–660, 2018.
- J. Gall, B. Rosenhahn, T. Brox, and H. P. Seidel, “Optimization and filtering for human motion capture,” International Journal of Computer Vision, vol. 87, no. 1-2, pp. 75–92, 2010.
- W. Y. Li and J. F. Sun, “Human motion estimation based on gaussion incremental dimension reduction and manifold boltzmann optimization,” Acta Electronica Sinica, vol. 45, no. 12, pp. 3060–3069, 2017.
- S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
- J. Xinwei, G. Junbin, W. Tianjiang, and Z. Lihong, “Supervised latent linear Gaussian process latent variable model for dimensionality reduction,” IEEE Transactions on Systems Man & Cybernetics Part B, vol. 42, no. 6, pp. 1620–1632, 2012.
- J. B. Tenenbaum, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
- N. Lawrence, “Probabilistic non-linear principal component analysis with gaussian process latent variable models,” Journal Mach. Learn. Research, vol. 6, pp. 1783–1816, 2005.
- J. M. Wang, D. J. Fleet, and A. Hertzmann, “Gaussian process dynamical models for human motion,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 2, pp. 283–298, 2008.
- W. Li, “Manifold latent probabilistic optimization for human motion fitting based on orthogonal subspace searching,” Journal of Information and Computational Science, vol. 11, no. 15, pp. 5357–5365, 2014.
- X. Zhou, M. Zhu, G. Pavlakos, S. Leonardos, K. G. Derpanis, and K. Daniilidis, “MonoCap: monocular human motion capture using a CNN coupled with a geometric prior,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 41, no. 4, pp. 901–914, 2019.
- A. Zanchetta and S. Zecchetto, “Wind direction retrieval from Sentinel-1 SAR images using ResNet,” Remote Sensing of Environment, vol. 253, p. 112178, 2021.
- K. M. Hosny, M. A. Kassem, and M. M. Fouad, “Classification of skin lesions into seven classes using transfer learning with AlexNet,” Journal of Digital Imaging, vol. 33, no. 5, pp. 1325–1334, 2020.
- X. Xu, M. Xie, P. Miao et al., “Perceptual-aware sketch simplification based on integrated VGG layers,” IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 1, pp. 178–189, 2021.
- M. Hassanpour and H. Malek, “Learning document image features with Squeeze Net convolutional neural network,” International Journal of Engineering, vol. 33, no. 7, 2020.
- Z. Tang, W. Jiang, Z. Zhang, M. Zhao, L. Zhang, and M. Wang, “DenseNet with up-sampling block for recognizing texts in images,” Neural Computing and Applications, vol. 32, no. 11, pp. 7553–7561, 2020.
- D. McNeely-White, J. R. Beveridge, and B. A. Draper, “Inception and ResNet features are (almost) equivalent,” Cognitive Systems Research, vol. 59, pp. 312–318, 2020.
- D. Yuan, X. Chang, P. Y. Huang, Q. Liu, and Z. He, “Self-supervised deep correlation tracking,” IEEE Transactions on Image Processing, vol. 30, pp. 976–985, 2021.
- X. Chang, F. Nie, S. Wang, Y. Yang, X. Zhou, and C. Zhang, “Compound rank-k projections for bilinear analysis,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 7, pp. 1502–1513, 2016.
- X. Liuqing and Z. Shipeng, Practical Optimization Method Shanghai, Shanghai Jiaotong University Press, 2000.
- L. Sigal, A. O. Balan, and M. J. Black, Humaneva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion, Brown University, Providence, USA, 2006.
- L. Yebin, G. Juergen, S. Carsten, D. Qionghai, S. Hans-Peter, and T. Christian, “Markerless motion capture of multiple characters using multiview image segmentation,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 11, pp. 2720–2735, 2013.
Copyright © 2021 Wanyi Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.