Biologically Inspired Robotics
Sparse Approximation for Nonrigid Structure from Motion
This paper applies a novel sparse approximation method to the nonrigid structure from motion problem in trajectory space. Instead of a truncated traditional trajectory basis, the method uses an atom dictionary, a set of overcomplete bases, to estimate the real shape of the deformable object. Despite the overcompleteness, it still runs reliably and can reach an optimal result. It also removes the need to choose the size of a predefined trajectory basis; that is, the trajectory basis never has to be truncated. The method is easy to implement: the only problem that has to be solved is an $\ell_1$-regularized least squares problem. The paper thus presents both a new idea and a simple but effective solution for the nonrigid structure from motion problem.
1. Introduction
Nonrigid structure from motion (NRSfM) refers to the process of recovering the 3D coordinates of a nonrigid object from its 2D projections. This technology plays an important role in computer vision applications [1–4]. One prevalent solution models the deformable object as a linear combination of shape bases. This works well for simple motion containing a single action, but it is hard to recover the structure of the object in complex sequences. For this reason, Akhter et al. introduced a method that reconstructs the 3D object in trajectory space instead of shape space. This approach handles long, complex motion sequences well because the measurement points are treated independently of each other. Moreover, since the trajectories of points in most real-world 3D motion are naturally smooth curves, they can be modeled as a linear combination of known curves, called a trajectory basis. The trajectory basis model considerably improved the recovery of nonrigid shape from motion.
Moreover, experiments showed that the performance of the trajectory basis method depends on two factors: the type of trajectory basis and the number of bases. Regarding the type, the discrete cosine transform (DCT), which is well suited to Markov-type signals, turned out to be the most suitable general-purpose basis [6, 7]. However, although the DCT basis has proved better than the others overall, it is not suitable for every motion sequence. The number of selected trajectory bases also requires care. Too small a basis discards much important information about the motion sequence and gives a poor result, while too large a basis introduces many unknowns and makes the system of equations ill-posed, so that solving it wastes time or fails altogether.
(1) Contributions. In general, 3D point trajectories can be recovered more accurately when the restrictions on the trajectory basis are relaxed. This paper applies sparse coding algorithms to the NRSfM problem. A set of overcomplete bases, called an atom dictionary, is predefined so that the deformable object can be represented with sparse coefficients.
An advantage of the sparse coding approach is that it is not restricted to a single trajectory basis function: a trajectory may be generated by two or more incoherent basis functions. This is useful for recovering trajectory curves that combine different types of basis functions. Moreover, since the goal of sparse approximation is to represent trajectory sequences as a sparse combination of all atoms, there is no need to predefine the number of trajectory bases.
(2) Related Work. Bregler et al. first proposed factorization approaches to recover nonrigid deformable objects from motion. Their main idea was to obtain a low-rank shape basis that satisfies the 2D point projections; they argued that the structure of a moving object can be regarded as an approximate linear combination of basis shapes. This shape model has been widely used in the field, although it is hard to apply because the inherent basis ambiguity of the nonrigid problem must be overcome. To resolve this ambiguity, Torresani et al. introduced a Gaussian prior that constrains the solution by shrinking the coefficients. For the same reason, Xiao et al. proposed adding extra “basis constraints” to the orthonormality constraints. Later, in 2012, Dai et al. proposed the first simple prior-free method for the NRSfM problem. Although they assumed no prior knowledge beyond the low-rank constraint, the method recovered the nonrigid shape reliably, and the shape-based model became more mature. All the methods mentioned above are based on a shape basis. Akhter et al. made a major improvement by introducing a trajectory-based model for NRSfM instead of a shape basis. This approach, which models the 3D point trajectories in the domain of discrete cosine transform (DCT) basis vectors, gave better results on complex shapes. Zhu and Lucey proposed an $\ell_1$ penalty to minimize the size of the active trajectory basis.
2. Problem Formulations
The measured projective trajectories are contained in a matrix as follows:
$$W = \begin{bmatrix} x_{11} & \cdots & x_{1P} \\ y_{11} & \cdots & y_{1P} \\ \vdots & & \vdots \\ x_{F1} & \cdots & x_{FP} \\ y_{F1} & \cdots & y_{FP} \end{bmatrix} \in \mathbb{R}^{2F \times P}.$$
$W$ is the measurement matrix generated from the 2D coordinates of the feature points, where $F$ indicates the number of motion frames and $P$ is the number of feature points of the deformable object. $W$ is then registered with respect to the camera center and its projection on the image plane, so that in the trajectory space it can be represented as
$$W = RS.$$
In this formula, $R \in \mathbb{R}^{2F \times 3F}$ is the camera motion (projection) matrix and $S \in \mathbb{R}^{3F \times P}$ indicates the nonrigid shape matrix. Moreover, the shape coordinates of the object can be decomposed into a set of truncated bases and corresponding coefficients:
$$S = \Theta A.$$
Matrix $\Theta \in \mathbb{R}^{3F \times 3K}$ holds the predefined trajectory bases, and $A \in \mathbb{R}^{3K \times P}$ holds the corresponding coefficients.
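The projection model can be illustrated with a small synthetic example. The sketch below assumes orthographic cameras, a block-diagonal camera matrix $R$ of size $2F \times 3F$, and a shape matrix $S$ of size $3F \times P$ whose rows $3t, 3t+1, 3t+2$ hold the X, Y, Z coordinates of frame $t$. The function names and the random data are hypothetical and serve only to show how $W = RS$ is assembled.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0                      # flip one axis to get det = +1
    return Q

def orthographic_block_diag(rotations):
    """Stack the first two rows of each rotation into a 2F x 3F block-diagonal matrix."""
    F = len(rotations)
    R = np.zeros((2 * F, 3 * F))
    for t, R_t in enumerate(rotations):
        R[2 * t:2 * t + 2, 3 * t:3 * t + 3] = R_t[:2, :]
    return R

rng = np.random.default_rng(0)
F, P = 5, 7                                   # frames, feature points (toy sizes)
S = rng.standard_normal((3 * F, P))           # assumed layout: X, Y, Z rows per frame
R = orthographic_block_diag([random_rotation(rng) for _ in range(F)])
W = R @ S                                     # 2F x P measurement matrix
```

Each pair of rows of `W` is the 2D projection of one frame, which is why the camera matrix can be treated frame by frame later on.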
Most of the energy in natural signals concentrates in the low-frequency range, and the discrete cosine transform (DCT) is particularly good at energy compaction. For this reason, DCT is used in NRSfM so that lossy compression of the data can be carried out. It has also been shown that the DCT basis is, on the whole, the better general basis. This paper therefore mainly considers the DCT trajectory space. The discrete cosine transform of a length-$F$ signal $x$ is generally formulated as follows:
$$X(k) = c(k) \sum_{n=0}^{F-1} x(n) \cos\left[\frac{\pi (2n+1) k}{2F}\right], \quad k = 0, 1, \ldots, F-1,$$
where
$$c(k) = \begin{cases} \sqrt{1/F}, & k = 0, \\ \sqrt{2/F}, & k = 1, \ldots, F-1. \end{cases}$$
In Akhter’s method, the rank of the matrix $\Theta$ was truncated to $3K$. That is to say, they selected $K$ columns from an $F \times F$ DCT matrix, with $K$ far smaller than $F$. Because of this limit on the number of bases in use, the reconstruction can be quite inaccurate when the scenes are complex.
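The effect of truncation can be sketched numerically. The example below assumes the orthonormal DCT-II convention and keeps only the first $K$ columns of the $F \times F$ basis. The two test trajectories (a smooth curve and the same curve with one impulsive frame) are invented for illustration and are not data from the paper.

```python
import numpy as np

def dct_basis(F):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom of length F."""
    n = np.arange(F)[:, None]                 # frame index
    k = np.arange(F)[None, :]                 # frequency index
    Theta = np.sqrt(2.0 / F) * np.cos(np.pi * (2 * n + 1) * k / (2.0 * F))
    Theta[:, 0] = np.sqrt(1.0 / F)            # the constant column has its own scale
    return Theta

def truncated_fit(x, K):
    """Project trajectory x onto the first K DCT columns and return the approximation."""
    Theta_K = dct_basis(len(x))[:, :K]
    return Theta_K @ (Theta_K.T @ x)

F, K = 100, 12
t = np.linspace(0.0, 1.0, F)
smooth = np.sin(2 * np.pi * t) + 0.5 * t      # smooth coordinate trajectory
spiky = smooth.copy()
spiky[40] += 2.0                              # one impulsive frame

err_smooth = np.linalg.norm(smooth - truncated_fit(smooth, K))
err_spiky = np.linalg.norm(spiky - truncated_fit(spiky, K))
```

With a fixed small `K`, the smooth trajectory is approximated closely while the impulsive one leaves a much larger residual, which is the limitation that motivates a richer dictionary.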
3. Sparse Approximation Method
Traditional trajectory basis approaches to reconstruction mainly use a single family of orthogonal bases, such as the Fourier basis, the various DCT bases, or orthogonal wavelet bases. In these approaches, all trajectory curves of the feature points are represented as linear combinations of one kind of waveform. The difficulty is that, if the number of trajectory bases is too large, the system of equations becomes ill-posed. This paper therefore introduces a method that represents the trajectory curves by sparse approximation instead of a traditional trajectory basis. Sparse approximation provides a class of algorithms that select basis functions only when they capture higher-level features in the input data. Moreover, the method uses an overcomplete atom dictionary rather than a trajectory basis. An overcomplete atom dictionary that contains different kinds of basis functions helps to obtain a better result.
3.1. Atom Dictionary
In practice, wavelets perform poorly on high-frequency sinusoids and, conversely, sinusoids perform poorly on impulsive events. It is therefore inaccurate to recover the trajectory curves with a single type of basis. The Dirac function and the trigonometric functions are two fundamental orthonormal bases, and most real trajectory curves can be represented by these two families. Putting the Dirac basis and the DCT basis together in one matrix generates an atom dictionary that performs better than either trajectory basis alone. The coefficients over the dictionary must be sparse (i.e., they include many zero items). Each column of the dictionary is called an atom, and the concatenation of the Dirac basis and the DCT basis has turned out to be suitable for most situations. In this paper, this dictionary replaces the trajectory bases in all experiments.
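A dictionary of this kind can be sketched in a few lines. The example assumes the Dirac basis is the identity matrix and the DCT basis follows the orthonormal DCT-II convention; the column order (Dirac first, DCT second) and the helper names are arbitrary choices made here, not taken from the paper. The mutual coherence is computed only to show that atoms from the two bases are nearly uncorrelated.

```python
import numpy as np

def dct_basis(F):
    """Orthonormal DCT-II basis with one cosine atom per column."""
    n = np.arange(F)[:, None]
    k = np.arange(F)[None, :]
    Theta = np.sqrt(2.0 / F) * np.cos(np.pi * (2 * n + 1) * k / (2.0 * F))
    Theta[:, 0] = np.sqrt(1.0 / F)
    return Theta

def atom_dictionary(F):
    """Concatenate the Dirac (identity) basis and the DCT basis into an F x 2F dictionary."""
    return np.hstack([np.eye(F), dct_basis(F)])

def mutual_coherence(D):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

F = 64
D = atom_dictionary(F)        # 64 x 128: twice as many atoms as frames
mu = mutual_coherence(D)      # bounded by sqrt(2 / F) for this pair of bases
```

Because the dictionary has more columns than rows, a trajectory no longer has a unique representation, which is why a sparsity criterion is needed to pick one.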
3.2. Sparse Coding
A dictionary is a concatenation of several orthonormal bases. Because it is overcomplete, the representation of a signal is not unique, and many decompositions exist. The common goal is a highly sparse decomposition that contains very few nonzero terms. This leads to the optimization problem
$$\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad x = D\alpha,$$
where $x$ is the signal (here a trajectory), $D$ is the dictionary, $\alpha$ is the coefficient vector, and the symbol $\|\cdot\|_0$ indicates the $\ell_0$ norm, which counts the nonzero items. It has been shown that, when the isometry constant Δ of the dictionary is small enough, this problem can be replaced by its $\ell_1$ relaxation.
The $\ell_1$ norm is a very common choice for sparse estimation problems and has proved effective. A variety of solver packages are available for the $\ell_1$ problem. This paper applies $\ell_1$-regularized least squares, solved with the feature-sign search algorithm, to obtain the sparse representation of the object motion trajectories. This algorithm is described in [13, 17, 18].
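To make the $\ell_1$-regularized least squares step concrete, the sketch below minimizes $\tfrac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$ on a toy signal. It deliberately uses iterative soft thresholding (ISTA), a simpler stand-in, and not the feature-sign search algorithm used in the paper. The factor $\tfrac{1}{2}$, the dictionary layout, and the test signal are assumptions made for this illustration.

```python
import numpy as np

def dct_basis(F):
    """Orthonormal DCT-II basis with one cosine atom per column."""
    n = np.arange(F)[:, None]
    k = np.arange(F)[None, :]
    Theta = np.sqrt(2.0 / F) * np.cos(np.pi * (2 * n + 1) * k / (2.0 * F))
    Theta[:, 0] = np.sqrt(1.0 / F)
    return Theta

def soft_threshold(v, tau):
    """Shrink every entry of v toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(D, x, lam, n_iter=2000):
    """Minimize 0.5*||x - D a||_2^2 + lam*||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)              # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

F = 64
D = np.hstack([np.eye(F), dct_basis(F)])      # Dirac atoms 0..63, DCT atoms 64..127
x = 2.0 * D[:, 10] + 1.5 * D[:, F + 3]        # one spike plus one cosine
a = ista(D, x, lam=0.1)
support = np.flatnonzero(a)                   # indices of the active atoms
```

The signal mixes an impulse with a cosine, so neither basis alone represents it sparsely, while the combined dictionary needs only a couple of atoms. The penalty shrinks the recovered coefficients slightly below their true values, which is expected for an $\ell_1$ penalty.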
4. Algorithm Solution
With the above theory in place, this section describes how the sparse approximation algorithm is applied to the NRSfM problem.
The goal of NRSfM is to estimate the camera motion matrix $R$ and then recover the true nonrigid 3D coordinate matrix $S$ from the measurement matrix $W$. For this purpose, the matrix $R$ has to be estimated first. Fortunately, a good solution already exists: Dai et al. put forward a prior-free method for the shape basis model, and this method can also be used in trajectory space. Once the motion matrix $R$ has been solved, an overcomplete atom dictionary is predefined as the trajectory basis, and the sparse approximation algorithm solves for the corresponding coefficients. The 3D shape matrix $S$ is then obtained from the trajectory basis and the corresponding coefficients.
4.1. Estimate Camera Motion Matrix
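One can compute the rank-3K decomposition of the measurement matrix $W$ via singular value decomposition (SVD) and obtain $W = \Pi B$, with $\Pi \in \mathbb{R}^{2F \times 3K}$ and $B \in \mathbb{R}^{3K \times P}$. The decomposition is not unique, because any nonsingular $3K \times 3K$ matrix $G$ satisfies $W = (\Pi G)(G^{-1} B)$. In this paper, the semidefinite program (SDP) of small and fixed size proposed by Dai et al. is applied to solve this problem. Dai’s method is also effective for estimating $G$ in trajectory space by trace minimization. The problem can be stated as follows:
$$\min_{Q_k} \operatorname{trace}(Q_k) \quad \text{such that} \quad Q_k \succeq 0, \quad \Pi_{2i-1} Q_k \Pi_{2i-1}^{T} = \Pi_{2i} Q_k \Pi_{2i}^{T}, \quad \Pi_{2i-1} Q_k \Pi_{2i}^{T} = 0, \quad i = 1, \ldots, F,$$
where $Q_k = G_k G_k^{T}$, $G_k$ denotes the $k$th column-triplet of $G$, $\Pi_{2i-1:2i}$ denotes the $i$th double rows of $\Pi$, and $\Pi_{j}$ denotes the $j$th row of $\Pi$. The equality constraints are linear in $Q_k$ and can be stacked into a single linear system in $\operatorname{vec}(Q_k)$, where $\operatorname{vec}(\cdot)$ denotes the vectorization operator. Solving this problem with a standard SDP solver gives a unique $Q_k$, and $G_k$ can then be found from $Q_k$ by SVD. Once $G_k$ is obtained, the camera motion matrix of frame $i$ follows from the equation
$$\Pi_{2i-1:2i} G_k = c_{ik} R_i.$$
Note that the coefficient $c_{ik}$ is included in the product $\Pi_{2i-1:2i} G_k$. Finally, the motion matrix is represented as $R = \operatorname{blkdiag}(R_1, R_2, \ldots, R_F)$.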
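The factorization ambiguity that the SDP has to resolve can be checked numerically. The sketch below only covers the rank-3K SVD step and the ambiguity itself; it does not implement the semidefinite program. The symbols `Pi`, `B`, and `G`, the balanced split of the singular values, and the synthetic low-rank `W` are assumptions chosen for illustration.

```python
import numpy as np

def rank_3k_factorization(W, K):
    """Split W (2F x P) into Pi (2F x 3K) and B (3K x P) with a truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = 3 * K
    sqrt_s = np.sqrt(s[:r])
    Pi = U[:, :r] * sqrt_s                    # scale the kept left singular vectors
    B = sqrt_s[:, None] * Vt[:r, :]           # scale the kept right singular vectors
    return Pi, B

rng = np.random.default_rng(1)
F, P, K = 20, 30, 2
# Synthetic measurement matrix with exact rank 3K.
W = rng.standard_normal((2 * F, 3 * K)) @ rng.standard_normal((3 * K, P))

Pi, B = rank_3k_factorization(W, K)
G = rng.standard_normal((3 * K, 3 * K))       # an arbitrary invertible mixing matrix
W_alt = (Pi @ G) @ np.linalg.solve(G, B)      # (Pi G)(G^{-1} B) reproduces W as well
```

Both factor pairs reproduce `W`, so the SVD alone cannot identify the camera matrix; the orthonormality constraints on the rotations are what single out a valid `G`.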
4.2. Estimate the Coefficients Matrix A
The coefficients matrix $A$ is a sparse matrix; that is to say, $A$ contains many zero items. As stated above, $A$ can be solved by a sparse approximation method. With the camera motion matrix $R$ known, $A$ can be obtained from the equation $W = R\Theta A$, where $\Theta$ now denotes the atom dictionary. Vectorizing both sides gives the equivalent form
$$w = (I_P \otimes R\Theta)\, a,$$
where $w = \operatorname{vec}(W)$, $a = \operatorname{vec}(A)$.
To simplify the expression, it can be represented by the following optimization problem:
$$\min_{a} \; \| w - (I_P \otimes R\Theta)\, a \|_2^2 + \lambda \| a \|_1,$$
where $\lambda$ is the penalty coefficient and it must be a constant.
An iterative feature-sign search algorithm, which can solve for $a$ in the Fourier domain, can be used to solve this objective efficiently. The details of the algorithm are given by Lee et al. The coefficients matrix $A$ can be generated from the vector $a$. At last, the shape matrix $S$ is obtained via the equation $S = \Theta A$.
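The passage from the matrix equation to a single vector problem relies on the identity $\operatorname{vec}(R\Theta A) = (I_P \otimes R\Theta)\operatorname{vec}(A)$ with column-major vectorization. The sketch below checks this identity and the reshaping back to a coefficient matrix on random stand-in matrices. The symbols and sizes are assumptions for illustration, and no sparse solver is run here.

```python
import numpy as np

rng = np.random.default_rng(2)
F, P, K = 4, 5, 3
R = rng.standard_normal((2 * F, 3 * F))       # stand-in for the camera motion matrix
Theta = rng.standard_normal((3 * F, 3 * K))   # stand-in for the dictionary
A = rng.standard_normal((3 * K, P))           # stand-in for the coefficient matrix

M = R @ Theta
w = (M @ A).flatten(order="F")                # vec(W): columns stacked on top of each other
a = A.flatten(order="F")                      # vec(A)
Phi = np.kron(np.eye(P), M)                   # I_P kron (R Theta), block diagonal

A_back = a.reshape((3 * K, P), order="F")     # rebuild the matrix from the vector
S = Theta @ A_back                            # shape matrix from dictionary and coefficients
```

Since `Phi` is block diagonal, the vector problem decouples into one independent problem per feature point, so in practice each column of `A` could be solved separately instead of forming the Kronecker product.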
5. Experiments
The dataset used in this paper comes from the CMU Motion Capture dataset, which covers a variety of human actions. Random synthetic data are used only for algorithm validation, where the sparse approximation method performs clearly better than the others; those results are not reported, and this paper reports results on real sequences only. The real scenes tested in this paper mainly include the commonly used sequences “Yoga” (41/307), “drinking” (41/1102), “pickup” (41/357), “shark” (91/240), “Stretch” (41/740), and “walking” (55/260), where $(P/F)$ denotes the number of points and frames.
5.1. Sparse Coefficients Estimations
The first experiment is carried out on the “Yoga” sequence, using an atom dictionary that is the union of the DCT basis and the Dirac basis. Once the 3D coordinates of the deformable object have been recovered, the corresponding coefficients of the atom dictionary are available as well. The result is shown in Figure 1, which plots the coefficients of the 1st to the 250th atoms.
The “atom” axis refers to the index of the basis function in the dictionary; that is, each atom corresponds to one column of the dictionary matrix. The “coefficients” axis gives the coefficient of each atom. The graph shows that most of the atom coefficients are zero. This supports the assumption that the coefficients matrix is sparse and, at the same time, verifies the feasibility of the sparse approximation method.
5.2. Experimental Result on One Real Sequence
DCT has proved better than other bases in the trajectory basis model and is the most common choice. This paper therefore compares the sparse method against trajectory basis methods that use DCT bases of different sizes. The atom dictionary used in the sparse approximation method is again the union of the DCT basis and the Dirac basis. This subsection presents the performance of these methods on the “Yoga” motion sequence and plots the shape reconstruction errors in one diagram, as shown in Figure 2. The reported quantity is the mean 3D error between $S$, the real shape sequence, and $\hat{S}$, the estimated one.
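One plausible way to compute such an error is sketched below. The exact formula used in the paper is not recoverable from the text, so this version assumes the plain average Euclidean distance between corresponding true and estimated 3D points, without any normalization, and a $3F \times P$ layout with X, Y, Z rows per frame. The function name is hypothetical.

```python
import numpy as np

def mean_3d_error(S_true, S_est):
    """Average point-to-point distance, overall and per frame.

    Both inputs are 3F x P, with rows 3t, 3t+1, 3t+2 holding X, Y, Z of frame t.
    This is an assumed, unnormalized definition of the mean 3D error.
    """
    F = S_true.shape[0] // 3
    diff = (S_true - S_est).reshape(F, 3, -1)     # frame x coordinate x point
    per_point = np.linalg.norm(diff, axis=1)      # F x P Euclidean distances
    return per_point.mean(), per_point.mean(axis=1)

F, P = 4, 6
S_true = np.zeros((3 * F, P))
S_est = np.ones((3 * F, P))                       # every point is off by (1, 1, 1)
overall, per_frame = mean_3d_error(S_true, S_est)
```

The per-frame vector corresponds to the kind of curve plotted against the frame index, while the overall value is the kind of single number compared across basis sizes or scenes.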
Within the DCT basis model, a basis size of 12 is clearly the best choice and gives the smallest shape estimation error. The sparse coding method, however, is better than the trajectory basis method for every DCT basis size. The reason is easy to see: the atom dictionary contains all predefined trajectory bases, including those left out of the truncated trajectory basis. The sparse method merely sets the coefficients of unused trajectory bases to zero for a given curve, and these bases can still be used for other trajectory curves. In the trajectory basis model, by contrast, once the bases are truncated to a fixed size, every trajectory curve can only be represented by the truncated set.
5.3. Estimation Errors
This subsection gives a statistical comparison between the DCT trajectory basis method with a well-chosen basis size [19, 20] and the sparse approximation method. The truncation size of the trajectory basis is taken from previous work by other researchers and has proved reliable. Using the data from the “Yoga” sequence, the mean 3D error of each frame is computed for both methods. The best setting of the DCT basis model was found in the previous subsection, so the size of the DCT trajectory basis in this scene is 12; the penalty coefficient of the sparse method is 0.1. The shape estimation errors of each frame are shown in Figure 3.
Figure 3 shows that, even when the traditional DCT trajectory basis method is used in its best setting, the per-frame shape estimation errors of the sparse approximation method are smaller than those of the DCT basis model. That is to say, the sparse approximation method performs better than the DCT trajectory basis method on the NRSfM problem in the “Yoga” scene.
5.4. Tests on Different Scenes
To verify the effectiveness of the sparse coding method, the experiment is repeated in different scenes, using the real sequences “Yoga,” “walking,” “pickup,” “shark,” and “drinking.” The best size of the DCT trajectory basis for each scene is taken from previous experiments that have proved effective. This experiment applies the best setting of the DCT basis model to every scene and compares its results with those of the sparse approximation method. From previous work, the best DCT basis size is 12 for “Yoga,” 8 for “walking,” 12 for “pickup,” 2 for “shark,” and 10 for “drinking.” The penalty coefficient of the sparse approximation method is 0.1. The experimental result is shown in Figure 4.
Figure 4 shows that, in every scene, the shape estimation error of the DCT trajectory basis model is higher than that of the sparse approximation method. In other words, the sparse approximation method outperforms the DCT trajectory basis method even when the DCT basis size is chosen optimally. At worst, it gives a result similar to that of the DCT method, which happens when the truncated DCT bases already represent the trajectory curves of the feature points well. These experiments support the conclusion that the sparse approximation method is better suited to the NRSfM problem.
5.5. Sample Shape Reconstruction Results
To further verify the effectiveness of the sparse approximation method, this subsection shows the shape reconstruction results of the “Yoga,” “shark,” and “Stretch” sequences obtained with the DCT method and with the sparse approximation method. In the “Yoga” experiment, the truncated size of the DCT basis is 12 and the penalty coefficient of the sparse method is 0.1; the results are shown in Figure 5. In the “shark” experiment, the truncated size of the DCT basis is 2 and the penalty coefficient of the sparse method is 0.1; the results are shown in Figure 6.
Figure 7 shows the shape reconstruction results of the “Stretch” sequence with the two methods. The truncated size of the DCT basis in the “Stretch” experiment is 12, which is the best setting for the DCT basis model, and the penalty coefficient of the sparse method is 0.1.
Figures 5, 6, and 7 show that the points reconstructed with the sparse approximation method are closer to the original 3D feature points. These results demonstrate the advantage of the sparse approximation method: in these experiments it is more effective than the traditional DCT trajectory basis method.
6. Conclusions
This paper introduces a novel sparse approximation method to resolve the NRSfM problem. The method is easy to understand and is guaranteed to reach an optimal solution. It shows that, with sparse approximation, the size of the truncated DCT basis no longer has to be chosen. In this paper, only the union of the DCT basis and the Dirac basis is applied to the reconstruction of moving 3D objects. In future work, the camera rotation matrix is expected to be estimated more accurately.
All experiments in this paper assume an orthographic camera model. Thanks to recent progress in sparse coding of signals, the proposed solution can be applied to the NRSfM problem with little effort. This paper gives just one way of dealing with the size of the trajectory basis; other ideas can also be applied to this problem. In addition, an accurately estimated rotation matrix helps to obtain a satisfactory result.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61272311) and is also supported in part by the Natural Science Foundation of Zhejiang Province (LY13F020042, LY14F010022, and Z15F020017) and the 521 Project of Zhejiang Sci-Tech University.
References
J. K. Aggarwal and Q. Cai, “Human motion analysis: a review,” in Proceedings of the IEEE Nonrigid and Articulated Motion Workshop, pp. 90–102, June 1997.
I. Akhter, Y. Sheikh, S. Khan, and T. Kanade, “Nonrigid structure from motion in trajectory space,” in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 42–48, December 2008.
S. Agarwal, J. Wills, L. Cayton, G. Lanckriet, D. Kriegman, and S. Belongie, “Generalized non-metric multidimensional scaling,” Journal of Machine Learning Research, vol. 2, pp. 11–18, 2007.
Y. Wang, J. Cheng, J. Zheng, Y. Xiong, and H. Zhang, “Analysis of wavelet basis selection in optimal trajectory space finding for 3D non-rigid structure from motion,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 12, no. 3, Article ID 1450023, 14 pages, 2014.
C. Bregler, A. Hertzmann, and H. Biermann, “Recovering non-rigid 3D shape from image streams,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '00), pp. 690–696, June 2000.
J. Xiao, J.-X. Chai, and T. Kanade, “A closed-form solution to non-rigid shape and motion recovery,” in Computer Vision—ECCV 2004, pp. 573–587, Springer, Berlin, Germany, 2004.
Y. Zhu and S. Lucey, “Convolutional sparse coding for trajectory reconstruction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 529–540, 2013.
H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Neural Information Processing Systems, pp. 801–808, 2007.
M. Chen, G. AlRegib, and B.-H. Juang, “Trajectory triangulation: 3D motion reconstruction with ℓ1 optimization,” in Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '11), pp. 4020–4023, Prague, Czech Republic, May 2011.