Abstract

This paper proposes a novel algorithm called low dimensional space incremental learning (LDSIL) to estimate human motion in 3D from the silhouettes of multiview images of human motion. The proposed algorithm takes advantage of stochastic extremum memory adaptive searching (SEMAS) and an incremental probabilistic dimension reduction model (IPDRM) to collect new high dimensional data samples. These high dimensional data samples can be selected to update the mapping from the low dimensional space to the high dimensional space, so that incremental learning can be achieved and human motion can be estimated from a small amount of samples. Compared with three traditional algorithms, the proposed algorithm achieves good performance in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error.

1. Introduction

Human motion estimation has become a hot research topic [1–3], but it remains a challenging task. In particular, we are interested in estimating human motion in 3D from the silhouettes of multiview images of human motion. The main challenges are as follows: firstly, it is hard to build the mapping between multiview silhouettes and human motion in 3D; secondly, the matching between multiview silhouettes and human motion in 3D is ambiguous; finally, it is hard to determine the spatial position of the human motion depicted in the multiview images. In the past few years, a number of algorithms have been proposed to estimate human motion. Sigal et al. [4] and Deutscher and Reid [5] use improved particle filters to estimate human motion in 3D. These methods do not work well because they search (sample) in the high dimensional (HD) space many times. Searching directly in the HD space is problematic: large-scale searching yields invalid data, while small-scale searching may miss the target data, and repeated small-scale searching also produces invalid data. Li et al. use principal component analysis (PCA) to reduce the dimension of the HD data samples and simulated annealing particle swarm optimization (SAPSO) to estimate human motion in 3D [6]. This algorithm is time-consuming and its performance is not satisfactory, because the HD data converted back from the corresponding low dimensional (LD) data can be quite different from the original HD data. Besides, it does not consider the spatial position of the human motion. Traditional Monte Carlo methods [7–9] have the drawback that they cannot ensure collecting the best sample at each step of searching; thus stochastic extremum memory adaptive searching (SEMAS) is proposed to solve this problem. In the work of Wang et al. [10], the Gaussian process dynamical model (GPDM) can be used to reduce the dimension of the HD data, acquire the corresponding LD data, and build the mapping from the LD space to the HD space, but GPDM cannot quickly reduce the dimension of a new HD data sample and acquire the new corresponding LD data; thus the incremental probabilistic dimension reduction model (IPDRM) is proposed, based on GPDM, to solve this problem. The improved incremental and nonincremental learning algorithms in [11–14] cannot satisfy our need: their output data, denoting a class label or other simple information, have only one dimension, which cannot describe some output data, and they cannot carry out unsupervised incremental learning of HD data. Inspired by the research stated above, the key to estimating human motion in 3D is to generate better prior information. Human motion in 3D can then be estimated more accurately by searching around this prior information in a small scale only once. In this paper, we mainly focus on regular human motion cycles (walking or running).

Our task is to use a small amount of HD data samples as the prior information to estimate the human motion in 3D that matches the multiview images. Based on the research above, we propose a novel algorithm called low dimensional space incremental learning (LDSIL). LDSIL mainly relies on SEMAS and IPDRM to collect new HD samples and updates the mapping from the LD space to the HD space through the selection of new HD samples, so that searching in the LD space can generate more accurate HD data to estimate the corresponding human motion in 3D. SEMAS is used to find the spatial position of the human motion in 3D, and it finds the best data sample during searching more easily. IPDRM is used to reduce the dimension of a new HD data sample and acquire the new corresponding LD data, and it helps to select the new HD samples through the mapping of incremental dimension reduction. Moreover, it provides the LD space in which valid HD data can be generated. Based on IPDRM, the HD data samples for incremental learning can be selected by comparing the corresponding LD data.

The main contributions of this paper are as follows:
(1) SEMAS is proposed to find the spatial position of the human motion model. It obtains the best data sample more reliably than traditional Monte Carlo methods.
(2) IPDRM is proposed to reduce the dimension of a new HD data sample and acquire the new corresponding LD data. It enables incremental learning in the LD space to update the mapping from the LD space to the HD space. Besides, it provides the LD space in which valid HD data can be generated through searching.
(3) A method of selecting HD data samples is proposed; it is used to update the mapping from the LD space to the HD space.

Overall, because LDSIL is able to make use of LD data, it can solve the problems mentioned above and contributes substantially to estimating human motion in 3D, with better performance than other traditional algorithms in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error.

The rest of this paper is organized as follows: Section 2 introduces the corresponding data and models, which are used to estimate the human motion. Section 3 proposes the SEMAS algorithm to find the spatial position of the human motion model. Section 4 proposes IPDRM to achieve incremental dimension reduction. Section 5 proposes orthogonal least squares learning of multiple outputs (OLSLMO) to learn the mappings (HD space to LD space and LD space to HD space). Section 6 discusses LDSIL based on Sections 2–5 and describes how to select new HD data samples from the estimated human motion models to achieve incremental learning in the LD space. Section 7 proposes the method of searching in the LD space to estimate the human motion model; this method takes advantage of SEMAS and IPDRM. Section 8 shows the validity of the proposed algorithm (LDSIL) through experiments and evaluations. Section 9 discusses the limitations of the LDSIL algorithm and future improvements.

We give a more detailed discussion in the following sections.

2. Corresponding Data and Models

We introduce the corresponding data and models following the works [4, 5]. All image data can be found in the HumanEva-I dataset [4], as shown in Figures 1(a)–1(d). Figure 1(a) shows the human motion model denoting the human motion in 3D, which is described by HD data. The model is the object to be estimated, which needs to match the limbs in the multiview images. Figure 1(b) shows the multiview images, which depict the human motion and its spatial position. After using image segmentation algorithms [15–17] to process the multiview images, we obtain the silhouettes shown in Figure 1(c). Then, we project the model to the corresponding views and obtain the projection images in Figure 1(d). The images in Figure 1(d) are compared with the images in Figure 1(c).

In the following, we give some definitions for the abovementioned data and models. The equation is built as below:where denotes the th frame, denotes the th view of camera, is the number of views, denotes the weight coefficient, denotes the pixel of the image, denotes the set of pixels in the silhouette image, and denotes the set of pixels in the projection image of the human motion model. Then, is statistical function of , which is in the silhouette (, or , ). is statistical function of , which is in the model projection (, or , ). Thus, in the th view, denotes the pixel number of in the silhouette and not in the model projection, denotes the pixel number of in the model projection and not in the silhouette, and denotes the pixel number of in the silhouette and the model projection. Then, the object image consists of multiview silhouette images, denotes the th view feature of object image, denotes multiview features of object image, and , respectively, denote the HD data samples which contain spatial position information (6 dimensions in total) and no spatial position information, , and thus is the conditional probability of matching image feature of the th view from the appearance of . Usually, we let and in most cases. After dimension reduction, the corresponding LD data of the HD data samples with no spatial position information can be seen in Figure 1(e). The HD data samples are the small amount of samples which cover one cycle of human motion.
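To make the per-view matching concrete, the minimal sketch below counts, for one view, the pixels that lie only in the silhouette, only in the model projection, and in both, and turns the mismatch into a simple matching score. The exponential form and the weight are illustrative assumptions, not the paper's exact likelihood.

import numpy as np

def view_likelihood(silhouette, projection, weight=1.0):
    """Per-view matching score between a binary silhouette mask and the
    binary projection mask of the human motion model. `silhouette` and
    `projection` are boolean arrays of the same shape. The exponential
    of a weighted mismatch count is only an illustrative choice."""
    in_sil_not_proj = np.count_nonzero(silhouette & ~projection)
    in_proj_not_sil = np.count_nonzero(projection & ~silhouette)
    in_both = np.count_nonzero(silhouette & projection)
    # Penalise non-overlapping pixels, normalised by the overlap size.
    mismatch = in_sil_not_proj + in_proj_not_sil
    return np.exp(-weight * mismatch / max(in_both, 1))

def multiview_likelihood(silhouettes, projections, weight=1.0):
    """Product of the per-view scores over all camera views."""
    return float(np.prod([view_likelihood(s, p, weight)
                          for s, p in zip(silhouettes, projections)]))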

For a better description of the proposed algorithm in the following sections, we define the symbols of some operations as follows. is extracting the th–th dimension data of vector as a subvector; and are matrices or vectors, , ; is the matrix whose elements are 1; is the matrix whose elements conform to the distribution ; the set can be described by , , ; is the 2-norm (Euclidean norm), and can denote a vector or matrix.

3. Stochastic Extremum Memory Adaptive Searching

In this section we propose the SEMAS algorithm, which is used to find the spatial position of the human motion model. The spatial position information includes the angles and coordinates of the root marker in the model, which is low dimensional data. At time (the th frame), let denote the spatial position information of , the set , , and and ; we get (2), where SEMAS can be carried out as follows. We denote as the ordinal number of the maximum element in , , , , and is a sample of the th search at time , , , . Then, we let , is the scale parameter vector, , and is the searching parameter of the step size; thus the searching vectors can be given by or , ; the new data samples can be obtained by , . Here, the extremum needs to be kept: let , and, in the next search, the extremum is recorded to ensure that the obtained data samples are not worse than those from the last search. In each search, we obtain data samples. After searching twice, we can adjust the value of according to (3): when the difference between the best objective values of the past two searches is below some value, can be enlarged to search for better data samples; on the contrary, if the difference is above some value, can be shrunk to avoid missing the best data sample. Thus, the method converges faster. After searches ( is large enough), when is approximately unimodal, the best sample can be obtained by (4); the derivation of (4) and the pseudocode can be seen in the appendices.
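A minimal sketch of this idea is given below, assuming a maximised objective and a simple Gaussian search around the remembered extremum at every iteration; the step-size constants, the improvement threshold, and the stopping rule are assumptions rather than the paper's exact settings.

import numpy as np

def semas(objective, x0, scale, n_samples=20, n_iters=50,
          step_gain=2.0, improve_tol=1e-3, rng=None):
    """Sketch of stochastic extremum memory adaptive searching.
    `objective` is maximised; `x0` is the initial spatial-position vector
    and `scale` the per-dimension search scale."""
    rng = np.random.default_rng(rng)
    best_x = np.array(x0, dtype=float)
    best_f = objective(best_x)
    prev_best_f = best_f
    scale = np.array(scale, dtype=float)

    for _ in range(n_iters):
        # Random search vectors around the remembered extremum.
        candidates = best_x + scale * rng.standard_normal((n_samples, best_x.size))
        values = np.array([objective(c) for c in candidates])

        # Extremum memory: the kept best competes with the new samples.
        i = int(np.argmax(values))
        if values[i] > best_f:
            best_x, best_f = candidates[i], values[i]

        # Adapt the scale from the improvement between two searches
        # (small improvement -> widen, clear improvement -> refine).
        if best_f - prev_best_f < improve_tol:
            scale *= step_gain
        else:
            scale /= step_gain
        prev_best_f = best_f

    return best_x, best_f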

We now compare the performance of the traditional Monte Carlo method and SEMAS, as depicted in Figures 2(a) and 2(b). In Figure 2, we take 4 searches with 3 data samples each as an example. Figure 2(a) shows that the traditional Monte Carlo method cannot ensure that the searched data samples are not worse than those of the previous search; it simply keeps the data samples with larger weights. The reason is that it does not keep the best data sample of each search, so that sample cannot be compared with the data samples searched in the next step; moreover, the length and direction of the searching vector are stochastic and uncontrolled. After several searches, in most cases, it may happen that none of the searched data samples approaches the optimum, and the mean ,   cannot reach it either. As shown in Figure 2(b), SEMAS keeps the best data sample of each search, which is compared with the data samples searched in the next step. It ensures that the searched data samples are not worse than those of the previous search and adjusts the length of the searching vector to control the searching scale according to the difference between the two best objective values of the past two searches; thus it approaches the optimum and has a better chance of obtaining good data samples than the traditional Monte Carlo method.

4. Incremental Probabilistic Dimension Reduction Model

4.1. Probabilistic Models of GPDM

We give the probabilistic models of the GPDM in [10] as follows. In (5), the HD data sequence can be denoted by , , and the LD data sequence can be denoted by , . The kernel matrix with parameter is , ; is the scale parameter matrix, , , ; in (6), the kernel matrix with parameter is , , , , and conforms to a Gaussian distribution of dimensions. and satisfy and , respectively. Then, we give the conditional distribution as follows: where , , , , , , and . The kernel matrices include , , , , , and , and .

When training GPDM, is known; thus is constant, and . Then, the LD data and corresponding parameters can be calculated by (9). Optimization of (9) can be performed with the scaled conjugate gradient (SCG) method [18, 19]. After the optimization, can be depicted as shown in Figure 1(e). According to (7), , the mapping from LD data to HD data can be given by the mean in (10).
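As a concrete illustration of this reconstruction, the sketch below computes the standard GP posterior mean that maps new latent (LD) points to HD poses; the kernel form and hyperparameters are assumptions, since in the paper they result from the SCG optimisation of the GPDM.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * gamma * d2)

def gp_mean_ld_to_hd(X, Y, x_new, gamma=1.0, noise=1e-4):
    """Posterior mean of the GP mapping from latent (LD) points X (N x d)
    to HD poses Y (N x D), evaluated at new latent points x_new:
        mean(x*) = k(x*, X) (K + noise*I)^{-1} Y."""
    K = rbf_kernel(X, X, gamma) + noise * np.eye(len(X))
    k_star = rbf_kernel(np.atleast_2d(x_new), X, gamma)
    return k_star @ np.linalg.solve(K, Y)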

4.2. The Mapping of Incremental Dimension Reduction

After training, GPDM cannot process a new HD data sample; that is, it cannot acquire the corresponding LD data. In (10), is known, but is embedded in the nonlinear kernel and is therefore hard to solve for from this equation. Thus, we need to build a mapping from HD data to LD data that can quickly acquire the LD data of a new HD data sample.

We denote the HD data samples not containing spatial position information as ; we then denote the corresponding LD data, which can be acquired by GPDM, as . The mapping can be built through the training model in (11), where is the weight matrix and is the matrix of Gaussian basis functions; thus we have , is the bias matrix, and . is known; after training (11), , , and can be confirmed; here, and denote the trained values of and , respectively, . Then, we have the mapping of incremental dimension reduction in (12), where, for one new HD sample , we let , , and . The mapping (12) can acquire the corresponding LD data of a new HD data sample; furthermore, it can acquire the corresponding LD data of multiple HD data samples, which only requires adding the corresponding columns in and . Equations (5)–(12) constitute IPDRM; the learning method of (11) is the key to estimating the human motion in 3D, so we discuss it in the next section.
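A minimal sketch of the form of (11)-(12) is given below, assuming a radial-basis expansion with Gaussian basis functions whose centers, weights, and bias come from the training described above; the basis width is an assumed hyperparameter.

import numpy as np

def gaussian_basis(x, centers, width=1.0):
    """Gaussian basis responses of one HD sample `x` (shape (D,))
    against the trained centers (shape (M, D))."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

def hd_to_ld(x_new, centers, W, b, width=1.0):
    """Mapping of incremental dimension reduction in the spirit of (12):
    the LD coordinates of a new HD sample are a weighted sum of Gaussian
    basis responses plus a bias. W has shape (M, d_ld), b shape (d_ld,)."""
    phi = gaussian_basis(np.asarray(x_new, dtype=float), centers, width)
    return W.T @ phi + b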

In the following, we discuss several advantages of IPDRM. Firstly, the model is used to build the LD space from the known HD data samples, acquire the LD data of a new HD data sample, and provide global prior information about the human motion in the LD space. The searching position in the LD space can be confirmed, through clustering, to disambiguate silhouettes, because the mapping from the HD space to the LD space is one-to-one. Secondly, it can reduce the estimation error and overcome transient occlusion through linear and nonlinear searching in the LD space via the global prior information; moreover, the HD data samples generated through large-scale nonlinear searching between two neighbouring LD data in the LD space are better and more valid than those generated through large-scale nonlinear searching between two neighbouring HD data directly in the HD space, as shown in Figure 3. In the two experiments (walking and running models), the HD data generated through nonlinear searching in the LD space (Figure 3(b)) and converted to human motion models resemble the human motion shape more than those generated through nonlinear searching in the HD space (Figure 3(a)). Finally, searching in the LD space obviously has a lower computational cost than searching in the HD space.

5. Orthogonal Least Squares Learning of Multiple Outputs

In this section, we propose a method called orthogonal least squares learning of multiple outputs (OLSLMO). In the LD space or the HD space, the data have multiple dimensions, which allows the feature or model to be described better. We denote the model as in (13), where the matrices of the output and input vectors are, respectively, and ; , , , and ; the error matrix is . Then, the matrix can be decomposed into the form , where is the matrix with orthogonal vectors , ; , ; , is a diagonal matrix; and is the invertible matrix with 1 on the diagonal given below. Let the least squares (LS) estimator , ; we give the derivation as below. Equation (13) can be rewritten accordingly, and letting , according to the property of LS estimation, we obtain (17); letting , , minimizing (17) can be achieved as below, where ; let , , and let be the orthogonal vector set corresponding to the vector set . The value of can be confirmed according to the condition , . When this condition is satisfied, we let , where and the error matrix is . According to the equation and updating method [20] of LS, denotes the LS estimator of ; we let , , , and and get the following, where if the condition holds, is satisfied, and if the condition holds, can also be satisfied; because is determined, can be further approximated based on , which plays the leading role during training. A more detailed derivation of the condition can be seen in the appendices. Then, we have , , and .

Then, we summarize the algorithm of OLSLMO as in Algorithm 1.

Input: , , , ;
Output: , , ;
For
If = 1
, ;
Else If ≥ 2
, , ;
;
End If
;
;
; ;
; ;  ;
; ;
; ;
If
Break;
End If
End For
;

The learning of (11) can use Algorithm 1: we set the input parameters , , , and and the functions , , , and , and then obtain the corresponding output parameters , , , , and .
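For reference, a minimal sketch of multi-output orthogonal least squares selection in the spirit of OLSLMO is given below, assuming the classical forward-selection form with Gram-Schmidt orthogonalisation and an error-reduction-ratio stopping rule; the exact stopping condition and output weighting of Algorithm 1 are not reproduced.

import numpy as np

def olslmo(Phi, D, tol=1e-2):
    """Greedily pick columns of the candidate regressor matrix Phi (N x M)
    that explain the most energy of the multi-dimensional output D (N x q),
    Gram-Schmidt-orthogonalising the remaining candidates at every step.
    Stops when the summed error-reduction ratio exceeds 1 - tol.
    Returns the indices of the selected regressors."""
    Phi = np.asarray(Phi, dtype=float).copy()
    D = np.asarray(D, dtype=float)
    N, M = Phi.shape
    trace_D = np.sum(D * D)
    selected = []
    total_err = 0.0

    for _ in range(M):
        # Error-reduction ratio of every remaining candidate, summed over outputs.
        norms = np.sum(Phi * Phi, axis=0)
        usable = norms > 1e-12
        err = np.full(M, -np.inf)
        G = np.zeros((M, D.shape[1]))
        G[usable] = Phi[:, usable].T @ D / norms[usable, None]
        err[usable] = np.sum(G[usable] ** 2, axis=1) * norms[usable] / trace_D
        if selected:
            err[selected] = -np.inf

        k = int(np.argmax(err))
        if err[k] == -np.inf:
            break
        selected.append(k)
        total_err += err[k]
        if 1.0 - total_err < tol:
            break

        # Gram-Schmidt: remove the chosen column's direction from the rest.
        w = Phi[:, k].copy()
        Phi -= np.outer(w, (Phi.T @ w) / (w @ w))

    return selected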

6. Low Dimensional Space Incremental Learning

After low dimensional space searching (LDSS) is used to estimate the human motion models of the corresponding frames in 3D (about one motion cycle), new HD data samples can be obtained from these models. The LDSS algorithm is given in the framework of LDSIL shown in Section 7.2. Continuing to use the mapping of (10) to generate HD data samples would then become less accurate, so incremental learning is needed to update the mapping from the LD space to the HD space. Equation (10) is so complicated that its parameters cannot be updated easily, because of the calculation required in (9). For this reason, we still use a model like (11) to build the mapping, which can take advantage of the new HD data samples and be learned with the method of Algorithm 1. The details of the incremental learning are discussed as follows.

The key to incremental learning is how to select the HD data samples used to update the mapping from the LD space to the HD space. Better new samples for training the mapping can be selected from the estimated HD data with the spatial position information removed. The estimated HD data samples contain errors and can be converted into the human motion models of the corresponding frames. After we get the corresponding new HD data samples , we can use (12) to obtain the corresponding LD data, , and get . Since the LD data represent the features of the HD data, the distances among LD data describe the similarity of the HD data, and we need to select from the LD data that are well distributed in the LD space in order to obtain the corresponding HD data. The selection rule is carried out as follows. We denote the ordinal number as in (21); then we have (22), and the LD data and the corresponding HD data can be selected by (23), where we let and and then get (24). Equation (24) means selecting the best well-distributed LD data. Thus, the training data samples of input and output are given by (25), where we let and ; , are the samples for incremental learning; the can be seen in the LD space as depicted in Figure 4, where the green ones denote . The new mapping can be built through the training model (26), where , , , , , and denote the trained values of and , respectively, and . After training (26), the incremental learning is completed. Then, the mapping is updated, so that new HD data can be generated through the updated mapping (27), where, similarly, for one , we let , , and . When we have multiple , we can also get the corresponding multiple ; it only requires adding the corresponding columns in and .
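The exact selection rule is defined by (21)-(25); as an illustration of its stated goal, keeping the selected LD data well spread so that the matching HD samples cover the motion evenly, a greedy farthest-point selection could look like the sketch below (the starting index and distance measure are assumptions).

import numpy as np

def select_well_distributed(Z, n_select):
    """Greedy farthest-point selection of `n_select` LD points from the
    candidate set Z (rows are LD vectors). Returns indices into Z, which
    also index the matching HD samples."""
    Z = np.asarray(Z, dtype=float)
    chosen = [0]                                  # start from the first candidate
    min_dist = np.linalg.norm(Z - Z[0], axis=1)
    for _ in range(1, n_select):
        k = int(np.argmax(min_dist))              # farthest from everything chosen so far
        chosen.append(k)
        min_dist = np.minimum(min_dist, np.linalg.norm(Z - Z[k], axis=1))
    return chosen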

Training (26) can also use Algorithm 1: we set the input parameters , , , and and the functions , , , and , and then obtain the corresponding output parameters , , , , and . We summarize the incremental learning algorithm as Algorithm 2.

Input: , , b, ;
Output: , , , ;
For
If =
; ;
Else
; ;
End If
;
For
;
If
;
End If
;
End For
;
For
;
End For
; ;
End For
= ; ; ; ;
; ; ;
; ; ; ; ; ;
Use Algorithm 1 to get and ;
, , ; ;
;

The mapping (27) is the incrementally learned mapping, which replaces (10) in IPDRM; it strengthens the second advantage of IPDRM mentioned above, because the mapping from LD data to HD data is updated and becomes more accurate. Note that at least one must lie between and , so that the incremental learning is more effective; generating HD data samples with LDSS should therefore follow this rule.

7. Human Motion Estimation via Searching in Low Dimensional Space

7.1. The Method of Searching

We estimate the human motion model via searching in the low dimensional space. The searching method needs to combine linear searching and nonlinear searching in order to generate a better corresponding HD data sample, as depicted in Figure 5. In Figure 5, the hollow dots denote the new LD data acquired by linear and nonlinear searching in the LD space; the solid dots denote the known LD data acquired by dimension reduction of the known HD data samples. We discuss the details as follows.

We take ( or ), ( or ), and the mapping ( or ) as examples to discuss the method. For the new HD data sample , we use the mapping to obtain the (). Then, clustering of the can be executed by comparing distances to find the nearest known LD data, as in (28). After finding the , with denoting the number of newly searched LD data, we need linear searching to get new LD data between the two known neighbouring LD data near , as in (29); here, , , and ; let ; then we have , , and , . Besides, we also need nonlinear searching to obtain them similarly, with denoting the number of new ones, as in (30), where , , , ,  , and . From (29) and (30), we obtain the set containing the searched LD data, and we use the mapping to get another set from it; according to (2), we then obtain the optimal prior information of the pose, and the optimal prior HD data is , where is the optimal spatial position information obtained by SEMAS.
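A minimal sketch of this candidate generation is given below, assuming the known LD points are stored in motion-cycle order; the linear candidates interpolate between the two neighbours of the nearest known point, and the nonlinear step of (30) is only illustrated by perturbing those interpolants.

import numpy as np

def ld_search_candidates(z_prev, Z_known, n_lin=10, n_nonlin=10, noise=0.05, rng=None):
    """Generate candidate LD points around the previous pose:
    1) find the known LD point nearest to `z_prev` (the clustering step);
    2) linear search: convex combinations of its two neighbours on the
       known trajectory `Z_known` (rows ordered along the motion cycle);
    3) nonlinear search: illustrated here by Gaussian perturbations of
       the linear candidates."""
    rng = np.random.default_rng(rng)
    Z_known = np.asarray(Z_known, dtype=float)
    z_prev = np.asarray(z_prev, dtype=float)
    i = int(np.argmin(np.linalg.norm(Z_known - z_prev, axis=1)))
    a = Z_known[(i - 1) % len(Z_known)]           # neighbours on the cyclic trajectory
    b = Z_known[(i + 1) % len(Z_known)]

    t = rng.uniform(0.0, 1.0, size=(n_lin, 1))
    linear = (1.0 - t) * a + t * b

    t = rng.uniform(0.0, 1.0, size=(n_nonlin, 1))
    nonlinear = (1.0 - t) * a + t * b + noise * rng.standard_normal((n_nonlin, a.size))

    return np.vstack([linear, nonlinear])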

On the basis of the work above, we can estimate the human motion model from the multiview silhouettes through Bayesian theory. When is known, is unknown, and is large enough, we let , and the estimation can be achieved by (34); after getting , the human motion model can be drawn in the space. Note that is a multivariate normal distribution density function which takes as the mean, is the number of generated , is included as one of the samples, which lets , and is the normalized weight. The searching method before and after incremental learning is the same; only some variables need to be replaced.
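To make this final step concrete, a sketch of the weighted-mean estimate is given below, assuming the samples are drawn once around the optimal prior pose with a small, user-chosen Gaussian scale and weighted by the multiview likelihood.

import numpy as np

def estimate_pose(x_prior, likelihood, n_samples=200, sigma=0.02, rng=None):
    """Draw HD samples once around the optimal prior pose `x_prior` with a
    small-scale Gaussian, weight them by the (positive) multiview silhouette
    likelihood, and return the normalised weighted mean. The sampling scale
    `sigma` is an assumed parameter."""
    rng = np.random.default_rng(rng)
    x_prior = np.asarray(x_prior, dtype=float)
    samples = x_prior + sigma * rng.standard_normal((n_samples, x_prior.size))
    samples[0] = x_prior                       # keep the prior itself as one sample
    w = np.array([likelihood(s) for s in samples])
    w = w / w.sum()                            # normalised weights
    return w @ samples                         # weighted-mean estimate of the pose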

The poses of human motion, as shown by the human motion models, are continuous. The pose estimated at the previous frame is close to that of the current frame; thus the new HD sample can be taken as the previously estimated pose, and the corresponding LD data can be acquired by IPDRM. We then find the known LD data close to it through clustering to confirm the searching position in the LD space, so that HD data samples close to the true data of the current frame can be generated efficiently by searching in the nearby area of this position. Owing to these advantages, the proposed algorithm (LDSIL) can disambiguate silhouettes, overcome transient occlusion, and reduce estimation error. Furthermore, its performance is further improved by incremental learning, which updates the mapping from the LD space to the HD space.

For the Bayesian estimation above, we use SEMAS and the proposed searching method to find the optimal prior HD data , which makes large enough. When , according to (34), also becomes large enough. This means that can be seen as a sample generated from the unimodal distribution density , because and is known. Moreover, the true HD data can be seen as the mean of the distribution density , and thus its density value will be larger than the others. On the whole, this also means that is a sample close to the true HD data; thus taking as the mean () and searching the HD data in the HD space only once with a small-scale variance generates HD data samples that also have large values of the distribution density ; then the mean of these samples will be close to the mean of the distribution density , in other words, close to the true HD data. This avoids small-scale searching (sampling) of the HD data many times or large-scale searching (sampling) of the HD data directly, either of which generates invalid HD data when estimating the human motion in 3D.

Then, we will discuss more details about the whole procedure of human motion estimation in the following section.

7.2. The Procedure of Algorithm for Human Motion Estimation

The work of Sections 2–7 can be summarized into the complete algorithm procedure of LDSIL for estimating human motion in 3D. The framework of LDSIL is shown in Figure 6. We now give a more detailed description of the framework.

From Figure 6, we can see the framework of LDSIL. Firstly, LDSIL uses LDSS to estimate the human motion in 3D based on the small amount of HD data samples and obtains estimated HD data covering about one motion cycle. The framework of LDSS consists of SEMAS, linear and nonlinear searching in the LD space, IPDRM, and Bayesian theory. The estimated HD data with the spatial position information removed can be selected as new HD data samples and used to update the mapping from the LD space to the HD space through IPDRM. Secondly, IPDRM is used to obtain the corresponding LD data of the new HD data samples through its mapping of incremental dimension reduction. Selecting the new HD data samples is achieved by comparing the distances among the corresponding LD data. After the comparison, the LD data and the corresponding HD data are obtained and used to update the mapping from the LD space to the HD space for the incremental learning of the LD space. Thirdly, the new mapping updated by the selected HD data and the corresponding LD data can generate HD data more accurately; thus the estimation achieves better performance with the help of the new mapping. Finally, the estimation of human motion in 3D from the multiview silhouettes is carried out through the framework including SEMAS, linear and nonlinear searching in the LD space, IPDRM with the new mapping, and Bayesian theory. When LDSIL is used to estimate the human motion, the following data are known.

We have the sequence of the small amount of HD data samples (no spatial position information) covering one motion cycle and the HD data of the initial frame (1st frame) containing the spatial position information (6 dimensions in total) . denotes the number of frames estimated without incremental learning, which are used for the sample selection of incremental learning, and denotes the number of frames estimated with incremental learning. Then, the procedure is as follows.

LDSIL for Human Motion Estimation
(1) Use (9) to reduce the dimension of , obtain , build the mapping , and let , .
(2) Use Algorithm 1 to train equation (11), build the mapping , obtain IPDRM, and let the mapping .
(3) When estimating the th frame, use SEMAS and set the relevant parameters to find the spatial position information. Let , select , and get the optimal planimetric position information . Then, let , select , and get the optimal height and rotation angle information . At last, obtain .
(4) Let , obtain the LD data of the th frame , and calculate ,  . Search the data in the LD space, get the set , and calculate , , and then get .
(5) Generate , , through , let , calculate , , and get . Then, the human motion model can be output through , and let , . If , , return to step (3); otherwise, get the estimated HD data (new HD data samples) and go to the next step.
(6) If incremental learning is needed, go to the next step; otherwise end the algorithm.
(7) After getting , use the mapping to obtain . Then, get the training samples and selected from and by the method proposed above, and use Algorithm 2 to carry out incremental learning to train (26). After the training, get the updated mapping . Let , , , and mapping , and return to step (3). A high-level skeleton of this procedure is sketched below.
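The skeleton below mirrors steps (1)-(7) at a high level; every helper name (gpdm_train, train_mapping, semas, ld_search, bayes_estimate, select_samples) is a hypothetical stand-in for the corresponding step, not an API defined in the paper.

def ldsil_pipeline(hd_samples, init_pose, frames, n_no_incremental, helpers):
    """High-level skeleton of the LDSIL procedure above."""
    ld = helpers.gpdm_train(hd_samples)                         # step (1): GPDM dimension reduction
    ld_to_hd = helpers.train_mapping(ld, hd_samples)            # step (2): Algorithm 1 / OLSLMO
    estimated = [init_pose]
    for t, frame in enumerate(frames):
        pos = helpers.semas(frame)                              # step (3): spatial position via SEMAS
        prior = helpers.ld_search(estimated[-1], ld, ld_to_hd, frame)  # step (4): LD-space search
        pose = helpers.bayes_estimate(prior, pos, frame)        # step (5): Bayesian estimate
        estimated.append(pose)
        if t + 1 == n_no_incremental:                           # steps (6)-(7): incremental learning
            new_hd, new_ld = helpers.select_samples(estimated)
            ld_to_hd = helpers.train_mapping(new_ld, new_hd)
    return estimated[1:]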

8. Experiments and Evaluations

We tested the performance of the proposed algorithm (LDSIL) in three views () by comparing it with three traditional algorithms: the annealed particle filter [4] (APF), the Gaussian particle filter [21] (GPF), and the particle filter [22] (PF). The tested aspects included disambiguating silhouettes, overcoming transient occlusion, and estimation error. Besides, we compared LDSIL with LDSS to test the validity of incremental learning independently. LDSS is similar to LDSIL and shares its other components, but it has no incremental learning: the mapping from the LD space to the HD space is not updated, and (10) is still used to generate HD data. The performance of LDSIL and LDSS can be seen from the comparison of the mean error and the maximum error.

Firstly, we tested silhouette disambiguation, taking walking and running motion as the test cases. As seen in the subfigures of Figure 7, after several frames were estimated, APF, GPF, and PF could not disambiguate the silhouettes to estimate the human motion in 3D; in contrast, LDSIL could disambiguate the silhouettes and produce an estimate close to the true data. Ambiguity means that silhouettes such as those in Figure 1(c) cannot distinguish the positions of the limbs of the human body; for example, the silhouettes cannot indicate which leg is in front and which is behind. Thus, in the subfigures of Figures 7(a) and 7(b), the estimated human motion projections in all views from APF, GPF, and PF show that the positions of the left leg (white line) and the right leg (gray line) are opposite to those of the true data. The corresponding human motion models reflect the same phenomenon: the left leg (gray) and the right leg (black) of the model are also reversed.

Secondly, we tested the ability to overcome transient occlusion. Transient occlusion is equivalent to estimating an adjacent but discontinuous frame: when the cameras of all views are transiently occluded, the silhouettes of all views cannot be obtained. Two frames (initial data and true data) separated by an interval of 10 frames were used to test all the algorithms. As seen in the subfigures of Figures 8(a) and 8(b), APF, GPF, and PF could not estimate the human motion models accurately, and the limbs of the human body mismatched the images of all views. However, as depicted in the corresponding subfigures, LDSIL estimated the models more accurately than APF, GPF, and PF, and the limbs of the human body matched the images of all views better, as in the true data.

Thirdly, we used 50 consecutive frames to test the estimation error of each algorithm, with walking and running motion sequences (walking 1–3, running 1-2) whose spatial positions changed noticeably. The subfigures of Figure 9 show the experimental results: the mean error and standard deviation of LDSIL were the smallest among these algorithms, the maximum error shown in the legends of the corresponding subfigures was also the smallest, and the errors of most frames from LDSIL were the smallest. The error of each frame was computed as below [23], where and are, respectively, the true position and the estimated position of a joint marker in the model and is the number of markers. From Figures 9(b)–9(f), we found that the errors of all the tested algorithms were close in the estimation of the first 20 frames, but the error of LDSIL remained smaller than that of APF, GPF, and PF after the 20th frame. It is reasonable that the errors of APF, GPF, and PF were close to those of LDSIL for some frames, because the initial data were close to the true data and the sampling is stochastic, but they were still larger than those of LDSIL on the whole.
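As a concrete reading of this error metric [23], the sketch below computes the mean Euclidean distance between the true and estimated marker positions of one frame; the array layout (one row of 3D coordinates per marker) is an assumption about the data format.

import numpy as np

def frame_error(true_markers, est_markers):
    """Per-frame error: mean Euclidean distance between the true and
    estimated positions of the joint markers of the model, where both
    inputs have shape (n_markers, 3)."""
    true_markers = np.asarray(true_markers, dtype=float)
    est_markers = np.asarray(est_markers, dtype=float)
    return float(np.mean(np.linalg.norm(true_markers - est_markers, axis=1)))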

Finally, we tested the validity of the incremental learning of LDSIL. Compared with LDSIL, LDSS has no incremental learning. We arbitrarily selected 3 motion sequences and tested the estimation error 6 times, covering cases with known and unknown spatial position information. The same parameters were set in the experiments, and the performance of LDSIL was better than that of LDSS, as shown in Table 1, according to the comparison of the mean and maximum errors. The experimental results indicate that the incremental learning plays the key role, because LDSIL had the smaller mean error and maximum error in the tests.

9. Conclusions

The experiments above indicate that the proposed algorithm (LDSIL) contributes substantially to estimating human motion in 3D, including its spatial position information, from multiview silhouettes. The results show that LDSIL has the best performance in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error, in comparison with the other three algorithms (APF, GPF, and PF). Meanwhile, the feasibility and performance of the incremental learning in LDSIL were also validated by the experiments. In addition, the segmentation of the multiview images must be of high quality; otherwise the experimental results would be affected. Our work has some limitations. Firstly, the initial frame of the human motion model (the 1st frame), which contains the spatial position information, must be known; secondly, the spatial position and pose of the estimated human motion in 3D change regularly, and the human motion model has markers. In the future, these limitations will be addressed: we will remove the requirement that the initial frame be known, focus on complex human motion estimation, and use a markerless human motion model [24, 25].

Appendices

A. Derivation of (4)

Due to , ; thus, in the space of dimensions, each direction has the same chance to generate . When the objective function is approximately unimodal, is the ordinal number of the maximum element in , , , , ; in the th search, we have , , , and get , , , . Then, we get the following; let , in the th search, we similarly have , , and get , , , . Then,

If , the can be kept, and then , and let , and we continue to search until finding that makes , , and . Since is a stochastic searching vector in the space of dimensions, this can be satisfied, according to probability theory. Thus, when is large enough, .

SEMAS can run as in Algorithm 3.

Input: , , , , , , , , ;
Output: ;
For
;
;
;
If and
;
Else If and
;
End If
End For
;

B. Derivation of the Condition in Section 5

B.1. Condition

If the condition holds, is satisfied, and if the condition holds, can also be satisfied.

B.2. Derivation

According to (19), (20), and the property of LS, , , , , and ; we can see the following; then we get ; and then ; we have ; from the equation above, we know , , , and and get ; thus we let and and get the following equation, when

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by National Natural Science Foundation of China (no. 61202292) and Guangdong Provincial Natural Science Foundation of China (no. 9151064101000037).