Abstract

Automatic human motion tracking in video sequences is one of the most frequently tackled tasks in the computer vision community. The goal of human motion capture is to estimate the joint angles of the human body at any time. However, this is one of the most challenging problems in computer vision and pattern recognition due to the high-dimensional search space, self-occlusion, and high variability in human appearance. Several approaches have been proposed in the literature using different techniques. However, conventional approaches such as stochastic particle filtering have shortcomings: they are computationally costly, converge slowly, suffer from the curse of dimensionality, and demand a high number of evaluations to achieve accurate results. Particle swarm optimization (PSO) is a population-based global search algorithm which has been successfully applied to the human motion tracking problem and has produced better results in high-dimensional search spaces. This paper presents a systematic literature survey on the PSO algorithm and its variants as applied to human motion tracking. An attempt is made to provide a guide for researchers working in the field of PSO-based human motion tracking from video sequences. Additionally, the paper presents the performance of various model evaluation search strategies within the PSO tracking framework for 3D pose tracking.

1. Introduction

Human motion tracking is a general requirement in many real-time applications, including automatic smart security surveillance [1], human-computer interaction (HCI), the 3D animation industry [2], medical rehabilitation [3], and sport science (e.g., for movement and behaviour analysis). To improve feasibility in such applications, research on articulated human motion tracking and pose estimation has grown continuously in the past few years [4–12].

The primary objective of markerless articulated human motion tracking is to automatically localize the pose and position of a subject from a video stream (a sequence of images). This task is formulated by rendering a human body model onto the images to identify the model configuration that best matches the input images. One major line of research is based on articulated models [4–12]. The interest in this area is owed to two benefits. Firstly, it generates results in the form of a model configuration for each frame, which can be useful for various higher-order processing tasks such as character animation and 3D movies. Secondly, human models are sufficiently capable of giving abundant information on the kinematics of human body motion. However, the key challenge in this approach is the high dimensionality of the search space, due to the large number of degrees of freedom typically present in an articulated human body figure.

Human motion tracking is a very complex task due to the high-dimensional parametric search space and the large number of degrees of freedom involved. Other challenges include cluttered and varying backgrounds, occlusion, ambiguity, and illumination changes. To address these challenges, many methods and algorithms have been proposed in the literature using different techniques. The first solutions emerging from the computer vision community were particle filtering (PF) variants [4, 5, 13–15]. In particular, the condensation algorithm is widely used in human motion tracking [16]. However, it suffers from the dimensionality issue when used for the human motion tracking problem. To address this issue, [5] introduced the annealed particle filter (APF), an approach that merges condensation and simulated annealing in an attempt to improve the tracking results as well as reduce the number of particles. The APF performs a multilayer particle evaluation, where the fitness functions in the initial layers are smoothed to prevent the search from being trapped in local minima. In the last layers, the fitness function is more peaked in order to concentrate the particles in the solution regions. To represent the posterior distribution adequately, particle filter solutions critically rely on a large number of particles, which increases the computational complexity beyond practical use when a wide variety of motion is considered [6, 7, 11, 12, 17].
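The layer-wise smoothing used by the APF can be illustrated with a small numerical sketch. It assumes the common formulation in which the particle weights of a layer are the raw fitness values raised to an exponent beta that grows from layer to layer. The function name `annealed_weights`, the fitness values, and the beta schedule are illustrative assumptions and are not taken from the cited works.

```python
def annealed_weights(fitness, beta):
    """Normalized particle weights f_i ** beta for one annealing layer."""
    raw = [f ** beta for f in fitness]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical fitness values of four particles, all in (0, 1].
fitness = [0.2, 0.4, 0.5, 0.9]

# Assumed schedule: a small beta first (smoothed weights), a large beta last
# (peaked weights that concentrate on the best particle).
layers = {beta: annealed_weights(fitness, beta) for beta in (0.25, 1.0, 4.0)}
for beta, weights in layers.items():
    print(beta, [round(w, 3) for w in weights])
```

With a small exponent the weights are nearly uniform, so resampling keeps many hypotheses alive; with a large exponent almost all weight goes to the best particle.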

Partitioned sampling [15] is another approach to reduce the system complexity. The technique was initially introduced in [18] to address the high cost of particle filters when tracking multiple objects. Later on, it was successfully applied to hand tracking. In general terms, partitioned sampling (PS) is a strategy that divides the complete state into several substates ("partitions") and consecutively applies the dynamics to each partition, followed by a suitable weighted resampling procedure. Partitioned sampling can also alleviate the high-dimensionality problem in other settings.

Partitioned sampling (PS) differs from the APF in that it applies a strong partition of the search space. The main problem consists of determining the optimal partition. In an attempt to solve this, Bandouch et al. [13] proposed a method that combines PS and APF, known as PSAPF. The APF is incorporated into a PS framework by utilizing an appropriate weighted resampling in each subspace. This approach is able to deal with high dimensionality, but it suffers from the high cost of a very large number of evaluations per frame (around 8000). Generally, common human pose tracking approaches rely on filtering algorithms, but conventional filtering algorithms have several shortcomings: they are computationally expensive, converge slowly, suffer from the curse of dimensionality, and either rely on simple human models (which leads to suboptimal tracking results) or require a high number of evaluations to achieve accurate results [6, 7, 11, 12, 19].

Many real-world problems, such as articulated human motion tracking and pose estimation, can be formulated as multidimensional nonlinear optimization problems over parameters in continuous domains. In the past few years, evolutionary computation approaches (e.g., GA, PSO, and DE) have been widely used to solve continuous optimization problems, including human motion tracking and pose estimation.

Particle swarm optimization (PSO) [20] is a population-based stochastic optimization algorithm originally inspired by the social behaviour of bird flocking or fish schooling. As a stochastic search scheme, PSO offers simple computation and rapid convergence. PSO has been successfully applied in several areas, including human motion tracking and pose estimation [6–12], and in other optimization areas it competes with genetic algorithms. Currently, PSO and its variants are extensively used in the literature for video-based articulated human motion tracking and pose estimation because they offer several advantages: the ability to solve highly nonlinear problems, robust and reliable performance, global and local search capability, little or no requirement for prior information, and few parameters to adjust [6, 7, 11, 12].

In the PSO tracking framework, the articulated human motion tracking problem is formulated as a multidimensional nonlinear optimization problem. The final tracking results are obtained by optimizing a fitness function that computes the match between the observed image and the 3D body model. Generally, the main aim of the fitness function is to evaluate how well a candidate pose hypothesis matches the observation, that is, the images from all camera views at each time instant. Many fitness functions have been proposed in the literature, including optical flow, appearance model, skin, color, and contour-based fitness functions. However, the most common approaches rely on silhouette- and edge-based fitness functions [4–7, 11, 12] because they provide an appropriate trade-off between robustness and speed. Figure 1 illustrates the general optimization framework. Within this framework, many tracking tasks can be reformulated as a global optimization problem, in which a metaheuristic algorithm is used to optimize the model parameters. In this paper, we employ the latter approach; we find the optimal body model configuration by maximizing a fitness function representative of the similarity between the model and the image observation under investigation.
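As a rough illustration of a silhouette-based fitness function, the sketch below scores a pose hypothesis by the overlap between the rendered model silhouette and the observed silhouette, averaged over camera views. The overlap ratio (intersection over union), the function names, and the tiny 3x3 masks are assumptions chosen for illustration; the surveyed works define their own silhouette and edge terms.

```python
def silhouette_fitness(model_mask, observed_mask):
    """Overlap ratio (intersection over union) of two binary masks."""
    intersection = 0
    union = 0
    for model_row, observed_row in zip(model_mask, observed_mask):
        for m, o in zip(model_row, observed_row):
            intersection += 1 if (m and o) else 0
            union += 1 if (m or o) else 0
    return intersection / union if union else 0.0


def multiview_fitness(model_masks, observed_masks):
    """Average the per-camera overlap so every view contributes equally."""
    scores = [silhouette_fitness(m, o)
              for m, o in zip(model_masks, observed_masks)]
    return sum(scores) / len(scores)


# Hypothetical observed silhouette and two rendered pose hypotheses.
observed = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
good_pose = [[0, 1, 0],
             [1, 1, 1],
             [0, 1, 0]]
shifted_pose = [[1, 0, 0],
                [1, 1, 0],
                [1, 0, 0]]

good_score = silhouette_fitness(good_pose, observed)
shifted_score = silhouette_fitness(shifted_pose, observed)
two_view_score = multiview_fitness([good_pose, shifted_pose],
                                   [observed, observed])
```

A higher score means a better match, which is consistent with the maximization setting used in this paper.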

The standard PSO is generally used to find a single optimum in a static search space. In contrast, the pose estimation and tracking problem is dynamic: the optimum changes over time (from frame to frame). Thus, the standard PSO cannot be applied directly, and it is necessary to modify the PSO algorithm to better suit this problem.

PSO has the capability to find the best value for interacting particles; unfortunately, the standard PSO also suffers from the curse of dimensionality, which has led to many variants specifically adapted for pose tracking. In addition, its convergence becomes very slow near the global optimum when it is applied to a high-dimensional parametric search space. As a result, PSO may fail to find the global optimal solution. The existence of local optimal solutions is not the sole reason for this phenomenon. Another reason is that the particle velocities sometimes fall into degeneracy, which restricts the subsequent search to a subplane of the whole search hyperplane. In spite of its reported success, the second major issue in using PSO for the articulated human motion tracking problem is the loss of particle diversity. Generally, this occurs after convergence at the previous level of optimization, when all the particles are close to the previous optimum position and the swarm has shrunk. The swarm may still find the optimum efficiently if the position of the new optimum lies within the region of the shrunken swarm. However, the true optimum may never be found if the current optimum lies outside of the swarm, because the low particle velocities inhibit rediversification and tracking. Therefore, in a dynamic optimization problem, it is necessary to control the particle diversity within the swarm over time.

To overcome the above-mentioned issues, several variants of the PSO algorithm have been proposed for pose tracking in the literature, using techniques such as hierarchical search optimization [6, 7, 11, 12, 21, 22] and global-local refinement [10, 23]. Some adaptive versions of PSO have been presented to incorporate the strengths of other evolutionary and stochastic filtering algorithms, such as hybrid versions of PSO [23–26] or the adaptation of PSO parameters [27]. Although these improvements help to avoid local optima, the problem of early convergence caused by the degeneracy of some dimensions still exists, even in the absence of local minima. Thus, the performance of the PSO algorithm is still limited in high-dimensional parametric search spaces.

Many surveys deliver an effective overview of articulated human body pose estimation and analysis [1, 28–33]. However, these surveys provide only a general overview of human motion tracking and pose estimation. To the best of our knowledge, there is no systematic survey in the literature that details PSO-based approaches for human pose tracking. Recently, some surveys have given an overview of PSO in data cluster analysis applications [34, 35].

As noted above, different variants of PSO using different techniques have been proposed to improve the performance of PSO in pose tracking. Therefore, the main aim of this work is to present a systematic literature survey by reviewing the PSO algorithm and its variants as applied to human pose tracking. An attempt is made to provide a guide for researchers who are working or planning to work in the field of PSO-based human motion tracking from video sequences. Additionally, the paper focuses on the various model evaluation search strategies within the PSO tracking framework for 3D pose tracking, in order to identify their potential for effective pose recovery in a high-dimensional search space. The first is the holistic (global optimization) approach, where all the human body parameters are optimized jointly. The second is hierarchical search optimization, which exploits the articulated structure of the human body by treating the limbs and the head as independent branches. It is then possible to apply a hierarchical search in which independent search processes work on smaller spaces, so that the problem can be solved more easily. Finally, the surveyed work is discussed, gaps are highlighted, and directions for future research are provided.

2. Particle Swarm Optimization (PSO)

This section gives a short overview of the tracking process from a stochastic optimization perspective to indicate how PSO works in a tracking application and how it is able to achieve good performance. The fundamental concept of the PSO algorithm and its variants is also discussed.

2.1. Motivation

Fundamentally, video-based tracking is the process of automatically localizing the subject's pose and position in a video stream. In the PSO context, the tracking problem can be understood as follows: imagine a certain object (food) in the image (state space) being explored. A set of particles (birds) is randomly distributed in the image space (state space). None of the particles (birds) knows the location of the object (food). However, every particle (bird) knows whether it is moving closer to or further away from the object (food), through an objective function (sense of smell or sight). The question is how to find the object (food) effectively by using the information of all particles collectively and efficiently. The PSO algorithm is an attempt to provide a framework for answering this question.

In video-based articulated human motion tracking, the data of concern form a time sequence. Thus, the tracking process is considered a dynamic optimization problem. In this context, the object's height and shape may change with time; the objective function is hence temporally dynamic. Such a dynamic optimization problem can be solved by using two principles: firstly, by effectively utilizing the temporal continuity information between two consecutive frames and, secondly, by maintaining particle diversity during each optimization process [6, 7, 11, 12, 27].
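The two principles can be sketched as a simple swarm initialization for each new frame: particles are centred on the previous frame's estimate (temporal continuity) and spread by random noise (diversity). This is a minimal sketch under assumptions; the Gaussian noise model, the function name `seed_swarm`, and the numeric values are hypothetical and do not reproduce the transition model of any specific cited method.

```python
import random


def seed_swarm(previous_pose, num_particles, sigma, rng):
    """Sample particles around the previous estimate with Gaussian noise."""
    swarm = [list(previous_pose)]  # keep the previous estimate itself
    while len(swarm) < num_particles:
        # Each coordinate is perturbed independently to spread the swarm.
        swarm.append([rng.gauss(value, sigma) for value in previous_pose])
    return swarm


rng = random.Random(0)
previous_pose = [0.1, -0.3, 1.2]  # hypothetical joint angles in radians
swarm = seed_swarm(previous_pose, num_particles=10, sigma=0.05, rng=rng)
```

A larger `sigma` restores more diversity at the cost of a wider search, which matters most when the subject moves quickly between frames.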

2.2. Standard Particle Swarm Optimization

Before discussing the standard PSO algorithm, we first introduce the notation and assumptions it uses. Assume an $n$-dimensional solution space $S$. Let the position and velocity of the $i$th particle be $x_i = (x_{i1}, \ldots, x_{in})$ and $v_i = (v_{i1}, \ldots, v_{in})$, respectively. The best position found so far by the $i$th particle (personal best) is denoted by $p_i$, and the best position found by all particles (global best) is denoted by $g$. The standard particle swarm optimization (PSO) algorithm is briefly summarized in the following paragraphs.

Particle swarm optimization (PSO) [20] is a population-based stochastic optimization algorithm. The main motivation for its development was the simulation of simplified animal social behaviours such as bird flocking or fish schooling. The algorithm maintains a swarm of particles, where each particle represents a candidate solution to the optimization problem under consideration. If the problem is described with $n$ variables, then each particle denotes an $n$-dimensional point in the search space. A cost (fitness) function is used to measure the fitness of the particles in the search space. The particles are randomly generated in the solution search space, and each particle's position is adjusted according to its own personal best experience and the best particle position of the swarm.

PSO is initialized with a set of random particles ($N$ is the number of particles) and then searches for the optimal solution iteratively in the search space. Each particle $i$ has a corresponding fitness value $f(x_i)$ as well as its own velocity $v_i$. The fitness value is calculated by an observation model, and the velocity provides the direction of particle movement. In each iteration, the movement of the $i$th particle depends on two key factors: first, its individual best position $p_i$, which is the best position found by the $i$th particle so far, and second, the global best position $g$, which is the best position found by the entire swarm. In each iteration, every particle updates its velocity and position using the following equations:

$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \left(p_i^{t} - x_i^{t}\right) + c_2 r_2 \left(g^{t} - x_i^{t}\right),$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1},$$

where $v_i^{t}$ and $x_i^{t}$ denote the velocity vector and the position vector of particle $i$, respectively, at iteration $t$. The particle velocity constraint is one of the important mechanisms for controlling particle movement in the search space and is also useful for balancing exploitation and exploration. The acceleration parameters $c_1$ and $c_2$ are positive constants called the cognitive and social parameters; they control the balance between the influence of the personal best and the global best particle position. $r_1$ and $r_2$ are random numbers drawn from a uniform distribution in the interval $[0, 1]$, and $w$ is the inertia weight parameter, which is used as a velocity constraint mechanism [6]. The inertia weight plays an important role in controlling the trade-off between global and local search. A high inertia weight encourages particles to explore a large space (global search), whereas a small inertia weight encourages particles to search a smaller area (local search). Accordingly, a high inertia value $w_{\max}$ is used at the start of the search and is decreased until it reaches the lowest value $w_{\min}$. Figure 2 illustrates a schematic view of updating the position of a particle in two successive iterations.

Here, at each iteration, the global best and individual best positions are computed after invoking the fitness function (cost function) at each position $x_i$. Generally, a greater fitness value corresponds to a better position. The best position of a particle is updated only when the fitness of the present position is higher than the former best value. Among all of the individual best positions, the position with the highest fitness value is taken as the global best. This can be expressed mathematically as follows:

$$p_i^{t+1} = \begin{cases} x_i^{t+1}, & f\left(x_i^{t+1}\right) > f\left(p_i^{t}\right), \\ p_i^{t}, & \text{otherwise}, \end{cases} \qquad g^{t+1} = \arg\max_{p_i^{t+1}} f\left(p_i^{t+1}\right),$$

where $f(x)$ is the fitness value at the position $x$. The process continues until the terminating conditions are met (typically a maximum number of iterations).

PSO has a good balance between exploration and exploitation, which plays an important role in avoiding premature convergence during the optimization process. In fact, in a number of works, PSO has been successfully applied to articulated human motion tracking and has been reported to give good accuracy at a lower computational cost than the particle filtering approach [6–12]. Pseudocode for the PSO algorithm is presented in Algorithm 1.

Algorithm 1: Standard PSO algorithm.
Set parameters $c_1$, $c_2$, $w_{\max}$, $w_{\min}$.
// Initialization
Initialize a population of $N$ particles with random positions ($x_i$) and velocities ($v_i$).
foreach particle $i$ do
  Compute the fitness value $f(x_i)$.
end for
Initialize the inertia weight $w$.
Select the best particle in the swarm $g$.
// Iteration process
for $t = 1$ to maximum number of iterations do
  foreach particle $i$ do
    Update the velocity $v_i$ and position $x_i$ of the particle.
    Employ the inertia weight update rule.
    Compute the particle's fitness value $f(x_i)$.
    Update the best particles: $p_i$ and $g$.
  end for
  if convergence criteria are met then
    Exit from the iteration process;
  end if
end for
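The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under assumptions, not a reference implementation: the linearly decreasing inertia weight, the default parameter values, the clamping of positions to the search bounds, and the toy quadratic fitness function are all chosen for the example, and the only stopping rule is the maximum number of iterations.

```python
import random


def pso_maximize(fitness, bounds, num_particles=20, iterations=100,
                 c1=1.5, c2=1.5, w_max=0.9, w_min=0.4, seed=0):
    """Maximize `fitness` over the box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: random positions and small random velocities.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds]
         for _ in range(num_particles)]
    v = [[0.1 * rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
         for _ in range(num_particles)]
    p = [xi[:] for xi in x]                    # personal best positions
    p_fit = [fitness(xi) for xi in x]          # personal best fitness values
    best = max(range(num_particles), key=lambda i: p_fit[i])
    g, g_fit = p[best][:], p_fit[best]         # global best
    for t in range(iterations):
        # Assumed inertia rule: linear decrease from w_max to w_min.
        w = w_max - (w_max - w_min) * t / max(iterations - 1, 1)
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p[i][d] - x[i][d])
                           + c2 * r2 * (g[d] - x[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            f = fitness(x[i])
            if f > p_fit[i]:                   # update personal best
                p[i], p_fit[i] = x[i][:], f
                if f > g_fit:                  # update global best
                    g, g_fit = x[i][:], f
    return g, g_fit


def toy_fitness(x):
    """Hypothetical fitness with its maximum (zero) at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


best_position, best_fitness = pso_maximize(
    toy_fitness, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a tracking setting, `toy_fitness` would be replaced by an image-based fitness function and the position vector would hold the pose parameters of the body model.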

3. PSO to Human Motion Tracking

In the computer vision community, articulated human pose tracking is a long-standing problem that is still considered difficult because of the many challenges it presents. First, it is a high-dimensional problem because a large number of variables is needed to cover full human body pose estimation and to obtain accurate results. This is a crucial problem in pose tracking. Its solution requires a search strategy that can efficiently explore wide sections of the search space. Second, computing power is a limiting factor. The operations needed to evaluate the solutions are computationally very expensive. Therefore, it is essential to obtain good solutions in as few iterations as possible. Finally, an appropriate balance is needed between local and global search. In most situations, local search can deliver good solutions; however, occlusion and ambiguities in the camera configuration require global optimization so that a correct solution can be recovered after the conflicting situation has ended.

As stated previously, the human pose tracking problem can be formulated as a multidimensional nonlinear optimization problem that searches for the best possible joint angles of the human body model given the information available in the previous and current images [19]. In the past decades, various stochastic filtering algorithms (e.g., PF, APF, and PSAPF) and nature-inspired evolutionary algorithms (e.g., GA, PSO, PEA (probability evolutionary algorithm), QICA (quantum-inspired immune cloning algorithm), and QPSO (quantum-PSO)) have been developed for human pose tracking. Many researchers have shown that evolutionary algorithms are a viable tool for solving complex optimization problems and can be successfully applied to the human pose tracking problem. Among them, the PSO algorithm has gained popularity in the last few years in the domain of human motion tracking and pose estimation because of its simplicity, flexibility, and self-organization. PSO has also been successfully combined with other techniques, such as dimensionality reduction and subspace learning, to address the human pose tracking problem [7, 36–38].

3.1. Literature Survey

Initially, Ivekovic and Trucco [22] applied PSO to upper-body pose estimation from multiview video sequences. The PSO algorithm is applied in a 20-dimensional search space. The optimization process is executed in 6 hierarchical steps which are based on the model hierarchy. However, the approach was only applied to static upper-body pose estimation. Similarly, Robertson and Trucco [21] used an approach where the number of optimized parameters is iteratively increased so that a superset of the previously optimized parameters is optimized at every hierarchical stage.

John et al. [6] proposed a hierarchical version of PSO (HPSO) for human motion tracking using a 31-DOF (degree-of-freedom) articulated model with great success. To overcome the high dimensionality, the 31-dimensional search space is divided into 12 hierarchical subspaces. Additionally, the estimates obtained from each subspace are fixed in the following optimization stages. According to the authors, their approach outperforms PF, APF, and PSAPF. The HPSO algorithm substantially reduces the computational cost compared with the stochastic filtering approaches. However, this approach has the shortcoming that HPSO fails to escape the local maxima calculated in the previous hierarchical levels, which may result in inaccurate tracking. Moreover, the final solution tends to drift away from the true pose, especially at low frame rates. Zhang et al. [11] applied the PSO stochastic algorithm to estimate full articulated human body motion. This method also estimates the pose in a hierarchical fashion by prescribing space constraints in each suboptimization stage. To maintain the diversity of particles, the swarm particles are propagated according to a weak transition model, and the temporal continuity information is also utilized.
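The hierarchical idea of optimizing one subspace at a time and fixing the result for later stages can be sketched as follows. This is an illustrative sketch under assumptions and not the HPSO implementation of [6]: the 6-dimensional toy pose, the three stages, the PSO parameters, and the separable quadratic fitness are hypothetical.

```python
import random


def pso_subspace(fitness, pose, active, bounds, rng,
                 num_particles=10, iterations=40, w=0.7, c1=1.5, c2=1.5):
    """Optimize only the coordinates listed in `active`; others stay fixed."""
    def evaluate(values):
        candidate = list(pose)
        for d, value in zip(active, values):
            candidate[d] = value
        return fitness(candidate)

    x = [[rng.uniform(*bounds[d]) for d in active]
         for _ in range(num_particles)]
    x[0] = [pose[d] for d in active]     # keep the current estimate
    v = [[0.0] * len(active) for _ in range(num_particles)]
    p = [xi[:] for xi in x]
    p_fit = [evaluate(xi) for xi in x]
    best = max(range(num_particles), key=lambda i: p_fit[i])
    g, g_fit = p[best][:], p_fit[best]
    for _ in range(iterations):
        for i in range(num_particles):
            for k, d in enumerate(active):
                r1, r2 = rng.random(), rng.random()
                v[i][k] = (w * v[i][k] + c1 * r1 * (p[i][k] - x[i][k])
                           + c2 * r2 * (g[k] - x[i][k]))
                lo, hi = bounds[d]
                x[i][k] = min(max(x[i][k] + v[i][k], lo), hi)
            f = evaluate(x[i])
            if f > p_fit[i]:
                p[i], p_fit[i] = x[i][:], f
                if f > g_fit:
                    g, g_fit = x[i][:], f
    result = list(pose)
    for d, value in zip(active, g):
        result[d] = value
    return result


def hierarchical_pso(fitness, initial_pose, stages, bounds, seed=0):
    """Run one PSO per stage; earlier estimates are fixed in later stages."""
    rng = random.Random(seed)
    pose = list(initial_pose)
    for active in stages:
        pose = pso_subspace(fitness, pose, active, bounds, rng)
    return pose


# Hypothetical 6-parameter pose split into three 2-parameter stages.
target = [0.5, -1.0, 1.5, 0.2, -0.7, 1.1]


def toy_fitness(pose):
    return -sum((a - b) ** 2 for a, b in zip(pose, target))


bounds = [(-3.0, 3.0)] * 6
initial_pose = [0.0] * 6
stages = [[0, 1], [2, 3], [4, 5]]
estimate = hierarchical_pso(toy_fitness, initial_pose, stages, bounds)
```

The toy fitness is separable, so fixing earlier stages costs nothing here. With an image-based fitness the parameters interact, and an error fixed in an early stage cannot be corrected later, which is the drawback noted above.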

Krzeszowski et al. [10] present a global-local particle swarm optimization method for 3D human motion capture. This system divides the entire optimization cycle into two parts: the first part estimates the whole body, and the second part refines the local limb poses using a smaller number of particles. A similar approach called global-local annealed PSO (GLAPSO) is presented by Kwolek et al. [9]. This algorithm maintains a pool of candidates instead of selecting a single global best particle, to improve the algorithm's ability to explore the search space. One hybrid approach is presented by Kwolek et al. [23], in which particle swarm optimization with resampling is used for articulated human body tracking. The system employs a resampling method to select a record of the best particle according to the weights of the particles making up the swarm, which reduces premature stagnation. Kwolek [26] applied the PSO algorithm to track human motion from multiview surveillance video. This approach also estimates the body pose hierarchically. Krzeszowski et al. [39] proposed an approach which combines PSO and PF (PF + PSO), where the particle swarm optimization algorithm is employed within the particle filter to shift the particles towards more promising regions of the human body model configuration space. Similarly, [24, 25, 40] introduced the annealed PSO based particle filter (APSOPF) algorithm for articulated human motion tracking. The sampling covariance and annealing factor are incorporated into the velocity-updating equation of PSO, which constrains the particles to the most likely region of the pose space and reduces the generation of invalid particles. Both of the above-discussed approaches obtain good accuracy but are computationally heavy.

Ivekovic et al. [41] proposed an adaptive particle swarm optimization (APSO) to reduce the computational complexity of the system. This system treats HPSO as a black box that requires no parameter values from the user, and it adaptively changes the search parameters online based on the type of pose estimated in the previous frame. Yan et al. [42] adopt annealed Gaussian based particle swarm optimization (AGPSO) for 3D human motion tracking. In this approach, the observation is designed as a minimized Markov random field (MRF) energy. Kiran et al. [43] present a hybrid PSO called PSO + K for human posture classification. Initially, the PSO algorithm is applied to search for the optimal solution in the parametric search space; the result is then passed to the K-means algorithm, which refines the final optimal solution. Zhang [44] introduces another hybrid PSO algorithm for human motion tracking in monocular video. To construct the weight function of the particles, color, edge, and motion cues are integrated. To escape from local minima, the simulated annealing (SA) algorithm is incorporated into PSO. This approach is reported to yield better results than both the standard PSO and APF algorithms.

Ugolotti et al. [17] presented two algorithms for model-based object detection, namely, particle swarm optimization (PSO) and differential evolution (DE). In their comparison, PSO is clearly and consistently superior to DE for model-based tracking. Similarly, Bolivar et al. [19] report comparisons between evolutionary and particle filtering algorithms. They state that, for human motion tracking, the hierarchical version of PSO (H-PSO) is better than all filtering and evolutionary algorithms except the hierarchical covariance matrix adaptation evolutionary strategy (H-CMAES). Their comparisons are tested on the HumanEva-I and HumanEva-II datasets. Fleischmann et al. [12] proposed a soft partitioning approach with PSO (SPPSO). In this approach, the optimization process is divided into two stages: in the first stage, the important parameters (typically the torso) are optimized, and in the second stage, all remaining parameters are optimized jointly in a global optimization that also refines the estimates from the first stage. The approach obtained good results on low-frame-rate sequences (20 fps), but at the normal frame rate (60 fps) it achieved results similar to HPSO. Moreover, because global optimization is used in the second stage, the approach is computationally costly.

In user-friendly human-computer communication, nonintrusive human body tracking is a key issue. This is one of the most challenging problems in computer vision and at the same time one of the most computationally demanding tasks. Conventionally, human pose tracking approaches need to execute various tasks step by step to obtain good tracking results (i.e., foreground/background removal, edge detection, model rendering, and optimization). Therefore, implementing such techniques on a CPU (central processing unit) leads to long processing times and high computational cost. In contrast, the GPU architecture can execute thousands of threads concurrently thanks to massive fine-grained parallelization. To take advantage of GPU architectures, several publications discuss the implementation details of PSO and its variants on the GPU.

Kwolek et al. [8] proposed a parallel version of PSO known as latency tolerant parallel particle swarm optimization. In this algorithm, multiple swarms are executed in parallel on multiple computers connected through a peer-to-peer network, which exchange information about the location of the best particle of each subswarm and its corresponding fitness value. This information is transferred asynchronously after each optimization iteration without blocking the sending thread. Mutually exclusive memory is used to store these best values. After each iteration, the processing thread compares the value of its global best particle with the best particles provided by the other computers. If a value is better than the previous one, the best particle is updated and the optimization process continues. The main novelty of this work lies in the asynchronous exchange mechanism for the best particle information during the multiple calls of PSO. Their results demonstrate that latency tolerant parallel particle swarm optimization is able to give real-time results. Zhang and Seah [45] proposed another hybrid approach called niching swarm filtering (NSF) with local optimization. In the NSF framework, a ring topology based bare bones particle swarm optimization (BBPSO) algorithm and the particle filter algorithm are integrated. This approach tracks unconstrained human motions without using strong prior information on the dynamics, and its GPU implementation shows that the approach is able to give real-time results.
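The role of the mutually exclusive memory that holds the best particle can be illustrated with a small single-process sketch. It is a stand-in under assumptions rather than the system of [8]: threads replace networked computers, the class name `SharedBest`, its methods, and the reported values are hypothetical, and the sub-swarm optimization itself is replaced by precomputed (position, fitness) reports.

```python
import threading


class SharedBest:
    """Lock-protected record of the best position and fitness seen so far."""

    def __init__(self):
        self._lock = threading.Lock()
        self.position = None
        self.fitness = float("-inf")

    def offer(self, position, fitness):
        """Store the candidate if it improves the record; return True if so."""
        with self._lock:
            if fitness > self.fitness:
                self.position = list(position)
                self.fitness = fitness
                return True
            return False

    def snapshot(self):
        """Return a consistent (position, fitness) pair."""
        with self._lock:
            return self.position, self.fitness


def subswarm_worker(shared, reports):
    # Stand-in for a sub-swarm: each report is the (position, fitness) of
    # the sub-swarm's best particle after one optimization iteration.
    for position, fitness in reports:
        shared.offer(position, fitness)


shared = SharedBest()
all_reports = [
    [([0.1, 0.2], 0.30), ([0.2, 0.2], 0.55)],
    [([0.9, 0.1], 0.40), ([0.8, 0.3], 0.72)],
    [([0.5, 0.5], 0.10), ([0.4, 0.6], 0.65)],
]
threads = [threading.Thread(target=subswarm_worker, args=(shared, reports))
           for reports in all_reports]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
best_position, best_fitness = shared.snapshot()
```

Whatever the order in which the threads run, the record ends up holding the best report, because each comparison and update happens under the lock.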

Mussi et al. [46] developed an approach to articulated human body tracking from multiview video using PSO running on a GPU. Their implementation is far from real-time and requires roughly 7 s per frame, but they clearly demonstrate that formulating the algorithm on the GPU decreases the execution time markedly without compromising the accuracy of pose estimation. Krzeszowski et al. [47] report GPU-accelerated articulated human motion tracking using PSO, and they show that their GPU implementation achieves a speedup of more than fifteen times over the CPU implementation with a 26-DOF 3D model.

Real-time tracking of human motion is critical for many applications. To achieve this, Zhang et al. [48] introduced a GPU-accelerated multilayer framework for real-time full-body motion tracking. In the first layer of their multilayer pose tracking framework, they apply the NSF stochastic search to fit the body model to the images; in the second layer, the estimate is refined hierarchically using local optimization. A volume reconstruction with appearance is used as the observation to evaluate the pose hypotheses, and a 3D distance transform (DT) is employed to increase the speed of the algorithm. The GPU implementation uses CUDA (compute unified device architecture), which significantly accelerates the pose tracking process. Their results demonstrate that the NSF algorithm outperforms other state-of-the-art algorithms in both CPU and GPU implementations. Similarly, Rymut and Kwolek [49] present a GPU-accelerated PSO for real-time multiview human motion tracking. In this approach, they demonstrate how the particle swarm optimization algorithm works on a GPU for articulated human motion tracking and also demonstrate the parallelization of the cost function.

As indicated previously, PSO has also been successfully combined with other techniques, such as dimensionality reduction and subspace learning, to address the human pose tracking problem. For example, John et al. [7] introduced a hybrid generative-discriminative approach for markerless human motion capture using charting and manifold constrained particle swarm optimization. The charting algorithm is used to learn the common motion in a low-dimensional latent search space, and pose tracking is performed by a modified PSO called manifold constrained PSO. This PSO variant is designed mainly to bias the search toward the best next pose. Similarly, Saini et al. [37] proposed a low-dimensional manifold learning (LDML) approach for human pose tracking in which a hierarchical-charting dimension reduction technique is used to learn the motion model. To escape from local minima, quantum-behaved particle swarm optimization (QPSO) is used for pose tracking in the low-dimensional search space. Li and Sun [50] proposed a generative method for articulated human motion tracking using sequential annealed particle swarm optimization (SAPSO). The simulated annealing principle is integrated into traditional PSO to reach the global optimum more efficiently. The main novelty of their approach is the use of principal component analysis (PCA) to reduce the dimensionality and learn the latent space of human motions.

Note that most of the PSO based approaches discussed above rely on silhouette and edge based fitness functions [4–11, 24, 25, 40, 41]. The main reason is that both of these generic features have been shown to deliver an appropriate trade-off between robustness and speed. In some approaches, however, the use of skin color leads to failure because skin color is affected by lighting and varies from person to person. Table 1 summarizes the above research contributions.
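To make the notion of a silhouette based fitness function concrete, the following minimal sketch scores how well a rendered model silhouette matches an observed foreground mask. It assumes binary masks stored as nested lists and uses a simple symmetric mismatch ratio; this is one plausible choice for illustration, not the specific measure used in the cited works, and the function name `silhouette_error` is hypothetical.

```python
def silhouette_error(model_mask, image_mask):
    """Pixels covered by exactly one mask, divided by the pixels covered
    by either mask (0.0 means the two silhouettes coincide)."""
    mismatch = 0
    union = 0
    for model_row, image_row in zip(model_mask, image_mask):
        for m, i in zip(model_row, image_row):
            if m or i:
                union += 1
                if m != i:
                    mismatch += 1
    return mismatch / union if union else 0.0


# Toy 2 x 4 masks: the rendered silhouette is shifted by one pixel in row 2.
observed = [[0, 1, 1, 0],
            [0, 1, 1, 0]]
rendered = [[0, 1, 1, 0],
            [0, 0, 1, 1]]
print(silhouette_error(rendered, observed))  # 2 mismatched of 5 covered pixels
```

In a multiview setting, such a per-view score would typically be combined over all cameras, and an edge based term could be added in the same way.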

3.2. Discussion

PSO has been applied in a number of areas as a technique for solving large, nonlinear, complex optimization problems. However, its applications in computer vision and graphics are still rather limited [52]. Only a few PSO variants, built on different techniques, have been developed for pose tracking. According to the literature reviewed, the hierarchical version of PSO (HPSO) is the most effective for pose tracking in most cases. To obtain fast, real-time tracking results, many variants of PSO have been implemented on GPUs. Nevertheless, PSO still requires much investigation to improve the tracking performance and the other key features that would make such algorithms suitable for pose tracking. Furthermore, many variants of PSO, such as multiobjective PSO, multiswarm PSO, and many others discussed in [34], have not yet been used in pose tracking. Future work and research trends for the pose tracking problem include the development of multiswarm PSO algorithms and subswarm methods such as [53], new fitness and measurement functions, new approaches to search space partitioning, dimensionality reduction techniques that can be easily incorporated into PSO, and new sensitivity analyses of the PSO parameters.

4. Pose Evaluation Techniques

In the last decade, many stochastic filtering and nature-inspired algorithms have been proposed in the literature for human pose tracking. Among the nature-inspired algorithms, particle swarm optimization has been successful because it is well suited to parameter-optimization problems such as pose tracking. To improve the efficiency of PSO, researchers have proposed variants that differ in their hierarchical optimization, fitness functions, search space partitioning, and pose refinement process.

The complexity of the human kinematic structure and the large variability in body shape between individuals imply that many parameters need to be estimated for a full body model, often over 35 (in our case 31), even for coarse models. When the problem is defined as the optimization of an objective function over the model parameters, the search space becomes very high dimensional, and exploring such a state space in practical time becomes problematic. Therefore, to reduce the complexity, different search strategies in pose space have been used within the PSO tracking framework, such as hierarchical search with soft and hard partitioning. According to the literature reviewed, two techniques can be applied to the problem: holistic and hierarchical. Figure 3 illustrates the taxonomy of pose tracking approaches within the PSO tracking framework. The different partitioning strategies are illustrated graphically in Figure 4 [12].

4.1. Holistic Optimization

In the holistic approach, also known as global optimization, all human pose parameters are optimized jointly. In principle, this approach is more appropriate because it makes no assumptions about the independence of the body parts, which makes it more robust to error. In practice, because of the high dimensionality, recovering the complete set of 3D body pose parameters jointly is very difficult, especially in real-time situations, since more computing time is required to evaluate each solution. The exponential growth of the search space with the number of variables is the main drawback of the holistic approach. Furthermore, a small deviation in the higher nodes of the hierarchy affects their children: good estimates may be found for the lower nodes in the early search stages, but they may be discarded later because of changes in one of their parents. In short, global optimization is computationally very expensive. Figure 4(a) demonstrates the global optimization process, where the optimizer searches the whole search space (grey) at once.
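As an illustration of what optimizing all parameters jointly means, the following is a minimal sketch of a standard global-best PSO that searches every dimension at once. The inertia weight `w` and acceleration coefficients `c1` and `c2` are assumed textbook values rather than the settings used in the experiments, and the quadratic toy cost with a 31-parameter target stands in for an image likelihood.

```python
import random


def pso_minimize(cost, lower, upper, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over all dimensions at once (holistic search)."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Move the particle and clamp it to the search bounds.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lower[d]), upper[d])
            c = cost(pos[k])
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c
    return gbest, gbest_cost


# Hypothetical 31-parameter "true" pose and a toy quadratic cost.
target = [0.1 * d for d in range(31)]
cost = lambda pose: sum((p - t) ** 2 for p, t in zip(pose, target))
lower, upper = [-5.0] * 31, [5.0] * 31
best, best_cost = pso_minimize(cost, lower, upper, n_particles=60, n_iters=60)
print(round(best_cost, 3))
```

Every particle carries the full 31-dimensional pose, so each cost evaluation corresponds to rendering and scoring a complete body configuration.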

4.2. Hierarchical Search Approach

The computational complexity of the global optimization approach is very high because of the large number of variables needed to cover full human body pose estimation. Hierarchical search addresses this by considering the human body parts independently, which makes it possible to run independent search processes on smaller spaces. Accordingly, several researchers have applied hierarchical partitioning schemes to the search space according to the limb hierarchy to reduce the complexity of the high-dimensional search [6, 7, 11, 21, 41, 46, 51], and they have shown that hierarchical search approaches are more appropriate than holistic approaches. The hierarchical search approach can be further divided into two classes: hard search space partitioning (HP) and soft search space partitioning (SP).

4.2.1. Hard Search Space Partitioning (HP)

Hierarchical optimization, along with global-local PSO, divides the optimization process into multiple stages. In each stage a subset of the parameters is optimized while the rest are fixed, which prevents the optimizer from refining a suboptimal estimate from an earlier stage. In other words, in hierarchical schemes the search space is partitioned according to the model hierarchy. The most important parameters (typically those of the torso) are optimized first, while the less important ones are kept constant. This is termed hard partitioning of the search space. Figure 4(b) illustrates hierarchical optimization: in the first stage, one parameter is optimized while the other is kept constant; in the second stage, the other parameter is optimized while the first is kept constant.
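The staged scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: `optimize_subset` is a hypothetical, very small PSO with assumed coefficients (0.7 and 1.5), the four-parameter quadratic cost is a toy stand-in for the image likelihood, and each stage is given as a list of parameter indices.

```python
import random


def optimize_subset(cost, x, active, radius, rng, n_particles=15, n_iters=30):
    """Tiny PSO that varies only the `active` indices of x; every other
    parameter stays fixed at its current value."""
    def full(sub):
        y = x[:]
        for idx, v in zip(active, sub):
            y[idx] = v
        return y

    pos = [[x[i] + rng.uniform(-radius, radius) for i in active]
           for _ in range(n_particles)]
    pos[0] = [x[i] for i in active]  # keep the current estimate in the swarm
    vel = [[0.0] * len(active) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pcost = [cost(full(p)) for p in pos]
    g = min(range(n_particles), key=lambda k: pcost[k])
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(len(active)):
                vel[k][d] = (0.7 * vel[k][d]
                             + 1.5 * rng.random() * (pbest[k][d] - pos[k][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            c = cost(full(pos[k]))
            if c < pcost[k]:
                pbest[k], pcost[k] = pos[k][:], c
                if c < gcost:
                    gbest, gcost = pos[k][:], c
    return full(gbest)


def track_hard(cost, x0, stages, radius=1.0, seed=0):
    """Hard partitioning: optimize one stage at a time and freeze the result."""
    rng = random.Random(seed)
    x = x0[:]
    for active in stages:
        x = optimize_subset(cost, x, active, radius, rng)
    return x


target = [1.0, -2.0, 0.5, 3.0]  # hypothetical "true" pose
cost = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
x0 = [0.0, 0.0, 0.0, 0.0]
after_stage1 = track_hard(cost, x0, stages=[[0, 1]])
final = track_hard(cost, x0, stages=[[0, 1], [2, 3]])
print(after_stage1[:2] == final[:2])  # stage 1 parameters are frozen in stage 2
```

Because the values found in the first stage are never revisited, any error in them is carried unchanged into the later stages.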

4.2.2. Soft Search Space Partitioning (SP)

Hierarchical hard search space partitioning (HP) approaches are very effective at reducing computational complexity [6, 7, 11, 21, 46]. However, a pose estimation approach that is divided into hierarchical stages with hard partitions suffers from error accumulation, especially on low frame rate sequences. The error accumulates because the objective function for one stage cannot be evaluated completely independently of the subsequent stages. To avoid this problem, some researchers have applied a soft partitioning approach to the search space [12].

The major difference between hard and soft partitioning is that, in soft partitioning, the parameters optimized at the previous level are allowed some variation at the following level, so that a suboptimal estimate from the initial stage can be refined. Soft partitioning does not reduce the search space as much as hard partitioning does, but the search space is still much smaller than in global optimization. Figure 4(c) demonstrates the soft partitioning scheme: the initial stage is the same as in the hierarchical scheme, but some variation is allowed during the second-stage optimization, which lets the optimizer find a better estimate.
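The difference between the two schemes can be expressed as the search range assigned to each parameter at each stage. The sketch below is illustrative only: the function name `stage_ranges`, the fraction 0.2, and the choice to reopen only the parameters of the immediately preceding stage are assumptions, and each stage is given as a list of parameter indices.

```python
def stage_ranges(stages, full_range, soft_fraction=0.0):
    """Return, for each stage, a dict {parameter index: search half-width}.

    With soft_fraction == 0 only the parameters of the current stage are
    searched (hard partitioning). With soft_fraction > 0 the parameters of
    the preceding stage are also searched, within a reduced range.
    """
    plans = []
    previous = []
    for active in stages:
        ranges = {i: full_range for i in active}
        if soft_fraction > 0.0:
            for i in previous:
                ranges[i] = soft_fraction * full_range
        plans.append(ranges)
        previous = list(active)
    return plans


stages = [[0, 1, 2], [3, 4], [5, 6]]  # hypothetical three-stage partition
hard = stage_ranges(stages, full_range=1.0)
soft = stage_ranges(stages, full_range=1.0, soft_fraction=0.2)
for h, s in zip(hard, soft):
    print(len(h), "dimensions (hard) vs", len(s), "dimensions (soft)")
```

The soft plan searches more dimensions per stage than the hard plan, but still far fewer than the full parameter vector.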

Hierarchical search approaches are very effective at reducing computational complexity [6, 7, 11, 21]. However, they also have drawbacks. Firstly, an incorrect pose estimate for the initial segment (due to noise or occlusion) can distort the pose estimates for subsequent limbs, so an error in the estimation of the first node compromises the rest of the nodes irrevocably [13]. Secondly, the optimal partitioning may not be obvious, and it may change over time. Table 1 summarizes some popular PSO based approaches and lists their pose evaluation techniques along with the number of stages used to evaluate the solution.

4.3. Discussion

We notice that different authors have used different numbers of stages in hierarchical search optimization to estimate a complete human body. However, it remains unclear from the literature how many hierarchical partitions are sufficient to obtain good results and which order of partitioning is best (e.g., which body part should be estimated first, legs or arms?). With noise-free data the order makes no difference, but with noisy observations an inaccurate estimate in an early partition leads to an inaccurate result in the next partition. Additionally, many researchers have shown that hierarchical search approaches are very effective at reducing computational complexity [6, 7, 11, 21, 46], yet it is still not clear which form of hierarchical search space partitioning is most effective for pose tracking.

To investigate the above-mentioned issues, we implemented the PSO algorithm within the Brown University computational tracking framework [4]. This framework includes an APF implementation. We substituted our PSO tracking code for the APF code and kept the other parts of the framework unchanged. Firstly, we tested various model evaluation search strategies in 3D pose tracking to assess their potential for effective pose recovery in a high-dimensional search space. Secondly, we investigated different orders of body partitions for PSO (i.e., legs first or arms first). Finally, we tested different numbers of hierarchical body partitions to identify how many partitions are enough to obtain good results at an acceptable computational cost.

5. Quantitative and Visual Results

In this section, we present the results of the tests that we performed using the PSO algorithm. The parameter settings for the PSO algorithm are presented in Table 2. Additionally, for the global PSO optimization, 60 particles and 60 iterations are used. The PSO algorithm was run in the Brown University framework under Windows on a 3.20 GHz processor. The quantitative comparison is carried out in three respects: a comparison of holistic and hierarchical search with both soft and hard partitioning; an experimental study of PSO over a range of values of its parameters; and computation time. The complete experiment was carried out using the Lee walk dataset [4]. This dataset was captured using four synchronized grey scale cameras with a resolution of 640 × 480. The main reason for using the Lee walk dataset is that it contains ground-truth articulated motion, which allows a quantitative comparison of the tracking results. As in [54], the error metric is defined as the average absolute distance between the actual marker positions $x_i$ and the estimated positions $\hat{x}_i$ over the $M$ markers:

$$d(x, \hat{x}) = \frac{1}{M} \sum_{i=1}^{M} \lVert x_i - \hat{x}_i \rVert. \quad (4)$$

Equation (4) gives an error measure for a single frame of the sequence. The tracking error of the whole sequence is calculated by averaging the error over all frames.
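The per-frame error and its average over a sequence can be computed as in the following minimal sketch. It assumes that each frame is a list of corresponding 3D marker positions given as `(x, y, z)` tuples; the function names and the toy coordinates are illustrative and not part of the evaluation framework.

```python
import math


def frame_error(true_markers, estimated_markers):
    """Average Euclidean distance between corresponding 3D markers."""
    distances = [math.dist(t, e)
                 for t, e in zip(true_markers, estimated_markers)]
    return sum(distances) / len(distances)


def sequence_error(true_frames, estimated_frames):
    """Mean of the per-frame errors over the whole sequence."""
    errors = [frame_error(t, e)
              for t, e in zip(true_frames, estimated_frames)]
    return sum(errors) / len(errors)


# Two toy frames with two markers each (arbitrary units).
truth = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
         [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
estimate = [[(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],   # first marker is off by 5
            [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]   # perfect frame
print(frame_error(truth[0], estimate[0]))   # 2.5
print(sequence_error(truth, estimate))      # 1.25
```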

In the hierarchical search, we divided the complete search space into 6 subspaces and correspondingly executed the hierarchical optimization in 6 steps, which we consider to provide an appropriate balance between global and local search. The six steps are the global position and orientation of the root, followed by the torso and head, and finally the branches corresponding to the limbs, which are optimized independently (as illustrated in Figure 5, where LUA and LLA represent the left upper arm and left lower arm, LUL and LLL represent the left upper leg and left lower leg, and the right body parts are labelled analogously). In hierarchical search with hard partitioning, the estimate obtained for each subspace is left unchanged once it has been generated; only one limb segment is optimized at a time, and the results are propagated down the kinematic tree. In soft partitioning, by contrast, the previously optimized parameters that are positioned higher in the kinematic tree are allowed some variation in the following optimization stage, so that the estimates for the parent segments can be refined while the current segment is optimized. The amount of variation allowed in the previously optimized parameters is set empirically.

5.1. Accuracy

Figure 6 shows the average 3D tracking error between the ground-truth pose and the estimated pose with 3600 evaluations per frame on the Lee walk sequence. The hierarchical search with soft partitioning performs better than the other two approaches, particularly at 20 fps; the difference is much less pronounced at 60 fps. The highest fluctuations correspond to fast limb movements, especially of the lower arms and legs. We have noticed that, as the number of evaluations increases, the range of variation in the 3D error becomes narrower and the tracking performance improves. Table 3 displays the average 3D tracking error for the Lee walk sequence, and Figure 7 shows the tracking results of each evaluated approach for a few frames with the corresponding 3D error.

5.2. Varying the Number of Particles

We evaluated the PSO performance by varying the number of particles for both hierarchical hard and soft partitioning. To keep the computational time feasible on our hardware, the range of the number of particles is limited. The experiments were run with 30, 40, and 50 particles over five trials. Table 4 shows the results. As expected, accuracy and consistency improve as the number of particles increases, and so does the computational time. With more particles, PSO is able to estimate the pose more accurately and avoid error propagation. However, the number of likelihood evaluations per frame, and hence the computational cost, increases linearly with the number of particles: 30 particles result in 5400 likelihood evaluations, 40 particles in 7200, and 50 particles in 9000 evaluations per frame. A complete set of experiments to find the number of particles beyond which no significant benefit is obtained is beyond the capacity of our hardware.
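The reported evaluation counts are consistent with a fixed budget of 180 likelihood evaluations per particle per frame (5400 / 30 = 7200 / 40 = 9000 / 50 = 180). The short sketch below reproduces this arithmetic; the constant is inferred from the reported figures, and how it splits into stages and iterations is not assumed here.

```python
# Inferred from the reported 30-particle figure: 5400 / 30 = 180.
EVALS_PER_PARTICLE = 5400 // 30


def evaluations_per_frame(n_particles):
    """Likelihood evaluations per frame for the hierarchical PSO runs."""
    return n_particles * EVALS_PER_PARTICLE


for n in (30, 40, 50):
    print(n, "particles:", evaluations_per_frame(n), "evaluations per frame")

# For comparison, the global PSO setting of 60 particles and 60 iterations
# corresponds to 60 * 60 = 3600 evaluations per frame.
print(60 * 60)
```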

5.3. Computation Time

Computation time is a major factor in pose tracking. MATLAB implementations generally take from seconds to minutes to estimate the pose in one frame [4, 6, 7, 54], which means that tracking an entire sequence may take hours. The computation time depends strongly on the number of likelihood evaluations and on model rendering. With the same number of likelihood evaluations for all search strategies, the computation time for the holistic search is much higher than for the hierarchical search approaches. Among the hierarchical search approaches, hard partitioning takes less computation time than soft partitioning. The computation time can be cut down substantially by using more partitions of the search space (similar to [6, 41, 46]). Based on the experimental results, we therefore conclude that hierarchical search with hard partitioning is more useful than the other two approaches for approaching real-time performance. However, the PSO results are still far from real time, which would be necessary for many applications. Fast, real-time performance can be obtained by using graphics processing hardware.

In our experiments, we investigated different orders of body partitions for PSO (i.e., arms first or legs first). We tested both cases, but the order of partitioning did not have any influence on the quality of tracking. Finally, to identify how many hierarchical body partitions are enough to obtain good results at an acceptable computational load, we tested the PSO algorithm with different numbers of optimization stages: two-stage optimization similar to [8–10, 12, 23], five-stage optimization [11], six-stage optimization [19], seven-stage optimization [13], and twelve-stage optimization [6, 7, 41]. In our experimental results, six-stage optimization provides better tracking accuracy than the other numbers of stages, because six stages provide a more appropriate balance between global and local search. However, the difference in tracking accuracy between six stages and the other numbers of stages is not significant. Approaches that use a larger number of optimization stages, like [6, 7, 41, 46], are able to cut down the computational cost substantially, whereas the two-stage optimization approaches [8–10, 12, 23] suffer from high computational cost because of the global optimization involved in the second stage and are therefore not suitable for real-time applications. Furthermore, the tracking accuracy strongly depends on the quality of the image likelihood. The best tracking performance is obtained by combining silhouette and edge in the likelihood evaluation. When a single likelihood is used, the silhouette likelihood is reported to perform better.

5.4. Discussion

Based on the experimental results, a set of conclusions can be drawn. First, the hierarchical search approaches are more appropriate than the holistic (global) approach, giving high and robust tracking accuracy at much lower computational cost. Hierarchical search with soft partitioning is worthwhile only for low frame rate sequences; at high frame rates it produces results very close to those of the hard partitioning approach, and it has a higher computational cost than hard partitioning. Second, changing the order of partitioning (i.e., legs first or arms first) did not influence the tracking quality of the PSO algorithm. Finally, six-stage optimization provides an appropriate balance between global and local search and therefore gives better accuracy than the other numbers of stages.

6. Conclusion

This paper has presented a review of previous research on particle swarm optimization and its variants applied to the pose tracking problem. Additionally, it has presented the performance of various model evaluation search strategies in 3D pose tracking using the PSO algorithm. The research reviewed shows that PSO applied to pose tracking in a multidimensional search space performs very well compared with stochastic particle filtering algorithms. However, convergence speed is still limited when searching for global optima. Furthermore, modified PSO, its hybridization with other algorithms such as particle filtering and simulated annealing, and its combination with other techniques such as dimensionality reduction and feature selection can be successfully applied to the pose tracking problem and provide better results. The results of our PSO implementation show that the hierarchical search approach is very effective for pose tracking: it can reduce the computational cost substantially and produce robust and reliable tracking results.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by ERGS Grant (ERGS/1/2013/ICT07/UTP/02/03), funded by the Government of Malaysia. The authors would like to thank Leonid Sigal for many useful discussions regarding the Brown University framework and for maintaining the online evaluation for the HumanEva (Lee walk sequence) dataset.