Mathematical Problems in Engineering

Volume 2018, Article ID 5798696, 13 pages

https://doi.org/10.1155/2018/5798696

## Pose Estimation in Noncontinuous Video Sequences Using Evolutionary Correlation Filtering

^{1}CETYS Universidad, Centro de Innovación y Diseño (CEID), Ave. CETYS Universidad No. 4, El Lago, 22210, Tijuana, BC, Mexico

^{2}Instituto Politécnico Nacional, CITEDI-IPN, Ave. Instituto Politécnico Nacional 1310, Tijuana 22435, BC, Mexico

Correspondence should be addressed to Kenia Picos; kenia.picos@cetys.mx

Received 27 May 2018; Revised 24 September 2018; Accepted 18 October 2018; Published 31 October 2018

Academic Editor: Ioannis Kostavelis

Copyright © 2018 Kenia Picos et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, we propose an evolutionary correlation filtering approach for solving pose estimation in noncontinuous video sequences. The proposed algorithm computes the linear correlation between the input scene containing a target in an unknown environment and a bank of matched filters constructed from multiple views of the target and estimates of statistical parameters of the scene. An evolutionary approach for finding the optimal filter that produces the highest matching score in the correlator is implemented. The parameters of the filter bank evolve through generations to refine the quality of pose estimation. The obtained results demonstrate the robustness of the proposed algorithm in challenging image conditions such as noise, cluttered background, abrupt pose changes, and motion blur. The performance of the proposed algorithm yields high accuracy in terms of objective metrics for pose estimation in noncontinuous video sequences.

#### 1. Introduction

Pose estimation is an important task widely used in three-dimensional (3D) imaging applications to obtain descriptors such as location, orientation, scaling, depth, or geometric visualization of a target [1]. Applications such as object tracking, fixture alignment, camera-based optical metrology, and augmented reality are addressed through pose estimation research [2, 3]. The problem of 3D pose estimation presents high complexity when using monocular images, in which depth information of the target within a scene is unknown [4, 5]. The pose is characterized by how the object is viewed by an observer (camera), which in turn is determined by estimating its location, orientation, and scaling parameters [6]. An effective pose recognition system must produce low approximation errors between the real and estimated parameters [7]. Furthermore, real images are commonly degraded by challenging conditions such as noise, background clutter, motion blur, and geometrical changes of the target. These conditions generally compromise the system’s performance by increasing the probability of high estimation errors. Thus, an effective and robust approach is required to solve pose estimation under image degradations.

Correlation filtering is a pattern recognition technique that presents high accuracy in location estimation of a target [8]. This technique is given by a linear system whose frequency response is designed to produce a high matching value between a reference image of the target and the input scene [9]. Correlation filtering also provides high efficiency in target detection under noisy environments [10, 11].
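As a minimal sketch of the matching idea described above, the following Python example correlates a noisy scene with a single filter and reads the target location from the highest correlation value. It is a simplification under stated assumptions: the filter is just the zero-mean reference image applied in the spatial domain, not a filter designed in the frequency domain, and the names `TEMPLATE`, `make_scene`, `correlate`, and `peak` are hypothetical helpers introduced only for illustration.

```python
import random

def zero_mean(template):
    """Subtract the mean so that flat image regions give no response."""
    count = sum(len(row) for row in template)
    mean = sum(sum(row) for row in template) / count
    return [[v - mean for v in row] for row in template]

def correlate(scene, filt):
    """Linear cross-correlation of a scene with a filter (valid positions only)."""
    rows, cols = len(scene), len(scene[0])
    h, w = len(filt), len(filt[0])
    plane = []
    for r in range(rows - h + 1):
        plane.append([
            sum(scene[r + i][c + j] * filt[i][j] for i in range(h) for j in range(w))
            for c in range(cols - w + 1)
        ])
    return plane

def peak(plane):
    """Return ((row, col), value) of the highest correlation value."""
    value, location = max(
        (v, (r, c)) for r, row in enumerate(plane) for c, v in enumerate(row)
    )
    return location, value

# Hypothetical 4x4 reference image of the target (illustrative only).
TEMPLATE = [[0, 1, 1, 0],
            [1, 1, 1, 1],
            [1, 1, 1, 1],
            [0, 1, 1, 0]]

def make_scene(size=16, top=5, left=9, sigma=0.1, seed=0):
    """Place the target at (top, left) and add zero-mean Gaussian noise."""
    rng = random.Random(seed)
    scene = [[rng.gauss(0.0, sigma) for _ in range(size)] for _ in range(size)]
    for i, row in enumerate(TEMPLATE):
        for j, v in enumerate(row):
            scene[top + i][left + j] += v
    return scene

scene = make_scene()
location, score = peak(correlate(scene, zero_mean(TEMPLATE)))
print(location, round(score, 2))
```

The peak coordinates give the top-left corner of the best-matching window, so the location estimate comes directly from the correlation plane without any separate search over positions.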

Conventionally, the design of correlation filters requires explicit knowledge of the appearance and shape of the target [12]. An effective design strategy is to construct a set of correlation filters in which each filter is modeled on one possible appearance of the target expected in the observed scene [12]. This approach allows us to recognize several geometrically modified versions of the target.

Template matching based on correlation filters can be used to solve 3D pose estimation of rigid objects [13]. Moreover, the pose estimation problem can be modeled as a search problem, in which the goal is to find the reference view of the target that gives the best match with the actual view of the target in the scene [14]. With a template matching approach, a large set of correlation filters may be required to find a good match [15]. The design and construction of correlation filters is a high-dimensional, complex problem, so finding an optimal solution requires exploring a very large search space. The strategy in [13] explored a narrow region of that space using a local search algorithm, under the assumption that the object appears with smooth pose transitions across the video frames. However, under abrupt pose changes the algorithm would need to perform an extended and time-consuming search. In an extended search, the number of iterations needed to obtain a good estimate (ideally, the optimum) increases. Consequently, the risk of getting stuck in a local optimum also increases.

In this work, we aim to find the optimal solution to the 3D pose estimation problem when the target can present abrupt pose changes in the scene, i.e., when the search space of location, orientation, and scaling parameters of the target is very large. This challenge needs to be treated as a combinatorial optimization problem that can become numerically intractable as the number of feasible poses of the target increases [16, 17]. From the point of view of computational complexity, this problem is classified as NP-hard [18]; that is, it is at least as difficult as the hardest problems in NP, the class of problems that can be solved in nondeterministic polynomial time or, equivalently, whose candidate solutions can be verified in polynomial time [19, 20].
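To give a rough sense of how quickly the search space grows, the short Python calculation below counts the filters an exhaustive bank would need on a uniform grid. The discretization is an assumption made only for illustration (three orientation angles sampled over 360 degrees and ten scale levels); these figures and the helper `bank_size` are hypothetical and are not taken from the experiments.

```python
def bank_size(angle_step_deg, scale_levels, n_angles=3):
    """Filters needed to cover every orientation and scale on a uniform grid."""
    views_per_angle = 360 // angle_step_deg
    return views_per_angle ** n_angles * scale_levels

# Hypothetical grids: coarser to finer angular sampling, ten scale levels each.
for step in (10, 5, 1):
    print(f"{step:>2} deg step -> {bank_size(step, scale_levels=10):,} filters")
```

Under these assumptions, halving the angular step multiplies the bank size by eight, which is why an exhaustive bank becomes impractical well before the sampling is fine enough for accurate pose estimates.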

In this paper, we propose an evolutionary correlation filtering approach to solve the pose estimation problem efficiently. The proposed approach can be defined as a hybrid metaheuristic that combines correlation filters [21] and evolutionary computation [22]. Specifically, it employs a genetic algorithm [23] because of its computational efficiency, accuracy, robustness, and simplicity of implementation [24, 25]. We propose a 3D pose estimation algorithm for video sequences that contain a target with abrupt pose changes across frames while the input scenes are degraded by challenging conditions, such as additive noise, clutter, and motion blur. The proposed algorithm uses an evolutionary approach to perform adaptive correlation filtering for 3D pose estimation, in which a bank of filters evolves to obtain high estimation accuracy for the 3D pose parameters of a target from monocular scenes. Based on these considerations, the main motivation behind this work is to present an accurate algorithm for pose estimation that can be employed in critical applications of 3D imaging and object tracking. The main contributions of this work can be summarized as follows:

(i) The three-dimensional pose parameters of a target can be estimated with a single monocular camera.

(ii) The proposed algorithm solves the pose estimation problem with evolutionary correlation filtering, a hybrid approach that combines a genetic algorithm with correlation filters.

(iii) An adaptive filter bank evolves dynamically for pose tracking of a target, achieving high estimation accuracy in terms of location, orientation, and scaling.

(iv) Robust pose estimation is performed in noncontinuous video sequences under image conditions such as additive noise, cluttered background, and motion blur.

The paper is organized as follows. Section 2 presents the theoretical framework of correlation filtering for object recognition and evolutionary computation for global optimization. Section 3 describes our proposed method for pose estimation using evolutionary correlation filtering. In Section 4, we present and discuss the experimental results of performance evaluations in noncontinuous video sequences. Finally, the conclusions are summarized in Section 5.

#### 2. Theoretical Framework

In this section, a brief description of the main components of the proposed evolutionary correlation filtering approach is presented. This approach combines correlation filters and a metaheuristic based on a genetic algorithm as the global optimization method to solve the 3D pose estimation problem.
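The combination outlined above can be illustrated with a minimal genetic algorithm in Python that searches over pose parameters for the highest matching score. This is a sketch under stated assumptions, not the authors' implementation: the function `matching_score` is a smooth stand-in for the correlator output (a real system would synthesize a filter for each candidate pose and correlate it with the scene), and `BOUNDS`, `TRUE_POSE`, the operators, and all numeric settings are hypothetical choices.

```python
import math
import random

# Pose genes: three orientation angles (degrees) and a scale factor (assumed ranges).
BOUNDS = [(0.0, 360.0), (0.0, 360.0), (0.0, 360.0), (0.5, 1.5)]
TRUE_POSE = (40.0, 250.0, 120.0, 0.8)  # hidden pose, used only by the stand-in score

def matching_score(pose):
    """Stand-in for the correlator output: equals 1.0 only when pose == TRUE_POSE."""
    err = sum(((v - t) / (hi - lo)) ** 2
              for v, t, (lo, hi) in zip(pose, TRUE_POSE, BOUNDS))
    return math.exp(-20.0 * err)

def random_pose(rng):
    return tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS)

def tournament(population, scores, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(population)), k)
    return population[max(picks, key=lambda i: scores[i])]

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))

def mutate(pose, rng, rate=0.2, spread=0.05):
    """Gaussian perturbation of some genes, clamped to the bounds."""
    genes = []
    for value, (lo, hi) in zip(pose, BOUNDS):
        if rng.random() < rate:
            value += rng.gauss(0.0, spread * (hi - lo))
        genes.append(min(hi, max(lo, value)))
    return tuple(genes)

def evolve(pop_size=40, generations=60, seed=1):
    """Return the best pose found and the best score recorded per generation."""
    rng = random.Random(seed)
    population = [random_pose(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [matching_score(p) for p in population]
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        history.append(matching_score(elite))
        children = [elite]  # elitism: the best pose always survives
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = max(population, key=matching_score)
    history.append(matching_score(best))
    return best, history

best_pose, history = evolve()
print(best_pose, history[-1])
```

Because the best individual is copied unchanged into each new generation, the recorded best score never decreases. Note that the stand-in score ignores the wrap-around of angles at 360 degrees, which a real pose metric would have to handle.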

##### 2.1. Object Recognition with Correlation Filters

Assume that a 3D object is observed with a monocular camera, as shown in Figure 1. The captured frame consists of a projection of the 3D scene onto the image plane $f(x, y)$. The scene is composed of an object $t(x, y)$, which is placed at an unknown location $(\tau_x, \tau_y)$ and embedded into a disjoint background $b(x, y)$, as follows:

$$f(x, y) = t(x - \tau_x, y - \tau_y; \mathbf{M}) + b(x, y)\,\bar{w}(x - \tau_x, y - \tau_y; \mathbf{M}) + n(x, y), \tag{1}$$

where $(x, y)$ represent Cartesian coordinates in the image plane, which are the mapped coordinates from $(X, Y, Z)$, and $(X, Y, Z)$ represent Cartesian coordinates in the 3D space. As shown in Figure 1, the term $\tau_z$ is the distance from $f(x, y)$ to the reference coordinate system of the 3D scene, and $d$ is the total distance from the center of projection (COP) of the camera to the observed world. The term $w(x, y)$ is a binary function that represents the support region of the target, given by

$$w(x, y) = \begin{cases} 1, & \text{if } (x, y) \text{ lies within the target area}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

Furthermore, $\bar{w}(x, y) = 1 - w(x, y)$ represents the inverse support region of the target. The additive noise, denoted by $n(x, y)$, is a zero-mean Gaussian process. Moreover, $\mathbf{M}$ in (1) is a transformation matrix that involves the appearance modifications of the target [13] related to scaling and rotation (with orientation parameters). Hence, $\mathbf{M}$ is related to the space $(X, Y, Z)$.
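The scene model described in this paragraph can be sketched in Python as follows: the target replaces the background inside its support region, the background is kept elsewhere, and zero-mean Gaussian noise is added to every pixel. The function `compose_scene` and its argument names are hypothetical, and the sketch assumes a pure translation of the target; the scaling and rotation handled by the transformation matrix are deliberately left out.

```python
import random

def compose_scene(target, support, background, top, left, sigma=0.0, seed=0):
    """Target inside its support region, background elsewhere, plus Gaussian noise."""
    rng = random.Random(seed)
    rows, cols = len(background), len(background[0])
    h, w = len(target), len(target[0])
    scene = []
    for r in range(rows):
        row = []
        for c in range(cols):
            i, j = r - top, c - left  # coordinates relative to the target window
            inside = 0 <= i < h and 0 <= j < w and support[i][j] == 1
            signal = target[i][j] if inside else background[r][c]
            noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
            row.append(signal + noise)
        scene.append(row)
    return scene

# Illustrative data: a 2x2 target whose lower-right pixel is outside the support region.
target = [[0.9, 0.8],
          [0.7, 0.0]]
support = [[1, 1],
           [1, 0]]
background = [[0.2] * 5 for _ in range(4)]

clean = compose_scene(target, support, background, top=1, left=2)
noisy = compose_scene(target, support, background, top=1, left=2, sigma=0.05)
for row in clean:
    print(row)
```

With `sigma=0.0` the output shows the disjoint property directly: pixels inside the support region take the target values, and every other pixel, including the unsupported corner of the target window, keeps the background value.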