Computational Intelligence and Neuroscience

Volume 2015 (2015), Article ID 434263, 15 pages

http://dx.doi.org/10.1155/2015/434263

## A Method for Estimating View Transformations from Image Correspondences Based on the Harmony Search Algorithm

^{1}Departamento de Ciencias Computacionales, Universidad de Guadalajara, CUCEI, Avenida Revolución 1500, 44430 Guadalajara, JAL, Mexico

^{2}División de Ciencia y Tecnología, Universidad de Guadalajara, CU-Norte, Carretera Federal No. 23, Km. 191, 46200 Colotlán, JAL, Mexico

Received 30 September 2014; Accepted 12 December 2014

Academic Editor: Rahib H. Abiyev

Copyright © 2015 Erik Cuevas and Margarita Díaz. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, a new method for robustly estimating multiple view relations from point correspondences is presented. The approach combines the popular random sample consensus (RANSAC) algorithm with the evolutionary method harmony search (HS). With this combination, the proposed method adopts a sampling strategy different from that of RANSAC to generate putative solutions. Under the new mechanism, at each iteration, new candidate solutions are built by taking into account the quality of the models generated by previous candidate solutions, rather than purely at random, as is the case in RANSAC. The rules for generating candidate solutions (samples) are motivated by the improvisation process that occurs when a musician searches for a better state of harmony. As a result, the proposed approach can substantially reduce the number of iterations while still preserving the robustness of RANSAC. The method is generic, and its use is illustrated by the estimation of homographies on synthetic and real images. Additionally, to demonstrate the performance of the proposed approach in a real engineering application, it is employed to solve the problem of position estimation in a humanoid robot. Experimental results validate the efficiency of the proposed method in terms of accuracy, speed, and robustness.

#### 1. Introduction

The goal of estimating geometric relations in images is to find an appropriate global transformation to overlay images of the same scene taken from different viewpoints. It applies when an object moves in front of a static camera and when a static scene is captured by a moving camera or by multiple cameras from different viewpoints. This methodology has been widely adopted in many applications. For instance, a series of images can be stitched together to generate a panorama image [1–3]. Also, multiple-image superresolution approaches can be applied in the overlapped region calculated according to the estimated geometry [4–6]. The motion of a moving object can also be estimated using its geometric relations [7], and a distributed camera network can be calibrated, where each camera’s position, orientation, and focal length are calculated from the correspondences among views [8–10]. Another example is robot positioning, where the position can be controlled or estimated through the estimation of the fundamental matrix or homography [11–13].

In a modelling problem, the data that can be explained by the hypothetical model are known as *inliers* of this model. Other points, for example, those generated by matching errors, are called *outliers*. The outliers are caused by external effects not related to the investigated model. Based on different criteria, several robust techniques have been proposed to classify points as inliers or outliers, the random sample consensus (RANSAC) algorithm [14] being the best known [15–17].

RANSAC adopts a simple hypothesize-and-evaluate process. Under this approach, a minimal subset of elements (correspondences) is sampled randomly, and a candidate model is hypothesized from this subset. Then, the candidate model is evaluated on the entire dataset, separating all its elements into inliers and outliers according to their degree of matching (error scale) to the candidate model. These steps are iterated until the probability that an accurate model has been found is sufficiently high. The model with the largest number of inliers is taken as the estimation result.
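The hypothesize-and-evaluate loop described above can be sketched on a deliberately simple model. The example below is an illustration only: it fits a 2D line rather than a homography, and the function names (`fit_line`, `ransac_line`), the threshold, and the fixed iteration count are assumptions chosen for the sketch, not part of the method presented in this paper.

```python
import random

def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0 and unit normal."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: both points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Plain RANSAC: keep the candidate model with the largest inlier count."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(points, 2)      # minimal subset (2 points define a line)
        model = fit_line(*sample)           # hypothesize a candidate model
        if model is None:
            continue
        a, b, c = model
        # Evaluate on the entire dataset: point-to-line distance vs. error scale.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Hypothetical data: 30 points on y = 2x + 1 plus 15 random outliers.
rng = random.Random(1)
line_pts = [(x / 10.0, 2.0 * x / 10.0 + 1.0) for x in range(30)]
noise_pts = [(rng.uniform(0, 3), rng.uniform(-5, 10)) for _ in range(15)]
model, inliers = ransac_line(line_pts + noise_pts)
print(len(inliers), "inliers out of", len(line_pts) + len(noise_pts))
```

Note that every sample is drawn independently of the quality of earlier candidates, which is the property the proposed approach sets out to change.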

Although the RANSAC algorithm is simple and powerful, it presents two main problems [18, 19]: the large number of iterations it consumes and the inflexible definition of its objective function. In RANSAC, candidate models are generated by selecting data samples. Since this selection is completely random, a large number of iterations is required to explore a representative subset of the noisy data and to find a reliable model that contains the maximum number of inliers. In general terms, the number of iterations is strongly affected by the contamination level of the dataset. The other crucial issue is the objective function used to evaluate the correctness of a candidate model on contaminated data. In the RANSAC methodology, the best estimation result is the model that maximizes the number of inliers. Therefore, the objective function simply counts, one by one, the inliers associated with a candidate model. Such an objective function is fixed and prone to yield suboptimal models under different circumstances [19].
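The dependence of the iteration count on contamination can be made concrete with the standard RANSAC stopping rule, which is not stated in this text and is given here as background: if a fraction $w$ of the data are inliers and a minimal sample has $s$ elements, then $N \ge \log(1-p)/\log(1-w^s)$ samples are needed for at least one all-inlier sample to be drawn with probability $p$. The helper name `ransac_iterations` and the default confidence of 0.99 are assumptions for this sketch.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Samples needed so that, with probability `confidence`,
    at least one random minimal sample contains only inliers."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_good = inlier_ratio ** sample_size   # chance that one sample is all inliers
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

# A homography needs four correspondences per minimal sample.
for w in (0.9, 0.5, 0.2):
    print(w, ransac_iterations(w, sample_size=4))
```

For four-point samples, the count grows from 5 at 90% inliers to 72 at 50% and 2876 at 20%, which illustrates why purely random sampling becomes expensive on heavily contaminated data.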

Several variants have been proposed to enhance the performance of the RANSAC method. One example is MLESAC [20], which searches for the best hypothesis by maximizing the likelihood within the RANSAC process, assuming that the inlier data follow a Gaussian distribution and that the outliers are distributed randomly. Alternatively, instead of fixing the error scale (i.e., the threshold that separates inliers from outliers) a priori, the SIMFIT method [21] predicts it through an iterative procedure. Other representative works, such as the projection-pursuit method [22] and TSSE (two-step scale estimator) [23], employ the mean shift technique to model the inlier distribution and obtain an inlier scale. Such approaches enable RANSAC to be data-driven; however, the whole process becomes quite time consuming.

Although each of the proposed variants addresses one of the two main RANSAC problems, the other remains. This situation arises because the estimation process is approached as an optimization problem in which the search strategy is a purely random search while the objective function is fixed to the number of inliers associated with the candidate model. In order to overcome the typical RANSAC problems, we propose to view the RANSAC operation as a generic optimization procedure. From this point of view, a new and efficient search strategy can be added to reduce the number of iterations consumed. Likewise, a new objective function can be defined that incorporates other elements allowing an accurate evaluation of the quality of a candidate model.

Two important difficulties in selecting a search strategy for RANSAC are the high multimodality and the complex characteristics of the estimation process, both produced by the heavy contamination of the dataset. Under such circumstances, classical methods perform poorly [24, 25], which has motivated new approaches for solving complex and ill-posed engineering problems. These include modern optimization techniques such as evolutionary algorithms and metaheuristics [26, 27], which have delivered better solutions than those obtained by classical methods.

The harmony search algorithm (HS) introduced by Geem et al. [28] is one example of these approaches. HS is an optimization algorithm based on the metaphor of the improvisation process that occurs when a musician searches for a better state of harmony. HS produces a new candidate solution from all existing solutions. In HS, the solution vector is analogous to the harmony in music, and its generation schemes are analogous to the musician’s improvisations. Compared with other metaheuristics in the literature, HS imposes fewer mathematical prerequisites; therefore, it can be easily adapted to solve several sorts of engineering optimization challenges [29, 30]. Numerical comparisons have established that HS converges faster than genetic algorithms (GA) [29, 31, 32], a fact that has attracted the attention of the evolutionary computation community. HS has been effectively applied to a wide range of practical optimization problems such as structural optimization [33], parameter estimation of the nonlinear Muskingum model [34], design optimization of water distribution networks [35], vehicle routing [36], image segmentation [37], and circle detection in images [38].

Although HS identifies promising regions of the solution space within a reasonable time, it underperforms in local search, in particular for parameter identification applications [39–42]. In order to enhance the fine-tuning (accuracy) properties of HS, the local search parameter (BW) can be dynamically adjusted to improve the balance between exploration and exploitation during the search process (see [29]). However, since that adjustment follows an exponential function, it allows longer exploitation periods, which reduces the exploration capacity of HS, particularly when it is applied to complex objective functions. A better adjustment alternative, which uses a linear model, has recently been proposed in [43]. It presents better search capabilities than the approaches based on exponential functions. For this reason, that approach is used in our method.
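To make the improvisation step and the role of BW more tangible, the following is a minimal sketch of a generic harmony search minimizer. It follows the textbook formulation; the parameter names `hms`, `hmcr`, and `par` come from the general HS literature rather than from this section, and all default values are assumptions. The exponential schedule is the form commonly used in improved HS variants, while `linear_bw` is only an assumed straight-line interpolation and is not claimed to reproduce the exact model of [43].

```python
import math
import random

def linear_bw(t, n_iter, bw_max, bw_min):
    """Assumed linear decay of the bandwidth from bw_max to bw_min."""
    return bw_max - (bw_max - bw_min) * t / n_iter

def exponential_bw(t, n_iter, bw_max, bw_min):
    """Exponential decay of the bandwidth from bw_max to bw_min."""
    return bw_max * math.exp(math.log(bw_min / bw_max) * t / n_iter)

def harmony_search(f, bounds, n_iter=2000, hms=10, hmcr=0.9, par=0.3,
                   bw_max=1.0, bw_min=1e-3, bw_schedule=linear_bw, seed=0):
    """Minimize f over box `bounds`; returns (best_harmony, best_cost)."""
    rng = random.Random(seed)
    # Harmony memory: hms random solution vectors and their costs.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    costs = [f(h) for h in memory]
    for t in range(n_iter):
        bw = bw_schedule(t, n_iter, bw_max, bw_min)
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][j]     # memory consideration
                if rng.random() < par:
                    value += rng.uniform(-1.0, 1.0) * bw  # pitch adjustment
            else:
                value = rng.uniform(lo, hi)               # random selection
            new.append(min(max(value, lo), hi))           # keep inside bounds
        cost = f(new)
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:                           # replace worst harmony
            memory[worst], costs[worst] = new, cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]

def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = harmony_search(sphere, [(-5.0, 5.0)] * 2)
print(best_x, best_cost)
```

Passing `bw_schedule=exponential_bw` swaps the decay law without touching the rest of the loop, which makes it easy to compare how quickly each schedule narrows the local search.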

In this paper, a new method is presented for the robust estimation of multiple view relations from point correspondences. The approach combines the RANSAC method with HS. Through this combination, the proposed method adopts a sampling strategy different from that of RANSAC to generate putative solutions. Under the new mechanism, new candidate solutions are built iteratively by considering the quality of the models generated by previous candidate solutions, rather than relying on a purely random selection, as is the case in RANSAC. Likewise, a more elaborate objective function is incorporated to evaluate the quality of a candidate model accurately. As a result, the proposed approach can substantially reduce the number of iterations while still preserving the robustness of RANSAC. The method is generic, and its use is illustrated by the estimation of homographies on synthetic and real images. Additionally, to demonstrate the performance of the proposed approach in a real engineering application, it is employed to solve the problem of position estimation of a humanoid robot. Experimental results validate the efficiency of the proposed method in terms of accuracy, speed, and robustness.

The paper is organized as follows. Section 2 explains the problem of image matching considering multiple views. Section 3 introduces the fundamentals of the RANSAC method. Section 4 explains the harmony search algorithm, while Section 5 presents the proposed approach. Section 6 describes the experimental setup and reports the performance results. Section 7 presents a robotic application of the proposed approach. Finally, Section 8 draws the conclusions.

#### 2. View Relations from Point Correspondences

The problem of image matching consists in finding a geometric transformation that maps one image of a scene to another image taken from a different point of view. To determine this transformation, it is necessary to find corresponding points in both images. Such point pairs can be obtained by applying an automatic detection and matching algorithm [44, 45]. The detected points are described by vectors of parameters (descriptors), and frequently these parameters do not allow one point to be discriminated from another with complete certainty. As a result, erroneous matches between points located on different parts of the two images may emerge.

In this section the geometric relations of points between two views are discussed, considering the case of homography.

Assume that there is a collection of $M$ pairs of corresponding points $(\mathbf{x}_i, \mathbf{x}'_i)$, $i = 1, \ldots, M$, found on two images, where $\mathbf{x}_i = (x_i, y_i, 1)^T$ and $\mathbf{x}'_i = (x'_i, y'_i, 1)^T$ are the positions of the points in the first and second images, respectively.

Two perspective images can be geometrically linked through a plane of the scene by a homography $\mathbf{H}$ (see Figure 1). This projective transformation relates corresponding points of the plane projected into the two images by $\mathbf{x}'_i = \mathbf{H}\mathbf{x}_i$ or $\mathbf{x}_i = \mathbf{H}^{-1}\mathbf{x}'_i$. The homography across two views can be computed by solving a linear system from a set of four point matches [46]. The quality of the estimated homography is evaluated by considering the distance between the position of the point calculated with the help of the matrix $\mathbf{H}$ and the actually observed position. Therefore, the mismatch error produced by the $i$th correspondence ($e_i^2$) is defined as the sum of squared distances from the points to their estimated positions:

$$e_i^2 = d(\mathbf{x}_i, \mathbf{H}^{-1}\mathbf{x}'_i)^2 + d(\mathbf{x}'_i, \mathbf{H}\mathbf{x}_i)^2, \tag{1}$$

where $d(\mathbf{x}_i, \mathbf{H}^{-1}\mathbf{x}'_i)$ and $d(\mathbf{x}'_i, \mathbf{H}\mathbf{x}_i)$ correspond to the errors produced in the first and second images, respectively.
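As an illustration of the two computations just described, the sketch below estimates a homography from four point matches by solving an 8x8 linear system and then evaluates the symmetric mismatch error of a correspondence. It is a minimal pure-Python sketch under stated assumptions: the last entry of the homography matrix is fixed to 1 (which excludes homographies whose true last entry is zero), no coordinate normalization is applied, and all function names are hypothetical rather than taken from the paper or from any library.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. collinear points)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_four(src, dst):
    """3x3 homography (last entry fixed to 1) from four matches (x, y) -> (u, v)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_h(H, p):
    """Map a 2D point through H, dividing by the homogeneous coordinate."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def invert_3x3(H):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def symmetric_transfer_error(H, p, q):
    """Squared error in the first image plus squared error in the second."""
    px, py = apply_h(invert_3x3(H), q)   # q mapped back into the first image
    qx, qy = apply_h(H, p)               # p mapped into the second image
    return (p[0] - px) ** 2 + (p[1] - py) ** 2 + (q[0] - qx) ** 2 + (q[1] - qy) ** 2

# Hypothetical matches: unit square mapped to an arbitrary quadrilateral.
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(0.0, 0.0), (2.0, 0.2), (2.2, 1.8), (-0.1, 2.0)]
H = homography_from_four(src, dst)
errors = [symmetric_transfer_error(H, p, q) for p, q in zip(src, dst)]
print(errors)
```

The four matches used for the estimate reproduce with near-zero error, whereas a mismatched pair yields a large value; thresholding this error is what separates inliers from outliers when a candidate homography is evaluated.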