Abstract

The correspondence function (CF) is a recently introduced concept for rejecting mismatches from given putative correspondences. The fundamental idea of the CF is that the relationships between some corresponding points of two images to be registered can be described by a pair of vector-valued functions. These functions are estimated by a nonparametric regression method and are more flexible than the usual parametric models, for example, the homography matrix, the similarity transformation, and projective transformations. Mismatches are rejected by checking their consistency with the CF. This paper proposes a visual scheme to investigate the fundamental principles of the CF and studies its characteristics by experimentally comparing it with the widely used parametric model, epipolar geometry (EG). It is shown that the CF describes the mapping from points in one image to their corresponding points in the other image, which enables direct estimation of the positions of the corresponding points. In contrast, the EG reduces the search space for a corresponding point from the two-dimensional image to a line, that is, to a one-dimensional problem. As a result, the mismatches that the CF fails to detect are usually near the correct corresponding points, whereas many of the mismatches that the EG fails to detect are far from the correct points.

1. Introduction

Finding point correspondences between two images is a fundamental problem in computer vision [1, 2]. In two given images, the corresponding points (CPs) are the projections of the same point in a scene. Many computer vision algorithms and applications rely on the successful identification of point correspondences between two images, for example, tracking, stereo vision, motion analysis, object recognition, remote sensing, image mosaicing, and automatic quality control, among others [3–7].

Point correspondences are usually established by the following procedure: extracting salient points, calculating their descriptors from a small local area around each point, establishing putative correspondences by comparing the descriptors, and refining the putative correspondences. Compared with representations based on a large spatial area, local feature descriptors are usually more robust to brightness variation, deformation, and occlusion but are less distinctive. This typically results in a high percentage of mismatches/outliers among the computed putative correspondences, which can easily ruin traditional estimation methods [8–10]. Therefore, rejecting mismatches from given putative correspondences in a refining stage is an essential problem in computer vision [11–13].

The correspondence function (CF) is a model recently introduced, based on point set mapping theory, to reject mismatches from putative correspondences [2]. The fundamental idea of the CF is that, for two given images $I$ and $I'$ of a scene, the relationships between their CPs can be described by a pair of vector-valued functions, which are estimated by a nonparametric regression method. Mismatches are then detected by checking whether they are consistent with the CFs. The key feature of the model is that the relationships between CPs are represented with more flexibility than in the usual parametric models, for example, the homography matrix [14], the similarity transformation [15], and projective transformations [15]. This flexibility accounts for the nonrigidity of objects or reduces the undue influence of outliers.

In this work, we propose a visual scheme to investigate the fundamental principles of the CF (Figures 1 and 2) and study its characteristics by experimentally comparing it with the widely used parametric model, epipolar geometry (EG); some rudimentary comparisons between CF and EG were made in a conference paper [16]. It is shown that the CF describes the mapping from a point in one image to its CP(s) in the other image, which enables us to estimate the positions of the CPs directly. In contrast, the EG reduces the search space for CPs from the two-dimensional image to a line, that is, to a one-dimensional problem. In applications, this difference means that the mismatches undetected by CF are usually near the correct CPs, whereas many of the mismatches undetected by EG are far from the correct correspondences.

The mismatch-rejecting problem has been investigated in many studies [14, 18, 19], and the EG constraint is a widely used model in resolving the problem.

Suppose that $I$ and $I'$ are two images of a scene, $F$ is the fundamental matrix, and $S = \{(\mathbf{x}_i, \mathbf{x}'_i)\}_{i=1}^{N}$ is a set of putative point correspondences between them. The EG constraint says

$$\mathbf{x}'^{\top} F \mathbf{x} = 0, \quad (1)$$

where $(\mathbf{x}, \mathbf{x}')$ is a pair of CPs in homogeneous coordinates and the superscript $\top$ denotes the transpose of a matrix or vector. The principle of the mismatch-rejecting methods based on EG is that the corresponding point of $\mathbf{x}$ lies on a line in $I'$, and vice versa (Figure 3(a)); $\mathbf{l}' = F\mathbf{x}$ and $\mathbf{l} = F^{\top}\mathbf{x}'$ are called the epipolar lines defined by points $\mathbf{x}$ and $\mathbf{x}'$, respectively [3]. Therefore, if $F$ is known, we can reject some mismatches from $S$ by checking whether they are consistent with the EG constraint (1) [1, 3, 4].

However, as shown in Figure 3(b), if $\mathbf{y}'$ is another point on the epipolar line $\mathbf{l}' = F\mathbf{x}$, the mismatch $(\mathbf{x}, \mathbf{y}')$ cannot be detected by this constraint. The EG constraint is therefore a necessary but insufficient condition: mismatches that satisfy (1) cannot be detected. Unfortunately, the undue influence of such mismatches may be large enough to severely distort the final results in applications, for example, image mosaicing and three-dimensional reconstruction.
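The insufficiency of the EG constraint is easy to reproduce numerically. The following minimal sketch rests on assumptions of our own: a hypothetical rectified pair whose fundamental matrix corresponds to a pure horizontal translation, so every epipolar line is an image row, and made-up point coordinates. It evaluates the residual of (1) and the distance from a point to the epipolar line $\mathbf{l}' = F\mathbf{x}$. A wrong point lying on that line passes exactly like the correct one.

```python
import numpy as np

# Hypothetical F = [e']_x with epipole e' = (1, 0, 0): pure horizontal translation,
# so the epipolar line of x = (u, v) in the second image is the row y' = v.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def to_h(p):
    """Inhomogeneous 2-D point -> homogeneous 3-vector."""
    return np.array([p[0], p[1], 1.0])

def eg_residual(F, x, xp):
    """Algebraic residual x'^T F x of the EG constraint (1)."""
    return float(to_h(xp) @ F @ to_h(x))

def epipolar_distance(F, x, xp):
    """Distance from x' to the epipolar line l' = F x."""
    l = F @ to_h(x)
    return abs(l @ to_h(xp)) / np.hypot(l[0], l[1])

x = (100.0, 50.0)
correct = (80.0, 50.0)        # true corresponding point (made up)
far_mismatch = (300.0, 50.0)  # wrong point, but on the same epipolar line
off_line = (80.0, 70.0)       # wrong point, off the epipolar line

for name, xp in [("correct", correct), ("far_mismatch", far_mismatch), ("off_line", off_line)]:
    print(name, eg_residual(F, x, xp), epipolar_distance(F, x, xp))
```

Both the correct point and the far mismatch have zero residual, so no tolerance on (1) can separate them; only the off-line point is flagged.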

The CF is a concept recently introduced to reject mismatches from putative correspondences between images [2], and its potential superiority in efficiency and accuracy has been shown experimentally. From the point set mapping perspective, the CF depicts the relationships between corresponding points by mapping a point in one image to its corresponding point in the other [2].

For two images of a general scene, the relationships between corresponding points (CPs) can be classified into three types: one-one, one-many, and many-one [2, 4]. The fundamental idea of the CF is to decompose the relationships between the CPs into two subsets. One subset consists of one-one and many-one relationships, and the other consists of one-one and one-many relationships. These two subsets can be described by two vector-valued functions, $f$ and $f'$.

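Suppose $S = \{(\mathbf{x}_i, \mathbf{x}'_i)\}_{i=1}^{N}$ is a set of putative correspondences. A CF, for example, $f$, can be represented by

$$f(\mathbf{x}) = \sum_{i=1}^{N} \boldsymbol{\alpha}_i K(\mathbf{x}, \mathbf{x}_i), \quad (2)$$

where $K(\cdot, \cdot)$ is a kernel function and $\boldsymbol{\alpha}_i \in \mathbb{R}^2$ are parameters to be estimated. Based on this representation and a Gaussian kernel, Li and Hu [2] studied the estimation of the CF with a support vector machine (SVM).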
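Representation (2) is straightforward to evaluate once the weights are known. The sketch below is not the SVM-based estimator of [2]. It assumes a Gaussian kernel with a hand-picked width `sigma` and obtains the weights $\boldsymbol{\alpha}_i$ by kernel ridge regression with a small hypothetical regularizer `lam`, on synthetic outlier-free data related by a translation.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_cf_weights(X, Xp, sigma=20.0, lam=1e-3):
    """Weights alpha_i (rows of an N x 2 array) of f(x) = sum_i alpha_i K(x, x_i).

    Kernel ridge regression is a simple stand-in for the SVM estimation of [2].
    """
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Xp)

def eval_cf(Q, X, alpha, sigma=20.0):
    """Evaluate f at query points Q (M x 2); returns the estimated CPs (M x 2)."""
    return gaussian_kernel(np.atleast_2d(Q), X, sigma) @ alpha

# Synthetic, outlier-free correspondences related by a translation of (5, -3).
X = np.mgrid[0:201:10, 0:201:10].reshape(2, -1).T.astype(float)
Xp = X + np.array([5.0, -3.0])
alpha = fit_cf_weights(X, Xp)
print(eval_cf([[100.0, 100.0]], X, alpha))
```

Each row of `alpha` is one two-dimensional weight $\boldsymbol{\alpha}_i$, so both output coordinates of $f$ share the same kernel.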

Vector-valued function interpolation/regression based on a similar idea has been extensively investigated for modelling the transformations between two images in image registration and computer graphics. For example, Goshtasby [20] used a vector-valued function from a reference image to a target image for remote sensing image registration based on surface splines; Bookstein [21] studied the registration of medical images based on the thin-plate spline (TPS); and Schölkopf et al. [22] proposed a machine learning method to establish dense point correspondences by estimating the deformation fields that transform two given objects into each other.

In these scenarios, the focus is to transform one image into the coordinate system of another image. Therefore, the two images to be processed are usually called the reference image and the target image, respectively, and only one vector-valued function is needed to describe the relationship from the reference image to the target image. Good reviews of works on image registration based on this idea can be found in [15, 23].

Let $\mathbf{x} = (x, y)^{\top}$ and $\mathbf{x}' = (x', y')^{\top}$ be two image points, $\mathbf{x} \in I$, and $\mathbf{x}' \in I'$. The CF, for example, $f$, can be expressed by

$$\mathbf{x}' = f(\mathbf{x}) = \mathbf{x} + \mathbf{v}(\mathbf{x}),$$

where $\mathbf{v}(\mathbf{x}) = f(\mathbf{x}) - \mathbf{x}$. The mapping $\mathbf{v}$ is a special expression of the CF and is referred to as a vector field [24, 25]. Zhao et al. [24] and Ma et al. [25] investigated the vector field learning problem using a Bayesian framework and studied its application in outlier rejection. Ma et al. [26, 27] proposed an estimation method for the vector field and investigated its application in point set registration between images of nonrigid objects; Ma et al. [28] investigated the point correspondence problem by interpolating the vector field and formulated it as a maximum a posteriori (MAP) estimation of a Bayesian model with hidden variables.

In the literature, related research usually focuses on learning and applying the correspondence function, or transformation function, as a flexible function. In this work, we investigate its principles and characteristics.

3. Principles of Correspondence Function

Suppose $I$ and $I'$ are two images to be registered. The CPs between $I$ and $I'$ form a surface in the joint-image space [17], which is illustrated schematically in Figure 1(a), where images $I$ and $I'$ are represented by the horizontal axis and the vertical axis, respectively. This surface is referred to as the correspondence manifold (CM) in [17].

In Figure 1(a), each pair of CPs $(\mathbf{x}, \mathbf{x}')$ between images $I$ and $I'$ forms a point in the joint-image space, and the other points on the curves have similar meanings. Some of the curves correspond to image points projected from the part of the scene surface that can be distinguished by both cameras of $I$ and $I'$, whereas other curves and isolated points correspond to scene portions that can be distinguished by only one camera, as illustrated in Figure 4(a). The gap between two of the curves corresponds to the scene portion in Figure 4(b) that can be observed by one camera but not by the other. This sketch map shows that the relationships between corresponding points can be classified into three types: one-one, one-many, and many-one. This characteristic is independent of the camera model of the images to be registered and of the complexity of the scene surface.

To highlight the mapping between CPs, we move the origin to a point at infinity and connect each pair of CPs by a dotted line; the sketch map in Figure 1(a) then becomes Figure 1(b). In mathematics, a function is a mapping that can depict only two types of relationships: many-one and one-one. Because the relationships between CPs also include the one-many type, they are too complex to be described by a single function.

The fundamental idea of the CF is to decompose the corresponding relationships (we call the mapping relations between CPs "corresponding relationships") into two directional relationships, each of which can be depicted by a function. For convenience, these directional relationships are referred to as "component relations" (CRs) and labeled $R$ and $R'$, respectively. Each CR consists of one-one and many-one mappings. For example, Figures 2(a) and 2(b) show a pair of CRs of Figure 1.

The component relation $R$ can be constructed as follows. For every point $\mathbf{x} \in I$, if there is a unique corresponding point $\mathbf{x}' \in I'$, the corresponding relationship $(\mathbf{x}, \mathbf{x}')$ is a member of $R$. Otherwise, if $\mathbf{x}$ has more than one corresponding point in $I'$, there are two choices: either ignore all of these corresponding relationships, or select any one of the CPs, say $\mathbf{x}'_j$, insert $(\mathbf{x}, \mathbf{x}'_j)$ into $R$, and ignore the other corresponding relationships of $\mathbf{x}$. The component relation $R'$ can be constructed similarly by checking every point $\mathbf{x}' \in I'$. For example, in Figures 1 and 2, if a point $\mathbf{x}$ has a unique corresponding point $\mathbf{x}'$ in $I'$, then $(\mathbf{x}, \mathbf{x}')$ belongs to $R$. This remains true even when $\mathbf{x}'$ has several CPs in $I$, because membership in $R$ requires only that the CP of $\mathbf{x}$ be unique in $I'$.

Therefore, $R$ is a one-to-one or many-to-one mapping from $I$ to $I'$ and can be depicted by a vector-valued function $f: I \to I'$. Similarly, $R'$ is a one-to-one or many-to-one mapping from $I'$ to $I$ and can be depicted by a vector-valued function $f': I' \to I$.
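The construction of $R$ and $R'$ can be sketched on a toy list of CP pairs. In this hypothetical example, points are plain string labels (lower case for $I$, upper case for $I'$). When a point has several CPs, the sketch either drops them all (`keep_one=False`) or keeps the first CP it encounters (`keep_one=True`), which is one admissible way to "select any one".

```python
from collections import defaultdict

def build_component_relation(pairs, keep_one=False):
    """Build a component relation from CP pairs (p, q), keyed by the first point.

    Every key ends up with at most one partner, so the result is a one-one or
    many-one mapping. A point with several CPs is either dropped entirely
    (keep_one=False) or keeps the first CP encountered (keep_one=True).
    """
    cps = defaultdict(list)
    for p, q in pairs:
        if q not in cps[p]:
            cps[p].append(q)
    return {p: qs[0] for p, qs in cps.items() if len(qs) == 1 or keep_one}

# Hypothetical CPs: b and c share the CP C (many-one); d has two CPs (one-many).
pairs = [("a", "A"), ("b", "C"), ("c", "C"), ("d", "D1"), ("d", "D2")]
R = build_component_relation(pairs)                              # maps I -> I'
R_prime = build_component_relation([(q, p) for p, q in pairs])   # maps I' -> I
print(R)        # {'a': 'A', 'b': 'C', 'c': 'C'}
print(R_prime)  # {'A': 'a', 'D1': 'd', 'D2': 'd'}
```

Every input pair ends up in $R$, in $R'$, or in both, which is what allows the two functions together to cover all three relationship types.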

On the other hand, consider any given pair of CPs $(\mathbf{x}, \mathbf{x}')$. If it is of the one-one type, the corresponding relationship it expresses can be depicted not only by a function from $I$ to $I'$ but also by a function from $I'$ to $I$. Therefore, corresponding relationships of the one-one type can be found in both $R$ and $R'$, such as the one-one pairs shown in Figures 1(b), 2(a), and 2(b). Otherwise, if the pair is of the many-one type, $\mathbf{x}$ has a unique corresponding image in $I'$ by the definition of CPs of the many-one type. (If $(\mathbf{x}, \mathbf{x}')$ is a pair of CPs, $\mathbf{x}'$ is called the corresponding image of $\mathbf{x}$, and vice versa.) Therefore, corresponding relationships of the many-one type can also be depicted by a function mapping from $I$ to $I'$ and are members of $R$, like the many-one pairs in Figures 1(b) and 2(a). Similarly, corresponding relationships of the one-many type can be depicted by a function mapping from $I'$ to $I$ and are members of $R'$, such as the one-many CPs in Figures 1(b) and 2(b).

In conclusion, for a pair of images $I$ and $I'$ of a scene to be registered, the relationships between CPs can be decomposed into a pair of component relations $R$ and $R'$, each consisting of a number of one-to-one and many-to-one mappings. $R$ and $R'$ can be described by a pair of vector-valued functions, $\mathbf{x}' = f(\mathbf{x})$ and $\mathbf{x} = f'(\mathbf{x}')$. The pair $(f, f')$ is named the correspondence functions (CFs) in [2].

In image registration, $I$ and $I'$ are referred to as the reference image and the target image, respectively, and the objective is to transform the target image into the coordinate system of the reference image. Therefore, only one vector-valued function, for example, $f$, is needed to describe the corresponding relationship; it is named the transformation function in [23].

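Therefore, the CF is a pair of functions such that any pair of CPs $(\mathbf{x}, \mathbf{x}')$ is consistent with at least one of the two functions, $\mathbf{x}' = f(\mathbf{x})$ or $\mathbf{x} = f'(\mathbf{x}')$. Both functions can be estimated from a given set of putative correspondences by a robust regression method [2]. Moreover, for any given point $\mathbf{x} \in I$, its corresponding point can be uniquely estimated by the CF, and vice versa (Figure 2). In this sense, consistency with the CF is a necessary and sufficient condition for a pair of points to be CPs.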

In practice, putative correspondences are usually corrupted by noise and outliers, and their number is limited. Therefore, only an approximate estimate of the CF can be obtained, and the actual results may not strictly satisfy the "necessary and sufficient condition."

Consequently, the CF model reduces the search space for the CP of a point to an elliptical area around its estimated position, and the mismatches undetected by the CF are usually near the correct CPs (Section 5).

4. Learning of CF and Its Application in Mismatch Rejection

Based on the discussion in Section 3, for a pair of images $I$ and $I'$ of a scene, the relationships between some corresponding points of $I$ and $I'$ can be decomposed into a pair of component relations (CRs), $R$ and $R'$. Each CR consists of one-one and many-one mappings.

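Therefore, $R$ and $R'$ can be described by two vector-valued functions, $f$ and $f'$:

$$\mathbf{x}' = f(\mathbf{x}), \qquad \mathbf{x} = f'(\mathbf{x}'),$$

where $\mathbf{x} \in I$, $\mathbf{x}' \in I'$, and $(f, f')$ are referred to as a pair of correspondence functions (CFs).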

Suppose $S = \{(\mathbf{x}_i, \mathbf{x}'_i)\}_{i=1}^{N}$ is a set of putative correspondences between $I$ and $I'$. If we treat $\mathbf{x}'_i$ as the output of a system and $\mathbf{x}_i$ as its input, the CF $f$ can be estimated from $S$ using a vector-valued regression method, for example, the vector field consensus (VFC) method [24, 28] or the IECF (Iteratively Estimate Correspondence Function) algorithm [2]. The fundamental idea of both VFC and IECF is to approximately represent the unknown CF as a weighted sum of kernel functions, as in (2), and to estimate the weights iteratively while gradually reducing the undue effects of outliers. Because the objective of this work is to investigate the fundamental principles and characteristics of the CF, we do not compare the various implementations of the CF, and we conduct all experiments with IECF. The CF $f'$ can be estimated similarly by exchanging the roles of $\mathbf{x}_i$ and $\mathbf{x}'_i$.
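The iterative idea can be illustrated with a deliberately simple stand-in. The sketch below is neither VFC nor IECF. It fits the displacement field $\mathbf{v}(\mathbf{x}) = f(\mathbf{x}) - \mathbf{x}$ (the vector-field form of the CF used in [24, 25]) by Gaussian kernel ridge regression. At each iteration, it refits on the putative correspondences whose residual is below `k` times the median inlier residual. The kernel width `sigma`, regularizer `lam`, factor `k`, and floor `min_scale` are hypothetical choices for synthetic pixel data.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def robust_fit_vector_field(X, Xp, sigma=20.0, lam=1.0, k=3.0, min_scale=1.0, n_iter=10):
    """Trimmed kernel ridge fit of v(x) = x' - x (a stand-in, not VFC or IECF).

    Returns (support points, weights, inlier mask). The weights were fitted on the
    support points, which equal X[inliers] whenever the loop converged.
    """
    V = Xp - X                                   # displacement targets v(x_i)
    inliers = np.ones(len(X), dtype=bool)        # start by trusting every pair
    for _ in range(n_iter):
        support = X[inliers]
        K = gaussian_kernel(support, support, sigma)
        alpha = np.linalg.solve(K + lam * np.eye(len(support)), V[inliers])
        pred = X + gaussian_kernel(X, support, sigma) @ alpha   # f(x) = x + v(x)
        res = np.linalg.norm(pred - Xp, axis=1)
        scale = max(float(np.median(res[inliers])), min_scale)
        new_inliers = res < k * scale            # trim large residuals
        if np.array_equal(new_inliers, inliers) or not new_inliers.any():
            break
        inliers = new_inliers
    return support, alpha, inliers

X = np.mgrid[0:101:10, 0:101:10].reshape(2, -1).T.astype(float)
Xp = X + np.array([1.0, -0.5])          # smooth synthetic motion
Xp[60] += np.array([60.0, 40.0])        # one gross mismatch at (50, 50)
support, alpha, inliers = robust_fit_vector_field(X, Xp)
print(inliers.sum(), inliers[60])
```

Fitting the displacement rather than $f$ itself keeps the regression targets small, so the ridge penalty mainly suppresses isolated spikes such as a gross mismatch.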

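Theoretically, the correctness of a putative correspondence can be determined by checking whether it is consistent with one of the estimated correspondence functions, $f$ or $f'$. However, because of image noise, the observed corresponding points are usually not strictly consistent with the estimated CFs $f$ and $f'$, and a tolerance parameter $T_{\mathrm{CF}}$ is needed to determine whether they are consistent. It is shown that the consistency of $(\mathbf{x}_i, \mathbf{x}'_i)$ with $f$ can be determined by

$$d(\mathbf{x}_i, \mathbf{x}'_i) = \left(\mathbf{x}'_i - f(\mathbf{x}_i)\right)^{\top} \Sigma^{-1} \left(\mathbf{x}'_i - f(\mathbf{x}_i)\right),$$

where the superscript $\top$ denotes the transpose of a vector and $\Sigma$ is the covariance matrix of the random variable $\mathbf{x}' - f(\mathbf{x})$, which can be estimated using the IECF algorithm [2]. The consistency of $(\mathbf{x}_i, \mathbf{x}'_i)$ with $f'$ is measured similarly and denoted by $d'(\mathbf{x}_i, \mathbf{x}'_i)$. Then, some mismatches can be rejected by the constraints $d(\mathbf{x}_i, \mathbf{x}'_i) \le T_{\mathrm{CF}}$ and $d'(\mathbf{x}_i, \mathbf{x}'_i) \le T_{\mathrm{CF}}$.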
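The consistency measure $d = (\mathbf{x}' - f(\mathbf{x}))^{\top}\Sigma^{-1}(\mathbf{x}' - f(\mathbf{x}))$ is a squared Mahalanobis distance. A tolerance on it therefore accepts $\mathbf{x}'$ only inside an ellipse centred at $f(\mathbf{x})$, which is the elliptical search area mentioned in Section 3. The sketch below computes $d$ for a hypothetical translation-only CF with a hand-chosen diagonal $\Sigma$ (in [2], $\Sigma$ is estimated by IECF instead). With $\Sigma = \mathrm{diag}(4, 1)$, the region $d \le T$ is an ellipse with semi-axes $2\sqrt{T}$ and $\sqrt{T}$.

```python
import numpy as np

def cf_consistency(X, Xp, f, Sigma):
    """d_i = (x'_i - f(x_i))^T Sigma^{-1} (x'_i - f(x_i)) for each putative pair."""
    R = Xp - f(X)                          # residuals, one row per pair
    Sinv = np.linalg.inv(Sigma)
    return np.einsum("ni,ij,nj->n", R, Sinv, R)

# Hypothetical CF: a pure translation by (5, -3).
f = lambda P: P + np.array([5.0, -3.0])
Sigma = np.diag([4.0, 1.0])                # assumed residual covariance
X = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
Xp = f(X) + np.array([[2.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
d = cf_consistency(X, Xp, f, Sigma)
T_cf = 9.0                                 # hypothetical tolerance
print(d, d <= T_cf)
```

The same two-pixel offset costs four times more along the low-variance axis than along the high-variance one.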

Similarly, a tolerance parameter $T_{\mathrm{EG}}$ is also needed to reject mismatches by the EG constraint (1):

$$\left|\mathbf{x}'^{\top}_i F \mathbf{x}_i\right| \le T_{\mathrm{EG}}.$$

5. Experimental Research

In this section, we experimentally investigate the characteristics of the CF by comparing it with the widely used parametric model epipolar geometry (EG) ((1), Figure 3) in the context of rejecting mismatches.

5.1. Experimental Configuration

In this study, the mismatch-rejecting methods based on CF and EG are implemented by the algorithms ICF (identifying point correspondences by correspondence function) [2] and RANSAC (Random Sample Consensus) [11, 29, 30], respectively. Because the coordinates of the observed putative CPs are usually corrupted by noise, they are not strictly consistent with the CF and EG equations. Therefore, two tolerance parameters are needed for the computation; we denote them by $T_{\mathrm{EG}}$ in EG + RANSAC and $T_{\mathrm{CF}}$ in CF + ICF.

Several schemes exist for evaluating the performance of local descriptors and point matching models, for example, recall and precision [31] and ROC curves [14]. However, the objective of the experiments in this work is to show the characteristics of the mismatches that the CF fails to detect. Therefore, to make a fair comparison, we adjusted the values of $T_{\mathrm{EG}}$ and $T_{\mathrm{CF}}$ so that both methods identify the same number of putative correspondences as possible correct matches in every experiment, and we ran RANSAC without a limit on the number of iterations.
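One way to realize this equal-count calibration is to set each tolerance ($T_{\mathrm{EG}}$ for EG, $T_{\mathrm{CF}}$ for CF) to the $k$-th smallest residual of its own method. This assumes each method assigns every putative correspondence a scalar residual, for example, an epipolar distance for EG and the Mahalanobis measure for CF. The helpers below are a hypothetical sketch of that rule with made-up residuals. With tied residuals, more than $k$ correspondences can pass.

```python
def tolerance_for_count(scores, k):
    """Smallest tolerance T such that at least k scores satisfy s <= T."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    return sorted(scores)[k - 1]

def accepted(scores, T):
    """Indices of putative correspondences whose residual is within tolerance T."""
    return {i for i, s in enumerate(scores) if s <= T}

eg_scores = [0.1, 3.0, 0.0, 7.5, 0.4]   # made-up EG residuals for five pairs
cf_scores = [1.2, 0.3, 9.0, 0.8, 4.0]   # made-up CF residuals for the same pairs
k = 3
T_eg = tolerance_for_count(eg_scores, k)
T_cf = tolerance_for_count(cf_scores, k)
print(T_eg, T_cf, accepted(eg_scores, T_eg), accepted(cf_scores, T_cf))
```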

Suppose $I$ and $I'$ are two images of a scene to be matched and $S$ is a set of putative correspondences between them. A mismatch-rejecting method partitions $S$ into two subsets, $S_c$ and $S_m$, containing the possible correct correspondences and the possible mismatches, respectively. For readability, we denote the two subsets by $S_c^{\mathrm{EG}}$ and $S_m^{\mathrm{EG}}$ in the method based on the EG constraint and by $S_c^{\mathrm{CF}}$ and $S_m^{\mathrm{CF}}$ in the method based on CF. Thus, $S$ can be divided into four subsets, $S_1$, $S_2$, $S_3$, and $S_4$, where

$$S_1 = S_c^{\mathrm{EG}} \cap S_c^{\mathrm{CF}}, \quad S_2 = S_m^{\mathrm{EG}} \cap S_c^{\mathrm{CF}}, \quad S_3 = S_c^{\mathrm{EG}} \cap S_m^{\mathrm{CF}}, \quad S_4 = S_m^{\mathrm{EG}} \cap S_m^{\mathrm{CF}}.$$

The meanings of the four subsets are as follows:
(1) $S_1$: the putative correspondences regarded as possible correct matches by both the EG-based and the CF-based methods.
(2) $S_2$: the putative correspondences regarded as possible mismatches by the EG-based method and as possible correct matches by the CF-based method.
(3) $S_3$: the putative correspondences regarded as possible correct matches by the EG-based method and as possible mismatches by the CF-based method.
(4) $S_4$: the putative correspondences regarded as possible mismatches by both the EG-based and the CF-based methods.
On subsets $S_1$ and $S_4$, the two methods based on CF and the EG constraint agree. Therefore, it is sufficient to focus on $S_2$ and $S_3$ to compare the CF model and the EG constraint.
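The four-way split is plain set algebra. The sketch below uses hypothetical integer identifiers for putative correspondences. It also illustrates a consequence of the equal-count setting described above. When both methods accept the same number of correspondences, $S_2$ (accepted by CF, rejected by EG) and $S_3$ (accepted by EG, rejected by CF) have the same size, because $|S_c^{\mathrm{EG}}| = |S_1| + |S_3|$ and $|S_c^{\mathrm{CF}}| = |S_1| + |S_2|$.

```python
def split_four(S, acc_eg, acc_cf):
    """Split putative correspondences S by the verdicts of the EG and CF methods.

    acc_eg / acc_cf: subsets of S accepted as possible correct matches.
    """
    S, acc_eg, acc_cf = set(S), set(acc_eg), set(acc_cf)
    S1 = S & acc_eg & acc_cf          # accepted by both
    S2 = (S & acc_cf) - acc_eg        # CF accepts, EG rejects
    S3 = (S & acc_eg) - acc_cf        # EG accepts, CF rejects
    S4 = S - acc_eg - acc_cf          # rejected by both
    return S1, S2, S3, S4

S = range(10)
S1, S2, S3, S4 = split_four(S, acc_eg={0, 1, 2, 3, 4, 5}, acc_cf={0, 1, 2, 6, 7, 8})
print(S1, S2, S3, S4)
```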

To make the results of EG and CF easier to compare, in some experiments we manually removed some of the correct and near-correct putative correspondences from $S_2$ and $S_3$; the two sets composed of the remaining putative correspondences are denoted by $\hat{S}_2$ and $\hat{S}_3$, respectively:
(i) $\hat{S}_2$: the typical putative correspondences classified as possible correct matches by CF and as possible mismatches by EG.
(ii) $\hat{S}_3$: the typical putative correspondences classified as possible mismatches by CF and as possible correct correspondences by EG.

5.2. Experiments
5.2.1. Main Results

One of the image pairs used is presented in Figure 5. There are 1278 and 1207 SIFT feature points extracted from the two images, respectively (the feature points are detected by the multiscale DoG (Difference of Gaussians) scheme and described by the orientation histogram technique [32]), and 448 putative correspondences are computed from the feature points by the NNDR (Nearest Neighbor Distance Ratio) method [32]. Because local information is ambiguous, some of the putative correspondences are incorrect and need to be rejected. In this experiment, the two methods based on EG and CF disagree on 52 putative correspondences (Figures 5(a) and 5(b)). For example, every putative correspondence in Figure 5(a) was identified by the CF method as a possible correct match but by the EG method as a possible mismatch. Therefore, based on the definitions of $S_2$ and $S_3$, we can regard $S_3$ as a set of potential mismatches that the EG constraint fails to detect and $S_2$ as a set of potential mismatches that the CF model fails to reject. More results are presented in Figures 6, 7, and 8.
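The NNDR step can be sketched as follows. This is a brute-force illustration, not the implementation used in the experiments. The ratio 0.8 is an assumption (a value commonly used with SIFT descriptors), since the threshold used here is not stated, and the two-dimensional "descriptors" are toy stand-ins for 128-dimensional SIFT vectors.

```python
import numpy as np

def nndr_match(D1, D2, ratio=0.8):
    """Putative correspondences by the nearest-neighbor distance ratio (NNDR) test.

    D1 (n1 x d) and D2 (n2 x d, n2 >= 2) are descriptor arrays of images I and I'.
    Descriptor i of I is matched to its nearest neighbor j in I' only if the
    nearest distance is below `ratio` times the second-nearest distance.
    """
    dists = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=-1)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(len(D1)):
        j1, j2 = order[i, 0], order[i, 1]
        if dists[i, j1] < ratio * dists[i, j2]:
            matches.append((i, int(j1)))
    return matches

# Toy 2-D "descriptors"; the second one is ambiguous (two close candidates).
D1 = np.array([[0.0, 0.0], [5.0, 5.0]])
D2 = np.array([[0.0, 0.1], [10.0, 0.0], [5.0, 5.1], [5.12, 5.0]])
print(nndr_match(D1, D2))   # [(0, 0)]
```

The ambiguous descriptor is discarded, but a looser ratio would let it through. This is one way mismatches enter the putative set in the first place.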

5.2.2. Comparing the CF and EG Constraint by Analyzing the Undetected Mismatches

For ease of explanation, we introduce two definitions: type-I mismatches and type-II mismatches. Suppose $(\mathbf{x}, \mathbf{y}')$ is a mismatch. If there exists a correct corresponding point pair $(\mathbf{x}_0, \mathbf{x}'_0)$ such that $\mathbf{x}$ and $\mathbf{y}'$ are in the neighborhoods of $\mathbf{x}_0$ and $\mathbf{x}'_0$, respectively, then $(\mathbf{x}, \mathbf{y}')$ is defined as a type-I mismatch. Otherwise, we call $(\mathbf{x}, \mathbf{y}')$ a type-II mismatch. For example, some of the mismatches shown in Figure 7(d) and Figure 3(b) belong to type-II, whereas others in Figure 3(b) belong to type-I. With the EG constraint, we can only determine whether a putative correspondence lies in a band around the epipolar line (Figures 3(b), 9(b), and 9(d)), but not whether it is close to the correct one. Therefore, many of the mismatches undetected by methods based on the EG constraint belong to type-II (Figures 3, 5(b), 5(d), 6(b), 6(d), 7(b), 7(d), 8(b), and 8(d)). In contrast, the CF model describes the mapping relationships between CPs and directly estimates the position of the corresponding point. Therefore, in mismatch-rejecting methods based on CF, the error bound of the putative correspondences can be restricted by the tolerance parameter $T_{\mathrm{CF}}$, and the mismatches undetected by CF usually belong to type-I. To show the difference between the CF model and the EG constraint, a typical result is presented in Figure 6, in which we relax the tolerance parameters $T_{\mathrm{EG}}$ and $T_{\mathrm{CF}}$ so that the two sets $S_c^{\mathrm{EG}}$ and $S_c^{\mathrm{CF}}$ grow at the same rate.
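The two mismatch types can be told apart mechanically when correct correspondences are known. In the sketch below, the neighborhood is assumed to be a disk of hypothetical radius `radius` pixels in each image, because the definition above leaves its size unspecified. The coordinates are made up. A wrong point a few pixels from the true CP is type-I, whereas a wrong point far along the same image row (the epipolar line of a rectified pair) is type-II.

```python
import math

def mismatch_type(x, yp, correct_pairs, radius=5.0):
    """Classify the mismatch (x, y') as 'type-I' or 'type-II'.

    It is type-I if some correct pair (x0, x0') has x within `radius` of x0
    and y' within `radius` of x0'; otherwise it is type-II.
    """
    for x0, x0p in correct_pairs:
        if math.dist(x, x0) <= radius and math.dist(yp, x0p) <= radius:
            return "type-I"
    return "type-II"

correct_pairs = [((100.0, 50.0), (80.0, 50.0))]
print(mismatch_type((100.0, 50.0), (83.0, 52.0), correct_pairs))   # type-I
print(mismatch_type((100.0, 50.0), (300.0, 50.0), correct_pairs))  # type-II
```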

Moreover, type-I and type-II mismatches usually differ dramatically in their usefulness and undue influence. Starting from a type-I mismatch, we can search for the correct CPs by including more information. In contrast, type-II mismatches provide little useful information and can ruin traditional estimation methods and distort the final application results (e.g., image mosaicing and three-dimensional reconstruction).

5.3. Further Evaluation on Some Popular Benchmark Data

The characteristics of the CF are also investigated on two series of popular benchmark data sets widely used to compare and evaluate related algorithms [33, 34].

To evaluate the two schemes quantitatively, we conducted experiments on a series of image sets with ground-truth correspondences [31, 35]. The images were captured so that the ground-truth correspondences between the reference image and each of the other images can be encoded by homographies. The ground-truth homographies are computed in three steps: first, a set of putative correspondences is manually selected and an initial homography is estimated from them; second, the images are approximately aligned using this rough homography; third, reliable interest points are detected and matched automatically, and an accurate homography is computed from them.
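The homography computations in these steps rest on standard estimation from point correspondences. As a minimal sketch, assuming noise-free correspondences and at least four points in general position, the direct linear transform (DLT) below recovers $H$ up to scale from the null vector of a stacked linear system. It omits the coordinate normalization and robust matching that a real pipeline would add, and it is not claimed to be the procedure of [31, 35].

```python
import numpy as np

def apply_homography(H, pts):
    """Map inhomogeneous points (N x 2) through H and dehomogenize."""
    P = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return P[:, :2] / P[:, 2:3]

def homography_dlt(src, dst):
    """Estimate H with dst ~ H src from >= 4 correspondences (plain DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = Vt[-1].reshape(3, 3)    # right singular vector of the smallest singular value
    return H / H[2, 2]          # fix the scale (assumes H[2, 2] != 0)

H_true = np.array([[1.1, 0.05, 3.0], [0.02, 0.95, -2.0], [1e-3, 2e-3, 1.0]])  # made up
src = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [5.0, 3.0]])
dst = apply_homography(H_true, src)
H_est = homography_dlt(src, dst)
print(np.round(H_est, 6))
```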

The experimental results are presented in Figures 11, 12, 13, and 14. In every experiment, the upper and lower images are the reference image and the target image, respectively. In addition to the computed putative correspondences, we also present the ground-truth CPs and ground-truth correspondence relationships. The ground-truth CPs in the target images are computed from the ground-truth homographies for all of the marked interest points in the reference images and are denoted by red circles. The detected interest points are denoted by black plus signs. The ground-truth correspondence relationships are displayed as blue dashed lines, and the computed putative correspondence relationships are shown as solid red or green lines for visibility.

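Suppose that $\{(\mathbf{x}_i, \mathbf{x}'_i)\}_{i=1}^{n}$ is a set of putative correspondences and $\mathbf{x}^{*}_i$ is the ground-truth corresponding point of $\mathbf{x}_i$. We quantitatively evaluate the quality of the putative correspondences by the root-mean-square deviation (RMSD):

$$\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left\| \mathbf{x}'_i - \mathbf{x}^{*}_i \right\|^{2}}.$$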
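With ground-truth homographies, the ground-truth CP is $\mathbf{x}^{*}_i = H\mathbf{x}_i$ after dehomogenization, and the RMSD takes one line. The sketch assumes a known $3 \times 3$ homography `H` that maps reference-image points to the target image. The numbers in the example are made up.

```python
import numpy as np

def apply_homography(H, pts):
    """Map inhomogeneous points (N x 2) through H and dehomogenize."""
    P = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return P[:, :2] / P[:, 2:3]

def rmsd(X, Xp, H):
    """RMSD between putative points x'_i and ground-truth CPs x*_i = H x_i."""
    gt = apply_homography(H, X)
    return float(np.sqrt(np.mean(np.sum((Xp - gt) ** 2, axis=1))))

H = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])   # made-up ground truth
X = np.array([[0.0, 0.0], [1.0, 1.0]])
Xp = np.array([[2.0, 3.0], [4.0, 4.0]])   # the second putative CP is 1 pixel off
r = rmsd(X, Xp, H)
print(r)   # 0.7071067811865476
```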

The second series of evaluation data is the Middlebury Stereo Datasets [36, 37] (http://vision.middlebury.edu/stereo/data/). These datasets were created using an automated version of the structured-lighting technique [38]. Each dataset consists of seven rectified views; this work uses the first and fifth views for evaluation. The experimental results are presented in Figures 15, 16, 17, 18, and 19.

5.4. CF and EG in Practice
5.4.1. Necessity and Sufficiency

As analyzed above, the EG constraint is a necessary but insufficient condition in the correspondence problem, whereas the CF model describes the mapping between CPs directly. However, in applications, some correct CPs may be inconsistent with the estimated EG model and/or the estimated CF model. Some mismatches may be "consistent" with the estimated CF model, and some putative correspondences may be "consistent" with the estimated CF but "inconsistent" with the estimated EG, as in the experimental results in Figures 5(a), 5(b), 7(a), 7(b), 8(a), and 8(b). The reasons for these results are as follows:
(i) The coordinates of the obtained putative CPs are usually corrupted by noise.
(ii) There are usually some errors in the estimation of the CF and EG.
(iii) As pointed out in the first paragraph of Section 5.1, a preset tolerance parameter is needed to define "consistent" and "inconsistent" in the applications of CF and EG, whereas the noise in the obtained putative CPs is random, so no fixed tolerance separates correct matches from mismatches perfectly.

5.4.2. Characteristics and Limitations of the CF Model and EG Model

Neither model imposes restrictions on the depth range of the imaged scene. However, the EG is a global model, whereas the CF is a local model. In the EG model, any pair of putative CPs can influence the overall model estimation, whereas in the CF model, a pair of putative CPs influences the model estimation only within a local area in its vicinity. The shortcoming of the global EG model is that the overall estimation can be degraded or ruined by CPs with considerable noise or by mismatches in any area of the images. The shortcoming of the local CF model is that a pair of good CPs cannot improve the estimation of the CF outside a local area around it. Thus, an uneven distribution of putative correspondences of different quality means that the estimated CF may be of high quality in some areas but of low quality in others. For example, in the experiment of Figure 10, the CF estimates over the second- and third-nearest crossbeams to the cameras are better than the estimate over the nearest crossbeam. The "local area" is determined by the CF estimation method and its chosen parameters; for example, in the CF estimation method SP (SVM) [2], the local area is determined by the chosen kernel and its scale parameter. The above limitations of EG and CF can be alleviated by robust estimation methods, for example, RANSAC, M-estimators [8], LMedS (Least Median of Squares) [9], and MLESAC [10, 12] for EG. In future work, we will further investigate estimation methods for CF to improve its robustness.
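The global/local contrast can be seen on synthetic data. The sketch below rests on stand-ins of our own. A global affine model fitted by least squares plays the role of a global model (EG itself is not simulated), and a Gaussian kernel ridge fit of the displacement field plays the role of a local CF estimate, with hypothetical `sigma` and `lam`. Corrupting one correspondence at a corner of an 11 × 11 grid shifts the affine prediction at the opposite corner by about 3.3 pixels, whereas the kernel prediction there barely moves.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_predict_kernel(X, Xp, Q, sigma=10.0, lam=1.0):
    """Local stand-in: kernel ridge fit of v(x) = x' - x, evaluated at Q."""
    alpha = np.linalg.solve(gaussian_kernel(X, X, sigma) + lam * np.eye(len(X)), Xp - X)
    return Q + gaussian_kernel(Q, X, sigma) @ alpha

def fit_predict_affine(X, Xp, Q):
    """Global stand-in: least-squares affine map x' = A^T [x, y, 1], evaluated at Q."""
    A, *_ = np.linalg.lstsq(np.hstack([X, np.ones((len(X), 1))]), Xp, rcond=None)
    return np.hstack([Q, np.ones((len(Q), 1))]) @ A

X = np.mgrid[0:101:10, 0:101:10].reshape(2, -1).T.astype(float)
Xp_clean = X + np.array([2.0, -1.0])
Xp_bad = Xp_clean.copy()
Xp_bad[0] += np.array([100.0, 0.0])            # corrupt the pair at corner (0, 0)
near, far = X[[0]], np.array([[100.0, 100.0]])

for name, model in [("affine", fit_predict_affine), ("kernel", fit_predict_kernel)]:
    shift_near = np.linalg.norm(model(X, Xp_bad, near) - model(X, Xp_clean, near))
    shift_far = np.linalg.norm(model(X, Xp_bad, far) - model(X, Xp_clean, far))
    print(f"{name}: shift at corrupted corner {shift_near:.2f}, at opposite corner {shift_far:.4f}")
```

The same locality cuts the other way: a good correspondence at the corner cannot improve the kernel estimate far away, which is the limitation described above.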

6. Conclusion

The CF is a recently introduced nonparametric model for rejecting mismatches/outliers in image point matching. In this study, we investigated the principles of the CF and studied its characteristics by comparing it with the widely used parametric epipolar geometry (EG) constraint.

It is shown that the CF describes the mapping relationships between CPs and can therefore be used to estimate the position of a corresponding point. Hence, in addition to mismatch rejection, a potential application of the CF is to guide the point matching process by incorporating it into correspondence propagation [39].

In practice, putative correspondences are usually corrupted by noise and outliers, and their number is limited. Therefore, only an approximate estimate of the CF can be obtained, and the actual results may not strictly satisfy the "necessary and sufficient condition." Nevertheless, the CF model reduces the search space for CPs to an elliptical area, and the mismatches undetected by the CF are usually near the correct CPs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are grateful for valuable advice from Dr. Dibyendu Mukherjee and the anonymous reviewers and would like to thank the National Natural Science Foundation of China (Grant nos. 61075033, 61005033, and 61273248), the Natural Science Foundation of Guangdong Province (2014A030313425 and S2011010003348), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201001060) for their support.