Mathematical Problems in Engineering

Volume 2015, Article ID 594956, 16 pages

http://dx.doi.org/10.1155/2015/594956

## Extracting Corresponding Point Based on Texture Synthesis for Nearly Flat Textureless Object Surface

^{1}College of Information Sciences and Technology, Donghua University, Shanghai 201620, China

^{2}Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, Shanghai 201620, China

Received 5 June 2014; Accepted 28 August 2014

Academic Editor: Minrui Fei

Copyright © 2015 Min Mao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Image feature points tend to cluster where the intensity changes significantly, such as textured portions or edges of an image, and these are the regions that state-of-the-art intensity-based point detectors respond to. As a result, classical interest-point detectors find almost no points in weakly textured areas. In this paper we describe a novel algorithm, based on affine transform and graph cut, for detecting and matching interest points in wide-baseline image pairs that contain weakly textured objects. The detection and matching mechanism consists of three steps. First, the information in the large textureless areas is enhanced by adding texture through the proposed texture synthesis algorithm TSIQ. Second, the initial interest-point set is detected by classical interest-point detectors. Finally, graph cuts are used to find the globally optimal set of matching points on the stereo pairs. The efficacy of the proposed algorithm is verified by three kinds of experiments: the influence of different texture samples on point detection from the synthetic texture, the stability under different geometric transformations, and the improvement in the performance of a quasi-dense matching algorithm.

#### 1. Introduction

In the last decades, more and more methods for 3D modeling have been proposed. Techniques that use only images or video as input have been developed to reconstruct 3D scenes. For example, shape-from-video techniques exploit the flexibility of video recording to reconstruct a wide variety of scenes. However, these methods require large overlap between subsequent frames, and it might not always be possible to record video of the object of interest because of time pressure or obstacles. Furthermore, it is desirable to reconstruct from images that were not taken for the purpose of 3D modeling. Stereo matching has mainly focused on small-baseline stereo or frontoparallel planes, and many algorithms built on diverse concepts have been proposed for it, such as graph cuts [1–3], minimal path search [4], and belief propagation [5].

One of the most studied problems in stereo matching is finding corresponding points between stereo image pairs, which can be very hard. A matching system has to set aside smoothness assumptions in order to detect occlusions and depth discontinuities, and its performance also depends strongly on its capacity to handle weakly textured regions. The algorithms mentioned above handle short-baseline stereo matching well; the wide-baseline situation, in contrast, is much more challenging because of increased occluded areas and large perspective distortions. However, wide-baseline stereo requires fewer images to reconstruct a scene completely, so it is worth addressing.

On the other hand, local, viewpoint-invariant features can be used for wide-baseline matching, and hence the viewpoints can be further apart. Amintoosi et al. [6] used SIFT key-points for image registration, and Jian et al. [7] proposed a key-point detector based on wavelet transform for image retrieval. In general, the process of matching discrete image points can be divided into three main steps. First, interest points, such as T-junctions and corners, are extracted from each image. The point detector should be repeatable; that is, it should find the same physical interest point under different viewing conditions. Second, each interest point is represented by a distinctive feature vector computed by the descriptor. Finally, the descriptor vectors are matched between different images.

Almost all detectors are based on the gradient map of the image. For example, the Harris corner detector [8] is based on the second moment matrix, which describes the gradient distribution in a local neighborhood of a point. However, corners detected by this method are not scale invariant. Mikolajczyk and Schmid [9] proposed two scale-invariant methods, Harris-Laplace and Hessian-Laplace, which are based on the concept of automatic scale selection [10]: the location is selected by the Harris measure or the determinant of the Hessian matrix, and the scale is selected by the Laplacian. Lowe [11] sped up these methods by using the difference of Gaussians (DoG) to approximate the Laplacian of Gaussian (LoG). Stansk and Hellwich [12] introduced a new operator to extract salient points from an image and used interest points as anchor points to match points between different perspectives. Many other detectors have been proposed in the literature [13–16]. However, image blurring, magnification, and illumination changes remain problems for methods based on interest points, and one of their most serious weaknesses is that classical interest-point detectors cannot find points in weakly textured areas. To address this, Dragon et al. [17] proposed NF-features to complement regular feature detection, an approach similar to ours. It uses regular detectors, such as SIFT, Hessian-Affine, and SURF, to get the interest points surrounding the nontextured regions and then obtains the NF-features by using these points as anchor features, according to the Euclidean distance between every location in the textureless areas and the regular points.
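To make the point about gradient-based detectors concrete, the following minimal sketch computes a Harris-style response on three toy images. It is only an illustration: the unweighted 3×3 window, the central-difference gradients, the constant `k = 0.04`, and the helper names `harris_response` and `make_image` are assumptions chosen for brevity, not details taken from [8].

```python
def harris_response(img, y, x, half=1, k=0.04):
    """Harris-style response at pixel (y, x) over an unweighted square window.

    Assumption: the window stays at least one pixel inside the image, so the
    central differences below never index out of range.
    """
    sxx = syy = sxy = 0.0
    for j in range(y - half, y + half + 1):
        for i in range(x - half, x + half + 1):
            # Central-difference image gradients.
            ix = (img[j][i + 1] - img[j][i - 1]) / 2.0
            iy = (img[j + 1][i] - img[j - 1][i]) / 2.0
            # Accumulate the entries of the second moment matrix.
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def make_image(size, predicate):
    """Binary toy image: 1.0 where predicate(y, x) holds, 0.0 elsewhere."""
    return [[1.0 if predicate(y, x) else 0.0 for x in range(size)]
            for y in range(size)]


flat = make_image(9, lambda y, x: False)                 # textureless patch
edge = make_image(9, lambda y, x: x >= 4)                # vertical step edge
corner = make_image(9, lambda y, x: x >= 4 and y >= 4)   # single corner

for name, img in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, harris_response(img, 4, 4))
```

With these toy inputs the flat patch gives a response of exactly zero, the straight edge a negative response, and the corner a positive one. A detector that thresholds such a response therefore returns no points inside a textureless region.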

In this paper, we extend the capability of classical interest-point detectors by transferring strong texture to weakly textured areas so that corresponding points can be extracted there. The motivation is that large, nearly flat textureless areas in the real world are usually assumed to be smooth surfaces [18], which means that the depth of these areas is continuous. Hence, if a textureless area is nearly flat, it can be approximated by a plane, and the transformation of this plane between the two images can be described by a simple affine transform. Unlike texture synthesis algorithms for 3D shapes, we apply 2D texture synthesis to the weakly textured regions in the two images and then use classical interest-point detectors to extract correspondences from these regions. In this way, we can take advantage of these powerful detectors to obtain corresponding points on 3D textureless objects in wide-baseline stereo matching. We focus only on the point correspondence problem in large, nearly flat textureless areas and propose an algorithm that obtains corresponding points in those areas by using both the salient features surrounding large weakly textured objects and the information on the objects themselves. The proposed algorithm can be divided into two steps. First, the feature points detected on the contour of an object with a large textureless region are used to estimate the affine transform and to locate the center points for adding synthesized textures on the object surface in the two images. Second, since the real surfaces of these objects are not completely planar, the correspondences extracted in the first step are not accurate. We therefore use the graph cut algorithm to adjust the correspondences from the first step by incorporating features on the object, such as intensity and self-similarity.

The proposed approach can be used to enhance the performance of dense wide-baseline matching by increasing the number of seed points beyond those obtained from classical interest-point detectors [16]. The results obtained on image sets under different types of photometric and geometric transformations show the high robustness of the method. The proposed approach gives a rich set of corresponding points in large weakly textured regions where classical detectors find no distinctive feature points.

The paper is organized as follows. Section 2 describes the method for adding synthetic texture to image pairs. We discuss why the texture can be added to the two images by an affine transform and how this transform is estimated. In Section 3, we propose an approach that adjusts the position of the corresponding point in the second image by the graph cut algorithm, and we compare the adjusted result with the one obtained directly from Section 2. In Section 4, we verify the proposed method with extensive experiments on human model images under different circumstances. Section 5 provides concluding remarks and possible extensions of the proposed approach.

#### 2. Texture Synthesis for Point Detecting and Matching

In this section, we present a texture transfer mechanism for interest point detection and matching. We first explain why the relationship of a large weakly textured region between the two images can be approximated by an affine transform, and then present the method for estimating this affine transform and for constructing the synthetic texture that is added to the textureless object surface.

##### 2.1. Relationship of Large Textureless Area between Image Pairs

Let be a 3D point and , are the corresponding projection points in image plane , from two cameras with different orientations, respectively. According to this relationship, a mapping exists, when , , are the projection points from a same scene point .
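As an illustration of how one scene point gives rise to a pair of corresponding image points, the sketch below projects a single 3D point into two cameras with a simple pinhole model. The notation and all numbers are assumptions made for this example only: `K` is a hypothetical intrinsic matrix with equal scale factors and no skew, `R` and `t` are the rotation and translation of each camera, and the second camera differs from the first only by a translation to keep the arithmetic easy to follow.

```python
def project(K, R, t, X):
    """Project world point X to pixel coordinates with a pinhole camera.

    Implements m ~ K (R X + t); K, R are 3x3 nested lists, t and X have
    length 3. All symbol names are assumptions of this sketch.
    """
    # World coordinates -> camera coordinates.
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera coordinates -> homogeneous image coordinates.
    h = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    # Divide by the third component to obtain pixel coordinates.
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical intrinsics: scale factors 800, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

scene_point = [1.0, 0.0, 5.0]
p1 = project(K, IDENTITY, [0.0, 0.0, 0.0], scene_point)   # first camera
p2 = project(K, IDENTITY, [-1.0, 0.0, 0.0], scene_point)  # second camera
print(p1, p2)  # (480.0, 240.0) (320.0, 240.0)
```

The two printed pixel locations are the projections of the same scene point, so they form one corresponding pair; a stereo matcher has to recover such pairs without knowing the scene point.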

In wide-baseline situations, the features of a point in the two images often change because of lighting or perspective changes. Thus, most researchers favor classical detectors, such as GLOH and SURF [16, 19], which are both based on the gradient map of the image and have proven successful for sparse matching in wide-baseline situations. The mapping can then be obtained by using the sparse matching result as seed points [20].

The general sparse matching algorithm is as follows:

1. interest point detection;
2. point description;
3. matching points between the two images.

Let and denote the point sets got by interest point detector, is a point in , and then its corresponding point in can be got as follows: where denotes the descriptor operator, is the match operator and the pair of points .
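The matching step can be sketched as follows. The concrete match operator is not specified here, so this example assumes a common choice: Euclidean nearest-neighbour search over descriptor vectors combined with a ratio test. The function name `match_descriptors`, the threshold `ratio = 0.8`, and the toy two-dimensional descriptors are all hypothetical.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match (i, j) is kept only if the best distance is clearly smaller than
    the second-best one (ratio test), which rejects ambiguous descriptors.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every candidate, smallest first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if not dists:
            continue
        best_d, best_j = dists[0]
        if len(dists) == 1 or best_d < ratio * dists[1][0]:
            matches.append((i, best_j))
    return matches


# Toy descriptors for three points in the first image and two in the second.
desc_first = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
desc_second = [(0.9, 0.1), (0.1, 0.9)]

print(match_descriptors(desc_first, desc_second))  # [(0, 1), (1, 0)]
```

The third descriptor is equally close to both candidates and is rejected by the ratio test. Points in a textureless area behave similarly when they are detected at all: their descriptors are nearly identical, so no unambiguous match exists.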

From the above analysis, we see that the result of a sparse matching algorithm depends largely on its detector and descriptor. These algorithms therefore cannot give corresponding points in textureless areas, since points in these areas are not detected by those detectors. Texture synthesis, on the other hand, can replace the textureless areas with a strongly textured image. The idea is as follows.

Let and be the regions in which the textureless areas are replaced by a texture image in image , , respectively; then, according to the sparse matching algorithm, we can get a new mapping . On the other hand, considering the relationship between the two images, we should estimate the transformation of the texture image from to image . Let denote the transformation operator, which transforms the point location in to . Then should satisfy the following two conditions.

*Condition 1.* If , are corresponding points; that is, ; then .

*Condition 2.* Let denote the descriptor operator, since after the transformation we should still use classical detectors to detect interest point; then should satisfy and Condition 1, , .

Obviously, if transformation operator satisfies the above conditions, then in , ; that is, if , then , where are the point locations in image , respectively.

In the general situation, it is more difficult to get the transformation operator than to get the mapping directly. However, if is an affine transform, then descriptors such as SIFT and SURF both satisfy Condition 2 under this transformation.
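When the transformation is restricted to an affine transform, it has only six parameters and can be fitted to a few point correspondences. The sketch below shows one standard way to do this, ordinary least squares via the normal equations. It is not necessarily the estimation procedure used in this paper, and the function names `estimate_affine`, `apply_affine`, and `solve3` are hypothetical.

```python
def solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]  # augmented copy
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (collinear points?)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x


def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~ A [x, y, 1]^T."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three correspondences")
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (M^T M) p = M^T b, solved once per output coordinate.
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        mtb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(mtm, mtb))
    return params


def apply_affine(a, pt):
    """Map a 2D point through the 2x3 affine matrix a."""
    x, y = pt
    return (a[0][0] * x + a[0][1] * y + a[0][2],
            a[1][0] * x + a[1][1] * y + a[1][2])


# Synthetic check: recover a known transform from five correspondences.
true_a = [[1.2, -0.3, 5.0], [0.4, 0.9, -2.0]]
src_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (3.0, 7.0)]
dst_pts = [apply_affine(true_a, p) for p in src_pts]
est_a = estimate_affine(src_pts, dst_pts)
print(est_a)
```

In practice the correspondences would come from matched feature points and contain outliers, so a robust scheme such as RANSAC would usually wrap a solver of this kind.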

As discussed above, it is difficult to get a transformation that satisfies Condition 1. Fortunately, a textureless region can often be treated as a plane; in this paper we assume that these regions are almost planar, and under this hypothesis the properties of the affine transform make it possible to use an affine transform as . The following theorem explains why an affine transform can be used to approximate the mapping between the projections, in images from different viewpoints, of 3D points lying in the same plane.

The coordinate vector of a 3D point in the world coordinate system is denoted by , and its coordinate vector in the camera coordinate system is . According to the simple pinhole model, the relationship between and is given by where are the extrinsic parameters, which describe the rotation and translation between the two coordinate systems. Moreover, the relationship between and its image projection is given by where is the camera intrinsic matrix, and are the scale factors along the image and axes, respectively, and are the coordinates of the principal point. According to formulas (2) and (3), the relationship between and is given by where , which denotes the pinhole camera model. Let be 3D points in the same plane, as shown in Figure 1; then each 3D point on this plane can be represented as follows: where is the parameter for to represent point in 3D coordinates. According to the above formulas, the 2D coordinates of in each image can be represented as follows: Equation (7) gives the 2D coordinates of a 3D point in the two images, respectively. The relationship of points in the image pair can be represented by an affine transform as follows: According to formulas (7) and (8), the relationship between and is represented as follows: