#### Abstract

To let general users perform vision tasks more flexibly and easily, this paper proposes a new solution to the problem of camera calibration from correspondences between model lines and their noisy image lines in multiple images. The proposed method uses common planar items at hand, with standard size and structure, as calibration objects. It consists of a closed-form solution based on homography optimization, followed by a nonlinear refinement based on the maximum likelihood approach. To recover the camera parameters linearly and automatically, we present a robust edge-model-based homography optimization method obtained by redesigning the classic 3D tracking approach. In the nonlinear refinement, the uncertainty of the image line segment is encoded in the error model, which takes the finite nature of the observations into account. With this new error model between the model line and the image line segment, the camera calibration problem is expressed in a probabilistic formulation. Simulated data are used to compare the method with the widely used planar pattern based method, and real image sequences are used to demonstrate the effectiveness and flexibility of the proposed method.

#### 1. Introduction

Camera calibration has always been an important issue in computer vision, since it is a necessary step for extracting metric information from 2D images. The goal of camera calibration is to recover the mapping between 3D space and the image plane, which can be separated into two transformations. The first maps 3D points in the scene to 3D coordinates in the camera frame and is described by the extrinsic parameters of the camera model. The second maps 3D points in the camera frame to 2D coordinates in the image plane and is described by the intrinsic parameters, which model the geometry and optical features of the camera. In the general case, these two transformations can be expressed by the ideal pinhole camera model.

Much work on camera calibration has been done to accommodate various applications. These approaches can be roughly grouped into two categories according to whether they require a calibration object. The first type is metric calibration, which resolves the camera model with the help of metric information from a reference object. Calibration is performed by observing a calibration object whose geometry is known with very high precision. The calibration object can be a 3D object with several planes orthogonal to each other [1, 2]. Sometimes a 2D plane undergoing a precisely known translation [3] or free movement [4] is used. More recently, a 1D object [5–8] with three or more markers has been used for camera calibration. In [6], it was proved that a 1D object undergoing planar motion is essentially equivalent to a 2D planar object. With such methods, calibration can be done very efficiently and accurately. However, a calibration pattern still needs to be prepared, although in [4] the setup is very easy and only a planar object with an attached chessboard is needed. The second type is self-calibration, which does not use any metric information from the scene or any calibration object. Such methods are also considered 0D approaches, since only image feature correspondences are required. Since two constraints on the intrinsic parameters of the camera can be provided by using image information alone [9], three images are sufficient to recover the internal and external camera parameters and to reconstruct the 3D structure of the scene up to a similarity [10, 11]. The problem with such methods is that a large number of parameters need to be estimated, which makes the solution very unstable. If the camera rotation is known, more stable and accurate results can be obtained [12, 13]. However, it is not always easy to obtain the camera rotation with very high accuracy.

In general, metric calibration methods provide better results than self-calibration methods [14]. Our current research focuses on smartphone vision systems, since their potential is large and smartphones have become ubiquitous in daily life. To let members of the general public who are not experts in computer vision perform vision tasks easily, the camera calibration setup should be as flexible as possible. The method developed in [4] is considered the most flexible technique; however, as the angle between the model plane and the image plane increases, foreshortening makes corner detection less precise and may even cause it to fail. Moreover, a planar pattern must be prepared, which is still inconvenient for the general smartphone user. It would therefore be best to use a handy everyday item as the calibration object. The camera calibration technique described in this paper was designed with these considerations in mind. Compared with classical techniques, the proposed technique does not require a prepared planar pattern and is considerably more flexible. The calibration objects it employs are common, handy items from daily life, such as a sheet of A4 paper or even a standard IC card.

Our approach exploits the line/edge features of handy objects to calibrate both the internal and external parameters of the camera, since these features are largely stable under illumination and viewpoint changes and offer some resilience to harsh imaging conditions such as noise and blur. The first challenge of the proposed solution is to automatically estimate the homography and establish the correspondences between model and image features. To this end, we redesigned the model based tracking method [15–18] to robustly estimate the homography of a common planar object in a cluttered scene. An advantage of such methods is that they handle occlusion and large changes in illumination and viewpoint. With a series of homographies from the planar object to the image plane, the initial camera parameters can be solved linearly. The second challenge is to optimize the camera parameters by developing an effective objective function that makes full use of the finite nature of the observations extracted from the images. In this paper, the error function for the model line and image line, which encodes the length of the image line segment and the information of its midpoint, is derived from the noisy image edge points with the least-squares approach.

The remainder of the paper is organized as follows. Section 2 gives the procedure of the proposed camera calibration algorithm. Section 3 presents an overview of the redesigned homography tracking method based on the edge model. Section 4 derives the error model between image and model lines and expresses the camera calibration problem in a probabilistic formulation using the maximum likelihood approach. Section 5 details how to solve the camera calibration problem with a nonlinear technique. Experimental results are given in Section 6, and Section 7 concludes the paper.

#### 2. Algorithm

The proposed algorithm is summarized in this section.

*Step 1.* Optimize the homography between the model plane and the image plane with our model based homography tracking approach.

*Step 2.* Fit each image line segment to the image edge points obtained by a 1D search along the normal direction of the corresponding model line.

*Step 3.* Calculate the initial camera parameters linearly from a series of homography matrices (three or more orientations).

*Step 4.* Estimate the camera parameters by minimizing the sum of the distances between the finite image line segments and the model lines in the maximum likelihood approach.

#### 3. Model Based Homography Tracking

As can be seen in Figure 1, the 2D model edge is projected to the image plane using the prior homography of the planar object. Instead of tackling the line segment itself, we sample the projected line segment (black solid line in Figure 1) with a series of points (brown points in Figure 1). A visibility test is then performed for each sample point, since some of them may lie outside the camera's field of view. For each visible sample point, a 1D search along the normal direction of the projected model line is employed to find the edge point with the strongest gradient or the closest location as its correspondence. Finally, the homography between consecutive frames is obtained by minimizing the sum of the errors between the sample points and their corresponding image points.
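The sampling, visibility test, and 1D normal search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the central-difference edge measure, and the synthetic image are assumptions chosen for illustration, and only the "strongest gradient" criterion is shown.

```python
import math


def sample_segment(p0, p1, count):
    """Return `count` evenly spaced points on the segment p0-p1 (endpoints included)."""
    return [
        (p0[0] + (p1[0] - p0[0]) * k / (count - 1),
         p0[1] + (p1[1] - p0[1]) * k / (count - 1))
        for k in range(count)
    ]


def unit_normal(p0, p1):
    """Unit vector perpendicular to the segment p0-p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    return (-dy / length, dx / length)


def is_visible(point, width, height):
    """Visibility test: keep only sample points that fall inside the image."""
    return 0 <= point[0] < width and 0 <= point[1] < height


def search_along_normal(image, point, normal, search_range):
    """1D search along the normal for the strongest intensity step.

    `image` is a list of rows of grey values. Returns the best position on
    the normal, or None when no intensity change is found.
    """
    height, width = len(image), len(image[0])
    best, best_strength = None, 0.0
    for step in range(-search_range, search_range + 1):
        x = point[0] + step * normal[0]
        y = point[1] + step * normal[1]
        # Central difference of the intensity along the normal direction.
        xa, ya = round(x - normal[0]), round(y - normal[1])
        xb, yb = round(x + normal[0]), round(y + normal[1])
        if not (0 <= xa < width and 0 <= xb < width
                and 0 <= ya < height and 0 <= yb < height):
            continue
        strength = abs(image[yb][xb] - image[ya][xa])
        if strength > best_strength:
            best, best_strength = (x, y), strength
    return best


# Synthetic 40x40 image: dark above row 20, bright from row 20 downwards.
image = [[0 if row < 20 else 255 for _ in range(40)] for row in range(40)]

# Projected model line predicted by a prior homography that is a few pixels off.
start, end = (5.0, 17.0), (35.0, 17.0)
normal = unit_normal(start, end)
matches = [
    (p, search_along_normal(image, p, normal, search_range=6))
    for p in sample_segment(start, end, 7)
    if is_visible(p, 40, 40)
]
```

Each entry of `matches` pairs a sample point with the edge location found along its normal; these pairs are the correspondences that the homography optimization then consumes.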

##### 3.1. Probabilistic Formulation for Homography Tracking

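The relationship between a model point $\mathbf{M}$ and its image point $\mathbf{m}$ can be given as

$$\mathbf{m} \simeq \mathbf{H}\mathbf{M}, \tag{1}$$

where $\mathbf{H}$ is the homography between the model plane and the image plane, and $\simeq$ denotes equality up to scale in homogeneous coordinates.

Suppose $\{\mathbf{m}_i\}$ is the set of projected sample points and $\{\mathbf{m}'_i\}$ is the set of their corresponding image points, observed with noise along the normal direction. Then we can define a function to measure the normal distance between a projected sample point $\mathbf{m}_i$ and its noisy observation $\mathbf{m}'_i$:

$$d_i = \mathbf{n}_i^{T}\left(\mathbf{m}'_i - \mathbf{m}_i\right), \tag{2}$$

where $\mathbf{n}_i$ is the unit normal vector of the projected sample point $\mathbf{m}_i$.

Assuming a zero-mean Gaussian distribution with standard deviation $\sigma$ for $d_i$, we have

$$d_i \sim N(0, \sigma^2). \tag{3}$$

The conditional density of $\mathbf{m}'_i$ given $\mathbf{m}_i$ can be given by

$$p(\mathbf{m}'_i \mid \mathbf{m}_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{d_i^2}{2\sigma^2}\right). \tag{4}$$

Therefore, with the assumption that the observation errors for different sample points are statistically independent, a maximum likelihood estimation of the homography is

$$\hat{\mathbf{H}} = \arg\max_{\mathbf{H}} \prod_{i=1}^{N} p(\mathbf{m}'_i \mid \mathbf{m}_i) = \arg\min_{\mathbf{H}} \sum_{i=1}^{N} d_i^2, \tag{5}$$

where $N$ is the number of model sample points.

It is clear that the proposed approach obtains the maximum likelihood estimate of the homography by minimizing the sum of the squared normal distances.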
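To make the objective concrete, the following sketch evaluates the sum of squared normal distances for a candidate homography. The function names and the toy numbers are assumptions made for illustration; the paper's tracker minimizes this quantity iteratively rather than merely evaluating it.

```python
def apply_homography(H, point):
    """Map a 2D model point through a 3x3 homography (row-major nested lists)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def normal_distance(projected, observed, normal):
    """Signed distance from the projected point to the observation along the normal."""
    return (normal[0] * (observed[0] - projected[0])
            + normal[1] * (observed[1] - projected[1]))


def sum_squared_normal_distances(H, model_points, normals, observations):
    """Cost whose minimizer is the maximum likelihood homography under Gaussian noise."""
    total = 0.0
    for model_point, normal, observed in zip(model_points, normals, observations):
        d = normal_distance(apply_homography(H, model_point), observed, normal)
        total += d * d
    return total


# Toy data: three model points on the line y = 0, observed with noise
# along the image normal (0, 1). The first observation is also shifted
# along the line, which does not change its normal distance.
model_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
normals = [(0.0, 1.0)] * 3
observations = [(10.25, 20.5), (12.0, 19.5), (14.0, 20.0)]

H_good = [[2.0, 0.0, 10.0], [0.0, 2.0, 20.0], [0.0, 0.0, 1.0]]
H_shifted = [[2.0, 0.0, 10.0], [0.0, 2.0, 21.0], [0.0, 0.0, 1.0]]

cost_good = sum_squared_normal_distances(H_good, model_points, normals, observations)
cost_shifted = sum_squared_normal_distances(H_shifted, model_points, normals, observations)
```

Because only the component along the normal enters the cost, sliding an observation along the edge leaves the cost unchanged, which is why a 1D search per sample point is sufficient.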

##### 3.2. Interaction Matrix-Distance between Points

The derivation of the interaction matrix for the proposed approach is based on the distance between the projection of sample point and its projected image point . The motion velocity of the object is then related to the velocity of these distances in the image.

Assume that we have a current estimation of the homography . The posterior homography can be computed from the prior homography given the incremental motion :

can be represented as follows:

The motion in the image is related to the twist in model space by computing the partial derivative of the normal distance with respect to th generating motion at current homography:where , .

Then the corresponding Jacobian matrices can be obtained bywhere is a unit vector with the th item equal to 1 and .

##### 3.3. Robust Minimization

The error vector is obtained by stacking all of the normal distances of each sample point as follows:

The optimization problem for (5) can be solved according to the following equation:where is the motion vector, is Jacobian matrix which links to , and is the weight matrix (refer to [17]).

Then, the solution of (11) can be given by

Finally, the new homography can be computed according to (7) as follows:

With a series of homography matrices (three or more orientations), the camera parameters can be solved linearly by the method of [4].
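The weighted minimization of Section 3.3 follows the usual iteratively reweighted least-squares pattern: compute residuals, derive weights, solve the weighted normal equations for an update, and repeat. The sketch below shows that pattern on a deliberately small stand-in problem, a robust fit of a line $y = a + bx$. The Huber weight function and all names are assumptions; the weights actually used by the authors are those of [17], and their unknowns are the homography motion parameters, not $(a, b)$.

```python
def huber_weight(residual, threshold):
    """Huber weight: 1 for small residuals, decaying as threshold/|r| beyond it."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= threshold else threshold / magnitude


def irls_line_fit(xs, ys, iterations=50, threshold=1.0):
    """Robustly fit y = a + b*x by iteratively reweighted least squares."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Residuals e_i = a + b*x_i - y_i; each Jacobian row is [1, x_i].
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        weights = [huber_weight(r, threshold) for r in residuals]

        # Entries of J^T W J and J^T W e for the 2x2 normal equations.
        s_w = sum(weights)
        s_wx = sum(w * x for w, x in zip(weights, xs))
        s_wxx = sum(w * x * x for w, x in zip(weights, xs))
        g_a = sum(w * r for w, r in zip(weights, residuals))
        g_b = sum(w * r * x for w, r, x in zip(weights, residuals, xs))

        # Solve (J^T W J) delta = -J^T W e with the closed-form 2x2 inverse.
        det = s_w * s_wxx - s_wx * s_wx
        delta_a = -(s_wxx * g_a - s_wx * g_b) / det
        delta_b = -(-s_wx * g_a + s_w * g_b) / det
        a, b = a + delta_a, b + delta_b
    return a, b


# Points on y = 1 + 2x with one gross outlier at x = 5.
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[5] += 30.0

a_robust, b_robust = irls_line_fit(xs, ys)
```

Down-weighting large residuals is what lets this kind of scheme tolerate the wrong edge correspondences that occur under occlusion or in cluttered scenes.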

#### 4. Maximum Likelihood Estimation of the Camera Parameters

In this paper, the camera calibration problem can be formulated in terms of a conditional density function that measures the probability of the image observations predicted from the camera parameters given the actual image observations. This section describes how to construct this conditional density function.

##### 4.1. Probabilistic Formulation for the Camera Calibration

Consider the case where there are images of a static scene containing straight line segments. Let be the matched set of 3D model and 2D image lines in the image , which can be established automatically according to homography optimization in this paper. With the assumption that the observation errors for different line segments are statistically independent, the conditional density function of the camera parameters can now be defined as follows:where is the projection function which takes the camera parameters and the 3D line segment and returns the corresponding edge in the image. , are the intrinsic and extrinsic parameters of the camera in the image , respectively. denotes the conditional density.

Then, the maximum likelihood estimation of the camera parameters , is maximizing the conditional density function , which is given by

By taking the negative logarithm, the problem of maximizing a product is converted into the minimization of a sum, which is given as follows:

The intrinsic parameters of the camera are represented as , where and are the equivalent focal length, is the principal point of the camera, and is the radial distortion coefficient. The extrinsic parameters of the camera in the image are presented in the usual manner by a translation vector and a rotation matrix . In the remainder of this section, the elements of (16) are discussed in more detail.

##### 4.2. Perspective Projection of 3D Model Line Segment

Throughout this paper, the perspective projection model is utilized. The relationship between a 3D world and 2D image point can be given aswhere is the coordinate in the camera frame for and is the -coordinate. and are the equivalent focal length. and are the principal point. and are the radial distortion, which is modeled as one-order polynomial model:where is corresponding to the projection ray from the focal point to the image point .
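As an illustration of this projection model, the sketch below maps a 3D world point to pixel coordinates using the equivalent focal lengths, the principal point, and a single radial distortion coefficient. The names `fx`, `fy`, `u0`, `v0`, `k1` and the specific form $1 + k_1 r^2$ are assumptions: this is the common single-coefficient radial model, which may differ in detail from the expression used in the paper.

```python
def project_point(world_point, R, t, fx, fy, u0, v0, k1):
    """Project a 3D world point to pixel coordinates.

    R is a 3x3 rotation (nested lists) and t a translation, so that the
    camera-frame point is R * X + t.
    """
    # World frame -> camera frame (extrinsic parameters).
    xc, yc, zc = (
        sum(R[row][col] * world_point[col] for col in range(3)) + t[row]
        for row in range(3)
    )
    # Normalized image coordinates of the projection ray.
    x, y = xc / zc, yc / zc
    # Assumed radial distortion: scale by (1 + k1 * r^2).
    factor = 1.0 + k1 * (x * x + y * y)
    # Camera frame -> pixel coordinates (intrinsic parameters).
    return (fx * x * factor + u0, fy * y * factor + v0)


def project_segment(start, end, R, t, fx, fy, u0, v0, k1):
    """Project a 3D line segment by projecting its two endpoints."""
    return (project_point(start, R, t, fx, fy, u0, v0, k1),
            project_point(end, R, t, fx, fy, u0, v0, k1))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.0, 0.0, 2.0]

# One edge of a planar object lying in the plane Z = 0 of the world frame.
edge_ideal = project_segment((-0.1, 0.0, 0.0), (0.1, 0.0, 0.0),
                             identity, translation, 1000.0, 1000.0, 320.0, 240.0, 0.0)
edge_distorted = project_segment((-0.1, 0.0, 0.0), (0.1, 0.0, 0.0),
                                 identity, translation, 1000.0, 1000.0, 320.0, 240.0, -0.2)
```

With a negative `k1` the projected endpoints move toward the principal point, while a point on the optical axis is unaffected by the distortion term.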

As shown in Figure 2, the line segment and its projection in the image plane are represented by their endpoints and . The line segments and lie on the infinite lines and , respectively. The perspective projection of 3D line segment can be given by the projection of its two endpoints:

When noise is present in the measuring data, we denote as the noisy observation of the projection of the 3D points and as the noisy observation of the projection of 3D model line .

##### 4.3. Error Model for the Observation of the Line Segment

Let , be a series of image edge points with the presence of the observation noise perpendicular to the line. For convenience, we assume that the true position of the line is parallel to the horizontal axis. Then we havewhere are Gaussian random variables with , and they are mutually independent.

Let the noises for the endpoints along the vertical direction be , , respectively, and . It can be easily derived thatwhere , .

It is clear that these two noises are negatively correlated. Since the observation noises are Gaussian random variables, the joint density for the random variables and is a Gaussian PDF, which can be given by

Supposing that the length of the line segment is and the intervals of the edge points are all , then we have , . Therefore, we obtain

When the number is large enough, that is, , it is easy to obtain

Substituting (24) into (22) yields

From (25), it can be seen that the error model allows us to encode the measurement error for image edge point () explicitly and obtain the intuitive impact of image line length. Moreover, long line segments produce more accurate location than shorter ones and small produces higher confidence about the line location.
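The qualitative claims above (negatively correlated endpoint noises, and higher accuracy for longer segments and smaller noise) can be checked numerically. The sketch below assumes an ordinary least-squares line fit to equally spaced edge points with independent vertical noise of standard deviation `sigma`; under that assumption the fitted value at any abscissa is a linear combination of the point noises, so the endpoint variance and covariance follow directly. The function name and the closed forms quoted in the comments are derived under this assumption and are not copied from the paper.

```python
def endpoint_noise_covariance(count, spacing, sigma):
    """Variance and covariance of the fitted line's vertical error at its two ends.

    Assumes `count` equally spaced edge points, independent Gaussian noise
    with standard deviation `sigma`, and an ordinary least-squares fit.
    """
    xs = [i * spacing for i in range(count)]
    mean_x = sum(xs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)

    def weights(x0):
        # The fitted value at x0 equals sum_i weights(x0)[i] * noise_i.
        return [1.0 / count + (x0 - mean_x) * (x - mean_x) / sxx for x in xs]

    w_first, w_last = weights(xs[0]), weights(xs[-1])
    variance = sigma ** 2 * sum(w * w for w in w_first)
    covariance = sigma ** 2 * sum(a * b for a, b in zip(w_first, w_last))
    return variance, covariance


# Closed forms under the same assumptions, with n = count:
#   variance   = 2 * sigma^2 * (2n - 1) / (n * (n + 1))   (about 4 * sigma^2 / n for large n)
#   covariance = -2 * sigma^2 * (n - 2) / (n * (n + 1))   (about -2 * sigma^2 / n for large n)
var_short, cov_short = endpoint_noise_covariance(50, 1.0, 1.0)
var_long, cov_long = endpoint_noise_covariance(100, 1.0, 1.0)
```

Doubling the number of points at a fixed spacing, that is, doubling the segment length, roughly halves the endpoint variance, which matches the statement that long line segments are located more accurately than short ones.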

##### 4.4. Maximum Likelihood (ML) Estimation

The measurement noise for the localization of the 2D line segments can be decomposed into two components: noise perpendicular to the line and noise along the length of the line. The first noise is modelled as a Gaussian random variable related to orientation error and the noise model has been derived in the last section, whilst the second one is assumed to conform to any distribution (not necessarily Gaussian) related to line fragmentation.

As can be seen in Figure 3, both the projection of the 3D line segment and its noisy observation are represented by their endpoints, and , respectively. The noise vector perpendicular to the line and the noise vector along the line are expressed as follows:where the components of are the distances between the endpoints of and along the direction perpendicular to . The components of are the distances between the endpoints of the two line segments along the direction of .

It is assumed that the two random vectors and are statistically independent. And then we can approximate the conditional density of given as

In the literature [19], it is proved that the conditional density of the projection of the 3D model line given its observed noisy image line segment is only dependent on the noise perpendicular to line :

Therefore, with the assumption that the observation errors for different line segments are statistically independent, (16) can be converted into the following form:where is the objective function that measures the disparity between the actual image observations and their corresponding predicted ones by the current camera parameters. corresponds to the distances from the endpoints of the image line segment to the projected model line.

If the image line segment is fitted by the least-squares technique (LST) and the intervals of the edge points are fixed for all of the image line segments, then we havewhere , , and correspond to the distances from the two endpoints and midpoint of the image line segment to the projected model line . It is clear that the error function between the 3D model line and the 2D image line is weighted by the length of the image line segment.
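A length-weighted error of this kind can be sketched as follows. The particular combination `d1^2 + d2^2 + 4*dm^2` and the omission of constant factors are assumptions: it is one form consistent with the description above, obtained from the identity $d_1^2 + d_1 d_2 + d_2^2 = \tfrac{1}{2}\left(d_1^2 + d_2^2 + 4 d_m^2\right)$ with $d_m = (d_1 + d_2)/2$, and should not be read as the paper's exact objective.

```python
import math


def line_through(p, q):
    """Normalized line (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return (a, b, -(a * p[0] + b * p[1]))


def signed_distance(line, point):
    """Signed perpendicular distance from a point to a normalized line."""
    a, b, c = line
    return a * point[0] + b * point[1] + c


def segment_cost(model_line, seg_start, seg_end):
    """Length-weighted disparity between an image segment and a projected model line."""
    midpoint = ((seg_start[0] + seg_end[0]) / 2.0, (seg_start[1] + seg_end[1]) / 2.0)
    d1 = signed_distance(model_line, seg_start)
    d2 = signed_distance(model_line, seg_end)
    dm = signed_distance(model_line, midpoint)
    length = math.hypot(seg_end[0] - seg_start[0], seg_end[1] - seg_start[1])
    return length * (d1 * d1 + d2 * d2 + 4.0 * dm * dm)


projected_model_line = line_through((0.0, 0.0), (10.0, 0.0))

cost_on_line = segment_cost(projected_model_line, (0.0, 0.0), (10.0, 0.0))
cost_offset = segment_cost(projected_model_line, (0.0, 1.0), (10.0, 1.0))
cost_offset_long = segment_cost(projected_model_line, (0.0, 1.0), (20.0, 1.0))
cost_tilted = segment_cost(projected_model_line, (0.0, 1.0), (10.0, -1.0))
```

For the same perpendicular offset, the longer segment contributes twice the cost of the shorter one, so long, well-supported image lines dominate the estimate.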

#### 5. Nonlinear Technique for the Optimization of Camera Parameters

In this section, we describe how to employ a nonlinear technique to solve the camera calibration problem defined in the previous section. The initial camera parameters are provided by a method similar to [4], except that the homography matrices are calculated by the tracking method discussed in Section 3 rather than from chessboard corners. At each iteration, the linearized error function is minimized to obtain the interframe motion vector for the intrinsic and extrinsic parameters. The camera parameters are then updated until the objective function converges to a minimum.

The distance from the point of the image line segment to the projection of the model line is given bywhere (refer to [20]).

Assume that we have a current estimation of the rotation at the time of . The posterior rotation can be computed from the prior rotation given the incremental rotation :where is the corresponding skew-symmetric matrix of vector :

The transformation from the reference frame to the camera frame can be rewritten aswhere denotes the location of the origin of the camera frame in the world frame.
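The incremental rotation update built from a skew-symmetric matrix can be sketched with the Rodrigues formula. The right-multiplication order in `update_rotation` and all function names are assumptions; the paper's own update equation may compose the increment on the other side or use a first-order approximation.

```python
import math


def skew(w):
    """Skew-symmetric matrix [w]x such that [w]x v equals the cross product w x v."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_so3(w):
    """Rotation matrix exp([w]x) computed with the Rodrigues formula."""
    theta = math.sqrt(w[0] ** 2 + w[1] ** 2 + w[2] ** 2)
    K = skew(w)
    K2 = matmul(K, K)
    if theta < 1e-12:
        # Small-angle limits of sin(theta)/theta and (1 - cos(theta))/theta^2.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def update_rotation(R_prior, w):
    """Posterior rotation from the prior rotation and an incremental rotation vector."""
    return matmul(R_prior, exp_so3(w))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two successive 45 degree increments about the z-axis give a 90 degree rotation.
R_step = update_rotation(identity, (0.0, 0.0, math.pi / 4))
R_two_steps = update_rotation(R_step, (0.0, 0.0, math.pi / 4))
R_quarter_turn = update_rotation(identity, (0.0, 0.0, math.pi / 2))
```

Parameterizing the update by a three-vector keeps the estimate on the rotation group at every iteration, so no re-orthogonalization of the matrix is needed after each step.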

Let represent the motion velocities corresponding to translation in the , , and directions between the prior translations and the posterior translation . Equation (31) can be rewritten as

Then, the partial derivative of the error function with respect to the th motion velocities can be computed aswhere , , and .

The partial derivative of the error function with respect to can be given bywhere , .

The error vector is obtained by stacking all of the normal distances of each image point as follows:where is the distance vector from midpoint and endpoints of the image line segment to the projected model line.

The optimization problem for (30) can be solved according to the following equation:where is the motion vector and . is Jacobian matrix which links to and is the weight matrix (refer to [17]).

If the incremental motion vector has been calculated, the new camera parameters can be computed as follows:

#### 6. Experimental Results

The proposed algorithm has been tested on simulated data generated by computer and on real image data captured with our smartphone. The closed-form solution is obtained with the approach of [4], except that the homography matrices are estimated by the proposed method. The nonlinear refinement, carried out with the iteratively reweighted least squares (IRLS) algorithm, takes 5 to 8 iterations to converge.

##### 6.1. Computer Simulations

The simulated perspective camera is placed 2 m from the planar object. The resolution of the virtual camera is . The simulated camera has the following property: , . The model plane is a checker pattern with $11 \times 14$ corners printed on A4 paper (210 mm × 297 mm). The images are taken from different orientations in front of the virtual camera. The normal vector of the plane is parallel to the rotation axis represented by a 3D vector , whose magnitude is equal to the rotation angle. The position of the plane is represented by a 3D vector (in millimetres). In the experiment, the proposed method is compared with the widely used chessboard corners based method [4], referred to as the corners based method and implemented with the corresponding camera calibration function of OpenCV [21]. For the corners based method, 154 corners are utilized. In our method, we use 25 lines fitted to the noisy corners by the LST. The reprojection error, reported as RMS, is the root of the mean squared distance, in pixels, between the detected image corners and the projected ones. When only the four edges of the plane pattern are utilized, the proposed method is referred to as the 4-line based method.
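The RMS reprojection error described above can be computed as in the following sketch. The function name is an assumption, and the error is taken per point (the squared Euclidean distance averaged over points), which is one common convention; some implementations average per coordinate instead.

```python
import math


def rms_reprojection_error(detected, projected):
    """Root of the mean squared distance (in pixels) between matched 2D points."""
    if len(detected) != len(projected) or not detected:
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum((u - pu) ** 2 + (v - pv) ** 2
                for (u, v), (pu, pv) in zip(detected, projected))
    return math.sqrt(total / len(detected))


detected_corners = [(100.0, 100.0), (203.0, 104.0)]
projected_corners = [(100.0, 100.0), (200.0, 100.0)]
error = rms_reprojection_error(detected_corners, projected_corners)
```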

###### 6.1.1. Performance with respect to the Noise Level

In this experiment, three planes with , , , , and are used (the three orientations are chosen according to [4]). Zero mean Gaussian noise is added to the projected image points with the standard deviation ranging from 0.1 pixels to 2.0 pixels in steps of 0.1 pixels. At each noise level, 100 independent trials are generated. The estimated camera parameters are then compared with the ground truth and RMS errors are measured. Moreover, for 154 points with real projections and the recovered projections, the RMS reprojection error is also calculated. Figures 4(a) and 4(b) display the relative errors of the intrinsic parameters which are measured with respect to , while Figure 4(c) shows the reprojection errors of the two methods.

**Figure 4.** (a), (b) Relative errors of the intrinsic parameters and (c) reprojection errors versus the noise level.

From Figure 4, we can see that both the relative errors of the intrinsic parameters and the reprojection errors increase almost linearly with the noise level. The proposed method achieves performance equivalent to that of the corners based method, since the image lines are fitted from the noisy image corners. When only 4 lines (the smallest set for homography estimation) are utilized, the errors of the proposed method are larger than those of the corners based method. For , there is little difference between the 4-line based method and the corners based method.

In addition, we vary the number of sample points used to fit each line segment to validate the performance of the 4-line based method with . From the results in Figure 5, we can see that the errors decrease significantly when more sample points are utilized. When the number of sample points per line is above 40, so that more than 160 points are used to fit the 4 line segments, the performance of the 4-line based method is very close to that of the 154-corner based method.

**Figure 5.** Errors of the 4-line based method versus the number of sample points (panels (a) to (c)).

###### 6.1.2. Performance with respect to the Number of Planes

In this experiment, we investigate the performance of the proposed method with respect to the number of images of the model plane. In the first three images, we use the same orientations and positions of the model plane as in the last subsection. For the following images, the rotation axes are chosen uniformly at random on a sphere, with the rotation angle fixed to 30°, and the positions are randomly selected around . The number of model plane images ranges from 3 to 17. For each number of images, 100 independent trials with independent plane orientations are generated, with the noise level for the image points fixed to 0.5 pixels. The errors, including the relative errors in the camera intrinsic parameters and the reprojection errors for the two methods, are shown in Figure 6. The errors decrease when more images are used, and the decrease is most significant from 3 to 7 images. Moreover, the reprojection error of the proposed method stays around 0.7 pixels as the number of images varies.

**Figure 6.** Errors versus the number of images of the model plane (panels (a) to (c)).

###### 6.1.3. Performance with respect to the Number of Lines

This experiment examines the performance of the proposed method with respect to the number of lines utilized to recover the camera parameters. For our method, at least 4 lines should be employed. We vary the number of lines from 4 to 25. Three images of the model plane are used, with the same orientations and positions as in the last subsection. For each number of lines, 100 independent trials are conducted with the noise level fixed to 0.5 pixels. The results are shown in Figure 7. When more lines are used, the errors decrease. In particular, from 4 to 15 lines, the errors decrease significantly.

**Figure 7.** Errors versus the number of lines (panels (a) to (c)).

###### 6.1.4. Performance with respect to the Orientation of the Model Plane

This subsection investigates the influence of the orientation of the model plane with respect to the image plane. In the experiment, three images are used with two of them similar to the last two planes in Section 6.1.1. The initial rotation axis of the third plane is parallel to the image plane, and the orientation of the planes is randomly chosen from a uniform sphere with the rotation varying from 5° to 75°. The noise level is fixed to 0.5 pixels. The results are displayed in Figure 8. Best performance seems to be achieved with the angle around 40°.

**Figure 8.** Errors versus the angle between the model plane and the image plane (panels (a) to (c)).

##### 6.2. Real Images

For the experiment with real data, the proposed algorithm is tested on several image sequences captured from the camera of the smartphone.

###### 6.2.1. Homography Tracking Performance

In the experiment, three image sequences are captured from the smartphone with a resolution of . In the first image sequence, a chessboard containing interior corners is printed on A4 paper and placed on a desk. About 1500 frames are taken at different orientations. For each image, the homography from the model plane to the image plane is optimized by the proposed method using the four edges of the A4 paper. The interior corners are extracted with the OpenCV function *cvFindChessboardCorners* and refined with *cvFindCornerSubPix*. Figure 9 shows some sampled results from the image sequence.

**(a) Frame 220**

**(b) Frame 480**

**(c) Frame 640**

**(d) Frame 973**

**(e) Frame 1073**

**(f) Frame 1113**

**(g) Frame 1153**

**(h) Frame 1213**

**(i) Frame 1433**

**(j) Frame 1533**

In the last two image sequences, the covers of two books are chosen as the model planes, respectively. To validate the performance of the proposed homography tracking method, the books are placed in a cluttered environment while the smartphone undergoes large rotations and translations. Each of the last two sequences contains around 2000 images. Figure 10 exhibits some sampled results.

**(a) Planar object 1**

**(b) Planar object 2**

###### 6.2.2. Camera Calibration Performance

In this subsection, we apply our calibration technique and the corners based method to four images sampled from the video captured by our smartphone (shown in Figure 11). The image resolution is . In the experiment, the chessboard plane contains interior corners and 23 lines. The results are shown in Table 1. We can see that there is little difference between the proposed method and the corners based method. When only the four edges of the plane pattern are utilized, the proposed method provides results very consistent with those of the corners based method, and the offset of the camera parameters with respect to the corners based method is small, about 5 pixels. The last column of Table 1 shows the reprojection RMS of the three methods. When all of the 23 lines are utilized, the proposed method provides almost the same reprojection error as the corners based method. The 4-line based method returns a slightly larger reprojection error, since only the minimum number of model lines is utilized.

To further investigate the stability of the proposed method, we vary the number of lines from 4 to 23. The results are shown in Figure 12. and recovered by the proposed method are around the values estimated by the corners based method, with only a small deviation. The reprojection errors for the proposed method decrease significantly from 4 to 17 lines. When the number of lines is above 17, the reprojection error is very close to that of the corners based method.

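**Figure 12.** Camera parameters and reprojection errors recovered by the proposed method versus the number of lines (panels (a) to (c)).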

###### 6.2.3. Application to Image-Based Modelling

In this subsection, we apply the proposed method to two image sequences. In the first image sequence, a card with a size of 54.0 mm × 85.6 mm is utilized as the model object. A sheet of A4 paper with a size of 210 mm × 297 mm is chosen as the model object for the second image sequence. In the experiment, a series of images is sampled from the videos to calibrate the camera intrinsic parameters, and then the camera pose is optimized for each image frame. After that, the structure from motion pipeline developed in [22–24] is run on the image sequences to build complete models of the toys, Luffy and Hulk. In Figure 13, Figures (A), (B), (C), and (D) are some sampled images from the image sequences. The camera poses recovered by the proposed method are shown in Figure (E). The left part of Figure (E) shows the camera poses for the whole image sequence, while the right part corresponds to the views sampled for the subsequent reconstruction. By recovering the whole motion trajectory of the camera, we can easily choose a subset of frames that is suitable and adequate for modeling. Two rendered views of the reconstructed objects are shown in Figure (F). From Figure 13, we can see that the complete models of the objects have been reconstructed by moving the camera around the objects. Since the size of the handy items is known, the objects can be reconstructed with metric information.

**(a) Reconstruction results of Luffy**

**(b) Reconstruction results of Hulk**

###### 6.2.4. Discussion

In practice, corner detection often fails when the angle between the model plane and the image plane is large or when some of the corners are invisible or corrupted by image noise and blur. Edge detection, however, is more stable in such cases. Moreover, in the simulated experiments, since each line segment is fitted to the corners lying on it, the proposed method provides almost the same performance as the corners based method. In our homography tracking framework, many more image edge points corresponding to the sampled model points are utilized, and therefore the line segments can be fitted with higher accuracy. In addition, the proposed method is more flexible and better suited to general smartphone users who want to perform vision tasks, since it only uses common, handy planar objects rather than a prepared planar pattern.

#### 7. Conclusions

In this paper, we have investigated the possibility of camera calibration using common and handy planar objects undergoing general motion for smartphone vision tasks. A linear algorithm supported by edge model based homography tracking is proposed, followed by a nonlinear technique to refine the results. Both computer simulated and real images have been utilized to validate the proposed algorithm. The experimental results show that the proposed algorithm is valid and robust and is more flexible than the widely used planar pattern based method.

In addition, for general users who want to perform vision tasks, a prepared planar calibration pattern may not always be at hand. However, many common items in daily life have a standard size and a planar structure. By exploiting their edge information, we proposed an easier and more practical camera calibration method. Moreover, in the proposed method, the uncertainty of the image line segment is encoded in the error model, which takes the finite nature of the observations into account. The problem of camera calibration using lines is formalized in a probabilistic framework and solved by the maximum likelihood approach.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The research was supported by the National Basic Research Program of China (2013CB733100) and National Natural Science Foundation of China Grant (no. 11332012).