#### Abstract

We present two simple approaches to calibrate a stereo camera setup with heterogeneous lenses: a wide-angle fish-eye lens on the left side and a narrow-angle lens on the right side. Instead of using a conventional black-white checkerboard pattern, we design an embedded checkerboard pattern by combining two differently colored patterns. In both approaches, we split the captured stereo images into RGB channels and extract the R channel from the left camera images and the inverted G channel from the right camera images. In our first approach, we treat the checkerboard pattern as the world coordinate system and calculate the left and right transformation matrices with respect to it. We then estimate the relative pose of the right camera by multiplying the inverse of the left transformation matrix with the right one. In the second approach, we calculate a planar homography transformation to identify common object points in left-right image pairs and process them with Zhang's well-known camera calibration method. We analyze the robustness of the two approaches by comparing reprojection errors and image rectification results. Experimental results show that the second method is more accurate than the first one.

#### 1. Introduction

Estimating internal and external (also known as intrinsic and extrinsic) camera parameters and determining the correct relative pose between the cameras of a stereo setup has been of interest in the computer vision field for many years. It is considered the first and most important step in many 2D/3D stereo vision experiments. Much related work has been introduced over the past few decades, initially in the photogrammetry community [1, 2]. As mentioned in [3], these calibration methods can be divided into two broad categories: photogrammetric calibration and self-calibration. Photogrammetric calibration is performed by observing a calibration object (normally a checkerboard pattern) whose 3D geometry is known with high precision. In contrast, self-calibration is performed by extracting feature points and processing correspondences between captured images of a static scene. However, most photogrammetric calibration methods assume cameras with the same or a similar field of view (FOV). Likewise, many self-calibration methods follow the same constraint, and only a few exploit the advantages of heterogeneous setups. Moreover, extracting rich key points is challenging and can lead to erroneous approximations. In this paper, we propose two new, yet simple, calibration approaches for a heterogeneous camera setup. Instead of using the general black-white checkerboard pattern, we design a new color checkerboard pattern by combining two different patterns. In our first approach, we consider the checkerboard pattern as the world coordinate system and calculate the transformations of the left and right cameras with respect to it. Multiplying the inverse of the left transformation with the right transformation gives the relative pose of the right camera with respect to the left camera.
In our second approach, we use a planar homography transformation to identify common object points in stereo images. Once these common points are estimated, we apply Zhang's method [3] to calibrate the stereo camera setup. The remainder of this paper is organized as follows. Section 2 describes existing stereo calibration methods for heterogeneous setups. Section 3 describes the preliminaries, including the configuration of our camera setup, the design of the color checkerboard pattern, and the method of separating the two patterns from each other. Section 4, the core of this paper, introduces the two stereo calibration approaches: Section 4.1 describes the mono calibration method used to undistort input image sequences, Section 4.2 the matrix multiplication method, and Section 4.3 the planar homography transformation-based calibration method. Section 5 summarizes the experiments performed to evaluate the accuracy of the two methods; besides comparing reprojection errors, we perform image rectification to assess how robust the proposed methods are. Finally, Section 6 presents conclusions and further discussion.

#### 2. Related Work

Wide-angle lenses, such as fish-eye lenses, are becoming increasingly popular in stereo vision. Their wider FOV lets them cover a broader scene area than conventional cameras. These cameras have been used intensively in many recent stereo-based experiments, and a number of calibration methods have been tested with them. Barreto and Daniilidis introduced a factorization approach that, without nonlinear minimization, estimates the relative pose between two wide-angle cameras [4, 5] from a minimum of 15 corresponding point matches. Micusik and Pajdla proposed a polynomial eigenvalue method [7], combined with the RANdom Sample Consensus (RANSAC) scheme of Fischler and Bolles [6], to estimate the relative pose of a noncentral catadioptric camera system [8]. Lhuillier introduced a similar approach [9] in 2008, applying a central model to estimate the geometry of the camera and decoupling orientation from translation to identify the transformation relationship. Lim et al. introduced a stereo calibration method using an antipodal epipolar constraint [10]. In addition, many optical flow estimation approaches have been adopted for pose estimation, as cited in [11]. Planar projection (homography)-based approaches have also been studied for estimating the relative pose in a stereo camera rig. Chen et al. proposed a calibration method for a high-definition stereo camera rig based on homography transformation [12] using a marker chessboard. In 2013, they presented a slightly improved image undistortion and pose estimation method in a technical paper [13].

Even though these existing methods can be used to calibrate heterogeneous stereo camera setups, most of them have certain limitations and drawbacks. Most depend on geometric invariants of image features, such as projections of straight lines, or on approximations of the fundamental matrix [13]. They require reliable extraction and matching of point correspondences between stereo image pairs, which can be challenging because of differing resolutions, different FOVs, and lens distortions. In addition, these methods are limited to small displacements, since the reliability of feature point extraction decreases when the FOV difference between images is large. The method proposed by Barreto and Daniilidis is mostly algebraic, and its linear model requires a minimum of 15 point correspondences; such correspondences are ambiguous and difficult to estimate precisely in challenging environments. Similarly, the method proposed by Micusik and Pajdla generalizes Fitzgibbon's technique [14] and requires 9 point correspondences, whereas the method proposed by Lhuillier requires a minimum of 7 point correspondences to calculate the fundamental matrix. The method introduced by Lim et al. imposes constraints on the distribution of feature points. The planar homography method introduced by Chen et al. in their first article [12] sometimes failed to detect chessboard corners properly. They addressed this problem in their second article [13] by introducing a robust homography transformation, but they focused primarily on mono video cameras rather than on stereo systems.

We observe that the above limitations and drawbacks arise mainly from relying on point correspondences between stereo image pairs. The two stereo calibration methods presented in this article do not depend on such sensitive feature correspondences and therefore avoid these difficulties; instead, they estimate pose from the known geometry of the calibration pattern. The embedded checkerboard pattern we introduce is a suitable alternative to the traditional black-white checkerboard pattern and can be used when a common pattern area is not fully visible in both images because of FOV differences.

#### 3. Preliminaries

##### 3.1. Focal Lengths, Fields of View, and Wide-Angle Cameras

Focal length is the distance from the optical center of the lens to the focal point, the point at which incoming parallel light rays converge. Figure 1 shows two light rays converging at this point. The focal length of a camera and its FOV are inversely related: a longer focal length results in a narrower FOV, whereas a shorter focal length results in a wider FOV. In other words, the focal length determines how wide a cone of incoming light the lens projects onto the image sensor, as shown graphically in Figure 2. A short focal length is the basic idea behind wide-angle lenses [15, 16].
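
To make the inverse relationship concrete, the sketch below uses the ideal rectilinear (pinhole) model, in which the horizontal FOV is $2\arctan\big(w/(2f)\big)$ for focal length $f$ and sensor width $w$, and compares the two nominal focal lengths of our setup (Section 3.3). The 7.2 mm sensor width is a hypothetical value chosen for illustration, not a specification of the cameras used here. Fish-eye lenses do not follow the rectilinear model (they typically use, for example, an equidistant projection), so for them the formula indicates only the trend.

```python
import math

def horizontal_fov_deg(focal_mm: float, sensor_width_mm: float) -> float:
    """Horizontal field of view (degrees) of an ideal rectilinear pinhole lens."""
    if focal_mm <= 0 or sensor_width_mm <= 0:
        raise ValueError("focal length and sensor width must be positive")
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))

SENSOR_WIDTH_MM = 7.2  # hypothetical sensor width, for illustration only

# Nominal focal lengths of the wide-angle and narrow-angle lenses.
for f in (3.5, 8.0):
    fov = horizontal_fov_deg(f, SENSOR_WIDTH_MM)
    print(f"f = {f:.1f} mm -> horizontal FOV = {fov:.1f} deg")
```

Halving the focal length roughly doubles the FOV here, which is why the 3.5 mm lens sees far more of the pattern than the 8 mm lens.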

Wide-angle lenses, such as fish-eye lenses, have become increasingly popular because they cover wider viewing areas. The Double-Gauss lens [17], a compound lens with positive and negative meniscus lenses on the object side and the image side, respectively, can be considered the basis of these wide-angle lenses. In general, wide-angle lenses can be categorized into two main groups: short focus lenses and retrofocus lenses. Short focus lenses are generally made of multiple glass elements whose shapes are nearly symmetrical in front of and behind the diaphragm. Retrofocus lenses use an inverted telephoto configuration, in which the front element is negative.

##### 3.2. Designing the Special Checkerboard Pattern

Capturing stereo images of a conventional black-white checkerboard pattern with narrow-angle cameras is the approach used in many existing methods. To obtain more accurate calibration results, the pattern needs to be kept close to the cameras. This can limit the number of usable poses (even though the minimum number of poses required is six, as mentioned in [3]), and in some situations the cameras fail to capture the full area of the checkerboard pattern. One possible solution to this partial-visibility problem is to use wide-angle lenses. In this paper, we use a single wide-angle lens along with a narrow-angle lens.

However, using a wide-angle lens does not guarantee that the stereo setup captures full images of the checkerboard pattern. Because our stereo setup also includes a narrow-angle camera, it is difficult to cover the full area of the checkerboard pattern at close distance, as illustrated in Figure 3. To overcome this problem, we designed a new checkerboard pattern and used it instead of the conventional black-white pattern. This new pattern, used in both of our proposed methods, is shown in Figure 4.

This special checkerboard pattern is made by combining two differently colored checkerboard patterns: a larger pattern and a smaller pattern. The larger pattern (hereafter the outer pattern) consists of red and blue checkers, and the smaller pattern consists of black and green checkers. The smaller pattern is embedded into the outer pattern (Figure 5(a)), and their colors blend. This color mixing produces a secondary inner pattern of red, yellow, blue, and cyan checkers inside the outer pattern. The result can therefore be treated as two individual checkerboard patterns instead of a single pattern. The process of designing this special checkerboard pattern is depicted in Figure 5.
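
A minimal sketch of how such an embedded pattern could be synthesized, assuming idealized additive RGB mixing (green over red gives yellow, green over blue gives cyan, and black adds nothing). The board sizes, checker sizes, and offset are hypothetical values, not the dimensions of our printed pattern.

```python
import numpy as np

def checkerboard(rows: int, cols: int, square: int) -> np.ndarray:
    """Binary board of rows x cols squares, each `square` pixels wide (1 = 'on')."""
    r = np.arange(rows * square) // square
    c = np.arange(cols * square) // square
    return ((r[:, None] + c[None, :]) % 2).astype(np.uint8)

def embedded_pattern(outer=(7, 10, 40), inner=(6, 8, 20), offset=(80, 120)) -> np.ndarray:
    """RGB image: red-blue outer board with a green-black inner board added on top.

    outer/inner are (rows, cols, square_px); offset is the (row, col) pixel of the
    inner board's top-left corner. All sizes are hypothetical.
    """
    outer_on = checkerboard(*outer)
    img = np.zeros(outer_on.shape + (3,), dtype=np.uint16)
    img[..., 0] = 255 * outer_on          # red checkers
    img[..., 2] = 255 * (1 - outer_on)    # blue checkers
    inner_on = checkerboard(*inner)
    y0, x0 = offset
    h, w = inner_on.shape
    # Additive blend on the G channel only: red -> yellow, blue -> cyan under green.
    img[y0:y0 + h, x0:x0 + w, 1] += 255 * inner_on
    return np.clip(img, 0, 255).astype(np.uint8)

pattern = embedded_pattern()  # 280 x 400 x 3, RGB order
```

Because the inner board only ever writes to the G channel, the R channel still carries the outer board unchanged, which is what makes the channel splitting of Section 3.3 possible.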

##### 3.3. Capturing Calibration Images of Special Checkerboard Pattern

The heterogeneous stereo camera setup used in our experiments is depicted in Figure 6. Two Point Grey Grasshopper cameras are mounted on either side of a horizontal panning bar: a wide-angle camera on the left (focal length ≅ 3.5 mm) and a narrow-angle camera on the right (focal length ≅ 8 mm). We placed the special checkerboard pattern in front of the cameras such that the narrow-angle camera always sees the full area of the inner checkerboard pattern. Because the wide-angle camera has a wider FOV, it sees both the inner and outer patterns in full (Figure 7).

In our experiments, we wanted to retain only the outer pattern in the wide-angle camera images and only the inner pattern in the narrow-angle camera images. We performed RGB channel splitting to distinguish the two patterns from each other. Extracting the R channel isolates the outer pattern in the wide-angle camera images, because the red and yellow checkers both have high R values while the blue and cyan checkers have low ones. Similarly, extracting and inverting the G channel of the narrow-angle camera images isolates the inner pattern, because only the yellow and cyan checkers, where green was blended in, have high G values. Figure 8 shows an example of this separation: Figures 8(a) and 8(b) show the left wide-angle and right narrow-angle camera images, from which the outer pattern is identified through the R channel and the inner pattern through the inverted G channel.
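
The channel separation can be sketched in a few lines of NumPy. This assumes ideal, saturated checker colors; real captures are affected by illumination and white balance, so the separated channels may still need thresholding or contrast adjustment before corner detection. The `channel_order` parameter is a hypothetical convenience: OpenCV's `cv2.imread` returns images in BGR order, so the R channel is index 2 there.

```python
import numpy as np

def split_embedded_pattern(img: np.ndarray, channel_order: str = "BGR"):
    """Return (outer_view, inner_view) grayscale images from an 8-bit color frame.

    outer_view: R channel; red/yellow checkers -> bright, blue/cyan -> dark.
    inner_view: inverted G channel; yellow/cyan (green-blended) checkers -> dark.
    """
    if channel_order not in ("BGR", "RGB"):
        raise ValueError("channel_order must be 'BGR' or 'RGB'")
    img = np.asarray(img, dtype=np.uint8)
    r = img[..., 2] if channel_order == "BGR" else img[..., 0]
    g = img[..., 1]
    return r.copy(), 255 - g
```

In practice the wide-angle frame would use only `outer_view` and the narrow-angle frame only `inner_view`, each of which then behaves like an ordinary black-white checkerboard image.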

#### 4. Stereo Calibration

##### 4.1. Mono Camera Calibration

One problem with wide-angle cameras is that they suffer from severe barrel distortion. Performing stereo calibration without correcting this distortion can lead to erroneous matrix calculations. Consequently, both of our stereo calibration methods start by undistorting the input wide- and narrow-angle images.
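
For reference, the two-coefficient radial model commonly used with Zhang-style calibration scales a normalized image point by $1 + k_1 r^2 + k_2 r^4$, and barrel distortion corresponds to a negative $k_1$. The sketch below is illustrative only: the coefficient values are made up, tangential terms are omitted, and the inverse is computed by simple fixed-point iteration, which is adequate only for moderate distortion. Strong fish-eye distortion may call for a dedicated model such as the one in OpenCV's `cv2.fisheye` module.

```python
import numpy as np

def distort(xy: np.ndarray, k1: float, k2: float) -> np.ndarray:
    """Apply x_d = x * (1 + k1 r^2 + k2 r^4) to normalized image points (N x 2)."""
    xy = np.asarray(xy, dtype=float)
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)

def undistort(xy_d: np.ndarray, k1: float, k2: float, iters: int = 50) -> np.ndarray:
    """Invert `distort` by fixed-point iteration (moderate distortion only)."""
    xy_d = np.asarray(xy_d, dtype=float)
    xy = xy_d.copy()  # initial guess: the distorted point itself
    for _ in range(iters):
        r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
        xy = xy_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xy

# Hypothetical barrel-distortion coefficients (k1 < 0 pulls points toward the center).
K1, K2 = -0.2, 0.05
```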

We use the same experimental setup described in Section 3.3. We kept the special checkerboard pattern at a short distance and captured the left wide-angle and right narrow-angle camera images separately. After capturing the images, we followed the method described in Section 3.3 to retain the outer pattern in the wide-angle camera images and the inner pattern in the narrow-angle camera images. We then used Zhang's well-known method [3] to calibrate each camera independently. Figure 9 shows an example of the wide- and narrow-angle cameras being calibrated separately.
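
To illustrate the core of Zhang's method, the sketch below implements only its closed-form initialization: each view of the planar pattern yields a homography $H = \lambda K[\mathbf{r}_1\;\mathbf{r}_2\;\mathbf{t}]$, and the orthonormality of $\mathbf{r}_1$ and $\mathbf{r}_2$ gives two linear constraints on $B = K^{-\top}K^{-1}$, which are solved by SVD. It omits lens distortion and the nonlinear refinement that complete calibrations (including OpenCV's `cv2.calibrateCamera`) perform. The `scale` pre-conditioning factor is an assumption added for numerical stability.

```python
import numpy as np

def _v(H: np.ndarray, i: int, j: int) -> np.ndarray:
    """Zhang's v_ij vector built from columns i and j of a homography."""
    hi, hj = H[:, i], H[:, j]
    return np.array([
        hi[0] * hj[0],
        hi[0] * hj[1] + hi[1] * hj[0],
        hi[1] * hj[1],
        hi[2] * hj[0] + hi[0] * hj[2],
        hi[2] * hj[1] + hi[1] * hj[2],
        hi[2] * hj[2],
    ])

def intrinsics_from_homographies(Hs, scale: float = 1000.0) -> np.ndarray:
    """Closed-form intrinsic matrix K from >= 3 plane-to-image homographies.

    `scale` (roughly the image size in pixels) rescales pixel coordinates so the
    linear system is well conditioned; K is mapped back at the end.
    """
    if len(Hs) < 3:
        raise ValueError("need at least three views for the general case")
    N = np.diag([1.0 / scale, 1.0 / scale, 1.0])
    rows = []
    for H in Hs:
        Hn = N @ np.asarray(H, dtype=float)
        rows.append(_v(Hn, 0, 1))                   # h1^T B h2 = 0
        rows.append(_v(Hn, 0, 0) - _v(Hn, 1, 1))    # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    B11, B12, B22, B13, B23, B33 = Vt[-1]           # b, defined up to scale
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha ** 2 / lam
    Kn = np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])
    return np.linalg.inv(N) @ Kn                    # undo the pre-conditioning
```

The homographies themselves would come from matching the pattern's known corner layout to the detected corners in each view.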

##### 4.2. Stereo Calibration Using Transformation Matrices

The first stereo calibration approach is based on multiplying the two transformation matrices of the wide- and narrow-angle cameras. Once the two cameras are properly calibrated as described in Section 4.1, we capture stereo image sequences of the checkerboard pattern from both cameras simultaneously. While capturing images, we kept the checkerboard pattern close to the cameras such that the wide-angle camera sees the full area of the pattern and the narrow-angle camera sees the full area of the inner pattern. In this method, we consider the checkerboard pattern as the world coordinate system, with its origin at the intersection point of the first red and blue checkers of the outer pattern. Because we treat the inner and outer patterns as two different checkerboards, the inner pattern has its own origin at the intersection point of the first red and yellow checkers; we shifted it to the origin of the outer pattern by adding the offset between the two origins. This is graphically described in Figure 10.
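
A small sketch of how the inner-pattern corners can be expressed in the outer pattern's world frame by adding the offset between the two origins. The corner counts follow the 54 outer and 35 inner corners mentioned in Sections 4.3 and 5 (arranged here, by assumption, as 9 × 6 and 7 × 5), while the square sizes and the offset are hypothetical values in millimeters.

```python
import numpy as np

def board_object_points(cols: int, rows: int, square: float, origin=(0.0, 0.0)) -> np.ndarray:
    """Planar (Z = 0) object points of a board's corners, x varying fastest.

    `origin` is where the board's first corner lies in the world (outer-pattern)
    frame, in the same unit as `square`.
    """
    xs, ys = np.meshgrid(np.arange(cols), np.arange(rows))
    pts = np.zeros((rows * cols, 3), dtype=np.float32)
    pts[:, 0] = origin[0] + square * xs.ravel()
    pts[:, 1] = origin[1] + square * ys.ravel()
    return pts

# Hypothetical geometry (mm): the outer origin is the world origin, and the
# inner board's first corner sits at (100, 60) in that frame.
outer_pts = board_object_points(9, 6, square=40.0)
inner_pts = board_object_points(7, 5, square=20.0, origin=(100.0, 60.0))
```

With this shift, both boards share a single world frame, so the poses estimated from either pattern refer to the same coordinate system.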

Let $T_L$ and $T_R$ denote the transformation matrices of the wide-angle (left) and narrow-angle (right) cameras with respect to the world coordinate system. We want to find $T_{LR}$, the relative pose of the narrow-angle camera with respect to the wide-angle camera. We used the captured stereo image sequences to calibrate the two cameras separately (Section 4.1) and estimated their camera matrices (or perspective projection matrices).

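The general relationship between a 3D point $\tilde{\mathbf{X}} = [X, Y, Z, 1]^\top$ in the world coordinate system and its 2D point $\tilde{\mathbf{x}} = [u, v, 1]^\top$ in the image coordinate system can be written as

$$
s\,\tilde{\mathbf{x}} = P\,\tilde{\mathbf{X}}, \tag{1}
$$

where $P$ is the $3\times4$ camera matrix and $s$ is an arbitrary scale factor. This matrix can be further decomposed into the intrinsic camera matrix $K$ and the rigid transformation (extrinsic) matrix $[R \mid \mathbf{t}]$ [18]:

$$
P = K\,[R \mid \mathbf{t}]. \tag{2}
$$

Thus, (1) can be rewritten as

$$
s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \tag{3}
$$

where $R$ denotes rotation and $\mathbf{t}$ denotes translation ($r_{ij}$ and $t_i$ in (3), respectively). The intrinsic and extrinsic entries of $P$ can be easily recovered by factorization [19].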
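
The projection $s\,\tilde{\mathbf{x}} = K[R \mid \mathbf{t}]\tilde{\mathbf{X}}$ described above can be written directly in NumPy. This sketch assumes an ideal pinhole camera without lens distortion (that is, images already undistorted as in Section 4.1); the intrinsic values used in the check are made up.

```python
import numpy as np

def project_points(K: np.ndarray, R: np.ndarray, t, X) -> np.ndarray:
    """Project N x 3 world points with s x = K [R | t] X (pinhole, no distortion)."""
    X = np.asarray(X, dtype=float)
    P = K @ np.hstack([R, np.reshape(t, (3, 1))])    # 3 x 4 camera matrix
    Xh = np.hstack([X, np.ones((len(X), 1))])        # homogeneous world points
    xh = Xh @ P.T                                    # rows are s * [u, v, 1]
    return xh[:, :2] / xh[:, 2:3]                    # divide out the scale s
```

For a camera with identity rotation placed 2 units in front of the pattern, a point at the world origin lands exactly on the principal point $(c_x, c_y)$.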

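By applying this relationship to the wide-angle and narrow-angle cameras separately, with intrinsic matrices $K_L$ and $K_R$, rotations $R_L$ and $R_R$, and translations $\mathbf{t}_L$ and $\mathbf{t}_R$, we obtain

$$
s_L\,\tilde{\mathbf{x}}_L = K_L\,[R_L \mid \mathbf{t}_L]\,\tilde{\mathbf{X}}, \qquad
s_R\,\tilde{\mathbf{x}}_R = K_R\,[R_R \mid \mathbf{t}_R]\,\tilde{\mathbf{X}},
$$

from which we estimate the left and right transformation matrices as the camera poses in the world (pattern) coordinate system:

$$
T_L = \begin{bmatrix} R_L & \mathbf{t}_L \\ \mathbf{0}^\top & 1 \end{bmatrix}^{-1}, \qquad
T_R = \begin{bmatrix} R_R & \mathbf{t}_R \\ \mathbf{0}^\top & 1 \end{bmatrix}^{-1}.
$$

The transformation matrices shown in Figure 11 are related by

$$
T_R = T_L\,T_{LR},
$$

where $T_{LR}$ is the relative pose we want to estimate. We therefore multiply the inverse of the left transformation matrix with the right transformation matrix to find the relative pose of the narrow-angle camera with respect to the wide-angle camera:

$$
T_{LR} = T_L^{-1}\,T_R.
$$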
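
The matrix multiplication step can be sketched as follows, writing $T_L$ and $T_R$ for the 4 × 4 poses of the wide- and narrow-angle cameras in the pattern (world) frame, each obtained by inverting the homogeneous extrinsic matrix $[R \mid \mathbf{t}]$ of that camera. The function names are hypothetical. Note that OpenCV's `cv2.stereoCalibrate` reports the opposite direction (a rotation and translation that take points from the first camera's frame into the second's), which corresponds to the inverse of the $T_{LR}$ computed here.

```python
import numpy as np

def homogeneous(R: np.ndarray, t) -> np.ndarray:
    """4 x 4 matrix [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.ravel(t)
    return T

def camera_pose(R: np.ndarray, t) -> np.ndarray:
    """Camera-to-world pose from world-to-camera extrinsics (R, t)."""
    return np.linalg.inv(homogeneous(R, t))

def relative_pose(R_l, t_l, R_r, t_r) -> np.ndarray:
    """T_LR = T_L^-1 @ T_R: maps right-camera coordinates into the left-camera frame."""
    return np.linalg.inv(camera_pose(R_l, t_l)) @ camera_pose(R_r, t_r)
```

A quick consistency check is to take any world point, express it in both camera frames with the two extrinsics, and confirm that $T_{LR}$ carries the right-camera coordinates onto the left-camera coordinates.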

Figure 12 graphically summarizes the whole matrix multiplication-based calibration procedure as a flow chart.

##### 4.3. Stereo Calibration Using Planar Homography Transformation

Figure 13 summarizes the whole process we followed to find the relative pose of the narrow-angle camera with the wide-angle camera as the reference. As in Section 4.2, we first undistorted the images and used them as input data.

This second approach uses Zhang's method to perform stereo calibration. To apply Zhang's method, however, we need the correct relationship between point locations in the two camera images. Because the narrow-angle camera sees only part of the full checkerboard pattern, we could not identify this relationship directly. Therefore, we applied a planar homography transformation to the wide-angle camera images to project point locations into the viewpoint of the narrow-angle camera.

To calculate the planar homography matrix $H$, we need at least four corresponding image points between the wide- and narrow-angle images, because $H$ has eight degrees of freedom and each correspondence provides two equations. In other words, we need at least four checkerboard points whose 2D image coordinates are known in both images. Because of the FOVs of the two cameras, the wide-angle camera captures both the inner and outer patterns, whereas the narrow-angle camera captures only the full area of the inner pattern (together with some partial areas of the outer pattern).

Therefore, we decided to retain only the inner pattern in both the wide- and narrow-angle images. We followed the same channel splitting method, but this time we extracted only the inverted G channel, which isolates the inner pattern in both camera images. We manually selected four common point locations in the two images to calculate $H$, as shown in Figure 14. According to [18], the homography relationship between two corresponding 2D point locations can be written as

$$
s\,\tilde{\mathbf{x}}_n = H\,\tilde{\mathbf{x}}_w,
$$

where $H$ is the $3\times3$ homography matrix we want to calculate, $s$ is a scale factor, and $\tilde{\mathbf{x}}_w$ and $\tilde{\mathbf{x}}_n$ are the homogeneous coordinates of the 2D point locations we selected in the wide-angle and narrow-angle camera images, respectively. Using these four point correspondences, we compute $H$ by singular value decomposition.
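
Estimating the $3\times3$ homography that maps wide-angle points to narrow-angle points from the selected correspondences with SVD is the direct linear transform (DLT). The sketch below adds Hartley-style coordinate normalization, which is standard practice for numerical stability but is not something the text above specifies; in OpenCV, `cv2.findHomography` provides an equivalent estimate.

```python
import numpy as np

def _normalize(pts: np.ndarray):
    """Hartley normalization: zero mean, mean distance sqrt(2) from the origin."""
    c = pts.mean(axis=0)
    d = np.mean(np.linalg.norm(pts - c, axis=1))
    s = np.sqrt(2.0) / d
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def homography_dlt(src, dst) -> np.ndarray:
    """Estimate H with dst ~ H @ src (homogeneous) from >= 4 point pairs via SVD."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[0] < 4:
        raise ValueError("need at least four matching (x, y) pairs")
    src_n, T_src = _normalize(src)
    dst_n, T_dst = _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        # Two linear equations per correspondence in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))      # full_matrices=True by default
    H_n = Vt[-1].reshape(3, 3)                      # right singular vector of smallest singular value
    H = np.linalg.inv(T_dst) @ H_n @ T_src          # undo the normalization
    return H / H[2, 2]
```

Here `src` would hold the four points picked in the wide-angle image and `dst` the matching points in the narrow-angle image; no three of them may be collinear.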

After calculating $H$, we find the chessboard corners of the outer pattern in the wide-angle images, following the steps described in [20] to locate the corners accurately. We first extract the R channel to retain the outer pattern and find the 2D locations of all 54 corners. Next, we apply $H$ to determine where these corner points project onto the narrow-angle images (Figure 15); the green circles in the narrow-angle images mark these projected point locations. We then refined these points to subpixel accuracy by maximizing their cornerness criterion. Once the 2D coordinates of the common object points are known in both the wide- and narrow-angle images, we process them with Zhang's method to perform stereo calibration between the two cameras.
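
Transferring the detected outer-pattern corners into the narrow-angle image amounts to multiplying their homogeneous coordinates by the homography and dividing by the third component. In this sketch, `corner_grid` is a hypothetical stand-in for real corner detections, and the subsequent subpixel refinement (for example, with OpenCV's `cv2.cornerSubPix`) is not shown.

```python
import numpy as np

def transfer_points(H, pts) -> np.ndarray:
    """Map N x 2 pixel coordinates through a 3 x 3 homography (homogeneous divide)."""
    pts = np.asarray(pts, dtype=float).reshape(-1, 2)
    ph = np.column_stack([pts, np.ones(len(pts))]) @ np.asarray(H, dtype=float).T
    return ph[:, :2] / ph[:, 2:3]

def corner_grid(cols: int, rows: int, x0: float, y0: float, step: float) -> np.ndarray:
    """Hypothetical stand-in for detected corners: a row-major grid of (x, y) pixels
    starting at (x0, y0) with `step` pixels between neighbors."""
    xs, ys = np.meshgrid(x0 + step * np.arange(cols), y0 + step * np.arange(rows))
    return np.column_stack([xs.ravel(), ys.ravel()])

# e.g. 54 outer-pattern corners (9 x 6) mapped into the narrow-angle view:
# projected = transfer_points(H, corner_grid(9, 6, 120.0, 80.0, 35.0))
```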

#### 5. Experiments and Results

We performed four experiments (two for method 1 and two for method 2) in both indoor and outdoor environments to evaluate the robustness of the two proposed methods. For the indoor experiments we used the setup shown in Figure 6, and for the outdoor experiments we mounted the same setup on top of the front mirror of a vehicle. We used a similar number of images (30) in every experiment. Table 1 summarizes the intrinsic parameters of both cameras: $f_x$ and $f_y$ are the focal lengths expressed in pixel units in the $x$ and $y$ directions, and $c_x$ and $c_y$ are the $x$ and $y$ components of the principal point. Table 2 summarizes the results calculated for the indoor and outdoor environments with method 1 (Section 4.2) and method 2 (Section 4.3). Parameters $r_x$, $r_y$, and $r_z$ represent the components of rotation in the $x$, $y$, and $z$ directions, and parameters $t_x$, $t_y$, and $t_z$ represent the components of translation in the $x$, $y$, and $z$ directions.

We calculated and compared the reprojection errors of both methods, that is, the root mean square (RMS) of the Euclidean distances between the chess corner points observed in the image coordinate system (in the 2D calibration images) and the corresponding projected object points. We followed [18, 21] to calculate these errors.
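
The RMS reprojection error defined above can be computed as follows, given the observed corners and the reprojected object points as matching N × 2 arrays. This is a minimal sketch; the reprojection itself would use each camera's calibrated intrinsics, extrinsics, and distortion coefficients.

```python
import numpy as np

def rms_reprojection_error(observed, projected) -> float:
    """Root mean square of the Euclidean distances between matched 2D points."""
    obs = np.asarray(observed, dtype=float).reshape(-1, 2)
    proj = np.asarray(projected, dtype=float).reshape(-1, 2)
    if obs.shape != proj.shape:
        raise ValueError("observed and projected points must match one-to-one")
    sq_dist = np.sum((obs - proj) ** 2, axis=1)   # squared distance per point
    return float(np.sqrt(sq_dist.mean()))
```

Pooling all points from all calibration images before taking the mean, rather than averaging per-image RMS values, weights every corner equally.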

We also performed image rectification [22] to evaluate the accuracy of our calibration methods. The results show that the homography transformation method is slightly more accurate than the matrix multiplication method. Figures 16 and 17 show indoor rectification results generated by the two methods, and Figures 18 and 19 show the corresponding outdoor results. In these figures, we drew epilines (green horizontal lines) to represent the rectification error graphically; we additionally calculated the absolute differences between corresponding chessboard corner locations of the inner pattern to represent it numerically.

To quantify the rectification error, we selected four stereo image pairs from the outdoor environment and rectified them using the calibration parameters of each method. From each image set, we extracted the inner pattern area, estimated the 35 chess corner locations (as described in [20]), and calculated the absolute differences (in pixels) between corresponding point locations in the wide- and narrow-angle images. We report the average difference of each image set along with their overall average (*Average Err.*). Table 3 lists these results in pixels.
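
A sketch of this rectification error measure, under the assumption (not stated explicitly above) that the difference is taken along the image y-axis, since corresponding points in a perfectly rectified pair lie on the same horizontal epiline. The function names are hypothetical.

```python
import numpy as np

def mean_abs_vertical_diff(left_pts, right_pts) -> float:
    """Mean |y_L - y_R| in pixels over matched corners of one rectified pair."""
    lp = np.asarray(left_pts, dtype=float).reshape(-1, 2)
    rp = np.asarray(right_pts, dtype=float).reshape(-1, 2)
    if lp.shape != rp.shape:
        raise ValueError("corner arrays must match one-to-one")
    return float(np.mean(np.abs(lp[:, 1] - rp[:, 1])))

def average_err(pairs):
    """Per-set averages and their overall mean (an 'Average Err.'-style summary)."""
    per_set = [mean_abs_vertical_diff(lp, rp) for lp, rp in pairs]
    return per_set, float(np.mean(per_set))
```

Each element of `pairs` would hold the 35 inner-pattern corners detected in the rectified wide- and narrow-angle images of one set.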

We performed another experiment to compare the calibration accuracy obtained with the embedded checkerboard pattern and with a general black-white checkerboard pattern. We placed both patterns at the same position, where both cameras could see their full area. We calibrated the images of the black-white pattern with the standard form of Zhang's method, and the images of the embedded pattern with our proposed homography transformation method. We again performed image rectification and calculated 2D pixel positions, which confirmed that combining our embedded pattern with the homography-based method gives better results than the general method with the black-white pattern (Figure 20).

#### 6. Conclusions

In this paper, we proposed two new methods to calibrate a heterogeneous stereo camera setup using a special colored checkerboard pattern. The heterogeneous setup consisted of a left wide-angle fish-eye lens camera and a right narrow-angle conventional camera. Because of the difference in viewing angles, we could not use the conventional black-white checkerboard pattern at a short distance from the cameras. Therefore, we designed a new color checkerboard pattern by combining two checkerboard patterns of different sizes. We embedded the smaller checkerboard pattern into the larger one, letting their colors blend. This color blending results in a special checkerboard pattern consisting of an outer pattern and an inner pattern. We kept this pattern very close to the cameras while capturing calibration image sequences to improve the estimated results, and we used RGB channel splitting to separate the two patterns from each other.

In our first method, we performed stereo calibration between the cameras by calculating the left and right transformation matrices. In our second method, we calculated a planar homography relationship between the two cameras to identify common object point locations in stereo images: we projected the chessboard corner locations of the outer pattern into the viewpoint of the narrow-angle camera using the calculated homography, and then applied Zhang's calibration method to calibrate the stereo camera rig. We evaluated the robustness of the two proposed methods through image rectification and found that the second method was slightly more accurate than the first.

As future work, we plan to parallelize both calibration approaches on a GPU-based Nvidia Jetson TK1 board to reduce computation time and to use it in an embedded smart vehicle system for lane detection. In addition, we plan to enhance accuracy by refining the calibration results with the well-known five-point algorithm and parallelized SIFT-GPU-based correspondence extraction.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was jointly supported by the Civil Military Technology Cooperation Center and the Korea Atomic Energy Research Institute (KAERI), the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (no. NRF-2016M2A2A4A04913462), and the BK21-Plus project (SW Human Resource Development Program for Supporting Smart Life) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (21A20131600005).