Journal of Sensors

Volume 2016 (2016), Article ID 8923587, 8 pages

http://dx.doi.org/10.1155/2016/8923587

## Relative Pose Estimation Algorithm with Gyroscope Sensor

^{1}School of Computer Science and Technology, Beihang University, Beijing 100191, China

^{2}Lenovo Group, Ecosystem & Cloud Services Business Group, Beijing 100085, China

^{3}Lenovo Corporate R&D, SoC Center, Beijing 100085, China

Received 26 July 2016; Accepted 27 October 2016

Academic Editor: Jose R. Martinez-de Dios

Copyright © 2016 Shanshan Wei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper proposes S^{2}fM (Simplified Structure from Motion), a novel vision and inertial fusion algorithm for camera relative pose estimation. Unlike existing algorithms, it estimates the rotation parameter and the translation parameter separately. S^{2}fM uses gyroscopes to estimate the camera rotation, which is then fused with image data to estimate the camera translation. Our contributions are twofold. (1) Given that no inertial sensor can estimate translation accurately enough, we propose a translation estimation algorithm that fuses gyroscope data and image data. (2) S^{2}fM is efficient and suitable for smart devices. Experimental results validate the efficiency of the proposed S^{2}fM algorithm.

#### 1. Introduction

Camera relative pose estimation (CPE) is the estimation of the camera extrinsic parameters, that is, the 3D rotation and 3D translation of the camera. It is one of the key problems in computer vision and is widely applied in 3D scene reconstruction, augmented reality, panorama creation, and digital video stabilization.

The traditional solutions to the CPE problem are visual methods based on image processing. They usually first extract feature correspondences between frame pairs and then model the CPE problem as a set of linear equations under the epipolar geometry constraint, which turns CPE into an optimization problem. Hartley [1] proved the feasibility of using 8 pairs of feature correspondences and proposed the 8-point (8 pt) algorithm to solve the CPE problem for uncalibrated cameras. In search of simpler solutions, researchers later proposed the 7 pt algorithm [2], 6 pt algorithms [3, 4], and 5 pt algorithms [4–6]. These mature visual solutions give accurate estimates but are computationally complex and slow. With the fast-growing use of MEMS sensors in smart devices, inertial-based solutions to the CPE problem have recently been explored. These solutions [7, 8] usually perform CPE with visual and inertial methods separately and then apply a data filter to fuse the two results into a more reliable estimate. The two methods complement each other and improve the robustness of CPE. The disadvantage is that fusing the two results takes additional time and thus reduces efficiency.

The analysis above shows that visual solutions for CPE are mature and accurate but computationally complex, while inertial solutions have been tried without satisfactory results. Rotation can be estimated quickly and accurately by gyroscopes, but no available sensor estimates translation accurately enough for CPE applications. This paper therefore proposes a visual and inertial fusion solution, S^{2}fM (Simplified Structure from Motion). S^{2}fM divides the CPE problem into two parts: rotation estimation and translation estimation. It first uses the gyroscope to estimate the rotation and then fuses the estimated rotation with image data to estimate the camera translation. Our solution relies on both gyroscope data and image data, and there may be a time delay between them, so a calibration algorithm is needed to align the two. The camera focal length is also estimated during calibration, which further simplifies the visual algorithm for translation estimation. Since calibration needs to be done only once per device, the main cost of our solution lies in the visual algorithm stage, which has been simplified to deal with only 3 feature pairs. Our main contributions are twofold. (1) Given that no inertial sensor can estimate translation accurately enough, we propose a translation estimation algorithm that fuses gyroscope data and image data. (2) Our S^{2}fM algorithm is efficient and suitable for smart devices.

The rest of the paper is organized as follows. Section 2 reviews the related works. Section 3 describes the proposed solution. Section 4 presents the experimental results, and Section 5 draws the conclusion.

#### 2. Related Work

Generally, CPE solutions can be classified into two major groups.

The first group consists of the traditional solutions. These model the CPE problem as a linear estimation problem based on image feature correspondences under two-view geometry (the most common choice) or multiview geometry [2]. The feature correspondences determine a fundamental matrix, which can then be decomposed to give the relative camera orientation and translation. The CPE problem is thus transformed into a fundamental matrix estimation and decomposition problem. The fundamental matrix decomposition problem is called the minimal problem in computer vision, and its solutions fall into two categories: one for calibrated cameras and the other for uncalibrated cameras. The essential question in the minimal problem is how many point correspondences a solution needs at least. For uncalibrated cameras, the solutions include the 8 pt (point) algorithm, the 7 pt algorithm, and the 6 pt algorithm. For calibrated cameras, the solution is the 5 pt algorithm, since the relative pose has 5 parameters, that is, 3 for rotation and 2 for translation (up to an unknown scale factor). Hartley proved the validity of the 8 pt algorithm [1] in 1997, assuming that the correspondence problem has already been solved. After the 8 pt algorithm, in order to find simpler algorithms for uncalibrated cameras, researchers added constraints to the formulated equations and proposed 7 pt and 6 pt algorithms. In 2003, Hartley and Zisserman proposed the 7 pt algorithm [2], which adds the constraint that the fundamental matrix and the essential matrix are singular. In 2005, Stewénius et al. proposed a 6 pt algorithm [3], and in 2012 Kukelova et al. proposed a 6 pt algorithm based on polynomial eigenvalues [4]. The 5 pt algorithms for calibrated cameras include Nistér’s 5 pt algorithm [5] in 2004, Li and Hartley’s 5 pt algorithm [6] in 2006, and Kukelova’s polynomial-eigenvalue-based 5 pt algorithm [4] in 2012.

These widely used traditional visual solutions rely on image point correspondences, which may contain errors and noise. The RANSAC method [9] is usually introduced to reduce their effect. Brückner et al. [10] compared the traditional solutions above. Their advantage is that they generate accurate results; their disadvantage is their computational complexity: the more point correspondences an algorithm needs, the slower it runs.
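As an illustration of how the epipolar constraint becomes a linear system, the following is a minimal sketch of a normalized 8 pt estimate of the fundamental matrix using NumPy. It is not taken from the paper: the function name `eight_point`, the convention that each pair satisfies x2^T F x1 = 0, and the input layout (pixel coordinates as N×2 arrays) are assumptions made for this example, and outlier rejection such as RANSAC is omitted.

```python
import numpy as np

def eight_point(x1, x2):
    """Sketch of the normalized 8 pt algorithm (hypothetical helper).

    x1, x2: (N, 2) arrays of matching pixel coordinates, N >= 8.
    Returns F (3x3, rank 2) such that [x2, 1]^T F [x1, 1] is close to 0.
    """
    def normalize(pts):
        # Move the centroid to the origin and scale so the mean
        # distance from the origin is sqrt(2), for better conditioning.
        c = pts.mean(axis=0)
        s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
        T = np.array([[s, 0.0, -s * c[0]],
                      [0.0, s, -s * c[1]],
                      [0.0, 0.0, 1.0]])
        h = np.column_stack([pts, np.ones(len(pts))])
        return h @ T.T, T

    p1, T1 = normalize(np.asarray(x1, dtype=float))
    p2, T2 = normalize(np.asarray(x2, dtype=float))

    # Each correspondence gives one row of A f = 0, where f is F
    # flattened row by row: the coefficient of F[i, j] is p2[i] * p1[j].
    A = np.column_stack([p2[:, [0]] * p1, p2[:, [1]] * p1, p1])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce the singularity (rank 2) constraint.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt

    # Undo the normalization.
    return T2.T @ F @ T1
```

The sketch makes the cost argument visible: every additional required correspondence adds a row to the system, and the rank constraint has to be imposed as a separate step after the linear solve.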

The second group consists of the inertial-based solutions, which were not proposed until MEMS sensors became accurate enough. In 2008, Gaida et al. [7] introduced a multisensor framework that combines gyroscopes, accelerometers, and magnetometers as a unit to estimate the camera pose. A visual method is then used to estimate the camera pose as well, and finally an extended Kalman filter fuses the two results to obtain the final pose. One disadvantage of using accelerometers for translation estimation is that translation measurements from accelerometers are significantly less accurate than orientation measurements [11–13]. This is because gyroscope data need to be integrated only once to obtain the camera’s orientation, whereas accelerometer data need to be integrated twice to obtain the camera’s translation, which introduces much more noise and significantly degrades the accuracy. Miyano et al. [8] proposed a combined inertial and visual solution. It uses an accelerometer and a magnetic sensor to roughly estimate the camera pose and then searches for the accurate pose by matching a captured image against a set of reference images. Corke et al. [14] surveyed inertial and vision fusion solutions. These fusion solutions usually first perform CPE with separate inertial-based and visual-based methods, each generating its own result, and then fuse the results with data filters. The inertial and visual methods thus work in cooperation. The advantage is robustness, because the two methods complement each other. The disadvantage is slow computing speed because of the fusion process.
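The single versus double integration argument can be made concrete with a constant sensor bias. This is an illustrative simplification rather than the noise model of the cited works; the symbols $b_g$ (gyroscope bias), $b_a$ (accelerometer bias), and $T$ (integration time) are introduced here only for the example.

$$
\delta\theta(T) = \int_0^T b_g \, dt = b_g T,
\qquad
\delta p(T) = \int_0^T \!\! \int_0^{\tau} b_a \, ds \, d\tau = \tfrac{1}{2} b_a T^2 .
$$

Under this assumption the orientation error grows linearly with time while the position error grows quadratically, which is consistent with the observation that accelerometer-based translation degrades much faster than gyroscope-based orientation.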

This paper proposes an inertial and visual fusion solution for CPE called S^{2}fM. Unlike existing fusion solutions, which fuse inertial data and visual data in a cooperation manner, our solution fuses them in a division manner: it divides the CPE problem into a rotation part and a translation part. Our solution first estimates the camera rotation with gyroscopes and then uses it as a known parameter in the visual method to estimate the camera translation. Since the reliability and efficiency of gyroscopes for rotation estimation have been established [7, 8, 11–14], the gyroscope estimate can significantly simplify the visual solution for camera translation estimation. As we will derive in the next section, only 3 pairs of point correspondences are needed for translation estimation. Unlike Hartley and Nistér, who made great efforts to solve the established equation sets, we focus on an inertial and visual fusion solution that solves the CPE problem efficiently, given that no inertial sensor can estimate translation accurately enough.
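To show why a known rotation simplifies the visual step, here is a minimal sketch of one standard formulation; it is not necessarily the derivation given in Section 3. It assumes the convention X2 = R X1 + t between the two camera frames and image points already normalized by the focal length, so that each pair satisfies t · ((R x1) × x2) = 0. The function name `translation_direction` is hypothetical.

```python
import numpy as np

def translation_direction(R, x1, x2):
    """Sketch: translation direction from a known rotation (hypothetical helper).

    R:  (3, 3) rotation from camera 1 to camera 2, assumed known
        (for example, from integrated gyroscope data).
    x1, x2: (N, 3) normalized homogeneous image points, one row per pair.
    Returns a unit vector parallel to t; sign and scale are not recoverable
    from this constraint alone.
    """
    # Each pair gives one linear constraint n_i . t = 0
    # with n_i = (R x1_i) x x2_i.
    n = np.cross(x1 @ R.T, x2)
    # t spans the null space of the stacked constraints.
    _, _, Vt = np.linalg.svd(n)
    t = Vt[-1]
    return t / np.linalg.norm(t)
```

Each correspondence contributes a single linear constraint on t, so the unknowns left for the visual step are far fewer than in the 8 pt or 5 pt formulations; the sketch would typically be called with the three pairs that the text cites.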

#### 3. Proposed Solution

This section describes our proposed solution, which assumes the pinhole camera model and consists of three steps: camera and gyroscope calibration, estimation of camera rotation, and estimation of camera translation.

##### 3.1. Camera and Gyroscope Calibration

Our solution first calibrates the camera and gyroscope. The calibration covers the following:

(1) Gyroscope noise processing

(2) Camera focal length calibration (in pixel units)

(3) The delay between the gyroscope and frame sample timestamps

###### 3.1.1. Gyroscope Noise Processing

Raw MEMS gyroscope data need to be processed to remove zero-drift and random noise. We use a general statistical method to remove the zero-drift: the device is kept static for a period of time to collect zero-drift statistics, and the estimated drift is subtracted from the source data, leaving stable, zero-mean, normally distributed random noise. This random noise is then modeled with a time-series method and suppressed by a Kalman filter to give usable gyroscope data.
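The two stages above can be sketched as follows for a single gyroscope axis. This is a simplified illustration, not the paper's implementation: the zero-drift is taken as the mean of a static recording, and the Kalman filter uses a scalar random-walk state model in place of the fitted time-series model. The function names and the noise parameters `q` and `r` are assumptions chosen for the example.

```python
import numpy as np

def estimate_zero_drift(static_samples):
    """Mean reading while the device is at rest (hypothetical helper)."""
    return np.mean(static_samples, axis=0)

def kalman_filter_1d(z, q=1e-7, r=1e-4):
    """Scalar Kalman filter with a random-walk state model.

    z: 1-D array of drift-corrected angular-rate samples.
    q: assumed process noise variance; r: assumed measurement noise variance.
    """
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    x, p = z[0], 1.0              # initial state and its variance
    for k, zk in enumerate(z):
        p = p + q                 # predict: the state is modeled as a random walk
        g = p / (p + r)           # Kalman gain
        x = x + g * (zk - x)      # update with the new measurement
        p = (1.0 - g) * p
        out[k] = x
    return out
```

In use, `estimate_zero_drift` would be run once on a static recording, its result subtracted from every later sample, and the corrected stream passed through `kalman_filter_1d`. A smaller `q` smooths more strongly but makes the output lag behind fast rotations, so both parameters would need tuning for a real sensor.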

###### 3.1.2. Calibrating Algorithm

After noise processing, the gyroscope data can be used for calibration. The purpose of our calibration algorithm is to determine two parameters: the delay between the gyroscope and frame sample timestamps, and the camera focal length. We use a calibration algorithm similar to that of Miyano et al. [8] under the camera rotation model (as Figure 1 shows) but with an optimized objective function.