EURASIP Journal on Image and Video Processing 
Volume 2008 (2008), Article ID 256896, 16 pages
doi:10.1155/2008/256896
Research Article

Iterative Object Localization Algorithm Using Visual Images with a Reference Coordinate

Kyoung-Su Park,1 Jinseok Lee,1 Milutin Stanaćević,1 Sangjin Hong,1 and We-Duke Cho2

1Mobile Systems Design Laboratory, Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794, USA
2Department of Electronics Engineering, College of Information Technology, Ajou University, Suwon 443-749, South Korea

Received 29 July 2007; Revised 11 March 2008; Accepted 4 July 2008

Recommended by Carlo Regazzoni

Abstract

We present a simplified algorithm for localizing an object using multiple visual images that are obtained from widely used digital imaging devices. We use a parallel projection model which supports both zooming and panning of the imaging devices. Our proposed algorithm is based on a virtual viewable plane for creating a relationship between an object position and a reference coordinate. The reference point is a rough estimate which may be obtained from a preestimation process. The algorithm minimizes localization error through an iterative process with relatively low computational complexity. In addition, the nonlinear distortion of the digital imaging devices is compensated during the iterative process. Finally, the performance in several scenarios is evaluated and analyzed in both indoor and outdoor environments.

1. Introduction

Object localization is one of the key operations in many applications such as surveillance, monitoring, and tracking [1–8]. In these tracking systems, the accuracy of object localization is critical and poses a considerable challenge. Most localization methods use the geometric relationship between the object and the sensors. Acoustic sensors have been widely used in many localization applications due to their flexibility, low cost, and easy deployment. An acoustic sensor provides directional information, namely the angle of the source with respect to the sensor coordinates, which is used to create a geometry for localization. However, an acoustic sensor is extremely sensitive to its surrounding environment, produces noisy data, and does not fully satisfy the requirement of consistent data [9]. Thus, as a more reliable alternative, visual sensors are often used in tracking and monitoring systems as well [10, 11]. Visual localization has the potential to yield a noninvasive, accurate, and low-cost solution [12–14].

Multiple-image-based multiple-object detection and tracking are used in indoor and outdoor surveillance, and give a detailed and complete history of the actions of an object of interest [2, 15, 16]. Object tracking can be simplified to a 2D tracking problem on the ground plane [2, 17–19]. The establishment of correspondences in multiple images can be achieved by using field-of-view lines [2, 20]. In addition, selecting the best view of an object of interest requires camera movements such as zooming and panning [19].

There are many localization methods which use image sensors [5, 6, 13, 21–25]. Most conventional localization methods follow two steps of operation. First, the camera parameters are computed offline using known objects or pattern images. Then, using additional information such as control points in the scene or techniques such as structure from motion, the relative displacements of a camera are estimated [21, 26]. Basically, these studies can sufficiently localize objects from 3D reconstruction. Once a sufficient number of points is observed in multiple images from different positions, it is mathematically possible to deduce the locations of the points as well as the positions of the original cameras, up to a known factor of scale [21]. In localization methods based on a perspective projection model, the camera calibration is critical. The calibration usually uses a flat plate with a regular pattern [14, 27, 28]. However, in many applications, it is not easy to obtain calibration patterns [29, 30]. To reduce the dependence on calibration patterns, some methods based on self-calibration use point matching from image sequences [29–34]. In these methods, the image feature extraction must be very accurate since this procedure is very sensitive to noise [21, 27, 35]. Moreover, if a pair of stereo images of a single scene is not calibrated and the motion between the two images is unknown, the image matching requires prohibitively high complexity [27, 34–36].

The localization method based on affine reconstruction can be used for object localization without the concern of complex calibration [37–40]. Basically, the relationships between physical space and the geometric properties of a set of cameras are considered. The method uses two uncalibrated perspective images where an image is induced by a plane to infinity [37–39, 41–44]. In particular, the factorization method based on the paraperspective projection model can be used for localization [42, 44, 45]. In [42], three well-known approximations of full-perspective projection, namely orthography, weak perspective, and paraperspective, are treated within the affine projection model. In [44, 45], shape and motion recovery is used to reduce the complexity of depth computation. However, the localization method based on the affine structure requires at least five correspondences in two images [37–39]. In contrast, our proposed method requires only one correspondence (i.e., the centroid coordinate of the detected object) in two images, where each correspondence represents the same object. The critical requirements of an effective localization algorithm in tracking applications are computational simplicity, with a simpler model in which 3D reconstruction is not necessary, and robust adaptation to camera movement during tracking (i.e., zooming and panning) without requiring any additional imaging device calibration from the images. The contribution of this paper is a simplified and efficient localization method that requires neither 3D reconstruction nor complex calibration.

In this paper, we propose a simplified algorithm for localizing multiple objects in a multiple-camera environment, where images are obtained from traditional digital imaging devices. Figure 1 illustrates the application model where multiple people are localized in a multiple-camera environment. The cameras can move freely with zooming and panning capabilities. Within a tracking environment, the proposed method uses detected object points to find the object location. We use the 2D global coordinate to represent the object location. In our localization algorithm, the distance between an object and a camera is provided by a reference point. Since the reference point is initially a rough estimate, we are motivated to obtain a more accurate reference point. Here, we use an iterative process which takes the previously localized position as a new reference point that is closer to the real object location. In addition, the proposed localization method has the advantage of using a zooming factor without being concerned about a focal length. Thus, the computation of an object's position is simplified while supporting both zooming and panning. Furthermore, the localization algorithm compensates for nonideal properties such as the optical characteristics of a camera lens.

Figure 1: Illustration of the model of application.

The rest of this paper is organized as follows. Section 2 briefly describes a parallel projection model with a single camera. Section 3 illustrates the visual localization algorithm in a 2D coordinate with multiple cameras. In Section 4, we present analysis and simulation results where the localization errors are minimized by compensating for nonlinearity of the digital imaging devices. An application that uses the proposed algorithm for tracking people within a closed environment is illustrated. Section 5 concludes the paper.

2. Characterization of Viewable Images

2.1. Basic Concept of a Parallel Projection Model

In this section, we introduce a parallel projection model to simplify the visual localization, which is basically comprised of three planes: an object plane, a virtual viewable plane and an actual camera plane. In Figure 2, an object placed on an object plane is projected to both a virtual viewable plane and an actual camera plane, and denotes the projected object point on the virtual viewable plane. The distance denotes the distance between a virtual viewable plane and an object plane. and denote the position of projected object on both the actual camera plane and the virtual viewable plane. The virtual viewable plane is in parallel with the object plane by distance . and denote each length of the virtual viewable plane and the actual camera plane, respectively. The virtual viewable plane is for the connection between the position on the object plane and the position on the actual camera plane; it has an advantage of simplifying the computation process.

Figure 2: Illustration of the parallel projection model.

Since the size of image sensor is much smaller than the virtual viewable plane, the viewable range starts from a point . Thus the camera model of parallel projection model is similar to a pin-hole camera. All planes are represented as - and -axes but we use -axis for the explanation of the parallel projection model in this section. Since represents the origin of both the virtual viewable plane and the camera plane, two planes are placed on the same camera position. However, in Figure 2, we drew two planes separately to show the relationship between three planes.

In the parallel projection model, an object is projected from an object plane through a virtual viewable plane to an actual camera plane. Hence, as formulated in (1), is expressed as , , and through the proportional lines of two planes as the following:(1)

Thus the object is represented from and the distance between the virtual viewable plane and the object plane.

2.2. Zooming and Panning

Since the size of the virtual viewable plane and the object plane are proportional to the distance between the object and the camera (), the length of the virtual viewable plane () is derived from the distance and the viewable range.

Zooming factor represents the relationship between and . The zooming factor is defined as a ratio of and as follows:(2)

Since both and use metric units, zooming factor is a constant.

Figure 3 illustrates the model of zooming in terms of two different zooming factors. Even though the zooming factor of a camera has changed from to , if the distance between object and camera is not changed, the position of projected object on the virtual viewable plane is not changed. In the figure, since the distance is equal to the distance , the position of the object on the virtual viewable plane is invariant but the position on the actual camera plane is variant. Thus the distance is equal to but the distance is different from the distance . The projected positions and on the actual camera planes 1 and 2 are expressed as and . Since and , the relationship between and is represented as .

Figure 3: Illustration of the model of zooming in terms of two different zooming factors.
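The zooming relationship can be sketched numerically. The snippet below is a minimal illustration, not the paper's formulation: it assumes that the zooming factor is the ratio of the object distance to the length of the virtual viewable plane, and that positions scale linearly between the actual camera plane and the virtual viewable plane. All function and parameter names are hypothetical.

```python
def virtual_plane_length(distance, zoom):
    """Length of the virtual viewable plane.

    Assumption: zoom = distance / plane_length, so plane_length = distance / zoom.
    """
    return distance / zoom


def camera_to_virtual(u_camera, sensor_length, distance, zoom):
    """Scale a position on the actual camera plane to the virtual viewable plane."""
    return u_camera * virtual_plane_length(distance, zoom) / sensor_length


def virtual_to_camera(u_virtual, sensor_length, distance, zoom):
    """Scale a position on the virtual viewable plane to the actual camera plane."""
    return u_virtual * sensor_length / virtual_plane_length(distance, zoom)


# Hypothetical values: the same object at the same distance, seen with two zoom settings.
distance = 3.0        # object-to-camera distance in metres
sensor_length = 0.02  # length of the actual camera plane in metres
u_virtual = 0.5       # object position on the virtual viewable plane in metres

u_cam_zoom1 = virtual_to_camera(u_virtual, sensor_length, distance, zoom=1.0)
u_cam_zoom2 = virtual_to_camera(u_virtual, sensor_length, distance, zoom=2.0)

# The camera-plane position changes with zoom, but mapping it back with the
# matching zoom factor recovers the same virtual-plane position.
recovered = camera_to_virtual(u_cam_zoom2, sensor_length, distance, zoom=2.0)
```

Under these assumptions the ratio of the two camera-plane positions equals the ratio of the two zooming factors, which matches the qualitative statement that only the position on the actual camera plane varies with zoom.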

Figure 4 illustrates a special case in which two different objects denoted and are projected to the same spot on the actual camera plane. and denote the projected objects on the virtual viewable planes 1 and 2.

Figure 4: Illustration of a special case in which different objects are projected to the same spot on the actual camera plane.

The objects and are projected to a point on the actual camera plane while two objects are separated as two different points on the virtual viewable plane 1 and 2. Since the zooming factor is equal to and , the relationship between the distance and is expressed as . The distance is equal to the distance , and the distance is different from the distance . It is shown that the distance in projection direction between an object and a camera is an important parameter for the object localization.
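The depth ambiguity described above can be illustrated with a short sketch. It assumes, as a simplification, a constant zooming factor defined as distance divided by the length of the virtual viewable plane, and a camera-plane position expressed as a fraction of the sensor length. The names and numbers are hypothetical.

```python
def lateral_offset(camera_fraction, distance, zoom):
    """Offset on the virtual viewable plane for a normalized camera-plane position.

    Assumption: zoom = distance / plane_length, so plane_length = distance / zoom.
    """
    return camera_fraction * distance / zoom


# Two hypothetical objects that appear at the same spot on the actual camera plane.
camera_fraction = 0.25
zoom = 1.5
offset_near = lateral_offset(camera_fraction, distance=2.0, zoom=zoom)
offset_far = lateral_offset(camera_fraction, distance=4.0, zoom=zoom)

# The two offsets differ by the ratio of the distances, so a single image
# cannot separate the two objects without a distance estimate.
ratio = offset_far / offset_near
```

This is why the reference point, which supplies the distance, plays a central role in the localization that follows.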

Now, we consider a panning factor denoted as that represents camera rotation. The panning angle is defined as the angle difference between -axis and -axis where -axis represents the normal direction of the virtual viewable plane. Thus the panning angle can exist in the range of . The sign of is determined: the left rotation is positive and the right rotation is negative.

To get the global coordinate of the object, -axis and -axis in camera coordinate are translated to -axis and -axis in global coordinate. We define camera angle factor () to represent the absolute camera angle in global coordinate. The camera angle is useful to translate the object coordinate from camera images.

Figure 5 illustrates the relationship between the camera angle and the panning angle in global coordinate. The global coordinate is represented as -axis and -axis. For example, in the position of Camera , panning angle is the angle between - and -axes; while in Camera , the panning angle is the angle between -axis and -axis. Thus four cases of camera deployment such as Camera , , , have different relationships between and . Thus the projected object on the virtual viewable plane is derived from and . denotes the origin on the virtual viewable plane in global coordinate.

Figure 5: Illustration of individual panning factors with respect to a global coordinate.

2.3. The Relationship between Camera Positions and Pan Factors

Figure 6 illustrates the panning factor selection in a pair of cameras depending on an object position. Among deployment of four possible cameras, such as cameras , , , and , a pair of cameras located in adjacent axes is chosen.

Figure 6: Illustration of panning factor selection in a pair of cameras depending on an object position.

In this paper, we choose cameras and for the deployment of two cameras for the sake of the localization formulation. The camera angles in Camera and are expressed as and in terms of the panning angle .

3. Visual Localization Algorithm in a 2-Dimensional Coordinate

3.1. The Concept of Visual Localization

Turning to the object localization with an estimate, consider a single-camera-based localization. In the single-camera localization, we use the estimate plane as an object plane. Figure 7 illustrates the object localization using the estimate based on a single camera, where denotes the estimate which is used for a reference point. Note that the estimate used as a reference point may initially be at any position, and it then moves closer to the real position. The estimate and the object are projected to two planes: the virtual viewable plane and the actual camera plane. Here, the reference point generates the object plane. The distance denotes the distance between the estimate and the virtual viewable plane. In view of the projected positions, the length is obtained by the length . Hence the object is determined from the estimate .

Figure 7: Illustration of the visual localization in a single camera.

Once we use the estimate plane as an object plane, the estimated object position is different from the real-object position . In other words, since any points on the ray between the object and origin are projected to the same spot on the actual camera plane, the real object is distorted to the point . Thus, the localization has an error from the distance difference of the distances and . Through the single-image sensor-based visual projection method, it is shown that an approximated localization is accomplished with a reference point.

We are now motivated to use multiple image sensors in order to reduce the error between and . In the case of single camera, the distance difference between the distances and cannot be found by a single-camera view. However, if an additional camera is available for localizing the object within different angles, the distance difference can be compensated by the relationship between two camera views.

Figure 8 illustrates the localization using two cameras for a simple case where both panning factors are zero, and the directions of - and -axes are aligned to - and -axes. Given a reference point , the virtual viewable planes for the two cameras are determined. and are the object coordinates obtained from each single camera. In the view of camera 1, the length between the projected points and gives the distance between the object plane of camera 2 and the point . Similarly, in the view of camera 2, the length between the projected points and gives the distance between the object plane of camera 1 and the point . Therefore, the basic compensation algorithm is that camera 1 compensates -direction by the length , and camera 2 compensates -direction by the length , given a reference point .

Figure 8: Illustration of the localization in multiple cameras.

Through one additional image sensor, both in -direction and in -direction make a reference point closer to a real-object position. Hence is computed by and . Note that is the localized object position through the two cameras, which still results in an error with the real-object position . The error can be reduced by obtaining a reference point closer to a real position . In Section 3.5, an iterative approach is introduced for improving localization. In the next section, we formulate the multiple image sensor-based localization.

3.2. 2D Localization

3.2.1. 2D Localization Model

In this section, we introduce a simplified localization model. If the estimate and the object have the same -coordinate and -axis is aligned with -axis, all points are placed on a plane. Thus the localization is simplified in 2D coordinate. The 2D localization is simple and has an advantage for mapping the test environment. Moreover, once the object is represented as in global coordinate, the 2D localization gives a feasible solution.

To derive 2D localization equations, we use vector notation which has a benefit to express the relationship between the estimate and the object where “” denotes a unit vector and “” represents a vector. For example, one vector is represented as , where , , and denote unit vectors toward -, -, and -axes and A, B, and C are the magnitude of -, -, and -axes, respectively. Figure 9 shows the basic model of object localization. The vectors , , and denote the vector from the estimate to the object , the vector from the projected estimate to the projected object on the virtual viewable plane 1, and the vector from the projected estimate to the projected object on the virtual viewable plane 2, respectively. The lengths and are the projections of the vector on the virtual viewable planes 1 and 2.

Figure 9: Illustration of basic localization algorithm.

Figure 10 shows the projected image on the virtual viewable planes 1 and 2 where the projected points and are expressed as (, ) and (, ) on the virtual viewable planes 1 and 2. and denote the -coordinates of the projected objects in global coordinate and are equal to and . Since the estimate has the same height as the object, the projected estimate and object have the same -coordinate on the virtual viewable planes 1 and 2. Thus in the figure, is different from while is equal to . Since an estimate is a reference point, the actual estimates in the figure are not displayed on the actual camera plane. Since the projected vectors and are the projection of vector toward -axis and -axis, the lengths and are equal to and .

Figure 10: Illustration of the projected images on the virtual viewable planes 1 and 2.

3.2.2. Object Localization Based on a Single Camera

The projected object in -axis is transformed into in global coordinate. The origin is the center of virtual viewable plane. The camera deployment is expressed as the origin and camera angle .

Figure 11 shows the estimation with a reference point, and a projected object. denotes the vector from the origin to the estimate . The object , estimate , projected objects , and projected estimates are denoted as , , , and in global coordinate. The vector is expressed in two ways which have different points of view: on the virtual viewable plane and in global coordinate.

Figure 11: The estimation of a projected object.

The unit vector is represented in global coordinate as . The vector is expressed as . Since the length is equal to the projection of vector toward -axis (), the length is represented as:(3)

Once we assume the estimate is close to the object, the length is represented as(4)where the length is the length of the projected estimate and object on the actual camera plane.

In Figure 11, since the vector is equal to , the length of vector is represented as follows:(5)

Since the length is the projection of the vector toward -axis (), the global coordinate is related with as follows:(6)

Note that since there are two unknown values of , two equations are necessary.

3.2.3. Object Localization Based on Multiple Cameras

As shown in Figure 9, once there are two available cameras which show an object at the same time, two cameras have the following relationship:(7)

The projected vector sizes of the vectors and are derived from and in (5). The lengths and are represented as and in (4). The length between and in an actual camera plane () and the length between and in an actual camera plane () are obtained from displayed images.

Therefore, the object position is represented as follows:(8)
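The two-camera relationship can be read as a pair of linear constraints on the object position. The sketch below assumes that the axis of each virtual viewable plane has direction (cos θ, sin θ) in the global coordinate for a camera angle θ, and that each camera contributes the constraint (x − x_e) cos θ + (y − y_e) sin θ = l, where (x_e, y_e) is the reference point and l is the projected length on that camera's virtual viewable plane. The form of the constraint and all names are assumptions made for illustration; they are not a transcription of (6)–(8).

```python
import math


def localize_two_cameras(reference, theta1, length1, theta2, length2):
    """Solve two projection constraints for the object position.

    reference: (x_e, y_e), the reference point in global coordinates.
    theta1, theta2: camera angles in radians (assumed axis direction (cos, sin)).
    length1, length2: signed projected lengths on each virtual viewable plane.
    """
    x_e, y_e = reference
    a11, a12 = math.cos(theta1), math.sin(theta1)
    a21, a22 = math.cos(theta2), math.sin(theta2)

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Parallel plane axes give no independent second constraint.
        raise ValueError("camera axes are parallel; position is not determined")

    # Cramer's rule for the offset (dx, dy) from the reference point.
    dx = (length1 * a22 - length2 * a12) / det
    dy = (a11 * length2 - a21 * length1) / det
    return x_e + dx, y_e + dy


# Hypothetical example: two cameras whose plane axes are perpendicular.
x, y = localize_two_cameras((1.0, 1.0), 0.0, 0.3, math.pi / 2, -0.2)
```

With perpendicular axes each camera corrects one coordinate independently, which corresponds to the simple case of Figure 8. The explicit determinant check reflects the need for two independent equations for the two unknowns.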

3.3. Effect of Zooming and Lens Distortion

Zooming effects and lens distortion cause scale distortion. In practice, since every general camera lens has a nonlinear viewable range, the zooming factor is not a constant. Moreover, since a reference point is a rough estimate, the distance could be different from the distance . However, in (4), the distance , instead of the distance , is used to get the length .

Figure 12 illustrates the actual (nonideal) zooming model caused by lens distortion where the dashed line and the solid line indicate ideal viewable angle and actual viewable angle, respectively.

Figure 12: Illustration of actual zooming model caused by lens distortion.

For reference, Figure 13 illustrates zooming distortion as a function of distance from the camera for various actual zooming factors, measured with a Canon Digital Rebel XT and a Tamron SP AF 17–50 mm zoom lens [46, 47]. The dashed line is the ideal zooming factor and the solid line is the actual (nonideal) zooming factor. As the distance increases, the nonlinearity of the zooming factor decreases.

Figure 13: Illustration of zooming distortion on a function of distance from the camera and various actual zooming factors used.

To reduce the localization error, we update the length . The lengths and are equal to and , respectively. Due to the definition of zooming factor, and are expressed as and . Since the objects and are projected at the same point on the actual camera plane in Figure 12, and have the same length on the actual camera plane. Thus the actual length is represented as follows: (9)

The distances and are derived from(10)where , , , and , are equal to , , , and , respectively.

Finally, the compensated object position is determined as follows:(11)where the lengths and are equal to and , respectively.

3.4. Effect of Lens Shape

The virtual viewable plane is flat, whereas a real camera displays a curved space. Thus, unit distances per pixel in - and -axes are nonlinear on the actual camera plane. Figure 14 shows the error caused by lens shape, where the distances and denote two different distances between the estimates and the camera.

Figure 14: Illustration of the error caused by lens shape.

Figure 15 illustrates the distribution of unit distance of - and -axes on the actual camera plane. The distance between the camera and the calibration sheet is 35 inches, and a unit distance is 1 inch.

Figure 15: Illustration of unit distance distribution due to camera nonlinearity on the actual camera plane.

Translating the distance between the estimate and the object requires compensating for the nonlinearity through camera calibration. In Figure 15(a), the unit distance for -axis is invariant in -axis and in Figure 15(b), the unit distance for -axis is also invariant in -axis. Hence in Figure 10, the height differences of two different cameras have little effect on the overall localization error.

3.5. Iterative Localization for Error Minimization

Once the virtual viewable plane is defined by the estimate, the localized result has the error caused by the distance difference between the estimate and the real object . Thus the distance between the object and the estimate is important for reducing the localization error.

The basic concept of the iterative approach is to use the previously localized position as a new reference point for the localization of object . Since the new reference point is closer to the real position , the localized position also moves closer to the real position .

Figure 16(a) illustrates the basic localization based on two cameras where represents the real object. If the distance is equal to the distance , the obtained object coordinate uses the coordinate of and to translate the global coordinate of the object. Thus the object point is closer to the real object point .

Figure 16: Illustration of iterative localization.

Figure 16(b) shows the iterative localization. Each iteration gives an object coordinate closer to the real position with relatively low computational complexity. Thus the iterative approach can reduce the localization error. Furthermore, through the iteration process, the localization becomes less sensitive to the nonlinear properties.
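The iterative idea can be tried on a toy simulation. The sketch below is not the authors' implementation: it assumes an idealized, distortion-free camera whose normalized camera-plane position is zoom × lateral offset / depth, a constant zooming factor, and a plane axis of direction (cos θ, sin θ) for camera angle θ. The camera placement, object position, and initial reference are hypothetical values.

```python
import math


def camera_axes(theta):
    """Return (plane axis, viewing direction) unit vectors for camera angle theta."""
    axis = (math.cos(theta), math.sin(theta))
    normal = (-math.sin(theta), math.cos(theta))
    return axis, normal


def lateral_and_depth(point, origin, theta):
    """Split the camera-to-point vector into in-plane offset and viewing depth."""
    axis, normal = camera_axes(theta)
    rx, ry = point[0] - origin[0], point[1] - origin[1]
    return rx * axis[0] + ry * axis[1], rx * normal[0] + ry * normal[1]


def observe(point, origin, theta, zoom):
    """Normalized camera-plane position under the assumed ideal camera model."""
    lateral, depth = lateral_and_depth(point, origin, theta)
    return zoom * lateral / depth


def localize_once(reference, cameras, observations):
    """One localization step; the reference point supplies the depth of each camera."""
    axes, lengths = [], []
    for (origin, theta, zoom), s_obj in zip(cameras, observations):
        _, depth_ref = lateral_and_depth(reference, origin, theta)
        s_ref = observe(reference, origin, theta, zoom)
        # Scale the camera-plane difference to a length on the virtual viewable plane.
        lengths.append((s_obj - s_ref) * depth_ref / zoom)
        axes.append(camera_axes(theta)[0])
    (a1x, a1y), (a2x, a2y) = axes
    l1, l2 = lengths
    det = a1x * a2y - a1y * a2x
    dx = (l1 * a2y - l2 * a1y) / det
    dy = (a1x * l2 - a2x * l1) / det
    return (reference[0] + dx, reference[1] + dy)


def iterate(reference, cameras, observations, steps):
    """Reuse each localized position as the next reference point."""
    estimates = [reference]
    for _ in range(steps):
        estimates.append(localize_once(estimates[-1], cameras, observations))
    return estimates


# Hypothetical setup: camera 1 looks along +y, camera 2 looks along +x.
cameras = [((2.0, 0.0), 0.0, 1.0), ((0.0, 2.0), -math.pi / 2, 1.0)]
true_object = (2.5, 2.2)
observations = [observe(true_object, o, t, z) for o, t, z in cameras]

estimates = iterate((1.5, 1.5), cameras, observations, steps=8)
errors = [math.hypot(x - true_object[0], y - true_object[1]) for x, y in estimates]
```

In this toy setting the error shrinks at every iteration because the depth taken from the reference point approaches the true depth. Real images add lens nonlinearity and detection noise, so the behavior reported in Section 4 should not be inferred from this sketch.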

3.6. Effect of Tilting Angle

In a surveillance system, a camera can have a tilting angle to increase the viewable area. The tilting angle represents the angle difference between -axis and -axis on the virtual viewable plane. The tilting angle has the range as .

Figure 17 illustrates an example of the tilting angle where one plane is placed on -axis and the other has tilting angle. The tilting angle is equal to the angle difference between virtual viewable plane and virtual viewable . Since -axis is invariant for the variation of tilting angle, -axis on the virtual viewable plane is the same as -axis on the virtual viewable .

Figure 17: Illustration of an example of the tilting angle.

The tilting angle causes distortion in - and -axes as shown in Figure 18. and denote the projected object positions of the same object under different tilting angles. The tilting angle does not affect the variation in -axis. However, the tilting angle changes the distance between the object and the camera. Once that distance changes, the zooming factor also changes. Therefore, the tilting angle distorts the object position in -axis.

Figure 18: Illustration of the distortion by the tilting angle ().

In Figure 18, the distance is different from the distance even if the position of camera and object is not changed. Since and on the actual camera plane are translated to and using the zooming factor and the distance between the object and camera, the tilting angle is the reason for the localization error.

Figure 19 illustrates the effect of tilting angle in terms of the distance between the object and the virtual viewable plane. The heights and denote the object height and the camera height. If the camera has tilting angle, the distance is changed by the distance .

Figure 19: Illustration of the effect of tilting angle.

In order to compensate the localization error from tilting angle, we update the distance to and then change the zooming factor for the distance . Thus the length in (9) is updated as follows:(12)where denotes the zooming factor when the distance between the object and the virtual viewable plane is .

The distance is derived as follows:(13)where the distance is computed as(14)
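As a rough geometric illustration of why tilting changes the effective distance, the sketch below computes the depth of an object along a tilted viewing direction. It assumes a camera tilted downward by the tilting angle and an object below the camera height, and it may not match the exact form of (13) and (14). All names and values are hypothetical.

```python
import math


def depth_with_tilt(horizontal_distance, camera_height, object_height, tilt_deg):
    """Depth of the object along the tilted viewing direction.

    Assumptions: the camera tilts downward by tilt_deg, and the depth is the
    projection of the camera-to-object vector onto the tilted viewing direction.
    """
    tilt = math.radians(tilt_deg)
    height_difference = camera_height - object_height
    return (horizontal_distance * math.cos(tilt)
            + height_difference * math.sin(tilt))


# Hypothetical values: 2 m horizontal distance, camera at 1.8 m, object at 1.6 m.
depth_level = depth_with_tilt(2.0, 1.8, 1.6, tilt_deg=0.0)
depth_tilted = depth_with_tilt(2.0, 1.8, 1.6, tilt_deg=12.4)
```

With zero tilt the height difference drops out of this expression, which agrees with the observation that the height difference does not affect the result when the tilting angle is zero.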

To quantify the localization error caused by tilting angle, we tested the localization error in the simple case. Figure 23 shows the setup of experiment where two cameras are placed on the left side for camera 1 and the bottom side for camera 2 in Cartesian coordinate. For simplicity, the panning factors and are both zero. We denote the object is placed on (1.8 m, 1.8 m) and (1.5 m, 1.5 m).

Figure 20(a) illustrates the localization error in terms of tilting angle variation. If the tilting angle is zero, the height difference between the camera and the object () does not affect the localization result, whereas a larger tilting angle produces a larger localization error. Thus the tilting angle is a source of localization error. For example, if the object is 0.2 m lower than the camera, the localization error ranges from 0.003 to 0.025 m.

Figure 20: Illustration of the localization error in terms of tilting angle variation.

Once the object height is provided, the localization error is compensated by (12). In Figure 20(b), we compensated the localization error by setting the camera height to 1.8 m and the object height to 1.6 m. After compensation, the overall error caused by the tilting angle ranges from 0.003 to 0.011 m. Thus, if the camera height and the object height are known, the error is compensated. Moreover, even when the height difference between the object and the camera is not known exactly, the localization error at high tilting angles is clearly improved. Therefore, if the height of the object can be anticipated, the localization error can be successfully compensated.

When the height difference between the object and the camera is unknown, compensating for the localization error caused by the tilting angle is difficult. However, if the distance is much longer than the distance , the tilting angle has little effect on the localization error. Figure 21 illustrates the localization error in terms of the distance where the tilting angle is 12.4 degrees. When the distance increases, the localization error increases, but after is 2.7 m, the error saturates. In the worst case, the error rate is 0.01 m of error per 0.2 m of height difference. For example, if the camera height difference is 6 m, the expected error is about 0.3 m. Moreover, when the camera is 0.2 m taller than the object, the error ranges from 0.023 to 0.04 m. If we assume the object is placed 0.2 m lower than the camera, the compensation reduces the error to the range of 0.006 to 0.024 m.

Figure 21: Illustration of the localization error in terms of the distance ().

4. Analysis and Simulation

4.1. Simulation Setup: Basic Illustration

The objective of this simulation is to validate the proposed localization algorithm by measuring the localization error in a real case. To show the compensation for camera nonlinearity, we chose a small space which is close to the camera. In the case of Figure 13, the distortion from camera nonlinearity exists within a distance of 2.0 m. Thus in this simulation, we use area.

Our target application is a surveillance system where most target objects are humans or vehicles. However, in this simulation, we use a small ball as the target object to simplify the target detection. Detection itself introduces several sources of localization error. For example, the centroid detection of a human is important for reducing localization error since a human is represented as a point. If the two camera images use different centroid positions, the localization result contains a centroid error. Thus in this setup, we use a small ball. Moreover, after taking pictures, we manually locate the center of the ball. We analyze the localization error in 2D global coordinate. The object is represented as .

Figure 22 shows the displayed images in two cameras where the lengths and are distances between a reference point and a real-object point in camera 1 and camera 2, respectively. To explain the test setup, we showed the reference point in Figures 22(a) and 22(b), but actually the reference point is a virtual point.

Figure 22: Illustration of two images of camera 1 and camera 2.
Figure 23: Illustration of experimental setup for localizing an actual object.

Figure 23 shows the experimental setup for measuring an actual object. In this experiment, the actual position of the object is calculated from the reference based on the parallel projection model. In Figure 23, camera 1 is placed on the left side and camera 2 on the bottom side of the Cartesian coordinate system. Both camera panning factors and are at zero.

The actual zooming factors are and , where is the zooming factor when the distance between the object plane and the virtual viewable plane is , and is the zooming factor when the distance between the object plane and the virtual viewable plane is . Next, we analyze the localization result and compare the localization error as a function of the iterative process, which we call compensation.

4.2. Localization Error and Object Tracking Performance

Figure 24 shows the error distribution of the algorithm where two cameras are positioned at and . The actual object is located at . The figure illustrates the amount of localization error as a function of the reference coordinate. Since each camera has a limited viewable angle, reference coordinates located outside the viewable angle cannot be considered. Note that the error is minimized when the reference point is close to the actual object point. The localization error can be further reduced with multiple iterations.

Figure 24: Illustration of error comparison based on the number of iterations.

The proposed localization algorithm is also applied to a tracking example. In this example, an object moves within a area, and the images are obtained from real cameras. We first apply the proposed noniterative localization algorithm with compensation to the tracking problem. Each time the object changes coordinates, a corresponding estimate is generated. Figure 25(a) illustrates the resulting trajectory. After the compensation, the tracking performance improves. Figures 25(b) and 25(c) illustrate the tracking performance along the x-axis and the y-axis. These figures clearly show that the compensation improves the tracking performance, although some localization error remains.

Figure 25: Application of the noniterative localization in tracking a trajectory with rough estimates.

Similarly, the proposed iterative localization algorithm is applied to the same tracking example. In this case, only one reference coordinate is used for the entire localization. The chosen estimate lies outside the trajectory, as shown in Figure 26, which illustrates the resulting trajectory. There is a significant error after a single iteration since the estimated coordinate is not close to the object. Note that the error increases as the object moves farther from the estimated coordinate. However, successive iterations eliminate the localization error, as shown in the figure.
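
The feedback structure of the iterative process can be sketched as follows: each estimate is fed back as the next reference coordinate until the update becomes small. The one-pass localizer used here is a hypothetical toy stand-in (its residual error is a fixed fraction of the offset between the reference and the object, which mimics the observation that the error grows with that offset); it is not the parallel-projection computation of the paper. The names `iterate_localization`, `make_toy_localizer`, `error_ratio`, and the sample coordinates are assumptions made for illustration.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def iterate_localization(
    localize_once: Callable[[Point], Point],
    initial_reference: Point,
    tolerance: float = 1e-3,
    max_iterations: int = 20,
) -> Tuple[Point, int]:
    """Feed each estimate back as the next reference until the update is small."""
    reference = initial_reference
    for iteration in range(1, max_iterations + 1):
        estimate = localize_once(reference)
        if math.dist(estimate, reference) < tolerance:
            return estimate, iteration
        reference = estimate
    return reference, max_iterations


def make_toy_localizer(actual: Point, error_ratio: float = 0.2) -> Callable[[Point], Point]:
    """Hypothetical stand-in: the residual error is a fixed fraction of the reference offset."""

    def localize_once(reference: Point) -> Point:
        return (
            actual[0] + error_ratio * (reference[0] - actual[0]),
            actual[1] + error_ratio * (reference[1] - actual[1]),
        )

    return localize_once


if __name__ == "__main__":
    actual = (1.5, 1.0)  # made-up object position
    # Start from a single rough estimate that is far from the object.
    estimate, iterations = iterate_localization(make_toy_localizer(actual), (4.0, 4.0))
    print(iterations, math.dist(estimate, actual))
```

Under this toy model a single pass leaves a large error when the reference is far from the object, while repeated passes shrink it geometrically, which matches the qualitative behavior described above.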

Figure 26: Application of the iterative localization with single estimate.

4.3. Application of the Algorithms

Figure 27 shows a tracking environment with moving cameras in which the proposed localization algorithm is applied. For illustration, two sequences of images are shown. The coordinate of the center of the room is chosen as the initial reference coordinate. When the object is detected by each camera, the coordinates from the camera images are combined to obtain the actual coordinate, which is shown in the tracking environment. Throughout the experiment, the cameras follow the object by panning.

Figure 27: Snapshots of the tracking environment with moving cameras based on the proposed localization algorithm. A human face is used to localize a person. The circle represents the actual coordinate of the person within the room.

Figure 28 illustrates object detection in an outdoor environment where two objects are used for evaluating the proposed localization algorithm. Both cameras are placed on the same side and the panning angles for camera 1 and camera 2 are and , respectively. Figure 29 illustrates the trajectories of the two objects in the outdoor environment. Since the method is computationally simple, the total computation time is proportional to the number of objects and is not significant with respect to the overall computation. As shown in the figure, the trajectory computation errors are small enough for practical use. The average error, measured as the distance between the actual and computed trajectories, is 0.294 m and 0.296 m for persons A and B, respectively. However, the maximum error can be as much as 0.608 m (3%) for person B. In addition to the errors of the localization algorithm itself, other contributing factors are the measurements of the distances between the cameras and the persons, and the choice of the center point of the detected region of each person used in the computation.
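
The average and maximum errors reported above are distances between corresponding points of the actual and computed trajectories. A minimal sketch of that computation is given below, assuming that both trajectories are sampled at the same time instants; the function name `trajectory_errors` and the sample points are hypothetical and are not the data of the experiment.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trajectory_errors(actual: Sequence[Point], computed: Sequence[Point]) -> Tuple[float, float]:
    """Return (mean, max) Euclidean distance between matching trajectory samples."""
    if len(actual) != len(computed) or not actual:
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(a, c) for a, c in zip(actual, computed)]
    return sum(distances) / len(distances), max(distances)


if __name__ == "__main__":
    # Made-up sample points in metres, for illustration only.
    actual = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    computed = [(0.0, 0.3), (1.0, 0.0), (2.4, 0.3)]
    mean_error, max_error = trajectory_errors(actual, computed)
    print(f"mean = {mean_error:.3f} m, max = {max_error:.3f} m")
```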

Figure 28: Illustration of detection results for people localization in an outdoor environment.
Figure 29: Illustration of the trajectories of two objects in an outdoor environment.

5. Conclusion

This paper proposes an accurate and effective object localization algorithm that uses visual images and starts from unreliable coordinate estimates. To simplify the modeling of visual localization, a parallel projection model is presented in which simple geometry is used in the computation. The algorithm minimizes the localization error through an iterative approach with relatively low computational complexity. Nonlinear distortion of the digital imaging devices is compensated during the iterations. The effectiveness of the proposed algorithm in object localization as well as tracking is illustrated. The proposed algorithm can be effectively applied in many tracking applications where visual imaging devices are used.

Acknowledgments

This research is supported by the Foundation of Ubiquitous Computing and Networking (UCN) project, the Ministry of Knowledge Economy (MKE) 21st Century Frontier R&D Program in Korea, and is a result of subproject UCN 08B3-O4-30S.

References

  1. R. Okada, Y. Shirai, and J. Miura, “Object tracking based on optical flow and depth,” in Proceedings of the IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 565–571, Washington, DC, USA, December 1996.
  2. S. Khan and M. Shah, “Consistent labeling of tracked objects in multiple cameras with overlapping fields of view,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1355–1360, 2003.
  3. A. Bakhtari, M. D. Naish, M. Eskandari, E. A. Croft, and B. Benhabib, “Active-vision-based multisensor surveillance—an implementation,” IEEE Transactions on Systems, Man, and Cybernetics C, vol. 36, no. 5, pp. 668–680, 2006.
  4. N. X. Dao, B.-J. You, S.-R. Oh, and Y. J. Choi, “Simple visual self-localization for indoor mobile robots using single video camera,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '04), vol. 4, pp. 3767–3772, Sendai, Japan, September 2004.
  5. V. Ayala, J. B. Hayet, F. Lerasle, and M. Devy, “Visual localization of a mobile robot in indoor environments using planar landmarks,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '00), vol. 1, pp. 275–280, Takamatsu, Japan, October 2000.
  6. K. Nickel, T. Gehrig, R. Stiefelhagen, and J. McDonough, “A joint particle filter for audio-visual speaker tracking,” in Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI '05), pp. 61–68, Trento, Italy, October 2005.
  7. D. N. Zotkin, R. Duraiswami, and L. S. Davis, “Joint audio-visual tracking using particle filters,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 11, pp. 1154–1164, 2002.
  8. G. Pingali, G. Tunali, and I. Carlbom, “Audio-visual tracking for natural interactivity,” in Proceedings of the 7th ACM International Conference on Multimedia, pp. 373–382, Orlando, Fla, USA, October 1999.
  9. D. B. Ward, E. A. Lehmann, and R. C. Williamson, “Particle filtering algorithms for tracking an acoustic source in a reverberant environment,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 826–836, 2003.
  10. H. Lee and H. Aghajan, “Collaborative node localization in surveillance networks using opportunistic target observations,” in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, pp. 9–18, Santa Barbara, Calif, USA, October 2006.
  11. O. Yakimenko, I. Kaminer, and W. Lentz, “A three point algorithm for attitude and range determination using vision,” in Proceedings of the American Control Conference (ACC '00), vol. 3, pp. 1705–1709, Chicago, Ill, USA, June 2000.
  12. H. Tsutsui, J. Miura, and Y. Shirai, “Optical flow-based person tracking by multiple cameras,” in Proceedings of the International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI '01), pp. 91–96, Baden-Baden, Germany, August 2001.
  13. V. Lepetit and P. Fua, “Monocular model-based 3D tracking of rigid objects: a survey,” Foundations and Trends in Computer Graphics and Vision, vol. 1, no. 1, pp. 1–89, 2005.
  14. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
  15. M. Han, A. Sethi, W. Hua, and Y. Gong, “A detection-based multiple object tracking method,” in Proceedings of the International Conference on Image Processing (ICIP '04), vol. 5, pp. 3065–3068, October 2004.
  16. I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: real-time surveillance of people and their activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809–830, 2000.
  17. H. Jin and G. Qian, “Robust multi-camera 3D people tracking with partial occlusion handling,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 1, pp. 909–912, Honolulu, Hawaii, USA, April 2007.
  18. J. Berclaz, F. Fleuret, and P. Fua, “Robust people tracking with global trajectory optimization,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 1, pp. 744–750, New York, NY, USA, June 2006.
  19. K. Nummiaro, E. Koller-Meier, T. Svoboda, D. Roth, and L. Van Gool, “Color-based object tracking in multi-camera environments,” in Proceedings of 25th DAGM Symposium on Pattern Recognition, pp. 591–599, Magdeburg, Germany, September 2003.
  20. O. Javed, S. Khan, Z. Rasheed, and M. Shah, “Camera handoff: tracking in multiple uncalibrated stationary cameras,” in Proceedings of the IEEE Workshop on Human Motion (HUMO '00), pp. 113–118, Los Alamitos, Calif, USA, December 2000.
  21. P. E. Debevec, Modeling and rendering architecture from photographs, Ph.D. thesis, University of California at Berkeley Computer Science Division, Berkeley Calif, USA, 1996.
  22. M. Watanabe and S. K. Nayar, “Telecentric optics for computational vision,” in Proceedings of the 4th European Conference on Computer Vision (ECCV '96), vol. 2, pp. 439–451, Cambridge, UK, April 1996.
  23. M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
  24. C. Geyer and K. Daniilidis, “Omnidirectional video,” The Visual Computer, vol. 19, no. 6, pp. 405–416, 2003.
  25. S. Spors, R. Rabenstein, and N. Strobel, “A multi-sensor object localization system,” in Proceedings of the Vision Modeling and Visualization Conference (VMV '01), pp. 19–26, Stuttgart, Germany, November 2001.
  26. S. Bougnoux, “From projective to Euclidean space under any practical situation, a criticism of self-calibration,” in Proceedings of the 6th IEEE International Conference on Computer Vision (ICCV '98), pp. 790–796, Bombay, India, January 1998.
  27. R. K. Lenz and R. Y. Tsai, “Techniques for calibration of the scale factor and image center for high accuracy 3-D machine vision metrology,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 5, pp. 713–720, 1988.
  28. J. Heikkila and O. Silven, “A four-step camera calibration procedure with implicit image correction,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '97), pp. 1106–1112, San Juan, Puerto Rico, USA, June 1997.
  29. F. Lv, T. Zhao, and R. Nevatia, “Camera calibration from video of a walking human,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1513–1518, 2006.
  30. O. D. Faugeras, Q.-T. Luong, and S. J. Maybank, “Camera self-calibration: theory and experiments,” in Proceedings of the 2nd European Conference on Computer Vision (ECCV '92), pp. 321–334, Santa Margherita Ligure, Italy, May 1992.
  31. A. Zisserman, P. A. Beardsley, and I. D. Reid, “Metric calibration of a stereo rig,” in Proceedings of the IEEE Workshop on Representation of Visual Scenes (WVRS '95), pp. 93–100, Cambridge, Mass, USA, June 1995.
  32. E. Horster, R. Lienhart, W. Kellermann, and J.-Y. Bouguet, “Calibration of visual sensors and actuators in distributed computing platforms,” in Proceedings of the 3rd ACM International Workshop on Video Surveillance & Sensor Networks, pp. 19–28, Hilton, Singapore, November 2005.
  33. P. F. Sturm and S. J. Maybank, “On plane-based camera calibration: a general algorithm, singularities, applications,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), vol. 1, pp. 432–437, Fort Collins, Colo, USA, June 1999.
  34. Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong, “A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry,” Artificial Intelligence, vol. 78, no. 1-2, pp. 87–119, 1995.
  35. Q. Memon and S. Khan, “Camera calibration and three-dimensional world reconstruction of stereo-vision using neural networks,” International Journal of Systems Science, vol. 32, no. 9, pp. 1155–1159, 2001.
  36. R. Cipolla, T. W. Drummond, and D. Robertson, “Camera calibration from vanishing points in images of architectural scenes,” in Proceedings of the British Machine Vision Conference, vol. 2, pp. 382–391, Nottingham, UK, September 1999.
  37. P. A. Beardsley, A. Zisserman, and D. W. Murray, “Sequential updating of projective and affine structure from motion,” International Journal of Computer Vision, vol. 23, no. 3, pp. 235–259, 1997.
  38. O. Faugeras, “Stratification of three-dimensional vision: projective, affine, and metric representations: errata,” Journal of Optical Society of America, vol. 12, no. 3, pp. 465–484, 1995.
  39. T. Moons, L. Van Gool, M. Proesmans, and E. Pauwels, “Affine reconstruction from perspective image pairs with a relative object-camera translation in between,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 77–83, 1996.
  40. M. Pollefeys and L. Van Gool, “A stratified approach to metric self-calibration,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '97), pp. 407–412, San Juan, Puerto Rico, USA, June 1997.
  41. P. A. Beardsley and A. Zisserman, “Affine calibration of mobile vehicles,” in Proceedings of the Europe-China Workshop on Geometrical Modelling and Invariants for Computer Vision (GMICV '95), Xi'an, China, April 1995.
  42. J. J. Koenderink and A. J. van Doorn, “Affine structure from motion,” Journal of the Optical Society of America A, vol. 8, no. 2, pp. 377–385, 1991.
  43. P. Sturm and L. Quan, “Affine stereo calibration,” in Proceedings of the 6th International Conference on Computer Analysis of Images and Patterns (CAIP '95), pp. 838–843, Prague, Czech Republic, September 1995.
  44. C. Tomasi and T. Kanade, “Shape and motion from image streams under orthography: a factorization method,” International Journal of Computer Vision, vol. 9, no. 2, pp. 137–154, 1992.
  45. C. J. Poelman and T. Kanade, “A paraperspective factorization method for shape and motion recovery,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 206–218, 1997.
  46. http://www.usa.canon.com/.
  47. http://www.tamron.com/.