#### Abstract

Road monitoring helps to control the regional traffic situation so that traffic flow can be adjusted. A real-time panorama helps traffic accidents to be handled promptly and helps to improve traffic capacity. This paper designs a 3D road scene monitoring framework based on a real-time panorama. The system combines a large-scale panorama, satellite map textures, and a 3D scene model, in which users can roam freely. The paper makes the following contributions. Firstly, land-points are extracted after motion detection, the comotion algorithm is applied to land-points from adjacent cameras, and the homography matrix is constructed. Secondly, a reference camera is chosen and transformed to an overhead viewpoint; the other views are then warped to the same viewpoint and stitched into a panorama. Finally, a registration between the 2D road panorama and the 3D scene model based on high-precision GPS information is proposed. The proposed framework has been successfully applied to the monitoring of a large road intersection. Experimental results are given at the end of the paper.

#### 1. Introduction

A single camera provides only local information. As a result, it is difficult to make a reasonable judgment about the global situation and to respond to it immediately. Image mosaicking combines a set of images into a larger image with a wider field of view of the scene, which helps to improve observers' spatial awareness.

Over the years, numerous algorithms for image stitching have been developed. Image stitching is typically solved by finding global parametric warps that bring the images into alignment [1]. A 2D projective warp is parameterized by a homography if the scene is planar or if the views differ purely by rotation [2, 3]. The warping of video images from multiple cameras was described in [4, 5]; however, those works focused only on images (infrared and visible) taken from similar viewpoints. Lee et al. [6] introduced an automatic method to warp multiview images to a unified coordinate plane. They proposed a time-space registration based on trajectory extraction, from which the homography matrices from the images to the reference plane were estimated. However, the accuracy of the trajectories may affect the precision of the fit. Reference [7] presented a method for ground plane estimation from image pairs that uses land-points, instead of static features such as color, shape, and contours, for image matching. Szlávik et al. [8] proposed a method for matching partially overlapping image pairs in which the objects of interest are in motion, namely, the comotion algorithm. It remains valid even if the motion is discontinuous and the environment is unstructured. Szlávik et al. [9] also applied the comotion model to project a pair of camera images onto the reference road plane.

Recently, significant progress has been achieved in synthetic vision systems (SVS) [10, 11], which are computer-generated reality systems. The computer-generated image, namely, the virtual image, is complementary to optical sensor-based vision, and SVS gives users an immersive visualization. For example, combined virtual and real imagery has been studied for the detection of landmarks (horizon, runway) from an aircraft in low-visibility conditions [12, 13].

In this paper, we propose a 3D road monitoring method based on a real-time panorama, in which mosaicked images are fused with a static 3D scene model. Firstly, moving targets are detected and land-points are extracted. Secondly, binary sequences over consecutive frames are generated for each pixel according to whether or not it belongs to the land-point set. Land-points from adjacent cameras are then matched according to a similarity measure, which yields the homography between adjacent cameras. Thirdly, one camera is chosen as the reference and its view is warped to the overhead view according to the world coordinates of ground control points (GCPs) and the corresponding image coordinates. With the transformations between adjacent cameras, all views are then unified to the overhead view. Finally, the panorama is generated based on stitching lines. At the end of the paper, we present a novel panorama-based presentation of the 3D road scene with the real-time traffic situation. The overall process is shown in Figure 1.

#### 2. Transformation of Multiview Based on Land-Points Comotion Method

The multiple views are unified to the same viewpoint. In this paper, a 2D projective transformation aligns all the images to a common ground plane seen from overhead. There are three main steps.

(1) Homography relations between adjacent cameras are estimated via land-point comotion statistical maps.

(2) The reference image is projected onto the ground plane on the basis of the world coordinates of the GCPs and the corresponding image coordinates.

(3) From steps (1) and (2), all the views are converted to the overhead view.

##### 2.1. Calculation of Land-Points

Assuming that the moving blobs have been detected beforehand, the land-point [7] of a blob is the point where its central axis projects onto the ground. The land-point is calculated as follows.

(1) Binarize the video images using the Gaussian mixture background algorithm to obtain the moving blobs.

(2) Trace the contour of each blob; only blobs whose external polygon has an area greater than a threshold (≥10) are counted.

(3) Compute the external (bounding) rectangle of the contour.

(4) Take a line perpendicular to the horizontal axis of the image and compute the distance from each contour point to this line.

(5) Calculate the abscissa of the land-point from these distances; if more than one abscissa meets the condition, take the average of these values.

(6) Get the land-point.

The land-point of moving blobs is illustrated in Figure 2.
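The idea of a land-point can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the procedure of [7]: the central axis is taken to be the vertical line through the mean column of the blob, and the land-point is the lowest blob pixel on that line. The function name `land_point` and the toy mask are hypothetical.

```python
import numpy as np

def land_point(mask):
    """Return (x, y) of the land-point of one blob given as a boolean mask.

    Assumption: the central axis is the vertical line through the mean
    column of the blob; the land-point is the lowest blob pixel on it.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None                      # empty mask, no blob
    x = int(round(float(cols.mean())))   # abscissa of the central axis
    on_axis = rows[cols == x]            # blob pixels lying on that column
    if on_axis.size == 0:                # the axis can miss a concave blob
        on_axis = rows
    y = int(on_axis.max())               # image rows grow downwards
    return x, y

# Toy example: a 4x3 rectangular blob in an 8x8 frame.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 3:6] = True
print(land_point(mask))                  # (4, 5)
```

In a real pipeline the mask would come from the background-subtraction step, with one mask per connected blob.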

##### 2.2. Matching of Land-Points Based on Comotion Method

Our approach needs no prior knowledge, and it also works well on images with randomly scrambled motion. Registration among images taken from different positions and viewing angles is difficult. In this paper, the road surface serves as the reference plane, so that the image registration can be solved with comotion statistics maps [4, 5]. A Gaussian mixture background model, which performs well in shadow removal, is constructed to detect the foreground of the camera video. For any pixel at a given frame, a binary variable indicates whether the pixel belongs to the foreground. Over consecutive frames, these variables form a binary sequence vector for each pixel [6], as given in (6).

In (6), one subscript denotes the reference image and the other the matched image. The remaining parameters are the width and height of the image and the length of the video that takes part in the registration.

Hence the coded sequences can be divided into two categories: valid motion points, and noise points with either almost no movement or overly frequent movement. Two thresholds are used, a static-point threshold and a noise-point threshold. The characteristic function (8) is then defined from the norm of the difference of the sequence vector, where the difference of a vector is the vector of differences between its consecutive entries.

A point is removed from the candidate matching pairs if the characteristic function (8) marks it as a static or noise point. The similarity of two binary sequences is measured by a norm of their difference. Thus, from (8), the matching objective function can be written as in (9) and (10).

In general, a valid moving point in the reference image matches a moving point in the matched image, based on (9), if and only if the norm of the difference between their binary sequences is minimal.

The nearest neighbor search in (9) is implemented with the ANN (approximate nearest neighbor) library [14] developed by Mount and Arya. The dimension of the search space and the Minkowski norm are set to match the sequence vectors and the norm used in (9). ANN supports kd-tree and box-decomposition tree data structures, which greatly improve the efficiency of high-dimensional feature matching.
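The matching step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the paper: activity is measured as the number of state changes in a sequence, similarity is the Hamming distance, and a brute-force search replaces the ANN library. The names `valid_motion`, `match_sequences`, `t_static`, and `t_noise` are hypothetical.

```python
import numpy as np

def valid_motion(seqs, t_static, t_noise):
    """Flag sequences whose number of 0/1 changes lies between the thresholds."""
    changes = np.abs(np.diff(seqs.astype(np.int8), axis=1)).sum(axis=1)
    return (changes > t_static) & (changes < t_noise)

def match_sequences(ref, other):
    """For each row of `ref`, find the closest row of `other` (Hamming distance).

    ref: (n, T) and other: (m, T) binary arrays, one row per pixel.
    Returns the index of the best match and its distance.
    """
    dist = (ref[:, None, :] != other[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1)

# Toy sequences over six frames (one row per candidate land-point pixel).
ref = np.array([[0, 1, 1, 0, 0, 1],
                [1, 1, 0, 0, 1, 1],
                [0, 0, 0, 0, 0, 0]])     # last row never moves
other = np.array([[1, 1, 0, 0, 1, 0],
                  [0, 1, 1, 0, 0, 1]])

valid = valid_motion(ref, t_static=0, t_noise=5)
idx, dist = match_sequences(ref[valid], other)
print(valid, idx, dist)
```

The brute-force distance matrix costs O(n·m·T) memory and time, which is why a tree-based nearest neighbor search is preferable for full-resolution video.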

##### 2.3. Homography between Adjacent Cameras

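According to the camera model, the relationship between the world coordinate $(X_w, Y_w, Z_w)$ and the image pixel coordinate $(u, v)$ is

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & \mathbf{t} \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\tag{11}
$$

where $f_x$, $f_y$, $u_0$, and $v_0$ are called the internal camera parameters, $R$ and $\mathbf{t}$ are called the external parameters, and $s$ is the depth coefficient.

When the world point falls in the plane $Z_w = 0$, (11) is cast as

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ 1 \end{bmatrix},
\tag{12}
$$

where $K$ is the internal parameter matrix in (11) and $\mathbf{r}_1$, $\mathbf{r}_2$ are the first two columns of $R$.

Customarily (12) is rewritten as

$$
s\,\tilde{\mathbf{m}} = H \tilde{\mathbf{M}},
\tag{13}
$$

where $\tilde{\mathbf{m}} = (u, v, 1)^{T}$, $\tilde{\mathbf{M}} = (X_w, Y_w, 1)^{T}$, and $H = K \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}$ is called a homography.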

Based on (9), we obtain the corresponding land-point pairs between the reference image and the matched image and use them to estimate the 8 parameters of the projective transformation model; each pair consists of a source pixel in one image and its destination pixel in the other. Substituting the land-point pairs into (13), the homography is then computed as in (14).

In many practical situations the assumption of correct correspondences does not hold, because noisy land-points are mismatched [15]. To improve the robustness of the transformation model, the random sample consensus (RANSAC) procedure is applied to the estimation of the homography matrix [16].
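A compact Python sketch of robust homography estimation follows. It is a generic direct linear transform (DLT) with a basic RANSAC loop, given as an illustration and not as the estimator used in the paper; coordinate normalization is omitted for brevity, and the threshold, iteration count, and synthetic data are assumptions.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: 3x3 H (up to scale) with dst ~ H @ src in homogeneous coordinates."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)          # null vector of the design matrix

def project(h, pts):
    """Apply H to an (n, 2) array of points."""
    p = np.c_[pts, np.ones(len(pts))] @ h.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, thresh=2.0, iters=500, seed=0):
    """Keep the 4-point model with the most inliers, then refit on the inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        h = fit_homography(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(h, src) - dst, axis=1)
        inliers = err < thresh           # NaN errors count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    h = fit_homography(src[best], dst[best])
    return h / h[2, 2], best

# Synthetic check: 30 pairs related by a known H, the first 6 corrupted.
rng = np.random.default_rng(1)
h_true = np.array([[1.1, 0.05, 10.0], [-0.03, 0.95, 5.0], [1e-4, 2e-4, 1.0]])
src = rng.uniform(0, 100, (30, 2))
dst = project(h_true, src)
dst[:6] += rng.uniform(20, 40, (6, 2))   # simulated mismatched land-points
h_est, inliers = ransac_homography(src, dst)
print(inliers.sum(), np.abs(h_est - h_true).max())
```

With real land-point pairs, the inlier threshold should reflect the expected localization error of the land-points in pixels.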

##### 2.4. Projection of the Reference Camera Based on GCPs

Given the center of the visual range of each camera and the center of the panorama, the reference camera is chosen according to (15).

As shown in Figure 3, camera 3 is the reference camera, since it satisfies (15).

The view of the reference camera is projected onto the ground plane using measured GCPs, which transforms the view to an overhead view. For any quadrangle on the ground, the intersection of two of its edges is taken as the origin of the world coordinate system, with one axis parallel to an edge and pointing right. The coordinate system and the length of each edge are shown in Figure 4.

Given the six measured quantities shown in Figure 4, simple geometric relations yield the vertices of the quadrilateral [17], as in (16).

In engineering applications, a similarity transformation (17) is also applied to obtain a better visual effect: a suitable translation along each axis makes the coordinates nonnegative, and a scaling coefficient determines the height of the viewpoint above the ground.

Substituting the coordinates of the 4 points obtained from (16) and (17), together with the corresponding image points, into (13) gives the homography that transforms the view of the reference camera to the overhead view.
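The overhead projection from four GCPs can be sketched as follows. This is a minimal illustration: the pixel and ground coordinates are made-up values, the scale of 20 pixels per metre is arbitrary, and the function names are hypothetical. With exactly four pairs and the last entry of the homography fixed to 1, the 8 unknowns follow from an 8×8 linear system.

```python
import numpy as np

def homography_from_4(src, dst):
    """Exact homography (h33 = 1) from four point pairs, dst ~ H @ src."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(h, pts):
    """Apply H to an (n, 2) array of points."""
    p = np.c_[pts, np.ones(len(pts))] @ h.T
    return p[:, :2] / p[:, 2:3]

def to_overhead_pixels(world, scale):
    """Similarity step: scale ground coordinates and shift them to be nonnegative."""
    scaled = np.asarray(world, dtype=float) * scale
    return scaled - scaled.min(axis=0)

# Hypothetical measurements: image pixels of four GCPs and their ground
# coordinates in metres (origin inside the quadrangle, so some are negative).
image_pts = np.array([[420, 310], [860, 300], [1010, 620], [250, 640]], dtype=float)
world_pts = np.array([[-10, -6], [10, -6], [10, 6], [-10, 6]], dtype=float)

overhead_pts = to_overhead_pixels(world_pts, scale=20.0)
h_top = homography_from_4(image_pts, overhead_pts)
print(np.round(apply_h(h_top, image_pts), 3))
```

A larger scale corresponds to a lower virtual viewpoint, since each metre of ground then covers more pixels of the overhead image.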

##### 2.5. Viewpoints Unifying

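Now suppose there are three cameras. Let $H_i$ be the homography transforming the viewpoint of camera $i$ to the overhead one, and let $H_{i2}$ be the homography from camera $i$ to camera 2. Suppose camera 2 is the reference camera; then

$$
H_1 = H_2 H_{12}, \qquad H_3 = H_2 H_{32},
\tag{18}
$$

where $H_{12}$ and $H_{32}$ are obtained in Section 2.3 and $H_2$ in Section 2.4.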
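Unifying the viewpoints amounts to composing homographies by matrix multiplication: a camera is first mapped into the reference image and then into the overhead view. The sketch below uses toy matrices (pure translations and a scaling) chosen only to make the result easy to verify; the function name `unify` and the camera numbering are illustrative.

```python
import numpy as np

def unify(h_ref_to_top, h_to_ref):
    """Compose each camera-to-reference homography with reference-to-overhead.

    h_to_ref maps a camera id to the 3x3 homography taking that camera's
    image into the reference image (identity for the reference itself).
    """
    return {cam: h_ref_to_top @ h for cam, h in h_to_ref.items()}

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

h2_top = np.diag([2.0, 2.0, 1.0])        # toy reference-to-overhead map
h_to_ref = {1: translation(5.0, 0.0),    # camera 1 -> camera 2
            2: np.eye(3),                # camera 2 is the reference
            3: translation(-5.0, 0.0)}   # camera 3 -> camera 2

h_top = unify(h2_top, h_to_ref)
p = h_top[1] @ np.array([1.0, 1.0, 1.0]) # pixel (1, 1) of camera 1
print(p[:2] / p[2])                      # [12.  2.]
```

For a camera that is not adjacent to the reference, the camera-to-reference homography is itself a product of the pairwise homographies along the chain of adjacent cameras.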

#### 3. 3D Road Scene Based on Multiview Panorama

2D multiview panoramas are generated in real time from multiple cameras installed high up near the road. The 2D panoramic road texture, the satellite map texture, and the 3D scene model are then fused. This process is referred to as 2D-3D image registration. The whole data architecture is shown in Figure 5.

##### 3.1. Registration of Panorama and Satellite Map Texture

GPS geodetic coordinate transformation was introduced in [18, 19]. Suppose a point of the panorama corresponds to a GPS geodetic coordinate and that the positioning precision (Table 1) meets the measurement requirements.

The integrated navigation system XW-ADU5630 was used to obtain the GPS information of panorama’s key-points, and the pixels of satellite map texture corresponding to the key-points also have GPS information. Therefore it is easy to match the panorama with satellite map texture based on GPS information.

Applying the set of correspondences (GCPs) to (13), satellite map texture is registered to panorama. Therefore fusion of panorama and satellite map texture can be easily achieved. Panorama with satellite map texture is called extended panorama.

##### 3.2. 2D-3D Image Registration

In our system, all the views are projected onto the common road plane, which is the region of interest (ROI) of the extended panorama. As a result, objects above the ground, such as the tops of buildings, show strong perspective deformation. The valid road area therefore needs to be clipped out and registered to the 3D model (Figure 12). Observers can then use the 3D rendering engine to view the scene freely.

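The 2D-3D image registration is complete once the points on the ground in the 3D model are aligned with the corresponding points in the extended panorama. A similarity transformation (or more simply a similarity) is used to align the 3D model and the 2D panorama. The similarity can be written concisely in block form as

$$
\mathbf{X}' = H_S \mathbf{X} =
\begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \mathbf{X},
\tag{19}
$$

where $R$ is a $3 \times 3$ rotation matrix, $\mathbf{t}$ is a translation 3-vector, and $s$ is an isotropic scale factor.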

A similarity in (19) has seven degrees of freedom and can be computed from four point pairs, that is, four corresponding GCPs. The GCPs are manually picked from ground plane of 3D scene model and 2D extended panorama.
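One common way to estimate such a similarity from point pairs is the closed-form least-squares solution of Umeyama. The Python sketch below is offered as an illustration and is not necessarily the solver used in the paper; the panorama points are embedded in 3D with zero height, and all coordinates are made-up values.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares s, R, t such that dst ≈ s * R @ src + t (Umeyama's method)."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)           # cross-covariance of the two sets
    u, d, vt = np.linalg.svd(cov)
    sign = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        sign[-1] = -1.0                  # avoid returning a reflection
    r = u @ np.diag(sign) @ vt
    s = (d * sign).sum() / ((xs ** 2).sum() / len(src))
    t = mu_d - s * r @ mu_s
    return s, r, t

def as_matrix(s, r, t):
    """Assemble the 4x4 block matrix [[sR, t], [0, 1]]."""
    h = np.eye(4)
    h[:3, :3], h[:3, 3] = s * r, t
    return h

# Four coplanar GCPs in the panorama (height 0) and their model coordinates,
# generated here from a known scale, rotation about the vertical axis, and shift.
theta = np.deg2rad(30.0)
r_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([10.0, 20.0, 5.0])
pano = np.array([[0, 0, 0], [4, 0, 0], [4, 3, 0], [0, 3, 0]], dtype=float)
model = 2.0 * pano @ r_true.T + t_true

s, r, t = fit_similarity(pano, model)
print(round(s, 6), np.round(t, 6))
```

Because the method is a least-squares fit, it accepts more than four GCPs, which averages out errors from manual picking.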

#### 4. Results and Discussion

Our experiments were conducted on a workstation with an Intel Xeon E5-2650 CPU and two graphics cards (NVIDIA GTX Titan Black). Six 2.0-megapixel Samsung web cameras were set up at the top of a building next to the road to monitor a large crossroad. The six cameras used three focal lengths (4 mm, 6 mm, and 8 mm).

##### 4.1. Abridged General View of Generation of Extended Panorama

*(1) Motion Detection.* The Gaussian mixture background model was applied to detect moving blobs (foreground). The results are shown in Figure 6. The moving blobs, especially cars, are detected accurately enough to generate good matching pairs.

After the comotion process, the land-point comotion statistics maps [9] are obtained, as shown in Figure 7. The higher the statistics value at a given position, the brighter the corresponding pixel in the image.

*(2) Overhead View.* Four points on the ground that are easy to measure were chosen, and the corresponding image coordinates were picked manually, as shown in Figure 8.

*(3) Panorama and Satellite Map Texture Registration.* The left side of Figure 9 shows the result of merging the ROI of the real-time traffic panorama with the 2D satellite map texture. The right side shows the six channels of real HD video.

*(4) Viewpoint Roam.* The 3D model is driven by OSG (OpenSceneGraph), so users can observe the road from any viewpoint. Figure 10 gives two screenshots from two viewpoints. This helps observers to inspect the traffic situation in any direction and to react promptly in an emergency.

##### 4.2. Experiments on Other Roads

Figure 11 shows experiments on two other sites using the proposed method. One is the panorama of a civic center from 6-channel video; the other is the panorama of a plaza from 2-channel video. The polygon region outlined by the red dotted lines is the mosaicked image.

##### 4.3. Comparative Experiments

This paper proposes a fusion method for multisource images (visible images, satellite map texture, and a 3D scene model). By applying the method in [6], we found that the regions in the neighborhood of tall buildings show strong projective deformation, as shown in Figure 12. The polygon region outlined by the red dotted lines is the image mosaicked from real-time video. The polygon region outlined by the sky-blue dotted lines is the ROI of the panorama, which is clipped out in our method.

#### 5. Conclusion

This paper integrates a 2D multiview overhead panorama, a 2D satellite map texture, and a 3D model to synthesize a 3D road scene. The 3D road scene monitoring system has been successfully applied to traffic supervision. The fusion method gives observers a sense of immersion. In addition, the OSG-driven 3D road scene can be translated, rotated, and scaled freely.

In this paper, we focused mainly on the generation of the 2D panorama and the presentation of the 3D road scene. In future work, we will investigate intelligent video analysis and traffic situation analysis.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported in part by the National High Technology Research and Development Program (863 Program) of China under Grants 2012AA011804-1 and 2013AA013802, in part by the Open Research Fund of Key Laboratory of Higher Education of Sichuan Province through the Enterprise Informationalization and Internet of Things under Grant 2013WZJ01, and in part by the Scientific Research Project of Sichuan University of Science and Engineering under Grants 2014PY08 and 2014RC02.