Recent Advances on Mathematical Modeling and Control Methods for Complex Vehicle Systems
Research Article · Open Access
3D Road Scene Monitoring Based on Real-Time Panorama
Abstract
Road monitoring helps to control the regional traffic situation so as to adjust the traffic flow, and a real-time panorama makes it possible to treat traffic accidents promptly and greatly improves traffic capacity. This paper designs a 3D road scene monitoring framework based on a real-time panorama. The system combines a large-scale panorama, satellite map textures, and a 3D scene model, in which users can roam freely. This paper makes the following contributions. Firstly, land-points are extracted after motion detection, the co-motion algorithm is applied to land-points from adjacent cameras, and the homography matrix is constructed. Secondly, a reference camera is chosen and transformed to the overhead viewpoint; subsequently the multiple views are warped to the same viewpoint and stitched into a panorama. Finally, a registration between the 2D road panorama and the 3D scene model based on high-precision GPS information is also proposed. The proposed framework has been successfully applied to the monitoring of a large road intersection. Experimental results are furnished at the end of the paper.
1. Introduction
A single camera provides only local information, so it is difficult to make a reasonable judgment about, and an immediate response to, the global situation. Image mosaicking combines a set of images into a larger image with a wider field of view of the scene, which helps to improve observers' spatial awareness.
Over the years, numerous algorithms for image stitching have been developed. Image stitching is typically solved by finding global parametric warps to bring images into alignment [1]. 2D projective warps are parameterized by a homography if the scene is planar or if the views differ purely by rotation [2, 3]. The warps of video images from multiple cameras were described in [4, 5]; however, those works only considered images (infrared and visible) of similar viewpoints. Lee et al. [6] introduced an automatic method to warp multi-view images to a unified coordinate plane. They proposed time-space registration based on trajectory extraction, and on that basis the homography matrices from the images to the reference plane were estimated; however, the trajectory accuracy may limit the fitting precision. Reference [7] presented a method for ground plane estimation from image pairs that uses land-points, instead of static features such as color, shape, and contours, for image matching. Szlávik et al. [8] proposed a method for matching partially overlapping image pairs where the object of interest is in motion, namely, the co-motion algorithm; it is valid even if the motion is discontinuous and the environment unstructured. Szlávik et al. [9] also applied the co-motion model to project a pair of camera images onto the reference road plane.
Recently, significant progress has been achieved in synthetic vision systems (SVS) [10, 11], which are computer-generated reality systems. Such computer-generated images, namely, virtual images, are complementary to optical sensor-based vision, and SVS supplies users with good immersive visualization. For example, virtual-and-reality technology has been studied for landmark (horizon, runway) detection from an aircraft in low-visibility conditions [12, 13].
In this paper, we propose a 3D road monitoring method based on a real-time panorama, synthesizing mosaicked images with a static 3D scene model. Firstly, moving targets are detected and land-points are extracted. Secondly, binary sequences over continuous time are generated according to whether each pixel belongs to the land-point set or not; land-points from adjacent cameras are then matched according to a similarity measure, yielding the homography transformation between adjacent cameras. Thirdly, a camera is chosen as the reference and its view is warped to the overhead view according to the world coordinates of ground control points (GCPs) and the corresponding image coordinates; with the transformation relationships between adjacent cameras, all the views are thus unified to the overhead view. Finally, the panorama is generated based on stitching lines. At the end of the paper, we propose a novel panorama presentation of the 3D road scene with the real-time traffic situation. The specific process is shown in Figure 1.
2. Transformation of Multiple Views Based on the Land-Points Co-Motion Method
The multiple views are unified to the same viewpoint. In this paper, a 2D projective transformation brings all the images into alignment on a common ground plane with an overhead view. There are three main steps.
(1) Homography relations between adjacent cameras are estimated via land-points co-motion statistical maps.
(2) The reference image is projected to the ground plane on the basis of the world coordinates of GCPs and the corresponding image coordinates.
(3) From steps (1) and (2), all the views are converted to the overhead view.
2.1. Calculation of Land-Points
Assuming the moving blobs have been detected before the calculation of land-points, a land-point [7] is the point of the blob's central axis projected onto the ground. The calculation of a land-point is as follows.
(1) Binarize the video images using the hybrid Gaussian background algorithm to obtain the moving blobs.
(2) Trace the contour of each blob; only the blobs whose external-polygon area is greater than a threshold (≥10) are counted.
(3) Compute the external (bounding) rectangle of the contour.
(4) Construct the line perpendicular to the base of the rectangle (the central axis) and compute the distance from each contour point to this line.
(5) Calculate the abscissa of the land-point; if more than one abscissa satisfies the criterion, take their average.
(6) Obtain the land-point from the abscissa of the central axis and the base of the bounding rectangle.
The land-point of a moving blob is illustrated in Figure 2.
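As a rough illustration of the steps above, the land-point of a blob can be approximated as the abscissa of the blob's central axis projected onto the base of its bounding rectangle. The sketch below is our own simplification (centroid abscissa instead of the paper's distance-based axis fit), not the authors' exact procedure:

```python
# Sketch: approximate land-point of a moving blob.
# Simplifying assumption: the central-axis abscissa is taken as the
# contour centroid's x, projected onto the bottom of the bounding box.

def landpoint(contour, min_area=10):
    """contour: list of (x, y) pixel coordinates of the blob outline.
    Returns the land-point (x, y) or None for blobs below the area threshold."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    if w * h < min_area:            # discard tiny blobs (step 2)
        return None
    cx = sum(xs) / len(xs)          # abscissa of the central axis (step 5)
    return (cx, max(ys))            # project onto the ground line (step 6)
```

For a rectangular blob the land-point is simply the bottom-center of its bounding box, which matches the intuition of a vehicle's contact point with the road.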
2.2. Matching of Land-Points Based on the Co-Motion Method
In our approach no prior knowledge is needed, and the method also works well on images of randomly scrambled motion. Registration among images taken from different positions and viewing angles is difficult; in this paper, the road surface serves as the reference plane, so the image registration is solved with co-motion statistics maps [4, 5]. A hybrid Gaussian background model, which performs well at shadow removal, is constructed to detect the foreground of each camera's video. For any pixel at the t-th frame, a binary variable indicates whether the pixel belongs to the foreground. Concatenating these variables over consecutive frames constructs a sequence vector [6], where
The aforementioned is illustrated in (6), in which one subscript denotes the reference image and the other the matched one. Let W and H denote the width and height of the image, respectively, and let T be the length of the video participating in the registration.
Hence the coded sequences can be divided into two categories: valid motion points, and noise points with almost no movement or with excessively frequent movement. Let one threshold bound static points and another bound noise points. Subsequently the characteristic function is defined as follows, where the similarity is measured by the L1 norm of the difference of the sequence vectors:
A point is removed from the candidate matching pairs if the characteristic function marks it as noise. Here the similarity measure of the binary sequences is the L1 norm. Thus, from (8), the perfect-matching objective function can be written as follows:
In general, a pixel that is a valid moving point in the reference image matches a moving point in the other image, based on (9), if and only if the L1 norm of the difference of their binary sequences is minimal.
In (9) the nearest-neighbor searching is implemented with the ANN (approximate nearest neighbor) library [14] developed by Mount and Arya. The dimension of the searching space is the sequence length, and the Minkowski norm is set to the L1 norm. ANN supports kd-trees and box-decomposition data structures, which greatly improve the efficiency of high-dimensional feature matching.
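To make the matching criterion concrete, the sketch below builds per-pixel binary motion sequences from stacks of foreground masks and matches valid pixels by minimal L1 (Hamming) distance. It is a brute-force stand-in for the ANN-accelerated search, with made-up threshold defaults:

```python
# Sketch: co-motion matching of per-pixel binary motion sequences.
# masks_a, masks_b: lists of binary frames (one flat 0/1 list per frame).

def motion_sequences(masks):
    """Transpose a list of binary frames into one 0/1 sequence per pixel."""
    return list(zip(*masks))

def is_valid(seq, t_static=1, t_noise=None):
    """Keep pixels that moved at least t_static times but not almost always."""
    if t_noise is None:
        t_noise = len(seq) - 1
    return t_static <= sum(seq) <= t_noise

def match(masks_a, masks_b):
    """For each valid pixel index in A, find the B pixel with minimal
    Hamming (L1) distance between their binary sequences."""
    seqs_a = motion_sequences(masks_a)
    seqs_b = motion_sequences(masks_b)
    pairs = {}
    for i, sa in enumerate(seqs_a):
        if not is_valid(sa):
            continue
        dists = [sum(x != y for x, y in zip(sa, sb)) for sb in seqs_b]
        pairs[i] = min(range(len(dists)), key=dists.__getitem__)
    return pairs
```

The ANN kd-tree replaces the inner loop over all candidate sequences with an approximate nearest-neighbor query, which is what makes the method practical at full image resolution.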
2.3. Homography between Adjacent Cameras
According to the pinhole camera model, the relationship between a world coordinate and an image pixel coordinate is s·m = K[R | t]·M, where the entries of K (the focal lengths and the principal point) are called the internal camera parameters, R and t are called the external parameters, and s is the depth coefficient.
When the world point falls in the Z = 0 plane, (11) is cast as s·m = K[r1, r2, t]·[X, Y, 1]ᵀ, where r1 and r2 are the first two columns of R.
Customarily (12) is rewritten as m′ = Hm, where H is called a homography.
Based on (9), we obtain the corresponding land-point pairs between the two images to estimate the 8 parameters of the projective transformation model; each destination pixel in one image corresponds to a source pixel in the other. Substituting the land-point pairs into (13), the homography is computed such that the matched pairs satisfy (13) as closely as possible.
In many practical situations this assumption is not valid because noisy land-points are mismatched [15]. To improve the robustness of the transformation model, the random sample consensus (RANSAC) procedure is applied to the estimation of the homography matrix [16].
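A direct linear transform (DLT) estimate of the homography from matched land-point pairs can be sketched as follows in plain NumPy. This is our own illustrative implementation; in practice it would sit inside a RANSAC loop that repeatedly fits on random minimal subsets and keeps the model with the most inliers:

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: estimate the 3x3 matrix H with dst ~ H @ src from >= 4 point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the linear system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)        # null-space vector = homography entries
    return H / H[2, 2]              # fix the scale: 8 free parameters

def apply_h(H, pt):
    """Apply homography H to a 2D point in homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With noise-free correspondences the SVD null space recovers H exactly; with noisy land-points, RANSAC discards the mismatched pairs before this least-squares fit.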
2.4. Projection of the Reference Camera Based on GCPs
Given the center of the visual range of each camera and the center of the panorama, the reference camera is chosen via (15) as the one whose visual-range center is nearest to the panorama center.
As shown in Figure 3, camera 3 is the reference camera satisfying (15).
The view of the reference camera is projected to the ground plane based on GCPs obtained by measurement; that is, the view is transformed to the overhead view. For any quadrangle on the ground, the intersection of two adjacent edges is taken as the origin of the world coordinate system, with the x-axis parallel to an edge and pointing right. The coordinate system and the length of each edge are as shown in Figure 4.
Given the measured edge lengths and diagonals, the vertices of the quadrilateral follow from simple geometric relations [17]:
In engineering applications, a similarity transformation is necessary to obtain a better visual effect: a proper translation makes the coordinates nonnegative, and the scaling coefficient determines the height from the viewpoint to the ground.
Substituting the coordinates of the 4 points from (16) and (17), together with the corresponding image points, into (13), the view of the reference camera is transformed to the overhead view.
2.5. Viewpoints Unifying
Now suppose there are three cameras, and let Hi be the homography transforming camera i's viewpoint to the overhead one. Suppose camera 2 is the reference camera; then H1 and H3 are obtained by composing the adjacent-camera homographies of Section 2.3 with the overhead projection H2 of Section 2.4.
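The viewpoint unification above amounts to plain matrix composition. A minimal NumPy sketch, with hypothetical matrix names (H12 and H32 map cameras 1 and 3 into camera 2's view; H2 maps camera 2 to the overhead view):

```python
import numpy as np

def unify(H12, H32, H2):
    """Compose homographies so all three cameras map to the overhead view.
    Order matters: first warp into camera 2's view, then to overhead."""
    H1 = H2 @ H12
    H3 = H2 @ H32
    return H1, H2, H3
```

Since homographies are defined up to scale, the products can optionally be renormalized so the bottom-right entry is 1 before warping.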
3. 3D Road Scene Based on Multi-View Panorama
2D multi-view panoramas were generated in real time from multiple cameras installed high up alongside the road. The 2D panoramic road texture, the satellite map texture, and the 3D scene model were then fused; this process is referred to as 2D-3D image registration. The whole data architecture is shown in Figure 5.
3.1. Registration of Panorama and Satellite Map Texture
GPS geodetic coordinate transformation was introduced in [18, 19]. Suppose each key point of the panorama corresponds to a GPS geodetic coordinate and the position precision (Table 1) meets the demand of measurement.

The integrated navigation system XW-ADU5630 was used to obtain the GPS information of the panorama's key points, and the pixels of the satellite map texture corresponding to the key points also carry GPS information. Therefore it is easy to match the panorama with the satellite map texture based on GPS information.
Applying the set of correspondences (GCPs) to (13), the satellite map texture is registered to the panorama, so the fusion of the panorama and the satellite map texture is easily achieved. The panorama with satellite map texture is called the extended panorama.
3.2. 2D-3D Image Registration
In our system, all the views are projected to the common road plane, which is the region of interest (ROI) of the extended panorama. As a result, tall buildings above the ground plane exhibit severe perspective deformation. The valid road area therefore needs to be clipped out and registered to the 3D model (Figure 12), so that observers can use the 3D rendering engine to view the scene freely.
When the points on the ground in the 3D model are aligned with the corresponding points in the extended panorama, the 2D-3D image registration is completed. A similarity transformation (or more simply a similarity), composed of an isotropic scaling, a rotation, and a translation, is conducted to align the 3D model and the 2D panorama. The similarity can be written more concisely in block form as
A similarity in (19) has seven degrees of freedom and can be computed from four point pairs, that is, four corresponding GCPs. The GCPs are manually picked from ground plane of 3D scene model and 2D extended panorama.
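One standard way to recover such a similarity from corresponding GCPs is the Umeyama/Procrustes method, sketched below in NumPy. This is a common closed-form technique under the stated assumptions, not necessarily the authors' exact procedure:

```python
import numpy as np

def similarity_from_points(P, Q):
    """Find scale s, rotation R, translation t such that Q ~ s * R @ p + t
    for corresponding rows of P and Q ((n, 3) arrays, n >= 3, non-collinear)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mp, mq = P.mean(0), Q.mean(0)
    Pc, Qc = P - mp, Q - mq                      # center both point sets
    U, S, Vt = np.linalg.svd(Qc.T @ Pc)          # cross-covariance SVD
    d = np.sign(np.linalg.det(U @ Vt))           # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt                               # optimal rotation
    s = np.trace(np.diag(S) @ D) / (Pc ** 2).sum()   # optimal scale
    t = mq - s * R @ mp                          # translation from the means
    return s, R, t
```

With exact correspondences the seven parameters are recovered exactly; with noisy GCPs the same formulas give the least-squares alignment.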
4. Results and Discussion
Our experiments are conducted on a workstation with an Intel Xeon E5-2650 CPU and two graphics cards (NVIDIA GTX Titan Black). Six Samsung 2.0-megapixel web cameras are set up at the top of a building next to the road, monitoring a large crossroad. In the experiment, the six cameras had three focal lengths (4 mm, 6 mm, and 8 mm).
4.1. Overview of the Generation of the Extended Panorama
(1) Motion Detection. The hybrid Gaussian background model was applied to detect moving blobs (foreground). The results are demonstrated in Figure 6. The moving blobs, especially cars, are detected accurately enough to generate good matching pairs.
After the co-motion process, the land-points co-motion statistics map [9] is shown in Figure 7. The higher the statistics value at a given position, the brighter the corresponding pixel in the image.
(2) Overhead View. Four points on the ground that are easy to measure were chosen, and the corresponding image coordinates were picked manually, as shown in Figure 8.
(3) Panorama and Satellite Map Texture Registration. Figure 9 shows the result of merging the ROI of the real-time traffic panorama with the 2D satellite map texture (left); on the right are the 6-channel live HD video images.
(4) Viewpoint Roaming. The 3D model is driven by OSG, and users can observe the road from any viewpoint. Figure 10 gives two screenshots from two viewpoints. This helps observers inspect the traffic situation in any direction and react promptly in emergencies.
4.2. Experiments on Other Roads
Figure 11 shows experiments on two other roads using the proposed method: the panorama of a civic center from 6-channel videos, and the panorama of a plaza from 2-channel videos. The polygon region in red dotted lines is the mosaicked image.
4.3. Comparative Experiments
A multi-source image fusion method (visible images, satellite map texture, and 3D scene model) was proposed. Applying the method in [6], we found that the regions in the neighborhood of tall buildings suffer much projective deformation, as shown in Figure 12. The polygon region in red dotted lines is the mosaicked image from real-time video images, and the polygon region in sky-blue dotted lines is the ROI of the panorama, which is clipped out in our method, as shown in Figure 12.
5. Conclusion
This paper implements a reasonable integration of a 2D multi-view overhead panorama, 2D satellite map texture, and a 3D model to synthesize a 3D road scene. The 3D road scene monitoring system has been successfully applied to traffic supervision. The fusion method is innovative and gives observers a sense of immersion. Besides, the OSG-driven 3D road scene can be translated, rotated, and scaled at will.
In this paper, we focused mainly on the generation of the 2D panorama and the presentation of the 3D road scene. In future work, we will investigate intelligent video analysis and traffic situation analysis.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National High Technology Research and Development Program (863 Program) of China under Grants 2012AA0118041 and 2013AA013802, in part by the Open Research Fund of Key Laboratory of Higher Education of Sichuan Province through the Enterprise Informationalization and Internet of Things under Grant 2013WZJ01, and in part by the Scientific Research Project of Sichuan University of Science and Engineering under Grants 2014PY08 and 2014RC02.
References
[1] C.-H. Chang, Y. Sato, and Y.-Y. Chuang, “Shape-preserving half-projective warps for image stitching,” June 2014.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.
[3] J. Zaragoza, T.-J. Chin, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’13), pp. 2339–2346, IEEE, 2013.
[4] R. Ai, Z. Shi, D. Xu, and C. Zhang, “A line mapping based automatic registration algorithm of infrared and visible images,” in Proceedings of the 5th International Symposium on Photoelectronic Detection and Imaging (ISPDI ’13), p. 89072J, International Society for Optics and Photonics, 2013.
[5] X. W. Zhang, Y. N. Zhang, T. Yang, X. G. Zhang, and D. P. Shao, “Automatic visual-thermal image sequence registration based on co-motion,” Acta Automatica Sinica, vol. 36, no. 9, pp. 1220–1231, 2010.
[6] L. Lee, R. Romano, and G. Stein, “Monitoring activities from multiple video streams: establishing a common coordinate frame,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 758–767, 2000.
[7] M. Hu, J. Lou, W. Hu, and T. Tan, “Multi-camera correspondence based on principal axis of human body,” in Proceedings of the International Conference on Image Processing (ICIP ’04), pp. 1057–1060, October 2004.
[8] Z. Szlávik, L. Havasi, and T. Szirányi, “Estimation of common groundplane based on co-motion statistics,” in Image Analysis and Recognition, vol. 3212 of Lecture Notes in Computer Science, pp. 347–354, Springer, 2004.
[9] Z. Szlávik, T. Szirányi, and L. Havasi, “Video camera registration using accumulated co-motion maps,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 61, no. 5, pp. 298–306, 2007.
[10] T. Schnell, Y. Kwon, S. Merchant, and T. Etherington, “Improved flight technical performance in flight decks equipped with synthetic vision information system displays,” International Journal of Aviation Psychology, vol. 14, no. 1, pp. 79–102, 2004.
[11] A. L. Alexander, C. D. Wickens, and T. J. Hardy, “Synthetic vision systems: the effects of guidance symbology, display size, and field of view,” Human Factors, vol. 47, no. 4, pp. 693–707, 2005.
[12] C. J. Liu, Y. Zhang, K. K. Tan, and H. Y. Yang, “Sensor fusion method for horizon detection from an aircraft in low visibility conditions,” IEEE Transactions on Instrumentation and Measurement, vol. 63, no. 3, pp. 620–627, 2014.
[13] C. J. Liu, Q. J. Zhao, Y. Zhang, and K. K. Tan, “Runway extraction in low visibility conditions based on sensor fusion method,” IEEE Sensors Journal, vol. 14, no. 6, pp. 1980–1987, 2014.
[14] D. M. Mount and S. Arya, ANN: A Library for Approximate Nearest Neighbor Searching, 1998.
[15] M. Brown and D. G. Lowe, “Automatic panoramic image stitching using invariant features,” International Journal of Computer Vision, vol. 74, no. 1, pp. 59–73, 2007.
[16] S. Choi, T. Kim, and W. Yu, “Performance evaluation of RANSAC family,” Journal of Computer Vision, vol. 24, no. 3, pp. 271–300, 1997.
[17] T. K. Koo and Y. B. Aw, “A three-dimensional visualization approach to traffic accident mapping,” Photogrammetric Engineering & Remote Sensing, vol. 57, no. 7, pp. 921–925, 1991.
[18] D. H. Maling, “Coordinate systems and map projections for GIS,” in Geographical Information Systems: Principles and Applications, pp. 135–146, John Wiley & Sons, 1991.
[19] R. Reulke, S. Bauer, T. Döring, and F. Meysel, “Traffic surveillance using multi-camera detection and multi-target tracking,” in Image and Vision Computing New Zealand, pp. 175–180, 2007.
Copyright
Copyright © 2014 Yuezhou Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.