Abstract
We propose a new method for motion segmentation with a moving camera. The proposed method classifies each pixel in the image sequence as belonging to the background or to a motion region by applying a novel three-view constraint called the "parallax-based multiplanar constraint." This constraint, the main contribution of this paper, is derived from the relative projective structure of two points in three different views and is implemented within the "Plane + Parallax" framework. The parallax-based multiplanar constraint overcomes a limitation of previous geometric constraints: it does not require the reference plane to be constant across multiple views. Unlike the epipolar constraint, it also reduces the surface degradation to a line degradation, so it can detect objects followed by a camera moving in the same direction. We evaluate the proposed method on several video sequences to demonstrate the effectiveness and robustness of the parallax-based multiplanar constraint.
1. Introduction
Ground motion detection is an essential challenge in many computer vision and video processing tasks, such as vision-based motion analysis, intelligent surveillance, and regional defense. When prior knowledge of a moving object's appearance and shape is not available, change detection or optical flow can still provide powerful motion-based cues for segmenting and localizing objects, even when they move in a cluttered environment or are partially occluded. The aim of ground motion object detection is to segment moving objects according to the motions in the image sequence, whether the platform is moving or not.
In the general setting of detecting moving objects in an image sequence captured by a moving camera, the scene contains multiple objects moving in front of the background, and the background itself may exhibit strong parallax produced by 3D structures. Motion segmentation against a dynamic image background is inherently difficult because the moving camera induces a 2D motion for every pixel: the motion of pixels on moving objects is generated by both the independent object motion and the camera motion, whereas the motion of pixels in the static background is due to the camera motion alone. Our goal is to utilize multiview geometric constraints to segment the moving objects from the video sequence.
The first geometric constraint used to detect moving objects is the homography constraint in the 2D plane model [1, 2]. The homography matrix is a global motion model that can compensate for the camera motion between consecutive images. Pixels consistent with the homography constraint are considered to belong to the static background [3], while inconsistent pixels may correspond to moving objects or to parallax regions [4, 5]. Because the homography constraint cannot distinguish parallax regions from moving objects, the epipolar constraint is used as its supplement in motion segmentation [6, 7].
The epipolar constraint is commonly used for motion segmentation between two views [1, 8, 9]. Consider a pair of corresponding feature points in two images from different views. If a feature point in one image does not lie on the epipolar line induced by its matched feature point in the other image, then the corresponding 3D point is determined to be moving [10]. However, the epipolar constraint is not sufficient to detect all kinds of 3D motion: when a 3D point moves within a special plane, the epipolar constraint cannot detect it [1]. This phenomenon is called "surface degradation." The epipolar plane is formed by the two camera centers and the point itself; when the 3D point moves within this plane, its 2D projections move along the epipolar lines, so the motion cannot be detected by the epipolar constraint. Surface degradation often occurs when the moving camera follows an object moving along the same line.
In order to overcome the surface degradation of the epipolar constraint, geometric constraints over more than two views need to be imposed. The trilinear constraint can be applied to segment moving objects across three views [1, 11]. However, estimating the parameters of the trifocal tensor is a nontrivial task, which requires accurate point correspondences and large camera motion.
In this paper, inspired by [12], we propose a novel three-view constraint named the "parallax-based multiplanar constraint." As a supplement to the epipolar constraint, it reduces the surface degradation to a line degradation. Compared to previous methods based on the "Plane + Parallax" framework [12–15], the parallax-based multiplanar constraint can segment moving objects without a fixed reference plane. The main contributions can be summarized as follows.
(1) The parallax-based multiplanar constraint can segment moving objects without a fixed reference plane. Traditional methods [13–15] assume that the reference plane is consistent across three views, but this assumption does not always hold. Inspired by [12], the parallax-based multiplanar constraint segments moving objects within the "Plane + Parallax" framework without requiring a fixed reference plane. This is the main contribution of this paper.
(2) A reference point is introduced to replace the epipole. The computation of the epipole is inaccurate when the motion vectors of the feature points are induced by a camera moving in the same direction [13]. The reference point improves the accuracy of the estimated parameters and yields better motion segmentation results.
(3) A motion segmentation framework based on the parallax-based multiplanar constraint is proposed. In this framework, the parallax-based multiplanar constraint and the homography constraint are applied within the "Plane + Parallax" framework. This framework reduces the run time and makes the parallax-based multiplanar constraint applicable to real-time systems.
The paper is organized as follows. In Section 2, we briefly review the existing approaches related to our work. Section 3 formally describes the epipolar constraint and the surface degradation that the epipolar constraint is unable to handle. In Section 4, we briefly review the definition of the parallax-based rigidity constraint in [12]. We then introduce the parallax-based multiplanar constraint and its degenerate cases in Section 5. The application of the parallax-based multiplanar constraint is explained in Section 6. The experimental results are shown and discussed in Section 7. Section 8 concludes our paper and presents possible directions for future research.
2. Related Work
As described in Section 1, the methods for detecting moving objects from a moving camera are numerous. The parallax-based multiplanar constraint segments moving objects within the "Plane + Parallax" framework, which regards the image sequence as decomposable into the reference plane, the parallax, and the moving objects. Thus, motion segmentation methods based on background subtraction and motion segmentation under strong parallax are the topics most related to this paper.
The background subtraction method has a wide range of applications with static cameras [14]. A framework that segments moving objects by detecting contiguous outliers in a low-rank representation is proposed in [4, 15]; it avoids complicated motion computation by formulating the problem as outlier detection and uses low-rank modeling to handle complex backgrounds. Another method is based on Dirichlet process Gaussian mixture models, which estimate per-pixel background distributions and are followed by probabilistic regularization; using a nonparametric Bayesian method allows per-pixel mode counts to be inferred automatically, avoiding over- and underfitting [16]. These methods achieve good motion segmentation results in the absence of strong parallax.
For motion segmentation under strong parallax, sparse motion field estimation is a common approach [17, 18]. The sparse motion field of the corners is recovered, and corners that belong to the same motion pattern are classified according to their motion consistency [17, 19, 20]. Constraint equations can be applied to the optical flow to decompose the scene into background and foreground [21]. Another effective approach performs background subtraction for complex videos by decomposing the motion trajectory matrix into a low-rank part and a group-sparse part; the information from these trajectories is then used to label the foreground at the pixel level [22]. The motion segmentation approaches in [23] segment point trajectories based on subspace analysis. These algorithms provide interesting analyses of sparse trajectories, though they do not output a binary mask as many background subtraction methods do. However, most methods based on sparse motion field estimation assume that the moving object can be represented by feature points (e.g., Harris corners). This assumption is invalid in many cases, so the detection rate of these methods is poor.
3. Epipolar Constraint and Surface Degradation
The epipolar geometry is the intrinsic projective geometry between two views. It is independent of the scene structure and depends only on the cameras' internal parameters and relative pose. The epipolar constraint is usually used for motion segmentation in two views, and the fundamental matrix is the algebraic representation of epipolar geometry [1, 8].
Suppose that two images are acquired by cameras with noncoincident centers; then the fundamental matrix $F$ is the homogeneous $3 \times 3$ matrix between view 1 and view 2 that satisfies $x_2^{T} F x_1 = 0$ for all corresponding points $x_1 \leftrightarrow x_2$. If $X$ is a static 3D point, $x_1$ and $x_2$ are the projections of $X$ in image 1 and image 2, which are taken from view 1 and view 2.
If point $X$ moves between view 1 and view 2, denote its 3D position in view 2 by $X'$; $x_2'$ is then the projection of $X'$ in image 2, and the epipolar line of $x_1$ is $l_2 = F x_1$. In this case, $x_2'$ does not lie on $l_2$, and $x_1$ does not lie on $l_1 = F^{T} x_2'$. The pixel-to-line distance $d = d_1 + d_2$ is used to measure how far the point pair deviates from the epipolar lines, where $d_1$ and $d_2$ are the perpendicular distances from $x_1$ to $l_1$ and from $x_2'$ to $l_2$, respectively, as shown in Figure 1(a).

[Figure 1: (a) the pixel-to-line distances from the epipolar lines; (b) the surface degradation case.]
Furthermore, $d$ is used to decide whether the 3D point is moving: if $d$ exceeds a threshold, $X$ is declared to be moving. However, there is a special case, called "surface degradation," in which moving points cannot be detected by the epipolar constraint. Surface degradation happens when an object moves within a special plane, as illustrated in Figure 1(b). If point $X$ and the camera centers $C_1$ and $C_2$ define a plane in 3D Euclidean space and $X$ moves within this plane to $X'$, then the projections of $X'$ still lie on the epipolar lines $l_1$ and $l_2$ in the 2D images. In this situation $d = 0$ and surface degradation occurs.
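The epipolar test and its degenerate case can be sketched in a few lines of numpy; the fundamental matrix and point values below are illustrative (a pure sideways translation), not taken from the paper.

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Symmetric point-to-epipolar-line distance for a candidate match.

    x1, x2 : homogeneous image points (3-vectors) in view 1 and view 2.
    F      : 3x3 fundamental matrix mapping view-1 points to view-2 lines.
    Returns d = d1 + d2, the sum of the perpendicular distances of each
    point from the epipolar line induced by its partner.
    """
    l2 = F @ x1          # epipolar line of x1 in image 2
    l1 = F.T @ x2        # epipolar line of x2 in image 1
    d2 = abs(x2 @ l2) / np.hypot(l2[0], l2[1])
    d1 = abs(x1 @ l1) / np.hypot(l1[0], l1[1])
    return d1 + d2

# F for a pure translation along the x-axis: F = [e]_x with epipole e = (1, 0, 0)
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

# A point sliding along its own epipolar line: d = 0, the surface degradation
on_line = epipolar_distance(np.array([0.0, 0.0, 1.0]),
                            np.array([0.5, 0.0, 1.0]), F)

# A point leaving the epipolar line: d > 0, detected as moving
off_line = epipolar_distance(np.array([0.0, 0.0, 1.0]),
                             np.array([0.5, 0.2, 1.0]), F)
```

The first call illustrates exactly the degenerate case discussed above: the point moves, yet the residual is zero because the motion stays within the epipolar plane.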
Unfortunately, in many practical cases the camera follows objects moving in the same direction [12, 24], and surface degradation may then occur. To resolve it, multiview constraints need to be introduced. Therefore, in Sections 4 and 5, novel three-view constraints are proposed to segment the moving objects.
4. Parallax-Based Rigidity Constraint
The “Plane + Parallax” framework [12, 15, 25] extends the 2D parametric registration approach to general 3D scenes. The plane registration process (using the dominant 2D parametric transformation) removes all effects of camera rotation, zoom, and calibration, without explicitly computing them. The residual image motion after the plane registration is only due to the translational motion of the camera and to the deviations of the scene structure from the planar surface.
4.1. “Plane + Parallax” Framework
Figure 2 provides the geometric interpretation of the "Plane + Parallax" framework. Let $P$ denote a static 3D point, and let $P_1$ and $P_2$ denote the coordinates of $P$ in the two camera coordinate systems. Let the rotation matrix $R$ and the translation vector $T$ denote the rotation and translation between the camera systems.

Let $p_1$ and $p_2$ denote the image coordinates of the 3D point $P$ projected onto the two views, with homogeneous expressions $\tilde{p}_1$ and $\tilde{p}_2$. Let $\Pi$ be an arbitrary planar surface, and let $A$ denote the homography matrix that aligns $\Pi$ between the two views, so that $\tilde{p}_1 \cong A \tilde{p}_2$ for points on $\Pi$ [1].
Define $u = p_2 - p_1$, the 2D image displacement vector of the 3D point between the two views. It can be shown, as well, that $u = u_\pi + \mu$, where $u_\pi$ denotes the planar part of the 2D image motion and $\mu$ denotes the residual planar parallax 2D motion. When $T_z \neq 0$, $\mu = \gamma \frac{T_z}{d_\pi} (e - p_w)$, where $p_w$ denotes the image point in view 1 that results from warping the corresponding $p_2$ in view 2 by the 2D parametric transformation of the reference plane $\Pi$. The first view is referred to as the reference view. Also, $d_\pi$ is the perpendicular distance from the second camera center to the reference plane $\Pi$, and $e$ denotes the epipole. $\gamma$ is a measure of the 3D shape of the point $P$; in particular, $\gamma = H/Z$, where $H$ is the perpendicular distance from $P$ to the reference plane $\Pi$ and $Z$ is the $z$-distance of $P$ in the first camera coordinate system. We refer to $\gamma$ as the projective 3D structure of point $P$. The use of the "Plane + Parallax" framework for ego-motion estimation is described in [12], and for 3D shape recovery in [26]. The "Plane + Parallax" framework is more general than the traditional decomposition in terms of rotational and translational motion.
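The decomposition above can be illustrated by computing the residual parallax left after registering the reference plane; the homography and point values below are illustrative assumptions (a pure image translation standing in for a general plane homography).

```python
import numpy as np

def warp(H, p):
    """Apply a 3x3 homography to an inhomogeneous 2D point."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

def residual_parallax(H12, p1, p2):
    """Residual motion of a match (p1 in view 1, p2 in view 2) after
    registering the reference plane: mu = p2 - warp(H12, p1).
    Static points on the plane give mu ~ 0; static off-plane points give
    a parallax vector; independently moving points give an arbitrary mu."""
    return np.asarray(p2, dtype=float) - warp(H12, p1)

# Plane homography assumed known (here a pure image translation, for illustration)
H12 = np.array([[1.0, 0.0, 2.0],
                [0.0, 1.0, 3.0],
                [0.0, 0.0, 1.0]])

mu_plane = residual_parallax(H12, (1.0, 1.0), (3.0, 4.0))   # point on the plane
mu_off   = residual_parallax(H12, (1.0, 1.0), (3.5, 4.0))   # parallax or motion
```

The residual vector is the quantity the subsequent three-view constraints reason about; the plane registration itself has removed the rotational and planar components of the camera motion.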
4.2. Parallax-Based Rigidity Constraint
Theorem 1. Given the planar-parallax displacement vectors $\mu_1$ and $\mu_2$ of two points that belong to the static background scene, their relative 3D projective structure is given by $\frac{\gamma_2}{\gamma_1} = \frac{\mu_2^{T} (\Delta p_w)_\perp}{\mu_1^{T} (\Delta p_w)_\perp}$, where, as shown in Figure 3(a), $p_1$ and $p_2$ are the image locations of two points that are part of the static scene, the vector $\Delta p_w$ connects the "warped" locations of the corresponding points in view 2, and $(\Delta p_w)_\perp$ signifies a vector perpendicular to $\Delta p_w$ [12].

[Figure 3: (a) the relative structure computed from two parallax vectors; (b) the case of nearly parallel parallax vectors, where epipole estimation is unreliable.]
From Figure 3(a), the epipole can be recovered when the parallax vectors are well separated. When the parallax vectors are nearly parallel, however, the epipole estimation is unreliable; even so, the relative structure can still be reliably computed in this case (see Figure 3(b)).
Theorem 1 is called the "parallax-based shape constraint" and is proved in [12]. Note that this constraint directly relates the relative projective structure of two points to their parallax displacements alone: it involves no camera parameters, in particular not the epipole (FOE), which is difficult to compute accurately [27–29]. This differs from traditional methods, which use the two parallax vectors to recover the epipole and then use the magnitudes of the parallax vectors and the distances of the points from the computed epipole to estimate their relative projective structure. The benefit of constraint (5) is that it provides this information directly from the positions and parallax vectors of the two points, without going through the computation of the epipole [12].
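A minimal numpy sketch of the shape constraint follows; the parallax vectors are illustrative values chosen so that the two vectors are exactly parallel, the situation in which epipole estimation would fail but the ratio remains well defined.

```python
import numpy as np

def relative_structure(mu1, mu2, dpw):
    """gamma2 / gamma1 of two static points from their parallax vectors
    mu1, mu2 and the vector dpw joining their warped image locations,
    with no epipole computation (the parallax-based shape constraint)."""
    n = np.array([-dpw[1], dpw[0]])          # vector perpendicular to dpw
    return (mu2 @ n) / (mu1 @ n)

# Two parallel parallax vectors: their intersection (the epipole) is
# ill-conditioned, yet the relative structure is recovered directly.
ratio = relative_structure(np.array([2.0, 4.0]),
                           np.array([1.0, 2.0]),
                           np.array([1.0, 0.0]))
```

Note that no camera parameter appears anywhere in the computation, which is exactly the advantage claimed for constraint (5).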
Theorem 2. Given the planar-parallax displacement vectors of two points that belong to the static background scene over view 1, view 2, and view 3, the following constraint must be satisfied: $\mu_2^{12\,T} (\Delta p_w^{12})_\perp \, \mu_1^{13\,T} (\Delta p_w^{13})_\perp - \mu_2^{13\,T} (\Delta p_w^{13})_\perp \, \mu_1^{12\,T} (\Delta p_w^{12})_\perp = 0$, where $\mu_1^{12}$, $\mu_2^{12}$ are the parallax displacement vectors of the two points between the reference view (view 1) and view 2, $\mu_1^{13}$, $\mu_2^{13}$ are the parallax vectors between the reference view and view 3, and $\Delta p_w^{12}$, $\Delta p_w^{13}$ are the corresponding vectors between the warped points [12].
Like the parallax-based shape constraint, the parallax-based rigidity constraint (Theorem 2) relates the parallax vectors of pairs of points over three views without referring to any camera parameters. However, the parallax-based rigidity constraint assumes that the reference plane is consistent across the three views. This assumption does not always hold, since the interframe homographies are estimated automatically and the reference planes may correspond to different parts of the scene.
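The rigidity check amounts to evaluating a cross-multiplied residual that vanishes for rigid point pairs. The sketch below is a hedged illustration with hand-picked parallax vectors, not values from the paper; it assumes the cross-multiplied form of Theorem 2.

```python
import numpy as np

def perp(v):
    """Rotate a 2D vector by 90 degrees: (x, y) -> (-y, x)."""
    return np.array([-v[1], v[0]])

def rigidity_residual(mu1_12, mu2_12, dpw_12, mu1_13, mu2_13, dpw_13):
    """Left-hand side of the cross-multiplied rigidity constraint; it is
    zero when the pair of points moves rigidly with the background."""
    n12, n13 = perp(dpw_12), perp(dpw_13)
    return (mu2_12 @ n12) * (mu1_13 @ n13) - (mu2_13 @ n13) * (mu1_12 @ n12)

# Consistent pair: gamma2/gamma1 = 2 between both view pairs -> residual 0
rigid = rigidity_residual(np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 1.0]),
                          np.array([3.0, 0.0]), np.array([6.0, 0.0]), np.array([0.0, 2.0]))

# Inconsistent pair (one point moved independently) -> nonzero residual
moving = rigidity_residual(np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 1.0]),
                           np.array([3.0, 0.0]), np.array([5.0, 0.0]), np.array([0.0, 2.0]))
```

The residual depends only on image measurements, so the check needs neither the epipoles nor the camera matrices, but it does presuppose a common reference plane across the three views.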
5. Parallax-Based Multiplanar Constraint
In this work, we propose a novel three-view constraint called the "parallax-based multiplanar constraint." It can detect moving objects that the epipolar constraint cannot, and it does not require a fixed reference plane across the three views.
5.1. Description of the Parallax-Based Multiplanar Constraint
Theorem 3. The image points of a 3D point belonging to the background, observed in view 1, view 2, and view 3, must all satisfy constraint (7), which involves the relative projective structure from view 1 to view 2, the relative projective structure from view 2 to view 3, and a matrix of constraint parameters (proof: see Appendix A).
Theorem 3 is called the "parallax-based multiplanar constraint." It constrains the same background point through its relative 3D projective structures and can detect moving objects from a moving camera without a fixed reference plane across the three views. Its degenerate case is reduced from the surface degradation to a line degradation (discussed in Section 5.2).
5.2. Degradation of the Parallax-Based Multiplanar Constraint
The parallax-based multiplanar constraint uses the relative 3D projective structure from three views to detect the moving objects. This constraint is capable of detecting most of the degenerate cases mentioned in this paper. However, there still exists a degenerate case that cannot be detected.
Result 4. If the $z$-coordinates of a moving 3D point in view 1, view 2, and view 3 are all equal, then the parallax-based multiplanar constraint cannot detect this moving point (proof: see Appendix B).
Figure 4 shows the degenerate case of the parallax-based multiplanar constraint. Fortunately, such cases happen much less frequently in practice, because the required proportional relationship is rarely satisfied.

6. Application of the Parallax-Based Multiplanar Constraint
In this section, we present some implementation details of a system for detecting and tracking moving objects based on the parallax-based multiplanar constraint. As shown in Figure 5, the system is built as a pipeline of five stages: feature point matching, plane segmentation, dense optical flow, object extraction, and spatiotemporal tracking.

The system starts with feature point matching. The homography parameters and the parallax-based multiplanar constraint parameters are then estimated from the matched feature points. The plane residual image is composed of the pixels that do not satisfy the homography constraint. The motion field of the binarized plane residual image is obtained by dense optical flow, and the parallax-based multiplanar constraint distinguishes the parallax pixels from the motion pixels within the plane residual image. Finally, the 2D motion pixels obtained from each frame are linked into motion trajectories by a spatiotemporal tracking algorithm.
Kanade-Lucas-Tomasi (KLT) feature tracking [30–32] is applied to extract and track feature points in the image sequence over a temporal window; the choice of the temporal window size is discussed in Section 7.3.
The homography parameters can be estimated by the method described in [1], and each frame is warped to the reference frame by the homography matrix. Then, after estimating the background model [33, 34] (we use the single-Gaussian algorithm in this work), we obtain the binary plane residual image, composed of the pixels whose intensity differences are larger than a threshold.
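The plane registration step can be sketched end to end on a toy example. For brevity, the sketch below replaces a general homography warp with an integer-translation warp (an assumption made only for this illustration); the image contents and threshold are likewise invented.

```python
import numpy as np

def warp_translation(img, tx, ty):
    """Warp an image by an integer-translation homography (a stand-in for
    full homography warping): out(x, y) = img(x + tx, y + ty)."""
    h, w = img.shape
    out = np.zeros_like(img)
    valid = np.zeros(img.shape, dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    sx, sy = xs + tx, ys + ty
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys[ok], xs[ok]] = img[sy[ok], sx[ok]]
    valid[ys[ok], xs[ok]] = True
    return out, valid

# Frame 2 is frame 1 shifted right by 1 pixel, plus one independently moving pixel
f1 = np.tile(np.arange(8.0), (8, 1))
f2 = np.zeros_like(f1)
f2[:, 1:] = f1[:, :-1]
f2[4, 4] += 50.0                             # the "moving object"

warped, valid = warp_translation(f2, 1, 0)   # register frame 2 back onto frame 1
residual = np.abs(f1 - warped) * valid
mask = residual > 10.0                       # binary plane residual image
```

Everything consistent with the plane homography cancels in the residual; only the independently moving pixel survives the threshold, which is precisely what the plane residual image is meant to isolate.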
We choose three images from the image sequence, separated by a fixed time interval, and estimate the parallax-based multiplanar constraint parameters from the corresponding feature points. The parameters are estimated by a method similar to that used for estimating the fundamental matrix [1]; the constraint matrix is obtained by singular value decomposition. The random sample consensus (RANSAC) scheme is a common choice, which finds the solution with the largest inlier support [1].
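The largest-inlier-support idea is the same regardless of the model being fitted. As a hedged illustration, the sketch below runs RANSAC with a minimal one-point translation model on synthetic matches (a deliberately simpler model than the homography or constraint matrix used in the paper); all data values are invented.

```python
import numpy as np

def ransac_translation(p1, p2, n_iter=200, tol=1.0, seed=0):
    """RANSAC with a minimal (1-point) translation model, illustrating the
    largest-inlier-support scheme.  p1, p2 : (N, 2) matched point arrays."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(p1), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(p1))
        t = p2[i] - p1[i]                        # hypothesis from a minimal sample
        inliers = np.linalg.norm(p2 - (p1 + t), axis=1) < tol
        if inliers.sum() > best_inliers.sum():   # keep the largest support
            best_inliers = inliers
    # refit on the inlier set (least squares = mean residual for a translation)
    t_best = (p2[best_inliers] - p1[best_inliers]).mean(axis=0)
    return t_best, best_inliers

p1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0]])
p2 = p1 + np.array([2.0, 3.0])
p2[3] += np.array([7.0, -4.0])                   # one gross outlier (a moving point)
t_est, inl = ransac_translation(p1, p2)
```

In the full system the minimal sample would instead seed a homography or constraint-matrix estimate via SVD, but the hypothesize-score-refit loop is identical.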
The motion field of the pixels in the binary plane residual image can be acquired by the dense optical flow of [31, 32]. We define an algebraic error function through (7): when the error exceeds the parallax threshold, the pixel is in a motion region; otherwise, it is in a parallax region. In this way we extract the moving objects from the plane residual image and obtain the motion binary image.
The motion binary images are further refined by standard morphological operations such as erosion and dilation. Connected pixels are grouped into compact motion regions, whereas scattered pixels are removed. The tracking step takes the image appearance, 2D motion vectors, and motion likelihood of these regions as its observation and links similar regions into object trajectories. Since object tracking is not the focus of this paper, interested readers may refer to [35, 36].
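The grouping-and-pruning step can be sketched with a plain 4-connected component pass (a simple stand-in for the erosion/dilation pipeline; the mask contents and size threshold are illustrative).

```python
import numpy as np
from collections import deque

def compact_regions(mask, min_size=3):
    """Group 4-connected foreground pixels into regions and drop the
    scattered ones, keeping only compact motion regions."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    out = np.zeros_like(mask, dtype=bool)
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0, x0] and not seen[y0, x0]:
                comp, q = [], deque([(y0, x0)])  # BFS over one component
                seen[y0, x0] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) >= min_size:        # keep compact regions only
                    for y, x in comp:
                        out[y, x] = True
    return out

m = np.zeros((6, 6), dtype=bool)
m[1:3, 1:4] = True           # a 6-pixel blob (kept)
m[5, 0] = m[0, 5] = True     # scattered pixels (removed)
clean = compact_regions(m)
```

The surviving regions are what the tracking stage consumes as observations.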
7. Experiment and Analysis
In this section, we present the experimental results obtained by a number of video sequences. In all the video sequences, the camera undergoes general rotation and translation. Both the qualitative and quantitative results demonstrate the effectiveness and robustness of our method.
7.1. Qualitative Evaluation
Five video sequences have been adopted to qualitatively demonstrate the effectiveness and robustness of the parallax-based multiplanar constraint.
In Figure 6, we show the segmentation results for a video sequence captured in our laboratory. The background of the video is a checkerboard pattern containing only black and white squares, and a cylindrical object serves as the moving object, so we call this video "chessboard." This background ensures that there are enough feature points (Harris corners are used in this paper). The video is captured by a moving grayscale camera at a frame rate of 25 fps. Figures 6(a), 6(b), and 6(c) show three frames (#148, #153, and #158) of the sequence; the red points are the reference points. The camera translates from left to right. In this sequence the reference plane is the checkerboard, and two static objects produce the parallax regions. After 2D registration [1], the parallax and motion regions are obtained from the plane residual image shown in Figure 6(d), in which the two parallax regions are clearly visible. Figure 6(e) is the residual image of the parallax-based multiplanar constraint; the intensity of the motion region is greater than that of the other regions, including the parallax regions. Figure 6(f) is the binary result of the residual image, showing that the parallax regions (the two static objects) are finally eliminated.

[Figure 6: (a)–(c) frames #148, #153, and #158; (d) the plane residual image; (e) the residual image of the parallax-based multiplanar constraint; (f) the binary result.]
The second video sequence is the experimental video of [12], named "car 1"; its frame rate is 25 fps. Figure 7(a) is the original image (#17); the red points are the reference points. In this sequence the camera is in motion (translating from left to right), inducing parallax motion of different magnitudes on the house, road, and road sign, while the car moves independently from left to right. Figure 7(b) is the plane residual image and Figure 7(c) is its binary result. Because the car is followed by the camera moving in the same direction, the surface degradation of the epipolar constraint occurs, as shown in Figure 7(d). Figure 7(e) is the residual of the parallax-based multiplanar constraint computed over three frames, and the final binary result is shown in Figure 7(f). Figure 7 shows that the parallax-based multiplanar constraint reduces the surface degradation to a line degradation and segments objects followed by a camera moving in the same direction.

[Figure 7: (a) the original image (#17); (b) the plane residual image; (c) its binary result; (d) the surface degradation of the epipolar constraint; (e) the residual of the parallax-based multiplanar constraint; (f) the final binary result.]
Figure 8 shows a sequence in which the camera moves from left to right while a car moves from right to left; we call it "car 2." This video is captured by a grayscale camera at a frame rate of 25 fps. Three frames (#19, #21, and #23) are shown in Figures 8(a), 8(b), and 8(c); the red points are the reference points. In Figures 8(a) and 8(b), the green points are the corner points that are inliers of the reference plane between frames 19 and 21; in Figures 8(b) and 8(c), the blue points are the inliers of the reference plane between frames 21 and 23. The reference plane therefore changes from frame 19 to frame 23. Figure 8(d) is the plane residual image. Because of the change of reference plane, the motion regions cannot be segmented from the residual image of the parallax-based rigidity constraint, shown in Figure 8(e). Figure 8(f) is the residual image of the parallax-based multiplanar constraint. Figure 8 shows that the parallax-based multiplanar constraint outperforms the parallax-based rigidity constraint, because it does not need a fixed reference plane over the three frames.

[Figure 8: (a)–(c) frames #19, #21, and #23; (d) the plane residual image; (e) the residual image of the parallax-based rigidity constraint; (f) the residual image of the parallax-based multiplanar constraint.]
Figure 9 shows an infrared video from the VIVID dataset, captured at 30 fps by a camera mounted on an unmanned aerial vehicle. Three cars are moving on the road, so we call this sequence "cars 1"; the building constitutes the parallax region. The first row shows the original images from frame 71 to frame 77, with the red points as the reference points; the second row shows the plane residual images; the third row shows the residual of the parallax-based multiplanar constraint; and the fourth row shows the final binary results. In Figure 10, we demonstrate the potential of the parallax-based multiplanar constraint on the Berkeley motion segmentation dataset. In this video, called "car 4," a car moves on the road while the camera moves from right to left; the frame rate is 30 fps. The first row shows the original images from frame 13 to frame 17, with the red points as the reference points; the second row shows the plane residual images; the third row shows the residual of the parallax-based multiplanar constraint; and the fourth row shows the final binary results.


All of the above experiments show that the parallax-based multiplanar constraint can segment the motion regions from a "moving" background. First, unlike the homography constraint, the parallax-based multiplanar constraint can separate the parallax regions from the motion regions. Second, it reduces the surface degradation of the epipolar constraint to a line degradation and can detect objects that move in the same direction as the camera. Third, in the process of motion segmentation, it does not need a fixed reference plane across the three views. Therefore, this method can effectively extract moving objects from an uncalibrated moving camera.
7.2. Quantitative Evaluation
In order to quantitatively evaluate the performance of our system, we have manually labeled ground-truth data on the above video sequences. The ground-truth data are a number of 2D polygons in each video frame, which approximate the contours of the motion regions. For the "chessboard" and "car 2" videos, 20 frames are labeled in different parts of each sequence.
Based on the ground-truth and detected motion mask images, we define two area-based metrics to evaluate our method [37]. Let $T_i$ denote the set of pixels that belong to the ground-truth motion regions in frame $i$, and let $D_i$ denote the set of pixels actually detected in frame $i$. We define a detection rate, which evaluates how many ground-truth motion pixels are detected, as $DR_i = |T_i \cap D_i| / |T_i|$, and a precision rate, which evaluates how many detected pixels are indeed motion pixels, as $PR_i = 1 - |T_i^{c} \cap D_i| / |D_i|$, where $T_i^{c}$ is the complement set of $T_i$ and $|\cdot|$ is the number of pixels within a set. Both $DR_i$ and $PR_i$ lie in $[0, 1]$; the higher both measures are, the better the performance of the motion segmentation.
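The exact formulas are elided in the source; the sketch below uses the standard recall/precision reading of the two metrics on binary masks (mask contents are invented for illustration).

```python
import numpy as np

def detection_and_precision(truth, detected):
    """Area-based metrics over binary masks: detection rate = fraction of
    ground-truth motion pixels that are detected; precision rate = fraction
    of detected pixels that are true motion pixels."""
    tp = np.logical_and(truth, detected).sum()
    dr = tp / truth.sum()
    pr = tp / detected.sum()
    return dr, pr

truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True         # 4 ground-truth motion pixels
det = np.zeros((4, 4), dtype=bool)
det[1:3, 1:4] = True           # 6 detected pixels, 4 of them correct
dr, pr = detection_and_precision(truth, det)
```

Averaging these per-frame values over the labeled frames yields the curves compared below.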
The detection rate and precision rate are computed over the labeled video frames to evaluate the performance of our motion segmentation method. For the "chessboard" and "car 1" videos, we evaluate four motion segmentation methods: the epipolar constraint [1], the parallax-based rigidity constraint [12], detecting contiguous outliers in the low-rank representation (DECOLOR) [4], and our method.
The first and third rows of Figure 11 show the ground-truth data; the second and fourth rows show the motion segmentation results of the parallax-based multiplanar constraint for the "chessboard" video. The red points are the reference points.

We now quantitatively compare the methods using the curves of the detection rate and precision rate. In Figure 12(a), the detection rate of the epipolar constraint is low compared with the other methods because of the surface degradation. The DECOLOR method is based on the homography constraint, so the parallax regions and the motion regions are both treated as "moving objects," and its precision rate is lower than that of the other methods in Figure 12(b). Figure 12 shows that the parallax-based multiplanar constraint overcomes the surface degradation of the epipolar constraint and achieves better detection and precision rates.

[Figure 12: (a) detection rate; (b) precision rate.]
Figure 13 shows the ground-truth data and the motion segmentation results of the parallax-based multiplanar constraint for the "car 2" video, in the same layout as Figure 11.

In Figure 13, when the reference plane changes from frame 262 to frame 264, the parallax-based rigidity constraint and DECOLOR produce many false alarms, as shown in Figure 14(b). In contrast, the parallax-based multiplanar constraint segments the moving objects without a fixed reference plane, so it achieves a better precision rate.

[Figure 14: panels (a) and (b).]
7.3. Parameter Selection
There are a few parameters that are found to be critical for system performance.
The first important parameter is the temporal window size, used by the homography-based image registration to obtain the plane background. It relates to the frame rate and the magnitude of the camera motion. If it is set too small, the detection rate may decline; if it is set too large, the overlap region becomes too small to detect the moving objects, and the false alarm probability may increase owing to accumulated errors. The temporal window size is proportional to the frame rate and inversely proportional to the magnitude of the camera motion.
The second parameter, the time interval between the three chosen frames, is used for estimating the parallax-based multiplanar constraint parameters. It also relates to the frame rate and the magnitude of the camera motion: if the difference between consecutive images is small, the time interval needs to be increased to obtain a stable estimate of the constraint parameters.
The third parameter is the homography threshold. It is set to a low value to ensure that enough pixels remain for the subsequent computation, and it must be adjusted to different scene configurations so as to include all possible motion pixels as well as enough parallax pixels. However, if it is set too small, the run time may increase.
The fourth parameter is the parallax threshold, used to threshold the parallax distance when detecting the moving objects. It relates to the time interval and is proportional to it.
8. Conclusion
We have presented a novel method for detecting moving objects in video sequences captured by moving cameras. It uses multiview geometric constraints for motion detection in three or more views. The proposed parallax-based multiplanar constraint overcomes the limitations of previous geometric constraints: it does not require the reference plane to be constant across multiple views, and it reduces the surface degradation of the epipolar constraint to a line degradation, so that it can detect objects followed by a camera moving in the same direction. The experimental results demonstrate the effectiveness and robustness of our approach.
There are several promising directions for future work. An appropriate reference point could be sought automatically for computing the parallax. If the camera projection matrices are known or obtained by self-calibration techniques [1], both the static background and the moving objects could be reconstructed and aligned in 3D Euclidean space.
Appendices
A. Derivation of the Parallax-Based Multiplanar Constraint
In this appendix, we prove Theorem 3 by deriving (7).
Let $P$ be a static 3D point, and let its 3D coordinates in view 1, view 2, and view 3 be $P_1$, $P_2$, and $P_3$. Let another static 3D point $Q$ be the reference point, with 3D coordinates $Q_1$, $Q_2$, and $Q_3$ in the three views. The homogeneous image coordinates of $P$ in images 1, 2, and 3 are $p_1$, $p_2$, and $p_3$, and those of $Q$ are $q_1$, $q_2$, and $q_3$, respectively.
From Section 4, for view 1 and view 2, we know that the 3D projective structure of point and the 3D projective structure of point are
Because point $Q$ is the reference point, its 3D projective structure is the same for all image points; we can therefore treat it as a constant factor for the other points.
From (A.1), we can know that
For the 3D point , where is the normal vector of plane scaled by . is the perpendicular distance from the camera center of view 1 to the reference plane .
Substituting (A.3) in (A.2) obtains
The camera model can be represented as
Substituting (A.5) in (A.4) obtains
Similarly, we can obtain the corresponding relation for view 2 and view 3.
Let denote the third row of rotation matrix and let denote third component of translation vector . The 3D depth of point could be related to that of by extracting the third row in as
Substituting (A.5) into (A.8), we have
Substituting (A.6) and (A.7) into (A.9), we can obtain
By rewriting (A.10), we have
So we can get the parallax-based multiplanar constraint
B. Degradation of the Parallax-Based Multiplanar Constraint
In this appendix, we prove Result 4 by the algebraic approach; we describe the degradation of the parallax-based multiplanar constraint.
Let , , and denote the 3D corresponding points in the three views. Assume and, according to (A.6) and (A.7), we can get
Substituting (B.1) into (A.10) makes the left-hand polynomial vanish. Decomposing (B.2), we obtain (B.3), which holds in every case. We can therefore conclude that the parallax-based multiplanar constraint degenerates when the $z$-distances of the 3D point in the camera coordinate systems at the three time instants are equal; in this case the moving point cannot be detected.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This project is supported by the Natural Science Foundation of Jiangsu Province of China, under Grant no. BK20130769, Jiangsu Province High-level Talents in Six Industries, no. 2012-DZXX-037, and Program for New Century Excellent Talents in University, no. NCET-12-0630.