Research Article | Open Access
Exploiting Visibility Information in Surface Reconstruction to Preserve Weakly Supported Surfaces
We present a novel method for 3D surface reconstruction from an input cloud of 3D points augmented with visibility information. We observe that it is possible to reconstruct surfaces that do not contain input points. Instead of modeling the surface from input points, we model free space from visibility information of the input points. The complement of the modeled free space is considered full space. The surface occurs at interface between the free and the full space. We show that under certain conditions a part of the full space surrounded by the free space must contain a real object also when the real object does not contain any input points; that is, an occluder reveals itself through occlusion. Our key contribution is the proposal of a new interface classifier that can also detect the occluder interface just from the visibility of input points. We use the interface classifier to modify the state-of-the-art surface reconstruction method so that it gains the ability to reconstruct weakly supported surfaces. We evaluate proposed method on datasets augmented with different levels of noise, undersampling, and amount of outliers. We show that the proposed method outperforms other methods in accuracy and ability to reconstruct weakly supported surfaces.
The problem of surface reconstruction from input 3D points is well studied and different solutions are widely used in many industries. However, producing complete reconstructions of outdoor and complicated scenes is still an open problem. The input points can be obtained using different techniques. Laser scanners and structured-light based systems along with a registration method are probably the most widely used as sources of input points. Recently, images processed with a structure-from-motion (SfM) method [1–4] and a multiview-stereo (MVS) method [5–14] have become another popular source of input points. The input points can almost always (with respect to the acquisition technique) be augmented with visibility information. By visibility information of an input point, we mean that 3D position(s) of sensor(s) that produce(s) the input point is/are known. We consider input points that are close to the real surface but do not represent the real surface as noise and input points that are far from the real surface as outliers.
Surfaces that do not have strong support in the input points but represent real surfaces in the scene, that is, weakly supported surfaces, are essential for achieving complete reconstructions. Such surfaces may be transparent, highly reflective, and lacking in texture or, as in the case of the ground planes, they can be occluded by moving objects such as people and cars. weakly supported surfaces are present especially when the input points were obtained from images processed by a SfM method and a MVS method. Frequent examples are the ground planes in tourist photo collections, which are often blurred, since the cameras are mostly focused on objects of interest (facades and landmarks) and on people above the ground. Additionally, input points can contain relatively large amounts of outliers. The density of outlier points is often similar or greater than the density of points of a weakly supported surface. In this paper we propose a novel method that can reconstruct these difficult weakly supported surfaces.
2. Related Work
Here we review previous work that is most relevant to the method proposed in this paper.
Silhouette based methods are mostly related to the idea of visual hull that was introduced by Laurentini in . This technique relies on the ability to clearly separate the scene objects from the background on input images. These objects silhouettes are used to infer shapes. A more robust approach exploiting the same idea is proposed in . This approach introduces occupancy grid concept. It infers where and how likely the matter is present in the scene from given silhouette cues. The method proposed in  obtains moving occluder silhouette information from video sequences of static calibrated cameras using a background subtraction technique. A Bayesian sensor fusion formulation is used to process all occlusion cues in the multiview sequence. The method proposed in  is similar to the method proposed in  but applies state-of-the-art natural image matting algorithms for multiview sequences to obtain occluder silhouette cues. The image matting information is later merged into a 3D coordinate and a composite model is created. All these methods rely on silhouette cues information and use it to reconstruct occluders in the scene. Our method builds on another principle and does not need any occluder silhouette information.
Local depth-map filtering methods [8, 13, 19–22] alternate depth-map computation from images with depth-map filtering in order to obtain complete and outlier-less depth maps. Some of them [8, 13, 19, 20] use a visibility-based filtering approach. Another  uses buddle adjustment minimization of the camera reprojection error to filter outliers and improve accuracy. The final filtered depth maps are later processed by a depth-map fusion method, usually the method proposed in , to produce the final surface 3D reconstruction. The depth-map filtering methods are rather conservative and tend to reconstruct only those surfaces that are strongly supported by the input data. Unlike our method the local depth-map filtering methods leave the space free where the support is weak.
Volumetric depth-map fusion methods have achieved great success in recent years. Depth maps often contain many outliers and are noisy or noncomplete, especially when computed from images. Some volumetric depth-map fusion methods [24–26] are designed to be robust against outliers and to produce complete, accurate, and visibility-consistent results. Some of them  are focused to produce smooth and high quality results. Another volumetric depth-map fusion method [27, 28] can produce high quality result fast; however, it requires many and relatively high-quality depth maps. Our method falls into the group of robust volumetric depth-map fusion methods.
Space carving introduced in  produces an approximate reconstruction called photo hull, which is, under very strict conditions, guaranteed to subsume all other photo-consistent reconstructions. The assumptions are so strict that the method is useless for reconstruction of outdoor scenes. It also does not reconstruct weakly supported surfaces well since all non-photo-consistent volume is carved out. Photo flux was recently introduced in . It is closely related to photo hull but allows the recovery of finer shape details without oversmoothing while still handling noise robustly. An approach to multiview image based 3D reconstruction by statistically inversing the ray-tracing based image generation process was presented in . The method proposed in  uses octree partitioning of 3D space and computes the minimal s-t cut of a graph derived from the octree in order to label each voxel as being inside or outside. The discontinuity cost used in  is based on photo consistency (or sensor depth map). All previously mentioned methods proposed in [25, 29–31] rely on 3D reconstruction of visible, solid, well-textured, and static scenes from images. The photo consistency of the scene surface plays an important role in all these methods while non-photo-consistent parts are usually not reconstructed. The methods do not pay any particular attention to occluders and weakly supported surfaces. Moreover, the methods use voxel-based volumetric representation of the space. Since the number of voxels depends on the resolution of the volume cubically, the computational and memory costs quickly reach machine limits. For this reason, the voxel-based volumetric methods are suited for reconstruction of small and compact objects only.
The methods proposed in [24, 32, 33] work with Delaunay tetrahedralization of the input points instead of voxel-based volumetric representation. The tetrahedral-based volumetric representation suites better for large real-world reconstructions than voxel-based one. Size of tetrahedra depends just on the spatial distribution and density of input points. Therefore we have adopted the tetrahedral volumetric representation in our method.
Pioneered by , both the method proposed in  and the method proposed in  are based on a minimal s-t cut approach. In both methods, an s-t graph is derived from the Delaunay tetrahedralization of the input points in order to reconstruct the final surface. The minimal s-t cut of the graph labels each tetrahedra as “being full” with a full label or “being free” with a free label. The final surface consists of the faces shared by the tetrahedra labeled as being “free” (empty space) and the tetrahedra labeled as being “full” (full space). Therefore, the surface can be seen as an interface between the free and the full spaces. The new element in the method proposed in  with respect to the method proposed in  is that it can reconstruct weakly supported surfaces. We provide a detailed comparison of both [24, 33] methods with respect to the newly proposed method later in the text.
Contributions. The method proposed in this work is a generalization of the method proposed in . The main contribution of the newly proposed method is that it produces more accurate results than the method proposed in  and accordingly is able to reconstruct weakly supported surfaces better. We propose a new interface classifier and we justify the design of the interface classifier by experiments. We show experimentally that the number of false positives (wrongly classified non-interface points as interface) is negligible and that the number of true positives (especially on weakly supported surfaces) is considerable. We demonstrate the superior performance of the proposed method over the methods proposed in [23, 24, 33] on several different datasets and provide a quantitative evaluation.
3. Weakly Supported Surfaces Reveal Themselves through Occlusion
The main idea of improving the reconstruction of weakly supported surfaces in  is that even weakly supported surfaces exhibit themselves by occluding other input points. We next review this idea for an infinite number of noiseless and outlier-less input points augmented with visibility information.
Figure 1(a) shows a two-dimensional schematic illustration of three sensors (blue, green, and red wedges) with their respective fields of view. The sensors observe the corner of an L-shaped object. Some points of the object are not observed by any sensor (shown as the solid black line), some are observed by one sensor, some are observed by two sensors (line segments marked by AB and by BC), and some are observed by all three sensors (corner marked by ABC). For simplicity, we assume that all points of the L-shaped object that are seen by at least two sensors are on the input. Note that we are assuming continuous case in this example. Therefore, the number of input points is infinite.
Let us next add a circular object as an occluder to the scene, Figure 1(b), which has no point in the input points. We can still reconstruct the same surface of L-shaped object as in Figure 1(a), but all points are now visible from only (and exactly) two sensors. It is important to notice here that while the set of input points has not changed, the visibility information of the input points has changed. The visibility information has changed in the way that an input point that is occluded by the occluder is not seen by the sensors in which it is occluded.
Let us now introduce a new concept that conveniently captures the effect of having an occluder in the scene. For each point in the space we can construct a measure of its emptiness. We will call this measure free-space support. Consider the point marked by “3” in Figure 1(c). The emptiness of this point is supported by three observations. There are three line segments passing through this point that connect the sensor centers A, B, and C with the respective input points. Hence, the free-space support of the point is equal to three. The emptiness of the point marked by “2” is supported only by two observations. The ray from sensor C through the point 2 does not end in an input point. It ends in a point seen by only one sensor: sensor C. Hence the free-space support of this point is equal to two. Note that just points that are seen by at least two sensors are on the input.
After the introduction of the occluder into the scene, Figure 1(d), the free-space support decreases at some points because some of the rays get blocked by the occluder.
Figure 1(e) shows the space partitioned by sensor visibility cones into regions with constant free-space support, which is denoted by black numbers. Figure 1(f) shows the same after introducing the occluder. We see that a region with zero free-space support surrounded by nonzero free-space support emerged in the scene center. Such a region provides evidence of an occluder even when no point on the occluder has been on the input.
We can consider the space with nonzero free-space support to be free and the complement to be full. Therefore, evidence of the occluder interface can be detected by nonzero to zero change of the free-space support in this ideal noiseless, outlier-less, and continuous case. Additionally, the L-shaped object has zero free-space support and nonzero to zero change is evidence for the L-shaped object interface as well. Therefore, we can say that nonzero to zero change of the free-space support is interface evidence for all objects.
Hallucinations. Figure 1(f) shows that there is another nonzero to zero change of the free-space support that is not an evidence for any object. It is the zero to nonzero change of the free-space support at the boundaries of visibility cones. We call these types of evidence hallucinations. Hallucinations cause problems in datasets where the scene is not captured from all sides.
3.1. Space Discretization by Delaunay Tetrahedralization
It is impossible to compute the free-space support for all points of the space. To be practical, the space is discretized into tetrahedra constructed by the Delaunay tetrahedralization  and the free-space support is evaluated for the tetrahedra. The space discretization is constructed by tetrahedralizing the union of the input points and sensor centers. By having the sensor centers in the tetrahedralization, the space between the sensors centers and the input points is covered by the tetrahedralization.
The interface evidence can be easily detected by nonzero to zero change of the free-space support in ideal noiseless, outliers-less, and continuous case. In real-world scenarios, however, the number of input points is finite and the amount of noise and outliers can be significant. Nevertheless, we experimentally show that the interface of a scene object, weakly supported as well as strongly supported by input points, can be detected by measuring large free-space support change of nearby tetrahedra. Let us introduce several notations first.
Tetrahedralization. We denote a tetrahedron of the Delaunay tetrahedralization of the input points as .
The Unit . We introduce a constant that equals times median of all edges lengths of all tetrahedra. The value represents the smallest reconstructible object.
Input Points and Segments (Lines) of Sight. We assume that we have on the input a set of sensor centers and a set of pairs where and is an input 3D point. The set is a set of input points augmented with visibility information. We call the line segment defined by 3D points the segment of sight or simply segment. We call the line defined by 3D points the line of sight. We call input points augmented with visibility information, that is, the set , the input segments of sight.
The set of input points is denoted as . Note that one input point can be augmented (and typically is) with multiple sensor centers .
Input Point Weights. We assign to each of the input points a weight denoted by . The weight reflects the amount of input points in surrounding the point .
Free-Space Support. Free-space support is a measurement of emptiness. Roughly speaking, if a sensor with center sees a point , then the segment of sight supports the emptiness of space between and . The free-space support of tetrahedron is computed from segments intersecting as where denotes the intersection of the line of sight and the tetrahedron .
Free-Space Support on a Line of Sight. We denote free-space support on line of sight as . Figure 2 illustrates how the free-space support on a line of sight is computed. The function is a piecewise linear function that interpolates the free-space supports of tetrahedra crossed by the line of sight (bottom). Note that the domain of the function is in units . We introduce the following notations to measure extremal values of the function on specific intervals:
The function measures the maximal value of in range , that is, in distance from the point towards the sensor . The function measures the average of maximal and minimal values of in range , that is, in distance from the point away from the sensor . See Figure 2.
Free-Space Support Jump. We define the following functions to be able to measure free-space support change in close surrounding of an input point:
The function evaluates the absolute change of the free-space support on the line of sight in the surrounding the input point . The function evaluates a relative change.
The functions are chosen this way because we want to be able to detect big change in free-space evidence near some 3D point. However, measuring just relative change is not enough because relative change of small (noise) numbers can be also significant. Therefore we study combination of absolute and relative change.
Outliers Level. If a point represents a real interface point, then the value of the function represents the amount of outlier lines of sight crossing the corresponding part of full space. The function reflects average free-space evidence accumulated in the space that should be occupied by an object. Only an outlier that is seen by a sensor but should be occluded by the object can contribute to this evidence; therefore the function reflects the outliers level.
The First and the Last Crossing Tetrahedron. The first crossing tetrahedron of a line segment is denoted by . It is the tetrahedron containing point , the tetrahedron on Figure 2. We denote as the tetrahedron ( on Figure 2) that contains the point (gray point for on Figure 2).
4. Experimental Evaluation of the Free-Space Support Properties
In this section we experimentally evaluate how the free-space support of tetrahedra looks in free and full space near ground-truth weakly and strongly supported surfaces in a real-world scenario. We evaluate it under controlled levels of noise and controlled levels of undersampling. More specifically, we evaluate at discrete points . Note that the domain of the function is in units.
We evaluate the free-space support in two scenes (see Figure 3). The first scene is a synthetically created scene of a bunny on a plate. We have downloaded the publicly available bunny dataset from the Stanford 3D Scanning Repository (see http://graphics.stanford.edu/data/3Dscanrep/) and we have created the plate in a 3D modeling software. We have uniformly distributed sensors on two circles in two elevations around the bunny so that each sensor sees the whole bunny and, more importantly, each sensor is oriented so that the plate is visible all around the bunny. In another words, the whole bunny occludes a part of the plate in each view.
The plate is always strongly supported. The bunny object is represented by 160 K of input points and the plate object is represented by 2.4 M of input points. We have created three datasets from the bunny on the plate scene with the sensors:(i)original scene without undersampling and without outliers,(ii)bunny undersampled to of the original points plus 30 K outliers,(iii)bunny undersampled to of the original points plus 130 K outliers.
The second scene is a real-world scene of a “fountain.” The “fountain” dataset is a benchmark dataset  provided by Strecha et al. and contains calibrated images with a resolution of together with ground-truth laser scan. We have computed input segments of sight by MVS plane-sweeping technique described in . We consider input points that are within distance to the ground-truth laser scan to be ground-truth input points.
We denote the input segments of sight, that is, the set of the “bunny” (i–iii) and “fountain” datasets as BunnySSS, BunnyWSS, BunnyWSSO, and FountainSSS, where SSS stands for strongly supported surface, WSS stands for weakly supported surface, and WSSO stands for weakly supported surface with large amount of outliers.
Note that in all the datasets we know exactly which points are ground-truth interface points and in the “Bunny” datasets we additionally know which ground-truth points are from the bunny object surface and from the plate object surface. We also know which points are located inside or outside the object, that is, in full or free space.
In this section we use the following notation: is the set of input segments of sight that contain ground-truth interface points of the bunny without the plate (of the fountain), is the set of input segments of sight that contain input points that are located in the free space, and is the set of input segments of sight that contain input points that are located in the full space.
4.1. Free-Space Support Evaluation
In this section we evaluate at points just for the ground-truth interface points . Figure 4 shows the Matlab boxplots of values for each . It can be seen that the free-space support between the observer and the surface, for example, , is in the majority of cases relatively large compared to the free-space support for ; that is, free-space support is large in free space. Moreover, it can be seen that the free-space support is relatively small at a small distance behind the surface, ; that is, the free-space support is small in full space. This property holds for all datasets. It holds for weakly supported as well as for strongly supported surfaces. Therefore, it is reasonable to measure the free-space support jump in surrounding the input points.
Results in Figure 4 are represented by the Matlab boxplot (see http://en.wikipedia.org/wiki/Box_plot) function which shows values to quantile as a box with a horizontal line at the median. The red crosses show data beyond times the interquartile range.
4.2. Free-Space Support Jump Evaluation
Based on the experiments in the previous section, we choose and . We evaluate absolute and relative free-space support jump on ground-truth interface input points , input points located in ground-truth free space , and input points located in ground-truth full space . We introduce the two interface classifiers and in order to capture properties of the free-space support jump functions. Given an input line of sight the classifiers classify the point as being interface point or noninterface point according to the following rule: where , , , and are parameters of the classifiers and .
Figure 5 shows receiver operating characteristic curves
(ROC see (http://en.wikipedia.org/wiki/Receiver_operating_characteristic)) of the classifiers and . We evaluate classifier () on a set of segments of sight for classifier parameters , , and ().
For given () we compute the rate of the results classified positively as out of the positive examples , that is, the true positive rate. For the same () we compute the rate of the results classified positively as out of the negative examples (), that is, the false positive rate. Finally, we plot the (false positive rate, true positive rate) as point of the ROC. We have used the set of all segments of sights from all datasets; that is, BunnySSS BunnyWSS BunnyWSSO FountainSSS.
Figure 5(a) shows that the classifier can detect relatively large number, that is, of interface input points, correctly for the price of up to of wrongly classified noninterface input points. We have evaluated that, if we allow of false positives, then the parameter that gives the maximal percentage of true positives is . The classifier with this parameter gives of true positives. The smaller the parameter is, the lower the percentage of the false positives the classifier gives at the price of lower true positives; that is, a conservative approach to the parameter setting is to set a smaller value of the parameter.
Figure 5(b) shows that the classifier can be used to detect the input point that is located in the full part of the space. We have evaluated that, if we allow just of false positives for , then the parameter that yields the maximal percentage of true positives is . The classifier with this parameter gives of true positives. The larger the parameter is, the lower the percentage of false positives the classifier yields at the price of lower true positives; that is, a conservative approach to the parameter setting is to set a larger value of the parameter.
4.3. Outlier Level Evaluation
We evaluate the level of outliers in the same way we have evaluated free-space support jump in the previous section. We use the following classifier: where and are parameters of the classifier . We evaluate the classifier for .
Figure 6 shows that the classifier can be used to detect an input point that is located in the free space. We have evaluated that, if we allow just of false positives for , then the parameter that yields the maximal percentage of true positives is . The classifier with this parameter yields of true positives. The smaller the parameter is, the lower the percentage of false positives the classifier gives at the price of lower true positives; that is, a conservative approach to the parameter setting is to set a smaller value of the parameter.
5. Interface Classifier
In this section we design a new interface classifier. The classifier takes into account the free-space support jump and outliers level. Based on the experiments in the previous sections we can say that a large free-space support jump, coupled with a low outliers level in surrounding an input point, is good evidence for an interface. Given an input line of sight the following classifier classifies the point as being an interface point or a noninterface point : where , , , , and are parameters of the classifier and are constant in all of our experiments. All parameters , , , , and were discussed and evaluated experimentally in the previous section. Table 1 summarizes computed and used values. The used values were chosen using a conservative approach, that is, lower percentage of false positives at the price of lower true positives.
Table 2 shows the decision rates of the interface classifier on sets BunnySSS, BunnyWSS, BunnyWSSO, and FountainSSS} of input segments of sight. We consider the set to be positive examples (i.e., “interface”) and the set to be negative examples (i.e., “noninterface”).
Based on the experiments, we can say that, if the classifier classifies a point as INT, then it is most likely an interface point in the real world. Note that this property holds for weakly supported surfaces as well. On the other hand if the classifier classifies a point as NOI, we cannot say that it is not an interface point in the real world. In the next sections we show how we use the interface classifier to modify the state-of-the-art method , which is not capable of reconstructing weakly supported surfaces, so that it will obtain the ability to reconstruct weakly supported surfaces. Although the number of true positives is around , we will show that it is enough for the new method to reconstruct weakly supported surfaces.
6. Finding Surfaces by Solving a Graph Optimization Problem
We construct s-t graph (STG) for given Delaunay tetrahedralization of the input points in the same way as described in [24, 33]. We denote tetrahedron that corresponds to a node as . We denote oriented face of the tetrahedralization that corresponds to an oriented v-edge as . Given the graph DG and weights of all of its edges an is a partition of into two disjoint sets and where and . The is defined as follows:
The minimum s-t-cut optimization problem is the minimization of the value . There are many classical algorithms that efficiently solve this problem . Following , we cast the surface reconstruction problem as the minimal problem of the graph STG. We interpret the final sets and as that tetrahedra are labeled as being free and tetrahedra are labeled as being full. The final surface is then reconstructed as the union of oriented faces (triangles) of the STG that are on the full-free interface. The surface is therefore guaranteed to bind a volume, that is, to be watertight and self-nonintersecting.
7. Using an Interface Classifier to Reconstruct Weakly Supported Surfaces
We have introduced a way how to detect occluder just from the accumulated free-space evidence in previous sections. Now we use this tool to reconstruct them. Our idea is to modify weights of the STG such that the positively classified occluder tetrahedra will be labeled as full after the minimum s-t-cut optimization. The natural way how to do it is to set weight of corresponding t-edges to infinity. However, we do just enforce them instead. By enforcing the t-weights we mean that we enlarge the values of the weights of the t-edges.
This will force the optimization to label the occluder nodes as full and the weakly supported surface will be reconstructed accordingly. It is important to note that we do not have to enforce all occluder tetrahedra. It is enough to enforce just few. The reason is nicely explained in the paper . Therefore, besides the fact that the number of true positives of our classifier is not , it is far enough for weakly supported surfaces reconstruction.
Following , all we need to do in order to preserve the weakly supported surface is to enforce the t-weights of nodes where tetrahedra are located inside the occluder.
We proceed sequentially. In the first iteration we compute the weights of STG in the same way as in  but we use adaptive weight of input point (proposed in ). We compute the free-space support as was described in previous sections. Then we evaluate the interface classifier and, for the segments of sight classified as INT (interface), we enforce t-weight in the place we assume is inside of the object. Weight of a t-edge is enforced to
Algorithmic overview and the implementation details of the proposed method are in Section 8. Finally, in Section 9 we provide an experimental evaluation of the accuracy and the ability of reconstruction of weakly supported surfaces on synthetic as well as on real-world datasets and compare them to other methods.
7.1. Comparison to Related Work
Both the newly proposed method and the method proposed in  differ from the method proposed in  in how the STG edges weights are computed. First, the newly proposed method and the method proposed in  use adaptive weight of input point while the method proposed in  uses a constant weight for each input point . Second, the newly proposed method and the method proposed in  use an interface classifier to enforce t-weights in order to be able to reconstruct weakly supported surfaces. The method proposed in  does not define the interface classifier explicitly. However,  uses a decision rule that can be formulated as interface classifier as follows: where
If equals zero, then ; that is, if , then the free-space support of the tetrahedron equals the sum of the weights of all the incoming edges to the node .
We define on the line of sight for the function in the same way as the free-space support weight on the line of sight for the function is defined.
The major advantage of the newly proposed method with respect to the method proposed in  is that it produces much more accurate results on strongly supported surfaces and reconstructs weakly supported surfaces better as shown in experiments in Section 9.
Why Is the Newly Proposed Method More Accurate Than the Method Proposed in ? Figure 7(a) shows a typical situation where the classifier wrongly classifies the point as an interface point. While the free-space support decreases rapidly in front of the point , which is in front of a real surface point (), the classifier classifies the point as interface point and enforces the t-weight behind (red peak), which leads to less accurate result. However, proposed interface classifier searches for such point and an associated sensor for which the free-space support function is rapidly decreasing at some (reasonably) large distance in front of the point (i.e., towards the sensor ). Additionally it is reasonably small and almost constant at (reasonably) large distance behind the point (i.e., backwards the sensor ); see Figure 7(b). Afterwards, based on the positive interface classification we enforce t-weight in distance (deep enough) behind the point in order to ensure that it is enforced inside of the real object (occluder).
8. Implementation Details
We use the Computational Geometry Algorithms Library (CGAL) (see http://www.cgal.org/)  to create the tetrahedralization. The tetrahedralization is created incrementally. When adding a new point from the input point cloud we first check if there is a vertex of the actual tetrahedralization within a certain distance from . The distance is set in terms of the and reprojection distance in pixels to a value in the range in order to be able to fit the data into memory. If there is such a vertex , then we do not add to the tetrahedralization but associate the sensor with the vertex and increase by . Otherwise, we will add the point to the tetrahedralization, associate the sensor with it, and initialize to . The approach is similar to . Each vertex has a set of sensors and an value associated with it. The vertices of the tetrahedralization are considered input points, associated sensors are considered to be visibility information of input points, and is the weight of the input point .
Implementation of the proposed method is described in Algorithm 1. We build the s-t graph. Next, we compute the weights of all edges. Our approach to setting up the weights of the graph is sequential. In the first part, we compute all weights of all edges as described in Section 7 (line 2 in Algorithm 1). In the second part, we evaluate on each input point and each associated sensor the newly proposed classifier . If the classification result is , then we multiply the weight of the t-edge where by value (lines 3–8 in Algorithm 1). Table 1 shows classifier parameters, which we use in all of our experiments. Finally, we solve the minimal s-t cut problem using the software (see http://pub.ist.ac.at/~vnk/software.html) described in  (line 9 in Algorithm 1). Finally, we do two steps of a Laplacian based smoothing (line 11 in Algorithm 1) as in . We focus just on creating an initial surface, which can be later refined as in .
Table 3 shows the running times of different parts of Algorithm 1 for different datasets. Table 3 shows that the second part is to times faster than the first one . Iterations in all parts are performed parallel in our implementation. Therefore, all parts are relatively fast.
| is the time of the first part of the proposed algorithm and is the time of the second one. is the time of solving the minimal - cut problem. The times are in the format: minutes: seconds. is the number of vertices and is the number of tetrahedra in the tetrahedralization. The letter M stands for million.|
9. Experimental Results
All experiments were computed on Intel Core i7 CPU machine with NVIDIA 285GTX graphics card, 12 GB RAM, and 64-bit Windows 7 OS. We use several different calibrated datasets in this paper: “bunny,” “Lausanne,” “castle,” “herzjesu,” “fountain,” and “googleStreetView.” The “bunny” and “fountain” datasets are described in Section 4. The street view dataset “Lausanne” was provided by Christopher Strecha and contains images.
The street view data set “goolgeStreetView” is Google street view Pittsburgh experimental image set . The first camera matrices were obtained by the method proposed in . We have to point out that the quality of images (compared to the quality of images from a DLSR or compact camera) was poor in this dataset. The calibration was not optimal either because the calibration was done on spherical images where the internal parameters of each perspective camera out of Ladybug perspective cameras are unknown or because in the proposed pipeline we use perspective cutouts from one spherical image instead of original perspective images from the Ladybug device. Therefore, we consider the “googleStreetView” dataset the challenging one.
We compute the depth map for each of the () images from “Lausanne” (“googleStreetView”) dataset but to reconstruct different parts of the city we choose 3D points from different volumes parts defined by boxes in order to fit into memory.
The “castle,” “herzjesu,” and “fountain” datasets are the benchmark datasets . The “castle” dataset contains images, the “herzjesu” dataset contains images, and the “fountain” data set contains images.
9.1. Approach to Comparison with Ground Truth
Let us describe how we compare a reconstruction to a ground truth. We generate two depth maps for computed 3D mesh and for the ground-truth mesh for each sensor. The depth value for a pixel of a depth map is denoted as . We evaluate accuracy value for each sensor and for each pixel .
We compute a histogram from all values as follows: bins to represent the histogram of accuracy values of all pixels and all sensors where and are defined; that is, . Bin is the number of pixels where and bin is the number of pixels where the ground-truth depth map is defined but the reconstructed depth map is not defined; that is, .
Therefore, the more accurate result has a higher number in the first bins. The first two bins are authoritative because they contain the majority of the points. We call the aforementioned histogram occupancy histogram. We call the value of a bin occupancy at or occupancy at bin .
9.2. Quantitative Analysis
In this section we provide a quantitative analysis of results of proposed, former, baseline, and Poisson method with respect to the ground truth.
Robustness to Outliers. Figure 8 shows robustness of evaluated methods to outliers. It shows that proposed, former, and baseline methods are robust in the presence of 25 K and 130 K randomly distributed outliers in manually selected cube around the bunny. The Poisson method fails completely when 130 K of outliers were added to the scene.
(a) Bunny (160 K) points plus 0, 25 K and 130 K outliers
(b) Poisson 
(c) Baseline 
(d) Former 
Robustness to Undersampling and Outliers. Figure 9 shows the robustness of evaluated methods to undersampling and outliers. We have randomly undersampled just the bunny object (the plate object was not undersampled) to and . We have randomly distributed 130 K and 1 M outliers in manually selected cube around the bunny. Figure 8 shows that Poisson method has completely failed in all cases. The baseline method was successful just in the first case where the surface is relatively strongly supported by the input points and the outliers level is lower than the surface sampling by the input points. The former method can reconstruct the weakly supported object better than the baseline method; however, it is not perfect in all cases. On the other hand, the proposed method is stable in all cases and it is able to reconstruct weakly supported surfaces perfectly. This experiment also shows that while the TP classification rate showed in Table 2 of the proposed interface classifier (Section 5) is around it is still enough to reconstruct weakly supported surfaces.
(a) Input points (percent of bunny points, outliers)
(b) Poisson 
(c) Baseline 
(d) Former 
Robustness to Noise. Figure 10 shows robustness of evaluated methods to noise. We randomly noisify the bunny input points with noise with normal distribution in surrounding ground input points. Next, for each noise level, we reconstruct the scene using a method and we compute the occupancy histogram of the reconstruction (see Section 9.1). We take a bin from the occupancy histogram for each noise level and plot it as a piecewise linear occupancy function of the method at bin . The occupancy function shows how the accuracy of a method at a certain bin of the occupancy histogram changes with the increasing level of noise. Figures 10(a), 10(b), 10(c), and 10(d) show the occupancy functions of Poisson, baseline, former, and proposed methods at bin (,,).
Figure 10(a) shows that the occupancy of Poisson and former methods at bin decreases rapidly with increasing noise level while the occupancy of the baseline and the proposed methods at bin decreases slowly and is almost at the same level.
Figures 10(b) and 10(c) show that the occupancy of Poisson and former methods at bins increases slowly with increasing noise level while the occupancy of the baseline and the proposed methods at bins 2 and 3 increases rapidly almost identically.
Note that the more slowly the occupancy at bin decreases and the more slowly the occupancy at bins ≥ increases, the more accurate the reconstruction is.
We have experimentally shown that the proposed method is more accurate compared to the former method. Additionally, we showed that the proposed method produces the results at the same accuracy level as the baseline method and both the proposed and the baseline methods are much more accurate than the Poisson method with increasing the level of noise.
9.3. Accuracy Evaluation on Real-World Dataset
Figure 11 shows renderings and occupancy histograms of reconstructions using baseline, former, and proposed method with respect to a ground-truth laser scan. The laser scan, images, and calibrations were provided by Strecha et al. and downloaded from their evaluation page . We have used two sources of point clouds as input to the reconstruction methods. The first one was generated by PMVS2 software (PMVS2 is software developed by Furukawa and Ponce ). The second one was generated by the plane-sweeping method described in .
Figures 11(a), 11(b), and 11(c) show the rendering of a ground-truth laser scan, the rendering of the reconstruction of proposed method on input points generated by PMVS2, and the rendering of reconstruction of proposed method on input points computed from the plane-sweeping method described in . It shows that the proposed method works for input points generated by different MVS based methods.
Figure 11(d) shows the occupancy histograms of reconstructions of baseline, former, and proposed methods on  input points. Figure 11(e) shows the occupancy histograms of reconstructions of baseline, former, and proposed methods on PMVS2 input points. Figures 11(d) and 11(e) show that baseline and proposed methods produce more accurate results than the former method. Additionally, it shows that the proposed method produces the results at the same accuracy level as the baseline method.
9.4. Evaluation on Middlebury Datasets
Figure 14 shows results of reconstruction of the “templeRing” and “dinoRing” datasets using the method proposed in this paper. Note that we have not used silhouette images at all.
Table 4 shows quantitative evaluation comparing our results with the laser-scanned ground truth. The accuracy values for the “templeRing” and “dinoRing” meshes are 0.53 mm and 0.4 mm, respectively. More remarkably, the completeness measures are and . The most relevant method evaluated in the Middlebury webpage is the method proposed in . The method  first reconstructs the scene using the “baseline” method and then it refines the reconstruction using a final mesh refinement step. It is important to note that we do not use any final mesh refinement step in the proposed method. Nevertheless, the proposed method is among the best methods in the evaluation page. Of course, the results will vary if a different depth-map estimation method is employed.
9.5. Results of Proposed Method on Real-World Datasets
Figures 13 (bottom row) and 12 demonstrate that the proposed method reconstructs weakly supported surfaces better than the baseline method. The weakly supported surface in Figures 13 and 12 is mostly the ground plane. Figure 13 (first two rows) shows screen shots of other results using the proposed method where the weakly supported ground plane is reconstructed.
10. Conclusion and Future Work
We have presented a new surface reconstruction method that can reconstruct surfaces strongly supported by input points with the same accuracy as the state-of-the-art methods and moreover it can reconstruct weakly supported surfaces (e.g., low-textured walls, windows, cars, and ground planes). We assume calibrated sensors and input points augmented with visibility information of sensors on the input. We introduce an observation that it is also possible to reconstruct a surface that does not contain input points and we demonstrate it in an example. We assume an infinite number of ideal noiseless, outliers-less input points in the example. We accumulate free-space support from visibility information of input points and we show that nonzero free-space support is the evidence for free space and the zero free-space support is the evidence for full space or hallucination in the example. Therefore, the nonzero to zero change is the evidence of a surface (interface). Based on this observation, we define and study free-space support of tetrahedra in tetrahedralization of (real-world) input points that can be noisy and contain huge amount of outliers, and the number of input points is finite. We design an interface classifier based on the experiments. We experimentally show that the number of false positives (wrongly classified noninterface point as interface) is negligible and that the number of true positives (especially on weakly supported surfaces) is considerable. Finally, we propose a new method that extends an existing state-of-the-art (baseline) method by using an interface classifier, so that the existing state-of-the-art method gains the ability to reconstruct weakly supported surfaces. The newly proposed method strongly follows the existing (former) method that introduces the problem of weakly supported surfaces and is able to reconstruct them. However, we discuss and experimentally show that the newly proposed method produces more accurate results and reconstructs weakly supported surfaces better.
We introduce an observation that it is also possible to reconstruct an occluder surface that does not contain input points and we have proposed and evaluated methods that trade this observation in real-world scenario. However, to be practical, the method assumes that the occluder surface is weakly covered by the input points. A possible future work would be to try to use this observation in cases when the occluder is not covered by the input points at all. One of the possible ways would be to estimate possible parts of occluder presence in the space and introduce a new helper points into the tetrahedralization. It would be also possible to work with another volumetric representation of the space, that is, voxels, and so forth. Furthermore, we think that expanding our idea to the aerial reconstructions where facades are usually weakly supported would be nice direction of future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work has been supported by FP7-SPACE-2012-312377 PRoViDE and by TACR TA02011275 ATOM.
- R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2003.
- N. Snavely, S. M. Seitz, and R. Szeliski, “Modeling the world from Internet photo collections,” International Journal of Computer Vision, vol. 80, no. 2, pp. 189–210, 2008.
- C. Wu, S. Agarwal, B. Curless, and S. M. Seitz, “Multicore bundle adjustment,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 3057–3064, June 2011.
- C. Wu, “Towards linear-time incremental structure from motion,” in Proceedings of the International Conference on 3D Vision, 2013.
- S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction algorithms,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 519–526, June 2006.
- C. Strecha, W. Von Hansen, L. Van Gool, P. Fua, and U. Thoennessen, “On benchmarking camera calibration and multi-view stereo for high resolution imagery,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
- R. Collins, “A space-sweep approach to true multi-image matching,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '96), 1996.
- M. Pollefeys, D. Nistér, J.-M. Frahm et al., “Detailed real-time urban 3D reconstruction from video,” International Journal of Computer Vision, vol. 78, no. 2-3, pp. 143–167, 2008.
- H. Hirschmüller, “Stereo processing by semiglobal matching and mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.
- C. Strecha, R. Fransens, and L. Van Gool, “Wide-baseline stereo from multiple views: a probabilistic account,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. I552–I559, July 2004.
- C. Strecha, R. Fransens, and L. Van Gool, “Combined depth and outlier estimation in multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2394–2401, June 2006.
- G. Vogiatzis, C. Hernández, P. H. S. Torr, and R. Cipolla, “Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2241–2246, 2007.
- Y. Furukawa and J. Ponce, “Accurate, dense, and robust multiview stereopsis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 8, pp. 1362–1376, 2010.
- D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
- A. Laurentini, “Visual hull concept for silhouette-based image understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 150–162, 1994.
- J.-S. Franco and E. Boyer, “Fusion of multi-view silhouette cues using a space occupancy grid,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 1747–1753, October 2005.
- L. Guan, J.-S. Franco, and M. Pollefeys, “3D occlusion inference from silhouette cues,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
- H. Kim, M. Sarim, T. Takai, J. Y. Guillemaut, and A. Hilton, “Dynamic 3d scene reconstruction in outdoor environments,” in Proceedings of the 5th International Symposium 3D on Program Set & Call for Participation, 2010.
- Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Towards internet-scale multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1434–1441, June 2010.
- M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz, “Multi-view stereo for community photo collections,” in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007.
- E. Tola, V. Lepetit, and P. Fua, “DAISY: an efficient dense descriptor applied to wide-baseline stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 815–830, 2010.
- J. Li, E. Li, Y. Chen, L. Xu, and Y. Zhang, “Bundled depth-map merging for multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2769–2776, June 2010.
- M. Kazhdan, M. Bolitho, and H. Hoppe, “Poisson surface reconstruction,” in Proceedings of the Eurographics Symposium on Geometry Processing (SGP '06), 2006.
- P. Labatut, J.-P. Pons, and R. Keriven, “Robust and efficient surface reconstruction from range data,” Computer Graphics Forum, vol. 28, no. 8, pp. 2275–2290, 2009.
- C. Hernández, G. Vogiatzis, and R. Cipolla, “Probabilistic visibility for multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
- C. Zach, T. Pock, and H. Bischof, “A globally optimal algorithm for robust TV-L1 range image integration,” in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007.
- B. Curless and M. Levoy, “A volumetric method for building complex models from range images,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (SIGGRAPH '96), 1996.
- R. A. Newcombe, S. Izadi, O. Hilliges et al., “KinectFusion: real-time dense surface mapping and tracking,” in Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR '11), pp. 127–136, October 2011.
- K. N. Kutulakos and S. M. Seitz, “Theory of shape by space carving,” International Journal of Computer Vision, vol. 38, no. 3, pp. 199–218, 2000.
- Y. Boykov and V. Lempitsky, “From photohulls to photoflux optimization,” in Proceedings of the British Machine Vision Conference (BMVC '06), 2006.
- S. Liu and D. B. Cooper, “Ray Markov random fields for image-based 3D modeling: model and efficient inference,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1530–1537, June 2010.
- S. N. Sinha, P. Mordohai, and M. Pollefeys, “Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh,” in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007.
- M. Jancosek and T. Pajdla, “Multi-view reconstruction preserving weakly-supported surfaces,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 3121–3128, June 2011.
- M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications, Springer, 2nd edition, 2000.
- V. H. Hiep, R. Keriven, P. Labatut, and J.-P. Pons, “Towards high-resolution large-scale multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1430–1437, June 2009.
- Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
- CGAL . Cgal, “Computational Geometry Algorithms Library,” 2013, http://www.cgal.org/.
- P. Labatut, J.-P. Pons, and R. Keriven, “Efficient multi-view reconstruction of large-scale scenes using interest points, delaunay triangulation and graph cuts,” in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007.
- Google street view pittsburgh experimental data set, In: Google Cityblock Research Dataset V.1.7. 2008.
- A. Torii, M. Havlena, and T. Pajdla, “From google street view to 3d city models,” in Proceedings of the 9th IEEE Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras (OMNIVIS '09), 2009.
- M. Jancosek and T. Pajdla, “Hallucination-free multi-view stereo,” in Trends and Topics in Computer Vision: ECCV 2010 Workshops, Heraklion, Crete, Greece, September 10-11, 2010, Revised Selected Papers, Part II, vol. 6554 of Lecture Notes in Computer Science, pp. 184–196, 2012.
Copyright © 2014 Michal Jancosek and Tomas Pajdla. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.