Abstract

Flat surface detection is one of the most common geometry inferences in computer vision. In this paper we propose a method for distinguishing printed photos from original scenes that fully exploits the angular information of the light field and the characteristics of a flat surface. Unlike previous methods, our method does not need a prior depth estimation. The algorithm first rectifies the disordered epipolar lines in the epipolar plane image (EPI). Feature points are then extracted from the light field data and used to compute an energy ratio in the depth distribution of the scene. Based on the energy ratio, a feature vector is constructed, from which we obtain robust detection of flat surfaces. Apart from flat surface detection, the core rectification algorithm in our method can be extended to inclined plane refocusing and continuous depth estimation for flat surfaces. Experiments on public datasets and our own collections demonstrate the effectiveness of the proposed method.

1. Introduction

With the rapid development of light field theory [1, 2], light field cameras such as Lytro [3] and Raytrix [4] are now available for consumer and industrial use. Unlike a 2D image captured by a traditional camera, a light field camera records extra angular information about the real world, which opens up more possibilities for many traditional computer vision tasks [5–8].

Flat surface detection is a prominent task of this kind: it infers planar structure from natural scenes. One potential application is the detection of printed photos in face-based verification [9]. Face identification has been widely applied in industry. However, a common weakness of such systems is that they cannot tell whether the face is a real one or just a printed photo of the authorized face. The main reason is the loss of depth information when the camera records the real world. Traditional methods either assume that printed faces contain detectable texture patterns or require user interaction to solve this problem [10]; these methods are unreliable or inefficient. Depth estimation [11] before authentication is another option, but it brings problems of its own, such as occlusion [12] and shading [13].

In this paper, we analyze the variant and invariant features of a flat surface in the EPI representation and propose an algorithm that detects the flat surface without depth estimation, fully exploiting the angular information of the light field and the characteristics of a flat surface. Our main contributions are as follows:
(i) An algorithm to rectify the EPI for a flat surface, which can also be extended to other tasks such as refocusing on an inclined plane.
(ii) A framework to detect the flat surface in a light field by a two-stage algorithm without depth estimation.

The rest of this paper is organized as follows. In Section 2, we review the background and previous work on flat surface detection. In Section 3, the algorithm is described in detail. Experimental results are given in Section 4. Our conclusion and future work are presented in Section 5. Proofs of the related properties are provided in the Appendix.

2. Background and Related Work

A light field is a function defined in 4D space [1] that describes light rays in the physical world, with two coordinates describing the distribution of light in the spatial domain and two describing it in the angular domain. Under the two-plane parametrization, fixing one spatial dimension and one angular dimension yields the EPI. For each point in the real world there is a corresponding epipolar line in the EPI, and the slope of this line has a linear relationship with the depth of the point.
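
As an illustration of how an EPI arises from the 4D data, the following minimal sketch fixes one spatial and one angular index of a toy light field. The array layout `lf[v][u][y][x]`, the function names, and the toy scene are assumptions made for this example and are not taken from the paper.

```python
# Minimal sketch: slice a horizontal EPI out of a 4D light field.
# Assumption: the light field is stored as nested lists indexed lf[v][u][y][x],
# where (u, v) are angular and (x, y) are spatial coordinates.


def extract_epi(lf, y, v):
    """Fix the spatial row y and the angular row v; return the slice epi[u][x]."""
    return [list(lf[v][u][y]) for u in range(len(lf[v]))]


def make_toy_light_field(n_v, n_u, n_y, n_x, disparity):
    """Toy scene at a single depth: view u sees the base pattern shifted
    by u * disparity pixels along x."""

    def pattern(x, y):
        return (3 * x + 7 * y) % 11

    return [[[[pattern(x - u * disparity, y) for x in range(n_x)]
              for y in range(n_y)]
             for u in range(n_u)]
            for v in range(n_v)]


lf = make_toy_light_field(n_v=2, n_u=3, n_y=4, n_x=8, disparity=1)
epi = extract_epi(lf, y=2, v=0)  # 3 views (rows) by 8 pixels (columns)
```

In this toy scene every point lies at one depth, so each row of the slice is the previous row shifted by the same amount, and all epipolar lines share one slope.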

Although flat surface detection is a basic problem in computer vision and has been researched for decades, it is still not well developed. Most techniques detect flat surfaces using a prior depth estimation. Zhao et al. [11] proposed to detect flat surfaces using the disparity map of the scene; however, this method depends on the accuracy of the disparity estimation and is sensitive to its alignment errors. Raghavendra et al. [15] proposed a similar strategy: instead of the disparity map used in [11], they obtain a rough depth map by using a focus measure. This method suffers from the same problem, namely, the inaccuracy of disparity estimation. Ghasemi and Vetterli [14] proposed to extract an energy feature vector based on the change of gradient of the EPI and to distinguish flat surfaces from nonflat ones by using a Bayes classifier. The method computes the slope of all epipolar lines and then takes the variance, so it is still a depth-based method.

Unlike the traditional “depth map to plane fitting” strategy [11, 15, 16], we detect Lisad-1 feature points [17] in the 4D light field and then fit the function of the flat surface directly from several of these feature points; finally, the fitted function is tested against several other feature points. If the scene is a flat surface, the function built from the first set of feature points should also fit the other points; if the scene is not flat, it should not.

3. The Proposed Approach for Detecting Flat Surface

It is well known that, for a flat surface parallel to the camera plane, all epipolar lines in the EPI have the same slope, since they lie at the same depth. However, this invariance no longer holds when the flat surface is tilted. By analyzing the properties of the flat surface function, we notice that, no matter what angle the flat surface is tilted to, the difference of slope stays the same (this property is discussed in Section 3.3 and its proof can be found in the Appendix). Based on this invariant property, we first rectify the slopes of the epipolar lines in the EPI to a single value and then detect the Lisad-1 feature points [17] of the rectified EPI. The most important advantage of Lisad-1 feature points is that each one comes with its slope and that, because they are salient points, the extraction of their depth does not suffer from occlusion or shading problems. The slope of each feature point ought to equal the target slope if the plane function is correct. Finally, we combine the energy ratios from different EPIs into an energy vector and employ a classifier to distinguish flat surfaces from natural scenes.

3.1. Epipolar Plane Image Rectification

On a flat surface, the depth of a point has a linear relationship with its 2D image coordinates. When the point lies in an EPI with one spatial coordinate fixed, its depth can be expressed as a linear function of the remaining spatial coordinate alone (Figure 1); this is Equation (1). The slope of each point is then given by Equation (2), which relates the disparity of the point, the slope of the point in the EPI with the fixed spatial coordinate, the focal length of the lens, and the baseline between two views. As the two camera parameters (focal length and baseline) are constants, they are ignored, and Equation (2) can be rewritten in terms of the linear function alone.
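
The relationship described above can be illustrated numerically. The sketch below assumes, following the text, that depth along one EPI row is a linear function of the free spatial coordinate, written here as `k * x + b`, and that disparity equals focal length times baseline divided by depth. The symbol names and the unit camera constants are illustrative assumptions, and whether the EPI slope is the disparity or its reciprocal depends on the axis convention in use.

```python
# Minimal sketch of the depth/disparity relation along one EPI row.
# Assumptions: depth(x) = k * x + b (names k and b are illustrative),
# disparity = focal_length * baseline / depth, camera constants default to 1.


def depth_on_line(x, k, b):
    """Depth of the surface point seen at column x of the EPI row."""
    return k * x + b


def disparity(x, k, b, focal_length=1.0, baseline=1.0):
    """Pixel shift between neighbouring views for the point at column x."""
    z = depth_on_line(x, k, b)
    if z <= 0:
        raise ValueError("depth must be positive")
    return focal_length * baseline / z


# A plane parallel to the camera (k = 0) gives one disparity for all columns.
fronto_parallel = [disparity(x, k=0.0, b=2.0) for x in range(5)]
# A tilted plane (k != 0) gives a disparity that changes with the column.
tilted = [disparity(x, k=0.5, b=2.0) for x in range(5)]
```

With `k = 0` every column has the same disparity, which is the parallel-plane case in which all epipolar lines share one slope; with `k != 0` the disparity varies along the row, which is what the rectification has to undo.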

Once the linear function is determined, the slope of each point in the EPI is known, and the slopes of the epipolar lines in the EPI can be normalized to a single value. Consider a point in the original EPI whose slope is given by the linear function above. If the slope we wish to normalize to is fixed (it is called the target slope hereafter), the shearing moves this point to a target point whose position is determined by the original slope and the target slope.

In brief, for each point in the rectified EPI, the corresponding point in the original EPI can be computed from the line function and the target slope. Figure 2 illustrates the rectification procedure, and Figure 3 gives an example of an original EPI and its rectified version. Only one slope remains in the EPI after rectification.
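
A minimal sketch of such a backward-mapping rectification is given below. It is one possible realization under stated assumptions, not the paper's exact formula: an epipolar line that crosses the reference view `u = 0` at column `x0` is assumed to sit at `x0 + u * d(x0)` in view `u`, with `d(x0) = 1 / (k * x0 + b)`; the rectified line sits at `x0 + u * target`. The names `k`, `b`, `target`, and the synthetic test scene are hypothetical.

```python
import math


def disparity(x0, k, b):
    """Assumed model: depth k * x0 + b on the plane, disparity = 1 / depth
    (camera constants dropped)."""
    return 1.0 / (k * x0 + b)


def sample(row, x):
    """Linearly interpolate a row at real-valued x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)
    t = x - i
    return (1.0 - t) * row[i] + t * row[i + 1]


def rectify_epi(epi, views, k, b, target):
    """For every rectified pixel (u, xp), follow its line back to u = 0 using
    the target disparity, then forward in the original EPI using d(x0)."""
    out = []
    for u, row in zip(views, epi):
        new_row = []
        for xp in range(len(row)):
            x0 = xp - u * target                  # crossing point at u = 0
            x_src = x0 + u * disparity(x0, k, b)  # same line, original EPI
            new_row.append(sample(row, x_src))
        out.append(new_row)
    return out


def texture(x0):
    """Smooth toy texture painted on the plane."""
    return math.sin(0.3 * x0)


def synth_epi(width, views, k, b):
    """Render a toy EPI of a tilted plane by solving x = x0 + u * d(x0)
    for x0 with bisection (the mapping is monotone for these parameters)."""
    epi = []
    for u in views:
        row = []
        for x in range(width):
            lo, hi = -10.0, width + 10.0
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if mid + u * disparity(mid, k, b) < x:
                    lo = mid
                else:
                    hi = mid
            row.append(texture(0.5 * (lo + hi)))
        epi.append(row)
    return epi


views = [-2, -1, 0, 1, 2]
original = synth_epi(64, views, k=0.02, b=1.0)
rectified = rectify_epi(original, views, k=0.02, b=1.0, target=0.0)
```

With `target=0.0` every row of `rectified` approximates the same texture away from the image borders, so all epipolar lines become vertical. A nonzero target would instead leave every line with that single common disparity.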

3.2. Line Function Determination

By substituting the depth range and the image size into (1), four constraints on the two parameters of the line function are obtained. Solving these constraints gives the search space of the parameters (labeled in red in Figure 4). The set of candidate parameter pairs is then generated by dividing the search space into a discrete grid.

We rectify the EPI using each candidate pair of line parameters. A Lisad-1 feature [17] is a local extremum in the scale-depth space obtained by convolving the EPI with scale-varying kernels. As each Lisad-1 feature point provides slope information, we extract the feature points and compute the energy ratio: the number of feature points in the rectified EPI whose slope equals the target slope, divided by the total number of Lisad-1 feature points extracted from the rectified EPI.

For a flat surface, if the line parameters are known, all epipolar lines should share the target slope after rectification (see Figure 3). We therefore select as optimal the parameter pair that yields the largest ratio.
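
The selection step can be sketched as a plain grid search over candidate parameter pairs. In the sketch below the Lisad-1 extraction is not implemented; `toy_feature_slopes` is a hypothetical stand-in that returns the slopes one would measure after rectifying with a candidate pair, and the tolerance `tol` is likewise an assumption, since the text does not state how slope equality is tested.

```python
# Minimal sketch: energy ratio and grid search over line parameters.
# The feature-slope callback is a hypothetical placeholder for
# "rectify the EPI with these parameters, then extract Lisad-1 features".


def energy_ratio(slopes, target, tol=1e-3):
    """Fraction of feature points whose slope matches the target slope."""
    if not slopes:
        return 0.0
    hits = sum(1 for s in slopes if abs(s - target) <= tol)
    return hits / len(slopes)


def best_parameters(candidates, feature_slopes, target, tol=1e-3):
    """Return the candidate pair with the largest energy ratio, and that ratio."""
    best, best_ratio = None, -1.0
    for params in candidates:
        ratio = energy_ratio(feature_slopes(params), target, tol)
        if ratio > best_ratio:
            best, best_ratio = params, ratio
    return best, best_ratio


TARGET = 0.0
TRUE_PARAMS = (0.02, 1.0)          # toy ground truth (k, b)
FEATURE_COLUMNS = [0, 10, 20, 30, 40]


def toy_feature_slopes(params):
    """Toy model: the residual slope of a feature is the difference between
    its true disparity and the disparity predicted by the candidate."""
    k, b = params
    k_true, b_true = TRUE_PARAMS
    return [TARGET + 1.0 / (k_true * x + b_true) - 1.0 / (k * x + b)
            for x in FEATURE_COLUMNS]


grid = [(k, b) for k in (0.0, 0.01, 0.02, 0.03) for b in (0.8, 1.0, 1.2)]
chosen, ratio = best_parameters(grid, toy_feature_slopes, TARGET)
```

In this toy setting only the correct pair brings every feature to the target slope, so it wins the search with a ratio of one; wrong candidates align at most a few features by coincidence.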

3.3. The Proposed Strategy

In practice, the ratio of a single EPI cannot be used to detect the flat surface, since it is too local to represent the whole scene. We use the following two properties to solve this problem:
(i) If the scene is a flat surface, the coefficient of the line function is the same constant for every choice of the fixed spatial coordinate.
(ii) If the scene is a flat surface, the constant term of the line function has a linear relationship with the fixed spatial coordinate.

These two properties follow directly from the plane function (see the Appendix). Based on them, we formulate our strategy as a plane fitting stage followed by a feature extracting stage.

In the plane fitting stage, the plane function (the common coefficient of the line function and the linear function relating its constant term to the fixed spatial coordinate) is fitted from the line parameters calculated for several EPIs. In the feature extracting stage, the line parameters of each further EPI are computed from the plane function. If the scene is a flat surface, the plane function built from the first set of EPIs should also fit the other EPIs; that is, the slope in all rectified EPIs should equal the target slope. If the scene is not flat, this does not hold. This is the core idea of our strategy, which is detailed in Algorithm 1.

Algorithm 1
Input: light field LF, slope range, number of training EPIs, number of testing EPIs.
Output: the energy ratio of the feature points whose slope equals the target slope among all feature points extracted from the rectified EPIs.
/* Plane fitting stage */
  Choose the training EPIs from LF at random.
  Calculate the best line parameters of each training EPI using the algorithm of Section 3.2.
  Select the most frequent coefficient among the training EPIs as the common coefficient, and fit the linear function relating the constant term to the fixed spatial coordinate using the EPIs whose coefficient equals the common one.
/* Feature extracting stage */
  Choose the testing EPIs from LF at random.
  Calculate the line parameters of each testing EPI from the common coefficient and the fitted linear function.
  Rectify each testing EPI with its line parameters using the algorithm of Section 3.1.
  Extract the Lisad-1 feature points of all rectified EPIs.
  Compute, for each EPI, the ratio of feature points whose slope equals the target slope, and construct the feature vector by arranging these ratios in descending order.
  Employ a classifier to distinguish flat surfaces from natural scenes.
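
The bookkeeping of the two stages can be sketched as follows. This is a minimal illustration under assumptions: the per-EPI line parameters are taken as already estimated (Section 3.2), the names `k` and `b` for the coefficient and the constant term are hypothetical, the fit is an ordinary least-squares line, and exact equality of coefficients is used because the candidates come from a discrete grid.

```python
from collections import Counter


def fit_plane_function(rows, params):
    """Plane fitting stage.
    rows:   fixed spatial coordinate of each training EPI
    params: best (k, b) pair found for each training EPI
    Returns the common k and the line b = slope * row + intercept."""
    common_k = Counter(k for k, _ in params).most_common(1)[0][0]
    # Keep only the EPIs that agree with the most frequent coefficient.
    pts = [(y, b) for y, (k, b) in zip(rows, params) if k == common_k]
    n = len(pts)
    mean_y = sum(y for y, _ in pts) / n
    mean_b = sum(b for _, b in pts) / n
    var = sum((y - mean_y) ** 2 for y, _ in pts)
    cov = sum((y - mean_y) * (b - mean_b) for y, b in pts)
    slope = cov / var if var else 0.0
    return common_k, slope, mean_b - slope * mean_y


def predict_parameters(row, common_k, slope, intercept):
    """Feature extracting stage: line parameters of a new EPI."""
    return common_k, slope * row + intercept


def feature_vector(ratios):
    """Energy ratios of the testing EPIs, arranged in descending order."""
    return sorted(ratios, reverse=True)


train_rows = [10, 20, 30, 40, 50]
train_params = [(0.02, 1.1), (0.02, 1.2), (0.03, 0.9), (0.02, 1.4), (0.02, 1.5)]
common_k, slope, intercept = fit_plane_function(train_rows, train_params)
k_new, b_new = predict_parameters(60, common_k, slope, intercept)
vector = feature_vector([0.2, 0.9, 0.5])
```

The third training EPI disagrees on the coefficient and is left out of the fit, which mirrors the step that keeps only the EPIs whose coefficient equals the common one.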

3.4. Extension to Inclined Plane Refocusing

The traditional method achieves refocusing by shearing the EPI [3, 18]. Every point is displaced by the same amount, because the plane to be refocused on is parallel to the camera plane and all of its points therefore lie at the same depth. The sheared EPI is thus obtained from the input EPI by a shear with a single value.

Under the framework of our algorithm, the line function of each EPI is available after the plane fitting stage, so we shear each point in the EPI by a different displacement according to the line function. The sheared EPI is thus obtained from the input EPI by a shear governed by the two parameters of the line function.

In other words, we only need to set the target slope to the value at which the epipolar lines are perpendicular to the horizontal axis. Two refocusing results of our method are shown in Figure 5. As the data we captured is a 3D light field (one angular dimension), there is defocus blur only in the horizontal direction and none in the vertical direction.
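
Shear-and-average refocusing with a per-point displacement can be sketched as below. The function takes the displacement as a callable of the column, so a constant callable reproduces the traditional parallel-plane shear and a column-dependent one corresponds to the inclined case. The convention that a point at column `x` in the reference view appears at `x + u * d(x)` in view `u`, and all names, are assumptions of this sketch.

```python
def sample(row, x):
    """Linearly interpolate a row at real-valued x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)
    t = x - i
    return (1.0 - t) * row[i] + t * row[i + 1]


def refocus(epi, views, disparity_of_x):
    """Average each epipolar line across the views.
    disparity_of_x(x) is the displacement per view step for column x."""
    width = len(epi[0])
    out = []
    for x in range(width):
        d = disparity_of_x(x)
        total = sum(sample(row, x + u * d) for u, row in zip(views, epi))
        out.append(total / len(views))
    return out


# Toy EPI of a parallel plane with disparity 1: view u shows the texture
# shifted by u pixels (wrapping around at the borders).
width = 21
tex = [(x * x) % 7 for x in range(width)]
views = [-2, -1, 0, 1, 2]
epi = [[float(tex[(x - u) % width]) for x in range(width)] for u in views]

in_focus = refocus(epi, views, lambda x: 1.0)      # correct displacement
out_of_focus = refocus(epi, views, lambda x: 0.0)  # wrong displacement
```

For an inclined plane one would pass a column-dependent callable, for example `lambda x: 1.0 / (k * x + b)` under the assumed depth model, with `k` and `b` taken from the fitted line function.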

3.5. Depth Estimation for Inclined Plane

Similarly, if the scene is a flat surface, we can estimate its depth from the common coefficient and the linear function of the constant term obtained in the plane fitting stage. The procedure is as follows:
(i) Calculate the parameters of the line function for each EPI.
(ii) Calculate the disparity of each point in the EPI from its line function.

We detect the flat surface using a small set of EPIs and fit the function of the plane at the same time. With the function of the flat surface, the depth map is obtained by substituting the image coordinates into the function of the plane.
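
Substituting coordinates into a fitted plane function can be sketched in a few lines. The parametrization used here (a common coefficient `k` and a constant term `slope * y + intercept` per row) and the example values are assumptions for illustration only.

```python
# Minimal sketch: continuous depth map from a fitted plane function.
# Assumption: depth(x, y) = k * x + (slope * y + intercept), where k is the
# common coefficient and (slope, intercept) describe the constant term per row.


def depth_map(width, height, k, slope, intercept):
    """Evaluate the plane function at every pixel; depth[y][x]."""
    return [[k * x + (slope * y + intercept) for x in range(width)]
            for y in range(height)]


depth = depth_map(width=4, height=3, k=0.5, slope=0.25, intercept=2.0)
```

Because every pixel is evaluated from the same function, neighbouring depths differ by a fixed step (`k` along a row, `slope` along a column) rather than jumping between a finite set of labels.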

Unlike traditional multilabeling methods [19], our depth estimation results are continuous, since the function of the flat surface is known. Two of our results are shown in Figure 6.

4. Experimental Results

4.1. Experimental Setup

To better analyze the performance of our algorithm, we use two different datasets. The HCI light field dataset [20] and its printed edition are used to analyze the properties of the energy ratio. Note that the printed photos are tilted to different degrees in order to better evaluate our algorithm. The experimental environment and the printed light fields are shown in Figure 7. In addition, the EPFL light field dataset [14] is used for comparison with previous work. As this dataset was captured by a Lytro camera, the experimental results on it better reflect the pros and cons of the algorithm.

We implemented the algorithm in Matlab 2014b on OS X 10.11.1, with 8 GB of RAM and a 2.7 GHz processor. The running time of our implementation for one light field is on the order of seconds and does not exceed the time complexity of [14]. This time can be reduced to microseconds by using a GPU.

When determining the line function, we divide the search space of the line parameters into a discrete grid. The target slope used for normalization is not important in our rectification; it can be an arbitrary value. In the plane fitting stage, we select 11 EPIs to obtain the function of the plane, and we select another 15 EPIs in the feature extracting stage. An SVM classifier is employed to distinguish flat surfaces from natural scenes.

4.2. Analysis of the Energy Ratio

We compute the energy ratio for each EPI in the first dataset and plot the distribution in Figure 9. The horizontal axis is the line number, and the vertical axis is the energy ratio of each EPI. The energy ratio of natural scenes occasionally reaches a high value, but mostly it is very small and far below that of the flat surface. Moreover, the energy ratio distribution of the flat surface is not only large but also stable, whereas that of natural scenes is neither.

Note that some EPIs of the original scenes always have large energy ratios (the first half of the blue curve in Figure 8). The main reason is that these EPIs were selected to fit the plane function, so the fitted plane function suits these areas better. It may not suit the other EPIs, however (the second half of the blue curve in Figure 8), which is why we select other EPIs to build the feature vector.

4.3. Further Experiments and Comparison

To better evaluate the accuracy of our algorithm on real data captured by a light field camera, we run our code on another public light field dataset, captured by Ghasemi and Vetterli [14]. The dataset consists of 50 light fields of printed photos and 50 light fields of natural scenes.

To test and verify our algorithm in a classification setting, we used an SVM model with cross validation [21]. The results are shown in Table 1.

Our detection precision is clearly superior to that of Ghasemi and Vetterli’s method [14]. The improvement is especially prominent in the detection of natural scenes, where 7 natural scenes are misclassified as flat surfaces in [14] and only 3 in our method. Further analysis of these failure cases shows that most textures of the scene lie on one continuous depth plane, with only a few textures elsewhere. The feature points from this continuous plane dominate and lead to a higher energy ratio. In Figure 8(a), most feature points lie on the cardboard and a few on the foreground; the fitted plane is close to the cardboard plane, which leads to a high energy ratio (the blue curve in Figure 8(e)). For flat surfaces, the number of misclassified samples is also 3. Analysis of these samples shows that there are too few feature points to estimate the plane function accurately. In Figure 8(c), there are more feature points at the bottom of the scene and only a few at the top, which leads to a wrong fitting of the parameters in Section 3.3. Figures 8(e) and 8(f) show the energy ratio distributions of these failed samples.

5. Conclusion

In this paper, we propose an algorithm to rectify the EPI of a flat surface, which normalizes the disordered slopes of the epipolar lines in the EPI to a single slope. This algorithm can easily be extended to inclined plane refocusing and continuous depth estimation for flat surfaces. We then propose a framework for flat surface detection, which learns a function of the flat surface from several EPIs and tests this function on several other EPIs. The results show that our algorithm performs well for most scenes, and the more complex the scene, the better the performance.

In future work, we will continue to study the limits of the algorithm, for example low-texture scenes, which yield too few feature points.

Appendix

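The two properties mentioned in Section 3.3 are proved as follows.
(1) Assume that the plane function expresses depth as a linear function of the two spatial coordinates; this is (A.1).
(2) Substituting any fixed value of one spatial coordinate into the function, (A.1) becomes a linear function of the remaining spatial coordinate.
(3) The coefficient of the remaining coordinate is taken as the coefficient of the line function, and the sum of the other terms is taken as its constant term.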

It follows that the coefficient of the line function is the same constant for every fixed coordinate and that the constant term of the line function is linear in the fixed coordinate.
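
For concreteness, the argument can be written out with one possible choice of notation. The symbols below ($Z$ for depth, $x$ for the free and $y$ for the fixed spatial coordinate, $\alpha$, $\beta$, $\gamma$ for the plane coefficients, $k$ and $b$ for the line parameters) are assumptions of this sketch and may differ from the notation of the original equations.

```latex
% Assumed plane function: depth is linear in both spatial coordinates.
Z(x, y) = \alpha x + \beta y + \gamma
% Fix the spatial coordinate at y = y_0 (one EPI):
Z(x, y_0) = \alpha x + (\beta y_0 + \gamma)
% Identify the line parameters of that EPI:
k = \alpha, \qquad b(y_0) = \beta y_0 + \gamma
% Hence k does not depend on y_0, and b is a linear function of y_0.
```

The first identification gives property (i) and the second gives property (ii).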

Disclosure

The manuscript was presented as an abstract in a poster session of ICPR2016.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The work in this paper is supported by a research grant of the State Key Laboratory of Virtual Reality Technology and Systems (BUAA-VR-15KF-10).