Dynamic Self-Occlusion Avoidance Approach Based on the Depth Image Sequence of Moving Visual Object

Zhang, Shihui; He, Huan; Zhang, Yucheng; Li, Xin; Sang, Yu

doi:https://doi.org/10.1155/2016/4783794

Mathematical Problems in Engineering

On this page

Abstract Introduction Analysis Conclusion Acknowledgments References Copyright Related Articles

Research Article | Open Access

Volume 2016 | Article ID 4783794 | https://doi.org/10.1155/2016/4783794

Dynamic Self-Occlusion Avoidance Approach Based on the Depth Image Sequence of Moving Visual Object

Shihui Zhang,^1,2Huan He,¹Yucheng Zhang,¹Xin Li,¹and Yu Sang¹

Academic Editor: Alessandro Gasparetto

Received16 May 2016

Accepted30 Aug 2016

Published09 Oct 2016

Abstract

How to avoid the self-occlusion of a moving object is a challenging problem. An approach for dynamically avoiding self-occlusion is proposed based on the depth image sequence of moving visual object. Firstly, two adjacent depth images of a moving object are acquired and each pixel’s 3D coordinates in two adjacent depth images are calculated by utilizing antiprojection transformation. On this basis, the best view model is constructed according to the self-occlusion information in the second depth image. Secondly, the Gaussian curvature feature matrix corresponding to each depth image is calculated by using the pixels’ 3D coordinates. Thirdly, based on the characteristic that the Gaussian curvature is the intrinsic invariant of a surface, the object motion estimation is implemented by matching two Gaussian curvature feature matrices and using the coordinates’ changes of the matched 3D points. Finally, combining the best view model and the motion estimation result, the optimization theory is adopted for planning the camera behavior to accomplish dynamic self-occlusion avoidance process. Experimental results demonstrate the proposed approach is feasible and effective.

1. Introduction

Self-occlusion avoidance refers to the fact that when the self-occlusion of visual object occurs, the visual system changes the camera observation direction and position by using current detected self-occlusion cue to observe self-occlusion region further and obtains more information of the visual object so as to accomplish the related visual task better. Therefore, the main problem to be solved on self-occlusion avoidance is how to determine the best observation direction and position of the camera by utilizing the occlusion cue of current visual object.

According to spatiotemporal attribute difference of the visual object, the self-occlusion avoidance problem can be classified into static self-occlusion avoidance and dynamic self-occlusion avoidance. Static self-occlusion avoidance refers to the fact that when the visual object is static, the self-occlusion phenomenon is avoided by moving the camera to a next best view for observing the self-occlusion region detected in an image of the visual object. Dynamic self-occlusion avoidance refers to the fact that when the visual object is moving, the self-occlusion phenomenon is avoided by moving the camera step by step to the best view for observing the self-occlusion region detected in one frame from depth image sequence of the moving visual object. By the concept above, we can see the main difference between the two cases is that the result of static self-occlusion avoidance is a next best view under self-occlusion and the camera can accomplish the task of self-occlusion avoidance in the calculated view, which can be regarded as “one-step” type. While the result of dynamic self-occlusion avoidance is a series of next views, the camera needs multiple motions to avoid self-occlusion, which can be regarded as “step-by-step” type.

Compared with static self-occlusion avoidance, the visual object keeps moving in the process of dynamic self-occlusion avoidance, which makes it impossible to accomplish the dynamic self-occlusion avoidance through moving the camera to the calculated next best view directly. In order to accomplish the dynamic self-occlusion avoidance, besides calculating the next views, the motion estimation about the moving visual object is necessary. Then the camera not only does synchronous motion with the visual object according to the motion estimation result, but also executes self-occlusion avoidance function simultaneously. In this way, the camera moves step by step until accomplishing the dynamic self-occlusion avoidance.

Before self-occlusion avoidance problem emerging, the similar problem called next best view has been widely studied. Currently, scholars have gained some achievements on the next best view. Connolly [1], as one of the earlier scholars studying the next best view, used partial octree model to describe visual object and made different marks to the nodes of different observation situations, so as to determine the next best view. By discretizing a fixed surface, Pito [2] determined the next best view from plenty of discretization candidate views. Based on the depth data, Whaite and Ferrie [3] constructed parameter similarity model of object and planned the next best view of the camera according to the difference between the depth data and the current fitted model. Li and Liu [4] proposed a method which used B-spline to construct the model of object and determined the next best view by calculating information gain. Trummer et al. [5] proposed a next best view method by combining on-line theory to optimize the 3D reconstruction precision of any object. Unfortunately, because of never considering occlusion factor in these methods, the more serious the occlusion is, the more the accuracy of these methods would be affected.

Because ubiquitous occlusion would affect the result of the next best view, scholars further proposed the next best view methods taking occlusion into account. Banta et al. [6] proposed a combination method to determine the next best view under occlusion. Based on the integration of active and passive vision, Fang and He [7] determined the next best view through the shape of the shadow region and the concept of limit visible surface. Combining layered ray tracing and octree, Vasquez-Gomez et al. [8] constructed the object model and generated candidate views based on sorting of the utility function to determine the next best view. Potthast and Sukhatme [9] determined the next best view through the difference of information entropies under occlusion. Wu et al. [10] determined the next best view by using the algorithm of layered contour fitting (LCF) based on density. Although these methods consider the factor of occlusion (mutual occlusion or self-occlusion; this paper pays attention to self-occlusion), there are limitations in the camera position [6, 7], specific equipment [8, 9], a priori knowledge [7, 10], and so forth. And besides, the self-occlusion region is not modeled properly in these next best view methods. Moreover, there exist some differences (such as problem description and solving cue) between the next best view and self-occlusion avoidance, so the above methods have some reference significance but cannot be taken as the final solution to the problem of self-occlusion avoidance. Most importantly, all of the above methods aim at the study of static visual object, and they are not suitable for the moving visual object. That is to say, all of the above methods cannot be the solution to dynamic self-occlusion avoidance. However, in many scientific research fields such as real-time 3D reconstruction, object tracking, autonomous navigation, scene recognition of mobile robot, robot autonomous operation, and dynamic scene rendering, the self-occlusion of moving visual object is a universal phenomenon. Meanwhile, the visual system would be invalid even wrong if it cannot effectively detect and avoid self-occlusion of moving visual object; thus the visual system would lose its value, which makes how to avoid the self-occlusion of moving visual object become an inevitable and urgent problem.

According to investigation result, there is no related research on dynamic self-occlusion avoidance of moving visual object; that is to say, the research on dynamic self-occlusion avoidance is still in initial stage at present. Meanwhile, considering the objective fact that the 3D information of a scene can be better obtained from the depth image than the intensity image, an approach for avoiding the self-occlusion of the rigid motion object is proposed in this paper based on the depth image. In the process of designing the method, we mainly solve the following three problems: firstly how to design the solution to the dynamic self-occlusion avoidance problem; secondly how to estimate the motion of visual object based on depth image; thirdly how to measure the effect of dynamic self-occlusion avoidance method. Aiming at the first problem, the dynamic self-occlusion avoidance is posed as an optimization problem, and motion estimation is merged into the optimization process of best view model to solve the problem of dynamic self-occlusion avoidance. Aiming at the second problem, the two Gaussian curvature feature matrices are matched by SIFT algorithm and the efficient singular value decomposition (SVD) is used to estimate the motion of visual object. For the third problem, “effective avoidance rate” is proposed to measure the performance of the dynamic self-occlusion avoidance method. The rest of the paper is organized as follows. Section 2 is the method overview. Section 3 describes the dynamic self-occlusion avoidance method based on depth image. Section 4 presents our experimental results and Section 5 concludes the paper.

2. Method Overview

2.1. The Analysis of Dynamic Self-Occlusion Avoidance

The problem of dynamic self-occlusion avoidance can be described on the premise that the self-occlusion region in an image of visual object image sequence is taken as the research object; the reasonable next view sequence is planned for observing the self-occlusion region by moving the camera to achieve the goal that the most information about the self-occlusion region can be obtained from the camera final view. Because the visual object is moving in the process of self-occlusion avoidance, the next view of the camera should be adjusted dynamically to observe the vested self-occlusion region.

Figure 1 shows the sketch map of camera observing ideal object model. Figure 1(a) is the observing effect of camera in initial view, and the shadow region is the self-occlusion region to be avoided. In the case of object moving, if camera wants to obtain the information of the self-occlusion region, it must do the synchronous motion with the visual object and move to the best view for observing self-occlusion region at the same time. Figure 1(b) is the observing effect when the camera is avoiding self-occlusion. Figure 1(c) is the observing effect when the camera arrives at the final view, in which the camera can obtain the maximum information of self-occlusion region, so the process of dynamic self-occlusion avoidance is accomplished when the camera arrives at this view.

2.2. The Overall Idea of Dynamic Self-Occlusion Avoidance

Based on the analysis of dynamic self-occlusion avoidance above, an approach to dynamic self-occlusion avoidance is proposed. The overall idea of the approach is as follows. Firstly, two adjacent depth images of the moving object are acquired and each pixel’s 3D coordinates (all pixels’ 3D coordinates are in the same world coordinate system) in two depth images are calculated by utilizing antiprojection transformation, and the self-occlusion cue in the second depth image is detected. On this basis, the best view model is constructed according to the self-occlusion information in the second depth image. Secondly, according to the 3D coordinates calculated above, the Gaussian curvature feature matrices corresponding to the two adjacent depth images are calculated by using the pixels’ 3D coordinates. Then, the Gaussian curvature feature matrices corresponding to the two adjacent depth images are matched by SIFT algorithm, and the motion equation is estimated by using the 3D coordinates of the matched points. Finally, combining the best view model and the estimated motion equation of the visual object, a series of next views of the camera are planned to accomplish dynamic self-occlusion avoidance process.

3. The Approach to Dynamic Self-Occlusion Avoidance Based on Depth Image

3.1. Constructing the Self-Occlusion Region to Be Avoided and the Best View Model

3.1.1. Constructing the Self-Occlusion Region to Be Avoided

In order to solve the problem of dynamic self-occlusion avoidance, it is necessary to first construct the self-occlusion region to be avoided. As we know, the information of self-occlusion region in current view is unknown; that is to say, the geometry information of self-occlusion region could not be obtained directly according to the depth image acquired in current view, so the specific modeling method should be adopted to describe the self-occlusion region approximately. Figure 2 shows the local self-occlusion region of visual object and its approximate description, and in Figure 2(a), the red boundary is self-occlusion boundary and the blue boundary is the nether adjacent boundary corresponding to the red one. The method in literature [11] is used to detect the self-occlusion boundary in depth image, and all the points on self-occlusion boundary are organized to form self-occlusion boundary point set . As shown in Figure 2(b), the region between self-occlusion boundary and nether adjacent boundary in 3D space is the unknown self-occlusion region. Figure 2(c) shows the construction of self-occlusion region model by utilizing quadrilateral subdivision on unknown self-occlusion region.

The concrete steps of constructing self-occlusion region model are as follows. Firstly, take out the th self-occlusion boundary point from self-occlusion boundary point set in turn. Mark the unmarked nonocclusion boundary point corresponding to the maximum of depth differences between the self-occlusion boundary point and its eight neighbors as , and add into nether adjacent boundary point set . Secondly, use the neighboring self-occlusion boundary points , in and their corresponding nether adjacent points , in to form quadrilateral , as shown in Figure 2(c); is the integer from 1 to and the combination of all quadrilaterals is approximate description of self-occlusion region. Thirdly, combining depth image and camera parameters, use antiprojection transformation to calculate each pixel’s 3D coordinates. At last, use the 3D coordinates of quadrilateral ’s four vertices to calculate its normal and area, thus the modeling process of self-occlusion region can be accomplished.

On the premise of not considering the normal direction, the formula for calculating the normal of quadrilateral can be defined aswhere is the vector from point to the midpoint of and in 3D space and is the vector from point to point in 3D space. Supposing the initial camera view is , in order to ensure that the direction of calculated normal points outside in the process of self-occlusion region modeling, the calculation formula of ’s normal is defined aswhere is the sign function; it means the function value is 1 when and the function value is −1 when . The area of quadrilateral is defined as

In summary, the set consisting of all quadrilaterals is the modeling result of self-occlusion region, so the self-occlusion information to be avoided is described as

3.1.2. Constructing the Best View Model

The best view model should be constructed after obtaining the self-occlusion information to be avoided. This paper uses the area of visible self-occlusion region on next view to construct the objective function and meanwhile uses the size equality of real object corresponding to pixels in different depth images as constraint condition. So the best view model is defined aswhere and are, respectively, the position and direction of camera for observing the vested self-occlusion region at maximum level, is the number of quadrilaterals, is the angle between the normal of th quadrilateral and the vector from the midpoint of th quadrilateral to camera, is the center of gravity of all visible points in initial camera view .

3.2. Estimating the Motion Equation of the Visual Object Based on Two Adjacent Depth Images

Different from the static self-occlusion avoidance, the dynamic self-occlusion avoidance is carried out on the premise that the visual object is moving, which makes the motion estimation of the visual object in 3D space become a necessary step in the process of dynamic self-occlusion avoidance. Because the existing motion estimation methods based on depth image all estimate the motion of the visual object in 2D images, this paper designs a feasible scheme based on the depth image to estimate the motion of the visual object in 3D space.

3.2.1. Calculating the Gaussian Curvature Feature Matrices of the Two Adjacent Depth Images

Considering that rigid body has rotation and translation invariant characteristics, the surface shape of visual object remains invariant in the motion process. Because the curvature feature reflects bending degree of the object surface, and Gaussian curvature only depends on Riemannian metric of the surface which is the intrinsic invariant of the surface (i.e., Gaussian curvature keeps invariant in rotation and translation), the motion of visual object is estimated based on Gaussian curvature feature matrices corresponding to the two adjacent depth images.

In 3D Euclidean space, and are two principal curvatures of a point on the differentiable surface; then the Gaussian curvature is defined as the product of two principal curvatures; namely . The method of calculating the Gaussian curvature in literature [12] is implemented by optimizing convolution calculation, and it is high-efficient and suitable for different scales of curvature value, so the method in literature [12] is used for calculating the Gaussian curvature feature matrix of depth image.

3.2.2. Estimating the Motion Equation of Visual Object Based on the Matched Feature Points

Analysis shows that the matched points can provide reliable information for the motion estimation if the Gaussian curvature feature matrices corresponding to two adjacent depth images of the visual object can be matched. SIFT algorithm in literature [13] is more efficient, and it is not only invariant to image translation, rotation, and scaling, but also robust to the change of visual angle, affine transformation, and noise, so the SIFT algorithm is adopted in this paper for matching the feature points.

SIFT algorithm mainly consists of two steps: the generation of SIFT features and the matching of feature vectors. In order to generate the SIFT features, firstly the scale space should be established. Because the Difference of Gaussians scale-space (DoG scale-space) has good stability in detection of key points, the DoG scale-space is used for detecting the key points in this paper. Secondly, the key points which are instable and sensitive to noise are filtered. Then the direction of each feature point is calculated. Finally, the SIFT feature vector of each key point is generated. In the process of generating SIFT feature vector, 16 seed points are selected for each key point, and each seed point has 8 orientations. Therefore, the dimensions of the SIFT feature vector can be obtained for each key point. After obtaining the SIFT feature vectors of the Gaussian curvature feature matrices corresponding to the two adjacent depth images, the two Gaussian curvature feature matrices can be matched, and the similarity of the vectors is measured by Euclidean distance in the process of matching.

After the key points of the Gaussian curvature feature matrices corresponding to the two adjacent depth images are matched, the motion equation of visual object can be estimated by using the pixels’ 3D coordinates which are obtained by antiprojection transformation. The motion equation is defined aswhere and are, respectively, the 3D coordinates of points before and after visual object motion, is the unit orthogonal rotation matrix, and is the translation vector; then the result of motion estimation is the solution of minimizing the objective function . The objective function is defined aswhere is the number of key points which are matched. In order to solve the and in formula (7), the matrix is defined aswhere . By carrying out singular value decomposition on , can be expressed as ; then according to and , in formula (7) can be deduced as

After obtaining , substitute into formula (10) to calculate ; namely,

After obtaining and , the motion equation shown in formula (6) can be obtained, and the motion estimation of the visual object is accomplished.

3.3. Planning the Next View Based on Motion Estimation and Best View Model

After the motion equation of visual object is estimated, the next view of the camera can be planned. Analysis can be known, if substituting the camera observation position into the motion equation of visual object, the camera can do the synchronous motion with the visual object. In addition, on the basis of the synchronous motion with the visual object, if the camera motion offset for avoiding self-occlusion is introduced, the whole process of dynamic self-occlusion avoidance can be achieved. By analyzing the best view model in formula (5), we can know it is the camera best view that can make formula (11) achieve maximum value. Formula (11) is defined aswhere is the camera observation position. Since formula (11) is a nonconvex model, it is difficult to directly calculate its global optimal solution. Meanwhile, because the problem of dynamic self-occlusion avoidance needs to calculate a series of next views and the gradient information in formula (11) can provide basis for determining the camera motion offset of the self-occlusion avoidance process, in the process of dynamic self-occlusion avoidance, the gradient descent idea is used in this paper for defining the formula of camera observation position aswhere is the current observation position of the camera and is the offset coefficient. Based on the solution to formula (12), the formula of next observation direction is defined as

According to the constraint condition in formula (5), we can search one point along the opposite direction of and make meet

By now, the obtained view can be regarded as the new current view . After acquiring the depth image of visual object in the new view , the process from formula (6) to formula (14) is repeated until the difference between two which are calculated in two adjacent views according to formula (11) is less than a given threshold. It means that a series of views can be calculated during the visual object motion, so the process of dynamic self-occlusion avoidance can be accomplished.

3.4. The Algorithm of Dynamic Self-Occlusion Avoidance

Algorithm 1 (dynamic self-occlusion avoidance).
Input. The camera internal and external parameters and two adjacent depth images acquired in initial camera view.
Output. The set of a series of camera views corresponding to the dynamic self-occlusion avoidance.
Step 1. Calculate the pixels’ 3D coordinates and the Gaussian curvature feature matrices corresponding to the two adjacent depth images.
Step 2. Detect the self-occlusion boundary in the second depth image and establish the self-occlusion region in the second depth image according to formula (1) to formula (4).
Step 3. Construct the best view model according to self-occlusion information (formula (5)).
Step 4. Match the key points based on Gaussian curvature feature matrices corresponding to the two adjacent depth images by using SIFT algorithm.
Step 5. Based on the 3D coordinates of matched key points, solve the objective function in formula (7) according to formula (8) to formula (10); then the visual object’s motion equation in formula (6) can be obtained.
Step 6. Plan the next view of camera according to formula (11) to formula (14).
Step 7. In the obtained next view, acquire a depth image and calculate in formula (11).
Step 8. If the difference between two adjacent is less than a given threshold, the process of dynamic self-occlusion avoidance can be terminated, and otherwise, jump to Step to continue the process of dynamic self-occlusion avoidance.

4. Experiments and Analysis

4.1. Experimental Environment

In order to validate the effect of the method in this paper, the OpenGL is adopted to implement the process of dynamic self-occlusion avoidance during camera observing the moving object based on 3D object model in the Stuttgart Range Image Database. The experimental hardware environment is Intel CPU (R) Core (TM) i7-3770 3.40 GHz and the memory is 8.00 GB. The dynamic self-occlusion avoidance program is implemented with C++ language. In the process of self-occlusion avoidance, the parameter of the projection matrix in OpenGL is and the window size is .

4.2. The Experimental Results and Analysis

To verify the feasibility of the proposed method, we do experiments with different visual objects in different motion modes. During the experiments, the initial camera observation position is and the initial observation direction is . The coordinate unit is millimeter (mm) and the offset coefficient is . The process of dynamic self-occlusion avoidance will terminate if the calculated difference of between two adjacent views is less than the given threshold . The offset coefficient and threshold are chosen by empirical value. Due to the limited space, here we show 3 groups of experimental results corresponding to the visual object Bunny with relatively large self-occlusion region and Duck with relatively small self-occlusion region. The results are shown in Tables 1, 2, and 3, respectively. Table 1 shows the process of dynamic self-occlusion avoidance in which the visual object Bunny does translation along with the speed of . Table 2 shows the process of dynamic self-occlusion avoidance in which the visual object Bunny does translation along with the speed of and rotation around with the speed of . Table 3 shows the process of dynamic self-occlusion avoidance in which the visual object Duck does translation along with the speed of and rotation around with the speed of . Because the camera needs multiple motions in the process of dynamic self-occlusion avoidance, only partial observing results the matched results of Gaussian curvature feature matrices and related camera information are shown in Tables 1, 2, and 3 after the results of first three images are shown.

The first row in Tables 1, 2, and 3 shows the depth image number corresponding to different views in the process of self-occlusion avoidance. At the end of the self-occlusion avoidance, the image number in Tables 1, 2, and 3 is 17, 16, and 13, respectively. Thus we can see that the camera movement times are different for different visual objects or the same visual object with different motions in the process of self-occlusion avoidance. The second row in Tables 1, 2, and 3 shows the acquired depth images of visual object in different camera views which are determined by proposed method. From the images in this row, we can see that the self-occlusion region is gradually observed more and more in the process of self-occlusion avoidance; for example, the self-occlusion region mainly caused by Bunny’s ear in the second image in Tables 1 and 2 and the self-occlusion region mainly caused by Duck’s mouth in the second image in Table 3 are better observed when the process of self-occlusion avoidance is accomplished. The third row in Tables 1, 2, and 3 shows the matched results of the Gaussian curvature feature matrices corresponding to the two adjacent depth images with SIFT algorithm. These matched results are the basis of motion estimation and self-occlusion avoidance, and good matched results are helpful to obtain accurate motion estimation; then the accurate camera motion trajectory can be further calculated. Therefore, the matched results in the third row will influence the effect of self-occlusion avoidance to a certain extent. The fourth, fifth, and sixth rows in Tables 1, 2, and 3, respectively, show the total matched points, the error matched points, and the error rate in the process of matching with SIFT algorithm, where the error rate is equal to the error matched points divided by the total matched points. The seventh, eighth, and ninth rows in Tables 1, 2, and 3, respectively, show the observation position, observation direction, and the observed area of the vested self-occlusion region in related view when the visual object Bunny and Duck do different motions. It can be seen from these data and their change trends that the process of avoiding self-occlusion dynamically with proposed method accords with the observing habit of human vision.

In view of the fact that there is no method of dynamic self-occlusion avoidance to be compared and no Ground Truth about dynamic self-occlusion avoidance currently, it is very difficult to evaluate our method by comparing with other methods or Ground Truth. In order to make the further quantitative analysis of the feasibility and effectiveness of our method, after deeply analyzing the characteristics of dynamic self-occlusion avoidance, we propose an index named “effective avoidance rate” to measure the performance of dynamic self-occlusion avoidance method. Considering the effect of object surface, the same object will have different self-occlusion regions in different camera views, and meanwhile the distribution of the self-occlusion region is random; it is not realistic to find a camera view in which the camera can obtain all the information of the self-occlusion region, so it is inappropriate to take the ratio of the final observed self-occlusion region area to the total self-occlusion region area as evaluation criterion. In addition, because the goal of dynamic self-occlusion avoidance is that the camera not only can track the visual object but also can observe the vested self-occlusion region maximally, the camera will inevitably gradually be close to the best view in the process of dynamic self-occlusion avoidance, namely, the objective view described in formula (5). The area in the objective view is also the self-occlusion region area really expected to be observed. Based on this, the proposed index of effective avoidance rate is defined aswhere is the effective avoidance rate, is the observed self-occlusion region area in the view, where the process of dynamic self-occlusion avoidance is accomplished, is the self-occlusion region area in the objective view, and is the total self-occlusion region area to be avoided.

In order to make quantitative analysis of the performance of proposed method based on the effective avoidance rate, we have done multiple groups of experiments by adjusting the motion modes of different visual objects. Table 4 shows the quantitative analysis results of total 10 groups of experiments which include the 3 groups of experiments described above and the other 7 groups. In addition, in the last column of each experiment, the average time, which is from the time that the camera arrives at a certain view to the time that the next view has been calculated, is shown.

From the 10 groups of experiments in Table 4 we can see that, affected by surface of visual object, the different self-occlusion region area will be generated for the same (or different) visual object in different (or the same) camera views, and the distribution of the self-occlusion region is random, so the vested self-occlusion region area will not be observed completely even if the camera is in the best view (the objective view). At the same time, the camera can only infinitely be close to but hardly reach the objective view accurately in the process of self-occlusion avoidance; that is to say, at the end of self-occlusion avoidance, the camera observed area can only be close to but not reach the area in the objective view, which also verifies the rationality of taking the effective avoidance rate as evaluation criterion. Besides comparing 10 groups of experiments in Table 4 we can know that the visual objects in groups 1, 4, 6, and 8 only do translation or rotation in different initial views, and the motion is relatively simple; then the average effective avoidance rate of these four groups is 95.29%. But in the groups 2, 3, 5, 7, 9, and 10 in Table 4, the visual objects do both translation and rotation in different initial views, the motion is relatively complex, and then the average effective avoidance rate of these six groups is 78.42%, which is obviously lower than that of the four preceding groups in which the visual objects only do translation or rotation. The main reason from our analysis is that many steps in the process of dynamic self-occlusion avoidance, such as antiprojection transformation, matching points based on SIFT and motion estimation, will make error and the error will accumulate when the camera changes its view. The most important thing is that error accumulation will become more obvious when the motion of the visual object is more complex, which will lead to the result that the effective avoidance rate will be significantly lower when the motion of the visual object is complex. Objectively speaking, it is inevitable that there are error and error accumulation in the process of self-occlusion avoidance, and it is not realistic to eliminate the effect of these errors completely; what different methods can do is to reduce or cut down the effect of these errors. The last row data in Table 4 shows that the average effective avoidance rate of the proposed method in this paper reaches 86.78%; therefore in the average sense, the better result of dynamic self-occlusion avoidance can be obtained based on the proposed method. Furthermore, from the last column data in Table 4 it can be seen that the average time consumption is 0.97s which is from the time that the camera arrives at a certain view to the time that the next view has been calculated. It shows that the proposed method has good real-time performance.

5. Conclusion

In this paper, an approach for avoiding self-occlusion dynamically is proposed based on the depth image sequence of moving visual object. The proposed approach does not need specific equipment and the a priori knowledge of visual object or limit the observation position of the camera on a fixed surface. This work is distinguished by three contributions. Firstly, the problem of dynamic self-occlusion avoidance which is still in initial stage at present is studied, and the solution to dynamic self-occlusion avoidance problem is proposed and implemented, which provides an idea for further studying the dynamic self-occlusion avoidance. Secondly, based on the depth images of moving visual object, the Gaussian curvature feature matrices are matched by SIFT algorithm, and the motion equation is efficiently estimated by using singular value decomposition. This provides a novel idea about motion estimation in 3D space based on the depth image of visual object. Thirdly, considering that there are no evaluation criterions about dynamic self-occlusion avoidance at present, an evaluation criterion named “effective avoidance rate” is proposed to measure the performance of dynamic self-occlusion avoidance reasonably, and it provides the referable evaluation basis for further studying the problem of dynamic self-occlusion avoidance. The experimental results show that the proposed approach achieves the goal that the camera avoids the self-occlusion automatically when the visual object is moving, and the avoidance behavior accords with the observing habit of human vision.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant no. 61379065 and the Natural Science Foundation of Hebei province in China under Grant no. F2014203119.

References

C. Connolly, “The determination of next best views,” in Proceedings of the 1985 IEEE International Conference on Robotics and Automation, pp. 432–435, IEEE, Missouri, Mo, USA, March 1985.
View at: Publisher Site | Google Scholar
R. Pito, “A sensor-based solution to the ‘next best view’ problem,” in Proceedings of the 13th International Conference on Pattern Recognition (ICPR '96), vol. 1, pp. 941–945, IEEE, Vienna, Austria, August 1996.
View at: Publisher Site | Google Scholar
P. Whaite and F. P. Ferrie, “Autonomous exploration: driven by uncertainty,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 193–205, 1997.
View at: Publisher Site | Google Scholar
Y. F. Li and Z. G. Liu, “Information entropy-based viewpoint planning for 3-D object reconstruction,” IEEE Transactions on Robotics, vol. 21, no. 3, pp. 324–337, 2005.
View at: Publisher Site | Google Scholar
M. Trummer, C. Munkelt, and J. Denzler, “Online next-best-view planning for accuracy optimization using an extended E-criterion,” in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), pp. 1642–1645, IEEE, Istanbul, Turkey, August 2010.
View at: Publisher Site | Google Scholar
J. E. Banta, L. M. Wong, C. Dumont, and M. A. Abidi, “A next-best-view system for autonomous 3-D object reconstruction,” IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans., vol. 30, no. 5, pp. 589–598, 2000.
View at: Publisher Site | Google Scholar
W. Fang and B. He, “Automatic view planning for 3D reconstruction and occlusion handling based on the integration of active and passive vision,” in Proceedings of the 21st IEEE International Symposium on Industrial Electronics (ISIE '12), pp. 1116–1121, Hangzhou, China, May 2012.
View at: Publisher Site | Google Scholar
J. I. Vasquez-Gomez, L. E. Sucar, and R. Murrieta-Cid, “Hierarchical ray tracing for fast volumetric next-best-view planning,” in Proceedings of the 10th International Conference on Computer and Robot Vision (CRV '13), pp. 181–187, IEEE, Saskatchewan, Canada, May 2013.
View at: Publisher Site | Google Scholar
C. Potthast and G. S. Sukhatme, “A probabilistic framework for next best view estimation in a cluttered environment,” Journal of Visual Communication and Image Representation, vol. 25, no. 1, pp. 148–164, 2014.
View at: Publisher Site | Google Scholar
B. Wu, X. Sun, Q. Wu, M. Yan, H. Wang, and K. Fu, “Building reconstruction from high-resolution multiview aerial imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 855–859, 2015.
View at: Publisher Site | Google Scholar
S. Zhang and J. Liu, “A self-occlusion detection approach based on depth image using SVM,” International Journal of Advanced Robotic Systems, vol. 9, article 230, 2012.
View at: Publisher Site | Google Scholar
P. J. Besl and R. C. Jain, “Segmentation through variable-order surface fitting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 2, pp. 167–192, 1988.
View at: Publisher Site | Google Scholar
N. Bayramoglu and A. A. Alatan, “Shape index SIFT: range image recognition using local features,” in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), pp. 352–355, IEEE, Istanbul, Turkey, August 2010.
View at: Publisher Site | Google Scholar

Copyright

Copyright © 2016 Shihui Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

1143

Downloads

633

Citations