Improved Generalized Belief Propagation for Vision Processing

Chen, S. Y.; Tong, Hanyang; Wang, Zhongjie; Liu, Sheng; Li, Ming; Zhang, Beiwei

doi:https://doi.org/10.1155/2011/416963

Mathematical Problems in Engineering


Special Issue

Propagation Phenomena and Transitions in Complex Systems: Efficient Mathematical Models


Research Article | Open Access

Volume 2011 | Article ID 416963 | https://doi.org/10.1155/2011/416963


S. Y. Chen,¹ Hanyang Tong,¹ Zhongjie Wang,¹ Sheng Liu,¹ Ming Li,² and Beiwei Zhang³

Academic Editor: Cristian Toma

Received 29 Sept 2010

Accepted 25 Oct 2010

Published 02 Dec 2010

Abstract

Generalized belief propagation (GBP) is a region-based belief propagation algorithm that achieves good convergence in Markov random fields. However, its computation time is too high for practical engineering applications. This paper proposes a method to improve the efficiency of GBP. A caching technique and a chessboard passing strategy are used to speed up the algorithm. Then the direction set method is used to reduce the complexity of computing clique messages from quartic to cubic. With this strategy, the processing speed is greatly increased. In addition, this is the first attempt to apply GBP to the stereo matching problem. Experiments show that the proposed algorithm achieves a speedup of more than 15 times on a typical stereo matching problem and infers a more plausible result.

1. Introduction

Many engineering problems in computer vision, statistical physics, signal processing, and artificial intelligence can be formulated as inference problems in probabilistic graphical models such as Bayesian networks or Markov random fields (MRF). The goal is to find the maximum a posteriori (MAP) configuration [1]. However, finding the exact solution is NP-hard, so approximate inference is performed instead, for example with graph cuts or message passing algorithms. The most popular message passing algorithm is belief propagation (BP). Because of their flexibility and efficiency, BP and its variants have recently flourished, especially in image restoration, optical flow, and stereo.

BP is an optimization tool first proposed by Pearl for singly connected Bayesian networks [2] and extended to loopy graphs such as MRF in the last decade. Its virtue is that it computes marginal probabilities for graphical models, at least approximately, in a time that grows only linearly with the number of nodes in the system [3]. In the BP algorithm, every variable starts with the same initial message, iteratively collects the messages passed from its neighboring variables, computes a new message for each neighbor, and passes the new messages back until convergence. On a tree-structured factor graph or Bayesian network, BP performs exact inference for every variable. However, on highly connected graphs with many conflicting interactions, such as the MRF of stereo matching, convergence becomes a tricky issue, and the precision of the resulting configuration varies with the cyclicity of the graph. Much work has been done to drive and accelerate the convergence of the variables, with some plausible progress. Nevertheless, because BP has no convergence guarantee on graphical models with loops, its development seems slow. On the other hand, generalized belief propagation (GBP), proposed by Yedidia et al. [4], has better convergence properties than BP and has received more attention recently.

GBP can be considered a variant of standard BP and is also an instance of the cluster variation method. According to the literature, BP can only converge to a stationary point of the Bethe free energy, while GBP can converge to a stationary point of the more accurate Kikuchi free energy [5]. GBP therefore has the advantage of better convergence than BP. The price of this good convergence is that GBP is computationally expensive. In terms of time complexity, the optimized BP of [6] reaches linear complexity as an approximate method, while canonical GBP takes quartic complexity. This has limited its applicability to small-scale problems such as image denoising and image restoration [1], and has kept GBP away from more complicated problems such as stereo matching, even for a small image pair.

To accelerate GBP, several optimization methods have been proposed recently. Petersen et al. proposed two strategies for fast GBP MAP estimation on 2D and 3D grid-like MRF [7]. One is a caching method that significantly reduces the number of multiplications during GBP inference. The other speeds up the computation of the MAP estimate of GBP cluster messages by presorting their factors and limiting the number of possible combinations. Pawan and Torr also provide a fast, memory-efficient GBP method [8].

However, these methods still meet only a fraction of what is needed to solve the stereo matching problem. This paper proposes a new method, the direction set method, which is introduced into the pairwise message computation stage to make GBP more efficient. With the proposed method, the time complexity is decreased from quartic to cubic. Furthermore, this is the first attempt to apply GBP to the stereo matching problem. For completeness, MRF and BP are briefly introduced in the next section.

The remainder of this paper is organized as follows. Section 2 gives a sketch of the basic theory. Section 3 provides the definition of GBP with min-sum messaging and its caching structure. Section 4 presents a detailed description of the proposed strategies for GBP optimization. Section 5 gives the experiments and results of stereo matching, and Section 6 summarizes the findings.

2. The Basic Principle

Humans understand a scene mainly through the spatial and visual information taken in by the eyes. Such information, for example regions or objects, relies mainly on contextual constraints and is essential for interpretation. Context-dependent entities such as images can be modeled in a convenient and consistent way through MRF theory. This is achieved by characterizing the mutual influences among such entities using conditional MRF distributions [9].

MRF was first introduced into computer vision in [10] and has dominated the fields of image processing and computer vision since the early 1980s. It is the most popular type of prior model for gridded image-like data, which include not only regular natural images but also two-dimensional fields such as motion or depth maps, as well as binary fields such as those in image restoration and segmentation. MRF provides a mathematical foundation for characterizing contextual constraints and deriving the probability distribution of interacting features [9].

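Without loss of generality, let $\mathcal{P} = \{1, \dots, n\}$ be a set of indexes, $X = \{x_i \mid i \in \mathcal{P}\}$ be a set of observed nodes, and $\mathcal{L}$ be a set of labels. Here all labels are assumed to be discrete.

$\mathcal{N} = \{\mathcal{N}_i \mid i \in \mathcal{P}\}$ represents the neighborhood system, which indicates the interrelationship between nodes or the order of the MRF. Recently, a learned high-order MRF model named Fields of Experts has been proposed, which can capture richer priors. However, for simplicity we use a first-order MRF (also called a pairwise MRF). Figure 1 shows a sample of the MRF used in this paper.

Many computer vision problems can be formulated as a labeling problem in which the solution assigns a label from the set $\mathcal{L}$ to each of the nodes in $\mathcal{P}$. In the literature, this process is represented by a mapping function $f : \mathcal{P} \to \mathcal{L}$. It has been proved that the joint probability of an MRF is a Gibbs distribution. Besides, according to the Hammersley-Clifford theorem, the posterior probability of a node depends only on its neighborhood $\mathcal{N}_i$, which means that

$$P(f_i \mid f_{\mathcal{P} \setminus \{i\}}) = P(f_i \mid f_{\mathcal{N}_i}). \tag{2.1}$$

According to Bayes' rule, the posterior distribution for a given set $X$ and its evidence $P(X \mid f)$, combined with a prior $P(f)$ over the unknowns $f$, is given by

$$P(f \mid X) = \frac{P(X \mid f)\, P(f)}{P(X)}. \tag{2.2}$$

Taking the negative logarithm of both sides gives

$$-\log P(f \mid X) = -\log P(X \mid f) - \log P(f) + C. \tag{2.3}$$

Here $C = \log P(X)$ is a constant that makes the posterior integrate to 1. To find the MAP solution, we simply minimize (2.3), which can also be treated as an energy function:

$$E(f) = E_d(f) + E_s(f), \tag{2.4}$$

where $E_d(f)$ is the data penalty and $E_s(f)$ is the smoothness penalty.

Recalling (2.1), the two terms of (2.4) can be rewritten as sums of local terms:

$$E_d(f) = \sum_{i \in \mathcal{P}} D_i(f_i), \qquad E_s(f) = \sum_{i \in \mathcal{P}} \sum_{j \in \mathcal{N}_i} V(f_i, f_j), \tag{2.5}$$

where $D_i(f_i)$ is the data cost of assigning label $f_i$ to node $i$ and $V(f_i, f_j)$ is the smoothness cost between the labels of neighboring nodes $i$ and $j$.

Therefore, the energy function is

$$E(f) = \sum_{i \in \mathcal{P}} D_i(f_i) + \sum_{i \in \mathcal{P}} \sum_{j \in \mathcal{N}_i} V(f_i, f_j). \tag{2.6}$$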
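As a minimal sketch of how such an energy is evaluated for one candidate labeling, the following Python function sums a per-node data term and a pairwise smoothness term over a 4-connected grid. The function name `mrf_energy`, the array layout, and the choice to count each neighboring pair once are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mrf_energy(labels, data_cost, smooth_cost):
    """Energy of one labeling on a 4-connected grid (hypothetical helper).

    labels:      (H, W) integer label per node
    data_cost:   (H, W, K) cost of giving node (y, x) label k
    smooth_cost: (K, K) cost of a neighboring pair taking labels (a, b)
    """
    rows, cols = np.indices(labels.shape)
    # Data penalty: pick the cost of the chosen label at every node.
    e_data = data_cost[rows, cols, labels].sum()
    # Smoothness penalty: each horizontal and vertical pair is counted once.
    e_smooth = smooth_cost[labels[:, :-1], labels[:, 1:]].sum()
    e_smooth += smooth_cost[labels[:-1, :], labels[1:, :]].sum()
    return float(e_data + e_smooth)
```

Minimizing this quantity over all labelings is the MAP problem described above; the function only scores a given labeling and does not perform the minimization.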

In a large label space, with a large number of variables and various uncertainties, making a global inference from local information is a nontrivial task [11]. For this reason, many approximate inference algorithms have been proposed to find an approximate MAP estimate instead of the exact answer. The inference problem can then usually be mapped to an energy minimization problem, which has a solid mathematical foundation in the literature. In the last few years, two approximate algorithms, graph cut (GC) [12] and BP [6, 13], have become established for approximate MRF inference thanks to their efficiency and comparatively high accuracy.

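In standard belief propagation on a pairwise MRF, a variable $m_{ij}(f_j)$ can be intuitively treated as a "message" from a node $i$ to its neighbor node $j$, which contains information about what state node $j$ should be in. The message is a vector with the same dimensionality as the number of possible labels. The value of each dimension expresses how likely the corresponding label is for the node.

Recalling (2.1) and writing the posterior $P(f \mid X)$ simply as $P(f)$, then

$$P(f) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(f_i, f_j) \prod_{i} \phi_i(f_i), \tag{2.7}$$

where $\psi_{ij}(f_i, f_j)$ is the pairwise interaction potential and $\phi_i(f_i)$ is the "local evidence".

Usually, the message must be nonnegative. A high value of the message $m_{ij}(f_j)$ shows that node $i$ "believes" the posterior probability of $f_j$ is very high. The message update rule is

$$m_{ij}^{t}(f_j) = \sum_{f_i} \phi_i(f_i)\, \psi_{ij}(f_i, f_j) \prod_{k \in \mathcal{N}_i \setminus \{j\}} m_{ki}^{t-1}(f_i), \tag{2.8}$$

where $t$ represents the number of iterations, as shown in Figure 2.

The belief is the product of the "local evidence" of the node and all messages sent to this node:

$$b_i(f_i) \propto \phi_i(f_i) \prod_{j \in \mathcal{N}_i} m_{ji}(f_i). \tag{2.9}$$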

The standard BP described above is also called sum-product BP. Another variant, max-product BP (or max-sum in the log domain), is simpler and easier to use. In max-product BP, the summation in the message update is replaced by a maximization, and (2.9) then indicates which state the node is most likely to be in. Although BP is an efficient inference algorithm for MRF with loops, it can only converge to the stationary points of the Bethe approximation of the free energy, in which a region contains at most two nodes. As discussed above, GBP can achieve more accurate inference than BP. In the next section, we extend BP to GBP.
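The sum-product update and the belief computation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `sum_product_bp` and `brute_force_marginals` are hypothetical, a single symmetric pairwise potential `psi` is shared by all edges, and messages are updated synchronously and normalized after every update.

```python
import itertools
import numpy as np

def sum_product_bp(phi, psi, edges, n_iters=10):
    """Synchronous sum-product BP on a pairwise model (illustrative sketch).

    phi:   list of length-K arrays, local evidence of each node
    psi:   (K, K) symmetric pairwise potential shared by all edges (assumption)
    edges: list of undirected edges (i, j)
    """
    phi = [np.asarray(p, dtype=float) for p in phi]
    n, k = len(phi), len(phi[0])
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # One message per directed edge, initialised uniformly.
    msgs = {(i, j): np.ones(k) for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            prod = phi[i].copy()
            for m in neighbors[i]:
                if m != j:
                    prod = prod * msgs[(m, i)]
            out = psi.T @ prod  # sum over the sender's states
            new_msgs[(i, j)] = out / out.sum()
        msgs = new_msgs
    beliefs = []
    for i in range(n):
        b = phi[i].copy()
        for m in neighbors[i]:
            b = b * msgs[(m, i)]
        beliefs.append(b / b.sum())
    return beliefs

def brute_force_marginals(phi, psi, edges):
    """Exact marginals by enumeration, only feasible for tiny models."""
    n, k = len(phi), len(phi[0])
    marg = np.zeros((n, k))
    for x in itertools.product(range(k), repeat=n):
        p = np.prod([phi[i][x[i]] for i in range(n)])
        for i, j in edges:
            p *= psi[x[i], x[j]]
        for i in range(n):
            marg[i, x[i]] += p
    return marg / marg.sum(axis=1, keepdims=True)
```

On a loop-free graph such as a three-node chain, the beliefs returned by `sum_product_bp` agree with `brute_force_marginals`; on graphs with loops they are only approximations, which is the limitation that motivates GBP.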

3. Message Passing

GBP, first proposed by Yedidia et al., can be considered a region-based BP method [4]. The basic intuition behind GBP is to compute more informative messages between regions rather than between nodes. As a Kikuchi free energy approximation method, GBP in general allows an arbitrary number of nodes to be grouped into a clique and brings the clique information into the whole passing process, which yields a better approximation to the posterior probability, whereas BP only passes messages from node to node.

Because another source of information, the clique information, is involved in the passing process, the capability to search for the minimum of an energy function is greatly improved. The update rules of the canonical GBP are defined in (3.1), where $R$ is a region and $S$ is its corresponding subregion, $m_{R \to S}$ is the message sent from region $R$ to its subregion $S$, $\phi_i$ is the local "evidence" of node $i$, $M(R)$ is the set of messages sent from outside region $R$ to some nodes inside region $R$, $M(S)$ is defined similarly, $M(R, S)$ is the set of messages sent from some nodes in region $R$ but not in region $S$ to some nodes in region $S$, and $b_R$ is the belief of region $R$.

The definitions of the regions in (3.1) directly determine the performance of GBP. It is very hard to choose a reasonable region size. Although the basic clusters should encompass as many cycles as possible, the complexity grows exponentially with the region size. The two goals are contradictory to some degree: the larger the regions, the less efficient the algorithm. In practice, it is infeasible to use cluster sizes larger than four.

In this paper, we are concerned with the implementation instance introduced in [14]. This instance of GBP comprises two types of region definition, namely, single-node regions and double-node regions, and the corresponding messages are called edge messages and cluster messages, respectively. The message update rules are defined in (3.2) and (3.3). A sketch of the message passing process is shown in Figure 3. Equation (3.2) describes the edge message sent from a specific node to a neighboring node. Equation (3.3) describes the cluster message sent from the node pair $s$ and $t$ to the node pair $u$ and $v$.


4. Efficiency Improvement

To improve the efficiency of GBP, the direction set method is proposed to reduce the computational complexity of the cluster message. Consider (3.3): when $x_u$ and $x_v$ are given, the time complexity of computing a specific item in the cluster message is $O(L^2)$ for a search range of $L$ labels. It is contributed almost entirely by the first term in the equation, which can be regarded as finding the minimum value in a grid-like dataset whose two sides, indexed by $x_s$ and $x_t$, each have size $L$. Usually all the elements in the lattice are traversed to find the minimum, so the time complexity is $O(L^2)$. Petersen et al. proposed a method to reduce the search space [7, 15], but it relies heavily on the traversal order. The method suggested in this paper is very straightforward. The time complexity of one item becomes $O(L)$ when the direction set method is applied. Thus the total complexity of computing the cluster message becomes $O(L^3)$.

The direction set method adopted here is also called Powell's method [16]. It is a classical numerical algorithm for function minimization or maximization. It decomposes an $N$-dimensional ($N$-D) search problem into several one-dimensional (1D) search processes. Take as an example a 2D lattice in which a node $p$ with a random initial position and two orthogonal directions are given. First, $p$ moves to the extreme value position found by searching along the first of the two initial directions. Second, $p$ moves to another extreme value position by searching along the second initial direction. Third, the first direction is replaced by the second, and the second direction is set to a new direction determined by the initial position and the final position after the two rounds of searching. Meanwhile, the final position becomes the new initial position. The three steps are performed iteratively until $p$ no longer moves. In other words, along the two search directions there is no other position whose value is less than that at $p$. Thus, the final position is where $p$ stops.

The general direction set (Powell's) method has a challenging problem: in some cases the two directions will "fold up on each other". Once this happens, the search capability in that iteration is weakened, and the process runs a high risk of finding the minimum of a subspace instead of the full $N$-D space. Moreover, in practice it is hard for a computer to search along an arbitrary direction on a lattice, because extra computation is needed to determine which nodes the direction passes through. This paper adopts the method suggested in [16]: the two directions are kept static and parallel to the axes. This setting not only keeps the directions orthogonal from beginning to end, but also makes the implementation easier, because every search runs along one of the axes.
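The axis-parallel variant described above can be sketched on a 2D cost table as follows. This is a minimal illustration, assuming the table is available as a NumPy array and that each 1D search simply scans a full row or column; the name `axis_aligned_search` and the `accessed` counter are hypothetical and not taken from the paper.

```python
import numpy as np

def axis_aligned_search(cost, start, max_iters=20):
    """Alternate 1D minimisations along the two lattice axes (sketch).

    cost:  2D array, cost[s, t] is the value at lattice position (s, t)
    start: initial (s, t) position
    Returns the final position, its value, and the number of entries read.
    """
    s, t = start
    accessed = 0
    for _ in range(max_iters):
        # First direction: move along the s axis with t fixed.
        s_new = int(np.argmin(cost[:, t]))
        # Second direction: move along the t axis with s fixed.
        t_new = int(np.argmin(cost[s_new, :]))
        accessed += cost.shape[0] + cost.shape[1]
        if (s_new, t_new) == (s, t):
            break  # the node no longer moves
        s, t = s_new, t_new
    return (s, t), float(cost[s, t]), accessed
```

For a table whose minimum can be reached by axis moves, for example `cost[s, t] = (s - 3)**2 + (t - 5)**2` on an 8 × 8 lattice, the search stops at `(3, 5)` after reading 32 entries instead of all 64. On less regular tables the search may stop at a local minimum, which is consistent with the slight rise in convergence energy reported in the experiments.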

Although there is no special requirement for the start position, it is more useful to place the initial position close to the extreme value position. When $x_u$ and $x_v$ are given, choosing the initial position on the s-t lattice is a tricky issue. To place it near the minimum value position, we assume that the combination of the independent minimum value positions of $x_s$ and $x_t$ is close to the actual minimum value position.
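The initialization heuristic can be illustrated with a small example. The sketch below assumes, purely for illustration, that the lattice values decompose into one term per axis plus a truncated linear coupling; the helper `independent_minima_start` and all numbers are hypothetical.

```python
import numpy as np

def independent_minima_start(cost_s, cost_t):
    """Start position built from the separate minima of each axis term."""
    return int(np.argmin(cost_s)), int(np.argmin(cost_t))

# Hypothetical per-axis terms and a truncated linear coupling between them.
cost_s = np.array([5.0, 1.0, 4.0, 6.0])
cost_t = np.array([7.0, 3.0, 0.0, 2.0])
idx = np.arange(4)
coupling = np.minimum(np.abs(idx[:, None] - idx[None, :]), 2.0)
table = cost_s[:, None] + cost_t[None, :] + coupling

start = independent_minima_start(cost_s, cost_t)
best = tuple(int(i) for i in np.unravel_index(np.argmin(table), table.shape))
```

In this example the start position coincides with the true minimum of the table, so the subsequent axis searches only have to confirm it. When the coupling term dominates, the two positions can differ, and the start position is then only a nearby guess.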

Through this optimization, the number of accessed positions is decreased from $L^2$ to $2kL$, where $k$ is the number of iterations, which in our practice is about 2 to 3 on average, and $L$ is the search range, for example, the disparity range for stereo matching. Since the comparison operation takes most of the computation time, the overall complexity becomes $O(kL)$, while the complexity of brute force search is $O(L^2)$. The efficiency ratio is $L/(2k)$. The larger $L$ is, the greater the saving in computation time.
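A short calculation makes the saving concrete. The sketch below assumes that every iteration scans one full line of the lattice along each of the two axes, so that the count per iteration is twice the search range; the function name `accessed_positions` is hypothetical.

```python
def accessed_positions(search_range, iterations):
    """Compare brute-force and axis-search access counts (assumed model)."""
    brute_force = search_range ** 2
    # Assumption: two full 1D scans of length search_range per iteration.
    direction_set = 2 * iterations * search_range
    return brute_force, direction_set, brute_force / direction_set

counts_small = accessed_positions(16, 3)
counts_large = accessed_positions(64, 3)
```

With a search range of 16 and three iterations the brute-force scan reads 256 positions against 96 for the axis search, and the ratio grows linearly as the search range increases.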

5. Experiments

Stereo matching is one of the most challenging and fundamental problems in computer vision. Comprehensive research has been done in the last decade [17–22], and a recent evaluation of the various methods can be found in [23]. In the last few years, as shown in [24], global methods based on MRF have become the top performers.

In this section, stereo matching is formulated as an MRF inference problem. To obtain the MAP estimate, which can be posed as an energy minimization problem, let $\mathcal{P}$ be the set of pixels in the image pair and $d_p$ be the disparity of pixel $p$. The initial data cost, computed with a truncated linear transform that is robust to noise and outliers, is defined as

$$D_p(d_p) = \lambda \cdot \min\Big( \sqrt{\textstyle\sum_{c} \big( I^{L}_{c}(p) - I^{R}_{c}(p - d_p) \big)^2},\; T_d \Big), \tag{5.1}$$

where $\lambda$ is the cost weight, which determines the portion of the whole energy taken by the data cost, and $T_d$ represents the truncating value. Both are set experimentally. $I^{L}_{c}(p)$ represents the intensity of $p$ in channel $c$ of the left image; $I^{R}_{c}$ is defined similarly. Birchfield and Tomasi's pixel dissimilarity is used to improve robustness against image sampling noise. Note that the data cost is calculated in the CIELAB (the standard of the Commission Internationale de L'Eclairage) color space, with the Euclidean distance as the measure. Practical experiments show that this improves the final results to some degree.
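A simplified version of such a truncated data cost can be sketched as follows. To stay short, the sketch assumes single-channel images and a plain absolute difference; it omits the Birchfield and Tomasi dissimilarity and the CIELAB conversion used in the paper. The function name `truncated_data_cost` and its parameter names are hypothetical.

```python
import numpy as np

def truncated_data_cost(left, right, n_disp, weight, trunc):
    """Truncated absolute-difference data cost volume (simplified sketch).

    left, right: (H, W) single-channel images (assumption)
    n_disp:      number of disparity levels, at most W
    weight:      cost weight of the data term
    trunc:       truncating value
    Returns an (H, W, n_disp) array of costs.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    # Pixels without a valid match at disparity d keep the truncated cost.
    cost = np.full((h, w, n_disp), weight * trunc)
    for d in range(n_disp):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[:, d:, d] = weight * np.minimum(diff, trunc)
    return cost
```

The truncation bounds the penalty any single pixel can contribute, which is what makes the cost robust to outliers such as occluded pixels.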

The smooth cost, which expresses the compatibility between neighboring variables, is based on the truncated linear model and is defined as

$$V(d_p, d_q) = \min\big( |d_p - d_q|,\; T_s \big), \tag{5.2}$$

where $T_s$ is the truncating value. The smooth cost based on the truncated linear model is also referred to as a discontinuity preserving cost, since it prevents the edges of objects from being oversmoothed.
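The truncated linear smooth cost can be tabulated once for all label pairs. The sketch below assumes integer disparity labels starting at zero and no additional weight on the term; the function name `truncated_linear_smooth_cost` is hypothetical.

```python
import numpy as np

def truncated_linear_smooth_cost(n_labels, trunc):
    """Table of min(|a - b|, trunc) for all label pairs (a, b)."""
    d = np.arange(n_labels)
    return np.minimum(np.abs(d[:, None] - d[None, :]), trunc).astype(float)

smooth = truncated_linear_smooth_cost(4, 2.0)
```

Because the cost stops growing once the label difference exceeds the truncating value, a large disparity jump at an object boundary is penalized no more than a moderate one.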

The corresponding energy function used here is the most widely used one, defined as

$$E(d) = \sum_{p \in \mathcal{P}} D_p(d_p) + \sum_{(p, q) \in \mathcal{N}} V(d_p, d_q), \tag{5.3}$$

where $\mathcal{N}$ are the edges in the four-connected neighborhood set.

The energy function defined in (5.3) can be considered a description of the scene. The objective is to find a solution that minimizes (5.3), which corresponds to the correct depth information of the scene. Generally, a more complex energy function can yield a more correct solution. However, to simplify the presentation and to be consistent and comparable with other methods, the two-term energy function (5.3) is used in this paper.

The proposed method is evaluated on the Middlebury test. We compared our results with efficient BP [6] and canonical GBP to show the improved efficiency as well as the accuracy of the proposed method. The same set of typical parameters was used, where specifically, and in the data cost term, = 10.0 in the smooth cost term, = 16. All experiments were run on a personal computer with a 1.6 GHz CPU and 2 GB of DRAM.

As expected, efficient BP is much faster than the others. However, it is less accurate. The purpose of the proposed method is to improve the efficiency of canonical GBP while maintaining good accuracy. As shown in Figure 4, the execution time is greatly reduced when the two strategies are combined, while the convergence energy rises slightly because the direction set method may cause a loss of accuracy. Canonical GBP with caching and direction set achieves a speedup of more than 15 times. The experiments were run on the "Tsukuba" image (384 × 288).

The error of each result is calculated against the ground truth. According to the accuracy listed in Table 1, the proposed method is clearly more accurate than canonical GBP and reaches a level similar to that of efficient BP. On the other hand, the comparison between the proposed method and efficient BP shows that efficient BP tends to produce a frontoparallel result, which oversmooths the surface and results in a layered effect. In contrast, the proposed method does not suffer from the layered effect, but the 3D map becomes blurred at the boundaries and some noise cannot be eliminated. In fact, although a layered result can reach a lower energy, it is not always a better description of the real scene (Figure 5).
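The text does not state which error measure is used. One common choice for stereo benchmarks of this kind is the fraction of pixels whose disparity differs from the ground truth by more than a threshold; the sketch below assumes that measure, and the function name `bad_pixel_rate`, the default threshold, and the optional validity mask are all illustrative assumptions.

```python
import numpy as np

def bad_pixel_rate(disparity, ground_truth, threshold=1.0, valid=None):
    """Fraction of valid pixels with disparity error above the threshold."""
    disparity = np.asarray(disparity, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    if valid is None:
        # Assumption: every pixel has ground truth unless a mask is given.
        valid = np.ones(ground_truth.shape, dtype=bool)
    bad = np.abs(disparity - ground_truth) > threshold
    return float(bad[valid].mean())
```

A validity mask is useful when the ground truth is undefined for some pixels, for example in occluded regions or near the image border.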


6. Conclusion

This paper studied challenging issues in both physics and computer vision, namely, efficient optimization of GBP and stereo correspondence for 3D vision. A min-sum scheme is devised for the message computing process in GBP, and this new method is applied to the stereo matching problem. The direction set method is proposed to improve efficiency. For a typical image pair, it speeds up the matching process by more than 15 times. Besides this improved single-thread speed, with a parallel computing architecture the method could catch up with or overtake most contemporary global algorithms, owing to its message-based passing process. Furthermore, the proposed method yields visually more plausible results, because its better convergence can outperform most other global algorithms. The practical experiments also support these conclusions in comparison with both efficient BP and canonical GBP.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC-60870002, 60802087, 60873264, 61070214), the 973 Plan [2011CB302800], NCET, and the Science and Technology Department of Zhejiang Province (2009C21008, 2010R10006, 2010C33095, Y1090592).

References

1. M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, “MAP estimation via agreement on trees: message-passing and linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 11, pp. 3697–3717, 2005.
2. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
3. J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Understanding belief propagation and its generalizations,” in Exploring Artificial Intelligence in the New Millennium, chapter 8, pp. 236–249, 2003.
4. J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Generalized belief propagation,” Neural Information Processing Systems, vol. 13, no. 7, pp. 689–695, 2000.
5. J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2282–2312, 2005.
6. P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient belief propagation for early vision,” International Journal of Computer Vision, vol. 70, no. 1, pp. 41–54, 2006.
7. K. Petersen, J. Fehr, and H. Burkhardt, “Fast generalized belief propagation for MAP estimation on 2D and 3D grid-like Markov random fields,” in Proceedings of the 30th Deutsche-Arbeitsgemeinschaft-fur-Mustererkennung (DAGM) Symposium on Pattern Recognition, pp. 10–13, Munich, Germany, June 2008.
8. K. M. Pawan and P. H. S. Torr, “Fast memory-efficient generalized belief propagation,” in Proceedings of the European Conference on Computer Vision, Part IV, pp. 451–463.
9. S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, 3rd edition, 2009.
10. S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
11. C. Cattani, “Fractals and hidden symmetries in DNA,” Mathematical Problems in Engineering, vol. 2010, Article ID 507056, 31 pages, 2010.
12. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
13. J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum, “Symmetric stereo matching for occlusion handling,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 399–406, June 2005.
14. J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Bethe free energy, Kikuchi approximations and belief propagation algorithms,” Tech. Rep. TR-2001-16, Mitsubishi Electric Research Laboratories, 2001.
15. E. G. Bakhoum and C. Toma, “Dynamical aspects of macroscopic and quantum transitions due to coherence function and time series events,” Mathematical Problems in Engineering, vol. 2010, Article ID 428903, 13 pages, 2010.
16. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, Cambridge University Press, Cambridge, UK, 2nd edition, 1992.
17. R. Balasubramanian, S. Das, and K. Swaminathan, “Reconstruction of quadratic curves in 3-D from two or more perspective views,” Mathematical Problems in Engineering, vol. 8, no. 3, pp. 207–219, 2002.
18. C. L. Zitnick and S. B. Kang, “Stereo for image-based rendering using image over-segmentation,” International Journal of Computer Vision, vol. 75, no. 1, pp. 49–65, 2007.
19. R. Szeliski, R. Zabih, D. Scharstein et al., “A comparative study of energy minimization methods for Markov random fields with smoothness-based priors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 6, pp. 1068–1080, 2008.
20. O. J. Woodford, P. H. S. Torr, I. D. Reid, and A. W. Fitzgibbon, “Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2241–2246, 2007.
21. M. Bleyer, C. Rother, and P. Kohli, “Surface stereo with soft segmentation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1570–1577, June 2010.
22. Q. Yang, L. Wang, R. Yang, H. Stewénius, and D. Nistér, “Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 492–504, 2009.
23. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
24. D. Scharstein and R. Szeliski, “Middlebury Stereo Vision Research,” 2008, http://vision.middlebury.edu/stereo/eval/.

Copyright

Copyright © 2011 S. Y. Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
