Abstract

Edge weight-based segmentation methods, such as normalized cut or minimum cut, require the number of partitions to be specified in their energy formulation. The number of partitions plays an important role in the overall segmentation quality. However, finding a suitable partition number is a nontrivial problem, and the number is ordinarily assigned manually. This is an aspect of the general partition problem, where finding the partition number is an important and difficult issue. In this paper, the edge weights instead of the pixels are partitioned to segment the images. By partitioning the edge weights into two disjoint sets, that is, cut and connect, an image can be partitioned into all possible disjoint segments. The proposed energy function is independent of the number of segments. The energy is minimized by iterating the QPBO-$\alpha$-expansion algorithm over the pairwise Markov random field and the mean estimation of the cut and connected edges. Experiments using the Berkeley database show that the proposed segmentation method obtains comparably accurate segmentation results without designating the number of segments.

1. Introduction

There are numerous approaches and applications for unsupervised image segmentation in computer vision. Different theories have been proposed for the different roles of unsupervised segmentation. As a low-level vision problem, an image can be simplified by oversegmentation using a number of different approaches, such as mode-seeking mean shift, multilevel thresholding, histogram-based neural networks, superpixel algorithms, and various graph-based methods [1–4]. Conversely, semantic segmentation is attempted for simultaneous detection, recognition, and segmentation [5].

Generally, the role of unsupervised segmentation falls between image simplification and full semantic segmentation, where semantically meaningful segments are expected to be found but not necessarily recognized. Segmentation is posed as an image-coloring problem that minimizes specific energy functions. Energy functions can be optimized using stochastic methods such as deterministic annealing and stochastic clustering [6–10]. For graph theoretic segmentation approaches, the spectral method and graph cut are efficient deterministic optimization methods [11–13]. Another traditional segmentation method is the variational method, which evolves boundary contours in a level set framework [14, 15].

The edge weight-based segmentation methods have evolved together with graph partition problems. When the edge weights are all positive, the minimum cut can be found; however, the minimum cut is biased toward smaller cuts. Adding negative edge weights can prevent this bias, but the graph then becomes nonsubmodular and the problem becomes NP-hard [16]. Different algorithms have been introduced to approximate the correlation clustering problem [17, 18]. In contrast, Shi and Malik normalized nonnegative edge weights so that the bias toward smaller cuts was eliminated [11].

For the graph theoretic segmentation and level set methods, the number of segments must be predefined. The choice of the segment number greatly influences the quality of segmentation, especially for a normalized cut. Nonetheless, there have been attempts to solve this problem. The number of segments can be controlled by setting a threshold value for the recursive normalized cut [11]. For level set approaches, the four-color theorem was used to segment images with an arbitrary number of phases with one or two level set functions [19]. However, these methods are still functions of $k$, the number of segments.

In this paper, transforming the pixel clustering problem into an edge partition problem circumvents the segment number selection problem. Edges among adjacent pixels can represent dissimilarity or similarity weights. Two edge partitions are always sufficient for pixel-partitioning problems. An edge can be in a cut set or a connected set, and the two sets can then be translated into a unique segmentation, as in Figure 1(c). A cut edge indicates that its two node labels are different, whereas a connected edge indicates that its two nodes have the same label. In most cases, however, an arbitrary cut or connect assignment on the edges does not define a valid segmentation configuration, as in Figure 1(d). Random cut and connect assignments on the edges may result in contradictory node labels. Under the pixel coloring framework, cut and connect assignments on the edges are defined concurrently with the pixel labels, so inconsistencies such as those in Figure 1(d) are prevented.
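The contradiction described above can be checked mechanically. The following is a minimal sketch, not part of the proposed method: it assumes an unlimited supply of labels, merges nodes joined by connected edges with a union-find structure, and reports a contradiction when a cut edge ends up inside a single merged group. The function name `is_consistent` and its arguments are hypothetical.

```python
def is_consistent(num_nodes, edges, is_cut):
    """Return True if a cut/connect assignment corresponds to some labeling.

    edges  : list of (i, j) node index pairs
    is_cut : list of bools, True for a cut edge, False for a connected edge
    """
    parent = list(range(num_nodes))

    def find(x):
        # Path-halving find for the union-find structure.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Nodes joined by connected edges must share a label, so merge them.
    for (i, j), cut in zip(edges, is_cut):
        if not cut:
            parent[find(i)] = find(j)

    # A cut edge whose endpoints were merged would need two different
    # labels inside one segment, which is a contradiction.
    return all(find(i) != find(j)
               for (i, j), cut in zip(edges, is_cut) if cut)


# Four pixels in a 2x2 grid, joined in a cycle.
grid_edges = [(0, 1), (1, 3), (3, 2), (2, 0)]
single_cut = is_consistent(4, grid_edges, [True, False, False, False])
two_cuts = is_consistent(4, grid_edges, [True, False, True, False])
```

A single cut edge on the cycle is contradictory because the remaining connected edges force all four pixels into one segment, whereas two opposite cut edges split the grid into two valid segments.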

Under the pixel-labeling framework, a label number selection problem arises. Although the label number selection might seem similar to the segment number selection problem, there are subtle differences. First, pixels do not need to use all label assignments; thus, low numbers of segments are possible with large numbers of labels. Second, under the four-color map theorem, the maximum number of labels for two-dimensional (2D) segmentation can be as low as four. The four-color map theorem states that any 2D map can be colored with intact borders using a maximum of four colors [26]. This theorem can be translated directly to the segmentation problem; any 2D image segmentation can be represented using four labels [19].

In the following sections, a new energy function is introduced for image segmentation through the edge partition. The edge partitions can uniquely define the image segmentation with the hard constraints enforced by the image-labeling framework. Next, an energy minimization algorithm is proposed for the edge partitioning. The experimental section discusses tests of the proposed algorithm using the Berkeley image segmentation database.

2. Pixel Clustering

Image segmentation can be viewed as a pixel-partitioning problem. Many image segmentation methods borrow their ideas from general partitioning techniques. The $k$-means algorithm minimizes the following function and segments the image into $k$ regions. Consider

$$J_{\text{pixel}} = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2, \tag{1}$$

where $x_j$ is the pixel feature value and $\mu_i$ is the mean value of the respective partition $S_i$. The $k$-means algorithm minimizes the sum of the squared distances from the mean of each partition. The energy function must have a fixed segmentation number $k$. However, estimating the number of segments is a difficult task, and the number of partitions is often designated by human discretion.
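As a small illustration of how the pixel clustering energy depends on the chosen number of partitions, the sketch below evaluates the sum of squared distances to the partition means for scalar features. The function name `kmeans_energy` and the toy values are assumptions made for this example only.

```python
def kmeans_energy(values, assignment, k):
    """Sum of squared distances of scalar features to their partition means."""
    energy = 0.0
    for part in range(k):
        members = [v for v, a in zip(values, assignment) if a == part]
        if not members:
            continue  # an empty partition contributes nothing
        mean = sum(members) / len(members)
        energy += sum((v - mean) ** 2 for v in members)
    return energy


# Four toy pixel intensities, evaluated with two partitions and with one.
pixels = [1.0, 2.0, 9.0, 11.0]
energy_two = kmeans_energy(pixels, [0, 0, 1, 1], k=2)
energy_one = kmeans_energy(pixels, [0, 0, 0, 0], k=1)
```

The energy always decreases or stays equal as more partitions are allowed, which is why the energy alone cannot be used to pick the number of partitions.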

3. Edge Partitions

An image can be represented as a set of nodes $V$ and edges $E$ by a graph $G = (V, E)$. An edge $e_{ij} \in E$ is assigned a weight $w_{ij}$ between nodes $i, j \in V$. For each node $i$, a label $l_i$ from $\{1, \ldots, L\}$ is assigned to define a segmentation.

3.1. Energy Function

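The segmentation problem is formulated in terms of edge partitions. The edges can be partitioned into two sets $E_{\text{cut}}$ (cut) and $E_{\text{con}}$ (connect), such that $E_{\text{cut}} \cup E_{\text{con}} = E$ and $E_{\text{cut}} \cap E_{\text{con}} = \emptyset$. If an edge is in $E_{\text{con}}$, the pixel nodes connected by the edge have the same label. Otherwise, if an edge is in $E_{\text{cut}}$, the pixel nodes connected by the edge have different labels:

$$e_{ij} \in \begin{cases} E_{\text{con}} & \text{if } l_i = l_j, \\ E_{\text{cut}} & \text{if } l_i \neq l_j. \end{cases} \tag{2}$$

In (2), $e_{ij}$ is an edge between pixel nodes $i$ and $j$. The pixel labels for pixels $i$ and $j$ are denoted by $l_i$ and $l_j$, respectively. $w_{ij}$ is a positive edge weight between pixel nodes $i$ and $j$. $w_{ij}$ can be a similarity or dissimilarity measure between the two pixel nodes. A simple example of $w_{ij}$ is the absolute difference between two pixel colors. Thus, if the colors of the two pixels have a large difference, the edge will likely be in $E_{\text{cut}}$. If the two pixel colors have a small difference, the edge should be in $E_{\text{con}}$. The mean edge weight values of the $E_{\text{cut}}$ and $E_{\text{con}}$ edge sets are found in the following equation:

$$\mu_{\text{cut}} = \frac{1}{|E_{\text{cut}}|} \sum_{e_{ij} \in E_{\text{cut}}} w_{ij}, \qquad \mu_{\text{con}} = \frac{1}{|E_{\text{con}}|} \sum_{e_{ij} \in E_{\text{con}}} w_{ij}, \tag{3}$$

and then the energy function associated with the edge partitions can be defined by the following equation:

$$J_{\text{edge}} = \sum_{e_{ij} \in E_{\text{cut}}} \left( w_{ij} - \mu_{\text{cut}} \right)^2 + \sum_{e_{ij} \in E_{\text{con}}} \left( w_{ij} - \mu_{\text{con}} \right)^2. \tag{4}$$

Here $|\cdot|$ is the cardinality of the set. The energy function (4) is the same as that of the $k$-means algorithm in (1) except that the number of partitions is set to $k = 2$. The proposed energy function has two mean centers, but it also has the hard constraints in (2). Regardless of the segmentation number, there can only be two partitions for the edges, the cut edges $E_{\text{cut}}$ and the connected edges $E_{\text{con}}$.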

The proposed energy function is cast as an image-labeling problem in order to maintain the label consistency conditions of (2). The image label state that minimizes (4), subject to the constraints (2) and the mean definitions (3), is the proposed segmentation state. The number of labels must be at least two; otherwise no edge can be cut and (3) divides by zero. Under the well-known four-color map theorem, four labels are sufficient to define all possible segment configurations for 2D images [19].
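To make the dependence of the edge partition energy on the node labels concrete, the following sketch derives the cut and connected sets from a labeling, computes the two means, and sums the squared deviations. It is an illustrative reading of the formulation, and the names `edge_partition_energy`, `mu_cut`, and `mu_con` are chosen here for the example.

```python
def edge_partition_energy(edges, weights, labels):
    """Evaluate the edge partition energy for a given node labeling.

    edges   : list of (i, j) node index pairs
    weights : list of positive edge weights, one per edge
    labels  : list of node labels
    Returns (energy, mu_cut, mu_con).
    """
    # An edge is cut when its endpoint labels differ, connected otherwise.
    cut = [w for (i, j), w in zip(edges, weights) if labels[i] != labels[j]]
    con = [w for (i, j), w in zip(edges, weights) if labels[i] == labels[j]]
    if not cut or not con:
        raise ValueError("both edge sets must be non-empty to define the means")
    mu_cut = sum(cut) / len(cut)
    mu_con = sum(con) / len(con)
    energy = (sum((w - mu_cut) ** 2 for w in cut)
              + sum((w - mu_con) ** 2 for w in con))
    return energy, mu_cut, mu_con


# A chain of four pixels with one strong boundary in the middle.
chain_edges = [(0, 1), (1, 2), (2, 3)]
chain_weights = [1.0, 9.0, 2.0]
good = edge_partition_energy(chain_edges, chain_weights, [0, 0, 1, 1])
bad = edge_partition_energy(chain_edges, chain_weights, [0, 1, 1, 1])
```

Cutting the high-weight edge gives a lower energy than cutting a low-weight edge, and the number of segments never appears as a parameter.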

3.2. Optimization

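Given the image label state $l$, the mean values $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ can be estimated as in (3). Conversely, if $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ are kept constant, the image label state $l$ can be found by optimizing the following pairwise energy function:

$$J_{\text{pair}}(l) = \sum_{e_{ij} \in E} V_{ij}(l_i, l_j), \qquad V_{ij}(l_i, l_j) = \begin{cases} \left( w_{ij} - \mu_{\text{con}} \right)^2 & \text{if } l_i = l_j, \\ \left( w_{ij} - \mu_{\text{cut}} \right)^2 & \text{otherwise}, \end{cases} \tag{5}$$

where $w_{ij}$ is the weight of edge $e_{ij}$. If the labels of the two nodes joined by an edge are not the same, the edge is considered to be in the cut set; otherwise, it belongs to the connected set. With $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ constant, minimizing (5) is equivalent to minimizing the edge partition function (4).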

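The multilabel pairwise energy function (5) can be minimized by QPBO-$\alpha$-expansion. QPBO-$\alpha$-expansion optimizes multilabel MRFs by iteratively expanding a single label $\alpha$ using graph cut [27]. Graph cut can find the optimal expansion if the expansion is submodular. In this problem, the expansions are nonsubmodular. The pairwise potentials for QPBO-$\alpha$-expansion, where $l$ is the current label state, can be defined as follows:

$$\theta_{ij}(x_i, x_j) = \begin{cases} V_{ij}(l_i, l_j) & \text{if } (x_i, x_j) = (0, 0), \\ V_{ij}(l_i, \alpha) & \text{if } (x_i, x_j) = (0, 1), \\ V_{ij}(\alpha, l_j) & \text{if } (x_i, x_j) = (1, 0), \\ V_{ij}(\alpha, \alpha) & \text{if } (x_i, x_j) = (1, 1). \end{cases} \tag{6}$$

Here $x_i = 1$ means that node $i$ switches to label $\alpha$ and $x_i = 0$ means that it keeps its current label $l_i$, and $V_{ij}(a, b)$ is the cost that (5) assigns to edge $e_{ij}$ when its end nodes carry labels $a$ and $b$. This nonsubmodular binary labeling problem can be approached using the QPBO algorithm [28], which may leave a large number of nodes unlabeled. The recently introduced QPBO-improve (QPBOI) algorithm can cope with unlabeled regions [28]; however, it is not as efficient as the graph cut that minimizes submodular potentials. The QPBOI algorithm can randomly improve the solution, but iterating the improvement steps can be time-consuming for large numbers of nodes.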
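The reason the expansion moves are nonsubmodular can be seen on a single edge. The sketch below builds the 2x2 binary cost table of one expansion move in the standard way (0 keeps the current label, 1 switches to the expanded label) and applies the usual submodularity test. All names, including `pair_cost`, `expansion_table`, and `alpha`, are illustrative assumptions, and no QPBO solver is called.

```python
def pair_cost(w, label_i, label_j, mu_cut, mu_con):
    """Cost of one edge: squared distance to the connect mean if the labels
    agree, squared distance to the cut mean otherwise."""
    mu = mu_con if label_i == label_j else mu_cut
    return (w - mu) ** 2


def expansion_table(w, label_i, label_j, alpha, mu_cut, mu_con):
    """2x2 table theta[x_i][x_j] for one expansion move on label alpha.

    x = 0 keeps the current label, x = 1 switches the node to alpha.
    """
    options_i = (label_i, alpha)
    options_j = (label_j, alpha)
    return [[pair_cost(w, a, b, mu_cut, mu_con) for b in options_j]
            for a in options_i]


def is_submodular(theta):
    """Standard test for a binary pairwise term."""
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]


# Two nodes that currently share label 0, expansion on label 1.
strong_edge = expansion_table(9.0, 0, 0, 1, mu_cut=9.0, mu_con=1.5)
weak_edge = expansion_table(1.0, 0, 0, 1, mu_cut=9.0, mu_con=1.5)
```

An edge whose weight is close to the cut mean prefers that its end nodes disagree, which violates the submodularity condition, while an edge close to the connect mean satisfies it. A plain graph cut therefore cannot be used for the move.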

Similar to the original $k$-means algorithm, good initialization is helpful to the optimization. The initial estimates of the means, $\mu_{\text{cut}}$ and $\mu_{\text{con}}$, can be found by a $k$-means minimization of the edge partition energy (4) without the labeling constraint of (2). To estimate the initial state $l$, the pixel clustering $k$-means algorithm (1) can be used. The general framework is illustrated in Algorithm 1.
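A minimal sketch of the unconstrained initialization of the two means follows, assuming scalar edge weights and two clusters. Starting the centers at the smallest and largest weights is an assumption of this example, not something the text prescribes, and `two_means_1d` is a hypothetical name.

```python
def two_means_1d(weights, iterations=100):
    """Split scalar edge weights into a low (connect) and high (cut) cluster.

    Returns (mu_con, mu_cut) with mu_con <= mu_cut.
    """
    mu_con, mu_cut = min(weights), max(weights)  # assumed initial centers
    for _ in range(iterations):
        # Assign every weight to the nearer center (ties go to the low one).
        low = [w for w in weights if abs(w - mu_con) <= abs(w - mu_cut)]
        high = [w for w in weights if abs(w - mu_con) > abs(w - mu_cut)]
        if not high:
            break  # all weights identical: nothing to split
        new_con = sum(low) / len(low)
        new_cut = sum(high) / len(high)
        if (new_con, new_cut) == (mu_con, mu_cut):
            break  # centers stopped moving
        mu_con, mu_cut = new_con, new_cut
    return mu_con, mu_cut


initial_means = two_means_1d([1.0, 2.0, 9.0, 11.0])
```

Because the labeling constraint is ignored here, the resulting split of the edges may not correspond to any valid segmentation; it only supplies starting values for the two means.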

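Algorithm 1
(1) Estimate the image label state $l$ with the $k$-means algorithm on pixels.
(2) Estimate $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ by the $k$-means minimization of (4) without the labeling constraint of (2).
(3) Estimate the image label state $l$ using the QPBOI-$\alpha$-expansion (keep $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ constant).
(4) Estimate $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ from the image label state $l$.
(5) If $\mu_{\text{cut}}$ and $\mu_{\text{con}}$ are unchanged, terminate. Else, repeat steps 3 and 4.
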
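The alternation in Algorithm 1 can be sketched in a few lines under strong simplifications. In this sketch the QPBOI expansion step is replaced by iterated conditional modes (ICM), a greedy per-node update that is much weaker than the optimizer used in this paper; the initial labels are passed in by the caller; and the means start at the extreme weights. The function name `segment_edges` and all parameters are hypothetical.

```python
def segment_edges(num_nodes, edges, weights, labels, num_labels=4, max_rounds=50):
    """Alternate label updates and mean updates until the means stop changing.

    Label updates use iterated conditional modes (ICM), a simple greedy
    stand-in for the QPBOI expansion step, not the paper's optimizer.
    """
    labels = list(labels)
    # Adjacency list: node -> list of (neighbor, weight).
    nbrs = [[] for _ in range(num_nodes)]
    for (i, j), w in zip(edges, weights):
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    # Step 2 stand-in: start the means at the extreme weights.
    mu_con, mu_cut = min(weights), max(weights)

    for _ in range(max_rounds):
        # Step 3 stand-in: greedy per-node relabeling with fixed means.
        for _sweep in range(100):
            changed = False
            for i in range(num_nodes):
                def cost(lab):
                    return sum((w - (mu_con if labels[j] == lab else mu_cut)) ** 2
                               for j, w in nbrs[i])
                best = min(range(num_labels), key=cost)
                if cost(best) < cost(labels[i]):
                    labels[i] = best
                    changed = True
            if not changed:
                break

        # Step 4: re-estimate the means from the current labeling.
        cut = [w for (i, j), w in zip(edges, weights) if labels[i] != labels[j]]
        con = [w for (i, j), w in zip(edges, weights) if labels[i] == labels[j]]
        if not cut or not con:
            break  # one set is empty, so its mean is undefined
        new_con, new_cut = sum(con) / len(con), sum(cut) / len(cut)
        # Step 5: stop when the means no longer change.
        if (new_con, new_cut) == (mu_con, mu_cut):
            break
        mu_con, mu_cut = new_con, new_cut
    return labels, mu_con, mu_cut


# A chain of four pixels, all starting with the same label.
result = segment_edges(4, [(0, 1), (1, 2), (2, 3)], [1.0, 9.0, 2.0], [0, 0, 0, 0])
```

On this toy chain the loop separates the pixels at the high-weight edge without being told how many segments to produce. ICM can stall in poor local minima on real images, which is one reason a stronger move-making optimizer is used in the paper.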
3.3. Edge Weights

Various examples of the edge partition segmentation results using the color distance edge weights are shown in Figure 2 for the MSRC image database [29]. The color distance between neighboring pixels is sufficient for some image segmentation problems, but more rigorous weight calculations are often better suited for semantic segmentation. Instead of proposing new edge weight calculations, an existing state-of-the-art contour detection algorithm is incorporated.

The global probability of the boundary (GPB) edge detection method [25, 30], which scored best on the Berkeley database (http://www.cs.berkeley.edu/projects/vision/bsds), is employed to compute the edge weights. The edge weights connect the pixel nodes, and the proposed edge partitioning algorithm can then be applied. Figure 3 shows further segmentation results under the pixel-to-pixel edge connections. Although Figures 3(a) and 3(b) show good segmentation results, the QPBOI algorithm cannot obtain a good segmentation in Figures 3(c) and 3(d). The QPBOI algorithm often fails in the presence of a large number of nodes. Thus, to reduce both the computational time and the chance of failure in the QPBOI algorithm, the oversegmentation process of [25] is adopted in this segmentation. The edges are connected between the superpixels instead of the pixels. The number of oversegments is between 400 and 1000. On average, the edge partitioning algorithm segments a BSDS image in under 5 seconds.

4. Evaluation

The proposed edge partition approach is evaluated using the popular Berkeley image database. The set contains 300 images with at least four human segment annotations per image. The three quantitative evaluation methods used are as follows: Probabilistic Rand Index (PRI) [31], Variation of Information (VoI) [32], and Boundary Displacement Error (BDE) [33]. Global Consistency Error (GCE) [34] is not included in this evaluation. GCE measures the extent to which one segmentation can be viewed as a refinement of another. However, one pixel per segment and one segment for an entire image can give zero error for GCE [31]. GCE favors extremely oversegmented or undersegmented results, and both cases are unwanted for a semantic segmentation. GCE is deemed to be an inconsistent evaluation method.

The evaluation methods used in this study are PRI, VoI, and BDE. PRI counts the number of consistent labels between the segmentation and the ground truth. VoI measures the segmentation randomness that cannot be explained by the ground truth. BDE is the average displacement error of the boundary pixels between two segmentation results. PRI measures the correctness of a segmentation, while VoI and BDE measure the errors between the segmentation and the ground truth. In the first subsection, the proposed method is evaluated against various segmentation methods. In the second subsection, the proposed method and the merge-threshold method are compared using the same edge weights.
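For readers unfamiliar with these measures, the sketch below computes the Rand index and the Variation of Information between two labelings given as flat lists of per-pixel labels. It is an illustration of the standard definitions, not the evaluation code used in this study: PRI is understood here as the Rand index averaged over the available human annotations, an averaging step left to the caller, and at least two pixels are assumed.

```python
from collections import Counter
from math import log


def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which two labelings agree (same/different)."""
    n = len(seg_a)
    pairs = n * (n - 1) // 2

    def same_pairs(counter):
        return sum(c * (c - 1) // 2 for c in counter.values())

    both = same_pairs(Counter(zip(seg_a, seg_b)))  # together in both
    in_a = same_pairs(Counter(seg_a))              # together in seg_a
    in_b = same_pairs(Counter(seg_b))              # together in seg_b
    # Agreements = together in both + apart in both.
    return (pairs + 2 * both - in_a - in_b) / pairs


def variation_of_information(seg_a, seg_b):
    """VoI = H(A) + H(B) - 2 I(A; B), in nats."""
    n = len(seg_a)

    def entropy(counter):
        return -sum((c / n) * log(c / n) for c in counter.values())

    h_a = entropy(Counter(seg_a))
    h_b = entropy(Counter(seg_b))
    h_ab = entropy(Counter(zip(seg_a, seg_b)))
    # I(A;B) = H(A) + H(B) - H(A,B), so VoI = 2 H(A,B) - H(A) - H(B).
    return 2 * h_ab - h_a - h_b


truth = [0, 0, 1, 1]
ri_same = rand_index(truth, [5, 5, 7, 7])   # same partition, renamed labels
ri_diff = rand_index(truth, [0, 1, 1, 1])
voi_same = variation_of_information(truth, [5, 5, 7, 7])
voi_diff = variation_of_information(truth, [0, 1, 1, 1])
```

Both measures depend only on the partition, not on the label names, so a higher Rand index and a lower VoI indicate closer agreement with the ground truth.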

4.1. Comparison to the Previous Segmentation Methods

Generally, the parameters of each tested method are held constant over the entire database. This evaluation includes mean shift (MShift) [1], graph-based segmentation (GBIS) [21], JSEG [20], Normalized Tree Partitioning (NTP) [22], saliency-based segmentation (Saliency) [23], Boundary Encoding Based Segmentation (TBES) [24], normalized cut (Ncut) [11], and fully connected spectral segmentation (SpecSeg) [13]. Additionally, contour to region (CtoR) [25] uses the same edge weights as the proposed method. Table 1 summarizes the performance of these methods. Many of the evaluation results are obtained from [13].

For PRI measurements, the merge-threshold method of CtoR ranks first. The proposed segmentation ranks first for VoI and BDE. The CtoR implementation is publicly available from its authors. The threshold value for the CtoR method was set to 80 because it gives the highest average ranking. A number of segmentation results of CtoR and of the proposed EPartition are shown in Figure 4. For the normalized cut and fully connected spectral segmentation, the segmentation number is chosen for each image, so these two methods are excluded from the rankings.

CtoR and EPartition use the same edge weights; thus, their performances are similar. However, in CtoR, a merge-threshold algorithm is used for segmentation. The PRI, VoI, and BDE scores at integer threshold values are shown in Figure 5. Generally, PRI and BDE favor oversegmentation and VoI favors undersegmentation. The optimal threshold value is generally smaller for PRI and BDE than for VoI.

In contrast, the edge partitioning segmentation is independent of a threshold value. Figure 5 shows the performance of the CtoR merge-threshold method in terms of threshold values. The proposed EPartition segmentation evaluation scores for PRI, BDE, and VoI are very close to the highest evaluation score of CtoR. However, the merge-threshold method in CtoR requires a specific threshold value for each segmentation evaluation method. The advantage of EPartition is that correct segmentation is possible without the designation of segmentation number or a threshold value.

4.2. Comparison to Trained Threshold

In the previous experiments, EPartition was shown to be competitive with CtoR when the optimal threshold value is hand-picked for CtoR. In this section, the threshold value is trained on the Berkeley 300 set and the segmentation performances are compared on the Weizmann segmentation set [35]. The Weizmann set contains 100 images with three human segmentation annotations.

In Table 2, the segmentation evaluations of the CtoR and EPartition methods are compared. There is a minuscule difference in PRI and a small difference in VoI. For the BDE evaluation, EPartition clearly outperforms the CtoR method. The trained threshold value was not robust across the different segmentation evaluation approaches. By partitioning the edges through minimizing the mean squared distance, the proposed EPartition shows adaptive performance among the three evaluation methods. Various comparative segmentation results are shown in Figure 6.

5. Conclusion and Future Works

In this paper, image segmentation by edge partitioning is proposed. In contrast with previous edge weight-based segmentation methods, such as normalized cut, the proposed method is independent of the number of segments. Segmentation by edge partitioning has been shown to be competitive with previous segmentation techniques on the Berkeley database without the need for segmentation number selection. The advantage of the proposed method lies in its adaptive nature for handling edge weights without threshold values or segment number assignments.

The proposed algorithm can be extended to general partitioning problems. Four labels are sufficient when segmenting 2D images. However, for fully connected graphs, the number of labels can be arbitrarily large. If a maximum number of labels is chosen, the edge partitioning method can be incorporated into a general partition problem without designating the specific number of partitions among nodes.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by Institute for Information & Communications Technology Promotion (IITP) Grant funded by the Korea government (MSIP) (no. R0101-15-0171, Development of Multimodality Imaging and 3D Simulation-Based Integrative Diagnosis-Treatment Support Software System for Cardiovascular Diseases). This work was also supported by Hankuk University of Foreign Studies Research Fund.