Abstract

For unsupervised color image segmentation, we propose a two-stage algorithm, KmsGC, that combines K-means clustering with graph cut. In the first stage, the K-means clustering algorithm is applied to obtain an initial clustering, and the optimal number of clusters is determined automatically by a compactness criterion designed to find the clustering with maximum intercluster distance and minimum intracluster variance. In the second stage, a weighted graph with multiple terminal vertices is constructed from an energy function, and the image is segmented according to a minimum cost multiway cut. Extensive performance evaluations on the Berkeley image database indicate that the proposed approach is effective compared with existing image segmentation algorithms.

1. Introduction

Unsupervised color image segmentation plays a key role in various image processing and computer vision applications, such as medical image analysis [1], image retrieval [2], image editing [3], and pattern recognition [4]. Although the reported performance of image segmentation algorithms has improved significantly in recent years, it is generally agreed that a high-quality algorithm for unsupervised image segmentation is still a long way off. Among the approaches for color image segmentation, edge-based approaches [5] detect region boundaries on the assumption that pixel properties change abruptly between different regions. A common issue of these approaches is the generation of disconnected edges, which leads to undersegmentation. Region-based approaches [6] employ a set of pixels as starting seeds that grow into homogeneous groupings of pixels. However, the efficiency of these approaches relies heavily on the choice of seeds. Watershed-based approaches [7, 8] use gradient information and morphological markers for the segmentation. While these approaches achieve meaningful segments, they are computationally intensive. Clustering-based approaches [9, 10] capture the global characteristics of the image by calculating image features, usually color or texture, to segregate the data efficiently. In [9], the authors proposed a mean-shift segmentation algorithm based on feature space analysis and clustering, called the MS algorithm. It uses kernel density estimation to model the feature space data and locates the local extrema of the density by finding the zeros of the gradient of the density function, which yields the cluster centers. A notable advantage of this algorithm is its modularity; however, it needs a complicated feature selection process, and in practice it is not easy to choose appropriate parameters for various images. Moreover, the segmentation result is somewhat sensitive to the spatial and range resolution parameters $h_s$ and $h_r$.
In [10], the CTM algorithm is presented, which segments images by clustering texture features. It models the distribution of the texture features with a mixture of Gaussian distributions, and the clustering algorithm is derived from lossy data compression. The CTM algorithm compresses feature vectors, or windows extracted around all pixels, by partitioning them into clusters. As a result, the final coding length is highly redundant because of the severe overlap between windows of adjacent pixels, which could lead to incorrect segmentation. The CTM algorithm encodes the membership of the pixels with a Huffman code, but it takes into account neither the spatial adjacency of pixels nor the smoothness of boundaries, so correct and smooth boundaries cannot be ensured. In practice, it is hard to choose proper parameters, and the segmentation results are sensitive to them. Graph-based approaches [11, 12] construct a weighted graph with limited vertices and edges, and the graph is partitioned into multiple components by minimizing a cost function to achieve the segmentation. Many graph-based approaches have been developed for color image segmentation so far, and their drawback generally lies in their high computational complexity. Hybrid approaches [13-15] have become a notable direction of development over the years. They usually mix edge and region information with other image features or combine two segmentation techniques to achieve optimal results. The JSEG approach in [13] quantizes the colors into several classes to form a class-map and then uses a region growing technique to obtain the segments, but its parameters are hard to specify. The work in [15] is a hybrid approach that employs a vector-based color gradient method and Otsu’s automatic threshold to perform a dynamic threshold-based segmentation.
Each of these approaches addresses only some of the many drawbacks in color image segmentation, such as complex computing models, expensive computation, and sophisticated parameters.

Apart from these approaches, graph cut based techniques have emerged as increasingly useful tools for energy minimization problems, producing provably high-quality solutions in practice, and they have been successfully applied to a wide range of problems in computer vision in recent years. In the image segmentation context, since Boykov and Jolly first demonstrated in [16] how to use binary graph cuts to build efficient object extraction tools for image segmentation, many graph cut based extensions have been proposed, for example, geo-cuts, grab-cuts, lazy snapping, and obj-cuts. Generally speaking, these works rely on a binary graph cut formulation or on an interactive solution, in which the user must place seeds interactively. In [17], the authors present a graph cut based unsupervised image segmentation solution. This algorithm incorporates data cost, smoothness cost, and label cost in an energy function and calculates the optimum energy with a binary graph cut formulation. It uses the minimum description length principle to obtain an unsupervised segmentation. Current graph cut based image segmentation techniques are mainly used in binary image segmentation [16, 18] and interactive image segmentation [19-22].

In this work, we combine the advantages of the clustering technique and the graph cut optimization technique and propose a new unsupervised color image segmentation solution. We solve the color image segmentation problem via multiway graph cuts and establish a compactness criterion to achieve an unsupervised, fast, and effective segmentation solution.

The rest of the paper is organized as follows. In Section 2, we review the graph cut framework and the K-means clustering technique and describe how to determine the number of clusters. The new segmentation approach is presented in Section 3. Experimental results are presented in Section 4, and the discussion and conclusion are presented in Section 5.

2. Background

2.1. Image Segmentation Based on Graph Cut

An image can be defined by a pair $(P, I)$ consisting of a finite discrete set of points $P$ (pixels in $\mathbb{Z}^2$, voxels in $\mathbb{Z}^3$, etc.) and a function $I$ that maps each point $p \in P$ to a value $I(p)$ in some value space. Usually, $P$ corresponds to a square. For an image, we can construct a nonnegative edge-weighted graph $G = (V, E, W)$ consisting of a set of vertices $V$, a set of edges $E$, and a positive weighting function $W$ defining the edge capacity.

We distinguish between two kinds of vertices of $G$: the multiple terminal vertices $l \in L$, where $L$ is a label set, and the pixel vertices $p \in P$. Every pixel vertex has an n-link to each of its four neighbors and a t-link to every terminal vertex. The set of edges $E$ consists of all n-links and t-links.

In [23, 24], Boykov et al. showed that the image segmentation problem can be naturally formulated in terms of energy minimization, and we use the Potts energy in the form

$$E(f) = \sum_{p \in P} D_p(f_p) + \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q), \tag{1}$$

where $N$ is the set of interacting pairs of pixels, $D_p(f_p)$ is a data penalty term, $V_{p,q}(f_p, f_q)$ is a discontinuity penalty term, and $f$ is a labeling of the image $P$.
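To make the structure of the energy concrete, the following minimal sketch evaluates a Potts-style energy for a given labeling on a 4-connected grid. It is only an illustration under assumptions: the function name `potts_energy`, the array layout, and the constant penalty `lam` are hypothetical and are not taken from [23, 24].

```python
import numpy as np

def potts_energy(labels, data_cost, lam=1.0):
    """Potts-style energy of a labeling on a 4-connected pixel grid.

    labels:    (H, W) integer array, one label per pixel.
    data_cost: (H, W, K) array; data_cost[y, x, l] is the data penalty
               of giving label l to pixel (y, x).
    lam:       assumed constant penalty for each neighboring pair of
               pixels that carries two different labels.
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Data term: sum of the penalties of the labels actually chosen.
    data_term = data_cost[rows, cols, labels].sum()
    # Discontinuity term: count each neighboring pair once.
    horizontal = np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    vertical = np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return float(data_term + lam * (horizontal + vertical))

labels = np.array([[0, 0], [0, 1]])
data_cost = np.zeros((2, 2, 2))
data_cost[1, 1, 0] = 5.0  # the bottom-right pixel dislikes label 0
energy_mixed = potts_energy(labels, data_cost, lam=2.0)
energy_flat = potts_energy(np.zeros((2, 2), dtype=int), data_cost, lam=2.0)
```

In this toy case the mixed labeling pays for two label discontinuities, while the flat labeling pays the data penalty of the bottom-right pixel, which shows how the two terms trade off against each other.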

For the energy minimization problem, some methods offer optimality guarantees in certain cases. For example, dynamic programming can be used to minimize a few one-dimensional energy functions, such as snakes. Mean field annealing can deduce the minimum of the energy by estimating the partition function, but computing the partition function is computationally intractable. Graduated nonconvexity is a kind of continuation method, but the quality of its output may not be known except in certain cases (see [25] for details). When a local minimum is sufficient, many methods can be used for continuous energies, as described in [26], and Euler equations are often used to guarantee a local minimum. Another alternative is to use discrete relaxation labeling methods, in which combinatorial optimization is converted into continuous optimization with linear constraints and gradient descent is then used to give the solution. Graph cut is a powerful technique for solving many computer vision problems, and we apply it to solve the energy minimization in this work. According to [23, 24], the global minimum of energy (1) can be computed by computing a minimum cost multiway cut on an appropriately constructed graph, as illustrated in Figure 1 [23].

2.2. K-Means Clustering Technique

Clustering is a process of classifying data items into similar groupings. K-means is undoubtedly the most widely used clustering algorithm because of its simplicity, efficiency, and empirical success. It generates clusters by minimizing the sum of distances between each data point and its cluster center. There are many ways of measuring the distance (e.g., squared Euclidean distance, Hamming distance, and cosine dissimilarity), and in this work we use the most popular one, the Euclidean distance, which forms the SSE (sum-squared-error) criterion:

$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \|x - z_i\|^2, \tag{2}$$

where $\|\cdot\|$ denotes the Euclidean norm, $z_i$ is the centroid of cluster $C_i$, and $K$ is the number of clusters.

The K-means algorithm starts with $K$ initial cluster centers. In each iteration, every data item is assigned to the nearest cluster center based on the Euclidean distance, and the cluster centers are then updated; the assignment and update steps are repeated until the cluster centers no longer change.
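The assignment and update steps can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments; the random initialization from the data points, the `seed` argument, and the helper `sse` are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means: returns (labels, centers) for an (N, d) point array."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest center in Euclidean distance.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = points[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # the centers no longer change
        centers = new_centers
    return labels, centers

def sse(points, labels, centers):
    """Sum of squared Euclidean distances of the points to their centers."""
    points = np.asarray(points, dtype=float)
    return float(np.sum((points - centers[labels]) ** 2))

pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = kmeans(pts, k=2)
error = sse(pts, labels, centers)
```

For the four points above, the two well-separated pairs end up in different clusters, and each point lies at distance 0.5 from its center.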

2.3. How to Determine the Number of Clusters

Determining the optimal number of clusters is one of the most difficult problems for clustering-based image segmentation methods. Most methods cast this problem as model selection [27]. The minimum description length (MDL) principle, first introduced by Rissanen in 1978, is a well-known regularization solution for this problem. The MDL principle is based on a simple idea: the best way to capture regular features in data is to construct a model in a certain class that permits the shortest description of the data and the model itself. After formulating a minimum description length criterion for the data, the basic MDL based algorithm starts with a large number of clusters and gradually merges clusters until the MDL criterion no longer decreases. However, the validity of an MDL selection criterion depends on the properties of the underlying coding scheme or, more precisely, the resulting description lengths [28]. Besides, formulating an accurate measure of the coding length is hard, and it is difficult to decide when the merging process should stop. In our approach, the clustering algorithm is run with different cluster numbers, and the optimal cluster number is chosen based on a compactness criterion. The values of the compactness criterion are calculated during the clustering process, which gives a simple and efficient solution.

3. The Proposed Approach

In this section, we propose a new unsupervised color image segmentation solution. As the K-means clustering requires the number of clusters, we first determine this number.

3.1. Determine the Number of Clusters

It is known that “minimum intracluster variance and maximum intercluster distance” lead to compact clusters [29]. Based on this principle, we establish a compactness criterion to find a compact clustering and thereby determine the optimal number of clusters.

First, we want to find a compact clustering with “minimum intracluster variance.” An intracluster distance measure is defined as the average distance between the data items and their cluster centers within the clusters, and we want it to be as small as possible. It can be defined as

$$\mathrm{intra} = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \|x - z_i\|^2, \tag{3}$$

where $\|\cdot\|$ denotes the Euclidean norm, $K$ is the number of clusters, $N$ is the number of pixels in the image, and $z_i$ is the cluster center of cluster $C_i$. We obviously want to minimize this measure.

Next, we use an intercluster distance measure, the distance between clusters, to describe a compact clustering with “maximum intercluster distance,” and we want it to be as large as possible. We calculate it as the distance between cluster centers and take the minimum of this value, defined as

$$\mathrm{inter} = \min_{i,j} \|z_i - z_j\|^2, \tag{4}$$

where $i = 1, \ldots, K-1$ and $j = i+1, \ldots, K$. Obviously, we want to maximize this measure.

Since the minimum value of the intra measure and the maximum value of the inter measure lead to a compact clustering, we combine them and define a compactness criterion as

$$\mathrm{CP} = \frac{\mathrm{intra}}{\mathrm{inter}}. \tag{5}$$

To obtain compact clusters, we obviously want to minimize the criterion, and the minimum value of this criterion corresponds to the best clustering.
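As a hedged illustration, the sketch below computes an intracluster measure, an intercluster measure, and their ratio for a given partition. It assumes squared Euclidean distances and a ratio of the form intra/inter, so that smaller values indicate more compact clusterings; the function name `compactness` is hypothetical.

```python
import numpy as np

def compactness(points, labels, centers):
    """Return (intra, inter, cp) for a partition with at least two clusters.

    intra: mean squared distance of the points to their own center (assumed form).
    inter: smallest squared distance between two distinct centers (assumed form).
    cp:    intra / inter; smaller values mean a more compact clustering.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    intra = np.sum((points - centers[labels]) ** 2) / len(points)
    # Pairwise squared distances between centers; keep each pair i < j once.
    sq = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    inter = sq[np.triu_indices(len(centers), k=1)].min()
    return float(intra), float(inter), float(intra / inter)

pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
intra, inter, cp = compactness(pts, [0, 0, 1, 1], [[0, 1], [10, 1]])
```

In this example every point is at distance 1 from its center and the two centers are 10 apart, so the criterion is small, as expected for two tight, well-separated clusters.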

3.2. The Proposed Approach

Based on the work above, we present a new unsupervised color image segmentation approach (referred to as KmsGC), which consists of two stages. In the first stage, we give the image pixels an initial partitional clustering with the K-means algorithm, and we use the technique described above to determine the optimal number of clusters automatically. In the second stage, we optimize the initial pixel partition with an energy formulation. We construct a weighted graph in terms of the energy function and then compute a minimum cost multiway cut using the max flow/min cut algorithm.

In the first stage, we cluster the pixels with the K-means algorithm and determine the optimal number of clusters using the compactness criterion in (5). There are many published methods for determining the optimal number of clusters, such as the Akaike information criterion (AIC), minimum description length (MDL) [26], minimum message length (MML), the Bayes information criterion (BIC), and gap statistics [30-32]. Here, we apply a simple and effective approach described in [30], which experimentally works well with the proposed framework. The process iterates over the cluster number $K$ up to $K_{\max}$, where $K_{\max}$ represents an upper limit on the number of clusters. In each iteration, the cluster with the largest inner variance is partitioned into two subclusters using the K-means algorithm, and the compactness criterion value for the new $K$ is calculated. After the iteration, we take the cluster number with the smallest value of the compactness criterion as the optimal $K$.

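The inner cluster variance is defined as

$$\sigma_{i,c}^2 = \frac{1}{N_i} \sum_{x \in C_i} (x_c - z_{i,c})^2, \tag{6}$$

where $i = 1, \ldots, K$ and $c \in \{R, G, B\}$, $\sigma_{i,c}^2$ is the variance of component $c$ for cluster $C_i$ in the color space, $N_i$ is the number of pixels in cluster $C_i$, and $x = (x_R, x_G, x_B)$ is the vector representing each pixel’s red, green, and blue components, respectively. $z_{i,c}$ is component $c$ of the cluster center of $C_i$. We get $\sigma_i^2$, the variance of cluster $C_i$, by taking the average of the three components $\sigma_{i,c}^2$. The presented algorithm is implemented in RGB color space.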

In the second stage, we construct the energy function for our application as follows. We first deal with the data penalty term, the first term in (1). In our use, the data penalty term indicates the cost of allocating the pixel $p$ to the cluster $f_p$. From the above analysis, our goal is to obtain compact clusters, which means that we want every pixel to belong to the closest cluster center. Therefore, we set the data penalty term as the distance between the pixel and the cluster center:

$$D_p(f_p) = \|I_p - z_{f_p}\|, \tag{7}$$

where $\|\cdot\|$ is the Euclidean norm, $I_p$ is the feature of pixel $p$ (here we use its RGB value), and $z_{f_p}$ is the cluster center of cluster $f_p \in \{1, \ldots, K\}$, where $K$ is the number of clusters.
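A data penalty of this kind can be tabulated for all pixels and all clusters at once. The sketch below is an illustration under assumptions: it uses the plain (unsquared) Euclidean distance in RGB space, and the name `data_cost` is hypothetical.

```python
import numpy as np

def data_cost(pixels, centers):
    """Return an (N, K) table of distances from N pixels to K cluster centers.

    pixels:  (N, 3) array of RGB values.
    centers: (K, 3) array of cluster centers in RGB space.
    Entry [p, l] is the assumed cost of assigning pixel p to cluster l.
    """
    pixels = np.asarray(pixels, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)

cost = data_cost([[0, 0, 0], [255, 0, 0]], [[0, 0, 0], [250, 0, 0]])
nearest = cost.argmin(axis=1)  # labeling that minimizes the data term alone
```

Minimizing the data term alone simply assigns every pixel to its nearest cluster center; the discontinuity term is what lets neighboring pixels influence each other.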

Secondly, we describe the discontinuity penalty term, the second term in (1). As [23] points out, a common constraint for label assignment is that the labels should vary smoothly almost everywhere while preserving the sharp discontinuities that may exist at object boundaries, so the discontinuity penalty term should penalize neighboring pixels that have different labels. We want it to provide smooth boundaries for the segmentation, so we build it from two components: a spatially unvarying smoothness cost and a spatially varying cost. The spatially unvarying smoothness cost indicates the cost of assigning the labels $f_p$ and $f_q$ (taken from the label set) to neighboring pixels, and this cost is spatially invariant. Therefore, the spatially unvarying smoothness cost is set as a $K \times K$ matrix with one entry per pair of labels.

We set the spatially varying smoothness cost as the filtering of the image according to each RGB channel and take the maximum of the three values as the final value.
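The two smoothness components can be sketched as follows. Everything specific here is an assumption for illustration: the label-pair matrix is taken to be a Potts matrix (0 on the diagonal, 1 elsewhere), the filter is a simple finite difference between neighboring pixels, and the exponential mapping with the hypothetical parameter `sigma` (which makes label changes cheaper across strong edges) is one common choice rather than the one used in the experiments.

```python
import numpy as np

def potts_matrix(k):
    """Assumed K x K label-pair cost: 0 for equal labels, 1 otherwise."""
    return 1.0 - np.eye(k)

def neighbor_weights(image, sigma=10.0):
    """Spatially varying weights for horizontal and vertical neighbor pairs.

    image: (H, W, 3) RGB array.
    Returns (w_h, w_v) with shapes (H, W-1) and (H-1, W). Each weight is
    derived from the largest absolute difference over the three channels.
    """
    img = np.asarray(image, dtype=float)
    # Finite-difference filter per channel, then the maximum of the three.
    diff_h = np.abs(img[:, 1:, :] - img[:, :-1, :]).max(axis=2)
    diff_v = np.abs(img[1:, :, :] - img[:-1, :, :]).max(axis=2)
    # Assumed mapping: strong edges give small weights.
    return np.exp(-diff_h / sigma), np.exp(-diff_v / sigma)

image = np.zeros((2, 2, 3))
image[:, 1, :] = 100.0  # a vertical edge between the two columns
w_h, w_v = neighbor_weights(image, sigma=10.0)
label_cost = potts_matrix(3)
```

With these two pieces, the penalty for a neighboring pair would be the label-pair cost scaled by the pair's weight, so a label change across the edge in the toy image costs far less than one inside a uniform column.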

For the energy function minimization algorithm, as global energy minimization incurs enormous computational costs, a local minimum becomes a desirable option. The standard moves technique, which changes one pixel’s label at a time, is often used to compute a local minimum. Many methods use standard moves, such as iterated conditional modes (ICM) [33] and simulated annealing [34]. ICM repeatedly chooses the label change that gives the largest decrease of the energy function until it converges to a local minimum. Simulated annealing is an outstanding technique that can optimize arbitrary energy functions, but it needs exponential time to do so. A weak point of standard moves is that, at a local minimum, the energy cannot be decreased by changing a single pixel’s label, which leads to low-quality solutions (see [23] for details). In our work, we use expansion moves, which can change the labels of large sets of pixels simultaneously, to get a fast and accurate solution (see [23] for a detailed performance comparison with simulated annealing). The minimum cost multiway cut is then computed by the expansion algorithm presented in [23, 35], and finally the segmentation result is obtained from the cut.
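To illustrate what a standard move looks like, the following sketch implements a basic ICM sweep for a Potts-style energy on a 4-connected grid. It is shown only as a point of contrast: it is not the expansion algorithm of [23, 35], and the names `icm`, `lam`, and `max_sweeps` are hypothetical.

```python
import numpy as np

def icm(data_cost, labels, lam=1.0, max_sweeps=10):
    """Iterated conditional modes with single-pixel (standard) moves.

    data_cost: (H, W, K) array of data penalties.
    labels:    (H, W) initial integer labeling.
    lam:       assumed constant penalty for differing neighbor labels.
    """
    h, w, k = data_cost.shape
    labels = labels.copy()
    for _ in range(max_sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                costs = data_cost[y, x].astype(float)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Penalize every label that differs from this neighbor.
                        costs = costs + lam * (np.arange(k) != labels[ny, nx])
                best = int(costs.argmin())
                if best != labels[y, x]:
                    labels[y, x] = best
                    changed = True
        if not changed:
            break  # no single-pixel change lowers the energy any further
    return labels

cost = np.zeros((3, 3, 2))
cost[:, :, 1] = 1.0          # every pixel mildly prefers label 0 ...
cost[1, 1] = [0.5, 0.0]      # ... except the center, which prefers label 1
start = cost.argmin(axis=2)  # nearest-label initialization
smoothed = icm(cost, start, lam=1.0)
```

In this toy case the isolated center label is removed because agreeing with its four neighbors outweighs its weak data preference. Larger regions of wrong labels cannot be fixed this way, which is the limitation that expansion moves are designed to overcome.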

In summary, the proposed KmsGC algorithm can be described as in Algorithm 1.

1st stage. Clustering
   Input: image
   Output: initial pixel partition.
   (1) initialize the cluster number K and the cluster C to be partitioned
   (2) for each cluster number K up to K_max:
   (3)   run K-means on C with 2 clusters, where C is the cluster to be partitioned into 2
   (4)   calculate intra, inter, CP
   (5)   calculate the inner variance of each cluster
   (6)   C = find(cluster with the largest inner variance)
   (7) endfor
   (8) K_opt = find(cluster number with the minimum CP)
   (9) Return the pixel partition corresponding to K_opt.
2nd stage. Image segmentation
   Input: initial pixel partition
   Output: segmented image
   (1) Initialization.
   (2) Determine the data term as in (7)
   (3) Determine the smooth term as in (8)
   (4) Construct a graph corresponding to the energy function
   (5) Compute the minimum cost multiway cut using the expansion algorithm.
   (6) Return the segmented image
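The first stage can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the MATLAB implementation used in the experiments: the loop starts from two clusters, the split uses a deterministic two-center start, the criterion is taken as intra/inter with squared distances, and all function names are hypothetical.

```python
import numpy as np

def two_means(points, n_iter=50):
    """Split points into two groups with Lloyd iterations (deterministic start)."""
    axis = int(points.var(axis=0).argmax())
    lo, hi = points[:, axis].argmin(), points[:, axis].argmax()
    centers = np.stack([points[lo], points[hi]]).astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(2):
            if np.any(labels == j):
                new_centers[j] = points[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def compactness(points, labels, k):
    """Assumed criterion: mean squared intra distance / min squared inter distance."""
    centers = np.stack([points[labels == j].mean(axis=0) for j in range(k)])
    intra = np.sum((points - centers[labels]) ** 2) / len(points)
    sq = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return intra / sq[np.triu_indices(k, k=1)].min()

def choose_partition(points, k_max):
    """Return (best_k, labels) with the smallest compactness value."""
    points = np.asarray(points, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    best_cp, best_k, best_labels = np.inf, None, None
    for k in range(2, k_max + 1):
        # Pick the cluster with the largest inner variance (averaged over channels).
        variances = [points[labels == j].var(axis=0).mean() for j in range(k - 1)]
        idx = np.flatnonzero(labels == int(np.argmax(variances)))
        split = two_means(points[idx])
        if split.min() == split.max():
            break  # the cluster could not be split any further
        labels[idx[split == 1]] = k - 1
        cp = compactness(points, labels, k)
        if cp < best_cp:
            best_cp, best_k, best_labels = cp, k, labels.copy()
    return best_k, best_labels

base = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]])
pts = np.concatenate([base, base + [10, 0], base + [0, 10]])
k_opt, partition = choose_partition(pts, k_max=6)
```

On this toy set of three well-separated groups of four points, the criterion is smallest when each group forms its own cluster. For an image, `points` would hold one RGB row per pixel.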

4. Experimental Results

In order to evaluate the performance of the proposed algorithm, we design a number of experiments. First, the accuracy of the cluster number determination is tested and compared. Second, the qualitative performance is assessed by comparing the segmented results with human segmentations and with other approaches. Third, the segmented results are compared against other approaches by means of effective indices to judge the quantitative performance of the proposal.

Experiments are performed with a MATLAB implementation of our algorithm on a typical 2 GHz Intel PC with 2 GB RAM, using spatial point sets, standard images, and real color images.

The computation time depends on the size of the image, the value of the upper limit $K_{\max}$ on the number of clusters, and the number of clusters. It mainly consists of two parts: the time for clustering, including the determination of the optimal cluster number, and the time for the graph cut based segmentation. For a real color image with $K_{\max}$ equal to 25, it takes about 2 seconds to do the clustering (including automatically determining the optimal number of clusters) and a few seconds to segment the image via graph cut. Below are the three experimental evaluations of the proposed algorithm.

4.1. Evaluation of the Determination of Clusters

We first test the accuracy of the proposal on cluster number determination. The proposal was tested on various spatial point sets, whose ideal cluster numbers are known beforehand, and on standard images, which are simple enough that their segmentation results are straightforward. In the experiments, we found that the minimum value of the compactness criterion occurred at the ideal cluster number for each spatial point set, which verifies that the optimal cluster number can be determined from the minimum value of the compactness criterion. Here, we give the results for one point set and one standard image to show how the algorithm works. Figure 2(a) is a point set with six obvious clusters, which we use to illustrate the determination of the cluster number. After setting the upper limit on the number of clusters and running the algorithm, we get the result in Figure 2(b). As can be seen in Figure 2, the proposed algorithm obtains the minimum value of the compactness criterion when the cluster number is 6, which verifies that the algorithm is indeed able to determine the correct number of clusters in an unsupervised way.

More than ten standard color images are also used to test the performance of the cluster number determination. In these tests, the upper limit on the number of clusters is fixed, and the cluster numbers are considered correctly determined if the corresponding segmentation results are good from a visual point of view. Here, we provide the segmentation result of a house picture. Figure 3(c) shows that the compactness criterion reaches its minimum value when the cluster number is 4, and the corresponding segmented image is shown in Figure 3(b). From the result, we can see that the algorithm provides a nearly perfect segmentation, which suggests that the cluster number is correctly determined.

Below we compare our unsupervised solution with the popular regularization criterion MDL. There are many segmentation algorithms based on the MDL criterion, such as [10, 17, 36]. For consistency, we take [10] as the main comparison.

For the proposed KmsGC, we use a compactness criterion and the energy function to reach compact clustering, which is consistent with the base point of human segmentation. The MDL based algorithm has been used for many unsupervised segmentation solutions and it is objective, but it is experimentally difficult to determine the optimal stopping time of the merging, so it may generate oversegmentation or undersegmentation. For example, at the right column of the second row of Figure   in [10], the right picture is undersegmented and the middle picture is oversegmented. More examples can be seen in [10]. This defect also can be seen in Figure of [36].

MDL based algorithms, such as [10, 36], usually merge the two adjacent segments whose merging decreases the coding length the most, until no further decrease is possible. We find that the MDL based algorithm sometimes fails to accurately control which two adjacent segments are merged in the merging process. For example, in the right picture of the second row of Figure   in [10], the sky is oversegmented into many pieces, while the land is undersegmented and the grass detail is not segmented. In comparison, in the 3rd picture of Figure 4 of the proposal, the grass detail is segmented and there are fewer oversegments of the sky.

Regarding detail preservation, MDL based segmentation sometimes cannot preserve details. For example, in the left picture of the second row in Figure   in [10], the man and the car are lost in the final segmentation. In the right picture at the second row of the left column in Figure 10 of [10], the details of some shades and the seacoast are missing. Compare the proposed segmentation in Figure 7. Also observe the eyes and mouths of the woman image in Figure of [36] and Figure of [17].

For the MDL algorithm, if the boundary constraints are not incorporated into the MDL criterion, it may be hard to reach correct and smooth boundaries. Compare our boundary with the results in Figure of [36]. But if it is incorporated, it may be harder to control the segmentation precisely as the coding length gets more complicated. Observe the faces, river, and skin of the second picture in Figure of [17].

4.2. Visual Evaluation

In this section, we visually evaluate the qualitative performance of the segmentation results on the Berkeley image database. This database provides several manual human segmentations for each image, and here we employ two of them per image as the perceptual evaluation references. We set the evaluation criterion as follows: the more similar the segmented results are to these human segmentations, the better. We test the proposal on all 300 images (image size $481 \times 321$) of the Berkeley image database. Some results and the corresponding human segmentations are provided in Figure 4, and some corresponding cluster number determination results are shown in Figure 5. Figure 8 shows more segmentation results from different categories of the Berkeley images, namely, landscape, objects, water, sky, animals, and people.

From Figure 4, we observe that different regions in the images are well segmented where the contours of the segmented regions match those in the human segmentations, the results are of smooth-boundary and large-region, and the important details in the images have been detected. For example, consider the following.

(1) In the bear image, the contour of the segmented bear is highly consistent with the first human segment; the segmented boundary of the bear and the bands is smooth. Important details are detected, such as the ears and nose of the bear.

(2) In the 2nd image, the church is accurately segmented and the boundary is smooth.

(3) In the third image, the segmented region contours of the stone and the tree are close to those of the human’s, and the boundary is smooth. The segmented regions are of large size while the details are preserved.

(4) In the flower image, note that in the second human segmentation the leaves are segmented while they are not in the first human’s. In our result, the petals and the leaves all have been segmented, and the contour of the flower core is correct. The contours of the petals are generally correct, and the leaves are in large segments.

(5) In the woman image, although the illumination variation exists, the woman’s face, neck, and hair are accurately segmented and also the forehead is well segmented, which indicates that the number of clusters is accurate.

(6) In the 6th image, the objects like the man, fishes, and the sea bottom are well segmented where the number and size of the segmented regions are almost equal to those of human segments, which make them easy to be automatically recognized.

(7) Observing the result of the 7th image, the distant objects like the tree and hills are accurately segmented where the contour of the tree highly matches those of the human segments, and the boundaries are smooth, which can be a substitute for the human segmentation. In the near area, the cow and the haystack are well segmented in general.

Figure 5 provides the results of the calculation of the compactness criterion on some images of Figure 4, and from the results we obtain the optimal cluster number for those images.

We set the upper limit on the cluster number, $K_{\max}$, to a fixed value for the segmentation of all natural images; this value is more than about 2 times the ideal cluster number of the images, and we think it is large enough from a visual point of view. In the experiments, we found that the algorithm works well on a large number of the tested images with this fixed $K_{\max}$, and in our experience the results change little when $K_{\max}$ takes different values within this range.

As the segmented results illustrate that the segmentations are very close to human segmentations, it may suggest that the proposed algorithm on cluster number determination is effective.

We now visually compare the proposal with two clustering-based unsupervised segmentation algorithms: MS [9] and CTM [10].

We first compare with the MS algorithm. Observing the experimental results of MS in [9], we notice that there are slightly unsmooth and serrated boundaries: see the MIT image in Figure 6 of [9], the room image in Figure 7 of [9], and the hand image in Figure 9 of [9]. Some errors are obvious in those segmentations. Notice that, for both MS and the proposal, there exist oversegmentations caused by a small gradient of illumination: see the table in Figure 7 of [9].

We next compare with the CTM algorithm. Apart from the analysis in the introduction, the CTM algorithm solves the unsupervised problem by minimizing the lossy description length of the feature vectors, so it inherits the behavior of MDL. In addition to the discussion of MDL based segmentation algorithms above, we notice that the experiments show that the results of the CTM algorithm are sensitive to its parameters. Based on the segmentation results provided in [10], it can be seen that there are over-segmentations when and under-segmentations when (like the left picture of the second row in Figure 10 of [10]). For the CTM algorithm, we observe that the accuracy of the final segmentation also depends on the accuracy of the boundaries and regions of the pre-segments computed before the merging process. Observing the boundaries of the church in Figure 11 of [10], we see that they are unsmooth and incorrect. In terms of computation time, the proposed method clearly beats the CTM algorithm, and it has no sensitive parameter to tune.

4.3. Quantitative Evaluation

We now compare the quantitative performance of the proposed algorithm against MS and CTM. The comparison is based on the following four quantitative performance measures, which have been used in many works [10].

PRI (the probabilistic rand index). It counts the fraction of pairs of pixels whose labeling is consistent between the computed segmentation and the ground truth.

VoI (the variation of information). The metric defines the distance between two segmentations as the average conditional entropy of one segmentation given the other and thus roughly measures the amount of randomness in one segmentation which cannot be explained by the other.

GCE (the global consistency error). It measures the extent to which one segmentation can be viewed as a refinement of the other. Segmentations which are related in this manner are considered to be consistent, since they could represent the same image segmented at different scales.

BDE (the boundary displacement error). It measures the average displacement error of boundary pixels between two segmented images. Particularly, it defines the error of one boundary pixel as the distance between the pixel and the closest pixel in the other boundary image.
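The three region-based measures can all be derived from the contingency table of two label maps, as the following sketch shows. It is an illustration under assumptions: the Rand index is computed against a single reference segmentation (PRI averages it over several human segmentations), VoI uses natural logarithms, and the function names are hypothetical. BDE is left out because it requires boundary extraction.

```python
import numpy as np

def contingency(seg_a, seg_b):
    """Table t[i, j] = number of pixels in segment i of A and segment j of B."""
    a = np.unique(np.ravel(seg_a), return_inverse=True)[1]
    b = np.unique(np.ravel(seg_b), return_inverse=True)[1]
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    return table

def pairs(x):
    """Number of unordered pairs that can be drawn from x items."""
    return x * (x - 1) / 2.0

def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which the two segmentations agree."""
    t = contingency(seg_a, seg_b)
    total = pairs(t.sum())
    same_both = pairs(t).sum()
    same_a = pairs(t.sum(axis=1)).sum()
    same_b = pairs(t.sum(axis=0)).sum()
    return float((total + 2 * same_both - same_a - same_b) / total)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def variation_of_information(seg_a, seg_b):
    """H(A|B) + H(B|A), computed as 2 H(A,B) - H(A) - H(B)."""
    p = contingency(seg_a, seg_b) / np.size(seg_a)
    return float(2 * entropy(p.ravel()) - entropy(p.sum(axis=1)) - entropy(p.sum(axis=0)))

def global_consistency_error(seg_a, seg_b):
    """Smaller of the two one-directional refinement errors, averaged over pixels."""
    t = contingency(seg_a, seg_b)
    size_a = t.sum(axis=1, keepdims=True)
    size_b = t.sum(axis=0, keepdims=True)
    err_ab = np.sum(t * (size_a - t) / size_a)
    err_ba = np.sum(t * (size_b - t) / size_b)
    return float(min(err_ab, err_ba) / t.sum())

fine = np.array([[0, 0], [1, 1]])
coarse = np.zeros((2, 2), dtype=int)
ri = rand_index(fine, coarse)
voi = variation_of_information(fine, coarse)
gce = global_consistency_error(fine, coarse)
```

In this example the fine segmentation is a pure refinement of the coarse one, so GCE is zero even though the Rand index and VoI both register a difference, which illustrates why GCE alone does not penalize oversegmentation.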

In the experiments, we set all parameters as in [10] or to the values recommended by the authors in their papers, in order to achieve the best results.

Table 1 shows the quantitative comparison of the proposed algorithm with the MS and CTM algorithms on all 300 images of the Berkeley image database based on the four indices; in the table, the best value for each index is shown in bold. From the results, we observe that the proposed KmsGC algorithm obtains competitive performance in terms of the four indices. In detail, the proposal obtains the best result on the PRI measure. On the BDE index, the proposal also obtains the best value, 0.5775 better than MS.

It is important to notice that the quantitative evaluation result also indicates that none of the algorithms is a clear winner in terms of all four indices. Notice that the CTM algorithm shows different performance when its parameter is given different values. From the visual segmentation results in [10], we can see that the segmentation becomes coarser as the parameter increases. When , CTM performs almost the same as the MS algorithm on the PRI, VoI, and GCE indices. But when , CTM gets better performance on the first three indices but generates undersegmentation and loses accuracy on the BDE index. Overall, the proposed KmsGC outperforms the other algorithms on the PRI and BDE indices.

We now look into the quantitative evaluation results and examine the relationship between the visual and the quantitative evaluation. According to the definitions of the indices, the PRI index measures the correctness of the pixel labeling, so a higher PRI value means more accurate segmentation results. The VoI index measures the difference between two segmentations based on the conditional entropy. Benefiting from the optimization of an information-theoretic criterion, the CTM algorithm achieves the best performance on the VoI index. The BDE index measures the average displacement error of the boundary pixels. The proposal optimizes the boundary through an energy function, so the segmented boundaries match those of the human segmentations, which yields the best BDE value. As noted in [10], the GCE index penalizes undersegmentation more heavily than oversegmentation; in fact, it does not penalize oversegmentation at all, so the best score is achieved by assigning each pixel to an individual segment. Therefore, although we get a poor GCE, we obtain large segmented regions and smooth boundaries. It may thus be concluded from the above analysis that the quantitative evaluation results are consistent with the visual evaluation results.

5. Discussion and Conclusion

We first discuss the conceptual advantages of the proposed method. The proposed KmsGC is a two-stage solution which combines the k-means clustering technique with the graph cut technique. In the first stage, a global and efficient initial pixel partition is obtained by the k-means algorithm. In the second stage, the proposal casts the image segmentation constraints as an energy optimization problem. The data penalty term, expressing the constraints on region properties, and the discontinuity penalty term, expressing the boundary constraints, are integrated in an energy function. The data penalty term is responsible for, and experimentally produces, compact clustering and accurate pixel label assignment; the discontinuity penalty term is responsible for, and experimentally brings, accurate boundary location and smooth boundaries. The energy formulation is solved by a graph-based optimization method, graph cuts, to obtain a global minimum. The α-expansion algorithm is run to find the minimum cut of the graph, which yields a fast segmentation.
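The interplay of the two penalty terms can be illustrated by evaluating an energy of this general shape for a given labeling. The sketch below is illustrative only and does not reproduce the exact terms defined earlier in the paper: it assumes a squared color distance to the cluster mean as the data penalty, a Potts penalty on 4-connected neighbors as the discontinuity penalty, and a hypothetical weight `lam` balancing the two.

```python
import numpy as np

def segmentation_energy(image, labels, means, lam=1.0):
    """Evaluate data penalty + lam * discontinuity penalty for a labeling.

    image:  (H, W, C) array of pixel features
    labels: (H, W) integer array of cluster indices
    means:  (K, C) array of cluster centers
    """
    image = np.asarray(image, dtype=float)
    labels = np.asarray(labels)
    means = np.asarray(means, dtype=float)

    # Assumed data penalty: squared distance of each pixel to its cluster mean.
    data = ((image - means[labels]) ** 2).sum()

    # Assumed discontinuity penalty (Potts model): count 4-connected
    # neighbor pairs that carry different labels.
    horizontal = (labels[:, 1:] != labels[:, :-1]).sum()
    vertical = (labels[1:, :] != labels[:-1, :]).sum()

    return data + lam * (horizontal + vertical)
```

Under these assumptions, a labeling with an isolated mislabeled pixel pays twice: once in the data term for the poor color fit and once in the discontinuity term for the extra boundary around it. This is the mechanism by which the minimization suppresses outliers and smooths boundaries.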

For the unsupervised solution, a compactness criterion is constructed based on the clustering principle to determine the optimal number of clusters, which is straightforward and ensures compact clustering. In each iteration, the k-means algorithm is applied to partition a cluster into two subclusters, which is simple and efficient.
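The splitting step and the role of a compactness score can be sketched as follows. This is a minimal illustration and not the criterion defined in the paper: `compactness` is a hypothetical stand-in that rewards a large distance between cluster centers and a small variance within clusters, and `two_means` is a plain Lloyd iteration with an assumed fixed number of passes.

```python
import numpy as np

def two_means(points, seed=0, iters=20):
    """Split points (N, D) into two subclusters with Lloyd's algorithm."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=2, replace=False)].copy()
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = points[assign == k].mean(axis=0)
    # Final assignment against the final centers.
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

def compactness(points, assign):
    """Hypothetical score: smallest center distance / mean cluster variance."""
    points = np.asarray(points, dtype=float)
    assign = np.asarray(assign)
    ids = np.unique(assign)
    if len(ids) < 2:
        return 0.0
    centers = np.array([points[assign == k].mean(axis=0) for k in ids])
    intra = np.mean([
        ((points[assign == k] - c) ** 2).sum(axis=1).mean()
        for k, c in zip(ids, centers)
    ])
    gaps = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    inter = gaps[np.triu_indices(len(ids), k=1)].min()
    return inter / (intra + 1e-12)
```

In a loop of this kind, one would keep splitting clusters while the score improves and stop at the number of clusters where it peaks; the stopping rule actually used by KmsGC is the one given by the paper's own criterion.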

In general, the proposed model successfully incorporates the k-means algorithm and the graph cut technique into a new algorithm and hence inherits the advantages of both, providing fast, effective, and encouraging segmentation.

Comparing our segmentation results with a segmentation based on the k-means algorithm alone further demonstrates the advantages of the proposal. Because the k-means clustering algorithm partitions the pixels based on the features of single pixels, and takes into account neither the neighborhood relations among the pixels nor the boundary constraints, it cannot produce smooth and continuous boundaries, and many outliers remain in the segmentation. Figures 6 and 7 show these defects. Compared with the results in Figure 4, it is obvious that the proposed KmsGC solves these problems and provides satisfactory segmentation.

A variety of experimental results demonstrate that the proposed algorithm KmsGC is effective and able to produce desired image segmentation results. It is important to notice that we do not have any complicated parameter to tune, and the algorithm is straightforward and unsupervised.

However, the proposal has some disadvantages. First, due to the random initialization of the k-means algorithm, the proposal may produce different segmentations in different runs; to address this, one can run the algorithm several times and keep a good result. Second, as can be seen from the experimental results, a few outliers remain in some segmentations. Third, if a small illumination gradient exists in a region of the image, the proposal tends to oversegment that region.
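The multiple-run remedy for random initialization can be sketched as below. The paper does not specify how a good run is selected, so the sketch assumes the run with the lowest within-cluster sum of squared errors is kept; the function names, the seed list, and the iteration count are illustrative.

```python
import numpy as np

def kmeans_once(points, k, seed, iters=50):
    """One k-means run from a random initialization; returns labels, centers, SSE."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = dist.argmin(axis=1)
    sse = dist[np.arange(len(points)), assign].sum()
    return assign, centers, sse

def kmeans_best_of(points, k, seeds=range(5)):
    """Run k-means once per seed and keep the run with the lowest SSE (assumed rule)."""
    runs = [kmeans_once(points, k, s) for s in seeds]
    return min(runs, key=lambda run: run[2])
```

Fixing the list of seeds also makes the whole pipeline reproducible, which helps when segmentations from different parameter settings are compared.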

Oversegmentation is an intrinsic problem of clustering-based segmentation. Region merging has been proposed to solve this problem. In [37], the author merges adjacent regions with similar gray levels to reduce oversegmentation. However, merging oversegmented regions into an optimal segmentation is inherently difficult. In addition, fusing intensity, color, texture, and other features to measure the similarity between pixels is a possible solution, because more feature information helps to partition the image pixels better.

In this work, the human segmentations are assumed to be the best perceptual references. To reach better performance, the segmentation result of the proposal has to be brought closer to the human segmentation. It is therefore necessary to learn the patterns of human segmentation behavior and then improve the proposal so that it produces results more similar to those of humans. These are some of the challenging problems to be solved in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the editors and the reviewers for their valuable reviews and comments. They would also like to deeply thank Dr. Dongquan Liu and Paul Liu for their great contributions to the improvement of this paper.