Abstract

This paper presents a comparative study of color spaces for color image segmentation using an automatic GrabCut technique. The original GrabCut is a semiautomatic image segmentation technique, since it requires user interaction to initialize the segmentation process. The automatic GrabCut is proposed as a modification of the original semiautomatic technique that eliminates this user interaction. It uses the unsupervised Orchard and Bouman clustering technique for the initialization phase. Comparisons with the original GrabCut show the efficiency of the proposed automatic technique in terms of segmentation quality and accuracy. As no single color space is recommended for every segmentation problem, automatic GrabCut is applied with , , , , and color spaces. The comparative study and experimental results using different color images show that color space is the best color space representation for the set of the images used.

1. Introduction

The process of partitioning a digital image into multiple segments is defined as image segmentation. Segmentation aims to divide an image into regions that can be more representative and easier to analyze. Such regions may correspond to individual surfaces, objects, or natural parts of objects. Typically image segmentation is the process used to locate objects and boundaries (e.g., lines or curves) in images [1]. Furthermore, it can be defined as the process of labeling every pixel in an image, where all pixels having the same label share certain visual characteristics [2]. Usually segmentation uses local information in the digital image to compute the best segmentation, such as color information used to create histograms or information indicating edges, boundaries, or texture information [3].

Color image segmentation that is based on the color feature of image pixels assumes that homogeneous colors in the image correspond to separate clusters and hence meaningful objects in the image. In other words, each cluster defines a class of pixels that share similar color properties. As the segmentation results depend on the used color space, there is no single color space that can provide acceptable results for all kinds of images. For this reason, many authors tried to determine the color space that will suit their specific color image segmentation problem [4]. In this work, a segmentation of color images is tested with different classical color spaces, , , , , and , to select the best color space for the considered kind of images.

The segmentation process is based on the GrabCut segmentation technique [5], which is considered a powerful state-of-the-art technique for color image segmentation. The iterative energy minimization scheme of GrabCut is based on the optimization of the Graph Cut technique [6], which allows for the generation of the globally optimal segmentation. In addition, Graph Cut can be readily extended to N-D images. Furthermore, the cost energy function of the Graph Cut minimization process can be defined in terms of different image features such as color, region, boundary, or any mixture of image features. This flexibility provides wide potential for the use of GrabCut in different applications. On the other hand, GrabCut is a bilabel segmentation technique, where images can be segmented into only two regions, background and foreground. Initial user intervention is required in order to specify an object of interest to be segmented out of the image, considering all the remaining image pixels as one background region. This classifies GrabCut as a semiautomatic segmentation technique and makes the quality of the initialization, and hence the segmentation performance, sensitive to the user selection. In other words, poor GrabCut initialization may lead to poor final segmentation accuracy, which might require extra user interaction with the segmentation results for fine tuning [5].

In this work, a modified GrabCut is proposed as an automatic segmentation technique, which can segment the image into its natural objects without any need for initial user intervention. Automation of GrabCut is achieved using Orchard and Bouman clustering [7] as an unsupervised clustering technique. The selection of the Orchard and Bouman clustering is based on the empirical comparison carried out in [8]. Several evaluation criteria are used to assess the discriminating power of the automatic GrabCut with the different color spaces. The remainder of the paper is organized as follows. Section 2 provides a basic background on color-space-based segmentation, image segmentation using GrabCut, and unsupervised clustering techniques. Section 3 explains the different color space models. Section 4 illustrates the Orchard and Bouman clustering. The original GrabCut technique and details of its modification are explained in Section 5. Experimental results are presented in Section 6, while the conclusion and future work are presented in Section 7.

2. Background

As no common opinion has emerged about which is the best choice for color space based image segmentation, some research work tried to identify the best color space for a specific task. Several works [9, 10] show that different color spaces are useful for the problem of color image segmentation. Jurio et al. [11] have carried out a comparative study between different color spaces in cluster based image segmentation using two similar clustering algorithms. Their study involved the test of four color spaces, , , , and , in order to identify the best color representation. They obtained their best results in most cases using color space, while also provided good results. Busin et al. [4] proposed a method to automatically select a specific color space among classical color spaces. This selection was done according to an evaluation criterion based on a spectral color analysis. This criterion evaluates the quality of the segmentation in each space and selects the best one, which preserves its own specific properties. A study of the ten most common color spaces for skin color detection was presented in [12]. They concluded that is the best color space to detect skin in an image. Another study that was applied for the classification of pizza toppings [13] proved that the polynomial SVM classifier combined with color space is the best approach among five different color spaces. Based on a comparative study between the and models, Ruiz-Ruiz et al. [14] declared that the best accuracy was achieved with representation in order to achieve real time processing in real farm fields for crop segmentation.

GrabCut is considered one of the powerful techniques used for color image segmentation. It has been applied to different segmentation problems such as human body segmentation [15–17], video segmentation [18], semantic segmentation [19], and volume segmentation [20]. In [17], an automatic extraction of the human body from color images was developed by Hu. The iterated GrabCut technique was used to dynamically update a trimap contour, which was initialized from the results of a scanning detector used for detecting faces in images. The research has some drawbacks, as the process goes through many steps and iterations, in addition to being constrained to human poses with frontal side faces. A fully automatic Spatio-Temporal GrabCut human segmentation methodology was proposed by Hernandez et al. [16]. They developed a methodology that benefits from the combination of tracking and segmentation. Instead of the initial user intervention to initialize the GrabCut algorithm, a set of seeds defined by face detection and a skin color model are used for initialization. Another approach to segment humans from cluttered images was proposed by Gulshan et al. in [15]. They utilized the local color model based GrabCut for automatic segmentation. This GrabCut local color model was used to refine the crude human segmentations they obtained. In video segmentation, Corrigan et al. [18] extended GrabCut for more robust video object segmentation. They extended the Gaussian mixture model (GMM) of the GrabCut algorithm, so that the color space was complemented with the derivative in time of the pixel’s intensities in order to include temporal information in the segmentation optimization process. Göring et al. [19] integrated GrabCut into a semantic segmentation framework by labeling objects in a given image. Most recently, Ramírez et al. [20] proposed a fully parallelized scheme using GrabCut for 3D segmentation that has been adapted to run on GPU. The scheme aims at producing efficient segmentation results for the case of volume meshes, in addition to reducing the computational time.

Clustering [21], the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis [22]. It has a long and rich history in a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. Clustering in image segmentations [2, 23, 24] is defined as the process of identifying groups of similar image primitives. Unsupervised clustering techniques [25] are content based clustering, where content refers to shapes, textures, or any other information that can be inherited from the image itself.

In the case of bilabel segmentation, good separation between foreground and background is required. This can be achieved by finding clusters with a low variance, since this makes each cluster easier to separate from the others. The selection of the Orchard and Bouman clustering technique [7] is guided by Ruzon and Tomasi [26] and Chuang et al. [27] in order to get tight and well separated clusters. They worked on the problem of image matting that is required for image compositing. In their approach, the Orchard and Bouman binary split algorithm was used for partitioning the unknown region colors into several clusters, in order to generate a color distribution for the unknown region to be estimated. According to a comparative study in [8], the Orchard and Bouman clustering outperformed other unsupervised clustering techniques including self-organizing maps (SOFM) and fuzzy C-means (FCM) for the automation of the GrabCut in terms of improving the segmentation accuracy.

3. Color Space Models

The most widely used color space is the RGB color space, where a color point in the space is characterized by three color components of the corresponding pixel which are red (R), green (G), and blue (B). However, since there exist many color spaces, it is useful to classify them into fewer categories with respect to their definitions and properties. Vandenbroucke [28] proposed the classification of the color spaces into the following categories.

(i) The primary spaces, which are based on the theory that assumes it is possible to match any color by mixing an appropriate amount of the three primary colors: the primary spaces are the real RGB, the subtractive CMY, and the imaginary XYZ primary spaces. The conversion from to is and the conversion from to is

(ii) The luminance-chrominance spaces, which are composed of one color component that represents the luminance and two color components that represent the chrominance: the color space is an example of the luminance-chrominance spaces. The conversion from to is

(iii) The perceptual spaces, which try to quantify the subjective human color perception by means of three measures, intensity, hue, and saturation: the is an example of the perceptual color space. The conversion from to is
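
As an illustration of the primary-space conversions, the following minimal sketch converts a normalized RGB triple to CMY and to XYZ. It assumes the subtractive space is CMY with the usual complement rule and that the imaginary primary space is CIE XYZ. The XYZ matrix used here is the common linear sRGB/D65 one, which is an assumption and not necessarily the matrix used in this study. The function names are hypothetical.

```python
# Assumption: all RGB components are normalized to the range [0, 1].

def rgb_to_cmy(r, g, b):
    """Subtractive CMY as the complement of RGB."""
    return (1.0 - r, 1.0 - g, 1.0 - b)


# Assumption: linear sRGB primaries with a D65 white point.
# A different RGB working space would need a different matrix.
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def rgb_to_xyz(r, g, b):
    """Linear transform of an RGB triple into XYZ tristimulus values."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in RGB_TO_XYZ)
```

Each pixel would then be represented by its three converted components before clustering and segmentation.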

4. Orchard and Bouman Clustering Technique

Orchard and Bouman [7] is a color quantization clustering technique that uses the eigenvector of the color covariance matrix to determine good cluster splits. The algorithm starts with all the pixels in a single cluster. The cluster is then split into two using a function of the eigenvector of the covariance matrix as the split point. It then uses the eigenvalues of the covariance matrices to choose which of the resulting clusters is the candidate for the next split. This procedure is repeated until the desired number of clusters is reached. It is an optimal solution for large clusters with Gaussian distributions.

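For example, consider $C_1$ as a set of pixels to be divided into $K$ clusters:
(1) calculate $\mu_1$, the mean of $C_1$, and $\Sigma_1$, the covariance matrix of $C_1$,
(2) for $i = 2$ to $K$ do the following:
(i) find the set $C_n$ which has the largest eigenvalue and store the associated eigenvector $e_n$,
(ii) split $C_n$ into two sets, $C_i = \{x \in C_n : e_n^T x \le e_n^T \mu_n\}$ and $C_n^* = C_n - C_i$,
(iii) compute $\mu_n^*$, $\Sigma_n^*$, $\mu_i$, and $\Sigma_i$.
This results in $K$ pixel clusters.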
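
The binary split procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments. It assumes the split point is the cluster mean projected onto the principal eigenvector, and it estimates that eigenvector by power iteration to stay dependency-free. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_and_cov(points):
    """Mean vector and (population) covariance matrix of a list of color tuples."""
    n, d = len(points), len(points[0])
    mu = [sum(p[k] for p in points) / n for k in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    return mu, cov


def principal_eigen(cov, iters=100):
    """Largest eigenvalue and its eigenvector, estimated by power iteration."""
    d = len(cov)
    k = max(range(d), key=lambda j: cov[j][j])
    v = [row[k] for row in cov]            # start from the dominant column
    norm = dot(v, v) ** 0.5
    if norm == 0.0:                        # degenerate cluster: no variance
        return 0.0, [1.0] + [0.0] * (d - 1)
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [dot(row, v) for row in cov]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in cov]), v


def _stats(points):
    mu, cov = mean_and_cov(points)
    return mu, principal_eigen(cov)


def orchard_bouman(pixels, k):
    """Split `pixels` into at most k clusters by repeated binary splits."""
    clusters = [list(pixels)]
    stats = [_stats(clusters[0])]
    for _ in range(2, k + 1):
        # Pick the cluster whose covariance has the largest eigenvalue.
        n = max(range(len(clusters)), key=lambda j: stats[j][1][0])
        mu_n, (lam_n, e_n) = stats[n]
        threshold = dot(e_n, mu_n)         # assumed split point: projected mean
        low = [x for x in clusters[n] if dot(e_n, x) <= threshold]
        high = [x for x in clusters[n] if dot(e_n, x) > threshold]
        if not low or not high:            # nothing left to split
            break
        clusters[n], stats[n] = high, _stats(high)
        clusters.append(low)
        stats.append(_stats(low))
    return clusters
```

Calling `orchard_bouman(pixels, 2)` on a list of three-component color tuples returns two lists of pixels, which is the form of split used to separate an image into two initial regions.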

5. Image Segmentation Using GrabCut

Image segmentation is simply the process of separating an image into foreground and background parts. Graph Cut technique [6] was considered as an effective way for the segmentation of monochrome images, which is based on the Min-Cut/Max-Flow algorithm [29]. GrabCut [5] is a powerful extension of the Graph Cut algorithm to segment color images iteratively and to simplify the user interaction needed for a given quality of the segmentation results. Section 5.1 explains the original semiautomatic GrabCut algorithm as developed by Rother et al. in [5], while its modification for automatic segmentation is presented in Section 5.2.

5.1. Original Semiautomatic GrabCut

The GrabCut algorithm learns the color distributions of the foreground and background by giving each pixel a probability to belong to a cluster of other pixels. It can be explained as follows: given a color image I, let us consider the of pixels, where , , and is the th color component in the used color space. The segmentation is defined as an array , , assigning a label to each pixel of the image, indicating if it belongs to the background or the foreground. The GrabCut algorithm consists mainly of two basic steps: initialization and iterative minimization. The details of both steps are explained in the following subsections.

5.1.1. GrabCut Initialization

The novelty of the GrabCut technique is in the “incomplete labeling” which allows a reduced degree of user interaction. The user interaction consists simply of specifying only the background pixels by dragging a rectangle around the desired foreground object (Figure 1). The process of GrabCut initialization works as follows.

Step 1. A trimap is initialized in a semiautomatic way. The two regions TB and TU contain the initial background and uncertain pixels, respectively, while . The initial TB is determined as the pixels around the outside of the marked rectangle. Pixels belonging to TB are considered as a fixed background, whereas those belonging to TU will be labeled by the algorithm.

Step 2. An initial image segmentation , , is created, where all unknown pixels are tentatively placed in the foreground class ( for ) and all known background pixels are placed in the background class ( for ).

Step 3. Two full covariance Gaussian mixture models (GMMs) are defined, each consisting of components, one for background pixels () and the other one for foreground (initially unknown) pixels (). The components of both GMMs are initialized from the foreground and background classes using the Orchard and Bouman clustering technique.

5.1.2. GrabCut Iterative Energy Minimization

The final segmentation is performed using the iterative minimization algorithm of the Graph Cut [6] in the following steps.

Step 4. Each pixel in the foreground class is assigned to the most likely Gaussian component in the foreground GMM. Similarly, each pixel in the background is assigned to the most likely background Gaussian component.

Step 5. The GMMs are thrown away and new GMMs are learned from the pixel sets created in the previous step.

Step 6. A graph is built and Graph Cut is run to find a new foreground and background classification of pixels.

Step 7. Steps 4–6 are repeated until the classification converges.

This has the advantage of allowing the automatic refinement of the opacities , as newly labeled pixels from the TU region of the initial trimap are used to refine the color of the GMM.

5.2. Proposed Automatic GrabCut

Although the incomplete user labeling of GrabCut reduces the user interaction substantially, it is still required in order to initiate the segmentation process. This identifies GrabCut as a semiautomatic/supervised segmentation algorithm. Segmenting the image into proper segments without any user guidance requires replacing the semiautomatic/supervised step of GrabCut initialization with a totally automatic/unsupervised one.

In this paper, the Orchard and Bouman technique [7] is used as an image clustering technique to automatically set the initial trimap and the initial segmentation (Section 5.1, Steps 1 and 2). The distinction between the trimap and the segmentation formalizes the separation between the region of interest to be segmented and the final segmentation derived by the GrabCut algorithm. In the automatic technique, Steps 1 and 2 of the GrabCut initialization process are modified as follows.

Step 1. While the original GrabCut constructs a trimap of two regions, TB and TU, as a fixed background and unknown regions, respectively, the proposed automatic technique considers the whole image as one unknown region TU, where , . This means that no fixed foreground or background regions are known and all image pixels will be involved in the minimization process to be labeled by the algorithm.

Step 2. The image is initially separated into two foreground TF and background TB regions, using the Orchard and Bouman clustering technique. During this step, a new GMM is introduced, which consists of only two components (): one component for the background pixels () and the other for the foreground pixels (). The Orchard and Bouman clustering technique is then applied and repeated until reaching the number of components () in the GMM, resulting in separating the image exactly into two clusters.

Step 3. The colors of image pixels belonging to each cluster (foreground and background clusters) generated from the previous step are then used to initialize another two full covariance Gaussian mixture models (GMMs) with ().

Steps 4–7. The learning portion of the algorithm runs exactly as the original GrabCut (Section 5.1, Steps 4–7).

6. Results and Discussions

The automatic GrabCut technique was experimentally tested using a dataset of 23 different images, as shown in Figure 2. Many recent works in the fields of cluster based image segmentation and automatic image segmentation conduct their experiments on fewer images, such as [9, 11, 30, 31], which use datasets of 8, 4, 4, and 15 images, respectively. A dataset of 23 images can therefore be considered a reasonable number of test cases. This dataset is collected partially from the Berkeley segmentation dataset [32] and partially from publicly available images [33], in a way that matches certain criteria. These criteria favor images suited to two-class segmentation, that is, images having mainly one object (as a foreground) and a good separation between the foreground and background color regions.

For evaluation, it was noticed that no binary segmentations exist as part of the human segmentations included in the Berkeley segmentation dataset [32]. For this reason, the ground truth data for our selected dataset is manually generated using standard image processing tools (Adobe Photoshop). Figure 3 displays samples of the manual binary ground truths generated. The error rate and the overlap score rate are used as two evaluation metrics. The error rate is calculated as the fraction of pixels with wrong segmentations (compared to ground truth) divided by the total number of pixels in the image. The overlap score rate is given by , where and are any two binary segmentations.
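
The two evaluation metrics can be sketched as follows for binary masks stored as nested lists of 0/1 labels. The error rate follows the definition given above. For the overlap score, this sketch assumes the usual intersection-over-union of the two foreground regions, which is an assumption about the intended formula. The function names are hypothetical.

```python
def error_rate(segmentation, ground_truth):
    """Fraction of pixels whose label differs from the ground truth."""
    total = wrong = 0
    for seg_row, gt_row in zip(segmentation, ground_truth):
        for s, t in zip(seg_row, gt_row):
            total += 1
            if s != t:
                wrong += 1
    return wrong / total


def overlap_score(mask_a, mask_b):
    """Assumed overlap score: |A intersect B| / |A union B| over foreground pixels."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty foregrounds are treated as a perfect match.
    return intersection / union if union else 1.0
```

Both functions return a value in [0, 1]; multiplying by 100 gives percentages in the form used in the results tables.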

In the first experiment, automatic GrabCut, which is initialized using Orchard and Bouman, is applied and compared to the original GrabCut algorithm. Figure 4 shows sample visual results for the segmentation using () components for GMMs as recommended by Rother et al. [5]. Table 1 shows the quantitative comparison between the original and modified GrabCut for the whole dataset as presented in Figure 2. As shown in Table 1, the automatic GrabCut using Orchard and Bouman clustering outperforms the original one in terms of minimizing the error and improving the segmentation accuracy. The average error rate is 3.64% for the automatic GrabCut compared to 4.28% for the original GrabCut technique. The overall performance looks better in terms of the standard deviation (SD) which exhibits 3.61% for the automatic GrabCut compared to 5.5% for the original GrabCut.

Some cases with high segmentation error using the original GrabCut can be noticed in Table 1 (images 1 and 9). They illustrate one main drawback of the original GrabCut initialization, which makes the segmentation results sensitive to the user selection of the area of interest to be segmented. This occurs when other objects, which are not of interest, are treated as part of the foreground because they lie within the rectangular boundary dragged around the object of interest. The segmentation results of these two images are visually illustrated in Figure 4(a). It can be noticed how a large portion of the leaf appears in the final segmentation of the insect image. The same problem occurred with the elephant image, where the land was considered part of the foreground area. The quantitative comparisons of the error rates generated for these two images in Table 1 and the visual comparisons in Figure 4 illustrate the efficiency of the automatic GrabCut in handling such a problem. This efficiency stems from not specifying any hard constraints during initialization, either for foreground or background (Section 5.2, Step 1).

In the second experiment, the automatic GrabCut, which is initialized using Orchard and Bouman, is applied with different color space models, including , , , , and . The features that identify each image pixel are only the values of its three components in the selected color space. The final segmentation results are obtained for all used images. For a quantitative comparison, Table 2 shows the error rate and the overlap score rate for the whole dataset. The results in Table 2 are ordered in ascending order from left to right in terms of the total number of good image segmentation results and the average error rates. We can see that the space is the one that obtains better results for most of the images in terms of the average error rate. and follow with very little increase in the average error rate. They exhibit almost the same average error and overlap score rates, which are 5.49% for the error rate and 95.35% for the overlap score rate and 5.63% for the error rate and 95.79% for the overlap score rate, respectively. Figure 5 shows visual segmentation results for some images, while Figure 6 shows graph plots of the average segmentation error rate and the overlap score rate for all different color spaces.

7. Conclusions and Future Work

In this paper, a modification of GrabCut is presented to eliminate the need for initial user interaction to guide the segmentation, thereby converting GrabCut into an automatic segmentation technique. The modification uses Orchard and Bouman as an unsupervised clustering technique to initialize the GrabCut segmentation process. Based on a dataset of 23 images, the experiments revealed that automatic GrabCut using Orchard and Bouman clustering outperforms the original GrabCut. It removes the need for user intervention during segmentation, adding the advantage of automation to GrabCut. Furthermore, it provides robust and accurate segmentation with an average error rate of 3.64%, compared to the 4.28% average error rate achieved by the original GrabCut. In addition, the performance of the automatic GrabCut is evaluated using five different color spaces, , , , , and . The experimental results show that segmentation based on the color space provided the best segmentation results compared to the other color spaces for the considered set of images.

This study can be improved by enlarging the dataset and including different kinds of images. On the other hand, future work might include modifying the energy minimization procedure of the automatic GrabCut to allow for multilabel optimization and segmentation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Supplementary Materials

The supplementary material includes a table showing the experimental results of a comparative study that was implemented in a previous research by the authors and which is currently submitted to the Journal of Computer Science and Technology (JCST) 2014 and under review. This study applies the automatic GrabCut which is initialized using different unsupervised clustering techniques and compares their performance based on the accuracy achieved to the problem of color image segmentation.

The study justifies the selection of the Orchard and Bouman clustering technique for the GrabCut initialization. According to the comparative study, the Orchard and Bouman clustering outperformed other unsupervised clustering techniques including Self Organizing Maps (SOFM) and Fuzzy C-means (FCM) for the automation of the GrabCut in terms of improving the segmentation accuracy and achieving the best error rate applied to the selected dataset of images.
