Abstract

Natural image segmentation is often a crucial first step for high-level image understanding, significantly reducing the complexity of image content analysis. The localizing region-based active contour (LRAC) model is a classic segmentation method, but it has two disadvantages: (1) segmentation results depend heavily on the selection of the initial contour, which requires considerable skill; (2) in some situations, manual interaction is infeasible. To overcome these shortcomings, we propose a novel LRAC-based model for unsupervised segmentation of the viewer’s attention object from natural images. With the aid of the color boosting Harris detector and the core saliency map, we obtain the salient object edge points. These points are then used as the seeds of an initial convex hull. Finally, this convex hull is refined by an edge-preserving filter to generate the initial contour for our automatic object segmentation system. Unlike the original LRAC, which requires considerable user interaction, the proposed method performs the segmentation task fully automatically. Extensive experimental results on a large variety of natural images demonstrate that our algorithm consistently outperforms popular existing salient object segmentation methods, yielding higher precision and better recall rates. Our framework can reliably and automatically extract the object contour from complex backgrounds.

1. Introduction

Object segmentation is one of the most important and challenging issues in image analysis and computer vision research. It facilitates a number of high-level applications, such as object recognition, image retrieval, image editing, and scene reconstruction [1, 2]. Most existing object segmentation systems adopt interaction-based paradigms [3, 4]; that is, users are asked to provide segmentation cues manually and carefully.

Although interaction-based methods are promising, they share a critical limitation: they depend on the user to convey semantic intent. Such manual labeling is time consuming and often infeasible. Moreover, the segmentation performance depends heavily on the user-specified seed locations, so additional interactions are necessary when the seeds are not provided accurately. In particular, the localizing region-based active contour (LRAC) model [5] is one of the classic interaction-based methods. Its segmentation results depend heavily on the initial contour selection; it therefore needs a user-specified initial contour that is close to the object boundary.

For this reason, a sophisticated fully automatic object segmentation method is in strong demand. The human brain and visual system can effortlessly grasp certain salient regions in cluttered scenes. Because, under most circumstances, the salient parts of an image coincide with the objects of interest to be segmented, researchers have attempted to estimate salient regions and use them to guide segmentation. In contrast with existing interaction-based approaches that specify the object and background seeds by manual labeling, some methods (e.g., Fu’s method [6] and Achanta’s method [7]) determine the seed locations based on a visual attention model. Since the accuracy of the visual attention model plays a crucial role in object segmentation, these algorithms depend on the quality of the chosen saliency map. In other words, the worse the chosen saliency map is, the worse the final extraction result is.

To remedy this shortcoming, we focus on salient object edge points rather than on the saliency map itself. Once the salient object edge points have been detected, the region enclosed by these points is obtained. The boundary of this region is close to the object edge and is therefore used as the initial contour of the LRAC model.

In our method, the salient edge points of the input image are first generated by the color boosting Harris detector. We then find the salient object seeds using the core saliency map, and the salient object edge points are determined from these seeds. The initial contour is then created automatically by applying a convex hull algorithm to the salient object edge points. Finally, the object is extracted by the LRAC method, starting from this initial contour.

The remainder of this paper is organized as follows. Section 2 reviews some related work about saliency models and an interactive image segmentation method. Section 3 presents the proposed salient edge point based active contour for natural object segmentation algorithm. Section 4 demonstrates extensive experimental comparison results. Section 5 finally draws the conclusions.

2. Related Work

2.1. The State-of-the-Art Automatic Image Segmentation Methods

In [6], Fu et al. proposed an automatic object segmentation approach that integrates saliency detection and graph cuts [8], referred to here as Fu’s method, to overcome the disadvantages of interactive graph cuts. They also explored the effects of labels on graph-based segmentation: so-called “Professional Labels” are introduced to evaluate labels, and a multiresolution framework is designed to provide such “Professional Labels” automatically. This method obtains quite complete object segmentation, comparable to interactive graph cuts with manual “Professional Labels.”

Achanta’s method [7] is also an automatic image segmentation method. It oversegments the input image using mean-shift algorithm and retains only those segments whose average saliency is greater than an adaptive threshold. The binary maps representing the salient object are thus obtained by assigning ones to pixels of chosen segments and zeroes to the rest of the pixels.
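
The segment-selection step described above can be sketched as follows. This is a minimal illustration, not the implementation from [7]: the oversegmentation is assumed to be given as an integer label map (mean-shift itself is not shown), the function name is hypothetical, and the default adaptive threshold of twice the mean image saliency is an assumption of this sketch.

```python
from collections import defaultdict

def select_salient_segments(labels, saliency, threshold=None):
    """Return a binary mask keeping segments whose mean saliency exceeds a threshold.

    labels   : 2D list of integer segment ids (the oversegmentation, assumed given)
    saliency : 2D list of saliency values, same shape as labels
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    total, n = 0.0, 0
    for label_row, sal_row in zip(labels, saliency):
        for lab, s in zip(label_row, sal_row):
            sums[lab] += s
            counts[lab] += 1
            total += s
            n += 1
    if threshold is None:
        # Assumption of this sketch: adaptive threshold = 2 x mean image saliency.
        threshold = 2.0 * total / n
    keep = {lab for lab in sums if sums[lab] / counts[lab] > threshold}
    # Ones for pixels of the chosen segments, zeroes for the rest.
    return [[1 if lab in keep else 0 for lab in row] for row in labels]

labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 2, 2]]
saliency = [[0.1, 0.1, 0.9, 0.8],
            [0.0, 0.2, 0.9, 1.0],
            [0.1, 0.1, 0.2, 0.2]]
mask = select_salient_segments(labels, saliency)
```

In this toy input only segment 1 has an average saliency above the threshold, so only its pixels are set to one in the mask.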

These two methods are fully automatic and involve no manual interaction. Fu’s method is based on graph cuts, while Achanta’s method uses the mean-shift algorithm. However, LRAC has several desirable advantages over graph cuts and the mean-shift algorithm. First, LRAC can achieve subpixel accuracy of object boundaries [5]. Second, LRAC can be easily formulated under a principled energy minimization framework and allows the incorporation of various kinds of prior knowledge for robust image segmentation. Third, LRAC provides smooth and closed contours as segmentation results, which are necessary for, and can be readily used in, further applications such as shape analysis and recognition.

2.2. Localizing Region-Based Active Contour Model

In [5], Lankton and Tannenbaum proposed a natural framework that allows any region-based segmentation energy to be reformulated in a local way.

Here, we choose an energy that looks past simple means and compares the full intensity histograms of the foreground and background. Let $P_{\mathrm{in}}(z)$ and $P_{\mathrm{out}}(z)$ be two smoothed intensity histograms computed from the global interior and exterior regions of a partitioned image $I$ using intensity bins $z$. The global region-based energy we use is the one proposed by Wen et al. [9], which we refer to as the histogram separation energy:

$$E_{\mathrm{HS}} = BC = \int_z \sqrt{P_{\mathrm{in}}(z)\, P_{\mathrm{out}}(z)}\, dz,$$

where $BC$ is the Bhattacharyya coefficient used to compare probability density functions. Minimizing this energy drives the interior and exterior histograms to be as different as possible.
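
As a small numerical illustration of the histogram separation energy, the sketch below computes the Bhattacharyya coefficient between two smoothed intensity histograms. The bin count, the Gaussian smoothing width, and the function names are assumptions chosen for this example and are not taken from [5] or [9].

```python
import math

def smoothed_histogram(values, n_bins=8, sigma=1.0, v_max=256):
    """Normalized intensity histogram, smoothed across bins with a Gaussian kernel.

    Assumes `values` is non-empty and every value lies in [0, v_max).
    """
    hist = [0.0] * n_bins
    for v in values:
        hist[min(int(v * n_bins / v_max), n_bins - 1)] += 1.0
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smooth = [0.0] * n_bins
    for i in range(n_bins):
        for k, w in zip(offsets, kernel):
            if 0 <= i + k < n_bins:
                smooth[i] += w * hist[i + k]
    total = sum(smooth)
    return [s / total for s in smooth]

def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1 for identical histograms, near 0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

p_in = smoothed_histogram([200, 210, 220, 230])   # bright interior
p_out = smoothed_histogram([10, 20, 30, 40])      # dark exterior
bc_separated = bhattacharyya(p_in, p_out)
bc_identical = bhattacharyya(p_in, p_in)
```

A well-placed contour corresponds to a small coefficient (`bc_separated`), whereas a contour that mixes foreground and background on both sides pushes the coefficient towards 1.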

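In [5], Lankton and Tannenbaum introduced a function $B(x, y)$ to mask local regions:

$$B(x, y) = \begin{cases} 1, & \|x - y\| < r, \\ 0, & \text{otherwise}. \end{cases}$$

That is, $B(x, y)$ is 1 when the point $y$ is within a ball of radius $r$ centered at $x$ and 0 otherwise.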
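
The local mask can be written down directly. The sketch below evaluates it on a discrete pixel grid for one fixed center; the function name and the (row, column) convention are assumptions of this example.

```python
def ball_mask(center, shape, radius):
    """Local mask for a fixed center: 1 where the pixel is strictly inside the ball."""
    ci, cj = center          # (row, column) of the center point
    rows, cols = shape
    r2 = radius * radius
    return [[1 if (i - ci) ** 2 + (j - cj) ** 2 < r2 else 0
             for j in range(cols)]
            for i in range(rows)]

mask = ball_mask(center=(2, 2), shape=(5, 5), radius=2)
```

With radius 2, the mask covers the center, its four direct neighbors, and its four diagonal neighbors; pixels at distance exactly 2 are excluded because the inequality is strict.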

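Accordingly, the corresponding internal energy function is formed by localizing the histogram separation energy:

$$F_{\mathrm{HS}} = \int_z \sqrt{P_{\mathrm{in},x}(z)\, P_{\mathrm{out},x}(z)}\, dz,$$

where $P_{\mathrm{in},x}(z)$ and $P_{\mathrm{out},x}(z)$ represent the intensity histograms in the local image regions $B(x, y) \cdot H\phi(y)$ and $B(x, y) \cdot (1 - H\phi(y))$, respectively. Here $B(x, y)$ is the local mask centered at $x$, and $H\phi$ is the smoothed Heaviside function of the level set function $\phi$.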

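Following [5], minimizing the localized energy together with a curve-length penalty yields a local region-based flow for the level set function $\phi$. In this flow, $K$ is a Gaussian kernel, $\lambda$ is a parameter that weights the length of the curve, $\Omega$ denotes a bounded open subset of $\mathbb{R}^2$, and $A_{\mathrm{in}}$ and $A_{\mathrm{out}}$ are the areas of the local interior and local exterior regions, respectively, given by

$$A_{\mathrm{in}} = \int_{\Omega_y} B(x, y)\, H\phi(y)\, dy, \qquad A_{\mathrm{out}} = \int_{\Omega_y} B(x, y)\, \bigl(1 - H\phi(y)\bigr)\, dy,$$

where $B(x, y)$ is the local mask centered at $x$ and $H\phi$ is the smoothed Heaviside function of $\phi$.

In general, this algorithm can reliably extract the object contour if the user inputs appropriate markers. However, the original interactive segmentation algorithm is sensitive to the position and quantity of the user inputs. Even when many markers are used to cover the object features, the results in some regions are unsatisfactory (see the third row of Figure 1). Moreover, placing the markers is tedious and time consuming in some cases.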
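
As a hedged illustration of where the data term of such a flow comes from, the sketch below differentiates a localized Bhattacharyya energy with respect to the level set function. The notation is an assumption of this sketch: $B(x,y)$ is the local mask, $K$ is the smoothing kernel, $P_{\mathrm{in},x}$ and $P_{\mathrm{out},x}$ are kernel-smoothed local histograms, $A_{\mathrm{in}}$ and $A_{\mathrm{out}}$ are the local areas, and $H\phi = 1$ in the interior with $\partial H\phi / \partial \phi = \delta\phi$. With the opposite sign convention for $\phi$, every term changes sign, so this is not necessarily the exact form printed in [5].

```latex
\begin{align*}
P_{\mathrm{in},x}(z) &= \frac{1}{A_{\mathrm{in}}} \int_{\Omega_y} B(x,y)\, H\phi(y)\, K\bigl(z - I(y)\bigr)\, dy, \\
P_{\mathrm{out},x}(z) &= \frac{1}{A_{\mathrm{out}}} \int_{\Omega_y} B(x,y)\, \bigl(1 - H\phi(y)\bigr)\, K\bigl(z - I(y)\bigr)\, dy, \\
F &= \int_z \sqrt{P_{\mathrm{in},x}(z)\, P_{\mathrm{out},x}(z)}\, dz, \\[4pt]
\frac{\partial P_{\mathrm{in},x}(z)}{\partial \phi(y)} &= \frac{B(x,y)\, \delta\phi(y)}{A_{\mathrm{in}}} \Bigl( K\bigl(z - I(y)\bigr) - P_{\mathrm{in},x}(z) \Bigr), \\
\frac{\partial P_{\mathrm{out},x}(z)}{\partial \phi(y)} &= -\frac{B(x,y)\, \delta\phi(y)}{A_{\mathrm{out}}} \Bigl( K\bigl(z - I(y)\bigr) - P_{\mathrm{out},x}(z) \Bigr), \\[4pt]
\frac{\partial F}{\partial \phi(y)} &= B(x,y)\, \delta\phi(y) \Biggl[ \frac{F}{2} \Bigl( \frac{1}{A_{\mathrm{out}}} - \frac{1}{A_{\mathrm{in}}} \Bigr)
 + \frac{1}{2} \int_z K\bigl(z - I(y)\bigr) \Biggl( \frac{1}{A_{\mathrm{in}}} \sqrt{\frac{P_{\mathrm{out},x}(z)}{P_{\mathrm{in},x}(z)}} - \frac{1}{A_{\mathrm{out}}} \sqrt{\frac{P_{\mathrm{in},x}(z)}{P_{\mathrm{out},x}(z)}} \Biggr) dz \Biggr].
\end{align*}
```

The last line follows from the chain rule applied to the square root, using $\int_z \sqrt{P_{\mathrm{out},x}/P_{\mathrm{in},x}}\; P_{\mathrm{in},x}\, dz = F$. Integrating this variation over $y$ and adding a curvature term weighted by $\lambda$ gives a flow of the kind described above.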

2.3. Saliency Detection Models

During the last two decades, visual saliency detection and saliency map generation, which aim to find what attracts human attention, have attracted broad interest in computer vision, especially for object detection and recognition in different scenes. A majority of computational models of attention follow the structure adapted from the Feature Integration Theory (FIT) [10] and the Guided Search model [11]. Saliency detection models fall into two general categories: local contrast based methods and global contrast based methods.

Local contrast based methods investigate the rarity of image regions with respect to (small) local neighborhoods. Based on the highly influential, biologically inspired early representation model introduced by Koch and Ullman [12], Itti et al. [13] define image saliency using center-surround differences across multiscale image features. Ma and Zhang [14] propose an alternative local contrast analysis for generating saliency maps, which is then extended using a fuzzy growth model. Harel et al. [15] normalize the feature maps of Itti et al. to highlight conspicuous parts and permit combination with other importance maps. Liu et al. [16] find multiscale contrast by linearly combining contrast in a Gaussian image pyramid. More recently, Goferman et al. [17] simultaneously model local low-level cues, global considerations, visual organization rules, and high-level features to highlight salient objects along with their contexts. Methods that use local contrast tend to produce higher saliency values near edges instead of uniformly highlighting salient objects.

Global contrast based methods evaluate the saliency of an image region using its contrast with respect to the entire image. Zhai and Shah [18] define pixel-level saliency based on a pixel’s contrast to all other pixels. However, for efficiency, they use only luminance information, thus ignoring distinctiveness cues in other channels. Achanta et al. [7] propose a frequency tuned method that directly defines pixel saliency using a pixel’s color difference from the average image color. This elegant approach, however, considers only the first-order average color, which can be insufficient to analyze the complex variations common in natural images. A recent model proposed by Cheng et al. [19], named RC, calculates the saliency map by evaluating histogram-based global contrast differences.
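
The idea of defining saliency by a pixel’s color difference from the average image color can be sketched as follows. This is a simplified illustration under stated assumptions rather than the frequency tuned method of [7] itself: the color space conversion and image blurring steps are omitted, and the image is assumed to be given as rows of three-channel tuples.

```python
import math

def global_contrast_saliency(image):
    """Saliency as the distance of each pixel's color from the mean image color.

    image : 2D list of (c1, c2, c3) tuples; returns values normalized to [0, 1].
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    sal = [[math.sqrt(sum((p[ch] - mean[ch]) ** 2 for ch in range(3)))
            for p in row]
           for row in image]
    peak = max(max(row) for row in sal)
    # A constant image has no contrast; return zeros in that case.
    return [[s / peak if peak > 0 else 0.0 for s in row] for row in sal]

gray, red = (100, 100, 100), (250, 0, 0)
image = [[gray, gray, gray],
         [gray, red, gray],
         [gray, gray, gray]]
sal = global_contrast_saliency(image)
```

The single red pixel differs most from the average color and receives the maximum saliency, while the gray background receives a uniformly low value. This also shows the limitation noted above: only the first-order average color is used.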

We compared the abovementioned 5 state-of-the-art saliency detection methods. The comparison results are shown in Figure 2.

3. The Proposed Method: LRACSEP

To address the issues pointed out in Section 2, we focus on the automatic acquisition of prior information. In a saliency map, the saliency value of a pixel is proportional to its intensity value. In other words, for a given image, pixels with higher values in the corresponding saliency map are normally object pixels, whereas pixels with lower values are background pixels. Inspired by this idea, we propose an approach called localizing region-based active contours via salient edge points (LRACSEP). This strategy is intended mainly to acquire prior information automatically instead of from user inputs.

Our purpose is to set the initial contour close to the object boundary. The color boosting Harris detector yields salient edge points, but not all of them lie on the object. Consequently, we first have to detect the salient object edge points. For this purpose, we propose the core saliency map. Because the initial contour of a level set must be a closed curve, we use a convex-hull polygon to enclose the detected salient object edge points.

A general schematic framework of our proposed method (LRACSEP) is depicted in Figure 3. The major steps are (i) detecting the salient edge points; (ii) obtaining the core saliency map; (iii) finding the core edge points corresponding to the core saliency map; (iv) detecting the salient object edge points based on the core saliency map; and (v) using the convex hull to generate the initial level set contour.

3.1. Salient Edge Points Detection via the Color Boosting Harris Detector

Traditional luminance-based saliency detection methods tend to ignore color information completely and are thus very sensitive to background noise. van de Weijer et al. [20] analyze the statistical distribution of color derivatives and propose a color saliency boosting function to enhance rare color edges or corners. Their goal is to incorporate color distinctiveness into salient point detection or, mathematically, to find the transformation for which vectors with equal information content have equal impact on the saliency function. The desired color saliency boosting function is obtained by

$$g(\mathbf{f}_x) = \Lambda\, h(\mathbf{f}_x),$$

where $\Lambda$ is a $3 \times 3$ diagonal matrix with $\Lambda_{11} = \alpha$, $\Lambda_{22} = \beta$, $\Lambda_{33} = \gamma$, and $\Lambda_{11}^2 + \Lambda_{22}^2 + \Lambda_{33}^2 = 1$, $h$ is one of the color transformations (spherical, opponent, or hue-saturation-intensity), and $\mathbf{f}_x = (R_x, G_x, B_x)^T$ for a color image.

Meanwhile, the Harris corner detector [21] is a popular interest point detector due to its strong invariance to rotation, scale, illumination variation, and image noise. The Harris detector has been shown to outperform other detectors both on “shape” distinctiveness and repeatability.

The Harris corner detector is based on the local autocorrelation function of a signal, where the local autocorrelation function measures the local changes of the signal with patches shifted by a small amount in different directions. Thereby, the boosting color saliency theory can be applied to Harris detector. As can be seen in Figure 4, compared with the intensity-based feature detectors, the boosted color saliency points [20] are shown to be more stable and informative.
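
To make the connection between the local autocorrelation function and color derivatives concrete, the sketch below computes a Harris response from a structure tensor accumulated over weighted color channels. It is a simplified illustration under stated assumptions, not the detector of [20]: the per-channel `weights` stand in for the diagonal boosting matrix and are hypothetical values, no color transformation is applied before weighting, derivatives are plain central differences, and the window is an unweighted 3 × 3 box.

```python
def color_harris(image, weights=(1.0, 1.0, 1.0), k=0.04):
    """Harris response per pixel for an image given as rows of (c1, c2, c3) tuples."""
    rows, cols = len(image), len(image[0])

    def px(i, j, ch):
        # Replicate border pixels so derivatives are defined everywhere.
        i = min(max(i, 0), rows - 1)
        j = min(max(j, 0), cols - 1)
        return image[i][j][ch]

    # Structure tensor entries, summed over the weighted color channels.
    gxx = [[0.0] * cols for _ in range(rows)]
    gyy = [[0.0] * cols for _ in range(rows)]
    gxy = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for ch, w in enumerate(weights):
                gx = w * (px(i, j + 1, ch) - px(i, j - 1, ch)) / 2.0
                gy = w * (px(i + 1, j, ch) - px(i - 1, j, ch)) / 2.0
                gxx[i][j] += gx * gx
                gyy[i][j] += gy * gy
                gxy[i][j] += gx * gy

    def window_sum(t, i, j):
        # Sum over the 3 x 3 neighborhood, clipped at the image border.
        return sum(t[a][b]
                   for a in range(max(i - 1, 0), min(i + 2, rows))
                   for b in range(max(j - 1, 0), min(j + 2, cols)))

    response = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            a = window_sum(gxx, i, j)
            b = window_sum(gyy, i, j)
            c = window_sum(gxy, i, j)
            # det(M) - k * trace(M)^2: positive at corners, negative along edges.
            response[i][j] = a * b - c * c - k * (a + b) ** 2
    return response

white, black = (255, 255, 255), (0, 0, 0)
image = [[white if i >= 5 and j >= 5 else black for j in range(11)]
         for i in range(11)]
response = color_harris(image)
```

On this synthetic image the response is positive at the corner of the white square, negative along its straight edge, and zero in flat regions. Changing `weights` rescales the contribution of each channel, which is the role that the boosting matrix plays for color derivatives.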

In this paper, we adopt the color boosting Harris points as salient points (Figure 4(d)) to capture the corners or marginal points of visually salient regions in a color image. The salient points give us a coarse location of the salient areas. These points are denoted by $p_i$, $i = 1, \ldots, N$. However, they include not only salient object points but also salient background points. The salient background points (from the tree in Figure 4) act as noise when we try to place the initial contour close to the object. Thus, the objective of our model is to distinguish object points from background points, which is a binary classification problem. We present a clustering method to find the salient object points, which is based on initial object seeds. The objective is therefore to select the most appropriate initial seeds. For this purpose, we present the core saliency map, which determines the seeds among the salient points.

3.2. The Seeds Determined by the Core Saliency Map

We choose three prominent saliency models: RC, MZ, and FT. MZ is a local contrast method, while RC and FT are based on global contrast. We choose these two global contrast based models because FT outputs desirable results with very efficient computation, while RC represents the regional contrast feature well and is insensitive to sudden local changes.

As can be seen in Figure 5, the details highlighted by the three saliency maps ($S_{RC}$, $S_{MZ}$, and $S_{FT}$) are not the same. Nevertheless, all three maps tend to highlight the common parts of objects; the map of these common parts is referred to as the core saliency map. For any pixel $(x, y)$, the core saliency map $S_{core}(x, y)$ is computed from $S_{RC}(x, y)$, $S_{MZ}(x, y)$, and $S_{FT}(x, y)$. For convenient display, we also define the core map $M_{core}$, which is the binarization of $S_{core}$ obtained with a binarization operator. To binarize $S_{core}$, we introduce an adaptive threshold $T$, which is determined from the values of $S_{core}$ over the whole $H \times W$ image, where $H$ and $W$ are the height and width of the image, respectively. The corresponding core map is shown in Figure 5(d).
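
One plausible way to realize a core saliency map and its binarization is sketched below. Both rules are assumptions of this sketch, not the formulas of the paper: the three maps are fused by an element-wise minimum, which keeps only the parts that all maps highlight, and the adaptive threshold is taken as a multiple (here 2) of the mean value of the fused map over the image.

```python
def core_saliency(maps):
    """Fuse saliency maps by element-wise minimum (assumed fusion rule)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[min(m[i][j] for m in maps) for j in range(cols)]
            for i in range(rows)]

def binarize(saliency, scale=2.0):
    """Binarize with an adaptive threshold: scale x mean saliency (assumed rule)."""
    height, width = len(saliency), len(saliency[0])
    threshold = scale * sum(map(sum, saliency)) / (height * width)
    return [[1 if s >= threshold else 0 for s in row] for row in saliency]

# Toy 2 x 3 stand-ins for the RC, MZ, and FT saliency maps.
s_rc = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.0]]
s_mz = [[0.8, 0.9, 0.6], [0.1, 0.1, 0.1]]
s_ft = [[1.0, 0.7, 0.2], [0.0, 0.9, 0.1]]

core = core_saliency([s_rc, s_mz, s_ft])
core_map = binarize(core)
```

Only the two pixels that are salient in all three toy maps survive in `core_map`; a pixel that is highlighted by just one or two maps is suppressed by the minimum.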

Pixels which are included in the core map are highly likely to be parts of the object. Consequently, the points which are included not only in the salient edge points (white dots in Figure 4(d)) but also in the core map are labeled as foreground seeds. These seeds are indicated by blue dots in Figure 6.

3.3. The Salient Object Edge Points Detection and Using Convex Hull

Each superpixel is a perceptually consistent unit; that is, all pixels in a superpixel are most likely uniform in color and texture. For this reason, if a color boosting Harris point lies in the same superpixel as a foreground seed, it is treated as a salient object edge point. The search for the salient object edge points following this strategy is shown in Figure 7. We can observe that the points in the left-hand part of Figure 7(c) are omitted in Figure 7(d).
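
The seed selection and the superpixel-based search can be sketched together as follows. The function name and the data layout are hypothetical: salient points are (row, column) tuples, the core map is a binary grid, and the superpixel segmentation is assumed to be given as an integer label grid.

```python
def salient_object_edge_points(points, core_map, superpixels):
    """Split salient points into foreground seeds and salient object edge points.

    points      : list of (row, col) salient edge points
    core_map    : 2D binary list, 1 where the core map marks likely object pixels
    superpixels : 2D list of integer superpixel labels (assumed precomputed)
    """
    # Foreground seeds: salient points that also fall inside the core map.
    seeds = [(i, j) for (i, j) in points if core_map[i][j] == 1]
    # Superpixels that contain at least one seed.
    seed_labels = {superpixels[i][j] for (i, j) in seeds}
    # Any salient point sharing a superpixel with a seed is kept as an object point.
    object_points = [(i, j) for (i, j) in points
                     if superpixels[i][j] in seed_labels]
    return seeds, object_points

core_map = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
superpixels = [[0, 2, 1, 1],
               [0, 2, 2, 1],
               [3, 2, 2, 1],
               [3, 3, 3, 1]]
points = [(0, 1), (1, 1), (3, 3), (3, 0)]
seeds, object_points = salient_object_edge_points(points, core_map, superpixels)
```

Here the point (0, 1) lies outside the core map but inside the same superpixel as the seed (1, 1), so it is kept, while the points in other superpixels are discarded as background.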

Then the convex hull (Figure 8(a)) is used to enclose these salient object edge points. The contour of the convex hull (green line in Figure 8(b)) is chosen as the initial contour of the LRAC model. However, this initial contour is not yet sufficiently close to the object boundary.
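
A convex hull of the detected points can be computed with any standard algorithm. The sketch below uses Andrew’s monotone chain, which is a choice made for this example; the paper does not state which convex hull algorithm is used.

```python
def convex_hull(points):
    """Convex hull by Andrew's monotone chain; returns vertices in traversal order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); its sign gives the turn direction.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

edge_points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(edge_points)
```

Interior points such as (2, 2) and (1, 3) are dropped, and the remaining vertices form a closed polygon that can be rasterized into an initial level set contour.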

3.4. Improved Convex Hull by an Edge-Preserving Filter

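Given the input image $I$ and the initial convex hull $h$, we want to obtain a refined convex hull. Solving this problem is similar to image matting. Our goal can therefore be achieved by minimizing

$$E(q) = (q - h)^T \Lambda\, (q - h) + q^T L\, q,$$

where $q$ denotes the output that we want to obtain, $h$ is the initial convex hull, $\Lambda$ is a diagonal matrix encoded with the weights of the constraints, and $L$ is the matting Laplacian matrix [21]. The $(i, j)$th element of $L$ is given by

$$L_{ij} = \sum_{k \mid (i, j) \in w_k} \left( \delta_{ij} - \frac{1}{|w_k|} \left( 1 + (I_i - \mu_k)^T \left( \Sigma_k + \frac{\varepsilon}{|w_k|} U \right)^{-1} (I_j - \mu_k) \right) \right),$$

where $\delta_{ij}$ is the Kronecker delta, $i$ and $j$ are pixel indexes of the input image $I$, $\mu_k$ is the mean vector of the colors in a square window $w_k$ centered at pixel $k$, $\Sigma_k$ is the corresponding covariance matrix, $U$ is the $3 \times 3$ identity matrix, $|w_k|$ denotes the number of pixels in the window $w_k$, and $\varepsilon$ is a smoothness parameter.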
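
As a short worked step, and assuming that the cost has the standard constrained-matting form $E(q) = (q - h)^T \Lambda (q - h) + q^T L q$ with a symmetric matting Laplacian $L$ (the symbols $q$, $h$, $\Lambda$, and $L$ are the notation of this sketch), setting the gradient to zero gives the minimizer as the solution of a sparse linear system:

```latex
\begin{align*}
\nabla_q E(q) &= 2\,\Lambda\,(q - h) + 2\,L\,q = 0 \\
\Longrightarrow \quad (L + \Lambda)\, q &= \Lambda\, h .
\end{align*}
```

Solving this system directly is expensive for large images, which is why an edge-preserving filter such as the guided filter is commonly used as a fast approximation.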

As can be seen from Figure 9(a), the initial contour (the contour of the refined convex hull) is close to the object boundary. As a result, the LRAC model provides good segmentation performance, and the number of iterations needed for contour evolution is reduced.

We use more images to better show the performance of our improved convex hull (Figure 10). The refined convex hull is clearly closer to the real object than the initial convex hull.

4. Experiments

To verify the proposed method, we evaluated our approach on the publicly available database provided by Achanta et al. [7]. This database includes 5000 images, originally containing labeled rectangles from nine users who each drew a bounding box around what they considered the most salient object. The images vary widely and include natural scenes, animals, and indoor and outdoor scenes. To the best of our knowledge, the database is the largest of its kind and has ground truth in the form of accurate human-marked labels for salient regions. For consistency, we used the same value of the parameter $\lambda$, which weights the influence of contour smoothness, in all trials.

4.1. Comparison and Evaluation

First, to measure the segmentation performance of the LRACSEP algorithm comprehensively, we compare it with the Grabcut [22] algorithm initialized from the abovementioned 9 state-of-the-art saliency maps. Grabcut is very useful for image segmentation and gives satisfactory results when given an informative input. It enables users to roughly annotate a region of interest (e.g., with a rectangle) and then extracts a precise image mask. To initialize Grabcut automatically, we use a segmentation obtained by binarizing the saliency map with a fixed threshold, which we set to 0.3 empirically. Once initialized, we run Grabcut iteratively 4 times to improve the segmentation result. Figure 11 shows the comparison results.

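Here, we use precision, recall, and $F$-measure to evaluate the performance of our proposed model. Given a ground-truth segmentation map $G = [g_1, g_2, \ldots, g_n]$ and the detected segmentation map $S = [s_1, s_2, \ldots, s_n]$ for an image, both with binary entries, we have

$$\text{Precision} = \frac{\sum_x s_x g_x}{\sum_x s_x}, \qquad \text{Recall} = \frac{\sum_x s_x g_x}{\sum_x g_x}.$$

The $F$-measure, a weighted harmonic mean of precision and recall, combines the two. It is calculated as follows:

$$F_\beta = \frac{(1 + \beta^2)\, \text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}},$$

where $\beta$ is a positive parameter that decides the importance of precision over recall in computing the $F$-measure.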
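
These measures are straightforward to compute from two binary maps. The sketch below is a minimal illustration; the function name is hypothetical, and the default weight of 0.3 for the squared parameter is an assumption of this sketch that follows common practice in the salient object detection literature.

```python
def precision_recall_f(ground_truth, detected, beta2=0.3):
    """Precision, recall, and weighted F-measure for two binary 2D maps.

    beta2 is the squared weighting parameter (0.3 is an assumed default).
    """
    # Pixels marked as object in both maps (true positives).
    tp = sum(g * s
             for g_row, s_row in zip(ground_truth, detected)
             for g, s in zip(g_row, s_row))
    n_detected = sum(map(sum, detected))
    n_truth = sum(map(sum, ground_truth))
    precision = tp / n_detected if n_detected else 0.0
    recall = tp / n_truth if n_truth else 0.0
    denom = beta2 * precision + recall
    f_measure = (1 + beta2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

ground_truth = [[0, 1, 1],
                [0, 1, 1]]
detected = [[0, 0, 1],
            [1, 1, 1]]
precision, recall, f_measure = precision_recall_f(ground_truth, detected)
```

In this example three of the four detected pixels are correct and three of the four object pixels are found, so precision and recall are both 0.75 and the weighted measure takes the same value.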

We use $\beta^2 = 0.3$, as in [19], for fair comparison. The segmentation performance is compared in Figure 11. The figure shows that the proposed method significantly outperforms the abovementioned 9 models with respect to precision, recall, and $F$-measure.

As seen in Figure 11, Grabcut using the RC saliency map performs better than Grabcut based on the other saliency maps. For convenient visual inspection of the segmentation performance, the LRACSEP method is compared with Grabcut on the RC saliency map on a group of images (see Figure 12). As shown in Figure 12, Grabcut on the RC saliency map yields high false-positive rates (i.e., background areas misclassified as object areas) and high false-negative rates (i.e., object areas misclassified as background areas). In contrast, the proposed algorithm works robustly even with complicated, cluttered backgrounds. Such favorable segmentation results are achieved because we use the localizing region-based active contour model, which can achieve subpixel accuracy of object boundaries. Additionally, for Grabcut on the RC saliency map, the quality of the saliency map affects the final segmentation result.

Second, we compare the segmentation performance of the proposed algorithm with that of existing competitive automatic salient object segmentation methods, namely, Fu’s method [6] and Achanta’s segmentation method [7]. Figure 13 shows the segmentation performance of the three methods. The figure shows that the proposed method significantly outperforms these state-of-the-art algorithms with respect to precision, recall, and $F$-measure.

4.2. The Comparison of Iteration Times

To verify the efficiency of our method, we compare LRACSEP with the abovementioned two state-of-the-art algorithms: Fu’s method [6] and Achanta’s segmentation method [7]. The average numbers of iterations are depicted in Figure 14. It can be observed that our method is more efficient. The reason is that our method makes use of the salient edge points, whereas the other two methods are based on saliency maps, whose computation is time consuming.

5. Conclusions and Future Work

In this paper, we propose a novel automatic approach to extract objects of interest from natural images. This approach uses the salient edge points as prior knowledge and thereby turns the originally semisupervised LRAC segmentation method into an unsupervised one. Our main contributions are threefold: first, the core saliency map is proposed to determine the foreground seeds; second, the salient object edge points are detected from the foreground seeds; third, the proposed framework can be combined with any active contour model to segment the salient object automatically. The experimental results show that our method outperforms several state-of-the-art saliency-based segmentation methods on the public database. Unlike existing interactive segmentation approaches, which require considerable user interaction, the proposed method performs the segmentation task fully automatically.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is sponsored by the National Natural Science Foundation of China (NSFC) no. 61402192, JiangSu Qing Lan Project, Six Talent Peaks project in Jiangsu Province, Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant no. 14KJB520006), Jiangsu 333 Project, the Science & Technology Fund of Huai’an under the Grants nos. HAG2013059, HANZ2014006, and HAG2014028, the open fund of Jiangsu Provincial Key Laboratory for Advanced Manufacturing Technology (HGAMTL-1401), and the open fund of Jiangsu Provincial Key Laboratory for Interventional Medical Devices (JR1405).