Abstract
A novel algorithm for automatic foreground extraction based on the difference of Gaussian (DoG) is presented. In our algorithm, DoG is employed to find the candidate keypoints of an input image in different color layers. Then, a keypoint filtering algorithm is proposed to obtain the keypoints by removing the pseudo-keypoints and rebuilding the important keypoints. Finally, normalized cut (Ncut) is used to segment the image into several regions, and the foreground is located according to the number of keypoints in each region. Experiments on the given image data set demonstrate the effectiveness of our algorithm.
1. Introduction
In image processing, the foreground is an integral part of the image of interest, and extracting it plays an important role in many applications [1–4]. For example, in object recognition, Rosenfeld and Weinshall [5] proposed in 2011 an algorithm to extract a foreground mask and to identify the locations of objects in the image. In object tracking, Wang et al. [6] used partial least squares (PLS) analysis in 2012 to label the foreground and background of an image, and their results showed that the tracking algorithm performed very well with the labeled foreground. In content-based image retrieval, Shekhar and Chaudhuri [7] investigated the influence of the foreground in 2006. In image editing, Levin et al. [8] indicated in 2008 that extracting a foreground object from an image based on limited user input is an important task.
Current foreground extraction methods can be classified into two categories [9]: interactive and automatic. Interactive foreground extraction can accurately find the target areas in the input images; however, it is impractical when the task is to extract foregrounds from thousands of images. In this case, extracting foregrounds automatically becomes more and more important. Moreover, automatic extraction can be applied in many fields, such as image segmentation, image enhancement, object recognition, and content-based image retrieval.
In 2011, Kim et al. [10] proposed an automatic method to extract a foreground object captured from multiple viewpoints. Their result was a set of high-quality alpha mattes of the foreground object that were consistent across all viewpoints. In 2012, Zhang et al. [11] proposed a technique for automatic foreground-background segmentation based on depth from coded aperture. Their entire process was fully automatic, without any manual intervention. In 2013, Hsieh and Lee [12] proposed an automatic trimap generation technique. The experimental results showed that the trimap generated by the proposed method effectively improves the matting result. Moreover, the improved accuracy of the trimap results in a reduction of regions, so that the extraction procedure can be accelerated.
In recent years, with the development of video technology, research on automatic foreground extraction has become more and more popular. In this paper, we present a novel approach for automatic foreground extraction based on the difference of Gaussian (DoG). We employ DoG to find candidate keypoints. After filtering and rebuilding the candidate keypoints, we get the refined keypoints of an input image. With Ncut, we extract the foreground from the original image.
The rest of this paper is organized as follows. In Section 2, we introduce the key steps of the proposed extraction algorithm, including the basic framework, candidate keypoint locating, and keypoint filtering. In Section 3, the algorithm is presented. In Section 4, experimental results are shown. Finally, we conclude this research work.
2. Foreground Extraction Based on Difference of Gaussian
2.1. Framework
The motivation of this work is to develop a useful technique to extract the foreground from an input image. The contributions of this paper are as follows.
(i) A new procedure to find candidate keypoints in different color layers is proposed. It finds the points that have a more obvious color difference than their neighbours, and it is implemented efficiently by applying a difference of Gaussian function in each color layer.
(ii) Novel filtering operators are constructed to remove the pseudo-keypoints and rebuild the important keypoints. This stage consists of two steps. The first step reduces the candidate keypoints to those on the edges of the foreground, using the result of Ncut. The second step rebuilds points to increase the proportion of candidate keypoints by a novel approach. The candidate points that remain are called keypoints.
(iii) A novel operator for foreground extraction is proposed. We locate the foreground by the proportion of keypoints and segment the foreground by the result of Ncut.
The basic procedures to locate the foreground are illustrated in Figure 1.
2.2. Regional Segmentations Based on Normalized Cut
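In this work, we use a well-known technique to segment the input image, namely, the Ncut method [15]. Following [15], Ncut is a graph-partitioning method. We map the image into a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges connecting the nodes. A pair of nodes $u$ and $v$ is connected by an edge weighted by $w(u, v)$, which measures the similarity between them. The basic optimization model of graph cut, which partitions $V$ into two disjoint sets $A$ and $B$, can be given by
$$\min \operatorname{cut}(A, B),$$
in which
$$\operatorname{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v).$$
In [15], Shi and Malik proposed the Ncut method based on the following optimization model:
$$\min \operatorname{Ncut}(A, B),$$
in which
$$\operatorname{Ncut}(A, B) = \frac{\operatorname{cut}(A, B)}{\operatorname{assoc}(A, V)} + \frac{\operatorname{cut}(A, B)}{\operatorname{assoc}(B, V)},$$
where
$$\operatorname{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t),$$
in which $\operatorname{assoc}(A, V)$ denotes the total connection from nodes in $A$ to all nodes in the graph and $\operatorname{assoc}(B, V)$ is similarly defined.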
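To make the normalized cut criterion concrete, the following minimal sketch evaluates the Ncut value of a given bipartition on a toy graph. The weight matrix `W` and the function names are illustrative assumptions, and the sketch only scores a partition; it does not perform the eigenvector-based minimization of [15].

```python
def cut(weights, a, b):
    """Total weight of the edges going from node set a to node set b."""
    return sum(weights[u][v] for u in a for v in b)


def ncut(weights, a, b):
    """Normalized cut value of the bipartition (a, b) of all nodes."""
    nodes = list(a) + list(b)
    cut_ab = cut(weights, a, b)
    assoc_a = cut(weights, a, nodes)  # total connection from a to all nodes
    assoc_b = cut(weights, b, nodes)  # total connection from b to all nodes
    return cut_ab / assoc_a + cut_ab / assoc_b


# Toy symmetric weight matrix (hypothetical): nodes 0-1 and 2-3 form two
# tightly connected pairs that are only weakly linked to each other.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
good = ncut(W, [0, 1], [2, 3])  # separates the two pairs
bad = ncut(W, [0, 2], [1, 3])   # splits both pairs
```

In this toy graph the partition that keeps each tight pair together has the smaller Ncut value, which is the behaviour the criterion is designed to reward.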
In practice, Ncut is a powerful segmentation method. After employing the Ncut algorithm, if we denote the $j$th part of the $i$th input image $I_i$ as $R_j^{(i)}$, the input image is segmented into $n_i$ parts:
$$I_i = \bigcup_{j=1}^{n_i} R_j^{(i)}.$$
2.3. Candidate Keypoints Locating
An input image can be seen as a surface $I(x, y)$ on a square domain $\Omega$. In the discrete space, if the total number of pixels is $N \times N$, the domain can be represented as $\Omega = \{1, \ldots, N\} \times \{1, \ldots, N\}$.
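Candidate keypoints of an input image $I(x, y)$ are first detected by the difference of Gaussian (DoG) algorithm. The difference of Gaussian with scale $\sigma$ and constant multiplicative factor $k$ can be computed by
$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma), \quad (x, y) \in \Omega \setminus \partial\Omega,$$
in which $L(x, y, \sigma)$ is the scale space, $\Omega$ is the image domain, and $\partial\Omega$ denotes the boundary of the input image. $L(x, y, \sigma)$ can be obtained by
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),$$
where $*$ is the convolution operation and
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}.$$
In practice, the scale $\sigma$ and the constant multiplicative factor $k$ are fixed in advance.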
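The DoG computation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the kernel is truncated at a radius of three standard deviations, border pixels are replicated, and the values `sigma=1.0` and `k=sqrt(2)` in the example are placeholders rather than the settings used in this paper.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian kernel normalised to sum to 1 (3-sigma radius is an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    values = [math.exp(-(i * i) / (2.0 * sigma * sigma))
              for i in range(-radius, radius + 1)]
    total = sum(values)
    return [v / total for v in values]


def blur(image, sigma):
    """Separable Gaussian blur L = G * I with replicated border pixels."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass followed by a vertical pass.
    rows = [[sum(kernel[t + radius] * image[y][clamp(x + t, w)]
                 for t in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[t + radius] * rows[clamp(y + t, h)][x]
                 for t in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


def difference_of_gaussian(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k * sigma) - L(x, y, sigma)."""
    wide, narrow = blur(image, k * sigma), blur(image, sigma)
    return [[a - b for a, b in zip(row_w, row_n)]
            for row_w, row_n in zip(wide, narrow)]


# A single bright pixel on a dark background: the wider Gaussian spreads the
# peak out more, so the DoG response at the centre is negative.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussian(image, sigma=1.0, k=math.sqrt(2))
```

For a color image, the same function would be applied to each color layer separately, in line with the procedure described above.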
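We detect the maxima and minima of the difference of Gaussian images by comparing each pixel with its 26 neighbors in the 3 × 3 regions at the current and adjacent scales. If the value of a pixel is a maximum or a minimum, we regard this pixel as a candidate point. Mathematically, for any pixel $(x, y)$ in the image domain $\Omega$, we get the map $C$ of the candidate points, where
$$C(x, y) = \begin{cases} 1, & \text{if } D(x, y, \sigma) \text{ is a maximum or a minimum among its 26 neighbors}, \\ 0, & \text{otherwise}. \end{cases}$$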
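The 26-neighbor comparison can be sketched as follows. The function name and the list-of-rows image layout are assumptions made for illustration. The sketch also assumes strict extrema and skips border pixels, which lack a full 3 × 3 neighbourhood; the text does not specify how ties are handled.

```python
def candidate_map(below, current, above):
    """Mark pixels of `current` that are strict extrema among their 26 neighbours.

    `below`, `current` and `above` are DoG images (lists of rows) at three
    adjacent scales. Border pixels are left unmarked.
    """
    h, w = len(current), len(current[0])
    result = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            value = current[y][x]
            # 9 neighbours in each adjacent scale plus 8 in the current one.
            neighbours = [
                layer[y + dy][x + dx]
                for s, layer in enumerate((below, current, above))
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if not (s == 1 and dy == 0 and dx == 0)
            ]
            if value > max(neighbours) or value < min(neighbours):
                result[y][x] = 1
    return result


# Toy stack: a single peak in the middle scale, flat scales on either side.
flat = [[0.0] * 5 for _ in range(5)]
peak = [row[:] for row in flat]
peak[2][2] = 1.0
cmap = candidate_map(flat, peak, flat)
```

Only the pixel at the peak is marked, because every other interior pixel ties with at least one neighbour of equal value.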
2.4. Keypoints Filtering
In general, the candidate points cannot be used directly to detect the correct foreground regions. They tend to gather in areas with mixed colors, and these areas may lie in the background. Motivated by this observation, a point filtering function is constructed to reduce the candidate points in the background and rebuild some new candidate points in the foreground. We call the resulting candidate points keypoints.
The filtering function operates on the map of candidate points, in which a value of 1 means that the pixel is a candidate point and a value of 0 means that it is not. A selection function determines which candidate point is kept within each filter window.
Remark 1. We create a 5 × 5 filter to reduce the candidate points. Only one candidate point can remain in the 5 × 5 filter.
Through the filter, the number of the candidate points is decreased dramatically. In particular, for the dense candidate points in one region, the reduction is obvious.
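A minimal sketch of the 5 × 5 reduction in Remark 1 is given below. The selection rule is an assumption made here for illustration: the image is split into non-overlapping 5 × 5 tiles and the candidate with the largest absolute response (for example, the DoG value) survives in each tile. The selection function actually used in this paper may differ.

```python
def thin_candidates(cmap, response, size=5):
    """Keep at most one candidate point per size x size tile.

    Assumption: tiles do not overlap, and the survivor is the candidate with
    the largest absolute response in the tile.
    """
    h, w = len(cmap), len(cmap[0])
    kept = [[0] * w for _ in range(h)]
    for top in range(0, h, size):
        for left in range(0, w, size):
            points = [
                (abs(response[y][x]), y, x)
                for y in range(top, min(top + size, h))
                for x in range(left, min(left + size, w))
                if cmap[y][x]
            ]
            if points:
                _, y, x = max(points)
                kept[y][x] = 1
    return kept


# Toy example: three candidates in the left tile, one in the right tile.
cmap = [[0] * 10 for _ in range(5)]
response = [[0.0] * 10 for _ in range(5)]
for (y, x), r in {(0, 0): 0.2, (1, 3): -0.9, (4, 4): 0.5, (2, 7): 0.1}.items():
    cmap[y][x] = 1
    response[y][x] = r
kept = thin_candidates(cmap, response)
```

Dense clusters collapse to a single point per tile, which is why the reduction is most visible in regions crowded with candidates.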
In the next step, we employ the Ncut to find the edges of an input image. With Ncut, we segment the image into many regions and locate the edge points.
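We then select the more useful candidate points, namely, those along the edges, and discard the others. This process can be formulated as follows:
$$C'(x, y) = C(x, y)\, E(x, y),$$
where $C(x, y) = 1$ if the pixel $(x, y)$ is a candidate point, $E(x, y) = 1$ if the pixel lies on an edge found by Ncut, and both are 0 otherwise.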
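Keeping only the candidate points on region edges can be sketched as follows. The edge definition is an assumption: a pixel counts as an edge point if any of its four direct neighbours carries a different region label. The label map stands in for an Ncut segmentation result, and the function names are hypothetical.

```python
def edge_map(labels):
    """Mark pixels that have a 4-neighbour with a different region label."""
    h, w = len(labels), len(labels[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    edges[y][x] = 1
    return edges


def keep_on_edges(cmap, edges):
    """Keep a candidate point only where the edge map is also set."""
    return [[c & e for c, e in zip(crow, erow)] for crow, erow in zip(cmap, edges)]


# Toy segmentation: left half is region 0, right half is region 1.
labels = [[0, 0, 1, 1] for _ in range(4)]
# Candidate points on the main diagonal.
cmap = [[1 if x == y else 0 for x in range(4)] for y in range(4)]
edges = edge_map(labels)
kept = keep_on_edges(cmap, edges)
```

In the toy example, only the two diagonal candidates that touch the boundary between the regions survive.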
In the next step, we will rebuild some new candidate points.
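First, we need to define the focus of an image. We regard the center of the region that contains the largest number of candidate points as the focus. The region $R_f$ that contains the focus can be computed by
$$R_f = \arg\max_{R_j} n(R_j),$$
where $R_j$ is the $j$th region of an input image and
$$n(R_j) = \sum_{(x, y) \in R_j} C(x, y)$$
is the number of candidate points in $R_j$, with $C(x, y) = 1$ for a candidate point and 0 otherwise.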
The initial focus is located at the center of the image. The focus may then shift to the region that contains the largest number of candidate points.
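The choice of the focus can be sketched as a count of candidate points per region followed by an argmax. The sketch assumes that the "center" of a region is the centroid of its pixels, which the text does not state explicitly; the function name and the data layout are also illustrative.

```python
def focus_region(labels, cmap):
    """Return (label, centre) of the region holding the most candidate points.

    The centre is taken to be the centroid (row, column) of the region's
    pixels, which is an assumption.
    """
    counts, sums = {}, {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            n, sy, sx = sums.get(label, (0, 0, 0))
            sums[label] = (n + 1, sy + y, sx + x)
            counts[label] = counts.get(label, 0) + cmap[y][x]
    best = max(counts, key=counts.get)
    n, sy, sx = sums[best]
    return best, (sy / n, sx / n)


# Toy segmentation with two regions and four candidate points.
labels = [[0, 0, 1, 1] for _ in range(4)]
cmap = [
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
label, centre = focus_region(labels, cmap)
```

Region 1 holds three of the four candidate points, so the focus moves to its centroid.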
The candidate points are then rebuilt towards the focus. An input image has four boundaries. For each area, we record which of these boundaries it contains and attach this boundary information to the candidate points in that area. A weight is then added to each existing point. The weight is determined by the boundary information of the candidate point: the more boundaries it contains, the smaller the weight is.
We regard the candidate points obtained from the above steps as keypoints. Finally, we locate the foreground by the number of keypoints in each region. The process of filtering keypoints is displayed in Figure 2.
3. Proposed Algorithm to Extract the Foreground
In this section, we give the basic procedures of our proposed algorithm. The main steps include difference of Gaussian (DoG), generation of candidate points, keypoint filtering, and locating the foreground. The pseudocode is listed in Algorithm 1.

Remark 2. Algorithm 1 uses two constants that are given by users or experts. Alternatively, they can be determined by a learning procedure. In this paper, both constants are set empirically from observations over many experimental results.
Remark 3. Noisy images are a serious problem for this algorithm, because it is very difficult to detect the keypoints with DoG in the presence of noise. In this case, a denoising operator needs to be applied at the first step of the algorithm.
4. Experiments
4.1. Images Data Set
We evaluate our extraction technique on two different data sets. The first data set consists of 27 images that are among the most popular images used for interactive foreground extraction. The second one, which contains 26 images, was created by ourselves. The second data set is more complicated than the first.
Some results on the first data set are shown in Figure 3, and results on the second data set are shown in Figure 4. As Figures 3 and 4 show, our proposed algorithm can extract more than 95% of the foreground. The original images in Figure 3 have rich color and texture in the foreground. When we use the DoG to detect the candidate keypoints, the number of keypoints in the foreground is larger than that in the background, so it is easy to extract the foreground. However, for a few of the images in the first data set, it is difficult to extract the foreground automatically because the foreground and the background are easily confused. For images with a prominent target and a complicated background, more keypoints are obtained in the background, but the keypoint filtering function still yields effective keypoints. We can therefore extract the foreground with high performance, for example, for the lady, the dog, and the postbox.
In order to verify the effectiveness of our proposed method, two established algorithms are employed to extract the foreground from our 27 images. Both are well known in saliency detection. One is the regional contrast (RC) method and the other is the two-stage scheme (TSS) method.
4.2. Regional Contrast Based Saliency Extraction Algorithm (RC)
Cheng et al. [13] proposed a regional contrast based saliency extraction algorithm (RC), which simultaneously evaluates global contrast differences and spatial coherence. Their algorithm is simple and efficient and yields full-resolution saliency maps.
At the beginning of the algorithm, an input image is mapped into a graph, which is used to segment the image by GB [16]. The mapping is as follows:
$$G = (V, E),$$
where $v_i \in V$ denotes the $i$th pixel in the image, $(v_i, v_j) \in E$ denotes the edge connecting adjacent pixels, and $w(v_i, v_j)$ is the weight of the edge.
Using the minimum spanning tree and the smallest weight value between two vertices, the input image is segmented into several regions as follows:
$$I = \{r_1, r_2, \ldots, r_n\},$$
in which the number $n$ means that the image is segmented into $n$ parts.
For each segmented region, its saliency value is calculated by comparing it with the other regions in Lab color space. Within the same region, each pixel has the same saliency value. The spatial distance is also an important factor that influences the saliency value, so it is taken into account in the saliency detection. If a segmented region is close to the current segmented region, its influence on the saliency is large; otherwise, its influence is small.
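The formula that adds spatial weights is as follows [13]:
$$S(r_k) = \sum_{r_i \neq r_k} \exp\left(-\frac{D_s(r_k, r_i)}{\sigma_s^2}\right) w(r_i)\, D_r(r_k, r_i),$$
where $w(r_i)$ is the weight of region $r_i$, which is defined as the number of pixels in $r_i$. $D_s(r_k, r_i)$ is the distance between regions $r_k$ and $r_i$, which is defined as the Euclidean distance between their centers of gravity. $\sigma_s$ is used to control the strength of the spatial weighting. If its value is large, the effect of the spatial weighting is reduced, and regions far from the current region have a stronger impact. Here, $\sigma_s^2$ is set as 0.4, and the pixel coordinates are all normalized to $[0, 1]$. $D_r(r_k, r_i)$ is the color distance metric between $r_k$ and $r_i$.
The color distance between two regions $r_1$ and $r_2$ is defined as follows [13]:
$$D_r(r_1, r_2) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} f(c_{1,i})\, f(c_{2,j})\, D(c_{1,i}, c_{2,j}),$$
in which $f(c_{k,i})$ is the frequency of the $i$th color $c_{k,i}$ among all $n_k$ colors in the $k$th segmented region $r_k$, and $D(\cdot, \cdot)$ is the distance between two colors.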
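The two RC formulas can be combined into a short sketch. The data layout (a dictionary per region with a color histogram, a pixel count, and a normalized centroid) is an assumption made for illustration, and the Euclidean distance is used as the color distance. This is not the reference implementation of [13].

```python
import math


def color_distance(hist_a, hist_b):
    """Color distance between two regions given {color: frequency} histograms."""
    return sum(
        fa * fb * math.dist(ca, cb)
        for ca, fa in hist_a.items()
        for cb, fb in hist_b.items()
    )


def region_saliency(regions, sigma_s_sq=0.4):
    """Spatially weighted region contrast for every region.

    Each region is a dict with 'hist' ({Lab color tuple: frequency}),
    'size' (pixel count) and 'centroid' (coordinates normalised to [0, 1]).
    """
    saliency = []
    for k, rk in enumerate(regions):
        total = 0.0
        for i, ri in enumerate(regions):
            if i == k:
                continue
            d_s = math.dist(rk["centroid"], ri["centroid"])
            spatial = math.exp(-d_s / sigma_s_sq)
            total += spatial * ri["size"] * color_distance(rk["hist"], ri["hist"])
        saliency.append(total)
    return saliency


# Toy example: two large, similar background regions and one small region
# with a clearly different color in between them.
regions = [
    {"hist": {(50.0, 0.0, 0.0): 1.0}, "size": 400, "centroid": (0.25, 0.5)},
    {"hist": {(52.0, 0.0, 0.0): 1.0}, "size": 400, "centroid": (0.75, 0.5)},
    {"hist": {(80.0, 40.0, 20.0): 1.0}, "size": 100, "centroid": (0.5, 0.5)},
]
scores = region_saliency(regions)
```

The small region with the distinct color receives the highest score, because it contrasts strongly with both large regions nearby.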
4.3. TwoStage Scheme for BottomUp Saliency Detection (TSS)
In 2013, Yang et al. [14] proposed a two-stage scheme (TSS) for bottom-up saliency detection using ranking with background and foreground queries. In this subsection, we introduce the basic procedures and the main ideas of the TSS. The following description follows [14].
First, an input image is represented as a close-loop graph $G = (V, E)$ with superpixels as nodes, where $V$ is a set of nodes and $E$ is a set of undirected edges. The weight between two nodes is defined by
$$w_{ij} = e^{-\lVert c_i - c_j \rVert / \sigma^2},$$
in which $c_i$ and $c_j$ denote the means of the superpixels corresponding to the two nodes in the feature space and $\sigma$ is a constant that controls the strength of the weight.
The graph-based ranking technique is employed to calculate the similarity of the image elements (pixels or regions) with foreground cues or background cues. The basic idea is that, for a given node used as a query, the remaining nodes are ranked based on their relevance to the query. The goal is to learn a ranking function that defines the relevance between unlabelled nodes and queries. The saliency of the image elements is defined based on their relevance to the given seeds or queries. The saliency map of the first stage is segmented into two classes (i.e., salient foreground and background) using an adaptive threshold, which facilitates selecting the nodes of the foreground salient objects as queries. The selected queries cover the salient object regions as much as possible (i.e., with high recall). The threshold is set as the mean saliency over the entire saliency map. Once the salient queries are given, an indicator vector $y$ is formed to compute the ranking vector $f^*$ using the equation
$$f^* = A y. \quad (22)$$
In (22), $A$ can be regarded as a learnt optimal affinity matrix and can be determined by the supervised manifold learning (for details, see Section 2.2 of [14]). As in the first stage, the ranking vector $f^*$ is normalized to the range between 0 and 1 to form the final saliency map. It is calculated by
$$S_{fq}(i) = \bar{f}^*(i), \quad i = 1, 2, \ldots, N,$$
where $i$ indexes a superpixel node on the graph with $N$ nodes and $\bar{f}^*$ denotes the normalized vector.
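The ranking step can be illustrated with a small sketch. It assumes, following the manifold ranking formulation in [14], that the affinity matrix takes the form $(D - \alpha W)^{-1}$, where $W$ holds the edge weights and $D$ is the diagonal degree matrix. The parameter values, the four-node ring graph, and the function names are illustrative assumptions, and the linear system is solved with a plain Gaussian elimination.

```python
import math


def affinity(features, edges, sigma_sq):
    """Symmetric weights w_ij = exp(-||c_i - c_j|| / sigma^2) on the given edges."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w[i][j] = w[j][i] = math.exp(-math.dist(features[i], features[j]) / sigma_sq)
    return w


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def rank(w, y, alpha):
    """Ranking scores f = (D - alpha * W)^-1 y, rescaled to the range [0, 1]."""
    n = len(w)
    system = [[(sum(w[i]) if i == j else 0.0) - alpha * w[i][j] for j in range(n)]
              for i in range(n)]
    f = solve(system, y)
    lo, hi = min(f), max(f)
    return [(v - lo) / (hi - lo) for v in f]


# Four superpixels on a ring (hypothetical features): nodes 0 and 1 look
# alike, and so do nodes 2 and 3. Node 0 is the only query.
features = [(0.10,), (0.15,), (0.80,), (0.85,)]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
w = affinity(features, ring, sigma_sq=0.1)
scores = rank(w, [1.0, 0.0, 0.0, 0.0], alpha=0.99)
```

The query node obtains the highest score, and node 1, which resembles the query, is ranked well above nodes 2 and 3.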
4.4. Results
The results of these methods are displayed in Figures 3 and 4. In most cases, better results are obtained by our method.
5. Conclusion
In this paper, a novel approach for automatic foreground extraction is proposed. It is based on the difference of Gaussian (DoG). We create a keypoint filter to obtain the keypoints, which are used to locate the foreground region in the image. Normalized cut (Ncut) is used to cut the image into different regions and to find the boundary information. This approach is best suited to images whose foreground is easy to identify by interactive foreground extraction. For this reason, our experiments are carried out on a data set designed for interactive foreground extraction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research has been supported by the National Natural Science Foundation of China (Grants 61370174 and 61001200), the Open Project Program of the State Key Lab of CAD&CG (Grant no. A1213), Zhejiang University, and the Natural Science Foundation of Shanghai Province of China (11ZR1409600).