Abstract

We present a nonlocal variational model for saliency detection in still images, in which various features for visual attention are detected by minimizing an energy functional. The associated Euler-Lagrange equation is a nonlocal p-Laplacian type diffusion equation with two reaction terms, so the detection process is a nonlinear diffusion. The main advantage of our method is that the temporal evolution of the Euler-Lagrange equation provides flexible and intuitive control over the detection procedure. Experimental results on various images show that our model suppresses background details more effectively than the compared methods while preserving rich, subtle details in the foreground.

1. Introduction

Saliency is an important and basic visual feature for describing image content. It can be a particular location, object, or set of pixels that stands out relative to its neighbors and thus captures people's attention. Saliency detection techniques, which identify the most important areas of natural scenes, are very useful in image and video processing tasks such as image retrieval [1], video compression [2], and video analysis [3]. However, saliency detection is still a difficult task because it requires some semantic understanding of the image. A further difficulty is that most natural images contain varied texture and color information. So far, a large number of algorithms and methodologies have been developed for this task. Saliency detection methods can be roughly categorized as biologically based [4, 5], purely computational [6–12], or those that combine the two ideas [13–15].

Itti et al. [4] devise their method based on the biologically plausible architecture proposed by Koch and Ullman [16], in which multiple low-level visual features, such as intensity, color, orientation, texture, and motion are extracted from images at multiple scales and are used for saliency computing. They determine center-surround contrast using a difference of Gaussians approach. Inspired by Itti’s method, Frintrop et al. [5] present a method in which they compute center-surround differences with square filters.

Different from the biological methods, the purely computational models [6–12] are not explicitly based on biological vision principles. Ma and Zhang [6] and Achanta et al. [7, 8] measure saliency using center-surround feature distances. Hou and Zhang [9] devise a saliency detection model based on a concept defined as spectral residual (SR). Liu et al. [10] obtain the saliency map of an image using machine learning. The model in [11] obtains saliency maps by applying the inverse Fourier transform to a constant amplitude spectrum and the original phase spectrum of the image. Feng et al. [12] define the multiscale contrast features as a linear combination of contrasts in the Gaussian image pyramid.

The third category of methods is based partly on biological models and partly on computational ones, that is, it combines the two ideas. For instance, Harel et al. [13] create feature maps using Itti’s method but perform their normalization with a graph-based approach. In [14], Bruce and Tsotsos present a saliency computation method within the visual cortex which is based on the premise that localized saliency computation serves to maximize information sampled from one’s environment. Fang et al. [15] propose a saliency detection model based on human visual sensitivity and the amplitude spectrum of the quaternion Fourier transform.

These methods [4–15] build elegant maps based on biological theories and/or computational frameworks. However, some key characteristics of the object are still neglected in these models. For example, the saliency maps generated by the methods [4–6, 9, 13] have low resolution. Moreover, the outputs of [4, 5, 13] have ill-defined boundaries, and the methods [6, 9] produce higher saliency values at object edges instead of over the whole object. The methods [7, 8, 15] produce saliency maps of the same size as the input image. Though methods [7, 8] achieve higher precision than methods [4–6, 9, 13], the information in the background cannot be well suppressed. Additionally, the method [15] has difficulty extracting subtle details (e.g., the texture in salient regions), which are very important for visual perception and are a primary visual cue for pattern recognition. Moreover, the study of the human attention mechanism is not yet mature. Therefore, for high-level applications such as image retrieval and browsing, we need a mechanism that produces accurate saliency.

In this paper, we focus on the problem of saliency detection in the variational framework. The main advantage of variational methods for image processing is that they can be easily formulated under an energy minimization framework and allow the inclusion of constraints to ensure image regularity while preserving important features. Over the past decades, many researchers have devoted their work to the development of variational models and proposed many good algorithms for important topics in image analysis and computer vision, including anisotropic diffusion for image denoising [17], p-Laplacian evolution for image analysis [18], nonlocal p-Laplacian evolution for image interpolation [19], active contour models for image segmentation [20], and the complex Ginzburg-Landau equation for codimension-two object detection [21] and image inpainting [22]. To our knowledge, however, very few saliency detection methods take advantage of the variational framework.

Inspired by the nonlocal p-Laplacian [19, 23] and the complex Ginzburg-Landau model [21, 22], we propose a nonlocal p-Laplacian regularized variational model for saliency detection. Our work is a purely computational model for saliency extraction from still images. The proposed energy functional consists of a diffusion-based regularization term, a phase transition term, and a reaction term for the fidelity. In the energy functional, the nonlocal p-Laplacian is introduced to penalize the intermediate values of image intensity, and the phase transition makes the background vanish while preserving visually prominent features. Our approach offers the following technical features. First, we formulate saliency detection as a phase transition over an image domain and develop a variational framework for saliency selection. Various visual features can be detected by minimizing the energy functional in this framework. Second, a dynamical formulation follows naturally from the definition of the energy functional. The associated Euler-Lagrange equation is a nonlocal p-Laplacian type diffusion equation with a nonlinear reaction term for saliency extraction and a linear reaction term for the fidelity. It controls the information flow from original images to saliency maps, so the process of saliency extraction is a nonlinear diffusion. This makes our method quite different from the existing models for saliency detection. Third, our method employs the nonlocal p-Laplacian regularization, which restricts the features of the resulting image. Compared with the classical p-Laplacian regularization, the direction of edge curves indicated by the nonlocal p-Laplacian is more accurate than the direction indicated by the gradient in the p-Laplacian equation [19]. Due to the accuracy of our model, our saliency maps can be regarded directly as salient objects. Experimental results on various images show that our model suppresses background details more effectively while preserving rich, subtle details in the foreground.

The remainder of this paper is organized as follows. In Section 2, we review the Ginzburg-Landau model and the nonlocal evolution equation. The proposed model is introduced in Section 3. Section 4 presents the numerical method, followed by experiments and results in Section 5. The paper is summarized in Section 6.

2. Background

2.1. The Ginzburg-Landau Models

The Ginzburg-Landau equation was originally developed by Ginzburg and Landau [24] to phenomenologically describe phase transitions in superconductors near their critical temperature. The equation has proven to be useful in many areas of physics and chemistry [25]. A substantial mathematical theory on this subject can be found in the literature [26]. Moreover, Ginzburg-Landau equations have already been used for image processing [21, 22, 27, 28]. Most of these works rely on the simplified energy

$$E(u) = \frac{1}{2}\int_\Omega |\nabla u|^2\,dx + \frac{1}{4\varepsilon^2}\int_\Omega \left(1-|u|^2\right)^2 dx \tag{1}$$

or on the associated flow governed by the following evolution equation:

$$\frac{\partial u}{\partial t} = \Delta u + \frac{1}{\varepsilon^2}\left(1-|u|^2\right)u, \tag{2}$$

where $\varepsilon$ is a small nonzero constant and $u$ is a complex-valued function indicating the local state of the material: if $|u| \approx 1$, the material is in a superconducting phase; if $|u| \approx 0$, it is in its normal phase. A rigorous mathematical theory on the Ginzburg-Landau functional shows that there exists a phase transition between the above two states [21]. Minimization of the functional (1) develops homogeneous areas which are separated by phase transition regions. In image processing, homogeneous areas correspond to domains of constant grey value intensities and phase transitions to features.
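As a minimal illustration of the phase behaviour described above, the following Python sketch integrates only the reaction part of a Ginzburg-Landau flow, du/dt = (1 - |u|^2) u / eps^2, pointwise with explicit Euler steps. The function name `gl_reaction_step`, the sample values, and the choices of `eps` and `dt` are assumptions made for illustration and are not taken from the cited works.

```python
def gl_reaction_step(u, eps, dt):
    """One explicit Euler step of du/dt = (1 - |u|^2) u / eps^2 (no diffusion)."""
    return [z + dt * (1.0 - abs(z) ** 2) * z / eps ** 2 for z in u]


# Complex samples with moduli below and above 1 (hypothetical values).
u = [0.2 + 0.1j, -0.5 + 0.5j, 1.3 - 0.4j]
for _ in range(200):
    u = gl_reaction_step(u, eps=0.5, dt=0.01)

# Each nonzero sample is driven towards the unit circle.
moduli = [abs(z) for z in u]
```

Because the update multiplies each sample by a positive real factor, only the modulus changes; the value u = 0 is an equilibrium of this reaction term but an unstable one.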

2.2. Nonlocal Evolution Equations

Recently, nonlocal evolution equations have been widely used to model diffusion processes in many areas [19]. Let us briefly review the nonlocal problems considered in this work. A nonlocal evolution equation corresponding to the Laplacian equation is

$$u_t(x,t) = (J*u)(x,t) - u(x,t) = \int_{\mathbb{R}^N} J(x-y)\,u(y,t)\,dy - u(x,t). \tag{3}$$

The kernel $J$ is a nonnegative, bounded, continuous radial function with compact support. Equation (3) is called a nonlocal diffusion equation since the diffusion of the density $u$ at a point $x$ and time $t$ depends not only on $u(x,t)$ but also on all the values of $u$ in a neighborhood of $x$ through the convolution term $J*u$. This equation shares many properties with the classical heat equation $u_t = \Delta u$. This nonlocal evolution can be thought of as nonlocal isotropic diffusion.
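The nonlocal diffusion described above can be sketched on a 1-D grid as follows. This is an illustrative discretization under stated assumptions, not the scheme of the cited works: the kernel is given as a short list of hypothetical weights summing to 1 over all offsets, the update is written in the equivalent difference form sum_y J(x - y)(u(y) - u(x)), and offsets outside the grid are simply skipped.

```python
def nonlocal_diffusion_step(u, weights, dt):
    """One explicit Euler step of u_t(x) = sum_y J(x - y) (u(y) - u(x)) on a 1-D grid.

    weights[k] holds J at offset k (the same value is used for +k and -k).
    Offsets that fall outside the grid are skipped, so nothing is exchanged
    across the boundary.
    """
    n, radius = len(u), len(weights) - 1
    new_u = []
    for i in range(n):
        flux = 0.0
        for k in range(1, radius + 1):
            for j in (i - k, i + k):
                if 0 <= j < n:
                    flux += weights[k] * (u[j] - u[i])
        new_u.append(u[i] + dt * flux)
    return new_u


u = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
weights = [0.4, 0.2, 0.1]   # J(0), J(+-1), J(+-2); 0.4 + 2*0.2 + 2*0.1 = 1
for _ in range(50):
    u = nonlocal_diffusion_step(u, weights, dt=0.5)
```

Since the kernel is symmetric, every exchange between two grid points cancels in the total, so the sum of `u` is conserved while the initial spike spreads out, which mirrors the behaviour of the classical heat equation.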

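For the p-Laplacian equation $u_t = \operatorname{div}\left(|\nabla u|^{p-2}\nabla u\right)$, a nonlocal counterpart

$$u_t(x,t) = \int_\Omega J(x-y)\,|u(y,t)-u(x,t)|^{p-2}\,\bigl(u(y,t)-u(x,t)\bigr)\,dy \tag{4}$$

was studied mathematically in the literature [23] with Neumann boundary condition. It was proved that the solution of (4) converges to the solution of the classical p-Laplacian equation if $p > 1$ and to the total variation flow when $p = 1$, both with Neumann boundary conditions, when the convolution kernel is rescaled in a suitable way [23]. The energy functional corresponding to (4) is

$$E_p(u) = \frac{1}{2p}\int_\Omega\int_\Omega J(x-y)\,|u(y)-u(x)|^{p}\,dy\,dx. \tag{5}$$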
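To make the link between the nonlocal p-Laplacian and its energy concrete, the sketch below discretizes both on a 1-D grid, assuming the operator sum_y J(x - y)|u(y) - u(x)|^(p-2)(u(y) - u(x)) and the energy (1/(2p)) sum_x sum_y J(x - y)|u(y) - u(x)|^p. The function names, the kernel weights, the sample signal, and the values of `p` and `dt` are hypothetical.

```python
def nonlocal_p_energy(u, weights, p):
    """Discrete energy 1/(2p) * sum_x sum_y J(x - y) |u(y) - u(x)|^p."""
    n, radius = len(u), len(weights) - 1
    total = 0.0
    for i in range(n):
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            total += weights[abs(i - j)] * abs(u[j] - u[i]) ** p
    return total / (2.0 * p)


def nonlocal_p_laplacian(u, weights, p):
    """Discrete sum_y J(x - y) |u(y) - u(x)|^(p-2) (u(y) - u(x)) for every x."""
    n, radius = len(u), len(weights) - 1
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            d = u[j] - u[i]
            if d != 0.0:                      # avoids 0 ** negative when p < 2
                acc += weights[abs(i - j)] * abs(d) ** (p - 2) * d
        out.append(acc)
    return out


weights = [0.4, 0.2, 0.1]                     # hypothetical symmetric kernel
u = [0.0, 0.1, 0.9, 1.0, 0.2]
p, dt = 1.5, 0.01
energy_before = nonlocal_p_energy(u, weights, p)
u_next = [ui + dt * li for ui, li in zip(u, nonlocal_p_laplacian(u, weights, p))]
energy_after = nonlocal_p_energy(u_next, weights, p)
```

In this discrete setting the operator is exactly the negative gradient of the energy, so a sufficiently small explicit step lowers the energy; for `p = 2` the operator reduces to the linear nonlocal diffusion.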

3. The Proposed Model for Saliency Detection

In this section, we propose a variational model (nonlocal p-Laplacian regularized variational model) whose (local) minima extract salient objects from image background.

3.1. Nonlocal p-Laplacian Regularized Variational Model

Let $\Omega$ be an image domain. For a given image $I$, we construct a complex-valued image $u_0$ from $I$ as follows. We first rescale the intensity image into interval by the formula ) ( or ) and assume ; then is identified with the real part of the complex image $u_0$, so that for all $x \in \Omega$. In order to extract salient objects from a still image, we propose the following energy functional: with where , , $\varepsilon$ is a small constant, $u$ is a complex-valued function, and the first term is defined by the energy functional (5). Note that the potential term is slightly different from the second term of the Ginzburg-Landau model (1).
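The construction of a complex-valued image from a real intensity image can be sketched as below. The exact rescaling formulas are not reproduced here, so the code rests on explicit assumptions: 8-bit intensities are mapped linearly to [-1, 1], the sign is flipped when the salient object is darker than the background, and the imaginary part is chosen so that every pixel has modulus 1, a construction commonly used with complex Ginzburg-Landau image models. The function name `to_complex_image` and the flag `bright_salient` are hypothetical.

```python
import math


def to_complex_image(intensity, bright_salient=True):
    """Map 8-bit intensities to complex values of modulus 1.

    Assumption: the real part is a linear rescaling of the intensity to
    [-1, 1] (sign flipped when the salient object is darker than the
    background) and the imaginary part is sqrt(1 - real^2).
    """
    sign = 1.0 if bright_salient else -1.0
    out = []
    for row in intensity:
        new_row = []
        for value in row:
            re = sign * (2.0 * value / 255.0 - 1.0)
            im = math.sqrt(max(0.0, 1.0 - re * re))
            new_row.append(complex(re, im))
        out.append(new_row)
    return out


u0 = to_complex_image([[0, 128, 255]])
```

With this choice the real part carries the image content, and the imaginary part only records how far each pixel is from the two extreme values.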

In the following, we explain the proposed energy functional defined in (6).
(I) The first term in (6) serves the purpose of penalizing the spatial inhomogeneity of $u$. As we know, certain penalties on intermediate densities are equivalent to restrictions on the microstructural configuration [29]. So the nonlocal p-Laplacian acts physically as a regularizer that restricts the features of the resulting images.
(II) The potential in (6) has clearly a minimum at . Thus the minimization of the functional (6) develops homogeneous areas separated by phase transition regions, which makes almost everywhere after enough diffusion except for the regions of the visually prominent features.
(III) The third term is a fidelity term which forces $u$ to be a close approximation of the original function $u_0$.

3.2. Behavioral Analysis of Our Model

In the calculus of variations, a standard method to minimize the functional $E$ is to find the steady-state solution of the gradient descent flow equation

$$\frac{\partial u}{\partial t} = -\frac{\partial E}{\partial u}, \tag{8}$$

where $\partial E/\partial u$ is the Gâteaux derivative of the functional $E$. Equation (8) is an evolution equation of a time-dependent function $u(x,t)$ with a spatial variable $x$ in the domain $\Omega$ and an artificial time variable $t \ge 0$, and the evolution starts with a given initial function $u(x,0) = u_0(x)$. So a dynamical formulation follows naturally from the definition of the energy functional (6) as with the initial condition $u(x,0) = u_0(x)$ and the Neumann boundary condition $\partial u/\partial n = 0$ on $\partial\Omega$ (where $n$ is the outward unit normal to $\partial\Omega$), where The kernel $J$ in (10) is a nonnegative, bounded continuous radial function with and satisfies the following properties:(1),(2), if , and ,(3).

Equation (9) is a nonlocal p-Laplacian type diffusion equation with nonlinear reaction terms. Here we explain the nonlocal p-Laplacian further. The nonlocal p-Laplacian in (9) acts as a regularizer that restricts the features of the output images. First, the regularizer shares many properties of the classical p-Laplacian regularization. In the case of saliency detection, we can achieve a reasonable balance between penalizing irregularities (often due to noise) and preserving intrinsic image features by using regularizers with different values of $p$. Second, the regularizer improves on the classical p-Laplacian regularization based on the local gradient because the nonlocal diffusion at a point $x$ and time $t$ depends on all the values of $u$ in a larger neighborhood of $x$. The evolution process in artificial time $t$ given by (9) can be viewed as an anisotropic energy dissipation process. The direction of anisotropic diffusion is indicated by the nonlocal p-Laplacian over a larger neighborhood, which approximates the direction of the edge curve more accurately than the direction indicated by the gradient.
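The role of $p$ in this balance can be seen from the weight |d|^(p-2) that multiplies a difference d = u(y) - u(x) in the nonlocal p-Laplacian. The short sketch below evaluates that weight for a small difference and a large one; the helper name `diffusivity` and the sample magnitudes are assumptions chosen for illustration.

```python
def diffusivity(d, p):
    """Weight |d|^(p-2) that multiplies the difference d in the nonlocal p-Laplacian."""
    return abs(d) ** (p - 2)


small, large = 0.05, 0.8   # hypothetical noise-level and edge-level differences
for p in (1.2, 2.0, 3.0):
    print(p, diffusivity(small, p), diffusivity(large, p))
```

For p below 2 the weight is larger for small differences than for large ones, so weak fluctuations are smoothed strongly while sharp transitions diffuse slowly; for p = 2 the weight is constant and the diffusion is linear.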

We conclude this subsection by discussing the dynamical behavior of (9). The temporal evolution of the dynamical formulation (9) makes the energy in (6) decrease monotonically in time. We suppose that the regions with less activity during the temporal evolution carry rich information and are most likely to attract human attention. Therefore, irrelevant information is suppressed gradually, while the visual features are preserved to the end. This controls the information flow from original images to saliency maps.

4. Numerical Algorithms

In this section, we briefly present the numerical algorithm and procedure for solving the evolution equation (9). In this paper, $u$ is a complex-valued function. Let , and we can get the following Euler-Lagrange equations from (9): with the initial condition and .

Equation (9) can be implemented via a simple explicit finite difference scheme. Let and be the space and time steps, respectively, and let be the grid points. Let with . Then we discretize the time variable using the explicit Euler method for (9), and we have The iteration formulas are given by
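An explicit Euler iteration of this kind can be sketched as follows for a 1-D complex signal. This is a sketch under stated assumptions rather than the scheme of this paper: the reaction term is taken to be the Ginzburg-Landau form (1 - |u|^2) u / eps^2, the fidelity term is lam * (u0 - u), the nonlocal p-Laplacian uses the complex modulus, and the names `euler_step` and `lam` as well as all parameter values are hypothetical.

```python
def euler_step(u, u0, weights, p, eps, lam, dt):
    """One explicit Euler step of an assumed nonlocal p-Laplacian reaction-diffusion flow.

    Assumed right-hand side (not taken from the paper):
      sum_y J(x - y) |u(y) - u(x)|^(p-2) (u(y) - u(x))
      + (1 - |u|^2) u / eps^2 + lam * (u0 - u)
    """
    n, radius = len(u), len(weights) - 1
    new_u = []
    for i in range(n):
        diffusion = 0j
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            d = u[j] - u[i]
            if d != 0:                         # avoids 0 ** negative when p < 2
                diffusion += weights[abs(i - j)] * abs(d) ** (p - 2) * d
        reaction = (1.0 - abs(u[i]) ** 2) * u[i] / eps ** 2
        fidelity = lam * (u0[i] - u[i])
        new_u.append(u[i] + dt * (diffusion + reaction + fidelity))
    return new_u


weights = [0.4, 0.2, 0.1]                      # hypothetical symmetric kernel
u0 = [1 + 0j, 1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j]
u = list(u0)
for _ in range(20):
    u = euler_step(u, u0, weights, p=1.5, eps=0.5, lam=0.1, dt=0.01)
```

A constant signal of modulus 1 is a fixed point of this assumed flow, since the diffusion, reaction, and fidelity terms all vanish, whereas the isolated sample in `u0` is pulled towards its neighbours.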

Remark 1. In all numerical experiments, we choose the following kernel function: The constant is selected such that .
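A compactly supported radial kernel with a normalizing constant can be sketched as below. The specific kernel formula is not reproduced here, so the code assumes a smooth bump function exp(1 / (|x|^2 - d^2)) inside a disc of radius `d`, sampled on integer pixel offsets and scaled so that the weights sum to 1; the name `bump_kernel` and the radius used are hypothetical.

```python
import math


def bump_kernel(d):
    """Weights proportional to exp(1 / (|x|^2 - d^2)) for |x| < d, normalised to sum 1.

    Assumption: a bump function on integer pixel offsets; offsets with
    |x| >= d are outside the support and are not stored.
    """
    r = int(math.ceil(d))
    raw = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            sq = dx * dx + dy * dy
            if sq < d * d:
                raw[(dx, dy)] = math.exp(1.0 / (sq - d * d))
    c = 1.0 / sum(raw.values())                # normalising constant
    return {offset: c * w for offset, w in raw.items()}


kernel = bump_kernel(3.0)
```

The weights depend only on the squared distance from the centre, so the kernel is radial, nonnegative, and largest at the origin.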

Remark 2. For a color image, let , and be the red, green, and blue channels of the input image, respectively. The intensity channel $I$ is used in our model, where

5. Experiments and Results

The proposed nonlocal p-Laplacian regularized variational model has been applied to detect saliency in various image scenes. In our experiments, the function $u$ is initialized to ( or ) and . If the salient region in the image is brighter than the background, . If the salient region is darker than the background, . We use the following parameter setting for all the experiments: , , , and time step . Our saliency maps are displayed by , and the initial state corresponds to the intensity image $I$. The reason for using as the saliency maps is that is the rescaling of , which is anisotropic diffusion from initial data , by the evolution equations (13).

Figure 1 demonstrates the effects of the proposed method on various images with objects having rich subtle details and/or complex backgrounds. We compare our saliency maps with those of four state-of-the-art methods: Hou and Zhang [9], Harel et al. [13], Achanta et al. [8], and Fang et al. [15], hereafter referred to as SR, GB, IG, and AS, respectively. The code for the AS model is obtained from http://qtfm.sourceforge.net/, and the results of the SR, GB, and IG models are taken from http://ivrg.epfl.ch/supplementary_material/RK_CVPR09/index.html. From Figure 1, we can see that our saliency maps have well-defined borders, highlight the features of whole objects, and suppress the background better than the other methods, even in the presence of complex backgrounds. In addition, our saliency maps have higher accuracy than the previous approaches. We can see from Column 6 that subtle details in the foreground are preserved very well for these test images; for example, the textures in petals, the downy flower of a dandelion, and the hair of a dog are maintained clearly. For methods SR and GB, Col. 2 and Col. 3 show that the information retained from the original image contains very few details and represents a very blurry version of the original image. For method IG, Col. 4 shows that the high frequencies of the original image are retained in the saliency maps, but some details in the background are still clear. For method AS, Col. 5 shows that the background information is suppressed better, but some subtle details in the salient regions are smoothed out, and the saliency maps suffer from staircase effects for salient objects with smooth texture, for example, the egg in Figure 1. Due to the accuracy of our model, our saliency maps can be regarded directly as salient objects. In order to segment a salient object, however, the other methods need to binarize the saliency map such that ones (white pixels) correspond to salient object pixels while zeros (black pixels) correspond to the background [8].
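The binarization step mentioned above can be sketched as a simple threshold on the saliency map. The helper name `binarize`, the sample map, and the choice of twice the mean saliency as the threshold are assumptions made for illustration; the text does not specify how the threshold is chosen.

```python
def binarize(saliency, threshold):
    """Return a mask with 1 where saliency >= threshold and 0 elsewhere."""
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]


saliency = [[0.1, 0.9], [0.4, 0.2]]            # hypothetical 2 x 2 saliency map
mean_saliency = sum(sum(row) for row in saliency) / 4.0
mask = binarize(saliency, 2.0 * mean_saliency)  # assumed adaptive threshold
```

Ones in `mask` mark pixels treated as the salient object and zeros mark the background.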

To compare the quality of our saliency maps objectively with that of the other methods, we adopt the precision, recall, and F-measure used by Achanta et al. [8] and Fang et al. [15]. The quantitative evaluation in this experiment is based on 1000 images from the experimental settings of Achanta et al. [8]. This image database includes original images and their corresponding ground-truth saliency maps. The quantitative evaluation of a saliency detection algorithm measures how much the saliency map produced by the algorithm overlaps with the ground-truth saliency map. For a ground-truth saliency map $G$ and the detected saliency map $S$, we have

$$P = \frac{\sum_x G(x)\,S(x)}{\sum_x S(x)} \quad\text{and}\quad R = \frac{\sum_x G(x)\,S(x)}{\sum_x G(x)},$$

with a nonnegative $\alpha$:

$$F_\alpha = \frac{(1+\alpha)\,P\,R}{\alpha\,P + R}.$$

We set $\alpha = 0.3$ in this experiment, as in the literature [15], for a fair comparison. The comparison results are shown in Figure 2. It is clear that the overall performance of our proposed model on the 1000 images is better than that of the other methods under comparison in terms of all three measures.
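The three measures can be computed as in the sketch below, which assumes the overlap-based definitions precision = sum(G*S)/sum(S), recall = sum(G*S)/sum(G), and F = (1 + alpha) * P * R / (alpha * P + R), with the maps given as flattened lists. The function name `precision_recall_f` and the toy inputs are hypothetical.

```python
def precision_recall_f(ground_truth, saliency, alpha=0.3):
    """Overlap-based precision, recall, and F-measure for flattened maps.

    Assumed definitions: P = sum(G*S)/sum(S), R = sum(G*S)/sum(G),
    F = (1 + alpha) * P * R / (alpha * P + R).
    """
    overlap = sum(g * s for g, s in zip(ground_truth, saliency))
    total_s = sum(saliency)
    total_g = sum(ground_truth)
    precision = overlap / total_s if total_s else 0.0
    recall = overlap / total_g if total_g else 0.0
    denom = alpha * precision + recall
    f_measure = (1 + alpha) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


g = [1, 1, 0, 0]                               # toy ground-truth mask
s = [1, 0, 1, 0]                               # toy detected saliency map
precision, recall, f_measure = precision_recall_f(g, s)
```

In this toy case half of the detected saliency lies on the ground truth and half of the ground truth is detected, so precision, recall, and F-measure all equal 0.5; a detected map identical to the ground truth gives 1 for all three.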

6. Conclusion

In this paper, we develop a variational model for saliency detection which is based on the phase transition theory from the fields of mechanics and material sciences. The dynamics of the system, that is, the temporal evolution derived from the energy functional, yields information about attention, and the process of saliency extraction is an interface diffusion. Compared with the existing models for saliency detection, our method provides flexible and intuitive control over the detection procedure. Experimental results show that the proposed method is effective in extracting important features in terms of human visual perception.

Acknowledgments

This work was supported by the NSF of China (nos. 61202349 and 61271452), the Natural Science Foundation Project of CQ CSTC (no. cstc2013jcyjA40058), the Education Committee Project Research Foundation of Chongqing (nos. KJ120709 and KJ131209), the Research Foundation of Chongqing University of Arts and Sciences (no. R2012SC20), and the Innovation Foundation of Chongqing (no. KJTD201321).