Abstract

Image saliency detection has become increasingly important with the development of intelligent identification and machine vision technology. This process is essential for many image processing algorithms, such as image retrieval, image segmentation, image recognition, and adaptive image compression. We propose a salient region detection algorithm for full-resolution images. The algorithm analyzes the randomness and correlation of image pixels and establishes a pixel-to-region saliency computation mechanism. It first obtains points with high saliency probability by using an improved smallest univalue segment assimilating nucleus operator. It then reconstructs the entire salient region by taking these points as references and combining them with the spatial color distribution of the image, as well as regional and global contrasts. Subjective and objective evaluation results show that the proposed algorithm performs well in terms of quantitative indices such as precision and recall rates.

1. Introduction

Image saliency detection is key to the extraction of image information. Extracting salient image regions is required in most content-based image processing methods because important image components provide the most comprehensive information on an entire image. Therefore, precisely extracting the salient regions of images effectively facilitates many image applications, such as image retrieval [1–3], adaptive image compression [4, 5], object recognition [6], and content-aware image resizing [7].

Humans can easily focus on the salient parts of images according to experience and judgment, but machines are unable to precisely replicate this ability. Many scholars have studied this matter on the basis of biology, physiology, and neurobiology. These studies identify several features that salient regions should possess, including uniqueness, randomness, and unexpectedness.

In this paper, we present salient region detection as a random distribution problem over the binary labels of image pixels. To analyze an image concretely, we represent the set of all image pixels as a sequence over {1, 0}; that is, each pixel has one of only two attributions: it either belongs to the salient region or it does not. The two important properties of this distribution are randomness and correlation. Randomness means that although a set of pixels is known to belong to the salient region, its number, location, and pixel combination are unknown; that is, the size, position, and shape of the salient regions of an image are unknown. Correlation means that although the pixel distribution is random, it is not an irregular, completely "free random distribution": the pixels always influence one another through associated features such as image contrast, multiscale features, and color distribution.

We establish a computation mechanism called location-quantification. The algorithm begins with corner points, which carry abundant information; these corner points are obtained by improving the smallest univalue segment assimilating nucleus (SUSAN) operator. The salient region is then located by combining the corner points with their global features. Finally, the salient region of the image is quantified by spatial weighted similarity. The image saliency detection results are evaluated by two quantitative indices: the precision for single images and the success rate for large image sets. We test our algorithm on publicly available benchmark image data sets and compare it with existing representative algorithms. The results show that our method performs excellently in terms of these indices.

2. Related Work

Image visual attention detection has developed rapidly over the last 20 years, and various outstanding methods have emerged. Different approaches have different points of emphasis and perform well in certain respects.

The biological vision-inspired model proposed by Koch and Ullman [8] has influenced many image saliency detection algorithms that are based on basic image features. Itti et al. [9] define image saliency by combining the intensity, color, and orientation of images to compute a saliency map. This method analyzes only the global features of an image and effectively locates salient regions; however, the identified region is not sufficiently precise for salient region quantization. Goferman et al. [10] synthetically consider the regional and global features of an image. This method improves on that of Itti et al. [9] in image region details, but it remains inadequate for homogeneously highlighting an entire salient region.

Some algorithms incorporate mathematical models on the basis of biological vision. Harel et al. [11] and Gopalakrishnan et al. [12] use graph-based Markov chain and random walk models to process basic image features. Duan et al. [13] employ principal component analysis to transform the image color space and reduce its dimensionality. Li et al. [14] propose a method based on sparse coding length to compute the salient region of an image. This method views image saliency as a direct reflection of coding length and presents a plausible explanation for the influence of visual saliency. These methods provide good results to a certain extent and extend the scope of applicable image types. Nevertheless, the introduction of mathematical models increases computational complexity.

Liu et al. [15] and Judd et al. [16] propose systematic saliency detection methods for large image collections. Both methods leverage machine learning and collect features of salient regions labeled by human participants. Such machine learning methods are robust with respect to both image type and detection precision and can precisely estimate images with complicated scenes. However, these approaches are unsuitable for real-time image saliency detection because they require complicated equipment and incur high computational costs. For common natural images, the increase in complexity is nonlinear with respect to the improvement in saliency detection performance.

Hou and Zhang [17] analyze image saliency in the frequency domain. The authors found that the averaged log amplitude spectrum of a large set of natural images follows a common trend with respect to frequency. The salient parts of an image are obtained by subtracting this average log amplitude spectrum from the log amplitude spectrum of the image. Guo et al. [18] suggest that better results can be obtained by using the phase spectrum of the Fourier transform.

Our method is based on basic image features, which are analyzed by location-quantification. We first obtain probable salient points and then quantify these points to reconstruct the entire salient region of the image. We comprehensively consider the regional and global features of the image; thus, our method sufficiently highlights the entire salient region. We do not compare and compute every pixel of the image, so the algorithm has lower computational complexity.

3. Our Method

The essential problem in salient region detection is deciding which pixels of an image belong to the salient region and which do not. Therefore, we define $I$ as a given image. For a random pixel $x \in I$, the binary label $l_x \in \{0, 1\}$ represents whether this pixel is a salient point. $P(l_x = 1)$ is used to indicate the saliency probability of the pixel at coordinate $x$, and then $P(l_x = 0) = 1 - P(l_x = 1)$.

The algorithm is intended for common natural images, whose salient regions contain most of the information that the image expresses. The image background also contributes to showing the entire image content, as in the relationship between beautiful flowers and the green leaves around them. The energy of different pixels varies, indicating that $P(l_x = 1)$ also differs from pixel to pixel. The algorithm therefore begins from pixels with high $P(l_x = 1)$ values.

3.1. Computing Reference Points

A pixel in an image commonly carries abundant information when the frequency of its color value is low and its energy is high. Various methods, such as the SUSAN corner [19], the Harris corner [20], and scale-invariant feature transform points, have been used to detect high-energy, information-rich points in an image. In the current work, we use the SUSAN corner [19] to detect points with high $P(l_x = 1)$ in an image, which reduces the computational effort required in the early stage.

SUSAN corner detection [19] moves a circular template over the image. The intensity of every pixel in the template is compared with that of the pixel at the template nucleus. This comparison is expressed by

$$c(r, r_0) = \begin{cases} 1, & \left| I(r) - I(r_0) \right| \le t, \\ 0, & \text{otherwise}, \end{cases} \quad (1)$$

where $r_0$ is the position of the template nucleus, $r$ is the position of any other pixel in the template, $I(\cdot)$ denotes pixel intensity, and $t$ is the intensity difference threshold. The SUSAN region of every pixel is defined as

$$n(r_0) = \sum_{r} c(r, r_0). \quad (2)$$

Then, we obtain the initial corner response of this pixel by

$$R(r_0) = \begin{cases} g - n(r_0), & n(r_0) < g, \\ 0, & \text{otherwise}, \end{cases} \quad (3)$$

where $g$ is the geometric threshold and is commonly assigned the value $n_{\max}/2$, which indicates half of the number of pixels in the template.
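As a concrete illustration of (1)–(3), the following minimal Python sketch computes the initial SUSAN corner response for a grayscale image. The function name, the brute-force loop, and the radius-3 (37-pixel) circular template are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def susan_response(img, t=25, radius=3):
    """Initial SUSAN corner response R, following (1)-(3).

    img:    2-D float array of pixel intensities
    t:      intensity difference threshold of (1)
    radius: radius of the circular template (3 gives the classic 37-pixel mask)
    """
    h, w = img.shape
    # Offsets of the circular template pixels around the nucleus.
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if dy * dy + dx * dx <= radius * radius]
    n_max = len(offsets)   # number of pixels in the template
    g = n_max / 2.0        # geometric threshold g = n_max / 2
    R = np.zeros((h, w))
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # (1) and (2): count template pixels similar to the nucleus.
            n = sum(1 for dy, dx in offsets
                    if abs(img[y + dy, x + dx] - img[y, x]) <= t)
            # (3): response is g - n when the USAN area is small enough.
            R[y, x] = g - n if n < g else 0.0
    return R
```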

To guarantee that the SUSAN algorithm, designed for gray images, is suitable for color image corner detection, we improve the algorithm as follows. Equation (1) considers only pixel intensity as the measurement standard, which is insufficient for more complicated color images. Thus, we change $\left| I(r) - I(r_0) \right|$ in (1) into

$$d(r, r_0) = \left\| v(r) - v(r_0) \right\|, \quad (4)$$

indicating the norm of the color vector difference in the CIELab color space, where $v(\cdot)$ is the CIELab color vector of a pixel.

The value of $t$ in (1) is usually 25 according to experimental results, but this invariable value lacks flexibility. In our algorithm, we define the self-adaptive threshold as

$$t = \frac{1}{N} \sum_{r} \left\| v(r) - \bar{v} \right\|, \quad (5)$$

where $N$ is the number of pixels in the template.

Thus, we use (6) to compare the template nucleus pixel with the other pixels in the template in the CIELab color space:

$$c(r, r_0) = \begin{cases} 1, & \left\| v(r) - v(r_0) \right\| \le t, \\ 0, & \text{otherwise}, \end{cases} \quad (6)$$

where $\bar{v}$ in (5) is the average value of all the pixel vectors in the template.
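The sketch below shows one way the color comparison of (4)–(6) can be realized; scikit-image's rgb2lab handles the CIELab conversion, and the exact form of the adaptive threshold follows our reconstruction of (5), so the block should be read as an assumption rather than the authors' code.

```python
import numpy as np
from skimage.color import rgb2lab  # RGB -> CIELab conversion

def color_susan_mask(rgb_patch, nucleus_idx):
    """Compare the template nucleus with the other template pixels in
    CIELab space, as in (6), with the self-adaptive threshold of (5).

    rgb_patch:   (n, 3) RGB values (in [0, 1]) of the template pixels
    nucleus_idx: index of the nucleus pixel within the patch
    """
    lab = rgb2lab(rgb_patch[np.newaxis, :, :])[0]  # (n, 3) Lab vectors
    nucleus = lab[nucleus_idx]
    # (4): Euclidean norm of the color difference in CIELab space.
    d = np.linalg.norm(lab - nucleus, axis=1)
    # (5) (reconstructed): t adapts to the template's spread around its
    # mean color vector v_bar.
    v_bar = lab.mean(axis=0)
    t = np.linalg.norm(lab - v_bar, axis=1).mean()
    return (d <= t).astype(np.uint8)  # c(r, r0) of (6)
```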

The third column in Figure 1 shows the results of the improved SUSAN operator, in which (a) represents the source images, (b) represents images with the salient regions identified by participants, and (c) represents images with reference points computed by the improved SUSAN operator. We analyze these points in terms of two aspects. First, most of the obtained reference points are located in the labeled salient region, a finding that is consistent with our goal and shows that the obtained points have large $P(l_x = 1)$ values among all the pixels. Second, not all reference points are located in the labeled salient regions because of the correlation among pixels: in computing the reference points, we consider the correlation of one pixel with only the tens of pixels around it. However, all the pixels in an image are correlative; this relationship is called global correlation. We therefore process the aforementioned reference points by their global correlation to obtain the saliency location map, as discussed in the next section.

3.2. From Point to Region

We have already obtained some reference points according to the principle of "larger probability," but some of these points are not located in the labeled salient region, as shown in Figure 1(c). The absence of these points from the labeled region is attributed to the fact that, in the computation process, we choose only limited-range neighborhood features centered on these points; their global features are disregarded.

3.2.1. Global Point Processing

We use the global contrast method introduced by Cheng et al. [21] to compute the global features of the reference points because this method effectively separates a large-scale object from its surroundings. This technique is also preferred over local contrast-based methods, which produce high saliency values near object edges. In the method of Cheng et al. [21], every pixel is compared with all the others; however, the analysis in the preceding section indicates that this comparison is unnecessary for all pixels because the saliency probability differs across image pixels.

We choose the large-probability points and compute their global contrast by

$$S(x_k) = \sum_{i \in I} d\left( v_{x_k}, v_i \right), \quad (7)$$

where $v_{x_k}$ denotes the color vector of the pixel with coordinate $x_k$; $v_i$ is the color vector of a random pixel $i$ in the image; and $d$ represents the distance between the pixels in the CIELab color space, which is expressed by (4).
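A minimal sketch of (7) under our reading: each reference point's contrast is accumulated against pixels drawn at random from the image (sampling instead of the full sum over all pixels is our own shortcut to keep the example cheap).

```python
import numpy as np

def global_contrast(lab, points, n_samples=2000, seed=None):
    """Approximate the global contrast S(x_k) of (7) for each reference
    point, using a random sample of image pixels.

    lab:    (H, W, 3) CIELab image
    points: list of (y, x) reference point coordinates
    """
    rng = np.random.default_rng(seed)
    h, w, _ = lab.shape
    flat = lab.reshape(-1, 3)
    idx = rng.choice(h * w, size=min(n_samples, h * w), replace=False)
    sample = flat[idx]
    # (4) gives the CIELab distance; (7) sums it over the sampled pixels.
    return np.array([np.linalg.norm(sample - lab[y, x], axis=1).sum()
                     for (y, x) in points])
```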

We define a threshold to sift all the points:

$$T = \frac{1}{K} \sum_{k=1}^{K} S(x_k), \quad (8)$$

which is the average saliency value of the $K$ reference points. All the reference points with $S(x_k) < T$ are removed; then, the isolated points are also removed. The result is shown in Figure 1(d). The fourth column of Figure 1 shows that, after processing, most of the reference points are located in the labeled salient region.
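Continuing the sketch, the sifting of (8) reduces to keeping the reference points whose contrast reaches the mean; isolated-point removal, which depends on neighborhood bookkeeping, is omitted here.

```python
def sift_reference_points(points, contrast):
    """Keep the reference points whose global contrast reaches the
    threshold T of (8), i.e., the mean contrast over all reference points."""
    T = contrast.mean()
    return [p for p, s in zip(points, contrast) if s >= T]
```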

3.2.2. Point Diffusion by Spatial Weighted Similarity

As shown in Figure 1, the remaining points after global contrast processing are mostly located in the labeled salient region or on its boundary. The next step is to obtain the entire salient region according to the reference points. The color and texture of the pixels inside the salient region are similar to one another but differ from those of the background. We therefore propose a random pixel search method centered on each reference point to complete salient region detection.

We suggest that the computation of every pixel's saliency value follows four basic principles according to the relationship between the reference points and the other pixels.

(1) Both the regional and global saliency probabilities of the processed reference points are high; thus, the saliency probability of pixels with similar features should also be high.

(2) The saliency probability of a pixel is related to distance [10, 13]. That is, some pixels' features are similar to those of the reference points, but their saliency probability decreases as a result of large distances.

(3) All saliency pixels are assumed to be centered before the entire image is subjected to saliency detection [16]. That is, for a natural image, the position of the salient object is near the center of the image.

(4) The amount of pixel information is inversely related to frequency, but the saliency probability of a pixel is low when its frequency is either a maximum or a minimum.

Every reference point is successively chosen as the center point according to the aforementioned four principles, and every pixel in its circular neighborhood is searched until all the pixels have been visited. The saliency value is computed by

$$S(x) = w_d(x) \, w_c(x) \, w_f(x) \, \mathrm{sim}\left( v_x, v_{x_0} \right), \quad \left\| x - x_0 \right\| \le r, \quad (9)$$

where $x_0$ is the reference point coordinate, $r$ is the circle radius, and $\mathrm{sim}(v_x, v_{x_0})$ is the color similarity between pixel $x$ and the reference point, which decreases with the CIELab distance of (4). Weight $w_d$, defined according to principle (2), is

$$w_d(x) = \frac{1}{1 + c \cdot d_p(x, x_0)}, \quad (10)$$

where $d_p(x, x_0)$ is the spatial distance between $x$ and $x_0$; the algorithm of Goferman et al. [10] is used as the reference for the argument $c$ and the distance $d_p$.

Weight $w_c$ indicates the offset from pixel $x$ to the image center. The expression in the study of Judd et al. [16] is used as the reference, and $w_c$ is defined as follows:

$$w_c(x) = 1 - \frac{d_c(x)}{Z}, \quad (11)$$

where $d_c(x)$ is the spatial distance from pixel $x$ to the image center and $Z$ denotes the normalizing factor.

We use the Gauss function to model the influence of the frequency of one pixel on its saliency probability. We define $w_f$ as

$$w_f(x) = \exp\left( -\frac{\left( f(x) - \bar{f} \right)^2}{2\sigma^2} \right), \quad (12)$$

where $f(x)$ is the frequency of pixel $x$; $\bar{f}$ is the average frequency of the pixels in the image; and $\sigma$ controls the strength of the pixel frequency weight, with smaller values of $\sigma$ making the frequency term more influential in the saliency probability. In our experiment, $\sigma$ is fixed empirically.
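Gathering (9)–(12), the sketch below scores every pixel inside one reference point's circular neighborhood. The constants c and sigma, the similarity kernel, and the per-pixel color-frequency map are illustrative assumptions consistent with our reconstruction of the equations, not values from the paper.

```python
import numpy as np

def diffusion_saliency(lab, freq, ref_pt, radius=40, c=3.0, sigma=0.25):
    """Spatial weighted similarity around one reference point, (9)-(12).

    lab:    (H, W, 3) CIELab image
    freq:   (H, W) normalized color frequency of each pixel
    ref_pt: (y, x) coordinate of the reference point x_0
    """
    h, w, _ = lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y0, x0 = ref_pt

    # (10): weight decays with the spatial distance to the reference point.
    d_p = np.hypot(ys - y0, xs - x0) / max(h, w)
    w_d = 1.0 / (1.0 + c * d_p)

    # (11): weight decays with the offset from the image center;
    # Z normalizes the distance into [0, 1].
    Z = np.hypot(cy, cx)
    w_c = 1.0 - np.hypot(ys - cy, xs - cx) / Z

    # (12): Gaussian weight on the deviation from the mean frequency.
    w_f = np.exp(-(freq - freq.mean()) ** 2 / (2.0 * sigma ** 2))

    # (9): color similarity to the reference point, restricted to the circle.
    sim = np.exp(-np.linalg.norm(lab - lab[y0, x0], axis=2) / 10.0)
    inside = np.hypot(ys - y0, xs - x0) <= radius
    return np.where(inside, w_d * w_c * w_f * sim, 0.0)
```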

To compute (9), we first choose the center point among all the reference points. If one reference point is similar to another that has already been processed, we disregard it. Our experimental results are shown in Figure 2.

3.2.3. Multiscale Enhancement

Multiple scales of an image play an important role in enhancing algorithm performance in image recognition and object detection. We compute (9) at three scales and merge the three resulting saliency maps to obtain the final saliency map. Merging necessitates transforming the three saliency maps to the same scale because of their different sizes. A merging method based on weighted saliency maps is used to generate the final saliency map, with the proportions of the reference points' numbers at the different scales as the weights:

$$S_{\mathrm{final}} = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2 + \frac{N_3}{N} S_3, \quad (13)$$

where $N_1$, $N_2$, and $N_3$ are the numbers of reference points at the three scales and $N = N_1 + N_2 + N_3$ is the total number of reference points over the three scales.
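A sketch of the merge in (13), assuming three per-scale maps and their reference-point counts; scikit-image's resize brings the maps to a common size, as the text requires.

```python
import numpy as np
from skimage.transform import resize

def merge_scales(maps, counts):
    """Weighted merge of per-scale saliency maps, following (13).

    maps:   list of saliency maps computed at different scales
    counts: reference-point counts N_1..N_3 at those scales
    """
    target = maps[0].shape
    N = float(sum(counts))
    merged = np.zeros(target)
    for s_map, n in zip(maps, counts):
        # Bring every map to a common scale, then weight by N_i / N.
        merged += (n / N) * resize(s_map, target)
    return merged
```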

4. Experimental Results and Evaluation

Various methods are available for evaluating the results of salient region detection, and each method emphasizes different elements. We generally classify these methods into subjective and objective evaluations. The first problem in evaluating an algorithm is the selection of an image data set. We test our algorithm on the data sets provided by Cheng et al. [21] and Achanta et al. [22]. We then present our subjective and objective evaluation results.

4.1. Subjective Evaluation Results

Subjective evaluation results are usually obtained by comparing the results of different algorithms. In this paper, several representative methods are chosen for comparison: those of Itti et al. [9], Harel et al. [11], Goferman et al. [10], and Hou and Zhang [17]. The method of Itti et al. [9] is a classic image saliency detection method from which many more effective methods emerged. The method of Harel et al. [11], called graph-based visual saliency, combines biological motivation with mathematical computation. The method proposed by Goferman et al. [10] uses the regional and global features of an image and is similar to ours. The spectral residual method of Hou and Zhang [17] analyzes image saliency in the frequency domain, offering a new route to saliency detection. Figure 3 compares our results with those of the aforementioned methods and shows that our method improves the precision and integrity of salient regions.

4.2. Objective Evaluation Results

Objective evaluation results are usually reflected by quantitative data and cover two aspects. The first is the applicability of the algorithm, which is indicated by its run time and memory space; if the algorithm is applied under strict real-time requirements, then the demands on its run time and memory space are high. The second is the recall ratio and precision, which can be computed from the confusion matrix of the results.

Table 1 compares the run time of our algorithm with those of the others. The resolution of most test images is 400 × 300. The test platform is a machine with a dual-core 2.10 GHz CPU and 2 GB of memory running Microsoft Windows XP.

Among methods based on similar principles, ours consumes more time than most of the comparison methods but less than the method of Goferman et al. [10] (Table 1). We achieve better saliency detection results at this cost in time. When implemented in an efficient programming language such as C++, our algorithm can satisfy real-time requirements in special situations.

Figure 3 shows the comparison histogram of our method with those of the others in terms of precision and recall rates. The $F$-measure is computed by

$$F_\beta = \frac{\left( 1 + \beta^2 \right) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}. \quad (14)$$

As in [22], we use $\beta^2 = 0.3$ to weight precision more than the recall ratio. Our method exhibits higher precision, recall ratio, and $F_\beta$ (Figure 3).
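For reference, (14) is a direct transcription in code, with the $\beta^2 = 0.3$ convention stated above:

```python
def f_measure(precision, recall, beta2=0.3):
    """F-measure of (14); beta2 = 0.3 weights precision over recall."""
    return ((1 + beta2) * precision * recall) / (beta2 * precision + recall)
```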

We also show the receiver operating characteristic curves of our method and compare them with those of the aforementioned approaches. Figure 4 indicates that our method achieves higher hit rates and lower false positive rates.

5. Conclusion

We propose a salient region detection algorithm based on location-quantification for full-resolution images. The algorithm proceeds from points to regions. First, the salient region is preliminarily located by corner points. The region is then located more precisely on the basis of global image features. Finally, the salient region is quantified by spatial weighted similarity. Our algorithm does not compare all pixels individually; only a few are compared, thereby reducing computational complexity. Our algorithm exceeds the performance of other well-known methods.

Nonetheless, we note that our method depends heavily on image color features, making it nonideal for images with complicated scenes and background textures. Future plans include incorporating additional factors, such as human face detection and image symmetry, to obtain better results. We believe that exact salient region detection can improve the results of image scene analysis, image classification, and content-based image retrieval.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (nos. 61370111 and 61103113) and Beijing Municipal Education Commission General Program (KM201310009004).