Abstract

For infrared images, it is a formidable challenge to highlight salient regions completely and suppress background noise effectively at the same time. To handle this problem, a novel saliency detection method based on multiscale local sparse representation and local contrast measure is proposed in this paper. The saliency detection problem is addressed in three stages. First, a multiscale local sparse representation based approach is designed for detecting saliency in infrared images. Using it, multiple saliency maps at various scales are obtained for an infrared image. These maps are then fused to generate a combined saliency map, which can highlight the salient region fully. Second, we adopt a local contrast measure based technique to process the infrared image. It divides the image into a number of image blocks, which are then utilized to calculate the local contrast and generate a local contrast measure based saliency map. In this map, the background noise can be suppressed effectively. Finally, to make full use of the advantages of the above two saliency maps, we combine them using an adaptive fusion scheme. Experimental results show that our method achieves better performance than several state-of-the-art algorithms for saliency detection in infrared images.

1. Introduction

In the field of image processing and computer vision, a number of methods for saliency detection in visible images have been proposed [1–5]. These techniques have become quite mature. However, when applied to infrared (IR) images, they tend to fail. The reason is that infrared images, which are generated by infrared imaging sensors, usually have low resolution, low signal-to-noise ratio (SNR), low contrast, and few useful image features. If the commonly used saliency detection methods originally designed for visible images are directly applied to infrared images without considering these special characteristics, their potential cannot be fully realized, which directly leads to reduced saliency detection performance [6].

To address this problem, some techniques have later been proposed specifically for saliency detection in infrared images [7–10]. The idea behind many of these methods is to predict the size and shape of the background and the salient region as accurately as possible and then suppress the background while preserving the salient region. To some extent, they achieve good detection performance, but the premise is that the salient region should have a sufficient signal-to-noise ratio; that is, the gray intensity of the salient region must be much higher than that of the surrounding background in the infrared image. Thus, when the salient region has a low signal-to-noise ratio, most of the above methods can neither effectively restrain background noise nor highlight the salient region.

In recent years, the sparse representation (SR) theory has been developed in the field of signal processing and analysis [11]. Image sparse representation is the linear representation of an image signal by a few atoms of a learned overcomplete dictionary. The key to sparse representation is to construct a good dictionary that makes the image signal sparser. Sparse representation has been used in many fields, such as image denoising [12], image restoration [13], and image target detection [14]. Recently, researchers have begun to apply sparse representation to image saliency detection. For example, in [15], Xia et al. presented a saliency model based on nonlocal reconstruction. In that method, the saliency was measured by the sparse reconstruction residual of representing the central patch with a linear combination of its surrounding patches sampled in a nonlocal manner. In [16], Fareed et al. proposed detecting salient regions through sparse reconstruction and graph-based ranking. In the first step, the original image was segmented into superpixels. In the second step, the sparse representation measure and the uniqueness of the features were computed, and both were ranked on the basis of the background and foreground seeds, respectively. Thirdly, a location prior map was used to enhance the foci of attention. Rigas et al. [17] presented an algorithm for efficient modeling of visual saliency based on local sparse representation and the Hamming distance. The method was built on an efficient comparison scheme for the local sparse representations derived from nonoverlapping image patches. The sparse coding stage was implemented via an overcomplete dictionary trained on natural images with a soft-competitive bioinspired algorithm. The resulting local sparse codes were compared pairwise using the Hamming distance as a gauge of their coactivation. The calculated distances were used to quantify the saliency strength of each individual patch, and the saliency values were then nonlinearly filtered to form the final map. Fan and Qi [18] introduced a saliency detection method based on global and local short-term sparse representation. They first employed the ICA algorithm to learn a set of basis functions and then represented the input image with this set of basis functions. Next, a global and local saliency framework was employed to measure the saliency, where the global saliency was obtained through Low-Rank Representation (LRR) and the local saliency was obtained through a sparse coding scheme. From the above introduction, we can see that sparse representation is indeed an effective tool for image saliency detection, so we tried applying the existing sparse representation based saliency detection methods to infrared images. Unfortunately, they fail to obtain satisfactory results due to the special characteristics of infrared images. Because of the special imaging mechanism, infrared images usually have low contrast and contain considerable background noise. The saliency maps obtained by the existing sparse representation based methods therefore always contain much noise and rarely present complete salient regions.

Hence, to exploit the advantages of sparse representation for saliency detection in infrared images and at the same time overcome the above-mentioned deficiencies, we present a novel saliency detection algorithm based on multiscale local sparse representation and local contrast measure in this paper. First, a multiscale local sparse representation (MLSR) based approach is proposed to detect saliency in infrared images. Based on this approach, multiple saliency maps at various scales are obtained for an input infrared image. Then, to enhance the salient region completely, the maps are combined to derive a fused saliency map. Subsequently, in order to suppress the disturbances of the background and further improve the saliency detection performance, a local contrast measure (LCM) based technique is adopted to process the input infrared image. Based on this technique, the original infrared image is divided into many image blocks, which are used to calculate the local contrast and thereby compute a local contrast measure based saliency map. Since this technique takes the characteristics of infrared images into consideration, it can effectively suppress the background noise. Finally, by combining the advantages of multiscale local sparse representation, which can fully highlight the salient region, and local contrast measure, which can restrain background noise, the proposed algorithm guarantees that the detected salient regions of infrared images are accurate and complete and that the background clutters are restrained. Experimental results show that it provides more accurate and reliable saliency detection results on infrared images than the tested conventional algorithms.

The rest of this paper is organized as follows. Section 2 describes the proposed multiscale local sparse representation and local contrast measure based saliency detection method in detail. Section 3 presents and discusses the evaluation results on real-life infrared images. Section 4 concludes the paper.

2. The Proposed Algorithm

The proposed saliency detection method for infrared images involves three main stages, as shown in Figure 1, which correspond to the multiscale local sparse representation based saliency computation, local contrast measure based saliency computation, and adaptive saliency map combination, respectively. We describe these three main stages of the method in detail as follows.

2.1. Multiscale Local Sparse Representation Based Saliency Computation

It has been verified that sparse representation is a useful tool for saliency detection, but it is difficult to obtain complete salient regions when it is applied to infrared images [15]. To solve this problem, we propose a multiscale local sparse representation based approach to compute saliency for infrared images. This approach combines the advantage of local sparse representation, which can highlight the edge profile of the salient object, with the advantage of the multiscale idea, which can enhance the internal area of the salient object, thereby guaranteeing the integrity of the salient region.

2.1.1. Scale Selection

Multiscale analysis is very useful for estimating visual saliency, since it helps capture salient objects in the visual field regardless of their sizes. Hence we apply the multiscale idea to saliency detection in infrared images, that is, extracting visual saliency over different scales. However, finding an appropriate number of image scales is not easy for infrared images. While we wish to preserve fine image structures by selecting a large number of image scales, we also want to avoid decomposing connected regions into small noisy image patches. In this work, a scheme based on our previous work [19] is adopted to decompose the infrared image over different scales $\{s_n\}_{n=1}^{N}$, where $s_1$ is the smallest scale, used to represent the coarse structure of infrared images, $s_N$ is the largest scale, used to represent the fine structure of infrared images, $N$ is the total number of scales, and $s_n$ corresponds to the $n$th scale level.

For simplicity and efficiency, we consider two different scales here; that is, $N = 2$. For the fine scale, the scale factor is 1 (the original resolution). For the coarse scale, experiments showed that a reduced scale factor is sufficient to represent the coarse structure of infrared images. Finally, two saliency maps, one at the fine scale and one at the coarse scale, are calculated based on local sparse representation.
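
As a concrete illustration, the two-scale decomposition can be implemented as a simple resize operation. The following is a minimal Python sketch, assuming the scale acts as a resize ratio (1 for the fine scale); the coarse ratio of 0.25 is an illustrative assumption, not a value taken from this paper.

```python
import numpy as np
from skimage.transform import rescale

def build_scales(img, coarse_ratio=0.25):
    """Return the fine-scale (original) and coarse-scale (downsampled) images."""
    fine = np.asarray(img, dtype=float)                       # scale factor 1: original resolution
    coarse = rescale(fine, coarse_ratio, anti_aliasing=True)  # reduced resolution for coarse structure
    return fine, coarse
```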

2.1.2. Multiscale Local Sparse Representation Based Saliency Map

Given an original infrared image, we obtain two images at different scales based on the aforementioned multiscale idea and compute a saliency map for each of them using local sparse representation [15]. The detailed steps are given as follows.

Step 1. Given an original infrared image $I$, calculate its saliency map at the fine scale. Specifically, in image $I$, given a pixel $c$, we use $\mathbf{y}_c$ to represent the vector of the square local patch of $m \times m$ pixels centered at $c$, which is formed by sequentially stacking each column of the patch. Let $W_c$ be the search window centered at $c$. The central patch $\mathbf{y}_c$ can be represented by a linear combination of the patches whose central pixels lie in the search window $W_c$:
$$\mathbf{y}_c = \mathbf{D}_c \boldsymbol{\alpha}_c,$$
where $\mathbf{D}_c$ is the data matrix whose columns are the vectorized patches from $W_c$ and $\boldsymbol{\alpha}_c$ is the coefficient vector of the linear combination. In order to make full use of the sparse representation of the original infrared image, we should ensure that the coefficient vector $\boldsymbol{\alpha}_c$ contains only a few nonzero elements. So we transform the above representation into the following form:
$$\hat{\boldsymbol{\alpha}}_c = \arg\min_{\boldsymbol{\alpha}} \left\| \mathbf{y}_c - \mathbf{D}_c \boldsymbol{\alpha} \right\|_2^2 \quad \text{s.t.} \quad \left\| \boldsymbol{\alpha} \right\|_0 \le T,$$
where $\left\| \cdot \right\|_0$ denotes the pseudo $\ell_0$-norm and $T$ is the sparsity level. Orthogonal Matching Pursuit (OMP) can be used to compute the coefficient vector $\hat{\boldsymbol{\alpha}}_c$, and then the reconstructed patch is calculated by
$$\hat{\mathbf{y}}_c = \mathbf{D}_c \hat{\boldsymbol{\alpha}}_c.$$
Subsequently, the reconstruction residual is calculated by
$$\varepsilon_c = \left\| \mathbf{y}_c - \hat{\mathbf{y}}_c \right\|_2,$$
where $\left\| \cdot \right\|_2$ denotes the $\ell_2$-norm, and $\varepsilon_c$ is taken as the saliency value at pixel $c$.
By applying the above local sparse representation to the whole infrared image, that is, moving the central patch by scanning the original image from left to right and top to bottom, we can finally obtain the fine-scale saliency map $S_f$ of the original infrared image.
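
To make Step 1 concrete, the following Python sketch computes a single-scale local sparse representation saliency map. It is a minimal illustration rather than the authors' exact implementation: the patch size m = 7, search window win = 21, sparsity level T = 3, the patch-wise stride, the border handling, and the final normalization are all illustrative choices, and the sparse coding is delegated to scikit-learn's orthogonal_mp.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def lsr_saliency(img, m=7, win=21, T=3):
    """Single-scale local sparse representation saliency: each central patch is
    sparsely coded (via OMP) over the patches of its search window, and the
    reconstruction residual is used as the saliency value of that patch."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    r, half = m // 2, win // 2
    sal = np.zeros((H, W))
    for y in range(half, H - half, m):               # patch-wise stride for speed
        for x in range(half, W - half, m):
            center = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            atoms = []
            for yy in range(y - half + r, y + half - r + 1, m):
                for xx in range(x - half + r, x + half - r + 1, m):
                    if yy == y and xx == x:
                        continue                      # exclude the central patch itself
                    atoms.append(img[yy - r:yy + r + 1, xx - r:xx + r + 1].ravel())
            D = np.stack(atoms, axis=1)               # data matrix of surrounding patches
            alpha = orthogonal_mp(D, center, n_nonzero_coefs=T)  # sparse coefficients
            sal[y - r:y + r + 1, x - r:x + r + 1] = np.linalg.norm(center - D @ alpha)
    return sal / (sal.max() + 1e-12)                  # normalized residual map
```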

Step 2. For the coarse scale, resize the original infrared image to obtain a cut-down image and, at the same time, reduce the size of the search window of the local sparse representation accordingly. Similar to Step 1, we can obtain the coarse-scale saliency map $S_c$ of the original infrared image. Note that $S_c$ is ultimately rescaled to the original size of the infrared image.

Step 3. Combine the fine-scale saliency map $S_f$ and the coarse-scale saliency map $S_c$ by weighted additive fusion:
$$S_{\mathrm{MLSR}} = w\,S_f + (1 - w)\,S_c,$$
where $w = 0.5$ gives the same importance to $S_f$ and $S_c$. The final multiscale local sparse representation based saliency map $S_{\mathrm{MLSR}}$ is thus obtained for the original infrared image. Since this result combines the advantages of multiscale analysis and local sparse representation, the salient region can be well highlighted. Nevertheless, when the infrared image has complex background clutter, this saliency map will include much background noise. Therefore, in the following section, we adopt the local contrast measure [9] as an auxiliary tool to solve this problem.
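
Putting Steps 1–3 together, a minimal sketch of the multiscale map could look as follows, assuming the build_scales and lsr_saliency helpers from the previous sketches. The coarse-scale patch and window sizes (m = 5, win = 15) are illustrative assumptions; w = 0.5 follows the equal-importance weighting stated in Step 3.

```python
import numpy as np
from skimage.transform import resize

def mlsr_saliency(img, w=0.5):
    """Multiscale local sparse representation saliency map (Steps 1-3)."""
    fine, coarse = build_scales(img)               # two scales from Section 2.1.1
    s_fine = lsr_saliency(fine)                    # Step 1: fine-scale saliency map
    s_coarse = lsr_saliency(coarse, m=5, win=15)   # Step 2: smaller patch and search window
    s_coarse = resize(s_coarse, fine.shape)        # rescale back to the original image size
    return w * s_fine + (1 - w) * s_coarse         # Step 3: equal-weight additive fusion
```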

2.2. Local Contrast Measure Based Saliency Computation

As an important factor in human visual perception, contrast has been extensively studied by researchers in psychology and computer vision. Numerous methods have been proposed to compute contrast-based saliency maps using various visual properties, such as color, intensity, texture, and structure. For infrared images, intensity is usually regarded as a very important and distinctive visual characteristic [9]. Motivated by this, we focus on calculating intensity contrast for saliency detection in infrared images, which is unaffected by temperature variations in infrared imaging applications.

Considering that infrared images usually exhibit much noise caused by the infrared sensor, electronic circuits, the background, and so forth, we calculate a saliency map using the local contrast measure as follows, so as to reduce the influence of noise and generate a more robust saliency map.

First, given an infrared image $I$, slide a window over it from top to bottom and left to right at a certain step to obtain a set of subblocks.

Second, let $B$ be a subblock. Calculate the average intensity value $m_B$ of $B$:
$$m_B = \frac{1}{N_B} \sum_{(i,j) \in B} I(i,j),$$
where $N_B$ represents the number of pixels in $B$ and $I(i,j)$ is the intensity of pixel $(i,j)$. Also, search for the maximum intensity value $L_B$ in $B$:
$$L_B = \max_{(i,j) \in B} I(i,j).$$

Third, apply an image area whose side length is three times the subblock's side length to the image, taking $B$ as the central subblock. The eight subblocks adjacent to $B$ can then be obtained in this area. Calculate their average intensity values $m_i$, $i = 1, 2, \ldots, 8$. The local contrast measure of $B$ can be defined by
$$C_B = \min_{1 \le i \le 8} \frac{L_B^2}{m_i}.$$
Obviously, if the central block $B$ is a target block, $m_i$ is usually lower than $L_B$. Thus $C_B > L_B$, and the target is enhanced. If $B$ is a background block, we may have $m_i \approx L_B$. Thus $C_B \approx L_B$, and the background is suppressed.

Finally, by applying the above local contrast measure to the whole infrared image, we obtain a local contrast measure based saliency map $S_{\mathrm{LCM}}$. In this map, the background noise can be effectively restrained.
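
A minimal sketch of the local contrast measure map is given below. It assumes the reconstructed definition $C_B = \min_i L_B^2 / m_i$ from above; the block size k = 8 and the non-overlapping sliding step are illustrative choices of ours, not necessarily the settings of [9].

```python
import numpy as np

def lcm_saliency(img, k=8, eps=1e-6):
    """Local contrast measure map: each block gets the value min_i L_B^2 / m_i."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    sal = np.zeros((H, W))
    for y in range(k, H - 2 * k + 1, k):           # keep the 3k x 3k area inside the image
        for x in range(k, W - 2 * k + 1, k):
            L = img[y:y + k, x:x + k].max()        # maximum intensity of the central block
            contrasts = []
            for dy in (-k, 0, k):
                for dx in (-k, 0, k):
                    if dy == 0 and dx == 0:
                        continue                   # skip the central block itself
                    m = img[y + dy:y + dy + k, x + dx:x + dx + k].mean()
                    contrasts.append(L * L / (m + eps))
            sal[y:y + k, x:x + k] = min(contrasts)  # C_B replaces the central block
    return sal / (sal.max() + 1e-12)
```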

2.3. Adaptive Saliency Map Combination

After obtaining the multiscale local sparse representation based saliency map $S_{\mathrm{MLSR}}$ and the local contrast measure based saliency map $S_{\mathrm{LCM}}$, we design an adaptive fusion scheme to combine them.

In human visual attention, the fusion of different characteristics is very important. Currently, there exist various fusion methods, such as additive fusion [20], multiplicative fusion [21], and maximum fusion [22]. Considering that multiplicative fusion is likely to lose much information, while maximum fusion is easily influenced by background noise, we select additive fusion for saliency map combination.

For additive fusion, computing the weight of each saliency map is an important task. There are two ways to calculate the weights: one is the fixed weighting method [20], and the other is the adaptive weighting method [19]. Comparing these two kinds of weighting methods, the former is simple and fast to implement but lacks flexibility and easily leads to poor results; on the contrary, the latter has strong flexibility and adaptability and usually achieves a better fusion effect. Therefore, we adopt a mutual consistency guided fusion method, whose effectiveness has been verified in our recent related work [19], to adaptively combine $S_{\mathrm{MLSR}}$ and $S_{\mathrm{LCM}}$.

Given the multiscale local sparse representation based saliency map $S_{\mathrm{MLSR}}$ and the local contrast measure based saliency map $S_{\mathrm{LCM}}$, their mutual consistencies $c_{12}$ and $c_{21}$ are first calculated as in [19], where $c_{12}$ denotes the consistency of $S_{\mathrm{MLSR}}$ relative to $S_{\mathrm{LCM}}$ and $c_{21}$ denotes the consistency of $S_{\mathrm{LCM}}$ relative to $S_{\mathrm{MLSR}}$. Both $c_{12}$ and $c_{21}$ take values in $[0, 1]$.

Then, the relative adaptive weights $w_1$ and $w_2$ for $S_{\mathrm{MLSR}}$ and $S_{\mathrm{LCM}}$ are calculated from the mutual consistencies $c_{12}$ and $c_{21}$. Finally, the fused saliency map is obtained by
$$S = w_1\,S_{\mathrm{MLSR}} + w_2\,S_{\mathrm{LCM}}.$$
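
The sketch below illustrates the adaptive combination. The exact mutual-consistency formula of [19] is not reproduced here; as a stand-in we use a simple normalized-overlap surrogate, sum(min(A, B)) / sum(A), which lies in [0, 1] for nonnegative maps. Only the weighted additive form of the final fusion follows the text directly; the consistency surrogate and the weight normalization are our assumptions.

```python
import numpy as np

def adaptive_fuse(s_mlsr, s_lcm, eps=1e-12):
    """Adaptive additive fusion of the MLSR and LCM saliency maps."""
    overlap = np.minimum(s_mlsr, s_lcm).sum()        # shared saliency mass
    c12 = overlap / (s_mlsr.sum() + eps)             # surrogate consistency of S_MLSR w.r.t. S_LCM
    c21 = overlap / (s_lcm.sum() + eps)              # surrogate consistency of S_LCM w.r.t. S_MLSR
    w1 = c12 / (c12 + c21 + eps)                     # relative adaptive weights
    w2 = c21 / (c12 + c21 + eps)
    return w1 * s_mlsr + w2 * s_lcm                  # weighted additive fusion
```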

Figures 2 and 3 show two saliency detection examples. For each example, the original infrared image, the multiscale local sparse representation based saliency map, the local contrast measure based saliency map, and the combined saliency maps obtained with different fusion strategies, including multiplicative fusion, maximum fusion, additive fusion with fixed weights, and additive fusion with our adaptive weights, are illustrated. From these two examples, we can see that the multiscale local sparse representation based saliency map can highlight the whole salient object well, but it includes too much noise (see Figures 2(b) and 3(b)). The local contrast measure based saliency map, in contrast, can effectively suppress the background noise, but the detected salient region is incomplete (see Figures 2(c) and 3(c)). In order to achieve better results, these two kinds of saliency maps are combined together. By comparing the various fusion strategies, it can be seen that, since multiplicative fusion loses much information, the salient region is incomplete in the fused saliency map (see Figures 2(d) and 3(d)); maximum fusion tends to be dominated by the response of one predominant channel and cannot effectively suppress background noise (see Figures 2(e) and 3(e)); the fixed-weight additive fusion method lacks flexibility, so it also leads to inferior results (see Figures 2(f) and 3(f)). On the contrary, our adaptive fusion method produces superior results. As shown in Figures 2(g) and 3(g), it can not only enhance the salient regions but also simultaneously suppress the background noise.

3. Experimental Results

In this section, we evaluate the performance of the proposed method. We compare it with ground truth data as well as eight state-of-the-art saliency detection algorithms. The first selected model, a structured matrix decomposition model, was recently proposed by Peng et al. [23] for salient object detection. It utilized a tree-structured sparsity-inducing regularization and a Laplacian regularization to perform saliency detection (we refer to this method as SMD). Second, Chen et al. [24] presented a contrast measuring method for infrared salient object detection. It first measured the dissimilarity between the current location and its neighborhoods and then used an adaptive threshold to obtain the salient object (we refer to this method as LC). Third, Hou and Zhang [25] proposed a dynamic visual attention model for saliency detection, which introduced the Incremental Coding Length to measure the perspective entropy gain of each feature (we refer to this method as ICL). Fourth, we selected the model proposed by Itti et al. [5]. In this model, color, intensity, and orientation features were used to calculate saliency in an image: given an image, it was first decomposed into color, intensity, and orientation channels to generate three conspicuity maps by using multiscale and center-surround operations; these three conspicuity maps were then linearly combined to produce the final saliency map for the input image (we refer to this method as ITTI). Fifth, in [26], Zhai and Shah presented a visual attention detection algorithm using spatiotemporal cues. In the spatial attention model, they developed a fast method for computing pixel-level saliency maps using the color histograms of images (we refer to this method as SC). Sixth, in [27], Hou and Zhang proposed a spectral residual approach for visual saliency detection. This model was independent of features, categories, or other forms of prior knowledge about the objects. By analyzing the log-spectrum of an input image, the spectral residual of the image was extracted in the spectral domain, and then the corresponding saliency map was constructed in the spatial domain (we refer to this method as SR). Seventh, in [28], Achanta et al. provided a frequency-tuned saliency model relying on differences of Gaussian bandpass filters. This method exploited color and luminance features and was computationally efficient (we refer to this method as FT). The last saliency detection model selected for comparison was based on local sparse representation [15]: it first divided a test image into a number of patches; then, each patch was sparsely coded with its surrounding patches, and, based on the learned dictionary, the sparse reconstruction error of each patch was computed as the saliency value for the corresponding patch (we refer to this method as LSR).

In this paper, we mainly compare the above methods using the publicly available dataset of [29]. Our test dataset contains a variety of infrared images with different salient objects, such as pedestrians, airplanes, and vessels. The salient regions in each image are manually labeled to generate the ground truths. Both qualitative and quantitative evaluations are performed.

3.1. Qualitative Evaluation

In this section, we present some qualitative comparison results of our algorithm and the other eight methods, as shown in Figures 4 and 5. Figure 4 displays the computed saliency maps of infrared images that contain various salient human objects, while Figure 5 shows the computed saliency maps of infrared images that include different salient nonhuman objects.

From Figures 4 and 5, we can see that our method can highlight all the salient regions completely no matter how many salient objects the images contain, and at the same time, it can suppress the background noise effectively. In contrast, the other eight approaches achieve inferior performance on infrared images. For example, SMD, LC, SC, and FT can highlight most salient regions, but the background noise cannot be suppressed. ICL, LSR, and ITTI tend to generate diffuse saliency maps. SR produces the worst performance, since it not only fails to detect the salient regions but also contains too much noise. Thus, the proposed approach outperforms the other methods.

3.2. Quantitative Evaluation

To further evaluate our algorithm, we also make a quantitative comparison. We select three metrics, namely, recall, precision, and F-measure [30, 31], which are defined as follows:
$$\mathrm{Precision} = \frac{|M \cap G|}{|M|}, \qquad \mathrm{Recall} = \frac{|M \cap G|}{|G|}, \qquad F_\beta = \frac{(1 + \beta^2)\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}},$$
where $M$ denotes the binary salient region mask, $G$ is the corresponding binary ground truth, and $|M|$ and $|G|$ are the numbers of pixels in $M$ and $G$, respectively. In general, precision and recall are two conflicting goals. To consider precision and recall simultaneously, the F-measure is also calculated. Similar to [32], $\beta^2$ is set to 0.3. The binary salient region mask $M$ is obtained by applying a simple threshold operation to a saliency map [21].
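
For completeness, the following is a minimal sketch of the three metrics, assuming binary NumPy masks for the thresholded saliency map M and the ground truth G, and $\beta^2 = 0.3$ as in [32].

```python
import numpy as np

def prf(mask, gt, beta2=0.3, eps=1e-12):
    """Precision, recall, and F-measure for binary mask vs. binary ground truth."""
    tp = np.logical_and(mask, gt).sum()          # number of overlapping salient pixels
    precision = tp / (mask.sum() + eps)          # overlap divided by |M|
    recall = tp / (gt.sum() + eps)               # overlap divided by |G|
    f = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return precision, recall, f
```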

The quantitative results are shown in Figure 6. As can be seen, our method achieves the best performance compared with the state-of-the-art saliency detection algorithms for infrared images. It should be emphasized again that such favorable results are obtained because our proposed method is specifically designed for infrared images based on an analysis of their characteristics.

In addition, all experiments are run on an Intel dual-core 2.3 GHz laptop with 4 GB of RAM. The programming platform is Matlab R2013b. We compute the saliency detection time for all test images with the above-mentioned algorithms. The results are reported in Table 1, where the time is computed as an average over all the test images.

Compared with the other eight methods, that is, SMD, LC, ICL, ITTI, SC, SR, FT, and LSR, which take only about 1-2 seconds to handle one image, the efficiency of our multiscale local sparse representation and local contrast measure based approach is much lower, because it involves online learning processes. As shown in Table 1, the average computational time of our method is about 3 s per image in the above experimental setup. An efficient C/C++ implementation, or even a parallel architecture, would reduce the overall execution time dramatically and make the proposed method feasible for real-world applications.

4. Conclusion

This paper proposes a novel saliency detection method for infrared images based on multiscale local sparse representation and local contrast measure. We first develop a multiscale local sparse representation based approach to compute saliency for infrared images, which can highlight the salient regions fully. We then adopt a local contrast measure based scheme to compute saliency for infrared images, which can suppress the background noise effectively. By incorporating these two categories of saliency maps into a unified one, we can generate reliable saliency maps. Compared with several state-of-the-art approaches, the proposed method achieves more accurate and robust saliency detection results for infrared images. In the future, we will apply it to infrared video analysis tasks such as video retrieval, abstraction, and event recognition.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant nos. 61374019 and 61603124), Fundamental Research Funds for the Central Universities (Grant no. 2015B19014), and Open Research Fund of Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense (Grant no. 3091601410401).