Abstract

Salient object detection has a wide range of applications in computer vision tasks. Although tremendous progress has been made in recent decades, weak light images still pose formidable challenges to current saliency models due to their low illumination and low signal-to-noise ratio. Traditional hand-crafted features inevitably encounter great difficulties in handling images with weak light backgrounds, while most high-level features are unfavorable for highlighting visually salient objects in weak light images. To address these problems, an optimal feature selection-guided saliency seed propagation model is proposed for salient object detection in weak light images. The main idea of this paper is to hierarchically refine the saliency map by recursively learning the optimal saliency seeds in weak light images. In particular, multiscale superpixel segmentation and entropy-based optimal feature selection are first introduced to suppress background interference. The initial saliency map is then obtained by calculating global contrast and spatial relationships. Moreover, local fitness and global fitness are used to optimize the predicted saliency map. Extensive experiments on six datasets show that our saliency model outperforms 20 state-of-the-art models in terms of popular evaluation criteria.

1. Introduction

Aiming to mimic the human visual system (HVS), which can effortlessly pick out the most attractive things from the scene in front of the eyes, the goal of salient object detection is to locate the most important objects in an image. At present, salient object detection can substantially facilitate a series of applications, such as image segmentation [1, 2], object recognition [3], image retrieval [4], image compression [5], and photo cropping [6].

By computing pixel or region uniqueness using either low-level or high-level cues, existing salient object detection models can be broadly divided into two types. (1) Bottom-up models are usually unsupervised and based on local or global contrast. These methods tend to suffer from false detections in the context of cluttered backgrounds and less effective visual features. (2) Top-down models mainly leverage supervised learning to guide object detection. However, the complexity of the algorithms and the diversity of objectives limit the generality of these methods.

Although a large number of bottom-up and top-down salient object detection models have been proposed, most of them are designed only for normal light scenes. These saliency models are confronted with significant challenges in weak light images due to the low signal-to-noise ratio and the lack of well-defined features to capture saliency information in low lighting scenarios. The most likely reasons can be attributed to two aspects: (1) current hand-crafted visual features can hardly evaluate the objectness in weak light images; (2) most high-level features have difficulty in capturing accurate object boundary information, which is easily blurred by the multiple convolution and pooling layers in common convolutional neural network models.

To address these challenges, this paper proposes an optimal feature selection-based saliency seed propagation model for salient object detection in weak light images (the code of this paper can be downloaded from https://drive.google.com/open?id=1w0qBapNVygh8TOOp7AijFWOxsYxdRcWa). Several hand-crafted visual features are selected to hierarchically refine the saliency map obtained from the high-level cues recursively. The flowchart of our model is presented in Figure 1. The optimal low-level features are first selected to give a robust expression for weak light images, which aims to capture more objectness information and contributes to the prediction of salient objects under weak light conditions. Next, two cost functions are introduced to iteratively optimize the foreground seeds and background seeds of the initial saliency map, which can continuously compensate for salient information and remove nonsalient information to generate more precise object details. To estimate the overall performance, the proposed model is compared with 20 state-of-the-art salient object detection models on six datasets.

This paper is an extended version of our previously accepted conference paper [7], and it provides a more detailed explanation and richer experimental demonstration. To sum up, this research has four main contributions: (1) a bottom-up visual saliency model, which requires no training, is explored for weak light images; (2) an effective feature selection strategy is put forward to provide a robust representation of saliency information; (3) two cost functions are built to refine the initial saliency seeds recursively; and (4) a nighttime image (NI) dataset (the nighttime image (NI) dataset can be downloaded from https://drive.google.com/open?id=0BwVQK2zsuAQwX2hXbnc3ZVMzejQ) is constructed to verify the performance of our model.

The rest of this paper is organized as follows. Section 2 reviews the related works of saliency detection. Section 3 introduces the proposed saliency model. Section 4 presents the experimental results of the state-of-the-art models and the proposed model on six datasets. The conclusion of this paper is given in Section 5.

2. Related Works

Numerous salient object detection models have been proposed recently (see [8] for a review); their main task is to highlight the most important visual regions for further processing. Depending on whether they are task-independent or task-dependent, these models can be categorized as bottom-up models or top-down models, respectively.

Bottom-up saliency models are stimuli-driven and rely on low-level features. One typical model was presented by Itti et al. [9], which is mainly based on the center-surround difference of multiple features. Following this pioneering work, various bottom-up saliency models were proposed. Goferman et al. [10] computed the saliency value of image patches by implementing local and global contrasts. Cheng et al. [11] executed the saliency computation by calculating histogram and region contrasts. Xu et al. [12] introduced contrast and spatial distribution strategies to evaluate image saliency. Kim et al. [13] estimated the local saliency and global saliency based on regression and a high-dimensional color transform. Hu et al. [14] performed salient object detection by utilizing the compactness hypothesis of color and texture features. Huang and Zhang [15] presented a minimum directional contrast based salient object detection method. Wang et al. [16] exploited pyramid attention and salient edges to guide salient object detection. Sun et al. [17] detected salient objects by employing a cascaded bottom-up feature aggregation module to capture the detailed information of low-level features. Jiang et al. [18] proposed a task-independent saliency model based on bidirectional absorbing Markov chains. Molin et al. [19] exploited a neuromorphic dynamic bottom-up saliency detection method, which is feed-forward and requires no training. Typically, these bottom-up saliency models face many difficulties in handling images with busy backgrounds and struggle to predict the true salient objects in low-contrast weak light environments.

Top-down saliency models are task-driven and rely on high-level perceptual learning. Xu et al. [20] used the support vector machine (SVM) model to produce the superpixel-level saliency map. Qu et al. [21] proposed a deep learning-based salient object detection model by combining the superpixel-based Laplacian propagation and the trained convolutional neural network (CNN) model. Mu et al. [22] designed a region covariance-based CNN method to learn the saliency value of image patches. Wang et al. [23] employed the top-down process for coarse-to-fine saliency estimation. Mu et al. [24] explored global convolutional and boundary refinement in a top-down manner to guide the learning of salient objects. Qiu et al. [25] introduced an automatic top-down fusion (ATDF) saliency model, which utilizes the global information to guide the learning of underlying knowledge. Zhang et al. [26] developed a top-down multilevel fusion method for RGB-D salient object detection. Wang et al. [27] progressively optimized the salient objects by exploiting the fixation map in a top-down mode. Xu et al. [28] utilized a progressive architecture with a knowledge review network (PA-KRN) for salient object detection, which compensates for the important information in a top-down way. Dong et al. [29] presented a bidirectional collaboration network (BCNet) for salient object detection, which integrates feature fusion and feature aggregation in an edge-guided top-down progressive pathway. These top-down saliency models generally have high computational complexity and are relatively ineffective in determining accurate boundary and localization of salient objects under weak light conditions.

Since saliency detection in a weak light environment is a challenging problem, there have been few studies on salient object detection in weak light images [30, 31]. Mu et al. [30] proposed an ant colony optimization (ACO) based saliency model for predicting the salient objects in weak light images. Xu et al. [31] explored an image enhancement method for salient object detection in weak light images. These saliency models, however, are not robust enough to capture the salient objects in real time. Different from these previous methods, the proposed model creates a totally unsupervised algorithm by integrating bottom-up measures and single-objective optimization cues. Specifically, (1) the proposed saliency model explores low-level features to represent the object properties and selects the most effective ones based on entropy information; (2) the superpixel-level saliency is directly estimated by feature dissimilarity and spatial similarity; (3) the prior saliency map, which contains the foreground seeds and the background seeds, is generated by implementing the bottom-up measures; (4) the single-objective optimization cues are formulated by designing fitness-based cost functions to iteratively optimize the salient and nonsalient seeds; and (5) experimental results indicate that the proposed model can generate a high-performance saliency map in real time.

3. The Proposed Saliency Model

The proposed optimal feature selection-guided saliency seed propagation model is presented in detail in this section. The input image is first segmented into superpixels at three scales. Then, 12 features are extracted from the preprocessed image, and only the nine optimal ones are chosen for the subsequent calculation. Next, the initial saliency map is computed by combining global contrast and the center prior. A new saliency map is then obtained from the foreground and background seeds of the previous one. Two cost functions, based on global fitness and local fitness, are defined to decide when the iteration ends. Finally, the optimal saliency map is obtained, and the results of the three scales are integrated to produce the final saliency map.

3.1. Multiscale Superpixel Segmentation

To make full use of the midlevel information and preserve the object structure context of the input image, the simple linear iterative clustering (SLIC) algorithm [32] is used to divide the input image into superpixels (denoted as $sp_i$, $i = 1, 2, \dots, N$). This operation boosts the efficiency of the method by regarding each superpixel as a processing unit. For saliency detection, the background region is more likely to have similar superpixels at different scales, while the salient regions may have similar superpixels only at some scales. That is, fusing the salient superpixels acquired at different scales can more accurately represent the real salient regions. However, as the number of superpixels increases, the time required for superpixel segmentation also increases. To balance accuracy and efficiency, our model generates superpixels at three different scales, where the superpixel number is set to 100, 200, and 300, respectively. The final saliency map is the integration of the obtained multiscale saliency maps.
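
A minimal sketch of this multiscale superpixel step, assuming scikit-image's SLIC implementation as a stand-in for the SLIC algorithm of [32]; the scale settings (100, 200, and 300 superpixels) follow the paper, while the compactness value and the input file name are illustrative assumptions.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic

def multiscale_superpixels(image, scales=(100, 200, 300)):
    """Segment an RGB image at several superpixel scales with SLIC.

    Returns one label map per scale; pixels sharing a label belong to the
    same superpixel sp_i at that scale.
    """
    return [slic(image, n_segments=n, compactness=10) for n in scales]

if __name__ == "__main__":
    img = io.imread("weak_light_example.jpg")   # hypothetical input image
    for labels, n in zip(multiscale_superpixels(img), (100, 200, 300)):
        print(f"requested {n} superpixels, obtained {len(np.unique(labels))}")
```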

3.2. Effective Feature Extraction

Given an input image, 12 low-level visual features are extracted: nine color features in three color spaces, a texture feature based on local entropy information, an orientation feature fused from the information in four directions, and a gradient feature obtained from the horizontal and vertical gradient vectors. Since the effectiveness of these features varies with the contrast of the input image, nine optimal ones are selected from the 12 features, and the adaptive selection strategy is based on the global information entropy of these feature maps. The feature extraction process is described in detail as follows.

3.2.1. Color Features

The input image is first normalized to eliminate the interference of shadow and light (see preprocessing in Figure 1). This preprocessing is a general procedure in our model and is applied to both normal light images and weak light images. Then, the input image is transformed from the RGB color space to the LAB, HSV, and YCbCr color spaces to capture nine color features. The L, A, and B components of the LAB color space can describe all colors visible to the human eye and are closer to human visual perception in weak light images. The H, S, and V components of the HSV color space intuitively represent the hue, color depth, and brightness, and they are robust in low-lightness and weak light images. The Y, Cb, and Cr components of the YCbCr color space can better perceive intensity changes and chromatic differences, which is more conducive to highlighting the salient object information in weak light images.
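
A minimal sketch of the nine-channel color feature extraction, assuming OpenCV color conversions; the per-channel min-max normalization and the input file name are illustrative choices, not the authors' exact preprocessing.

```python
import cv2
import numpy as np

def color_features(bgr):
    """Return an H x W x 9 stack of L, A, B, H, S, V, Y, Cb, Cr channels."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    ycc = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)  # OpenCV order: Y, Cr, Cb
    ycbcr = ycc[:, :, [0, 2, 1]]                  # reorder to Y, Cb, Cr
    feats = np.dstack([lab, hsv, ycbcr]).astype(np.float32)
    # normalize each channel to [0, 1] to reduce the influence of illumination
    mins = feats.reshape(-1, 9).min(0)
    maxs = feats.reshape(-1, 9).max(0)
    return (feats - mins) / (maxs - mins + 1e-8)

bgr = cv2.imread("weak_light_example.jpg")        # hypothetical input image
print(color_features(bgr).shape)                  # (H, W, 9)
```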

3.2.2. Texture Feature

The 2-dimensional entropy of the original image is mainly used to represent the texture feature. Let $i$ denote the gray value of an image pixel, and let $j$ denote the average gray value of its neighborhood pixels; the spatial synthesis characteristic of the gray distribution can be expressed as

$$p_{ij} = \frac{f(i, j)}{N \times N},$$

where $f(i, j)$ is the frequency of the characteristic tuple $(i, j)$ and $N \times N$ is the size of the neighborhood region. The discrete 2-dimensional entropy of the input image is then defined as

$$H_{2D} = -\sum_{i=0}^{255} \sum_{j=0}^{255} p_{ij} \log_2 p_{ij}.$$

Since the entropy information has strong resistance against noise interference and geometric deformation, the texture feature changes of salient objects in the weak light image can be well estimated by the variations in entropy.
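
A minimal sketch of the discrete 2-dimensional entropy defined above, assuming a 3x3 neighborhood mean and 256 gray levels; the window size and the input file name are illustrative assumptions, and in practice the same measure could be evaluated over local windows to obtain a per-pixel texture map.

```python
import cv2
import numpy as np

def entropy_2d(gray, win=3):
    """Discrete 2D entropy of an 8-bit grayscale image."""
    i = gray.astype(np.uint8)
    j = cv2.blur(gray, (win, win)).astype(np.uint8)   # neighborhood mean gray value
    hist, _, _ = np.histogram2d(i.ravel(), j.ravel(),
                                bins=256, range=[[0, 256], [0, 256]])
    p = hist / hist.sum()                             # tuple frequency -> probability
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

gray = cv2.imread("weak_light_example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
print("2D entropy:", entropy_2d(gray))
```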

3.2.3. Orientation Feature

The orientation feature is computed by applying Gabor filters with four different orientations (denoted as $G_{\theta}$, $\theta \in \{0°, 45°, 90°, 135°\}$) to the grayscale image (denoted as $I_g$) and fusing the four filter responses.

The rotational invariance and the global property of the orientation feature make it less affected by weak light scenes.
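
A minimal sketch of the four-direction Gabor orientation feature; the kernel size, the Gabor parameters, and the fusion by averaging the absolute responses are illustrative assumptions rather than the paper's exact settings.

```python
import cv2
import numpy as np

def orientation_feature(gray):
    thetas = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # 0, 45, 90, 135 degrees
    responses = []
    for theta in thetas:
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5, psi=0, ktype=cv2.CV_32F)
        resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern)
        responses.append(np.abs(resp))
    return np.mean(responses, axis=0)                   # fused orientation map

gray = cv2.imread("weak_light_example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
ori = orientation_feature(gray)
```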

3.2.4. Gradient Feature

The gradient feature is calculated by averaging the absolute vertical gradient $G_y$ and the absolute horizontal gradient $G_x$ of the grayscale image, that is, $F_{grad} = \left(\left|G_x\right| + \left|G_y\right|\right)/2$.

Thus, the magnitude information of local grayscale changes can be represented by the gradient feature, which can overcome the interference of a low signal-to-noise ratio in the weak light image.
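
A minimal sketch of the gradient feature, assuming Sobel derivatives as the horizontal and vertical gradient operators (the paper does not specify the operator, so this is an illustrative choice).

```python
import cv2
import numpy as np

def gradient_feature(gray):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # vertical gradient
    return (np.abs(gx) + np.abs(gy)) / 2.0            # averaged gradient magnitude

gray = cv2.imread("weak_light_example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
grad = gradient_feature(gray)
```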

3.2.5. Optimal Feature Selection

Feature selection plays an important role in predicting the real salient objects in weak light images. Gopalakrishnan et al. [33] proposed an unsupervised feature selection method, which removes irrelevant features by maximizing the mixing rate of the Markov processes of different features. However, the naive inclusion of irrelevant features for a particular image can easily lead to performance degradation. Liang et al. [34] explored feature selection methods in supervised saliency learning, but the features utilized in their model are highly redundant. Naqvi et al. [35] selected useful features by measuring feature quality. However, they use a large number of features in an attempt to explain all possible saliency-related factors, which increases the time cost and ignores some truly effective features. Since the goal of our model is to identify a small set of optimal features with which salient object detection in weak light images can be both efficient and effective, traditional adaptive feature selection techniques are not suitable for us. The proposed model extracts 12 features to participate in the salient object calculation. Because the effectiveness of each feature differs as the image contrast changes, which can be seen in Figure 2, nine optimal features (denoted as $F_k$, $k = 1, 2, \dots, 9$) that better describe the attributes of the corresponding weak light image are selected from the extracted 12 visual features by calculating the 1-dimensional entropy of each feature map as

$$H(F) = -\sum_{g=0}^{255} p_g \log_2 p_g,$$

where $p_g$ denotes the proportion of image pixels whose grayscale value is $g$.

As a statistical measure, the mean information content contained in the aggregation properties of the image grayscale distribution can be well represented by the image entropy. The greater the entropy of a feature map is, the more effective this feature will be. Thus, the selected nine optimal features can better account for the visual saliency of the corresponding weak light image.
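
A minimal sketch of the entropy-based selection: compute the 1-dimensional grayscale entropy of each candidate feature map and keep the nine maps with the largest entropy. The [0, 1] normalization, the 256-bin histogram, and the feature names in the usage comment are illustrative assumptions.

```python
import numpy as np

def entropy_1d(feature_map, bins=256):
    """Shannon entropy of a feature map normalized to [0, 1]."""
    f = feature_map.astype(np.float64)
    f = (f - f.min()) / (f.max() - f.min() + 1e-8)
    hist, _ = np.histogram(f, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_optimal_features(feature_maps, k=9):
    """feature_maps: dict of name -> H x W array; returns the k highest-entropy names."""
    scores = {name: entropy_1d(fm) for name, fm in feature_maps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Usage (assuming the 12 feature maps computed in the previous sketches):
# best = select_optimal_features({"L": L, "A": A, "gradient": grad, ...}, k=9)
```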

3.3. Initial Saliency Map Generation

The global contrast measure and the spatial relationship (center prior) of the selected feature maps are combined to estimate the saliency value of each superpixel: the contrast term accumulates the Euclidean feature distance between superpixel $sp_i$ and every other superpixel $sp_j$, and the spatial term weights this contrast according to the spatial distance between the coordinate of $sp_i$ and the image center $c$, controlled by two variables that are decided by the vertical and horizontal information of the input image.
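
A minimal sketch of this initial superpixel saliency, combining global feature contrast with a Gaussian center prior; the exact weighting used in the paper is not reproduced here, and the bandwidths tied to the image width and height are an assumption.

```python
import numpy as np

def initial_saliency(sp_features, sp_centers, image_size):
    """sp_features: N x D mean feature vectors of the superpixels,
    sp_centers: N x 2 (x, y) superpixel centroids,
    image_size: (width, height) of the input image."""
    w, h = image_size
    # global contrast: sum of Euclidean feature distances to all other superpixels
    diff = sp_features[:, None, :] - sp_features[None, :, :]
    contrast = np.sqrt((diff ** 2).sum(-1)).sum(1)
    # center prior: Gaussian falloff with the distance to the image center
    cx, cy = w / 2.0, h / 2.0
    sigma_x, sigma_y = w / 3.0, h / 3.0               # assumed bandwidths
    prior = np.exp(-(((sp_centers[:, 0] - cx) ** 2) / sigma_x ** 2
                     + ((sp_centers[:, 1] - cy) ** 2) / sigma_y ** 2))
    sal = contrast * prior
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```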

3.4. Saliency Map Optimization

To achieve clean and uniform salient objects, optimization strategies are considered to improve detection accuracy. Zhu et al. [36] presented a principled optimization framework to fuse multiple low-level saliency cues; however, the whole framework relies mainly on background cues and does not work well in weak light images, whose background information is cluttered. Lu et al. [37] focused on learning an optimal set of saliency seeds with a large-margin formulation of the discriminant saliency criterion. However, the gradient descent they used is not robust in weak light images and is not efficient for high-accuracy salient object detection. In the proposed model, we build two cost functions to refine the generated saliency seeds recursively, which is an effective and straightforward way to obtain more accurate salient objects in weak light images. The initial saliency map (denoted as $S^{(0)}$) is first segmented into a salient region and a nonsalient region by Otsu's thresholding [38]. The salient region and the nonsalient region can be regarded as the foreground seeds (denoted as $F$) and the background seeds (denoted as $B$) of the input image, respectively. The larger the difference between a superpixel and the foreground region is, the lower the saliency value of this superpixel is; conversely, the greater the difference between a superpixel and the background region is, the higher the saliency value of this superpixel will be. Thus, the saliency value of each superpixel is updated according to its feature dissimilarities to the foreground seeds $F$ and to the background seeds $B$.

Then, a new saliency map (denoted as $S^{(1)}$) is obtained after the first iteration of the optimization. Otsu's method is reused to generate new foreground seeds $F$ and background seeds $B$, and the saliency map of the next generation (denoted as $S^{(t+1)}$) is computed with the same update rule. Finally, two cost functions are implemented to decide whether the iteration procedure meets the stopping condition.

The first cost function represents the global fitness: the smaller the change between the saliency map of the new generation and that of the previous generation is, the better the optimization of the objective is. The second cost function represents the local fitness: the smaller the difference between a superpixel and its neighboring superpixels is, the better the saliency information of each decision variable is. By minimizing the two cost functions, the optimal superpixel-level saliency map can be obtained.
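
A minimal sketch of the iterative seed propagation, assuming that Otsu's threshold splits the current saliency into foreground and background seeds, that each superpixel is rescored by its feature distance to the background seeds relative to both seed sets, and that the loop stops when the change between generations (a global-fitness surrogate) falls below a tolerance. These concrete choices are illustrative, not the paper's exact cost functions.

```python
import numpy as np
from skimage.filters import threshold_otsu

def refine_saliency(sal, sp_features, max_iter=10, tol=1e-3):
    """sal: initial N-vector of superpixel saliency in [0, 1],
    sp_features: N x D mean feature vectors of the superpixels."""
    for _ in range(max_iter):
        t = threshold_otsu(sal)
        fg, bg = sal >= t, sal < t                      # foreground / background seeds
        if fg.sum() == 0 or bg.sum() == 0:
            break
        d_fg = np.linalg.norm(sp_features[:, None] - sp_features[fg][None], axis=2).mean(1)
        d_bg = np.linalg.norm(sp_features[:, None] - sp_features[bg][None], axis=2).mean(1)
        new_sal = d_bg / (d_fg + d_bg + 1e-8)           # far from background -> salient
        global_fitness = np.mean((new_sal - sal) ** 2)  # change between generations
        sal = new_sal
        if global_fitness < tol:                        # stopping condition
            break
    return sal
```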

4. Experiment Results

Comprehensive experiments are carried out on six datasets to estimate the performance of our model against 20 state-of-the-art salient object detection models.

4.1. Experimental Setup
4.1.1. Testing Datasets

The six test datasets contain five public datasets and the proposed weak light image dataset: (1) the MSRA dataset [39] includes 10,000 images with relatively high contrast and simple backgrounds; (2) the SOD dataset [40] includes various images with multiple objects and complex backgrounds; (3) the CSSD dataset [41] includes complex natural scenes; (4) the DUT-OMRON dataset [42] includes complex and challenging images; (5) the PASCAL-S dataset [43] includes images with cluttered backgrounds; and (6) our NI dataset includes 200 weak light images, which are captured at night with a camera mounted on a stand. All of these images have the same resolution, and the human-annotated ground truths (GTs) are also given.

4.1.2. Comparison Models

The first 15 state-of-the-art saliency models include: Itti’s (IT) model [9], spectral residual (SR) model [44], frequency-tuned (FT) model [45], nonparametric (NP) model [46], context-aware (CA) model [10], image signature (IS) model [47], low rank matrix recovery (LR) model [48], patch distinct (PD) model [49], graph-based manifold ranking (MR) model [42], saliency optimization (SO) model [36], bootstrap learning (BL) model [50], generic promotion (GP) model [51], spatiochromatic context (SC) model [52], structured matrix decomposition (SMD) model [53], and multiple-instance learning (MIL) model [54]. All these experiments are performed by MATLAB software on an Intel i5-5250 CPU (1.6GHz) PC with 8 GB RAM.

4.1.3. Evaluation Criteria

To estimate the overall performance of various saliency models, seven criteria are used, including the true positive rates and false positive rates (TPRs-FPRs) curve, the precision-recall (PR) curve, the area under the curve (AUC) score, the mean absolute error (MAE) score, the weighted F-measure (WF) score, the overlapping ratio (OR) score, and the average execution time per image (in seconds).

The TPR is defined as the ratio of salient pixels that are correctly detected to all the true salient pixels, and the FPR corresponds to the ratio of falsely detected salient pixels to all the true nonsalient pixels. The precision is computed as the ratio of correctly detected salient pixels to all the detected salient pixels, and the recall is the same as the TPR, which measures the comprehensiveness of the detected salient pixels. By varying the threshold over the obtained saliency map and comparing the resulting binary images with the GT, different TPRs, FPRs, precisions, and recalls can be calculated via

$$TPR = Recall = \frac{|TP|}{|TP| + |FN|}, \quad FPR = \frac{|FP|}{|FP| + |TN|}, \quad Precision = \frac{|TP|}{|TP| + |FP|},$$

where the true positives (TP) are the pixels that are correctly identified as belonging to the salient object; the false positives (FP) are the pixels that are falsely identified as salient; the true negatives (TN) are the pixels that are correctly identified as nonsalient; and the false negatives (FN) are the pixels that are falsely identified as nonsalient.

The TPRs-FPRs curve and the PR curve can be generated by plotting the corresponding ratios. The AUC score is calculated by measuring the proportion of the area under the TPRs-FPRs curve, which can give an intuitive indication of how well the obtained saliency map represents the real salient objects. The MAE score is calculated as the average absolute difference between the generated saliency map (denoted as $S$) and the ground truth (denoted as $G$) via

$$MAE = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| S(x, y) - G(x, y) \right|,$$

where $W$ and $H$ denote the width and height of the image.

The smaller the MAE value is, the higher the similarity between $S$ and $G$ is. The F-measure score is computed as the weighted harmonic mean of the precision and the recall via

$$F_{\beta} = \frac{(1 + \beta^{2}) \times Precision \times Recall}{\beta^{2} \times Precision + Recall},$$

where $\beta^{2}$ is the parameter that weighs the precision and the recall. The WF score is calculated by adding a weighting function to the detection errors [55].

The OR score is measured by computing the overlapping ratio of salient pixels between the binarized saliency map (denoted as $S_{b}$) and the ground truth $G$ via

$$OR = \frac{\left| S_{b} \cap G \right|}{\left| S_{b} \cup G \right|}.$$
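
A minimal sketch of the MAE, F-measure, and OR computations defined above; the mean-adaptive binarization threshold and the value of $\beta^{2}$ are illustrative assumptions.

```python
import numpy as np

def mae(sal, gt):
    """sal, gt: H x W maps in [0, 1]; gt is the binary ground truth."""
    return float(np.abs(sal - gt).mean())

def f_measure(sal, gt, beta2=0.3):
    binary = sal >= 2 * sal.mean()                  # adaptive threshold (assumption)
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

def overlap_ratio(sal, gt):
    binary = sal >= 2 * sal.mean()
    inter = np.logical_and(binary, gt > 0.5).sum()
    union = np.logical_or(binary, gt > 0.5).sum()
    return inter / (union + 1e-8)
```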

4.2. Experimental Results

The quantitative performances of our salient object detection method against the other 15 saliency models on the six datasets are presented in Figures 3 and 4 and Tables 1–6. The best three experimental results in Tables 1–6 are highlighted in red, blue, and green fonts, respectively. In particular, the up-arrow ↑ denotes that the larger the value is, the better the performance of the saliency model is, while the down-arrow ↓ indicates the opposite. As shown in the quantitative results, our salient object detection model achieves the best or second-best performance on the five public datasets in most cases and obtains the best performance on the NI dataset with relatively low time consumption.

On the MSRA, DUT-OMRON, and PASCAL-S datasets ((a), (d), and (e) of Figures 3 and 4 and Tables 1–3), our model achieves the best performance on the TPRs-FPRs curve, PR curve, and AUC score, while the SO model obtains the best MAE and WF scores and the MIL model obtains the best OR score. The main reason is that the SO model uses boundary connectivity and global optimization to increase its robustness, and the MIL model introduces a multiple-instance learning approach to increase precision. These two saliency models take full advantage of background measures, which can be effective to some extent for detecting salient objects under complex background conditions. Although the MAE, WF, and OR scores of the proposed saliency model are slightly inferior to those of the SO and MIL models, our detection results are more competitive than those of the other models. Moreover, the average time consumption of the MIL model is more than 100 seconds per image, which is not efficient for generating the saliency map.

On the SOD dataset (Figures 3(b) and 4(b) and Table 4), our saliency model achieves the best performance on the TPRs-FPRs curve, PR curve, AUC score, WF score, and OR score. In terms of the MAE criterion, the proposed model achieves the second-best performance, with only a small gap (0.0025) from the best MAE score obtained by the SO model.

On the CSSD dataset (Figures 3(c) and 4(c) and Table 5), our saliency model achieves the best performance on the TPRs-FPRs curve, PR curve, and AUC score. The MAE, WF, and OR scores of the proposed model are slightly worse than the best results achieved by the SMD model. The SMD model is based on structured matrix decomposition with two regularizations, which has a strong potential in handling images of complex environments. The main reason for the weaker performance of the proposed model on these metrics is that the selected optimal features contain less useful information for effectively distinguishing the salient objects.

On the NI dataset (Figures 3(f) and 4(f) and Table 6), our saliency model is superior, as it achieves the best performance on all these criteria with relatively low time consumption.

The qualitative comparisons of saliency maps generated by the various salient object detection models on the six datasets are shown in Figure 5, indicating that our saliency model can detect the real salient object accurately in complex and/or weak light images (more detected saliency maps can be downloaded from https://drive.google.com/open?id=0BwVQK2zsuAQwQjZHeUJ1dlBsQms).

Since the standard real-world images and the weak light images have different properties, the proposed framework employs a feature selection strategy over the candidate feature set to pick out the most relevant features that apply to different types of images, which ensures that our model can be adapted to both standard saliency datasets and the weak light image dataset. In addition, we further optimize the saliency results through iteration to ensure robustness.

To further verify the effectiveness of our model, we have conducted additional experiments with five other state-of-the-art deep learning-based saliency models (NLDF [56], LPS [57], BAS [58], F3Net [59], and LDF [60]) to better illustrate the advantages of the proposed flowchart. The subjective performance comparisons of the proposed model with these deep saliency models are shown in Figure 6.

As can be seen in Figure 6, the saliency maps of the NLDF and F3Net models cannot capture the effective salient objects in weak light images. The saliency results of the LPS model are seriously interfered with by the background noise. The saliency maps generated by the BAS model can highlight the salient objects with less noise, but the detected salient objects are incomplete. The LDF model has difficulty in detecting the whole objects and is prone to failure. Relatively speaking, the proposed model can accurately detect the real salient objects from the background on weak light images.

5. Conclusion

In this paper, we propose an optimal feature selection-based saliency seed propagation model to detect salient objects in weak light images. The main idea of the proposed saliency model is to perform the saliency calculation by learning the optimal hand-crafted visual features and refining the foreground seeds and background seeds recursively. Guided by the optimized saliency seeds, the final saliency map is achieved by fusing the superpixel-level saliency maps obtained at three different scales. Comprehensive experiments demonstrate that our saliency model achieves satisfactory results compared with 20 state-of-the-art saliency models on five public datasets and a weak light image dataset.

Serving as a preprocessing step, salient object detection can efficiently focus on the most interesting area associated with the current visual task, and it facilitates various computer vision applications such as image classification, object segmentation, and visual tracking. The proposed salient object detection model can be used to optimize related vision applications under weak light conditions, and it is of great application value to monitoring systems. In the future, we will further improve the time performance of the proposed saliency model and explore more potential applications.

Data Availability

The proposed nighttime image (NI) dataset, the code of the proposed model, and the experimental result data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (62006165 and 61701331).