Abstract

We propose a novel saliency detection method, based on the histogram contrast algorithm, for images captured by a WMSN (wireless multimedia sensor network) for practical wild animal monitoring. Current studies on wild animal monitoring mainly deal with images characterized by high resolution, complex background, and nonuniform illumination, and most current visual saliency detection methods cannot process such images effectively. In the proposed algorithm, we first smooth the image texture and reduce the noise using a structure extraction method based on image total variation. The edge information of the salient target is then obtained with the Canny edge detection operator, and the result is further refined by a position saliency map built from the Hanning window. To verify the proposed algorithm, field-captured wild animal images were tested in terms of visual effect and detection efficiency. Compared with the histogram contrast algorithm, the average precision, recall, and F-measure improved by 18.38%, 19.53%, and 19.06%, respectively, on the captured animal images.

1. Introduction

Preservation of wild animals is crucial for the balance and stability of the whole ecosystem. However, excessive hunting and killing of wild animals around the world is a serious problem [1]. According to preliminary statistics, over 300 species of terrestrial vertebrates are endangered. Saliency region detection is capable of effectively extracting the wild animal region information. In addition, it provides an option for scanning and matching important target regions in wild animal detection and recognition [2, 3]. Hence, saliency region detection for wild animal images is becoming increasingly significant in the animal protection realm and has become a focus of recent research.

Traditional wild animal detection and recognition methods [4] mainly use collected images of wild animals as test and training sets. These experimental samples need to be screened and preprocessed several times so that they generally contain complete, clear, and low-noise image features. However, traditional detection algorithms cannot effectively process the images captured during wild animal monitoring missions because of the complex background and nonuniform illumination of the original images. Therefore, an appropriate and effective detection method is a crucial prerequisite for solving this problem.

At present, visual saliency detection techniques [5] can quickly and automatically extract the main image information and remove redundant background information, and they have attracted wide attention from both domestic and foreign researchers [6–8]. Most visual attention-related saliency detection methods are founded on biological theory [9–11]. However, these algorithms produce low-resolution saliency maps and have high computational complexity. Another popular class of methods is based on model analysis [12–15]. Although such methods detect efficiently and their results agree well with the characteristics of the human eye, they cannot effectively process the rich texture information in monitoring scenes. Salient object detection [16–18] can efficiently separate salient objects from the image background. Klein et al. developed a salient object detection method based on the standard structure of cognitive visual attention models, and Yong et al. presented a framework that models semantic contexts for key-frame extraction from wild animal images. Nevertheless, most existing algorithms can only process images with simple backgrounds and ordinary resolution, and they are not applicable to field-captured images in practical wildlife monitoring missions.

Therefore, this paper aims at the demands of actual wild animal monitoring and focuses on solving the problem of high resolution, complex background, and nonuniform illumination that exist in wild animal image saliency detection research.

In this paper, a sample library of wild animal monitoring images was established. The dataset contains images captured in Saihan Ula Nature Reserve in Inner Mongolia using the WMSN monitoring system. The WMSN monitoring system was configured and developed independently in our laboratory, with modules for wireless, remote, real-time, and precise monitoring. The image dataset covers 12 species and 1000 HD wild animal images. We established a standard ground-truth image library through manual labeling. Based on the field monitoring images, we propose an improved histogram contrast detection method. The correctness and validity of this method are shown by applying it to the wild animal images.

The contributions of the present study are as follows: (1) a wild animal monitoring system based on WMSN was developed to capture experimental materials in Saihan Ula Nature Reserve in Inner Mongolia; (2) a database of actual wild animal monitoring images with their unique characteristics was established; (3) a novel saliency detection method was introduced for wild animal images with high resolution, complex background, and nonuniform illumination.

2. Wild Animal Monitoring Based on WMSN

2.1. Wild Animal Monitoring System

Traditional wild animal monitoring methods include crewed field survey, GPS (global positioning system) collar [19], infrared camera [20], and satellite remote sensing monitoring [21]. However, these methods have defects such as limited monitoring range, data acquisition lag, and unmeasured local microinformation.

The wireless multimedia sensor network (WMSN) captures the wild animal image materials using industrial-grade cameras embedded in the terminal node equipment. The wild animal monitoring system, which achieves remote, real-time, all-weather, and friendly monitoring, consists of WMSN terminal nodes, coordination nodes, gateway nodes, and a data storage center (back-end server). The detailed configuration is shown in Figure 1. The monitoring node devices developed by our laboratory are based on ZigBee network protocols. Detailed parameters are shown in Table 1.

The monitoring node devices use ZigBee network protocols to establish a wireless image sensor network in a self-organizing way. When wild animals enter the monitoring field of view, the infrared sensor embedded in the terminal node triggers the camera to capture images. Captured images are first saved to the SD card and then transmitted to the coordination node by multihop transmission. After the coordination nodes successfully receive and aggregate the image data from all terminal nodes, the monitoring images are transmitted wirelessly to the remote data center through the gateway node over a 4G connection.

2.2. Fieldwork Material Collection

The WMSN monitoring system for wild animal monitoring was deployed in Saihan Ula National Nature Reserve in Inner Mongolia, which is part of the Greater Khingan mountains. The experimental area has a temperate, semihumid, rainy climate and an average altitude of 1000 m above sea level. Wild animals recorded in the experimental area include Cervus elaphus, Lynx, Capreolus pygargus, Sus scrofa, and Naemorhedus goral. Cervus elaphus and Lynx are national secondary protected animals (shown in Figure 2). In this paper, 1600 images of more than 12 wild animal species were acquired, and the total image data storage volume is 2.4 GB.

By analyzing the captured images, we found that most images have a complex background, nonuniform illumination, and varying ratios of target size to image size. These features affect the saliency detection, especially in regions with large grayscale gradients.

3. Method Analysis

In this section, an improved histogram contrast area detection method is proposed to process wild animal monitoring images with high resolution and complex background. Because of the particularity of the materials, both structure extraction and edge detection are introduced. As shown in Figure 3, we first apply the image structure extraction method based on image total variation to extract the structure of the input images, which smooths the image texture and reduces image noise. Then the saliency detection method based on histogram contrast is applied to capture the color saliency information of the image. By quantizing the input image to a small color range, the calculation procedure becomes simple and the computation efficiency is improved. Finally, the edge detection result and the position saliency map are synthesized to obtain the final optimized result.

3.1. Structure Extraction Based on Image Total Variation

During the first step, we have not assumed or manually determined the type of textures, as the patterns could vary a lot in different examples. We applied image window total variation and internal variation to the structure extraction method.

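First, the window total variations $D_x(p)$ and $D_y(p)$ of pixel $p$ in the $x$ and $y$ directions are obtained by

$$D_x(p) = \sum_{q \in R(p)} g_{p,q} \left| (\partial_x S)_q \right|, \qquad D_y(p) = \sum_{q \in R(p)} g_{p,q} \left| (\partial_y S)_q \right|,$$

where $S$ is the structure image being extracted and $R(p)$ denotes the 19 × 19 square region centered at pixel $p$. $g_{p,q}$ denotes the Gaussian filter kernel function

$$g_{p,q} \propto \exp\left( -\frac{(x_p - x_q)^2 + (y_p - y_q)^2}{2\sigma^2} \right),$$

in which $x$ and $y$ are the pixel coordinates and $\sigma$ is the scale parameter of the function, controlling the spatial scale of the window.

To help distinguish prominent structures from the texture elements, the window internal variation is calculated according to

$$L_x(p) = \left| \sum_{q \in R(p)} g_{p,q} (\partial_x S)_q \right|, \qquad L_y(p) = \left| \sum_{q \in R(p)} g_{p,q} (\partial_y S)_q \right|.$$

Finally, the optimized model with window total variation and internal variation is established. With the optimization result, the contrast between texture and structure of the visually salient areas can be further enhanced:

$$\arg\min_S \sum_p \left[ (S_p - I_p)^2 + \lambda \left( \frac{D_x(p)}{L_x(p) + \varepsilon} + \frac{D_y(p)}{L_y(p) + \varepsilon} \right) \right],$$

where $S$ denotes the extracted structure image while $I$ denotes the input image. The term $(S_p - I_p)^2$ keeps the result from deviating from the input. $\lambda$ is the smoothness coefficient of the image, and $\varepsilon$ is a small positive number to avoid division by zero.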
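The role of the ratio between window total variation and window internal variation can be illustrated with a minimal one-dimensional sketch. It is not the two-dimensional model of this section: the 1-D simplification, the function names, the window radius of 9 (half of a 19-sample window), and the value of `eps` are assumptions made only for illustration. In an oscillating texture the signed gradients cancel, so the internal variation is small and the ratio is large; at a single step edge the two variations are nearly equal and the ratio stays close to 1.

```python
import math

def gaussian_weights(radius, sigma):
    """Normalized 1-D Gaussian weights over offsets -radius..radius."""
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def windowed_variations(signal, radius=9, sigma=3.0):
    """Return (D, L): windowed total and internal variation per sample."""
    # Forward differences, padded with 0 so grad matches the signal length.
    grad = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)] + [0.0]
    g = gaussian_weights(radius, sigma)
    D, L = [], []
    for p in range(len(signal)):
        abs_sum = 0.0     # sum of g * |gradient|  (total variation)
        signed_sum = 0.0  # sum of g * gradient    (internal variation)
        for k in range(-radius, radius + 1):
            q = min(max(p + k, 0), len(signal) - 1)  # clamp at the borders
            abs_sum += g[k + radius] * abs(grad[q])
            signed_sum += g[k + radius] * grad[q]
        D.append(abs_sum)
        L.append(abs(signed_sum))
    return D, L

def variation_ratio(signal, eps=1e-3, radius=9, sigma=3.0):
    """Ratio D / (L + eps), the per-sample penalty in the 1-D sketch."""
    D, L = windowed_variations(signal, radius, sigma)
    return [d / (l + eps) for d, l in zip(D, L)]

texture = [float(i % 2) for i in range(60)]  # 0, 1, 0, 1, ... oscillation
edge = [0.0] * 30 + [1.0] * 30               # one step edge

ratio_texture = variation_ratio(texture)[30]
ratio_edge = variation_ratio(edge)[29]
print(ratio_texture > ratio_edge)  # texture is penalized far more than the edge
```

Minimizing such a ratio therefore tends to flatten texture while leaving genuine structure edges in place, which is the behavior the structure extraction step relies on.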

3.2. Histogram Contrast Saliency Detection

The structure extraction smooths the texture and reduces the image noise, so the extraction result can be used as the input of saliency detection. The input images are then quantized according to the number of quantization channels CN. The main colors are arranged into a color matrix by histogram statistics. After that, the image pixels are reordered by color value, such that the terms with the same color value are grouped together. The saliency value between different colors is calculated and expressed as shown in (4). The saliency values are the same when the color of the pixels is the same.

$$S(I_k) = S(c_l) = \sum_{j=1}^{n} f_j D(c_l, c_j), \quad (4)$$

where $D(c_l, c_j)$ denotes the color distance metric between the colors $c_l$ and $c_j$ in Lab space, $c_l$ is the color value of the pixel $I_k$ in the structure extraction image, and $S(I_k)$ represents the saliency value of $I_k$. $n$ denotes the total number of colors of the input image, and $f_j$ represents the ratio of the pixels whose color value is $c_j$ to the total number of pixels in the image.
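The color-level saliency computation described above can be sketched as follows. This is a minimal illustration, assuming the pixels are already quantized and given as coordinate tuples in a Lab-like space where Euclidean distance serves as the color distance; the function name `histogram_contrast` is hypothetical.

```python
import math
from collections import Counter

def histogram_contrast(pixels):
    """Map each distinct color to its frequency-weighted color contrast.

    pixels: list of color tuples (assumed already quantized, Lab-like).
    """
    counts = Counter(pixels)
    total = len(pixels)
    colors = list(counts)
    freq = {c: counts[c] / total for c in colors}
    # Saliency of a color = sum over all colors of frequency * distance.
    return {
        c: sum(freq[other] * math.dist(c, other) for other in colors)
        for c in colors
    }

# Nine background pixels and one pixel of a distant color.
pixels = [(0, 0, 0)] * 9 + [(10, 0, 0)]
saliency = histogram_contrast(pixels)
print(saliency)
```

Because the sum runs over distinct colors rather than pixels, the cost depends on the number of colors after quantization, not on the image resolution. In the toy input the rare color receives the larger saliency value because it is far from the dominant color.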

Color quantization greatly simplifies the calculation procedure, but similar colors may be quantized to different values during the process. In order to reduce noisy saliency results caused by such randomness, we replace the saliency value of each color by the weighted average of the saliency values of similar colors:

$$S'(c) = \frac{1}{(m - 1)T} \sum_{i=1}^{m} \left( T - D(c, c_i) \right) S(c_i),$$

where $T = \sum_{i=1}^{m} D(c, c_i)$ denotes the sum of the distances between the color $c$ and its $m$ nearest colors $c_i$. Typically, $m$ is a quarter of the number of colors in the image after quantization.
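The smoothing step can be sketched in a few lines. This is an illustrative version under stated assumptions: the nearest colors are taken to include the color itself (distance 0), Euclidean distance stands in for the Lab color distance, and the name `smooth_saliency` is hypothetical. Each neighbor is weighted by the total distance minus its own distance, so closer colors contribute more, and the weights sum to one.

```python
import math

def smooth_saliency(saliency, m):
    """Replace each color's saliency by a distance-weighted neighbor average.

    saliency: dict mapping color tuples to saliency values.
    m: number of nearest colors to average over (the color itself included).
    """
    smoothed = {}
    for c in saliency:
        nearest = sorted(saliency, key=lambda o: math.dist(c, o))[:m]
        dists = [math.dist(c, o) for o in nearest]
        k = len(nearest)   # may be smaller than m for tiny palettes
        total = sum(dists)
        if k < 2 or total == 0.0:
            smoothed[c] = saliency[c]  # nothing to average with
            continue
        weighted = sum((total - d) * saliency[o] for d, o in zip(dists, nearest))
        smoothed[c] = weighted / ((k - 1) * total)
    return smoothed

raw = {(0,): 1.0, (1,): 5.0, (10,): 9.0}
print(smooth_saliency(raw, m=3))
```

A quick sanity check of the normalization is that a palette with identical saliency values is left unchanged by the smoothing.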

Then the saliency area is obtained by comparing the appearance frequency [22] of the first n colors. If the appearance frequency is greater than CR (color retention rate), the low-frequency colors are discarded and replaced with their closest retained colors. The appearance frequency of the first n colors is increased in accordance with the number of quantization channels until it is greater than CF.

3.3. Edge Detection Based on Canny Operator

The edge detection process first convolves the Gaussian smoothing filter with the above saliency detection result to obtain the most optimized approximation operator,

$$G(x, y) = h(x, y) * \mathrm{Smap}(x, y),$$

where $G(x, y)$ denotes the convolution result, $h(x, y)$ refers to the Gaussian convolution function, and $(x, y)$ is the position of the pixel in the saliency result $\mathrm{Smap}$.

Then the partial derivatives $G_x(x, y)$ and $G_y(x, y)$ are obtained by calculating the first-order finite differences of the filter result.

Here, $G_x$ represents the gradient partial derivative of the image in the $x$ direction, and $G_y$ is the gradient partial derivative in the $y$ direction. Therefore, the pixel amplitude matrix $M$ and the gradient direction matrix $\theta$ are calculated as shown in the following equations:

$$M(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \theta(x, y) = \arctan\left( \frac{G_y(x, y)}{G_x(x, y)} \right).$$

Finally, non-maximum suppression is performed by seeking the amplitude maxima of the matrix along the gradient direction. The pixels with maximum amplitude are considered edge pixels. To close the image edges, two appropriate thresholds (a high threshold and a low threshold) are selected, and the nonedge points that do not satisfy the threshold condition are removed. Then the connected domain is expanded to get the final edge detection result.
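Two of the steps above, the gradient amplitude and direction and the double-threshold labeling, can be sketched as follows. This is a simplified illustration, not a full Canny detector: Gaussian smoothing, non-maximum suppression, and the connectivity expansion are omitted, the forward-difference stencil is an assumption, and the function names and threshold values are hypothetical.

```python
import math

def gradient_amplitude_direction(img):
    """First-order finite differences on a 2-D list of gray values.

    Returns (amplitude, direction) matrices of size (h-1) x (w-1).
    """
    h, w = len(img), len(img[0])
    amp = [[0.0] * (w - 1) for _ in range(h - 1)]
    ang = [[0.0] * (w - 1) for _ in range(h - 1)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]  # derivative in the x direction
            gy = img[y + 1][x] - img[y][x]  # derivative in the y direction
            amp[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return amp, ang

def double_threshold(amp, low, high):
    """Label pixels: 2 = strong edge, 1 = weak edge candidate, 0 = non-edge."""
    return [[2 if a >= high else 1 if a >= low else 0 for a in row]
            for row in amp]

# A vertical step edge between the second and third columns.
image = [[0, 0, 1, 1]] * 3
amplitude, direction = gradient_amplitude_direction(image)
labels = double_threshold(amplitude, low=0.3, high=0.7)
print(labels)
```

In a complete detector, weak candidates (label 1) would be kept only when connected to a strong edge pixel, which is what closes gaps along an edge.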

3.4. Synthesis and Optimization

In this section, we optimize the center position weight of the two-dimensional images with a Hanning window function established by center-edge and contrast degree theory. The one-dimensional Hanning window function is constructed as

$$w(n) = 0.5 \left( 1 - \cos\left( \frac{2\pi n}{N - 1} \right) \right),$$

where $N$ denotes the input data length and $0 \le n \le N - 1$. Consequently, the two-dimensional Hanning window construction function is the product of two one-dimensional functions, each defined over the length of the corresponding image dimension,

$$H(x, y) = w(x)\, w(y).$$
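A position saliency map of this kind can be sketched as the outer product of two one-dimensional Hanning windows. The sketch assumes the common symmetric form of the window (zero at both ends, one at the center); the function names are hypothetical, and a 5 × 5 map is used only to keep the output readable.

```python
import math

def hanning(n_points):
    """Symmetric 1-D Hanning window with n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_points - 1)))
            for n in range(n_points)]

def position_saliency_map(height, width):
    """2-D window: product of a vertical and a horizontal 1-D window."""
    wy, wx = hanning(height), hanning(width)
    return [[a * b for b in wx] for a in wy]

hmap = position_saliency_map(5, 5)
for row in hmap:
    print([round(v, 2) for v in row])
```

The resulting weights peak at the image center and fall to zero at the borders, which encodes the prior that the monitored animal is more likely to appear near the center of the frame than at its edge.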

On the basis of position saliency map, the saliency and edge detection based on structure extraction are introduced to obtain more accurate saliency detection results, where denotes the synthetic saliency map and coefficient . Smap refers to saliency detection result of histogram contrast. Emap is the edge detection result by image structure extraction. Hmap is the constructed position saliency map.

3.5. Experimental Parameters Selection

Each pixel in an RGB image can take one of $256^3$ color values, so the number of colors that need to be processed during the color saliency calculation reaches the order of $10^7$. In this paper, the color value of each channel (R, G, and B) is quantized to 12 levels (CN = 12), so that the number of colors that need to be calculated is reduced to $12^3 = 1728$. In order to ensure the smoothing effect, the scale parameter σ of the filter function is taken as 3 after several single-variable experiments. The number of iterations is 3, and the smoothness coefficient λ is taken as 0.015. In addition, the color retention rate CF is 0.95 to obtain high-frequency colors in the saliency area. The position saliency map, whose size is 300 × 400, is shown in Figure 4.
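The reduction in the number of colors can be checked with a short sketch. Uniform binning of each 8-bit channel into 12 levels is an assumption made for illustration, and the function names are hypothetical.

```python
def quantize_channel(value, levels=12):
    """Map an 8-bit channel value (0..255) to a bin index 0..levels-1."""
    return value * levels // 256

def quantize_rgb(r, g, b, levels=12):
    """Combine the three channel bins into one color index."""
    qr = quantize_channel(r, levels)
    qg = quantize_channel(g, levels)
    qb = quantize_channel(b, levels)
    return (qr * levels + qg) * levels + qb

full_palette = 256 ** 3      # distinct 24-bit RGB colors
reduced_palette = 12 ** 3    # distinct colors after quantization
print(full_palette, reduced_palette)
print(quantize_rgb(0, 0, 0), quantize_rgb(255, 255, 255))
```

Since the histogram contrast step compares every color with every other color, shrinking the palette from about 1.7 × 10^7 to 1728 entries is what makes the pairwise computation practical.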

4. Comparison and Discussion

In order to verify the effectiveness of the proposed algorithm, we selected actual wild animal monitoring images and images from a public image library as experimental samples and compared the results of our algorithm with those of other saliency detection algorithms.

Both precision and recall rate [13] are used as objective criteria to evaluate the accuracy of saliency detection. Precision is the ratio of the correctly detected salient region to the detected salient region, and recall is the ratio of the correctly detected salient region to the “ground truth” salient region. That is, the precision rate is the ratio of the correct area to the saliency area detected by the visual saliency model,

$$\mathrm{Precision}_i = \frac{|G_i \cap S_i|}{|S_i|}, \quad (14)$$

and the recall rate is the ratio of the correct area to the true saliency area,

$$\mathrm{Recall}_i = \frac{|G_i \cap S_i|}{|G_i|}, \quad (15)$$

where $G_i$ is the true saliency area and $i$ is the corresponding image index. $S_i$ is the calculated saliency area.
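The two criteria can be computed from a pair of binary masks as in the following minimal sketch. The function name, the nested-list mask representation, and the toy masks are assumptions for illustration.

```python
def precision_recall(detected, truth):
    """Precision and recall of a binary detection mask against ground truth.

    detected, truth: 2-D lists of 0/1 values with identical shape.
    """
    overlap = detected_area = true_area = 0
    for det_row, gt_row in zip(detected, truth):
        for d, g in zip(det_row, gt_row):
            overlap += 1 if (d and g) else 0   # correctly detected pixels
            detected_area += 1 if d else 0     # all detected pixels
            true_area += 1 if g else 0         # all ground-truth pixels
    precision = overlap / detected_area if detected_area else 0.0
    recall = overlap / true_area if true_area else 0.0
    return precision, recall

detected = [[1, 1, 1],
            [0, 0, 0]]
truth = [[0, 1, 1],
         [1, 1, 0]]
print(precision_recall(detected, truth))
```

Here two of the three detected pixels are correct, and two of the four ground-truth pixels are found. Marking every pixel as salient would raise recall to 1 while precision would drop to the fraction of the image covered by the ground truth.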

4.1. Experiment to Wild Animal Images

The field samples of wild animal monitoring are selected from private image database with different light intensities, capture distances, and backgrounds due to seasonal variations. Six classical saliency detection algorithms consisting of CA (context aware) [23], SEG (segment) [24], LLV (low level vision) [25], WT (wavelet transform) [26], SR (spectral residual) [27], HC (histogram contrast) [28], and five classical object detection algorithms consisting of GS (global saliency) [29], FD (frequency domain) [30], GP (gestalt principle) [31], MC (multiscale contrast) [32], and DF (dynamic feature) [13] are compared with our proposed algorithm (SHC) to verify the detection effect in this section.

As shown in Figures 5 and 6, the detection method proposed in this paper is more accurate than the above classical algorithms in detecting the object areas. The algorithm in this paper better preserves the edge information, and its object area is smoother for images containing rich color and complex background.

We set the segmentation threshold from 0 to 255, and the average precision and recall rate of all the images of private image database are shown in Figure 7.

The precision and recall rate of our algorithm are higher than those of the other eleven algorithms. We believe that the main structure extraction used in our algorithm successfully suppresses the influence of the texture information on the detection result. At relatively high recall rates, our algorithm also tends to achieve a higher precision rate. The edge detection and the position saliency map further improve the uniformity and smoothness of the result.

In addition, when the segmentation threshold is 0, all the pixels in the saliency map are considered to be foreground, so all algorithms tend to have the same precision and recall rate (the precision rate is about 0.1 and the recall rate is 1.0).

According to formulas (14) and (15), precision and recall rate are negatively correlated. Therefore, we use the F-measure (also known as F-score) to evaluate the effectiveness of saliency detection algorithms. The F-measure is the weighted harmonic mean of precision and recall rate, obtained by

$$F_\beta = \frac{(1 + \beta^2)\, \mathrm{Precision} \times \mathrm{Recall}}{\beta^2\, \mathrm{Precision} + \mathrm{Recall}}.$$

The F-measure is an overall performance measurement, in which $\beta^2$ is the weight parameter that balances precision and recall rate. A smaller $\beta^2$ places more weight on precision. It is set as $\beta^2 = 0.3$ herein, a value consistent with the average precision, recall rate, and F-measure reported for our algorithm in this section.
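As a quick numerical check, the weighted harmonic mean can be evaluated on the average precision and recall rate reported for the proposed algorithm in this section (0.4895 and 0.7321). The weight value 0.3 used below is treated as an assumption; it is the value commonly used in the saliency literature, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall.

    beta_sq is the weight (beta squared); 0.3 is an assumed value.
    """
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator

score = f_measure(0.4895, 0.7321)
print(round(score, 4))
```

With this weight the result agrees with the reported F-measure of 0.5300 to within rounding. As the weight approaches zero the score approaches the precision, so small weights emphasize precision over recall.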

We also introduce an adaptive threshold that depends on the image saliency, instead of using a constant threshold for each image. The adaptive threshold is used to segment the saliency detection results as follows:

$$T_a = \frac{2}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} S(x, y),$$

where $T_a$ is the obtained threshold value. $W$ and $H$ are the width and height of the saliency map, respectively. $S$ denotes the saliency map and $(x, y)$ refers to the corresponding coordinate of the saliency map.
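An image-dependent threshold of this kind can be sketched as a multiple of the mean saliency value. The factor of 2 is an assumption borrowed from common practice in the saliency literature, and the function names and the toy map are hypothetical.

```python
def adaptive_threshold(saliency_map, scale=2.0):
    """Threshold = scale * mean saliency (scale=2.0 is an assumed factor)."""
    height = len(saliency_map)
    width = len(saliency_map[0])
    total = sum(sum(row) for row in saliency_map)
    return scale * total / (width * height)

def segment(saliency_map, threshold):
    """Binary mask: 1 where saliency reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in saliency_map]

smap = [[0, 0, 0, 0],
        [0, 200, 220, 0],
        [0, 0, 0, 0]]
t = adaptive_threshold(smap)
print(t, segment(smap, t))
```

Because the threshold scales with the mean of each map, a dim saliency map and a bright one are both segmented relative to their own overall level rather than against one fixed cut-off.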

After obtaining the segmentation result based on the saliency map, the precision, recall rate, and F-measure of all segmentation results are calculated. Their respective averages are taken as the comparison results for the different methods, and the detailed results are shown in Figure 8.

As shown in Figure 6, the average performance of the proposed algorithm is better than that of the other six algorithms. The average precision rate, recall rate, and F-measure of the results detected by our algorithm are 0.4895, 0.7321, and 0.5300 (shown in Tables 2 and 3), which are increases of 18.38%, 19.53%, and 19.06%, respectively, compared with the HC algorithm. Although the average precision rate of the SEG algorithm is higher than that of our algorithm, its higher computational complexity makes it less efficient.

We have compared the average running time of each algorithm, and the comparison table is shown in Table 4. All experiments are performed using MATLAB (R2014a) on the workstation with Intel (R) Core (TM) i3-2330 and 4 GB RAM.

SR algorithms cost the least calculation time because they use domain transformation and simple filtering to get the saliency map. However, the detection accuracy and quality of the above two algorithms are not satisfactory. Compared with the HC algorithm, our algorithm gives a better saliency detection result at the cost of a slightly higher calculation time. The above results show that our algorithm is more suitable for saliency detection in wild animal images.

5. Conclusion

In this paper, we proposed a novel saliency detection algorithm based on histogram contrast for wild animal monitoring images, which can deal with high resolution, rich colors, and high noise. The proposed method consists of four steps, namely, structure extraction, saliency detection, edge detection, and synthesis optimization. First, structure extraction smooths the image texture and reduces image noise. Saliency detection using histogram contrast then extracts the wild animal area from the images. Next, the Canny operator is applied for edge detection to obtain complete edge information of the salient target. Finally, the Hanning window is applied to make the saliency areas prominent. To demonstrate the efficiency and validity of the proposed method, images from the field-captured wild animal monitoring database were processed. The final result shows that the proposed algorithm performs better than existing classical algorithms, especially on captured wild animal monitoring images. Compared with the HC algorithm, the average precision rate, recall rate, and F-measure of the results obtained by our algorithm on the wild animal images increased by 18.38%, 19.53%, and 19.06%, respectively.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was financially supported by National Natural Science Foundation of China (Grant no. 31670553), Fundamental Research Funds for the Central Universities (Grant no. 2016ZCQ08), and Import Project under China State Forestry Administration (Grant no. 2014-4-05).