Abstract

Underwater moving object detection is key to many underwater computer vision tasks, such as object recognition, localization, and tracking. Because aquatic animals sense their underwater habitats so well, their visual mechanisms are generally regarded as a cue for building bionic models that adapt better to underwater environments. However, low accuracy and the absence of prior knowledge learning limit the use of such models in underwater applications. To address the problems caused by inhomogeneous illumination and unstable backgrounds, the visual sensing and information processing pattern of the frog eye is imitated to produce a hierarchical background model for detecting underwater objects. First, the image is segmented into several subblocks, and the intensity information is extracted to establish a background model that roughly separates the object and background regions. The texture feature of each pixel in the rough object region is then analyzed to generate a precise object contour. Experimental results demonstrate that the proposed method outperforms the traditional Gaussian background model: the completeness of the object detection is 97.92%, with only 0.94% of the background region included in the detection results.

1. Introduction

Underwater object detection aims to extract objects of interest from the background scene. Effective underwater moving object detection contributes to many scientific and engineering applications, such as marine biology, seabed topography, marine environment monitoring, and marine exploration [1]. However, due to the strong optical attenuation and light scattering caused by the water medium and suspended particles, underwater images are characterized by poor visibility, especially low contrast and distorted information [2, 3]. Such low-quality image data seriously hinder underwater computer vision tasks. In underwater object detection, color decay and the haze effect significantly decrease the contrast between the object and the background. Many commonly used image features are distorted and can hardly be used for precise object detection.

After a long period of evolution, biological visual systems have developed a strong ability to sense the world. Various visual mechanisms in animals have been simulated and introduced into computer vision tasks [4–6]. For underwater object detection, the visual systems of aquatic animals offer valuable inspiration, and some progress has been achieved with bionic models. Barat and Rendas [7] introduced the motion information in successive video frames to extract salient regions. The edge and contour of the object are then detected by an active contour algorithm. Walther et al. [8] combined a visual attention model with background differencing to obtain global saliency maps. Wang et al. [9] updated the Itti model by introducing prior knowledge about the maximum number of objects in a single frame. However, many problems remain in these studies. The objects detected by the above methods are incomplete, with parts of the object regions missing. Furthermore, models based on prior knowledge learning are difficult to adapt to underwater tasks. Most crucially, the artificial illumination used to compensate for light attenuation in the underwater medium produces an inhomogeneous illumination environment; when the background scene is unstable, pixels with strong intensity may then be mistaken for the object region.

In order to solve the problems of existing underwater moving object detection methods, this paper proposes a novel hierarchical background model that simulates frog visual perception, which is considered to have an excellent ability for motion detection [10]. Hence, the visual relativity is modeled. Only the intensity information is extracted and introduced into the background model. Finally, a hierarchical background model is proposed for efficiently detecting underwater objects in nonstationary environments with changing illumination. Owing to the bionic approach, the proposed method adapts dynamically to the underwater object extraction task and is more robust to the underwater environment. The experimental results confirm its effectiveness for object extraction under underwater optical conditions.

The remainder of this paper is organized as follows. Section 2 briefly describes the characteristics of visual perception and the information processing mode in frogs. Section 3 presents the details of the proposed method. The experimental results on several underwater image sequences are discussed in Section 4, and Section 5 concludes the paper.

2. Frog Visual Mechanism and Information Processing Model

The frog is a typical visually guided animal whose eyes are its main biological sensor for tracking prey. Compared with other animals, frogs are more sensitive to moving objects. When a frog remains completely still, static content produces no response on its retina; the frog is therefore blind to a static object even if it is very close [11]. Accordingly, the motion information of a moving object is the critical cue that controls preying behavior in the frog visual system. Biological research finds that frogs are born myopic: the foreground scene is imaged clearly on the retina, while the background is blurred. This visual mechanism enables frogs to find and capture prey correctly and quickly. Unlike the focus-shift process in the human visual attention mechanism [12], a frog at rest does not move its eyes to search for and track objects of interest. However, if the frog's body moves, the whole visual scene would be reversed [13]. In this case, to keep the image stable on the retina, the frog moves its eyes to compensate for the movement of the scene.

From the viewpoint of underwater moving object detection, this paper focuses on three aspects of the frog's visual and neurophysiological mechanism.

(1) The low distance resolution results in a blurred background and a clear foreground on the retina, so the frog can easily distinguish an object in the foreground from the background. To employ this mechanism in computer vision, we first segment the image into subblocks, which are used to classify the foreground and the background. The background region is then ignored and pixel-based processing is applied to the foreground region only. With this preprocessing, the foreground region can be easily extracted and the object detection is focused on it. Accordingly, redundant computation on the background region is avoided and the complexity of object detection is reduced. Furthermore, subblock-based processing alleviates, to some extent, the difficulties caused by a nonstationary background.

(2) A frog has a memory of both the moving objects and the background. Once its interest is focused on an object, its attention can hardly be diverted. Taking this into consideration, the foreground and the background are modeled and updated with the features extracted from their respective regions. This strategy addresses the problems caused by changes in illumination and increases the accuracy of underwater object detection.

(3) The retina and neural fibers of the frog eye are sensitive to local bright-dark contrast and to brightness changes in moving regions. This visual sensitivity can be modeled through the selection of image features. For the computer vision task, the intensity and the texture feature describing the intensity distribution in local regions are extracted for detecting the contour of the moving object.

Inspired by the above aspects of the visual mechanism of the frog eye, this paper proposes an underwater moving object detection method based on a hierarchical background model. In this method, the foreground and the background are modeled by the information extracted from pixels and subblocks, respectively. The intensity and the texture feature are extracted to describe the contour of the underwater objects correctly.

3. Object Detection Method

3.1. Overview of the Proposed Method

The key to object extraction is to stretch the contrast between the object and the background. Because it exploits the spatial correlation between pixels, subblock-based background modeling is sensitive to global changes of the scene but blind to local movement, which alleviates the problems caused by an unstable background. However, the subblock operation may produce only a rough object region with a serious blocking effect that deforms the object, and in some cases the intensity feature used for modeling the background can hardly identify the objects in the scene.

More precise object information can be extracted with a pixel-based background model, which detects the object region without the blocking effect. However, the results of the pixel-based operation include not only the object region but also the regions surrounding it. Hence, errors arise in scenes with an unstable background.

Therefore, the subblock-based and pixel-based operations are complementary. An asymmetric forward feedback mechanism is applied to combine the two strategies into a hierarchical background model for object detection. First, intensity features are extracted from each subblock, and the difference between subblocks is taken as the cue for classifying the rough object and background regions. The rough object region is then extracted and the background model is updated. Next, the texture features of every pixel belonging to the rough object region are extracted to establish the pixel-based background model. Figure 1 illustrates the process of the proposed underwater moving object detection algorithm.

To reduce the computational complexity, the detection process follows these rules.

(1) The background region identified by the subblock-based method is reliable, so the pixel-based identification is omitted for it. To adapt the method to changes of the scene, the background region is updated with the information extracted from the subblocks rather than from the pixels.

(2) The foreground region identified by the subblock-based method contains the pixels of the real object region and a small number of unstable pixels. Hence, the pixel-based algorithm is applied to refine the object region and remove the blocking effect.

(3) Since most of the pixels in the detected rough object region belong to the real object region, the background model does not need to be updated in this region.

3.2. Rough Object Region Detection

The rough object region is detected by the subblock-based operation. The input video frames are segmented into multiple nonoverlapping subblocks of equal size. For each subblock, the intensity feature is extracted by block truncation coding (BTC), a block-based image coding method [14]. The intensity feature vector is built from the following quantities: $x_i$ denotes the intensity of pixel $i$ in a subblock, $m$ denotes the mean intensity of all pixels in the subblock, $m_h$ is the mean intensity of the pixels whose intensity is higher than the threshold $m$, and $m_l$ is the mean intensity of the pixels whose intensity is lower than $m$.

The feature extracted from a subblock is represented by a four-component vector. If all pixel intensities $x_i$ in a subblock are identical, all four components are set to the subblock mean $m$. If the high-intensity pixels in a subblock are identical, the two components computed from them are set to their mean $m_h$. If the low-intensity pixels in a subblock are identical, the two components computed from them are set to their mean $m_l$. With the subblock-based background modeling method and the intensity feature, the difference between the object and the background region can be recognized correctly and quickly. To cope with changes of the scene, the updating strategy of the Gaussian mixture model [15] is introduced.
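The subblock feature can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `iter_blocks` and `btc_feature` are hypothetical, pixels equal to the mean are assigned to the high group, and the four components are assumed to come from splitting the high and the low pixel groups once more around their own means. Only the fallback rules for identical pixels follow the text directly.

```python
def iter_blocks(image, size):
    """Yield the pixels of each nonoverlapping size x size subblock as a flat list.

    `image` is a 2D list of intensities; rows or columns that do not fill a
    whole subblock are skipped.
    """
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            yield [image[r + i][c + j] for i in range(size) for j in range(size)]


def _mean(values):
    return sum(values) / len(values)


def _split_means(values, threshold):
    """Means of the values at or above `threshold` and of the values below it.

    An empty side falls back to `threshold`, which covers the case of
    identical values described in the text.
    """
    upper = [v for v in values if v >= threshold]
    lower = [v for v in values if v < threshold]
    return (_mean(upper) if upper else threshold,
            _mean(lower) if lower else threshold)


def btc_feature(block):
    """Four-component intensity feature of one subblock (assumed layout)."""
    m = _mean(block)                      # mean of all pixels
    high = [v for v in block if v >= m]   # high-intensity pixels (assumption: ties go here)
    low = [v for v in block if v < m]     # low-intensity pixels
    m_h = _mean(high) if high else m
    m_l = _mean(low) if low else m
    high_top, high_bottom = _split_means(high, m_h) if high else (m, m)
    low_top, low_bottom = _split_means(low, m_l) if low else (m, m)
    return (high_top, high_bottom, low_top, low_bottom)
```

For a frame `[[10, 10, 5, 5], [2, 4, 5, 5]]` and a subblock size of 2, the first subblock has identical high-intensity pixels, so both high components collapse to their mean, and the second subblock is uniform, so all four components equal its mean.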

A set of $K$ intensity feature vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_K\}$ is maintained for each subblock. To indicate the importance of the different vectors, an additional normalized weight $w_k$ is introduced for each vector $\mathbf{v}_k$. Accordingly, a vector with a larger weight has a stronger ability to distinguish the object from the background. The model is initialized from $\mathbf{x}_1$, the intensity feature vector extracted from the subblock in the first frame. Each subblock in the following frames is discriminated according to

$$B = \arg\min_{b} \left( \sum_{k=1}^{b} w_k > T_B \right), \tag{3}$$

where $w_k$ is the weight of the $k$th vector in the current frame and the threshold $T_B$ is set for separating the rough object region from the background. The first $B$ vectors, which satisfy (3), are treated as background, and the last $K - B$ vectors belong to the object regions.
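A common way to realize such a weight-based selection in mixture-style background models is to visit the vectors in order of decreasing weight and keep the shortest prefix whose cumulative weight exceeds the threshold. The Python sketch below assumes this rule; the function name `select_background`, the sorting step, and the example values are assumptions made for illustration.

```python
def select_background(weights, threshold):
    """Return the indices of the model vectors treated as background.

    Vectors are visited in order of decreasing weight (an assumption); the
    shortest prefix whose cumulative weight exceeds `threshold` is returned.
    If the total weight never exceeds the threshold, all indices are returned.
    """
    order = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    total = 0.0
    background = []
    for k in order:
        background.append(k)
        total += weights[k]
        if total > threshold:
            break
    return background
```

With weights `[0.2, 0.5, 0.3]` and a threshold of 0.7, the two heaviest vectors (indices 1 and 2) are selected as background and the remaining vector is left for the object region.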

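To extract the rough object region, the intensity feature $\mathbf{x}$ of each subblock is extracted and compared with the background model by the Euclidean distance:

$$d(\mathbf{x}, \mathbf{v}_k) = \sqrt{\sum_{j} \left(x_j - v_{k,j}\right)^2}, \tag{4}$$

where $x_j$ and $v_{k,j}$ are the $j$th components of $\mathbf{x}$ and of the $k$th model vector $\mathbf{v}_k$. If $d(\mathbf{x}, \mathbf{v}_k) < T_d$ ($T_d$ is the distance threshold), the intensity feature of the subblock and the $k$th vector are matched. If a subblock matches one of the first $B$ vectors selected by (3), it belongs to the background; otherwise it is assigned to the object region.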

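If the feature $\mathbf{x}$ of a subblock corresponds to at least one background model vector, $\mathbf{x}$ is used to update the matched vector $\mathbf{v}_k$:

$$\mathbf{v}_k \leftarrow (1 - \alpha)\,\mathbf{v}_k + \alpha\,\mathbf{x}, \tag{5}$$

where $\alpha$ is the parameter controlling the rate of learning. The weights of the background model are updated as follows:

$$w_k \leftarrow (1 - \beta)\,w_k + \beta M_k, \tag{6}$$

where $\beta$ is the parameter controlling the rate of learning; $M_k = 1$ when the new subblock matches the $k$th vector, and $M_k = 0$ otherwise.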
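The matching and updating step can be sketched in Python as follows. This is a hedged illustration that assumes the exponential-forgetting updates commonly used in mixture-style background models; the function name `match_and_update`, the parameter names `alpha` and `beta`, and all numeric values are assumptions, and the creation of a new model vector for an unmatched subblock is left to the caller.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_and_update(feature, vectors, weights, dist_threshold,
                     alpha=0.05, beta=0.05):
    """Match `feature` against the model and update the model in place.

    Returns the index of the closest model vector if its distance is below
    `dist_threshold`, otherwise None (the model is then left unchanged).
    """
    if not vectors:
        return None
    distances = [euclidean(feature, v) for v in vectors]
    best = min(range(len(vectors)), key=distances.__getitem__)
    if distances[best] >= dist_threshold:
        return None
    # Pull the matched vector toward the new feature.
    vectors[best] = [(1 - alpha) * v + alpha * x
                     for v, x in zip(vectors[best], feature)]
    # Raise the weight of the matched vector and decay the others.
    for k in range(len(weights)):
        matched = 1.0 if k == best else 0.0
        weights[k] = (1 - beta) * weights[k] + beta * matched
    return best
```

Larger learning rates make the model follow illumination changes faster, at the cost of absorbing slowly moving objects into the background sooner.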

If the subblock fails to match any model vector, a new model vector is established with a minimum weight and initialized from the current subblock. If the variance of the subblock of interest differs from that of the background model, the subblock is likely to belong to the moving object region rather than the background. To account for this, and considering the strong influence of illumination on the imaging environment, the distance threshold is adaptively adjusted according to the intensity variance, using an empirical constant and a parameter $\rho$ that denotes the similarity of the intensity features of two subblocks. The similarity is computed from $m_x$ and $m_y$, the mean intensities of all pixels in the two subblocks, and from $x_i$ and $y_i$, the intensities of the $i$th pixel in the two subblocks.

3.3. Accurate Object Contour Extraction

For each pixel in the detected rough object region, the texture feature is extracted and used to obtain the accurate object contour. In this paper, the local binary pattern (LBP) texture operator is chosen to describe texture features. The most important properties of the LBP operator are its tolerance to illumination changes and its computational simplicity [16–18]. To adapt LBP to underwater scenes, we modify this operator.

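Given the center pixel $(x_c, y_c)$, the LBP operator uses a joint distribution to describe local texture features:

$$T = t(g_c, g_0, \ldots, g_{P-1}), \tag{10}$$

where $g_c$ corresponds to the gray value of the center pixel and $g_p$ ($p = 0, \ldots, P-1$) are the gray values of $P$ pixels equally spaced on a circle of radius $R$. By increasing the radius, larger-scale texture primitives can be collected, as shown in Figure 2.

By introducing the difference between $g_p$ and $g_c$, the joint distribution can be transformed as

$$T = t(g_c, g_0 - g_c, \ldots, g_{P-1} - g_c). \tag{11}$$

Assuming that $g_c$ and $g_p - g_c$ are independent, $T$ can be decomposed as

$$T \approx t(g_c)\, t(g_0 - g_c, \ldots, g_{P-1} - g_c). \tag{12}$$

As $t(g_c)$ denotes the gray distribution of the whole image, the texture feature can be described by the joint distribution of the gray differences between the neighboring pixels $g_p$ and the center pixel $g_c$, as

$$T \approx t(g_0 - g_c, \ldots, g_{P-1} - g_c). \tag{13}$$

If illumination changes linearly in underwater scenes, the sign of $g_p - g_c$ is not changed. Hence, the sign function can be chosen as the replacement to describe the texture feature:

$$T \approx t\left(s(g_0 - g_c), \ldots, s(g_{P-1} - g_c)\right), \tag{14}$$

where the sign function is

$$s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{15}$$

In practice, the signs of the differences in a neighborhood are interpreted as a $P$-bit binary number. This $P$-bit value is transformed into a unique decimal number that describes the local spatial texture feature:

$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p. \tag{16}$$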

LBP is robust against the considerable gray-scale variations that commonly appear in natural images. Moreover, the LBP operator is computationally economical, which is important in practice. In addition, LBP is a nonparametric method that makes no assumptions about the underlying distributions. However, because the gray level changes little in the underwater background, the gray values of the center point and its neighborhood are nearly homogeneous. In this case, a large error arises if the traditional LBP operator is used: a neighbor whose gray value $g_p$ is only slightly below the center value $g_c$ gives $s(x) = 0$ in (15), whereas a neighbor that is only slightly above gives $s(x) = 1$, although in practice such a small difference between $g_p$ and $g_c$ is commonly ignored. Hence a moderation factor $a$ is introduced, and $s(g_p - g_c)$ in (16) is replaced by $s(g_p - g_c + a)$. In this paper, $a$ is set to a fixed constant.
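The effect of the moderation factor can be seen in a small Python sketch. It is an illustration only: the function name `lbp_code`, the parameter name `offset` (standing in for the moderation factor), and the gray values and offset of 2 used in the example are assumptions, not values taken from the paper.

```python
def lbp_code(center, neighbors, offset=0):
    """Return the LBP code of one pixel.

    `neighbors` lists the gray values of the P sampling points in a fixed
    order; bit p is set when neighbors[p] - center + offset >= 0.
    `offset` plays the role of the moderation factor; 0 gives the basic LBP.
    """
    code = 0
    for p, g in enumerate(neighbors):
        if g - center + offset >= 0:
            code |= 1 << p
    return code


# A nearly flat neighborhood around a center value of 100 (hypothetical values).
flat = [99, 101, 100, 98, 99, 100, 102, 99]
basic = lbp_code(100, flat)               # noise decides every bit
moderated = lbp_code(100, flat, offset=2)  # small differences are ignored
```

In this example the basic operator returns 102, a code driven entirely by one or two gray levels of noise, while the moderated operator returns 255, the same code a perfectly uniform neighborhood would give. A strongly textured neighborhood keeps its code, because differences much larger than the offset are unaffected.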

A set of texture feature vectors is extracted within the rough object region. These features are arranged according to the image sequence. The texture vectors are initialized with the LBP texture feature of the pixel in the first frame. The similarity between the texture feature $\mathbf{h}$ of a pixel in the current frame and a model vector $\mathbf{u}_k$ is estimated by the Euclidean distance

$$d(\mathbf{h}, \mathbf{u}_k) = \sqrt{\sum_{j} \left(h_j - u_{k,j}\right)^2}, \tag{18}$$

where $h_j$ and $u_{k,j}$ denote the $j$th components of the texture feature vectors. If the distance exceeds the threshold, the pixel is identified as object. Otherwise, the pixel is defined as background and its texture feature is introduced into the model. The hierarchical background modeling method is summarized in Algorithm 1.

Algorithm 1: Hierarchical background modeling.
Input: Underwater image sequence
Step 1. Segment each frame of the input underwater image sequence into multiple nonoverlapping subblocks;
Step 2. Extract the intensity features of each subblock;
Step 3. Establish the background model based on each subblock to distinguish the background and the rough object;
Step 4. Update the parameters of the background model;
Step 5. Extract the texture features of each pixel in the rough object region;
Step 6. Build the background model based on each pixel to obtain the object contours.
Output: Results of moving object detection

4. Experimental Results and Analysis

In order to demonstrate the effectiveness of the proposed method for detecting underwater moving objects, the classic Gaussian background modeling method is selected as the reference for comparison. The detection results are shown in Figures 3, 4, and 5.

According to the detection results, the Gaussian background modeling method is able to detect the contours of the objects roughly. However, the detected contours are incomplete, especially for the parts that are similar to the background. In contrast, the object contours given by the proposed hierarchical method are more complete and more precise. Two criteria [19], denoted here by $R_d$ and $R_f$, are employed for quantitative evaluation:

$$R_d = \frac{|D \cap O|}{|O|}, \qquad R_f = \frac{|D \cap B|}{|B|}, \tag{20}$$

where $D$ is the detected object region, $O$ is the real object region, and $B$ is the background region. $R_d$ denotes the ratio of the detected region to the real object region, and $R_f$ is the ratio of the falsely detected region to the background region. The performance evaluation is shown in Table 1.
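The two ratios described above can be computed from a pair of binary masks, as in the following Python sketch. The function name `detection_ratios`, the mask representation (2D lists of 0 and 1), and the example masks are assumptions made for illustration; they are not the evaluation code or data behind Table 1.

```python
def detection_ratios(detected, truth):
    """Return (detection ratio, false-detection ratio) for two binary masks.

    `detected` and `truth` are 2D lists of 0/1 with the same shape; 1 marks
    object pixels. A ratio whose denominator is empty is reported as 0.0.
    """
    hits = false_hits = n_object = n_background = 0
    for det_row, truth_row in zip(detected, truth):
        for d, t in zip(det_row, truth_row):
            if t:
                n_object += 1
                hits += 1 if d else 0
            else:
                n_background += 1
                false_hits += 1 if d else 0
    detection_ratio = hits / n_object if n_object else 0.0
    false_ratio = false_hits / n_background if n_background else 0.0
    return detection_ratio, false_ratio


truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
detected = [[1, 1, 1, 0],
            [0, 1, 0, 0]]
ratios = detection_ratios(detected, truth)
```

Here three of the four object pixels are found and one of the four background pixels is wrongly marked, so `ratios` is `(0.75, 0.25)`.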

From the results in Table 1, the proposed hierarchical method performs better than the Gaussian background modeling method for underwater moving object detection and yields more precise results. The mean ratio of the detected region to the real object region increases to 0.9792, and the mean ratio of the falsely detected region to the background region decreases to 0.0094. For Figure 3 and Figure 5, the former ratio obtained by the proposed method is very close to 1 while the latter is very small. This indicates that the region detected by the proposed method generally covers the real object and that very little background is included in the results. For Figure 4, the corresponding value is lower than that for Figure 3 and is the lowest among all results. Overall, the results demonstrate that our method is feasible, effective, and sufficiently accurate for underwater moving object detection.

5. Conclusion

Inspired by the frog visual mechanism, the frog's visual information processing mode is simulated to establish a bionic underwater object detection method. Using the illumination information of the input image, a hierarchical background model is established to detect underwater moving objects. The experimental results demonstrate that the proposed method detects underwater moving objects effectively and accurately. In this paper the visual mechanism of the frog is modeled only preliminarily; further research will focus on this field to achieve a more complete bionic model.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (no. 61263029).