Abstract
Underwater object detection plays an important role in research and practice, as it provides condensed and informative content that represents underwater objects. However, detecting objects in underwater images is challenging because underwater environments significantly degrade image quality and distort the contrast between the object and the background. To address this problem, this paper proposes an optical prior-based underwater object detection approach that exploits optical principles to identify the optical collimation in underwater images, providing valuable guidance for extracting object features. Unlike data-driven knowledge, the prior in our method is independent of training samples. The fundamental novelty of our approach lies in the integration of an image prior with the object detection task, and this novelty underpins the satisfactory performance of our approach in underwater environments, as demonstrated through comparisons with state-of-the-art object detection methods.
1. Introduction
Underwater object detection from images plays an important role in underwater scientific research and practical projects, providing condensed and informative content regarding underwater objects [1, 2]. However, underwater object detection is difficult owing to multiple factors. First, in contrast to objects on the ground, underwater objects are characterized by streamlined contours and deceptive appearances, which commonly used image features (e.g., corners and color histograms) can hardly recognize. Owing to the variety of underwater objects, only a few carefully designed templates are available. Second, challenging underwater conditions pose difficulties from two aspects: (i) wavelength-selective attenuation reduces and distorts the contrast between the object and background [3] and (ii) biased illumination and haze effects may generate false-positive appearances [4]. Together, these challenges significantly degrade the performance of underwater object detection [5–7].
In general, there are two main approaches to underwater object detection. First, previous background models are updated for underwater tasks. However, this approach is problematic in theory, as the underlying mechanisms of previous models were not designed for underwater scenes, specifically their noise distribution and object characteristics. Second, various learning methods have been introduced for underwater object detection. A practical issue concerning these models is the small database available for underwater model training, which may result in convergence errors.
To solve the aforementioned problems, this paper proposes a novel foreground feature-based approach that detects objects by identifying and propagating features. Specifically, our approach develops a novel optical prior to locate the coarse region of objects, focusing on the feature seeds inside the object region. The complete object region can be extracted by propagating the seeds. The main contributions of this work are as follows. (i) Our approach integrates underwater active imaging principles and object detection tasks, providing a novel strategy for underwater object detection. (ii) Our approach detects underwater objects by extracting and propagating feature seeds, demonstrating superior performance in terms of accuracy and generalization.
The remainder of this paper is organized as follows. Section 2 reviews the related literature and discusses the theoretical methodology of our approach. Section 3 presents the framework of our approach. Section 4 presents our novel optical prior model for underwater object detection. The seed selection process is described in Section 5. Section 6 presents the seed propagation process. The experimental results and analysis are discussed in Section 7. Finally, Section 8 concludes the paper.
2. Related Works
This section briefly reviews state-of-the-art research related to underwater object detection and theoretically discusses the novelty of our approach in comparison to these counterparts.
2.1. Object Detection from Underwater Images
Existing underwater object detection methods can be generally classified into two categories based on their fundamental strategies, that is, foreground feature extraction and background modeling.
In general, foreground feature extraction-based methods are applicable to predesigned objects with distinguishable appearances. For these objects of interest, carefully designed templates are available, and thus detection can be realized by template matching. For example, a color template was designed to detect light signs for underwater navigation [8]. An intensity template was established to detect an LED ring, guiding the docking of an autonomous underwater vehicle (AUV) [9]. Moreover, different types of morphological templates have been developed to detect objects with distinguishable structures. For example, scale- and rotation-invariant features were extracted for detecting man-made landmarks [10]. In terms of detecting underwater man-made landmarks, experimental results have shown that shape features are more stable than color features [11]; however, this benchmark only includes dock-mark samples and overlooks the various natural objects under water. In addition to these man-made templates, many templates are automatically learned from training data. For example, the correspondence between raw images and their corrected results was modeled to restore the original radiation of underwater scenes, which can recover the intrinsic contrast between an underwater object and its background [12]. As an extension to this method, an updated model was proposed by combining a number of low-complexity but moderately accurate color feature detectors [13, 14]. Moreover, saliency detection methods have been introduced to initially identify the region of underwater objects [15–19]. However, these methods share an obvious limitation: common image features may be nonsalient under water owing to the light attenuation effect.
As mentioned above, underwater scenes are characterized by variability and dynamism, which significantly degrade background models; only a few methods have been proposed to address this. For example, the average of consecutive frames was calculated as the underwater background [20]. Moreover, a spatiocontextual Gaussian mixture model was used to estimate spatial and temporal correlations. A similar strategy was used in recent research, wherein a probabilistic background was established for real-time fish detection [21].
In recent studies, various deep learning networks have proven successful for object detection on the ground [22]. However, this framework is relatively problematic for underwater tasks because of the scarcity of underwater training data. As a result, the performance of deep learning networks in underwater environments remains insufficiently demonstrated [23].
2.2. Theoretical Discussion
In general, the main problems in current underwater detection result from challenging imaging environments: underwater images are significantly distorted and attenuated, and commonly used methods consequently suffer substantial losses in accuracy and generalization. This problem might be addressed by learning strategies such as deep learning architectures; however, the small database and the differences among underwater samples reduce the feasibility of such strategies.
These problems in underwater object detection have motivated our study. Unlike other approaches, ours detects objects by using foreground features, and foreground feature extraction is therefore the key step. To correctly identify object features, a novel underwater optical prior is modeled. This optical prior takes advantage of the optical principles of underwater imaging; specifically, it assumes that the optical collimation of active imaging necessarily overlaps with object bodies. Object features can thus be identified with the guidance of the optical collimation, and the complete object is extracted by propagating the object feature seeds.
In theory, this study develops a novel approach for underwater object detection, which extracts object features with the guidance of an optical prior [24]. This idea is beneficial to underwater image processing and requires further development in the future. Moreover, from the aspect of information fusion, the proposed approach can be understood as the fusion of external and internal features, such as optical and image features.
3. Framework of the Proposed Approach
The framework of the proposed approach is illustrated in Figure 1. In general, the framework consists of two main phases: feature seed extraction and seed propagation. In the first phase, the optical prior is combined with various image features, such as color, texture, and contrast, to extract the most distinguishable seeds of underwater objects. In the second phase, the selected seeds are propagated by the extended random walk (ERW) algorithm to identify the complete region of the underwater objects [25].

4. Underwater Optical Prior
Active imaging technologies have been widely used to compensate for light attenuation in underwater environments. However, artificial lighting causes an inhomogeneous distribution of the scene light. This effect was considered a drawback in previous studies; in this study, however, the optical inhomogeneity is regarded as a valuable cue for identifying underwater objects. According to underwater active imaging principles, an artificial source is necessarily collimated at the object of interest to enhance visibility. As a result, this optical collimation can help identify the region of underwater objects.
Regarding underwater active imaging (see Figure 2), the light irradiance along the optical axis (line AB in Figure 2) decreases with the transmission distance [26]:

$$E(z, \lambda) = \frac{E_0(\lambda)}{z^2}\, e^{-c(\lambda) z}, \tag{1}$$

where z is the distance from a point on the optical axis to the light source, λ is the light wavelength, and $E_0(\lambda)$ is the initial irradiance. The quadratic term $1/z^2$ measures the geometric spread of the irradiance, and the exponential term accounts for the light attenuation with the attenuation coefficient $c(\lambda)$.
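As a quick numerical illustration of Equation (1), the following minimal sketch evaluates the axial irradiance; the attenuation coefficient value is illustrative only, not a measured constant.

```python
import numpy as np

def axial_irradiance(z, e0=1.0, c=0.3):
    """Axial irradiance model of Equation (1).

    z  : distance from the light source along the optical axis (z > 0)
    e0 : initial irradiance E_0(lambda)
    c  : attenuation coefficient c(lambda); the value 0.3 is illustrative
    """
    z = np.asarray(z, dtype=float)
    return e0 / z**2 * np.exp(-c * z)

# Irradiance decays quickly with distance, combining geometric spread and absorption.
print(axial_irradiance([0.5, 1.0, 2.0, 4.0]))
```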

Consider a point P on an underwater object; the irradiance at P (denoted as $E_P$) can be represented as

$$E_P(\vec{v}) = E(z, \lambda)\, G(\vec{v}), \tag{2}$$

where $G(\vec{v})$ models the distribution of irradiance in the plane S perpendicular to the optical axis and the vector $\vec{v}$ denotes the coordinates of point P in that plane. Considering that the light field is generally symmetric around the optical axis, the function $G(\vec{v})$ can be approximated by a Gaussian model, $G(\vec{v}) \approx \exp\left(-\|\vec{v}\|^2 / 2\sigma_s^2\right)$.

According to the optical model (Equations (1)–(2)), the power of the artificial source is maximized at the object and then gradually decreases. Thus, we have more opportunities to identify object features along the optical collimation. Generally, the optical collimation of a light source can be identified by the intensity feature. This holds for natural scenes covered by parallel ambient light; however, it is not true for most underwater conditions because of the complex light components, so the region with the largest intensity may be distant from the optical collimation. To solve this problem, this study proposes a set of optical principles to comprehensively model the optical prior.

(i) According to the underwater active imaging model (Equation (2)), the irradiation of any point in the optical collimation region depends on its distance from the collimation center.

(ii) According to the Beer–Lambert law, the wavelength-selective attenuation is reduced in the optical collimation region owing to the short sight distance, such that the color-channel variation is low and the red channel is more prominent there.

These two optical principles can be analyzed with three mathematical computations.

(i) Irradiation–position relation in the local region: this measures the relationship between the irradiation of any point and its distance from the point with the maximum intensity in the local region:

$$D(x) = \left\| p_x - p_m \right\|_2, \quad m = \arg\max_{y \in \Omega(x)} I(y), \tag{3}$$

where $D(x)$ is the Euclidean distance from point x to the point m that has the maximum intensity in the local region $\Omega(x)$ centered at point x, and $p_x$ and $p_m$ are the coordinates of points x and m, respectively.

(ii) Red-channel contrast: this measures the irradiation difference in the red channel between the optical collimation region and outside regions:

$$C^r(x) = \sum_{y \in \Omega(x)} \left| I^r(x) - I^r(y) \right|, \tag{4}$$

where $|I^r(x) - I^r(y)|$ is the absolute intensity difference between points x and y in the red channel and the superscript r denotes the red channel.

(iii) Channel variation: this measures the variation between the color channels:

$$V(x) = \operatorname{var}\left( I^R(x), I^G(x), I^B(x) \right), \tag{5}$$

where $V(x)$ is the channel variance of point x in the color space; herein, the Euclidean distance is used to measure the difference.

According to these three measurements, the optical collimation can be identified at points whose irradiation distribution follows the Gaussian model, that have a large contrast against other points in the red channel, and that are homogeneous among the color channels. This can be mathematically modeled using three types of relationships.

(i) Positive relationship between the irradiation–position relation and the channel variation: inside the optical collimation region, points with a lower channel variation must be spatially closer to the point with the maximum irradiation in the local region, and vice versa:

$$S_1 = D \otimes V, \tag{6}$$

where ⊗ is the two-dimensional correlation operator between two matrices.

(ii) Inverse relationship between the red-channel contrast and the irradiation–position relation: inside the optical collimation region, points with a higher red contrast must be spatially closer to the point with the maximum irradiation in the local region, and vice versa:

$$S_2 = -\left( C^r \otimes D \right). \tag{7}$$

(iii) Inverse relationship between the red-channel contrast and the channel variation: inside the optical collimation region, points with a higher red contrast must have a lower channel variation, and vice versa:

$$S_3 = -\left( C^r \otimes V \right). \tag{8}$$

Inside the optical collimation region, all three two-dimensional correlations should be maintained at a high level. Accordingly, they are combined into a single score computed within each local region:

$$S = S_1 + S_2 + S_3. \tag{9}$$
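To make the prior concrete, the following is a minimal sketch of the S map under the reconstruction above, assuming ⊗ is the normalized two-dimensional correlation coefficient (as in MATLAB's corr2) evaluated over non-overlapping local windows; the window size, the grayscale definition of the maximum-intensity point m, and the local-mean approximation of Equation (4) are our assumptions, not specified by the paper.

```python
import numpy as np

def corr2(a, b):
    """Normalized two-dimensional correlation coefficient of two matrices."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a**2).sum() * (b**2).sum()) + 1e-12
    return (a * b).sum() / denom

def optical_prior(img, win=31):
    """Optical prior score map S (Equations (3)-(9)) for a float RGB image in [0, 1]."""
    h, w, _ = img.shape
    gray, red = img.mean(axis=2), img[..., 0]
    S = np.zeros((h, w))
    half = win // 2
    for cy in range(half, h - half, win):
        for cx in range(half, w - half, win):
            sl = np.s_[cy - half:cy + half + 1, cx - half:cx + half + 1]
            g, r, rgb = gray[sl], red[sl], img[sl]
            # D: distance of every pixel to the local maximum-intensity point m (Eq. (3))
            my, mx = np.unravel_index(np.argmax(g), g.shape)
            yy, xx = np.mgrid[0:win, 0:win]
            D = np.hypot(yy - my, xx - mx)
            # C^r: red-channel contrast, approximated by deviation from the local mean (Eq. (4))
            Cr = np.abs(r - r.mean())
            # V: variance across the three color channels at each pixel (Eq. (5))
            V = rgb.var(axis=2)
            # S = S1 + S2 + S3 (Equations (6)-(9))
            S[sl] = corr2(D, V) - corr2(Cr, D) - corr2(Cr, V)
    return S
```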
The results for S are shown in Figure 3, from which we can observe that the S values inside the optical collimation are significantly larger than those outside. Hence, the S values provide an initial localization of underwater object regions. This can be deemed an external optical prior for underwater object detection, as it does not involve any object-specific features.

[Figure 3: optical prior S maps for five underwater scenes, panels (a)–(e).]
5. Seed Selection: Refining the Optical Prior with Image Features
After modeling the optical prior, the feature seeds are refined using image features, namely textural and contrast features. Accordingly, the underwater image is transferred from the pixel level to the superpixel level. The edge density in each superpixel is calculated as the textural feature [27]:

$$T_i = \frac{L_i}{A_i}, \tag{10}$$

where $L_i$ is the total length of the edges in the i-th superpixel and $A_i$ is the size of the i-th superpixel. The edges are extracted with a boundary detector that combines oriented energy and texture gradient cues:

$$OE_{\theta, s} = \left( I * f^{e}_{\theta, s} \right)^2 + \left( I * f^{o}_{\theta, s} \right)^2, \qquad TG(x, \theta) = \chi^2(g, h), \qquad Pb = C\left( OE_{\theta, s}, TG \right), \tag{11}$$

where OE is the oriented energy for detecting and localizing composite edges, TG denotes the texture gradient, and C is a classifier for combining the cues. $f^{e}_{\theta,s}$ and $f^{o}_{\theta,s}$ are a quadrature pair of even- and odd-symmetric filters at orientation θ and scale s, and g and h are the two half-disc histograms compared by the χ² distance.
The contrast feature of any point x denotes its difference from the average of the background points, defined as the points y with a low optical prior score, $S(y) < \tau$:

$$C(x) = \left\| I(x) - \bar{I}_b \right\|_2, \qquad \bar{I}_b = \frac{1}{N_b} \sum_{y:\, S(y) < \tau} I(y), \tag{12}$$

where $C(x)$ is the difference between point x and the average background $\bar{I}_b$, and $N_b$ is the number of background points with $S(y) < \tau$.
The comprehensive evidence for identifying seeds is formulated from the multiple cues:

$$F_i = \bar{S}_i \cdot T_i \cdot \bar{C}_i, \qquad \bar{S}_i = \frac{1}{|SP_i|} \sum_{x \in SP_i} S(x), \qquad \bar{C}_i = \frac{1}{|SP_i|} \sum_{x \in SP_i} C(x), \tag{13}$$

where $SP_i$ is the i-th superpixel. The superpixels with the k largest values of $F_i$ are selected as the seeds.
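A minimal sketch of the seed-selection step under the reconstruction above: skimage SLIC superpixels and a Canny edge map stand in for the superpixel decomposition and the Pb-style edge detector of [27], and the product combination, threshold τ, and k follow Equations (12)–(13) as assumptions.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.feature import canny
from skimage.color import rgb2gray

def select_seeds(img, S, k=10, n_segments=300, tau=0.2):
    """Select the k highest-scoring superpixels as feature seeds.

    img : float RGB image in [0, 1]
    S   : optical prior map from optical_prior()
    """
    labels = slic(img, n_segments=n_segments, start_label=0)
    edges = canny(rgb2gray(img))              # stand-in for the Pb detector of [27]
    # Background model: pixels with a low optical prior score (Equation (12)).
    bg_mean = img[S < tau].mean(axis=0)
    contrast = np.linalg.norm(img - bg_mean, axis=2)
    scores = []
    for i in range(labels.max() + 1):
        m = labels == i
        T = edges[m].mean()                   # edge density (Equation (10))
        F = S[m].mean() * T * contrast[m].mean()   # combined cue (Equation (13))
        scores.append(F)
    seeds = np.argsort(scores)[-k:]           # superpixel indices of the seeds
    return labels, seeds
```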
6. Feature Seed Propagation
In our approach, underwater objects are detected by propagating the selected seeds. Specifically, the ERW algorithm is adopted because it can propagate seeds to more distant areas, thereby preventing false negatives in object regions [25].
6.1. Graph Construction
We initially measure the correlation between superpixels prior to seed propagation. In this study, the superpixel-to-superpixel correlation is modeled by an undirected graph of the input image, G = (V, E). Here, V is the node set including all superpixels over the image, V = {SP1, SP2, …, SPN}, and E is the set of connections between nodes. The similarity between nodes is measured by the weight matrix W, and each element of W is defined as follows:

$$w_{ij} = \exp\left( -\frac{\left\| c_i - c_j \right\|^2}{2\sigma^2} \right), \tag{14}$$

where $c_i$ denotes the feature extracted in superpixel $SP_i$ and σ is the controlling constant. The degree of any node is defined as the sum of the edges connected to it, $m_i = \sum_{j} w_{ij}$. The degree matrix is defined as $M = \operatorname{diag}\{m_1, m_2, \ldots, m_N\}$, and the Laplacian matrix is generated by L = M − W.
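The graph construction reduces to a few dense matrix operations; a minimal sketch follows, assuming one feature vector per superpixel (e.g., its mean CIELab color) and Gaussian weights over all node pairs as in Equation (14).

```python
import numpy as np

def build_graph(features, sigma=0.1):
    """Weight matrix W, degree matrix M, and Laplacian L = M - W (Equation (14)).

    features : (N, d) array, one feature vector c_i per superpixel node
    """
    diff = features[:, None, :] - features[None, :, :]
    dist2 = (diff**2).sum(axis=2)
    W = np.exp(-dist2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)          # no self-loops
    M = np.diag(W.sum(axis=1))        # degree matrix
    L = M - W                         # graph Laplacian
    return W, M, L
```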
6.2. Extended Random Walking
According to ERW [25], seed propagation is realized by minimizing the following energy function:

$$E(f) = \sum_{i} \sum_{j \in C_i} w_{ij} \left( f_i - f_j \right)^2 + \mu \sum_{i, j} w_{ij} \left( f_i - f_j \right)^2 + \nu \sum_{i} \left( f_i - y_i \right)^2, \tag{15}$$

where the first term is a second Laplacian term aiming to propagate saliency to distant nodes; $f_i$ denotes the label of node $SP_i$, with $f_i = 1$ for a seed and $f_i = 0$ otherwise; and $C_i$ is the node set centered at superpixel $SP_i$. The second term is the standard random walk formulation, the third term ensures the correctness of the seeds, and $y_i$ denotes the output of an external seed classifier. μ and ν are trade-off parameters.
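Under the reconstruction of Equation (15), the minimizer satisfies a linear system obtained by setting the gradient to zero. The sketch below solves it directly; approximating the distant-neighborhood Laplacian with the Laplacian of the squared connectivity W² is our assumption for illustration, not the exact neighborhood definition of [25].

```python
import numpy as np

def propagate_seeds(W, seeds, y=None, mu=1.0, nu=0.1):
    """Minimize the ERW-style energy of Equation (15) in closed form.

    W     : (N, N) weight matrix from build_graph()
    seeds : indices of the seed superpixels
    y     : optional external classifier output; defaults to the seed indicator
    """
    N = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    # Distant-neighborhood term, approximated here through W^2 connectivity.
    W2 = W @ W
    np.fill_diagonal(W2, 0.0)
    Lc = np.diag(W2.sum(axis=1)) - W2
    if y is None:
        y = np.zeros(N)
        y[list(seeds)] = 1.0
    # Setting dE/df = 0 gives (Lc + mu*L + nu*I) f = nu*y.
    f = np.linalg.solve(Lc + mu * L + nu * np.eye(N), nu * y)
    return f
```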
6.3. Combination
The features of any node are characterized by its color and texture information, and the complementarity of these two features can enhance the reliability of object detection. For example, the color feature can better identify parts with distinguishable colors (see Figure 4(a)), whereas the texture feature plays a more important role in identifying textural parts (see Figure 4(b)). The mean CIELab color and local binary pattern (LBP) features are utilized and separately propagated within their own feature spaces. Therefore, two maps are obtained, and their intersection is regarded as the result of underwater object detection:

$$R = R_{Lab} \odot R_{LBP}, \tag{16}$$

where $R_{Lab}$ and $R_{LBP}$ denote the maps generated by the CIELab and LBP features, respectively, ⊙ denotes the element-wise product implementing the intersection, and R represents the final detection result (see Figure 4(c)).

[Figure 4: (a) propagation with the color feature; (b) propagation with the texture feature; (c) combined detection result; (d) detection results before and after diffusion; (e) binary result.]
It is important to note that the above calculations are all processed at the superpixel level, which limits the accuracy of the object detection results. A diffusion process is therefore added to refine the detection results. The superpixels over the detection results are clustered into k partitions in the color feature space, with r superpixels included in each cluster. In each partition, the value of the i-th superpixel is smoothed by the average of the remaining superpixels:

$$R'_i = \alpha R_i + \beta \cdot \frac{1}{r - 1} \sum_{j \in \Pi(i),\, j \neq i} R_j, \tag{17}$$

where $R'_i$ and $R_i$ are the refined and original values at the i-th superpixel, respectively, $\Pi(i)$ denotes the cluster containing the i-th superpixel (obtained with the color feature), and α and β are the balance parameters. A comparison between the original and refined results is shown in Figure 4(d); after diffusion, the detection results become more homogeneous within the underwater object region, while the contrast between the object and background is enlarged. The binary result can be obtained using the thresholding method [28] (see Figure 4(e)).
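A sketch of the combination and diffusion steps, assuming k-means clustering of the superpixels by mean color and the weighted-average form reconstructed in Equation (17); the use of scipy's kmeans2 and the element-wise product for the intersection of Equation (16) are our assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def combine_and_refine(f_lab, f_lbp, colors, k=8, alpha=0.5, beta=0.5):
    """Intersect the two propagated maps, then diffuse within color clusters.

    f_lab, f_lbp : per-superpixel maps propagated in CIELab / LBP feature space
    colors       : (N, 3) mean color of each superpixel, used for clustering
    """
    R = f_lab * f_lbp                     # intersection of the two maps (Equation (16))
    _, assign = kmeans2(colors.astype(float), k, minit='++', seed=0)
    R_ref = R.copy()
    for c in range(k):
        idx = np.where(assign == c)[0]
        if len(idx) < 2:
            continue
        for i in idx:
            others = idx[idx != i]
            # Smooth each superpixel by the average of the rest of its cluster (Eq. (17)).
            R_ref[i] = alpha * R[i] + beta * R[others].mean()
    return R_ref
```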
6.4. Pseudocode for Underwater Object Detection
In general, our approach detects underwater objects through three successive phases: feature seed selection, propagation, and combination. The step-by-step pseudocode details are listed in Algorithm 1.
[Algorithm 1: optical prior-based underwater object detection.]
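Since the body of Algorithm 1 is not reproduced here, the following sketch strings the three phases together using the helper functions sketched in the previous sections; it illustrates the overall flow rather than the authors' exact procedure. The helper superpixel_features (extracting per-superpixel mean CIELab colors and LBP histograms) and the mean-value threshold standing in for the method of [28] are hypothetical.

```python
def detect_underwater_object(img):
    """End-to-end flow: optical prior -> seed selection -> propagation -> refinement."""
    S = optical_prior(img)                           # Section 4: optical prior map
    labels, seeds = select_seeds(img, S)             # Section 5: feature seed selection
    lab_feats, lbp_feats, colors = superpixel_features(img, labels)  # hypothetical helper
    W_lab, _, _ = build_graph(lab_feats)
    W_lbp, _, _ = build_graph(lbp_feats)
    f_lab = propagate_seeds(W_lab, seeds)            # Section 6: ERW propagation (color)
    f_lbp = propagate_seeds(W_lbp, seeds)            #            ERW propagation (texture)
    R = combine_and_refine(f_lab, f_lbp, colors)     # combination and diffusion
    return R > R.mean()                              # stand-in for the thresholding of [28]
```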
7. Experimental Evaluation and Analysis
Both qualitative and quantitative evaluations are presented in this section to demonstrate the performance of our underwater object detection method. We first present the results of our novel seed selection for diverse samples, as it is the key and fundamental novelty of our approach. Then, our underwater object detection results are presented and compared to those of ten representative methods: two classic image segmentation methods (PCNN [29] and the graph-based segmentation method [30]) and eight saliency-based methods (Itti [31], HFT [32], MR [33], GBVS [34], PBS [35], SF [36], wCtr∗ [37], and RRWR [38]). The code for the baseline methods was downloaded from the websites provided by the authors, and default parameters were used; we did not attempt to adapt the model parameters to special underwater data, for three reasons. First, all compared methods can work in a fully automatic manner, and the parameter setting plays a minor role in their detection performance [39]; indeed, parameter setting is only required for the PCNN and graph-based methods. Second, unlike images on the ground, underwater images are characterized by large inter-sample variability. Taking the images in Figure 3 as examples, not only the objects but also the background hues change significantly between scenes; it is therefore unreasonable to adjust the parameters to all possible underwater scenes or to special samples, and we used unified parameters in all experiments. Finally, the experimental comparisons in this section are intended to verify the underlying mechanisms of the different methods, such as feature modeling and classification; the theoretical differences between these mechanisms determine the performance variation in underwater object detection, whereas carefully tuning each parameter would amount to overfitting the underwater data. To ensure a fair evaluation, the parameters in our approach are likewise fixed for different samples without any adaptation process. In contrast to model learning strategies, our approach and the compared methods detect underwater objects without additional learning processes.
7.1. Dataset and Experimental Setup
Underwater images available on YouTube were collected to establish a benchmark for experimental evaluations [40–48]. More than 200 images and 50 scenes were included in our underwater benchmark. The object region of each image was labeled by 10 volunteers, and the averaged annotations were deemed the ground truth. Images in this dataset differ in several aspects, such as participants, context, and ambient light. This diversity ensures a comprehensive and fair evaluation of different aspects, such as background removal, noise inhibition, and object identification. The quantitative performance of object detection is evaluated with respect to six criteria [49]: precision (Pr), similarity (Sim), true positive rate (TPR), F-score (FS), false positive rate (FPR), and percentage of wrong classifications (PWC):

$$\mathrm{Pr} = \frac{tp}{tp + fp}, \quad \mathrm{TPR} = \frac{tp}{tp + fn}, \quad \mathrm{FPR} = \frac{fp}{fp + tn}, \quad \mathrm{FS} = \frac{2 \cdot \mathrm{Pr} \cdot \mathrm{TPR}}{\mathrm{Pr} + \mathrm{TPR}}, \quad \mathrm{Sim} = \frac{tp}{tp + fp + fn}, \quad \mathrm{PWC} = \frac{100\,(fp + fn)}{tp + tn + fp + fn}, \tag{18}$$

where tp, tn, fp, and fn denote the true positive, true negative, false positive, and false negative counts, respectively. The parameter tp is the number of pixels that belong to the object in both the detection result and the ground truth, and tn is the number of pixels that belong to the background in both. The parameter fp counts the pixels that belong to the background in the ground truth but are mistaken as the object in the detection result, and fn is the number of pixels that are the object in the ground truth but are mistaken as the background. In each experiment, all inputs were kept at their original resolution. Additionally, the tunable parameters in our approach were fixed across the different experiments, which ensures the automation of our approach.
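A small sketch computing the six criteria from binary masks, using the standard definitions reconstructed in Equation (18):

```python
import numpy as np

def evaluate(pred, gt):
    """Six detection criteria from binary prediction and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    pr = tp / (tp + fp)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    fs = 2 * pr * tpr / (pr + tpr)
    sim = tp / (tp + fp + fn)                    # Jaccard-style similarity
    pwc = 100.0 * (fp + fn) / (tp + tn + fp + fn)
    return dict(Pr=pr, TPR=tpr, FPR=fpr, FS=fs, Sim=sim, PWC=pwc)
```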
7.2. Seed Selection
This section evaluates the performance of our proposed seed selection method, which is the fundamental novelty of our approach and also helps validate our detection results. A qualitative presentation is depicted in Figure 5, which shows that most of the selected seeds were correctly located on the object regions. Given the ground truth for the experimental samples, the seed selection results can be quantitatively evaluated by the TP, FP, and Pr scores, which comprehensively calculate the ratios of seeds located inside and outside the objects. The quantitative evaluation of our seed selection method on 30 samples is as follows: TP = 0.4486, FP = 0.0010, and Pr = 0.9416. From these scores, we can conclude that almost all of the selected seeds were correctly located inside the object regions, which is a prerequisite for good underwater object detection performance.

7.3. Underwater Object Detection
Experimental evaluations are realized by comparing our approach to state-of-the-art object and saliency detection methods. Figure 6 presents the experimental results for twelve scenes with diverse background and object categories. The first column in Figure 6 presents the original images, the second column shows the ground truth, the third and fourth columns show the results of the PCNN and graph-based segmentation methods [29, 30], the fifth to twelfth columns present the results of the saliency-based methods [31–38], and the last column shows the results obtained by our approach.

According to the results in Figure 6, the methods demonstrate discrepant performance across scenes. In contrast to the compared methods, our approach presents the most consistent performance: it can robustly segment different underwater objects under all conditions. Overall, the saliency-based methods performed better than the image segmentation methods in the underwater object detection experiments; the segmentation methods can only partition images into homogeneous regions and cannot identify the object regions. The performance most comparable to ours is obtained by SF, which identifies objects using the contrast feature that is also considered in our approach. This similarity underscores the importance of the contrast feature for object identification in underwater environments.
Table 1 summarizes the average performance of the different methods; the scores present the average results over the 30 samples. For Pr, TPR, FS, Sim, and C, larger values indicate better performance, while lower values of FPR and PWC indicate better results. In general, our approach achieved the best results in six criteria and ranked fourth in the TPR criterion. This is because the coarse regions given by the GBVS, Itti, and graph-based methods increase the tp value while reducing the fn value, resulting in a significant increase in TPR. Considering all criteria together, our method provides the best results in contrast to the other methods. This advantage can be explained theoretically from two perspectives. First, image degeneration and distortion reduce the feature contrast between the object and the background, such that clustering-based methods cannot identify the object region, as can be observed in the results generated by the PCNN and graph-based methods (Figures 6(c) and 6(d) and the first and second rows in Table 1). In contrast, the saliency detection methods are better at detecting pop-out regions and thus have more opportunities to identify the objects of interest (Figures 6(e)–6(l) and the third to tenth rows in Table 1). However, the regions given by the saliency detection methods are considerably coarse. This is caused by light scattering in water, which generates a haze effect around the objects and leads to the irregular edge expansion in their detection results. Second, the functional advantages of our approach are presented in Figure 6(m) and the last row in Table 1, demonstrating the contributions of our novel approach to underwater object detection. The most representative features can be refined by our feature seed selection process; however, these feature seeds are limited and cannot identify the complete region of the objects. This problem is solved by our seed propagation phase, which extracts the complete regions of underwater objects.
8. Conclusions
Underwater object detection is difficult owing to challenging optical environments. Commonly used detection methods can hardly distinguish an object from the scene background, often resulting in a high false positive ratio. To solve this problem, we propose a novel optical prior-based approach for detecting underwater objects. Theoretically, our work develops a novel interdisciplinary idea for underwater object detection that models an external prior and introduces it into the detection task. Unlike data-driven strategies, our prior is independent of the training data and can be obtained directly from optical principles. This prior correctly guides the extraction of foreground object features, which in turn provide sufficient cues for identifying the object regions. The advantages of the proposed idea have been demonstrated on a variety of underwater samples with different scene backgrounds and objects. Moreover, we believe that this new approach will provide valuable inspiration for future computer vision research in water. However, we acknowledge that our approach may fail in some cases owing to a lack of feature seeds; this problem is likely to occur when objects lack a high-contrast appearance, and it will be addressed in future work.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (nos. 61903124 and 62073120), the Fundamental Research Funds for the Central Universities (nos. B200202186 and B200202181), the National Natural Science Foundation of China (no. 51922065), the Natural Science Foundation of Jiangsu Province (no. BK20201311), the Hubei International Science and Technology Cooperation Base of Fish Passage (no. HIBF2020010), and the Open Research Fund of Jiangxi Provincial Key Laboratory of Water Resources and Environment of Poyang Lake (no. 2020GPSYS01).