Abstract

Underwater image processing is a challenging subtopic of computer vision because of the complex underwater environment. Since light is absorbed and scattered underwater, underwater images suffer from many distortions such as underexposure, blurriness, and color cast. This poor quality hinders subsequent processing such as image classification, object detection, and segmentation. In this paper, we propose a method to collect underwater image pairs by successively placing two water tanks in front of the camera. Owing to the high-quality training data, the proposed restoration algorithm based on deep learning achieves promising results for underwater images taken in a low-light environment. The proposed method addresses two of the most challenging problems for underwater images: darkness and blurriness. The experimental results show that the proposed method surpasses most other methods.

1. Introduction

In recent years, developing, exploring, and protecting the ocean’s resources have received significant attention from the international community. With the development of marine research, autonomous underwater vehicles (AUVs) and remotely operated vehicles (ROVs) have been widely used as carriers for various sensing devices. Sonar and vision cameras are the two major types of perception equipment for detecting and recognizing objects in underwater environments. In general, sonar is suitable for long-range detection but produces low-resolution images, whereas vision sensors are used for short-range, high-resolution identification.

The underwater imaging model helps us better understand underwater optical propagation. A diagram of underwater light transmission is shown in Figure 1. The optical sensor receives three types of light, shown as three different arrow symbols. The first, the solid arrow, represents the direct transmission of light from the subject without obstruction or scattering by particles. The second is the forward scattering light, which is reflected from the objects and scattered by particles. The third is the background scattering light, which comes from the background light and is reflected by the suspended particles. According to this model, the imaging process of underwater images can be represented as the linear superposition of three components [1]:

$E_r = E_d + E_f + E_b$,  (1)

where $E_r$ represents the total received light and $E_d$, $E_f$, and $E_b$ represent the direct transmission light, the forward scattering light, and the background scattering light, respectively. The background scattering light comes from all light scattered by the suspended particles other than the objects; it blurs the visual effect and reduces the clarity of underwater images. If the object is close to the camera, the forward scattering light contributes very little in the direction of the camera. In this situation, the component $E_f$ is often ignored to simplify the analysis of the model [2], so (1) can be shortened as

$E_r = E_d + E_b$.  (2)

Due to light absorption and scattering in the underwater environment, underwater images generally suffer from low contrast and brightness, blurry details, color distortion, and bright specks. In addition, dark environments are encountered during deep-sea exploration and the inspection of complex structures. A camera flash can be used to add light to the scene; however, restoration algorithms must still work well in dark underwater environments when the flash is unavailable, for three primary reasons: (1) The flash is not allowed on certain occasions, such as the detection of certain sea creatures or the passive detection of underwater intrusion. (2) The flash can be ineffective in complex surroundings; for example, its light can be blocked by the complicated structures of bridge bases, ports, and shipwrecks. (3) Battery power limits the flash range, so even when the flash is on, large areas remain outside the illuminated region. These issues, which arise in such specialized submerged scenes, urgently need to be addressed. In this paper, we mainly focus on restoring dark and blurred underwater images.

The methods related to image restoration can be divided into traditional methods and “modern” data-driven techniques. The former includes model-based and model-free methods, detailed in Section 2. The latter uses big data to learn the model, mainly with machine learning techniques. Deep learning is an important branch of machine learning and has made rapid progress in various computer vision tasks since 2012. Three components have driven this progress: big data, improved network architectures, and powerful hardware. Big data provides not only adequate training data but also a standard answer (Ground Truth) for the algorithms. In other words, deep learning methods “peek” at the ground truth, while traditional methods go without it.

The contributions of this paper are summarized as follows: (1) Image pairs (dark, blurred underwater images and the corresponding normally exposed, clear images) are collected and provided to a neural network using a new collection method: the objects are placed in air, and the light passes through both air and water to the camera to simulate the underwater environment. The collection method is shown to be effective for generating underwater images in both theory and practice. (2) The proposed algorithm restores images captured in extremely dark and blurred underwater environments, and its restoration results surpass those of most underwater enhancement and restoration methods.

The rest of this paper is organized as follows. Related work on underwater restoration is reviewed in Section 2. The proposed framework and neural network are described in Section 3. Detailed experimental results are presented in Section 4. Open problems that need further study are discussed in Section 5. Section 6 concludes the paper.

2. Related Work

The restoration of dark underwater images has been studied extensively in a large body of literature. In this section, we provide a short review of related work. In our research, we consider only underwater image restoration that relies on a single image. The categories of single underwater image restoration are shown in Figure 2. The algorithms can be divided into two categories: traditional methods and machine learning methods [3].

Traditional methods include image enhancement and image restoration and aim at improving image quality; they can operate in both the spatial and frequency domains [4]. The former is often a subjective, heuristic procedure designed to improve low-quality images without a degradation model. The latter formulates an objective criterion and attempts to reconstruct a degraded image using prior knowledge of the degradation; in other words, it models the degradation procedure and applies the inverse process to recover the ideal image. In the underwater scene, this model is called the underwater image formation model (IFM). Image enhancement is thus a blind operation, while image restoration tries to model the reverse of the degradation procedure. The differences between image enhancement and image restoration are listed in Table 1.

Early studies of underwater IFM-free methods directly applied methods originally developed for images taken in air. Later methods are designed according to the distinguishing characteristics of underwater images, such as haze, color cast, and low contrast. The IFM-free methods can be divided into two categories: spatial-domain [5, 6] and transform-domain methods [7–9].

Methods based on the spatial domain redistribute the intensity histogram by expanding gray levels. They work in different color models such as Red-Green-Blue (RGB), Hue-Saturation-Intensity (HSI), and Hue-Saturation-Value (HSV). The color models can also be divided into the single-color model (SCM) and the multiple-color model (MCM) according to the number of color models used. The typical SCM-based image enhancement methods, Histogram Equalization (HE) [10], Contrast Limited Adaptive Histogram Equalization (CLAHE) [11], and Generalized Unsharp Masking (GUM) [12], work in the RGB color model. Many researchers such as Torres-Méndez et al. [13], Iqbal et al. [14], and Huang et al. [15] proposed MCM-based image enhancement. For example, Torres-Méndez et al. [13] and Iqbal et al. [14] used a Markov Random Field (MRF) and an Integrated Colour Model (ICM) to describe the distortion process. Huang et al. [15] proposed the relative global histogram stretching (RGHS) strategy in the RGB and CIE-Lab color models.

An image can also be described in the frequency domain. The high-frequency components of an image usually correspond to edge regions, where the brightness or color of the pixels changes abruptly, whereas the low-frequency components correspond to flat, large areas. To achieve a higher-quality image, the high-frequency components need more data, while the low-frequency components do not. Transform-domain image enhancement methods first convert the image from the spatial domain to the frequency domain using transforms such as the Fourier Transform [16], and then improve the quality of underwater images by boosting the high-frequency components while suppressing the low-frequency components. In 2010, Prabhakar et al. [17] used a homomorphic filter, an anisotropic filter, and adaptive wavelet subband thresholding to correct nonuniform illumination and to smooth and denoise the image. In 2016, Amjad et al. presented a wavelet-based fusion method [18] to address the low quality of underwater images. In 2017, Vasamsetti et al. proposed a wavelet-based perspective technique [19] for underwater images, which performs the discrete wavelet transform (DWT) [20] on the RGB channels to generate two decomposition levels and reconstruct the grayscale images.

The image formation model-based (IFM-based) method is another traditional approach. It analyzes the underwater imaging mechanism and the law of light propagation in water, and then constructs a physical model to restore high-quality images. Considering the optical properties of water, different prior-based methods have been used for underwater image restoration, including the dark channel prior (DCP) [21], the underwater dark channel prior (UDCP) [22], the red channel prior (RCP) [23], and the blurriness and light prior [24]. Using these priors, the background light (BL) and transmission map (TM) can be derived and substituted into the IFM for image restoration.

In recent years, many researchers have explored machine learning techniques to improve the quality of underwater images. Support Vector Machines (SVMs) [25], one class of machine learning methods, have mostly been used for underwater object detection [26]. The deterministic annealing algorithm [27, 28], developed from Lyapunov’s functional method, can be used to learn network parameters. Over the past few decades, deep learning has developed rapidly.

Deep learning works well using convolutional neural networks (CNNs) [29, 30] or generative adversarial networks (GANs) [31] trained by backpropagation [32]. The model in deep learning is unlike the physical model used in image restoration, so methods based on deep learning are, strictly speaking, neither image enhancement nor image restoration. Depending on the model used, deep learning methods can be divided into several categories. Sun et al. [33] suggested a pixel-to-pixel (P2P) network to enhance underwater images, whose encoder consists of three convolutional layers and whose decoder consists of three deconvolutional layers. The underwater generative adversarial network (UGAN) [34] was proposed to improve underwater image quality; its discriminator is a Wasserstein GAN with gradient penalty (WGAN-GP) [35], which imposes a soft constraint on the output. To address the scarcity of underwater images, Anwar et al. [36] proposed UWCNN, an end-to-end model trained on synthetic images. To take advantage of dense connections, residual networks, and multiscale networks, the multiscale dense block (MSDB) algorithm [37] was proposed to enhance underwater images. Besides single-branch networks, multibranch networks have been designed to learn different features from the same input; for example, UIE-Net [38] is composed of three subnetworks. In general, deep learning is divided into two classes, supervised learning and semisupervised/unsupervised learning, whose mainstream technologies are CNNs and GANs, respectively.

3. Method

3.1. Processing Pipelines for Dark and Blurred Underwater Images

The pipelines based on traditional methods and on deep learning can both be used to process dark and blurred underwater images, as shown in Figure 3. In our study, the dark environment is defined as a 100-fold reduction in exposure relative to normal exposure. The blurred underwater environment is created by adding a small amount of milk powder to the water. The traditional methods are divided into single algorithms and cascading algorithms, shown in the upper subimage surrounded by the dotted box.

The traditional pipeline using a single algorithm works well on normal-light underwater images, as shown in line A of Figure 3. However, it gives little consideration to low-light conditions and often performs poorly in low-light underwater environments.

The second type of traditional method cascades multiple low-level vision processing steps, shown in line B of Figure 3. The first step is luminosity scaling of the dark images. The images taken by the Nikon D700 are 14-bit RAW images, so the maximum brightness value is $2^{14} = 16384$. Experiments show that the pixel brightness values are below 50 in the 100-fold underexposed environment. The luminosity scaling step can be written as $v' = v \cdot \frac{2^{14}}{v_{\max}}$, where $v$ represents the brightness value of a pixel and $v_{\max}$ represents the maximum brightness value over all pixels. This simple luminosity scaling initially solves the underexposure problem, but it amplifies the noise along with the signal. To suppress the amplified noise, a noise reduction step immediately follows luminosity scaling; because BM3D [39] is a classic noise reduction algorithm, we select it as the baseline for this step. After brightness enhancement and noise reduction, the last step is underwater image enhancement and restoration.
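The following is a minimal NumPy sketch of the luminosity-scaling step. The 14-bit full-scale value follows the text above, while the array shape and values are purely illustrative; a denoiser such as the third-party bm3d package would then be applied to the scaled result.

```python
import numpy as np

def scale_luminosity(raw, full_scale=2 ** 14):
    """Stretch a dark 14-bit RAW image so its brightest pixel reaches full scale.

    Note: the scaling amplifies noise together with the signal, which is why a
    denoising step (e.g., BM3D) follows in the pipeline."""
    raw = raw.astype(np.float32)
    peak = raw.max()
    if peak == 0:
        return raw
    return np.clip(raw * (full_scale / peak), 0, full_scale)

# Pixels below 50 in a 14-bit image are stretched by a factor of roughly 16384 / 50.
dark = np.random.randint(1, 50, size=(8, 8)).astype(np.float32)   # illustrative data
bright = scale_luminosity(dark)
```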

The third type of traditional method is also a cascade, but it uses a single method to accomplish both brightness enhancement and denoising, shown in line C of Figure 3. Recently, many algorithms have been proposed to recover low-light images while keeping a high SNR, such as the Robust Retinex Model [40] and LIME [41]. Because LIME is a simple yet effective low-light image enhancement method, we select it as the baseline in our study. The principle of LIME is as follows: first, the illumination of each pixel is estimated individually by taking the maximum value over the RGB channels; the initial illumination map is then refined by imposing a structure prior on it; finally, enhancement is achieved using the refined illumination map. A rough sketch of the first stage is given below.
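In the sketch, the illumination of each pixel is estimated as the maximum over the RGB channels and then divided out. The structure-prior refinement is omitted and the gamma value is a hypothetical choice, so this is only an approximation of the published algorithm, not the reference implementation.

```python
import numpy as np

def lime_like_enhance(img_uint8, eps=1e-3, gamma=0.8):
    """Rough LIME-style enhancement (refinement by the structure prior omitted)."""
    img = img_uint8.astype(np.float32) / 255.0
    illum = img.max(axis=2, keepdims=True)          # initial per-pixel illumination map
    illum = np.clip(illum, eps, 1.0) ** gamma       # simple gamma adjustment, a stand-in for refinement
    return np.clip(img / illum, 0.0, 1.0)

# Illustrative call on a random low-light image.
low_light = np.random.randint(0, 60, size=(32, 32, 3), dtype=np.uint8)
enhanced = lime_like_enhance(low_light)
```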

Unlike traditional methods, which do not use ground truth images, a general deep learning neural network must be trained before the test phase, as shown at the bottom of Figure 3. The deep learning method includes a training phase (line D_1) and a test phase (line D_2). Data for the training phase are captured in air by placing tanks in front of the camera, while data for the test phase are captured in water. The details of the data collection procedure are described in the next section. In our work, a deep convolutional neural network [42] is used to restore dark and blurred underwater images. Specifically, a network similar to U-net [43] is used, inspired by recent algorithms [43, 44].

The structure of the proposed deep learning network is shown in Figure 4. The raw images are fed into the network, and the restored images are produced at the output.

In the network, block 1 includes three layers: two convolutional layers, abbreviated as conv2d(32, [3, 3]), and a max pooling2d layer. The parameter “32, [3, 3]” indicates that the number of output channels is 32 and the convolutional kernel size is 3 × 3. Blocks 2 to 8, respectively, include two conv2d(64, [3, 3]) and max pooling2d; two conv2d(128, [3, 3]) and max pooling2d; two conv2d(256, [3, 3]) and max pooling2d; two conv2d(512, [3, 3]); two conv2d(256, [3, 3]); two conv2d(128, [3, 3]); and two conv2d(64, [3, 3]). Block 9 includes two conv2d(32, [3, 3]) layers and a conv2d(12, [1, 1]) layer.
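A minimal PyTorch sketch of a network with this block structure is given below. The channel counts follow the description above, but the input channel count (packed RAW assumed to give 4 channels), the padding, the activation functions, and the use of transposed convolutions for upsampling are our assumptions, since those details are not specified in the text.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # two conv2d(c_out, [3, 3]) layers, as in the block description
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class UNetLike(nn.Module):
    """Sketch of the described network: 32-64-128-256 encoder, 512 bottleneck,
    256-128-64-32 decoder with skip connections, and a 1x1 conv to 12 channels."""
    def __init__(self, in_ch=4, out_ch=12):
        super().__init__()
        self.enc1, self.enc2 = double_conv(in_ch, 32), double_conv(32, 64)
        self.enc3, self.enc4 = double_conv(64, 128), double_conv(128, 256)
        self.bottleneck = double_conv(256, 512)
        self.pool = nn.MaxPool2d(2)
        self.up4 = nn.ConvTranspose2d(512, 256, 2, stride=2)  # upsampling choice is an assumption
        self.up3 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec4, self.dec3 = double_conv(512, 256), double_conv(256, 128)
        self.dec2, self.dec1 = double_conv(128, 64), double_conv(64, 32)
        self.head = nn.Conv2d(32, out_ch, 1)                  # conv2d(12, [1, 1])

    def forward(self, x):                                     # H and W must be divisible by 16
        e1 = self.enc1(x); e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2)); e4 = self.enc4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))
        d4 = self.dec4(torch.cat([self.up4(b), e4], dim=1))   # skip connections (blue arrows)
        d3 = self.dec3(torch.cat([self.up3(d4), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)
```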

The U-net network belongs to the family of autoencoder neural networks [45, 46], which are trained to map the input to the output. An autoencoder is divided into two parts: an encoder function h = F(x) and a decoder function G(h) that generates the reconstruction, shown as the left and right parts of Figure 4, respectively. The skip connections, shown as blue arrows in Figure 4, pass encoder feature maps directly to the decoder and help prevent vanishing gradients.

Because the algorithm runs offline, it is necessary to provide a computational complexity analysis. Computational complexity measures the amount of computing resources required for particular kinds of tasks, and each operation in deep learning consumes considerable resources. For a convolutional layer, let the kernel size be $F_w \times F_h$, let the numbers of input and output feature maps be $N_{in}$ and $N_{out}$, and let each output feature map have size $W_{out} \times H_{out}$. If one multiplication and one addition are required for each element, the total number of operations of the convolutional layer is $2 N_{in} N_{out} F_w F_h W_{out} H_{out}$. For a ReLU activation layer, there is only one comparison per element, so the number of operations equals the number of output elements, $N_{out} W_{out} H_{out}$. For a pooling layer, each filter has size $P_w \times P_h$, and the output is $N_{pout}$ feature maps of size $W_{pout} \times H_{pout}$; the number of operations of the pooling layer is approximately $P_w P_h W_{pout} H_{pout} N_{pout}$.
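These counts can be checked with a few lines of Python; the helper names and the example layer sizes below are purely illustrative.

```python
def conv_ops(n_in, n_out, k_w, k_h, w_out, h_out):
    """Operation count for a conv layer: 2 * N_in * N_out * F_w * F_h * W_out * H_out."""
    return 2 * n_in * n_out * k_w * k_h * w_out * h_out

def relu_ops(n_out, w_out, h_out):
    """One comparison per output element."""
    return n_out * w_out * h_out

def pool_ops(p_w, p_h, n_out, w_out, h_out):
    """Roughly P_w * P_h comparisons per pooled output element."""
    return p_w * p_h * n_out * w_out * h_out

# Example: the first 3x3 convolution of block 1 on a 512x512, 4-channel input (illustrative sizes).
print(conv_ops(n_in=4, n_out=32, k_w=3, k_h=3, w_out=512, h_out=512))
```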

3.2. Procedure of Collecting Data

Deep learning methods are divided into three categories, fully supervised, semisupervised, and unsupervised, according to whether the data are labeled. In our research, we treat underwater image restoration as a supervised learning task, in which the data labels must be given in the training phase.

A label is not simply the name of an image; its meaning varies across computer vision tasks. For example, a label is a number representing the category ID in an image classification task, the locations of objects in an object detection task, and the category of each pixel in image segmentation. In image restoration, the label is the high-quality image corresponding to the low-quality image.

The training and test phases are shown at the bottom of Figure 3. The training data in the training phase (line D_1) have two parts: the low-light, blurred images (left) and the Ground Truth images (right). The Ground Truth, represented by the clear Lena image, is the label of the dark images on the left. Because our method uses labeled data, it is a fully supervised method. The images before and after restoration must be the same size and preferably aligned pixel by pixel. The learning-based model learns the mapping between the two kinds of training data using the backpropagation (BP) algorithm [47]. The trained model is produced in the training phase, shown in line D_1. In the test phase (line D_2), the low-light images are fed into the trained model, which then generates the normal-light images.

The test and training data should ideally be drawn from the same probability distribution to ensure the effectiveness of the algorithm. Specifically, since the test data are taken in an extremely dark and blurred underwater environment, the training data need to be collected from the same or a similar environment. However, collecting underwater images is costly, which hinders the application of deep learning methods to underwater image processing. Several methods of collecting underwater images are discussed in Section 2. We propose a new method to collect underwater images, as shown in Figure 5.

In scene 1, the camera is fixed on a tripod, and a glass tank filled with clear water is placed on the table. The glass tank is highly transparent, and the water is clear. The images taken in scene 1 are used as the ground truth in our research. Then, the first tank is removed, and a second tank is placed in the same location; in other words, the only difference between scene 1 and scene 2 is the replacement of the tank. The second tank is filled with water mixed with suspended particles, simulated by adding milk powder to clear water. In scene 2, we capture dark images by adjusting the camera parameters, for example, reducing the exposure time, narrowing the aperture, and lowering the ISO value. Thus, we collect the low-quality underwater images in scene 2.

The bottom subimage in Figure 5 illustrates the refraction of the transmitted light, which follows Snell’s law: $n_1 \sin\theta_1 = n_2 \sin\theta_2$, where $\theta_1$ and $\theta_2$ are the angles of incidence and refraction and $n_1$ and $n_2$ are the indices of refraction of the two media. The law states that, for a given pair of media, the ratio of the sines of the two angles equals the inverse ratio of the indices of refraction of the respective media. The indices of refraction are about 1, 1.33, and 1.5 for the atmosphere, water, and glass, respectively, so the index of refraction of water is larger than that of the atmosphere. Because the thickness of the glass is much smaller than the width of the water in the bottom of Figure 5, the refractive effect of the glass is ignored. According to the law of refraction, the camera has a larger angle of view when the light passes through a tank filled with water. If we removed the tank in scene 1, the camera would photograph the scene directly through clear air; in that case, we could obtain even higher-quality label data (Ground Truth), but the angle of view would shrink. The images from scene 1 and scene 2 would then cover different shooting regions, and the two images of an image pair could not be aligned pixel by pixel. In summary, we can collect aligned image pairs from the two scenes.
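As a brief worked example of the refraction described above, consider a ray entering water from air; the 30° incidence angle is only an illustrative value, while the indices follow the text.

```latex
% Snell's law: n_1 \sin\theta_1 = n_2 \sin\theta_2
% Air (n_1 \approx 1) to water (n_2 \approx 1.33), incidence angle \theta_1 = 30^\circ:
\sin\theta_2 = \frac{n_1}{n_2}\sin\theta_1 = \frac{\sin 30^\circ}{1.33} \approx 0.376,
\qquad \theta_2 \approx 22.1^\circ
```

The ray therefore bends toward the normal as it enters the water.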

3.3. Scattering Models
3.3.1. Atmospheric Scattering Model

For an image shot in a scattering medium, only part of the light from the object reaches the camera because of absorption and scattering. Similar to (2), the atmospheric scattering model [48, 49] can be written as

$U(x) = I(x)\,T(x) + B\,(1 - T(x))$,  (3)

where $x$ denotes the pixel coordinates, $U(x)$ is the captured image, $I(x)$ is the clear image, $B$ is the global atmospheric light, and $T(x)$ is the proportion of light from the object that is transmitted to the camera. When the medium is homogeneous, $T(x)$ can be written as an exponential decay term:

$T(x) = e^{-\beta d(x)}$,  (4)

where $\beta$ represents the atmospheric attenuation coefficient and $d(x)$ is the distance from the object to the camera. In the atmospheric scattering model, attenuation is independent of wavelength. Since underwater images usually have a blurred appearance, the atmospheric scattering model can be used to describe the degradation of underwater images.
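A minimal NumPy sketch of this degradation model is given below; the clear image, the depth map, and the values of B and β are all illustrative placeholders.

```python
import numpy as np

def degrade(I, d, B=0.8, beta=1.2):
    """Apply U(x) = I(x)T(x) + B(1 - T(x)) with T(x) = exp(-beta * d(x)).

    I: clear image in [0, 1], shape (H, W, 3); d: per-pixel distance map, shape (H, W)."""
    T = np.exp(-beta * d)[..., None]      # transmission map, broadcast over color channels
    return I * T + B * (1.0 - T)

# Illustrative usage with a synthetic image and a constant 1 m distance.
I = np.random.rand(64, 64, 3)
U = degrade(I, d=np.ones((64, 64)))
```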

3.3.2. Underwater Scattering Model [4, 50]

Similar to the atmospheric scattering model, the underwater imaging model can be written as

$U_\lambda(x) = I_\lambda(x)\,T_\lambda(x) + B_\lambda\,(1 - T_\lambda(x))$,  (5)

where $\lambda$ denotes the wavelength of the light. The main difference between the underwater and atmospheric models is the effect of the wavelength: the attenuation of each wavelength must be calculated separately in (5), because light of different wavelengths attenuates at different rates underwater. Experiments show that light with a wavelength of about 500 nm (blue-green) has the smallest attenuation coefficient [51]; thus, underwater images tend to have a blue-green cast. In our research, the object is close to the camera, so the effect of the wavelength is ignored.

Similar to the corresponding equation in the atmosphere, the transmission ratio in underwater environments can be written as

$T_\lambda(x) = e^{-\beta_\lambda d(x)}$.  (6)

It can also be written as

$T_\lambda(x) = \dfrac{E_\lambda(x, d(x))}{E_\lambda(x, 0)} = Nrer(\lambda)^{d(x)}$,  (7)

where $E_\lambda(x, d(x))$ is the strength of the light after transmission over the distance $d(x)$, $E_\lambda(x, 0)$ is the energy of the light at the original location before transmission, and $Nrer(\lambda)$ is the normalized residual energy ratio.

3.3.3. Mixed Scattering Model in Underwater and Atmosphere

When two different media lie between the object and the camera, as shown in Figure 6, the scattering model is defined as a mixed scattering model. If the tank between the camera and the object is removed, the mixed scattering model reduces to the atmospheric scattering model. Similarly, if the atmospheric medium is removed, that is, if the object is placed directly in the water, the model reduces to the underwater scattering model.

The mixed model is expressed as

$U(x) = I(x)\,T_a(x)\,T_w(x) + B_a\,(1 - T_a(x)) + B_w\,(1 - T_w(x))$,  (8)

where $T_a(x)$ is the light transmittance ratio in air, $T_w(x)$ is the transmittance ratio in water with suspended particles, $B_a$ is the background light in air, and $B_w$ is the global background light in the tank. Because clear air can be approximated as containing few particles, $T_a(x)$ is approximately set to 1, so $I(x)T_a(x)T_w(x)$ is close to $I(x)T_w(x)$. Additionally, because the particles in air are far fewer than those in water, the term $B_a(1 - T_a(x))$ can be ignored. Following this analysis, the mixed model can be expressed as

$U(x) = I(x)\,T_w(x) + B_w\,(1 - T_w(x))$.  (9)

Comparing (9) with (5) shows that, when the light passes through the two media (air and water), an object placed in air is imaged as if it were placed in water. Taking advantage of this relation, the objects are placed in air to simulate an underwater scene in the experimental part of our research.

3.4. Formulation as an Image Restoration Task

To recover the clear image I(x), traditional underwater image restoration methods estimate not only I(x) but also the homogeneous global background light B and the medium energy ratio T(x) from an underwater image U(x), as shown in equation (3). The estimation is divided into two main steps: after first estimating B and T(x), the latent image I is reconstructed by inverting the underwater image formation model.

Unlike these conventional methods, methods based on deep learning directly estimate the latent image I without calculating the global background light or the medium energy ratio. Instead of estimating parameters, deep learning methods learn the residual information between the target latent image and the underwater image in a data-driven, end-to-end manner. We use the maximum a posteriori (MAP) estimator to explain the restoration procedure: the degradation from I(x) to U(x) is written as a nonlinear function $U(x) = \mathcal{F}(I(x))$, which can be shortened as $U = \mathcal{F}(I)$.

According to the Bayes rule, the maximization of the posterior probability can be written as

$\hat{I} = \arg\max_{I} p(I \mid U) = \arg\max_{I} \dfrac{p(U \mid I)\,p(I)}{p(U)}$,  (10)

where $p(U \mid I)$ is the likelihood of observing $U$ given $I$, and $p(I)$ is the prior on the latent image. Assuming a uniform distribution over the observations, the maximization of the posterior can be written as

$\hat{I} = \arg\max_{I} p(U \mid I)\,p(I)$.  (11)

In (11), because $U$ is a determined value, $p(U)$ is fixed and can be omitted. Further, (11) can be converted into minimizing the negative log-likelihood:

$\hat{I} = \arg\min_{I} \bigl[-\log p(U \mid I) - \log p(I)\bigr]$,  (12)

where the first term is the log-probability (data-fidelity) term, similar to the maximum likelihood method, which enforces the estimate to be faithful to the degraded image, and $-\log p(I)$ is the regularization prior term, encoding prior knowledge that influences the result.
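One way to see the link between equation (12) and the L1 loss used in Section 4.1 is to assume Laplacian-distributed residuals; this assumption is ours and is not stated in the paper, so the following is only an illustrative derivation.

```latex
% Assume U = \mathcal{F}(I) + n with i.i.d. Laplacian noise n.
-\log p(U \mid I) \;\propto\; \lVert U - \mathcal{F}(I) \rVert_1 ,
\qquad\text{so (12) becomes}\qquad
\hat{I} \;=\; \arg\min_{I}\; \lVert U - \mathcal{F}(I) \rVert_1 \;+\; \lambda\,\Phi(I)
```

where $\Phi(I) = -\log p(I)$ plays the role of the regularization prior and $\lambda$ balances the two terms.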

4. Experiments

Three factors, namely, big data, subtle algorithms, and hardware for parallel computing, have led to significant progress in deep learning. The high cost of collecting underwater images makes training data scarce for underwater restoration algorithms. We therefore designed the corresponding experimental scenes for collecting training and test data, shown in Figure 7, based on the theoretical analysis in Section 3.2.

In the training phase, two kinds of data are collected, in scene 1 and scene 2, as shown in Figure 5. Figure 7(a) shows scene 1, used for collecting the Ground Truth images in the training phase. Several measures are adopted to improve image quality. A Wi-Fi controller is used to trigger the camera remotely through the app “qDslrDashboard.” A polarizing filter is mounted in front of the lens to eliminate reflections from the glass. The tripod and table are fixed firmly in place. After collecting the Ground Truth data, the first tank, filled with clear water, is removed, and the second tank, filled with slightly muddy water, is placed in the same location. We then adjust the camera parameters to take the corresponding low-exposure images.

In the training phase, the distance between the tank and the camera is only about 2 cm, while the corresponding distance in the test phase is longer than 45 cm. The difference arises because the camera focuses on different objects in the two phases: in the training phase it focuses on the object behind the tank, whereas in the test phase it focuses on the object inside the tank. The closest focusing distance of the lens is 45 cm, so the camera must be placed at least 45 cm away. Training pictures picked randomly from the training set of about 50 images are shown in Figure 8. The images taken in the dark environment are clearly noisy and blurry.

The scene of the test phase is shown in Figure 7(b); several black cloths are used to obtain better test images. Firstly, the top of the tank is covered by a black cloth to avoid direct sunlight on the top. Secondly, the back of the tank is sheltered by a black cloth to make sure the scene has a black background. Thirdly, a big black cloth is placed behind the camera to prevent the rear light from shining on the tank and reflecting back to the camera. In the test phase, only one tank with slightly muddy water is provided, and the camera is adjusted to take the low-exposure images.

Based on recent advances in underwater and low-light image processing algorithms, the comparative experiments are divided into four categories, shown in lines A, B, C, and D of Figure 3, respectively.

Line A in Figure 3 describes the pipeline in which the underwater low-light images are processed directly by underwater enhancement and restoration algorithms. The comparative results of different underwater algorithms are shown in Figure 9.

It can be seen from Figure 9 that single underwater enhancement and restoration algorithms cannot effectively process underwater images taken in a dark environment. The results of the single underwater algorithms are too dark, except for column E, whose result shows abnormal bright spots and rough texture on smooth surfaces. Our algorithm achieves the result closest to the Ground Truth.

Objective image quality assessment (IQA) methods are used to measure the results. The classical full-reference (FR) methods, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [56], are selected for our quantitative analysis; higher values of these two FR metrics indicate better image quality. The other IQA methods selected are IL-NIQE [57] and NIQE [58]. IL-NIQE is a feature-enriched completely blind image quality evaluator, and NIQE is a completely blind image quality analyzer; both are no-reference (NR) methods.
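As a reference for how the two full-reference metrics can be computed, the following sketch uses scikit-image (version 0.19 or later is assumed for the channel_axis argument); the arrays here are placeholders for real restored and ground-truth image pairs.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(restored, ground_truth):
    """PSNR and SSIM between a restored image and its ground truth (uint8, HxWx3)."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored, channel_axis=-1, data_range=255)
    return psnr, ssim

# Illustrative call on dummy arrays standing in for a real image pair.
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
out = np.clip(gt.astype(int) + np.random.randint(-10, 10, gt.shape), 0, 255).astype(np.uint8)
print(full_reference_scores(out, gt))
```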

The results of objective IQA methods between our algorithm and underwater enhancement and restoration algorithms are shown in Table 2. The results show that our algorithm works better than the other algorithms.

The second image processing pipeline is shown in line B of Figure 3. This pipeline cascades the following three steps: luminosity scaling, denoising, and underwater processing algorithm. The first two steps are abbreviated as “S + D.” In the experiment, the classic BM3D [39] is selected in the denoising step. The results of the pipeline are shown in Figure 10.

Compared with the single underwater algorithms, all cascading pipelines obtain results with normal exposure, which we attribute to the luminosity-scaling step. The images in column C are fuzzier than those produced by our algorithm. The results of the other algorithms, such as those in columns E, F, and G, cannot restore the dark areas well. The results of the objective IQA methods are shown in Table 3.

The third kind of image processing pipeline is shown in line C of Figure 3. This pipeline cascades the low-light enhancement algorithm LIME [41] with an underwater image processing algorithm. The restoration results and the objective IQA scores are shown in Figure 11 and Table 4. In Figure 11, our algorithm exceeds all LIME + underwater enhancement and restoration combinations; column D (LIME + GBdehazingRCorrection) achieves a comparable effect in terms of color and brightness recovery. In terms of the objective IQA metrics, our algorithm also exceeds all of the LIME + underwater enhancement and restoration combinations.

The pipeline based on deep learning is shown in line D of Figure 3. In the field of computer vision, most current deep learning methods are applied to images taken in air. Some deep learning methods have been applied to underwater image recovery, but few can handle blurred and dark underwater images. Deep underwater image enhancement [36] is selected as a comparison algorithm. The restoration results and objective IQA scores are shown in Figure 12 and Table 5. Columns B, E, and F show a whitish cast, column A has low clarity and a color cast, columns D, E, and F cannot restore the background, and column C has rough particles on smooth surfaces. Because underwater image characteristics are more complex than those of images taken in air, many algorithms are not sufficiently robust. The objective IQA results show that our algorithm exceeds the compared deep learning-based underwater image enhancement algorithms.

4.1. Implementation Details

In all of our experiments, we used the L1 loss and the Adam optimizer [59]. We trained the network only on images taken with the Nikon D700 camera. The initial learning rate was set to 0.0001 and decreased according to a cosine schedule. The weight decay was set to 0.00001, and dampening was set to 0. Based on the practical effect observed in the experiments, the number of training epochs was set between 3000 and 5000. Our implementation was based on the Torch deep learning framework.
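The configuration above can be sketched in a few lines of PyTorch. The UNetLike model refers to the sketch in Section 3.1, the data loader is a dummy placeholder for the real image-pair loader, and the pixel-shuffle interpretation of the 12-channel output is our assumption; the sketch is illustrative rather than the exact training script.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = UNetLike()                                    # network sketch from Section 3.1 (assumed available)
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
epochs = 4000                                         # the paper reports 3000-5000 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Dummy stand-in for the real loader of (dark RAW, ground-truth RGB) pairs.
train_loader = [(torch.rand(1, 4, 256, 256), torch.rand(1, 3, 512, 512))]

for epoch in range(epochs):
    for dark, gt in train_loader:
        optimizer.zero_grad()
        out = F.pixel_shuffle(model(dark), 2)         # 12 channels -> RGB at 2x resolution (our assumption)
        loss = criterion(out, gt)
        loss.backward()
        optimizer.step()
    scheduler.step()
```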

5. Discussion

In this work, we presented a new method for collecting images that can be used in future machine learning research. With the help of a high-quality dataset, our algorithm achieves promising results in the restoration of extremely low-light underwater images.

In future work, we plan to improve on the following points. (1) Improved U-net variants can be used to boost performance. (2) The generalization ability of the method still needs to be studied, for example, under different depths and turbidity levels. (3) The experimental equipment for underwater image acquisition can be extended from a water tank to a pool.

Our research is valuable for underwater robotics, surveillance, and many other areas.

6. Conclusions

To see in dark and blurred underwater environments, we propose a new method of collecting underwater image pairs using two tanks filled with water of different turbidity under different lighting conditions. Experiments show that our approach to collecting underwater images is simple and highly effective. We demonstrate the efficacy of our algorithm for blurred and dark underwater image restoration through supervised learning, and the experiments show that it achieves promising results.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

All authors contributed to the paper. Huigang Wang carried out project administration; Yifeng Xu carried out conceptualization and methodology, investigation, and writing; Garth Douglas Cooper carried out review and editing; Shaowei Rong carried out data curation; and Weitao Sun carried out software.

Acknowledgments

This research was funded by the National Science Foundation of China (Grant no. 61571369). It was also funded by Zhejiang Provincial Natural Science Foundation (ZJNSF) (Grant no. LY18F010018) and by the 111 Project under Grant no. B18041.