Abstract

Owing to the limited numerical aperture (NA) of a microscope objective, it is difficult to obtain a clear image of a specimen with a large depth of field (DOF). We propose a deep learning network model to simultaneously improve the imaging resolution and the DOF of optical microscopes. The proposed M-Deblurgan consists of three parts: (i) a deblurring module equipped with an encoder-decoder network for feature extraction, (ii) an optimal approximation module to reduce the error propagation between the two tasks, and (iii) a super-resolution (SR) module to super-resolve the image from the output of the optimal approximation module. The experimental results show that the proposed network model achieves the best performance among the compared methods: on the experimental dataset, the peak signal-to-noise ratio (PSNR) reaches 37.5326 dB and the structural similarity (SSIM) reaches 0.9551. The method also has potential in other imaging applications, such as mobile cameras and telescopes.

1. Introduction

In microscopy, it is difficult to obtain a clear image of a specimen with a large depth of field (DOF), especially at large NA. Optical axial scanning is usually used to solve this problem, but it is time-consuming. With the rapid development of deep learning in computer vision and image processing, it has become possible to use deep learning algorithms to obtain a clear image of a specimen with a large DOF. For example, super-resolution (SR) and deblurring algorithms are used to improve the resolution of images [1–9]. Recently, deep learning single-image super-resolution (SISR) models [10–12] and convolutional neural network (CNN) models [13] have been applied to improve the resolution and definition of microscope images, which indicates their great potential in microscopy. However, compared with the application of SISR to enhancing the textures of natural images, SR of microscopic images places higher requirements on the fineness of the predicted image. Therefore, it remains uncertain whether deep-learning-based SR can outperform traditional SR microscopy [14–16] in many scenarios. Existing deep learning image deblurring methods only address the blur caused by the point-spread function (PSF) of the optical microscope [17], while the blur in microscope images caused by insufficient DOF still needs to be resolved urgently. Traditional SR microscopy techniques such as localization microscopy [14], STED microscopy [15], and structured illumination microscopy (SIM) [16] provide great convenience for observing the internal workings of cells and various biological processes. However, they need relatively complicated experimental setups and require a priori knowledge about the sample and its preparation. An SR confocal microscope based on deep learning has been proposed to improve image resolution [18], but this method is based on photoactivated localization microscopy (PALM) and is difficult to generalize to other microscopic imaging systems. An SR method based on generative adversarial networks (GAN) enhances the textures of macroscale realistic photographs, but its fineness is far from the requirements of microscopic imaging [19]. An SR microscope based on deep neural networks offers greater accuracy and quantifiability of the inferred nanoscale structure [20], but the blur inherent in low-resolution (LR) images remains unresolved. A deblurring microscope based on a CNN has been proposed to remove the blur caused by the PSF [17], but this method ignores the blur caused by insufficient DOF. Therefore, there is an urgent need for a microscope with both SR and large-DOF capabilities.

In this paper, we propose a generative adversarial network framework (M-Deblurgan) based on optimal approximation to simultaneously address the low resolution and insufficient depth of field of an optical microscope. M-Deblurgan is composed of a generator network and a discriminator network that compete with each other. The generator network comprises three modules: a deblurring module, an optimal approximation module, and an SR module. The robustness of the model is tested on both a publicly available dataset and an experimental dataset. On the experimental dataset, the PSNR of the method reaches 37.5326 dB and the SSIM reaches 0.9551, showing improved deblurring results and resolution compared with other models.

2. Proposed Method

The proposed method is shown in Figure 1. The proposed deep learning network structure follows the GAN framework [21], in which two subnetworks are trained at the same time. One is a generator network that transforms blurred LR microscopic images into clear high-resolution microscopic images, and the other is a discriminator network that distinguishes the image generated by the generator from the real image.

2.1. Network Structure

The generator network is shown in Figure 1(a). We integrate the ideas of residual networks and encoder-decoder networks into the joint task of microscopic image deblurring and SR. The generator network consists of a deblurring module, an optimal approximation module, and an SR module.

In the deblurring module, we propose an improved encoder-decoder network and use it as a lightweight way to incorporate multiscale features. During the upsampling process, we output the final feature maps at five different scales. The encoder is made up of a 5 × 5 flat-convolutional layer and four 3 × 3 Res-Blocks. To broaden the receptive field and learn high-dimensional representations, the encoder employs three pooling layers. The decoder is made up of three 3 × 3 upsampling layers and three 3 × 3 Res-Blocks, which gradually restore the feature map to the original input resolution. Between the encoder and the decoder, we add four skip connections. These connections accelerate the network's convergence and produce significantly sharper outputs. To avoid introducing too many blurred features from the encoder side, we additionally apply a 3 × 3 flat-convolution layer before the element-wise addition. The feature maps of the different scales are resized to the same size (1/4 of the input) and concatenated into a tensor that contains rich image texture information across scales, as sketched below.
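The following is a minimal sketch of this multiscale aggregation step; the function name, channel counts, and interpolation mode are our assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def aggregate_multiscale(features, out_size):
    """Resize decoder feature maps taken at several scales to a common
    spatial size (1/4 of the input here) and concatenate them along the
    channel axis into one texture-rich tensor."""
    resized = [F.interpolate(f, size=out_size, mode="bilinear",
                             align_corners=False) for f in features]
    return torch.cat(resized, dim=1)

# e.g., five maps from different decoder stages, batch of 1, 64 channels each
maps = [torch.randn(1, 64, s, s) for s in (16, 32, 64, 128, 256)]
fused = aggregate_multiscale(maps, out_size=(64, 64))  # 1/4 of a 256 input
```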

Two upsampling and convolutional layers at the end of the network make the output size of the deblurring module equal to the input size and reduce artifacts. We have also improved the residual block: compared with the original ResNet, we remove the batch normalization layer from the ResBlock. The batch normalization layer normalizes the features toward the statistics of a specific task, increases the amount of computation, and reduces the flexibility of the network.
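A minimal sketch of the modified residual block, assuming 64 channels and ReLU activation (the paper does not specify these details):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with the batch normalization layers removed."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # identity shortcut; no BN anywhere in the block
        return x + self.body(x)
```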

In the optimal approximation module, firstly, the module avoids the accumulation of errors caused by a simple series connection of the deblurring module and the SR module; that is, it prevents the estimation error of the first module from being further propagated and amplified in the second module. Secondly, it makes full use of the correlation between the two tasks. The main formulas used in our optimal approximation method are as follows:

$$\hat{x} = \mathop{\arg\min}_{x_i \in S} \left\| x_i - x^{*} \right\|_2^2, \tag{1}$$

Among them,

$$S = \{ x_i \mid i = 1, \dots, N \}. \tag{2}$$

$N$ in equation (2) represents the number of inputs, and $x_i$ represents the $i$-th input; $N$ is 3 in this network. $S$ represents the set of inputs, whose elements are the input image, the output of the encoder-decoder network, and the output of the deblurring module, respectively, and $x^{*}$ denotes the actual optimal input. The two formulas are used to determine which of the three inputs is closest to the actual optimal input. The input image and the output of the feature module are connected to the optimal approximation module through skip connections. Through the screening of the optimal approximation module, the candidate closest to the real image is output. The forward and backward skip connections minimize the influence of errors in the front part of the network on backpropagation and reduce network coupling.
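A minimal sketch of this screening idea, assuming a simple per-sample L2 criterion against a reference image; the paper does not spell out the exact selection mechanism, so the function below is only illustrative:

```python
import torch

def optimal_approximation(candidates, reference):
    """Pick, for each sample in the batch, the candidate tensor closest
    to the reference. `candidates` holds the N = 3 inputs: the input
    image, the encoder-decoder output, and the deblurring-module output."""
    stacked = torch.stack(candidates)                      # (N, B, C, H, W)
    errors = ((stacked - reference.unsqueeze(0)) ** 2).mean(dim=(2, 3, 4))
    best = errors.argmin(dim=0)                            # (B,)
    return stacked[best, torch.arange(best.numel())]       # (B, C, H, W)
```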

In the super-resolution module, the output of the optimal approximation module is used as the input of the SR module. The module has three convolutional layers with a kernel size of 3. Then, the feature map resolution is increased by two trainable 2× subpixel convolutional layers. A convolutional layer with a kernel size of 3 is added at the tail to convert the features to an RGB image and remove redundant artifacts. We add residual learning to the super-resolution module with a global skip connection to jointly reconstruct the super-resolved image, making the final image closer to the real image.
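A minimal sketch of this SR tail with two 2× subpixel (pixel-shuffle) stages; the 64-channel width and the ReLU activations are assumptions:

```python
import torch.nn as nn

# three 3x3 feature convolutions, two trainable 2x subpixel stages,
# and a final 3x3 convolution that maps features back to RGB
sr_module = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64 * 4, 3, padding=1), nn.PixelShuffle(2),  # 2x up
    nn.Conv2d(64, 64 * 4, 3, padding=1), nn.PixelShuffle(2),  # 2x up
    nn.Conv2d(64, 3, 3, padding=1),                           # to RGB
)
```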

The input of the discriminator is composed of the image generated by the generator and the real image, and the output is the probability that the image generated by the generator is the real image. The network structure is shown in Figure 1(b). The network is similar to the VGG network, using strided convolutions to downsample the image while doubling the number of feature maps. Except for the sigmoid function used in the last layer, the Leaky ReLU activation function with a slope of 0.2 is used after all other layers. In addition, we use a global average pooling layer to replace the first fully connected layer, which prevents overfitting and reduces the number of parameters during training.
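A VGG-style discriminator sketch consistent with the description above; the depth and channel widths are our assumptions:

```python
import torch.nn as nn

def block(c_in, c_out, stride):
    # strided convolution halves the spatial size when stride = 2
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride, 1),
                         nn.LeakyReLU(0.2, inplace=True))

discriminator = nn.Sequential(
    block(3, 64, 1),    block(64, 64, 2),     # downsample, widen features
    block(64, 128, 1),  block(128, 128, 2),
    block(128, 256, 1), block(256, 256, 2),
    nn.AdaptiveAvgPool2d(1),   # global average pooling replaces the
    nn.Flatten(),              # first fully connected layer
    nn.Linear(256, 1),
    nn.Sigmoid(),              # probability that the input image is real
)
```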

2.2. Network Loss

The designed objective function is the combination of three different loss functions in M-Deblurgan: the pixel-wise mean squared error (MSE) loss $\mathcal{L}_{MSE}$, the perceptual loss $\mathcal{L}_{perc}$, and the SSIM loss $\mathcal{L}_{SSIM}$. In the proposed method, the MSE loss is calculated as follows:

$$\mathcal{L}_{MSE} = \left\| D_{e}(F(x)) - x_{c} \right\|_2^2 + \left\| S\big(D_{e}(F(x))\big) - y \right\|_2^2, \tag{3}$$

Among them, $x$, $x_{c}$, $D_{e}(F(x))$, and $y$ represent the LR blurred microscopic image, the LR clear image, the deblurred image, and the high-resolution clear image, respectively. $F$ represents the feature extraction module, $D_{e}$ represents the deblurring module, and $S$ represents the SR module. The perceptual loss is calculated as follows:

$$\mathcal{L}_{perc} = \frac{1}{W_{i,j} H_{i,j}} \sum_{w=1}^{W_{i,j}} \sum_{h=1}^{H_{i,j}} \Big( \phi_{i,j}(y)_{w,h} - \phi_{i,j}\big(S(D_{e}(F(x)))\big)_{w,h} \Big)^{2}, \tag{4}$$

Among them, $\phi_{i,j}$ represents the feature map output by the $j$-th convolutional layer before the $i$-th pooling layer when the image is fed into the pretrained VGG19 network model, and $W_{i,j}$ and $H_{i,j}$ represent the dimensions of the feature map. Specifically, we aim to minimize the generator loss $\mathcal{L}_{G}$ from the following equation:

$$\mathcal{L}_{G} = -\log D\big(G(x)\big) + \lambda_{1} \mathcal{L}_{MSE} + \lambda_{2} \mathcal{L}_{SSIM} + \mathcal{L}_{perc}, \tag{5}$$

where $G(x)$ is the generative model output, $D(\cdot)$ is the predicted result of the discriminator, and $y$ is the HR image used as ground truth. Since the images generated by the model often have unsatisfactory artifacts and excessively smooth textures, $\lambda_{1}$ and $\lambda_{2}$ were set so that the MSE loss and the SSIM loss account for ∼5–15% of the combined generator model loss $\mathcal{L}_{G}$.
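A sketch of this combined objective under the definitions above; the λ values, the truncation point of the VGG19 features, and the pytorch_msssim dependency are our assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19
from pytorch_msssim import ssim  # assumed third-party SSIM implementation

# frozen, truncated VGG19 feature extractor for the perceptual term
vgg = vgg19(pretrained=True).features[:36].eval()
for p in vgg.parameters():
    p.requires_grad = False
mse = nn.MSELoss()

def generator_loss(g_out, y, d_pred, lam1=0.1, lam2=0.1):
    """Adversarial + perceptual + weighted MSE + weighted SSIM terms;
    lam1/lam2 are placeholders tuned so the MSE and SSIM terms stay at
    roughly 5-15% of the total, as stated in the text."""
    adv = -torch.log(d_pred + 1e-8).mean()      # adversarial term
    perc = mse(vgg(g_out), vgg(y))              # perceptual term
    return adv + perc + lam1 * mse(g_out, y) \
           + lam2 * (1.0 - ssim(g_out, y, data_range=1.0))
```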

3. Experimental Results and Analysis

The proposed model is trained and tested on a PC (Intel Core i7-10750H CPU @ 2.6 GHz + GTX 1660 Ti) running the Windows 10 operating system, using the PyTorch 1.7.0 platform. We set the learning rate to $10^{-4}$ for the first 150 epochs and linearly decay it to $10^{-7}$ over the subsequent 50 epochs.
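This schedule can be reproduced with a LambdaLR scheduler; the Adam optimizer choice and the placeholder model below are our assumptions:

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stands in for the generator
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def lr_lambda(epoch):
    # constant 1e-4 for 150 epochs, then linear decay toward 1e-7 at epoch 200
    if epoch < 150:
        return 1.0
    return 1.0 - (epoch - 150) / 50 * (1.0 - 1e-3)  # 1e-3 = 1e-7 / 1e-4

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(200):
    # ... one epoch of GAN training goes here ...
    scheduler.step()
```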

The FIRE (https://www.ics.forth.gr/cvrl/fire/) dataset is used as the simulated dataset for training and testing the proposed network. We cropped the dataset into 5360 images with a size of 512 × 512, of which 4840 images are used for training and 520 images for testing. For training, the images are downsampled by a factor of 2 through bicubic interpolation to obtain LR images with a spatial resolution of 256 × 256. Finally, the blurry LR inputs are generated by applying a Gaussian kernel (see the sketch after this paragraph). The proposed network model takes a 256 × 256 blurry LR input and outputs a clear SR image of size 512 × 512. The results of the experiment are shown in Figure 2. Figure 2(a) is an original uncropped image. Figures 2(b), 2(d), and 2(f) are enlarged views of the red frame region in Figure 2(a), and Figures 2(c), 2(e), and 2(g) are enlarged views of the blue frame region in Figure 2(a). Figures 2(b) and 2(c) are blurry LR inputs. Using Figures 2(b) and 2(c) as the model input, the outputs are the clear SR images shown in Figures 2(d) and 2(e). From the experimental results, Figures 2(d) and 2(e) show that the blur of the small blood vessels is removed and the resolution of the image is improved (as indicated by the red and blue arrows). The restored images are very close to the ground truth (Figures 2(f) and 2(g)) in terms of detail and intensity information.
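A sketch of this degradation pipeline; the Gaussian kernel size and sigma are assumptions, since the paper does not state them:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def degrade(hr, kernel_size=9, sigma=1.5):
    """Turn a batch of 512x512 HR crops into blurry 256x256 LR inputs:
    bicubic downsampling followed by a Gaussian blur."""
    lr = F.interpolate(hr, size=(256, 256), mode="bicubic",
                       align_corners=False)
    return TF.gaussian_blur(lr, kernel_size=kernel_size, sigma=sigma)

hr_batch = torch.rand(4, 3, 512, 512)   # stand-in for FIRE crops
lr_blurry = degrade(hr_batch)           # (4, 3, 256, 256)
```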

Our model performs very well on the simulated dataset. It is therefore necessary to use the PINE CONES dataset, collected in a real and complex experimental environment, to verify the quality of the model. The experimental image acquisition and processing pipeline is shown in Figure 3.

Microscopic defocused images of female pine cones are obtained with the ×5 objective lens of a biological microscope. At the same time, clear images from the corresponding ×10 objective are obtained as the ground truth for model training. The experimental observation setup is shown in Figure 4; the specimen is a female pine cone. We train and test our proposed model and other SR and deblurring methods on the PINE CONES dataset. The results are shown in Figure 5. Figure 5(a) is obtained with the ×5 objective and exhibits defocus blur. Figures 5(b)–5(d) are the images obtained by processing Figure 5(a) with the RL-Deconv + SR method [22], the U-net + SR method [23], and our method, respectively. Figure 5(e) is obtained with the ×10 objective. Figures 5(f)–5(j) and 5(k)–5(o) are enlarged views of the same two areas in Figures 5(a)–5(e), respectively; these areas are marked in Figures 5(a)–5(e) with solid red and blue frames. It can be seen from the enlarged images (Figures 5(f)–5(o)) that the restorations of the blurry LR image by the RL-Deconv + SR and U-net + SR methods show serious distortions. The cell image recovered by the RL-Deconv + SR method is shown in Figures 5(g) and 5(l): the nucleus and the cell wall are diffused. The cell image recovered by the U-net + SR method shows unsatisfactory artifacts in Figure 5(h), and the sharpening of the cell wall in Figure 5(m) is insufficient. In contrast, the image processed by our method in Figure 5(d) is close to Figure 5(e).

The model proposed in this paper reconstructs a clear, high-resolution image from a low-resolution blurred microscopic input while maintaining a small model size. In the prediction phase, results are output quickly, since only the generator network is required to reconstruct the image. Because the generator of the proposed method is a fully convolutional neural network, the size of the input images is not limited. As shown in Figure 5, our proposed method generates clearer images in real microscope observation scenes and is more robust than the other methods.

To measure the quantitative performance of the proposed network, we consider two conventional indicators, PSNR and SSIM. Table 1 shows the quantitative comparison between the proposed method and other methods in the testing phase. The results show that the proposed network is superior to the existing methods in terms of PSNR and SSIM, with improvements of 4.8033 dB in PSNR and 0.053 in SSIM on the PINE CONES dataset. By introducing the optimal approximation module, the performance of the proposed network is greatly improved.
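Both indicators can be computed with scikit-image (version 0.19 or later for the channel_axis argument); the array names below are placeholders:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt = np.random.rand(512, 512, 3)        # placeholder ground-truth image
restored = np.random.rand(512, 512, 3)  # placeholder network output

psnr = peak_signal_noise_ratio(gt, restored, data_range=1.0)   # in dB
ssim = structural_similarity(gt, restored, channel_axis=-1,
                             data_range=1.0)
print(f"PSNR = {psnr:.4f} dB, SSIM = {ssim:.4f}")
```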

Comparing the U-net + SR model and our model in Figure 6, the two models perform similarly in the initial training stage. When the epoch approaches 200, the PSNR of M-Deblurgan is 2.7 dB higher than that of the other model, which demonstrates the superiority of the proposed model. Combined with the data in Figure 7, it is not difficult to find that the proposed method has fewer parameters and higher PSNR values than the other deep learning models. Fewer parameters consume less GPU memory and improve the training efficiency of the model. Therefore, M-Deblurgan is more robust. From the quantitative comparison, we can conclude that M-Deblurgan with the optimal approximation module is conducive to generating clearer images with higher perceptual quality.

4. Conclusion

In this paper, we propose a deep learning model (M-Deblurgan) to improve the imaging resolution and DOF of traditional optical microscopes. M-Deblurgan does not require any manual manipulation of the PINE CONES dataset during training. Furthermore, a clear SR output can be obtained quickly when blurry LR microscopic images are used as input. The experimental results show that our proposed method, especially the optimal approximation module, overcomes the limitations of a simple combination of the deblurring and super-resolution tasks. On the experimental dataset, the peak signal-to-noise ratio (PSNR) of our method reaches 37.5326 dB and the structural similarity (SSIM) reaches 0.9551. The model is superior to existing methods both in visual quality and in objective indicators such as PSNR and SSIM.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61975139 and 61927809).