Review Article

Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis

Table 6. Recent advancements of deep GANs in computer vision.

| S. no | Application | Objective of the study | Methodology/network architecture | Performance | Dataset |
|---|---|---|---|---|---|
| 1 | Facial attribute transformation [14] | To develop a novel conditional recycle D-GAN (CRGAN) that transforms high-level face attributes while retaining the face's identity | The model has two phases: in the first, a conditional D-GAN generates fake facial images subject to a condition; in the second, a recycle D-GAN regenerates facial images so that attributes are modified without changing the identity (a conditional-generator sketch follows this table) | CRGAN was compared with existing D-GAN architectures and outperformed them | CelebA dataset |
| 2 | Fusion of images [94] | To propose a D-GAN-based method to fuse images belonging to different spectra | FusionNet was built on the Pix2Pix architecture to generate fused images from image fragments of different spectra | FusionNet was compared with existing fusion methods such as the cross bilateral filter, weighted least squares, and sparse joint representation, and performed on par with them | Dataset provided by experts |
| 3 | Synthesis of high-quality faces [95] | To propose a D-GAN-based method to synthesize high-quality images from polarimetric images | The generator subnetwork is built on an encoder-decoder network and is paired with a discriminator subnetwork; the generator is trained by optimizing identity, perceptual, and identity-preserving losses (a perceptual-loss sketch follows this table) | Qualitative and quantitative performance was compared with state-of-the-art methods; the perceptual loss yielded visually pleasing results | A dataset with polarimetric and visible facial signatures from 111 subjects |
| 4 | Vehicle detection in aerial images [96] | To develop a lightweight deep CNN model that, aided by a D-GAN, detects vehicles in aerial images effectively | The architecture has two parts: a lightweight deep CNN that detects vehicles accurately and a multicondition-constrained GAN that generates samples to cope with data insufficiency | The model achieved a mean average precision of 86.9% on the Munich dataset | Munich public dataset and a self-collected dataset |
| 5 | Image deraining [97] | To develop a deep learning model to remove rain streaks from images | A feature-supervised D-GAN was developed to remove rain from a single image; its two subnets generate derained images that are very close to the real image | Tested on synthetic and real-world images, the model outperformed existing state-of-the-art deraining methods | Real-world images and two synthetic datasets |
| 6 | Scene generation [12] | To develop a model that generates scenes based on a conditional D-GAN | A model named PSGAN was developed to generate multidomain particular scenes; image quality is improved by spectral normalization (a spectral-normalization sketch follows this table) | Compared with Pix2Pix and StarGAN, PSGAN achieved 97% accuracy against StarGAN's 95% | MNIST, CIFAR-10, and LSUN |
| 7 | Human pose estimation [98] | To develop a self-attention D-GAN for human pose estimation | The D-GAN uses hourglass networks, built from conv-deconv stages and residual blocks, as its backbone; the generator predicts the pose, while the discriminator enforces structural constraints to refine the postures | The model outperformed state-of-the-art methods on benchmark datasets | Leeds Sports Pose and MPII human pose datasets |
| 8 | Automatic pearl classification [57] | To develop deep learning models for the automatic classification of pearls | A multiview GAN expands the pearl image dataset, and a multistream CNN is trained on the expanded dataset | The images generated by the multiview GAN significantly improved the performance of the existing multistream CNN | The dataset includes 10,500 pearls in seven categories, with 1,500 pearls per category |
| 9 | Image dehazing [99] | To develop a deep learning model that recovers the image's texture information and enhances the visual quality of hazy scenes | An attention-to-attention generative network maps hazy images to haze-free images; all instance normalization layers are removed to generate high-quality images | The model performed better than state-of-the-art methods on both real-world and synthetic images | NYU2 synthetic dataset with 1,300 images and SUN3D synthetic dataset with 150 images |
| 10 | Gesture recognition [100] | To propose a new gesture recognition algorithm based on a deep CNN and DCGAN | The model recognizes the meaning of a given gesture; DCGAN is used to counter overfitting caused by data insufficiency, and preprocessing improves illumination conditions | An accuracy of 90.45% was achieved | Data collected with a computer, containing 1,200 images per gesture |
| 11 | Face depth estimation [101] | To develop a D-GAN-based method to estimate the depth map of a given facial image | A D-GAN estimates the depth of a 2D image for 3D reconstruction; data augmentation with transformations such as slight clockwise rotation, Gaussian blur, and histogram equalization improves model robustness | Several D-GAN variants were evaluated; the Wasserstein GAN proved the most robust for depth estimation (a WGAN loss sketch follows this table) | Texas 3D face recognition database and Bosphorus database for 3D face analysis |
| 12 | Image enhancement [102] | To propose an image enhancement model using a conditional D-GAN based on the nonsaturating game | A super-resolution method is combined with the D-GAN to generate clearer images; the 23-layer architecture is composed of convolution layers with ReLU activations | Compared with existing methods, the model improved the peak signal-to-noise ratio by 2.38 dB | Images from the Flickr and ImageNet datasets, used without augmentation |
| 13 | Retinal image synthesis [103] | To propose multiple-channels-multiple-landmarks, a preprocessing pipeline to synthesize retinal images from optic cup images | A residual neural network and U-Net are integrated into a residual U-Net that captures finer-scale details; the multiple-landmark maps comprise batch normalization, convolution, and ReLU layers, with a sigmoid activation in the final layer (a residual-block sketch follows this table) | The model outperformed existing single-vessel-based methods; Pix2Pix with the proposed pipeline generated realistic images | Public fundus image datasets DRIVE and DRISHTI-GS |
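
Several entries (rows 1, 6, and 12) build on conditional GANs, in which the generator receives an auxiliary condition alongside the noise vector. The following is a minimal PyTorch sketch of label conditioning; the class name, embedding scheme, and layer sizes are illustrative assumptions, not details from the cited papers:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Minimal label-conditioned generator: the class label is embedded and
    concatenated with the noise vector before being upsampled to an image."""
    def __init__(self, z_dim=100, n_classes=10, img_channels=3):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + n_classes, 256, 4, 1, 0),  # 1x1 -> 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),                # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                 # 8x8 -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1),        # 16x16 -> 32x32
            nn.Tanh(),
        )

    def forward(self, z, labels):
        cond = self.embed(labels)                          # (B, n_classes)
        x = torch.cat([z, cond], dim=1)[..., None, None]   # (B, z_dim + n_classes, 1, 1)
        return self.net(x)

# Usage: generate eight 32x32 images conditioned on random class labels.
g = ConditionalGenerator()
imgs = g(torch.randn(8, 100), torch.randint(0, 10, (8,)))  # (8, 3, 32, 32)
```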
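Row 3 trains its generator with a perceptual loss. A common formulation compares activations of a frozen VGG16 between generated and target images; the layer cut-off and the choice of an L1 distance below are assumptions for illustration, not details taken from [95]:

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """Feature-space L1 distance on frozen VGG16 activations. Inputs are
    assumed to already be ImageNet-normalized; that step is omitted here."""
    def __init__(self, layer_idx=16):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:layer_idx].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)  # the loss network is never updated

    def forward(self, generated, target):
        return nn.functional.l1_loss(self.features(generated), self.features(target))
```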
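Row 6 credits spectral normalization for PSGAN's improved image quality. In PyTorch this is a one-line wrapper around each discriminator layer that constrains its spectral norm, stabilizing adversarial training; the block below is an illustrative discriminator stub, not the PSGAN architecture itself:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectrally normalized discriminator stub: each weight matrix is rescaled
# by its largest singular value, keeping the discriminator Lipschitz-bounded.
disc = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2, inplace=True),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2, inplace=True),
    spectral_norm(nn.Conv2d(128, 1, 4, stride=1, padding=0)),
)
```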
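Row 11 found the Wasserstein GAN the most robust D-GAN variant for depth estimation. Its defining change is the critic objective, sketched minimally below; the weight clipping or gradient penalty needed to enforce the critic's Lipschitz constraint is deliberately omitted:

```python
import torch

def wgan_losses(critic, real, fake):
    """Wasserstein GAN objective: the critic widens the score gap between
    real and generated samples, while the generator raises the critic's
    score on its samples. Both are written here as losses to minimize."""
    critic_loss = critic(fake.detach()).mean() - critic(real).mean()
    gen_loss = -critic(fake).mean()
    return critic_loss, gen_loss
```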
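Row 13's residual U-Net combines U-Net skip connections with residual blocks. A sketch of one such block follows, assuming the batch-normalization/convolution/ReLU ordering described in the table; channel counts and the pre-activation layout are illustrative assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-activation residual block of the kind stacked inside residual
    U-Net encoders/decoders; the identity shortcut lets the block learn
    the finer-scale corrections the table attributes to this design."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut around the conv stack
```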