Review Article

Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis

Table 6. Recent advancements of deep GANs in computer vision.

| S. no | Application | Objective of the study | Methodology/network architecture | Performance | Dataset |
|---|---|---|---|---|---|
| 1 | Facial attribute transformation [14] | To develop a novel conditional recycle D-GAN (CRGAN) that transforms high-level face attributes while retaining the face's identity | The model has two phases: in the first, a conditional D-GAN generates fake facial images subject to a condition; in the second, a recycle D-GAN regenerates facial images so that attributes are modified without changing the identity (a conditional-generator sketch follows this table) | CRGAN was compared with existing D-GAN architectures and outperformed them | CelebA dataset |
| 2 | Fusion of images [94] | To propose a D-GAN-based method to fuse images belonging to different spectra | FusionNet was built on the Pix2Pix architecture to generate fused images from image fragments of different spectra | FusionNet was compared with existing fusion methods such as the cross bilateral filter, weighted least squares, and sparse joint representation, and performed on par with them | Dataset provided by experts |
| 3 | Synthesis of high-quality faces [95] | To propose a D-GAN-based method to synthesize high-quality images from polarimetric images | The generator subnetwork is built on an encoder-decoder network and is paired with a discriminator subnetwork; the generator is trained by optimizing identity, perceptual, and identity-preserving losses (a perceptual-loss sketch follows this table) | Qualitative and quantitative performance was compared with state-of-the-art methods; the perceptual loss yielded visually pleasing results | A dataset with polarimetric and visible facial signatures from 111 subjects |
| 4 | Vehicle detection in aerial images [96] | To develop a lightweight deep CNN model that, aided by a D-GAN, detects vehicles in aerial images effectively | The architecture has two parts: a lightweight deep CNN that detects vehicles accurately and a multicondition-constrained GAN that generates samples to cope with data insufficiency | The model achieved a mean average precision of 86.9% on the Munich dataset | Munich public dataset and a self-collected dataset |
| 5 | Image deraining [97] | To develop a deep learning model to remove rain streaks from images | A feature-supervised D-GAN was developed to remove rain from a single image; its two subnets generate derained images that are very close to the real image | Tested on synthetic and real-world images, the model outperformed existing state-of-the-art deraining methods | Real-world images and two synthetic datasets |
| 6 | Scene generation [12] | To develop a model that generates scenes based on a conditional D-GAN | A model named PSGAN was developed to generate multidomain particular scenes; image quality is improved by spectral normalization (a spectral-normalization sketch follows this table) | Compared with Pix2Pix and StarGAN, PSGAN achieved 97% accuracy against StarGAN's 95% | MNIST, CIFAR-10, and LSUN |
| 7 | Human pose estimation [98] | To develop a self-attention D-GAN for human pose estimation | The D-GAN uses hourglass networks, built from conv-deconv stages and residual blocks, as its backbone; the generator predicts the pose, while the discriminator enforces structural constraints to refine the postures | The model outperformed state-of-the-art methods on benchmark datasets | Leeds Sports Pose and MPII human pose datasets |
| 8 | Automatic pearl classification [57] | To develop deep learning models for the automatic classification of pearls | A multiview GAN expands the pearl image dataset, and a multistream CNN is trained on the expanded dataset | The images generated by the multiview GAN significantly improved the performance of the existing multistream CNN | The dataset includes 10,500 pearls in seven categories, with 1,500 pearls per category |
| 9 | Image dehazing [99] | To develop a deep learning model that recovers the image's texture information and enhances the visual quality of hazy scenes | An attention-to-attention generative network maps hazy images to haze-free images; all instance normalization layers are removed to generate high-quality images | The model performed better than state-of-the-art methods on both real-world and synthetic images | NYU2 synthetic dataset with 1,300 images and SUN3D synthetic dataset with 150 images |
| 10 | Gesture recognition [100] | To propose a new gesture recognition algorithm based on a deep CNN and DCGAN | The model recognizes the meaning of a given gesture; DCGAN is used to counter overfitting caused by data insufficiency, and preprocessing improves illumination conditions | An accuracy of 90.45% was achieved | Data collected with a computer, containing 1,200 images per gesture |
| 11 | Face depth estimation [101] | To develop a D-GAN-based method to estimate the depth map of a given facial image | A D-GAN estimates the depth of a 2D image for 3D reconstruction; data augmentation with transformations such as slight clockwise rotation, Gaussian blur, and histogram equalization improves model robustness | Several D-GAN variants were evaluated; the Wasserstein GAN proved the most robust for depth estimation (a WGAN loss sketch follows this table) | Texas 3D face recognition database and Bosphorus database for 3D face analysis |
| 12 | Image enhancement [102] | To propose an image enhancement model using a conditional D-GAN based on the nonsaturating game | A super-resolution method is combined with the D-GAN to generate clearer images; the 23-layer architecture is composed of convolution layers with ReLU activations | Compared with existing methods, the model improved the peak signal-to-noise ratio by 2.38 dB | Images from the Flickr and ImageNet datasets, used without augmentation |
| 13 | Retinal image synthesis [103] | To propose multiple-channels-multiple-landmarks, a preprocessing pipeline to synthesize retinal images from optic cup images | A residual neural network and U-Net are integrated into a residual U-Net that captures finer-scale details; the multiple-landmark maps comprise batch normalization, convolution, and ReLU layers, with a sigmoid activation in the final layer (a residual-block sketch follows this table) | The model outperformed existing single-vessel-based methods; Pix2Pix with the proposed pipeline generated realistic images | Public fundus image datasets DRIVE and DRISHTI-GS |
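
Several entries (rows 1, 6, and 12) build on conditional GANs, in which the generator receives an auxiliary condition alongside the noise vector. The following is a minimal PyTorch sketch of label conditioning; the class name, embedding scheme, and layer sizes are illustrative assumptions, not details from the cited papers:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Minimal label-conditioned generator: the class label is embedded and
    concatenated with the noise vector before being upsampled to an image."""
    def __init__(self, z_dim=100, n_classes=10, img_channels=3):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + n_classes, 256, 4, 1, 0),  # 1x1 -> 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),                # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                 # 8x8 -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1),        # 16x16 -> 32x32
            nn.Tanh(),
        )

    def forward(self, z, labels):
        cond = self.embed(labels)                          # (B, n_classes)
        x = torch.cat([z, cond], dim=1)[..., None, None]   # (B, z_dim + n_classes, 1, 1)
        return self.net(x)

# Usage: generate eight 32x32 images conditioned on random class labels.
g = ConditionalGenerator()
imgs = g(torch.randn(8, 100), torch.randint(0, 10, (8,)))  # (8, 3, 32, 32)
```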
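Row 3 trains its generator with a perceptual loss. A common formulation compares activations of a frozen VGG16 between generated and target images; the layer cut-off and the choice of an L1 distance below are assumptions for illustration, not details taken from [95]:

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """Feature-space L1 distance on frozen VGG16 activations. Inputs are
    assumed to already be ImageNet-normalized; that step is omitted here."""
    def __init__(self, layer_idx=16):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:layer_idx].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)  # the loss network is never updated

    def forward(self, generated, target):
        return nn.functional.l1_loss(self.features(generated), self.features(target))
```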
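Row 6 credits spectral normalization for PSGAN's improved image quality. In PyTorch this is a one-line wrapper around each discriminator layer that constrains its spectral norm, stabilizing adversarial training; the block below is an illustrative discriminator stub, not the PSGAN architecture itself:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectrally normalized discriminator stub: each weight matrix is rescaled
# by its largest singular value, keeping the discriminator Lipschitz-bounded.
disc = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2, inplace=True),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2, inplace=True),
    spectral_norm(nn.Conv2d(128, 1, 4, stride=1, padding=0)),
)
```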
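Row 11 found the Wasserstein GAN the most robust D-GAN variant for depth estimation. Its defining change is the critic objective, sketched minimally below; the weight clipping or gradient penalty needed to enforce the critic's Lipschitz constraint is deliberately omitted:

```python
import torch

def wgan_losses(critic, real, fake):
    """Wasserstein GAN objective: the critic widens the score gap between
    real and generated samples, while the generator raises the critic's
    score on its samples. Both are written here as losses to minimize."""
    critic_loss = critic(fake.detach()).mean() - critic(real).mean()
    gen_loss = -critic(fake).mean()
    return critic_loss, gen_loss
```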
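Row 13's residual U-Net combines U-Net skip connections with residual blocks. A sketch of one such block follows, assuming the batch-normalization/convolution/ReLU ordering described in the table; channel counts and the pre-activation layout are illustrative assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-activation residual block of the kind stacked inside residual
    U-Net encoders/decoders; the identity shortcut lets the block learn
    the finer-scale corrections the table attributes to this design."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut around the conv stack
```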