S. no. | Model | Application | Network architecture | Methodology | Key advantage | Major limitation
--- | --- | --- | --- | --- | --- | ---
1 | SRGAN [9] | Image super-resolution | Generator: convolutional layers with small 3 × 3 kernels and 64 feature maps, batch normalization, and parametric ReLU. Discriminator: leaky ReLU activations. | Proposes a perceptual loss function composed of a content loss and an adversarial loss | Upscales low-resolution images to high resolution at a 4× upscaling factor | Generated textures are not realistic enough and are accompanied by some noise
2 | ACGAN [15] | Image synthesis | Generator: a series of deconvolutional (transposed convolutional) layers. Discriminator: a stack of 2D convolutional layers with leaky ReLU, followed by linear layers with a softmax and a sigmoid for its two outputs. | Two variants of the model were trained, generating images at 64 × 64 and 128 × 128 spatial resolutions | Accuracy can be assessed per class | Ignores the loss component arising from class labels when a label is unavailable for a given training image
3 | CGAN [16] | Image-to-image translation, image tagging, and face generation (Gauthier, J. (2014)) | Generator and discriminator are both conditioned on arbitrary external information; ReLU activations with a sigmoid output layer. | G minimizes and D maximizes a conditional two-player minimax value function | Can readily accept a multimodal embedding as the conditional input | Not strictly unsupervised; some form of labeling is required
4 | InfoGAN [1, 2] | Facial image generation | Generator: up-convolutional architecture with standard ReLU activations. Discriminator: leaky ReLU (slope 0.1) as the hidden-layer nonlinearity. | Learns disentangled representations by maximizing the mutual information between a subset of the latent codes and the observations | Capable of learning disentangled, interpretable representations | Sometimes requires adding noise to the data to stabilize the network
5 | DCGAN [91] | General image representations | Generator: ReLU activations with batch normalization, except for the output layer, which uses tanh. Discriminator: leaky ReLU activations with batch normalization for high-resolution modeling. | Learns a hierarchy of representations starting from object parts | Stable training, good image representations, easy convergence | Gradients can vanish or explode
6 | LAPGAN [10] | Image generation | Laplacian pyramid of convolutional networks. | Generates images in a coarse-to-fine fashion | Each pyramid level is trained independently | Nonconvergence and mode collapse
7 | SAGAN [11] | Image generation | Self-attention modules in a convolutional GAN; spectral normalization is applied to the generator. | Generates realistic images | Boosts the Inception score, reduces the Fréchet inception distance, and generates images sequentially | Attention is not extended further
8 | GRAN [12] | Generating realistic images | Recurrent CNN with constraints. | Generates images through incremental updates to a canvas using a recurrent network | Sequential generation of images | Samples collapse when trained for a long duration
9 | VAE-GAN [92] | Facial image generation | A GAN discriminator is used in place of the VAE's pixel-wise reconstruction loss, so the loss function is learned. | Combines a GAN and a VAE to obtain an encoder of the data into the latent space | The advantages of GAN and VAE are combined in a single model | Inherits the VAE's major drawback of blurry outputs
10 | BiGAN [18] | Image generation | Generator: in addition to G, an encoder maps the data to latent representations. Discriminator: discriminates jointly in both the data and latent spaces. | Learns features for related semantic tasks and uses them in unsupervised settings | Minimizes the reconstruction loss | Generator and discriminator are highly dependent on each other during training
11 | AAE [19] | Dimensionality reduction, data visualization, disentangling image style, and unsupervised clustering | Generative autoencoder. | Performs variational inference by matching an arbitrary prior distribution with the autoencoder's aggregated posterior over the hidden code vector | Provides a balanced approximation; can be extended to semi-supervised learning and compares favorably with variational autoencoders | Generated samples are blurry and over-smoothed (Zhang, J. et al., 2018)
12 | Pix2Pix [93] | Image-to-image translation | Generator: U-Net architecture. Discriminator: PatchGAN classifier. ReLU activations with batch normalization. | The network learns a loss adapted to the data and the task, making it applicable to a wide variety of tasks | Fewer parameters, and realistic images are generated | Training images must be paired one-to-one
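The SRGAN-style perceptual loss in row 1 (a content loss on feature maps plus a small weighted adversarial term) can be sketched in NumPy. The function names and the 10⁻³ weight are illustrative assumptions following the usual formulation, not the paper's implementation:

```python
import numpy as np

def content_loss(real_feats, fake_feats):
    # MSE between feature maps (e.g. from a pretrained network) of the
    # real high-resolution image and the super-resolved output
    return np.mean((real_feats - fake_feats) ** 2)

def adversarial_loss(d_fake):
    # Generator term: push the discriminator output d_fake (probability
    # that the super-resolved image is real) toward 1
    return -np.mean(np.log(d_fake + 1e-8))

def perceptual_loss(real_feats, fake_feats, d_fake, weight=1e-3):
    # Perceptual loss = content loss + weighted adversarial loss
    return content_loss(real_feats, fake_feats) + weight * adversarial_loss(d_fake)
```

When the generator reproduces the target features exactly and fools the discriminator completely, the loss approaches zero.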
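The conditioning described for CGAN in row 3 amounts to concatenating the external information onto the inputs of both G and D. A minimal NumPy sketch, using a one-hot class label as an illustrative choice of conditioning signal:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(labels, num_classes):
    # Encode integer class labels as one-hot row vectors
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def conditioned_input(noise, labels, num_classes):
    # CGAN conditioning: concatenate the label encoding onto the noise
    # vector fed to the generator; the same label is likewise
    # concatenated onto the discriminator's input
    return np.concatenate([noise, one_hot(labels, num_classes)], axis=1)

z = rng.standard_normal((4, 100))   # batch of 4 noise vectors
y = np.array([0, 2, 1, 2])          # class labels
g_in = conditioned_input(z, y, num_classes=3)
print(g_in.shape)  # (4, 103)
```

Because the label is a plain concatenated vector, any embedding (including a multimodal one, as the table notes) can be substituted for the one-hot encoding.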
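LAPGAN's coarse-to-fine generation in row 6 rests on the Laplacian pyramid decomposition: each level stores a band-pass residual, and the image is rebuilt by upsampling the coarsest level and adding residuals back in. A sketch using simple block-average downsampling and nearest-neighbour upsampling (an assumption for illustration; the original uses blur-based resampling):

```python
import numpy as np

def down(x):
    # 2x downsample by averaging 2x2 blocks
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    # 2x upsample by nearest-neighbour repetition
    return x.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    pyr, cur = [], img
    for _ in range(levels):
        small = down(cur)
        pyr.append(cur - up(small))  # band-pass residual at this scale
        cur = small
    pyr.append(cur)                  # coarsest (low-pass) level
    return pyr

def reconstruct(pyr):
    # Coarse-to-fine: upsample, then add the residual of the next level;
    # in LAPGAN a conditional generator produces each residual instead
    cur = pyr[-1]
    for resid in reversed(pyr[:-1]):
        cur = up(cur) + resid
    return cur

img = np.random.default_rng(1).random((8, 8))
pyr = laplacian_pyramid(img, levels=2)
```

With these resamplers the decomposition is exactly invertible, which is why each pyramid level can be modeled (and trained) independently.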