Review Article

Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis

Table 3: Recent advancements of the D-CNN in computer vision.

| S. no | Application | Objective of the study | Methodology/network architecture | Performance | Dataset | Depth/layer sizes |
|---|---|---|---|---|---|---|
| 1 | Olive fruit variety classification [53] | To provide a computer vision methodology for automatic classification of seven different olive fruit varieties using an image processing technique | Six different D-CNN architectures, namely, AlexNet, residual neural network-50, residual neural network-101, inception-residual neural network V2, Inception V1, and Inception V3, were employed. | AlexNet: 89.90%; residual neural network-50: 94%; residual neural network-101: 95.91%; inception-residual neural network V2: 91.81%; Inception V1: 94.86%; Inception V3: 95.33%. | The dataset was generated from 400 photographs of olive fruits of each variety. | Five convolutional layers |
| 2 | Fabric defect detection [54] | On-loom fabric defect detection that combines image preprocessing, candidate defect map generation, fabric motif determination, and a D-CNN | Seven-layer D-CNN with a pairwise potential activation layer as the third hidden layer. | Accuracy for predicting the presence of a defect in an image: 95%. Accuracy for counting the number of defects in an image: 98%. | Fabric defect dataset created using an on-loom fabric imaging system. | Seven-layer CNN, which includes a pairwise potential activation layer |
| 3 | Polarimetric synthetic aperture radar image classification [55] | To classify the pixels of polarimetric synthetic aperture radar images into terrain types, namely, forest, water, grass, and sand | Complex-valued D-CNN and real-valued D-CNN. | Complex-valued D-CNN: 93.4%; real-valued D-CNN: 89.9%. | The performance is analyzed with the airborne steered array radar dataset and the electronically steered array radar dataset. | Six-layer CNN |
| 4 | Visual aesthetic quality assessment [56] | To present a biological model for three tasks: aesthetic score regression, aesthetic quality classification, and aesthetic score distribution prediction | A double-subnet gated peripheral-foveal convolutional neural network with a foveal and a peripheral subnet. The peripheral subnet mimics peripheral vision, while the foveal subnet extracts fine-grained features. | Gated peripheral-foveal convolutional neural network (VGG16): 80.70%. | Standard aesthetic visual assessment and Photo.net datasets are used for unified aesthetic prediction tasks. | Nine-layer CNN |
| 5 | 3D object recognition [57] | To take multiview images captured from partial angles as the input and perform 3D object detection using the 3D CNN | 3D object information is encoded along the 3D spatial dimension; 3D kernels are applied to the view images to perform 3D convolution. | ModelNet10 dataset: 94.5%. ModelNet40 dataset: 93.9%. | Pearl image dataset with 10,500 images split into seven classes. | Eight layers |
| 6 | Medical image classification [58] | To develop a feature extractor using a fine-tuned D-CNN and to classify medical images | Fine-tuned AlexNet and GoogLeNet D-CNN architectures were used in two ways: (i) as an image feature extractor to train multiclass support vector machines; (ii) as a classifier to generate softmax probabilities. | GoogLeNet: 81.03%. AlexNet: 77.55%. | ImageCLEF 2016 medical image public dataset with 6776 training images and 4166 testing images. | Eight layers: five convolutional layers followed by three fully connected layers, plus max pooling layers |
| 7 | Detection and recognition of dumpsters [59] | Visual detection of dumpsters using a twofold methodology with minimal labeling of the dataset to have a census of their types and numbers | Google Inception v3 is used, and a D-CNN is pretrained with 1,500,000 images corresponding to 1000 different classes. ReLU is used as the activation function. | 94%. | Dumpsters dataset with 27,624 labeled images provided by Ecoembes. | 27-layer deep CNN |
| 8 | Classification of rice grain images [60] | Localization and classification of rice grain images using the contrast-limited adaptive histogram equalization technique | A region-based D-CNN is used to localize and classify rice grains. Dropout regularization and transfer learning were used to avoid overfitting. | 81% accuracy is achieved, against 50–76% accuracy for human experts. | MIMR1 to MIMR8 datasets to classify rice into sticky and paddy rice. | Residual neural network-50, which is 50 layers deep, is used as the main architecture |
| 9 | Object recognition [61] | To evaluate the performance of the inception recurrent residual convolutional neural network model on benchmark datasets, namely, Tiny ImageNet-200, Canadian Institute for Advanced Research-100, CU3D-100, and Canadian Institute for Advanced Research-10 | An inception recurrent residual convolutional neural network, a deep CNN model, was introduced. It combines the strengths of the residual network, inception network, and recurrent convolutional neural network. | 72.78% accuracy was achieved, which is 4.53% better than a recurrent CNN. | The model's performance is evaluated on different benchmark datasets, namely, Canadian Institute for Advanced Research-10, Canadian Institute for Advanced Research-100, Tiny ImageNet-200, and CU3D-100. | Five-layer CNN |
| 10 | 3D object classification [62] | To successfully classify 3D objects for mobile robots irrespective of the starting positions of object modeling | A novel 3D object representation using the 3D CNN with a row-wise max pooling layer and a cylinder occupancy grid was introduced. | The cylindrical occupancy grid performed better than the existing rectangular algorithms, with an accuracy of 91%. | The performance assessment is done on the ModelNet10 dataset and a dataset with six classes collected using the mobile robot. | Three-layer CNN |
| 11 | Product quality control [63] | To automatically detect defects in products using a fast and robust deep CNN | A simplified D-CNN architecture consisting of nested convolutional and pooling layers with the ReLU activation function is used. It has two parts: a classification frame and a detection frame. | An accuracy of 99.8% is achieved on a benchmark dataset provided by the German Association of Pattern Recognition. | Deutsche Arbeitsgemeinschaft für Mustererkennung e.V. (German) dataset with six classes. Each class consists of 1000 defect-free images and 150 defective images. | 11-layer CNN |
| 12 | Recognition of Chinese food [64] | To propose an efficient D-CNN architecture for Chinese food recognition | A 5-layer deep CNN architecture performs a pipeline of processing to optimize the entire network through backpropagation. | 97.12% accuracy was achieved, which is better than other bag feature methods. | Chinese food image dataset composed of 8734 images under 25 food categories. | Five-layer deep CNN |
| 13 | Age recognition from facial images [65] | Recognition of age from facial images using pretrained models | MobileNetV2, residual neural network, and pretrained models such as soft stagewise regression networks were used for age recognition. Multiclass classification is performed, followed by regression to calculate the age. | Residual neural networks performed better than the other two network models. | The dataset contains 460,723 photographs from the Internet Movie Database cinema website and another dataset with 62,328 pictures from Wikipedia. | Three-layer CNN |
| 14 | Person recognition [66] | To develop an effective and efficient multiple-person recognition system for face recognition in random video sequences using the D-CNN | Multiple video faces are detected, and the VGG-19 D-CNN classifier is trained to identify the facial images. The model is tested using the standard Labeled Faces in the Wild database. | 96.83% accuracy is achieved using the VGG-19 D-CNN classifier. | In the training phase, images from the Chinese Academy of Sciences WebFace database with 9000 classes were utilized. In the validation phase, Labeled Faces in the Wild was used. | Three-layer CNN |
| 15 | Multiformat digit recognition [67] | To recognize unconstrained natural images of digits using DIGI-Net | The DIGI-Net D-CNN architecture is trained and tested on MNIST, the CVL single digit dataset, and the digits of the Chars74K dataset. | MNIST: 99.11%; Computer Vision Lab single digit dataset: 93.29%; digits of the Chars74K dataset: 97.60%. | The performance evaluation is done on MNIST, the CVL single digit dataset, and the digits of the Chars74K dataset. | Seven-layer deep CNN |
| 16 | Liver image classification [68] | To develop a perceptual hash-based D-CNN that reduces the time taken to classify liver computed tomography images | A fused perceptual hash-based D-CNN, a hybrid model, was designed to identify malignant and benign masses using computed tomography images. | 98.2% accuracy was achieved using the fused perceptual hash-based D-CNN. | The dataset is obtained from Elazig Education and Research Hospital Radiology Laboratory. | Seven-layer deep CNN |
| 17 | Object detection [49] | Hardware-oriented ultrahigh-speed object detection using the D-CNN | Two-stage high-speed object detection: bounding boxes in the first stage and D-CNN-based classification in the second stage. | MNIST dataset: 98.01%. Self-built dataset: 96.5%. | MNIST and self-built dataset. | Five-layer deep CNN |
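
To make the "Depth/layer sizes" column more concrete, the following minimal sketch traces feature-map shapes through an eight-layer network of the kind listed for AlexNet (five convolutional layers followed by three fully connected layers, with max pooling in between). The kernel sizes, strides, channel counts, 227 x 227 input, and 1000-class output are assumptions taken from the commonly cited AlexNet layout, not values reported in Table 3 or in [53] or [58]; the names `Layer`, `ALEXNET_LIKE`, `trace_shapes`, and `weighted_depth` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str              # "conv", "pool", or "fc"
    out_channels: int = 0  # kernels for conv, units for fc; unused for pool
    kernel: int = 0
    stride: int = 1
    padding: int = 0


def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed hyperparameters: the commonly cited AlexNet layout, not Table 3 data.
ALEXNET_LIKE = [
    Layer("conv1", "conv", 96, kernel=11, stride=4),
    Layer("pool1", "pool", kernel=3, stride=2),
    Layer("conv2", "conv", 256, kernel=5, padding=2),
    Layer("pool2", "pool", kernel=3, stride=2),
    Layer("conv3", "conv", 384, kernel=3, padding=1),
    Layer("conv4", "conv", 384, kernel=3, padding=1),
    Layer("conv5", "conv", 256, kernel=3, padding=1),
    Layer("pool5", "pool", kernel=3, stride=2),
    Layer("fc6", "fc", 4096),
    Layer("fc7", "fc", 4096),
    Layer("fc8", "fc", 1000),  # assumed class count; a fine-tuned model would differ
]


def trace_shapes(layers, in_channels=3, in_size=227):
    """Return (layer name, output shape) pairs for a square input image."""
    channels, size = in_channels, in_size
    shapes = []
    for layer in layers:
        if layer.kind == "fc":
            # Fully connected layers flatten their input to a vector of units.
            shape = (layer.out_channels,)
        else:
            size = conv_out_size(size, layer.kernel, layer.stride, layer.padding)
            if layer.kind == "conv":
                channels = layer.out_channels  # pooling keeps the channel count
            shape = (channels, size, size)
        shapes.append((layer.name, shape))
    return shapes


def weighted_depth(layers):
    """Count only layers with learnable weights (conv and fc), not pooling."""
    return sum(1 for layer in layers if layer.kind in ("conv", "fc"))


for name, shape in trace_shapes(ALEXNET_LIKE):
    print(f"{name:>6}: {shape}")
print("weighted layers:", weighted_depth(ALEXNET_LIKE))
```

Counting only layers with learnable weights is the usual convention behind depth figures such as "eight layers"; pooling layers change the spatial size but are not counted toward depth in this sketch.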