Complexity

Review Article

Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis

Table 3

Recent advancements of the D-CNN in computer vision.


S. no	Application	The objective of the study	Methodology/network architecture	Performance	Dataset	Depth/layer sizes

1	Olive fruit variety classification [53]	To provide computer vision methodology for automatic classification of seven different olive fruit varieties using an image processing technique	Six different D-CNN architectures, namely, AlexNet, residual neural network-50, residual neural network-101, inception-residual neural network V2, Inception V1, and Inception V3, were employed.	AlexNet: 89.90%, residual neural network-50: 94%, residual neural network-101: 95.91%, inception-residual neural network V2: 91.81%, Inception V1: 94.86%, and Inception V3: 95.33%.	The dataset was generated from 400 photographs of olive fruits of each variety.	Five convolutional layers

2	Fabric defect detection [54]	On-loom fabric defect detection combines image preprocessing, candidate defect map generation, fabric motif determination, and D-CNN	Seven-layer D-CNN with pairwise potential activation layer as a third hidden layer.	Accuracy for predicting the presence of a defect in an image: 95%. Accuracy for counting the number of defects in an image: 98%.	Fabric defect dataset created using an on-loom fabric imaging system.	Seven-layer CNN, which includes pairwise potential activation layer

3	Polarimetric synthetic aperture radar image classification [55]	Polarimetric synthetic aperture radar image classification classifies image pixels of terrain types, namely, forest, water, grass, and sand	Complex-valued D-CNN and real-valued D-CNN.	Complex-valued D-CNN: 93.4%; real-valued D-CNN: 89.9%.	The performance is analyzed with the airborne steered array radar dataset and the electronically steered array radar dataset.	Six-layer CNN

4	Visual aesthetic quality assessment [56]	To present a biological model for three tasks: aesthetic score regression, aesthetic quality classification, and aesthetic score distribution prediction	A double-subnet gated peripheral-foveal convolutional neural network: a foveal and a peripheral subnet. The peripheral subnet mimics peripheral vision, while the foveal subnet extracts fine-grained features.	Gated peripheral-foveal convolutional neural networks (VGG16): 80.70%.	Standard aesthetic visual assessment datasets and Photo.net datasets are used for unified aesthetic prediction tasks.	Nine-layer CNN

5	3D object recognition [57]	To take multiview images captured from partial angles as the input and perform 3D object detection using the 3D CNN	3D object information is encoded from the 3D spatial dimension. A 3D kernel is applied to the view images to perform 3D convolution.	ModelNet10 dataset: 94.5%. ModelNet40 dataset: 93.9%.	Pearl image dataset with 10,500 images split into seven classes.	Eight layers

6	Medical image classification [58]	To develop a feature extractor using a fine-tuned D-CNN and to classify medical images	Fine-tuned AlexNet and GoogLeNet D-CNN architectures were used in two ways: (i) as an image feature extractor and to train multiclass support vector machines; (ii) as a classifier to generate softmax probabilities.	GoogLeNet: 81.03%. AlexNet: 77.55%.	Image CLEF 2016 medical image public dataset with 6776 training images and 4166 testing images.	Eight layers with five convolutional layers followed by three fully connected layers and max pooling layers

7	Detection and recognition of dumpsters [59]	Visual detection of dumpsters using a twofold methodology with minimal labeling of the dataset to have a census of their type and numbers	Google Inception v3 is used, and a D-CNN is pretrained with 1,500,000 images corresponding to 1000 different classes. ReLU is used as an activation function.	94%.	Dumpsters dataset with 27,624 labeled images provided by Ecoembes.	27-layer deep CNN

8	Classification of rice grain images [60]	Localization and classification of rice grain images using contrast-limited adaptive histogram equalization technique	Region-based D-CNN is used to localize and classify rice grains. Dropout regularization and transfer learning were used to avoid overfitting.	81% accuracy is achieved as against 50–76% accuracy of human experts.	MIMR1 to MIMR8 datasets to classify rice into sticky and paddy rice.	Residual neural network-50 is used as the prominent architecture, which is 50 layers deep

9	Object recognition [61]	To evaluate the performance of the inception recurrent residual convolutional neural network model on benchmark datasets, namely, Tiny ImageNet-200, Canadian Institute for Advanced Research-100, CU3D-100, and Canadian Institute for Advanced Research-10	An inception recurrent residual convolutional neural network, a deep CNN model, was introduced. It utilizes the power of the residual network, inception network, and recurrent convolutional neural network.	72.78% accuracy was achieved, which is 4.53% better than a recurrent CNN.	The model’s performance is evaluated on different benchmark datasets, namely, Canadian Institute for Advanced Research-10, Canadian Institute for Advanced Research-100, Tiny ImageNet-200, and CU3D-100.	Five-layer CNN

10	3D object classification [62]	To successfully classify 3D objects for mobile robots irrespective of starting positions of object modeling	A novel 3D object representation using the 3D CNN with a row-wise max pooling layer and cylinder occupancy grid was introduced.	The results showed that the cylindrical occupancy grid performed better than the existing rectangular algorithms with accuracy 91%.	The performance assessment is done on the ModelNet10 dataset and dataset with six classes collected using the mobile robot.	Three-layer CNN

11	Product quality control [63]	To automatically detect defects in products using fast and robust deep CNN	A simplified D-CNN architecture consisting of nested convolutional and pooling layers with the ReLU activation function is used. It has two parts: a classification frame and a detection frame.	An accuracy of 99.8% is achieved on a benchmark dataset provided by the German Association of Pattern Recognition.	Deutsche Arbeitsgemeinschaft für Mustererkennung e.V. German dataset with six classes. Each class consists of 1000 defect-free images and 150 defective images.	11-layer CNN

12	Recognition of Chinese food [64]	To propose an efficient D-CNN architecture for Chinese food recognition	A 5-layer deep CNN architecture performs a pipeline of processing to optimize the entire network through backpropagation.	97.12% accuracy was achieved, which is better than other bag feature methods.	Chinese food image dataset composed of 8734 images under 25 food categories.	Five-layer deep CNN

13	Age recognition from facial images [65]	Recognition of age from the facial image using pretrained models	MobileNetV2, residual neural network, and pretrained models such as soft stagewise regression networks were used for age recognition. Multiclass classification is performed, which is followed by regression to calculate the age.	Residual neural networks performed better than the other two network models.	The dataset contains 460,723 photographs from the internet movie database cinema website and another dataset with 62,328 pictures from Wikipedia.	Three-layer CNN

14	Person recognition [66]	To develop an effective and efficient multiple-person recognition system for face recognition in random video sequences using the D-CNN	Multiple video faces are detected, and the VGG-19 D-CNN classifier is trained to identify the facial images. The model is tested using standard labeled faces in the wild database.	96.83% accuracy is achieved using the VGG-19 D-CNN classifier.	In the training phase, images from the Chinese Academy of Sciences WebFace database with 9000 classes were utilized. In the validation phase, labeled faces in the wild were used.	Three-layer CNN

15	Multiformat digit recognition [67]	To recognize unconstrained natural images of digits using DIGI-Net	DIGI-Net D-CNN architecture is trained and tested on the MNIST, CVL single digit dataset, and the Chars74K dataset digits.	MNIST: 99.11%, Computer Vision Lab single digit dataset: 93.29%, and digits of the Chars74K dataset: 97.60%.	The performance evaluation is done on MNIST, CVL single digit dataset, and the Chars74K dataset digits.	Seven-layer deep CNN

16	Liver image classification [68]	To develop a perceptual hash-based D-CNN to classify liver images to reduce the time taken to classify liver computed tomography images	A fused perceptual hash-based D-CNN, a hybrid model, was designed to identify malignant and benign masses using computed tomography images.	98.2% accuracy was achieved using the fused perceptual hash-based D-CNN.	The dataset is obtained from Elazig Education and Research Hospital Radiology Laboratory.	Seven-layer deep CNN

17	Object detection [49]	Hardware-oriented ultrahigh-speed object detection using the D-CNN	Two-stage high-speed object detection: bounding boxes in the first stage and D-CNN-based classification in the second stage.	MNIST dataset: 98.01%. Self-built dataset: 96.5%.	MNIST and self-built dataset.	Five-layer deep CNN