| S. No. | Application | Objective of the study | Methodology/network architecture | Performance | Dataset | Depth/layer sizes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Olive fruit variety classification [53] | To provide a computer vision methodology for automatic classification of seven different olive fruit varieties using an image processing technique | Six D-CNN architectures were employed, namely AlexNet, residual neural network-50, residual neural network-101, inception-residual neural network V2, Inception V1, and Inception V3. | AlexNet: 89.90%, residual neural network-50: 94%, residual neural network-101: 95.91%, inception-residual neural network V2: 91.81%, Inception V1: 94.86%, and Inception V3: 95.33%. | The dataset was generated from 400 photographs of olive fruits of each variety. | Five convolutional layers |
| 2 | Fabric defect detection [54] | On-loom fabric defect detection combining image preprocessing, candidate defect map generation, fabric motif determination, and a D-CNN | Seven-layer D-CNN with a pairwise potential activation layer as the third hidden layer. | Accuracy for predicting the presence of a defect in an image: 95%. Accuracy for counting the number of defects in an image: 98%. | Fabric defect dataset created using an on-loom fabric imaging system. | Seven-layer CNN, including a pairwise potential activation layer |
| 3 | Polarimetric synthetic aperture radar image classification [55] | To classify image pixels into terrain types, namely forest, water, grass, and sand | Complex-valued D-CNN and real-valued D-CNN. | Complex-valued D-CNN: 93.4%; real-valued D-CNN: 89.9%. | The performance is analyzed on the airborne steered array radar dataset and the electronically steered array radar dataset. | Six-layer CNN |
| 4 | Visual aesthetic quality assessment [56] | To present a biologically inspired model for three tasks: aesthetic score regression, aesthetic quality classification, and aesthetic score distribution prediction | A double-subnet gated peripheral-foveal convolutional neural network with a foveal and a peripheral subnet. The peripheral subnet mimics peripheral vision, while the foveal subnet extracts fine-grained features. | Gated peripheral-foveal convolutional neural network (VGG16): 80.70%. | Standard aesthetic visual assessment datasets and the Photo.net dataset are used for unified aesthetic prediction tasks. | Nine-layer CNN |
| 5 | 3D object recognition [57] | To take multiview images captured from partial angles as input and perform 3D object detection using the 3D CNN | 3D object information is encoded from the 3D spatial dimension. A 3D kernel is applied to the view images to perform 3D convolution. | ModelNet10 dataset: 94.5%. ModelNet40 dataset: 93.9%. | Pearl image dataset with 10,500 images split into seven classes. | Eight layers |
| 6 | Medical image classification [58] | To develop a feature extractor using a fine-tuned D-CNN and to classify medical images | Fine-tuned AlexNet and GoogLeNet D-CNN architectures were used in two ways: (i) as image feature extractors to train multiclass support vector machines; (ii) as classifiers generating softmax probabilities. | GoogLeNet: 81.03%. AlexNet: 77.55%. | ImageCLEF 2016 medical image public dataset with 6,776 training images and 4,166 testing images. | Eight layers: five convolutional layers followed by three fully connected layers, with max pooling layers |
| 7 | Detection and recognition of dumpsters [59] | Visual detection of dumpsters using a twofold methodology with minimal labeling of the dataset, to obtain a census of their types and numbers | Google Inception V3 is used: a D-CNN pretrained on 1,500,000 images corresponding to 1,000 different classes. ReLU is used as the activation function. | 94% accuracy. | Dumpsters dataset with 27,624 labeled images provided by Ecoembes. | 27-layer deep CNN |
| 8 | Classification of rice grain images [60] | Localization and classification of rice grain images using the contrast-limited adaptive histogram equalization technique | A region-based D-CNN is used to localize and classify rice grains. Dropout regularization and transfer learning were used to avoid overfitting. | 81% accuracy is achieved, against 50–76% accuracy for human experts. | MIMR1 to MIMR8 datasets to classify rice into sticky and paddy rice. | Residual neural network-50, which is 50 layers deep, is the backbone architecture |
| 9 | Object recognition [61] | To evaluate the performance of the inception recurrent residual convolutional neural network model on benchmark datasets, namely Tiny ImageNet-200, Canadian Institute for Advanced Research-100, CU3D-100, and Canadian Institute for Advanced Research-10 | An inception recurrent residual convolutional neural network, a deep CNN model, was introduced. It combines the strengths of the residual network, the inception network, and the recurrent convolutional neural network. | 72.78% accuracy was achieved, which is 4.53% better than a recurrent CNN. | The model's performance is evaluated on benchmark datasets, namely Canadian Institute for Advanced Research-10, Canadian Institute for Advanced Research-100, Tiny ImageNet-200, and CU3D-100. | Five-layer CNN |
| 10 | 3D object classification [62] | To classify 3D objects for mobile robots irrespective of the starting positions of object modeling | A novel 3D object representation using the 3D CNN with a row-wise max pooling layer and a cylindrical occupancy grid was introduced. | The cylindrical occupancy grid performed better than existing rectangular-grid algorithms, with an accuracy of 91%. | The performance assessment is done on the ModelNet10 dataset and a six-class dataset collected using the mobile robot. | Three-layer CNN |
| 11 | Product quality control [63] | To automatically detect defects in products using a fast and robust deep CNN | A simplified D-CNN architecture consisting of nested convolutional and pooling layers with the ReLU activation function is used. It has two parts: a classification frame and a detection frame. | An accuracy of 99.8% is achieved on a benchmark dataset provided by the German Association for Pattern Recognition. | Deutsche Arbeitsgemeinschaft für Mustererkennung e.V. (DAGM) dataset with six classes; each class consists of 1,000 defect-free images and 150 defective images. | 11-layer CNN |
| 12 | Recognition of Chinese food [64] | To propose an efficient D-CNN architecture for Chinese food recognition | A five-layer deep CNN architecture performs a pipeline of processing, optimizing the entire network through backpropagation. | 97.12% accuracy was achieved, which is better than other bag-of-features methods. | Chinese food image dataset composed of 8,734 images in 25 food categories. | Five-layer deep CNN |
| 13 | Age recognition from facial images [65] | Recognition of age from facial images using pretrained models | Pretrained models, namely MobileNetV2, residual neural network, and soft stagewise regression networks, were used for age recognition. Multiclass classification is performed, followed by regression to estimate the age. | Residual neural networks performed better than the other two network models. | The dataset contains 460,723 photographs from the Internet Movie Database website and another 62,328 pictures from Wikipedia. | Three-layer CNN |
| 14 | Person recognition [66] | To develop an effective and efficient multiple-person recognition system for face recognition in random video sequences using the D-CNN | Multiple video faces are detected, and the VGG-19 D-CNN classifier is trained to identify the facial images. The model is tested on the standard Labeled Faces in the Wild database. | 96.83% accuracy is achieved using the VGG-19 D-CNN classifier. | In the training phase, images from the Chinese Academy of Sciences WebFace database with 9,000 classes were utilized. In the validation phase, the Labeled Faces in the Wild database was used. | Three-layer CNN |
| 15 | Multiformat digit recognition [67] | To recognize unconstrained natural images of digits using DIGI-Net | The DIGI-Net D-CNN architecture is trained and tested on MNIST, the Computer Vision Lab (CVL) single digit dataset, and the digits of the Chars74K dataset. | MNIST: 99.11%, CVL single digit dataset: 93.29%, and digits of the Chars74K dataset: 97.60%. | The performance evaluation is done on MNIST, the CVL single digit dataset, and the digits of the Chars74K dataset. | Seven-layer deep CNN |
| 16 | Liver image classification [68] | To develop a perceptual hash-based D-CNN that reduces the time taken to classify liver computed tomography images | A fused perceptual hash-based D-CNN, a hybrid model, was designed to identify malignant and benign masses in computed tomography images. | 98.2% accuracy was achieved using the fused perceptual hash-based D-CNN. | The dataset is obtained from the Elazig Education and Research Hospital Radiology Laboratory. | Seven-layer deep CNN |
| 17 | Object detection [49] | Hardware-oriented ultrahigh-speed object detection using the D-CNN | Two-stage high-speed object detection: bounding-box extraction in the first stage and D-CNN-based classification in the second stage. | MNIST dataset: 98.01%. Self-built dataset: 96.5%. | MNIST and a self-built dataset. | Five-layer deep CNN |
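Most of the architectures surveyed above, from the five-convolutional-layer AlexNet of entry 1 to the nested convolution-and-pooling design of entry 11, repeat the same forward-pass stage: convolution, ReLU activation, then max pooling. As a minimal sketch of one such stage (not any specific paper's implementation; the image, the kernel choice, and all function names here are illustrative assumptions):

```python
import numpy as np

def conv2d(x, kernel):
    """Valid-mode 2D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit, the activation used in several surveyed networks."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; roughly halves each spatial dimension."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # crop so the grid divides evenly
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# One conv -> ReLU -> pool stage on a toy 8x8 "image" whose brightness
# increases left to right; the kernel is a hypothetical horizontal-gradient filter.
image = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
feature_map = max_pool2d(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (3, 3)
```

A deep CNN stacks several such stages before the fully connected layers; e.g., entry 6's eight-layer network applies five convolutional stages and then three fully connected layers.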