Review Article

A State-of-the-Art Computer Vision Adopting Non-Euclidean Deep-Learning Models

Table 1

Performance summary of the described models with their corresponding datasets as well as their limitations and complexity.

| Architectures | Year | Category | Datasets | Result (accuracy) | Novelty | Limitations |
| --- | --- | --- | --- | --- | --- | --- |
| ShapeNet [78] | 2015 | Voxel-based | FAUST [91] | 4.04 (EER %) | Suggests generalizing CNN to learn hierarchical task-specific features | Experiments were limited to meshes only |
| | | | SCAPE [92] | 3.34 (EER %) | | |
| | | | TOSCA [93] | 5.61 (EER %) | | |
| MVCNN [33] | 2015 | Multiview-based | ModelNet40 [94] | 90.1% | Compactness, efficiency, and improved accuracy are attained by creating descriptors that are aggregations of data from many viewpoints | Building compact and discriminative 3D descriptors is untested |
| | | | SketchClean [95] | 87.2% | | Time-consuming, large datasets |
| TGNet [85] | 2017 | Point cloud-based | ScanNet [96] | 62.2% | Changing CNNs’ linear convolutional feature map combination | Multiobject linked area labeling failure, mostly due to the TGConv’s narrow dynamic range |
| | | | S3DIS [97] | 88.5 (OA), 57.8 (mIoU) | | |
| | | | Paris-Lille-3D [98] | 96.97 (OA), 68.17 (mIoU) | | |
| MoNet [69] | 2017 | Spatial graph convolution | MNIST [99] | 95.05% | The ensemble CNN model combines two CNNs | This method underestimates segmentation, and weighted categorical cross-entropy improves outcomes |
| | | | Graph: CORA [100] | 81.69 ± 0.48% | | |
| | | | PubMed [101] | 78.81 ± 0.44% | | |
| | | | FAUST [91] | 0 cm error (at 90% points), 4 cm error (at 99% points) | | |
| SplineCNN [70] | 2018 | Spatial graph convolution | MNIST [99] | Grid: 99.22%; Superpixels: 95.22% | Using a trainable continuous kernel function and B-spline control values combines local spatial information | Global behavior worsens with larger geodesic error bounds. Geodesic misclassifications are infrequent |
| | | | CORA [100] | 89.48 ± 0.31% | | |
| CayleyNets [28] | 2019 | Spectral graph convolution | MNIST [99] | 99.18% | Specializes in small frequency bands with few filter parameters while maintaining spatial localization | Bidirectional lines cost more, and the routing method is more sophisticated |
| | | | CORA [100] (standard) | 81.0 ± 0.5% | | |
| | | | CORA [100] (extended) | 88.09 ± 0.6% | | |
| CurvaNet [63] | 2020 | Spectral graph convolution | SCAPE [92] | 0.875 | U-Net-like hierarchical structure is shown to exploit multiscale curvature characteristics | No structural regularization such as segment class topology |
| | | | FAUST [91] | 0.952 | | |
| | | | MIT | 0.911 | | |
| | | | Adobe Fuse | 0.943 | | |
| | | | All datasets | 0.934 | | |
| | | | PSB Human | 0.876 | | |
| | | | Teddy | 0.904 | | |
| | | | Fish | 0.931 | | |
| MDGCN [42] | 2020 | Spectral-spatial-based | Indian Pines [102] | OA: 93.47 ± 0.38%; AA: 96.24 ± 0.21%; Kappa: 92.55 ± 0.43% | Utilizes dynamic graphs that are gradually refined and can accurately encode the inherent similarities between image regions | LSTM is used which requires high computational cost. The introduced GC unit has lower-order approximations of spectral graph convolution |
| | | | The University of Pavia [103] | OA: 95.68 ± 0.22%; AA: 93.15 ± 0.28%; Kappa: 94.25 ± 0.29% | | |
| | | | Kennedy Space Center [104] | OA: 99.79 ± 0.01%; AA: 99.61 ± 0.02%; Kappa: 99.77 ± 0.01% | | |
| ACSCNN [30] | 2020 | Spectral graph convolution | FAUST [91] | SHOT: 98.06% (geodesic distance = 0), 99.26% (geodesic distance = 0.01) | Provided anisotropic convolution enables a more thorough capture of the intrinsic local information of signals | The model needs shape segmentation and classification |
| | | | | 1: 99.56% (geodesic distance = 0), 99.87% (geodesic distance = 0.01) | | |
| MongeNet [87] | 2021 | Point cloud-based | | The gap between the target point cloud and the sample point cloud from the mesh shrunk more quickly. So, at a given optimization time, the input point cloud was represented more accurately | | Costly and time-consuming computations |
| UV-Net [31] | 2021 | Spectral graph convolution | Machining feature [105] | 99.94 ± 0.00% (solid model classification) | Utilizes existing image and graph convolutional neural networks and can operate on B-rep data | The model did not use the B-rep curve and surface types, edge convexity, half-edge ordering, etc. UV-grid features do not rotate |
| | | | FabWave [106] | 94.51 ± 0.10% | | |
| | | | SolidLetters | 97.24 ± 0.10% | | |
| | | | MFCAD [107] | 99.95 ± 0.02% (per-face; solid face segmentation) | | |
| | | | ABC [108] | 88.87 ± 0.70% (per-face) | | |

EER = equal error rate; mIoU = mean intersection over union; OA = overall accuracy; AA = average accuracy; Kappa = Kappa coefficient; geodesic distance = shortest path between the vertices.
FAUST [91]: includes 300 high-resolution scans of 10 human participants in 30 positions and is used for 3D mesh reconstruction, shape analysis, virtual try-on, animation, and gaming.
SCAPE [92]: contains 71 registered meshes of a particular person in different poses and is used for 3D mesh reconstruction, shape analysis, virtual try-on, animation, and gaming.
TOSCA [93]: TOSCA (the object shape capture archive) is a repository of 3D scanned models of real-world objects and is used for shape analysis, 3D object reconstruction, and deformation analysis.
ScanNet [96]: contains high-resolution 3D scans of over 1,500 indoor spaces and is used for scene understanding, 3D reconstruction, robotics, and virtual reality.
S3DIS [97]: includes 6 large-scale indoor areas and is used for scene segmentation, object detection and recognition, and robotics.
Paris-Lille-3D [98]: a large-scale dataset of dense point clouds representing urban environments in France. It contains over 2 billion points, with a point density of approximately 20 points per square meter, and is used for urban planning, autonomous driving, object detection and recognition, and environmental monitoring.
MNIST [99]: includes a total of 70,000 grayscale images of handwritten digits from zero to nine, each 28 × 28 pixels in size, and is used for digit recognition and data augmentation.
CORA [100]: includes a total of 2,708 research papers in the field of computer science, each represented by a bag-of-words vector of its abstract, and is used for citation network analysis, link prediction, text classification, and graph convolutional networks.
PubMed [101]: includes information on over 32 million articles and is used for literature review, text mining, natural language processing, and biomedical informatics.
PSB: the Princeton Segmentation Benchmark, a collection of 3D meshes with human-generated segmentations that is used to evaluate and compare mesh segmentation methods.
Machining feature [105]: a synthetic, labeled, balanced dataset representing machining features such as chamfers and circular end pockets applied to a cube.
FabWave [106]: a small, labeled, imbalanced collection of 5,373 3D shapes split into 52 mechanical part classes, such as brackets, gears, and o-rings.
MFCAD [107]: a synthetic segmentation dataset of 15,488 3D shapes, similar to the machining feature dataset, but with multiple machining features.
SolidLetters: consists of 96k 3D shapes generated by randomly extruding and filleting the 26 letters of the alphabet (a–z) to form class categories across 2,002 style categories from fonts.
ABC [108]: a real-world collection of millions of 3D shapes. The dataset is unlabeled and imbalanced and has many duplicates.
ModelNet40 [94]: contains 12,311 CAD models of 40 different object categories and is used for object recognition, 3D recognition, and shape analysis.
SketchClean [95]: contains 160 categories, on which humans can achieve 93% recognition accuracy, and is used for sketch recognition, sketch synthesis, and human-computer interaction.
Indian Pines [102]: consists of 145 × 145 pixels with a spatial resolution of 20 m × 20 m, has 220 spectral channels covering the range from 0.4 to 2.5 μm, and is used for remote sensing, environmental monitoring, and agricultural analysis.
University of Pavia [103]: captured over Pavia University in Italy with the ROSIS sensor in 2001. It consists of 610 × 340 pixels with a spatial resolution of 1.3 m × 1.3 m, has 103 spectral channels in the wavelength range from 0.43 to 0.86 μm after removing noisy bands, and is used for urban analysis and remote sensing.
Kennedy Space Center [104]: taken by the AVIRIS sensor in Florida with a spectral coverage ranging from 0.4 to 2.5 μm. It contains 224 bands and 614 × 512 pixels with a spatial resolution of 18 m and is used for remote sensing and space exploration.
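
Several of the metrics abbreviated in the footnote (OA, AA, Kappa, and mIoU) can be derived from a single confusion matrix. The following is a minimal Python sketch of the standard definitions, not the evaluation code of any model in Table 1. The function name `summarize_confusion` and the example matrix are hypothetical, and the sketch assumes every class has at least one true sample.

```python
from typing import Dict, List


def summarize_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Compute OA, AA, Kappa, and mIoU from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes every class has at least one true sample
    (hypothetical helper, for illustration only).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]                          # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    correct = [cm[i][i] for i in range(k)]                          # diagonal

    # Overall accuracy: fraction of all samples classified correctly.
    oa = sum(correct) / total

    # Average accuracy: mean of the per-class accuracies (recalls).
    aa = sum(correct[i] / true_counts[i] for i in range(k)) / k

    # Kappa coefficient: agreement corrected for chance agreement p_e.
    p_e = sum(true_counts[i] * pred_counts[i] for i in range(k)) / (total * total)
    kappa = (oa - p_e) / (1.0 - p_e)

    # Mean intersection over union: TP / (TP + FP + FN), averaged over classes.
    miou = sum(
        correct[i] / (true_counts[i] + pred_counts[i] - correct[i])
        for i in range(k)
    ) / k

    return {"OA": oa, "AA": aa, "Kappa": kappa, "mIoU": miou}


if __name__ == "__main__":
    # Hypothetical two-class example, not taken from any dataset in Table 1.
    example = [[8, 2],
               [1, 9]]
    for name, value in summarize_confusion(example).items():
        print(f"{name}: {value:.4f}")
```

OA weights every sample equally, whereas AA weights every class equally, so the two can diverge on imbalanced datasets. This is one reason the hyperspectral results in Table 1 report both alongside Kappa.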