International Journal of Intelligent Systems

Review Article

A State-of-the-Art Computer Vision Adopting Non-Euclidean Deep-Learning Models

Table 1

Performance summary of the described models with their corresponding datasets as well as their limitations and complexity.


Architectures	Year	Category	Datasets				Result (accuracy)		Novelty	Limitations

ShapeNet [78]	2015	Voxel-based	FAUST [91]				4.04 (EER %)		Suggests generalizing CNN to learn hierarchical task-specific features	Experiments were limited to meshes only
			SCAPE [92]				3.34 (EER %)
			TOSCA [93]				5.61 (EER %)

MVCNN [33]	2015	Multiview-based	ModelNet40 [94]				90.1%		Compactness, efficiency, and improved accuracy are attained by creating descriptors that are aggregations of data from many viewpoints	Building compact and discriminative 3D descriptors is untested
			SketchClean [95]				87.2%			Time-consuming, large datasets

TGNet [85]	2017	Point cloud-based	ScanNet [96]				62.2%		Changing CNNs’ linear convolutional feature map combination	Multiobject linked area labeling failure, mostly due to the TGConv’s narrow dynamic range
			S3DIS [97]				88.5 (OA)
							57.8 (mIoU)
			Paris-Lille-3D [98]				96.97 (OA)
							68.17 (mIoU)

MoNet [69]	2017	Spatial graph convolution	MNIST [99]				95.05%		The ensemble CNN model combines two CNNs	This method underestimates segmentation, and weighted categorical cross-entropy improves outcomes
			Graph		CORA [100]		81.69 ± 0.48%
					PubMed [101]		78.81 ± 0.44%
			FAUST [91]				0 cm error (at 90% points)
							4 cm error (at 99% points)

SplineCNN [70]	2018	Spatial graph convolution	MNIST [99]			Grid	99.22 %		Using a trainable continuous kernel function and B-spline control values combines local spatial information	Global behavior worsens with larger geodesic error bounds. Geodesic misclassifications are infrequent
						Superpixels	95.22 %
			CORA [100]				89.48 ± 0.31%

CayleyNets [28]	2019	Spectral graph convolution	MNIST [99]				99.18 %		Specializes in small frequency bands with few filter parameters while maintaining spatial localization	Bidirectional lines cost more, and the routing method is more sophisticated
			CORA [100] (standard)				81.0 ± 0.5%
			CORA [100] (extended)				88.09 ± 0.6%

CurvaNet [63]	2020	Spectral graph convolution	SCAPE [92]				0.875		U-Net-like hierarchical structure is shown to exploit multiscale curvature characteristics	No structural regularization such as segment class topology
			FAUST [91]				0.952
			MIT				0.911
			Adobe fuse				0.943
			All datasets				0.934
			PSB	Human			0.876
				Teddy			0.904
				Fish			0.931

MDGCN [42]	2020	Spectral-spatial-based	Indian Pines [102]	OA			93.47 ± 0.38 %		Utilizes dynamic graphs that are gradually refined and can accurately encode the inherent similarities between image regions	Uses LSTM, which incurs a high computational cost. The introduced GC unit relies on lower-order approximations of spectral graph convolution
				AA			96.24 ± 0.21 %
				Kappa			92.55 ± 0.43 %
			The University of Pavia [103]	OA			95.68 ± 0.22 %
				AA			93.15 ± 0.28 %
				Kappa			94.25 ± 0.29 %
			Kennedy Space Center [104]	OA			99.79 ± 0.01 %
				AA			99.61 ± 0.02 %
				Kappa			99.77 ± 0.01 %

ACSCNN [30]	2020	Spectral graph convolution	FAUST [91]	SHOT			98.06% (geodesic distance = 0)		Provided anisotropic convolution enables a more thorough capture of the intrinsic local information of signals	The model needs shape segmentation and classification
							99.26% (geodesic distance = 0.01)
				1			99.56% (geodesic distance = 0)
							99.87% (geodesic distance = 0.01)

MongeNet [87]	2021	Point cloud-based	The gap between the target point cloud and the point cloud sampled from the mesh shrank more quickly, so at a given optimization time the input point cloud was represented more accurately							Costly and time-consuming computations

UV-Net [31]	2021	Spectral graph convolution	Machining feature [105]				99.94 ± 0.00%	Solid model classification	Utilizes existing image and graph convolutional neural networks and can operate on B-rep data	The model did not use the B-rep curve and surface types, edge convexity, half-edge ordering, etc. UV-grid features are not rotation-invariant
			FabWave [106]				94.51 ± 0.10%
			SolidLetters				97.24 ± 0.10%
			MFCAD [107]				99.95 ± 0.02 % (Per-face)	Solid face segmentation
			ABC [108]				88.87 ± 0.70 % (Per-face)

EER = equal error rate; mIoU = mean intersection over union; OA = overall accuracy; AA = average accuracy; Kappa = Kappa coefficient; geodesic distance = shortest path between the vertices. FAUST [91] includes 300 high-resolution scans of 10 human participants in 30 positions and is used for 3D mesh reconstruction, shape analysis, virtual try-on, animation, and gaming. SCAPE [92]: the dataset contains 71 registered meshes of a particular person in different poses and is used for 3D mesh reconstruction, shape analysis, virtual try-on, animation, and gaming. TOSCA [93]: TOSCA (the object shape capture archive) is a repository of 3D scanned models of real-world objects and is used for shape analysis, 3D object reconstruction, and deformation analysis. ScanNet [96]: it contains high-resolution 3D scans of over 1,500 indoor spaces and is used for scene understanding, 3D reconstruction, robotics, and virtual reality. S3DIS [97]: the dataset includes 6 large-scale indoor areas and is used for scene segmentation, object detection and recognition, and robotics. Paris-Lille-3D [98] is a large-scale dataset of dense point clouds representing urban environments in France. The dataset contains over 2 billion points, with a point density of approximately 20 points per square meter, and is used for urban planning, autonomous driving, object detection and recognition, and environmental monitoring. MNIST [99] includes a total of 70,000 grayscale images of handwritten digits from zero to nine, each of which is 28 × 28 pixels in size, and is used for digit recognition and data augmentation. CORA [100]: the dataset includes a total of 2,708 research papers in the field of computer science, each of which is represented by a bag-of-words vector of its abstract, and is used for citation network analysis, link prediction, text classification, and graph convolutional networks.
PubMed [101]: the dataset includes information on over 32 million articles and is used for literature review, text mining, natural language processing, and biomedical informatics. PSB: the Princeton Segmentation Benchmark, a collection of 3D meshes grouped into object categories (here Human, Teddy, and Fish) that is used to evaluate and compare mesh segmentation methods. Machining feature [105]: a synthetic labeled, balanced dataset representing machining features such as chamfers and circular end pockets applied to a cube. FabWave [106]: a small labeled, imbalanced collection of 5,373 3D shapes split into 52 mechanical part classes, such as brackets, gears, and o-rings. MFCAD [107]: a synthetic segmentation dataset of 15,488 3D shapes, similar to the machining feature dataset, but with multiple machining features. SolidLetters consists of 96k 3D shapes generated by randomly extruding and filleting the 26 letters of the alphabet (a–z) to form class categories across 2002 style categories from fonts. ABC [108]: a real-world collection of millions of 3D shapes. The dataset is unlabeled and imbalanced and has many duplicates. ModelNet40 [94]: the dataset contains 12,311 CAD models of 40 different object categories and is used for object recognition, 3D recognition, and shape analysis. SketchClean [95] contains 160 categories, on which humans can achieve 93% recognition accuracy, and is used for sketch recognition, sketch synthesis, and human-computer interaction. Indian Pines [102]: it consists of 145 × 145 pixels with a spatial resolution of 20 m × 20 m, has 220 spectral channels covering the range from 0.4 to 2.5 μm, and is used for remote sensing, environmental monitoring, and agricultural analysis. University of Pavia [103]: the dataset captured Pavia University in Italy with the ROSIS sensor in 2001.
It consists of 610 × 340 pixels with a spatial resolution of 1.3 m × 1.3 m, has 103 spectral channels in the wavelength range from 0.43 to 0.86 μm after removing noisy bands, and is used for urban analysis and remote sensing. Kennedy Space Center [104]: the Kennedy Space Center dataset was taken by the AVIRIS sensor in Florida with a spectral coverage ranging from 0.4 to 2.5 μm. This dataset contains 224 bands and 614 × 512 pixels with a spatial resolution of 18 m and is used for remote sensing and space exploration.
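
The accuracy abbreviations used in Table 1 (OA, AA, Kappa, and mIoU) can all be derived from a single confusion matrix. The following is a minimal Python sketch of those standard definitions; it is not taken from any of the surveyed papers. The function names `confusion_matrix` and `summarize_metrics` and the toy labels are illustrative assumptions, and the sketch assumes that every class occurs at least once in the ground truth so that no division by zero arises.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def summarize_metrics(cm):
    """Return OA, AA, Kappa, and mIoU computed from a confusion matrix.

    Assumption: every class has at least one ground-truth sample.
    """
    cm = cm.astype(float)
    total = cm.sum()
    tp = np.diag(cm)            # correctly classified samples per class
    true_count = cm.sum(axis=1)  # ground-truth samples per class
    pred_count = cm.sum(axis=0)  # predicted samples per class

    oa = tp.sum() / total                   # overall accuracy
    aa = np.mean(tp / true_count)           # mean of per-class accuracies
    # Agreement expected by chance, from the row and column marginals.
    pe = (true_count * pred_count).sum() / total**2
    kappa = (oa - pe) / (1.0 - pe)          # Cohen's Kappa coefficient
    # Per-class intersection over union, then averaged over classes.
    iou = tp / (true_count + pred_count - tp)
    miou = iou.mean()
    return {"OA": oa, "AA": aa, "Kappa": kappa, "mIoU": miou}


if __name__ == "__main__":
    # Toy labels for three classes (illustrative only, not from any dataset).
    y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]
    metrics = summarize_metrics(confusion_matrix(y_true, y_pred, 3))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

OA weights every sample equally, whereas AA weights every class equally, so the two diverge on imbalanced data. This is one reason the hyperspectral results in Table 1 report OA, AA, and Kappa together.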