Review Article
A State-of-the-Art Computer Vision Adopting Non-Euclidean Deep-Learning Models
Table 1
Performance summary of the described models with their corresponding datasets as well as their limitations and complexity.
EER = equal error rate; mIoU = mean intersection over union; OA = overall accuracy; AA = average accuracy; Kappa = Kappa coefficient; geodesic distance = length of the shortest path between vertices.
FAUST [91]: 300 high-resolution scans of 10 human participants in 30 poses; used for 3D mesh reconstruction, shape analysis, virtual try-on, animation, and gaming.
SCAPE [92]: 71 registered meshes of one person in different poses; used for 3D mesh reconstruction, shape analysis, virtual try-on, animation, and gaming.
TOSCA [93]: TOSCA (the object shape capture archive) is a repository of 3D scanned models of real-world objects; used for shape analysis, 3D object reconstruction, and deformation analysis.
ScanNet [96]: high-resolution 3D scans of over 1,500 indoor spaces; used for scene understanding, 3D reconstruction, robotics, and virtual reality.
S3DIS [97]: 6 large-scale indoor areas; used for scene segmentation, object detection and recognition, and robotics.
Paris-Lille-3D [97]: a large-scale dataset of dense point clouds representing urban environments in France. It contains over 2 billion points, with a point density of approximately 20 points per square meter; used for urban planning, autonomous driving, object detection and recognition, and environmental monitoring.
MNIST [99]: 70,000 grayscale images of handwritten digits from zero to nine, each 28 × 28 pixels in size; used for digit recognition and data augmentation.
CORA: 2,708 research papers in the field of computer science, each represented by a bag-of-words vector of its abstract; used for citation network analysis, link prediction, text classification, and graph convolutional networks.
PubMed [101]: information on over 32 million articles; used for literature review, text mining, natural language processing, and biomedical informatics.
PSB: a set of protein structures and associated information, used to evaluate and compare methods for predicting protein structure and function.
Machining feature [105]: a synthetic, labeled, balanced dataset representing machining features, such as chamfers and circular end pockets, applied to a cube.
FabWave [106]: a small, labeled, imbalanced collection of 5,373 3D shapes split into 52 mechanical part classes, such as brackets, gears, and o-rings.
MFCAD [107]: a synthetic segmentation dataset of 15,488 3D shapes, similar to the machining feature dataset but with multiple machining features per shape.
SolidLetters: 96k 3D shapes generated by randomly extruding and filleting the 26 letters of the alphabet (a–z), which form the class categories, across 2,002 style categories taken from fonts.
ABC [108]: a real-world collection of millions of 3D shapes. The dataset is unlabeled and imbalanced and contains many duplicates.
ModelNet40 [94]: 12,311 CAD models from 40 object categories; used for object recognition, 3D recognition, and shape analysis.
SketchClean [95]: 160 categories, on which humans can achieve 93% recognition accuracy; used for sketch recognition, sketch synthesis, and human-computer interaction.
Indian Pines [102]: 145 × 145 pixels with a spatial resolution of 20 m × 20 m and 220 spectral channels covering the range from 0.4 to 2.5 μm; used for remote sensing, environmental monitoring, and agricultural analysis.
University of Pavia [103]: captured over Pavia University in Italy with the ROSIS sensor in 2001. It consists of 610 × 340 pixels with a spatial resolution of 1.3 m × 1.3 m and 103 spectral channels in the wavelength range from 0.43 to 0.86 μm after removal of noisy bands; used for urban analysis and remote sensing.
Kennedy Space Center [104]: acquired by the AVIRIS sensor in Florida with a spectral coverage ranging from 0.4 to 2.5 μm. It contains 224 bands and 614 × 512 pixels with a spatial resolution of 18 m; used for remote sensing and space exploration.
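
Several of the metrics abbreviated in the footnote to Table 1 (OA, AA, Kappa, and mIoU) can be derived from a single confusion matrix. The following is a minimal sketch of the standard textbook definitions, not the evaluation code of any of the surveyed models. The function name `classification_metrics` and the example matrix are hypothetical, and the sketch assumes every class occurs at least once among both the true and the predicted labels.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute OA, AA, Kappa, and mIoU from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes every row and column sum is non-zero.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]                       # correct predictions per class
    row_sums = [sum(cm[i]) for i in range(k)]                 # true samples per class
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predictions per class

    # Overall accuracy: fraction of all samples classified correctly.
    oa = sum(diag) / total
    # Average accuracy: mean of the per-class recalls.
    aa = sum(diag[i] / row_sums[i] for i in range(k)) / k
    # Cohen's Kappa: agreement corrected for chance agreement pe.
    pe = sum(row_sums[i] * col_sums[i] for i in range(k)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    # Mean IoU: per-class intersection over union, averaged over classes.
    miou = sum(
        diag[i] / (row_sums[i] + col_sums[i] - diag[i]) for i in range(k)
    ) / k

    return {"OA": oa, "AA": aa, "Kappa": kappa, "mIoU": miou}


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix, for illustration only.
    example = [[8, 2],
               [1, 9]]
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

OA weights every sample equally, whereas AA and mIoU weight every class equally, so the latter two are more informative on imbalanced datasets such as the hyperspectral scenes listed above.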