Abstract

A distance metric known as non-Euclidean distance deviates from the laws of Euclidean geometry, which is the geometry that governs most physical spaces. It is utilized when Euclidean distance is inappropriate, for example, when dealing with curved surfaces or spaces with complex topologies. Non-Euclidean deep learning makes it possible to apply deep-learning techniques to non-Euclidean domains such as graphs, manifolds, and point clouds. Its use is rapidly expanding to study real-world datasets that are intrinsically non-Euclidean. Over the years, numerous novel techniques have been introduced, each with its benefits and drawbacks. This paper provides a categorized archive of the non-Euclidean approaches used in computer vision to date. It starts by outlining the context, pertinent background, and the historical development of the field. Modern state-of-the-art methods are described briefly and categorized by application field. The paper also highlights the models' shortcomings in tables and graphs and surveys their real-world applicability. Overall, this work contributes a collective body of information and performance comparisons that will help enhance non-Euclidean deep-learning research and development in the future.

1. Introduction

For decades, machine learning has been enriched in many dimensions: richer data inputs, novel algorithms, and output optimization. Computer vision's primary goal is to analyze, process, and give meaning to digital images, and machine-learning (ML) algorithms of many kinds are being developed toward it [1]. For text-based models or 1D inputs, K-nearest neighbor (KNN) models are used with proficiency [2, 3]. Text-based inputs are more straightforward, as one dimension is enough to assess features. Then came the 2D or image-based algorithms, which segment inputs into grids for analysis. Convolutional neural network (CNN) [4], artificial neural network (ANN) [5], and recurrent neural network (RNN) [6] models are established as nearly saturated models for image processing [7–9]. Until this point, these were directed with structured parameters and traceable features. While 2D models are tamed, 3D models have been the main focus of computer vision for the last decade. The increasing processing power of modern GPU-based computers, the accessibility of large training datasets, and effective stochastic optimization techniques have all made it possible in recent years to design and successfully train complex network frameworks with numerous degrees of freedom. This sparked the field's growth by enabling deep neural networks to significantly improve productivity across a large range of applications, from speech and language processing to image processing and computer vision.

3D models are more potent for extracting analytics, simply because they contain more information. Analyzing 3D datasets became a necessity given the requirements of analyzing massive information from social media with many parameters attached to each node [10], observing physical objects of archaeological value found in nature [11], analyzing protein structures [12], 3D scanning in medicine [13], and many more. Virtual reality is becoming more embedded in the world with each passing day, but it is still in its infancy, with its usage largely confined to entertainment. For practical embedding of virtual reality, flawless accuracy must be achieved in analyzing 3D surroundings [14]. That said, 3D models come with non-Euclidean geometry, which brings many difficulties when measuring features.

As 1D and 2D models are easy to organize and connectivity among nodes is prioritized, the CNN model can easily recognize and analyze the provided input patterns. These inputs are called Euclidean inputs: they feature Euclidean geometry that can be defined with 2D shapes and figures following explainable mathematical rules. The main difference is that Euclidean geometry lies on a single plane, whereas non-Euclidean geometry describes 3D spaces with infinitely many planes. Thus, the conventional 2D geometry used until now cannot be applied directly to non-Euclidean geometry. That said, non-Euclidean geometry uses complex structures that hold more data; Euclidean geometry has only one plane to store data and, therefore, less freedom for inputs. Existing models for 2D assessment are being upgraded for 3D analysis, unlocking their full potential to analyze data across multiple planes. Thus, new algorithms are being created for 3D analysis by non-Euclidean deep learning, also known as "geometric deep learning" [15], taking on the challenge of giving structure to unstructured mesh grids, manifolds, and point clouds.

Data having a non-Euclidean spatial structure is of interest to many scientific disciplines. Social networks in the field of computational social sciences, sensor networks in information exchange, functional networks in neuroimaging, regulatory networks in genetics, and meshed surfaces in 3D modeling are just a few examples.

In social networking sites, user characteristics may be modeled as signals on the vertices of the social graph. Distributed, interconnected sensors make up sensor networks, and their measurements appear on graphs as time-varying signals. 3D objects are modeled in computer vision and graphics as Riemannian manifolds [16] with attributes like color, texture, and motion fields such as dynamic meshes. Since these data are non-Euclidean, they lack well-known characteristics like global parametrization, a common system of coordinates, a vector space structure, and shift invariance. As a result, fundamental operations like linear combination and convolution, which are clearly defined in the Euclidean context, are considerably less so in non-Euclidean domains. This is a significant barrier that has prevented the application of effective deep-learning techniques, like convolutional or recurrent neural networks, to non-Euclidean geometric data up to this point. As a result, fields like computer graphics and computational sociology have not yet experienced the practical and theoretical breakthroughs that deep-learning models have brought to voice recognition, natural language, and computer vision. So the evolution of this deep learning begins.

Figure 1 traces the practice of geometric deep learning, which started with recursive neural networks (RvNNs) in 1997, used on directed graphs [17]. A breakthrough in machine learning came in 1998 with the convolutional neural network, which was further developed to assist in the creation of many more models for understanding 3D objects [18]; what made this possible was multiscale extraction of local spatial features. The first version of a graph neural network (GNN) [19] for processing graph data started its journey in 2005 and fully came to attention in 2009 with improved performance using the SL algorithm [20]. Following that, the graph convolutional neural network (GCNN) [21] model was introduced to counter the difficulties of analyzing non-Euclidean manifolds. GCNN was followed by lookup-based CNN (LCNN) [22], which used a learned lexicon to encode CNN convolutions. The diffusion-convolutional neural network (DCNN) [23] was the next method put forth, building a diffusion-based representation of nodes from network data to categorize them. A common technique for extracting local features from the graph was proposed in the work on GNNs, comparable to convolutional networks that operate on images and on inputs connected by local regions. ChebNet [24] was first proposed in 2016. Following that, an anisotropic CNN (ACNN) [25] model was introduced, which added a shape factor to the CNN so that shapes could be analyzed as units. The existing PointNet [26] model was improved with a recursive structure the following year, establishing PointNet++ [27]. Afterward, GCN was suggested as a more specific variant. CayleyNet [28] was proposed a year later, applying Cayley polynomials to the existing GCN [29]. Two recently proposed models are the anisotropic Chebyshev spectral CNN (ACSCNN) [30], which aggregates local features for effective signal collection, in 2020, and UV-Net [31], which combines image data with a GCN for low memory overhead and computational cost, in 2021. Convolutional networks constituted the foundation of the graph-based research findings mentioned above. Similarly, manifold- or voxel-based algorithms developed side by side over the years using the 3D CNN as the base model. Working inherently with geometric forms is commonplace in computer graphics [32]: 3D objects are often treated as Riemannian manifolds and discretized using meshes.

Many image-based machine-learning methods have been applied directly to 3D geometric data, with varying degrees of success, by representing the data as range images [33, 34] or rasterized volumes [35, 36]. The primary problem with these methods is that they incorrectly assume that geometric data can be represented as Euclidean structures. As Figure 2 illustrates, when dealing with complicated 3D objects, representations based on Euclidean geometry, such as depth images or voxels, may distort or even destroy the object's topological structure, resulting in a considerable loss of information. Second, when an object is posed differently or deformed, its Euclidean representation changes [37].

There have been some reviews on geometric deep learning in the past few years. Bronstein et al. gave a sophisticated overview of this context, describing in detail quite a few general models, with corresponding mathematical derivations and wide-ranging applications, in 2017 [15]. Zhang et al. [38] reviewed the different types of deep-learning methods on graphs only [39, 40]. Later on, Cao et al. [41] reviewed some models, with their mathematical derivations, on graphs and manifolds.

This paper shifts the emphasis from a broad overview to computer vision exclusively, since the previous publications all treat their models in general use cases across disciplines. More recently presented models like UV-Net [31] and MDGCN [42] improved computer vision in an intuitive manner, with effective visual representations for newcomers to computer vision research. In contrast to the cited publications, the strengths and weaknesses of these models, in terms of their performance, are discussed and summarized here. Unlike previous works published in the modern age of computer vision, this one covers a broad range of cutting-edge applications. The key contributions of this study are statistical findings, trends, obstacles, and recommendations for further research. This article provides a detailed examination of current deep-learning architectures in computer vision, together with an account of their methodology and applications.

Our contributions can be summarized as follows:
(i) This paper offers a quick summary of the most used and efficient non-Euclidean deep-learning models suggested during the last two decades, together with theoretical and operational analyses of each model, including visualizations, explanations, and mathematical reasoning.
(ii) This literature categorizes the models according to their underpinnings, such as spectral and spatial types. Newer hybrid frameworks, such as spatial-spectral-based models, are also included. Each model is outlined together with its underlying logic and operational principles.
(iii) To fully comprehend these models, it summarizes their key features, such as their uniqueness and their limits, in a performance table. Various numerical findings demonstrating performance metrics on their respective datasets are also included.
(iv) To help put the spotlight on the current trend in this area, it compiles a table summarizing the most recent uses of these graph-based frameworks in a variety of contexts.
(v) To emphasize the importance, it provides data from recent research showing the current trajectory and promising future of the field.
(vi) In light of these most recent work patterns and technical challenges, this paper analyzes potential future scopes.

This paper reviews the most recent deep-learning methods for computer vision on graphs and manifolds and the applications that traverse them. It starts with the introduction in Section 1, which gives a succinct history and evolution of the various models put forth over the years and discusses the importance and influence of the field. The background research on point clouds, graphs, and manifolds is detailed in Section 2, along with relevant theories. Section 3 categorizes the suggested methodologies for non-Euclidean fields, with simple mathematics and descriptions; the models are divided into GCNs and manifold-based methods, which are then divided into spectral and spatial subcategories. Section 4 is devoted to outlining several current algorithmic limitations, applications, and prospective future opportunities for the progress of the field. Section 5 closes with a summary and the novelty of the paper.

2. Background

Geometric deep-learning models use point clouds, shapes, graphs, and manifolds, whose mathematical foundations are described below.

2.1. Point Cloud

A common way to analyze a 3D object is to use a laser scanner to gather a large amount of data as a point cloud. When point clouds are given as inputs, analysis becomes much more complicated, as all the nodes are undirected and independent [43]. The points are non-Euclidean and require feature segmentation for further calculation. In computer vision tasks, object recognition and detection take point clouds as inputs because of their availability.

In a point cloud, a single point can be defined as $p_i = (x_i, y_i, z_i)$, where $p_i$ is the reference point and $(x_i, y_i, z_i) \in \mathbb{R}^3$ represents the 3D position of the point [44]. As $\mathbb{R}^3$ represents three-dimensional space, navigating between two points is performed by radial rotation from a fixed point. The connecting line is defined by the pair $(r, \theta)$, in which $r$ is the distance between the two locations and $\theta$ represents the angle between the axes.
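As a concrete illustration, the following minimal sketch (in numpy; the array name cloud and the spherical-angle convention are illustrative assumptions, not from the source) stores a point cloud as an n × 3 array and recovers a distance–angle description of the line connecting two points.

```python
# A point cloud as an n x 3 array; each row is one point (x, y, z).
import numpy as np

cloud = np.random.rand(1000, 3)       # toy data: 1000 points in the unit cube
p, q = cloud[0], cloud[1]

d = q - p
r = np.linalg.norm(d)                 # distance between the two locations
theta = np.arccos(d[2] / r)           # polar angle measured from the z-axis
phi = np.arctan2(d[1], d[0])          # azimuth angle in the x-y plane
print(f"r = {r:.3f}, theta = {theta:.3f}, phi = {phi:.3f}")
```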

While a point in a point cloud needs a reference point to be located and used, the mesh representation allows points to be referenced through other points. The mesh form is built by point-by-point addition, and some vertices are removed by compression.

The data’s complexity necessitates using a B-spline curve for this kind of mathematical manipulation, as stated above. In this part, the B-spline curve will be discussed.

2.1.1. B-Spline Curve

Approximate 3D segmentation comes with the problem of denoting features as fixed or well defined: on continuously curved surfaces, there are no reference points [45]. To counter the problem, the B-spline curve is presented, which is visualized in Figure 3. At the portions of maximum bending, V-shaped joints called control points are provided, whereas the corresponding points on the curve are represented by knot vectors [46]. The control points can be used as reference points, and local features can be obtained.

The B-spline curve of degree $p$ in $\mathbb{R}^3$ is exemplified by [47]:

$$C(t) = \sum_{i=0}^{n} N_{i,p}(t)\, P_i,$$

where $N_{i,p}$ are the B-spline basis functions defined on the knot vector $T = \{t_0, t_1, \ldots, t_{n+p+1}\}$. The basis functions on the knot vector are given by the recursion [47]:

$$N_{i,0}(t) = \begin{cases} 1, & t_i \le t < t_{i+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad N_{i,p}(t) = \frac{t - t_i}{t_{i+p} - t_i} N_{i,p-1}(t) + \frac{t_{i+p+1} - t}{t_{i+p+1} - t_{i+1}} N_{i+1,p-1}(t).$$

Repeating the end knots $p + 1$ times clamps the curve to its end points, and the control points are denoted $P_i$ [48].
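For illustration, the short sketch below evaluates such a curve with scipy's BSpline, which implements the same basis recursion; the degree, control points, and clamped knot vector are toy values chosen for the example.

```python
# Evaluate C(t) = sum_i N_{i,p}(t) P_i for a cubic B-spline in R^3.
import numpy as np
from scipy.interpolate import BSpline

p = 3                                              # degree
P = np.array([[0, 0, 0], [1, 2, 0], [3, 3, 1],
              [4, 1, 2], [6, 0, 1]], dtype=float)  # control points
n = len(P) - 1
# Clamped knot vector: the end knots repeat p + 1 times, so the curve
# starts at P[0] and ends at P[-1].
t = np.concatenate([np.zeros(p), np.linspace(0, 1, n - p + 2), np.ones(p)])

curve = BSpline(t, P, p)                 # basis recursion handled by scipy
samples = curve(np.linspace(0, 1, 50))   # 50 points along the curve
print(samples.shape)                     # (50, 3)
```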

A B-spline curve fits only on a fixed knot vector. Assuming a point cloud $X = \{x_k\}_{k=1}^{m}$, the B-spline curve is fitted by minimizing [48]:

$$\min_{P,\, t} \; \frac{1}{2} \sum_{k=1}^{m} \lVert C(t_k) - x_k \rVert^2 + \lambda R.$$

The minimal distance between $C$ and $X$ is thus stated with a regularization term $R$ weighted by the error coefficient $\lambda$. This regularization is defined as in [49]:

$$R = \int_0^1 \lVert C''(t) \rVert^2 \, dt,$$

penalizing excessive variation of the curve.

For each data point $x_k$, let $d_k$ denote its distance to the curve. Assuming $x_k'$ to be the projection point of $x_k$ on $C$, the local Frenet frame on $C$ at $x_k'$ is $(T_k, N_k)$, where $T_k$ is the unit tangent and $N_k$ is the unit normal vector of the curvature $\kappa_k$.

Let $\rho_k = 1/\kappa_k$ be the curvature radius at $x_k'$. The quadratic approximation of the squared distance is given by [48]:

$$F_{SD}(x) = \frac{d_k}{d_k - \rho_k} \left[ (x - x_k)^{\top} T_k \right]^2 + \left[ (x - x_k)^{\top} N_k \right]^2.$$

Thus, the local quadratic model [47] is

$$F = \frac{1}{2} \sum_{k=1}^{m} F_{SD}\big(C(t_k)\big) + \lambda R.$$

The approximation of $F$ corresponds to the Gauss–Newton method for the fitting problem, and the approximation leads to an intrinsic parametrization of the curve.

2.2. Graph Theory

Graphs have long been used for analyzing and enhancing grid inputs from pictures. For 2D pictures, grid inputs are directed graphs whose links carry a direction [50]. While successful in 2D analysis, 3D observation comes with difficulties: undirected graphs and different types of mesh-grid inputs must be handled. Undirected graphs are given structure by using the discrete Laplacian and the Fourier transform, whose eigenvectors encode local features.

Graph theory comes with these challenges during analysis:
(a) Node classification
(b) Graph categorization
(c) Clustering of nodes
(d) Prediction of links
(e) Influence maximization

A graph [51] is defined by $G = (V, E)$, given that $V$ and $E$ represent the vertices and edges, respectively. We assume $|V| = n$ and $|E| = m$. Here, $v_i \in V$ denotes a node and $e_{ij} = (v_i, v_j) \in E$ denotes the edge between $v_i$ and $v_j$. For $G$, the adjacency matrix is given by an $n \times n$ matrix $A$ whose entries $A_{ij}$ are stated as edge weights; here, $A_{ij} > 0$ if $e_{ij} \in E$ and $A_{ij} = 0$ otherwise. Graphs can be of two types, directed and undirected, based on the types of edges. The edges carry a direction in directed graphs, and undirected graphs have connected edges without a direction. Thus, $A_{ij} = A_{ji}$ for undirected graphs, while in general $A_{ij} \neq A_{ji}$ for directed graphs.
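As a small illustration, the sketch below (numpy assumed; the edge list is toy data) builds the symmetric adjacency matrix A of an undirected weighted graph.

```python
# Build the n x n adjacency matrix A of an undirected weighted graph.
import numpy as np

n = 4
edges = [(0, 1, 1.0), (0, 2, 0.5), (1, 2, 2.0), (2, 3, 1.0)]  # (i, j, weight)

A = np.zeros((n, n))
for i, j, w in edges:
    A[i, j] = w
    A[j, i] = w        # undirected graph: A_ij = A_ji
print(A)
```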

By analyzing the eigenvalues of the graph Laplacian matrix, spectral graph theory may provide insight into whether a graph is connected and the quality of that connection. The graph Fourier transform is obtained from the eigendecomposition of the graph's Laplacian matrix, that is, its breakdown into eigenvalues and eigenvectors. The term "convolution" describes the process of multiplying the input neurons by a set of weights, sometimes called "filters" or "kernels," on a graph. In this part, the discrete Laplacian, the Fourier transform, and convolutional operations on graphs will be discussed.

2.2.1. Discrete Laplacian on Graph Theory

The discrete Laplacian (also known as the Laplacian matrix) is central to spectral graph machine learning. The Laplacian matrix allows the creation of a link between discrete inputs of graphs and manifolds and provides a mathematically tractable handle on graph localization limitations. The eigenvalues of the adjacency matrix are an essential factor in localizing two graph vectors, as similar graphs provide the same eigenvalues [52]. Thus, the graph Laplacian is a must for understanding undirected graphs.

The Laplacian matrix is given by $L = D - A$. The degree matrix is the diagonal matrix $D$ such that $D_{ii} = \sum_{j} A_{ij}$. The Laplacian matrix can be of three forms [53]:

$$L = D - A, \qquad L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}, \qquad L_{\mathrm{rw}} = I - D^{-1} A.$$

Assuming a graph that is undirected and simple, the adjacency matrix will only contain 1 and 0. Thus, the entries of $L$ are given by [53]:

$$L_{ij} = \begin{cases} \deg(v_i), & i = j, \\ -1, & i \neq j \text{ and } e_{ij} \in E, \\ 0, & \text{otherwise}, \end{cases}$$

where $\deg(v_i)$ denotes the degree of the node $v_i$. The different forms of the Laplacian matrix are distinguished by how the node degrees enter.
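The three forms can be computed directly from A, as in this brief numpy sketch (the adjacency matrix here is toy data):

```python
# Combinatorial, symmetric normalized, and random-walk Laplacians from A.
import numpy as np

A = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 2.0, 0.0],
              [0.5, 2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
deg = A.sum(axis=1)                        # node degrees
D = np.diag(deg)
I = np.eye(len(A))

L = D - A                                  # combinatorial Laplacian
L_sym = I - np.diag(deg**-0.5) @ A @ np.diag(deg**-0.5)  # symmetric normalized
L_rw = I - np.diag(1.0 / deg) @ A          # random-walk normalized
```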

2.2.2. Graph Fourier Transform

Fourier analysis on the graph is possible because the eigenvectors of the Laplacian matrix play the role of the Fourier basis. The Fourier transform and its inverse enable a signal to be represented at two different scales [54]. Since the Laplacian is real, symmetric, and positive semidefinite, it has $n$ nonnegative eigenvalues with mutually orthogonal, independent eigenvectors, and the graph's Laplacian matrix may be expressed as in [55]:

$$L = U \Lambda U^{\top}.$$

Given that $\lambda_l$ is an eigenvalue and $\Lambda$ is the eigenvalue matrix such that $\Lambda = \operatorname{diag}(\lambda_0, \ldots, \lambda_{n-1})$ with $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$, the significance of the graph's vertices is represented via the values of $\Lambda$. Also, $U = [u_0, u_1, \ldots, u_{n-1}]$ is the matrix of eigenvectors. Since $U$ is orthogonal, it satisfies $U U^{\top} = I$. Thus, $L$ will be [54]:

$$L = U \Lambda U^{\top} = \sum_{l=0}^{n-1} \lambda_l\, u_l u_l^{\top}.$$

To use the eigenvectors in the Fourier transform, functions must be expanded in the basis $\{u_l\}$. A graph signal is a function $f: V \to \mathbb{R}$ such that $f(i)$ denotes the value at node $i$.

The Fourier transform can then be expressed as $\hat{f} = U^{\top} f$ and the inverse Fourier transform as $f = U \hat{f}$. Here, the eigenvectors are used as the basis, and the transform projects the input graph signal onto the orthogonal space.
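A compact numpy sketch of the transform pair (toy graph and signal, chosen only for illustration):

```python
# Graph Fourier transform: L = U diag(lambda) U^T, f_hat = U^T f, f = U f_hat.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)                # eigenvalues ascending, U orthonormal
f = np.array([0.3, 1.2, -0.5, 2.0])       # a toy graph signal, one value per node

f_hat = U.T @ f                           # forward transform (spectral domain)
f_rec = U @ f_hat                         # inverse transform (spatial domain)
assert np.allclose(f, f_rec)
```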

2.2.3. Convolutional Operations on Graphs

Via the Fourier transform, convolution on the graph becomes component-wise multiplication in the frequency domain. Convolution requires a filter acting on the convolution layers. For any input $x$ and a filter $g$, the result is [55] as follows:

$$g \star x = U \left( (U^{\top} g) \odot (U^{\top} x) \right) = U\, \hat{g}(\Lambda)\, U^{\top} x,$$

where $\odot$ denotes element-wise multiplication and $\hat{g}(\Lambda) = \operatorname{diag}\big(\hat{g}(\lambda_0), \ldots, \hat{g}(\lambda_{n-1})\big)$.

Because such nonparametric filters scale with the number of nodes and are not localized, we use a polynomial parametrization of the filter. Thus, the resultant is [55] as follows:

$$g_{\theta}(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^{k}, \qquad g_{\theta} \star x = \sum_{k=0}^{K-1} \theta_k L^{k} x,$$

where $\theta \in \mathbb{R}^{K}$ is the polynomial parametrization and the polynomial has degree $K - 1$. The complexity becomes clearer as the filter is $K$-localized, with the known relation $(L^{K})_{ij} = 0$ whenever the shortest-path distance between nodes $i$ and $j$ exceeds $K$.
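The practical consequence is that filtering needs only repeated matrix-vector products, never an eigendecomposition, as this hedged sketch shows (poly_filter and the toy coefficients are illustrative names, not from the source):

```python
# K-localized polynomial filtering: g_theta * x = sum_k theta_k L^k x.
# Only matrix-vector products with L are needed, no eigendecomposition.
import numpy as np

def poly_filter(L, x, theta):
    """Apply sum_k theta[k] * (L^k @ x) with K = len(theta) coefficients."""
    out = np.zeros_like(x)
    Lk_x = x.copy()            # L^0 @ x
    for t_k in theta:
        out += t_k * Lk_x
        Lk_x = L @ Lk_x        # advance to the next power of L
    return out

# With K = 3 coefficients the filter only mixes nodes within 2 hops:
# y = poly_filter(L, f, theta=[0.5, -0.2, 0.1])
```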

2.3. Manifold Geometry

A manifold can be defined as curved, but locally, it can be seen as flat, as in Figure 4. Thus, although a manifold is a non-Euclidean shape, it can be treated locally as a Euclidean model.

Consider a manifold $X$, a topological space of dimension $d$. Also, let $x, y \in X$ be neighboring points. The inner product of the tangent space is denoted by $\langle \cdot, \cdot \rangle_{T_x X}: T_x X \times T_x X \to \mathbb{R}$, given that $T_x X$ represents the tangent space, the Euclidean space that locally approximates the manifold at $x$; an abstract manifold equipped with such inner products is capable of comparable measurements. For describing 3D data objects, computer vision takes two-dimensional surfaces as input and embeds them in the space $\mathbb{R}^3$. Although the model is not completely treated as 3D, the completed embedding creates a 3D shape to analyze.

2.3.1. Calculus Operations on Manifolds

Since manifolds are non-Euclidean surfaces, calculus cannot be performed on them directly, as declaring global variables is not possible. However, various methods exist to create smooth structure on a manifold, yielding what is known as a differentiable manifold: a space on a manifold where calculus can be performed. That said, manifold data can be segmented into many local spaces, and the calculus must be applied to all of them. Thus, the difference between ordinary calculus and manifold calculus is that ordinary calculus works in a single flat space, whereas manifold calculus is structured to adapt to multiple local spaces of higher dimension.

We state that $f: X \to \mathbb{R}$ is a smooth scalar field on the manifold, and a tangent vector field is given by the mapping $F: X \to TX$, such that $F(x)$ is a tangent vector at the point $x$. Considering the Hilbert spaces $L^2(X)$ of scalar fields and $L^2(TX)$ of vector fields, the inner products [56] are

$$\langle f, g \rangle_{L^2(X)} = \int_X f(x)\, g(x)\, dx, \qquad \langle F, G \rangle_{L^2(TX)} = \int_X \langle F(x), G(x) \rangle_{T_x X}\, dx.$$

With $dx$ being the area element, differentiating $f$ yields the differential $df$, an operator acting on tangent vectors that measures change toward the closest neighboring points. A small displacement of $x$ results in $f\big(x + \epsilon F(x)\big) \approx f(x) + \epsilon\, \langle \nabla f(x), F(x) \rangle_{T_x X}$, where $\nabla f: L^2(X) \to L^2(TX)$ is the gradient operator. Thus, the divergence becomes the operator $\operatorname{div}: L^2(TX) \to L^2(X)$ [56].

The relation between the gradient and divergence is as follows [56]:

$$\langle F, \nabla f \rangle_{L^2(TX)} = \langle -\operatorname{div} F, f \rangle_{L^2(X)}.$$

The Laplacian $\Delta f = -\operatorname{div}(\nabla f)$ is found to be symmetric, as in [56]:

$$\langle \Delta f, g \rangle_{L^2(X)} = \langle \nabla f, \nabla g \rangle_{L^2(TX)} = \langle f, \Delta g \rangle_{L^2(X)}.$$

For compact manifolds, the eigenfunctions of the Laplacian play the same role as the spectral basis for graphs. Also, it can be seen that the Laplacian is a vital part of analyzing non-Euclidean spaces.

2.3.2. Discretization of Manifold

Discretization is required for data conversion, such as from point clouds to manifolds. For discretization, a manifold [57] can be sampled at $n$ points, where the positions of the nodes are stated as $x_1, x_2, \ldots, x_n$, thus creating a graph accordingly. The undirected graphs can be difficult to pose as the number of nodes per unit area increases. Again, the surface can be created as the mesh $(V, E, F)$, where $V$ is the vertex set, $E$ is the edge set, and $F$ is the set of triangular faces [57]. The triangular mesh must have a manifold boundary (each interior edge shared by exactly two faces), and the edge lengths should satisfy the triangle inequality. For an edge $(i, j) \in E$, the cotangent weights of the manifold's mesh Laplacian are given by

$$w_{ij} = \tfrac{1}{2} \left( \cot \alpha_{ij} + \cot \beta_{ij} \right),$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the angles opposite the edge in the two triangles sharing it.
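A short sketch of the cotangent weights on a toy two-triangle patch (numpy assumed; cotangent_weights is an illustrative helper, not from the cited work):

```python
# Cotangent weights w_ij = (cot(alpha_ij) + cot(beta_ij)) / 2 on a mesh.
import numpy as np
from collections import defaultdict

def cotangent_weights(verts, faces):
    w = defaultdict(float)
    for a, b, c in faces:
        # The corner at k faces the edge (i, j); accumulate half its cotangent.
        for i, j, k in ((a, b, c), (b, c, a), (c, a, b)):
            u = verts[i] - verts[k]
            v = verts[j] - verts[k]
            w[tuple(sorted((i, j)))] += 0.5 * u.dot(v) / np.linalg.norm(np.cross(u, v))
    return dict(w)

# A tiny manifold patch: two triangles sharing the edge (0, 2).
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = [(0, 1, 2), (0, 2, 3)]
print(cotangent_weights(verts, faces))   # interior edge (0, 2) gets two terms
```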

3. Methods

With the background in place, we can now analyze the many models that use non-Euclidean geometric data for specific purposes. This geometry, as said previously, can be characterized by graphs as well as manifolds. In network structures like social media, a model learns embeddings that integrate knowledge about each node's surroundings [58]; graphs are employed in these network structures. In addition, manifolds are used for three-dimensional shapes, as well as for various complicated contour surfaces and in model analysis. Convolutional neural networks (CNNs) are the foundation on which the vast majority of these models are typically built, as shown in Figure 5.

The primary applications of this network family include image processing, classification, and segmentation, in addition to the processing of various types of autocorrelated data; the family also contains GANs and GGNs, along with GAEs.

3.1. Methods Based on Graph Convolutional Networks (GCNs)

In the actual world, graphs are the most popular form of data organization, and GCNs fit very well in situations like social media connection analysis, protein model analysis, and traffic control. Every image and video can be presented as a grid or grid-structured data; to manipulate it, we need the solution of a graph convolutional network. In modern technology, deep learning greatly impacts various developments like games, not just social networks. Image analysis is a testament to the effectiveness of deep learning, and it is effective in computer vision as well. Recently, researchers have been trying to evolve graph-based architectures using some traditional models, like the CNN, long short-term memory (LSTM), attention mechanism (AM), and autoencoder (AE), for more efficient performance.

However, this incorporates some problems. As images can be of different types, their complexities also vary. As an image contains more complex data (e.g., different lighting conditions, shades, and complex contour structures in various colors of light), it becomes more difficult to extract the actual shape from the image. Technically, the number of nodes in the graphs may vary widely. As a result, the convolution operation becomes difficult in this situation, which brings another problem: the data size varies with the number of nodes. Each of these raises new problems for the algorithms.

Now, this paper will classify the graph-based models into three different parts according to their basis of analysis or method of working:
(i) Spectral-based GNN
(ii) Spatial-based GNN
(iii) Spatial- and spectral-based GNN

Researchers try to derive a common form of convolutional network that may work in any scenario. The main theme of this network is shown in Figure 6: it makes an input graph from the image, sets the target, learns from neighboring nodes, and aggregates those features to form usable data, as in Figure 7.

GCNs come in two kinds. Graph signal processing, inspired by spectral graph theory, is the foundation of spectral graph convolution [59]; spatial-domain convolution is the second kind.

In this context, the graph Fourier transform and its inverse were expressed above. The equations distinguish between the spatial and spectral domains of a graph: $f$ represents the signal in the spatial domain, while $\hat{f}$ indicates the spectral domain. Consequently, if a signal $f$ is meaningful in the spatial domain, $\hat{f}$ is likewise meaningful in the spectral domain. Filter signals are typically referred to as kernels, and the Fourier coefficients of a graph signal often decay fast; such a signal is compressible, since it may be approximated from a few of its coefficients.

3.1.1. Spectral-Based GCNs

The standard CNN cannot be applied directly to graphs; nevertheless, because of its effective feature-extraction capacity, the spectral-based GCN model may be highly beneficial for extracting features [60]. It makes it possible to define non-Euclidean convolution, and by analogy, the relation may be described in terms of the frequency domain. In recent times, the graph Laplacian matrix has been used directly because its convolutional layer architecture is so effective.

(1) Spectral CNN. The "specifying" architectural cluster in spectral CNNs includes additional inputs. A vector representing the network's most recent output distribution is sent to the specified input for each training example at each training step. The propagation of this input then proceeds in a conventional feed-forward manner, with the specifying cluster and network layer at the end. This extra cluster is likewise subject to a learning rule. This strategy comes in a variety of architectural forms: using a single-layer or a multilayer structure, the outputs of the specifying cluster are linked with the output layer and a hidden level of the network [61]. With this model, convolution filters are altered to provide greater optimization capability via complex-coefficient spectral parameterization. Competitive outcomes on classification and approximation tasks were accomplished without the need for dropout or max pooling, thanks to a more recent method of randomized change of resolution.

(2) CayleyNets. The CayleyNet model is a modified version of ChebNet [24]. ChebNet uses Chebyshev filters that avoid expensive computation without using eigenvectors. The main drawback of that model is that it cannot produce narrow-band filters; this matters when eigenvalues are clustered around the lowest frequencies and the spectral gap is large [62]. CayleyNets add a new type of filter that keeps the simplicity of the Chebyshev filters and can also produce narrow-band filters to counter this disadvantage. A Cayley polynomial of order $r$ is a real-valued function of a complex argument [24]:

$$p_{c,h}(\lambda) = c_0 + 2 \operatorname{Re} \left\{ \sum_{j=1}^{r} c_j (h\lambda - i)^{j} (h\lambda + i)^{-j} \right\},$$

where $c = (c_0, \ldots, c_r)$ is a vector with a single real coefficient $c_0$ and complex coefficients $c_1, \ldots, c_r$, and $h > 0$ is denoted as the spectral zoom parameter. For a real signal $f$, the Cayley filter $G$ is given as $Gf = p_{c,h}(\Delta) f$ and defined by [24]:

$$Gf = c_0 f + 2 \operatorname{Re} \left\{ \sum_{j=1}^{r} c_j (h\Delta - iI)^{j} (h\Delta + iI)^{-j} f \right\}.$$

The parameters $c$ and $h$ are optimized in training. The filter works with basic matrix operations, similar to ChebNet; thus, no eigendecomposition is required for the filter to work. Cayley filters are rational functions of the Laplacian and thus related to ARMA filters. As general ARMA filters require matrix inversion, there is no way to ensure stable inversion, since the training path is unknown; Cayley filters, by contrast, ensure stable inversion. Also, the general ARMA filter uses a larger number of parameters, which overfits the objective, whereas the Cayley filter uses a moderate number of parameters.

The model presents a new class of extremely regular, localized, complex rational Cayley filters that can represent any smooth spectral transfer function. The fundamental characteristic of the model is its ability to maintain localization in the spatial domain while specializing in narrow frequency bands with a limited number of filter parameters.
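As an illustration of the filter's definition, the hedged numpy sketch below evaluates a Cayley filter spectrally on a small graph; note that the actual model avoids eigendecomposition (using iterative solvers instead), so this is only a didactic reference implementation with toy coefficients.

```python
# Cayley filter G f = p_{c,h}(L) f evaluated spectrally (didactic only).
import numpy as np

def cayley_filter(L, f, c0, c, h):
    """c0: real coefficient; c: complex coefficients c_1..c_r; h: spectral zoom."""
    lam, U = np.linalg.eigh(L)
    z = (h * lam - 1j) / (h * lam + 1j)    # Cayley transform of the spectrum
    p = c0 + 2.0 * np.real(sum(cj * z**j for j, cj in enumerate(c, start=1)))
    return U @ (p * (U.T @ f))             # multiply in the spectral domain

# y = cayley_filter(L, f, c0=0.5, c=[0.2 + 0.1j, -0.05j], h=1.0)
```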

(3) UV-Net (Boundary Representations). The U and V parameters of curves and surfaces are explicitly expressed, while an adjacency graph explicitly defines topology, in a boundary-representation (B-rep) data model. Combining convolutional neural networks for image processing with graph neural networks yields UV-Net [31], a network that is economical in both memory and computation.

Numerous topological components, such as faces, edges, half-edges, and vertices, and their connections compose the boundary-representation data model. UV-Net extracts the most critical geometric and topological data from the boundary representation and transforms it into a format suitable for modern neural network architectures [31].

The UV-Net representation offers some benefits:
(1) For both primitive and geometric surface types, evaluation of the curve and surface parameters can be applied quickly and easily [31]
(2) The representation is sparse and proportional to the number of B-rep contour surfaces
(3) The grid is mostly independent of the precise parametrization

Using graph convolutions, local curve and surface characteristics are conveyed over the whole boundary representation. Curve and surface convolution is performed on 2D UV-grids. For message passing, the hidden features produced by the curve and surface CNNs are treated as the input edge and node features of the GNN. The hidden node features in a graph layer are calculated by combining all the features of a node's one-hop neighborhood while conditioning them on the edge features [31]:

$$h_v^{(t)} = \phi \Big( (1 + \epsilon)\, h_v^{(t-1)} + \sum_{u \in N(v)} \big( h_u^{(t-1)} + W\, h_{uv}^{(t-1)} \big) \Big).$$

Here, $\phi$ is an MLP, or in other words, a multilayer perceptron with two FC (fully connected) layers, $\epsilon$ is a parameter to differentiate the center nodes from the neighbors, and $W$ represents a linear projection from the edge to the node feature space [31]. A hidden edge feature is influenced by the features of its end points. The following recursive model underpins this learning process [31]:

$$h_{uv}^{(t)} = \psi \big( h_{uv}^{(t-1)} + h_u^{(t)} + h_v^{(t)} \big),$$

where $\psi$ is also an MLP having two layers. The final shape embedding is obtained by linearly projecting these characteristics into 128-D vectors and summing them [31].
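A hedged sketch of the node update reconstructed above (all names, shapes, and the plain-numpy formulation are illustrative assumptions rather than the paper's implementation):

```python
# GIN-style node update with edge features, written in plain numpy.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def node_update(h, edge_h, nbrs, W_edge, W1, W2, eps):
    """h: (n, d) node features; edge_h: dict mapping (u, v) -> edge feature
    (both orientations present); nbrs[v]: one-hop neighbors of v;
    W1, W2: weights of the two-layer MLP phi."""
    out = np.empty_like(h)
    for v in range(len(h)):
        agg = (1.0 + eps) * h[v]               # eps marks the center node
        for u in nbrs[v]:
            agg = agg + h[u] + W_edge @ edge_h[(u, v)]  # edge -> node space
        out[v] = W2 @ relu(W1 @ agg)           # two fully connected layers
    return out
```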

The model utilizes existing image and graph convolutional neural networks and can operate on B-rep data. Its advantages and adaptability are demonstrated on both supervised and self-supervised tasks spanning five B-rep datasets, outperforming other representations such as point clouds, voxels, and meshes. A fresh synthetic B-rep dataset with variations in geometry and topology, SolidLetters, was also introduced.

(4) CurvaNet. Analyzing 2D images on a regular grid is much less challenging than analyzing 3D shapes given as mesh surfaces or manifold input. Traditional GNN models segment the surface into smaller patches and treat them as flat, so they cannot capture higher-order surface changes due to the lost information. CurvaNet modifies the GNN model by integrating differential geometry [63]. Accuracy is maintained by downsampling through mesh pooling and upsampling through unpooling, using an encoder and decoder; classification error is thus reduced by considering more input properties. The architecture is quite similar to the U-Net model and runs through a curvature filter (CF) and graph convolution filter (GC) for segmentation. Skip connections are used to preserve precise boundaries [64].

A directional curvature filter negates the limitations of fixed curvature features, such as data loss and underfitting or overfitting. Learning direction-dependent weight parameters is made easier by segmenting all directions with many tangent vectors; the vectors must have a unique origin to avoid null values. Pooling over rotations is applied to ensure fixed parameters across different angles. Graph convolution layers are used for sampling a neighborhood using the graph Laplacian matrix; ChebNet [24] and the graph attention network (GAT) are used for this function. Afterward, the curvature properties are conserved through downsampling and upsampling. Let $\sigma$ be the nonlinear activation function and $W$ the shared kernels. The curvature feature matrix is obtained by applying $\sigma$ and the shared kernels over the curvature intervals, where $k$ is the interval index and $K$ is the maximum number of intervals [63].

To learn the directional curvature features at each vertex on a mesh surface, the model offers a unique convolutional filter. The mesh surface’s curvature features are sent using graph convolutional methods. A U-Net-like hierarchical structure that downsamples and upsamples a mesh surface dependent on mesh simplification was presented to make use of multiscale curvature features.

(5) Anisotropic Chebyshev Spectral CNNs (ACSCNNs). The anisotropic Chebyshev spectral CNN [30] is a shape-correspondence architecture based on manifold convolution. Extended convolution operators aggregate local signal characteristics through a set of directed kernels around each point, capturing additional signal information. Spectral filtering is used to train the kernels, based on the eigendecomposition of multiple anisotropic Laplace–Beltrami operators (LBOs). To decrease computational cost, the spectral filters are represented by trainable Chebyshev polynomial expansion coefficients [30].

On the manifold $X$ [30], the inner product of two scalar fields is defined by

$$\langle f, g \rangle_{L^2(X)} = \int_X f(x)\, g(x)\, dx,$$

and $dx$ is the area element.

Vallet and Lévy [65] observed that the eigenvalues and eigenfunctions of the LBO are comparable to the frequencies and Fourier basis in Euclidean space. The LBO can be described [30] as

$$\Delta f = -\operatorname{div}(\nabla f), \qquad \Delta \phi_k = \lambda_k \phi_k, \quad k = 0, 1, 2, \ldots$$

The inner product $\hat{f}(\lambda_k) = \langle f, \phi_k \rangle_{L^2(X)}$ is known as the Fourier transform (coefficient) for manifolds, because the eigenvalues and eigenfunctions of the LBO have properties analogous to frequencies and harmonics. The inverse Fourier transform for the manifold can be expressed [30] as

$$f(x) = \sum_{k \ge 0} \hat{f}(\lambda_k)\, \phi_k(x).$$

The convolution theorem based on manifolds is then defined as [30]:

$$(f \star g)(x) = \sum_{k \ge 0} \hat{f}(\lambda_k)\, \hat{g}(\lambda_k)\, \phi_k(x).$$

The anisotropic LBO (ALBO) [66] can be defined as

$$\Delta_{\alpha\theta} f = -\operatorname{div}\big( A_{\alpha\theta}(x)\, \nabla f \big).$$

As illustrated in the following model, the conductivity tensor of the ALBO is written [66] as

$$A_{\alpha\theta}(x) = R_{\theta}(x) \begin{bmatrix} \alpha & 0 \\ 0 & 1 \end{bmatrix} R_{\theta}^{\top}(x).$$

Here, $R_{\theta}(x)$ is a rotation by the angle $\theta$ around the surface normal within the tangent plane, $A_{\alpha\theta}$ is a thermal conductivity tensor, and the parameter $\alpha$ controls the anisotropy level [30].

Instead of employing the anisotropic heat kernels of [25], the model aims to learn task-dependent kernels through parameterized filters $g_{\theta}$. As stated in [24, 67], a polynomial filter may be used for this purpose. Chebyshev polynomials are adopted for the filter, and their recurrence relation eliminates the eigendecomposition of the ALBO, making learning easier [24, 68]. With Chebyshev polynomials $T_n$ of order up to $N - 1$, the filter can be expressed [30] as:

$$g_{\theta}(\lambda) = \sum_{n=0}^{N-1} \theta_n\, T_n(\tilde{\lambda}), \qquad T_n(\lambda) = 2\lambda\, T_{n-1}(\lambda) - T_{n-2}(\lambda), \quad T_0 = 1, \; T_1(\lambda) = \lambda,$$

where $\tilde{\lambda}$ denotes the eigenvalues rescaled to $[-1, 1]$.

An extension of the manifold convolution operator, the model suggests the anisotropic convolution operator. Due to its direction-based consideration, this sort of anisotropic convolution enables a more thorough capture of the intrinsic local information of signals when compared to earlier works. To simplify the computation, Chebyshev polynomials are used to express the filters with trainable coefficients. In certain cases, the achieved outcome was superior to that of the earlier models.

3.1.2. Spatial-Based GCN

Unlike the spectral approach, this convolution can be used directly on the graph because the kernel size is fixed, so the neighbors of concern must be selected and convolved in a traditional manner. Pseudocoordinates are used by the filter function and accumulated at the time of convolution. The most difficult aspect of developing CNNs that function on core nodes with varying numbers of neighboring nodes is establishing local invariance for such CNNs.

The first intrinsic version of CNNs was introduced by Masci et al. [16]. Evolution continued with Boscaini et al. [25], who introduced anisotropic heat kernels. A more general framework (MoNet) was then introduced by Monti et al. [69] to develop deep convolutional architectures on graphs and manifolds. Then, the B-spline-based filter was introduced by Fey et al. [70], which works quite efficiently on input of arbitrary dimensionality.

(1) Diffusion CNN (DCNN). The motivation for the diffusion CNN is that a representation encompassing graph diffusion can serve as a more reliable foundation for forecasting than a graph alone. Graph diffusion provides a simple method for including contextual information about nodes that can be calculated in polynomial time and effectively executed on the GPU. Many methods, such as probabilistic structural models and kernel methods, incorporate structural information in classification tasks; DCNNs offer a supplementary strategy that significantly improves predictive performance in node classification [23]. When performing node classification tasks, diffusion-convolutional neural networks outperform probabilistic relational models and kernel approaches thanks to a representation that captures the effects of graph diffusion. Diffusion processes do a good job of representing nodes, but they are ineffective at summarizing complete graphs.

(2) Graph Neural Network (GNN). The graph neural network is a supervised neural architecture that works well for graph- and node-based applications. Two existing concepts are combined into a single framework by this model, referred to as the GNN: it is shown to extend both random walk models and recursive neural networks while retaining their features. The model extends recursive neural networks because it can handle node-focused tasks without a preprocessing stage and can analyze a wider range of graphs, covering cyclic, directed, and undirected graphs. The method broadens the range of processes that may be described and adds a learning mechanism to random walk theory [19].

The model offers a cutting-edge neural network architecture that can handle inputs from cyclic, directed, and undirected graphs or a combination of these. The diffusion of information and relaxation mechanisms are the model's foundations. Analysis of the outcomes shows that the strategy is also appropriate for huge datasets.

(3) GraphSAGE. To accomplish the objective of node categorization, GraphSAGE models fully utilize the attribute information, structure, and knowledge of nodes in social networks, and mine the implicit mutual information among nodes. The best performance of a graph neural network may be comparable to that of the Weisfeiler–Lehman (WL) graph isomorphism test when its feature update and aggregation functions are injective. An enhanced GraphSAGE [71] method is used to create the model.

The model offers a novel method that makes it possible to efficiently create embeddings for unseen nodes. GraphSAGE successfully balances performance and runtime via sampling node neighborhoods, regularly outperforms state-of-the-art baselines, and offers a theoretical analysis that sheds light on understanding local graph structures.

(4) Large-Scale Graph Convolution Network (LGCN). An algorithm-architecture co-optimization to speed up large-scale GCN inference on FPGA is provided to combat the significant expense of external storage access when evaluating graph-structured datasets. To comply with on-chip storage restrictions, data splitting is executed first. Then, to decrease computational complexity and improve data locality, a two-phase preprocessing approach is applied. The main computational kernels are mapped onto an FPGA during hardware design, and data transmission for pipelined execution occurs through on-chip memory [72]. Varied GCN architectures and various analytic orders are supported by the data path.

The suggested architecture allows standard convolutional methods to be applied by transforming generic graphs into data with grid-like patterns. The transformation is carried out using a novel k-largest node selection method that ranks the values of node features.

(5) Mixture Model CNN (MoNet). For analyzing non-Euclidean geometry, models such as GCNN and ACNN for manifolds and GCN and DCNN for graphs have been proposed, but they come with some shortcomings: each segment of the shape requires a separate local function, and the functions are not learnable for reuse on similar local regions. Using a parametric construction that can be reused across similar localities, a different framework, the mixture model network, is presented [69]. Its operations are spatial-domain convolutions using local parameters of "patches" of manifolds or graphs [73]. The patches create a local function that can be represented as a mixture of Gaussian kernels.

A general spatial framework is used in mixture model CNNs for graphs and manifolds. For any point $x$ on the manifold and a vertex $y$ of the neighborhood $N(x)$, a $d$-dimensional vector of pseudocoordinates is denoted as $u(x, y)$. A weighting function with learnable parameters is applied to these coordinates. Using fixed parameters of Gaussian kernels and geodesic coordinates, the mixture model CNN can be reverted to the GCNN, ACNN, and GCN models. The mixture model is based on parametric kernels with learnable parameters, denoted as follows [69]:

$$w_j(u) = \exp\left( -\tfrac{1}{2} (u - \mu_j)^{\top} \Sigma_j^{-1} (u - \mu_j) \right), \quad j = 1, \ldots, J,$$

where $\Sigma_j$ is a learnable $d \times d$ covariance matrix and $\mu_j$ is a learnable mean vector of the Gaussian kernel. In practice, the covariances are restricted to diagonal form for the patch operators.
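A small numpy sketch of the Gaussian-mixture weighting (the mu/Sigma values are illustrative toy numbers, not learned parameters):

```python
# Gaussian-mixture patch weights: one weight per kernel j for a neighbor
# with pseudocoordinates u (diagonal covariances for simplicity).
import numpy as np

def monet_weights(u, mus, sigmas):
    """u: (d,) pseudocoordinates; mus: (J, d) means; sigmas: (J, d) variances."""
    diff = u[None, :] - mus
    return np.exp(-0.5 * np.sum(diff**2 / sigmas, axis=1))

u = np.array([0.4, 1.1])                       # e.g., local polar coordinates
mus = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]])
sigmas = np.ones((3, 2))
print(monet_weights(u, mus, sigmas))           # J = 3 weights for this neighbor
```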

To achieve deep learning, the aggregate ensemble CNN (AECNN) model integrates two distinct convolutional neural network architectures. Two deep-learning networks, AlexNet and NIN, are integrated to calculate the weighted average of feature vectors. Based on AECNN modeling runs, the aggregate model outperforms the single-CNN ensemble model in terms of classification accuracy and retrieval precision for images.

(6) SplineCNN. SplineCNN is a modified version of the CNN for handling non-Euclidean geometry, built around B-splines. The model uses convolution over spline bases, and the convolution layer takes graph-structured data, directed or undirected, as input [70].

A trainable set of weights is used to aggregate the node features in the spatial layer. The node features, represented by $f(i)$, are weighted by a continuous kernel function [74]. The spatial relation of the nodes is stated by pseudocoordinates in $U$. As no restrictions are placed on $U$, no values are lost in a local neighborhood: it can contain edge weights, node features, and other local data.

A continuous kernel function built from B-spline bases is used for the convolution operation, parametrized by constant values of the trainable sets. For computational efficiency, the basis functions vanish outside a preset interval. Now, consider B-spline bases of degree $m$ and a trainable parameter $w_p$ for every element $p$ of the Cartesian product $\mathcal{P}$ of the basis functions [70], while $l$ indexes the input features. For input feature $f_l$, the trainable parameters can be defined as $w_{p,l}$. The continuous convolution kernel [70] is then

$$g_l(u) = \sum_{p \in \mathcal{P}} w_{p,l}\, B_p(u),$$

where $B_p$ represents the product of the basis functions in $p$. Relative to traditional CNNs, SplineCNN adds a normalization factor and a differently structured filter; otherwise, SplineCNN can also take 2D inputs with a few kernels to analyze the dataset.

On irregularly structured, geometric input data, the model learns. The proposed convolution filter combines nearby information in the spatial domain by using a trainable continuous kernel function with trainable B-spline control values. SplineCNN is the first architecture that enables robust end-to-end deep learning directly from geometric data.

3.1.3. Spectral- and Spatial-Based Models

In this category, models benefit from both spectral and spatial characteristics; their combined features are used for better data extraction.

(1) Multiscale Dynamic Graph Convolutional Network (MDGCN). GCN is capable of performing convolution on non-Euclidean data and is ideal for irregular image regions represented by graph topological data [42]. In MDGCN, graph construction and graph convolution work together to produce discriminative embedded features and a revised graph, with the graph continually updated while the graph convolution runs. Simple linear iterative clustering (SLIC) [75] is utilized when a hyperspectral image is supplied as input; this approach creates homogeneous superpixels. Then, at different spatial scales, graphs are constructed on top of these superpixels. Following that, the input graphs are further refined by performing convolutions on them, which simultaneously acquire and accumulate multiscale spectral-spatial features. In an ideal embedding space, superpixels belonging to the same class will be grouped together. Finally, the well-trained network produces the categorization results [42].

The proposed model targets hyperspectral image categorization. During the convolution process, MDGCN utilizes dynamic graphs that are gradually refined. As a result, the graphs can accurately encode the inherent similarities between image regions and aid in the discovery of precise region representations. To fully utilize the multiscale information and capture hidden spatial context with superior results, multiple graphs with various neighborhood scales are built.

3.1.4. Comparison between Spectral- and Spatial-Based Models

The models based on spectral analysis are often more effective than the spatial ones; however, the intricacy of the calculations grows as the size of the graph rises, because the eigendecomposition used inside this convolution is expensive, and changes to the graph may modify the results [67]. A summary of their main points is shown in Figure 8.

On the other hand, spatially based models are more beneficial for huge graph-based applications. These models accomplish localization by grouping the nodes in their immediate surroundings.

The most significant thing to note is that, in contrast to the spectral approach, the spatial approach may be applied directly to the graph data. While a model based on spectral analysis has difficulty dealing with varied graph-based input data, a model based on spatial analysis can very effectively handle many sources of graph input (such as edge features and edge directions).

Because it functions well only on certain predefined graphs, the spectral-based approach has a restricted range of applications. To put it another way, it is improbable that a model trained for one application will function well for another, because each graph and its Laplacian are unique.

On the other hand, the spatial basis relies on each node of the graph individually. Because of this, it is used extensively in three-dimensional shape and structural analysis (e.g., shape correspondence on the FAUST dataset), and it is relatively easy to apply this model to a variety of roles and structures. For these reasons, the spatial model is broader and is gaining more appreciation day by day.

3.1.5. Apart from GCN Architecture

This section provides an overview of additional graph neural networks; GANs (graph attention networks) and GGNs (graph generative networks) are described here.

(1) Graph Attention Networks (GANs). Graph attention networks are an attention-based architecture used to classify nodes in graph-structured data. The goal is to use a self-attention mechanism over each node's neighbors to compute each node's hidden representation. The attention structure has several intriguing characteristics, such as efficient operation, because it may be parallelized over node-neighbor sets. By assigning arbitrary weights to neighbors, it may also be applied to graph nodes with varying degrees, and the method is directly applicable to inductive learning problems, including challenges where the algorithm must generalize to wholly unknown graphs [76].

(2) Graph Generative Network (GGN). The difficulty of creating a graph structure for expanding graphs with new nodes that are disconnected from the previously observed graph is overcome by graph generative networks [77]. This has significant importance for the slow-response issues in social platforms and recommendation systems. The fundamental generating process is assumed to be stationary during growth. Graph RNNs [6] utilize neither node characteristics nor a natural extension to new, isolated nodes. Similar problems plague the majority of alternative graph representation learning techniques; notably, the separation from the existing graph makes it difficult to apply aggregation or message passing. Understanding how graph architectures are generated sequentially, both for situations where node characteristics and topological information are present and for situations in which only node characteristics are accessible, solves this problem.

A sequential generative model for growing graphs is proposed that combines graph representation learning and graph convolutional networks. Scalability, however, is still a significant problem because it depends on the size of the entire graph.

3.2. Method Based on Manifolds

As discussed earlier, manifolds are nonconventional geometries with complex shapes. Figure 9 shows that the smallest portion of a manifold can be explained as conventional geometry, like a rectangle; manifold-based models work based on this rule. Manifolds, like the shapes found in nature, can take many different complex forms.

3.2.1. Voxel-Based

Bounded by small boxes, 2D images cannot capture the depth of 3D spaces or be analyzed further, as the data become blurry in smaller portions, as in Figure 10. Voxel-based data create an object by segmenting it piece by piece into a 3D grid, thus creating a richer dataset capable of in-depth analysis with higher accuracy.

(1) ShapeNet. ShapeNet advances CNNs to non-Euclidean manifolds, demonstrating how to use them to create invariant shape descriptors. Utilizing a local system of geodesic coordinates, ShapeNet produces "patches" to which a range of linear and nonlinear operators is applied [78]. The parameters of these filters can be trained to reduce a loss function that depends on the task at hand. Because of the framework's considerable flexibility, different descriptors can be obtained depending on the requirements by combining several layers in various configurations. CNNs are extended to manifolds by using the idea of geodesic convolution. The design, known as ShapeNet [78], is made up of numerous tiers applied in succession, meaning that the result of one layer advances as the input for the next. The depth of the model is measured by the number of "hidden" layers that exist between the input and output levels. The layer types are fully connected, ReLU, geodesic convolution, angular max pooling, and Fourier transform magnitude.

ShapeNet is trained as a Siamese neural network [79], a well-liked architecture that has been extensively employed in metric learning tasks. A Siamese network comprises two identical models with the same parameterization, fed by pairs of data that are purposefully similar (positive pairs) or distinct (negative pairs). The loss minimized in the model [78] is

$$\ell(\Theta) = (1 - \gamma) \sum_{(x, x^{+})} \lVert f_{\Theta}(x) - f_{\Theta}(x^{+}) \rVert^{2} + \gamma \sum_{(x, x^{-})} \big( \mu - \lVert f_{\Theta}(x) - f_{\Theta}(x^{-}) \rVert \big)_{+}^{2},$$

where $\gamma$ denotes the parameter trading off between the positive and negative losses [78], $\Theta$ is the set of variables of the ShapeNet model across its set of layers, $(\cdot)_{+} = \max(\cdot, 0)$, and the negative pairs are pulled apart up to the margin $\mu$.
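The same loss, written as a short numpy sketch under the reconstruction above (function and argument names are assumptions):

```python
# Siamese loss: pull positive pairs together, push negatives past margin mu.
import numpy as np

def siamese_loss(pos_pairs, neg_pairs, gamma=0.5, mu=1.0):
    """pos_pairs/neg_pairs: lists of (descriptor_a, descriptor_b) arrays."""
    pos = sum(np.sum((a - b) ** 2) for a, b in pos_pairs)
    neg = sum(max(0.0, mu - np.linalg.norm(a - b)) ** 2 for a, b in neg_pairs)
    return (1.0 - gamma) * pos + gamma * neg
```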

With the use of non-Euclidean manifolds, the model suggests generalizing convolutional neural networks to learn hierarchical task-specific features. The model is extremely flexible and general, and it may be made arbitrarily complex by stacking additional layers. The model improves on various prior shape descriptor approaches by generalizing them.

3.2.2. Multiview-Based

If 2D views for different surfaces are combined, a good idea of the 3D object can be obtained, as shown in Figure 10. Based on this fact, the multiview model takes different 2D features as input, as in Figure 11(a), and combines them by view pooling for analyzing 3D objects, as in Figure 11(b).

(1) Multiview Convolutional Neural Networks (MVCNNs). A conventional CNN architecture trained to recognize shapes' rendered views independently of one another shows that a 3-dimensional shape can be recognized from a single view with greater accuracy than when using state-of-the-art 3-dimensional shape descriptors [33]. Recognition rates improve further when multiple views are provided. The MVCNN architecture integrates many perspectives of 3D geometry into a single, compact descriptor for improved recognition. This multiview depiction of 3D forms is useful for many activities. First, existing 2D image features are utilized to create a view descriptor; this is the simplest way to use the multiview setting. For recognition jobs, the multiple 2D image descriptors per 3-dimensional shape, one per view, must be integrated [33].

For image descriptors, two types are used for each 2D view in this model: a state-of-the-art "hand-crafted" image descriptor based on Fisher vectors [80] with multiscale SIFT, and CNN activation features [81]. A one-versus-rest linear SVM was trained to categorize shapes using these image features [33]. A measurement of distance or similarity is essential for retrieval tasks. Taking a shape $X$ with $n_x$ image descriptors and a shape $Y$ with $n_y$ image descriptors, the distance between them is calculated over the two descriptor sets. The distance between two 2D images is taken as the $\ell_2$ distance between their feature vectors, and the set-to-set distance aggregates the minimum pairwise distances across views [33].
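One plausible reading of such a view-set distance, as a hedged numpy sketch (the symmetrized min-distance aggregation and all names here are illustrative assumptions, not necessarily the paper's exact scheme):

```python
# Distance between two shapes given as sets of per-view descriptors.
import numpy as np

def viewset_distance(X, Y):
    """X: (nx, d) view descriptors of one shape; Y: (ny, d) of the other."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)   # pairwise l2
    return 0.5 * (d.min(axis=0).mean() + d.min(axis=1).mean())  # symmetrized

X = np.random.rand(12, 4096)    # toy data: 12 views, 4096-D descriptors each
Y = np.random.rand(12, 4096)
print(viewset_distance(X, Y))
```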

The model suggests using these several 2D projections, which produce excellent discrimination performance. Compactness, efficiency, and improved accuracy can be attained by creating descriptors that aggregate information from many viewpoints. Additionally, these 3D shapes can be retrieved using sketches with high precision, taking advantage of the implicit understanding of 3D shapes contained in their 2D views by connecting 3D shape information to 2D representations like sketches.

3.2.3. Difference between Volumetric CNN and MVCNN

A 3D shape is encoded in the volumetric representation as a 3-dimensional tensor of binary or real values. In contrast, the multiview representation organizes a 3-dimensional shape as an accumulation of multiple perspective renderings. It seems intuitive that the volumetric representation should be able to carry extra information about the characteristics of three-dimensional structures compared with the multiview representation. The most important highlights are shown in Figure 12.

However, Qi et al. [36] replicated the tests on the ModelNet40 dataset, running 3D ShapeNets on a 30 × 30 × 30 voxel grid against multiview CNNs. According to the data, the voxel-based volumetric CNN is 7.3% less accurate at classification than the MVCNN [82]. There are at least two probable explanations: the resolution of the input data and the differences in network architecture [36]. Even when both networks are fed equal levels of information, the classification accuracy of the MVCNN is much higher (89.5%) than that of 3-dimensional ShapeNets (84.7%); in this experiment, the MVCNN is fed sphere renderings generated from the same 30 × 30 × 30 grid. Since the MVCNN's classification performance remains much greater than that of 3D ShapeNets even with this lower-resolution input, the design of volumetric CNNs evidently has many opportunities for improvement.

3.2.4. Point-Based

The advantage of the point-based approach is that it may take point cloud data as input without first transforming it to voxels, meshes, or another type of 3D representation. A point cloud is a collection of geometrically significant points that form a structure. Alongside its many benefits, there are certain difficulties, such as the sparsity and irregularity of the geometric data. Because it can be used effectively in various contexts, researchers are becoming increasingly interested in this family of models, and as a direct consequence, progress is persistent [83, 84].

(1) PointNet++. PointNet++ [27] is a hierarchical neural network that applies PointNet [26] recursively on a nested partitioning of the input; it is a direct successor of PointNet. Although PointNet was the very first DNN capable of manipulating 3D point clouds natively, several networks have since been developed. PointNet learns a spatial encoding of every location in the input cloud and then, by aggregating all the features, determines the global characteristics of the point cloud. PointNet++ removes this shortcoming by solving how a point cloud can be partitioned locally and how local characteristics can be extracted; in other words, it learns local characteristics at increasing contextual scales [27]. Using an analysis of metric space distances, Qi et al. [27] claimed that it is capable of learning features robustly, even on nonuniformly sampled point sets. The way it extracts features can be summarized in three layers (a minimal sketch of the sampling step follows this list):
(a) Sampling Layer. The sampling method is farthest point sampling (FPS), which begins by picking a random point from the input point cloud and then iteratively selects the points farthest from those already chosen.
(b) Grouping Layer. The objective of the grouping layer is to construct local regions before extracting characteristics. More specifically, this work uses the neighborhood-ball approach rather than the KNN algorithm, since it guarantees a fixed region scale: multiple subpoint clouds are created from the points surrounding each centroid within a defined radius [27].
(c) PointNet Layer. As described earlier, a PointNet is applied to each local region to encode it into a feature vector, and the procedure is repeated across layers to extract features at progressively larger contextual scales.
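The sampling layer is easy to make concrete. Below is a minimal NumPy sketch of farthest point sampling under the usual random-seed convention; it is illustrative rather than the released PointNet++ code.

    import numpy as np

    def farthest_point_sampling(points, k):
        """Pick k well-spread points from an (N, 3) point cloud.

        Starts from a random seed point, then repeatedly adds the point
        farthest from everything chosen so far, which is the sampling-layer
        strategy described for PointNet++.
        """
        n = points.shape[0]
        chosen = np.empty(k, dtype=int)
        chosen[0] = np.random.randint(n)
        # Distance from every point to the nearest chosen point so far.
        dist = np.linalg.norm(points - points[chosen[0]], axis=1)
        for i in range(1, k):
            chosen[i] = dist.argmax()
            dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i]], axis=1))
        return points[chosen]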

In real 3D scans, the density of sampled points is often nonuniform, owing to effects such as perspective distortion and radial density variation, and this causes great trouble for feature learning. Qi et al. [27] proposed set abstraction layers that aggregate information from multiple scales according to local point densities. With this algorithm, the authors achieved an accuracy of 90.7% on ModelNet40 [83].

The model is suggested for handling sampled point sets in metric spaces. PointNet++ efficiently learns hierarchical features with respect to the distance metric by operating recursively on a nested partitioning of the input point set. Two novel set abstraction layers were proposed that intelligently aggregate multiscale information according to local point densities, producing improved results on the problem of nonuniform point sampling.

(2) Taylor GMM Convolutional Network (TGNet). The Taylor GMM convolutional network constructs a graph pyramid by clustering point clouds. In each layer of the pyramid, local regions are abstracted progressively. For learning local features, TGConv is applied, and the data are then interpolated back to a finer scale at each layer; features at the same scale are concatenated [85]. Because TGNet keeps computation limited, information losses occur along the pyramid; to counter this, an MLP can be applied to the input to conserve information along the process. Finally, the sampled features and the finer scale are combined for per-point segmentation.

The TGNet model is driven by TGConv [86]. Let $G = (V, E)$ be a graph formed from a point cloud, where $V$ is the collection of vertices and $E$ is the collection of edges [86]. Every directed edge $(i, j) \in E$ has 3D pseudocoordinates defined as $u_{ij} = p_j - p_i$, given that $\{p_i\} \subset \mathbb{R}^3$ is the set of vertex coordinates. We consider all points $p_j$ with $j \in \mathcal{N}(i)$, where $\mathcal{N}(i)$ is the neighbor set of vertex $i$ and $p_j$ indicates the 3D coordinate vector of $j$. Let $f_i \in \mathbb{R}^d$ be the input features of vertex $i$; for $d$ as the feature dimension, the features are associated with the related graph $G$. Local coordinates are turned into filter responses using Taylor kernel functions [86] with Gaussian weighting $w_k(u_{ij})$, and the learnable weight functions are denoted by $g_k$. The convolution is performed by aggregation of the feature sets over the neighborhood and is given by [86]:

$$f_i' = \sum_{j \in \mathcal{N}(i)} \sum_{k=1}^{K} g_k \, w_k(u_{ij}) \, f_j.$$
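Since the exact TGConv kernel involves further details, the sketch below only illustrates the general pattern the equation describes, namely Gaussian weighting of neighbor features over edge pseudocoordinates followed by a learnable projection; all names and the single-kernel simplification are assumptions.

    import numpy as np

    def gaussian_edge_conv(feats, coords, neighbors, W, mu=0.0, sigma=1.0):
        """Schematic neighborhood convolution with Gaussian edge weights.

        feats     : (N, d_in) input vertex features.
        coords    : (N, 3) vertex positions; edge pseudocoordinates are
                    the offsets p_j - p_i.
        neighbors : list of index arrays, neighbors[i] = neighbors of vertex i.
        W         : (d_in, d_out) learnable projection (one kernel shown;
                    TGConv combines several Taylor/Gaussian terms).
        """
        out = np.zeros((feats.shape[0], W.shape[1]))
        for i, nbrs in enumerate(neighbors):
            u = coords[nbrs] - coords[i]                   # (k, 3) pseudocoordinates
            w = np.exp(-((np.linalg.norm(u, axis=1) - mu) ** 2) / (2 * sigma ** 2))
            agg = (w[:, None] * feats[nbrs]).sum(axis=0)   # weighted aggregation
            out[i] = agg @ W
        return out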

The model suggests altering the linear combination of convolutional feature maps in the conventional convolutional operation of CNNs to capture detailed high-frequency and low-frequency information. It is shown that TaylorNets achieve a nonlinear combination of the convolutional feature maps based on Taylor expansion [87]. The steerable module created by TaylorNets is generic, making it simple to include in various deep architectures and to train using the same backpropagation pipeline, which results in higher representational capacity.

(3) MongeNet. In the ShapeNet model, triangular meshes are used for sampling 3D surfaces, but sampling becomes irregular as the surface angle changes, and clamping or undersampling occurs. The problem can be framed as an optimal transport problem between discrete measures and simplices. MongeNet is a neural network that acts as a uniform mesh sampler to counter this problem [87]. Computation is performed on GPUs and batched across triangles. The model's direct competitor is the current cutting-edge sampler from PyTorch3D. Test findings suggest that, for a moderate increase in computational cost, the model provides greater performance than its predecessor.

MongeNet is proposed to replace the established technique of sampling point clouds from a triangle mesh. The model minimizes the 2-Wasserstein distance, the optimal transport distance with a squared Euclidean ground metric [87]. The approximation target of MongeNet relies on solving a convex semidiscrete optimal transport problem via Laguerre tessellation to compute the optimal distance.

Generally, solving for the resulting point cloud directly would consume a lot of time at a high cost, but the model proposes learning the optimal point positions with a feed-forward neural network that offers satisfactory approximation and fast calculation. Denote the network as $f_\theta$, where $\theta$ is the set of learnable parameters. The input is a triangle $T$ with a bounded number of output points $n$ such that $n \le n_{\max}$, together with random noise $z$ that follows the normal distribution, and the output is a randomly ordered set of $n$ points on $T$. For a training set $D$ with components $(T_i, n_i, z_i)$, the sampled points [87] are

$$\hat{Y}_i = f_\theta(T_i, n_i, z_i).$$

Stating $W_2$ as the optimal transport (2-Wasserstein) distance and $Y_i$ as the target optimal point set for triangle $T_i$, the loss function is given as follows [88]:

$$\mathcal{L}(\theta) = \frac{1}{|D|} \sum_{i=1}^{|D|} W_2^2\!\left(\hat{Y}_i, Y_i\right).$$
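For intuition, the 2-Wasserstein objective between two equal-size point sets reduces to a minimum-cost matching, which the following NumPy/SciPy sketch computes exactly; batching and the semidiscrete solver used in the paper are omitted.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def w2_squared(A, B):
        """Squared 2-Wasserstein distance between two equal-size point sets.

        For uniform discrete measures of the same size, optimal transport
        reduces to a minimum-cost perfect matching on squared Euclidean
        costs, solvable exactly with the Hungarian algorithm.
        """
        cost = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # (n, n) squared costs
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].mean()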

The model overcomes drawbacks of the common mesh sampling approach used by most 3D deep-learning models, such as its proneness to erroneous sampling and clamping, which leads to noisy distance estimates. For a small additional investment in computation, MongeNet outperforms commonly used methods such as random uniform sampling.
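For reference, the widely used random uniform sampler that MongeNet is benchmarked against can be sketched in a few lines using the standard square-root parameterization of a triangle:

    import numpy as np

    def sample_triangle_uniform(a, b, c, n):
        """Baseline uniform sampling of n points on triangle (a, b, c).

        a, b, c : (3,) arrays, the triangle vertices.
        Uses the standard square-root parameterization, which yields
        points uniformly distributed over the triangle's area.
        """
        r1 = np.sqrt(np.random.rand(n, 1))
        r2 = np.random.rand(n, 1)
        return (1 - r1) * a + r1 * (1 - r2) * b + r1 * r2 * c

Because the samples are i.i.d., they cluster and leave gaps on the triangle, which is exactly the irregularity that MongeNet's learned sampler targets.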

3.2.5. Spatial-Based

Spatial convolution implements graph-based convolution directly in the spatial domain. Because the size of a standard convolution kernel is predefined, a fixed-size neighborhood must be selected if standard convolution is performed on a graph; however, unlike data on a regular grid, graph nodes often have a variable number of neighbors. The graph convolution technique mimics the image convolution process and is constructed from spatial node relationships: similar to the central pixel in a normal 3 × 3 CNN filter, the representation of a center node depends on the aggregated output of its neighboring nodes (a minimal sketch follows).
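A minimal NumPy sketch of this aggregation pattern, here a mean over neighbors plus a learnable projection, is given below; all names and the choice of mean aggregation are assumptions for illustration.

    import numpy as np

    def spatial_graph_conv(feats, adj, W_self, W_nbr):
        """One spatial graph-convolution step.

        A node's new representation aggregates its neighbors' features,
        like the center pixel of a 3x3 image filter aggregating its
        surrounding pixels.

        feats          : (N, d_in) node features.
        adj            : (N, N) binary adjacency matrix.
        W_self, W_nbr  : (d_in, d_out) learnable weights.
        """
        deg = adj.sum(axis=1, keepdims=True).clip(min=1)            # node degrees
        nbr_mean = (adj @ feats) / deg                              # mean over neighbors
        return np.maximum(feats @ W_self + nbr_mean @ W_nbr, 0.0)   # ReLU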

(1) Geodesic Convolutional Neural Networks (GCNN). The GCNN [16] model is an extension of the convolutional neural network (CNN) paradigm to non-Euclidean manifolds. A local geodesic system of polar coordinates is used to extract "patches," which pass through a series of filters and linear and nonlinear operators. The values of the filters and the linear combination weights are optimization variables learned to minimize a task-specific cost function. Utilizing the diagonal of heat-like operators yields a variety of well-known spectral shape descriptors: the heat kernel signature (HKS), the wave kernel signature (WKS), and optimal spectral descriptors (OSDs) can all be written in the form [16]

$$f(x) = \sum_{k \ge 1} \tau(\lambda_k)\, \phi_k^2(x),$$

where $\phi_k$ and $\lambda_k$ are the eigenfunctions and eigenvalues of the Laplace–Beltrami operator and $\tau(\lambda)$ is a transfer function (a heat kernel for HKS, a log-normal kernel for WKS, and learned parameters for OSD).

Mapping is performed by the patch operator $D(x)$, which takes the values of the function $f$ around $x$ to the neighbor polar coordinates $(\rho, \theta)$:

$$(D(x)f)(\rho, \theta) = \int_X v_\rho(x, x')\, v_\theta(x, x')\, f(x')\, dx',$$

in which $v_\rho(x, x')$ is the radial interpolation weight localized around geodesic distance $\rho$ from $x$, $v_\theta(x, x')$ represents the angular weight derived from a collection of geodesics radiating from $x$ in direction $\theta$, $dx'$ indicates the surface element of the Riemannian measure, and the product $v_\rho v_\theta$ localizes a weighting function around the point with coordinates $(\rho, \theta)$ [16]. The patch operator $D(x)$ in GCNN converts the values of the function around node $x$ into the regional polar coordinates $(\rho, \theta)$, hence forming the geodesic convolution [16]:

$$(f \star a)(x) = \sum_{\theta, r} a(\theta + \Delta\theta, r)\, (D(x)f)(r, \theta),$$

where $a(\theta, r)$ acts as a filter and $\Delta\theta$ is the arbitrary rotation of the angular origin. A GCNN is composed of numerous consecutively applied layers (a sketch of the geodesic convolution with angular max pooling follows this list):
(1) The linear layer typically comes after the input layer and before the output layer to adjust the input and output sizes by a linear function.
(2) The geodesic convolution (GC) layer replaces the usual Euclidean convolutional layer.
(3) The angular max pooling layer is combined with the GC layer to select the optimal filter rotation [16].
(4) The FTM layer is an additional fixed layer that applies the patch operator to every input dimension, followed by a Fourier transform with respect to the angular coordinate and an absolute value, removing the rotation ambiguity.
(5) The covariance (COV) layer is utilized in retrieval applications that require aggregating point-wise descriptors into a global shape descriptor [89].
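A compact NumPy sketch of the GC layer combined with angular max pooling is given below, assuming the local polar patches have already been extracted; the geometry-dependent patch extraction itself is omitted, and a single filter is shown for clarity.

    import numpy as np

    def geodesic_conv(patches, filt):
        """Geodesic convolution with angular max pooling.

        patches : (N, n_rho, n_theta) local polar patches D(x)f extracted
                  around each of N surface points.
        filt    : (n_rho, n_theta) filter a(rho, theta).
        Because the angular origin of each patch is arbitrary, the filter
        response is maximized over all cyclic rotations of the angular
        coordinate (the angular max pooling layer).
        """
        n_theta = patches.shape[2]
        responses = [
            (patches * np.roll(filt, s, axis=1)).sum(axis=(1, 2))  # rotated correlation
            for s in range(n_theta)
        ]
        return np.stack(responses, axis=1).max(axis=1)  # (N,) best rotation per point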

The model was developed for applications such as shape correspondence and retrieval, learning hierarchical task-specific features on non-Euclidean manifolds. The model is extremely flexible and general, and by stacking additional layers, it may be made arbitrarily complicated. By altering the local geodesic charting procedure, GCNN could be applied to other forms of representation, such as point clouds.

(2) Anisotropic Convolutional Neural Networks (ACNNs). These networks are a generalization of standard CNNs to non-Euclidean domains in which traditional convolutions are substituted by projections over a set of oriented anisotropic diffusion kernels [25]. As spatial weighting functions, anisotropic heat kernels retrieve the intrinsic local representation of a function defined on the manifold. The ACNN patch operator design is far simpler than that of GCNN, does not depend on the injectivity radius of the manifold, and is not limited to triangular meshes. By contrast, the earlier charting method is based on the generation of local geodesic polar coordinates, using a technique that had previously been employed for intrinsic shape context descriptors [90].

ACNN interprets heat kernels as local weighting functions and builds the patch operator as follows [25]:

$$(D_\alpha(x) f)(\theta, t) = \frac{\int_X h_{\alpha\theta t}(x, x')\, f(x')\, dx'}{\int_X h_{\alpha\theta t}(x, x')\, dx'},$$

for some anisotropy level $\alpha > 1$, where $h_{\alpha\theta t}(x, x')$ is the anisotropic heat kernel indicating the quantity of heat transmitted in time $t$ from point $x$ to point $x'$ [25].

Convolution [25] can then be described as

$$(f \star a)(x) = \sum_{\theta, t} a(\theta, t)\, (D_\alpha(x) f)(\theta, t).$$

The principal curvature direction is used as the reference $\theta = 0$ in ACNN's construction. The most promising future direction is the application of ACNN to graph learning. The GCNN and ACNN approaches both operate in the spatial domain, avoiding the limitations of standard spectral approaches on varied domains, and these techniques have proven more successful than traditional hand-crafted methods in matching deformable shapes.

Convolutional neural networks are generalized to non-Euclidean domains in the proposed model, which enables deep learning on geometric data. The work, at its time the most generic intrinsic CNN model, continues the recent trend of applying machine-learning techniques to computer graphics and geometry processing applications.

The provided models are fairly efficient and beneficial for a variety of reasons; therefore, their results and analyses are quite valuable for surveying their efficacy on various datasets. The accuracy and error of the previously stated models are summarized in Table 1 together with their related datasets.

A second perspective was studied, based on the Scopus string TITLE-ABS-KEY (geometric AND deep AND learning, OR graph, OR manifold) AND (LIMIT-TO (PUBSTAGE, “final”)) AND (LIMIT-TO (PUBYEAR, 2022) OR LIMIT-TO (PUBYEAR, 2021) OR LIMIT-TO (PUBYEAR, 2020) OR LIMIT-TO (PUBYEAR, 2019) OR LIMIT-TO (PUBYEAR, 2018) OR LIMIT-TO (PUBYEAR, 2017) OR LIMIT-TO (PUBYEAR, 2016) OR LIMIT-TO (PUBYEAR, 2015)) AND (LIMIT-TO (SUBJAREA, “COMP”)) AND (LIMIT-TO (EXACTKEYWORD, “Deep Learning”) OR LIMIT-TO (EXACTKEYWORD, “Geometry”) OR LIMIT-TO (EXACTKEYWORD, “Deep Neural Networks”) OR LIMIT-TO (EXACTKEYWORD, “Convolution”) OR LIMIT-TO (EXACTKEYWORD, “Convolutional Neural Networks”) OR LIMIT-TO (EXACTKEYWORD, “Computer Vision”) OR LIMIT-TO (EXACTKEYWORD, “Convolutional Neural Network”)), the results of which are shown in Table 2. It describes the most advanced computer vision applications of these models, including object detection, medical imaging, face detection, action and activity detection, human pose detection, network detection, and pedestrian trajectory prediction. These applications are stated along with their specific employments.

3.2.6. Comparative Analysis of Described Models

Table 1 provides a comparative analysis of the described models in terms of numerical values, novelty, and limitations, sorted by the year proposed. Initially, ShapeNet [78] was proposed in 2015 in the field of non-Euclidean geometry and was successful in generalizing CNNs to learn task-specific features; however, it is limited to mesh features. In the same year, with continuous improvements, MVCNN [33] was proposed, which is compact and efficient with improved accuracy; however, its 3D descriptor remains untested. Later, in 2017, TGNet [85] changed the linear convolution of CNNs in the feature map, but its TGConv has a very narrow dynamic range. Also in 2017, MoNet [69] was proposed as a framework unifying CNN models on non-Euclidean domains, but the method does not cover segmentation and weighted categorical cross-entropy outcomes.

In the next year, SplineCNN [70] was proposed, which uses trainable continuous kernel functions and B-spline bases to extract local features; however, its global behavior worsens as the geodesic error grows. In 2019, CayleyNet [28] was proposed, which specializes in small frequency bands with few filter parameters while maintaining spatial localization; however, its bidirectional line cost is higher, and its routing method is more sophisticated. In this continuation, CurvaNet [63], proposed in 2020 with a U-Net-like hierarchical structure, is shown to exploit multiscale curvature characteristics but lacks structural regularization such as segment class topology. In the same succession, MDGCN [42], introduced in 2020, utilizes dynamic graphs that are gradually refined and can accurately encode the inherent similarities between image regions; however, the LSTM it employs requires high computational cost, and the introduced GC unit relies on lower-order approximations of spectral graph convolution.

Again in 2020, ACSCNN [30] offered anisotropic convolution, enabling a more thorough capture of the intrinsic local information of signals; however, the model is restricted to shape segmentation and classification. In 2021, MongeNet [87] was proposed: the gap between the target point cloud and the point cloud sampled from the mesh shrinks more quickly, so for a given optimization time the input point cloud is represented more accurately; however, its computation is costly and time-consuming. Most recently, UV-Net [31] leverages existing image and graph convolutional neural networks and can operate on B-rep data; however, the model does not use the B-rep curve and surface types, edge convexity, half-edge ordering, etc., and its UV-grid features are not rotation-invariant.

4. Future Prospects and Challenges

These models are meant to deal with data that cannot be conveniently represented in the Euclidean space and have shown exceptional performance in tasks like shape identification, chemical design, and social network analysis. These non-Euclidean deep-learning models have several promising applications, including computer vision, robotics, natural language processing, and drug discovery. As a result, the future scope of non-Euclidean deep-learning models is very promising, and the research area is expanding in this field at an impressive rate. In addition to computer vision, it has already been used in applications such as AI pathfinding, 3D mapping, medical diagnostics, molecular analysis, VR-based applications, and even big data classification.

The data extracted by the search string are plotted as "Documents Published by Year" and "Documents by Type" in the graphs and pie chart of Figure 13.

The bar graph gives information about documents published per year (through 2022). According to the graph, the number of publications increased steadily from 2015 to 2021. In 2015, 14 documents were published. Then, 42 were published in 2016 (a 200% increase), 91 in 2017 (a 116% increase), 174 in 2018 (a 91% increase), 349 in 2019 (a 100% increase), and 498 in 2020 (a 42% increase). Publication peaked in 2021, when the most documents, 591, were published (an 18% increase). A further 311 documents were published in the first half of 2022.

From the pie chart, it is clear that conference papers were published at the highest rate, at 1126, followed by 890 articles. The number of reviews, however, is far lower than that of articles and conference papers, at a total of 31. Of all 2047 published documents, conference papers account for 55% (1126), articles for 43% (890), and reviews for 1.5% (31).

From these statistics, it may be concluded that demand for geometric deep learning will keep increasing. This growth, however, brings newer challenges and difficulties day after day: some have been solved, and more lie just ahead of us.

Various usages and challenges are illustrated in Figure 14. As these challenges are resolved, the field will progress toward the advanced applications shown. Key challenges are visualized and explained as follows:
(1) Computational Difficulties. Despite the success of GNNs in several disciplines, the high cost of computing remains a challenge for research and applications. A DNN includes a large set of variables, which makes the testing and training stages computationally costly and demands hardware resources such as GPUs; GNNs exhibit the same characteristics. In addition, computation is difficult due to the complex relationships among graph nodes and the nongrid structure. By contrast, the great majority of existing deep-learning systems deal with normalized datasets in Euclidean space, such as 1D or 2D grids, which can exploit the strong processing capacity of contemporary GPUs. In most instances, however, geometric data have no grid-like structure, necessitating other approaches for efficient computation. Accelerating graph neural network performance is therefore a pressing demand.
(2) Complex Architecture. On Euclidean data, the CNN architecture has achieved remarkable progress in deep learning, and models become more sophisticated as the number of network layers grows; empirically, neural networks with more variables have the potential to perform better. However, stacking many GNN layers exacerbates the over-smoothing issue. The nodes in a graph are connected to their surroundings, and the graph function of a network is a stream of data dispersion and consolidation: the more layers that are stacked, the more data from distant nodes are integrated, yielding nearly identical representations for all nodes, so each vertex converges to the same value. For this reason, the majority of GCN architectures include no more than three or four layers. The study [206] sought to use a more sophisticated network architecture, but its efficacy was insufficient. Nonetheless, in 2019, DeepGCNs [207] were able to construct a 56-layer network by leveraging the concepts of residual connections and dilated convolutions. Building deep graph neural networks remains an interesting yet difficult challenge.
(3) Unpredictable Graphs. Most known techniques are designed for static graphs, and many datasets are also static, yet many graphs change over time. On social media platforms, existing users may leave and new users may join at any time, and the relationships among members can vary considerably. The question of how to adequately describe the evolution of dynamic graphs is unresolved, which limits the applicability of graph neural networks; there are several efforts to resolve this issue, such as [208–210].
(4) Scalability. Applying graph neural networks to huge graphs is a demanding task. On the one hand, each node has its own neighborhood topology, which includes the hidden states of surrounding nodes, making it challenging to train using batch techniques [39]. On the other hand, when dealing with millions of nodes and edges, the Laplacian matrix of the network is challenging to compute. There are approaches that increase the performance of the model by fast sampling [211, 212] and subgraph training [213, 214], although the results are not yet particularly impressive.

5. Conclusion

Even though deep learning has traditionally relied on Euclidean geometry, the modern era's complex structural geometry demands the use of non-Euclidean geometry. The current world aims to include non-Euclidean geometry in deep-learning methods to satisfy the need for 3D complex structure analysis, and geometric deep learning makes learning from complicated data, such as graphs and manifolds, easier. This paper explores the field of deep learning on manifolds and graphs at length, presenting an in-depth look at the history, context, state-of-the-art mathematical deep-learning models, performance analyses, and pros and cons of deep networks used in computer vision on graphs and manifolds. These models remain expensive and resource-intensive to deploy, and dynamic graphs and shapes demand sophisticated real-time systems that still need a great deal of further development for everyday usage. At this rate, however, it is reasonable to anticipate more widely available advanced models for a variety of challenging classification tasks.

Abbreviations

:Single points of point cloud
:Fourier transform
:Laplace–Beltrami operator
:Reference point
:Inverse Fourier transform
:Manifold
:Distance between two locations
:Smooth function of manifolds
:Chebyshev polynomial
:The angle between the axes
:Closest neighbor points
:Filter
:B-spline curve
:Cayley polynomial
:Anisotropy level
:Knot vectors
:Cayley filter for signal
:Node features
:Control points
:Hidden node features
:Product of basic functions
:Regularization
:Multilayer perceptron
:Minimized loss
:Error coefficient
:Differentiate center nodes and neighbors
:Set of layers
:Vertices
:Eigenvector matrix
:Space between feature vectors
:Edges
:Linear projection
, :Image descriptors
:Graph function
:MLP (2 layers)
:Neighbor set vertex
L:Laplacian matrix
:Nonlinear activation function
:Neighbor polar coordinates
D:Degree matrix
:Feature matrix of curvature
:Interpolation weight
A:Adjacency matrix
:Interval number
:Mapping function
:Eigenvalue matrix
:Maximum number
:Anisotropic heat kernel.

Data Availability

The data that support the findings of this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.