Abstract

With the exponential increase in the number of 3D models, 3D model classification has become crucial to the effective management and retrieval of model databases. The feature descriptor has an important influence on 3D model classification. The voxel descriptor expresses the surface and internal information of a 3D model but does not contain topological structure information. The shape distribution descriptor expresses the geometric relationships of random points on the model surface and is rotation invariant. Either descriptor can be used to classify 3D models, but the accuracy is low because each describes the 3D model insufficiently. This paper proposes a 3D model classification algorithm that fuses the voxel descriptor and the shape distribution descriptor. A 3D convolutional neural network (CNN) is used to extract voxel features, and a 1D CNN is adopted to extract shape distribution features. The AdaBoost algorithm is applied to combine several Bayesian classifiers into a strong classifier for classifying 3D models. Experiments conducted on ModelNet10 show that the proposed method improves classification accuracy.

1. Introduction

With the development of multimedia technology, 3D models have been applied in many fields such as mechanical design, computer vision, the construction industry, entertainment, medical treatment, education, e-commerce, and molecular biology [1]. The number of 3D models in our lives keeps growing, so their classification becomes more and more important. Scholars at home and abroad are now studying 3D model classification, which can be divided into voxel-based, view-based, and point cloud-based methods.

In voxel-based classification methods, a 3D model is expressed as a distribution over a 3D voxel grid. The model is denoted by a 3D matrix, from which a 3D CNN extracts features for classifying 3D models [2]. In view-based methods, a 3D model is converted into a series of 2D images, from which a 2D deep learning algorithm extracts view features; the view features are then merged to represent the 3D model for classification [3]. In point cloud-based methods, the point cloud is preprocessed to handle its sparseness and disorder, and a CNN then extracts features from the point cloud for classification [4].

AdaBoost is an ensemble learning technique that converts weak classifiers into a strong one [5, 6]. It provides a simple and useful way to generate a strong classifier, whose performance depends on the diversity among the weak classifiers and on their individual performance [7].

The main contributions of this paper are as follows:
(1) The voxel descriptor is used to express the surface and internal information of a 3D model, and the shape distribution descriptor is adopted to express the geometric relationships of random points on the model surface. Together, the two kinds of descriptors describe a 3D model more completely from different perspectives.
(2) A 3D CNN is used to extract voxel features, and a 1D CNN is adopted to extract shape distribution features. These features are merged to improve the performance of the 3D model classifier.
(3) The AdaBoost algorithm is used to obtain multiple Bayesian classifiers by iteration, and their weighted combination serves as a strong 3D model classifier.

This paper is organized as follows. Related work is reported in Section 2. Voxel and shape distribution features of the 3D model are given in Section 3. The proposed method is described in Section 4. Experimental results are given and analyzed in Section 5. The conclusion is presented in Section 6.

2. Related Work

Feature descriptors are used to express the shape and structure of a 3D model, and machine learning algorithms classify 3D models based on these descriptors. According to the feature descriptor used, 3D model classification can be divided into voxel-based, view-based, and point cloud-based methods.

In voxel-based classification methods, a 3D voxel matrix describes the 3D model. Sirma et al. [8] voxelize and scale 3D models with certain parameters and combine two CNNs to classify them. Kang et al. [9] propose a voxel-based method to classify 3D pole-like objects by analyzing their spatial characteristics. Plaza et al. [10] use multilayer perceptrons to classify voxels by analyzing the spatial distribution of inner points, identifying tubular structures and flat surfaces in unstructured and natural environments. Mughees and Tao [11] present a hyper-voxel stacked autoencoder that exploits the spatial context of similar contiguous pixels to classify hyperspectral images. Babahajiani et al. [12] design a novel street scene semantic recognition framework that takes advantage of 3D point clouds captured by a high-definition laser scanner. Ahmad et al. [13] classify urban scenes based on a super-voxel segmentation of 3D data obtained from LiDAR sensors. Agnello et al. [14] select useful MRI data according to the fuzziness values of Fuzzy C-Means and train a fully connected cascade neural network to classify brain tissue into white matter, gray matter, and cerebrospinal fluid in an unsupervised way.

In view-based classification methods, a 3D model is projected into a series of 2D views from different directions. Chen et al. [15] propose a multimodal support vector machine that combines the SIFT descriptor, the outline Fourier transform descriptor, and the Zernike moments descriptor to discriminate multiple object classes, considering both the independence of each modality and the interrelations between them. Liu et al. [16] design a semantic and context information fusion network that extracts contextual information from a continuous view sequence and semantic information from individual views. Kim et al. [17] give a viewpoint and image-resolution estimation method for view-based 3D shape retrieval from point cloud queries, selecting the viewpoint and image resolution automatically based on data acquisition rate and density calculations. Huang et al. [18] design a view-based weight network for 3D object recognition in which view-based weights are assigned to projections. Shajahan et al. [19] present a multiview CNN with self-attention that takes multiple views of a roof point cloud as input and outputs the category of the roof. Zhou et al. [20] design a multiview saliency-guided deep neural network to retrieve and classify 3D objects; it selects representative views, exploits multiview context to compute similarities, and discovers the discriminative structure of the multiview sequence. Zhou et al. [21] propose a CNN trained on a single polar view of a 3D shape to obtain a polar view representation, which is used to classify and retrieve 3D shapes. Yang et al. [22] design a network structure combining a multiview CNN, an extreme learning machine (ELM) autoencoder, and an ELM classifier, which converts a 3D model into a single compact feature descriptor.

In point cloud-based classification methods, a neural network extracts features from the point cloud. Lee et al. [23] propose DenX-Conv to improve the accuracy of object classification and secure the connectivity of points in raw point clouds. Hu et al. [24] present a convolution operation and design a feature transformation module to capture local structural information and prior information from 3D point clouds. Li et al. [25] design a multiscale receptive field graph attention network with semantic features of local patches, which captures abundant features of the point cloud. Zhang et al. [26] propose a hybrid-feature CNN to describe features of 3D point clouds, which handles the unstructured and unordered properties of point cloud data. Li et al. [27] use the X-Conv transformation to resolve the disorder of point clouds and apply a CNN to classify them. Huang et al. [28] project the features of disordered points into an ordered sequence of feature vectors and use a recurrent neural network to classify 3D point clouds.

Boosting is a very successful technique for solving binary classification problems. Hastie et al. [29] develop an algorithm that extends AdaBoost to multiclass classification. Wang and Yan [30] propose a gait classification framework based on a CNN ensemble. Wu and Nagahashi [31] present parameterized AdaBoost, whose parameter is devised to penalize the misclassification of instances; compared with Real AdaBoost, it converges faster and generalizes better. Li et al. [32] give a novel multifeature joint learning ensemble framework that combines global features with local key features to consider multiple expression labels in facial action units. Taherkhani et al. [33] propose AdaBoost-CNN, which combines AdaBoost and CNN to deal with data imbalance. Kavitha et al. [34] merge the Hough transform with an improved dragonfly algorithm to address face recognition in unconstrained conditions. Feng [35] uses the AdaBoost algorithm to strengthen a Bayesian classifier for accurate text emotion classification. Xiao et al. [36] propose a multitemporal ensemble learning framework to extract snow cover from images of mountain areas. Kaur and Kumar [37] extract zoning, diagonal, intersection, and open-end point features from word images and use a majority voting scheme over K-nearest neighbors, support vector machine, random forest, and AdaBoost to recognize handwritten Gurmukhi words. Chen et al. [38] integrate deep learning and subspace-based ensemble learning to boost the performance of hyperspectral image (HSI) classification. Lee et al. [39] propose a new sparse coding method based on an ensemble of image patches. Zhang et al. [40] give a bioinspired fault diagnosis method based on a neural P system with belief AdaBoost for oil-immersed power transformers. Oyewole and Olugbara [41] design an image classification architecture including data preprocessing, feature extraction, dimensionality reduction, and an ensemble of machine learning methods. Song et al. [42] propose an innovative classification method that combines image texture features with an extreme learning machine. Zeng et al. [43] present a learning-based multiple pooling fusion method, in which multiview feature maps projected from a 3D model are compiled into a simple and effective feature descriptor. Li et al. [44] give a novel unsupervised hashing method for large-scale image retrieval with a single information source.

These three methods have their own shortcomings, and more research is needed whichever one is used for 3D model classification. For voxel-based classification, the network structure needs to be optimized and the voxel resolution must be chosen. At the same time, there is little research on fusing voxel features with other features. Most networks use softmax as the classifier, and there is little research on classifying 3D models with a strong classifier obtained by the AdaBoost algorithm.

3. Voxel and Shape Distribution Features of 3D Model

Voxel is the basic data unit in volume graphics and can describe a volume model. A 3D model is denoted as a polygon mesh, which is a collection of vertices, edges, and faces. This paper voxelizes the polygon mesh and represents the 3D model with a voxel feature, using a simple and effective voxelization method. The voxelization resolution is set to N ∗ N ∗ N. The axis-aligned bounding box (AABB) of the 3D model is determined by traversing the vertex coordinates of the polygon mesh to obtain the maximum (xmax, ymax, zmax) and the minimum (xmin, ymin, zmin), as shown in Figure 1. Here, xmax and xmin represent, respectively, the maximum and minimum over the x-coordinates of all vertices. The AABB is the box whose diagonal vertices are (xmax, ymax, zmax) and (xmin, ymin, zmin). The size of a voxel unit is set to V ∗ V ∗ V, where

V = max(xmax − xmin, ymax − ymin, zmax − zmin)/N.
The AABB is divided equidistantly by V along the x-axis, y-axis, and z-axis; if the length of a side is not an integer multiple of V, the number of divisions along that side is rounded up. Because the size of a voxel unit is V ∗ V ∗ V, the AABB is divided into XN ∗ YN ∗ ZN voxel units, where XN, YN, and ZN represent, respectively, the number of layers of the AABB along the x-axis, y-axis, and z-axis:

XN = ceil((xmax − xmin)/V), YN = ceil((ymax − ymin)/V), ZN = ceil((zmax − zmin)/V),

where ceil(·) is the ceiling function. A vertex (x, y, z) of the 3D model corresponds to the voxel unit in the XV-th row, YV-th column, and ZV-th layer, computed, respectively, as

XV = ceil((x − xmin)/V), YV = ceil((y − ymin)/V), ZV = ceil((z − zmin)/V).

If a voxel unit is occupied by the 3D model, its value is 1 and it is called a nonempty voxel; otherwise, its value is 0 and it is called an empty voxel. The 3D model is thus denoted as a binary string. In this paper, the voxelization resolution of the 3D model is set to 24 ∗ 24 ∗ 24, so the size of the voxel matrix is 24 ∗ 24 ∗ 24. In order to extract features from the model surface accurately, the voxel matrix is expanded to 30 ∗ 30 ∗ 30 by zero padding.
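As a concrete illustration, the following Python sketch builds such an occupancy grid from the vertex set of a mesh. The function name voxelize, the NumPy implementation, and the simplification of marking only vertex-occupied cells are assumptions for illustration, not the paper's exact procedure:

import numpy as np

def voxelize(vertices, N=24, pad_to=30):
    # vertices: (n, 3) float array of mesh vertex coordinates.
    vmin = vertices.min(axis=0)                    # (xmin, ymin, zmin)
    vmax = vertices.max(axis=0)                    # (xmax, ymax, zmax)
    V = (vmax - vmin).max() / N                    # voxel edge length V
    idx = np.floor((vertices - vmin) / V).astype(int)
    idx = np.clip(idx, 0, N - 1)                   # vertices on the far face fall in the last cell
    grid = np.zeros((N, N, N), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1      # occupied cells become nonempty voxels
    pad = (pad_to - N) // 2                        # zero-pad the 24^3 grid to 30^3 as in the paper
    return np.pad(grid, pad)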

In order to describe the overall shape characteristics of a 3D model, this paper uses the D1, D2, D3, and A3 shape functions, which have good rotation invariance. They are illustrated in Figure 2. O is the model centroid, and P1, P2, and P3 are three random sampling points on the model surface. The D1 shape function is the distance between O and P1. The D2 shape function is the distance between P1 and P2. The D3 shape function is the square root of the area of the triangle formed by P1, P2, and P3. The A3 shape function is the angle A formed by P1, P2, and P3.

Points = {(x1, y1, z1), …, (xi, yi, zi), …, (xn, yn, zn)} is the point set on the 3D model surface, where (xi, yi, zi) are point coordinates. Bins is the number of statistical intervals, and BinSize is the length of each interval.

Extract a set of points NPoints = {ai1, ai2, …, aiN} from Points by sampling N times. The set of D1 shape distribution features is {D1_v1, …, D1_vi, …, D1_vBins}, where D1_vi is computed as

D1_vi = |{P1 ∈ NPoints | (i − 1) · BinSize ≤ dist(P1, O) < i · BinSize}|/N,

where BinSize = max({dist(P1, O) | P1 ∈ NPoints})/N, the function dist(·) is the Euclidean distance between points P1 and O, and O is the centroid of the 3D model.

Construct a set of point pairs NPoints = {(ai1, bi1), (ai2, bi2), …, (aiN, biN)} by sampling N times from Points. The set of D2 shape distribution features is {D2_v1, …, D2_vi, …, D2_vBins}, where D2_vi is calculated as

D2_vi = |{(P1, P2) ∈ NPoints | (i − 1) · BinSize ≤ dist(P1, P2) < i · BinSize}|/N,

where BinSize = max({dist(P1, P2) | (P1, P2) ∈ NPoints})/N.

Construct a set of point triples NPoints = {(ai1, bi1, ci1), …, (aiN, biN, ciN)} by sampling N times from Points. The set of D3 shape distribution features is {D3_v1, …, D3_vi, …, D3_vBins}, where D3_vi is computed as

D3_vi = |{(P1, P2, P3) ∈ NPoints | (i − 1) · BinSize ≤ heron(P1, P2, P3) < i · BinSize}|/N,

where BinSize = max({heron(P1, P2, P3) | (P1, P2, P3) ∈ NPoints})/N. The function heron(·) computes the area of the triangle (P1, P2, P3) by Heron's formula:

heron(P1, P2, P3) = sqrt(p(p − |a|)(p − |b|)(p − |c|)), p = (|a| + |b| + |c|)/2,

where |a| = dist(P1, P2), |b| = dist(P1, P3), |c| = dist(P2, P3).

The set of A3 shape distribution features is {A3_v1, …, A3_vi, …, A3_vBins}, where A3_vi is calculated as

A3_vi = |{(P1, P2, P3) ∈ NPoints | (i − 1) · BinSize ≤ cosine(P1, P2, P3) < i · BinSize}|/N,

where BinSize = max({cosine(P1, P2, P3) | (P1, P2, P3) ∈ NPoints})/N and cosine(·) is the cosine of angle A. Angle A, the angle corresponding to side a in the triangle with sides a, b, and c, is given by the law of cosines shown in (14):

cos A = (|b|^2 + |c|^2 − |a|^2)/(2|b||c|).
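For illustration, the following sketch computes one of the four histograms, the D2 distribution; the other shape functions are computed analogously by replacing the measured quantity (distance to the centroid, triangle area, or cosine of the angle). The function name and the uniform binning over [0, max] are assumptions for the sketch:

import numpy as np

def d2_histogram(points, N=1024, Bins=128, rng=None):
    # points: (n, 3) array of surface points; returns the normalized D2 histogram.
    rng = rng if rng is not None else np.random.default_rng()
    i = rng.integers(0, len(points), N)            # first points of the N sampled pairs
    j = rng.integers(0, len(points), N)            # second points of the N sampled pairs
    d = np.linalg.norm(points[i] - points[j], axis=1)   # dist(P1, P2) for each pair
    hist, _ = np.histogram(d, bins=Bins, range=(0.0, d.max()))
    return hist / N                                # fraction of pairs falling in each bin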

4. 3D Model Classification Based on AdaBoost Algorithm

The 3D model classification framework is shown in Figure 3.

A 3D CNN is applied to extract deep features from the voxel descriptor, and a 1D CNN is used to extract deep features from the shape distribution descriptors. Then, the Merge layer fuses the voxel feature with the shape distribution features. Finally, the fused feature is input into the Bayesian classifier.

4.1. 3D CNN for Extracting Voxel Feature

A 3D CNN with 4 convolutional layers, shown in Figure 4, is used to extract the voxel feature from the voxel matrix.

A Dropout layer is added between every two convolutional layers. The input of the 3D CNN is the voxel matrix of size 30 ∗ 30 ∗ 30, and its output is a 1024 ∗ 1 feature vector. In Figure 4, Conv1 is the first convolutional layer; its convolutional kernel is 6 ∗ 6 ∗ 6, the number of kernels is 16, and the sliding step is 2. The kernels in Conv2, Conv3, and Conv4 are 5 ∗ 5 ∗ 5 with sliding step 1, and the numbers of kernels are, respectively, 64, 128, and 1024.

In Figure 4, v^l_ijk is the element at the i-th row, j-th column, and k-th layer of the voxel matrix in the l-th convolutional layer, where 0 ≤ i < Wl, 0 ≤ j < Ll, and 0 ≤ k < Hl, and Wl, Ll, and Hl represent, respectively, the length, width, and height of the voxel matrix in the l-th convolutional layer. v^l_ijk is computed as

v^l_ijk = f(eV), eV = Σr∈Mil Σs∈Mjl Σt∈Mkl w^l_rst · v^(l−1)_(i+r)(j+s)(k+t) + b^l,

where f(·) represents the activation function as shown in (16) and eV is the net activation. v^l_ijk is convolved from feature maps in the (l − 1)-th layer: w^l_rst represents the element in the r-th row, the s-th column, and the t-th layer of the convolutional kernel; the indices r, s, and t take values from Mil, Mjl, and Mkl, respectively; and b^l represents the bias.
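Under the layer sizes stated above, the network can be sketched in Keras as follows. The 'valid' padding, ReLU activations, and dropout rate are assumptions the paper does not state; with them, the stated kernels and strides shrink the 30 ∗ 30 ∗ 30 input to a single position, which yields the 1024-dimensional output:

from tensorflow.keras import layers, models

def build_voxel_cnn(dropout_rate=0.3):             # dropout rate is an assumption
    return models.Sequential([
        layers.Input(shape=(30, 30, 30, 1)),       # zero-padded voxel matrix
        layers.Conv3D(16, 6, strides=2, activation='relu'),    # Conv1: 30 -> 13
        layers.Dropout(dropout_rate),
        layers.Conv3D(64, 5, strides=1, activation='relu'),    # Conv2: 13 -> 9
        layers.Dropout(dropout_rate),
        layers.Conv3D(128, 5, strides=1, activation='relu'),   # Conv3: 9 -> 5
        layers.Dropout(dropout_rate),
        layers.Conv3D(1024, 5, strides=1, activation='relu'),  # Conv4: 5 -> 1
        layers.Flatten(),                          # 1024-dimensional voxel feature
    ])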

4.2. 1D CNN for Extracting Shape Distribution Feature

A 1D CNN with 4 convolutional layers, shown in Figure 5, is used to extract the shape distribution feature. A Dropout layer is added between every two convolutional layers. The input of the 1D CNN is the 1 ∗ 512 shape distribution descriptor, and its output is a 1792-dimensional feature vector.

g^l_i is the i-th element in the l-th convolutional layer. A 1D convolutional kernel of size 1 ∗ 5 is used, and the sliding step is 2. g^l_i is computed as

g^l_i = f(eG), eG = Σs∈Mil w^l_s · g^(l−1)_(i+s) + b^l,

where f(·) represents the activation function and eG denotes the net activation. g^l_i in the l-th layer is convolved from feature maps in the (l − 1)-th layer: the index s takes values from Mil, w^l_s represents the s-th element of the convolutional kernel in the l-th layer, and b^l denotes its bias.

A maximum pooling operation is used in the pooling layer:

p^l_i = max(g^l_(2i−1), g^l_(2i)),

where max(·) is the operation of taking the maximum. The size of the pooling kernel is 1 ∗ 2, and the step size is 2.
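A corresponding Keras sketch is shown below. The paper fixes the kernel (1 ∗ 5), stride (2), and pooling (1 ∗ 2 with stride 2) but not the number of kernels per layer, so the widths below are assumptions; the last width of 1792 is chosen so that the flattened output matches the stated 1792-dimensional feature, and the 1 ∗ 512 input assumes the four shape distribution histograms are concatenated:

from tensorflow.keras import layers, models

def build_shape_cnn(dropout_rate=0.3):             # dropout rate is an assumption
    m = models.Sequential([layers.Input(shape=(512, 1))])   # concatenated D1-A3 bins
    for filters in (64, 128, 256, 1792):           # kernel counts are assumptions
        m.add(layers.Conv1D(filters, 5, strides=2, activation='relu'))
        m.add(layers.Dropout(dropout_rate))
        m.add(layers.MaxPooling1D(pool_size=2, strides=2))
    m.add(layers.Flatten())                        # length 512 shrinks to 1, giving 1792 features
    return m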

4.3. Fusing 3D CNN and 1D CNN to Classify 3D Model

The 3D CNN is used to extract the voxel feature, which is processed by a Flatten layer to obtain XV. The 1D CNN is adopted to extract the D1, D2, D3, and A3 shape distribution features, which are processed by a Flatten layer to obtain XG. Then, XV and XG are fused in the Merge layer as shown in (19):

XM = XV ⊕ XG, (19)

where ⊕ denotes vector concatenation.
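In Keras terms, the Merge layer can be sketched as the concatenation of the two flattened branch outputs (build_voxel_cnn and build_shape_cnn refer to the sketches above):

from tensorflow.keras import layers, models

voxel_cnn = build_voxel_cnn()                      # 3D branch producing XV (1024-d)
shape_cnn = build_shape_cnn()                      # 1D branch producing XG
x_m = layers.concatenate([voxel_cnn.output, shape_cnn.output])   # XM = XV concatenated with XG
fusion = models.Model(inputs=[voxel_cnn.input, shape_cnn.input], outputs=x_m)
# fusion.predict([voxel_batch, histogram_batch]) yields the fused XM features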

The Bayesian classifier is a simple probabilistic classifier based on Bayes' theorem with strong independence assumptions between features. Its classification principle is to calculate the posterior probability with the Bayesian formula and select the category with the largest probability. Here, the Bayesian classifier is used as the weak classifier in the AdaBoost algorithm for classifying 3D models. A model M has m categories s1, s2, …, sm, and the feature of model M is denoted as XM. The Bayesian classifier predicts the category of M as follows: when PBayes(si|XM) ≥ PBayes(sj|XM) for j = 1, 2, …, m, where PBayes is the output probability of the Bayesian model, the category of M is determined as si. ClassifierBayes(XM) is calculated as

ClassifierBayes(XM) = argmax_si PBayes(si|XM).
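The decision rule is illustrated below with scikit-learn's Gaussian naive Bayes; the Gaussian likelihood model and the synthetic stand-in data are assumptions for the sketch (real inputs would be the fused XM vectors):

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))                # stand-in for fused XM features
y_train = rng.integers(0, 3, size=100)             # stand-in category labels
weak = GaussianNB().fit(X_train, y_train)
probs = weak.predict_proba(X_train)                # PBayes(si | XM) for each category si
pred = probs.argmax(axis=1)                        # category with the largest posterior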

4.4. 3D Model Classification Algorithm Based on 3D and 1D CNNs

The 3D model classification framework mainly includes 3 parts: a 3D CNN for extracting voxel features, a 1D CNN for extracting shape distribution features, and a strong classifier, Classifier(X), constructed by the AdaBoost algorithm. The process of constructing Classifier(X) based on AdaBoost is shown in Algorithm 1.

Input: training instances (X1, y1), (X2, y2), …, (Xn, yn).
Output: strong classifier Classifier(X).
Step 1. Initialize the weights w1,i of the training instances and the number of iterations K:
  w1,i = 1/n, i = 1, 2, …, n.
 Step 2. for (k = 1; k ≤ K; k++){
 Step 2.1 Use the training instances with weights wk,1, wk,2, …, wk,n to train classifier Classifierk(X).
 Step 2.2 Calculate the error rate ek of Classifierk(X):
  ek = Σi=1..n wk,i · I(Classifierk(Xi) ≠ yi),
 where I(Classifierk(Xi) ≠ yi) is 0 or 1: if Classifierk(Xi) ≠ yi, then I(Classifierk(Xi) ≠ yi) = 1; otherwise, I(Classifierk(Xi) ≠ yi) = 0.
 Step 2.3 Calculate the weight coefficient αk of Classifierk(X):
  αk = (1/2) ln((1 − ek)/ek).
 Step 2.4 Update the weights wk+1,i:
  wk+1,i = (wk,i/Zk) · exp(αk · I(Classifierk(Xi) ≠ yi)),
  Zk = Σi=1..n wk,i · exp(αk · I(Classifierk(Xi) ≠ yi)).
}
Step 3. Combine the Classifierk(X) according to their weight coefficients αk (k = 1, 2, …, K):
  Classifier(X) = argmax_s Σk=1..K αk · I(Classifierk(X) = s).
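A runnable sketch of Algorithm 1 with Gaussian naive Bayes weak learners is given below. Integer labels 0, …, m − 1 are assumed, and the log(m − 1) term in the weight coefficient follows the multiclass extension of Hastie et al. [29]; whether the paper's implementation includes this term is not stated:

import numpy as np
from sklearn.naive_bayes import GaussianNB

def adaboost_bayes(X, y, K=800, m=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                        # Step 1: w1,i = 1/n
    learners, alphas = [], []
    for _ in range(K):                             # Step 2
        clf = GaussianNB().fit(X, y, sample_weight=w)        # Step 2.1: weighted weak learner
        miss = clf.predict(X) != y
        e = float(np.dot(w, miss))                 # Step 2.2: weighted error ek
        if e <= 0.0 or e >= 1.0 - 1.0 / m:         # stop if perfect or no better than chance
            break
        a = 0.5 * np.log((1 - e) / e) + np.log(m - 1)        # Step 2.3 (SAMME-style alpha_k)
        w = w * np.exp(a * miss)                   # Step 2.4: up-weight misclassified instances
        w /= w.sum()                               # normalization by Zk
        learners.append(clf)
        alphas.append(a)
    def classifier(Xq):                            # Step 3: weighted vote over categories
        votes = np.zeros((len(Xq), m))
        for clf, a in zip(learners, alphas):
            votes[np.arange(len(Xq)), clf.predict(Xq)] += a
        return votes.argmax(axis=1)
    return classifier

# Usage: strong = adaboost_bayes(XM_train, y_train); pred = strong(XM_test)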

The process of classifying a model M is as follows: the 3D CNN extracts the voxel feature, which is processed by a Flatten layer to obtain XV; the 1D CNN extracts the D1, D2, D3, and A3 shape distribution features, which are processed by a Flatten layer to obtain XG; XV and XG are fused to obtain XM as shown in (19); then, XM is input into Classifier(X) to determine the category of M.

5. Experiments and Result Analysis

5.1. Experimental Data Set

ModelNet10 is used for the experiments; it contains 4899 models in 10 categories, with 3991 training models and 908 test models, which are applied to train and test the proposed model. Firstly, each 3D model is voxelized at a resolution of 24 ∗ 24 ∗ 24. Secondly, the D1, D2, D3, and A3 shape functions are adopted to express the 3D model.

5.2. Analysis of Experimental Results

This paper uses five classifiers: (1) the softmax classifier; (2) the Bayesian classifier; (3) the strong Bayesian classifier, obtained by using the AdaBoost algorithm to train multiple Bayesian classifiers over iterations and taking their weighted combination; (4) the decision tree; and (5) the strong decision tree, obtained analogously as the weighted combination of multiple decision trees trained by AdaBoost.

The 3D CNN is used to extract the voxel feature, and the 1D CNN is adopted to extract the D1 + D2 + D3 + A3 feature; they are combined to obtain the voxel + D1 + D2 + D3 + A3 feature.

Experiment 1 and Experiment 2 are conducted to test the accuracy of 3D model classification under different inputs. In Experiment 1, the softmax classifier is used to classify 3D models based on the voxel feature and the voxel + D1 + D2 + D3 + A3 feature. In Experiment 2, the strong Bayesian classifier is adopted with the same features. The training data is used to optimize the 3D model classifier, and the optimized classifier is applied to classify the test data. The accuracies of Experiment 1 and Experiment 2 are shown in Table 1, from which it can be seen that the accuracy of Experiment 2 is better than that of Experiment 1. This is because the strong Bayesian classifier makes full use of the diversity among multiple Bayesian classifiers. Under the same classifier, the voxel feature outperforms the D1 + D2 + D3 + A3 feature, showing that the voxel feature has better discriminative ability. The classification ability of the voxel + D1 + D2 + D3 + A3 feature is the best, because the fused feature describes a 3D model more completely from different perspectives.

Experiment 3 and Experiment 4 are conducted to test the influence of shape distribution features on classification accuracy. In Experiment 3, the softmax classifier is used to classify 3D models based on the D1, D2, D3, A3, and D1 + D2 + D3 + A3 features. In Experiment 4, the strong Bayesian classifier is adopted with the same features. The training data is used to optimize the 3D model classifier, and the optimized classifier is applied to classify the test data. The accuracies of Experiment 3 and Experiment 4 are shown in Table 2, from which it can be seen that under the same classifier, the D1 + D2 + D3 + A3 feature is better than the individual D1, D2, D3, and A3 features in classification ability, because the combined feature is more discriminative. The discriminative ability of the D1 feature is the worst.

Experiment 5 and Experiment 6 are conducted to test the classification accuracy of the voxel feature combined with different shape distribution features. In Experiment 5, the softmax classifier is used to classify 3D models based on the voxel + D1, voxel + D2, voxel + D3, voxel + A3, and voxel + D1 + D2 + D3 + A3 features. In Experiment 6, the strong Bayesian classifier is adopted with the same features. The training data is used to optimize the 3D model classifier, and the optimized classifier is applied to classify the test data. The accuracies of Experiment 5 and Experiment 6 are shown in Table 3, from which it can be seen that under the same classifier, the classification ability of the voxel + D1 + D2 + D3 + A3 feature is the best, because it describes the shape and structural information of a 3D model from different perspectives. The voxel + D2 and voxel + A3 features are better than the voxel + D1 and voxel + D3 features in classification ability.

Experiment 7 and Experiment 8 are conducted to test the accuracy of 3D model classification at different ratios of the voxel feature to the shape distribution feature, in order to find an optimal combination of the voxel feature and the D1 + D2 + D3 + A3 feature. The ratio of the voxel feature to the D1 + D2 + D3 + A3 feature is set to 1 : 16, 1 : 8, and 1 : 4, respectively. In Experiment 7, the softmax classifier is used with these ratios; in Experiment 8, the strong Bayesian classifier is adopted with the same ratios. The training data is used to optimize the 3D model classifier, and the optimized classifier is applied to classify the test data. The accuracies of Experiment 7 and Experiment 8 are shown in Table 4, from which it can be seen that under the same classifier, the accuracy first increases and then decreases as the ratio grows. The reason is that the voxel feature contains the surface and internal information of the 3D model but does not describe its topological structure, while the shape distribution feature expresses the geometric relationships of random points on the model surface. As the ratio grows, the shape distribution feature begins to make up for the voxel feature's weaknesses in describing the 3D model. However, the voxel feature has better classification ability than the shape distribution features, so the accuracy of 3D model classification decreases as the ratio continues to increase.

Experiment 9 is conducted to test the influence of the number of iterations on classification accuracy, in order to find an optimal number of iterations for the AdaBoost algorithm. In Experiment 9, the strong Bayesian classifier based on the voxel + D1 + D2 + D3 + A3 feature is used. The training data is used to optimize the 3D model classifier, and the optimized classifier is applied to classify the test data. The accuracies of Experiment 9 are shown in Table 5, from which it can be seen that the accuracy increases with the number of iterations and reaches its best value at 800 iterations. As the number of iterations continues to increase, however, the accuracy begins to decline because the AdaBoost algorithm overfits.

Experiment 10 is conducted to test the accuracy of 3D model classification under different weak classifiers, in order to find a better weak classifier for the AdaBoost algorithm. The Bayesian classifier, the strong Bayesian classifier, the decision tree, and the strong decision tree, all based on the voxel + D1 + D2 + D3 + A3 feature, are adopted. The training data is used to optimize the 3D model classifier, and the optimized classifier is applied to classify the test data. The accuracies of Experiment 10 are shown in Table 6, from which it can be seen that the Bayesian classifier is better than the decision tree, the strong Bayesian classifier is better than the strong decision tree, and each strong classifier is better than its weak counterpart. This is because the AdaBoost algorithm combines several weak classifiers into a strong classifier that classifies 3D models better.

In order to test the influence of features on the accuracy of each category, confusion matrices from Experiment 1 and Experiment 2 are constructed. Figures 6(a)–6(c) are obtained by the softmax classifier, and Figures 6(d)–6(f) by the strong Bayesian classifier.

From Figures 6(a)–6(c), it can be seen that the discriminative abilities of the voxel feature and the voxel + D1 + D2 + D3 + A3 feature are better than that of the D1 + D2 + D3 + A3 feature. Comparing Figures 6(a) and 6(b), the voxel + D1 + D2 + D3 + A3 feature is better than the voxel feature in classification ability for the bed, night_stand, and table classes, because the D1 + D2 + D3 + A3 feature describes the 3D model as a whole and makes up for the shortcomings of the voxel feature. From Figure 6(a), when the voxel + D1 + D2 + D3 + A3 feature is used, the probability that night_stand is mistakenly classified as dresser is 16%; from Figure 6(b), with the voxel feature it is 23%, so the error rate decreases by 7% after the D1 + D2 + D3 + A3 feature is fused. Similarly, the probability that table is mistakenly classified as desk is 25% with the voxel + D1 + D2 + D3 + A3 feature (Figure 6(a)) and 31% with the voxel feature (Figure 6(b)). This shows that the misclassification of table cannot be fully solved by fusing the D1 + D2 + D3 + A3 feature, because table and desk are similar to each other; the proposed method cannot distinguish this small gap well, which results in low accuracy.

From Figures 6(d)–6(f), it can again be seen that the discriminative abilities of the voxel feature and the voxel + D1 + D2 + D3 + A3 feature are better than that of the D1 + D2 + D3 + A3 feature. Comparing Figures 6(d) and 6(f), the voxel + D1 + D2 + D3 + A3 feature is better than the voxel feature in classification ability for the bathtub and desk classes. From Figure 6(f), when the voxel feature is used, the probability that desk is mistakenly classified as table is 16%; from Figure 6(d), with the voxel + D1 + D2 + D3 + A3 feature it is 7%, so the error rate decreases by 9% after the D1 + D2 + D3 + A3 feature is fused.

From Figures 6(a) and 6(d), it can be seen that the strong Bayesian classifier performs better than the softmax classifier on the bathtub and table classes, with accuracy increasing by 10% and 9%, respectively. At the same time, the problem of table being misclassified as desk is solved to a certain extent, with the error rate decreasing by 8%. However, the strong Bayesian classifier performs poorly on the dresser class: the probability that dresser is mistakenly classified as night_stand increases by 6%.

6. Conclusions

In this paper, the voxel descriptor is used to express the surface and internal information of a 3D model, and the shape distribution descriptor is adopted to denote the geometric relationships of random points on the model surface. A 3D CNN extracts the voxel feature, and a 1D CNN extracts the shape distribution features. Based on these features, a Bayesian classifier is used to classify 3D models, and the AdaBoost algorithm combines several Bayesian classifiers into a strong classifier. Experimental results show that the proposed method improves classification accuracy.

Data Availability

The ModelNet model library can be downloaded from http://modelnet.cs.princeton.edu/.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 61502124 and 60903082), Fundamental Research Foundation for Universities of Heilongjiang Province (Grant no. LGYC2018JC014), and Natural Science Foundation of Heilongjiang Province of China (Grant no. F2017014).