Abstract

Deep neural networks (DNNs) have replaced humans in making decisions in many security-critical scenarios such as face recognition and autonomous driving. Essentially, researchers try to teach DNNs to simulate human behavior. However, much evidence shows that there is a huge gap between humans and DNNs, which has raised many security concerns. Adversarial samples are a common way to show the gap between DNNs and humans in recognizing objects with similar appearance. However, we argue that the difference is not limited to adversarial samples. Hence, this paper explores such differences in a new way by generating fooling samples in the 3D point cloud domain. Specifically, a fooling point cloud can hardly be recognized by human vision but is classified as the target class by the victim 3D point cloud DNN (3D DNN) with very high confidence. Furthermore, to search for the optimal fooling point cloud, a new evolutionary algorithm named Multielites Harris Hawk Optimization (MEHHO) with enhanced exploitation ability is designed. On the one hand, our experiments demonstrate that (1) a 3D DNN tends to learn high-level features of an object; (2) a 3D DNN that relies on more points when making decisions is more robust; and (3) the gap is hard for a 3D DNN to learn. On the other hand, the comparison experiments show that MEHHO outperforms state-of-the-art evolutionary algorithms in terms of statistical and convergence results.

1. Introduction

Deep Neural Network (DNN) techniques have achieved remarkable success on a variety of tasks, particularly in the fields of 2D images [1–5] and 3D shapes [6–11]. In many security-critical scenarios, such as natural language recognition, face recognition, and pedestrian reidentification, DNNs have replaced humans in making decisions. In the future, many complicated scenarios will rely on DNNs for their development, for example, level-five autonomous driving and human-like service robots. Essentially, one common characteristic of these tasks is that they teach DNNs to simulate human behavior.

Although the prospects look bright, DNNs have been shown to be vulnerable to carefully crafted adversarial attacks, both in the field of 2D images [12–20] and in that of 3D point clouds [21–25]. The aim of an adversarial attack is to generate fake samples that do not alarm people but mislead the DNN into making a wrong decision, as shown in Figure 1(b). Such a phenomenon has raised many security concerns. Szegedy et al. [12] were the first to point out the vulnerability of DNNs in the 2D image domain. Later, many attack methods were designed [13, 16, 26]. For the 3D point cloud, which is the simplest format for representing a 3D shape, adversarial samples are generated by perturbing, attaching, and detaching points [21, 22, 27, 28].

An adversarial attack takes a benign sample as its starting point and adjusts its features until it crosses the decision boundary. It can be roughly modeled as a min-max problem that minimizes the probability of the original class and maximizes the probability of the targeted class. Many recent adversarial attacks on 3D point clouds achieve a very high success rate [21, 27, 28]. These successes exhibit the huge difference between 3D DNNs and human vision. Exploring such differences helps to accelerate the development of robust 3D DNNs. However, most adversarial defenses [29–36] for 3D DNNs target specific attacks and do not explain this difference clearly.

Essentially, the adversarial sample itself is one way to show the gap between 3D DNNs and humans in recognizing objects with similar appearance. But we argue that the gap is not limited to adversarial samples, given the disparity between DNNs and the human brain. Unfortunately, adversarial samples usually stay near the decision boundary and do not move deeper into the class space, since the purpose of the attack has already been achieved. This characteristic prevents them from revealing further information about the disparity between 3D DNNs and human vision.

Hence, this paper aims to explore the disparity using newly defined fooling samples (illustrated in Figure 1(a)). The fooling samples exhibit the gap between what humans and the 3D DNN think a category should look like. For this purpose, the 3D DNN should assign an extremely high confidence to the fooling samples. Therefore, the fooling sample is searched for by an evolutionary algorithm with high exploitation ability. The formal generation process is as follows: given a 3D DNN and a target class, we generate a fooling point cloud by an evolutionary algorithm. The 3D DNN believes with very high confidence that it belongs to the target class, but human vision can hardly recognize it.

To efficiently generate fooling samples, we model the generation process as an optimization problem and design MEHHO, which has high exploitation ability, to solve it. Specifically, Harris Hawk Optimization (HHO) [37] is an evolutionary algorithm that simulates the hunting process of Harris hawks. It has been applied to a wide variety of problems ranging from operational cost optimization and engineering design to copyright protection and data authentication [38–41]. However, HHO often gets stuck in local optima due to the nonlinearity of the DNN function. Therefore, we propose Multielites Harris Hawk Optimization (MEHHO), which is based on group besiege and spiral-shaped attack. The experimental results show that the classification confidence of the fooling point clouds generated by MEHHO is higher than that obtained by the compared algorithms.

After obtaining the fooling point cloud by MEHHO, we explore the gap between 3D DNNs and human vision and offer some observations about the characteristics of 3D DNNs by analyzing the fooling samples. In summary, we make the following contributions in this paper:
(i) We exhibit the difference between 3D DNNs and human vision in a new way by generating fooling point clouds.
(ii) We propose a new evolutionary algorithm, MEHHO, which achieves better statistical and convergence results on the generation of fooling point clouds.
(iii) We explore the characteristics of 3D DNNs by analyzing the features of the fooling point clouds.

2.1. DNNs for 3D Point Cloud

In order to perform weight sharing or other kernel optimizations, typical DNNs require a regular data format as input, such as a 2D image. However, a 3D point cloud is unordered and therefore cannot be fed directly into typical DNNs. Most researchers transform such data into a regular format by multiview projection [42–44] or 3D voxel grids [45–47], but such transformations inevitably introduce distortions. To address this issue, Qi et al. [6] propose PointNet, which processes the point cloud directly. It is the benchmark for many efficient 3D DNNs. Since PointNet cannot capture local structure information, Qi et al. [7] propose PointNet++, whose key components are the sampling layer and the grouping layer. DGCNN [48] is a typical graph-based method; it transforms a point cloud into a graph and learns 3D features using EdgeConv. LDGCNN [49] links the features extracted by different layers and simplifies the transformation network of DGCNN to improve its performance. Due to the irregularity of the point cloud, designing a convolution kernel for a 3D DNN is more complicated. RS-CNN [50] implements the convolution with an MLP, which aims to learn the mapping from low-level relations to high-level relations between points in a local subset. Point-BERT [10] learns 3D point cloud transformers through BERT-style pretraining. Its classification accuracy on ModelNet40 surpasses that of most 3D DNNs.

2.2. Adversarial Attack and Defense on 3D DNN

Xiang et al. [21] propose the first work on generating adversarial point clouds; the authors craft adversarial point clouds against PointNet by point perturbation and point generation. To limit the deformation of the adversarial point cloud, three perturbation metrics are introduced as constraints. Since critical points have the greatest impact on the decisions of a 3D DNN, Zheng et al. [22] and Naderi et al. [51] focus on generating a point cloud saliency map that illustrates which points are important to the 3D DNN. The authors then drop the most important points to mount the attack. Zhao et al. [23] propose both a black-box attack and a white-box attack in 3D adversarial settings. The results show the vulnerability of 3D deep learning models under isometric transformations. To improve the time efficiency of adversarial attacks, Zhou et al. [52] propose LG-GAN for real-time, flexible, targeted point cloud attacks. In the field of autonomous driving, which requires higher robustness, Wang et al. [27] attack a 3D object detection task, generating 3D adversarial point clouds based on point cloud perturbations. To promote the robustness of 3D DNNs directly, Yang et al. [28] propose an attack scheme and a defense scheme at the same time. In addition, the imperceptibility of the adversarial point cloud is important for bypassing defenses. Wen et al. [53] design geometry-aware objectives whose solutions favor the desired surface properties of smoothness and fairness. Liu and Hu [54] shift each point in the direction of its normal vector within a strictly bounded width so as to preserve the geometric properties of the original point clouds. Transferability describes how widely an attack method applies; to improve it, Hamdi et al. [25] develop a point attack method by exploring the input data distribution and adding an adversarial loss.

Compared with adversarial attacks, only a few methods have been proposed for adversarial defense on 3D point clouds. Adversarial training, which retrains the DNN with adversarial samples, is an effective way to defend against adversarial attacks. Liu et al. [30] extend this defense method to 3D DNNs and achieve promising results. Restoring the adversarial point cloud is another way to defend against adversarial attacks. Zhou et al. [31] design a denoiser network that removes outlier points to defend against attacks while preserving the smoothness of the point cloud surface. Dong et al. [32] extract the global feature to distinguish the benign point cloud from the adversarial point cloud by a designed vector. Liang et al. [34] propose a novel perturbation adaptation generation network, which adaptively generates adversarial point clouds according to the victim point cloud. Instead of improving the robustness of the whole point cloud, Wicker and Kwiatkowska [35] analyze the pointwise robustness of 3D DNNs in an adversarial setting.

As we can see, most attack methods achieve a near-perfect success rate by crafting visually similar adversarial point clouds. The gap between 3D DNNs and human vision is obvious. In this paper, we further explore this gap by generating fooling point clouds, as a supplement to the adversarial "attack-defense" research on 3D DNNs.

2.3. Evolutionary Algorithm

The evolutionary algorithm simulates the behavior of animals in nature and is a typical global optimization technique that has been applied to many engineering problems [55, 56]. The earliest evolutionary algorithms were inspired by biology and the principle of natural selection. Since then, many efficient evolutionary algorithms have been designed, and they can be grouped into three main categories: swarm-based, evolution-based, and physics-based methods. In detail, swarm-based methods [38, 57–59] mimic the cooperative mechanisms of group animals. Evolution-based methods [60–66] are inspired by the Darwinian law of evolution, which follows the saying: survival of the fittest and elimination of the unfit. Physics-based methods [56, 67–69] simulate the physical laws that govern how things work in the universe. Although evolutionary algorithms are thriving, no single algorithm can adapt to all optimization problems, because each problem has unique characteristics. For the fooling point cloud generation problem described above, we design a new evolutionary algorithm named MEHHO, based on group besiege and spiral-shaped attack, to enhance the exploitation ability.

3. Method to Generate a Fooling Point Cloud

3.1. Problem Definition

A 3D point cloud $X = \{p_i\}_{i=1}^{n}$ is defined as a set of $n$ 3D points, where each point $p_i$ is represented by its 3D coordinates $(x_i, y_i, z_i)$. Meanwhile, the 3D DNN is defined as $F$, which maps an input point cloud $X$ to its corresponding class label $F(X)$. The probability that $X$ is classified as the target class $c$ is denoted by $P(c \mid X; \theta)$, where $\theta$ denotes the well-trained parameters.

For ease of understanding, we first illustrate the process of an adversarial attack. It aims to mislead a 3D DNN into classifying an adversarial example $X'$ as the target class by adding perturbations that are imperceptible to humans. Adversarial examples show the disparity between humans and DNNs when recognizing objects with similar appearance. Formally, the attack seeks an adversarial point cloud $X'$ that is classified as the target class while keeping the distance $D(X, X')$ between the adversarial point cloud $X'$ and the benign point cloud $X$ small. Common distance metrics include norm-based distances, the Chamfer distance, and the Hausdorff distance.

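Unlike an adversarial attack, we mainly explore the gap between what humans and the 3D DNN think a category should look like. We first randomly generate a point cloud as the starting point and then maximize its probability with respect to the target category $c$ to generate the fooling sample $X^{*}$:

$$X^{*} = \arg\max_{X} P(c \mid X; \theta), \tag{2}$$

where $P(c \mid X; \theta)$ is the probability that the 3D DNN with parameters $\theta$ assigns to category $c$ for the point cloud $X$.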

Here, the probability of the target category is quantified by the softmax output of the 3D DNN over the inference set. A higher probability better reflects the internal representation that the 3D DNN uses when it recognizes the target category. In the next section, we design a new evolutionary algorithm to solve equation (2). The generation process is shown in Figure 2.
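The objective of equation (2) can be sketched as a fitness function. This is only an illustration under stated assumptions, not the authors' implementation: `toy_logits` is a hypothetical stand-in for a trained 3D DNN, and the fitness is assumed to be the softmax probability of the target class.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(cloud, num_classes=4):
    """Hypothetical stand-in for a 3D DNN (assumption, not PointNet).

    It uses an order-invariant statistic, the per-axis mean, so the
    output does not depend on the order of the points.
    """
    n = len(cloud)
    mean = [sum(p[d] for p in cloud) / n for d in range(3)]
    return [mean[c % 3] * (c + 1) for c in range(num_classes)]


def fitness(cloud, target, model=toy_logits):
    """Fitness of a candidate cloud: softmax probability of the target class."""
    return softmax(model(cloud))[target]


rng = random.Random(0)
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1024)]
score = fitness(cloud, target=2)
```

Because the softmax assigns a positive probability to every class, `score` always lies strictly between 0 and 1, which is why the search can only approach full confidence.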

3.2. Harris Hawks Optimization (HHO)

Given the number of points and the target class, there is a massive number of point configurations that can compose a point cloud. We need to efficiently find the one with the highest confidence. The evolutionary algorithm (EA) [70] is an effective global optimization technique that has been applied to a wide variety of problems.

HHO is one of the most efficient EAs; it mimics the hunting mechanism of Harris hawks. It is simple and straightforward and can be divided into an exploration phase and an exploitation phase. HHO has the advantages of a good convergence rate, competitive exploration ability, and good time efficiency. It has also been shown to handle high-dimensional data well, despite its weak exploitation ability. In many practical engineering applications, HHO has proven to be very effective and stable.

3.2.1. Exploration Phase

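Since the rabbit (global optimum) is not easy to find, the hawks wait, observe, and monitor the desert site using two strategies, as shown in the following equation:

$$X(t+1) = \begin{cases} X_{rand}(t) - r_1 \left| X_{rand}(t) - 2 r_2 X(t) \right|, & q \ge 0.5, \\ \left( X_{rabbit}(t) - X_m(t) \right) - r_3 \left( LB + r_4 (UB - LB) \right), & q < 0.5, \end{cases} \tag{3}$$

where $X(t)$ is the position of the hawk in iteration $t$, $X_{rand}(t)$ is the position of a random hawk in the population, $X_m(t)$ is the average position of the population, $X_{rabbit}(t)$ is the current optimal hawk, $UB$ and $LB$ are the upper and lower bounds of the variables, and $q$ and $r_1, \ldots, r_4$ are random numbers inside (0, 1).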
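The exploration rule can be sketched in Python as follows. This is a minimal sketch assuming the standard HHO exploration update of [37]; the function and argument names are hypothetical, positions are flat coordinate lists, and the final clipping to the bounds is an added assumption rather than part of the update itself.

```python
import random


def explore(x, x_rand, x_rabbit, x_mean, lb, ub, rng):
    """One HHO exploration update for a single hawk."""
    q = rng.random()
    r1, r2, r3, r4 = (rng.random() for _ in range(4))
    if q >= 0.5:
        # Perch relative to a randomly chosen hawk.
        new = [xr - r1 * abs(xr - 2.0 * r2 * xi) for xi, xr in zip(x, x_rand)]
    else:
        # Perch relative to the rabbit and the population mean.
        new = [(xb - xm) - r3 * (lb + r4 * (ub - lb))
               for xb, xm in zip(x_rabbit, x_mean)]
    # Assumption: keep every coordinate inside the search bounds.
    return [min(max(v, lb), ub) for v in new]


rng = random.Random(0)
new_pos = explore([0.1, -0.2, 0.3], [0.5, 0.5, 0.5],
                  [0.0, 0.0, 0.0], [0.1, 0.1, 0.1], -1.0, 1.0, rng)
```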

3.2.2. Exploitation Phase

The exploitation phase offers several search strategies. The hawks switch between them according to a random number $r$ and the search energy $E$ of the rabbit, whose magnitude decreases as the current iteration $t$ approaches the total number of iterations $T$.
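The switching logic can be illustrated with a short sketch. It assumes the energy schedule and thresholds of the original HHO [37], namely $E = 2E_0(1 - t/T)$ with $E_0$ drawn from (-1, 1), exploration when $|E| \ge 1$, and the four besiege strategies otherwise; the function names are hypothetical.

```python
import random


def escaping_energy(t, total, rng):
    """E = 2 * E0 * (1 - t / T), with E0 drawn uniformly from (-1, 1)."""
    e0 = rng.uniform(-1.0, 1.0)
    return 2.0 * e0 * (1.0 - t / total)


def choose_strategy(energy, r):
    """Map (|E|, r) to the phase names used in the text."""
    e = abs(energy)
    if e >= 1.0:
        return "exploration"
    if r >= 0.5:
        return "soft besiege" if e >= 0.5 else "hard besiege"
    if e >= 0.5:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"


rng = random.Random(0)
phase = choose_strategy(escaping_energy(10, 100, rng), rng.random())
```

Under this schedule the bound on $|E|$ shrinks linearly from 2 to 0, so exploration becomes impossible once $t \ge T/2$ and the late iterations are spent entirely on exploitation.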

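(1) Soft Besiege. When $r \ge 0.5$ and $|E| \ge 0.5$, the rabbit is energetic and tries to escape through misleading jumps. To exhaust the rabbit, the hawks encircle it softly by

$$X(t+1) = \Delta X(t) - E \left| J X_{rabbit}(t) - X(t) \right|, \quad \Delta X(t) = X_{rabbit}(t) - X(t), \quad J = 2(1 - r_5), \tag{4}$$

where $J$ is the random jump strength of the rabbit throughout the escaping procedure and $r_5$ is a random number inside (0, 1).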

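(2) Hard Besiege. When $r \ge 0.5$ and $|E| < 0.5$, the rabbit is tired and the hawks attack it by encircling it tightly:

$$X(t+1) = X_{rabbit}(t) - E \left| \Delta X(t) \right|, \tag{5}$$

where $\Delta X(t) = X_{rabbit}(t) - X(t)$.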

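(3) Soft Besiege with Progressive Rapid Dives. When $r < 0.5$ and $|E| \ge 0.5$, the hawks progressively select the best possible dive to attack the rabbit. The new position of the hawks is calculated by

$$X(t+1) = \begin{cases} Y, & \text{if } Y \text{ has better fitness than } X(t), \\ Z, & \text{if } Z \text{ has better fitness than } X(t), \end{cases} \tag{6}$$

where

$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X(t) \right|, \qquad Z = Y + S \times LF(D). \tag{7}$$

Here $D$ is the dimension of the problem, $S$ is a random vector of size $1 \times D$, and $LF$ is the Levy flight function, which can be computed as follows:

$$LF(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \tag{8}$$

where $u$ and $v$ are two random numbers, and $\sigma$ is obtained by

$$\sigma = \left( \frac{\Gamma(1+\beta) \sin\left(\frac{\pi\beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \beta \, 2^{\frac{\beta-1}{2}}} \right)^{1/\beta}, \tag{9}$$

where $\beta$ is a constant equal to 1.5.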
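The Levy flight step can be sketched as below. This is a minimal sketch assuming the formulation of the original HHO [37]; the function names are hypothetical, and drawing $u$ and $v$ uniformly from (0, 1) is an assumption (Mantegna's algorithm, on which this step is based, normally uses Gaussian draws).

```python
import math
import random

BETA = 1.5  # constant used in the text


def levy_sigma(beta=BETA):
    """Scale factor sigma of the Levy flight step."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_flight(dim, rng, beta=BETA):
    """Return one Levy step per dimension: 0.01 * u * sigma / |v|^(1/beta)."""
    sigma = levy_sigma(beta)
    steps = []
    for _ in range(dim):
        u = rng.random()
        v = rng.random() or 1e-12  # guard against division by zero
        steps.append(0.01 * u * sigma / abs(v) ** (1.0 / beta))
    return steps


steps = levy_flight(5, random.Random(0))
```

For $\beta = 1.5$ the scale factor evaluates to roughly 0.6966, so most steps are small while an occasional tiny $v$ produces a long jump, which is what lets a dive escape a poor local region.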

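(4) Hard Besiege with Progressive Rapid Dives. When $r < 0.5$ and $|E| < 0.5$, the rabbit is exhausted and trapped. The hawks dive and kill the rabbit:

$$X(t+1) = \begin{cases} Y, & \text{if } Y \text{ has better fitness than } X(t), \\ Z, & \text{if } Z \text{ has better fitness than } X(t), \end{cases} \tag{10}$$

where $Y$ and $Z$ are obtained by

$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X_m(t) \right|, \qquad Z = Y + S \times LF(D), \tag{11}$$

with $X_m(t)$ the average position of the population.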

3.3. Multielites HHO

Our purpose is to probe the inner workings of the 3D DNN by generating a fooling point cloud that the 3D DNN firmly believes belongs to the target class. Therefore, we push the initial point cloud as far away from the decision boundary as possible. That is, we search for a fooling point cloud whose target-class probability is extremely high. This requires an evolutionary algorithm with excellent exploitation ability.

Although HHO boosts exploitation through its various search strategies and Levy flight-based patterns, it struggles to reach such a high probability because the 3D DNN has many local optima. Thus, we propose the multielites HHO (MEHHO), which achieves better exploitation ability through the designed group besiege and spiral-shaped attack. Its pseudo-code is shown in Algorithm 1.

3.3.1. Group Besiege

In HHO, as in most evolutionary algorithms, the rabbit in each iteration is determined by a single elite hawk, the one with the best position. Although this mechanism accelerates convergence around the current optimum, the search zigzags when it gets close to the optimum position, which increases the time cost. Therefore, we propose the group besiege stage, which uses the top $k$ best individuals to represent $X_{rabbit}$. In this way, the hawks search for the rabbit based on the richer information carried by multiple elites, namely the hawks with the top $k$ best positions.
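One plausible reading of group besiege is sketched below. The exact combination rule is not recoverable from the text here, so the arithmetic mean of the elites is an assumption, and `group_rabbit` is a hypothetical name. Higher fitness is treated as better, since the problem is a maximization.

```python
def group_rabbit(population, fitnesses, k):
    """Represent the rabbit by the mean of the k fittest hawks (assumed rule)."""
    order = sorted(range(len(population)),
                   key=lambda i: fitnesses[i], reverse=True)
    elites = [population[i] for i in order[:k]]
    dim = len(elites[0])
    return [sum(e[d] for e in elites) / len(elites) for d in range(dim)]


population = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [10.0, 10.0]]
fitnesses = [0.1, 0.9, 0.8, 0.2]
rabbit = group_rabbit(population, fitnesses, k=2)
```

With `k=1` the function reduces to the single-elite rabbit of plain HHO, which makes the multielite variant easy to compare against the baseline.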

3.3.2. Spiral-Shaped Attack

The logarithmic spiral is a well-known attack pattern in nature, and many mathematical optimization techniques based on it have been designed, such as WOA and MFO. Here, we design a prey factor to balance the movement toward the rabbit and the jump of HHO, combined with a logarithmic spiral. Accordingly, the hawk position in the stages soft besiege with progressive rapid dives and hard besiege with progressive rapid dives is reformulated by equation (13), as shown in Figure 3, where $p$ is the randomly selected prey factor, $l$ is a random number, and $b$ is a constant set to 1.
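For intuition, a logarithmic spiral move in the style of WOA can be sketched as follows. This is not equation (13): how MEHHO blends the spiral with the dives through the prey factor is not reproduced here. The function name and the range [-1, 1] for the random number are assumptions borrowed from WOA.

```python
import math
import random


def spiral_step(x, x_rabbit, rng, b=1.0):
    """WOA-style spiral move of a hawk around the rabbit.

    new = |x_rabbit - x| * exp(b * l) * cos(2 * pi * l) + x_rabbit
    """
    l = rng.uniform(-1.0, 1.0)  # assumed range, as in WOA
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(xr - xi) * factor + xr for xi, xr in zip(x, x_rabbit)]


rng = random.Random(0)
moved = spiral_step([0.2, -0.4, 0.1], [0.5, 0.5, 0.5], rng)
```

The step size scales with the distance to the rabbit, so a hawk that already sits on the rabbit does not move, while distant hawks sweep a wide arc around it.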

3.4. Generate a Fooling Point Cloud by MEHHO

The generation of a fooling point cloud is modeled as a maximization problem, which we solve with MEHHO. In detail, each hawk in the population represents one point cloud, and its fitness is calculated by equation (2). During evolution, to limit the search space, we constrain the points of the point cloud to a bounding box. The initial population is composed of point clouds whose points are placed at random inside the bounding box.
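The encoding described above can be sketched as follows. The box limits and the function names are hypothetical; the text only states that points are constrained to a bounding box and initialized at random inside it.

```python
import random


def init_population(n_hawks, n_points, low, high, rng):
    """Each hawk is one point cloud with random points inside the box."""
    return [[[rng.uniform(low, high) for _ in range(3)]
             for _ in range(n_points)]
            for _ in range(n_hawks)]


def clip_to_box(cloud, low, high):
    """Project every coordinate back into the bounding box after an update."""
    return [[min(max(c, low), high) for c in point] for point in cloud]


rng = random.Random(0)
population = init_population(n_hawks=4, n_points=16, low=-1.0, high=1.0, rng=rng)
clipped = clip_to_box([[2.0, -3.0, 0.5]], -1.0, 1.0)
```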

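Algorithm 1: Pseudo-code of MEHHO.
Initialize the population;
Calculate the fitness of each hawk in the population;
Find the top k best hawks;
while t < T do
 Obtain X_rabbit from the top k best hawks (group besiege).
 for i = 1 to N do
  Update the search energy E and the random number r
  if |E| >= 1 then
   update X (t + 1) by equation (3)
  else if |E| < 1 then
   if r >= 0.5 and |E| >= 0.5 then
    update X (t + 1) by soft besiege, equation (4)
   else if r >= 0.5 and |E| < 0.5 then
    update X (t + 1) by hard besiege, equation (5)
   else if r < 0.5 and |E| >= 0.5 then
    update X (t + 1) by equation (13)
   else if r < 0.5 and |E| < 0.5 then
    update X (t + 1) by equation (13)
 end
 Update the fitness and the top k best hawks
 t += 1
end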

4. Results and Analysis

We conduct experiments to answer the following three questions: (1) Can fooling samples be found? (Subsection 4.2) (2) What characteristics of 3D DNNs do fooling samples reveal? (Subsection 4.3) (3) Do the improvements in MEHHO help in generating fooling samples? (Subsection 4.4)

4.1. 3D DNNs and Datasets
4.1.1. 3D DNNs

The results are mainly obtained on PointNet [6]. In addition, PointNet++ [7], DGCNN [48], and RS-CNN [50] are selected to demonstrate transferability. They are typical works of pointwise MLP methods, graph-based methods, and convolution-based methods, respectively [11]. In the experiments, we follow their default settings for fairness.

4.1.2. Datasets

The experiments are conducted on 3D MNIST and ModelNet40 [46], which are widely used in adversarial attack experiments [21–23, 25, 27, 28, 52, 53]. In detail, ModelNet40 contains 12,311 objects from 40 categories, of which 9,843 are used for training and the other 2,468 for testing. 3D MNIST includes 6,000 3D handwritten digits from 10 categories, of which 5,000 are used for training and 1,000 for testing. For convenience of comparison, we uniformly sample 1,024 points from each object and rescale them into a unit cube.
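The preprocessing step can be sketched as follows. This is a minimal sketch under assumptions: the object is already given as a list of points (datasets distributed as meshes would first need surface sampling), the unit cube is taken to be $[0, 1]^3$, and the scaling is uniform so that the shape is not distorted. The function names are hypothetical.

```python
import random


def sample_points(points, n, rng):
    """Uniformly sample n points (with replacement only if too few exist)."""
    if len(points) >= n:
        return rng.sample(points, n)
    return [rng.choice(points) for _ in range(n)]


def rescale_to_unit_cube(points):
    """Translate and uniformly scale so that all coordinates lie in [0, 1]."""
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    extent = max(mx - mn for mn, mx in zip(mins, maxs)) or 1.0
    return [[(p[d] - mins[d]) / extent for d in range(3)] for p in points]


rng = random.Random(0)
raw = [[rng.uniform(-5.0, 7.0) for _ in range(3)] for _ in range(2000)]
cloud = rescale_to_unit_cube(sample_points(raw, 1024, rng))
```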

4.2. Generate a Fooling Point Cloud
4.2.1. Fooling Point Clouds on 3D MNIST

To evaluate whether the fooling point clouds are able to fool a 3D DNN, we first generate fooling point clouds on 3D MNIST with PointNet, as shown in Figure 4. The results show that the proposed MEHHO can find a fooling point cloud for every class of 3D MNIST. All fooling point clouds can hardly be recognized by human vision, yet PointNet labels them as the target class with very high confidence, higher than the confidence it assigns to the benign point clouds.

Although the fooling point clouds are unrecognizable to human vision, they do contain an outline similar to that of the victim class. For example, the fooling point clouds for class 0 tend to show an oval outline. The fooling point clouds classified as 3, 5, 6, and 9 all tend to exhibit an outline like that of the benign digit class. This result suggests that MEHHO discovers the global features that PointNet uses to recognize a point cloud and exploits them to generate fooling point clouds.

4.2.2. Fooling Point Clouds on ModelNet40

Furthermore, to verify whether PointNet is overfitting on 3D MNIST, we generate fooling point clouds on ModelNet40, which has more training data and more categories. The generated fooling point clouds are shown in Figure 5. The results suggest that it is feasible to generate fooling point clouds for all 40 classes. In detail, 32 of the classes need only 100 iterations, while the remaining 8 classes need more than 1,000 iterations and a larger population. Our hypothesis is that MEHHO gets stuck in local optima for these classes. In sum, Figures 4 and 5 illustrate that MEHHO can fool PointNet on every class of both 3D MNIST and ModelNet40.

4.2.3. Generate Fooling Samples by Other 3D DNNs

In this section, we further generate fooling samples on other 3D DNNs. The chosen 3D DNNs are PointNet++, DGCNN, and RS-CNN. The results illustrate that MEHHO can generate point clouds that fool PointNet++, DGCNN, and RS-CNN on all 40 classes of ModelNet40. Some of the results are shown in Figure 6.

4.3. Further Analysis of Fooling Samples

The victim 3D DNN assigns a very high confidence to each fooling sample. In this section, we explore the characteristics of 3D DNNs by analyzing the features of the fooling point clouds.

4.3.1. Influence of Iteration Times

We first visualize the process of generating fooling point clouds in Figure 7. The results suggest that, as the number of iterations increases, the confidence of the fooling point cloud rises markedly and its outline gradually approaches that of the benign point cloud. Meanwhile, we find that the points of the fooling point cloud tend to gather toward the center of the bounding box. However, for classes whose benign point clouds are evenly distributed in the bounding box, this gathering tendency is slower. The phenomenon suggests that the global structure mainly guides the generation of the fooling point cloud.

To further explore the influence of the number of iterations, we compare fooling point clouds obtained with different numbers of iterations, as shown in Figure 8. The results indicate that, as the number of iterations increases, the points become more concentrated toward the center while the outline changes only slightly, which further demonstrates the importance of the global feature.

4.3.2. Which Points Matter in the Fooling Point Cloud?

We find that many points in the fooling point cloud tend to gather toward the center rather than the surface. However, psychophysical evidence shows that humans perceive 3D object shapes from combined sources of boundary contour, motion, texture, and shading [71]. For human vision, the gathered points are redundant for representing the object shape. PointNet, however, does not treat each point equally when it makes decisions. To evaluate whether the gathered points are crucial for PointNet, we visualize the point clouds with the saliency map technique, which evaluates pointwise importance [22]. The saliency maps of the fooling point clouds and the corresponding benign point clouds are shown in Figure 9. The results show that the gathered points contain a high proportion of the high-score points, which demonstrates their importance for PointNet.

Zheng et al. [22] show that dropping 200 high-score points markedly reduces the accuracy of PointNet on benign point clouds. For comparison, we drop the same number of high-score points from the fooling point clouds, and the accuracy degrades more severely. This illustrates that the fooling point cloud is more fragile in spite of its high confidence. A possible reason is that PointNet bases its decision on fewer points in the fooling point cloud than in the benign one. In sum, the results suggest that the more points a 3D DNN relies on when making decisions, the harder it is to fool.
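The point-dropping test can be sketched as follows. The saliency scores are assumed to be given, one per point; how they are computed in [22] is not reproduced here, and `drop_top_points` is a hypothetical helper.

```python
def drop_top_points(cloud, scores, k):
    """Remove the k points with the highest saliency scores, keeping order."""
    k = max(0, min(k, len(cloud)))
    by_score = sorted(range(len(cloud)), key=lambda i: scores[i])
    keep = sorted(by_score[: len(cloud) - k])
    return [cloud[i] for i in keep]


cloud = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
scores = [0.1, 0.9, 0.5, 0.7]
reduced = drop_top_points(cloud, scores, k=2)
```

Feeding `reduced` back to the classifier and comparing the prediction with the original one gives the kind of fragility measurement discussed above.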

4.3.3. What Type of Feature Does PointNet Learn?

To explore what features PointNet learns, we generate several fooling point clouds from independent runs, as shown in Figure 10. The underlying assumption is that a generated fooling point cloud reflects what PointNet thinks a class should look like. In Figure 10, we observe that the fooling point clouds belonging to the same class are similar to the benign point cloud. Specifically, for class "vase," the five fooling point clouds all exhibit a narrow bottleneck, a base, and a wide body. For class "tent," the five fooling point clouds all exhibit a triangular shape like the victim point cloud. These results exhibit a similarity between human vision and PointNet. By contrast, in the 2D image domain, an evolutionary algorithm alone cannot generate objects recognizable by human vision unless specific constraints are added [72].

These results suggest that PointNet tends to learn high-level features rather than low- or middle-level features. If PointNet mainly learned low- or middle-level features, the generated fooling point clouds would contain repetitions of object subcomponents [72], such as bottlenecks.

4.3.4. Features Extracted by Different 3D DNNs

Discussing the similarity of the features extracted by different 3D DNNs helps to understand the disparity between them. In this section, we feed the fooling point clouds generated against one DNN to the other DNNs and evaluate the inference accuracy. The chosen 3D DNNs are PointNet, PointNet++, and DGCNN.

In Table 1, we find that PointNet and PointNet++ recognize the fooling point clouds generated against each other with higher accuracy than DGCNN does. A possible reason is that the two 3D DNNs tend to extract similar high-level features due to their similar structures.

Comparing the columns of Table 1, we find that PointNet and PointNet++ both achieve higher test accuracy than DGCNN, which means that DGCNN is harder to fool than PointNet and PointNet++. Furthermore, it is reasonable to say that the features a 3D DNN extracts for one class are not unique, since one 3D DNN is vulnerable to fooling samples generated against another 3D DNN. Thus, we can conclude that a 3D DNN can be fooled in different ways.

To verify whether the same 3D DNN learns different features in independent training runs, we separately train two PointNets named PointNet_1 and PointNet_2. The two PointNets have the same network structure and are trained on the same dataset, but their initialization parameters are different. We then test the accuracy of each PointNet on the fooling point clouds generated against the other PointNet. The results, shown in Table 2, illustrate that the features extracted by the same 3D DNN are similar to a large extent. The difference may come from the randomness of independent training.

4.3.5. Trying to Eliminate Fooling Samples

To explore a defense against fooling samples, we regard the fooling point clouds as a new class and add them to the training dataset as an "n + 1" class. We then retrain PointNet on the resulting dataset to find out whether PointNet is able to recognize the fooling samples. The results show that the retrained 3D DNN can recognize the fooling point clouds with high accuracy.

However, we find that MEHHO can discover new fooling point clouds that fool the retrained PointNet even after ten rounds of retraining. This phenomenon indicates that the difference between PointNet and human vision is hard to learn.
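The "n + 1" class construction can be sketched as follows. The dataset is assumed to be a list of `(cloud, label)` pairs with labels `0 .. n-1`; the helper name is hypothetical and the retraining itself is not shown.

```python
def add_fooling_class(dataset, fooling_clouds, num_classes):
    """Append fooling clouds under a new label equal to num_classes.

    Returns the augmented dataset and the new number of classes.
    """
    fooling_label = num_classes  # labels 0..n-1 are taken, so n is the new class
    augmented = list(dataset) + [(cloud, fooling_label) for cloud in fooling_clouds]
    return augmented, num_classes + 1


dataset = [("benign_a", 0), ("benign_b", 1)]
augmented, n_classes = add_fooling_class(dataset, ["fool_1", "fool_2"], num_classes=2)
```

Repeating this step after every retraining round, with freshly generated fooling clouds each time, corresponds to the iterative retraining described in this subsection.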

4.4. Evaluation of MEHHO
4.4.1. Comparison Evolutionary Algorithms

To evaluate the advantages of MEHHO in searching for fooling point clouds, we compare its statistical and convergence results with those of WOA [73], BWO [74], SSA [75], SO [76], I-GWO [77], and HHO [37]. The victim 3D DNN and the dataset are PointNet and ModelNet40, respectively.

4.4.2. Parameters Setting

We set the iteration times and the population size to 100. The hyper-parameters of each evolutionary algorithm are shown in Table 3.

Since the softmax layer always assigns some probability to every class, the probability of the target class never reaches 1 but should be as close to 1 as possible. The results in Table 4 suggest that the fooling point clouds generated by MEHHO are closer to 1 than those of the other evolutionary algorithms. In detail, WOA always fails to generate a fooling point cloud when attacking "bathtub," since it falls into a local optimum. For the models bed, bench, and bookshelf, MEHHO is able to generate fooling point clouds with very high confidence. For the models bottle and bowl, I-GWO is better at generating fooling samples, but it is inferior to MEHHO on the other models.

Figure 11 shows the convergence curves for the classes bed, bottle, airplane, bench, bookshelf, and bowl. The closer a curve is to the upper left, the better. For the model bench, the probability obtained by MEHHO increases rapidly and then converges. Meanwhile, we argue that the model bed is easily fooled, since the probability of every evolutionary algorithm increases at the early stage of iteration and all of them reach high probabilities. Furthermore, for the models bookshelf and bottle, MEHHO generates a fooling point cloud in the fewest iterations. Therefore, the results suggest that MEHHO outperforms the other algorithms in terms of convergence performance.

5. Conclusions and Future Works

To study the difference between 3D DNNs and human vision, we generate fooling point clouds using MEHHO. A fooling point cloud is unrecognizable to humans but is firmly classified as the target class by the 3D DNN. We analyze the characteristics of 3D DNNs through the features of the fooling point clouds and draw the following main conclusions: (1) a 3D DNN tends to learn high-level features of an object; (2) a 3D DNN that relies on more points when making decisions is more robust; (3) the gap is hard for a 3D DNN to learn. Finally, we evaluate the proposed MEHHO on several models, and the results suggest that MEHHO achieves better statistical and convergence results.

In the future, we will extend this research in two directions. First, we will explore the features extracted by each layer of a 3D DNN to learn more about it. Second, some of the fooling point clouds generated in our experiments exhibit shapes similar to those of the victims. Thus, it is meaningful to study whether such a method can generate a point cloud that conforms to human vision.

Data Availability

The ModelNet40 is a public dataset widely used for academic research.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Linkun Fan, Xiaoxin Gao, and Jinkun Luo contributed equally to this work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 62072348, the Science and Technology Major Project of Hubei Province (Next-Generation AI Technologies) under Grant no. 2019AEA170 and China Yunnan Province Major Science and Technology Special Plan Project no. 202202AF080004. The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University.