Abstract

Segmentation is an important step in point cloud feature extraction and three-dimensional modelling, and it remains a challenging problem in point cloud processing. The DBSCAN method has some disadvantages, such as requiring manual definition of its parameters and low efficiency when large amounts of computation are involved. This paper proposes the AQ-DBSCAN algorithm, a density clustering segmentation method combined with Gaussian mapping. The algorithm improves upon DBSCAN by automatically estimating the neighborhood radius parameter, and it performs density clustering quickly by reducing the amount of computation required.

1. Introduction

Light Detection and Ranging (LIDAR) is an important means of obtaining building data. Because the point cloud data obtained by LIDAR are voluminous and discrete, and usually have significant amounts of noise data, automatic semantic segmentation and feature extraction have become a bottleneck of current research and are academic hotspots at the forefront of this field.

At present, the commonly used segmentation methods include the region-growing segmentation method [1–4], model fitting segmentation method [5, 6], mixed segmentation method [6, 7], and clustering-based segmentation method [8–11].

The region-growing segmentation method is simple and efficient, but it tends to over-segment because of interference from noise points [12, 13]. The model fitting segmentation method is robust against interference; however, it has a significant disadvantage in that it can only detect basic geometry and cannot detect complex surfaces [6, 14–18]. Additionally, because of the vast number of iterative matching operations, its efficiency is low. The mixed segmentation method integrates multiple segmentation methods and is generally designed for specific point cloud data [19, 20]. To obtain good applicability, such an algorithm inherits the advantages of its component methods and abandons or weakens their defects.

There are three main kinds of clustering-based segmentation methods, including hierarchical methods, partitioning methods, and density-based methods [21, 22].

Hierarchical methods decompose a collection hierarchically until certain conditions are met. Because of the irreversibility of hierarchical methods, errors cannot be corrected in the process of decomposition or clustering.

The k-means algorithm is one of the most common partitioning methods. The k-means algorithm uses value k as a parameter to divide n objects into k clusters so that the objects within the same clusters have higher similarity, while the similarity between different clusters is low. Because of its simplicity and speed, it can be used for very large datasets. However, the k-means algorithm is easily affected by the initial value, and the value k must be given in advance.

The key idea of density-based clustering is to judge whether sample points belong to a cluster by how closely they are packed. From the viewpoint of density clustering, a cluster is a high-density region of objects separated from other clusters by low-density regions in the data space, and data in sparse regions are considered noise. The algorithm sets a density threshold, and clustering proceeds as long as the density of the neighborhood of a point exceeds the threshold. This method can find clusters of arbitrary shape and can filter out "noise" data. Finding clusters of arbitrary shape is the greatest advantage of the algorithm, and it is widely used in practice. A typical algorithm of this kind is Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [23]. The main disadvantages of this type of method are that it is sensitive to user-defined density parameters, that different thresholds yield markedly different clustering results, and that the threshold must be set manually. In addition, many iterative computations are needed in the clustering process, which leads to low efficiency. The FDBSCAN algorithm proposed by Zhou [24] improves on DBSCAN by determining the number of representative points according to the clustering dimension, which reduces the number of iterations. However, it still requires the neighborhood radius to be set manually and cannot be fully automated.

Focusing on the advantages and disadvantages of the above methods, this paper proposes AQ-DBSCAN, an algorithm that improves upon the DBSCAN algorithm by including an automatic segmentation method for point cloud data obtained from buildings. The method used in the AQ-DBSCAN algorithm is designed to segment a point cloud mapped on a Gaussian sphere. Based on the minimum distance, maximum distance, and number of points in different neighborhood ranges on the Gaussian sphere, the value of space sphere radius σ is obtained automatically. The AQ-DBSCAN algorithm solves the problem of automatic estimation of the neighborhood radius in the DBSCAN algorithm and reduces the amount of computation in density clustering. The specific segmentation process is shown in Figure 1.

Through Gaussian mapping, the data can be reduced from three dimensions to two dimensions and the influence of interference points on subsequent segmentation can be removed. The clustering can also get shape features, which is convenient for subsequent feature extraction.
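As a rough illustration of Gaussian mapping, the sketch below assumes that a normal vector has already been estimated for each point (normal estimation is not shown). It normalizes each normal onto the unit sphere and expresses it with two angles, which is one way to view the mapped data as two-dimensional. The function name `gauss_map` and the angle parameterization are assumptions made for illustration; the paper does not specify them.

```python
import math

def gauss_map(normals):
    """Map 3D normal vectors to (azimuth, polar) angles on the unit sphere.

    Hypothetical helper: the parameterization is an assumption,
    not the implementation used in the paper.
    """
    mapped = []
    for nx, ny, nz in normals:
        length = math.sqrt(nx * nx + ny * ny + nz * nz)
        if length == 0.0:
            continue  # skip degenerate normals
        nx, ny, nz = nx / length, ny / length, nz / length
        azimuth = math.atan2(ny, nx)                 # in [-pi, pi]
        polar = math.acos(max(-1.0, min(1.0, nz)))   # in [0, pi]
        mapped.append((azimuth, polar))
    return mapped

# Points sampled from one plane share a normal direction,
# so they collapse to a single spot on the sphere.
plane_normals = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0)]
print(gauss_map(plane_normals))
```

Normals of a cylinder, by contrast, sweep out a circle on the sphere, which is why planes and cylinders appear as point-like and line-like clusters after mapping.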

2. DBSCAN Algorithm

DBSCAN is a classical density clustering algorithm. It requires two parameters in advance. The first is the neighborhood radius σ. The second is MinPts, the threshold on the number of points in the neighborhood of a core point. Clusters are established by iteratively searching for directly density-reachable points. Ester [23] gives the following method for selecting these two parameters:

(1) Specify MinPts as 4; then, for each point in X, calculate the distance to the farthest of the four objects closest to it, and traverse all the points in X to obtain the set D of these distances.
(2) Sort D and draw the curve, with the number of points as the abscissa and the distance as the ordinate.
(3) Find the inflection point where the curve rises rapidly; the distance at the inflection point is the neighborhood radius σ.
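The first two steps of this selection method can be sketched as follows. This is a minimal brute-force version for illustration only; the function name `k_distances` is hypothetical, and a real implementation on large point clouds would use a spatial index rather than comparing all pairs.

```python
import math

def k_distances(points, k=4):
    """Return the sorted distances from each point to its k-th nearest neighbour.

    Brute force: every point is compared with every other point.
    """
    dists = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(others[k - 1])
    return sorted(dists)

# Five evenly spaced points and one far-away outlier.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (100, 0)]
print(k_distances(pts))  # the outlier produces the sharp rise at the end
```

Plotting the returned list against its index gives the curve whose rapid rise is inspected in step (3).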

The geometric shapes of ancient building elements mainly include planes, cylinders, and the like, which map to point-like or line-like shapes on the Gaussian sphere. The DBSCAN algorithm can identify clusters of any shape and is well suited to the initial segmentation of the Gaussian map point cloud of ancient buildings. However, because the DBSCAN algorithm performs a neighborhood query and calculation for each point, its operating efficiency is low, with a time complexity of $O(n^2)$. For a point cloud of ancient buildings with a large amount of data, the DBSCAN algorithm is particularly inefficient. In addition, the neighborhood radius is usually determined by hand, which is not suitable for automatic point cloud segmentation.
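To make the cost of the per-point neighborhood query concrete, here is a minimal sketch of textbook DBSCAN with brute-force queries. It is not code from the paper; the names `region_query` and `dbscan` and the label convention (-1 for noise) are assumptions for illustration.

```python
import math

NOISE = -1

def region_query(points, i, radius):
    """Indices of all points within `radius` of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= radius]

def dbscan(points, radius, min_pts=4):
    """Textbook DBSCAN; each region_query scans all points, so the total is O(n^2)."""
    labels = [None] * len(points)          # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, radius)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may be relabelled as a border point later
            continue
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, radius)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours) # j is a core point, so keep growing
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, radius=1.5))
```

Every point triggers one full scan of the data, which is the behaviour that representative-point schemes such as FDBSCAN and AQ-DBSCAN try to avoid.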

3. AQ-DBSCAN: An Improved Automatic Clustering Algorithm

3.1. Automatic Parameter Estimation

The DBSCAN parameter estimation described above relies on manual judgment, which cannot satisfy the requirement of automatic point cloud segmentation. Following the DBSCAN convention, MinPts is set to 4. Each point of the set X is mapped to an element on the Gaussian sphere. For each such element, search the K neighborhood of its original point in X, map the points of that K neighborhood to the Gaussian sphere, and find the MinPts-th closest mapped point to the element. Record this distance for every point, and count the total number of corresponding points, $N_m$, using a fixed increment as the unit interval, where m indexes the intervals. The graph takes the distance as the horizontal axis and $N_m$ as the vertical axis. As m increases, the curve of $N_m$ gradually flattens out. The graph of the point cloud data is generally shown in Figure 2.

In Figure 2, because planes and cylinders differ somewhat in this distance on the Gaussian sphere, the curve fluctuates slightly as it rises. The start and end points of the curve are connected by a straight line L, the distance H from each point on the curve to L is calculated, and the point with the largest H is found. The x-value of that point is taken as the value of the parameter σ. In this way, the automatic estimation of the parameter is achieved.
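The chord-distance rule just described can be sketched in a few lines. The function name `knee_by_chord` and the sample curve are hypothetical. Whether the axes should be rescaled before measuring H is not stated in the text, so the sketch uses the raw values.

```python
import math

def knee_by_chord(xs, ys):
    """Return the x-value of the curve point farthest from the chord L
    that joins the first and last points of the curve."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord_len = math.hypot(x1 - x0, y1 - y0)
    if chord_len == 0.0:
        raise ValueError("curve start and end points coincide")
    best_x, best_h = xs[0], -1.0
    for x, y in zip(xs, ys):
        # perpendicular distance H from (x, y) to the line through the endpoints
        h = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord_len
        if h > best_h:
            best_x, best_h = x, h
    return best_x

# A curve that rises steeply and then flattens out.
xs = [0, 1, 2, 3, 4]
ys = [0, 8, 9, 9.5, 10]
print(knee_by_chord(xs, ys))
```

Because H is a Euclidean distance, the selected point can shift if the two axes have very different scales; normalizing both axes to a common range is a common precaution, though it is an addition to what the text describes.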

3.2. Fast Clustering

During the growth of a class, the DBSCAN algorithm must determine whether each point in a neighborhood is a core point, which it does by counting the points in that point's own neighborhood. Although a K-D tree index can be built on the point cloud data after Gaussian mapping to speed up this test, the operating efficiency of the algorithm remains low.

The FDBSCAN algorithm proposed by Zhou [24] is an improved version of DBSCAN. FDBSCAN selects a fixed number of representative points in the neighborhood (rather than all neighborhood points) to drive the class growth. The number of representative points is related to the dimension of the space; i.e., the number of representative points in n-dimensional space is 2n.

Figure 3 shows how the divergence of four representative points affects cluster expansion in two-dimensional space. When the representative points diverge poorly, a point may be density-reachable from the current point only through a core point that was not selected as a representative point. In Figure 3, the red dot is the current point, the red circle is its σ-neighborhood, the green points are the representative points selected from that neighborhood, the green circles are the neighborhoods of the representative points, and the black points are the representative points selected in turn from those neighborhoods to extend the cluster. Because the representative points are concentrated in the upper region of the neighborhood, the cluster expands upward, and the lower points cannot be classified into the same category.

In view of the above problems, follow-up work must merge such points and their related subclasses; in addition, the selection of representative points must take into account both the speed of the algorithm and the diffusion of the representative points. In [24], a selection algorithm of two representative points is proposed, but neither its efficiency nor the divergence of its representative points suits the dense point cloud data on Gaussian spheres. Compared with the FDBSCAN algorithm, AQ-DBSCAN exploits the two-dimensional character of data on the Gaussian sphere to reduce the computation in the search and sets the number of representative points to 4. Additionally, the algorithm presents a way of selecting representative points that suits dense point cloud data, which improves selection efficiency and guarantees the diffusion of the representative points. The selection method is described as follows:

Let the current point be an object in point set X, and let 0 < ε < σ. The candidate set of representative points is the set of points in the annular region between radius ε and radius σ centered on the current point, within its σ-neighborhood. Selecting representative points in the peripheral area of the σ-neighborhood helps to enhance the diffusion of the representative points and reduces the frequency of neighborhood queries when the class expands. The closer the value of ε is to the value of σ, the fewer candidate points there are. In this case, the search for representative points is more efficient, but the divergence of the points may be poor. The algorithm therefore fixes ε at a value that balances diffusion against efficiency. The first representative point is the candidate furthest from the current point, and the subsequent representative points are selected by three further search iterations within the candidate region.
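The annular candidate region can be illustrated with a short filter. This is only a sketch: the function name `annulus_candidates`, the choice of which boundary is inclusive, and the numeric values in the example are all assumptions, since the text gives none of them.

```python
import math

def annulus_candidates(points, center, eps, sigma):
    """Return the points q with eps < dist(center, q) <= sigma.

    The boundary handling (open at eps, closed at sigma) is an assumption.
    """
    if not 0 < eps < sigma:
        raise ValueError("expected 0 < eps < sigma")
    return [q for q in points if eps < math.dist(center, q) <= sigma]

pts = [(0.1, 0.0), (0.5, 0.0), (0.9, 0.0), (1.5, 0.0)]
# Illustrative radii only; they are not the values used in the paper.
print(annulus_candidates(pts, center=(0.0, 0.0), eps=0.4, sigma=1.0))
```

Raising `eps` toward `sigma` shrinks the returned list, which is the efficiency-versus-divergence trade-off described above.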

Set the represented set of points to , and set the representative point to be searched to ; then,

Among them,

as shown in Figure 4.

Four steps are used to find the representative points in the red ring region (located between the p and k ring regions). First, find the point X1 that is furthest from the search point X0. Second, find the point X2 that is furthest from X1. Third, find the smallest maximum-distance point X3 with respect to X1 and X2. Finally, find the maximum-distance point with respect to X1, X2, and X3.
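One plausible reading of these four steps is a greedy farthest-point selection, sketched below. The interpretation of the third and fourth steps (pick the candidate whose nearest already-chosen point is as far away as possible) is an assumption, as is the function name `select_representatives`; the paper's exact selection formula is not reproduced here.

```python
import math

def select_representatives(seed, candidates, count=4):
    """Greedily pick up to `count` well-spread points from `candidates`.

    Step 1 measures distance from the seed X0; later steps measure
    distance from the points already chosen (assumed max-min rule).
    """
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < count:
        anchors = chosen if chosen else [seed]
        best = max(remaining,
                   key=lambda q: min(math.dist(q, a) for a in anchors))
        chosen.append(best)
        remaining.remove(best)
    return chosen

seed = (0.0, 0.0)
ring = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.9, 0.1)]
print(select_representatives(seed, ring))
```

In this example the candidate (0.9, 0.1) is never chosen because it sits next to (1.0, 0.0), so the four selected points spread in four different directions around the seed.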

3.3. Practical Comparison among AQ-DBSCAN, DBSCAN, and FDBSCAN

Compared with DBSCAN and FDBSCAN, AQ-DBSCAN makes two improvements. First, σ is estimated automatically, given the MinPts parameter. Second, a faster density-based spatial clustering algorithm is achieved. All experiments in this paper were run on a notebook computer with a 1.2 GHz dual-core CPU and 1 GB of memory. The experimental data are point clouds collected with a terrestrial laser scanner from the Gate and the Hall of Supreme Harmony in the Forbidden City in Beijing, China. The scanner, a ScanStation 2 (shown in Figure 5), is a pulsed scanner with a scanning speed of 50,000 dps. The maximum scanning angle of a single station is 360° × 270°, the maximum distance of a single station is 300 meters, and the accuracy of the scanner is better than 6 mm.

Figure 6(a) shows the columns located outside the Hall of Supreme Harmony; the column point cloud contains 141,999 sample points. Figure 6(b) is the corresponding Gauss map. Figure 6(c) shows the line graph counted automatically by the AQ-DBSCAN algorithm and the estimated value of σ (0.067831). In parameter estimation, the DBSCAN and FDBSCAN algorithms rely mainly on an expert's individual judgment of the shape of the curve and the change in its trend. According to the figures, the AQ-DBSCAN algorithm gives the value exactly at the inflection point of the curve, which is consistent with the result of manual judgment. Figure 6(d) is the clustering effect diagram on the Gaussian sphere. The cylinder and some appendages are clustered into one class because they overlap on the Gaussian sphere; they will be segmented after the analysis of the overlapping zone. Figure 6(e) is the corresponding clustering effect diagram on the spatial surface.

Figure 7 illustrates the parameter estimation and clustering effect for the beam above the Hall of Supreme Harmony. Figure 7(a) shows the beam; the beam point cloud contains 642,984 sample points. Figure 7(b) is the corresponding Gauss map. Figure 7(c) shows the line graph counted automatically by the AQ-DBSCAN algorithm and the estimated value of σ (0.011236). According to the figures, the AQ-DBSCAN algorithm gives the value exactly at the inflection point of the curve, which is consistent with the result of manual judgment. Figure 7(d) is the clustering effect diagram on the Gaussian sphere. Figure 7(e) is the corresponding clustering effect diagram on the spatial surface. Based on the data, the AQ-DBSCAN clustering separates the front and side of the beam.

Figure 8(a) shows part of the data of the Hall of Supreme Harmony, including a column and the beams associated with it; this point cloud contains 5,771 sample points. Figure 8(b) is the corresponding Gauss map. Figure 8(c) shows the line graph counted automatically by the AQ-DBSCAN algorithm and the estimated value of σ (0.014331). According to the figures, the AQ-DBSCAN algorithm gives the value exactly at the inflection point of the curve, which is consistent with the result of manual judgment. Figure 8(d) is the clustering effect diagram on the Gaussian sphere. The cylinder and some beams are clustered into one class because they overlap on the Gaussian sphere. Figure 8(e) is the corresponding clustering effect diagram on the spatial surface.

In contrast to DBSCAN, AQ-DBSCAN quickly selects representative points to extend the region, so the clustering speed is greatly improved while the clustering result remains the same. The experimental results shown in Figures 9, 10, and 11 make this clear: the Gauss maps show that the clustering effect is identical.

The final segmentation effect is as shown in Figure 12.

Table 1 compares the time consumed in the three experiments described above, in milliseconds. Figure 13 analyses the time consumption of AQ-DBSCAN and DBSCAN, and Figure 14 analyses the time consumption of AQ-DBSCAN and FDBSCAN. The comparison shows that the AQ-DBSCAN algorithm is less time-consuming than the DBSCAN and FDBSCAN algorithms, and that its speed advantage grows significantly as the number of points increases. For example, when 141,999 points are used, DBSCAN is six times more time-consuming than AQ-DBSCAN, and when 642,984 points are used, DBSCAN is 65 times more time-consuming than AQ-DBSCAN.

4. Conclusions

Segmentation based on density clustering is one of the important methods of point cloud segmentation. The most commonly used density clustering method, the DBSCAN algorithm, can identify clusters of any shape, and its strong applicability is demonstrated by the initial segmentation of the Gaussian map point cloud of ancient buildings. However, the method has some drawbacks, such as the need to define its parameters manually and its low efficiency when large amounts of computation are involved. To address these disadvantages, this paper presents an improved automatic clustering algorithm, AQ-DBSCAN, which is based on the DBSCAN algorithm. The AQ-DBSCAN algorithm supplies the automatic estimation of the neighborhood radius parameter that DBSCAN lacks by automatically generating the $N_m$ curve and automatically searching for its inflection point. In addition, the algorithm presents a reliable method of using representative points to expand the neighborhoods, which reduces the amount of computation and allows density clustering to be performed quickly. Clustering experiments were performed using point cloud data collected from the Imperial Palace of the Forbidden City in Beijing, China, using a ScanStation 2 scanner. The comparison with the DBSCAN and FDBSCAN algorithms shows that, with the same clustering effect, the AQ-DBSCAN algorithm provides automatic parameter estimation and high efficiency. Only the automatic estimation of the neighborhood radius σ is discussed in this paper. The other parameter, MinPts, is set to 4 according to convention. Further work will focus on the automatic adaptive definition of MinPts based on differences in point cloud density.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (41601409 and 41301429) and the Beijing Natural Science Foundation (8172016).