#### Abstract

A four-ocular vision system is proposed for the three-dimensional (3D) reconstruction of large-scale concrete-filled steel tube (CFST) under complex testing conditions. These measurements are vitally important for evaluating the seismic performance and 3D deformation of large-scale specimens. A four-ocular vision system is constructed to sample the large-scale CFST; then point cloud acquisition, point cloud filtering, and point cloud stitching algorithms are applied to obtain a 3D point cloud of the specimen surface. A point cloud correction algorithm based on geometric features and a deep learning algorithm are utilized, respectively, to correct the coordinates of the stitched point cloud. This enhances the vision measurement accuracy in complex environments and therefore yields a higher-accuracy 3D model for the purposes of real-time complex surface monitoring. The performance indicators of the two algorithms are evaluated on actual tasks. The cross-sectional diameters at specific heights in the reconstructed models are calculated and compared against laser rangefinder data to test the performance of the proposed algorithms. A visual tracking test on a CFST under cyclic loading shows that the reconstructed output well reflects the complex 3D surface after correction and meets the requirements for dynamic monitoring. The proposed methodology is applicable to complex environments featuring dynamic movement, mechanical vibration, and continuously changing features.

#### 1. Introduction

Three-dimensional (3D) visual information is the most intuitive data available to an intelligent machine as it attempts to sense the external world [1–3]. Vision 3D reconstruction technology can be utilized to acquire the spatial information of target objects for efficient and accurate noncontact measurement [4, 5]. It is an effective approach to tasks such as real-time target tracking, quality monitoring, and surface data acquisition; further, it is the key to realizing automatic, intelligent, and safe machine operations [6–15].

In the field of civil engineering, researchers struggle to reveal the failure mechanisms of certain materials or structures in seeking the exact properties of composite materials. Traditional contact measurement methods rely on strain gauges, displacement meters, or other technologies, which may be inconvenient and inefficient. A vision sensor can comprehensively reveal the optical information of the target surface, allow the user to develop a highly targeted measurement scheme for different targets, and achieve high precision and noncontact measurement.

The construction of the vision system and its working process differ slightly across different measurement distances and types of target. For large-scale targets at long distances, the comprehensiveness of sampling and the improvement of the ability of the vision system to resist long-distance interference are the primary design considerations. Multiple cameras and even other types of sensors can be used together to enhance the system’s stability [16–18]. For close-range targets (e.g., within 1-2 m), the measurement accuracy of the vision system is an important consideration. Appropriate imaging models and distortion correction models are necessary to build high-performance visual frameworks that achieve specific high-precision measurement and inspection tasks.

At close range, a measured object can be global or local [19–21]. Structural monitoring tasks rarely require the use of visual systems with very small measurement distances, as the focus of attention in the field of civil engineering tends to be large structures. The real-time performance of the vision system must be further improved to suit deformed structures. The robustness of the 3D reconstruction algorithm should also be strengthened, as the geometric parameters vary throughout the deformation process [22, 23]. Researchers and developers in the computer vision field also struggle to effectively track and measure dynamic surface-deformed objects with stereo vision. The 3D reconstruction of curved surfaces under large fields of view (FOVs) is particularly challenging in terms of full-field dynamic tracking [24]. The core algorithm is the key component of any tracking system. Problematic core algorithms restrict the application of 3D visual reconstruction technology, including omnidirectional sampling and high-quality point cloud stitching.

Existing target surface monitoring techniques based on 3D reconstruction include monocular vision, binocular vision, and multivision methods [15, 21, 25, 26]. The monocular vision method cannot directly reveal 3D visual information; it must be restored through Structure from Motion (SFM) technology [27]. SFM works by extracting and matching the feature points of the images taken by a single camera at different positions, so as to correlate the images of each frame and calculate the geometric relationship between the cameras at each position to triangulate the spatial points. The monocular vision method has a simple hardware structure and is easily operated but is disadvantaged by the instability of the available feature point extraction algorithms.

The binocular vision method is also based on feature point matching and triangulation techniques. However, the positional relationship of the cameras in the binocular vision system is fixed, so the geometric relationship between the cameras can be obtained offline with high-precision calibration objects. This results in better measurement performance in complex surface monitoring tasks than the monocular vision approach. Both methods are limited by their narrow FOVs, however, and do not allow users to sample large-scale information [28, 29], which is not conducive to high-quality structural monitoring.

The construction of a multivision system, supported by model solutions and error analysis methodology under coordinate correlation theory, is the key to successful omnidirectional sampling. The concept is similar to that of the facial soft tissue measurement method. The multiangle information of the target is sampled before the 3D reconstruction is completed [22]. However, for real-time structural monitoring tasks, the visual system also must complete accurate size measurement. Candau et al. [30], for example, correlated two independent binocular vision systems with a calibration object while applying a spray on an elastic target object as a random marker for the dynamic mechanical analysis of an elastomer. Zhou et al. [31] binary-encoded a computer-generated standard sinusoidal fringe pattern. Shen et al. [32] conducted 3D profilometric reconstruction via flexible sensing integral imaging with object recognition and automatic occlusion removal. Liu et al. [29] automatically reconstructed a real, 3D human body in motion as captured by multiple RGB-D (depth) cameras in the form of a polygonal mesh; this method could, in practice, help users to navigate virtual worlds or even collaborative immersive environments. Malesa et al. [33] used two strategies for the spatial stitching of data obtained by multicamera digital image correlation (DIC) systems for engineering failure analysis: one with overlapping FOVs of 3D DIC setups and another with distributed 3D DIC setups that have not-necessarily-overlapping FOVs.

The above point cloud stitching applications transform the point clouds into a uniform coordinate system by coordinate correlation. In demanding situations, the precision of the stitched point cloud may be decisive, while the raw output of the coordinate correlation is not accurate enough. Persistent issues with dynamic deformation, illumination, vibration, and characteristic changes, as well as visual measurement error caused by equipment and instrument deviations, still restrict the efficacy of high-precision 3D reconstruction applications. Accordingly, there is demand for new techniques to analyze point cloud stitching error and for novel correction methods.

In addition to classical geometric methods and optical methods, deep neural networks have also received increasing attention in the 3D vision field due to their robustness. Sinha et al. [34] obtained the topology and structure of 3D shape by means of coding, so that convolutional neural network (CNN) could be directly used to learn 3D shapes and therefore perform 3D reconstruction; Li et al. [35] combined structured light techniques and deep learning to calculate the depth of targets. These methods alone outperform traditional methods on occlusion and untextured areas; they can also be used as a complement to traditional methods. Zhang et al. [36] proposed a method for measuring the distance of a given target using deep learning and binocular vision methods, where target detection network methodology and geometric measurement theory were combined to obtain 3D target information. Sun et al. [37] designed a CNN and established a multiview system for high-precision attitude determination, which effectively mitigates the lack of reliable visual features in visual measurement. Yang et al. [38] established a binocular stereo vision system combined with online deep learning technology for semantic segmentation and ultimately generated an outdoor large-scale 3D dense semantic map.

Our research team has conducted several studies combining machine vision and 3D reconstruction in various attempts to address the above problems related to omnidirectional sampling, high-accuracy measurement, and robust calculation [39–56]. In the present study, we focus on stitching error and the recovery of multiview point clouds for the high-accuracy monitoring of large-scale CFST specimens. Existing multiview point cloud stitching technology is further improved here via traditional and deep learning methods. Our goal is to examine the structure and material properties in a noncontact manner with high-accuracy, stable, and robust performance. We ran a construction-and-error analysis of a multivision model followed by improvements to the high-quality point cloud correction algorithms to achieve real-time correction and reconstruction of large-scale CFST structures. The point cloud correction algorithms, which center on the stitching error of the multiview point cloud, are implemented via a geometry-based algorithm and a deep-learning-based algorithm. Since the proposed point cloud correction algorithms are constructed based on the geometric and spatial characteristics of the target, they are adaptive and efficient under complex conditions featuring dynamic movement, mechanical vibration, and continuously changing features. We hope that the observations discussed below will provide theoretical and technical support in improving the monitoring performance of current visual methods for CFSTs and other structures.

#### 2. Materials and Methods

The algorithms and seismic test operations used in this study for obtaining the dynamic CFST surfaces with a four-ocular vision system are described in this section. The dynamic specimen surfaces were obtained, and the relevant geometric parameters were extracted, by means of stereo rectification, point cloud acquisition, point cloud filtering, point cloud stitching, and point cloud correction algorithms. A flow chart of this process is shown in Figure 1.

##### 2.1. Stereo Rectification

A 2D circle grid with 0.3 mm manufacturing error was placed in 20 different poses to provide independent optical information for calibration, as shown in Figure 2.

We applied camera calibration based on Zhang’s method [57] to determine the matrices of each individual camera:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \tag{1}$$

where $(u, v)$ is the image coordinate of the circle center, $(X_w, Y_w, Z_w)$ is the world coordinate of the circle center, $Z_c$ is the depth of the circle center, and $M$ and $[R\ T]$ are the camera matrix and extrinsic matrix of the single camera, respectively. Structural parameters of the binocular cameras were calculated based on the extrinsic matrices to realize stereo calibration:

$$R_s = R_r R_l^{-1}, \qquad T_s = T_r - R_s T_l, \tag{2}$$

where $[R_s\ T_s]$ is the structural parameter of the binocular cameras and $[R_l\ T_l]$ and $[R_r\ T_r]$ are the extrinsic matrices of the left and right cameras, respectively. The camera calibration results are discussed in Section 3.

After stereo calibration, we used the classical Bouguet’s method [58] to implement stereo rectification. Corresponding points in the left and right images were brought into the same row, as shown in Figure 3.

##### 2.2. Point Cloud Acquisition

The camera in our setup samples images after stereo rectification. It is necessary to obtain as much target surface information as possible to perform 3D reconstruction and obtain accurate geometric parameters of the target, so we sought to generate as dense a 3D point cloud as possible. The triangulation-based calculation for dense 3D points is as follows:

$$x_c = \frac{T_x\, x_l}{d}, \qquad y_c = \frac{T_x\, y_l}{d}, \qquad z_c = \frac{T_x\, f}{d}, \tag{3}$$

where $(x_c, y_c, z_c)$ is the camera coordinate of the target, $(x_l, y_l)$ is the left imaging plane coordinate of the target, $T_x$ is the length of the baseline, $f$ is the focal length of the camera, and $d = x_l - x_r$ is the disparity. Here, we used a classical 3D stereo matching algorithm [59] to generate a dense disparity map with a disparity value for each pixel in the images.

The point cloud acquisition process is shown in Figure 4.
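As a minimal sketch of the triangulation in equation (3), the following Python functions map a rectified left-image point and its disparity to camera coordinates. The function names, the list-of-rows disparity layout, the principal point `(c_x, c_y)`, and all numerical values are illustrative assumptions rather than details of the study; image coordinates are assumed to be measured from the principal point, and pixels with non-positive disparity are treated as unmatched.

```python
def triangulate(x_l, y_l, d, f, t_x):
    """Map a rectified left-image point and its disparity to camera coordinates.

    x_l, y_l: left imaging-plane coordinates relative to the principal point (pixels)
    d: disparity x_l - x_r (pixels), must be positive
    f: focal length (pixels); t_x: baseline length (e.g. mm)
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    scale = t_x / d
    return (scale * x_l, scale * y_l, scale * f)


def disparity_map_to_points(disparity, f, t_x, c_x, c_y):
    """Convert a dense disparity map (list of rows) into a list of 3D points.

    Pixels with non-positive disparity are skipped as unmatched (assumption).
    """
    points = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d > 0:
                points.append(triangulate(u - c_x, v - c_y, d, f, t_x))
    return points


# Illustrative values only (not the calibration results of the study).
point = triangulate(x_l=100.0, y_l=50.0, d=20.0, f=1000.0, t_x=200.0)
cloud = disparity_map_to_points([[0.0, 10.0], [5.0, 0.0]],
                                f=100.0, t_x=50.0, c_x=0.0, c_y=0.0)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy at long range than at close range, which is one reason the cameras are kept close to the specimen.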

##### 2.3. Point Cloud Filtering

Point cloud filtering is one of the key steps in point cloud postprocessing. It involves denoising and structural optimization according to the geometric features and spatial distribution of the point cloud, which yields a relatively compact and streamlined point cloud structure for further postprocessing. The point cloud filtering process removes both large-scale and small-scale noise. Large-scale noise consists of points scattered over a large range outside the main structure of the point cloud; small-scale noise consists of small fluctuations in the immediate vicinity of the point cloud structure. Large-scale noise is generally attributed to noninteresting objects and mismatches, while small-scale noise relates to the spatial resolution limitations of the vision system. A flow chart of the point cloud filtering process is shown in Figure 5.

###### 2.3.1. Pass-Through Filtering

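The pass-through filter [60] defines inner and outer points according to a cuboid with edges parallel to the three coordinate axes. Points that fall outside of the cuboid are deleted. The cuboid is expressed as follows:

$$\left\{ (x, y, z) \;\middle|\; x_{\min} \le x \le x_{\max},\ y_{\min} \le y \le y_{\max},\ z_{\min} \le z \le z_{\max} \right\}, \tag{4}$$

where $(x, y, z)$ is the inlier defined using the pass-through filter, $x_{\min}$, $y_{\min}$, and $z_{\min}$ are the minimum cut-off thresholds in the directions of the $x$, $y$, and $z$ axes, respectively, and $x_{\max}$, $y_{\max}$, and $z_{\max}$ are the maximum cut-off thresholds in the directions of the $x$, $y$, and $z$ axes, respectively.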

The pass-through filter can be used to roughly obtain the main point cloud structure and minimize the cost of calculation. The cut-off threshold should be conservative enough to ensure that the cuboid is consistently larger than expected, which prevents the accidental deletion of the main structure of the point cloud. A fixed and reliable cut-off threshold can be determined by strictly controlling the relative position between the camera and the specimen.

In the task of CFST deformation monitoring, the relative position of the specimen and the optical system can be determined in advance so that the threshold in each direction can be determined accurately. It is worth noting that although the specimen discussed here is cylindrical, it had a large amplitude swing during the test; the cuboid area has better adaptability than the cylindrical area in the filtering task for this reason. A diagram of the pass-through filter is shown in Figure 6.
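The pass-through filter described above amounts to an axis-aligned bounding-box test. The sketch below shows one way to write it in plain Python; the function name and the threshold values are hypothetical and would in practice be chosen from the known relative position of the cameras and the specimen.

```python
def pass_through(points, lower, upper):
    """Keep the points that lie inside the axis-aligned cuboid [lower, upper].

    points: iterable of (x, y, z) tuples
    lower, upper: (x, y, z) minimum and maximum cut-off thresholds
    """
    return [
        p for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, lower, upper))
    ]


# Hypothetical thresholds (mm) forming a deliberately generous cuboid.
points = [(0.0, 0.0, 1000.0), (500.0, 0.0, 1000.0), (10.0, -20.0, 5000.0)]
inliers = pass_through(points,
                       lower=(-200.0, -600.0, 800.0),
                       upper=(200.0, 600.0, 1400.0))
```

The test is a handful of comparisons per point, which is why it is a cheap first stage ahead of the more expensive neighborhood-based filters.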

###### 2.3.2. Bilateral Filtering

The bilateral filter [61] can remove small-scale noise, that is, anomalous 3D points that are intermingled with the surface of the point cloud. The noise is closely attached to the surface and readily causes fluctuations that can drive down the accuracy of the reconstructed 3D model. The primary purpose of bilateral filtering is to move this superficial noise along the normal direction of the model and gradually correct its coordinates and position. Bilateral filtering smooths the point cloud surface while retaining the necessary edge information. Noise moves along the following direction:

$$p' = p + \alpha\, n, \tag{5}$$

where $p'$ is the filtered point, $p$ is the original point, $\alpha$ is the weight of bilateral filtering, and $n$ is the normal vector of $p$. Detailed descriptions can be found in [60].

###### 2.3.3. ROR Filtering

The radius outlier removal (ROR) filter [62] distinguishes inliers from outliers based on the number of neighbors of each element in the point cloud. For any point $p_i$ in point cloud $P$, a sphere $S_i$ is constructed with a radius of $r$ centered on it:

$$S_i = \left\{ p_j \in P \;\middle|\; \left\| p_j - p_i \right\| \le r \right\}. \tag{6}$$

If the number of elements in $S_i$ is less than $k$, then the ROR filter regards $p_i$ as noise, as shown in Figure 7. The threshold $k$ is selected according to a certain ratio of the number of elements in the point cloud:

$$k = \lambda N, \tag{7}$$

where $N$ is the number of elements of the point cloud and $\lambda$ is a scale factor. Generally, $\lambda$ is 2%–5%.

The original and filtered point clouds are shown in Figure 8.
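A brute-force sketch of the ROR rule is given below, assuming that the neighbor count of a point includes the point itself and that the threshold is the scale factor times the cloud size. The function name, the synthetic grid, and the parameter values are illustrative assumptions; a practical implementation would use a spatial index such as a k-d tree instead of the quadratic loop shown here.

```python
import math


def radius_outlier_removal(points, radius, scale=0.02):
    """Drop points whose radius-neighborhood holds fewer than scale * N points.

    The neighbor count includes the point itself (assumption of this sketch).
    """
    threshold = scale * len(points)
    kept = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= radius)
        if neighbors >= threshold:
            kept.append(p)
    return kept


# Synthetic data: a 7 x 7 planar patch with unit spacing plus one stray point.
cluster = [(float(i), float(j), 0.0) for i in range(7) for j in range(7)]
stray = (100.0, 100.0, 100.0)
kept = radius_outlier_removal(cluster + [stray], radius=1.5, scale=0.05)
```

Tying the threshold to the cloud size keeps the filter's behavior comparable across frames whose point counts differ.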

##### 2.4. Point Cloud Stitching

Multiple cameras can be employed in the sampling task to gather as much information as possible about the target. As shown in Figure 9, four cameras constitute two pairs of binocular cameras in our experimental setup. The two left camera coordinate systems are denoted as “CCS-A” and “CCS-B,” respectively, and the world coordinate system is denoted as “WCS.” The principle of point cloud stitching is to solve for the coordinate transformation matrices that relate CCS-A and CCS-B to the WCS and then transfer the point clouds from CCS-A and CCS-B to the WCS to complete the coordinate correlation. The WCS can be established via a high-precision calibration board with known parameters.

Our calibration board has 99 circular patterns. The coordinates of all the circle centers in CCS-A, CCS-B, and the WCS were combined column by column to obtain three coordinate data matrices, each of size 3 × 99.

The two transformation matrices were then obtained from these data matrices by matrix operations according to the principle of least squares.

The coordinate data matrices of the target relative to CCS-A and CCS-B were calculated according to equation (3) and then converted into the WCS with the corresponding transformation matrices.

The point cloud stitching process is illustrated in Figure 10.
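The following sketch illustrates one possible form of the least-squares coordinate correlation: a 3 × 4 matrix is fitted so that world coordinates are approximated by the matrix applied to homogeneous camera coordinates, and the fitted matrix is then used to move a point cloud into the WCS. This is an assumed formulation, not necessarily the one used in the study; all function names and data are hypothetical. It also assumes that the corresponding points are not all coplanar; for centers taken from a single planar board pose, a rigid (rotation plus translation) fit would be needed instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_transform(cam_pts, world_pts):
    """Least-squares 3x4 matrix m such that world ~= m @ [cam; 1]."""
    homog = [list(p) + [1.0] for p in cam_pts]
    # Normal equations, solved once per row of m.
    gram = [[sum(h[i] * h[j] for h in homog) for j in range(4)] for i in range(4)]
    rows = []
    for axis in range(3):
        rhs = [sum(h[i] * w[axis] for h, w in zip(homog, world_pts))
               for i in range(4)]
        rows.append(solve_linear(gram, rhs))
    return rows


def apply_transform(m, pts):
    """Map points into the world frame with the fitted 3x4 matrix."""
    return [tuple(sum(r[i] * c for i, c in enumerate(p)) + r[3] for r in m)
            for p in pts]


# Synthetic correspondences: 90-degree rotation about z plus a translation.
cam = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
       (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (2.0, 1.0, 3.0)]
world = [(-y + 10.0, x + 20.0, z + 30.0) for x, y, z in cam]
m_a = fit_transform(cam, world)
stitched = apply_transform(m_a, [(1.0, 2.0, 3.0)])
```

Applying the same procedure to the second camera pair yields a second matrix, and the two transformed clouds then share one coordinate system.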

##### 2.5. Point Cloud Correction

###### 2.5.1. Stitching Correction

The stress state of the devices in this setup is very complex due to the combination of large axial pressure, cyclic tension, and fastening force. Random, difficult-to-measure shocks and vibrations often occur during such tests, which cause the camera frames to move slightly on the ground, although they are fixed in advance and are far from the specimen (about 1 m). The effect of the slight movement of the camera frames may cause the stitched point clouds to appear staggered, as shown in Figure 11, which can be destructive in demanding monitoring missions. Assume that the translation vector and rotation matrix of two staggered point clouds caused by the complex loading conditions are *P*_{M} and *R*_{M}.

We designed geometry-based and deep-learning-based algorithms in this study to correct two staggered point clouds. For the geometry-based algorithm, we assume that the optical axes of the cameras remain parallel to the (horizontal) ground surface after they move, so the moving direction of the point cloud is perpendicular to the vertical axis and the vertical component of $P_M$ is zero. There is no rotation between the two point clouds:

$$R_M = E, \tag{11}$$

where $E$ refers to the unit matrix. The translation vector mentioned above was acquired by nonlinear least squares; then the staggered point cloud was translated accordingly. Deformation of the upper part of the specimen was not severe at any point in the test, so it remains close to an ideal cylinder even when the bottom presents obvious deformation. In order to simplify the calculation of $P_M$, the upper part of the specimen can be regarded as a standard cylinder.

We set a plane $y = h$ parallel to the $xOz$ plane to intercept the upper part of the specimen and obtain a circular arc. The expression of the arc in 3D space is

$$\begin{cases} (x - a)^2 + (z - b)^2 = r^2, \\ y = h, \end{cases} \tag{12}$$

where $(a, b)$ is the projection of the circle center on the $xOz$ plane and $r$ is the radius of the circle. The optimization target of nonlinear least squares is

$$\min_{a,\, b,\, r} F(a, b, r) = \sum_{i} \left[ (x_i - a)^2 + (z_i - b)^2 - r^2 \right]^2, \tag{13}$$

where $F$ is the objective function of nonlinear least squares and $(x_i, h, z_i)$ is the $i$-th sample point in the section. The optimal values $\hat{a}$, $\hat{b}$, and $\hat{r}$ were iteratively obtained by the Levenberg–Marquardt (LM) algorithm [63] as follows:

$$\theta_{k+1} = \theta_k - \left( J_k^{T} J_k + \mu I \right)^{-1} g_k, \tag{14}$$

where $\theta_k$ is the value of the model parameter at the $k$-th iteration, $J_k$ is the Jacobian matrix of the objective function at the $k$-th iteration, $g_k$ is the gradient of the objective function at the $k$-th iteration, $\mu$ is the damping coefficient, and $I$ is the identity matrix. When $h$ is around 150 mm, it is basically guaranteed that the edge of the section is an approximate arc.
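The circle fit described above can be sketched with a small, dependency-free Levenberg–Marquardt loop. The sketch assumes an algebraic residual (squared distance to the center minus squared radius) on the two in-plane coordinates of the section, an initial guess taken from the nominal specimen geometry, and a common accept/reject schedule for the damping coefficient; none of these choices, names, or values is taken from the study.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gauss-Jordan elimination with pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def cost(points, a, b, r):
    """Sum of squared algebraic residuals (x - a)^2 + (z - b)^2 - r^2."""
    return sum(((x - a) ** 2 + (z - b) ** 2 - r ** 2) ** 2 for x, z in points)


def fit_circle_lm(points, a, b, r, mu=1e-3, iterations=100):
    """Fit a circle to in-plane section points (x, z), starting from (a, b, r)."""
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        grad = [0.0] * 3
        for x, z in points:
            res = (x - a) ** 2 + (z - b) ** 2 - r ** 2
            row = (-2.0 * (x - a), -2.0 * (z - b), -2.0 * r)
            for i in range(3):
                grad[i] += row[i] * res
                for j in range(3):
                    jtj[i][j] += row[i] * row[j]
        for i in range(3):
            jtj[i][i] += mu  # damping term mu * I
        step = solve3(jtj, [-g for g in grad])
        if max(abs(s) for s in step) < 1e-10:
            break  # converged
        trial = (a + step[0], b + step[1], r + step[2])
        if cost(points, *trial) < cost(points, a, b, r):
            a, b, r = trial
            mu *= 0.5   # step accepted: move towards Gauss-Newton
        else:
            mu *= 10.0  # step rejected: move towards gradient descent
    return a, b, r


def arc(c_x, c_z, radius):
    """Synthetic front-facing arc sampled every 5 degrees (illustrative data)."""
    return [(c_x + radius * math.cos(math.radians(t)),
             c_z + radius * math.sin(math.radians(t)))
            for t in range(200, 341, 5)]


a1, b1, r1 = fit_circle_lm(arc(12.0, 950.0, 101.5), 10.0, 945.0, 100.0)
a2, b2, r2 = fit_circle_lm(arc(15.0, 948.0, 101.5), 10.0, 945.0, 100.0)
offset = (a2 - a1, b2 - b1)  # in-plane shift between the two fitted centers
```

Fitting the same section in each of the two staggered clouds and subtracting the fitted centers gives an in-plane offset that can be used to translate one cloud onto the other.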

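The respective estimated circle centers of the two point clouds at the section height can be obtained according to formula (14); the offset between the two centers gives the translation used to realign the clouds. Figure 12 shows the stitching correction process.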

Next, we used PointNet++ [64], a robust 3D semantic segmentation network, to extract the common parts of the point cloud and performed iterative closest point (ICP) registration [65] on them to correct the relative positions of the two clouds. Unlike the geometry-based method, this method makes no strong assumptions about the motion: *P*_{M} is not necessarily zero, and *R*_{M} is not necessarily the unit matrix.

As shown in Figure 13, we manually marked the common parts of two staggered point clouds in a large number of samples and fed them to the network for training. After the training was complete, we could use the network to identify and extract the common parts of the two given point clouds.

The blue-shaded area in Figure 14 represents the common parts extracted by the PointNet++ network, which were used to implement ICP registration for point cloud correction.

###### 2.5.2. Establishing the Specimen Coordinate System

In the point cloud stitching step, there is always a small angle A between the plane of the calibration board and the axis of the specimen (Figure 15(a)). As a result, a section cut perpendicular to the corresponding axis of the WCS is not the actual cross section of the test piece but rather an oblique section (red line, Figure 15(a)). The vision measurement technique is based on the WCS, so it is difficult to determine the actual measuring position in this setup.

To address this problem, we considered a coordinate system (specimen coordinate system) fixed to the specimen. As shown in Figure 15(b), one axis of this system coincides with the axis of the specimen. If a point cloud is transformed from the WCS to the specimen coordinate system, a section cut perpendicular to that axis is guaranteed to correspond to the real cross section of the specimen (green line, Figure 15(a)). Therefore, measurements are meaningful when taken on the point cloud expressed in the specimen coordinate system.

The specimen coordinate system was established as discussed below. The equation for an arbitrarily posed cylinder in the WCS is

$$\left\| (P - P_0) \times n \right\|^2 - r^2 = 0, \tag{17}$$

where $P$ is a point on the cylinder surface, $n$ is the unit vector of the axis of the cylinder, $P_0$ is a 3D point on the axis of the cylinder, and $r$ is the radius of the cylinder.

Similar to the circle-fitting process discussed above, the upper half of the cylinder was fitted using nonlinear least squares. The left side of equation (17) is denoted as $f(P)$; then the objective function of the cylinder fitting is

$$F = \sum_{i} f(P_i)^2, \tag{18}$$

where $F$ is the objective function of nonlinear least squares and $P_i$ is the $i$-th sample point. Similarly, equation (18) can be solved by the LM algorithm:

$$\theta_{k+1} = \theta_k - \left( J_k^{T} J_k + \mu I \right)^{-1} g_k, \tag{19}$$

where $\theta_k$ is the value of the model parameter at the $k$-th iteration, $J_k$ is the Jacobian matrix of the objective function at the $k$-th iteration, $g_k$ is the gradient of the objective function at the $k$-th iteration, $\mu$ is the damping coefficient, and $I$ is the identity matrix. According to equation (19), the optimal values $\hat{n}$ and $\hat{P}_0$ of $n$ and $P_0$ can be iteratively obtained.

The direction of is different from that of the axis and is located on the axis, so a unique point for can be determined on the axis:

Taking as the origin, a directional cylinder axis can be set in the direction of axis ; the direction of axis is the same as and the direction of is perpendicular to the plane . Thus, the specimen coordinate system is established:

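Next, the rotation matrix $R$ and translation vector $T$ of the specimen coordinate system relative to the WCS can be calculated: the columns of $R$ are the unit vectors of the specimen axes expressed in the WCS, and $T$ is the origin of the specimen coordinate system expressed in the WCS.

According to $R$ and $T$, a point $P_w$ of the point cloud in the WCS is transformed to the point $P_s$ in the specimen coordinate system as follows:

$$P_s = R^{-1} \left( P_w - T \right). \tag{24}$$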
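As an illustration of the transformation from the WCS to the specimen coordinate system, the sketch below builds an orthonormal frame from a fitted axis direction and expresses a world point in that frame. It assumes that the second coordinate axis is aligned with the cylinder axis and that the first axis is a reference direction with its axial component removed; these conventions, the function names, and the numerical values are hypothetical and may differ from those of the study.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def specimen_frame(axis_dir, reference=(1.0, 0.0, 0.0)):
    """Return right-handed unit axes (e_x, e_y, e_z) with e_y along the cylinder axis.

    e_x is the reference direction with its component along e_y removed
    (assumed convention); the reference must not be parallel to the axis.
    """
    e_y = normalize(axis_dir)
    proj = dot(reference, e_y)
    e_x = normalize(tuple(r - proj * a for r, a in zip(reference, e_y)))
    e_z = cross(e_x, e_y)
    return e_x, e_y, e_z


def to_specimen(point, origin, frame):
    """Compute R^T (P_w - T): rows of R^T are the frame axes, T is the origin."""
    d = tuple(p - o for p, o in zip(point, origin))
    return tuple(dot(e, d) for e in frame)


# Illustrative values: an axis tilted by 2 degrees and an arbitrary origin.
tilt = math.radians(2.0)
axis = (math.sin(tilt), math.cos(tilt), 0.0)
origin = (5.0, 0.0, 900.0)
frame = specimen_frame(axis)
e_x, e_y, e_z = frame
p_world = tuple(o + 100.0 * a + 30.0 * c for o, a, c in zip(origin, e_y, e_z))
p_specimen = to_specimen(p_world, origin, frame)
```

Because the frame is orthonormal, the inverse of the rotation matrix is its transpose, so the transformation reduces to three dot products per point.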

##### 2.6. Surface Reconstruction

After the corrected point cloud is obtained, the target surface reconstruction is completed via Poisson surface reconstruction algorithm [66]. The Poisson surface reconstruction algorithm is a triangular mesh reconstruction algorithm based on implicit functions, which approximates the surface by optimizing the interpolation of 3D point cloud data. Its stepwise implementation is discussed below.

Point cloud data containing normal information is used as the input of the algorithm and recorded as a sample set *S*. The set *S* consists of points *P*_{i} and their corresponding normals *N*_{i}. Assuming that all points are located on or near the surface of an unknown model *M*, the indicator function of *M* is estimated by establishing a Poisson equation from the relationship between the gradient of the indicator function and the normal field. The isosurface is then extracted via the Marching Cubes algorithm, which completes the reconstruction of the surface. Finally, the surface model data is output.

The Poisson surface reconstruction algorithm process involves octree segmentation, vector field calculation, solving the Poisson equation, surface model generation, and surface extraction. A flow chart of this process is given in Figure 16.

#### 3. Experiment

A four-ocular vision system was used to perform 3D surface tracking experiments on a CFST column under axial and cyclic radial loads. We focused on the accuracy of two selected sample points on the 3D model, which approximately reflects the precision of the reconstructed 3D surface. Diameters of specimen cross sections were measured by laser rangefinders as standard values; then the visual measurements were compared against the standard values. Finally, indicators for error evaluation were calculated to validate the proposed method. Three CFST specimens were selected for this experiment.

Camera calibration and stereo rectification were implemented prior to the formal initiation of the experiment. Since strong vibrations can cause subtle changes in the structural parameters of the multivision system, the system was recalibrated before each column was tested to ensure accurate 3D reconstruction. Only the parameters corresponding to Specimen 1 are presented here to illustrate the calibration target (Table 1).

During the test, an axial load was applied to the top of the specimen at a constant 10 kN. A cyclic radial load was applied along the horizontal direction with 20 kN added in each cycle. The axial load from the top varied slightly in each cycle as the specimen swung to the side under the cyclic load. To account for this, we continuously fine-tuned the axial pressure during the experiment so that the axial load of the specimen was always 10 kN. After the specimen yielded, the horizontal displacement of the press increased in each cycle; the increment is an integral multiple of the yield displacement of the specimen. Each run of the experiment ended once significant deformation or damage was observed in the bottom of the specimen.

As shown in Figure 17, four calibrated MV-EM200C industrial cameras (1600 × 1200 resolution) were placed in front of the specimen and two laser rangefinders (1 mm accuracy) were placed symmetrically on its left and right sides. The cameras and laser rangefinders sampled data once per minute. A lighting source was used to ensure constant lighting so that all surface information could be obtained by the vision system. A press produced axial and radial loads on the specimen throughout the experiment to ultimately generate convex deformation at the bottom of the specimen. Figure 18 shows a dynamic 3D surface reconstructed according to the proposed algorithm.

As shown in Figure 19(a), the laser rangefinders and the vision system each collected a dynamic diameter measurement once per minute. The height of the laser point above the base, *h*_{0}, was determined in advance as a standard measurement. In the reconstructed surface model, a cross section at height *h*_{0} was taken to obtain the visual measurement, as shown in Figure 19(b). To locate this section on the 3D model, the lowest point of the model was found; the section at height *h*_{0} above it was then taken as the target section. The visual measurement was taken as the distance between the leftmost and rightmost points in this section.
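The section-based measurement can be sketched as follows, assuming a point cloud already expressed in the specimen coordinate system with the second coordinate along the specimen axis and the first coordinate running left to right. The slab half-thickness used to select section points, the function name, and the synthetic half cylinder are hypothetical choices for illustration.

```python
import math


def section_diameter(points, h0, half_thickness=1.0):
    """Distance between the leftmost and rightmost points of the section at h0.

    points: (x, y, z) tuples with y along the specimen axis (assumption)
    h0: section height above the lowest point of the model
    half_thickness: tolerance used to collect section points (assumption)
    """
    y_min = min(p[1] for p in points)
    slab = [p for p in points if abs(p[1] - (y_min + h0)) <= half_thickness]
    if not slab:
        raise ValueError("no points found in the requested section")
    left = min(slab, key=lambda p: p[0])
    right = max(slab, key=lambda p: p[0])
    return math.dist((left[0], left[2]), (right[0], right[2]))


# Synthetic half cylinder of radius 101.5 mm sampled every 10 mm in height.
surface = [(101.5 * math.cos(math.radians(t)), float(y),
            101.5 * math.sin(math.radians(t)))
           for y in range(0, 301, 10) for t in range(180, 361)]
diameter = section_diameter(surface, h0=150.0)
```

On real data the result depends on the chosen slab thickness and on how completely the two camera pairs cover the left and right edges of the section.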

The initial diameters of the specimens were about 203 mm, and the tolerance was between IT12 and IT16. Line graphs of the measured values of each specimen are shown in Figure 20(a)–20(c). We used a personal computer (i3-4150 CPU, Nvidia GTX 750Ti GPU, and 8 GB RAM) to accomplish the calculation. The black lines in Figure 20 are standard measurements obtained by the laser rangefinder. The red lines are visual measurements without point cloud correction, the green lines are visual measurements after deep-learning-based correction, and the blue lines are visual measurements after geometry-based correction.

We collected the calculation times for all sample points as shown in Figure 20. The average calculation times of each type of correction method are also listed in Table 2. Each method took under 2.5 s; their respective averages were 1.75 s, 2.28 s, and 1.87 s. The time described here includes the time necessary to calculate a 3D point cloud from a 2D image.

In order to evaluate the performance of the vision system and the correction algorithm, the maximum absolute error (*M*), mean absolute error (MAE), mean relative error (MRE), and root mean square error (RMSE) were used to evaluate the dynamic measurement error. The calculated indicators are shown in Figure 21.
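The four indicators can be computed as in the sketch below, which uses their standard definitions with the laser rangefinder values as the reference. The function name and the sample values are illustrative and are not measurements from the experiment.

```python
import math


def error_indicators(visual, standard):
    """Return M, MAE, MRE, and RMSE of visual measurements against standard values."""
    errors = [v - s for v, s in zip(visual, standard)]
    n = len(errors)
    return {
        "M": max(abs(e) for e in errors),                      # maximum absolute error
        "MAE": sum(abs(e) for e in errors) / n,                # mean absolute error
        "MRE": sum(abs(e) / s for e, s in zip(errors, standard)) / n,  # mean relative error
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),     # root mean square error
    }


# Illustrative diameters in mm (not experimental data).
visual = [204.0, 202.0, 206.0, 203.0]
standard = [203.0, 203.0, 203.0, 203.0]
indicators = error_indicators(visual, standard)
```

RMSE weights large deviations more heavily than MAE, so a large gap between the two points to occasional large errors rather than a uniform offset.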

Figure 21 shows that the point cloud correction algorithm effectively reduces visual measurement error; it can compensate for error caused by vibration of the press and inaccurate establishment of the WCS to provide accurate visual measurements of the 3D model. The dynamic indicator values after point cloud correction are significantly smaller than the uncorrected values, which also indicates that the proposed point cloud correction algorithm is effective.

For the deep-learning-based algorithm, the average *M* of each specimen is 3.00 mm, the average MAE is 1.11 mm, the average MRE is 0.52%, and the average RMSE is 1.84 mm. For the geometry-based correction algorithm, the average *M* of each specimen is 3.21 mm, the average MAE is 1.23 mm, the average MRE is 0.58%, and the average RMSE is 2.34 mm. These values altogether satisfy the requirements for high-accuracy measurement.

Both of the algorithms we developed in this study can effectively correct the spatial position of point clouds, and their effects do not significantly differ. Compared with the geometry-based algorithm, the deep-learning-based algorithm relies on weaker assumptions and thus is more general; however, it takes longer to run. When the object is irregular, the deep-learning-based algorithm is the applicable choice. When the object is cylindrical, the geometry-based algorithm is the better choice, as it is more computationally efficient.

#### 4. Conclusion

In this study, we focused on a series of unfavorable factors that degrade the accuracy of surface reconstruction tasks. A point cloud correction algorithm was proposed to manage the unexpected shocks and vibration which occur under actual testing conditions and to correct the stitched point cloud obtained by multivision systems. The essential geometric parameters of the reconstructed surface were measured; then stereo rectification, point cloud acquisition, point cloud filtering, and point cloud stitching were applied to obtain a 3D model of a complex dynamic surface. In this process, a deep-learning-based algorithm and geometry-based algorithm were deployed to compensate for the stitching error of multiview point clouds and secure high-accuracy 3D structures of the target objects.

Geometric analysis and coordinate transformation were applied to design the geometry-based point cloud correction algorithm. This method is based on strong mathematical assumptions, so it has a fast calculation speed with satisfactory correction accuracy. By contrast, the deep-learning-based algorithm relies on a large number of training samples, and the forward propagation of the network is more computationally complicated than the geometry-based algorithm; it takes a longer time to accomplish point cloud correction. However, since the applicable object of the network is determined by the type of objects in the training set, it does not rely on manually designed geometric assumptions and is thus much more generalizable to different types of 3D objects.

The proposed point cloud correction algorithms make full use of the geometric and spatial characteristics of targets for error compensation, so both are more adaptive and efficient than standard-marker-based correction frameworks. They effectively enhance the accuracy of point cloud stitching over traditional methods and their effects do not significantly differ. The deep-learning-based algorithm is highly versatile, while the geometry-based algorithm is more computationally effective for cylindrical objects. They may serve as a reference for improving the accuracy of multiview, high-accuracy, and dynamic 3D reconstructions for CFSTs and other large-scale structures under complex conditions. They are also workable as is for completing tasks such as structural monitoring and data collection.

The two proposed algorithms have their limitations: neither achieves high-quality output while also ensuring satisfactory real-time performance. In the future, we will consider optimizing the structure of the PointNet++ network to make it more suitable for specific tasks and to improve its computing efficiency. We will also extract more general geometric features to improve the robustness of the geometry-based algorithm. Depending on the specific task to which the method is applied, the advantages of the two algorithms may also be combined into a new algorithm rather than eliminating either one.

#### Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Authors’ Contributions

Yunchao Tang and Mingyou Chen contributed equally to this work.

#### Acknowledgments

This work was supported by the Key-Area Research and Development Program of Guangdong Province (2019B020223003), the Scientific and Technological Research Project of Guangdong Province (2016B090912005), and Science and Technology Planning Project of Guangdong Province (2019A050510035).