Abstract

This study aimed to investigate the usability of smartphone camera images for 3D positioning applications using photogrammetric techniques. The investigation was performed in two stages. In the first stage, the cameras of five smartphones and a digital compact camera were calibrated with the self-calibrating bundle adjustment method, using a calibration reference object with signalized points of known three-dimensional (3D) coordinates. To evaluate their metric performance, geometric accuracy tests were performed in the image and object spaces, and the test results were compared. In the second stage, a 3D mesh model of a historical cylindrical structure (height = 8 m, diameter = 5 m) was generated with the Structure-from-Motion and Multi-View Stereo (SfM-MVS) approach. The images were captured with the Galaxy S4 smartphone camera, which produced the best results among the smartphone cameras in the geometric accuracy tests. Accuracy tests were also applied to the generated 3D model to examine the 3D object reconstruction capability of this device. The results demonstrated that smartphone cameras can readily be used as image acquisition tools for a range of photogrammetric applications.

1. Introduction

In the early 1980s, the advent of digital cameras had a striking and positive impact on close-range photogrammetry, immediately expanding the scope of applications and enabling full measurement automation. Digital cameras also substantially accelerated the processing steps of offline measurement tasks. In addition, subpixel image operators such as centroiding and template matching provided an image measurement accuracy that was routinely better than 0.1 pixel [1].

Digital cameras can be divided into two categories according to their metric properties: comparatively low-cost, low-resolution amateur cameras and comparatively expensive, high-resolution professional cameras. Professional cameras offer a wide range of features, such as good lens quality, a robust structure, a large sensor with high resolution and sensitivity, and interchangeable lenses, whereas amateur cameras may offer only some of these features. From a photogrammetric point of view, the main difference between amateur (e.g., compact) and professional cameras is the lower geometric stability of amateur cameras. Among amateur cameras, smartphone cameras are the most interesting option because mobile phones are light, portable, and inexpensive and are equipped with high-resolution digital cameras [2, 3].

Camera calibration is required to extract accurate and reliable 3D information from images. Various calibration algorithms have been proposed in photogrammetry and computer vision, and they are generally based on the perspective camera model [3]. The self-calibration approach, first developed by Brown [4] in the early 1970s, has since been routinely used as an efficient technique in photogrammetry [1].

Photogrammetric camera calibration has been the subject of many publications. These studies draw on the experience accumulated over the years in which digital cameras have been used for photogrammetric measurement, and they recommend particular parameter sets and address the stability of the parameters, camera configurations, and analysis techniques [5–7].

Mobile-phone cameras have been used in many research studies and commercial applications. One of the most prominent applications is the recognition of two-dimensional (2D) barcodes: the smartphone camera captures and decodes the barcode, which yields an encoded URL that automatically directs the user to the source website for further information [8].

One of the first photogrammetric applications of mobile-phone cameras was presented by Akca and Gruen [9], who evaluated the geometric and radiometric performance of low-resolution mobile-phone cameras. Similarly, Azhar and Ahmad [10] carried out such tests for a low-resolution mobile-phone camera.

With the advent of high-resolution mobile-phone cameras in the 2010s, these devices began to be used as imaging tools in photogrammetric tasks. A few studies have focused on using smartphone camera images in purely photogrammetric processes; for example, El-Ashmawy et al. [2] used mobile-phone camera images to determine photogrammetrically the displacements of signalized points on a beam under loading.

Smartphone-based 3D reconstruction procedures have also stimulated research in this field [11–13]. Recently, image-based 3D reconstruction techniques that combine computer vision and photogrammetric algorithms have become a robust and efficient solution [14]. The basic concept of these methods is automatic image orientation by SfM followed by dense image matching. The resulting point cloud is then converted into a triangular mesh or textured surface that represents the shape of the object surface. Depending on the computational load and the desired reliability, the 3D reconstruction can be performed directly on the mobile phone, on a cloud-based server, or on a PC with appropriate software. In the PC- and server-based solutions, the mobile phone serves only as an imaging device for capturing the scene of interest. A wide variety of open-source solutions (VisualSFM, Bundler, etc.) and free web-based services (e.g., Photosynth, 123D Catch) implement SfM-MVS approaches [15]. Although these tools automate data processing and are convenient for users, they do not ensure accurate and robust final results, because they lack a georeferencing step and the ability to create spatial data. On the commercial side, many effective software packages for the 3D reconstruction of objects from image data have emerged on the market (e.g., Pix4D, Agisoft PhotoScan) [14, 16]. Only a few studies in the literature have used these packages to process smartphone camera images. Kim et al. [17] investigated the use of smartphone cameras in photogrammetric UAV systems, and Micheletti et al. [18] explored obtaining high-resolution topographic and terrain data from a set of low-resolution smartphone camera images, implementing the solution on a smartphone, on a server, and with a commercial software package.

The geometric accuracy of high-resolution smartphone camera images and their 3D object reconstruction capabilities have not been sufficiently researched in the literature. Herein, the usability of smartphone cameras in photogrammetric applications is investigated. In the first stage, geometric accuracy tests were performed on five smartphone cameras and one compact camera using a 3D reference field, and the test results were compared. In the second stage, a 3D model of a historical structure was reconstructed with SfM-MVS approaches from images taken with the Galaxy S4 smartphone camera, which produced the best results in the geometric accuracy tests, and geometric accuracy tests were then performed on the model.

In the next section, the geometric accuracy tests of the cameras are described. Subsequently, the details of the 3D modeling performed with images captured by the Galaxy S4 are provided. Finally, the test results are summarized.

2. Geometric Performance Tests

2.1. Cameras

Today, users can choose from a wide variety of smartphone brands and models. In this research, we studied five smartphones that were widely used during the study period, as well as one digital compact camera. Table 1 lists the technical specifications of the compact camera and the smartphone cameras.

2.2. Comparison of Cameras

With the compact camera and with each smartphone camera, 25 images were captured from approximately the same locations of a calibration reference object carrying 80 targets, each a white dot on a black background (Figure 1). The 3D coordinates of these targets, mounted at different heights on a transparent glass plate (60 × 60 cm), had previously been measured with high accuracy. The images were taken from a distance of ∼70 cm. To reduce correlations between the interior orientation (IO) parameters, the exterior orientation parameters, and the object point coordinates, the smartphones were rotated by 90° to the left and right about the optical axis at eight of the camera positions.

For the photogrammetric evaluation, the Australis photogrammetric software package (version 6.06; Photometrix, 2012) was used. Australis can perform least-squares bundle adjustments with photogrammetric data only, or combined adjustments with either known camera parameters or self-calibration. The self-calibrating bundle adjustment used herein is the most versatile and accurate photogrammetric positioning and calibration method. Its mathematical model is based on the collinearity condition, which is implicit in the perspective transformation between the image and object spaces [19, 20]:

$$x = x_0 - c\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} + \Delta x,$$

$$y = y_0 - c\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} + \Delta y,$$

with

$$\mathbf{R} = \mathbf{R}(\omega, \varphi, \kappa) = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix},$$

where x and y are the image coordinates of the point; x0, y0, and c are the IO parameters (principal point and principal distance); X, Y, and Z are the object coordinates of the point; X0, Y0, and Z0 are the object coordinates of the perspective center; R is the orthogonal rotation matrix built from the three rotation angles (ω, φ, and κ) of the camera; and Δx and Δy are functions of a set of additional parameters (APs) that account for departures from collinearity due to lens distortion and focal plane distortions [19, 20].
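As an illustration only, the collinearity model of the preceding paragraph can be sketched in Python with NumPy. The sketch assumes the rotation sequence R = R_ω · R_φ · R_κ, one common photogrammetric convention; Australis and other packages may use a different sequence or sign convention, and the function names are hypothetical.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """R = R_omega @ R_phi @ R_kappa (assumed sequence; conventions vary)."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    r_omega = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    r_phi = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    r_kappa = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return r_omega @ r_phi @ r_kappa

def project(X, X0, angles, c, x0=0.0, y0=0.0, dx=0.0, dy=0.0):
    """Collinearity equations: object point X -> image point (x, y)."""
    d = np.asarray(X, float) - np.asarray(X0, float)
    # R.T @ d gives (r11*dX + r21*dY + r31*dZ, r12*dX + ..., r13*dX + ...)
    u = rotation_matrix(*angles).T @ d
    x = x0 - c * u[0] / u[2] + dx
    y = y0 - c * u[1] / u[2] + dy
    return x, y
```

For example, a camera at height 10 above the point (1, 2, 0) with zero rotation angles and c = 4 maps the point to (0.4, 0.8) in image units.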

The most common set of APs employed to compensate for systematic errors in digital cameras is the 8-term “physical” model originally formulated by Brown [4]. This model includes the 3D position of the perspective center in image space (principal distance and principal point), as well as the three coefficients of radial and two of decentering distortion. The model can be extended by two further parameters to account for affinity and shear within the sensor system. While a large number of additional parameter sets has been published in the literature [21], this model has become an accepted standard for digital camera calibration [6]. In this study, the standard 10-term “physical” calibration model described by the following equations has been used to investigate the geometric accuracy potential of the cameras used in the research:

$$\Delta x = \bar{x}\,(k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2\bar{x}^2) + 2 p_2 \bar{x}\bar{y} + b_1 \bar{x} + b_2 \bar{y},$$

$$\Delta y = \bar{y}\,(k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 \bar{x}\bar{y} + p_2 (r^2 + 2\bar{y}^2),$$

with

$$\bar{x} = x - x_0, \qquad \bar{y} = y - y_0, \qquad r^2 = \bar{x}^2 + \bar{y}^2,$$

where k1, k2, and k3 are the first three parameters of radial symmetric distortion; p1 and p2 are the first two parameters of decentering distortion; and b1 and b2 represent terms for differential scaling and nonorthogonality between the x- and y-axes [22].
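A minimal sketch of the Δx and Δy corrections of the 10-term model described above, with x̄ = x − x0, ȳ = y − y0 and r² = x̄² + ȳ². It follows one common sign convention (packages differ), and the function and argument names are hypothetical.

```python
def ap_corrections(x, y, x0, y0, k1=0.0, k2=0.0, k3=0.0,
                   p1=0.0, p2=0.0, b1=0.0, b2=0.0):
    """Return (dx, dy) for the 10-term 'physical' model (assumed convention)."""
    xb, yb = x - x0, y - y0          # coordinates reduced to the principal point
    r2 = xb * xb + yb * yb           # squared radial distance
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3
    dx = (xb * radial                           # radial distortion
          + p1 * (r2 + 2 * xb * xb) + 2 * p2 * xb * yb   # decentering
          + b1 * xb + b2 * yb)                  # affinity and shear
    dy = yb * radial + 2 * p1 * xb * yb + p2 * (r2 + 2 * yb * yb)
    return dx, dy
```

Keeping the affinity and shear terms only in Δx mirrors the usual formulation in which the y-axis defines the image scale.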

The computation is iteratively performed using the Gauss–Markov least-squares model. The results of the self-calibrating bundle adjustment process comprise the 3D object space coordinates of unknown points and the camera parameters.
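The iterative Gauss–Markov solution can be illustrated with a generic sketch: the functional model is linearized at the current estimate, the weighted normal equations are solved for a correction, and the process repeats until the correction becomes negligible. The numerical Jacobian, the tolerance, and the toy trilateration problem in the usage note are assumptions for illustration; a real bundle adjustment uses analytical partial derivatives and exploits the sparsity of the normal matrix.

```python
import numpy as np

def gauss_markov_ls(f, x_init, obs, P=None, max_iter=20, tol=1e-10, h=1e-7):
    """Iterative (linearized) Gauss-Markov least squares minimizing v^T P v."""
    x = np.asarray(x_init, float).copy()
    obs = np.asarray(obs, float)
    P = np.eye(obs.size) if P is None else np.asarray(P, float)
    for _ in range(max_iter):
        f0 = f(x)
        # Design matrix A: forward-difference Jacobian of the model at x.
        A = np.empty((obs.size, x.size))
        for j in range(x.size):
            xp = x.copy()
            xp[j] += h
            A[:, j] = (f(xp) - f0) / h
        l = obs - f0                                     # reduced observations
        dx = np.linalg.solve(A.T @ P @ A, A.T @ P @ l)   # normal equations
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x
```

Usage example: with four anchor points and exact distances to the unknown point (3, 4), `gauss_markov_ls(lambda p: np.linalg.norm(anchors - p, axis=1), [5.0, 5.0], obs)` converges to (3, 4) in a few iterations.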

The results of the self-calibrating bundle adjustments for all cameras are summarized in Table 2. In the bundle adjustment process, 10 of the 80 test field points with known 3D coordinates were used as control points, and the 3D coordinates of the remaining points (checkpoints) were calculated. To evaluate the effect of the APs on positioning accuracy, we also performed a photogrammetric bundle adjustment without APs for each smartphone camera. In addition, each camera was calibrated a second time using a free-network bundle adjustment; in this case, no control points were used, and the former control points were included in the adjustment as checkpoints.

Compared with the other cameras, the Galaxy S4 camera exhibited the best performance. In the bundle adjustment with external constraints (10 control points), the triangulation misclosure (RMS of image coordinate residuals), used as an indicator of internal accuracy, was 0.27 μm for the Galaxy S4, whereas for the four other smartphone cameras and the Canon compact camera it ranged from 0.49 to 0.63 μm. The relative precision, defined as the ratio of the mean target coordinate precision to the largest span of the target array, was 1 : 40000 for the Galaxy S4 and ranged from 1 : 18000 to 1 : 25000 for the other cameras.

With regard to relative accuracy (the mean RMS coordinate error divided by the largest span of the target array), the best in-plane (horizontal) result was 1 : 25000, obtained for the Galaxy S4, whereas the other four smartphone cameras and the compact camera achieved an average of 1 : 13000. In the depth direction, the best result was 1 : 22000 for the Galaxy S4, compared with 1 : 9000, 1 : 9000, 1 : 13000, 1 : 10000, and 1 : 17000 for the iPhone 5, Nokia C7, Sony Xperia S, Galaxy S3, and Canon IXUS 960 IS, respectively. The best agreement between theoretical precision and experimental accuracy was also obtained for the Galaxy S4.
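The relative measures used above are simple ratios of an RMS value to the largest span of the target array, quoted as 1 : N or, for depth, as a percentage. The sketch below shows the arithmetic; the 0.024 mm RMS and 600 mm span are hypothetical inputs chosen only to reproduce a 1 : 25000 ratio, not values reported in this study.

```python
def relative_ratio(rms_mm, span_mm):
    """Return N such that the relative precision or accuracy is 1 : N."""
    return round(span_mm / rms_mm)

def ratio_to_percent(n):
    """Express a 1 : N ratio as a percentage (e.g. of average depth)."""
    return 100.0 / n

# Hypothetical inputs: 0.024 mm mean RMS over a 600 mm target-array span.
print(f"1 : {relative_ratio(0.024, 600.0)}")   # 1 : 25000
print(f"{ratio_to_percent(22000):.4f} %")      # 0.0045 %
```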

Without APs, the image-space accuracy of the smartphone cameras ranged from 2.26 to 11.02 pixels. A comparison of the bundle adjustments with and without APs showed that including the APs improved the image-space accuracy by a factor of 15, with the greatest improvement (a factor of 25) for the Galaxy S4 and the smallest (a factor of 6) for the Galaxy S3. In object space, the accuracy improved by a factor of 24 on average.

In the free-network adjustments, the photogrammetric network was not affected by possible discrepancies among the reference points, so the object coordinate residuals reflected only the photogrammetric measurements and the model quality. Free-network adjustment therefore provides optimal internal precision [23]. The camera parameters obtained from the free-network adjustments were very similar to those obtained from the adjustments with control points. As expected, the accuracy and precision values (RMS errors) from the free-network adjustments were smaller than those from the adjustments with control points.

The Gaussian radial distortion profiles recovered for each camera are shown in Figure 2. The profiles were obtained using the following equation:

$$dr = k_1 r^3 + k_2 r^5 + k_3 r^7,$$

where dr is the radial distortion and r is the radial distance [24]. The plots were derived using a free-network self-calibrating bundle adjustment, and each profile is plotted only up to the maximum radial distance encountered in the self-calibration.

The lowest radial distortion (∼9 μm at the image corners) was obtained for the Galaxy S3, and the highest (∼180 μm at the corners) for the Canon IXUS 960 IS. Decentering distortion was below 1 μm for all cameras except the Sony Xperia S, for which it reached 3 μm at the corners. The calculated principal point locations and their standard deviations are listed in Table 3. The largest principal point offsets were obtained for the Galaxy S4 and the Canon IXUS 960 IS; imperfect mounting of the lens relative to the sensor may have caused these large offsets.
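The two distortion profiles can be evaluated as follows. Radial distortion follows dr = k1 r³ + k2 r⁵ + k3 r⁷, and the decentering profile is taken as P(r) = (p1² + p2²)^(1/2) r², a common definition assumed here. Units follow those of r (e.g., mm, with k1 in mm⁻²), and the coefficients in the usage line are hypothetical.

```python
import numpy as np

def radial_profile(r, k1, k2=0.0, k3=0.0):
    """Gaussian radial distortion dr(r) = k1 r^3 + k2 r^5 + k3 r^7."""
    r = np.asarray(r, float)
    return k1 * r**3 + k2 * r**5 + k3 * r**7

def decentering_profile(r, p1, p2):
    """Decentering distortion profile P(r) = sqrt(p1^2 + p2^2) * r^2."""
    r = np.asarray(r, float)
    return np.sqrt(p1**2 + p2**2) * r**2

# Hypothetical coefficients; evaluate up to a 2.5 mm maximum radial distance.
r = np.linspace(0.0, 2.5, 6)
dr_um = 1000.0 * radial_profile(r, k1=2e-3, k2=-1e-4)   # mm -> micrometres
```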

3. Image-Based 3D Modeling Tests

The integration of photogrammetric methods and computer vision algorithms has led to attractive procedures that increasingly automate the whole image-based 3D modeling process [24]. Recently, automatic solutions based on SfM-MVS techniques have been used extensively in image-based 3D reconstruction [16, 25, 26]. The process principally involves image orientation and dense model reconstruction with a high level of automation.

3.1. Structure from Motion

Structure from motion can be described as the simultaneous determination of the camera orientation parameters and the 3D structure of the scene. Traditionally, the SfM pipeline consists of two main stages. First, a set of point correspondences across the image sequence is established through feature detection and image matching. Second, the orientation parameters and the scene structure are determined from these correspondences [27].

The aim of the correspondence estimation phase is to obtain sets of matching pixel positions across the image sequence; ideally, each set represents a single point in 3D space. Currently, scale-invariant operators such as SIFT or SURF are the state-of-the-art methods for extracting point features from images. The extracted features are invariant to image scaling, translation, and rotation and partially invariant to illumination changes. For each feature, the detector also computes a “signature” of its neighborhood, known as a feature descriptor [27]. These descriptors are distinctive enough to allow features to be matched in large datasets [28].

Next, a set of matching features is determined for each pair of images. The matches are generally obtained by an approximate nearest-neighbor search using a kd-tree [29].
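A minimal matching sketch follows. It uses a brute-force nearest-neighbor search with a distance-ratio test (ratio 0.8, the value suggested by Lowe for SIFT) as an assumed acceptance criterion; for large feature sets, a kd-tree or another approximate nearest-neighbor index would replace the brute-force distance matrix. The function name is hypothetical.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Match rows of d1 to rows of d2 (d2 needs >= 2 rows) with a ratio test."""
    # Pairwise Euclidean distances, shape (len(d1), len(d2)).
    dist = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dist):
        j1, j2 = np.argsort(row)[:2]          # nearest and second-nearest
        # Accept only if the best match is clearly better than the runner-up.
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))
    return matches
```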

After matching features for an image pair, the fundamental matrix is robustly estimated for the pair using RANSAC with the eight-point algorithm [30, 31]. After finding a set of geometrically consistent matches between each image pair, the matches are organized into tracks, where a track is a connected set of matching key points across multiple images [32, 33].
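The robust pairwise estimation step can be sketched with the normalized (Hartley) eight-point algorithm inside a RANSAC loop, scoring each candidate by the Sampson error. The threshold (0.01 squared pixels), the iteration count, and the seed are assumptions tuned to noise-free synthetic data; real image measurements need a pixel-scale threshold. Function names are hypothetical.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: centroid at origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T

def eight_point(x1, x2):
    """Normalized eight-point estimate of F such that x2^T F x1 = 0."""
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # (least-squares) null vector
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt     # enforce rank 2
    F = T2.T @ F @ T1                            # undo the normalization
    return F / np.linalg.norm(F)

def sampson_error(F, x1, x2):
    """First-order geometric (Sampson) error, in squared pixels."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    Fx1 = h1 @ F.T            # rows are F x1
    Ftx2 = h2 @ F             # rows are F^T x2
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

def ransac_fundamental(x1, x2, thresh=0.01, iters=500, seed=0):
    """RANSAC over minimal 8-point samples, then a refit on all inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(x1), size=8, replace=False)
        inliers = sampson_error(eight_point(x1[idx], x2[idx]), x1, x2) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point(x1[best], x2[best]), best
```

The final refit on all inliers uses the same linear solver; a production pipeline would usually follow it with a nonlinear refinement.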

The second stage of the pipeline determines a set of camera parameters and a 3D position for each track. The recovered parameters should be consistent with the observations, i.e., they should minimize the reprojection error. This minimization is formulated as a nonlinear least-squares problem and solved using bundle adjustment [33].
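In one common notation (the symbols here are introduced for illustration and are not taken from the cited sources), the bundle adjustment objective reads

$$
\min_{\{\mathbf{P}_i\},\,\{\mathbf{X}_j\}} \; \sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij}\, \bigl\lVert \mathbf{x}_{ij} - \pi(\mathbf{P}_i, \mathbf{X}_j) \bigr\rVert^2 ,
$$

where P_i are the parameters of camera i (exterior orientation and, in self-calibrating cases, interior orientation and distortion), X_j is the 3D position of track j, x_ij is the measured image position of track j in image i, v_ij equals 1 if track j is observed in image i and 0 otherwise, and π is the projection given by the collinearity equations.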

3.2. Dense Scene Reconstruction

SfM produces a sparse geometric structure consisting of the 3D positions of the matched image features. While this is adequate for some applications, such as image-based visualization, reconstructing a highly detailed and accurate 3D model requires a dense point cloud, which is obtained by applying dense image matching to the oriented images [27, 34].

The image matching problem is generally solved either on stereo pairs or by determining correspondences across multiple images. Stereo methods can be local or global: local methods determine the disparity at a given point using only the intensity values within a local window, whereas global methods make explicit smoothness assumptions and then solve an optimization problem over a global cost function. Most of these procedures apply consistency measures only to single stereo pairs, and geometric constraints are enforced only when the point clouds derived from the individual stereo pairs are fused [24, 35].
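A local stereo method can be illustrated with winner-take-all block matching on a rectified pair, using the sum of absolute differences (SAD) over a square window as the matching cost. The window half-size and disparity range are hypothetical parameters; practical MVS pipelines use more elaborate costs and consistency checks.

```python
import numpy as np

def sad_block_matching(left, right, max_disp, half=2):
    """Winner-take-all disparity map for a rectified pair using SAD costs."""
    left = np.asarray(left, float)
    right = np.asarray(right, float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            win = left[y - half:y + half + 1, x - half:x + half + 1]
            # A pixel at column x in the left image appears at x - d in the right.
            costs = [
                np.abs(win - right[y - half:y + half + 1,
                                   x - d - half:x - d + half + 1]).sum()
                for d in range(max_disp + 1)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

Borders narrower than the window and the disparity range are left at zero in this sketch.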

The last steps of the 3D reconstruction process are meshing and texturing. Various approaches can be used to derive a photorealistic 3D model from a dense point cloud. Remondino and El-Hakim [36] stated that polygonal meshing is generally the most efficient way to accurately represent the results of 3D measurements, as it provides an optimal surface description. One of the most popular 2D polygonal meshing algorithms is Delaunay triangulation. In general, these methods require a starting point such as a visual hull model, additional information such as vertex normals, and a sufficient number of points [34].

3.3. 3D Modeling and Performance Testing

A second application was implemented to examine the potential of mobile-phone images for 3D modeling. A 3D model of a historical cylindrical structure (height = 8 m, diameter = 5 m) was first generated with SfM-MVS techniques from images captured with the Galaxy S4, and accuracy tests were then conducted on the 3D model.

The historical building, which is said to have been built in the mid-14th century, is known as Sircali cupola. Today, this cupola is located in the garden of Kayseri Technical and Industrial Vocational High School in Kayseri (Figure 3).

For 3D modeling, a closed polygon network of five points was first established, and the coordinates of 96 signalized points on the historic cupola were measured with high accuracy using a reflectorless total station. In addition to the signalized points, 280 clearly identifiable and well-distributed natural points on the cupola were measured. We then captured 69 overlapping images of the cupola from distances of 5–20 m. The average shooting distance was ∼8 m, at which the ground resolution averaged 2.06 mm/pixel.
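The ground resolution follows from the pixel pitch scaled by the ratio of object distance to principal distance. The sketch assumes nominal Galaxy S4 values of a 1.12 μm pixel pitch and a 4.2 mm principal distance, which are not taken from Table 1; the result, about 2.1 mm/pixel at 8 m, is of the same order as the reported average.

```python
def ground_sample_distance(pixel_pitch_mm, distance_mm, principal_distance_mm):
    """Object-space size of one pixel: GSD = pixel pitch * distance / c."""
    return pixel_pitch_mm * distance_mm / principal_distance_mm

# Assumed nominal values (not from Table 1): 1.12 um pixels, c = 4.2 mm.
gsd = ground_sample_distance(1.12e-3, 8000.0, 4.2)
print(f"GSD at 8 m: {gsd:.2f} mm/pixel")   # GSD at 8 m: 2.13 mm/pixel
```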

The Agisoft PhotoScan Professional software package (version 1.4.3; Agisoft, 2018) was used to generate a photorealistic 3D model of the historical structure based on SfM and MVS algorithms. The software's 3D modeling workflow comprised four primary steps: (1) align photos, (2) optimization, (3) build dense point cloud, and (4) surface reconstruction (3D polygonal model). The photo alignment step detects common tie points in the images, matches them across the images, and then estimates the image orientations. In our study, the alignment accuracy was set to “high” (the software uses the photos at their original size). To optimize performance, the number of matching points per image was limited to the default value of 4000. The outputs of the photo alignment step are the position and orientation of each image, the camera calibration parameters, and a sparse point cloud.

For the optimization step, the 3D coordinates of the signalized points were first added to the input data set. Twenty-eight of these points were used as ground control points (GCPs), and the remaining sixty-eight as checkpoints. The known coordinates were manually associated with the corresponding target centers in the images, which georeferenced the sparse point cloud. Nonlinear deformations of the model can be removed by optimizing the sparse point cloud and the camera parameters against the known control point coordinates: during this procedure, the software updated the estimated point coordinates and camera parameters so as to minimize the sum of the reprojection error and the misalignment error of the reference coordinates.
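At its simplest, georeferencing a model from GCPs means estimating a seven-parameter 3D similarity transformation (scale, rotation, translation). The sketch below uses Umeyama's closed-form SVD solution as an illustration; it is not PhotoScan's internal procedure, which additionally refines the camera parameters during optimization. The function name is hypothetical.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares s, R, t such that dst ~ s * R @ src + t (Umeyama)."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    cov = B.T @ A / len(src)                   # cross-covariance (dst, src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                         # avoid a reflection
    R = U @ D @ Vt
    var_s = (A ** 2).sum() / len(src)          # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Model coordinates are then transformed with `s * pts @ R.T + t`; residuals at the checkpoints indicate how well the transformation holds.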

The third step generated a dense point cloud from the estimated camera positions and the images: the software calculated a depth map for each image and combined the depth maps into a final dense point cloud. To obtain more detailed and accurate geometry, the reconstruction quality was set to “high.”

The final step was the reconstruction of a 3D polygonal model representing the object surface from the dense point cloud. The surface type was set to “arbitrary,” which is suitable for modeling any kind of object. The user can also set the maximum number of polygons in the final mesh; this parameter was set to “high” to obtain a polygon count appropriate for the corresponding level of detail [37].

From the optimization step, the average precision of the coordinates of the control points was calculated as 0.59 mm and 0.71 mm for the in-plane (x-z) and out-of-plane (y) components, respectively. The empirical accuracy in the object space calculated from the checkpoints was found to be 1.04 mm and 1.33 mm for the in-plane and out-of-plane components, respectively. The RMS of reprojection error (residuals of image coordinates) was 0.54 pixels for all tie points.

The SfM-MVS process produced a 3D dense point cloud containing 21.8 million points (with an average density of 17 points/cm²). This point cloud was triangulated to create a mesh model with 4.3 million faces and 2.18 million vertices (Figure 4).
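The reported density can be cross-checked roughly from the structure's dimensions. The sketch assumes that the points cover only the lateral surface of the 8 m × 5 m cylinder (roof and ground ignored), so it is an approximation rather than the software's own density estimate.

```python
import math

def mean_point_density(n_points, diameter_m, height_m):
    """Points per cm^2 over the lateral surface of a cylinder (assumption)."""
    area_cm2 = math.pi * diameter_m * height_m * 1e4   # m^2 -> cm^2
    return n_points / area_cm2

density = mean_point_density(21.8e6, 5.0, 8.0)   # about 17.3 points/cm^2
```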

To assess the final positional accuracy of the geometric model, the coordinates of the 68 signalized checkpoints were measured in the 3D point cloud by selecting the point closest to each target center. The RMSEs at the checkpoints were 2.05 mm and 2.38 mm for the in-plane and out-of-plane components, respectively.
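The checkpoint statistics can be computed as follows. The sketch assumes that the in-plane RMSE combines the x and z differences as sqrt(mean(dx² + dz²)) and that y is the out-of-plane axis; the study may have combined the components differently.

```python
import numpy as np

def component_rmse(measured, reference):
    """In-plane (x, z) and out-of-plane (y) RMSE of coordinate differences."""
    d = np.asarray(measured, float) - np.asarray(reference, float)
    in_plane = np.sqrt(np.mean(d[:, 0] ** 2 + d[:, 2] ** 2))   # assumed combination
    out_of_plane = np.sqrt(np.mean(d[:, 1] ** 2))
    return in_plane, out_of_plane
```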

In addition to the point measurements at the checkpoints, positional accuracy was assessed by measuring the spatial differences between the mesh model and a reference point cloud consisting of both the checkpoints and the natural points, using the cloud-to-mesh distance function of the CloudCompare software package. This function is considered more robust to local noise than cloud-to-cloud comparison. The distance from each point to the nearest triangle was calculated as follows: if the orthogonal projection of the point fell inside the triangle, the distance was the orthogonal distance from the point to the plane of the triangle; if the projection fell outside the triangle, the distance to the nearest edge was used. Figure 5 illustrates the computed distances on a color scale.
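The distance rule described above can be sketched as follows. The sign convention (positive on the side toward which the triangle normal points), the inclusion of edge endpoints in the edge distance, and the brute-force search over all faces are assumptions of this sketch; CloudCompare uses its own implementation with spatial indexing.

```python
import numpy as np

EDGES = ((0, 1), (1, 2), (2, 0))

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def point_triangle_distance(p, tri):
    """Signed distance from p to a triangle given as a 3x3 vertex array."""
    a, b, c = tri
    n = np.cross(b - a, c - a)
    n = n / np.linalg.norm(n)
    d_plane = float(np.dot(p - a, n))   # signed distance to the triangle plane
    q = p - d_plane * n                 # orthogonal projection onto the plane
    inside = all(np.dot(np.cross(tri[j] - tri[i], q - tri[i]), n) >= 0
                 for i, j in EDGES)
    if inside:
        return d_plane                  # projection inside: plane distance
    # Projection outside: distance to the nearest edge, keeping the side sign.
    d_edge = min(point_segment_distance(p, tri[i], tri[j]) for i, j in EDGES)
    return float(np.copysign(d_edge, d_plane))

def cloud_to_mesh(points, vertices, faces):
    """Brute-force C2M: per point, the signed distance of smallest magnitude."""
    out = []
    for p in points:
        dists = [point_triangle_distance(p, vertices[np.asarray(f)]) for f in faces]
        out.append(min(dists, key=abs))
    return np.array(out)
```

The mean and standard deviation of the returned signed distances correspond to the summary statistics reported for the deviation maps.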

Each surface deviation map is accompanied by a histogram of the deviation distances, together with the mean distance and standard deviation (σ) of the measured checkpoints [38]. Notably, the comparison yielded Gaussian-like distributions. The mean distance between the 3D mesh model and the checkpoints was 0.12 mm, with a standard deviation of 3.83 mm.

3.4. Automatic Camera Calibration

A convenient, stand-alone, targetless camera calibration is now achievable through a process that combines SfM methods with rigorous photogrammetric orientation and self-calibration [39]. We evaluated the performance of this SfM-based targetless calibration on Galaxy S4 camera images. A total of 40 images of the cupola were used, and the camera calibration parameters were calculated automatically in the Agisoft PhotoScan software. For comparison with the targetless approach, the camera calibration parameters were recalculated in the Australis software using the same images and the signalized points with known 3D coordinates. Twenty-five of the signalized points were used as ground control points, and the remaining 50 as checkpoints. The self-calibration results are given in Table 4.

The calibrated values of the focal length (c) and the principal point offsets (x0, y0) are listed in the table, together with their estimated standard errors. The table also lists the radial distortion corrections (Δr) at two selected radial distances and the decentering distortion profile values P(r) at the same radial distances, as well as the RMS of the image coordinate residuals and the number of object points in each case. The interior orientation and lens distortion parameters agree closely between the target-based and targetless cases. Regarding the internal accuracy indicators, as expected, the image coordinate measurement accuracy differed by a factor of two between the two cases, with RMS values of 0.33 pixels for the target-based case and 0.67 pixels for the targetless case. Accuracy tests using checkpoints were also performed for the target-based case; owing to the higher image coordinate measurement accuracy, the relative object point accuracy was 1 : 11000 in-plane and 0.012% of average depth.

4. Conclusions

We investigated the usability of mobile-phone camera images in photogrammetric applications in two stages. In the first stage, five mobile-phone cameras and a digital compact camera were compared in terms of accuracy and precision. For this purpose, both self-calibrating bundle adjustments with control points and free-network bundle adjustments were performed for each camera. With external constraints from 10 control points, the image-space accuracy was ∼1/4 pixel for the Galaxy S4 and between 1/2.9 and 1/2 pixel for the other cameras. The relative object-space accuracy for the Galaxy S4 was 1 : 25000 in-plane and 0.004% of average depth, whereas for the other cameras it ranged from 1 : 10000 to 1 : 13000 in-plane and from 0.006% to 0.01% of average depth. The 3D object point accuracy was better than 0.1 mm for all cameras. The best results in all evaluations were obtained for the Galaxy S4 smartphone camera. For the mobile-phone cameras used in this study, we believe that the principal influences on accuracy are the resolution of the camera sensor and the pixel size. Luhmann et al. [1] stated that image coordinate measurement accuracies of 0.03–0.1 pixels can be anticipated for automatically measured targets, and that the quality of the camera calibration will provide a relative accuracy of 1 : 50000, provided that the multi-image geometry is strong. The accuracy values obtained for the Galaxy S4 differ from these recommended values for digital cameras by about a factor of two in both image and object space. On the other hand, in an earlier test-field study, Akca and Gruen [9] demonstrated that relative accuracies of 1 : 8000 in-plane and 0.03% of average depth can be achieved with low-resolution mobile-phone cameras. Developments in image resolution and mobile-phone technology have therefore brought an appreciable improvement in relative accuracy.

In the second stage, we used SfM-MVS techniques to reconstruct a 3D model of the historical Sircali cupola from images captured with the Galaxy S4 smartphone camera, which had yielded the best performance in the geometric accuracy tests. After the bundle block adjustment, the relative accuracies were 1 : 7500 and 1 : 6000 for the in-plane and out-of-plane components, respectively. SfM-based approaches have two main drawbacks for measurement applications. First, the need to avoid wide baselines results in a weaker network geometry. Second, descriptor-based feature point matching yields lower image measurement accuracy [39]. The difference in accuracy is likely related to these drawbacks of the SfM approach, to the smaller image scale, and to the manual image measurement of the targets. Indeed, as Fraser and Shortis [40] also stated, the accuracy of vision metrology systems based on digital cameras depends on the image resolution, image scale, image measurement precision, and other factors such as network design. The limitations of the SfM approach were also evident in the automatic calibration process: although the camera calibration parameters agreed closely between the target-based and targetless cases, the accuracy of the image coordinate measurements differed by about a factor of two.

The final positional accuracy tests showed millimeter-level accuracy for both the dense point cloud and the resulting mesh model. The data evaluation demonstrates that high-quality results can be obtained from numerous smartphone camera images with appropriate software solutions. Furthermore, the generated 3D model demonstrates the feasibility of SfM-MVS approaches for low-budget digitization and documentation projects.

Consequently, with an appropriate imaging configuration, calibration, and data processing software, these devices can be used in a range of photogrammetric measurement applications demanding high accuracy. This option is attractive because mobile-phone cameras offer good resolution and are economical and flexible. In addition, with continuing technological advances, the quality and performance of mobile-phone cameras and their built-in image processing functions are expected to improve further. It should therefore become possible, at least for small projects, to obtain high-quality 3D modeling results using these devices alone as both photogrammetric data acquisition and processing tools.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.