Abstract

To reduce data acquisition cost, this study proposes a novel method for individual tree height estimation and canopy extraction based on the fusion of an airborne multispectral image and a photogrammetric point cloud. A fixed-wing drone was deployed to acquire true color and multispectral images of a shelter forest. The Structure-from-Motion (SfM) algorithm was used to reconstruct the 3D point cloud of the canopy. The point cloud was filtered to obtain the ground points, which were then interpolated into a Digital Elevation Model (DEM) using a Radial Basis Function Neural Network (RBFNN). The DEM was subtracted from the Digital Surface Model (DSM) generated from the original point cloud to obtain the canopy height model (CHM). The CHM was processed for crown extraction using local maximum filters and watershed segmentation. Object-oriented methods were then applied to the combination of 12 multispectral bands and the CHM for image segmentation, and the Support Vector Machine (SVM) algorithm was used to extract the tree crowns. The result of the object-oriented method was vectorized and superimposed on the CHM to estimate tree height. Experimental results demonstrate that the photogrammetric point cloud can be used efficiently and that the proposed approach has great potential for tree height estimation. The proposed object-oriented method based on the fusion of a multispectral image and the CHM effectively reduced oversegmentation and undersegmentation, increasing the F-score by 0.12–0.17. Our findings also provide a reference for health and change monitoring of shelter forests.

1. Introduction

Shelter forests are considered the green barriers at the edge of deserts, which are able to prevent land desertification and provide wind proofing and sand fixation. Thus, they play an indispensable role in enhancing the self-regulation ability of the ecosystem and slowing down the expansion of land desertification. With the degradation of shelter forests due to man-made destruction and climate change, the prevention of land desertification has become more urgent [1]. To improve the ecological environment and reduce natural disasters, the Chinese government has also successively implemented ecological restoration projects aimed at preventing land desertification, such as the Three-North Shelterbelt Project (TNSP) and the Grain to Green Program (GGP) [2, 3]. Therefore, monitoring the growth parameters of shelter forests has become crucial. Among these parameters, tree height is an important indicator of shelter forest structural characteristics and is essential in the estimation of canopy density and aboveground biomass [4, 5]. The rapid and accurate extraction of tree heights of shelter forests is of great significance to maintain desert ecosystems.

For the estimation of tree height, manual measurement and satellite remote sensing methods cannot meet the needs of forestry management departments for monitoring products such as thematic maps of tree height. Traditionally, the heights of individual trees are obtained by ground measurement, in which tree height is measured with a laser range finder and other measuring equipment. This yields accurate results but consumes significant manpower and material resources [6]. The rapid development of high-resolution remote sensing technology not only improves the efficiency of tree height estimation but also provides a wider range of data sources, including WorldView-2 [7] and GF-2 [8], which have been used to demonstrate the potential of spaceborne imagery for tree height estimation. For satellite data, researchers use a high-precision Digital Elevation Model (DEM) to assist in generating a canopy height model (CHM) [8–10]. However, the drawbacks of such DEMs are obvious: low data accuracy (at best 30 m [11]), low resolution, and a limited ability to represent subtle topographic changes, which hinders CHM generation in forest areas with complex terrain. Xu et al. [12] built a high-precision DEM from a Light Detection and Ranging (LiDAR) point cloud and then subtracted it from the DSM generated from a photogrammetric point cloud to obtain the CHM. LiDAR can penetrate the shelter forest canopy to the interior and the ground through laser echoes, thereby capturing vertical forest structure information, which helps to generate a high-precision DEM. However, this approach is not economically optimal, because LiDAR data are expensive and therefore not applicable to large-scale shelter forest monitoring. Furthermore, under adverse factors such as cloud cover, satellite data can hardly deliver the monitoring products needed for forestry management at a specific time.

Unmanned aerial vehicle (UAV) images with a high overlap rate can generate photogrammetric point clouds at lower cost. Image data of the shelter forest can be updated on demand, and large-scale flight operations can be carried out as planned. The CHM generated from a UAV photogrammetric point cloud performs well in tree height estimation [13–15]. However, the photogrammetric point cloud is generated by image matching, so the forest structure can be reconstructed only when the trees are far enough apart for the ground between and below the canopy to be identified. Moreover, the CHM generated from a UAV photogrammetric point cloud usually underestimates tree height [16, 17]. To balance cost and data requirements, obtaining a high-precision DEM from photogrammetric point clouds has become a key issue. The Radial Basis Function Neural Network (RBFNN), with its excellent spatial interpolation ability, offers a possible solution. RBFNN has been applied to the spatial interpolation of hydrological data [18], soil element interpolation [19], and point cloud interpolation [20]. Zhao et al. [20] used RBFNN to interpolate a LiDAR point cloud and obtained a coefficient of determination ($R^2$) of 0.887 and a root mean square error (RMSE) of 0.168 m for the predicted point elevations. RBFNN thus achieves high prediction accuracy for spatial data, but few studies have applied it to elevation prediction in photogrammetric point clouds. Therefore, this study focuses on the feasibility of generating a DEM from photogrammetric point clouds alone.

To date, many scholars have studied individual tree height estimation under different woodland scenarios based on the CHM generated from point cloud data. Brieger et al. [21] used UAV photogrammetric point cloud data to generate a CHM for three stand types (sparse deciduous forest, dense deciduous forest, and dense mixed forest) and used local maximum filters (LMF) with variable window sizes for tree height estimation. Huang et al. [22] used photogrammetric point clouds to generate the DEM and Digital Surface Model (DSM) through Triangulated Irregular Network (TIN) interpolation to obtain the CHM and estimated tree height with LMF. In leafless and sparse forest areas, the three-dimensional (3D) structure of the forest is difficult to reconstruct, resulting in low accuracy of tree height estimation. In contrast, in forest areas with sufficient leaves, the accuracy of tree height estimation is greatly improved, confirming the significant potential of photogrammetric point clouds for estimating individual tree heights in shelter forests. These studies estimate individual tree height from the point-cloud-derived CHM alone, and irrespective of the algorithm used, individual trees are subject to oversegmentation and undersegmentation [21, 23]. This is more pronounced in broad-leaved forests, where the canopy of an individual tree has multiple vertices and multiple local maxima, which causes oversegmentation.

Therefore, accurate tree crown extraction is a prerequisite for high-precision tree height estimation. Object-Based Image Analysis (OBIA) uses the segmented object as the basic classification unit and makes full use of the object's spectrum and texture. Compared with traditional pixel-based classification, it can effectively improve classification accuracy. Franklin [24] reported overall classification accuracies of approximately 50%, 60%, and 80% for nine commercial coniferous trees using pixel-based unsupervised clustering, supervised maximum likelihood classification, and OBIA of UAV-based multispectral imagery, respectively. Regardless of the classification method, including the near-infrared band gave better accuracy than using the RGB bands alone, which confirms that multispectral data has a unique advantage over RGB in describing the canopy. Thus, it is theoretically possible to combine point clouds with multispectral or hyperspectral remote sensing images to extract tree canopies and reduce the oversegmentation and undersegmentation of individual trees.

In view of the above problems, our research objectives are to address the low DEM accuracy caused by tree canopy occlusion by using RBFNN interpolation prediction, and to combine the point cloud with multispectral data by using OBIA to reduce oversegmentation and undersegmentation of the canopy. Three areas were selected for experimental verification, and the main contributions of this work are as follows:
(1) We have solved the problem of low DEM accuracy of the photogrammetric point cloud in forests with high canopy coverage.
(2) Compared with tree crown extraction based on the CHM alone, extracting the tree crown from multispectral images fused with the CHM has obvious advantages.

2. Study Area and Materials

2.1. Study Area

We selected the Three-North Shelter Forest area (45°10′N, 85°56′E) of the 150th Regiment in the north of the Mosuowan reclamation area as the study area, which is approximately 150 km north of Shihezi City, Xinjiang Uygur Autonomous Region, China (Figure 1). The regiment is located at the northern foot of the Tianshan Mountains and the southern edge of the Gurbantunggut Desert in the Junggar Basin and is surrounded by sand to the east, west, and north. The shelter forest is planted in a wedge shape around the edge of the desert and is dominated by deciduous broad-leaved species such as Ulmus pumila, Populus bolleana, Populus euphratica, Elaeagnus angustifolia, and Haloxylon ammodendron, all of which are known for their drought tolerance. In addition, their strong wind-proofing and sand-fixing ability gives these species excellent sand fixation and afforestation effects in arid desert areas. Furthermore, characteristics of Ulmus pumila and Elaeagnus angustifolia, such as large and dense canopies, diverse growth conditions, uneven spatial distribution, and the presence of additional green plants (such as weeds) at the tree base, provide an opportunity to test the accuracy of tree height estimation against a complex background.

We selected three areas for the research, with the following characteristics: (1) study area 1: a dense mixed artificial forest consisting of Ulmus pumila, Populus bolleana, Populus euphratica, and Elaeagnus angustifolia, with small spacing between the canopies, as shown in Figure 1(b); (2) study area 2: a sparse pure forest of Populus bolleana, mostly in good health, with large gaps between the canopies, as shown in Figure 1(c); and (3) study area 3: a sparse pure forest of Populus bolleana, mostly in poor condition, with large gaps between the canopies, as shown in Figure 1(d).

2.2. Remote Sensing Data Acquisition

The UAV platform used in this study was a fixed-wing CW-20 UAV produced by JOUAV Company. It was equipped with a SONY-A7RII visible light camera and a Micro MCA12 Snap multispectral camera, which acquired the visible light and multispectral images used as the data sources of our study. The UAV offers fully autonomous takeoff and rapid installation. It is a professional-level aerial survey UAV with a cruising speed of 26–40 m/s and a battery life of 3 h. It was operated with a GCS-202 ground station and CWCommander software and used Real-Time Kinematic/Post Processed Kinematic (RTK/PPK) positioning technology. The location information of the acquired remote sensing images can reach centimeter-level accuracy, and such devices have been widely used to acquire remote sensing data for large-scale agriculture and forestry in China. Data were acquired on October 9, 2019, at a relative flight height of 400 m over a coverage area of 8.48 square kilometers, which meets the requirements for a high-precision photogrammetric point cloud suggested in [25]. For the SONY-A7RII visible light camera, the forward and side overlap rates were both set to 80%, and the spatial resolution was 0.05 m. A total of 1716 images were acquired to create the point cloud. To meet stitching requirements, the Micro MCA12 Snap multispectral camera (sensor band parameters are shown in Table 1) was set to a forward overlap rate of 60%, a side overlap rate of 70%, a relative flight height of 400 m, and a spatial resolution of 0.2 m. In addition, we placed four radiometric calibration targets on the ground, with reflectivities of 3%, 22%, 48%, and 64%, for subsequent radiometric correction.

2.3. Field Measurements

The field measurements were made on October 7, 2019. The location (latitude and longitude) of individual trees was recorded using the geolocation function of the Aowei software, which is based on Google Maps. The health (good or bad condition) of each tree was recorded, and a multifunction laser distance measurement instrument (BAOSHIAN-CS600VH) was used to measure the height of individual trees. The sampling locations are shown in Figures 1(b)–1(d), and the numbers of sampled trees are given in Table 2, together with the canopy sampling numbers (the number of manually delineated tree crowns in each study area).

The recorded geographic locations were imported into ArcGIS 10.6 software, and any deviations were corrected. The three areas were selected according to the similarity of species and the overall similarity of the forest stand structure in the field data, including tree cover, density, and planting type, so that the data would not be affected by the growth of the shelter forest. All field measurements for the three areas were collected within one week of the UAV data acquisition.

3. Methods

The technical workflow of this research is shown in Figure 2, which includes the following steps.

3.1. Data Preprocessing

After the flight mission was completed, the Position and Orientation System (POS) data from the base station was sorted and imported into Pix4Dmapper 4.4.10 software for processing. The images were preliminarily processed through feature extraction, image matching, bundle adjustment, automatic triangulation, camera self-calibration, and optimization of the exterior orientation parameters. A dense point cloud based on Structure-from-Motion (SfM) was then generated with the following parameters: the image scale was half (the default value) with the multiscale option selected, the point density was set to optimal, and the minimum number of matches was 3. This operation produced photogrammetric point cloud data in LAS format and orthophotos with a spatial resolution of 0.05 m. The photogrammetric point cloud was imported into Terrasolid software, where noise was first removed and the point cloud was then filtered with the built-in TIN densification filtering method to derive the ground point cloud. A DSM with 0.2 m resolution was generated from the original photogrammetric point cloud by Inverse Distance Weighted (IDW) interpolation in ArcGIS 10.6 software.
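
As a rough illustration of the IDW step, the sketch below interpolates a surface height at one grid node from nearby points. It is not the ArcGIS implementation: the function name `idw_height`, the power of 2, and the sample points are assumptions made for illustration only.

```python
def idw_height(x, y, points, power=2.0):
    """Inverse-distance-weighted height at (x, y).

    points: iterable of (px, py, pz) tuples in the same planar units.
    power:  distance exponent (2.0 is an assumed, commonly used value).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, pz in points:
        dist2 = (x - px) ** 2 + (y - py) ** 2
        if dist2 == 0.0:
            # The query coincides with a sample point: return its height.
            return pz
        weight = 1.0 / dist2 ** (power / 2.0)
        weighted_sum += weight * pz
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical surface points (x, y, z) in metres.
sample_points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
dsm_value = idw_height(1.0, 0.0, sample_points)
```

In a full DSM workflow the same function would be evaluated at every 0.2 m grid node, usually restricted to the nearest neighbours of each node.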

The acquired original multispectral images were exported in RAW format, and Tetracam PixelWrench2 software was used to convert them into standard TIFF raster image data. The POS records were matched one-to-one with the images in Pix4Dmapper 4.4.10 software to obtain multispectral image data with 0.2 m spatial resolution. Radiometric correction was performed on the stitched orthophotos. The relationship between the digital number ($DN$) of the UAV multispectral image and the ground reflectance ($\rho$) is

$$\rho = a \cdot DN + b,$$

where $a$ is the scaling gain coefficient and $b$ is the offset value.

According to the calibration equation, the $DN$ values of the four ground targets in the image were extracted by drawing areas of interest and paired with the standard reflectance values of the four targets. The least squares method was used to fit the empirical linear model, which provided the coefficients $a$ and $b$ for the radiometric calibration of the UAV Micro MCA12 Snap multispectral camera. Based on the visible light image, 30–40 control points were selected, and the corrected multispectral image data was georeferenced in ArcGIS 10.6 software for subsequent canopy segmentation.
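
The empirical linear calibration described above can be sketched as an ordinary least-squares line through the four target measurements. The target reflectances (3%, 22%, 48%, 64%) come from the text; the DN values, the function names, and the assumption that reflectance is modelled as gain times DN plus offset are illustrative, not the values or code used in the study.

```python
def fit_empirical_line(dn_values, reflectances):
    """Least-squares fit of reflectance = gain * DN + offset."""
    n = len(dn_values)
    mean_dn = sum(dn_values) / n
    mean_rho = sum(reflectances) / n
    sxx = sum((d - mean_dn) ** 2 for d in dn_values)
    sxy = sum((d - mean_dn) * (r - mean_rho)
              for d, r in zip(dn_values, reflectances))
    gain = sxy / sxx
    offset = mean_rho - gain * mean_dn
    return gain, offset


# Nominal reflectance of the four ground targets (from the text).
target_reflectance = [0.03, 0.22, 0.48, 0.64]
# Hypothetical mean DN of each target in one band (illustrative only).
target_dn = [20.0, 70.0, 140.0, 185.0]

gain, offset = fit_empirical_line(target_dn, target_reflectance)


def dn_to_reflectance(dn):
    """Apply the fitted calibration to a single DN value."""
    return gain * dn + offset
```

One such fit would be needed per band, since each of the twelve bands has its own gain and offset.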

3.2. RBFNN Predictive Interpolation Generates DEM

RBFNN [27, 28] can be used for classification or spatial data interpolation. Owing to its simple structure, fast learning speed, and low tendency to fall into local minima, it is often used for spatial data interpolation and prediction [29]. RBFNN has been applied to generate DEMs by interpolating airborne LiDAR point clouds, and attempts have also been made to generate DEMs from photogrammetric point clouds [30]. Therefore, we used the network with the filtered ground point cloud as input and interpolated a height value for each point. An RBFNN is usually composed of an input layer, a hidden layer, and an output layer (Figure 3).

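The output formula is

$$y_j = \sum_{i=1}^{h} w_{ij}\, \varphi_i(x),$$

where $\varphi_i$ is usually described as

$$\varphi_i(x) = \exp\left(-\frac{\lVert x - c_i \rVert^2}{2\sigma_i^2}\right).$$

Here $y_j$ is the output of the $j$th output layer neuron of the RBFNN for input $x$, $w_{ij}$ is the weight from the $i$th hidden neuron to the $j$th neuron of the output layer, $\varphi_i$ is the Gaussian basis function, $c_i$ is the center of the basis function in the $i$th hidden layer neuron, $\sigma_i$ is the width of the $i$th hidden layer neuron, and $h$ is the number of hidden neurons.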
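
To make the Gaussian RBF output formula concrete, the following minimal sketch fits a single-output network to a handful of ground points and predicts an elevation between them. Several simplifications are assumed and are not taken from the study: every ground point is used as a hidden-layer center, all neurons share one width `sigma`, the output weights are obtained by a direct linear solve rather than iterative training, and a constant bias (the mean elevation) is added so that predictions do not decay towards zero away from the data. All names and sample values are hypothetical.

```python
import math


def gaussian(dist2, sigma):
    # Gaussian basis function evaluated on a squared distance.
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rbf(points, heights, sigma, ridge=1e-8):
    """Return (weights, bias) with one hidden neuron centred on each point."""
    bias = sum(heights) / len(heights)
    n = len(points)
    phi = [[gaussian((points[i][0] - points[j][0]) ** 2
                     + (points[i][1] - points[j][1]) ** 2, sigma)
            + (ridge if i == j else 0.0)  # small ridge for numerical stability
            for j in range(n)] for i in range(n)]
    weights = solve(phi, [z - bias for z in heights])
    return weights, bias


def predict_height(x, y, points, weights, bias, sigma):
    # Weighted sum of Gaussian responses plus the constant bias.
    return bias + sum(w * gaussian((x - cx) ** 2 + (y - cy) ** 2, sigma)
                      for w, (cx, cy) in zip(weights, points))


# Hypothetical ground points (x, y) in metres on a gentle slope.
ground_xy = [(float(i), float(j)) for i in range(3) for j in range(3)]
ground_z = [300.0 + 0.5 * x + 0.2 * y for x, y in ground_xy]
weights, bias = fit_rbf(ground_xy, ground_z, sigma=1.0)
dem_value = predict_height(0.5, 0.5, ground_xy, weights, bias, sigma=1.0)
```

For a real ground point cloud the number of centers would have to be much smaller than the number of points (for example, chosen by clustering), because a dense solve over all points does not scale.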

A DEM with a resolution of 0.2 m was generated by RBFNN interpolation of the ground point cloud, and the CHM was obtained by subtracting the DEM from the DSM (Figure 4).
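
The subtraction itself is a cell-by-cell operation on two co-registered grids. The sketch below assumes both rasters are plain nested lists with identical shape and resolution; clamping negative differences to zero is an added assumption, not something stated in the text.

```python
def canopy_height_model(dsm, dem, floor_at_zero=True):
    """Return CHM = DSM - DEM for two grids of identical shape."""
    chm = []
    for dsm_row, dem_row in zip(dsm, dem):
        row = []
        for surface, ground in zip(dsm_row, dem_row):
            height = surface - ground
            if floor_at_zero and height < 0.0:
                # Assumed clean-up: small negative values are treated as ground.
                height = 0.0
            row.append(height)
        chm.append(row)
    return chm


# Hypothetical 2 x 2 grids of elevations in metres.
dsm = [[305.0, 300.2], [299.9, 308.0]]
dem = [[300.0, 300.0], [300.0, 300.0]]
chm = canopy_height_model(dsm, dem)
```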

3.3. Multispectral Image Combined with CHM Canopy Extraction

The three selected areas contained a variety of features such as bare soil, shadows, and weeds around the canopy, which made canopy extraction difficult. To delineate the canopies of individual shelter forest trees, this study proposes an OBIA method based on the fusion of multisource data (FMSD-OBIA). The traditional pixel-based supervised classification method relies on statistical spectral features, clustering the feature values of the selected samples to obtain pixel-level classification results. As image resolution improves, however, a single pixel carries less representative spectral and texture information. The OBIA method instead groups pixels into regions or sets based on spectral, texture, and shape characteristics, which is more suitable for high-resolution image data. In this paper, we used the Micro MCA12 Snap multispectral sensor, which includes two red-edge bands. The red edge is highly sensitive to vegetation, reflects its spectral characteristics well, and offers certain advantages in vegetation classification. The twelve original bands and the CHM were combined, and the tree crown was extracted with the FMSD-OBIA method.

The segmentation and classification operations of FMSD-OBIA were carried out in ENVI 5.3 in three main steps: segmentation, merging, and supervised classification. Reasonable segmentation and merging scales are very important in the FMSD-OBIA method. If the segmentation scale is too large, smaller tree crowns tend to be merged and left unrecognized; if it is too small, larger tree crowns tend to be divided into many parts and broken patches. In ENVI 5.3, the edge algorithm was selected for segmentation and the full lambda schedule algorithm for merging, and the FMSD-OBIA parameters were selected through repeated experiments (Table 3). K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are supervised algorithms typically used for classification after segmentation. SVM is an excellent small-sample learning algorithm that has shown good robustness in remote sensing image classification. In this study, the image objects were classified into two categories, tree canopy and other objects (considered as background); 70% of all manually delineated canopies were used as training samples, and the sigmoid kernel function was selected. The SVM algorithm performed the supervised classification and produced the crown vector image.

3.4. Canopy Extraction Based on CHM and Tree Height Estimation

Based on the CHM, the R package ForestTools (https://github.com/AndyPL22/ForestTools, Plowright, 2020) [31] was used to detect the positions and delineate the canopy areas of individual shelter forest trees. We used the Variable Window Filter (VWF), a single-tree detection function that is an LMF based on a dynamic circular moving window [32]. Since crown size differs between trees, various linear functions were needed, in which the height value of a pixel is used to estimate the radius of the search window. Following the suggestion in [21], a smaller search radius was used for narrow crowns and a larger one for wider crowns. Through repeated experiments, the search radius for each study area was selected, as shown in Table 4.
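
The idea of a variable window filter can be sketched in a few lines. This is an independent Python illustration, not the ForestTools implementation: the function name, the `min_height` cut-off, the linear window function, and the toy CHM are all assumptions. A cell is reported as a tree top when no higher cell lies within a circular window whose radius is computed from the cell's own height.

```python
def variable_window_maxima(chm, cell_size, win_fun, min_height=2.0):
    """Return (row, col, height) for cells that are local maxima.

    win_fun maps a height in metres to a search radius in metres.
    Equal-height neighbours are not suppressed in this simple version.
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            radius_cells = win_fun(h) / cell_size
            reach = int(radius_cells)
            is_top = True
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    if dr == 0 and dc == 0:
                        continue
                    if dr * dr + dc * dc > radius_cells ** 2:
                        continue  # outside the circular window
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and chm[rr][cc] > h:
                        is_top = False
                        break
                if not is_top:
                    break
            if is_top:
                tops.append((r, c, h))
    return tops


# Toy CHM (metres) with two crowns; 1 m cells keep the example readable.
chm = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 6.0, 4.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 4.0, 3.0, 3.0, 5.0, 3.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0],
]
# Hypothetical linear window function: taller cells get a wider window.
tree_tops = variable_window_maxima(chm, cell_size=1.0,
                                   win_fun=lambda h: 0.2 * h + 0.5)
```

The coefficients of the linear function control the trade-off described above: a small radius risks splitting wide crowns, and a large radius risks missing narrow ones.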

Then, the marker-controlled inverse watershed segmentation (MCWS) method was used to delineate the tree crowns from the detected crown vertices. The watershed algorithm [33], proposed by Vincent, treats the image as a topographic map in which each gray value represents the altitude of the point, each local minimum and its zone of influence represent a catchment basin, and the boundaries between basins form the watersheds. However, images with irregular noise and gradients are prone to oversegmentation, so a watershed algorithm that incorporates prior markers was devised to address this problem. The algorithm inverts the CHM, uses the crown vertices as seed points, calculates the gradient of each grid cell relative to its neighborhood, and determines the contour of the crown area. The minimum single-tree height parameter, minHeight, was set to 2 m. These operations produced the tree crown extraction results.
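
The sketch below illustrates the principle of growing crowns outward from seed points on an inverted CHM. It is a simplified priority-flood region growing written for illustration, not the ForestTools `mcws` code: cells are visited from the highest canopy downward (lowest inverted elevation first), each unlabelled 4-connected neighbour inherits the label of the cell that reaches it, and cells below `min_height` are left as background. The function name and the toy data are assumptions.

```python
import heapq


def marker_watershed(chm, markers, min_height=2.0):
    """Label crowns by flooding the inverted CHM from tree-top markers.

    markers: list of (row, col); markers[k] receives label k + 1.
    Returns a grid of labels with 0 for background.
    """
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = []
    for k, (r, c) in enumerate(markers, start=1):
        labels[r][c] = k
        # Negative height = elevation on the inverted surface.
        heapq.heappush(heap, (-chm[r][c], r, c))
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and labels[rr][cc] == 0
                    and chm[rr][cc] >= min_height):
                labels[rr][cc] = labels[r][c]
                heapq.heappush(heap, (-chm[rr][cc], rr, cc))
    return labels


# Toy CHM (metres) with two adjacent crowns and their detected tops.
chm = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 6.0, 4.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 4.0, 3.0, 3.0, 5.0, 3.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0],
]
crown_labels = marker_watershed(chm, markers=[(2, 2), (3, 5)], min_height=2.0)
```

Because every region starts from exactly one marker, the number of crowns equals the number of detected tree tops, which is how prior markers limit oversegmentation.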

The results obtained by the FMSD-OBIA method were vectorized, and the vector image was superimposed on the CHM to obtain the height values within each crown polygon. The LMF algorithm was applied to the numerical statistics of each polygon (tree crown), and the maximum value was taken as the tree height.
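
Taking the maximum CHM value inside each crown is a zonal statistic. The sketch below assumes the crown polygons have already been rasterized to a label grid aligned with the CHM (0 for background); that representation, the function name, and the sample grids are assumptions made for illustration.

```python
def crown_heights(chm, crown_labels):
    """Return {label: maximum CHM value inside that crown}."""
    heights = {}
    for chm_row, label_row in zip(chm, crown_labels):
        for height, label in zip(chm_row, label_row):
            if label != 0 and height > heights.get(label, float("-inf")):
                heights[label] = height
    return heights


# Toy CHM (metres) and a matching crown label grid.
chm = [
    [0.0, 3.0, 4.0, 0.0, 3.0],
    [0.0, 4.0, 6.0, 0.0, 5.0],
    [0.0, 3.0, 4.0, 0.0, 3.0],
]
crown_labels = [
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
]
tree_heights = crown_heights(chm, crown_labels)
```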

3.5. Accuracy Evaluation of Tree Crown and Tree Height

To further evaluate the generated crown maps, tree crowns in the multispectral images of the three areas were manually delineated by an experienced researcher, and the resulting crown maps were compared spatially with the model-generated crown maps. For simplicity, the map drawn automatically by the extraction model and the map drawn manually are called the target map and the reference map, respectively. Reference crowns near the image boundary were removed, and only the remaining crowns were used in the evaluation. According to the spatial relationship between the remaining reference crowns and the target segments, they were divided into the following five categories [34, 35]:
(a) Crown matched: the overlap between the target crown (the canopy extraction result of the model) and the reference crown exceeds 50% of the area of each.
(b) Crown nearly matched: the overlap between the target crown and the reference crown exceeds 50% of the area of only one of them.
(c) Crown missed: the overlap between the target crown and the reference crown does not exceed 50% of the area of either.
(d) Crown merged: multiple reference crowns each have more than half of their area covered by a single target crown.
(e) Crown split: multiple target segments each have more than half of their area covered by a single reference crown.
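
One possible way to apply the five categories is sketched below, with each crown represented as a set of raster cells so that areas reduce to cell counts. The order in which the rules are tested (merged, then split, then matched, nearly matched, missed) is an assumption, as are the function name and the toy crowns; the study may resolve ambiguous cases differently.

```python
def classify_reference_crowns(references, targets):
    """references, targets: dict mapping an id to a non-empty set of cells."""
    def share(a, b):
        # Fraction of crown `a` that lies inside crown `b`.
        return len(a & b) / len(a)

    # Reference crowns with more than half of their area inside each target.
    covered_by = {t: [r for r, rc in references.items() if share(rc, tc) > 0.5]
                  for t, tc in targets.items()}
    categories = {}
    for ref_id, ref_cells in references.items():
        # Targets that cover more than half of this reference crown.
        covering = [t for t, tc in targets.items() if share(ref_cells, tc) > 0.5]
        # Targets with more than half of their own area inside this reference.
        inside = [t for t, tc in targets.items() if share(tc, ref_cells) > 0.5]
        if any(len(covered_by[t]) > 1 for t in covering):
            categories[ref_id] = "merged"
        elif len(inside) > 1:
            categories[ref_id] = "split"
        elif set(covering) & set(inside):
            categories[ref_id] = "matched"
        elif covering or inside:
            categories[ref_id] = "nearly matched"
        else:
            categories[ref_id] = "missed"
    return categories


def cells(row, start, stop):
    # Helper: a horizontal run of cells in one raster row.
    return {(row, c) for c in range(start, stop)}


references = {"A": cells(0, 0, 4), "B": cells(2, 0, 4), "C": cells(4, 0, 2),
              "D": cells(4, 2, 4), "E": cells(6, 0, 2), "F": cells(8, 0, 4)}
targets = {"T1": cells(0, 0, 3), "T2": cells(2, 0, 2), "T3": cells(2, 2, 4),
           "T4": cells(4, 0, 4), "T5": cells(8, 0, 1)}
categories = classify_reference_crowns(references, targets)
```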

Crown matched and crown nearly matched are considered correct crown extraction results and recorded as True Positive (TP); crown missed and crown merged are considered omission errors and recorded as False Negative (FN); and crown split corresponds to commission errors and is recorded as False Positive (FP). The crown extraction recall rate (recall), accuracy rate (precision), and F-score are then defined as follows [36]:

$$\text{recall} = \frac{TP}{TP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP}, \qquad F\text{-score} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}.$$
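
As a small worked example of these three measures, the sketch below computes them from counts of TP, FN, and FP using the standard definitions. The counts are invented for illustration and do not come from Table 5.

```python
def crown_scores(tp, fn, fp):
    """Return (recall, precision, f_score) from crown counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * recall * precision / (recall + precision)
    return recall, precision, f_score


# Hypothetical counts: 80 correct crowns, 20 missed or merged, 10 split.
recall, precision, f_score = crown_scores(tp=80, fn=20, fp=10)
```

With these counts the recall is 0.80, the precision is about 0.89, and the F-score is about 0.84, which shows how the F-score sits between the two and is pulled towards the lower one.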

To evaluate the tree height estimation, linear regression was applied to the model estimates and the field measurements, and the coefficient of determination ($R^2$) and Root Mean Square Error (RMSE) were used to quantify the accuracy. $R^2$ ranges between 0 and 1, and a larger value indicates a better fit. The RMSE measures the deviation between the predicted and measured values, and a smaller value indicates a smaller error and better prediction. $R^2$ and RMSE are calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2},$$

where $\hat{y}_i$ is the value predicted by the tree height estimation model for sample $i$, $y_i$ is the measured value of sample $i$ of the shelter forest, $\bar{y}$ is the mean of the measured values, and $n$ is the total number of samples.
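
Both measures are straightforward to compute from paired measured and predicted heights. The sketch below assumes the common residual-based form of the coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares); the sample heights are invented for illustration.

```python
import math


def r2_and_rmse(measured, predicted):
    """Return (R squared, RMSE) for paired measured and predicted values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))
    ss_tot = sum((y - mean_measured) ** 2 for y in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical field-measured and estimated tree heights in metres.
measured = [4.0, 5.0, 6.0]
predicted = [3.8, 5.1, 5.7]
r2, rmse = r2_and_rmse(measured, predicted)
```

Note that this residual-based value can differ from the squared correlation of a fitted regression line when the estimates are biased, for example when heights are systematically underestimated.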

4. Results

4.1. Analysis of the Results of Extracting the Crown of the Individual Shelterbelt

The canopy extraction results of the different methods for the three selected areas are shown in Figure 5 and were assessed by combining field sampling, photos, and visual inspection. With the combination of LMF and MCWS in study area 1, some tree crowns were split into two (oversegmentation) and some pairs of tree crowns were merged into one (undersegmentation). Oversegmentation also appeared in study areas 2 and 3, where many crowns were also missed. The FMSD-OBIA method effectively resolved the oversegmentation and undersegmentation in study area 1; undersegmentation still appeared in study areas 2 and 3, but it was reduced. Furthermore, the incidence of crown omission also decreased. Overall, the FMSD-OBIA method performed better than the traditional combination of LMF and MCWS, and crown extraction performance was improved.

To quantitatively evaluate the accuracy of the different canopy extraction methods for individual canopies of the shelter forests in the study areas, we calculated the relevant evaluation indexes, as shown in Table 5. The FMSD-OBIA method achieved good results in the three study areas: the average F-score was above 0.89, whereas the average F-score of the combination of LMF and MCWS did not exceed 0.8, with values between 0.75 and 0.79, indicating that the crowns extracted by the FMSD-OBIA method were more consistent with the actual shelter forest crowns.

For study area 1, the sizes of the individual tree crowns in the dense forest area were inconsistent, which posed a significant challenge to the method of combining LMF and MCWS, and resulted in some individual trees with small crowns missed or merged, with the healthy tree recall value of 0.74 and the dead tree recall value of 0.70. Furthermore, there were multiple local maxima in broad-leaved forests (such as Ulmus pumila and Elaeagnus angustifolia), causing individual trees to be divided into multiple trees. A total of 40 healthy trees and 19 dead trees were subject to oversegmentation, with precision values of 0.83 and 0.84, respectively. The FMSD-OBIA method effectively reduced the occurrence of these two phenomena: the recall values increased by 0.12 and 0.16, and the oversegmentation phenomenon decreased by 16 and 13 trees, respectively, hence resulting in increases in the precision values by 0.08 and 0.11.

The healthy trees in study area 2 were compact but sparse and consisted mostly of Populus bolleana. In addition, the canopy area was small: in the multispectral images with 0.2 m spatial resolution, most crowns covered 20–50 pixels, corresponding to a canopy area of 0.8–2 m². Oversegmentation was clearly reduced for all methods, and the precision value for healthy trees was above 0.93. Relative to the combination of LMF and MCWS, the recall values of healthy and dead trees increased by 0.13 and 0.20, and the F-scores increased by 0.09 and 0.16, respectively, which demonstrates that the FMSD-OBIA method effectively avoided missed and merged trees.

There were more dead trees than healthy trees in study area 3, which also had numerous shrubs under the trees, so it was difficult to distinguish trees using height information alone. Multispectral images combined with height information were used for object-based crown extraction, which effectively removed shrubs from the crown images. The reduced adhesion and merging improved the recall and precision values, which increased by 0.23 and 0.07, respectively, indicating that the spectral information played a key role in extracting the canopy.

4.2. The Accuracy Evaluation of the Tree Height Estimation Model

Individual tree height was extracted using the method proposed in this paper and compared with the field-measured tree heights by linear fitting in study areas 1 to 3 (Table 6). The RMSE values differed among the three types of shelter forest. The RMSE of study area 3, which contained dead trees, was the highest, at 1.03 m. The RMSE of the mixed forest in study area 1, which contained multiple tree species and a denser shelter forest, was the second highest, at 0.68 m. The RMSE of study area 2 was the lowest, at 0.30 m, indicating that the leafless dead tree area resulted in a larger error. We also compared the average measured and estimated tree heights in each study area. In study area 1, the average measured tree height was 8.02 m and the average estimated tree height was 7.6 m. In study area 2, the values were 4.01 m and 3.7 m, and in study area 3, they were 5.77 m and 5.1 m. These results show an overall underestimation of tree height by the CHM. The degree of underestimation differed among the types of shelter forest. The best estimation was obtained in the vigorous, sparse forest, with a high correlation nearly matched by that obtained in the mixed forest, and the weakest correlation was obtained in the sparse forest with many dead trees ($R^2$ values in Table 6), which shows that the ability of photogrammetric point clouds to reconstruct the canopy differs among shelter forest types.

Because we did not collect ground points, the feasibility of RBFNN interpolation was verified by comparison: tree height was also estimated using a DEM built from the original point cloud without RBFNN interpolation and using a DEM generated directly by Pix4D software. It can be seen from Table 6 that the tree height estimates obtained after RBFNN interpolation were, overall, more accurate than those of the other two methods. Without RBFNN interpolation, the error increased and the correlation decreased, which demonstrated the effectiveness of RBFNN interpolation. RBFNN interpolation can be used to address the low DEM accuracy of the photogrammetric point cloud caused by large canopies. The canopy coverage of study areas 2 and 3 was small. After RBFNN interpolation, the R² and RMSE were better than those of the other two methods (DEM generated without RBFNN interpolation and DEM generated directly by Pix4D software), which showed that the RBFNN was effective and feasible for photogrammetric point clouds. On the whole, the tree height estimation results of RBFNN interpolation were more accurate and efficient.
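The role of the interpolated DEM can be illustrated with a small sketch. The study fits the DEM with an RBFNN. The code below instead uses plain Gaussian radial basis function interpolation with no network training, solved by Gaussian elimination, purely to show how ground elevation under the canopy can be predicted from filtered ground points and subtracted from the DSM. All function names, the kernel width `epsilon`, and the sample coordinates are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (duplicate ground points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gaussian_kernel(p, q, epsilon):
    """Gaussian radial basis function of the distance between p and q."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / epsilon ** 2)


def fit_ground_surface(ground_xy, ground_z, epsilon=8.0):
    """Return a function (x, y) -> interpolated ground elevation (the DEM)."""
    mean_z = sum(ground_z) / len(ground_z)
    a = [[gaussian_kernel(p, q, epsilon) for q in ground_xy] for p in ground_xy]
    weights = solve_linear_system(a, [z - mean_z for z in ground_z])

    def dem(x, y):
        return mean_z + sum(
            w * gaussian_kernel((x, y), q, epsilon)
            for w, q in zip(weights, ground_xy)
        )

    return dem


# Hypothetical filtered ground points (x, y in m) and elevations (m).
ground_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
ground_z = [500.0, 500.4, 500.2, 500.6]
dem = fit_ground_surface(ground_xy, ground_z)

# CHM value at a canopy location = DSM elevation minus interpolated ground.
dsm_at_crown = 506.3
chm_at_crown = dsm_at_crown - dem(5.0, 5.0)
print(round(chm_at_crown, 2))
```

Because the interpolant passes exactly through the ground points, any error in the CHM at a crown comes from how well the surface predicts the ground in the gaps hidden by the canopy, which is where the interpolation methods compared above differ.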

5. Discussion

5.1. Analysis of the Results of Extracting the Crown of the Individual Shelterbelt

Due to the discontinuous change of the gray level of the pixels in the CHM, the segmentation process generates small segmentation units, and oversegmentation and undersegmentation often occur. Combining spectral information with the CHM and extracting the crown with the FMSD-OBIA method, which treats the object as the basic unit, is a better approach than the traditional methods for extracting the crowns of a shelter forest. It effectively reduces the misclassification of shelter forest crown pixels as other feature categories. Thus, with more relevant information (such as spectrum and height) available for canopy extraction, the accuracy of the proposed model is improved, which is consistent with the research results in [37].

The crown extraction results of the combined LMF and MCWS method differed greatly among the three study areas. The canopy extraction accuracy of the sparse forest in study area 2 was better than that of the mixed dense forest in study area 1, but the sparse forest in study area 3 gave the worst result. The main reason is the large difference in the characteristics of the three types of shelter forest. In study area 3, there were many withered (dead) trees with a small coverage area; thus, the canopy vegetation features were not obvious. In addition, the shelter forest had large row spacing and low density, which also made the canopy difficult to extract. When a canopy is too small to extract, omission errors become more serious. By comparison, the sparse forest in study area 2 also had large row spacing and low density, but its crown widths were adequate, which made it easier to distinguish the crowns from the background and resulted in better extraction accuracy.

In the mixed forest in study area 1, the large differences in crown size made it difficult to determine the search radius of the VWF function during individual tree detection, causing oversegmentation and omission errors for trees that were extremely large or extremely small. The accuracy of the sample-based object-oriented method was higher than that of the combined LMF and MCWS method in all three study areas. The canopy extraction results were ideal: the canopy edges were better identified and the tree crown information was completely described. In both the sparse forest and the mixed forest, oversegmentation and undersegmentation were reduced, and the overall accuracy of the FMSD-OBIA method was higher in the sparse forest than in the mixed dense forest. The dead trees with smaller crown areas were extracted more accurately than the mixed dense forest because, once spectral information was added, the spectrum of the dead trees differed from that of the surrounding environment and the two could be effectively distinguished. However, the best results cannot be obtained using spectral information alone, which is 11%–14% lower than the canopy extraction results of the study in [38].
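The sensitivity to the search radius described above can be shown with a minimal local-maximum treetop detector in which the window radius varies with height. This is only a sketch in the spirit of a variable window filter: the linear radius rule, the minimum height, the function names, and the toy CHM grid are all hypothetical and are not the settings used in the study.

```python
def detect_treetops(chm, cell_size, radius_for_height, min_height=2.0):
    """Return (row, col, height) for CHM cells that are the maximum within
    a circular search window whose radius (m) depends on the cell height."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue  # ignore ground and low vegetation such as shrubs
            reach = int(radius_for_height(h) / cell_size)  # radius in cells
            is_top = True
            for rr in range(max(0, r - reach), min(rows, r + reach + 1)):
                for cc in range(max(0, c - reach), min(cols, c + reach + 1)):
                    if (rr - r) ** 2 + (cc - c) ** 2 > reach ** 2:
                        continue  # outside the circular window
                    other = chm[rr][cc]
                    # A taller neighbour, or an equal one earlier in scan
                    # order (tie-break), disqualifies this cell.
                    if other > h or (other == h and (rr, cc) < (r, c)):
                        is_top = False
            if is_top:
                tops.append((r, c, h))
    return tops


# Toy CHM (heights in m, 1 m cells): a tall tree and a small isolated tree.
chm = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 0.0, 0.0],
    [0.0, 4.0, 8.0, 3.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.5],
]

# Hypothetical height-dependent radius versus a fixed, overly large radius.
variable = detect_treetops(chm, 1.0, lambda h: 1.0 + 0.1 * h)
fixed = detect_treetops(chm, 1.0, lambda h: 3.0)
print(variable)  # [(2, 2, 8.0), (4, 4, 3.5)]
print(fixed)     # [(2, 2, 8.0)]
```

With the large fixed radius, the small tree falls inside the window of the tall tree and is omitted, which is the kind of omission error that arises when one radius must serve crowns of very different sizes.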

In general, the FMSD-OBIA method for crown extraction can effectively solve the problems of oversegmentation and undersegmentation. However, the study still has some limitations. Although the object-oriented method had high precision, its degree of automation was not ideal: the threshold had to be set manually and was usually found only after a number of attempts. Different parameters also had to be set for different types of shelter forest. In mixed forests especially, it was almost impossible to delineate the different canopy areas with accurate boundary information, which contributed to the difference between the extracted canopy areas and the actual edges. Therefore, a single optimal segmentation scale was not suitable for extracting tree crowns with large differences in crown area in the same image.

5.2. Analysis of Differences in the Tree Height Estimation of Different Shelter Forest Types

In using the UAV photogrammetric point cloud to estimate the height of individual trees in different types of shelter forest, the usability of the DEM generated by RBFNN interpolation was confirmed. This approach has high estimation accuracy and can be implemented economically. The photogrammetric point cloud can thus stand in for a point cloud obtained by LiDAR or laser scanner. Existing studies have shown that DEM accuracy is affected by vegetation coverage, slope, and interpolation algorithms [39]. The shelterbelt was located on the edge of a desert, where the terrain was relatively flat and the vegetation coverage was low. The data were acquired during the transition between autumn and winter. Studies in [40, 41] have shown that the DEM is most accurate in winter, when vegetation coverage is low. The CHM derived from the RBFNN-interpolated DEM can therefore theoretically give a more accurate tree height estimation. Comparing the average measured and estimated tree heights showed that tree height was underestimated overall (by between 0.3 and 0.7 m). Nuijten et al. [42] indicated that tree height may be underestimated when leaves have fallen. After the frost in October in the northern Xinjiang region, the leaves of the shelterbelt began to yellow and gradually fell, and the vegetation coverage of the canopy diminished, which led to the overall underestimation of tree heights.

The difference in tree height estimation among shelterbelt types was mainly due to the acquisition and reconstruction of photogrammetric point clouds. The reconstruction of the 3D canopy point cloud from UAV image data is affected by many factors [13, 39, 43], including the UAV platform, sensors, image acquisition parameters, and shelter forest type. This study focused on differences in the ability to reconstruct photogrammetric point clouds in different types of shelter forest. To prevent the other factors from affecting the reconstruction, the image data selected were of the same kind and were acquired on the same day with the same flight platform, overlap rate, and lighting conditions. Through this research, it was found that the tree species and health status of the shelter forest affected the tree height estimation differently. For dead trees without leaves (Populus bolleana, sparse forest), the RMSE was 1.03 m, and the correlation with the measured data was lower than that of the other types. The dense mixed forest had a lower RMSE than the sparse forest with more dead trees. The crown area of dead trees was significantly smaller than that of healthy trees, and the vegetation information was not obvious, which made point cloud reconstruction difficult. Generally speaking, a higher point cloud density captures more canopy information, which aids the reconstruction of the canopy and provides higher tree height estimation accuracy [44]. Dead trees had a small canopy and a limited number of points. The lower canopy coverage results in fewer features that can be extracted, which reduces the ability to reconstruct the point cloud. This resulted in a higher tree height estimation error, which is consistent with the research results in [22].

6. Conclusions

This study proposed a novel method of individual tree height estimation based on fusion of an airborne multispectral image and photogrammetric point cloud and selected three areas in the shelter forest for verification. The DEM generated by RBFNN interpolation met the requirement of estimating tree height, which confirmed that the photogrammetric point cloud obtained by a CW-20 fixed-wing UAV equipped with a SONY-A7RII camera has significant potential for the estimation of tree height. The coverage and health condition of shelter forest canopies had a certain influence on the reconstruction of photogrammetric point clouds. Dead trees had a small canopy area and none of the physiological characteristics of healthy vegetation, so the SfM algorithm extracted fewer features from the images. The FMSD-OBIA method, applied to the combination of 12 bands and the CHM, added spectral information on the shelterbelt forest canopy, effectively reduced oversegmentation and undersegmentation, increased the F-score by 0.12–0.17, and improved the accuracy of canopy extraction. The proposed method is effective for estimating the height of individual trees of shelter forests in a desert area but nonetheless requires further improvement. The degree of automation of object-oriented methods is not ideal, and there is room to improve the accuracy and speed of canopy extraction, for which deep learning methods could be substituted. Moreover, the two types of forest stand data, from dense and sparse forest areas, can be used to build a tree height growth model based on multiperiod image data, which will be helpful to conduct health assessments to protect the desert forest and provide an important reference for maintenance and replacement.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to acknowledge the support from colleagues of the Geospatial Information Engineering Research Center, Xinjiang Production and Construction Corps, especially Yongjian Ma, Wenzhong Tian, and Xiang Long. This study was financially supported by the Xinjiang Production and Construction Corps Science and Technology Project (2017DB005); the Geospatial Information Engineering Research Center to Create, Xinjiang Production and Construction Corps (2016BA001); and the Central Government Directs Local Science and Technology Development Special Funds (201610011).