Abstract

The traditional remote sensing image segmentation approach applies a single set of parameters to the entire image. However, because objects are scale dependent, segmentation parameters that are optimal for the overall image may not suit all objects. Following the idea of spatial dependence, objects of the same kind, which share similar spatial scales, often aggregate and form a scene. Based on this observation, this paper proposes a stratified object-oriented image analysis method based on remote sensing image scene division. The method first uses mid-level semantic features, which reflect an image's visual complexity, to divide the remote sensing image into different scenes; within each scene, an improved grid search algorithm then optimizes the segmentation result, so that a scale as close as possible to the optimum is adopted for each scene. Because stratified processing effectively reduces data complexity, local scale optimization safeguards the overall classification accuracy of the whole image, which is practically meaningful for remote sensing geo-applications.

1. Introduction

GEOBIA has become the mainstream method for processing high spatial resolution remote sensing images [1, 2]. Spatial dimensions are crucial to GEOBIA methods [3], and scale greatly influences a remote sensing image's object-oriented classification. However, due to the complexity of feature types, there is no single optimal scale suitable for all objects [4–6]; scale remains a problem to be solved in image segmentation [7]. Segmentation quality is limited when parameters are set from the user's experience [8], and an optimization algorithm that determines the optimal segmentation parameters for the overall image yields only a compromise among all objects.

Different objects or geographic phenomena have inherent spatial and temporal scales [9], and recognizing complex patterns in high-resolution images is increasingly difficult [10]. To extract objects or separate them from their surroundings, the processing scale (segmentation scale) needs to be set close to the objects' spatial scales [11]. Scale selection is the key to object-based image analysis, and selecting an inappropriate scale causes over-segmentation or under-segmentation [12]. This reduces the accuracy and efficiency of multiscale information extraction from high spatial resolution images [13–15]. Many methods have been used to select optimal parameters for multiscale segmentation [16–25]; however, segmentation parameters that are optimal for an overall image may not suit different objects when processing large heterogeneous images [26, 27]. A key unresolved issue is determining a segmentation scale that allows different objects and phenomena to be characterized in a single image [28, 29]. Observations, however, indicate a tendency: objects of the same type often have similar spatial scales and often aggregate in the same area. It is therefore feasible to divide the overall image into different scenes and then use an optimization algorithm to segment each scene image into image objects, which improves the overall segmentation quality. Unlike conventional scene classification, which aims to determine the class attribute of an image [30–32], the scene division discussed in this article aims to divide one whole image into several scenes. Methods used to divide remote sensing images into scenes fall roughly into three categories: hand boundary tracing, featuring layer's threshold segmentation, and segmentation- or classification-based scene division.

The ordinary hand boundary tracing method [33–35] delineates scene boundaries based on color composition or differences between feature values, such as brightness and NDVI. This method can ensure that the result meets the user's subjective requirements, but it suffers from operator subjectivity and is highly time-consuming [36].

The featuring layer's threshold segmentation method chooses one feature, such as brightness or NDVI, to roughly divide the image into several scenes by setting thresholds [37, 38]. For example, the NDVI values of a plant coverage scene and a nonplant coverage scene differ, so the image can be roughly divided into several scenes using a defined threshold value, as sketched below. In this method, the threshold strongly influences the result, and it is often selected using sample statistics or random samples; therefore, both the threshold and the samples used for statistics affect the division results.
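
As a minimal illustration of this category of methods, the sketch below masks plant and nonplant scenes by thresholding an NDVI layer; the threshold of 0.3 is a hypothetical example value, not one taken from the cited studies:

```python
import numpy as np

def ndvi_scene_mask(nir, red, threshold=0.3):
    """Divide an image into plant/nonplant scenes by thresholding NDVI.

    nir and red are float band arrays; the 0.3 threshold is a
    hypothetical value -- in practice it would be chosen from sample
    statistics, as discussed above.
    """
    ndvi = (nir - red) / (nir + red + 1e-10)  # small epsilon avoids /0
    return ndvi > threshold  # True: plant scene; False: nonplant scene
```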

The segmentation-based scene division method combines two ideas: one is to set large-scale parameters in image segmentation to obtain large objects whose sizes are close to scenes [39, 40], and the other is to merge small objects to form large scenes [41]. Software such as eCognition, SPRING, and MAGIC also provides image segmentation and classification operations [41], but the segmentation result is easily influenced by linear objects such as roads and rivers, so even when the coverage is the same, one desired scene may be separated into two or more scenes.

Additionally, an image can be classified into scenes using texture, brightness, or NDVI [42, 43], but this is a simple classification operation. For example, it divides the image into plant and nonplant scenes, lit and shaded scenes, or rock and nonrock scenes. This method may need training samples, so it only performs well on specific images and lacks universality, which limits its application.

In summary, the described methods have many problems: some are inefficient, suitable only for certain types of images, influenced by subjective factors, or produce results that do not meet requirements. Therefore, a new method incorporating mid-level semantic features (entropy, homogeneity, and mean) to divide a remote sensing image into different scenes is proposed. This method is not influenced by subjective factors and is suitable for most types of images, because the hue value and its texture can be calculated for almost every type of image. The results show that this method can effectively improve classification accuracy when combined with segmentation parameter optimization methods, such as an improved grid search algorithm.

2. Methods

2.1. Scene Structure and Scale Dependence in Remote Sensing Image

Combining the scale effect of remote sensing with the geographic concept of scene structure may provide a breakthrough for the scale problem [44]. Scene structure is the composition and arrangement of geographic units of different scales within a certain geographical area. The spatial pattern of a geographic entity or phenomenon often exhibits a certain degree of scale dependence, so observing the same objects over different time spans and spatial ranges may yield different results or conclusions [44]. Different scene structures have different visual complexities, and more objects in a scene lead to a more complex scene. The scale of interest in this study is the segmentation scale: to obtain a high-precision segmentation result, the segmentation scale needs to be close to the inherent spatial scales of the geographic units.

2.2. The Principle of Stratified Segmentation

A scene is bounded by land planning or grouped by economic influence; the type and distribution pattern of one type of object within a scene are similar, but the scene structure may differ between scenes. Therefore, each scene has its own suitable segmentation parameters. Most segmentation methods and parameter optimization algorithms aim to determine the best result for an overall image, but this is a compromise among different objects and is unsuitable for many of them. In this study, a stratified object-oriented image analysis based on remote sensing image scene division is proposed. This method breaks the complex entire image down into several scenes with simple spatial structures (Figure 1). Objects with similar colors have similar hue values, so features such as the hue value can be used to divide the image into scenes. Moreover, different scenes' visual complexity and structure may also differ, so the texture of the hue layer can be used to reflect them. While the mean reflects the main hue (main object) of a scene, entropy and homogeneity reflect the scene structure. According to entropy and homogeneity, the image can be divided into single coverage type scenes and complex coverage type scenes, and according to the mean value, the single coverage type scenes can be further divided into several feature-dominant scenes (a sketch of this logic follows). By using parameter optimization methods to segment each scene individually, every scene's final segmentation scale becomes as close as possible to the geographic units' inherent spatial scale.
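
The following is a minimal sketch of this two-step scene typing logic; the cutoff values, hue ranges, and class names are hypothetical placeholders for illustration, since the paper does not prescribe fixed values:

```python
def label_scene(entropy_mean, homogeneity_mean, hue_mean,
                entropy_cutoff=0.6, homogeneity_cutoff=0.5):
    """Two-step scene typing sketch (all cutoffs are hypothetical).

    Step 1: entropy and homogeneity separate complex coverage scenes
    from single coverage scenes. Step 2: the mean hue assigns a
    single coverage scene to a dominant-feature class.
    """
    if entropy_mean > entropy_cutoff and homogeneity_mean < homogeneity_cutoff:
        return "complex coverage scene"
    # Single coverage: assign by dominant hue (HSV hue scaled to [0, 1]).
    if hue_mean < 0.2:
        return "building/bare land dominant scene"
    elif hue_mean < 0.5:
        return "plant dominant scene"
    return "water dominant scene"
```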

2.3. Segmentation Parameters Optimization Based on an Improved Grid Search Algorithm

An improved grid search algorithm was used to optimize the segmentation parameters. The grid search algorithm (GSA) builds a grid over two parameters within a certain range and finds an optimized parameter set by traversing all crossings in the grid. In this process, all combinations of parameters are traversed; given a large enough parameter range and a short enough step size, the method finds the global optimal solution and the optimal combination of parameters at the same time. However, this is time intensive. To improve the efficiency of GSA for parameter optimization, an improved GSA (IGSA) is proposed: first, an approximate optimal solution is obtained using a large scale and step size; then, one of the parameters is fixed, and a small step size is used to search the other parameter within a narrow range around the approximate optimum. Usually, this improved method centers on an approximate optimal combination and expands along the crossing directions [45]. Therefore, the first selection of the step size is particularly important for grid searching with expanding crossing directions.
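
The sketch below illustrates this coarse-to-fine procedure; `evaluate(scale, shape)` stands in for any segmentation quality measure (e.g., the Kappa coefficient of the resulting classification), and the ranges and step sizes are illustrative assumptions rather than the values used in the experiments:

```python
import numpy as np

def improved_grid_search(evaluate,
                         scale_range=(100, 1000), shape_range=(0.1, 0.9),
                         coarse_steps=5, fine_step_scale=10.0,
                         fine_step_shape=0.05):
    """Coarse-to-fine grid search (IGSA) sketch.

    evaluate(scale, shape) must return a quality score (higher is
    better). Ranges and step sizes are illustrative assumptions.
    """
    # Step 1: coarse pass -- traverse a sparse grid over both parameters.
    scales = np.linspace(*scale_range, coarse_steps)
    shapes = np.linspace(*shape_range, coarse_steps)
    _, best_scale, best_shape = max(
        (evaluate(sc, sh), sc, sh) for sc in scales for sh in shapes)

    # Step 2: fix shape, refine scale with a small step near the optimum.
    step = scales[1] - scales[0]
    fine = np.clip(np.arange(best_scale - step, best_scale + step,
                             fine_step_scale), *scale_range)
    _, best_scale = max((evaluate(sc, best_shape), sc) for sc in fine)

    # Step 3: fix scale, refine shape the same way.
    step = shapes[1] - shapes[0]
    fine = np.clip(np.arange(best_shape - step, best_shape + step,
                             fine_step_shape), *shape_range)
    _, best_shape = max((evaluate(best_scale, sh), sh) for sh in fine)
    return best_scale, best_shape
```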

3. Experiments and Analysis

3.1. Experimental Data

To test the method's robustness, two study areas were selected. The first is a QuickBird pansharpened image (Image A) of Hualien City, Taiwan, China (Figure 2); its size is 12000 × 12000 pixels, with a resolution of 0.7 m per pixel. The primary land cover types in this image are buildings, plants, bare land, roads, and water. The second is a QuickBird multispectral image (Image B) of the Alma Cray area (a copper mine) in Uzbekistan (Figure 2); it has a size of 3400 × 3400 pixels and a resolution of 2.4 m per pixel. The cover types are buildings, plants, bare land, mines, and water.

3.2. Scene Division: The First Step of Stratified Segmentation

As the processing steps in Figure 1 show, after preprocessing, the near-infrared, red, and green bands were selected for RGB color synthesis in both study areas. The image was then transformed from RGB color space to HSV color space. The hue layer values represent the coverage colors, and similar colors have numerically similar hue values. The calculation windows should be smaller than object sizes but large enough to capture object features; on this basis, eight texture layers representing the hue layer's characteristics were obtained. The hue values reflect the color differences among scenes. Because the goal is scene division, the texture values of different scenes are represented with different gray values. Most texture measures within a given group are strongly correlated: homogeneity, dissimilarity, variance, and contrast are strongly correlated, and entropy is strongly correlated with the second moment [46]. For scene division, the differences between scenes need to be magnified, so in the texture layers, the values of different scenes need to be distributed in different ranges. Therefore, the entropy, homogeneity, and mean layers and the HSV layers were chosen to combine with the original image to produce an integrated image for scene division. Different scenes' main colors differed, and the boundaries in these images are more pronounced than in the original image.
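
A minimal sketch of this layer-generation step is shown below, using scikit-image (version 0.19 or later for the `graycomatrix` spelling); the window size and gray-level quantization are assumptions for illustration, and only three of the eight measures (entropy, homogeneity, and GLCM mean) are computed:

```python
import numpy as np
from skimage.color import rgb2hsv
from skimage.feature import graycomatrix

def hue_texture_layers(rgb, window=15, levels=32):
    """Compute the hue layer and per-window GLCM entropy/homogeneity/mean.

    rgb is an (H, W, 3) float array in [0, 1] (e.g., the NIR-R-G
    composite rescaled); window and levels are illustrative assumptions.
    """
    hue = rgb2hsv(rgb)[..., 0]                 # hue layer in [0, 1]
    q = (hue * (levels - 1)).astype(np.uint8)  # quantize for the GLCM
    rows, cols = q.shape[0] // window, q.shape[1] // window
    ent = np.zeros((rows, cols))
    hom = np.zeros((rows, cols))
    mean = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = q[i * window:(i + 1) * window,
                      j * window:(j + 1) * window]
            glcm = graycomatrix(patch, [1], [0], levels=levels,
                                symmetric=True, normed=True)[:, :, 0, 0]
            p = glcm[glcm > 0]
            ent[i, j] = -np.sum(p * np.log2(p))                # entropy
            ii, jj = np.indices(glcm.shape)
            hom[i, j] = np.sum(glcm / (1.0 + (ii - jj) ** 2))  # homogeneity
            mean[i, j] = np.sum(ii * glcm)                     # GLCM mean
    return hue, ent, hom, mean
```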

The eCognition multiresolution segmentation has proven to be a superior method at present [21]; thus, it was used for scene division and subsequent scene image segmentation. The method has three parameters: scale, shape, and compactness. The experimental parameters for image A were scale 1000, shape 0.1, and compactness 0.5; the parameters for image B were scale 1500, shape 0.1, and compactness 0.5. The bands chosen for image A's scene division were near-infrared, hue layer, mean layer, homogeneity layer, and entropy layer with a weighting of 1 : 1 : 1 : 1 : 1. The bands chosen for image B's scene division were blue, green, red, near-infrared, hue layer, mean layer, homogeneity layer, and entropy layer with a weighting of 1 : 1 : 1 : 1 : 2 : 2 : 2 : 2, which weights the texture layers more heavily than the spectral bands. Figure 3 shows the scene division results after segmenting the image with these parameters and merging the fragmented scenes. The overall image A was divided into six scenes, named according to their dominant characteristics: low covered building, high covered building, low covered plants, high covered plants, and ocean (Figure 3). The clouds had been removed from the image; therefore, the division result does not include a cloud scene. The overall image B was divided into a city scene, a mineral scene, and two low covered plants scenes (Figure 4).
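
The band weighting above was configured inside eCognition; as a hedged illustration of the same idea outside that software, the sketch below simply stacks normalized layers scaled by the 1 : 1 : 1 : 1 : 2 : 2 : 2 : 2 weights used for image B, producing a multichannel array that any multiband segmenter could consume:

```python
import numpy as np

def weighted_stack(layers, weights):
    """Stack normalized layers scaled by segmentation weights.

    layers is a list of 2-D arrays (for image B: blue, green, red, NIR,
    hue, mean, homogeneity, entropy); weights mirrors eCognition's band
    weighting, e.g., [1, 1, 1, 1, 2, 2, 2, 2].
    """
    norm = [(l - l.min()) / (l.max() - l.min() + 1e-10) for l in layers]
    return np.dstack([w * l for w, l in zip(weights, norm)])
```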

3.3. Image Segmentation and Classification

The segmentation result greatly influences the subsequent classification, so the classification accuracy can, to a certain extent, reflect the quality of the segmentation [47]. Therefore, the classification result is used to evaluate the segmentation result in this study. Comparative experiments were set up to verify the effectiveness of the scene division-based object-oriented image analysis method. Except for the scene division, all other processes in the two sets of experiments are the same; both the overall image and the scene images use the same classification and test samples.

Tables 1–4 show the numbers of classification and test samples. A larger number of classification features requires longer computational time [48], so only brightness, NDVI, NDWI, and the shape index were used as classification features. The GSA was used to obtain optimal segmentation results for the different scenes.
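
A minimal sketch of these per-object features is shown below; it assumes a label image from segmentation and float band arrays, and the shape index is omitted because its exact formulation depends on the toolchain:

```python
import numpy as np

def object_features(labels, blue, green, red, nir):
    """Per-object classification features: brightness, NDVI, NDWI.

    labels is a segmented label image; band arrays are floats of the
    same shape. The shape index is omitted here because it depends on
    object perimeter extraction, which varies by toolchain.
    """
    ndvi = (nir - red) / (nir + red + 1e-10)
    ndwi = (green - nir) / (green + nir + 1e-10)
    brightness = (blue + green + red + nir) / 4.0
    feats = []
    for k in np.unique(labels):
        m = labels == k
        feats.append([brightness[m].mean(), ndvi[m].mean(), ndwi[m].mean()])
    return np.array(feats)  # one row of features per object
```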

Figure 5 shows the Kappa coefficients [49] of the overall image and the scene images of image A for different parameters, and Figure 6 shows the same content for image B. The optimal segmentation parameters are marked in the figures. The following conclusions are drawn from these results. First, both the overall and scene results show that classification accuracy is significantly affected by the segmentation parameters. Second, different images' optimal classification results correspond to different segmentation parameters, which indicates that dividing the overall image into different scenes is necessary. The optimal segmentation parameters of the four scene images and of the overall image are all different, which means that the overall image's optimal segmentation parameters are only a compromise among all the different objects. It is therefore unsuitable to segment all kinds of objects with one set of parameters, and dividing the image into several scenes occupied by different objects can reduce the influence of scale effects on image segmentation as much as possible.

3.4. Classification Comparison between the Stratified and Ordinary Segmentation Result

Tables 5 and 6 show image A's and image B's optimal segmentation parameters for each scene and for the overall image; the accuracies were calculated using (1). The last line shows the merged result of the five scene images, for which no single optimal segmentation parameter is given. Compared with the overall image, the accuracy of the merged image A, which was produced by combining the scene images, increased by 8.70% according to (2), and the Kappa coefficient increased by 9.70%. In Table 6, compared with the overall image, the accuracy of the merged image B increased by 11.20%, and the Kappa coefficient increased by 21.12%. This improvement indicates that the proposed stratified segmentation method can improve segmentation accuracy and reduce the scale effect on classification results.

$$A = \frac{1}{N}\sum_{i} n_{ii} \quad (1)$$

where $A$ is the value of accuracy, $N$ is the number of reference samples, $i$ is the index of classes, and $n_{ii}$ is the number of objects classified into class $i$ where the reference category is also class $i$.

$$I = \frac{V_{\mathrm{merged}} - V_{\mathrm{overall}}}{V_{\mathrm{overall}}} \times 100\% \quad (2)$$

where $I$ is the improvement in accuracy or Kappa coefficient, $V_{\mathrm{merged}}$ is the accuracy or Kappa coefficient of the merged image, and $V_{\mathrm{overall}}$ is the corresponding value of the overall image.
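
The sketch below is a plain illustration of (1), the Kappa coefficient, and (2) computed from a confusion matrix; it is not the authors' evaluation code:

```python
import numpy as np

def accuracy(conf):
    """Eq. (1): correctly classified objects over reference samples."""
    return np.trace(conf) / conf.sum()

def kappa(conf):
    """Cohen's Kappa from a confusion matrix."""
    n = conf.sum()
    po = np.trace(conf) / n                          # observed agreement
    pe = (conf.sum(0) * conf.sum(1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

def improvement(v_merged, v_overall):
    """Eq. (2): relative gain of the merged (stratified) result."""
    return (v_merged - v_overall) / v_overall * 100.0
```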

The overall and scene image classification results for image A are shown in Figure 7, and those for image B are shown in Figure 8. The building classification results based on image scene division (Figure 7) provide more detail than the overall image. Compared with other objects, the building-dominated scene needs a smaller scale, as reflected in Table 5. The building classification is more sensitive to the segmentation parameters than the other classes, and unlike the other classes, higher building accuracy appears only at smaller scale parameters. For image B, the difference between the overall and scene images is even clearer. The city scene (Figure 6) has a suitable scale much smaller than those of the other scenes, and even the two low covered plants scenes have different appropriate scales (Table 6). However, because the spectral features of buildings resemble those of other objects, the building classification accuracy is also poor (Figure 6). In both the overall and scene images, some rock and waste disposal sites are incorrectly classified as buildings. During segmentation, as Figures 7(b) and 8(b) show, the stratified method can provide segmentation results as close as possible to the objects' inherent scales.

4. Discussion and Conclusion

The proposed stratified segmentation method combines hue and hue layer textures to divide scenes, which is theoretically closer to the human visual mechanism. Through scene division, the complexity of the entire image was effectively reduced. In practice, this method is highly universal and therefore easy to use, and its divisions are reasonably accurate. The results show that this method can effectively improve the final classification accuracy, especially for large images in which the aggregation phenomenon is clear. The method can significantly aid remote sensing image classification and feature extraction. In addition, different segmentation methods can be used in different scenes depending on the scene images' characteristics, which may correspond to their spatial scales, and thus improve classification accuracy.

The proposed method also has shortcomings: the segmentation parameter optimization it uses may increase time consumption. However, the focus of this study is stratified segmentation. Future work will therefore address a more efficient optimization algorithm, combining knowledge of spatial statistics to estimate the optimal segmentation parameters, and finding the most suitable segmentation method for scenes dominated by different coverage types.

Although the stratified method was implemented in eCognition, the idea applies to all GEOBIA work. The method can also be used recursively on a very large image or a complex nested landscape: the stratified method divides the overall image into several scenes and is then reapplied to every scene to obtain subscenes, until appropriate image objects are segmented, as sketched below.
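
A sketch of this recursive use follows; `divide_into_scenes`, `is_simple`, and `segment_optimally` are hypothetical stand-ins for the scene division, complexity test, and IGSA-driven segmentation steps described above:

```python
def stratified_segmentation(image, divide_into_scenes, is_simple,
                            segment_optimally, max_depth=3):
    """Recursively divide an image into scenes until each scene is
    simple enough to be segmented with one optimized parameter set.
    All callables are hypothetical stand-ins for the steps above.
    """
    if max_depth == 0 or is_simple(image):
        return [segment_optimally(image)]    # leaf: optimize and segment
    objects = []
    for scene in divide_into_scenes(image):  # stratify into subscenes
        objects += stratified_segmentation(scene, divide_into_scenes,
                                           is_simple, segment_optimally,
                                           max_depth - 1)
    return objects
```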

Data Availability

Image data were purchased from a commercial data sales company. Other data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Wen Zhou conceived and designed the study, performed the experiments, and wrote the paper. Dongping Ming proposed the research idea, supervised the research, and revised the manuscript. Lu Xu and Hanqing Bao helped to perform the experiments. Min Wang provided significant comments and suggestions.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (41671369 and 41671341), the “Fundamental Research Funds for the Central Universities,” the Open Fund of Twenty First Century Aerospace Technology Co. Ltd. (Grant no. 21AT_2016-07), and the Major Science and Technology Program for Water Pollution Control and Treatment (2017ZX07302003).