Because the spectra of urban water and building shadows are highly similar, these features are often confused and misclassified in high-resolution satellite imagery. To address this problem, we propose an object-based method for distinguishing building shadow from water using an artificial bee colony algorithm. In the method, four spectral ratio bands are first calculated as additional input parameters for improving the accuracy of segmentation results. During the segmentation, a series of statistical factors, such as spectrum, ratio, and shape features, are calculated to make up for defects in the high-resolution imagery. Finally, we propose a fuzzy-rule-based classifier to generate extraction rules. The classifier is based on artificial bee colony optimization, which employs the geometric mean (G-mean) as the fitness function. The proposed method was carried out on two test sites in Xiamen City. The experimental results based on GF-1 satellite data show that, compared with the SVM method, the proposed method improved the overall accuracy of extraction by approximately 6% to 15% and the kappa coefficient values by approximately 0.1 to 0.2. The analysis of the extraction rules also indicates that the red/NIR band and the length-width ratio band contribute significantly to the distinction between building shadow and water.

1. Introduction

Mapping of remote sensing imagery is widely used for surveying urban land resources, monitoring natural disasters, land-use planning, and so on. With the rapid development of technology concerning high-resolution satellite imagery (HRSI) such as IKONOS and QuickBird, it is possible to identify small-scale features such as roads and buildings in urban environments. HRSI has been shown to be an effective means for generating urban digital images with positional accuracies. High-resolution monitoring of urban environments remains a challenge, however, because of the complex nature and diverse composition of features within such areas. Many features found in the urban environment are spectrally similar (such as shadow and water), leading to problems in automated and semiautomated classification methods for urban areas. The principal problem caused by the shadow effect is either a reduction or total loss of information; this is particularly significant in high-density urban areas where shadows are cast by buildings. Moreover, building shadow is a typical source of noise during the extraction of water or other dark objects because such objects are difficult to separate by their spectra. Conventional methods, such as the normalized difference water index (NDWI), maximum likelihood, and minimum distance from mean, have difficulty distinguishing water from building shadow in high-resolution imagery because they utilize only spectral information. Spatial information, such as texture and context, should be exploited to improve the accuracy of classifying spectrally similar objects.

Many algorithms have been proposed for identifying water bodies by HRSI. These methods can be divided into four categories: single-band threshold, multiband ratio, supervised or unsupervised classification, and linear unmixing [1]. Among them, the single-band threshold method is widely used because it is easy to apply, but its accuracy is unsatisfactory because it adopts only one band’s information. The multiband ratio method utilizes the spectral characteristics of various bands, which can reduce the errors resulting from the conditions of sensors and atmospheric radiation [2]. Water indices such as NDWI [3, 4], MNDWI [5], and SWI [6] have been widely adopted in water extraction for low- and moderate-resolution data. Despite differences in the expression of the spectral reflectance pattern, the basis of these indices is either enhancing signals of water features or suppressing those of non-water sources. A single threshold is unstable, however, varying with scene and location. Choosing among the indices is problematic, as the knowledge of them is often inaccurate or inconsistent. In addition, among the available water indices, NDWI is the only choice for high-resolution imagery because most high-resolution images have only the visible and NIR bands. The classification accuracy of water may therefore degrade because it is mixed with spectrally similar objects, such as shadows and low-albedo urban surfaces.

To solve the problem, researchers have introduced an abundance of shadow detection methods into water indices [7], such as thresholding, the invariant color model, and the shaded relief algorithm. For instance, Ma et al. derived a normalized saturation-value difference index (NSVDI) to distinguish shadow via thresholding [8]. Thresholding, however, lacks spatial information, resulting in the inclusion of land-water transition zones in shadow regions. Dark-blue objects (water) may be incorrectly segmented by invariant color models. The shaded relief algorithm, which is based on solar elevation, solar zenith, and the digital elevation model (DEM), does not calculate the shadow cast by topographic features onto the surrounding surface [9]. Recently, the authors in [10] proposed using machine learning methods for shadow detection. Meanwhile, Adeline et al. suggest that object-based methods can accurately detect shadow in urban high-resolution images [11].

The image classification method is another option to extract features, including water and building shadow. Supervised and unsupervised classification can be accomplished using either pixel-based or object-based approaches. In addition to the spectral and textural information utilized in pixel-based classification methods, object-based methods also use geometrical characteristics and topographic relationships for classification [12]. Conventional classifiers such as decision tree [13], maximum likelihood (ML) [14], and support vector machine (SVM) [15] produce only crisp classifications. Since they assign each pixel/segment to exactly one class, they have limited success in classifying high-resolution urban images. Conversely, fuzzy classification techniques, which can distinguish the overlapping spectral signatures of closely related features more accurately, are better at expressing the nature of urban features. They are thus more appropriate than conventional classifiers for classifying high-resolution urban images.

Another advantage of fuzzy classification methods is that they have the most intuitive representation. The simple rules of fuzzy classifiers, which realize the formulation from the feature space to the class space through a set of IF-THEN rules, can employ the enhanced search capabilities of swarm algorithms such as particle swarm optimization (PSO), ant colony optimization, and artificial immune system.

Recently, a new intelligence theory, artificial bee colony (ABC), has also been applied for solving optimization problems. ABC is an emerging technology in swarm intelligence algorithms; it was proposed by Karaboga in 2005 [16]. The principle of this algorithm is similar to the ant colony optimization (ACO) and genetic (GA) algorithms, which utilize nature-inspired intelligent computation to simulate the behavior of biota. The ABC algorithm simulates the cooperation and competition between individuals to reflect the acts of swarm intelligence, which aptly addresses some difficult nonlinear problems [17]. In addition, the interactive process can perform global and local searches simultaneously; it therefore has the advantage of fast convergence and strong applicability. In addition, the algorithm can be implemented easily and integrate with other algorithms to improve itself. ABC has therefore developed rapidly and has already been used and achieved satisfactory results in difficult-problem optimization, image-change detection [18], image segmentation, classification [19], and so on. Previous studies have revealed that ABC resulted in better-quality solutions than PSO, GA, or ACO. Those cases, however, focused mainly on optimization problems; few of them were particularly applied to classification tasks. The accuracy and efficiency of ABC for high-resolution image classification have not been fully examined.

To address these problems, we propose a method for distinguishing building shadow from water in high-resolution satellite images. The method consists of two steps. First, we employed an image segmentation algorithm to obtain homogeneous objects from a high-resolution image. As the four available bands may not address the whole problem, we added four ratio bands and calculated each object’s geometrical and topographic characteristics. Second, we provided a fuzzy-rule-based classifier to distinguish shadow and water segments. The classifier applied a customized ABC algorithm to optimize the fitness function, that is, the classification accuracy of water and shadow.

2. Materials and Methods

2.1. Study Area and Data Preprocessing

The study area was Xiamen City (Figure 1), which is on the southeast coast of the PRC (24°23′–24°54′ N latitude, 117°53′–118°26′ E longitude). We selected two subscenes as test sites; both of these were characterized by complex surface features such as vegetation, built-up areas, and other dark surfaces (shadows and water bodies). Site A on Xiamen Island consists of abundant building shadows located within a complex urban background (Figure 1(c)) and a few extensive water bodies. Site B on Dadeng Island exhibits complex water features, including sea, ponds, and aquatic parks. The differences between the two sites were used to validate the robustness of the proposed method.

The Gaofen-1 (GF-1) image was acquired in May 2014. The GF-1 satellite is the first satellite in China’s high-resolution earth observation system and was launched on April 26, 2013. Its WFV sensors can obtain 16 m resolution multispectral color and 2 m resolution panchromatic images. The experimental image was preprocessed by orthorectification, geometric correction, atmospheric correction (FLAASH), and image fusion. The original data included four bands: blue (0.45–0.52 μm), green (0.52–0.59 μm), red (0.63–0.69 μm), and near-infrared (0.77–0.89 μm). After image clipping, site A had an image size of 1276 × 1103 pixels and site B was 2384 × 2712 pixels.

Because our only focus was distinguishing between water and building shadow, thresholding based on the images’ brightness was used to mask most other land-cover classes before image segmentation; the rest of the objects were collected as samples through manual selection. The reference data used for accuracy assessment was acquired mainly by visual interpretation. Approximately 350 randomly distributed polygons were manually digitized at both sites. Dataset A had 1213/58 training pixels/objects and 11614/98 pixels/objects as reference, and dataset B had 4237/58 training pixels/objects and 20361/126 pixels/objects as reference.
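The brightness-based masking step can be illustrated with a minimal sketch. It assumes that brightness is the per-pixel mean over the four bands and that a single cutoff is applied; the function name `dark_object_mask` and the threshold value are hypothetical, since the text does not report the brightness definition or the threshold actually used.

```python
import numpy as np

def dark_object_mask(image, threshold):
    """Return a boolean mask of pixels dark enough to be water or shadow.

    image: array of shape (bands, rows, cols).
    threshold: hypothetical brightness cutoff (not reported in the paper).
    """
    # Assumption: brightness is the per-pixel mean over all bands.
    brightness = np.asarray(image, dtype=np.float64).mean(axis=0)
    return brightness < threshold

# Tiny synthetic 4-band scene: left column dark, right column bright.
demo = np.array([[[20, 200], [30, 180]]] * 4)
mask = dark_object_mask(demo, threshold=60.0)
print(mask)
```

Pixels outside the mask would be dropped before segmentation, so that only candidate water and shadow areas remain.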

2.2. Methods

Many studies have extracted shadows from high-resolution images using pixel-based methods. Such methods, however, produce a number of errors. To reduce those errors, in this study we employed an object-based method in which objects are identified by number of pixels, shape, and other criteria. The fuzzy-rules extraction method was based on ABC optimization, namely, F-ABC. A flowchart of the F-ABC method is shown in Figure 2. The method included three steps: segmentation, fuzzy-rules extraction, and prediction.

2.2.1. Image Segmentation

There are several object-oriented image segmentation algorithms, but most algorithms cannot integrate spectral and spatial information and differ in efficiency and results. Due to the lack of a standard evaluation system, an optimal segmentation scale does not exist, so the scale must be tuned manually for the specific remote sensing images. The segmentation algorithm used in this study is similar to the multiscale segmentation algorithm in eCognition software [20]. Our algorithm includes three main steps. First, apply initial segmentation on an original image using the fast sweeping method, then form initial segmentation objects, and then construct a region adjacency graph. Second, using the initial region adjacency graph, merge regions using the heterogeneity-oriented smallest-region-merging algorithm. Third, according to some level output rule, export the intermediate segmentation result that meets the demand of the level output rule until the segmentation termination condition is met.

Restricted by the quality of sensors, the original high-resolution imagery has information based on only four bands, which is insufficient for producing high-quality results. To improve segmentation and subsequent classification results, four additional ratio bands, including blue/green, green/red, red/NIR, and NDVI, were calculated through ArcGIS’s band math tool. Concerning scale selection, the initial scale was set at 70 and the terminate scale was set at 250. Finally, with eight bands as input features, the segmentation maps were generated through the multiscale image segmentation method. The number of objects at different scales is shown in Table 1, and the segmentation maps are shown in Figure 3.
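As an illustration of how the eight-band input could be assembled, the sketch below computes the four ratio bands with NumPy rather than ArcGIS. The function name `ratio_bands` and the small constant `eps` (added to avoid division by zero) are assumptions made for this example; NDVI uses its standard definition, (NIR − red)/(NIR + red).

```python
import numpy as np

def ratio_bands(blue, green, red, nir, eps=1e-6):
    """Stack the four original bands with four derived ratio bands (8 layers).

    eps is a hypothetical guard against division by zero in dark pixels.
    """
    blue, green, red, nir = (
        np.asarray(b, dtype=np.float64) for b in (blue, green, red, nir)
    )
    blue_green = blue / (green + eps)        # band 5: blue/green
    green_red = green / (red + eps)          # band 6: green/red
    red_nir = red / (nir + eps)              # band 7: red/NIR
    ndvi = (nir - red) / (nir + red + eps)   # band 8: NDVI
    return np.stack([blue, green, red, nir, blue_green, green_red, red_nir, ndvi])

# One-pixel example with made-up reflectance values.
stack = ratio_bands([[2.0]], [[4.0]], [[8.0]], [[2.0]])
print(stack.shape)  # (8, 1, 1)
```

The band order here follows the order in which the ratios are listed in the text; it is only a convention for this sketch.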

Figure 3(c) shows that the segmentation map at scale 70 obtained the most objects. Small buildings, small water areas such as ponds around farms, and green spaces around buildings are relatively clear. Roads, forestland, grassland, and sea, however, are relatively fragmented. Conversely, Figure 3(d) shows that the outlines of roads and water bodies correspond to the visual interpretation results at scale 100, but parts of water bodies are merged with roads and trees. This merging occurs at all subsequent scales and would reduce the accuracy of the final classification results. Consequently, the segmentation map at scale 70 was used to distinguish building shadows from water bodies.

Subsequently, we collected statistical information on each object in the segmentation map at scale 70. That information was divided into three categories: spectrum, ratio of multibands, and geometrical features. Among them, the spectral features included mean value, maximum value, and minimum value on each band. Ratio of multiband features included spectral brightness, standard deviation, and ratio of maximum and minimum brightness values. Object shape features included area, convexity, hardness, generate rate, decurrent rate, shoxen rate, shape index, and length-width ratio. Due to the existing correlation effect among features, however, including all the features in the classification would increase the complexity of computation and might therefore affect accuracy. Thus, all objects as training samples were subjected to Pearson’s correlation analysis. Features with correlation coefficient values larger than 0.7 were excluded. Some 13 features were finally selected, including number of pixels, average DN value of spectral bands and ratio bands, standard deviation, and shapes. Description of the statistical information is shown in Table 2.
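The correlation-based feature screening can be sketched as follows. The text states only that features with correlation coefficients larger than 0.7 were excluded; the greedy order-dependent procedure below (keep a feature only if it is not strongly correlated with any feature already kept) and the use of the absolute value of the coefficient are assumptions, as are the feature names in the toy table.

```python
import numpy as np

def drop_correlated(features, names, threshold=0.7):
    """Greedily keep features whose |Pearson r| with every kept feature is <= threshold.

    features: array of shape (n_objects, n_features).
    """
    corr = np.corrcoef(features, rowvar=False)  # columns are features
    kept = []
    for j in range(features.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy table: the second column is a linear function of the first (r = 1),
# the third is an alternating pattern that is only weakly correlated with it.
area = np.arange(10.0)
table = np.column_stack([area, 2.0 * area + 1.0, np.tile([1.0, -1.0], 5)])
selected = drop_correlated(table, ["area", "pixel_count", "ratio_mean"])
print(selected)  # ['area', 'ratio_mean']
```

Because the procedure is order-dependent, which member of a correlated pair survives depends on the column order.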

2.2.2. Fuzzy-Rules Extraction Based on Artificial Bee Colony Optimization

A standard ABC model includes four main parts: (1) extraction rule definition, (2) fitness function construction, (3) global/local neighborhood search, and (4) overlapping objects prediction. To apply ABC to the extraction of water and building shadow, the original ABC algorithm had to be reconstructed. Specific content was performed as follows.

(1) Fuzzy-Rule-Based Rule Definition. The extraction rule is in the IF…THEN format derived from the fuzzy-rule-based system. This format inherently is a threshold segmentation system based on multimetrics; it adopts a single threshold or a maximum-minimum threshold interval similar to a decision tree. Compared to conventional “crisp” classifiers such as ML and SVM, fuzzy-rule classification methods do not require specific mathematical formulas and the extraction rule is more direct and easier to explain. This kind of method is now applied to many classification models for remotely sensed imagery [21]. The formation of the rule is represented as follows:

$$R_j:\ \text{IF } x_1 \text{ is } A_{j1} \text{ AND } \cdots \text{ AND } x_n \text{ is } A_{jn} \text{ THEN class is } C_j, \quad j = 1, 2, \ldots, N,$$

where $x_i$ is the value of the $i$th feature, $A_{ji}$ denotes the fuzzy sets belonging to the $i$th feature, and $C_j$ is the class described by the $j$th rule. The $A_{ji}$ of each feature is composed of the upper bound ($U_{ji}$) and the lower bound ($L_{ji}$) of the DN value, which can limit certain classes from feature spaces.
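A rule of this kind can be represented very compactly. The sketch below assumes that each antecedent is a closed interval between a lower and an upper bound and that a rule fires only when every feature falls inside its interval; the class name `Rule`, the two features chosen, and all bounds except the 2.252 red/NIR value quoted later in the discussion are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    lower: list   # lower bound per feature
    upper: list   # upper bound per feature
    label: str    # class assigned when every antecedent holds

    def matches(self, obj):
        """True when each feature value lies within its [lower, upper] interval."""
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, obj, self.upper))

# Hypothetical two-feature rule: (red/NIR ratio, length-width ratio).
shadow_rule = Rule(lower=[0.0, 1.0], upper=[2.252, 3.0], label="shadow")

print(shadow_rule.matches([1.5, 2.0]))  # True: both features inside their intervals
print(shadow_rule.matches([3.0, 2.0]))  # False: red/NIR ratio above the upper bound
```

In this representation, the optimizer only has to search over the lower and upper bounds of each feature.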

(2) Fitness Function Construction. The procedure of fitness function construction is the core of the ABC algorithm; it directly affects the ultimate result. Concerning the extraction of urban water, the similarity between water and building shadow is the main challenge to the accuracy of water extraction. This is because water and building shadow are both dark in remotely sensed imagery and are easily confused. Optimization of urban water extraction is therefore equivalent to distinguishing the two kinds of objects, water and building shadow, to the greatest extent. In fact, the numbers of urban water features and building shadows were very different. The building shadow objects are more fragmented and the number of objects is larger. Consequently, this paper introduces the geometric mean classification accuracy index (G-mean) as the fitness function. The formula is shown below:

$$G\text{-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$

In this formula, $TP/(TP + FN)$ refers to the sensitivity of the binary classification result, which reflects the accuracy of most labels. $TP$, $FN$, $TN$, and $FP$ correspond to the statistics of the binary classification result; the concrete implication is demonstrated in Table 3.
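To make the motivation concrete, the sketch below assumes the fitness index is the standard G-mean, the square root of sensitivity times specificity computed from the binary confusion counts. The function name `g_mean` and the counts in the example are made up for illustration.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Imbalanced toy case: 10 water objects versus 100 shadow objects.
balanced_rule = g_mean(tp=8, fn=2, tn=90, fp=10)    # sqrt(0.8 * 0.9)
all_shadow_rule = g_mean(tp=0, fn=10, tn=100, fp=0)  # never predicts water
print(round(balanced_rule, 4), all_shadow_rule)
```

A rule that labels everything as shadow would reach about 91% overall accuracy on these counts, yet its G-mean is zero, which is why this index suits classes of very different sizes.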

(3) Global/Local Neighborhood Search. Our study adopted random initialization for the ABC algorithm. Global search relies mostly on a random search strategy to jump out of a locally optimal solution, whereas local neighborhood search makes the algorithm converge systematically, which is the key step to improving the fitness evaluation. Local neighborhood search can be formulated as follows:

$$v_{ij} = x_{ij} + \text{Random} \times (x_{ij} - x_{kj})$$

In this formula, $v_{ij}$ is the new position of nectar gatherer $i$ on driving force factor $j$, where $k$ belongs to $\{1, 2, \ldots, SN\}$, $j$ belongs to $\{1, 2, \ldots, D\}$, but $k$ is not equal to $i$; $SN$ is the number of food sources and $D$ is the number of factors. Factors $k$ and $j$ are generated by random number (Random is a random number in $[-1, 1]$), and the new position is within the extent of the extremum of factors. The search process is equivalent to the iterative process of the algorithm. It derives the final optimized classification result by calculating the fitness of each search result and combining it with the greedy rule. Once the current rule is extracted, the labels that conform to the rule are deleted and the remaining labels are reselected, until all remaining labels are classified or the preset termination condition is met.
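The neighborhood move and the greedy rule can be sketched as below. This assumes the standard ABC update, in which one randomly chosen dimension of a solution is perturbed relative to a randomly chosen different solution with a random factor in [-1, 1]; the function names, the clamping to bounds, and the toy fitness function are assumptions for illustration, not the authors' customized implementation.

```python
import random

def neighbor(solutions, i, lower, upper, rng=random):
    """Return a candidate v with v_j = x_ij + phi * (x_ij - x_kj) for one random j, k != i."""
    x = solutions[i]
    k = rng.choice([idx for idx in range(len(solutions)) if idx != i])
    j = rng.randrange(len(x))
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(x)
    candidate[j] = x[j] + phi * (x[j] - solutions[k][j])
    # Keep the new position within the allowed range of that factor.
    candidate[j] = min(max(candidate[j], lower[j]), upper[j])
    return candidate

def greedy_select(current, candidate, fitness):
    """Greedy rule: keep the candidate only if it has strictly better fitness."""
    return candidate if fitness(candidate) > fitness(current) else current

# Toy usage: three food sources in a 2-D unit square.
rng = random.Random(0)
sources = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
toy_fitness = lambda s: -abs(s[0] - 0.5)  # hypothetical objective
trial = neighbor(sources, 0, lower=[0.0, 0.0], upper=[1.0, 1.0], rng=rng)
sources[0] = greedy_select(sources[0], trial, toy_fitness)
```

In the classifier described here, each solution would encode the lower and upper bounds of one fuzzy rule, and the fitness would be the classification accuracy index of that rule on the training objects.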

(4) Overlapping Objects Prediction. The classification rule based on the ABC algorithm can be used to extract urban water. For the overlap of extraction rules generated by the fuzzy rule, this study proposed an extra index, regular labels cover degree $C$. This index reflects the ratio between the accurately classified labels and all the labels used of a specific class. When overlap occurred in labels, we usually gained the ultimate result by synthetically comparing fitness and overlap rate. The formula is shown below:

$$P = \omega \cdot \text{Fitness} + (1 - \omega) \cdot C, \qquad C = \frac{N_c}{N},$$

where $\omega$ is the weight threshold, $N_c$ is the number of accurately classified labels, and $N$ is the number of labels. Factor $\omega$ is within $[0, 1]$, which can reflect the effect of these two indexes on the predicted result. To gain the optimal result, iterative calculation is applied to $\omega$. The step size is set as 0.1 and cycle computing is adopted on the sample data using the formula above. Hence, the value of $\omega$ and the threshold are computed and the ultimate extraction result is gained.

2.3. Accuracy Assessment

To validate the extraction result, a widely used pixel-based SVM classifier was also adopted in this experiment. Error matrices and kappa coefficient were both used for accuracy assessment. The comparison between SVM and our proposed method was based on the same datasets derived from two test sites. The only difference was that the SVM method is pixel-based and our proposed method is object-based. The algorithm parameters of ABC were set as bee colony scale = 200, limit search number = 200, and maximum cycle number = 600.
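For reference, overall accuracy and the kappa coefficient can be derived from an error matrix as in the sketch below. It uses the standard definition of Cohen's kappa, (observed agreement − chance agreement)/(1 − chance agreement); the function name and the 2 × 2 matrix are made up for illustration and are not taken from Table 4.

```python
import numpy as np

def overall_accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa from a square error matrix."""
    m = np.asarray(matrix, dtype=np.float64)
    total = m.sum()
    observed = np.trace(m) / total  # overall accuracy
    # Chance agreement from the products of matching row and column totals.
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical matrix: rows are reference classes, columns are predicted classes.
demo_matrix = [[40, 10],
               [5, 45]]
oa, kappa = overall_accuracy_and_kappa(demo_matrix)
print(round(oa, 4), round(kappa, 4))  # 0.85 0.7
```

Kappa discounts the agreement expected by chance, so it is generally lower than overall accuracy for the same matrix.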

3. Results

3.1. Water and Building Shadow Extraction Map

Figure 4 shows the water and building shadow extraction results using F-ABC and SVM on the two images. Owing to the segmentation process, visual inspection of Figure 4 revealed that the proposed object-based method successfully extracted most of the urban water bodies with complete shapes, while the results by SVM were incomplete. For example, F-ABC extracted ponds with completely shaped roads (yellow region), whereas the roads in the SVM result were discontinuous. Moreover, the SVM result had many fragmented pixels misclassified as building shadow, whereas F-ABC effectively suppressed such noise.

3.2. Extraction Accuracy

Based on reference datasets of the two test sites, the statistical comparisons are shown in Tables 4(a) and 4(b). The confusion matrices indicated that F-ABC achieved better accuracy than the SVM classifier in both areas. F-ABC improved the overall accuracy of extraction by approximately 6% to 15%, and the kappa coefficient values by approximately 0.1 to 0.2. For test site A, the overall accuracy of F-ABC was 87.69% with a kappa coefficient of 0.7633, whereas the SVM had a lower overall accuracy of 81.97% and a kappa of 0.6588. For test site B, F-ABC still presented better accuracy, with an overall accuracy of 86.93% and a kappa coefficient of 0.7633. By comparison, the SVM had a much lower kappa (0.5591) and overall accuracy (71.73%).

Compared to SVM results, the producer’s accuracy for each class was improved when using F-ABC. In particular, for water at site B, Table 4(b) shows a PA improvement of more than 28%. This largest error was due to misclassification between the water and other classes by SVM. That is, 3,353 of the water reference pixels were classified as other. By contrast, F-ABC achieved a much better result, with only 413 of water pixels misclassified as other. The classification of SVM showed higher UA than F-ABC; this was because SVM omitted approximately 25% (3,343 of water, 2,144 of shadow) of the reference pixels because of its limitation of crisp classification. For site A (Table 4(a)), the PA of water improved approximately 6%, from 77.82 to 83.40. The PA of shadow was 92.93%, also higher than the 87.05% of SVM. The SVM classifier took more references of water pixels (239) into the classification procedure. Most of them, however, were misclassified as building shadow. By contrast, F-ABC identified fewer references of water but a greater total number and performed well for distinguishing between water and shadow.

4. Discussion

In this study, we found that the improvement of accuracies using F-ABC could benefit from two aspects. First, with object-based image segmentation, pixels were generated as an object with its complete shape. Pixels of ambiguous spectrum could therefore be counted as an object and be involved in the extraction procedure. For example, more than 3,000 reference pixels were identified by F-ABC as either building shadow or water for site B, whereas they were misclassified as other by the SVM classifier. Second, these improvements further indicated that segmented objects might contain the information necessary for distinguishing water from building shadow.

The building shadow and water detection method in F-ABC was used to address the misidentification caused by overlapping spectral signatures of closely related features. Figure 4 shows the extracted shadow and water maps at both test sites. Visual inspection of Figure 5 indicated that F-ABC achieved better performance for shadow detection than SVM. The F-ABC method can extract a more complete shape by employing image segmentation based on spectrum, geometrical characteristics, and topographic relationships. For example, the object-based method detected the shape of shadows for different kinds of buildings well. The pixel-based SVM method, however, was incapable of extracting complete shapes, especially on the edges of shadow areas.

The segmentation procedure also improved the extraction of water bodies. Figure 5 shows that the shape of a lake, which is located on the east side of site B, was well detected by F-ABC, but was detected as discontinuous by SVM. The extraction of incompletely shaped water bodies and building shadows led to reference datasets being misclassified as other and a consequent loss of accuracy. The accurate extraction of a complete shape depends sensitively on the selection of segmentation scale. Since the optimal scale varies with different areas, how to determine the optimal scale by automatic methods is still one of the challenges for object-based classification tasks. Increasing the segmentation scale may lead to the incorrect merging of pixels from different land-use types, whereas decreasing it splits pixels of the same land-use type, so that the object-based accuracy eventually approaches that of pixel-based classification. In this study, we applied the scale of 70 for distinguishing between water and building shadow by simple visual inspection of multiscale segmentation maps. We believe that the segmentation map was capable of detecting most water bodies and building shadows in the study areas. Yet a small number of objects were poorly segmented at the selected scale if they had small areas or were adjacent to buildings.

Spatial measures extracted from the segmentation procedure can decrease the rate of misclassification for the spectrally similar water/shadow classes. Other classes, however, should have their own best-suited spatial measures. Toward that end, we developed a fuzzy classification scheme to overcome the overlapping objects prediction problem. The ABC algorithm was used to acquire the optimal fuzzy rules for the provided fitness function. Those fuzzy rules were composed of different thresholds for different test sites. At site B, for example, eight rules were acquired by training the F-ABC. Among these were four rules for water and four rules for building shadow. Similar to combining fuzzy classification with other swarm algorithms such as ACO and PSO, the first rule for each class covers the greatest number of training samples; this indicated that the first rule had the most capability to describe characteristics related to its own class. The specific thresholds of first rules are shown in Table 5. The first fuzzy rule of water was capable of correctly detecting 87.5% of training samples labeled as water, with a fitness of 89.5%. High accuracy of detection could also be found for the first fuzzy rule of building shadow, which achieved a 94.0% fitness quality and 92.68% completeness of coverage.

Figure 6 shows the normalized smooth curves based on fuzzy rules for water and building shadow. We found several significant distinctions between water and building shadow, which might provide some clues for separating these two classes. (1) In terms of the spectrum, building shadows have narrower gray spaces than water does between band 2 (green) and band 3 (red). Water objects, however, completely covered the information from shadow at bands 1–3. A small space at band 4 (NIR) might be useful to distinguish between water and light shadow. (2) Among all four ratio bands, only band 7, red/NIR, showed a significant difference (green region) between water and building shadow; this indicated that an object might tend to be shadow if the ratio signal was lower than 2.252 at band 7. (3) Figure 6 shows that there were significant differences among the three spatial bands, 10, 11, and 13 (blue regions); these differences could improve on the accuracy obtained from only the limited information of the spectral and ratio bands. Thus, these differences could be applied to distinguish between water and shadow, which indicates that the F-ABC method could overcome the overlapping features problem.

5. Conclusions

To address the problem of water features being confused with building shadow and so being misclassified in urban areas, we proposed an object-based water and building shadow detection method, namely, F-ABC, which is based on fuzzy-rules classification and ABC optimization. First, multiscale segmentation was applied to high-resolution imagery and the most suitable scale was selected by visual interpretation. Four spectral ratio bands were calculated as additional input parameters for improving the accuracy of the segmentation result. Second, image segmentation was employed to calculate each object’s statistical features, which included single-band gray value, ratio of multibands, and geometrical characteristics. Then, using the geometric mean (G-mean) as the selected fitness function, fuzzy rules were extracted, taking advantage of ABC optimization in addressing the complicated system optimization problem.

The experiment was carried out on two test sites in Xiamen City using GF-1 imagery, and the result of F-ABC was compared to that of the conventional SVM method. By integrating visual interpretation and statistical results, we analyzed the extraction rules and estimated the accuracy of applying the proposed method in urban water extraction. Compared with outputs of the widely used SVM method, our results showed that F-ABC improved the overall accuracy of extraction by approximately 6% to 15% and the kappa coefficient values by approximately 0.1 to 0.2. The results indicate that the proposed method could effectively distinguish water from building shadow and could therefore satisfy practical needs.

Further analysis of the extraction rules indicated that although the two types of objects are similar in most features, the NIR band, the red/NIR band, and the length-width-ratio band differ significantly between water and building shadows. Follow-up research would combine these results with another index model, such as the SWI (shadow water index) [22], and introduce textural features to the segmentation algorithm to improve extraction accuracy.

Competing Interests

The authors declare that there are no competing interests.


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 41401475, 41471366, and 41501448 and Xiamen University of Technology high level talents under Grant YKJ13022R. The study was in cooperation with Universities Project of Fujian Bureau of Surveying, Mapping and Geoinformation (no. 2015JX04).