The potential of combining SAR and optical remotely sensed data for rapid urban mapping is highlighted. Two pairs of optical and SAR datasets are selected to evaluate the strategy. The outputs are verified and analyzed from three aspects: the accuracy of each single class and of the merged map is evaluated; the proposed method is compared with two mature algorithms; and the selected classifiers are applied to the outputs of seven different fusion algorithms for further insight. The outcomes illustrate the potential of synergic optical and SAR data for monitoring urbanization status and demonstrate that the proposed SAR/optical information synergy method improves urban mapping compared with using SAR and optical data separately. The results show that the proposed method can map built-up area, water body, and vegetation with accuracies of up to 99.31%, 91.92%, and 91.72%, respectively. These results are considerably better than those obtained from optical or SAR data alone and better than classification results based on mature fusion methods. The main contributions of this article are the proposal of a rapid urban mapping strategy based on the integration of optical and SAR data and the verification and analysis of the potential of synergic optical and SAR data for rapid urban mapping.

1. Introduction

With the development of aerospace and remote sensing technology, more and more Earth observation archives are becoming available, which increases the possibility of jointly using optical and SAR data, as well as data from different sensors, for urban monitoring at a regional or global level. In the last decade, a number of different approaches to urban mapping have been proposed [1–3]. Part of this literature has focused for many years, and in several parts of the world, on urban mapping using only optical remotely sensed data [4–7], because optical data are rich in spectral information and represent the reflective and emissive spectrum of the surface. The drawback of optical data is their limited ability to separate the same object with different spectra from different objects with the same spectrum. SAR data, in contrast, have the advantage of penetration, which makes them suitable for analyzing surface roughness, structure, and dielectric constant, and they can distinguish land cover types by structure and shape. Hence, a growing body of literature focuses on developing algorithms for mapping complicated urban surfaces with SAR data [8–11]. One of the most important tasks of urban practitioners is to find simple yet effective approaches for urban extent extraction from optical and SAR data; Gamba et al. cited and discussed the main studies exploiting the potential of different optical and SAR sensor data [12]. The integration of optical and SAR data in feature extraction and classification has recently attracted increasing attention [13]. In this paper, the most significant urban cover types for rapid urban mapping are extracted by combining optical and radar data. The purpose of this study is to explore a convenient way to obtain urban cover types by combining results extracted from active and passive remotely sensed data.

The paper is organized as follows. The overall strategy and methodology are presented in Section 2. The proposed methodology is applied to two test areas with different representative surroundings, and the rapid urban mapping results are shown in Section 3. The analysis is presented in Section 4, and conclusions are given in Section 5.

2. Methodology

The overall procedure of this paper includes five parts: human settlement extraction, vegetation and water body extraction, rapid urban mapping, mature data fusion, and SVM and NN classification. The procedure is shown graphically in Figure 1. The first three steps together form the rapid urban mapping stage. In the first step, human settlement is extracted from active SAR data with an unsupervised method based on the Gray-Level Cooccurrence Matrix (GLCM) and Local Indicators of Spatial Association (LISA). In the second step, vegetation and water body are extracted using the Normalized Difference Vegetation Index (NDVI) and the Modified Normalized Difference Water Index (MNDWI), respectively. Finally, these primary results are merged with a decision fusion algorithm that handles omission/commission pixels and generates the final urban cover map. More details of these three steps are given in Sections 2.1 and 2.2. A standard SVM classifier is selected to obtain the urban land use/cover status map from the original optical and SAR data and from their fusion results. The ground truth for the two test areas was produced by different experts and uses the same color legend, as in Figures 4(b) and 5(b). One important comment is in order: the ground truth maps were produced manually by different remote sensing specialists. The term “ground truth” is used instead of “reference data” because no reference data were available. We do not expect the ground truth to be 100% accurate, but it is adequate for the validation procedure. The accuracy levels reported below should therefore be read in light of the uncertainty in the validation sets.

2.1. Information Extraction

In remotely sensed data analysis, the spatial neighborhood of a pixel may contain more information than the pixel itself, and this is even more significant for SAR data because speckle noise makes a single pixel value unreliable. Textures extracted from the GLCM gauge the statistical properties of a pixel's neighborhood [14]. Many texture measures can be derived from the GLCM; variance and correlation are selected here for human settlement extraction. Because only dual-polarized SAR data are available at this step, algorithms based on GLCM features and the LISA detailed in [15] are selected. The selected LISA indexes are Moran's $I$, Geary's $c$, and Getis-Ord $G$ [16], which can be described in (1); they are functions of the pixel values ($x_j$) in a $d$-neighborhood of the current one ($x_i$), and the weights $w_{ij}$ are the elements of a “weight matrix” $W$, which defines the above-mentioned neighborhood. Human settlement extraction from GLCM texture information proceeds in three stages, as shown in Figure 2, and the algorithms are detailed in [14, 17, 18]. The processing window size is 3 × 3, the cooccurrence shift is 1 in both the $x$ and $y$ directions, and the gray scale quantization level is 64.
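As an illustration of one of the local indicators mentioned above, the following is a minimal sketch of local Moran's I on a small image. It assumes the standard local form, I_i = (x_i - mean) / m2 * sum_j w_ij (x_j - mean), with row-standardized binary weights over the 8-connected neighbors; the exact normalization and weight matrix used in [15] may differ, and the function name is hypothetical.

```python
from statistics import fmean


def local_morans_i(image):
    """Local Moran's I per pixel of a 2-D list of numbers.

    Assumption: 8-connected neighbors inside the 3 x 3 window, with
    row-standardized binary weights (each pixel's weights sum to 1).
    """
    rows, cols = len(image), len(image[0])
    values = [v for row in image for v in row]
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)  # population variance
    if m2 == 0:
        # Constant image: no spatial association can be measured.
        return [[0.0] * cols for _ in range(rows)]

    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Deviations from the mean of all neighbors, clipped at the border.
            neighbors = [
                image[rr][cc] - mean
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            lag = fmean(neighbors)  # row-standardized spatial lag
            result[r][c] = (image[r][c] - mean) / m2 * lag
    return result


# Toy image: a bright 2 x 2 block (for example, strong backscatter) on a dark background.
toy = [
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
moran = local_morans_i(toy)
```

Pixels inside the bright block receive high positive values (similar neighbors), while dark pixels next to the block receive negative values, which is the kind of contrast that helps separate clustered settlement from its surroundings.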

In this study, water body is extracted with a threshold method based on the MNDWI proposed by Xu [19], and vegetation is obtained with a similar method based on the NDVI. The thresholds are chosen by a statistical and iterative procedure. Two strategies are used to determine the initial threshold value: (1) automatic classification with the symbology of the layer properties in ArcGIS and (2) numerical analysis of the NDVI gray image in MATLAB. The final threshold value is then adjusted according to the situation on the ground. The water indexes are defined as
$$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{MIR}}{\mathrm{Green} + \mathrm{MIR}}, \qquad \mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}},$$
where MIR, NIR, and Green represent the middle-infrared, near-infrared, and green band, respectively. Because there is no MIR band in ALOS AVNIR data, NDWI is selected for the ALOS data, while MNDWI is selected for the Landsat dataset. The NDVI is calculated as
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}},$$
where NIR is the near-infrared channel and RED is the red channel of the remotely sensed data; for the ALOS AVNIR-2 image they are channel 4 and channel 3, respectively.
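The index-and-threshold step can be sketched as follows. This is a minimal illustration on plain Python lists, assuming the usual normalized-difference definitions of NDVI and MNDWI; the band values and both thresholds are hypothetical, since the paper tunes its thresholds per test area.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b) for two equally sized 2-D lists.

    Pixels where a + b == 0 are set to 0.0 to avoid division by zero.
    """
    return [
        [(a - b) / (a + b) if (a + b) != 0 else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def threshold_mask(index, threshold):
    """Binary mask: True where the index exceeds the threshold."""
    return [[value > threshold for value in row] for row in index]


# Hypothetical 2 x 2 reflectance values, for illustration only.
nir = [[0.50, 0.05], [0.40, 0.30]]
red = [[0.10, 0.04], [0.08, 0.25]]
green = [[0.08, 0.09], [0.07, 0.20]]
mir = [[0.20, 0.02], [0.18, 0.30]]

ndvi = normalized_difference(nir, red)     # (NIR - RED) / (NIR + RED)
mndwi = normalized_difference(green, mir)  # (Green - MIR) / (Green + MIR)

vegetation = threshold_mask(ndvi, 0.3)  # 0.3 is a hypothetical threshold
water = threshold_mask(mndwi, 0.0)      # 0.0 is a hypothetical threshold
```

For sensors without an MIR band, passing the NIR band as the second argument of the water index gives NDWI instead, which matches the substitution described for the ALOS data.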

2.2. Plurality Voting Fusion

Information integration is one of the most important steps of the explored rapid urban mapping strategy. The plurality voting method, one of the most popular and promising integration strategies, is chosen for its simplicity, convenience, and efficiency [20, 21]. Voting can be weighted or unweighted. In the unweighted scheme, all outcomes have the same “authority” to assign a pixel to a class, whereas in the weighted scheme the contribution/accuracy of each outcome is taken into account when a decision is made. Weighted schemes include the simple weighted vote, rescaled weighted vote, best-worst weighted vote, quadratic best-worst weighted vote, and weighted majority vote (WMV); more details about these mathematical models can be found in [22]. In this paper, the WMV scheme is selected and applied as follows: nonbuilding, nonwater, and nonvegetation covers are discarded first; then the classification accuracy of each class is used to assign its weight, and the WMV scheme is applied to all previous binary results. According to the WMV theorem, the accuracy of the integration is maximized by a suitable choice of weights, and the weight of each class is computed as $w_i = \log\frac{p_i}{1 - p_i}$, where $p_i$ is the individual accuracy of each class. There are two approaches to obtain the individual accuracy of each class: (1) it is calculated by comparing the classifier results with the ground truth, or (2) it is calculated from the classification result over a validation region of interest (ROI) that is separate from the training and test ROIs. Supervised classification relies on a priori knowledge to select ROIs, whereas in our experiment the specialists manually produced two ground truth maps that take the place of the validation and test ROI sets. In this test, the first approach is used; in subsequent work, the second will be tried on more test areas.
Let us denote as pixel predicted to class , and prediction of final voting result can be described as when and go into the same class and ; otherwise .
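The fusion step can be sketched as follows. This is a simplified reading under stated assumptions, not the authors' implementation: each binary map votes only for its own class (negative pixels are discarded, as described above), each vote carries the log-odds weight log(p / (1 - p)) commonly used for weighted majority voting, and a pixel claimed by no map stays unclassified. The function names and the accuracy values are hypothetical.

```python
import math


def wmv_weight(accuracy):
    """Log-odds weight log(p / (1 - p)) for an accuracy p strictly between 0 and 1."""
    return math.log(accuracy / (1.0 - accuracy))


def fuse_binary_maps(masks, accuracies, unclassified="unclassified"):
    """Merge per-class binary masks into one label map.

    masks: dict mapping class name -> 2-D list of bool (True = pixel claimed).
    accuracies: dict mapping class name -> individual accuracy of that map.
    Where several maps claim the same pixel, the class with the largest
    weight wins; where none does, the pixel is left unclassified.
    """
    weights = {name: wmv_weight(p) for name, p in accuracies.items()}
    names = list(masks)
    rows, cols = len(masks[names[0]]), len(masks[names[0]][0])
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            candidates = [name for name in names if masks[name][r][c]]
            if candidates:
                fused_row.append(max(candidates, key=weights.get))
            else:
                fused_row.append(unclassified)
        fused.append(fused_row)
    return fused


# Hypothetical binary results and accuracies, for illustration only.
masks = {
    "settlement": [[True, True], [False, False]],
    "water": [[False, True], [True, False]],
    "vegetation": [[True, False], [True, False]],
}
accuracies = {"settlement": 0.9, "water": 0.8, "vegetation": 0.7}
fused = fuse_binary_maps(masks, accuracies)
```

With one voter per class, the weighted vote reduces to resolving commission conflicts in favor of the more accurate map, while omission pixels end up in the unclassified (black) category.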

3. Experiment Results

Two scenes of optical and SAR remotely sensed data (Table 1) were selected: one within and around the city of Xuzhou, China, a developing country, and one around the city of Pavia, Italy, a developed country (Figure 2). The first dataset comprises ALOS AVNIR-2 and PALSAR images over Xuzhou city. Xuzhou is located in the northwest of Jiangsu province, covers 11257 square km, and has a population of more than ten million people. As a typical mining and industrial city, Xuzhou has been called the “Coal Sea of eastern China.” The second dataset comprises a Landsat TM multispectral image and an ERS-2 SAR image collected by the European satellite. This test case covers an area around the town of Pavia, in northern Italy.

The optical and SAR data are geometrically coregistered using an image-to-image method based on ground control points (GCPs). Over 25 GCPs were selected, and the root mean square errors for all images were less than 0.5 pixels. The Pavia dataset had already been preprocessed when it was obtained. The Xuzhou dataset was preprocessed using the ENVI FLAASH ALOS AVNIR-2 atmospheric correction kit developed by a third party. In this module, the sensor type is “UNKNOW-MSI,” the satellite altitude is 691.65 km, and the pixel size is 10 m; in the multispectral settings, the avnir2sli data was selected as the filter function file. These two sites were chosen because they are representative of different situations: the downtown center of Xuzhou is a large, high-density settlement, whereas the city of Pavia is composed of well-organized sparse settlements surrounded by large vegetated areas. Three land cover types, the most representative and significant ones, are considered: water body, human settlement, and vegetation, which mostly includes farmland and woodland. In addition, unclassified areas that appear in the results of the proposed method and in the ground truth maps are marked in black.
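The 0.5-pixel criterion above can be checked with a short calculation. The following is a minimal sketch, assuming the error is the usual root mean square of the Euclidean residuals between matched control points; the function name and the coordinates are hypothetical.

```python
import math


def gcp_rmse(reference_points, warped_points):
    """RMSE, in pixels, between matched (x, y) ground control points."""
    squared_residuals = [
        (x_ref - x_warp) ** 2 + (y_ref - y_warp) ** 2
        for (x_ref, y_ref), (x_warp, y_warp) in zip(reference_points, warped_points)
    ]
    return math.sqrt(sum(squared_residuals) / len(squared_residuals))


# Hypothetical control points, for illustration only.
reference = [(0.0, 0.0), (10.0, 10.0)]
warped = [(0.3, 0.4), (10.0, 10.0)]
rmse = gcp_rmse(reference, warped)
registration_ok = rmse < 0.5  # criterion stated in the text
```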

As described in the Methodology, texture information is first calculated to extract human settlement automatically from the SAR data; then vegetation and water body cover maps are extracted with the NDVI and MNDWI indexes. Finally, the final urban cover status map is obtained by a “soft” fusion of the previous results. The overall classification results are verified against the corresponding ground truth, as shown in Figures 3 and 4(a) for the classification results and in Figures 3 and 4(b) for the ground truth.

For further evaluation, an analysis based on supervised classification is performed. In this experiment, the SVM classifier is selected to divide the research area into the same land cover types. For the SVM classifier, the kernel type is “radial basis function,” gamma in the kernel function is 0.333, the penalty parameter is 100, the pyramid level is 0, and the classification probability threshold is 0.00. For the NN classifier, two hidden layers were selected; the number of training iterations is 1000, the training threshold contribution is 0.90, the training rate is 0.20, the training momentum is 0.9, and the training RMS exit criterion is 0.1. The results are shown in Figures 5 and 6.

The outcomes of the proposed synergy strategy are also compared with classification results obtained from image fusion of the optical and SAR data. Several mature image fusion algorithms, including Brovey [23], GS [24], PCA [25], and the à trous wavelet transform applied in the Hue Intensity Saturation space (ATWT + HIS) [26], as well as LYR and CT, are selected to fuse the optical and SAR data. The SVM [27] and NN classifiers are applied to the fused datasets. Representative classification results of the fused data are shown in Figure 7 for the SVM classifier and in Figure 8 for the NN classifier.

4. Discussion

As the paper aims to design a strategy for rapid urban mapping based on the synergy of optical and SAR remotely sensed data, the strategy should be straightforward, time efficient, robust, and easy to operate. We therefore compare the time consumption of the main steps: for instance, building area extraction from SAR data takes 75.480919 seconds, while wavelet fusion based on wavelet transform and consistency detection for Xuzhou city takes 2225.924652 seconds. All tests were run on a MacBook Pro with a 2.3 GHz Intel Core i5 CPU, 8 GB of 1333 MHz DDR3 RAM, and a 320 GB hard disk. The results of the proposed strategy are analyzed from three aspects as follows.

4.1. Merged Accuracy Analysis

To evaluate the accuracy of the proposed strategy, the merged results are compared with the single land cover maps obtained at each step. Specifically, the accuracy of the merged results is compared with the accuracy of vegetation and water body extracted by thresholding NDVI and MNDWI and of human settlement extracted using the LISA and GLCM method. For the first test area, Xuzhou city, the original ALOS PALSAR and ALOS AVNIR-2 optical datasets are shown in Figure 3, the merged classification results are shown in Figure 5, and the accuracy of each single category is given in Table 2.

Table 2 shows a significant accuracy improvement for every category after the “soft” merge. In Xuzhou city, the merge improves accuracy by 40.69% for vegetation and by 7.93% for human settlement; in Pavia city, the improvements are 47.01% and 19.15%, respectively. Water body, the most stable land cover type, shows only a slim improvement: 2.16% for Xuzhou city and 6.42% for Pavia city.

4.2. Comparison with Supervised Classification Algorithm

The overall accuracy (OA) and kappa values of the SVM classification results for optical, SAR, and merged data are given in Table 3. As can be seen qualitatively from a comparison of Figures 7, 8, 5, and 6 and quantitatively from Table 3, the merged result improves OA by 13.65% and 35.69% over the SVM classification of optical and SAR data, respectively, in Xuzhou city. In Pavia city, the corresponding improvements are 6.53% and 15.29%. The OA improvement is smaller for the ERS and Landsat dataset than for the PALSAR and AVNIR dataset. The reason might be the high accuracy of human settlement and water body in Xuzhou city, together with the fact that the designed human settlement extraction algorithm is better suited to high spatial resolution PALSAR data than to moderate resolution ERS SAR data.
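For reference, the two agreement measures used in this comparison can be computed from a confusion matrix as sketched below. The sketch assumes the standard definitions of overall accuracy and Cohen's kappa; the function name and the example matrix are hypothetical and do not reproduce any value from Table 3.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of pixels whose reference class is i and
    whose assigned class is j. Kappa is undefined when chance agreement is 1.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix, for illustration only.
oa, kappa = overall_accuracy_and_kappa([[40, 10], [5, 45]])
```

Kappa discounts the agreement that would occur by chance, which is why it is reported alongside OA when class proportions differ between test areas.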

4.3. Comparison with Classification Based on Data Fusion

The overall urban mapping accuracy obtained from the fused data is evaluated and compared with that of the proposed strategy in Table 4. The numbers in Table 4 indicate that the proposed strategy improves classification accuracy significantly compared to the mature fusion methods, and that it obtains the highest OA score in both Xuzhou and Pavia. Overall, there is an evident accuracy improvement when image fusion algorithms are applied to optical and SAR data with respect to optical and/or SAR data alone, and the proposed strategy has a further clear advantage over the mature image fusion methods.

5. Conclusions

In this paper, we have proposed a convenient method combining optical and SAR data for rapid urban mapping. The proposed synergy of active and passive data is able to map human settlement at an accuracy level of 99.31% and 88.01%, water body at 91.92% and 77.49%, and vegetation at 85.73% and 91.72% for Xuzhou and Pavia, respectively. These results are much better than the results extracted from a single remote sensing sensor, as shown in Table 2. In comparison with supervised classification, the OA of our procedure is higher than that of SVM classification. The advantage of the proposed procedure is also evident when compared with classification results based on mature fusion outcomes. Together, these results demonstrate that our method is robust for high-accuracy rapid urban mapping in the selected research areas. The method will generate valid results so long as the research area is covered by medium and/or high resolution SAR data (in our research, the ERS and PALSAR data) and by optical remotely sensed data with green, red, and near-infrared channels. The major contributions of this research can be summarized as follows: from an application perspective, it introduces a convenient rapid urban mapping strategy, and it is one of the first studies to merge optical and SAR data for urban cover status map extraction and to compare the result with single-sensor satellite data. Human settlement is extracted from active PALSAR/ERS data, while water body and vegetation are obtained from passive AVNIR-2/TM data.

Nevertheless, the proposed strategy does have some limitations. The most obvious one is that the method depends on the texture information of the SAR data and on very specific spectral bands of the optical data. For instance, the human settlement extraction step performs poorly, or fails altogether, on SAR data with very coarse resolution, and optical data without a near-infrared channel require another solution for water body or vegetation extraction, although few remote sensing sensors lack these commonly used bands. The other limitation is that the final vegetation and water body results are affected by the thresholds, and different research areas require different thresholds.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This paper is supported by the National Natural Science Foundation of China under Grant nos. 41601450 and 41401403, the Key Research Project Plan of Colleges and Universities in Henan Province (no. 16A420004), and the Ph.D. Fund of Henan Polytechnic University (nos. B2015-20 and B2014-018). The authors would like to give their sincere thanks to Professor Peijun Du from Nanjing University for his suggestions and to Professor Paolo Gamba from the University of Pavia (UNIPV), Italy, for his suggestions on this research and for providing the ERS and Landsat data over the Pavia area.