Abstract

Land use and land cover (LULC) mapping in urban areas is one of the core applications in remote sensing, and it plays an important role in modern urban planning and management. Deep learning has recently emerged as a rapidly growing area of machine learning. By mimicking the hierarchical structure of the human brain, deep learning can gradually extract features from lower levels to higher levels. The Deep Belief Networks (DBN) model is a widely investigated and deployed deep learning architecture. It combines the advantages of unsupervised and supervised learning and can achieve good classification performance. This study proposes a classification approach based on the DBN model for detailed urban mapping using polarimetric synthetic aperture radar (PolSAR) data. Through the DBN model, effective contextual mapping features can be automatically extracted from the PolSAR data to improve the classification performance. Two-date high-resolution RADARSAT-2 PolSAR data over the Greater Toronto Area were used for evaluation. Comparisons with the support vector machine (SVM), conventional neural networks (NN), and stochastic Expectation-Maximization (SEM) were conducted to assess the potential of the DBN-based classification approach. Experimental results show that the DBN-based method outperforms the three other approaches and produces homogeneous mapping results with preserved shape details.

1. Introduction

Urban land use and land cover (LULC) mapping is one of the core applications in remote sensing. Up-to-date LULC maps obtained by classifying remotely sensed data are essential to modern urban planning and management. Among remote sensing systems, the synthetic aperture radar (SAR) has long been recognized as an effective tool for urban analysis, as it is less influenced by solar illumination or weather conditions than optical or infrared sensors [1]. Since more scattering information can be collected in multiple polarizations, polarimetric SAR (PolSAR) data have been increasingly used for urban LULC classification [2–4].

Nevertheless, most studies on urban mapping using SAR or PolSAR data are limited to identifying the urban extent or mapping very few urban classes. Few studies have focused on detailed urban mapping using SAR data. The difficulty in detailed urban mapping using SAR data is mainly due to the complexity of the urban environment. The urban environment comprises various natural and man-made objects with several kinds of materials, different orientations, various shapes and sizes, and so forth, which complicates the interpretation of SAR images. Problems can also originate from the nature of polarimetric SAR imaging, such as inherent SAR speckle or geometric distortions such as shadow and layover [1, 2]. As a consequence, detailed urban mapping using high-resolution SAR data is still a challenging task.

Regarding the method of urban land cover mapping, approaches can be generally divided into pixel-based and object-based classification. Object-based methods, which directly explore the contextual information to improve the mapping accuracy, have been increasingly employed recently [5]. By using object-based approaches, shape characteristics and inner statistics of segmented objects can be used as classification features [6–8]. However, an ideal segmentation of urban areas using SAR data is often difficult to achieve. Pixel-based approaches have been traditionally used for coarse-resolution SAR data with reasonable results. However, when dealing with high-resolution SAR data, the pixel-by-pixel approach is usually limited because of speckle and increased interclass variance [9]. To cope with the problem of pixel-based approaches, some contextual analyses, such as Markov random field (MRF), have been employed [10–12]. Although contextual approaches [10–18] can learn the statistics within the local neighborhood, their capability to represent spatial patterns is limited. Moreover, although some texture indices can be used to describe certain spatial patterns, most of them are still limited by their relatively simple representation capabilities [19, 20].

From the perspective of data modeling, LULC classification methods can be grouped into parametric and nonparametric approaches. Parametric approaches, such as the minimum distance classifier, maximum likelihood classifier, and the expectation-maximization (EM) algorithm, often require proper assumptions of data distribution [21]. However, for multitemporal or multisource data, the class distributions are hard to model. On the other hand, nonparametric approaches, such as artificial neural networks, decision tree, and support vector machine (SVM), are widely used in land cover classification [22]. Nevertheless, the performance of nonparametric approaches strongly depends on the selected classification features.

As an advanced machine learning approach, deep learning has been successfully applied in the field of image recognition and classification in recent years [23–27]. By mimicking the hierarchical structure of the human brain, deep learning approaches, such as Deep Belief Networks (DBN), can exploit complex spatiotemporal statistical patterns implied in the studied data [28, 29]. For remotely sensed data, deep learning approaches can automatically extract more abstract, invariant features, thereby facilitating land cover mapping. However, to the best of our knowledge, no research has been reported using deep learning for detailed urban LULC mapping on SAR data.

The present study proposes a detailed urban LULC mapping approach based on the popular deep learning architecture DBN. This study is one of the first attempts to apply the deep learning approach to detailed urban classification. Two-date high-resolution RADARSAT-2 PolSAR data over the Greater Toronto Area (GTA) have been used for evaluation.

The rest of this paper is organized as follows. Section 2 describes the proposed land cover classification approach based on the DBN model. Section 3 introduces the data and the process of the experiment. Section 4 presents and discusses the experimental results. Finally, we conclude this paper in Section 5.

2. Methodology

The proposed approach is based on the DBN model. This section briefly reviews the principle of the DBN model and describes the proposed method for land cover classification.

2.1. Deep Belief Networks

The DBN model was introduced by Hinton et al. in 2006 [28] for learning complex data patterns. It has become one of the most extensively investigated and deployed deep learning architectures [24, 25]. The DBN is a probabilistic multilayer neural network composed of several stacked Restricted Boltzmann Machines (RBMs) [28, 30]. In a DBN, every two adjacent layers form an RBM, and the input of each RBM is the output feature vector of the previous one. A DBN is therefore expected to hierarchically explore the pattern features at several levels of abstraction, given that the features obtained by a higher-level RBM are more representative than those obtained by lower ones. The training of a DBN can be divided into two steps: pretraining and fine-tuning. This training process is further discussed below.

2.1.1. Restricted Boltzmann Machines

As the basic component of a DBN, the Restricted Boltzmann Machine (RBM) can be treated as an unsupervised energy-based generative model. An RBM consists of a layer of visible units $\mathbf{v}$ and a layer of hidden units $\mathbf{h}$, connected by symmetrically weighted connections, as shown in Figure 1.

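Assuming binary-valued units, the RBM defines the energy of the joint configuration of visible and hidden units $(\mathbf{v}, \mathbf{h})$ as

$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i=1}^{m}\sum_{j=1}^{n} w_{ij} v_i h_j - \sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j, \tag{1}$$

where $w_{ij}$ represents the weight associated with the connection between the visible unit $v_i$ and the hidden unit $h_j$, $a_i$ and $b_j$ are the bias terms, and $m$ and $n$ are the numbers of visible and hidden units, respectively. The RBM assigns a probability to each configuration $(\mathbf{v}, \mathbf{h})$ using the energy function, given by

$$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr), \tag{2}$$

where $Z$ is a normalization factor obtained by summing over all the possible $(\mathbf{v}, \mathbf{h})$ configurations:

$$Z = \sum_{\mathbf{v}, \mathbf{h}} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr). \tag{3}$$

The conditional probabilities can be analytically computed as

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\Bigl(b_j + \sum_{i=1}^{m} w_{ij} v_i\Bigr), \tag{4}$$

$$P(v_i = 1 \mid \mathbf{h}) = \sigma\Bigl(a_i + \sum_{j=1}^{n} w_{ij} h_j\Bigr), \tag{5}$$

where $\sigma$ is the sigmoid function; that is, $\sigma(x) = 1/(1 + e^{-x})$.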
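The energy function and the two conditional probabilities described above can be illustrated with a minimal NumPy sketch. The names `W`, `a`, and `b` (weights, visible biases, hidden biases) and the toy sizes are assumptions made for illustration only; they are not taken from the original implementation. The helper `brute_force_p_hidden` enumerates every hidden configuration to check numerically that the sigmoid expression agrees with the probabilities implied by the energy.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, a, b):
    """Energy of one joint binary configuration (v, h)."""
    return -(v @ W @ h) - a @ v - b @ h

def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return sigmoid(b + v @ W)

def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return sigmoid(a + W @ h)

def brute_force_p_hidden(v, W, a, b):
    """Same quantity as p_hidden_given_visible, obtained by enumerating all h."""
    n = W.shape[1]
    configs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=n)]
    weights = np.array([np.exp(-energy(v, h, W, a, b)) for h in configs])
    return sum(w * h for w, h in zip(weights, configs)) / weights.sum()

# Toy RBM with 6 visible and 4 hidden units (sizes are illustrative assumptions).
rng = np.random.default_rng(0)
m, n = 6, 4
W = 0.1 * rng.standard_normal((m, n))
a = 0.1 * rng.standard_normal(m)
b = 0.1 * rng.standard_normal(n)
v = rng.integers(0, 2, size=m).astype(float)

p_h = p_hidden_given_visible(v, W, b)
p_h_check = brute_force_p_hidden(v, W, a, b)
```

The enumeration is only feasible for very small hidden layers; it serves here as a consistency check, not as a practical way to evaluate the model.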

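The training process of the RBM can be described as follows. After the random initialization of the weights and biases, iterative training of the RBM on the training data is performed. Given the training data on the visible units $\mathbf{v}^{(0)}$, the states of hidden units $\mathbf{h}^{(0)}$ are sampled according to (4). This step is called the positive phase of the RBM training. In the negative phase, the “reconstruction” of the visible units $\mathbf{v}^{(1)}$ is obtained according to (5). The positive phase is once more conducted to generate $\mathbf{h}^{(1)}$. Afterwards, the RBM weights and biases can be updated by the contrastive-divergence (CD) algorithm [31] through gradient ascent, which can be formulated as

$$\begin{aligned}
\Delta w_{ij} &= \varepsilon\bigl(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\bigr),\\
\Delta a_i &= \varepsilon\bigl(\langle v_i\rangle_{\text{data}} - \langle v_i\rangle_{\text{recon}}\bigr),\\
\Delta b_j &= \varepsilon\bigl(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{\text{recon}}\bigr),
\end{aligned} \tag{6}$$

where $\varepsilon$ denotes the learning rate and $\langle\cdot\rangle$ represents the mathematical expectation under the corresponding data distribution.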
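One CD step of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `cd1_update`, the batch layout (one sample per row), and the use of probabilities rather than sampled binary states for the reconstruction and the second positive phase are all assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, a, b, lr, rng):
    """One contrastive-divergence (CD-1) step on a batch V; returns updated copies."""
    # Positive phase: hidden probabilities and sampled binary hidden states.
    ph0 = sigmoid(V @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visible units from the sampled hidden states.
    pv1 = sigmoid(h0 @ W.T + a)
    # Second positive phase on the reconstruction.
    ph1 = sigmoid(pv1 @ W + b)
    # Gradient-ascent updates: data statistics minus reconstruction statistics.
    batch = V.shape[0]
    dW = (V.T @ ph0 - pv1.T @ ph1) / batch
    da = (V - pv1).mean(axis=0)
    db = (ph0 - ph1).mean(axis=0)
    return W + lr * dW, a + lr * da, b + lr * db

# Toy usage on random binary data (sizes and learning rate are illustrative).
rng = np.random.default_rng(0)
V = rng.integers(0, 2, size=(20, 6)).astype(float)
W0 = 0.01 * rng.standard_normal((6, 4))
a0, b0 = np.zeros(6), np.zeros(4)
W1, a1, b1 = cd1_update(V, W0, a0, b0, lr=0.1, rng=rng)
```

Averaging over the batch plays the role of the expectations in the update rule; with a learning rate of zero the parameters are returned unchanged.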

2.1.2. Pretraining

The DBN takes a layer-wise greedy learning strategy, in which RBMs are individually trained one after another and then stacked on top of each other. When the first RBM has been trained, its parameters are fixed, and the hidden unit values are used as the visible unit values for the second RBM. The DBN repeats this process until the last RBM. Since pretraining is unsupervised, no labels are needed. Unsupervised learning is believed to capture the crucial distribution of the data and can therefore help supervised learning when labels are provided. A batch-learning method is usually applied to accelerate the pretraining process; that is, the weights of the RBMs are updated after every minibatch [32, 33].
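The greedy layer-wise strategy with minibatch updates can be sketched as below. The functions `train_rbm` and `pretrain_dbn`, the hyperparameter values, and the choice to pass hidden activation probabilities (rather than sampled states) to the next RBM are illustrative assumptions, not details reported in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(X, n_hidden, epochs, batch_size, lr, rng):
    """Train one RBM with CD-1, updating the parameters after every minibatch."""
    n_visible = X.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        order = rng.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            V = X[order[start:start + batch_size]]
            ph0 = sigmoid(V @ W + b)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)
            ph1 = sigmoid(pv1 @ W + b)
            W += lr * (V.T @ ph0 - pv1.T @ ph1) / V.shape[0]
            a += lr * (V - pv1).mean(axis=0)
            b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

def pretrain_dbn(X, hidden_sizes, epochs=5, batch_size=10, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the output of the one below."""
    rng = np.random.default_rng(seed)
    layers = []
    layer_input = X
    for n_hidden in hidden_sizes:
        W, a, b = train_rbm(layer_input, n_hidden, epochs, batch_size, lr, rng)
        layers.append((W, b))  # keep only what the upward pass needs
        # The trained RBM is now fixed; its hidden activations feed the next RBM.
        layer_input = sigmoid(layer_input @ W + b)
    return layers

# Toy usage: 40 unlabeled samples with 12 features, two hidden layers of 8 units.
X = np.random.default_rng(1).random((40, 12))
layers = pretrain_dbn(X, hidden_sizes=[8, 8])
```

No labels appear anywhere in this sketch, which reflects the unsupervised nature of the pretraining phase.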

2.1.3. Fine-Tuning

After the pretraining phase, the fine-tuning procedure is performed. A softmax output layer can be placed on top of the last RBM as a multiclass classifier, and the output-layer size is set to the same value as the total number of classes. To accomplish classification by utilizing the learned feature, we use the ordinary back-propagation technique through the whole pretrained network to fine-tune the weights for enhanced discriminative ability. Given that the fine-tuning procedure is supervised learning, the corresponding labels for the training data are needed. After training, the predicted class label of a test sample can be obtained by forward propagation, in which the test data pass from the lowest-level visible layer through multi-RBM layers to the softmax output layer.
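A compact sketch of the fine-tuning and prediction steps is given below. It assumes sigmoid hidden layers, a softmax output layer, a mean cross-entropy loss, and plain full-batch gradient descent; the function names and the randomly initialized weights (which merely stand in for pretrained RBM weights) are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, hidden, out):
    """Activations of every layer; the last entry holds the class probabilities."""
    acts = [X]
    for W, b in hidden:
        acts.append(sigmoid(acts[-1] @ W + b))
    acts.append(softmax(acts[-1] @ out[0] + out[1]))
    return acts

def cross_entropy(X, y, hidden, out):
    p = forward(X, hidden, out)[-1]
    return -np.log(p[np.arange(len(y)), y]).mean()

def finetune_step(X, y, hidden, out, lr):
    """One in-place back-propagation step through the whole network."""
    acts = forward(X, hidden, out)
    onehot = np.eye(out[0].shape[1])[y]
    delta = (acts[-1] - onehot) / X.shape[0]  # gradient w.r.t. the softmax inputs
    params = hidden + [out]
    for k in range(len(params) - 1, -1, -1):
        W, b = params[k]
        grad_W = acts[k].T @ delta
        grad_b = delta.sum(axis=0)
        if k > 0:
            # Propagate through the sigmoid of the layer below, using W before its update.
            delta = (delta @ W.T) * acts[k] * (1.0 - acts[k])
        W -= lr * grad_W
        b -= lr * grad_b

def predict(X, hidden, out):
    return forward(X, hidden, out)[-1].argmax(axis=1)

# Toy usage: random weights stand in for pretrained ones; 3 classes, 30 labeled samples.
rng = np.random.default_rng(0)
X = rng.random((30, 12))
y = rng.integers(0, 3, size=30)
hidden = [(0.1 * rng.standard_normal((12, 8)), np.zeros(8)),
          (0.1 * rng.standard_normal((8, 8)), np.zeros(8))]
out = (0.1 * rng.standard_normal((8, 3)), np.zeros(3))
loss_before = cross_entropy(X, y, hidden, out)
for _ in range(100):
    finetune_step(X, y, hidden, out, lr=0.1)
loss_after = cross_entropy(X, y, hidden, out)
labels = predict(X, hidden, out)
```

Prediction is a single forward pass; the label is the index of the largest softmax output.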

2.2. LULC Classification Based on DBN

To better understand the structure of the DBN-based LULC classification, a flowchart is given in Figure 2. To cope with the high variance and speckle of the PolSAR image, a neighbor window is used for local analysis, with the to-be-classified pixel placed at the center. Such a neighbor window of size $w \times w$ can be represented by a vector formed by the pixel values from the window. The original input feature for the DBN consists of the processed Pauli parameters, which are the diagonal elements ($T_{11}$, $T_{22}$, and $T_{33}$ under the reciprocal assumption) of the coherency matrix, taken in logarithmic form and stretched by linear scaling [2]. Each Pauli feature in a window is reshaped into a vector by sequentially connecting its lines. The Pauli vector of one date can then be formed by connecting the three Pauli feature vectors. For multitemporal analysis, the input to the DBN can be formed by connecting the Pauli vectors of the $D$ dates, with the dimension of $3 \times w \times w \times D$.
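The construction of the DBN input vector can be sketched as follows. The function names, the base-10 logarithm, the min-max form of the linear stretch, the toy image size, and the 5 × 5 window are assumptions chosen for illustration; the sketch also assumes an odd window size and a center pixel far enough from the image border.

```python
import numpy as np

def scale_log(feature, eps=1e-10):
    """Log-transform a non-negative feature image and stretch it linearly to [0, 1]."""
    log_f = np.log10(feature + eps)
    lo, hi = log_f.min(), log_f.max()
    return (log_f - lo) / (hi - lo)

def window_vector(dates, row, col, w):
    """Build the input vector for the pixel at (row, col).

    dates: list with one array per acquisition date, each shaped (3, H, W) and
           holding the three scaled Pauli features.
    """
    r = w // 2
    parts = []
    for pauli in dates:          # one Pauli vector per date
        for band in pauli:       # three Pauli features per date
            patch = band[row - r:row + r + 1, col - r:col + r + 1]
            parts.append(patch.reshape(-1))  # connect the window line by line
    return np.concatenate(parts)

# Toy usage: two dates, 20 x 20 pixels, synthetic positive values.
rng = np.random.default_rng(0)
raw_dates = [rng.random((3, 20, 20)) + 0.01 for _ in range(2)]
dates = [np.stack([scale_log(band) for band in d]) for d in raw_dates]
x = window_vector(dates, row=10, col=10, w=5)
```

For two dates, three Pauli features, and a 5 × 5 window, the resulting vector has 2 × 3 × 25 = 150 elements.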

For the training of the DBN, the Pauli vectors of the training samples are assigned to the visible layer of the first RBM as input training features. With a layer-by-layer pretraining strategy, the spatiotemporal dependencies are successively encoded in the stacked hidden layers. In the output layer, the labels of the training samples are provided, and the weights of the DBN are fine-tuned in a supervised manner.

For the prediction, the input features of the test samples are prepared in the same way as that of the training samples. The classification labels for the test samples can be obtained from the forward propagation of the test features through the trained network.

3. Data and Experiment

The study area is located in northern Greater Toronto Area (GTA), Ontario, Canada. The ten major LULC classes in the study area are as follows: high-density residential areas (HD), low-density residential areas (LD), industrial and commercial areas (Ind.), construction sites (Cons.), Water, Forest, Pasture, golf courses (Golf), and two types of crops (Crop1 and Crop2).

Two fine-beam full polarimetric SAR images were acquired by the RADARSAT-2 SAR sensor on June 19, 2008, and July 5, 2008. The center frequency is 5.4 GHz, that is, C-band. The June 19 data were obtained from the descending orbit, whereas the July 5 data were obtained from the ascending orbit, as shown in Figures 3(a) and 3(b). The data from the ascending and descending orbits were expected to complement each other from two different look directions. A total of 4952065 pixels of the overlap between the two images were classified.

During the preprocessing, the multitemporal raw data were first orthorectified using the satellite orbital parameters and a 30 m resolution DEM. Then, they were registered to vector data from the National Topographic Database (NTDB). A multilook process was further applied to generate the PolSAR features with a final spatial resolution of about 10 meters.
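Multilooking is commonly implemented as an average over non-overlapping blocks of pixels. The sketch below shows that generic operation on a single real-valued feature image; the function name and the look factors are hypothetical, since the actual numbers of looks used in this study are not stated.

```python
import numpy as np

def multilook(img, looks_row, looks_col):
    """Average non-overlapping blocks of looks_row x looks_col pixels."""
    rows, cols = img.shape
    out_rows, out_cols = rows // looks_row, cols // looks_col
    # Drop trailing pixels that do not fill a complete block.
    trimmed = img[:out_rows * looks_row, :out_cols * looks_col]
    blocks = trimmed.reshape(out_rows, looks_row, out_cols, looks_col)
    return blocks.mean(axis=(1, 3))

# Toy usage: a 4 x 4 image reduced to 2 x 2 with 2 x 2 looks.
img = np.arange(16, dtype=float).reshape(4, 4)
ml = multilook(img, 2, 2)
```

Averaging reduces speckle variance at the cost of spatial resolution, which is the trade-off behind the coarser final pixel spacing.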

In the classification scheme, 19 subclasses were defined for the abovementioned 10 major land cover classes according to different scattering characteristics (e.g., man-made structures have varying scattering appearances due to their distinctive shapes and orientations). Approximately 1000 training pixels were assigned to each subclass. A total of 120617 pixels evenly distributed over the classification area were randomly selected as the test samples. The training and test samples are shown in Figures 3(c) and 3(d), respectively.

The effective configurations of the DBN for detailed urban mapping were investigated. Comparisons with SVM, conventional neural networks (NN), and stochastic Expectation-Maximization (SEM) were conducted to assess the potential of our approach.

4. Results and Discussions

In this study, several experiments were conducted to validate the impact of different DBN configurations, including different network depths and hidden layer node numbers. To evaluate its classification efficiency, the DBN-based approach was compared with three other land cover methods: SVM, traditional neural networks (NN), and stochastic Expectation-Maximization (SEM). To quantitatively compare and estimate the capabilities of the proposed method, the overall accuracy (OA) and Kappa coefficient [34] were used as performance measurements.
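Both performance measures can be computed directly from a confusion matrix, as in the minimal sketch below. The 2 × 2 matrix is a made-up example, not data from this study.

```python
import numpy as np

def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's Kappa: agreement corrected for the agreement expected by chance."""
    total = cm.sum()
    observed = np.trace(cm) / total
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    return (observed - expected) / (1.0 - expected)

# Made-up two-class confusion matrix (counts).
cm = np.array([[50, 10],
               [5, 35]], dtype=float)
oa = overall_accuracy(cm)   # 85 / 100 = 0.85
k = kappa(cm)               # (0.85 - 0.51) / (1 - 0.51)
```

In this example the chance agreement is (60 × 55 + 40 × 45) / 100² = 0.51, so Kappa is noticeably lower than the overall accuracy.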

The performance of the DBN-based classification method is sensitive to the neighbor window size. As the window size increases, more spatial dependencies can be captured by the DBN, so larger windows might be expected to yield better classification accuracy. However, a larger neighbor window does not guarantee better classification performance: overly large windows can decrease performance because pixels near class boundaries tend to be confounded with neighboring classes. In the following experiments, the neighbor window size is kept fixed, which also fixes the dimension of the input data.

Several parameters of the DBN are listed in Table 1; some of these parameters are based on experimentation, while the others are based on the recommendation of Hinton [33]. All the hidden layers in the DBN have the same number of hidden units. For all the DBN depths mentioned below, only the hidden layers were counted.

4.1. Effect of Network Depth

We first examine how the DBN depth influences the classification performance. The number of hidden layers is one of the key factors in the deep learning strategy. On one hand, it has been shown that an additional RBM layer can yield improved modeling power [35]. A higher level of representation leads to potentially more abstract features [27]. On the other hand, Larochelle et al. [36] argue that unnecessary RBM layers may degrade the generalization capability of the DBN because more layers produce a more complex network model with more parameters to fit. With relatively few training samples, complex models often overfit [35]. The best depth of the DBN usually depends on the specific application and dataset.

To find a proper network depth, DBN models with an increasing number of RBM layers (i.e., from one to four layers) were compared. Each DBN model had a constant structure; that is, all the RBM layers had the same number of hidden neurons. Comparisons were also conducted by varying the number of hidden neurons from 100 to 600 per layer. The results in Figure 4 show that, regardless of the number of neurons, the best overall accuracies were obtained by the two-layer DBN model. Although the comparisons were made only up to four layers, it is expected that, with more layers, the overfitting problem will become more serious and lead to worse results. As such, the depth of the DBN was set to two layers in the following experiment.

4.2. Comparison with Other Classification Methods

To demonstrate the effectiveness of the proposed LULC classification method, a comparison was conducted with three other land cover classification approaches (i.e., SVM, conventional NN, and SEM). The same Pauli features as in the DBN-based method were used for the SVM and the conventional NN. The SEM method [9] applied an adaptive Markov Random Field (MRF) to explore contextual information, and we used the same settings reported there. The DBN contained two RBM layers, and each hidden layer had 500 units. The conventional NN had the same parameters as those of the DBN; the only difference was that the weights of the NN were not pretrained with unsupervised learning. The LIBSVM [37] toolkit was used as the implementation of SVM. SVM is a binary classifier, and the one-against-one strategy was used to convert the multiclass categorization problem into a set of binary classification problems. Experiments were performed using a radial basis function (RBF) kernel. The penalty term $C$ and the RBF kernel width $\gamma$ were selected by grid search over a given set of candidate values, using fivefold cross-validation to identify the pair with the best validation rate. These parameters were then used to train the SVM model. The classification accuracies of the different classification approaches are presented in Table 2, which reports the producer's accuracy and the user's accuracy for each class.
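The one-against-one strategy can be illustrated with the voting sketch below. It is not LIBSVM code: the callable `binary_decide` is a hypothetical stand-in for a trained pairwise classifier, and a nearest-centroid rule replaces the SVM decision function purely to keep the example self-contained.

```python
import itertools
import numpy as np

def one_vs_one_predict(X, n_classes, binary_decide):
    """Combine pairwise decisions by majority voting.

    binary_decide(X, i, j) returns a boolean array that is True where
    class i is preferred over class j.
    """
    votes = np.zeros((X.shape[0], n_classes), dtype=int)
    for i, j in itertools.combinations(range(n_classes), 2):
        i_wins = binary_decide(X, i, j)
        votes[i_wins, i] += 1
        votes[~i_wins, j] += 1
    return votes.argmax(axis=1)

# Stand-in pairwise rule: prefer the class whose (made-up) centroid is closer.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

def nearest_centroid_decide(X, i, j):
    dist_i = ((X - centroids[i]) ** 2).sum(axis=1)
    dist_j = ((X - centroids[j]) ** 2).sum(axis=1)
    return dist_i < dist_j

X = np.array([[0.1, 0.2], [4.8, 5.1], [0.3, 4.9]])
pred = one_vs_one_predict(X, 3, nearest_centroid_decide)

# A k-class problem needs k * (k - 1) / 2 pairwise classifiers.
n_pairs_10_classes = len(list(itertools.combinations(range(10), 2)))
```

The number of pairwise classifiers grows quadratically with the number of classes, which is worth keeping in mind when many subclasses are defined.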

Table 2 shows that, among the four classification methods, the DBN method results in the best performance, with an overall accuracy (OA) of 81.74%. Tables 3, 4, 5, and 6 list the confusion matrices of the four classification methods in percent.
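The per-class measures and the percentage form of a confusion matrix can be derived as in the sketch below. It assumes that rows hold the reference classes and columns the predicted classes; this orientation, the function names, and the example counts are assumptions and do not describe the layout of Tables 3 to 6.

```python
import numpy as np

def producers_accuracy(cm):
    """Correct pixels of each class divided by its reference (row) total."""
    return np.diag(cm) / cm.sum(axis=1)

def users_accuracy(cm):
    """Correct pixels of each class divided by its predicted (column) total."""
    return np.diag(cm) / cm.sum(axis=0)

def to_percent(cm):
    """Normalize each reference row so that it sums to 100 percent."""
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

# Made-up two-class confusion matrix (counts), rows = reference, columns = predicted.
cm = np.array([[50, 10],
               [5, 35]], dtype=float)
pa = producers_accuracy(cm)   # [50/60, 35/40]
ua = users_accuracy(cm)       # [50/55, 35/45]
cm_pct = to_percent(cm)
```

Producer's accuracy reflects omission errors, whereas user's accuracy reflects commission errors, so a class can score high on one and low on the other.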

SEM obtained the highest accuracies in most natural classes (Water, Golf, Pasture, Crop1, and Forest). However, it performed very poorly in several man-made classes (LD, HD, and Ind.). Overall, SEM provided the lowest classification accuracy, with an OA of 72.43%.

Although SVM attained higher producer’s accuracies in Cons., LD, and Crop1, its overall accuracy was still below that of DBN by 5%. The improved classification accuracy of the DBN method mainly originated from the significant increase in the accuracies of Pasture and Crop2. Tables 3 and 6 show that the accuracy of Pasture was greatly improved owing to the decrease in confusion with the Golf class. The improvement in the accuracy of Crop2 was mainly due to the decrease in commission to Cons. One plausible explanation for this improvement is that, with the effective features represented by the hidden layers, DBN could extract additional underlying dependencies and structures from the SAR data.

Compared with conventional NN, DBN obtained higher classification accuracies for almost all land cover types, resulting in a notable increase in OA of 7%. The reason behind the superiority of DBN over NN is that, with an unsupervised pretraining process, more appropriate initial weights are assigned to the network, while the traditional neural network just sets random values for initial weights. The DBN-based method combines the advantages of both unsupervised and supervised learning; thus it can better distill spatiotemporal regularities from SAR data and improve classification performance.

The effects of the different land cover classification methods are further illustrated in Figure 5. As can be observed in Figure 5, compared with SVM, the DBN method significantly reduces the misclassification of Forest. Compared with NN, DBN greatly decreases the misclassification of Pasture as Golf. Compared with SEM, DBN preserves the detail of residential areas. Figure 6 shows another example from an Ind. area. The figure shows that the DBN-based method provides a classification map with more homogeneous regions of the Ind. land cover type, which is more in line with reality.

5. Conclusion

A detailed urban LULC classification method based on the DBN model for PolSAR data is proposed. The effects of different network configurations are discussed. It is found that a DBN with two hidden layers is appropriate for such a detailed LULC mapping application. The experimental results demonstrate that the proposed method provides homogeneous mapping results with preserved shape details and that it outperforms other land cover classification approaches (i.e., SVM, NN, and SEM) in a complex urban environment. Our future work will focus on additional deep learning models for SAR data to further improve the classification results.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is mainly supported by the National Natural Science Foundation of China under Grants U1435219, 61125201, 61202126, 61202127, and 61402507.