Abstract

A Bayesian hierarchical model is presented to classify very high resolution (VHR) images in a semisupervised manner, in which maximum entropy discrimination latent Dirichlet allocation (MedLDA) and a bilateral filter are combined into a novel application framework. The primary contribution of this paper is to overcome the limitations of traditional probabilistic topic models in using pixel-level supervised information and thereby to achieve effective classification of VHR remote sensing images. The framework consists of the following two iterative steps. In the training stage, the model uses each central labeled pixel and its neighborhood, treated as a square labeled image object, to train the classifiers. In the classification stage, each central unlabeled pixel with its neighborhood, treated as an unlabeled object, is assigned the user-provided geoobject class label with the maximum posterior probability. Gibbs sampling is adopted for model inference. The experimental results demonstrate that the proposed method outperforms two classical SVM-based supervised classification methods and a classification method based on probabilistic topic models.

1. Introduction

With the development of imaging technology, many airborne and satellite sensors, for example, QuickBird, IKONOS, and WorldView, can provide very high resolution (VHR) images. Pixel-based supervised classification methods that have been successfully applied to low or moderate resolution remote sensing images do not produce desirable results when applied to VHR images, because these methods neglect the spatial relationships among pixels, and the “salt and pepper” effect is often observed in the classification results of VHR remote sensing images. To solve this problem, object-based classification methods are often used to classify VHR images. For example, Wang et al. integrated pixel-based and object-based methods for mapping mangroves with IKONOS imagery [1]. Salehi et al. developed an object-based classification framework for QuickBird imagery coupled with a layer of height points to classify a complex urban environment [2]. Kim et al. investigated the use of a geographic object-based image analysis approach with the incorporation of object-specific grey-level cooccurrence matrix texture measures from a multispectral IKONOS image for mapping forest type [3]. These methods often consist of two sequential steps, that is, segmentation and classification [4]. Although this two-step procedure works well in some cases, several problems remain. First, the classification results depend heavily on the segmentation algorithm. Second, training objects must be labeled before the image analysis in these classification methods. However, supervised information is often obtained at the pixel level. Thus, a contradiction arises between image objects and pixel-level supervised information, and pixel labels cannot be used directly to train object classifiers [4].

In natural language processing, probabilistic topic models can be applied to find the latent topic representations of documents in a corpus [5, 6]. When these models are used to model remote sensing images, the images are often partitioned into a set of image tiles (i.e., documents) and the characteristics of pixels are often treated as visual words [4–9]. These probabilistic topic models have been used to discover semantic structures from VHR remote sensing images, such as in target detection [7], image clustering [8, 10, 11], and image annotation [12]. Probabilistic topic models can also use supervised information for discovering latent topic representations, such as supervised latent Dirichlet allocation (sLDA) [13], discriminative latent Dirichlet allocation (DiscLDA) [14], semisupervised latent Dirichlet allocation (ssLDA) [4], and maximum entropy discrimination latent Dirichlet allocation (MedLDA) [15]. These models differ in how they are trained for classification: sLDA, DiscLDA, and ssLDA are fully generative models learned under likelihood-driven objective functions, whereas MedLDA incorporates the discriminative max-margin principle into topic learning, which makes it more suitable for classification tasks. Additionally, except for the ssLDA, the supervised information in these models is associated with documents or image tiles rather than individual pixels. Thus, a contradiction arises between the classification of individual pixels and tile-level supervised information in these models, and tile-level labels cannot be used directly to train classifiers for individual pixels.

To address the aforementioned problems in object-based supervised classification methods and probabilistic topic models, we present an object-oriented semisupervised classification method [4, 16] for VHR remote sensing images based on the MedLDA model [15, 17] and the bilateral filter [18] in a novel framework, which is referred to as the semisupervised MedLDA (ssMedLDA). Each image object is defined as a square image block consisting of a pixel and its neighborhood, so that no image segmentation is required [10]. The main contribution of our proposed model is the combination of a probabilistic topic model with pixel-level supervised information to achieve effective object-oriented semisupervised classification of VHR remote sensing images. The remainder of this paper is organized as follows. In Section 2, the proposed approach is presented in detail. The experimental results are given in Section 3. Finally, the conclusions are presented in Section 4.

2. Methodology

First, the mismatch between object-level and pixel-level supervised information is discussed in this section. Second, the proposed ssMedLDA method for classification is introduced. Third, the algorithm is presented.

2.1. Problem Statement

Let $S = \{(i, j) \mid 1 \le i \le W,\ 1 \le j \le H\}$ be the set of lattice sites in the given satellite image, where $W$ and $H$ are the width and height of the image, respectively. A random field indexed by $S$ is given by $X = \{x_s \mid s \in S\}$, where the random variable $x_s$ at site $s$ takes a value in its state space $\Lambda$. The set $X$ is drawn from the state space with the joint probability $p(X)$. In this paper, a linear discriminative function, for example, $F(y, \bar{\mathbf{z}}; \eta) = \eta_y^{\top} \bar{\mathbf{z}}$, is used to bridge the user-provided geoobject class label $y$ and an observed image object $\mathbf{w}$ via an inferred latent image feature $\bar{\mathbf{z}}$, where $\eta$ is learned under the max-margin principle.

As shown in Figure 1, the latent image feature $\bar{\mathbf{z}}_d$ in probabilistic topic models is an expected measurement for the $d$th image object $\mathbf{w}_d$, which consists of $N$ pixels, that is, $\mathbf{w}_d = \{w_{d1}, \ldots, w_{dN}\}$. For image object $\mathbf{w}_d$, the latent image feature $\bar{\mathbf{z}}_d$ is a $K$-dimensional vector with element $\bar{z}_{dk} = \frac{1}{N} \sum_{n=1}^{N} I(z_{dn} = k)$, where $I(\cdot)$ is an indicator function that equals 1 when the predicate holds and 0 otherwise. Because the latent topic assignments of the pixels in image object $\mathbf{w}_d$, that is, $\mathbf{z}_d = \{z_{d1}, \ldots, z_{dN}\}$, are assumed to be independent and identically distributed in the MedLDA, the contribution or distance coefficient $\rho_{dn}$ of each latent topic assignment to the image object is assumed to be equal, that is, $\rho_{dn} = 1/N$. When image objects are given geoobject class labels, the parameter $\eta$ of the linear discriminative function can be learned using the inferred latent features.
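The equal-contribution feature described above can be illustrated with a short sketch. It assumes that topics are indexed from 0 and that every pixel contributes the same share to its object; the function name `topic_feature` is hypothetical and not taken from the paper.

```python
from collections import Counter


def topic_feature(assignments, num_topics):
    """Fraction of the object's pixels assigned to each topic (equal weights)."""
    counts = Counter(assignments)
    n = len(assignments)
    return [counts.get(k, 0) / n for k in range(num_topics)]


# Six pixels of one image object and their (illustrative) topic assignments.
z_bar = topic_feature([0, 2, 2, 1, 2, 0], num_topics=4)
```

Each entry of `z_bar` is the share of pixels assigned to one topic, so the entries sum to 1.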

As shown in Figure 1, the user-provided geoobject class labels are often given for only a subset of the pixels. Thus, a contradiction arises between image objects and pixel-level supervised information: the supervised information has a different influence on each pixel in the image object and cannot be used directly to train object-oriented classifiers. The problem is therefore how to use labeled pixels directly to train object-oriented classifiers.

To solve the above-mentioned problem, an edge-preserving filter (EPF) [19], for example, the bilateral filter [18], is used to construct the relationship between the central labeled pixel and the other pixels in the image object. The pixel-level supervised information can be diffused to object-level supervised information based on the distance coefficient $\rho_{dn}$, which includes both the spatial distance and the spectral distance between the center pixel $w_{dc}$ and the $n$th pixel $w_{dn}$ in object $\mathbf{w}_d$. $g_s$ and $g_r$ are defined as the spatial and spectral decay functions, respectively. Therefore, the pixel-level supervised information diffuses to object-level semisupervised information in this paper by $\rho_{dn}$:

$$\rho_{dn} \propto g_s\left(\lVert p_{dc} - p_{dn} \rVert; \sigma_s\right)\, g_r\left(\lVert f_{dc} - f_{dn} \rVert; \sigma_r\right),$$

where $p_{dc}$ is the space location of $w_{dc}$, $p_{dn}$ is the space location of $w_{dn}$, $f_{dc}$ is the spectral location of $w_{dc}$, $f_{dn}$ is the spectral location of $w_{dn}$, $\sigma_s$ is a space decay parameter, $\sigma_r$ is a spectral decay parameter, and $0 \le \rho_{dn} \le 1$ and $\sum_{n} \rho_{dn} = 1$.
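As an illustration of how such coefficients might be computed, the following sketch uses the Gaussian decay functions customary for the bilateral filter and normalizes the weights so that they sum to 1 over the object. The Gaussian form, the factor of 2 in the exponent, the normalization, and the function name `bilateral_weights` are assumptions, since the exact expression is not recoverable from the text.

```python
import math


def bilateral_weights(center_pos, center_spec, positions, spectra, sigma_s, sigma_r):
    """Normalized weights combining spatial and spectral closeness to the center pixel."""
    raw = []
    for pos, spec in zip(positions, spectra):
        d_space = sum((a - b) ** 2 for a, b in zip(center_pos, pos))
        d_spec = sum((a - b) ** 2 for a, b in zip(center_spec, spec))
        spatial = math.exp(-d_space / (2.0 * sigma_s ** 2))    # spatial decay
        spectral = math.exp(-d_spec / (2.0 * sigma_r ** 2))    # spectral decay
        raw.append(spatial * spectral)
    total = sum(raw)
    return [r / total for r in raw]


# A 3 x 3 single-band object with an edge: the right-hand column is much brighter.
positions = [(r, c) for r in range(3) for c in range(3)]
spectra = [(10.0,) if c < 2 else (50.0,) for r in range(3) for c in range(3)]
weights = bilateral_weights((1, 1), (10.0,), positions, spectra, sigma_s=5.0, sigma_r=10.0)
```

In this toy object, the pixels across the edge receive much smaller weights than the pixels on the same side as the center, which is the edge-preserving behavior the filter is chosen for.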

2.2. ssMedLDA

Let $\mathbf{w}_d = (w_{d1}, \ldots, w_{dN})$ be a vector of pixels appearing in the $d$th image object in the remote sensing image, and let $K$ be the number of latent topics in the model. The vector of the discrete response variable $\mathbf{y}$ in the remote sensing image holds the geoobject class labels, and $C$ is the number of class labels. Unlike in the MedLDA, the geoobject class label variable is attached to the central pixel of each object, as shown in Figure 2. The half hashed pattern denotes that the class node is partially observed. The generative process of the ssMedLDA model is as follows:
(1) Sample topic proportions $\theta_d \sim \mathrm{Dir}(\alpha)$.
(2) For each of the $N$ pixels in the $d$th image object $\mathbf{w}_d$,
(a) sample a topic assignment $z_{dn} \sim \mathrm{Mult}(\theta_d)$;
(b) sample a word $w_{dn}$ from $p(w_{dn} \mid z_{dn}, \beta)$, a multinomial probability conditioned on $z_{dn}$, namely, $w_{dn} \sim \mathrm{Mult}(\beta_{z_{dn}})$.
(3) Sample the response $y_d \sim p(y_d \mid \bar{\mathbf{z}}_d, \eta)$, where $\bar{\mathbf{z}}_d$ is a vector with element $\bar{z}_{dk} = \sum_{n=1}^{N} \rho_{dn} I(z_{dn} = k)$ and $I(\cdot)$ is an indicator function that equals 1 when the predicate holds and 0 otherwise in the image object $\mathbf{w}_d$.

The distance coefficient is integrated into ssMedLDA through conditional topic random fields [20] as shown in Figure 2.
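The first two steps of the generative process can be sketched as follows. This is a minimal illustration under several assumptions: pixels are single-band values emitted from a per-topic Gaussian (the inference step refers to Gaussian parameters), the Dirichlet draw is obtained by normalizing gamma variates, and the latent feature is formed with equal weights rather than bilateral coefficients. The name `sample_object` and all numeric values are hypothetical.

```python
import random


def sample_object(num_pixels, alpha, means, stds, rng):
    """Draw topic proportions, topic assignments, and pixel values for one object."""
    num_topics = len(alpha)
    # Step 1: theta ~ Dirichlet(alpha), via normalized gamma variates.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    theta = [g / total for g in gammas]
    # Step 2(a): one topic assignment per pixel.
    topics = rng.choices(range(num_topics), weights=theta, k=num_pixels)
    # Step 2(b): one pixel value per assignment (Gaussian emission, an assumption).
    pixels = [rng.gauss(means[k], stds[k]) for k in topics]
    # Latent feature with equal weights for every pixel.
    z_bar = [topics.count(k) / num_pixels for k in range(num_topics)]
    return theta, topics, pixels, z_bar


rng = random.Random(0)
theta, topics, pixels, z_bar = sample_object(
    num_pixels=121,                 # an 11 x 11 object
    alpha=[13.5, 13.5, 13.5, 13.5],
    means=[0.1, 0.3, 0.6, 0.9],
    stds=[0.05, 0.05, 0.05, 0.05],
    rng=rng,
)
```

Replacing the equal weights in `z_bar` with per-pixel distance coefficients would give the weighted feature that ssMedLDA uses for labeled objects.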

2.3. Algorithm

As shown in Figure 3, Gibbs sampling is used for model inference, that is, to approximate the posterior distribution of the latent variables in the model. The algorithm of ssMedLDA can be summarized as follows.

Step 1 (initialization). In this step, the parameters in the model are initialized. The following parameters must be set: the number of latent topics, $K$; the Dirichlet hyperparameter, $\alpha$; the neighborhood size; the positive regularization parameter, $c$; and the cost of making a wrong prediction, $\ell$. In addition, the matrix of topic assignments $\mathbf{z}$ is initialized by assigning random topics to all pixels.

Step 2 (inference). In the ssMedLDA, topics are learned in two ways based on whether the supervised information of the central pixel is available or not.
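(a) When the central pixel in the $d$th image object is unlabeled, analogous to the deduction in previous studies [5, 21], the conditional distribution of $z_{dn}$ is given by the following:

$$p(z_{dn} = k \mid \mathbf{z}_{-dn}, \mathbf{w}) \propto \mathcal{N}(w_{dn}; \mu_k, \Sigma_k)\,\left(C_{d,-n}^{k} + \alpha_k\right),$$

where $(\mu_k, \Sigma_k)$ are the parameters of the $k$th Gaussian distribution, $C_{d,-n}^{k}$ is the number of pixels in object $d$ that are assigned to topic $k$, not counting the $n$th pixel, and $\alpha_k$ is the value of the Dirichlet hyperparameter associated with topic $k$.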
(b) When the central pixel is labeled, analogous to the deduction previously presented [17], the formulas of the conditional posterior of change to the following:where , , , is the augmented variables, is the classification model, and equals 1 when ; otherwise it is equivalent to −1; is the number of classes of interest.
The augmented variables can be given by the following:where indicates the inverse Gaussian, obeys the inverse Gaussian, and .
The linear discriminative function can be given by the following:where , , and indicates the number of labeled pixels.
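The update for an object whose central pixel is unlabeled, Step 2(a), can be sketched as below. The sketch assumes single-band pixels, fixed Gaussian parameters per topic, and a conditional proportional to the Gaussian likelihood times the topic count (excluding the current pixel) plus the Dirichlet hyperparameter. The helper names are hypothetical, and the labeled case in Step 2(b), with its augmented variables, is not covered.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def topic_conditional(pixel, counts_without, alpha, means, stds):
    """Normalized conditional over topics for one pixel."""
    scores = [
        gaussian_pdf(pixel, means[k], stds[k]) * (counts_without[k] + alpha[k])
        for k in range(len(alpha))
    ]
    total = sum(scores)
    return [s / total for s in scores]


def resample_topic(n, pixels, topics, alpha, means, stds, rng):
    """Resample the topic of pixel n in place and return the conditional used."""
    num_topics = len(alpha)
    counts = [0] * num_topics
    for i, k in enumerate(topics):
        if i != n:                      # leave the current pixel out of the counts
            counts[k] += 1
    probs = topic_conditional(pixels[n], counts, alpha, means, stds)
    topics[n] = rng.choices(range(num_topics), weights=probs, k=1)[0]
    return probs


pixels = [0.1, 0.2, 5.1, 4.9]
topics = [0, 0, 1, 0]                   # the last pixel starts in the wrong topic
probs = resample_topic(3, pixels, topics, alpha=[1.0, 1.0],
                       means=[0.0, 5.0], stds=[1.0, 1.0], rng=random.Random(1))
```

In this toy case the likelihood term dominates the count term, so the last pixel is reassigned to the topic whose mean is close to its value.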

Step 3 (classification). Each unlabeled pixel is assigned the user-provided geoobject class label with the maximum posterior probability.
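A minimal sketch of the classification step is given below. It assumes one weight vector per class and uses the largest linear discriminant score as a stand-in for the maximum posterior probability; the function name `classify` and the numeric values are illustrative only.

```python
def classify(z_bar, eta):
    """Return the index of the class with the largest linear score, and all scores."""
    scores = [sum(w * z for w, z in zip(eta_c, z_bar)) for eta_c in eta]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


eta = [
    [2.0, -1.0, 0.0],    # weights of class 0 over three topics
    [-1.0, 1.5, 0.5],    # weights of class 1 over three topics
]
label, scores = classify([0.2, 0.7, 0.1], eta)
```

Here the object is dominated by the second topic, which class 1 weights positively, so class 1 is selected.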

3. Experimental Results and Discussion

In this section, the experimental data and the quantitative evaluation methods for the experimental results are described. The performance of the ssMedLDA is then compared with that of the pixel-based SVM, spectral-spatial SVM, and ssLDA methods. Finally, the influence of the number of labeled pixels on the methods is analyzed. The proposed ssMedLDA algorithm and the other methods are implemented in a MATLAB environment.

3.1. Experimental Data

The VHR data was collected by the ROSIS optical sensor over the urban area of the University of Pavia, Italy. The image data with a size of 610 × 340 pixels and a high spatial resolution of 1.3 m/pixel was used in the experiment. Figure 4(a) shows a color composite of the image, whereas Figure 4(b) presents the ground truth map, which includes nine geoobject classes of interest, that is, asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, bricks, and shadows, for which each geoclass is represented by a color.

3.2. Comparison with Existing Methods and Parameters Setting

To evaluate the effectiveness of the proposed method, the performance of the ssMedLDA was compared to that of three classification methods: (1) the pixel-based SVM; (2) the spectral-spatial SVM, in which a pixel-based SVM classification is generated and then refined by majority voting within the neighborhoods defined by image segmentation (termed SVM + MV) [22]; and (3) the ssLDA, which is a supervised probabilistic topic model that uses pixel-level supervised information. The segmentation map for the three existing classification methods was produced by the entropy rate superpixel segmentation algorithm, which is effective and efficient [23]. The optimal number of segments was experimentally derived.
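The majority-voting step of SVM + MV can be illustrated as follows. The sketch assumes that a per-pixel label and a segment identifier are already available for every pixel; the function name `majority_vote` is hypothetical, and ties are resolved in favor of the label encountered first, which is a choice made here and not taken from [22].

```python
from collections import Counter, defaultdict


def majority_vote(pixel_labels, segment_ids):
    """Replace each pixel label with the most frequent label in its segment."""
    votes = defaultdict(Counter)
    for label, seg in zip(pixel_labels, segment_ids):
        votes[seg][label] += 1
    winner = {seg: counter.most_common(1)[0][0] for seg, counter in votes.items()}
    return [winner[seg] for seg in segment_ids]


pixel_labels = ["asphalt", "asphalt", "trees", "trees", "trees", "asphalt"]
segment_ids = [0, 0, 0, 1, 1, 1]
smoothed = majority_vote(pixel_labels, segment_ids)
```

The isolated minority label inside each segment is overwritten, which removes salt-and-pepper noise but also makes the result depend on the quality of the segmentation.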

In the two SVMs, a multiclass one-versus-one SVM classification with an RBF kernel from LIBSVM was used [24], and the other parameters were determined by parameter optimization. In the ssLDA, the number of latent topics $K$ was 80 and the Dirichlet hyperparameter $\alpha$ was $1 + 50/K$. In the ssMedLDA, the parameters $K$ and $\alpha$ were identical to those of the ssLDA. The positive regularization parameter and the cost of making a wrong prediction were both 1, the number of classes of interest was 9 according to the ground truth, the neighborhood size was 11, the space decay parameter was 5, and the spectral decay parameter was the variance of the VHR image.
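For reference, the ssMedLDA settings listed above can be collected in a small configuration object. This is only a convenience sketch: the class and field names are hypothetical, the Dirichlet hyperparameter is read as 1 + 50 divided by the number of topics, and the neighborhood size of 11 is interpreted as an 11 × 11 square object. The spectral decay parameter is omitted because it is computed from the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SsMedLdaSettings:
    num_topics: int = 80             # number of latent topics
    regularization_c: float = 1.0    # positive regularization parameter
    cost_l: float = 1.0              # cost of making a wrong prediction
    num_classes: int = 9             # geoobject classes in the ground truth
    neighborhood_size: int = 11      # side length of the square object (assumption)
    space_decay: float = 5.0         # space decay parameter of the bilateral filter

    @property
    def dirichlet_alpha(self) -> float:
        # Assumes the hyperparameter is 1 + 50 / (number of topics).
        return 1.0 + 50.0 / self.num_topics

    @property
    def pixels_per_object(self) -> int:
        return self.neighborhood_size ** 2


settings = SsMedLdaSettings()
```

Under these assumptions `settings.dirichlet_alpha` evaluates to 1.625 and each object contains 121 pixels.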

In total, 10% of the labeled pixels were selected at equal intervals to train the four methods, and the remaining 90% of the labeled pixels were set aside for testing. The corresponding classification maps obtained with the 10% labeled pixels are shown in Figures 5(a)–5(d). The ssMedLDA achieves a more compact and smoother classification result than the SVM and ssLDA. The visual results from SVM + MV and ssMedLDA look similar, but the latter does not require a segmentation map. These results support the benefit of integrating the max-margin principle, topic modeling, and the distance-based diffusion of pixel-level supervised information to the neighborhood of each labeled pixel.
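The equal-interval selection can be sketched as follows. The sketch assumes that the labeled pixels of a class are held in an ordered list and that every tenth pixel, starting from the first, is taken for training; the starting offset, the ordering, and the function name `split_equal_interval` are assumptions.

```python
def split_equal_interval(indices, interval):
    """Take every `interval`-th index for training and keep the rest for testing."""
    train = indices[::interval]
    train_set = set(train)
    test = [i for i in indices if i not in train_set]
    return train, test


labeled = list(range(1000))              # stand-in for the labeled pixels of one class
train, test = split_equal_interval(labeled, interval=10)
```

With an interval of 10, one tenth of the labeled pixels is used for training and the remaining nine tenths for testing.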

Table 1 displays the class-specific (producer) accuracies, overall accuracies, and kappa coefficients for all methods and shows that the ssMedLDA method yields the best overall accuracy and kappa coefficient. Its producer accuracies for asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, and bricks are better than those of the other methods. The overall accuracy and kappa coefficient of SVM + MV are better than those of the SVM, but its producer accuracy for shadows is worse because of the segmentation map. Therefore, the influence of segmentation on classification is not always beneficial.
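The three measures reported in Table 1 can be computed from a confusion matrix as sketched below. The sketch assumes that rows correspond to reference classes and columns to predicted classes and that every reference class has at least one test pixel; the function name and the example matrix are illustrative and unrelated to the values in Table 1.

```python
def accuracy_report(confusion):
    """Overall accuracy, per-class producer accuracies, and Cohen's kappa."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]                               # reference
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]    # predicted
    overall = sum(confusion[i][i] for i in range(k)) / n
    producer = [confusion[i][i] / row_totals[i] for i in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - expected) / (1.0 - expected)
    return overall, producer, kappa


overall, producer, kappa = accuracy_report([[45, 5], [10, 40]])
```

For this two-class example the overall accuracy is 0.85, the producer accuracies are 0.9 and 0.8, and kappa is 0.7.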

3.3. Influence of Different Sizes of Labeled Pixels

To explore how the performance of the ssMedLDA and the other object-based models varies with the number of labeled pixels, different proportions of the ground truth pixels were used for training, that is, approximately 1.0%, 2.0%, 4.0%, 5.9%, 7.7%, 10.0%, 12.5%, 14.3%, 16.7%, 20.0%, and 25.0%. Labeled training pixels for each class were acquired at equal intervals from the ground truth, and the remaining labeled pixels of the ground truth were used for testing. Identical labeled pixels were used to train the SVM + MV, ssLDA, and ssMedLDA. All of the unlabeled pixels were also used for training in the ssLDA and ssMedLDA. Figure 6 shows the overall accuracies of the SVM + MV, ssLDA, and ssMedLDA against the proportions of labeled training pixels. Two results are apparent: (1) regardless of the number of labeled pixels, the ssMedLDA method achieves a better performance than the ssLDA and SVM + MV, because the ssMedLDA integrates the mechanisms of probabilistic topic models, maximum entropy discrimination, semisupervised learning, and spatial coherence; and (2) the performance of the ssMedLDA, SVM + MV, and ssLDA is not always enhanced by increasing the proportion of labeled pixels, because the samples are chosen at equal intervals, which means that, for example, the samples of the 4% condition do not completely contain the 2% samples.
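The listed proportions are consistent with taking one labeled pixel out of every 100, 50, 25, 17, 13, 10, 8, 7, 6, 5, and 4. This reading is an inference from the percentages and is not stated in the text. The sketch below reproduces the percentages from those intervals and shows that two sample sets drawn with different intervals from the same starting point need not be nested.

```python
# Assumed sampling intervals (one training pixel out of every `interval` labeled pixels).
intervals = [100, 50, 25, 17, 13, 10, 8, 7, 6, 5, 4]
proportions = [round(100 / interval, 1) for interval in intervals]


def equal_interval_indices(n, interval):
    """Indices selected when every `interval`-th of n ordered pixels is taken."""
    return set(range(0, n, interval))


smaller = equal_interval_indices(1000, 10)   # about 10.0 % of the pixels
larger = equal_interval_indices(1000, 8)     # about 12.5 % of the pixels
nested = smaller <= larger                   # is the smaller set contained in the larger one?
```

Because the larger training set does not contain the smaller one, adding labeled pixels replaces some training samples instead of only adding new ones, which is one way the accuracy can fluctuate between conditions.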

4. Conclusions

In this paper, a semisupervised method has been proposed to address the problem of VHR remote sensing image classification. The method combines the MedLDA model and bilateral filter through conditional topic random fields for training. The proposed method takes advantage of spatial and spectral relationships in VHR images. Additionally, the ssMedLDA does not require a segmentation map, and the pixels with their neighborhoods are used as objects to enforce spatial regularization over the classification results. The experimental results show that the proposed approach is superior to the SVM + MV and ssLDA methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this research article.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (no. 41571334 and no. 41401374) and the Fundamental Research Funds for the Central Universities.