Remote Sensing Landslide Recognition Based on Convolutional Neural Network
Landslides are frequent and widespread natural disasters, and extracting landslide location information in a timely manner is of great significance. At present, most studies still select a single band or the RGB bands as the features for landslide recognition. To improve the efficiency of landslide recognition, this study proposed a remote sensing recognition method based on a convolutional neural network with mixed spectral characteristics. Firstly, this paper added NDVI (normalized difference vegetation index) and NIRS (near-infrared spectroscopy) to enhance the features. Then, remote sensing images (predisaster and postdisaster images) with the same spatial information but different time series information regarding the landslide were taken directly from the GF-1 satellite as input images. By combining the 4 bands (red + green + blue + near-infrared) of the prelandslide remote sensing images with the 4 bands of the postlandslide images and the NDVI images, images with 9 bands were obtained, and the band values reflecting the changing characteristics of the landslide were determined. Finally, a deep learning convolutional neural network (CNN) was introduced to solve the problem. The proposed method was tested and verified with remote sensing data from the 2015 large-scale landslide event in Shanxi, China, and the 2016 large-scale landslide event in Fujian, China. The method reached accuracies of 98.98% and 97.69% on the two events, respectively. Compared with the traditional methods, the recognition efficiency was improved, supporting the effectiveness and feasibility of the method.
In recent decades, disaster detection has been one of the major research goals in the modern remote sensing field. Researchers have studied the effects of changes occurring due to disasters using sensors and simple image processing techniques. As a global environmental problem, landslide disasters seriously threaten the safety of human life and property. Therefore, it is very important to recognize landslide location information in a timely and accurate manner and to implement relevant measures. In the early stage of disaster detection at home and abroad, traditional machine learning methods were mainly used to solve these problems. For example, Cheng et al. presented a new scene classification method for automatic landslide detection from remote sensing imaging, and Danneels et al. used a maximum likelihood classification (MLC) method for landslide recognition in multispectral remote sensing images. Segmentation of the output image into landslide and nonlandslide areas was obtained using the double-threshold technique in combination with a histogram-based thresholding method. Junping et al. used a support vector machine (SVM) for hyperspectral data classification. Ningning et al. addressed the shortcomings of support vector machines and introduced a fuzzy support vector machine to identify landslides in remote sensing images. By improving the fuzzy membership to overcome the influence of noise in the training process and improving the penalty coefficient to eliminate the negative impact of unbalanced sample sizes, the accuracy of landslide recognition was further enhanced. Traditional machine learning methods are relatively mature in the disaster detection field, but both SVM and MLC are shallow learning algorithms because they have a limited number of computational units. As such, shallow learning networks have difficulty expressing complex functions effectively. With an increase in sample size and sample diversity, shallow models gradually fail to keep pace with complex samples.
In theory, the more parameters a model has, the higher its complexity and the larger its “capacity,” meaning that it can complete more complex learning tasks. However, in general, the training of complex models is inefficient and easily causes overfitting. Therefore, such models have not typically been favored by users. With the advent of cloud computing and the big data era, increased computing power can greatly alleviate the training inefficiency, and a large increase in training data can reduce the overfitting risk. Therefore, complex models represented by “deep learning” have begun to receive increasing attention. In recent years, deep learning networks composed of multiple nonlinear mapping layers have been applied in many different fields, such as speech recognition, object detection, information retrieval, and others, especially in the field of image processing for facial recognition and video analysis. Studies have shown that in big data learning, deep learning performs better than traditional machine learning. In the remote sensing field, due to the tremendous amounts of data, the use of deep learning to process remote sensing images has become mainstream. For example, Cheng et al. used three convolutional neural network (CNN) models, namely, AlexNet, VGGNet, and GoogleNet, to extract features of high-resolution remote sensing image scenes and used SVM to classify the scenes. Masi et al. proposed a CNN for pan-sharpening of remote sensing images, showing that the proposed method outperforms the state of the art on both full-reference and no-reference measures. Yu et al. used a CNN to detect landslides and used the RSG_R algorithm to extract the discriminant information of the postdisaster image elements. Based on the CNN, Luo et al. proposed an SR method that could acquire more precise reconstructed high-spatial-resolution images. Based on deep convolutional neural networks (DCNNs) and multiscale features fusion (MSFF), Zhou et al. proposed the DCNN_MSFF to improve the accuracy of remote sensing image scene classifications. Mohammad et al. used a deep convolutional neural network for complex wetland classification. Zhou et al. proposed an effective framework named compact and discriminative stacked autoencoder (CDSAE) for HSI classification. To address the challenges of within-class diversity and between-class similarity, Cheng et al. proposed to learn DCNNs by optimizing a new objective function, Gong et al. proposed an architecture of the initial FD-CNN model, in which a new rotation-invariant layer (FCa) and a new Fisher discriminative layer (FCb) are added to the existing CNN models, and Gong et al. proposed a new method to extract hierarchical deep spatial features for HSI classification with off-the-shelf CNN models.
In this context, CNN was investigated for recognizing landslides in this study. For the feature selection of multispectral remote sensing images, this paper tried to add NDVI and NIR to enhance features because the band values changed greatly after the landslide. This paper selected remote sensing images with the same spatial information but different time series information for band fusion, and the fusion bands before and after the disaster were used as the characteristic values that well reflected the landslide characteristic information. In the meantime, because remote sensing image information is abundant and highly nonlinear, CNN can better learn the features of the landslide and complete the landslide recognition. Finally, this paper used mixed band as the feature value and selected CNN classifier for landslide recognition.
At present, the problem of landslide identification is that it is easy to confuse landslide area with soil area. In this paper, firstly, the NDVI vegetation index of postdisaster remote sensing images was extracted, and the NDVI vegetation index was combined with other 3 bands (red, green, and blue) to strengthen the distinction between soil region and vegetation region. Then, without losing original color information, this paper added the near-infrared band for landslide identification. Because remote sensing images from before and after a disaster event can well reflect landslide change information, the remote sensing images (predisaster and postdisaster images) with the same spatial information but different time series information were used, and this paper combined the bands of predisaster images and the bands of postdisaster images as features. Finally, this paper obtains the mixing information that can reflect the landslide area and nonlandslide area. Since more bands are introduced, this paper chose CNN, which has advantages in big data processing. The landslide recognition could be completed after the convolutional neural network learning process.
2.1. Data Preprocessing
As shown in Figure 1, the sample set was initially constructed. After preprocessing the remote sensing images using geometric rectification, projection transformation, and image fusion, remote sensing images with a resolution of 2 m and 4 bands were obtained. Since changes in the remote sensing images before and after disasters are required for the landslide information, the image pixels should correspond to each other as much as possible. Therefore, the remote sensing image registration is essential, and the accuracy of the registration affects the outcome of the landslide recognition.
In this paper, NIR is added for soil analysis. NIR spectroscopy uses diffuse reflection to analyze the chemical properties of substances, and the size of soil particles is closely related to the intensity of diffuse reflection. Therefore, NIR spectroscopy can be used to analyze and predict the characteristics of soil particles. NIR can help us to distinguish the soil area from the landslide area.
The NDVI is introduced as a feature to distinguish vegetation regions from nonvegetation regions in this study:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R},$$

where NIR is the reflectance of the near-infrared band and R is the reflectance of the red band. NDVI is a remote sensing index reflecting the vegetation coverage status of land. NDVI can eliminate most of the changes in irradiance related to instrument calibration, solar angle, topography, cloud shadow, and atmospheric conditions, and it enhances the responsiveness to vegetation. It is the most widely used vegetation index among more than 40 existing vegetation indices.
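The NDVI computation, NDVI = (NIR − R) / (NIR + R), can be sketched as follows. This is a minimal illustration, assuming the near-infrared and red bands are already available as NumPy reflectance arrays; the function name `compute_ndvi` and the `eps` guard for pixels where both bands are zero are illustrative assumptions, not part of the original method.

```python
import numpy as np

def compute_ndvi(nir, red, eps=1e-8):
    """Return NDVI = (NIR - R) / (NIR + R) for each pixel."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    empty = np.abs(denom) < eps          # e.g. no-data pixels (assumption)
    safe = np.where(empty, 1.0, denom)   # avoid division by zero
    return np.where(empty, 0.0, (nir - red) / safe)

# Toy reflectances: vegetation (high NIR), bare soil, no-data, equal bands.
nir = np.array([[0.50, 0.30], [0.0, 0.20]])
red = np.array([[0.10, 0.25], [0.0, 0.20]])
ndvi = compute_ndvi(nir, red)
```

Vegetated pixels give values close to 1, while bare soil and freshly exposed landslide surfaces give values near 0, which is why the index helps separate the two.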
Even after the introduction of NDVI and NIR, it is still difficult to distinguish the landslide area from the bare soil area using postlandslide spectral information alone. Since remote sensing images from before and after a landslide can reflect landslide information well, the remote sensing images (predisaster and postdisaster images) with the same spatial information but different time series information are used as input images. This paper combines the bands of the prelandslide remote sensing images with the bands of the postlandslide images as the feature for learning. To select the characteristic values, the red, green, blue, and near-infrared band values of the remote sensing images were extracted in this study. By fusing the 4 predisaster bands, the 4 postdisaster bands, and the NDVI index, 9 remote sensing image bands were obtained and used as the characteristic values. The images were clipped in batches, and the size of each sample was 8 × 8 × 9. The ratio of the training set to the testing set was about 3 : 1, and the sample set was normalized.
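A minimal sketch of the band fusion and batch clipping is given here. It assumes the predisaster and postdisaster images are already co-registered arrays of shape (H, W, 4), and that samples are cut as non-overlapping 8 × 8 tiles; the tiling scheme and the helper names `stack_bands` and `clip_patches` are assumptions, since the text does not specify how the clipping was done.

```python
import numpy as np

def stack_bands(pre, post, ndvi):
    """Stack pre (H, W, 4), post (H, W, 4) and NDVI (H, W) into (H, W, 9)."""
    return np.concatenate([pre, post, ndvi[..., np.newaxis]], axis=-1)

def clip_patches(image, size=8):
    """Cut non-overlapping size x size patches; incomplete borders are dropped."""
    h, w, c = image.shape
    rows, cols = h // size, w // size
    trimmed = image[:rows * size, :cols * size, :]
    # (rows, size, cols, size, c) -> (rows, cols, size, size, c)
    patches = trimmed.reshape(rows, size, cols, size, c).swapaxes(1, 2)
    return patches.reshape(rows * cols, size, size, c)

# Random stand-ins for co-registered GF-1 bands (illustrative data only).
rng = np.random.default_rng(0)
pre = rng.random((20, 17, 4))
post = rng.random((20, 17, 4))
ndvi = rng.random((20, 17))

stacked = stack_bands(pre, post, ndvi)   # shape (20, 17, 9)
patches = clip_patches(stacked)          # shape (4, 8, 8, 9)
```

Patches are returned in row-major order, so the same tiling can be applied to the label image to keep samples and labels aligned.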
For label making, this paper extracted binary images from the remote sensing images. The binary images were clipped in batches in the same way as the samples. Initially, this paper set the rate-of-change threshold to 50%, but found that a significant amount of edge information was lost. Therefore, in the final setting, a sample with a rate of change greater than or equal to 25% was labeled 1, and a sample with a rate of change of less than 25% was labeled 0. In this way, the label corresponding to each sample was obtained. Finally, the sample sets and labels were fed into the CNN as input data for training.
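The labeling rule can be sketched as follows. The sketch assumes that the "rate of change" of a sample is the fraction of changed (value 1) pixels in the corresponding 8 × 8 tile of the binary image; that interpretation, the non-overlapping tiling, and the helper name `patch_labels` are assumptions made for illustration.

```python
import numpy as np

def patch_labels(change_mask, size=8, threshold=0.25):
    """Label each size x size tile 1 if its changed-pixel fraction >= threshold."""
    h, w = change_mask.shape
    rows, cols = h // size, w // size
    trimmed = change_mask[:rows * size, :cols * size].astype(np.float64)
    blocks = trimmed.reshape(rows, size, cols, size)
    fraction = blocks.mean(axis=(1, 3))   # changed fraction per tile
    return (fraction >= threshold).astype(np.int64).ravel()

# Two 8 x 8 tiles: the left has 16/64 changed pixels, the right has 15/64.
mask = np.zeros((8, 16), dtype=np.uint8)
mask[:2, :8] = 1
mask[:2, 8:15] = 1
mask[2, 8] = 1

labels_25 = patch_labels(mask, threshold=0.25)
labels_50 = patch_labels(mask, threshold=0.50)
```

With the 50% threshold both tiles are labeled 0, which illustrates how a high threshold discards partially covered tiles along landslide edges.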
In this study, a CNN was selected for the landslide recognition, and the framework is shown in Figure 2. CNNs are a category of feed-forward neural networks with a deep structure that contain convolutional or correlational computation. The CNN is one of the popular deep learning models used to deal with multidimensional array data. With its special structure of local weight sharing, it has unique advantages in speech recognition and image processing. Multidimensional input images can be fed directly to the network, avoiding complex data reconstruction processes for feature extraction and classification. Therefore, the CNN is one of the core algorithms in the field of image recognition.
As shown in Figure 2, the network inputs were remote sensing images, and the outputs were the recognition outcomes. The CNN combined multiple “convolutional layers” and “pooling layers” to process the input images and mapped them to the output target at the fully connected layers.
2.2.1. Convolutional Layer
A convolutional layer is used for feature extraction; it performs linear operations and usually combines several convolutions. Convolutional kernel parameters include the number, size, stride, and padding mode. A convolutional operation is usually denoted by an asterisk, “∗”:

$$s(t) = (x * w)(t).$$

In convolutional network terminology, the first argument of the convolution (function $x$) is usually called the input, the second argument (function $w$) is called the kernel function, and the outputs are called the feature maps. Each convolutional layer contains a plurality of feature maps. Each feature map is a “plane” composed of a plurality of neurons, and a feature of the input is extracted by a convolutional filter. In machine learning applications, the input is usually a multidimensional array of data, and the kernel is usually a multidimensional array of parameters optimized by the learning algorithm.
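As an illustration of how a kernel slides over an input to produce a feature map, the following is a minimal NumPy sketch of a single-channel 2-D convolution without padding. The function name `conv2d_valid` is hypothetical, and the loop-based form is chosen for readability rather than speed. Note that many deep learning libraries actually compute cross-correlation (no kernel flip), which is equivalent up to a relabeling of the learned kernel.

```python
import numpy as np

def conv2d_valid(x, w, stride=1):
    """2-D convolution of input x with kernel w, no padding ('valid' mode)."""
    w_flipped = w[::-1, ::-1]   # a true convolution flips the kernel
    kh, kw = w.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = x[r:r + kh, c:c + kw]
            out[i, j] = np.sum(window * w_flipped)
    return out

x = np.arange(16, dtype=np.float64).reshape(4, 4)   # toy 4 x 4 input
w = np.array([[1.0, 0.0], [0.0, -1.0]])             # toy 2 x 2 kernel
feature_map = conv2d_valid(x, w)                    # 3 x 3 feature map
```

Here every output equals the difference between a pixel and its upper-left diagonal neighbor, so the constant diagonal gradient of the toy input yields a constant feature map.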
2.2.2. Activation Function
An activation function is added after a convolutional layer and is usually nonlinear, such as Sigmoid, Tanh, ReLU, Leaky ReLU, and others. These functions introduce nonlinearity into the network, and ReLU-type functions in particular help reduce the vanishing gradient problem. The ReLU function was used in this study, as shown in Figure 3. Compared with the Sigmoid function, ReLU sets the output of some neurons to 0, leading to sparseness in the network and reducing the interdependence of the parameters. The overfitting problem is alleviated, and the computation of the entire process is greatly reduced.
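The difference between the two activation functions can be seen in a short NumPy sketch (the input values are arbitrary examples):

```python
import numpy as np

def relu(z):
    """ReLU: max(z, 0), applied element-wise."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Sigmoid: 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 1.5])
relu_out = relu(z)         # negative inputs become exactly 0 (sparse output)
sigmoid_out = sigmoid(z)   # every output stays strictly between 0 and 1
```

ReLU maps all negative inputs to exactly zero, which produces the sparse activations mentioned above, whereas the Sigmoid output is never exactly zero.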
2.2.3. Pooling Layer
Pooling layers are used to reduce the size of feature maps and enhance the robustness of features to rotation and translation. Pooling is an operation used by almost all convolutional networks. The pooling function further adjusts the output of the preceding layer by subsampling according to the principle of local correlation, reducing the amount of data while retaining useful information. Pooling methods include mean pooling, max pooling, stochastic pooling, and others. In this study, max pooling, which takes the maximum of the neighboring feature values, was selected.
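Max pooling can be sketched in a few lines of NumPy. The 2 × 2 non-overlapping window used here is an assumption for illustration; the pooling window size is not stated in the text.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling; incomplete borders are dropped."""
    h, w = feature_map.shape
    rows, cols = h // size, w // size
    trimmed = feature_map[:rows * size, :cols * size]
    return trimmed.reshape(rows, size, cols, size).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 6, 2, 8]], dtype=np.float64)
pooled = max_pool2d(fm)   # each 2 x 2 block is replaced by its maximum
```

The 4 × 4 feature map is reduced to 2 × 2, keeping only the strongest response in each neighborhood.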
2.2.4. Dropout
To address the overfitting that occurred during the experiments, this study used dropout regularization. Dropout improves the performance of neural networks by preventing the co-adaptation of feature detectors. In simple terms, the activation of a given neuron is switched off with a certain probability during forward propagation, which improves the generalization of the model. As shown in Figure 4, the network then does not rely excessively on particular local features.
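A minimal sketch of dropout in the forward pass is given here. It uses the "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at test time; this variant and the function name `dropout` are assumptions for illustration, not a statement of how the experiments were implemented.

```python
import numpy as np

def dropout(activations, rate=0.25, training=True, rng=None):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return activations               # no-op at test time
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

a = np.ones(1000)
out = dropout(a, rate=0.25, rng=np.random.default_rng(0))
dropped_fraction = float(np.mean(out == 0.0))   # close to the dropout rate
```

Because a different random subset of units is silenced on every pass, no single feature detector can be relied upon exclusively.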
2.2.5. Optimization Algorithm
The purpose of the optimization algorithm is to minimize (or maximize) the loss function while improving the training so that it can approximate or reach the optimal value. The internal parameters of the model play an important role in effectively training the model and producing accurate results. The adaptive moment estimation (Adam) optimization algorithm  is introduced in this paper as shown in Algorithm 1.
Adam is a first-order optimization algorithm that can replace the traditional SGD (stochastic gradient descent) algorithm. The Adam algorithm adjusts the learning rate of each parameter based on the first-order and second-order moment estimations of the gradient of each parameter and retains the advantages of the AdaGrad (adaptive gradient) algorithm and RMSprop (root mean square propagation) algorithm. The former performs better on sparse gradients, while the latter performs better on nonstationary conditions. Adam works well in practical applications. Compared with other self-adaptive learning rate algorithms, it has a faster convergence speed and better learning effect. Additionally, it can correct problems existing in other optimization algorithms, such as the learning rate vanishing, slow convergence, or large fluctuations in the loss function caused by parameter updates with high variances.
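The update rule can be sketched as follows. This is a minimal NumPy version of a single Adam step with the commonly used default coefficients (`beta1 = 0.9`, `beta2 = 0.999`, `eps = 1e-8`), which are assumptions here; the toy quadratic objective and the function name `adam_step` are purely illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.01)
```

Dividing by the square root of the second-moment estimate gives each parameter its own effective learning rate, which is the adaptive behavior described above.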
2.2.6. Fully Connected Layer
The output of the convolutional layers represents the high-level features of the data. This output is flattened and passed to a fully connected layer, which provides a simple way to learn nonlinear combinations of these features. The fully connected layer connects all the features and sends the output values to the classifier (such as the Softmax classifier).
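The flatten, fully connected, and Softmax steps can be sketched in NumPy. The feature-map shape, the random weights, and the function names are hypothetical placeholders; in a trained network the weights would come from the optimization described earlier.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def dense_softmax(feature_maps, weights, bias):
    """Flatten (N, H, W, C) feature maps and apply a fully connected layer."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)
    return softmax(flat @ weights + bias)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 2, 2, 3))    # 5 samples of 2 x 2 x 3 features
weights = rng.normal(size=(12, 2)) * 0.1    # 12 flattened inputs -> 2 classes
bias = np.zeros(2)

probs = dense_softmax(features, weights, bias)
predicted = probs.argmax(axis=1)            # 1 = landslide, 0 = nonlandslide
```

Each row of `probs` sums to 1, so the two columns can be read as the estimated probabilities of the nonlandslide and landslide classes.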
3.1. Satellite Data and Network Structure
The experimental data used in this study were acquired from GF-1 satellite remote sensing images. GF-1 is the first satellite in China's high-resolution Earth observation system. It has a 2 m panchromatic resolution and a multispectral resolution of 8 m. The CCD camera in the satellite contains four bands (red, green, blue, and near-infrared). The bands’ ranges are shown in Table 1.
As shown in Figure 5, in this study, a large landslide which occurred in Shanxi Province, China, in 2015, was selected as the study area, and its four remote sensing image bands (red, green, blue, and near-infrared) were selected for the experimental data. The landslide’s training dataset consists of 62500 patches, and landslide’s test dataset consists of 20000 patches.
After training, this paper predicted the testing data, marked the data conforming to the landslide characteristics as 1, and finally drew the predicted labels into a binary image, as shown in Figure 6. When a single band was used as the feature with CNN, SVM, GBM (gradient boosting machine), and NBC (naive Bayes classifier) as classifiers (Figures 6(a)–6(d)), the recognition accuracy was lower. When the RGB bands were used as features (Figures 6(e)–6(h)), the recognition accuracy improved considerably, but details still could not be identified, and some soil areas were incorrectly identified as landslide. Finally, compared with the other methods, the proposed method (Figures 6(i)–6(l)) identified the landslide area well without losing detailed information, indicating that the proposed method is suitable for landslide disaster recognition.
To evaluate the recognition accuracy, the predicted data were compared with the real data labels to obtain four evaluation parameters (accuracy, precision, recall, and F-measure). In this study, four parameters (TP, FP, FN, and TN) were introduced to calculate the evaluation parameters as shown in Table 2.
Evaluation parameters are given as follows:

Accuracy. Accuracy is the most common evaluation parameter. It is the proportion of all samples, landslide and nonlandslide, that are classified correctly:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}.$$

When classifying unbalanced datasets, the accuracy rate does not reflect the overall situation. Precision and recall are the two most basic parameters in the field of information retrieval. They are defined as follows.

Precision. Precision is the ratio between the samples correctly identified as landslide and all samples identified as landslide:

$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$

Recall. The recall rate is the ratio between “all correctly retrieved information” and “the information that should be retrieved correctly”:

$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$

F-Measure. Precision and recall sometimes conflict with each other and should be considered together. The most common way to do so is the F-measure, which is the weighted harmonic mean of the precision and recall. With equal weights, it is

$$F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
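The four evaluation parameters can be computed directly from the TP, FP, FN, and TN counts, as in this short sketch. It uses the standard confusion-matrix definitions and the equally weighted (F1) form of the F-measure, which is an assumption since the weighting is not stated; the counts in the example are made up for illustration and are not taken from the tables.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Illustrative counts only: 90 landslide and 910 nonlandslide samples.
acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=10, tn=890)
```

In this unbalanced example the accuracy is 0.97 even though one in five predicted landslide samples is wrong, which is why precision, recall, and the F-measure are reported alongside accuracy.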
As shown in Table 3, four evaluation parameters (accuracy, precision, recall, and F-measure) were introduced to evaluate the performance of the method used in this study. The experimental results show that the proposed method is superior to the single band, RGB, and the traditional machine learning algorithms SVM, GBM, and NBC in both accuracy (98.98%) and F-measure (95.33%), proving that the proposed method is suitable for landslide disaster recognition.
In addition to the parameter comparisons, the ROC (receiver operating characteristic) curve was adopted in this study; each point on the curve reflects the sensitivity to the same signal stimulus at a given decision threshold. The AUC value was also introduced to evaluate the results. The AUC (area under curve) is the area under the ROC curve and lies between 0 and 1. As a single numerical value, the AUC can directly evaluate the quality of a classifier: the higher the AUC, the better the classification. As shown in Figure 7(a), the CNN ROC curve obtained by the proposed method was closer to the (0, 1) point, the area under the CNN ROC curve essentially covers the areas under the SVM, NBC, and GBM ROC curves, and the CNN AUC (0.9705) is larger than that of the other methods, indicating that the proposed method is superior to the other algorithms.
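How an ROC curve and its AUC are obtained from classifier scores can be sketched in plain Python. The sketch sweeps the decision threshold over the sorted scores and integrates with the trapezoidal rule; it assumes binary labels with at least one sample of each class, and the labels and scores in the example are invented for illustration.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        is_last = rank + 1 == len(order)
        # Emit a point only after a whole group of tied scores is processed.
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

labels = [1, 1, 0, 1, 0, 0]                 # 1 = landslide (illustrative)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # predicted landslide probability
area = auc(roc_curve_points(labels, scores))
```

A classifier that ranks every landslide sample above every nonlandslide sample reaches an AUC of 1, while random ranking gives about 0.5.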
As shown in Figure 8, in this study, a large landslide which occurred in Fujian Province, China, in 2016, was selected as the study area, and its four remote sensing image bands (red, green, blue, and near-infrared) were selected for the experimental data. The landslide’s training dataset consists of 40000 patches, and landslide’s test dataset consists of 25000 patches.
Finally, the predicted labels are drawn into a binary image as shown in Figure 9.
As shown in Table 4, the experimental results show that the proposed method is superior to the single band, RGB, and the traditional machine learning algorithms SVM, GBM, and NBC in both accuracy (97.69%) and F-measure (70.44%), proving that the proposed method is suitable for landslide disaster recognition.
As shown in Figure 10(a), the CNN ROC curve obtained by the proposed method was closer to the (0, 1) point, and the CNN ROC curve area essentially covers the SVM ROC curve area, NBC ROC curve area, and GBM ROC curve area, and the AUC (0.8144) of CNN is larger than that of the other methods, thus demonstrating that the proposed method is superior to other algorithms.
The experiment was carried out in the Keras environment. In this method, the parameters of the CNN model were shared between the bands before and after the landslide. Parameter adjustment of the CNN is an important step, and the results in this study were obtained after many hyperparameter adjustments. In the initial experiments, the accuracy on the test set was higher than the accuracy on the training set as the number of training iterations increased. Due to the limited amount of data and the increase in the number of training runs, overfitting then occurred and reduced the accuracy. Overfitting is unavoidable, but some measures can still be taken to reduce it. For the number of hidden layers in the network structure, this paper chose a convolutional neural network with two hidden layers. Reducing the complexity of the model avoided the decline in accuracy caused by overfitting in an overly complicated network structure. At the same time, this paper adjusted the training step size because a larger step can easily lead to high loss values that prevent the accuracy from increasing. In this study, the batch size was set to 16. This paper found that a lower learning rate would lead to overfitting at the later stages of training, while a higher learning rate would lead to oscillation of the network. The overfitting was further decreased by reducing the initial learning rate appropriately and using an exponentially decaying learning rate in the Adam optimization algorithm; this paper set the initial learning rate to 0.001. Finally, this study added dropout regularization, where the dropout rate was set to 0.25 and the number of iterations was 120–140. The reported accuracy is the maximum reached after multiple training runs.
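The exponentially decaying learning rate can be sketched as a simple function of the training step. The initial learning rate, batch size, and dropout rate are taken from the text; the decay rate and decay interval are not reported, so the values `DECAY_RATE` and `DECAY_STEPS` below are hypothetical placeholders.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` updates: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

INITIAL_LR = 0.001     # from the text
BATCH_SIZE = 16        # from the text
DROPOUT_RATE = 0.25    # from the text
DECAY_STEPS = 1000     # hypothetical: not reported in the text
DECAY_RATE = 0.96      # hypothetical: not reported in the text

schedule = [
    exponential_decay(INITIAL_LR, step, DECAY_STEPS, DECAY_RATE)
    for step in (0, 1000, 2000)
]
```

The learning rate starts at 0.001 and shrinks by a constant factor every `DECAY_STEPS` updates, giving large steps early in training and smaller, more stable steps later.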
In summary, the experiments showed the superiority of the proposed method compared to the traditional methods. However, although the performance and training time of the CNN were superior to those of the traditional machine learning algorithms, adjusting its large number of hyperparameters is complicated. This is why traditional machine learning algorithms are still used in most applications of remote sensing images.
In this study, on the basis of the CNN, the remote sensing image bands before and after the disaster and the NDVI index were fused, and the fused values of 9 bands were used as features to recognize the landslide disasters in Shanxi and Fujian, China. Accuracies of 98.98% and 97.69%, respectively, were obtained. The comparison showed that the proposed method provided the best performance on both datasets considered.
Compared with the traditional methods, the precision of this method is improved, indicating that the band variation reflects the characteristics of the landslide well. In addition, the results suggest that deep learning is superior to traditional machine learning algorithms for image processing and offers high accuracy and fast speeds when processing large datasets. In this study, deep learning gradually extracted the characteristics of the input data from the lower layers to the upper layers, finally formed features suitable for landslide recognition, and improved the recognition accuracy. Therefore, the method proposed in this study can be applied to landslide recognition and to the detection of other disasters. However, the data used in this paper were obtained in good weather and over areas with wide vegetation coverage. Inclement weather such as rain and snow was not considered, and handling such conditions remains a challenge.
Finally, although the performance of the convolutional neural network was better than traditional machine learning, a large number of highly complex parameter adjustments for the neural network were required. The greater number of layers in the neural network does not imply it is better. Therefore, it is very significant to develop an experiment that can identify a suitable neural network structure.
The GF-1 images used to support the findings of this study were downloaded from http://22.214.171.124:7777/DSSPlatform/productSearch.html (the China Centre for Resources Satellite Data and Application).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This study was supported by the National Key R&D Program of China (2016YFB0502502) and the National Natural Science Foundation of China (61871150).
P. Danneels, E. Pirard, and H.-B. Havenith, “Automatic landslide detection from remote sensing images using supervised classification methods,” in Proceedings of the 2007 IEEE International Geoscience and Remote Sensing Symposium, IEEE, Barcelona, Spain, July 2007.
G. Ningning, Y. Jingyuan, L. Chengfan, L. Ming, and Z. Ming, “Landslide recognition in remote sensing image based on fuzzy support vector machine,” in Proceedings of the 2012 IEEE 12th International Conference on Computer and Information Technology, IEEE, Chengdu, China, October 2012.
G. Cheng, C. Ma, P. Zhou, X. Yao, and J. Han, “Scene classification of high resolution remote sensing images using convolutional neural networks,” in Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), IEEE, Beijing, China, July 2016.
A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, vol. 25, p. 2, Curran Associates, Inc., Red Hook, NY, USA, 2012.
R. Mohammad, M. Mahdianpari, Y. Zhang, and B. Salehi, “Deep convolutional neural network for complex wetland classification using optical remote sensing imagery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 9, pp. 3030–3039, 2018.
S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the International Conference on Machine Learning JMLR.org, Lille, France, July 2015.
X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, Fort Lauderdale, FL, USA, April 2011.
N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.