Abstract

Earthquakes, floods, human activity, and rainfall are among the factors that trigger landslides. Analysis of landslide monitoring data reveals the deformation characteristics of landslides and helps to reduce the threat of landslide disasters. Modern monitoring methods enable efficient acquisition of real-time data to facilitate comprehensive research on landslides. However, it is challenging to analyze large amounts of monitoring data affected by problems such as missing data and outliers that arise during data collection and transfer. These problems also hinder practical analysis and decision-making based on the uncertain monitoring data. This work analyzes and processes the deformation characteristics of a rainfall-induced rotational landslide based on exploratory data analysis techniques. First, we found that the moving average denoising method is better than the polynomial fitting method for the repair and fitting of monitoring data. In addition, the exploratory data analysis of the Global Navigation Satellite System (GNSS) monitoring data reveals that the distribution of GNSS monitoring points has a positive correlation with the deformation characteristics of a rotational landslide. Our findings in the subsequent case study indicate that rainfall is the primary trigger of the Zhutoushan landslide, Jiangsu Province, China. Therefore, this method supports the analysis of rotational landslides and provides more useful landslide monitoring information.

1. Introduction

A landslide is a common geological hazard that seriously threatens life and property globally [1–3]. Like earthquakes, rainfall is one of the most recognized trigger factors for landslides [4, 5]. Researchers have proposed rainfall thresholds and established relevant models to forecast landslide occurrence [6, 7], and some have developed early warning systems for all kinds of landslides [8]. However, researchers have also suggested that rainfall information alone is not enough for predicting landslides because it does not consider soil moisture conditions [9]. Therefore, a single-factor monitoring method is not enough to predict landslides accurately. At present, there are various hydrological, geological, and surface monitoring methods, and setting alarm thresholds for multiple parameters is reasonable [10]. However, rainfall can cause changes in other monitoring parameters of landslides.

Exploratory data analysis (EDA) is a technique for analyzing and processing data sets to summarize their main characteristics, usually by visualization, and it plays a paramount role in obtaining valuable information from data [11]. This method has been applied effectively in a variety of fields [12], such as computer graphics [13], bioinformatics [14, 15], meteorology [16], traffic [17], and crops [18]. For landslides, starting from the principles and causes of soil or rock mass movement, Qarinur [19] used the EDA method to determine the correlations between the sliding distance of a landslide and its height, slope, and volume. It was reported to be the first time EDA was implemented to describe the relationship between the location of GPS monitoring points and the trend of their displacements. He detected the outliers from the raw data and verified the deformation characteristics of a rotational landslide using 3D graphs.

It is necessary to know the steps of the EDA process. First, the project-related data, their types, and the categories of variables should be prepared and identified. Second, outliers should be detected using mathematical models if necessary [20]. There are two types of causes of outliers, artificial and natural [21], and different methods should be considered for the different sources of outliers, such as monitoring, data processing, and sampling errors. Third, outliers should be removed or fixed by using statistical methods [22]. Fourth, the connections between each variable and its target should be surveyed. Finally, the data should be analyzed from different dimensions to obtain the internal relationships among the various data.

Mathematical models are often employed in landslide early warning to predict landslides, and they achieve good results [23, 24]. However, the precondition for establishing a mathematical model is to ensure the integrity and validity of the monitoring data. In the process of landslide monitoring, the loss or abnormality of monitoring data caused by monitoring equipment failure or external factors is inevitable [25]. For abnormal data, it is essential to know whether they are caused by interference, equipment failure, or landslide deformation in order to avoid triggering a false alarm. This requires data analysis and research to reduce the impact of uncertainty in big data [26, 27].

Owing to the complex geologic structure of the landslides, the deformation of the monitored points at different locations is closely related to the geological features of those locations [25]. This paper provides insights into the types of landslides and the relationship between rainfall and other monitoring data through the analysis of the Zhutoushan landslide monitoring data in China and explores how to evaluate the outlier data using EDA.

2. Material and Methodology

Zhutoushan lies above the residential area of Yongning Town, Nanjing, Jiangsu Province, China. Its central area is located at 118°39′37″E and 32°09′24″N (see Figure 1).

The study area is located on one side of the Zhutoushan Range, and there are many prominent fault zones. The geological structure is composed of sandstone, marlite, and loose soil. In the 1970s, large-scale mining activities took place at the base of Zhutoushan and unreasonable excavations led to many landslide disasters. Houses were destroyed at the foot of the slope, resulting in a significant loss of life and property.

An early warning and automatic deformation monitoring system based on GNSS is adopted to monitor the surface deformation of the Zhutoushan landslide. The system integrates GNSS high-precision positioning, General Packet Radio Service (GPRS) communication, wireless communication, database technology, and various sensors, so that it can monitor the deformation of the landslide in real time and analyze and compare the monitoring results.

According to the design requirements and field survey, this system comprises one GNSS reference station located outside of the landslide, eight GNSS monitoring stations (Figure 1) (the GPRS is outside the area affected by landslide deformation), six inclinometer monitoring points in which each point was installed with four sensors at various depths to detect slope deformation, four water-level monitoring points, three pore water pressure (PWP) monitoring points, one rainfall monitoring point which was installed at the edge of the landslide (Figures 2 and 3), one soil water content monitoring point, and two video monitoring points.

An inclinometer is used to obtain displacement information for the deep sliding mass. As an essential aspect of landslide monitoring, four fixed inclinometers in series are installed at different depths in each borehole, effectively reflecting the displacement changes of the deep sliding mass in real time (Figure 2). Soil moisture sensors measure the moisture percentage of the soil, from which the moisture content can be calculated. The soil moisture sensors are located around the GNSS position and collect data in real time.

The system was initiated in July 2017, and the data were collected and transferred to the computing center in real time using wireless sensor network technology.

The approach utilized involves the following: (a) GNSS data, inclinometer data, rainfall data, soil water content data, and water pressure data come from the landslide early warning system and (b) data analysis by using the EDA for establishing the correlation between the monitoring rainfall, other monitoring data, outliers, and characteristics of GNSS data.

3. Results and Discussion

3.1. Rainfall and Displacement

In most cases, the primary trigger of landslides is persistent heavy rain. Detailed data from monitored landslides reveal that numerous landslides have been associated with rainfall [28–30]. During heavy rainfall, rainwater can seep underground and then flow through unsaturated soil. As a result, the water may come to rest on lower-permeability materials or a drainage barrier, such as sandstone, siltstone, marlstone, limestone, soil, and bedrock.

Rainfall has a significant impact on the displacement of the ground surface and on the underground inclination. On August 15, 2018, rainfall of less than 2 mm caused changes in the surface’s horizontal displacement but had only a slight effect on the inclination and elevation. During December 25–27, 2018, the rainfall was 178 mm, 406 mm, and 313 mm, respectively. This rainfall caused dramatic changes in horizontal displacement, vertical displacement, and inclination (Figure 4). The displacement induced by rainfall also differs with inclinometer depth. The deformation at a buried depth of 4.5 m surpassed that at buried depths of 9 m and 13.5 m.

Rainfall has little effect on PWP. The influence of rainfall on soil water content is also comparatively small, only making it fluctuate within a small range (Figure 4(a)). Even though the rainfall on December 26, 2018, was 406 mm, these changes are not obvious.

3.2. Outlier Detection in Raw Data

In GPS data collection and transmission, measurement errors and random noises are inevitable. By establishing the corresponding mathematical model or error processing, the influence of errors and noises on the original data can be reduced. However, an outlier will significantly affect the quality of data determination and modeling of original data. Therefore, the outliers must be identified and removed when a mathematical model is established to reduce the impacts of errors or noises.

Box plots, statistical diagrams that display the distribution of a data set, can be used to detect outliers in raw data. For example, GPS1 (GPS monitoring point 1) has an abnormal value in the elevation direction, whereas the remaining coordinate components and the horizontal direction are normal (Figure 5). Therefore, the vertical displacement of GPS1 can be fitted by a basic 20-day moving average method. With this method, each observation is replaced by an average (Figure 6). However, some vital information can be concealed in this way. Therefore, the moving average denoising method is more suitable [7, 31, 32]. Using this method, we determined the outliers by the Root Mean Square Error (RMSE) and replaced only them with their average values (Figure 7). The other values remain the original observations, so the vital information in the raw data is preserved.
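The moving average denoising step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the processing code used for the Zhutoushan data: the centered window, the threshold multiplier `k`, the function names, and the sample series are all hypothetical.

```python
import math

def moving_average(values, window):
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

def repair_outliers(values, window=5, k=2.0):
    """Replace only the flagged observations by their moving average.

    An observation is flagged when its residual from the moving average
    exceeds k times the RMSE of all residuals (k is an assumed multiplier).
    """
    smoothed = moving_average(values, window)
    residuals = [v - s for v, s in zip(values, smoothed)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    repaired, flagged = [], []
    for i, (v, s, r) in enumerate(zip(values, smoothed, residuals)):
        if abs(r) > k * rmse:
            flagged.append(i)      # outlier: use the local average instead
            repaired.append(s)
        else:
            repaired.append(v)     # normal value: keep the raw observation
    return repaired, flagged

# Hypothetical vertical displacements (mm) with one spike at index 4.
series = [1.0, 1.2, 1.1, 1.3, 9.0, 1.4, 1.5, 1.6, 1.5, 1.7]
repaired, flagged = repair_outliers(series, window=5, k=2.0)
print(flagged)
print(repaired)
```

In this sketch the replacement average still includes the spike itself, so a stricter variant might exclude the flagged point from its own window. The window length plays the role of the step size and would need tuning for each project.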

For the same data, the polynomial fitting model is adopted, and it is found that the correlation coefficient of the second method is better than that of the first method. Thus, the accuracy of the polynomial fitting model is enhanced after the outliers are removed. However, it should be determined whether the abnormal value is caused by measurement error. If it is a real deformation value, an alarm is required. This requires careful consideration of various factors to reach a comprehensive judgment. Since the landslide monitoring system began operating in 2017, there have been several outliers caused by equipment faults.

3.3. Exploratory Data Analysis (EDA)

The EDA method has been applied successfully to a variety of problems [12]. It refers to a data analysis method that explores the structure of raw data obtained by observation or investigation through charts, equation fitting, and calculation of characteristic quantities under few prior assumptions [33]. In the scatter plots, R² is greater than or equal to 0.9 (Figures 8 and 9). The R² of GPS3 is the maximum (0.970), and the R² of GPS1 is the minimum. Some points are far away from the regression line. On both coordinate axes, the variables are rescaled to standard deviational units. Thus, an observation whose standardized value exceeds 2 can be designated as an outlier (Figure 10(a)) [34]. When such points are selected, the regression line is recalculated, and it reflects the slope for the monitoring data without the current selection. After this update of the scatter plot (Figure 10(b)), the R² of GPS1 increases to 0.911, and the accompanying regression statistic increases from 74.031 to 78.324.
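The standardization and refitting procedure can be illustrated with a short sketch. It assumes a simple linear regression of displacement on time, a cutoff of 2 standard deviational units, and the population standard deviation. The variable names and the sample values are hypothetical and are not taken from the monitoring data.

```python
import statistics

def standardize(values):
    """Rescale values to standard deviational units (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def flag_outliers(x, y, limit=2.0):
    """Indices whose standardized x or y value exceeds the limit in magnitude."""
    zx, zy = standardize(x), standardize(y)
    return [i for i, (a, b) in enumerate(zip(zx, zy))
            if abs(a) > limit or abs(b) > limit]

# Hypothetical day index and displacement (mm); index 5 is an abnormal reading.
days = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
disp = [0.0, 2.0, 4.0, 6.0, 8.0, 40.0, 12.0, 14.0, 16.0, 18.0]

flagged = flag_outliers(days, disp)
r2_before = r_squared(days, disp)

# Refit without the selected points, as in the updated scatter plot.
kept_days = [d for i, d in enumerate(days) if i not in flagged]
kept_disp = [v for i, v in enumerate(disp) if i not in flagged]
r2_after = r_squared(kept_days, kept_disp)
print(flagged, r2_before, r2_after)
```

Excluding the flagged point raises R² in this toy series, which mirrors the direction of the change reported for GPS1.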

The vertical displacements of GPS1 and GPS2 are upward, and those of the other points are downward. If the vertical displacement trends of two monitoring points are both upward or both downward, there is a positive correlation between them. Conversely, if one trend of vertical deformation is upward and the other is downward, their correlation is negative (Figure 11). This correlation pattern reflects the deformation characteristics of a rotational landslide, and it gives information for further research.
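The sign of the correlation between two monitoring points can be checked with the Pearson coefficient. The sketch below is a minimal example that assumes three hypothetical vertical displacement series, two rising and one sinking. The series names and values are illustrative only.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical cumulative vertical displacements (mm) at successive epochs.
rising_a = [0.0, 3.0, 7.0, 12.0, 18.0]     # point moving upward
rising_b = [0.0, 2.0, 5.0, 9.0, 15.0]      # another point moving upward
sinking = [0.0, -4.0, -9.0, -13.0, -20.0]  # point moving downward

same_trend = pearson(rising_a, rising_b)   # both upward: positive
opposite_trend = pearson(rising_a, sinking)  # upward vs downward: negative
print(same_trend, opposite_trend)
```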

3.4. Landslide Surface Displacement

The horizontal displacement directions of the eight GPS stations reflect the sliding tendency of the sliding mass in the horizontal plane. The landslide as a whole moves toward the northwest, with an azimuth angle of around 310° to 330° (Figure 12). GPS1 has the largest horizontal displacement, while GPS8 has the smallest, with a value of less than 50 mm, indicating that this point is currently stable or unaffected by the landslide deformation. The reason is that GPS8 is located far from the landslide. The displacement of GPS6 is also distinctive. Its northward displacement is larger than its westward displacement, and Figure 1 shows that GPS6 is located at the edge of the landslide.
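The azimuth of a horizontal displacement follows directly from its north and west components. The sketch below assumes the azimuth is measured clockwise from north and that westward displacement is given as a positive number. The function names and sample values are hypothetical.

```python
import math

def horizontal_displacement(north_mm, west_mm):
    """Magnitude of the horizontal displacement vector in mm."""
    return math.hypot(north_mm, west_mm)

def azimuth_deg(north_mm, west_mm):
    """Azimuth in degrees, clockwise from north, in the range [0, 360)."""
    east_mm = -west_mm  # a westward displacement is a negative easting
    return math.degrees(math.atan2(east_mm, north_mm)) % 360.0

# Hypothetical displacement components (mm) of one monitoring point.
north, west = 120.0, 80.0
print(horizontal_displacement(north, west))
print(azimuth_deg(north, west))
```

Equal north and west components give an azimuth of 315°, that is, due northwest. A larger north component, as in the sample values, moves the azimuth toward 360°.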

The landslide deformation occurs not just in one direction but in three directions (Figure 13). The northward deformation is greater than the westward deformation. Thus, the direction of landslide displacement is shifting toward the north. The vertical deformations of GPS1 and GPS2 are rising, and those of the other monitoring points are the opposite.

The monitoring data of the Zhutoushan landslide from July 14, 2017, to April 8, 2019, are selected as the object of this study. The GNSS monitoring data analysis shows that the vertical displacement of all points except GPS6 and GPS8 is relatively large (Figure 14). GPS1 has the largest horizontal displacement of the eight monitoring points, at 792 mm. In terms of vertical displacement, GPS1 and GPS2 move up, while the other points move down; in addition, the vertical displacements of GPS2 and GPS3 are larger than those of the others. The largest is at GPS2, reaching 149.8 mm. In terms of deformation rate, the average rate of the eight points is 2 mm per day, which indicates that the Zhutoushan landslide is in a stable stage overall. However, the observation frequency and intensity at GPS1 and GPS2 need to be increased to guard against sudden landslide deformation, as these two points are located at the toe of the landslide and are more affected by the landslide deformation.

According to landslide classifications [35], Zhutoushan is a rotational landslide (Figure 15). GPS1 and GPS2, which are located at the toe of the surface of rupture, are squeezed by the upper part of the landslide and rise in the vertical direction. The other GPS monitoring points slip down under the influence of gravity.

It is dangerous to overlook outliers in the raw data. Misleading results are obtained if outliers are included in data processing and analysis instead of being excluded. Box plots can be used to detect outliers in the raw data. The limitation of a box plot is that it does not accurately measure the skewness of the data distribution or the weight of its tails. In addition, for many data sets, the shape information reflected by the box plot is rather fuzzy, and there are limitations in using the median to represent the overall average [36–40].
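Box plot outlier detection can be sketched with the conventional fences at 1.5 times the interquartile range. The multiplier, the quartile method (the default of `statistics.quantiles`), and the sample series are assumptions made for illustration, not settings reported for the Zhutoushan data.

```python
import statistics

def box_plot_outliers(values, whisker=1.5):
    """Indices of observations lying outside the box plot fences.

    The fences are Q1 - whisker * IQR and Q3 + whisker * IQR, where the
    quartiles come from statistics.quantiles with its default method.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical elevation readings (mm) with one abnormal value at index 4.
series = [1.0, 1.2, 1.1, 1.3, 9.0, 1.4, 1.5, 1.6, 1.5, 1.7]
outlier_indices = box_plot_outliers(series)
print(outlier_indices)
```

The fences depend only on the quartiles, which is why this check says little about skewness or tail weight, the limitation noted above.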

Polynomial fitting models and moving average denoising methods can be used to repair outliers. From the data analysis perspective, the accuracy of the latter is better than that of the former. The polynomial fitting model is a mathematical model fitted to the raw data with the outliers included, and it removes some critical information from the raw data. For the moving average denoising method, the accuracy and the retained information differ with the chosen step size, so the step size must be set to suit the specific project.

Moreover, the authenticity of the data affects the analysis results. One challenge in landslide monitoring is that, amid the large amounts of data collected via different monitoring methods, it is hard to identify whether outliers are caused by external factors or by landslide deformation. This requires a comprehensive judgment to avoid misjudgment and the resulting threat to people’s lives and property. If the outliers are caused by the deformation of the landslide and exceed the deformation warning threshold, the system should send an alarm to inform people to take safety measures. Otherwise, the outliers can be removed from the raw data.

EDA can only detect outliers in a data set; it cannot distinguish outliers caused by data transmission from those caused by deformation of the landslide. Simply removing outlier values can add a further risk of deviation to the analysis. Therefore, EDA needs to incorporate users’ feedback to improve user engagement. These are the main challenges in the processing of the raw data.

For GNSS data in landslide monitoring that are data of a single variable, we mainly performed analysis using box plots and scatter plots. To monitor the deformation of the landslide, various means can be used to collect data through various monitoring methods, but it is challenging to describe outliers in multivariable data using the EDA method.

The landslide deformation data can be visualized through many methods. As for the form of expression, it is preferable to separate the horizontal and vertical deformation data, and it is easier for people to get relevant information from Figures 12 and 14. Although Figure 13 shows the North, West, and Height data simultaneously, it cannot show the deformation process but only the results.

Owing to the landslide’s complex geological structure, the geological characteristics and the type of a landslide are related to the deformation of the monitoring points at different locations. The EDA of the surface GNSS monitoring data shows that the relationship between different monitoring points is positive, consistent with most similar landslide deformations. After standardizing the data, the outliers can also be detected to improve the quality of the data.

4. Conclusions

The deformation characteristics of a rainfall-induced rotational landslide are analyzed in this paper based on the EDA method, and the Zhutoushan landslide is shown to be a rainfall-induced rotational landslide. The main conclusions are as follows. First, the box plot can detect outliers in the mass of raw data collected by the instruments located on the Zhutoushan landslide, and the moving average denoising method is better than the polynomial fitting method for repairing and fitting the monitoring data. In addition, based on the EDA of GNSS monitoring data, the distribution of GNSS monitoring points has a positive correlation with the deformation characteristics of the rotational landslide, which supports the analysis of rotational landslides and provides more helpful information for landslide monitoring. Overall, the EDA method is shown to be valuable for processing massive monitoring data for a rainfall-induced rotational landslide.

Moreover, multiple monitoring methods, such as geological and meteorological monitoring methods, can be used to enhance landslide prediction. The results obtained for a landslide by various monitoring methods can be mutually verified to ensure data correctness and prediction accuracy. With the development of artificial intelligence and deep learning, rapid and effective analysis and interpretation of monitoring data help us understand and grasp the characteristics of landslide deformation.

With the development of landslide deformation monitoring equipment and of data collection, transmission, and storage technology, mining the complex relationships within massive monitoring data and among the various types of monitoring data is one of the future development directions of landslide monitoring information processing.

Data Availability

The data used to support the findings of the study are available within the article. Readers can obtain data supporting the research results from the data tables in the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

The authors want to thank Jiangsu Kebo Space Information Technology Limited Company for kindly providing all the landslide data. Moreover, we want to thank the reviewers for the pertinent suggestions that improved the final version of the manuscript. The authors thank Ing. Kačmařík Michal, Ph.D., for his helpful comments, which prompted us to improve our manuscript, especially the quality of the maps.