Abstract

Outliers are common in dam monitoring data and can seriously affect the accuracy of dam safety evaluation. To detect them, this paper proposes a novel dynamic outlier detection method for dam monitoring data based on SSA-NAR. The combined method does not depend on the relationship between effect quantity and influence quantity used in traditional dam safety theory; it mines the variation pattern from the time series of the effect quantity alone, which avoids the impact of missing or abnormal influence quantities. The nonlinear autoregressive (NAR) network is a classical time series neural network widely used in engineering. However, its prediction accuracy is strongly affected by the choice of model parameters. The Sparrow Search Algorithm (SSA), a recently proposed optimization algorithm, is therefore combined with NAR to derive the optimal parameters of the NAR prediction model. Outliers are identified by analyzing the distribution of residuals between the predicted and measured data. The case study shows that when the original data contain no outliers, the prediction accuracy of the model is high. When outliers are included, the proposed model remains robust, and the outliers have little influence on the prediction. The method can effectively detect outliers in the original dam monitoring data and provide a reliable data basis for dam safety evaluation.

1. Introduction

The safety of dam projects is of great importance to society and people’s lives [1, 2]. Dam monitoring data, obtained from monitoring instruments, can objectively and comprehensively reflect the safety status of a dam [3–5]. Outliers inevitably occur in these data because of instrumentation faults and manual monitoring errors [6]. Detecting outliers is therefore a prerequisite for dam monitoring data analysis.

Outlier detection methods for dam monitoring data generally include manual judgment and statistical probability detection [7]. Manual judgment compares adjacent monitoring data; it is inefficient and depends heavily on expert experience. The statistical probability method is based on statistical hypothesis testing; when the data samples are insufficient or the assumed probability distribution deviates from reality, its detection accuracy is greatly affected [8]. With the development of artificial intelligence technology [9], deep learning models have been successfully applied to the diagnosis of outliers in dam deformation monitoring data. Scholars have constructed prediction models from the dynamic relationship between deformation data and impact factors and detected outliers through the distribution of residuals between predicted and measured values [10–12]. These intelligent methods mostly require model inputs to be specified, such as water pressure, temperature, and aging factors. When the input data are partly missing or abnormal, such methods may not work properly.

Therefore, an artificial intelligence algorithm that does not depend on the input-output relationship is well suited to detecting outliers in dam data. The nonlinear autoregressive (NAR) neural network, which is widely used in data prediction, uses only the deformation data as input to perform prediction [13, 14]. The prediction accuracy of the NAR model largely depends on the network parameters, such as the delay parameter and the number of hidden layer elements. The Sparrow Search Algorithm (SSA) is a novel intelligent optimization algorithm based on the foraging behavior of sparrows [15–17]. It offers high robustness and fast convergence and can effectively solve the parameter optimization problem of the NAR model.

In this study, the NAR model and the SSA optimization algorithm are integrated to construct a method for detecting outliers in concrete dam monitoring data. SSA is introduced to obtain the optimal NAR neural network parameters, which are then used to build an optimal NAR dynamic model that predicts the dam monitoring data. Then, the dynamic outlier detection steps of SSA-NAR are established. Finally, an actual dam project is used to demonstrate the effectiveness of the outlier detection method.

2. SSA-NAR Detection Model

2.1. SSA Algorithm

The SSA algorithm was proposed in 2020 and is based on the foraging behavior of sparrows [18, 19]. During foraging, the sparrows are divided into discoverers and followers. The discoverers are in charge of finding food and providing foraging locations, while the followers use the discoverers' information to obtain food. Because the discoverers have priority access to food information, they search a larger foraging range than the followers. During each foraging round, the location of a discoverer is updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{\max}}\right), & R_2 < ST, \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST, \end{cases} \tag{1}$$

where $t$ indicates the current iteration number, $X_{i,j}$ is the location of the $i$-th sparrow in the $j$-th dimension, $iter_{\max}$ is a constant equal to the maximum number of iterations, $\alpha$ is a random number with $\alpha \in (0, 1]$, $R_2$ and $ST$ express the warning value and the safety threshold, respectively, with $R_2 \in [0, 1]$ and $ST \in [0.5, 1]$, $Q$ is a random variable that follows a normal distribution, and $L$ is a $1 \times d$ matrix whose elements are all 1.

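The location update of a follower can be expressed as

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2, \\ X_{P}^{t+1} + \left|X_{i,j}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise}, \end{cases} \tag{2}$$

where $X_{P}$ is the optimal location of the discoverers, $X_{worst}$ is the current worst location, $n$ is the number of sparrows, $A$ is a $1 \times d$ matrix whose elements are randomly assigned 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. $Q$ and $L$ have the same meaning as in the discoverer update.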

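When a sparrow becomes aware of danger, it exhibits anti-predation behavior, which is expressed as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g, \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g, \end{cases} \tag{3}$$

where $X_{best}$ is the current best location, $\beta$ is a random parameter that follows the normal distribution with a mean of 0 and a variance of 1, $K$ is a random number with $K \in [-1, 1]$, $f_i$ is the fitness value of the present sparrow, $f_g$ is the current best fitness value, $f_w$ is the current worst fitness value, and $\varepsilon$ is a very small constant that prevents the denominator from being zero.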
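The three update rules of this section can be put together in a short Python sketch. It is illustrative only and is not the authors' implementation: the function name `ssa_minimize`, the clamping of positions to the search box, the fraction of sparrows that sense danger (`danger_frac`), the cap on the exponent, and the zero-denominator guard are all assumptions made here. The term A⁺·L is evaluated directly: because A contains only ±1 entries, it reduces to a signed mean over the dimensions.

```python
import math
import random


def ssa_minimize(fitness, bounds, n_sparrows=10, n_iter=100,
                 discoverer_frac=0.2, danger_frac=0.2, st=0.8, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic sparrow search."""
    rng = random.Random(seed)
    dim = len(bounds)
    eps = 1e-12  # small constant that keeps the denominator non-zero

    def clip(x):
        # Assumption: positions are clamped to the search box after each move.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_sparrows)]
    fit = [fitness(x) for x in pop]
    n_disc = max(1, round(discoverer_frac * n_sparrows))
    n_danger = max(1, round(danger_frac * n_sparrows))
    i0 = min(range(n_sparrows), key=lambda k: fit[k])
    best_x, best_f = list(pop[i0]), fit[i0]

    for _ in range(n_iter):
        # Sort so that the fittest sparrows act as discoverers.
        order = sorted(range(n_sparrows), key=lambda k: fit[k])
        pop = [pop[k] for k in order]
        fit = [fit[k] for k in order]
        worst_x = list(pop[-1])

        # Discoverer update: shrink while safe, random walk when alarmed.
        r2 = rng.random()
        for i in range(n_disc):
            if r2 < st:
                alpha = 1.0 - rng.random()  # uniform in (0, 1]
                scale = math.exp(-(i + 1) / (alpha * n_iter))
                pop[i] = clip([v * scale for v in pop[i]])
            else:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([v + q for v in pop[i]])
            fit[i] = fitness(pop[i])
        xp = list(pop[min(range(n_disc), key=lambda k: fit[k])])

        # Follower update: the poorer half flies elsewhere, the rest move near xp.
        for i in range(n_disc, n_sparrows):
            if (i + 1) > n_sparrows / 2:
                q = rng.gauss(0.0, 1.0)
                # Assumption: exponent capped at 50 to avoid overflow.
                pop[i] = clip([q * math.exp(min(50.0, (w - v) / (i + 1) ** 2))
                               for v, w in zip(pop[i], worst_x)])
            else:
                signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
                step = sum(abs(v - p) * s
                           for v, p, s in zip(pop[i], xp, signs)) / dim
                pop[i] = clip([p + step for p in xp])
            fit[i] = fitness(pop[i])

        # Anti-predation update for a random subset of sparrows.
        f_g, f_w = min(fit), max(fit)
        x_best = list(pop[fit.index(f_g)])
        x_worst = list(pop[fit.index(f_w)])
        for i in rng.sample(range(n_sparrows), n_danger):
            if fit[i] > f_g:
                beta = rng.gauss(0.0, 1.0)
                pop[i] = clip([b + beta * abs(v - b)
                               for v, b in zip(pop[i], x_best)])
            else:
                k = rng.uniform(-1.0, 1.0)
                denom = ((fit[i] - f_w) + eps) or eps  # guard against exact zero
                pop[i] = clip([v + k * abs(v - w) / denom
                               for v, w in zip(pop[i], x_worst)])
            fit[i] = fitness(pop[i])

        # Keep the best location seen so far.
        i_best = min(range(n_sparrows), key=lambda k: fit[k])
        if fit[i_best] < best_f:
            best_x, best_f = list(pop[i_best]), fit[i_best]
    return best_x, best_f
```

As a usage example, `ssa_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])` searches for the minimum of a two-dimensional sphere function and returns the best location found together with its fitness value.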

2.2. NAR Dynamic Neural Network

Neural networks are divided into two categories: static neural networks and dynamic neural networks [20, 21]. Static neural networks have no feedback or memory capability: the output depends only on the current input and is unrelated to previous inputs and outputs. Dynamic neural networks are divided into two types: feedback networks and nonfeedback networks. The output of a network without feedback depends not only on the current input but also on previous inputs. The output of a network with feedback depends on the current and previous inputs and also on previous outputs. Because of this memory function, dynamic neural networks are better suited to time series prediction, with the advantages of short training time and high prediction accuracy.

The NAR neural network is a widely used dynamic neural network. The model can be expressed as

$$y(t) = f\big(y(t-1), y(t-2), \ldots, y(t-d)\big), \tag{4}$$

where $y(t)$ is the monitoring value at time $t$, $y(t-1), y(t-2), \ldots, y(t-d)$ are the monitoring values from time $t-1$ to time $t-d$, respectively, $d$ is the delay parameter, and $f$ is a nonlinear function obtained through learning and training.
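To make the autoregressive form concrete, the sketch below shows how a monitoring series could be arranged into input-target pairs for a given delay parameter. The helper name `make_lagged_samples` is hypothetical and is introduced here only for illustration; any regression model could then be trained on these pairs to play the role of the nonlinear function.

```python
def make_lagged_samples(series, delay):
    """Build (inputs, targets) for a NAR-style model.

    inputs[k]  = [y(t-1), y(t-2), ..., y(t-delay)]
    targets[k] = y(t)
    """
    if delay < 1 or delay >= len(series):
        raise ValueError("delay must satisfy 1 <= delay < len(series)")
    inputs, targets = [], []
    for t in range(delay, len(series)):
        # Most recent value first, matching the order in the model expression.
        inputs.append([series[t - k] for k in range(1, delay + 1)])
        targets.append(series[t])
    return inputs, targets
```

With a series of 300 values and a delay of 6, for instance, this yields 294 samples, because the first 6 values serve only as lagged inputs.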

The NAR dynamic neural network is composed of an input layer, a hidden layer, an output layer, and a delay parameter. It has two network modes. In the closed-loop mode, the output of the neural network is fed back to the input layer and used together with the other inputs in further learning. In the open-loop mode, the expected output of the neural network is fed back to the input layer instead. To improve the prediction accuracy, this study selects the commonly used open-loop mode; the specific structure is shown in Figure 1, which marks the network input (left), the network output (right), the delay parameter, the number of hidden layer elements, the weights, and the thresholds. The delay parameter and the number of hidden layer elements must be determined in advance, and they directly affect the training and prediction capabilities of the NAR dynamic neural network.
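The difference between the two modes can be sketched in a few lines of Python. The functions `predict_open_loop` and `predict_closed_loop` and the stand-in `model` callable are hypothetical names used only for illustration; a trained NAR network would take the place of `model`.

```python
def predict_open_loop(model, measured, delay):
    """One-step-ahead prediction: every input window uses measured values."""
    return [model([measured[t - k] for k in range(1, delay + 1)])
            for t in range(delay, len(measured))]


def predict_closed_loop(model, history, delay, steps):
    """Multi-step prediction: each prediction is fed back as a future input."""
    window = list(history[-delay:])        # oldest ... newest
    predictions = []
    for _ in range(steps):
        lags = window[::-1]                # y(t-1), ..., y(t-delay)
        y_hat = model(lags)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]      # slide the window over the prediction
    return predictions


def linear_extrapolation(lags):
    """Stand-in for a trained network: y_hat = 2*y(t-1) - y(t-2)."""
    return 2 * lags[0] - lags[1]
```

In the open-loop mode a disturbed measurement enters the input window for the next few steps only and then drops out, whereas in the closed-loop mode prediction errors are fed back and can accumulate.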

2.3. Detection Method of Outliers by SSA-NAR

Outliers in dam monitoring are generally caused by monitoring system failures and manual observation errors. Their basic characteristic is an isolated measurement value at a given time that is significantly larger or smaller than the values at the previous and subsequent times. Outliers are contingent and discrete. Figure 2 is a schematic diagram of the typical pattern of an outlier.

This paper proposes a method for dynamic detection of outliers in dam monitoring data based on SSA-NAR. The method uses SSA to optimize the delay parameter and the number of hidden layer elements and applies the optimal parameters in the NAR dynamic neural network for prediction. The distribution of residuals between predicted and measured values is used to identify outliers, so the latest monitoring data can be checked for outliers promptly, which gives the project management department a technical basis for checking and correcting the information in time. The model flow chart is shown in Figure 3, and the specific steps are as follows:
(1) Data set acquisition: Obtain dam monitoring data through the safety monitoring system.
(2) Dam data prediction: Use SSA optimization to determine the delay parameter and the number of hidden layer elements, and establish the NAR dynamic neural network for prediction.
(3) Outlier detection: By the definition of an outlier, when the residual between the expected value and the measured value exceeds a certain threshold, the measured value is called an outlier. Hence, outlier detection has two key problems: determining the expected value and determining the threshold. The expected value is given by the prediction of the SSA-NAR model. The “3σ criterion” is commonly used in outlier detection to determine the threshold. Therefore, the formula of outlier detection is as follows:

$$\left| y_t - \hat{y}_t \right| > 3\sigma, \tag{5}$$

where $y_t$ is the measured value at time $t$, $\hat{y}_t$ is the predicted value of the SSA-NAR model at time $t$, and $\sigma$ is the standard deviation of the sample.
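The detection step can be sketched as a residual threshold test. The function name `detect_outliers` is hypothetical. Two assumptions are made here: the threshold multiplier defaults to 3, following the commonly used three-sigma rule, and the standard deviation is taken as the sample standard deviation of the residuals; the text does not state which sample the standard deviation refers to.

```python
import statistics


def detect_outliers(measured, predicted, k=3.0):
    """Return the indices whose absolute residual exceeds k * sigma."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have equal length")
    residuals = [m - p for m, p in zip(measured, predicted)]
    # Assumption: sigma is the sample standard deviation of the residuals.
    sigma = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]
```

Because a large outlier inflates the standard deviation that defines the threshold, this test is best applied to series long enough that a few isolated points do not dominate the residual spread.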

3. Case Study

3.1. Project Overview

The case study concerns a concrete gravity dam located on the Muyang River, Fujian Province, China. The maximum dam height is 72.4 m, and the dam crest length is 206 m. The dam body is divided into 9 dam sections. In this case analysis, deformation data, which are commonly analyzed, are taken as an example to verify the effectiveness of the proposed model. The deformation of the dam is monitored with tension lines, vertical lines, and other methods. The distribution map of the dam deformation measuring points is shown in Figure 4.

The typical measuring point EX4 is selected as the analysis object. The monitoring period is from June 6, 2017, to October 22, 2018, giving 500 sets of monitoring data in total. The 300 sets from June 6, 2017, to April 5, 2018, are used as training data, and the 200 sets from April 6, 2018, to October 22, 2018, are used as test data. The time series of the test data is shown in Figure 5.

To test the proposed method, five monitoring data points (data numbers 9, 15, 71, 156, and 166) are randomly selected, and outliers are constructed by adding or subtracting a constant . According to the definition of outlier, is generally selected. Under the SSA-NAR model principle, the larger the outlier, the more remarkable its influence on the accuracy of the model. To illustrate that outliers have little influence on the accuracy of the SSA-NAR model, and are also selected for comparative analysis. The original test samples and the three groups of test samples with outliers are shown in Figures 5–8.
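The construction of the contaminated test sets can be sketched as follows. The function `inject_outliers` and its parameters are hypothetical; the magnitude of the added constant is left as an argument, and positions are treated as zero-based indices, so the data numbers quoted above would each be reduced by one if they are counted from 1 (an assumption).

```python
def inject_outliers(series, positions, magnitude, signs=None):
    """Return a copy of `series` with `magnitude` added or subtracted
    at the given zero-based positions."""
    if signs is None:
        signs = [1] * len(positions)       # default: add at every position
    if len(signs) != len(positions):
        raise ValueError("signs and positions must have equal length")
    contaminated = list(series)            # leave the original untouched
    for pos, sign in zip(positions, signs):
        contaminated[pos] += sign * magnitude
    return contaminated
```

Keeping the clean series unchanged makes it straightforward to compare predictions on the original test set with predictions on each contaminated copy.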

3.2. Parameter Optimization by SSA

Before obtaining the optimal NAR model parameters, it is necessary to set the parameters of the SSA model. The parameters are selected on the basis of the literature [22–24].
(1) Fitness function. The root mean squared error (RMSE) of the training data is selected as the fitness function:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \tag{6}$$

where $n$ is the number of training data, and $y_i$ and $\hat{y}_i$ are the measured value and predicted value of the training data, respectively.
(2) Population size. In the SSA algorithm, the population size is generally between 10 and 30. When the population size is large, the prediction effect is not obviously improved, but the convergence speed is reduced. Considering convergence accuracy and speed, 10 is selected as the population size.
(3) Number of discoverers. The recommended value of the model, 20% of the population size, is selected.
(4) Safety value. The recommended value of 0.8 is selected as the safety value.
(5) Maximum number of iterations. The greater the number of iterations, the greater the possibility of obtaining the optimal parameters; however, the training time of the model becomes longer, and over-fitting is prone to occur. Therefore, the maximum number of iterations in this study is set to 100.
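The settings listed above, together with an RMSE fitness function over the two NAR parameters, can be sketched as follows. Everything apart from the four reported settings is an assumption made for illustration: the search ranges in `decode`, the callback `fit_and_predict` (which stands for training a NAR network and returning measured and predicted training values), and the placeholder `persistence_model` used in place of a real network.

```python
import math

# Settings reported in the text.
SSA_SETTINGS = {
    "population_size": 10,
    "discoverer_fraction": 0.2,
    "safety_value": 0.8,
    "max_iterations": 100,
}


def rmse(measured, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def decode(position, delay_range=(1, 20), hidden_range=(1, 30)):
    """Map a continuous position to integer (delay, hidden).

    The ranges are hypothetical and are not taken from the text.
    """
    def to_int(value, lo, hi):
        return min(max(int(round(value)), lo), hi)
    return to_int(position[0], *delay_range), to_int(position[1], *hidden_range)


def make_fitness(train_series, fit_and_predict):
    """Wrap a training routine into a fitness function of the position."""
    def fitness(position):
        delay, hidden = decode(position)
        measured, predicted = fit_and_predict(train_series, delay, hidden)
        return rmse(measured, predicted)
    return fitness


def persistence_model(series, delay, hidden):
    """Placeholder predictor: y_hat(t) = y(t-1); ignores `hidden`."""
    return series[delay:], series[delay - 1:-1]
```

Rounding the continuous position to integers is one simple way to let a continuous optimizer search over integer-valued network parameters; other encodings are possible.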

After SSA optimization, the optimal delay parameter is 6, and the optimal number of hidden layer elements is 10. The fitness curve is shown in Figure 9.

3.3. Result Analysis

The result analysis is divided into two parts. The first part verifies the prediction accuracy of the model based on the training data without outliers, and the second part verifies the outlier detection ability of the model based on the training data with outliers.

To verify the prediction performance of the proposed model based on the training data without outliers, the BP model and the LSTM model are used for comparison. The BP neural network is a classical artificial intelligence model and is widely used in dam deformation prediction. As a representative of deep learning in recent years, LSTM has also produced many research results in dam monitoring models. The BP neural network adopts a two-layer structure with 64 neurons in each layer. The LSTM model also uses a two-layer structure with 64 neurons in each layer. The dropout parameter is set to 0.3, and the activation function is tanh. Mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) are used as evaluation indexes to analyze the prediction accuracy of the models. The indexes are calculated as follows:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \quad RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \quad MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \tag{7}$$

where $N$ is the number of test data, and $y_i$ and $\hat{y}_i$ are the measured value and predicted value of the test data, respectively.
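The three evaluation indexes can be computed with a few lines of Python. This is a minimal sketch using the standard definitions; expressing MAPE as a percentage is an assumption, and MAPE is undefined when a measured value is zero.

```python
import math


def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )


def mape(measured, predicted):
    """Mean absolute percentage error, in percent (measured values must be non-zero)."""
    return 100.0 * sum(
        abs((m - p) / m) for m, p in zip(measured, predicted)
    ) / len(measured)
```

For measured values 2, 4, 5 and predictions 1, 5, 5, the absolute errors are 1, 1, and 0, so MAE is 2/3, RMSE is the square root of 2/3, and MAPE is (50% + 25% + 0%)/3 = 25%.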

The measured values and model predictions are shown in Figure 10, and the prediction performance evaluation indexes are listed in Table 1. The results show that the BP model predicts well but has the lowest accuracy of the three models. Compared with the LSTM model, the MAE, RMSE, and MAPE values of the SSA-NAR model are reduced by 6.42%, 5.78%, and 10.05%, respectively. The residual distribution of the SSA-NAR prediction results is shown in Figure 11, which indicates that the SSA-NAR model has high prediction accuracy.

This part verifies the outlier detection ability of the proposed model. After the SSA-NAR model is trained, the prediction results for the three groups of test sets with outliers are shown in Figures 12–14. The figures indicate that even if the original data contain outliers, the accuracy of the SSA-NAR model is still high, and the outliers have little impact on the overall accuracy of the model. As the outliers increase (from to ), the accuracy of the model does not decrease significantly, indicating that the SSA-NAR prediction model is strongly resistant to outliers.

After the predicted values are obtained, the outliers in the dam deformation data can be identified according to Equation (5). The first step is to calculate the residuals between the test data and the predicted data; the second step is to apply the “3σ criterion” to the residuals to detect outliers. The residuals of the three groups are shown in Figures 15–17. The detection accuracy (number of detected outliers/number of actual outliers) is shown in Table 2. A total of 5 outliers were added artificially; when the outliers are and in the test data, all the outliers were detected. When the outliers were , a total of 4 were detected, and the detection accuracy was 80%.
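The detection accuracy defined above is a simple ratio, sketched below. The function name `detection_accuracy` is hypothetical, and the ratio counts only how many of the injected outliers were found; it does not penalize normal points that are flagged by mistake.

```python
def detection_accuracy(detected, actual):
    """Number of detected outliers divided by the number of actual outliers."""
    actual_set = set(actual)
    if not actual_set:
        raise ValueError("at least one actual outlier is required")
    hits = len(actual_set & set(detected))
    return hits / len(actual_set)
```

With the five injected data numbers 9, 15, 71, 156, and 166, missing only the second one gives 4/5 = 0.8, which matches the 80% reported for the group in which one outlier was not detected.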

According to the detection results, the detection accuracy of deformation outliers is 100% in the test data with and outlier; all outliers are detected by the SSA-NAR model. For the test data with outlier, four of the five outliers were detected. The reason is that a large outlier affects the prediction performance of the model: the predicted value at the second outlier is close to the outlier itself, so the difference between the predicted value and the measured value is small, and this point is not detected. In practical dam engineering, the outliers of dam deformation monitoring data are mostly near , and rarely more than . Even with large outliers, the detection accuracy of the SSA-NAR method is relatively high. Therefore, the proposed model can effectively detect the outliers in the deformation sequence and provide a reliable data basis for the analysis of dam deformation data. Outliers are generally caused by manual errors and systematic errors, which are inevitable during the operation of the dam. After abnormal values are identified, their causes can be further analyzed by checking the original data, and safety suggestions can be put forward for the operation of the project.

4. Conclusions

The outliers in monitoring data may have a great impact on the results of dam safety monitoring. To improve the accuracy of outlier detection, a new technique that combines the SSA optimization algorithm and the NAR dynamic neural network is applied to the outlier diagnosis of dam monitoring data. Combining NAR with the SSA algorithm solves the problem that the prediction accuracy of the NAR model is greatly affected by parameter selection. Based on the definition of an outlier and the prediction model, the outlier detection method is constructed, and the following conclusions are obtained through a dam engineering example:
(1) At present, most dam deformation prediction methods rely on the input-output relationship between effect quantity and influence quantity. When the influence quantity data are abnormal or missing, dam deformation cannot be predicted. The proposed method does not depend on this relationship; the effect quantity is predicted by deeply mining the internal relationship of the effect quantity time series. Comparison with the BP and LSTM methods verifies that the SSA-NAR prediction model has high accuracy.
(2) SSA is introduced to optimize the parameters of the NAR neural network, which reduces the influence of manually and arbitrarily chosen parameters. When there are outliers in the monitoring data, the model can still predict the data effectively without being significantly affected by the outliers.
(3) When the outlier is less than , the model can effectively identify the outliers in the monitoring data, and the accuracy is 100%. When the outlier is large, the prediction performance of the model may be disturbed: the model mistakenly treats the outlier as the real value, so the predicted value deviates from the real value. Therefore, further study is needed to reduce the interference of large outliers with the model.
(4) The proposed method can only identify the location of outliers and cannot identify their cause. Outlier identification methods tailored to the causes and characteristics of outliers need to be built, and analysis methods need to be established for abnormal data other than outliers.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

J.S. and Y.C. contributed to the conceptualization. J.S. contributed to the methodology. Y.C. contributed to the software. J.S., Y.C., and J.Y. contributed to the validation. Y.C. contributed to the formal analysis. J.S. contributed to the investigation and resources. Y.C. contributed to the data curation. J.S. contributed to the writing—original draft preparation. Y.C. contributed to the writing—review and editing. J.Y. contributed to the visualization. Y.C. contributed to the supervision. J.S. contributed to the project administration and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (Grant Nos. 52109166 and 52039008).