Mathematical Problems in Engineering

Volume 2019, Article ID 7620948, 13 pages

https://doi.org/10.1155/2019/7620948

## An Empirical Comparison of Multiple Linear Regression and Artificial Neural Network for Concrete Dam Deformation Modelling

State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan 430072, China

Correspondence should be addressed to Junxing Wang; jxwang@whu.edu.cn

Received 12 January 2019; Revised 6 March 2019; Accepted 24 March 2019; Published 17 April 2019

Academic Editor: Łukasz Jankowski

Copyright © 2019 Mingjun Li and Junxing Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Deformation prediction models are essential for evaluating the health status of concrete dams. Nevertheless, the application of the conventional multiple linear regression model has been limited by the particular structure, random loading, and strongly nonlinear deformation of concrete dams. Conversely, the artificial neural network (ANN) model adapts well to complex and highly nonlinear behaviors. This paper evaluates the performance of multiple linear regression (MLR) and ANN models in characterizing concrete dam deformation under environmental loads. Four models, namely, multiple linear regression (MLR), stepwise regression (SR), the backpropagation (BP) neural network, and the extreme learning machine (ELM), are employed to simulate dam deformation from two perspectives, a single measurement point and multiple measurement points, using approximately 11 years of historical dam operation records. The results show that the multipoint model achieved higher prediction accuracy than the single point model for every method except MLR. Moreover, the ELM model was consistently more accurate than the other three models. All results are discussed in conjunction with a gravity dam study.

#### 1. Introduction

Deformation modelling is an important component of dam safety systems, both for daily operation and for long-term behavior evaluation [1]. Deformation models are built to calculate the dam response under safe conditions for a given load combination; this response is compared with actual measurements of dam performance in order to detect anomalies and prevent failures. Current predictive models for simulating dam deformation can be classified into three types: deterministic models, statistical models, and hybrid models [2], i.e., a mixture of the first two.

Deterministic models, which are based on physical laws relating loads, material properties, and stress-strain relationships, are often used in dam design and throughout the service life of concrete dams [3]. Because their parameters have specific physical meanings, deterministic models can fundamentally explain anomalies in the operation of concrete dams; however, uncertainties in the geological conditions and in the material properties of the rock foundation and the concrete hinder their implementation.

Statistical models are mathematical equations that quantitatively describe the variation of dam monitoring values; they are an abstraction and simplification of the actual working state of the dam [4]. Without regard to the specific physical mechanism of dam operation, a statistical model is essentially an empirical model based on years of dam measurement data. As the most widely used approach for characterizing concrete dam deformation, the statistical model usually consists of three parts: a temperature component, a hydrostatic pressure component, and an aging component. The statistical model assumes that these components are completely independent, which may not match the actual situation [5]. At present, the most common statistical models are based on regression methods such as multiple linear regression (MLR) [6], stepwise regression (SR) [7], principal component regression (PCR) [8], and partial least squares regression (PLSR) [9]. By gradually screening the regression factors of MLR models, SR models obtain regression coefficients at a given significance level, with results more accurate than those obtained by the ordinary least squares method.

In recent years, more and more scholars have applied machine learning or intelligent algorithms to dam behavior prediction and dam safety diagnosis [2, 10, 11]. By comparing a multiple linear regression model with a multilayer perceptron model for the horizontal displacement of a concrete arch dam, Mata [12] found that ANN models can be a very powerful tool for evaluating dam behavior. Salazar [2] assessed the potential of several state-of-the-art machine learning techniques, including random forests, boosted regression trees, neural networks, and multivariate adaptive regression splines, for predicting dam displacement and leakage. Kao [13] studied the feasibility of ANN-based approaches for dam health monitoring and, based on the results, set an early warning threshold for the Fei-Tsui dam. However, artificial neural networks trained by gradient descent remain relatively slow and easily get stuck in local minima. Su [14] proposed a dam safety monitoring model based on the support vector machine, which overcomes these disadvantages of artificial neural networks, although selecting the kernel function parameters remains difficult. The extreme learning machine (ELM), proposed by Huang et al. [15, 16], is a method for training single hidden layer feedforward neural networks. The ELM first randomly generates the hidden layer biases and the weights connecting the input layer to the hidden layer and then directly determines the weights between the hidden layer and the output layer with the Moore-Penrose generalized inverse. By avoiding the shortcomings of gradient descent learning, the ELM greatly improves the learning speed of artificial neural networks while maintaining good generalization ability.
Kang [17] proposed an ELM-based gravity dam deformation prediction model and explored the application of the ELM algorithm from the perspective of prediction accuracy. However, the suitability of the ELM model for interpreting dam deformation has not yet been elucidated.
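The two-step ELM training described above (a random hidden layer, then a Moore-Penrose solve for the output weights) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of Huang et al. [15, 16] or Kang [17]: the sigmoid activation, the uniform range [-1, 1] for the random weights and biases, the hidden layer size, and the function names `elm_fit` and `elm_predict` are all hypothetical choices.

```python
import numpy as np

def _hidden(X, W, b):
    """Hidden layer output matrix with a sigmoid activation (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden=20, seed=0):
    """Train a single hidden layer ELM (hypothetical helper).

    X: (n_samples, n_features) inputs; T: (n_samples,) or (n_samples, n_outputs) targets.
    """
    rng = np.random.default_rng(seed)
    # Input weights and hidden biases are drawn at random and never adjusted afterwards.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = _hidden(X, W, b)
    # Output weights via the Moore-Penrose generalized inverse: beta = pinv(H) @ T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(model, X):
    """Predict with a trained ELM: hidden layer output times output weights."""
    W, b, beta = model
    return _hidden(X, W, b) @ beta
```

Because no iteration is involved, the training cost is dominated by a single pseudoinverse of the samples-by-hidden-units matrix, which is where the speed advantage attributed to the ELM comes from.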

Dam deformation monitoring models can be divided into two types: single point models and multipoint models. Current research mainly focuses on single point models, which cannot reflect the spatial distribution of deformation. By contrast, a multipoint model better reflects the relationships among the deformations of different points on the dam body and is therefore more reasonable than a single point model. This matters particularly for a concrete arch dam: as a statically indeterminate shell structure, it deforms as a spatially integrated whole, so its deformation is strongly governed by this spatial integrity.

This paper studies the application characteristics and performance of multiple linear regression (MLR), stepwise regression (SR), the backpropagation (BP) neural network, and the extreme learning machine (ELM) for concrete dam deformation modelling, based on monitoring data from the Dongjiang arch dam. The similarities and differences between the single point model and the one-dimensional multipoint model are discussed. The work focuses on prediction accuracy and suitability for interpreting dam behavior. All results are discussed in conjunction with those of a gravity dam study [17].

#### 2. Statistical Model

Statistical models are established from the correlation between observed effect quantities and environmental variables. Treating the environment as the independent variable, the structural response of the dam is affected by three effects: the reversible effect of the hydrostatic load, the reversible thermal effect of temperature, and the irreversible effect due to the evolution of the dam response over time [4, 18]. Accordingly, the displacement of an arbitrary point of the dam in a given direction can be expressed as the sum of a hydraulic displacement component, a temperature displacement component, and an aging displacement component, as in (1).

For each component, the deformed surface of the dam under water pressure, temperature, or aging acting alone is approximated by a multiple power series in the coordinates of the point, as in (2).

The hydraulic, temperature, and aging displacement components are expressed by (3) to (6) [19].

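Here, the hydraulic factor is the difference between the upstream and downstream water levels; in (4), the period index takes the value 1 for the annual period and 2 for the half-year period, and time is counted as the number of days since the initial date; in the measured-temperature form, the temperature factors are average temperatures over specified intervals of days before the observation day; the aging factor is computed from the measured date and the initial date; and all remaining coefficients are regression coefficients.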
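Because the component equations are not reproduced here, a form of the hydraulic, temperature, and aging components that is common in the dam-monitoring literature is sketched below for orientation. It is an assumption, not necessarily the exact form used in this paper: the polynomial order $m$ of the water-level term, the 365-day period, the averaging windows behind the measured temperatures $T_i$, and the aging scaling $\theta = (t - t_0)/100$ are illustrative choices.

$$
\begin{aligned}
\delta_H &= \sum_{i=1}^{m} a_i H^i \\
\delta_T &= \sum_{i=1}^{2}\left(b_{1i}\sin\frac{2\pi i t}{365} + b_{2i}\cos\frac{2\pi i t}{365}\right) \quad \text{(periodic form)} \\
\delta_T &= \sum_{i} b_i T_i \quad \text{(measured-temperature form)} \\
\delta_\theta &= c_1\theta + c_2\ln\theta
\end{aligned}
$$

Here $a_i$, $b_{1i}$, $b_{2i}$, $b_i$, $c_1$, and $c_2$ are regression coefficients; in this convention $m$ is commonly taken as 3 for gravity dams and 4 for arch dams.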

Particular attention should be paid to the choice between the two methods for calculating the temperature displacement, (4) and (5). When the temperature data are complete and continuous, the measured temperatures are used so that the influence of the actual temperature is taken into account; when the temperature data are incomplete or discontinuous, the periodic form is used instead.
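The selection rule above can be expressed as a small helper that returns measured-temperature factors when the daily series is complete and falls back to annual and half-year harmonics otherwise. It is a sketch under assumptions: the name `temperature_factors`, the daily indexing of the temperature series from the initial date, the default averaging windows, and the use of NaN to mark missing records are all hypothetical.

```python
import numpy as np

def temperature_factors(t, temps=None, windows=((0, 0), (1, 7), (8, 30))):
    """Temperature regression factors for observation days t (hypothetical helper).

    t       : integer days since the initial date, one per observation.
    temps   : daily temperature series indexed by day since the initial date,
              with NaN marking missing records (assumed convention).
    windows : (start, end) pairs; each factor is the mean temperature from
              `start` to `end` days before the observation day (assumed values).
    """
    t = np.asarray(t, dtype=int)
    if temps is None or np.isnan(np.asarray(temps, dtype=float)).any():
        # Incomplete or missing data: annual (i = 1) and half-year (i = 2) harmonics.
        cols = []
        for i in (1, 2):
            w = 2.0 * np.pi * i * t / 365.0
            cols += [np.sin(w), np.cos(w)]
        return np.column_stack(cols)
    temps = np.asarray(temps, dtype=float)
    if t.min() < max(end for _, end in windows) or t.max() >= len(temps):
        raise ValueError("temperature series does not cover every averaging window")
    cols = []
    for start, end in windows:
        # Mean over days d - end through d - start, inclusive.
        cols.append([temps[d - end:d - start + 1].mean() for d in t])
    return np.column_stack(cols)
```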

Substituting (2), (3), (4), and (6), or (2), (3), (5), and (6), into (1), expanding in a Taylor series, omitting higher-order terms, and combining like terms yields the space-time distribution model of the displacement of the measuring points in the given direction, that is, the one-dimensional multipoint statistical deformation model.

When the coordinates of the measuring point are held fixed, this model reduces to the displacement statistical model of a single measuring point.

For the reasons above, (7) and (9) are adopted in this paper to model concrete dam deformation. The input variables of the single point prediction model are its hydraulic, temperature, and aging factors; the input variables of the one-dimensional multipoint prediction model are the corresponding factors of the multipoint model, which additionally involve the coordinate of the measuring point. In both cases, the output variable is the radial displacement of the measuring point.
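To make the input variables concrete, the sketch below builds a single point factor matrix and a stacked multipoint matrix. Everything specific here is an assumption for illustration and need not match (7) and (9) exactly: the water-level order (4, as is common for arch dams), the harmonic temperature form, the aging scaling θ = t/100 with t > 0, and the multipoint layout, which appends x-weighted copies of every factor plus x itself as a first-order expansion in a single point coordinate x. The helper names are hypothetical.

```python
import numpy as np

def hst_factors(H, t, water_order=4):
    """Single point input factors (hypothetical layout):
    H, ..., H^m, annual and half-year harmonics of t, theta, ln(theta)."""
    H = np.asarray(H, dtype=float)
    t = np.asarray(t, dtype=float)
    if np.any(t <= 0):
        raise ValueError("t must be positive so that ln(theta) is defined")
    water = [H ** i for i in range(1, water_order + 1)]
    temp = []
    for i in (1, 2):                    # i = 1: annual period; i = 2: half-year period
        w = 2.0 * np.pi * i * t / 365.0
        temp += [np.sin(w), np.cos(w)]
    theta = t / 100.0                   # assumed scaling of the aging variable
    aging = [theta, np.log(theta)]
    return np.column_stack(water + temp + aging)

def multipoint_factors(H, t, coords, water_order=4):
    """Stack the factors of several measuring points observed on the same days.

    Assumed first-order layout per point at coordinate x: [F, x * F, x].
    """
    F = hst_factors(H, t, water_order)
    blocks = [np.hstack([F, x * F, np.full((F.shape[0], 1), x)]) for x in coords]
    return np.vstack(blocks)
```

Rows of the multipoint matrix are ordered point by point, so the vector of measured radial displacements must be stacked in the same order before fitting.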

#### 3. Methodology

##### 3.1. Multiple Linear Regression

Multiple linear regression (MLR) models are based on the linear correlation between dam effect quantities and environmental variables. For independent variables $x_1, x_2, \ldots, x_p$ and a dependent variable $y$, the regression equation is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\beta_0, \beta_1, \ldots, \beta_p$ are the regression coefficients to be estimated, $n$ is the sample size, and $\varepsilon_i$ is the random error [20].

Assuming that the random errors are normally distributed and mutually independent, the multiple linear regression equation can be written in matrix form as $Y = X\beta + \varepsilon$, where $Y$ is the $n \times 1$ vector of observations, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the parameter vector, $X$ is the known $n \times (p+1)$ matrix of independent variables whose first column consists of ones, and $\varepsilon$ is the random error vector. The least squares estimate is the parameter vector that minimizes the residual sum of squares $Q(\beta) = (Y - X\beta)^T (Y - X\beta)$; it is obtained by solving the normal equations $X^T X \beta = X^T Y$. Therefore, assuming $X^T X$ is invertible, the least squares estimate of the parameters is $\hat{\beta} = (X^T X)^{-1} X^T Y$, the fitted model is $\hat{Y} = X\hat{\beta}$, and the residual vector is $e = Y - \hat{Y}$. In short, the model minimizes the sum of squared deviations between the model predictions and the observations.
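The least squares estimate above can be computed with NumPy. This minimal sketch (hypothetical helper `mlr_fit`) prepends the column of ones to form $X$ and calls `numpy.linalg.lstsq`, which returns the same $\hat{\beta}$ as $(X^T X)^{-1} X^T Y$ when $X$ has full column rank but avoids forming the inverse explicitly; this is numerically safer when factors such as powers of the water level are strongly correlated.

```python
import numpy as np

def mlr_fit(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp + e (hypothetical helper).

    Returns the coefficient vector (b0 first), the fitted values, and the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # design matrix with a column of ones
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes ||y - A beta||^2
    fitted = A @ beta
    return beta, fitted, y - fitted
```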

##### 3.2. Stepwise Regression

For the MLR method, the more independent variables are included, the smaller the residual sum of squares (RSS) and the better the fit to the observed data. When seeking the optimal regression equation, it is therefore tempting to include as many independent variables as possible, especially those that significantly influence the dependent variable. Nonetheless, too many independent variables also have disadvantages. Firstly, more independent variables require more quantities to be measured and make calculations inconvenient. Secondly, if the regression equation includes an independent variable that has no effect or only a very small effect on the dependent variable, the RSS hardly decreases while one degree of freedom is lost, so the estimated residual variance may increase and the accuracy of the regression equation suffers. Thirdly, independent variables without a significant influence on the dependent variable reduce the stability of the regression equation and its prediction accuracy. Thus, the optimal regression equation should exclude independent variables that have no significant effect on the dependent variable.
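The first two points above can be checked numerically. In the sketch below (hypothetical helper `rss_and_s2`, synthetic data), appending a pure-noise regressor can never increase the in-sample RSS, but it costs one degree of freedom, so the residual variance estimate $s^2 = \text{RSS}/(n - p - 1)$ is not guaranteed to improve.

```python
import numpy as np

def rss_and_s2(X, y):
    """Return the RSS and s^2 = RSS / (n - p - 1) of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])      # intercept column + regressors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return rss, rss / (len(y) - A.shape[1])       # A.shape[1] = p + 1

# Synthetic example: y depends on x only; `noise` is an irrelevant regressor.
rng = np.random.default_rng(42)
n = 40
x = rng.normal(size=(n, 1))
y = 3.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
noise = rng.normal(size=(n, 1))

rss_1, s2_1 = rss_and_s2(x, y)
rss_2, s2_2 = rss_and_s2(np.hstack([x, noise]), y)
# rss_2 <= rss_1 always holds; s2_2 may be larger or smaller than s2_1.
```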

Stepwise regression (SR) is a method for selecting the independent variables of a linear regression model [21]. The basic idea is to introduce variables one at a time, provided that the partial regression sum of squares of the candidate variable is significant under an F test. Following this principle, stepwise regression can be used to screen out and eliminate variables that cause multicollinearity. The specific steps are as follows: first, $y$ is regressed on each candidate $x_i$ separately; then, starting from the regression equation of the $x_i$ that contributes most to $y$, the remaining variables are introduced one by one, and after each introduction, variables already in the model that are no longer significant are removed. The variables finally retained in the model are both important and free of severe multicollinearity. Whether stepwise regression actually improves on multiple linear regression is still controversial, and this question is also a focus of this paper.
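The procedure can be sketched as a forward-selection and backward-elimination loop driven by partial F statistics. The fixed thresholds `f_in = 4.0` and `f_out = 3.9` (used here in place of p-values at a chosen significance level) and the helper names are assumptions for illustration, not the settings used in this study; keeping `f_in` above `f_out` prevents a variable from being added and removed in the same pass.

```python
import numpy as np

def _rss(X, y, cols):
    """RSS of an OLS fit of y on an intercept plus the columns `cols` of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def stepwise(X, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Stepwise selection with partial F tests (hypothetical helper).

    Returns the indices of the retained columns of X.
    """
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: find the candidate with the largest partial F statistic.
        rss_cur = _rss(X, y, selected)
        best_f, best_j = -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            rss_new = _rss(X, y, selected + [j])
            # Residual variance after adding j (guarded against an exact fit).
            s2 = max(rss_new / (n - len(selected) - 2), 1e-300)
            f = (rss_cur - rss_new) / s2   # partial regression sum of squares / s2
            if f > best_f:
                best_f, best_j = f, j
        if best_j is not None and best_f > f_in:
            selected.append(best_j)
            changed = True
        # Backward step: drop the least significant retained variable if its F < f_out.
        if selected:
            rss_full = _rss(X, y, selected)
            s2 = max(rss_full / (n - len(selected) - 1), 1e-300)
            worst_f, worst_j = np.inf, None
            for j in selected:
                rss_red = _rss(X, y, [k for k in selected if k != j])
                f = (rss_red - rss_full) / s2
                if f < worst_f:
                    worst_f, worst_j = f, j
            if worst_f < f_out:
                selected.remove(worst_j)
                changed = True
        if not changed:
            break
    return selected
```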

##### 3.3. Backpropagation Neural Network

Artificial neural networks are often divided into two categories: recurrent networks, whose feedback connections form loops, and feedforward neural networks [22], whose structure contains no loops. The typical single hidden layer feedforward neural network structure is shown in Figure 1. Both the ELM and the BP neural network are feedforward neural networks; they differ only in their learning methods. The BP neural network is trained by backpropagating the output error and updating the weights and thresholds iteratively by gradient descent, whereas the ELM assigns the input weights and hidden layer thresholds randomly, keeps them fixed without adjustment, and computes only the output weights.
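In contrast to the ELM's one-step solution for the output weights, BP training repeats forward and backward passes. The sketch below shows full-batch gradient descent on the mean squared error of a single hidden layer network; the tanh hidden activation, linear output, learning rate, epoch count, initialization, and the name `bp_train` are assumptions for illustration, and practical BP implementations often add momentum, mini-batches, or early stopping.

```python
import numpy as np

def bp_train(X, y, n_hidden=10, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent for a 1-hidden-layer network (hypothetical helper).

    Hidden layer: tanh; output layer: linear. Returns the parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass.
        Z = np.tanh(X @ W1 + b1)              # (n, n_hidden)
        out = Z @ W2 + b2                     # (n,)
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / n                 # dL/d(out)
        gW2 = Z.T @ g_out
        gb2 = g_out.sum()
        g_pre = np.outer(g_out, W2) * (1.0 - Z ** 2)   # back through tanh
        gW1 = X.T @ g_pre
        gb1 = g_pre.sum(axis=0)
        # Gradient descent update of weights and thresholds (biases).
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses
```

Every epoch touches all weights, which is why BP training is iterative and slower than the ELM's single linear solve.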