#### Abstract

Data-driven damage identification based on measurements of the structural health monitoring (SHM) system is a hot issue. In this study, based on the intrinsic mode functions (IMFs) decomposed by the empirical mode decomposition (EMD) method and the trend term fitting residual of measured data, a structural damage identification method based on Mahalanobis distance cumulant (MDC) was proposed. The damage feature vector is composed of the squared MDC values and is calculated by the segmentation data set. It makes the changes of monitoring points caused by damage accumulate as “amplification effect,” so as to obtain more damage information. The calculation method of the damage feature vector and the damage identification procedure were given. A mass-spring system with four mass points and four springs was used to simulate the damage cases. The results showed that the damage feature vector MDC can effectively identify the occurrence and location of the damage. The dynamic measurements of a prestress concrete continuous box-girder bridge were used for decomposing into IMFs and the trend term by the EMD method and the recursive algorithm autoregressive-moving average with the exogenous inputs (RARMX) method, which were used for fitting the trend term and to obtain the fitting residual. By using the first *n*-order IMFs and the fitting residual as the clusters for damage identification, the effectiveness of the method is also shown.

#### 1. Introduction

The structural health monitoring (SHM) system provides a wealth of measurements making the data-driven method effective and rapid for damage identification [1–3]. These different types of measurements are representative and authentic in expressing the real state of the structure. Meanwhile, the measurements contain a large amount of structural variation information, environmental information, and noise information, which when combined with the mathematical, statistical, and probability methodologies makes the data-driven damage identification method become a hot issue of little damage and low signal-to-noise ratio damage identification [4].

The first issue of damage identification via measurements of the SHM system is that we need to obtain the real structural responses to identify whether the damage happened or not. It is well known that in monitoring data, the variations induced by temperature, wind, and vehicle are always fused together so that the changes caused by damage are submerged which leads the damage-induced variation to drown out. Many studies have focused on the relationship between temperature and monitoring structural responses and how to eliminate the effects of noise and temperature influences [5–9]. The results showed that the temperature and monitoring responses presented linear relationship through mathematical models fitting [10]. In addition, many studies presented that the noise satisfies probability characteristics and can be filtered out by means of Fourier transform and wavelet transform [11]. It has been noticed that the data separated from the original monitoring data, i.e., the final long-term trend, are usually considered as the temperature trend term. Some researchers used the trend term and its fitting residual to identify the damage [12, 13]. Therefore, before the damage identification, it is usually necessary to process the measurements of the SHM system by some data processing methods in order to obtain more real damage information or separate the environmental and noise influences.

Another key issue of damage identification using the data-driven method is to establish an applicable and effective damage feature index or vector to evaluate the occurrence and location of the damage. The Mahalanobis distance method has been widely used in structural damage identification due to its good applicability and sensitivity to damage. The advantages of Mahalanobis distance are as follows: (1) data normalization: it can make the scale of monitoring data independent, so that a variety of data can be integrated into the same scale; (2) it can eliminate the interference of related variables when the monitoring data are multidimensional and the correlation of each dimension variable is large. For the measurement of the SHM system, such as displacement and strain, there are significant correlations among them, and the Mahalanobis distance method can fuse these multisource dimensional data and obtain more damage information from these different types of data.

Luo et al. [14] used the Mahalanobis distance corresponding to different crack degrees as the standard value and judged the damage degree by comparing the difference between the Mahalanobis distance and the standard value in the test group. The authors of [15] also used the coefficient vectors of the autoregressive sliding average model to calculate Mahalanobis distance for damage localization of the shear frame structure. Mosavi et al. [16] proposed a damage location identification method of steel-girder bridges under environmental excitation, in which Mahalanobis distance was used as the evaluation index of damage location, and verified good sensitivity of the method to damage through the *i*-steel model test. Cao et al. [17] proposed an impact test method for the steel frame model to evaluate structural damage based on independent component analysis and Mahalanobis distance. Zhou et al. [18] used the maximum Mahalanobis distance to evaluate the damage degree of the structure under forced vibration and compared the identification results with the frequency response function method. Liu et al. [19, 20] used Mahalanobis distance as the damage characteristic value to evaluate the damage sensitivity vector of the autoregressive model, and the results showed that the method had good noise resistance. Wang et al. [21] used electromagnetic interference technology combined with the Mahalanobis distance index to evaluate the damage degree of wood structure. Compared with the Mahalanobis distance constructed directly from the monitoring data, Szymon et al. [22] calculated the Mahalanobis distance of the Hank matrix constructed by structural acceleration response, and compared with the subspace damage identification method, the effectiveness of the two methods was proved. The results also showed that the method can effectively reduce the influence of wind, temperature, and other surrounding environment. Chen et al. [23] proposed a damage identification method based on squared Mahalanobis distance cumulant, and the steel *i*-beam test results showed that the damage feature vector MDC had good sensitivity to small damage.

This study proposed a new structural damage identification method based on Mahalanobis distance. The methodology of Mahalanobis distance, empirical mode decomposition, and recursive algorithm autoRegresive-moving average with exogenous inputs was proposed. A damage feature vector sensitive to the damage based on Mahalanobis distance was established. The calculation method of the damage feature vector and the damage identification procedure were given. A numerical simulation with a four sensor system for verifying the damage identification availability was carried out. Finally, the strain measurements of a real bridge SHM system were used to identify whether the damage occurred. The results proved that the damage vector MDC has great potential for structural damage identification.

#### 2. Methodology

##### 2.1. Mahalanobis Distance Methodology

The Mahalanobis distance calculation method was proposed by an Indian statistician Mahalanobis P. C. in 1936 [24]. It is a weighted Euclidean distance. The weight function is the inverse matrix of the covariance of the cluster, which can denote the covariance distance of clusters. Mahalanobis distance takes into account the size of the cluster and the correlation characteristic between elements in clusters. It can effectively calculate the difference between two clusters. For structural damage identification, the variation between the reference cluster and the testing cluster is usually used to determine whether the structure has damages [16]. Mahalanobis distance can be defined as follows:where MD is the Mahalanobis distance from clusters to the mean estimation, *X*_{i} are the elements of the cluster *X*, *μ* is the mean estimation, and *S* is the covariance matrix.

##### 2.2. Empirical Mode Decomposition

Huang et al. [25, 26] proposed a time-domain analysis method for processing nonstationary and nonlinear signals, namely, empirical mode decomposition (EMD). The measured signal can be decomposed into intrinsic mode functions (IMFs) arranged from high frequency to low frequency. Yang et al. proposed a damage detection method using the first IMF to find the spike at the moment of damage [27, 28]. However, this damage identification method is greatly influenced by noise and damage severity. Therefore, the solution is to combine this method with the Hilbert spectrum and some more sensitive damage identification methods [28–31].

The core of the EMD method is to decompose the signal into a set of intrinsic mode functions. No information about the original signal is required. The decomposition is completely adaptive, and IMFs are arranged from high frequency to low frequency. The specific decomposition method is as follows:(a)First, the maximum and minimum values of *x*(*t*) are fitted to the upper envelopes *U* (*t*) and the lower envelopes *D* (*t*) by the cubic spline curve method. The average envelope *m*_{1}(*t*) is as follows: If *h*_{1}(*t*) does not satisfy IMF’s conditions, *h*_{1}(*t*) is equal to *x* (*t*). This iteration is continued to use in formulas (2) and (3) repeatedly until *h*_{1k}(*t*) satisfies IMF’s conditions. In this case, *h*_{1}(*t*) is *c*_{1}, which is the first-order IMF. For IMF, the signal frequency is considered as the “filter,” and *c*_{1} is considered to contain the highest signal frequency component: where *h*_{1k} (*t*) and *h*_{1(k−1)} (*t*) are the *k* times and (*k* − 1) times cycle signals, respectively, and *m*_{1k}(*t*) is the average envelope after *k* times cycles.(b)A new signal *r*_{l} (*t*) can be obtained according to the difference between the original signal *x* (*t*) and *c*_{1}(*t*) as follows:Here, *r*_{l} (*t*) is considered to be a new *x* (*t*), and step (a) is repeated. Then, all IMFs are followed by *c*_{2}(*t*), *c*_{3}(*t*), …. Only when the residual *r*_{n}(*t*) is a monotonic function or satisfies some special conditions, the decomposition will stop. As a result, the original signal *x* (*t*) is finally divided into a group of IMF components *c*_{i}(*t*) and a residual *r*_{n}(*t*), as shown below:

From the EMD method, it can be seen that the signal can be decomposed into IMFs and a residual, which are arranged in the order of high frequency to low frequency. However, the damage usually affects some interval frequencies, causing the energy redistribution of each IMF, although the redistribution law is unknown [28]. This makes some IMFs more sensitive to the damage, and some IMFs have more damage information. Therefore, we can select some more sensitive IMFs samples to make the damage easier to identify.

##### 2.3. NNRARMX Modelling

The recursive algorithm autoregressive-moving average with the exogenous inputs (RARMX) method is a model training based on the Gaussian Newton method [32]. First, the unit sampling interval can be defined as *t* = 1, 2, …. The input and output signals are and *y* (*t*), respectively. The relationship between input signals and output signals in the system can be expressed aswhereHere, *A* (*q*,*θ*), *G* (*q*,*θ*), and *H* (*q*,*θ*) are the parameter vectors and *y* (*t*) is the output (displacement, strain, and so on) at the moment *t*, where *t* is 1, 2, …, *n*. is the input (temperature and other external loads), and *e* (*t*) is the equivalent process value of noise and fitting difference. The RARMX model structure is shown in Figure 1.

The regression vector and the fitting vector of the RARMX model are expressed aswhere *φ* (*t*) is the vector composed of the system input vector and system output vector, is the main influencing factor of the health monitoring system, *ε* is the fitting error, *ε* (*t*) = *y* (*t*) − *ŷ* (*t*)), *n*_{a} is the number of past fitting value, *n*_{b} is the number of fitting influencing factors, *n*_{c} is the fitting error, and *n*_{k} is the time delay equal to 1.

The RARMX neural network (neural network, NN) model is shown in Figure 2. The NNRARMX model consists of a three-layer neural network model (*y*, , and *e*), which is an input layer, a hidden layer, and an output layer. In order to achieve general properties, it is necessary to connect a node and a two-layer weighting. Increasing the number of hidden neurons can achieve more complex system modelling. The NNRARMX model is fitted by the Levenberg–Marquardt method, which generally provides the minimization function of numerical solution in the parameter space of the nonlinear model.

The NNRARMX method can provide an effective way for signal fitting, which makes the signal have great nonlinearity [9, 33]. It is known that the long-term trend variation of the measurements is usually caused by the temperature and the damage. Therefore, in order to find more damage information, the NNRARMX method can be used to fit the trend, and the fitting residual can provide some more damage information.

##### 2.4. Damage Identification Method

###### 2.4.1. Damage Feature Vector

In this study, a damage feature vector was defined based on the segmentation summation of Mahalanobis distance square. The main purpose is that the segmentation of data can make damage-induced changes of each monitoring point accumulate as “amplification effect,” so as to obtain more damage information [23]. By comparing the variation of the feature vector before and after damage, whether the damage happened can be determined. IMFs are used because the energy of each IMF will be redistributed after structural damage, and the IMFs will also change [28]. In addition, the separated trend term of the original data also contains damage information, so the fitting residual of the trend term can provide help in damage identification [12].

A cluster can be defined as , and its mean estimation and covariance matrix are *μ* and *S*. , which can be calculated using formula (1). A vector is defined as , and Mahalanobis distance cumulant (MDC) is considered as the damage feature vector, which is expressed as follows:where *i* = (1, 2, … , *n*), *n* = *k* *×* *j*, and *j* is the length of vector *y*_{i} and *k* is the length of vector MDC.

###### 2.4.2. Damage Detection Procedures

*Step 1. *The sensor number of the system is *P*, and the measured nondestructive state signal is expressed as *f*_{1} (*t*), *f*_{2} (*t*), …, *f*_{p} (*t*). The signal length is *n*. IMFs of each signal of *P* sensors can be obtained by the EMD method.

*Step 2. *In each group of IMFs, the first *m* order of IMFs can be selected as the whole cluster {*X*}. The mean estimation of *μ* and the covariance matrix *S* of *X* can be calculated.

*Step 3. *According to formula (1), the Mahalanobis distance vector {*Y*} can be computed. Then, MDC_{k} and damage feature vector MDC are established using formula (10).

*Step 4. *EMD is used to decompose the measured signal into a group of IMF. The first *m* order of IMFs can be used as the tested clusters {}. The vector {} is the Mahalanobis distance of tested clusters {} to the whole clusters mean estimation *μ*. According to formula (10), MDC_{k} values and vector MDC can also be calculated.

*Step 5. *By comparing the damage feature vector MDC, the damage location of the structure is determined.

*Step 6. *If the measured signals have a residual decomposed by the EMD method, the NNRARMX method can be used for fitting the residual, and a new fitting residual can be obtained. Then, the new fitting residual is considered as a cluster to be used while performing Step 2∼Step 5 for damage identification.

#### 3. Mass-Spring System Numerical Simulation

Figure 3 shows a system with four sensors to detect the dynamic data of the mass-spring system, which has four masses and four springs (Figure 3). The system parameters are *m*_{1} = *m*_{2} = *m*_{3} = *m*_{4} = 100 kg and *k*_{1} = *k*_{2} = *k*_{3} = *k*_{4} =5 × 10^{5} kN/m.

The Motion equation of this system iswhere *M* is the mass matrix, *K* is the stiffness matrix, and *F* (*t*) is the transient force. The force is applied to mass point *m*_{4} for 0.01 second. Four mass points are making freedom movement. Their acceleration time histories are recorded. The sampling frequency *f* is 100 Hz. In formula (10), the sampling point *n* adopts 20000. Parameter *j* is 125, and parameter *k* is 160.

The mass-spring numerical simulation system was established using Matlab Software Simulink Toolbox. Four acceleration signals of mass points can be computed. In order to simulate the actual condition affected by noise, 5% white noise was added to the measured signals. G. Rilling EMD toolbox [34] was used to realize the EMD method. In real structures, the quality of structure usually changes little after the structure is damaged. Therefore, the system was adopted to reduce stiffness to simulate the damage. In the case of damage simulation, the stiffness of the spring *k*_{3} was reduced by 10% and 20%, respectively. The acceleration signal of each mass point can be decomposed into three clusters to compare the damage identification results. There are three cases as follows:

*Case 1. *Original acceleration signal

*Case 2. *First order of IMFs

*Case 3. *First three orders of IMFs

Figures 4–7 showed the acceleration time history and the first three orders of IMFs of four mass points by reducing stiffness of *k*_{3} with 10% and 20%, respectively. It can be found that the acceleration time history curve has little change after damage. However, the vibration amplitude of the first three orders of IMFs has little changes after the damage. For example, the amplitude of the second-order IMF of *m*_{1} has increased when the damage increased from 10% to 20%; moreover, the amplitude of the third-order IMF has changed slightly. When damage increased from 10% to 20%, the amplitude of the second-order IMF of *m*_{3} changed, and the second-order IMF of *m*_{4} decreased with the increase of damage. However, with these changes in IMFs, it is difficult to identify the locations of the damage.

Figure 8 shows the variation of MDC values calculated from the original acceleration signal before and after damage in Case 1. It can be seen that the whole variation range of MDC values increases slowly with the increase of damage.

Figure 9 shows the variation of MDC values before and after damage of the first-order IMF calculation corresponding to Case 2. By comparison, the MDC values of *m*_{1} and *m*_{2} did not change significantly. This is because that mass points *m*_{1} and *m*_{2} may be far away from the damage. The variation of MDC values of particle *m*_{3} increased in a ladder shape. It can be clearly seen that the subsections changes with the increase in damage. In the undamaged state, the range of MDC values of *m*_{4} is large. When the damage is accumulated to 20%, the variation of MDC values increased significantly.

Figure 10 shows the variation of MDC values calculated by the first three orders of IMFs before and after damage corresponding to Case 3. Like Case 2, the MDC values of *m*_{1} and *m*_{2} changed slightly after damage. However, after 10% damage, the MDC values of *m*_{3} present an overall upward trend. When the damage reaches 20%, the trend is significant. It can be clearly determined that the spring *k*_{3} has been damaged. The reason for the variation of stiffness is that IMFs are redistributed when the damage occurred; in addition, the Mahalanobis distance cumulant method can consider the correlation variation among the elements in clusters, which makes the small variation accumulate and make the damage easier to identify.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

#### 4. A Case Study for a Real Bridge

##### 4.1. Structural Health Monitoring System of a Real Bridge in China Mainland

A real bridge structural health monitoring system was used to provide measurement data for damage identification by the method mentioned above. This prestressed concrete continuous box-girder bridge is located in Heilongjiang, China (N47°14′40″, E131°58′11″). The total length of the bridge is 1170 m, which consists of six main spans of 150 m and two side spans of 85 m.

In order to evaluate the long-term static and dynamic performances of the bridge under the influences of traffic and ambient environment, a long-term SHM system was implemented on the bridge. The sensor location of the SHM system is shown in Figure 11. A total of 24 self-adaptive strain sensors are installed on six sections of the box girder, which are, respectively, installed on the inner surface of the top and bottom plate. Detailed information of the SHM system is shown in Figure 12, and more information can be found in the literature [10, 35].

**(a)**

**(b)**

##### 4.2. NNRARMX Model Case Study

Generally, the residual of each strain sensor signal can be considered as a temperature trend term. The EMD method was used to extract the temperature trend-term signals of the strain sensors DS1 and DS2 at D-section on May 21. According to the NNRARMX method, the simulated trend term for temperature trend signals was fit as shown in Figure 13. The autocorrelation function (ACF) values of the fitting error are shown in Figures 14 and 15.

**(a)**

**(b)**

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

The self-correlation function values of the fitting error of the strain trend items calculated according to the ACF criteria are shown in Figures 14 and 15. The parameter *n*_{a} in the NNRARMX model is the number of parameters that have been fitted according to formula (9). For DS1 strain trend term, the autocorrelation function values with the fitting error of the model for *n*_{a} from 1 to 12 were calculated. For DS2 strain trend term, the fitting error of autocorrelation function values of the model for *n*_{a} from 1 to 15 was calculated. It can be seen that the error of autocorrelation fitting values of the model NNRARMX(12) for DS1 nearly is almost within 95% confidence interval, which means that there is no information loss and the model fitted well [9, 33]. For DS2, the error of autocorrelation fitting values of NNRARMX(15) is within 95% confidence interval. Therefore, the model NNRARMX(12) and NNRARMX(15) agreed well with the strain trend terms of DS1 and DS2, respectively.

##### 4.3. Temperature Trend-Item Extraction for Real Monitoring Strain

The strain data of sensor DS1 from May 16 to May 23 of the next year were selected as analysis clusters. Due to the construction around the bridge, the cable transmission was interrupted and the data from November 15 to January 8 of the next year were lost. The original strain measurements of sensor DS1 is shown in Figure 16. The trend term of the strain was separated from the original strain data according to the EMD method. Since the monitoring period is long (nearly one year), the total strain trend term was extracted by the method of merging daily trend terms. The strain trend term and the signal after separating the trend item are shown in Figures 17 and 18.

##### 4.4. Fitting Model of Temperature Trend Term

According to the NNRARMX method, the NNRARMX(25) model was used to fit the trend term. The *R*^{2} value was 0.96, indicating that the NNRARMX(25) model fitted well (Figures 19 and 20).

**(a)**

**(b)**

##### 4.5. Damage Identification

###### 4.5.1. Damage Identification by Overall Signals

The samples without trend item and a fitting residual signal from May to February of the next year were selected as two reference clusters. The samples without trend item and a fitting residual signal from March to May were considered as two unknown clusters. The variations of MDC values were calculated by the reference clusters and the testing clusters, as shown in Figures 21 and 22. As can be seen from Figures 21 and 22, the testing MDC vector is almost all less than the threshold of 0.95 times of the maximum value. The results indicated that no damage has occurred at the location of sensor DS1.

###### 4.5.2. Damage Identification by the Combination Signals of IMFs

The first seven orders of IMFs of DS1 monthly strain monitoring data in June and May of are shown in Figures 23 and 24, respectively.

The first order, first three orders, first five orders, and first seven orders of the IMFs were selected from the whole IMFs of DS1 strain of June and May of the next year as four reference and testing clusters, respectively. The damage feature vector of each reference cluster and each testing cluster were calculated, respectively. The selected cluster from IMFs was used to determine whether the structural damage is identified more easily by some high-frequency parts or some low-frequency parts. As can be seen from Figure 25, the MDC vector of the testing cluster is less than the threshold value of the reference cluster calculation.

**(a)**

**(b)**

**(c)**

**(d)**

The strain trend terms of DS1 monitoring data of June and May of the next year were fit by the NNRARMX method, respectively. The fitting residual is shown in Figure 26. The MDC vector of the reference clusters and testing clusters calculated according to fitting residual is shown in Figure 27. The results showed that the MDC vector of the cluster to be tested is less than the threshold value, and the location of sensor DS1 is not damaged.

**(a)**

**(b)**

#### 5. Conclusion

In this study, a structural damage identification method based on Mahalanobis distance cumulant with IMFs and the fitting residual was proposed. The damage feature vector MDC calculation method and the damage identification procedure were given. Through the numerical simulation of the mass-spring system and the analysis of real bridge monitoring data, the sensitivity of the damage feature vector and the effectiveness of the method were verified. Additionally, the SHM system usually needs to monitor a variety of data, such as displacement, strain, and acceleration. This method can provide a potential way for different types of monitoring data fusion for damage identification.

Before the damage identification, a series of solving processes are needed to make the data more valid and obtain more damage information. In this study, EMD and NNRARMX methods were applied for data preprocessing. The EMD method is used to decompose the original monitoring data into IMF and trend term, and the NNRARMX method is used to obtain the trend fitting residual. In this way, original monitoring data can work as a global judgment to roughly judge whether the damage has occurred; however, these data cannot analyze the exact location of the damage. IMFs of monitoring data contain the damage-induced structural response variation which is caused by the redistribution of the frequency energy, and the fitting residual of the trend term can help to find the long-term damage-induced structural changes. All of these can provide more damage information for microdamage identification and low signal-to-noise data for damage identification.

#### Data Availability

The data used to support this study are available from the first author via e-mail ([email protected]).

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

This research was funded by the National Natural Science Foundation of China, under grant number 51908497.