#### Abstract

A new parameter identification method under non-white noise excitation is proposed, which combines a transformer encoder with long short-term memory networks (LSTMs). The raw data are first processed by the random decrement technique (RDT), which is equivalent to removing noise from the raw data. The gates in the LSTM allow the network to store information selectively, which alleviates vanishing and exploding gradients to a certain extent. The encoder learns the essential features of the data, which reduces the burden on the LSTM, so an LSTM structure that is as simple as possible can be trained on these features to achieve the best training effect. Finally, the proposed method is verified by simulation and experiment, and the results show that it offers high identification accuracy, strong anti-noise ability, and a fast convergence rate. In particular, the results indicate that combining deep learning with a traditional method yields adequate accuracy and performance for parameter identification.

#### 1. Introduction

Operational modal analysis (OMA) requires only the measured vibration response of the structure; the input excitation does not need to be measured, which reduces the measurement cost. In addition, the identified modal parameters can be applied directly to online health monitoring and damage diagnosis of the structure. Moreover, for complex and large structures such as aerospace vehicles, offshore platforms, and bridges, it is difficult to measure the excitation under actual working conditions, so identifying the modal parameters directly from the time-domain response signals of the structure is of great engineering significance [1–4]. Conventional modal analysis methods usually assume that the excitation is white noise, whereas the ambient excitation of a structure in operation is mostly non-white noise. Therefore, research on structural modal parameter identification under non-white noise excitation benefits the further development of structural dynamic analysis and its application to engineering.

The RDT is a time-domain method for identifying modal parameters proposed by Cole [5]. Subsequently, Ibrahim extended the RDT to multichannel signals and formed the Ibrahim time-domain method, which was successfully applied to modal parameter identification of a spacecraft model structure [6]. The RDT was originally applied to linear single-degree-of-freedom systems with constant damping ratios and was later used to extract aerodynamic damping from random crosswind responses [7]. Moreover, Kordestani et al. proposed a two-stage time-domain output-only damage detection method with a new energy-based damage index [8]. The RDT is thus used for monitoring and assessing structural performance, and it can help predict damage and handle sudden failures during operation of the structure. In brief, the RDT is regarded as a nondestructive testing method that is widely used in aerospace, civil, and mechanical engineering [9]. In addition, other time-domain methods, such as the natural excitation technique (NExT), the eigensystem realization algorithm (ERA), and stochastic subspace identification (SSI), have also been applied in engineering [10–15]. In general, time-domain methods identify the modal parameters of the system directly from the measured response signal without a Fourier transform, which reduces the data transformation error, but their anti-noise ability is poor.

In contrast, deep learning methods are more robust to noise and can be used in damage assessment, health monitoring, modal identification, and so on [16, 17]. Hopfield introduced a single-layer feedback neural network, the Hopfield network, to solve combinatorial optimization problems; it is the earliest prototype of the RNN [18]. Nevertheless, conventional RNNs usually face a dilemma between capturing long-range dependence and avoiding gradient vanishing. As a remedy, Hochreiter and Schmidhuber proposed the LSTM [19], which greatly alleviated the training problems of early RNNs by using gating units and a memory mechanism. Subsequently, Gers et al. [20] introduced the forget gate on the basis of [19], so that the LSTM can reset its own state. Greff et al. [21] reviewed the development of the LSTM, compared the abilities of eight LSTM variants in speech recognition, handwriting recognition, and polyphonic music modeling, and showed that the forget gate and the output activation function are the key components of the LSTM. Because neural networks are among the most successful identification techniques for modeling dynamic systems and are robust to noise, scholars began to study neural-network-based parameter identification methods, aiming at better application in practical engineering [22, 23]. Many attempts have been made in this field: Xu and Wang proposed an RNN-based approach for modal parameter identification of structure-unknown systems [24], the work in [25] presented a structural identification method based on an RNN and the autoregressive moving average (ARMA) model, and Zhang et al. studied modal parameter identification based on a neural network with ARMA [26].
RNNs have unique advantages in processing time series data, and the time-domain method for modal parameter identification based on RNNs has great development potential.

Generally speaking, the restriction of conventional OMA methods to a particular input type greatly reduces their adaptability in practical engineering applications. Combining the advantages of traditional methods and neural networks to establish a new method is therefore worth studying. For this purpose, an adaptive operational modal analysis method using an encoder LSTM with RDT is proposed in this paper. First, the data are processed by RDT, so that the highest identification accuracy is obtained while the model is kept as simple as possible. Second, with the addition of the encoder, the LSTM can be regarded as the decoder of an autoencoder. Then, a network structure that is as simple as possible is established to achieve the best performance. Finally, the results indicate that the encoder LSTM achieves adequate accuracy and performance for parameter identification. The rest of this paper is organized as follows. The RDT and the architecture of the LSTM are described in Section 2. The proposed method and its simulation are described in Section 3. Experimental verification is described in Section 4. Finally, conclusions are given in Section 5.

#### 2. Background

##### 2.1. RDT

RDT extracts the free-decay vibration response from the response to ambient excitation by means of averaging and mathematical statistics [5–7]. In a linear multi-degree-of-freedom system, the forced vibration response of a measuring point under arbitrary excitation can be expressed as

$$y(t) = y(0)\,D(t) + \dot{y}(0)\,V(t) + \int_{0}^{t} h(t-\tau)\,f(\tau)\,\mathrm{d}\tau,$$

where $D(t)$ is the free vibration response of the system with an initial displacement of 1 and an initial velocity of 0; $V(t)$ is the free vibration response of the system with an initial displacement of 0 and an initial velocity of 1; $y(0)$ and $\dot{y}(0)$ are the initial displacement and initial velocity of the system vibration, respectively; $h(t)$ is the unit impulse response function of the system; and $f(t)$ is the external excitation.

A suitable constant $A$ is selected to intercept the in situ random vibration response $y(t)$ of the structure, which yields a series of intersection times $t_i$ ($i = 1, 2, \ldots, N$) with $y(t_i) = A$. The response from time $t_i$ onward can be expressed as

$$y(t) = A\,D(t-t_i) + \dot{y}(t_i)\,V(t-t_i) + \int_{t_i}^{t} h(t-\tau)\,f(\tau)\,\mathrm{d}\tau, \qquad t \ge t_i.$$

Since $f(t)$ is stationary, the choice of the time origin does not affect its randomness. The starting point of each segment is moved to the origin of coordinates, and the corresponding subsample function can be expressed as

$$x_i(t) = A\,D(t) + \dot{y}(t_i)\,V(t) + \int_{0}^{t} h(t-\tau)\,f(\tau+t_i)\,\mathrm{d}\tau.$$

Taking the statistical average of $x_i(t)$ gives

$$x(t) \approx \frac{1}{N}\sum_{i=1}^{N} x_i(t) = A\,D(t) + E[\dot{y}(t_i)]\,V(t) + \int_{0}^{t} h(t-\tau)\,E[f(\tau+t_i)]\,\mathrm{d}\tau.$$

The excitation $f(t)$ is a random vibration with a mean value of 0, and the system responses $y(t)$ and $\dot{y}(t)$ are also stationary random vibrations with a mean value of 0, so the second and third terms vanish and

$$x(t) \approx A\,D(t).$$

After RDT processing, the free vibration response with initial displacement $A$ and initial velocity 0 is obtained [5–9]. RDT is simple and has a clear physical meaning, so it is used to preprocess the dataset.
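The averaging step can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions, not the implementation used in this work: the function name `random_decrement`, the trigger rule (any crossing of the threshold, upward or downward), and the synthetic resonator signal are all hypothetical choices made for illustration.

```python
import math
import random


def random_decrement(signal, threshold, seg_len):
    """Average all segments of length seg_len that start at a threshold crossing."""
    total = [0.0] * seg_len
    count = 0
    for k in range(1, len(signal) - seg_len):
        lo, hi = sorted((signal[k - 1], signal[k]))
        # Trigger when the signal crosses the threshold between samples k-1 and k.
        if lo < threshold <= hi:
            for j in range(seg_len):
                total[j] += signal[k + j]
            count += 1
    if count == 0:
        raise ValueError("no threshold crossings found")
    return [s / count for s in total], count


# Synthetic response (assumption): a lightly damped discrete-time resonator
# driven by zero-mean Gaussian noise, standing in for a measured response.
rng = random.Random(0)
dt, freq, zeta = 0.005, 5.0, 0.05
omega = 2.0 * math.pi * freq
r = math.exp(-zeta * omega * dt)
a1, a2 = 2.0 * r * math.cos(omega * dt), -r * r
y = [0.0, 0.0]
for _ in range(40000):
    y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0.0, 1.0))

# Use the RMS value of the response as the trigger level.
threshold = math.sqrt(sum(v * v for v in y) / len(y))
signature, n_avg = random_decrement(y, threshold, seg_len=400)
```

The returned `signature` starts close to the trigger level and decays like a free response, because the zero-mean random parts of the segments cancel in the average.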

##### 2.2. LSTM

An RNN is very effective for data with sequence characteristics and can mine temporal information in the data [16]. However, because of vanishing and exploding gradients, training an RNN is very difficult, and its application is limited. Compared with the RNN, the LSTM has gating units and a memory mechanism and can selectively store information, so it alleviates the problems of vanishing and exploding gradients.

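In the LSTM, for each element in the input sequence, each layer computes the following function:

$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}),\\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}),\\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}),\\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}),\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t,\\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
$$

where $h_t$ is the hidden state at time $t$, $c_t$ is the cell state at time $t$, $x_t$ is the input at time $t$, $h_{t-1}$ is the hidden state of the layer at time $t-1$ or the initial hidden state at time 0, and $i_t$, $f_t$, $g_t$, and $o_t$ are the input, forget, cell, and output gates, respectively. $W$ and $b$ denote the weight matrices and bias vectors, $\sigma$ is the sigmoid function, and $\odot$ is the Hadamard product.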
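For readers who prefer code, one time step of a standard LSTM cell can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names (`affine`, `lstm_step`), the dictionary layout of the parameters, and the merging of the input and recurrent biases into a single vector per gate are assumptions made here for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to (W, U, b)."""
    def pre(gate):
        W, U, b = params[gate]
        return affine(W, x, U, h_prev, b)

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    g = [math.tanh(z) for z in pre("g")]  # cell candidate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    # New cell state: keep part of the old state and add the gated candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Toy check with all-zero parameters: every sigmoid gate equals 0.5 and the
# candidate equals 0, so the cell state is simply halved.
zero_gate = ([[0.0]], [[0.0]], [0.0])
params = {name: zero_gate for name in ("i", "f", "g", "o")}
h_new, c_new = lstm_step(x=[0.3], h_prev=[0.0], c_prev=[1.0], params=params)
```

With zero parameters the forget gate passes half of the previous cell state, which makes the selective-storage role of the gates easy to see in isolation.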

Adam (adaptive moment estimation) [27] is used as the optimizer; it is simple to implement, computationally efficient, requires little memory, and is suitable for problems with very noisy gradients. The loss function is the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$

where $n$ is the total number of samples, $y_i$ is the actual output value, and $\hat{y}_i$ is the predicted output value.
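To make the interplay of the MSE loss and the Adam update concrete, the following sketch fits a single slope parameter to toy data. It is a hedged illustration rather than the training code of this work: the function names, the toy data, and the learning rate of 0.1 (chosen so that the toy problem converges in a few hundred steps) are assumptions.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def adam_fit_slope(xs, ys, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Fit y = w * x by minimising the MSE with Adam; returns (w, loss history)."""
    w, m, v = 0.0, 0.0, 0.0
    history = []
    for t in range(1, steps + 1):
        preds = [w * x for x in xs]
        history.append(mse(ys, preds))
        # Gradient of the MSE with respect to w.
        grad = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad * grad
        # Bias-corrected estimates, then the parameter update.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, history


xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]  # the true slope is 2
w_fit, losses = adam_fit_slope(xs, ys)
```

The per-parameter scaling by the running second moment is what keeps the step size stable when gradients are noisy.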

#### 3. The Proposed Method

A rocket operates in stages during launch, so its length changes. Therefore, it is necessary to study the dynamic characteristics of beams with varying length. A cantilever beam with different lengths is taken as an example to verify the proposed method. The flowchart of the proposed method is shown in Figure 1.

##### 3.1. Dataset Processing

Here, cantilever beams with 11 different lengths are used for numerical simulation. Each beam is divided into 10 elements (as shown in Figure 2 and Table 1). The modal damping ratios of the cantilever beam are set to 0.01. The lengths of these beams are 0.75 m, 0.755 m, 0.76 m, 0.765 m, 0.77 m, 0.775 m, 0.78 m, 0.785 m, 0.79 m, 0.795 m, and 0.80 m. The beam is excited by uncorrelated white noise input, and the outputs are acceleration responses.

Constructing the dataset is the first step of network training, and data preprocessing before the model is particularly important. The acceleration response signal preprocessed by RDT and the analytical solution are regarded as the input and output data of the network, respectively. More specifically, the dataset is composed of 11000 samples, and each sample is a two-dimensional matrix. The ratio of training to testing data is 8 : 2. The network input $X$ is obtained from the acceleration response $a$ as

$$X = \mathrm{RDT}(a),$$

where $\mathrm{RDT}(\cdot)$ denotes the RDT, as described in Section 2.1.
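The 8 : 2 split of the 11000 samples can be sketched as follows. The random shuffling, the fixed seed, and the function name `split_indices` are assumptions for illustration; how the samples were actually assigned to the two sets is not specified here.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(11000)
```

With these numbers the training set holds 8800 samples and the testing set holds 2200.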

##### 3.2. The Encoder LSTM Model

The transformer encoder layer is made up of a self-attention sublayer and a feedforward network. The encoder extracts the essential features of the raw data, so only a small neural network is needed to learn them, which both reduces the burden on the network and achieves good results. After being processed by the transformer encoder layer, the dataset $X$ is denoted by $Z$.
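The self-attention part of an encoder layer can be sketched in a few lines of plain Python. The sketch below is deliberately simplified and rests on assumptions: it uses a single head and identity projections (queries, keys, and values all equal the input), whereas a real encoder layer uses learned projections, several heads, residual connections, normalization, and a feedforward network.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(X[0])
    # Similarity of every time step with every other time step.
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d) for k_row in X]
        for q_row in X
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of all input rows.
    out = [
        [sum(w * x_row[c] for w, x_row in zip(w_row, X)) for c in range(d)]
        for w_row in weights
    ]
    return out, weights


X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features
out, weights = self_attention(X_toy)
```

Each output row mixes information from the whole sequence, which is the sense in which the encoder summarizes the data before a smaller recurrent network processes it.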

Then, the encoder output $Z$ is passed to the LSTM layer for calculation:

$$H = \mathrm{LSTM}(Z),$$

where $\mathrm{LSTM}(\cdot)$ denotes the LSTM network calculation, which is detailed in Section 2.2.

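Finally, in the fully connected layer,

$$y = \mathrm{PReLU}(W x + b),$$

where $x$ and $y$ are the input and output data of the layer, respectively, and $W$ and $b$ are its weights and bias. The PReLU function is selected in the fully connected layer, which is characterized by fast convergence and simple gradient calculation:

$$\mathrm{PReLU}(x) = \max(0, x) + a\,\min(0, x),$$

where $a$ is a learnable slope for negative inputs.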

In brief, the proposed encoder LSTM model consists of the transformer encoder layer, the LSTM layers, and the fully connected layer (Figure 1). For convenience, E, L, and F denote the transformer encoder layer, the LSTM layer, and the fully connected layer, respectively. Given the data, the model first uses the transformer encoder layer E1 to learn features, where the number of expected features in the input is 512 and the number of heads in the multihead attention is 8. Then, the features from E1 are fed into layer L2 for further learning by the LSTMs, in which the number of expected features in the input is 512, the number of features in the hidden state is 256, and the number of recurrent layers is 3. Finally, the result is obtained from the fully connected layer F3. Training iterates on the loss function until convergence.
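The layer sizes quoted above allow a quick size estimate. The following sketch computes the per-head feature width of the attention and the number of trainable parameters of the three-layer LSTM stack. The counting convention (four gate blocks per layer, each with input weights, recurrent weights, and two bias vectors) is an assumption borrowed from common deep learning libraries, and the function name is hypothetical.

```python
def lstm_parameter_count(input_size, hidden_size, num_layers):
    """Count weights and biases of a stacked LSTM (two bias vectors per gate block)."""
    total = 0
    for layer in range(num_layers):
        # Only the first layer sees the encoder features; deeper layers
        # receive the hidden state of the layer below.
        in_features = input_size if layer == 0 else hidden_size
        # Four gate blocks (input, forget, cell, output).
        total += 4 * hidden_size * (in_features + hidden_size + 2)
    return total


d_model, n_heads = 512, 8
head_dim = d_model // n_heads  # features handled by each attention head
n_lstm_params = lstm_parameter_count(input_size=512, hidden_size=256, num_layers=3)
```

Under this convention each attention head works on 64 features, and the LSTM stack has 1841152 trainable parameters.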

##### 3.3. Results

Here, the encoder LSTM is trained for 100 iteration steps with a learning rate of 0.001. The finite element method can be used directly to solve for the modal parameters of a beam [28, 29]. The natural frequency equation can be determined from the vibration differential equation [30] and the boundary conditions, from which the natural frequencies of the beam are obtained. The analytical solution is taken as the output of the network and compared with the finite element solution (as shown in Figures 3 and 4).
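As an illustration of how analytical natural frequencies of a cantilever can be obtained, the sketch below uses the classical Euler-Bernoulli result for a uniform beam. Both the beam theory and the material and cross-section values are assumptions made for this example (the actual properties are those listed in Table 1), so the numbers it produces are not those of this work.

```python
import math

# Roots of 1 + cos(x) * cosh(x) = 0 for the first three bending modes of a cantilever.
LAMBDAS = (1.8751, 4.6941, 7.8548)


def cantilever_frequencies(length, E, rho, width, thickness):
    """Natural frequencies in Hz of a uniform Euler-Bernoulli cantilever beam."""
    area = width * thickness
    inertia = width * thickness ** 3 / 12.0
    stiffness_term = math.sqrt(E * inertia / (rho * area))
    return [lam ** 2 / (2.0 * math.pi * length ** 2) * stiffness_term for lam in LAMBDAS]


# The 11 beam lengths from 0.75 m to 0.80 m in steps of 0.005 m.
lengths = [0.75 + 0.005 * k for k in range(11)]
# Hypothetical aluminium strip: E = 70 GPa, rho = 2700 kg/m^3, 20 mm x 2 mm section.
table = [
    cantilever_frequencies(L, E=70e9, rho=2700.0, width=0.02, thickness=0.002)
    for L in lengths
]
```

Because the frequencies scale with the inverse square of the length, even the 5 mm length increments shift every mode, which is what gives the network distinct targets for the 11 beams.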

Additionally, the beam of one selected length is taken as an example for illustration. The signal-to-noise ratio (SNR) [31] is a common index for evaluating the strength of noise in a signal; the more noise the signal contains, the smaller the SNR:

$$\mathrm{SNR} = 10\log_{10}\frac{P_s}{P_n}\ \text{(dB)},$$

where $P_s$ is the power of the effective signal and $P_n$ is the power of the noise in the signal. To test the anti-noise ability of the proposed method, noise with different SNRs is added to the response data of the beam. Then, the data are preprocessed to establish a dataset. Finally, the dataset is fed into the model for training and testing.
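Adding noise at a prescribed SNR can be sketched as follows. The Gaussian noise model, the function names, and the sinusoidal test signal are assumptions for illustration; the type of noise added to the beam response is not specified here.

```python
import math
import random


def power(x):
    """Mean power of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise scaled so the realised SNR equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def measure_snr_db(signal, noisy):
    """SNR in dB computed from the clean signal and its noisy version."""
    noise = [b - a for a, b in zip(signal, noisy)]
    return 10.0 * math.log10(power(signal) / power(noise))


clean = [math.sin(2.0 * math.pi * 5.0 * k / 1000.0) for k in range(1000)]
noisy = add_noise(clean, snr_db=20.0)
```

Scaling the noise by its realised power, rather than its nominal variance, makes the SNR of each generated record match the requested value.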

The MSE drops sharply within the first 10 steps and is close to its optimal value by the 20th step, reaching the order of 10^{-5} at the 50th and 100th steps, which indicates that the proposed method converges quickly (as shown in Figure 5). In addition, the results for the beam under different SNRs are the same, which indicates that the proposed method has strong anti-noise performance.

#### 4. Experimental Verification

##### 4.1. Dataset Processing

A slender aluminum beam (as shown in Figure 6) is selected as the experimental specimen. The shaker table provides a base excitation along the Y direction. The sixth acceleration sensor measures the excitation signal, which includes white noise and non-white noise excitations, and the other sensors measure the response signals. The settings are shown in Table 2. The acquisition equipment is the Agilent VXI plug-and-play system. Sensors act as the sensory organs of mechanical and electronic devices: without sensors that capture and convert the original information accurately and reliably, measurement and control cannot be realized [32, 33]. The sensor type is PCB 333B32 SN 25222.

The conventional OMA method usually assumes that the excitation of the structure has a uniform spectrum. However, in the operational state of structures, such as the flight of aerospace vehicles, traffic passing over bridges, wind loading, or ground pulsation acting on high-rise structures, the ambient excitation is mostly nonuniform. All these conditions restrict the application and accuracy of the conventional OMA method. Therefore, modal analysis must be conducted under a nonuniform excitation spectrum. The four typical non-white noises are blue noise, pink noise, purple noise, and brown noise. To study different non-white noise excitations more comprehensively, the excitation spectra of the four typical colored noises and white noise are mixed to excite the structure. Here, two typical trapezoidal-spectrum ambient excitations are used to excite the beam, and the vibration environment of the trapezoidal-spectrum base excitation is controlled by the shaker table controller. Excitation spectrum 1 can be regarded as a combination of blue noise, narrow-band white noise, and pink noise. Excitation spectrum 2 can be regarded as a combination of purple noise, narrow-band white noise, and brown noise. More specifically, the inflection frequencies of the spectra are 10, 100, 600, and 1000 Hz (as shown in Figure 7).
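The shape of the two trapezoidal spectra can be sketched with the standard power laws of colored noise (blue: PSD proportional to f, pink: 1/f, purple: f squared, brown: 1/f squared). The assignment of the segments to the inflection frequencies (rising from 10 to 100 Hz, flat from 100 to 600 Hz, falling from 600 to 1000 Hz), the unit plateau level, and the function names are assumptions for illustration; the actual levels are those of Figure 7.

```python
def trapezoid_psd(f, rise_exp, fall_exp, corners=(10.0, 100.0, 600.0, 1000.0)):
    """Relative PSD: rises as f**rise_exp, has a plateau of 1, falls as f**(-fall_exp)."""
    f_lo, f_a, f_b, f_hi = corners
    if f < f_lo or f > f_hi:
        return 0.0  # no excitation outside the controlled band
    if f < f_a:
        return (f / f_a) ** rise_exp
    if f <= f_b:
        return 1.0
    return (f_b / f) ** fall_exp


def spectrum_1(f):
    """Blue-noise rise, white plateau, pink-noise roll-off."""
    return trapezoid_psd(f, rise_exp=1.0, fall_exp=1.0)


def spectrum_2(f):
    """Purple-noise rise, white plateau, brown-noise roll-off."""
    return trapezoid_psd(f, rise_exp=2.0, fall_exp=2.0)
```

For example, `spectrum_1(10.0)` is one tenth of the plateau level, whereas `spectrum_2(10.0)` is one hundredth, so spectrum 2 has the steeper flanks.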

The acceleration response signals generated under the white noise excitation spectrum, excitation spectrum 1, and excitation spectrum 2 are denoted as data 1, data 2, and data 3, respectively. Under laboratory conditions, the excitation and response signals of the structure are measured simultaneously to obtain the transfer function of the system for parameter identification; this is called the experimental modal analysis method, and its results are referred to as the expected output. The acceleration response signals preprocessed by RDT are still used as the input of the network, but the output data of the network are now the results of experimental modal analysis.

##### 4.2. Model Training and Results

Under the laboratory conditions described in Section 4.1, NExT-ERA, NExT-ARMA, data-driven SSI (SSI-DATA) [34], covariance-driven SSI (SSI-COV) [35], frequency and spatial domain decomposition (FSDD) [36], and other methods are used to identify the modal parameters of data 1, and the results are compared with those of the proposed method. The natural frequencies identified by the proposed method are consistent with the expected output, so the identification accuracy of the proposed method is higher than that of the other methods (as shown in Figure 8). The damping is strongly affected by external noise, and the results of the proposed method are similar to those of the NExT-ARMA and FSDD methods (as shown in Figure 9). The mode shape is consistent with the actual situation (as shown in Figure 10). A mode is a natural vibration characteristic of the structure. Each mode has a specific natural frequency, damping ratio, and mode shape, so the modal parameters of the structure do not change with the excitation. Nevertheless, taking EFDD and FSDD as examples, modal leakage and false modes appear in the parameter identification of data 2 and data 3 (as shown in Figure 11), and the results are inconsistent with those of data 1, which suggests that the conventional OMA method is not suitable for non-white noise excitation.

To test the performance of the proposed method, different network structures are trained and tested on the same data, which include data 1, data 2, and data 3. More specifically, Model 1 means that the data are not processed by RDT but are trained directly with an RNN. Model 2 means that the data are not processed by RDT but are trained directly with the encoder LSTM.

The loss of Model 1 shows a pronounced spike at step 17, and that of Model 2 shows one at step 25, which indicates that these networks do not train well (as shown in Figure 12). In contrast, the loss function of the proposed method is smooth, and the results are consistent with Figure 8, which shows that the network trains well and that the modal parameters of the data are identified with high accuracy. In the loss function of the proposed method, the MSE drops sharply within the first 10 steps and is close to its optimal value by the 20th step, reaching the order of 10^{-5} at the 50th and 100th steps, which shows that the proposed method has a fast convergence rate. Since the same network handles data 1, data 2, and data 3, the proposed method also shows strong generalization ability.

#### 5. Conclusion

An adaptive operational modal analysis method using a transformer encoder and LSTM is proposed and applied to extract the modal parameters from the acceleration response of a cantilever beam model. Simulation and experimental results show that the proposed method has the advantages of strong anti-noise ability, fast convergence, and high accuracy, which provides a new method for the application of modal analysis in engineering.

(a) In the simulation, the proposed method is used to identify response data with noise of different SNRs, and the results are the same, which indicates that the method has strong anti-noise ability.

(b) In the experiment, different models are trained on the beam data, and the identification results show that the proposed method performs best. Furthermore, the loss function of the proposed method decays rapidly in the first 10 steps and approaches the optimal value at 20 steps, and the MSE at the 50th and 100th steps is of the order of 10^{-5}, which shows that the proposed method has a fast convergence rate.

(c) In the experiment, compared with other conventional methods, the proposed method has higher identification accuracy for data 1. In addition, the results identified by the proposed method for the remaining data sets are consistent with those for data 1, and the convergence is fast, which shows that the method has strong generalization ability.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

#### Acknowledgments

This work was supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).