Abstract

The general learning process of deep learning is extremely time-consuming. In contrast to the traditional learning process, this paper proposes a weight-generating approach that quickly generates the weight vectors of a deep neural network model used for parameter identification of a dynamic system. Based on an analysis of three trained deep neural network models, each used to identify the parameters of a different dynamic system, statistical relationships between the weight vectors of each hidden layer and its inputs are revealed. The statistical patterns of the weight vectors are then imitated by exploiting these relationships together with the statistical patterns of the inputs, and a weight-generating approach is designed to quickly generate the weight vectors of a deep neural network model. The effectiveness of the weight-generating approach is tested on the parameter identification tasks of the three dynamic systems, and numerical results demonstrate its validity and high efficiency.

1. Introduction

The temporal evolution of dynamic systems is widely described using ordinary differential equations (ODEs). For many dynamic systems, such as aircraft and spacecraft attitude dynamics, ODEs can accurately represent the physical laws of the actual system [1, 2]. However, the system parameters in these ODEs are usually unknown and must be determined from the input and output data, i.e., the control and state data, of the dynamic system.

Many methods have been proposed in the field of parameter identification [3, 4]. Most of them estimate the system parameters by minimizing the mean square error between the measured state data of a dynamic system and the numerical solution of its ODEs obtained with the estimated parameters. The least squares method is the most widely used parameter identification method in the aerospace field [5-10]. However, since the least squares method relies on the state derivative, which is usually obtained by numerically differencing the state data, it is very sensitive to the accuracy of the differencing. To address this problem, some parameter identification methods that represent the state data with a set of basis functions have been proposed [11, 12]. These methods can use the analytic derivative of the basis function group instead of numerical differences to calculate the state derivative. However, their identification accuracy depends strongly on the selection of the type and number of basis functions. Another important parameter identification method is the Kalman filter [13-16]. The traditional Kalman filter applies only to linear systems, while the extended Kalman filter can be used for nonlinear systems. The extended Kalman filter approximates the nonlinear system by linearization, and the error introduced by linearization may cause the filter to diverge [17, 18]. In summary, most traditional identification methods need the state derivative, which easily introduces errors and reduces the identification accuracy. Novel identification approaches are therefore needed to address this problem.

With the development of deep learning (DL) in various fields [19-21], recent studies have attempted to expand DL applications to the field of parameter identification [22-25]. According to the Takens theorem, a large class of nonlinear dynamical systems can be effectively reconstructed using a sufficiently long system state and control sequence [26]. The system parameters are important features for describing the system dynamics. DL technology can learn the mapping relationship between the state and control sequence of a dynamic system and its corresponding parameters by automatically extracting multilayer abstract features implied in the system state and control sequence during training. Therefore, DL identification methods can use only the state and control sequence to identify the parameters of the dynamic system, without the procedure of addressing state derivatives. In the general process of DL-based parameter identification, a deep neural network (DNN) model is first constructed with randomly generated initialization weight vectors. The weight vectors of the DNN model are then iteratively trained using training data. The commonly used methods to optimize weight vectors in the training process include the momentum method [27] and the adaptive moment (Adam) method [28]. There are also optimization methods that satisfy Lyapunov stability conditions to ensure the convergence of the training process [29-31]. However, obtaining an accurate DNN model often takes a great deal of time.

Due to the high time consumption of training, several fast-learning approaches for neural network models have been explored. Random vector functional-link networks and extreme learning machines are widely used fast-learning approaches for neural networks with a single hidden layer, in which the weight vectors of the hidden layer are randomly generated without using any prior knowledge [32-34]. Nevertheless, DL is a set of hierarchical feature-learning methods that can automatically extract the multilayer abstract features needed for target tasks from input data through training. The features at a higher layer of a DNN model are formed by transforming the features at the previous layer, further amplifying important aspects of the input data and suppressing irrelevant variations [35]. The transformation is implemented by feeding the linear combination of the features at the previous layer and the weight vectors of the current layer into nonlinear activation functions. The weight vectors of each layer of a DNN model can thus be regarded as a "filter" applied to the input of that layer. For example, an edge detection operator in image processing can be perceived as a filter that highlights the areas of an image where the grayscale changes rapidly. For parameter identification of a dynamic system, the state sequences may exhibit statistical regularities and patterns that are important for parameter identification due to the dynamic constraints. The weight vectors of a trained DNN model can be used as filters to extract these important statistical regularities and patterns from the state sequences, i.e., the inputs to the DNN model, and transform them into more abstract features layer by layer. Thus, the weight vectors of each layer of a DNN model should have close relationships with their inputs.

Based on the above considerations, this paper proposes a fast weight-generating approach to obtain the weight vectors of a DNN model used for parameter identification. By analyzing trained DNN models for identifying the parameters of three dynamic systems, some statistical relationships between the weight vectors and inputs of each hidden layer are discovered. According to these relationships, a weight-generating approach is developed to generate the weight vectors from the inputs of each hidden layer of the DNN model. The identification performance of the DNN model generated using the weight-generating approach is tested on the three dynamic systems, and the results show the validity and high efficiency of this approach. More importantly, our work provides a way of understanding the operating mechanism of DNN models, which can inform further studies.

The rest of the paper is organized as follows. Section 2 introduces the general process of DL-based parameter identification. Detailed descriptions of the relationship analysis between the inputs and weight vectors of the trained DNN models and the design of the weight-generating approach are presented in Section 3. The effectiveness of the weight-generating approach is verified on the three dynamic systems in Section 4. Section 5 summarizes this work and discusses future work.

2. DL-Based Parameter Identification

2.1. Problem Statement

The ODEs of a dynamic system can be generally represented as

$$\dot{\mathbf{x}}(t) = f\big(\mathbf{x}(t), \mathbf{u}(t), \boldsymbol{\theta}\big), \tag{1}$$

where $\mathbf{x}(t)$ is the state vector of the dynamic system at time $t$, $\mathbf{u}(t)$ is the control vector, $f(\cdot)$ represents a nonlinear function describing the evolution of the system, and $\boldsymbol{\theta}$ denotes the unknown constant parameters in $f$, which need to be identified.

To use DL to identify the parameters of the dynamic system, the input and output of the DNN model must be determined. The Takens theorem states that for a large class of nonlinear dynamic systems with an $n$-dimensional state space, the system state can be effectively reconstructed from a sufficient number of historical outputs of the system [26]. According to the Takens theorem, for a finite-dimensional dynamic system, when the historical state-control sequence of the system is long enough, the system state can be reconstructed and the system parameters can also be effectively identified. Thus, the input of the DNN model is set to the state-control sequence $\{\mathbf{x}(t_k), \mathbf{u}(t_k)\}_{k=1}^{N}$, and the output is the parameter vector $\boldsymbol{\theta}$ to be identified.

The objective of constructing the DNN model is to enable it to learn, during training, the complex mapping between state-control sequences and system parameters over a specified parameter space of the dynamic system, and then to accurately identify any system parameters in that space.

2.2. Data Generation

For the DNN model to fit the nonlinear system accurately, a large amount of data is needed for training. In practice, a rough range of $\boldsymbol{\theta}$ can be determined empirically and used as a sampling parameter space $\Theta$ to generate data for training the DNN model. If the sampling space $\Theta$ is equal to or larger than the space of the actual parameters and the number of samples is sufficient, a trained DNN model can effectively perform the identification task [22-24]. A set of parameter vectors $\boldsymbol{\theta}_i$ is then randomly sampled from $\Theta$. According to Equation (1), a state sequence $\mathbf{X}_i$ can be solved for a given $\boldsymbol{\theta}_i$ and control sequence $\mathbf{U}_i$. $(\mathbf{X}_i, \mathbf{U}_i)$ and $\boldsymbol{\theta}_i$ constitute one training sample: $(\mathbf{X}_i, \mathbf{U}_i)$ is used as the input part, and $\boldsymbol{\theta}_i$ is used as the training target. If the control trajectory is predetermined and identical for every sample, the input part can be simplified to $\mathbf{X}_i$. A training set is obtained by processing all samples of $\boldsymbol{\theta}_i$ in this way.
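
As an illustration, the following minimal sketch implements this data-generation procedure in Python with SciPy, assuming a fixed control trajectory so that the input part reduces to the state sequence; the function name, signature, and uniform sampling of $\Theta$ are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.integrate import solve_ivp

def generate_training_set(f, theta_low, theta_high, x0, t_span, dt, n_samples, seed=0):
    """Sample parameters from the space Theta, integrate the ODEs of
    Equation (1), and collect (state sequence, parameter) training pairs."""
    rng = np.random.default_rng(seed)
    t_eval = np.arange(t_span[0], t_span[1], dt)    # fixed sampling frequency
    inputs, targets = [], []
    for _ in range(n_samples):
        theta = rng.uniform(theta_low, theta_high)  # one random draw from Theta
        sol = solve_ivp(lambda t, x: f(t, x, theta), t_span, x0, t_eval=t_eval)
        inputs.append(sol.y.ravel())                # flattened state sequence X_i
        targets.append(np.atleast_1d(theta))        # training target theta_i
    return np.array(inputs), np.array(targets)
```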

2.3. DNN Model

The structure of the DNN model used for parameter identification is shown in Figure 1. It is a fully connected network, in which every neuron in a layer is connected to all neurons in the previous layer. The DNN model consists of one input layer, several hidden layers, and one output layer. The DNN identification model can be written as

$$\hat{\boldsymbol{\theta}} = F(\mathbf{X}; \mathbf{W}), \tag{2}$$

where $\hat{\boldsymbol{\theta}}$ is the output of the DNN model, $\mathbf{X}$ is its input, and $\mathbf{W}$ represents the weights and biases within the DNN model.

Each neuron of a hidden layer is calculated as

$$y_j^{(l)} = \sigma\big(\mathbf{w}_j^{(l)\,\top}\mathbf{x}^{(l)} + b_j^{(l)}\big), \tag{3}$$

where $\mathbf{x}^{(l)}$ is the input vector of the $l$th layer, which contains all outputs of the $(l-1)$th layer, $\mathbf{w}_j^{(l)}$ and $b_j^{(l)}$ are the weight vector and bias of the $j$th neuron of the $l$th layer, and $\sigma(\cdot)$ is a nonlinear activation function that introduces nonlinearity into each neuron.

The output layer is a linear computation:

$$\hat{\boldsymbol{\theta}} = \mathbf{W}_o\,\mathbf{x}^{(o)} + \mathbf{b}_o, \tag{4}$$

where $\mathbf{x}^{(o)}$ is the input vector of the output layer, which contains the full output of the last hidden layer, and $\mathbf{W}_o$ and $\mathbf{b}_o$ are the weight matrix and bias vector of the output layer, respectively.
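
To make Equations (2)-(4) concrete, a minimal NumPy sketch of the forward pass is given below, using the ReLU activation adopted in Section 3.1.1; the parameter layout is an assumption for illustration.

```python
import numpy as np

def dnn_forward(x, hidden_params, output_params):
    """Evaluate F(X; W): ReLU hidden layers (Equation (3)) followed by
    the linear output layer (Equation (4))."""
    a = x
    for W, b in hidden_params:          # one (weights, biases) pair per hidden layer
        a = np.maximum(0.0, W @ a + b)  # sigma(Wx + b) with a ReLU nonlinearity
    W_o, b_o = output_params
    return W_o @ a + b_o                # linear output: the estimated parameters
```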

2.4. Training Process

A DNN model with a determined structure needs to be trained to learn the mapping relationship between the state-control sequences of a system and their corresponding parameters $\boldsymbol{\theta}$. A schematic diagram of the training process is shown in Figure 2.

The first step of the training process is to assign an initialization $\mathbf{W}_0$ to the DNN model. The input part $\mathbf{X}$ of the training data is then fed into the DNN model to obtain the output $\hat{\boldsymbol{\theta}}$. An objective function is defined to measure the difference between $\hat{\boldsymbol{\theta}}$ and the target $\boldsymbol{\theta}$. For parameter identification, the objective function is designed as

$$J(\mathbf{W}) = \frac{1}{N_s}\sum_{i=1}^{N_s}\big\|\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i\big\|^2, \tag{5}$$

where $N_s$ is the number of samples in the training set. An optimization method, commonly the gradient descent algorithm or one of its variants, is used to iteratively update $\mathbf{W}$ to minimize $J$. Each optimization iteration contains two steps: first, the gradient $\nabla_{\mathbf{W}} J$ is computed; then, $\mathbf{W}$ is updated by

$$\mathbf{W} \leftarrow \mathbf{W} - \eta\,\nabla_{\mathbf{W}} J, \tag{6}$$

where $\eta$ is the learning rate of training. After some epochs, a trained DNN model is obtained that can identify any parameters in the space $\Theta$.
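
A minimal PyTorch sketch of this training loop is shown below; the hyperparameter defaults are placeholders, since the paper sets them per experiment, and the minibatch handling is a standard choice rather than the authors' exact procedure.

```python
import torch
from torch import nn

def train_identifier(model, X, theta, lr=1e-3, iterations=2000, batch_size=50):
    """Iteratively minimize the MSE objective of Equation (5) with Adam;
    each optimizer step realizes the update of Equation (6)."""
    dataset = torch.utils.data.TensorDataset(X, theta)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    step = 0
    while step < iterations:
        for xb, tb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), tb)  # J: mean squared identification error
            loss.backward()                # gradient of J with respect to W
            optimizer.step()               # Adam parameter update
            step += 1
            if step >= iterations:
                break
    return model
```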

3. Weight-Generating Approach of the DNN Model for Parameter Identification

3.1. Relationships between the Inputs and Weight Vectors of Each Hidden Layer

In this section, some relationships between the inputs and weight vectors of each hidden layer are revealed by analyzing the trained DNN models for identifying the parameters of three different dynamic systems. The detailed discussion is given below.

3.1.1. Attitude Dynamics of a Rigid Spacecraft

The attitude dynamics of a rigid spacecraft are used first for analysis. The adopted dynamic model is an axisymmetric rigid spacecraft whose body coordinate frame is aligned with the principal axes. Suppose the moments of inertia satisfy $I_x = I_y \neq I_z$, i.e., the rigid spacecraft is axisymmetric about the spin axis $z$. In the absence of an external torque, the ODEs of the attitude dynamics are written as

$$\dot{\omega}_x = \frac{I_y - I_z}{I_x}\,\omega_y\omega_z,\qquad \dot{\omega}_y = \frac{I_z - I_x}{I_y}\,\omega_z\omega_x,\qquad \dot{\omega}_z = \frac{I_x - I_y}{I_z}\,\omega_x\omega_y = 0, \tag{7}$$

where $\boldsymbol{\omega} = [\omega_x, \omega_y, \omega_z]^\top$ and $\dot{\boldsymbol{\omega}}$ are the angular rate and the angular acceleration, respectively, of the body coordinate frame with respect to an inertial reference frame.
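
Under the reconstructed form of Equation (7), the right-hand side can be written as the sketch below and passed to the data-generation routine of Section 2.2; treating $(I_x, I_z)$ as the parameter vector is an assumption for illustration.

```python
import numpy as np

def attitude_rhs(t, omega, theta):
    """Torque-free Euler equations for an axisymmetric rigid body with
    Ix = Iy != Iz, matching the reconstructed Equation (7)."""
    Ix, Iz = theta                    # assumed parameterization: (Ix = Iy, Iz)
    wx, wy, wz = omega
    dwx = (Ix - Iz) / Ix * wy * wz    # (Iy - Iz)/Ix with Iy = Ix
    dwy = (Iz - Ix) / Ix * wz * wx    # (Iz - Ix)/Iy with Iy = Ix
    dwz = 0.0                         # (Ix - Iy)/Iz vanishes for Ix = Iy
    return np.array([dwx, dwy, dwz])
```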

A training set is generated by setting the initial value and sampling range. The initial angular velocity is set to , and the sampling range of the parameter to be identified is [100, 120]. The simulation time is 200 s, and the sampling frequency is 10 Hz. After solving Equation (7) based on the initial values and ranges, 2000 groups of data, generated with different parameter values, are used as the training set. For convenience of training, each parameter value is divided by 100 to form the training target, giving a new range of [1, 1.2]. Each data point consists of a three-dimensional angular velocity sequence and the corresponding parameter values. Figure 3 shows a group of angular velocity sequences selected randomly from the training set. $\omega_x$ and $\omega_y$ are periodic curves, and $\omega_z$ is constant. Because $\dot{\omega}_z = 0$, all points of $\omega_z$ are always equal to its initial value in every data group, so the input part of the training data is simplified to the $\omega_x$ and $\omega_y$ sequences. In addition, a testing set with 100 groups of data different from those of the training set is generated in the same way to test model accuracy.

The structure of the DNN identification model has three hidden layers with 2048, 512, and 128 hidden neurons. A rectified linear unit (ReLU) is adopted as the activation function, whose expression can be written as $\sigma(x) = \max(0, x)$ [36]. The weights are initialized using the Microsoft Research Asia (MSRA) method, and the biases are set to 0 [37]. The optimization method is the adaptive moment (Adam) estimation method, which adaptively adjusts the learning rate by estimating the first-order and second-order moments of the gradient of the objective function to accelerate training [38]. For the selection of hyperparameters, the learning rate is , and the size of a minibatch is 50. After 2000 optimization iterations, a trained DNN identification model, called model A, is obtained. The mean square error (MSE) of model A for identifying the test data is .

Figure 4 shows a visualization of the weight vectors of three neurons in model A, randomly selected from the three different hidden layers. As Figure 4 shows, the weight vector of each neuron appears to be randomly distributed; hence, it is difficult to determine the relationship between the distribution of the weight vectors and that of the inputs to the DNN model.

To make the weight vectors more discriminative, an L2-regularization term is added to the objective function:

$$J_{L2}(\mathbf{W}) = J(\mathbf{W}) + \lambda\sum_{k=1}^{N_w}\|\mathbf{w}_k\|^2, \tag{8}$$

where $\lambda$ is the regularization coefficient and $N_w$ is the number of model weight vectors. The L2-regularization term shrinks the model weight vectors toward 0, which makes the weight matrix of each layer sparse and reduces the complexity of the DNN model. The regularization coefficient is set to 0.001 during training. It is difficult to find an optimal fixed learning rate: if it is too large, training may settle in a poor local minimum with poor convergence accuracy, while a small learning rate causes slow convergence and a long training time [39]. Thus, a decaying learning rate is used to train the DNN model, i.e., the learning rate is larger in the initial stage and gradually decreases as training progresses, which improves the convergence speed. The learning rate is set to and is multiplied by 0.2 after every 15,000 iterations. A trained model called model A-L2 is obtained after 60,000 optimization iterations. The weight matrix of each layer of model A-L2 is effectively compressed: only 30 of the 2048 weight vectors of the first hidden layer, 53 of the 512 weight vectors of the second layer, and 16 of the 128 weight vectors of the third layer are nonzero. The MSE of model A-L2 for identifying the test data is and is superior to that of model A, which indicates that the weight vectors of model A-L2 retain the important information for identification that is implicit in the weight vectors of model A.
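
In code, the regularized objective of Equation (8) and the resulting sparsity check might look like the PyTorch sketch below; whether biases are included in the penalty is an assumption made here for brevity.

```python
import torch

def l2_objective(model, mse_loss, lam=1e-3):
    """Objective of Equation (8): the identification MSE of Equation (5)
    plus an L2 penalty that shrinks the weight vectors toward zero."""
    penalty = sum(p.pow(2).sum() for p in model.parameters())
    return mse_loss + lam * penalty

def count_nonzero_rows(W, tol=1e-6):
    """Count the rows (weight vectors) of a layer's weight matrix that
    survived the shrinkage, as reported for models A-L2, B-L2, and C-L2."""
    return int((W.abs().max(dim=1).values > tol).sum().item())
```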

To investigate the relationship between the inputs and weight vectors of the DNN model, the statistical characteristics of model A-L2 are computed for analysis. The mean curves and standard deviation curves of the inputs and weight vectors of the first hidden layer are shown in Figures 5 and 6, respectively. The two statistical curves of the inputs are obtained by calculating the mean and standard deviation of each angular velocity sequence point over the input parts of all training data. According to Equation (3), the input to each layer is combined with a weight vector through an inner product; consequently, the sequence order of the statistical curves of the weight vectors is consistent with that of the corresponding statistical curves of the inputs. As Figures 5 and 6 show, the parts in the red boxes of the input statistical curves change more rapidly than the other parts, while the statistical curves of the weight vectors highlight the corresponding parts and suppress most others. This illustrates that the statistical patterns of the weight vectors of the first hidden layer are related to the change rates of the statistical patterns of the inputs.

The second hidden layer receives the outputs of the first hidden layer and further extracts features used for parameter identification. Figures 7 and 8 show the mean curves and standard deviation curves of the inputs and weight vectors, respectively, of the second hidden layer. As shown in Figures 7 and 8, all the statistical curves are composed of discretized points. By connecting the discretized points using red line segments, the variation trends of different statistical curves are compared by judging the monotonicity of these line segments. The variation trends for the two statistical curves of the inputs are almost the same as those for the two curves of the weight vectors. It should be noted that the mean curve and standard deviation curve of the weight vectors have nearly the same overall magnitude. This phenomenon also appears in the third hidden layer. The mean curves and standard deviation curves of the inputs and weight vectors of the third hidden layer are presented in Figures 9 and 10, respectively. As Figures 9 and 10 show, 88.5% and 76.9% of the variation trends of the mean curve and standard deviation curve, respectively, of the inputs are the same as those of the two curves of the weight vectors. The mean curve and standard deviation curve of the weight vectors of the third hidden layer also have nearly the same overall magnitude. This shows that the inputs and weight vectors of each of the last two hidden layers have similar statistical patterns, and the two statistical curves of the weight vectors of each layer have nearly the same overall magnitude.

3.1.2. Two-Dimensional Damped Harmonic Oscillator

A two-dimensional damped harmonic oscillator is selected to further verify the generality of the relationships above. The ODEs of the system dynamics are written as follows [40]:

$$\dot{x} = -\delta x + \omega y,\qquad \dot{y} = -\omega x - \delta y, \tag{9}$$

where $\delta$ is the damping coefficient and $\omega$ is the angular frequency.
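
Assuming the linear form reconstructed in Equation (9) (the exact form used with reference [40] may differ), the corresponding right-hand side is:

```python
import numpy as np

def oscillator_rhs(t, state, theta):
    """Two-dimensional damped harmonic oscillator, reconstructed Equation (9)."""
    delta, omega = theta   # damping coefficient and angular frequency (assumed symbols)
    x, y = state
    return np.array([-delta * x + omega * y,
                     -omega * x - delta * y])
```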

To generate a training set, the initial state is set to , and the sampling ranges of $\delta$ and $\omega$ are [0.1, 0.5] and [1, 5], respectively. The simulation time is 15 s, and the sampling frequency is 10 Hz. According to Equation (9), 2000 groups of data, generated using different $(\delta, \omega)$ values, are used as the training set. Each pair $(\delta, \omega)$ is normalized into the new range [0.2, 0.8] and used as the training target. Each data point includes a two-dimensional state sequence and the parameters $(\delta, \omega)$. Figure 11 shows a state sequence selected randomly from the training set. Likewise, an extra testing set with 100 groups of data is generated to test model accuracy.

The structure of the DNN identification model has two hidden layers with 64 and 128 hidden neurons. The selections of the activation function, initialization method, and optimization method are consistent with those used for training model A. The learning rate is , and the size of a minibatch is 50. After 5000 optimization iterations, a trained DNN identification model called model B is obtained. The MSE of model B for identifying the test data is .

Another DNN model trained on the objective function with an L2-regularization term is also obtained to analyze the relationships between the inputs and weight vectors of each layer. The regularization coefficient is set to 0.001 during training, and the learning rate is set to and is multiplied by 0.2 after every 20,000 iterations. After 80,000 optimization iterations, a trained model called model B-L2 is obtained. In this case, 21 of the 64 weight vectors of the first hidden layer and 21 of the 128 weight vectors of the second layer are nonzero vectors. The MSE of model B-L2 for identifying the test data is and is quite close to that of model B.

The statistical characteristics of model B-L2 are then computed for analysis. Figures 12 and 13 show the mean curves and standard deviation curves of the inputs and weight vectors, respectively, of the first hidden layer. The relationship between the statistical patterns of the weight vectors and the change rates of the statistical patterns of the inputs is similar to that found in the previous analysis: the statistical curves of the weight vectors of the first hidden layer highlight the parts of the input statistical curves with more rapid change rates and suppress most other parts.

Figures 14 and 15 show the mean curves and standard deviation curves of the inputs and weight vectors, respectively, of the second hidden layer. The variation trends for the two statistical curves of the inputs are also similar to those for the two curves of the weight vectors. By judging the monotonicity of the red line segments, 80.0% and 65.0% of the variation trends of the mean curve and standard deviation curve, respectively, of the inputs are the same as those of the two curves of the weight vectors. The two statistical curves of the weight vectors also have approximately the same overall magnitude.

3.1.3. Damped Pendulum

The third dynamic system used for analysis is a damped pendulum. The ODE of the system dynamics is written as

$$\ddot{\alpha} + \gamma\dot{\alpha} + \beta\sin\alpha = 0, \tag{10}$$

where $\alpha$ is the pendulum angle, $\gamma$ is the damping coefficient, and $\beta$ is the ratio of the gravitational acceleration to the pendulum length.

By substituting $x_1 = \alpha$ and $x_2 = \dot{\alpha}$, Equation (10) is rewritten as

$$\dot{x}_1 = x_2,\qquad \dot{x}_2 = -\beta\sin x_1 - \gamma x_2. \tag{11}$$
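
The first-order form of Equation (11) translates directly into a right-hand-side function; the symbols follow the reconstruction above and should be read as assumptions.

```python
import numpy as np

def pendulum_rhs(t, state, theta):
    """Damped pendulum in first-order form, reconstructed Equation (11),
    with x1 the pendulum angle and x2 the angular rate."""
    beta, gamma = theta   # assumed: beta ~ g/l, gamma the damping coefficient
    x1, x2 = state
    return np.array([x2, -beta * np.sin(x1) - gamma * x2])
```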

The initial state is set to . The sampling ranges of $\beta$ and $\gamma$ are [5, 10] and [0.5, 1], respectively. The simulation time is 25 s, and the sampling frequency is 10 Hz. According to Equation (11), 2000 groups of data are generated as a training set, and 100 groups are generated as a testing set. $\beta$ is then normalized into the new range [0.5, 1]. Each data point includes a two-dimensional state sequence and the parameters $(\beta, \gamma)$. Figure 16 shows a state sequence selected randomly from the training set.

The DNN model for this identification task has two hidden layers with 128 and 16 hidden neurons. The selections for the activation function, initialization method, optimization method, and hyperparameters are consistent with those used for training model B. After 5000 optimization iterations, a trained model, model C, is obtained. The MSE of model C for identifying the test data is .

A DNN model trained on the objective function with an L2-regularization term is then obtained. The regularization coefficient is set to 0.001 during training. The learning rate is set to and is multiplied by 0.2 after every 10,000 iterations. The trained model, model C-L2, is obtained after 40,000 optimization iterations. In this case, 53 of the 128 weight vectors of the first hidden layer and 8 of the 16 weight vectors of the second layer are nonzero vectors. The MSE of model C-L2 for identifying the test data is . Figures 17 and 18 show the mean curves and standard deviation curves of the inputs and weight vectors, respectively, of the first hidden layer. Figures 19 and 20 show the mean curves and standard deviation curves of the inputs and weight vectors, respectively, of the second hidden layer. The relationships between the inputs and weight vectors of each layer are consistent with the previous analysis. The statistical curves of the weight vectors of the first hidden layer highlight the parts of the statistical curves of the inputs that have more rapid change rates and suppress most other parts. The statistical curves of the weight vectors of the second hidden layer have approximately consistent trends with the statistical curves of the inputs. As Figures 19 and 20 show, 76.9% and 75.0% of the variation trends for the mean curve and standard deviation curve, respectively, of the inputs are the same as those for the two curves of the weight vectors. For the mean curve and standard deviation curve of the weight vectors, their overall magnitudes are also quite close.

3.2. Weight-Generating Approach

According to the analysis above, close relationships exist between the inputs and weight vectors of each hidden layer of a DNN model. There should therefore be a way to directly transform the inputs of each hidden layer into the weight vectors of the corresponding layer by imitating these relationships, thereby constructing a DNN model that achieves high parameter identification accuracy without the traditional training process. Based on this idea, a weight-generating approach of a DNN model for parameter identification is proposed.

The weight vectors of the first hidden layer are designed to combine the statistical patterns of the inputs with the change rates of those patterns. The design goal is to highlight the parts with rapid change rates in the two statistical curves of the inputs to the first hidden layer and to suppress the parts with slower change rates. The change rates of the mean curve and standard deviation curve of the inputs are obtained by calculating the absolute differences of the two statistical curves. The mean curve and standard deviation curve of the inputs to the first hidden layer are represented as

$$\mathbf{m} = [m_{1,1}, \ldots, m_{1,n}, \ldots, m_{d,1}, \ldots, m_{d,n}],\qquad \mathbf{s} = [s_{1,1}, \ldots, s_{1,n}, \ldots, s_{d,1}, \ldots, s_{d,n}], \tag{12}$$

where $d$ is the number of dimensions of the input to the first hidden layer (for example, the angular rate sequence used for model A contains two dimensions) and $n$ is the number of sequence points in each dimension. The absolute difference of each dimensional sequence of the statistical curves is calculated as

$$\Delta m_{i,j} = \left|m_{i,j+1} - m_{i,j}\right|,\qquad \Delta s_{i,j} = \left|s_{i,j+1} - s_{i,j}\right|, \tag{13}$$

where $m_{i,j}$ and $s_{i,j}$ are the $j$th point of the $i$th dimensional sequence in the mean curve and standard deviation curve, respectively, of the first hidden layer. All change rates are then normalized as

$$r^{m}_{i,j} = \frac{\Delta m_{i,j}}{\max_{j}\Delta m_{i,j}},\qquad r^{s}_{i,j} = \frac{\Delta s_{i,j}}{\max_{j}\Delta s_{i,j}}. \tag{14}$$

Two normalized changing rate sequences are obtained:

$$\mathbf{r}^{m} = \big[r^{m}_{1,1}, \ldots, r^{m}_{d,n}\big],\qquad \mathbf{r}^{s} = \big[r^{s}_{1,1}, \ldots, r^{s}_{d,n}\big]. \tag{15}$$

To further highlight the parts with rapid change rates, the statistical patterns of the weight vectors of the first hidden layer are imitated by taking the element-wise product of the statistical patterns of the inputs and the normalized changing rate sequences:

$$\tilde{\mathbf{m}} = \mathbf{m}\odot\mathbf{r}^{m},\qquad \tilde{\mathbf{s}} = \mathbf{s}\odot\mathbf{r}^{s}, \tag{16}$$

where $\tilde{\mathbf{m}}$ and $\tilde{\mathbf{s}}$ are the imitated mean vector and standard deviation vector, respectively, of the weight vectors of the first hidden layer.

The weight vectors of the first hidden layer are thus generated based on $\tilde{\mathbf{m}}$ and $\tilde{\mathbf{s}}$. The $j$th point of each weight vector is sampled from a Gaussian distribution $\mathcal{N}\big(c\,\tilde{m}_{j},\ \tilde{s}_{j}^{\,2}\big)$, in which the constant $c$ is used to adjust the scale of $\tilde{\mathbf{m}}$ so that its magnitude is close to that of $\tilde{\mathbf{s}}$. All weight vectors of the first hidden layer are generated in this way and are used to compute the inputs to the second hidden layer according to Equation (3).
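
A NumPy sketch of Equations (12)-(16) and the sampling step is given below. The per-dimension maximum normalization and the Gaussian parameterization $\mathcal{N}(c\,\tilde{m}_j, \tilde{s}_j^{\,2})$ are reconstructions of garbled formulas, so the details should be treated as assumptions.

```python
import numpy as np

def first_layer_weights(X, d, n_neurons, c, seed=0, eps=1e-12):
    """Generate the first hidden layer's weight vectors from the input
    statistics (Equations (12)-(16)). X has shape (n_samples, d * n):
    flattened d-dimensional state sequences with n points per dimension."""
    rng = np.random.default_rng(seed)
    n = X.shape[1] // d
    m = X.mean(axis=0).reshape(d, n)                   # mean curve, Eq. (12)
    s = X.std(axis=0).reshape(d, n)                    # standard deviation curve
    dm = np.abs(np.diff(m, axis=1, append=m[:, -1:]))  # change rates, Eq. (13)
    ds = np.abs(np.diff(s, axis=1, append=s[:, -1:]))
    rm = dm / (dm.max(axis=1, keepdims=True) + eps)    # normalization, Eqs. (14)-(15)
    rs = ds / (ds.max(axis=1, keepdims=True) + eps)
    m_tilde = (m * rm).ravel()                         # imitated mean, Eq. (16)
    s_tilde = (s * rs).ravel()                         # imitated standard deviation
    # each weight point is drawn around the imitated statistics; c rescales the mean
    return rng.normal(c * m_tilde, s_tilde + eps, size=(n_neurons, d * n))
```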

For the other hidden layers, since their inputs and weight vectors have similar statistical patterns, each weight vector of the $l$th hidden layer is generated by sampling from $\mathcal{N}\big(\mu^{(l)}_{j}/\mu^{(l)}_{\max},\ (s^{(l)}_{j}/s^{(l)}_{\max})^{2}\big)$, where $\boldsymbol{\mu}^{(l)}$ and $\mathbf{s}^{(l)}$ represent the mean vector and standard deviation vector, respectively, of the inputs to the $l$th hidden layer, and $\mu^{(l)}_{\max}$ and $s^{(l)}_{\max}$ are the maximum values in $\boldsymbol{\mu}^{(l)}$ and $\mathbf{s}^{(l)}$, which are used to make the magnitudes of the mean vector and standard deviation vector of the weight vectors consistent.
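
The corresponding sketch for the deeper layers reuses the statistics of each layer's inputs, rescaled by their maxima as described above (again a reconstruction of the garbled distribution):

```python
import numpy as np

def deeper_layer_weights(H, n_neurons, seed=0, eps=1e-12):
    """Generate a deeper hidden layer's weight vectors from the statistics
    of its inputs H, the previous layer's outputs over the training set."""
    rng = np.random.default_rng(seed)
    mu = H.mean(axis=0)
    sd = H.std(axis=0)
    mu_scaled = mu / (np.abs(mu).max() + eps)  # mean vector scaled by its maximum
    sd_scaled = sd / (sd.max() + eps)          # std vector scaled by its maximum
    return rng.normal(mu_scaled, sd_scaled + eps, size=(n_neurons, mu.size))
```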

The operation of the output layer is linear; hence, its weight matrix can be obtained using the ridge regression method [34]:

$$\mathbf{W}_{o} = \big(\mathbf{H}^{\top}\mathbf{H} + \lambda\mathbf{I}\big)^{-1}\mathbf{H}^{\top}\mathbf{T}, \tag{17}$$

where $\mathbf{H}$ is the matrix of the last hidden layer's outputs over the training set, $\mathbf{T}$ is the matrix of training targets, $\lambda$ is the regularization coefficient used to mitigate the problem of multicollinearity, and $\mathbf{I}$ is an identity matrix. A schematic diagram of the weight-generating approach of the DNN model for parameter identification is shown in Figure 21.
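
The closed-form solution of Equation (17) can be computed as in the sketch below; solving the regularized normal equations avoids forming an explicit matrix inverse.

```python
import numpy as np

def output_layer_ridge(H, T, lam=1e-3):
    """Ridge regression for the linear output layer, Equation (17):
    W_o = (H^T H + lam * I)^(-1) H^T T, where H stacks the last hidden
    layer's outputs over the training set and T stacks the targets."""
    A = H.T @ H + lam * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ T)
```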

4. Simulation and Analysis

4.1. Attitude Dynamics of a Rigid Spacecraft

The task of identifying the parameters of the attitude dynamics is first used to verify the performance of the DNN model built with the weight-generating approach. The DNN model A-F is set to have the same structure as model A. The hyperparameters $c$ and $\lambda$, used to construct model A-F, are set to and . According to Equations (12)-(16), the imitated mean curve and standard deviation curve of the weight vectors of the first hidden layer are obtained and shown in Figure 22. As shown in Figure 22, the two statistical curves have trends quite similar to those of the statistical curves of the weight vectors in Figures 5 and 6, highlighting the parts in the red boxes and suppressing most other parts.

To verify the validity of the weight-generating approach, the identification accuracy of model A-F is compared with those of model A and model A-L2, which are trained using the Adam method. An extra DNN model, model A-R, is also built based on the extreme learning machine: the weight vectors of its three hidden layers are all generated by sampling from a Gaussian distribution, and the weight matrix of its output layer is obtained according to Equation (17). In this work, all simulations are conducted on a laptop with an Intel Core i7-6820HK CPU and 32 GB of memory.

Table 1 shows the comparison of the identification accuracies on the testing set for the different DNN models. As shown in Table 1, model A-F obtains a high identification accuracy: it is superior to that of model A, which is trained for 2000 iterations, and close to that of model A-L2, which is trained for 60,000 iterations. The identification accuracy of model A-R is the worst. The time to build model A-F is approximately 0.8 s, while the times for training model A and model A-L2 are approximately 19 min and 9 hours, respectively. This illustrates that the relationships between the inputs and the weight vectors of each hidden layer exist and can be exploited to rapidly build a DNN model with high identification accuracy.

4.2. Two-Dimensional Damped Harmonic Oscillator

The weight-generating approach is then applied to generate a DNN model for identifying the parameters of the two-dimensional damped harmonic oscillator. The structure of the DNN model is the same as that of model B. The hyperparameters $c$ and $\lambda$, used for building model B-F, are set to and . Likewise, the imitated mean vector and standard deviation vector of the weight vectors of the first hidden layer are obtained according to Equations (12)-(16) and are shown in Figure 23. As Figure 23 shows, the trends of the two imitated statistical curves are also similar to those of the statistical curves of the weight vectors in Figures 12 and 13.

The identification accuracy of model B-F is compared with those of three DNN models: model B and model B-L2, trained using the Adam method, and an extra model, model B-R, built based on the extreme learning machine, whose weight vectors for the two hidden layers are all generated by sampling from a Gaussian distribution and whose output-layer weight matrix is obtained according to Equation (17). Table 2 shows the comparison of the identification accuracies on the testing set for the different DNN models. As Table 2 shows, the identification accuracy of model B-F is still quite high, and that of model B-R is the worst. The time to build model B-F is approximately 0.02 s, while the times for training model B and model B-L2 are approximately 4 min and 1 hour, respectively.

4.3. Damped Pendulum

A DNN model, model C-F, is also built using the weight-generating approach. The structure of model C-F is the same as that of model C. The hyperparameters $c$ and $\lambda$, used for building model C-F, are set to and . The imitated mean curve and standard deviation curve of the weight vectors of the first hidden layer are shown in Figure 24. The two imitated statistical curves also highlight the parts in the red boxes and suppress the other parts.

The identification accuracy of model C-F on the testing set is compared with those of model C, model C-L2, and an extra DNN model, model C-R, in Table 3. Model C and model C-L2 are trained using the Adam method, while model C-R is built based on the extreme learning machine: the weight vectors of its two hidden layers are generated by sampling from a Gaussian distribution, and its output-layer weight matrix is obtained according to Equation (17). As Table 3 shows, the identification accuracy of model C-F is significantly better than those of the other DNN models. The time to build model C-F is approximately 0.017 s, while the times for training model C and model C-L2 are approximately 5 min and 40 min, respectively. This further proves the validity and high efficiency of the weight-generating approach of a DNN model for parameter identification. Furthermore, the results illustrate that the change rates of the statistical patterns of the state sequences carry important information for DL-based parameter identification of dynamic systems.

5. Conclusion

In this paper, a weight-generating approach is designed to directly build a DNN model for identifying the parameters of dynamic systems. Trained DNN models used to identify the parameters of three different dynamic systems are analyzed, revealing some relationships between the inputs and weight vectors of the hidden layers in parameter identification tasks. The analysis shows that the statistical patterns of the weight vectors of the first hidden layer are related to the change rates of the statistical patterns of the inputs, while the weight vectors of each of the other hidden layers have statistical patterns similar to those of their inputs. These relationships are utilized to design the weight-generating approach. The performance of the DNN models generated with the weight-generating approach in identifying the parameters of the three dynamic systems is compared with that of conventionally trained DNN models. The comparison results illustrate the validity and efficiency of the weight-generating approach for the task of parameter identification.

This work also provides insight into the internal operating mechanisms of DNN models used for the parameter identification of dynamic systems. Nevertheless, due to the complexity of these mechanisms, other unknown yet important information may still be hidden in the weight vectors of DNN models. In future work, we will further analyze the internal operating mechanisms of DNN models to gain more insight and to develop supporting theories that can improve the applicability of DNN models in the field of parameter identification.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (91748203 and 11972102), the Shenzhen Science and Technology Program (Grant Nos. 202206193000001 and 20220816231330001), and the State Key Laboratory of Robotics and Systems (HIT) (SKLRS-2022-KF-08).