Abstract

A deep neural network (DNN) is applied to estimate a set of unknown parameters of a dynamical system whose measured data are given at a set of discrete time points. We developed a new vectorized algorithm that takes the number of unknowns (state variables) and the number of parameters into consideration. The algorithm first trains the network to determine the weights and biases. Next, it solves a system of algebraic equations to estimate the parameters of the system. If the right-hand side function of the system is smooth and the system has equal numbers of unknowns and parameters, the algorithm solves the algebraic equations at the discrete point where the absolute error between the neural network solution and the measured data is minimal. This improves the accuracy and reduces the computational time. Several tests were carried out on linear and non-linear dynamical systems. Finally, we show that the DNN approach becomes more efficient in terms of computational time as the number of hidden layers increases.

1. Introduction

Our environment is surrounded by phenomena that can often be modeled by dynamical systems. For instance, a researcher in mathematical biology [1], ecology [2, 3], or epidemiology [4] trying to understand and predict the interaction between different species may encounter a multidimensional dynamical system with several parameters. Another example of the application of multidimensional dynamical systems is chemical reactions [5], which also depend on certain parameters, such as the reaction rate or the equilibrium constant [6, 7].

Further applications can be found in economics, for example, when analyzing the dynamics of an economic system in order to predict an unfavorable situation, such as a recession or a depression [8]. Understanding the dynamics of such cases enables planning ahead to reduce the impact of possible negative effects.

The study of the dynamics of several continuous variables leads to the analysis of a system of ordinary differential equations [9, 10]. The coefficients of these systems usually depend on parameters which need to be estimated so that the dynamical system is valid, i.e., so that it provides a reliable mathematical explanation of a certain phenomenon and reliable predictions. We refer to the estimation of this set of parameters as calibrating the dynamical system.

In most cases, estimating the parameters involved in dynamical systems is a challenging optimization problem which needs special consideration, since the standard methods are iterative and, at times, have convergence issues [11, 12]. Some of the standard approaches are the Gauss–Newton method [13], multiple shooting and recursive estimation [14], collocation methods [15], a modified multiple shooting algorithm [16], the cross-entropy approach [17], a generalized smoothing approach [18], and principal differential analysis [19]. However, most of these methods suffer from one or more of the following issues: a small convergence region, convergence to a local minimum instead of the global minimum, high computational cost, or convergence highly dependent on the initial guesses; see, for example, [20, 21] and further references therein. Despite the variety of methods, this estimation problem remains a well-known challenge.

An alternative approach for calibrating dynamical systems is artificial neural networks (ANNs). The beginning of ANNs is often attributed to the research article by McCulloch and Pitts [22]. At that time, ANNs were less popular due to the limited capacity of computing machines. Nevertheless, the fast development of computer science and technology in the last decade has led to an exponential increase in the capacity of machines, for both storing and processing data.

An ANN is defined as “an information-processing system that has certain performance characteristics in common with biological neural networks” [23, 24]. A network is composed of several layers. The first layer is usually called the input layer, whereas the last one is referred to as the output layer. Layers falling between the input and output layers are called hidden layers. Furthermore, each layer has a set of neurons or units. Deep neural networks (DNNs) are ANNs with more than one hidden layer [25, 26]. DNNs are widely applied in artificial intelligence, for instance, in computer vision, image processing, pattern recognition, and cybersecurity [27–29]. The successful performance of DNNs is owed to the fact that deeper layers are able to capture more variance [30].

ANNs and, in particular, DNNs could potentially address some of the challenges of the aforementioned standard methods. One of these drawbacks is the need for a large training dataset to obtain a sufficiently accurate estimation of the parameters, which entails a high computational cost [31–33]. The ANN approach has been implemented to minimize this limitation. In particular, Dua [34, 35] proposed ANN methods for parameter estimation in systems of differential equations. However, to the best of the authors' knowledge, the literature does not contain a vectorized DNN algorithm which approximates the solution of a dynamical system with unknown parameters given a set of values of the solution at a set of time points; providing such an algorithm is the main purpose of this paper. Additionally, we provide the following original results.
(1) We extend the algorithm from [36] for systems of differential equations to systems of differential equations with unknown parameters.
(2) We enhance the efficiency and accuracy of the algorithm in the case when the number of unknowns of the dynamical system coincides with the number of unknown parameters.
(3) We show that the DNN algorithm for this problem becomes faster in terms of computational time as the number of layers increases.

For the calculation of the gradients of the cost function, we used the automatic differentiation supported by the Autograd package [37], and for the optimization of the learning rule, we implemented the Adam method [38], which successfully addresses the local minimum problem present in the standard approaches.

The paper is structured as follows: Section 2 presents the mathematical formulation of the problem; Section 3 describes the DNN model; Section 4 presents the vectorized algorithm for parameter estimation; Section 5 reports the numerical experiments; and Section 6 contains conclusions and further work.

2. Problem Formulation

Let us consider a dynamical system described by ordinary differential equations involving unknown parameters,

$$\frac{du}{dt}(t) = f\left(t, u(t), \rho\right), \quad t \in [t_0, T], \qquad u(t_0) = u_0, \tag{1}$$

where $u$ is a vector field with components $u_k$, $k = 1, \dots, m$; $\rho = (\rho_1, \dots, \rho_p)$ is the vector containing the unknown parameters; $u_0$ is the initial condition; and

$$f : [t_0, T] \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}^m \tag{2}$$

is a given vector-valued function, not necessarily linear, with components $f_k$, $k = 1, \dots, m$. The measured data for the model equation (1) will be denoted by $\tilde{u}$.

Objective: Given measured data at $N$ time points of (1), i.e., $\tilde{u}(t_i)$ with $i = 1, \dots, N$, we aim to develop a vectorized deep neural network algorithm that estimates the unknown parameters $\rho$ and the solution $u(t)$ for all $t \in [t_0, T]$.
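To fix notation for what follows, a minimal Python sketch of this setup is given below. The two-state, two-parameter right-hand side and the data arrays are purely illustrative placeholders, not part of the original formulation.

```python
import numpy as np

def f(t, u, rho):
    # Right-hand side f(t, u, rho) of (1); this two-state form is only an
    # illustrative placeholder for a concrete model.
    u1, u2 = u
    rho1, rho2 = rho
    return np.array([-rho1 * u1, rho1 * u1 - rho2 * u2])

# Measured data: N time points t_1, ..., t_N and the states u~(t_i).
t_data = np.linspace(0.0, 1.0, 11)     # N = 11 time points
u_data = np.zeros((2, t_data.size))    # columns hold the measurements u~(t_i)
u0 = np.array([1.0, 0.0])              # initial condition u(t_0)
```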

3. Deep Neural Network Model

We consider a deep neural network with an architecture similar to that in [36] for non-parametric systems of ODEs, depicted in Figure 1. In this network diagram, $t_i$ with $i = 1, \dots, N$ represents the time points; $a_j^{[l]}$ is the state of the j-th neuron in layer $l$ with $1 \le l \le L$, where $L$ denotes the total number of layers; $b^{[l]}$ denotes the bias vector in layer $l$; and $W^{[l]}$ denotes the weight matrix of layer $l$. Lastly, $\hat{u}_k(t_i)$ denotes the estimated solution of the unknown function $u_k$, evaluated at the time point $t_i$.

The architecture of the DNN is based on the following:
(i) an input layer consisting of a single neuron corresponding to the time point $t_i$;
(ii) an output layer with $m$ output functions $\hat{u}_1, \dots, \hat{u}_m$;
(iii) $L - 2$ hidden layers, with $h_l$ neurons in layer $l$, $l = 2, \dots, L - 1$.

Let us remark that the number of neurons in each hidden layer might not be the same. As a result, the weight matrix $W^{[l]}$ has dimension $h_l \times h_{l-1}$, and thus it might not be a square matrix.

Moreover, the number of neurons in each hidden layer could be determined based on the performance of the model [36]. Here, by performance of the model, we mean the DNN architecture which gives the best approximation with a small number of iterations and reduced computational time.
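A minimal initialization sketch consistent with these dimensions is shown below. The helper name init_network and the scaling of the random entries are illustrative choices under the conventions of Section 4, not a prescribed implementation.

```python
import numpy as np

def init_network(layer_sizes, rng=np.random.default_rng(0)):
    # layer_sizes = [1, h_2, ..., h_{L-1}, m]: input, hidden widths, output.
    params = []
    for l in range(1, len(layer_sizes)):
        W = 0.1 * rng.standard_normal((layer_sizes[l], layer_sizes[l - 1]))
        b = np.zeros((layer_sizes[l], 1))   # biases start at zero
        params.append([W, b])
    return params

# Example: one hidden layer of 40 neurons for an m = 2 system.
params = init_network([1, 40, 2])
```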

3.1. Feed-Forward Propagation

The feed-forward network proceeds as follows. For each data point $t_i$, the learning process starts by assigning $a^{[1]} = t_i$.

Then, to obtain $a^{[l+1]}$ given $a^{[l]}$, we require two steps. The first step is working out a provisional vector $z^{[l+1]}$ by multiplying $a^{[l]}$ by the weight matrix $W^{[l+1]}$ and then adding the bias vector $b^{[l+1]}$, i.e.,

$$z_j^{[l+1]} = \sum_{k} W_{jk}^{[l+1]} a_k^{[l]} + b_j^{[l+1]}. \tag{3}$$

Alternatively, one can write equation (3) in matrix form as follows:

$$z^{[l+1]} = W^{[l+1]} a^{[l]} + b^{[l+1]}. \tag{4}$$

The second and last step requires applying an appropriate choice of activation function $\sigma$ to $z^{[l+1]}$ to obtain $a^{[l+1]}$. Thus, the values for the next layer are expressed by the following:

$$a^{[l+1]} = \sigma\left(z^{[l+1]}\right). \tag{5}$$

In short, for each given time point $t_i$, we will obtain a $z^{[l](i)}$ and an output $a^{[l](i)}$. To illustrate the dependence on $t_i$, we add the superscript $(i)$, i.e., $a^{[l](i)} = a^{[l]}(t_i)$. Therefore, using vector and matrix notation, we can write $Z^{[l]} = \left[z^{[l](1)}, \dots, z^{[l](N)}\right]$, $A^{[l]} = \left[a^{[l](1)}, \dots, a^{[l](N)}\right]$, and $B^{[l]} = \left[b^{[l]}, \dots, b^{[l]}\right]$, so that we can summarise (4) and (5) for all $i = 1, \dots, N$ as

$$Z^{[l+1]} = W^{[l+1]} A^{[l]} + B^{[l+1]}, \qquad A^{[l+1]} = \sigma\left(Z^{[l+1]}\right). \tag{6}$$

We will refer to the values of the output layer as the output of the network, i.e., $N(t_i) = a^{[L](i)}$ or, in component form, $N_j(t_i) = a_j^{[L](i)}$, where $N_j$ denotes the j-th projection of $N$.
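The vectorized propagation (6) can be sketched in a few lines of Python, building on the parameters initialized above. The sigmoid hidden activation is one possible choice, and the linear output layer is an assumption consistent with Step 4 of the algorithm in Section 4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, t_points):
    # A^[1] is the 1 x N row of time points; NumPy broadcasting adds the
    # bias column b^[l] to every column of W^[l] A^[l-1], realizing B^[l].
    A = np.asarray(t_points).reshape(1, -1)
    for W, b in params[:-1]:               # hidden layers: (4)-(5) for all i
        A = sigmoid(W @ A + b)
    W, b = params[-1]
    return W @ A + b                       # network output N(t_i), m x N
```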

3.2. Parameter Estimation

In this section, we describe the DNN algorithm for estimating the parameters in the dynamical system (1).

Let us suppose that we have measured data at $N$ points of (1), i.e., $\tilde{u}(t_i)$ with $i = 1, \dots, N$. For each $t_i$, $i = 1, \dots, N$, we compute the network output, denoted by $N(t_i)$. Then, the trial solution satisfying the initial conditions is given by the following:

$$\hat{u}(t_i) = u_0 + (t_i - t_0)\, N(t_i). \tag{7}$$

The expression (7) can be written in the matrix form as follows:

$$\hat{U} = \left[u_0, \dots, u_0\right] + A^{[L]} \,\mathrm{diag}\left(t_1 - t_0, \dots, t_N - t_0\right), \tag{8}$$

where the i-th column of $\hat{U}$ is $\hat{u}(t_i)$.
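In code, (7)-(8) amount to a single broadcasted line on top of the forward pass sketched above; this is again a sketch under the same illustrative conventions.

```python
def trial_solution(params, t_points, u0, t0=0.0):
    # Matrix form (8): column i equals u0 + (t_i - t0) * N(t_i).
    N_out = forward(params, t_points)            # m x N network output
    return u0.reshape(-1, 1) + (np.asarray(t_points) - t0) * N_out
```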

We train the network to find $W^{[l]}$ and $b^{[l]}$ by minimizing the following cost function:

$$J\left(W, b\right) = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{u}(t_i) - \tilde{u}(t_i) \right\|^2. \tag{9}$$

With the optimal learning parameters $W$ and $b$, the approximate solutions of the dynamical system are obtained. In this paper, the Adaptive Moment method (also called the Adam method) was applied to update the learning parameters [39]. The updating rule is

$$m_k = \beta_1 m_{k-1} + (1 - \beta_1)\, \nabla J(\theta_{k-1}), \qquad v_k = \beta_2 v_{k-1} + (1 - \beta_2) \left(\nabla J(\theta_{k-1})\right)^2,$$
$$\hat{m}_k = \frac{m_k}{1 - \beta_1^k}, \qquad \hat{v}_k = \frac{v_k}{1 - \beta_2^k}, \qquad \theta_k = \theta_{k-1} - \alpha \frac{\hat{m}_k}{\sqrt{\hat{v}_k} + \epsilon}, \tag{10}$$

where $\theta$ collects the learning parameters $W$ and $b$, $\beta_1, \beta_2 \in [0, 1)$ are decay rates for the moment estimates, $\alpha$ is the learning rate, and $m$ and $v$ are the first and the second moment vectors, respectively, which are initialized to zero. The values $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$ are usually applied in the literature; see, e.g., [39].
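A single Adam update (10) for one parameter array can be written as follows; the bias-corrected form and the default constants are those of [39].

```python
import numpy as np

def adam_step(theta, g, m, v, k, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One update of (10) at iteration k >= 1 for a parameter array theta
    # with gradient g; m and v are the first and second moment estimates.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g ** 2
    m_hat = m / (1.0 - beta1 ** k)         # bias corrections
    v_hat = v / (1.0 - beta2 ** k)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```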

After obtaining the optimal values of the learning parameters $W$ and $b$, we seek the values of the parameter $\rho$ by solving the following non-linear system of algebraic equations:

$$\frac{d\hat{u}}{dt}(t_i) = f\left(t_i, \hat{u}(t_i), \rho\right), \quad i = 1, \dots, N. \tag{11}$$
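The residual of (11) at a single time point can be sketched as below. Here the derivative of the trial solution is approximated by a central difference as a simplified stand-in; the implementation in this paper uses automatic differentiation [37] instead.

```python
import numpy as np

def residual(rho, i, params, t_points, u0, h=1e-6):
    # d/dt u_hat(t_i) - f(t_i, u_hat(t_i), rho): the i-th block of (11).
    t = t_points[i]
    u_hat = trial_solution(params, np.array([t]), u0)[:, 0]
    du_dt = (trial_solution(params, np.array([t + h]), u0)[:, 0]
             - trial_solution(params, np.array([t - h]), u0)[:, 0]) / (2.0 * h)
    return du_dt - f(t, u_hat, rho)
```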

When attempting to solve (11), we will distinguish two possible cases: (1) the number of equations equals the number of parameters, and (2) otherwise.

Case 1. For the particular case in which the right-hand side function of the dynamical system (1) has as many parameters as equations, i.e., $p = m$, the following proposition addresses the existence of a solution. It directly follows from the well-known inverse function theorem.

Proposition 1. If the number of unknowns and the number of parameters are equal, i.e., $m = p$, if $f(t_{i^*}, \hat{u}(t_{i^*}), \cdot)$ is continuously differentiable in a neighborhood of some initial parameters $\rho_0$, and if the Jacobian $\partial f / \partial \rho$ is non-singular at $\rho_0$, then equation (11) has a unique solution in a neighborhood of $\rho_0$ at the point $t_{i^*}$.

Proof. Fix $t = t_{i^*}$; then $\frac{d\hat{u}}{dt}(t_{i^*})$ is a constant vector, and

$$f\left(t_{i^*}, \hat{u}(t_{i^*}), \rho\right) = \frac{d\hat{u}}{dt}(t_{i^*}) \tag{12}$$

is a system of non-linear algebraic equations having $m$ equations in $p = m$ unknowns. Hence, the proposition holds by virtue of the inverse function theorem.

Remark 1. The properties of the map $\rho \mapsto f(t_{i^*}, \hat{u}(t_{i^*}), \rho)$ in the proposition are determined by the right-hand side function $f$ of the dynamical system (1). Hence, we require the smoothness of $f$ in order to apply Proposition 1.

Proposition 2. The best estimation for the parameter $\rho$ is obtained if $t_{i^*}$ is chosen according to Proposition 1 such that the sum of the absolute errors between the solutions of the neural network and the measured data is minimal. That is, choose $i^*$ such that

$$\sum_{k=1}^{m} \left| \hat{u}_k(t_{i^*}) - \tilde{u}_k(t_{i^*}) \right| \tag{13}$$

is minimal.

Proof. Condition (13) ensures the best fit for the solution trajectories.

Case 2. The number of parameters does not equal the number of equations of the dynamical system. In this case, we can solve the non-linear system (11) by minimizing the following objective function:

$$G(\rho) = \sum_{i=1}^{N} \left\| \frac{d\hat{u}}{dt}(t_i) - f\left(t_i, \hat{u}(t_i), \rho\right) \right\|^2. \tag{14}$$

Remark 2. Let us note that the approach suggested in Case 2 can also be applied to Case 1, but not the other way round.
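Both cases reduce to a few lines with standard SciPy routines, as sketched below on top of the helpers above; the initial guesses np.ones(2) are arbitrary illustrative choices for a two-parameter problem.

```python
import numpy as np
from scipy.optimize import fsolve, least_squares

# Case 1 (m = p): pick t_{i*} minimizing (13), then solve (12) at t_{i*}.
abs_err = np.abs(trial_solution(params, t_data, u0) - u_data).sum(axis=0)
i_star = int(np.argmin(abs_err))
rho_case1 = fsolve(lambda r: residual(r, i_star, params, t_data, u0),
                   np.ones(2))

# Case 2 (any m, p): minimize (14), the squared residuals over all t_i.
rho_case2 = least_squares(
    lambda r: np.concatenate([residual(r, i, params, t_data, u0)
                              for i in range(t_data.size)]),
    np.ones(2)).x
```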

4. Vectorized Algorithm for Parameter Estimation

In this section, we describe the DNN algorithm to estimate the parameters of the system of differential equations (1). It differs from the algorithm in [36]: the current algorithm follows supervised learning, since we have the target output, and it additionally includes the case where the number of parameters equals the number of unknowns. A condensed implementation sketch is given after this list.
(1) Input data: define the vector $t = (t_1, \dots, t_N)$ of size $N$.
(2) Define the deep neural network structure: determine the number of layers $L$: the input layer (having one unit), the hidden layers (having $h_l$ units, $l = 2, \dots, L - 1$), and the output layer (having $m$ units).
(3) Initialize the network parameters: choose the weights to have small random entries, whereas the biases and the moment matrices in the Adam algorithm have zero entries.
(i) $W^{[2]}$ has dimension $h_2 \times 1$ and $b^{[2]}$ has dimension $h_2 \times 1$;
(ii) $W^{[l]}$ has dimension $h_l \times h_{l-1}$ and $b^{[l]}$ has dimension $h_l \times 1$, for $l = 3, \dots, L - 1$;
(iii) $W^{[L]}$ has dimension $m \times h_{L-1}$ and $b^{[L]}$ has dimension $m \times 1$;
(iv) the first moment arrays have the same size as the corresponding $W^{[l]}$ and $b^{[l]}$; and
(v) the second moment arrays have the same size as the corresponding $W^{[l]}$ and $b^{[l]}$.
(4) Forward propagation:
(i) for the input layer, start by assigning $A^{[1]} = [t_1, \dots, t_N]$;
(ii) for the hidden layers, $l = 2, \dots, L - 1$,
$$Z^{[l]} = W^{[l]} A^{[l-1]} + B^{[l]}, \qquad A^{[l]} = \sigma_l\left(Z^{[l]}\right), \tag{15}$$
where $\sigma_l$ is the activation function corresponding to the hidden layer;
(iii) for the output layer,
$$A^{[L]} = W^{[L]} A^{[L-1]} + B^{[L]}; \tag{16}$$
(iv) assign the trial solution using equation (8).
(5) Compute the cost (9) and its gradient: calculate the partial derivatives of the cost function (9) with respect to $W^{[l]}$ and $b^{[l]}$. To carry out these computations, we apply automatic differentiation techniques [37, 40].
(6) Update the learning parameters using the Adam method described in (10).
(7) Solve the non-linear system of algebraic equations (11) to obtain the parameter $\rho$. Recall that there are two possible scenarios:
(i) if the number of unknowns equals that of the parameters, i.e., $m = p$, find $t_{i^*}$ which minimizes the absolute error (13) and solve the algebraic equation (12);
(ii) if $m \neq p$, solve the minimization problem (14).
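The sketch below condenses Steps 4-6 into a training loop, using the Autograd package [37] for Step 5 and the adam_step helper sketched in Section 3.2 for Step 6; the epoch count is illustrative, and the trial solution assumes $t_1 = t_0$.

```python
import autograd.numpy as anp
from autograd import grad

def cost(params, t, u_tilde, u0):
    # Steps 4-5: forward pass, trial solution (8), and cost (9).
    A = t.reshape(1, -1)
    for W, b in params[:-1]:
        A = 1.0 / (1.0 + anp.exp(-(anp.dot(W, A) + b)))   # hidden layers
    W, b = params[-1]
    u_hat = u0.reshape(-1, 1) + (t - t[0]) * (anp.dot(W, A) + b)
    return anp.mean((u_hat - u_tilde) ** 2)

grad_cost = grad(cost)                        # gradients w.r.t. all W and b

mW = [anp.zeros_like(W) for W, _ in params]   # Adam moment buffers
vW = [anp.zeros_like(W) for W, _ in params]
mb = [anp.zeros_like(b) for _, b in params]
vb = [anp.zeros_like(b) for _, b in params]

for k in range(1, 50001):                     # Step 6: Adam epochs
    g = grad_cost(params, t_data, u_data, u0)
    for l, (gW, gb) in enumerate(g):
        params[l][0], mW[l], vW[l] = adam_step(params[l][0], gW, mW[l], vW[l], k)
        params[l][1], mb[l], vb[l] = adam_step(params[l][1], gb, mb[l], vb[l], k)
```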

5. Numerical Experiments

In this section, we apply the algorithm proposed in Section 4, implemented in Python, to four different benchmark problems taken from the literature. The choice of the number of neurons, the number of hidden layers, the activation functions, and the other network parameters is specified within each example. Three of the four problems have an analytical exact solution, which is then used for validating the model.

Example 1. In this example, taken from [34, 35] and ([6], problem 1), we consider the following system of differential equations related to the mathematical modeling of a chain of irreversible chemical reactions:

$$\frac{du_1}{dt} = -\rho_1 u_1, \qquad \frac{du_2}{dt} = \rho_1 u_1 - \rho_2 u_2, \qquad u_1(0) = 1, \quad u_2(0) = 0. \tag{17}$$

This system consists of two unknown functions and two parameters. In this case, given the parameters $\rho_1$ and $\rho_2$, the analytical expression of the solution is as follows:

$$u_1(t) = e^{-\rho_1 t}, \qquad u_2(t) = \frac{\rho_1}{\rho_2 - \rho_1}\left(e^{-\rho_1 t} - e^{-\rho_2 t}\right). \tag{18}$$

Let us take discrete points from the analytical solution (18); see Table 1.
To solve this problem, we considered a neural architecture with a single hidden layer. Sigmoid activation functions were applied to each unit, and the network was trained for a fixed number of epochs.
The results of applying the algorithm are shown in Figure 2. On the one hand, Figure 2(a) demonstrates the accuracy of the method, as we cannot distinguish between the analytical and the approximated solutions. On the other hand, Figure 2(b) shows that the absolute error is bounded by 0.00015.
The parameters $\rho_1$ and $\rho_2$ were worked out using both approaches, Case 1 and Case 2 (see Remark 2). For the approach described in Case 1, the corresponding system of algebraic equations (11) was solved at the point $t_{i^*}$ chosen according to (13). Table 2 shows the estimated parameters obtained through both approaches, which can be compared due to Remark 2. In this particular example, the approach from Case 1 proved to be more accurate, and the CPU time of Step 7 in the algorithm was smaller for the first approach than for the second.
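As a sanity check, one can integrate the system with the estimated parameters and compare against the tabulated data; the sketch below uses scipy.integrate.solve_ivp together with the illustrative names from the previous sections.

```python
import numpy as np
from scipy.integrate import solve_ivp

rho_est = rho_case1                          # estimates from Step 7
sol = solve_ivp(lambda t, u: f(t, u, rho_est),
                (t_data[0], t_data[-1]), u0, t_eval=t_data, rtol=1e-8)
print("max abs deviation from the data:", np.abs(sol.y - u_data).max())
```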

Example 2. The dynamical system of this example models a biomass transfer ([41], p. 531), with a given initial condition. For given values of the parameters, an analytical solution (20) is available.

Following a similar procedure as for the previous example, we consider 11 points uniformly distributed in the time interval, taken from the analytical solution (20); see Table 3.

In this example, we have three unknown functions and three system parameters. The neural architecture considered has one hidden layer, with the same activation function applied to each unit. The network is trained with 71800 epochs, and the results are shown in Figure 3, which compares the estimated solutions obtained with the DNN approach against the analytical solutions. As we can see, Figure 3(a) again demonstrates the accuracy of the method, while Figure 3(b) shows that the absolute error is bounded by 0.0002. The corresponding system of algebraic equations (11) was solved at the point $t_{i^*}$ chosen according to (13). Again, the parameter estimation was conducted using the approaches from Case 1 and Case 2; the output is shown in Table 4. Last, the CPU time of Step 7 in the algorithm was smaller for the first approach than for the second.

We conclude that in this example, the approach described in Case 1 was again more successful.

Example 3. The next problem is taken from [20] and consists of a dynamical system with an initial value condition containing three parameters; see also ([6], problem 7). The model occurs in several applications such as chemical kinetics, theoretical biology, and ecology. The data are taken from reference [20], as shown in Table 5.
It is to be noted that the numbers of unknowns and parameters do not coincide in this example. Hence, unique solvability of the corresponding algebraic system (11) is not guaranteed. For solving this ODE system, we considered a neural network with a single hidden layer and trained it for a fixed number of epochs. Using the approach described in Case 2, we obtained estimates of the three parameters. The proposed algorithm gives a better fit compared to the results reported in [20]. This estimation of the parameters provides an accurate solution; see Figure 4.

Example 4. Consider a dynamical system with three state variables which models the methanol-to-hydrocarbon process ([34], Example 5), with given initial conditions. We took the data from the reference ([34], Table 6).

Using the second approach (Case 2), we estimated the three parameters for Example 4. Here, the sigmoid activation function and one hidden layer having 40 neurons were used. See Table 7 for a comparison. Figure 5 shows how the solution trajectories fit the data.

Table 8 shows that the results obtained in this paper are in agreement with those from the existing literature.

5.1. Shallow vs. Deep Layers

Here, we looked at the effect of adding more layers to the network. Specifically, we compared network architectures with one, two, and three hidden layers, considering Examples 1 and 2. In the three network structures, we take the same activation function, and we fix the error tolerance of the cost function. Although further analysis is required, the experimental results from Tables 9 and 6 show that reasonable accuracy in the parameter estimation can be achieved by increasing the number of hidden layers, with fewer epochs and less computation time. A timing sketch of this comparison is given below.
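A possible timing harness for this comparison is sketched here; the layer widths, the tolerance, and the train wrapper (a hypothetical function packaging the Adam loop of Section 4 with an early stop once the cost drops below the tolerance) are all illustrative.

```python
import time

for hidden in ([40], [20, 20], [15, 15, 15]):      # 1, 2, and 3 hidden layers
    params0 = init_network([1] + hidden + [2])
    start = time.perf_counter()
    # train: hypothetical wrapper running the Adam loop until cost < tol
    # or max_epochs is reached, returning the trained network and epochs.
    params_opt, epochs = train(params0, t_data, u_data, u0,
                               tol=1e-4, max_epochs=100000)
    print(f"{len(hidden)} hidden layer(s): {epochs} epochs, "
          f"{time.perf_counter() - start:.2f} s")
```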

6. Conclusions and Outlook

In this paper, we developed a vectorized deep neural network algorithm for estimating the unknown parameters of a dynamical system based on a system of ODEs, as well as the solution functions of the system. This problem arises in different sciences, as shown in the numerical experiments of Section 5. We highlighted that two approaches are applicable to problems where the number of parameters coincides with the number of ODEs in the system. In this case, the system of algebraic equations arising from the optimal learning parameters (weights and biases) is uniquely solvable at a specific point, provided that the right-hand side function satisfies the conditions stated in Proposition 1. Choosing the point $t_{i^*}$ according to (13) significantly improves the accuracy of the parameter estimation.

Last, the experimental results show that, for deep neural networks, reasonable accuracy of the parameter estimation can be achieved with fewer epochs and hence less running time.

For future work, it is worth comparing the proposed method with the traditional methods in terms of accuracy, computational complexity, and robustness, as well as with other neural network architectures. Moreover, the method is yet to be exploited, for instance, for solving delay differential equations, stochastic dynamical systems, or more complex problems such as integral equations.

Data Availability

The underlying data supporting this result are available from the literature cited in this article.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The third author truly acknowledges the support received from the grant PID2020-117800GB-I00 of the Ministry of Science and Innovation of the Government of Spain for the research project METRICA.