Research Article | Open Access
Numerical Optimal Control for Problems with Random Forced SPDE Constraints
A numerical algorithm for solving optimization problems with stochastic diffusion equation as a constraint is proposed. First, separation of random and deterministic variables is done via Karhunen-Loeve expansion. Then, the problem is discretized, in spatial part, using the finite element method and the polynomial chaos expansion in the stochastic part of the problem. This process leads to the optimal control problem with a large scale system in its constraint. To overcome these difficulties the adjoint technique for derivative computation to implementation of the optimal control issue in preconditioned Newton’s conjugate gradient method is used. By some numerical simulation, it is shown that this hybrid approach is efficient and simple to implement.
Physical problems, in many cases, can be formulated as optimization problems. These problems are utilized to gain a more widely understanding of physical systems. Typically, these problems depend on some models which, in many cases, are deterministic. Uncertainty might plague everything from modeling assumptions to experimental data. As such, in order to accommodate for these uncertainties, many practitioners have developed stochastic models. In order to make sense of and solve these models, in addition to randomness of the models, some additional theory is required to the resulting optimization problems. This paper proposes an adjoint based approach for solving optimization problems governed by stochastic diffusion equations. In order to deal with the stochastic partial differential equations (SPDE) as a constraint in an optimization problem one may first solve the SPDE. Since the provided systems by this approach are random and nonlinear, such kind of problems are difficult to handle and very challenging. One of the most challenging examples in this area is the control of stochastic diffusion equations with random forcing . Polynomial chaos expansion (PCE)  provides a good direction for solution of nonlinear SPDEs numerically. In the presence of the random forcing, PCE seems to be more accurate and efficient numerical method than Monte Carlo simulation. In fact, the PCE can be interpreted as the Fourier expansion in the probability space. Particularly, the aim of this work is the numerical solution for the distributed control problems involving the stochastic diffusion equations in the form: where is the spatial domain, is the boundary of the spatial domain, is probability space, , , is the solution of the SPDE, is the source function, and is the permeability field of the problem. As an important assumption for the stochastic diffusion equation (1), it is assumed that the random coefficient satisfies the elliptic condition ; that is, there exists a constant such that
Karhunen-Loeve expansion (KLE) of correlated random functions is used to separate the random and deterministic parts in random coefficient [4, 5]. Therefore, a finite dimensional approximation is performed by truncating the Karhunen-Loeve expansion of the permeability field . Then, (1) is approximated by
For approximation of the function with random variables and the Gaussian random variables of the highest degree , the number of PCE coefficients are, . Now, weak formulation of (4) and then Galerkin finite element method are used to solve the SPDE problem approximately. Generally in nature, optimization with SPDE-constraint is infinite dimensional, large, and complex. There are two numerical approaches for solving this problem [2, 7]:(i)discretize then optimize (DO): for problems that can be trivially discretized first,(ii)optimize then discretize (OD): for problems that are differentiable.
Our intention is to work with discretizethen optimization algorithms for smooth functions. Thus, we follow the DO approach. Galerkin finite element method incorporated with polynomial chaos expansion is used for discretizing the SPDE. By computing gradient and Hessian of the Lagrangian augmented function, preconditioned Newton’s conjugate gradient method is used to compute the best right hand side vector (control values at grid points), where the corresponding solution of SPDE (state values) according to this vector, stays as near as to the desired value approximation, and has lowest cost in the objective function. The organization of this paper is as follows: in Section 2, problem formulation in addition to KLE, PCE, and stochastic Galerkin method is presented. In Section 3, distributed control of random forced diffusion equation with some numerical examples is proposed.
2. Problem Formulation
We consider optimal control problems of the form where is the objective function, is an operator which stands as the constraint, is the spatial domain, is the probability space, and is the tensor product.
To provide a concrete setting for our discussion, the following problem is considered as the constraint operator where is the correlated random fields. Hence, the solutions of the SPDE (6) are also random fields. It is assumed that the control space . The weak form of the constraint function (6) can be defined for all as
It is assumed that the objective or cost functional is as where denotes the expected value, is a given desired function, and is the regularization parameter. Obviously, controlling the solution of this problem leads to controlling the statistics of the solution, for example, the expected value of the solution, given by
So, the optimal control problem can be interpreted as follows: given the random field , minimize the objective function (8) over all , subject to , such that for the given random field satisfies in the weak formulation (7) .
2.1. Basic Definitions
In (5), and are assumed to be continuously F-differentiable and that for each the state equation possesses a unique corresponding solution . Similar to the deterministic PDEs, existence and uniqueness of the solution for these kind of SPDEs can be established using the Lax-Milgram theorem. Thus, there is a solution operator . In addition, it is assumed that is continuously invertible. Then, existence and uniqueness of the solution for the above optimal control problems as well as the continuously differentiability of are ensured by the implicit function theorem .
2.2. Karhunen-Loeve Expansion (KLE)
Consider a random field , , with finite second order moment
Assume that . It is possible to expand , for a given orthonormal basis in , as a generalized Fourier series where are random variables with zero means. It is important, now, to find a special basis that makes corresponding uncorrelated: for all . Denoting the covariance function of by , the basis functions should satisfy
Completion and orthonormality of in follow that are eigenfunctions of : where . Indeed, by choosing basis functions as the solutions of the eigenproblem (14), the random variables will be uncorrelated. In (14), denoting , yields the following expansion: where satisfy and . In the case that is considered as a Gaussian process, , will be independent Gaussian random variables. The expansion (15) is known as the Karhunen-Loeve expansion (KLE) of the stochastic process .
Using the KLE (15), the stochastic process can be represented as a series of uncorrelated random variables. Since the basis functions are deterministic, the spatial dependence of the random process can be resolved by them. The convergence property of the KLE to the random process can be represented in the mean square sense where is a finite term KLE [4, 5].
2.3. Polynomial Chaos Expansion (PCE)
There are problems, as the solution of a PDE with random inputs, that the covariance function of a random process , is not known. The solution of such problems can be represented using a polynomial chaos expansion (PCE) given by where the functions are deterministic coefficients, is a vector of orthonormal random variables, and are multidimensional orthogonal polynomials that satisfy in the following properties:
Hence, this convergence justifies a truncation of PCE to a finite number of terms, where the value of is determined by the highest degree of polynomial , used to represent , and the number of random variables—the length of —with the formula . Generally, the value of is the same as the number of uncorrelated random variables in the system or equivalently the truncation length of the truncated KLE. Typically, the value of is chosen by some heuristic method. Indeed, in the case of and random variables, the KLE is a special case of the PCE [9, 10].
2.4. Stochastic Galerkin
Suppose with dimension for and with dimension . In addition, let for be basis of and a basis of . The finite dimensional tensor product space can be defined as the space spanned by the functions for all and . For simplification, let denote a multi-index whose th component and denote the set of such multi-indices; then, the basis functions for the tensor product space have the form where
Then, the Galerkin method is looking for a solution such that for , all and , where is the integration weight. Equation (24) is equivalent to the following equation, which results by plugging in the explicit formula for the basis functions for all and .
Now, if and have the following KLE
then, we can rewrite the Galerkin projection (28) as where for . Equation (31) is approximated by quadrature rule in spatial domain. Now, assume that there exist functions , , such that where, for , the above integral is approximated using quadrature rule in random domain as follows: Then, (30) becomes or, equivalently,
Finally, (35) can be considered as the following block diagonal system: or briefly .
3. Distributed Optimal Control Problem
Using the stochastic Galerkin method for general objective functions, computing the derivatives becomes very tedious and rather messy. Consider the following optimal control problem:
The solutions to linear elliptic SPDEs live in the space and the control space . Denoting the constraint equation by , usually, it is possible to invoke the implicit function theorem to find a solution to the constraint equation. This leads to an implicitly defined objective function . Now we focus on computing derivatives of . For any direction , where and are the dual space of and , respectively. Now, computing the derivative of the constraint with respect to , in the direction , and applying it to the implicit solution to yield So is Hence
Applying the adjoint to the term, Considering the adjoint state, , as the solution to the derivative of becomes The weak form of the constraint function is defined as follows: for all ,
Thus, the constraint function and . Since the constraint function acts linearly with respect to and , thus the corresponding derivative of with respect to in the direction , for all , is given by
Similarly, the derivative of with respect to , for any direction and for all , is given by
Both of these derivatives are symmetric, so and . From here, the adjoint can be computed as
Using this to any direction ,
By the chain rule, the derivative of with respect to in the direction is
Similarly, the derivative of with respect to in the direction is
With these computations, one can compute the first derivative of via the adjoint approach. The aim is to derive the discrete versions of the objective function, gradient, and Hessian times a vector calculation corresponding to the stochastic Galerkin solution technique for SPDE; that is, . Suppose is a finite dimensional subspace of dimension and is a finite dimensional subspace of dimension ; then is a finite dimensional subspace of the state space . Similarly, let be a finite dimensional subspace of the control space (with dimension). For the stochastic Galerkin method, it is assumed that , the space of polynomials with highest degree , and is any finite element space; here, is the space of linear functions built on a given mesh . We choose the system of polynomials that are -orthonormal to be a basis for ; that is, , where
The discretized optimization problem in the stochastic Galerkin framework is where is the stochastic Galerkin solution to the state equation since . The blocks of the above matrix have the form
First, in order to compute the derivatives of the objective function, we attempt to compute the adjoint state . Indeed, the adjoint state, in the infinite dimensional formulation, solves the following equations:
Thus, where and the matrix is of order .
Now, the derivative of in the direction for all , with the computed adjoint state, is
Therefore, the gradient can be computed as follows:
Now, for any vector , it is possible to write as
In order to compute the Hessian times a vector, that is, multiply the Hessian of the with a some vector , the equation for must be solved. Similar to the adjoint computation, solves the linear system where and for is for all . Thus,
Now, solving for requires the solution to the linear system , where or equivalently for all . Hence, in the direction for , can be approximated by
Now, by using the preconditioned conjugate gradient (PCG) method, the Newton equation (69) is solved approximately. When we access to the sufficiently small residual of Newton system, the PCG method is truncated. In real world computation, it is possible to employ some globalization technique for Newton’s method. Considering the case of implementation and relatively low computational cost, line search techniques are popular choices in this way. Finding an optimal step size and using this step size to generate the iterate , are the mission of a line search algorithm. The Armijo condition (or sufficient decrease condition) that the step size is required to satisfy is where and typically is quite small, for example, [12, 13].
3.1. Kronecker Product Preconditioners
Let us to rewrite (37) in the form , where
The strategy here is to introduce as a preconditioner, such that where as mentioned in PCE, is equal with the degree of polynomial chaos truncation, and denotes the Frobenius norm. The closed form of the solution can be written as follow :
Example 1. Consider the following distributed optimal control problem where , , the boundary conditions are given as and the desired function is given by
The random field is characterized by its mean and covariance function
Following Section 2.2, the truncated KLE can be represented as:
The aim is to calculate and such that for all and all
The eigenpairs in truncated KLE solve the integral equation
For this special case of the covariance function, we have explicit expressions for and , . Let and solve the equations
Then the even and odd indexed eigenfunctions are given by and corresponding eigenvalues
We choose to be independent random variables uniformly distributed over the interval .
First of all, we find numerical approximations of and with bisection method, and then, with the eigenpairs evaluated by this and , by choosing dimension and order , which emphasize that , we construct the KLE. Taking , Algorithm 1 is used to compute and . Illustration of computed optimal control and corresponding state in addition to the global matrix sparsity, standard deviation and expected values of the solution, adjoint values, and initial state approximation of the desired function is shown in Figure 1.
(a) Desired values
(b) Initial state
(c) Approximated solution (state)
(f) Expected value
(g) Standard dev.
(h) Sparsity of global matrix
In Figure 1(a), the desired function is plotted for , . Initial state is computed and plotted in Figure 1(b) by solving (37) where . Approximated corresponding to the optimal control is calculated by Algorithm 1 and its graph is depicted in Figure 1(c). As it was expected, by using the proposed Algorithm 1, even by very far initial state, the approximated state solution reaches to the desired function. In Figure 1(d), the is depicted. Serious changes during the computation for initial happened reach to the optimal control function . The adjoint state is computed and plotted in Figure 1(e). As it is expected from (58), the adjoint state corresponding to the must approximate the initial state (see Figures 1(b) and 1(c)). Figures 1(f) and 1(g) represent counter plots of the optimal state expectation and standard deviation. Finally, in Figure 1(h), the representation of coefficient matrix sparsity with the number of nonzero component in the computation is plotted in plane.
In Example 1 we considered an exponential desired function, with v-sharp points. In the next example the smooth desired function and boundary conditions are considered. This is the case that Algorithm 1 is more efficient to use.
Example 2. Consider the optimal control of Example 1, in which the right hand side function as well as boundary conditions is replaced by , , and , which emphasize that . Taking , the illustration of computed optimal control and corresponding state in addition to the global matrix sparsity, standard deviation and expected values of the solution, adjoint values and initial state approximation of the desired function is shown in Figure 2.
(a) Desired values
(b) Mean values
(c) Approximated solution (State)
(f) Expected value-SGS
(g) Standard dev.-SGS
(h) Sparsity of global matrix