Abstract

A computational approach is proposed for solving the discrete time nonlinear stochastic optimal control problem. Our aim is to obtain the optimal output solution of the original optimal control problem by solving the simplified model-based optimal control problem iteratively. In our approach, adjusted parameters are introduced into the model used such that the differences between the real system and the model used can be computed. In particular, system optimization and parameter estimation are integrated interactively. Moreover, the output measured from the real plant is fed back into the parameter estimation problem to establish a matching scheme. During the calculation procedure, the iterative solution is updated in order to approximate the true optimal solution of the original optimal control problem despite model-reality differences. For illustration, a wastewater treatment problem is studied, and the results show the efficiency of the approach proposed.

1. Introduction

Many real world problems can be formulated as stochastic dynamical systems [1–3]. In the presence of random noises, the exact state trajectory cannot be obtained, and the output sequence measured from the process plant is also unavoidably disturbed. Since the fluctuating output sequence is the actual outcome, reproducing such an outcome from a mathematical model is a challenging task. In stochastic systems, estimation, identification, and adaptive control are the general techniques [4]. In particular, Kalman filtering theory and the extended Kalman filter have had a great impact on the study of stochastic systems, in both the linear and the nonlinear cases [5–8]. Data-driven methods, which can be applied in fault diagnosis, provide an efficient identification approach for stochastic systems; using process data to identify parameters without knowing the actual process model is one of their advantages [9]. In addition, stochastic switching systems, which are subject to random abrupt changes in their dynamics, have attracted researchers to the design, modeling, control, and optimization of stochastic systems [10, 11].

The use of stochastic systems therefore plays an important role in real world applications. The development of solution methods and the corresponding practical analyses contribute to the stochastic research community, ranging from engineering to business. In the literature, the applications of stochastic systems are well documented; see, for example, power management [12], portfolio selection [13], financial market debt crises [14], insurance with bankruptcy return [15], annuity contracts [16], natural gas networks [17], brain-machine interface operation [18], multi-degree-of-freedom systems [19], fleet composition problem [20], fault diagnosis [21], network control system [22], and stochastic switching system [23–25].

In this paper, we propose a computational approach for the optimal control of nonlinear stochastic dynamical systems in discrete time. Our aim is to obtain the optimal output solution of the original optimal control problem from a mathematical model. In doing so, a model-based optimal control problem is simplified from the original optimal control problem, and adjusted parameters are introduced into the model used. In this way, the differences between the real system and the model used can be computed, so that system optimization and parameter estimation are integrated interactively. On the other hand, the output, which is measured from the real plant, is fed back into the parameter estimation problem. This operation establishes a matching scheme which, in turn, updates the optimal solution of the model used at each iteration step. Notice that this operation, which is the advantage of the algorithm proposed, is in contrast to the works discussed in [26–28], where the real output is fed back into the system optimization problem. When convergence is achieved, the iterative solution approximates the true optimal solution of the original optimal control problem, in spite of model-reality differences. Hence, the approach proposed is efficient.

The rest of the paper is organized as follows. In Section 2, the discrete time nonlinear stochastic optimal control problem is described and the corresponding simplified model-based optimal control problem is formulated. In Section 3, an expanded optimal control model, which integrates system optimization and parameter estimation interactively, is introduced. Then, the iterative algorithm based on the principle of model-reality differences is derived, and the computation procedure is summarized. In Section 4, a convergence analysis is provided. In Section 5, the optimal control of a wastewater treatment problem is illustrated as an example. Finally, some concluding remarks are made.

2. Problem Statement

Consider the following discrete time nonlinear stochastic optimal control problem:

 minimize J₀(u) = E[ φ(x(N), N) + Σ_{k=0}^{N−1} L(x(k), u(k), k) ]
 subject to
 x(k+1) = f(x(k), u(k), k) + Gω(k), x(0) = x₀,
 y(k) = h(x(k), k) + η(k), (1)

where u(k) ∈ ℝ^m, k = 0, 1, …, N − 1, x(k) ∈ ℝ^n, and y(k) ∈ ℝ^p, k = 0, 1, …, N, are, respectively, the control sequence, the state sequence, and the output sequence. ω(k) and η(k), k = 0, 1, …, N, are the stationary Gaussian white noise sequences with zero mean, and their covariances are given by Q_ω and R_η, which are positive definite matrices. G is a process coefficient matrix, f represents the real plant, and h is the output measurement. J₀ is the scalar cost function and E[·] is the expectation operator, whereas φ is the terminal cost and L is the cost under summation. It is assumed that all functions in (1) are continuously differentiable with respect to their respective arguments.

The initial state is x(0) = x₀, where x₀ is a random vector with mean and covariance given, respectively, by

 E[x₀] = x̄₀, E[(x₀ − x̄₀)(x₀ − x̄₀)ᵀ] = M₀.

Here, M₀ is a positive definite matrix. It is assumed that the initial state, the process noise, and the measurement noise are statistically independent.

This problem is referred to as Problem (P).
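To make the role of the expectation operator in the cost concrete, the following sketch estimates the expected cost of a fixed control sequence by Monte Carlo simulation for a hypothetical scalar plant of the same general form, x(k+1) = f(x(k), u(k)) + ω(k). The map f, the quadratic stage cost, and all numerical values are illustrative assumptions, not the system treated in this paper.

```python
import random

def f(x, u):
    # hypothetical scalar process dynamics (illustrative only)
    return 0.9 * x + 0.5 * u

def expected_cost(u_seq, x0_mean, x0_std, w_std, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[phi(x(N)) + sum_k L(x(k), u(k))]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        x = rng.gauss(x0_mean, x0_std)            # random initial state x0
        cost = 0.0
        for u in u_seq:
            cost += 0.5 * (x * x + u * u)         # stage cost L(x, u)
            x = f(x, u) + rng.gauss(0.0, w_std)   # propagate with process noise
        cost += 0.5 * x * x                       # terminal cost phi(x(N))
        total += cost
    return total / n_trials

J0 = expected_cost([0.0] * 10, x0_mean=1.0, x0_std=0.1, w_std=0.05)
```

Setting the noise standard deviations to zero collapses the estimate to the deterministic cost, which is a convenient sanity check.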

Because of the complexity in the structure of the real plant and the presence of the random sequences, the exact solution of Problem (P) cannot be obtained. Moreover, applying the nonlinear filtering theory to estimate the state dynamics is computationally demanding. In view of this, we propose a simplified model-based optimal control problem, which is constructed by carrying out the linearization of Problem (P), in order to approximate the true optimal solution of the original optimal control problem iteratively. This simplified model-based optimal control problem, which is referred to as Problem (M), is given by

 minimize J₁(u) = ½ x̄(N)ᵀ S(N) x̄(N) + γ(N) + Σ_{k=0}^{N−1} [ ½ x̄(k)ᵀ Q x̄(k) + ½ u(k)ᵀ R u(k) + γ(k) ]
 subject to
 x̄(k+1) = A x̄(k) + B u(k) + α(k), x̄(0) = x̄₀,
 ȳ(k) = C x̄(k) + β(k), (4)

where x̄(k) ∈ ℝ^n and ȳ(k) ∈ ℝ^p, k = 0, 1, …, N, are the expected state sequence and the expected output sequence, respectively, and x̄₀ is the mean of the initial state. γ(k) ∈ ℝ, α(k) ∈ ℝ^n, and β(k) ∈ ℝ^p are adjustable parameters. A is a state transition matrix, B is a control coefficient matrix, and C is an output coefficient matrix, while S(N) and Q are positive semidefinite matrices and R is a positive definite matrix.

Notice that, without the adjustable parameters, Problem (M) is a standard linear quadratic regulator (LQR) optimal control problem. Solving this problem alone will not give the optimal solution of the original optimal control problem. However, by adding the adjustable parameters into the model used, the differences between the real system and the model used can be computed such that system optimization and parameter estimation are integrated interactively.
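For reference, the LQR backbone mentioned above is solved by a backward Riccati sweep. The scalar Python sketch below shows the textbook recursion; the values of A, B, Q, R, and S(N) are hypothetical, and this is the standard LQR recursion rather than the modified form derived later in the paper.

```python
def lqr_backward_sweep(A, B, Q, R, S_N, N):
    """Backward Riccati sweep for a scalar discrete-time LQR."""
    S = [0.0] * (N + 1)   # Riccati sequence S(k)
    K = [0.0] * N         # feedback gains K(k)
    S[N] = S_N
    for k in range(N - 1, -1, -1):
        # gain K(k) = (R + B S(k+1) B)^{-1} B S(k+1) A  (scalar case)
        K[k] = (B * S[k + 1] * A) / (R + B * S[k + 1] * B)
        # Riccati update S(k) = Q + A S(k+1) (A - B K(k))
        S[k] = Q + A * S[k + 1] * (A - B * K[k])
    return S, K

S, K = lqr_backward_sweep(A=0.9, B=0.5, Q=1.0, R=1.0, S_N=1.0, N=10)
# the optimal control is then u(k) = -K[k] * x(k)
```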

3. System Optimization with Parameter Estimation

Now, let us introduce an expanded optimal control problem, which is referred to as Problem (E). It augments Problem (M) with the additional variables v(k), k = 0, 1, …, N − 1, and z(k), k = 0, 1, …, N, which are introduced to separate the control and the expected state from the respective signals in the parameter estimation problem, and ‖·‖ denotes the usual Euclidean norm. The terms ½r₁‖u(k) − v(k)‖² and ½r₂‖x̄(k) − z(k)‖² are introduced to improve convexity and enhance convergence of the resulting iterative algorithm. It is important to note that the algorithm is to be designed such that the constraints v(k) = u(k) and z(k) = x̄(k) will be satisfied at the end of the iterations. In this situation, the state estimate z(k) and the control v(k) will be used for the computation in the parameter estimation and the matching schemes. On the other hand, the corresponding expected state x̄(k) and control u(k) will be reserved for optimizing the model-based optimal control problem.

It is important to note that the output measured from the real plant is fed back into the parameter estimation problem and the matching scheme, which aims at repeatedly updating the model output from the model-based optimal control problem. On this basis, the output residual can be reduced such that the model output closely approximates the real output, in spite of model-reality differences. This improvement enhances the accuracy of the output solution, as discussed in [26–28].

3.1. Optimality Conditions

Define the Hamiltonian function for Problem (E) as in [28–30]. Then, attaching the constraints of Problem (E) with the appropriate multipliers, which are to be determined later, the augmented cost function (7) is obtained.

Applying the calculus of variation [26, 27, 29–31] to (7), the following necessary optimality conditions are obtained:
(a) the stationary condition (8a);
(b) the costate equation (8b);
(c) the state equation (8c), with the given boundary conditions;
(d) the adjustable parameter equations (9a), (9b), (9c), and (9d);
(e) the multiplier equations (10a), (10b), and (10c), with the given boundary conditions;
(f) the separable variables (11).

In view of these necessary optimality conditions, conditions (8a), (8b), and (8c) are the necessary conditions for the modified model-based optimal control problem, conditions (9a), (9b), (9c), and (9d) define the parameter estimation problem, and conditions (10a), (10b), and (10c) are used to compute the multipliers.

3.2. Feedback Control Law

Taking the necessary optimality conditions (8a), (8b), and (8c), the modified model-based optimal control problem, which is referred to as Problem (MM), is defined by (12).

To solve Problem (MM), we construct a feedback control law that incorporates the model-reality differences into the system optimization. Hence, with the adjustable parameters fixed at their determined values, the corresponding result is stated in the following theorem.

Theorem 1 (expanded optimal control law). Suppose that the expanded optimal control law for Problem (E) exists. Then, this optimal control law is the feedback control law for Problem (MM) given by (13), where the associated gain and feedforward recursions (14a), (14b), (14c), and (14d) are satisfied with the given boundary conditions.

Proof. From (8a), the stationary condition can be rearranged as (16). Applying the sweep method [28, 30, 31], that is, substituting (17) into (16), yields (18). Then, considering the state equation (8c) in (18) and performing some algebraic manipulations, the feedback control law (13) is obtained, where (14a) and (14b) are satisfied.
From (8b), after substituting (17) into (8b), the costate equation is rewritten as (19). Considering the state equation (8c) in (19), we have (20). Using the feedback control law (13) in (20) and doing some algebraic manipulations by considering (14a) and (14b), it is found that (14c) and (14d) are satisfied after comparing the result of the manipulation to (17). This completes the proof.

Taking (13) in (8c), the state equation becomes (21), and the output is measured from (22).

3.3. Adjustable Parameters and Multipliers

Now, we apply the separable variables given in (11) to solve the parameter estimation problem defined in (9a), (9b), (9c), and (9d). Our aim is to establish the matching scheme, where the differences between the real system and the model used are taken into account. Consequently, the adjusted parameters resulting from the parameter estimation problem are calculated from (23a), (23b), (23c), and (23d).
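The essence of the parameter estimation step is that, at the current iterate, an adjustable parameter absorbs the gap between the real plant map and the linear model, so that the model corrected by the parameter reproduces the plant exactly at the matched point. The scalar Python sketch below illustrates this mechanism; the plant map f_real and the coefficients A and B are hypothetical, and the sketch mirrors the spirit of the parameter estimation step rather than its exact form.

```python
A, B = 0.9, 0.5                     # hypothetical linear model coefficients

def f_real(x, u):
    # hypothetical "real plant" map with a nonlinear term
    return 0.9 * x + 0.5 * u + 0.1 * x * x

def alpha(x, u):
    """Model-reality difference absorbed by the adjustable parameter."""
    return f_real(x, u) - (A * x + B * u)

# at the matched point, the model plus the parameter reproduces the plant
x, u = 2.0, -1.0
gap = f_real(x, u) - (A * x + B * u + alpha(x, u))
```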

The multipliers, which are related to the Jacobian matrices of the real plant, the output measurement, and the cost under summation with respect to the state and the control, are computed from (24a), (24b), and (24c).
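When analytic derivatives of the real plant and the output measurement are not available, the Jacobians entering the multiplier computation can be approximated numerically. The Python sketch below uses a central finite difference; the two-state map f is a hypothetical example, not the plant of this paper.

```python
def jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func: R^n -> R^m evaluated at x."""
    n, m = len(x), len(func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J

def f(x):
    # hypothetical two-state nonlinear map (illustrative only)
    return [0.9 * x[0] + 0.1 * x[0] * x[1], 0.8 * x[1] + 0.2 * x[0]]

Jf = jacobian(f, [1.0, 2.0])
```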

3.4. Iterative Algorithm

From the discussion above, the resulting iterative algorithm is summarized below.

Iterative Algorithm

Data. Specify the matrices and parameters of Problem (M), the noise covariances, the convexity weights, the relaxation gains, and the tolerance. Note that the state transition and control coefficient matrices may be chosen through the linearization of the real plant, and the output coefficient matrix is obtained from the linearization of the output measurement.

Step 0. Compute a nominal solution. Assume that the adjustable parameters and the multipliers are zero, and solve Problem (M) defined by (4) to obtain the nominal control, expected state, and expected output sequences. Then, with the data given, compute the required gain terms from (14b) and (14c), respectively. Set the iteration index i = 0 and take the nominal solution as the initial iterate.

Step 1. Compute the adjustable parameters from (23a), (23b), (23c), and (23d). This is called the parameter estimation step.

Step 2. Compute the modifiers from (24a), (24b), and (24c). This requires the partial derivatives of the real plant, the output measurement, and the cost under summation with respect to the state and the control.

Step 3. With the adjustable parameters and the modifiers determined, solve Problem (MM) defined by (12) using the result given in Theorem 1. This is called the system optimization step.
(3.1) Solve (14d) backward to obtain the feedforward term, and solve (14a), either backward or forward, to obtain the Riccati-type solution.
(3.2) Use (13) to obtain the new control.
(3.3) Use (21) to obtain the new expected state.
(3.4) Use (17) to obtain the new costate.
(3.5) Use (22) to obtain the new expected output.

Step 4. Test the convergence and update the optimal solution of Problem (M). In order to provide a mechanism for regulating convergence, a simple relaxation method is employed, in which the updated iterate is obtained by moving from the previous iterate toward the newly computed solution through scalar gains in (0, 1]. If the changes in the control, expected state, and expected output sequences are within a given tolerance, stop; else set i = i + 1 and repeat Steps 1–4.
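The relaxation update and the stopping test of Step 4 can be sketched in a few lines of Python; the gain and tolerance below are illustrative values, not recommendations from the paper.

```python
def relax(old, new, gain):
    """Elementwise relaxation: next = old + gain * (new - old), gain in (0, 1]."""
    return [o + gain * (n - o) for o, n in zip(old, new)]

def averaged_change(old, new):
    """Averaged 2-norm of the update, used as the convergence measure."""
    return (sum((n - o) ** 2 for o, n in zip(old, new)) / len(old)) ** 0.5

u_prev = [0.0, 0.0, 0.0]          # current iterate
u_star = [1.0, -1.0, 0.5]         # newly computed solution of the model problem
u_next = relax(u_prev, u_star, gain=0.4)
converged = averaged_change(u_prev, u_next) < 1e-6
```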

Remarks
(a) The offline computation, as stated in Step 0, calculates the quantities required for the control law design. These are then used for solving Problem (M) in Step 0 and for solving Problem (MM) in Step 3, respectively.
(b) The adjustable parameters and the multipliers are zero in Step 0. Their calculated values, obtained in Step 1, Step 2, and Step 3, respectively, change from iteration to iteration.
(c) The driving input in (14a) corrects the differences between the real plant and the model used, and it also drives the controller given in (13).
(d) Problem (P) does not need to be linear or to have a quadratic cost function.
(e) The separability conditions are required to be satisfied by the converged optimal control sequence and the converged state estimate sequence. The averaged 2-norms of the corresponding residuals are computed and compared with a given tolerance to verify their convergence.
(f) The relaxation scalars are the step sizes in the mechanism for regulating convergence. They can normally be chosen as a value in (0, 1], but this choice may not result in the optimal number of iterations; the optimal choice is problem dependent and requires running the algorithm (Steps 1–4) several times. The gains are initially chosen for the first run, and the algorithm is then rerun with different values chosen from 0.1 to 0.9, from which the value providing the optimal number of iterations can be determined. The convexity weighting parameters enhance the convexity so as to improve the convergence of the algorithm.

4. Convergence Analysis

In this section, we aim to provide a convergence analysis for the proposed computation procedure. The following assumptions are needed.

Assumption 1. (a) The derivatives of the real plant, the output measurement, and the cost under summation exist; (b) there exists an exact optimal expected solution to Problem (P) with an associated optimal real output solution.
The convergence of the approximate solution to the true optimal solution is addressed in the following theorem.

Theorem 2. Let the converged solution of Problem (E) be obtained. Then, there exist a converged output sequence and a corresponding original output sequence such that the two coincide in the limit. That is, the converged solution is the true optimal solution of Problem (P), and the converged output sequence is the true real output sequence.

Proof. Consider the real system of Problem (P) with the exact optimal expected solution, whose optimal real output solution is given by (28a), (28b), and (28c), where the real output is the sum of the exact optimal expected output and the output noise sequence. Meanwhile, in Problem (M), the model used consists of (29a) and (29b). Here, taking the adjusted parameters from (23a) and (23b), the differences between the real system and the model used can be calculated from (30a) and (30b) at each iteration i. Note that (30a) and (30b) establish a matching scheme in which, for any given tolerance, there exists an iteration index beyond which the differences remain within that tolerance. Hence, by substituting (30a) and (30b) into (29a) and (29b) and comparing the result to (28a), (28b), and (28c), we conclude that the converged solution and the converged output sequence are, respectively, the optimal expected solution and the optimal real output solution of the original optimal control problem. This completes the proof.

5. Illustrative Example

Consider the wastewater treatment problem [32–34]. The process equations, which are assumed to be unknown, are given by (34), where the states are the methane gas flow rate and the substrate output concentration, the control is the wastewater/dilution substance mix rate, the inputs are the input flow rate and the bacterial growth rate, and the sampling interval is 0.5 seconds. The initial state has a given mean, and its covariance is a multiple of the two-dimensional identity matrix. The process noise and the measurement noise have zero mean with given covariances. Here, the aim is to determine an optimal control sequence such that the cost function is minimized subject to the dynamic system given by (34).

This problem is regarded as Problem (P). The corresponding simplified model-based optimal control problem, which is referred to as Problem (M), is formulated in the form of (4), with the given initial expected state.

By applying the algorithm proposed to solve Problem (P), the computational results are shown in Table 1. The cost function is reduced by 99.27 percent, giving a final cost of 1.1810 units. The graphical results, which present the trajectories of output, state, and control, are shown in Figures 1, 2, and 3, respectively. It is noticed that the model output sequence tracks the real output sequence closely, with a small output residual. Both smooth trajectories of state and control show the optimal expected solution to the original optimal control problem.

Now, consider the target sequence that is a periodic square wave, taking one value for the first 48 time points and another value for the remaining time points, as discussed in [34]. Let this target sequence be the real state sequence in Problem (P). Then, the real output sequence is measured and fed back into the parameter estimation problem. Here, the model used in Problem (M) remains the same as mentioned above. Figure 4 shows that the model output trajectory generated by the algorithm proposed tracks the target sequence accordingly.

From the results above, the output sequence obtained using the approach proposed demonstrates that the approach is efficient for the optimal control of the discrete time nonlinear stochastic system. Hence, the applicability of the approach proposed is highlighted.

6. Concluding Remarks

In this paper, a computational approach was proposed to obtain an efficient output solution of the discrete time nonlinear stochastic optimal control problem. In our approach, the model-based optimal control problem is simplified from the original optimal control problem, and adjusted parameters are introduced into the model used. On this basis, the differences between the real system and the model used are computed, and system optimization and parameter estimation are integrated interactively. Establishing the matching scheme by feeding the output sequence measured from the real plant back into the model used improves the accuracy of the model output sequence. For illustration, the wastewater treatment problem was studied, and the efficiency of the approach proposed was clearly demonstrated.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.