Abstract

Input and state constraints are ubiquitous in chemical processes. Optimal control of chemical processes under coexisting inequality constraints on the input and state is challenging, especially when the process model is only partially known. The objective of this paper is to design an implementable optimal controller for chemical processes with a known model structure and unknown model parameters. To remove the barriers caused by the hybrid constraints and unknown model parameters, the inequality state constraints are first transformed into equality state constraints by using the slack function method. Then, adaptive dynamic programming (ADP) with a nonquadratic performance integrand is adopted to handle the augmented system with input constraints. The proposed approach requires only partial knowledge of the system, i.e., the model structure. The values of the model parameters are not required. The feasibility and performance of the proposed approach are tested on two nonlinear cases, including a continuous stirred-tank reactor (CSTR) example.

1. Introduction

Constraints on input and state commonly exist in chemical processes due to the finite capability of the actuators [1], safety limits [2], requirements on product quality, and environmental regulations [3]. For example, in practice, it is required to keep the outlet species concentrations of a chemical reactor within reasonable and stable ranges. In addition, the value of the manipulated variables is constrained to a certain range defined by the operating instructions. Therefore, the ability to handle constraints is an essential concern in the control design and synthesis of real chemical processes [4, 5].

The study of constrained optimal control has undergone different stages and can be classified into different categories, e.g., problem transformation methods, Lyapunov function-based methods, state-dependent Riccati equation (SDRE) methods, model predictive control (MPC), and machine learning-based methods, to name a few. Since the 1960s, numerous approaches have been proposed to handle the input and/or state constraints in the optimal control of linear/nonlinear systems. Leitmann [6] and Bryson and Denham [7] proposed a penalty-function technique for the optimal control of systems with state-variable inequality constraints; the solution of the penalized optimal control problem converges to that of the constrained optimal control problem as the penalty multiplier approaches infinity [8]. Sakawa [9] transformed the optimal control of linear systems with input constraints into an infinite-dimensional nonlinear programming problem by integrating the linear differential equation of the system states. Jacobson and Lele [10] used a slack variable method to eliminate scalar inequality constraints on the state variables. Hager [11] applied the Ritz–Trefftz method to the optimal control of systems with both state and control constraints. Vlassenbroeck [12] transformed the state-variable inequality constrained optimal control problem into a parameter optimization problem using a Chebyshev series expansion. Lim et al. [13] generalized the separation theorem to the constrained linear-quadratic (LQ) and linear-quadratic-Gaussian (LQG) optimal control problems. Manousiouthakis and Chmielewski [14] approximated the optimal control of nonlinear systems subject to pointwise-in-time inequality constraints in the SDRE framework. El-Farra and Christofides [3] developed a unified framework for constrained optimal control of nonlinear uncertain systems based on a general state-space Lyapunov approach. Balestrino et al. [15] designed a control Lyapunov R-function (CLRF) solution for constrained linear systems with input and state constraints. Kiefer et al. [16] first transformed the input and state constraints into constraints on the outputs and their higher-order derivatives and then utilized special saturation functions to incorporate the constraints into the overall control design. Stathopoulos et al. [17] studied linear-quadratic regulation (LQR) with input and state constraints. Using proximal algorithms and duality, they decomposed the corresponding quadratic program (QP) into two subproblems: an infinite-dimensional least squares problem and a simple clipping of an infinite sequence onto the nonpositive orthant.

MPC is a standard tool to handle input and state constraints within an optimal control setting [15], especially in industrial applications [18], e.g., paper and pulp [19], minerals processing [20], chemical engineering [21–23], renewable energy [24], mechatronics engineering [25], and urban water supply [26]. One problem in MPC implementation is the high computational load for large-scale or fast-sampling systems. Scokaert and Rawlings [27] solved the constrained LQR problem in a finite-dimensional MPC setting with optimality and stability. Wang and Wan [28] developed a structured neural network to solve the QP problem in constrained MPC. Bemporad [29] moved the MPC computations offline and proposed a technique to obtain the piecewise explicit optimal control law for both MPC and constrained linear-quadratic regulation (CLQR). By offline approximation, Pin et al. [30] designed an MPC scheme for constrained nonlinear discrete-time systems that allows coping with discontinuous control laws. Mhaskar [31] combined the control Lyapunov function (CLF) and MPC to form a Lyapunov-based MPC approach that guarantees stability and constraint satisfaction from an explicitly characterized set of initial conditions. More detailed reviews of constrained optimal control can be found in [32–34] and the references therein.

Application of the aforementioned methods relies on full knowledge of the system dynamics. With the increasing unavailability of quality raw materials, it is imperative that raw materials of low grade with large variations be employed in production in order to maximize the use of resources. In this context, the operation of some chemical processes exhibits complexity in terms of variable dynamic characteristics, strong nonlinearities, heavy coupling, unclear mechanisms, and mathematically unmodelable parts. Therefore, a common situation is that a chemical process has various working conditions. The structure of the process model can be derived by applying conservation laws and the essential physicochemical mechanisms. However, the model parameters are unknown under some working conditions, e.g., due to insufficient data samples for model identification or undetermined reaction mechanisms. This poses challenges to existing control theory and technology.

In recent years, the development of machine learning has supported and given rise to the emergence of data-driven constrained optimal control methods and the integration of machine learning with MPC [35]. Chakrabarty et al. [36] used a support-vector machine (SVM) to learn feasible region boundaries for explicit nonlinear model predictive control (ENMPC). Lin and Zheng [37] devised a reinforcement learning agent to obtain the optimal control strategy for nonlinear systems with inequality constraints on input and state via cycle-by-cycle finite-time optimization; full knowledge of the system model is still required in this approach. Abu-Khalaf and Lewis [38], Modares et al. [39], Luo et al. [40], Yang et al. [41], Zhang et al. [42], and Zhu et al. [43] used adaptive dynamic programming (ADP) [44–46] to approximate the optimal state-feedback controller for input-constrained nonlinear systems. Fan and Yang [47] applied ADP to approximate the optimal control law for state-constrained nonlinear systems. These ADP-based approaches are model-free or require only partial knowledge of the system model; however, they provide the optimal control solution for either input-constrained or state-constrained systems, not both. Chi et al. [48] studied constrained data-driven iterative learning control (ILC) for the point-to-point optimal control problem of discrete-time nonlinear systems with input and output constraints.

The aim of this study is to design an approximated optimal control algorithm for continuous-time nonlinear chemical processes with both state and input constraints. The proposed algorithm requires only structural knowledge of the system model; the values of the model parameters are not needed, which is a common situation in industrial applications. The input and state constraints are handled in a sequential manner. The state constraints are first eliminated by introducing slack functions to form an augmented system without inequality state constraints. Then, a nonquadratic performance integrand is adopted in the ADP framework to account for the input constraints in the augmented system.

The rest of this paper is organized as follows. In Section 2, the constrained optimal control problem is formulated and some preliminaries are introduced. The constraint handling approach and the approximated optimal control design are described in Section 3. Simulation results are presented and discussed in Section 4, followed by concluding remarks in Section 5.

2. Problem Formulation and Preliminaries

2.1. Formulation of the Nonlinear Constrained Optimal Control Problem

Consider a chemical process described by the following continuous-time constrained input-affine system, ẋ = f(x) + g(x)u, x ∈ X, u ∈ U, where x is the system state and u is the manipulated input; f and g are differentiable and Lipschitz continuous functions. X and U are compact sets that contain the equilibrium point, i.e., the origin, in their interiors. More specifically, X and U denote the hard constraints on the states and inputs, respectively. The state constraint is expressed through a Lipschitz continuous function of x, which describes the physical or technical limits on the system states, and the constraint sets are defined by the maximum and minimum value vectors of this function and of the input variables.

For chemical processes described by (1) with the system state and input constrained in X and U, the aim of optimal control is to find a control policy u which minimizes the following performance index, where V is the value function and Q(x) and R(u) are positive definite weighting functions of the immediate states and inputs, respectively. V is an overall index of the control performance; a lower value of V indicates better control performance. Q(x) evaluates the deviation of the system state from the origin, e.g., the difference between the actual outlet concentration of a chemical reactor and its set value. R(u) accounts for the control effort placed on the process, e.g., the dosage of additive for the reactor to change the outlet concentration. [0, ∞) is the time interval of interest.
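The performance index described above can be written in the standard infinite-horizon form; the following is a reconstruction consistent with the symbols V, Q, and R used in the surrounding text (the horizon and integrand form are the conventional choices, not quoted verbatim from the source):

```latex
V(x_0) = \int_{0}^{\infty} \left[ Q\big(x(t)\big) + R\big(u(t)\big) \right] \mathrm{d}t
```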

2.2. Preliminaries on Optimal Control of Nonlinear Systems

To start with, preliminaries on the optimal control of nonlinear systems are introduced without considering the state and input constraints. When the system is unconstrained, Q(x) and R(u) are usually selected as the quadratic forms xᵀQx and uᵀRu, with Q and R symmetric positive definite weighting matrices. The solution of the unconstrained optimal control problem can be obtained by solving a Hamilton–Jacobi–Bellman (HJB) equation with the boundary condition V(0) = 0, where Vₓ denotes the partial derivative of the cost function V with respect to the system state x. If the HJB equation has an optimal solution V*, then the optimal control u* is obtained from V*.
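For reference, with input-affine dynamics ẋ = f(x) + g(x)u and quadratic weights, the HJB equation and the resulting optimal control take the following standard forms (a reconstruction of the conventional expressions, not verbatim from the source):

```latex
0 = \min_{u}\Big[\, x^{\top} Q x + u^{\top} R u
      + V_x^{*\top}\big(f(x) + g(x)u\big) \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g^{\top}(x)\, V_x^{*}
```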

The HJB equation involves a nonlinear partial differential equation whose analytic solution is difficult to obtain. However, for the HJB equation, if a control policy u has been improved using its value function to generate a better policy, then the value function corresponding to the improved policy can be used to yield an even better policy. This approach to solving for the optimal control policy is called policy iteration, which gradually converges to the optimal policy and optimal value function.

Therefore, successive approximation methods have been developed to iteratively improve the control policy [49]. For system (1), starting from an initial admissible control, two sequences of value functions and control policies can be generated via the policy iteration (PI) approach defined in [49]: (1) for each iteration, solve the Lyapunov equation (LE), which is linear in V; (2) use the solution to update the control policy. The resulting sequences have the following properties: (1) the value functions are monotonically nonincreasing; (2) each updated policy is admissible; (3) if the optimal value function and optimal control exist, the sequences converge to them [50].
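The PI loop above can be sketched numerically. The following minimal example assumes a scalar linear system ẋ = ax + bu with cost ∫(qx² + ru²)dt (an illustrative system, not one from the paper), and alternates policy evaluation (a Lyapunov equation, linear in the value coefficient p) with policy improvement:

```python
# Policy iteration (PI) sketch for the scalar system xdot = a*x + b*u,
# cost = integral of q*x^2 + r*u^2, policy u = -k*x, value V(x) = p*x^2.
# Policy evaluation: solve 2*p*(a - b*k) + q + r*k**2 = 0 for p (linear in p).
# Policy improvement: u = -(b*p/r)*x, i.e., k_new = b*p/r.

def policy_iteration(a, b, q, r, k0, iters=50):
    k = k0  # initial admissible (stabilizing) gain: a - b*k0 < 0
    for _ in range(iters):
        p = -(q + r * k**2) / (2.0 * (a - b * k))  # Lyapunov equation
        k = b * p / r                              # policy improvement
    return k, p

# With a = -1, b = 1, q = r = 1, the gains converge to the LQR solution
# k* = sqrt(2) - 1 of the scalar algebraic Riccati equation.
k, p = policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0, k0=1.0)
```

Starting from the stabilizing gain k0 = 1, the gain sequence converges to the LQR gain √2 − 1 for these values, illustrating the convergence property (3) of the iteration.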

The above is the general iterative framework for solving the optimal control problem of nonlinear processes. Different from the unconstrained ideal case, for practical chemical processes, the evolution trajectory of the system must obey the constraints, so the feasible region of the state and input shrinks. These constraints must be taken into account in the optimal control design in order to avoid violating physical or technical limits.

3. Approximated Constrained Optimal Controller Design with Partially Known System Dynamics

In this section, an optimal controller design approach that does not require full knowledge of the system dynamics is proposed for constrained nonlinear systems. The state constraints are eliminated first by introducing slack functions. Then, the input constraints are managed by using ADP in the optimal control of the resulting augmented system. The iterative solution procedure introduced in Section 2 relies on precise knowledge of the system dynamics f and g. To eliminate this dependence, an approximated optimal control is studied and designed in this section.

3.1. Handling State Constraints Using Slack Functions

Consider the state constraint; it comprises the following inequality constraints:

There exist slack functions ([10, 47]) which satisfy, for each state constraint, a relation of the form (8). According to the structure of the constraint and the values of its upper and lower bounds, a suitable slack function can be selected.

By using (8), the inequality state constraints could be transformed into equality state constraints. For example, consider a single inequality constraint:

If the constraint takes this form, then the slack function is chosen such that
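As a concrete sketch of such a slack transformation (a sine-type slack is one common choice in the slack-variable literature; the bounds below are illustrative):

```python
import math

def sine_slack(s, x_min, x_max):
    """Map an unconstrained slack variable s to a value that always
    satisfies x_min <= x <= x_max (a sine-type slack function)."""
    mid = 0.5 * (x_max + x_min)   # midpoint of the admissible interval
    amp = 0.5 * (x_max - x_min)   # half-width of the interval
    return mid + amp * math.sin(s)
```

For any real s, the mapped value stays inside [x_min, x_max], so the inequality constraint on the state becomes an equality relating the state to the slack variable s.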

In addition, the relationship between the time derivative of the slack variable and the input can be obtained by differentiating (8) repeatedly until the input u appears, with the superscripts denoting the orders of the derivatives involved.

Denote the jth-order derivative of the slack variable as a new state, and decompose the augmented state accordingly. If there exist k state constraints and the corresponding slack variables, then combining (12) with system (1) forms the following augmented system:

From (8), for each i, if the slack variable exists and is bounded, then the state constraints are not violated. Therefore, with respect to the following performance index, with the weighting functions extended to the augmented state, if there exists an optimal controller for system (13) without state constraints, then it is also optimal for the state-constrained system (1):

3.2. Handling Input Constraints by Using Nonquadratic Performance Integrand

In order to incorporate the control constraints into the optimal control design, a nonquadratic, sufficiently differentiable performance integrand can be applied [51], constructed from vectors of bounded, monotonic functions that approximate the hard input constraints
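In [38, 51], for a symmetric input bound |u| ≤ ū, such a nonquadratic integrand is typically constructed from a bounded, monotonic, invertible function φ, with tanh a common choice; the following is the standard form, reconstructed rather than quoted:

```latex
R(u) = 2 \int_{0}^{u} \big( \bar{u}\,\phi^{-1}(v/\bar{u}) \big)^{\top} R \,\mathrm{d}v,
\qquad \phi(\cdot) = \tanh(\cdot)
```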

The corresponding Hamiltonian and HJB equations using the nonquadratic performance integrand are given below, where λ denotes the Lagrange multiplier.

If the solution of the HJB equation exists, the optimal control law follows directly and can be approximated by using a computational approach based on PI [38]: (1) choose an initial stabilizing admissible controller; (2) for each iteration, solve the following LE; (3) update the control law.
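With the tanh-type parameterization, the improved policy automatically respects the input bound. A minimal scalar sketch (u_max and grad_term are illustrative names; grad_term stands in for R⁻¹gᵀ(x)Vₓ evaluated at the current state):

```python
import math

def bounded_control(grad_term, u_max):
    """Constrained policy update u = -u_max * tanh(grad_term / (2 * u_max)).
    grad_term stands in for R^{-1} g(x)^T V_x; the tanh keeps |u| < u_max
    no matter how large the value-function gradient becomes."""
    return -u_max * math.tanh(grad_term / (2.0 * u_max))
```

Unlike a posteriori clipping, the bound is built into the control law itself, so the PI updates (22) and (23) never propose an infeasible input.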

3.3. Approximated Optimal Control Design

In Sections 3.1 and 3.2, the problems caused by the constraints on the states and inputs were eliminated, respectively. However, the PI denoted by (22) and (23) relies on full knowledge of the system dynamics. To eliminate this requirement, a computational approach [50] is applied and tailored. For notational simplicity, in this subsection, the augmented quantities are denoted by the same symbols x, f, g, and Q as their original counterparts.

To start with, decompose u into a feedback policy plus an exploration term, where u is the control actually applied to the system and the feedback policy is the component to be improved iteratively.

Consider a cost function satisfying (22):

Integrate it over a time interval:

Assume two infinite sequences of linearly independent smooth basis functions on the domain of interest, each vanishing at the origin. Approximate the cost function and the control policy as truncated expansions over these basis functions, with weight vectors to be determined.

In (28)–(32), the numbers of retained basis functions are two sufficiently large integers. Substitute equations (28)–(32) into (27), and evaluate the result at the sampling instants:

In (33), the unknowns are the entries of the weight vectors; considering a sufficiently long time sequence of samples, then, according to (33), a linear regression equation is obtained, in which the residual is the approximation error.

From (34), if the following rank condition on the regression matrix holds, then the weight vectors can be obtained in a least-squares sense:
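The batch update in (41) amounts to an ordinary least-squares solve of (34) for the stacked weight vector. A minimal two-unknown sketch using the normal equations (the regressor data below are illustrative):

```python
def least_squares_2(theta, xi):
    """Solve min ||theta @ w - xi||^2 for a 2-parameter weight vector w
    via the normal equations w = (theta^T theta)^{-1} theta^T xi,
    mirroring the batch update in (41)."""
    a = b = c = 0.0          # entries of the 2x2 matrix theta^T theta
    d0 = d1 = 0.0            # entries of theta^T xi
    for (t0, t1), y in zip(theta, xi):
        a += t0 * t0
        b += t0 * t1
        c += t1 * t1
        d0 += t0 * y
        d1 += t1 * y
    det = a * c - b * b      # invertibility = the rank condition in the text
    w0 = (c * d0 - b * d1) / det
    w1 = (a * d1 - b * d0) / det
    return w0, w1

# Data generated by y = 2*t0 - 3*t1 is recovered exactly.
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [2.0 * t0 - 3.0 * t1 for t0, t1 in rows]
w = least_squares_2(rows, ys)
```

The invertibility of the 2×2 normal matrix plays the role of the rank (excitation) condition required in the text.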

Equation (41), along with (28)–(32), serves as a computational approach to approximate the value function and generate an improved controller when an admissible control policy is given. By using this approach iteratively, starting from an admissible bounded control, a control sequence can be obtained that satisfies the following theorem.

Theorem 1. Consider system (1) with both state and input constraints, and augment it as (13) using an appropriate number of slack functions. If there exists an optimal controller and Assumptions 1 and 2 hold, then the control sequence generated by the PI ((41), (28), and (29)) is an admissible control sequence for (1). The cost function satisfies the following Lyapunov equation (LE), and the value functions are monotonically nonincreasing along the iterations. In addition, the value function and control sequences converge uniformly to their optimal counterparts.

Proof. The proof of this theorem follows the same lines of reasoning as in the proof of Lemma 1 and Theorem 1 in [38] and is omitted here for brevity.

Assumption 1. There exist positive constants such that, for all samples, the following condition holds, where the quantity concerned is the kth row of the regression matrix.

Assumption 2. The closed-loop system composed of (18) and the dynamics below is ISS (input-to-state stable) [52] when the exploration noise e is considered as the input.

4. Case Study

In this section, two cases, including a CSTR (continuous stirred-tank reactor) model [14], are selected to test the feasibility and performance of the proposed control design.

4.1. A Nonlinear System

Consider the following nonlinear system:with state and control constraints and , and initial condition .

To start with, one slack function was introduced:

According to (11) and (12), the corresponding slack relations are obtained. Denote the slack variable as an augmented state, and combine it with the original system:

For simulation purposes, the weight on the augmented state was set to zero, so the augmented state is not taken into account in the value function; thus, the value function mainly concerns the deviation between the actual state and its set point. The basis functions for the approximation of the value function and control input were selected as polynomials of the system states:

The initial weight vector was determined using the structural information of the system model and the basis functions. The actual input was set as the sum of the current policy and the exploration noise
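A common construction for such an exploration noise is a bounded sum of sinusoids with distinct frequencies (the frequencies and amplitude below are illustrative, not the values used in the simulation):

```python
import math

def exploration_noise(t, freqs=(1.0, 3.0, 7.0, 11.0), amp=0.1):
    """Bounded excitation signal: a sum of sinusoids with distinct
    frequencies, a standard choice for making the collected data
    sufficiently rich (persistently exciting)."""
    return amp * sum(math.sin(w * t) for w in freqs)
```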

The ADP iteration started after the learning finished at . After 20 iterations, the difference between and was less than , and the ADP algorithm stopped. The resultant approximated optimal controller was .

The state trajectories driven by the initial admissible controller and the approximated optimal controller are given in Figure 1. The approximated cost difference is illustrated in Figure 2. The control inputs under the two controllers are shown in Figure 3. From the figures, both the initial admissible controller and the approximated optimal controller can force the system to the equilibrium point. However, the state trajectories driven by the approximated optimal controller exhibit fewer fluctuations and a lower approximated control cost compared with those under the initial admissible controller.

4.2. A CSTR Model

Consider a reversible reaction taking place in a CSTR, with given forward and reverse kinetic constants with respect to species B. The dynamics of the process are given below, where the first two states are the deviations of the outlet concentrations of species A and B, the input is the deviation of the inlet concentration of species A, and the remaining symbols are constant parameters. V and F are the volume and inlet flow rate of the reactor, respectively. The constraints, initial conditions, and values of the model parameters are shown in Table 1.
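For reference, the (non-deviation) mass balances of a first-order reversible reaction A ⇌ B in a CSTR take the form sketched below; k1, k2, and F/V are illustrative placeholder values, not the parameters of Table 1:

```python
def cstr_step(ca, cb, ca_in, dt, k1=1.0, k2=0.5, F_over_V=0.2):
    """One forward-Euler step of the mass balances for A <=> B:
       dCA/dt = (F/V)*(CA_in - CA) - k1*CA + k2*CB
       dCB/dt = -(F/V)*CB + k1*CA - k2*CB
    Parameter values are illustrative, not those in Table 1."""
    dca = F_over_V * (ca_in - ca) - k1 * ca + k2 * cb
    dcb = -F_over_V * cb + k1 * ca - k2 * cb
    return ca + dt * dca, cb + dt * dcb

# Simulate to (near) steady state with a constant inlet concentration.
ca, cb = 1.0, 0.0
for _ in range(20000):
    ca, cb = cstr_step(ca, cb, ca_in=1.0, dt=0.01)
```

At steady state the total outlet concentration equals the inlet concentration of A; the deviation variables used in the text are obtained by subtracting such a steady state from the physical concentrations.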

According to Section 3.1, two slack functions were introduced first to transform the inequality constraints into equality constraints:

The steady-state values were calculated by setting the time derivatives to zero. The initial values of the slack variables were calculated using the initial values of the concentrations. By differentiating (51) and (52), the derivatives of the slack variables were obtained:

Then, (54) was differentiated until the input u appeared in the resulting expression:

Denoting the slack variables and their derivatives as augmented states and combining the original system with the derived equality constraints, the following augmented system was obtained:

In the simulation, the weights on the augmented states were set to zero, so the augmented states were not considered when evaluating the control performance. Compared with driving the outlet concentrations to their set points in time, the magnitude of the input is of secondary importance, so a small input weighting was used. The basis functions were selected as polynomials of the system states. The initial weight vector was specified accordingly, and the exploration noise was set as a bounded excitation signal.

The ADP iteration started after the learning finished at . After 20 iterations, the difference between and was less than , and the ADP algorithm stopped. The resultant approximated optimal controller was .

The state trajectories of the two concentration deviations under the initial admissible controller and the approximated optimal controller are given in Figures 4 and 5, respectively. The approximated cost difference is illustrated in Figure 6. The control inputs under the two controllers are shown in Figure 7. Compared with the initial admissible controller, the approximated optimal controller can force the CSTR to the desired working point with less steady-state error and better performance with respect to the performance index.

In the simulation, structural knowledge of the system model was required to derive the explicit formulations of (47) and (56). However, the numerical values of the slack variables can be obtained without knowing the full system dynamics; only the model structure is required. Thus, by learning from online input and state information, the performance of the controller was improved.

5. Conclusions

An approximated optimal control approach for chemical processes with both input and state constraints was proposed. Its feasibility and performance were tested via two nonlinear examples. The proposed approach requires only structural knowledge of the system model; the values of the model parameters are not needed. This indicates that the proposed approach can be applied to practical systems or single units (like the CSTR in this study) with a determined model structure but unknown parameters under some working conditions. However, the proposed approach is applicable only when Assumptions 1 and 2 hold. The global stability issue still needs future study.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. 61703441).