Abstract

Engineers strive to realize the goal of control in the best possible way, based on a given quality criterion, when developing a control system. The well-known optimal control problem requires finding a solution as a function of time. Such a function cannot be directly used to control a real object because its application corresponds to an open-loop control system, and engineers usually complete it with a feedback stabilization system. Hence, there is an obvious need to reformulate the optimal control problem so that its solution can be directly applied to real objects. The paper presents a new extended statement of the optimal control problem. An additional requirement is imposed on the control function to give the system describing the control object properties that ensure the stability of solutions. The desired control function must give the optimal trajectory the properties of an attractor in its neighbourhood. The solution to the extended optimal control problem can be directly used to control a real object. The paper presents a computational machine learning approach to solving the extended optimal control problem based on the synthesized optimal control technique. Examples of the practical solution of the stated problem are given to illustrate the efficiency of the approach; the solution of the conventional optimal control problem is compared with the proposed extended one in the presence of perturbations in the models and initial conditions.

1. Introduction

The conventional optimal control problem [1] is stated in the following form:

$$\dot{x} = f(x, u), \quad x(0) = x^0, \quad x(t_f) = x^f, \quad J = \int_0^{t_f} f_0(x(t), u(t))\,dt \rightarrow \min, \quad (1)$$

where $x \in \mathbb{R}^n$ is the state vector, $u \in U \subseteq \mathbb{R}^m$ is the control vector, the terminal time $t_f$ is not given, but is limited, $t_f \le t^+$, $U$ is a compact set, and $t^+$ is a given positive number.

In this problem (1), it is necessary to find a control function

$$u = v(t) \in U, \quad t \in [0, t_f], \quad (2)$$

such that, if it is inserted into the model equations, the following system is obtained:

$$\dot{x} = f(x, v(t)), \quad (3)$$

and the partial solution $x(t, x^0)$ of this system (3) from the given initial condition $x^0$ will hit the terminal condition $x(t_f) = x^f$ with the optimal value of the quality criterion

$$J = \int_0^{t_f} f_0(x(t), v(t))\,dt. \quad (4)$$
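
To make the statement concrete, the following minimal sketch integrates the closed system (3) for a given open-loop control $v(t)$ and accumulates the criterion (4). The fourth-order Runge–Kutta scheme and the default integrand $f_0 \equiv 1$ (a time-optimal cost) are illustrative assumptions, not part of the problem statement.

```python
# Minimal sketch: integrate the closed system (3) for a given open-loop
# control v(t) and evaluate the criterion (4). The model f, the control v,
# and the integrand f0 are placeholders supplied by the caller.
import numpy as np

def rk4_step(g, x, t, dt):
    """One classical Runge-Kutta step for dx/dt = g(t, x)."""
    k1 = g(t, x)
    k2 = g(t + dt / 2, x + dt / 2 * k1)
    k3 = g(t + dt / 2, x + dt / 2 * k2)
    k4 = g(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(f, v, x0, tf, dt=1e-3, f0=lambda x, u: 1.0):
    """Integrate dx/dt = f(x, v(t)) from x0 and accumulate J = int f0 dt."""
    x, t, J = np.asarray(x0, dtype=float), 0.0, 0.0
    while t < tf:
        J += f0(x, v(t)) * dt                     # quality criterion (4)
        x = rk4_step(lambda s, y: f(y, v(s)), x, t, dt)
        t += dt
    return x, J
```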

Since then, the optimal control problem statements (1) and (2) have been modified many times. Different types of state constraints have been introduced [2–4], such as the terminal state equality constraint [5], inequality phase constraints [6], and dynamic phase constraints in the task for a group of objects [7]. Another direction is optimal control under uncertainty, that is, optimally controlling systems subject to uncertain influences [8]. Nevertheless, all these formulations have in common that control is sought as a function of time (2).

The application challenge is that the optimal control in the form of a function of time (2) cannot be directly applied to a real object because of the existing uncertainties. Sometimes these uncertainties are insignificant and have practically no effect on the value of the functional; more often, however, the quality of control degrades significantly, or the goal is not reached at all [9]. Optimal control (2) is open-loop, and any disturbance of the object can lead to the goal not being reached and the value of the criterion not being optimal.

The Bellman problem statement differs significantly from the optimal control problem statement. It generally leads to the solution of another problem, the general control synthesis problem, where control is sought as a function of the state $x$, which is the feedback control principle. The main challenge here is that the continuous-time nonlinear optimal control problem is solved through the Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation. Even in simple cases, the HJB equation may not have global analytic solutions. In the general formulation, the synthesis problem does not yet have a general solution method. In this feedback case, it is necessary to look for an optimal control function that brings the object to the terminal state from a set of initial conditions. The dynamic programming method, as the main numerical approach to the Bellman problem statement, does not find such a function but instead finds a set of control vectors for a set of state vectors. Various numerical methods based on dynamic programming have been proposed in the literature [10, 11], including modern adaptive dynamic programming techniques [12, 13] and reinforcement learning [14, 15] that approximate the needed function using machine learning, but the solution is still searched from one initial state, and in this case, it is not much different from the solution in the form of a function of time [16]. However, the main drawback of dynamic programming methods remains the computational complexity of describing the value function, which grows exponentially with the dimension of its domain.
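
To illustrate the exponential growth mentioned above, the toy sketch below runs grid-based value iteration for a discretized double integrator; the grid, the control samples, and the dynamics are illustrative assumptions. The value table has $N^2$ entries here and would have $N^n$ entries for an $n$-dimensional state, which is precisely the curse of dimensionality.

```python
# Toy grid-based dynamic programming (value iteration) for a discretized
# double integrator; minimum-time cost with stage cost dt. All settings are
# illustrative. The table V has N**n entries for an n-dimensional state.
import itertools
import numpy as np

N, dt = 21, 0.1
xs = np.linspace(-1.0, 1.0, N)                    # grid per state coordinate
controls = np.array([-1.0, 0.0, 1.0])             # admissible control samples
V = np.zeros((N, N))                              # value table over (x1, x2)

def nearest(z):
    """Index of the grid node closest to z, clipped to the grid."""
    return int(np.clip(round((z + 1.0) / 2.0 * (N - 1)), 0, N - 1))

for _ in range(200):                              # Bellman backups
    V_new = np.empty_like(V)
    for i, j in itertools.product(range(N), range(N)):
        x1, x2 = xs[i], xs[j]
        if abs(x1) < 0.05 and abs(x2) < 0.05:     # terminal set: cost-to-go 0
            V_new[i, j] = 0.0
            continue
        best = np.inf
        for u in controls:                        # one-step lookahead over u
            i2, j2 = nearest(x1 + x2 * dt), nearest(x2 + u * dt)
            best = min(best, dt + V[i2, j2])
        V_new[i, j] = best
    V = V_new
```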

Since the original optimal control problem is much easier than the Bellman synthesis problem, and a vast pool of solution methods, both direct and indirect, has been accumulated for it, control specialists tacitly accept the following strategy: first find the optimal control, that is, a partial solution of equation (3) giving the optimal program trajectory, and then stabilize the movement of the real object along this optimal trajectory, as in [17, 18]. There are many different approaches to solving the stabilization problem. In [19], based on the proposed c-nonholonomic trajectory approach, the arbitrary configuration stabilization control problem is solved under input saturation. The authors of [20] addressed the stabilization problem with a two-step controller design: in the first step, a robust state feedback point-to-point stabilization control is designed; in the second, the controller is modified to address the tracking problem for reference trajectories. In these and other works on the synthesis of stabilization systems, a reference trajectory is used rather than the optimal one, since the optimality of the trajectory will most likely not be preserved when the stabilization system is introduced.

There are several reasons for this. First, when constructing a motion stabilization system along a program trajectory, some additional control resource may be required; however, it may simply not be available, since all control resources were already involved in calculating the optimal control. Second, the system for stabilizing the movement of an object along a program trajectory changes the mathematical model of the control object, and the optimal control for the object without a stabilization system will not be optimal for the object with one. Third, the movement of the object along the optimal trajectory and its movement in the vicinity of this trajectory, at any amount of deviation, can have completely different values of the quality criterion, as clearly illustrated in a wonderful example by Young in [21].

Most importantly, the construction of these stabilization systems does not follow in any way from the formulation of the optimal control problem. When developing a stabilization system, requirements are put forward that do not follow from the initial formulation of the optimal control problem and are not substantiated in any way from its point of view.

As a result of this impractical formulation, in which an open-loop control is sought, the optimal control problem in its conventional mathematical statement is less and less often solved when creating new autonomous devices. Today, mathematical research on the optimal control problem is increasingly moving away from the practice of implementation. Engineering approaches are becoming much more popular, but they move away from optimality, paying much more attention to stability. From what has been said, it is clear that the optimal control problem lacks some condition or requirement on the control being developed that would provide the found optimal control with the same property that a feedback system gives.

The novelty of this work is a new extended formulation of the optimal control problem. It differs from the classical formulations in that additional requirements are introduced, which in practice lead to the need to build a stabilization system. An additional requirement on the optimal trajectory is introduced to give the system describing the control object properties that ensure the stability of solutions in its neighbourhood. As a result, the controls found by solving the extended optimal control problem will have the qualitative property that a feedback system provides, and in this case, they will be mathematically justified.

Thus, to solve the proposed extended optimal control problem, some stabilization system is to be introduced into the object model, so the optimal control will be calculated for the new model of the control object, which includes the stabilization system and ensures the movement of the object along the optimal trajectory so that a possible error does not grow over time.

The presented extended formulation of the optimal control problem can be solved by well-known analytical and numerical methods that allow calculating the stabilization system of an object and substituting it directly into the model for the subsequent calculation of the optimal control. In this paper, we demonstrate one numerical approach that solves the stated optimal control problem using machine learning. The approach is based on the application of symbolic regression for stabilization system synthesis and evolutionary methods for optimal control calculation. Examples are provided to demonstrate the correctness of the proposed problem statement for different optimal control tasks, including ones with uncertainties.

2. Extended Statement of the Optimal Control Problem

Let us introduce the following extension of the optimal control problem statements (1) and (2). In order to take into account possible perturbations of the object or the initial conditions, we must formulate the optimal control problem in such a way that all possible trajectories remain within some given area around the optimal trajectory and reach the terminal state.

Thus, to solve the optimal control problem (1), it is necessary to find the control function in the following form:

$$u = v(t) \in U, \quad t \in [0, t_f], \quad (5)$$

such that if this control function is inserted into the right-hand side of the mathematical model of the controlled object, then the resulting system of equations

$$\dot{x} = f(x, v(t)) \quad (6)$$

must have the following properties:

(a) the partial solution $x^*(t, x^0)$ of the system (6) from the given initial condition $x^0$ will hit the terminal condition $x^f$ with the optimal value of the quality criterion;

(b) for other partial solutions $x(t, y^0)$, $y^0 \neq x^0$, the following condition is true: if for the given $\varepsilon > 0$ and $t_1 \in [0, t_f]$

$$\| x(t_1, y^0) - x^*(t_1, x^0) \| \le \varepsilon, \quad (7)$$

then for $t_2 > t_1$,

$$\| x(t_2, y^0) - x^*(t_2, x^0) \| \le \varepsilon; \quad (8)$$

(c) there is a $\delta$-neighbourhood of the terminal state $x^f$ such that, for any partial solution $x(t, y^0)$ in the $\delta$-neighbourhood, the following conditions are fulfilled: if

$$\| x(t_1, y^0) - x^f \| \le \delta, \quad (9)$$

then

$$\| x(t_2, y^0) - x^f \| \le \| x(t_1, y^0) - x^f \|, \quad t_2 > t_1, \quad (10)$$

and

$$\| x(t_e, y^0) - x^f \| \le \varepsilon_0 \ \text{when} \ t_e \le t^+, \quad (11)$$

where $\varepsilon_0$ is a given accuracy of the terminal state achievement.

This additional requirement means that the optimal trajectory obtained as a result of solving the new optimal control problem (1), (5)–(11) will have an additional special property. It will attract all the nearest partial solutions: if a solution gets into some $\delta$-tube around the optimal solution, then it will not leave this tube; moreover, in the vicinity of the target terminal point, the tube will shrink towards it. Thus, the resulting optimal control will be insensitive to certain model inaccuracies or perturbations, which makes it possible to implement it directly on board.
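
Conditions (7)–(11) can be checked empirically for a candidate closed-loop system: simulate the nominal solution and several perturbed ones and verify that the deviation does not grow. The sketch below is a minimal such check; the closed-loop right-hand side f_closed(t, x), the Euler integrator, and the tolerances are assumptions supplied for illustration.

```python
# Hedged numerical check of the delta-tube property (7)-(8): the distance
# between a perturbed solution and the nominal one must not grow over time.
import numpy as np

def tube_check(f_closed, x0, tf, eps=0.05, n_trials=20, dt=1e-3, seed=0):
    """f_closed(t, x) is the assumed closed-loop right-hand side."""
    rng = np.random.default_rng(seed)
    t_grid = np.arange(0.0, tf, dt)

    def traj(x_init):
        x, out = np.asarray(x_init, dtype=float), []
        for t in t_grid:
            out.append(x.copy())
            x = x + dt * f_closed(t, x)           # Euler is enough for a check
        return np.array(out)

    ref = traj(x0)
    for _ in range(n_trials):
        pert = traj(np.asarray(x0) + rng.uniform(-eps, eps, size=len(x0)))
        d = np.linalg.norm(pert - ref, axis=1)    # deviation along the tube
        if d[-1] > d[0] + 1e-6:                   # final deviation grew: fail
            return False
    return True
```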

Note that optimal solutions that are stable in the sense of Lyapunov [22] satisfy this requirement. We can even say that in the extended statement we are looking for an optimal control that ensures Lyapunov stability of motion. However, the difficulty lies in the fact that there are no general recommendations or approaches for finding such a solution. Existing works mostly provide analysis rather than synthesis of such motions [23]. Most often, stability at a point is ensured rather than stability of motion, because stability of motion must be ensured in time. Usually, engineers ensure stability at a point located on the optimal trajectory. However, stability must also be ensured with respect to the quality criterion: for example, one can move to a point in a straight line or in a spiral; both motions are stable, so the quality of the motion must also be taken into account. Compliance with both stability and quality of motion is ensured by the introduced requirements (7)–(11).

Note that the solution of the general synthesis problem, when it can be solved for a certain class of models and functionals, for example, by analytical design of optimal controllers [24], backstepping [25], and others, is also one of the ways to solve this extended problem statement, since it finds an optimal control that satisfies the introduced additional requirements. But there is still a need to develop generic approaches.

The extension of the optimal control problem requires the development of new general approaches to its solution for any generic nonlinear control object.

One promising technique, which we propose, is the two-stage approach of synthesized optimal control [26]. Initially, a stabilization system is introduced into the model of the controlled object, ensuring the existence of a stable equilibrium position for the object. Next, the optimal control is calculated for the stabilized plant so that the obtained solutions satisfy requirements (7)–(11). The method is based on the application of machine learning methods and, therefore, is not tied to the type of the object model and can be considered a general approach. Moreover, it is computationally easier than the numerical solution of the general synthesis problem.
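
The following toy end-to-end sketch conveys the two-stage idea on a scalar plant $\dot{x} = u$: stage one fixes a stabilizing feedback $u = k(x^* - x)$, which makes $x^*$ a stable equilibrium, and stage two optimizes the piecewise-constant schedule of $x^*$. The plant, the gain, the cost, and the crude random search (standing in for an evolutionary optimizer such as PSO) are all illustrative assumptions.

```python
# Toy two-stage synthesized control on dx/dt = u. Stage 1: feedback
# u = k*(x_star - x) stabilizes any chosen equilibrium x_star. Stage 2:
# optimize the schedule of K equilibrium positions. All numbers illustrative.
import numpy as np

k, dt, T, K = 2.0, 0.01, 4.0, 4                   # gain, step, horizon, intervals

def rollout(points, x0=0.0, xf=1.0):
    """Simulate the stabilized plant under a schedule of equilibrium points."""
    x, steps = x0, int(T / dt)
    for n in range(steps):
        x_star = points[min(int(n * dt / (T / K)), K - 1)]
        x += dt * k * (x_star - x)                # closed loop: dx/dt = k(x*-x)
        if abs(x - xf) < 1e-2:                    # terminal set reached
            return (n + 1) * dt                   # time-optimal style cost
    return T + 10.0 * abs(x - xf)                 # penalty if target missed

rng = np.random.default_rng(1)                    # crude random search stands in
best = min((rng.uniform(-2, 2, K) for _ in range(2000)), key=rollout)
```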

Let us consider in more detail this synthesized approach to solving the extended optimal control problem.

3. Synthesized Approach to the Solution of the Extended Optimal Control Problem

The approach is called synthesized because in the first stage the stabilization control synthesis problem is solved and the stability property is provided for the object in some domain: it is required to find a control function

$$u = h(x^* - x), \quad (12)$$

where $x^*$ is a reference point in the state space, such that the control function has the following properties:

$$f(\tilde{x}(x^*), h(x^* - \tilde{x}(x^*))) = 0, \quad \lim_{t \to \infty} \| x(t, y^0) - \tilde{x}(x^*) \| = 0 \quad (13)$$

for any $y^0$ in some neighbourhood of $\tilde{x}(x^*)$.

The properties (13) indicate that for the system $\dot{x} = f(x, h(x^* - x))$ there is always a stable equilibrium point $\tilde{x}(x^*)$, and this equilibrium point possesses attractor properties, since near this point all solutions converge.
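
Properties (13) can be verified numerically for a candidate feedback $h$: the sketch below checks that the closed-loop right-hand side vanishes at the equilibrium and that all eigenvalues of a finite-difference Jacobian have negative real parts; smooth dynamics are assumed, and the tolerances are illustrative.

```python
# Sketch: checking properties (13) for a candidate feedback h. (i) the
# closed-loop right-hand side vanishes at the equilibrium; (ii) all
# eigenvalues of its finite-difference Jacobian have negative real parts.
import numpy as np

def is_stable_equilibrium(f, h, x_star, x_eq, tol=1e-6, d=1e-6):
    g = lambda x: f(x, h(x_star - x))             # closed-loop right-hand side
    x_eq = np.asarray(x_eq, dtype=float)
    if np.linalg.norm(g(x_eq)) > tol:             # must be an equilibrium point
        return False
    n = len(x_eq)
    jac = np.empty((n, n))
    for j in range(n):                            # finite-difference Jacobian
        e = np.zeros(n)
        e[j] = d
        jac[:, j] = (g(x_eq + e) - g(x_eq - e)) / (2 * d)
    return bool(np.all(np.linalg.eigvals(jac).real < 0))  # attractor property
```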

In the second stage, the following optimal control problem is solved.

Here, it is necessary to find a control function as a function of time

$$x^* = v(t), \quad (14)$$

$$v(t) \in X^* \subseteq \mathbb{R}^n, \quad (15)$$

such that the system

$$\dot{x} = f(x, h(v(t) - x)) \quad (16)$$

will have a partial solution $x(t, x^0)$ that, from the given initial condition $x^0$, reaches the given terminal condition $x^f$, $\| x(t_f, x^0) - x^f \| \le \varepsilon_0$, $t_f \le t^+$, with the optimal value of the given quality criterion

$$J = \int_0^{t_f} f_0(x(t), h(v(t) - x(t)))\,dt \rightarrow \min. \quad (17)$$

Theorem 1. Let there exist a partition $0 = t_0 < t_1 < \dots < t_K = t_f$ of the time axis and points $x^{*,1}, \dots, x^{*,K} \in X^*$ such that for the function

$$v(t) = x^{*,i}, \ \text{if} \ t_{i-1} \le t < t_i, \ i = 1, \dots, K, \quad (18)$$

a partial solution $x(t, x^0)$ of the system (16) remains in the attraction domain of the current stable equilibrium point $\tilde{x}(v(t))$ and satisfies $\| x(t_f, x^0) - x^f \| \le \varepsilon_0$, $t_f \le t^+$. Then a solution of the extended optimal control problem (1), (5)–(11) in the class of functions (18) exists.

Proof. According to the condition of the theorem, the function $v(t)$ is a piecewise constant function of time, which at each interval switches the position of the stable equilibrium point $\tilde{x}(x^{*,i})$. If it is possible to find intervals and vector values on each interval such that the partial solution of the differential equation (16) from the initial state $x^0$ hits the terminal state $x^f$, then this solution always stays in the attraction domain of a stable equilibrium point. Therefore, it satisfies the additional conditions (7) and (8) of the extended optimal control problem, since the distance between any two partial solutions of the differential equation located in the attraction domain of the stable equilibrium point decreases over time. The last point ensures the stability property of the terminal condition and the fulfilment of conditions (9)–(11). Thus, if such a partial solution is unique, it is the solution of the extended optimal control problem. If there are several such solutions, then the optimal one is that with the lowest value of the functional (17). This completes the proof.
Here, we assume that mathematical models are feasible in practice only when the error between the model and the real object decreases over time [27]. Since in our case the solution always stays in the neighbourhood of a stable equilibrium point, the distance between the true and the erroneous solutions decreases over time; therefore, this solution of the optimal control problem has the feasibility property.
Note that the first task (12) of stabilization control synthesis can be solved by various methods, depending on the complexity of the object model: analytical methods such as backstepping [25, 28] or analytical design of aggregated controllers [29], synthesis based on the application of a Lyapunov function [30], as well as classical methods for linear systems such as modal control [31]. Most of the known engineering methods today specify the control function up to the values of its parameters, for example, methods associated with the solution of the Bellman equation, such as analytical design of optimal controllers [24], various controllers including PID or PI controllers [32], and modern artificial neural networks [33–35], which are also techniques with a given structure in which only the parameters are tuned. In the overwhelming majority of cases, the stabilization synthesis problem is solved analytically or technically, taking into account the specific properties of the mathematical model.
Today, modern numerical machine learning methods of symbolic regression can be applied to find a solution for generic dynamic objects [36, 37]. In the computational examples below, we use these symbolic regression methods for stabilization system synthesis.
As long as the additional requirement is satisfied, the second task (14) of optimal control can be solved using both direct and indirect approaches. The direct approach reduces the optimal control problem to a nonlinear programming problem, providing the transition from an optimization problem in an infinite-dimensional space to an optimization problem in a finite-dimensional space. The other approach, often called the indirect one, is to use the Pontryagin maximum principle. As a result of using this principle, the problem is transformed into a boundary-value problem

$$\dot{x} = \frac{\partial H}{\partial \psi}, \quad \dot{\psi} = -\frac{\partial H}{\partial x}, \quad (19)$$

where $\psi$ is the vector of costate variables and $H$ is the Hamiltonian

$$H(\psi, x, u) = \psi^{\mathrm{T}} f(x, u) - f_0(x, u). \quad (20)$$

The optimal control value is defined from the maximum of the Hamiltonian (20):

$$u^* = \arg \max_{u \in U} H(\psi, x, u). \quad (21)$$
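
As a sketch of the direct approach, the control below is parameterized as a piecewise constant function on $K$ intervals, and the resulting finite-dimensional cost is minimized by a generic derivative-free optimizer. The double-integrator dynamics, the bounds, and differential evolution (a stand-in for any nonlinear programming solver) are illustrative assumptions.

```python
# Sketch of the direct approach: piecewise constant control on K intervals,
# cost minimized by a generic derivative-free optimizer. Dynamics and
# numbers are illustrative assumptions.
import numpy as np
from scipy.optimize import differential_evolution

K, T, dt = 8, 3.0, 0.01

def cost(q, x0=(1.0, 1.0)):
    """Integrate dx1 = x2, dx2 = u with u constant on each interval; return
    the time to reach the origin plus a penalty on the final miss distance."""
    x = np.asarray(x0, dtype=float)
    for n in range(int(T / dt)):
        u = q[min(int(n * dt / (T / K)), K - 1)]
        x = x + dt * np.array([x[1], u])
        if np.linalg.norm(x) < 1e-2:
            return (n + 1) * dt
    return T + 10.0 * np.linalg.norm(x)

result = differential_evolution(cost, bounds=[(-1, 1)] * K, seed=0)
# result.x holds the optimized piecewise constant control values
```
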
Next, we consider a couple of examples to demonstrate the importance of the introduced requirement for the optimal control in the sense of its feasibility.

4. Computational Examples

4.1. Example 1

Consider a well-known optimal control problem, discussed by Pontryagin in [1], with the model

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = u, \quad (22)$$

where the control is bounded,

$$|u| \le 1. \quad (23)$$

Let the given initial and terminal conditions have the following values:

$$x(0) = x^0, \quad (24)$$

$$x(t_f) = x^f, \quad (25)$$

where $t_f$ is not given, but it is limited, $t_f \le t^+$.

The goal is to reach the terminal condition (25) from the initial state (24) in the shortest time:

$$J = t_f \rightarrow \min. \quad (26)$$

First, let us recall the conventional analytical solution of this problem. The optimal control function is bang–bang with at most one switching:

$$u^*(t) = \begin{cases} \pm 1, & t < t_s, \\ \mp 1, & t \ge t_s, \end{cases} \quad (27)$$

where the switching time $t_s$ and the sign sequence are determined by the initial state.

The switching point was determined by the moment when the zero extremal was reached. A value of the functional (26) for the optimal solution is .
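
For reference, the classical solution of (22)–(26) can be reproduced numerically with the well-known switching-curve (feedback) form of the bang–bang law; the initial state below is an illustrative choice, not the one used in the original computation.

```python
# Time-optimal bang-bang law for dx1 = x2, dx2 = u, |u| <= 1, driving the
# state to the origin via the switching curve x1 = -x2*|x2|/2.
import numpy as np

def u_opt(x):
    s = x[0] + 0.5 * x[1] * abs(x[1])             # switching function
    return -np.sign(s) if s != 0 else -np.sign(x[1])

x, dt, t = np.array([1.0, 1.0]), 1e-4, 0.0        # illustrative initial state
while np.linalg.norm(x) > 1e-3 and t < 10.0:
    x = x + dt * np.array([x[1], u_opt(x)])       # Euler integration
    t += dt
print(t)                                          # approximates the optimal t_f
```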

Figure 1 shows the optimal trajectory.

Figure 2 presents the obtained optimal control.

Now we solve the extended optimal control problem by the described two-stage approach.

For stabilization of the control object, the following control function describing a proportional regulator was used:

$$u = q_1(x_1^* - x_1) + q_2(x_2^* - x_2), \quad (28)$$

where

$$x^* = [x_1^* \ x_2^*]^{\mathrm{T}} \quad (29)$$

is the desired equilibrium position and $q_1$, $q_2$ are constant coefficients.

When substituting the control function (28) and (29) into the model equations (22), the following system is obtained:

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = q_1(x_1^* - x_1) + q_2(x_2^* - x_2). \quad (30)$$

Now, we should find an equilibrium point for this model. Setting the right-hand sides of (30) to zero gives

$$\tilde{x} = \left[ x_1^* + \frac{q_2}{q_1} x_2^*, \ 0 \right]^{\mathrm{T}}. \quad (31)$$

We check the stability of the system (30). The characteristic equation of its Jacobian matrix is

$$\lambda^2 + q_2 \lambda + q_1 = 0, \quad \lambda_{1,2} = \frac{-q_2 \pm \sqrt{q_2^2 - 4 q_1}}{2}. \quad (32)$$

As can be seen from (32), for any values $x_1^*$ and $x_2^*$, the equilibrium point is always stable if $q_1 > 0$ and $q_2 > 0$. Therefore, the coefficients $q_1$ and $q_2$ were set to fixed positive values.
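
This condition can be confirmed in a couple of lines: the Jacobian of (30) at the equilibrium is a constant matrix with characteristic polynomial $\lambda^2 + q_2 \lambda + q_1$, so any positive gains yield eigenvalues with negative real parts (the numeric values below are illustrative, not the ones used in the experiments).

```python
# Eigenvalue check of the closed loop (30): the Jacobian at the equilibrium
# gives the characteristic polynomial lambda^2 + q2*lambda + q1.
import numpy as np

q1, q2 = 2.0, 3.0                                 # any positive gains work
A = np.array([[0.0, 1.0],
              [-q1, -q2]])                        # Jacobian of (30)
print(np.linalg.eigvals(A))                       # both real parts negative
```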

Now the optimal control problem (14) is to be solved. Let us set the time interval $\Delta t$ between switchings of the equilibrium position.

The number of intervals is $K = t^+/\Delta t$.

It is necessary to find constant values $x^{*,1}, \dots, x^{*,K}$ that define the new control on every interval $[(i-1)\Delta t, i\Delta t)$, $i = 1, \dots, K$.

The following restrictions are set for the values of the new control: $x^{*,i} \in X^*$, $i = 1, \dots, K$.

Any nonlinear programming algorithm can be used to search for the optimal control. In this work, the particle swarm optimization (PSO) algorithm [38] was used. As a result, the optimal values of the control $x^{*,1}, \dots, x^{*,K}$ were obtained.
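
A minimal self-contained PSO of the kind used here is sketched below; the inertia and acceleration coefficients and the swarm size are common textbook choices, not necessarily the exact settings of [38]. To use it, pass a cost function that simulates system (30) under a candidate schedule $x^{*,1}, \dots, x^{*,K}$ and returns the value of (26) plus a terminal penalty.

```python
# Minimal particle swarm optimization sketch; coefficients are common
# textbook defaults, not the paper's exact settings.
import numpy as np

def pso(cost, dim, lo, hi, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest, pcost = x.copy(), np.array([cost(p) for p in x])
    g = pbest[pcost.argmin()].copy()              # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                # enforce the restrictions
        c = np.array([cost(p) for p in x])
        improved = c < pcost
        pbest[improved], pcost[improved] = x[improved], c[improved]
        g = pbest[pcost.argmin()].copy()
    return g, pcost.min()
```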

The new control function has the following form:

$$u = q_1(v_1(t) - x_1) + q_2(v_2(t) - x_2), \quad v(t) = x^{*,i} \ \text{if} \ (i-1)\Delta t \le t < i\Delta t, \quad (35)$$

where $i = 1, \dots, K$.

A value of the functional (26) for the found optimal solution is . The optimal trajectory is presented in Figure 3. The optimal control for the extended statement of the optimal control problem is presented in Figure 4.

The optimal points $x^{*,i}$ together with the optimal trajectory on the plane $(x_1, x_2)$ are presented in Figure 5.

At first glance, the two obtained solutions, for the conventional optimal control problem and for the extended one, are very similar to each other. To compare these solutions from the point of view of their feasibility, let us check the obtained optimal controls (27), (28), and (35) in the presence of uncertainties. For this, we introduce perturbations into the initial conditions and the right-hand sides of the model of the control object.

The model of the control object with perturbations in the conventional problem statement has the following form:

$$\dot{x}_1 = x_2 + p\,\xi(t), \quad \dot{x}_2 = u^*(t) + p\,\xi(t), \quad (36)$$

where $p$ is a constant value of the perturbation level and $\xi(t)$ is a random function, which returns a random value from $-1$ to $1$ at every call.

The model of the control object with perturbations in the new extended statement has the following form:

$$\dot{x}_1 = x_2 + p\,\xi(t), \quad \dot{x}_2 = q_1(v_1(t) - x_1) + q_2(v_2(t) - x_2) + p\,\xi(t). \quad (37)$$

For both models (36) and (37), the initial conditions were also perturbed:

$$x_i(0) = x_i^0 + p_0\,\xi(t), \quad i = 1, 2, \quad (38)$$

where $p_0$ is a constant level of perturbations of the initial conditions.
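
The perturbation experiment (36)–(38) amounts to a Monte Carlo loop of the following kind; the disturbance range, the tolerances, and the double-integrator dynamics are written out here as illustrative assumptions.

```python
# Monte Carlo sketch of the perturbation experiment (36)-(38): a uniform
# random disturbance is added to the right-hand sides and initial state.
import numpy as np

def perturbed_rollout(control, p, p0, x0=(1.0, 1.0), dt=1e-3, t_max=5.0,
                      seed=0):
    """control(t, x) may ignore x (open loop) or use it (stabilized)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float) + p0 * rng.uniform(-1, 1, 2)  # cf. (38)
    t = 0.0
    while np.linalg.norm(x) > 1e-2 and t < t_max:
        u = control(t, x)
        xi = rng.uniform(-1, 1, 2)                # random disturbance xi(t)
        x = x + dt * (np.array([x[1], u]) + p * xi)   # cf. (36) and (37)
        t += dt
    return t                                      # value of the time functional
```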

The results of the computational experiments with small perturbations $p$ and $p_0$ are shown in Table 1.

As can be seen from Table 1, in all experiments with small perturbations, the values of the functional for the conventional optimal control were about 4% worse than for the new optimal control.

4.2. Example 2

Now let us consider the optimal control problem for a mobile robot moving on a plane with two obstacles. The mathematical model of the robot is described by the following differential equation system [39]:

$$\dot{x}_1 = 0.5(u_1 + u_2)\cos x_3, \quad \dot{x}_2 = 0.5(u_1 + u_2)\sin x_3, \quad \dot{x}_3 = 0.5(u_1 - u_2), \quad (39)$$

where the control is restricted, $u^- \le u_i \le u^+$, $i = 1, 2$.

The initial condition is set as $x(0) = x^0$.

The terminal condition is set as $x(t_f) = x^f$, where the terminal time $t_f$ is not given, but it is limited, $t_f \le t^+$.

The obstacles are described by the following inequalities:

$$(x_1 - a_i)^2 + (x_2 - b_i)^2 \le r_i^2, \quad i = 1, 2, \quad (40)$$

where $a_i$, $b_i$ are the coordinates of the obstacle centres and $r_i$ are their radii.

It is necessary to find the optimal control, taking into account the restrictions on the control values, such that the control object reaches the defined terminal condition from the specified initial condition in minimum time without hitting the given obstacles.

The goal functional includes penalties for violation of the phase constraints and for the accuracy of achieving the terminal conditions:

$$J = t_f + p_1 \sum_{i=1}^{2} \int_0^{t_f} \vartheta\!\left(r_i^2 - (x_1 - a_i)^2 - (x_2 - b_i)^2\right) dt + p_2 \| x(t_f) - x^f \| \rightarrow \min, \quad (41)$$

where $p_1$, $p_2$ are penalty coefficients and $\vartheta(\cdot)$ is a Heaviside step function.
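
A sketch of evaluating the functional (41) on a sampled trajectory is given below; the obstacle centres, radii, and penalty weights are illustrative placeholders for the given values.

```python
# Sketch of the goal functional (41): elapsed time, Heaviside penalties for
# entering the circular obstacles, and a terminal accuracy penalty.
import numpy as np

obstacles = [(1.0, 1.0, 0.5), (2.0, 2.0, 0.5)]    # (a_i, b_i, r_i), assumed
p1, p2 = 10.0, 5.0                                # penalty weights, assumed

def functional(traj, dt, xf):
    """traj: array of states [x1, x2, x3] sampled every dt seconds."""
    J = len(traj) * dt                            # minimum-time term t_f
    for a, b, r in obstacles:
        inside = (traj[:, 0] - a) ** 2 + (traj[:, 1] - b) ** 2 <= r ** 2
        J += p1 * inside.sum() * dt               # Heaviside penalty integral
    J += p2 * np.linalg.norm(traj[-1] - np.asarray(xf))  # terminal accuracy
    return J
```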

First, consider the solution of the optimal control problem as the search for a control function as a function of time. To solve the conventional optimal control problem, the direct approach is used. For this purpose, the time axis is divided into $K$ intervals of length $\Delta t$.

The control function for each control interval was searched in the form of a piecewise linear approximation:

$$u_j(t) = q_{j,i} + \frac{q_{j,i+1} - q_{j,i}}{\Delta t}\,(t - t_i), \quad t \in [t_i, t_{i+1}), \ j = 1, 2, \quad (42)$$

where $t_i = i\,\Delta t$, $i = 0, \dots, K - 1$, and $q = [q_{1,0}, q_{2,0}, \dots, q_{1,K}, q_{2,K}]^{\mathrm{T}}$ is the vector of searched parameters.

It was necessary to find $2(K+1)$ parameters in total. To find the optimal parameter vector, the PSO algorithm was applied. During the search process, the parameters were also restricted to the admissible control range.
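
The piecewise linear parameterization (42) can be implemented as a simple interpolation over the nodal values; the node layout and the clipping range below are illustrative assumptions.

```python
# Piecewise linear control (42): nodal values q[j, i] at interval boundaries
# interpolated linearly in time, then clipped to the control restrictions.
import numpy as np

def control_from_nodes(q, t, T, u_max=10.0):
    """q: array of shape (2, K+1) with nodal values on a uniform grid [0, T]."""
    K = q.shape[1] - 1
    tau = np.clip(t / T * K, 0.0, K - 1e-9)       # continuous interval index
    i = int(tau)
    frac = tau - i
    u = q[:, i] * (1 - frac) + q[:, i + 1] * frac # linear interpolation
    return np.clip(u, -u_max, u_max)              # control restrictions
```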

As a result, the following optimal parameter vector was found.

The value of the functional (41) was . The trajectory on the plane $(x_1, x_2)$ is presented in Figure 6. As can be seen, the mobile robot reached the terminal position without violating the phase constraints.

Now, the extended optimal control problem is solved by the synthesized optimal control method.

First, the control synthesis problem is solved. For this purpose, the network operator method of symbolic regression [35] was applied. The network operator matrix had dimensions , and twenty elementary functions with one argument and two functions with two arguments were used. To synthesize a stabilization control system numerically, it was necessary to achieve the terminal position from 216 initial positions. This means that we take a certain region of initial conditions and find a feedback function that reaches the given terminal position from various points of this region.

As a result, the following control function (43) was obtained.

In the second stage, the control function is searched in the form of a piecewise constant function of time (46). The time axis was divided into 10 intervals, and the vector $v(t) = x^{*,i} \in \mathbb{R}^3$ on each interval was found according to criterion (41); in total, it was necessary to find 30 parameters. PSO was also used as the optimization algorithm. During the search, restrictions (47) on the parameters were applied.

As a result, the following solution (48) was found.

The value of the functional (41) for the found solution was . The trajectory on the plane $(x_1, x_2)$ is presented in Figure 7. As can be seen from Figure 7, the mobile robot achieved the terminal position without violating the given phase constraints. In Figure 7, the small black squares are projections of the found vectors $x^{*,i}$ onto the plane $(x_1, x_2)$. The comparison of the trajectories in Figures 6 and 7 shows that the optimal control problem with such phase constraints has two optimal solutions.

To assess the feasibility of the obtained controls, let us introduce perturbations into the model,

$$\dot{x}_i = f_i(x, u) + p\,\xi(t), \quad i = 1, 2, 3, \quad (49)$$

and into the initial conditions,

$$x_i(0) = x_i^0 + p_0\,\xi(t), \quad i = 1, 2, 3, \quad (50)$$

where $\xi(t)$ is the random function defined above.

The results of the experiments are presented in Table 2. Ten trials were conducted for each value of $p$ and $p_0$. In Table 2, $\bar{J}$ is the mean value of the functional (41) and $\sigma$ is its standard deviation for the perturbed model (49) and initial condition (50) with the optimal control (43)–(45), and $\bar{J}_s$ and $\sigma_s$ are, correspondingly, the mean value of the functional (41) and its standard deviation for the model (44) and (45) with the optimal control (43), (46), and (48).

So, two examples were considered. One is a classical mathematical problem from Pontryagin's monograph; the other is a more applied example from the field of motion control of mobile robots. In both examples, the problem was solved by the proposed method of synthesized control and by the classical direct method. If perturbations are not considered, the results obtained by both methods are comparable, both in the value of the functional and in the shape of the trajectory. Next, the efficiency of the obtained controls was studied in the presence of perturbations. As can be seen from the examples, especially for the mobile robot, the conventional optimal control is very sensitive to disturbances, particularly in the initial conditions. This means that in this form it is not suitable for implementation on real objects. This is the main essence of the extended statement proposed in the paper: to search for the optimal control in a class of controls that are insensitive to certain perturbations and realizable on board.

5. Conclusions

A new extended formulation of the optimal control problem is presented in the paper. In the new formulation, it is necessary to find a control function that gives the optimal solution a special property: the error due to uncertainties does not grow with time. In other words, it is necessary to solve the optimal control problem so that the obtained solution is feasible, that is, it preserves, with some accuracy, the optimal value of the functional under certain perturbations. The extended formulation of the optimal control problem allows one to mathematically state the requirements that a real system must satisfy, as is provided in systems with feedback.

Now various approaches to the creation of applied control systems, including engineering ones, can be tied to a common mathematical formulation. In general, this is a question for further research. The optimal control problem in the extended formulation can be solved using various approaches, including various methods for synthesizing optimal control or a stabilization system, or, as suggested in the paper, on the basis of synthesized optimal control. Initially, the stabilization system synthesis problem is solved, and the object becomes stable relative to some equilibrium point in the state space. The object is then optimally controlled by moving the position of the stable equilibrium point. Examples of solving the new extended optimal control problem are provided. It has been experimentally shown that the obtained optimal control is less sensitive to small disturbances than the control obtained by solving the conventional optimal control problem.

In the future, the developed approach for solving the extended formulation of the optimal control problem is planned to be applied to more diverse and complex object models, such as groups of objects. Concerning the proposed method of synthesized optimal control, each of the stages also needs to be studied in more detail. For example, at the first stage, one can investigate the influence of the choice of the quality criterion on the quality of stabilization; at the second stage, one can study the solution of the optimal control problem for a set of initial conditions, which should give a result more resistant to perturbations.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Project No. 075-15-2020-799).