Abstract
This paper proposes a distributed model predictive control (DMPC) scheme for multiagent systems with improved control performance. To penalize the deviation of the computed state trajectory from the assumed state trajectory, a deviation punishment term is included in the local cost function of each agent. Closed-loop stability is guaranteed when the weight on the deviation punishment is large; however, a large weight leads to a considerable loss of control performance. Hence, time-varying compatibility constraints are designed for each agent to balance closed-loop stability against control performance, so that closed-loop stability is achieved with a small weight on the deviation punishment. A numerical example illustrates the effectiveness of the proposed scheme.
1. Introduction
Interest in the cooperative control of multiagent systems has grown significantly in recent years. The main motivation is the wide range of military and civilian applications, including formation flight of UAVs and automated traffic systems. Compared with traditional approaches, model predictive control (MPC), also known as receding horizon control (RHC), can redefine cost functions and constraints as needed to reflect changes in the system and/or the environment. MPC is therefore extensively applied to the cooperative control of multiagent systems, allowing the agents to operate close to the constraint boundaries and to obtain better performance than traditional approaches [1-3]. Moreover, owing to its computational advantages and modest communication requirements, distributed MPC (DMPC) is recognized as a natural technique for trajectory optimization problems in multiagent systems.
One of the challenges in distributed control is to ensure that local control actions remain consistent with the actions of the other agents [4, 5]. For coupled systems, the local optimization problem in [6] is solved based on the states of the neighbors at the sampling instant using a Nash-optimization technique; since the local controllers lack communication and cooperation, the local control actions cannot remain consistent. References [7, 8] require each local controller to exchange information with all other local controllers to improve optimality and consistency, which demands substantial communication. For decoupled systems, [9] exploits estimates of the predicted state trajectories of the neighbors, while [10] treats the predicted state trajectories of the neighboring agents as a bounded disturbance and solves a min-max optimization problem for each agent with respect to the worst-case disturbance. In [11, 12], the decision variables of the local optimization problem comprise the control actions of the agent itself and of its neighbors, which are coupled through collision avoidance constraints and the cost function. Clearly, the deviation between what an agent actually does and what its neighbors estimate it will do degrades the control performance. In some cases consistency and collision avoidance cannot be achieved, and the feasibility and stability of such schemes cannot be guaranteed. Reference [13] proposes a distributed MPC with a fixed compatibility constraint to restrict this deviation; when the bound of this constraint is sufficiently small, the closed-loop system state enters a neighborhood of the objective state. References [14, 15] improve on [13] by adding a deviation punishment term to penalize the deviation of the computed state trajectory from the assumed state trajectory. Closed-loop exponential stability follows if the weight on this term is large enough, but a large weight degrades the control performance.
A contribution of this paper is an idea for reducing the adverse effect of the deviation punishment on the control performance. At each sampling instant, the value of the compatibility constraint is set to the maximum deviation observed at the previous sampling instant. We give a stability condition that guarantees exponential stability of the global closed-loop system with a small weight on the deviation punishment term, obtained by dividing the centralized stability constraint in the manner of [16, 17]. The effectiveness of the scheme is demonstrated by a numerical example.
Notations
x(k) is the value of a vector x at time k; x(k+i|k) is the value of x at the future time k+i, predicted at time k; |x| is the componentwise absolute value of x. For a vector x and a positive-definite matrix P, ||x||_P^2 = x^T P x.
2. Problem Statement
Let us consider a system composed of agents. The dynamics of each agent follow the model of [11], defined by its input, state, and state transition function; the sets of feasible inputs and states of each agent are given accordingly. At each time instant, the control objective is [18] to minimize a global cost function over the control moves, where the equilibrium point of each agent, and the corresponding equilibrium point of all agents, are specified. The models of all agents are completely decoupled. Coupling between agents arises because they operate in the same environment and because the "cooperative" objective is imposed on each agent through the cost function. Hence there are a coupling cost function and coupling constraints [19]. The coupling constraints can be transformed directly into coupling cost-function terms or handled as decoupled constraints using the technique of [15]; in the present paper we do not consider this issue.
The control objective for the whole system is to cooperatively and asymptotically stabilize all agents to an equilibrium point of (4). In this paper we assume the equilibrium is the origin, with the corresponding equilibrium point for each agent. This assumption is not restrictive, since otherwise one can always shift the origin of the system to the equilibrium.
The resultant control law for the minimization of (3) can be implemented in a centralized way. However, the existing methods for centralized MPC are computationally tractable only for small-scale systems. Furthermore, the communication cost of implementing a centralized receding horizon control law may be high. Hence, by means of decomposition, the global cost is divided into local costs so that the minimization of (3) is implemented in a distributed manner, where each local cost also involves the states of the neighbors; the set of neighbors of each agent is defined accordingly. For each agent, the control objective is to stabilize it to its local equilibrium point. The local cost is obtained by dividing the global cost using the technique of [19]. Since the agents have decoupled dynamics, the couplings of the control moves across the whole system are not considered: the global input weight is a diagonal matrix, and each local input weight is obtained directly.
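Under the block-diagonal structure described above, the division of a global weighting matrix into local ones can be sketched as follows. This is a minimal illustration in Python; the function name and the assumption that the global weight is block-diagonal in agent order are ours, not the paper's.

```python
import numpy as np

def split_block_diag(Q, sizes):
    """Split a block-diagonal global weight Q into per-agent blocks.

    sizes : list of per-agent state dimensions, in agent order.
    Returns the list of diagonal blocks [Q_1, ..., Q_M].
    """
    blocks, ofs = [], 0
    for n in sizes:
        blocks.append(Q[ofs:ofs + n, ofs:ofs + n].copy())
        ofs += n
    return blocks
```

Each agent then minimizes its own cost built from its block, rather than the full centralized cost.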
Under the networked environment, the bandwidth limitation can restrict the amount of information exchange [17]. It is thus appropriate to allow agents to exchange information only once in each sampling interval. We assume that the connectivity of the interagent communication network is sufficient for agents to obtain information regarding all the variables that appear in their local problems.
In the receding horizon control manner, a finite-horizon cost function is used to approximate the infinite-horizon cost. According to (5), the evolution of the control moves over the prediction horizon for each agent is based on estimates of the state trajectories of its neighbors, which are substituted by the assumed state trajectories as in [11]. In each control interval, the information transmitted between agents consists of the assumed state trajectories. Since the cooperative consistency and the efficiency of the distributed control moves are affected by the deviation of the computed state trajectory from the assumed state trajectory, it is appropriate to penalize this deviation by adding a deviation punishment term to the local cost function.
Define the gain of the distributed state feedback controller as follows.
Consider the local cost function, which involves the assumed states of the neighbors and whose weights satisfy the conditions below. One feedback gain is designed to stabilize the agent to its local equilibrium point independently; the other is designed to stabilize the agent to its local equilibrium point cooperatively with the neighboring agents. The weight on the deviation punishment term penalizes the deviation of the computed state trajectory from the assumed state trajectory.
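The structure of such a local cost can be sketched as below: a standard quadratic stage and terminal cost, plus a weighted penalty on the distance between the computed and the assumed trajectory. This is our illustrative reconstruction, since the paper's equations are not reproduced here; all names are hypothetical, and the deviation is penalized with an identity weight for simplicity.

```python
import numpy as np

def quad(v, M):
    """Weighted quadratic form v' M v."""
    v = np.asarray(v, dtype=float)
    return float(v @ M @ v)

def local_cost(x_traj, u_traj, x_assumed, Q, R, P, w_dev):
    """Finite-horizon local cost of one agent with a deviation-punishment term.

    x_traj    : predicted states x(k+i|k), i = 0..N      (shape (N+1, n))
    u_traj    : predicted inputs u(k+i|k), i = 0..N-1    (shape (N, m))
    x_assumed : trajectory the neighbors assume for this agent (shape (N+1, n))
    w_dev     : scalar weight on the deviation punishment
    """
    N = len(u_traj)
    n = len(x_traj[0])
    J = 0.0
    for i in range(N):
        J += quad(x_traj[i], Q)                    # stage state cost
        J += quad(u_traj[i], R)                    # stage input cost
        dev = np.asarray(x_traj[i]) - np.asarray(x_assumed[i])
        J += w_dev * quad(dev, np.eye(n))          # deviation punishment
    return J + quad(x_traj[N], P)                  # terminal cost
```

A larger `w_dev` forces the computed trajectory to stay near the broadcast one, which is exactly the stability/performance trade-off the paper discusses.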
At each time instant, the optimization problem for distributed MPC is formulated as problem (9); only the first control move is implemented, and problem (9) is solved again at the next time instant.
Remark 1. The local deviation punishment of each agent affects the control performance, that is, it incurs a loss of optimality.
3. Stability of Distributed MPC
The stability of distributed MPC obtained by simply applying the procedure of centralized MPC will be discussed first. The compact and convex terminal set is defined so that it is a control invariant set. Following the idea of [20, 21], one simultaneously determines a linear feedback such that the terminal set is positively invariant under this feedback.
Define the local linearization at the equilibrium point and assume that it is stabilizable. When the state enters the terminal set, the local linear feedback control law is applied. The associated constant is calculated offline as follows.
3.1. Design of the Local Control Law
The following condition must hold to achieve closed-loop stability:
Lemma 1. Suppose that there exist matrices satisfying the Lyapunov equation for some constant. Then there exists a constant such that the quantity defined in (11) satisfies (13).
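A terminal controller of this kind is computed offline. The sketch below uses a plain discrete-time Riccati iteration on the local linearization to obtain a terminal weight and a stabilizing feedback gain; this is our own illustrative construction (the paper designs the terminal ingredients via the Lyapunov equation (14), whose matrices are not reproduced in the text).

```python
import numpy as np

def lqr_terminal(A, B, Q, R, iters=1000):
    """Iterate the discrete-time Riccati recursion for the local
    linearization (A, B) and return the terminal weight P and the
    gain K of the stabilizing terminal law u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return P, K

# Illustrative double-integrator linearization, 0.5 s sampling interval.
A = np.array([[1.0, 0.5], [0.0, 1.0]])
B = np.array([[0.125], [0.5]])
P, K = lqr_terminal(A, B, np.eye(2), np.eye(1))
rho = max(np.abs(np.linalg.eigvals(A - B @ K)))  # closed-loop spectral radius
```

The spectral radius of `A - B K` below one confirms that the terminal feedback is stabilizing, which is what the invariance of the terminal set requires.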
Remark 2. Lemma 1 is obtained directly by referring to [21]. For MPC, the stability margin can be adjusted by tuning the corresponding constant according to Lemma 1. With regard to DMPC, [11] adjusts the stability margin by tuning the weight in the local cost function. The control objective is to asymptotically stabilize the closed-loop system. Summing (13) over the horizon yields
Combining (7) and (15) yields
where the finite-horizon cost function consists of a standard stage cost, which specifies the desired control performance, and a terminal cost, which penalizes the state at the end of the finite horizon.
The terminal region of each agent is designed to be invariant for the nonlinear system controlled by a local linear state feedback. The quadratic terminal cost bounds the infinite-horizon cost of the nonlinear system starting from the terminal region and controlled by the local linear state feedback.
3.2. Compatibility Constraint for Stability
As in [18], we define the following two terms:
Lemma 2. Suppose that (9) holds and there exists a constant such that, for all agents,
Then, by solving the receding-horizon optimization problem and implementing , the stability of the global closed-loop system is guaranteed, once a feasible solution at time is found.
Proof. Define the value function as above. Suppose that, at the current time, there is an optimal solution, which yields
At time , according to Lemma 2, is feasible, which yields
where . By applying (9) and Lemma 2, (11) guarantees that
Substituting (22) yields
By applying (17)-(19),
At the next time instant, by reoptimization, the optimal cost is no greater than the candidate cost. Hence, it leads to
where is the minimum eigenvalue of . This indicates that the closed-loop system is exponentially stable.
Satisfaction of (18) indicates that the predicted states should not deviate too far from their assumed values [13]. Hence, (18) can be taken as a new version of the compatibility condition. This condition is derived from a single compatibility condition that collects all the states (whether predicted or assumed) within the switching horizon, and is then disassembled across the agents in a distributed manner, resulting in a local compatibility constraint for each agent.
3.3. Synthesis Approach of Distributed MPC
In the synthesis approach, the local optimization problem incorporates the above compatibility condition. Since each agent is coupled with the other agents through (18), it is necessary to assign the constraint to the individual agents so that (18) is satisfied along the optimization. The subsequent discussion on stability depends on how (18) is handled.
Denote , . At time , by solving the optimization problem, there exists a parameter , , for each element of , .
Define and denote , . At time , set the following constraint for each agent : From (26) and (27), it is shown that and .
Denote Then , .
By applying (26)-(29), it is shown that (18) is guaranteed by the assignment (31), which is dispensed to each agent. Since (26)-(28) introduce conservativeness, (31) is more stringent than (18).
Remark 3. By adding the deviation punishment term to the local cost function, closed-loop stability follows with a large weight; the larger the weight, the greater the loss of performance [14, 19]. For a small value of the weight, we can adjust the compatibility constraint to obtain exponential stability. As the constraint value is set through the optimization, this scheme has more freedom in tuning parameters to balance closed-loop stability and control performance.
Remark 4. According to (31), the maximum and minimum admissible values of the compatibility constraint can be calculated by considering the range of each variable. We choose the middle value. The compatibility constraint is therefore time varying.
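The time-varying update can be sketched as below: the bound at the current sampling instant is set from the deviation between the trajectory the agent actually computed at the previous instant and the trajectory its neighbors assumed for it. This is an illustrative reconstruction; the function name is ours, the largest componentwise deviation is used per the idea stated in the introduction, and the `floor` keeping the bound away from zero is our own hypothetical safeguard, not part of the paper.

```python
import numpy as np

def update_compat_bound(x_pred_prev, x_assumed_prev, floor=1e-3):
    """Time-varying compatibility bound for one agent, set from the
    largest componentwise deviation observed at the previous sampling
    instant between the computed and the assumed trajectory."""
    dev = np.abs(np.asarray(x_pred_prev, float)
                 - np.asarray(x_assumed_prev, float))
    return max(float(dev.max()), floor)
```

When the agents' behavior settles and the deviation shrinks, the bound tightens automatically, so a small deviation-punishment weight suffices for stability.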
4. Control Strategy
For practical implementation, distributed MPC is formulated in the following algorithm.
Algorithm
Off-line stage:
(i) Set the value of the prediction horizon.
(ii) According to (3), (5), and (9), find the weighting matrices for all agents.
(iii) Set the initial value of the compatibility constraint for all agents.
(iv) Calculate the terminal weight, the local linear feedback control gain, and the terminal set.
On-line stage:
For each agent, perform the following steps at the initial time:
(i) Take the measurement of the current state. Initialize the assumed trajectory.
(ii) Send the assumed trajectory to the neighbors of the agent. Receive theirs.
(iii) Set the initial guesses and the compatibility constraint.
(iv) Solve problem (19).
(v) Implement the first control move.
(vi) Get the predicted trajectory and the value of the compatibility constraint.
(vii) Send them to the neighbors. Receive theirs. Calculate the constraint value.
For each agent, perform the following steps at every subsequent time:
(i) Take the measurement of the current state.
(ii) Solve problem (19).
(iii) Implement the first control move.
(iv) Get the predicted trajectory and the new value of the compatibility constraint.
(v) Send them to the neighbors. Receive theirs.
(vi) Calculate the constraint value.
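A key step of the on-line stage is forming the assumed trajectory that each agent broadcasts for the next sampling instant. The sketch below uses the standard candidate construction: shift the current prediction one step and append one extra state generated by the terminal feedback. This is our own illustration of the idea; the paper's exact rule for the assumed trajectory may differ in detail.

```python
import numpy as np

def shift_assumed(x_traj, A, B, K):
    """Assumed trajectory for the next sampling instant: shift the
    current prediction one step and append a tail state produced by
    the terminal feedback u = -K x."""
    x_last = x_traj[-1]
    tail = A @ x_last + B @ (-K @ x_last)   # terminal-feedback step
    return np.vstack([x_traj[1:], tail])
```

Because the tail is generated by the stabilizing terminal law, the shifted trajectory remains a feasible candidate at the next instant, which is what the recursive-feasibility argument of Lemma 2 relies on.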
5. Numerical Example
We consider the agent model of [22],
which is obtained by discretizing the continuous-time model
(the states are the positions in the horizontal and vertical directions and the velocities in the horizontal and vertical directions, resp.) with a sampling interval of 0.5 s. There are four agents. A set of positions of the four agents constitutes a formation.
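The discretized matrices can be reconstructed as follows, assuming the continuous-time model is the standard double integrator per axis (position and velocity states driven by an acceleration input), which is consistent with the state description above; the paper's own matrices are not reproduced in the text, so this is an illustrative sketch.

```python
import numpy as np

T = 0.5  # sampling interval in seconds

# Per-axis double integrator (state = [position, velocity],
# input = acceleration), discretized by zero-order hold.
Aa = np.array([[1.0, T], [0.0, 1.0]])
Ba = np.array([[T ** 2 / 2.0], [T]])

# Full agent: horizontal and vertical axes stacked block-diagonally,
# state = [x, vx, y, vy], input = [ax, ay].
A = np.kron(np.eye(2), Aa)
B = np.kron(np.eye(2), Ba)
```

One step with unit horizontal acceleration from rest moves the agent 0.125 m with final velocity 0.5 m/s, matching the zero-order-hold discretization.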
The initial positions of the four agents are
Linear constraints on states and input are
Two of the agents are selected as the core agents of the formation. If all agents achieve the desired formation and the core agents cooperatively cover the virtual leader, the formation cost is zero. The global cost function is obtained accordingly. The agents cooperatively track the virtual leader along its reference trajectory, with the distance between agents defined accordingly. Choosing the weights, we then have
the corresponding parameters for all agents, with the terminal set specified accordingly. The above choices of model, cost, and constraints allow us to rewrite problem (19) as a quadratic program with quadratic constraints. To solve the optimal control problems numerically, the package NPSOL 5.02 is used. From top to bottom, the first subgraph of Figure 1 shows the evolution of the formation under centralized MPC; the second subgraph shows the evolution under distributed MPC with the time-varying compatibility constraint; the third subgraph shows the evolution under distributed MPC with a fixed compatibility constraint.
Figure 1: Evolution of the formation: (a) centralized MPC; (b) distributed MPC with the time-varying compatibility constraint; (c) distributed MPC with a fixed compatibility constraint.
With all three control schemes, the formation of the agents is achieved; the obtained costs are given for the three schemes, respectively. Compared with the second subgraph, the third subgraph exhibits a large overshoot. The distributed MPC with the time-varying compatibility constraint thus gives a better control process than the one with the fixed compatibility constraint. The evolution of the compatibility constraint values is shown in Figure 2: "*" for agent 1, "O" for agent 2, ">" for agent 3, "<" for agent 4.
Remark 5. The value of the fixed compatibility constraint (third simulation) is 0.2. The values of the time-varying compatibility constraint (second simulation) are calculated from the state deviations of the previous horizon.
6. Conclusions
In this paper, we have proposed an improved distributed MPC scheme for multiagent systems based on deviation punishment. One feature of the proposed scheme is that the cost function of each agent penalizes the deviation between the predicted state trajectory and the assumed state trajectory, which improves consistency and the optimality of the control trajectory. At each sampling instant, the value of the compatibility constraint is set from the deviation observed at the previous sampling instant. Closed-loop stability is guaranteed with a small value of the weight on the deviation punishment term. The effectiveness of the scheme has been demonstrated by a numerical example. Future work will focus on the feasibility of the optimization.
Acknowledgment
This work is supported by a Grant from the Fundamental Research Funds for the Central Universities of China, no. CDJZR10170006.