Abstract

This paper proposes a distributed model predictive control (DMPC) scheme for multiagent systems with improved control performance. In order to penalize the deviation of the computed state trajectory from the assumed state trajectory, a deviation punishment term is included in the local cost function of each agent. Closed-loop stability can be guaranteed with a large weight on the deviation punishment, but such a large weight causes a considerable loss of control performance. Hence, time-varying compatibility constraints are designed for each agent to balance closed-loop stability against control performance, so that closed-loop stability is achieved with a small weight on the deviation punishment. A numerical example is given to illustrate the effectiveness of the proposed scheme.

1. Introduction

Interest in the cooperative control of multiagent systems has grown significantly over the last years. The main motivation is the wide range of military and civilian applications, including formation flight of UAVs and automated traffic systems. Compared with traditional approaches, model predictive control (MPC), or receding horizon control (RHC), has the ability to redefine cost functions and constraints as needed to reflect changes in the system and/or the environment. Therefore, MPC is extensively applied to the cooperative control of multiagent systems, where it allows the agents to operate close to the constraint boundaries and to obtain better performance than traditional approaches [1–3]. Moreover, owing to its computational advantages and convenient communication requirements, distributed MPC (DMPC) is recognized as a natural technique for addressing trajectory optimization problems for multiagent systems.

One of the challenges for distributed control is to ensure that local control actions remain consistent with the actions of the other agents [4, 5]. For coupled systems, the local optimization problem in [6] is solved based on the neighbors' states at the sampling instant using a Nash-optimization technique; since the local controllers lack communication and cooperation, the local control actions cannot remain consistent. References [7, 8] require each local controller to exchange information with all other local controllers, improving optimality and consistency at the price of heavy communication. For decoupled systems, [9] exploits estimates of the neighbors' predicted state trajectories, while [10] treats the predicted state trajectories of the neighboring agents as bounded disturbances and solves a min-max optimal control problem for each agent with respect to the worst-case disturbance. In [11, 12], the optimization variables of the local problem contain the control actions of the agent and of its neighbors, which are coupled through collision avoidance constraints and the cost function. Obviously, the deviation between what an agent actually does and what its neighbors assume it will do affects the control performance; sometimes consistency and collision avoidance cannot be achieved, and the feasibility and stability of such schemes cannot be guaranteed. Reference [13] proposes a distributed MPC with a fixed compatibility constraint to restrict this deviation: when the bound of the constraint is sufficiently small, the closed-loop system state enters a neighborhood of the objective state. References [14, 15] improve on [13] by adding a deviation punishment term that penalizes the deviation of the computed state trajectory from the assumed state trajectory; closed-loop exponential stability follows if the weight on the deviation term is large enough, but a large weight leads to a loss of control performance.

The contribution of this paper is an approach that reduces the adverse effect of the deviation punishment on the control performance. At each sampling time, the value of the compatibility constraint is set to the maximum deviation observed at the previous sampling time. We give a stability condition that guarantees exponential stability of the global closed-loop system with a small weight on the deviation punishment term; the condition is obtained by decomposing the centralized stability constraint in the manner of [16, 17]. The effectiveness of the scheme is demonstrated by a numerical example.

Notations
π‘₯π‘–π‘˜ is the value of vector π‘₯𝑖 at time π‘˜β‹…π‘₯π‘–π‘˜,𝑑 is the value of vector π‘₯𝑖 at a future time π‘˜+𝑑, predicted at time π‘˜β‹…|π‘₯|=[|π‘₯1|,|π‘₯2|,…,|π‘₯𝑁|] is the absolute value for each component of π‘₯. For a vector π‘₯ and positive-definite matrix 𝑄, β€–π‘₯β€–2𝑄=π‘₯𝑇𝑄π‘₯.

2. Problem Statement

Let us consider a system composed of $N_a$ agents. The dynamics of agent $i$ [11] is
$$ x^i_{k+1} = f^i\left(x^i_k, u^i_k\right), \quad (1) $$
where $u^i_k \in \mathbb{R}^{m_i}$, $x^i_k \in \mathbb{R}^{n_i}$, and $f^i: \mathbb{R}^{n_i} \times \mathbb{R}^{m_i} \mapsto \mathbb{R}^{n_i}$ are the input, state, and state transition function of agent $i$, respectively; $u^i_k = [u^{i,1}_k, \ldots, u^{i,m_i}_k]^T$, $x^i_k = [x^{i,1}_k, \ldots, x^{i,n_i}_k]^T$. The sets of feasible inputs and states of agent $i$ are denoted as $\mathcal{U}^i \subset \mathbb{R}^{m_i}$ and $\mathcal{X}^i \subset \mathbb{R}^{n_i}$, respectively, that is,
$$ u^i_k \in \mathcal{U}^i, \quad x^i_k \in \mathcal{X}^i, \quad k \geq 0. \quad (2) $$
At each time $k$, the control objective is [18] to minimize
$$ J_k = \sum_{t=0}^{\infty} \left[ \left\| x_{k,t} \right\|_Q^2 + \left\| u_{k,t} \right\|_R^2 \right] \quad (3) $$
with respect to $u_{k,t}$, $t \geq 0$, where $x = [(x^1)^T, \ldots, (x^{N_a})^T]^T$, $u = [(u^1)^T, \ldots, (u^{N_a})^T]^T$; $x^i_{k,t+1} = f^i(x^i_{k,t}, u^i_{k,t})$, $x^i_{k,0} = x^i_k$; $Q = Q^T > 0$, $R = R^T > 0$; $u \in \mathbb{R}^m$, $m = \sum_i m_i$, and $x \in \mathbb{R}^n$, $n = \sum_i n_i$. Then,
$$ x_{k+1} = f\left(x_k, u_k\right), \quad (4) $$
where $f = [f^1, f^2, \ldots, f^{N_a}]^T$, $f: \mathbb{R}^n \times \mathbb{R}^m \mapsto \mathbb{R}^n$. $(x^i_e, u^i_e)$ is the equilibrium point of agent $i$, and $(x_e, u_e)$ is the corresponding equilibrium point of all agents; $\mathcal{X} = \mathcal{X}^1 \times \mathcal{X}^2 \times \cdots \times \mathcal{X}^{N_a}$, $\mathcal{U} = \mathcal{U}^1 \times \mathcal{U}^2 \times \cdots \times \mathcal{U}^{N_a}$. The models of all agents are completely decoupled. The coupling between agents arises because they operate in the same environment and because the "cooperative" objective is imposed on each agent through the cost function. Hence, there are a coupling cost function and coupling constraints [19]. The coupling constraints can be transformed into coupling cost function terms directly, or handled as decoupled constraints using the technique of [15]. This issue is not considered in the present paper.
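To make the setup concrete, the following sketch (illustrative only; the scalar linear maps standing in for $f^i$, the weights, and the input sequence are placeholder assumptions, not taken from the paper) stacks two decoupled agent models into the global dynamics (4) and evaluates a finite truncation of the cost (3).

```python
import numpy as np

# Hypothetical example with N_a = 2 agents, each a scalar linear system; the real f^i may be
# nonlinear, and these maps, weights, and inputs are placeholders used only for illustration.
def f1(x, u):
    return 0.9 * x + 0.5 * u

def f2(x, u):
    return 0.8 * x + 1.0 * u

def f_global(x, u):
    # Global transition (4): the agents are dynamically decoupled, so f simply stacks the f^i.
    return np.array([f1(x[0], u[0]), f2(x[1], u[1])])

def truncated_cost(x0, u_seq, Q, R):
    # Finite truncation of the infinite-horizon cost (3):
    #   J_k ~ sum_t ( ||x_{k,t}||_Q^2 + ||u_{k,t}||_R^2 ).
    x, J = np.asarray(x0, dtype=float), 0.0
    for u in u_seq:
        J += x @ Q @ x + u @ R @ u
        x = f_global(x, u)
    return J

Q = np.array([[1.0, 0.2], [0.2, 2.0]])   # the off-diagonal term is the coupling in the cost
R = np.eye(2)
u_seq = [np.zeros(2)] * 20               # placeholder input sequence
print(truncated_cost([1.0, -1.0], u_seq, Q, R))
```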

The control objective for the overall system is to cooperatively and asymptotically stabilize all agents to an equilibrium point $(x_e, u_e)$ of (4). In this paper we assume that $(x_e, u_e) = (0, 0)$ and $f(x_e, u_e) = 0$. The corresponding equilibrium point for each agent is $(x^i_e, u^i_e) = (0, 0)$ with $f^i(x^i_e, u^i_e) = 0$. The assumption $f^i(0, 0) = 0$ is not restrictive, since if $(x^i_e, u^i_e) \neq (0, 0)$, one can always shift the origin of the system to it.

The control law minimizing (3) can be implemented in a centralized way. However, the existing methods for centralized MPC are computationally tractable only for small-scale systems, and the communication required to implement a centralized receding horizon control law may be costly. Hence, by means of decomposition, $J_k$ is divided into local costs $J^i_k$ such that the minimization of (3) is implemented in a distributed manner, with
$$ J^i_k = \sum_{t=0}^{\infty} \left[ \left\| z^i_{k,t} \right\|_{Q_i}^2 + \left\| u^i_{k,t} \right\|_{R_i}^2 \right], \quad J_k = \sum_{i=1}^{N_a} J^i_k, \quad (5) $$
where $z^i_{k,t} = [(x^i_{k,t})^T \ (x^{-i}_{k,t})^T]^T$ and $x^{-i}_{k,t}$ collects the states of the neighbors. The set of neighbors of agent $i$ is denoted as $\mathcal{N}^i$; $x^{-i}_k = \{x^j_k \mid j \in \mathcal{N}^i\}$, $x^{-i}_k \in \mathbb{R}^{n_{-i}}$, $n_{-i} = \sum_{j \in \mathcal{N}^i} n_j$. For each agent $i$, the control objective is to stabilize it to the equilibrium point $(x^i_e, u^i_e)$; $Q_i = Q_i^T > 0$, $R_i = R_i^T > 0$. $Q_i$ is obtained by dividing $Q$ using the technique of [19]. Since the agents have decoupled dynamics, couplings of the control moves across the system are not considered: $R$ is a diagonal matrix and $R_i$ is obtained directly.

Under the networked environment, the bandwidth limitation can restrict the amount of information exchange [17]. It is thus appropriate to allow agents to exchange information only once in each sampling interval. We assume that the connectivity of the interagent communication network is sufficient for agents to obtain information regarding all the variables that appear in their local problems.

In the receding horizon control manner, a finite-horizon cost function is exploited to approximate $J^i_k$. According to (5), the evolution of the control moves over the prediction horizon for agent $i$ is based on estimates of the neighbors' state trajectories $x^{-i}_{k,t}$, $t \leq N$, which are substituted by the assumed state trajectories $\hat{x}^{-i}_{k,t}$, $t \leq N$, as in [11]. In each control interval, the information transmitted between agents consists of these assumed state trajectories. Since the cooperative consistency and efficiency of the distributed control moves are affected by the deviation of the computed state trajectory from the assumed state trajectory, it is appropriate to penalize this deviation by adding a deviation punishment term to the local cost function.

Define
$$ u^i_{k,t} = F_i(k)\, x^i_{k,t}, \quad \forall t \geq N, \quad (6) $$
where $F_i(k)$ is the gain of the distributed state feedback controller.

Consider
$$ \breve{J}^i_k = \sum_{t=0}^{N-1} \left[ \left\| \hat{z}^i_{k,t} \right\|_{Q_i}^2 + \left\| u^i_{k,t} \right\|_{R_i}^2 + \left\| x^i_{k,t} - \hat{x}^i_{k,t} \right\|_{T_i}^2 \right] + \sum_{t=N}^{\infty} \left[ \left\| x^i_{k,t} \right\|_{\bar{Q}_i}^2 + \left\| u^i_{k,t} \right\|_{\bar{R}_i}^2 \right], \quad (7) $$
where
$$ \hat{z}^i_{k,t} = \left[ \left( x^i_{k,t} \right)^T \ \left( \hat{x}^{-i}_{k,t} \right)^T \right]^T, \quad \hat{x}^i_{k,0} = x^i_k, \quad (8) $$
and $\hat{x}^{-i}_{k,t}$ collects the assumed states of the neighbors. $\bar{Q}_i = \bar{Q}_i^T > 0$ and $\bar{R}_i = \bar{R}_i^T = R_i$ satisfy
$$ \mathrm{diag}\left\{ \bar{Q}_1, \bar{Q}_2, \ldots, \bar{Q}_{N_a} \right\} \geq Q, \quad \mathrm{diag}\left\{ R_1, R_2, \ldots, R_{N_a} \right\} = R. \quad (9) $$
Obviously, $\bar{Q}_i$ is designed to stabilize agent $i$ to the local equilibrium point independently, whereas $Q_i$ is designed to stabilize agent $i$ to the local equilibrium point cooperatively with its neighbors. $T_i$ is the weight of the deviation punishment term, which penalizes the deviation of the computed state trajectory from the assumed state trajectory.
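As a sketch of how the finite-horizon part of (7) is evaluated (the trajectories, dimensions, and weights below are made-up placeholders; only the structure of the stage cost follows (7)–(8)):

```python
import numpy as np

def local_stage_cost(x_i, x_hat_i, x_hat_neigh, u_i, Q_i, R_i, T_i):
    """One stage of the finite-horizon part of (7):
       ||z_hat||_{Q_i}^2 + ||u||_{R_i}^2 + ||x - x_hat||_{T_i}^2,
       with z_hat = [x_i; x_hat_neigh] as in (8)."""
    z_hat = np.concatenate([x_i, x_hat_neigh])
    dev = x_i - x_hat_i
    return z_hat @ Q_i @ z_hat + u_i @ R_i @ u_i + dev @ T_i @ dev

def local_finite_horizon_cost(x_traj, x_hat_traj, x_hat_neigh_traj, u_traj, Q_i, R_i, T_i):
    # Sum over t = 0, ..., N-1 of the stage costs above.
    return sum(local_stage_cost(x_traj[t], x_hat_traj[t], x_hat_neigh_traj[t],
                                u_traj[t], Q_i, R_i, T_i)
               for t in range(len(u_traj)))

# Tiny illustrative data: n_i = 2 own states, one neighbour with 2 states, m_i = 1 input, N = 3.
N, n_i, n_neigh, m_i = 3, 2, 2, 1
rng = np.random.default_rng(0)
x_traj = rng.normal(size=(N, n_i))
x_hat_traj = x_traj + 0.05 * rng.normal(size=(N, n_i))   # small deviation from the assumption
x_hat_neigh_traj = rng.normal(size=(N, n_neigh))
u_traj = rng.normal(size=(N, m_i))
Q_i, R_i, T_i = np.eye(n_i + n_neigh), np.eye(m_i), 10.0 * np.eye(n_i)
print(local_finite_horizon_cost(x_traj, x_hat_traj, x_hat_neigh_traj, u_traj, Q_i, R_i, T_i))
```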

At each time $k$, the optimization problem for distributed MPC is transformed into
$$ \min_{U^i_k,\, F_i(k)} \breve{J}^i_k, \quad \text{s.t. } (1), (2), (6), (7), \quad (10) $$
where $U^{*i}_k = [(u^{*i}_{k,0})^T, (u^{*i}_{k,1})^T, \ldots, (u^{*i}_{k,N-1})^T]^T$. Only $u^{*i}_k = u^{*i}_{k,0}$ is implemented, and problem (10) is solved again at time $k+1$.

Remark 1. The local deviation punishment of each agent affects the control performance, that is, it incurs a loss of optimality.

3. Stability of Distributed MPC

We now discuss the stability of distributed MPC obtained by simply applying the procedure of centralized MPC. The compact and convex terminal set $\Omega_i$ is defined as
$$ \Omega_i = \left\{ x^i \in \mathbb{R}^{n_i} \mid \left( x^i \right)^T P_i x^i \leq \alpha_i \right\}, \quad (11) $$
where $P_i > 0$ and $\alpha_i > 0$ are specified such that $\Omega_i$ is a control invariant set. Following the idea of [20, 21], one simultaneously determines a linear feedback such that $\Omega_i$ is positively invariant under this feedback.

Define the local linearization at the equilibrium point,
$$ A_i = \frac{\partial f^i}{\partial x^i}(0, 0), \quad B_i = \frac{\partial f^i}{\partial u^i}(0, 0), \quad (12) $$
and assume that $(A_i, B_i)$ is stabilizable. When $x^i_{k,N+t}$, $t \geq 0$, enters the terminal set $\Omega_i$, the local linear feedback control law is taken as $u^i_{k,N+t} = F_i(k) x^i_{k,N+t} = K_i x^i_{k,N+t}$, where $K_i$ is a constant gain calculated off line as follows.

3.1. Design of the Local Control Law

The following inequality is required for achieving closed-loop stability:
$$ \left\| x^i_{k,N+t+1} \right\|_{P_i}^2 - \left\| x^i_{k,N+t} \right\|_{P_i}^2 \leq -\left\| x^i_{k,N+t} \right\|_{\bar{Q}_i}^2 - \left\| u^i_{k,N+t} \right\|_{\bar{R}_i}^2, \quad t \geq 0. \quad (13) $$

Lemma 1. Suppose that there exist $\bar{Q}_i > 0$, $\bar{R}_i > 0$, and $P_i > 0$ which satisfy the Lyapunov equation
$$ \left( A_i + B_i K_i \right)^T P_i \left( A_i + B_i K_i \right) - P_i = -\kappa_i P_i - \bar{Q}_i - K_i^T \bar{R}_i K_i \quad (14) $$
for some $\kappa_i > 0$. Then there exists a constant $\alpha_i > 0$ such that $\Omega_i$ defined in (11) satisfies (13).
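One possible off-line computation of $K_i$ and $P_i$ is sketched below; it is an assumption-laden recipe rather than the paper's prescription: $K_i$ is taken from a discrete-time LQR design on $(A_i, B_i)$, and (14) is rearranged into a standard discrete Lyapunov equation by scaling the closed-loop matrix with $\sqrt{1-\kappa_i}$, which requires $0 < \kappa_i < 1$ and the scaled closed loop to be Schur stable.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def terminal_ingredients(A, B, Q_bar, R_bar, kappa):
    """Illustrative off-line design for one agent (an assumption, not the paper's exact recipe):
    1) K from a discrete-time LQR on (A, B), written for the u = K x convention of the text;
    2) P from (14), rewritten as a standard discrete Lyapunov equation by dividing the
       closed-loop matrix by sqrt(1 - kappa)."""
    P_are = solve_discrete_are(A, B, Q_bar, R_bar)
    K = -np.linalg.solve(R_bar + B.T @ P_are @ B, B.T @ P_are @ A)
    A_cl = A + B @ K
    A_tilde = A_cl / np.sqrt(1.0 - kappa)
    if np.max(np.abs(np.linalg.eigvals(A_tilde))) >= 1.0:
        raise ValueError("kappa too large: the scaled closed loop is not Schur stable")
    S = (Q_bar + K.T @ R_bar @ K) / (1.0 - kappa)
    P = solve_discrete_lyapunov(A_tilde.T, S)      # A_tilde^T P A_tilde - P + S = 0
    return K, P

# Example with the double-integrator agent model of Section 5 and a small kappa.
I2 = np.eye(2)
A = np.block([[I2, I2], [np.zeros((2, 2)), I2]])
B = np.vstack([0.5 * I2, I2])
K, P = terminal_ingredients(A, B, Q_bar=6.85 * np.eye(4), R_bar=np.eye(2), kappa=0.05)
print(np.round(K, 3))
print(np.round(P, 2))
```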

Remark 2. Lemma 1 is obtained directly by referring to Lemma 1 in [21]. For MPC, the stability margin can be adjusted by tuning the value of $\kappa_i$ according to Lemma 1. With regard to DMPC, [11] adjusts the stability margin by tuning the weight in the local cost function. The control objective is to asymptotically stabilize the closed-loop system, so that $x^i_{k,\infty} = 0$ and $u^i_{k,\infty} = 0$. Summing (13) over $t = 0, \ldots, \infty$ yields
$$ \sum_{t=N}^{\infty} \left[ \left\| x^i_{k,t} \right\|_{\bar{Q}_i}^2 + \left\| u^i_{k,t} \right\|_{\bar{R}_i}^2 \right] \leq \left\| x^i_{k,N} \right\|_{P_i}^2. \quad (15) $$
Considering both (7) and (15) yields
$$ \breve{J}^i_k \leq J^i_k = \sum_{t=0}^{N-1} \left[ \left\| \hat{z}^i_{k,t} \right\|_{Q_i}^2 + \left\| u^i_{k,t} \right\|_{R_i}^2 + \left\| x^i_{k,t} - \hat{x}^i_{k,t} \right\|_{T_i}^2 \right] + \left\| x^i_{k,N} \right\|_{P_i}^2, \quad (16) $$
where $J^i_k$ is a finite-horizon cost function consisting of a finite-horizon standard cost, which specifies the desired control performance, and a terminal cost, which penalizes the state at the end of the finite horizon.
The terminal region $\Omega_i$ for agent $i$ is designed so that it is invariant for the nonlinear system controlled by the local linear state feedback. The quadratic terminal cost $\|x^i_{k,N}\|_{P_i}^2$ bounds the infinite-horizon cost of the nonlinear system starting from $\Omega_i$ and controlled by the local linear state feedback.

3.2. Compatibility Constraint for Stability

As in [18], we define the two deviation terms $\xi^{-i} = x^{*-i} - \hat{x}^{-i}$ and $\xi^i = x^{*i} - \hat{x}^i$, partition $Q_i$ conformably with $\hat{z}^i$ as
$$ Q_i = \begin{bmatrix} Q_i^1 & Q_i^{12} \\ \left( Q_i^{12} \right)^T & Q_i^3 \end{bmatrix}, $$
and define
$$ C^*_x(k) = \sum_{i=1}^{N_a} \sum_{t=1}^{N-1} \left[ 2 \left( x^{*i}_{k,t} \right)^T Q_i^{12} \xi^{-i}_{k,t} + 2 \left( \hat{x}^{-i}_{k,t} \right)^T Q_i^3 \xi^{-i}_{k,t} + \left( \xi^{-i}_{k,t} \right)^T Q_i^3 \xi^{-i}_{k,t} \right], \quad C^*_\xi(k) = \sum_{i=1}^{N_a} \sum_{t=1}^{N-1} \left( \xi^i_{k,t} \right)^T T_i \xi^i_{k,t}. \quad (17) $$

Lemma 2. Suppose that (9) holds and that there exists $\rho(k)$ such that, for all $k > 0$, $0 \leq \rho(k) \leq 1$ and
$$ -\rho(k) \sum_{i=1}^{N_a} \left[ \left\| \left[ \left( x^i_k \right)^T, \left( \hat{x}^{-i}_k \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,0} \right\|_{R_i}^2 \right] + C^*_x(k) - C^*_\xi(k) \leq 0. \quad (18) $$

Then, by solving the receding-horizon optimization problem
$$ \min_{U^i(k)} J^i_k, \quad \text{s.t. } (1), (2), (14), (16), \ u^i_{k,N} = K_i x^i_{k,N}, \ x^i_{k,N} \in \Omega_i, \quad (19) $$
and implementing $u^{*i}_{k,0}$, the stability of the global closed-loop system is guaranteed, provided a feasible solution is found at time $k = 0$.

Proof. Define $J(k) = \sum_{i=1}^{N_a} J^i_k$. Suppose that, at time $k$, there are optimal solutions $U^{*i}_k$, $i \in \{1, \ldots, N_a\}$, which yield
$$ J^*(k) = \sum_{i=1}^{N_a} \left[ \left\| \left[ \left( x^i_k \right)^T, \left( \hat{x}^{-i}_k \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,0} \right\|_{R_i}^2 \right] + \sum_{i=1}^{N_a} \sum_{t=1}^{N-1} \left[ \left\| \left[ \left( x^{*i}_{k,t} \right)^T, \left( \hat{x}^{-i}_{k,t} \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,t} \right\|_{R_i}^2 + \left\| x^{*i}_{k,t} - \hat{x}^i_{k,t} \right\|_{T_i}^2 \right] + \sum_{i=1}^{N_a} \left\| x^{*i}_{k,N} \right\|_{P_i}^2. \quad (20) $$
At time $k+1$, the shifted sequence $U^i_{k+1} = \{u^{*i}_{k,1}, \ldots, u^{*i}_{k,N-1}, K_i x^{*i}_{k,N}\}$ is feasible, since the terminal control law keeps the state inside $\Omega_i$; it yields
$$ J(k+1) = \sum_{i=1}^{N_a} \sum_{t=1}^{N} \left[ \left\| \left[ \left( x^{*i}_{k,t} \right)^T, \left( x^{*-i}_{k,t} \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,t} \right\|_{R_i}^2 \right] + \sum_{i=1}^{N_a} \left\| x^{*i}_{k,N+1} \right\|_{P_i}^2 = \sum_{i=1}^{N_a} \sum_{t=1}^{N-1} \left[ \left\| \left[ \left( x^{*i}_{k,t} \right)^T, \left( x^{*-i}_{k,t} \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,t} \right\|_{R_i}^2 \right] + \left\| x^*_{k,N} \right\|_Q^2 + \left\| u^*_{k,N} \right\|_R^2 + \left\| x^*_{k,N+1} \right\|_P^2, \quad (21) $$
where $P = \mathrm{diag}\{P_1, P_2, \ldots, P_{N_a}\}$. By applying (9) and Lemma 1, the terminal constraint $x^{*i}_{k,N} \in \Omega_i$ guarantees that
$$ \left\| x^*_{k,N+1} \right\|_P^2 - \left\| x^*_{k,N} \right\|_P^2 \leq -\left\| x^*_{k,N} \right\|_Q^2 - \left\| u^*_{k,N} \right\|_R^2. \quad (22) $$
Substituting (22) into $J(k+1)$ yields
$$ J(k+1) \leq \sum_{i=1}^{N_a} \sum_{t=1}^{N-1} \left[ \left\| \left[ \left( x^{*i}_{k,t} \right)^T, \left( x^{*-i}_{k,t} \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,t} \right\|_{R_i}^2 \right] + \sum_{i=1}^{N_a} \left\| x^{*i}_{k,N} \right\|_{P_i}^2. \quad (23) $$
By applying (17)–(19),
$$ J(k+1) - J^*(k) \leq -\left( 1 - \rho(k) \right) \sum_{i=1}^{N_a} \left[ \left\| \left[ \left( x^i_k \right)^T, \left( \hat{x}^{-i}_k \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,0} \right\|_{R_i}^2 \right] \leq -\left( 1 - \rho(k) \right) \left\| x_k \right\|_Q^2. \quad (24) $$
At time π‘˜+1, by reoptimization, π½βˆ—(π‘˜+1)≀𝐽(π‘˜+1). Hence, it leads to π½βˆ—(π‘˜+1)βˆ’π½βˆ—(β€–β€–π‘₯π‘˜)β‰€βˆ’(1βˆ’πœŒ(π‘˜))π‘˜β€–β€–2π‘„β‰€βˆ’(1βˆ’πœŒ(π‘˜))πœ†min(𝑄)β€–π‘₯(π‘˜)β€–2𝑄,(25) where πœ†min(𝑄) is the minimum eigenvalue of 𝑄. This indicates that the closed-loop system is exponentially stable.
Satisfaction of (18) indicates that all $x^i_{k,t}$ should not deviate too far from their assumed values $\hat{x}^i_{k,t}$ [13]. Hence, (18) can be taken as a new version of the compatibility condition. This condition is derived from a single compatibility condition that collects all the states (whether predicted or assumed) within the switching horizon, and it is then disassembled among the agents in a distributed manner, which results in a local compatibility constraint for each agent.

3.3. Synthesis Approach of Distributed MPC

In the synthesis approach, the local optimization problem incorporates the above compatibility condition. Since $x^{*i}_{k,t}$ of each agent $i$ is coupled with the other agents through (18), it is necessary to assign the constraint to each agent so that (18) is satisfied along the optimization. The remaining discussion on stability depends on how (18) is handled.

Denote πœ‰π‘–π‘˜=[πœ‰π‘˜π‘–,1,…,πœ‰π‘–,π‘›π‘–π‘˜]T, πœ‰π‘˜βˆ’π‘–={πœ‰π‘—π‘˜βˆ£π‘—βˆˆπ’©π‘–}. At time π‘˜>0, by solving the optimization problem, there exits a parameter β„°π‘˜π‘–,𝑙, 𝑙=1,…,𝑛𝑖, for each element of πœ‰π‘˜π‘–,𝑙, 𝑙=1,…,𝑛𝑖.

Define
$$ \mathcal{E}^{i,l}_k = \max_t \left| \xi^{i,l}_{k-1,t} \right|, \quad (26) $$
and denote $\mathcal{E}^i_k = [\mathcal{E}^{i,1}_k, \ldots, \mathcal{E}^{i,n_i}_k]^T$, $\mathcal{E}^{-i}_k = \{\mathcal{E}^j_k \mid j \in \mathcal{N}^i\}$. At time $k+1 > 0$, the following constraint is set for each agent $i$:
$$ \left| x^i_{k+1,t} - \hat{x}^i_{k+1,t} \right| < \mathcal{E}^i_k. \quad (27) $$
From (26) and (27), it follows that $|\xi^i_{k+1,t}| < \mathcal{E}^i_k$ and $|\xi^{-i}_{k+1,t}| < \mathcal{E}^{-i}_k$.
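A small sketch of (26) and (27) (array shapes and names are illustrative assumptions): the bound $\mathcal{E}^i_k$ is the componentwise maximum absolute deviation observed over the previous prediction horizon, and it is then imposed as a componentwise box constraint on the new deviation.

```python
import numpy as np

def compatibility_bound(xi_prev):
    """(26): xi_prev has shape (N, n_i) and holds xi^i_{k-1,t} = x*^i_{k-1,t} - x_hat^i_{k-1,t};
    the bound E^i_k is the componentwise maximum of |xi| over the horizon."""
    return np.max(np.abs(xi_prev), axis=0)

def satisfies_compatibility(x_traj, x_hat_traj, E_i):
    """(27): every predicted state must stay within E_i of its assumed value, componentwise."""
    return bool(np.all(np.abs(x_traj - x_hat_traj) < E_i))

# Illustrative data: horizon N = 4, n_i = 3 states.
rng = np.random.default_rng(1)
xi_prev = 0.1 * rng.normal(size=(4, 3))
E_i = compatibility_bound(xi_prev)
x_traj = rng.normal(size=(4, 3))
x_hat_traj = x_traj + 0.5 * E_i * rng.uniform(-1, 1, size=(4, 3))
print(E_i, satisfies_compatibility(x_traj, x_hat_traj, E_i))
```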

Denote𝐢π‘₯βˆ—π‘–(π‘˜)=π‘βˆ’1𝑑=1ξ‚»2ξ‚€π‘₯βˆ—π‘–π‘˜,𝑑T𝑄𝑖12β„°π‘˜βˆ’π‘–ξ€·+2Μ‚π‘₯βˆ’π‘–π‘˜,𝑑T𝑄3π‘–β„°π‘˜βˆ’π‘–+ξ€·β„°π‘˜βˆ’π‘–ξ€ΈT𝑄3π‘–ξ€·β„°π‘˜βˆ’π‘–ξ€ΈT,(28)πΆπœ‰βˆ—π‘–(π‘˜)=π‘βˆ’1𝑑=1ξ€·πœ‰π‘–π‘˜,𝑑Tπ‘‡π‘–πœ‰π‘–π‘˜,𝑑.(29) Then πΆβˆ—π‘₯βˆ‘(π‘˜)β‰€π‘π‘Žπ‘–=1𝐢π‘₯βˆ—π‘–(π‘˜), πΆβˆ—πœ‰βˆ‘(π‘˜)=π‘π‘Žπ‘–=1πΆπœ‰βˆ—π‘–(π‘˜).

By applying (26)–(29), condition (18) is guaranteed by assigning
$$ 0 \leq \rho_i(k) \leq 1, \quad \sum_{i=1}^{N_a} \left\{ -\rho_i(k) \left[ \left\| \left[ \left( x^i_k \right)^T, \left( \hat{x}^{-i}_k \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,0} \right\|_{R_i}^2 \right] \right\} + \sum_{i=1}^{N_a} C^{*i}_x(k) - \sum_{i=1}^{N_a} C^{*i}_\xi(k) \leq 0, \quad (30) $$
which is dispensed to agent $i$ as
$$ 0 \leq \rho_i(k) \leq 1, \quad \sum_{t=1}^{N-1} \left\| \xi^i_{k,t} \right\|_{T_i}^2 \geq -\rho_i(k) \left[ \left\| \left[ \left( x^i_k \right)^T, \left( \hat{x}^{-i}_k \right)^T \right] \right\|_{Q_i}^2 + \left\| u^{*i}_{k,0} \right\|_{R_i}^2 \right] + C^{*i}_x(k). \quad (31) $$
Since conservativeness is introduced by using (26)–(28), (31) is more stringent than (18).
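The per-agent quantities (28)–(29) and the local condition (31) can be evaluated as in the following sketch (purely illustrative; the block partitions, trajectories, and the value of $\rho_i$ are made-up placeholders).

```python
import numpy as np

def C_x_star_i(x_star, x_hat_neigh, Q12, Q3, E_neigh):
    # (28): the neighbours' deviation is replaced by its bound E_neigh.
    total = 0.0
    for t in range(1, x_star.shape[0]):          # t = 1, ..., N-1
        total += (2 * x_star[t] @ Q12 @ E_neigh
                  + 2 * x_hat_neigh[t] @ Q3 @ E_neigh
                  + E_neigh @ Q3 @ E_neigh)
    return total

def C_xi_star_i(xi, T_i):
    # (29): accumulated weighted own-deviation over t = 1, ..., N-1.
    return sum(xi[t] @ T_i @ xi[t] for t in range(1, xi.shape[0]))

def local_condition_31(rho_i, stage0_cost, C_x_i, C_xi_i):
    # (31): sum_t ||xi||_{T_i}^2 >= -rho_i * stage0_cost + C_x^{*i}.
    return C_xi_i >= -rho_i * stage0_cost + C_x_i

# Illustrative sizes: N = 4, n_i = 2 own states, n_{-i} = 2 neighbour states.
rng = np.random.default_rng(2)
N, n_i, n_ni = 4, 2, 2
x_star = rng.normal(size=(N, n_i))
x_hat_neigh = rng.normal(size=(N, n_ni))
xi = 0.05 * rng.normal(size=(N, n_i))
Q12, Q3, T_i = 0.1 * np.ones((n_i, n_ni)), np.eye(n_ni), 5.0 * np.eye(n_i)
E_neigh = 0.1 * np.ones(n_ni)
stage0 = 3.0          # stands in for ||[x_k^i, x_hat_k^-i]||_{Q_i}^2 + ||u*_{k,0}||_{R_i}^2
print(local_condition_31(0.5, stage0,
                         C_x_star_i(x_star, x_hat_neigh, Q12, Q3, E_neigh),
                         C_xi_star_i(xi, T_i)))
```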

Remark 3. By adding the deviation punishment term to the local cost function, closed-loop stability follows with a large weight, but a larger weight means a greater loss of performance [14, 19]. For a small value of $T_i$, we can adjust the value of $\rho_i(k)$ to obtain exponential stability. Since $\rho_i(k)$ is set by the optimization, this scheme has more freedom in tuning parameters to balance closed-loop stability and control performance.

Remark 4. According to (31), the maximum and minimum admissible values of $T_i$ can be calculated by considering the range of each variable. We choose the middle value for $T_i$. Obviously, $T_i$ is time varying and is therefore denoted as $T_i(k)$.

4. Control Strategy

For practical implementation, distributed MPC is formulated in the following algorithm.

Algorithm
Off-line stage:
(i) Set the value of the prediction horizon $N$.
(ii) According to (3), (5), and (9), find $Q_i$, $R_i$, $\bar{Q}_i$, $\bar{R}_i$, $t = 0, \ldots, N-1$, for all agents.
(iii) Set the initial value of the compatibility constraint for all agents: $\mathcal{E}^i(0) = +\infty$.
(iv) Calculate the terminal weight $P_i$, the local linear feedback control gain $K_i$, and the terminal set $\Omega_i$.
On-line stage. For agent $i$, perform the following steps at $k = 0$:
(i) Take the measurement of $x^i_0$. Set $T_i = 0$.
(ii) Send $x^i_0$ to the neighbors $j$, $j \in \mathcal{N}^i$, of agent $i$. Receive $x^j_0$.
(iii) Set $\hat{x}^j_{0,t} = x^j_0$, $j \in \mathcal{N}^i$, $t = 0, \ldots, N-1$, and $\hat{x}^i_{0,t} = x^i_0$.
(iv) Solve problem (19).
(v) Implement $u^i_0 = u^{*i}_{0,0}$.
(vi) Obtain $\hat{x}^i_{0,t}$ and the value of the compatibility constraint $\mathcal{E}^i(1)$.
(vii) Send $\hat{x}^i_{0,t}$ and $\mathcal{E}^i(1)$ to the neighbors $j$, $j \in \mathcal{N}^i$. Receive $\hat{x}^j_{0,t}$ and $\mathcal{E}^j(1)$. Calculate $T_i(1)$.
For agent $i$, perform the following steps at $k > 0$:
(i) Take the measurement of $x^i_k$.
(ii) Solve problem (19).
(iii) Implement $u^i_k = u^{*i}_{k,0}$.
(iv) Obtain $\hat{x}^i_{k,t}$ and the new value of the compatibility constraint $\mathcal{E}^i(k+1)$.
(v) Send $\hat{x}^i_{k,t}$ and $\mathcal{E}^i(k+1)$ to the neighbors $j$, $j \in \mathcal{N}^i$. Receive $\hat{x}^j_{k,t}$ and $\mathcal{E}^j(k+1)$.
(vi) Calculate $T_i(k+1)$.
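The communication and update pattern of the algorithm can be sketched as follows (a structural skeleton only: the local optimization (19) is replaced by a stub, and names such as `solve_problem_19` are placeholders rather than part of the proposed method).

```python
import numpy as np

class Agent:
    """Skeleton of one agent's on-line behaviour in the algorithm above (illustrative only)."""
    def __init__(self, x0, N):
        self.x = np.asarray(x0, dtype=float)
        self.N = N
        self.E = np.full(self.x.size, np.inf)      # compatibility bound, E^i(0) = +infinity
        self.assumed = np.tile(self.x, (N, 1))     # assumed trajectory sent to the neighbours

    def solve_problem_19(self, neighbour_assumed):
        # Placeholder for the local optimization (19): here the agent simply predicts that it
        # stays where it is, which trivially respects the deviation bound |x - x_hat| < E.
        return np.zeros(2), np.tile(self.x, (self.N, 1))

    def step(self, neighbour_assumed):
        u0, predicted = self.solve_problem_19(neighbour_assumed)
        # (26): next compatibility bound from the deviation between computed and assumed
        # trajectories (a small slack keeps the strict inequality (27) feasible at zero deviation).
        self.E = np.max(np.abs(predicted - self.assumed), axis=0) + 1e-9
        self.assumed = predicted                   # broadcast to the neighbours next interval
        return u0, self.assumed.copy(), self.E.copy()

# Two agents that are each other's neighbours, run for a few sampling intervals.
agents = [Agent([1.0, 0.0], N=5), Agent([-1.0, 0.0], N=5)]
shared = [a.assumed.copy() for a in agents]        # information exchanged once per interval
for k in range(3):
    outputs = [agents[i].step(shared[1 - i]) for i in range(2)]
    shared = [out[1] for out in outputs]
    print(k, [out[2] for out in outputs])
```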

5. Numerical Example

We consider the model of agent $i$ [22] as
$$ x^i_{k+1} = \begin{bmatrix} I_2 & I_2 \\ 0 & I_2 \end{bmatrix} x^i_k + \begin{bmatrix} 0.5\, I_2 \\ I_2 \end{bmatrix} u^i_k, \quad (32) $$

which is obtained by discretizing the continuous-time model
$$ \dot{x}^i = \begin{bmatrix} 0 & I_2 \\ 0 & 0 \end{bmatrix} x^i + \begin{bmatrix} 0 \\ I_2 \end{bmatrix} u^i \quad (33) $$

(π‘₯π‘–π‘˜=[π‘žπ‘˜π‘–,π‘₯,π‘žπ‘˜π‘–,𝑦,π‘£π‘˜π‘–,π‘₯,π‘£π‘˜π‘–,𝑦]T, π‘žπ‘˜π‘–,π‘₯ and π‘žπ‘˜π‘–,𝑦 are positions in the horizontal and vertical directions, resp. π‘£π‘˜π‘–,π‘₯ and π‘£π‘˜π‘–,𝑦 are velocities in the horizontal and vertical directions, resp.) with sampling time interval of 0.5 second. There are four agents. A set of positions of the four agents constitute a formation.

The initial positions of the four agents are
$$ \left[ q^{1,x}_o, q^{1,y}_o \right] = \left[ 0, 2 \right], \quad \left[ q^{2,x}_o, q^{2,y}_o \right] = \left[ -2, 0 \right], \quad (34) $$
$$ \left[ q^{3,x}_o, q^{3,y}_o \right] = \left[ 0, -3 \right], \quad \left[ q^{4,x}_o, q^{4,y}_o \right] = \left[ 2, 0 \right]. \quad (35) $$

The linear constraints on the states and inputs are
$$ \left| x^i \right| \leq \left[ 100 \ 100 \ 15 \ 15 \right]^T, \quad \left| u^i \right| \leq \left[ 2 \ 2 \right]^T. \quad (36) $$

Agents $i = 1, 2, 3$ are selected as the core agents of the formation. $\mathcal{A}_0$ is designed as $\mathcal{A}_0 = \{(1,2), (1,3), (2,4)\}$. If all agents achieve the desired formation and the core agents cooperatively cover the virtual leader, then $u^{i,x}_k = 0$ and $u^{i,y}_k = 0$. The global cost function is obtained as
$$ J(k) = \sum_{t=0}^{\infty} \Big[ \left\| q^1_{k,t} - q^2_{k,t} + c_{12} \right\|^2 + \left\| q^1_{k,t} - q^3_{k,t} + c_{13} \right\|^2 + \left\| q^2_{k,t} - q^4_{k,t} + c_{24} \right\|^2 + \tfrac{1}{9} \left\| \left( q^1_{k,t} + q^2_{k,t} + q^3_{k,t} \right) - q_c \right\|^2 + \left\| v^1_{k,t} \right\|^2 + \left\| v^2_{k,t} \right\|^2 + \left\| v^3_{k,t} \right\|^2 + \left\| v^4_{k,t} \right\|^2 + \left\| u_{k,t} \right\|^2 \Big]. \quad (37) $$
The agents cooperatively track a virtual leader whose reference is $q_c = (0.5 k, 0)$. The desired offsets between agents are $c_{12} = (-2, 1)$, $c_{13} = (-2, -1)$, and $c_{24} = (-2, 1)$. Choose $\mathcal{N}^1 = \{2\}$, $\mathcal{N}^2 = \{1\}$, $\mathcal{N}^3 = \{1\}$, $\mathcal{N}^4 = \{2\}$. Collecting the quadratic terms of (37) gives the global weights $Q$ and $R = I_8$; dividing $Q$ by the technique of [19] gives the local weights $Q_1$, $Q_2$, $Q_3$, and $Q_4$ of (38),

and $R_i = I_2$, $i \in \{1, 2, 3, 4\}$. Choose $\bar{Q}_i = 6.85 I_4$ and $\bar{R}_i = I_2$, $i \in \{1, 2, 3, 4\}$, and $N = 10$. The terminal set is given by $\alpha_i = 0.22$. The above choice of model, cost, and constraints allows problem (19) to be rewritten as a quadratic program with a quadratic constraint. To solve the optimal control problems numerically, the package NPSOL 5.02 is used. From top to bottom, the first subgraph of Figure 1 shows the evolution of the formation with centralized MPC; the second subgraph shows the evolution of the formation with distributed MPC under the time-varying compatibility constraint; the third subgraph shows the evolution of the formation with distributed MPC under a fixed compatibility constraint.
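The original results were obtained with NPSOL 5.02. As a loose illustration of posing (19) as a quadratic program with a quadratic constraint, the sketch below solves a generic problem of that form with scipy.optimize.minimize (SLSQP); the matrices are random placeholders and do not represent the formation problem of this section.

```python
import numpy as np
from scipy.optimize import minimize

# Generic quadratic program with one quadratic (terminal-set-type) constraint:
#   min_u  0.5 u^T H u + g^T u    s.t.  (F u + d)^T P (F u + d) <= alpha
rng = np.random.default_rng(3)
m = 6                                    # length of the stacked input sequence (placeholder)
M = rng.normal(size=(m, m))
H = M @ M.T + m * np.eye(m)              # positive-definite Hessian
g = rng.normal(size=m)
F = rng.normal(size=(4, m))              # maps the input sequence to the terminal state
d = 0.1 * rng.normal(size=4)
P = np.eye(4)
alpha = 5.0

objective = lambda u: 0.5 * u @ H @ u + g @ u
terminal_set = {"type": "ineq",          # SLSQP convention: fun(u) >= 0
                "fun": lambda u: alpha - (F @ u + d) @ P @ (F @ u + d)}
result = minimize(objective, x0=np.zeros(m), method="SLSQP", constraints=[terminal_set])
print(result.success, np.round(result.x, 3))
```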

With the three control schemes, the formation of all agents can be achieved. The obtained values of $J_{\mathrm{true}}$ are $2.5779 \times 10^6$, $4.8725 \times 10^6$, and $5.654 \times 10^6$, respectively. Compared with the second subgraph, the third subgraph shows a large overshoot at the time instant $k = 9$ (near the position $(3, 0)$). The distributed MPC with the time-varying compatibility constraint therefore gives a better control process than the one with the fixed compatibility constraint. The values of $\rho_i(k)$ are shown in Figure 2: "*" for agent 1, "O" for agent 2, ">" for agent 3, and "<" for agent 4.

Remark 5. For the simulation with the fixed compatibility constraint, the value of the constraint is 0.2. For the simulation with the time-varying compatibility constraint, the values of the constraint are calculated from the state deviations of the previous horizon.

6. Conclusions

In this paper, we have proposed an improved distributed MPC scheme for multiagent systems based on deviation punishment. One feature of the proposed scheme is that the cost function of each agent penalizes the deviation between the predicted state trajectory and the assumed state trajectory, which improves the consistency and the optimality of the control trajectories. At each sampling time, the value of the compatibility constraint is set from the deviation at the previous sampling instant. Closed-loop stability is guaranteed with a small value of the weight on the deviation punishment term. The effectiveness of the scheme has been demonstrated by a numerical example. Future work will focus on the feasibility of the optimization.

Acknowledgment

This work is supported by a Grant from the Fundamental Research Funds for the Central Universities of China, no. CDJZR10170006.