Abstract

This paper presents a dynamical recurrent neural network (RNN)-based model predictive control (MPC) structure for the formation flight of multiple unmanned quadrotors. A distributed hierarchical control system with a translational subsystem and a rotational subsystem is proposed to handle the formation-tracking problem for each quadrotor. An RNN-based MPC is designed for each subsystem, with the RNN serving as the predictive model. To improve the modeling accuracy, an adaptive updating law is developed to tune the RNN weights online. An adaptive differential evolution (DE) algorithm is used to solve the MPC optimization problem. The closed-loop stability is analyzed, and the convergence of the DE algorithm is discussed. Finally, simulation examples are provided to illustrate the validity of the proposed control structure.

1. Introduction

The unmanned quadrotor is a class of pilotless aerial vehicles that can hover, take off, and land vertically thanks to their aerodynamic and propulsion characteristics. Owing to its simplicity, maneuverability, and payload capacity [1, 2], the unmanned quadrotor has gained extensive attention and plays an important role in military and civilian fields, such as real-time reconnaissance, surveillance, search and rescue missions, bush fire monitoring, agricultural crop dusting, and other airborne operations [3–5]. These flight missions not only impose stringent requirements on the modeling and control of quadrotor dynamics but also increasingly require multiple quadrotors to coordinate in a formation. The basis of these flight missions is trajectory tracking. However, the dynamics of the quadrotor are highly nonlinear and subject to uncertain disturbances, so designing a suitable control system for the unmanned quadrotor is challenging.

To solve the above problems, several control approaches have been investigated in recent years, such as proportional-integral-derivative (PID) control and linear quadratic regulator (LQR) control [6], sliding mode control [7], and backstepping control [8]. However, these methods only consider the control effects and often neglect the actual flight constraints. Besides, the optimality of the path cannot be ensured, as the selection of reactive parameters is aimless or blind.

To address these problems, a more recent method, model predictive control (MPC), has been proposed for the unmanned quadrotor system [9, 10], where the physical constraints on the system inputs and outputs can be handled by adding proper penalty terms to the cost function. In [11], the MPC method was proposed to solve the path-tracking problem for the unmanned quadrotor, where a future control sequence was computed over a defined horizon such that the predicted quadrotor output was driven close to the reference. In [12], a control strategy that combined active disturbance rejection control with model predictive control was presented for the unmanned quadrotor. That method improved the robustness to modeling errors and disturbances while achieving smooth trajectory tracking.

The performance of MPC depends heavily on the accuracy of the predictive model. Conventional system identification techniques, such as the maximum likelihood estimation method, the modified maximum likelihood estimation method, and the Kalman filtering method [13–15], have been widely used. However, these methods require a known model structure, and an accurate mathematical model of the unmanned quadrotor is difficult to obtain given the uncertain disturbances in complex environments.

As a nonconventional method, the neural network (NN) does not explicitly need an accurate mathematical model [16–19]. Based on that, some works on NN-based MPC have improved the modeling performance for many different nonlinear systems. In [20], Nikdel et al. presented a model predictive controller with a feedforward neural network (FNN) model for a single-degree-of-freedom rotary manipulator, achieving good closed-loop performance. In [21], a model predictive controller based on the FNN model was used to improve the transient stability of a power system. However, the main drawback of the FNNs considered in [20, 21] is that they are essentially static input-to-output maps, so their capability of representing complex and time-varying nonlinear systems is limited. Besides, they can only provide a predetermined finite number of predictions [22, 23]. To overcome these drawbacks, a recurrent neural network (RNN) is introduced for MPC, which can capture the system behavior dynamically and provide long-range predictions even in the presence of disturbances [24–26]. In an RNN, the current control signal and state are given as inputs to obtain the outputs, which are fed back as new states. Therefore, the RNN is more suitable for modeling the quadrotor system for MPC in this paper.

On the other hand, the performance of MPC also relies on the local optimization problem in MPC, which can be solved either analytically or numerically. The solution depends on the structure of the cost function. For the tracking problem of the multiquadrotor formation in this paper, the corresponding cost function is complex and nondifferentiable. Traditional search techniques, which use characteristics of the problem (gradients, Hessians, and linearity) to determine the next sampling point, are not feasible, because the characteristics of the tracking performance are affected by mismatched disturbances. Stochastic search techniques, such as the genetic algorithm (GA), the particle swarm optimization (PSO) algorithm, and the chaotic predator-prey biogeography-based optimization (CPPBBO) algorithm [27–29], make no such assumptions. Instead, the next point is determined by stochastic decision rules. Therefore, this paper proposes a stochastic search technique based on the differential evolution (DE) algorithm [30], which has been successfully applied in many fields recently [31, 32], to solve the optimization problem in MPC.

Although the concept of NN-based MPC is not new [33, 34], the application of RNN-based MPC to multiquadrotor formation flight remains scarce. Motivated by this fact, this paper presents a dynamical RNN-based MPC strategy for multiquadrotor formation flight. The main contributions are summarized as follows:

(i) A distributed hierarchical control system with a translational subsystem and a rotational subsystem is proposed to simplify the control process for the unmanned quadrotor.

(ii) To overcome the drawbacks of conventional MPC, an RNN is introduced as the predictive model in MPC to deal with the mismatched modeling for both subsystems.

(iii) The composite control structure based on RNN and MPC is novel for multiquadrotor formation flight and shows good control performance in the simulation results as well as in the theoretical analysis.

The remainder of this paper is organized as follows. In Section 2, the problem formulation is introduced briefly. Section 3 models the MPC-based tracking problem. In Section 4, the RNN-based prediction model is presented. The control structure is provided in Section 5. In Section 6, the main results concerning stability of the proposed control scheme are analyzed. Simulation results are shown in Section 7. Section 8 includes the conclusion.

2. Problem Formulation

Suppose that there are () unmanned quadrotors with the same dynamic characteristics moving in , which together compose a multiquadrotor formation system. In what follows, the unmanned quadrotor is a miniature four-rotor quadrotor [35]. Each quadrotor can be viewed as a 6 Degree-of-Freedom (DOF) rigid body with generalized coordinates , where denotes the quadrotor’s positions with reference to the earth-fixed frame (E-frame) and represents the attitudes with reference to the body-fixed frame (B-Frame) (). Furthermore, the flight motion is controlled by a total thrust and three control torques , , and . And the dynamical model of the quadrotor can be defined by the nonlinear state-space function . To simplify the control process, the quadrotor system can be decomposed into two subsystems in Figure 1: the translational subsystem and the rotational subsystem [36].

Define the system state of the translational subsystem as and its control input as [37], where and are two intermediate control inputs, given by

Then, the dynamics of translational subsystem can be described as where : is a nonlinear function, is the bounded disturbance, is the translational subsystem output, and

Similarly, the system state and control input of rotational subsystem are represented by and , respectively [37]. Then, the dynamic equation is described as where : is a nonlinear function, is the bounded disturbance, is the rotational subsystem output, and

Based on (2) and (4), both subsystems can be expressed in the discrete-time domain as follows: where and are discrete nonlinear functions and is the sampling time.

As shown in Figure 2, the leader-follower structure is introduced for the multiquadrotor system. Define a quadrotor as the leader in the formation, and its state vector is represented by . Each quadrotor is equipped with a complete set of communication equipment and controllers. Assuming that the information exchange topology is undirected and connected, each follower receives the control commands from the leader directly or indirectly, then executes the corresponding maneuvers to keep the relative state to the leader, where denotes the relative state of the follower.

The distance between the quadrotor and the quadrotor can be described as follows:

In order to avoid collision among quadrotors, should be greater than the safe anti-collision distance , i.e.,

Also, to ensure the real-time direct communication of adjacent quadrotors, would be less than the communication distance , i.e.,
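
The two distance constraints above can be checked with a short sketch. It assumes 3-D positions stored as tuples, applies the anti-collision bound to every pair, and applies the communication bound only to pairs listed as adjacent. The names `check_formation`, `adjacent_pairs`, `d_safe`, and `d_comm` are illustrative and do not come from the paper.

```python
import math
from itertools import combinations

def check_formation(positions, adjacent_pairs, d_safe, d_comm):
    """Return (too_close, too_far) lists of index pairs violating the bounds.

    positions      : list of (x, y, z) tuples, one per quadrotor
    adjacent_pairs : list of (i, j) pairs that must stay within d_comm
    d_safe, d_comm : assumed safe and communication distances
    """
    # Anti-collision: every pair must be farther apart than d_safe.
    too_close = [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(positions), 2)
        if math.dist(p, q) <= d_safe
    ]
    # Communication: adjacent pairs must be closer than d_comm.
    too_far = [
        (i, j)
        for i, j in adjacent_pairs
        if math.dist(positions[i], positions[j]) >= d_comm
    ]
    return too_close, too_far

positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(check_formation(positions, [(0, 1), (1, 2)], d_safe=0.5, d_comm=5.0))
```

In an MPC setting, such a check would typically feed a penalty term rather than a hard rejection, so that the optimizer can still rank infeasible candidates.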

3. Model Predictive Control

Generally, MPC is a control process based on predicting future outputs and obtaining nonlinear model outputs at each time horizon [38]. In this paper, a distributed MPC feedback system is designed for each quadrotor in Figure 3. It is mainly composed of the predictive model, reference planning, feedback compensation, and rolling optimization, where , , , , , and denote the control input, output state, reference, predictive output, revised predictive output, and disturbance input, respectively.

At each time horizon, the predictive model produces the predictive output, which feedback compensation turns into the revised predictive output. The control input is then determined optimally by comparing the revised predictive output with the reference, and the system output follows from it. The next time horizon then begins, and the predictive model generates a new predictive output. The process is iterated online until the system output reaches the reference.

Because the system is nonlinear and time-varying, the predictive output rarely matches the real output exactly. The prediction error between the actual output and the predictive output is therefore used to revise the predictive output. The formulation can be described as where is the weight coefficient matrix.
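
A minimal sketch of this feedback compensation step follows. It assumes the weight coefficient matrix is diagonal (stored as the list `h`) and that the same weighted error is added to every step of the predicted sequence; both choices, and all names, are illustrative assumptions rather than the paper's exact formulation.

```python
def revise_prediction(y_pred_future, y_actual_now, y_pred_now, h):
    """Correct future predictions with the weighted current prediction error.

    y_pred_future : list of predicted output vectors over the horizon
    y_actual_now  : measured output vector at the current step
    y_pred_now    : model output vector at the current step
    h             : diagonal entries of the (assumed diagonal) weight matrix
    """
    # Current mismatch between the plant and the predictive model.
    error = [a - p for a, p in zip(y_actual_now, y_pred_now)]
    # Shift every future prediction by the weighted mismatch.
    return [
        [y + hi * e for y, hi, e in zip(step, h, error)]
        for step in y_pred_future
    ]

revised = revise_prediction(
    y_pred_future=[[1.0, 2.0], [1.5, 2.5]],
    y_actual_now=[1.2, 1.8],
    y_pred_now=[1.0, 2.0],
    h=[0.5, 0.5],
)
print(revised)
```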

Because the quadrotor system is decomposed into two subsystems, MPC is applied separately to each. In the outer loop, the translational subsystem is controlled to follow a sequence of reference . The optimal control inputs , , and are obtained by minimizing the trajectory tracking errors. With , the desired attitudes and are calculated using (1). Then, the rotational subsystem’s attitudes are adjusted to achieve the reference in the inner loop. By minimizing attitude errors, the optimal controls , , and are calculated. Finally, the optimal control inputs , , , and are applied to the quadrotor.

Consider the MPC formulation based on the constrained finite-horizon optimization of tracking problem for translational subsystem [39]: subject to where is the revised predictive output and and are control inputs and incremental control moves, respectively. is the reference position. and are the upper and lower bounds of the system states and control inputs, respectively. and are two 2-norms weighted by positive definite matrices and , respectively. is the prediction horizon, is the control prediction horizon (), and represents the penalty factor.
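
To make the structure of such a finite-horizon cost concrete, the sketch below evaluates a weighted tracking term over the prediction horizon plus a weighted control-move term over the control horizon. It assumes diagonal weighting matrices and leaves out the constraint handling and the penalty factor; the function and argument names are hypothetical.

```python
def tracking_cost(y_pred, ref, du, q_diag, r_diag):
    """Weighted quadratic MPC cost (sketch, diagonal weights assumed).

    y_pred : revised predicted outputs, one vector per prediction step
    ref    : reference vectors, same shape as y_pred
    du     : incremental control moves, one vector per control step
    q_diag : diagonal of the output-error weighting matrix
    r_diag : diagonal of the control-move weighting matrix
    """
    cost = 0.0
    # Tracking error accumulated over the prediction horizon.
    for y, r in zip(y_pred, ref):
        cost += sum(q * (yi - ri) ** 2 for q, yi, ri in zip(q_diag, y, r))
    # Control effort accumulated over the control horizon.
    for d in du:
        cost += sum(w * di ** 2 for w, di in zip(r_diag, d))
    return cost

print(tracking_cost(
    y_pred=[[1.0, 0.0], [2.0, 0.0]],
    ref=[[0.0, 0.0], [0.0, 0.0]],
    du=[[1.0, 1.0]],
    q_diag=[1.0, 1.0],
    r_diag=[0.5, 0.5],
))
```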

Remark 1. The tracking problem of the quadrotor formation is decomposed into two phases. Firstly, the leader tracks a given trajectory in a certain state. Then, the followers regard the leader as the trace point to track and further achieve keeping formation. Therefore, if the quadrotor is the leader, denotes reference trajectory; otherwise, it represents the position of the leader for the followers.

Similarly, for the rotational subsystem, the MPC optimization problem is described as subject to where represents the reference attitude.

In addition, the nonlinear constraints of (11a) and (12a) should be considered further. A feasible approach is to introduce a penalty cost, transforming the constrained problem to an alternative unconstrained form [39]. Taking the translational subsystem for example, we substitute and (inequality constraint (11b)), and (inequality constraint (11c)), and and (inequality constraint (11d)), then the penalty cost function can be defined as follows: where with and , if ; otherwise, and . The introduction of makes it possible to consider the inequality constraints during the optimization.
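
One common way to fold inequality constraints into an unconstrained cost is an exterior quadratic penalty, which is zero inside the feasible box and grows with the size of the violation. The sketch below illustrates that generic idea only; the paper's exact penalty terms and coefficients are not reproduced here, and `box_penalty` and `weight` are assumed names.

```python
def box_penalty(values, lower, upper, weight):
    """Exterior quadratic penalty for box constraints lower <= value <= upper.

    Returns 0.0 when every value is within its bounds; otherwise returns
    weight times the sum of squared bound violations.
    """
    total = 0.0
    for v, lo, hi in zip(values, lower, upper):
        # Only one of the two terms can be positive for a given value.
        excess = max(0.0, v - hi) + max(0.0, lo - v)
        total += weight * excess ** 2
    return total

# One value inside the box, one 1.0 above, one 2.0 below.
print(box_penalty([0.5, 2.0, -3.0], [-1.0] * 3, [1.0] * 3, weight=10.0))
```

Because the penalty is added to the tracking cost, a derivative-free optimizer such as DE can compare feasible and infeasible candidates on a single scalar.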

Similarly, the penalty cost function of the rotational subsystem can be given by

Based on (8) and (9), the constraints of and are also taken into account. The corresponding auxiliary optimization term is described as where and as a penalty term is added in (13) according to the flight conditions of the multiquadrotor.

Remark 2. The cost functions of (13) and (15) are complex and nondifferentiable. So the optimization problem using traditional search techniques is not feasible. In this study, the DE algorithm [30] as a probabilistic search technique is proposed to solve the optimization problem for MPC. The corresponding details will be presented in Section 5.

4. RNN-Based Prediction Models

Although the feedback mechanism of MPC tolerates some model mismatches for multiquadrotor system, MPC still demands that the prediction model be sufficiently accurate. In this section, RNN is considered as the prediction model for MPC, due to its superiority in comparison with other conventional modeling methods [40].

4.1. Dynamical RNN

In an RNN, there is usually feedback from the hidden layer output to the hidden layer input. This recurrent connection allows the NN to detect and generate time-varying patterns [41]. In this paper, the RNN predictive model is separately developed for the translational subsystem and rotational subsystem. Both subsystems belong to the class of multi-input/multioutput square nonlinear dynamical systems (that is, the systems with as many inputs as outputs), which have the general form of where is the state, is the control input, is the system output, is a smooth function, is a constant matrix, and is a matrix with columns ; note that and contain parametric uncertainties.

This paper deals with the control of both subsystems described by (17), i.e., and . However, due to the complex environment and disturbances, the system (17) is difficult to use if the functions and cannot be accurately known. Therefore, in order to provide a solution to this problem, a dynamical RNN is chosen to obtain a more accurate model of (17) in Figure 4.

This RNN contains dynamical hidden neurons, i.e., a number of neurons equal to the dimension of the state vector of subsystem dynamics. And it can be described by the following system [42, 43]: with the state , the input , the output , is a matrix of adjustable synaptic weights, is a diagonal matrix with negative eigenvalues , is a diagonal matrix with scalar elements , and is a diagonal matrix with adjustable synaptic weights. is a 6-dimensional vector with elements of , and is a matrix with elements of . and are represented by sigmoids of the form where , , , and are constants representing the bound and slope of the sigmoid’s curvature and is a constant that shifts the sigmoid, such that for all . In this paper, the designed parameters are chosen as , , , and .
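
The sketch below illustrates a shifted sigmoid and one explicit Euler step of a recurrent model with the ingredients listed above: a diagonal stable matrix, a diagonal input gain, a full weight matrix acting on one sigmoid vector, and a diagonal weight matrix acting on a second sigmoid that multiplies the input. The specific state equation, the Euler discretization, and all names (`sigmoid`, `rnn_euler_step`, `k`, `l`, `lam`) are assumptions made for illustration, not the paper's exact model.

```python
import math

def sigmoid(x, k=1.0, l=1.0, lam=0.0):
    """Shifted sigmoid k / (1 + exp(-l*x)) + lam, evaluated stably."""
    z = l * x
    if z >= 0:
        base = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)          # avoids overflow for very negative z
        base = ez / (1.0 + ez)
    return k * base + lam

def rnn_euler_step(x, u, W, W1, a, b, dt, s=sigmoid, s2=sigmoid):
    """One Euler step of dx_i/dt = a_i*x_i + b_i*(sum_j W_ij*s(x_j) + W1_i*s2(x_i)*u_i).

    a  : diagonal of the stable matrix (negative entries)
    b  : diagonal input gains
    W  : full matrix of adjustable weights
    W1 : diagonal of the second adjustable weight matrix
    """
    n = len(x)
    sx = [s(xi) for xi in x]
    new_x = []
    for i in range(n):
        recurrent = sum(W[i][j] * sx[j] for j in range(n))
        dx = a[i] * x[i] + b[i] * (recurrent + W1[i] * s2(x[i]) * u[i])
        new_x.append(x[i] + dt * dx)
    return new_x

print(rnn_euler_step([1.0], [0.0], [[0.0]], [0.0], a=[-1.0], b=[1.0], dt=0.1))
```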

4.2. Online Updating for RNN

Based on the above analysis, suppose that both subsystems of the quadrotor can be accurately described by the dynamical RNN (19) plus a modelling error term as follows: where and are the corresponding weight values.

Note that the optimal control input is applied to both the real system and the predictive model. Define the error between the predictive model (19) and the real system (22) as . Assuming is zero, we obtain where and , which are substituted online through the appropriate updating laws.

To obtain the stable updating laws, the Lyapunov synthesis method is introduced. Consider a Lyapunov function candidate as where satisfies the Lyapunov equation .

Differentiating (25) along the solution of (24), there exists

By choosing then

Consequently, based on (27), the weight matrices can be updated online over time horizons.

Theorem 1. The updating laws (27) can guarantee the following properties: (i)(ii)(iii)

Proof 1. Since in (28) is negative semidefinite, we have , which implies , , and . Furthermore, is bounded. Since is a nonincreasing function of time and bounded from below, there exists . By integrating from 0 to , we obtain which implies . By definition, and are bounded for all and all inputs to the RNN are also bounded by assumption. From (24), we conclude . Since and , based on Barbalat’s lemma, there exists . Due to the boundedness of , , , and the convergence of to zero, we have that and also converge to zero.

Remark 3. Combined with the MPC, updating online for RNN is executed over time horizons. With the actual outputs and obtained at the time horizon, the weight matrices of the RNNs are updated online to predict the subsystem output for the next time horizon.

5. Control Structure

The composite control structure based on RNN and MPC is proposed in Figure 5 for each quadrotor, which consists of three main parts: the predictive model based on RNN, the designed cost function, and the evolutionary algorithm. In this figure, the and blocks represent the predictive models of the translational subsystem and the rotational subsystem, respectively. and are the optimal control sequence of the translational subsystem and the rotational subsystem, respectively, where the first elements, i.e., and , are taken as the control inputs for both subsystems.

The evolutionary algorithm adopted in this paper is the DE algorithm. It is an optimization algorithm which searches randomly in continuous space with three evolutionary operations, namely, mutation, recombination, and selection. Suppose that the initial population has individuals inside the given -dimensional search space. The solution vector in continuous space can be represented by , , [30]. The optimization goal of the DE algorithm is to find the proper control variable that minimizes the costs (13) and (15).

5.1. Mutation Operation

As shown in Figure 6, for each in the parent population, the mutation individual is produced by where , , and are randomly chosen and different from the running index and is a real constant scaling factor within , which controls the amplification of the differential variation .
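
The mutation described above matches the classic DE "rand/1" scheme, which can be sketched as follows. The sketch assumes individuals are plain lists of floats and that the population has at least four members; `mutate`, `F`, and `rng` are illustrative names.

```python
import random

def mutate(population, i, F, rng=random):
    """DE rand/1 mutation: base vector plus a scaled difference of two others.

    The three donor indices are distinct from each other and from the
    running index i, so the population needs at least four individuals.
    """
    candidates = [idx for idx in range(len(population)) if idx != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    # F scales the differential variation (x2 - x3).
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(mutate(pop, i=0, F=0.8, rng=random.Random(0)))
```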

5.2. Recombination Operation

For each , a trial individual is generated by copying each component from either the mutation vector or the target vector , as shown in Figure 7. It can be formulated as follows: where is a uniform random number and the recombination control parameter is a constant in the interval .
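
A minimal sketch of this recombination step, assuming the standard binomial crossover of DE: each component is taken from the mutation vector with probability `CR`, and one randomly chosen component is always taken from it so that the trial individual differs from its parent. The forced component and the names `recombine`, `CR`, and `j_rand` are assumptions based on the standard algorithm.

```python
import random

def recombine(target, mutant, CR, rng=random):
    """Binomial crossover between a target vector and its mutation vector."""
    D = len(target)
    j_rand = rng.randrange(D)  # component always inherited from the mutant
    return [
        mutant[j] if (rng.random() < CR or j == j_rand) else target[j]
        for j in range(D)
    ]

target = [0.0, 0.0, 0.0, 0.0, 0.0]
mutant = [1.0, 1.0, 1.0, 1.0, 1.0]
print(recombine(target, mutant, CR=0.5, rng=random.Random(1)))
```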

5.3. Selection Operation

Offspring competes one to one with its parent . The selection operation is expressed by

This selection scheme ensures that the best control variable with the minimum cost can be retained when moving from one generation to the next, which results in a fast convergence behavior.
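
The one-to-one selection can be sketched in a few lines. The cost function is passed in as a callable; `select_population` and the toy quadratic cost are illustrative only.

```python
def select_population(parents, trials, cost):
    """Greedy one-to-one selection: keep the trial only if it is no worse."""
    return [
        trial if cost(trial) <= cost(parent) else parent
        for parent, trial in zip(parents, trials)
    ]

def sphere(x):
    # Toy cost used only for this example.
    return sum(v * v for v in x)

parents = [[2.0], [1.0]]
trials = [[3.0], [0.5]]
print(select_population(parents, trials, sphere))
```

Since each slot keeps the better of parent and trial, the best cost in the population cannot increase from one generation to the next.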

Generally, the DE algorithm has been successfully applied in some fields due to its high search accuracy, robustness, and good convergence speed. However, when the optimization problem of multi-UAV tracking is complex and time-varying, selecting suitable mutation strategies and control parameters at different evolution stages is difficult. Therefore, in this paper, we adopt the ZEPDE algorithm [44], where the appropriate mutation strategy can be gradually adjusted by a roulette wheel, and suitable combinations of and can be generated by zoning evolution. Each individual has its own control parameter combination and mutation strategy. And the corresponding details can be found in [44].

The control algorithm can be summarized as follows:

Step 1. Load the state of the quadrotor and its corresponding reference. Initialize the solution and generate individuals randomly.

Step 2. The decoded values of each individual and in the population are fed to the and models to predict the values of and , . However, consecutive inputs to the RNN models change their internal state, which can corrupt the evaluation of later individuals and prevent the DE algorithm from finding the optimal control variables. To avoid this problem, the RNN models should be reset to their initial condition before each individual is evaluated.

Step 3. Calculate the fitness value of each individual using the predicted states of and , .

Step 4. The individual which has the best fitness value is recorded. Perform the selection method based on the fitness of individuals so that the population retains the better individuals.

Step 5. Execute mutation and recombination to create new individuals in the population.

Step 6. Produce the prediction values for each new individual in the population and calculate their fitness values based on procedures in Steps 2 and 3. The best individual that has been recorded previously is located in the new generation instead of the individual with the worst fitness value.

Step 7. Repeat Steps 4-6 until the convergence criteria of the evaluation function are met.

Step 8. The first elements and of the best individuals and are applied to both the RNN models and the real quadrotor system, and the corresponding outputs are obtained.

Step 9. Perform the feedback compensation in MPC and online updating for RNN. Then, go to Step 2 for the next time horizon.
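
The state reset required in Step 2 can be illustrated with a toy stateful model. The sketch below snapshots the model state, restores it before evaluating each candidate control sequence, and restores it again afterwards. `ToyRecurrentModel` is a stand-in first-order filter and not the paper's RNN; all names are hypothetical.

```python
import copy

class ToyRecurrentModel:
    """Stand-in for a recurrent predictor with internal state."""
    def __init__(self, state):
        self.state = state

    def step(self, u):
        # Simple first-order recurrence; the state persists between calls.
        self.state = 0.9 * self.state + 0.1 * u
        return self.state

def evaluate_population(model, population, cost):
    """Score each control sequence from the same initial model state."""
    snapshot = copy.deepcopy(model.state)
    scores = []
    for controls in population:
        model.state = copy.deepcopy(snapshot)   # reset before each individual
        outputs = [model.step(u) for u in controls]
        scores.append(cost(outputs))
    model.state = snapshot                       # leave the model untouched
    return scores

model = ToyRecurrentModel(0.0)
scores = evaluate_population(
    model,
    population=[[1.0, 1.0], [1.0, 1.0]],
    cost=lambda ys: sum((1.0 - y) ** 2 for y in ys),
)
print(scores)
```

Without the reset, the two identical candidates above would receive different scores, because the second one would start from the state left behind by the first.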

6. Stability Analysis

In this section, we first analyze the stability of the proposed RNN-based MPC and then discuss the convergence of the DE algorithm.

6.1. Stability Analysis of RNN-Based MPC

The quadrotor system in this paper is represented in state space, and the cost functions (11a) and (12a) are also based on states. Therefore, the stability of the proposed predictive scheme is investigated by checking the monotonicity of the cost function with respect to time [39].

Problems (11a) and (12a) can be given in the following general form:

Assume that is the optimal control sequence at time . Then, the corresponding cost function is

Then, at time , based on the optimal control sequence of time , a suboptimal control sequence is postulated as

For the suboptimal control , the cost function can be described as

Assuming that with the same control input, the predictions at time are the same as ones derived at time , the difference between the and can be defined as

Taking into account the terminal equality constraints (33e), it is obvious that , then

Furthermore, introduce as the optimal solution further. It is obvious that , so

It is clear that for , the cost decreases monotonically with respect to time, and hence the proposed control system is stable.
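
The chain of inequalities behind this argument can be summarized in generic notation. The symbols below are assumptions introduced for illustration: $J^{*}(k)$ is the optimal cost at time $k$, $J_{s}(k+1)$ is the cost of the shifted suboptimal sequence at time $k+1$, and $\ell(\cdot) \ge 0$ is the stage cost of the first predicted step, which drops out of the sum when the horizon shifts. The terminal equality constraint makes the newly appended terminal term vanish.

```latex
\begin{aligned}
J_{s}(k+1) - J^{*}(k) &= -\,\ell\big(k+1 \mid k\big) \;\le\; 0, \\
J^{*}(k+1) &\le J_{s}(k+1), \\
\Rightarrow\quad J^{*}(k+1) &\le J^{*}(k).
\end{aligned}
```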

6.2. Convergence Analysis of DE Algorithm

It is known that the MPC method divides the global control problem into a series of local optimizations and each local optimization problem is solved by the adaptive DE algorithm. In this section, the asymptotic convergence of the local optimization is first analyzed; on this basis, the global convergence analysis is presented as well.

6.2.1. Local Convergence Analysis [45]

Suppose that at the time horizon, the optimal solution set of the cost function can be defined as follows: where is a measurable search space and represents the optimal solution individual.

However, condition (40) is hard to satisfy in the actual process. Thus, an expanded optimal solution set is considered as follows: where is a small positive real number. Denoting as the Lebesgue measure, we suppose that the for each . And the expanded optimal solution (41) can be approximately regarded as the optimal one.

Before analyzing the convergence of the local optimization, we need to introduce a concept of the convergence in probability as follows:

Define as a population sequence generated by the adaptive DE algorithm for the local optimization problem, where . The algorithm is said to converge to the optimal solution set in probability if and only if where denotes the probability of an event.

Assumption 1. Based on the above definition, it is assumed that for each population of the adaptive DE algorithm, there exists at least one individual such that where is a small positive value.

Proof 2. Suppose that an individual is generated randomly in each population . The probability that belongs to the optimal solution set is given by

It means that . And the probability that the first populations do not include an individual can be estimated by

Based on basic ideas of the DE algorithm, the best individual in the population has the same or better fitness value than the best one from all the previous populations, implying in other words, which proves (42).
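
The geometric bound used in this argument is easy to evaluate numerically. The sketch below assumes each population independently contains an individual of the expanded optimal set with probability at least `delta`, so the probability of missing it in the first `t` populations is at most `(1 - delta) ** t`. The function names and the tolerance `eps` are illustrative.

```python
import math

def miss_probability_bound(delta, t):
    """Upper bound on the probability that t populations all miss the optimal set."""
    return (1.0 - delta) ** t

def generations_needed(delta, eps):
    """Smallest t with (1 - delta) ** t <= eps, for 0 < delta < 1 and 0 < eps < 1."""
    return math.ceil(math.log(eps) / math.log(1.0 - delta))

print(miss_probability_bound(0.1, 50))
print(generations_needed(0.5, 0.01))
```

The bound tends to zero as `t` grows for any fixed positive `delta`, which is the convergence-in-probability statement in numerical form.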

6.2.2. Global Convergence Analysis

Based on the basic ideas of the MPC approach, the local optimizations follow the same control algorithm by using the adaptive DE algorithm. Therefore, the local convergence analysis of the time horizon can be applicable for the time horizon, . Furthermore, the local optimization of the time horizon is executed on the basis of the optimization results of the time horizon. Finally, this local convergence of each time horizon leads to the global convergence of the entire control process.

7. Simulation Analysis

The performance of RNN-based MPC for quadrotor formation flight is evaluated by computer simulations. All the simulations are programmed with MATLAB, version 2015, and are run on a PC with a clock speed of 2.8 GHz and 4 GB RAM, in a Microsoft Windows 8.1 environment.

7.1. Offline Training for RNN

Before the experiment, the weights of the RNN should be determined preliminarily. The input-output data for RNN training are obtained by simulating a quadrotor model [35] with a PID controller. A large training data set spanning over 1000 seconds is generated by using random step references ranging from 5 seconds to 15 seconds. All system states and control inputs are restricted within the corresponding ranges. To enable the RNN model to learn the noisy dynamics in the presence of sensor noise, wind gusts, and aerodynamic coupling effects, additive disturbances are included for data generation during the simulation. The disturbances and are in the form of white noise with varying intensity. The network is adopted in the form of (19). Figure 8 shows one sample of the training data sets. Based on the test data, the training performance of the RNN is tested in Figure 9, compared with the FNN model [20] and the nonlinear partial-differential-equation-based model [35]. The figures show that the RNN model has the higher prediction accuracy, which supports the choice of the dynamic RNN structure.

7.2. Results and Discussion

A scenario is constructed to simulate a certain flight condition for the multiquadrotor formation. Assuming that a square formation is composed of five quadrotors, the relative position of each quadrotor is described in Figure 10, and their initial positions are , , , , and . Besides, the initial attitudes of all quadrotors are supposed to be . Define the 3rd quadrotor as the leader flying along a reference trajectory called “elliptical”: , , , and . The other quadrotors fly as followers with the following relative states: , , , and . To simulate the real flight environment, bounded disturbances are introduced in the rotational subsystem (white noise of intensity 0.3 rad/sec) and the translational subsystem (white noise of intensity 0.5 m/sec). Moreover, the quadrotor’s safe anti-collision and communication regions are , , respectively.

The parameter settings of the proposed system condition are as follows: the dynamic parameters of unmanned quadrotor system in this paper are represented in Table 1. The limits on the system states including the range of , , and ; the corresponding turn rates , , and ; and the control inputs , , , and are shown in Table 2. Table 3 shows the DE algorithm-run parameters and the associated weight values of the fitness function.

In such a case, the MPC with the RNN online model is introduced for the formation-tracking problem over the prediction horizons, compared with the conventional MPC and traditional PID, where the parameters of the PID controller are , , and . For both conventional MPC and RNN-based MPC, the prediction horizon and control horizon are and and the sampling frequency is 4 Hz. In Figure 11, taking the tracking performance of the leader as an example, the results of three different control methods are presented. The RNN-based MPC approach yields the smallest tracking errors of the three. In particular, the control inputs of the PID controller violate the constraints at some time horizons.

To quantify the control results, the tracking performance of the quadrotor formation can be measured using the integral of absolute error (IAE) [46], where is the total number of quadrotors and is the total number of time horizons.
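
As an illustration of the metric, the sketch below accumulates absolute tracking errors over all quadrotors and all time horizons. It assumes one scalar error per quadrotor and time step and an optional sampling interval `dt`; whether the paper normalizes by the number of quadrotors or time steps is not stated here, so no normalization is applied.

```python
def integral_absolute_error(errors, dt=1.0):
    """Discrete IAE: dt times the sum of |e_i(k)| over quadrotors i and steps k.

    errors : list with one inner list of scalar tracking errors per quadrotor
    dt     : assumed sampling interval used to approximate the integral
    """
    return dt * sum(abs(e) for per_quadrotor in errors for e in per_quadrotor)

# Two quadrotors, two time steps each.
print(integral_absolute_error([[1.0, -2.0], [0.5, 0.5]]))
```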

The IAEs of the conventional MPC and RNN-based MPC are presented in Table 4. The RNN-based MPC achieves the better tracking performance, with smaller errors. This is because the RNN is trained based on the test data and has learned the influences of unknown disturbances before the experiments. In addition, thanks to the online updating of the RNN, the proposed MPC method can estimate the disturbances in a timely manner, which improves the control performance.

The 3-D performance of formation tracking with RNN-based MPC is depicted graphically in Figure 12. The overall variations of attitude and position are shown in Figure 13. The corresponding errors of relative attitude (Figure 14(a)) and position (Figure 14(b)) between the leader quadrotor and followers are within the anticipated range. Overall, the proposed control system achieves accurate formation tracking with fast and precise responses and effectively attenuates the mismatched disturbances within the acceptable error range.

8. Conclusion

In this paper, a novel predictive control scheme based on RNNs was proposed for the multiquadrotor system to achieve trajectory tracking and formation keeping simultaneously. Unlike conventional MPC, the proposed control approach uses RNNs as the predictive model and the adaptive DE algorithm to solve the local optimization problem for both subsystems of each quadrotor. Moreover, the closed-loop stability was analyzed, and the convergence of the DE algorithm was discussed. Finally, the effectiveness of the designed control strategy was illustrated through comparative simulation results. However, the applicability of the proposed control structure is limited by the high computational burden of receding optimization. Therefore, an appealing extension of this work could focus on the time efficiency of the whole control process.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the Aeronautical Science Foundation of China under Grant No. 20155896025.