#### Abstract

We address a dynamic configuration strategy for teams of Unmanned Air Vehicles (UAVs). A team is a collection of UAVs which may evolve through different organizations, called configurations. The team configuration may change with time to adapt to environmental changes, uncertainty, and adversarial actions. Uncertainty comes from the stochastic nature of the environment and from incomplete knowledge of adversary behaviors. To each configuration, there corresponds a set of different properties for the UAVs in the team. The design for the configuration control problem involves a distributed hierarchical control architecture where the properties of the system can be formally analyzed. We do this in the framework of dynamic networks of hybrid automata. We present results from simulation to demonstrate different scenarios for adversarial response.

#### 1. Introduction

The use of Unmanned Aerial Vehicles (UAVs) has increased radically in the last decades. To reduce the risk to human life, emerging military and civilian applications alike promote the use of autonomous UAVs. Civilian applications include geological surveying, fire monitoring, and rescue missions. Military applications include Intelligence, Surveillance and Reconnaissance (ISR), Suppression of Enemy Air Defense (SEAD), and high-value asset recovery scenarios. In the early years, UAVs were completely controlled by human operators on the ground. The last decade has witnessed unprecedented interactions between technological developments in computing, control, and communications. These developments have led to the design and implementation of interacting dynamical systems such as networked unmanned multivehicle systems.

##### 1.1. Motivation

Modeling the operations of opposing forces on a battlefield is challenging for several reasons. One is the combinatorial nature of the problem. Others include the complexity of interactions (e.g., through task coupling and uncertainty), the dynamic nature of the situation, the fact that decisions must be made with partial, limited information, and the need for stochastic modeling. Two open research problems stand out. What is the most adequate form of organization for a particular pattern of adversarial behaviors? How can the form of an organization adapt to changes in the adversarial behaviors?

A closer look at the operational environments suggests that their structure may provide guidelines for the design of automated organizations. We should be able to exploit this structure to propose forms of organization for unmanned air vehicles that are not only well adapted to each specific situation presented by the adversary but can also adapt to changing situations.

##### 1.2. Problem Statement

This paper describes the modeling required to perform high-level tasks with teams of UAVs in the context of adversarial operations. For motivation, we consider a high-value asset recovery scenario (see Figure 1). In this scenario, we focus on the Blue team, which consists of UAVs trying to penetrate or eliminate the Red force’s integrated air defense system to open the road for ground vehicles to reach a high-value asset. We are concerned with the organization and control of the Blue team under the assumption that the forms of organization of the Red force are known. To this end, we introduce the notion of a team. We define a team as a group of *n* UAVs that share a common mission, where *n* is one or greater; a team may therefore consist of a single UAV. For the examples considered in this paper, all UAVs have the same abilities. Future work will consider heterogeneous UAVs. Team configurations are intended to model the properties resulting from synergistic interactions inside the Blue or Red force. Team configurations are modified and adjusted frequently. For example, the integrated air defense system under consideration consists of Surface-to-Air Missile (SAM) sites, whose operations are coordinated by a command center. There are many situations in which the configuration of a team of UAVs should be adjusted online to adapt to changing parameters, to maximize the effectiveness of the mission, or to maintain the safety of the units involved. Here, we deal with a large adversarial environment that contains many areas of interest. Each area of interest contains a different number of SAM sites with short-range radars and a command center. Each team of UAVs should adjust its configuration according to the SAM sites’ configurations and shooting capabilities.

We aim to study what is meant by configurations and how certain configurations can adapt to changes in the adversarial behavior. To do that, we consider two types of configurations: UAV configurations and SAM configurations. We assume that SAMs can operate in one of two configurations: cooperatively or independently. In the cooperative configuration, the command center is alive and performing its communication role between SAM sites; the SAMs’ probabilities of detecting and destroying UAVs increase, and the firing strategy changes so that fewer missiles are fired than under independent behavior. When the command center is rendered inoperative (the independent configuration), the SAMs switch to the isolated firing mode. As for UAV configurations, UAVs can work either as an integrated team or in isolation. We assume that, given a number of UAVs, we send them sequentially in small groups of one or more. The question is then whether to send one UAV or a team of UAVs to target a given area at a given time. We assume that a single UAV has more ability to shoot and hide. A team of UAVs has a higher collaborative shooting ability but is also easier to track and shoot. In all configurations, UAVs should satisfy the allowable risk factor condition: if the risk factor for a given SAM site from a certain location is higher than the input allowable risk, the UAVs should try to find another location from which to shoot this SAM, or go for another SAM that satisfies the risk condition. Solving these problems is not easy because of the combinatorial nature of the problem, the complexity of interactions, the dynamic nature of the situation, and the need for stochastic modeling.

##### 1.3. Literature Review

There has been much recent work on the control and coordination of teamed UAVs in adversarial environments. The complexities of managing teamed UAVs in stochastic adversarial terrain require new UAV designs, new techniques for navigation and control, and new collaborative methodologies, as well as interfaces with the human operator. Collaboration of multiple UAVs in complicated environments is also an area of wide scope and great interest, including planning, scheduling, and resource allocation for multi-UAV, multitask missions [1–4], coordinated area surveillance [5, 6], collision avoidance [7], etc. New concepts of operation are emerging for UAVs in SEAD missions, as described in [8–11], and facilitated by the Interactive Warfare Simulation (IWARS) [8], Boeing C4ISim Open Experimentation platform (Boeing OEP) [9], and Flexible Analysis Modeling and Exercise System (FLAMES) software [11]. A number of hardware platforms [12–16] are available for those wishing to validate their ideas experimentally. Until recently, much of the work available in the literature [14, 17] dealt with motion control of vehicles, usually in the form of waypoint following or trajectory tracking. Parallel advances in wireless communication technologies are enabling applications that include cooperative control of multiple UAVs working together to accomplish a greater mission goal. Multiple-UAV control strategies are emerging, for example, in battlefield scenarios where *N* UAVs are assigned to strike *T* known targets in the presence of dynamic threats [18, 19] and in the fields of synchronized path planning and cooperative rendezvous problems where multiple UAVs must arrive at their targets simultaneously [18, 20–23]. Cooperative strategies have also been considered, for example, in [24], which considers cooperative search strategies under collision avoidance and communication range constraints, and in [25], which presents a completely decentralized, hybrid systems approach and does not require that the UAVs stay within communication range of each other, as well as in [26, 27], which consider formation flight problems. Other approaches to supervision and control of multiple UAVs are considered in [28, 29]. However, only a small number of papers [9, 30–32] to date address the problem of dynamic teams whose composition may change at run time and of team supervision in this context.

##### 1.4. Original Contributions Statement

The original contributions of this paper are as follows.

(i) First, we present a new theoretic formulation of coordinated battle management of teamed UAVs.

(ii) Second, we propose an approach utilizing Stochastic Dynamic Programming (SDP) for dynamical team configuration of UAVs in the presence of adversarial behavior.

(iii) Third, we model this game in the framework of Dynamic Network of Hybrid Automata (DNHA) [33–37]. DNHA describe systems consisting of components which can be created, interconnected, and destroyed as the system evolves. Informally, a DNHA is a collection of hybrid automata that interact through the exchange of data and messages.

(iv) Fourth, we scale our approach to large numbers of vehicles in either force.

##### 1.5. Manuscript Organization

This paper is organized as follows. In Section 2, we present the problem formulation and the solution architecture. Section 3 presents the coordination approach, in which an SDP controller is used to control the UAV teams. Section 4 presents the DNHA modeling framework. Simulation results for multi-UAV modeling in different adversarial environments are presented in Section 5. System scalability is discussed in Section 6. The paper ends with a conclusion.

#### 2. Problem Formulation and Solution Architecture

In order to manage the organization and behavior of a network of UAVs and control its ability to perform tasks over a region of interest and a period of time, we organize the functions into hierarchical layers as shown in Figure 2. Thus the complex design problem is partitioned into a number of more manageable subproblems that are addressed in separate layers. At each layer in the hierarchy, we model components by using DNHA.

##### 2.1. Theater-Level Decision Layer

This layer is an overseeing entity in charge of a geographic region (generally large). Oftentimes a human operator is responsible for the theater-level decisions, or capable of directly intervening in the processes of an autonomous agent. The theater-level decision layer allocates a target region for the UAV teams to survey. According to the information about the area of interest (i.e. the number of SAMs it contains and their capabilities) and after relating their capabilities to the UAVs’ capabilities, this layer makes the decision of how many UAVs will target each area.

##### 2.2. Team Coordination Mechanism Layer

The team coordination mechanism layer receives the target area and the number of UAVs allocated to this area from the Theater-Level Decision Layer, and configures the UAVs into teams to accomplish the given task. We address the team’s configuration control problem through Stochastic Dynamic Programming (SDP). This controller decides the UAVs’ configuration (isolated or integrated).

##### 2.3. UAV Supervisor Layer

The UAV Supervisor controller plans the path for each UAV. It orders the SAMs by risk factor, in ascending order, while trying to minimize the total elapsed time. If some SAMs have equal risk factors, the controller orders them according to a path cost policy. To minimize the time, we tried three path policies: respective path, nearest path, and shortest path. Under the Respective Path Policy, the UAV moves to the first SAM that pops up and then to the next one, until it captures all of them. Under the Nearest Path Policy, the UAV goes each time to the SAM nearest to its current location. Under the Shortest Path Policy, given a number of SAMs and their locations, the UAV visits them in the order that minimizes the total distance traveled. Object and collision avoidance are also part of the UAV supervisor layer; their description is outside the scope of this paper (see [38]).
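The three path policies can be illustrated with a short Python sketch. It is a minimal illustration under assumptions not taken from the paper: SAM sites are points on an integer grid, travel cost is the Manhattan distance, the restriction on 180° turns is ignored, and the shortest-path order is found by brute force over all visiting orders, which is only practical for a handful of sites. All function names are hypothetical.

```python
from itertools import permutations

def manhattan(a, b):
    """Grid distance between two (x, y) intersections."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def tour_length(start, order):
    """Total Manhattan distance of visiting `order` in sequence from `start`."""
    total, here = 0, start
    for site in order:
        total += manhattan(here, site)
        here = site
    return total

def respective_path(start, sams):
    """Visit the SAMs in the order in which they popped up (list order)."""
    return list(sams)

def nearest_path(start, sams):
    """Repeatedly go to the SAM nearest to the current location."""
    remaining, here, order = list(sams), start, []
    while remaining:
        nxt = min(remaining, key=lambda s: manhattan(here, s))
        remaining.remove(nxt)
        order.append(nxt)
        here = nxt
    return order

def shortest_path(start, sams):
    """Brute-force the visiting order with the minimum total distance."""
    return list(min(permutations(sams), key=lambda o: tour_length(start, o)))
```

For a UAV at (0, 0) and SAMs that pop up at (5, 0), (1, 0), and (-2, 0) in that order, the three policies give tours of length 12, 11, and 9, respectively.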

##### 2.4. Vehicle Kinematics

We use steerable unicycle kinematics on the Manhattan grid. This level of abstraction is sufficient to evaluate the architecture and provides fast simulation results. UAVs are assumed to fly at low altitude along streets at constant speed; the cost of a 90° turn is 0, and there are no 180° turns. We can describe these types of vehicles by the following differential equations:

$$\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta,$$

where *x* and *y* are the UAV horizontal coordinates, $v$ is the linear forward velocity, and $\theta$ is the orientation of the vehicle. In addition, time is used as a continuous variable throughout.
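As a rough illustration of this abstraction, the sketch below integrates the standard unicycle form $\dot{x} = v\cos\theta$, $\dot{y} = v\sin\theta$ with forward Euler steps, restricting the orientation to the four grid directions. The heading names, the step size, and the function `fly` are assumptions made for the example and are not part of the paper's simulator.

```python
import math

# Only these four orientations are allowed on the Manhattan grid (radians).
HEADINGS = {"east": 0.0, "north": math.pi / 2, "west": math.pi, "south": 3 * math.pi / 2}
OPPOSITE = {"east": "west", "west": "east", "north": "south", "south": "north"}

def fly(x, y, legs, v=1.0, dt=0.1):
    """Integrate x' = v cos(theta), y' = v sin(theta) with forward Euler.

    `legs` is a list of (heading, duration) pairs. Turns are instantaneous
    (zero cost); reversing direction (a 180-degree turn) raises ValueError.
    """
    previous = None
    for heading, duration in legs:
        if previous is not None and OPPOSITE[previous] == heading:
            raise ValueError("180-degree turns are not allowed")
        theta = HEADINGS[heading]
        for _ in range(round(duration / dt)):
            x += v * math.cos(theta) * dt
            y += v * math.sin(theta) * dt
        previous = heading
    return x, y
```

Flying east for 2 time units and then north for 1 time unit at unit speed from the origin ends, up to floating-point error, at (2, 1).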

#### 3. Coordination Approach

Here we consider the design of the Team Coordination Mechanism Layer. In particular, we study the problem of a number of UAVs trying to destroy a certain number of SAMs at each of *N* periods while maximizing the expected value function, which is described below. We set up the problem in the framework of dynamic programming [39]. Dynamic programming restates an optimization problem with uncertainties as a recursive multistage decision process. A solution is not merely a set of functions of time, or a set of numbers, but a policy: a rule telling the decision maker what to do. In this section, we propose an algorithm along these lines that solves our problem to optimality.

##### 3.1. State Equation

The state at time $k$, $x_k$, is taken to be the number of SAMs alive at the beginning of the period, and $e_k$ is the number of target SAMs in this period. The probability of a SAM destroying a UAV depends on the UAVs’ mode of operation (integrated or isolated).

##### 3.2. Control Constraint

The control at stage *k* takes one of two values: under one, the UAVs work in isolated mode at stage *k*; under the other, the UAVs integrate into a team at stage *k*.

##### 3.3. State Constraint

The number of SAMs alive must always be nonnegative.

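The value function is an additive function representing the sum of the expected conditional reward at each step for the destroyed SAMs minus the penalty at each step for the destroyed UAVs. Given an admissible policy $\pi = \{\mu_0, \ldots, \mu_{N-1}\}$, where $\mu_k$ maps states $x_k$ into controls $u_k = \mu_k(x_k)$, and given the states and disturbances for the system given by $x_{k+1} = f_k(x_k, u_k, w_k)$, the optimal expected cost is

$$J_k(x_k) = \max_{u_k} \, E_{w_k}\left\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \right\}, \qquad J_N(x_N) = g_N(x_N),$$

where $J_k(x_k)$ is the optimal expected cost for $(N-k)$ stages, $g_N(x_N)$ is the terminal cost incurred at the end of the process, and $g_k(x_k, u_k, w_k)$ is the cost incurred at time *k*.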

The value equation for the *k*th period is given by

where $s_k$ is taken to be the number of UAVs at the beginning of the *k*th period, *T* is the SAM value, and *U* is the UAV value. Thus the value per period is

In our problem, we assume that at each of *N* periods two UAVs ($s_k = 2$) will attack one SAM site ($e_k = 1$). So the terminal value will be

##### 3.4. SDP Recursion

We study each step to find the optimal strategy (integrated or isolated mode) in this step.

By substitution in (7), the terminal cost function is

We distinguish the probability of a SAM destroying a UAV in “isolated” mode from the probability of a SAM destroying a UAV in “integrated” mode. We assume that when the UAVs integrate, the destroying probability increases by a factor:

As ,

We focus on the relationship between the value function and the destroying probability at a given step in order to choose the mode (integrated or isolated) at that step.

By substitution in (7) and after adding the terminal cost from (9), the cost at step is

By equating the integrated mode value function with the isolated mode value function,

there is a threshold value of the destroying probability: below this point, the isolated-mode value is always the larger of the two value functions, and above it the integrated-mode value is the larger. So, after observing the area of interest and surveying the battlefield, this controller decides the UAV team organization. We calculate this threshold for different *U/T* ratios; Table 1 presents the resulting values.

For *U/T* = 2, the value function curve with respect to the destroying probability is shown in Figure 3.

Tracking the threshold across different *U/T* ratios, we can see from Figure 4 that the two are inversely proportional.

The decision-making controller for the UAVs can then be implemented following the flow chart in Figure 5.
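The recursive structure of this section can be sketched as a generic backward induction in Python. This is an illustrative sketch, not the paper's value function: the toy model below assumes that two UAVs attack one SAM per period, that the reward is *T* if the SAM is destroyed, and that the expected UAV loss is charged as a deterministic penalty. The probabilities in `P_KILL` and `P_LOSS` are made-up numbers, so the resulting policy is not expected to reproduce Table 1 or the threshold behavior of Figure 3.

```python
def solve_sdp(n_periods, states, controls, outcomes, terminal_value):
    """Backward induction for a finite-horizon stochastic decision problem.

    outcomes(k, x, u) returns a list of (probability, next_state, reward).
    Returns (J, policy), where J[k][x] is the optimal expected value from
    stage k in state x and policy[k][x] is a maximizing control.
    """
    J = [dict() for _ in range(n_periods + 1)]
    policy = [dict() for _ in range(n_periods)]
    for x in states:
        J[n_periods][x] = terminal_value(x)
    for k in range(n_periods - 1, -1, -1):
        for x in states:
            best_u, best_v = None, float("-inf")
            for u in controls:
                v = sum(p * (r + J[k + 1][nx]) for p, nx, r in outcomes(k, x, u))
                if v > best_v:
                    best_u, best_v = u, v
            J[k][x], policy[k][x] = best_v, best_u
    return J, policy

# Hypothetical toy model (all numbers are assumptions for illustration).
T, U, S = 1.0, 2.0, 2                            # SAM value, UAV value, UAVs per attack
P_KILL = {"isolated": 0.5, "integrated": 0.8}    # P(targeted SAM is destroyed)
P_LOSS = {"isolated": 0.1, "integrated": 0.2}    # P(each attacking UAV is destroyed)

def outcomes(k, x, u):
    """State x is the number of SAMs alive; nothing happens once x == 0."""
    if x == 0:
        return [(1.0, 0, 0.0)]
    penalty = U * S * P_LOSS[u]                  # expected UAV loss, charged up front
    return [(P_KILL[u], x - 1, T - penalty), (1.0 - P_KILL[u], x, -penalty)]

J, policy = solve_sdp(2, range(0, 3), ["isolated", "integrated"], outcomes, lambda x: 0.0)
```

With these made-up numbers the recursion selects the isolated mode at every stage; changing `P_KILL` and `P_LOSS` changes the mode that is selected.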

#### 4. DNHA Modeling Framework

We describe the model of the distributed control structure. We do this for individual layers in the framework of DNHA. We use DNHA for their ability to model interacting Hybrid Automata (HA) and allow for the creation and destruction of links and hybrid automata. Our model for interacting HA is as follows:

where *Q* is the finite set of states, *I* is the finite set of input events, *O* is the finite set of output events, *V* is the set of internal variables, and Init is the initial state. The interpretation is that an input event causes the system to move from one state to another, producing an output event. For general references, see [33–37].
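The discrete skeleton of such an automaton can be sketched in Python as an event-driven state machine. This is a simplified illustration only: continuous dynamics and the creation and destruction of links between automata are omitted, and the class name, field names, and the example events (`above_threshold`, `below_threshold`, `integrate`, `separate`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Automaton:
    states: frozenset          # Q: finite set of states
    inputs: frozenset          # I: finite set of input events
    outputs: frozenset         # O: finite set of output events
    init: str                  # Init: initial state
    transitions: dict          # (state, input event) -> (next state, output event)
    variables: dict = field(default_factory=dict)   # V: internal variables
    state: str = ""            # current state, set from `init`

    def __post_init__(self):
        if self.init not in self.states:
            raise ValueError("initial state must belong to the state set")
        self.state = self.init

    def fire(self, event):
        """Apply an input event; return the output event, or None if the
        event is not enabled in the current state."""
        if event not in self.inputs:
            raise ValueError(f"unknown input event: {event}")
        key = (self.state, event)
        if key not in self.transitions:
            return None
        self.state, out = self.transitions[key]
        return out

# Hypothetical two-state example: a team that integrates above a risk threshold.
team = Automaton(
    states=frozenset({"isolated", "integrated"}),
    inputs=frozenset({"above_threshold", "below_threshold"}),
    outputs=frozenset({"integrate", "separate"}),
    init="isolated",
    transitions={
        ("isolated", "above_threshold"): ("integrated", "integrate"),
        ("integrated", "below_threshold"): ("isolated", "separate"),
    },
)
```

Interaction between automata could then be modeled by feeding the output event of one instance as an input event to another, which is the message-exchange view of a DNHA described informally above.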

##### 4.1. Theater-Level Decision HA Representation

The model for Theater-level Decision Controller is a hybrid automaton with the following structure:

Its components include a set of configurations; the input events, namely the number of SAMs in the target area and their configuration; and the transition relation, which encodes the control logic.

So the input to this layer is the number of SAMs in the target area and their configuration. Based on this input, the output is the number of UAVs that will target this area now.

##### 4.2. Team Coordination Mechanism HA Representation

Given the number of UAVs, this layer determines their configuration:

Its input events are the number of UAVs in the team and the SAMs’ configuration and capabilities, and its transition relation encodes the control logic.

The directed graph corresponding to this hybrid automaton is shown in Figure 6.

##### 4.3. UAV Supervisor HA Representation

This layer is a combination of path planning controller and weapon management. The input to this layer is the UAVs’ configuration (isolated or integrated) and the number of SAMs assigned to this UAV (in isolated mode) or these UAVs (in integrated mode). From these target SAMs and according to their risk factor, this layer will choose the current target and plan the path for the UAV(s). The logic for this layer is shown below in Figure 7.
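The target choice described here can be sketched as follows. The sketch assumes that each candidate is a hypothetical triple of SAM identifier, firing location, and risk factor, that candidates above the allowable risk are discarded, and that ties in risk are broken by Manhattan distance (standing in for the path cost policy). None of these data structures come from the paper.

```python
def manhattan(a, b):
    """Grid distance between two (x, y) intersections."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def choose_target(uav_pos, candidates, allowable_risk):
    """Pick the current target for a UAV.

    `candidates` is a list of (sam_id, firing_location, risk) triples; a SAM
    may appear several times with different firing locations. Returns
    (sam_id, firing_location) for the lowest-risk admissible candidate,
    breaking ties by distance, or None if no candidate meets the risk limit.
    """
    admissible = [c for c in candidates if c[2] <= allowable_risk]
    if not admissible:
        return None
    best = min(admissible, key=lambda c: (c[2], manhattan(uav_pos, c[1])))
    return best[0], best[1]
```

Listing the same SAM with several firing locations captures the rule that a UAV should look for another location from which to shoot a SAM before moving on to a different SAM.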

There is a local controller, the supervisor, for each UAV. The supervisor is modeled as a hybrid automaton:

Its output events are the state messages “go left,” “go right,” “go up,” “go down,” “live,” “dead,” “Tlive,” and “Tdead,” where “live” and “dead” report the UAV status and “Tlive” and “Tdead” report the target SAM status. Its transition relation encodes the control logic.

The UAV-maneuver controller is shown in Figure 8. It contains three states: idle, shoot, and move. A transition from the move state to the shoot state must go through the idle state first. Both the move and shoot states have their own logic sequences, which are represented as HA.
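The three-state maneuver logic can be sketched as a small Python state machine. The text only states that move-to-shoot must pass through idle; the sketch additionally assumes that the controller starts in idle and that shoot-to-move also passes through idle. The class and method names are hypothetical.

```python
# Direct transitions assumed for the sketch; every other change routes via idle.
ALLOWED = {
    ("idle", "move"), ("idle", "shoot"),
    ("move", "idle"), ("shoot", "idle"),
}

class ManeuverController:
    def __init__(self):
        self.state = "idle"   # assumed initial state

    def go(self, target):
        """Switch to `target` and return the list of states entered on the
        way, inserting idle when no direct transition exists."""
        if target not in ("idle", "move", "shoot"):
            raise ValueError(f"unknown state: {target}")
        if target == self.state:
            return []
        path = [target] if (self.state, target) in ALLOWED else ["idle", target]
        self.state = target
        return path
```

For example, a controller in the move state that is asked to shoot enters idle first and then shoot.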

The “move” sequence can be decomposed hierarchically in two sections: “choose a path policy” and “follow this path.” The following directed graph (Figure 9) shows part of the move hybrid automaton which is related to the policy for choosing the next SAM.

The path following for each UAV has five different states. The “stop” state is used as a transition state between directional maneuvers as shown in Figure 10.

The shoot sequence for the UAVs has a different shooting probability in “isolated” mode than in “integrated” mode, which is represented as an efficiency command coming from the Team Coordination Mechanism Layer. The shoot sequence has HA representation as shown in Figure 11. The efficiency command in the case of SAM shooting is related to the command center status (operative or inoperative).

#### 5. Simulation Results

In the simulation, we tried a number of different configurations. First, we used a configuration as in Figure 12, in which the SAM sites are located in a heterogeneous distribution around the command center. In this configuration, the UAVs can start by shooting the command center from the opening locations and then go to the SAM sites according to their risk factor. Then, we used a slightly different configuration (Figure 13), in which the SAM sites are distributed uniformly around the Command Center (CC). The command center cannot shoot the UAVs; its role is purely one of communication. In the presence of the command center, the SAM sites shoot with a probability that is higher than their shooting probability without the command center. Depending on the SAMs’ shooting probability, the Team Coordination Mechanism Layer decides whether the UAVs should work in “integrated” or “isolated” mode. Then the UAV Supervisor picks a target (a given SAM) for each UAV to shoot as its current task. When the task is completed, the Team Coordination Mechanism checks the SAMs’ probabilities again to decide whether the UAVs will integrate in this step, and so on.

In the configuration given in Figure 13, the UAVs try to open a path to the command center by shooting the SAM sites first. But because of the SAM sites’ higher shooting ability in the presence of the command center, they disable one of the UAVs.

In all modes, the next target is selected as follows: first, the UAV targets the command center to minimize the SAMs’ shooting probability. As we can see in Figures 14–16, the UAV is shooting the command center after one UAV tried but was destroyed by SAMs. In this step, when the UAV shoots the command center, all SAMs that have the UAV within range start shooting at it at the same time. Keep in mind that if the UAVs integrate, they have a higher collaborative shooting ability, but they are also easier to target and shoot. In “isolated” mode, by contrast, they have more ability to shoot and hide. We use stochastic dynamic programming (SDP) to decide when the UAVs should integrate. This approach yields a threshold above which UAVs should integrate and below which UAVs should remain in isolated mode.

Two uncertainties, risk and ability, are studied in the context of mission planning and execution; both are adversarial uncertainties and will be conceptualized in game-theoretic terms.

(1) Risk: the SAM site’s probability of destroying the UAV; it is higher when the command center is alive and performing its role of allowing the SAM sites to communicate with each other.

(2) Ability: the UAVs’ probability of destroying the SAM sites; whenever UAVs integrate, both the risk and the ability are higher.

So, for *U/T* = 2, we picked one point below the threshold and ran the software 100 times in “integrated” mode and another 100 times in “isolated” mode, then calculated the average value function for each to verify that the value function of the “isolated” mode is the larger one. We then did the same with another point above the threshold to verify that the “integrated” value function is the larger one. Table 2 shows that the average value function of the “isolated” mode is greater than that of the “integrated” mode for the destroying probability below the threshold. Conversely, the average value function of the “isolated” mode is less than that of the “integrated” mode for the destroying probability above the threshold.
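The averaging procedure used for this verification can be sketched in Python as a small Monte Carlo experiment. The engagement model is an assumption made for illustration (one period, a reward *T* if the SAM is destroyed, a penalty *U* for each UAV lost, independent Bernoulli outcomes) and is not the paper's simulator; the function names and default values are hypothetical.

```python
import random

def engagement_value(p_kill, p_loss, rng, n_uavs=2, T=1.0, U=2.0):
    """Value of one simulated engagement: reward T if the SAM is destroyed,
    minus a penalty U for every UAV that is destroyed."""
    reward = T if rng.random() < p_kill else 0.0
    losses = sum(1 for _ in range(n_uavs) if rng.random() < p_loss)
    return reward - U * losses

def average_value(p_kill, p_loss, runs=100, seed=0, **model):
    """Average the engagement value over `runs` independent simulations."""
    rng = random.Random(seed)   # seeded so that the experiment is repeatable
    total = sum(engagement_value(p_kill, p_loss, rng, **model) for _ in range(runs))
    return total / runs
```

Calling `average_value` once with the probabilities assumed for “isolated” mode and once with those assumed for “integrated” mode, at the same SAM destroying probability, gives the two averages to compare.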

#### 6. Conclusion

This paper proposed and studied a novel form of autonomous complex UAV mission management in hostile environments. We considered modeling formalisms for the operation of opposing forces on the battlefield. A new Stochastic Dynamic Programming (SDP) scheme enables a number of UAVs to achieve autonomous battle management in the presence of adversarial behavior. A hierarchical specification methodology was introduced, and it was shown that, via the hierarchy, the operator can inspect the mission, specify the number of UAVs in the mission and their specifications, and set the threshold limits for the mission offline. The SDP controller then tunes this team of UAVs online during the mission. The online tuning depends on a risk assessment related to the threshold limits and a predefined cost function. In this paper, we studied two different configurations for the UAV team and modeled the scheme in the framework of Dynamic Network of Hybrid Automata to represent the evolving structure of the system. The algorithm can nevertheless be extended to incorporate more configurations in either team (Red or Blue). For the Blue team, one can solve the problem offline and specify threshold regions for each configuration; online, the controller then tunes the team according to these threshold limits. For the Red team, we proposed different configurations in the examples. It was shown that the SDP methodology was robust enough to incorporate different variations relating to adversary actions and behavior models.

Further work in this direction would involve the case of heterogeneous enemies. Applications to problems of pursuit-evasion are also possible.

#### Acknowledgments

This research was supported by the Egyptian Ministry of Higher Education and the Michigan/AFRL Collaborative Center in Control Science.