Abstract

This paper focuses on the attention allocation problem (AAP) in multioperator multi-UAV (MOMU) command and control, taking the operator model and task properties into consideration. A MOMU operator AAP model based on maximizing the global reward is established; it allocates tasks to all operators and simultaneously sets the work time and rest time of each task. The proposed model is validated in a Matlab simulation environment, using an immune algorithm together with a dynamic programming algorithm to evaluate the reward obtained with respect to the work time, rest time, and task allocation. The results show that the total reward of the proposed model is larger than that obtained from previously published methods based on local maximization, and that the total reward has an exponent-like relation with the task arrival rate. The proposed model can improve the operators’ task processing efficiency in MOMU command and control scenarios.

1. Introduction

The mode of human in the loop is currently widely used in the command and control (C2) of unmanned aerial vehicles (UAVs). With the development of autonomous control and artificial intelligence technology, the problems of a single operator controlling multiple UAVs (SOMU) have been investigated [1-5]. Given the increasing complexity of the missions, the paradigm is shifting from a single operator managing a single UAV (SOSU) or multiple UAVs (SOMU) toward the concept of multiple operators managing multiple UAVs (MOMU) [6]. By assigning multiple operators to multiple UAVs, the flexibility of human-system decision-making can be improved [6]. Therefore, many researchers have been working on the C2 of MOMU [7-12]. However, the increasing volume of information captured by UAVs makes it difficult to accomplish all tasks in a timely manner [6]. In order to improve the task processing throughput in MOMU, it is necessary to determine which tasks the operators should attend to and when, which is known as the attention allocation problem (AAP).

Srivastava et al. [13] pointed out that a work time and a rest time for each task should be allocated to the operator. Bertuccelli et al. [4] proposed a nonpreemptive scheduling formulation for a single operator performing a search mission with multiple UAVs in a time-constrained environment. Jian et al. [5] modeled a single operator AAP with one operator controlling multiple UAVs. Srivastava et al. [13-16] proposed an optimization framework to decide how much attention the operator should allocate as well as where it should be allocated. Crandall et al. [17, 18] introduced an attention allocation method for one operator controlling multiple robots and applied this method with different strategies; they found that guiding operator attentional resources can exercise the operator’s judgment and experience more effectively than dictating them. However, the single operator AAP methods above are not applicable to multioperator AAP. In multioperator AAP, not only the work time and rest time but also how the tasks are allocated to the operators must be considered.

A few studies have addressed multioperator AAP. Verma and Rai [6] modeled MOMU operator AAP and balanced reward maximization against operators’ workload minimization. However, the work time of each task in that model was fixed, and only a local optimization over one task was achieved. Majji and Rai [19] further extended the research of Srivastava et al. [16] and developed an optimized solution to the multioperator AAP, which achieves a local optimization for one operator. The aforementioned methods accomplish a local optimization of the multioperator AAP without considering the influence of wait time on the tasks. Our research improves on previous works [5, 13, 19] by taking the wait time into consideration for multioperator AAP. Targeting the global reward, a model of multioperator AAP based on maximizing the global reward is developed.

The paper is organized as follows: Section 2 introduces the theory of attention allocation; Section 3 proposes the MOMU operator AAP model; the simulations and results are presented in Section 4; finally, Section 5 presents the conclusion and further improvements.

2. Theory on Attention Allocation Problem

Multioperator AAP in MOMU is dominated by various factors such as operator state, task properties, and environment [19]. The environment is a complex factor due to its unpredictability; thus only the following two main aspects are considered here: operator state and task properties. Operator state includes the operator utilization ratio, operator skill level, and operator performance. Task properties include the task starting time, latency penalty due to wait time, weight, complexity, and reward.

2.1. Operator State
2.1.1. Operator Utilization Ratio

Operator utilization ratio represents the busy level of one operator. The higher the utilization ratio is, the busier the operator will be.

The relation between operator utilization ratio and operator performance obeys the Yerkes-Dodson law [20], which is shown in Figure 1. According to the Yerkes-Dodson law, the performance of a human operator is unimodal and can be expressed by an inverted-U function of the utilization ratio [13]. When an operator is within an appropriate range of utilization ratio, the performance of the operator will be relatively high; otherwise, the performance will decrease. A high utilization ratio fatigues the operator, while a low utilization ratio takes the operator out of the control loop, which reduces the operator’s situational awareness. As a result, both high and low utilization ratios distract the operator’s focus from the C2 of the UAVs, so the operator has to take more time to deal with a task. The operator utilization ratio $u(t)$ can be defined by the following differential equation [13]:

$\dot{u}(t) = \left(1 - u(t)\right)/\tau$ while the operator is working, and $\dot{u}(t) = -u(t)/\tau$ while the operator is resting,

where $\tau$ is a constant that depends on the operator’s sensitivity.

If the operator utilization ratio before the $k$th task is $u_k$, the work time of the task is $w_k$, and the rest time after the task is $r_k$, then the recursive expression for the operator utilization ratio is

$u_{k+1} = \left(1 - \left(1 - u_k\right) e^{-w_k/\tau}\right) e^{-r_k/\tau}$.

The evolution of operator utilization ratio with work time and rest time is shown in Figure 2. The utilization ratio will increase as the work time becomes longer. On the contrary, the ratio will decrease with longer rest time.
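
To make the evolution in Figure 2 concrete, the following Python sketch applies the recursive update above over a sequence of work/rest cycles. It assumes the first-order dynamics with a single sensitivity constant tau; the function name and all numeric values are illustrative.

```python
import math

def update_utilization(u, work_time, rest_time, tau=1.0):
    """One work/rest cycle of the first-order utilization dynamics.

    During work the ratio relaxes toward 1; during rest it decays toward 0.
    tau is the operator-sensitivity constant (illustrative value).
    """
    u_after_work = 1.0 - (1.0 - u) * math.exp(-work_time / tau)
    return u_after_work * math.exp(-rest_time / tau)

# Example: utilization over five tasks with equal work and rest times.
u = 0.4
for _ in range(5):
    u = update_utilization(u, work_time=0.5, rest_time=0.2)
    print(round(u, 3))
```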

2.1.2. Operator Skill Level

Operator skill level represents the ability of the operator to process tasks and affects the performance of the operator. In this article, all operators are assumed to have the same skill level to avoid difficulties in quantification.

2.1.3. Operator Performance

As UAVs are under supervisory control, a UAV operator can be treated as a two-alternative decision-making server. The decision made by an operator can be either correct or wrong. The probability of correct decision is defined to be the performance of the operator. There are two models of two-alternative decision-making for operators.

Pew’s Model [21]. The probability of a correct decision at a given work time $t$, with $t \ge 0$, is expressed as follows:

$P(\text{correct decision} \mid t) = \dfrac{p_0}{1 + e^{-(a t - b)}}$,

where $t$ is the work time and $p_0$, $a$, and $b$ are parameters related to the operator and the task. The evolution of the probability of the correct decision under Pew’s model is shown in Figure 3(a).

Drift Diffusion Model [22]. In this model, the evidence accumulated at time $t$ evolves as a drift-diffusion process with drift rate $\beta$ and diffusion rate σ, and a decision is made when the evidence reaches a decision threshold; the probability of a correct decision at a given work time $t$ follows from these quantities [22]. The evolution of the probability of the correct decision under the drift diffusion model is shown in Figure 3(b).

Pew’s model has been widely used as the operator performance model in recent research on operator AAP [5, 16, 19]. In this paper, Pew’s model is also used as the operator performance model. The operator performance formula is

$P(t) = \dfrac{1}{1 + e^{-(a t - b)}}$,

where $a$ and $b$ are parameters determined by the operator skill level and the task complexity. Since all operators are assumed to have the same skill level, $a$ and $b$ are determined only by the task complexity.
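
As a concrete illustration of the sigmoid performance model above, the short Python sketch below evaluates the probability of a correct decision for several work times. The parameter values and the function name are illustrative; $a$ and $b$ play the roles of the task-related parameters of the formula above.

```python
import math

def pew_performance(work_time, a, b):
    """Probability of a correct decision as a sigmoid of the work time."""
    return 1.0 / (1.0 + math.exp(-(a * work_time - b)))

# Longer work time yields a higher probability of a correct decision.
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, round(pew_performance(t, a=2.0, b=3.0), 3))
```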

2.2. Task Properties

Task Weight. Tasks performed by UAV operators are of different types, and even tasks of the same type may have different degrees of importance. Thus, each task $j$ has a task weight $\omega_j$, which represents the task importance [16].

Task Complexity. The complexity of task $j$ is characterized by the parameter pair $(a_j, b_j)$ of the performance formula; it affects the work time and the operator performance.

Latency Penalty. When the task queue is not empty, the operator cannot handle all the tasks on time, and the tasks waiting in the queue cause a latency penalty. The latency penalty per unit time of task $j$ is $c_j$ [16] (each task has a different $c_j$); the work time of task $j$ is $w_j$, and the rest time after the task is $r_j$. All the tasks after the task being handled have to wait an extra time $w_j + r_j$, which causes a latency penalty. If the sum of the latency penalties per unit time of all the tasks after the task being handled is $C_j$, then the latency penalty caused by task $j$ is $C_j (w_j + r_j)$.

Task Reward. The task reward is the product of the task weight and the operator performance, $\omega_j P_j(w_j)$ [16].

Task Starting Time. As tasks arrive dynamically, their starting times are not the same, which causes different wait times in the task queue, and different wait times affect the latency penalty.

2.3. Evaluation Criterion of Attention Allocation
2.3.1. Integrated Reward

Since tasks arrive dynamically, the attention allocation is solved in multiple stages. In each stage, an integrated reward is defined for both a single task and a single operator.

The integrated reward of a single task is its task reward minus the latency penalty caused by this task. The integrated reward of a single operator is the sum of the integrated rewards of all of his current tasks in that stage.
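
The definitions above translate directly into code. The sketch below computes the integrated reward of a single task and of a single operator; the dictionary keys and the numbers in the example are illustrative, and the task reward and latency penalty follow the definitions of Section 2.2.

```python
def integrated_task_reward(weight, performance, penalty_rate_after, work_time, rest_time):
    """Integrated reward of one task: task reward minus the latency penalty it causes.

    penalty_rate_after is the sum of the per-unit-time latency penalties of the
    tasks queued after this one; they all wait work_time + rest_time longer.
    """
    task_reward = weight * performance
    latency_penalty = penalty_rate_after * (work_time + rest_time)
    return task_reward - latency_penalty

def integrated_operator_reward(tasks):
    """Integrated reward of one operator: sum over his tasks in the current stage."""
    return sum(integrated_task_reward(**task) for task in tasks)

# Example with two queued tasks (all numbers are illustrative).
tasks = [
    dict(weight=3.0, performance=0.9, penalty_rate_after=0.4, work_time=1.0, rest_time=0.3),
    dict(weight=2.0, performance=0.8, penalty_rate_after=0.0, work_time=0.8, rest_time=0.2),
]
print(round(integrated_operator_reward(tasks), 3))
```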

2.3.2. Global Reward

The global reward is the evaluation criterion of the attention allocation in one stage; it is the sum of the integrated rewards of all the operators in that stage.

2.3.3. Total Reward

Total reward is the sum of global rewards in all stages, which is the evaluation criterion of attention allocation after all the tasks have been allocated.

3. MOMU Operator AAP Modeling

In Section 2, the concepts related to attention allocation were introduced. According to the process of multi-UAV supervisory control, a multioperator multi-UAV attention allocation framework is established, as shown in Figure 4.

The UAV swarm generates and broadcasts the tasks, which need to be handled by the operators. Then, according to the task properties and the operator states, the support system allocates all the tasks in the task queue to the operators using the MOMU operator AAP model based on maximizing the global reward; at the same time, the support system allocates a work time and a rest time to each task. The operators deal with the tasks and send the results to the support system to achieve the C2 of the UAVs.

Assume that there are $m$ operators and that they need to handle $N$ tasks at some point. The performance of operator $i$ on task $j$ is expressed by the operator performance formula of Section 2, $P_{ij}(w_j) = 1/(1 + e^{-(a_j w_j - b_j)})$, where $w_j$ is the work time of task $j$. The complexity of task $j$ is characterized by the pair $(a_j, b_j)$. According to the importance of the task, the support system sets a weight $\omega_j$ for task $j$. The latency penalty per unit time of task $j$ is $c_j$. Assume that the support system allocates work time $w_j$ and rest time $r_j$ to task $j$; the reward of task $j$ is then $\omega_j P_{ij}(w_j)$, and waiting in the queue incurs a latency penalty of $c_j$ per unit of wait time for task $j$. The utilization ratio of operator $i$ before processing task $j$ is $u_{ij}$, and $\underline{w}(u_{ij})$ is a function of $u_{ij}$ that determines the lower limit of the expected work time of task $j$ for operator $i$. The support system solves the multioperator AAP in order to maximize the global reward at each stage.

At a stage, assume that one operator gets $n$ tasks. Work time $w_j$ and rest time $r_j$ are allocated to task $j$, and the task reward for processing task $j$ is $\omega_j P_j(w_j)$. The work time and rest time of task $j$ do not affect the tasks before it; the latency penalty caused by task $j$ is $C_j(w_j + r_j)$, where $C_j$ is the sum of the latency penalties per unit time of the tasks after task $j$ in the queue. The integrated reward of one task is the task reward minus the latency penalty caused by this task. Then, the integrated reward of task $j$ is

$R_j = \omega_j P_j(w_j) - C_j (w_j + r_j)$.

The integrated reward of the operator is the sum of the integrated rewards of his tasks and is written as

$R = \sum_{j=1}^{n} \left[ \omega_j P_j(w_j) - C_j (w_j + r_j) \right]$.

The sum of the integrated rewards of all the operators is the global reward. For every operator $i$ and task $j$, $x_{ij} \in \{0, 1\}$ indicates whether operator $i$ is assigned to perform task $j$: $x_{ij} = 1$ means that operator $i$ will process task $j$; otherwise, operator $i$ will not process task $j$. The task reward of operator $i$ for processing task $j$ is $\omega_j P_{ij}(w_j)$. The sum of the latency penalties per unit time of all the tasks after task $j$ is $C_j$; then the latency penalty caused by task $j$ is $C_j(w_j + r_j)$. The integrated reward of task $j$ of operator $i$ is

$R_{ij} = x_{ij} \left[ \omega_j P_{ij}(w_j) - C_j (w_j + r_j) \right]$.

Then the integrated reward of operator $i$ is

$R_i = \sum_{j=1}^{N} x_{ij} \left[ \omega_j P_{ij}(w_j) - C_j (w_j + r_j) \right]$.

Let $u_{ij}$ be the utilization ratio of operator $i$ before starting task $j$ and $\underline{w}(u_{ij})$ be the function of $u_{ij}$ that captures the expected service time of operator $i$ on task $j$. The aim of the model is to maximize the global reward in the attention allocation of a stage, so the objective function of the multioperator AAP, which is the sum of the integrated rewards of the operators, can be written as

$\max_{\mathbf{w}, \mathbf{r}, X} \sum_{i=1}^{m} \sum_{j=1}^{N} x_{ij} \left[ \omega_j P_{ij}(w_j) - C_j (w_j + r_j) \right]$

subject to $w_j \ge \underline{w}(u_{ij})$, $r_j \ge 0$, $u_{\min} \le u_{ij} \le u_{\max}$, $\sum_{i=1}^{m} x_{ij} \le 1$, and $x_{ij} \in \{0, 1\}$, where $u_{\min}$ and $u_{\max}$ are the bounds of the operator utilization ratio, $\mathbf{w}$ and $\mathbf{r}$ are $N$-vectors of $w_j$ and $r_j$, which represent the work times and rest times, respectively, and $X$ is an $m \times N$ matrix of $x_{ij}$, which represents the task assignment indicator variables.
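
The sketch below evaluates the global-reward objective for one candidate solution, that is, for a given assignment matrix and per-task work and rest times, using the notation above. It only illustrates the objective value computation, not the optimization itself; the ordering of an operator's tasks is assumed to follow the task index, and all names and numbers are illustrative.

```python
import math

def performance(work_time, a, b):
    return 1.0 / (1.0 + math.exp(-(a * work_time - b)))

def global_reward(assignment, work, rest, tasks):
    """Sum of the integrated rewards over all operators for one candidate solution.

    assignment[i][j] is 1 if operator i handles task j; work[j] and rest[j] are
    the per-task work and rest times; tasks[j] = (weight, a, b, penalty_rate).
    Tasks assigned to one operator are assumed to be served in index order.
    """
    total = 0.0
    for i in range(len(assignment)):
        queue = [j for j in range(len(tasks)) if assignment[i][j] == 1]
        for pos, j in enumerate(queue):
            weight, a, b, _ = tasks[j]
            # Per-unit-time penalties of the tasks that wait behind task j.
            penalty_after = sum(tasks[k][3] for k in queue[pos + 1:])
            total += weight * performance(work[j], a, b) - penalty_after * (work[j] + rest[j])
    return total

# Two operators, three tasks; each task is (weight, a, b, penalty_rate).
tasks = [(3.0, 2.0, 3.0, 0.4), (2.0, 1.5, 2.0, 0.3), (1.0, 1.0, 1.0, 0.2)]
assignment = [[1, 0, 1], [0, 1, 0]]
work, rest = [2.0, 1.5, 1.0], [0.3, 0.2, 0.2]
print(round(global_reward(assignment, work, rest, tasks), 3))
```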

Suppose that tasks arrive at a certain rate $\lambda$; the task queue then changes dynamically, and the attention allocation for dynamically arriving tasks is solved in multiple stages. After an attention allocation, which is referred to as a stage here, the operators work for a while to fulfill their tasks, and when one of the operators has no task to handle, the support system starts another stage. The interval between two adjacent stages is $T$, which is the shortest total time of the operators’ remaining tasks in the previous stage. During time $T$, the number of tasks arriving in the task queue is $\lambda T$. Assume that the number of tasks does not change during a stage, so at each stage the number of tasks is constant. Just before each stage, some of the operators’ local task queues may not be empty, so two variables $R_i^0$ and $T_i^0$ are introduced: $R_i^0$ is the integrated reward of the remaining tasks in the local task queue of operator $i$, and $T_i^0$ is the time required for these remaining tasks. The sum of the latency penalties per unit time of all the new tasks is $C^{\text{new}}$, and the latency penalty caused by $T_i^0$ is $C^{\text{new}} T_i^0$. Then the integrated reward of operator $i$ is

$R_i = R_i^0 - C^{\text{new}} T_i^0 + \sum_{j=1}^{N} x_{ij} \left[ \omega_j P_{ij}(w_j) - C_j (w_j + r_j) \right]$,

where $R_i^0 = \sum_{k=1}^{n_i} \left[ \omega_k P_{ik}(w_k) - C_k (w_k + r_k) \right]$, $T_i^0 = \sum_{k=1}^{n_i} (w_k + r_k)$, and $n_i$ is the number of remaining tasks of operator $i$.

The two variables $R_i^0$ and $T_i^0$ are thus added to the integrated reward of each operator. The objective function is the sum of the integrated rewards of the operators, so it evolves to

$\max_{\mathbf{w}, \mathbf{r}, X} \sum_{i=1}^{m} \left\{ R_i^0 - C^{\text{new}} T_i^0 + \sum_{j=1}^{N} x_{ij} \left[ \omega_j P_{ij}(w_j) - C_j (w_j + r_j) \right] \right\}$.

During an attention allocation, operator $i$ may be in the middle of performing a task, so that task is not a complete one. Assume that $\tilde{w}_k$ and $\tilde{r}_k$ are the remaining work time and rest time needed to complete such a task $k$; then $R_i^0$ and $T_i^0$ are modified by substituting $\tilde{w}_k$ and $\tilde{r}_k$ for $w_k$ and $r_k$, with $\tilde{w}_k \le w_k$ and $\tilde{r}_k \le r_k$. In practice, only the first task in the local task queue of an operator has $\tilde{w}_k$ and $\tilde{r}_k$ different from $w_k$ and $r_k$.

Assume that there are $S$ stages; the total reward is the sum of the global rewards of all the stages, where the global reward of each stage is obtained from the objective function above. If there are $N_s$ tasks at stage $s$ and the global reward of stage $s$ is $G_s$, then the total reward is

$R_{\text{total}} = \sum_{s=1}^{S} G_s$.
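
For the dynamic, multi-stage process just described, the following sketch shows how the total reward is accumulated stage by stage: the stage interval is returned by the stage-level solver, the arrival rate times that interval gives the number of new tasks joining the queue, and the global rewards are summed. The callables solve_stage and draw_tasks are placeholders (assumptions), standing in for the stage-level AAP solver and the task generator.

```python
def total_reward(initial_tasks, arrival_rate, n_stages, solve_stage, draw_tasks):
    """Sum of the global rewards over the stages of the dynamic allocation.

    solve_stage(queue) must return (global_reward, stage_interval, leftover_tasks);
    draw_tasks(n) must return n newly generated tasks. Both are placeholders.
    """
    queue = list(initial_tasks)
    total = 0.0
    for _ in range(n_stages):
        reward, interval, leftover = solve_stage(queue)
        total += reward
        # During the interval, arrival_rate * interval new tasks join the queue.
        queue = leftover + draw_tasks(int(arrival_rate * interval))
    return total

# Dummy callables, only to exercise the loop shape.
print(total_reward([None] * 10, arrival_rate=2.0, n_stages=3,
                   solve_stage=lambda q: (float(len(q)), 1.0, []),
                   draw_tasks=lambda n: [None] * n))
```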

4. Simulation

All the simulations are carried out in Matlab. The setup of each simulation of the proposed method is shown in the Simulation Setup procedure below.

Simulation Setup
(1) Initialization: set the task arrival rate, the number of operators, the initial operator utilization ratios, the stage counter, and the total number of tasks.
(2) Get the properties of the tasks to be allocated in this stage from the head of the remaining task list and delete them from that list.
(3) Generate the initial individuals (candidate task assignments).
(4) Calculate the fitness of each individual using the dynamic programming algorithm; calculate the replication concentration according to the Euclidean distance.
(5) Is the termination condition satisfied? Yes: output the optimal task assignment together with the work times and rest times, and go to step (6). No: go to step (7).
(6) If tasks remain to be allocated, update the stage counter and the operators’ local task queues, compute the modified $R_i^0$ and $T_i^0$ of Section 3, and go to step (2); otherwise, end.
(7) Copy individuals according to their fitness and replication concentration.
(8) Crossover.
(9) Mutation; go to step (4).
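
The following is a compact sketch of the outer immune-algorithm loop from the Simulation Setup, with the dynamic-programming step abstracted behind a placeholder: an individual encodes, for every task of the stage, the operator it is assigned to (or that it is dropped); evaluate_fitness stands in for the dynamic-programming allocation of work and rest times that returns the global reward; and selection combines fitness with a concentration measure based on Euclidean distance. All parameter values, the encoding, and the helper names are illustrative assumptions.

```python
import random

def immune_algorithm(n_operators, n_tasks, evaluate_fitness,
                     pop_size=30, generations=100, pc=0.7, pm=0.05):
    """Immune-algorithm skeleton searching over task-assignment encodings.

    An individual assigns each task an operator index, or -1 if the task is
    dropped. evaluate_fitness(individual) is a placeholder for the dynamic
    programming step that sets work/rest times and returns the global reward.
    """
    def random_individual():
        return [random.randrange(-1, n_operators) for _ in range(n_tasks)]

    def concentration(ind, population):
        # Fraction of the population within unit Euclidean distance of ind.
        close = sum(1 for other in population
                    if sum((a - b) ** 2 for a, b in zip(ind, other)) < 1.0)
        return close / len(population)

    population = [random_individual() for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [evaluate_fitness(ind) for ind in population]
        for ind, fit in zip(population, fits):
            if fit > best_fit:
                best, best_fit = ind[:], fit
        # Replication: favour high fitness, penalise high concentration.
        scores = [fit - 0.5 * concentration(ind, population)
                  for ind, fit in zip(population, fits)]
        ranked = [ind for _, ind in sorted(zip(scores, population),
                                           key=lambda pair: pair[0], reverse=True)]
        parents = ranked[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, n_tasks) if random.random() < pc else 0
            child = p1[:cut] + p2[cut:]                      # crossover
            child = [random.randrange(-1, n_operators)       # mutation
                     if random.random() < pm else gene for gene in child]
            children.append(child)
        population = children
    return best, best_fit
```

Here the termination check of step (5) is simplified to a fixed number of generations; steps (7), (8), and (9) correspond to the replication, crossover, and mutation operations, and evaluate_fitness plays the role of the dynamic-programming fitness calculation of step (4).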

4.1. The Result of MOMU Operator AAP Model Based on Maximizing Global Reward

Assume that one task comes into the task queue per unit time, that is, the task arrival rate is $\lambda = 1$. Task properties are randomly generated within bounds: the task complexity pair $(a_j, b_j)$ is a pair of random integers, the latency penalty per unit time $c_j$ is drawn from a bounded range, and the weight $\omega_j$ is a random integer from a bounded range. The lower limit of the expected work time of task $j$ for operator $i$, $\underline{w}(u_{ij})$, is set according to [13]. The operator utilization ratio is constrained to the range $[u_{\min}, u_{\max}]$. The number of operators is 4, and the four operators start from the same initial utilization ratio. The initial number of tasks in the task queue is 10, and 30 more tasks are dynamically generated. The MOMU operator AAP model based on maximizing the global reward is solved by the immune algorithm and the dynamic programming algorithm. The result of this simulation is shown in Figure 5, where the work time is shown in Figure 5(a) and the rest time is shown in Figure 5(b).
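
The task generation used in this simulation can be sketched as follows. The numeric bounds in the code are placeholders, since the exact ranges are given in the original paper; only the structure (complexity pair, latency penalty per unit time, and weight) follows the description above.

```python
import random

def generate_task():
    """Draw one task's properties.

    The bounds below are placeholders for illustration only; they are not the
    exact ranges used in the paper's simulation.
    """
    a = random.randint(1, 5)             # complexity pair (a_j, b_j)
    b = random.randint(1, 5)
    penalty = random.uniform(0.1, 1.0)   # latency penalty per unit time c_j
    weight = random.randint(1, 10)       # task weight (importance)
    return {"a": a, "b": b, "penalty": penalty, "weight": weight}

# Initial queue of 10 tasks plus 30 dynamically generated tasks.
initial_queue = [generate_task() for _ in range(10)]
dynamic_tasks = [generate_task() for _ in range(30)]
print(initial_queue[0])
```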

The result of the simulation shows that the model can dynamically allocate the attention of the operators. However, not all the tasks are handled during the simulation; some tasks are dropped in order to maximize the global reward.

4.2. Comparison with Model Based on Maximizing Local Reward

This experiment compares the model based on maximizing the global reward (BOGR) with the model from [19]. Since dynamic attention allocation was not introduced in [19], a brief description of the dynamic attention allocation model based on maximizing the local reward (BOLR) is given in the steps below, followed by a sketch of the per-pass procedure.

Step 1. Sort the operators in ascending order of utilization ratio.

Step 2. Use the optimal solution from [19] to solve the AAP for each operator in turn.

Step 3. Sort the operators in ascending order of the total time required for the tasks in their local task queues.

Step 4. Wait until one of the operators has no task. If the task queue is empty, wait one unit of time at a time for arriving tasks until no more tasks come; otherwise, return to Step 2.
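
A sketch of Steps 1 and 2 of this baseline is given below; solve_single_operator is a placeholder standing in for the per-operator optimal solution of [19], and the data layout is illustrative. Steps 3 and 4 then decide when the next pass over the operators starts.

```python
def bolr_stage(operators, task_queue, solve_single_operator):
    """One pass of the local-reward baseline (Steps 1 and 2).

    Operators are served in ascending order of utilization ratio, and each in
    turn locally solves its own AAP on the tasks still in the queue.
    solve_single_operator(op, tasks) must return (local_reward, leftover_tasks).
    """
    stage_reward = 0.0
    # Step 1: ascending order of utilization ratio.
    for op in sorted(operators, key=lambda o: o["utilization"]):
        # Step 2: local (per-operator) attention allocation.
        local_reward, task_queue = solve_single_operator(op, task_queue)
        stage_reward += local_reward
    return stage_reward, task_queue
```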

With 10 tasks initially in the task queue and 30 more tasks dynamically generated, the model introduced in this paper, which is based on maximizing the global reward, and the method described above, which is based on maximizing the local reward, are used to solve the multioperator AAP, and the total reward of each method is obtained. This experiment considers two conditions with different numbers of operators. In each condition, ten simulations are carried out with each of the two methods, and the results are shown in Figures 6(a) and 6(b). Compared with the total reward of the model based on maximizing the local reward, the total reward obtained from the model in this paper is increased by 28% on average in the first condition and by 30% in the second condition.

The results of this experiment show that the total reward of the model based on maximizing the global reward is larger than the reward obtained from the method based on maximizing the local reward. However, the model in this paper takes longer to compute than the compared model because it uses an intelligent optimization algorithm.

4.3. The Effect of Task Arrival Rate on Total Reward

The number of operators is 4 in this experiment, and the task arrival rate $\lambda$ ranges from 1 to 10. For each arrival rate the simulation is carried out ten times, and the average value of the total reward for each arrival rate is shown in Figure 7.

The result shows that the total reward decreases as the task arrival rate increases at first; as the task arrival rate keeps increasing, the total reward fluctuates. The total reward decreases at the beginning because tasks arrive earlier when the task arrival rate increases, so they have to wait longer in the task queue and the task latency penalty becomes larger. As the task arrival rate keeps increasing while the total number of tasks remains constant, the support system allocates almost all the tasks in two stages: the first stage allocates the 10 initial tasks and the second stage allocates the 30 dynamically generated tasks. With the same number of stages, identical task properties would lead to nearly the same result, and task properties drawn from the same bounds lead to results that do not differ much. Therefore, the total reward fluctuates with no significant increase or decrease.

5. Conclusion

The MOMU operator AAP model based on maximizing the global reward is established in this paper. This model can dynamically allocate all the tasks in the task queue to the proper operators and, at the same time, set the work time and rest time of each task. Validated by simulation in the Matlab environment, the total reward of our model is found to be larger than the values obtained from previous methods based on local reward maximization. Moreover, the results show that the task arrival rate and the total reward of the MOMU operator AAP model have an exponent-like relation.

The proposed method improves the study of the MOMU operator attention allocation problem by evaluating the reward of the task assignment plan of the whole command and control operator team, instead of only a single operator as in previous attempts. Accordingly, the team performance of command and control, that is, the global reward, is enhanced. The method can be applied to MOMU command and control scenarios and contributes to the task processing efficiency of the operator team.

In this paper, operators are modeled as two-alternative decision makers. However, in other cases, UAV operators may have more than two alternatives in command and control decisions. In the future, we will consider multialternative decision-making models for operators.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.