#### Abstract

Scheduling multisensor systems to maximize combat benefits has become a research hotspot in sensor management. To minimize the uncertainty in the threat level of targets and improve the survivability of sensors, this paper proposes a risk-based multisensor scheduling method in which sensors are systematically selected to observe targets so as to trade off the threat assessment risk against the emission risk. First, the scheduling problem is modelled as a partially observable Markov decision process (POMDP) for target threat assessment. Second, calculation methods for the threat assessment risk and the emission risk are proposed to quantify the potential loss caused by the uncertainty in the threat level of targets and by the emission of sensors. Then, a nonmyopic sensor scheduling objective function is built to minimize the total risk, which is the weighted sum of the threat assessment risk and the emission risk. Furthermore, to reduce the high computational complexity of the optimization, a decision tree search algorithm based on branch pruning is designed. Finally, simulations show that the proposed algorithm significantly reduces the search time and memory consumption compared with traditional algorithms, and that the proposed method controls risk better than existing sensor scheduling methods.

#### 1. Introduction

With the development of sensing technology, multisensor systems play an increasingly important role in various fields. In the military field, multisensor systems are often constrained in operation, algorithmic complexity, deployment space, and other aspects, and their operational difficulty increases greatly as the uncertainty of the target state grows [1, 2]. Therefore, it is necessary to manage sensors effectively to maximize combat benefits. Researchers have focused on Bayesian management optimization methods since Nash used linear programming theory to establish a sensor management objective function in 1977. Management based on Bayesian theory has since evolved into three main families: task-based methods, information-based methods, and risk-based methods [3].

All three methods build objective functions that determine the optimal sensor management scheme. In the task-based method, the objective function is directly related to the tasks that the sensors need to execute. Typical examples are the covariance matrix of the target state [4–6], the posterior Cramér-Rao lower bound [7–9], the probability of target detection [10–12], and the posterior expected number of targets [13]. The information-based method aims to reduce the uncertainty in the targets or the environment and is driven by information, so it can serve as a general management model for various tasks. This kind of method usually establishes the objective function in terms of the information gain of the sensors. Typical examples are Shannon entropy [14], Fisher information [15–17], Kullback-Leibler divergence [18, 19], and Rényi divergence [20–22]. Both methods focus on optimizing one or more indicators through sensor management. However, because of the uncertainty in the target state and the sensor measurements, every decision carries risk. In some cases, it is better to control the risk in order to reduce the potential losses caused by decisions than to obtain the optimal values of these indicators [23, 24]. For example, the first two methods can achieve better target tracking accuracy. However, if the targets do not need to be attacked, it is sufficient to avoid losing the target rather than to pursue very high tracking accuracy: if all the possible positions of the target lie within the radar beam, the target cannot be lost even if the tracking accuracy is poor. In that case, the tracking accuracy has little practical significance. Risk-based sensor management instead models the risk of losing the target under different tracking schemes, which makes the model more practical. Furthermore, the risk-based method also focuses on decreasing the risk caused by misjudging the target state, such as the classification error in target recognition [25, 26] and the false alarm error in target detection [27].

However, previous work has not focused on controlling the risk in the process of target threat assessment. Target threat assessment is the basis of many combat tasks, and commanders determine what action to take against a target according to the assessment results. These results are affected by the measurement error of the sensors and the uncertainty in the target state, which generates corresponding risks. For example, when a high-threat target is misjudged as a low-threat target, fewer defensive resources are assigned to it, which may result in a lethal attack on the defence target. When a low-threat target is misjudged as a high-threat target, the misjudgement may only waste defensive resources, which is a smaller loss than in the previous case. Therefore, it is necessary to control the risks in the process of target threat assessment to minimize the uncertainty in the assessment results. Furthermore, the signal emitted by an active sensor may be intercepted by enemy detectors, exposing its position to the enemy. Therefore, it is also necessary to manage the emission risk of active sensors to improve their battlefield survivability. A reasonable quantification of the emission state of the sensor is a prerequisite for controlling emission risk. In [28–30], the transmitting power, target echo power, target receiver sensitivity, and other parameters are used to calculate the intercepted probability of the sensor, which represents the emission state at each time step. However, prior knowledge such as the target receiver sensitivity is difficult to obtain in practice. To solve this problem, the emission level impact (ELI), which represents the cumulative emission of the sensor intercepted by the enemy, is used in place of the intercepted probability in [31]. Its calculation does not require target-related parameters, which gives it good practical application value.

In this paper, we aim to schedule sensors to control the total risk in the process of target threat assessment. The total risk is divided into the threat assessment risk and the emission risk, which quantify the potential loss caused by the uncertainty in the threat level of targets and by the electromagnetic emission of sensors, respectively. Moreover, the risk-based management methods above are all myopic; that is, they take the minimum risk at the next sampling time step as the optimization objective. Although a myopic method has a lower computational cost, it reduces the decision-making optimization of the whole sensor management process to a greedy search to a certain extent, without considering the influence of sensor actions on the future system state [32]. Therefore, this paper extends the risk-based management method from a myopic to a nonmyopic method, which takes the cumulative risk over a period of time as the basis for decision-making to obtain better combat benefits.

The remainder of this paper is organized as follows. In Section 2, the sensor scheduling problem is modelled as a partially observable Markov decision process (POMDP). In Section 3, the target threat and emission state are quantified, and calculation methods for the threat assessment risk and the emission risk are proposed. Based on the analysis of Sections 2 and 3, a nonmyopic multisensor scheduling objective function is established in Section 4. Then, a decision tree search algorithm is proposed to solve this scheduling problem in Section 5. Simulations are conducted in Section 6, and the paper is concluded in Section 7.

#### 2. Sensor Scheduling Model

For the convenience of presentation, we make the following assumptions.

*Assumption 1.* All sensors work independently, and all of them are active sensors.

*Assumption 2.* All targets move independently.

*Assumption 3.* There are *M* sensors to assess the threat level of *N* targets, which tend to attack our defence target.

*Assumption 4.* Let denote discrete time step.

A schematic of multisensor scheduling is shown in Figure 1. Because of the uncertainty in the sensor measurements and the randomness of target motions in the whole scheduling process, the scheduling problem is a decision-making problem under uncertain information conditions, and the POMDP is a theoretical method to study multistage decision-making in a stochastic environment [33]. Therefore, the sensor scheduling problem can be modelled as a continuous-state, discrete-time POMDP, which is described in detail as follows.

##### 2.1. Sensor Scheduling Action

To denote the sensor assignment at time step , sensor scheduling action is an matrix where elements or indicate whether sensor is to be scheduled for assessing the threat level of target at the next time step .
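As a minimal sketch of the action space, the binary action matrix can be enumerated in a few lines of Python. The function names are hypothetical, and the constraint that every target is observed by exactly one sensor is an assumption made only for illustration; the text above does not state the assignment constraints.

```python
from itertools import product


def action_matrix(assignment, num_sensors):
    """Build an M x N binary matrix from assignment[n] = index of the sensor
    scheduled for target n (hypothetical encoding)."""
    num_targets = len(assignment)
    matrix = [[0] * num_targets for _ in range(num_sensors)]
    for target, sensor in enumerate(assignment):
        matrix[sensor][target] = 1
    return matrix


def enumerate_actions(num_sensors, num_targets):
    """All actions in which each target is observed by exactly one sensor
    (assumed constraint, not taken from the paper)."""
    return [action_matrix(a, num_sensors)
            for a in product(range(num_sensors), repeat=num_targets)]


# Sizes from the simulation section: 4 sensors, 2 targets.
actions = enumerate_actions(num_sensors=4, num_targets=2)
print(len(actions))
```

Under this assumed constraint the number of actions per time step is the number of sensors raised to the number of targets, which is what makes the multi-step search space grow so quickly.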

##### 2.2. System State

The system state is denoted by .

indicates the state of all targets at time step , where is the state of target including the position coordinates and velocity information in a three-dimensional Cartesian coordinate system. The target state at the next time step can be determined according to the following state transition equation:where , , and represent the state transition matrix, process noise gain matrix, and Gaussian process noise, respectively. Generally, the process noise gain matrix and the process noise are used to describe the uncertainties of target state distribution in motions. The process noise gain matrix represents the magnitude of Gaussian process noise on each state variable. The Gaussian process noise is zero-mean white Gaussian noise and the covariance matrix of the process noise is , where is the process noise standard deviation of the position estimates [28, 34]. There are two common motion models: the nearly constant velocity (NCV) model and nearly constant turn (NCT) model, which can be described aswhere is sampling interval and is the turn rate of the target.
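The two motion models can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact matrices: the state ordering `[x, vx, y, vy, z, vz]`, the restriction of the turn to the x-y plane, and all function names are assumptions, and the process noise term is omitted.

```python
import math


def ncv_matrix(T):
    """Nearly constant velocity transition matrix for the assumed state
    ordering [x, vx, y, vy, z, vz] and sampling interval T."""
    F = [[0.0] * 6 for _ in range(6)]
    for axis in range(3):
        p, v = 2 * axis, 2 * axis + 1
        F[p][p] = 1.0
        F[p][v] = T
        F[v][v] = 1.0
    return F


def nct_matrix(T, omega):
    """Nearly constant turn in the x-y plane with turn rate omega (rad/s,
    nonzero); the z axis keeps the constant-velocity form."""
    s, c = math.sin(omega * T), math.cos(omega * T)
    F = ncv_matrix(T)
    F[0][1], F[0][3] = s / omega, -(1.0 - c) / omega
    F[1][1], F[1][3] = c, -s
    F[2][1], F[2][3] = (1.0 - c) / omega, s / omega
    F[3][1], F[3][3] = s, c
    return F


def propagate(F, x):
    """Noise-free prediction x_next = F x."""
    return [sum(F[i][j] * x[j] for j in range(len(x))) for i in range(len(F))]


state = [0.0, 100.0, 0.0, 0.0, 1000.0, 0.0]
turned = propagate(nct_matrix(1.0, 0.1), state)
straight = propagate(ncv_matrix(1.0), state)
```

With a positive `omega` the velocity vector rotates counterclockwise while the speed is preserved, and as `omega` approaches zero the turn model reduces to the constant-velocity model.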

indicates the emission state of all sensors at time step , where is the ELI state of sensor , representing the cumulative intercepted emission until time step . The value of can be quantized into a positive integer set , and each value in this set represents a true emission level. The larger the value, the greater the probability of the sensor being attacked by the enemy [31]. The ELI state transition process can be approximated as a Markov process by introducing state transition matrix to describe the state transition. If sensor is activated to assess the target threat at time step , thenwhere is the transition probability. Otherwise, is an dimension identity matrix.

##### 2.3. System Observation

Similarly, the system observation is denoted by .

represents the observations of all targets at time step , where observation value of target is obtained by sensor measurements. If sensor is activated to observe target , the measurement equation is as follows:where represents the measurement equation of sensor , is the Gaussian measurement noise, , , and represent the measurement noise of range , azimuth angle , and elevation angle , respectively. The calculation methods of , and are shown in where , , and are the position coordinates of sensor .
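A noise-free version of the range, azimuth, and elevation measurement can be sketched as below. The function name is hypothetical, and the use of `atan2` for both angles is an assumed convention; the paper's own formulas may define the angles with a plain arctangent and a different reference axis.

```python
import math


def measure(target_pos, sensor_pos):
    """Return (range, azimuth, elevation) of a target seen from a sensor.
    Positions are (x, y, z) tuples; measurement noise is omitted."""
    dx, dy, dz = (t - s for t, s in zip(target_pos, sensor_pos))
    rng = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.atan2(dy, dx)                    # angle in the x-y plane
    elevation = math.atan2(dz, math.hypot(dx, dy))  # angle above the x-y plane
    return rng, azimuth, elevation


rng, az, el = measure((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
```

In a simulation, zero-mean Gaussian noise with the sensor's range and angle standard deviations would be added to each component of the returned tuple.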

represents the instantaneous emission observations of all sensors at time step , where denotes the instantaneous emission of . We quantize the instantaneous emission as a positive integer set , which can be called the instantaneous observed emission level [31]. If sensor is activated at time step , then its instantaneous observed emission level can be calculated by a set of observation matrices:where indicates the probability that the level is when the ELI state changes from to . Otherwise, its observation matrix is an dimension identity matrix.

#### 3. Risk Calculation Method

Predicting the total risk in future action cycles is the premise of formulating scheduling plans. In this paper, we divide the total risk into the threat assessment risk and the emission risk.

##### 3.1. Threat Assessment Risk

In the process of threat assessment, the threat degree is a variable that changes with the target state, so the uncertainty in the target state is amplified when the threat degree is calculated, which makes an accurate threat assessment difficult. For example, when we consider that threat degree is a variable related to target velocity and horizontal distance , its uncertainty distribution is shown in Figure 2.

###### 3.1.1. Threat Model

In this paper, we consider the horizontal distance , the height , the velocity , and the course angle as the threat degree factors of target at time step , which can be calculated as where , , and are the coordinates of our defence target position.

Then the threat function of each factor is constructed as follows:where , , and are the points where the threat degree is equal to the extremum. And , , , and are the distance coefficient, the height coefficient, the speed coefficient, and the angle coefficient, respectively.

After obtaining the threat degree of each factor, the total threat degree is calculated aswhere , , , and represent the weights of the horizontal distance, height, speed, and course angle to the total threat degree, respectively. Generally, the weights are set according to the operational experience and actual situation [3, 35, 36].

Then, the threat level of the target is determined by a level rule. In this paper, we define the threat level as 1 (low-threat level), 2 (medium-threat level), and 3 (high-threat level), and the level rule is as follows:where and represent the boundary points of the threat level.

In addition, more threat degree factors and different threat levels can be considered in the threat model according to different actual situations.
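The weighted sum and the level rule can be illustrated with a short sketch. The weights 0.4, 0.2, 0.1, and 0.3 are the values used in the simulation section; the boundary points 0.4 and 0.7, the function names, and the choice of which side of each boundary is inclusive are hypothetical placeholders. The per-factor threat functions are not reproduced here, so the sketch takes the factor threat degrees as inputs.

```python
# Weights for horizontal distance, height, speed, and course angle
# (values taken from the simulation section).
WEIGHTS = (0.4, 0.2, 0.1, 0.3)


def total_threat(factor_degrees, weights=WEIGHTS):
    """Weighted sum of the per-factor threat degrees."""
    return sum(w * d for w, d in zip(weights, factor_degrees))


def threat_level(degree, low_bound=0.4, high_bound=0.7):
    """Map a total threat degree to level 1, 2, or 3.
    The boundary values are hypothetical, not the paper's."""
    if degree < low_bound:
        return 1
    if degree < high_bound:
        return 2
    return 3


degree = total_threat((0.5, 0.5, 0.5, 0.5))
level = threat_level(degree)
```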

###### 3.1.2. Threat Assessment Risk Calculation

Because the threat degree depends only on the target state, the risk is evaluated by estimating the target state. Since the target state cannot be fully observed, the target belief state is introduced, which is a sufficient statistic of the historical information and indicates the probability distribution of the target state [37].

According to the Bayesian estimation method, the transition of the target belief state can be divided into the prediction stage and update stage. The prediction stage is described as

The updating stage is described as

If both the process noise and the measurement noise are Gaussian, the target belief state is also approximated as a Gaussian distribution and can be updated by the cubature Kalman filter (CKF) [38]. When the target has multiple motion models, its belief state can be updated by combining the interacting multiple model (IMM) algorithm and the CKF [39].

Then, threat assessment risk prediction can be conducted by predicting the target belief state, and the process is as follows.

*Step 1.* Obtain the belief state of target at time step , and predict the belief state at time step by the CKF.

*Step 2.* Obtain sample points for the Monte Carlo method by sampling from the belief state at time step .

*Step 3.* Calculate the corresponding threat level of each sample point according to (8)-(11), and obtain the numbers of sample points with low-, medium-, and high-threat levels, , , and , respectively.

*Step 4.* Obtain the probability distribution of the threat level .

*Step 5.* Set the loss matrix , where indicates the potential loss of an assessment error in which the real threat level is but the estimated level is .

If the threat level of target is estimated as =, the risk is . Then, the threat assessment risk is defined as , and the threat level is .

*Step 6.* Set , go to Step 1, and perform calculation cycles ( is the horizon length). Then, the cumulative threat assessment risk of target is given by

Taking the sensor scheduling actions into consideration, is defined as the threat assessment risk of target by sensor at time step . Then, the cumulative threat assessment risk of all targets is given by
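Steps 3 to 5 for a single time step can be sketched as follows. The sketch assumes that the threat assessment risk is the minimum expected loss over the candidate estimated levels and that the reported threat level is the minimizer; this is one reading of the definition above, not a confirmed reproduction of it. The loss matrix values and the function name are hypothetical.

```python
def threat_assessment_risk(sample_levels, loss):
    """Estimate the one-step threat assessment risk from Monte Carlo samples.

    sample_levels: threat level (1, 2, or 3) of each sample point.
    loss[i][j]: loss when the real level is i + 1 but the estimate is j + 1.
    Returns (risk, estimated_level, level_probabilities).
    """
    n = len(sample_levels)
    num_levels = len(loss)
    # Step 4: empirical probability of each threat level.
    probs = [sum(1 for s in sample_levels if s == lv) / n
             for lv in range(1, num_levels + 1)]
    # Expected loss of declaring each candidate level.
    expected = [sum(probs[i] * loss[i][j] for i in range(num_levels))
                for j in range(num_levels)]
    # Assumed definition: the risk is the smallest expected loss.
    best = min(range(num_levels), key=expected.__getitem__)
    return expected[best], best + 1, probs


# Hypothetical loss matrix: underestimating a threat costs more than
# overestimating it.
LOSS = [[0.0, 1.0, 2.0],
        [2.0, 0.0, 1.0],
        [4.0, 2.0, 0.0]]

samples = [3] * 6 + [2] * 4
risk, level, probs = threat_assessment_risk(samples, LOSS)
```

When every sample falls in the same level, the probability of that level is 1 and the risk computed this way is 0, which matches the behaviour described for target 1 late in the simulation.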

##### 3.2. Sensor Emission Risk

Similarly, the ELI state of the sensors cannot be fully observed, so the ELI belief state is introduced to indicate the probability distribution of the ELI state at time step . If the instantaneous emission observation is , the ELI belief state can be updated by the hidden Markov model filter:where the symbol represents the Hadamard product, and is the identity vector.
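A minimal sketch of one hidden Markov model filter step for the ELI belief is given below. It simplifies the model described above by letting the observation probability depend only on the new ELI state, whereas the paper's observation matrices are indexed by the state transition. The function name and all matrix values are hypothetical.

```python
def hmm_filter(belief, transition, observation, obs_index):
    """One predict-and-update step of a discrete HMM filter.

    belief[i]:         P(ELI state = i) before the step.
    transition[i][j]:  P(next state = j | current state = i).
    observation[j][o]: P(observed emission level = o | state = j).
    obs_index:         index of the observed instantaneous emission level.
    """
    n = len(belief)
    # Prediction through the Markov transition matrix.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Element-wise (Hadamard) product with the observation likelihood.
    weighted = [predicted[j] * observation[j][obs_index] for j in range(n)]
    # Normalize; assumes the observation has nonzero probability.
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical three-level ELI model with two observable emission levels.
TRANSITION = [[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]]
OBSERVATION = [[0.8, 0.2],
               [0.3, 0.7],
               [0.1, 0.9]]

posterior = hmm_filter([1.0, 0.0, 0.0], TRANSITION, OBSERVATION, obs_index=1)
```

For a sensor that is not activated, both matrices are identity matrices in the paper's model, so the belief is left unchanged by the step.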

However, in actual scheduling, we cannot obtain at time step , but its probability distribution is obtained as

The distribution is represented by the following matrix:

Then, the predicted ELI state is updated as

Furthermore, the emission cost of sensor at time step is calculated aswhere represents the ELI value corresponding to the belief state.

According to the relationship between ELI and intercepted probability, the intercepted probability of the sensor is calculated as

Taking into consideration the loss incurred when the sensor is destroyed, the sensor emission risk is calculated aswhere represents the tactical value of sensor .

Taking the sensor scheduling actions over time steps into consideration, the cumulative emission risk is calculated as

Furthermore, the cumulative emission risk of all sensors is given by

#### 4. Objective Function

Considering the threat assessment risk and the sensor emission risk, combined with (15) and (24), the objective function over the future horizon of time steps is given bywhere represents the total risk and is the equilibrium coefficient.

Specifically, the sensor scheduling objective function proposed in this paper is a nonmyopic function that considers the cumulative total risk in the time step domain . That is, the sensor scheduling scheme with the lowest cumulative total risk is selected, and the optimal solution of the objective function is the optimal scheduling scheme in the time step domain of . The scheduling problem is thereby converted into an optimization problem. Because it is a POMDP with a sequence of scheduling actions, its computational complexity increases exponentially with the number of time steps, which makes it difficult to meet the real-time requirements of scheduling. Therefore, to obtain the optimal solution in a short time, we transform the scheduling problem into a decision tree optimization problem and propose a uniform cost search (UCS) algorithm based on branch pruning in the next section.

#### 5. UCS Algorithm Based on Branch Pruning

Figure 3 shows a decision tree with , , and . As can be seen from Figure 3, each available scheduling action at each time step is a node, and the lower node contains the scheduling scheme of all the upper nodes. We denote the node cost as the cumulative total risk corresponding to the scheduling scheme contained in the node. Then, the optimal solution is converted to the scheduling scheme contained in the lowest node with the smallest node cost.

There are three commonly used decision tree search algorithms, namely, breadth-first search (BFS), depth-first search (DFS), and UCS. Compared with BFS and DFS, in which all nodes are traversed, UCS preferentially opens the node with the lowest cost, which gives it the highest search speed [40]. However, because the number of nodes increases exponentially, UCS still needs a lot of time. Therefore, the branch pruning method is introduced in this paper. By estimating the lower bound of each node, a branch whose lower bound is greater than the current minimum node cost can be deleted in time, which reduces the number of opened nodes. For the node containing scheduling scheme , its lower bound is given by

Here, node cost is known, but is unknown, which indicates the minimum cumulative total risk in the future time step domain of . The minimum risk in future time steps must be calculated by opening all child nodes, so we propose an approximate estimation method for the lower bound value to improve the efficiency of the algorithm.

Because of the stability of the sensor measurements and the persistence of the target motions, the estimation error of the target state does not change much from the previous time step. Therefore, the threat assessment risk does not change abruptly most of the time. Furthermore, according to simulations, the threat assessment risk in the next time step is usually 0.6-1.6 times that in the previous time step. Then, the suboptimal lower bound of the threat assessment risk is given bywhere represents the threat assessment risk at time step after performing scheduling scheme .
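One way to turn the empirical 0.6-1.6 ratio into a bound is sketched below. It assumes that the risk at each future step is at least 0.6 times the risk at the step before it, so the future cumulative risk is bounded from below by a geometric sum. This is an illustrative reading, not a reconstruction of the paper's formula; the function name and signature are hypothetical.

```python
def threat_risk_lower_bound(current_risk, remaining_steps, min_ratio=0.6):
    """Approximate lower bound on the cumulative threat assessment risk over
    the remaining steps, assuming each step's risk is at least min_ratio
    times the previous step's risk."""
    return sum(current_risk * min_ratio ** j
               for j in range(1, remaining_steps + 1))


bound = threat_risk_lower_bound(1.0, remaining_steps=2)
```

Because the ratio is an empirical observation rather than a guarantee, a bound of this kind can occasionally exceed the true future risk, which is why pruning with it may discard the optimal branch and yield a suboptimal solution.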

This is because, , the following applies:where represents the emission risk at time step after performing scheduling scheme .

Therefore, the suboptimal lower bound of the emission risk can be given:

Then, combined with (27) and (29), the suboptimal lower bound of the node is as follows:

When UCS is carried out, the lower bound of the node needs to be compared with the current optimal cumulative total risk to determine whether the pruning condition is met. In summary, the UCS algorithm based on branch pruning is as follows.

*Step 1.* Add the root node to the list and set the initial optimal risk . Once a node is opened, it is deleted from the list.

*Step 2.* If the list is not empty, open the first node in the list.

If the depth of the child node , calculate the lower bounds of all the child nodes and compare each lower bound with in turn. Delete every node whose lower bound is greater than , sort the remaining nodes in the list in ascending order of their lower bounds, and go to Step 2.

If the depth of the child node , calculate the node cost. If the node cost is less than , record the current cost as and the corresponding scheduling scheme of the node as the optimal scheme, and go to Step 2.

*Step 3.* If the list is empty, end the search and take the current optimal scheduling scheme as the optimal solution.
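The steps above can be sketched with a priority queue ordered by the node lower bound. This is a simplified illustration rather than the authors' implementation: the callback signatures `step_risk` and `future_bound`, the toy risk function, and all names are assumptions, and the sorted list is replaced by a heap.

```python
import heapq
from itertools import count


def ucs_branch_pruning(actions, horizon, step_risk, future_bound):
    """Search action sequences of length `horizon` (at least 1).

    step_risk(path, action): risk added by taking `action` after `path`.
    future_bound(path, cost, remaining): estimated lower bound on the risk
        accumulated over the `remaining` steps after `path`.
    Returns (best_cost, best_path, number_of_opened_nodes).
    """
    best_cost, best_path = float("inf"), None
    tie = count()  # tie-breaker so the heap never compares paths
    frontier = [(0.0, next(tie), 0.0, ())]
    opened = 0
    while frontier:
        bound, _, cost, path = heapq.heappop(frontier)
        if bound > best_cost:  # a better leaf was found after this node was stored
            continue
        opened += 1
        for action in actions:
            child_path = path + (action,)
            child_cost = cost + step_risk(path, action)
            if len(child_path) == horizon:  # leaf: compare the node cost
                if child_cost < best_cost:
                    best_cost, best_path = child_cost, child_path
                continue
            remaining = horizon - len(child_path)
            child_bound = child_cost + future_bound(child_path, child_cost, remaining)
            if child_bound <= best_cost:  # otherwise the branch is pruned
                heapq.heappush(frontier, (child_bound, next(tie), child_cost, child_path))
    return best_cost, best_path, opened


# Toy problem: three actions with fixed base risks and a penalty for repeating
# the previous action (a stand-in for accumulated emission).
ACTIONS = (0, 1, 2)
BASE = (0.5, 0.2, 0.9)


def toy_risk(path, action):
    repeat_penalty = 0.3 if path and path[-1] == action else 0.0
    return BASE[action] + repeat_penalty


best_cost, best_path, opened = ucs_branch_pruning(
    ACTIONS, 3, toy_risk, lambda path, cost, remaining: 0.0)
```

With a zero future bound, as in this usage example, the bound never overestimates and the search returns the same optimum as exhaustive enumeration. A tighter but approximate bound prunes more nodes at the price of possibly missing the optimum, which is the trade-off reported for UCS-BP in the simulations.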

#### 6. Simulations

In our simulations, 4 sensors are used to assess the threat levels of 2 enemy targets. The sensor sampling interval is 1 s and the simulation duration is 60 s. Target 1 moves in uniform rectilinear motion, and its initial position and velocity are and , respectively. Target 2 turns left at an angle of from 20 to 30 s, turns right at an angle of from 30 to 45 s, and moves in uniform rectilinear motion at other times, and its initial position and velocity are and , respectively. The sensor parameters are shown in Table 1. The values of the parameters in (9) are set as , , , , , , and . Furthermore, the weights of threat factors , , , and are 0.4, 0.2, 0.1, and 0.3, respectively, the boundary points of the threat level are set as and , and the assessment loss matrix is .

Moreover, we quantify the ELI state as and the instantaneous observed emission as . We assume that a sensor with higher measurement accuracy reaches a high emission state more easily. Then, the ELI state transition matrix of each sensor is as follows:

##### 6.1. Determination of the Equilibrium Coefficient

Equilibrium coefficient adjusts the impacts of the threat assessment risk and the emission risk on the total risk, thus affecting the decision-making process. We study the impacts of different equilibrium coefficients on the two kinds of risks for a horizon length , which are shown in Figure 4. Figure 4 shows that the greater , the lower the threat assessment risk and the higher the emission risk. This is because, with increasing , the impact of the threat assessment risk on the total risk gradually increases, so the system pays more attention to controlling the threat assessment risk, which makes it fall. Correspondingly, the system pays less attention to controlling the emission risk. Only when , are the two kinds of risks very similar, indicating that the scheduling scheme at this value balances the impacts of the two risks well. Therefore, we choose in the following simulations.

##### 6.2. Algorithm Performance Comparisons

To verify the advantages of UCS based on branch pruning (UCS-BP), we compare it with traditional UCS. Figure 5 compares the percentage of opened nodes for each algorithm under different lengths of . Figure 6 compares the cumulative total risk. Table 2 summarizes the statistics, which include the average number of opened nodes, the maximum number of stored nodes, and the cumulative total risk. The search time and memory consumption of an algorithm are proportional to the average number of opened nodes and the maximum number of stored nodes, respectively [18]. The cumulative total risk reflects the solution quality of each algorithm.

It can be seen that UCS finds a better solution, so its cumulative total risk is lower than that of UCS-BP. However, compared with UCS, UCS-BP significantly reduces the number of opened nodes and the maximum number of stored nodes, thus reducing the search time and memory consumption. In addition, it can be seen from Table 2 and Figure 6 that when = 2, 3, or 4, the risk decreases with increasing . However, the risk rises instead of falling when . This is because the prediction error of the system state increases with increasing , resulting in an increase in the risk.

Furthermore, there is little difference between the optimal solution and the solution under UCS-BP when . Considering the computational complexity and solution quality of the algorithm, we choose in the next simulations.

##### 6.3. Analysis of the Scheduling Method

Figure 7 shows the time-varying total risk curve of the predicted value and the estimated value. The predicted value is the risk predicted by the belief state before performing a scheduling action, and the estimated value is the risk estimated by the actual measurement after executing a scheduling action. It can be seen that the predicted value is approximately equal to the estimated value the whole time. Consequently, the belief state prediction method can effectively predict the state of the target and the ELI state in future time steps even if the system state is not observable. In addition, the results indicate that it is reasonable to use the predicted risk as the scheduling basis.

Figure 8 shows the projections of the trajectories on the X-Y plane and the optimal scheduling scheme. Figure 9 shows the ratio of the threat assessment risk to the total risk. It can be seen that as target 1 moves closer to the defence target and target 2 moves away from it, the uncertainty in the threat level becomes increasingly small, so the threat assessment risk also becomes low, and the system pays more attention to controlling the sensor emission. According to Figure 8, at the end of the simulation (52-60 s), the system schedules only sensors 1 and 2, which have a low emission performance, and the threat assessment risk ratio is less than 10%, which shows that the main factor affecting decision-making during these time steps is the emission risk. This also explains why the total risk in Figure 7 tends to stabilize after decreasing.

Figure 10 is the threat level error sampling diagram of target 1, which shows the uncertainty in the threat assessment. It can be seen that, in the process of threat assessment, the uncertainty in the target state is transmitted to the threat model, which generates the corresponding risk in the threat level assessment. From 20 to 35 seconds, the sampling points are widely distributed across both the high- and medium-threat levels, and the threat assessment risk is the highest. As the target gradually approaches the defence target, the distribution of sampling points moves to the high-threat level range, and the uncertainty in the sampling points decreases. When all sampling points are at the high level, the probability of a high-threat level of the target is 1 and the threat assessment risk is 0. At this time, only the emission risk of the sensors needs to be controlled.

To fully illustrate the effectiveness of the proposed scheduling method (PSM), we compare it with four existing methods, namely, (1) the random scheduling method (RSM), in which random sensor combinations are scheduled at each time step, (2) the closest scheduling method (CSM), in which the sensors closest to each target are scheduled [28], (3) the fixed scheduling method (FSM), in which fixed sensor combinations are scheduled for the whole simulation duration (in this paper, we assign sensor 1 and sensor 2 to assess target 1 and target 2, respectively) [28], and (4) the myopic scheduling method (MSM), in which scheduling plans are decided based on the total risk of the next time step [29].

Figure 11 shows the total risk curve of each method, and Figure 12 compares the normalized risk for the different methods. Since RSM, CSM, and FSM do not predict the risk of a scheduling action, their total risk is very high. PSM achieves the lowest value of all three kinds of risk among the methods, indicating that the proposed method controls the total risk well and improves the security of the sensor systems and the defence target. The total risk of MSM is lower than those of RSM, CSM, and FSM but higher than that of PSM, which reveals that myopic scheduling can control the risk to some extent by predicting the risk one step ahead but cannot match the risk control effect of nonmyopic scheduling.

#### 7. Conclusions

In this paper, we propose a risk-based multisensor nonmyopic scheduling method that schedules sensors to minimize the uncertainty in the threat level of targets and improve the survivability of sensors. First, a sensor scheduling model based on the POMDP is established, and the state transition law and observation law of the targets and sensor emissions in the scheduling process are presented. Next, the threat assessment model and the emission model are established, and the corresponding risk calculation methods are proposed. Then, a nonmyopic multisensor scheduling objective function is built. Furthermore, to determine the optimal scheduling scheme quickly, an improved decision tree search algorithm, namely, the UCS algorithm based on branch pruning, is designed. The simulation results show that the proposed algorithm obtains high-quality solutions quickly and that the proposed scheduling method can reasonably predict and control the total risk. In addition, compared with common scheduling methods, the proposed scheduling method reduces both the threat assessment risk and the emission risk.

A sensor management method that has continuity and relevance for the combination of multiple combat tasks, such as target detection, target recognition, and target tracking, needs to be proposed in future studies.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was funded by DPRFC (Defense Pre-Research Fund Project of China), grant number 012015012600A2203.