International Journal of Aerospace Engineering

Volume 2018, Article ID 7901917, 9 pages

https://doi.org/10.1155/2018/7901917

## Optimal Maneuver Strategy of Observer for Bearing-Only Tracking in Threat Environment

Information and Navigation College, Air Force Engineering University, Xi’an 710077, China

Correspondence should be addressed to Hao Wu; wuhaostudy@163.com

Received 19 October 2017; Accepted 5 June 2018; Published 18 July 2018

Academic Editor: Mauro Pontani

Copyright © 2018 Renke He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The optimal maneuver of an observer for bearing-only tracking (BOT) in a threat environment is a complex problem that involves nonlinear filtering, threat avoidance, and optimal maneuver strategy. Taking these factors together, a reward function comprising a lower bound on the determinant of the Fisher information matrix (detFIM) and the threat cost was established, and the finite-horizon Markov decision process (MDP) principle was applied to obtain the optimal strategy. A quantization method was used to discretize the BOT process and to calculate the transition matrix of the Markov chain; to initialize the quantization at the beginning of each period, the cubature Kalman filter (CKF) was applied to provide the initial state estimate and the corresponding error covariance. Numerical simulations illustrated the applicability and superior performance of the approach for static and dynamic target tracking in several threat-environment scenarios.

#### 1. Introduction

Bearing-only tracking (BOT) techniques are applied in various target location scenarios, and the related theoretical and practical problems have been studied for decades [1–3]. The accuracy and efficiency of target location have improved significantly in recent years owing to advances in filtering techniques [4–6] and unmanned aerial vehicle (UAV) platforms. Many researchers focus on locating a static target or on tracking with a prescribed observer trajectory, although the target can be located more accurately if the observer maneuvers according to certain rules [7]. In military applications, the maneuvering trajectories of the observer (aircraft, UAV, car, warship, submarine, etc.) may be constrained by threats (such as missiles, torpedoes, no-fly zones, and air defense radars), and the trajectories that are optimal for BOT may fail to satisfy those constraints. As a result, the observer’s maneuvers must ensure not only the acquisition of bearing information but also the safety of the observer itself. The problem therefore concerns the balance between target tracking accuracy and threat avoidance.

For the bearing-only tracking problem, the Cramer-Rao lower bound (CRLB) and the Fisher information matrix (FIM) are usually used to evaluate the performance of target tracking and of the observer’s maneuvering. Fawcett [1] used the CRLB to evaluate the effect of course maneuvers on bearing-only range estimation, but the optimization of the observer’s maneuver was not considered. In [8], the FIM was used to obtain the optimal leg of the observer, and the EKF algorithm was updated by heuristic evolutionary optimization algorithms to enhance the accuracy of BOT, although such algorithms cannot guarantee computational efficiency. In [9], recursive Bayesian estimation methods were applied to several angle-only applications in sea-to-sea and air-to-sea scenarios; the particle filter (PF) and the range-parameterized EKF were used for comparison. In [10, 11], the Kalman filter (KF), EKF, and PF were applied to bearing and elevation measurements for real-time object tracking in an underwater environment, but the optimal trajectories of the observer were not considered. In [12], observability was analyzed for a smoothly maneuvering observer, and necessary and sufficient conditions for observability were established. In [13], stochastic control was applied to optimal underwater trajectories in BOT, and dynamic programming was used to obtain the optimal control sequence. Zhang et al. [14, 15] applied Markov decision processes (MDP) and stochastic optimal control to the optimization of the observer’s trajectories. In [15], the lower bound of the determinant of the FIM was used as the reward function, while the terms involving the distance from the observer to the target were ignored because of their inaccuracy.
None of these studies considers the constraints of a threat environment on the observer’s trajectories, while obstacle/threat avoidance has been treated as a pure control or decision methodology for the autonomous flight of UAVs [16, 17], with little connection to the BOT problem. Thus, it remains difficult to ensure that the observer’s maneuvering trajectories satisfy the requirements of BOT and of threat avoidance at the same time.

In this paper, a model of the threat environment is established, and the cost of threat avoidance is combined with the detFIM in the reward function. Based on the initial state estimate and the corresponding error covariance provided by the cubature Kalman filter (CKF) [18], the quantization method [19] is applied to discretize the whole process. Finally, the optimal maneuvering strategy is calculated at each step by the finite-horizon MDP approach based on the reward function. The paper is organized as follows. Section 2 defines the BOT problem, Section 3 introduces the quantization method and the CKF, Section 4 establishes the reward function, Section 5 presents the finite-horizon MDP approach, Section 6 briefly outlines the algorithm, Section 7 presents the numerical simulations, and Section 8 summarizes the work.

#### 2. Problem Definition

Suppose that the observer is a UAV platform, the target is a car moving with constant velocity, and the height of the observer above the target is known. The static threats in the environment must be avoided by the UAV to ensure the safety of the flight. The positions are given in a two-dimensional Cartesian coordinate system; the state of the target is , where the position and velocity are and , respectively. The state of the observer is ; the relative state , where . The model of BOT can be described as follows:

Equations (1) and (2) are the state equation and the measurement equation, respectively, where , , and are the dimensions. is zero-mean Gaussian noise with covariance matrix , is the state transition matrix, and is the input. where is the time interval of measurements, is the intensity of the process noise, and is zero-mean Gaussian noise with covariance matrix .
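As an illustration of the state and measurement model, a minimal Python sketch follows. It assumes a state ordered as [x, y, vx, vy], a constant-velocity transition matrix, and a bearing measured from the y-axis; these conventions and all function names are assumptions made for illustration, and the process noise and observer input terms are omitted for brevity.

```python
import math
import random

def cv_transition(T):
    """Constant-velocity transition matrix for a state [x, y, vx, vy] (assumed ordering)."""
    return [
        [1.0, 0.0, T, 0.0],
        [0.0, 1.0, 0.0, T],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def propagate(state, T):
    """Noise-free constant-velocity step: next = F * state."""
    F = cv_transition(T)
    return [sum(F[i][j] * state[j] for j in range(4)) for i in range(4)]

def bearing(target, observer):
    """Bearing of the target seen from the observer, measured from the +y axis (assumed convention)."""
    dx = target[0] - observer[0]
    dy = target[1] - observer[1]
    return math.atan2(dx, dy)

def measure(target, observer, sigma, rng):
    """Bearing corrupted by zero-mean Gaussian noise with standard deviation sigma."""
    return bearing(target, observer) + rng.gauss(0.0, sigma)

# Hypothetical usage: a target moving east at 10 m/s, observer at the origin.
rng = random.Random(0)
target = [1000.0, 1000.0, 10.0, 0.0]
observer = [0.0, 0.0, 0.0, 0.0]
z = measure(target, observer, sigma=math.radians(1.0), rng=rng)
target = propagate(target, T=1.0)
```

Only the bearing is observed, which is why the range to the target becomes estimable only when the observer maneuvers.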

#### 3. Discretization of the Process

To meet the requirements of the discrete-time MDP approach, a quantization method is applied to approximate the continuous process of BOT. Specifically, quantization approximates the process by a finite Markov chain, as defined in the quantization algorithm [19, 20]. For each time, is divided by grid , which is common to all these approximation methods.

In the marginal quantization filtering, the grid points and probabilities are generated by the Monte Carlo approach. These grids can define a new state by

There is one-to-one match between the nearest-neighbor projection and Voronoi tessellation of , which means Borel partitions of satisfy

The grids are updated by the competitive learning vector quantization (CLVQ) method. For every time of , is the number of Monte Carlo samples; select as the closest neighbor of in ; in the learning phase, set where is the updating rate. CLVQ has been widely adopted for neuron updating in neural networks. The details of the algorithm are given in [19, 20]. At each step from to , the transition matrix of the Markov chain is
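The nearest-neighbor projection, the CLVQ update, and a Monte Carlo estimate of the transition matrix can be sketched as follows. This is a minimal scalar illustration, not the algorithm of [19, 20]: the function names, the default updating-rate schedule, and the counting estimator for the transition probabilities are assumptions made for the example.

```python
def nearest(grid, x):
    """Index of the grid point closest to x (nearest-neighbor projection)."""
    return min(range(len(grid)), key=lambda i: (grid[i] - x) ** 2)

def clvq(grid, samples, rate=lambda m: 1.0 / (m + 10.0)):
    """Competitive learning: move only the winning grid point toward each sample."""
    grid = list(grid)
    for m, x in enumerate(samples):
        i = nearest(grid, x)
        grid[i] -= rate(m) * (grid[i] - x)
    return grid

def estimate_transitions(grid_k, grid_k1, pairs):
    """Estimate P[i][j] = Pr(cell j at step k+1 | cell i at step k) by counting sample pairs."""
    counts = [[0] * len(grid_k1) for _ in grid_k]
    for x_k, x_k1 in pairs:
        counts[nearest(grid_k, x_k)][nearest(grid_k1, x_k1)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Cells never visited by a sample keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Each visited row of the estimated matrix sums to one, so it can be used directly as a row-stochastic transition matrix in the MDP step.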

Thus, the process is replaced by the Markov chain with the transition matrix at each step. The whole process is divided into several periods; the initial distribution of each period is also quantized by , whose initial values are given randomly. The CKF is therefore used to estimate the actual position of the target and to provide the density as the initial distribution in each period of quantization.

Compared with the EKF, the CKF achieves much higher accuracy in nonlinear filtering problems. The brief steps of the CKF are as follows:

- (i) Time update
- (ii) Measurement update
- (iii) Factorize
- (iv) Evaluate the cubature points
- (v) Evaluate the propagated cubature points
- (vi) Estimate the predicted measurement
- (vii) Estimate the innovation covariance matrix
- (viii) Estimate the cross-covariance matrix
- (ix) Estimate the Kalman gain
- (x) Estimate the updated state
- (xi) Estimate the corresponding error covariance
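The factorization, cubature-point, and predicted-measurement steps can be sketched in Python as follows. The sketch assumes the standard third-degree spherical-radial rule (2n equally weighted points at the mean plus or minus the square root of n times each column of the covariance factor); the function names are illustrative, and the remaining covariance and gain steps are not shown.

```python
import math

def cholesky(P):
    """Lower-triangular S with P = S S^T, for a symmetric positive-definite P."""
    n = len(P)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = P[i][j] - sum(S[i][k] * S[j][k] for k in range(j))
            S[i][j] = math.sqrt(acc) if i == j else acc / S[j][j]
    return S

def cubature_points(mean, P):
    """Return the 2n equally weighted points mean +/- sqrt(n) * (column i of S)."""
    n = len(mean)
    S = cholesky(P)
    scale = math.sqrt(n)
    points = []
    for sign in (1.0, -1.0):
        for i in range(n):
            points.append([mean[r] + sign * scale * S[r][i] for r in range(n)])
    return points

def predicted_measurement(points, h):
    """Average the measurement function h over the cubature points."""
    return sum(h(p) for p in points) / len(points)
```

For a bearing measurement, `h` could be `lambda p: math.atan2(p[0], p[1])`; a plain average of angles is adequate only when the points do not straddle the wrap-around at ±π.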

#### 4. Reward Function

The reward function is the key to solving the combined problem of BOT and threat avoidance. It consists of two parts: one describes the benefit of the bearing information obtained from the target, and the other is the cost of threat avoidance. For the former, a performance index given by a lower bound on detFIM is commonly used: where is the standard deviation of the bearing measurement error, is the time horizon, is the bearing rate, and is the relative distance from the observer to the target. This function is convenient to evaluate.

The second part is the cost of threat avoidance. The intensity of threats is determined by the relative distance from the observer to different threats, which means the smaller the relative distance, the greater the intensity. A potential field can be used to establish the threat environment.

In the two-dimensional Cartesian coordinate, the coordinate of the observer is , the coordinate of a threat is , and the potential at the coordinate of the observer at time is

In time horizon , the cost of the threats is

A threat exerts influence only within a limited distance , so we define the potential of threats as where is the relative distance between the observer and the threats. Thus, the reward function can be represented by where is a constant coefficient that keeps the two parts at the same order of magnitude.
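The structure of this two-part reward can be sketched as follows. The inverse-distance form of the potential, the subtraction of the weighted cost from the information term, and all names (`threat_potential`, `threat_cost`, `reward`, `d_max`, `weight`) are assumptions chosen for illustration; the detFIM lower bound is passed in as an already computed number.

```python
import math

def threat_potential(observer, threat, d_max):
    """Assumed inverse-distance potential, zero beyond the influence range d_max."""
    d = math.dist(observer, threat)
    if d > d_max:
        return 0.0
    return 1.0 / max(d, 1e-9)  # guard against division by zero at the threat itself

def threat_cost(path, threats, d_max):
    """Sum the potential of every threat over every observer position in the horizon."""
    return sum(threat_potential(p, t, d_max) for p in path for t in threats)

def reward(info_bound, cost, weight):
    """Information term minus weighted threat cost (assumed sign convention)."""
    return info_bound - weight * cost
```

With this convention a larger `weight` makes the observer more conservative around threats, at the expense of bearing information.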

#### 5. Optimal Maneuvering Strategy

The core of the optimal strategy is to maximize the established reward function. To this end, the finite-horizon MDP principle can be used, which requires the reward function to satisfy the dynamic programming property [7]:

So the sequence of controls must satisfy

For the model of the finite-horizon MDP, where is the state space, a Borel space; is the control or action set, a Borel space; is the transition probability function; is the reward function at each step; and is the terminal reward at the end of each finite time horizon.

The maneuvering strategy is ; is the angle set of maneuvering; the state of the observer with the maneuver strategy is defined as

For the whole process, the maneuver strategy is ; the expected reward function is

The optimal total reward function is

For ,

The maneuver strategy can be calculated by

Based on the quantization, , can be replaced by :
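Once the state has been quantized, the finite-horizon recursion of this section reduces to backward induction over a finite Markov chain. The following is a minimal sketch under simplifying assumptions: the transition matrices `P[a]` are taken to be the same at every step (in the quantized BOT process they would vary from step to step), and the array layout and names are illustrative.

```python
def backward_induction(P, r, terminal, horizon):
    """
    P[a][i][j]: probability of moving from cell i to cell j under action a.
    r[i][a]:    one-step reward in cell i under action a.
    terminal[i]: terminal reward in cell i.
    Returns (value, policy); policy[n][i] is the best action in cell i at step n.
    """
    n_states = len(terminal)
    n_actions = len(P)
    value = list(terminal)
    policy = []
    for _ in range(horizon):
        new_value, step_policy = [], []
        for i in range(n_states):
            # Expected total reward of each action: immediate reward plus expected future value.
            q = [r[i][a] + sum(P[a][i][j] * value[j] for j in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            new_value.append(q[best])
            step_policy.append(best)
        value = new_value
        policy.insert(0, step_policy)  # earlier steps go to the front
    return value, policy
```

In the BOT setting, the actions would index the candidate maneuver angles and the cells would index the quantization grid points.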

#### 6. Algorithm

The algorithm is composed of two processes. First, a quantization method provides the discretized density and the transition matrix; these are then used by the finite-horizon MDP to calculate the optimal maneuvering strategy. The reward function combines the detFIM and the cost of threats. The parameters are obtained by quantization; based on these parameters and the reward function, the finite-horizon MDP outputs the optimal maneuvering strategy. At the same time, the CKF provides the density for as the initial density for each period. The diagram of the algorithm is shown in Figure 1.