Abstract

This paper presents a summary of a subset of the extensive vision-based tracking methods developed at Georgia Tech. The problem of a follower aircraft tracking an uncooperative leader, using vision information only, is addressed. In all the results presented, a single monocular camera is the sole source of information used to maintain formation with the leader. A Kalman filter formulation is provided for the case where image processing may be used to estimate leader motion in the image plane. An additional piece of information, the subtended angle, also available from the computer vision algorithm, is used to improve range estimation accuracy. In situations where subtended angle information is not available, an optimal trajectory is generated that improves range estimation accuracy. Finally, assumptions on the target acceleration are relaxed by augmenting a Kalman filter with an adaptive element.

1. Introduction

The most commonly used set of sensors for accurate state estimation of a flight vehicle is a combination of an Inertial Measurement Unit (IMU) and a Global Positioning System (GPS). In most cases, heading information may be estimated when the vehicle is in motion. A magnetometer may be used if an accurate measurement is required even when the vehicle is not in motion, as on a helicopter. The IMU and magnetometer rely on measuring quantities that are always available. The GPS signal, however, is an external, human-generated signal. Being a low-power signal by its very nature, it is susceptible to jamming. Indeed, GPS simulators are available that can emulate GPS signals within the confines of a research lab. When flight vehicles fly through terrain such as an urban city or enclosed areas, where GPS signals degrade, the accuracy of translational state estimates such as position and velocity also degrades. Thus, there is a desire to develop reliable and affordable navigation alternatives that do not require an external signal.

Birds, insects, or, for that matter, most organisms with the ability to see the visual spectrum are able to navigate (in flight or on the ground) quite reliably using vision information. Highly evolved organisms, especially predators, use a pair of eyes for stereo vision in order to estimate range to targets. In contrast to the IMU, most animals use relatively low-quality mechanical sensor equivalents, but use vision extensively to determine relevant state information. If a human pilot can visually acquire a target, he or she can usually fly to it; hence an automated system should be able to replicate the same behavior. From as early as the year 2000, the UAV Research Facility (UAVRF) at Georgia Tech has been interested in producing navigation solutions using a single vision camera. In this paper, we consider the specific problem of navigating (following) a target object (leader) using a single vision camera. One of the earliest problems addressed was the estimation, control, and guidance of a glider [1, 2] towards a target window opening, after launch, in order to deliver a simple payload. The only sensor available on the glider was a single camera. Subsequently, the Active-Vision Control Systems project, a Multi-University Research Initiative (MURI) program initiated in 2005, produced a large body of theoretical work, simulation, and flight testing.

Vision-based estimation and navigation methods depend greatly on the application at hand. Some early work on landing an unmanned vehicle using vision has been reported in [3], where multiple views of a landing target from a single camera are used to provide motion estimates in order to land an unmanned helicopter on a moving platform. A more recent result [4] uses natural landmarks and a feature-tracking algorithm to provide SLAM solutions for high-altitude UAVs. Both these methods use homography-based techniques to compute vehicle relative translations and rotations. An impressive result from Chiba University and the Universite de Technologie de Compiegne is reported in [5]. Here, optic flow and inertial measurements are used to compute the position and velocity of a vehicle carrying a single monocular camera. An interesting and useful aspect of [5] is the combined analysis of image processing and controller-plant stability results. In many of these works and related references therein, the approach used is strongly influenced by the task at hand. However, if one were to separate the different phases of the estimation problem, it is the image-processing data-reduction tasks that cater most directly to the environment the vehicle operates in. Once raw relative motion data is available, an estimator is generally used to fuse the vision-based data with other sensors to maintain an estimate of the vehicle's state.

The focus of this paper is to present tracking and estimation methods to effectively track an uncooperative target using monocular vision only and fly in formation with it. Even though the leader may have its own navigation solution, this information is not used by the follower aircraft (for practical flight testing purposes, however, in order to set up the initial formation, leader GPS information is indeed used; it is then slowly phased out, with vision information phased in, for leader relative position estimation purposes). The goal is to use monocular vision in order to estimate and track the relative position to the target. In this paper, the terms leader and target are used interchangeably. The leader-follower problem appears in many forms. For example, navigating relative to a known landmark (fixed target) or avoiding an obstacle both involve estimating the target position and maintaining a safe offset from it. This paper will address only the tracking and estimation part of the problem. The particular image processing algorithms used in our application are described in [6].

The overall approach is to use a vision camera to produce a 2-D image plane representation of the scene. This setup is shown in Figure 1. The 2-D image produced is discrete, has a finite resolution determined by the number of pixels, and can only see a subset of the scene because of a limited field of view. This 2-D image is the only source of information on the target that may be measured. If the target undergoes any motion relative to the follower, those changes are readily visible in the follower's 2-D image of the scene, as long as the target is in the field of view. Image processing techniques may be used to track the target's movement on the image plane. Changes in range from camera to target will not be easily observable due to the difficulty of depth perception with monocular vision. The bioinspired solution would be to use stereo vision to improve range estimates. Using just a single camera, the range estimate has to be improved using other techniques. It is known that the target in question has certain dimensions. Image processing techniques may be used to compute the angle that the 2-D image of the target subtends on the image plane (essentially, the size of the blob), which may be used to improve range estimates. If the target is far away, or small, the finite resolution of the 2-D images may prevent this approach. In such cases, the camera's trajectory relative to the target may be modified. Unlike stereo vision, where two perspectives of the scene at the same instant are available, monocular vision can only rely on image sequences and camera motion to improve range estimates. Additionally, sophisticated image processing may be used to leverage any shape or size information that may be available in the 2-D image.

The paper is structured as follows. Section 2 provides a problem statement that illustrates the equations governing the relative dynamics. Image processing techniques are based on [6] and extended to use the Fast Level Set Marching Method [7] for computational speed. Section 3 provides the Extended Kalman Filter formulation for the navigation problem where image processing is able to provide subtended angle information. Simulation and flight test results for this case are provided. Next, the scenario where subtended angle information is not available, due to lack of image processing or when the target is very far away, is considered in Section 4. Here, the problem of minimizing the variance of the range estimation errors is treated as an optimization problem, and the resulting guidance trajectory is compared to the nominal guidance policy. Simulation results are provided. Up to Section 4, the Kalman filter formulation assumes a model for target maneuvering. For example, in Section 3.3 flight test results are described where the target is moving in a circle. This a priori knowledge is used to model the target acceleration characteristics. In order to mitigate the problem of having to assume a target model, Section 5 introduces a Kalman filter augmented with an adaptive element that is able to deal with unmodeled or unknown target accelerations. Simulation results for this case are also provided. In general, the following sections outline the core theory associated with each method and provide simulation and/or flight test results. In each section, the reader is referred to publications where further details may be found. The Extended Kalman Filter (EKF) formulation used in Section 3 is described in the Appendix.

2. Problem Formulation

Let š—š‘£ and š•š‘£ be a vehicleā€™s position and velocity vectors expressed in an inertial frame (denoted by š¹š¼). Suppose that the vehicle dynamics can be modeled as the following simple linear system:Ģ‡š—š‘£(š‘”)=š•š‘£Ģ‡š•(š‘”),š‘£(š‘”)=ššš‘£(š‘”),(1) where ššš‘£ is the vehicleā€™s acceleration input. It is assumed that all the vehicle states are available through its own-ship navigation system. The target dynamics are similarly given byĢ‡š—š‘”(š‘”)=š•š‘”Ģ‡š•(š‘”),š‘”(š‘”)=ššš‘”(š‘”),(2) where š—š‘”, š•š‘”, and ššš‘” are the target's position, velocity and acceleration vectors, respectively. Relative position, velocity and acceleration of the target with respect to the vehicle are defined byš—(š‘”)=š—š‘”(š‘”)āˆ’š—š‘£š•(š‘”),(š‘”)=š•š‘”(š‘”)āˆ’š•š‘£(š‘”),šš(š‘”)=ššš‘”(š‘”)āˆ’ššš‘£(š‘”).(3) Then the relative motion dynamics are formulated as follows:Ģ‡Ģ‡š—(š‘”)=š•(š‘”),š•(š‘”)=šš(š‘”).(4) Since the targetā€™s maneuver is unknown in most cases, we need to apply some model for ššš‘” based on its prior knowledge. For example, the following target model is called the Singer model [8]:Ģ‡ššš‘”(š‘”)=āˆ’š›¼š‘ ššš‘”(š‘”)+š°(š‘”),š›¼š‘ >0,(5) where š°(š‘”) represents zero mean Gaussian noise.

In this problem, for simplicity, it is assumed that a 2-D passive vision sensor is mounted at the center of gravity of the vehicle (this assumption can easily be removed by including the extra rotational dynamics caused by an offset in the relative motion model). We also assume that an image processor is available which is able to detect the target position in an image frame. Let $L_c$ denote a known camera attitude represented by a rotation matrix from the inertial frame $F_I$ to a camera frame, denoted by $F_C$. The camera frame is taken so that the camera's optical axis is aligned with its $X_c$ axis. Then the relative position expressed in $F_C$ is
$$\mathbf{X}_c(t) = L_c(t)\mathbf{X}(t) = \begin{bmatrix} X_c(t) & Y_c(t) & Z_c(t) \end{bmatrix}^T. \tag{6}$$
Assuming the pinhole camera model shown in Figure 1, the target position in the image $\mathbf{x}(t)$ is given by
$$\mathbf{x}(t) = \begin{bmatrix} y(t) \\ z(t) \end{bmatrix} = \frac{f}{X_c(t)} \begin{bmatrix} Y_c(t) \\ Z_c(t) \end{bmatrix}, \tag{7}$$
where $f$ is the focal length of the camera. In this paper, $f = 1$ is used without loss of generality. More target information will be available if the image processor can detect the target's shape, size, contours, or other characteristics in addition to its center position $\mathbf{x}$. The vision-based navigation objective is to estimate the unknown target states from the image processor outputs and the known vehicle and camera states.
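The camera model of (6)-(7) can be sketched in a few lines; the function name `project` and the identity camera attitude in the example are assumptions for illustration only.

```python
import numpy as np

def project(L_c, X_rel, f=1.0):
    # Rotate the inertial relative position into the camera frame (6),
    # then apply the pinhole model (7): x = (f / X_c) * [Y_c, Z_c].
    Xc, Yc, Zc = L_c @ X_rel
    assert Xc > 0, "target must be in front of the camera"
    return f / Xc * np.array([Yc, Zc])

# Camera axes aligned with inertial axes; target 100 ft ahead,
# 5 ft to the side, 2 ft offset on the third axis:
x = project(np.eye(3), np.array([100.0, 5.0, 2.0]))
print(x)  # [0.05 0.02]
```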

3. Vision-Based Formation Flight

In this section, an Extended Kalman Filter (EKF) is designed to estimate the relative state of the leader aircraft with respect to the follower using vision information. More details on the EKF are provided in the Appendix. The EKF is implemented and tested in an image-in-the-loop 6-DoF multiple-UAV flight simulation and in actual flights of UAVs. An own-ship navigation filter and a flight guidance and control system have already been developed and implemented [9]. The image processor used was developed for real-time target tracking by Ha et al. [10, 11]. In flight experiments, closed-loop vision-based formation flights of two UAVs have been successfully achieved [12]. Estimation results obtained in both simulations and flight tests are illustrated in this section.

3.1. Navigation Filter Design
3.1.1. Process Model

In the EKF design for this application, the estimation state vector is taken as
$$\mathbf{x} = \begin{bmatrix} \mathbf{u}^T & \dot{\mathbf{u}}^T & \dfrac{1}{r} & \dfrac{\dot{r}}{r} & b \end{bmatrix}^T, \tag{8}$$
where $\mathbf{u}$ is a unit vector pointing from the follower to the leader and $r$ is the range between the two aircraft. They are defined by
$$\mathbf{u} = \frac{\mathbf{X}}{\|\mathbf{X}\|}, \qquad r = \|\mathbf{X}\|, \tag{9}$$
where $\mathbf{X}$ is the relative position vector. In bearing-only tracking problems, it is very common to use the inverse range $1/r$ instead of the range itself to reduce the nonlinearity associated with its dynamics [13]. Moreover, the unit vector is chosen as an estimation state, instead of the commonly used bearing and elevation angles [13], to avoid a singularity. The state $b$ in (8) is the constant wingspan of the leader airplane. When using the random walk model for the target's velocity, the process model is written as
$$\dot{\mathbf{x}} = \begin{bmatrix} \dot{\mathbf{u}} \\ -\left(\|\dot{\mathbf{u}}\|^2 - \dfrac{1}{r}\,\mathbf{u}\cdot\mathbf{a}_v\right)\mathbf{u} - 2\,\dfrac{\dot{r}}{r}\,\dot{\mathbf{u}} - \dfrac{1}{r}\,\mathbf{a}_v \\ -\dfrac{1}{r}\,\dfrac{\dot{r}}{r} \\ \|\dot{\mathbf{u}}\|^2 - \dfrac{1}{r}\,\mathbf{u}\cdot\mathbf{a}_v - \left(\dfrac{\dot{r}}{r}\right)^2 \\ 0 \end{bmatrix} + \mathbf{w} = \mathbf{f}(\mathbf{x}, \mathbf{a}_v) + \mathbf{w}, \tag{10}$$
where $\mathbf{a}_v$ is the follower's acceleration input and $\mathbf{w}$ is the process noise.
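A direct transcription of the drift term $\mathbf{f}(\mathbf{x}, \mathbf{a}_v)$ in (10) is sketched below; the flat state ordering $[\mathbf{u}, \dot{\mathbf{u}}, 1/r, \dot{r}/r, b]$ and the function name are illustrative assumptions. The numeric check uses a target 100 ft ahead moving 10 ft/s laterally with an unaccelerated follower, for which $\dot{r} = 0$ and $\ddot{r}/r = \|\mathbf{V}\|^2/r^2 = 0.01$.

```python
import numpy as np

def f(x, a_v):
    # Drift term of process model (10) for x = [u, udot, 1/r, rdot/r, b]
    # (this flat ordering is an illustrative assumption).
    u, udot = x[0:3], x[3:6]
    inv_r, rr = x[6], x[7]
    s = np.dot(udot, udot) - inv_r * np.dot(u, a_v)   # ||udot||^2 - (1/r) u.a_v
    u_ddot = -s * u - 2.0 * rr * udot - inv_r * a_v
    # d/dt(1/r) = -(1/r)(rdot/r); d/dt(rdot/r) = s - (rdot/r)^2; bdot = 0.
    return np.concatenate([udot, u_ddot, [-inv_r * rr, s - rr**2, 0.0]])

x = np.r_[1.0, 0, 0,   0, 0.1, 0,   0.01, 0.0, 15.7]
print(f(x, np.zeros(3)))
```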

3.1.2. Measurement Model

It is well known that range information is not always observable when only the 2-D image position of a target center is measured from a single camera [14]. To guarantee range observability, the line-of-sight from the camera to the target needs to be in motion [15]. However, in the formation flight application, the objective is to maintain a constant offset from the leader, leading to poor range estimates. To overcome this observability issue, the angle subtended by the leader's size (e.g., wingspan) is introduced as an additional measurement. The subtended angle $\alpha$ is defined by
$$\alpha = 2\tan^{-1}\frac{b}{2r}. \tag{11}$$
In defining this subtended angle, some assumptions are made regarding the target's orientation with respect to the camera. The most common configuration for pursuit is with the camera directly behind the target. This results in the wingspan being the most convenient parameter to represent the size of the target. Hence, the difference between the camera and target heading angles is assumed to be small. If, for example, the heading of the target is offset 90 degrees from the camera heading, it is the fuselage length that will be visible to any image processing. Additionally, as long as the difference between the camera and target pitch angles is small, there will be no ambiguity between the pitch and roll orientation of the target. The image processor developed in [10] tracks the target's contour in images, and can thus extract the target size. Figure 2 shows an example of the image tracking result. The image processor processes raw images captured by the onboard camera, and then outputs, in image coordinates, the center and the two wing-tips of the leader airplane. Let $\mathbf{x}_c$ be the image coordinate of the center position, and let $\mathbf{x}_l$ and $\mathbf{x}_r$ be those of the left and right wing-tip positions, respectively.
Since a pinhole camera model with $f = 1$ is assumed, the subtended angle measurement can be calculated from $\mathbf{x}_l$ and $\mathbf{x}_r$ as
$$\alpha = 2\tan^{-1}\frac{\|\mathbf{x}_l - \mathbf{x}_r\|}{2}. \tag{12}$$
The measurement vector in the EKF is chosen as
$$\mathbf{z} = \begin{bmatrix} \mathbf{u}_c^T & \alpha \end{bmatrix}^T, \tag{13}$$
where $\mathbf{u}_c$ is the unit vector expressed in the camera frame $F_C$. The unit vector is chosen as a measurement instead of the two angles for the same reason as choosing it as an estimation state. The measurement vector $\mathbf{z}$ can be expressed as a nonlinear function of the estimation state $\mathbf{x}$. The measurement model is given by
$$\mathbf{z}_k = \begin{bmatrix} L_{c_k}\mathbf{u}_k \\ 2\tan^{-1}\left(\dfrac{b_k}{2}\left(\dfrac{1}{r}\right)_k\right) \end{bmatrix} + \boldsymbol{\nu}_k = \mathbf{h}(\mathbf{x}_k) + \boldsymbol{\nu}_k, \tag{14}$$
where $L_{c_k}$ is the known camera attitude at time step $t_k$ and $\boldsymbol{\nu}_k$ is the measurement error. The EKF prediction and update procedures discussed in the Appendix are applied to the process and measurement models given in (10) and (14).
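The measurement map $\mathbf{h}(\mathbf{x})$ of (14) can be sketched as follows, reusing the flat state ordering assumed earlier (an illustrative convention, not the paper's implementation). With the MURI airplane's 15.7 ft wingspan at 100 ft range, the subtended angle is about 0.157 rad.

```python
import numpy as np

def measurement(x, L_c):
    # Measurement model (14): unit vector rotated into the camera frame,
    # plus the subtended angle alpha = 2*atan(b/(2r)) from (11).
    u, inv_r, b = x[0:3], x[6], x[8]
    u_c = L_c @ u
    alpha = 2.0 * np.arctan(0.5 * b * inv_r)
    return np.append(u_c, alpha)

z = measurement(np.r_[1.0, 0, 0, 0, 0.1, 0, 0.01, 0.0, 15.7], np.eye(3))
print(z[3])  # ≈ 0.157 rad
```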

3.1.3. State Constraint Enforcement

Since the unit vector $\mathbf{u}$ defined in (9) always satisfies $\|\mathbf{u}\| = 1$, the estimates of $\mathbf{u}$ and its rate of change $\dot{\mathbf{u}}$ in the state vector used in the EKF design have the following constraints:
$$\|\mathbf{u}\| = 1, \qquad \mathbf{u}^T\dot{\mathbf{u}} = 0. \tag{15}$$
These constraints are naturally satisfied in the EKF prediction procedure (A.3), but not in the EKF update procedure (A.9). Therefore, the constraints (15) need to be enforced after the EKF update at each time step as follows:
$$\hat{\mathbf{u}}_k \leftarrow \frac{\hat{\mathbf{u}}_k}{\|\hat{\mathbf{u}}_k\|}, \qquad \hat{\dot{\mathbf{u}}}_k \leftarrow \hat{\dot{\mathbf{u}}}_k - \left(\hat{\mathbf{u}}_k^T\,\hat{\dot{\mathbf{u}}}_k\right)\hat{\mathbf{u}}_k. \tag{16}$$
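The projection (16) is a two-line operation; this sketch (function name assumed) normalizes the updated unit vector and removes the component of its rate along it.

```python
import numpy as np

def enforce_constraints(u_hat, udot_hat):
    # Constraint enforcement (16): renormalize u, then project udot
    # onto the plane orthogonal to u so that u.udot = 0.
    u_hat = u_hat / np.linalg.norm(u_hat)
    udot_hat = udot_hat - np.dot(u_hat, udot_hat) * u_hat
    return u_hat, udot_hat

u, ud = enforce_constraints(np.array([2.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0]))
print(u, ud)  # [1. 0. 0.] [0. 1. 0.]
```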

Calise [16] proposed another method to enforce the state constraints in the EKF design. In his work, the Kalman gain was computed by minimizing the augmented Lagrangian that includes the estimation error and the state constraint. This results in simply adding the correction term to the original EKF update laws (see (A.8), (A.9)). However, this approach is not used in this work to avoid using a singular measurement error covariance matrix.

3.1.4. Leaderā€™s State Estimation

Once the estimation state is updated by using a measurement at $t_k$ in the EKF, the leader's estimated position and velocity can be calculated from the updated estimate $\hat{\mathbf{x}}_k$ and from the known follower's state as shown below:
$$\hat{\mathbf{X}}_{t_k} = \frac{\hat{\mathbf{u}}_k}{\left(\widehat{1/r}\right)_k} + \mathbf{X}_{v_k}, \qquad \hat{\mathbf{V}}_{t_k} = \frac{\hat{\dot{\mathbf{u}}}_k}{\left(\widehat{1/r}\right)_k} + \frac{\hat{\mathbf{u}}_k}{\left(\widehat{1/r}\right)_k}\left(\widehat{\frac{\dot{r}}{r}}\right)_k + \mathbf{V}_{v_k}. \tag{17}$$
Since a nonaccelerating target is assumed in the EKF design, the leader's acceleration estimate is $\hat{\mathbf{a}}_{t_k} = \mathbf{0}$. These estimated leader states are fed back to the guidance system to maintain the formation.
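Equation (17) follows from $\mathbf{X} = r\mathbf{u}$ and $\mathbf{V} = \dot{r}\mathbf{u} + r\dot{\mathbf{u}}$, rewritten in terms of the estimated states $1/r$ and $\dot{r}/r$. A sketch (names assumed):

```python
import numpy as np

def leader_state(u, udot, inv_r, rdot_over_r, X_v, V_v):
    # Equation (17): recover leader position and velocity from the
    # estimated relative state and the follower's own navigation solution.
    X_t = u / inv_r + X_v
    V_t = udot / inv_r + (u / inv_r) * rdot_over_r + V_v
    return X_t, V_t

# Target 100 ft ahead, pure lateral relative motion of 10 ft/s:
X_t, V_t = leader_state(np.array([1.0, 0, 0]), np.array([0, 0.1, 0]),
                        0.01, 0.0, np.zeros(3), np.zeros(3))
print(X_t, V_t)  # [100. 0. 0.] [0. 10. 0.]
```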

3.2. Simulation Results

Figure 3 shows a 6-DoF multiairplane flight simulation in a formation flight configuration (Figure 3(a)) and the airplane model used in this simulation (Figure 3(b)). The airplane is the MURI research airplane with a wingspan of 15.7 (ft) and an overall length of 11 (ft). Flight control is achieved using an adaptive neural-network-based controller that generates actuator signals based on the navigation system outputs and commanded position, velocity, and attitude [17]. The follower airplane has a camera whose position, orientation, and field of view generate simulated images. The synthetic images are processed to provide the locations of the leader's center and the two wing-tips. The synthetic image and the image processor outputs are shown in the bottom-left window of the simulation interface in Figure 3.

In simulation, the leader airplane flies straight at a constant speed of 100 (ft/sec). The follower aircraft is commanded to maintain a given position relative to the leader. The formation is maintained by using the estimation result. Figure 4 compares the estimated (in blue) and the true (in green) relative positions and velocities. For example, at the start of the time scale (20 s) in Figure 4, the follower is commanded to keep the leader 100 ft ahead ($X = 100$ ft), stay directly behind ($Y = 0$ ft), and fly at the same altitude ($Z = 0$ ft) as the leader.

The vision-based estimation results match the corresponding true states well. There is a bias in the range ($X$ position) estimate, that is, a difference between the two solid lines. Even though the range estimation accuracy improves after the vehicle's lateral motion due to the relative position command changes, a small bias remains. This is due to a measurement bias in the leader's size. However, the estimation is sufficiently accurate to achieve closed-loop vision-based formation flight. Formation breakup happens when the image processor can no longer track the target. Depending on the image processor, this happens when the target leaves or partially leaves the visual field of the camera. This may occur if the $X$ estimates are inaccurate enough that the follower overshoots the target, or lags so far behind it that the image processor is no longer able to segment the target contours. The higher likelihood of formation breakup is in the orthogonal directions, where errors in the $Y$ and $Z$ relative position estimates are large enough that the target leaves the camera's field of view. Following validation of the EKF-based estimation performance in simulation, flight tests were conducted; these are discussed in the next section.

3.3. Flight Test Results
3.3.1. Platform Aircraft

The simulation-tested algorithms from the previous sections were implemented on two different types of UAVs: a rotary-wing aircraft, the GTMax, and a fixed-wing aircraft, the GTYak. Initially, the GTMax helicopter (based on Yamaha's RMax helicopter) was used as the follower aircraft. In later tests, the fixed-wing GTYak (a scale model of the Yak-54) was deployed as the follower. For the vision-based guidance and control tests, the UAVs were equipped with cameras fixed to their bodies. The real-time image processor was implemented and executed within the flight computers of the GTMax and the GTYak. The GTEdge fixed-wing aircraft was deployed as the leader for all tests. The GTEdge is a 33% scale model of the aerobatic Edge 540T airplane and is highly maneuverable, with vertical hover capability [18]. All three aircraft, fixed-wing and rotary-wing alike, use the same adaptive trajectory tracking controller described in Johnson and Kannan [17] for flight control. Table 1 summarizes the physical specifications of the three UAVs used in the flight tests.

3.3.2. EKF Modification for Circular Target Maneuvers

During flight tests, the GTEdge was commanded to fly in a circle with a constant ground speed $V_t = 65$ (ft/s) and a constant angular rate $\omega_t = 0.1$ (rad/s). Unlike the straight level flight case used in the simulations, a circling motion was chosen because of the limited flight test field size. Since the leader's motion does not satisfy the zero-acceleration assumption, the EKF design discussed in Section 3.1 was slightly modified. Suppose that it is known that the leader is flying in a circle at a constant altitude. In the modified EKF design, the leader's lateral acceleration $a_{\text{lat}}$ is added as an extra estimation state of the EKF. Since $a_{\text{lat}}$ should be constant in a circling motion, its dynamics can be modeled by
$$\dot{a}_{\text{lat}} = 0 + w_{a_{\text{lat}}}, \tag{18}$$
where $w_{a_{\text{lat}}}$ is zero-mean Gaussian noise. By assuming that the leader's lateral acceleration is perpendicular to the velocity vector in the horizontal $X$-$Y$ plane, the leader's acceleration is estimated by
$$\mathbf{a}_t = \frac{a_{\text{lat}}}{\sqrt{V_{X_t}^2 + V_{Y_t}^2}} \begin{bmatrix} -V_{Y_t} \\ V_{X_t} \\ 0 \end{bmatrix}, \tag{19}$$
where the leader's velocity vector $\mathbf{V}_t = [V_{X_t}\ V_{Y_t}\ V_{Z_t}]^T$ is given by (17). In the process model (10), $\mathbf{a}_v$ should be replaced by $\mathbf{a}_v - \mathbf{a}_t$.
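Equation (19) rotates the horizontal velocity by 90 degrees and scales it so the acceleration magnitude equals $|a_{\text{lat}}|$. A sketch (function name assumed), checked against the flight-test numbers, where circling at $V_t = 65$ ft/s and $\omega_t = 0.1$ rad/s implies a centripetal acceleration $V_t\,\omega_t = 6.5$ ft/s$^2$:

```python
import numpy as np

def leader_accel(a_lat, V_t):
    # Equation (19): lateral acceleration perpendicular to the leader's
    # horizontal velocity, confined to the X-Y plane.
    VX, VY = V_t[0], V_t[1]
    n = np.hypot(VX, VY)       # horizontal speed
    return a_lat / n * np.array([-VY, VX, 0.0])

a = leader_accel(6.5, np.array([65.0, 0.0, 0.0]))
```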

3.3.3. Formation Flight Results

The first sustained closed-loop vision-based formation flight between the GTMax and the GTEdge was achieved in June 2006. This may have been the first time automated formation flight based on vision alone was ever accomplished. Results pertaining to this particular flight test between a rotary-wing and a fixed-wing aircraft are available in [12, 19].

Following these tests, it was concluded that a follower vehicle comparable in performance to the leader would allow flights at higher speeds and accommodate more maneuvering. Hence the GTYak was chosen to act as the follower in subsequent experiments. The closed-loop vision-based formation flight between two fixed-wing airplanes, the GTYak and the GTEdge, was achieved in July 2007. In this flight test, the GTEdge flew in a circle with $V_t = 75$ (ft/sec) and $\omega_t = 0.1$ (rad/sec). Since the GTYak has more maneuverability than the GTMax, a higher speed was used. The GTYak was commanded to hold a relative position of $\mathbf{X}_{\text{com}} = [80\ \ 0\ -20]^T$ (ft) from the leader. Figure 5 shows the GTYak and the GTEdge flying in formation. The image processor keeps track of every time it loses lock on the target, even briefly. This is shown in Figure 6(a) along with the image processor outputs. Although the image processor sometimes failed to capture the leader airplane even though it was in view, it was able to recover and lock onto the leader after several frames without lock. The right plot of Figure 6 shows estimated and true relative positions. Figure 7 compares the vision-based (solid blue) and GPS-INS-based (dashed green) estimates of velocity and acceleration. The vision-based estimates agree well with the GPS-INS solutions and are sufficient to maintain formation.

4. Optimal Trajectory for Range Observability

There exist cases where it may not be possible to reliably measure the subtended angle $\alpha$ in (12). The monocular cameras used for vision have finite resolution. At large values of the range $r$ from the follower to the target, the image of the target on the image plane will not have enough pixel width to reliably compute subtended angle information. At these large ranges, $\mathbf{x}_l$ and $\mathbf{x}_r$ are close to each other, resulting in a large variance in the $\alpha$ measurements. Another situation where $\alpha$ may not be available is when sufficient processing power is not available to compute it. In either case, the inability to measure the subtended angle results in poor estimates of the range $r$. This then becomes a "Bearings-Only" [13] tracking problem where we want to estimate depth from image sequences [20]. Keeping in mind that we have used the unit vector $\mathbf{u}$ to parameterize the bearing information and avoid singularities, we are interested in improving the accuracy of the range estimate.

A rigorous evaluation of the observability of depth information may be carried out using the extended-output-Jacobian-based observability condition put forth in the works of Mariottini [21, 22]. Some of the motivation for using a nonlinear observability condition arises from the geometry of the problem being nonlinear, where a linearization of the equations is unobservable [23]. Intuitively, however, the bearing rate to the target, $\dot{\beta}$, has to be nonzero in order for the range to be observable. We can come to the same conclusion by examining $P_k$, the error covariance matrix of our Kalman filter implementation described in the Appendix. The objective is to maximize $P_k^{-1}$, which may be achieved by maximizing the term corresponding to the range estimation error in $H_k^T R_k^{-1} H_k$. This particular term, if expanded, is given by
$$\left(\frac{\hat{u}^-_{y_k}}{\sigma_{\xi_y}}\right)^2 + \left(\frac{\hat{u}^-_{z_k}}{\sigma_{\xi_z}}\right)^2. \tag{20}$$
We also desire to limit the magnitude of control in the $y$ and $z$ directions, that is, to minimize $a_y, a_z$. The optimization problem may then be stated as
$$\min_{a_y, a_z} \; \frac{1}{2}\int_0^{t_f} \left\{ -\left(\frac{\hat{u}^-_{y_k}}{\sigma_{\xi_y}}\right)^2 - \left(\frac{\hat{u}^-_{z_k}}{\sigma_{\xi_z}}\right)^2 + K_y^2 a_y^2 + K_z^2 a_z^2 \right\} dt \tag{21}$$
subject to the camera dynamics given by (1), since the camera is attached to the vehicle; here we assume it is not rotating with the follower's body frame. Here, $t_f$ represents the terminal time when the vehicle reaches the destination.

In order to derive an analytical solution to the optimization, the following assumptions are made.
(i) The camera motion is limited to the $X$-$Y$ plane, and the $Z$ position is held constant. This allows $\hat{u}^-_{z_k}$ to be assumed constant and $a_z = 0$. Thus the second and fourth terms of the performance index in (21) may be eliminated.
(ii) To further simplify the problem, $\dot{u}_x$ may be assumed constant and $a_x = 0$.

This reduces the problem to having just two states, $u_y$ and $\dot{u}_y$. The boundary conditions are given by the initial and terminal states:
$$u_y(0) = u_{y_0}, \quad \dot{u}_y(0) = \dot{u}_{y_0}, \quad u_y(t_f) = u_{y_f}, \quad \dot{u}_y(t_f) = \dot{u}_{y_f}. \tag{22}$$

The analytical solution for this problem may be obtained by solving the Euler-Lagrange equations [24]. With the Hamiltonian defined as
$$H = -\frac{1}{2}\frac{u_y^2}{\sigma_{\xi_y}^2} + \frac{1}{2}K_y^2 a_y^2 - \lambda_1 \dot{u}_y + \lambda_2 a_y, \tag{23}$$
the resulting Euler-Lagrange equations may be expressed as
$$\dot{\lambda}_1 = -\frac{\partial H}{\partial u_y} = \frac{u_y}{\sigma_{\xi_y}^2}, \qquad \dot{\lambda}_2 = -\frac{\partial H}{\partial \dot{u}_y} = \lambda_1, \qquad \frac{\partial H}{\partial a_y} = K_y^2 a_y + \lambda_2 = 0, \tag{24}$$
where $\lambda_1$ and $\lambda_2$ are Lagrange multipliers. A differential equation for the follower's lateral acceleration $a_y$ can be derived and is given by
$$\frac{d^4 a_y}{dt^4} = \frac{1}{\left(\sigma_{\xi_y}^2 K_y\right)^2}\, a_y. \tag{25}$$
The solutions for $u_y(t)$, $\dot{u}_y(t)$, and $a_y(t)$ are given by
$$u_y(t) = -\frac{\sigma_{\xi_y}}{K_y}\left(A\sin\omega t + B\cos\omega t - Ce^{\omega t} - De^{-\omega t}\right),$$
$$\dot{u}_y(t) = \frac{\sigma_{\xi_y}\sqrt{K_y}}{K_y^2}\left(A\cos\omega t - B\sin\omega t - Ce^{\omega t} + De^{-\omega t}\right),$$
$$a_y(t) = -\frac{1}{K_y^2}\left(A\sin\omega t + B\cos\omega t + Ce^{\omega t} + De^{-\omega t}\right), \tag{26}$$
where the constants $A, B, C, D$ may be found by satisfying the boundary conditions given by (22), and the parameter $\omega$ is given by
$$\omega = \frac{1}{\sqrt{K_y}\,\sigma_{\xi_y}}. \tag{27}$$
Small values of $K_y$ give larger lateral maneuvers, resulting in better estimates of the range information. In practical applications, too large a lateral maneuver can quickly move the target out of the view of the camera. If the target is so far away that the subtended angle cannot be used, small values of $K_y$ are acceptable, because a larger lateral deviation will not necessarily move the object out of the camera's field of view. However, if the target is close and the image processor cannot provide the subtended angle, $K_y$ will have to be increased as the range decreases.
In general, we want to keep $K_y$ as high as possible, resulting in less maneuvering by the follower. Thus, the variance of the range estimation error may be used to schedule $K_y$, decreasing $K_y$ as the variance increases.
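The solution family in (26) is built from sinusoids and growing/decaying exponentials at the single frequency $\omega$, because (25) is a linear fourth-order ODE with characteristic roots $\pm\omega$ and $\pm i\omega$. The check below verifies numerically (by a fourth central difference) that any such combination satisfies $d^4 a_y/dt^4 = \omega^4 a_y$; the values of $\omega$, $A$, $B$, $C$, $D$, $t$, and the step size are arbitrary test choices, not quantities from the paper.

```python
import numpy as np

def a_y(t, A, B, C, D, omega):
    # General solution family of (25): sin, cos, and exponentials
    # at the common frequency omega.
    return (A * np.sin(omega * t) + B * np.cos(omega * t)
            + C * np.exp(omega * t) + D * np.exp(-omega * t))

omega, h, t = 0.7, 1e-2, 1.3
A, B, C, D = 0.3, -1.1, 0.2, 0.5
# Fourth central difference: stencil [1, -4, 6, -4, 1] / h^4.
s = [a_y(t + k * h, A, B, C, D, omega) for k in range(-2, 3)]
d4 = (s[0] - 4 * s[1] + 6 * s[2] - 4 * s[3] + s[4]) / h**4
residual = abs(d4 - omega**4 * a_y(t, A, B, C, D, omega))
print(residual)  # small
```

In practice the four constants would be fixed by imposing the boundary conditions (22) on $u_y$ and $\dot{u}_y$, which is a 4-by-4 linear solve.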

Figure 8 shows the estimation and guidance results with and without the optimal guidance policy. In both cases, a step command is issued in the $x$-axis and $y$-axis, from an initial commanded relative position of [100, 10, 0] (ft) to [50, 0, 0] (ft). In Figure 8(b) the guidance policy is turned on only from $t = 20$ s to $t = 60$ s. The $y$-axis relative position plot in Figure 8(b) shows the meandering lateral path generated by the optimal policy. In Figure 8(a), the lack of an optimal policy results in a large 20 ft steady-state error between the true and estimated states of the aircraft along the $x$-axis. This estimation error results in a large error between the command and the true state. In contrast, Figure 8(b) shows smaller estimation errors and hence smaller command tracking errors. Finally, Figure 8(c) shows a plot of the variance of the estimate of $d = 1/X$ and conclusively illustrates the advantages of the optimal policy in reducing estimation error.

5. Adaptive State Estimation

One of the major challenges in target tracking arises from target motion uncertainty: an accurate dynamic model of the target being tracked is generally not available to the tracker. In addition, any measurements of the target are corrupted by noise and time delays. A Kalman filter or one of its several variants is usually used as the target state estimator, but its performance may be seriously degraded unless the estimation error due to unknown target maneuvers and other uncertainties is accounted for. Some of the most popular approaches to handling unknown target maneuvers in the target state estimator design are based on so-called model-based filtering techniques [25]. The models may (1) approximate the nonrandom target maneuver as a random process with certain properties, or (2) describe typical target trajectories by some representative motion models. Examples of the former include the simple white-noise acceleration model [26], the slightly more sophisticated Markov-process-based Singer model [8], and the more complex interacting multiple model technique [27]. A comprehensive survey of models may be found in Li and Jilkov [25]. When there is some a priori knowledge of the target maneuver, for example, if it is known that the target is maneuvering in a circle, circular motion models like (19) can be used for designing the target state estimator. In general, for the model-based approaches to target state estimation, filter performance may not be satisfactory when the target maneuver does not comply with the model, and every approach can be defeated by a suitably chosen target maneuver. Neural Network- (NN-) based adaptive estimation and filtering techniques have been proposed to compensate for the modeling errors that arise due to nonlinearities and unmodeled dynamics [28, 29].
In these approaches, a nominal time-varying estimator is augmented with the output of an adaptive NN that is trained online, using the residuals of the nominal estimator and delayed values of available system measurements as inputs.
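To make the model-based alternative concrete, the simplest option listed above, the white-noise acceleration (nearly-constant-velocity) model [26], may be sketched for one axis as follows; the process-noise intensity `q` is a tuning parameter, not a value from the paper.

```python
import numpy as np

def cv_model(dt, q):
    """Discrete nearly-constant-velocity model for one axis.

    State is [position, velocity]; the target acceleration is treated as
    zero-mean white noise with power spectral density q, which yields the
    standard transition matrix F and process-noise covariance Q."""
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = q * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                      [dt**2 / 2.0, dt        ]])
    return F, Q
```

A Kalman filter built on `F` and `Q` tracks targets well only while their maneuvers resemble this random-acceleration assumption, which is precisely the limitation the adaptive approach below removes.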

Using an adaptive approach can eliminate the need for a priori knowledge of the target maneuver, such as that given by (5) and (19).

5.1. Problem Formulation

Consider the following bounded single-input single-output (SISO) nonlinear system:

$$ \dot{\mathbf{x}}(t) = A\mathbf{x}(t) + B\,g(\mathbf{x}(t),\mathbf{z}(t)), \quad (28) $$
$$ \dot{\mathbf{z}}(t) = \mathbf{f}_z(\mathbf{x}(t),\mathbf{z}(t)), \quad (29) $$
$$ y(t) = C\mathbf{x}(t), \quad (30) $$

with initial conditions $\mathbf{x}(0)=\mathbf{x}_0$ and $\mathbf{z}(0)=\mathbf{z}_0$. Here $\mathbf{x}\in D_x\subseteq\mathbb{R}^{n_x}$ and $\mathbf{z}\in D_z\subseteq\mathbb{R}^{n_z}$ are the states of the system: $\mathbf{x}$ represents the states that have been modeled, and $\mathbf{z}$ represents unknown/unmodeled states. $D_x$ and $D_z$ are compact sets. $\mathbf{f}_z(\mathbf{x},\mathbf{z}):\mathbb{R}^{n_x}\times\mathbb{R}^{n_z}\to\mathbb{R}^{n_z}$ is unknown but bounded and represents the unmodeled dynamics. $g(\mathbf{x},\mathbf{z}):\mathbb{R}^{n_x}\times\mathbb{R}^{n_z}\to\mathbb{R}$ is unknown but uniformly bounded and continuous. It is essentially through $g(\mathbf{x},\mathbf{z})$ that the unmodeled dynamics $\mathbf{f}_z$ couple into the system dynamics. The dimension $n_z$ of the $\mathbf{z}$ dynamics need not be known, but is assumed bounded. The output is $y\in\mathbb{R}$. The matrices $(A,B,C)$ are known, and the pair $(A,C)$ is assumed to be observable.

The function $g(\mathbf{x},\mathbf{z})$ acts as an unknown system input or disturbance to the nominal linear system given by the matrices $(A,B,C)$. In the context of target tracking, $\mathbf{f}_z(\mathbf{x},\mathbf{z})$ represents unknown/unmodeled target dynamics. The objective is to design a state estimator that estimates the states $\mathbf{x}$ of the system in (28) with bounded estimation error in the presence of the unknown system input $g(\mathbf{x},\mathbf{z})$. We start by approximating $g(\mathbf{x},\mathbf{z})$ using a Neural Network parametrization given by

$$ g(\mathbf{x},\mathbf{z}) = \mathbf{W}^T\boldsymbol{\sigma}(\boldsymbol{\mu}) + \epsilon(\boldsymbol{\mu}), \quad (31) $$

where $\mathbf{W}$ is the ideal weight matrix that results in the function approximation error $\epsilon$. We assume that $\|\mathbf{W}\|_F\le W^*$, $\|\epsilon(\boldsymbol{\mu})\|\le\epsilon^*$, and $\|\boldsymbol{\mu}\|\le\mu^*$. $\boldsymbol{\sigma}(\boldsymbol{\mu})=\left[\sigma_1(\boldsymbol{\mu}),\ldots,\sigma_N(\boldsymbol{\mu})\right]^T$ is a vector of sigmoidal functions; $W^*$ is a bound on the Frobenius norm of the ideal, unknown weights $\mathbf{W}$; $\epsilon^*$ is a bound on the Neural Network function approximation error $\epsilon$; and $N$ is the number of neurons. The input vector $\boldsymbol{\mu}$ is given by

$$ \boldsymbol{\mu}(y(t),d) = \left[1,\;\Delta_d^{(0)}y^T(t)\;\cdots\;\Delta_d^{(n-1)}y^T(t)\right]^T, \quad (32) $$

where

$$ \Delta_d^{(0)}y^T(t) = y^T(t), \qquad \Delta_d^{(k)}y^T(t) = \frac{\Delta_d^{(k-1)}y^T(t)-\Delta_d^{(k-1)}y^T(t-d)}{d}, \quad k=1,2,\ldots, \quad (33) $$

and $d>0$ represents a time delay. The sigmoidal functions are smooth and uniformly bounded, that is, $|\sigma_i(\boldsymbol{\mu})|\le 1$. The set of equations beginning with (31) essentially states that a continuous function of the states $\mathbf{x},\mathbf{z}$ of (28) can be approximated to an arbitrary degree of accuracy by a Neural Network over a compact domain, using inputs $\boldsymbol{\mu}(y(t),d)$ that are a finite sample of the output ($y$) history of the system.
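The delayed-difference input of (32)-(33) is straightforward to build from sampled output history. The sketch below is illustrative only: the recursion follows (33), while the inner-layer weight matrix `V` used to form the sigmoid vector is an assumption of this sketch, since the network's inner structure is not specified here.

```python
import numpy as np

def nn_input(y_hist, d):
    """Build mu(y(t), d) per (32)-(33) for scalar output samples.

    y_hist lists the output at delay multiples, newest first:
    [y(t), y(t - d), y(t - 2d), ...]. Returns [1, D^(0)y, ..., D^(n-1)y]."""
    diffs = list(y_hist)
    mu = [1.0, diffs[0]]                      # bias term and zeroth difference
    while len(diffs) > 1:
        # one pass raises the divided-difference order by one, per (33)
        diffs = [(diffs[i] - diffs[i + 1]) / d for i in range(len(diffs) - 1)]
        mu.append(diffs[0])
    return np.array(mu)

def sigmoid_vector(mu, V):
    """sigma(V^T mu): N bounded sigmoidal activations; V is hypothetical."""
    return 1.0 / (1.0 + np.exp(-V.T @ mu))
```

Each entry of the returned activation vector lies in (0, 1), satisfying the uniform bound $|\sigma_i(\boldsymbol{\mu})|\le 1$ assumed above.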

5.2. Adaptive Element Augmented Kalman Filter

One may design a time-varying filter to estimate the states of (28) as follows:

$$ \dot{\hat{\mathbf{x}}}(t) = A\hat{\mathbf{x}}(t) + K(t)\left(y(t)-\hat{y}(t)\right) + B\nu_{ad}, \qquad \hat{\mathbf{x}}(0)=\hat{\mathbf{x}}_0, \qquad \hat{y}(t) = C\hat{\mathbf{x}}(t), \quad (34) $$

where the Kalman gain $K(t)$ may be obtained through the following matrix differential Riccati equation [30]:

$$ \dot{P}(t) = AP(t) + P(t)A^T - P(t)C^TR^{-1}CP(t) + Q, \qquad K(t) = P(t)C^TR^{-1}, \quad (35) $$

where $P(0)=P_0>0$, $Q=Q^T>0$, and $R=R^T>0$ are design constants. The solution $P(t)$ is bounded, symmetric, positive definite, and continuously differentiable. In (34), the term $B\nu_{ad}$ represents the Neural Network augmentation to the Kalman Filter. $\nu_{ad}$ is the output of the Neural Network and is given by

$$ \nu_{ad} = \widehat{\mathbf{W}}(t)^T\boldsymbol{\sigma}(\boldsymbol{\mu}), \quad (36) $$

where $\widehat{\mathbf{W}}(t)$ is the estimate of the weight vector $\mathbf{W}$, and $\nu_{ad}$ is designed to approximate the bounded disturbance $g(\mathbf{x},\mathbf{z})$. The difference between the output $y(t)$ and its estimate $\hat{y}(t)$, that is, the residual signal $\tilde{y}(t)=y(t)-\hat{y}(t)$, may be used as a training signal for the Neural Network. A second training signal is usually derived to improve the effectiveness of the NN adaptation; further details on the second training signal and the Neural Network weight update laws may be found in [28]. Figure 9 illustrates a block diagram of the system. The TDL block represents the tapped delay line given by (32).
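One integration step of (34)-(35) may be sketched as follows, assuming forward-Euler integration of both the state estimate and the Riccati equation (the integration scheme is not specified here) and treating the NN output $\nu_{ad}$ as an externally supplied scalar:

```python
import numpy as np

def filter_step(A, B, C, Q, R, x_hat, P, y, nu_ad, dt):
    """One Euler step of the NN-augmented estimator (34)-(35).

    The gain K(t) = P C^T R^{-1} comes from the differential Riccati
    equation, and the adaptive output nu_ad enters through the same
    input channel B as the unknown disturbance g(x, z)."""
    Rinv = np.linalg.inv(R)
    K = P @ C.T @ Rinv
    y_hat = C @ x_hat
    x_dot = A @ x_hat + K @ (y - y_hat) + B.flatten() * float(nu_ad)
    P_dot = A @ P + P @ A.T - P @ C.T @ Rinv @ C @ P + Q
    return x_hat + dt * x_dot, P + dt * P_dot
```

When $\nu_{ad}$ matches the true unknown input $g(\mathbf{x},\mathbf{z})$, the correction term cancels the disturbance and the estimator behaves as a standard Kalman-Bucy filter on the nominal linear dynamics.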

5.3. Simulation Results

The detailed equations for this particular formulation of the estimation problem for target tracking may be found in [29]; the results, however, are presented here. In contrast to using an estimate of the target acceleration such as that given by (19), with the adaptive element augmentation the target acceleration $\mathbf{a}_t$ need not be assumed a priori; it is simply treated as unmodeled dynamics $\mathbf{f}_z(\mathbf{x},\mathbf{z})$, which results in an unknown input $g(\mathbf{x},\mathbf{z})$ that the adaptive element can approximate.

In this scenario, the leader aircraft flies a circle in the horizontal plane at a constant heading rate. The follower aircraft is tasked with maintaining specified separation distances along the $x$, $y$, and $z$ axes of the follower body-fixed frame. The follower is first put into the desired formation using only GPS-communicated data of the leader's inertial position, velocity, and acceleration. The leader GPS data is communicated at about 5 Hz and is filtered to produce leader state estimates at the rate required (50 Hz) by the follower aircraft guidance and flight control algorithms. Once the leader aircraft is at the desired separation distance, the image processing and target state estimation algorithms are switched on. The update rate of the image processing in simulation is 10 Hz. The estimates of the leader position, velocity, and acceleration from the vision-based target state estimator are blended with the corresponding GPS estimates to produce the leader state estimates used in the guidance and flight control algorithms for formation keeping. The formation separation commands for the results shown below are $[d_x, d_y, d_z]_{\text{commanded}} = [60, 15, 10]$ ft.

Figure 10(a) shows various flags used during the simulation and indicates how this scenario was tested. The IP flag, if 1, indicates that the image processing system returned a valid result; if 0, it generally implies that the target was not in view. The use of vision information in the navigation solution was slowly blended with the information available from the leader's navigation solution, which is communicated to the follower. Hence a useVision value of zero indicates a fully cooperative leader, and a value of one indicates a completely uncooperative target. The NN switch flag indicates whether the adaptive element is active. At $t = 140$ s, the vision estimates are blended in (at 50%), and the reliance on vision information is slowly increased until, at about $t = 290$ s, only vision information is used to estimate the leader's position, velocity, and acceleration. Figure 10(b) illustrates the ability of the follower to maintain formation at the commanded $[d_x, d_y, d_z]$ throughout this process. Figures 10(c) and 10(d) show the estimates of the leader's velocity and the errors in those estimates, respectively. Similarly, Figures 10(e) and 10(f) show the estimates of the leader's acceleration and the errors in those estimates, respectively. In general, the ability of the follower to track the desired offset command is sufficient to keep formation within 15 ft in each axis.

Note that an initially cooperative target, that is, use of the leader's GPS position (useVision = 0), is assumed only in order to position the leader within the follower's field of view before the experiment begins. This assumption does not affect the ability of the Neural Network to improve tracking results. Other sources, such as external radar-based information, may be used in a real situation where the target is truly uncooperative.

In order to demonstrate the effect of adaptation on this problem, the adaptive element was switched off, leaving just the Kalman filter to estimate the leader's position and velocity. In this case, only the flags (Figure 11(a)) and the range estimation error (Figure 11(b)) are shown. Even at a low blending of vision information (20%), the IP flag temporarily goes to zero, indicating a loss of vision tracking. As the vision information is further blended in, the leader aircraft drifts out of the field of view, indicated by the repeated zeroing of the IP flag, eventually resulting in divergence of the range estimate. This clearly indicates the need for an adaptive element when no a priori information on target maneuvers is assumed.

6. Conclusions

It is shown that even though range observability is problematic when using monocular vision, especially when target subtended angle information is not available, it is practical to generate follower maneuvers that improve range observability. A combination of flight test and simulation results is presented for vision-based tracking of an uncooperative target when subtended angle information is available. Initially, a target maneuver model reflecting the circular maneuvers performed in simulation and flight tests is assumed to be available. The achievement of closed-loop formation flight verified the estimation performance of the vision-based relative navigation filter design. When the requirement for a target maneuver model is removed, the lack of a model was found to be detrimental to leader state estimation performance, and formation flight was not possible. By treating the target maneuver as unmodeled dynamics, an adaptive element was augmented with a Kalman Filter, allowing the leader states to be estimated. The adaptive element was found to be critical in maintaining formation geometry.

Appendix

Extended Kalman Filter

Since the 2-D vision-based measurement (7) is a nonlinear function of the 3-D relative state, an extended Kalman filter (EKF) is applied to estimate the relative states from the measurement. The Kalman filter is a recursive least-squares solution to the linear filtering problem [31]. Since the filter was introduced by Kalman in 1960, it has been the subject of extensive research and application, particularly in the area of autonomous navigation [32]. The EKF extends the standard linear Kalman filter to nonlinear systems by linearizing the system about the predicted estimate at each time step [30, 33, 34]. Even though convergence of its estimate cannot be proven theoretically, good estimation performance of the EKF has been demonstrated in many practical applications.

A general formulation of the EKF is presented in this section. Consider the following nonlinear system:

$$ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t),\mathbf{u}(t)) + \mathbf{w}(t), \quad (\text{A.1}) $$
$$ \mathbf{z}_k = \mathbf{h}(\mathbf{x}_k) + \boldsymbol{\nu}_k, \quad (\text{A.2}) $$

where $\mathbf{x}$ is a state vector, $\mathbf{u}$ is a system input, $\mathbf{w}$ is a random zero-mean process noise, $\mathbf{z}_k$ is a measurement at time step $k$, and $\boldsymbol{\nu}_k$ is a discrete measurement noise. Let $\hat{\mathbf{x}}_k^-$ and $\hat{\mathbf{x}}_k$ be the predicted and updated estimates of $\mathbf{x}(t_k)$, and let $P_k^-$ and $P_k$ be their estimated error covariance matrices. A state estimate at $t_{k-1}$ is propagated to the next time step $t_k$ through a first-order Euler integration of the original nonlinear system (A.1):

$$ \hat{\mathbf{x}}_k^- \simeq \hat{\mathbf{x}}_{k-1} + \mathbf{f}\left(\hat{\mathbf{x}}_{k-1},\mathbf{u}_{k-1}\right)\left(t_k-t_{k-1}\right). \quad (\text{A.3}) $$

Then the system dynamics and the measurement model ((A.1)-(A.2)) are linearized about $\hat{\mathbf{x}}_k^-$ and discretized as follows:

$$ \mathbf{x}_k = \Phi_k\mathbf{x}_{k-1} + \Gamma_k\mathbf{u}_{k-1} + \mathbf{w}_{k-1}, \qquad \mathbf{z}_k = H_k\mathbf{x}_k + \boldsymbol{\nu}_k, \quad (\text{A.4}) $$

where

$$ \Phi_k = e^{F_k(t_k-t_{k-1})}, \qquad \Gamma_k = \int_{t_{k-1}}^{t_k} e^{F_k(t_k-s)}G_k\,ds, \qquad \mathbf{w}_{k-1} = \int_{t_{k-1}}^{t_k} e^{F_k(t_k-s)}\mathbf{w}(s)\,ds, $$
$$ F_k = \left.\frac{\partial\mathbf{f}(\mathbf{x},\mathbf{u})}{\partial\mathbf{x}}\right|_{\mathbf{x}=\hat{\mathbf{x}}_k^-,\;\mathbf{u}=\mathbf{u}_{k-1}}, \qquad G_k = \left.\frac{\partial\mathbf{f}(\mathbf{x},\mathbf{u})}{\partial\mathbf{u}}\right|_{\mathbf{x}=\hat{\mathbf{x}}_k^-,\;\mathbf{u}=\mathbf{u}_{k-1}}, \qquad H_k = \left.\frac{\partial\mathbf{h}(\mathbf{x}_k)}{\partial\mathbf{x}_k}\right|_{\mathbf{x}_k=\hat{\mathbf{x}}_k^-}. \quad (\text{A.5}) $$

Now the linear discrete Kalman filtering algorithm can be applied to the linearized system (A.4). The predicted and updated error covariance matrices and the Kalman gain are calculated by

$$ P_k^- = \Phi_k P_{k-1}\Phi_k^T + Q_k, \quad (\text{A.6}) $$
$$ K_k = P_k^- H_k^T\left(H_k P_k^- H_k^T + R_k\right)^{-1}, \quad (\text{A.7}) $$
$$ P_k = \left(I - K_k H_k\right)P_k^-, \quad (\text{A.8}) $$

where $Q_k$ and $R_k$ are the covariance matrices of the discrete process noise $\mathbf{w}_k$ and measurement noise $\boldsymbol{\nu}_k$.
Finally, the updated state estimate is obtained by

$$ \hat{\mathbf{x}}_k = \hat{\mathbf{x}}_k^- + K_k\left(\mathbf{z}_k - \mathbf{h}\left(\hat{\mathbf{x}}_k^-\right)\right). \quad (\text{A.9}) $$
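The full predict/update cycle (A.3)-(A.9) can be sketched compactly as below. Two simplifications relative to (A.5) are assumptions of this sketch: the state transition matrix is approximated to first order, $\Phi_k \approx I + F_k\,\Delta t$, and the input term $\Gamma_k\mathbf{u}_{k-1}$ is folded into the Euler propagation of the nonlinear model.

```python
import numpy as np

def ekf_step(x_hat, P, u, z, f, h, F_jac, H_jac, Q, R, dt):
    """One EKF cycle per (A.3)-(A.9).

    f, h are the nonlinear dynamics and measurement models; F_jac, H_jac
    return their Jacobians. Passing z=None performs prediction only,
    as when the image processor returns no measurement."""
    # prediction: (A.3) and (A.6)
    x_pred = x_hat + f(x_hat, u) * dt
    F = F_jac(x_pred, u)
    Phi = np.eye(len(x_hat)) + F * dt            # first-order approx of e^{F dt}
    P_pred = Phi @ P @ Phi.T + Q
    if z is None:                                # no measurement: skip update
        return x_pred, P_pred
    # update: (A.7)-(A.9)
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_upd = x_pred + K @ (z - h(x_pred))
    P_upd = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_upd, P_upd
```

For the tracking problem of this paper, `h` would be the vision-based measurement model (7) and the state would be the 3-D relative state; the generic step above is unchanged.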

Since a camera's field of view is limited and the image processor may sometimes fail to capture the target, the vision-based measurement is not always available. When this happens, only the EKF prediction procedure ((A.3) and (A.6)) is performed. The absence of a measurement corresponds to having a measurement with infinitely large noise: when $R_k=\infty$ in (A.7), the Kalman gain $K_k$ becomes zero. This results in $\hat{\mathbf{x}}_k=\hat{\mathbf{x}}_k^-$ and $P_k=P_k^-$, and nothing is changed by the EKF update procedure (see (A.8) and (A.9)).
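This limiting argument is easy to check numerically; the covariance values below are arbitrary illustrations.

```python
import numpy as np

# As R grows, the gain K = P H^T (H P H^T + R)^{-1} of (A.7) vanishes,
# so updating with an "infinitely noisy" measurement equals skipping the update.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
K_small_R = P @ H.T @ np.linalg.inv(H @ P @ H.T + np.array([[0.1]]))
K_huge_R = P @ H.T @ np.linalg.inv(H @ P @ H.T + np.array([[1e12]]))
```

With the tiny measurement noise the gain is near its maximum, while with the huge noise it is numerically zero, so $\hat{\mathbf{x}}_k=\hat{\mathbf{x}}_k^-$ and $P_k=P_k^-$ as stated.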