Abstract

We develop a path-planning algorithm to guide autonomous amphibious vehicles (AAVs) for flood rescue support missions. Specifically, we develop an algorithm to control multiple AAVs to reach/rescue multiple victims (also called targets) in a flood scenario in 2D, where the flood water flows across the scene and the targets move (drifted by the flood water) along the flood stream. A target is said to be rescued if an AAV lies within a circular region of a certain radius around the target. The goal is to control the AAVs such that each target gets rescued while optimizing a certain performance objective. The algorithm design is based on the theory of partially observable Markov decision process (POMDP). In practice, POMDP problems are hard to solve exactly, so we use an approximation method called nominal belief-state optimization (NBO). We compare the performance of the NBO approach with a greedy approach.

1. Introduction

Various guidance algorithms for autonomous amphibious vehicles (AAVs) are being designed and tested to fight disasters worsened by global warming, such as floods, typhoons, and hurricanes [1–3]. With this motivation, we present a guidance framework to control multiple AAVs to rescue multiple victims (henceforth called targets) in a flood situation, where the flood water (interchangeably called the river) flows along a valley as shown in Figure 1. A target is said to be rescued when an AAV is within a circular region of radius $r$ on the 2D plane around the target. In general, AAVs are equipped with various advanced sensors such as polarized stereo vision, laser scanning, and SONAR [4–6]. The sensors onboard an AAV generate (noisy) measurements of the targets and the river. Our goal is to design a path-planning algorithm that guides the AAVs so that every target gets rescued, while maximizing a performance measure (discussed later). The algorithm runs on a notional central fusion node, which collects the measurements from the sensors onboard each AAV, fuses them and updates the tracks on the targets and the river state (discussed later), computes the control commands for the AAVs, and sends these commands back to the AAVs.

Guidance control methods [7–9] for AAVs are normally based on a standard three-layered system architecture that requires human-machine interaction. We instead design the guidance algorithm based on the theory of partially observable Markov decision processes (POMDPs) [10, 11]. There are several other autonomous control methods in the literature for AAVs and underwater vehicles, for example, [12–14]. Our approach differs from these existing approaches in that we place the guidance problem in the context of a POMDP, which gives the controller a look-ahead property: it trades off short-term performance for long-term performance.

2. Problem Specification

The AAV guidance problem is specified as follows.

2.1. Targets

In this study, we assume that there are multiple mobile targets (flood victims) located in a river and carried downstream by the flood water, as shown in Figure 1.

2.2. Autonomous Amphibious Vehicles (AAVs)

There are multiple autonomous amphibious vehicles (AAVs) located on the shore, as shown in Figure 1. An AAV is controlled by the following kinematic controls: forward acceleration and steering angle. Each AAV is equipped with on-board sensors that generate measurements of targets and the river depth. In this problem, AAVs float when moving in the river. For the purpose of this study, we assume that the number of AAVs and the number of targets are the same.

2.3. Environmental Conditions

The elevation map of the region is known a priori. The landscape for this problem is shown in Figure 1, which shows a river flowing along a valley from the north toward the south. The state of the river includes the depth at a reference point on the map (lowest point in the landscape, e.g., some location at the bottom of the valley as shown in Figure 1).

2.4. River Model

Typically a river flows slowly near the banks (where the water is shallow) and quickly away from the banks (i.e., toward the center of the river, where the water is deep). In this paper, we assume that the river flows from the north toward the south in a V-shaped channel, as shown in Figure 1. We adopt the logarithmic velocity profile to model the velocity of the flow (see [15] for a detailed description). According to this model, the speed of the river at the surface at location $\ell$ at time $k$ is given by
$$f(\ell, k) = C \ln\!\left(\frac{d(\ell, k)}{z_0}\right),$$
where $d(\ell, k)$ is the depth of the river at location $\ell$ at time $k$, and $C$ (a function of the viscosity and the density of the flood water) and $z_0$ are constants (see [15] for more details).
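To make the profile concrete, the sketch below computes the surface speed from the local depth. The constants C and Z0 (and their numerical values) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper): C depends on the
# viscosity/density of the flood water; Z0 is a reference-depth constant.
C = 0.8    # m/s, assumed scale constant
Z0 = 0.05  # m, assumed reference depth

def river_surface_speed(depth_m: float) -> float:
    """Surface speed from the logarithmic velocity profile.

    Deeper water (toward the channel center) flows faster; shallow water
    near the banks flows slowly. Speed is clamped at zero for depths at
    or below the reference constant Z0.
    """
    if depth_m <= Z0:
        return 0.0
    return C * np.log(depth_m / Z0)

# Example: 2 m depth near the channel center vs. 0.1 m near a bank.
print(river_surface_speed(2.0))  # faster
print(river_surface_speed(0.1))  # slower
```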

2.5. Observations

The sensors onboard an AAV generate noisy observations of target locations and the depth of the river directly beneath the vehicle, that is, the sensors generate the observations of the depth of the river only when the AAV is in the river.

2.6. Objective

A target is said to be rescued if there is an AAV within a circular region of radius $r$ around the target. The objective is to minimize the average rescue time, where the average is taken over the targets, and the rescue time of a target is defined as the time it takes to rescue that target.

3. Problem Formulation

We cast the AAV guidance problem into the framework of a partially observable Markov decision process (POMDP). A POMDP is a mathematical framework useful for solving resource-control problems, and it lets us exploit POMDP approximation methods to design our AAV guidance algorithm. A POMDP evolves in discrete time steps; we use $k = 0, 1, 2, \dots$ as the discrete-time index. To cast the AAV guidance problem into the POMDP framework, we define the following key components in terms of our guidance problem.

3.1. States

Let $x_k$ represent the state of the system at time $k$. The state of the system includes the vehicle (AAV) state $s_k$, the river state (the depth of the river at a reference location) $w_k$, the target state $\chi_k$, and the track states; that is, $x_k = (s_k, w_k, \chi_k, \hat w_k, \sigma_k^2, \hat\chi_k, P_k)$. The vehicle state $s_k$ includes the locations and the velocities of the AAVs at time $k$. The river state $w_k$ is the depth of the river at the reference point at time $k$. The reference point is the lowest point in the elevation map, that is, some location at the bottom of the valley in the landscape, as shown in Figure 1. Here, we assume that the flow direction of the river is the same everywhere and is known a priori. The target state $\chi_k$ includes the locations and the velocities of the targets at time $k$. The track states represent the state of the tracking algorithm, where $\hat w_k$ and $\sigma_k^2$ are the mean and the variance, standard in Kalman filter equations, corresponding to the river state, and, similarly, $\hat\chi_k$ is the mean vector and $P_k$ is the covariance matrix corresponding to the target state.
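The sketch below bundles these components into a single container to fix a concrete layout for the later sketches; all field names and array shapes are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SystemState:
    """Container mirroring x_k = (s_k, w_k, chi_k, track states).

    Names and shapes are illustrative, not from the paper.
    """
    veh: np.ndarray        # s_k: per-AAV (x, y, speed, heading), shape (n_aav, 4)
    river_depth: float     # w_k: true depth at the reference point
    targets: np.ndarray    # chi_k: per-target (x, y, vx, vy), shape (n_tgt, 4)
    river_mean: float      # \hat w_k: Kalman mean of the river depth
    river_var: float       # sigma_k^2: Kalman variance of the river depth
    tgt_mean: np.ndarray   # \hat chi_k: Kalman mean vectors, shape (n_tgt, 4)
    tgt_cov: np.ndarray    # P_k: Kalman covariances, shape (n_tgt, 4, 4)
```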

3.2. Observations and Observation Law

The vehicle and the track states are assumed to be fully observable; the river and the target states are only partially observable. The observation of the river state at an AAV is given by
$$z_k^w = w_k + n_k, \tag{2}$$
where $n_k \sim \mathcal{N}(0, \sigma_n^2)$, and $\sigma_n^2$ is the measurement variance. The sensors at an AAV generate a measurement of the river state only when the AAV is in the river. In practice, the sensors on an AAV measure the depth of the river directly beneath the AAV. We wrote the observation model (2) as if the sensors generate observations of the depth of the river at the reference point. The rationale behind this assumption is that we can always calculate the depth of the river at the reference point given the elevation map and the observed depth of the river at a different location. The observation of the $j$th target at an AAV is given by
$$z_k^j = h(\chi_k^j) + v_k,$$
where $h$ is the target-state observation model, $\chi_k^j$ is the state of the $j$th target, and $v_k \sim \mathcal{N}(0, R)$, where $R$ is the measurement covariance matrix. The line of sight between a target and an AAV is sometimes blocked, for example, whenever the target sinks in the water.
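A minimal sketch of the two observation channels under the assumptions above: depth measurements arrive only when the AAV is in the river, target observations drop out when the line of sight is blocked, and the Gaussian noise terms play the roles of $n_k$ and $v_k$. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe_river_depth(true_depth_ref, sigma_n, aav_in_river):
    """Noisy scalar observation of the river state (depth at the
    reference point); available only when the AAV is in the river."""
    if not aav_in_river:
        return None  # no depth measurement while on land
    return true_depth_ref + rng.normal(0.0, sigma_n)

def observe_target(target_state, R, line_of_sight=True):
    """Noisy position observation of one target.

    Assumes (illustratively) that h extracts the 2D position from the
    target state (x, y, vx, vy); R is a 2x2 measurement covariance.
    """
    if not line_of_sight:
        return None  # e.g., the target momentarily sinks
    pos = target_state[:2]
    return pos + rng.multivariate_normal(np.zeros(2), R)
```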

3.3. Actions

The actions comprise the controllable aspects of the system: the assignment of AAVs to targets and the kinematic control commands for the AAVs. Let $a_k$ be the action tuple at time $k$, given by $a_k = (u_k, q_k)$, where $u_k$ represents the kinematic control vectors (forward acceleration and steering angle for each AAV), and $q_k$ is a vector that represents the assignment of AAVs to targets; that is, $q_k(i) = j$ means that the $i$th AAV is assigned to the $j$th target. For the purpose of this study, the number of AAVs and the number of targets are the same. Each AAV is assigned to exactly one target, and each target is assigned exactly one AAV; that is, $q_k$ represents a one-to-one correspondence between the AAVs and the targets.
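Because $q_k$ is a one-to-one correspondence, candidate assignments are permutations. A hypothetical way to score and pick one (not necessarily the paper's procedure) is brute-force enumeration, which is cheap for the 2-3 AAVs simulated here.

```python
import itertools
import numpy as np

def best_assignment(aav_pos, tgt_pos):
    """Brute-force search over one-to-one assignments q, scoring each
    permutation by the summed AAV-target distances. Fine for small
    teams; an illustrative stand-in, not the paper's exact procedure."""
    n = len(aav_pos)
    best_q, best_cost = None, np.inf
    for q in itertools.permutations(range(n)):
        cost = sum(np.linalg.norm(aav_pos[i] - tgt_pos[q[i]]) for i in range(n))
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q

# Example: 2 AAVs on the banks, 2 targets in the stream.
aavs = np.array([[0.0, 0.0], [10.0, 0.0]])
tgts = np.array([[9.0, 5.0], [1.0, 5.0]])
print(best_assignment(aavs, tgts))  # (1, 0): each AAV takes the nearer target
```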

3.4. State-Transition Law

The state-transition law specifies the distribution of the next state given the current state and the action. The transition function for the vehicle state is given by $s_{k+1} = \psi(s_k, u_k, \hat w_k)$, where $\psi$ (defined later) represents the AAV kinematic model, $s_k$ is the vehicle state, $u_k$ is the kinematic control vector (forward acceleration and steering angle), and $\hat w_k$ is the estimated river state at time $k$. The river state evolves according to the following equation:
$$w_{k+1} = w_k + \nu_k, \quad \nu_k \sim \mathcal{N}(0, \sigma_\nu^2),$$
where $\sigma_\nu^2$ is the process variance corresponding to the river-state evolution. The target state evolves according to
$$\chi_{k+1} = F\chi_k + \omega_k, \quad \omega_k \sim \mathcal{N}(0, Q), \tag{5}$$
where $F$ represents the target motion model, and $Q$ is the process covariance matrix corresponding to the target-state evolution. The track states evolve according to the Kalman filter equations given the observations from the sensors onboard the AAVs. When observations are not available, only the prediction step of the Kalman filter is performed and the update step is skipped.
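The track-state evolution is the standard Kalman recursion. A compact sketch for one target track under the constant-velocity model with position-only measurements (all names assumed):

```python
import numpy as np

def kalman_predict(mean, cov, F, Q):
    """Prediction step: run every time step, and the only step performed
    when no measurement arrives (e.g., line of sight is blocked)."""
    return F @ mean, F @ cov @ F.T + Q

def kalman_update(mean, cov, z, H, R):
    """Update step: fuses a position measurement z into the track."""
    S = H @ cov @ H.T + R             # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)  # Kalman gain
    mean = mean + K @ (z - H @ mean)
    cov = (np.eye(len(mean)) - K @ H) @ cov
    return mean, cov

# Constant-velocity model, state (x, y, vx, vy), time step T.
T = 1.0
F = np.array([[1, 0, T, 0],
              [0, 1, 0, T],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # position-only measurements
```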

3.5. Cost

The cost function represents the cost of performing an action at the current state. The cost function is given by
$$c(x_k, a_k) = \sum_{i} \mathbb{1}\left\{\left\| p_k^i - \hat\eta_k^{\,q_k(i)} \right\| > \rho\right\},$$
where $p_k^i$ represents the 2D position coordinates of the $i$th AAV, $\hat\eta_k^j$ represents the estimated 2D position coordinates of the $j$th target at time $k$, $\|\cdot\|$ is the Euclidean norm (everywhere in this paper), and $\mathbb{1}\{\cdot\}$ is the indicator function, which equals 1 when the expected distance between the AAV and its assigned target at time $k$ is greater than some threshold distance $\rho$ and 0 otherwise.
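A direct transcription of this per-step cost, assuming (as in the reconstruction above) that it counts assigned AAV-target pairs still outside the threshold:

```python
import numpy as np

def step_cost(aav_pos, tgt_est_pos, q, rho):
    """Per-step cost: the number of assigned AAV-target pairs farther
    apart than the threshold rho. Summed over time this counts
    unrescued time steps, so minimizing the cumulative cost pushes
    down the rescue times."""
    return sum(
        1.0 if np.linalg.norm(aav_pos[i] - tgt_est_pos[q[i]]) > rho else 0.0
        for i in range(len(aav_pos))
    )
```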

3.6. Belief State

The belief state $b_k$ is the posterior distribution of the state at time $k$. The vehicle and the track states are assumed to be fully observable; that is, the belief state corresponding to the vehicle state is given by $b_k^s(s) = \delta(s - s_k)$, where $\delta$ is the Kronecker delta function. Similarly, the belief states corresponding to the track states can be written in terms of the actual track states. The belief states corresponding to the river and the target are the posterior distributions of $w_k$ and $\chi_k$, respectively, given the history of observations.

4. Objective and Optimal Policy

The goal is to find the action sequence that minimizes the expected cumulative cost over a time horizon $H$:
$$J_H = E\left[\sum_{k=0}^{H-1} c(x_k, a_k)\right].$$
We can write the expected cumulative cost in terms of the belief states given the initial belief state $b_0$ (similar to the treatment in [10, 11]) as follows:
$$J_H(b_0) = E\left[\sum_{k=0}^{H-1} C(b_k, a_k) \,\middle|\, b_0\right],$$
where $C(b_k, a_k) = \int c(x, a_k)\, b_k(x)\, dx$, and $b_k$ is the belief state at time $k$. From Bellman's principle of optimality [16], the optimal objective-function value is given by
$$J_H^*(b_0) = \min_{a}\left\{ C(b_0, a) + E\left[ J_{H-1}^*(b_1) \,\middle|\, b_0, a \right] \right\},$$
where $b_1$ is the random next belief state, $J_{H-1}^*$ is the optimal cumulative cost over the remaining horizon of length $H-1$, and $E[\,\cdot \mid b_0, a]$ is the conditional expectation given the current belief state and the current action. Let us define the value of taking action $a$ given the current belief state $b_k$:
$$Q(b_k, a) = C(b_k, a) + E\left[ J^*(b_{k+1}) \,\middle|\, b_k, a \right]. \tag{10}$$
The optimal policy (from Bellman's principle) at time $k$ can then be written as
$$\pi^*(b_k) = \arg\min_{a} Q(b_k, a).$$
In general, it is hard to obtain the value $Q(b_k, a)$ exactly. There are several approximation methods in the literature: heuristic expected cost-to-go (ECTG) [17], parametric approximation [18], policy rollout [19], hindsight optimization [20], and foresight optimization [21]. In this paper, we use an approximation method called nominal belief-state optimization (NBO), which was introduced in [11] along with other approximations and techniques specific to guidance problems. The rationale for choosing the NBO method over other POMDP approximation methods is that it is relatively inexpensive in terms of computation time; that is, its computational requirements are not prohibitive, unlike those of the other methods. The following subsection provides a brief description of the NBO method.

4.1. NBO Approximation Method

The computational requirements of obtaining the optimal assignments of AAVs to targets ($q_k$) over a long horizon are prohibitive. Moreover, we expect the optimal assignment of AAVs to targets over a long horizon not to change with time. For these reasons, in the NBO method we keep the assignment of AAVs to targets fixed; in other words, in approximating the expected cost-to-go in (10), the assignment remains fixed over the planning horizon $H$. Therefore, we drop the subscript $k$ from $q_k$ in the objective function used in planning based on (10); that is, $q_k = q$ for all $k$. In the NBO approximation method, we use the following objective function, written in terms of belief states:
$$J = E\left[\sum_{k=1}^{H} C\big(b_k, (u_{k-1}, q)\big)\right],$$
where $u_k$ represents the kinematic controls for the AAVs, and $q$ is the assignment of AAVs to targets.

The belief states corresponding to the river state and the target state are given by
$$b_k^w = \mathcal{N}(\hat w_k, \sigma_k^2), \qquad b_k^\chi = \mathcal{N}(\hat\chi_k, P_k),$$
where $(\hat w_k, \sigma_k^2)$ and $(\hat\chi_k, P_k)$ are the track states corresponding to the river and the target states, respectively, which evolve according to the Kalman filter equations. In the NBO method, we approximate the objective function as follows:
$$J \approx \sum_{k=1}^{H} C\big(\hat b_k, (u_{k-1}, q)\big),$$
where $\hat b_1, \dots, \hat b_H$ is a nominal belief-state sequence, and the optimization is over the action sequence. We obtain the nominal belief states by evolving the current belief state with an exactly zero noise sequence over the horizon (similar to the treatment in [10, 11]). Therefore, the objective function from the NBO method is given by
$$J(u, q) = \sum_{k=1}^{H} \sum_{i} \mathbb{1}\left\{\left\| \hat p_k^i - \hat\eta_k^{\,q(i)} \right\| > \rho\right\},$$
where $\hat p_k^i$ is the nominal position of the $i$th AAV (defined below), and $\hat\chi_k^j$ is the nominal belief state of the $j$th target at time $k$, of which the component $\hat\eta_k^j$ represents the position estimate of the target. This nominal target belief state is obtained by evolving the track-state component with an exactly zero noise sequence as follows:
$$\hat\chi_{k+1}^j = F \hat\chi_k^j.$$
The evolution of the vehicle state depends on the river-state estimate. In the NBO method, the river-state estimate in the AAV kinematic model $\psi$ is replaced with the nominal track-state component $\hat w_k$ corresponding to the river state (also propagated with zero noise), and the resulting positions of the $i$th AAV are called its nominal positions $\hat p_k^i$.
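The following sketch evaluates this surrogate objective for one candidate control sequence by a zero-noise rollout. Here step_fn stands for the kinematic model $\psi$, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def nbo_objective(u_seq, q, veh0, tgt_mean0, F, rho, step_fn):
    """Evaluate the NBO surrogate objective: roll the AAVs and the
    target track means forward with zero noise and count the unrescued
    pairs at each step of the horizon.

    u_seq: (H, n_aav, 2) accelerations/steering angles; step_fn is the
    AAV kinematic model psi (illustrative signature).
    """
    H = len(u_seq)
    veh, tgt = veh0.copy(), tgt_mean0.copy()
    cost = 0.0
    for k in range(H):
        veh = step_fn(veh, u_seq[k])  # nominal AAV states
        tgt = tgt @ F.T               # zero-noise track prediction
        for i in range(len(veh)):
            d = np.linalg.norm(veh[i, :2] - tgt[q[i], :2])
            cost += 1.0 if d > rho else 0.0
    return cost
```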

Here, we adopt an approach called receding-horizon control: at the current time step we optimize the action sequence over the next $H$ time steps, implement only the action corresponding to the current time step, and re-optimize the $H$-step action sequence at the next time step. The length of the planning horizon should be long enough for an AAV to receive a benefit from moving toward a target; however, due to computational constraints, we cannot use an arbitrarily long horizon. Therefore, we truncate the horizon to a few time steps (we set $H = 6$ in our simulations) and append to the cost function an appropriate expected cost-to-go (ECTG). We use the following distance-based ECTG:
$$\tilde J(\hat b_H) = \sum_{i} \left\| \hat p_H^i - \hat\eta_H^{\,q(i)} \right\|,$$
where $\hat p_H^i$ is the nominal position of the $i$th AAV, and $\hat\eta_H^{\,q(i)}$ is the estimated location of its assigned target (from the NBO approach) at time $H$. Therefore, the objective function from the NBO method is given by
$$J(u, q) = \sum_{k=1}^{H} C\big(\hat b_k, (u_{k-1}, q)\big) + \tilde J(\hat b_H),$$
where $\tilde J$ is the distance-based ECTG.
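A sketch of one receding-horizon iteration under these choices. scipy's minimize (run derivative-free with Nelder-Mead, since the indicator cost is piecewise constant) stands in for MATLAB's fmincon; nbo_objective is the sketch above; ectg_weight is an assumed knob, not a parameter from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def receding_horizon_step(veh, tgt_mean, q, F, rho, step_fn,
                          H=6, ectg_weight=1.0):
    """Optimize an H-step control sequence against the NBO cost plus
    the distance-based ECTG, then return only the first control."""
    n_aav = len(veh)

    def total_cost(u_flat):
        u_seq = u_flat.reshape(H, n_aav, 2)
        cost = nbo_objective(u_seq, q, veh, tgt_mean, F, rho, step_fn)
        # Distance-based ECTG at the end of the truncated horizon.
        v, t = veh.copy(), tgt_mean.copy()
        for k in range(H):
            v, t = step_fn(v, u_seq[k]), t @ F.T
        ectg = sum(np.linalg.norm(v[i, :2] - t[q[i], :2])
                   for i in range(n_aav))
        return cost + ectg_weight * ectg

    res = minimize(total_cost, np.zeros(H * n_aav * 2), method="Nelder-Mead")
    return res.x.reshape(H, n_aav, 2)[0]  # implement only the first action
```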

4.2. AAV Kinematics

The kinematic equations of an AAV differ depending on whether the AAV is in the river or on land. When the AAV is in the river, we take the speed of the river into account in the kinematic equations. The steering and thrust generation of the vehicle are modeled after the work in [2, 22], which uses a single drive system: the vehicle is front-wheel driven on land, and in the river it is propelled by a centrifugal pump through the front wheels. The following subsections describe the kinematics of an AAV on land and in the river.

4.2.1. Kinematics of AAVs on the Land

This subsection defines the kinematic model $\psi$, introduced in Section 3, when the vehicle is on land. Let $s_k = (x_k, y_k, v_k, \theta_k)$ be the state of the vehicle at time $k$, where $(x_k, y_k)$ represents the location of the vehicle on the 2D plane, $v_k$ represents the speed of the vehicle along the heading direction, and $\theta_k$ represents the heading angle of the vehicle at time $k$. Let $u_k = (\alpha_k, \phi_k)$ represent the action vector of the vehicle, where $\alpha_k$ represents the acceleration along the direction of the front wheels, and $\phi_k$ represents the steering angle of the front wheels. A (simplified) schematic of a basic four-wheeled vehicle is shown in Figure 2. The control variable $\alpha_k$ lies within the interval $[-\alpha_{\max}, \alpha_{\max}]$, where $\alpha_{\max}$ is the maximum acceleration (or deceleration), and the control variable $\phi_k$ lies within the interval $[-\phi_{\max}, \phi_{\max}]$, where $\phi_{\max}$ is the maximum steering angle. The function $\psi$ can be specified by a set of nonlinear kinematic equations, as shown below:
$$x_{k+1} = x_k + v_k T \cos\theta_k,$$
$$y_{k+1} = y_k + v_k T \sin\theta_k,$$
$$v_{k+1} = v_k + \alpha_k T,$$
$$\theta_{k+1} = \theta_k + \frac{v_k T}{L} \tan\phi_k, \tag{19}$$
where $T$ is the length of the time step, $W$ is the width of the vehicle (see Figure 2), and $L$ is the distance between the front axle and the rear axle. The heading-angle update (19) is derived as follows: when the front wheels of the vehicle are oriented at an angle $\phi_k$ with respect to the main axis of the vehicle (as shown in Figure 2), the vehicle turns along a circle of radius approximately $L/\tan\phi_k$; over one time step the vehicle traverses an arc of length $v_k T$, so the heading angle changes by $v_k T \tan\phi_k / L$.
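A direct implementation of this reconstructed bicycle-style model; the time step, axle distance, and control bounds are assumed values.

```python
import numpy as np

def step_on_land(veh, u, T=1.0, L=2.5, a_max=2.0, phi_max=np.radians(30)):
    """Kinematic step for AAVs on land, following the reconstruction
    above. veh: (n, 4) rows (x, y, v, theta); u: (n, 2) rows
    (acceleration, steering angle). L, a_max, phi_max are assumed."""
    out = veh.copy()
    a = np.clip(u[:, 0], -a_max, a_max)        # enforce control bounds
    phi = np.clip(u[:, 1], -phi_max, phi_max)
    x, y, v, th = veh[:, 0], veh[:, 1], veh[:, 2], veh[:, 3]
    out[:, 0] = x + v * T * np.cos(th)
    out[:, 1] = y + v * T * np.sin(th)
    out[:, 2] = v + a * T
    out[:, 3] = th + (v * T / L) * np.tan(phi)  # heading update (19)
    return out
```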

4.2.2. Kinematics of AAVs on the River

This subsection defines $\psi$ when the vehicle is in the river. The position-update equations of the AAV motion become
$$x_{k+1} = x_k + \left( v_k \cos\theta_k + \hat f_x(p_k) \right) T,$$
$$y_{k+1} = y_k + \left( v_k \sin\theta_k + \hat f_y(p_k) \right) T,$$
where $\hat f_x(p_k)$ and $\hat f_y(p_k)$ are the estimated speeds of the river at the location $p_k$ in the $x$ and $y$ directions, respectively, which are obtained from the river-state estimate and the river model presented in Section 2. The speed and heading-angle update equations remain the same as in the land case. When in the water, the control variable $\alpha_k$ lies within the interval $[-\alpha_{\max}^w, \alpha_{\max}^w]$, where $\alpha_{\max}^w$ is the maximum acceleration, and $\phi_k$ lies within the interval $[-\phi_{\max}^w, \phi_{\max}^w]$, where $\phi_{\max}^w$ is the maximum steering angle. Typically, the values of $\alpha_{\max}^w$ and $\phi_{\max}^w$ are much smaller than those of $\alpha_{\max}$ and $\phi_{\max}$.
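The in-river step can reuse the land model from the previous sketch, add the estimated flow, and tighten the control bounds. river_speed_xy is an assumed helper built on the logarithmic profile and the estimated depth; the bound values are illustrative.

```python
import numpy as np

def step_in_river(veh, u, river_speed_xy, T=1.0, L=2.5,
                  a_max_w=0.5, phi_max_w=np.radians(10)):
    """Kinematic step in the river: identical to the land model except
    that the estimated flow velocity is added to the position update
    and the control bounds are tighter. Reuses step_on_land from the
    previous sketch."""
    out = step_on_land(veh, u, T=T, L=L, a_max=a_max_w, phi_max=phi_max_w)
    for i in range(len(veh)):
        fx, fy = river_speed_xy(veh[i, :2])
        out[i, 0] += fx * T  # drift east-west with the flow estimate
        out[i, 1] += fy * T  # drift downstream
    return out
```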

5. Simulation

We implement the NBO method in MATLAB and use the fmincon routine (MATLAB's constrained optimization tool) to solve the optimization problem. For performance comparison, we also implement a greedy approach, where we optimize only the current kinematic control for the AAVs such that the following symmetric-distance-based cost is minimized:
$$c_{\text{greedy}}(u_k) = \sum_{i} \left\| \hat p_{k+1}^i - \hat\eta_{k+1}^i \right\|,$$
where $\hat p_{k+1}^i$ and $\hat\eta_{k+1}^i$ are the nominal positions (obtained by evolving the belief states with zero noise) of the $i$th AAV and the $i$th target at time $k+1$, respectively. Our simulation environment is two-dimensional; that is, the AAVs, the river, and the targets move in 2D. According to the river model, the speed of the river stream at a location $\ell$ is given by $f(\ell) = C \ln(d(\ell)/z_0)$, where $d(\ell)$ is the depth of the river at $\ell$, and $C$ and $z_0$ are constants. Since the depth of the river is not fully observable, we estimate $d(\ell)$ as follows. The elevation map of the landscape is known a priori; that is, if we know the depth of the river at a particular location, we can obtain the depth of the river at all locations. Therefore, we estimate the depth of the river at location $\ell$, that is, $\hat d(\ell)$, using the estimated depth of the river at the reference point ($\hat w_k$). Therefore, the estimated speed of the river at location $\ell$ is given by $\hat f(\ell) = C \ln(\hat d(\ell)/z_0)$. We set the length of the horizon to $H = 6$ time steps and the length of the time step to $T = 1$ second. In the simulations, the flooded river flows along a valley in the landscape from the north toward the south, as shown in Figure 1. Since the simulations are in 2D, the river flows in the $-y$ direction, and the river speed in the $x$ direction (toward the east) is zero at every location. Therefore, the estimated speeds of the river at location $\ell$ in the $x$ and $y$ directions are given by $\hat f_x(\ell) = 0$ and $\hat f_y(\ell) = -\hat f(\ell)$. Here, we model the dynamics of the target motion by the constant-velocity model (see [23] for the definitions of the variables $F$ and $Q$ in (5)).
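For comparison, a sketch of the greedy baseline in the same setting: it looks only one step ahead and minimizes the summed AAV-target distances, again with scipy's minimize standing in for fmincon (names and signatures are illustrative).

```python
import numpy as np
from scipy.optimize import minimize

def greedy_control(veh, tgt_mean, F, step_fn):
    """Greedy baseline: optimize only the current control so that the
    one-step-ahead nominal AAV positions are as close as possible to
    the one-step-ahead nominal target positions."""
    n_aav = len(veh)
    tgt_next = tgt_mean @ F.T  # zero-noise target prediction

    def cost(u_flat):
        u = u_flat.reshape(n_aav, 2)
        v = step_fn(veh, u)
        return sum(np.linalg.norm(v[i, :2] - tgt_next[i, :2])
                   for i in range(n_aav))

    res = minimize(cost, np.zeros(n_aav * 2), method="Nelder-Mead")
    return res.x.reshape(n_aav, 2)
```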

In the simulations, an AAV is represented by a rectangle, and the line connecting the rectangles represents the trajectory of the AAV. We define a performance metric called the average rescue time: the average of the rescue times of the targets, where the rescue time of a target is the time elapsed from the start of the simulation until the target is rescued. The POMDP cost function defined in Section 3 is reflective of this performance metric. We simulate three scenarios: Scenario I, Scenario II, and Scenario III. In Scenario I, there are two AAVs, one on each bank of the river, and two targets moving (drifted by the moving water) in the river, as shown in Figure 3. Figure 3 shows a snapshot of the scenario at the end of the simulation with the NBO approach, where the average rescue time is 36 time steps. We also simulate Scenario I with the greedy approach, as shown in Figure 4, where the average rescue time is 64 time steps. In Scenario II, there are two AAVs on the left bank of the river and two targets moving in the river. We simulate this scenario with both the NBO and the greedy approaches. Figure 5 shows a snapshot of the scenario with the NBO approach at the end of the simulation, where the average rescue time is 45 time steps, and Figure 6 shows the simulation of the same scenario with the greedy approach, where the average rescue time is 62 time steps. In Scenario III, there are three AAVs (two on the left bank of the river and one on the right) and three targets moving in the river. We simulate this scenario with both the NBO and the greedy approaches. Figure 7 shows the scenario with the NBO approach, where the average rescue time is 48 time steps, and Figure 8 shows the simulation of the same scenario with the greedy approach, where the average rescue time is 76 time steps. These simulations demonstrate that the NBO approach achieves better coordination among the AAVs than the greedy approach while rescuing the targets, as is evident from the average rescue times.

We compare the performance of the NBO approach with that of the greedy approach through Monte-Carlo simulations. We simulate the above scenarios with the NBO and the greedy approaches separately for 50 Monte-Carlo runs. In each scenario, we compute the average rescue time in every run for both the NBO and the greedy approaches. Figures 9, 10, and 11 show the plots of the cumulative frequencies of average rescue times for the NBO and the greedy approaches for Scenarios I, II, and III, respectively. Figures 9, 10, and 11 demonstrate that the NBO approach significantly outperforms the greedy approach.

The algorithm (NBO) runtime to compute the control commands for three AAVs (in Scenario III) in any time step in MATLAB is approximately 4 seconds on a lab computer (Intel Core i7-860 Quad-Core Processor with 8 MB Cache and 2.80 GHz speed). This runtime can be greatly reduced on a better processor and by further optimizing the code. Since the algorithm runtime is not prohibitive, it can be used in real time (i.e., for practical purposes).

6. Conclusions, Remarks, and Future Scope

We designed a guidance algorithm for autonomous amphibious vehicles (AAVs) to rescue moving targets in a 2D flood scenario, where the flood water flows across the scene and the targets move with the flood water. We designed this algorithm based on the theory of partially observable Markov decision processes (POMDPs). Since a POMDP problem is intractable to solve exactly, we used an approximation method called nominal belief-state optimization (NBO). We simulated several scenarios to demonstrate the coordination among the AAVs achieved by the NBO approach. We defined a performance metric called average rescue time to compare the performance of our approach with a greedy approach. Our results show that the NBO approach significantly outperforms the greedy approach. This was expected because, unlike the greedy approach, the NBO approach has a look-ahead property; that is, it trades off short-term performance for long-term performance. Although the greedy approach achieves coordination among the AAVs, in that the AAVs eventually rescue all the targets, its performance in terms of average rescue time, which is crucial in these kinds of rescue missions, is poor compared with that of our NBO approach. In future work, we would like to develop methods to further improve our NBO approach (e.g., NBO with an adaptive horizon). We would also like to extend our approach to a decentralized AAV guidance problem for rescuing multiple targets. In the decentralized case, we will induce coordination among the AAVs by appropriately optimizing the communication (at the network level) between the AAVs along with their kinematic controls.

Acknowledgments

This work was supported in part by the Fulbright Foundation. The authors would also like to acknowledge Colorado State University’s support via the Libraries Open Access Research and Scholarship Fund (OARS).