Many planning applications must address conflicting plan objectives, such as cost, duration, and resource consumption, and decision makers want to know the possible tradeoffs. Traditionally, such problems are solved by invoking a single-objective algorithm (such as A*) under multiple alternative preferences over the objectives to identify nondominated plans. The less popular alternative is to delay such reasoning and directly optimize multiple plan objectives with a search algorithm like multiobjective A* (MOA*). The relative performance of these two approaches hinges upon the number of 𝑓-values computed for individual search nodes. A* may revisit a node several times and compute a different 𝑓-value each time. MOA* visits each node once and may compute some number of 𝑓-values (each estimating the value of a different nondominated solution constructed from the node). While A* does not share 𝑓-values between searches for different solutions, MOA* can sometimes find multiple solutions while computing a single 𝑓-value per node. The results of an extensive empirical comparison show that (i) multiple invocations of single-objective A* often perform worse, in both time and solution-set quality, than a single invocation of MOA*, and (ii) techniques for balancing per node cost and exploration are promising.

1. Introduction

Most realistic planning problems have multiple competing objectives. It is common in practice to select a preference over the objectives and solve the problem with respect to this preference [1]. It is also common that the preference is highly subjective, and the solution is found without knowledge of possibly better, alternative solutions. Instead, by finding multiple solutions, a human can apply their subjective preference over the objectives to the solution set (with full knowledge of the tradeoffs available). In this sense, finding multiple solutions can facilitate—or circumvent entirely—preference elicitation. The approach taken in many applications is to apply multiobjective reasoning [2–7], and in this work we study multiobjective search for planning.

1.1. Finding Sets of Plans

There are two ways to find a set of solutions that trade off the objectives differently: iterate a single-objective algorithm over multiple preferences, or use a multiobjective algorithm with no assumptions about the preferences. However, the poor scalability of multiobjective heuristic search algorithms, such as multiobjective A* (MOA*) [8], has led to the practice of iterating a single-objective algorithm (such as A*) over different aggregations of, or bounds on, the objectives (i.e., preferences). Upon closer examination, we notice a fundamental inefficiency with multiple invocations of a single-objective algorithm: each search episode may expand many of the same search nodes and recompute their 𝑓-values (a relatively large cost in planning). The high number of redundant node expansions is especially pronounced when the nondominated plans positively interact (i.e., the plans share a common subsequence of actions). Moreover, with multiple invocations of the single-objective algorithm, there may be no guarantee that each solution will be nondominated with respect to the other solutions.

1.2. MOA*

MOA* generalizes A* to find multiple nondominated solutions in a very straightforward manner. A* searches for a single “best” solution in terms of one optimization metric captured by each node’s 𝑓-value. MOA* searches for multiple “best” solutions by permitting multiple 𝑓-values (a vector of 𝑓-values) per node. Each MOA* 𝑓-value is associated with a different nondominated solution. If a node is in the open list and at least one of its 𝑓-values is nondominated with respect to the 𝑓-values of the other nodes, then the node can be expanded.

1.3. Computing 𝑓-Values

A key observation that we exploit in this work is that a single nondominated 𝑓-value is enough to keep a node on the nondominated search frontier. Computing multiple ℎ-values (to get multiple 𝑓-values) for a search node can be both good and bad. Having more ℎ-values improves a node’s chance of being on the nondominated search frontier, but also increases the per node cost. Ideally, we would like to compute only those ℎ-values that are needed to keep solution-bearing nodes on the nondominated frontier. Our intuition is that many plans positively interact, and if we can compute a single ℎ-value (or a small set of ℎ-values) for a set of interacting plans, then search will find these plans at a lower cost.

We explore several approaches to computing heuristics for MOA*: (i) compute a single ℎ-value per node that estimates the longest solution path, hoping that other solutions will positively interact, (ii) compute a uniform grid of ℎ-values per search node to avoid missing solutions that do not positively interact, (iii) with probability 1−𝑃 compute a uniform grid of ℎ-values (as in (ii)), and with probability 𝑃 compute the ℎ-values that are similar to the parent node’s ℎ-values that were nondominated in the open list. We find (in our experiments) that the third approach is the most useful because it balances exploration with preservation of nondominated partial solutions. We compare to the baseline approach of computing several solutions with A* by setting different bounds (preferences) on the plan objectives.

1.4. Probabilistic Planning

We compare A* and MOA* in probabilistic planning, where the plan length and probability of goal satisfaction are the competing objectives. Probabilistic planning largely simplifies our analysis because in partial solutions, one objective (plan length) is free to change, but the other objective (probability of goal satisfaction) is fixed (i.e., no partial plan collects the probability of goal satisfaction until it is a complete plan). The effect is that there is a single best 𝑔-value for each node, and the only way to obtain multiple 𝑓-values is to compute multiple ℎ-values. Hence, we can focus solely on how to compute multiple ℎ-values without considering multiple 𝑔-values. While we do not study partial satisfaction planning [9] in this work, we note that it largely resembles probabilistic planning from the perspective of a search algorithm, and we believe our techniques are applicable in this problem as well.

In the following, we present the MOA* algorithm, provide the intuition for investigating which 𝑓-values are computed in MOA*, discuss several approaches to computing ℎ-values for MOA*, describe how to formulate probabilistic planning as A* and MOA* search, study the empirical performance of the techniques on several domains, discuss related work, and conclude with future research directions. Our contribution is to evaluate the relative strengths of single-objective and multiobjective search, and to show that multiobjective search is preferable when nondominated solutions share common search nodes.

2. MOA*

MOA* [8] is a search algorithm that finds a set of paths and relies on established notions of solution (non)dominance in multiobjective problem solving.

Let Π denote a set of solutions for some problem. Each solution 𝜋∈Π is associated with a value vector 𝑣(𝜋)=(𝑣1(𝜋),…,𝑣𝑚(𝜋)) defining its quality for each of 𝑚 objectives. A solution 𝜋 dominates solution 𝜋′, denoted 𝜋≺𝜋′, if it is no worse than 𝜋′ in all objectives, 𝑣𝑖(𝜋)≤𝑣𝑖(𝜋′), 𝑖=1,…,𝑚, and 𝑣(𝜋)≠𝑣(𝜋′). Let ℰ(Π) denote the set of efficient (also called nondominated) solutions, such that for no pair of solutions 𝜋,𝜋′∈ℰ(Π), 𝜋≺𝜋′ or 𝜋≻𝜋′. As such, we seek efficient solutions that trade off the objectives differently. We characterize the quality of a set of solutions by computing its hypervolume [10], the size of the objective space dominated by the solutions. As the efficient set gains quality, the hypervolume increases. Let 𝐻(Π) denote the hypervolume (size of the dominated objective space) of a set of solutions Π={𝜋1,…,𝜋𝑛}, calculated as follows (our implementation uses an optimized, but equivalent, computation for two objectives):

$$H(\Pi)=\sum_{i=1}^{n}(-1)^{i+1}\sum_{k_1<\cdots<k_i}\,\prod_{l=1}^{m}\left[1-\max_{j=1,\ldots,i}\left(\frac{v_l(\pi_{k_j})-\underline{\alpha}_l}{\overline{\alpha}_l-\underline{\alpha}_l}\right)\right] \qquad (1)$$

where $\overline{\alpha}_l$ and $\underline{\alpha}_l$ denote respective upper and lower bounds on the $l$th objective (e.g., the lower and upper bounds on a solution’s probability of success are zero and one, respectively).
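To make the dominance relation and Eq. (1) concrete, the following Python sketch implements both directly. It assumes minimization in every objective and caller-supplied lower and upper bounds. The inclusion-exclusion sum is exponential in the number of solutions, so this only illustrates the formula; it is not the optimized two-objective computation mentioned above. The five points are hypothetical coordinates chosen to reproduce the dominance relations of the Figure 1 example discussed next.

```python
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def dominates(u: Vector, v: Vector) -> bool:
    """u dominates v: no worse in every objective (minimization) and not identical."""
    return all(ui <= vi for ui, vi in zip(u, v)) and tuple(u) != tuple(v)


def efficient(values: list[Vector]) -> list[Vector]:
    """E(Pi): the vectors that no other vector dominates (pairwise O(n^2) check)."""
    return [v for v in values if not any(dominates(u, v) for u in values)]


def hypervolume(values: list[Vector], lower: Vector, upper: Vector) -> float:
    """Normalized dominated hypervolume via the inclusion-exclusion sum of Eq. (1)."""
    total = 0.0
    for i in range(1, len(values) + 1):
        sign = 1.0 if i % 2 == 1 else -1.0              # (-1)^(i+1)
        for subset in combinations(values, i):          # all index sets k_1 < ... < k_i
            volume = 1.0
            for obj in range(len(lower)):               # product over the objectives
                worst = max((v[obj] - lower[obj]) / (upper[obj] - lower[obj]) for v in subset)
                volume *= 1.0 - worst
            total += sign * volume
    return total


# Hypothetical normalized (objective 1, objective 2) values for solutions a..e.
a, b, c, d, e = (0.1, 0.7), (0.4, 0.4), (0.7, 0.1), (0.5, 0.8), (0.8, 0.3)
print(efficient([a, b, c, d, e]))                # a, b, and c survive
print(hypervolume([a, b, c], (0, 0), (1, 1)))    # about 0.54
print(hypervolume([a, b], (0, 0), (1, 1)))       # about 0.45
```

Because the union of dominated regions is unchanged by dominated points, adding 𝑑 and 𝑒 to the input leaves the hypervolume at about 0.54.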

Consider the example in Figure 1, where five solutions {𝑎,…,𝑒} are shown. This example shows a normalized objective space (with two objectives) so that all solution values fall in the interval [0,1] in each objective. The solution 𝑑 is dominated by both 𝑎 and 𝑏 because it is greater in both objectives, and, likewise, 𝑒 is dominated by 𝑐. The efficient set is ℰ({𝑎,…,𝑒})={𝑎,𝑏,𝑐}, containing the nondominated solutions. The rectangles encompassing the regions that are greater in both objectives than solutions 𝑎, 𝑏, and 𝑐 denote the hypervolumes (in this case, areas) dominated by each solution. The union of the hypervolumes dominated by the set is the hypervolume of the set, and its magnitude indicates the quality of the set. Notice that the space dominated by the set {𝑎,𝑏,𝑐} is larger than the space dominated by {𝑎,𝑏}, making the former set a better solution set.

MOA* [8] (Algorithm 1) and its variations [11] generalize the traditional, single-objective A* heuristic search algorithm by operating on graphs whose edge costs are vectors. Each efficient path from a start node to a terminal node (a solution) has an associated nondominated value vector (equal to the sum of the edge costs). MOA* finds a set of efficient paths from a source node to one of the terminal nodes by maintaining efficient sets for the familiar, scalar A* constructs (𝑔-, ℎ-, and 𝑓-values, and backpointers). The efficient set of 𝑔-values for each node represents the cost of all efficient paths reaching the node, the set of ℎ-values represents the estimated cost of all efficient paths from the node to terminal nodes, and the set of 𝑓-values contains all efficient members of the cross-product of 𝑔- and ℎ-values (i.e., each 𝑓-value is the sum of a 𝑔-value with an ℎ-value). Nodes in the MOA* Open list follow a strict partial order and have potential for expansion if one of their 𝑓-values is efficient with respect to the 𝑓-values of other Open list nodes and previously found solutions. Our implementation expands a random efficient node from the Open list in each iteration.
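The construction of a node’s 𝑓-values as the efficient members of the 𝑔×ℎ cross-product can be sketched in a few lines of Python, assuming minimization and componentwise vector addition. The 𝑔- and ℎ-vectors in the example are hypothetical; they mimic the probabilistic planning case study later in the paper, where a node has one 𝑔-value of the form (length, risk) and possibly several ℎ-values.

```python
from itertools import product

Vec = tuple[float, ...]


def dominates(u: Vec, v: Vec) -> bool:
    """Minimization: u is no worse in every component and differs somewhere."""
    return all(x <= y for x, y in zip(u, v)) and u != v


def f_values(g_values: list[Vec], h_values: list[Vec]) -> list[Vec]:
    """Efficient members of {g + h : g in g_values, h in h_values}, sorted."""
    sums = {tuple(x + y for x, y in zip(g, h)) for g, h in product(g_values, h_values)}
    return sorted(f for f in sums if not any(dominates(o, f) for o in sums))


# Hypothetical node: one g-value (2 actions, zero risk so far) and three h-values.
print(f_values([(2, 0.0)], [(10, 0.0), (6, 0.5), (12, 0.5)]))
# [(8, 0.5), (12, 0.0)]; (14, 0.5) is dropped because (8, 0.5) dominates it
```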

(1) Initialize Open with the start node. Initialize an empty set of nodes Solutions.
(2) Find a subset of Open, denoted ℰ(𝑂𝑝𝑒𝑛), of nodes with at least one 𝑓-value not dominated by an 𝑓-value of any node in Open or Solutions.
(3) If |ℰ(𝑂𝑝𝑒𝑛)|=0, then exit, returning the solutions found by following efficient backpointers from nodes in Solutions.
(4) Otherwise, select a node 𝑛 from ℰ(𝑂𝑝𝑒𝑛), remove it from Open, and add it to Closed.
(5) If 𝑛 is a solution node, then add 𝑛 to Solutions and go to step 2.
(6) Otherwise, expand 𝑛, adding successors to Open (removing previously expanded successors from Closed) and computing successor 𝑓-values. Go to step 2.
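To make the steps concrete, the following Python sketch follows them on an explicit graph with vector edge costs. It makes simplifying assumptions: each Open entry is a path label (node, 𝑔-vector, path) rather than a node holding a set of 𝑔-values, so the Closed-list bookkeeping of step 6 is not needed (closer to label-based variants than to the node-based bookkeeping of [8]); cycles are skipped; and the heuristic is a caller-supplied function `h` (a hypothetical name) that returns a list of ℎ-vectors, so an entry can carry several 𝑓-values. Like the implementation described above, it selects a random efficient entry in step 4.

```python
import random
from typing import Callable

Vec = tuple[float, ...]
Graph = dict[str, list[tuple[str, Vec]]]


def dominates(u: Vec, v: Vec) -> bool:
    return all(x <= y for x, y in zip(u, v)) and u != v


def add(u: Vec, v: Vec) -> Vec:
    return tuple(x + y for x, y in zip(u, v))


def moa_star(graph: Graph, start: str, goals: set[str],
             h: Callable[[str], list[Vec]], seed: int = 0) -> list[tuple[Vec, list[str]]]:
    rng = random.Random(seed)
    zero: Vec = tuple(0.0 for _ in h(start)[0])
    open_list = [(start, zero, [start])]            # step 1: Open holds the start label
    solutions: list[tuple[Vec, list[str]]] = []     # step 1: empty Solutions
    while True:
        f_vals = [[add(g, hv) for hv in h(node)] for node, g, _ in open_list]
        sol_costs = [cost for cost, _ in solutions]

        def nondominated(i: int, f: Vec) -> bool:
            rivals = [o for j, fs in enumerate(f_vals) if j != i for o in fs] + sol_costs
            return not any(dominates(r, f) for r in rivals)

        # Step 2: entries with at least one f-value not dominated by Open or Solutions.
        efficient = [i for i, fs in enumerate(f_vals) if any(nondominated(i, f) for f in fs)]
        if not efficient:                                       # step 3
            return solutions
        node, g, path = open_list.pop(rng.choice(efficient))    # step 4
        if node in goals:                                       # step 5
            solutions.append((g, path))
            continue
        for succ, cost in graph.get(node, []):                  # step 6
            if succ not in path:                                # this sketch ignores cycles
                open_list.append((succ, add(g, cost), path + [succ]))


# Hypothetical two-objective graph; with h = 0 (admissible) both Pareto paths are found.
graph: Graph = {
    "S": [("A", (1, 4)), ("B", (2, 1)), ("G", (5, 5))],
    "A": [("G", (1, 1))],
    "B": [("G", (1, 1))],
}
for cost, path in moa_star(graph, "S", {"G"}, lambda n: [(0.0, 0.0)]):
    print(cost, path)
```

The direct edge S→G is never selected: its 𝑓-value (5, 5) is dominated first by Open entries and later by the two solutions, (2, 5) and (3, 2).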

MOA* is guaranteed to terminate with an optimal efficient set of solutions (a Pareto optimal set) when (i) edge cost vector components are all greater than zero, (ii) the graph is locally finite, (iii) the ℎ-values are admissible, and (iv) there are no cycles. In this work, we violate two of these assumptions: (i) is violated in probabilistic planning because the risk component of the edge cost is zero for edges leading to nonterminal nodes, and (iii) is violated because the heuristics employed are inadmissible. As such, there is no guarantee that MOA* will terminate or that it will find the optimal set of solutions. These properties are not theoretically appealing, but we note that they are often unavoidable in practice because scalability requirements and time constraints prohibit optimality and termination. We therefore adopt an empirical, anytime approach to evaluating MOA* and characterize its performance over time. While our case study violates the requirements for MOA* optimality, these limitations can be overcome by employing an admissible heuristic and enforcing nonzero edge costs.

3. Computing 𝑓-Values

The intuition for judiciously selecting which 𝑓-values to compute is based on the following observations. A* and MOA* can, and often will, expand the same search nodes to find a set of solutions. To find multiple solutions (each with respect to a different preference over the plan objectives), A* must compute a new 𝑓-value for reexpanded search nodes under each new preference. MOA* does not usually reexpand nodes (nodes removed from the Closed list in step 6 of Algorithm 1 are technically reexpanded, but their 𝑓-values are only added to, not recomputed), and it does not necessarily compute a different 𝑓-value for each preference over the plan objectives (nor does it require any preference).

Since A* and MOA* can expand the same nodes and find the same solutions, whichever algorithm has the lower combined per node and per iteration cost will perform better. A*’s per node cost depends on how many times the node is reexpanded (i.e., how many times its 𝑓-value is recomputed for a new preference), and its per iteration cost depends on sorting the Open list. MOA*’s per node cost includes the cost of computing multiple 𝑓-values, and its per iteration cost is predominantly incurred while finding the set of nodes with efficient 𝑓-values (its equivalent of sorting the Open list). (Our implementation of MOA* uses Kung’s algorithm [12] to find the efficient set in each iteration, with complexity O(𝑛log𝑛) when there are two objectives and 𝑛 is the number of 𝑓-values of all nodes in Open.)
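For two objectives, the efficient set can be found with one sort and one linear scan, which is where the O(𝑛log𝑛) bound comes from. The Python sketch below is a from-scratch version of that two-objective special case under minimization; it is not the authors’ code and does not cover Kung’s general recursion for more objectives.

```python
def efficient_2d(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Nondominated two-objective vectors under minimization, in O(n log n)."""
    unique = sorted(set(points))           # lexicographic: first objective, then second
    kept: set[tuple[float, float]] = set()
    best_second = float("inf")
    for p in unique:
        # Every earlier vector is no worse in the first objective, so p survives
        # only if it strictly improves on the best second objective seen so far.
        if p[1] < best_second:
            kept.add(p)
            best_second = p[1]
    # Identical vectors do not dominate each other, so every copy is kept.
    return [p for p in points if p in kept]


# Hypothetical f-values gathered from the nodes in Open.
pts = [(1, 4), (2, 1), (5, 5), (2, 5), (3, 2), (2, 1)]
print(efficient_2d(pts))   # [(1, 4), (2, 1), (2, 1)]
```

To obtain ℰ(𝑂𝑝𝑒𝑛), one could run this over the 𝑓-values of all Open nodes together with the costs of found solutions, and keep every Open node that contributes at least one surviving vector.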

With MOA*, it is possible to bootstrap a node’s 𝑓-values to find multiple efficient solutions. For example, if a node must be expanded to find all efficient solutions, then it is enough that one of its 𝑓-values places it within the set of efficient nodes in ℰ(𝑂𝑝𝑒𝑛). If an oracle can determine which 𝑓-value, among the set of possible 𝑓-values, will make the node most competitive within the efficient set (i.e., give the node the best chance of being efficient), then computing only this 𝑓-value will lead to a lower per node cost. This single 𝑓-value represents a set of positively interacting solutions to be found by expanding the node.

Without an oracle, there are several options for computing the 𝑓-values. All options balance two competing desires: lowering the per node cost and giving nodes the best chance at staying on the efficient frontier ℰ(𝑂𝑝𝑒𝑛). In the following section, we examine three approaches to computing 𝑓-values for MOA* that address these competing desires.

4. Computing MOA* 𝑓-Values

While there are many ways to compute 𝑓-values for MOA*, such as restricting the 𝑔-values or the combinations of 𝑔- and ℎ-values, we focus solely on methods for computing ℎ-values. These methods include computing (i) a single ℎ-value per node (called 𝑀+), (ii) a uniform grid of 𝑛 ℎ-values (called 𝑀𝑛), and (iii) a probabilistic choice between computing a uniform grid of ℎ-values or a set of ℎ-values from nondominated ℎ-values of the parent node (called 𝑀𝑁𝑛). Each of these methods involves placing an upper bound on all but one objective, and then computing the value of the free objective with respect to the bounds. We obtain multiple ℎ-values by computing the free objective value with respect to different bounds. The following explores the intuition behind, and computation of, each method.

4.1. Single ℎ-Value

𝑀+: computing a single ℎ-value (i.e., selecting a single bound for all but one objective) is difficult because it must represent the “cost to go” of an efficient solution found via the node under consideration. Our desires from the previous section are to lower the per node cost (which, with only one ℎ-value per node, could be lowered further only by computing no ℎ-value at all) and to compute a most competitive ℎ-value. Without computing additional ℎ-values for comparison (e.g., by comparing the hypervolume dominated by each), it is difficult to determine which is the most competitive. Instead, we take an approach based on the intuition that, by estimating the cost of the solution that is deepest in the search graph, we can maximize the potential for other solutions to be found while searching for the deepest solution. Each bounded objective in the ℎ-value is bounded by its estimated deepest-solution value, and the last, free objective is estimated with respect to these bounds. Consequently, the ℎ-values of different nodes are compared solely on the basis of the free objective, because the same bounds are used for the ℎ-value of each node.

4.2. Multiple ℎ-Values

𝑀𝑛: extending the 𝑀+ method to compute several ℎ-values can improve the competitiveness of a search node, albeit at increased cost. While one ℎ-value may never lead to an efficient 𝑓-value, another computed for the same node might. Similar to 𝑀+, 𝑀𝑛 bounds the value of all but one objective and computes an ℎ-value with respect to the bounds. Each of the 𝑛 ℎ-values bounds the values of the objectives differently. We evaluate this method by spacing the bounds uniformly, but note that other approaches that bias the spacing may be viable as well.

4.3. Probabilistically Nondominated ℎ-Values

𝑀𝑁𝑛: the 𝑀+ and 𝑀𝑛 methods are two extremes that either use very few ℎ-values to keep per node cost low or use many ℎ-values to seek out many different plans. A simple way to combine the approaches is to probabilistically select between them when computing the ℎ-values for a node. While 𝑀+ will save some effort expanding the node, it does not account for which ℎ-values caused the current node’s parent to be expanded. There are likely to be only very few efficient 𝑓-values that caused the parent to be expanded, so instead of computing a single value, we can compute a set of ℎ-values similar to the parent’s efficient ℎ-values (i.e., those ℎ-values whose associated 𝑓-values were nondominated with respect to all other 𝑓-values of nodes in the Open list). The ℎ-values are similar to the parent ℎ-values because they use the same bounds on the fixed objectives and recompute the free objective. In this manner, we can compute ℎ-values that are likely to be most competitive. By probabilistically selecting between 𝑀𝑛 and computing ℎ-values similar to the parent node, we can both explore and preserve efficient ℎ-values.
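The three strategies differ only in which bounds a node receives before its ℎ-values are computed. The Python sketch below captures that choice under stated assumptions: bounds are treated as goal-probability thresholds 𝜏 in (0, 1], as in the case study of Section 5; 𝑀+ receives the single bound 1.0; a node with no stored parent bounds (such as the root) falls back to the uniform grid; and the function names `uniform_bounds` and `choose_bounds` are hypothetical.

```python
import random


def uniform_bounds(n: int) -> list[float]:
    """Uniform grid {1/n, 2/n, ..., n/n} used by M_n."""
    return [i / n for i in range(1, n + 1)]


def choose_bounds(method: str, n: int, parent_bounds: list[float],
                  p: float, rng: random.Random) -> list[float]:
    """Bounds for which a node's h-values will be computed.

    parent_bounds: bounds whose h-values gave the parent a nondominated f-value.
    """
    if method == "M+":
        return [1.0]                      # single "deepest solution" bound (tau = 1.0 in CPP)
    if method == "Mn":
        return uniform_bounds(n)
    if method == "MNn":
        if parent_bounds and rng.random() < p:
            return list(parent_bounds)    # with probability P: reuse the parent's efficient bounds
        return uniform_bounds(n)          # with probability 1 - P: explore with the uniform grid
    raise ValueError(f"unknown method: {method}")


rng = random.Random(0)
print(choose_bounds("Mn", 4, [], 0.5, rng))            # [0.25, 0.5, 0.75, 1.0]
print(choose_bounds("MNn", 4, [0.5, 1.0], 0.5, rng))   # either [0.5, 1.0] or the grid
```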

5. Case Study of Probabilistic Planning

Probabilistic planning is a naturally multiobjective problem, where at its simplest, the plan objectives are the plan length and probability of goal satisfaction. More precisely, we use the risk (one minus the probability of goal satisfaction) so that we can minimize each objective. We choose to study probabilistic planning because it has one property that largely simplifies our analysis and otherwise boosts empirical performance when comparing MOA* and A*. As previously noted, MOA* associates cost vectors with search graph edges; in probabilistic planning, all edges leading to nonterminal nodes incur unit cost with respect to plan length and zero cost with respect to risk; however, edges leading to terminal nodes incur zero cost for the plan length and a possibly nonzero risk cost. The effect of this property is that there is a single efficient 𝑔-value for each nonterminal node, meaning that multiple 𝑓-values for a search node can only arise from computing multiple ℎ-values.

The following subsections define the conformant probabilistic planning problem, its formulation as a graph search problem in both A* and MOA*, and discussion of an existing reachability heuristic used in the search algorithms.

5.1. Conformant Probabilistic Planning

A conformant probabilistic planning problem is given by the tuple 𝐶𝑃𝑃=(𝑃,𝐴,𝑏𝐼,𝐺), where 𝑃 is a set of propositions, 𝐴 is a set of actions, 𝑏𝐼 is an initial belief state (probability distribution over initial states), and 𝐺 is the goal description (a conjunctive set of propositions).

A belief state is a probability distribution over all states, where each state is a set of propositions. The probability of a state 𝑠 in a belief state 𝑏, 𝑃(𝑠∣𝑏), is denoted 𝑏(𝑠). We say that a state 𝑠 is in 𝑏 (𝑠∈𝑏) if 𝑏(𝑠)>0. The probability that a belief state 𝑏 satisfies the goal 𝐺 is the sum of the probabilities of states where the goal is satisfied (𝐺⊆𝑠).

An action 𝑎∈𝐴 is a tuple (𝜌𝑒(𝑎),Φ(𝑎)), where 𝜌𝑒(𝑎) is an enabling precondition, and Φ(𝑎) is a set of outcomes. The enabling precondition 𝜌𝑒(𝑎) is a conjunctive set of propositions that determines action applicability. An action 𝑎 is applicable in belief state 𝑏, denoted appl(𝑎,𝑏), if it is applicable in each state in the belief state, ∀𝑠∈𝑏 𝜌𝑒(𝑎)⊆𝑠. The causative outcomes Φ(𝑎) are a set of tuples (𝑤𝑖(𝑎),Φ𝑖(𝑎)) representing possible outcomes (indexed by 𝑖), where 𝑤𝑖(𝑎) is the probability of outcome 𝑖 being realized, and Φ𝑖(𝑎) is a mutually exclusive and exhaustive set of conditional effects (indexed by 𝑗). Each conditional effect 𝜑𝑖𝑗(𝑎)∈Φ𝑖(𝑎) is of the form 𝜌𝑖𝑗(𝑎)→(𝜀+𝑖𝑗(𝑎),𝜀−𝑖𝑗(𝑎)), where both the antecedent (secondary precondition) 𝜌𝑖𝑗(𝑎) and the positive 𝜀+𝑖𝑗(𝑎) and negative 𝜀−𝑖𝑗(𝑎) consequents are conjunctive sets of propositions. This representation of effects follows the 1ND normal form presented by Rintanen [13]. As outlined in the probabilistic PDDL (PPDDL) standard [14], it is possible to use the effects of every action to derive a state transition function 𝑇(𝑠,𝑎,𝑠′) that defines the probability that executing 𝑎 in state 𝑠 will result in state 𝑠′. Executing action 𝑎 in belief state 𝑏, denoted exec(𝑎,𝑏)=𝑏𝑎, defines the successor belief state such that $b_a(s')=\sum_{s\in b} b(s)\,T(s,a,s')$.

A sequence of actions (𝑎1,…,𝑎𝑚), executed in belief state 𝑏, results in a belief state 𝑏′, where 𝑏′=exec(𝑎𝑚,exec(𝑎𝑚−1,…exec(𝑎1,𝑏)…)) and each action is executable in the appropriate belief state. The probability that the sequence of actions satisfies the goal is the probability that the final belief state satisfies the goal. The number of actions in the sequence is its length. As long as the sequence of actions respects the definition of applicability of each action, the sequence (including the empty sequence) is a feasible plan, but not necessarily an efficient plan.
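The belief-state semantics above can be sketched in Python for small, explicitly enumerated belief states. The sketch assumes that states are frozensets of proposition names and that, within one outcome, the negative consequents of all fired conditional effects are removed before the positive ones are added; the helper names (`transition`, `exec_action`, `evaluate_plan`) and the `go` action are hypothetical, and the explicit enumeration is for illustration only. It returns the (length, risk) vector that the case study minimizes.

```python
from collections import defaultdict
from typing import Callable

State = frozenset[str]
Belief = dict[State, float]
Effect = tuple[frozenset[str], frozenset[str], frozenset[str]]  # (antecedent, eps+, eps-)
Outcome = tuple[float, list[Effect]]                             # (w_i(a), Phi_i(a))
Transition = Callable[[State], dict[State, float]]


def transition(outcomes: list[Outcome]) -> Transition:
    """Derive T(s, a, .) for one action from its outcomes of conditional effects."""
    def T(s: State) -> dict[State, float]:
        dist: dict[State, float] = defaultdict(float)
        for w, effects in outcomes:
            fired = [e for e in effects if e[0] <= s]          # antecedent holds in s
            adds = frozenset().union(*(e[1] for e in fired))
            dels = frozenset().union(*(e[2] for e in fired))
            dist[(s - dels) | adds] += w
        return dict(dist)
    return T


def goal_probability(b: Belief, goal: frozenset[str]) -> float:
    return sum(p for s, p in b.items() if goal <= s)


def applicable(pre: frozenset[str], b: Belief) -> bool:
    return all(pre <= s for s, p in b.items() if p > 0)


def exec_action(b: Belief, T: Transition) -> Belief:
    """b_a(s') = sum over s in b of b(s) * T(s, a, s')."""
    succ: Belief = defaultdict(float)
    for s, p in b.items():
        for s2, q in T(s).items():
            succ[s2] += p * q
    return dict(succ)


def evaluate_plan(b: Belief, plan: list[tuple[frozenset[str], Transition]],
                  goal: frozenset[str]) -> tuple[int, float]:
    """Return (length, risk) for a feasible action sequence."""
    for pre, T in plan:
        if not applicable(pre, b):
            raise ValueError("plan is not feasible in this belief state")
        b = exec_action(b, T)
    return len(plan), 1.0 - goal_probability(b, goal)


# Hypothetical "go" action: moves from at0 to at1 with probability 0.8.
go = (frozenset(), transition([
    (0.8, [(frozenset({"at0"}), frozenset({"at1"}), frozenset({"at0"})),
           (frozenset({"at1"}), frozenset(), frozenset())]),
    (0.2, [(frozenset(), frozenset(), frozenset())]),
]))
b0: Belief = {frozenset({"at0"}): 1.0}
print(evaluate_plan(b0, [go], frozenset({"at1"})))      # length 1, risk about 0.2
print(evaluate_plan(b0, [go, go], frozenset({"at1"})))  # length 2, risk about 0.04
```

Each extra action lowers the risk but raises the length, which is exactly the tradeoff the two objectives capture.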

5.2. Formulating CPP as Graph Search

We formulate CPP as both A* and MOA* search over the belief state space. Each search node is a belief state, and each edge is an action.

A* search associates a unit cost with each edge, and defines terminal nodes as those nodes where the belief state’s probability of goal satisfaction is no less than a given threshold 𝜏. The actions associated with edges leading to the terminal node identify the plan. We find multiple solutions with A* by invoking A* with different values for 𝜏.

MOA* search associates a cost vector with each edge. The first component of the vector is the action execution cost (as in A*), and the second component is the risk. Risk is only incurred when transitioning to terminal nodes, and only action execution cost is incurred when transitioning to nonterminal nodes. MOA* treats terminal nodes differently than A* because it does not exit upon finding a solution (i.e., MOA* does not use a goal satisfaction threshold 𝜏 and pursues multiple solutions). MOA* adds a single, unique terminal node and unique edges for “quit” actions that allow the search to transition from any belief state to the terminal node. The quit actions (i) are applicable to all belief states, (ii) incur zero action execution cost, (iii) incur a risk cost equal to one minus the probability of goal satisfaction of the belief state where applied, and (iv) are not added to the plan extracted from the edges leading to the terminal node. By using quit actions, MOA* can continue searching through nodes that satisfy the goal (with some probability) while retaining the solutions found through them.
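The two formulations differ only in how terminal nodes and edge costs are defined. The Python sketch below shows them side by side, assuming a belief state is a dict from frozensets of propositions to probabilities and that the successor beliefs of applicable actions have already been computed; the sentinel `TERMINAL` and the function names are hypothetical.

```python
TERMINAL = "TERMINAL"   # hypothetical sentinel for MOA*'s unique terminal node


def goal_probability(b: dict, goal: frozenset) -> float:
    return sum(p for s, p in b.items() if goal <= s)


def astar_is_terminal(b: dict, goal: frozenset, tau: float) -> bool:
    """A*: a node is terminal once P(goal | b) reaches the threshold tau."""
    return goal_probability(b, goal) >= tau


def moa_edges(b: dict, successors: list, goal: frozenset) -> list:
    """MOA*: (label, target, (length cost, risk cost)) edges out of belief b.

    successors: (action name, successor belief) pairs for applicable actions.
    """
    edges = [(name, b2, (1, 0.0)) for name, b2 in successors]  # unit length, no risk
    # The quit edge: always applicable, zero length, risk = 1 - P(goal | b).
    # Quit actions are dropped when the plan is extracted from a solution path.
    edges.append(("quit", TERMINAL, (0, 1.0 - goal_probability(b, goal))))
    return edges


b = {frozenset({"at1"}): 0.8, frozenset({"at0"}): 0.2}
print(astar_is_terminal(b, frozenset({"at1"}), 0.8))          # True
print(moa_edges(b, [("go", b)], frozenset({"at1"}))[-1][:2])  # ('quit', 'TERMINAL')
```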

5.3. Heuristics for CPP

The most effective heuristics for CPP involve estimating the cost to achieve the goal with probability no less than 𝜏 [15, 16]. We employ the McLUG technique described by Bryce et al. [15] to compute relaxed plans, using the number of actions as the heuristic. The approach taken by Bryce et al. [15] to compute the heuristic for a belief state and value of 𝜏 is to compute 𝑘 deterministic planning graphs and extract a relaxed plan that achieves the goals in at least 𝑘𝜏 of the planning graphs. Each planning graph is deterministic because it is built with respect to a state sampled from the belief state and sampled outcomes of each probabilistic action in each action layer. Symbolic techniques make the construction of multiple planning graphs and the relaxed plan efficient.

In MOA* with 𝑀+, the heuristic is computed once, for the bound 𝜏=1.0. For example, when 𝜏=1.0, the relaxed plan might contain ten actions; in this case, MOA* would add the ℎ-value vector (10, 0.0), whose risk component is 1.0−𝜏. The 𝑀𝑛 method computes the heuristic for each bound 𝜏∈{1/𝑛,2/𝑛,…,𝑛/𝑛}, adding the set of ℎ-value vectors (𝑐1,1−1/𝑛),(𝑐2,1−2/𝑛),…,(𝑐𝑛,0.0), where 𝑐1,𝑐2,…,𝑐𝑛 are the numbers of actions in the relaxed plans for each value of 𝜏. With probability 1−𝑃, the 𝑀𝑁𝑛 heuristic uses the 𝑀𝑛 heuristic and, with probability 𝑃, computes the relaxed plan heuristic for the same values of 𝜏 that produced the parent’s nondominated ℎ-values; we use 𝑃=0.5.
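The mapping from thresholds to ℎ-value vectors, and the number of sampled planning graphs that must achieve the goal for a given 𝜏 (described in the previous paragraph), can be sketched as follows. `relaxed_plan_length` is a hypothetical stand-in for the McLUG relaxed plan computation, which is not reproduced here; the stub in the example simply returns a made-up length that grows with 𝜏.

```python
import math
from typing import Callable


def graphs_required(k: int, tau: float) -> int:
    """Smallest whole number of the k sampled planning graphs that is at least k * tau."""
    return math.ceil(k * tau - 1e-9)   # small tolerance guards against float round-up


def h_vectors(relaxed_plan_length: Callable[[object, float], int],
              belief: object, taus: list[float]) -> list[tuple[int, float]]:
    """One (plan length, risk) h-vector per threshold tau, with risk = 1 - tau."""
    return [(relaxed_plan_length(belief, tau), 1.0 - tau) for tau in taus]


def stub_length(belief: object, tau: float) -> int:
    """Hypothetical stand-in for the McLUG relaxed plan length."""
    return math.ceil(10 * tau - 1e-9)


print(h_vectors(stub_length, "b", [1.0]))                          # [(10, 0.0)], as for M+
print(h_vectors(stub_length, "b", [i / 4 for i in range(1, 5)]))   # M_n with n = 4
print(graphs_required(32, 0.5), graphs_required(128, 0.9))         # 16 116
```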

6. Empirical Evaluation

We compare A* to multiple versions of MOA* (using different heuristic computation strategies) on several CPP problems across four domains. The questions that we attempt to answer are as follows. (i) Will multiple invocations of A* with different preferences or one invocation of MOA* with no preferences find a better set of solutions, find the set faster, or both? (ii) Which method for computing ℎ-values in MOA* will perform best?

The following describes the evaluation metrics, domains, the test environment, and results.

6.1. Evaluation Metrics

As previously mentioned, we measure the quality of a set of solutions by its hypervolume. We can measure the hypervolume over time with MOA*, but because A* finds a single solution per invocation, its hypervolume over time is less informative. Thus, we compute a set of solutions using A* and compare the total time A* takes with the time MOA* takes to find a plan set with the same or better hypervolume. We also compare the maximum hypervolume found by each technique: within twenty minutes for MOA*, and over the fixed number of solutions found by A* (where each invocation to find a solution is given a twenty-minute limit).
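Both metrics can be read off an anytime trace of (elapsed seconds, hypervolume) pairs. The Python sketch below is one way to do so; the trace format and function names are assumptions, and the example trace values are made up.

```python
from typing import Optional

Trace = list[tuple[float, float]]   # (elapsed seconds, hypervolume of solutions found so far)


def time_to_reach(trace: Trace, target: float) -> Optional[float]:
    """First time the hypervolume is at least `target`; None if it is never reached."""
    for t, hv in sorted(trace):
        if hv >= target:
            return t
    return None


def max_hypervolume(trace: Trace, limit: float = 20 * 60) -> Optional[float]:
    """Best hypervolume found within `limit` seconds (twenty minutes by default)."""
    values = [hv for t, hv in trace if t <= limit]
    return max(values) if values else None


moa_trace = [(4.0, 0.31), (9.5, 0.58), (300.0, 0.61)]   # hypothetical MOA* run
print(time_to_reach(moa_trace, 0.5), max_hypervolume(moa_trace))   # 9.5 0.61
```

A `None` from either function corresponds to the dash entries in the tables: no set exceeding the A* hypervolume, or no solution at all.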

6.2. Domains

The evaluation domains include an artificial domain, called Grid and Ladder (GL), and several domains from the CPP literature, including Logistics, Gripper, and Sand Castle.

The GL domain is an adaptation of the Grid domain presented by Hyafil and Bacchus [17] that adjusts the degree of positive interaction and independence among the nondominated plans. GL includes five actions: move right, move left, move up, move down, and climb. The move actions have the effect of moving along the intended axis with 0.8 probability and the effect of moving laterally along the other axis with 0.1 probability in each direction. The initial state places the agent at one corner of the grid (with certainty), and the goal is to reach the opposite corner and reach the top of the ladder. The deterministic climb ladder action can only be performed in the destination corner once for each rung of the ladder and, once executed, prevents further move actions (it is possible to climb the ladder, but not go down and move about the grid). The probability of achieving the goal is equal to the probability of reaching the corner with the ladder because the climb action does not change the probability of goal satisfaction (assuming the goal of being at the top of the ladder is attained by repeated climb actions). Alternative solutions perform different sequences of grid moves to reach the corner with different probabilities and plan lengths, and all plans must climb the ladder. The problems vary the size of the grid and the height of the ladder. A larger grid equates to longer positively interacting plan prefixes, and a ladder with more rungs equates to longer independent plan suffixes. GL presents challenges to MOA* because computing too many ℎ-values during the grid traversal phase of the plan is costly, but additional ℎ-values are required prior to the ladder climbing phase to push search to find alternative plans. GL also challenges A* because it must recompute the 𝑓-values of many of the same nodes within the plan prefix.
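The GL move dynamics can be sketched as a distribution over grid cells. The description above does not say what happens when a move would leave the grid; the Python sketch below assumes the agent stays in place in that case, and the coordinate convention and the name `move_distribution` are hypothetical.

```python
from collections import defaultdict

DIRECTIONS = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def move_distribution(pos: tuple[int, int], direction: str,
                      size: int) -> dict[tuple[int, int], float]:
    """Successor-cell distribution for one GL move on a size x size grid."""
    dx, dy = DIRECTIONS[direction]
    # 0.8 along the intended axis, 0.1 to each side along the other axis.
    outcomes = [((dx, dy), 0.8), ((dy, dx), 0.1), ((-dy, -dx), 0.1)]
    dist: dict[tuple[int, int], float] = defaultdict(float)
    for (mx, my), p in outcomes:
        x, y = pos[0] + mx, pos[1] + my
        if not (0 <= x < size and 0 <= y < size):
            x, y = pos          # assumption: moving off the grid leaves the agent in place
        dist[(x, y)] += p
    return dict(dist)


print(move_distribution((0, 0), "right", 3))   # {(1, 0): 0.8, (0, 1): 0.1, (0, 0): 0.1}
```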

The other domains are not modified from their original versions to help gauge the MOA* approaches in problems without clear structure. Logistics [17] involves probabilistic (un)load actions and initial belief states where package locations are uncertain, and we use the instance p2-2-2, with two cities, two packages, and two possible locations for each package. Gripper [18] involves several machining operations to manufacture widgets that work probabilistically. Sand Castle [19] involves two actions to build a castle or dig a moat, both of which are probabilistic and affect the success of the other. GL and Logistics are relatively challenging domains that cause state-of-the-art CPP planners to struggle, whereas Gripper and Sand Castle are fairly simple.

6.3. Environment

All experiments were conducted on a 3 GHz Xeon processor running Linux with 8 GB of RAM. All code was written in C++ and is based on the POND planner [15], and all hypervolume computations were done offline after planning. The McLUG heuristic computed for the GL domain used 32 planning graphs per ℎ-value, and 128 planning graphs per ℎ-value in the other domains. The comparisons of hypervolume across instances were with respect to the same lower and upper bounds on each objective (computed from the planner output). The results for A* are averaged over five runs of computing plans for the set of thresholds and the MOA* results are averaged over five invocations on each problem. A* is invoked with thresholds {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0} for each problem. Invocations not returning a solution are not counted in the results.

6.4. Results

Figure 2 shows the hypervolume achieved over time for nine problems in the GL domain, where each is a grid of size three, six, or ten, and has a ladder of height two, six, or ten. Each row of plots in the figure is a common grid size (increasing in size with each row), and each column shares a common height ladder (increasing from left to right). There are four methods shown in each plot: A*, MOA* with 𝑀10, MOA* with 𝑀𝑁2, and MOA* with 𝑀𝑁10. Some methods described in the accompanying Table 1 are not shown in the plots because they were not as competitive. Table 1 includes results for the 𝑀+ and 𝑀𝐺2 methods, and the data is the average time taken in seconds to exceed the average hypervolume found by A* (T(s)) and the average maximum hypervolume found by each method (HV). The T(s) and HV results for A* are the average time taken to find the average hypervolume using A*, and the T(s) results for all other methods are the time taken to exceed the A* HV. The HV for all other methods is the maximum hypervolume found before the timeout. The “—” entries indicate that either no solution set could be found that exceeds the A* hypervolume, in the case of T(s), or that no single solution was found, in the case of HV.

The plots that show the hypervolume over time indicate that the 𝑀𝑁𝑛 method tends to find the most hypervolume of the MOA* methods, outperforming 𝑀+ and 𝑀𝑛, especially as the problems become larger. We also see that as the problems become larger, using 𝑀𝑁10 is required to find more hypervolume. A* tends to find its first solutions very quickly, and it takes longer for the MOA* methods to find their first solutions. However, MOA* tends to reach a large hypervolume sooner than A*. As the problems become more difficult, MOA* takes increasingly longer than A* to find initial solutions, but finds considerably more hypervolume within a short period of time thereafter. Note that small improvements in hypervolume later in the search are not insignificant because they represent new and improved solutions that may be difficult to find despite the small hypervolume they add.

Figure 3 and Table 2 present results for the other domains, which show trends similar to those in the GL domain. The MOA* approaches tend to find their first solutions later than A*, but often find much better solution sets as measured by maximum hypervolume. The difference is especially pronounced in the Gripper and Sand Castle problems, which have many solutions whose probability of goal satisfaction falls in the interval [0.9, 1.0]. Because we use only the endpoints of this interval as values of 𝜏 for A*, A* misses many of the solutions inside it. MOA* does not bound the objectives in this way and can therefore find the solutions that lie between those bounds, which yields more hypervolume. Because it commits to an a priori set of preferences over the objectives, A* cannot find some solutions.
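The effect of restricting 𝜏 to the endpoints of [0.9, 1.0] can be illustrated on a toy set of plans. The sketch below is hypothetical: it replaces search with enumeration over a fixed list, treats each A* call as returning the cheapest plan whose probability of goal satisfaction is at least 𝜏, and uses invented plan values. The helper names are illustrative only.

```python
from typing import List, Optional, Tuple

Plan = Tuple[float, float]  # (probability of goal satisfaction, cost)


def cheapest_meeting(plans: List[Plan], tau: float) -> Optional[Plan]:
    """Stand-in for one A* call: the cheapest plan with probability >= tau."""
    feasible = [p for p in plans if p[0] >= tau]
    # Break cost ties in favor of the higher probability.
    return min(feasible, key=lambda p: (p[1], -p[0])) if feasible else None


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Stand-in for the MOA* target: every nondominated plan."""
    return sorted(
        p for p in set(plans)
        if not any(q != p and q[0] >= p[0] and q[1] <= p[1] for q in plans)
    )


plans = [(0.9, 10.0), (0.93, 12.0), (0.97, 15.0), (1.0, 20.0), (0.95, 18.0)]
found = {cheapest_meeting(plans, tau) for tau in (0.9, 1.0)}
missed = [p for p in pareto_front(plans) if p not in found]
print(missed)  # [(0.93, 12.0), (0.97, 15.0)]
```

Here the two threshold-based calls recover only the plans at probability 0.9 and 1.0, while the nondominated plans at 0.93 and 0.97 go unfound.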

6.5. Summary

Our analysis reveals the following trends when comparing MOA* and A*.
(i) MOA* finds higher-quality solution sets than A* because it is not limited to a predetermined number of solutions and continues to search for and improve upon solutions.
(ii) The 𝑀𝑁𝑛 method for computing ℎ-values is the most effective MOA* technique because it balances exploration with the retention of efficient partial solutions.
(iii) Either the 𝑀+ method does not pursue enough distinct solutions to attain high hypervolume, or the tested domains lack a large degree of positive interaction among the solution sets.
(iv) The 𝑀𝑛 method found high hypervolume on smaller problems but did not scale well.
(v) A* finds a first solution quickly but is much slower than MOA* at finding a set with large hypervolume.

Multiobjective problem solving has been studied previously in planning [1, 9, 20–22]. However, to our knowledge, all prior work studies how best to formulate preferences over the objectives and then solves a single-objective problem. As discussed in the previous section, repeatedly solving the problem as a single-objective problem under different preferences can not only miss solutions but also take considerably longer, or fail, to find a solution set of comparable quality.

The work of Van Den Briel et al. [9] on partial satisfaction planning (PSP) is especially close to our study of probabilistic planning. PSP allows a subset of the goals, each with a utility, to be satisfied, whereas CPP allows all goals to be satisfied with some probability. Thus, both problems share an interesting property under MOA*: each search node has only one 𝑔-value, so the only way to obtain multiple 𝑓-values is to compute multiple ℎ-values. We expect that applying MOA* to PSP would yield results similar to those presented in this work.
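The single-𝑔-value property can be made concrete with a small sketch. It assumes that a node's 𝑓-value for threshold 𝜏 is its cost so far 𝑔 plus an estimate ℎ(𝜏) of the remaining cost to reach the goal with probability at least 𝜏, and that only mutually nondominated (𝜏, 𝑓) pairs are kept, since each 𝑓-value is meant to estimate a different nondominated solution. The heuristics passed in are placeholders, not the ones used in this work, and `f_values` is a hypothetical name.

```python
from typing import Callable, List, Tuple

# Placeholder heuristic: estimated remaining cost to reach the goal with
# probability at least tau from this node (assumed form, for illustration).
Heuristic = Callable[[float], float]


def f_values(g: float, h: Heuristic, taus: List[float]) -> List[Tuple[float, float]]:
    """One g-value per node, so every extra f-value costs one extra h-value."""
    fs = [(tau, g + h(tau)) for tau in taus]
    # Keep only nondominated (tau, f) pairs: higher tau and lower f are better.
    return [
        (t, f) for (t, f) in fs
        if not any((t2, f2) != (t, f) and t2 >= t and f2 <= f for (t2, f2) in fs)
    ]


print(f_values(5.0, lambda tau: 10.0 * tau, [0.5, 0.75, 1.0]))
# [(0.5, 10.0), (0.75, 12.5), (1.0, 15.0)]
print(f_values(5.0, lambda tau: 3.0, [0.5, 0.75, 1.0]))
# [(1.0, 8.0)]
```

With a heuristic that grows with 𝜏, every threshold yields a distinct 𝑓-value; with one that ignores 𝜏, three ℎ-computations collapse to a single nondominated 𝑓-value, illustrating the per-node cost that the choice of 𝑓-values must balance.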

Finding a set of diverse solutions is an important problem that has recently been studied in planning [23]. Srivastava et al. [23] construct solutions that are diverse with respect to a distance measure defined in terms of the causal structure of the plans. We take a different approach and find plans that are diverse in their objectives. The two approaches can be seen as complementary: the former finds diversity in the decision space, the latter in the objective space.

MOA* was originally studied by Stewart and White [8], who established conditions for its optimality and termination. More recently, Mandow and Perez de la Cruz [11] modified the algorithm so that each iteration expands an individual 𝑓-value of a node, whereas the original algorithm expanded the node itself. Our implementation is based on the original algorithm, and our work explores which 𝑓-values to compute, a question not addressed by prior work on MOA*.

8. Conclusion and Future Work

We have shown that MOA* can find a better set of solutions than A* by balancing the exploration gained from computing more 𝑓-values per node against the lower per-node cost of computing fewer. The most effective technique for computing a node’s 𝑓-values randomly chooses between computing a uniform grid of 𝑓-values and recomputing the 𝑓-values that proved efficient at its parent node. Because it finds a better set of solutions more quickly, MOA* is a viable choice for computing multiple solutions to a problem. Moreover, its efficient solutions are naturally diverse in the objective space. MOA* also has limitations: it is unnecessary when users have clear preferences, and, as our evaluation found, performing multiple A* searches can be more efficient when the nondominated solutions share few search nodes.
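The most effective technique described above can be sketched as a rule for choosing which thresholds 𝜏 to evaluate at a child node. This is a hypothetical reading, not the exact implementation: it assumes 𝑓-values are indexed by thresholds 𝜏 in [0, 1], that the random choice is a per-node coin flip with an assumed probability `p_grid`, and that the root, which has no parent, always uses the grid. The names `uniform_grid` and `choose_taus` are illustrative.

```python
import random
from typing import List, Sequence


def uniform_grid(n: int, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """n evenly spaced thresholds tau in [lo, hi] (assumes n >= 1)."""
    if n == 1:
        return [hi]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def choose_taus(
    parent_efficient: Sequence[float],
    n: int,
    p_grid: float,
    rng: random.Random,
) -> List[float]:
    """Pick the thresholds whose f-values a child node will compute.

    With probability p_grid (an assumed parameter), explore with a uniform
    grid of n thresholds; otherwise reuse the thresholds whose f-values were
    efficient (nondominated) at the parent. A node with no efficient parent
    thresholds (e.g. the root) always uses the grid.
    """
    if not parent_efficient or rng.random() < p_grid:
        return uniform_grid(n)
    return list(parent_efficient)


rng = random.Random(0)
print(choose_taus([], 3, 0.5, rng))          # [0.0, 0.5, 1.0]
print(choose_taus([0.9, 1.0], 3, 0.0, rng))  # [0.9, 1.0]
```

Setting `p_grid` near 1 favors exploration through a fresh grid at every node, while setting it near 0 favors reusing the parent's efficient 𝑓-values, which keeps per-node cost low.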

In future work, we intend to explore additional techniques for computing ℎ-values, methods for managing multiple 𝑔-values per search node, other types of planning problems (such as PSP), and planning with more objectives. We are also interested in combining our approach with techniques for finding plans that are diverse in the decision space (i.e., causally diverse).