This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Efficient Constraint Generation for Stochastic Shortest Path Problems

Johannes Schmalz, Felipe Trevizan
Abstract

Current methods for solving Stochastic Shortest Path Problems (SSPs) find states’ costs-to-go by applying Bellman backups, where state-of-the-art methods employ heuristics to select states to back up and prune. A fundamental limitation of these algorithms is their need to compute the cost-to-go for every applicable action during each state backup, leading to unnecessary computation for actions identified as sub-optimal. We present new connections between planning and operations research and, using this framework, we address this issue of unnecessary computation by introducing an efficient version of constraint generation for SSPs. This technique allows algorithms to ignore sub-optimal actions and avoid computing their costs-to-go. We also apply our novel technique to iLAO resulting in a new algorithm, CG-iLAO. Our experiments show that CG-iLAO ignores up to 57%57\% of iLAO’s actions and it solves problems up to 8×8\times and 3×3\times faster than LRTDP and iLAO.

1 Introduction

Planning is an important facet of AI that gives efficient algorithms for solving current real-world problems. Stochastic Shortest Path problems (SSPs) (Bertsekas and Tsitsiklis 1991) generalise classical (deterministic) planning by introducing actions with probabilistic effects, which lets us model problems where the actions are intrinsically probabilistic. Value Iteration (VI) (Bellman 1957) is a dynamic programming algorithm that forms the basis of optimal algorithms for solving SSPs. VI finds the cost-to-go for each state, which describes the solution of an SSP. A state ss’s cost-to-go is the minimum expected cost of reaching a goal from ss, and similarly a action aa’s cost-to-go is the minimum after applying aa. VI finds the optimal cost-to-go by iteratively applying Bellman backups, which update each state’s cost-to-go with the minimal outgoing action’s cost-to-go.

LRTDP (Bonet and Geffner 2003) and iLAO (Hansen and Zilberstein 2001), the state-of-the-art algorithms for optimally solving SSPs, build on VI and offer significant speedup by using heuristics to apply Bellman backups only to promising states and pruning states that are deemed too expensive. A shortcoming of such algorithms is that each Bellman backup must consider all applicable actions. For instance, let ss and aa be a state and an applicable action, even if all successors of aa will be pruned because they are too expensive, a Bellman backup on ss still computes the QQ-value of aa, so these algorithms can prune unpromising states but not actions. This issue is compounded because algorithms for SSPs require arbitrarily many Bellman backups on each state ss to find the optimal solution, thus wasting time on computing QQ-values for such actions many times.

This issue of computing unnecessary QQ-values for a state ss is addressed by action elimination (Bertsekas 1995), which can be implemented in search algorithms to prune useless actions. Action elimination looks for pairs (a,a^)(a,\widehat{a}) of applicable actions in a state, such that a lower bound on a^\widehat{a}’s cost-to-go exceeds an upper bound on aa’s cost-to-go, in which case a^\widehat{a} is proved to be a useless action and can be pruned. Although domain-independent lower bounds (heuristics) can be computed efficiently, finding an efficient, domain-independent upper bound remains an open question to the best of our knowledge. This gap has limited the use of action elimination in domain-independent planning. In the context of optimal heuristic planning for SSPs, the only algorithm we are aware of that utilises action elimination to prune actions is FTVI (Dai, Weld et al. 2009). Other algorithms, such as BRTDP (McMahan, Likhachev, and Gordon 2005), FRTDP (Smith and Simmons 2006) and VPI-RTDP (Sanner et al. 2009), use upper bounds in conjunction with lower bounds to guide their search. However, unlike FTVI, they do not perform action elimination.

We present a general technique for ignoring actions that does not rely on upper bounds. In contrast to action elimination that incrementally prunes useless actions, our approach initially treats all actions as inactive, i.e., not contributing to the solution. It then iteratively adds actions to the search that are guaranteed to improve the current solution. To develop our approach, we strengthen the connections between planning and operations research by relating heuristic search to variable and constraint generation. Similar to heuristic search, variable and constraint generation enable the solving of large Linear Programs (LPs) by considering only a subset of variables and constraints. We show that algorithms such as iLAO implicitly perform constraint generation, albeit in a trivial manner, to the LP encoding of VI. Building on this, we introduce an efficient algorithm for constraint generation for SSPs that leads to inactive actions being ignored. We apply our approach to iLAO to get the novel algorithm: CG-iLAO.

In our experiments, CG-iLAO solves problems up to 8×8\times and 3×3\times faster than LRTDP and iLAO, respectively. CG-iLAO is faster than the others over various problem difficulties: its improvement is apparent from problems that require only 4 minutes to solve, and the improvement gap increases as problems take longer to solve. To explain this, we quantify that CG-iLAO only considers 43–65% of iLAO’s total actions thus fewer actions’ costs-to-go are computed. Investigating further, we empirically show that CG-iLAO combines iLAO’s efficient use of backups and LRTDP’s strong pruning capability, thereby displaying the best characteristics of both in a single algorithm.

The structure of this paper is as follows: we first introduce the background for SSPs and existing methods for solving SSPs. Second, we give a brief background to linear programming and connect linear programming techniques to heuristic search. Then we explain and motivate our novel algorithm CG-iLAO and prove its correctness. Finally, we empirically evaluate the performance of CG-iLAO.

2 Background

A Stochastic Shortest Path problem (SSP) (Bertsekas and Tsitsiklis 1991) is a tuple 𝖲,s0,𝖦,𝖠,P,C\langle\mathsf{S},s_{0},\mathsf{G},\mathsf{A},P,C\rangle where: 𝖲\mathsf{S} is a finite set of states; s0𝖲s_{0}\in\mathsf{S} is the initial state; 𝖦𝖲\mathsf{G}\subset\mathsf{S} are the goal states with 𝖦\mathsf{G}\neq\emptyset; 𝖠\mathsf{A} is a finite set of actions and 𝖠(s)𝖠\mathsf{A}(s)\subseteq\mathsf{A} denotes the actions applicable in state ss; P(s|s,a)P(s^{\prime}|s,a) gives the probability that applying action aa in state ss results in state ss^{\prime}; C(s,a)>0C(s,a)\in\mathbb{R}_{>0} is the cost of applying aa in ss.

The states immediately reachable by applying aa to ss are called successors and are given by succ(s,a){s𝖲:P(s|s,a)>0}\text{succ}(s,a)\coloneqq\{s^{\prime}\in\mathsf{S}:P(s^{\prime}|s,a)>0\}. A solution to an SSP is given by a map π:𝖲𝖠\pi:\mathsf{S}\to\mathsf{A}, called a policy. A policy π\pi is closed w.r.t. s0s_{0} if each state ss that can be reached by following π\pi from s0s_{0} is either a goal or π\pi is defined for ss. A policy is proper w.r.t. s0s_{0} if it reaches the goal with probability 1 from s0s_{0} and it is improper otherwise. An optimal policy π\pi^{*} is any proper policy that minimises the expected cost to reach the goal from the initial state s0s_{0}.

For simplicity, we make two standard assumptions: (i) there exists at least one proper policy w.r.t. s0s_{0}, this is called the reachability assumption; and (ii) all improper policies have infinite expected cost. A consequence of assumption (ii) is that 𝖠(s)\mathsf{A}(s)\neq\emptyset for all 𝖲𝖦\mathsf{S}\setminus\mathsf{G}. Note that we define CC to be strictly positive in order to avoid zero-cost cycles that would violate assumption (ii). In our experiments, we relax the reachability assumption by applying the fixed-penalty transformation of SSPs (Trevizan, Teichteil-Königsbuch, and Thiébaux 2017) resulting in a new SSP without dead ends. Other approaches for handling SSPs such as S3P (Teichteil-Königsbuch 2012) are also compatible with our approach.

An SSP’s optimal solution is uniquely represented by the optimal value function VV^{*}, which is the unique fixed-point solution to the Bellman equations:

V(s)={0 if s𝖦mina𝖠(s)Q(s,a)otherwises𝖲V(s)=\begin{cases}0&\text{ if }s\in\mathsf{G}\\ \min_{a\in\mathsf{A}(s)}Q(s,a)&\text{otherwise}\end{cases}\quad\forall s\in\mathsf{S} (1)

where Q(s,a)C(s,a)+s𝖠(a)P(s|s,a)V(s)Q(s,a)\coloneqq C(s,a)+\sum_{s^{\prime}\in\mathsf{A}(a)}P(s^{\prime}|s,a)V(s) is known as the QQ-value of ss and aa. A (optimal) value function V(s)V(s) and the associated QQ-value Q(s,a)Q(s,a) represent the (minimum) expected cost to reach the goal from state ss and after executing action aa on state ss, respectively. Given a value function VV, the policy associated with it is defined as πV(s)argmina𝖠(s)Q(s,a)\pi_{V}(s)\coloneqq\mathop{\rm argmin}_{a\in\mathsf{A}(s)}Q(s,a) and is known as the greedy policy for VV. Ties can be broken arbitrarily thanks to assumption (ii), and for simplicity we assume some tie-breaking rules that ensure the greedy policy is unique.

The Bellman equations (1) can be iteratively solved by Value Iteration (VI) (Bellman 1957): VI starts with an arbitrary value function V0V^{0} and computes Vt+1(s)mina𝖠(s)Q(s,a)V^{t+1}(s)\coloneqq\min_{a\in\mathsf{A}(s)}Q(s,a) over all states, where Q(s,a)Q(s,a) uses the previous value function VtV^{t}. This process of computing Vt+1(s)V^{t+1}(s) using VtV^{t} is called a Bellman backup. VI is guaranteed to asymptotically converge to VV^{*} regardless of V0V^{0}. For practical reasons, VI is terminated at iteration tt when the Bellman residual res(s)|Vt(s)mina𝖠(s)Q(s,a)|\textsc{res}(s)\coloneqq|V^{t}(s)-\min_{a\in\mathsf{A}(s)}Q(s,a)| is less than or equal to ϵ>0\epsilon\in\mathbb{R}_{>0} for all s𝖲s\in\mathsf{S}.

Given a policy π\pi, its policy envelope 𝖲π𝖲\mathsf{S}^{\pi}\subseteq\mathsf{S} is the set of all reachable states when following π\pi from s0s_{0}. Note that VI explores the complete state space 𝖲\mathsf{S} regardless of the optimal policy envelope 𝖲π\mathsf{S}^{\pi*}’s size. To address this shortcoming, heuristic search algorithms such as iLAO (Hansen and Zilberstein 2001) and LRTDP (Bonet and Geffner 2003) use a heuristic function HH to initialise V0V^{0}, which guides the exploration of the state space 𝖲\mathsf{S} in a way that expands as few states as possible. To find VV^{*}, heuristic search algorithms require the heuristic to be admissible, that is, H(s)V(s)s𝖲H(s)\leq V^{*}(s)\;\forall s\in\mathsf{S}. Often heuristics are also monotonic, which immediately implies admissibility. A value function VV is monotonic if V(s)mina𝖠(s)Q(s,a)s𝖲V(s)\leq\min_{a\in\mathsf{A}(s)}Q(s,a)\;\forall s\in\mathsf{S}, and the definition is analogous for HH. Similar to VI, heuristic search algorithms converge to VV^{*} asymptotically and require a practical stop criterion. This stop criterion is known as ϵ-consistency\epsilon\text{-consistency} (Bonet and Geffner 2003) and is defined as:

Definition 1 (ϵ-consistency\epsilon\text{-consistency}).

A value function VV is ϵ-consistent\epsilon\text{-consistent} if res(s)ϵs𝖲πV\textsc{res}(s)\leq\epsilon\;\forall s\in\mathsf{S}^{\pi_{V}}.

Notice that ϵ-consistency\epsilon\text{-consistency} checks the residual only on the policy envelope of a greedy policy and states outside the envelope are permitted to have a residual larger than ϵ\epsilon.

1 Function iLAO (𝕊,H,ϵ)(\mathbb{S},H,\epsilon)
2       𝕊^\widehat{\mathbb{S}}\leftarrow partial SSP {s0},s0,{s0},,P,C,H\langle\{s_{0}\},s_{0},\{s_{0}\},\emptyset,P,C,H\rangle
3       VValue Function initialised by HV\leftarrow\text{Value Function initialised by }H
4       repeat
5             \mathcal{E}\leftarrow post-order DFS traversal of π^V\widehat{\pi}_{V} from s0s_{0}
6             𝕊^Expand-Fringes(𝕊,𝕊^,)\widehat{\mathbb{S}}\leftarrow\textsc{Expand-Fringes}{}(\mathbb{S},\widehat{\mathbb{S}},\mathcal{E})
7             𝖥𝖲π^V(𝖦^𝖦)\mathsf{F}\leftarrow\mathsf{S}^{\widehat{\pi}_{V}}\cap(\widehat{\mathsf{G}}\setminus\mathsf{G})
8             V,res,π^oldBackups(𝕊^,,V,𝖥)V,\textsc{res},\widehat{\pi}_{\text{old}}\leftarrow\textsc{Backups}{}(\widehat{\mathbb{S}},\mathcal{E},V,\mathsf{F})
9            
10      until 𝖥=\mathsf{F}=\emptyset and π^old=π^V\widehat{\pi}_{\text{old}}=\widehat{\pi}_{V} and res<ϵ\textsc{res}<\epsilon
11      return VV
12
13Function Expand-Fringes(𝕊,𝕊^,)\textsc{Expand-Fringes}{}(\mathbb{S},\widehat{\mathbb{S}},\mathcal{E})
14       for sf(𝖦^𝖦)s_{f}\in\mathcal{E}\cap(\widehat{\mathsf{G}}\setminus\mathsf{G}) do
15             𝖦^𝖦^{sf}\widehat{\mathsf{G}}\leftarrow\widehat{\mathsf{G}}\setminus\{s_{f}\}
16             Add-Actions(𝕊,𝕊^,sf,𝖠(sf))\textsc{Add-Actions}(\mathbb{S},\widehat{\mathbb{S}},s_{f},\mathsf{A}(s_{f}))
17            
18       return 𝕊^\widehat{\mathbb{S}}
19
20Function Add-Actions(𝕊,𝕊^,s,𝖠)\textsc{Add-Actions}{}(\mathbb{S},\widehat{\mathbb{S}},s,\mathsf{A}^{\prime})
21       𝖠^(s)𝖠^(s)𝖠\widehat{\mathsf{A}}(s)\leftarrow\widehat{\mathsf{A}}(s)\cup\mathsf{A}^{\prime}
22       for a𝖠a\in\mathsf{A}^{\prime} do
23             𝖦^𝖦^(succ(s,a)𝖲^)\widehat{\mathsf{G}}\leftarrow\widehat{\mathsf{G}}\cup(\text{succ}(s,a)\setminus\widehat{\mathsf{S}})
24             𝖲^𝖲^succ(s,a)\widehat{\mathsf{S}}\leftarrow\widehat{\mathsf{S}}\cup\text{succ}(s,a)
25            
26      
27
28Function Backups(𝕊^,,V,𝖥)\textsc{Backups}{}(\widehat{\mathbb{S}},\mathcal{E},V,\mathsf{F})
29       π^oldπ^V\widehat{\pi}_{\text{old}}\leftarrow\widehat{\pi}_{V}
30       repeat
31             res0\textsc{res}\leftarrow 0
32             for s𝖦^s\in\mathcal{E}\setminus\widehat{\mathsf{G}} do
33                   Qminmina𝖠^(s)Q(s,a)Q_{\min}\leftarrow\min_{a\in\widehat{\mathsf{A}}(s)}Q(s,a)
34                   resmax(|V(s)Qmin|,res)\textsc{res}\leftarrow\max(|V(s)-Q_{\min}|,\textsc{res})
35                   V(s)QminV(s)\leftarrow Q_{\min}
36                  
37            
38      until res<ϵ\textsc{res}<\epsilon or π^Vπ^old\widehat{\pi}_{V}\neq\widehat{\pi}_{\text{old}} or 𝖥\mathsf{F}\neq\emptyset
39      return V,res,π^oldV,\textsc{res},\widehat{\pi}_{\text{old}}
Algorithm 1 iLAO

We close this section by reviewing iLAO (Hansen and Zilberstein 2001). iLAO (alg. 1) is an iterative algorithm which works by incrementally growing a partial SSP:111Called the explicit graph in the original paper.

Definition 2 (Partial SSP).

Given an SSP 𝖲,s0,𝖦,𝖠,P,C\langle\mathsf{S},s_{0},\mathsf{G},\mathsf{A},P,C\rangle and a heuristic HH, a partial SSP 𝕊^\widehat{\mathbb{S}} is an SSP with terminal costs defined by 𝕊^=𝖲^,s0,𝖦^,𝖠^,P,C,H\widehat{\mathbb{S}}=\langle\widehat{\mathsf{S}},s_{0},\widehat{\mathsf{G}},\widehat{\mathsf{A}},P,C,H\rangle with 𝖲^𝖲,𝖦^𝖲^,𝖦𝖲^𝖦^,𝖠^(s)𝖠(s)s𝖲^\widehat{\mathsf{S}}\subseteq\mathsf{S},\widehat{\mathsf{G}}\subset\widehat{\mathsf{S}},\mathsf{G}\cap\widehat{\mathsf{S}}\subseteq\widehat{\mathsf{G}},\widehat{\mathsf{A}}(s)\subseteq\mathsf{A}(s)\;\forall s\in\widehat{\mathsf{S}} and terminal cost HH.

SSPs with terminal costs have a one-time terminal cost of H(g^)H(\widehat{g}) that is incurred when g^𝖦^\widehat{g}\in\widehat{\mathsf{G}} is reached, so their Bellman equations are (1) with the goal case replaced by V(g^)=H(g^)V(\widehat{g})=H(\widehat{g}) for all g^𝖦^\widehat{g}\in\widehat{\mathsf{G}}. It is trivial to see that SSPs with terminal costs are equivalent to SSPs, and we use them to simplify our presentation. In a partial SSP 𝕊^\widehat{\mathbb{S}}, we refer to states in 𝖦^𝖦\widehat{\mathsf{G}}\setminus\mathsf{G} as artificial goals, and we define π^V\widehat{\pi}_{V} to be the greedy policy over VV restricted to 𝖲^𝖦^\widehat{\mathsf{S}}\setminus\widehat{\mathsf{G}}.

At each iteration, iLAO expands its partial SSP 𝕊^\widehat{\mathbb{S}} by expanding the artificial goals reachable by π^V\widehat{\pi}_{V} into regular states. To expand g^𝖦^𝖦\widehat{g}\in\widehat{\mathsf{G}}\setminus\mathsf{G}, iLAO adds g^\widehat{g}’s applicable actions to 𝕊^\widehat{\mathbb{S}} and adds new reachable states as artificial goals (alg. 1 alg. 1). This artificial goal expansion is done to make π^V\widehat{\pi}_{V} eventually closed w.r.t. s0s_{0} for the original SSP 𝕊\mathbb{S}. Simultaneously, iLAO also works towards making VV ϵ-consistent\epsilon\text{-consistent} by applying a Bellman backup to all the states reachable by π^V\widehat{\pi}_{V} (alg. 1 alg. 1). These Bellman backups are ordered by a post-order DFS traversal of π^V\widehat{\pi}_{V}, so states that occur closer to artificial goals are updated first, and s0s_{0} is updated last. Note that, when a state is expanded, it may have a successor ss already within the partial SSP. If this happens, the DFS in alg. 1 alg. 1 must keep traversing π^V\widehat{\pi}_{V} from ss to ensure the policy envelope \mathcal{E} is accurate.

In the next section, we show how to interpret iLAO through the lens of Operations Research by relating it to techniques used for handling large linear programs.

3 iLAO as Linear Program

SSPs can be solved by the Linear Program (LP) presented in LP 1. This LP is known as the primal LP or VI LP since it directly encodes the Bellman equations.

max𝓥\displaystyle\max_{\bm{\mathcal{V}}}\quad Vs0\displaystyle\mathrlap{V_{s_{0}}} (LP 1) s.t.\displaystyle\mathrm{s.t.}\quad VsC(s,a)+s𝖲P(s|s,a)Vs\displaystyle\mathrlap{V_{s}\leq C(s,a)+\!\sum_{\mathclap{s^{\prime}\in\mathsf{S}}}P(s^{\prime}|s,a)V_{s^{\prime}}} s𝖲𝖦,a𝖠(s)\displaystyle\forall s\in\mathsf{S}\!\setminus\!\mathsf{G},a\!\in\!\mathsf{A}(s) (C1) Vg0\displaystyle\mathrlap{V_{g}\leq 0} g𝖦\displaystyle\forall g\in\mathsf{G} (C2)

Each variable Vs𝓥V_{s}\in\bm{\mathcal{V}} represents V(s)V(s), and for each state s𝖲𝖦s\in\mathsf{S}\setminus\mathsf{G} the relevant constraints C1 encode V(s)mina𝖠(s)Q(s,a)V(s)\leq\min_{a\in\mathsf{A}(s)}Q(s,a). When clear from context, we use V(s)V(s) to represent VsV_{s}, and Q(s,a)Q(s,a) for the right-hand side of ss and aa’s constraint C1. Together with the objective that maximises V(s0)V(s_{0}), we obtain the Bellman equations (1) for the states in the optimal policy envelope 𝖲π\mathsf{S}^{\pi*}, i.e., the constraints are active (tight) for the pairs (s,π(s))(s,\pi^{*}(s)) for all s𝖲πs\in\mathsf{S}^{\pi*} and inactive (slack) everywhere else. We reframe iLAO’s incremental growing of its partial SSP as solving LP 1 with variable and constraint generation (Bertsimas and Tsitsiklis 1997).222Variable generation is also known as column generation and constraint generation is also known as the cutting plane method.

Variable generation is a technique from Operations Research that enables us to solve LPs with a large number of variables by considering only a subset of variables. Given an LP with missing variables called the Reduced Master Problem (RMP), variable generation finds a set of variables outside the RMP whose addition lets the RMP’s solution quality improve. Variable generation provides a sound and complete method to select such variables and a stop criterion that ensures the optimal solution for the RMP is also optimal for the original LP. Heuristic search algorithms, such as iLAO, can be seen as a variable generation algorithm over LP 1, where each of its partial SSPs represents an RMP with the subset of variables {Vs:s𝖲^}\{V_{s}:s\in\widehat{\mathsf{S}}\}. For iLAO, the variable selection mechanism is inherited from A and is represented by the expansion of the artificial goals reachable by π^V\widehat{\pi}_{V} (alg. 1 algs. 1 to 1): for all sf𝖲π^V(𝖦^𝖦)s_{f}\in\mathsf{S}^{\widehat{\pi}_{V}}\cap(\widehat{\mathsf{G}}\setminus\mathsf{G}) and a𝖠(sf)a\in\mathsf{A}(s_{f}), we add the variables VsV_{s} such that ssucc(sf,a)s\in\text{succ}(s_{f},a) and s𝖲^s\not\in\widehat{\mathsf{S}}.

Constraint generation is a similar technique which enables the solving of LPs with a large (potentially infinite) number of constraints. The key idea is that the optimal solution of an LP with many constraints only makes a small number of constraints active, thus only a subset of constraints is needed to characterise this optimal solution. In constraint generation, the intermediate LPs are known as relaxed LPs since they relax the original LP by removing one or more constraints. Given a relaxed LP, constraint generation finds one or more constraints in the original LP that are violated by the optimal solution of the relaxed LP. By iteratively adding these violated constraints and re-optimising the new relaxed LP, a sequence of relaxed LPs with an increasing number of constraints is generated. When no violations are found, the optimal solution of the relaxed LP is an optimal solution for the original LP. The algorithm used to check constraint violations is called a separation oracle and the effectiveness of constraint generation relies on the availability of an efficient separation oracle for the original LP.

In the case of iLAO, constraint generation adds all the actions of the expanded artificial goal (alg. 1 alg. 1) where each action a𝖠(sf)a\in\mathsf{A}(s_{f}) added to the partial SSP implicitly represents the constraint V(sf)Q(sf,a)V(s_{f})\leq Q(s_{f},a). This separation oracle is computationally cheap since no checks are performed to detect if this new constraint is needed or not, at the cost that inactive constraints are unnecessarily added to the partial SSP. In the next section, we present our new algorithm that uses an efficient separation oracle that only adds violated constraints.

4 CG-iLAO

1 Function CG-iLAO (𝕊,H,ϵ)(\mathbb{S},H,\epsilon)
2      
3      𝕊^\widehat{\mathbb{S}}\leftarrow partial SSP {s0},s0,{s0},,P,C,H\langle\{s_{0}\},s_{0},\{s_{0}\},\emptyset,P,C,H\rangle
4       Vvalue function initialised by HV\leftarrow\text{value function initialised by }H
5       repeat
6             \mathcal{E}\leftarrow post-order DFS traversal of π^V\widehat{\pi}_{V} from s0s_{0}
7             𝕊^Partly-Expand-Fringes(𝕊,𝕊^,,V)\widehat{\mathbb{S}}\leftarrow\textsc{Partly-Expand-Fringes}{}(\mathbb{S},\widehat{\mathbb{S}},\mathcal{E},V)
8             𝖥𝖲π^V(𝖦^𝖦)\mathsf{F}\leftarrow\mathsf{S}^{\widehat{\pi}_{V}}\cap(\widehat{\mathsf{G}}\setminus\mathsf{G})
9             V,res,π^old,ΓCG-Backups(𝕊,𝕊^,,V,𝖥,Γ)V,\textsc{res},\widehat{\pi}_{\text{old}},\Gamma\leftarrow\textsc{CG-Backups}{}(\mathbb{S},\widehat{\mathbb{S}},\mathcal{E},V,\mathsf{F},\Gamma)
10             V,res,Γ,𝕊^Fix-Constrs(𝕊,𝕊^,V,Γ,res)V,\textsc{res},\Gamma,\widehat{\mathbb{S}}\leftarrow\textsc{Fix-Constrs}{}(\mathbb{S},\widehat{\mathbb{S}},V,\Gamma,\textsc{res})
11            
12      until 𝖥=\mathsf{F}=\emptyset and π^old=π^V\widehat{\pi}_{\text{old}}=\widehat{\pi}_{V} and resϵ\textsc{res}\leq\epsilon
13       return VV
14
15Function Partly-Expand-Fringes(𝕊,𝕊^,,V)\textsc{Partly-Expand-Fringes}{}(\mathbb{S},\widehat{\mathbb{S}},\mathcal{E},V)
16       for sf(𝖦^𝖦)s_{f}\in\mathcal{E}\cap(\widehat{\mathsf{G}}\setminus\mathsf{G}) do
17             𝖦^𝖦^{sf}\widehat{\mathsf{G}}\leftarrow\widehat{\mathsf{G}}\setminus\{s_{f}\}
18             Qminmina𝖠(sf)Q(sf,a)Q_{min}\leftarrow\min_{a\in\mathsf{A}(s_{f})}Q(s_{f},a)
19             𝖠{a𝖠(sf)|Q(sf,a)=Qmin}\mathsf{A}^{\prime}\leftarrow\{a\in\mathsf{A}(s_{f})|Q(s_{f},a)=Q_{min}\}
20             Add-Actions(𝕊,𝕊^,sf,𝖠)\textsc{Add-Actions}(\mathbb{S},\widehat{\mathbb{S}},s_{f},\mathsf{A}^{\prime})
21            
22      return 𝕊^\widehat{\mathbb{S}}
23      
24
25Function CG-Backups(𝕊,𝕊^,,V,𝖥,Γ)\textsc{CG-Backups}{}(\mathbb{S},\widehat{\mathbb{S}},\mathcal{E},V,\mathsf{F},\Gamma)
26       π^oldπ^V\widehat{\pi}_{\text{old}}\leftarrow\widehat{\pi}_{V}
27       repeat
28             res0\textsc{res}\leftarrow 0
29             for s𝖦^s\in\mathcal{E}\setminus\widehat{\mathsf{G}} do
30                   Qminmina𝖠^(s)Q(s,a)Q_{\min}\leftarrow\min_{a\in\widehat{\mathsf{A}}(s)}Q(s,a)
31                   if QminV(s)>ϵQ_{\min}-V(s)>\epsilon then
32                         ΓΓExt-Succs(s,𝕊,𝕊^)\Gamma\leftarrow\Gamma\cup\textsc{Ext-Succs}(s,\mathbb{S},\widehat{\mathbb{S}})
33                        
34                   else if V(s)Qmin>ϵV(s)-Q_{\min}>\epsilon then
35                         ΓΓPreds(s,𝕊)\Gamma\leftarrow\Gamma\cup\textsc{Preds}(s,\mathbb{S})
36                        
37                  resmax(|V(s)Qmin|,res)\textsc{res}\leftarrow\max(|V(s)-Q_{\min}|,\textsc{res})
38                   V(s)QminV(s)\leftarrow Q_{\min}
39                  
40            
41      until resϵ\textsc{res}\leq\epsilon or π^Vπ^old\widehat{\pi}_{V}\neq\widehat{\pi}_{\text{old}} or 𝖥\mathsf{F}\neq\emptyset
42       return V,res,π^old,ΓV,\textsc{res},\widehat{\pi}_{\text{old}},\Gamma
43
44Function Fix-Constrs(𝕊,𝕊^,V,Γ,res)\textsc{Fix-Constrs}{}(\mathbb{S},\widehat{\mathbb{S}},V,\Gamma,\textsc{res})
45       Γ\Gamma^{\prime}\leftarrow\emptyset
46       for (s,a)Γ s.t. V(s)>Q(s,a)+ϵ(s,a)\in\Gamma\text{ s.t. }V(s)>Q(s,a)+\epsilon do
47             if a𝖠^(s)a\not\in\widehat{\mathsf{A}}(s) then
48                   𝖠^(s)𝖠^(s){a}\widehat{\mathsf{A}}(s)\leftarrow\widehat{\mathsf{A}}(s)\cup\{a\}
49                   𝖦^𝖦^(succ(s,a)𝖲^)\widehat{\mathsf{G}}\leftarrow\widehat{\mathsf{G}}\cup(\text{succ}(s,a)\setminus\widehat{\mathsf{S}})
50                   𝖲^𝖲^succ(s,a)\widehat{\mathsf{S}}\leftarrow\widehat{\mathsf{S}}\cup\text{succ}(s,a)
51                  
52            resmax(V(s)Q(s,a),res)\textsc{res}\leftarrow\max(V(s)-Q(s,a),\textsc{res})
53             V(s)Q(s,a)V(s)\leftarrow Q(s,a)
54             ΓΓPreds(s,𝕊)\Gamma^{\prime}\leftarrow\Gamma^{\prime}\cup\textsc{Preds}(s,\mathbb{S})
55            
56      return V,res,Γ,𝕊^V,\textsc{res},\Gamma^{\prime},\widehat{\mathbb{S}}
Algorithm 2 CG-iLAO

In iLAO, as most search algorithms, each state is either unexpanded or fully expanded, i.e., either none or all of its applicable actions are considered and it is not possible to ignore just a subset of the applicable actions. We start by defining which actions can be safely ignored in a partial SSP.

Definition 3 (Inactive Action).

Consider an SSP 𝕊\mathbb{S}, its partial SSP 𝕊^\widehat{\mathbb{S}}, a value function VV, and a state s𝖲^s\in\widehat{\mathsf{S}}. An action a𝖠(s)𝖠^(s)a\in\mathsf{A}(s)\setminus\widehat{\mathsf{A}}(s) is inactive in state ss if V(s)<Q(s,a).V(s)<Q(s,a).

An inactive action a𝖠(s)𝖠^(s)a\in\mathsf{A}(s)\setminus\widehat{\mathsf{A}}(s) for s𝖲^s\in\widehat{\mathsf{S}} represents the inactive constraint C1 for ss and aa in LP 1. Since inactive actions are not in the partial SSP and their associated constraints are inactive, adding them to the partial SSP does not change the solution and only adds overhead in the form of QQ-value computation for sub-optimal actions.

We generalise iLAO by allowing states to be partially expanded, so these states only have a subset of actions available in the partial SSP. Under the lens of linear programming, we use constraint generation to identify and add actions that may be needed to encode the optimal solution and to ignore inactive actions. We call this algorithm Constraint-Generation iLAO (CG-iLAO).

CG-iLAO is presented in alg. 2. One of the defining changes from iLAO is in CG-iLAO’s expanding phase (alg. 2 alg. 2) where Partly-Expand-Fringes only expands a state with the greedy actions on VV, rather than all the applicable actions. This introduces two challenges: (i) actions that were not added by the partial expansion may need to be added later when VV is more accurate; and (ii) when we add such actions, VV must be updated to reflect aa’s availability. If (ii) is not addressed, the reduction to VV offered by aa is not propagated, potentially leading to a suboptimal solution since VV would overestimate VV^{*}. The key insight of our algorithm is that both of these challenges are instances of constraint violation. Thus, we can solve both issues by finding which constraints are violated with a separation oracle and enforcing them in the style of constraint generation.

The trivial separation oracle checks all constraints (s,a)(s,a) for s𝖲^s\!\in\!\widehat{\mathsf{S}} and a𝖠(s)a\!\in\!\mathsf{A}(s) for violations; this is needlessly expensive since some non-violated constraints remain non-violated from one iteration to the next. Our separation oracle exploits this persistence between iterations by tracking changes in VV to compute a subset of constraints which could potentially be violated by the following rules (alg. 2 algs. 2 to 2 and alg. 2): suppose V(s)V(s) is assigned mina𝖠^(s)Q(s,a)\min_{a\in\widehat{\mathsf{A}}(s)}Q(s,a), then there are three cases:

  1. 1.

    𝑽(𝒔)V(s) stays the same. No new constraint violations.

  2. 2.

    𝑽(𝒔)V(s) increases. The constraints V(s)Q(s,a)V(s)\leq Q(s,a^{\prime}) may be violated for (s,a)Ext-Succs(s,𝕊,𝕊^){(s,a):a𝖠(s)𝖠^(s)}(s,a^{\prime})\in\textsc{Ext-Succs}(s,\mathbb{S},\widehat{\mathbb{S}})\coloneqq\{(s,a):a\in\mathsf{A}(s)\setminus\widehat{\mathsf{A}}(s)\}. Note that if (s,a)(s,a^{\prime}) is already inside 𝕊^\widehat{\mathbb{S}}, i.e., a𝖠^(s)a^{\prime}\in\widehat{\mathsf{A}}(s), its constraint can not be violated since V(s)mina𝖠^(s)Q(s,a)Q(s,a)V(s)\leftarrow\min_{a\in\widehat{\mathsf{A}}(s)}Q(s,a)\leq Q(s,a^{\prime}).

  3. 3.

    𝑽(𝒔)V(s) decreases. The only constraints that may be violated are V(s)Q(s,a)V(s^{\prime})\leq Q(s^{\prime},a^{\prime}) for (s,a)Preds(s,𝕊){(s,a):ssucc(s,a),a𝖠(s)}(s^{\prime},a^{\prime})\in\textsc{Preds}(s,\mathbb{S})\coloneqq\{(s^{\prime},a^{\prime}):s\in\text{succ}(s^{\prime},a^{\prime}),a^{\prime}\in\mathsf{A}(s^{\prime})\}.

We store potential violations in Γ\Gamma, i.e., if V(s)V(s) increases, the elements of Ext-Succs(s,𝕊,𝕊^)\textsc{Ext-Succs}(s,\mathbb{S},\widehat{\mathbb{S}}) are added to Γ\Gamma (alg. 2 alg. 2); and if V(s)V(s) decreases, the elements of Preds(s,𝕊)\textsc{Preds}(s,\mathbb{S}) are added to Γ\Gamma (alg. 2 alg. 2). Elements are removed from Γ\Gamma after they are checked. As we prove later in this section, checking constraints in Γ\Gamma is sufficient to find any constraint violations.

CG-iLAO fixes a violated constraint (s,a)(s,a) by setting V(s)Q(s,a)V(s)\leftarrow Q(s,a) (alg. 2 alg. 2). This change in VV may create a new violation in another state, so we must track such potential violations in the same way as before. This ensures that all constraint violations are tracked and fixed eventually before termination.

s0s_{0} 33s1s_{1} 22s2s_{2} 11s3s_{3} 22gg 01a0a_{0}1a1a_{1}1a1a^{\prime}_{1}3a2a_{2}2a3a_{3}
Figure 1: An SSP where CG-iLAO’s value function is not monotonically non-decreasing.

Note that V(s)V(s) may decrease after an update (case 3 of the constraint violations) in CG-iLAO even if the heuristic used is monotonic. This is a departure from all other algorithms based on Bellman backups where VV is guaranteed to be monotonically non-decreasing during their executions when initialised with a monotonic heuristic. To illustrate a scenario where VV decreases in CG-iLAO, consider the SSP in fig. 1 where HH is monotonic and represented inside nodes. The first iterations of CG-iLAO applied to this SSP are:

  1. Iter. 1

    expands s0s_{0}.

  2. Iter. 2

    partly expands s1s_{1} with a1a_{1}.

  3. Iter. 3

    expands s2s_{2} with a2a_{2} and, after CG-Backups, we have V(s2)=3V(s_{2})=3, V(s1)=4V(s_{1})=4, V(s0)=5V(s_{0})=5 and Γ={(s1,a1)}\Gamma=\{(s_{1},a^{\prime}_{1})\}. Since Γ\Gamma\neq\emptyset, Fix-Constrs verifies that (s1,a1)(s_{1},a^{\prime}_{1}) is currently better than the existing action a1a_{1} for s1s_{1}, so a1a^{\prime}_{1} is added to 𝖠^(s1)\widehat{\mathsf{A}}(s_{1}) and V(s1)V(s_{1}) is changed from 4 to 3. Recall that V(s0)=5V(s_{0})=5, so when V(s1)V(s_{1}) is updated to 33, {s0,a0}\{s_{0},a_{0}\} is inserted into Γ\Gamma, and no further changes are made in this iteration.

  4. Iter. 4

    expands s3s_{3} and CG-Backups reduces V(s0)V(s_{0}) from 5 to 4, so VV has been decreased by CG-Backups.

CG-iLAO generalises iLAO by using a more precise separation oracle that only adds violated constraints, which translates to CG-iLAO ignoring inactive actions. For a state ss, any action that has been ignored and left out of the partial SSP 𝕊^\widehat{\mathbb{S}} is not considered by a Bellman backup of ss in 𝕊^\widehat{\mathbb{S}}, so such actions’ QQ-values are not computed. However, CG-iLAO needs to compute additional QQ-values in its separation oracle to check violations. As our experiments in section 5 show, the QQ-values saved by ignoring inactive actions outweigh the additional QQ-values in the separation oracle, which lets CG-iLAO outperform iLAO.

To close the section, we prove that CG-iLAO: (i) terminates (thm. 1); (ii) tracks all constraint violations (lem. 1); and (iii) returns an ϵ-consistent\epsilon\text{-consistent} value function (thm. 2). Thus, CG-iLAO is optimal for SSPs.

Theorem 1.

CG-iLAO terminates.

Proof.

For contradiction, suppose CG-iLAO does not terminate. 𝕊\mathbb{S} is finite and we do not add duplicate states nor constraints, so eventually 𝕊^\widehat{\mathbb{S}} is fixed and 𝖥=\mathsf{F}=\emptyset. Then there must be a finite set of states X𝖲^X\subseteq\widehat{\mathsf{S}} that are updated with Bellman backups infinitely often by CG-Backups and/or Fix-Constrs. But XX induces a new partial SSP, and applying Bellman backups infinitely often to all XX solves this new partial SSP with VI, so VV must converge to a fixed point and the residual will be less than ϵ\epsilon in finite time. Thus, VV will not be updated, and all remaining termination conditions will be satisfied, giving us the desired contradiction. ∎

Lemma 1.

If there is s𝖲^𝖦^s\in\widehat{\mathsf{S}}\setminus\widehat{\mathsf{G}} and a𝖠(s)a\in\mathsf{A}(s) such that V(s)>Q(s,a)+ϵV(s)>Q(s,a)+\epsilon, then (s,a)Γ(s,a)\in\Gamma.

Proof.

We prove by induction over nn, the number of updates to VV. In the base case, n=0n=0, 𝕊^\widehat{\mathbb{S}} is the initial partial SSP with 𝖲^𝖦^=\widehat{\mathsf{S}}\setminus\widehat{\mathsf{G}}=\emptyset, so the claim is vacuously true. Now, we show the claim holds after n+1n+1 updates to VV, assuming that the claim holds for nn updates. Any violations must have been introduced in the latest update to VV by CG-Backups or Fix-Constrs, but we add any potential violations to Γ\Gamma in alg. 2 algs. 2 to 2 and alg. 2 respectively. ∎

Theorem 2.

CG-iLAO outputs an ϵ-consistent\epsilon\text{-consistent} VV.

Proof.

For contradiction, suppose CG-iLAO has terminated and outputs VV with s𝖲π^Vs\in\mathsf{S}^{\widehat{\pi}_{V}} such that res(s)>ϵ\textsc{res}(s)>\epsilon. By CG-iLAO’s termination condition (alg. 2 alg. 2) we know that π^V=π^old\widehat{\pi}_{V}=\widehat{\pi}_{\text{old}} and 𝖥=\mathsf{F}=\emptyset, so CG-Backups applies Bellman backups to all states in the envelope until resϵ\textsc{res}\leq\epsilon (alg. 2 alg. 2). Therefore, the inconsistency of ss must be introduced by Fix-Constrs, either directly by updating VV, or indirectly by forcing a policy change. But the residual is tracked (alg. 2 alg. 2) and policy changes are flagged when π^Vπ^old\widehat{\pi}_{V}\neq\widehat{\pi}_{\text{old}}, which are both checked in the termination condition. So, Fix-Constrs can not introduce any inconsistency either. But these two methods are the only ones affecting VV, which yields the desired contradiction. This proves ϵ-consistency\epsilon\text{-consistency} (defn. 1), but previous heuristic search methods rely on the invariant VVV\leq V^{*} to safely prune states that can not be part of an optimal policy’s envelope, which CG-iLAO does not have. We must ensure that states ss outside the policy envelope with V(s)>V(s)V(s)>V^{*}(s) can not lead to a cheaper policy if we apply more Bellman backups to them. Consider such ss outside the greedy policy envelope with V(s)>V(s)V(s)>V^{*}(s), and for contradiction let V(s)>Q(s,a)+ϵV(s)>Q(s,a)+\epsilon for some a𝖠(s)a\in\mathsf{A}(s). Since states in 𝖦^\widehat{\mathsf{G}} are initialised with an admissible HH, we know that s𝖲^𝖦^s\in\widehat{\mathsf{S}}\setminus\widehat{\mathsf{G}}, so (s,a)Γ(s,a)\in\Gamma by lem. 1. Since Fix-Constrs overwrites Γ\Gamma, (s,a)(s,a) must have been added in the previous call, but then resmax(V(s)Q(s,a),res)>ϵ\textsc{res}\leftarrow\max(V(s)-Q(s,a),\textsc{res})>\epsilon (alg. 2 alg. 2), so the termination criteria (alg. 2 alg. 2) are not satisfied, giving us a contradiction. Therefore, all inadmissible states ss satisfy V(s)Q(s,a)+ϵV(s)\leq Q(s,a)+\epsilon. Thus, if CG-iLAO terminates with resϵ\textsc{res}\leq\epsilon, additional backups to states with V(s)>V(s)V(s)>V^{*}(s) would not change VV, so we can conclude that CG-iLAO outputs ϵ-consistent\epsilon\text{-consistent} VV. ∎

5 Experiments

In this section we empirically compare CG-iLAO to two state-of-the-art optimal heuristic search planners: iLAO (Hansen and Zilberstein 2001) and LRTDP (Bonet and Geffner 2003). We also compared CG-iLAO against FTVI (Dai, Weld et al. 2009), the only algorithm we are aware of that uses action elimination to prune actions, but it is uncompetitive and FTVI’s results are reported in section 7. We consider the following admissible heuristics: h-max (hmaxh^{\text{max}}); lm-cut (hlmch^{\text{lmc}}(Helmert and Domshlak 2009); and h-roc (hroch^{\text{roc}}(Trevizan, Thiébaux, and Haslum 2017). As in (Trevizan, Thiébaux, and Haslum 2017), we use hroch^{\text{roc}} with hmaxh^{\text{max}} as a dead-end detection mechanism for problems with dead ends. We use ϵ=0.0001\epsilon=0.0001 and convert SSPs into dead-end free SSPs (Trevizan, Teichteil-Königsbuch, and Thiébaux 2017) with a penalty of D=500D=500 for all domains except Parc Printer variants where D=107D=10^{7} due to the large cost of single actions.

On all problems, we collected 50 runs with different random seeds for each combination of planner and heuristic. We refer to a problem paired with a fixed seed as an instance. All runs have a cutoff of 30 minutes of CPU time and 8GB of memory. The experiments were conducted in a cluster of Intel Xeon 3.2 GHz CPUs and each run used a single CPU core. The LP solver used for computing hroch^{\text{roc}} was CPLEX version 20.1. We consider the following domains:

\Vertex\Vertex\Vertex\Vertex\Vertex\VertexABC\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\Vertex\VertexABC\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge\Edge
Figure 2: Triangle Tire World problems 1 (left) and 2 (right).
Refer to caption
Refer to caption
Refer to caption
Figure 3: For each algorithm and heuristic, the cumulative plot of how many instances were solved w.r.t. time in seconds (left) and number of QQ-values (right). Both plots start at 200 solved instances and (right) is cut off at 4×1084\times 10^{8} QQ-values.
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 4: Cumulative plot of state density, i.e., |𝖠^(s)|/|𝖠(s)|{|\widehat{\mathsf{A}}(s)|}/{|\mathsf{A}(s)|} over 50 instances for the largest solved problem per domain.

Triangle Tire World with Head-start (TW)

In the original Triangle Tire World domain (Little, Thiebaux et al. 2007; Buffet 2008), the agent is provided a map of locations in a triangular layout with corners A,B,CA,B,C, as in fig. 2. The agent’s task is to travel from corner AA to CC, but it gets a flat tyre with probability 0.50.5 every time it moves. Once the car gets a flat tyre, it must change the tyre if a spare is available, otherwise no action is available and the goal is no longer reachable. The agent can only store one spare at a time, and it can only obtain spare tyres in select locations (circles in fig. 2). A shortcoming of this domain is that its difficulty scales exponentially, so it may be easy to solve problem nn and impractical for n+1n+1. For this reason we extend the domain by allowing the agent a head-start, that is, its starting location may be anywhere along the edge ABAB, for instance, on problem 2 these are locations 1-1, 2-1, …, 5-1 (fig. 2 (right)). Let TW(n,d)\text{TW}(n,d) denote an instance of Triangle Tire World with Head-start where nn is the problem size and dd denotes the distance between the agent’s starting location and corner BB. When d=2nd=2n, we obtain the original problem of size nn and reducing dd makes the problem easier until we reach d=1d=1, the easiest variant. Experiments using LRTDP and hroch^{\text{roc}} suggest the following relation in terms of CPU time: TW(n+1,2n3)TW(n,2n)TW(n+1,2n2)\text{TW}(n+1,2n-3)\leq\text{TW}(n,2n)\leq\text{TW}(n+1,2n-2). Therefore, for each size nn, we consider 5 problems: TW(n,2n4),,TW(n,2n)\text{TW}(n,2n-4),\dots,\text{TW}(n,2n).

Probabilistic Blocks World (BW) (Buffet 2008)

As in the deterministic Blocks World from IPC, the agent is tasked with arranging blocks on a table into a particular configuration with actions to pick up blocks, put them down, or stack them. The probabilistic version adds a 0.25 probability to each action that the handled block falls onto the table. Furthermore, actions are added that allow the agent to pick up, put down, and stack a tower of three blocks; these have 0.9 probability of the whole tower falling onto the table.

Exploding Blocks World (ExBW) (Buffet 2008)

Another variation for the deterministic Blocks World but this time each block is rigged with an explosive that can detonate once and destroy the table or block immediately underneath it. When a block is placed on the table or another block, it detonates with probability 0.4 and 0.1 respectively. Once the table or a block has been destroyed, the agent can not interact with them anymore; therefore, if they are not in their goal position, the goal will be unreachable.

Probabilistic PARC Printer (PARC) (Trevizan, Thiébaux, and Haslum 2017)

This domain is a probabilistic extension of the PARC Printer domain from IPC. It models a modular printer consisting of various components and each page scheduled for printing needs to pass through multiple components in a particular order. The goal is to optimise how each page is directed through the different components to satisfy the printing requirements. With probability 0.1, a component jams ruining the relevant page and forcing it to be reprinted. The domain comes in two flavours: with repair (PARC-R), where jammed components can be repaired and then used again; and without repair (PARC-N), where jammed components remain unusable.

Our code and benchmarks are available at Schmalz and Trevizan (2023). We now present a summary of our findings.

What is the best planner and heuristic combination?

A common metric to evaluate planners is coverage, i.e., the total number of instances solved in a given amount of time, thus larger coverage is better. Fig. 3 (left) shows the coverage of each combination of planner and heuristic as a function of time. The top three combinations and their total coverages are CG-iLAOroc{}^{*}_{\text{roc}} (1300), LRTDProc{}_{\text{roc}} (1202), and iLAOroc{}^{*}_{\text{roc}} (1200).Note that CG-iLAOroc{}^{*}_{\text{roc}} and LRTDProc{}_{\text{roc}} alternate in the top spot up to 220 seconds and CG-iLAOroc{}^{*}_{\text{roc}} has the best coverage after that until the experiment cutoff. Moreover, from 300 seconds onwards, CG-iLAOroc{}^{*}_{\text{roc}}’s lead varies from 50 to 243 instances. We present a breakdown of coverage per domain in tab. 1. For each domain considered, CG-iLAOroc{}^{*}_{\text{roc}} reaches the highest coverage over other planners and heuristics. For all three heuristics considered, CG-iLAO also obtains the highest coverage in all domains against the other planners for the same heuristic. In tab. 2, we present the minimum and maximum speedup per domain of CG-iLAO over the other planners for hroch^{\text{roc}}. The speedups w.r.t. iLAOroc{}^{*}_{\text{roc}} vary from 0.9×0.9\times (i.e., 11% slower) in the largest PARC-R problem and 3×3\times in problem #9 of ExBW. Against LRTDProc{}_{\text{roc}}, the speedups vary from 0.9×0.9\times in problem #8 of ExBW to 8.4×8.4\times in PARC-N problem s4-c3. For the performance of each planner and heuristic per problem, see section 7.

How many actions can CG-iLAO ignore?

To answer this question, we look at the density of states ss, defined as |𝖠^(s)|/|𝖠(s)|{|\widehat{\mathsf{A}}(s)|}/{|\mathsf{A}(s)|}, in the final partial SSP of CG-iLAOroc{}^{*}_{\text{roc}}. Fig. 4 shows, for each domain, the cumulative plot of density, i.e., how many states contain up to and including a given proportion of their applicable actions. In all instances, at least one third of the states contains at most 50% of the applicable actions. The density is high for TW because many states only have one applicable action. The density is also high for ExBW because heuristics are comparatively weak for this domain. For the other domains the results are much stronger: between 56% and 75% of states contain at most 50% of the actions. Overall, CG-iLAOroc{}^{*}_{\text{roc}} added between 38%38\% and 66%66\% of all possible actions in its own partial SSP, and added between 43%43\% and 65%65\% of iLAO’s actions.

BW ExBW PARC-N PARC-R  TW Total
Num. of instances 300 250 300 250 200 1300
hroch^{\text{roc}} CG-iLAO 𝟑𝟎𝟎\bm{300} 𝟐𝟓𝟎\bm{250} 𝟑𝟎𝟎\bm{300} 𝟐𝟓𝟎\bm{250} 𝟐𝟎𝟎\bm{200} 𝟏𝟑𝟎𝟎\bm{1300}
iLAO 𝟑𝟎𝟎\bm{300} 200 𝟑𝟎𝟎\bm{300} 𝟐𝟓𝟎\bm{250} 150 1200
LRTDP 257 𝟐𝟓𝟎\bm{250} 𝟑𝟎𝟎\bm{300} 200 195 1202
hlmch^{\text{lmc}} CG-iLAO 150 𝟐𝟓𝟎\bm{250} 𝟑𝟎𝟎\bm{300} 200 150 1030
iLAO 150 200 𝟑𝟎𝟎\bm{300} 200 140 990
LRTDP 0 200 𝟑𝟎𝟎\bm{300} 50 149 699
hmaxh^{\text{max}} CG-iLAO 150 200 150 0 161 661
iLAO 150 150 150 0 150 600
LRTDP 150 200 150 0 150 650
Table 1: Coverage per domain. Best coverage for each domain (column) in bold.
BW ExBW PARC-R PARC-N TW
iLAOroc{}^{*}_{\text{roc}} 1.1–1.4 1.0–3.0 0.9–1.3 1.3–2.2 1.4–1.6
LRTDProc{}_{\text{roc}} 1.3–2.0 0.9–2.9 2.0–8.4 0.7–0.9 1.2–1.6
Table 2: CG-iLAOroc{}^{*}_{\text{roc}}’s speed-up over iLAOroc{}^{*}_{\text{roc}} and LRTDProc{}_{\text{roc}}. The speed-ups only consider problems solved by both algorithms.

Are QQ-values being saved?

CG-iLAO can save QQ-value computations by ignoring inactive actions, but at the cost of computing additional QQ-values in its separation oracle. The cumulative plot over QQ-values in fig. 3 (right) shows that the savings in QQ-values outweigh the overhead, i.e., given a budget in QQ-values computations, CG-iLAO is capable of solving more instances than the other planners for the same heuristic. At their maximum coverage, iLAOroc{}^{*}_{\text{roc}} and LRTDProc{}_{\text{roc}} use 4×4\times and 10×10\times more QQ-values than CG-iLAOroc{}^{*}_{\text{roc}}, respectively. Moreover, CG-iLAOroc{}^{*}_{\text{roc}} reaches its maximum coverage of 1300 using 1.64×1081.64\times 10^{8} QQ-values while iLAOroc{}^{*}_{\text{roc}} and LRTDProc{}_{\text{roc}} only solve 1149 and 1052 instances, respectively, for the same number of QQ-values. A similar trend is observed when using hlmch^{\text{lmc}}; however, when using hmaxh^{\text{max}}, the least informative heuristic considered, LRTDPmax{}_{\text{max}} is slightly more QQ-value efficient than CG-iLAOmax{}^{*}_{\text{max}}.

What is the impact of the heuristic on CG-iLAO?

Note that, in fig. 3, as the heuristic becomes more informative, the performance gains of CG-iLAO over iLAO and LRTDP increases. To explore this trend, we use the heuristic hwpert(s)h^{\text{pert}}_{w}(s) defined for w[0,1]w\in[0,1] as V(s)rV^{*}(s)\cdot r where rr is a uniform randomly selected number from (w,1](w,1], which lets us quantify how informative a heuristic is (on average) with ww. The randomness in the weight ww ensures that the ordering of states induced by hwperth^{\text{pert}}_{w} is different from the one induced by VV^{*}. Due to the high cost of computing VV^{*} we only consider the smallest problems of BW, ExBW, PARC-N, and TW. Over these problems and 50 instances each, fig. 5 shows the mean search time and 95% C.I. as ww varies over 0.1,0.2,,0.90.1,0.2,\dots,0.9. Search time excludes time spent computing the heuristic. The ratio between CG-iLAO’s and iLAO’s runtime supports that CG-iLAO scales better with better heuristics: the ratio starts at 86% and decreases to 49% and 46% for w=0.5w=0.5 and w=0.9w=0.9 respectively. The reason for this behaviour is that good heuristics (i.e., tighter lower bounds) prevent CG-iLAO from adding inactive actions to its partial SSP, resulting in more savings in QQ-value computation. Over all values of ww, there is no statistically significant difference between CG-iLAO and LRTDP, but both offer substantial improvement over iLAO. This suggests LRTDP’s sampling approach can more efficiently leverage the information provided the heuristics than iLAO and CG-iLAO bridges the gap between them, allowing a non-sampling-based planner to use the heuristics as effectively as LRTDP.

Refer to caption
Figure 5: Search time (excludes compute time for the heuristic) of algorithm with hwperth^{\text{pert}}_{w} as ww varies. We show mean and 95% C.I. of all considered problems over 50 instances.

6 Conclusion, Related and Future Work

Building on existing connections between operations research and planning, we presented a new interpretation of heuristic search on SSPs as solving LPs using variable and constraint generation. We exploit this equivalence to introduce a new and efficient separation oracle for SSPs, which enables a search algorithm to selectively add actions when they are deemed necessary to find the optimal solution. This addresses the shortcoming of state-of-the-art algorithms that add all applicable actions during state expansion, with no mechanism for ignoring actions that will not contribute to the solution. Using this principle, we generalised iLAO into a new optimal heuristic search algorithm CG-iLAO. Empirical evaluation showed that CG-iLAO’s ability to consider a subset of actions results in significant savings in the number of QQ-values computed, which in turn reduces the runtime of the algorithm compared to the state-of-the-art.

Regarding related work, LP 1 has been approximated for factored MDPs to get a more compact LP, called the approximate LP (ALP). Schuurmans and Patrascu (2001) apply constraint generation to the ALP; their separation oracle has a similar condition for adding constraints as CG-iLAO; however, it checks for the condition naively, which is only practical on the compact ALP offered by factored MDPs, and is infeasible for SSPs. Constraint generation lends itself well to complex planning problems where a relaxation can be efficiently solved and constraint violations by the relaxed solution can be efficiently detected, e.g., in multiagent planning (Calliess and Roberts 2021) and metric hybrid factored planning in nonlinear domains (Say and Sanner 2019). For POMDPs, an LP with constraint generation can be used to prune unneeded vectors from the set of vectors used to represent the value function (Walraven and Spaan 2017). In all these works, the separation oracle either naively checks all possible constraints or relies on sampling to find violations.

As future work, we aim to expand the application of CG-iLAO to more complex models that can benefit from our iterative method of generating applicable actions. Models with imprecise parameters, such as MDPIPs and MDPSTs (White III and Eldeib 1994; Trevizan, Cozman, and Barros 2007), are suitable candidates for our approach since they have a minimax semantics for the Bellman equations. In this minimax semantics, the value function minimises the expected cost-to-go assuming that an adversary aims to maximise the cost-to-go by selecting the values of the imprecise parameters. As a result, computing Q(s,a)Q(s,a) in these models requires solving a maximisation problem; therefore, ignoring inactive actions could lead to significant improvements in performance.

Other suitable models include SSPs with PLTL constraints (Baumgartner, Thiébaux, and Trevizan 2018; Mallet, Thiébaux, and Trevizan 2021) in which both the state space and action space are augmented to keep track of constraint violations. In these models, the concept of inactive actions can be extended to also prevent adding actions that lead to constraint violations to their partial problems. The methods presented in this paper may also be applicable to model checking more broadly. In particular, there has been work investigating how to use heuristics to guide the search for probabilistic reachability (Brázdil et al. 2014), in which action elimination is applicable.

Acknowledgements

We thank the anonymous reviewers for their feedback. This research/project was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government.

References

  • Baumgartner, Thiébaux, and Trevizan (2018) Baumgartner, P.; Thiébaux, S.; and Trevizan, F. 2018. Heuristic Search Planning With Multi-Objective Probabilistic LTL Constraints. In Proc. of 16th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR).
  • Bellman (1957) Bellman, R. 1957. Dynamic programming. Princeton University Press.
  • Bertsekas (1995) Bertsekas, D. 1995. Dynamic Programming and Optimal Control, volume 2. Athena Scientific.
  • Bertsekas and Tsitsiklis (1991) Bertsekas, D.; and Tsitsiklis, J. 1991. An Analysis of Stochastic Shortest Path Problems. Mathematics of Operations Research.
  • Bertsimas and Tsitsiklis (1997) Bertsimas, D.; and Tsitsiklis, J. 1997. Introduction to Linear Optimization. Athena Scientific.
  • Bonet and Geffner (2003) Bonet, B.; and Geffner, H. 2003. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming. In Proc. of 13th Int. Conf. on Automated Planning and Scheduling (ICAPS).
  • Brázdil et al. (2014) Brázdil, T.; Chatterjee, K.; Chmelík, M.; Forejt, V.; Křetínský, J.; Kwiatkowska, M.; Parker, D.; and Ujma, M. 2014. Verification of Markov Decision Processes Using Learning Algorithms. In Automated Technology for Verification and Analysis.
  • Buffet (2008) Buffet, D. 2008. International Planning Competition Uncertainty Part: Benchmarks and Results.
  • Calliess and Roberts (2021) Calliess, J.-P.; and Roberts, S. 2021. Multi-Agent Planning with Mixed-Integer Programming and Adaptive Interaction Constraint Generation (Extended Abstract). Proc. of 14th Symposium on Combinatorial Search (SoCS).
  • Dai, Weld et al. (2009) Dai, P.; Weld, D.; et al. 2009. Focused topological value iteration. In Proc. of 19th Int. Conf. on Automated Planning and Scheduling (ICAPS), volume 19, 82–89.
  • Hansen and Zilberstein (2001) Hansen, E.; and Zilberstein, S. 2001. LAO: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence.
  • Helmert and Domshlak (2009) Helmert, M.; and Domshlak, C. 2009. Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? Proc. of 19th Int. Conf. on Automated Planning and Scheduling (ICAPS).
  • Little, Thiebaux et al. (2007) Little, I.; Thiebaux, S.; et al. 2007. Probabilistic planning vs. replanning. In ICAPS Workshop on IPC: Past, Present and Future.
  • Mallet, Thiébaux, and Trevizan (2021) Mallet, I.; Thiébaux, S.; and Trevizan, F. 2021. Progression Heuristics for Planning with Probabilistic LTL Constraints. In Proc. of 35th AAAI Conference on Artificial Intelligence.
  • McMahan, Likhachev, and Gordon (2005) McMahan, B.; Likhachev, M.; and Gordon, G. 2005. Bounded Real-Time Dynamic Programming: RTDP with Monotone Upper Bounds and Performance Guarantees. In Proc. of 22nd Int. Conf. on Machine Learning.
  • Sanner et al. (2009) Sanner, S.; Goetschalckx, R.; Driessens, K.; and Shani, G. 2009. Bayesian real-time dynamic programming. In Proc. of 21st Int. Joint Conf. on AI (IJCAI).
  • Say and Sanner (2019) Say, B.; and Sanner, S. 2019. Metric Hybrid Factored Planning in Nonlinear Domains with Constraint Generation, 502–518. Springer International Publishing.
  • Schmalz and Trevizan (2023) Schmalz, J.; and Trevizan, F. 2023. Code, benchmarks, and technical report for AAAI 2024 paper “Efficient Constraint Generation for Stochastic Shortest Path Problems”. https://doi.org/10.5281/zenodo.10344842.
  • Schuurmans and Patrascu (2001) Schuurmans, D.; and Patrascu, R. 2001. Direct value-approximation for factored MDPs. In Advances in Neural Information Processing Systems (NeurIPS).
  • Smith and Simmons (2006) Smith, T.; and Simmons, R. 2006. Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. In Proc. of 20th AAAI Conf. on Artificial Intelligence.
  • Teichteil-Königsbuch (2012) Teichteil-Königsbuch, F. 2012. Stochastic safest and shortest path problems. In Proc. of 26th AAAI Conf. on AI.
  • Trevizan, Cozman, and Barros (2007) Trevizan, F.; Cozman, F. G.; and Barros, L. N. 2007. Planning under Risk and Knightian Uncertainty. In Proc. of 20th Int. Joint Conf. on AI (IJCAI).
  • Trevizan, Teichteil-Königsbuch, and Thiébaux (2017) Trevizan, F.; Teichteil-Königsbuch, F.; and Thiébaux, S. 2017. Efficient Solutions for Stochastic Shortest Path Problems with Dead Ends. In Proc. of 33rd Int. Conf. on Uncertainty in Artificial Intelligence (UAI).
  • Trevizan, Thiébaux, and Haslum (2017) Trevizan, F.; Thiébaux, S.; and Haslum, P. 2017. Occupation Measure Heuristics for Probabilistic Planning. In Proc. of 27th Int. Conf. on Automated Planning and Scheduling (ICAPS).
  • Walraven and Spaan (2017) Walraven, E.; and Spaan, M. 2017. Accelerated Vector Pruning for Optimal POMDP Solvers. Proc. of 31st AAAI Conf. on AI.
  • White III and Eldeib (1994) White III, C. C.; and Eldeib, H. K. 1994. Markov decision processes with imprecise transition probabilities. Operations Research, 42(4): 739–749.

7 Appendix

We revisit the key questions of our experiments section by presenting additional results and explaining them.

What is the best planner and heuristic combination?

We report the performance statistics of each planner and heuristic combination per benchmark problem in tabs. 4 to 8. Concretely, we report the coverage over 50 runs, and over the runs that terminated we report the mean and 95% confidence interval associated with CPU time, QQ-values computed, and the number of calls to the heuristic. For each problem (column) we highlight the planner and heuristic combinations with highest coverage, and among these combinations with tied highest coverage, we highlight the 95% confidence intervals that are tied as the best (lowest) for each problem. Formally, an interval [xl,xu][x_{l},x_{u}] is highlighted if there is no other interval [yl,yu][y_{l},y_{u}] for that problem s.t. yu<xly_{u}<x_{l}. The highlighted intervals are also known as non-dominated intervals since there is no interval that is strictly better. If a planner and heuristic combination is missing from a table, then the planner and heuristic combination had a 0% coverage over the relevant domain, e.g., all FTVI combinations are missing from PARC-N.

How many actions can CG-iLAO ignore?

Following on from the relevant experiment in section 5, which considers the largest solved problems of each domain, we report how many actions CG-iLAO can ignore in tab. 3. In particular, we show the number of actions that are in CG-iLAO’s final partial SSP, compared with the potential number of actions that can be in CG-iLAO’s partial SSP if all applicable actions are added for all partial states, and the number of actions in iLAO’s final partial SSP. CG-iLAO’s partial actions are significantly fewer than iLAO’s total actions, i.e., CG-iLAO adds significantly fewer actions than iLAO, which supports our claim that, with a sufficiently informative heuristic, CG-iLAO considers fewer actions and thereby saves on computation.

Num. Actions in Final Partial SSP

partial actions CG-iLAO s𝖲^|𝖠^(s)|\sum_{s\in\widehat{\mathsf{S}}}|\widehat{\mathsf{A}}(s)| potential actions CG-iLAO s𝖲^|𝖠(s)|\sum_{s\in\widehat{\mathsf{S}}}|\mathsf{A}(s)| total actions iLAO s𝖲^|𝖠(s)|\sum_{s\in\widehat{\mathsf{S}}^{\prime}}|\mathsf{A}(s)|
BW 3.77×1053.77\times 10^{5} 9.98×1059.98\times 10^{5} 8.86×1058.86\times 10^{5}
ExBW 3.90×1073.90\times 10^{7} 5.98×1075.98\times 10^{7} 5.96×1075.96\times 10^{7}
PARC-R 8.97×1078.97\times 10^{7} 2.19×1082.19\times 10^{8} 1.82×1081.82\times 10^{8}
PARC-N 1.92×1071.92\times 10^{7} 3.91×1073.91\times 10^{7} 3.64×1073.64\times 10^{7}
TW 8.05×1078.05\times 10^{7} 1.21×1081.21\times 10^{8} 1.24×1081.24\times 10^{8}
Table 3: This table shows, after running CG-iLAO and iLAO with hroch^{\text{roc}}, the number of CG-iLAO’s partial actions (the actions added to CG-iLAO’s final partial SSP), CG-iLAO’s potential actions (the union of applicable actions over all states in CG-iLAO’s final partial SSP), and iLAO’s total actions (the actions added to iLAO’s final partial SSP). 𝕊^\widehat{\mathbb{S}} is the final partial SSP of CG-iLAO and 𝕊^\widehat{\mathbb{S}}^{\prime} is final partial SSP of iLAO. Each row corresponds to the largest solved problem of the domain and values are means over 50 instances.

What is the impact of the heuristic on CG-iLAO?

We have already shown how the search time of CG-iLAO, iLAO, and LRTDP is affected as ww varies; we now show how the number of QQ-values is affected in fig. 6. As ww increases, CG-iLAO uses the fewest QQ-values. This is in line with our other results, but it is important to note that, for w=0.1w=0.1, CG-iLAO does not use fewer QQ-values than LRTDP, which suggests that CG-iLAO does rely on a reasonably informative heuristic, and then scales the best as informativeness increases.

Refer to caption
Refer to caption
Figure 6: Line plots over weight ww of hwperth^{\text{pert}}_{w} that show mean and 95% C.I. of all considered problems over 50 instances of search time (excludes compute time for the heuristic) and the number of QQ-values computed.
algorithm heuristic 8-24967 8-23171 8-25241 10-14262 10-19475 12-19848
CG-iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (13.3±0.060)\bm{(13.3\pm 0.060)} [𝟏𝟒,𝟓𝟒𝟑±𝟏𝟕𝟓]\bm{[14,543\pm 175]} 1,399±15.3\langle 1,399\pm 15.3\rangle 𝟓𝟎\bm{50} (13.7±0.082)\bm{(13.7\pm 0.082)} [22,432±920][22,432\pm 920] 𝟏,𝟗𝟖𝟐±81.5\bm{\langle 1,982\pm 81.5\rangle} 𝟓𝟎\bm{50} (13.1±0.078)\bm{(13.1\pm 0.078)} [8,844±1,019][8,844\pm 1,019] 1,158±114\langle 1,158\pm 114\rangle 𝟓𝟎\bm{50} (86.5±0.440)\bm{(86.5\pm 0.440)} [𝟖,𝟕𝟖𝟏±𝟖𝟑𝟒]\bm{[8,781\pm 834]} 𝟗𝟏𝟐±89.7\bm{\langle 912\pm 89.7\rangle} 𝟓𝟎\bm{50} (87.6±0.396)\bm{(87.6\pm 0.396)} [𝟏𝟓,𝟐𝟔𝟗±𝟏,𝟗𝟔𝟖]\bm{[15,269\pm 1,968]} 𝟐,𝟐𝟐𝟗±𝟐𝟓𝟏\bm{\langle 2,229\pm 251\rangle} 𝟓𝟎\bm{50} (𝟒𝟒𝟓±3.56)\bm{(445\pm 3.56)} [𝟕𝟗,𝟕𝟕𝟏±𝟏𝟑,𝟔𝟔𝟒]\bm{[79,771\pm 13,664]} 𝟏𝟏,𝟎𝟐𝟗±𝟏,𝟖𝟗𝟎\bm{\langle 11,029\pm 1,890\rangle}
FTVI hroch^{\text{roc}} 𝟓𝟎\bm{50} (15.7±0.086)(15.7\pm 0.086) [16,134±248][16,134\pm 248] 5,351±98.9\langle 5,351\pm 98.9\rangle 𝟓𝟎\bm{50} (15.5±0.175)(15.5\pm 0.175) [𝟏𝟕,𝟔𝟗𝟓±𝟖𝟐𝟗]\bm{[17,695\pm 829]} 5,081±271\langle 5,081\pm 271\rangle 𝟓𝟎\bm{50} (14.0±0.159)(14.0\pm 0.159) [𝟔,𝟔𝟖𝟕±𝟖𝟔𝟏]\bm{[6,687\pm 861]} 2,314±248\langle 2,314\pm 248\rangle 𝟓𝟎\bm{50} (91.4±0.664)(91.4\pm 0.664) [12,162±717][12,162\pm 717] 2,891±282\langle 2,891\pm 282\rangle 𝟓𝟎\bm{50} (93.8±0.821)(93.8\pm 0.821) [𝟏𝟔,𝟏𝟐𝟐±𝟐,𝟐𝟏𝟗]\bm{[16,122\pm 2,219]} 6,170±723\langle 6,170\pm 723\rangle 𝟓𝟎\bm{50} (523±9.09)(523\pm 9.09) [123,339±13,841][123,339\pm 13,841] 52,385±5,382\langle 52,385\pm 5,382\rangle
iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (14.8±0.070)(14.8\pm 0.070) [26,575±1,122][26,575\pm 1,122] 𝟏,𝟐𝟑𝟒±8.92\bm{\langle 1,234\pm 8.92\rangle} 𝟓𝟎\bm{50} (15.2±0.073)(15.2\pm 0.073) [49,315±2,779][49,315\pm 2,779] 𝟏,𝟗𝟏𝟒±76.1\bm{\langle 1,914\pm 76.1\rangle} 𝟓𝟎\bm{50} (14.5±0.078)(14.5\pm 0.078) [20,672±2,543][20,672\pm 2,543] 𝟖𝟖𝟗±94.4\bm{\langle 889\pm 94.4\rangle} 𝟓𝟎\bm{50} (97.5±0.560)(97.5\pm 0.560) [36,477±4,751][36,477\pm 4,751] 𝟖𝟏𝟗±43.4\bm{\langle 819\pm 43.4\rangle} 𝟓𝟎\bm{50} (98.0±0.441)(98.0\pm 0.441) [41,180±5,678][41,180\pm 5,678] 𝟏,𝟖𝟖𝟖±𝟐𝟎𝟔\bm{\langle 1,888\pm 206\rangle} 𝟓𝟎\bm{50} (507±3.43)(507\pm 3.43) [260,141±33,101][260,141\pm 33,101] 𝟏𝟑,𝟔𝟓𝟗±𝟏,𝟕𝟑𝟗\bm{\langle 13,659\pm 1,739\rangle}
LRTDP hroch^{\text{roc}} 𝟓𝟎\bm{50} (24.0±0.283)(24.0\pm 0.283) [102,730±2,772][102,730\pm 2,772] 19,342±522\langle 19,342\pm 522\rangle 𝟓𝟎\bm{50} (25.6±0.424)(25.6\pm 0.424) [128,979±5,024][128,979\pm 5,024] 22,313±811\langle 22,313\pm 811\rangle 𝟓𝟎\bm{50} (26.2±1.18)(26.2\pm 1.18) [144,314±14,547][144,314\pm 14,547] 22,941±2,053\langle 22,941\pm 2,053\rangle 𝟓𝟎\bm{50} (109±1.32)(109\pm 1.32) [55,781±5,392][55,781\pm 5,392] 14,774±1,470\langle 14,774\pm 1,470\rangle 𝟓𝟎\bm{50} (150±5.61)(150\pm 5.61) [248,434±23,864][248,434\pm 23,864] 58,918±5,093\langle 58,918\pm 5,093\rangle 7 (1,301±258)(1,301\pm 258) [2,304,921±724,145][2,304,921\pm 724,145] 532,833±159,334\langle 532,833\pm 159,334\rangle
CG-iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (132±7.05)(132\pm 7.05) [780,075±60,269][780,075\pm 60,269] 35,591±2,054\langle 35,591\pm 2,054\rangle 𝟓𝟎\bm{50} (573±21.5)(573\pm 21.5) [3,609,948±220,483][3,609,948\pm 220,483] 129,509±5,591\langle 129,509\pm 5,591\rangle 𝟓𝟎\bm{50} (512±29.6)(512\pm 29.6) [2,787,033±277,267][2,787,033\pm 277,267] 106,939±6,900\langle 106,939\pm 6,900\rangle
iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (121±6.59)(121\pm 6.59) [851,211±54,597][851,211\pm 54,597] 31,891±1,871\langle 31,891\pm 1,871\rangle 𝟓𝟎\bm{50} (545±21.0)(545\pm 21.0) [3,955,151±217,566][3,955,151\pm 217,566] 116,921±4,809\langle 116,921\pm 4,809\rangle 𝟓𝟎\bm{50} (478±26.9)(478\pm 26.9) [2,948,487±241,209][2,948,487\pm 241,209] 99,287±6,219\langle 99,287\pm 6,219\rangle
CG-iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (302±3.25)(302\pm 3.25) [16,176,234±22,293][16,176,234\pm 22,293] 459,032±180\langle 459,032\pm 180\rangle 𝟓𝟎\bm{50} (1,615±14.4)(1,615\pm 14.4) [86,204,696±69,607][86,204,696\pm 69,607] 922,272±0.000\langle 922,272\pm 0.000\rangle 𝟓𝟎\bm{50} (1,688±10.3)(1,688\pm 10.3) [89,642,935±187,375][89,642,935\pm 187,375] 922,271±0.000\langle 922,271\pm 0.000\rangle
iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (240±3.19)(240\pm 3.19) [18,475,207±83,150][18,475,207\pm 83,150] 378,670±445\langle 378,670\pm 445\rangle 𝟓𝟎\bm{50} (743±8.73)(743\pm 8.73) [60,577,420±130,400][60,577,420\pm 130,400] 805,407±188\langle 805,407\pm 188\rangle 𝟓𝟎\bm{50} (819±8.47)(819\pm 8.47) [66,960,805±29,399][66,960,805\pm 29,399] 839,082±133\langle 839,082\pm 133\rangle
LRTDP hmaxh^{\text{max}} 𝟓𝟎\bm{50} (567±2.48)(567\pm 2.48) [33,537,233±30,476][33,537,233\pm 30,476] 922,203±2.75\langle 922,203\pm 2.75\rangle 𝟓𝟎\bm{50} (748±4.41)(748\pm 4.41) [49,482,874±26,003][49,482,874\pm 26,003] 922,273±0.000\langle 922,273\pm 0.000\rangle 𝟓𝟎\bm{50} (863±6.07)(863\pm 6.07) [58,962,949±50,989][58,962,949\pm 50,989] 922,272±0.000\langle 922,272\pm 0.000\rangle
Table 4: BW: A(B)[C]DA(B)[C]\langle D\rangle where AA is the coverage out of 50 runs, (B)(B) is the mean CPU time in seconds (and the 95% confidence interval), [C][C] is the mean number of QQ-values computed (and the 95% C.I.), and D\langle D\rangle is the mean number of calls to the heuristic (and the 95% C.I.). Values for BB, CC and DD only consider successful runs.
algorithm heuristic p07-n7-N9-s7 p08-n8-N10-s8 p09-n9-N11-s9 p10-n10-N12-s10 p11-n11-N13-s11
CG-iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (0.585±0.031)(0.585\pm 0.031) [3,289±360][3,289\pm 360] 1,406±130\langle 1,406\pm 130\rangle 𝟓𝟎\bm{50} (68.8±1.70)(68.8\pm 1.70) [8,551,674±24,812][8,551,674\pm 24,812] 164,822±1,461\langle 164,822\pm 1,461\rangle 𝟓𝟎\bm{50} (7.09±0.086)(7.09\pm 0.086) [𝟒𝟓𝟖,𝟏𝟔𝟏±𝟐,𝟗𝟔𝟓]\bm{[458,161\pm 2,965]} 𝟐𝟏,𝟓𝟎𝟑±𝟐𝟐𝟎\bm{\langle 21,503\pm 220\rangle} 𝟓𝟎\bm{50} (299±9.21)(299\pm 9.21) [75,922,919±228,249][75,922,919\pm 228,249] 214,709±1,046\langle 214,709\pm 1,046\rangle 𝟓𝟎\bm{50} (𝟒𝟑𝟑±2.36)\bm{(433\pm 2.36)} [11,183,164±33,827][11,183,164\pm 33,827] 1,444,012±2,101\langle 1,444,012\pm 2,101\rangle
FTVI hroch^{\text{roc}} 𝟓𝟎\bm{50} (1.15±0.127)(1.15\pm 0.127) [4,323±687][4,323\pm 687] 3,929±564\langle 3,929\pm 564\rangle
iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (0.601±0.026)(0.601\pm 0.026) [9,971±571][9,971\pm 571] 1,434±112\langle 1,434\pm 112\rangle 𝟓𝟎\bm{50} (200±1.86)(200\pm 1.86) [31,838,096±38,023][31,838,096\pm 38,023] 160,179±1,431\langle 160,179\pm 1,431\rangle 𝟓𝟎\bm{50} (21.3±0.802)(21.3\pm 0.802) [2,689,211±133,082][2,689,211\pm 133,082] 27,797±284\langle 27,797\pm 284\rangle 𝟓𝟎\bm{50} (490±2.96)(490\pm 2.96) [26,675,612±33,185][26,675,612\pm 33,185] 1,465,487±1,379\langle 1,465,487\pm 1,379\rangle
LRTDP hroch^{\text{roc}} 𝟓𝟎\bm{50} (1.16±0.092)(1.16\pm 0.092) [11,266±1,480][11,266\pm 1,480] 3,904±376\langle 3,904\pm 376\rangle 𝟓𝟎\bm{50} (62.5±1.01)\bm{(62.5\pm 1.01)} [3,048,449±95,535][3,048,449\pm 95,535] 228,649±3,906\langle 228,649\pm 3,906\rangle 𝟓𝟎\bm{50} (19.9±0.270)(19.9\pm 0.270) [1,688,157±20,216][1,688,157\pm 20,216] 63,101±985\langle 63,101\pm 985\rangle 𝟓𝟎\bm{50} (136±1.80)(136\pm 1.80) [𝟖,𝟕𝟒𝟏,𝟒𝟓𝟐±𝟔𝟐,𝟒𝟏𝟔]\bm{[8,741,452\pm 62,416]} 428,008±5,579\langle 428,008\pm 5,579\rangle 𝟓𝟎\bm{50} (1,272±26.3)(1,272\pm 26.3) [20,222,574±270,633][20,222,574\pm 270,633] 4,334,206±68,377\langle 4,334,206\pm 68,377\rangle
CG-iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (0.136±0.006)\bm{(0.136\pm 0.006)} [𝟏,𝟔𝟕𝟐±69.7]\bm{[1,672\pm 69.7]} 𝟖𝟔𝟖±34.1\bm{\langle 868\pm 34.1\rangle} 𝟓𝟎\bm{50} (65.8±1.39)(65.8\pm 1.39) [9,020,624±21,098][9,020,624\pm 21,098] 𝟏𝟑𝟕,𝟎𝟑𝟐±𝟏,𝟓𝟗𝟑\bm{\langle 137,032\pm 1,593\rangle} 𝟓𝟎\bm{50} (8.88±0.082)(8.88\pm 0.082) [598,012±3,912][598,012\pm 3,912] 22,187±320\langle 22,187\pm 320\rangle 𝟓𝟎\bm{50} (347±8.46)(347\pm 8.46) [67,704,563±205,137][67,704,563\pm 205,137] 𝟐𝟎𝟖,𝟎𝟒𝟖±𝟖𝟒𝟓\bm{\langle 208,048\pm 845\rangle} 𝟓𝟎\bm{50} (1,016±4.25)(1,016\pm 4.25) [𝟏𝟏,𝟎𝟐𝟔,𝟐𝟏𝟔±𝟑𝟑,𝟑𝟖𝟖]\bm{[11,026,216\pm 33,388]} 1,268,011±2,443\langle 1,268,011\pm 2,443\rangle
FTVI hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (0.561±0.082)(0.561\pm 0.082) [3,828±615][3,828\pm 615] 3,431±506\langle 3,431\pm 506\rangle
iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (0.233±0.021)(0.233\pm 0.021) [8,635±453][8,635\pm 453] 1,297±101\langle 1,297\pm 101\rangle 𝟓𝟎\bm{50} (229±3.25)(229\pm 3.25) [35,341,647±45,652][35,341,647\pm 45,652] 𝟏𝟑𝟕,𝟓𝟗𝟑±𝟏,𝟓𝟓𝟎\bm{\langle 137,593\pm 1,550\rangle} 𝟓𝟎\bm{50} (21.8±0.678)(21.8\pm 0.678) [2,581,881±102,361][2,581,881\pm 102,361] 27,040±285\langle 27,040\pm 285\rangle 𝟓𝟎\bm{50} (1,111±6.48)(1,111\pm 6.48) [25,842,856±43,427][25,842,856\pm 43,427] 𝟏,𝟐𝟔𝟑,𝟔𝟗𝟐±𝟖𝟑𝟕\bm{\langle 1,263,692\pm 837\rangle}
LRTDP hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (0.485±0.047)(0.485\pm 0.047) [5,561±716][5,561\pm 716] 2,863±280\langle 2,863\pm 280\rangle 𝟓𝟎\bm{50} (91.9±1.23)(91.9\pm 1.23) [𝟐,𝟖𝟓𝟖,𝟑𝟗𝟕±𝟓𝟔,𝟑𝟓𝟒]\bm{[2,858,397\pm 56,354]} 240,751±3,305\langle 240,751\pm 3,305\rangle 𝟓𝟎\bm{50} (35.7±0.613)(35.7\pm 0.613) [1,704,587±20,139][1,704,587\pm 20,139] 65,962±1,098\langle 65,962\pm 1,098\rangle 𝟓𝟎\bm{50} (392±5.43)(392\pm 5.43) [9,892,329±59,686][9,892,329\pm 59,686] 516,447±7,122\langle 516,447\pm 7,122\rangle
CG-iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (1.69±0.011)(1.69\pm 0.011) [284,993±2,474][284,993\pm 2,474] 43,323±242\langle 43,323\pm 242\rangle 𝟓𝟎\bm{50} (78.5±1.96)(78.5\pm 1.96) [18,188,194±35,225][18,188,194\pm 35,225] 819,653±1,842\langle 819,653\pm 1,842\rangle 𝟓𝟎\bm{50} (4.00±0.051)\bm{(4.00\pm 0.051)} [966,920±5,282][966,920\pm 5,282] 43,816±443\langle 43,816\pm 443\rangle 𝟓𝟎\bm{50} (308±11.5)(308\pm 11.5) [107,680,890±50,168][107,680,890\pm 50,168] 282,810±945\langle 282,810\pm 945\rangle
iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (2.98±0.023)(2.98\pm 0.023) [837,143±4,822][837,143\pm 4,822] 41,336±324\langle 41,336\pm 324\rangle 𝟓𝟎\bm{50} (331±5.48)(331\pm 5.48) [57,048,934±51,381][57,048,934\pm 51,381] 739,570±1,511\langle 739,570\pm 1,511\rangle 𝟓𝟎\bm{50} (13.8±0.764)(13.8\pm 0.764) [2,678,145±114,072][2,678,145\pm 114,072] 42,956±410\langle 42,956\pm 410\rangle
LRTDP hmaxh^{\text{max}} 𝟓𝟎\bm{50} (6.59±0.169)(6.59\pm 0.169) [779,021±18,621][779,021\pm 18,621] 175,517±3,824\langle 175,517\pm 3,824\rangle 𝟓𝟎\bm{50} (89.1±0.880)(89.1\pm 0.880) [19,435,368±123,222][19,435,368\pm 123,222] 1,349,405±25,971\langle 1,349,405\pm 25,971\rangle 𝟓𝟎\bm{50} (14.0±0.195)(14.0\pm 0.195) [3,314,262±32,347][3,314,262\pm 32,347] 145,387±1,891\langle 145,387\pm 1,891\rangle 𝟓𝟎\bm{50} (𝟏𝟎𝟓±0.821)\bm{(105\pm 0.821)} [19,713,845±112,489][19,713,845\pm 112,489] 842,756±9,501\langle 842,756\pm 9,501\rangle
Table 5: ExBW: A(B)[C]DA(B)[C]\langle D\rangle where AA is the coverage out of 50 runs, (B)(B) is the mean CPU time in seconds (and the 95% confidence interval), [C][C] is the mean number of QQ-values computed (and the 95% C.I.), and D\langle D\rangle is the mean number of calls to the heuristic (and the 95% C.I.). Values for BB, CC and DD only consider successful runs.
algorithm heuristic s4-c1 s4-c2 s4-c3 s5-c1 s5-c2 s5-c3
CG-iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (10.5±0.155)(10.5\pm 0.155) [1,013,976±15,173][1,013,976\pm 15,173] 31,350±447\langle 31,350\pm 447\rangle 𝟓𝟎\bm{50} (16.1±0.256)(16.1\pm 0.256) [1,715,321±20,139][1,715,321\pm 20,139] 42,106±761\langle 42,106\pm 761\rangle 𝟓𝟎\bm{50} (22.1±0.403)(22.1\pm 0.403) [3,022,500±43,724][3,022,500\pm 43,724] 54,854±1,271\langle 54,854\pm 1,271\rangle 𝟓𝟎\bm{50} (114±1.38)(114\pm 1.38) [12,324,266±114,576][12,324,266\pm 114,576] 307,166±3,809\langle 307,166\pm 3,809\rangle 𝟓𝟎\bm{50} (180±3.49)(180\pm 3.49) [20,993,881±208,715][20,993,881\pm 208,715] 404,594±7,231\langle 404,594\pm 7,231\rangle 𝟓𝟎\bm{50} (244±4.14)(244\pm 4.14) [37,055,726±466,292][37,055,726\pm 466,292] 515,254±12,255\langle 515,254\pm 12,255\rangle
iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (14.0±0.194)(14.0\pm 0.194) [3,044,522±74,676][3,044,522\pm 74,676] 𝟐𝟕,𝟖𝟒𝟓±𝟐𝟔𝟗\bm{\langle 27,845\pm 269\rangle} 𝟓𝟎\bm{50} (24.0±0.389)(24.0\pm 0.389) [5,344,883±121,538][5,344,883\pm 121,538] 𝟑𝟖,𝟒𝟓𝟏±𝟓𝟒𝟑\bm{\langle 38,451\pm 543\rangle} 𝟓𝟎\bm{50} (39.8±0.888)(39.8\pm 0.888) [9,597,755±277,641][9,597,755\pm 277,641] 𝟓𝟎,𝟐𝟎𝟏±𝟏,𝟎𝟒𝟓\bm{\langle 50,201\pm 1,045\rangle} 𝟓𝟎\bm{50} (184±2.73)(184\pm 2.73) [43,507,677±1,056,857][43,507,677\pm 1,056,857] 𝟐𝟖𝟔,𝟐𝟗𝟕±𝟐,𝟑𝟗𝟎\bm{\langle 286,297\pm 2,390\rangle} 𝟓𝟎\bm{50} (321±6.36)(321\pm 6.36) [73,968,382±1,836,330][73,968,382\pm 1,836,330] 𝟑𝟕𝟖,𝟎𝟕𝟏±𝟓,𝟖𝟕𝟑\bm{\langle 378,071\pm 5,873\rangle} 𝟓𝟎\bm{50} (525±8.69)(525\pm 8.69) [127,869,617±3,713,867][127,869,617\pm 3,713,867] 𝟒𝟕𝟖,𝟕𝟕𝟖±𝟗,𝟕𝟑𝟐\bm{\langle 478,778\pm 9,732\rangle}
LRTDP hroch^{\text{roc}} 𝟓𝟎\bm{50} (8.78±0.141)\bm{(8.78\pm 0.141)} [𝟒𝟑𝟔,𝟓𝟑𝟏±𝟑,𝟎𝟓𝟖]\bm{[436,531\pm 3,058]} 31,717±561\langle 31,717\pm 561\rangle 𝟓𝟎\bm{50} (13.1±0.270)\bm{(13.1\pm 0.270)} [𝟓𝟖𝟓,𝟖𝟖𝟖±𝟑,𝟎𝟕𝟏]\bm{[585,888\pm 3,071]} 41,835±942\langle 41,835\pm 942\rangle 𝟓𝟎\bm{50} (16.4±0.372)\bm{(16.4\pm 0.372)} [𝟔𝟖𝟎,𝟐𝟑𝟕±𝟒,𝟏𝟗𝟎]\bm{[680,237\pm 4,190]} 53,681±1,522\langle 53,681\pm 1,522\rangle 𝟓𝟎\bm{50} (𝟏𝟎𝟎±1.23)\bm{(100\pm 1.23)} [𝟓,𝟐𝟔𝟏,𝟏𝟑𝟐±𝟒𝟏,𝟕𝟒𝟑]\bm{[5,261,132\pm 41,743]} 310,181±4,941\langle 310,181\pm 4,941\rangle 𝟓𝟎\bm{50} (𝟏𝟓𝟎±2.32)\bm{(150\pm 2.32)} [𝟕,𝟎𝟗𝟖,𝟏𝟏𝟒±𝟒𝟑,𝟓𝟓𝟎]\bm{[7,098,114\pm 43,550]} 398,327±8,576\langle 398,327\pm 8,576\rangle 𝟓𝟎\bm{50} (𝟏𝟖𝟒±3.72)\bm{(184\pm 3.72)} [𝟖,𝟐𝟕𝟑,𝟒𝟗𝟕±𝟒𝟎,𝟖𝟗𝟏]\bm{[8,273,497\pm 40,891]} 𝟒𝟗𝟔,𝟑𝟕𝟏±𝟏𝟑,𝟕𝟎𝟏\bm{\langle 496,371\pm 13,701\rangle}
CG-iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (10.7±0.070)(10.7\pm 0.070) [1,608,662±17,455][1,608,662\pm 17,455] 46,829±436\langle 46,829\pm 436\rangle 𝟓𝟎\bm{50} (19.4±0.138)(19.4\pm 0.138) [3,367,791±40,108][3,367,791\pm 40,108] 77,150±782\langle 77,150\pm 782\rangle 𝟓𝟎\bm{50} (33.1±0.196)(33.1\pm 0.196) [5,962,957±81,595][5,962,957\pm 81,595] 138,353±1,487\langle 138,353\pm 1,487\rangle 𝟓𝟎\bm{50} (141±0.855)(141\pm 0.855) [19,536,893±198,117][19,536,893\pm 198,117] 484,752±4,355\langle 484,752\pm 4,355\rangle 𝟓𝟎\bm{50} (268±2.53)(268\pm 2.53) [43,716,409±423,614][43,716,409\pm 423,614] 829,264±8,947\langle 829,264\pm 8,947\rangle 𝟓𝟎\bm{50} (477±5.28)(477\pm 5.28) [75,760,732±884,923][75,760,732\pm 884,923] 1,476,484±17,030\langle 1,476,484\pm 17,030\rangle
iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (16.5±0.198)(16.5\pm 0.198) [4,649,445±114,510][4,649,445\pm 114,510] 39,767±379\langle 39,767\pm 379\rangle 𝟓𝟎\bm{50} (36.5±0.499)(36.5\pm 0.499) [10,851,237±231,976][10,851,237\pm 231,976] 64,040±835\langle 64,040\pm 835\rangle 𝟓𝟎\bm{50} (61.8±1.06)(61.8\pm 1.06) [17,429,097±478,163][17,429,097\pm 478,163] 109,311±1,250\langle 109,311\pm 1,250\rangle 𝟓𝟎\bm{50} (229±3.64)(229\pm 3.64) [59,864,905±1,529,872][59,864,905\pm 1,529,872] 405,784±3,557\langle 405,784\pm 3,557\rangle 𝟓𝟎\bm{50} (539±8.11)(539\pm 8.11) [148,174,993±3,339,338][148,174,993\pm 3,339,338] 658,100±8,781\langle 658,100\pm 8,781\rangle 𝟓𝟎\bm{50} (941±15.5)(941\pm 15.5) [238,099,853±6,527,230][238,099,853\pm 6,527,230] 1,121,729±13,299\langle 1,121,729\pm 13,299\rangle
LRTDP hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (11.6±0.165)(11.6\pm 0.165) [932,940±5,927][932,940\pm 5,927] 68,166±1,087\langle 68,166\pm 1,087\rangle 𝟓𝟎\bm{50} (20.9±0.315)(20.9\pm 0.315) [1,490,562±8,226][1,490,562\pm 8,226] 116,119±2,166\langle 116,119\pm 2,166\rangle 𝟓𝟎\bm{50} (32.0±0.404)(32.0\pm 0.404) [2,113,526±10,839][2,113,526\pm 10,839] 177,797±3,138\langle 177,797\pm 3,138\rangle 𝟓𝟎\bm{50} (156±1.72)(156\pm 1.72) [11,634,002±108,431][11,634,002\pm 108,431] 702,833±10,416\langle 702,833\pm 10,416\rangle 𝟓𝟎\bm{50} (269±3.27)(269\pm 3.27) [18,809,583±146,182][18,809,583\pm 146,182] 1,182,469±23,053\langle 1,182,469\pm 23,053\rangle 𝟓𝟎\bm{50} (432±5.53)(432\pm 5.53) [26,274,405±171,814][26,274,405\pm 171,814] 1,808,411±33,918\langle 1,808,411\pm 33,918\rangle
CG-iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (107±1.18)(107\pm 1.18) [45,597,493±83,407][45,597,493\pm 83,407] 639,153±73.7\langle 639,153\pm 73.7\rangle 𝟓𝟎\bm{50} (374±5.78)(374\pm 5.78) [132,743,888±220,135][132,743,888\pm 220,135] 1,940,914±351\langle 1,940,914\pm 351\rangle 𝟓𝟎\bm{50} (610±11.1)(610\pm 11.1) [203,454,719±410,922][203,454,719\pm 410,922] 2,773,558±1,585\langle 2,773,558\pm 1,585\rangle
iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (217±3.22)(217\pm 3.22) [99,246,994±1,488,977][99,246,994\pm 1,488,977] 618,593±142\langle 618,593\pm 142\rangle 𝟓𝟎\bm{50} (790±14.3)(790\pm 14.3) [317,087,417±6,424,885][317,087,417\pm 6,424,885] 1,787,826±447\langle 1,787,826\pm 447\rangle 𝟓𝟎\bm{50} (1,425±38.8)(1,425\pm 38.8) [448,028,105±13,746,351][448,028,105\pm 13,746,351] 2,229,375±1,267\langle 2,229,375\pm 1,267\rangle
LRTDP hmaxh^{\text{max}} 𝟓𝟎\bm{50} (39.5±0.419)(39.5\pm 0.419) [17,779,098±21,526][17,779,098\pm 21,526] 588,247±45.9\langle 588,247\pm 45.9\rangle 𝟓𝟎\bm{50} (254±2.51)(254\pm 2.51) [126,641,023±252,570][126,641,023\pm 252,570] 1,627,845±75.4\langle 1,627,845\pm 75.4\rangle 𝟓𝟎\bm{50} (403±5.60)(403\pm 5.60) [175,415,264±251,069][175,415,264\pm 251,069] 2,593,447±1,011\langle 2,593,447\pm 1,011\rangle
Table 6: PARC-N: A(B)[C]DA(B)[C]\langle D\rangle where AA is the coverage out of 50 runs, (B)(B) is the mean CPU time in seconds (and the 95% confidence interval), [C][C] is the mean number of QQ-values computed (and the 95% C.I.), and D\langle D\rangle is the mean number of calls to the heuristic (and the 95% C.I.). Values for BB, CC and DD only consider successful runs.
algorithm heuristic s4-c1 s4-c2 s4-c3 s5-c1 s5-c2
CG-iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (45.9±0.525)\bm{(45.9\pm 0.525)} [𝟔,𝟐𝟐𝟖,𝟗𝟗𝟕±𝟐𝟎,𝟔𝟓𝟔]\bm{[6,228,997\pm 20,656]} 95,096±261\langle 95,096\pm 261\rangle 𝟓𝟎\bm{50} (𝟏𝟐𝟏±1.38)\bm{(121\pm 1.38)} [14,811,715±28,578][14,811,715\pm 28,578] 208,401±175\langle 208,401\pm 175\rangle 𝟓𝟎\bm{50} (181±2.27)(181\pm 2.27) [22,138,217±27,456][22,138,217\pm 27,456] 281,753±109\langle 281,753\pm 109\rangle 𝟓𝟎\bm{50} (𝟓𝟖𝟑±7.32)\bm{(583\pm 7.32)} [𝟔𝟑,𝟖𝟐𝟏,𝟐𝟕𝟖±𝟏𝟖𝟎,𝟔𝟕𝟔]\bm{[63,821,278\pm 180,676]} 1,061,846±1,937\langle 1,061,846\pm 1,937\rangle 𝟓𝟎\bm{50} (1,657±16.4)(1,657\pm 16.4) [𝟏𝟔𝟏,𝟗𝟗𝟏,𝟔𝟎𝟗±𝟐𝟒𝟐,𝟓𝟑𝟐]\bm{[161,991,609\pm 242,532]} 2,299,310±1,043\langle 2,299,310\pm 1,043\rangle
FTVI hroch^{\text{roc}} 𝟓𝟎\bm{50} (765±37.9)(765\pm 37.9) [𝟓,𝟓𝟏𝟒,𝟖𝟖𝟕±𝟑𝟗,𝟕𝟕𝟐]\bm{[5,514,887\pm 39,772]} 235,023±1,076\langle 235,023\pm 1,076\rangle 𝟓𝟎\bm{50} (613±69.3)(613\pm 69.3) [𝟔,𝟕𝟔𝟕,𝟎𝟎𝟓±𝟒𝟗,𝟑𝟐𝟑]\bm{[6,767,005\pm 49,323]} 307,358±1,049\langle 307,358\pm 1,049\rangle
iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (58.3±0.398)(58.3\pm 0.398) [15,636,208±88,234][15,636,208\pm 88,234] 𝟖𝟒,𝟑𝟒𝟎±𝟏𝟐𝟑\bm{\langle 84,340\pm 123\rangle} 𝟓𝟎\bm{50} (126±0.667)(126\pm 0.667) [32,377,004±137,024][32,377,004\pm 137,024] 𝟏𝟖𝟑,𝟎𝟕𝟗±𝟐𝟐𝟑\bm{\langle 183,079\pm 223\rangle} 𝟓𝟎\bm{50} (𝟏𝟔𝟖±0.749)\bm{(168\pm 0.749)} [41,961,804±133,535][41,961,804\pm 133,535] 268,261±312\langle 268,261\pm 312\rangle 𝟓𝟎\bm{50} (652±4.57)(652\pm 4.57) [150,228,580±756,329][150,228,580\pm 756,329] 𝟖𝟖𝟕,𝟓𝟐𝟓±𝟏,𝟐𝟏𝟕\bm{\langle 887,525\pm 1,217\rangle} 𝟓𝟎\bm{50} (𝟏,𝟓𝟎𝟐±17.5)\bm{(1,502\pm 17.5)} [292,637,047±1,224,742][292,637,047\pm 1,224,742] 𝟏,𝟗𝟒𝟓,𝟏𝟓𝟔±𝟑,𝟔𝟒𝟑\bm{\langle 1,945,156\pm 3,643\rangle}
LRTDP hroch^{\text{roc}} 𝟓𝟎\bm{50} (93.2±0.568)(93.2\pm 0.568) [41,042,790±152,970][41,042,790\pm 152,970] 92,102±236\langle 92,102\pm 236\rangle 𝟓𝟎\bm{50} (674±4.92)(674\pm 4.92) [352,166,761±1,539,678][352,166,761\pm 1,539,678] 196,298±270\langle 196,298\pm 270\rangle 𝟓𝟎\bm{50} (1,513±8.48)(1,513\pm 8.48) [797,662,187±3,139,450][797,662,187\pm 3,139,450] 𝟐𝟔𝟓,𝟖𝟒𝟒±𝟐𝟏𝟑\bm{\langle 265,844\pm 213\rangle} 𝟓𝟎\bm{50} (1,335±11.3)(1,335\pm 11.3) [547,832,850±1,127,533][547,832,850\pm 1,127,533] 984,575±2,753\langle 984,575\pm 2,753\rangle
CG-iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (68.8±0.339)(68.8\pm 0.339) [9,184,226±26,200][9,184,226\pm 26,200] 148,718±270\langle 148,718\pm 270\rangle 𝟓𝟎\bm{50} (212±2.82)(212\pm 2.82) [26,805,385±35,474][26,805,385\pm 35,474] 351,686±229\langle 351,686\pm 229\rangle 𝟓𝟎\bm{50} (322±2.82)(322\pm 2.82) [38,522,863±94,665][38,522,863\pm 94,665] 517,017±607\langle 517,017\pm 607\rangle 𝟓𝟎\bm{50} (1,003±7.77)(1,003\pm 7.77) [105,649,983±292,360][105,649,983\pm 292,360] 1,669,291±2,205\langle 1,669,291\pm 2,205\rangle
FTVI hlmch^{\text{lmc}} 20 (367±116)(367\pm 116) [12,477,085±70,147][12,477,085\pm 70,147] 563,694±6,066\langle 563,694\pm 6,066\rangle
iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (84.9±1.31)(84.9\pm 1.31) [20,392,448±117,401][20,392,448\pm 117,401] 143,780±238\langle 143,780\pm 238\rangle 𝟓𝟎\bm{50} (213±1.13)(213\pm 1.13) [49,478,612±188,165][49,478,612\pm 188,165] 342,211±345\langle 342,211\pm 345\rangle 𝟓𝟎\bm{50} (296±2.26)(296\pm 2.26) [67,320,675±240,270][67,320,675\pm 240,270] 471,351±641\langle 471,351\pm 641\rangle 𝟓𝟎\bm{50} (1,112±6.03)(1,112\pm 6.03) [209,913,627±967,213][209,913,627\pm 967,213] 1,614,163±2,175\langle 1,614,163\pm 2,175\rangle
LRTDP hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (340±1.30)(340\pm 1.30) [119,765,590±396,502][119,765,590\pm 396,502] 190,315±712\langle 190,315\pm 712\rangle
Table 7: PARC-R: A(B)[C]DA(B)[C]\langle D\rangle where AA is the coverage out of 50 runs, (B)(B) is the mean CPU time in seconds (and the 95% confidence interval), [C][C] is the mean number of QQ-values computed (and the 95% C.I.), and D\langle D\rangle is the mean number of calls to the heuristic (and the 95% C.I.). Values for BB, CC and DD only consider successful runs.
algorithm heuristic TW(4,8)\text{TW}(4,8) (original TW 4) TW(5,10)\text{TW}(5,10) (original TW 5) TW(6,8)\text{TW}(6,8) TW(6,9)\text{TW}(6,9)
CG-iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (14.4±0.211)(14.4\pm 0.211) [1,148,040±16,387][1,148,040\pm 16,387] 𝟒𝟎,𝟔𝟑𝟒±𝟏𝟑𝟔\bm{\langle 40,634\pm 136\rangle} 𝟓𝟎\bm{50} (𝟑𝟗𝟓±3.11)\bm{(395\pm 3.11)} [31,842,440±247,363][31,842,440\pm 247,363] 𝟓𝟕𝟗,𝟏𝟕𝟔±𝟏,𝟕𝟕𝟗\bm{\langle 579,176\pm 1,779\rangle} 𝟓𝟎\bm{50} (𝟓𝟏𝟔±6.29)\bm{(516\pm 6.29)} [32,427,798±556,567][32,427,798\pm 556,567] 𝟗𝟎𝟖,𝟗𝟑𝟔±𝟏,𝟔𝟔𝟓\bm{\langle 908,936\pm 1,665\rangle} 𝟓𝟎\bm{50} (𝟏,𝟐𝟐𝟑±14.4)\bm{(1,223\pm 14.4)} [𝟖𝟎,𝟕𝟒𝟔,𝟐𝟏𝟓±𝟏,𝟎𝟔𝟏,𝟔𝟔𝟕]\bm{[80,746,215\pm 1,061,667]} 𝟏,𝟖𝟕𝟓,𝟖𝟓𝟕±𝟒,𝟏𝟏𝟏\bm{\langle 1,875,857\pm 4,111\rangle}
FTVI hroch^{\text{roc}} 𝟓𝟎\bm{50} (42.0±0.601)(42.0\pm 0.601) [𝟑𝟖𝟔,𝟕𝟎𝟐±𝟑,𝟑𝟓𝟔]\bm{[386,702\pm 3,356]} 190,435±3,029\langle 190,435\pm 3,029\rangle 𝟓𝟎\bm{50} (695±11.5)(695\pm 11.5) [𝟕,𝟐𝟗𝟎,𝟕𝟕𝟑±𝟖𝟑,𝟔𝟗𝟐]\bm{[7,290,773\pm 83,692]} 2,400,870±43,908\langle 2,400,870\pm 43,908\rangle 𝟓𝟎\bm{50} (1,156±31.4)(1,156\pm 31.4) [𝟗,𝟒𝟐𝟗,𝟖𝟗𝟔±𝟏𝟐𝟎,𝟖𝟏𝟔]\bm{[9,429,896\pm 120,816]} 3,646,876±109,529\langle 3,646,876\pm 109,529\rangle
iLAO hroch^{\text{roc}} 𝟓𝟎\bm{50} (20.3±0.162)(20.3\pm 0.162) [2,456,523±11,941][2,456,523\pm 11,941] 50,203±295\langle 50,203\pm 295\rangle 𝟓𝟎\bm{50} (627±5.06)(627\pm 5.06) [64,753,531±556,643][64,753,531\pm 556,643] 730,691±4,690\langle 730,691\pm 4,690\rangle 𝟓𝟎\bm{50} (794±9.02)(794\pm 9.02) [63,978,036±925,163][63,978,036\pm 925,163] 1,175,526±7,985\langle 1,175,526\pm 7,985\rangle
LRTDP hroch^{\text{roc}} 𝟓𝟎\bm{50} (23.6±0.134)(23.6\pm 0.134) [2,003,448±12,008][2,003,448\pm 12,008] 88,076±545\langle 88,076\pm 545\rangle 𝟓𝟎\bm{50} (481±6.70)(481\pm 6.70) [49,385,315±285,830][49,385,315\pm 285,830] 1,048,156±7,230\langle 1,048,156\pm 7,230\rangle 𝟓𝟎\bm{50} (719±7.32)(719\pm 7.32) [58,413,438±267,244][58,413,438\pm 267,244] 1,563,223±13,711\langle 1,563,223\pm 13,711\rangle 45 (1,667±16.6)(1,667\pm 16.6) [140,361,933±662,188][140,361,933\pm 662,188] 3,195,700±27,373\langle 3,195,700\pm 27,373\rangle
CG-iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (14.9±0.380)(14.9\pm 0.380) [2,090,103±11,607][2,090,103\pm 11,607] 100,788±452\langle 100,788\pm 452\rangle 𝟓𝟎\bm{50} (581±10.5)(581\pm 10.5) [58,509,834±285,036][58,509,834\pm 285,036] 1,717,974±10,015\langle 1,717,974\pm 10,015\rangle 𝟓𝟎\bm{50} (972±16.0)(972\pm 16.0) [90,024,795±604,542][90,024,795\pm 604,542] 3,208,338±17,172\langle 3,208,338\pm 17,172\rangle
FTVI hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (20.7±0.257)(20.7\pm 0.257) [991,116±14,912][991,116\pm 14,912] 460,168±6,570\langle 460,168\pm 6,570\rangle 𝟓𝟎\bm{50} (746±13.7)(746\pm 13.7) [21,539,497±356,024][21,539,497\pm 356,024] 7,726,282±108,583\langle 7,726,282\pm 108,583\rangle 37 (1,621±29.0)(1,621\pm 29.0) [36,677,426±771,289][36,677,426\pm 771,289] 14,835,983±193,833\langle 14,835,983\pm 193,833\rangle
iLAO hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (22.7±0.212)(22.7\pm 0.212) [4,444,825±21,791][4,444,825\pm 21,791] 111,278±490\langle 111,278\pm 490\rangle 𝟓𝟎\bm{50} (1,082±12.2)(1,082\pm 12.2) [129,029,339±518,976][129,029,339\pm 518,976] 1,893,490±8,924\langle 1,893,490\pm 8,924\rangle 40 (1,735±12.3)(1,735\pm 12.3) [195,389,276±910,083][195,389,276\pm 910,083] 3,527,186±17,438\langle 3,527,186\pm 17,438\rangle
LRTDP hlmch^{\text{lmc}} 𝟓𝟎\bm{50} (17.3±0.114)(17.3\pm 0.114) [3,018,205±12,805][3,018,205\pm 12,805] 321,326±5,765\langle 321,326\pm 5,765\rangle 𝟓𝟎\bm{50} (601±15.2)(601\pm 15.2) [67,063,225±207,619][67,063,225\pm 207,619] 4,924,764±104,716\langle 4,924,764\pm 104,716\rangle 49 (1,490±33.9)(1,490\pm 33.9) [115,518,704±447,329][115,518,704\pm 447,329] 9,968,737±218,595\langle 9,968,737\pm 218,595\rangle
CG-iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (10.6±0.307)(10.6\pm 0.307) [2,090,103±11,607][2,090,103\pm 11,607] 100,788±452\langle 100,788\pm 452\rangle 𝟓𝟎\bm{50} (473±7.72)(473\pm 7.72) [58,509,834±285,036][58,509,834\pm 285,036] 1,717,974±10,015\langle 1,717,974\pm 10,015\rangle 𝟓𝟎\bm{50} (737±13.1)(737\pm 13.1) [90,024,795±604,542][90,024,795\pm 604,542] 3,208,338±17,172\langle 3,208,338\pm 17,172\rangle 11 (1,746±19.6)(1,746\pm 19.6) [201,475,187±2,092,087][201,475,187\pm 2,092,087] 6,584,678±88,077\langle 6,584,678\pm 88,077\rangle
FTVI hmaxh^{\text{max}} 𝟓𝟎\bm{50} (11.7±0.189)(11.7\pm 0.189) [991,116±14,912][991,116\pm 14,912] 460,168±6,570\langle 460,168\pm 6,570\rangle 𝟓𝟎\bm{50} (520±11.5)(520\pm 11.5) [21,539,497±356,024][21,539,497\pm 356,024] 7,726,282±108,583\langle 7,726,282\pm 108,583\rangle 𝟓𝟎\bm{50} (1,147±38.7)(1,147\pm 38.7) [38,153,835±990,907][38,153,835\pm 990,907] 15,053,868±212,968\langle 15,053,868\pm 212,968\rangle
iLAO hmaxh^{\text{max}} 𝟓𝟎\bm{50} (18.7±0.249)(18.7\pm 0.249) [4,444,825±21,791][4,444,825\pm 21,791] 111,278±490\langle 111,278\pm 490\rangle 𝟓𝟎\bm{50} (992±11.2)(992\pm 11.2) [129,029,339±518,976][129,029,339\pm 518,976] 1,893,490±8,924\langle 1,893,490\pm 8,924\rangle 𝟓𝟎\bm{50} (1,604±16.8)(1,604\pm 16.8) [195,835,939±828,889][195,835,939\pm 828,889] 3,536,389±16,793\langle 3,536,389\pm 16,793\rangle
LRTDP hmaxh^{\text{max}} 𝟓𝟎\bm{50} (9.80±0.083)\bm{(9.80\pm 0.083)} [3,013,611±10,921][3,013,611\pm 10,921] 321,347±5,796\langle 321,347\pm 5,796\rangle 𝟓𝟎\bm{50} (415±8.91)(415\pm 8.91) [66,996,690±214,188][66,996,690\pm 214,188] 4,923,361±104,809\langle 4,923,361\pm 104,809\rangle 𝟓𝟎\bm{50} (1,030±31.3)(1,030\pm 31.3) [115,558,245±461,900][115,558,245\pm 461,900] 10,022,348±213,992\langle 10,022,348\pm 213,992\rangle
Table 8: TW: A(B)[C]DA(B)[C]\langle D\rangle where AA is the coverage out of 50 runs, (B)(B) is the mean CPU time in seconds (and the 95% confidence interval), [C][C] is the mean number of QQ-values computed (and the 95% C.I.), and D\langle D\rangle is the mean number of calls to the heuristic (and the 95% C.I.). Values for BB, CC and DD only consider successful runs.