
General Policies, Serializations, and Planning Width

Blai Bonet,1 Hector Geffner2
Abstract

It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems have indeed a bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of state-of-the-art planning algorithms like BFWS, there is still no good explanation for why so many benchmark domains have bounded width. In this work, we address this question by relating bounded width and serialized width to ideas of generalized planning, where general policies aim to solve multiple instances of a planning problem all at once. We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding. The results are extended to the larger class of domains with bounded serialized width where the general policies do not have to be optimal. The study also leads to a new simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches, which can be used for encoding domain control knowledge by hand or for learning it from traces. The use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.

This paper is the long version of (Bonet and Geffner 2021).

Introduction

Pure width-based search methods exploit the structure of states to enumerate the state space in ways that are different from standard methods like breadth-first, depth-first, or random search (Lipovetzky and Geffner 2012). For this, width-based methods appeal to a notion of novelty to establish a preference for first visiting states that are most novel. Novelty-based methods have also been used in the context of genetic algorithms where a greedy focus on the function to optimize (fitness) often leads to bad local optima (Lehman and Stanley 2011a, b), and in reinforcement learning to guide exploration in large spaces where reward is sparse (Tang et al. 2017; Pathak et al. 2017; Ostrovski et al. 2017).

In classical planning, i.e., planning in factored spaces for achieving a given goal from a known initial state (Geffner and Bonet 2013; Ghallab, Nau, and Traverso 2016), the notion of novelty is now part of state-of-the-art search algorithms like BFWS (Lipovetzky and Geffner 2017b, a) and has been applied successfully in purely exploratory settings where no compact model of the actions or goals is assumed to be known a priori (Francès et al. 2017; Bandres, Bonet, and Geffner 2018). The basic width-based planning algorithms are simple and they all assume that the states are factored, assigning values to a fixed set F of boolean features, which in classical planning is given by the atoms in the problem. The procedure IW(1) is indeed a breadth-first search that starts in the given initial state and prunes all the states that do not make a feature from F true for the first time in the search. IW(k) is like IW(1) but uses the set of features F^k that stand for conjunctions of up to k features from F. For many benchmark domains, it has been shown that IW(k) for a small and constant value of k like k=2, called the domain width, suffices to compute plans, and indeed optimal plans, for any atomic goal (Lipovetzky and Geffner 2012). Other algorithms like BFWS make use of this property for serializing conjunctive goals into atomic ones; an idea that is also present in algorithms that precompute atomic landmarks (Hoffmann, Porteous, and Sebastia 2004), and in particular, those that use landmark counting heuristics (Richter and Westphal 2010).

A key open question in the area is why these width-based methods are effective at all, and in particular, why so many domains have a small width when atomic goals are considered. Is this a property of the domains? Is it an accident of the manual representations used? In this work, we address these and related questions. For this, we bring in the notion of general policies; policies that solve multiple instances of a planning domain all at once (Srivastava, Immerman, and Zilberstein 2008; Bonet, Palacios, and Geffner 2009; Hu and De Giacomo 2011; Belle and Levesque 2016; Segovia, Jiménez, and Jonsson 2016), in some cases by appealing to a fixed set Φ of general state features that we restrict to be linear. A number of correspondences are then obtained connecting the notions of width, the size of general policies as measured by the number of features used, and the more general and useful notion of serialized width; i.e., width under a given serialization. The study leads us to formalize the abstract notion of serialization and to analyze the conditions under which serializations are well-formed. Moreover, from the relations established between general policies and serializations, we obtain a new simple, meaningful, and expressive language for specifying serializations, called policy sketches, which can be used for encoding domain control knowledge by hand or for learning it from traces. While exploring these potential uses of sketches is beyond the scope of this paper, the use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.

The paper is organized as follows. We review first the notions of width, representations, and general policies, and relate width with the size of such policies, as measured by the number of features. Then we introduce serializations, the more general notion of serialized width, the relation between general policies and serialized width, and finally, policy sketches and their properties.

Width

IW(1) is a simple search procedure that operates on a rooted directed graph where the nodes represent states, and states assign values v to a given set F of features f (Lipovetzky and Geffner 2012). IW(1) performs a breadth-first search starting at the root but pruning the states that do not make an atom f=v true for the first time in the search. For classical planning problems expressed in languages such as (grounded) STRIPS, the features f are the problem variables, which can take the values true or false. In other settings, like the Atari games as supported in ALE (Bellemare et al. 2013), the features and their values are defined in other ways (Lipovetzky, Ramirez, and Geffner 2015; Bandres, Bonet, and Geffner 2018). The procedure IW(k) for k>1 is IW(1) but with the feature set given by F^k.
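To make the procedure concrete, the following is a minimal Python sketch of IW(1); the problem interface (initial_state, is_goal, successors, atoms) is an assumption made here for illustration and is not part of the paper:

    from collections import deque

    def iw1(problem):
        # Sketch of IW(1): breadth-first search that prunes every state which does
        # not make some atom true for the first time in the search. Assumes a
        # hypothetical `problem` with initial_state, is_goal(s), successors(s)
        # yielding (action, s') pairs, and atoms(s) returning the atoms true in s.
        root = problem.initial_state
        if problem.is_goal(root):
            return []
        seen = set(problem.atoms(root))
        queue = deque([(root, [])])
        while queue:
            state, plan = queue.popleft()
            for action, succ in problem.successors(state):
                if problem.is_goal(succ):
                    return plan + [action]
                novel = problem.atoms(succ) - seen     # atoms made true for the first time
                if not novel:
                    continue                           # prune: nothing new in this state
                seen |= novel
                queue.append((succ, plan + [action]))
        return None                                    # IW(1) is incomplete: it may fail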

A finding reported by Lipovetzky and Geffner (2012) and exploited since in width-based algorithms is that the procedure IW(k) with k=2 suffices to solve a wide variety of planning problems. Indeed, Lipovetzky and Geffner consider the 37,921 instances that result from all the domains used in planning competitions until 2012, where each instance with a goal made up of a conjunction of k atoms is split into k instances, each one with a single atomic goal. They report that IW(2) solves more than 88% of such instances, and moreover, 100% of the instances from 26 of the 37 domains considered.

This is remarkable because IW(k), when k is smaller than the number n of problem variables, is an incomplete procedure. Indeed, while breadth-first search expands a number of nodes that is exponential in n, IW(2) expands at most a quadratic number of nodes. Moreover, the results are not an accident resulting from a lucky choice of the instances, as one can formally prove that IW(2) solves any instance of many of these domains when the goal is a single atom.

Underlying the IW algorithms is the notion of problem width, which borrows from similar notions developed for constraint satisfaction problems and Bayesian networks, which are intractable but have algorithms that run in time and space exponential in the treewidth of the underlying graphs (Freuder 1982; Pearl 1988; Dechter 2013). For planning, the notion introduced by Lipovetzky and Geffner takes this form:¹

¹ For convenience, we set the width of problems that can be solved in one step to zero.

Definition 1 (Based on Lipovetzky and Geffner, 2012).

The width w(P) of problem P is the minimum k for which there is a sequence t_0, t_1, …, t_m of atom tuples t_i, each with at most k atoms, such that:

  1. t_0 is true in the initial state of P,

  2. any optimal plan for t_i can be extended into an optimal plan for t_{i+1} by adding a single action, i = 0, 1, …, m−1,

  3. any optimal plan for t_m is an optimal plan for P.

The width is w(P)=0 iff the initial state of P is a goal state. For convenience, we also set w(P) to 0 if the goal of P is reachable in a single step, and to w(P)=N+1 if P has no solution, where N is the number of atoms in P.

Chains of tuples θ=(t_0, t_1, …, t_m) that comply with conditions 1–3 are called admissible, and the size of the chain is the size |t_i| of the largest tuple in the chain. The width w(P) is thus k if k is the minimum size of an admissible chain for P. Notice that the definition of width does not presuppose a particular language for specifying the actions or goals, which can actually be specified procedurally. It just assumes that states are factored and assign truth values to a pool of atoms. The tuples t_i in an admissible chain can be regarded as subgoals or stepping stones on the way to the goal of P that ensure that it will be reached optimally. The main properties of the IW(k) algorithm can then be expressed as follows:²

² It is assumed that the number of atoms affected by an action is bounded by a constant. When this is not the case, the time bound in Theorem 2 becomes O(b N^{2k}).

Theorem 2 (Lipovetzky and Geffner, 2012).

IW(k) expands up to N^k nodes, generates up to b N^k nodes, and runs in time and space O(b N^{2k−1}) and O(b N^k), respectively, where N is the number of atoms and b bounds the branching factor in problem P. IW(k) is guaranteed to solve P optimally (shortest path) if w(P) ≤ k.

Proof.

Since the number of tuples of size at most k is bounded by N^k, IW(k) expands up to N^k nodes and generates up to b N^k nodes, where each expansion takes time bounded by b. Under the assumption that each transition flips the value of up to M (constant) atoms, each dequeued state can make true up to C(N,k) − C(N−M,k) = O(N^{k−1}) new tuples of atoms of size at most k. Seen tuples are stored in a perfect hash table of size O(N^k) which supports operations in constant time. Therefore, the test for expansion of each dequeued state takes time O(N^{k−1}), and thus the total time incurred by IW(k) for exploring the search tree is O(b N^{2k−1}). From the definition of width, it is not hard to see that if w(P) ≤ k, then IW(k) generates a node for a goal state, and the path from the initial state to the first goal state that is generated is an optimal (shortest) path. ∎
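The novelty test that this bound rests on can be sketched as follows; a plain Python set stands in for the perfect hash table of the proof, which is an assumption made here for illustration:

    from itertools import combinations

    def fresh_tuples(state_atoms, seen, k):
        # IW(k) novelty test: the tuples of at most k atoms of the state that have
        # not been made true before in the search. A state is pruned iff this list
        # is empty; otherwise its fresh tuples are added to `seen`.
        fresh = []
        for size in range(1, k + 1):
            for t in combinations(sorted(state_atoms), size):
                if t not in seen:
                    fresh.append(t)
        return fresh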

When the width of problem P is not known, the IW algorithm can be run instead, which calls IW(k) iteratively for k = 0, 1, …, N until the problem is solved, or shown to have no solution when IW(N) finds no plan. While these algorithms are not aimed at being practical planning algorithms, some state-of-the-art planners make use of these ideas in a slightly different form (Lipovetzky and Geffner 2017b, a).
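A sketch of this driver, assuming a hypothetical iw_k(problem, k) that generalizes the IW(1) sketch above to tuples of size up to k:

    def iw(problem, n_atoms):
        # IW driver: call IW(k) for k = 0, 1, ..., N until a plan is found;
        # if IW(N) finds no plan, the problem has no solution.
        for k in range(n_atoms + 1):
            plan = iw_k(problem, k)
            if plan is not None:
                return plan
        return None                  # no plan up to k = N: the problem is unsolvable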

Example.  Let 𝒬_clear be the class of Blocksworld problems P in the standard stack/unstack encoding whose goal is to achieve the atom clear(x) and an empty gripper, starting with clear(x) false and an empty gripper. Let B_1, …, B_m for m>0 be the blocks above x, from top to bottom, in P, and let us consider the chain of subgoals t_0, …, t_{2m} where t_0 = {clear(B_1)}, t_{2i−1} = {hold(B_i)}, and t_{2i} = {ontable(B_i)}, i = 1, …, m. It is easy to check that conditions 1–3 above hold for this chain, hence w(P) ≤ 1. Since w(P) > 0, as the goal cannot be reached in zero or one steps in general, w(P) = 1.

Example.  Let 𝒬_on be the class of Blocksworld problems P where the goal is on(x,y). For simplicity, let us assume that x and y are not initially in the same tower, and there are blocks B_1, …, B_m above x, and blocks D_1, …, D_ℓ above y. It is not difficult to check that the chain t_0, …, t_{2m}, t′_1, …, t′_{2ℓ}, t″_0, t″_1 is admissible and of size 2, where t_i for i = 0, …, 2m is as in the previous example, t′_{2i−1} = {hold(D_i), clear(x)} and t′_{2i} = {ontable(D_i), clear(x)} for i = 1, …, ℓ, t″_0 = {hold(x), clear(y)}, and t″_1 = {on(x,y)}. Then, w(P) = 2 since w(P) is not ≤ 1.

Representations

The width of a planning problem is tied to the representation language and the encoding of the problem in the language. For example, the problem of moving N packages from one room to the next, one by one, has a width that grows with N in standard encodings where each package has a name, but width 2 when the packages are indistinguishable from each other and the number of packages is encoded in unary (with one atom per counter value).³

³ The problem would still not have bounded width if the counter is represented in binary using a logarithmic number of atoms.

In order to deal with a variety of possible languages and encodings, and since width-based methods rely on the structure of states but not on the structure of actions (i.e., action preconditions and effects), we consider first-order languages for describing states in terms of atoms that represent objects and relations, leaving out from the language the representation of action preconditions and effects. It is assumed that the possible state transitions (s,s′) from a state s are a function of the state, but no particular representation of this function is assumed. In addition, the state language is extended with features f whose values f(s) in a state s are determined by the state. The features provide additional expressive power and ways for bridging different state representation languages, although they are logically redundant as their value is determined by the truth value of the atoms in the state. The features extend the notion of derived predicates as defined in PDDL, as they do not have to be boolean, and they do not have to be defined in the language of first-order logic or logic programs (Thiébaux, Hoffmann, and Nebel 2005), but can be defined via procedures. Domains, instances, and states are defined as follows:

Definition 3 (Domains, problems, and states).

A domain is a pair D=(R,F) where R is a set of primitive predicate symbols with their corresponding arities, and F is a set of features defined in terms of the primitive predicates with their corresponding ranges of feature values. A problem P over domain D=(R,F) is a tuple P=(D,O,I,G) where O is a set of unique object names c (objects), and I and G are sets of ground atoms that denote the initial and goal states of P. A ground atom r(c_1,…,c_{a(r)}) is made of a predicate r ∈ R and an object tuple in O^{a(r)} for the arity a(r) of r. A state s over problem P=(D,O,I,G) is a collection of ground atoms. The state s is a goal if G ⊆ s, and a dead end if a goal state cannot be reached from s in P. A state s denotes a unique valuation for the ground atoms in P: s ⊨ r(c_1,…,c_k) iff r(c_1,…,c_k) belongs to s.
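A minimal data-structure sketch of this definition follows; all names and types are illustrative assumptions, not part of the paper:

    from dataclasses import dataclass
    from typing import Callable, Dict, FrozenSet, Tuple

    Atom = Tuple[str, Tuple[str, ...]]        # ground atom: (predicate, object tuple)
    State = FrozenSet[Atom]                   # a state is a collection of ground atoms

    @dataclass
    class Domain:                             # D = (R, F)
        predicates: Dict[str, int]            # R: predicate name -> arity
        features: Dict[str, Callable[[State], int]]   # F: feature name -> state function

    @dataclass
    class Problem:                            # P = (D, O, I, G)
        domain: Domain
        objects: FrozenSet[str]
        init: State
        goal: FrozenSet[Atom]

    def is_goal(problem: Problem, s: State) -> bool:
        return problem.goal <= s              # s is a goal state iff G is a subset of s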

This is all standard except for the two details mentioned before: there are no action schemas, and there are state features. For the former, it is implicitly assumed that in each problem P there is a function that maps states s into the set of possible transitions (s,s′). This implies, for example, that the states may contain static atoms, like adjacency relations, whose truth values are not affected by any action. For the features, we make the assumption that they are linear, in the sense that they can be computed efficiently and only span a linear number of values. More precisely:

Linear features assumption.  The features f in F are either boolean or numerical, ranging in the latter case over the non-negative integers. The value of a feature f in a state s for problem P, f(s), can be computed in time bounded by O(bN), where N is the number of atoms and b bounds the branching factor in P. Numerical features can take up to N values.

This assumption rules out features like V*(s) that stand for the optimal cost (distance) from s to a goal, which may take a number of values that is not linear in the number of problem atoms, and whose computation may take exponential time. In many cases, the features can be defined in the language of first-order logic, but this is not a requirement.

Example.  Three state languages for Blocksworld are:

  1. ℒ_BW^1 with the binary predicate (symbol) on^2 and the unary ontable^1 (the superindex indicates the arity),

  2. ℒ_BW^2 with predicates on^2, ontable^1, hold^1, and clear^1,

  3. ℒ_BW^3 with predicates on^2 and hold^1, and boolean features ontable^1 and clear^1.

Example.  Four languages for a domain Boxes, where boxes b containing marbles ma must be removed from a table, and for this, the marbles in a box must be removed one by one first:

  1. ℒ_B^1 with predicates ontable^1(b) and in^2(ma,b),

  2. ℒ_B^2 with predicates ontable^1, in^2, and empty^1(b),

  3. ℒ_B^3 with predicates ontable^1 and in^2, and features n(b) that count the number of marbles in b,

  4. ℒ_B^4 with predicates ontable^1 and in^2, and features m and n counting the number of marbles in a box with the least number of marbles, and the number of boxes left.

By abstracting away the details of the domain dynamics and by allowing features to be introduced, it is simple to move from one state representation to another.
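For illustration, the two features of ℒ_B^4 can be computed from a state represented as in the earlier data-structure sketch (atoms as predicate/argument pairs); the representation is an assumption made here, not part of the paper:

    def n_boxes_left(state):
        # Feature n of L_B^4: number of boxes still on the table.
        return sum(1 for pred, args in state if pred == "ontable")

    def min_marbles(state):
        # Feature m of L_B^4: number of marbles in a box with the least number of
        # marbles (0 when no box remains). Both features are linear: computable in
        # time linear in the number of atoms and spanning a linear number of values.
        marbles = {}
        for pred, args in state:
            if pred == "ontable":
                marbles.setdefault(args[0], 0)
            elif pred == "in":            # in(marble, box)
                marbles[args[1]] = marbles.get(args[1], 0) + 1
        return min(marbles.values()) if marbles else 0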

The notion of width and the IW algorithms generalize to state languages containing features in a direct fashion. In both cases, the set of atoms considered is extended to contain the possible feature values f=v, where f is a feature and v is one of its possible values. Features are logically redundant but can have a drastic effect on the problem width.

The width of a class 𝒬 of problems P over some domain D is k, written w(𝒬)=k, if w(P)=k for some P ∈ 𝒬 and w(P′) ≤ k for every other problem P′ in 𝒬.

Example.  The class of problems 𝒬_clear where block x has to be cleared has width 1 in the state languages ℒ_BW^i, i=1,2,3, while the class 𝒬_on has width 2 for the three languages. On the other hand, for the class 𝒬_{B_1} of instances of Boxes with a single box, the width is not bounded, as it grows with the number of marbles, when encoded in the languages ℒ_B^1 and ℒ_B^2, but it is 1 when encoded in the languages ℒ_B^3 or ℒ_B^4. Likewise, for the class 𝒬_B of instances of Boxes with an arbitrary number of boxes, the width of the encoding in the language ℒ_B^3 is not bounded, as it grows with the number of boxes, but remains bounded and equal to 2 in ℒ_B^4.

Generalized policies

We want to show that bounded width is a property of domains that admit a certain class of general policies. Different languages for expressing general policies have been developed, some of which can deal with relational domains where different instances involve different (ground) actions. Most closely related to this work, general policies have been defined in terms of qualitative numerical planning problems (QNPs) (Srivastava et al. 2011; Bonet and Geffner 2018, 2020). We build on this idea but avoid the introduction of QNPs by defining policies directly as mappings from boolean feature conditions into feature value changes.

A boolean feature condition for a set of features Φ is a condition of the form p or ¬p for a boolean feature p in Φ, or n=0 or n>0 for a numerical feature n in Φ. Similarly, a feature value change for Φ is an expression of the form p, ¬p, or p? for a boolean feature p in Φ, and n↓, n↑, or n? for a numerical feature n in Φ. General policies are given by a set of rules C ↦ E where C and E stand for boolean feature conditions and feature changes, respectively.

Definition 4 (Policies).

A general policy π_Φ for a domain D over a set of features Φ is a set of rules of the form C ↦ E, where C is a set of boolean feature conditions and E is a set of feature value changes. The condition n>0 is assumed in rules with effects n↓ or n?.

The policy π_Φ prescribes the possible actions a to be done in a state s over a problem P indirectly, as the set of state transitions (s,s′) that the actions in P make possible and which are compatible with the policy:

Definition 5.

A transition (s,s′) satisfies the effect E when:

  1. if p (resp. ¬p) is in E, p(s′)=1 (resp. p(s′)=0),

  2. if n↓ (resp. n↑) is in E, n(s) > n(s′) (resp. n(s) < n(s′)),

  3. if p (resp. n) is not mentioned at all in E, p(s)=p(s′) (resp. n(s)=n(s′)).

The transition (s,s′) is compatible with policy π_Φ (or is a π_Φ-transition) if there is a policy rule C ↦ E such that s makes C true and (s,s′) satisfies E.

Policy rules provide a description of how the values of the features must change along the state trajectories that are compatible with the policy. Every transition (s,s′) in such trajectories has to be compatible with a policy rule C ↦ E. The expressions p? and n? in E stand for uncertain effects, meaning that p and n may change in any way or not change at all. Features not mentioned in E, on the other hand, must keep their values unchanged in a transition compatible with the rule C ↦ E.

The definition does not exclude the presence of multiple policy rules C ↦ E with conditions C that are all true in a state s. In such a case, for a state transition (s,s′) to be compatible with the policy, the definition requires (s,s′) to satisfy one of the effect expressions E. On the other hand, if the body C of a single rule C ↦ E is true in s, for the transition to be compatible with the policy, it must satisfy the effect E of that rule.

Example.  A policy for solving the class 𝒬_clear can be expressed in terms of the features Φ={H,n}, where H is true if a block is being held, and n counts the number of blocks above x. The policy can be expressed with two rules:

{¬H, n>0} ↦ {H, n↓}  ;  {H, n>0} ↦ {¬H}.    (1)

The first rule says that when the gripper is empty and there are blocks above x, an action that decreases n and makes H true must be chosen, while the second rule says that when the gripper holds a block and there are blocks above x, an action that makes H false and does not affect n must be selected.
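A small sketch of these rules and of the transition-compatibility test of Definition 5; the encoding of conditions and effects as Python dictionaries is an illustrative assumption:

    # Policy (1) for Q_clear over the features Phi = {H, n}, written as rules C -> E.
    # Boolean conditions/effects are True/False; numerical conditions are ">0"/"=0",
    # numerical effects are "dec"/"inc", and "?" marks an uncertain effect.
    CLEAR_POLICY = [
        ({"H": False, "n": ">0"}, {"H": True, "n": "dec"}),   # pick a block from above x
        ({"H": True,  "n": ">0"}, {"H": False}),              # put the held block away; n unchanged
    ]

    def holds(condition, f):
        return all((f[v] > 0) == (c == ">0") if c in (">0", "=0") else f[v] == c
                   for v, c in condition.items())

    def satisfies(effect, f, g):
        # Definition 5 on a transition whose feature valuations go from f to g.
        for v in f:
            if v not in effect and f[v] != g[v]:
                return False                 # unmentioned features must not change
        for v, e in effect.items():
            if e == "?":
                continue                     # uncertain effect: any change, or none
            if e == "dec" and not g[v] < f[v]:
                return False
            if e == "inc" and not g[v] > f[v]:
                return False
            if isinstance(e, bool) and g[v] != e:
                return False
        return True

    def compatible(policy, f, g):
        # A transition is a policy transition if some rule C -> E has C true in f
        # and the transition satisfies E; e.g.,
        # compatible(CLEAR_POLICY, {"H": False, "n": 3}, {"H": True, "n": 2}) is True.
        return any(holds(C, f) and satisfies(E, f, g) for C, E in policy)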

The conditions under which a general policy π_Φ solves an instance P and a class 𝒬 are:

Definition 6 (Trajectories and solutions).

A state trajectory s_0, …, s_n for problem P is compatible with a policy π_Φ over features Φ (or is a π_Φ-trajectory) iff s_0 is the initial state of P, no state s_i, 0 ≤ i < n, is a goal state, and each pair (s_i, s_{i+1}) is a possible state transition in P that is compatible with π_Φ. It is maximal if either s_n is a goal state, no transition (s_n, s_{n+1}) in P is compatible with π_Φ, or the trajectory is infinite (i.e., n=∞). A policy π_Φ solves a problem P if all maximal state trajectories s_0, …, s_n compatible with π_Φ are goal reaching (i.e., s_n is a goal state in P). π_Φ solves a collection 𝒬 of problems if it solves each problem in 𝒬.

It is easy to show that the policy captured by the rules in (1) solves 𝒬_clear. The verification and synthesis of general policies of this type have been addressed by Bonet and Geffner (2020) in the context of qualitative numerical planning, and by Bonet, Francès, and Geffner (2019), where the set of features Φ and the QNP model are learned from traces.

Generalized Policies and Width

The first result establishes a relation between classes of problems that are solvable by a certain type of policy and their width. Namely, if there is a “Markovian” policy π_Φ that generates optimal plans for a class 𝒬, the problems in 𝒬 can be encoded to have width bounded by |Φ|.

A policy π_Φ over the features Φ is Markovian in 𝒬 when the features provide a suitable abstraction of the states; i.e., when the possible next feature valuations are a function of the feature valuation in the current state and not of the particular state itself. More precisely, if we say that a state s is optimal for a feature valuation f in a problem P when f = f(s) and there is no state s′ with the same feature valuation f = f(s′) that is reachable in P in fewer steps than s, the Markovian property is defined as follows:

Definition 7 (Markovian).

A policy π_Φ is Markovian for a problem P iff the existence of a transition (s,s′) compatible with π_Φ with feature valuations f = f(s) and f′ = f(s′) implies the existence of transitions (s_1, s′_1) with the same feature valuations f = f(s_1) and f′ = f(s′_1), in all states s_1 that are optimal for f in P. The policy is Markovian for a class of problems 𝒬 if it is so for each P in 𝒬.

The states s_1 on which the Markovian condition is required in the definition are not necessarily among those that are reachable with the policy. The definition captures a weak Markovian assumption. The standard, but stronger, Markovian assumption requires the same condition on all states s_1 that reach the feature valuation f and not just on those that reach f optimally. A sufficient condition that ensures the (strong) Markovian property is for the policy to be deterministic over the feature valuations, meaning that if f is followed by f′ in some transition compatible with the policy, f can always be followed by f′ in every state transition (s,s′) compatible with the policy where f(s)=f, and moreover, that f can only be followed by f′ then. This form of determinism results when the bodies C of the policy rules C ↦ E are logically inconsistent with each other, the heads E do not have uncertain effects p? or n?, the domain is such that all increments and decrements n↑ and n↓ are by fixed amounts, and there is a transition (s,s′) compatible with E in the reachable states where C holds.

To reason about the policy in terms of the features, however, the features must also distinguish goal from non-goal states:

Definition 8 (Separation).

The features Φ separate goals from non-goals in 𝒬 iff there is a set κ of boolean feature valuations such that for any problem P in 𝒬 and any reachable state s in P, s is a goal state iff f(s) is in κ. The valuations in κ are called goal valuations.

The boolean feature valuation determined by a state s refers to the truth values of the expressions p and n=0 for the boolean and numerical features p and n in Φ, respectively. While the number of feature valuations is not bounded, as the size of the instances in 𝒬 is not bounded in general, the number of boolean feature valuations is always 2^|Φ|.

The last notion needed for relating general policies and the width of the instances is the notion of optimality:

Definition 9 (Optimal policies).

A policy π_Φ that solves a class of problems 𝒬 is optimal if any plan ρ induced by π_Φ over a problem P in 𝒬 is optimal for P.

If the policy π_Φ solves a problem P, the plans induced by the policy are the action sequences ρ that yield the goal-reaching state trajectories s_0, …, s_n that are compatible with π_Φ. These plans are optimal for P if there are no shorter plans for P.

It can be shown that if π_Φ is a Markovian policy that solves the class of problems 𝒬 optimally, with the features Φ separating goals from non-goals, then any feature valuation f reached on the way to the goal in an instance P in 𝒬 is reached optimally; i.e., no action sequence in P can reach a state s with the same feature valuation f(s)=f in a smaller number of steps.

Theorem 10.

Let π_Φ be a Markovian policy that solves a class of problems 𝒬 optimally, and where the features Φ separate goals in 𝒬. Then, π_Φ is optimal relative to feature valuations. That is, if ρ is a sequence of actions induced by π_Φ over an instance P ∈ 𝒬 that reaches the state s_i, ρ is an optimal plan in P for the feature valuation f(s_i).

Proof.

Let τ* = (s_0, …, s_n) be an optimal trajectory for P compatible with π_Φ, and let τ be an optimal trajectory that ends in a state s with f(s) = f(s_i), for 0 ≤ i ≤ n. We want to show that |τ| = i. By the Markovian property, there is a π_Φ-transition (s,s′) with f(s′) = f(s_{i+1}). If τ extended with the state s′ is an optimal trajectory for f(s_{i+1}), we can extend it again with a π_Φ-transition (s′,s″) into a trajectory for f(s_{i+2}). Otherwise, there is an optimal trajectory τ′ for f(s_{i+1}) which can then be extended, using the Markovian property, into a trajectory for f(s_{i+2}). Thus, repeating the argument, there is an optimal trajectory τ_n for f(s_n) of length at most |τ| + n − i (where the length of a trajectory is the number of transitions in it). On the other hand, since π_Φ is optimal and the features separate the goals, τ_n is a goal-reaching trajectory and thus n ≤ |τ| + n − i, which implies i ≤ |τ|. Since τ* contains a trajectory that reaches f(s_i) in i steps, |τ| ≤ i and thus |τ| = i. ∎

As a result, under these conditions, the width of the instances in 𝒬 can be bounded by the number |Φ| of features in the policy π_Φ, provided that the features are represented explicitly in the instances:

Theorem 11.

Let π_Φ be a Markovian policy that solves a class of problems 𝒬 optimally, where the features Φ separate goals. If the features in Φ are explicitly represented in the instances P in 𝒬, then w(P) ≤ |Φ|.

Proof.

Let k = |Φ|, let τ* = (s_0, …, s_n) be a goal-reaching π_Φ-trajectory in P, and let t_i = f(s_i), i = 0, 1, …, n. Clearly, |t_i| = k and we only need to show that the chain θ = (t_0, t_1, …, t_n) is admissible; i.e., that it satisfies the three conditions in Definition 1. The first condition is direct since t_0 = f(s_0).

For the second condition, let ρ be an optimal plan that achieves the tuple t_i at a state s. We need to show that there is an action a in P such that (ρ,a) is an optimal plan for t_{i+1}. Since the transition (s_i, s_{i+1}) is reachable by π_Φ (i.e., belongs to τ*), the Markovian property implies that there must be a π_Φ-transition (s,s′) associated with an action a such that f(s′) = f(s_{i+1}) = t_{i+1}. By Theorem 10, |ρ| = i and any optimal plan for t_{i+1} is of length i+1. Therefore, the plan (ρ,a) is a plan for t_{i+1} of length i+1 that is optimal.

For the last condition, we show that any optimal plan that achieves t_n is an optimal plan for problem P. This is direct since t_n is a goal valuation as Φ separates goals: a trajectory is goal reaching iff it ends in a state that makes t_n true. ∎

Indeed, if a policy π_Φ solves 𝒬 optimally under the given conditions, an admissible chain t_0, t_1, …, t_n of size k = |Φ| can be formed for solving each problem P in 𝒬 optimally, where t_i is the valuation of the features in Φ at the i-th state of any state trajectory that results from applying the policy.

Related to this result, if we let IW_Φ refer to the variant of IW that replaces tuples of atoms by feature valuations, and thus deems a state s novel in the search (unpruned) when f(s) has not been seen before, one can show:

Theorem 12.

Let π_Φ be a Markovian policy that solves a class of problems 𝒬 optimally with features Φ that separate the goals. The procedure IW_Φ solves any instance P in 𝒬 optimally in time O(N^|Φ|), where N is the number of atoms in P.

IW_Φ can be thought of as a standard breadth-first search that treats states with the same feature valuation as “duplicate” states. It runs in time O(N^|Φ|) because the number of features in Φ is fixed and does not grow with the instance size, as opposed to N, which stands for the number of atoms in the instance.
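A sketch of IW_Φ under the same assumed problem interface as the IW(1) sketch above; `features(s)` is assumed to return a hashable feature valuation:

    from collections import deque

    def iw_phi(problem, features):
        # Same breadth-first search as iw1, but a state is pruned when its feature
        # valuation f(s), rather than some atom, has been seen before in the search.
        root = problem.initial_state
        if problem.is_goal(root):
            return []
        seen = {features(root)}
        queue = deque([(root, [])])
        while queue:
            state, plan = queue.popleft()
            for action, succ in problem.successors(state):
                if problem.is_goal(succ):
                    return plan + [action]
                f = features(succ)
                if f in seen:
                    continue             # duplicate feature valuation: prune
                seen.add(f)
                queue.append((succ, plan + [action]))
        return None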

Proof.

Let s_0, s_1, …, s_n be a goal-reaching trajectory generated by π_Φ on a problem P in 𝒬, and let f_0, f_1, …, f_n be the sequence of feature valuations for the states in the trajectory. We prove by induction on k that IW_Φ finds a node n′_k for a state s′_k such that f(s′_k) = f_k, the path from the root node n_0 to n′_k is optimal, and the node n′_k is not pruned. The base case k=0 is direct since the empty path leads to s_0 and it is optimal. Let us assume that the claim holds for 0 ≤ k < n. By the inductive hypothesis, IW_Φ finds a node n′_k for a state s′_k such that f(s′_k) = f_k, the path from n_0 to n′_k is optimal, and n′_k is not pruned. Since the transition (s_k, s_{k+1}) exists in P, the Markovian property implies that there is a transition (s′_k, s′_{k+1}) with f(s′_{k+1}) = f_{k+1}, and thus IW_Φ generates a node n′_{k+1} for the state s′_{k+1}. If n′_{k+1} is the first node where f_{k+1} holds, it is not pruned. Otherwise, there is another node n″ that makes f_{k+1} true and the length of the path from n_0 to n″ is less than or equal to the length of the path from n_0 to n′_{k+1}. In either case, since the length of an optimal path that achieves f_{k+1} is k+1 by Theorem 10, IW_Φ finds an optimal path to a node that achieves f_{k+1} and is not pruned.

We have just shown that IW_Φ finds an optimal path to a state that makes f_n true. Since the features separate the goals, such a path is an optimal path for P. ∎

Example.  A general policy for Boxes is given by the rules {m>0} ↦ {m↓} and {m=0, n>0} ↦ {n↓, m?}, where m and n are two features that count the number of marbles in a box with the least number of marbles, and the number of boxes left on the table. Since the policy complies with the conditions in Theorem 11 and the two features are represented explicitly in the language ℒ_B^4, it follows that the width of the instances of Boxes in such an encoding is at most 2.

Example.  Similarly, the policy π_Φ with features Φ = {H, n} for 𝒬_clear given by the two rules in (1) is Markovian, solves 𝒬_clear optimally, and the features separate goals. Yet there are no atoms representing the counter n explicitly. Still, Theorem 12 implies that IW_Φ solves the instances in 𝒬_clear optimally in quadratic time.

Theorem 11 relates the number of features in a general policy π_Φ that solves 𝒬 with the width of 𝒬, provided that the features are part of the problem encodings. This, however, is not strictly necessary:

Theorem 13.

Let π_Φ be an optimal and Markovian policy that solves a class 𝒬 of problems over some domain D for which Φ separates the goals. The width of the problems P is bounded by k if for any sequence of feature valuations {f_i}_i generated by the policy π_Φ on the way to the goal, there is a sequence of sets of atoms {t_i}_i in P, of size at most k, such that the optimal plans for t_i and the optimal plans for f_i coincide.

Proof.

Let f_0, …, f_n be the sequence of feature valuations generated by the policy π_Φ on the way to the goal in some instance P in 𝒬, and let t_0, …, t_n be the sequence of sets of atoms that comply with the conditions in the theorem, where |t_i| ≤ k for i = 0, …, n. We show that t_0, …, t_n is an admissible chain for P.

First, since any optimal plan for f_0 is an optimal plan for t_0 and the empty plan is an optimal plan for f_0, t_0 ⊆ s_0. Second, if ρ is an optimal plan for t_n, ρ is an optimal plan for f_n, and therefore an optimal plan for P, as the features separate the goals. Finally, if ρ is an optimal plan for t_i, we must show that there is an action a in P such that (ρ,a) is an optimal plan for t_{i+1}, i < n. Let ρ be an optimal plan for t_i that ends in a state s. Since π_Φ is Markovian and ρ is optimal for f_i, there is a transition (s,s′) in P with f(s′) = f_{i+1}. That is, there is an action b such that the plan (ρ,b) reaches f_{i+1}. By Theorem 10, the optimal plans for f_{i+1} are of length 1+|ρ|, and thus (ρ,b) is an optimal plan for f_{i+1}, and hence for t_{i+1}. ∎

Example.  Consider an instance P in 𝒬_clear where B_1, …, B_m are the blocks above x initially, from top to bottom, m > 0. The feature valuations on the way to the goal following the Markovian policy π_Φ are f_i = {H, n=m−i}, i=1,…,m, and g_i = {¬H, n=m−i}, i=0,…,m. The policy is optimal, Markovian, and the features separate the goals. The tuples of atoms in P that capture these valuations as expressed in Theorem 13 are t_{f_i} = {hold(B_i)} and t_{g_i} = {ontable(B_i)} for i > 0, and t_{g_0} = {clear(B_1)}. Since these tuples have size 1, the widths w(P) and w(𝒬_clear) are both equal to 1.

Admissible Chains and Projected Policies

Theorem 13 relates the width of a class 𝒬 to the size of the atom tuples t_i in the instances of 𝒬 that capture the values of the features f_i, following an optimal policy for 𝒬. We extend this result now by showing that it is often sufficient if the tuples t_i capture the value of the features f_i in some of those trajectories only. The motivation is to explain all proofs of bounded width for infinite classes of problems 𝒬 that we are aware of in terms of general policies. For this, we start with the notion of feasible chains:

Definition 14 (Feasible chain).

Let θ = (t_0, t_1, …, t_n) be a chain of tuples of atoms from P. The chain θ is feasible in problem P if t_0 is true in the initial state, the optimal plans for t_n have length n, and they are all optimal for P.

An admissible chain, as used in the definition of width, is a feasible chain that satisfies an extra condition; namely, that every optimal plan ρ for the tuple t_i in the chain can be extended with a single action into an optimal plan for the tuple t_{i+1}, i = 0, 1, …, n−1. In order to account for this key property, we map feasible chains into features and policies as follows:

Definition 15.

Let θ = (t_0, t_1, …, t_n) be a chain of atom tuples from P, and let t̃_i(s) denote the boolean state feature that is true in s when t_i is true in s and t_j is false for all i < j ≤ n, i = 0, …, n. The chain defines a policy π_θ over P with rules {t̃_i} ↦ {t̃_{i+1}, ¬t̃_i}, i = 0, …, n−1.
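A sketch of the features t̃_i and of the compatibility test for π_θ, with states given as sets of atoms and chains as lists of tuples (an encoding assumed here for illustration):

    def tilde(theta, i, atoms):
        # Boolean feature t~_i of Definition 15: tuple t_i holds in the state and
        # no later tuple t_j, i < j <= n, holds. `atoms` are the atoms true in the state.
        n = len(theta) - 1
        return set(theta[i]) <= atoms and all(not set(theta[j]) <= atoms
                                              for j in range(i + 1, n + 1))

    def pi_theta_compatible(theta, atoms, next_atoms):
        # Rule {t~_i} -> {t~_{i+1}, not t~_i}: t~_i holds in the source state and
        # t~_{i+1} holds in the successor. Since at most one feature t~_j can hold
        # in a state, the remaining features are unchanged automatically.
        n = len(theta) - 1
        return any(tilde(theta, i, atoms) and tilde(theta, i + 1, next_atoms)
                   for i in range(n))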

The first result gives necessary and sufficient conditions for a chain θ to be admissible.

Theorem 16.

Let θ = (t_0, t_1, …, t_n) be a chain of tuples for a problem P. θ is admissible in P if and only if θ is feasible and the policy π_θ solves P optimally and is Markovian.

Proof.

Let s_0 be the initial state of problem P. As always, the length |τ| of a (state) trajectory τ = (s_0, …, s_n) is n. We first establish some claims:

  C1. If θ is admissible, there is no optimal trajectory τ for t_i that also reaches t_j, for 0 ≤ i < j ≤ n.

    Proof. Any optimal trajectory for t_i can be extended with j−i transitions into an optimal trajectory for t_j, thus no optimal trajectory for t_i can reach t_j, j > i.

  C2. If θ is admissible, the optimal trajectories for t_i have length i, like the optimal trajectories for t̃_i, 0 ≤ i ≤ n.

    Proof. Let τ be an optimal trajectory for t_i, and let τ̃ be an optimal trajectory for t̃_i. By definition, |τ| ≤ |τ̃|, by C1, |τ̃| ≤ |τ|, and by admissibility of θ, |τ| = i.

  C3. If θ is feasible, π_θ is optimal and Markovian for P, and τ is a trajectory for t_i, there is an optimal trajectory for P of length at most |τ| + n − i, 0 ≤ i ≤ n.

    Proof. Let τ* be a goal-reaching π_θ-trajectory for P, and let τ be a trajectory for t_i. τ reaches some feature t̃_j for i ≤ j ≤ n. Then, there is an optimal trajectory τ_j for t̃_j, and thus for t_j, of length |τ_j| ≤ |τ|. Using the Markovian property with τ*, the trajectory τ_j can be extended with a π_θ-transition that reaches t̃_{j+1}. Repeating the argument, we find an optimal trajectory τ_n for t̃_n of length at most |τ| + n − i.

  C4. If θ is feasible, π_θ is optimal and Markovian for P, and τ is an optimal trajectory for t_i, then |τ| = i, 0 ≤ i ≤ n.

    Proof. Let τ be an optimal trajectory for t_i. By C3, there is an optimal trajectory for t_n of length at most |τ| + n − i. By feasibility of θ, the length of any optimal trajectory for t_n is exactly n. Therefore, n ≤ |τ| + n − i, which implies i ≤ |τ|. On the other hand, any goal-reaching trajectory τ* induced by π_θ contains a subtrajectory of length i for t_i, and thus |τ| ≤ i.

  C5. If θ is feasible, π_θ is optimal and Markovian for P, and τ is an optimal trajectory for t_i, then τ is also optimal for t̃_i, 0 ≤ i ≤ n.

    Proof. In the proof of C3, if the feature t̃_j is for j > i, we can find an optimal trajectory for P of length at most |τ| + n − j ≤ n − 1 since |τ| = i by C4, which is impossible. Then, τ reaches t̃_i. On the other hand, if τ′ is an optimal trajectory for t̃_i with |τ′| < |τ|, using the Markovian property we can construct a goal-reaching trajectory of length strictly less than n. Therefore, τ is optimal for t̃_i.

Forward implication. Let us assume that θ is admissible for P. Then, by definition, θ is feasible. Let us now consider a maximal π_θ-trajectory τ in P. Since the initial state s_0 makes t̃_0 true, by admissibility and C1 and C2, the trajectory τ is of length n and ends in a state s_n that makes t̃_n true. In particular, τ is an optimal trajectory for t_n and thus it is an optimal goal-reaching trajectory for P. This shows that π_θ is optimal. Finally, to see that π_θ is Markovian for P, let (s_1, s′_1) be a π_θ-reachable transition and let s_2 be a state closest to s_0 with f(s_2) = f(s_1). On the one hand, by definition, there is one and only one policy rule in π_θ that is compatible with (s_1, s′_1), and thus the states s_1 and s′_1 make true the features t̃_i and t̃_{i+1}, respectively. On the other hand, by C2, s_2 is a closest state to s_0 that makes t_i true. Hence, by admissibility, the trajectory τ that leads to s_2 can be extended with one transition (s_2, s′_2) into an optimal trajectory for t_{i+1}. Therefore, by C1, the state s′_2 makes t̃_{i+1} true, and the transition (s_2, s′_2) is compatible with π_θ. Hence, π_θ is Markovian.

Backward implication. Let us assume that θ is feasible, and that π_θ is optimal and Markovian for P. We need to show that θ is an admissible chain; i.e., 1) t_0 is true in the initial state s_0, 2) any optimal plan for t_i can be extended into an optimal plan for t_{i+1} by adding a single action, and 3) any optimal plan for t_n is an optimal plan for P. Conditions 1 and 3 follow directly from the definition of feasible chains. For 2, let τ = (s_0, …, s_i) be an optimal trajectory that reaches t_i; its length is i by C4. We want to show that τ can be extended with a transition (s_i, s_{i+1}) into an optimal trajectory for t_{i+1}. By C5, τ is an optimal trajectory for t̃_i, and thus, by the Markovian property, there is a transition (s_i, s_{i+1}) that is compatible with π_θ. Therefore, the state s_{i+1} makes true t̃_{i+1} and t_{i+1}. The extended trajectory τ′ = (s_0, …, s_{i+1}) is optimal for t_{i+1} by C4. ∎

This result connects admissible chains with policies that are optimal and Markovian, but it does not connect admissible chains with general policies. This is done next. We first define when a policy π_1 can be regarded as a projection of another policy π_2:

Definition 17 (Projection).

Let π_Φ be a policy over a class 𝒬 and let π_{Φ′} be a policy for a problem P in 𝒬. The policy π_{Φ′} is a projection of π_Φ in P if every maximal state trajectory compatible with π_{Φ′} in P is a maximal state trajectory in P compatible with π_Φ.

Notice that it is not enough for the state trajectories s_0, …, s_i compatible with π_{Φ′} to be state trajectories compatible with π_Φ; it is also required that if s_i is a final state in the first trajectory, then it is also a final state in the second one. This rules out the possibility that there is a continuation of the first trajectory that is compatible with π_Φ but not with π_{Φ′}. A result of this is that if π_Φ is optimal for 𝒬, the projected policy π_{Φ′} must be optimal for P. From this, the main theorem of this section follows:

Theorem 18.

Let π_Φ be an optimal policy for a class 𝒬 of problems. If for any problem P in 𝒬 there is a feasible chain θ of size at most k such that π_θ is a projection of π_Φ in P that is Markovian, then w(𝒬) ≤ k.

Proof.

Theorem 16 implies that if θ is feasible and π_θ is Markovian and a projection of an optimal policy π_Φ in P ∈ 𝒬, then θ is admissible, and hence that w(P) is bounded by the size of θ (i.e., the maximum size of a tuple in θ). If this size is bounded by k for all P in 𝒬, w(𝒬) is bounded by k. ∎

While the proof of this theorem is a direct consequence of Theorem 16, the result is important as it renders explicit the logic underlying all proofs of bounded width for meaningful classes of problems 𝒬 that we are aware of. In all such cases, the proofs have been constructed by finding feasible chains with tuples t_i that are a function of the instance P ∈ 𝒬, and whose role is to capture suitable projections of some general optimal policy π_Φ.

Example.  Let 𝒬_G be a grid navigation domain where an agent at some initial cell has to move to a target cell, with atoms of the form x(i) and y(j) that specify the coordinates of the agent along the two axes. An optimal policy for 𝒬_G can be obtained with the singleton Φ = {d}, where the linear feature d measures the distance to the target cell, d=0 identifies the goal states, and there is a single policy rule {d>0} ↦ {d↓}. Let P be a problem in 𝒬_G where the agent is initially located at cell (i_0, j_0) and the goal cell is (i_n, j_m), which for simplicity satisfies i_0 ≤ i_n and j_0 ≤ j_m. Further, let θ = (t_0, t_1, …, t_{n+m}) be the chain of tuples where t_k = {x(i_0+k)} for 0 ≤ k < n, and t_k = {x(i_n), y(j_0+k−n)} for n ≤ k ≤ m+n. Then, since θ is feasible, and π_θ is a projection of the optimal policy π_Φ while being Markovian, it follows from Theorem 18 that w(𝒬_G) ≤ 2. Notice that the tuples t_i in θ do not capture the values d_i of the distance feature d in all the trajectories that are compatible with the policy π_Φ in P (an exponential number of such trajectories), but in one such trajectory only; namely, the one that is compatible with the projection π_θ of π_Φ, with states s_i, i = 0, …, n+m, where the tuple t_i is true iff the feature t̃_i used in π_θ is true.
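For illustration, the chain θ of this example can be built as follows; the encoding of coordinate atoms as pairs is an assumption made here:

    def grid_chain(x0, y0, xn, ym):
        # Chain theta for the grid example, assuming x0 <= xn and y0 <= ym:
        # tuples {x(i)} advancing the x coordinate first, then tuples {x(xn), y(j)}
        # advancing the y coordinate while keeping x(xn); every tuple has size <= 2.
        theta = [[("x", x0 + k)] for k in range(xn - x0)]
        theta += [[("x", xn), ("y", y0 + k)] for k in range(ym - y0 + 1)]
        return theta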

Example.  In the Delivery (D) domain an agent moves in a grid to pick up packages and deliver them to a target cell, one by one; Delivery-1 (D_1) is the version with one package. A general policy π_Φ for D and D_1 is given in terms of four rules that capture: move to the (nearest) package, pick it up, move to the target cell, and drop the package, in a cycle, until no more packages are left. The rules can be expressed in terms of the features Φ = {H, p, t, n} that express holding, the distance to the nearest package (zero if the agent is holding a package or no package to be delivered remains), the distance to the target cell, and the number of undelivered packages, respectively. Hence, n=0 identifies the goal states for the problems in the classes 𝒬_D and 𝒬_{D_1} for the problems D and D_1, respectively. The rules that define the policy π_Φ are {¬H, p>0} ↦ {p↓, t?}, {¬H, p=0} ↦ {H}, {H, t>0} ↦ {t↓}, and {H, n>0, t=0} ↦ {¬H, n↓, p?}.⁴

⁴ A different formulation involves packages that need to be delivered to target cells that depend on the package. The set of features is the same Φ = {H, p, t, n}, except that t is defined as the distance to the current target cell (zero if the agent holds nothing or there are no more packages to deliver). A policy for this formulation has the rules {¬H, p>0} ↦ {p↓}, {¬H, p=0} ↦ {H, t↑}, {H, t>0} ↦ {t↓}, and {H, n>0, t=0} ↦ {¬H, n↓, p?}.

Let us consider a problem P in the class 𝒬_{D_1}. If the encoding of P contains atoms like at(cell_i), atp(pkg_i, cell_j), hold(pkg_i), and empty, it can be shown that 𝒬_{D_1} has width 2. Indeed, without loss of generality, let us assume that in P the package is initially at cell_i and has to be delivered to cell_t, and the agent is initially at cell_0, and let θ = (t_0, t_1, …, t_n) be the chain made of tuples of the form {at(cell_k)} for the cells on a shortest path from cell_0 to cell_i, followed by tuples of the form {at(cell_k), hold(pkg_j)}, with cell_k now ranging over a shortest path from cell_i up to cell_t, and a last tuple t_n of the form {atp(pkg_j, cell_t)}. It is easy to show that the chain θ is feasible, and π_θ is the projection of the general policy π_Φ that in D_1 is both optimal and Markovian (although not in D). Then, by Theorems 16 and 18, it follows that the chain θ is admissible, and w(𝒬_{D_1}) ≤ 2.

Serialized Width

Theorem 18 suggests that bounded width is most often the result of general policies that can be projected on suitable tuples of atoms. We now extend these relations to the much larger and more interesting class of problems that have bounded serialized width, where the general policies do not have to be optimal or Markovian. The notion of serialized width is based on the decomposition of problems into subproblems, and it explicates and generalizes the logic and scope of the Serialized IW (SIW) algorithm of Lipovetzky and Geffner (2012). There are indeed many domains with large or unbounded widths that are simple and admit simple policies, like the Delivery domain above.

In SIW, problems are serialized by using one feature, a goal counter denoted by #g\#g, that tracks the number of unachieved goal atoms, assuming that goals are conjunctions of atoms. Other serializations have appealed to other counters and also to heuristics (Lipovetzky and Geffner 2017a).555These serializations appear in state-of-the-art algorithms like BFWS that use novelty measures inside complete best-first procedures.

We take a more general and abstract approach that replaces the goal counter by a strict partial order (an irreflexive and transitive binary relation) \prec over the feature valuations, also referred to as Φ\Phi-tuples. A serialization for a class 𝒬\mathcal{Q} of problems is defined as follows:

Definition 19 (Serializations).

Let 𝒬\mathcal{Q} be a class of problems, let Φ\Phi be a set of goal-separating features for 𝒬\mathcal{Q}, and let \prec be a strict partial order over Φ\Phi-tuples. The pair (Φ,)(\Phi,\prec) is a serialization over 𝒬\mathcal{Q} if 1) the ordering \prec is well-founded; i.e. there is no infinite descending chain f1f2f3f_{1}\succ f_{2}\succ f_{3}\succ\cdots where fifi+1f_{i}\succ f_{i+1} stands for fi+1fif_{i+1}\prec f_{i}, and 2) the goal feature valuations ff are \prec-minimal; i.e., no fff^{\prime}\prec f for any feature valuation ff^{\prime}.

A serialization over 𝒬\mathcal{Q} decomposes each problem PP into subproblems which define the width of the serialization or serialized width:

Definition 20 (Subproblems).

Let (Φ,)(\Phi,\prec) be a serialization. The subproblem P[s,]P[s,\prec] is the problem of finding a state ss^{\prime} reachable from ss such that ss^{\prime} is goal in PP or f(s)f(s)f(s^{\prime})\prec f(s); i.e., the goal states in P[s,]P[s,\prec] are the goal states in PP and the states ss^{\prime} such that f(s)f(s)f(s^{\prime})\prec f(s). The collection P[]P[\prec] of subproblems for problem PP is the smallest subset of problems P[s,]P[s,\prec] that comply with:

  1.

    if the initial state s0s_{0} in PP is not goal, P[s0,]P[]P[s_{0},\prec]\in P[\prec],

  2.

    P[s,]P[]P[s^{\prime},\prec]\in P[\prec] if P[s,]P[]P[s,\prec]\in P[\prec] and ss^{\prime} is a non-goal state in PP that is at shortest distance from ss such that f(s)f(s)f(s^{\prime})\prec f(s), and no goal state of PP is strictly closer from ss than ss^{\prime}

Definition 21 (Serialized width).

Let (Φ,)(\Phi,\prec) be a serialization for a collection 𝒬\mathcal{Q} of problems. The serialized width of problem PP relative to (Φ,)(\Phi,\prec) is kk, written as wΦ(P)=kw_{\Phi}(P)=k with the ordering “\prec” left implicit, if there is a subproblem P[s,]P[s,\prec] in P[]P[\prec] that has width kk, and every other subproblem in P[]P[\prec] has width at most kk. The serialized width for 𝒬\mathcal{Q} is kk, written as wΦ(𝒬)=kw_{\Phi}(\mathcal{Q})=k, if wΦ(P)=kw_{\Phi}(P)=k for some problem P𝒬P\in\mathcal{Q}, and wΦ(P)kw_{\Phi}(P^{\prime})\leq k for every other problem P𝒬P^{\prime}\in\mathcal{Q}.

If a class of problems 𝒬\mathcal{Q} has bounded serialized width and the ordering fff\prec f^{\prime} can be tested in polynomial time, each problem PP in 𝒬\mathcal{Q} can be solved in polynomial time using a variant SIWΦ\text{SIW}_{\Phi} of the SIW algorithm: starting at the state s=s0s=s_{0}, SIWΦ\text{SIW}_{\Phi} performs an IW search from ss to find a state ss^{\prime} that is a goal state or that renders the precedence constraint f(s)f(s)f(s^{\prime})\prec f(s) true. If ss^{\prime} is not a goal state, ss is set to ss^{\prime}, s:=ss:=s^{\prime}, and the loop repeats until a goal state is reached. The result below follows from the observations by Lipovetzky and Geffner (2012) once the goal counter is replaced by a partial order, and the notion of serialized width is suitably formalized:

Theorem 22.

Let 𝒬\mathcal{Q} be a collection of problems and let (Φ,)(\Phi,\prec) be a serialization for 𝒬\mathcal{Q} with width wΦ(𝒬)kw_{\Phi}(\mathcal{Q})\leq k. Any problem PP in 𝒬\mathcal{Q} is solved by SIWΦ in time and space bounded by O(bN|Φ|+2k1Λ)O(bN^{|\Phi|+2k-1}\Lambda) and O(bNk+N|Φ|+k)O(bN^{k}+N^{|\Phi|+k}) respectively, where bb bounds the branching factor in PP, NN is the number of atoms in PP, and Λ\Lambda bounds the time to test the order \prec for the Φ\Phi-tuples that arise from the states in PP.

Proof.

We show something more general. If PP is a problem in 𝒬\mathcal{Q}, SIWΦ\text{SIW}_{\Phi} finds a plan ρ\rho for PP and state sequence s0,s1,,sns_{0},s_{1},\ldots,s_{n} such that

  1.

    the state s0s_{0} is the initial state in PP,

  2.

    the state si+1s_{i+1}, reachable from state sis_{i}, can be found by IW(k)\text{IW}(k) on the problem P[si,]P[s_{i},\prec], for i=0,1,,n1i=0,1,\ldots,n-1,

  3.

    f(si+1)f(si)f(s_{i+1})\prec f(s_{i}), for i=0,1,,n1i=0,1,\ldots,n-1,

  4.

    sns_{n} is a goal state, and

  5.

    |ρ|N|Φ|+k|\rho|\leq N^{|\Phi|+k}.

SIWΦ\text{SIW}_{\Phi} achieves this in time and space bounded by O(bN|Φ|+2k1Λ)O(bN^{|\Phi|+2k-1}\Lambda) and O(bNk+N|Φ|+k)O(bN^{k}+N^{|\Phi|+k}) respectively.

By definition, SIWΦ\text{SIW}_{\Phi} calls IW iteratively to find such a sequence. In the first call, over subproblem P[s0,]P[s_{0},\prec], IW finds a state s1s_{1} such that f(s1)f(s0)f(s_{1})\prec f(s_{0}). Then, iteratively, after sis_{i} is found, IW is called to solve the subproblem P[si,]P[s_{i},\prec]. The process continues until IW finds a goal state sns_{n}.

Let us assume that such a sequence is found by SIWΦ\text{SIW}_{\Phi}. A run of IW involves runs of IW(1),IW(2),\text{IW}(1),\text{IW}(2),\ldots until the problem is solved (guaranteed to succeed before the call to IW(k+1)\text{IW}(k+1) since wΦ(𝒬)kw_{\Phi}(\mathcal{Q})\leq k). By Theorem 2 and the assumption on the cost of testing \prec, each call to IW requires time and space bounded by O(bN2k1Λ)O(bN^{2k-1}\Lambda) and O(bNk)O(bN^{k}) respectively. The plan ρi\rho_{i} found by IW that maps sis_{i} into si+1s_{i+1} has length bounded by NkN^{k}. All these plans are concatenated into a single plan ρ\rho that solves PP. Since the number of Φ\Phi-tuples is bounded by N|Φ|N^{|\Phi|}, SIWΦ\text{SIW}_{\Phi} invests total time and space bounded by O(bN|Φ|+2k1Λ)O(bN^{|\Phi|+2k-1}\Lambda) and O(bNk+N|Φ|+k)O(bN^{k}+N^{|\Phi|+k}) respectively, and |ρ|N|Φ|+k|\rho|\leq N^{|\Phi|+k}.

We now show that the state sequence is indeed found. The first requirement is clear since s0s_{0} is the initial state in PP. We reason inductively to show that the subproblem P[si,]P[s_{i},\prec] belongs to the collection P[]P[\prec] of subproblems, and that the state si+1s_{i+1} is at minimum distance from sis_{i} such that f(si+1)f(si)f(s_{i+1})\prec f(s_{i}).

For the base case, P[s0,]P[s_{0},\prec] belongs to P[]P[\prec] by definition of P[]P[\prec] when s0s_{0} is not a goal state; if s0s_{0} is a goal state, the sequence with the single state s0s_{0} satisfies the requirements. Since wΦ(P)kw_{\Phi}(P)\leq k, IW(k)\text{IW}(k) is guaranteed to find a state s1s_{1} at minimum distance from s0s_{0} that is a goal or f(s1)f(s0)f(s_{1})\prec f(s_{0}). If s1s_{1} is a goal state, the sequence s0,s1s_{0},s_{1} satisfies the requirements. Otherwise, by definition of P[]P[\prec], P[s1,]P[s_{1},\prec] belongs to P[]P[\prec]. Assume now that sis_{i} is not a goal state and P[si,]P[s_{i},\prec] belongs to P[]P[\prec]. A similar argument shows that IW(k)\text{IW}(k) finds a state si+1s_{i+1} at minimum distance from sis_{i} that is a goal or f(si+1)f(si)f(s_{i+1})\prec f(s_{i}). If si+1s_{i+1} is a goal state, SIWΦ\text{SIW}_{\Phi} has found the required state sequence. Otherwise, the problem P[si+1,]P[s_{i+1},\prec] belongs to P[]P[\prec]. Finally, since the number of Φ\Phi-tuples is finite, this process terminates with a goal state sns_{n}. ∎
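In code, the SIWΦ\text{SIW}_{\Phi} loop analyzed in the proof above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the IW search and the test of the ordering \prec are passed in as black boxes, and iw_search(s, stop) is assumed to run IW(1), IW(2), … from s and return the state path to the first state satisfying stop, or None on failure.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

State = TypeVar("State")
FVal = Tuple  # a Phi-tuple of feature values

def siw_phi(s0: State,
            is_goal: Callable[[State], bool],
            f: Callable[[State], FVal],
            precedes: Callable[[FVal, FVal], bool],
            iw_search: Callable[[State, Callable[[State], bool]], Optional[List[State]]],
            ) -> Optional[List[State]]:
    """Chain IW searches: each one stops at a goal state of P or at a state s'
    with f(s') preceding f(s) for the current state s, exactly as in the
    subproblems P[s, <] above."""
    s, states = s0, [s0]
    while not is_goal(s):
        fs = f(s)
        stop = lambda t, fs=fs: is_goal(t) or precedes(f(t), fs)
        path = iw_search(s, stop)          # runs IW(1), IW(2), ... from s
        if path is None:
            return None                    # some subproblem exceeded the width bound
        states.extend(path[1:])            # drop the repeated start state
        s = path[-1]
    return states
```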

The class 𝒬VA\mathcal{Q}_{V\!A} of instances for the problem VisitAll, where all cells in a grid must be visited, has serialized width wΦ(𝒬VA)=1w_{\Phi}(\mathcal{Q}_{V\!A})=1 for Φ={#g}\Phi=\{\#g\} where #g\#g counts the number of unvisited cells (unachieved goals), with the obvious ordering (’\prec’ set to ’<<’). The class 𝒬BW\mathcal{Q}_{BW} of Blocksworld instances cannot be serialized into subproblems of bounded width with Φ={#g}\Phi^{\prime}=\{\#g\} because achieved goals may have to be undone, yet with Φ={#m}\Phi=\{\#m\} where #m\#m counts the number of misplaced blocks (i.e., not on their targets or above one such block), wΦ(𝒬BW)=2w_{\Phi}(\mathcal{Q}_{BW})=2. The result above implies that SIWΦ solves these problems in polynomial time.
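The feature #m used for Blocksworld above can be computed directly from the current and goal configurations. The sketch below uses a plain dictionary that maps each block to the block beneath it (None for the table); this encoding is our own and is only meant to spell out the recursive definition of a misplaced block.

```python
from typing import Dict, Optional

Block = str
On = Dict[Block, Optional[Block]]   # block -> block directly below it, or None for the table

def misplaced(b: Block, on: On, goal_on: On, cache: Dict[Block, bool]) -> bool:
    """A block is misplaced if it is not on the block (or table) required by the
    goal, or if it sits above a misplaced block."""
    if b not in cache:
        below = on[b]
        cache[b] = (b in goal_on and goal_on[b] != below) or \
                   (below is not None and misplaced(below, on, goal_on, cache))
    return cache[b]

def count_misplaced(on: On, goal_on: On) -> int:
    cache: Dict[Block, bool] = {}
    return sum(misplaced(b, on, goal_on, cache) for b in on)

# Tower with A on B on the table; the goal asks for B on A: both blocks are misplaced.
print(count_misplaced({"A": "B", "B": None}, {"B": "A"}))   # 2
```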

From General Policies to Serializations

We show next how serializations can be inferred from general policies, a result that paves the way to introduce a convenient language for defining serializations. Given a general policy πΦ\pi_{\Phi} that solves a class of problems 𝒬\mathcal{Q} for features Φ\Phi that separate the goals, the intuition is to consider serializations (Φ,)(\Phi,\prec) where fff^{\prime}\prec f if there is a state transition (s,s)(s,s^{\prime}) compatible with the policy πΦ\pi_{\Phi} in some instance PP in 𝒬\mathcal{Q} with f=f(s)f^{\prime}=f(s^{\prime}) and f=f(s)f=f(s). The problem with this ordering, however, is that it does not necessarily define a strict partial order as required, as it is possible that there is another transition (s1,s2)(s_{1},s_{2}) compatible with the policy in another instance P𝒬P^{\prime}\in\mathcal{Q} where f(s1)=ff(s_{1})=f^{\prime} and f(s2)=ff(s_{2})=f. In other words, even if the policy orders the feature valuations in a strict partial order in an instance PP, it is possible that the orderings over two or more instances in 𝒬\mathcal{Q} cannot be combined into a single strict partial order.

Thus, in order to obtain a serialization (Φ,)(\Phi,\prec) over a class of problems 𝒬\mathcal{Q} from a policy πΦ\pi_{\Phi}, certain additional restrictions are required. For this, we focus on the policies πΦ\pi_{\Phi} that solve 𝒬\mathcal{Q} structurally; i.e., by virtue of the effects of the actions on the features as expressed in the policy rules in πΦ\pi_{\Phi}. A policy solves a class of problems 𝒬\mathcal{Q} structurally when this can be verified from the form of the policy rules without having to consider the instances in 𝒬\mathcal{Q}. Fortunately, we do not have to formulate the conditions under which a policy πΦ\pi_{\Phi} solves a class of problems 𝒬\mathcal{Q} structurally from scratch; instead, we can use ideas developed in the context of QNPs (Srivastava et al. 2011; Bonet and Geffner 2020).

The key idea is the notion of terminating policy graph. Recall that every feature valuation ff defines a projected boolean valuation b(f)b(f) over the expressions pp and n=0n=0 for the boolean and numerical features pp and nn in Φ\Phi, where pp and n=0n=0 are true in b(f)b(f) iff pp and n=0n=0 are true in ff respectively. The policy graph for policy πΦ\pi_{\Phi} is:

Definition 23 (Policy graph).

The policy graph G(πΦ)G(\pi_{\Phi}) for policy πΦ\pi_{\Phi} has nodes bb, one for each of the 2|Φ|2^{|\Phi|} boolean feature valuations over Φ\Phi, and edges bbb\rightarrow b^{\prime} labeled with EE if bb is not a goal valuation and (b,b)(b,b^{\prime}) is compatible with a rule CEC\rightarrow E in the policy.

The edge bbb\rightarrow b^{\prime} is compatible with a rule CEC\rightarrow E if CC is true in bb, and (b,b)(b,b^{\prime}) is compatible with EE; namely, if pp (resp. ¬p\neg p) is in EE, pp (resp. ¬p\neg p) must be true in bb^{\prime}, and if nn\raisebox{0.6458pt}{$\uparrow$} is in EE, n= 0n{\,=\,}0 must be false in bb^{\prime}. Effects p?p?, n?n?, and nn\raisebox{0.6458pt}{$\downarrow$} in EE put no constraints on bb^{\prime}. In particular, in the latter case, n=0n=0 can be either true or false in bb^{\prime}, meaning that after a decrement, a numerical feature may remain positive or have value 0. Notice that the policy graph associated with a policy πΦ\pi_{\Phi} does not take the class of problems 𝒬\mathcal{Q} into account. Indeed, the policy graph may have an edge bbb\rightarrow b^{\prime} even if there is no state transition (s,s)(s,s^{\prime}) in an instance in 𝒬\mathcal{Q} where this transition between boolean feature valuations arises. The policy graph and the notion of termination below are purely structural as they depend on the form of the policy rules only:

Definition 24 (Termination).

A policy πΦ\pi_{\Phi} and a policy graph G(πΦ)G(\pi_{\Phi}) are terminating if for every cycle in the graph G(πΦ)G(\pi_{\Phi}), i.e., any sequence of edges bibi+1b_{i}\rightarrow b_{i+1} with labels EiE_{i} that start and end in the same node, there is a numerical feature nn in Φ\Phi that is decreased along some edge and increased in none. That is, nEkn\raisebox{0.6458pt}{$\downarrow$}\in E_{k} for some kk-th edge in the cycle, and nEjn\raisebox{0.6458pt}{$\uparrow$}\not\in E_{j} and n?Ejn?\not\in E_{j} for all other edges in the cycle.
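Both the policy graph and the termination condition are purely syntactic, so the graph can be enumerated directly from the rules. The sketch below builds the edges of G(πΦ) for the Delivery policy of the running example, in our own rule notation; it assumes that features not mentioned in the effect E of a rule keep their (projected) value, which is consistent with how the rules are used in the examples. The resulting graph is essentially the one drawn in Figure 1, including nodes and edges that never arise in actual instances.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# 'bool' features are projected to their truth value, numerical features to the
# truth value of the expression "n = 0" (the projection b(f) described above).
FEATURES = {"H": "bool", "p": "num", "t": "num", "n": "num"}

Rule = Tuple[Dict[str, str], Dict[str, str]]    # (condition C, effect E)

# Delivery policy: conditions use 'true'/'false' and '=0'/'>0'; effects use
# 'true'/'false'/'?' for booleans and 'dec'/'inc'/'?' for numerical features.
RULES: List[Rule] = [
    ({"H": "false", "p": ">0"}, {"p": "dec", "t": "?"}),
    ({"H": "false", "p": "=0"}, {"H": "true"}),
    ({"H": "true",  "t": ">0"}, {"t": "dec"}),
    ({"H": "true",  "n": ">0", "t": "=0"}, {"H": "false", "n": "dec", "p": "?"}),
]

Valuation = Dict[str, bool]   # feature -> truth of "p" or of "n = 0"

def holds(C: Dict[str, str], b: Valuation) -> bool:
    return all(b[x] == (lit in ("true", "=0")) for x, lit in C.items())

def edge_ok(E: Dict[str, str], b: Valuation, b2: Valuation) -> bool:
    for x in FEATURES:
        e = E.get(x)
        if e is None and b2[x] != b[x]:      # unmentioned feature: value preserved (assumption)
            return False
        if e == "true" and not b2[x]:
            return False
        if e == "false" and b2[x]:
            return False
        if e == "inc" and b2[x]:             # after n increases, "n = 0" must be false
            return False
        # 'dec' and '?' put no constraint on b2
    return True

def policy_graph(is_goal: Callable[[Valuation], bool]):
    names = list(FEATURES)
    nodes = [dict(zip(names, bits)) for bits in product([False, True], repeat=len(names))]
    edges = [(b, b2, E) for b in nodes if not is_goal(b)
             for b2 in nodes for C, E in RULES if holds(C, b) and edge_ok(E, b, b2)]
    return nodes, edges

nodes, edges = policy_graph(lambda b: b["n"])    # goal valuations: those where n = 0 holds
print(len(nodes), len(edges))                    # 16 boolean valuations over the four features
```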

The termination condition can be checked in time that is polynomial in the size of the policy graph by a procedure called Sieve that looks at the strongly connected components in the graph (Srivastava et al. 2011; Bonet and Geffner 2020).666See Bonet and Geffner (2020) for more details on the formal properties of termination in QNPs. Our setting is slightly different as our numerical features are linear and cannot grow without bound as in QNPs, but we do not use this property to define termination. A key property of terminating policies is that they give rise to state trajectories that are finite.

Theorem 25.

Let πΦ\pi_{\Phi} be a terminating policy with features Φ\Phi over 𝒬\mathcal{Q} that separate the goals. Then, πΦ\pi_{\Phi} cannot give rise to infinite state trajectories in instances PP in 𝒬\mathcal{Q}.

Proof.

For a proof by contradiction, let us assume that there is an infinite state trajectory s0,s1,s_{0},s_{1},\ldots that is compatible with πΦ\pi_{\Phi}. Since the number of states in PP is finite, there must be indices i<ji<j such that si=sjs_{i}=s_{j}. The subsequence si,,sjs_{i},\ldots,s_{j} defines a cycle b(f(si)),,b(f(sj))b(f(s_{i})),\ldots,b(f(s_{j})) in the policy graph for πΦ\pi_{\Phi}. The termination condition implies that there is a numerical feature nn in Φ\Phi that is decreased by some edge in the cycle but not increased by any edge in the cycle. Hence, along the subsequence, the value of nn decreases at least once and never increases, so its value at sjs_{j} is strictly smaller than its value at sis_{i}. This is impossible since si=sjs_{i}=s_{j} implies that the feature nn has the same value in both states. Therefore, such an infinite state trajectory cannot exist. ∎
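The Sieve-style check mentioned before Theorem 25 can be sketched as a simple fixed point over the strongly connected parts of the policy graph: repeatedly pick a component that has a numerical feature decremented by some of its edges and neither incremented nor affected in an unknown way (n?) by any of them, remove those decrementing edges, and declare termination if all cycles are eventually broken. The code below is our own simplified rendering of that idea, not the algorithm of Srivastava et al. (2011); edges carry only their effects on the numerical features.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, Dict[str, str]]   # (source, target, numerical effects)

def _reach(u: Hashable, adj: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    seen, stack = set(), [u]
    while stack:
        for v in adj.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen   # nodes reachable from u in one or more steps

def terminating(edges: List[Edge]) -> bool:
    edges = list(edges)
    while True:
        adj: Dict[Hashable, Set[Hashable]] = {}
        for u, v, _ in edges:
            adj.setdefault(u, set()).add(v)
        reach = {u: _reach(u, adj) for u in adj}
        # edges that lie on some cycle, grouped by their strongly connected component
        comps: Dict[frozenset, List[Edge]] = {}
        for u, v, e in edges:
            if u in reach.get(v, set()):
                scc = frozenset(w for w in reach[u] if u in reach.get(w, set()))
                comps.setdefault(scc, []).append((u, v, e))
        if not comps:
            return True            # no cycles left: the graph is terminating
        progress = False
        for comp_edges in comps.values():
            for x in {f for _, _, e in comp_edges for f in e}:
                decs = [ed for ed in comp_edges if ed[2].get(x) == "dec"]
                bad = any(ed[2].get(x) in ("inc", "?") for ed in comp_edges)
                if decs and not bad:   # x is decreased in the component and increased in none
                    for ed in decs:
                        edges.remove(ed)
                    progress = True
                    break
        if not progress:
            return False           # some cycle has no such feature: not terminating

# Boxes: decrement m while m > 0, then decrement n (leaving m unknown); terminating.
b1, b2 = "m>0,n>0", "m=0,n>0"
print(terminating([(b1, b1, {"m": "dec"}), (b1, b2, {"m": "dec"}),
                   (b2, b1, {"n": "dec", "m": "?"}), (b2, b2, {"n": "dec", "m": "?"})]))  # True
```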

Since terminating policies induce state trajectories with a final state, a sufficient condition for a terminating policy to solve a class 𝒬\mathcal{Q} is to be closed:

Definition 26 (Closed policy).

A policy πΦ\pi_{\Phi} is closed over a class of problems 𝒬\mathcal{Q} if for any non-goal state ss in a problem PP in 𝒬\mathcal{Q} that is πΦ\pi_{\Phi}-reachable, there is a transition (s,s)(s,s^{\prime}) that is compatible with πΦ\pi_{\Phi}.

Theorem 27.

If πΦ\pi_{\Phi} is closed over 𝒬\mathcal{Q}, the features in Φ\Phi separate goals in 𝒬\mathcal{Q}, and πΦ\pi_{\Phi} is terminating, πΦ\pi_{\Phi} solves 𝒬\mathcal{Q}.

Proof.

If πΦ\pi_{\Phi} does not solve instance PP in 𝒬\mathcal{Q}, there is a maximal non-goal reaching trajectory in PP that is compatible with πΦ\pi_{\Phi}. This trajectory is either infinite or terminates at state sns_{n} where no transition (sn,s)(s_{n},s) compatible with πΦ\pi_{\Phi} exists in PP. The former is impossible by Theorem 25, and the latter is impossible since πΦ\pi_{\Phi} is closed. ∎

We are interested in the conditions under which general policies determine serializations, and in particular, serializations with bounded width. For the first part, we do not need the policy to be closed or solve 𝒬\mathcal{Q}, but just to be terminating:

Theorem 28.

A terminating policy πΦ\pi_{\Phi} with features Φ\Phi over 𝒬\mathcal{Q} that separate goals determines a serialization (Φ,)(\Phi,\prec) of 𝒬\mathcal{Q} where ‘\prec’ is the minimal strict partial relation (i.e., the transitive closure) that satisfies fff^{\prime}\prec f for a non-goal valuation ff, if (f,f)(f,f^{\prime}) is compatible with a policy rule in πΦ\pi_{\Phi}.

Here a pair of feature valuations (f,f)(f,f^{\prime}) is compatible with a rule CEC\rightarrow E, if CC is true in ff and the change in values from ff to ff^{\prime} is compatible with EE. Notice that if a state transition (s,s)(s,s^{\prime}) is compatible with CEC\rightarrow E, the pair of feature valuations (f(s),f(s))(f(s),f(s^{\prime})) is compatible with the rule, but the existence of such a state transition is not a requirement for the pair (f,f)(f,f^{\prime}) to be compatible with the policy rule; they can be arbitrary feature valuations over Φ\Phi.

Proof.

We must show that \prec is irreflexive and well-founded. For the first, assume that there is a sequence of feature valuations f1f2fnf_{1}\succ f_{2}\succ\cdots\succ f_{n} with n>1n>1 and f1=fnf_{1}=f_{n} where, without loss of generality, each pair (fi,fi+1)(f_{i},f_{i+1}) is compatible with a policy rule (i.e., fi+1fif_{i+1}\prec f_{i} is not just the result of the transitive closure). In such a case, the boolean valuations bib_{i} associated with the feature valuations fif_{i} form a cycle in the policy graph, and therefore, by the termination condition, some numerical feature must be decreased and not increased along the cycle, which contradicts f1=fnf_{1}=f_{n}.

For well-foundedness, let us assume that there is an infinite chain f1f2f_{1}\succ f_{2}\succ\cdots where fifi+1f_{i}\succ f_{i+1} stands for fi+1fif_{i+1}\prec f_{i}, and let us assume, without loss of generality, that (fi,fi+1)(f_{i},f_{i+1}) is compatible with a policy rule. Consider the sequence of boolean valuations {bi}i\{b_{i}\}_{i} associated with the feature valuations {fi}i\{f_{i}\}_{i}, let bb be a boolean valuation that appears infinitely often in the sequence, and let BB be the set of edges bbb\rightarrow b^{\prime} in the policy graph that are traversed infinitely often in the sequence. Then, by definition of BB, there is an infinite subsequence {ij}j0\{i_{j}\}_{j\geq 0} of indices such that bij=bb_{i_{j}}=b and all the edges traversed in the sequence bij,,bij+1b_{i_{j}},\ldots,b_{i_{j+1}} belong to BB, for j0j\geq 0. By definition of termination, there is a numerical feature nn that is decremented in some edge in BB but not incremented by any edge in BB. Yet this is impossible: the numerical feature nn is integer-valued and non-negative, and it would be decremented infinitely often and never incremented. The contradiction comes from supposing the existence of the infinite chain f1f2f_{1}\succ f_{2}\succ\cdots. Therefore, \prec is a well-founded strict partial order. ∎

There are simple syntactic conditions that ensure that a set of policy rules and the graph that they define are terminating (Bonet and Geffner 2020). For example: a) each policy rule decreases some feature and increases none, b) there is an ordering n1,,nkn_{1},\ldots,n_{k} of the numerical features such that each policy rule decreases a feature nin_{i}, while possibly increasing features njn_{j}, j>ij>i, and c) a further generalization where b) holds except in some policy rules that affect boolean features only, which cannot produce a cycle of truth valuations over the boolean features without involving a policy rule that affects some numerical variable. If the set of policy rules complies with the conditions a) or b), the rules are said to be regular; if they comply with c), the rules are said to be weakly regular. In either case, these are sufficient syntactic conditions that ensure termination; unlike the conditions captured by Definition 24, they are not necessary.

Example.  The policy πΦ\pi_{\Phi} for Boxes with rules {m> 0}{m}\{m{\,>\,}0\}\mapsto\{m\raisebox{0.6458pt}{$\downarrow$}\} and {m= 0,n> 0}{n,m?}\{m{\,=\,}0,n{\,>\,}0\}\mapsto\{n\raisebox{0.6458pt}{$\downarrow$},m?\} and goal n= 0n{\,=\,}0 is closed and regular. It is closed since for any state ss where n> 0n{\,>\,}0, there is a transition (s,s)(s,s^{\prime}) that is compatible with πΦ\pi_{\Phi}: if m= 0m{\,=\,}0, ss^{\prime} results from an action that puts an empty box away, while if m> 0m{\,>\,}0, ss^{\prime} results from an action that puts a marble away from a box with a least number of marbles. Theorem 27 implies that πΦ\pi_{\Phi} solves 𝒬B\mathcal{Q}_{B}.  
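The regularity claim in this example can be checked mechanically. The sketch below, using the same rule notation as the earlier code sketches (our own), tests the syntactic conditions (a) and (b) on the Boxes rules; it treats the unknown effect n? conservatively as a possible increase, which is one reading of the conditions and is consistent with the Boxes rules being regular via condition (b).

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Rule = Tuple[Dict[str, str], Dict[str, str]]   # (condition C, effect E)

def _dec_inc(E: Dict[str, str], num: List[str]) -> Tuple[Set[str], Set[str]]:
    # Numerical effects: 'dec', 'inc', '?'; '?' is treated as a possible increase.
    dec = {x for x in num if E.get(x) == "dec"}
    inc = {x for x in num if E.get(x) in ("inc", "?")}
    return dec, inc

def regular_a(rules: List[Rule], num: List[str]) -> bool:
    """Condition (a): every rule decreases some numerical feature and increases none."""
    return all(_dec_inc(E, num)[0] and not _dec_inc(E, num)[1] for _, E in rules)

def regular_b(rules: List[Rule], num: List[str]) -> bool:
    """Condition (b): for some ordering n1,...,nk, every rule decreases a feature
    that precedes all features it may increase. Brute force over orderings, which
    is fine for the handful of features used in practice."""
    for order in permutations(num):
        rank = {x: i for i, x in enumerate(order)}
        ok = True
        for _, E in rules:
            dec, inc = _dec_inc(E, num)
            if not dec or any(min(rank[d] for d in dec) >= rank[i] for i in inc):
                ok = False
                break
        if ok:
            return True
    return False

# The Boxes policy fails (a), since its second rule sets m to '?', but it satisfies
# (b) with the ordering n, m; hence the rules are regular.
boxes = [({"m": ">0"}, {"m": "dec"}), ({"m": "=0", "n": ">0"}, {"n": "dec", "m": "?"})]
print(regular_a(boxes, ["n", "m"]), regular_b(boxes, ["n", "m"]))   # False True
```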

Since a terminating policy for the class 𝒬\mathcal{Q} defines a serialization over 𝒬\mathcal{Q}, a terminating policy that solves 𝒬\mathcal{Q} should define a serialization over 𝒬\mathcal{Q} of width zero. Two conditions are needed for this, though. The first is the notion of goal connectedness:

Definition 29 (Goal connected).

A policy πΦ\pi_{\Phi} for 𝒬\mathcal{Q} with goal separating features and its policy graph are said to be goal connected when all nodes b(f(s0))b(f(s_{0})) associated with the initial states s0s_{0} of instances PP in 𝒬\mathcal{Q} are connected only to nodes bb that are connected with goal nodes.

Clearly, a policy πΦ\pi_{\Phi} for 𝒬\mathcal{Q} is not closed if its policy graph is not goal-connected, but goal-connectedness does not imply that the policy is closed. For this, we need a condition that goes beyond the structure of the policy graph:777The notion of soundness is similar to action soundness in QNPs when QNPs are used to abstract classes of problems (Bonet and Geffner 2018; Bonet, Frances, and Geffner 2019).

Definition 30 (Sound policy).

A policy πΦ\pi_{\Phi} over 𝒬\mathcal{Q} is sound if for any reachable non-goal state ss in an instance PP in 𝒬\mathcal{Q} where the policy rule CEC\mapsto E is applicable (i.e., where CC holds), there is a transition (s,s)(s,s^{\prime}) in PP that is compatible with πΦ\pi_{\Phi}.

Soundness and goal connectedness imply that a policy is closed, and both are critical for establishing the conditions under which the serialization induced by a terminating policy has zero width:

Theorem 31.

If πΦ\pi_{\Phi} is sound and goal-connected in 𝒬\mathcal{Q}, then πΦ\pi_{\Phi} is closed in 𝒬\mathcal{Q}.

From Theorem 27, it follows that a terminating policy πΦ\pi_{\Phi} that is sound and goal-connected in 𝒬\mathcal{Q} solves 𝒬\mathcal{Q}. In such a case, we say that the policy πΦ\pi_{\Phi} solves the class of problems 𝒬\mathcal{Q} structurally, as two of the conditions, termination and goal-connectedness, can be tested on the policy graph. Soundness, on the other hand, is the condition that ensures that the only sink nodes in the policy graph over any instance P𝒬P\in\mathcal{Q} are the goal nodes.

Proof.

If πΦ\pi_{\Phi} is not closed on some instance PP in 𝒬\mathcal{Q}, there is a non-goal state ss in PP that is reachable from the initial state s0s_{0} using πΦ\pi_{\Phi}, but there is no transition (s,s)(s,s^{\prime}) that is compatible with πΦ\pi_{\Phi}. The node b(f(s0))b(f(s_{0})) is connected in the policy graph to b(f(s))b(f(s)), which from the definition of goal connectedness, must be connected to a goal node. This implies that there is an edge b(f(s))bb(f(s))\rightarrow b^{\prime} in the policy graph, and therefore there is some policy rule CEC\mapsto E that is applicable at ss. Soundness of πΦ\pi_{\Phi} then implies that there is a transition (s,s)(s,s^{\prime}) that is compatible with πΦ\pi_{\Phi}, a contradiction. ∎

As expected, if a terminating policy πΦ\pi_{\Phi} solves 𝒬\mathcal{Q}, the width of 𝒬\mathcal{Q} under the serialization determined by the policy (Theorem 28) is 0, provided however, that certain structural conditions hold:

Theorem 32.

Let πΦ\pi_{\Phi} be a policy that solves 𝒬\mathcal{Q} structurally and let (Φ,)(\Phi,\prec) be the serialization over 𝒬\mathcal{Q} determined by πΦ\pi_{\Phi}. If πΦ\pi_{\Phi} is sound and goal connected, wΦ(𝒬)=0w_{\Phi}(\mathcal{Q})=0.

Proof.

By definition of serialized width, for any problem PP in 𝒬\mathcal{Q}, we need to show that the width of any subproblem P[s,]P[s,\prec] in P[]P[\prec] is zero. For this, it is enough to show that for any such state ss, there is a transition (s,s)(s,s^{\prime}) that is compatible with πΦ\pi_{\Phi} because then, by definition, f(s)f(s)f(s^{\prime})\prec f(s) and P[s,]P[s,\prec] has width equal to zero.

If P[s0,]P[s_{0},\prec] is in P[]P[\prec] for the initial state s0s_{0} of PP, s0s_{0} is not a goal state and there is some transition (s0,s)(s_{0},s^{\prime}) that is compatible with πΦ\pi_{\Phi} since πΦ\pi_{\Phi} solves PP. Let us now consider the subproblem P[s,]P[s^{\prime},\prec] for ss0s^{\prime}\neq s_{0}. By definition, there is a subproblem P[s,]P[s,\prec] in P[]P[\prec] such that ss^{\prime} is a non-goal state at shortest distance from ss with f(s)f(s)f(s^{\prime})\prec f(s). By definition of \prec and reasoning inductively, there is a path in the policy graph for πΦ\pi_{\Phi} from the boolean feature valuation b(f(s0))b(f(s_{0})) to b(f(s))b(f(s^{\prime})). Since the features in Φ\Phi separate goals in 𝒬\mathcal{Q} and πΦ\pi_{\Phi} is goal connected, there is a policy rule CEC\mapsto E that is applicable at ss^{\prime}. Thus, by soundness, there must be a transition (s,s′′)(s^{\prime},s^{\prime\prime}) in PP that is compatible with πΦ\pi_{\Phi}. ∎

Example.  The policy for Boxes in the previous example is closed, goal connected, and sound. By Theorem 32, it determines a serialization (Φ,)(\Phi,\prec) of width zero, where f(s)=[n(s),m(s)]f(s)=[n(s),m(s)]f(s)=[n(s),m(s)]\prec f(s^{\prime})=[n(s^{\prime}),m(s^{\prime})] iff n(s)<n(s)n(s)<n(s^{\prime}) or n(s)=n(s)n(s)=n(s^{\prime}) and m(s)<m(s)m(s)<m(s^{\prime}).  
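Concretely, the serialization induced for Boxes is the lexicographic order on the pairs (n, m), as in the following tiny sketch (our own notation):

```python
from typing import Tuple

def precedes(f: Tuple[int, int], g: Tuple[int, int]) -> bool:
    """f = (n, m) precedes g iff n decreases, or n stays equal and m decreases."""
    return f[0] < g[0] or (f[0] == g[0] and f[1] < g[1])

print(precedes((2, 0), (2, 3)), precedes((2, 3), (2, 0)))   # True False
```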

Figure 1: Policy graph for Delivery for the policy defined by the rules {¬H,p> 0}{p,t?}\{\neg H,p{\,>\,}0\}\mapsto\{p\raisebox{0.6458pt}{$\downarrow$},t?\}, {¬H,p= 0}{H}\{\neg H,p{\,=\,}0\}\mapsto\{H\}, {H,t> 0}{t}\{H,t{\,>\,}0\}\mapsto\{t\raisebox{0.6458pt}{$\downarrow$}\}, and {H,n> 0,t= 0}{¬H,n,p?}\{H,n{\,>\,}0,t{\,=\,}0\}\mapsto\{\neg H,n\raisebox{0.6458pt}{$\downarrow$},p?\}. Yellow and green nodes denote initial and goal nodes respectively. Red nodes and edges stand for nodes and transitions in the policy graph that do not arise in the instances. The graph is terminating and goal connected, and the policy is closed and sound for the classes 𝒬D\mathcal{Q}_{D} and 𝒬D1\mathcal{Q}_{D_{1}}.

Example.  In Delivery, we use the features Φ={H,p,t,n}\Phi=\{H,p,t,n\} and the policy πΦ\pi_{\Phi} defined by the rules {¬H,p> 0}{p,t?}\{\neg H,p{\,>\,}0\}\mapsto\{p\raisebox{0.6458pt}{$\downarrow$},t?\}, {¬H,p= 0}{H}\{\neg H,p{\,=\,}0\}\mapsto\{H\}, {H,t> 0}{t}\{H,t{\,>\,}0\}\mapsto\{t\raisebox{0.6458pt}{$\downarrow$}\}, and {H,n> 0,t= 0}{¬H,n,p?}\{H,n{\,>\,}0,t{\,=\,}0\}\mapsto\{\neg H,n\raisebox{0.6458pt}{$\downarrow$},p?\}. It is easy to check that πΦ\pi_{\Phi} is sound for the classes 𝒬D\mathcal{Q}_{D} and 𝒬D1\mathcal{Q}_{D_{1}} since for any reachable non-goal state ss in a problem PP, there is a transition (s,s)(s,s^{\prime}) that is compatible with πΦ\pi_{\Phi}. On the other hand, the policy graph for πΦ\pi_{\Phi}, depicted in Fig. 1, is clearly terminating and goal connected. Since Φ\Phi separates goals for the classes 𝒬D\mathcal{Q}_{D} and 𝒬D1\mathcal{Q}_{D_{1}}, by Theorem 27, πΦ\pi_{\Phi} solves both classes structurally, and the induced serialization (Φ,)(\Phi,\prec) has width zero for the classes 𝒬D\mathcal{Q}_{D} and 𝒬D1\mathcal{Q}_{D_{1}} by Theorem 32.  

Sketches: A Language for Serializations

The notion of serialization plays a key role in the connection between general policies and serialized width, and since width is a special case of serialized width,888w(P)=wΦ(P)w(P)=w_{\Phi}(P) for the empty serialization (Φ,)(\Phi,\prec); i.e., the one for which fff\prec f^{\prime} is always false. serializations emerge as a central concept in the study. Indeed, serializations can be used to reduce the width of a problem, and if this width is reduced to zero over 𝒬\mathcal{Q}, a general policy for 𝒬\mathcal{Q} results. We focus next on a specific language for specifying serializations in a compact manner. The language can be used either to encode serializations by hand, as a form of domain-specific knowledge for extending the scope of width-based algorithms such as SIW, or for learning serializations from traces. Such uses, however, are beyond the scope of this paper.

A policy sketch, or simply sketch, RΦR_{\Phi} for a class of problems 𝒬\mathcal{Q} is a set of policy rules over the features Φ\Phi that distinguish the goals of 𝒬\mathcal{Q}. The sketch RΦR_{\Phi} can be a full-fledged policy over 𝒬\mathcal{Q}, part of it, or just a set of policy rules CEC\rightarrow E, including the empty set. By interpreting RΦR_{\Phi} as a policy, we can transfer previous results that involve policy graphs and termination. We call the rules in a sketch RΦR_{\Phi} sketch rules because their semantics differs from that of policy rules.

Definition 33 (Sketch).

A sketch for 𝒬\mathcal{Q} is a set RΦR_{\Phi} of sketch rules CEC\rightarrow E over features Φ\Phi that separate goals in 𝒬\mathcal{Q}. The sketch RΦR_{\Phi} is well-formed if the set of rules RΦR_{\Phi} interpreted as a policy is terminating.

Notice that the definition of terminating policies does not require the policy to be closed or even to solve 𝒬\mathcal{Q}. Theorem 28 directly yields:

Theorem 34.

A well-formed sketch RΦR_{\Phi} for 𝒬\mathcal{Q} defines a serialization (Φ,)(\Phi,\prec) over 𝒬\mathcal{Q} where ‘\prec’ is the smallest strict partial order that satisfies fff^{\prime}\prec f if the pair of feature valuations (f,f)(f,f^{\prime}) is compatible with a sketch rule in RΦR_{\Phi}.

The distinction between policy and sketch rules is semantical, not syntactical. A policy πΦ\pi_{\Phi} defines a filter on state transitions: a state transition is compatible with the policy if it is compatible with one of its rules. A sketch RΦR_{\Phi}, on the other hand, is not to be used in this way: a well-formed sketch defines a serialization, and a sketch rule CEC\mapsto E defines subproblems of the serialization: the subproblem of going from a state ss where CC holds in f=f(s)f=f(s) to a state ss^{\prime} with feature valuation f=f(s)f^{\prime}=f(s^{\prime}) such that the pair (f,f)(f,f^{\prime}) is compatible with the sketch rule. The key difference is that this subproblem is not solvable in a single step in general. Theorems 25 and 28 about policy rules that are terminating and that induce well-formed serializations are valid for sketch rules, because in both cases, the relevant notions, like sketch graphs, are defined structurally, without considering the state transitions that are possible in the target class 𝒬\mathcal{Q}. Another way to see the difference between policies and sketches is that (terminating) policies πΦ\pi_{\Phi} play two different roles in our analysis: they specify control, i.e., the possible actions to be done in a given instance PP of 𝒬\mathcal{Q}, and they define serializations. Sketches, on the other hand, play the latter role only.

Sketches provide a language for decomposing a problem into subproblems and thus for reducing its width, which goes well beyond the language of goal counters and variations, as the language for sketches includes the language of general policies.

Example.  The serialization given by the single feature #g\#g that counts the number of unachieved (top) goals is captured with the sketch that only contains the rule {#g> 0}{#g}\{\#g{\,>\,}0\}\mapsto\{\#g\raisebox{0.6458pt}{$\downarrow$}\} when there are no other features, and the rule {#g> 0}{#g,p?,n?}\{\#g{\,>\,}0\}\mapsto\{\#g\raisebox{0.6458pt}{$\downarrow$},p?,n?\} when pp and nn are other features. The rules say that it is “good” to decrease the goal counter independently of the effects on other features. In problems such as Blocksworld, this serialization does not work (has unbounded width), but serializing instead with the single feature #m\#m that counts the number of misplaced blocks with the sketch rule {#m> 0}{#m}\{\#m{\,>\,}0\}\mapsto\{\#m\raisebox{0.6458pt}{$\downarrow$}\} yields a serialization of width 22. A block is misplaced when it is on a wrong block or is above a misplaced block. The sketch thus decomposes any Blocksworld problem into subproblems whose goals are to put a block away or to put it in the right place. The width of the first type of subproblem is 1, while that of the second is 2.

Our last results are about the width of the serializations defined by sketches, and the modifications in the SIWΦ\text{SIW}_{\Phi} algorithm to work with sketches:

Definition 35 (Sketch width).

Let RΦR_{\Phi} be a well-formed sketch for a class of problems 𝒬\mathcal{Q} such that Φ\Phi separates the goals, and let ss be a reachable state in some instance PP of 𝒬\mathcal{Q}. The width of the sketch RΦR_{\Phi} at state ss of problem PP, wR(P[s])w_{R}(P[s]), is the width of the subproblem P[s]P[s] that is like PP but with initial state ss and goal states ss^{\prime} such that ss^{\prime} is a goal state of PP, or the pair (f(s),f(s))(f(s),f(s^{\prime})) is compatible with a sketch rule CEC\mapsto E. The width of the sketch RΦR_{\Phi}, wR(𝒬)w_{R}(\mathcal{Q}), is the maximum width wR(P[s])w_{R}(P[s]) for any reachable state ss in any problem PP in 𝒬\mathcal{Q}.

Theorem 36.

Let RΦR_{\Phi} be a well-formed sketch for a class 𝒬\mathcal{Q} of problems, and let (Φ,)(\Phi,\prec) be the serialization determined by RΦR_{\Phi} from Theorem 34. The width wΦ(𝒬)w_{\Phi}(\mathcal{Q}) of the serialization is bounded by the width wR(𝒬)w_{R}(\mathcal{Q}) of the sketch.

Proof.

By definition, wΦ(𝒬)kw_{\Phi}(\mathcal{Q})\leq k if the width w(P[s,])w(P[s,\prec]) of the subproblems P[s,]P[s,\prec] in P[]P[\prec] is bounded by kk, for any PP in 𝒬\mathcal{Q}. The goals of subproblem P[s,]P[s,\prec] are the states ss^{\prime} that are a goal of PP, or f(s)f(s)f(s^{\prime})\prec f(s). In particular, if the pair (f(s),f(s))(f(s),f(s^{\prime})) is compatible with a sketch rule, then f(s)f(s)f(s^{\prime})\prec f(s), but the converse does not hold in general. That is, for any subproblem P[s,]P[s,\prec] in P[]P[\prec], the goals of the subproblem P[s]P[s] are also goals of P[s,]P[s,\prec]. Hence, wΦ(P[s,])wR(P[s])w_{\Phi}(P[s,\prec])\leq w_{R}(P[s]) and wΦ(𝒬)wR(𝒬)w_{\Phi}(\mathcal{Q})\leq w_{R}(\mathcal{Q}). ∎

If a well-formed sketch RΦR_{\Phi} has bounded width for a class of problems 𝒬\mathcal{Q}, then the problems in 𝒬\mathcal{Q} can be solved in polynomial time by the algorithm SIWR\text{SIW}_{\text{R}} that is like SIWΦ\text{SIW}_{\Phi}, with the difference that the precedence test fff\prec f^{\prime} among pairs of feature valuations ff and ff^{\prime} is replaced by the test of whether the feature valuation pair (f,f)(f,f^{\prime}) is compatible with a rule in RΦR_{\Phi}. In other words, SIWR\text{SIW}_{\text{R}} starts at the state s:=s0s:=s_{0}, where s0s_{0} is the initial state of PP, and then performs an IW search from ss to find a state ss^{\prime} that is a goal state or such that the pair (f(s),f(s))(f(s),f(s^{\prime})) is compatible with a sketch rule in RΦR_{\Phi}. Then, if ss^{\prime} is not a goal state, ss is set to ss^{\prime}, s:=ss:=s^{\prime}, and the loop repeats until a goal state is reached. The precedence test in RΦR_{\Phi} can be done in constant time unlike the general test fff^{\prime}\prec f in SIWΦ\text{SIW}_{\Phi}.999For testing fff^{\prime}\prec f, one needs to check if there is a sequence {fi}i=0n\{f_{i}\}_{i=0}^{n} of feature valuations such that f0=ff_{0}=f, fn=ff_{n}=f^{\prime}, and each pair (fi,fi+1)(f_{i},f_{i+1}) is compatible with a sketch rule, i=0,1,,n1i=0,1,\ldots,n-1. However, this test can be done in constant time too, provided that the binary relation ‘\prec’ for each instance PP is precompiled in a boolean hash table with NkN^{k} rows and NkN^{k} columns where NN is the number of atoms in PP, kk is the number of features, and NkN^{k} is the number of feature valuations in PP. Unlike the procedure SIWR\text{SIW}_{\text{R}}, this precompilation is not practical in general. The efficiency of SIWR\text{SIW}_{\text{R}} comes at a price: by testing “progress” with the sketch rules directly and not with the serialization that results from such rules, SIWR\text{SIW}_{\text{R}} does not use the serialization in full, as it ignores the transitive closure of the precedence relation. The runtime properties of SIWR\text{SIW}_{\text{R}} are thus similar to those of SIWΦ\text{SIW}_{\Phi}, as captured in Theorem 22, with precedence tests that can be done in constant time:

Theorem 37.

Let RΦR_{\Phi} be a well-formed sketch for a class 𝒬\mathcal{Q} of problems. If the sketch width wR(𝒬)w_{R}(\mathcal{Q}) is bounded by kk, SIWR\text{SIW}_{\text{R}} solves any problem PP in 𝒬\mathcal{Q} in O(bN|Φ|+2k1)O(bN^{|\Phi|+2k-1}) time and O(bNk+N|Φ|+k)O(bN^{k}+N^{|\Phi|+k}) space, where bb and NN bound the branching factor and number of atoms in PP respectively.

Proof.

Like the proof of Theorem 22 but with the test fff\prec f^{\prime} in SIWΦ\text{SIW}_{\Phi} replaced by checking if the pair of feature valuations (f,f)(f,f^{\prime}) is compatible with a rule in RΦR_{\Phi}. ∎
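The rule compatibility test and the SIW_R loop just analyzed can be sketched as follows, reusing the rule notation of the earlier code sketches (our own). As before, features absent from an effect E are assumed to keep their value, and the IW search is passed in as a black box.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

State = TypeVar("State")
FVal = Dict[str, object]              # feature name -> bool or non-negative int
Rule = Tuple[Dict[str, str], Dict[str, str]]

def cond_holds(C: Dict[str, str], f: FVal) -> bool:
    return all(f[x] is True if lit == "true" else
               f[x] is False if lit == "false" else
               f[x] == 0 if lit == "=0" else f[x] > 0
               for x, lit in C.items())

def pair_compatible(f: FVal, f2: FVal, rule: Rule) -> bool:
    """(f, f2) is compatible with C -> E iff C holds in f and the change from f to f2
    agrees with E; features not mentioned in E keep their value (assumption)."""
    C, E = rule
    if not cond_holds(C, f):
        return False
    for x in f:
        e = E.get(x)
        if (e is None and f2[x] != f[x]) or \
           (e == "true" and f2[x] is not True) or (e == "false" and f2[x] is not False) or \
           (e == "dec" and not f2[x] < f[x]) or (e == "inc" and not f2[x] > f[x]):
            return False
        # the effect '?' puts no constraint on f2[x]
    return True

def siw_r(s0: State,
          is_goal: Callable[[State], bool],
          f: Callable[[State], FVal],
          rules: List[Rule],
          iw_search: Callable[[State, Callable[[State], bool]], Optional[List[State]]],
          ) -> Optional[List[State]]:
    """Like SIW_Phi, but each IW search stops at goal states or at states s' such
    that (f(s), f(s')) is compatible with some sketch rule (a constant-time test)."""
    s, states = s0, [s0]
    while not is_goal(s):
        fs = f(s)
        stop = lambda t, fs=fs: is_goal(t) or any(pair_compatible(fs, f(t), r) for r in rules)
        path = iw_search(s, stop)
        if path is None:
            return None
        states.extend(path[1:])
        s = path[-1]
    return states
```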

Example.  Different and interesting sketches are given in Table 1 for the two classes of problems for Delivery: the class 𝒬D\mathcal{Q}_{D} of problems with an arbitrary number of packages and the class 𝒬D1\mathcal{Q}_{D_{1}} of problems with a single package. The entries in the columns 𝒬D1\mathcal{Q}_{D_{1}} and 𝒬D\mathcal{Q}_{D} upper bound the width of the different sketches for the two classes of Delivery problems. The entries unbunb and ‘—’ stand respectively for unbounded width and ill-defined (non-terminating) sketch. The features used are: HH (boolean) for holding a package, pp for the distance to the nearest package (zero if holding a package or no package to be delivered remains), tt for the distance to the current target cell (zero if holding nothing), and nn for the number of packages still to be delivered.

Policy sketch 𝒬D1\mathcal{Q}_{D_{1}} 𝒬D\mathcal{Q}_{D}
σ0={}\sigma_{0}=\{\} 2 unbunb
σ1={{H}{¬H,p?,t?}}\sigma_{1}=\{\{H\}\mapsto\{\neg H,p?,t?\}\} 2 unbunb
σ2={{¬H}{H,p?,t?}}\sigma_{2}=\{\{\neg H\}\mapsto\{H,p?,t?\}\} 1 unbunb
σ3=σ1σ2\sigma_{3}=\sigma_{1}\cup\sigma_{2} — —
σ4={{n> 0}{n,H?,p?,t?}}\sigma_{4}=\{\{n{\,>\,}0\}\mapsto\{n\raisebox{0.6458pt}{$\downarrow$},H?,p?,t?\}\} 2 2
σ5=σ2σ4\sigma_{5}=\sigma_{2}\cup\sigma_{4} 1 1
σ6={{¬H,p> 0}{p,t?}}\sigma_{6}=\{\{\neg H,p{\,>\,}0\}\mapsto\{p\raisebox{0.6458pt}{$\downarrow$},t?\}\} 2 unbunb
σ7={{H,t> 0}{t,p?}}\sigma_{7}=\{\{H,t{\,>\,}0\}\mapsto\{t\raisebox{0.6458pt}{$\downarrow$},p?\}\} 2 unbunb
σ8=σ2σ4σ6σ7\sigma_{8}=\sigma_{2}\cup\sigma_{4}\cup\sigma_{6}\cup\sigma_{7} 0 0
Table 1: Upper bounds on the width of different sketches for the classes 𝒬D1\mathcal{Q}_{D_{1}} and 𝒬D\mathcal{Q}_{D} of Delivery problems. The entries unbunb and ‘—’ mean, respectively, unbounded width and ill-defined sketch. For sketches of bounded width, SIWR\text{SIW}_{\text{R}} solves any instance in the class in polynomial time.

We briefly explain the entries in the table without providing formal proofs (such proofs can be obtained with Theorem 18). σ0\sigma_{0} is the empty sketch whose width is the same as the plain width, 2 for D1D_{1} and unbounded for DD, as no problem PP is decomposed into subproblems. The rule {H}{¬H,p?,t?}\{H\}\mapsto\{\neg H,p?,t?\} in σ1\sigma_{1} does not help in initial states that do not satisfy HH, and hurts a bit in states that do satisfy HH. Indeed, in the latter states ss, there is a state ss^{\prime} one step away from ss with f(s)f(s)f(s^{\prime})\prec f(s) (obtained from ss by dropping the package) that gets chosen by any algorithm like SIWR\text{SIW}_{\text{R}}. However, in the resulting subproblem with initial state ss^{\prime} there is no state s′′s^{\prime\prime} for which f(s′′)f(s)f(s^{\prime\prime})\prec f(s^{\prime}) holds, as there is no sketch rule CEC\mapsto E where CC is true in ss^{\prime}. As a result, the goal of the subproblem rooted at ss^{\prime} is the true goal of the problem, which may actually involve going back from ss^{\prime} to ss and from ss to the goal. This is indeed what SIWR\text{SIW}_{\text{R}} does given σ1\sigma_{1}. Since this subproblem has width 22, the serialized width of D1D_{1} remains 22. For σ2\sigma_{2}, the rule {¬H}{H,p?,t?}\{\neg H\}\mapsto\{H,p?,t?\} says that a state ss where ¬H\neg H holds can be “improved” by finding a state ss^{\prime} where HH holds, while possibly affecting pp, tt, or both. Hence, any problem PP in D1D_{1} is split into two subproblems: achieve HH first and then the goal, reducing the sketch width of D1D_{1} to 1 but not the sketch width of DD. The sketch σ3\sigma_{3} is not well-formed as it is not terminating: indeed, the resulting ordering is not a strict partial order, since a state ss where the agent is at the same location as a package but does not hold it is “improved” into a state ss^{\prime} by picking the package, which is improved again by dropping the package without affecting any numerical feature. For σ4\sigma_{4}, which subgoals on the top goal captured by the feature nn, the number of undelivered packages, the sketch width of DD reduces to 2 but that of D1D_{1} remains unchanged. The sketch σ5\sigma_{5} combines the rules in σ2\sigma_{2} and σ4\sigma_{4}, and as a result, decomposes DD into width-2 subproblems, the first of which, corresponding to D1D_{1}, is further decomposed into two subproblems. The sketch width of both DD and D1D_{1} then becomes 1: the first subproblem is to collect the nearest package, the second one to drop it at the target cell, and the same sequence of subproblems gets repeated. The sketch σ6\sigma_{6} reduces the width of the subproblem of getting close to the nearest package to 0, but the remaining subproblem in D1D_{1}, which involves picking up the package and delivering it at the target cell, still has width 2. The sketch σ7\sigma_{7} does not decompose the problem when HH is initially false. Finally, the combination in σ8\sigma_{8} yields a full policy, and thus a serialization of width 0 where each subproblem is solved in a single step.

Conclusions

We have established a number of connections between the notions of width, as developed and used in classical planning, and the notion of generalized plans, which are summarized in Table 2. The results suggest a deep connection between these two notions and that bounded width for infinite collections of problems 𝒬\mathcal{Q} is often the result of simple general policies that solve 𝒬\mathcal{Q} optimally in terms of features that are partially represented in the problem encodings. When this is not the case, we have shed light on the representations that deliver such properties, and hence, polynomial-time searches. We have also formalized and generalized the notion of serialized width by appealing to an explicit and abstract notion of serializations, and established connections between generalized policies and serialized width by borrowing notions from QNPs. Moreover, from this connection, we introduced policy sketches which make use of the language of policy rules but with a different semantics for providing a convenient and powerful language for specifying subproblems and serializations that can be exploited by algorithms such as SIW. The language can be used for encoding domain-specific knowledge by hand, or as a target language for learning domain serializations automatically from traces. These are interesting challenges that we would like to address in the future.

Thm Notes
2 Performance and guarantees of IW(k)\text{IW}(k).
11 Optimal and Markovian policies bound width if features encoded.
12 Markovian policies guarantee optimal solutions with IWΦ\text{IW}_{\Phi}.
13 Bounded width also when tuples capture the features in all optimal trajectories.
18 Bounded width when tuples capture features in a projection.
22 Performance and guarantees of SIWΦ.
25 Conditions for termination of policies.
27 Closed and terminating policies define structural solutions.
28 Terminating policies define serializations.
32 Structural solutions that are sound and goal connected define serializations of zero width.
34 Well-formed sketches define serializations.
36 Sketch width bounds width of induced serialization.
37 Performance and guarantees of SIWR\text{SIW}_{\text{R}} given sketches.
Table 2: Summary of main formal results.

Acknowledgments

The research is partially funded by an ERC Advanced Grant (No 885107), by grant TIN-2015-67959-P from MINECO, Spain, and by the Knut and Alice Wallenberg (KAW) Foundation through the WASP program. H. Geffner is also a Wallenberg Guest Professor at Linköping University, Sweden.

References

  • Bandres, Bonet, and Geffner (2018) Bandres, W.; Bonet, B.; and Geffner, H. 2018. Planning with Pixels in (Almost) Real Time. In AAAI.
  • Belle and Levesque (2016) Belle, V.; and Levesque, H. J. 2016. Foundations for Generalized Planning in Unbounded Stochastic Domains. In Proc. KR, 380–389.
  • Bellemare et al. (2013) Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47: 253–279.
  • Bonet, Frances, and Geffner (2019) Bonet, B.; Frances, G.; and Geffner, H. 2019. Learning features and abstract actions for computing generalized plans. In Proc. AAAI, 2703–2710.
  • Bonet and Geffner (2018) Bonet, B.; and Geffner, H. 2018. Features, Projections, and Representation Change for Generalized Planning. In Proc. IJCAI, 4667–4673.
  • Bonet and Geffner (2020) Bonet, B.; and Geffner, H. 2020. Qualitative Numeric Planning: Reductions and Complexity. Journal of Artificial Intelligence Research (JAIR) 69: 923–961.
  • Bonet and Geffner (2021) Bonet, B.; and Geffner, H. 2021. General Policies, Representations, and Planning Width. In Proc. AAAI.
  • Bonet, Palacios, and Geffner (2009) Bonet, B.; Palacios, H.; and Geffner, H. 2009. Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical Planners. In Proc. ICAPS-09, 34–41.
  • Dechter (2013) Dechter, R. 2013. Reasoning with probabilistic and deterministic graphical models: Exact algorithms. Morgan & Claypool Publishers.
  • Francès et al. (2017) Francès, G.; Ramírez, M.; Lipovetzky, N.; and Geffner, H. 2017. Purely Declarative Action Representations are Overrated: Classical Planning with Simulators. In Proc. IJCAI.
  • Freuder (1982) Freuder, E. C. 1982. A sufficient condition for backtrack-free search. Journal of the ACM 29(1): 24–32.
  • Geffner and Bonet (2013) Geffner, H.; and Bonet, B. 2013. A Concise Introduction to Models and Methods for Automated Planning. Morgan & Claypool Publishers.
  • Ghallab, Nau, and Traverso (2016) Ghallab, M.; Nau, D.; and Traverso, P. 2016. Automated planning and acting. Cambridge U.P.
  • Hoffmann, Porteous, and Sebastia (2004) Hoffmann, J.; Porteous, J.; and Sebastia, L. 2004. Ordered Landmarks in Planning. Journal of Artificial Intelligence Research 22: 215–278.
  • Hu and De Giacomo (2011) Hu, Y.; and De Giacomo, G. 2011. Generalized planning: Synthesizing plans that work for multiple environments. In Proc. IJCAI, 918–923.
  • Lehman and Stanley (2011a) Lehman, J.; and Stanley, K. O. 2011a. Abandoning objectives: Evolution through the search for novelty alone. Evolutionary computation 19(2): 189–223.
  • Lehman and Stanley (2011b) Lehman, J.; and Stanley, K. O. 2011b. Evolving a diversity of virtual creatures through novelty search and local competition. In Proc. of the 13th annual conference on Genetic and evolutionary computation, 211–218.
  • Lipovetzky and Geffner (2012) Lipovetzky, N.; and Geffner, H. 2012. Width and serialization of classical planning problems. In Proc. ECAI, 540–545.
  • Lipovetzky and Geffner (2017a) Lipovetzky, N.; and Geffner, H. 2017a. Best-First Width Search: Exploration and Exploitation in Classical Planning. In Proc. AAAI.
  • Lipovetzky and Geffner (2017b) Lipovetzky, N.; and Geffner, H. 2017b. A polynomial planning algorithm that beats LAMA and FF. In Proc. ICAPS.
  • Lipovetzky, Ramirez, and Geffner (2015) Lipovetzky, N.; Ramirez, M.; and Geffner, H. 2015. Classical Planning with Simulators: Results on the Atari Video Games. In Proc. IJCAI.
  • Ostrovski et al. (2017) Ostrovski, G.; Bellemare, M. G.; Oord, A.; and Munos, R. 2017. Count-Based Exploration with Neural Density Models. In Proc. ICML, 2721–2730.
  • Pathak et al. (2017) Pathak, D.; Agrawal, P.; Efros, A. A.; and Darrell, T. 2017. Curiosity-driven exploration by self-supervised prediction. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 16–17.
  • Pearl (1988) Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
  • Richter and Westphal (2010) Richter, S.; and Westphal, M. 2010. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research 39(1): 127–177.
  • Segovia, Jiménez, and Jonsson (2016) Segovia, J.; Jiménez, S.; and Jonsson, A. 2016. Generalized planning with procedural domain control knowledge. In Proc. ICAPS, 285–293.
  • Srivastava, Immerman, and Zilberstein (2008) Srivastava, S.; Immerman, N.; and Zilberstein, S. 2008. Learning generalized plans using abstract counting. In Proc. AAAI, 991–997.
  • Srivastava et al. (2011) Srivastava, S.; Zilberstein, S.; Immerman, N.; and Geffner, H. 2011. Qualitative Numeric Planning. In AAAI.
  • Tang et al. (2017) Tang, H.; Houthooft, R.; Foote, D.; Stooke, A.; Chen, O. X.; Duan, Y.; Schulman, J.; DeTurck, F.; and Abbeel, P. 2017. #Exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems, 2753–2762.
  • Thiébaux, Hoffmann, and Nebel (2005) Thiébaux, S.; Hoffmann, J.; and Nebel, B. 2005. In defense of PDDL axioms. Artif. Intell. 168(1-2): 38–69.