General Policies, Serializations, and Planning Width∗
Abstract
It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems indeed have a bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of state-of-the-art planning algorithms like BFWS, there is still no good explanation for why so many benchmark domains have bounded width. In this work, we address this question by relating bounded width and serialized width to ideas of generalized planning, where general policies aim to solve multiple instances of a planning problem all at once. We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding. The results are extended to the larger class of domains with bounded serialized width where the general policies do not have to be optimal. The study also leads to a new simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches, which can be used for encoding domain control knowledge by hand or for learning it from traces. The use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.

∗ This paper is the long version of (Bonet and Geffner 2021).
Introduction
Pure width-based search methods exploit the structure of states to enumerate the state space in ways that are different from standard methods like breadth-first, depth-first, or random search (Lipovetzky and Geffner 2012). For this, width-based methods appeal to a notion of novelty to establish a preference for first visiting states that are most novel. Novelty-based methods have also been used in the context of genetic algorithms where a greedy focus on the function to optimize (fitness) often leads to bad local optima (Lehman and Stanley 2011a, b), and in reinforcement learning to guide exploration in large spaces where reward is sparse (Tang et al. 2017; Pathak et al. 2017; Ostrovski et al. 2017).
In classical planning, i.e., planning in factored spaces for achieving a given goal from a known initial state (Geffner and Bonet 2013; Ghallab, Nau, and Traverso 2016), the notion of novelty is now part of state-of-the-art search algorithms like BFWS (Lipovetzky and Geffner 2017b, a) and has been applied successfully in purely exploratory settings where no compact model of the actions or goals is assumed to be known a priori (Francès et al. 2017; Bandres, Bonet, and Geffner 2018). The basic width-based planning algorithms are simple and they all assume that the states are factored, assigning values to a fixed set F of boolean features that in classical planning is given by the atoms in the problem. The basic procedure, IW(1), is indeed a breadth-first search that starts in the given initial state and prunes all the states that do not make a feature from F true for the first time in the search. IW(k) is like IW(1) but using the set of features that stand for conjunctions of up to k features from F. For many benchmark domains, it has been shown that IW(k) for a small and constant value of k like 1 or 2, called the domain width, suffices to compute plans, and indeed optimal plans, for any atomic goal (Lipovetzky and Geffner 2012). Other algorithms like BFWS make use of this property for serializing conjunctive goals into atomic ones; an idea that is also present in algorithms that precompute atomic landmarks (Hoffmann, Porteous, and Sebastia 2004), and in particular, those that use landmark counting heuristics (Richter and Westphal 2010).
A key open question in the area is why these width-based methods are effective at all, and in particular, why so many domains have a small width when atomic goals are considered. Is this a property of the domains? Is it an accident of the manual representations used? In this work, we address these and related questions. For this, we bring in the notion of general policies, i.e., policies that solve multiple instances of a planning domain all at once (Srivastava, Immerman, and Zilberstein 2008; Bonet, Palacios, and Geffner 2009; Hu and De Giacomo 2011; Belle and Levesque 2016; Segovia, Jiménez, and Jonsson 2016), in some cases by appealing to a fixed set of general state features that we restrict to be linear. A number of correspondences are then obtained connecting the notions of width, the size of general policies as measured by the number of features used, and the more general and useful notion of serialized width; i.e., width under a given serialization. The study leads us to formalize the abstract notion of serialization and to analyze the conditions under which serializations are well-formed. Moreover, from the relations established between general policies and serializations, we obtain a new simple, meaningful, and expressive language for specifying serializations, called policy sketches, which can be used for encoding domain control knowledge by hand or for learning it from traces. While exploring these potential uses of sketches is beyond the scope of this paper, the use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.
The paper is organized as follows. We review first the notions of width, representations, and general policies, and relate width with the size of such policies, as measured by the number of features. Then we introduce serializations, the more general notion of serialized width, the relation between general policies and serialized width, and finally, policy sketches and their properties.
Width
IW(1) is a simple search procedure that operates on a rooted directed graph where the nodes represent states, and states assign values to a given set F of features (Lipovetzky and Geffner 2012). IW(1) performs a breadth-first search starting at the root but pruning the states that do not make an atom true for the first time in the search. For classical planning problems expressed in languages such as (grounded) STRIPS, the features are the problem variables, which can take the values true or false. In other settings, like the Atari games as supported in ALE (Bellemare et al. 2013), the features and their values are defined in other ways (Lipovetzky, Ramirez, and Geffner 2015; Bandres, Bonet, and Geffner 2018). The procedure IW(k) for k > 1 is IW(1) but with a feature set given by the conjunctions of up to k features from F.
A finding reported by Lipovetzky and Geffner (2012) and exploited since in width-based algorithms is that the IW(k) procedure with k = 2 suffices to solve a wide variety of planning problems. Indeed, Lipovetzky and Geffner consider the 37,921 instances that result from all the domains used in planning competitions until 2012, where each instance with a goal made up of a conjunction of N atoms is split into N instances, each one with a single atomic goal. They report that IW(2) solves more than 88% of such instances, and moreover, 100% of the instances from 26 of the 37 domains considered.
This is remarkable because when k is smaller than the number of problem variables, IW(k) is an incomplete procedure. Indeed, while breadth-first search expands a number of nodes that is exponential in the number of problem variables, IW(2) expands a quadratic number of nodes at most. Moreover, the results are not an accident resulting from a lucky choice of the instances, as one can formally prove that IW(2) solves any instance of many of these domains when the goal is a single atom.
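To make the pruning rule concrete, the following is a minimal Python sketch of IW(k) over states represented as sets of ground atoms. The state representation, the succ and is_goal callbacks, and all the names are illustrative assumptions of ours, not an implementation from the paper:

```python
from collections import deque
from itertools import combinations

def iw_k(init, succ, is_goal, k):
    """IW(k): breadth-first search that prunes every state which does not
    make some tuple of at most k atoms true for the first time."""
    seen = set()  # atom tuples made true so far in the search

    def is_novel(state):
        novel = False
        for size in range(1, k + 1):
            for t in combinations(sorted(state), size):
                if t not in seen:
                    seen.add(t)
                    novel = True
        return novel

    if is_goal(init):
        return []
    is_novel(init)                       # register the tuples of the root
    queue = deque([(init, [])])
    while queue:
        state, plan = queue.popleft()
        for action, child in succ(state):
            if is_goal(child):
                return plan + [action]   # first goal found: optimal if w(P) <= k
            if is_novel(child):          # prune non-novel states
                queue.append((child, plan + [action]))
    return None                          # failure: the width of P may exceed k
```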
Underlying the IW algorithms is the notion of problem width, which borrows from similar notions developed for constraint satisfaction problems and Bayesian networks, that are intractable but have algorithms that run in time and space that are exponential in the treewidth of the underlying graphs (Freuder 1982; Pearl 1988; Dechter 2013). For planning, the notion introduced by Lipovetzky and Geffner takes this form:[1]

[1] For convenience, we set the width of problems that can be solved in one step to zero.
Definition 1 (Based on Lipovetzky and Geffner, 2012).
The width w(P) of a problem P is the minimum k for which there is a sequence t_1, t_2, ..., t_m of atom tuples, each with at most k atoms, such that:

1. t_1 is true in the initial state of P,
2. any optimal plan for t_i can be extended into an optimal plan for t_{i+1} by adding a single action, i = 1, ..., m − 1,
3. any optimal plan for t_m is an optimal plan for P.
The width is w(P) = 0 if the initial state of P is a goal state. For convenience, we also set w(P) to 0 if the goal of P is reachable in a single step, and to N + 1 if P has no solution, where N is the number of atoms in P.
Chains of tuples t_1, ..., t_m that comply with conditions 1–3 are called admissible, and the size of the chain is the size of the largest tuple in the chain. The width is thus w(P) = k if k is the minimum size of an admissible chain for P. Notice that the definition of width does not presuppose a particular language for specifying the actions or goals, which can actually be specified procedurally. It just assumes that states are factored and assign truth values to a pool of atoms. The tuples in an admissible chain can be regarded as subgoals or stepping stones on the way to the goal of P that ensure that it will be reached optimally. The main properties of the IW(k) algorithm can then be expressed as follows:[2]

[2] It is assumed that the number of atoms affected by an action is bounded by a constant. When this is not the case, the time bound in Theorem 2 becomes O(b N^{2k}).
Theorem 2 (Lipovetzky and Geffner, 2012).
IW(k) expands up to N^k nodes, generates up to b N^k nodes, and runs in time and space O(b N^{2k−1}) and O(b N^k), respectively, where N is the number of atoms and b bounds the branching factor in problem P. IW(k) is guaranteed to solve P optimally (shortest path) if w(P) ≤ k.
Proof.
Since the number of tuples of size at most k is bounded by N^k, IW(k) expands up to N^k nodes and generates up to b N^k nodes. Under the assumption that each transition flips the value of up to a (constant) number of atoms, each dequeued state can make true up to O(N^{k−1}) new tuples of atoms of size at most k. Seen tuples are stored in a perfect hash of size O(N^k) which supports operations in constant time. Therefore, the test for expansion for each dequeued state takes time O(b N^{k−1}), and thus the total time incurred by IW(k) for exploring the search tree is O(b N^{2k−1}). From the definition of width, it is not hard to see that if w(P) ≤ k, then IW(k) generates a node for a goal state, and the path from the initial state to the first goal state that is generated is an optimal (shortest) path. ∎
When the width of problem P is not known, the IW algorithm can be run instead, which calls IW(k) iteratively for k = 0, 1, ..., N until the problem is solved, or shown to have no solution when IW(N) finds no plan. While these algorithms are not aimed at being practical planning algorithms, some state-of-the-art planners make use of these ideas in a slightly different form (Lipovetzky and Geffner 2017b, a).
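The IW wrapper is then a one-line loop over the iw_k sketch above (num_atoms plays the role of N; the names are again illustrative):

```python
def iw(init, succ, is_goal, num_atoms):
    """IW: run IW(k) for k = 0, 1, ..., N until a plan is found."""
    for k in range(num_atoms + 1):
        plan = iw_k(init, succ, is_goal, k)
        if plan is not None:
            return plan
    return None  # no plan exists
```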
Example. Let Q_clear be the class of Blocksworld problems in the standard stack/unstack encoding whose goal is to achieve the atom clear(x) and an empty gripper, starting with clear(x) false and an empty gripper. Let b_i for i = 1, ..., n be the blocks above x from top to bottom in an instance P in Q_clear, and let us consider the chain of subgoals t_0, t_1, ..., t_{2n} where t_0 = {clear(b_1)}, t_{2i−1} = {hold(b_i)}, and t_{2i} = {ontable(b_i)}, i = 1, ..., n. It is easy to check that conditions 1–3 above hold for this chain, hence w(P) ≤ 1. Since w(P) ≠ 0, as the goal cannot be reached in zero or one steps in general, w(Q_clear) = 1.
Example. Let Q_on be the class of Blocksworld problems where the goal is on(x, y). For simplicity, let us assume that x and y are not initially in the same tower, and that there are n blocks above x and m blocks above y. It is not difficult to check that the chain t_0, ..., t_{2n}, t'_1, ..., t'_{2m+1}, t'' is admissible and of size 2, where t_i for i = 0, ..., 2n is as in the previous example, t'_j for j = 1, ..., 2m is like the analogous tuple for clearing y but extended with the atom clear(x), t'_{2m+1} = {hold(x), clear(y)}, and t'' = {on(x, y)}. Then w(Q_on) = 2, since w(P) ≤ 2 for the problems P in Q_on while their width is not 1 in general.
Representations
The width of a planning problem is tied to the representation language and the encoding of the problem in the language. For example, the problem of moving a number of packages from one room to the next, one by one, has a width that grows with the number of packages in standard encodings where each package has a name, but a width that remains bounded when the packages are indistinguishable from each other and the number of packages is encoded in unary (with one atom per counter value).[3]

[3] The problem would still not have bounded width if the counter is represented in binary using a logarithmic number of atoms.
In order to deal with a variety of possible languages and encodings, and since width-based methods rely on the structure of states but not on the structure of actions (i.e., action preconditions and effects), we consider first-order languages for describing states in terms of atoms that represent objects and relations, leaving out from the language the representation of action preconditions and effects. It is assumed that the possible state transitions from a state are a function of the state but no particular representation of this function is assumed. In addition, the state language is extended with features whose values in a state are determined by the state. The features provide additional expressive power and ways for bridging different state representation languages, although they are logically redundant as their value is determined by the truth value of the atoms in the state. The features extend the notion of derived predicates as defined in PDDL, as they do not have to be boolean, and they do not have to be defined in the language of first-order logic or logic programs (Thiébaux, Hoffmann, and Nebel 2005), but can be defined via procedures. Domains, instances, and states are defined as follows:
Definition 3 (Domains, problems, and states).
A domain is a pair D = (R, F) where R is a set of primitive predicate symbols with their corresponding arities, and F is a set of features defined in terms of the primitive predicates, with their corresponding range of feature values. A problem P over domain D is a tuple P = (D, O, I, G) where O is a set of unique object names (objects), and I and G are sets of ground atoms that denote the initial and goal states of P. A ground atom is made of a predicate p in R and an object tuple from O^k, for k the arity of p. A state s over problem P is a collection of ground atoms. The state s is a goal if G ⊆ s, and a dead end if a goal state cannot be reached from s in P. A state s denotes a unique valuation for the ground atoms in P: the atom p(o_1, ..., o_k) is true in s iff it belongs to s.
This is all standard except for the two details mentioned before: there are no action schemas, and there are state features. For the former, it is implicitly assumed that in each problem P, there is a function that maps states s into the set of possible transitions (s, s'). This implies, for example, that the states may contain static atoms, like adjacency relations, whose truth values are not affected by any action. For the features, we make the assumption that they are linear, in the sense that they can be computed efficiently and only span a linear number of values. More precisely:
Linear features assumption. The features in F are either boolean or numerical, ranging in the latter case over the non-negative integers. The value of a feature f in a state s for problem P, f(s), can be computed in time bounded by O(bN), where N is the number of atoms and b bounds the branching factor in P. Numerical features can take up to O(N) values.
This assumption rules out features like V*(s) that stand for the optimal cost (distance) from the state s to a goal, which may take a number of values that is not linear in the number of problem atoms, and whose computation may take exponential time. In many cases, the features can be defined in the language of first-order logic, but this is not a requirement.
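As an illustration of the assumption, the two features used in the examples for Q_clear, a boolean H and a counter n, can be computed in linear time from a Blocksworld state given as a set of atom tuples; the atom encoding below is an assumption of ours:

```python
def feature_H(state):
    """H: true iff some block is currently being held."""
    return any(atom[0] == "hold" for atom in state)

def feature_n(state, x):
    """n: number of blocks above block x, computed by following
    the on(a, b) atoms upward from x."""
    above = {atom[2]: atom[1] for atom in state if atom[0] == "on"}
    count, cur = 0, x
    while cur in above:          # walk up the tower, one block at a time
        cur = above[cur]
        count += 1
    return count
```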
Example. Three state languages for Blocksworld are:

1. L1 with the binary predicate (symbol) on^2 and the unary hold^1 (the superindex indicates arity),
2. L2 with predicates on^2, ontable^1, clear^1, and hold^1,
3. L3 with predicates on^2 and hold^1, and boolean features H and C that express that a block is being held and that the block x is clear.
Example. Four languages for a domain Boxes, where boxes containing marbles must be removed from a table, and for this, the marbles in the box must be removed one by one first:

1. L1 with predicates in^2 and ontable^1,
2. L2 with predicates in^2, ontable^1, and empty^1,
3. L3 with predicates in^2 and ontable^1, and features m(b) that count the number of marbles in box b,
4. L4 with predicates in^2 and ontable^1, and features m and n counting the number of marbles in a box with the least number of marbles, and the number of boxes left.
By abstracting away the details of the domain dynamics, and with the ability to introduce features, it is simple to move from one state representation to another.
The notion of width and the IW algorithms generalize to state languages containing features in a direct fashion. In both cases, the set of atoms considered is extended to contain the possible feature values f = v, where f is a feature and v is one of its possible values. Features are logically redundant but can have a drastic effect on the problem width.
The width for a class Q of problems over some domain is k, written as w(Q) = k, if w(P) = k for some problem P in Q and w(P') ≤ k for every other problem P' in Q.
Example. The class Q_clear of problems where block x has to be cleared has width 1 in the state languages L1 and L2, while the class Q_on has width 2 for the three languages. On the other hand, for the class of instances from Boxes with a single box, the width is not bounded, as it grows with the number of marbles when encoded in the languages L1 and L2, but it is 1 when encoded in the languages L3 or L4. Likewise, for the class of instances from Boxes with an arbitrary number of boxes, the encoding in the language L3 has a width that is not bounded, as it grows with the number of boxes, but the width remains bounded and equal to 2 in L4.
Generalized policies
We want to show that bounded width is a property of domains that admit a certain class of general policies. Different languages for expressing general policies have been developed, some of which can deal with relational domains where different instances involve different (ground) actions. Closest to this work, general policies have been defined in terms of qualitative numerical planning problems (QNPs) (Srivastava et al. 2011; Bonet and Geffner 2018, 2020). We build on this idea but avoid the introduction of QNPs by defining policies directly as mappings from boolean feature conditions into feature value changes.
A boolean feature condition for a set of features Φ is a condition of the form p or ¬p for a boolean feature p in Φ, or n = 0 or n > 0 for a numerical feature n in Φ. Similarly, a feature value change for Φ is an expression of the form p, ¬p, or p? for a boolean feature p in Φ, and n↓, n↑, or n? for a numerical feature n in Φ. General policies are given by a set of rules C → E where C and E stand for sets of boolean feature conditions and feature changes respectively.
Definition 4 (Policies).
A general policy π for a domain D over a set of features Φ is a set of rules of the form C → E, where C is a set of boolean feature conditions and E is a set of feature value changes. The condition n > 0 is assumed in rules with effects n↓ or n?.
The policy π prescribes the possible actions to be done in a state s over a problem P indirectly, as the set of state transitions (s, s') that the actions in P make possible and which are compatible with the policy:
Definition 5.
A transition (s, s') satisfies the effect E when:

1. if p (resp. ¬p) is in E, p is true (resp. false) in s',
2. if n↓ (resp. n↑) is in E, n(s) > n(s') (resp. n(s) < n(s')),
3. if p (resp. n) is not mentioned at all in E, p(s) = p(s') (resp. n(s) = n(s')).

The transition (s, s') is compatible with policy π (or is a π-transition) if there is a policy rule C → E such that s makes C true and (s, s') satisfies E.
Policy rules provide a description of how the value of the features must change along the state trajectories that are compatible with the policy. Every transition (s, s') in such trajectories has to be compatible with a policy rule C → E. The expressions p? and n? in E stand for uncertain effects, meaning that p and n may change in any way or not change at all. Features not mentioned in E, on the other hand, must keep their values unchanged in a transition compatible with the rule C → E.
The definition does not exclude the presence of multiple policy rules C_i → E_i with conditions C_i that are all true in a state s. In such a case, for a state transition (s, s') to be compatible with the policy, the definition requires (s, s') to satisfy one of the effect expressions E_i. On the other hand, if the body C of a single rule C → E is true in s, for the transition (s, s') to be compatible with the policy, it must satisfy the effect E of that rule.
Example. A policy for solving the class Q_clear can be expressed in terms of the features Φ = {H, n}, where H is true if a block is being held, and n counts the number of blocks above x. The policy can be expressed with two rules:
{¬H, n > 0} → {H, n↓}   and   {H, n > 0} → {¬H}    (1)
The first rule says that when the gripper is empty and there are blocks above x, an action that decreases n and makes H true must be chosen, while the second rule says that when the gripper holds a block and there are blocks above x, an action that makes H false and does not affect n must be selected.
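Definition 5 and the rules in (1) are easy to operationalize. Below is a minimal sketch in which feature valuations are dictionaries and rules are (condition, effect) pairs; the encoding is ours and only meant to mirror the definitions:

```python
def holds(cond, f):
    """cond: list of (feature, test), with test in {True, False, '=0', '>0'}."""
    for feat, test in cond:
        v = f[feat]
        if test == '=0' and v != 0: return False
        if test == '>0' and not v > 0: return False
        if test in (True, False) and v is not test: return False
    return True

def satisfies(effect, f, g):
    """Definition 5: effect maps features to True, False, 'dec', 'inc' or '?';
    features not mentioned must keep their value."""
    for feat in f:
        e = effect.get(feat)
        if e is None and g[feat] != f[feat]: return False
        if e is True and g[feat] is not True: return False
        if e is False and g[feat] is not False: return False
        if e == 'dec' and not g[feat] < f[feat]: return False
        if e == 'inc' and not g[feat] > f[feat]: return False
        # e == '?': uncertain effect, no constraint
    return True

def compatible(f, g, rules):
    """A transition with feature valuations f -> g is compatible with the
    policy if some rule's condition holds in f and its effect is satisfied."""
    return any(holds(c, f) and satisfies(e, f, g) for c, e in rules)

# The two rules (1) for Q_clear over the features H and n:
rules_clear = [
    ([('H', False), ('n', '>0')], {'H': True, 'n': 'dec'}),
    ([('H', True),  ('n', '>0')], {'H': False}),
]
# Picking up the topmost block above x is compatible with the first rule:
assert compatible({'H': False, 'n': 3}, {'H': True, 'n': 2}, rules_clear)
```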
The conditions under which a general policy solves an instance P and a class of instances Q are:
Definition 6 (Trajectories and solutions).
A state trajectory s_0, ..., s_n for problem P is compatible with policy π over features Φ (or is a π-trajectory) iff s_0 is the initial state of P, no state s_i is a goal, i < n, and each pair (s_i, s_{i+1}) is a possible state transition in P that is compatible with π. It is maximal if either s_n is a goal state, no transition (s_n, s) in P is compatible with π, or the trajectory is infinite (i.e., n = ∞). A policy π solves a problem P if all maximal state trajectories compatible with π are goal reaching (i.e., s_n is a goal state in P). π solves a collection Q of problems if it solves each problem in Q.
It is easy to show that the policy captured by the rules in (1) solves Q_clear. The verification and synthesis of general policies of this type have been addressed by Bonet and Geffner (2020) in the context of qualitative numerical planning, and by Bonet, Francès, and Geffner (2019), where the set of features and the QNP model are learned from traces.
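On small instances, Definition 6 can be checked directly by exhaustively following all the transitions compatible with the policy and testing that every maximal trajectory ends in a goal state. A brute-force sketch, reusing the compatible function above and bounding trajectory length to cut off infinite (looping) trajectories:

```python
def solves(init, succ, is_goal, rules, phi, horizon=10_000):
    """True iff every maximal trajectory compatible with the policy,
    starting at init, reaches a goal state (Definition 6)."""
    stack = [(init, 0)]
    while stack:
        s, depth = stack.pop()
        if is_goal(s):
            continue                    # this maximal trajectory is goal reaching
        if depth >= horizon:
            return False                # looping or too long: not goal reaching
        f = phi(s)
        nexts = [s2 for _, s2 in succ(s) if compatible(f, phi(s2), rules)]
        if not nexts:
            return False                # maximal trajectory ends in a non-goal state
        stack.extend((s2, depth + 1) for s2 in nexts)
    return True
```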
Generalized Policies and Width
The first result establishes a relation between classes of problems that are solvable by a certain type of policies and their width. Namely, if there is a "Markovian" policy that generates optimal plans for a class Q, the problems in Q can be encoded to have width bounded by the number of policy features.
A policy π over the features Φ is Markovian in P when the features provide a suitable abstraction of the states; i.e., when the possible next feature valuations are not just a function of the current state, but of the feature valuation in the state. More precisely, if we say that a state s is optimal for a feature valuation f in a problem P when Φ(s) = f and there is no state with the same feature valuation that is reachable in P in a smaller number of steps than s, the Markovian property is defined as follows:
Definition 7 (Markovian).
A policy π is Markovian for a problem P iff the existence of a transition (s, s') compatible with π with feature valuations f = Φ(s) and f' = Φ(s') implies the existence of transitions, compatible with π, with the same feature valuations f and f', in all states that are optimal for f in P. The policy is Markovian for a class Q of problems if it is so for each P in Q.
The states on which the Markovian condition is required in the definition are not necessarily among those which are reachable with the policy. The definition captures a weak Markovian assumption. The standard, but stronger, Markovian assumption requires the same condition on all states that reach the feature valuation f and not just those that reach f optimally. A sufficient condition that ensures the (strong) Markovian property is for the policy to be deterministic over the feature valuations, meaning that if f is followed by f' in some transition compatible with the policy, f can always be followed by f' in every state transition (s, s') compatible with the policy where Φ(s) = f, and moreover, that f can only be followed by f' then. This form of determinism results when the bodies of the policy rules are logically inconsistent with each other, the heads do not have uncertain effects p? or n?, the domain is such that all increments and decrements n↑ and n↓ are by fixed amounts, and there is a transition compatible with π in the reachable states s where Φ(s) = f holds.
For reasoning about the policy by reasoning about the features, however, the features must also distinguish goal from non-goal states:
Definition 8 (Separation).
The features Φ separate goals from non-goals in Q iff there is a set B of boolean feature valuations such that for any problem P in Q and any reachable state s in P, s is a goal state iff the boolean feature valuation determined by s is in B. The valuations in B are called goal valuations.
The boolean feature valuation determined by a state s refers to the truth valuation of the expressions p and n = 0 for the boolean and numerical features p and n in Φ, respectively. While the number of feature valuations is not bounded, as the size of the instances in Q is not bounded in general, the number of boolean feature valuations is always at most 2^{|Φ|}.
The last notion needed for relating general policies and the width of the instances is the notion of optimality:
Definition 9 (Optimal policies).
A policy π that solves a class Q of problems is optimal if any plan induced by π over a problem P in Q is optimal for P.
If the policy π solves a problem P, the plans induced by the policy are the action sequences that yield the goal-reaching state trajectories that are compatible with π. These plans are optimal for P if there are no shorter plans for P.
It can be shown that if π is a Markovian policy that solves the class of problems Q optimally with the features Φ separating goals from non-goals, then any feature valuation reached on the way to the goal in an instance P in Q is reached optimally; i.e., no action sequence in P can reach a state with the same feature valuation in a smaller number of steps.
Theorem 10.
Let π be a Markovian policy that solves a class Q of problems optimally, and where the features Φ separate goals in Q. Then, π is optimal relative to feature valuations. That is, if a_1, ..., a_n is a sequence of actions induced by π over an instance P in Q that reaches the state s_n, then a_1, ..., a_n is an optimal plan in P for the feature valuation Φ(s_n).
Proof.
Let τ = s_0, ..., s_n be an optimal trajectory for P compatible with π, and let σ be an optimal trajectory that ends in a state s with Φ(s) = Φ(s_i), for some i ≤ n. We want to show that len(σ) = i, where the length len(·) of a trajectory is the number of transitions in it. By the Markovian property, there is a π-transition (s, s') with Φ(s') = Φ(s_{i+1}). If σ extended with the state s' is an optimal trajectory for Φ(s_{i+1}), we can extend it again with a π-transition into a trajectory for Φ(s_{i+2}). Otherwise, there is a shorter optimal trajectory for Φ(s_{i+1}) which can then be extended using the Markovian property into a trajectory for Φ(s_{i+2}). Thus, repeating the argument, there is a trajectory for Φ(s_n) of length at most len(σ) + (n − i). On the other hand, since π is optimal and the features separate the goals, any trajectory that reaches Φ(s_n) is goal reaching and thus has length at least n, which implies len(σ) + (n − i) ≥ n and hence len(σ) ≥ i. Since τ contains a trajectory that reaches Φ(s_i) in i steps, len(σ) ≤ i, and thus len(σ) = i. ∎
As a result, under these conditions, the width of the instances P in Q can be bounded by the number of features in the policy π, provided that the features are represented explicitly in the instances:
Theorem 11.
Let π be a Markovian policy that solves a class Q of problems optimally, where the features Φ separate goals. If the features in Φ are explicitly represented in the instances in Q, then w(P) ≤ |Φ| for any P in Q.
Proof.
Let P be a problem in Q, let s_0, ..., s_n be a goal-reaching π-trajectory in P, and let t_i = Φ(s_i), i = 0, ..., n. Clearly, the size of each tuple t_i is at most |Φ|, and we only need to show that the chain t_0, ..., t_n is admissible; i.e., that it satisfies the 3 conditions in Definition 1. The first condition is direct since t_0 = Φ(s_0) is true in the initial state.

For the second condition, let ρ be an optimal plan that achieves the tuple t_i at state s. We need to show that there is an action a in P such that ρ extended with a is an optimal plan for t_{i+1}. Since the transition (s_i, s_{i+1}) is reachable by π (i.e., belongs to a π-trajectory), the Markovian property implies that there must be a π-transition (s, s') associated with an action a such that Φ(s') = Φ(s_{i+1}). By Theorem 10, the length of ρ is i and any optimal plan for t_{i+1} is of length i + 1. Therefore, the plan ρ followed by a is a plan for t_{i+1} of length i + 1 that is optimal.

For the last condition, we show that any optimal plan that achieves t_n is an optimal plan for problem P. This is direct since Φ(s_n) is a goal valuation, as Φ separates goals: a trajectory is goal reaching iff it ends in a state that makes t_n true. ∎
Indeed, if a policy π solves Q optimally under the given conditions, an admissible chain of size |Φ| can be formed for solving each problem P in Q optimally, where the i-th tuple t_i is the valuation of the features in Φ at the i-th state of any state trajectory that results from applying the policy.
Related to this result, if we let IW_Φ refer to the variant of IW that replaces tuples of atoms by feature valuations, and thus deems a state s novel in the search (unpruned) when the feature valuation Φ(s) has not been seen before, one can show:
Theorem 12.
Let π be a Markovian policy that solves a class Q of problems optimally with features Φ that separate the goals. The procedure IW_Φ solves any instance P in Q optimally in time O(b N^m), where N is the number of atoms in P, b bounds the branching factor, and m is the number of numerical features in Φ.
IW_Φ can be thought of as a standard breadth-first search that treats states with the same feature valuations as "duplicate" states. It runs in polynomial time as the number of features in Φ is fixed and does not grow with the instance size, as opposed to N, which stands for the number of atoms in the instance.
Proof.
Let s_0, ..., s_n be a goal-reaching trajectory generated by π on problem P in Q, and let f_0, ..., f_n be the sequence of feature valuations for the states in the trajectory. We prove by induction on i that IW_Φ finds a node for a state s̃_i such that Φ(s̃_i) = f_i, the path from the root node to s̃_i is optimal, and the node is not pruned. The base case i = 0 is direct since the empty path leads to s_0 and it is optimal. Let us assume that the claim holds for i. By the inductive hypothesis, IW_Φ finds a node for a state s̃_i such that Φ(s̃_i) = f_i, the path from the root to s̃_i is optimal, and s̃_i is not pruned. Since the transition (s_i, s_{i+1}) exists in P, the Markovian property implies that there is a transition (s̃_i, s̃_{i+1}) with Φ(s̃_{i+1}) = f_{i+1}, and thus IW_Φ generates a node for the state s̃_{i+1}. If s̃_{i+1} is the first node where f_{i+1} holds, it is not pruned. Otherwise, there is another node that makes f_{i+1} true and the length of the path from the root to it is less than or equal to the length of the path from the root to s̃_{i+1}. In either case, since the length of an optimal path that achieves f_{i+1} is i + 1 by Theorem 10, IW_Φ finds an optimal path to a node that achieves f_{i+1} and is not pruned.

We have just shown that IW_Φ finds an optimal path to a state that makes f_n true. Since the features separate the goals, such a path is an optimal path for P. ∎
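Operationally, IW_Φ is plain breadth-first search with duplicate detection applied to Φ(s) instead of to s. A minimal sketch, with phi a function mapping states to hashable feature valuations (the names are ours):

```python
from collections import deque

def iw_phi(init, succ, is_goal, phi):
    """IW_Phi: prune any state whose feature valuation was already seen."""
    if is_goal(init):
        return []
    seen = {phi(init)}
    queue = deque([(init, [])])
    while queue:
        state, plan = queue.popleft()
        for action, child in succ(state):
            if is_goal(child):
                return plan + [action]
            v = phi(child)
            if v not in seen:            # novelty over feature valuations
                seen.add(v)
                queue.append((child, plan + [action]))
    return None
```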
Example. A general policy for Boxes is given by the rules {n > 0, m = 0} → {n↓, m?} and {n > 0, m > 0} → {m↓}, where m and n are two features that count the number of marbles in a box with a least number of marbles, and the number of boxes left on the table. Since the policy complies with the conditions in Theorem 11 and the two features are represented explicitly in the language L4, it follows that the width of instances of Boxes in such an encoding is at most 2.
Example. Similarly, the policy with features Φ = {H, n} for Q_clear given by the two rules in (1) is Markovian, solves Q_clear optimally, and the features separate goals. Yet there are no atoms representing the value of the counter n explicitly. Still, Theorem 12 implies that IW_Φ solves the instances in Q_clear optimally in quadratic time.
Theorem 11 relates the number of features in a general policy that solves Q with the width of Q, provided that the features are part of the problem encodings. This, however, is not strictly necessary:
Theorem 13.
Let π be an optimal and Markovian policy that solves a class Q of problems over some domain for which the features Φ separate the goals. The width of the problems in Q is bounded by k if for any sequence f_0, ..., f_n of feature valuations generated by the policy on the way to the goal, there is a sequence t_0, ..., t_n of sets of atoms in P of size at most k such that the optimal plans for t_i and the optimal plans for f_i coincide.
Proof.
Let f_0, ..., f_n be the sequence of feature valuations generated by the policy on the way to the goal in some instance P in Q, and let t_0, ..., t_n be the sequence of sets of atoms that comply with the conditions in the theorem, where the size of t_i is at most k for i = 0, ..., n. We show that t_0, ..., t_n is an admissible chain for P.

First, since any optimal plan for t_0 is an optimal plan for f_0 and the empty plan is an optimal plan for f_0, t_0 is true in the initial state. Second, if ρ is an optimal plan for t_n, ρ is an optimal plan for f_n, and therefore an optimal plan for P as the features separate the goals. Finally, if ρ is an optimal plan for t_i, we must show that there is an action a in P such that ρ followed by a is an optimal plan for t_{i+1}, i < n. Let ρ be an optimal plan for t_i that ends in state s. Since π is Markovian and s is optimal for f_i, there is a transition (s, s') in P with Φ(s') = f_{i+1}. That is, there is an action a such that the plan ρ followed by a reaches f_{i+1}. By Theorem 10, the optimal plans for f_{i+1} are of length i + 1, and thus ρ followed by a is an optimal plan for t_{i+1}. ∎
Example. Consider an instance P in Q_clear where b_1, ..., b_n are the blocks above x initially, from top to bottom. The feature valuations on the way to the goal following the Markovian policy (1) are f_0 = ⟨H = false, n⟩, and f_{2i−1} = ⟨H = true, n − i⟩ and f_{2i} = ⟨H = false, n − i⟩, i = 1, ..., n. The policy is optimal, Markovian, and the features separate the goals. The tuples of atoms in P that capture these valuations as expressed in Theorem 13 are t_{2i−1} = {hold(b_i)} and t_{2i} = {ontable(b_i)} for i = 1, ..., n, and t_0 = {clear(b_1)}. Since these tuples have size 1, the widths w(P) and w(Q_clear) are both equal to 1.
Admissible Chains and Projected Policies
Theorem 13 relates the width of a class Q to the size of the atom tuples in the instances of Q that capture the values of the features Φ, following an optimal policy for Q. We extend this result now by showing that it is often sufficient if the tuples capture the value of the features in some of those trajectories only. The motivation is to explain all the proofs of bounded width for infinite classes of problems that we are aware of in terms of general policies. For this, we start with the notion of feasible chains:
Definition 14 (Feasible chain).
Let t_1, ..., t_m be a chain of tuples of atoms from P. The chain is feasible in problem P if t_1 is true in the initial state, the optimal plans for t_i have length i − 1, and the optimal plans for t_m are all optimal for P.
An admissible chain, as used in the definition of width, is a feasible chain that satisfies an extra condition; namely, that every optimal plan for the tuple t_i in the chain can be extended with a single action into an optimal plan for the tuple t_{i+1}, i = 1, ..., m − 1. In order to account for this key property, we map feasible chains into features and policies as follows:
Definition 15.
Let t: t_1, ..., t_m be a chain of atom tuples from P, and let φ_i denote the boolean state feature that is true in a state s when t_i is true in s and t_j is false for all j > i. The chain t defines a policy π(t) over the features Φ = {φ_1, ..., φ_m} with rules {φ_i} → {¬φ_i, φ_{i+1}}, i = 1, ..., m − 1.
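The features φ_i and the rules of π(t) can be derived mechanically from a chain. A small sketch, with tuples given as sets of atoms (the encoding is ours):

```python
def phi_true(state, chain):
    """Index i of the feature phi_i that holds in the state: the largest i
    such that t_i is true (hence t_j is false for all j > i); None otherwise."""
    for i in range(len(chain), 0, -1):   # scan the tuples from t_m down to t_1
        if chain[i - 1] <= state:        # t_i is a subset of the state's atoms
            return i
    return None

def chain_policy(chain):
    """Rules of pi(t): from phi_i move to phi_{i+1}, i = 1, ..., m-1."""
    return [(i, i + 1) for i in range(1, len(chain))]
```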
The first result gives necessary and sufficient conditions for a chain to be admissible.
Theorem 16.
Let t: t_1, ..., t_m be a chain of tuples for a problem P. The chain t is admissible in P if and only if t is feasible and the policy π(t) solves P optimally and is Markovian.
Proof.
Let s_0 be the initial state of problem P. As always, the length len(τ) of a (state) trajectory τ = s_0, ..., s_n is n, the number of transitions in it. We first establish some claims:

C1. If t is admissible, there is no optimal trajectory for t_i that also reaches t_j, for j > i.
Proof. Any optimal trajectory for t_i has length i − 1 and can be extended with j − i transitions into an optimal trajectory for t_j, which thus has length j − 1 > i − 1. Hence, no optimal trajectory for t_i can reach t_j, j > i.

C2. If t is admissible, the optimal trajectories for φ_i have length i − 1, like the optimal trajectories for t_i, i = 1, ..., m.
Proof. Let τ be an optimal trajectory for t_i, and let σ be an optimal trajectory for φ_i. By definition, len(σ) ≥ len(τ) as φ_i implies t_i; by C1, τ reaches φ_i and hence len(σ) ≤ len(τ); and by admissibility of t, len(τ) = i − 1.

C3. If t is feasible, π(t) is optimal and Markovian for P, and σ is a trajectory for t_i, there is a trajectory for φ_m of length at most len(σ) + m − i, i = 1, ..., m.
Proof. The last state of σ makes t_i true, and thus makes some feature φ_j true for j ≥ i. Then, there is an optimal trajectory for φ_j of length at most len(σ). Since π(t) solves P, its goal-reaching trajectories traverse the features φ_1, ..., φ_m in order, and contain π(t)-transitions from φ_j to φ_{j+1}. Using the Markovian property, an optimal trajectory for φ_j can thus be extended with a π(t)-transition that reaches φ_{j+1}. Repeating the argument, we find a trajectory for φ_m of length at most len(σ) + m − j ≤ len(σ) + m − i.

C4. If t is feasible, π(t) is optimal and Markovian for P, and σ is an optimal trajectory for φ_i, then len(σ) = i − 1, i = 1, ..., m.
Proof. Let σ be an optimal trajectory for φ_i. By C3, there is a trajectory for φ_m of length at most len(σ) + m − i. By feasibility of t, the length of any optimal trajectory for t_m, and hence for φ_m, is exactly m − 1. Therefore, len(σ) + m − i ≥ m − 1, which implies len(σ) ≥ i − 1. On the other hand, any goal-reaching trajectory induced by π(t) contains a subtrajectory of length i − 1 that reaches φ_i, and thus len(σ) ≤ i − 1.

C5. If t is feasible, π(t) is optimal and Markovian for P, and σ is an optimal trajectory for φ_i, σ is also optimal for t_i, i = 1, ..., m.
Proof. Since φ_i implies t_i, it suffices to show that the optimal trajectories for t_i have length i − 1 and reach φ_i. Let σ' be an optimal trajectory for t_i, so that len(σ') ≤ i − 1 by C4; its last state makes some feature φ_j true for j ≥ i. If j > i, by C3 we can find a trajectory for φ_m of length at most len(σ') + m − j ≤ i − 1 + m − j < m − 1, which is impossible since, by feasibility, no trajectory can reach t_m (or φ_m) in fewer than m − 1 steps. Therefore j = i, σ' reaches φ_i, and len(σ') ≥ i − 1 by C4. Hence, the optimal trajectories for t_i and for φ_i coincide.

Forward implication. Let us assume that t is admissible for P. Then, by definition, t is feasible. Let us now consider a maximal π(t)-trajectory τ in P. Since the initial state makes φ_1 true, by admissibility and claims C1 and C2, the trajectory is of length m − 1 and ends in a state that makes φ_m true. In particular, τ is an optimal trajectory for t_m and thus it is an optimal goal-reaching trajectory for P. This shows that π(t) solves P optimally. Finally, to see that π(t) is Markovian for P, let (s, s') be a π(t)-reachable transition and let s̃ be a state closest to s_0 with Φ(s̃) = Φ(s). On one hand, by definition, there is one and only one policy rule {φ_i} → {¬φ_i, φ_{i+1}} in π(t) that is compatible with (s, s'), and thus the states s and s' make true the features φ_i and φ_{i+1} respectively. On the other hand, by C2, s̃ is a closest state to s_0 that makes t_i true. Hence, by admissibility, the trajectory that leads to s̃ can be extended with one transition into an optimal trajectory for t_{i+1}. Therefore, by C1, the resulting state s̃' makes φ_{i+1} true, and the transition (s̃, s̃') is compatible with π(t). Hence, π(t) is Markovian.

Backward implication. Let us assume that t is feasible, and that π(t) is optimal and Markovian for P. We need to show that t is an admissible chain; i.e., 1) t_1 is true in the initial state of P, 2) any optimal plan for t_i can be extended into an optimal plan for t_{i+1} by adding a single action, and 3) any optimal plan for t_m is an optimal plan for P. Conditions 1 and 3 follow directly from the definition of feasible chains. For 2, let σ be an optimal trajectory that reaches t_i; its length is i − 1 by C4 and C5. We want to show that σ can be extended with a transition into an optimal trajectory for t_{i+1}. By C5, σ is an optimal trajectory for φ_i, and thus, by the Markovian property, there is a transition (s, s') from the last state s of σ that is compatible with π(t). Therefore, the state s' makes φ_{i+1}, and hence t_{i+1}, true. The extended trajectory has length i and is thus optimal for t_{i+1} by C4 and C5. ∎
This result connects admissible chains with policies that are optimal and Markovian, but it does not connect admissible chains with general policies. This is done next. We first define when a policy π' can be regarded as a projection of another policy π:
Definition 17 (Projection).
Let π be a policy over a class Q and let π' be a policy for a problem P in Q. The policy π' is a projection of π in P if every maximal state trajectory compatible with π' in P is a maximal state trajectory in P compatible with π.
Notice that it is not enough for the state trajectories compatible with π' to be state trajectories compatible with π; it is also required that if a state is a final state in the first trajectory, it is also a final state in the second one. This rules out the possibility that there is a continuation of the first trajectory that is compatible with π but not with π'. A result of this is that if π is optimal for P, the projected policy π' must be optimal for P. From this, the main theorem of this section follows:
Theorem 18.
Let π be an optimal policy for a class Q of problems. If for any problem P in Q, there is a feasible chain t of size at most k such that π(t) is a projection of π in P that is Markovian, then w(Q) ≤ k.
Proof.
Theorem 16 implies that if t is feasible and π(t) is Markovian and a projection of an optimal policy π in P, then t is admissible, and hence that w(P) is bounded by the size of t (i.e., the max size of a tuple in t). If this size is bounded by k for all P in Q, w(Q) is bounded by k. ∎
While the proof of this theorem is a direct consequence of Theorem 16, the result is important as it renders explicit the logic underlying all the proofs of bounded width for meaningful classes of problems that we are aware of. In all such cases, the proofs have been constructed by finding feasible chains t with tuples that are a function of the instance P, and whose role is to capture suitable projections of some general optimal policy π.
Example. Let Q be a grid navigation domain where an agent at some initial cell has to move to a target cell, with atoms of the form X = i and Y = j that specify the coordinates of the agent along the two axes. An optimal policy for Q can be obtained with the singleton Φ = {d}, where the linear feature d measures the distance to the target cell, d = 0 identifies the goal states, and there is a single policy rule {d > 0} → {d↓}. Let P be a problem in Q where the agent is initially located at the cell (x_0, y_0) and the goal cell is (x*, y*), which for simplicity satisfies x_0 ≤ x* and y_0 ≤ y*. Further, let t be the chain of tuples t_i where t_i = {X = x_0 + i, Y = y_0} for 0 ≤ i ≤ x* − x_0, and t_i = {X = x*, Y = y_0 + i − (x* − x_0)} for x* − x_0 < i ≤ (x* − x_0) + (y* − y_0). Then, since t is feasible, and π(t) is a projection of the optimal policy π while being Markovian, it follows from Theorem 18 that w(P) ≤ 2. Notice that the tuples in t do not capture the values of the distance feature d in all the trajectories that are compatible with the policy π in P (an exponential number of such trajectories), but in one such trajectory only; namely, the one that is compatible with the projection π(t) of π, with states s_i where the tuple t_i is true iff the feature φ_i used in π(t) is true.
Example. In the Delivery (Q_D) domain, an agent moves in a grid to pick up packages and deliver them to a target cell, one by one; Delivery-1 (Q_D1) is the version with one package. A general policy for Q_D and Q_D1 is given in terms of four rules that capture: move to the (nearest) package, pick it up, move to the target cell, and drop the package, in a cycle, until no more packages are left. The rules can be expressed in terms of the features Φ = {H, p, t, n} that express holding, distance to the nearest package (zero if the agent is holding a package or no package to be delivered remains), distance to the target cell, and the number of undelivered packages, respectively. Hence, n = 0 identifies the goal states for the problems in the classes Q_D and Q_D1. The rules that define the policy are {¬H, p > 0} → {p↓, t?}, {¬H, p = 0, n > 0} → {H}, {H, t > 0} → {t↓}, and {H, t = 0} → {¬H, n↓, p?}.[4]

[4] A different formulation involves packages that need to be delivered to target cells that depend on the package. The set of features is the same except that t is defined as the distance to the current target cell (zero if the agent holds nothing or there are no more packages to deliver). A policy for this formulation has the rules {¬H, p > 0} → {p↓}, {¬H, p = 0, n > 0} → {H, t?}, {H, t > 0} → {t↓}, and {H, t = 0} → {¬H, n↓, p?}.
Let us consider a problem P in the class Q_D1. If the encoding of P contains atoms like at(c), atp(c), hold, and adjacent(c, c'), it can be shown that P has width 2. Indeed, without loss of generality, let us assume that in P, the package is initially at cell c_1 and has to be delivered at cell c_2, and the agent is initially at cell c_0, and let t be the chain made of tuples of the form {at(c)} for the cells c on a shortest path from c_0 to c_1, followed by tuples of the form {at(c), hold}, with c now ranging over a shortest path from c_1 up to c_2, and a last tuple of the form {atp(c_2)}. It is easy to show that the chain t is feasible, and π(t) is the projection of the general policy π that in P is both optimal and Markovian (although not in Q_D). Then, by Theorems 16 and 18, it follows that the chain is admissible, and w(Q_D1) = 2.
Serialized Width
Theorem 18 suggests that bounded width is most often the result of general policies that can be projected on suitable tuples of atoms. We extend now these relations to the much larger and more interesting class of problems that have bounded serialized width, where the general policies do not have to be optimal or Markovian. The notion of serialized width is based on the decomposition of problems into subproblems, and explicates and generalizes the logic and scope of the Serialized IW (SIW) algorithm of Lipovetzky and Geffner (2012). There are indeed many domains with large or unbounded widths that are simple and admit simple policies, like the Delivery domain above.
In SIW, problems are serialized by using one feature, a goal counter denoted by #g, that tracks the number of unachieved goal atoms, assuming that goals are conjunctions of atoms. Other serializations have appealed to other counters and also to heuristics (Lipovetzky and Geffner 2017a).[5]

[5] These serializations appear in state-of-the-art algorithms like BFWS that use novelty measures inside complete best-first procedures.
We take a more general and abstract approach that replaces the goal counter by a strict partial order '≺' (an irreflexive and transitive binary relation) over the feature valuations, also referred to as Φ-tuples. A serialization for a class Q of problems is defined as follows:
Definition 19 (Serializations).
Let Q be a class of problems, let Φ be a set of goal-separating features for Q, and let '≺' be a strict partial order over Φ-tuples. The pair (Φ, ≺) is a serialization over Q if 1) the ordering '≺' is well-founded; i.e., there is no infinite descending chain f_1 ≻ f_2 ≻ ⋯, where f ≻ f' stands for f' ≺ f, and 2) the goal feature valuations are ≺-minimal; i.e., no feature valuation f satisfies f ≺ g for a goal valuation g.
A serialization (Φ, ≺) over Q decomposes each problem P in Q into subproblems, which define the width of the serialization or serialized width:
Definition 20 (Subproblems).
Let (Φ, ≺) be a serialization. The subproblem P[s] is the problem of finding a state s' reachable from s such that s' is a goal in P or Φ(s') ≺ Φ(s); i.e., the goal states in P[s] are the goal states in P and the states s' such that Φ(s') ≺ Φ(s). The collection Sub(P) of subproblems for problem P is the smallest subset of problems that comply with:

1. if the initial state s_0 in P is not a goal, P[s_0] is in Sub(P),
2. if P[s] is in Sub(P) and s' is a non-goal state in P that is at shortest distance from s such that Φ(s') ≺ Φ(s), and no goal state of P is strictly closer from s than s', then P[s'] is in Sub(P).
Definition 21 (Serialized width).
Let (Φ, ≺) be a serialization for a collection Q of problems. The serialized width of problem P relative to (Φ, ≺) is k, written as w_Φ(P) = k with the ordering '≺' left implicit, if there is a subproblem in Sub(P) that has width k, and every other subproblem in Sub(P) has width at most k. The serialized width for Q is k, written as w_Φ(Q) = k, if w_Φ(P) = k for some problem P in Q, and w_Φ(P') ≤ k for every other problem P' in Q.
If a class Q of problems has bounded serialized width and the ordering '≺' can be tested in polynomial time, each problem in Q can be solved in polynomial time using a variant of the SIW algorithm, called SIW_Φ: starting at the initial state s = s_0, SIW_Φ performs an IW search from s to find a state s' that is a goal state or that renders the precedence constraint Φ(s') ≺ Φ(s) true. If s' is not a goal state, s is set to s', and the loop repeats until a goal state is reached. The result below follows from the observations by Lipovetzky and Geffner (2012) once the goal counter is replaced by a partial order, and the notion of serialized width is suitably formalized:
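SIW_Φ itself is a thin loop around IW, with one IW call per subproblem P[s]. A sketch, reusing the iw procedure above and assuming an apply_action helper that executes an action in a state (both the helper and the names are our assumptions):

```python
def siw_phi(init, succ, is_goal, phi, precedes, num_atoms, apply_action):
    """Solve P as a sequence of IW searches over the subproblems P[s]."""
    s, plan = init, []
    while not is_goal(s):
        f = phi(s)
        # goals of the subproblem P[s]: goals of P, or <-smaller valuations
        subgoal = lambda s2, f=f: is_goal(s2) or precedes(phi(s2), f)
        subplan = iw(s, succ, subgoal, num_atoms)
        if subplan is None:
            return None                  # the serialization got stuck
        plan += subplan
        for action in subplan:           # advance to the state found by IW
            s = apply_action(s, action)
    return plan
```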
Theorem 22.
Let Q be a collection of problems and let (Φ, ≺) be a serialization for Q with width w_Φ(Q) = k. Any problem P in Q is solved by SIW_Φ in time and space bounded by O(V b N^{2k−1} τ) and O(b N^k), respectively, where b bounds the branching factor in P, N is the number of atoms in P, V bounds the number of Φ-tuples that arise from the states in P, and τ bounds the time to test the order '≺' for such Φ-tuples.
Proof.
We show something more general. If P is a problem in Q, SIW_Φ finds a plan for P and a state sequence s_0, s_1, ..., s_n such that:

1. the state s_0 is the initial state in P,
2. the state s_{i+1}, reachable from the state s_i, can be found by IW on the problem P[s_i], for i = 0, ..., n − 1,
3. Φ(s_{i+1}) ≺ Φ(s_i), for i = 0, ..., n − 1,
4. s_n is a goal state, and
5. n is bounded by the number V of Φ-tuples in P.

SIW_Φ achieves this in time and space bounded by O(V b N^{2k−1} τ) and O(b N^k), respectively.

By definition, SIW_Φ calls IW iteratively to find such a sequence. In the first call, over the subproblem P[s_0], IW finds a state s_1 such that Φ(s_1) ≺ Φ(s_0). Then, iteratively, after s_i is found, IW is called to solve the subproblem P[s_i]. The process continues until IW finds a goal state s_n.

Let us assume that such a sequence is found by SIW_Φ. A run of IW involves runs of IW(i) for i = 0, 1, ... until the problem is solved (guaranteed to succeed before the call to IW(k+1) since w_Φ(Q) = k). By Theorem 2 and the assumption on the cost of testing '≺', each call to IW requires time and space bounded by O(b N^{2k−1} τ) and O(b N^k), respectively. The plan found by IW that maps s_i into s_{i+1} has length bounded by the number of expanded nodes, O(N^k). All these plans are concatenated into a single plan that solves P. Since the number of Φ-tuples is bounded by V, SIW_Φ invests total time and space bounded by O(V b N^{2k−1} τ) and O(b N^k), respectively, and n ≤ V.

We now show that the state sequence is indeed found. The first requirement is clear since s_0 is the initial state in P. We reason inductively to show that the subproblem P[s_i] belongs to the collection Sub(P) of subproblems, and that the state s_{i+1} is at minimum distance from s_i such that s_{i+1} is a goal or Φ(s_{i+1}) ≺ Φ(s_i).

For the base case, P[s_0] belongs to Sub(P) by definition of Sub(P) when s_0 is not a goal state; if s_0 is a goal state, the sequence with the single state s_0 satisfies the requirements. Since w_Φ(Q) = k, IW is guaranteed to find a state s_1 at minimum distance from s_0 that is a goal or such that Φ(s_1) ≺ Φ(s_0). If s_1 is a goal state, the sequence s_0, s_1 satisfies the requirements. Otherwise, by definition of Sub(P), P[s_1] belongs to Sub(P). Assume now that s_i is not a goal state and P[s_i] belongs to Sub(P). A similar argument shows that IW finds a state s_{i+1} at minimum distance from s_i that is a goal or such that Φ(s_{i+1}) ≺ Φ(s_i). If s_{i+1} is a goal state, SIW_Φ has found the required state sequence. Otherwise, the problem P[s_{i+1}] belongs to Sub(P). Finally, since the number of Φ-tuples is finite and the order '≺' is well-founded, this process terminates with a goal state s_n. ∎
The class Q of instances of the problem VisitAll, where all cells in a grid must be visited, has serialized width w_Φ(Q) = 1 for Φ = {#g}, where #g counts the number of unvisited cells (unachieved goals), with the obvious ordering ('≺' set to '<'). The class Q of Blocksworld instances cannot be serialized into subproblems of bounded width with Φ = {#g}, because achieved goals may have to be undone, yet w_Φ(Q) = 2 with Φ = {#m}, where #m counts the number of misplaced blocks (i.e., blocks not on their targets or above one such block). The result above implies that SIW_Φ solves these problems in polynomial time.
From General Policies to Serializations
We show next how serializations can be inferred from general policies, a result that paves the way to the introduction of a convenient language for defining serializations. Given a general policy π that solves a class Q of problems with features Φ that separate the goals, the intuition is to consider serializations where f' ≺ f if there is a state transition (s, s') compatible with the policy in some instance in Q with Φ(s) = f and Φ(s') = f'. The problem with this ordering, however, is that it does not necessarily define a strict partial order as required, as it is possible that there is another transition compatible with the policy in another instance where Φ(s) = f' and Φ(s') = f. In other words, even if the policy orders the feature valuations in a strict partial order in an instance P, it is possible that the orderings over two or more instances in Q cannot be combined into a single strict partial order.
Thus, in order to obtain a serialization over a class Q of problems from a policy π, certain additional restrictions are required. For this, we focus on the policies that solve Q structurally; i.e., by virtue of the effects of the actions on the features as expressed in the policy rules in π. A policy solves a class of problems structurally when this can be verified from the form of the policy rules without having to consider the instances in Q. Fortunately, we do not have to formulate the conditions under which a policy solves a class of problems structurally from scratch; we can use ideas developed in the context of QNPs (Srivastava et al. 2011; Bonet and Geffner 2020).
The key idea is the notion of a terminating policy graph. Recall that every feature valuation f defines a projected boolean valuation b(f) over the expressions p and n = 0 for the boolean and numerical features p and n in Φ, where p and n = 0 are true in b(f) iff they are true in f, respectively. The policy graph for a policy π is:
Definition 23 (Policy graph).
The policy graph G(π) for policy π has nodes b, one for each of the boolean feature valuations over Φ, and edges (b, b') labeled with E if b is not a goal valuation and (b, b') is compatible with a rule C → E in the policy π.
The edge (b, b') is compatible with a rule C → E if C is true in b, and b' is compatible with E; namely, if p (resp. ¬p) is in E, p (resp. ¬p) must be true in b', and if n↑ is in E, n = 0 must be false in b'. Effects p?, n?, and n↓ in E put no constraints on b'. In particular, in the latter case, n = 0 can be either true or false in b', meaning that after a decrement, a numerical feature may remain positive or have value zero. Notice that the policy graph associated with a policy does not take the class Q of problems into account. Indeed, the policy graph may have an edge (b, b') even if there is no state transition (s, s') in an instance in Q where this transition between boolean feature valuations arises. The policy graph and the notion of termination below are purely structural, as they depend on the form of the policy rules only:
Definition 24 (Termination).
A policy π and its policy graph G(π) are terminating if for every cycle in the graph G(π), i.e., any sequence of edges (b_i, b_{i+1}) with labels E_i that starts and ends in the same node, there is a numerical feature n in Φ that is decreased along some edge and increased in none. That is, n↓ is in E_i for some i-th edge in the cycle, and neither n↑ nor n? is in E_j for all the other edges in the cycle.
The termination condition can be checked in time that is polynomial in the size of the policy graph by a procedure called Sieve that looks at the strongly connected components of the graph (Srivastava et al. 2011; Bonet and Geffner 2020).[6] A key property of terminating policies is that they give rise to state trajectories that are finite.

[6] See Bonet and Geffner (2020) for more details on the formal properties of termination in QNPs. Our setting is slightly different as our numerical features are linear and cannot grow without bound as in QNPs, but we do not use this property to define termination.
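A Sieve-style termination test can be sketched as follows: repeatedly compute the strongly connected components (SCCs) of the policy graph, remove the edges that decrement some numerical feature which no edge in the same component increments or makes uncertain, and report termination when no edge remains inside any SCC. This is our reconstruction from the description above, not the original procedure:

```python
def terminates(nodes, edges):
    """edges: list of (src, dst, dec, inc), with dec/inc the sets of numerical
    features decremented / possibly incremented (n up or n?) by the edge."""
    edges = list(edges)
    while True:
        comp = scc(nodes, edges)
        keep = []
        for (u, v, dec, inc) in edges:
            if comp[u] != comp[v]:
                keep.append((u, v, dec, inc))  # edge lies on no cycle
                continue
            same = [e for e in edges if comp[e[0]] == comp[u] == comp[e[1]]]
            # drop the edge if it decrements a feature never incremented in its SCC
            if not any(all(n not in e[3] for e in same) for n in dec):
                keep.append((u, v, dec, inc))
        if len(keep) == len(edges):
            break
        edges = keep
    comp = scc(nodes, edges)
    return all(comp[u] != comp[v] for (u, v, _, _) in edges)

def scc(nodes, edges):
    """Kosaraju's algorithm: map each node to a component id."""
    adj = {n: [] for n in nodes}
    radj = {n: [] for n in nodes}
    for u, v, _, _ in edges:
        adj[u].append(v)
        radj[v].append(u)

    def dfs(n, graph, out, seen):
        stack = [n]
        seen.add(n)
        while stack:
            m = stack[-1]
            nxt = next((x for x in graph[m] if x not in seen), None)
            if nxt is None:
                out.append(stack.pop())   # emit node in post-order
            else:
                seen.add(nxt)
                stack.append(nxt)

    order, seen = [], set()
    for n in nodes:
        if n not in seen:
            dfs(n, adj, order, seen)
    comp, seen = {}, set()
    for i, n in enumerate(reversed(order)):
        if n not in seen:
            group = []
            dfs(n, radj, group, seen)
            for m in group:
                comp[m] = i
    return comp
```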
Theorem 25.
Let π be a terminating policy with features Φ over Q that separate the goals. Then, π cannot give rise to infinite state trajectories in the instances in Q.
Proof.
For a proof by contradiction, let us assume that there is an infinite state trajectory s_0, s_1, ... in some instance P in Q that is compatible with π. Since the number of states in P is finite, there must be indices i < j such that s_i = s_j. The subsequence s_i, s_{i+1}, ..., s_j defines a cycle in the policy graph for π. The termination condition implies that there is a numerical feature n in Φ that is decreased by some edge in the cycle but not increased by any edge in the cycle. However, this is impossible since s_i = s_j implies that the feature n has the same value in both states. Therefore, such an infinite state trajectory cannot exist. ∎
Since terminating policies induce state trajectories with a final state, a sufficient condition for a terminating policy π to solve a class Q is for π to be closed:
Definition 26 (Closed policy).
A policy π is closed over a class Q of problems if for any non-goal state s that is π-reachable in a problem P in Q, there is a transition (s, s') in P that is compatible with π.
Theorem 27.
If π is closed over Q, the features in Φ separate goals in Q, and π is terminating, then π solves Q.
Proof.
If π does not solve an instance P in Q, there is a maximal trajectory in P that is compatible with π and is not goal reaching. This trajectory is either infinite or terminates at a state s where no transition compatible with π exists in P. The former is impossible by Theorem 25, and the latter is impossible since π is closed. ∎
We are interested in the conditions under which general policies determine serializations, and in particular, serializations with bounded width. For the first part, we do not need the policy to be closed or to solve Q, but just to be terminating:
Theorem 28.
A terminating policy π with features Φ over Q that separate goals determines a serialization (Φ, ≺) of Q where '≺' is the minimal strict partial order (i.e., the transitive closure) that satisfies f' ≺ f, for a non-goal valuation f, if the pair (f, f') is compatible with a policy rule C → E in π.
Here a pair (f, f') of feature valuations is compatible with a rule C → E if C is true in f and the change in values from f to f' is compatible with E. Notice that if a state transition (s, s') is compatible with π, the pair of feature valuations (Φ(s), Φ(s')) is compatible with some rule, but the existence of such a state transition is not a requirement for a pair (f, f') to be compatible with a policy rule; f and f' can be arbitrary feature valuations over Φ.
Proof.
We must show that '≺' is irreflexive and well-founded. For the first, assume that there is a sequence of feature valuations f_1, f_2, ..., f_k with f_{i+1} ≺ f_i and f_1 = f_k. Without loss of generality, we may assume that each pair (f_i, f_{i+1}) is compatible with a policy rule (i.e., is not the result of the transitive closure). In such a case, the boolean valuations associated with the feature valuations must form a cycle in the policy graph, and therefore, from the termination condition, some numerical feature must be decreased and not increased in the cycle, which contradicts f_1 = f_k.

For well-foundedness, let us assume that there is an infinite chain f_1 ≻ f_2 ≻ ⋯, where f ≻ f' stands for f' ≺ f, and let us assume, without loss of generality, that each pair (f_i, f_{i+1}) satisfies a policy rule. Consider the sequence of boolean valuations b(f_1), b(f_2), ... associated with the feature valuations, let b be a boolean valuation that appears infinitely often in the sequence, and let ε be the set of edges in the policy graph that are traversed infinitely often in the sequence. Then, there is an infinite subsequence of indices i_1 < i_2 < ⋯ such that b(f_{i_j}) = b and all the edges traversed between f_{i_j} and f_{i_{j+1}} belong to ε, for j ≥ 1. By the definition of termination, there is a numerical feature n that is decremented by some edge in ε but not incremented by any edge in ε. Yet this is impossible, since the numerical feature n is integer valued and non-negative, and from some point on in the chain, it is decremented infinitely often and never incremented. The contradiction comes from supposing the existence of the infinite chain f_1 ≻ f_2 ≻ ⋯. Therefore, '≺' is a well-founded strict partial order. ∎
There are simple syntactic conditions that ensure that a set of policy rules and the graph that they define are terminating (Bonet and Geffner 2020). For example: a) each policy rule decreases some feature and increases none, b) there is an ordering n_1, ..., n_k of the numerical features such that each policy rule decreases a feature n_i while possibly increasing features n_j for j > i, and c) a further generalization where b) holds except in some policy rules that affect boolean features only, which cannot produce a cycle of truth valuations over the boolean features without involving a policy rule that affects some numerical variable. If the set of policy rules complies with the conditions a) or b), the rules are said to be regular; if they comply with c), the rules are said to be weakly regular. In either case, these are sufficient syntactic conditions that ensure termination; unlike the conditions captured by Definition 24, they are not necessary.
Example. The policy π for Boxes, with rules {m > 0 ↦ m↓} and {m = 0, b > 0 ↦ b↓, m?} and goal b = 0, where b counts the remaining boxes and m the number of marbles in a box with the least number of marbles, is closed and regular. It is closed since for any state s where b > 0, there is a transition (s, s') that is compatible with π: if m = 0, s' results from an action that puts an empty box away, while if m > 0, s' results from an action that puts a marble away from a box with the least number of marbles. Theorem 27 implies that π solves the class.
If a terminating policy for the class Q defines a serialization over Q, a terminating policy that solves Q should define a serialization over Q of zero width. Two conditions are needed for this, though. The first is the notion of goal connectedness:
Definition 29 (Goal connected).
A policy π for Q with goal-separating features and its policy graph are said to be goal connected when all the nodes associated with the initial states of instances in Q are connected only to nodes that are connected with goal nodes.
Clearly, a policy π for Q is not closed if its policy graph is not goal connected, but goal connectedness does not imply that the policy is closed. For this, we need a condition that goes beyond the structure of the policy graph (the notion of soundness below is similar to action soundness in QNPs when QNPs are used to abstract classes of problems; Bonet and Geffner 2018; Bonet, Frances, and Geffner 2019):
Definition 30 (Sound policy).
A policy π over Q is sound if for any reachable non-goal state s of an instance P in Q where a policy rule C ↦ E is applicable (i.e., where C holds in f(s)), there is a transition (s, s') in P that is compatible with π.
Soundness and goal connectedness imply that a policy is closed, and both are critical for establishing the conditions under which the serialization induced by a terminating policy has zero width:
Theorem 31.
If π is sound and goal connected in Q, then π is closed in Q.
From Theorem 27, it follows that a terminating policy that is sound and goal connected in Q solves Q. In such a case, we say that the policy solves the class of problems Q structurally, as two of the conditions, termination and goal connectedness, can be tested on the policy graph. Soundness, on the other hand, is the condition that ensures that the only sink nodes in the policy graph over any instance are the goal nodes.
Proof.
If π is not closed on some instance P in Q, there is a non-goal state s in P that is reachable from the initial state s_0 using π but for which there is no transition (s, s') compatible with π. The node b(f(s_0)) is connected in the policy graph to the node b(f(s)) which, from the definition of goal connectedness, must itself be connected to a goal node. Since the features separate goals and s is not a goal state, b(f(s)) is not a goal node, so there is an edge leaving b(f(s)) in the policy graph, and therefore some policy rule C ↦ E that is applicable at s. Soundness of π then implies that there is a transition (s, s') in P that is compatible with π, a contradiction. ∎
As expected, if a terminating policy π solves Q, the width of Q under the serialization determined by the policy (Theorem 28) is zero, provided, however, that certain structural conditions hold:
Theorem 32.
Let π be a policy that solves Q structurally, and let (Q, ≺) be the serialization over Q determined by π. If π is sound and goal connected, then w_≺(Q) = 0.
Proof.
By the definition of serialized width, for any problem P in Q we need to show that the width of any subproblem P[s] in the serialization (Q, ≺) is zero. For this, it is enough to show that for any such state s there is a transition (s, s') that is compatible with π: then, by definition, f(s') ≺ f(s), and P[s] has width equal to zero.
If s is in the subproblem P[s_0] for the initial state s_0 of P, then s is not a goal state and there is some transition (s, s') compatible with π, since π solves Q. Let us now consider the subproblem P[s] for s ≠ s_0. By definition, there is a subproblem P[t] in P such that s is a non-goal state at shortest distance from t with f(s) ≺ f(t). By the definition of ≺ and reasoning inductively, there is a path in the policy graph for P from the boolean feature valuation b(f(s_0)) to b(f(s)). Since the features in Φ separate goals in Q and π is goal connected, there is a policy rule that is applicable at s. Thus, by soundness, there must be a transition (s, s') in P that is compatible with π. ∎
Example. The policy π for Boxes in the previous example is closed, goal connected, and sound. By Theorem 32, it determines a serialization of width zero, where (b', m') ≺ (b, m) iff b' < b, or b' = b and m' < m.
Example. In Delivery, we use the features Φ = {H, p, t, n}, described in the example below, and the policy π defined by the rules {¬H, p > 0 ↦ p↓}, {¬H, p = 0 ↦ H, t?}, {H, t > 0 ↦ t↓}, and {H, t = 0 ↦ ¬H, n↓, p?}. It is easy to check that π is sound for the class Q_D of Delivery problems and the class Q_D1 of Delivery problems with a single package, since for any reachable non-goal state s in a problem P there is a transition (s, s') that is compatible with π. On the other hand, the policy graph for π, depicted in Fig. 1, is clearly terminating and goal connected. Since Φ separates goals for the classes Q_D and Q_D1, by Theorem 27, π solves both classes structurally, and the induced serialization has width zero for both classes by Theorem 32.
Sketches: A Language for Serializations
The notion of serialization plays a key role in the connection between general policies and serialized width, and since width is a special case of serialized width (w(Q) = w_≺(Q) for the empty serialization '≺', i.e., the one for which f' ≺ f is always false), serializations emerge as a central concept in the study. Indeed, serializations can be used to reduce the width of a problem, and if this width is reduced to zero over Q, a general policy for Q results. We focus next on a specific language for specifying serializations in a compact manner. The language can be used either to encode serializations by hand, as a form of domain-specific knowledge for extending the scope of width-based algorithms such as SIW, or for learning serializations from traces. Such uses, however, are beyond the scope of this paper.
A policy sketch, or simply sketch, R_Φ for a class of problems Q is a set of policy rules over features Φ that distinguish the goals of Q. The sketch can be a full-fledged policy over Q, part of one, or just any set of policy rules C ↦ E, including the empty set. By interpreting R_Φ as a policy, we can transfer previous results that involve policy graphs and termination. We call the rules in a sketch R_Φ sketch rules, because their semantics is different from the semantics of policy rules.
Definition 33 (Sketch).
A sketch R_Φ for Q is a set of sketch rules over features Φ that separate goals in Q. The sketch is well formed if the set of rules, interpreted as a policy, is terminating.
Notice that the definition of terminating policies does not require the policy to be closed or even to solve Q. Theorem 28 then directly yields:
Theorem 34.
A well-formed sketch R_Φ for Q defines a serialization (Q, ≺) over Q, where '≺' is the smallest strict partial order that satisfies f' ≺ f if the pair (f, f') of feature valuations is compatible with a sketch rule in R_Φ.
The distinction between policy rules and sketch rules is semantic, not syntactic. A policy π defines a filter on state transitions: a state transition (s, s') is compatible with the policy if the pair (f(s), f(s')) is compatible with one of its rules. A sketch R_Φ, on the other hand, is not to be used in this way: a well-formed sketch defines a serialization, and a sketch rule C ↦ E defines subproblems of the serialization, namely, the subproblem of going from a state s where C holds in f(s) to a state s' with a feature valuation f(s') such that the pair (f(s), f(s')) is compatible with the sketch rule. The key difference is that this subproblem is not solvable in a single step in general. Theorems 25 and 28, about policy rules that are terminating and that induce well-founded serializations, remain valid for sketch rules because, in both cases, the relevant notions, like sketch graphs, are defined structurally, without considering the state transitions that are possible in the target class Q. Another way to see the difference between policies and sketches is that (terminating) policies play two different roles in our analysis: they specify control, i.e., the possible actions to be done in a given instance of Q, and they define serializations. Sketches, on the other hand, play the latter role only.
Sketches provide a language for decomposing a problem into subproblems and thus for reducing its width, a language that goes well beyond goal counters and their variations, as the language of sketches includes the language of general policies.
Example. The serialization given by the single feature #g that counts the number of unachieved (top) goals is captured with the sketch that only contains the rule {#g > 0 ↦ #g↓} when there are no other features, and the rule {#g > 0 ↦ #g↓, p?, q?} when p and q are other features. The rules say that it is "good" to decrease the goal counter, independently of the effects on the other features. In problems such as Blocksworld, this serialization does not work (it has unbounded width), but serializing instead with the single feature m that counts the number of misplaced blocks, with the sketch rule {m > 0 ↦ m↓}, yields a serialization of width 2. A block is misplaced when it is on a wrong block or is above a misplaced block. The sketch thus decomposes any Blocksworld problem into subproblems whose goals are to put a block away or to put it in the right place. The width of the first subproblem is 1, while that of the second is 2.
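The feature m that counts misplaced blocks can be computed directly from the state with the recursion just given. The following is a minimal Python sketch with our own state encoding (each block maps to the block it sits on, or to the table); it is not an implementation from the paper.

```python
# Minimal sketch (our own encoding): a Blocksworld state maps each block to
# what it sits on ('table' or another block); the goal is encoded the same
# way and may be partial. A block is misplaced if the goal places it on
# something else, or if it sits above a misplaced block in its tower.

def misplaced(state, goal):
    def below(x):
        """Blocks strictly below x in its tower, top-down."""
        chain = []
        while state[x] != 'table':
            x = state[x]
            chain.append(x)
        return chain
    bad = {x for x in state if x in goal and state[x] != goal[x]}
    # being 'above a misplaced block' is transitive along the tower,
    # so one pass over the directly misplaced blocks suffices
    bad |= {x for x in state if any(y in bad for y in below(x))}
    return len(bad)

# Tower A on B on C on the table; the goal wants A on the table instead
state = {'A': 'B', 'B': 'C', 'C': 'table'}
goal = {'A': 'table', 'B': 'C', 'C': 'table'}
print(misplaced(state, goal))  # -> 1 (only A is misplaced)
```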
Our last results are about the width of the serializations defined by sketches, and about the modifications in the SIW_Φ algorithm that are needed for it to work with sketches:
Definition 35 (Sketch width).
Let R_Φ be a well-formed sketch for a class of problems Q such that Φ separates the goals, and let s be a reachable state in some instance P of Q. The width of the sketch at state s of problem P, w_R(P[s]), is the width of the subproblem P[s] that is like P but with initial state s and goal states s' such that s' is a goal state of P, or the pair (f(s), f(s')) is compatible with a sketch rule in R_Φ. The width of the sketch, w_R(Q), is the maximum width w_R(P[s]) over the reachable states s in the problems P in Q.
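In symbols, with P[s; G] standing for the problem that is like P but with initial state s and goal states G (our rendering of the notation):

```latex
G_R(s) = \{\, s' \,:\, s' \text{ is a goal state of } P \text{, or }
          (f(s), f(s')) \text{ is compatible with some rule in } R_\Phi \,\}
\\[4pt]
w_R(P[s]) = w\bigl(P[s;\, G_R(s)]\bigr),
\qquad
w_R(\mathcal{Q}) = \max_{P \in \mathcal{Q}} \;
                   \max_{s \text{ reachable in } P} \; w_R(P[s]).
```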
Theorem 36.
Let R_Φ be a well-formed sketch for a class of problems Q, and let (Q, ≺) be the serialization determined by R_Φ from Theorem 34. The width of the serialization is bounded by the width of the sketch; i.e., w_≺(Q) ≤ w_R(Q).
Proof.
By definition, w_≺(Q) ≤ k if the width of the subproblems P[s] in the serialization (Q, ≺) is bounded by k, for any P in Q. The goals of the subproblem P[s] are the states s' that are a goal of P, or for which f(s') ≺ f(s). In particular, if the pair (f(s), f(s')) is compatible with a sketch rule, then f(s') ≺ f(s), but the converse does not hold in general. That is, for any subproblem P[s] in (Q, ≺), the goals of the sketch subproblem are also goals of P[s]. Hence, the width of each subproblem P[s] in (Q, ≺) is at most w_R(P[s]), and w_≺(Q) ≤ w_R(Q). ∎
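The argument in one line, using the notation above (our rendering): the sketch subgoals G_R(s) are a subset of the serialization subgoals G_≺(s), and enlarging the set of goal states can only lower the width, so

```latex
G_R(s) \subseteq G_{\prec}(s)
\;\Longrightarrow\;
w\bigl(P[s;\, G_{\prec}(s)]\bigr) \,\le\, w\bigl(P[s;\, G_R(s)]\bigr)
\;\Longrightarrow\;
w_{\prec}(\mathcal{Q}) \,\le\, w_R(\mathcal{Q}).
```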
If a well-formed sketch R_Φ has bounded width for a class of problems Q, then the problems in Q can be solved in polynomial time by the SIW_R algorithm, which is like SIW_Φ, with the difference that the precedence test f(s') ≺ f(s) among pairs of feature valuations is replaced by the test of whether the pair (f(s), f(s')) is compatible with a rule in R_Φ. In other words, SIW_R starts at the state s := s_0, where s_0 is the initial state of P, and then performs an IW search from s to find a state s' that is a goal state or such that the pair (f(s), f(s')) is compatible with a sketch rule in R_Φ. Then, if s' is not a goal state, s is set to s', and the loop repeats until a goal state is reached. The precedence test in SIW_R can be done in constant time, unlike the general test f(s') ≺ f(s) in SIW_Φ. (For testing f(s') ≺ f(s), one needs to check if there is a sequence of feature valuations f_1, …, f_m such that f_1 = f(s), f_m = f(s'), and each pair (f_i, f_{i+1}), 1 ≤ i < m, is compatible with a sketch rule. This test can be done in constant time too, provided that the binary relation '≺' for each instance is precompiled in a boolean hash table with V rows and V columns, where V is the number of feature valuations in P; V is exponential in the number k of features but, for fixed k, polynomial in the number N of atoms in P. Unlike the procedure SIW_R, however, this precompilation is not practical in general.) The efficiency of SIW_R comes at a price: by testing "progress" with the sketch rules directly and not with the serialization that results from such rules, SIW_R does not use the serialization in full, as it ignores the transitive closure of the precedence relation. The runtime properties of SIW_R are thus similar to those of SIW_Φ, as captured in Theorem 22, with precedence tests that can be done in constant time:
Theorem 37.
Let R_Φ be a well-formed sketch for a class of problems Q. If the sketch width is bounded by k, SIW_R solves any problem P in Q in time and space that are polynomial in b and N and exponential only in k, where b and N bound the branching factor and the number of atoms in P, respectively.
Proof.
Like the proof of Theorem 22, but with the test f(s') ≺ f(s) in SIW_Φ replaced by checking whether the pair (f(s), f(s')) of feature valuations is compatible with a rule in R_Φ. ∎
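For concreteness, the SIW_R loop described above can be written compactly. This is a minimal Python sketch under our own encoding; iw_search is a hypothetical helper standing for an IW search from s that returns a closest state satisfying the given subgoal test (or None), and features and compatible are as in the earlier snippets.

```python
# Minimal sketch of the SIW_R loop (our own encoding, not from the paper).
# Assumed (hypothetical) interface:
#   problem.initial_state, problem.is_goal(s)  -- the instance P
#   features(s)                  -- the feature valuation f(s) as a dict
#   compatible((f, f2), rule)    -- the test from the earlier snippet
#   iw_search(problem, s, is_subgoal)  -- IW search from s; returns a
#       closest state satisfying is_subgoal, or None

def siw_r(problem, sketch_rules, features, compatible, iw_search):
    s = problem.initial_state
    while not problem.is_goal(s):
        f = features(s)
        # Subgoal: a goal state of P, or a state s2 such that the pair
        # (f(s), f(s2)) is compatible with some sketch rule.
        def is_subgoal(s2):
            return problem.is_goal(s2) or any(
                compatible((f, features(s2)), rule) for rule in sketch_rules)
        s2 = iw_search(problem, s, is_subgoal)
        if s2 is None:
            return None  # subproblem not solved; sketch width exceeded
        s = s2
    return s
```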
Example. Different and interesting sketches are given in Table 1 for the two classes of problems for Delivery: the class Q_D of problems with an arbitrary number of packages and the class Q_D1 of problems with a single package. In the table, the entries in the columns w_R(Q_D1) and w_R(Q_D) upper bound the width of the different sketches for the two classes of Delivery problems. The entries ∞ and '—' stand respectively for unbounded width and an ill-defined (non-terminating) sketch. The features used are: H (boolean) for holding a package, p for the distance to the nearest package (zero if holding a package or if no package to be delivered remains), t for the distance to the current target cell (zero if holding nothing), and n for the number of packages still to be delivered.
Policy sketch | w_R(Q_D1) | w_R(Q_D)
---|---|---
R_0 = ∅ | 2 | ∞
R_1 = {H ↦ ¬H, t?} | 2 | ∞
R_2 = {¬H ↦ H, p?, t?} | 1 | ∞
R_3 = {¬H ↦ H, p?, t?; H ↦ ¬H, t?} | — | —
R_4 = {n > 0 ↦ n↓, H?, p?, t?} | 2 | 2
R_5 = R_2 ∪ R_4 | 1 | 1
R_6 = {¬H, p > 0 ↦ p↓} | 2 | ∞
R_7 = {H ↦ ¬H, n↓, p?, t?} | 2 | ∞
R_8 = π_D (the four policy rules above) | 0 | 0
We briefly explain the entries in the table without providing formal proofs (such proofs can be obtained with Theorem 18). R_0 is the empty sketch, whose width is the same as the plain width, 2 for Q_D1 and unbounded for Q_D, as no problem is decomposed into subproblems. The rule in R_1 does not help in initial states that do not satisfy H, and hurts a bit in states that do satisfy H. Indeed, in the latter states s, there is a state s' one step ahead of s with ¬H (obtained from s by dropping the package) that gets chosen by any algorithm like SIW_R. However, in the resulting subproblem with initial state s' there is no state s'' for which f(s'') ≺ f(s') holds, as there is no sketch rule whose condition is true in f(s'). As a result, the goal of the subproblem rooted at s' is the true goal of the problem, which may actually involve going back from s' to s and from s to the goal. This is indeed what SIW_R does given R_1. Since this subproblem has width 2, the serialized width of Q_D1 remains 2. For R_2, the rule says that a state where ¬H holds can be "improved" by finding a state where H holds, while possibly affecting p, t, or both. Hence, any problem in Q_D1 is split into two subproblems: achieve H first and then the goal, reducing the sketch width of Q_D1 to 1 but not the sketch width of Q_D. The sketch R_3 is not well formed as it is not terminating: indeed, the resulting ordering is not a strict partial order, as a state where the agent is at the same location as a package but does not hold it is "improved" into a state obtained by picking up the package, which is "improved" again by dropping the package, a cycle along which no numerical feature is decreased and not increased. For R_4, which subgoals on the top goal captured by the feature n that counts the number of undelivered packages, the sketch width of Q_D reduces to 2, while that of Q_D1 remains unchanged. The sketch R_5 combines the rules in R_2 and R_4 and, as a result, decomposes Q_D into width-2 subproblems, the first of which, the one that corresponds to the rule from R_2, is further decomposed into two subproblems. The sketch width of both Q_D1 and Q_D then becomes 1: the first subproblem is to collect the nearest package, the second one is to drop it at the target cell, and the same sequence of subproblems gets repeated. The sketch R_6 renders the subproblems of getting close to the nearest package with width 0, but the remaining subproblem in Q_D1, which involves picking up the package and delivering it at the target cell, still has width 2. The sketch R_7 does not decompose the problem when H is initially false. Finally, the combination of the rules in R_8 yields a full policy, and thus a serialization of width 0 where each subproblem is solved in a single step.
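In the rule encoding of the earlier snippets, the ill-defined sketch R_3 and the width-1 sketch R_5 would read as follows (our rendering; the rule sets are those inferred in the discussion above):

```python
# R_3 (our rendering): pick up or drop. The pick/drop cycle decreases no
# numerical feature without also increasing it, so R_3 fails the
# (sufficient) regularity test is_regular() sketched earlier, and it is in
# fact non-terminating, as discussed above.
R3 = [([('H', False)], [('H', True), ('p', '?'), ('t', '?')]),  # pick up
      ([('H', True)], [('H', False), ('t', '?')])]              # drop

# R_5 = R_2 + R_4 (our rendering): collect the nearest package, and
# deliver one more package, decreasing the counter n.
R5 = [([('H', False)], [('H', True), ('p', '?'), ('t', '?')]),
      ([('n', '>0')], [('n', 'dec'), ('H', '?'), ('p', '?'), ('t', '?')])]
```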
Conclusions
We have established a number of connections between the notion of width, as developed and used in classical planning, and the notion of generalized plans, which are summarized in Table 2. The results suggest a deep connection between these two notions, and that bounded width for infinite collections of problems is often the result of simple general policies that solve the problems optimally in terms of features that are partially represented in the problem encodings. When this is not the case, we have shed light on the representations that deliver such properties and, hence, polynomial-time searches. We have also formalized and generalized the notion of serialized width by appealing to an explicit and abstract notion of serialization, and established connections between generalized policies and serialized width by borrowing notions from QNPs. Moreover, from this connection, we introduced policy sketches, which make use of the language of policy rules but with a different semantics, providing a convenient and powerful language for specifying subproblems and serializations that can be exploited by algorithms such as SIW. The language can be used for encoding domain-specific knowledge by hand, or as a target language for learning domain serializations automatically from traces. These are interesting challenges that we would like to address in the future.
Thm | Notes
---|---
2 | Performance and guarantees of IW(k).
11 | Optimal and Markovian policies bound width if the features are encoded.
12 | Markovian policies guarantee optimal solutions with IW(k).
13 | Bounded width also when tuples capture the features in all optimal trajectories.
18 | Bounded width when tuples capture features in a projection.
22 | Performance and guarantees of SIW_Φ.
25 | Conditions for termination of policies.
27 | Closed and terminating policies define structural solutions.
28 | Terminating policies define serializations.
32 | Structural solutions that are sound and closed define serializations of zero width.
34 | Well-formed sketches define serializations.
36 | Sketch width bounds the width of the induced serialization.
37 | Performance and guarantees of SIW_R given sketches.
Acknowledgments
The research is partially funded by an ERC Advanced Grant (No 885107), by grant TIN-2015-67959-P from MINECO, Spain, and by the Knut and Alice Wallenberg (KAW) Foundation through the WASP program. H. Geffner is also a Wallenberg Guest Professor at Linköping University, Sweden.
References
- Bandres, Bonet, and Geffner (2018) Bandres, W.; Bonet, B.; and Geffner, H. 2018. Planning with Pixels in (Almost) Real Time. In Proc. AAAI.
- Belle and Levesque (2016) Belle, V.; and Levesque, H. J. 2016. Foundations for Generalized Planning in Unbounded Stochastic Domains. In Proc. KR, 380–389.
- Bellemare et al. (2013) Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47: 253–279.
- Bonet, Frances, and Geffner (2019) Bonet, B.; Frances, G.; and Geffner, H. 2019. Learning features and abstract actions for computing generalized plans. In Proc. AAAI, 2703–2710.
- Bonet and Geffner (2018) Bonet, B.; and Geffner, H. 2018. Features, Projections, and Representation Change for Generalized Planning. In Proc. IJCAI, 4667–4673.
- Bonet and Geffner (2020) Bonet, B.; and Geffner, H. 2020. Qualitative Numeric Planning: Reductions and Complexity. Journal of Artificial Intelligence Research (JAIR) 69: 923–961.
- Bonet and Geffner (2021) Bonet, B.; and Geffner, H. 2021. General Policies, Representations, and Planning Width. In Proc. AAAI.
- Bonet, Palacios, and Geffner (2009) Bonet, B.; Palacios, H.; and Geffner, H. 2009. Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical Planners. In Proc. ICAPS-09, 34–41.
- Dechter (2013) Dechter, R. 2013. Reasoning with probabilistic and deterministic graphical models: Exact algorithms. Morgan & Claypool Publishers.
- Francès et al. (2017) Francès, G.; Ramírez, M.; Lipovetzky, N.; and Geffner, H. 2017. Purely Declarative Action Representations are Overrated: Classical Planning with Simulators. In Proc. IJCAI.
- Freuder (1982) Freuder, E. C. 1982. A sufficient condition for backtrack-free search. Journal of the ACM 29(1): 24–32.
- Geffner and Bonet (2013) Geffner, H.; and Bonet, B. 2013. A Concise Introduction to Models and Methods for Automated Planning. Morgan & Claypool Publishers.
- Ghallab, Nau, and Traverso (2016) Ghallab, M.; Nau, D.; and Traverso, P. 2016. Automated planning and acting. Cambridge U.P.
- Hoffmann, Porteous, and Sebastia (2004) Hoffmann, J.; Porteous, J.; and Sebastia, L. 2004. Ordered Landmarks in Planning. Journal of Artificial Intelligence Research 22: 215–278.
- Hu and De Giacomo (2011) Hu, Y.; and De Giacomo, G. 2011. Generalized planning: Synthesizing plans that work for multiple environments. In Proc. IJCAI, 918–923.
- Lehman and Stanley (2011a) Lehman, J.; and Stanley, K. O. 2011a. Abandoning objectives: Evolution through the search for novelty alone. Evolutionary computation 19(2): 189–223.
- Lehman and Stanley (2011b) Lehman, J.; and Stanley, K. O. 2011b. Evolving a diversity of virtual creatures through novelty search and local competition. In Proc. of the 13th annual conference on Genetic and evolutionary computation, 211–218.
- Lipovetzky and Geffner (2012) Lipovetzky, N.; and Geffner, H. 2012. Width and serialization of classical planning problems. In Proc. ECAI, 540–545.
- Lipovetzky and Geffner (2017a) Lipovetzky, N.; and Geffner, H. 2017a. Best-First Width Search: Exploration and Exploitation in Classical Planning. In Proc. AAAI.
- Lipovetzky and Geffner (2017b) Lipovetzky, N.; and Geffner, H. 2017b. A polynomial planning algorithm that beats LAMA and FF. In Proc. ICAPS.
- Lipovetzky, Ramirez, and Geffner (2015) Lipovetzky, N.; Ramirez, M.; and Geffner, H. 2015. Classical Planning with Simulators: Results on the Atari Video Games. In Proc. IJCAI.
- Ostrovski et al. (2017) Ostrovski, G.; Bellemare, M. G.; Oord, A.; and Munos, R. 2017. Count-Based Exploration with Neural Density Models. In Proc. ICML, 2721–2730.
- Pathak et al. (2017) Pathak, D.; Agrawal, P.; Efros, A. A.; and Darrell, T. 2017. Curiosity-driven exploration by self-supervised prediction. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 16–17.
- Pearl (1988) Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
- Richter and Westphal (2010) Richter, S.; and Westphal, M. 2010. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research 39(1): 127–177.
- Segovia, Jiménez, and Jonsson (2016) Segovia, J.; Jiménez, S.; and Jonsson, A. 2016. Generalized planning with procedural domain control knowledge. In Proc. ICAPS, 285–293.
- Srivastava, Immerman, and Zilberstein (2008) Srivastava, S.; Immerman, N.; and Zilberstein, S. 2008. Learning generalized plans using abstract counting. In Proc. AAAI, 991–997.
- Srivastava et al. (2011) Srivastava, S.; Zilberstein, S.; Immerman, N.; and Geffner, H. 2011. Qualitative Numeric Planning. In Proc. AAAI.
- Tang et al. (2017) Tang, H.; Houthooft, R.; Foote, D.; Stooke, A.; Chen, O. X.; Duan, Y.; Schulman, J.; DeTurck, F.; and Abbeel, P. 2017. #Exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems, 2753–2762.
- Thiébaux, Hoffmann, and Nebel (2005) Thiébaux, S.; Hoffmann, J.; and Nebel, B. 2005. In defense of PDDL axioms. Artif. Intell. 168(1-2): 38–69.