
Domain-Independent Dynamic Programming

Ryo Kuroiwa [email protected] J. Christopher Beck [email protected]
Abstract

For combinatorial optimization problems, model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP) aim to decouple modeling and solving a problem: the ‘holy grail’ of declarative problem solving. We propose domain-independent dynamic programming (DIDP), a novel model-based paradigm based on dynamic programming (DP). While DP is not new, it has typically been implemented as a problem-specific method. We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by artificial intelligence (AI) planning. We show that heuristic search algorithms can be used to solve DyPDL models and propose seven DIDP solvers. We experimentally compare our DIDP solvers with commercial MIP and CP solvers (solving MIP and CP models, respectively) on common benchmark instances of eleven combinatorial optimization problem classes. We show that DIDP outperforms MIP in nine problem classes, CP also in nine problem classes, and both MIP and CP in seven. DIDP also achieves superior performance to existing state-based solvers including domain-independent AI planners.

keywords:
Dynamic Programming, Combinatorial Optimization, Heuristic Search, State Space Search
journal: Artificial Intelligence
\affiliation

[nii]organization=National Institute of Informatics, addressline=2-1-2 Hitotsubashi, Chiyoda-ku, city=Tokyo, postcode=101-8430, country=Japan

\affiliation

[uoft]organization=Department of Mechanical and Industrial Engineering, University of Toronto, addressline=5 King’s College Road, city=Toronto, postcode=M5S 3G8, state=Ontario, country=Canada

{highlights}

We propose domain-independent dynamic programming (DIDP), a novel model-based paradigm for combinatorial optimization.

The modeling language for DIDP is designed so that a user can investigate efficient models by incorporating redundant information.

We implement DIDP solvers using heuristic search in an open-source software framework.

The DIDP solvers outperform commercial mixed-integer programming and constraint programming solvers in a number of problem classes.

1 Introduction

Combinatorial optimization is a class of problems requiring a set of discrete decisions to be made to optimize an objective function. It has wide real-world application fields including transportation [1], scheduling [2, 3], and manufacturing [4] and thus has been an active research topic in artificial intelligence (AI) [5, 6, 7] and operations research (OR) [8]. Among methodologies to solve combinatorial optimization problems, model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP) are particularly important as they represent steps toward the ‘holy grail’ of declarative problem-solving [9]: in model-based paradigms, a user formulates a problem as a mathematical model and then solves it using a general-purpose solver, i.e., a user just needs to define a problem to solve it. Benefitting from this declarative nature, MIP and CP have been applied to a wide range of combinatorial optimization problems [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].

Dynamic programming (DP) [21] is a commonly used computational method to solve diverse decision-making problems, but little work has considered DP as a model-based paradigm for combinatorial optimization. In DP, a problem is described by a Bellman equation, a recursive equation where the optimal objective value of the original problem is defined by the optimal objective values of subproblems (states). Although a Bellman equation can be viewed as a declarative mathematical model, it has typically been solved by a problem-specific algorithm in previous work on DP for combinatorial optimization problems [22, 23, 24, 25, 26, 27, 28, 29].

We propose domain-independent dynamic programming (DIDP), a novel model-based paradigm for combinatorial optimization based on DP. Our modeling language is inspired by domain-independent AI planning [30], where a problem is described by a common modeling language based on a state transition system. At the same time, DIDP follows the approach of OR, which investigates efficient optimization models. For example, in MIP, different constraints can result in different strengths of linear relaxations while sharing the same integer feasible region. In CP, global constraints are specifically designed for common substructures of combinatorial optimization problems so that a solver can exploit them to achieve high performance. Moreover, problem-specific DP methods for combinatorial optimization sometimes exploit problem-specific information to reduce the effort to solve the Bellman equations [23, 25, 26, 28, 29]. Following these approaches, DIDP allows (but does not require) a user to explicitly include information that is implied by the problem definition and can be potentially exploited by a solver. This design opens up the possibility of investigating the development of better models as in MIP and CP. To solve the models, we develop general-purpose DIDP solvers using state space search, particularly heuristic search algorithms [31, 32, 33], which are also used in AI planners [34, 35, 36].

1.1 Related Work

Some prior studies used DP as a problem-independent framework. There are four main directions explored in the literature: theoretical formalisms for DP, automated DP in logic programming languages, DP languages or software for applications other than combinatorial optimization, and decision diagram-based solvers. However, compared to DIDP, they fall short as model-based paradigms for combinatorial optimization: they either are purely theoretical, are not explicitly designed for combinatorial optimization, or require additional information specific to the solving algorithm.

Some problem-independent formalisms for DP have been developed, but they were not actual modeling languages. In their seminal work, Karp and Held [37] introduced a sequential decision process (sdp), a problem-independent formalism for DP based on a finite state automaton. While they used sdp to develop DP algorithms problem-independently, they noted that “In many applications, however, the state-transition representation is not the most natural form of problem statement” and thus introduced a discrete decision process, a formalism to describe a combinatorial optimization problem, from which sdp is derived. This line of research was further investigated by subsequent work [38, 39, 40, 41, 42, 43, 44]. In particular, Kumar and Kanal [44] proposed a theoretical formalism based on context-free grammar.

In logic programming languages, a recursive function can be defined with memoization [45] or tabling [46, 47] techniques that store the result of a function evaluation in memory and reuse it given the same arguments. To efficiently perform memoization, Puchinger and Stuckey [48] proposed allowing a user to define a bound on the objective value to prune states, motivated by applications to combinatorial optimization. Picat [49] hybridizes logic programming with other paradigms including MIP, CP, and AI planning, and the AI planning module uses tabling with state pruning based on a user-defined objective bound. Unlike DIDP, the above approaches cannot model dominance between different states since memoization or tabling can avoid the evaluation of a state only when the state is exactly the same as a previously evaluated one.
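
The limitation noted above can be seen in a few lines of Python; this sketch is ours, not taken from any of the cited systems. `functools.lru_cache` reuses a cached result only when the arguments are exactly equal, so a state that merely dominates another (e.g., the same unvisited set and location but an earlier time) is still evaluated from scratch.

```python
from functools import lru_cache

calls = 0  # counts how many states are actually evaluated

@lru_cache(maxsize=None)
def value(unvisited: frozenset, location: int, time: int) -> int:
    """Toy state-value function; only the caching behavior matters here."""
    global calls
    calls += 1
    return time

value(frozenset({1, 2}), 0, 5)
value(frozenset({1, 2}), 0, 5)  # identical state: answered from the cache
value(frozenset({1, 2}), 0, 3)  # dominating state (earlier time): recomputed
print(calls)  # → 2
```

Modeling the dominance "same unvisited set and location, earlier time is at least as good" would require comparing distinct cache keys, which plain memoization and tabling do not do.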

DP languages and software have been developed for multiple application fields. Algebraic dynamic programming (ADP) [50] is a software framework to formulate a DP model using context-free grammar. It was designed for bioinformatics and was originally limited to DP models where a state is described by a string. While ADP has been extended to describe a state using data structures such as a set, it is still focused on bioinformatics applications [51]. Dyna is a declarative programming language for DP built on logic programming, designed for natural language processing and machine learning applications [52, 53]. For optimal control, DP solvers have been developed in MATLAB [54, 55].

Hooker [56] pointed out that a decision diagram (DD), a data structure based on directed graphs, can be used to represent a DP model, and a solution can be extracted as a path in the DD. While constructing such a DD may require a large amount of computational space and time, Bergman et al. [57] proposed DD-based branch-and-bound, an algorithm to solve a DP model by repeatedly constructing relaxed and restricted DDs, which are approximations of the exact DD; they are computationally cheaper to construct and give bounds on the optimal objective value. To construct a relaxed DD, DD-based branch-and-bound requires a merging operator, a function to map two states to a single state. The currently developed general-purpose solvers using DD-based branch-and-bound, ddo [57, 58] and CODD [59], require a user to provide a merging operator in addition to a DP model. Therefore, compared to DIDP, they are DD solvers rather than model-based paradigms based on DP.

Several approaches do not fit the above directions. DP2PNSolver [60] takes a DP model coded in a Java-style language, gDPS, as input and compiles it into program code, e.g., Java code. It is not designed to incorporate information beyond a Bellman equation, and the output program code solves the Bellman equation by simple state enumeration. DP was also used as a unified modeling format to combine CP with reinforcement learning [61, 62], but this approach is still within the scope of CP: in their framework, a CP model based on a DP model is given to a CP solver, and a reinforcement learning agent, trained on a Markov decision process based on the same DP model, is used as a value selection heuristic for the CP solver.

Our modeling language is inspired by Planning Domain Definition Language (PDDL) [63, 64], a standard modeling language for AI planning, which is based on a state transition system. While PDDL and related languages such as PDDL+ [65] and Relational Dynamic Influence Diagram Language (RDDL) [66] are dominant in AI planning, other modeling languages for state transition systems have been proposed in AI. Hernádvölgyi et al. [67] proposed PSVN, where a state is described by a fixed length vector. PSVN is designed to allow the automatic generation of a heuristic function for heuristic search algorithms. In the CP community, HADDOCK [68], a modeling language for DDs based on a state transition system, was proposed and used with a CP solver for constraint propagation.

Martelli and Montanari [69] and subsequent work [70, 71] showed that a generalized version of heuristic search algorithms can be used for DP. Unified frameworks for DP, heuristic search, and branch-and-bound were proposed by Ibaraki [72] and Kumar and Kanal [44]. Holte and Fan [73] discussed the similarity between abstraction heuristics for heuristic search and state space relaxation for DP.

1.2 Contributions

We summarize the contributions of this paper as follows:

  • 1.

    We propose DIDP, a novel model-based paradigm for combinatorial optimization based on DP. While existing model-based paradigms such as MIP and CP use constraint-based representations of problems, DIDP uses a state-based representation.

  • 2.

    We develop a state-based modeling language for DIDP. Although our language is inspired by PDDL, an existing language for AI planning, it is specifically designed for combinatorial optimization and has features that PDDL does not have: a user can investigate efficient optimization models by incorporating redundant information such as dominance relation and bounds.

  • 3.

    We formally show that a class of state space search algorithms can be used as DIDP solvers under reasonable theoretical conditions. We implement such solvers in a software framework and demonstrate that they empirically outperform MIP, CP, and existing state-based solvers including AI planners in a number of combinatorial optimization problems. Since our framework is published as open-source software, AI researchers can use it as a platform to develop and apply state space search algorithms to combinatorial optimization problems.

1.3 Overview

In Section 2, we introduce DP with an example. In Section 3, we define Dynamic Programming Description Language (DyPDL), a modeling formalism for DIDP. We also present the theoretical properties of DyPDL including its computational complexity. In Section 4, we present YAML-DyPDL, a practical modeling language for DyPDL. In Section 5, we propose DIDP solvers. First, we show that state space search can be used to solve DyPDL models under reasonable theoretical conditions. In particular, we prove the completeness and optimality of state space search for DyPDL. Then, we develop seven DIDP solvers using existing heuristic search algorithms. In Section 6, we formulate DyPDL models for existing combinatorial optimization problem classes. In Section 7, we experimentally compare the DIDP solvers with commercial MIP and CP solvers using the common benchmark instances of eleven problem classes. We show that the best-performing DIDP solver outperforms both MIP and CP on seven problem classes while the worst performer does so on six of eleven. We also demonstrate that DIDP achieves superior or competitive performance to existing state-based frameworks, domain-independent AI planners, Picat, and ddo. Section 8 concludes the paper.

This paper is an extended version of two conference papers [74, 75], which lacked formal and theoretical aspects due to space limitations. This paper has the following new contributions:

  • 1.

    We formally define DyPDL and theoretically analyze it.

  • 2.

    We formally define heuristic search for DyPDL and show its completeness and optimality.

  • 3.

    We introduce DyPDL models for two additional problems, the orienteering problem with time windows and the multi-dimensional knapsack problem, and use them in the experimental evaluation. These two problems are maximization problems in contrast to the nine minimization problems used in the conference papers.

  • 4.

    We show the optimality gap achieved by DIDP solvers in the experimental evaluation. Our dual bound computation method for beam search is improved from our previous work [76].

  • 5.

    We empirically investigate the importance of heuristic functions in a DIDP solver.

  • 6.

    We empirically compare the DIDP solvers with domain-independent AI planners, Picat, and ddo.

2 Dynamic Programming

Dynamic programming (DP) is a computational problem-solving method in which a problem is recursively decomposed into subproblems, each represented by a state [21]. In this paper, we focus on DP for combinatorial optimization problems, where a finite number of discrete decisions are made to optimize an objective function. In particular, we assume that a state is transformed into another state by making a decision, and a solution is a finite sequence of decisions. We acknowledge that DP has more general applications beyond our assumptions; for example, when DP is applied to Markov decision processes, the outcome of a decision is represented by a probability distribution over multiple states, and a solution is a policy that maps a state to a probability distribution over decisions [21, 77, 78]. However, this is not the topic of this work, as we restrict ourselves to solutions that can be represented by a path of states.

As a running example, we use a combinatorial optimization problem, the traveling salesperson problem with time windows (TSPTW) [79]. In this problem, a set of customers $N=\{0,...,n-1\}$ is given. A solution is a tour starting from the depot (index 0), visiting each customer exactly once, and returning to the depot. Visiting customer $j$ from $i$ requires the travel time $c_{ij}\geq 0$. In the beginning, $t=0$. The visit to customer $i$ must be within a time window $[a_{i},b_{i}]$. Upon earlier arrival, waiting until $a_{i}$ is required. The objective is to minimize the total travel time, excluding the waiting time. Finding a valid tour for TSPTW is NP-complete [79].

Dumas et al. [23] applied DP to solve TSPTW. In their approach, a state is a tuple of variables $(U,i,t)$, where $U$ is the set of unvisited customers, $i$ is the current location, and $t$ is the current time. At each step, we consider visiting one of the unvisited customers as a decision. The value function $V$ maps a state to the optimal cost of visiting all customers in $U$ and returning to the depot starting from location $i$ at time $t$. The value function is defined by the following recursive equation called a Bellman equation [21]:

$$V(U,i,t)=\begin{cases}c_{i0}&\text{if }U=\emptyset\\ \min\limits_{j\in U:t+c_{ij}\leq b_{j}}c_{ij}+V(U\setminus\{j\},j,\max\{t+c_{ij},a_{j}\})&\text{if }U\neq\emptyset.\end{cases} \quad (1)$$

When all customers are visited ($U=\emptyset$), the objective value is defined to be the travel time to return to the depot. Otherwise, we consider visiting each customer $j$ from $i$. The objective value is defined as the minimum of the sum of the travel time to $j$ and the optimal objective value of the resulting state. We assume the second line of Equation (1) to be $\infty$ if there is no $j\in U$ with $t+c_{ij}\leq b_{j}$. The optimal objective value for the original problem is $V(N\setminus\{0\},0,0)$. Dumas et al. [23] solved the Bellman equation by enumerating states.
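
Equation (1) maps directly to a memoized recursion. The sketch below solves a hypothetical 4-customer instance; the travel times and time windows are made-up illustration data, not taken from the paper.

```python
from functools import lru_cache
import math

# Hypothetical instance: customer 0 is the depot, customers 1-3 must be
# visited. Travel times c, release times a, and deadlines b are made up.
n = 4
c = [[0, 3, 4, 5],
     [3, 0, 2, 4],
     [4, 2, 0, 3],
     [5, 4, 3, 0]]
a = [0, 0, 4, 3]
b = [100, 10, 10, 10]

@lru_cache(maxsize=None)
def V(U: frozenset, i: int, t: int):
    """Equation (1): optimal cost of visiting all of U from (i, t) and
    returning to the depot; infinity if no feasible completion exists."""
    if not U:
        return c[i][0]
    best = math.inf
    for j in U:
        if t + c[i][j] <= b[j]:               # deadline respected
            arrival = max(t + c[i][j], a[j])  # wait until a[j] if early
            best = min(best, c[i][j] + V(U - {j}, j, arrival))
    return best

print(V(frozenset(range(1, n)), 0, 0))  # → 13
```

This is essentially the state enumeration of Dumas et al. [23] with memoization standing in for their explicit state storage.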

The state-based problem representation in DP is fundamentally different from existing model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP), which are constraint-based: the problem is defined by a set of decision variables, constraints on the variables, and an objective function. For example, a MIP model for TSPTW [10] uses a binary variable $x_{ij}$ to represent whether customer $j$ is visited from $i$ and a continuous variable $t_{i}$ to represent the time to visit $i$. The model includes constraints ensuring that each customer is visited exactly once ($\sum_{j\in N\setminus\{i\}}x_{ij}=\sum_{j\in N\setminus\{i\}}x_{ji}=1$) and that the time window is satisfied ($a_{i}\leq t_{i}\leq b_{i}$). The objective function is represented as $\sum_{i\in N}\sum_{j\in N\setminus\{i\}}c_{ij}x_{ij}$. Similarly, a CP model uses a variable to represent the time to visit each customer and constraints to ensure that the time window is satisfied [11].

3 Dynamic Programming Description Language (DyPDL)

To shift from problem-specific DP methods to a declarative model-based paradigm based on DP, we formally define a class of models that we focus on in this paper. We introduce a solver-independent theoretical formalism, Dynamic Programming Description Language (DyPDL). Since DP is based on a state-based representation, we design DyPDL inspired by AI planning formalisms such as STRIPS [80] and SAS+ [81], where a problem is described as a state transition system. In DyPDL, a problem is described by states and transitions between states, and a solution corresponds to a sequence of transitions satisfying particular conditions. A state is a complete assignment to state variables.

Definition 1.

A state variable is either an element, set, or numeric variable. An element variable $v$ has domain $D_{v}=\mathbb{Z}^{+}_{0}$ (nonnegative integers). A set variable $v$ has domain $D_{v}=2^{\mathbb{Z}^{+}_{0}}$ (sets of nonnegative integers). A numeric variable $v$ has domain $D_{v}=\mathbb{Q}$ (rational numbers).

Definition 2.

Given a set of state variables $\mathcal{V}=\{v_{1},...,v_{n}\}$, a state is a tuple of values $S=(d_{1},...,d_{n})$ where $d_{i}\in D_{v_{i}}$ for $i=1,...,n$. We denote the value $d_{i}$ of variable $v_{i}$ in state $S$ by $S[v_{i}]$ and the set of all states by $\mathcal{S}=D_{v_{1}}\times...\times D_{v_{n}}$.

Example 1.

In our TSPTW example in Equation (1), a state is represented by three variables: a set variable $U$, an element variable $i$, and a numeric variable $t$.

A state can be transformed into another state by changing the values of the state variables. To describe such changes, we define expressions: functions returning a value given a state.

Definition 3.

An element expression $e:\mathcal{S}\to\mathbb{Z}^{+}_{0}$ is a function that maps a state to a nonnegative integer. A set expression $e:\mathcal{S}\to 2^{\mathbb{Z}^{+}_{0}}$ is a function that maps a state to a set of nonnegative integers. A numeric expression $e:\mathcal{S}\to\mathbb{Q}$ is a function that maps a state to a numeric value. A condition $c:\mathcal{S}\to\{\bot,\top\}$ is a function that maps a state to a Boolean value. Given a state $S$, we denote $c(S)=\top$ by $S\models c$ and $c(S)=\bot$ by $S\not\models c$. For a set of conditions $C$, we denote $\forall c\in C, S\models c$ by $S\models C$ and $\exists c\in C, S\not\models c$ by $S\not\models C$.

With the above expressions, we formally define transitions, which transform one state into another. A transition has an effect to update state variables, preconditions defining when it is applicable, and a cost expression to update the objective value.

Definition 4.

A transition $\tau$ is a 3-tuple $\langle\mathsf{eff}_{\tau},\mathsf{cost}_{\tau},\mathsf{pre}_{\tau}\rangle$ where $\mathsf{eff}_{\tau}$ is the effect, $\mathsf{cost}_{\tau}$ is the cost expression, and $\mathsf{pre}_{\tau}$ is the set of preconditions.

  • 1.

The effect $\mathsf{eff}_{\tau}$ is a tuple of expressions $(e_{1},...,e_{n})$ where $e_{i}:\mathcal{S}\to D_{v_{i}}$. We denote the expression $e_{i}$ corresponding to a state variable $v_{i}$ by $\mathsf{eff}_{\tau}[v_{i}]$.

  • 2.

The cost expression $\mathsf{cost}_{\tau}:\mathbb{Q}\times\mathcal{S}\to\mathbb{Q}$ is a function that maps a numeric value $x$ and a state $S$ to a numeric value $\mathsf{cost}_{\tau}(x,S)$.

  • 3.

$\mathsf{pre}_{\tau}$ is a set of conditions, i.e., $c:\mathcal{S}\to\{\bot,\top\}$ for each $c\in\mathsf{pre}_{\tau}$, and each such condition is called a precondition of $\tau$.

Given a set of transitions $\mathcal{T}$, the set of applicable transitions $\mathcal{T}(S)$ in a state $S$ is defined as $\mathcal{T}(S)=\{\tau\in\mathcal{T}\mid S\models\mathsf{pre}_{\tau}\}$. Given a state $S$ and a transition $\tau\in\mathcal{T}(S)$, the successor state $S[\![\tau]\!]$, which results from applying $\tau$ in $S$, is defined as $S[\![\tau]\!][v]=\mathsf{eff}_{\tau}[v](S)$ for each variable $v$. For a sequence of transitions $\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle$, the state $S[\![\sigma]\!]$, which results from applying $\sigma$ in $S$, is defined as $S[\![\sigma]\!]=S[\![\sigma_{1}]\!][\![\sigma_{2}]\!]...[\![\sigma_{m}]\!]$. If $\sigma$ is an empty sequence, i.e., $\sigma=\langle\rangle$, then $S[\![\sigma]\!]=S$.

Example 2.

In our TSPTW example in Equation (1), we consider a transition $\tau_{j}$ to visit customer $j$. The effect of $\tau_{j}$ is defined as $\mathsf{eff}_{\tau_{j}}[U](S)=S[U]\setminus\{j\}$, $\mathsf{eff}_{\tau_{j}}[i](S)=j$, and $\mathsf{eff}_{\tau_{j}}[t](S)=\max\{S[t]+c_{S[i],j},a_{j}\}$. The preconditions are $S\mapsto j\in S[U]$ and $S\mapsto S[t]+c_{S[i],j}\leq b_{j}$. The cost expression is defined as $\mathsf{cost}_{\tau_{j}}(x,S)=c_{S[i],j}+x$.
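
Definition 4 and Example 2 can be mirrored in code. The sketch below is illustrative only: the names (`State`, `Transition`, `visit`) are ours and not DyPDL syntax, and the two-customer instance data are made up.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # state variables: U (frozenset), i (int), t (int)

@dataclass
class Transition:
    """A transition per Definition 4: effect, cost expression, preconditions."""
    effect: Callable[[State], State]
    cost: Callable[[float, State], float]  # cost_tau(x, S)
    preconditions: list                    # list of Callable[[State], bool]

    def applicable(self, S: State) -> bool:
        return all(pre(S) for pre in self.preconditions)

c = [[0, 3], [3, 0]]       # made-up travel times
a, b = [0, 0], [100, 10]   # made-up time windows

def visit(j: int) -> Transition:
    """The TSPTW transition tau_j of Example 2."""
    return Transition(
        effect=lambda S: {"U": S["U"] - {j}, "i": j,
                          "t": max(S["t"] + c[S["i"]][j], a[j])},
        cost=lambda x, S: c[S["i"]][j] + x,
        preconditions=[lambda S: j in S["U"],
                       lambda S: S["t"] + c[S["i"]][j] <= b[j]],
    )

S0 = {"U": frozenset({1}), "i": 0, "t": 0}
tau = visit(1)
assert tau.applicable(S0)
print(tau.effect(S0))  # the successor state S[[tau]]
```

The successor state here is `{"U": frozenset(), "i": 1, "t": 3}`, matching the effects in Example 2.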

We define a base case, where further transitions are not considered, and the cost is defined by a function of the state. We use the term ‘base case’ to represent the conditions where the recursion of a Bellman equation stops.

Definition 5 (Base case).

A base case $B$ is a tuple $\langle C_{B},\mathsf{base\_cost}_{B}\rangle$ where $C_{B}$ is a set of conditions and $\mathsf{base\_cost}_{B}:\mathcal{S}\to\mathbb{Q}$ is a numeric expression. A state $S$ with $S\models C_{B}$ is called a base state, and its base cost is $\mathsf{base\_cost}_{B}(S)$.

Example 3.

In our TSPTW example in Equation (1), a base case $B$ is defined by $C_{B}=\{S\mapsto S[U]=\emptyset\}$ and $\mathsf{base\_cost}_{B}(S)=c_{S[i],0}$.

Now, we define the state transition system based on the above definitions. We additionally introduce the target state, which is the start of the state transition system, and the state constraints, which must be satisfied by all states. The name ‘target state’ is inspired by a Bellman equation; the target state is the target of the Bellman equation, corresponding to the original problem, to which we want to compute the optimal objective value. The cost of a solution is computed by repeatedly applying the cost expression of a transition backward from the base cost; this is also inspired by recursion in a Bellman equation, where the objective value of the current state is computed from the objective values of the successor states.

Definition 6.

A DyPDL model is a tuple $\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle$ with a specification of minimization or maximization, where $\mathcal{V}$ is the set of state variables, $S^{0}$ is a state called the target state, $\mathcal{T}$ is the set of transitions, $\mathcal{B}$ is the set of base cases, and $\mathcal{C}$ is a set of conditions called state constraints.

A solution for a DyPDL model is a sequence of transitions $\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle$ such that

  • 1.

All transitions are applicable, i.e., $\sigma_{i}\in\mathcal{T}(S^{i-1})$ where $S^{i}=S^{i-1}[\![\sigma_{i}]\!]$ for $i=1,...,m$.

  • 2.

The target state and all intermediate states do not satisfy a base case, i.e., $\forall B\in\mathcal{B}, S^{i}\not\models C_{B}$ for $i=0,...,m-1$.

  • 3.

The target state, all intermediate states, and the final state satisfy the state constraints, i.e., $S^{i}\models\mathcal{C}$ for $i=0,...,m$.

  • 4.

The final state is a base state, i.e., $\exists B\in\mathcal{B}, S^{m}\models C_{B}$.

If the target state is a base state and satisfies the state constraints, an empty sequence $\langle\rangle$ is a solution. Given a state $S$, an $S$-solution is a sequence of transitions satisfying the above conditions except that it starts from the state $S$ instead of the target state.

For minimization, the cost of an $S$-solution $\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle$ is recursively defined as follows:

$$\mathsf{solution\_cost}(\sigma,S)=\begin{cases}\mathsf{cost}_{\sigma_{1}}(\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S[\![\sigma_{1}]\!]),S)&\text{ if }|\sigma|\geq 1\\ \min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)&\text{ if }|\sigma|=0\end{cases}$$

where $|\sigma|$ is the number of transitions in $\sigma$, i.e., $|\sigma|=m$, and $\langle\sigma_{2},...,\sigma_{m}\rangle=\langle\rangle$ if $m=1$. For maximization, we replace $\min$ with $\max$. For a solution for the DyPDL model ($S^{0}$-solution) $\sigma$, we denote its cost by $\mathsf{solution\_cost}(\sigma)$, omitting $S^{0}$. An optimal $S$-solution for minimization (maximization) is an $S$-solution whose cost is less (greater) than or equal to the cost of any $S$-solution. An optimal solution for a DyPDL model is an optimal $S^{0}$-solution.
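
The recursive cost computation can be written down directly. In this illustrative Python sketch (our names, not DyPDL syntax), a transition is a pair of an effect function and a cost function, and a base case is a pair of a condition and a base-cost function:

```python
def solution_cost(sigma, S, base_cases):
    """Cost of an S-solution for minimization: apply the cost expression of
    the first transition to the cost of the remaining suffix; for the empty
    sequence, take the minimum base cost over satisfied base cases."""
    if not sigma:
        return min(bc(S) for cond, bc in base_cases if cond(S))
    effect, cost = sigma[0]
    return cost(solution_cost(sigma[1:], effect(S), base_cases), S)

# Toy model: the state is a number, each transition increments it at cost
# x + 1, and the single base case gives base cost 0 once the state reaches 3.
inc = (lambda S: S + 1, lambda x, S: x + 1)
base = [(lambda S: S >= 3, lambda S: 0)]
print(solution_cost([inc, inc, inc], 0, base))  # → 3
```

Note that the costs are accumulated backward from the base cost, mirroring how a Bellman equation computes the value of a state from the values of its successors.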

Example 4.

In our TSPTW example in Equation (1), the state variables are $U$, $i$, and $t$ as in Example 1, the target state is $S^{0}=(N\setminus\{0\},0,0)$, a transition to visit customer $j$ is defined for each $j\in N\setminus\{0\}$ as in Example 2, and one base case is defined as in Example 3. For a solution corresponding to a tour $\langle\pi_{1},...,\pi_{n-1}\rangle$, where $\pi_{i}\in N\setminus\{0\}$ and $\pi_{i}\neq\pi_{j}$ for $i\neq j$, the cost is $c_{0,\pi_{1}}+...+c_{\pi_{n-2},\pi_{n-1}}+c_{\pi_{n-1},0}$. The optimization direction is minimization. We do not use state constraints in this example at this point, but we show a use case later in Section 3.3.

3.1 Complexity

We have defined expressions as functions of states and have not specified further details. Therefore, the complexity of an optimization problem with a DyPDL model depends on the complexity of evaluating expressions. In addition, for example, if we have an infinite number of preconditions, evaluating the applicability of a single transition may not terminate in finite time. Given these facts, we consider the complexity of a model whose definition is finite.

Definition 7.

A DyPDL model is finitely defined if the following conditions are satisfied:

  • 1.

    The numbers of the state variables, transitions, base cases, and state constraints are finite.

  • 2.

    Each transition has a finite number of preconditions.

  • 3.

    Each base case has a finite number of conditions.

  • 4.

    All the effects, the cost expression, the preconditions of the transitions, the conditions and the costs of the base cases, and the state constraints can be evaluated in finite time.

Even with this restriction, finding a solution for a DyPDL model is an undecidable problem. We show this by reducing an undecidable AI planning formalism to DyPDL. We define a numeric planning task and its solution in Definitions 8 and 9, following Helmert [82].

Definition 8.

A numeric planning task is a tuple $\langle V_{P},V_{N},\textit{Init},\textit{Goal},\textit{Ops}\rangle$ where $V_{P}$ is a finite set of propositional variables, $V_{N}$ is a finite set of numeric variables, Init is a state called the initial state, Goal is a finite set of propositional and numeric conditions, and Ops is a finite set of operators.

  • 1.

A state is defined by a pair of functions $(\alpha,\beta)$, where $\alpha:V_{P}\to\{\bot,\top\}$ and $\beta:V_{N}\to\mathbb{Q}$.

  • 2.

A propositional condition is written as $v=\top$ where $v\in V_{P}$. A state $(\alpha,\beta)$ satisfies it if $\alpha(v)=\top$.

  • 3.

A numeric condition is written as $f(v_{1},...,v_{n})\textbf{ relop }0$ where $v_{1},...,v_{n}\in V_{N}$, $f:\mathbb{Q}^{n}\to\mathbb{Q}$ maps $n$ numeric variables to a rational number, and $\textbf{relop}\in\{=,\neq,<,\leq,\geq,>\}$. A state $(\alpha,\beta)$ satisfies it if $f(\beta(v_{1}),...,\beta(v_{n}))\textbf{ relop }0$.

An operator in Ops is a pair $\langle\textit{Pre},\textit{Eff}\rangle$, where Pre is a finite set of conditions (preconditions), and Eff is a finite set of propositional and numeric effects.

  • 1.

A propositional effect is written as $v\leftarrow t$ where $v\in V_{P}$ and $t\in\{\bot,\top\}$.

  • 2.

A numeric effect is written as $v\leftarrow f(v_{1},...,v_{n})$ where $v,v_{1},...,v_{n}\in V_{N}$ and $f:\mathbb{Q}^{n}\to\mathbb{Q}$ maps $n$ numeric variables to a rational number.

All functions that appear in numeric conditions and numeric effects are restricted to functions represented by the arithmetic operators $\{+,-,\cdot,/\}$, where the divisor must be a non-zero constant.

Definition 9.

Given a numeric planning task $\langle V_{P},V_{N},\textit{Init},\textit{Goal},\textit{Ops}\rangle$, the state transition graph is a directed graph where nodes are states and there is an edge $((\alpha,\beta),(\alpha^{\prime},\beta^{\prime}))$ if there exists an operator $\langle\textit{Pre},\textit{Eff}\rangle\in\textit{Ops}$ satisfying the following conditions.

  • 1.

    (α,β)(\alpha,\beta) satisfies all conditions in Pre.

  • 2.

    α(v)=t\alpha^{\prime}(v)=t if vtEffv\leftarrow t\in\textit{Eff} and α(v)=α(v)\alpha^{\prime}(v)=\alpha(v) otherwise.

  • 3.

    β(v)=f(β(v1),,β(vn))\beta^{\prime}(v)=f(\beta(v_{1}),...,\beta(v_{n})) if vf(v1,,vn)Effv\leftarrow f(v_{1},...,v_{n})\in\textit{Eff} and β(v)=β(v)\beta^{\prime}(v)=\beta(v) otherwise.

A solution for the numeric planning task is a path from the initial state to a state that satisfies all goal conditions in Goal in the state transition graph.

Example 5.

We represent TSPTW as a numeric planning task similar to the DyPDL model in Example 4. Unlike the DyPDL model, we ignore the objective value since the definition of a numeric planning task does not consider it. In our numeric planning task, a proposition ui=u_{i}=\top represents that customer ii is not visited, vi=v_{i}=\top represents that customer ii is visited, and li=l_{i}=\top represents that the current location is customer ii. We have only one numeric variable tt representing the current time. Therefore,

  • 1.

    VP={u1,,un1,v1,,vn1,l0,,ln1}V_{P}=\{u_{1},...,u_{n-1},v_{1},...,v_{n-1},l_{0},...,l_{n-1}\}.

  • 2.

VN={t}V_{N}=\{t\}.

  • 3.

    Init=(α0,β0)\textit{Init}=(\alpha^{0},\beta^{0}) such that iN{0},α0(ui)=,α0(vi)=,α0(li)=\forall i\in N\setminus\{0\},\alpha^{0}(u_{i})=\top,\alpha^{0}(v_{i})=\bot,\alpha^{0}(l_{i})=\bot, α0(l0)=\alpha^{0}(l_{0})=\top, and β0(t)=0\beta^{0}(t)=0.

  • 4.

Goal={v1=,,vn1=,l0=}\textit{Goal}=\{v_{1}=\top,...,v_{n-1}=\top,l_{0}=\top\}.

To visit customer jN{0}j\in N\setminus\{0\} from iNi\in N, we define two operators: one for arriving before the time window opens (and waiting until aja_{j}) and one for arriving within the window. In addition, we define an operator to return to the depot 0 from customer ii after visiting all customers. We have the following three types of operators.

  • 1.

    Pre={li=,uj=,t+cijaj<0}\textit{Pre}=\{l_{i}=\top,u_{j}=\top,t+c_{ij}-a_{j}<0\} and Eff={li,lj,uj,vj,taj}\textit{Eff}=\{l_{i}\leftarrow\bot,l_{j}\leftarrow\top,u_{j}\leftarrow\bot,v_{j}\leftarrow\top,t\leftarrow a_{j}\}.

  • 2.

    Pre={li=,uj=,t+cijaj0,t+cijbj0}\textit{Pre}=\{l_{i}=\top,u_{j}=\top,t+c_{ij}-a_{j}\geq 0,t+c_{ij}-b_{j}\leq 0\} and Eff={li,lj,uj,vj,tt+cij}\textit{Eff}=\{l_{i}\leftarrow\bot,l_{j}\leftarrow\top,u_{j}\leftarrow\bot,v_{j}\leftarrow\top,t\leftarrow t+c_{ij}\}.

  • 3.

Pre={li=,v1=,,vn1=}\textit{Pre}=\{l_{i}=\top,v_{1}=\top,...,v_{n-1}=\top\} and Eff={li,l0,tt+ci0}\textit{Eff}=\{l_{i}\leftarrow\bot,l_{0}\leftarrow\top,t\leftarrow t+c_{i0}\}.
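To make the three operator schemes concrete, the following Python sketch generates successor states in the state transition graph of this numeric planning task. The three-customer instance data (travel times and time windows) are hypothetical, chosen only for illustration.

```python
from itertools import product

# Toy TSPTW instance (hypothetical data, for illustration only):
# customers 1..2 plus the depot 0, travel times c, time windows [a, b].
n = 3
c = {(i, j): 1 + abs(i - j) for i, j in product(range(n), repeat=2) if i != j}
a = {1: 0, 2: 3}   # earliest service start per customer
b = {1: 5, 2: 6}   # latest service start per customer

# A state is (alpha, t): alpha maps propositions to True/False, t is the time.
def initial_state():
    alpha = {('u', i): True for i in range(1, n)}
    alpha.update({('v', i): False for i in range(1, n)})
    alpha.update({('l', i): i == 0 for i in range(n)})
    return alpha, 0

def successors(state):
    """Apply the three operator schemes of Example 5 to generate successors."""
    alpha, t = state
    i = next(k for k in range(n) if alpha[('l', k)])  # current location
    for j in range(1, n):
        if j != i and alpha[('u', j)]:
            arrival = t + c[i, j]
            new_alpha = dict(alpha)
            new_alpha[('l', i)], new_alpha[('l', j)] = False, True
            new_alpha[('u', j)], new_alpha[('v', j)] = False, True
            if arrival < a[j]:            # operator 1: wait until the window opens
                yield (new_alpha, a[j])
            elif arrival <= b[j]:         # operator 2: arrive within the window
                yield (new_alpha, arrival)
    if all(alpha[('v', j)] for j in range(1, n)) and not alpha[('l', 0)]:
        new_alpha = dict(alpha)           # operator 3: return to the depot
        new_alpha[('l', i)], new_alpha[('l', 0)] = False, True
        yield (new_alpha, t + c[i, 0])
```

From the initial state at the depot, both customers are reachable within their windows, so two successors are generated.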

Helmert [82] showed that finding a solution for the above-defined numeric planning task is undecidable. To show the undecidability of DyPDL, we reduce a numeric planning task to a DyPDL model by replacing all propositional variables with a single set variable.

Theorem 1.

Finding a solution for a finitely defined DyPDL model is undecidable.

Proof.

Let VP,VN,Init,Goal,Ops\langle V_{P},V_{N},\textit{Init},\textit{Goal},\textit{Ops}\rangle be a numeric planning task. We compose a DyPDL model as follows:

  • 1.

    If VPV_{P}\neq\emptyset, we introduce a set variable PP^{\prime} in the DyPDL model. For each numeric variable vVNv\in V_{N} in the numeric planning task, we introduce a numeric variable vv^{\prime} in the DyPDL model.

  • 2.

    Let (α0,β0)=Init(\alpha^{0},\beta^{0})=\textit{Init}. We index propositional variables in VPV_{P} using i=0,,|VP|1i=0,...,|V_{P}|-1 and denote the ii-th variable by uiu_{i}. In the target state S0S^{0} of the DyPDL model, S0[P]={i{0,,|VP|1}α0(ui)=}S^{0}[P^{\prime}]=\{i\in\{0,...,|V_{P}|-1\}\mid\alpha^{0}(u_{i})=\top\} and S0[v]=β0(v)S^{0}[v^{\prime}]=\beta^{0}(v) for each numeric variable vVNv\in V_{N}.

  • 3.

We introduce a base case B=CB,0B=\langle C_{B},0\rangle in the DyPDL model. For each propositional condition ui=u_{i}=\top in Goal, we introduce a condition SiS[P]S\mapsto i\in S[P^{\prime}] in CBC_{B}. For each numeric condition f(v1,,vn) relop 0f(v_{1},...,v_{n})\textbf{ relop }0 in Goal, we introduce a condition Sf(S[v1],,S[vn]) relop 0S\mapsto f(S[v^{\prime}_{1}],...,S[v^{\prime}_{n}])\textbf{ relop }0 in CBC_{B}.

  • 4.

    For each operator o=Pre,Effo=\langle\textit{Pre},\textit{Eff}\rangle, we introduce a transition o=𝖾𝖿𝖿o,𝖼𝗈𝗌𝗍o,𝗉𝗋𝖾oo^{\prime}=\langle\mathsf{eff}_{o^{\prime}},\mathsf{cost}_{o^{\prime}},\mathsf{pre}_{o^{\prime}}\rangle with 𝖼𝗈𝗌𝗍o(x,S)=x+1\mathsf{cost}_{o^{\prime}}(x,S)=x+1.

  • 5.

    For each propositional condition ui=u_{i}=\top in Pre, we introduce SiS[P]S\mapsto i\in S[P^{\prime}] in 𝗉𝗋𝖾o\mathsf{pre}_{o^{\prime}}. For each numeric condition f(v1,,vn) relop f(v_{1},...,v_{n})\textbf{ relop } 0 in Pre, we introduce Sf(S[v1],,S[vn]) relop 0S\mapsto f(S[v^{\prime}_{1}],...,S[v^{\prime}_{n}])\textbf{ relop }0 in 𝗉𝗋𝖾o\mathsf{pre}_{o^{\prime}}.

  • 6.

    Let Add={i{0,,|VP|1}uiEff}\textit{Add}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\top\in\textit{Eff}\} and Del={i{0,,|VP|1}uiEff}\textit{Del}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\bot\in\textit{Eff}\}. We have 𝖾𝖿𝖿o[P](S)=(S[P]Del)Add\mathsf{eff}_{o^{\prime}}[P^{\prime}](S)=(S[P^{\prime}]\setminus\textit{Del})\cup\textit{Add}. We have 𝖾𝖿𝖿o[v](S)=f(S[v1],,S[vn])\mathsf{eff}_{o^{\prime}}[v^{\prime}](S)=f(S[v^{\prime}_{1}],...,S[v^{\prime}_{n}]) if vf(v1,,vn)Effv\leftarrow f(v_{1},...,v_{n})\in\textit{Eff} and 𝖾𝖿𝖿o[v](S)=S[v]\mathsf{eff}_{o^{\prime}}[v^{\prime}](S)=S[v^{\prime}] otherwise.

  • 7.

    The set of state constraints is empty.

The construction of the DyPDL model is done in finite time. The numbers of propositional variables, numeric variables, goal conditions, transitions, preconditions, and effects are finite. Therefore, the DyPDL model is finitely defined.
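The core of the reduction can be sketched as follows: the propositional assignment becomes a single set variable holding the indices of the true propositions (steps 1 and 2), and a transition's propositional effects become set operations (step 6). The function names below are illustrative, not DyPDL syntax.

```python
def to_dypdl_state(alpha, beta, index):
    """Steps 1-2 of the reduction: the propositional assignment alpha becomes
    one set variable P' (the indices of true propositions, per the mapping
    index: propositional variable -> {0, ..., |V_P|-1}); numeric variables
    are copied."""
    P = frozenset(index[u] for u in index if alpha[u])
    return P, dict(beta)

def apply_effect(state, add, delete, numeric_effects):
    """Step 6: S[P'] := (S[P'] \\ Del) | Add, and each numeric effect is a
    function of the current numeric values; untouched variables are kept."""
    P, numeric = state
    new_numeric = {v: f(numeric) for v, f in numeric_effects.items()}
    for v, value in numeric.items():
        new_numeric.setdefault(v, value)
    return (P - delete) | add, new_numeric
```

For example, a transition that deletes proposition 0, adds proposition 1, and advances a clock by two units maps one DyPDL state to the next.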

Let σ=o1,,om\sigma=\langle o^{\prime}_{1},...,o^{\prime}_{m}\rangle be a solution for the DyPDL model. Let Sj=Sj1[[oj]]S^{j}=S^{j-1}[\![o^{\prime}_{j}]\!] for j=1,,mj=1,...,m. Let (αj,βj)(\alpha^{j},\beta^{j}) be a numeric planning state such that αj(ui)=\alpha^{j}(u_{i})=\top if iSj[P]i\in S^{j}[P^{\prime}], αj(ui)=\alpha^{j}(u_{i})=\bot if iSj[P]i\notin S^{j}[P^{\prime}], and βj(v)=Sj[v]\beta^{j}(v)=S^{j}[v^{\prime}]. Note that (α0,β0)=Init(\alpha^{0},\beta^{0})=\textit{Init} satisfies the above condition by construction. We prove that the state transition graph for the numeric planning task has edge ((αj,βj),(αj+1,βj+1))((\alpha^{j},\beta^{j}),(\alpha^{j+1},\beta^{j+1})) for j=0,,m1j=0,...,m-1, and (αm,βm)(\alpha^{m},\beta^{m}) satisfies all conditions in Goal.

Let oj+1=(Prej+1,Effj+1)o_{j+1}=(\textit{Pre}_{j+1},\textit{Eff}_{j+1}) be the operator corresponding to oj+1o^{\prime}_{j+1}. Since oj+1o^{\prime}_{j+1} is applicable in SjS^{j}, for each propositional condition ui=u_{i}=\top in Prej+1\textit{Pre}_{j+1}, the set variable satisfies iSj[P]i\in S^{j}[P^{\prime}]. For each numeric condition f(v1,,vn) relop 0f(v_{1},...,v_{n})\textbf{ relop }0 in Prej+1\textit{Pre}_{j+1}, the numeric variables satisfy f(Sj[v1],,Sj[vn]) relop 0f(S^{j}[v^{\prime}_{1}],...,S^{j}[v^{\prime}_{n}])\textbf{ relop }0. By construction, αj(ui)=\alpha^{j}(u_{i})=\top for iSj[P]i\in S^{j}[P^{\prime}] and f(βj(v1),,βj(vn)) relop 0f(\beta^{j}(v_{1}),...,\beta^{j}(v_{n}))\textbf{ relop }0. Therefore, (αj,βj)(\alpha^{j},\beta^{j}) satisfies all conditions in Prej+1\textit{Pre}_{j+1}. Similarly, (αm,βm)(\alpha^{m},\beta^{m}) satisfies all conditions in Goal since SmS^{m} satisfies all conditions in the base case.

Let Addj+1={i{0,,|VP|1}uiEffj+1}\textit{Add}_{j+1}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\top\in\textit{Eff}_{j+1}\} and Delj+1={i{0,,|VP|1}uiEffj+1}\textit{Del}_{j+1}=\{i\in\{0,...,|V_{P}|-1\}\mid u_{i}\leftarrow\bot\in\textit{Eff}_{j+1}\}. By construction, Sj+1[P]=(Sj[P]Delj+1)Addj+1S^{j+1}[P^{\prime}]=(S^{j}[P^{\prime}]\setminus\textit{Del}_{j+1})\cup\textit{Add}_{j+1}. Therefore, for ii with uiEffj+1u_{i}\leftarrow\bot\in\textit{Eff}_{j+1}, we have iSj+1[P]i\notin S^{j+1}[P^{\prime}], which implies αj+1(ui)=\alpha^{j+1}(u_{i})=\bot. For ii with uiEffj+1u_{i}\leftarrow\top\in\textit{Eff}_{j+1}, we have iSj+1[P]i\in S^{j+1}[P^{\prime}], which implies αj+1(ui)=\alpha^{j+1}(u_{i})=\top. For other ii, iSj+1[P]i\in S^{j+1}[P^{\prime}] if iSj[P]i\in S^{j}[P^{\prime}] and iSj+1[P]i\notin S^{j+1}[P^{\prime}] if iSj[P]i\notin S^{j}[P^{\prime}], which imply αj+1(ui)=αj(ui)\alpha^{j+1}(u_{i})=\alpha^{j}(u_{i}). For vv with vf(v1,,vn)Effj+1v\leftarrow f(v_{1},...,v_{n})\in\textit{Eff}_{j+1}, we have Sj+1[v]=f(Sj[v1],,Sj[vn])=f(βj(v1),,βj(vn))S^{j+1}[v^{\prime}]=f(S^{j}[v^{\prime}_{1}],...,S^{j}[v^{\prime}_{n}])=f(\beta^{j}(v_{1}),...,\beta^{j}(v_{n})), which implies βj+1(v)=f(βj(v1),,βj(vn))\beta^{j+1}(v)=f(\beta^{j}(v_{1}),...,\beta^{j}(v_{n})). For other vv, we have Sj+1[v]=Sj[v]=βj(v)S^{j+1}[v^{\prime}]=S^{j}[v^{\prime}]=\beta^{j}(v), which implies βj+1(v)=βj(v)\beta^{j+1}(v)=\beta^{j}(v). Therefore, edge ((αj,βj),(αj+1,βj+1))((\alpha^{j},\beta^{j}),(\alpha^{j+1},\beta^{j+1})) exists in the state transition graph.

Thus, by solving the DyPDL model, we can find a solution for the numeric planning task. Since the numeric planning task is undecidable, finding a solution for a DyPDL model is also undecidable. ∎

Note that applying this reduction to the numeric planning task in Example 5 does not result in the DyPDL model in Example 4. For example, Example 5 defines two operators for each pair of customers and one operator for each customer to return to the depot, and thus the DyPDL model constructed from this numeric planning task has 2n(n1)+n12n(n-1)+n-1 transitions in total. In contrast, Example 4 defines one transition for each customer, resulting in n1n-1 transitions in total. This difference is due to the expressiveness of DyPDL: referring to the values of state variables in preconditions and effects enables us to model the same problem with fewer transitions.

Given the above reduction, DyPDL can be considered a superset of the numeric planning formalism in Definition 8, which corresponds to a subset of Planning Domain Definition Language (PDDL) 2.1 [64] called level 2 of PDDL 2.1 [82]. As we will present later, our practical modeling language for DyPDL enables a user to include dominance between states and bounds on the solution cost, which cannot be modeled in PDDL. In contrast, full PDDL 2.1 has additional features such as durative actions, which cannot be modeled in DyPDL.

Helmert [82] and subsequent work [83] proved undecidability for more restricted numeric planning formalisms than Definition 8 such as subclasses of a restricted numeric planning task, where functions in numeric conditions are linear and numeric effects increase or decrease a numeric variable only by a constant. We expect that these results can be easily applied to DyPDL since our reduction is straightforward. Previous work also investigated conditions with which a numeric planning task becomes more tractable, e.g., decidable or PSPACE-complete [82, 84, 85]. We also expect that we can generalize such conditions to DyPDL. However, in this paper, we consider typical properties in DP models for combinatorial optimization problems.

In our TSPTW example, we have a finite number of states; each state is the result of a sequence of visits to a finite number of customers. In addition, such a sequence does not encounter the same state more than once since the cardinality of the set of unvisited customers, UU, is strictly decreasing. We formalize these properties as finiteness and acyclicity. In practice, all DP models presented and evaluated in this paper are finite and acyclic.

First, we introduce reachability. A state SS^{\prime} is reachable from another state SS if there exists a sequence of transitions that transforms SS into SS^{\prime}. The definition is similar to that of an SS-solution.

Definition 10 (Reachability).

Let SS be a state satisfying the state constraints. A state SS^{\prime} is reachable from a state SS with a non-empty sequence of transitions σ=σ1,,σm\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle if the following conditions are satisfied:

  • 1.

    All transitions are applicable, i.e., σ1𝒯(S)\sigma_{1}\in\mathcal{T}(S) and σi+1𝒯(Si)\sigma_{i+1}\in\mathcal{T}(S^{i}) where S1=S[[σ1]]S^{1}=S[\![\sigma_{1}]\!] and Si+1=Si[[σi+1]]S^{i+1}=S^{i}[\![\sigma_{i+1}]\!] for i=1,,m1i=1,...,m-1.

  • 2.

No intermediate state satisfies a base case, i.e., B,Si⊧̸CB\forall B\in\mathcal{B},S^{i}\not\models C_{B} for i=1,,m1i=1,...,m-1.

  • 3.

    All intermediate states satisfy the state constraints, i.e., Si𝒞S^{i}\models\mathcal{C} for i=1,,m1i=1,...,m-1.

  • 4.

    The final state is SS^{\prime}, i.e., S=SmS^{\prime}=S^{m}.

We say that SS^{\prime} is reachable from SS if there exists a non-empty sequence of transitions with the above conditions. We say that SS is a reachable state if it is the target state or reachable from the target state.

Definition 11.

A DyPDL model is finite if it is finitely defined, and the set of reachable states is finite.

Definition 12.

A DyPDL model is acyclic if any reachable state is not reachable from itself.

If a model is finite, we can enumerate all reachable states and check whether a base state is reachable in finite time. If a base state is reachable and the model is acyclic, then there are a finite number of solutions, each with a finite number of transitions. Therefore, by enumerating all sequences of transitions with which a state is reachable, identifying solutions, and computing their costs, we can find an optimal solution in finite time.

Theorem 2.

A finite and acyclic DyPDL model has an optimal solution, or the model is infeasible. The problem of deciding whether there exists a solution whose cost is less (greater) than a given rational number for minimization (maximization) is decidable.

3.2 The Bellman Equation for DyPDL

A Bellman equation succinctly represents a DP model, as we showed with the example DP model for TSPTW (Equation (1)). Here, we make an explicit connection between DyPDL and a Bellman equation. Focusing on finite and acyclic DyPDL models, which are typical in combinatorial optimization problems, we present the Bellman equation representing the optimal solution cost. (While a Bellman equation can be used for a cyclic model by considering a fixed point, we do not consider this case: such models are not used in this paper, and doing so would complicate our discussion.) A Bellman equation requires a special cost structure, the Principle of Optimality [21]. We formally define it in the context of DyPDL. Intuitively, an optimal SS-solution must be constructed from an optimal S[[τ]]S[\![\tau]\!]-solution, where τ\tau is a transition applicable in SS.

Definition 13.

Given any reachable state SS and an applicable transition τ𝒯(S)\tau\in\mathcal{T}(S), a DyPDL model satisfies the Principle of Optimality if for any x,yx,y\in\mathbb{Q}, xy𝖼𝗈𝗌𝗍τ(x,S)𝖼𝗈𝗌𝗍τ(y,S)x\leq y\rightarrow\mathsf{cost}_{\tau}(x,S)\leq\mathsf{cost}_{\tau}(y,S).

With this property, we give the Bellman equation for a DyPDL model, defining the value function VV that maps a state SS to the optimal SS-solution cost, or to \infty (-\infty) for minimization (maximization) if an SS-solution does not exist.

Theorem 3.

Consider a finite, acyclic minimization DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle satisfying the Principle of Optimality. For each reachable state SS, there exists an optimal SS-solution with a finite number of transitions, or there does not exist an SS-solution. Let 𝒮\mathcal{S}^{\prime} be the set of reachable states, and let V:𝒮{}V:\mathcal{S}^{\prime}\to\mathbb{Q}\cup\{\infty\} be a function that returns \infty if there does not exist an SS-solution and the cost of an optimal SS-solution otherwise. Then, VV satisfies the following equation:

V(S)={if S⊧̸𝒞minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)else if B,SCBminτ𝒯(S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)else if τ𝒯(S),V(S[[τ]])<else.V(S)=\begin{cases}\infty&\text{if }S\not\models\mathcal{C}\\ \min\limits_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)&\text{else if }\exists B\in\mathcal{B},S\models C_{B}\\ \min\limits_{\tau\in\mathcal{T}(S)}\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S)&\text{else if }\exists\tau\in\mathcal{T}(S),V(S[\![\tau]\!])<\infty\\ \infty&\text{else.}\end{cases} (2)

For maximization, we replace \infty with -\infty, min\min with max\max, and << with >>.
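Equation (2) can be transcribed directly as a memoized recursion for a finite, acyclic minimization model. The model interface below (a constraint predicate and two callbacks) is a hypothetical sketch for illustration, not the DyPDL modeling language.

```python
import math

def make_solver(satisfies_constraints, base_cases, transitions):
    """satisfies_constraints(S) -> bool (state constraints);
    base_cases(S) -> list of base costs of the base cases S satisfies;
    transitions(S) -> list of (cost_fn, successor) pairs, where cost_fn(x, S)
    combines the successor's value x with the current state S."""
    cache = {}

    def V(S):
        if S in cache:
            return cache[S]
        if not satisfies_constraints(S):
            value = math.inf                            # first line of Eq. (2)
        elif base_cases(S):
            value = min(base_cases(S))                  # second line
        else:
            value = min((cost_fn(V(T), S)               # third line, with the
                         for cost_fn, T in transitions(S)
                         if V(T) < math.inf),
                        default=math.inf)               # fourth line as default
        cache[S] = value
        return value

    return V
```

For instance, a chain model in which state n takes n unit-cost steps to reach the base state 0 yields V(n) = n.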

Proof.

Since the model is acyclic, we can define a partial order over reachable states where SS precedes its successor state S[[τ]]S[\![\tau]\!] if τ𝒯(S)\tau\in\mathcal{T}(S). We can sort reachable states topologically according to this order. Since the set of reachable states is finite, there exists a state that does not precede any reachable state. Let SS be such a state. Then, one of the following holds: S⊧̸𝒞S\not\models\mathcal{C}, SS is a base state, or 𝒯(S)=\mathcal{T}(S)=\emptyset by Definition 10. If S⊧̸𝒞S\not\models\mathcal{C}, there does not exist an SS-solution and V(S)=V(S)=\infty, which is consistent with the first line of Equation (2). If SS satisfies a base case, since the only SS-solution is an empty sequence by Definition 6, V(S)=minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)V(S)=\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S), which is consistent with the second line of Equation (2). If SS is not a base state and 𝒯(S)=\mathcal{T}(S)=\emptyset, then there does not exist an SS-solution and V(S)=V(S)=\infty, which is consistent with the fourth line of Equation (2).

Assume that for each reachable state S[[τ]]S[\![\tau]\!] preceded by a reachable state SS in the topological order, one of the following conditions holds:

  1. 1.

    There does not exist an S[[τ]]S[\![\tau]\!]-solution, i.e., V(S[[τ]])=V(S[\![\tau]\!])=\infty.

  2. 2.

    There exists an optimal S[[τ]]S[\![\tau]\!]-solution with a finite number of transitions with cost V(S[[τ]])<V(S[\![\tau]\!])<\infty.

If the first case holds for each τ𝒯(S)\tau\in\mathcal{T}(S), there does not exist an SS-solution, and V(S)=V(S)=\infty. Since V(S[[τ]])=V(S[\![\tau]\!])=\infty for each τ\tau, V(S)=V(S)=\infty is consistent with the fourth line of Equation (2). The first case of the assumption also holds for SS. If the second case holds for some τ\tau, there exists an optimal S[[τ]]S[\![\tau]\!]-solution with a finite number of transitions and cost V(S[[τ]])<V(S[\![\tau]\!])<\infty. By concatenating τ\tau and the optimal S[[τ]]S[\![\tau]\!]-solution, we can construct an SS-solution with a finite number of transitions and cost 𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S). This SS-solution is at least as good as any other SS-solution σ=σ1,,σm\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle starting with σ1=τ\sigma_{1}=\tau since

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)=𝖼𝗈𝗌𝗍τ(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,,σm,S[[τ]]),S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)\mathsf{solution\_cost}(\sigma,S)=\mathsf{cost}_{\tau}(\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S[\![\tau]\!]),S)\geq\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S)

by the Principle of Optimality (Definition 13). By considering all possible τ\tau,

V(S)=minτ𝒯(S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S),V(S)=\min\limits_{\tau\in\mathcal{T}(S)}\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S),

which is consistent with the third line of Equation (2). The second case of the assumption also holds for SS. We can prove the theorem by mathematical induction. The proof for maximization is similar. ∎

3.3 Redundant Information

In AI planning, a model typically includes only information necessary to define a problem [63, 86]. In contrast, in operations research (OR), an optimization model often includes redundant information implied by the other parts of the model, e.g., valid inequalities in MIP. While such information is unnecessary to define a model, it sometimes improves the performance of solvers. Redundant information has also been exploited by some problem-specific DP methods in previous work [23, 48, 29]. For example, Dumas et al. [23] used redundant information implied by the Bellman equation for TSPTW (Equation (1)). Given a state (U,i,t)(U,i,t), if a customer jUj\in U cannot be visited by the deadline bjb_{j} even if we use the shortest path with travel time cijc^{*}_{ij}, then the state does not lead to a solution, so it is ignored. While this technique was algorithmically used in previous work, we can represent it declaratively using the value function and formulate it as a state constraint.

V(U,i,t)= if jU,t+cij>bj.V(U,i,t)=\infty\text{ if }\exists j\in U,t+c^{*}_{ij}>b_{j}. (3)
Example 6.

In our TSPTW example, Inequality (3) can be represented by a set of state constraints {SjS[U]S[t]+cS[i],jbjjN{0}}\{S\mapsto j\not\in S[U]\lor S[t]+c^{*}_{S[i],j}\leq b_{j}\mid j\in N\setminus\{0\}\}.
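A minimal sketch of this state constraint as a pruning test, assuming precomputed shortest-path travel times c* and deadlines b are given as plain dictionaries (hypothetical instance data):

```python
def satisfies_state_constraints(U, i, t, c_star, b):
    """Equation (3) as a pruning test: state (U, i, t) is kept only if every
    unvisited customer j can still be reached by its deadline b[j] via the
    shortest-path travel time c_star[i][j]."""
    return all(t + c_star[i][j] <= b[j] for j in U)
```

A state failing this test cannot lead to a solution and can be discarded by a solver.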

While we have represented Equation (3) using state constraints, we can also consider other types of redundant information. We introduce three new concepts to declaratively represent such information: state dominance, dual bound functions, and forced transitions.

3.3.1 State Dominance

Another technique used by Dumas et al. [23] is state dominance. If we have two states (U,i,t)(U,i,t) and (U,i,t)(U,i,t^{\prime}) with ttt\leq t^{\prime}, then (U,i,t)(U,i,t) leads to at least as good a solution as (U,i,t)(U,i,t^{\prime}), so the latter is ignored. In terms of the value function:

V(U,i,t)V(U,i,t) if tt.V(U,i,t)\leq V(U,i,t^{\prime})\text{ if }t\leq t^{\prime}. (4)

We define state dominance for DyPDL. Intuitively, one state SS dominates another state SS^{\prime} if SS always leads to an equally good or better solution, and thus SS^{\prime} can be ignored when we have SS. In addition, we require that SS leads to a solution that is no longer than that of SS^{\prime}. This condition ensures that a state SS does not dominate its successor S[[τ]]S[\![\tau]\!] on an optimal solution; S[[τ]]S[\![\tau]\!] should not be ignored even if we have SS since the optimal SS-solution cannot be found without considering S[[τ]]S[\![\tau]\!].

Definition 14 (State Dominance).

For a minimization DyPDL model, a state SS dominates another state SS^{\prime}, denoted by SSS^{\prime}\preceq S, iff, for any SS^{\prime}-solution σ\sigma^{\prime}, there exists an SS-solution σ\sigma such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\mathsf{solution\_cost}(\sigma,S)\leq\mathsf{solution\_cost}(\sigma^{\prime},S^{\prime}) and σ\sigma has no more transitions than σ\sigma^{\prime}. For maximization, we replace \leq with \geq.

Our definition of dominance is inspired by simulation-based dominance in AI planning [87]. In that paradigm, SS dominates SS^{\prime} only if for each applicable transition τ\tau^{\prime} in SS^{\prime}, there exists an applicable transition τ\tau in SS such that S[[τ]]S[\![\tau]\!] dominates S[[τ]]S^{\prime}[\![\tau^{\prime}]\!]. Also, if SS dominates a base state SS^{\prime} (a goal state in planning terminology), SS is also a base state. In addition to the original transitions, simulation-based dominance adds a NOOP transition, which stays in the current state. In simulation-based dominance, if we have an SS^{\prime}-solution, we also have an SS-solution with an equal number of transitions (or possibly fewer transitions if we exclude NOOP). Therefore, intuitively, Definition 14 is a generalization of simulation-based dominance; a formal discussion is out of the scope of this paper.

In practice, it may be difficult to always detect whether one state dominates another, and thus an algorithm may focus on dominance that can be easily detected based on a sufficient condition, e.g., ttt\leq t^{\prime} for (U,i,t)(U,i,t) and (U,i,t)(U,i,t^{\prime}). We define an approximate dominance relation to represent such a strategy. First, we clarify the conditions that should be satisfied by an approximate dominance relation.

Theorem 4.

For a DyPDL model, the dominance relation is a preorder, i.e., the following conditions hold.

  • 1.

    SSS\preceq S for a state SS (reflexivity).

  • 2.

    S′′SSSS′′SS^{\prime\prime}\preceq S^{\prime}\land S^{\prime}\preceq S\rightarrow S^{\prime\prime}\preceq S for states SS, SS^{\prime}, and S′′S^{\prime\prime} (transitivity).

Proof.

The first condition holds by Definition 14. For the second condition, for any S′′S^{\prime\prime}-solution σ′′\sigma^{\prime\prime}, there exists an equally good or better SS^{\prime}-solution σ\sigma^{\prime} with no more transitions than σ′′\sigma^{\prime\prime}. In turn, there exists an equally good or better SS-solution σ\sigma with no more transitions than σ\sigma^{\prime}. Therefore, the cost of σ\sigma is equal to or better than that of σ′′\sigma^{\prime\prime}, and σ\sigma has no more transitions than σ′′\sigma^{\prime\prime}. ∎

Definition 15.

For a DyPDL model, an approximate dominance relation a\preceq_{a} is a preorder over two states such that SaSSSS^{\prime}\preceq_{a}S\rightarrow S^{\prime}\preceq S for reachable states SS and SS^{\prime}.

Example 7.

For our TSPTW example, Inequality (4) is represented by an approximate dominance relation such that SaSS^{\prime}\preceq_{a}S iff S[U]=S[U]S[U]=S^{\prime}[U], S[i]=S[i]S[i]=S^{\prime}[i], and S[t]S[t]S[t]\leq S^{\prime}[t].

  • 1.

    The reflexivity holds since S[U]=S[U]S[U]=S[U], S[i]=S[i]S[i]=S[i], and S[t]S[t]S[t]\leq S[t].

  • 2.

    The transitivity holds since S[U]=S[U]S[U]=S^{\prime}[U] and S[U]=S′′[U]S^{\prime}[U]=S^{\prime\prime}[U] imply S[U]=S′′[U]S[U]=S^{\prime\prime}[U], S[i]=S[i]S[i]=S^{\prime}[i] and S[i]=S′′[i]S^{\prime}[i]=S^{\prime\prime}[i] imply S[i]=S′′[i]S[i]=S^{\prime\prime}[i], and S[t]S[t]S[t]\leq S^{\prime}[t] and S[t]S′′[t]S^{\prime}[t]\leq S^{\prime\prime}[t] imply S[t]S′′[t]S[t]\leq S^{\prime\prime}[t].

  • 3.

    All SS- and SS^{\prime}-solutions have the same number of transitions since each solution has |S[U]|=|S[U]||S[U]|=|S^{\prime}[U]| transitions to visit all unvisited customers.

An approximate dominance relation is sound but not complete: it always detects dominance when two states are identical; otherwise, it may produce a false negative but never a false positive.
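The approximate dominance relation of Example 7 amounts to a three-way comparison. A minimal sketch, representing a state as a tuple (U, i, t):

```python
def dominates(S, S_prime):
    """Approximate dominance for TSPTW states (U, i, t), per Example 7:
    S dominates S_prime iff both have the same unvisited set and location,
    and S arrives no later. A sound but incomplete check."""
    U, i, t = S
    U2, i2, t2 = S_prime
    return U == U2 and i == i2 and t <= t2
```

The check is reflexive and transitive by construction, as required by Theorem 4.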

Similarly to Inequality (4), we can represent an approximate dominance relation in DyPDL using the value function. In what follows, we assume that <x<-\infty<x<\infty for any xx\in\mathbb{Q}.

Theorem 5.

Given a finite and acyclic DyPDL model satisfying the Principle of Optimality, let VV be the value function of the Bellman equation for minimization. Given an approximate dominance relation a\preceq_{a}, for reachable states SS and SS^{\prime},

V(S)V(S) if SaS.V(S)\leq V(S^{\prime})\text{ if }S^{\prime}\preceq_{a}S. (5)

For maximization, we replace \leq with \geq.

Proof.

For reachable states SS and SS^{\prime} with SaSS^{\prime}\preceq_{a}S, assume that there exist SS- and SS^{\prime}-solutions. Then, V(S)V(S) (V(S)V(S^{\prime})) is the cost of an optimal SS-solution (SS^{\prime}-solution). By Definitions 14 and 15, for minimization, an optimal SS-solution has an equal or smaller cost than any SS^{\prime}-solution, so V(S)V(S)V(S)\leq V(S^{\prime}). If there does not exist an SS-solution, by Definition 14, there does not exist an SS^{\prime}-solution, so V(S)=V(S)=V(S)=V(S^{\prime})=\infty. If there does not exist an SS^{\prime}-solution, V(S)V(S)=V(S)\leq V(S^{\prime})=\infty. The proof for the maximization is similar. ∎

3.3.2 Dual Bound Function

While not used by Dumas et al. [23], bounds on the value function have been used by previous problem-specific DP methods [48, 29]. When we know an upper bound on the optimal solution cost for minimization, and we can prove that a state cannot lead to a solution with its cost less than the upper bound, we can ignore the state. For our TSPTW example, we define a lower bound function based on the one used for a sequential ordering problem by previous work [88]. Since the minimum travel time to visit customer jj is cjin=minkN{j}ckjc^{\text{in}}_{j}=\min_{k\in N\setminus\{j\}}c_{kj}, the cost of visiting all customers in UU and returning to the depot is at least the sum of cjinc^{\text{in}}_{j} for each jU{0}j\in U\cup\{0\}. Similarly, we also use the minimum travel time from jj, cjout=minkN{j}cjkc^{\text{out}}_{j}=\min_{k\in N\setminus\{j\}}c_{jk}, to underestimate the total travel time. Using the value function,

V(U,i,t)max{jU{0}cjin,jU{i}cjout}.V(U,i,t)\geq\max\left\{\sum_{j\in U\cup\{0\}}c^{\text{in}}_{j},\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}\right\}. (6)

We formalize this technique as a dual bound function that underestimates (overestimates) the cost of a solution in minimization (maximization).

Definition 16.

For a DyPDL model, a function η:𝒮{,}\eta:\mathcal{S}\to\mathbb{Q}\cup\{\infty,-\infty\} is a dual bound function iff, for any reachable state SS and any SS-solution σ\sigma, η(S)\eta(S) is a dual bound on 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\mathsf{solution\_cost}(\sigma,S), i.e., η(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\eta(S)\leq\mathsf{solution\_cost}(\sigma,S) for minimization. For maximization, we replace \leq with \geq.

Example 8.

For our TSPTW example, Inequality (6) is represented as η(S)=max{jS[U]{0}cjin,jS[U]{S[i]}cjout}\eta(S)=\max\left\{\sum\limits_{j\in S[U]\cup\{0\}}c^{\text{in}}_{j},\sum\limits_{j\in S[U]\cup\{S[i]\}}c^{\text{out}}_{j}\right\}.

A function that always returns -\infty (\infty) for minimization (maximization) is trivially a dual bound function. If there exists an SS-solution σ\sigma for minimization, η(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)<\eta(S)\leq\mathsf{solution\_cost}(\sigma,S)<\infty. Otherwise, η(S)\eta(S) can be any value, including \infty. Thus, if a dual bound function can detect that an SS-solution does not exist, the function should return \infty to tell a solver that there is no SS-solution. For maximization, a dual bound function should return -\infty in such a case.
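Inequality (6) can be sketched as follows, assuming the travel times c are given as a nested dictionary (hypothetical instance data). The incoming and outgoing minima are computed on the fly here, though in practice they would be precomputed:

```python
def dual_bound(U, i, c):
    """Inequality (6): a lower bound on the remaining tour cost from state
    (U, i, t), using the cheapest incoming and outgoing edge per customer.
    c[j][k] is the travel time from j to k."""
    nodes = set(c)
    c_in = {j: min(c[k][j] for k in nodes - {j}) for j in nodes}
    c_out = {j: min(c[j][k] for k in nodes - {j}) for j in nodes}
    return max(sum(c_in[j] for j in U | {0}),
               sum(c_out[j] for j in U | {i}))
```

On a symmetric three-node instance with edge costs 2, 3, and 4, the bound from the depot with both customers unvisited is 7, while the only tour costs 9, so the bound is valid but not tight.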

Theorem 6.

Given a finite and acyclic DyPDL model satisfying the Principle of Optimality, let VV be the value function of the Bellman equation for minimization. Given a dual bound function η\eta, for a reachable state SS,

V(S)η(S).V(S)\geq\eta(S). (7)

For maximization, we replace \geq with \leq.

Proof.

For a reachable state SS, if there exists an SS-solution, the cost of an optimal SS-solution is V(S)V(S). By Definition 16, η(S)\eta(S) is a lower bound of the cost of any SS-solution, so η(S)V(S)\eta(S)\leq V(S). Otherwise, η(S)V(S)=\eta(S)\leq V(S)=\infty. The proof for maximization is similar. ∎

3.3.3 Forced Transitions

We introduce yet another type of redundant information, forced transitions. Since forced transitions are not identified in our TSPTW example, we first present a motivating example, the talent scheduling problem [2]. In this problem, a set of actors AA and a set of scenes NN are given, and the objective is to find a sequence in which to shoot the scenes that minimizes the total cost paid to the actors. In a scene sNs\in N, a set of actors AsAA_{s}\subseteq A plays for dsd_{s} days. We incur the cost cac_{a} of actor aa for each day they are on location. If an actor plays on days ii and jj, they are on location on days i,i+1,,ji,i+1,...,j even if they do not play on days i+1i+1 to j1j-1.

Garcia de la Banda et al. [29] proposed a DP method for talent scheduling, where a state is represented by a set of unscheduled scenes QQ. The actors who played in scenes NQN\setminus Q have already arrived at the location and are staying there if they play in a scene in QQ. Therefore, the set of actors on location is

L(Q)=(sQAssNQAs).L(Q)=\left(\bigcup_{s\in Q}A_{s}\cap\bigcup_{s\in N\setminus Q}A_{s}\right).

At each step, we shoot a scene ss from QQ and pay the cost dsaAsL(Q)cad_{s}\sum_{a\in A_{s}\cup L(Q)}c_{a}. The Bellman equation is as follows:

V(Q)={0if Q=minsQdsaAsL(Q)ca+V(Q{s})if Q.V(Q)=\begin{cases}0&\text{if }Q=\emptyset\\ \min\limits_{s\in Q}d_{s}\sum_{a\in A_{s}\cup L(Q)}c_{a}+V(Q\setminus\{s\})&\text{if }Q\neq\emptyset.\end{cases} (8)

In this model, we can shoot a scene ss without paying an extra cost if As=L(Q)A_{s}=L(Q). Therefore, if such a scene exists, we should shoot it next, ignoring other scenes. Using bs=dsaAscab_{s}=d_{s}\sum_{a\in A_{s}}c_{a}, which can be precomputed,

V(Q)=bs+V(Q{s}) if sQ,As=L(Q).V(Q)=b_{s}+V(Q\setminus\{s\})\text{ if }\exists s\in Q,A_{s}=L(Q). (9)

We can exploit more redundant information in the preconditions of the transitions and a dual bound function, following Garcia de la Banda et al. [29]. We present the complete model with such information in Appendix B.
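To make the recursion concrete, the following is a minimal Python sketch of the talent scheduling value function, combining the general case of Equation (8) with the forced-transition shortcut of Equation (9). The three-scene, two-actor instance data is made up for illustration and is not from the paper.

```python
from functools import lru_cache

# Illustrative instance (not from the paper): three scenes, two actors.
actors = {0: {"x"}, 1: {"x", "y"}, 2: {"y"}}  # A_s for each scene s
days = {0: 1, 1: 2, 2: 1}                     # d_s
cost = {"x": 3, "y": 2}                       # c_a

def on_location(q: frozenset) -> set:
    """L(Q): actors who already arrived (played in N \\ Q) and still play in Q."""
    played = set().union(*(actors[s] for s in actors.keys() - q))
    needed = set().union(*(actors[s] for s in q))
    return played & needed

@lru_cache(maxsize=None)
def v(q: frozenset) -> int:
    """V(Q) following Equation (8), with the forced transition of Equation (9)."""
    if not q:
        return 0
    loc = on_location(q)
    for s in q:
        if actors[s] == loc:  # forced: shooting s next incurs no extra cost
            return days[s] * sum(cost[a] for a in actors[s]) + v(q - {s})
    return min(days[s] * sum(cost[a] for a in actors[s] | loc) + v(q - {s})
               for s in q)

print(v(frozenset(actors)))  # → 15
```

Note that whenever the forced-transition branch fires, the minimization over all scenes in QQ is skipped entirely, which is exactly the state space reduction the redundant information buys.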

We formalize Equation (9) as forced transitions in DyPDL.

Definition 17.

Given a set of applicable transitions 𝒯(S)\mathcal{T}(S) in a state SS, a transition τ𝒯(S)\tau\in\mathcal{T}(S) is a forced transition if an SS-solution does not exist, or if, for each SS-solution σ\sigma^{\prime}, there exists an SS-solution σ\sigma whose first transition is τ\tau and that satisfies 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)\mathsf{solution\_cost}(\sigma,S)\leq\mathsf{solution\_cost}(\sigma^{\prime},S) for minimization. For maximization, we replace \leq with \geq.

Theorem 7.

Given a finite and acyclic DyPDL model satisfying the Principle of Optimality, let VV be the value function of the Bellman equation. Let SS be a reachable state with V(S)<V(S)<\infty for minimization or V(S)>V(S)>-\infty for maximization, and let τ\tau be a forced transition applicable in SS. Then,

V(S)=𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S).V(S)=\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S). (10)
Proof.

We assume minimization, and the proof for maximization is similar. Since V(S)<V(S)<\infty, there exists an SS-solution. Since the model is finite and acyclic, there are a finite number of SS-solutions, and we can find an optimal SS-solution. By Definition 17, there exists an optimal SS-solution τ,σ1,,σm\langle\tau,\sigma_{1},...,\sigma_{m}\rangle. Since V(S)V(S) is the cost of an optimal SS-solution,

V(S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ1,,σm,S)=𝖼𝗈𝗌𝗍τ(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,,σm,S[[τ]]),S).V(S)=\mathsf{solution\_cost}(\langle\tau,\sigma_{1},...,\sigma_{m}\rangle,S)=\mathsf{cost}_{\tau}(\mathsf{solution\_cost}(\langle\sigma_{1},...,\sigma_{m}\rangle,S[\![\tau]\!]),S).

Since an S[[τ]]S[\![\tau]\!]-solution σ1,,σm\langle\sigma_{1},...,\sigma_{m}\rangle exists, there exists an optimal S[[τ]]S[\![\tau]\!]-solution σ1,,σm\langle\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle with cost V(S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,,σm,S[[τ]])V(S[\![\tau]\!])\leq\mathsf{solution\_cost}(\langle\sigma_{1},...,\sigma_{m}\rangle,S[\![\tau]\!]). By the Principle of Optimality (Definition 13),

V(S)𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S).V(S)\geq\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S).

Since τ,σ1,,σm\langle\tau,\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle is also an SS-solution,

V(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ1,,σm,S)=𝖼𝗈𝗌𝗍τ(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,,σm,S[[τ]]),S)=𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S).V(S)\leq\mathsf{solution\_cost}(\langle\tau,\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle,S)=\mathsf{cost}_{\tau}(\mathsf{solution\_cost}(\langle\sigma^{\prime}_{1},...,\sigma^{\prime}_{m^{\prime}}\rangle,S[\![\tau]\!]),S)=\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S).

Therefore, V(S)=𝖼𝗈𝗌𝗍τ(V(S[[τ]]),S)V(S)=\mathsf{cost}_{\tau}(V(S[\![\tau]\!]),S). ∎

4 YAML-DyPDL: A Practical Modeling Language

As a practical modeling language for DyPDL, we propose YAML-DyPDL on top of a data serialization language, YAML 1.2 (https://yaml.org/). YAML-DyPDL is inspired by PDDL in AI planning [63]. However, in PDDL, a model typically contains only information necessary to define a problem, while YAML-DyPDL allows a user to explicitly model redundant information, i.e., implications of the definition. Modeling such redundant information is a standard convention in OR and is commonly exploited in problem-specific DP algorithms for combinatorial optimization (e.g., Dumas et al. [23]). In particular, in YAML-DyPDL, a user can explicitly define an approximate dominance relation (Definition 15), dual bound functions (Definition 16), and forced transitions (Definition 17). In PDDL, while forced transitions may be realized with preconditions of actions, an approximate dominance relation and dual bound functions cannot be modeled.

In the DyPDL formalism, expressions and conditions are defined as functions. In a practical implementation, the kinds of functions that can be used as expressions are defined by the syntax of a modeling language. In YAML-DyPDL, for example, arithmetic operations (e.g., addition, subtraction, multiplication, and division) and set operations (e.g., adding an element, removing an element, union, intersection, and difference) using state variables can be used. We give an example of YAML-DyPDL here. A detailed description of the syntax is given as software documentation in our repository (https://github.com/domain-independent-dp/didp-rs/blob/main/didp-yaml/docs/dypdl-guide.md).

4.1 Example

```yaml
cost_type: integer
reduce: min
objects:
  - customer
state_variables:
  - name: U
    type: set
    object: customer
  - name: i
    type: element
    object: customer
  - name: t
    type: integer
    preference: less
tables:
  - name: a
    type: integer
    args:
      - customer
  - name: b
    type: integer
    args:
      - customer
  - name: c
    type: integer
    args:
      - customer
      - customer
  - name: cstar
    type: integer
    args:
      - customer
      - customer
  - name: cin
    type: integer
    args:
      - customer
  - name: cout
    type: integer
    args:
      - customer
transitions:
  - name: visit
    parameters:
      - name: j
        object: U
    effect:
      U: (remove j U)
      i: j
      t: (max (+ t (c i j)) (a j))
    cost: (+ (c i j) cost)
    preconditions:
      - (<= (+ t (c i j)) (b j))
constraints:
  - condition: (<= (+ t (cstar i j)) (b j))
    forall:
      - name: j
        object: U
base_cases:
  - conditions:
      - (is_empty U)
    cost: (c i 0)
dual_bounds:
  - (+ (sum cin U) (cin 0))
  - (+ (sum cout U) (cout i))
```
Figure 1: YAML-DyPDL domain file for TSPTW.

We present how the DyPDL model in Example 4 is described by YAML-DyPDL. Following PDDL, we require two files, a domain file and a problem file, to define a DyPDL model. A domain file describes a class of problems by declaring state variables and constants and defining transitions, base cases, and dual bound functions using expressions. In contrast, a problem file describes one problem instance by defining information specific to that instance, e.g., the target state and the values of constants.

Figure 1 shows the domain file for the DyPDL model of TSPTW. The domain file is a map in YAML, which associates keys with values. In YAML, a key and a value are separated by :. Keys and values can be maps, lists of values, strings, integers, and floating-point numbers. A list is described by multiple lines starting with -, and each value after - is an element of the list. In YAML, we can also use a JSON-like syntax (https://www.json.org/json-en.html), where a map is described as { key_1: value_1, …, key_n: value_n }, and a list is described as [value_1, …, value_n].

4.1.1 Cost Type

The first line defines key cost_type and its value integer, meaning that the cost of the DyPDL model is computed in integers. While the DyPDL formalism considers numeric expressions that return a rational number, in a software implementation, it is beneficial to differentiate integer and continuous values. In YAML-DyPDL, we explicitly divide numeric expressions into integer and continuous expressions. The value of the key reduce is min, which means that we want to minimize the cost.

4.1.2 Object Types

The key objects, whose value is a list of strings, defines object types. In the example, the list only contains one value, customer. An object type is associated with a set of nn nonnegative integers {0,,n1}\{0,...,n-1\}, where nn is defined in a problem file. The customer object type represents a set of customers N={0,,n1}N=\{0,...,n-1\} in TSPTW. The object type is used later to define a set variable and constants.

4.1.3 State Variables

The key state_variables defines state variables. The value is a list of maps describing a state variable. For each state variable, we have key name defining the name and key type defining the type, which is either element, set, integer, or continuous.

The variable U is the set variable UU representing the set of unvisited customers. YAML-DyPDL requires associating a set variable with an object type. The variable UU is associated with the object type, customer, by object: customer. Then, the domain of UU is restricted to 2N2^{N}. This requirement arises from practical implementations of set variables; we want to know the maximum cardinality of a set variable to efficiently represent it in a computer program (e.g., using a fixed length bit vector).
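The fixed-length bit-vector idea mentioned above can be sketched in a few lines; this is an illustration of why the maximum cardinality must be known up front, not the actual DIDP implementation.

```python
# Sketch: a set variable over n objects stored as an n-bit mask.
# Knowing n in advance lets a fixed-width integer represent any subset of 2^N.
n = 4
U = 0b1110  # the set {1, 2, 3}: bit j is 1 iff object j is in the set

def remove(mask: int, j: int) -> int:
    """Set difference with a singleton: mask \\ {j}."""
    return mask & ~(1 << j)

def contains(mask: int, j: int) -> bool:
    """Membership test: j in mask."""
    return bool((mask >> j) & 1)

print(bin(remove(U, 2)), contains(U, 3))  # → 0b1010 True
```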

The variable i is the element variable ii representing the current location. YAML-DyPDL also requires associating an element variable with an object type for readability; by associating an element variable with an object type, it is easier to understand the meaning of the variable. However, the domain of the element variable is not restricted by the number of objects, nn; while objects are indexed from 0 to n1n-1, a user may want to use nn to represent none of them.

The variable t is the numeric variable tt representing the current time. For this variable, the preference is defined by preference: less, which means that a state having smaller tt dominates another state if UU and ii are the same. Such a variable is called a resource variable. Resource variables define an approximate dominance relation in Definition 15: given two states SS and SS^{\prime}, if S[v]S[v]S[v]\geq S^{\prime}[v] for each resource variable vv where greater is preferred (preference: greater), S[v]S[v]S[v]\leq S^{\prime}[v] for each resource variable vv where less is preferred (preference: less), and S[v]=S[v]S[v]=S^{\prime}[v] for each non-resource variable vv, then SS dominates SS^{\prime}. This relation trivially satisfies reflexivity and transitivity, and thus it is a preorder.
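The induced dominance check can be sketched in Python as follows; the dictionary-based state representation and the preference map are our own illustration, not YAML-DyPDL syntax.

```python
# Approximate dominance induced by resource variables (Definition 15):
# resource variables are compared according to their preference;
# all non-resource variables must be equal.
preference = {"t": "less"}  # as in the TSPTW model: smaller t is preferred

def dominates(s: dict, s_prime: dict) -> bool:
    """Return True if state s dominates s_prime."""
    for var, value in s.items():
        if var in preference:
            if preference[var] == "less" and value > s_prime[var]:
                return False
            if preference[var] == "greater" and value < s_prime[var]:
                return False
        elif value != s_prime[var]:
            return False
    return True

s1 = {"U": frozenset({1, 3}), "i": 2, "t": 6}
s2 = {"U": frozenset({1, 3}), "i": 2, "t": 9}
print(dominates(s1, s2), dominates(s2, s1))  # → True False
```

Since each variable is compared by equality or a total order, the relation is reflexive and transitive by construction, matching the preorder claim above.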

4.1.4 Tables

The value of the key tables is a list of maps declaring tables of constants. A table maps a tuple of objects to a constant. The table a represents the beginning of the time window aja_{j} at customer jj, so the values in the table are integers (type: integer). The concrete values are given in a problem file. The key args defines the object types associated with a table using a list. For a, one customer jj is associated with the value aja_{j}, so the list contains only one string customer. The tables b, cin, and cout are defined for the deadline bjb_{j}, the minimum travel time to a customer cjinc^{\text{in}}_{j}, and the minimum travel time from a customer cjoutc^{\text{out}}_{j}, respectively. The table c is for ckjc_{kj}, the travel time from customer kk to jj. This table maps a pair of customers to an integer value, so the value of args is a list equivalent to [customer, customer]. Similarly, the shortest travel time ckjc^{*}_{kj} is represented by the table cstar.

4.1.5 Transitions

The value of the key transitions is a list of maps defining transitions. Using parameters, we can define multiple transitions in the same scheme but associated with different objects. The key name defines the name of the parameter, j, and object defines the object type. Normally, the value of the key object is the name of an object type, e.g., customer. However, we can also use the name of a set variable. In the example, by using object: U, we state that the transition is defined for each object jNj\in N with a precondition jUj\in U.

The key preconditions defines preconditions by using a list of conditions. In YAML-DyPDL, conditions and expressions are described by arithmetic operations in a LISP-like syntax. In the precondition of the transition in our example, (c i j) corresponds to cijc_{ij}, so (<= (+ t (c i j)) (b j)) corresponds to t+cijbjt+c_{ij}\leq b_{j}. The key effect defines the effect by using a map, whose keys are names of the state variables. For set variable U, the value is a set expression (remove j U), corresponding to U{j}U\setminus\{j\}. For element variable i, the value is an element expression j, corresponding to jj. For integer variable t, the value is an integer expression (max (+ t (c i j)) (a j)), corresponding to max{t+cij,aj}\max\{t+c_{ij},a_{j}\}. The key cost defines the cost expression (+ (c i j) cost), corresponding to cij+xc_{ij}+x. In the example, the cost expression must be an integer expression since the cost_type is integer. In the cost expression, we can use cost to represent the cost of the successor state (xx). We can also have a key forced, whose value is Boolean, indicating that the transition is known to be a forced transition when it is applicable. We do not have it in the example, which means the transition is not known to be forced.
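The effect, precondition, and cost expression of visit can be mirrored in a few lines of Python; the successor function below is our own illustration using the instance data from Figure 2, not the DIDP API.

```python
# Applying transition `visit j` to a TSPTW state (U, i, t), following the
# effects and precondition in the domain file. Data is from the problem file.
a = {1: 5, 2: 0, 3: 8}        # time window openings a_j
b = {1: 16, 2: 10, 3: 14}     # deadlines b_j
c = {(0, 1): 3, (0, 2): 4, (0, 3): 5,
     (1, 0): 3, (1, 2): 5, (1, 3): 4,
     (2, 0): 4, (2, 1): 5, (2, 3): 3,
     (3, 0): 5, (3, 1): 4, (3, 2): 3}

def visit(state, j):
    """Return (successor, edge weight), or None if a precondition fails."""
    U, i, t = state
    if j not in U or t + c[i, j] > b[j]:  # precondition (<= (+ t (c i j)) (b j))
        return None
    successor = (U - {j}, j, max(t + c[i, j], a[j]))  # effects on U, i, and t
    return successor, c[i, j]                         # cost adds c_ij

target = (frozenset({1, 2, 3}), 0, 0)  # target state: all customers unvisited
succ, weight = visit(target, 2)
print(succ, weight)
```

Visiting customer 2 from the target state yields the state (U = {1, 3}, i = 2, t = max(0 + 4, 0) = 4) with edge weight 4.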

4.1.6 State Constraints

The value of the key constraints is a list of state constraints. In the DyPDL model, we have jU,t+cijbj\forall j\in U,t+c^{*}_{ij}\leq b_{j}. Similarly to the definition of transitions, we can define multiple state constraints with the same scheme associated with different objects using forall. The value of the key forall is a map defining the name of the parameter and the associated object type or set variable. The value of the key condition is a string describing the condition, (<= (+ t (cstar i j)) (b j)), which uses the parameter j.

4.1.7 Base Cases

The value of the key base_cases is a list of maps defining base cases. Each map has two keys, conditions and cost. The value of the key conditions is a list of conditions, and the value of the key cost is a numeric expression (must be an integer expression in the example since cost_type is integer). The condition (is_empty U) corresponds to U=U=\emptyset, and the cost (c i 0) corresponds to ci0c_{i0}.

4.1.8 Dual Bound Functions

The value of the key dual_bounds is a list of numeric expressions describing dual bound functions. In the example, we use (+ (sum cin U) (cin 0)) and (+ (sum cout U) (cout i)) corresponding to jUcjin+c0in=jU{0}cjin\sum_{j\in U}c^{\text{in}}_{j}+c^{\text{in}}_{0}=\sum_{j\in U\cup\{0\}}c^{\text{in}}_{j} and jUcjout+ciout=jU{i}cjout\sum_{j\in U}c^{\text{out}}_{j}+c^{\text{out}}_{i}=\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}, respectively. Since cost_type is integer, they are integer expressions.
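These expressions are straightforward to mirror directly. Since each expression listed under dual_bounds is a valid lower bound on the remaining cost, their maximum is also a valid lower bound; the sketch below returns that maximum (our own illustration with the Figure 2 data, not the DIDP API).

```python
# Evaluating the two dual bound expressions on a TSPTW state (minimization).
# Data is from the problem file in Figure 2.
cin = {0: 3, 1: 3, 2: 3, 3: 3}   # minimum travel time into each customer
cout = {0: 3, 1: 3, 2: 3, 3: 3}  # minimum travel time out of each customer

def eta(U, i):
    """Tightest of the two dual bounds for state (U, i, t); t is irrelevant."""
    bound_in = sum(cin[j] for j in U) + cin[0]     # (+ (sum cin U) (cin 0))
    bound_out = sum(cout[j] for j in U) + cout[i]  # (+ (sum cout U) (cout i))
    return max(bound_in, bound_out)

print(eta(frozenset({1, 2, 3}), 0))  # → 12
```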

4.1.9 Problem File

```yaml
object_numbers:
  customer: 4
target:
  U: [1, 2, 3]
  i: 0
  t: 0
table_values:
  a: { 1: 5, 2: 0, 3: 8 }
  b: { 1: 16, 2: 10, 3: 14 }
  c:
    {
      [0, 1]: 3, [0, 2]: 4, [0, 3]: 5,
      [1, 0]: 3, [1, 2]: 5, [1, 3]: 4,
      [2, 0]: 4, [2, 1]: 5, [2, 3]: 3,
      [3, 0]: 5, [3, 1]: 4, [3, 2]: 3,
    }
  cstar:
    {
      [0, 1]: 3, [0, 2]: 4, [0, 3]: 5,
      [1, 0]: 3, [1, 2]: 5, [1, 3]: 4,
      [2, 0]: 4, [2, 1]: 5, [2, 3]: 3,
      [3, 0]: 5, [3, 1]: 4, [3, 2]: 3,
    }
  cin: { 0: 3, 1: 3, 2: 3, 3: 3 }
  cout: { 0: 3, 1: 3, 2: 3, 3: 3 }
```

Figure 2: YAML-DyPDL problem file for TSPTW.

Turning to the problem file (Figure 2), the value of object_numbers is a map defining the number of objects for each object type. The value of target is a map defining the values of the state variables in the target state. For the set variable U, a list of nonnegative integers is used to define a set of elements in the set. The value of table_values is a map defining the values of the constants in the tables. For a, b, cin, and cout, a key is the index of an object, and a value is an integer. For c and cstar, a key is a list of the indices of objects.

4.2 Complexity

In Section 3, we showed that finding a solution for a DyPDL model is undecidable in general by reducing a numeric planning task to a DyPDL model. YAML-DyPDL has several restrictions compared to Definition 6. A set variable is associated with an object type, restricting its domain to a subset of a given finite set. In addition, expressions are limited by the syntax. However, these restrictions do not prevent the reduction.

Theorem 8.

Finding a solution for a finitely defined DyPDL model is undecidable even with the following restrictions.

  • 1.

    The domain of each set variable vv is restricted to 2Nv2^{N_{v}} where Nv={0,,nv1}N_{v}=\{0,...,n_{v}-1\}, and nvn_{v} is a positive integer.

  • 2.

    Numeric expressions and element expressions are functions represented by arithmetic operations {+,,,/}\{+,-,\cdot,/\}.

  • 3.

    Set expressions are functions constructed by a set of constants, set variables, and the intersection, union, and difference of two set expressions.

  • 4.

    A condition compares two numeric expressions, compares two element expressions, or checks if a set expression is a subset of another set expression.

Proof.

We can follow the proof of Theorem 1 even with the restrictions. Since the number of propositional variables in the set VPV_{P} in a numeric planning task is finite, we can use nP=|VP|n_{P^{\prime}}=|V_{P}| for the set variable PP^{\prime} representing propositional variables. Arithmetic operations {+,,,/}\{+,-,\cdot,/\} are sufficient for numeric expressions by Definition 8. Similarly, if we consider a condition iS[P]i\in S[P^{\prime}], which checks if ii is included in a set variable PP^{\prime}, as {i}S[P]\{i\}\subseteq S[P^{\prime}], the last two restrictions do not prevent the compilation of the numeric planning task to the DyPDL model. ∎

With the above reduction, a system that solves YAML-DyPDL models can also be used to solve numeric planning tasks in the formalism of Theorem 1.

5 State Space Search for DyPDL

We use state space search [31], which finds a path in an implicitly defined graph, to solve a DyPDL model. In particular, we focus on heuristic search algorithms [89, 33], which estimate path costs using a heuristic function. Once we interpret the state transition system defined by a DyPDL model as a graph, it is intuitive that we can use state space search to solve the model. However, in practice, state space search is not always applicable; a DyPDL model needs to satisfy particular conditions. Therefore, in what follows, we formally present state space search algorithms and the conditions with which they can be used to solve DyPDL models.

Definition 18.

Given a DyPDL model, the state transition graph is a directed graph where nodes are reachable states and there is an edge from SS to SS^{\prime} labeled with τ\tau, (S,S,τ)(S,S^{\prime},\tau), iff τ𝒯(S)\tau\in\mathcal{T}(S) and S=S[[τ]]S^{\prime}=S[\![\tau]\!].

We use the term path to refer to both a sequence of edges in the state transition graph and a sequence of transitions as they are equivalent. A state SS^{\prime} is reachable from SS iff there exists a path from SS to SS^{\prime} in the state transition graph, and an SS-solution corresponds to a path from SS to a base state. Trivially, if a model is acyclic, the state transition graph is also acyclic.

5.1 Cost Algebras

For a DyPDL model, we want to find a solution that minimizes or maximizes the cost. Shortest path algorithms such as Dijkstra’s algorithm [90] and A* [91] find the path minimizing the sum of the weights associated with the edges. In DyPDL, the cost of a solution can be more general, defined by cost expressions of the transitions. Edelkamp et al. [92] extended the shortest path algorithms to cost-algebraic heuristic search algorithms, which can handle more general cost structures. They introduced the notion of cost algebras, which define the cost of a path using a binary operator to combine edge weights and an operation to select the best value. Following their approach, first, we define a monoid.

Definition 19.

Let AA be a set, ×:A×AA\times:A\times A\rightarrow A be a binary operator, and 𝟏A\mathbf{1}\in A. A tuple A,×,𝟏\langle A,\times,\mathbf{1}\rangle is a monoid if the following conditions are satisfied.

  • 1.

    x×yAx\times y\in A for x,yAx,y\in A (closure).

  • 2.

x×(y×z)=(x×y)×zx\times(y\times z)=(x\times y)\times z for x,y,zAx,y,z\in A (associativity).

  • 3.

    x×𝟏=𝟏×x=xx\times\mathbf{1}=\mathbf{1}\times x=x for xAx\in A (identity).

Next, we define isotonicity, a property of a set and a binary operator with regard to comparison. Since minimization or maximization over rational numbers is sufficient for our use case, we restrict the set AA to rational numbers, and the comparison operator to \leq. The original paper by Edelkamp et al. [92] is more general.

Definition 20 (Isotonicity).

Given a set A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and a binary operator ×:A×AA\times:A\times A\rightarrow A, AA is isotone if xyx×zy×zx\leq y\rightarrow x\times z\leq y\times z and xyz×xz×yx\leq y\rightarrow z\times x\leq z\times y for x,y,zAx,y,z\in A.

With a monoid and isotonicity, we define a cost algebra.

Definition 21.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} is isotone. The monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle is a cost algebra if xA,𝟏x\forall x\in A,\mathbf{1}\leq x for minimization or xA,𝟏x\forall x\in A,\mathbf{1}\geq x for maximization.

5.2 Cost-Algebraic DyPDL Models

To apply cost-algebraic heuristic search, we focus on DyPDL models where cost expressions satisfy particular conditions. First, we define a monoidal DyPDL model, where cost expressions are represented by a binary operator in a monoid.

Definition 22.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\}. A DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle is monoidal with A,×,𝟏\langle A,\times,\mathbf{1}\rangle if the cost expression of every transition τ𝒯\tau\in\mathcal{T} is represented as 𝖼𝗈𝗌𝗍τ(x,S)=wτ(S)×x\mathsf{cost}_{\tau}(x,S)=w_{\tau}(S)\times x where wτ:𝒮A{,}w_{\tau}:\mathcal{S}\to A\setminus\{-\infty,\infty\} is a numeric expression, and the cost 𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B\mathsf{base\_cost}_{B} of each base case BB\in\mathcal{B} returns a value in A{,}A\setminus\{-\infty,\infty\}.

We also define a cost-algebraic DyPDL model, which requires stricter conditions.

Definition 23.

A monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle is cost-algebraic if A,×,𝟏\langle A,\times,\mathbf{1}\rangle is a cost algebra.

For example, the DP model for TSPTW is cost-algebraic with a cost algebra 0+,+,0\langle\mathbb{Q}_{0}^{+},+,0\rangle since the cost expression of each transition is defined as (x,S)cS[i],j+x(x,S)\mapsto c_{S[i],j}+x with cS[i],j0c_{S[i],j}\geq 0.
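The cost-algebra conditions can be spot-checked numerically. The sketch below samples nonnegative values and verifies associativity, identity, isotonicity, and the least-identity condition for minimization; it is a finite sanity check over samples, not a proof.

```python
import itertools
import math

samples = [0.0, 0.5, 1.0, 2.5, 7.0]  # sampled nonnegative rationals

def is_cost_algebra_for_min(op, identity, xs) -> bool:
    """Check Definitions 19-21 on samples: monoid axioms, isotonicity, 1 <= x."""
    for x, y, z in itertools.product(xs, repeat=3):
        if not math.isclose(op(x, op(y, z)), op(op(x, y), z)):
            return False  # associativity fails
        if x <= y and not (op(x, z) <= op(y, z) and op(z, x) <= op(z, y)):
            return False  # isotonicity fails
    if not all(op(x, identity) == x == op(identity, x) for x in xs):
        return False  # identity fails
    return all(identity <= x for x in xs)  # cost-algebra condition for min

# <Q0+, +, 0> passes all conditions, matching the TSPTW model:
print(is_cost_algebra_for_min(lambda x, y: x + y, 0.0, samples))  # → True
# Multiplication with identity 1 is a monoid and isotone on these samples,
# but 1 <= x fails (e.g., for 0.5), so it is not a cost algebra for min:
print(is_cost_algebra_for_min(lambda x, y: x * y, 1.0, samples))  # → False
```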

When a model is monoidal, we can associate a weight to each edge in the state transition graph. The weight of a path can be computed by repeatedly applying the binary operator to the weights of the edges in the path.

Definition 24.

Given a monoidal DyPDL model with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, the weight of an edge (S,S,τ)(S,S^{\prime},\tau) is wτ(S)w_{\tau}(S). The weight of a path (S,S1,σ1),(S1,S2,σ2),,(Sm1,Sm,σm)\langle(S,S^{1},\sigma_{1}),(S^{1},S^{2},\sigma_{2}),...,(S^{m-1},S^{m},\sigma_{m})\rangle defined by a sequence of transitions σ\sigma is

wσ(S)=wσ1(S)×wσ2(S1)××wσm(Sm1).w_{\sigma}(S)=w_{\sigma_{1}}(S)\times w_{\sigma_{2}}(S^{1})\times...\times w_{\sigma_{m}}(S^{m-1}).

For an empty path \langle\rangle, the weight is 𝟏\mathbf{1}.

The order of applications of the binary operator ×\times does not matter due to the associativity. Differently from the original cost-algebraic heuristic search, the weight of a path corresponding to an SS-solution may not be equal to the cost of the SS-solution in Definition 6 due to our inclusion of the cost of a base state. In the following lemma, we associate the weight of a path with the cost of a solution.
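Computationally, Definition 24 is just a fold of the edge weights under the monoid's binary operator; a minimal sketch:

```python
from functools import reduce

# The weight of a path is the edge weights combined with the monoid's binary
# operator, starting from the identity element (so an empty path has weight 1).
def path_weight(edge_weights, op, identity):
    return reduce(op, edge_weights, identity)

# With the cost algebra <Q0+, +, 0> of the TSPTW model:
print(path_weight([3, 5, 4], lambda x, y: x + y, 0))  # → 12
print(path_weight([], lambda x, y: x + y, 0))         # → 0 (the identity)
```

By associativity, folding left to right gives the same result as any other grouping of the operator applications.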

Lemma 1.

Given a monoidal DyPDL model with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle and a state SS, let σ\sigma be an SS-solution. For minimization, 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)=wσ(S)×minB:S[[σ]]CB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S[[σ]])\mathsf{solution\_cost}(\sigma,S)=w_{\sigma}(S)\times\min_{B\in\mathcal{B}:S[\![\sigma]\!]\models C_{B}}\mathsf{base\_cost}_{B}(S[\![\sigma]\!]). For maximization, we replace min\min with max\max.

Proof.

If σ\sigma is an empty sequence, since wσ(S)=𝟏w_{\sigma}(S)=\mathbf{1} and S[[σ]]=SS[\![\sigma]\!]=S,

wσ(S)×minB:S[[σ]]CB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S[[σ]])=minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)w_{\sigma}(S)\times\min_{B\in\mathcal{B}:S[\![\sigma]\!]\models C_{B}}\mathsf{base\_cost}_{B}(S[\![\sigma]\!])=\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathsf{solution\_cost}(\sigma,S)

by Definition 6. Otherwise, let σ=σ1,,σm\sigma=\langle\sigma_{1},...,\sigma_{m}\rangle, S1=S[[σ1]]S^{1}=S[\![\sigma_{1}]\!], and Si+1=Si[[σi+1]]S^{i+1}=S^{i}[\![\sigma_{i+1}]\!] for i=1,,m1i=1,...,m-1. Following Definitions 6 and 22,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)=𝖼𝗈𝗌𝗍σ1(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,,σm,S1),S)=wσ1(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,,σm,S1).\mathsf{solution\_cost}(\sigma,S)=\mathsf{cost}_{\sigma_{1}}(\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S^{1}),S)=w_{\sigma_{1}}(S)\times\mathsf{solution\_cost}(\langle\sigma_{2},...,\sigma_{m}\rangle,S^{1}).

For 2im2\leq i\leq m, we get

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σi,,σm,Si1)=𝖼𝗈𝗌𝗍σi(𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σi+1,,σm,Si),Si1)=wσi(Si1)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σi+1,,σm,Si).\begin{split}\mathsf{solution\_cost}(\langle\sigma_{i},...,\sigma_{m}\rangle,S^{i-1})&=\mathsf{cost}_{\sigma_{i}}(\mathsf{solution\_cost}(\langle\sigma_{i+1},...,\sigma_{m}\rangle,S^{i}),S^{i-1})\\ &=w_{\sigma_{i}}(S^{i-1})\times\mathsf{solution\_cost}(\langle\sigma_{i+1},...,\sigma_{m}\rangle,S^{i}).\end{split}

For i=m+1i=m+1, 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(,Sm)=minB:SmCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(Sm)\mathsf{solution\_cost}(\langle\rangle,S^{m})=\min_{B\in\mathcal{B}:S^{m}\models C_{B}}\mathsf{base\_cost}_{B}(S^{m}). Thus, we get the equation in the lemma by Definition 24. The proof for maximization is similar. ∎

We show that isotonicity is sufficient for the Principle of Optimality in Definition 13. First, we prove its generalized version in Theorem 9. In what follows, we denote the concatenation of sequences of transitions σ\sigma and σ\sigma^{\prime} by σ;σ\langle\sigma;\sigma^{\prime}\rangle.

Theorem 9.

Consider a monoidal DyPDL model with A,×,𝟏\langle A,\times,\mathbf{1}\rangle such that A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Let SS^{\prime} and S′′S^{\prime\prime} be states reachable from SS with sequences of transitions σ\sigma^{\prime} and σ′′\sigma^{\prime\prime}, respectively, with wσ(S)wσ′′(S)w_{\sigma^{\prime}}(S)\leq w_{\sigma^{\prime\prime}}(S). For minimization, if there exist SS^{\prime}- and S′′S^{\prime\prime}-solutions σ1\sigma^{1} and σ2\sigma^{2} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S′′)\mathsf{solution\_cost}(\sigma^{1},S^{\prime})\leq\mathsf{solution\_cost}(\sigma^{2},S^{\prime\prime}), then σ;σ1\langle\sigma^{\prime};\sigma^{1}\rangle and σ′′;σ2\langle\sigma^{\prime\prime};\sigma^{2}\rangle, are SS-solutions with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ;σ1,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ′′;σ2,S)\mathsf{solution\_cost}(\langle\sigma^{\prime};\sigma^{1}\rangle,S)\leq\mathsf{solution\_cost}(\langle\sigma^{\prime\prime};\sigma^{2}\rangle,S). For maximization, we replace \leq with \geq.

Proof.

The sequences σ;σ1\langle\sigma^{\prime};\sigma^{1}\rangle and σ′′;σ2\langle\sigma^{\prime\prime};\sigma^{2}\rangle are SS-solutions by Definition 6. Since 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ;σ1,S)=wσ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)\mathsf{solution\_cost}(\langle\sigma^{\prime};\sigma^{1}\rangle,S)=w_{\sigma^{\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{1},S^{\prime}) and AA is isotone,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ;σ1,S)=wσ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)wσ′′(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)wσ′′(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S′′)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ′′;σ2,S).\begin{split}\mathsf{solution\_cost}(\langle\sigma^{\prime};\sigma^{1}\rangle,S)&=w_{\sigma^{\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{1},S^{\prime})\leq w_{\sigma^{\prime\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{1},S^{\prime})\\ &\leq w_{\sigma^{\prime\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{2},S^{\prime\prime})=\mathsf{solution\_cost}(\langle\sigma^{\prime\prime};\sigma^{2}\rangle,S).\end{split}

The proof for maximization is similar. ∎

Corollary 1.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. A monoidal DyPDL model with A,×,𝟏\langle A,\times,\mathbf{1}\rangle satisfies the Principle of Optimality in Definition 13.

Intuitively, the Principle of Optimality or its sufficient condition, isotonicity, ensures that an optimal path can be constructed by extending an optimal subpath. Thus, a state space search algorithm can discard suboptimal paths to reach each node in the state transition graph. Without this property, a state space search algorithm may need to enumerate suboptimal paths to a node since they may lead to an optimal path to a goal node.

5.3 Formalization of Heuristic Search for DyPDL

A state space search algorithm searches for a path between nodes in a graph. In particular, we focus on unidirectional search algorithms, which visit nodes by traversing edges from one node (the initial node) to find a path to one of the nodes satisfying particular conditions (goal nodes). Moreover, we focus on heuristic search algorithms, which estimate the path cost from a state to a goal node using a heuristic function hh. For a node SS, a unidirectional heuristic search algorithm maintains g(S)g(S) (the gg-value), the best path cost found so far from the initial node to SS, and h(S)h(S) (the hh-value), the estimated path cost from SS to a goal node. These values are used in two ways: search guidance and pruning.

For search guidance, typically, the priority of a node SS is computed from g(S)g(S) and h(S)h(S), and the node to visit next is selected based on it. For pruning, a heuristic function needs to be admissible: h(S)h(S) is a lower bound of the shortest path weight from a node SS to a goal node. In the conventional shortest path problem, if a heuristic function hh is admissible, g(S)+h(S)g(S)+h(S) is a lower bound on the weight of a path from the initial node to a goal node via SS. Therefore, when we have found a path from the initial node to a goal node with weight γ¯\overline{\gamma}, we can prune the path to SS if g(S)+h(S)γ¯g(S)+h(S)\geq\overline{\gamma}. With this pruning, a heuristic search algorithm can be considered a branch-and-bound algorithm [72, 93].
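The pruning rule can be sketched on a toy weighted graph; the graph, heuristic values, and function below are our own illustration of the g(S) + h(S) >= primal bound test, not one of the DIDP solvers.

```python
import heapq

# Toy minimization instance: h is admissible (never overestimates the
# remaining path weight), so nodes with g + h >= incumbent can be discarded.
edges = {"s": [("a", 2), ("b", 5)], "a": [("t", 4)], "b": [("t", 1)]}
h = {"s": 5, "a": 4, "b": 1, "t": 0}

def best_first(start, goal):
    incumbent = float("inf")          # cost of the best solution found so far
    open_list = [(h[start], 0, start)]  # entries are (f = g + h, g, node)
    while open_list:
        f, g, node = heapq.heappop(open_list)
        if f >= incumbent:
            continue  # prune: this path cannot improve the incumbent
        if node == goal:
            incumbent = g  # new incumbent solution
            continue
        for succ, w in edges.get(node, []):
            heapq.heappush(open_list, (g + w + h[succ], g + w, succ))
    return incumbent

print(best_first("s", "t"))  # → 6
```

Because every pruned node provably cannot lead to a solution cheaper than the incumbent, the search remains exact, which is why such an algorithm can be viewed as branch and bound.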

While the above two functionalities of a heuristic function are fundamentally different, it is common that a single admissible heuristic function is used for both purposes. In particular, A* [91] visits the node that minimizes the ff-value, f(S)=g(S)+h(S)f(S)=g(S)+h(S). While A* does not explicitly prune paths, if the weights of edges are nonnegative, it never discovers a path σ(S)\sigma(S) such that g(S)+h(S)>γg(S)+h(S)>\gamma^{*}, where γ\gamma^{*} is the shortest path weight from the initial node to a goal node (while ff^{*} is conventionally used to represent the optimal path weight, we use γ\gamma^{*} to explicitly distinguish it from ff-values). Thus, A* implicitly prunes non-optimal paths while guiding the search with the ff-values. However, in general, we can use different functions for the two purposes, and the one used for search guidance need not be admissible. Such multi-heuristic search algorithms have been developed particularly for the bounded-suboptimal setting, where we want to find a solution whose suboptimality is bounded by a constant factor, and the anytime setting, where we want to find increasingly better solutions until proving optimality [32, 94, 95, 96, 97, 98].

In DyPDL, a dual bound function can be used for search guidance, but we may use a different heuristic function. In this section, we do not introduce heuristic functions for search guidance and do not specify how to select the next node to visit. Instead, we provide a generic heuristic search algorithm that uses a dual bound function only for pruning and discuss its completeness and optimality. To explicitly distinguish pruning from search guidance, for a dual bound function, we use η\eta as in Definition 16 instead of hh and do not use ff.

We show generic pseudo-code of a heuristic search algorithm for a monoidal DyPDL model in Algorithm 1. The algorithm starts from the target state S0S^{0} and searches for a path to a base state by traversing edges in the state transition graph. The open list OO stores candidate states to expand. The set GG stores generated states to detect duplicate or dominated states. If the model satisfies isotonicity, by Theorem 9, we only need to consider the best path to each state in terms of the weight. The sequence of transitions σ(S)\sigma(S) represents the best path found so far from the target state S0S^{0} to SS. The gg-value of SS, g(S)g(S), is the weight of the path σ(S)\sigma(S). The function η\eta is a dual bound function, which underestimates the cost of an SS-solution by the η\eta-value of SS, η(S)\eta(S). The best solution found so far, σ¯\overline{\sigma}, and its cost γ¯\overline{\gamma} (i.e., the primal bound) are also maintained. Algorithm 1 is stated for minimization; for maximization, \infty is replaced with -\infty, min\min with max\max, << with >>, and \geq and \leq are swapped. All the theoretical results shown later can be easily adapted to maximization.

Algorithm 1 Heuristic search for minimization with a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle. An approximate dominance relation a\preceq_{a} and a dual bound function η\eta are given as input.
1:if S0⊧̸𝒞S^{0}\not\models\mathcal{C} then return NULL
2:γ¯,σ¯NULL\overline{\gamma}\leftarrow\infty,\overline{\sigma}\leftarrow\text{NULL} \triangleright Initialize the solution.
3:σ(S0)\sigma(S^{0})\leftarrow\langle\rangle, g(S0)𝟏g(S^{0})\leftarrow\mathbf{1} \triangleright Initialize the gg-value.
4:G,O{S0}G,O\leftarrow\{S^{0}\} \triangleright Initialize the open list.
5:while OO\neq\emptyset do
6:     Let SOS\in O \triangleright Select a state.
7:     OO{S}O\leftarrow O\setminus\{S\} \triangleright Remove the state.
8:     if B,SCB\exists B\in\mathcal{B},S\models C_{B} then
9:         current_costg(S)×minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)\text{current\_cost}\leftarrow g(S)\times\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S) \triangleright Compute the solution cost.
10:         if current_cost<γ¯\text{current\_cost}<\overline{\gamma} then
11:              γ¯current_cost\overline{\gamma}\leftarrow\text{current\_cost}, σ¯σ(S)\overline{\sigma}\leftarrow\sigma(S) \triangleright Update the best solution.
12:              O{SOg(S)×η(S)<γ¯}O\leftarrow\{S^{\prime}\in O\mid g(S^{\prime})\times\eta(S^{\prime})<\overline{\gamma}\} \triangleright Prune states in the open list.          
13:     else
14:         for all τ𝒯(S):S[[τ]]𝒞\tau\in\mathcal{T}^{*}(S):S[\![\tau]\!]\models\mathcal{C} do
15:              gcurrentg(S)×wτ(S)g_{\text{current}}\leftarrow g(S)\times w_{\tau}(S) \triangleright Compute the gg-value.
16:              if SG\not\exists S^{\prime}\in G such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and gcurrentg(S)g_{\text{current}}\geq g(S^{\prime}) then
17:                  if gcurrent×η(S[[τ]])<γ¯g_{\text{current}}\times\eta(S[\![\tau]\!])<\overline{\gamma} then
18:                       if SG\exists S^{\prime}\in G such that SaS[[τ]]S^{\prime}\preceq_{a}S[\![\tau]\!] and gcurrentg(S)g_{\text{current}}\leq g(S^{\prime}) then
19:                           GG{S}G\leftarrow G\setminus\{S^{\prime}\}, OO{S}O\leftarrow O\setminus\{S^{\prime}\} \triangleright Remove a dominated state.                        
20:                       σ(S[[τ]])σ(S);τ\sigma(S[\![\tau]\!])\leftarrow\langle\sigma(S);\tau\rangle, g(S[[τ]])gcurrentg(S[\![\tau]\!])\leftarrow g_{\text{current}}
21:                       GG{S[[τ]]}G\leftarrow G\cup\{S[\![\tau]\!]\}, OO{S[[τ]]}O\leftarrow O\cup\{S[\![\tau]\!]\} \triangleright Insert the successor state.                                               
22:return σ¯\overline{\sigma} \triangleright Return the solution.

If the target state S0S^{0} violates the state constraints, the model does not have a solution, so we return NULL (line 1). Otherwise, the open list OO and GG are initialized with S0S^{0} (line 4). The gg-value of S0S^{0} is initialized to 𝟏\mathbf{1} following Definition 24 (line 3). Initially, the solution cost γ¯=\overline{\gamma}=\infty, and σ¯=NULL\overline{\sigma}=\text{NULL} (line 2). When OO is empty, σ¯\overline{\sigma} is returned (line 22). In such a case, the state transition graph is exhausted, and the current solution σ¯\overline{\sigma} is an optimal solution, or the model does not have a solution if σ¯=NULL\overline{\sigma}=\text{NULL}.

When OO is not empty, a state SOS\in O is selected and removed from OO (lines 6 and 7). We do not specify how to select SS in Algorithm 1 as it depends on the concrete heuristic search algorithm implemented. If SS is a base state, σ(S)\sigma(S) is a solution, so we update the best solution if σ(S)\sigma(S) is better (lines 8–11). If the best solution is updated, we prune a state SS^{\prime} in OO such that g(S)×η(S)g(S^{\prime})\times\eta(S^{\prime}) is not better than the new solution cost since the currently found paths to such states do not lead to a better solution (line 12).

If SS is not a base state, SS is expanded. We define a set of applicable transitions considering forced transitions as

𝒯(S)={{τ}if τ𝒯(S),τ is identified to be a forced transition𝒯(S)otherwise.\mathcal{T}^{*}(S)=\begin{cases}\{\tau\}&\text{if }\exists\tau\in\mathcal{T}(S),\tau\text{ is identified to be a forced transition}\\ \mathcal{T}(S)&\text{otherwise.}\end{cases} (11)

While identifying all forced transitions is typically impractical, identifying some of them is feasible, e.g., based on sufficient conditions defined by a user, as in the talent scheduling example in Section 3.3.3. In the first case, when multiple forced transitions are identified, we assume that one of them is selected, e.g., the one defined first in the model. A successor state S[[τ]]S[\![\tau]\!] is generated for each transition τ𝒯(S)\tau\in\mathcal{T}^{*}(S) (line 14), and successor states violating state constraints are discarded. For each successor state, we check whether a state SS^{\prime} that dominates S[[τ]]S[\![\tau]\!] and has a better or equal gg-value has already been generated (line 16). In such a case, σ(S)\sigma(S^{\prime}) leads to a better or equal solution, so we prune S[[τ]]S[\![\tau]\!]. Since S[[τ]]S[\![\tau]\!] dominates itself, this check also works as duplicate detection. If no dominating state in GG is detected, and gcurrent×η(S[[τ]])g_{\text{current}}\times\eta(S[\![\tau]\!]) is better than the primal bound (line 17), we insert the successor state into GG and OO (line 21). The best path to S[[τ]]S[\![\tau]\!] is updated to σ(S);τ\langle\sigma(S);\tau\rangle, which is an extension of σ(S)\sigma(S) with τ\tau (line 20). Before doing so, we remove an existing state SS^{\prime} from GG and OO if SS^{\prime} is dominated by S[[τ]]S[\![\tau]\!] and has a worse or equal gg-value (line 18).
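The loop structure of Algorithm 1 can be sketched in Python for a toy minimization model. This is an illustrative simplification, not the DIDP implementation: it assumes ++ as the monoid operator, restricts dominance to state equality, and omits forced transitions; all names (`heuristic_search`, `successors`, `eta`) are ours.

```python
import heapq

def heuristic_search(target, successors, is_base, base_cost, eta):
    # Generic shape of Algorithm 1 for minimization, with + as the
    # monoid operator and dominance restricted to state equality.
    g = {target: 0}               # best-known path weight to each state
    path = {target: []}           # best-known transition sequence
    open_list = [(eta(target), target)]
    best_cost, best_path = float("inf"), None
    while open_list:
        f, state = heapq.heappop(open_list)
        if f >= best_cost:
            continue              # pruned by the primal bound (line 12)
        if is_base(state):        # base state reached (lines 8-11)
            cost = g[state] + base_cost(state)
            if cost < best_cost:
                best_cost, best_path = cost, path[state]
            continue
        for label, succ, weight in successors(state):
            g_new = g[state] + weight
            if g_new >= g.get(succ, float("inf")):
                continue          # a better or equal path is known (line 16)
            if g_new + eta(succ) >= best_cost:
                continue          # pruned by the dual bound (line 17)
            g[succ] = g_new
            path[succ] = path[state] + [label]
            heapq.heappush(open_list, (g_new + eta(succ), succ))
    return best_cost, best_path

# Toy DP: reduce n to 0 using "-1", or "/2" when n is even, at unit cost.
def successors(n):
    yield ("-1", n - 1, 1)
    if n % 2 == 0:
        yield ("/2", n // 2, 1)

cost, plan = heuristic_search(
    10, successors, is_base=lambda n: n == 0,
    base_cost=lambda n: 0, eta=lambda n: 0)
```

With the trivial dual bound η=0\eta=0, this toy run returns the optimal cost 5 for n=10n=10 (e.g., 10, 5, 4, 2, 1, 0) together with the corresponding transition sequence.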

Algorithm 1 terminates in finite time if the model is finite and cost-algebraic. Even if the model is not cost-algebraic, the algorithm still terminates in finite time as long as the model is finite and acyclic. First, we show termination for a finite and acyclic model. Intuitively, with such a model, Algorithm 1 enumerates a finite number of paths from the target state, so it eventually terminates.

Theorem 10.

Given a finite, acyclic, and monoidal DyPDL model, Algorithm 1 terminates in finite time.

Proof.

Unless the target state violates the state constraints, the algorithm terminates when OO becomes an empty set. In each iteration of the loop in lines 5–21, at least one state is removed from OO by line 7. However, multiple successor states can be added to OO in each iteration by line 21. We prove that the number of iterations that reach line 21 is finite. With this property, OO becomes an empty set after finitely many iterations.

A successor state S[[τ]]S[\![\tau]\!] is inserted into OO if it is not dominated by a state in GG with a better or equal gg-value, and gcurrent×η(S[[τ]])g_{\text{current}}\times\eta(S[\![\tau]\!]) is less than the current solution cost. Suppose that S[[τ]]S[\![\tau]\!] was inserted into OO and GG in line 21, and now the algorithm generates S[[τ]]S[\![\tau]\!] again in line 14. Suppose that gcurrent=g(S)×wτ(S)g(S[[τ]])g_{\text{current}}=g(S)\times w_{\tau}(S)\geq g(S[\![\tau]\!]). If S[[τ]]GS[\![\tau]\!]\in G, then we do not add S[[τ]]S[\![\tau]\!] to OO due to line 16. If S[[τ]]GS[\![\tau]\!]\not\in G, then S[[τ]]S[\![\tau]\!] was removed from GG, so we should have generated a state SS^{\prime} such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and g(S)g(S[[τ]])g(S^{\prime})\leq g(S[\![\tau]\!]) (lines 18 and 19). It is possible that SS^{\prime} was also removed from GG, but in such a case, we have another state S′′GS^{\prime\prime}\in G such that S[[τ]]aSaS′′S[\![\tau]\!]\preceq_{a}S^{\prime}\preceq_{a}S^{\prime\prime} and g(S′′)g(S)g(S[[τ]])g(S^{\prime\prime})\leq g(S^{\prime})\leq g(S[\![\tau]\!]), so S[[τ]]S[\![\tau]\!] is not inserted into OO again. Thus, if S[[τ]]S[\![\tau]\!] was ever inserted into GG, then S[[τ]]S[\![\tau]\!] is inserted into OO in line 21 only if gcurrent<g(S[[τ]])g_{\text{current}}<g(S[\![\tau]\!]). That is, each new insertion requires a strictly better path from S0S^{0} to S[[τ]]S[\![\tau]\!]. Since the model is finite and acyclic, the number of paths from S0S^{0} to each state is finite. Therefore, each state is inserted into OO finitely many times. Since the model is finite, the number of reachable states is finite. By line 14, we only generate reachable states. Thus, we reach line 21 finitely many times. ∎

When the state transition graph contains cycles, there can be an infinite number of paths even if the graph is finite. However, if the model is cost-algebraic, the cost monotonically changes along a path, so having a cycle does not improve a solution. Thus, the algorithm terminates in finite time by enumerating a finite number of acyclic paths. We start with the following lemma, which confirms that the gg-value is the weight of the path from the target state.

Lemma 2.

After line 4 of Algorithm 1, for each state SOS\in O, SS is the target state S0S^{0}, or SS is reachable from S0S^{0} with σ(S)\sigma(S) such that g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}) at all lines except for lines 20–21.

Proof.

Assume that the following condition holds at the beginning of the current iteration: for each state SOS\in O, SS is the target state S0S^{0} with g(S0)=𝟏g(S^{0})=\mathbf{1}, or SS is reachable from S0S^{0} with σ(S)\sigma(S) and g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}). In the first iteration, O={S0}O=\{S^{0}\}, so the assumption holds. When the assumption holds, the condition continues to hold until reaching lines 20–21, where the gg-value is updated, and a new state is added to OO. If we reach these lines, a non-base state SS was removed from OO in line 7. Each successor state S[[τ]]S[\![\tau]\!] is reachable from SS with τ\langle\tau\rangle. By the assumption, S=S0S=S^{0}, or SS is reachable from S0S^{0} with σ(S)\sigma(S). Therefore, S[[τ]]S[\![\tau]\!] is reachable from S0S^{0} with σ(S);τ\langle\sigma(S);\tau\rangle. If S[[τ]]S[\![\tau]\!] is inserted into OO, then σ(S[[τ]])=σ(S);τ\sigma(S[\![\tau]\!])=\langle\sigma(S);\tau\rangle. If S=S0S=S^{0},

g(S[[τ]])=g(S0)×wτ(S0)=𝟏×wτ(S0)=wτ(S0)=wσ(S[[τ]])(S0).g(S[\![\tau]\!])=g(S^{0})\times w_{\tau}(S^{0})=\mathbf{1}\times w_{\tau}(S^{0})=w_{\tau}(S^{0})=w_{\sigma(S[\![\tau]\!])}(S^{0}).

If SS is not the target state, since g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}), by Definition 24,

g(S[[τ]])=g(S)×wτ(S)=wσ(S)(S0)×wτ(S)=wσ(S[[τ]])(S0).g(S[\![\tau]\!])=g(S)\times w_{\tau}(S)=w_{\sigma(S)}(S^{0})\times w_{\tau}(S)=w_{\sigma(S[\![\tau]\!])}(S^{0}).

Thus, S[[τ]]S[\![\tau]\!] is reachable from S0S^{0} with σ(S[[τ]])\sigma(S[\![\tau]\!]) and g(S[[τ]])=wσ(S[[τ]])(S0)g(S[\![\tau]\!])=w_{\sigma(S[\![\tau]\!])}(S^{0}), so the condition holds after line 21. By mathematical induction, the lemma is proved. ∎

Theorem 11.

Given a finite and cost-algebraic DyPDL model, Algorithm 1 terminates in finite time.

Proof.

The proof is almost the same as the proof of Theorem 10. However, now, there may be an infinite number of paths to a state since the state transition graph may contain cycles. We show that the algorithm never considers a path containing cycles when the model is cost-algebraic. Assume that for each state SS, the best-found path σ(S)\sigma(S) is acyclic up to the current iteration. This condition holds at the beginning since σ(S0)=\sigma(S^{0})=\langle\rangle is acyclic. Suppose that the algorithm generates a successor state S[[τ]]S[\![\tau]\!] that is already included in the path σ(S)\sigma(S). Then, S[[τ]]S[\![\tau]\!] was generated before. In addition, S[[τ]]S[\![\tau]\!] is not a base state since it has a successor state on σ(S)\sigma(S). Since σ(S)\sigma(S) is acyclic, S[[τ]]S[\![\tau]\!] is included only once. Let σ(S)=σ1;σ2\sigma(S)=\langle\sigma^{1};\sigma^{2}\rangle where σ1\sigma^{1} is the path from S0S^{0} to S[[τ]]S[\![\tau]\!]. By Lemma 2, we have

gcurrent=g(S)×wτ(S)=wσ1(S0)×wσ2(S[[τ]])×wτ(S).g_{\text{current}}=g(S)\times w_{\tau}(S)=w_{\sigma^{1}}(S^{0})\times w_{\sigma^{2}}(S[\![\tau]\!])\times w_{\tau}(S).

If g(S[[τ]])g(S[\![\tau]\!]) and σ(S[[τ]])\sigma(S[\![\tau]\!]) were updated after S[[τ]]S[\![\tau]\!] was generated with σ(S[[τ]])=σ1\sigma(S[\![\tau]\!])=\sigma^{1}, then a path from S0S^{0} to S[[τ]]S[\![\tau]\!] with a smaller weight was found by line 16. Thus, g(S[[τ]])wσ1(S0)=wσ1(S0)×𝟏g(S[\![\tau]\!])\leq w_{\sigma^{1}}(S^{0})=w_{\sigma^{1}}(S^{0})\times\mathbf{1}. By Definition 21, 𝟏wσ2(S[[τ]])×wτ(S)\mathbf{1}\leq w_{\sigma^{2}}(S[\![\tau]\!])\times w_{\tau}(S). Since AA is isotone,

gcurrent=wσ1(S0)×wσ2(S[[τ]])×wτ(S)wσ1(S0)×𝟏g(S[[τ]]).g_{\text{current}}=w_{\sigma^{1}}(S^{0})\times w_{\sigma^{2}}(S[\![\tau]\!])\times w_{\tau}(S)\geq w_{\sigma^{1}}(S^{0})\times\mathbf{1}\geq g(S[\![\tau]\!]).

Therefore, S[[τ]]S[\![\tau]\!] is not inserted into OO, and σ(S[[τ]])\sigma(S[\![\tau]\!]) remains acyclic. Thus, by mathematical induction, for each state, the number of insertions into OO is at most the number of acyclic paths to that state, which is finite. ∎

We confirm that σ¯\overline{\sigma} is a solution for the model whenever it is not NULL, even during execution. In other words, Algorithm 1 is an anytime algorithm that can return a solution before proving optimality.

Theorem 12.

After line 11 of Algorithm 1, if σ¯NULL\overline{\sigma}\neq\text{NULL}, then σ¯\overline{\sigma} is a solution for the DyPDL model with γ¯=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=\mathsf{solution\_cost}(\overline{\sigma}).

Proof.

The solution σ¯\overline{\sigma} is updated in line 11 when a base state SS is removed from OO in line 7. If S=S0S=S^{0}, then σ¯=\overline{\sigma}=\langle\rangle, which is a solution. Since g(S0)=𝟏g(S^{0})=\mathbf{1}, γ¯=minB:S0CB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S0)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=\min_{B\in\mathcal{B}:S^{0}\models C_{B}}\mathsf{base\_cost}_{B}(S^{0})=\mathsf{solution\_cost}(\overline{\sigma}) by Definition 6. If SS is not the target state, σ¯=σ(S)\overline{\sigma}=\sigma(S), which is a solution since SS is reachable from S0S^{0} with σ(S)\sigma(S) by Lemma 2, and SS is a base state. Since g(S)=wσ(S)(S0)g(S)=w_{\sigma(S)}(S^{0}), it holds that γ¯=wσ(S)(S0)×minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=w_{\sigma(S)}(S^{0})\times\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathsf{solution\_cost}(\overline{\sigma}) by Lemma 1. ∎

Finally, we prove the optimality of Algorithm 1. Our proof is based on the following lemma, whose proof is presented in A.

Lemma 3.

In Algorithm 1, suppose that a solution exists for the DyPDL model, and let γ^\hat{\gamma} be its cost. When reaching line 5, at least one of the following two conditions is satisfied:

  1. γ¯γ^\overline{\gamma}\leq\hat{\gamma}.

  2. The open list OO contains a state S^\hat{S} such that an S^\hat{S}-solution σ^\hat{\sigma} exists, and σ(S^);σ^\langle\sigma(\hat{S});\hat{\sigma}\rangle is a solution for the model with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.

Theorem 13.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, if an optimal solution exists for the model, and Algorithm 1 returns a solution that is not NULL, then the solution is optimal. If Algorithm 1 returns NULL, then the model is infeasible.

Proof.

Suppose that a solution exists, and let γ^\hat{\gamma} be its cost. By Lemma 3, when we reach line 5 with O=O=\emptyset, γ¯γ^\overline{\gamma}\leq\hat{\gamma}. Since γ¯\overline{\gamma}\neq\infty, it holds that σ¯NULL\overline{\sigma}\neq\text{NULL} by line 11. Therefore, if a solution exists, NULL is never returned, i.e., NULL is returned only if the model is infeasible. Suppose that an optimal solution exists, and let γ\gamma^{*} be its cost. Now, consider the above discussion with γ^=γ\hat{\gamma}=\gamma^{*}. When we reach line 5 with O=O=\emptyset, γ¯γ\overline{\gamma}\leq\gamma^{*}. By Theorem 12, σ¯\overline{\sigma} is a solution with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)=γ¯\mathsf{solution\_cost}(\overline{\sigma})=\overline{\gamma}. Since 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)γ\mathsf{solution\_cost}(\overline{\sigma})\geq\gamma^{*}, γ¯=γ\overline{\gamma}=\gamma^{*} and σ¯\overline{\sigma} is an optimal solution. Therefore, if an optimal solution exists, and the algorithm returns a solution, the solution is optimal. ∎

Corollary 2.

Given a finite and cost-algebraic DyPDL model, the model either has an optimal solution or is infeasible. Moreover, the problem of deciding whether there exists a solution whose cost is less (greater) than a given rational number for minimization (maximization) is decidable.

Note that Theorem 13 does not require a model to be finite, acyclic, or cost-algebraic. While the algorithm terminates in finite time if the model is finite and acyclic or cost-algebraic, there is no guarantee in general due to the undecidability in Theorem 1. However, even for such a model, if the algorithm terminates, the optimality or infeasibility is proved.

As shown in the proof of Lemma 3, when an optimal solution exists, a state S^\hat{S} such that there exists an optimal solution extending σ(S^)\sigma(\hat{S}) is included in the open list. For minimization (maximization), by taking the minimum (maximum) g(S)×η(S)g(S)\times\eta(S) value in the open list, we can obtain a dual bound, i.e., a lower (upper) bound on the optimal solution cost.
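This dual bound can be computed directly from the open list. A minimal sketch for minimization, assuming ++ as the operator; the function name and tuple layout are ours:

```python
def dual_bound(open_list):
    """Lower bound on the optimal cost: the minimum of g(S) + eta(S)
    over the (g, eta) pairs of states in the open list."""
    return min(g + eta for g, eta in open_list)

# Three open states with (g, eta) = (3, 4), (5, 1), (2, 6):
assert dual_bound([(3, 4), (5, 1), (2, 6)]) == 6  # attained by (5, 1)
```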

Theorem 14.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, if an optimal solution exists for the model and has the cost γ\gamma^{*}, and OO is not empty in line 5, for minimization,

minSOg(S)×η(S)γ.\min_{S\in O}g(S)\times\eta(S)\leq\gamma^{*}.
Proof.

By Lemma 3, if OO\neq\emptyset, then γ¯=γ\overline{\gamma}=\gamma^{*}, or there exists a state S^O\hat{S}\in O on an optimal solution, i.e., there exists an S^\hat{S}-solution σ^\hat{\sigma} such that σ(S^);σ^\langle\sigma(\hat{S});\hat{\sigma}\rangle is an optimal solution. If γ¯=γ\overline{\gamma}=\gamma^{*}, by lines 12 and 17, minSOg(S)×η(S)<γ\min_{S\in O}g(S)\times\eta(S)<\gamma^{*}. Otherwise, S^O\hat{S}\in O. Since η(S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\eta(\hat{S})\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}) and AA is isotone,

minSOg(S)×η(S)g(S^)×η(S^)g(S^)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)=γ.\min_{S\in O}g(S)\times\eta(S)\leq g(\hat{S})\times\eta(\hat{S})\leq g(\hat{S})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})=\gamma^{*}.

5.4 Heuristic Search Algorithms for DyPDL

We introduce existing heuristic search algorithms as instantiations of Algorithm 1 so that we can use them for DyPDL. In particular, each algorithm differs in how to select a state SS to remove from the open list OO in line 6. In addition to A*, which is the most fundamental heuristic search algorithm, we select anytime algorithms that have been applied to combinatorial optimization problems in problem-specific settings. For detailed descriptions of the algorithms, please refer to the papers that proposed them. Similar to A*, in our configuration, these algorithms use a heuristic function hh and guide the search with the ff-value, which is computed as f(S)=g(S)×h(S)f(S)=g(S)\times h(S), where ×\times is a binary operator such as ++. As we discussed in Section 5.3, hh is not necessarily a dual bound function and not necessarily admissible.

5.4.1 CAASDy: Cost-Algebraic A* Solver for DyPDL

A* selects a state with the best ff-value in line 6 (i.e., the minimum ff-value for minimization and the maximum ff-value for maximization). If there are multiple states with the best ff-value, one is selected according to a tie-breaking strategy. Among states with the same ff-value, we select a state with the best hh-value (with “best” defined analogously to the best ff-value). In what follows, when other algorithms select a state according to the ff-values, we likewise assume that a state with the best ff-value is selected, and ties are broken by the hh-values. If there are multiple states with the best ff- and hh-values, we use another tie-breaking strategy, which is not specified here and is discussed later when we describe the implementation. We call our solver cost-algebraic A* solver for DyPDL (CAASDy) as we originally proposed it only for cost-algebraic models [74]. However, as shown in Theorems 12 and 13, CAASDy is applicable to monoidal and acyclic models with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle if AA is isotone.
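This selection rule can be sketched with a binary heap ordered by the pair (f,h)(f,h); a minimal illustration for minimization assuming ++ as the operator, with variable names of our own choosing:

```python
import heapq

# Order states by f = g + h, breaking ties by h (minimization).
open_list = []
for g, h, state in [(3, 4, "a"), (5, 2, "b"), (6, 1, "c")]:
    heapq.heappush(open_list, (g + h, h, state))

# All three states have f = 7, so ties are broken by the h-value.
order = [heapq.heappop(open_list)[2] for _ in range(3)]
```

Here `order` is `["c", "b", "a"]`: among states with equal ff-values, the one with the smallest hh-value is expanded first.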

In the original A*, if hh is admissible, the first solution found is optimal. Previous work has generalized A* to cost-algebraic heuristic search with this property [92]. In our case, if a model is not cost-algebraic, the first solution found may not be optimal. In addition, even if a model is cost-algebraic, our problem setting is slightly different from that of Edelkamp et al. [92]: a base case has a cost, so the cost of a solution can differ from the weight of the corresponding path. If a model is cost-algebraic and the costs of the base cases do not matter, we can prove that the first solution found by CAASDy is optimal.

Theorem 15.

Given a cost-algebraic DyPDL model with a monoid A,×,𝟏\langle A,\times,\mathbf{1}\rangle, let hh be an admissible heuristic function, i.e., given any reachable state SS and any SS-solution σ\sigma, h(S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)h(S)\leq\mathsf{solution\_cost}(\sigma,S) for minimization. If an optimal solution exists for the model, the costs of base cases are 𝟏\mathbf{1}, i.e., B,𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝟏\forall B\in\mathcal{B},\mathsf{base\_cost}_{B}(S)=\mathbf{1}, and h(S)𝟏h(S)\geq\mathbf{1} for any reachable state SS, then the first solution found by CAASDy is optimal.

Proof.

Let σ¯=σ(S)\overline{\sigma}=\sigma(S) be the first found solution with the cost γ¯\overline{\gamma} in line 11. Since minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝟏\min\limits_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathbf{1}, γ¯=g(S)\overline{\gamma}=g(S). Since 𝟏h(S)minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)=𝟏\mathbf{1}\leq h(S)\leq\min\limits_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S)=\mathbf{1}, f(S)=g(S)×h(S)=g(S)=γ¯f(S)=g(S)\times h(S)=g(S)=\overline{\gamma}. If σ(S)\sigma(S) is not an optimal solution, by Lemma 3, OO contains a state S^O\hat{S}\in O such that there exists an S^\hat{S}-solution σ^\hat{\sigma} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)=γ<γ¯\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)=\gamma^{*}<\overline{\gamma}. Since AA is isotone, 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)=g(S^)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)g(S^)×h(S^)\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)=g(\hat{S})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})\geq g(\hat{S})\times h(\hat{S}). Therefore,

f(S^)=g(S^)×h(S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)<γ¯=f(S).f(\hat{S})=g(\hat{S})\times h(\hat{S})\leq\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)<\overline{\gamma}=f(S).

Thus, S^\hat{S} should have been expanded before SS, which is a contradiction. ∎

5.4.2 Depth-First Branch-and-Bound (DFBnB)

Theorem 15 indicates a potential disadvantage of A*: it does not find any feasible solution before proving optimality, which may take a long time. In practice, having a suboptimal solution is better than having no solution. In some combinatorial optimization problems, we can find a solution by applying a fixed number of transitions; for example, in talent scheduling, any sequence of scenes is a solution. With this observation, prioritizing depth in the state transition graph can lead to finding a solution quickly.

Depth-first branch-and-bound (DFBnB) expands states in the depth-first order, i.e., a state SS that maximizes |σ(S)||\sigma(S)| is selected in line 6. Concretely, the open list OO is implemented as a stack, and the state on the top of the stack (the state added to OO most recently) is selected in line 6. Successor states of the same state have the same priority, so ties are broken by the ff-values. DFBnB has been applied to state-based formulations of combinatorial optimization problems such as the sequential ordering problem (SOP) [88], the traveling salesperson problem (TSP), and single machine scheduling [99, 100].
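The stack discipline with ff-value tie-breaking can be sketched as follows; this is an assumed illustration (our names), where successors of one state are pushed in worst-ff-first order so that the best-ff successor is popped first:

```python
# DFBnB open list as a stack of (f, state) pairs.
def push_successors(stack, successors):
    """Push successors so that the best f-value ends up on top."""
    for f, state in sorted(successors, reverse=True):
        stack.append((f, state))

stack = []
push_successors(stack, [(5, "x"), (3, "y"), (4, "z")])
first = stack.pop()  # the successor with the smallest f-value
```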

5.4.3 Cyclic-Best First Search (CBFS)

In DFBnB, in terms of search guidance, ff-values are used to break ties between successor states of the same state. Cyclic best-first search (CBFS) [101] is similar to DFBnB, but it relies more on ff-values to guide the search and can be viewed as a hybridization of A* and DFBnB. CBFS partitions the open list OO into layers OiO_{i} for each depth ii. A state SS is inserted into OiO_{i} if σ(S)\sigma(S) has ii transitions. At the beginning, O0={S0}O_{0}=\{S^{0}\} and Oi=O_{i}=\emptyset for i>0i>0. Starting with i=0i=0, if OiO_{i}\neq\emptyset, CBFS selects a state having the best priority from OiO_{i} in line 6 and inserts successor states into Oi+1O_{i+1} in line 21. We use the ff-value as the priority. After that, CBFS increases ii by 11. However, when ii is the maximum depth, CBFS resets ii to 0 instead of incrementing it. The maximum depth is usually known in a problem-specific setting, but we do not use a fixed parameter in our setting. Instead, we set ii to 0 when a new best solution is found after line 11, or when Oj=O_{j}=\emptyset for all jij\geq i. In problem-specific settings, CBFS was used in single machine scheduling [101] and the simple assembly line balancing problem (SALBP-1) [102, 103].
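One CBFS cycle over the depth layers can be sketched as follows; a minimal illustration with (f,state)(f,\text{state}) pairs and assumed names, not the solver's API:

```python
import heapq

def cbfs_cycle(layers):
    """One CBFS pass: pop the best-f state from each nonempty depth
    layer in increasing depth order (successors of a popped state
    would be pushed into the next layer)."""
    expanded = []
    for layer in layers:
        if layer:
            expanded.append(heapq.heappop(layer))
    return expanded

layers = [[(2, "root")], [(4, "a"), (1, "b")], []]
for layer in layers:
    heapq.heapify(layer)  # each layer is a min-heap ordered by f
picked = cbfs_cycle(layers)
```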

5.4.4 Anytime Column Search (ACS)

Anytime column search (ACS) [99] can be considered a generalized version of CBFS, expanding bb states at each depth. ACS also partitions the open list OO into OiO_{i} for each depth ii and selects a state from OiO_{i} in line 6, starting with i=0i=0 and O0={S0}O_{0}=\{S^{0}\}. ACS increases ii by 11 after removing bb states from OiO_{i} or when OiO_{i} becomes empty, where bb is a parameter. We remove the best bb states according to the ff-values.

Anytime column progressive search (ACPS) [99] is a non-parametric version of ACS, which starts from b=1b=1 and increases bb by 11 when it reaches the maximum depth. Similarly to CBFS, we set ii to 0 and increase bb by 11 when a new best solution is found or ji,Oj=\forall j\geq i,O_{j}=\emptyset. For combinatorial optimization, ACS and ACPS were evaluated on TSP [99].
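The per-depth selection of ACS can be sketched as follows (ACPS would simply grow bb by one per sweep); an assumed illustration with our own names:

```python
def acs_take(layer, b):
    """Select the b best states (by f-value) from one depth layer,
    returning them together with the states left in the layer."""
    ranked = sorted(layer)
    return ranked[:b], ranked[b:]

taken, rest = acs_take([(5, "p"), (2, "q"), (4, "r")], b=2)
```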

5.4.5 Anytime Pack Search (APS)

Anytime pack search (APS) [100] hybridizes A* and DFBnB in a different way from CBFS and ACS. It maintains the set of the best states ObOO_{b}\subseteq O, initialized with {S0}\{S^{0}\}, the set of the best successor states OcOO_{c}\subseteq O, and a suspend list OsOO_{s}\subseteq O. APS expands all states from ObO_{b} in line 6 and inserts the best bb successor states according to a priority into OcO_{c} and other successor states into OsO_{s}. When there are fewer than bb successor states, all of them are inserted into OcO_{c}. After expanding all states in ObO_{b}, APS swaps ObO_{b} and OcO_{c} and continues the procedure. If ObO_{b} and OcO_{c} are empty, the best bb states are moved from OsO_{s} to ObO_{b}. We use the ff-value as the priority to select states.

Anytime pack progressive search (APPS) [100] starts from b=1b=1 and increases bb by δ\delta if b<b¯b<\overline{b} when the best bb states are moved from OsO_{s} to ObO_{b}, where δ\delta and b¯\overline{b} are parameters. We use δ=1\delta=1 and b¯=\overline{b}=\infty following the configuration in TSP and single machine scheduling [100].
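The APS bookkeeping can be sketched as follows; a minimal illustration under our own naming, showing how the successors of the current best states are split between the next best set OcO_{c} and the suspend list OsO_{s}:

```python
def aps_partition(successors, b):
    """Split successors of the current best states: the b best
    (by f-value) form the next best set O_c; the rest are suspended
    in O_s until both O_b and O_c are exhausted."""
    ranked = sorted(successors)
    return ranked[:b], ranked[b:]

best, suspended = aps_partition([(9, "u"), (3, "v"), (6, "w")], b=1)
```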

5.4.6 Discrepancy-Based Search

We consider a search strategy originating from the CP community, discrepancy-based search [104], which assumes that successor states of the same state are assigned priorities based on some estimation. In our case, we use the ff-value as the priority. The idea behind discrepancy-based search is that the estimation may make mistakes, but the number of mistakes made by a strong guidance heuristic is limited. For each path, the discrepancy, i.e., the number of deviations from the estimated best path, is maintained. The target state has a discrepancy of 0. When a state SS has a discrepancy of dd, the successor state with the best priority also has a discrepancy of dd, while the other successor states have a discrepancy of d+1d+1. Discrepancy-based search algorithms explore states whose discrepancy is smaller than an upper bound and iteratively increase the upper bound.
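The discrepancy bookkeeping can be sketched as follows; an assumed illustration (our names), taking a parent's discrepancy and its successors ranked best-first by priority:

```python
def assign_discrepancies(d, ranked_successors):
    """The best-ranked successor keeps the parent's discrepancy d;
    every other successor gets d + 1."""
    return [(s, d if i == 0 else d + 1)
            for i, s in enumerate(ranked_successors)]

out = assign_discrepancies(2, ["best", "second", "third"])
```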

Discrepancy-bounded depth-first search (DBDFS) [105] performs depth-first search that only expands states having the discrepancy between (i1)k(i-1)k and ik1ik-1 inclusive, where ii starts from 11 and increases by 11 when all states within the range are expanded, and kk is a parameter. The open list is partitioned into two sets O0O_{0} and O1O_{1}, and O0={S0}O_{0}=\{S^{0}\} and O1=O_{1}=\emptyset at the beginning. A state is selected from O0O_{0} in line 6. Successor states with the discrepancy between (i1)k(i-1)k and ik1ik-1 are added to O0O_{0}, and other states are added to O1O_{1}. When O0O_{0} becomes empty, it is swapped with O1O_{1}, and ii is increased by 11. The discrepancy of states in O1O_{1} is ikik because the discrepancy is increased by at most 11 at a successor state. Therefore, after swapping O0O_{0} with O1O_{1}, the discrepancy of states in O0O_{0} falls between the new bounds, ikik and (i+1)k1(i+1)k-1. For depth-first search, when selecting a state to remove from O0O_{0}, we break ties by the ff-values. We use k=1k=1 in our configuration. Discrepancy-based search was originally proposed as tree search [104, 105] and later applied to state space search for SOP [88].

5.4.7 Complete Anytime Beam Search (CABS)

While the above algorithms except for A* are similar to depth-first search, we consider beam search, a heuristic search algorithm that searches the state transition graph layer by layer, similar to breadth-first search. Previous work in heuristic search has shown that breadth-first search can be beneficial in saving memory by discarding states in previous layers [106]. However, breadth-first search may take a long time to find a solution, particularly in our setting. In DP models for combinatorial optimization problems such as TSPTW and talent scheduling, all solutions have the same length, and an algorithm needs to reach the last layer to find a solution. For such settings, breadth-first search may need to expand many states before reaching the last layer. Beam search mitigates this issue by expanding at most bb states at each layer, where bb is a parameter called a beam width, while losing completeness.

Since beam search cannot be considered an instantiation of Algorithm 1, we provide dedicated pseudo-code in Algorithm 2. Beam search maintains states in the same layer, i.e., states that are reached with the same number of transitions, in the open list OO, which is initialized with the target state. Beam search expands all states in OO, inserts the best bb successor states into OO, and discards the remaining successor states. Beam search may discard all successor states leading to solutions, so it is incomplete, i.e., it may not find a solution.

Algorithm 2 Beam search for minimization with a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle. An approximate dominance relation a\preceq_{a}, a dual bound function η\eta, a primal bound γ¯\overline{\gamma}, and a beam width bb are given as input.
1:if S0⊧̸𝒞S^{0}\not\models\mathcal{C} then return NULL, \top
2:σ¯NULL\overline{\sigma}\leftarrow\text{NULL}, complete\text{complete}\leftarrow\top \triangleright Initialize the solution.
3:l0l\leftarrow 0, σl(S0)\sigma^{l}(S^{0})\leftarrow\langle\rangle, gl(S0)𝟏g^{l}(S^{0})\leftarrow\mathbf{1} \triangleright Initialize the gg-value.
4:O{S0}O\leftarrow\{S^{0}\} \triangleright Initialize the open list.
5:while OO\neq\emptyset and σ¯=NULL\overline{\sigma}=\text{NULL} do
6:     GG\leftarrow\emptyset \triangleright Initialize the set of states.
7:     for all SOS\in O do
8:         if B,SCB\exists B\in\mathcal{B},S\models C_{B} then
9:              current_costgl(S)×minB:SCB𝖻𝖺𝗌𝖾_𝖼𝗈𝗌𝗍B(S)\text{current\_cost}\leftarrow g^{l}(S)\times\min_{B\in\mathcal{B}:S\models C_{B}}\mathsf{base\_cost}_{B}(S) \triangleright Compute the solution cost.
10:              if current_cost<γ¯\text{current\_cost}<\overline{\gamma} then
11:                  γ¯current_cost\overline{\gamma}\leftarrow\text{current\_cost}, σ¯σl(S)\overline{\sigma}\leftarrow\sigma^{l}(S) \triangleright Update the best solution.               
12:         else
13:              for all τ𝒯(S):S[[τ]]𝒞\tau\in\mathcal{T}^{*}(S):S[\![\tau]\!]\models\mathcal{C} do
14:                  gcurrentgl(S)×wτ(S)g_{\text{current}}\leftarrow g^{l}(S)\times w_{\tau}(S) \triangleright Compute the gg-value.
15:                  if SG\not\exists S^{\prime}\in G such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and gcurrentgl+1(S)g_{\text{current}}\geq g^{l+1}(S^{\prime}) then
16:                       if gcurrent×η(S[[τ]])<γ¯g_{\text{current}}\times\eta(S[\![\tau]\!])<\overline{\gamma} then
17:                           if SG\exists S^{\prime}\in G such that SaS[[τ]]S^{\prime}\preceq_{a}S[\![\tau]\!] and gcurrentgl+1(S)g_{\text{current}}\leq g^{l+1}(S^{\prime}) then
18:                                GG{S}G\leftarrow G\setminus\{S^{\prime}\} \triangleright Remove a dominated state.                            
19:                           σl+1(S[[τ]])σl(S);τ\sigma^{l+1}(S[\![\tau]\!])\leftarrow\langle\sigma^{l}(S);\tau\rangle, gl+1(S[[τ]])gcurrentg^{l+1}(S[\![\tau]\!])\leftarrow g_{\text{current}} \triangleright Update the gg-value.
20:                           GG{S[[τ]]}G\leftarrow G\cup\{S[\![\tau]\!]\} \triangleright Insert the successor state.                                                                      
21:     ll+1l\leftarrow l+1 \triangleright Proceed to the next layer.
22:     O{SGgl(S)×η(S)<γ¯}O\leftarrow\{S\in G\mid g^{l}(S)\times\eta(S)<\overline{\gamma}\} \triangleright Prune states by the bound.
23:     if |O|>b|O|>b then
24:         OO\leftarrow the best bb states in OO, complete\text{complete}\leftarrow\bot \triangleright Keep the best bb states.      
25:if complete and OO\neq\emptyset then
26:     complete\text{complete}\leftarrow\bot \triangleright A better solution may exist.
27:return σ¯\overline{\sigma}, complete \triangleright Return the solution.
Algorithm 3 CABS for minimization with a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle. An approximate dominance relation a\preceq_{a} and a dual bound function η\eta are given as input.
1:γ¯\overline{\gamma}\leftarrow\infty, σ¯NULL\overline{\sigma}\leftarrow\text{NULL}, b1b\leftarrow 1 \triangleright Initialization.
2:loop
3:     σ,complete\sigma,\text{complete}\leftarrow BeamSearch(𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle, a\preceq_{a}, η\eta, γ¯\overline{\gamma}, bb) \triangleright Execute Algorithm 2.
4:     if 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ)<γ¯\mathsf{solution\_cost}(\sigma)<\overline{\gamma} then
5:         γ¯𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ)\overline{\gamma}\leftarrow\mathsf{solution\_cost}(\sigma), σ¯σ\overline{\sigma}\leftarrow\sigma \triangleright Update the solution.      
6:     if complete then
7:         return σ¯\overline{\sigma} \triangleright Return the solution.      
8:     b2bb\leftarrow 2b \triangleright Double the beam width.

Complete anytime beam search (CABS) [107] is a complete version of beam search. In our configuration, CABS iteratively executes beam search with a beam width bb starting from b=1b=1 and doubles bb after each iteration until finding an optimal solution or proving infeasibility. With this strategy, CABS tries to quickly find a solution with small bb and eventually converges to breadth-first search when bb is large enough. Note that this configuration follows Libralesso et al. [88, 108], who applied CABS to SOP and permutation flowshop. Originally, Zhang [107] considered a generalized version of beam search, i.e., successor states inserted into OO are decided by a user-provided pruning rule, which can be different from selecting the best bb states. In this generalized version, CABS repeats beam search while relaxing the pruning rule and terminates when it finds a satisfactory solution according to some criterion.

In Algorithm 2, beam search maintains the set of states in the current layer using the open list OO, and the set of states in the next layer using GG. The open list OO is updated to GG after all successor states are generated, pruning states based on the bound (line 22). If OO contains more than bb states, only the best bb states are kept. This operation may prune optimal solutions, so the flag complete, which indicates the completeness of beam search, becomes \bot. Beam search terminates when OO becomes empty or a solution is found (line 5). Even if complete=\text{complete}=\top and a solution is found, a better solution may be missed when OO is not empty (line 25). Therefore, we update complete to \bot in such a case (line 26). The maximization version of Algorithm 2 can be derived in a similar way to Algorithm 1.
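The interaction between beam search and CABS can be sketched in Python. The following minimal illustration (not our implementation; `beam_search` and `cabs` are hypothetical names) omits dominance detection, the dual bound function, and the primal bound of Algorithms 2 and 3, and searches a layered DAG given as a list of edge-weight dictionaries:

```python
def beam_search(layers, b):
    """One beam-search pass over a layered DAG starting at node 0.

    layers: list of dicts mapping (u, v) -> edge weight between
    consecutive layers. Returns (best_cost, complete), where complete
    is False if any layer was truncated to the beam width b."""
    g = {0: 0.0}              # g-values of the current layer
    complete = True
    for edges in layers:
        nxt = {}              # g-values of the next layer
        for u, gu in g.items():
            for (s, v), w in edges.items():
                if s == u and gu + w < nxt.get(v, float("inf")):
                    nxt[v] = gu + w
        if len(nxt) > b:      # keep only the best b states
            nxt = dict(sorted(nxt.items(), key=lambda kv: kv[1])[:b])
            complete = False  # optimal solutions may have been pruned
        g = nxt
    return (min(g.values()) if g else float("inf")), complete


def cabs(layers):
    """Complete anytime beam search: double b until a pass is complete."""
    b, best = 1, float("inf")
    while True:
        cost, complete = beam_search(layers, b)
        best = min(best, cost)
        if complete:
            return best
        b *= 2
```

With b = 1 a pass is greedy and may be truncated (complete = False); CABS doubles b until a pass finishes without truncation, at which point the pass is an exhaustive layer-by-layer sweep and the returned cost is optimal.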

Beam search in Algorithm 2 has several properties that are different from Algorithm 1.

1. The set GG, which is used to detect dominance, contains only states in the next layer.

2. A state SS may be dropped from the open list OO even if g(S)×η(S)<γ¯g(S)\times\eta(S)<\overline{\gamma} (line 24).

3. Beam search may terminate when a solution is found even if OO\neq\emptyset.

With Property (1), beam search can potentially save memory compared to Algorithm 1. This method can be considered layered duplicate detection as proposed in previous work [106]. With this strategy, we do not detect duplicates when the same states appear in different layers. When a generated successor state S[[τ]]S[\![\tau]\!] in the next layer is the same as a state SS^{\prime} in the current layer, in line 19, we do not want to update g(S)g(S^{\prime}) and σ(S)\sigma(S^{\prime}) since we do not check if a better path to SS^{\prime} is found. Thus, we maintain ll, which is incremented by 1 after each layer (line 21), and use glg^{l} and σl\sigma^{l} to differentiate gg and σ\sigma for different layers.

Our layered duplicate detection mechanism prevents us from using beam search when the state transition graph contains cycles; beam search cannot store states found in the previous layers, so it continues to expand states in a cycle. This issue can be addressed by initializing GG with {S0}\{S^{0}\} outside the while loop, e.g., just after line 4, and removing line 6. With this modification, beam search can be used for a cyclic but cost-algebraic DyPDL model.

By Properties (2) and (3), beam search is not guaranteed to prove optimality or infeasibility unless complete=\text{complete}=\top. However, CABS (Algorithm 3) guarantees optimality as it repeats beam search until complete becomes \top. In what follows, we formalize the above points. Once again, we present the theoretical results for minimization, but they can be easily adapted to maximization.

Theorem 16.

Given a finite, acyclic, and monoidal DyPDL model, beam search terminates in finite time.

Proof.

Suppose that we have generated a successor state S[[τ]]S[\![\tau]\!], which was generated before. The difference from Algorithm 1 that we need to consider is Property (1). If S[[τ]]S[\![\tau]\!] was generated before as a successor state of a state in the current layer, by the proof of Theorem 10, there exists a state SGS^{\prime}\in G with S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime}. The successor state S[[τ]]S[\![\tau]\!] is inserted into GG again only if we find a better path to S[[τ]]S[\![\tau]\!]. If S[[τ]]S[\![\tau]\!] was generated before as a successor state of a state in a previous layer, the path to S[[τ]]S[\![\tau]\!] at that time was shorter (in terms of the number of transitions) than the current path. Thus, the current path is different from the previous path. The successor state S[[τ]]S[\![\tau]\!] may be inserted into GG since it is not included in GG. In either case, S[[τ]]S[\![\tau]\!] is inserted into GG again only if we find a new path to it. Since the number of paths to S[[τ]]S[\![\tau]\!] is finite, we insert S[[τ]]S[\![\tau]\!] into GG only finitely many times. The rest of the proof follows that of Theorem 10. ∎

As we discussed above, with a slight modification, we can remove Property (1) and prove the termination of beam search for a cost-algebraic DyPDL model.

Since Properties (1)–(3) do not affect the proofs of Lemma 2 and Theorem 12, the following theorem holds.

Theorem 17.

After line 11 of Algorithm 2, if σ¯NULL\overline{\sigma}\neq\text{NULL}, then σ¯\overline{\sigma} is a solution for the model with γ¯=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ¯)\overline{\gamma}=\mathsf{solution\_cost}(\overline{\sigma}).

We also prove the optimality of beam search when complete=\text{complete}=\top.

Theorem 18.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle and γ¯A\overline{\gamma}\in A, if an optimal solution exists for the model, and beam search returns σ¯NULL\overline{\sigma}\neq\text{NULL} and complete=\text{complete}=\top, then σ¯\overline{\sigma} is an optimal solution. If beam search returns σ¯=NULL\overline{\sigma}=\text{NULL} and complete=\text{complete}=\top, then there does not exist a solution whose cost is less than γ¯\overline{\gamma}.

Proof.

When complete=\text{complete}=\top is returned, during the execution, beam search never reached lines 24 and 26. Therefore, we can ignore Properties (2) and (3). If we modify Algorithm 2 so that GG contains states in all layers as discussed above, we can also ignore Property (1). By ignoring Properties (1)–(3), we can consider beam search as an instantiation of Algorithm 1. If the model is infeasible, or an optimal solution exists with the cost γ\gamma^{*} and γ¯>γ\overline{\gamma}>\gamma^{*} at the beginning, the proof is exactly the same as that of Theorem 13. If γ¯γ\overline{\gamma}\leq\gamma^{*} was given as input, beam search has never updated σ¯\overline{\sigma} and γ¯\overline{\gamma}, and NULL is returned if it terminates. In such a case, indeed, no solution has a cost less than γ¯γ\overline{\gamma}\leq\gamma^{*}.

The above proof is for beam search with the modification. We confirm that it is also valid with beam search in Algorithm 2 without modification, i.e., we consider Property (1). The proof of Theorem 13 depends on Lemma 3, which claims that when a solution with a cost γ^\hat{\gamma} exists and γ¯>γ^\overline{\gamma}>\hat{\gamma}, the open list contains a state S^\hat{S} such that there exists an S^\hat{S}-solution σ^=σ^1,,σ^m\hat{\sigma}=\langle\hat{\sigma}_{1},...,\hat{\sigma}_{m}\rangle with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}. At the beginning, S^=S0\hat{S}=S^{0} exists in OO. When such a state S^\hat{S} exists in the current layer, it is expanded. First, we show that a successor state of S^\hat{S} satisfying the condition is generated. If no applicable forced transitions are identified in S^\hat{S}, a successor state S^[[σ^1]]\hat{S}[\![\hat{\sigma}_{1}]\!] with σ(S^[[σ^1]])=σ(S^);σ^1\sigma(\hat{S}[\![\hat{\sigma}_{1}]\!])=\langle\sigma(\hat{S});\hat{\sigma}_{1}\rangle is generated, and

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^[[σ^1]]);σ^2,,σ^m)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^.\mathsf{solution\_cost}(\langle\sigma(\hat{S}[\![\hat{\sigma}_{1}]\!]);\hat{\sigma}_{2},...,\hat{\sigma}_{m}\rangle)=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.

If an applicable forced transition is identified, only one successor S^[[τ]]\hat{S}[\![\tau]\!] is generated with a forced transition τ\tau. By Definition 17, there exists an S^\hat{S}-solution τ;σ^\langle\tau;\hat{\sigma}^{\prime}\rangle with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ^,S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\mathsf{solution\_cost}(\langle\tau,\hat{\sigma}^{\prime}\rangle,\hat{S})\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}). Since AA is isotone,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^[[τ]]);σ^)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);τ;σ^)=wσ(S^)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ,σ^,S^)wσ(S^)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^.\begin{split}\mathsf{solution\_cost}(\langle\sigma(\hat{S}[\![\tau]\!]);\hat{\sigma}^{\prime}\rangle)&=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\tau;\hat{\sigma}^{\prime}\rangle)=w_{\sigma(\hat{S})}(S^{0})\times\mathsf{solution\_cost}(\langle\tau,\hat{\sigma}^{\prime}\rangle,\hat{S})\\ &\leq w_{\sigma(\hat{S})}(S^{0})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.\end{split}

If the successor state S^[[τ]]\hat{S}[\![\tau]\!] (or S^[[σ^1]]\hat{S}[\![\hat{\sigma}_{1}]\!]) is not inserted into GG in line 20, another state SGS^{\prime}\in G dominates S^[[τ]]\hat{S}[\![\tau]\!] with a better or equal gg-value, so there exists a solution extending σ(S)\sigma(S^{\prime}) with the cost at most γ^\hat{\gamma}. Thus, SS^{\prime} can be considered a new S^\hat{S}. When S^[[τ]]\hat{S}[\![\tau]\!] or SS^{\prime} is removed from GG by line 18, another state S′′S^{\prime\prime} that dominates S^[[τ]]\hat{S}[\![\tau]\!] or SS^{\prime} with a better or equal gg-value is inserted into GG, and there exists a solution extending σ(S′′)\sigma(S^{\prime\prime}) with the cost at most γ^\hat{\gamma}. ∎

For CABS, since beam search returns an optimal solution or proves the infeasibility when complete=\text{complete}=\top by Theorem 18, the optimality is straightforward by line 6 of Algorithm 3.

Corollary 3.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle, if an optimal solution exists for the model, and CABS returns a solution that is not NULL, then it is an optimal solution. If CABS returns NULL, then the model is infeasible.

We prove that CABS terminates when a DyPDL model is finite, monoidal, and acyclic.

Theorem 19.

Given a finite, acyclic, and monoidal DyPDL model, CABS terminates in finite time.

Proof.

When the beam width bb is sufficiently large, e.g., equal to the number of reachable states in the model, beam search never reaches line 24. Since the number of reachable states is finite, bb eventually becomes such a large number with finite iterations. Suppose that we call beam search with sufficiently large bb. If complete=\text{complete}=\top is returned, we are done. Otherwise, beam search should have found a new solution whose cost is better than γ¯\overline{\gamma} in line 11 and reached line 26. In this case, there exists a solution for the model. Since the state transition graph is finite and acyclic, there are a finite number of solutions, and there exists an optimal solution with the cost γ\gamma^{*}. Since γ¯\overline{\gamma} decreases after each call if complete=\text{complete}=\bot, eventually, γ¯\overline{\gamma} becomes γ\gamma^{*}, and complete=\text{complete}=\top is returned with finite iterations. By Theorem 16, each call of beam search terminates in finite time. Therefore, CABS terminates in finite time. ∎

To obtain a dual bound from beam search, we need to slightly modify Theorem 14; since beam search may discard states leading to optimal solutions, we need to keep track of the minimum (or maximum for maximization) gl(S)×η(S)g^{l}(S)\times\eta(S) value for all discarded states in addition to states in OO.

Theorem 20.

Let A,×,𝟏\langle A,\times,\mathbf{1}\rangle be a monoid where A{,}A\subseteq\mathbb{Q}\cup\{-\infty,\infty\} and AA is isotone. Given a monoidal DyPDL model 𝒱,S0,𝒯,,𝒞\langle\mathcal{V},S^{0},\mathcal{T},\mathcal{B},\mathcal{C}\rangle with A,×,𝟏\langle A,\times,\mathbf{1}\rangle and γ¯A\overline{\gamma}\in A, let DmD_{m} be the set of states dropped in layer ml1m\leq l-1 by line 24 of Algorithm 2. If an optimal solution for the model exists and has the cost γ\gamma^{*}, just after line 22,

min{γ¯,minSOgl(S)×η(S),minm=1,,l1minSDmgm(S)×η(S)}γ\min\left\{\overline{\gamma},\min_{S\in O}g^{l}(S)\times\eta(S),\min_{m=1,...,l-1}\min_{S\in D_{m}}g^{m}(S)\times\eta(S)\right\}\leq\gamma^{*}

where we assume minSOgl(S)×η(S)=\min_{S\in O}g^{l}(S)\times\eta(S)=\infty if O=O=\emptyset and minSDmgm(S)×η(S)=\min_{S\in D_{m}}g^{m}(S)\times\eta(S)=\infty if Dm=D_{m}=\emptyset.

Proof.

If γ¯γ\overline{\gamma}\leq\gamma^{*}, the inequality holds trivially, so we assume γ¯>γ\overline{\gamma}>\gamma^{*}. We prove that there exists a state S^Om=1l1Dm\hat{S}\in O\cup\bigcup_{m=1}^{l-1}D_{m} on an optimal path, i.e., there exists an S^\hat{S}-solution σ^\hat{\sigma} such that σm(S^);σ^\langle\sigma^{m}(\hat{S});\hat{\sigma}\rangle is an optimal solution where m{0,,l}m\in\{0,...,l\}. Initially, O={S0}O=\{S^{0}\}, so the condition is satisfied. Suppose that a state S^\hat{S} on an optimal path is included in OO just before line 7. If S^\hat{S} is a base state, we reach line 9, and current_cost=γ\text{current\_cost}=\gamma^{*}. Since γ¯>γ\overline{\gamma}>\gamma^{*}, γ¯\overline{\gamma} is updated to γ\gamma^{*} in line 11. Then, γ¯=γγ\overline{\gamma}=\gamma^{*}\leq\gamma^{*} will hold after line 22. If S^\hat{S} is not a base state, by a similar argument to the proof of Theorem 18 (or Theorem 13 if we consider the modified version where states in all layers are kept in GG), a state on an optimal path, SS^{\prime}, will be included in GG just before line 22. Since gl(S)×η(S)γ<γ¯g^{l}(S^{\prime})\times\eta(S^{\prime})\leq\gamma^{*}<\overline{\gamma}, SOS^{\prime}\in O holds after line 22, and minSOgl(S)×η(S)gl(S)×η(S)γ\min_{S\in O}g^{l}(S)\times\eta(S)\leq g^{l}(S^{\prime})\times\eta(S^{\prime})\leq\gamma^{*}. After line 24, SS^{\prime} will be included in either OO or DlD_{l}, which can be considered a new S^\hat{S}. Suppose that S^Dm\hat{S}\in D_{m} just before line 7. Since S^\hat{S} is never removed from DmD_{m}, minSDmgm(S)×η(S)gm(S^)×η(S^)γ\min_{S\in D_{m}}g^{m}(S)\times\eta(S)\leq g^{m}(\hat{S})\times\eta(\hat{S})\leq\gamma^{*} always holds. By mathematical induction, the theorem is proved. ∎

6 DyPDL Models for Combinatorial Optimization Problems

To show the flexibility of DyPDL, in addition to TSPTW and talent scheduling, we formulate DyPDL models for NP-hard combinatorial optimization problems from different application domains such as routing, packing, scheduling, and manufacturing. We select problem classes whose DyPDL models can be solved by our heuristic search solvers. The models are monoidal with isotonicity, which guarantees the optimality of the heuristic search solvers, as shown in Theorem 13 and Corollary 3. While some models are cost-algebraic and others are not, all of them are acyclic, so the heuristic search solvers terminate in finite time, as shown in Theorems 10 and 19. We diversify the problem classes with the following criteria:

• Both minimization and maximization problems are included.

• DyPDL models with different binary operators for the cost expressions are included: addition (++) and taking the maximum (max\max).

• Each of DIDP, MIP, and CP outperforms the others in at least one problem class, as shown in Section 7.

We present DyPDL models for six problem classes satisfying the above criteria in this section and three in Appendix B. For some of the problem classes, problem-specific DP approaches were previously proposed, and our DyPDL models are based on them. To concisely represent the models, we only present the Bellman equations since all models are finite and acyclic and satisfy the Principle of Optimality in Definition 13. The YAML-DyPDL files for the models are publicly available in our repository (https://github.com/Kurorororo/didp-models).

6.1 Capacitated Vehicle Routing Problem (CVRP)

In the capacitated vehicle routing problem (CVRP) [109], customers N={0,,n1}N=\{0,...,n-1\}, where 0 is the depot, are given, and each customer iN{0}i\in N\setminus\{0\} has the demand di0d_{i}\geq 0. A solution is a tour to visit each customer in N{0}N\setminus\{0\} exactly once using mm vehicles, which start from and return to the depot. The sum of demands of customers visited by a single vehicle must be less than or equal to the capacity qq. We assume diqd_{i}\leq q for each iNi\in N. Visiting customer jj from ii requires the travel time cij0c_{ij}\geq 0, and the objective is to minimize the total travel time. CVRP is strongly NP-hard since it generalizes TSP [110].

We formulate the DyPDL model based on the giant-tour representation [24]. We sequentially construct tours for the mm vehicles. Let UU be a set variable representing unvisited customers, ii be an element variable representing the current location, ll be a numeric variable representing the current load, and kk be a numeric variable representing the number of used vehicles. Both ll and kk are resource variables where less is preferred. At each step, one customer jj is visited by the current vehicle or a new vehicle. When a new vehicle is used, jj is visited via the depot, ll is reset, and kk is increased. Similar to TSPTW, let cjin=minkN{j}ckjc^{\text{in}}_{j}=\min_{k\in N\setminus\{j\}}c_{kj} and cjout=minkN{j}cjkc^{\text{out}}_{j}=\min_{k\in N\setminus\{j\}}c_{jk}.

compute V(N{0},0,0,1)\displaystyle\text{compute }V(N\setminus\{0\},0,0,1) (12)
V(U,i,l,k)={if (mk+1)q<l+jUdjci0else if U=min{minjU:l+djqcij+V(U{j},j,l+dj,k)minjUci0+c0j+V(U{j},j,dj,k+1)else if jU,l+djqk<mminjU:l+djqcij+V(U{j},j,l+dj,k)else if jU:l+djqminjUci0+c0j+V(U{j},j,dj,k+1)else if k<melse\displaystyle V(U,i,l,k)=\begin{cases}\infty&\text{if }(m-k+1)q<l+\sum\limits_{j\in U}d_{j}\\ c_{i0}&\text{else if }U=\emptyset\\ \min\left\{\begin{array}[]{l}\min\limits_{j\in U:l+d_{j}\leq q}c_{ij}+V(U\setminus\{j\},j,l+d_{j},k)\\ \min\limits_{j\in U}c_{i0}+c_{0j}+V(U\setminus\{j\},j,d_{j},k+1)\end{array}\right.&\text{else if }\exists j\in U,l+d_{j}\leq q\land k<m\\ \min\limits_{j\in U:l+d_{j}\leq q}c_{ij}+V(U\setminus\{j\},j,l+d_{j},k)&\text{else if }\exists j\in U:l+d_{j}\leq q\\ \min\limits_{j\in U}c_{i0}+c_{0j}+V(U\setminus\{j\},j,d_{j},k+1)&\text{else if }k<m\\ \infty&\text{else}\end{cases} (13)
V(U,i,l,k)V(U,i,l,k)if llkk\displaystyle V(U,i,l,k)\leq V(U,i,l^{\prime},k^{\prime})~{}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\text{if }l\leq l^{\prime}\land k\leq k^{\prime} (14)
V(U,i,l,k)max{jU{0}cjin,jU{i}cjout}.\displaystyle V(U,i,l,k)\geq\max\left\{\sum_{j\in U\cup\{0\}}c^{\text{in}}_{j},\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}\right\}. (15)

The first line of Equation (13) represents a state constraint: in a state, if the sum of the capacities of the remaining vehicles ((mk+1)q(m-k+1)q) is less than the sum of the current load (ll) and the demands of the unvisited customers (jUdj\sum_{j\in U}d_{j}), the state does not lead to a solution. The second line is a base case where all customers are visited. The model has two types of transitions: directly visiting customer jj, which is applicable when the current vehicle has sufficient space (l+djql+d_{j}\leq q), and visiting jj with a new vehicle from the depot, which is applicable when there is an unused vehicle (k<mk<m). The third line is active when both are possible, and the fourth and fifth lines are active when only one of them is possible. Recall from Definition 14 that a state SS dominates another state SS^{\prime} iff for any SS^{\prime}-solution, there exists an equal or better SS-solution with an equal or shorter length. If lll\leq l^{\prime} and kkk\leq k^{\prime}, any (U,i,l,k)(U,i,l^{\prime},k^{\prime})-solution is also a (U,i,l,k)(U,i,l,k)-solution, so the dominance implied by Inequality (14) satisfies this condition. Inequality (15) is a dual bound function defined in the same way as Inequality (6) of the DyPDL model for TSPTW. Similar to the DyPDL model for TSPTW, this model is cost-algebraic with a cost algebra 0+,+,0\langle\mathbb{Q}^{+}_{0},+,0\rangle. However, since the base cost ci0c_{i0} is not necessarily zero, the first solution found by CAASDy may not be optimal (Theorem 15).
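To make the recursion concrete, the following Python sketch evaluates Equation (13) directly by memoizing V over (U, i, l, k), with U encoded as a frozenset. This is a hypothetical brute-force evaluator meant only to illustrate the Bellman equation; it is exponential in the number of customers and is not how our solvers work:

```python
from functools import lru_cache


def cvrp_value(c, d, q, m):
    """Evaluate the CVRP Bellman equation: c is the travel-time matrix
    (customer 0 is the depot), d the demands, q the vehicle capacity,
    and m the number of vehicles."""
    n = len(c)
    INF = float("inf")

    @lru_cache(maxsize=None)
    def V(U, i, l, k):
        # State constraint: remaining vehicle capacity must cover the
        # current load plus all unvisited demand.
        if (m - k + 1) * q < l + sum(d[j] for j in U):
            return INF
        if not U:
            return c[i][0]      # base case: return to the depot
        best = INF
        for j in U:
            rest = U - {j}
            if l + d[j] <= q:   # continue with the current vehicle
                best = min(best, c[i][j] + V(rest, j, l + d[j], k))
            if k < m:           # dispatch a new vehicle via the depot
                best = min(best, c[i][0] + c[0][j] + V(rest, j, d[j], k + 1))
        return best

    return V(frozenset(range(1, n)), 0, 0, 1)
```

On a small instance, reducing m below the number of vehicles needed makes the state constraint fire immediately, so the value is infinite.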

6.2 Multi-Commodity Pickup and Delivery TSP (m-PDTSP)

A one-to-one multi-commodity pickup and delivery traveling salesperson problem (m-PDTSP) [111] is to pick up and deliver commodities using a single vehicle. Similar to CVRP, m-PDTSP is a generalization of TSP and is strongly NP-hard. In this problem, customers N={0,,n1}N=\{0,...,n-1\}, edges AN×NA\subseteq N\times N, and commodities M={0,,m1}M=\{0,...,m-1\} are given. The vehicle can visit customer jj directly from customer ii with the travel time cij0c_{ij}\geq 0 if (i,j)A(i,j)\in A. Each commodity kMk\in M is picked up at customer pkNp_{k}\in N and delivered to customer dkNd_{k}\in N. The load increases (decreases) by wkw_{k} at pkp_{k} (dkd_{k}) and must not exceed the capacity qq. The vehicle starts from 0, visits each customer once, and stops at n1n-1. We assume that cyclic dependencies between commodities, e.g., pk=dkp_{k}=d_{k^{\prime}} and pk=dkp_{k^{\prime}}=d_{k}, do not exist.

We propose a DyPDL model based on the 1-PDTSP reduction [112] and the DP model by Castro et al. [113]. In a state, a set variable UU represents the set of unvisited customers, an element variable ii represents the current location, and a numeric resource variable ll represents the current load. The net change of the load at customer jj is represented by δj=kM:pk=jwkkM:dk=jwk\delta_{j}=\sum_{k\in M:p_{k}=j}w_{k}-\sum_{k\in M:d_{k}=j}w_{k}, and the customers that must be visited before jj is represented by Pj={pkkM:dk=j}P_{j}=\{p_{k}\mid k\in M:d_{k}=j\}, both of which can be precomputed. The set of customers that can be visited next is X(U,i,l)={jU(i,j)Al+δjqPjU=}X(U,i,l)=\{j\in U\mid(i,j)\in A\land l+\delta_{j}\leq q\land P_{j}\cap U=\emptyset\}. Let cjin=minkN:(k,j)Ackjc^{\text{in}}_{j}=\min_{k\in N:(k,j)\in A}c_{kj} and cjout=minkN:(j,k)Acjkc^{\text{out}}_{j}=\min_{k\in N:(j,k)\in A}c_{jk}.

compute V(N{0,n1},0,0)\displaystyle\text{compute }V(N\setminus\{0,n-1\},0,0) (16)
V(U,i,l)={ci,n1if U=(i,n1)AminjX(U,i,l)cij+V(U{j},j,l+δj)else if X(U,i,l)else\displaystyle V(U,i,l)=\begin{cases}c_{i,n-1}&\text{if }U=\emptyset\land(i,n-1)\in A\\ \min\limits_{j\in X(U,i,l)}c_{ij}+V(U\setminus\{j\},j,l+\delta_{j})&\text{else if }X(U,i,l)\neq\emptyset\\ \infty&\text{else}\end{cases} (17)
V(U,i,l)V(U,i,l)if ll\displaystyle V(U,i,l)\leq V(U,i,l^{\prime})\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\ \ \text{if }l\leq l^{\prime} (18)
V(U,i,l)max{jU{n1}cjin,jU{i}cjout}.\displaystyle V(U,i,l)\geq\max\left\{\sum_{j\in U\cup\{n-1\}}c^{\text{in}}_{j},\sum_{j\in U\cup\{i\}}c^{\text{out}}_{j}\right\}. (19)

As with CVRP, Inequality (18) represents the dominance implied by the resource variable, and Inequality (19) gives a dual bound function. The model is cost-algebraic with a cost algebra 0+,+,0\langle\mathbb{Q}^{+}_{0},+,0\rangle, and the base cost ci,n1c_{i,n-1} is not necessarily zero.
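The precomputed quantities δ_j and P_j and the successor set X(U, i, l) translate directly into code. A minimal Python sketch with hypothetical function names, where commodities are given as (pickup, delivery, weight) triples:

```python
def precompute_mpdtsp(n, commodities):
    """Net load change delta_j and predecessor sets P_j from
    (pickup, delivery, weight) triples."""
    delta = [0] * n
    pred = [set() for _ in range(n)]
    for p, dl, w in commodities:
        delta[p] += w    # pick up w units at p
        delta[dl] -= w   # deliver w units at dl
        pred[dl].add(p)  # p must be visited before dl
    return delta, pred


def successors(U, i, l, arcs, q, delta, pred):
    """X(U, i, l): customers that can feasibly be visited next."""
    return {j for j in U
            if (i, j) in arcs          # an edge exists
            and l + delta[j] <= q      # capacity is respected
            and not (pred[j] & U)}     # all pickups for j are done
```

For example, a customer whose pickup location is still unvisited is excluded from X even if the edge and capacity conditions hold.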

6.3 Orienteering Problem with Time Windows (OPTW)

In the orienteering problem with time windows (OPTW) [114], customers N={0,,n1}N=\{0,...,n-1\} are given, where 0 is the depot. Visiting customer jj from ii requires the travel time cij>0c_{ij}>0 while producing the integer profit pj0p_{j}\geq 0. Each customer jj can be visited only in the time window [aj,bj][a_{j},b_{j}], and the vehicle needs to wait until aja_{j} upon earlier arrival. The objective is to maximize the total profit while starting from the depot at time t=0t=0 and returning to the depot by b0b_{0}. OPTW is strongly NP-hard since it is a generalization of the orienteering problem, which is NP-hard [115].

Our DyPDL model is similar to the DP model by Righini and Salani [26] but designed for DIDP with forced transitions and a dual bound function. A set variable UU represents the set of customers to visit, an element variable ii represents the current location, and a numeric resource variable tt represents the current time, where less is preferred. We visit customers one by one using transitions. Customer jj can be visited next if it can be visited and the depot can be reached by the deadline after visiting jj. Let cijc^{*}_{ij} be the shortest travel time from ii to jj. Then, the set of customers that can be visited next is X(U,i,t)={jUt+cijbjt+cij+cj0b0}X(U,i,t)=\{j\in U\mid t+c_{ij}\leq b_{j}\land t+c_{ij}+c^{*}_{j0}\leq b_{0}\}. In addition, we remove a customer that can no longer be visited using a forced transition. If t+cij>bjt+c^{*}_{ij}>b_{j}, then we can no longer visit customer jj. If t+cij+cj0>b0t+c^{*}_{ij}+c^{*}_{j0}>b_{0}, then we can no longer return to the depot after visiting jj. Thus, the set of unvisited customers that can no longer be visited is represented by Y(U,i,t)={jUt+cij>bjt+cij+cj0>b0}Y(U,i,t)=\{j\in U\mid t+c^{*}_{ij}>b_{j}\lor t+c^{*}_{ij}+c^{*}_{j0}>b_{0}\}. The set Y(U,i,t)Y(U,i,t) is not necessarily equivalent to UX(U,i,t)U\setminus X(U,i,t) since it is possible that jj cannot be visited directly from ii but can be visited via another customer when the triangle inequality does not hold.

If we take the sum of profits over UY(U,i,t)U\setminus Y(U,i,t), we can compute an upper bound on the value of the current state. In addition, we use another upper bound considering the remaining time limit b0tb_{0}-t. We consider a relaxed problem, where the travel time to customer jj is always cjin=minkN{j}ckjc^{\text{in}}_{j}=\min_{k\in N\setminus\{j\}}c_{kj}. This problem can be viewed as the well-known 0-1 knapsack problem [116, 117], which is to maximize the total profit of items included in a knapsack such that the total weight of the included items does not exceed the capacity of the knapsack. Each customer jUY(U,i,t)j\in U\setminus Y(U,i,t) is an item with the profit pjp_{j} and the weight cjinc^{\text{in}}_{j}, and the capacity of the knapsack is b0tc0inb_{0}-t-c^{\text{in}}_{0} since we need to return to the depot. Then, we can use the Dantzig upper bound [118], which sorts items in descending order of the efficiency ejin=pj/cjine^{\text{in}}_{j}=p_{j}/c^{\text{in}}_{j} and includes as many items as possible. When an item kk exceeds the remaining capacity qq, it is included fractionally, i.e., the profit is increased by qekin\lfloor qe^{\text{in}}_{k}\rfloor. This procedural upper bound is difficult to represent efficiently with the current YAML-DyPDL due to its declarative nature. Therefore, we further relax the problem by using maxjUY(U,i,t)ejin\max_{j\in U\setminus Y(U,i,t)}e^{\text{in}}_{j} as the efficiencies of all items, i.e., we use (b0tc0in)maxjUY(U,i,t)ejin\lfloor(b_{0}-t-c^{\text{in}}_{0})\max_{j\in U\setminus Y(U,i,t)}e^{\text{in}}_{j}\rfloor as an upper bound. Similarly, based on cjout=minkN{j}cjkc^{\text{out}}_{j}=\min_{k\in N\setminus\{j\}}c_{jk}, the minimum travel time from jj, we also use (b0tciout)maxjUY(U,i,t)ejout\lfloor(b_{0}-t-c^{\text{out}}_{i})\max_{j\in U\setminus Y(U,i,t)}e^{\text{out}}_{j}\rfloor where ejout=pj/cjoute^{\text{out}}_{j}=p_{j}/c^{\text{out}}_{j}.
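The resulting three-term bound of Inequality (23) is cheap to evaluate. A hypothetical Python sketch, where `cands` stands for U ∖ Y(U, i, t) and the profits and travel-time minima are passed in as dictionaries:

```python
import math


def optw_bound(cands, p, c_in, c_out, b0, t, c_in_depot, c_out_i):
    """Three-term dual bound of Inequality (23): the total remaining
    profit and two knapsack-style relaxations using the best
    profit/travel-time efficiency among the remaining customers."""
    if not cands:
        return 0
    total = sum(p[j] for j in cands)
    e_in = max(p[j] / c_in[j] for j in cands)    # e_j^in = p_j / c_j^in
    e_out = max(p[j] / c_out[j] for j in cands)  # e_j^out = p_j / c_j^out
    return min(total,
               math.floor((b0 - t - c_in_depot) * e_in),
               math.floor((b0 - t - c_out_i) * e_out))
```

When the remaining time b0 - t is the binding restriction, one of the two efficiency terms is smaller than the total remaining profit and tightens the bound.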

compute $V(N \setminus \{0\}, 0, 0)$ (20)

$$V(U,i,t) = \begin{cases} 0 & \text{if } t + c_{i0} \leq b_0 \land U = \emptyset \\ V(U \setminus \{j\}, i, t) & \text{else if } \exists j \in Y(U,i,t) \\ V(U \setminus \{j\}, i, t) & \text{else if } \exists j \in U \land X(U,i,t) = \emptyset \\ \max\limits_{j \in X(U,i,t)} p_j + V(U \setminus \{j\}, j, \max\{t + c_{ij}, a_j\}) & \text{else if } X(U,i,t) \neq \emptyset \\ -\infty & \text{else} \end{cases}$$ (21)

$$V(U,i,t) \geq V(U,i,t') \quad \text{if } t \leq t'$$ (22)

$$V(U,i,t) \leq \min\left\{ \sum_{j \in U \setminus Y(U,i,t)} p_j, \left\lfloor (b_0 - t - c^{\text{in}}_0) \max_{j \in U \setminus Y(U,i,t)} e^{\text{in}}_j \right\rfloor, \left\lfloor (b_0 - t - c^{\text{out}}_i) \max_{j \in U \setminus Y(U,i,t)} e^{\text{out}}_j \right\rfloor \right\}.$$ (23)

The second line of Equation (21) removes an arbitrary customer $j$ in $Y(U,i,t)$, which is a forced transition. The third line also defines a forced transition to remove a customer $j$ in $U$ when no customer can be visited directly ($X(U,i,t) = \emptyset$); in such a case, even if $j \in U \setminus Y(U,i,t)$, i.e., $t + c^*_{ij} \leq b_j$, the shortest path to customer $j$ is not available. The base case (the first line of Equation (21)) becomes active when all customers are visited or removed. This condition forces the vehicle to visit as many customers as possible. Since each transition removes one customer from $U$, and all customers must be removed in a base state, all $(U,i,t)$- and $(U,i,t')$-solutions have the same length. If $t \leq t'$, more customers can potentially be visited, so $(U,i,t)$ leads to an equal or better solution than $(U,i,t')$. Thus, the dominance implied by Inequality (22) satisfies Definition 14. The cost expressions are represented by the addition of nonnegative values, so the model is monoidal with a monoid $\langle \mathbb{Q}^+_0, +, 0 \rangle$, and $\mathbb{Q}^+_0$ is isotone. However, the model is not cost-algebraic since it is a maximization problem and $\forall x \in \mathbb{Q}^+_0, 0 \geq x$ does not hold. Thus, the first solution found by CAASDy may not be optimal.
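The recurrence in Equation (21) can be evaluated directly by memoized recursion. A small Python sketch (our own encoding, with the set variable $U$ as a bitmask; not the didp-rs API) that follows the case order of the equation:

```python
import functools

def solve_optw(n, c, cstar, p, a, b):
    """Evaluate the OPTW recurrence (maximize collected profit).

    c[i][j]: travel time, cstar[i][j]: shortest travel time,
    p[j]: profit, a[j]/b[j]: time-window start/deadline; 0 is the depot."""

    @functools.lru_cache(maxsize=None)
    def V(U, i, t):
        if U == 0:  # base case: return to the depot by its deadline
            return 0 if t + c[i][0] <= b[0] else float("-inf")
        Y = [j for j in range(1, n) if U >> j & 1
             and (t + cstar[i][j] > b[j] or t + cstar[i][j] + cstar[j][0] > b[0])]
        if Y:  # forced transition: drop a customer that can no longer be visited
            return V(U & ~(1 << Y[0]), i, t)
        X = [j for j in range(1, n) if U >> j & 1
             and t + c[i][j] <= b[j] and t + c[i][j] + cstar[j][0] <= b[0]]
        if not X:  # forced transition: no customer is directly reachable
            j = next(j for j in range(1, n) if U >> j & 1)
            return V(U & ~(1 << j), i, t)
        return max(p[j] + V(U & ~(1 << j), j, max(t + c[i][j], a[j])) for j in X)

    return V(sum(1 << j for j in range(1, n)), 0, 0)
```

This sketch enumerates all reachable states, whereas the DIDP solvers additionally exploit the dominance (22) and the dual bound (23) to prune them.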

6.4 Bin Packing

In the bin packing problem [116], items $N = \{0, \dots, n-1\}$ are given, and each item $i$ has weight $w_i$. The objective is to pack items in bins with capacity $q$ while minimizing the number of bins. We assume $q \geq w_i$ for each $i \in N$. Bin packing is strongly NP-hard [119].

In our DyPDL model, we pack items one by one. A set variable $U$ represents the set of unpacked items, and a numeric resource variable $r$ represents the remaining space in the current bin, where more is preferred. In addition, we use an element resource variable $k$ representing the number of used bins, where less is preferred. The model breaks symmetry by packing item $i$ in the $i$-th or an earlier bin. Thus, $X(U,r,k) = \{ i \in U \mid r \geq w_i \land i + 1 \geq k \}$ represents the items that can be packed in the current bin. When $\forall j \in U, r < w_j$, a new bin is opened, and any item in $Y(U,k) = \{ i \in U \mid i \geq k \}$ can be packed; this is a forced transition.

For a dual bound function, we use the lower bounds LB1, LB2, and LB3 used by Johnson [120]. The first lower bound, LB1, is $\lceil (\sum_{i \in U} w_i - r) / q \rceil$, which relaxes the problem by allowing an item to be split across multiple bins. The second lower bound, LB2, only considers items in $\{ i \in U \mid w_i \geq q/2 \}$. If $w_i > q/2$, item $i$ cannot be packed with any other considered item. If $w_i = q/2$, at most one additional item $j$ with $w_j = q/2$ can be packed. Let $a_i = 1$ if $w_i > q/2$ and $a_i = 0$ otherwise. Let $b_i = 1/2$ if $w_i = q/2$ and $b_i = 0$ otherwise. The number of bins is lower bounded by $\sum_{i \in U} a_i + \lceil \sum_{i \in U} b_i \rceil - \mathbbm{1}(r \geq \frac{q}{2})$, where $\mathbbm{1}$ is an indicator function that returns 1 if the given condition is true and 0 otherwise. The last term covers the case where an item with $w_i \geq q/2$ can be packed in the current bin. Similarly, LB3 only considers items in $\{ i \in U \mid w_i \geq q/3 \}$. Let $c_i = 1$ if $w_i > 2q/3$, $c_i = 2/3$ if $w_i = 2q/3$, $c_i = 1/2$ if $q/3 < w_i < 2q/3$, $c_i = 1/3$ if $w_i = q/3$, and $c_i = 0$ otherwise. The number of bins is lower bounded by $\lceil \sum_{i \in U} c_i \rceil - \mathbbm{1}(r \geq \frac{q}{3})$.
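The three bounds are straightforward to compute. A Python sketch (our own helper, not part of didp-rs) evaluating $\max\{\text{LB1}, \text{LB2}, \text{LB3}\}$ for a state $(U, r)$:

```python
import math

def bpp_dual_bound(weights, U, r, q):
    """Lower bound on the number of additional bins for unpacked items U,
    with remaining space r in the current bin and bin capacity q."""
    lb1 = math.ceil((sum(weights[i] for i in U) - r) / q)
    # LB2: items heavier than q/2 pairwise conflict; two exact halves may share.
    a = sum(1 for i in U if weights[i] > q / 2)
    b = sum(0.5 for i in U if weights[i] == q / 2)
    lb2 = a + math.ceil(b) - (1 if r >= q / 2 else 0)
    # LB3: at most two items of weight >= q/3 fit in one bin.
    def c(w):
        if w > 2 * q / 3:
            return 1.0
        if w == 2 * q / 3:
            return 2 / 3
        if q / 3 < w < 2 * q / 3:
            return 0.5
        if w == q / 3:
            return 1 / 3
        return 0.0
    lb3 = math.ceil(sum(c(weights[i]) for i in U)) - (1 if r >= q / 3 else 0)
    return max(lb1, lb2, lb3)
```

For instance, three items of weight 6 with $q = 10$ and $r = 0$ give LB1 = 2 but LB2 = 3, showing why the maximum of the three bounds is used.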

compute $V(N, 0, 0)$ (24)

$$V(U,r,k) = \begin{cases} 0 & \text{if } U = \emptyset \\ 1 + V(U \setminus \{i\}, q - w_i, k + 1) & \text{else if } \exists i \in Y(U,k) \land \forall j \in U, r < w_j \\ \min\limits_{i \in X(U,r,k)} V(U \setminus \{i\}, r - w_i, k) & \text{else if } X(U,r,k) \neq \emptyset \\ \infty & \text{else} \end{cases}$$ (25)

$$V(U,r,k) \leq V(U,r',k') \quad \text{if } r \geq r' \land k \leq k'$$ (26)

$$V(U,r,k) \geq \max\left\{ \begin{array}{l} \lceil (\sum_{i \in U} w_i - r) / q \rceil \\ \sum_{i \in U} a_i + \lceil \sum_{i \in U} b_i \rceil - \mathbbm{1}(r \geq \frac{q}{2}) \\ \lceil \sum_{i \in U} c_i \rceil - \mathbbm{1}(r \geq \frac{q}{3}). \end{array} \right.$$ (30)

Since each transition packs one item, any $(U,r,k)$- and $(U,r',k')$-solutions have the same length. It is easy to see that $(U,r,k)$ leads to an equal or better solution than $(U,r',k')$ if $r \geq r'$ and $k \leq k'$, so the dominance implied by Inequality (26) is valid. This model is cost-algebraic with a cost algebra $\langle \mathbb{Z}^+_0, +, 0 \rangle$. Since the base cost is always zero, the first solution found by CAASDy is optimal by Theorem 15.

6.5 Simple Assembly Line Balancing Problem (SALBP-1)

The variant of the simple assembly line balancing problem (SALBP) called SALBP-1 [121, 122] is the same as bin packing except for precedence constraints. In SALBP-1, we are given a set of tasks $N = \{0, \dots, n-1\}$, and each task $i$ has a processing time $w_i$. A task is scheduled in a station, and the sum of the processing times of the tasks in a station must not exceed the cycle time $q$. Stations are ordered, and each task must be scheduled in the same or a later station than its predecessors $P_i \subseteq N$. SALBP-1 is strongly NP-hard since it is a generalization of bin packing [123].

We formulate a DyPDL model based on that of bin packing and inspired by a problem-specific heuristic search method for SALBP-1 [102, 103]. Due to the precedence constraints, unlike bin packing, we cannot schedule an arbitrary task when we open a station. Thus, we do not use an element resource variable $k$. Now, the set of tasks that can be scheduled in the current station is represented by $X(U,r) = \{ i \in U \mid r \geq w_i \land P_i \cap U = \emptyset \}$. We introduce a transition to open a new station only when $X(U,r) = \emptyset$, which is called a maximum load pruning rule in the literature [124, 125]. Since bin packing is a relaxation of SALBP-1, we can use the dual bound function for bin packing.

compute $V(N, 0)$ (31)

$$V(U,r) = \begin{cases} 0 & \text{if } U = \emptyset \\ 1 + V(U, q) & \text{else if } X(U,r) = \emptyset \\ \min\limits_{i \in X(U,r)} V(U \setminus \{i\}, r - w_i) & \text{else} \end{cases}$$ (32)

$$V(U,r) \leq V(U,r') \quad \text{if } r \geq r'$$ (33)

$$V(U,r) \geq \max\left\{ \begin{array}{l} \lceil (\sum_{i \in U} w_i - r) / q \rceil \\ \sum_{i \in U} a_i + \lceil \sum_{i \in U} b_i \rceil - \mathbbm{1}(r \geq \frac{q}{2}) \\ \lceil \sum_{i \in U} c_i \rceil - \mathbbm{1}(r \geq \frac{q}{3}). \end{array} \right.$$ (37)

The length of a $(U,r)$-solution is the sum of $|U|$ and the number of stations opened, which is the cost of that solution. Therefore, if $r \geq r'$, then state $(U,r)$ leads to an equal or better and shorter solution than $(U,r')$, so the dominance implied by Inequality (33) is valid. Similar to bin packing, this model is cost-algebraic, and the base cost is zero, so the first solution found by CAASDy is optimal.

6.6 Graph-Clear

In the graph-clear problem [126], an undirected graph $(N, E)$ with node weights $a_i$ for $i \in N$ and edge weights $b_{ij}$ for $\{i,j\} \in E$ is given. In the beginning, all nodes are contaminated. In each step, one node can be made clean by sweeping it using $a_i$ robots and blocking each edge $\{i,j\}$ using $b_{ij}$ robots. However, while a node is swept, an already swept node becomes contaminated again if it is connected to a contaminated node by a path of unblocked edges. The optimal solution minimizes the maximum number of robots used in a single step to make all nodes clean. This optimization problem is NP-hard since finding a solution whose cost is smaller than a given value is NP-complete [126].

Previous work [19] proved that there exists an optimal solution in which a swept node is never contaminated again. Based on this observation, the authors developed a state-based formula as the basis for MIP and CP models. We use the state-based formula directly as a DyPDL model. A set variable $C$ represents the swept nodes, and one node $c \in N \setminus C$ is swept at each step. We block all edges connected to $c$ and all edges from contaminated nodes to already swept nodes. We assume that $b_{ij} = 0$ if $\{i,j\} \notin E$.

compute $V(\emptyset)$ (38)

$$V(C) = \begin{cases} 0 & \text{if } C = N \\ \min\limits_{c \in N \setminus C} \max\left\{ a_c + \sum\limits_{i \in N} b_{ci} + \sum\limits_{i \in C} \sum\limits_{j \in (N \setminus C) \setminus \{c\}} b_{ij}, V(C \cup \{c\}) \right\} & \text{else} \end{cases}$$ (39)

$$V(C) \geq 0.$$ (40)

Viewing the maximum of two values ($\max$) as a binary operator, $\langle \mathbb{Z}^+_0, \max, 0 \rangle$ is a monoid since $\max\{x,y\} \in \mathbb{Z}^+_0$, $\max\{x, \max\{y,z\}\} = \max\{\max\{x,y\}, z\}$, and $\max\{x, 0\} = \max\{0, x\} = x$ for $x, y, z \in \mathbb{Z}^+_0$. It is isotone since $x \leq y \rightarrow \max\{x,z\} \leq \max\{y,z\}$ and $x \leq y \rightarrow \max\{z,x\} \leq \max\{z,y\}$. Since $\forall x \in \mathbb{Z}^+_0, 0 \leq x$, $\langle \mathbb{Z}^+_0, \max, 0 \rangle$ is a cost algebra, so the DyPDL model is cost-algebraic. Since the base cost is always zero, the first solution found by CAASDy is optimal.
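Equation (39) composes step costs with $\max$ rather than $+$, so it can be evaluated by memoized recursion over subsets of swept nodes. A Python sketch (bitmask encoding of $C$; names are ours, not the didp-rs API):

```python
import functools

def solve_graph_clear(n, a, b):
    """Evaluate the graph-clear recurrence: minimize the maximum number of
    robots per step. a[i]: node weights; b[i][j]: edge weights (0 if no edge)."""
    full = (1 << n) - 1

    @functools.lru_cache(maxsize=None)
    def V(C):
        if C == full:
            return 0
        best = float("inf")
        for c in range(n):
            if C >> c & 1:
                continue
            # Robots to sweep c, block its incident edges, and block every
            # edge between a contaminated node and an already swept node.
            step = a[c] + sum(b[c][i] for i in range(n)) + sum(
                b[i][j]
                for i in range(n) if C >> i & 1
                for j in range(n) if not (C >> j & 1) and j != c)
            best = min(best, max(step, V(C | (1 << c))))
        return best

    return V(0)
```

Because the composition operator is $\max$, extending a sweep sequence can never decrease its cost, which is the isotonicity property discussed above.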

7 Experimental Evaluation

We implement and experimentally evaluate DIDP solvers using the heuristic search algorithms described in Section 5.4. We compare our DIDP solvers with commercial MIP and CP solvers, Gurobi 11.0.2 [127] and IBM ILOG CP Optimizer 22.1.0 [20]. We select state-of-the-art MIP and CP models in the literature when multiple models exist and develop a new model when we do not find an existing one. We also compare DIDP with existing state-based general-purpose solvers, domain-independent AI planners, a logic programming language, and a decision diagram-based (DD-based) solver.

7.1 Software Implementation of DIDP

We develop didp-rs v0.7.0 (https://github.com/domain-independent-dp/didp-rs/releases/tag/v0.7.0), a software implementation of DIDP in Rust. It has four components: dypdl (https://crates.io/crates/dypdl), dypdl-heuristic-search (https://crates.io/crates/dypdl-heuristic-search), didp-yaml (https://crates.io/crates/didp-yaml), and DIDPPy (https://didppy.readthedocs.io). The library dypdl is for modeling, and dypdl-heuristic-search is a library for heuristic search solvers. The command-line interface didp-yaml takes YAML-DyPDL domain and problem files and a YAML file specifying a solver as input and returns the result. DIDPPy is a Python interface whose modeling capability is equivalent to didp-yaml. In our experiments, we use didp-yaml.

As DIDP solvers, dypdl-heuristic-search implements CAASDy, DFBnB, CBFS, ACPS, APPS, DBDFS, and CABS. These solvers can handle monoidal DyPDL models with a monoid $\langle A, \times, \mathbf{1} \rangle$ where $A \subseteq \mathbb{Q} \cup \{-\infty, \infty\}$, $\times \in \{+, \max\}$, and $\mathbf{1} = 0$ if $\times = +$ or $\mathbf{1}$ is the minimum value in $A$ if $\times = \max$.

In all solvers, we use the dual bound function provided with a DyPDL model as a heuristic function. Thus, $f(S) = g(S) \times h(S) = g(S) \times \eta(S)$. By Theorem 14, the best $f$-value in the open list is a dual bound. In CAASDy, states in the open list are ordered by their $f$-values in a binary heap, so a dual bound can be obtained by checking the top of the binary heap. Similarly, in DFBnB, CBFS, and ACPS, since the states at each depth are ordered by their $f$-values, we can compute a dual bound by keeping track of the best $f$-value at each depth. In APPS, when the set of the best states $O_b$ and the set of the best successor states $O_c$ become empty, the best $f$-value of the states in the suspend list $O_s$ is a dual bound, where states are ordered by their $f$-values. In DBDFS, we keep track of the best $f$-value of states inserted into $O_1$ and use it as a dual bound when $O_0$ becomes empty. In CABS, based on Theorem 20, the best $f$-value of discarded states is maintained, and a dual bound is computed after generating all successor states in a layer. In CAASDy, CBFS, ACPS, APPS, and CABS, when the $f$- and $h$-values of two states are the same, the tie is broken according to the implementation of the binary heap used for the open list. In DFBnB and DBDFS, the open list is implemented with a stack, and successor states are sorted before being pushed to the stack, so tie-breaking depends on the implementation of the sorting algorithm. While a dual bound function is provided in each DyPDL model used in our experiments, it is not required in general; when no dual bound function is provided, the DIDP solvers use the $g$-value instead of the $f$-value to guide the search and do not perform pruning.
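For a minimizing additive model, these mechanics can be illustrated in a few lines of Python. This is a simplified sketch of CAASDy-style best-first search (names and interfaces are ours, not the dypdl-heuristic-search API); when $\eta$ underestimates the remaining cost, the $f$-value at the top of the heap is a valid dual bound at every expansion:

```python
import heapq

def best_first(target, successors, is_base, eta):
    """Expand states in order of f = g + eta(S) (minimization).

    successors(S) yields (weight, successor) pairs; is_base(S) marks base
    states with base cost 0. Returns (cost, dual bounds observed)."""
    open_list = [(eta(target), 0, target)]
    best_g = {target: 0}
    dual_bounds = []
    while open_list:
        f, g, state = heapq.heappop(open_list)
        if g > best_g.get(state, float("inf")):
            continue  # a cheaper path to this state was already found
        dual_bounds.append(f)  # the top of the heap is a dual bound
        if is_base(state):
            return g, dual_bounds
        for w, succ in successors(state):
            if g + w < best_g.get(succ, float("inf")):
                best_g[succ] = g + w
                heapq.heappush(open_list, (g + w + eta(succ), g + w, succ))
    return float("inf"), dual_bounds
```

The recorded dual bounds are non-decreasing with a consistent $\eta$, so the gap between the incumbent and the last bound shrinks monotonically as the search proceeds.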

As explained in Section 4.1.5, forced transitions can be explicitly defined in a DyPDL model. If such transitions are applicable in a state, our solvers keep only the first defined one, $\tau$, in the set of applicable transitions, i.e., $\mathcal{T}^*(S) = \{\tau\}$ in Algorithms 1 and 2. Otherwise, no forced transitions are considered, i.e., $\mathcal{T}^*(S) = \mathcal{T}(S)$.

7.2 Benchmarks

We describe the benchmark instances and the MIP and CP models for each problem class. Except for TSPTW, the DyPDL models are presented in Section 6 and B. All benchmark instances are in text format, so they are converted to YAML-DyPDL problem files by a Python script. All instances in one problem class share the same YAML-DyPDL domain file except for the multi-dimensional knapsack problem, where the number of state variables depends on the instance, and thus a domain file is generated for each instance by the Python script. All instances generated by us, the MIP and CP models, the YAML-DyPDL domain files, and the Python scripts are available from our repository (https://github.com/Kurorororo/didp-models).

7.2.1 TSPTW

For TSPTW, we use 340 instances from Dumas et al. [23], Gendreau et al. [128], Ohlmann and Thomas [129], and Ascheuer [130], where travel times are integers; while didp-rs can handle floating-point numbers, the CP solver we use, CP Optimizer, does not. In these instances, the deadline to return to the depot, $b_0$, is defined, but $\forall i \in N, b_i + c_{i0} \leq b_0$ holds, i.e., we can always return to the depot after visiting the final customer. Thus, in our DyPDL model (Equation (1) with redundant information in Inequalities (3), (4), and (6)), $b_0$ is not considered. For MIP, we use Formulation (1) proposed by Hungerländer and Truden [10]. When there are zero-cost edges, flow-based subtour elimination constraints [131] are added. We adapt a CP model for a single machine scheduling problem with time windows and sequence-dependent setup times [11] to TSPTW, where an interval variable represents the time to visit a customer. We change the objective to the sum of travel costs (setup times in their model) and add a $\mathsf{First}$ constraint ensuring that the depot is visited first.

7.2.2 CVRP

We use 207 instances in A, B, D, E, F, M, P, and X sets from CVRPLIB [132]. We use the DyPDL model in Section 6.1, a MIP model proposed by Gadegaard and Lysgaard [12], and a CP model proposed by Rabbouch et al. [13].

7.2.3 m-PDTSP

We use 1178 instances from Hernández-Pérez and Salazar-González [111], which are divided into class1, class2, and class3 sets. We use the DyPDL model in Section 6.2, the MCF2C+IP formulation for MIP [14], and the CP model proposed by Castro et al. [113]. In all models, unnecessary edges are removed by a preprocessing method [14].

7.2.4 OPTW

We use 144 instances from Righini and Salani [133, 25], Montemanni and Gambardella [134], and Vansteenwegen et al. [135]. In these instances, a service time $s_i$ spent at each customer $i$ is defined, so we incorporate it into the travel time, i.e., we use $s_i + c_{ij}$ as the travel time from $i$ to $j$. We use the MIP model described in Vansteenwegen et al. [136]. For CP, we develop a model similar to that of TSPTW, described in B.5.

7.2.5 Multi-Dimensional Knapsack Problem (MDKP)

We use 276 instances of the multi-dimensional knapsack problem (MDKP) [116, 117] from OR-Library [137], excluding one instance that has fractional item weights; while the DIDP solvers can handle fractional weights, the CP solver does not. We use the DyPDL model in B.1.1 and the MIP model described in Cacchiani et al. [138]. For CP, we develop a model using the $\mathsf{Pack}$ global constraint [139] for each dimension (see B.1.2).

7.2.6 Bin Packing

We use 1615 instances in BPPLIB [140], proposed by Falkenauer [141] (Falkenauer U and Falkenauer T), Scholl et al. [142] (Scholl 1, Scholl 2, and Scholl 3), Wäscher and Gau [143] (Wäscher), Schwerin and Wäscher [144] (Schwerin 1 and Schwerin 2), and Schoenfield [145] (Hard28). We use the DyPDL model in Section 6.4 and the MIP model by Martello and Toth [116], extended with inequalities ensuring that bins are used in order of index and that item $j$ is packed in the $j$-th bin or earlier, as described in Delorme et al. [146]. We implement a CP model using $\mathsf{Pack}$ while ensuring that item $j$ is packed in bin $j$ or earlier. For the MIP and CP models, the upper bound on the number of bins is computed by the first-fit decreasing heuristic. We show the CP model in B.6.

7.2.7 SALBP-1

We use 2100 instances proposed by Morrison et al. [103]. We use the DyPDL model in Section 6.5 and the NF4 formulation for MIP [15]. Our CP model is based on Bukchin and Raviv [16] but is implemented with the global constraint $\mathsf{Pack}$ in CP Optimizer as it performs better than the original model (see B.7). In addition, the upper bound on the number of stations is computed in the same way as in the MIP model instead of using a heuristic.

7.2.8 Single Machine Total Weighted Tardiness

We use 375 instances of single machine scheduling to minimize the total weighted tardiness ($1||\sum w_i T_i$) [147] in OR-Library [137] with 40, 50, and 100 jobs. We use the DyPDL model in B.2.1 and the formulation with assignment and positional date variables (F4) for MIP [17]. For CP, we formulate a model using interval variables, as described in B.2.2. We extract precedence relations between jobs using the method proposed by Kanet [148] and incorporate them into the DyPDL and CP models but not into the MIP model, as its performance is not improved.

7.2.9 Talent Scheduling

Garcia de la Banda and Stuckey [28] considered instances with 8, 10, 12, 14, 16, 18, 20, and 22 actors and 16, 18, …, 64 scenes, resulting in 200 configurations in total. For each configuration, they randomly generated 100 instances. We use the first five instances of each configuration, resulting in 1000 instances in total. We use an extended version of the DyPDL model presented in Section 3.3.3 (see B.3.1) and the MIP model described in Qin et al. [149]. For CP, we extend the model used in Chu and Stuckey [150] with the $\mathsf{AllDifferent}$ global constraint [151], which is redundant but slightly improves performance in practice, as described in B.3.2. In all models, a problem is simplified by preprocessing as described in Garcia de la Banda and Stuckey [28].

7.2.10 Minimization of Open Stacks Problem (MOSP)

We use 570 instances of the minimization of open stacks problem (MOSP) [152] from four sets: the Constraint Modelling Challenge [153], the SCOOP Project (https://cordis.europa.eu/project/id/32998), Faggioli and Bentivoglio [154], and Chu and Stuckey [155]. We use a DyPDL model based on a problem-specific algorithm [155], presented in B.4. The MIP and CP models are proposed by Martin et al. [18]. From their two MIP models, we select MOSP-ILP-I as it solves more instances optimally in their paper.

7.2.11 Graph-Clear

We generated 135 instances using planar and random graphs in the same way as Morin et al. [19], where the number of nodes in a graph is 20, 30, or 40. For planar instances, we use a planar graph generator [156] with the input parameter of 1000. We use the DyPDL model in Section 6.6 and MIP and CP models proposed by Morin et al. [19]. From the two proposed CP models, we select CPN as it solves more instances optimally.

7.3 Comparison with MIP and CP

We use Rust 1.70.0 for didp-rs and Python 3.10.2 for the Python scripts to convert instances to YAML-DyPDL files and the Python interfaces of Gurobi and CP Optimizer. All experiments are performed on an Intel Xeon Gold 6418 processor with a single thread, an 8 GB memory limit, and a 30-minute time limit using GNU Parallel [157].

7.3.1 Coverage

Table 1: Coverage (c.) and the number of instances where the memory limit is reached (m.) in each problem class. The coverage of a DIDP solver is in bold if it is higher than MIP and CP, and the higher of MIP and CP is in bold if there is no better DIDP solver. The highest coverage is underlined.
MIP CP CAASDy DFBnB CBFS ACPS APPS DBDFS CABS CABS/0
c. m. c. m. c. m. c. m. c. m. c. m. c. m. c. m. c. m. c. m.
TSPTW (340) 224 0 47 0 257 83 242 34 257 81 257 82 257 83 256 83 259 0 259 0
CVRP (207) 28 5 0 0 6 201 6 187 6 201 6 201 6 201 6 201 6 0 5 3
m-PDTSP (1178) 940 0 1049 0 952 226 985 193 988 190 988 190 988 190 987 191 1035 0 988 15
OPTW (144) 16 0 49 0 64 79 64 60 64 80 64 80 64 80 64 78 64 0 - -
MDKP (276) 168 0 6 0 4 272 4 272 5 271 5 271 5 271 4 272 5 1 - -
Bin Packing (1615) 1160 0 1234 0 922 632 526 1038 1115 431 1142 405 1037 520 426 1118 1167 4 242 14
SALBP-1 (2100) 1431 250 1584 0 1657 406 1629 470 1484 616 1626 474 1635 465 1404 696 1802 0 1204 1
$1||\sum w_i T_i$ (375) 107 0 150 0 270 105 233 8 272 103 272 103 265 110 268 107 288 0 - -
Talent Scheduling (1000) 0 0 0 0 207 793 189 388 214 786 214 786 206 794 205 795 239 0 231 0
MOSP (570) 241 14 437 0 483 87 524 46 523 47 524 46 523 47 522 48 527 0 - -
Graph-Clear (135) 26 0 4 0 78 57 99 36 101 34 101 34 99 36 82 53 103 19 - -

Since all solvers are exact, we evaluate coverage: the number of instances where an optimal solution is found and its optimality is proved within the time and memory limits. We include the number of instances where infeasibility is proved in coverage. We show the coverage of each method in each problem class in Table 1. In the table, if a DIDP solver has higher coverage than MIP and CP, it is emphasized in bold. If MIP or CP is better than all DIDP solvers, its coverage is in bold. The highest coverage is underlined. We explain CABS/0 in Tables 1–3 later in Section 7.5.

CAASDy, ACPS, APPS, and CABS outperform both MIP and CP in seven problem classes: TSPTW, OPTW, SALBP-1, $1||\sum w_i T_i$, talent scheduling, MOSP, and graph-clear. In addition, the DIDP solvers except for CAASDy have higher coverage than MIP and CP on the class1 instances of m-PDTSP (145 (CABS) and 144 (others) vs. 128 (MIP and CP)). Comparing the DIDP solvers, CABS has the highest coverage in all problem classes. As shown, each DIDP solver except for CABS reaches the memory limit in most of the instances it is unable to solve, while CABS rarely reaches the memory limit. This difference is possibly because CABS needs to store only the states in the current and next layers, while the other solvers need to store all generated and non-dominated states in the open list.

MIP has the highest coverage in CVRP and MDKP, and CP in m-PDTSP and bin packing. MIP runs out of memory in some instances while CP never does. In particular, in the MIP model for SALBP-1, the number of decision variables and constraints is quadratic in the number of tasks in the worst case, and MIP reaches the memory limit in 250 instances with 1000 tasks.

7.3.2 Optimality Gap

Table 2: Average optimality gap in each problem class. The optimality gap of a DIDP solver is in bold if it is lower than MIP and CP, and the lower of MIP and CP is in bold if there is no better DIDP solver. The lowest optimality gap is underlined.
MIP CP CAASDy DFBnB CBFS ACPS APPS DBDFS CABS CABS/0
TSPTW (340) 0.2200 0.7175 0.2441 0.1598 0.1193 0.1194 0.1217 0.1408 0.1151 0.2085
CVRP (207) 0.8647 0.9868 0.9710 0.7484 0.7129 0.7123 0.7164 0.7492 0.6912 0.9111
m-PDTSP (1178) 0.1838 0.1095 0.2746 0.2097 0.1807 0.1807 0.1840 0.2016 0.1599 0.1878
OPTW (144) 0.6650 0.2890 0.5556 0.3583 0.2683 0.2683 0.2778 0.3359 0.2696 -
MDKP (276) 0.0008 0.4217 0.9855 0.4898 0.4745 0.4745 0.4742 0.4854 0.4676 -
Bin Packing (1615) 0.0438 0.0043 0.4291 0.0609 0.0083 0.0075 0.0105 0.0651 0.0049 0.7386
SALBP-1 (2100) 0.2712 0.0108 0.2100 0.0257 0.0115 0.0096 0.0094 0.0273 0.0057 0.3695
$1||\sum w_i T_i$ (375) 0.4981 0.3709 0.2800 0.3781 0.2678 0.2679 0.2878 0.2845 0.2248 -
Talent Scheduling (1000) 0.8867 0.9509 0.7930 0.2368 0.1884 0.1884 0.2003 0.2462 0.1697 0.6667
MOSP (570) 0.3169 0.1931 0.1526 0.0713 0.0362 0.0359 0.0392 0.0655 0.0200 -
Graph-Clear (135) 0.4465 0.4560 0.4222 0.2359 0.0995 0.0996 0.1089 0.2636 0.0607 -

We also evaluate the optimality gap, the relative difference between the primal and dual bounds. The optimality gap measures how close a solver comes to proving optimality on instances that are not optimally solved. Let $\overline{\gamma}$ be a primal bound and $\underline{\gamma}$ be a dual bound found by a solver. We define the optimality gap, $\delta(\overline{\gamma}, \underline{\gamma})$, as follows:

$$\delta(\overline{\gamma}, \underline{\gamma}) = \begin{cases} 0 & \text{if } \overline{\gamma} = \underline{\gamma} = 0 \\ \frac{|\overline{\gamma} - \underline{\gamma}|}{\max\{|\overline{\gamma}|, |\underline{\gamma}|\}} & \text{else.} \end{cases}$$

The optimality gap is 0 when optimality is proved and positive otherwise. We also use 0 as the optimality gap when infeasibility is proved. In the second line, when the signs of the primal and dual bounds are the same, since $|\overline{\gamma} - \underline{\gamma}| \leq \max\{|\overline{\gamma}|, |\underline{\gamma}|\}$, the optimality gap never exceeds 1. In practice, we observe that the primal and dual bounds found are always nonnegative in our experiments. Therefore, we use 1 as the optimality gap when either a primal or dual bound is not found.
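The definition translates directly into Python (our own helper; `None` stands for a missing bound, per the convention above):

```python
def optimality_gap(primal, dual):
    """Relative primal-dual gap in [0, 1]; 0 means optimality is proved."""
    if primal is None or dual is None:
        return 1.0  # a missing bound counts as the worst gap
    if primal == dual == 0:
        return 0.0
    return abs(primal - dual) / max(abs(primal), abs(dual))
```

For example, a primal bound of 10 with a dual bound of 5 gives a gap of 0.5, and equal bounds give 0.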

We show the average optimality gap in Table 2. Similar to Table 1, the optimality gap of a DIDP solver is in bold if it is better than MIP and CP, and the best value is underlined. ACPS, APPS, and CABS achieve a better optimality gap than MIP and CP in the seven problem classes where they have higher coverage. In addition, the DIDP solvers except for CAASDy outperform MIP and CP in CVRP, where MIP has the highest coverage; on large instances of CVRP, MIP fails to find feasible solutions, which results in a high average optimality gap. Comparing the DIDP solvers, CABS is the best in all problem classes except for OPTW, where CBFS and ACPS are marginally better. CAASDy is the worst among the DIDP solvers in all problem classes except for $1||\sum w_i T_i$; similar to MIP in CVRP, CAASDy does not provide a primal bound for any unsolved instance except for eleven instances of m-PDTSP.

7.3.3 Primal Integral

Table 3: Average primal integral in each problem class. The primal integral of a DIDP solver is in bold if it is lower than MIP and CP, and the lower of MIP and CP is in bold if there is no better DIDP solver. The lowest primal integral is underlined.
MIP CP CAASDy DFBnB CBFS ACPS APPS DBDFS CABS CABS/0
TSPTW (340) 479.03 48.97 458.26 46.31 9.49 10.06 29.36 56.65 9.25 13.71
CVRP (207) 1127.55 482.89 1748.23 420.66 423.45 418.29 440.59 523.42 333.68 335.75
m-PDTSP (1178) 177.60 26.04 333.53 23.37 6.51 6.49 9.23 17.87 5.24 5.31
OPTW (144) 438.06 15.58 1018.23 175.49 54.05 54.29 74.37 139.64 57.95 -
MDKP (276) 0.65 15.86 1773.92 236.12 211.69 211.62 211.84 237.99 201.72 -
Bin Packing (1615) 88.07 8.05 778.60 104.46 9.98 8.40 13.82 111.85 5.04 11.56
SALBP-1 (2100) 538.79 28.43 383.35 35.59 10.83 7.28 6.74 38.80 1.92 19.42
$1||\sum w_i T_i$ (375) 64.89 3.49 513.24 136.99 111.19 103.76 105.97 97.34 71.21 -
Talent Scheduling (1000) 106.10 18.91 1435.12 119.03 40.72 40.39 60.12 143.45 25.41 50.78
MOSP (570) 95.20 13.01 275.48 4.41 1.39 1.20 1.37 7.72 0.31 -
Graph-Clear (135) 334.87 83.49 764.00 4.63 0.70 0.74 3.45 87.90 0.37 -

To evaluate the performance of anytime solvers, we use the primal integral [158], which considers the balance between solution quality and computation time. For an optimization problem, let $\sigma^t$ be a solution found by a solver at time $t$, $\sigma^*$ be an optimal (or best-known) solution, and $\gamma$ be a function that returns the solution cost. The primal gap function $p$ is

$$p(t) = \begin{cases} 0 & \text{if } \gamma(\sigma^t) = \gamma(\sigma^*) = 0 \\ 1 & \text{if no } \sigma^t \text{ or } \gamma(\sigma^t)\gamma(\sigma^*) < 0 \\ \frac{|\gamma(\sigma^*) - \gamma(\sigma^t)|}{\max\{|\gamma(\sigma^*)|, |\gamma(\sigma^t)|\}} & \text{else.} \end{cases}$$

The primal gap takes a value in $[0, 1]$, and lower is better. Let $t_i \in [0, T]$ for $i = 1, \dots, l-1$ be the time points when a new better solution is found by a solver, with $t_0 = 0$ and $t_l = T$. The primal integral is defined as $P(T) = \sum_{i=1}^{l} p(t_{i-1}) \cdot (t_i - t_{i-1})$. It takes a value in $[0, T]$, and lower is better. $P(T)$ decreases if the same solution cost is achieved faster or a better solution is found within the same computation time. When an instance is proved to be infeasible at time $t$, we use $p(t) = 0$, so $P(T)$ corresponds to the time to prove infeasibility. For TSPTW, CVRP, and $1||\sum w_i T_i$, we use the best-known solutions provided with the instances to compute the primal gap. For the other problems, we use the best solutions found by the evaluated solvers.
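The following Python sketch computes $P(T)$ from a solver's trace of improving solutions (our own helper; the event-list format is an assumption, and the infeasibility convention is omitted for brevity):

```python
def primal_integral(events, best, T):
    """P(T) from events = [(time, cost), ...] of successively better
    solutions sorted by time; best = optimal or best-known cost; T = limit."""
    def gap(cost):
        if cost is None or cost * best < 0:
            return 1.0  # no solution yet, or the costs have opposite signs
        if cost == best == 0:
            return 0.0
        return abs(best - cost) / max(abs(best), abs(cost))

    # Before the first event, no solution exists (gap 1); after the last
    # event, the final incumbent's gap applies until the time limit T.
    times = [0.0] + [t for t, _ in events] + [T]
    costs = [None] + [c for _, c in events]
    return sum(gap(costs[i]) * (times[i + 1] - times[i]) for i in range(len(costs)))
```

For instance, finding cost 20 at 10 s and cost 10 (the best known) at 100 s within a 1800 s limit yields $1 \cdot 10 + 0.5 \cdot 90 + 0 \cdot 1700 = 55$.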

We show the average primal integral in Table 3. Similar to Tables 1 and 2, the primal integral of a DIDP solver is in bold if it is better than MIP and CP, and the best value is underlined. CBFS, ACPS, and APPS outperform MIP and CP in six problem classes (TSPTW, CVRP, m-PDTSP, SALBP-1, MOSP, and graph-clear). In addition to these problem classes, CABS achieves a better primal integral than CP in bin packing. Comparing the DIDP solvers, CABS is the best in all problem classes except for OPTW, where CBFS and ACPS are better. As mentioned above, CAASDy does not find feasible solutions for almost all unsolved instances, resulting in the worst primal integral in all problem classes.

In CVRP, while MIP solves more instances optimally, the DIDP solvers except for CAASDy achieve a better average primal integral since MIP fails to find primal bounds for large instances as mentioned. In m-PDTSP, the DIDP solvers except for CAASDy have a lower primal integral than CP, which has the highest coverage. In contrast, in OPTW, 1||wiTi1||\sum w_{i}T_{i}, and talent scheduling, where the DIDP solvers solve more instances, CP has a better primal integral.

7.4 Performance of DIDP Solvers and Problem Characteristics

In CVRP, MIP solves more instances than the DIDP solvers even though the DyPDL model is similar to other routing problems (i.e., TSPTW, m-PDTSP, and OPTW) where the DIDP solvers are better. Similarly, in bin packing, CP solves more instances even though the DyPDL model is similar to that of SALBP-1, where CABS is the best. One common feature in the subset of the problem classes where DIDP is better than MIP or CP is a sequential dependency. In TSPTW and OPTW, a solution is a sequence of visited customers, the time when a customer is visited depends on the partial sequence of customers visited before, and time window constraints restrict possible sequences. Similarly, in m-PDTSP and SALBP-1, precedence constraints restrict possible sequences. In the DyPDL models, these constraints restrict possible paths in the state transition graphs and thus reduce the number of generated states. In contrast, in CVRP, as long as the capacity constraint is satisfied for each vehicle, customers can be visited in any order. In the DyPDL model for bin packing, while item ii must be packed in the ii-th or earlier bin and an arbitrary item is packed in a new bin by a forced transition, the remaining items can be packed in any order. We conjecture that this difference, whether a sequential dependency exists or not, may be important for the performance of the DIDP solvers observed in our experiments. However, sequential dependencies do not appear to be the only factor: DIDP also outperforms the other solvers in talent scheduling, MOSP, and Graph-Clear which do not exhibit such sequential dependencies. Detailed analysis of model characteristics that affect the performance of DIDP solvers is left for future work.

7.5 Evaluating the Importance of Dual Bound Functions

As we described above, our DIDP solvers use the dual bound function defined in a DyPDL model as an admissible heuristic function, which is used for both search guidance and state pruning. In 1||wiTi1||\sum w_{i}T_{i}, MOSP, and graph-clear, we use a trivial dual bound function, which always returns 0. Nevertheless, the DIDP solvers show better performance than MIP and CP. This result suggests that representing these problems using state transition systems provides a fundamental advantage while raising the question of the impact of non-trivial dual bound functions on problem solving performance. To investigate, we evaluate the performance of CABS with DyPDL models where the dual bound function is replaced with a function that always returns 0. In other words, beam search keeps the best bb states according to the gg-values and prunes a state SS if g(S)γ¯g(S)\geq\overline{\gamma} in minimization, where γ¯\overline{\gamma} is a primal bound. Since the zero dual bound function is not valid for OPTW and MDKP, where the DyPDL models maximize the nonnegative total profit, we use only TSPTW, CVRP, m-PDTSP, bin packing, SALBP-1, and talent scheduling. We call this configuration CABS/0 and show results in Tables 1–3. CABS/0 has a lower coverage than CABS in all problem classes except for TSPTW, where it achieves the same coverage. In terms of the optimality gap and primal integral, CABS is better than CABS/0 in all problem classes, and the difference is particularly large in bin packing and SALBP-1. The result confirms the importance of a non-trivial dual bound function for the current DIDP solvers.

7.6 Comparison with Other State-Based Approaches

We compare DIDP with domain-independent AI planning and Picat, a logic programming language that has an AI planning module. For domain-independent AI planning, we formulate numeric planning models for TSPTW, CVRP, m-PDTSP, bin packing, and SALBP-1 based on the DyPDL models using PDDL 2.1. We use NLM-CutPlan Orbit [159], the winner of the optimal numeric track in the International Planning Competition (IPC) 2023 (https://ipc2023-numeric.github.io/), to solve the models. For MOSP, we use a PDDL model for classical planning that was previously used in IPC from 2006 to 2008 with Ragnarok [160], the winner of the optimal classical track in IPC 2023 (https://ipc2023-classical.github.io/). In these PDDL models, we are not able to model redundant information represented by state constraints, resource variables, dual bound functions, and forced transitions. For Picat, we formulate models for TSPTW, CVRP, m-PDTSP, bin packing, SALBP-1, 1||wiTi1||\sum w_{i}T_{i}, and talent scheduling using the AI planning module with the best_plan_bb predicate, which performs a branch-and-bound algorithm. These models are equivalent to the DyPDL models except that they do not have resource variables. For OPTW, MDKP, MOSP, and graph-clear, we use tabling, a feature to cache the evaluation results of predicates, and the models do not include resource variables and dual bound functions. We provide more details for the PDDL and Picat models in C, and the implementations are available in our repository (https://github.com/Kurorororo/didp-models).

For NLM-CutPlan and Ragnarok, a problem instance is translated to PDDL files by a Python script. For Picat, a problem instance of CVRP, m-PDTSP, OPTW, SALBP-1, 1||wiTi1||\sum w_{i}T_{i}, and talent scheduling is preprocessed and formatted by a Python script so that Picat can easily parse it. We use GCC 12.3 for NLM-CutPlan and Ragnarok, IBM ILOG CPLEX 22.1.1 as a linear programming solver for Ragnarok, and Picat 3.6.

Table 4: Coverage of MIP, CP, PDDL planners, Picat, and CABS. For PDDL, Ragnarok is used for MOSP, and NLM-CutPlan Orbit is used for the other problem classes. The coverage of a solver is in bold if it is higher than MIP and CP, and the higher of MIP and CP is in bold if there is no better solver. The highest coverage is underlined.
MIP CP PDDL Picat CABS
TSPTW (340) 224 47 61 210 259
CVRP (207) 28 0 1 6 6
m-PDTSP (1178) 940 1049 1031 804 1035
OPTW (144) 16 49 - 26 64
MDKP (276) 168 6 - 3 5
Bin Packing (1615) 1160 1234 18 895 1167
SALBP-1 (2100) 1431 1584 871 1590 1802
1||wiTi1||\sum w_{i}T_{i} (375) 107 150 - 199 288
Talent Scheduling (1000) 0 0 - 84 239
MOSP (570) 241 437 193 162 527
Graph-Clear (135) 26 4 - 45 103

Table 4 compares MIP, CP, the PDDL planners, Picat, and CABS. Since the PDDL planners and Picat return only an optimal solution, we evaluate only coverage for them. CABS has coverage higher than or equal to that of the planners and Picat in all problem classes. This result is not surprising since neither the planners nor Picat's AI planning module and tabling are designed for combinatorial optimization. It might be possible to improve the PDDL and Picat models so that they are better suited to these approaches. Moreover, different PDDL planners might be better for combinatorial optimization. However, our point is to show that the performance achieved by the DIDP solvers is not a trivial consequence of the state-based modeling approach, and DIDP is doing something that existing approaches are not able to easily do.

We also compare DIDP with ddo, a DD-based solver [58], using TSPTW and talent scheduling, for which previous work developed models for ddo [161, 162, 163, 164]. For TSPTW, while we minimize the total travel time, which does not include the waiting time, the model for ddo minimizes the makespan, which is the time spent until returning to the depot. Therefore, we adapt our DyPDL model to minimize the makespan: when visiting customer jj from the current location ii with time tt, we increase the cost by max{cij,ajt}\max\{c_{ij},a_{j}-t\} instead of cijc_{ij}. To avoid confusion, in what follows, we call TSPTW to minimize the makespan TSPTW-M.
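As a minimal sketch of this adaptation (the function and variable names are ours, not part of the model definitions), the cost added when visiting customer j from location i at time t differs between the two objectives: total travel time counts only c[i][j], while makespan also counts waiting until the window opening a[j].

```python
def edge_cost(c, a, i, j, t, makespan=False):
    """Cost increment for visiting customer j from location i at time t.

    c[i][j]: travel time; a[j]: opening of j's time window.
    """
    travel = c[i][j]
    if not makespan:
        return travel  # total-travel-time objective ignores waiting
    # Makespan objective: we arrive at max(t + travel, a[j]), so the
    # increment over the current time t is max(travel, a[j] - t).
    return max(travel, a[j] - t)
```

For instance, with travel time 3 and a window opening at 10, departing at t=2 adds 3 under the travel-time objective but 8 under the makespan objective.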

Since defining a merging operator required by ddo is a non-trivial task, we do not evaluate ddo for problem classes other than TSPTW-M and talent scheduling. When two states are merged into one state, any solution for either original state must also be a solution for the merged state with the same or better cost. In the DP model for TSPTW-M, three state variables are used: the set of unvisited customers UU, the current location ii, and the current time tt. When two states with the sets of unvisited customers UU and UU^{\prime} are merged, the set of customers that must be visited should be UUU\cap U^{\prime}, but the set of customers that may be visited should be UUU\cup U^{\prime}. Thus, in the model for ddo, UU is replaced with two state variables, one representing the set of customers that must be visited and another representing the set of customers that may be visited, and these two variables have the same values in a non-merged state. Similarly, the current location is represented by a set of locations that can be considered as the current location. As this example shows, defining a merging operator requires a significant change in the DP model.
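A hedged sketch of the merged-state representation described above (the names are ours; we take the merged time as the minimum, which is one valid relaxation for makespan minimization, though the exact rule used by the ddo model may differ):

```python
def merge_tsptw_states(s1, s2):
    """Merge two relaxed TSPTW-M states.

    Each state is (must_visit, may_visit, locations, time), where the set
    variables are frozensets. In a non-merged state, must_visit == may_visit
    and locations is a singleton. Intersecting must_visit and unioning
    may_visit and locations ensures any solution for either original state
    remains a solution for the merged state; taking the minimum time only
    relaxes the merged state further.
    """
    must1, may1, loc1, t1 = s1
    must2, may2, loc2, t2 = s2
    return (must1 & must2, may1 | may2, loc1 | loc2, min(t1, t2))
```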

Table 5: Coverage and the average optimality gap of ddo and CABS in TSPTW-M and talent scheduling. For TSPTW-M, the optimality gap is not presented since ddo runs out of 8 GB memory in all unsolved instances and does not report intermediate solutions. For talent scheduling, the average optimality gap is computed from 976 instances where ddo does not reach the memory limit. A value of ddo or CABS is in bold if it is better than MIP and CP, and the better one of MIP and CP is in bold otherwise. The best value is underlined.
Coverage Optimality Gap
MIP CP ddo CABS MIP CP ddo CABS
TSPTW-M (340) 114 331 213 260 - -
Talent Scheduling (1000) 0 0 210 239 0.8871 0.9513 0.1424 0.1730

We evaluate ddo 2.0 using Rust 1.70.0 and present the result in Table 5. We use the ddo models for TSPTW-M and talent scheduling obtained from the published repository of ddo (https://github.com/xgillard/ddo/tree/b2e68bfc085af7cc09ece38cc9c81acb0da6e965/ddo/examples). We also adapt the MIP and CP models for TSPTW to TSPTW-M and evaluate them. While ddo returns the best solution and dual bound found within the time limit, it does not return intermediate solutions and bounds during solving. Since we manage the memory limit using an external process, when ddo reaches the memory limit, it is killed without returning the best solution. In TSPTW-M, ddo reaches the memory limit in all unsolved instances. Therefore, we evaluate only coverage in TSPTW-M and present the average optimality gap computed from 976 out of 1000 talent scheduling instances where ddo does not reach the memory limit. CABS is competitive with ddo: it achieves higher coverage in both TSPTW-M and talent scheduling, while ddo has a better average optimality gap in talent scheduling. Note that when we adapt TSPTW to minimize makespan, CP performance increases considerably from the coverage of 47 (see Table 1) to 331 instances. We conjecture that the improvement is a result of the strong back-propagation of the maximum objective function with specialized global constraints for scheduling [165].

8 Conclusion and Future Work

We proposed domain-independent dynamic programming (DIDP), a novel model-based paradigm for combinatorial optimization based on dynamic programming (DP). We introduced Dynamic Programming Description Language (DyPDL), a modeling formalism for DP, and YAML-DyPDL, a modeling language for DyPDL. We developed seven DIDP solvers using heuristic search algorithms and experimentally showed that DIDP outperforms mixed-integer programming and constraint programming in a number of combinatorial optimization problem classes. This result shows that DIDP is promising and complements existing model-based paradigms.

The significance of DIDP is that it is the first model-based paradigm designed for combinatorial optimization based on DP. DIDP is based on two different fields, artificial intelligence (AI) and operations research (OR). In particular, we focused on the state-based representations of problems, which are common in AI planning and DP but not previously exploited in a model-based paradigm for combinatorial optimization. In AI planning, PDDL, the state-based modeling language, is commonly used, and some combinatorial optimization problems such as the minimization of open stacks problem were modeled in PDDL and used in International Planning Competitions. However, PDDL and AI planners are not specifically designed for combinatorial optimization. In OR, DP and state space search methods were used in problem-specific settings, but little work has developed a model-based paradigm based on DP. DIDP bridges these gaps, benefitting from both AI and OR. Since DIDP has a state-based modeling formalism similar to AI planning, we can apply heuristic search algorithms studied in AI to various combinatorial optimization problems. Since DIDP follows the OR approach that allows a user to incorporate redundant information into optimization models, we can develop efficient models for application problems built upon problem-specific DP methods studied in OR.

DIDP opens up new research opportunities. As shown in the experimental result, state-based paradigms such as DIDP and constraint-based paradigms such as mixed-integer programming and constraint programming are suited to different problem classes. Even within state-based paradigms, DIDP and decision diagram-based solvers have different strengths. Analyzing the characteristics of problems that make DIDP superior to others is an interesting direction. DIDP also makes it possible to investigate better DP models, for example, by incorporating redundant information.

There is also a significant opportunity to improve DIDP. One of the most important directions is to develop better heuristic functions for the heuristic search solvers. The current solvers use a dual bound function provided in a DyPDL model for two roles: search guidance and state pruning. While we demonstrated the importance of a dual bound function in Section 7.5, using it for search guidance is not necessarily justified as discussed in Section 5.3; we may improve the anytime behavior by using an inadmissible heuristic function for search guidance. Disentangling search guidance from pruning and developing better functions for each role is one of our future plans. In particular, we are considering developing methods to automatically compute heuristic functions from a DyPDL model as studied in AI planning. In addition, decision diagrams (DDs) can be a source of heuristic functions; in the existing DD-based solver [57, 58], relaxed DDs, where multiple nodes are merged together to make the graph smaller, are used to compute a dual bound. To obtain a relaxed DD, the DD-based solver requires a user to provide a merging operator. Thus, for DIDP, there are two directions: developing a domain-independent merging operator for DyPDL and extending DyPDL so that a user can declaratively incorporate a merging operator into a model.

In contrast to applying AI techniques to DIDP, using DIDP for AI planning is also possible. In proving the undecidability of DyPDL, we showed that numeric planning tasks can be modeled in DyPDL. As this result implies that we can automatically transform a numeric planning task into a DyPDL model, investigating more efficient DyPDL models for each planning domain is an interesting direction for future work.

Appendix A Proof of Lemma 3

Proof.

Once γ¯γ^\overline{\gamma}\leq\hat{\gamma} holds, γ¯\overline{\gamma} never increases, so the lemma continues to hold. We only consider γ¯>γ^\overline{\gamma}>\hat{\gamma} in the current iteration and examine if the lemma will hold in the next iteration. Now, we further specify our assumption: when γ¯>γ^\overline{\gamma}>\hat{\gamma}, then OO contains a state S^\hat{S} such that an S^\hat{S}-solution σ^\hat{\sigma} exists, σ(S^);σ^\langle\sigma(\hat{S});\hat{\sigma}\rangle is a solution for the model with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}, and |σ^||σ||\hat{\sigma}|\leq|\sigma^{\prime}| for each SS^{\prime}-solution σ\sigma^{\prime} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ)γ^\mathsf{solution\_cost}(\langle\sigma(S^{\prime});\sigma^{\prime}\rangle)\leq\hat{\gamma} for each SGS^{\prime}\in G. Here, we denote the number of transitions in a solution σ\sigma as |σ||\sigma|. At the beginning, S^=S0O\hat{S}=S^{0}\in O, any solution is an extension of σ(S0)\sigma(S^{0}), and G={S0}G=\{S^{0}\}, so the assumption holds.

In line 7, SS is removed from OO. If SS is a base state, γ¯\overline{\gamma} and σ¯\overline{\sigma} can be updated. If γ¯\overline{\gamma} becomes less than or equal to γ^\hat{\gamma}, the assumption holds in the next iteration. Otherwise, γ¯>γ^\overline{\gamma}>\hat{\gamma}. Since there exists a solution extending σ(S^)\sigma(\hat{S}) with the cost at most γ^\hat{\gamma} by the assumption, SS^S\neq\hat{S}. By Lemma 2, g(S^)×η(S^)=wσ(S^)(S0)×η(S^)g(\hat{S})\times\eta(\hat{S})=w_{\sigma(\hat{S})}(S^{0})\times\eta(\hat{S}). By Definition 16, η(S^)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\eta(\hat{S})\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}). Since AA is isotone, g(S^)×η(S^)wσ(S^)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)γ^<γ¯g(\hat{S})\times\eta(\hat{S})\leq w_{\sigma(\hat{S})}(S^{0})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})\leq\hat{\gamma}<\overline{\gamma}. Thus, S^\hat{S} is not removed from OO in line 12.

If SS is not a base state, its successor states are generated in lines 1421. Since SS was included in OO, SGS\in G by lines 19 and 21. If there exists an SS-solution σ1\sigma^{1} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma}, then |σ1||σ^||\sigma^{1}|\geq|\hat{\sigma}| by the assumption. We consider the following cases.

  1. 1.

    There does not exist an SS-solution σ1\sigma^{1} satisfying both 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma} and |σ1||σ^||\sigma^{1}|\leq|\hat{\sigma}|.

  2. 2.

    There exists an SS-solution σ1\sigma^{1} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma} and |σ1|=|σ^||\sigma^{1}|=|\hat{\sigma}|.

In the first case, SS^S\neq\hat{S}. For each successor state S[[τ]]S[\![\tau]\!], if there does not exist an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)\leq\hat{\gamma} holds, adding S[[τ]]S[\![\tau]\!] to OO in line 21 does not affect the assumption as long as S^G\hat{S}\in G. Suppose that there exists an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)\leq\hat{\gamma} holds. Then, since τ;σ2\langle\tau;\sigma^{2}\rangle is an SS-solution, |τ;σ2|>|σ^||\langle\tau;\sigma^{2}\rangle|>|\hat{\sigma}|, so |σ2||σ^||\sigma^{2}|\geq|\hat{\sigma}|. Again, as long as S^G\hat{S}\in G, adding S[[τ]]S[\![\tau]\!] to GG does not affect the assumption. Removing S^\hat{S} from GG in line 16 is possible only if S^aS[[τ]]\hat{S}\preceq_{a}S[\![\tau]\!] and gcurrent=g(S)×wτ(S)g(S^)g_{\text{current}}=g(S)\times w_{\tau}(S)\leq g(\hat{S}) in line 19. In such a case, since S^S[[τ]]\hat{S}\preceq S[\![\tau]\!], there exists an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} such that 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\mathsf{solution\_cost}(\hat{\sigma},\hat{S}) with |σ2||σ^||\sigma^{2}|\leq|\hat{\sigma}|. Since σ(S);τ;σ2\langle\sigma(S);\tau;\sigma^{2}\rangle is a solution for the model, by Lemma 2,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]]).\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)=g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]).

Since S[[τ]]S[\![\tau]\!] is reachable from S0S^{0} with σ(S);τ\langle\sigma(S);\tau\rangle, by Theorem 9,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])g(S^)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ^,S^)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S^);σ^)γ^.\begin{split}\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)&=g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq g(\hat{S})\times\mathsf{solution\_cost}(\hat{\sigma},\hat{S})\\ &=\mathsf{solution\_cost}(\langle\sigma(\hat{S});\hat{\sigma}\rangle)\leq\hat{\gamma}.\end{split}

Because |σ2||σ^||\sigma^{2}|\leq|\hat{\sigma}|, by considering S[[τ]]S[\![\tau]\!] as a new S^\hat{S}, the assumption will hold in the next iteration.

In the second case, first, we assume that no applicable forced transitions are identified. In the loop from line 14 to line 21, let S[[τ]]S[\![\tau]\!] be the first successor such that there exists an S[[τ]]S[\![\tau]\!]-solution σ2\sigma^{2} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)\leq\hat{\gamma} and |τ;σ2|=|σ^||\langle\tau;\sigma^{2}\rangle|=|\hat{\sigma}|, which implies |σ2|=|σ^|1|\sigma^{2}|=|\hat{\sigma}|-1. Such a successor state exists since at least S[[σ11]]S[\![\sigma^{1}_{1}]\!] satisfies the condition, where σ11\sigma^{1}_{1} is the first transition of σ1\sigma^{1}. First, we show that S[[τ]]S[\![\tau]\!] is inserted into OO in line 21. Then, we prove that S[[τ]]S[\![\tau]\!] or another successor state replacing S[[τ]]S[\![\tau]\!] in line 19 can be considered a new S^\hat{S} in the next iteration. For a successor state S[[τ]]S[\![\tau^{\prime}]\!] considered before S[[τ]]S[\![\tau]\!], if there exists an S[[τ]]S[\![\tau^{\prime}]\!]-solution σ3\sigma^{3} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S[[τ]]);σ3)γ^\mathsf{solution\_cost}(\langle\sigma(S[\![\tau^{\prime}]\!]);\sigma^{3}\rangle)\leq\hat{\gamma}, then |τ;σ3|>|σ^||\langle\tau^{\prime};\sigma^{3}\rangle|>|\hat{\sigma}|, so |σ3||σ^|>|σ2||\sigma^{3}|\geq|\hat{\sigma}|>|{\sigma^{2}}|. Therefore, adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption. Suppose that S[[τ]]S[\![\tau]\!] is not added to OO due to line 16. Then, there exists a state SGS^{\prime}\in G such that S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime} and gcurrent=g(S)×wτ(S)g(S)g_{\text{current}}=g(S)\times w_{\tau}(S)\geq g(S^{\prime}). Since S[[τ]]aSS[\![\tau]\!]\preceq_{a}S^{\prime}, there exists an SS^{\prime}-solution σ\sigma^{\prime} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])\mathsf{solution\_cost}(\sigma^{\prime},S^{\prime})\leq\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]) and |σ||σ2||\sigma^{\prime}|\leq|\sigma^{2}|. 
However, by the assumption, |σ||σ^|>|σ2||\sigma^{\prime}|\geq|\hat{\sigma}|>|\sigma^{2}|, which is a contradiction. Therefore, there does not exist such SS^{\prime}, and the condition in line 16 is true. Next, we examine the condition in line 17. By Lemma 2,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ2)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])γ^.\mathsf{solution\_cost}(\langle\sigma(S);\tau;\sigma^{2}\rangle)=g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\hat{\gamma}.

Since η(S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])\eta(S[\![\tau]\!])\leq\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]) and AA is isotone,

gcurrent×η(S[[τ]])=g(S)×wτ(S)×η(S[[τ]])g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])γ^<γ¯.g_{\text{current}}\times\eta(S[\![\tau]\!])=g(S)\times w_{\tau}(S)\times\eta(S[\![\tau]\!])\leq g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\hat{\gamma}<\overline{\gamma}.

Therefore, the condition in line 17 is true, and S[[τ]]S[\![\tau]\!] is inserted into OO. For a successor state S[[τ]]S[\![\tau^{\prime}]\!] generated after S[[τ]]S[\![\tau]\!], suppose that there does not exist an S[[τ]]S[\![\tau^{\prime}]\!]-solution σ3\sigma^{3} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ3)γ^\mathsf{solution\_cost}(\langle\sigma(S);\tau^{\prime};\sigma^{3}\rangle)\leq\hat{\gamma}. Adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption as long as S[[τ]]GS[\![\tau]\!]\in G. If there exists such σ3\sigma^{3}, then |σ3|=|σ^|1|\sigma^{3}|=|\hat{\sigma}|-1 or |σ3||σ^||\sigma^{3}|\geq|\hat{\sigma}| by the assumption. In the former case, adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption as long as S[[τ]]GS[\![\tau^{\prime}]\!]\in G since we can consider S[[τ]]S[\![\tau^{\prime}]\!] as a new S^\hat{S} in the next iteration. In the latter case, adding S[[τ]]S[\![\tau^{\prime}]\!] to GG does not affect the assumption as long as S[[τ]]GS[\![\tau]\!]\in G since |σ3|>|σ2||\sigma^{3}|>|\sigma^{2}|. The remaining problem is the possibility that S[[τ]]S[\![\tau]\!] is removed in line 19. If the condition in line 16 is true, then g(S)×wτ(S)g(S)×wτ(S)g(S)\times w_{\tau^{\prime}}(S)\leq g(S)\times w_{\tau}(S), and there exists an S[[τ]]S[\![\tau^{\prime}]\!]-solution σ3\sigma^{3} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ3,S[[τ]])𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])\mathsf{solution\_cost}(\sigma^{3},S[\![\tau^{\prime}]\!])\leq\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!]) and |σ3||σ2||\sigma^{3}|\leq|\sigma^{2}|. By Theorem 9,

𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);τ;σ3)=g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ3,S[[τ]])g(S)×wτ(S)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ2,S[[τ]])γ^.\begin{split}\mathsf{solution\_cost}(\langle\sigma(S);\tau^{\prime};\sigma^{3}\rangle)&=g(S)\times w_{\tau^{\prime}}(S)\times\mathsf{solution\_cost}(\sigma^{3},S[\![\tau^{\prime}]\!])\\ &\leq g(S)\times w_{\tau}(S)\times\mathsf{solution\_cost}(\sigma^{2},S[\![\tau]\!])\leq\hat{\gamma}.\end{split}

Therefore, if we consider S[[τ]]S[\![\tau^{\prime}]\!] as a new S^\hat{S} in the next iteration instead of S[[τ]]S[\![\tau]\!], the situation does not change. Similarly, if S[[τ]]S[\![\tau^{\prime}]\!] is replaced with another successor state, by considering it as a new S^\hat{S}, the situation does not change, and the assumption will hold in the next iteration.

When applicable forced transitions are identified, only one successor state S[[τ]]S[\![\tau]\!] is generated, where τ\tau is a forced transition, and there exists an SS-solution τ;σ2\langle\tau;\sigma^{2}\rangle with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ;σ2,S)𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)\mathsf{solution\_cost}(\langle\tau;\sigma^{2}\rangle,S)\leq\mathsf{solution\_cost}(\sigma^{1},S). Since AA is isotone,

wσ(S)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(τ;σ2,S)wσ(S)(S0)×𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ1,S)=𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ1)γ^.w_{\sigma(S)}(S^{0})\times\mathsf{solution\_cost}(\langle\tau;\sigma^{2}\rangle,S)\leq w_{\sigma(S)}(S^{0})\times\mathsf{solution\_cost}(\sigma^{1},S)=\mathsf{solution\_cost}(\langle\sigma(S);\sigma^{1}\rangle)\leq\hat{\gamma}.

If GG contains a state SS^{\prime} such that there exists an SS^{\prime}-solution σ\sigma^{\prime} with 𝗌𝗈𝗅𝗎𝗍𝗂𝗈𝗇_𝖼𝗈𝗌𝗍(σ(S);σ)γ^\mathsf{solution\_cost}(\langle\sigma(S^{\prime});\sigma^{\prime}\rangle)\leq\hat{\gamma} and |σ||σ2||\sigma^{\prime}|\leq|\sigma^{2}|, then, by considering the one minimizing |σ||\sigma^{\prime}| as a new S^\hat{S}, the condition will hold in the next iteration. If GG does not contain such a state, then S[[τ]]S[\![\tau]\!] can be considered a new S^\hat{S} if it is inserted into OO. By a similar argument as the previous paragraph, we can prove that S[[τ]]S[\![\tau]\!] is inserted into OO and GG. ∎

Appendix B Additional Problem and Model Definitions

We present problem definitions and DyPDL models that are not covered in Section 6 and new CP models used in the experimental evaluation.

B.1 Multi-Dimensional Knapsack Problem (MDKP)

The multi-dimensional knapsack problem (MDKP) [116, 117] is a generalization of the 0-1 knapsack problem. In this problem, each item i{0,,n1}i\in\{0,...,n-1\} has an integer profit pi0p_{i}\geq 0 and mm-dimensional nonnegative weights (wi,0,,wi,m1)(w_{i,0},...,w_{i,m-1}), and the knapsack has the mm-dimensional capacities (q0,,qm1)(q_{0},...,q_{m-1}). In each dimension, the total weight of items included in the knapsack must not exceed the capacity. The objective is to maximize the total profit. MDKP is strongly NP-hard [119, 138].

B.1.1 DyPDL Model for MDKP

In our DyPDL model, we decide one by one whether to include each item. An element variable ii represents the index of the currently considered item, and a numeric variable rjr_{j} represents the remaining space in the jj-th dimension. We can use the total profit of the remaining items as a dual bound function. In addition, we consider an upper bound similar to that of OPTW by ignoring dimensions other than jj. Let ekj=pk/wkje_{kj}=p_{k}/w_{kj} be the efficiency of item kk in dimension jj. Then, rjmaxk=i,,n1ekj\lfloor r_{j}\max_{k=i,...,n-1}e_{kj}\rfloor is an upper bound on the cost of an (i,rj)(i,r_{j})-solution. If wkj=0w_{kj}=0, we define ekj=k=i,,n1pke_{kj}=\sum_{k=i,...,n-1}p_{k}, i.e., the maximum additional profit achieved from (i,rj)(i,r_{j}). In such a case, max{rj,1}maxk=i,,n1ekj\lfloor\max\{r_{j},1\}\cdot\max_{k=i,...,n-1}e_{kj}\rfloor is still a valid upper bound.

compute V(0,q0,,qm1)\displaystyle\text{compute }V(0,q_{0},...,q_{m-1}) (41)
V(i,r0,,rm1)={0if i=nmax{pi+V(i+1,r0wi0,,rm1wi,m1)V(i+1,r0,,rm1)else if jM,wijrjV(i+1,r0,,rm1)elseV(i,r_{0},...,r_{m-1})=\begin{cases}0&\text{if }i=n\\ \max\left\{\begin{array}[]{l}p_{i}+V(i+1,r_{0}-w_{i0},...,r_{m-1}-w_{i,m-1})\\ V(i+1,r_{0},...,r_{m-1})\end{array}\right.&\text{else if }\forall j\in M,w_{ij}\leq r_{j}\\ V(i+1,r_{0},...,r_{m-1})&\text{else}\end{cases} (42)
V(i,r0,,rm1)min{k=i,,n1pk,minjMmax{rj,1}maxk=i,,n1ekj}\displaystyle V(i,r_{0},...,r_{m-1})\leq\min\left\{\sum_{k=i,...,n-1}p_{k},\min\limits_{j\in M}\left\lfloor\max\{r_{j},1\}\cdot\max\limits_{k=i,...,n-1}e_{kj}\right\rfloor\right\} (43)

where M={0,,m1}M=\{0,...,m-1\}. Similar to OPTW, this model is monoidal and satisfies isotonicity, but it is not cost-algebraic, so the first solution found by CAASDy may not be optimal (Theorem 15).
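A minimal Python sketch of the recursion in Equation 42, omitting the dual bound function of Equation 43 (`solve_mdkp` is a hypothetical name, and memoization stands in for the solvers' state-space search):

```python
from functools import lru_cache

def solve_mdkp(profits, weights, capacities):
    """Evaluate the MDKP recursion: at stage i, either pack item i
    (if it fits in every dimension) or skip it."""
    n, m = len(profits), len(capacities)

    @lru_cache(maxsize=None)
    def value(i, remaining):
        if i == n:  # base case: no items left
            return 0
        skip = value(i + 1, remaining)
        if all(weights[i][j] <= remaining[j] for j in range(m)):
            packed = tuple(remaining[j] - weights[i][j] for j in range(m))
            return max(profits[i] + value(i + 1, packed), skip)
        return skip

    return value(0, tuple(capacities))
```

For example, with profits (6, 10, 12), weights ((1,1), (2,3), (3,2)), and capacities (5, 4), the optimal choice packs items 0 and 2 for a profit of 18.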

B.1.2 CP Model for MDKP

We use the 𝖯𝖺𝖼𝗄\mathsf{Pack} global constraint [139] and consider packing all items into two bins; one represents the knapsack, and the other represents items not selected. We introduce a binary variable xix_{i} representing the bin where item ii is packed (xi=0x_{i}=0 represents that the item is in the knapsack). We define an integer variable yj,0y_{j,0} representing the total weight in dimension jj of the items in the knapsack and an integer variable yj,1y_{j,1} representing that of the items not selected.

max\displaystyle\max iNpi(1xi)\displaystyle\sum_{i\in N}p_{i}(1-x_{i}) (44)
s.t. 𝖯𝖺𝖼𝗄({yj,0,yj,1},{xiiN},{wijiN})\displaystyle\mathsf{Pack}(\{y_{j,0},y_{j,1}\},\{x_{i}\mid i\in N\},\{w_{ij}\mid i\in N\}) j=0,,m1\displaystyle j=0,...,m-1 (45)
yj,0qj\displaystyle y_{j,0}\leq q_{j} j=0,,m1\displaystyle j=0,...,m-1 (46)
yj,0,yj,10+\displaystyle y_{j,0},y_{j,1}\in\mathbb{Z}_{0}^{+} j=0,,m1\displaystyle j=0,...,m-1 (47)
xi{0,1}\displaystyle x_{i}\in\{0,1\} iN.\displaystyle\forall i\in N. (48)

B.2 Single Machine Total Weighted Tardiness (1||wiTi1||\sum w_{i}T_{i})

In single machine scheduling to minimize the total weighted tardiness (1||wiTi1||\sum w_{i}T_{i}) [147], a set of jobs NN is given, and each job iNi\in N has a processing time pip_{i}, a due date did_{i}, and a weight wiw_{i}, all of which are nonnegative. The objective is to schedule all jobs on a machine while minimizing the total weighted tardiness, iNwimax{0,Cidi}\sum_{i\in N}w_{i}\max\{0,C_{i}-d_{i}\}, where CiC_{i} is the completion time of job ii. This problem is strongly NP-hard [166].

B.2.1 DyPDL Model for $1||\sum w_{i}T_{i}$

We formulate a DyPDL model based on an existing DP model [22, 167], where one job is scheduled at each step. Let $F$ be a set variable representing the set of scheduled jobs. A numeric expression $T(i,F)=\max\{0,\sum_{j\in F}p_{j}+p_{i}-d_{i}\}$ represents the tardiness of job $i$ when it is scheduled after the jobs in $F$. We introduce a set $P_i$, representing the set of jobs that can be scheduled before $i$ without losing optimality. While $P_i$ is redundant information not defined in the problem, it can be extracted in preprocessing using precedence theorems [147].

\text{compute } V(\emptyset)   (49)
V(F)=\begin{cases}0&\text{if }F=N\\ \min\limits_{i\in N\setminus F:P_{i}\setminus F=\emptyset}w_{i}T(i,F)+V(F\cup\{i\})&\text{else}\end{cases}   (50)
V(F)\geq 0.   (51)

This model is cost-algebraic with a cost algebra $\langle\mathbb{Q}_{0}^{+},+,0\rangle$ since $w_{i}T(i,F)\geq 0$. Since the base cost is always zero, the first solution found by CAASDy is optimal.
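As an illustration, the recursion (49)–(51) can be sketched as a memoized function. The instance data below and the omission of the precedence sets $P_i$ are our own simplifications for illustration, not part of the model above.

```python
from functools import lru_cache

# Hypothetical instance data: processing times, deadlines, and weights.
p = [3, 2, 4]
d = [4, 6, 5]
w = [2, 1, 3]
n = len(p)

@lru_cache(maxsize=None)
def V(F):
    # F is a frozenset of already scheduled jobs; a base case when all
    # jobs are scheduled, otherwise minimize over the next job to add.
    if len(F) == n:
        return 0
    t = sum(p[j] for j in F)  # completion time of the jobs in F
    return min(
        w[i] * max(0, t + p[i] - d[i]) + V(F | {i})
        for i in range(n) if i not in F
    )

print(V(frozenset()))  # optimal total weighted tardiness
```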

B.2.2 CP Model for $1||\sum w_{i}T_{i}$

We use an interval variable $x_i$ with duration $p_i$ and a time window of $[0,\sum_{j\in N}p_{j}]$, representing the interval of time during which job $i$ is processed.

\min\ \sum_{i\in N}w_{i}\max\{\mathsf{EndOf}(x_{i})-d_{i},0\}   (52)
\text{s.t.}\ \mathsf{NoOverlap}(\pi)   (53)
\mathsf{Before}(\pi,x_{i},x_{j})   \forall j\in N,\forall i\in P_{j}   (54)
x_{i}:\mathsf{intervalVar}\left(p_{i},\left[0,\sum_{j\in N}p_{j}\right]\right)   \forall i\in N   (55)
\pi:\mathsf{sequenceVar}(\{x_{0},...,x_{n-1}\}).   (56)

B.3 Talent Scheduling

In talent scheduling, we are given a set of scenes $N=\{0,...,n-1\}$, a set of actors $A=\{0,...,m-1\}$, the set of actors $A_{s}\subseteq A$ playing in scene $s$, and the set of scenes $N_{a}\subseteq N$ in which actor $a$ plays. In addition, $d_s$ is the duration of scene $s$, and $c_a$ is the cost of actor $a$ per day.

B.3.1 DyPDL Model for Talent Scheduling

We use a set variable $Q$ representing the set of scenes that are not yet shot and a set expression $L(Q)=\bigcup_{s\in Q}A_{s}\cap\bigcup_{s\in N\setminus Q}A_{s}$ to represent the set of actors on location after shooting $N\setminus Q$. If $A_{s}=L(Q)$, then $s$ should be shot immediately because all of its actors are already on location: a forced transition. For the dual bound function, we underestimate the cost to shoot $s$ by $b_{s}=d_{s}\sum_{a\in A_{s}}c_{a}$.

If there exist two scenes $s_1$ and $s_2$ in $Q$ such that $A_{s_{1}}\subseteq A_{s_{2}}$ and $A_{s_{2}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{1}}$, it is known that scheduling $s_2$ before $s_1$ is always better, denoted by $s_{2}\preceq s_{1}$. Since two scenes with the same set of actors are merged into a single scene in preprocessing without losing optimality, we can assume that all $A_s$ are different. With this assumption, the relationship is a partial order: it is reflexive because $A_{s_{1}}\subseteq A_{s_{1}}$ and $A_{s_{1}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{1}}$; it is antisymmetric because if $s_{1}\preceq s_{2}$ and $s_{2}\preceq s_{1}$, then $A_{s_{1}}\subseteq A_{s_{2}}$ and $A_{s_{2}}\subseteq A_{s_{1}}$, which imply $s_{1}=s_{2}$; it is transitive because if $s_{2}\preceq s_{1}$ and $s_{3}\preceq s_{2}$, then $A_{s_{1}}\subseteq A_{s_{2}}\subseteq A_{s_{3}}$ and $A_{s_{3}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{2}}\subseteq\bigcup_{s\in N\setminus Q}A_{s}\cup A_{s_{1}}$, which imply $s_{3}\preceq s_{1}$. Therefore, the set of candidate scenes to shoot next, $R(Q)=\{s_{1}\in Q\mid\not\exists s_{2}\in Q\setminus\{s_{1}\},s_{2}\preceq s_{1}\}$, is not empty.

V(Q)=\begin{cases}0&\text{if }Q=\emptyset\\ b_{s}+V(Q\setminus\{s\})&\text{else if }\exists s\in Q,A_{s}=L(Q)\\ \min\limits_{s\in R(Q)}d_{s}\sum_{a\in A_{s}\cup L(Q)}c_{a}+V(Q\setminus\{s\})&\text{else if }Q\neq\emptyset\end{cases}   (57)
V(Q)\geq\sum_{s\in Q}b_{s}.   (58)
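As an illustration, recursion (57) can be sketched in a memoized form. The instance data below is hypothetical, and we omit the forced-transition case and the restriction to $R(Q)$; both only prune choices, so the unrestricted minimum yields the same optimal value.

```python
from functools import lru_cache

# Hypothetical instance: scene durations, actor daily costs, actor sets.
d = [2, 1, 3]           # duration of each scene
c = [4, 5]              # cost of each actor per day
A = [{0}, {0, 1}, {1}]  # actors playing in each scene
N = frozenset(range(len(d)))

def on_location(Q):
    # Actors appearing both in an unshot scene and in a shot scene: L(Q).
    shot = N - Q
    if not shot:
        return set()
    return set().union(*(A[s] for s in Q)) & set().union(*(A[s] for s in shot))

@lru_cache(maxsize=None)
def V(Q):
    # Q is a frozenset of scenes not yet shot.
    if not Q:
        return 0
    L = on_location(Q)
    return min(
        d[s] * sum(c[a] for a in A[s] | L) + V(Q - {s})
        for s in Q
    )

print(V(N))  # minimum total actor cost
```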

B.3.2 CP Model for Talent Scheduling

We extend the model used by Chu and Stuckey [150], which was originally implemented in MiniZinc [168], with the $\mathsf{AllDifferent}$ global constraint. While $\mathsf{AllDifferent}$ is redundant in the model, it slightly improved performance in our preliminary experiments. Let $x_i$ be a variable representing the $i$-th scene in the schedule, $b_{si}$ be a variable representing whether scene $s$ is shot before the $i$-th scene, $o_{ai}$ be a variable representing whether any scene in $N_a$ is shot by the $i$-th scene, and $f_{ai}$ be a variable representing whether all scenes in $N_a$ finish before the $i$-th scene. The CP model is

\min\ \sum_{i\in N}d_{x_{i}}\sum_{a\in A}c_{a}o_{ai}(1-f_{ai})   (59)
\text{s.t.}\ \mathsf{AllDifferent}(\{x_{i}\mid i\in N\})   (60)
b_{s0}=0   \forall s\in N   (61)
b_{si}=b_{s,i-1}+\mathbbm{1}(x_{i-1}=s)   \forall i\in N\setminus\{0\},\forall s\in N   (62)
b_{si}=1\rightarrow x_{i}\neq s   \forall i\in N\setminus\{0\},\forall s\in N   (63)
o_{a0}=\mathbbm{1}\left(\bigvee_{s\in N_{a}}x_{0}=s\right)   \forall a\in A   (64)
o_{ai}=\mathbbm{1}\left(o_{a,i-1}=1\lor\bigvee_{s\in N_{a}}x_{i}=s\right)   \forall i\in N\setminus\{0\},\forall a\in A   (65)
f_{ai}=\prod_{s\in N_{a}}b_{si}   \forall i\in N,\forall a\in A   (66)
x_{i}\in N   \forall i\in N   (67)
b_{si},f_{ai}\in\{0,1\}   \forall s,i\in N,\forall a\in A.   (68)

B.4 DyPDL Model for Minimization of Open Stacks Problem (MOSP)

In the minimization of open stacks problem (MOSP) [152], a set of customers $C$ and a set of products $P$ are given, and each customer $c$ orders products $P_{c}\subseteq P$. A solution is a sequence in which the products are produced. When producing product $i$, a stack for each customer $c$ with $i\in P_{c}$ is opened, and it is closed once all of $P_c$ are produced. The objective is to minimize the maximum number of stacks open at any one time. MOSP is strongly NP-hard [169].

For MOSP, customer search is a state-of-the-art exact method [155]. It searches for an order of customers in which to close stacks, from which the order of products is determined; for each customer $c$, all products ordered by $c$ and not yet produced are produced consecutively in an arbitrary order. We formulate customer search as a DyPDL model. A set variable $R$ represents customers whose stacks are not yet closed, and $O$ represents customers whose stacks have been opened. Let $N_{c}=\{c^{\prime}\in C\mid P_{c}\cap P_{c^{\prime}}\neq\emptyset\}$ be the set of customers that order at least one of the same products as customer $c$. When producing the items for customer $c$, we need to open stacks for the customers in $N_{c}\setminus O$, and the stacks for the customers in $O\cap R$ remain open.

\text{compute } V(C,\emptyset)   (69)
V(R,O)=\begin{cases}0&\text{if }R=\emptyset\\ \min\limits_{c\in R}\max\left\{|(O\cap R)\cup(N_{c}\setminus O)|,V(R\setminus\{c\},O\cup N_{c})\right\}&\text{else}\end{cases}   (70)
V(R,O)\geq 0.   (71)

Similar to the DyPDL model for graph-clear in Section 6.6, this model is cost-algebraic, and the base cost is zero, so the first solution found by CAASDy is optimal.
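The customer-search recursion (70) can be sketched as a memoized function; the customer-product incidence below is hypothetical example data.

```python
from functools import lru_cache

# Hypothetical instance: products ordered by each customer.
P = [{0, 1}, {1, 2}, {2, 3}]
C = frozenset(range(len(P)))
# N_c: customers sharing at least one product with customer c.
Nb = [{c2 for c2 in range(len(P)) if P[c] & P[c2]} for c in range(len(P))]

@lru_cache(maxsize=None)
def V(R, O):
    # R: customers whose stacks are not yet closed; O: opened customers.
    if not R:
        return 0
    return min(
        max(len((O & R) | (Nb[c] - O)), V(R - {c}, O | Nb[c]))
        for c in R
    )

print(V(C, frozenset()))  # minimum of the maximum number of open stacks
```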

B.5 CP Model for Orienteering Problem with Time Windows (OPTW)

In the orienteering problem with time windows (OPTW), we are given a set of customers $N=\{0,...,n-1\}$, the travel time $c_{ij}$ from customer $i$ to $j$, and the profit $p_i$ of customer $i$. We define an optional interval variable $x_i$ that represents visiting customer $i$ within $[a_{i},b_{i}]$. We also introduce an interval variable $x_n$ that represents returning to the depot (customer 0) and define $c_{in}=c_{i0}$ and $c_{ni}=c_{0i}$ for each $i\in N$. We use a sequence variable $\pi$ to sequence the interval variables.

\max\ \sum_{i\in N\setminus\{0\}}p_{i}\mathsf{Pres}(x_{i})   (72)
\text{s.t.}\ \mathsf{NoOverlap}(\pi,\{c_{ij}\mid(i,j)\in(N\cup\{n\})\times(N\cup\{n\})\})   (73)
\mathsf{First}(\pi,x_{0})   (74)
\mathsf{Last}(\pi,x_{n})   (75)
x_{i}:\mathsf{optIntervalVar}(0,[a_{i},b_{i}])   \forall i\in N\setminus\{0\}   (76)
x_{0}:\mathsf{intervalVar}(0,[0,0])   (77)
x_{n}:\mathsf{intervalVar}(0,[0,b_{0}])   (78)
\pi:\mathsf{sequenceVar}(\{x_{0},...,x_{n}\}).   (79)

Objective (72) maximizes the total profit, where $\mathsf{Pres}(x_{i})=1$ if the optional interval variable $x_i$ is present. Constraint (73) ensures that if $x_j$ is present in $\pi$ after $x_i$, the distance between them is at least $c_{ij}$. Constraints (74) and (75) ensure that the tour starts from and returns to the depot.

B.6 CP Model for Bin Packing

In the bin packing problem, we are given a set of items $N=\{0,...,n-1\}$, the weight $w_i$ of each item $i\in N$, and the capacity $q$ of a bin. We compute an upper bound $\bar{m}$ on the number of bins using the first-fit decreasing heuristic and use $M=\{0,...,\bar{m}-1\}$. We use $\mathsf{Pack}$ and ensure that item $i$ is packed in the $i$-th or an earlier bin.

\min\ \max_{i\in N}x_{i}+1   (80)
\text{s.t.}\ \mathsf{Pack}(\{y_{j}\mid j\in M\},\{x_{i}\mid i\in N\},\{w_{i}\mid i\in N\})   (81)
0\leq y_{j}\leq q   \forall j\in M   (82)
0\leq x_{i}\leq i   \forall i\in N   (83)
y_{j}\in\mathbb{Z}   \forall j\in M   (84)
x_{i}\in\mathbb{Z}   \forall i\in N.   (85)
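The first-fit decreasing heuristic used to compute the upper bound on the number of bins can be sketched as follows; the weights and capacity are hypothetical example data.

```python
def first_fit_decreasing(weights, capacity):
    # Sort items by decreasing weight; place each in the first bin
    # with enough remaining capacity, opening a new bin if none fits.
    bins = []  # remaining capacity of each opened bin
    for w in sorted(weights, reverse=True):
        for j, r in enumerate(bins):
            if w <= r:
                bins[j] = r - w
                break
        else:
            bins.append(capacity - w)  # open a new bin
    return len(bins)

print(first_fit_decreasing([4, 8, 1, 4, 2, 1], 10))  # upper bound on bins
```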

B.7 CP Model for Simple Assembly Line Balancing Problem (SALBP-1)

In SALBP-1, in addition to the tasks and the capacity of a station, we are given $P_i$, the set of direct predecessors of task $i$. We implement the CP model proposed by Bukchin and Raviv [16] with the addition of $\mathsf{Pack}$. For an upper bound on the number of stations, instead of computing it with a heuristic, we use $\bar{m}=\min\{n,2\lceil\sum_{i\in N}w_{i}/q\rceil\}$ following the MIP model [15]. Let $m$ be the number of stations, $x_i$ be the index of the station of task $i$, and $y_j$ be the sum of the weights of the tasks scheduled in station $j$. The set of all direct and indirect predecessors of task $i$ is $\tilde{P}_{i}=\{j\in N\mid j\in P_{i}\lor\exists k\in\tilde{P}_{i},j\in\tilde{P}_{k}\}$. The set of all direct and indirect successors of task $i$ is $\tilde{S}_{i}=\{j\in N\mid i\in P_{j}\lor\exists k\in\tilde{S}_{i},j\in\tilde{S}_{k}\}$. Thus, $e_{i}=\left\lceil\frac{w_{i}+\sum_{k\in\tilde{P}_{i}}w_{k}}{q}\right\rceil$ is a lower bound on the number of stations required to schedule task $i$, $l_{i}=\left\lfloor\frac{w_{i}-1+\sum_{k\in\tilde{S}_{i}}w_{k}}{q}\right\rfloor$ is a lower bound on the number of stations between the station of task $i$ and the last station, and $d_{ij}=\left\lfloor\frac{w_{i}+w_{j}-1+\sum_{k\in\tilde{S}_{i}\cap\tilde{P}_{j}}w_{k}}{q}\right\rfloor$ is a lower bound on the number of stations between the stations of tasks $i$ and $j$.

\min\ m   (86)
\text{s.t.}\ \mathsf{Pack}(\{y_{j}\mid j\in M\},\{x_{i}\mid i\in N\},\{w_{i}\mid i\in N\})   (87)
0\leq y_{j}\leq q   \forall j\in M   (88)
e_{i}-1\leq x_{i}\leq m-1-l_{i}   \forall i\in N   (89)
x_{i}+d_{ij}\leq x_{j}   \forall j\in N,\forall i\in\tilde{P}_{j},\not\exists k\in\tilde{S}_{i}\cap\tilde{P}_{j}:d_{ij}\leq d_{ik}+d_{kj}   (90)
m\in\mathbb{Z}   (91)
y_{j}\in\mathbb{Z}   \forall j\in M   (92)
x_{i}\in\mathbb{Z}   \forall i\in N.   (93)

Constraint (89) states the lower and upper bounds on the index of the station of task $i$. Constraint (90) is an enhanced version of the precedence constraint using $d_{ij}$.
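The bounds above depend on the transitive predecessor sets $\tilde{P}_i$. A minimal sketch of computing $\tilde{P}_i$ and the bound $e_i$ follows; the station capacity, task weights, and direct predecessor sets are hypothetical example data.

```python
import math

# Hypothetical SALBP-1 instance data.
q = 10                          # station capacity
w = [4, 3, 5, 2]                # task weights
P = [set(), {0}, {0}, {1, 2}]   # direct predecessors of each task

def transitive_predecessors(P):
    # Fixed-point computation of the transitive closure of P.
    Pt = [set(p) for p in P]
    changed = True
    while changed:
        changed = False
        for i in range(len(P)):
            for j in list(Pt[i]):
                if not Pt[j] <= Pt[i]:
                    Pt[i] |= Pt[j]
                    changed = True
    return Pt

Pt = transitive_predecessors(P)
# e_i: minimum number of stations needed to schedule task i.
e = [math.ceil((w[i] + sum(w[k] for k in Pt[i])) / q) for i in range(len(w))]
print(e)
```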

Appendix C Modeling in AI Planning

In Section 7.6, we compare DIDP with AI planning approaches: numeric planning using the Planning Domain Definition Language (PDDL) [86, 64] and Picat, a logic programming language that has an AI planning module. We explain how we formulate models for these approaches.

C.1 PDDL

We model TSPTW, CVRP, m-PDTSP, bin packing, and SALBP-1 as linear numeric planning tasks [170], where preconditions and effects of actions are represented by linear formulas of numeric state variables. In these models, the objective is to minimize the sum of nonnegative action costs, which is a standard in optimal numeric planning [171, 172, 173, 174, 175, 176, 177, 178, 179]. We use PDDL 2.1 [64] to formulate the models and NLM-CutPlan Orbit [159], the winner of the optimal numeric track of the International Planning Competition (IPC) 2023 (https://ipc2023-numeric.github.io/), to solve the models. The PDDL models are adaptations of the DyPDL models presented in Section 6, where a numeric or element variable in DyPDL becomes a numeric variable in PDDL (except for the element variable representing the current location in the routing problems as explained in the next paragraph), a set variable is represented by a predicate and a set of objects, the target state becomes the initial state, base cases become the goal conditions, and a transition becomes an action. However, we are unable to model dominance between states and dual bound functions in PDDL. In addition, we cannot differentiate forced transitions and other transitions. We explain other differences between the PDDL models and the DyPDL models in the following paragraphs.

In the DyPDL models of the routing problems (TSPTW, CVRP, and m-PDTSP), each transition visits one customer and increases the cost by the travel time from the current location to the customer. Since a goal state in PDDL is not associated with a cost unlike a base case in DyPDL, we also define an action to return to the depot, which increases the cost by the travel time to the depot. While the travel time depends on the current location, for NLM-CutPlan Orbit, the cost of an action must be a nonnegative constant independent of a state, which is a standard in admissible heuristic functions for numeric planning [171, 173, 174, 176, 177, 179]. Thus, in the PDDL models, we define one action with two parameters, the current location (?from) and the destination (?to), so that the cost of each grounded action becomes a state-independent constant (c ?from ?to) corresponding to the travel time. We define a predicate (visited ?customer) representing if a customer is visited and (location ?customer) representing the current location. In each action, we use (location ?from) and (not (visited ?to)) as preconditions and (not (location ?from)), (location ?to), and (visited ?to) as effects.

In the DyPDL model of TSPTW, each transition updates the current time $t$ to $\max\{t+c_{ij},a_{j}\}$, where $c_{ij}$ is the travel time and $a_j$ is the beginning of the time window at the destination. While this effect can be written as (assign (t) (max (+ (t) (c ?from ?to)) (a ?to))) in PDDL, to represent effects as linear formulas, we introduce two actions: one with a precondition (>= (+ (t) (c ?from ?to)) (a ?to)) and an effect (increase (t) (c ?from ?to)), corresponding to $t\leftarrow t+c_{ij}$ if $t+c_{ij}\geq a_{j}$, and another with a precondition (< (+ (t) (c ?from ?to)) (a ?to)) and an effect (assign (t) (a ?to)), corresponding to $t\leftarrow a_{j}$ if $t+c_{ij}<a_{j}$.

In the DyPDL models of TSPTW and CVRP, we have redundant state constraints. While a state constraint could be modeled by introducing it as a precondition of each action, we do not use the state constraints in the PDDL models of TSPTW and CVRP because efficiently modeling them is non-trivial: straightforward approaches result in an exponential number of actions. For TSPTW, the state constraint checks if all unvisited customers can be visited by their deadlines, represented as $\forall j\in U,t+c^{*}_{ij}\leq b_{j}$, where $U$ is the set of unvisited customers, $c^{*}_{ij}$ is the shortest travel time from the current location to customer $j$, and $b_j$ is the deadline. One possible way to model this constraint is to define a disjunctive precondition (or (visited ?j) (<= (+ (t) (cstar ?from ?j)) (b ?j))) for each customer ?j, where (t) is a numeric variable corresponding to $t$, (cstar ?from ?j) is a numeric constant corresponding to $c^{*}_{ij}$, and (b ?j) is a numeric constant corresponding to $b_j$. However, the heuristic function used by NLM-CutPlan Orbit does not support disjunctive preconditions, and NLM-CutPlan Orbit compiles an action with disjunctive preconditions into a set of actions with different combinations of the preconditions, an approach inherited from Fast Downward [36], the standard classical planning framework on which NLM-CutPlan Orbit is based. In our case, each resulting action has one of the two preconditions, (visited ?j) or (<= (+ (t) (cstar ?from ?j)) (b ?j)), for each customer ?j, resulting in $2^{n}$ actions in total, where $n$ is the number of customers. In CVRP, the state constraint takes the sum of demands over all unvisited customers. To model this computation independently of a state, we would need to define an action for each possible set of unvisited customers, resulting in $2^{n}$ actions in total.

In the DyPDL models of bin packing and SALBP-1, each transition packs an item in the current bin (schedules a task in the current station for SALBP-1) or opens a new bin. When opening a new bin, the transition checks as a precondition that no item can be packed in the current bin, which is unnecessary but useful to exclude suboptimal solutions. However, for reasons similar to the state constraint in TSPTW, we do not model this precondition in the PDDL models. We could model this condition by defining (or (packed ?j) (> (w ?j) (r))) for each item ?j, where (packed ?j) represents if ?j is already packed, (w ?j) represents the weight of ?j, and (r) is the remaining capacity. However, as discussed above, NLM-CutPlan Orbit would generate an exponential number of actions with this condition.

In addition to the above problem classes, we also use MOSP: it was used as a benchmark domain in the classical planning tracks of the International Planning Competitions from 2006 to 2014. This PDDL formulation is different from our DyPDL model. To solve the model, we use Ragnarok [160], the winner of the optimal classical track of IPC 2023 (https://ipc2023-classical.github.io/).

We do not use other problem classes since their DyPDL models do not minimize the sum of state-independent and nonnegative action costs. In $1||\sum w_{i}T_{i}$ and talent scheduling, since the cost of each transition depends on a set variable, we would need an exponential number of actions to make it state-independent. In OPTW and MDKP, the objective is to maximize the sum of nonnegative profits. In graph-clear, the objective is to minimize the maximum value of state-dependent weights associated with transitions.

C.2 Picat

Picat is a logic programming language, in which DP can be used with tabling, a feature to store and reuse the evaluation results of predicates, without implementing a DP algorithm. Picat provides an AI planning module based on tabling, where a state, goal conditions, and actions can be programmatically described by expressions in Picat. While the cost of a plan is still restricted to the sum of nonnegative action costs, each action cost can be state-dependent. In addition, an admissible heuristic function can be defined and used by a solving algorithm. Thus, we can define a dual bound function as an admissible heuristic function in the AI planning module. However, we cannot model dominance between states. Using the AI planning module, we formulate models for TSPTW, CVRP, m-PDTSP, bin packing, SALBP-1, $1||\sum w_{i}T_{i}$, and talent scheduling, which are the same as the DyPDL models except that they do not define dominance. To solve the formulated models, we use the best_plan_bb predicate, which performs a branch-and-bound algorithm using the heuristic function. For OPTW, MDKP, MOSP, and graph-clear, we do not use the AI planning module due to the objective structure of their DyPDL models. We define DP models for these problem classes, which are the same as the DyPDL models except that they do not define dominance and dual bound functions, using tabling without the AI planning module.

Acknowledgement

This work was supported by the Natural Sciences and Engineering Research Council of Canada. This research was enabled in part by support provided by Compute Ontario and the Digital Research Alliance of Canada (alliancecan.ca).

References

  • Toth and Vigo [2014] P. Toth, D. Vigo, Vehicle Routing: Problems, Methods, and Applications, second ed., Society for Industrial and Applied Mathematics, 2014. doi:10.1137/1.9781611973594.
  • Cheng et al. [1993] T. C. E. Cheng, J. E. Diamond, B. M. T. Lin, Optimal scheduling in film production to minimize talent hold cost, J. Optim. Theory Appl. 79 (1993) 479–492. doi:10.1007/BF00940554.
  • Pinedo [2009] M. L. Pinedo, Planning and Scheduling in Manufacturing and Services, second ed., Springer, New York, NY, 2009. doi:10.1007/978-1-4419-0910-7.
  • Boysen et al. [2021] N. Boysen, P. Schulze, A. Scholl, Assembly line balancing: What happened in the last fifteen years?, Eur. J. Oper. Res. 301 (2021) 797–814. doi:10.1016/j.ejor.2021.11.043.
  • Russell and Norvig [2020] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, fourth ed., Pearson, 2020.
  • Beck and Fox [1998] J. C. Beck, M. S. Fox, A generic framework for constraint-directed search and scheduling, AI Mag. 19 (1998) 103. doi:10.1609/aimag.v19i4.1426.
  • Bengio et al. [2021] Y. Bengio, A. Lodi, A. Prouvost, Machine learning for combinatorial optimization: A methodological tour d’horizon, Eur. J. Oper. Res. 290 (2021) 405–421. doi:10.1016/j.ejor.2020.07.063.
  • Korte and Vygen [2018] B. Korte, J. Vygen, Combinatorial Optimization: Theory and Algorithms, sixth ed., Springer, Berlin, Heidelberg, 2018. doi:10.1007/978-3-662-56039-6.
  • Freuder [1997] E. Freuder, In pursuit of the holy grail, Constraints 2 (1997) 57–61. doi:10.1023/A:1009749006768.
  • Hungerländer and Truden [2018] P. Hungerländer, C. Truden, Efficient and easy-to-implement mixed-integer linear programs for the traveling salesperson problem with time windows, Transp. Res. Proc. 30 (2018) 157–166. doi:10.1016/j.trpro.2018.09.018.
  • Booth et al. [2016] K. E. Booth, T. T. Tran, G. Nejat, J. C. Beck, Mixed-integer and constraint programming techniques for mobile robot task planning, IEEE Robot. Autom. Lett. 1 (2016) 500–507. doi:10.1109/LRA.2016.2522096.
  • Gadegaard and Lysgaard [2021] S. L. Gadegaard, J. Lysgaard, A symmetry-free polynomial formulation of the capacitated vehicle routing problem, Discrete Appl. Math. 296 (2021) 179–192. doi:10.1016/j.dam.2020.02.012.
  • Rabbouch et al. [2019] B. Rabbouch, F. Saâdaoui, R. Mraihi, Constraint programming based algorithm for solving large-scale vehicle routing problems, in: Hybrid Artificial Intelligent Systems, Springer International Publishing, Cham, 2019, pp. 526–539. doi:10.1007/978-3-030-29859-3_45.
  • Letchford and Salazar-González [2016] A. N. Letchford, J.-J. Salazar-González, Stronger multi-commodity flow formulations of the (capacitated) sequential ordering problem, Eur. J. Oper. Res. 251 (2016) 74–84. doi:10.1016/j.ejor.2015.11.001.
  • Ritt and Costa [2018] M. Ritt, A. M. Costa, Improved integer programming models for simple assembly line balancing and related problems, Int. Trans. Oper. Res. 25 (2018) 1345–1359. doi:10.1111/itor.12206.
  • Bukchin and Raviv [2018] Y. Bukchin, T. Raviv, Constraint programming for solving various assembly line balancing problems, Omega 78 (2018) 57–68. doi:10.1016/j.omega.2017.06.008.
  • Keha et al. [2009] A. B. Keha, K. Khowala, J. W. Fowler, Mixed integer programming formulations for single machine scheduling problems, Comput. Ind. Eng. 56 (2009) 357–367. doi:10.1016/j.cie.2008.06.008.
  • Martin et al. [2021] M. Martin, H. H. Yanasse, M. J. Pinto, Mathematical models for the minimization of open stacks problem, Int. Trans. Oper. Res. 29 (2021) 2944–2967. doi:10.1111/itor.13053.
  • Morin et al. [2018] M. Morin, M. P. Castro, K. E. Booth, T. T. Tran, C. Liu, J. C. Beck, Intruder alert! Optimization models for solving the mobile robot graph-clear problem, Constraints 23 (2018) 335–354. doi:10.1007/s10601-018-9288-3.
  • Laborie et al. [2018] P. Laborie, J. Rogerie, P. Shaw, P. Vilím, IBM ILOG CP optimizer for scheduling, Constraints 23 (2018) 210–250. doi:10.1007/s10601-018-9281-x.
  • Bellman [1957] R. Bellman, Dynamic Programming, Princeton University Press, 1957.
  • Held and Karp [1962] M. Held, R. M. Karp, A dynamic programming approach to sequencing problems, J. Soc. Ind. Appl. Math. 10 (1962) 196–210. doi:10.1137/0110015.
  • Dumas et al. [1995] Y. Dumas, J. Desrosiers, E. Gelinas, M. M. Solomon, An optimal algorithm for the traveling salesman problem with time windows, Oper. Res. 43 (1995) 367–371. doi:10.1287/opre.43.2.367.
  • Gromicho et al. [2012] J. Gromicho, J. J. V. Hoorn, A. L. Kok, J. M. Schutten, Restricted dynamic programming: A flexible framework for solving realistic VRPs, Comput. Oper. Res. 39 (2012) 902–909. doi:10.1016/j.cor.2011.07.002.
  • Righini and Salani [2008] G. Righini, M. Salani, New dynamic programming algorithms for the resource constrained elementary shortest path problem, Networks 51 (2008) 155–170. doi:10.1002/net.20212.
  • Righini and Salani [2009] G. Righini, M. Salani, Decremental state space relaxation strategies and initialization heuristics for solving the orienteering problem with time windows with dynamic programming, Comput. Oper. Res. 36 (2009) 1191–1203. doi:10.1016/j.cor.2008.01.003.
  • Lawler [1964] E. L. Lawler, On scheduling problems with deferral costs, Manag. Sci. 11 (1964) 280–288. doi:10.1287/mnsc.11.2.280.
  • Garcia de la Banda and Stuckey [2007] M. Garcia de la Banda, P. J. Stuckey, Dynamic programming to minimize the maximum number of open stacks, INFORMS J. Comput. 19 (2007) 607–617. doi:10.1287/ijoc.1060.0205.
  • Garcia de la Banda et al. [2011] M. Garcia de la Banda, P. J. Stuckey, G. Chu, Solving talent scheduling with dynamic programming, INFORMS J. Comput. 23 (2011) 120–137. doi:10.1287/ijoc.1090.0378.
  • Ghallab et al. [2004] M. Ghallab, D. Nau, P. Traverso, Automated Planning, The Morgan Kaufmann Series in Artificial Intelligence, Morgan Kaufmann, Burlington, 2004. doi:10.1016/B978-1-55860-856-6.X5000-5.
  • Russell and Norvig [2020] S. Russell, P. Norvig, Solving problems by searching, in: Artificial Intelligence: A Modern Approach, fourth ed., Pearson, 2020, pp. 63–109.
  • Pearl and Kim [1982] J. Pearl, J. H. Kim, Studies in semi-admissible heuristics, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-4 (1982) 392–399. doi:10.1109/TPAMI.1982.4767270.
  • Edelkamp and Schrödl [2012] S. Edelkamp, S. Schrödl, Heuristic Search: Theory and Applications, Morgan Kaufmann, San Francisco, 2012. doi:10.1016/C2009-0-16511-X.
  • Bonet and Geffner [2001] B. Bonet, H. Geffner, Planning as heuristic search, Artificial Intelligence 129 (2001) 5–33. doi:10.1016/S0004-3702(01)00108-4.
  • Hoffmann and Nebel [2001] J. Hoffmann, B. Nebel, The FF planning system: Fast plan generation through heuristic search, Journal of Artificial Intelligence Research 14 (2001) 253–302. doi:10.1613/jair.855.
  • Helmert [2006] M. Helmert, The Fast Downward planning system, J. Artif. Intell. Res. 26 (2006) 191–246. doi:10.1613/jair.1705.
  • Karp and Held [1967] R. M. Karp, M. Held, Finite-state processes and dynamic programming, SIAM J. Appl. Math. 15 (1967) 693–718. doi:10.1137/0115060.
  • Ibaraki [1972] T. Ibaraki, Representation theorems for equivalent optimization problems, Inf. Control 21 (1972) 397–435. doi:10.1016/S0019-9958(72)90125-8.
  • Ibaraki [1973a] T. Ibaraki, Finite state representations of discrete optimization problems, SIAM J. Comput. 2 (1973a) 193–210. doi:10.1137/0202016.
  • Ibaraki [1973b] T. Ibaraki, Solvable classes of discrete dynamic programming, J. Math. Anal. Appl. 43 (1973b) 642–693. doi:10.1016/0022-247X(73)90283-7.
  • Ibaraki [1974] T. Ibaraki, Classes of discrete optimization problems and their decision problems, J. Comput. Syst. Sci. 8 (1974) 84–116. doi:10.1016/S0022-0000(74)80024-3.
  • Martelli and Montanari [1975] A. Martelli, U. Montanari, On the foundations of dynamic programming, in: S. Rinaldi (Ed.), Topics in Combinatorial Optimization, Springer, Vienna, 1975, pp. 145–163. doi:10.1007/978-3-7091-3291-3_9.
  • Helman [1982] P. Helman, A New Theory of Dynamic Programming, Ph.D. thesis, University of Michigan, Ann Arbor, 1982.
  • Kumar and Kanal [1988] V. Kumar, L. N. Kanal, The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound, in: L. Kanal, V. Kumar (Eds.), Search in Artificial Intelligence, Springer, New York, NY, 1988, pp. 1–27. doi:10.1007/978-1-4613-8788-6_1.
  • Michie [1968] D. Michie, “memo” functions and machine learning, Nature 218 (1968) 19–22. doi:10.1038/218019a0.
  • Bird [1980] R. S. Bird, Tabulation techniques for recursive programs, ACM Comput. Surveys 12 (1980) 403–417. doi:10.1145/356827.356831.
  • Tamaki and Sato [1986] H. Tamaki, T. Sato, OLD resolution with tabulation, in: Third International Conference on Logic Programming (ICLP), Springer, Berlin, Heidelberg, 1986, pp. 84–98.
  • Puchinger and Stuckey [2008] J. Puchinger, P. J. Stuckey, Automating branch-and-bound for dynamic programs, in: Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, PEPM ’08, Association for Computing Machinery, New York, NY, USA, 2008, pp. 81–89. doi:10.1145/1328408.1328421.
  • Zhou et al. [2015] N.-F. Zhou, H. Kjellerstrand, J. Fruhman, Constraint Solving and Planning with Picat, Springer, Cham, 2015. doi:10.1007/978-3-319-25883-6.
  • Giegerich and Meyer [2002] R. Giegerich, C. Meyer, Algebraic dynamic programming, in: Proceedings of the Ninth Algebraic Methodology and Software Technology, Springer, Berlin, Heidelberg, 2002, pp. 349–364. doi:10.1007/3-540-45719-4_24.
  • zu Siederdissen et al. [2015] C. H. zu Siederdissen, S. J. Prohaska, P. F. Stadler, Algebraic dynamic programming over general data structures, BMC Bioinformatics 16 (2015). doi:10.1186/1471-2105-16-S19-S2.
  • Eisner et al. [2005] J. Eisner, E. Goldlust, N. A. Smith, Compiling comp ling: Weighted dynamic programming and the Dyna language, in: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), Association for Computational Linguistics, USA, 2005, pp. 281–290. doi:10.3115/1220575.1220611.
  • Vieira et al. [2017] T. Vieira, M. Francis-Landau, N. W. Filardo, F. Khorasani, J. Eisner, Dyna: Toward a self-optimizing declarative language for machine learning applications, in: Proceedings of the First ACM SIGPLAN Workshop on Machine Learning and Programming Languages (MAPL), ACM, 2017, pp. 8–17. doi:10.1145/3088525.3088562.
  • Sundstrom and Guzzella [2009] O. Sundstrom, L. Guzzella, A generic dynamic programming Matlab function, in: 2009 IEEE Control Applications, (CCA) & Intelligent Control, (ISIC), 2009, pp. 1625–1630. doi:10.1109/CCA.2009.5281131.
  • Miretti et al. [2021] F. Miretti, D. Misul, E. Spessa, DynaProg: Deterministic dynamic programming solver for finite horizon multi-stage decision problems, SoftwareX 14 (2021) 100690. doi:10.1016/j.softx.2021.100690.
  • Hooker [2013] J. N. Hooker, Decision diagrams and dynamic programming, in: Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems – 10th International Conference, CPAIOR 2013, Springer, Berlin, Heidelberg, 2013, pp. 94–110. doi:10.1007/978-3-642-38171-3_7.
  • Bergman et al. [2016] D. Bergman, A. A. Cire, W.-J. van Hoeve, J. N. Hooker, Discrete optimization with decision diagrams, INFORMS J. Comput. 28 (2016) 47–66. doi:10.1287/ijoc.2015.0648.
  • Gillard et al. [2020] X. Gillard, P. Schaus, V. Coppé, Ddo, a generic and efficient framework for MDD-based optimization, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization, 2020, pp. 5243–5245. doi:10.24963/ijcai.2020/757, demos.
  • Michel and van Hoeve [2024] L. Michel, W.-J. van Hoeve, CODD: A decision diagram-based solver for combinatorial optimization, in: ECAI 2024 – 27th European Conference on Artificial Intelligence, volume 392 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2024, pp. 4240–4247. doi:10.3233/FAIA240997.
  • Lew and Mauch [2006] A. Lew, H. Mauch, Dynamic Programming: A Computational Tool, Springer, Berlin, Heidelberg, 2006. doi:10.1007/978-3-540-37014-7.
  • Chalumeau et al. [2021] F. Chalumeau, I. Coulon, Q. Cappart, L.-M. Rousseau, Seapearl: A constraint programming solver guided by reinforcement learning, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research – 18th International Conference, CPAIOR 2021, Springer International Publishing, Cham, 2021, pp. 392–409. doi:10.1007/978-3-030-78230-6_25.
  • Cappart et al. [2021] Q. Cappart, T. Moisan, L.-M. Rousseau, I. Prémont-Schwarz, A. A. Cire, Combining reinforcement learning and constraint programming for combinatorial optimization, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Palo Alto, California USA, 2021, pp. 3677–3687. doi:10.1609/aaai.v35i5.16484.
  • Ghallab et al. [1998] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, D. Weld, D. Wilkins, PDDL - The Planning Domain Definition Language, Technical Report, Yale Center for Computational Vision and Control, 1998. CVC TR-98-003/DCS TR-1165.
  • Fox and Long [2003] M. Fox, D. Long, PDDL2.1: An extension to PDDL for expressing temporal planning domains, J. Artif. Intell. Res. 20 (2003) 61–124. doi:10.1613/jair.1129.
  • Fox and Long [2006] M. Fox, D. Long, Modelling mixed discrete-continuous domains for planning, J. Artif. Intell. Res. 27 (2006) 235–297. doi:10.1613/jair.2044.
  • Sanner [2010] S. Sanner, Relational dynamic influence diagram language (RDDL): Language description, 2010. URL: http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf, accessed on 2024-05-31.
  • Hernádvölgyi et al. [2000] I. T. Hernádvölgyi, R. C. Holte, T. Walsh, Experiments with automatically created memory-based heuristics, in: Abstraction, Reformulation, and Approximation. SARA 2000, Springer, Berlin, Heidelberg, 2000, pp. 281–290. doi:10.1007/3-540-44914-0_18.
  • Gentzel et al. [2020] R. Gentzel, L. Michel, W.-J. van Hoeve, Haddock: A language and architecture for decision diagram compilation, in: Principles and Practice of Constraint Programming – CP 2020, Springer International Publishing, Cham, 2020, pp. 531–547. doi:10.1007/978-3-030-58475-7_31.
  • Martelli and Montanari [1975] A. Martelli, U. Montanari, From dynamic programming to search algorithms with functional costs, in: Proceedings of the Fourth International Joint Conference on Artificial Intelligence, IJCAI-75, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1975, pp. 345–350.
  • Catalano et al. [1979] A. Catalano, S. Gnesi, U. Montanari, Shortest path problems and tree grammars: An algebraic framework, in: Graph-Grammars and Their Application to Computer Science and Biology, Springer, Berlin, Heidelberg, 1979, pp. 167–179. doi:10.1007/BFb0025719.
  • Gnesi et al. [1981] S. Gnesi, U. Montanari, A. Martelli, Dynamic programming as graph searching: An algebraic approach, J. ACM 28 (1981) 737–751. doi:10.1145/322276.322285.
  • Ibaraki [1978] T. Ibaraki, Branch-and-bound procedure and state-space representation of combinatorial optimization problems, Inf. Control 36 (1978) 1–27. doi:10.1016/S0019-9958(78)90197-3.
  • Holte and Fan [2015] R. Holte, G. Fan, State space abstraction in artificial intelligence and operations research, in: Planning, Search, and Optimization: Papers from the 2015 AAAI Workshop, 2015, pp. 55–60.
  • Kuroiwa and Beck [2023a] R. Kuroiwa, J. C. Beck, Domain-independent dynamic programming: Generic state space search for combinatorial optimization, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023a, pp. 236–244. doi:10.1609/icaps.v33i1.27200.
  • Kuroiwa and Beck [2023b] R. Kuroiwa, J. C. Beck, Solving domain-independent dynamic programming problems with anytime heuristic search, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023b, pp. 245–253. doi:10.1609/icaps.v33i1.27201.
  • Kuroiwa and Beck [2024] R. Kuroiwa, J. C. Beck, Parallel beam search algorithms for domain-independent dynamic programming, in: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Washington, DC, USA, 2024, pp. 20743–20750. doi:10.1609/aaai.v38i18.30062.
  • Howard [1960] R. A. Howard, Dynamic Programming and Markov Processes, John Wiley & Sons, Inc., New York, 1960.
  • Sutton and Barto [2018] R. S. Sutton, A. G. Barto, Dynamic programming, in: Reinforcement Learning: An Introduction, second ed., A Bradford Book, Cambridge, MA, USA, 2018.
  • Savelsbergh [1985] M. W. P. Savelsbergh, Local search in routing problems with time windows, Ann. Oper. Res. 4 (1985) 285–305. doi:10.1007/BF02022044.
  • Fikes and Nilsson [1971] R. E. Fikes, N. J. Nilsson, STRIPS: A new approach to the application of theorem proving to problem solving, Artif. Intell. 2 (1971) 189–208. doi:10.1016/0004-3702(71)90010-5.
  • Bäckström and Nebel [1995] C. Bäckström, B. Nebel, Complexity results for SAS+ planning, Comput. Intell. 11 (1995) 625–656. doi:10.1111/j.1467-8640.1995.tb00052.x.
  • Helmert [2002] M. Helmert, Decidability and undecidability results for planning with numerical state variables, in: Proceedings of the Sixth International Conference on Artificial Intelligence Planning Systems (AIPS), AAAI Press, 2002, pp. 44–53.
  • Gnad et al. [2023] D. Gnad, M. Helmert, P. Jonsson, A. Shleyfman, Planning over integers: Compilations and undecidability, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023, pp. 148–152. doi:10.1609/icaps.v33i1.27189.
  • Shleyfman et al. [2023] A. Shleyfman, D. Gnad, P. Jonsson, Structurally restricted fragments of numeric planning – a complexity analysis, in: Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Washington, DC, USA, 2023, pp. 12112–12119. doi:10.1609/aaai.v37i10.26428.
  • Gigante and Scala [2023] N. Gigante, E. Scala, On the compilability of bounded numeric planning, in: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI-23, International Joint Conferences on Artificial Intelligence Organization, 2023, pp. 5341–5349. doi:10.24963/ijcai.2023/593, main track.
  • McDermott [2000] D. M. McDermott, The 1998 AI planning systems competition, AI Mag. 21 (2000) 35–55. doi:10.1609/aimag.v21i2.1506.
  • Torralba and Hoffmann [2015] Á. Torralba, J. Hoffmann, Simulation-based admissible dominance pruning, in: Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI-15, AAAI Press/International Joint Conferences on Artificial Intelligence Organization, Palo Alto, California USA, 2015, pp. 1689–1695.
  • Libralesso et al. [2020] L. Libralesso, A.-M. Bouhassoun, H. Cambazard, V. Jost, Tree search for the sequential ordering problem, in: ECAI 2020 – 24th European Conference on Artificial Intelligence, volume 325 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2020, pp. 459–465. doi:10.3233/FAIA200126.
  • Pearl [1984] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley Longman Publishing Co., Inc., USA, 1984. doi:10.5555/525.
  • Dijkstra [1959] E. W. Dijkstra, A note on two problems in connexion with graphs, Numer. Math. 1 (1959) 269–271. doi:10.1007/BF01386390.
  • Hart et al. [1968] P. E. Hart, N. J. Nilsson, B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybern. 4 (1968) 100–107. doi:10.1109/TSSC.1968.300136.
  • Edelkamp et al. [2005] S. Edelkamp, S. Jabbar, A. L. Lafuente, Cost-algebraic heuristic search, in: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), AAAI Press, 2005, pp. 1362–1367.
  • Nau et al. [1984] D. S. Nau, V. Kumar, L. Kanal, General branch and bound, and its relation to A* and AO*, Artif. Intell. 23 (1984) 29–58. doi:10.1016/0004-3702(84)90004-3.
  • Chakrabarti et al. [1989] P. Chakrabarti, S. Ghose, A. Pandey, S. De Sarkar, Increasing search efficiency using multiple heuristics, Inf. Process. Lett. 30 (1989) 33–36. doi:10.1016/0020-0190(89)90171-3.
  • Baier et al. [2009] J. A. Baier, F. Bacchus, S. A. McIlraith, A heuristic search approach to planning with temporally extended preferences, Artif. Intell. 173 (2009) 593–618. doi:10.1016/j.artint.2008.11.011, advances in Automated Plan Generation.
  • Thayer and Ruml [2011] J. T. Thayer, W. Ruml, Bounded suboptimal search: A direct approach using inadmissible estimates, in: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI-11, AAAI Press/International Joint Conferences on Artificial Intelligence Organization, Menlo Park, California, 2011, pp. 674–679. doi:10.5591/978-1-57735-516-8/IJCAI11-119.
  • Aine et al. [2016] S. Aine, S. Swaminathan, V. Narayanan, V. Hwang, M. Likhachev, Multi-heuristic A*, Int. J. Robot. Res. 35 (2016) 224–243. doi:10.1177/0278364915594029.
  • Fickert et al. [2022] M. Fickert, T. Gu, W. Ruml, New results in bounded-suboptimal search, in: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Palo Alto, California USA, 2022, pp. 10166–10173. doi:10.1609/aaai.v36i9.21256.
  • Vadlamudi et al. [2012] S. G. Vadlamudi, P. Gaurav, S. Aine, P. P. Chakrabarti, Anytime column search, in: AI 2012: Advances in Artificial Intelligence, Springer, Berlin, Heidelberg, 2012, pp. 254–265. doi:10.1007/978-3-642-35101-3_22.
  • Vadlamudi et al. [2016] S. G. Vadlamudi, S. Aine, P. P. Chakrabarti, Anytime pack search, Nat. Comput. 15 (2016) 395–414. doi:10.1007/978-3-642-45062-4_88.
  • Kao et al. [2009] G. K. Kao, E. C. Sewell, S. H. Jacobson, A branch, bound, and remember algorithm for the $1|r_i|\sum t_i$ scheduling problem, J. Sched. 12 (2009) 163–175. doi:10.1007/s10951-008-0087-3.
  • Sewell and Jacobson [2012] E. C. Sewell, S. H. Jacobson, A branch, bound, and remember algorithm for the simple assembly line balancing problem, INFORMS J. Comput. 24 (2012) 433–442. doi:10.1287/ijoc.1110.0462.
  • Morrison et al. [2014] D. R. Morrison, E. C. Sewell, S. H. Jacobson, An application of the branch, bound, and remember algorithm to a new simple assembly line balancing dataset, Eur. J. Oper. Res. 236 (2014) 403–409. doi:10.1016/j.ejor.2013.11.033.
  • Harvey and Ginsberg [1995] W. D. Harvey, M. L. Ginsberg, Limited discrepancy search, in: Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI-95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 607–613.
  • Beck and Perron [2000] J. C. Beck, L. Perron, Discrepancy-bounded depth first search, in: Second International Workshop on Integration of AI and OR Technologies for Combinatorial Optimization Problems, CPAIOR 2000, 2000.
  • Zhou and Hansen [2006] R. Zhou, E. A. Hansen, Breadth-first heuristic search, Artif. Intell. 170 (2006) 385–408. doi:10.1016/j.artint.2005.12.002.
  • Zhang [1998] W. Zhang, Complete anytime beam search, in: Proceedings of the 15th National Conference on Artificial Intelligence (AAAI), AAAI Press, 1998, pp. 425–430.
  • Libralesso et al. [2022] L. Libralesso, P. A. Focke, A. Secardin, V. Jost, Iterative beam search algorithms for the permutation flowshop, Eur. J. Oper. Res. 301 (2022) 217–234. doi:10.1016/j.ejor.2021.10.015.
  • Dantzig and Ramser [1959] G. B. Dantzig, J. H. Ramser, The truck dispatching problem, Manag. Sci. 6 (1959) 80–91. doi:10.1287/mnsc.6.1.80.
  • Toth and Vigo [2002] P. Toth, D. Vigo, Models, relaxations and exact approaches for the capacitated vehicle routing problem, Discrete Appl. Math. 123 (2002) 487–512. doi:10.1016/S0166-218X(01)00351-1.
  • Hernández-Pérez and Salazar-González [2009] H. Hernández-Pérez, J. J. Salazar-González, The multi-commodity one-to-one pickup-and-delivery traveling salesman problem, Eur. J. Oper. Res. 196 (2009) 987–995. doi:10.1016/j.ejor.2008.05.009.
  • Gouveia and Ruthmair [2015] L. Gouveia, M. Ruthmair, Load-dependent and precedence-based models for pickup and delivery problems, Comput. Oper. Res. 63 (2015) 56–71. doi:10.1016/j.cor.2015.04.008.
  • Castro et al. [2020] M. P. Castro, A. A. Cire, J. C. Beck, An MDD-based lagrangian approach to the multicommodity pickup-and-delivery TSP, INFORMS J. Comput. 32 (2020) 263–278. doi:10.1287/ijoc.2018.0881.
  • Kantor and Rosenwein [1992] M. G. Kantor, M. B. Rosenwein, The orienteering problem with time windows, J. Oper. Res. Soc. 43 (1992) 629–635. doi:10.1057/jors.1992.88.
  • Golden et al. [1987] B. L. Golden, L. Levy, R. Vohra, The orienteering problem, Nav. Res. Logist. 34 (1987) 307–318. doi:10.1002/1520-6750(198706)34:3<307::AID-NAV3220340302>3.0.CO;2-D.
  • Martello and Toth [1990] S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations, John Wiley & Sons, Inc., New York, NY, USA, 1990.
  • Kellerer et al. [2004] H. Kellerer, U. Pferschy, D. Pisinger, Knapsack Problems, Springer, Berlin, Heidelberg, 2004. doi:10.1007/978-3-540-24777-7.
  • Dantzig [1957] G. B. Dantzig, Discrete-variable extremum problems, Oper. Res. 5 (1957) 266–277. doi:10.1287/opre.5.2.266.
  • Garey and Johnson [1979] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York, 1979.
  • Johnson [1988] R. V. Johnson, Optimally balancing large assembly lines with ‘Fable’, Manag. Sci. 34 (1988) 240–253. doi:10.1287/mnsc.34.2.240.
  • Salveson [1955] M. E. Salveson, The assembly-line balancing problem, J. Ind. Eng. 6 (1955) 18–25. doi:10.1115/1.4014559.
  • Baybars [1986] İ. Baybars, A survey of exact algorithms for the simple assembly line balancing problem, Manag. Sci. 32 (1986) 909–932. doi:10.1287/mnsc.32.8.909.
  • Álvarez Miranda and Pereira [2019] E. Álvarez Miranda, J. Pereira, On the complexity of assembly line balancing problems, Comput. Oper. Res. 108 (2019) 182–186. doi:10.1016/j.cor.2019.04.005.
  • Jackson [1956] J. R. Jackson, A computing procedure for a line balancing problem, Manag. Sci. 2 (1956) 261–271. doi:10.1287/mnsc.2.3.261.
  • Scholl and Klein [1997] A. Scholl, R. Klein, SALOME: A bidirectional branch-and-bound procedure for assembly line balancing, INFORMS J. Comput. 9 (1997) 319–335. doi:10.1287/ijoc.9.4.319.
  • Kolling and Carpin [2007] A. Kolling, S. Carpin, The graph-clear problem: Definition, theoretical properties and its connections to multirobot aided surveillance, in: Proceedings of IEEE International Conference on Intelligent Robots and Systems (IROS), 2007, pp. 1003–1008. doi:10.1109/IROS.2007.4399368.
  • Gurobi Optimization [2023] Gurobi Optimization, LLC, Gurobi optimizer reference manual, 2023. URL: https://www.gurobi.com, accessed on 2024-05-31.
  • Gendreau et al. [1998] M. Gendreau, A. Hertz, G. Laporte, M. Stan, A generalized insertion heuristic for the traveling salesman problem with time windows, Oper. Res. 46 (1998) 330–346. doi:10.1287/opre.46.3.330.
  • Ohlmann and Thomas [2007] J. W. Ohlmann, B. W. Thomas, A compressed-annealing heuristic for the traveling salesman problem with time windows, INFORMS J. Comput. 19 (2007) 80–90. doi:10.1287/ijoc.1050.0145.
  • Ascheuer [1995] N. Ascheuer, Hamiltonian Path Problems in the On-Line Optimization of Flexible Manufacturing Systems, Ph.D. thesis, Technische Universität Berlin, 1995.
  • Gavish and Graves [1978] B. Gavish, S. C. Graves, The Travelling Salesman Problem and Related Problems, Technical Report, Operations Research Center, Massachusetts Institute of Technology, 1978. Working Paper OR 078-78.
  • Uchoa et al. [2017] E. Uchoa, D. Pecin, A. Pessoa, M. Poggi, T. Vidal, A. Subramanian, New benchmark instances for the capacitated vehicle routing problem, Eur. J. Oper. Res. 257 (2017) 845–858. doi:10.1016/j.ejor.2016.08.012.
  • Righini and Salani [2006] G. Righini, M. Salani, Dynamic Programming for the Orienteering Problem with Time Windows, Technical Report 91, Dipartimento di Tecnologie dell’Informazione, Università degli Studi di Milano, Crema, Italy, 2006.
  • Montemanni and Gambardella [2009] R. Montemanni, L. M. Gambardella, Ant colony system for team orienteering problems with time windows, Found. Comput. Decis. Sci. 34 (2009) 287–306.
  • Vansteenwegen et al. [2009] P. Vansteenwegen, W. Souffriau, G. Vanden Berghe, D. Van Oudheusden, Iterated local search for the team orienteering problem with time windows, Comput. Oper. Res. 36 (2009) 3281–3290. doi:10.1016/j.cor.2009.03.008, new developments on hub location.
  • Vansteenwegen et al. [2011] P. Vansteenwegen, W. Souffriau, D. V. Oudheusden, The orienteering problem: A survey, Eur. J. Oper. Res. 209 (2011) 1–10. doi:10.1016/j.ejor.2010.03.045.
  • Beasley [1990] J. E. Beasley, OR-Library: Distributing test problems by electronic mail, J. Oper. Res. Soc. 41 (1990) 1069–1072. doi:10.2307/2582903.
  • Cacchiani et al. [2022] V. Cacchiani, M. Iori, A. Locatelli, S. Martello, Knapsack problems — an overview of recent advances. Part II: Multiple, multidimensional, and quadratic knapsack problems, Comput. Oper. Res. 143 (2022) 105693. doi:10.1016/j.cor.2021.105693.
  • Shaw [2004] P. Shaw, A constraint for bin packing, in: Principles and Practice of Constraint Programming – CP 2004, Springer, Berlin, Heidelberg, 2004, pp. 648–662. doi:10.1007/978-3-540-30201-8_47.
  • Delorme et al. [2018] M. Delorme, M. Iori, S. Martello, BPPLIB: A library for bin packing and cutting stock problems, Optim. Lett. 12 (2018) 235–250. doi:10.1007/s11590-017-1192-z.
  • Falkenauer [1996] E. Falkenauer, A hybrid grouping genetic algorithm for bin packing, J. Heuristics 2 (1996) 5–30. doi:10.1007/BF00226291.
  • Scholl et al. [1997] A. Scholl, R. Klein, C. Jürgens, Bison: A fast hybrid procedure for exactly solving the one-dimensional bin packing problem, Comput. Oper. Res. 24 (1997) 627–645. doi:10.1016/S0305-0548(96)00082-2.
  • Wäscher and Gau [1996] G. Wäscher, T. Gau, Heuristics for the integer one-dimensional cutting stock problem: A computational study, Oper.-Res.-Spektrum 18 (1996) 131–144. doi:10.1007/BF01539705.
  • Schwerin and Wäscher [1997] P. Schwerin, G. Wäscher, The bin-packing problem: A problem generator and some numerical experiments with FFD packing and MTP, Int. Trans. Oper. Res. 4 (1997) 377–389. doi:10.1016/S0969-6016(97)00025-7.
  • Schoenfield [2002] J. E. Schoenfield, Fast, Exact Solution of Open Bin Packing Problems Without Linear Programming, Technical Report, US Army Space and Missile Defense Command, Huntsville, Alabama, USA, 2002.
  • Delorme et al. [2016] M. Delorme, M. Iori, S. Martello, Bin packing and cutting stock problems: Mathematical models and exact algorithms, Eur. J. Oper. Res. 255 (2016) 1–20. doi:10.1016/j.ejor.2016.04.030.
  • Emmons [1969] H. Emmons, One-machine sequencing to minimize certain functions of job tardiness, Oper. Res. 17 (1969) 701–715. doi:10.1287/opre.17.4.701.
  • Kanet [2007] J. J. Kanet, New precedence theorems for one-machine weighted tardiness, Math. Oper. Res. 32 (2007) 579–588. doi:10.1287/moor.1070.0255.
  • Qin et al. [2016] H. Qin, Z. Zhang, A. Lim, X. Liang, An enhanced branch-and-bound algorithm for the talent scheduling problem, Eur. J. Oper. Res. 250 (2016) 412–426. doi:10.1016/j.ejor.2015.10.002.
  • Chu and Stuckey [2015] G. Chu, P. J. Stuckey, Learning value heuristics for constraint programming, in: Integration of AI and OR Techniques in Constraint Programming – 12th International Conference, CPAIOR 2015, Springer International Publishing, Cham, 2015, pp. 108–123. doi:10.1007/978-3-319-18008-3_8.
  • Laurière [1978] J.-L. Laurière, A language and a program for stating and solving combinatorial problems, Artif. Intell. 10 (1978) 29–127. doi:10.1016/0004-3702(78)90029-2.
  • Yuen and Richardson [1995] B. J. Yuen, K. V. Richardson, Establishing the optimality of sequencing heuristics for cutting stock problems, Eur. J. Oper. Res. 84 (1995) 590–598. doi:10.1016/0377-2217(95)00025-L.
  • Smith and Gent [2005] B. Smith, I. Gent, Constraint modelling challenge report 2005, https://ipg.host.cs.st-andrews.ac.uk/challenge/, 2005.
  • Faggioli and Bentivoglio [1998] E. Faggioli, C. A. Bentivoglio, Heuristic and exact methods for the cutting sequencing problem, Eur. J. Oper. Res. 110 (1998) 564–575. doi:10.1016/S0377-2217(97)00268-3.
  • Chu and Stuckey [2009] G. Chu, P. J. Stuckey, Minimizing the maximum number of open stacks by customer search, in: Principles and Practice of Constraint Programming – CP 2009, Springer, Berlin, Heidelberg, 2009, pp. 242–257. doi:10.1007/978-3-642-04244-7_21.
  • Fusy [2009] E. Fusy, Uniform random sampling of planar graphs in linear time, Random Struct. Algor. 35 (2009) 464–522. doi:10.1002/rsa.20275.
  • Tange [2011] O. Tange, GNU parallel - the command-line power tool, ;login: USENIX Mag. 36 (2011) 42–47.
  • Berthold [2013] T. Berthold, Measuring the impact of primal heuristics, Oper. Res. Lett. 41 (2013) 611–614. doi:10.1016/j.orl.2013.08.007.
  • Kuroiwa et al. [2023] R. Kuroiwa, A. Shleyfman, J. C. Beck, NLM-CutPlan, 2023. URL: https://ipc2023-numeric.github.io/abstracts/NLM_CutPlan_Abstract.pdf, accessed on 2024-05-31.
  • Drexler et al. [2023] D. Drexler, D. Gnad, P. Höft, J. Seipp, D. Speck, S. Ståhlberg, Ragnarok, 2023. URL: https://ipc2023-classical.github.io/abstracts/planner17_ragnarok.pdf, accessed on 2024-05-31.
  • Gillard et al. [2021] X. Gillard, V. Coppé, P. Schaus, A. A. Cire, Improving the filtering of branch-and-bound MDD solver, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research – 18th International Conference, CPAIOR 2021, Springer International Publishing, Cham, 2021, pp. 231–247. doi:10.1007/978-3-030-78230-6_15.
  • Coppé et al. [2023] V. Coppé, X. Gillard, P. Schaus, Boosting decision diagram-based branch-and-bound by pre-solving with aggregate dynamic programming, in: 29th International Conference on Principles and Practice of Constraint Programming (CP 2023), Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2023, pp. 13:1–13:17. doi:10.4230/LIPIcs.CP.2023.13.
  • Coppé et al. [2024a] V. Coppé, X. Gillard, P. Schaus, Decision diagram-based branch-and-bound with caching for dominance and suboptimality detection, INFORMS J. Comput. (2024a). doi:10.1287/ijoc.2022.0340.
  • Coppé et al. [2024b] V. Coppé, X. Gillard, P. Schaus, Modeling and exploiting dominance rules for discrete optimization with decision diagrams, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research – 21st International Conference, CPAIOR 2024, 2024b.
  • Baptiste et al. [2001] P. Baptiste, C. Le Pape, W. Nuijten, Constraint-Based Scheduling: Applying Constraint Programming to Scheduling Problems, International Series in Operations Research & Management Science, Springer, New York, NY, 2001. doi:10.1007/978-1-4615-1479-4.
  • Lenstra et al. [1977] J. Lenstra, A. Rinnooy Kan, P. Brucker, Complexity of machine scheduling problems, Ann. Discrete Math. 1 (1977) 343–362. doi:10.1016/S0167-5060(08)70743-X.
  • Abdul-Razaq et al. [1990] T. S. Abdul-Razaq, C. N. Potts, L. N. V. Wassenhove, A survey of algorithms for the single machine total weighted tardiness scheduling problem, Discrete Appl. Math. 26 (1990) 235–253. doi:10.1016/0166-218X(90)90103-J.
  • Nethercote et al. [2007] N. Nethercote, P. J. Stuckey, R. Becket, S. Brand, G. J. Duck, G. Tack, MiniZinc: Towards a standard CP modelling language, in: Principles and Practice of Constraint Programming – CP 2007, Springer, Berlin, Heidelberg, 2007, pp. 529–543.
  • Linhares and Yanasse [2002] A. Linhares, H. H. Yanasse, Connections between cutting-pattern sequencing, VLSI design, and flexible machines, Comput. Oper. Res. 29 (2002) 1759–1772. doi:10.1016/S0305-0548(01)00054-5.
  • Hoffmann [2003] J. Hoffmann, The Metric-FF planning system: Translating ”ignoring delete lists” to numeric state variables, J. Artif. Intell. Res. 20 (2003) 291–341. doi:10.1613/jair.1144.
  • Scala et al. [2017] E. Scala, P. Haslum, D. Magazzeni, S. Thiébaux, Landmarks for numeric planning problems, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI-17, International Joint Conferences on Artificial Intelligence Organization, 2017, pp. 4384–4390. doi:10.24963/ijcai.2017/612, main track.
  • Piacentini et al. [2018a] C. Piacentini, M. Castro, A. Cire, J. C. Beck, Compiling optimal numeric planning to mixed integer linear programming, in: Proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, 2018a, pp. 383–387. doi:10.1609/icaps.v28i1.13919.
  • Piacentini et al. [2018b] C. Piacentini, M. P. Castro, A. A. Cire, J. C. Beck, Linear and integer programming-based heuristics for cost-optimal numeric planning, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, Palo Alto, California USA, 2018b, pp. 6254–6261. doi:10.1609/aaai.v32i1.12082.
  • Scala et al. [2020] E. Scala, P. Haslum, S. Thiébaux, M. Ramírez, Subgoaling techniques for satisficing and optimal numeric planning, J. Artif. Intell. Res. 68 (2020) 691–752. doi:10.1613/jair.1.11875.
  • Leofante et al. [2020] F. Leofante, E. Giunchiglia, E. Ábráham, A. Tacchella, Optimal planning modulo theories, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization, 2020, pp. 4128–4134. doi:10.24963/ijcai.2020/571, main track.
  • Kuroiwa et al. [2022a] R. Kuroiwa, A. Shleyfman, C. Piacentini, M. P. Castro, J. C. Beck, The LM-cut heuristic family for optimal numeric planning with simple conditions, J. Artif. Intell. Res. 75 (2022a) 1477–1548. doi:10.1613/jair.1.14034.
  • Kuroiwa et al. [2022b] R. Kuroiwa, A. Shleyfman, J. C. Beck, LM-cut heuristics for optimal linear numeric planning, in: Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2022b, pp. 203–212. doi:10.1609/icaps.v32i1.19803.
  • Shleyfman et al. [2023] A. Shleyfman, R. Kuroiwa, J. C. Beck, Symmetry detection and breaking in linear cost-optimal numeric planning, in: Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), AAAI Press, Palo Alto, California USA, 2023, pp. 393–401. doi:10.1609/icaps.v33i1.27218.
  • Kuroiwa et al. [2023] R. Kuroiwa, A. Shleyfman, J. Beck, Extracting and exploiting bounds of numeric variables for optimal linear numeric planning, in: ECAI 2023 – 26th European Conference on Artificial Intelligence, volume 372 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2023, pp. 1332–1339. doi:10.3233/FAIA230409.